Sustainability Disclosures and Greenwashing: An Analysis of Corporate Reports Using Machine Learning

  • Type: Master's thesis
  • Supervision:

    Niklas Letmathe

  • Additional field:

    2025

  • As sustainability reporting gains prominence in corporate communication, concerns about greenwashing have become increasingly relevant. This study examines how machine learning methods can support the detection and classification of greenwashing in corporate sustainability disclosures. The sample consists of 95 sustainability reports from Fortune Global 100 companies, from which more than 20,000 environmental claims were extracted. GPT-4o annotated these claims automatically using a multi-class framework that distinguishes genuine claims, vague or ambiguous claims, potential greenwashing, and contradictory claims. To classify the annotated claims, two models were trained: a traditional baseline combining TF-IDF features with logistic regression, and a domain-specific transformer model (ClimateBERT). Both were trained on the original dataset and on an augmented version enhanced with synthetic claims. The results indicate that data augmentation and the removal of ambiguous labels substantially improved classification performance, with ClimateBERT achieving an accuracy and a macro-average F1 score of 0.83. Beyond model development, the study investigates sectoral differences in disclosure practices and finds that the oil, gas, and mining sector and the manufacturing sector showed a higher proportion of potentially misleading claims than sectors such as healthcare. These findings support the view that greenwashing should be seen as a spectrum rather than a binary issue. The study contributes to the growing body of literature on automated sustainability assessment and provides insights for companies and regulators seeking to enhance transparency in environmental communication.
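    The abstract reports accuracy and macro-average F1 for a four-class problem. As a minimal sketch of how these two metrics differ, the snippet below computes both in plain Python. The label names, the helper functions `macro_f1` and `accuracy`, and the toy predictions are assumptions made for illustration; they are not taken from the thesis or its data.

```python
from collections import Counter

# Hypothetical label names mirroring the four categories named in the abstract.
LABELS = ["genuine", "vague", "potential_greenwashing", "contradictory"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); a class that never occurs scores 0 here.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Share of claims whose predicted label equals the annotated label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data invented for illustration only (not from the study).
y_true = ["genuine", "genuine", "vague",
          "potential_greenwashing", "contradictory", "genuine"]
y_pred = ["genuine", "vague", "vague",
          "potential_greenwashing", "genuine", "genuine"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

    Because macro-averaging weights every class equally, a rare class that is never predicted correctly (here the single "contradictory" claim) pulls the macro F1 below the accuracy. This makes the metric a useful complement when the claim categories are imbalanced.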