Search Results
- Retrieving, classifying and analysing narrative commentary in unstructured (glossy) annual reports published as PDF files
  El-Haj, Mahmoud; Alves, Paulo; Rayson, Paul; Walker, Martin; Young, Steven
  We provide a methodological contribution by developing, describing and evaluating a method for automatically retrieving and analysing text from digital PDF annual report files published by firms listed on the London Stock Exchange (LSE). The retrieval method retains information on document structure, enabling clear delineation between narrative and financial statement components of reports, and between individual sections within the narratives component. Retrieval accuracy exceeds 95% for manual validations using a random sample of 586 reports. Large-sample statistical validations using a comprehensive sample of reports published by non-financial LSE firms confirm that report length, narrative tone and (to a lesser degree) readability vary predictably with economic and regulatory factors. We demonstrate how the method is adaptable to non-English language documents and different regulatory regimes using a case study of Portuguese reports. We use the procedure to construct new research resources including corpora for commonly occurring annual report sections and a dataset of text properties for over 26,000 U.K. annual reports.
- Multilingual financial narrative processing: analysing annual reports in English, Spanish and Portuguese
  El-Haj, Mahmoud; Rayson, Paul; Alves, Paulo; Herrero-Zorita, Carlos; Young, Steven
  This chapter describes and evaluates the use of information extraction (IE) and natural language processing (NLP) methods for the extraction and analysis of financial annual reports in three languages: English, Spanish, and Portuguese. The work retains information on document structure, which is needed to distinguish clearly between the narrative and financial statement components of annual reports and between individual sections within the narratives component. Extraction accuracy varies between languages, with English exceeding 95%. We apply the extraction methods to a comprehensive sample of annual reports published by UK, Spanish, and Portuguese non-financial firms between 2003 and 2014.