
In the world of finance, professionals are constantly looking for ways to improve data processing and analysis, especially for unstructured data. Unstructured data is information that does not reside in a traditional row-column database, has no predefined data model or structure, and is difficult to process or analyze with conventional data tools and methods. Unstructured financial data comes in many forms, such as emails, financial reports, news articles, social media posts, and customer feedback. As the volume of unstructured data grows, it has become increasingly important to use artificial intelligence (AI) solutions to process this data efficiently and extract insights from it. One such AI approach is named entity recognition (NER), a machine learning-based natural language processing (NLP) technique that creates structure from unstructured text documents by identifying and extracting the entities within them.

Named Entity Recognition vs. Language Models

Named entity recognition (NER) can classify entities and put them into predefined categories such as names of people, organizations, locations, times, quantities, monetary values, percentages, and more, depending on how you trained your model. On the other hand, language models are designed to understand and generate human language based on the probability of a sequence of words appearing in a given text.

While both NER and language models contribute to the field of NLP, using an entity recognition model on financial forms offers several advantages over using a language model:

  • Precision: NER models are specifically designed to extract and classify entities within a text, whereas language models focus on understanding and generating human language. Consequently, NER models are better suited for extracting key information from financial forms, such as names, dates, and monetary values, with greater precision and accuracy.
  • Efficiency: NER models can quickly and accurately identify the relevant entities in a text, helping to sort and structure unstructured data. This is particularly useful in finance, where professionals often need to process large amounts of data in a short period of time.
  • Customization: NER models can be customized to recognize and categorize specific entities relevant to the financial domain, such as stock symbols, financial ratios, and economic indicators. This level of customization allows for more accurate and relevant extraction of information from financial documents.
  • Integration: NER models can be easily integrated with other NLP techniques and machine learning algorithms to provide a more comprehensive intake of financial documents. This enables finance professionals to gain a deeper insight into the data and make more informed decisions.
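To make the customization point concrete: custom entity types are typically taught to an NER model through labeled character spans. The sketch below builds such annotations for a made-up sentence. The label names (`METHOD_TYPE`, `DISCOUNT_RATE_RANGE`), the sample sentence, and the `annotate` helper are illustrative assumptions, not the labels or data used in the example that follows.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def annotate(text: str, phrases: List[Tuple[str, str]]) -> List[Span]:
    """Locate each (phrase, label) pair in text and return character spans."""
    spans = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        spans.append((start, start + len(phrase), label))
    return sorted(spans)


# Invented sample sentence (hypothetical, not taken from any filing).
text = (
    "The advisor performed a discounted cash flow analysis "
    "using discount rates of 9.0% to 11.0%."
)
spans = annotate(text, [
    ("discounted cash flow analysis", "METHOD_TYPE"),
    ("9.0% to 11.0%", "DISCOUNT_RATE_RANGE"),
])

for start, end, label in spans:
    print(label, "->", text[start:end])
```

Storing offsets rather than only the phrase text keeps each label tied to an exact position in the passage, which is the form most span-based annotation tools work with.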

Example:

We trained an NER model on 40 discounted cash flow (DCF) passages, excerpts from SEC merger forms that are written differently but generally convey the same message about financial metrics – a great scenario for an NER model. For comparison, we primed a ChatGPT model to read and understand these passages. Here are the results from the NER model:

Retrieved from: United States Securities and Exchange Commission, “Schedule 14A Horizon Therapeutics Public Limited Company”, January 23, 2023, https://ir.horizontherapeutics.com/node/21376/html

The NER model correctly tags the method types used, the discount ranges, and the discount rate types. This took a few annotators around an hour to complete, and accuracy was 92.4%.
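The article does not say how the 92.4% figure was computed. One common way to score an entity tagger is exact-match precision, recall, and F1 over labeled spans, sketched below. The spans are made-up placeholders and `span_scores` is a hypothetical helper, so this illustrates the general technique rather than the evaluation actually used here.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def span_scores(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over labeled spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up spans: the tagger gets two of three entities exactly right
# and misplaces the start offset of the third.
gold = {
    (24, 53, "METHOD_TYPE"),
    (78, 91, "DISCOUNT_RATE_RANGE"),
    (100, 104, "DISCOUNT_RATE_TYPE"),
}
predicted = {
    (24, 53, "METHOD_TYPE"),
    (78, 91, "DISCOUNT_RATE_RANGE"),
    (95, 104, "DISCOUNT_RATE_TYPE"),
}

precision, recall, f1 = span_scores(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is strict: a span with the right label but a slightly wrong boundary counts as both a miss and a false alarm, which is why two of three correct spans give a score of 2/3 on all three measures.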

We then spent the same amount of time priming a GPT model through prompting, pointing it at the method types, discount rate ranges, and other financial metrics.

As we can see above, ChatGPT yielded wordy answers that were only sometimes correct. On the right, the method type is not correctly identified: the GPT model returns a summarized version of the text that does not even include the correct method. On the left, the GPT model has correctly identified the portion of the DCF passage that states the discount rate type and ranges, yet it does not parse out those individual metrics. With our NER model, we can programmatically select these ranges, method types, and discount rates and inject them automatically into various systems. With our ChatGPT model, we would need additional rule-based parsing, which defeats the purpose of building a language model to detect these entities.
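As a rough sketch of what "programmatically select" can mean in practice, the snippet below maps tagged entities into a structured record that a downstream system could consume. The label names, the `DcfRecord` fields, the `to_record` helper, and the sample entities are all hypothetical, not the output of the model described above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DcfRecord:
    method_type: Optional[str] = None
    discount_rate_type: Optional[str] = None
    discount_rate_low: Optional[float] = None
    discount_rate_high: Optional[float] = None


def to_record(entities: List[Tuple[str, str]]) -> DcfRecord:
    """Map (label, text) pairs from a tagger into one structured record."""
    record = DcfRecord()
    for label, text in entities:
        if label == "METHOD_TYPE":
            record.method_type = text
        elif label == "DISCOUNT_RATE_TYPE":
            record.discount_rate_type = text
        elif label == "DISCOUNT_RATE_RANGE":
            # The span is already isolated, so normalizing it only needs
            # the percentages pulled out of text such as "9.0% to 11.0%".
            rates = [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]
            if rates:
                record.discount_rate_low = min(rates)
                record.discount_rate_high = max(rates)
    return record


# Hypothetical tagger output for one passage.
entities = [
    ("METHOD_TYPE", "discounted cash flow analysis"),
    ("DISCOUNT_RATE_TYPE", "weighted average cost of capital"),
    ("DISCOUNT_RATE_RANGE", "9.0% to 11.0%"),
]
record = to_record(entities)
print(record)
```

Because each entity arrives with its label attached, the mapping is a simple lookup per label, and the only text handling left is normalizing a short, already-isolated span into numbers.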

In conclusion, using an entity recognition model on financial forms is superior to using a language model because it offers greater precision, efficiency, customization, and integration capabilities. By leveraging NER, finance professionals can extract valuable insights from unstructured data, improve decision-making, and streamline their workflows.
