Case Study

Corporate Email Content Mining


High volumes of documents pertaining to past, current, and future client and prospect activities arrive daily and must be parsed and annotated before they can be analyzed for patterns and potential action items. Because the documents vary so widely, relevant text strings cannot be identified accurately in a completely automated environment.


Capital Markets

Data Type

Semi-Structured (.pdf, .txt, .html)

Project Duration

2 Months




Harvesting all the useful text strings in each document requires the ability to infer relevance from context, and that judgment relies on a human-augmented process. Our Data Associates reviewed a wide variety of document formats and file types received by our client’s staff from sources around the world. While navigating the different layouts and levels of detail, Associates parsed the relevant information out of thousands of emails per day, categorized each string, and researched key items to add value to the parsed content.


Combining human judgment and context sensitivity with traditional text annotation yielded a customized dataset that allowed our client to draw insight and significant value from seemingly random external documents. Progress tracking gave the client visibility into the volume of documents being processed, while active collaboration ensured that the capture and categorization of data were tightly aligned with the client's analytical objectives.
