Case Study
Cause & Effect Question Set Evaluation for LLMs
Challenge
Identify potential problems (in areas such as causal reasoning and model performance) and their causes in our
client’s existing LLM. We also needed to verify performance and strengthen the cause/effect relationships in
outputs from our client’s LLM by generating benchmark “Cause and Effect” question sets.
Industry
Prompt Engineering
Data Type
Text
Project Duration
2 Months
Ongoing?
Yes
Solution
Our prompt writers began by choosing a word from a client-provided list and using it in a sentence. Working within a set of client parameters, each writer then created two statements, each offering a plausible cause or effect related to the selected word. Finally, the writer chose one of the two statements as the single most accurate response to the client’s “Cause” or “Effect” question, providing a benchmark for future use.
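As a rough illustration of what one writer-produced item might look like as data, here is a minimal sketch. The class name, field names, and the example sentence are all hypothetical; the case study does not describe the client's actual schema or parameters.

```python
from dataclasses import dataclass
from typing import Literal, Tuple


@dataclass(frozen=True)
class CauseEffectItem:
    """One benchmark item. All field names here are hypothetical."""
    word: str                                   # word picked from the client-provided list
    sentence: str                               # writer's sentence that uses the word
    question_type: Literal["cause", "effect"]   # which question the item answers
    candidates: Tuple[str, str]                 # two plausible cause/effect statements
    best_index: int                             # 0 or 1: the writer's single best choice

    def __post_init__(self) -> None:
        # Basic sanity checks (assumed, not taken from the client's parameters).
        if self.best_index not in (0, 1):
            raise ValueError("best_index must be 0 or 1")
        if self.word.lower() not in self.sentence.lower():
            raise ValueError("sentence must use the selected word")

    @property
    def answer(self) -> str:
        """The statement the writer marked as most accurate."""
        return self.candidates[self.best_index]


item = CauseEffectItem(
    word="umbrella",
    sentence="The commuter opened her umbrella.",
    question_type="cause",
    candidates=("It started to rain.", "The train arrived early."),
    best_index=0,
)
print(item.answer)
```

Keeping both candidate statements alongside the chosen one means the rejected statement stays available as a plausible distractor when the item is later used as a benchmark.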
Those responses then went through an evaluation phase based on answering predetermined questions about the relevance of each response’s content. The evaluated responses were then used to teach the model to generate relevant responses in potentially ambiguous situations.
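The evaluation phase could be sketched as a simple checklist filter. The three questions below are invented placeholders, and the "every answer must be yes" rule is an assumption; the case study does not list the actual questions or the pass criterion. The yes/no answers themselves would come from human reviewers, not from code.

```python
from typing import Dict, List, Tuple

# Hypothetical relevance questions; the real list is not given in this case study.
RELEVANCE_QUESTIONS: List[str] = [
    "Does the response relate to the selected word?",
    "Is the response a plausible cause or effect?",
    "Does the response answer the question type that was asked?",
]


def is_relevant(answers: Dict[str, bool]) -> bool:
    """Pass only if every question was answered 'yes' (assumed rule).

    A question with no recorded answer counts as 'no'.
    """
    return all(answers.get(question, False) for question in RELEVANCE_QUESTIONS)


def keep_relevant(reviewed: List[Tuple[str, Dict[str, bool]]]) -> List[str]:
    """Return the responses whose human-entered answers pass the checklist."""
    return [response for response, answers in reviewed if is_relevant(answers)]


# Each entry pairs a response with a reviewer's yes/no answers.
reviewed = [
    ("It started to rain.", {q: True for q in RELEVANCE_QUESTIONS}),
    ("The train arrived early.", {RELEVANCE_QUESTIONS[0]: False}),
]
print(keep_relevant(reviewed))
```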
This type of evaluation requires Human-in-the-Loop (HitL) judgment and expertise, which we supplied through the domain experience and knowledge of our prompt writers.
Outcome
We created hundreds of prompts and validated responses (in the form of answer statements) for the given “Cause” and “Effect” question prompts. The prompt and answer pairs were then exported as a JSON file and fed into a training pipeline for the client’s LLM. Lastly, the client used the outcome statements as human-generated benchmarks for training an LLM to accurately determine “Cause and Effect” relationships.
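The JSON export step might look something like the sketch below. The keys `question_type`, `prompt`, and `answer` are assumptions chosen for illustration; the actual export schema and the training pipeline's input format are not described in this case study.

```python
import json
from typing import Dict, List

# Hypothetical required keys for each exported pair.
REQUIRED_FIELDS = {"question_type", "prompt", "answer"}


def pairs_to_json(pairs: List[Dict[str, str]]) -> str:
    """Validate prompt/answer pairs and serialize them as a JSON array."""
    for index, pair in enumerate(pairs):
        missing = REQUIRED_FIELDS - pair.keys()
        if missing:
            raise ValueError(f"pair {index} is missing fields: {sorted(missing)}")
    # ensure_ascii=False keeps typographic quotes and other non-ASCII text readable.
    return json.dumps(pairs, ensure_ascii=False, indent=2)


pairs = [
    {
        "question_type": "cause",
        "prompt": "The commuter opened her umbrella. What was the cause?",
        "answer": "It started to rain.",
    },
]
payload = pairs_to_json(pairs)
print(payload)
```

The resulting string can be written to disk with, for example, `pathlib.Path("benchmark.json").write_text(payload, encoding="utf-8")`, where the file name is again only a placeholder.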
By using these updated benchmark statements, our client successfully pinpointed potential issues and their underlying causes. As a result, they proactively addressed these issues before any problems could arise.