A warning that “AI-Generated Content May Not Be Accurate” appears at the bottom of the AI-powered chatbot contained within the CRM platform that my company uses. Think about it for a moment: information provided in the platform, by the manufacturer, about the product they designed might not be correct. I’m sure they know the right information about their own product: the last time I called their help desk, I don’t remember the call center agent advising me that the answer he provided was “just a guess”.
So why shake my confidence in their new chatbot with this warning? If the answers it provides are correct, there shouldn’t be any need for a warning. If the answers aren’t correct, then what’s the point of having this feature built into their product? And if the answers are correct only some of the time, how is a user supposed to know which ones to believe?
Imagine that a service tech tells you that the brakes he just installed on your car might not work sometimes: would you accept that advice and pay the bill with a smile? Would you be satisfied with a doctor who points to a sign in the exam room stating that some of her diagnoses might not really make any sense? Yet here’s a SaaS platform where the manufacturer believes that it’s OK if some of its information is wrong (after all, they did take the time to include a nice warning note).
Despite the picture at the top of this post, we haven’t [yet] reached the ‘black box’ stage – like the Surgeon General’s warning on tobacco products, or the boxed warning the FDA mandates for prescription drugs with the potential for serious adverse reactions or special problems, “particularly those that may lead to death or serious injury.”
But that may not be very far off: we’ve all heard about Google’s AI Overviews advising a user to ‘put white glue on pizza’, and falsely claiming that a UC Berkeley geologist recommended “eating at least one small rock per day.” And let’s not forget the three fatal Tesla accidents where the vehicle on Autopilot mistook a semi-trailer for a tunnel and attempted to drive under it. Those all sound like pretty serious adverse reactions.
Of course, we ignore warning labels all the time. Some are so inane that they warrant being ignored: a tag attached to an iron that says ‘Do Not Iron Clothes on Body’; a folding stroller that warns ‘Do Not Fold With Child Inside’; a coffee cup that sagely claims ‘Hot Beverages Are Hot!’; the Scrubbing Bubbles Toilet Brush that advises ‘Do Not Use For Personal Hygiene.’
Those are obviously placed on products to avoid gratuitous product-liability lawsuits – and they could all be replaced with one generic “Don’t Be An Idiot” label. But the implications of using incorrect or biased AI-generated content can be far more dangerous than a rash from using a toilet brush in place of a loofah.
These errors and biases have a relatively simple – but fairly unpopular – origin: the desire for AI-generated content is currently running ahead of the technology. Large Language Models are designed to assemble strings of words based on the probability that one word follows another. But the model can’t always understand context – hence the comical word salads when a prompt about the Falcon fighter jet produces a response about falconry and baby birds. Models that use Retrieval Augmented Generation (RAG) don’t fare much better – Google’s infamous glue-on-pizza gaffe was ultimately traced back to a joker on Reddit who suggested it as a way to prevent cheese from sliding off a greasy slice.
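To make the next-word mechanics concrete, here is a toy sketch in Python. The probability table is invented purely for illustration and stands in for the statistics a real model learns from its training data; it shows how a word-to-word model can land on birds when the user meant the fighter jet.

```python
# Toy sketch of next-token selection: the model picks the statistically likely
# continuation without any notion of which sense of "falcon" the user meant.
# The probability table below is invented for illustration only.
next_word_probs = {
    "falcon": {"feathers": 0.40, "chicks": 0.35, "afterburner": 0.25},
}

def most_likely_next(word: str) -> str:
    """Return the highest-probability continuation, regardless of context."""
    candidates = next_word_probs.get(word, {})
    return max(candidates, key=candidates.get) if candidates else "<unknown>"

# A prompt about the F-16 Fighting Falcon still gets bird vocabulary,
# because the model only sees word-to-word statistics, not user intent.
print(most_likely_next("falcon"))  # -> "feathers"
```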
And perhaps most insidious is the fact that most GenAI tools are designed to always provide an answer, even when they lack the data needed to formulate a correct one. Most people can spot uncertainty when dealing with another person: faltering speech, a lot of ‘probably’ and ‘maybe’ qualifiers, and a whole array of physical ‘tells’. But there isn’t yet a digital equivalent to that most human of signals – the head scratch – nor do UIs include confidence meters that blink or beep when a machine-generated response shouldn’t be trusted.
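As a rough illustration of what such a confidence meter might look like, here is a hedged Python sketch. It assumes the model exposes the probabilities it assigned to its own output tokens (many tools do not), and the threshold value is an arbitrary choice for the example, not an industry standard.

```python
import math

# Hypothetical "confidence meter": given the per-token probabilities a model
# assigned to its own output (values between 0 and 1), flag responses whose
# average log-probability falls below a threshold. The threshold and the idea
# of surfacing this signal to users are illustrative assumptions.
def flag_low_confidence(token_probs: list[float], threshold: float = -1.0) -> bool:
    """Return True when the response deserves a 'don't trust me' signal."""
    avg_logprob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return avg_logprob < threshold

confident = [0.92, 0.88, 0.95, 0.90]   # the model rarely hesitated
uncertain = [0.41, 0.15, 0.33, 0.22]   # the model was guessing throughout
print(flag_low_confidence(confident))  # False -> no warning needed
print(flag_low_confidence(uncertain))  # True  -> the digital head scratch
```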
While it’s tempting to blame the technology, most of these issues can ultimately be traced to insufficient data – insufficient in both quality and quantity. Development tends to focus on algorithms, and all too often data is an afterthought. Model builders use data they’re familiar with, and that is readily accessible, to train their models. And those datasets generally don’t include unstructured data (images, videos, audio recordings, chat logs, scanned documents) containing valuable information that significantly improves the accuracy of AI, GenAI, and ML models.
Unstructured data typically gets excluded from both training and production datasets because it must be labeled, annotated, and reformatted before it can be ingested into one of these advanced technology platforms. That can seem like a daunting task for organizations unfamiliar with the process and lacking the necessary tools and staff, but there are service providers who can do this work quickly and at modest cost relative to the large gains in accuracy it makes possible.
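As a simplified illustration of that labeling and reformatting step, here is a short Python sketch that turns a raw chat-log line into a structured, labeled record. The field names, the ‘billing_dispute’ label, and the JSONL output format are assumptions for the example; a real project would follow the schema of whatever platform it feeds.

```python
import json

# Minimal sketch of labeling/reformatting unstructured data: convert one raw
# chat-log line into a structured record that a training or RAG pipeline could
# ingest. All field names and labels here are illustrative assumptions.
raw_chat_line = "Customer: the invoice total doesn't match what I was quoted"

record = {
    "source": "chat_log",
    "text": raw_chat_line.split(":", 1)[1].strip(),  # drop the speaker prefix
    "speaker": "customer",
    "label": "billing_dispute",  # annotation supplied by a human or a tool
}

# Append the structured record to a JSONL file ready for ingestion.
with open("labeled_chats.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```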
For now – whether you see a warning label or not – if you’re a user of GenAI tools, regard the output with a healthy dose of skepticism, and verify it in proportion to the damage an incorrect piece of information could cause. If you’re an AI/GenAI/ML platform provider, you owe it to your users to include unstructured data in your foundational models, and to counsel your clients to include their own unstructured data in their implementations.
If a CRM provider can “AI wash” their chatbot and opt for a warning message in lieu of accuracy, then I guess I can short my next payment to them – as long as I include a message that says, ‘Payment Amounts May Be Incorrect’.
To discover how our DataInFormation suite of solutions helps you achieve superior performance from your advanced technology investment, contact Joe Bartolotta, CRO, at joseph.bartolotta@liberty-source.com.