When the discussion turns to analytics and data science, do you immediately think of aluminum? If you said no, that’s good — you shouldn’t have.
Until now.
Aluminum has many applications. It comes in a wide range of sizes. It can be flexible or very rigid. It can be endlessly recycled and turned into any number of new objects when the original has served its purpose. When it was first discovered, aluminum didn’t have much practical value. Yet today, it’s an essential part of our lives, and its value has increased accordingly.
Does this description sound familiar? Replace “aluminum” with “data” in the previous paragraph and read it again.
There’s an even more important parallel. In its natural state (an ore called bauxite), aluminum looks like any other rock you might come across and ignore. But without that rock you just kicked aside, we wouldn’t have airliners, spacecraft, fuel-efficient automobiles, or hundreds of other products essential to modern life.
Whatever you call it — perfect camouflage, hiding in plain sight, or going incognito — the issue is that a great deal of value could easily be overlooked.
And now for the startling revelation: The same holds true for the unstructured data in your organization.
You deal with it all the time: images, video, call recordings, scanned documents, chat logs, PDFs, and all those other types of files that aren’t in a neat row-and-column format. By commonly cited industry estimates, roughly 80% of all new data created every day is now of the “unstructured” variety. Yet it is frequently overlooked by data analysts and data scientists because it’s not readily available in an easy-to-use format.
Historically, this kind of data hasn’t been included in analytic data sets or data catalogs. Even vendors of data mesh and data fabric platforms, which claim to make “all” of an organization’s data visible and accessible, inevitably exclude unstructured data. Instead, they leave it at the source or buried at the bottom of a data lake, because they would rather not deal with it.
If that’s the case, then why should anyone worry about their unstructured data?
Traditional structured data is very good for answering “what” questions: What were yesterday’s sales? What is our current customer satisfaction level? What is the average output of Unit #3?
But structured data tends to be backward-looking, and it can’t answer “why” questions: Why were yesterday’s sales 10% over plan? Why has customer satisfaction dropped by 5 percentage points this week? Why is Unit #3 producing at half the level of the other units?
Unstructured data usually represents current, real-time readings: news feeds, call center audio files, sensor outputs. In the examples above, an examination of news feeds showed that a sudden blast of cold weather triggered a spike in coat purchases — driving up yesterday’s sales.
A sentiment analysis of customer service calls uncovered a recurring issue that continues to pull down satisfaction survey results. An analysis of the operating parameters of a specific production machine indicated that it needs maintenance, which explains its lower output.
All of these insights and answers are possible only by including unstructured data in models and analyses. While most organizations will say that they’re already capturing this kind of data — and even storing it in a data mart, data warehouse, data lake, etc. — they’re not actually using it.
Unstructured data is not routinely fed into AI models, and it’s not part of most BI analytics, largely because doing so requires work. Unstructured data must be labeled, annotated, or transcribed before it can be ingested by any advanced technology platform. Unfortunately, most organizations aren’t set up to do that kind of work: There’s no one whose job title is “Data Labeler,” and there’s no one with experience in recruiting and managing data labelers.
So, for those forward-thinking leaders aware of the value locked inside their unstructured data, the solution is to find someone in the organization with “data” in their title. And that’s how data labeling usually ends up being assigned to data engineers.
Which is tragic.
It’s not that they’re unfamiliar with the process or incapable of doing it. But it’s a very expensive way to get the job done, both on a direct cost basis and on an opportunity cost basis (if these highly qualified individuals are annotating data, the jobs they were hired for are likely not getting done).
Moreover, these are resourceful people tasked with doing something they’d rather not do. So they find workarounds (e.g., purchasing someone else’s already-labeled data). Or try to automate the process (using an AI-assisted labeling or transcription platform).
Or, worst of all, shortcut the process (by using a generative AI tool to create synthetic data). But nothing is as powerful and unique as an organization’s own experiential data, accurately annotated and ready to provide a significant boost to the structured data already in an AI or BI tool.
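To make the idea concrete, here is a minimal Python sketch of how annotated call records might sit alongside structured survey scores to answer a “why” question. Everything in it is hypothetical and for illustration only: the `LabeledCall` fields, the week identifiers, the scores, and the issue labels are assumptions, not data or a schema from any real system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LabeledCall:
    """One customer service call after human annotation (hypothetical schema)."""
    call_id: str
    week: str       # ISO week identifier, e.g. "2024-W10"
    sentiment: str  # label applied by an annotator: "positive" or "negative"
    issue: str      # label applied by an annotator; "none" if no complaint


# Hypothetical structured data: weekly satisfaction scores (the "what").
weekly_satisfaction = {"2024-W09": 87, "2024-W10": 82}

# Hypothetical labeled call records (the refined unstructured data).
labeled_calls = [
    LabeledCall("c1", "2024-W09", "positive", "none"),
    LabeledCall("c2", "2024-W10", "negative", "late delivery"),
    LabeledCall("c3", "2024-W10", "negative", "late delivery"),
    LabeledCall("c4", "2024-W10", "negative", "billing error"),
    LabeledCall("c5", "2024-W10", "positive", "none"),
]


def top_negative_issue(calls, week):
    """Return (issue, count) for the most frequent negative issue in a week, or None."""
    issues = Counter(
        c.issue for c in calls if c.week == week and c.sentiment == "negative"
    )
    if not issues:
        return None
    return issues.most_common(1)[0]


# The structured data says what happened; the labels suggest why.
drop = weekly_satisfaction["2024-W09"] - weekly_satisfaction["2024-W10"]
issue, count = top_negative_issue(labeled_calls, "2024-W10")
print(f"Satisfaction fell {drop} points; top complaint: {issue} ({count} calls)")
```

The point of the sketch is only that the labels are what make the join possible: without the `sentiment` and `issue` annotations, the call recordings could not be counted, grouped, or lined up against the weekly scores at all.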
If labeling that data isn’t something you can do, a data services provider can do it for you (just be sure to find out where the work will actually be done, and whether or not the firm has specific domain expertise for your business vertical).
What does all this have to do with aluminum? Actually, quite a bit.
Most people would pass by a chunk of bauxite without a second glance. It doesn’t look anything like aluminum; it’s nothing they’ve ever used or needed before, so it’s assumed to have very little value. But refine it into an ingot of pure aluminum, and the possible uses quickly present themselves.
All the scanned image files, thousands of recorded customer service calls, chat logs, and enormous geospatial surveys are exactly the same kind of uninteresting ore — until those unstructured files are labeled or annotated. And then, the use cases suddenly appear.
Don’t overlook your unstructured data. Recognize its true potential and refine it into valuable ingots. You’ll outshine your competition.
To discover how Liberty Source, through its DataInFormation℠ suite of solutions, helps you achieve superior performance from your advanced technology investment, contact Joe Bartolotta, CRO, at joseph.bartolotta@liberty-source.com.