2024 XponentL Data and AI Outlook

December 31, 2023

As we reflect on our first six months helping clients with their Data and AI challenges, we are excited by the progress and the opportunities ahead. We are privileged to help them address those challenges through innovation in technology, ways of working, and ways of thinking. So, like many others willing to go out on a limb, why not join in the fun and offer our own predictions for 2024? Here are a few of our thoughts on what may happen:

Large, centralized AI initiatives will deliver the necessary AI/ML platform infrastructure and tooling but will fall short of delivering tangible enterprise-level business value. This will lead to some frustration and disillusionment with centrally funded efforts and start an evolution toward a federated, domain-specific AI operating model.
Smaller, industry- or function-specific large language models (LLMs) will quietly gain traction and come to dominate the enterprise. While the large players and their foundation models will capture all the publicity, organizations will look for GenAI models and partners that have fine-tuned models for their specific domains and business problems.
These smaller, industry-focused LLMs will give rise to a Model Proliferation Quagmire. Organizations will start to struggle with the "too many models" problem, which will drive differences in outcomes, erode internal confidence in model output, and create factions within organizations, each believing its output is correct. Case in point: the Hugging Face Hub already lists over 450,000 models, most of which are undocumented and unusable.
While the industry pivots to Retrieval-Augmented Generation (RAG) and Knowledge Graphs to minimize (but not eliminate) hallucinations when inferring over a knowledge base, there will still be a need to define confidence in the generated output, especially in industries such as Life Sciences, Healthcare, Finance, and Legal. Look out for the emergence of confidence measures or indexes accompanying the outputs, followed by confusion about what those metrics actually mean as the field matures.
Organizations will realize that GenAI can help them unlock insights currently hidden within unstructured data and will rush headfirst into it (in fact, they already are). Those that go in without a strong GenAI strategy and governance approach will find that the benefits of simply pointing an LLM at their document repositories fade once the honeymoon stage is over, and will realize how important data quality and curation are to fully unlocking the power of their data.
Data leakage within LLMs will become something that organizations need to take seriously. In the last 24 months, organizations have developed Responsible AI teams and policies. In 2024, we must drive a systematic approach to continuously monitoring for leakage. Check out our partner DynamoFL, which has the most comprehensive suite of solutions we've seen in the market!
It's all about the data and its quality. Somehow, despite the development of computer programming and decades of increasing compute capability and advanced modeling, we are still learning the concept of GIGO (garbage in, garbage out). The phrase was coined in the late 1950s and is attributed to IBM programmer George Fuechsel. Have we not yet realized that feeding a program or model garbage produces garbage? A strong data strategy, including well-documented lineage and provenance, is necessary to overcome this issue, but we've seen organizations struggle to sustain the interest and focus required to truly solve the problem. With the turnover in the CDAO role, we question when boardroom leaders will truly understand that you must fix your foundation before you can build a grand structure.
Data Products will overtake data mesh in the race for mindshare among data concepts. Organizations will understand that it's not just about packaging data attractively; they must ensure that their data and data products are FAIR (Findable, Accessible, Interoperable, and Reusable).
Organizations will finally start to learn that the results of data analysis must include a detailed description (or a link to one) documenting the lineage and provenance of any findings: not only how they were obtained but also the versions of the data and models used to generate them. This is the "Fine Print" we all get when seeing an ad or buying a product; results need it too. We hope the days of the single graph in a PowerPoint deck without accompanying evidence are coming to an end. The analysis process itself, and the version of the data used, should be managed and documented as rigorously as the SDLC is on the software engineering side. It is not only about MLOps and DataOps; get ready for the rise of AnalyticsOps and the Results and Insights Life Cycle (RILC).
Data teams will start looking more and more like software teams, with true owners, managers, defined SLAs, micro releases, and so on. This somewhat new way of thinking for data teams will start to drive a divide between those who can and those who cannot. It will drive out engineers who cannot articulate their features or collaborate across teams, and make way for engineers who understand the value of the data within a specific domain (context engineering).
The use of Synthetic Data will continue to grow, not only to save money but also to work around some data usage, rights, and regulatory constraints. The use of Digital Twins (or Patients) will continue to grow as more empirical evidence is gathered to confirm their utility, but this must be done carefully to avoid feedback loops that pollute the data sets and analyses.
The launch of Google's Gemini marks a significant step forward for Multimodal Generative AI in 2024, promising substantial benefits, particularly in the healthcare sector. With their ability to interpret and integrate diverse data types such as text, images, and audio, these models are set to revolutionize the way data is processed and used, enhancing patient care and medical research.
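To make the "Fine Print" idea for analysis results a little more concrete, here is a minimal Python sketch of a provenance record that travels with a single finding, capturing how it was obtained and which data and model versions produced it. The `FinePrint` class, its field names, the example file name, and the use of a truncated SHA-256 digest as a data version are all illustrative assumptions, not an XponentL tool or an established standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def fingerprint(payload: bytes) -> str:
    """Return a short SHA-256 digest that identifies an exact data snapshot."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class FinePrint:
    """Hypothetical provenance record attached to one analysis result."""
    finding: str        # the claim being reported
    method: str         # how the result was obtained
    data_source: str    # where the input data came from (illustrative name)
    data_version: str   # fingerprint of the exact input snapshot
    model_name: str
    model_version: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be linked from a slide or report."""
        return json.dumps(asdict(self), indent=2)


# Toy input; any change to these bytes yields a different data_version.
raw = b"patient_id,outcome\n1,0\n2,1\n"
note = FinePrint(
    finding="Outcome rate is 50% in the sample cohort",
    method="mean(outcome) over all rows",
    data_source="cohort_extract.csv",
    data_version=fingerprint(raw),
    model_name="none (descriptive statistic)",
    model_version="n/a",
)
print(note.to_json())
```

Hashing the input bytes means that two findings citing the same `data_version` were computed from identical data, while any edit to the underlying file changes the fingerprint. In practice, teams would likely pull these values from their existing data versioning and model registry tooling rather than fill them in by hand.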

All in all, while we are optimistic about the future of Data & AI, we are skeptical that business leaders will understand and invest in the right areas to enable and sustain value for the future.

It is and will always be about the data!  Here’s to an XponentLy great 2024 together!

Matt Arellano – CDO, XponentL Data

Art Morales, Ph.D. – AI Lead, XponentL Data

Elkida Bazaj – GenAI Lab Lead, XponentL Data