A Newcomer's Journey Through The Evolving World Of Data & AI
March 25, 2024
Brenden Reeves
As a soon-to-be university graduate entering today’s data and AI landscape, I find myself at the intersection of excitement and innovation. This isn't just an evolution; it is a complete transformation. Every day I wake up to a new tool, technology, or methodology that is revolutionizing data and AI in some way. A paradigm shift is taking place around how we view and utilize data. A major change in the right direction is great, but without firsthand experience of the past, I struggle to relate to the scary stories of disparate data that serve no purpose. This blog reflects on my journey of making sense of both the past and present world of data.
The Problem
I constantly hear horror stories about siloed data that, instead of acting as winds of innovation pushing an organization towards new horizons, acts as a series of small, insidious leaks below the waterline that might slow and eventually sink the whole ship. At the same time, I am also surrounded by people with hope and confidence that, instead of sinking ships and being thought of as a byproduct, data can be the leading force behind advancements in all domains and industries. It is clear to me that data was not utilized effectively in the past. While I use and generate data every day, I never had to stop and think about the backend systems and architectures through which the data was traveling. To me, this resembles cleaning up your room by shoving everything in your closet. While it may work for a while, eventually the clutter builds to a level that demands your attention. It seems to me like I joined right at this tipping point. I arrived too late to understand firsthand the systems that created the mess, but just in time to help develop the tools that will drive real change. Luckily, I work with people who have been deep in the weeds of data, and it is because they know the past that they can pave the way into the future. So far, this has given me space to contribute to the forward movement while also being able to rely upon their learned experience as I attempt to understand where we came from.
How Did I Jump In?
At a startup, there isn't much time for hand-holding. Although I admit that I required, and still do require, some hand-holding from time to time due to my experience level, I knew the most important thing I could do was dive right in. On day one, I started hearing terms like ‘Data Mesh’, ‘Data Product’, ‘Data-Centric’, and ‘Data Domain’. It seemed like everything I saw and heard was data, data, data. Turns out, that was just what I needed. Although it was like trying to hike a mountain at night with sunglasses on, I knew that if I just kept going, I would figure it out. Fortunately, I learn well through repetition, so seeing and hearing the same terms and definitions repeatedly allowed me to build a crude mental map to traverse as I further explored this initially confusing landscape. Over time, I have iteratively developed this mental map into something that I can use to understand and explain new terms and technologies more easily than when I started. If you had asked me to explain what a Data Product was nine months ago, I would not have been able to give you a concrete answer. Now, I am helping provide clients with a system designed for producers and consumers to manage data sharing through Data Products, Data Assets, and everything associated with them across an organization. While it might not work for everyone, I chose to learn how to swim by simply jumping in and figuring it out as I went.
My Take On The Current Landscape
Everything is changing quickly. While it took some time to find stable footing, there is no other time I would rather have joined the world of data and AI. Advancements are happening so quickly that techniques used last month may no longer be state of the art. The tools to change the way we view and utilize data are now available to everyone. The old saying “the early bird gets the worm” has shown its true value this past year. For one, the field of Natural Language Processing has advanced by leaps and bounds over the past 12 months, and this pace is not expected to slow any time soon. We can now talk with our proprietary data through chatbots backed by retrieval-augmented generation (RAG). Through AI, we can gain valuable insights into our data at a level that was not possible before. Today, data is not just forgotten in some folder deep in the archives of a hard drive lost years ago; it is fueling the fire of innovation. As Andrew Ng said, “AI is the new electricity,” and data is the fuel that generates it. Who wouldn’t want to be a part of this change? An additional bonus of being at a startup is that we have the flexibility to utilize new tools from day one without waiting on the approval systems large companies rely on. This means we can better understand current and emerging innovations, since we can try each new technology as it crosses our path. This sets us apart because instead of just reading about how these new tools are being used, we are the ones using them.
What Do I See For Data And AI?
Unfortunately, this is where my map starts to fog back up. The world does not even look the same as it did nine months ago when I joined XponentL. While this is amazing, I have spent so long trying to untangle the past in hopes of understanding the present that I haven't had much time to think about what is next. However, this has been gnawing at me a little more each day. While I can't say for certain where we are going, I know for sure that we will never go back. Now that the world is starting to understand the true power of data, the pace of innovation will not slow down.
Master Term Management (MTM)
As a final note, it seems to me that alongside focusing on Master Data Management (MDM) solutions within organizations, we also need to focus on golden record creation for the hundreds of different definitions associated with each term surrounding data. While this flood of competing definitions only slightly slowed down my initial understanding, it did cause me to incorrectly assume I had understood some terms and topics without grasping their true meaning. Eventually, through sheer volume and a few questions to colleagues who could explain these concepts to me in their sleep, I was able to rule out misleading definitions and find golden records of my own. Because of this process, I believe there should be a greater focus on ontology and a known, centralized store of terms with their agreed-upon definitions. With a tool like this, the community would not be forced to spend so much time sorting through the fakes. If I had a penny for every slightly different version of ‘Data Mesh’ I have seen, I would have at least a couple of pennies. It isn’t much, but it’s honest work.