Evolving The Data Landscape
January 8, 2024
Art Morales, Ph.D.
From Catalogs to Marketplaces: Reflecting the Rise of Data Products in a Modern Data Platform
Data in the 21st century has evolved beyond being a mere byproduct of business operations and has become a cornerstone of strategic decision-making. This evolution is exemplified by the transition from traditional tools such as queries, BI reports, dashboards, and data catalogs to more modern, dynamic marketplaces. The shift has been propelled by the emergence and integration of data products, which are fundamentally changing how we manage data in support of GenAI and how we answer questions, for example by asking them in natural language.
Initially, data catalogs served as simple repositories of metadata that helped organizations manage and locate their data assets. However, as the volume and intricacy of data increased, these catalogs started to evolve into something more sophisticated: the data marketplace. These marketplaces aren’t just repositories; they’re ecosystems that facilitate the entire lifecycle of data products, from inception and development to deployment and retirement. Notably, some modern marketplaces can also handle provisioning and access for data products, as well as analytics on their usage and ROI.
Data products, in this new paradigm, are more than data sets. They are data assets (potentially multiple assets combined) augmented with rich business metadata, processes, and assigned maturity levels, providing contextual intelligence and serving a multitude of use cases across the organization. The shift from static catalogs to dynamic data marketplaces must, therefore, mirror and properly handle the transition from basic data assets to multifaceted data products. To effectively manage these data products, marketplaces must track a broad spectrum of metadata. This includes not only technical metadata like data type, source, and schema but also fields that reflect the business value and utility of the data. Essential metadata fields in this metamodel include:
Data Lineage: Traces the origin and evolution of data, enhancing trust and transparency.
Data Owner: Identifies the steward responsible for the data product.
Compliance Status: Indicates adherence to regulatory and internal standards.
Use Cases: Examples of how the product can be used.
Usage Rights: GDPR, HIPAA, Copyright, Consent, etc.
Update Frequency: Provides information on the timeliness and currency of data.
Usage Statistics: Highlights how often and in what ways data are utilized.
Data Quality: Metrics that assess the accuracy, completeness, and reliability of the data. Note: these can vary based on the product and context, so it’s important to track both general metrics and metrics specific to each product.
Access Controls: Details regarding who can view or modify the data.
Integration Capabilities: Information on how the data product can be integrated with other systems.
User Ratings and Reviews: Offer insights into the usability and relevance of data products.
… and many more, covering various aspects of the data product’s lifecycle and utility.
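To make the metamodel above a little more concrete, here is a minimal sketch of how a few of these fields might be represented in code. The `DataProduct` class, its field names, and the `missing_metadata` helper are hypothetical and chosen only for illustration; they do not reflect the schema of any particular marketplace product.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataProduct:
    """Hypothetical metamodel record for a single data product."""
    name: str
    owner: str = ""                                    # Data Owner
    assets: list[str] = field(default_factory=list)    # underlying data assets
    lineage: list[str] = field(default_factory=list)   # Data Lineage (upstream sources)
    compliance_status: str = ""                        # Compliance Status
    use_cases: list[str] = field(default_factory=list)
    usage_rights: list[str] = field(default_factory=list)   # e.g. "GDPR", "HIPAA"
    update_frequency: str = ""                         # e.g. "daily"
    quality_metrics: dict[str, float] = field(default_factory=dict)
    access_roles: list[str] = field(default_factory=list)   # Access Controls
    ratings: list[int] = field(default_factory=list)   # User Ratings

    def average_rating(self) -> Optional[float]:
        """Mean user rating, or None when nobody has rated the product yet."""
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)

    def missing_metadata(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


product = DataProduct(
    name="customer_360",
    owner="jane.doe",
    assets=["crm.customers", "billing.invoices"],
    usage_rights=["GDPR"],
    ratings=[5, 4, 3],
)
print(product.average_rating())
print(product.missing_metadata())
```

A completeness check such as `missing_metadata` is one simple way a marketplace could flag products that are not yet described well enough to publish.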
Some of this metadata must be entered manually, but some can and should be automatically synchronized from existing systems. One of the key requirements for a successful marketplace is that metadata must reflect the true state of the product. Thus, keeping metadata in sync is key for the longevity and success of a marketplace.
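As a rough illustration of keeping metadata in sync, the sketch below compares the metadata held in the marketplace against what a source system reports and overwrites only the fields designated as automatically synchronized. The function names, the dictionary layout, and the field names are assumptions made for this example; a real integration would depend on the APIs of the marketplace and the source systems involved.

```python
def find_metadata_drift(marketplace: dict, source: dict, synced_fields: list) -> dict:
    """Return {field: (marketplace_value, source_value)} for synced fields that differ.

    Fields the source system does not report are skipped, so manually
    curated values are never flagged just because the source is silent.
    """
    drift = {}
    for name in synced_fields:
        if name not in source:
            continue
        if marketplace.get(name) != source[name]:
            drift[name] = (marketplace.get(name), source[name])
    return drift


def apply_sync(marketplace: dict, source: dict, synced_fields: list) -> dict:
    """Return a copy of the marketplace metadata with drifted fields refreshed."""
    updated = dict(marketplace)
    for name, (_, source_value) in find_metadata_drift(
        marketplace, source, synced_fields
    ).items():
        updated[name] = source_value
    return updated


# Hypothetical example: "description" is entered manually and is never touched.
marketplace_record = {
    "owner": "jane.doe",
    "update_frequency": "daily",
    "description": "Curated customer view",
}
source_record = {"owner": "jane.doe", "update_frequency": "hourly", "schema_version": "3"}
auto_synced = ["owner", "update_frequency", "schema_version"]

print(find_metadata_drift(marketplace_record, source_record, auto_synced))
print(apply_sync(marketplace_record, source_record, auto_synced))
```

Separating detection from application makes it possible to report drift for review before anything is overwritten.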
Speaking of metadata, building a standard metamodel to describe data products does more than assemble the information that makes products findable; it also lets us use that metadata to foster better consumption of the products. This is an area where we are excited by the advances in thinking from vendors such as Data.World, who are using a graph model to power their marketplace.
Graph models represent a revolutionary approach to managing this intricate web of metadata. By representing metadata as an interconnected network, these models enable AI-based algorithms to traverse and analyze the network, enhancing findability and generating insights. This graph-based approach is pivotal for realizing the full potential of AI in data marketplaces. By feeding the graph metamodel and the underlying product data graphs/schemas to GenAI tools, data consumers can fully explore the products and the underlying data to maximize the value of an organization’s data, encouraging reuse and lowering the time between question and answer.
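The idea of traversing a metadata graph to improve findability can be sketched with a small in-memory example. This is not how any vendor's graph engine works; `MetadataGraph`, the node naming scheme, and the relation labels are all hypothetical, and a plain breadth-first search stands in for the more capable algorithms a real marketplace would use.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Tiny undirected graph of products, tables, teams, and other metadata nodes."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, a: str, relation: str, b: str) -> None:
        # Store the edge in both directions so discovery can move either way.
        self._edges[a].add((relation, b))
        self._edges[b].add((relation, a))

    def related(self, start: str, max_hops: int = 2) -> dict:
        """Breadth-first search: map each node within max_hops to its hop distance."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _, neighbor in self._edges[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        del distance[start]
        return distance


graph = MetadataGraph()
graph.add_edge("product:customer_360", "uses", "table:crm.customers")
graph.add_edge("product:churn_scores", "uses", "table:crm.customers")
graph.add_edge("product:churn_scores", "owned_by", "team:analytics")
graph.add_edge("product:sales_forecast", "owned_by", "team:analytics")

# Products that share a source table with customer_360 show up two hops away.
print(graph.related("product:customer_360", max_hops=2))
```

Because relationships such as shared tables or shared owners are explicit edges, a consumer looking at one product can be pointed to neighboring products that a keyword search alone might miss.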
Another vital role of these evolving marketplaces is in monitoring the return on investment (ROI) of data products. Providing centralized visibility into usage patterns and utility, they enable organizations to make informed decisions about which data products to invest in, develop, or phase out.
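One very simple proxy for this kind of visibility is cost per access: how much a product costs to maintain relative to how often it is used. The sketch below assumes a hypothetical list of access events and a hypothetical monthly cost per product. Real ROI tracking would also need a measure of the value delivered, which is not modeled here.

```python
from collections import Counter


def access_counts(events: list) -> Counter:
    """Count access events per product; each event is a dict with a 'product' key."""
    return Counter(event["product"] for event in events)


def cost_per_access(monthly_cost: dict, events: list) -> dict:
    """Monthly cost divided by access count, or None for products nobody accessed."""
    counts = access_counts(events)
    return {
        product: (cost / counts[product] if counts[product] else None)
        for product, cost in monthly_cost.items()
    }


# Hypothetical figures for illustration only.
events = [
    {"product": "customer_360", "user": "ana"},
    {"product": "customer_360", "user": "raj"},
    {"product": "customer_360", "user": "ana"},
    {"product": "customer_360", "user": "li"},
]
monthly_cost = {"customer_360": 100.0, "legacy_extract": 50.0}

print(cost_per_access(monthly_cost, events))
```

Products that return `None` carry cost without any recorded usage, which makes them natural candidates for a review of whether to keep investing in them or phase them out.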
As data product thinking evolves in the industry, understanding the distinction between a data asset and a data product is crucial. A data asset is a raw piece of information, a building block. In contrast, a data product is an enhanced version of this asset, enriched with key metadata and bound by a structured lifecycle process, making it a comprehensive solution rather than just a piece of data. This added layer transforms the asset from a simple catalog entry into a strategic business tool. We’re excited by the advances of products such as AWS DataZone, Data.World, Collibra, and Atlan as they evolve to manage data products in addition to assets.
As data marketplaces mature, they transcend the role of mere catalogs and become hubs for innovation, collaboration, and strategic decision-making. AWS DataZone’s ability to provision access to the underlying data as a foundational feature, together with its integration with the rest of the AWS ecosystem (and other platforms), is expected to be a game-changer. It has the potential to become the main way to provision access to products in the AWS ecosystem, regardless of which data marketplace is implemented.
Another key feature of these marketplaces is their ability to deliver a user experience that offers technical details to those who need and want them, while abstracting those details so that the products remain accessible to less technical but still data-savvy users. It is imperative that these tools adapt to the different user personas (Consumer, Producer, Admin, etc.) so that they can provide the level of detail and experience that encourages users to rely on the marketplace for their data needs, as opposed to bypassing it and trying to go directly to the sources. This has the additional advantage of helping centralize usage monitoring and management to better track priorities and ROI.
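One way to picture persona-aware presentation is a mapping from persona to the metadata fields that persona sees. The mapping below, the field names, and the `view_for` helper are hypothetical; which fields belong to which persona is a design decision each organization would make for itself.

```python
# Hypothetical mapping of persona to the metadata fields shown to that persona.
PERSONA_FIELDS = {
    "consumer": ["name", "description", "use_cases", "average_rating"],
    "producer": ["name", "description", "schema", "lineage", "quality_metrics"],
    "admin": ["name", "owner", "access_controls", "compliance_status", "usage_statistics"],
}


def view_for(persona: str, metadata: dict) -> dict:
    """Return only the metadata fields relevant to the given persona."""
    key = persona.lower()
    if key not in PERSONA_FIELDS:
        raise ValueError(f"unknown persona: {persona!r}")
    return {name: metadata[name] for name in PERSONA_FIELDS[key] if name in metadata}


metadata = {
    "name": "customer_360",
    "description": "Unified view of each customer",
    "use_cases": ["churn analysis"],
    "schema": {"customer_id": "string"},
    "owner": "jane.doe",
    "compliance_status": "compliant",
}

print(view_for("Consumer", metadata))
print(view_for("Admin", metadata))
```

The same underlying record serves every persona; only the projection changes, so the technical detail stays available without being forced on users who do not need it.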
The addition of GenAI capabilities to describe and annotate data sources is exciting. Both DataZone and Data.World are showing promising results with this approach, and we look forward to more AI-based capabilities from these tools to bridge the gap between technical producers and use-case driven data consumers.
Data Product Marketplaces are the next step in the modernization and evolution of data-first industries. They not only manage data but also cultivate it, turning raw data into valuable products that drive business growth and innovation. This evolution is not merely a technological advancement but a fundamental shift in how data is perceived, managed, and utilized in the contemporary business landscape. Get in touch if you want to learn more and discuss how data marketplaces can help your organization achieve its data strategy goals.