Exploring Data Products

June 7, 2024

Granit Berisha

In my previous article, "From DevOps to DataOps: My Journey into the New Frontier of Data Engineering," I shared how I moved from a focus on DevOps practices to embracing DataOps. One of the first core concepts I had to grasp on that journey was the data product. Understanding data products requires a shift in mindset: data stops being a by-product or a static piece of information and becomes an asset with an owner, a lifecycle, and a set of key features. Data products turn data into a tangible asset that drives business outcomes and success. As Pascal Hultz puts it, "Data has to be a living, breathing organism that should be managed as a product!"

What are Data Products?

Data products are managed data packages designed to answer specific business questions with minimal friction. They are purpose-built for reuse and interoperability, combining data with business context to deliver tangible value. What distinguishes them from traditional data sets and one-off data outputs is that they are consumption-ready, self-identifiable, continuously maintained, and clearly owned. As a result, they are immediately actionable, user-centric, and able to answer the same questions repeatably in support of decision-making, which is why they have become integral to modern business ecosystems. Let us explore the five key characteristics that define a data product:

Purpose-Built:
Data products are created with specific use cases in mind. They address clear user or machine needs by bringing together data assets and business context to deliver significant business value. This purpose-driven design ensures that each data product is relevant and capable of solving real-world problems, promoting reuse and interoperability within and across enterprises.
Consumption Ready:
These products are more than just data; they are integrated solutions that are accessible and usable from day one. They prioritize user experience, presenting data in a way that minimizes the need for additional processing or interpretation. This readiness reduces friction and makes working with the data smoother, which increases user engagement.
Self-Identifiable:
Each data product carries metadata (we call this the product metamodel) and key descriptors that outline its unique features and define its utility. This self-identification makes data products easy to find, access, and tell apart in a crowded marketplace. Because each product clearly states what it is and what it is designed to do, users can quickly understand its use and integrate it into their workflows.
Continuous Lifecycle:
Unlike static datasets, data products are dynamic. They are developed iteratively, so they can grow and adapt over time to meet the evolving needs of users. This continuous lifecycle involves regular updates and improvements, incorporating user feedback and adapting to changing market conditions, so that data products remain relevant and keep providing value long after their initial release.
Ownership:
Ownership in the context of data products implies accountability for their success throughout their lifecycle. Each product has a designated owner who ensures that it meets agreed standards of quality and usefulness. These owners, often supported by a team of stewards, manage the product's lifecycle, champion its integration into business processes, and uphold the governance standards necessary for its sustained success.

Viewing data products in this light, as strategic, managed, and continuously improved assets, can transform how organizations approach data. By applying these five core principles to our data product decisions, we ensure that data products are trusted, serve as a source of truth, and are continuously updated and maintained. This perspective is not just a technical requirement but a business imperative that can lead to significant competitive advantages.
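To make the five characteristics more concrete, here is a minimal sketch of what a single entry in a product metamodel might record. The class name `DataProductDescriptor`, its fields, and the rule that a product is publishable only once its name, purpose, output format, and owner are filled in are hypothetical assumptions for illustration, not the actual XponentL metamodel.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataProductDescriptor:
    """Hypothetical metamodel entry; field names are illustrative assumptions."""
    name: str
    purpose: str          # Purpose-built: the business question it answers
    output_format: str    # Consumption-ready: how consumers read it
    owner: str            # Ownership: the accountable product owner
    version: str = "0.1.0"  # Continuous lifecycle: releases evolve over time
    stewards: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)  # Self-identifiable: discovery keywords
    last_updated: date = field(default_factory=date.today)

    def missing_fields(self) -> list[str]:
        """Return the required descriptors that are still empty."""
        required = {
            "name": self.name,
            "purpose": self.purpose,
            "output_format": self.output_format,
            "owner": self.owner,
        }
        return [key for key, value in required.items() if not value.strip()]

    def is_publishable(self) -> bool:
        """Assumed rule: list a product only when every required descriptor is set."""
        return not self.missing_fields()


product = DataProductDescriptor(
    name="customer_interactions",
    purpose="Which support channels resolve customer issues fastest?",
    output_format="parquet table",
    owner="",  # no owner assigned yet
    tags=["customer", "support"],
)
print(product.missing_fields())  # ['owner']
print(product.is_publishable())  # False
```

In this sketch a product without an owner cannot be published, which mirrors the idea that ownership is a defining characteristic rather than an optional attribute.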

The Data Product Framework

The XponentL Data Product Framework provides a structure for enterprises to create and manage their data assets systematically. It combines a set of capabilities, disciplines, and controls, and adopting it changes an organization's operating model and ownership structure. It guides how to structure, build, and sustain data products that are not only operational but also scalable and aligned with business needs. Here are the key components of a good Data Product Framework:

Product Architecture and Lifecycle
Imagine the data product lifecycle as a manufacturing process in a factory. The data enters like raw materials on a conveyor belt. As it moves through the factory, different teams add features. Just as each machine on the shop floor plays a crucial role in manufacturing, every step in our data process is geared towards refining the data, enhancing its quality, and fitting it for specific business purposes. That's why, before writing any code, we must first identify the product architecture as well as the purpose and value of the product!
Continuous Improvement and Marketplace Importance
We take a continuous approach to developing and refining data products. Feedback collected during and after deployment informs future releases, ensuring that the products not only meet current needs but can also adapt to future requirements. The marketplace for data products is vital: it serves as the showroom where finished products are displayed, helping users understand how to use them effectively.
Platform Governance
In the realm of data products, governance is critical. Whether a single platform or multiple platforms serve different segments, governance ensures uniformity in the services provided. This uniformity is essential for creating a common metamodel across platforms, which simplifies the process for data producers and users alike. Governance is typically viewed as a static component, a set framework that dictates procedures and standards. We challenge this view by treating governance as a dynamic element integral to our operational ecosystem. Rather than thinking of governance as merely the schematics of a factory, envision it as the systems, processes, and telemetry that keep operations efficient and transparent. This active oversight provides continuous insight into process efficiency and product quality, fostering an environment of constant improvement and adaptation.
Integration, Efficiency, and Domain Stewardship
Efficient management of data products requires good intake and demand management to avoid redundancy and allocate resources effectively. For instance, if a data product that serves a similar need already exists, it is more efficient to enhance it than to create a duplicate. Domain stewards support this process by guiding development and ensuring that data products align with broader business goals and the data strategy.
Agile Teams and Persistent Product Teams
The shift from project-based teams to persistent product teams is a crucial evolution in data product development. Unlike project teams that disband after delivering a project, product teams are enduring. They carry the knowledge and ownership of the data products, continually working on improving them and even participating in marketing these products to maximize their impact across the organization.
Enabling Services and Capabilities
Finally, the framework must support data producers with enabling services that make it easier, faster, and cheaper to create high-quality data products. These services include data acquisition, ingestion processes, and frameworks for ensuring data quality and metadata standards. By simplifying these processes, we empower producers to focus more on innovation and less on the complexities of data handling.
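The intake and demand management step described above (enhance an existing product rather than build a duplicate) can be sketched as a simple triage function. This is a hypothetical illustration: the catalog structure, the `triage_request` name, the word-overlap (Jaccard) similarity, and the 0.5 threshold are all assumptions. A real intake process would draw on the product metamodel and on domain stewards' judgment rather than on keyword overlap alone.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens used for a rough similarity comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two descriptions have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def triage_request(request: str, catalog: dict[str, str], threshold: float = 0.5) -> str:
    """Suggest enhancing an existing product when a request overlaps its purpose.

    catalog maps product name -> purpose statement (hypothetical structure).
    The threshold is an assumed cut-off, not a recommended value.
    """
    request_tokens = tokens(request)
    best_name, best_score = None, 0.0
    for name, purpose in catalog.items():
        score = jaccard(request_tokens, tokens(purpose))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return f"enhance:{best_name}"
    return "build:new"


catalog = {
    "customer_interactions": "customer support interactions by channel and resolution time",
    "genomics_test_results": "genomics test results per patient sample",
}
print(triage_request("customer support interactions by channel", catalog))
# enhance:customer_interactions
print(triage_request("patient outcomes for clinical study protocols", catalog))
# build:new
```

The point of the sketch is the decision it encodes: a new request is first compared against what already exists, and only a request with no close match leads to a new product.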

Lifecycle of Data Products

Understanding the Data Product Lifecycle (DPLC) is essential for the long-term success of these products. From the initial strategy and planning phase to the stages of ideation, design, and production, each phase is integral. The lifecycle does not end with deployment; it extends into growth, maturity, and operational management, ensuring that the data product continuously evolves to meet changing needs. This lifecycle, much like a manufacturing process, involves several key stages:

Strategy & Planning
At this initial stage, we strategically plan data products by identifying and refining the vision based on current and future business needs. It’s about forecasting what data products will be needed and ensuring there’s a clear alignment with business goals. This planning isn't just reactive to immediate needs; it involves foresight into how data can drive long-term value.

Ideation & Product Design
During the ideation phase, we conceptualize new data products or iterate on existing ones. This involves aligning the product’s design with identified business requirements, ensuring that the product will serve a broad and strategic purpose. It is about turning abstract ideas into concrete plans that have practical utility.

Product Engineering & Move to Production
The engineering phase involves detailed design and the development of prototypes, which are tested against business requirements. This stage validates the concepts and refines them based on feedback. Once they are validated, we build a Minimum Viable Data Product (MVDP), which contains just enough features to function effectively and meet essential needs.

Growth, Maturity & Operations
After deployment, the focus shifts to monitoring and continuous improvement. This involves analyzing how the product is used and gathering feedback to enhance its functionality and performance. Change control mechanisms are crucial here to integrate new features and improvements seamlessly. Operations also focus on the infrastructure needed to support the data product effectively.

Monitoring, Improvement, and Continuous Evolution
Continuous monitoring of performance and user satisfaction (CSAT scores) helps determine if the data product is meeting its intended goals. If not, it may require redesign, enhancements, or even retirement if it no longer serves a valuable purpose. This stage is about maintaining the quality and relevance of the product throughout its lifecycle.

The lifecycle is not linear but cyclical—continually returning to ideation and re-engineering based on ongoing feedback and changing market conditions. This ensures that data products remain robust, relevant, and aligned with the organization's strategic objectives.
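The stages above, including the loop back to ideation, can be modeled as a small state machine. In this sketch the stage names come from the lifecycle described above, while the transition table, the extra `RETIRED` state, the 1 to 5 CSAT scale, and the 4.0 target are assumptions chosen for illustration.

```python
from enum import Enum


class Stage(Enum):
    STRATEGY = "Strategy & Planning"
    IDEATION = "Ideation & Product Design"
    ENGINEERING = "Product Engineering & Move to Production"
    OPERATIONS = "Growth, Maturity & Operations"
    MONITORING = "Monitoring, Improvement, and Continuous Evolution"
    RETIRED = "Retired"  # assumed terminal state for products that no longer add value


# Assumed transition table; MONITORING looping back to IDEATION makes the cycle.
TRANSITIONS = {
    Stage.STRATEGY: {Stage.IDEATION},
    Stage.IDEATION: {Stage.ENGINEERING},
    Stage.ENGINEERING: {Stage.OPERATIONS},
    Stage.OPERATIONS: {Stage.MONITORING},
    Stage.MONITORING: {Stage.OPERATIONS, Stage.IDEATION, Stage.RETIRED},
    Stage.RETIRED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target, rejecting transitions the lifecycle does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def review(csat: float, still_valuable: bool, csat_target: float = 4.0) -> Stage:
    """Decide where a monitored product goes next (hypothetical 1-5 CSAT scale)."""
    if not still_valuable:
        return Stage.RETIRED     # no longer serves a valuable purpose
    if csat >= csat_target:
        return Stage.OPERATIONS  # meeting its goals: keep operating and improving
    return Stage.IDEATION        # missing its goals: redesign or enhance


stage = Stage.STRATEGY
for nxt in (Stage.IDEATION, Stage.ENGINEERING, Stage.OPERATIONS, Stage.MONITORING):
    stage = advance(stage, nxt)
stage = advance(stage, review(csat=3.2, still_valuable=True))
print(stage.value)  # Ideation & Product Design
```

Encoding the transitions explicitly makes the cycle visible: the only way back to design work is through monitoring, which is where feedback and satisfaction data are gathered.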

Connecting Data Products with Business Success

Data products are far more than technical solutions: they are strategic business tools. By transforming raw data into actionable intelligence, they can significantly improve an organization's efficiency and agility. Integrated into daily business operations, they enable smarter, faster decisions, streamline processes, and drive innovation, making them essential for businesses that want to maintain a competitive edge in today's data-driven landscape.

Maximizing Utility Across Multiple Use Cases

The true value of a data product lies in its versatility and its ability to address a wide range of business needs. A well-designed data product that serves multiple use cases becomes a pivotal asset, integrating deeply into an organization's core operations. This versatility not only yields a higher return on investment but also promotes consistency and efficiency across departments.

For example, a data product created to analyze customer interactions can also provide insights into operational efficiency, supply chain management, or financial forecasting. By reusing the same data product across different functions, organizations gain more comprehensive insights, leading to better-informed decision-making and strategic planning.

Here is a practical example of data products that we have designed and implemented in the Life Sciences domain, specifically in Precision Medicine.

[Figure: diagram of the data flow between the Precision Medicine data products]

From 'Genomics Test Results' feeding into 'Precision Diagnostics' to 'Patient Outcomes' influencing 'Clinical Study Protocols,' each data node serves multiple downstream processes. This network not only supports targeted drug development but also improves patient selection and study design, ultimately raising both the probability of success and the safety of treatments. Such an integrated data product landscape shows how a robust data architecture can support the various stages of precision medicine, from initial biomarker identification through clinical trials to patient treatment strategies, maximizing the value extracted from every piece of data.
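The data product network described above is essentially a lineage graph, and a breadth-first traversal answers a practical question about it: which downstream products are affected when an upstream product changes? Only the two relationships named in the text are encoded below; the third edge is hypothetical and exists solely to show a transitive dependency. The `LINEAGE` dictionary and the `downstream` function are illustrative names, not part of the implementation described here.

```python
from collections import deque

# Edges named in the text: upstream data product -> downstream consumers.
LINEAGE: dict[str, list[str]] = {
    "Genomics Test Results": ["Precision Diagnostics"],
    "Patient Outcomes": ["Clinical Study Protocols"],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first list of every product that depends, directly or transitively, on start."""
    seen: set[str] = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guard against revisiting shared or cyclic nodes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream(LINEAGE, "Genomics Test Results"))  # ['Precision Diagnostics']

# HYPOTHETICAL edge, not taken from the diagram, added only to show transitive impact.
LINEAGE.setdefault("Precision Diagnostics", []).append("Patient Outcomes")
print(downstream(LINEAGE, "Genomics Test Results"))
# ['Precision Diagnostics', 'Patient Outcomes', 'Clinical Study Protocols']
```

Keeping lineage as an explicit graph is one way to see, before changing an upstream product, how far the change will ripple through the network.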

Organizations that effectively develop and utilize data products find themselves well-equipped to tackle modern business challenges. These products should be designed to align closely with business objectives and be capable of adapting to various needs, ensuring they provide lasting value.

We firmly believe that building a strong data product ecosystem supports the next stage of organizational innovation and fuels your GenAI aspirations. Effective data management is crucial for making informed decisions that drive responsible business practices. By improving our understanding and control of data, we are moving towards a future where our decisions and innovations are guided by the clarity and precision that a robust data ecosystem provides.