Feb 07, 2023

Data observability, a key concept in data engineering and analytics, describes the toolsets, methods, and practices that provide insight into the health of your firm’s data pipelines and support the reliable delivery of accurate, timely information to consumers.

Observability by Design

Data doesn’t tell you when it’s broken. Smart data engineers incorporate observability into their “definition of done,” alongside privacy and security. Data observability processes draw on metadata generated by job schedulers, file deliveries, vendor notifications, transformation processes, and validation tests to monitor data health. We recommend that data managers prioritize data observability over adding new features to their firms’ data management programs. Let’s look at why data observability is gaining so much traction.
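
To make this concrete, here is a minimal sketch, in Python, of what capturing that metadata can look like: a single feed run is described by its arrival time, row count, and any failed validation checks, and a health check turns that metadata into alerts. The feed name, thresholds, and checks are illustrative assumptions, not any specific product’s API.

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List

    @dataclass
    class FeedRun:
        """Observability metadata for one delivery of one data feed."""
        feed_name: str        # e.g., a vendor end-of-day price file
        arrived_at: datetime  # when the file or table landed
        row_count: int        # rows delivered in this run
        failed_checks: List[str] = field(default_factory=list)  # validation failures

    def assess_feed_health(run: FeedRun, expected_by: datetime, min_rows: int) -> List[str]:
        """Turn a feed run's metadata into a list of observability alerts."""
        alerts = []
        if run.arrived_at > expected_by:
            alerts.append(f"{run.feed_name}: delivery late by {run.arrived_at - expected_by}")
        if run.row_count < min_rows:
            alerts.append(f"{run.feed_name}: only {run.row_count} rows (expected at least {min_rows})")
        alerts.extend(f"{run.feed_name}: validation check failed: {check}" for check in run.failed_checks)
        return alerts

    # Hypothetical example: a vendor price file that arrived late, under-delivered,
    # and failed one validation test.
    run = FeedRun("vendor_eod_prices", arrived_at=datetime(2023, 2, 7, 7, 45),
                  row_count=12_500, failed_checks=["null_isin_ids"])
    for alert in assess_feed_health(run, expected_by=datetime(2023, 2, 7, 6, 0), min_rows=50_000):
        print(alert)

In practice, this metadata would be collected automatically from schedulers, loaders, and test frameworks rather than constructed by hand.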

Data Observability Benefits

  • Improved Data Quality ─ Business users expect accurate data before they start their daily activities. Their confidence level in the data needs to be 100%, or they will create redundant feeds and reconciliations, thereby diluting the capacity to generate business value. Reliable data pipelines require actionable insights across the data capture, transformation, and delivery processes.
  • Proactive Data Quality Management ─ Data consumers will generally tolerate new, unanticipated types of errors. Of course, this goes over better if you discover the issue before the consumers do, quickly diagnose the cause, and fix it. Proactively identifying and remediating data issues builds credibility in the data and in those providing it.
  • High-Value Employee Retention ─ Supporting the data function at an investment firm is a demanding role. You arrive at your “virtual” office on Monday morning to emails from executives experiencing issues with their dashboards, alerts from vendors on format changes, and Microsoft Teams messages from your trading desk in all caps. Investing in data observability toolsets can lift some of that weight from IT and business users, freeing them to spend less time on data wrangling and more time adding business value. Data observability solutions also provide a focal point that helps your firm mature these processes over time. Consider anything that helps your firm retain and attract talent a big win in today’s competitive market.
  • Cost Control ─ Tools like Acceldata provide insights into anomalous usage patterns across the organization. Firms moving to cloud database providers like Snowflake quickly learn about the challenges of “pay-as-you-go” pricing. Acceldata can identify datasets that are not optimally configured from a cost standpoint.
  • Self-Service ─ Asking a newly hired software engineer to establish and monitor data quality may not achieve the desired result. New employees lack the institutional knowledge to know whether the data is correct. What’s more, business users from different parts of the organization may not agree on their requirements for data completeness and quality. Enter solutions like Soda that allow business users to create data-quality rules for their teams (a minimal sketch of a team-defined rule follows this list). This kind of data empowerment is a positive trend for supporting data consumers.
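
As a rough illustration of the self-service idea, the following Python sketch shows team-defined data-quality rules. The rule format, column names, and thresholds are hypothetical and are not the syntax of Soda or any other specific product; the point is that business teams declare the rules while a shared engine evaluates them.

    from typing import Any, Dict, List

    # Rules a business team might declare for a holdings extract
    # (dataset, columns, and thresholds are hypothetical).
    RULES = [
        {"name": "no_missing_portfolio_id", "column": "portfolio_id", "check": "not_null"},
        {"name": "weights_in_range", "column": "weight", "check": "between", "low": 0.0, "high": 1.0},
    ]

    def evaluate_rules(rows: List[Dict[str, Any]], rules: List[Dict[str, Any]]) -> List[str]:
        """Evaluate declarative rules against a list of records and return failures."""
        failures = []
        for rule in rules:
            column = rule["column"]
            for i, row in enumerate(rows):
                value = row.get(column)
                if rule["check"] == "not_null" and value is None:
                    failures.append(f"{rule['name']}: row {i} has a null {column}")
                elif rule["check"] == "between" and value is not None:
                    if not rule["low"] <= value <= rule["high"]:
                        failures.append(f"{rule['name']}: row {i} has {column}={value} out of range")
        return failures

    # Hypothetical records: one clean row and one that breaks both rules.
    sample = [
        {"portfolio_id": "P-100", "weight": 0.35},
        {"portfolio_id": None, "weight": 1.20},
    ]
    for failure in evaluate_rules(sample, RULES):
        print(failure)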

Data Observability Solutions

Mature observability toolsets like Datadog deliver user-friendly radar screens that peer into the health of your hosted service architecture. Technology managers rely on these consolidated views of logs, metrics, and traces to recover quickly from issues. The team at Monte Carlo Data recognized the lack of similar tools for data engineering and applied these concepts to data pipelines, with the goal of minimizing “data downtime.”

Data Observability Embedded in Financial Service Solutions

Data service vendors that cater to financial services firms have experienced data managers, extensive vendor relationships, and expertise gained from servicing clients. Tools from these providers, such as RIMES Matrix and BNY DataVault, incorporate observability throughout their products. Unless these vendors are managing all your data pipelines, however, your firm will need its own observability platform for an enterprise view.

Data Observability Solutions Providers

If you’re keen to learn more about data observability, check out the following vendors, which were first to market and are defining the space as they go. Watch a few of their demos and you will quickly understand the value they provide:

  • Monte Carlo Data
  • DataKitchen
  • Acceldata
  • Databand (IBM)
  • Datafold
  • Informatica
  • Soda

Is Data Observability Right for You?

If your firm currently monitors its data pipelines with proprietary dashboards, scheduler alert emails, and log scraping, you have already laid much of the groundwork a data observability platform requires. Vendors such as Monte Carlo Data have experience applying machine learning to observability metadata to enhance anomaly detection.
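
As a simplified illustration of that idea, the sketch below flags an anomalous daily row count using a z-score against recent history. It is a toy statistical check on observability metadata, not any vendor’s actual model, and the feed volumes shown are hypothetical.

    from statistics import mean, stdev
    from typing import List, Optional

    def row_count_anomaly(history: List[int], today: int, threshold: float = 3.0) -> Optional[str]:
        """Return an alert if today's row count is a statistical outlier versus recent history."""
        if len(history) < 5:
            return None  # not enough history to judge
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return None  # avoid dividing by zero on a perfectly stable history
        z_score = (today - mu) / sigma
        if abs(z_score) > threshold:
            return f"row count {today} deviates {z_score:.1f} standard deviations from the mean of {mu:.0f}"
        return None

    # Hypothetical example: a feed that normally delivers roughly 100,000 rows suddenly drops.
    history = [101_200, 99_800, 100_500, 102_100, 98_900, 100_700]
    alert = row_count_anomaly(history, today=42_000)
    if alert:
        print(alert)

Production platforms go well beyond this simple check, but the input is the same observability metadata your existing dashboards and logs already produce.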

Firms that effectively integrate new data sources into their decision-making, and manage them professionally, gain a competitive advantage over their peers. Portfolio analysts need the ability to contact a vendor, grab a new dataset, and start analyzing it in a sandbox. Agreeing on a minimal set of observability metrics for new sources is important, because exploratory datasets can quickly gain momentum and become production dependencies. Firms need to strike a balance between supporting innovation and imposing inflexible controls. Your approach to data observability must cover important production datasets without hindering exploration.

If you are interested in learning more about the topic of data management, take a peek at Cutter’s infographic, Best Practices in Self-Service Data Analytics.