Mar 07, 2023

For years, firms have put together a patchwork of solutions to provide access to approved data in a safe and efficient manner, while end users have complained about not knowing where to find data, or lacking access to the data they require, in the form that they need it. Although investment professionals have used statistical analysis to explore data and inform their investment decisions, until recently only a small group of managers or quants used Python or R to perform complex data analysis on datasets they would assemble themselves. Today, demand for data and self-service analytics comes from across the enterprise, especially the front office.

A semantic layer serves many purposes. Depending on how it is deployed, it can support different end users and solutions, from enabling self-service analytics to providing an abstraction layer that limits disruption when you modernize your data architecture.

But what is a semantic layer and how does it work?

Traditional Semantic Models

Before firms can analyze data, and before users can create their own analytics, they need access to data in a format they can use. Traditionally, quants may have created their own data models (often in Excel) by extracting, normalizing, and blending data from multiple sources such as trading or accounting platforms, shared drives, or wherever else they could beg, borrow, or steal data from. Quants created a view of the data to answer the business questions they were addressing. This model they constructed is sometimes called a semantic layer.

Over time, IT would recreate these solutions using data warehouses and data marts. With these traditional tools, semantic layers were static and grew stale. Maintaining them and extending them with new data sources required a significant investment of IT time and development effort. New models based on old ones created a brittle, spaghetti-like mess of connections and dependencies. The result was that everyone was frightened to change anything for fear of breaking something.

Tools like Tableau, Power BI, and Alteryx help by allowing business users to build semantic models and blend new data sources. But these models tend to hold static data, and because they are proprietary representations within those applications, they can’t be shared with other tools.

A Better Way Forward

A better way forward is to use next-gen tools to build a flexible semantic layer that is performant, virtual, and cloud-enabled. A virtual semantic layer provides an abstraction layer that models data from various sources without moving or replicating that data. What this looks like depends on a firm’s data architecture, the use cases for data consumption, and where and how the source data is stored. Some firms may implement a semantic layer to support self-service analytics. Others may create a semantic layer to act as an abstraction layer that minimizes disruption to end users when the underlying source systems change, for example while modernizing the data architecture or replacing a system.

Firms may use data virtualization tools such as Dremio and Denodo to create a virtualized semantic layer without duplicating or physically moving data to a centralized data repository. A data virtualization layer accesses data where it lives and holds no source data, only the metadata required to access each data source, plus security or governance controls. Data virtualization tools allow firms to create reusable semantic models that can be shared across an enterprise and that keep each data source traceable.

These modern tools expose the data through query languages such as SQL or through web APIs, allowing a vast array of tools to take advantage of the models. Data lineage is fully documented even as logic changes. Knowing how the logic is applied gives users confidence to make changes in the dependent models. These tools also allow firms to blend new datasets quickly without expensive IT resources. A semantic layer allows data to remain in its source formats (CSV, Excel, Parquet) and locations, while presenting it to business users as if it were in one consolidated database that they can analyze with SQL or with more business user-friendly tools.
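The idea of a model that stores only a query definition, yet looks like a consolidated table to its users, can be sketched with an ordinary SQL view. The example below is a minimal illustration, not how Dremio or Denodo work internally: it uses Python's built-in sqlite3 module as a stand-in, and the table names, column names, and values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables stand in for separate source systems (hypothetical names and data).
conn.executescript("""
    CREATE TABLE trading_positions (acct_id TEXT, sec_id TEXT, qty REAL);
    CREATE TABLE accounting_prices (sec_id TEXT, px_close REAL);
    INSERT INTO trading_positions VALUES ('A1', 'XYZ', 100), ('A1', 'ABC', 50);
    INSERT INTO accounting_prices VALUES ('XYZ', 10.0), ('ABC', 20.0);

    -- The "semantic model": a view that holds only the query definition,
    -- renames source columns into business terms, and blends both sources.
    CREATE VIEW portfolio_holdings AS
    SELECT p.acct_id            AS account,
           p.sec_id             AS security,
           p.qty                AS quantity,
           p.qty * x.px_close   AS market_value
    FROM trading_positions AS p
    JOIN accounting_prices AS x ON x.sec_id = p.sec_id;
""")


def market_value_by_account(connection):
    """Query the view the way a BI tool or analyst would, using plain SQL."""
    rows = connection.execute(
        "SELECT account, SUM(market_value) FROM portfolio_holdings "
        "GROUP BY account ORDER BY account"
    ).fetchall()
    return dict(rows)


totals = market_value_by_account(conn)

# A change in the source shows up in the view immediately, because the view
# holds no copy of the data.
conn.execute("UPDATE accounting_prices SET px_close = 12.0 WHERE sec_id = 'XYZ'")
updated_totals = market_value_by_account(conn)
```

Consumers query portfolio_holdings by its business-friendly column names and never need to know which system supplies positions or prices.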

Five Benefits of a Virtual Semantic Layer

  1. Access to different data sources in one centralized location
  2. The ability to mask data or present it in a transformed state, reducing the need for ETL/ELT development
  3. An abstraction layer over the actual data sources, so the underlying sources can change without disrupting data consumers
  4. The ability to optimize performance and scalability without significant development
  5. The ability to quickly blend new data sources without IT involvement
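Benefits 2 and 3 can be illustrated together with a small sketch. It again uses Python's built-in sqlite3 module as a stand-in for a virtualization tool, and every table, column, and value is hypothetical. A view masks a sensitive field, and when the source system is replaced, only the view definition changes while the consumer's query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical legacy source plus a view that masks the tax identifier.
conn.executescript("""
    CREATE TABLE legacy_clients (client_id INTEGER, full_name TEXT, tax_id TEXT);
    INSERT INTO legacy_clients VALUES (1, 'Jane Doe', '123-45-6789');

    CREATE VIEW clients AS
    SELECT client_id,
           full_name,
           '***-**-' || substr(tax_id, -4) AS tax_id_masked
    FROM legacy_clients;
""")


def fetch_clients(connection):
    """The consumer's query: it only ever references the view."""
    return connection.execute(
        "SELECT client_id, full_name, tax_id_masked FROM clients "
        "ORDER BY client_id"
    ).fetchall()


before = fetch_clients(conn)

# Simulate replacing the source system: the new table uses different column
# names, so the view is redefined to map them back to the same business terms.
conn.executescript("""
    CREATE TABLE new_crm_clients (id INTEGER, name TEXT, ssn TEXT);
    INSERT INTO new_crm_clients
        SELECT client_id, full_name, tax_id FROM legacy_clients;

    DROP VIEW clients;
    CREATE VIEW clients AS
    SELECT id   AS client_id,
           name AS full_name,
           '***-**-' || substr(ssn, -4) AS tax_id_masked
    FROM new_crm_clients;

    DROP TABLE legacy_clients;
""")

after = fetch_clients(conn)
```

The consumer's query returns the same masked rows before and after the migration, which is the kind of insulation the abstraction layer is meant to provide.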

Don’t Miss Out

A virtual semantic layer allows end users working in different tools to access the same approved data, even when that data is physically stored in multiple locations. It can also help prevent teams from creating their own siloed data models and help ensure that approved and trusted data is used across the enterprise.

Once a virtual semantic layer is created, users across the organization can use the tools that work best for them (BI tools, Python, R, SQL, etc.) to connect to that same semantic layer and explore the data, regardless of where or how it is physically stored.

To learn more about self-service analytics that leverage the semantic layer, check out our best practices for self-service data analytics infographic.

To speak with a Cutter analyst or consultant about this topic, contact us at [email protected].