Published: Text: Max Krog Reading time: 5 min

How data teams can get better outcomes with a declarative reporting layer

The world of analytics has seen huge development in recent years. Most of this development has happened within the data-ingestion and data-modeling parts of the data lifecycle.

In this article, I’ll explore the developments I expect to see in the near future for the tooling we use to manage metrics and dashboards.

Much of this recent development can be attributed to the adoption of software engineering best practices. Modern data teams are managing most of their assets with code and configuration files, unlocking the possibility to iterate on, review and test new features in a development environment before releasing them into production.

For example, pull requests to a typical analytics git repo can cover:

  • Changes to cloud infrastructure
  • Changes to data ingestion pipelines
  • Changes to how data is modeled in the warehouse
  • Changes to how we monitor the health of the resources above

These pull requests automatically trigger CI/CD pipelines that deploy the changes into a development environment. This allows us to take a deeper look at the changes, make sure that everything works as expected and get feedback from our colleagues. When we are satisfied with the results, we simply merge the pull request and watch as the changes are automatically deployed into our production environment. It allows us to work with small, incremental changes that are released into the production environment securely and with full reproducibility.

Prominent product coaches have been preaching this way of delivering software for a long time. Without it, it’s hard to make the switch from an output-oriented to an outcome-oriented team; you’re simply not able to get user feedback quickly enough. Many software engineers probably think this way of operating is nothing special, but within the world of data it still feels like something novel. Many of us remember relying on manual routines for at least part of the work mentioned above as if it were yesterday.

Realizing how much this area has improved has also made me think – what parts of the analytics stack can’t be managed in this way, yet?

The obvious candidate is the reporting layer. While the assets that live behind the scenes can be managed in this new way, the things our stakeholders actually interact with are still handled completely differently.

Metric definitions are spread out across the various business intelligence tools you use. Dashboards are built and laid out manually, and keeping them looking consistent often comes down to having a real enthusiast on your team who makes sure ‘show x-axis name’ is always enabled in the various graphs.

As most business intelligence tools don’t offer a development environment that can be shared with stakeholders, new ideas are often tested out directly in the live version of a dashboard. (Remember to roll this back if the new idea didn’t work out as you expected.) If the change is too big to be tested directly in the live version, you might even create a copy of the dashboard and share the copy directly with a stakeholder during the development phase.

And – I almost forgot – most of the new ideas you want to test in the business intelligence tool depend on new data being available from the data warehouse. So you probably have to create a PR for that anyhow, and then connect the work in the reporting layer to the development environment output from that PR.

We’re still handling the reporting layer – where stakeholders interact with the data – in a labor-intensive, hard-to-replicate and error-prone way.

If a software engineer were asked to describe this way of working, I think the term “ClickOps” would often be used. ClickOps is the labor-intensive and error-prone process of having people click through various menu options in cloud providers’ websites to configure infrastructure. While a ClickOps approach can get the job done rather quickly at first, it comes at the long-term cost of being hard to reproduce, carrying a high risk of manual errors, and lacking the automation that makes repeated deployments fast.

Luckily, this way of operating has since been replaced with the infrastructure-as-code approach. To keep it simple, infrastructure as code boils down to managing your infrastructure with code (instead of clicks). I think a big part of why the approach has succeeded can be explained by how it’s been implemented. The most common tool (Terraform) uses a declarative approach. This means that the person configuring the infrastructure only has to describe the infrastructure they desire; they can leave the work of figuring out how to get from the current infrastructure to that target infrastructure to the technology. This way of working makes changes to infrastructure simple, predictable and repeatable. (The opposite of a declarative approach is an imperative approach, where you describe the exact steps to take.)
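
To make the declarative idea concrete, here is a minimal Python sketch. It is purely illustrative; the resource names and the plan function are made up and have nothing to do with Terraform’s actual data model or API. The point is that the configuration only states what should exist, and a planner works out how to get there from the current state.

```python
# A toy illustration of declarative configuration: describe the desired
# end state and let a planner figure out the steps to reach it.
# (Hypothetical example -- not how Terraform actually works.)

desired = {
    "storage_bucket.raw_events": {"location": "EU", "versioning": True},
    "bigquery_dataset.analytics": {"location": "EU"},
}

current = {
    "storage_bucket.raw_events": {"location": "EU", "versioning": False},
    "vm.legacy_worker": {"machine_type": "e2-small"},
}


def plan(current: dict, desired: dict) -> list[str]:
    """Compare current and desired state and list the actions needed."""
    actions = []
    for name, config in desired.items():
        if name not in current:
            actions.append(f"create  {name}")
        elif current[name] != config:
            actions.append(f"update  {name}")
    for name in current:
        if name not in desired:
            actions.append(f"destroy {name}")
    return actions


for action in plan(current, desired):
    print(action)
# update  storage_bucket.raw_events
# create  bigquery_dataset.analytics
# destroy vm.legacy_worker
```

Because the file only ever describes the target state, every change becomes a small, reviewable diff, and applying the same file twice gives the same result.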

Could a declarative configuration-as-code approach be borrowed for the reporting layer in analytics?

I think so. Manually clicking around in business intelligence tools is preventing a lot of teams from leveling up to an outcome-based approach. This fragile and error-prone way of working is not letting us iterate and get feedback from users at the pace we need.

Imagine being able to submit a PR to your git repo that covers:

  • The introduction of a new metric (that’s used across many dashboards)
  • Changes to an existing metric
  • Changes to a core dashboard used by most of the company

And then sharing the result of these changes with teammates and stakeholders for feedback. Based on the feedback, you might then rework the changes, adjust them slightly, or discard them completely. As it all lives in a development environment, you can test and evaluate new ideas without impacting anything that’s currently used across the company. This is very different from how things are done today.
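
As a sketch of what a version-controlled metric definition could look like, here is a hypothetical Python example. The structure is made up for illustration and does not follow dbt’s or Cube’s actual schema; the point is simply that a metric is defined once, in code, and reviewed and deployed like any other change.

```python
# Hypothetical metrics-as-code definitions -- a sketch of the idea,
# not the schema of any particular semantic layer or BI tool.
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str          # identifier reused across dashboards
    description: str   # shown to stakeholders in the reporting layer
    sql: str           # aggregation over a modeled warehouse table
    owner: str         # team responsible for the definition


METRICS = [
    Metric(
        name="weekly_active_users",
        description="Distinct users with at least one session in the last 7 days.",
        sql="count(distinct user_id)",
        owner="analytics",
    ),
    Metric(
        name="net_revenue",
        description="Gross revenue minus refunds.",
        sql="sum(gross_amount) - sum(refund_amount)",
        owner="finance",
    ),
]
```

A pull request that adds or changes an entry in a file like this could be deployed to a development environment, reviewed by teammates and stakeholders, and only then promoted to production, using the same flow we already rely on for models and pipelines.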

This should enable us to build higher-quality products that are truly valuable for users, at a faster rate than ever before.

The good news is that movement towards this way of working with the reporting layer has already started:

  • The introduction of a semantic layer (by companies like dbt Labs and Cube) gives us the possibility to ship fully queryable metric catalogs instead of tables and dashboards. All managed with code, of course
  • Business intelligence tools like Steep are taking a metrics-first approach, and focusing only on making it simpler than ever before to explore the metrics catalog from an existing semantic layer
  • Tools like Streamlit are making it simpler than ever to build data apps
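
To give a taste of the data-app side, here is roughly what a minimal Streamlit app can look like: a short Python script, kept in version control, that renders a metric and a chart. The numbers are made up for illustration; a real app would query the warehouse or a semantic layer instead.

```python
# minimal_app.py -- a tiny Streamlit data app, defined entirely in code.
# Run locally with: streamlit run minimal_app.py
import pandas as pd
import streamlit as st

st.title("Weekly active users")

# Dummy data for illustration; a real app would query the warehouse
# or a semantic layer here.
data = pd.DataFrame(
    {
        "week": ["2024-01-01", "2024-01-08", "2024-01-15"],
        "wau": [1200, 1350, 1500],
    }
)

st.metric(label="Latest week", value=int(data["wau"].iloc[-1]))
st.line_chart(data, x="week", y="wau")
```

Because the whole app is just a script in the repo, it can go through the same pull-request flow as everything else.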

I’m a strong believer in this movement and hope that code and configuration files will start to cover assets in the reporting layer as well. I believe this will set us up to build things that are right for users at a higher pace than ever before.

Do you want to learn more about this new approach?

We’ve done two webinars on this topic. The goal of both is to give viewers an even better understanding of the issues with how we’re working today, and of how new technology and approaches aim to fix them.

Please do check them out, maybe together with your data team?

What do you think?

Is the reporting layer “too exploratory” for this sort of rigidity, or is there a middle ground to be found that will enable us to create even better data products?

Feel free to add me on LinkedIn or shoot an email to max.krog@signific.se if you want to share ideas.