(2021-07-21) Udell A Virtuous Cycle For Analytics
Jon Udell: A virtuous cycle for analytics. Suppose you’re a member of a team that runs a public web service. You need to help both internal and external users make sense of all the data that’s recorded as it runs
The web service featured in this case study is the Hypothesis web annotation system. The primary database, Postgresql...
Early on we adopted a tool called Metabase that continues to be a pillar of our analytics system. When Metabase was hooked up to our Postgres database the team could start asking questions without leaning on developers
Before long we had a large catalog of Metabase questions that query Postgres and display results as tables or charts that can be usefully arranged on Metabase dashboards
It’s all nicely RESTful. Interactive elements that can parameterize queries, like search boxes and date pickers, map to URLs
Over time, and with growing amounts of data, early success with this approach gave way to two kinds of frustration: queries began to choke, and the catalog of Metabase questions became unmanageable. And so, in the time-honored tradition, we set up a data warehouse for analytics.
To unthrottle the choking queries I began building materialized views that cache the results of Postgres queries
Such views can join with others, and/or with underlying tables, in SQL SELECT contexts. The views become nouns in a language that expresses higher-order business concepts.
The verbs in this language turned out to be Postgres functions written in the native procedural language, pl/pgsql, and later also in its Python counterpart, pl/python
Functions were, at first, a way to reuse chunks of SQL that otherwise had to be duplicated across Metabase questions and Postgres
To visualize what had now become a three-body system of sources in which Metabase questions, Postgres views, and Postgres functions can call (or link to) one another, I wrote a tool that builds a crosslinked concordance. That made it practical to reason effectively about the combined system.
a virtuous cycle: The team member formulates a question and does their best to answer it using Metabase; the toolsmith captures the intent and re-expresses it in a higher-level business language; the expanded language enables the team member to go farther in a next cycle. (Doing SQL Data Analysis As A PM)
Edited: | Tweet this! | Search Twitter for discussion