Here is a list of content we found interesting this month.
To build the last-mile of corporate technology, to usher in a new era of computing literacy, and to generally indulge the insatiable appetite of world-munching software, we need live apps that, like spreadsheets, are not treated as finished products, but as building blocks to be extended in real time using low-code techniques.
Live apps are not finished products. They can be extended in real time using low-code techniques that blur the line between user and developer.
Over the last decade, many of these early BI functions have been stripped out of BI tools and relaunched as independent products.
Just as the cloud rewrote our expectations of what software is and what it isn’t, the modern data stack is slowly rewriting our expectations of BI.
BI tools should aspire to do one thing, and do it completely: They should be the universal tool for people to consume and make sense of data. If you—an analyst, an executive, or any person in between—have a question about data, your BI tool should have the answer.
The boundary between BI and analytical research is an artificial one. People don’t sit cleanly on one side or the other, but exist along a spectrum (should a PM, for example, use a self-serve tool or a SQL-based one?). Similarly, analytical assets aren’t just dashboards or research reports; they’re tables, drag-and-drop visualizations, narrative documents, decks, complex dashboards, Python forecasts, interactive apps, and novel and uncategorizable combinations of all of the above.
A better, more universal BI tool would combine both ad hoc and self-serve workflows, making it easy to hop between different modes of consumption. Deep analysis could be promoted to a dashboard.
Marrying BI with the tools used by analysts brings everyone together in a single place. A lot of today’s analytical work isn’t actually that collaborative.
Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions.
While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base.
Today, if you’re a “modern data team,” your first data hire will be someone who ends up owning the entire data stack.
On the surface, you can often spot an analytics engineer by the set of technologies they are using (dbt, Snowflake/BigQuery/Redshift, Stitch/Fivetran). But deeper down, you’ll notice they are fascinated by solving a different class of problems than the other members of the data team. Analytics engineers care about problems like:
Is it possible to build a single table that allows us to answer this entire set of business questions?
What is the clearest possible naming convention for tables in our warehouse?
What if I could be notified of a problem in the data before a business user finds a broken chart in Looker?
What do analysts or other business users need to understand about this table to be able to quickly use it?
How can I improve the quality of my data as it’s produced, rather than cleaning it downstream?
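The first question above is the classic “one big table” pattern: pre-join the sources once so end users can answer many questions without writing joins. A minimal sketch using Python's stdlib `sqlite3`, with hypothetical `orders` and `customers` tables standing in for real warehouse sources:

```python
import sqlite3

# Toy source tables (hypothetical schema and data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, region TEXT, signup_date TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, ordered_at TEXT);
INSERT INTO customers VALUES (10, 'EMEA', '2021-01-05'), (11, 'AMER', '2021-03-09');
INSERT INTO orders VALUES
  (1, 10, 99.0, '2021-02-01'), (2, 11, 25.5, '2021-04-02'), (3, 10, 12.0, '2021-04-10');
""")

# One wide, pre-joined table: business users can now slice revenue by
# region, signup cohort, or month without writing the join themselves.
conn.execute("""
CREATE TABLE fct_orders AS
SELECT o.order_id, o.amount, o.ordered_at,
       c.customer_id, c.region, c.signup_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
""")

# Several different business questions, answered from the same table:
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM fct_orders GROUP BY region ORDER BY region"
).fetchall()
print(revenue_by_region)  # → [('AMER', 25.5), ('EMEA', 111.0)]
```

In a dbt project this `fct_orders` model would be a versioned SQL file rather than an ad hoc `CREATE TABLE`, but the modeling decision is the same: curate one table that anticipates a whole family of questions.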
The analytics engineer curates the catalog so that the researchers can do their work more effectively.
Despite a recent proliferation of tools in the modern data stack, it’s unclear whether we’re seeing an unbundling of data tooling into many separate layers, or the first steps towards consolidation of data tools.
One popular interpretation of this explosion of data tools is that we are witnessing the “unbundling” of the data stack. Under this interpretation, classically monolithic data tools like data warehouses are being dismantled into constituent parts.
However, it’s also possible that this “unbundling” represents a temporary state of affairs. Specifically, under this alternative thesis – which we’ll call “consolidation” – the proliferation of data tools today reflects what will ultimately become a standard set of features within just a few discrete, consolidated layers of the data stack.
If consolidation is so beneficial to users, why are we seeing “unbundling” now? My thesis is that this unbundling is a response to the rapidly evolving demands on, and capabilities of, cloud data.
In a nutshell, the data ecosystem is slowly rebuilding the warehouse and analysis layers to adapt to the new reality of cloud data.
In the next two years, I expect we’ll see more attempts to consolidate the modern data stack, albeit in intermediate stages – for example, the consolidation of data pipelines and transformation, data catalogs with metrics layers, and dashboards with diagnostics.
Much of the work – especially in the analysis layer – is spread across an absurd number of tools today – not just business intelligence, but also spreadsheets, docs, and slides. Consolidating this work has the potential to transform the future of work for every modern organization, and to redefine the future of data.