(2021-04-15) Taylor Spreadsheet Rantifesto

Dorian Taylor: Spreadsheet Rantifesto. The spreadsheet as the go-to data structure is so lamentable. In general, I never want data as it comes in a spreadsheet. I almost always want it as a graph, or at least a tree. Graphs and trees—at least ordered trees like XML or JSON—can embed tabular data, such as that found in a spreadsheet, but the same cannot be said for the other way around.

I almost always want to model one-to-many relationships, which spreadsheets straight-up can't do. I always want to define the data semantics—what the columns and rows unambiguously mean. Spreadsheets can't do this either.
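As a minimal sketch of the one-to-many point: a tree can hold a parent's many children directly, while a flat table has to repeat the parent on every row. The record and its field names (`title`, `children`) are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record: one section with many pages (a one-to-many relationship).
section = {
    "title": "Products",
    "children": [
        {"title": "Widgets"},
        {"title": "Gadgets"},
    ],
}


def flatten(node, parent=None):
    """Flatten the tree into spreadsheet-style rows of (parent, title)."""
    rows = [(parent, node["title"])]
    for child in node.get("children", []):
        rows.extend(flatten(child, node["title"]))
    return rows


rows = flatten(section)
print(json.dumps(section, indent=2))  # the tree keeps the nesting explicit
print(rows)  # the table encodes it only by repeating the parent name
```

Going from the tree to rows is mechanical. Going back depends on a convention (here, that the first column names the parent) that the table itself does not record.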

CSV

Delimited text files—usually but not necessarily by commas—have emerged over the decades as the de facto way to exchange record-oriented data. This situation could be much worse, but it isn't especially good.

Headers

Nulls

Dates and times

Enums

Mixed datatypes
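Several of the problems named in the headings above can be seen in a few lines with Python's standard `csv` module. This is only a sketch: the sample data and its column names are made up for illustration.

```python
import csv
import io

# Hypothetical CSV: column names and values are made up for illustration.
raw = (
    "id,status,joined,score\n"
    "1,active,2021-04-15,3.5\n"
    "2,,04/15/2021,n/a\n"
)

# DictReader assumes the first line is a header; nothing in the file says so.
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value arrives as a string; the reader does not infer types.
types = {type(value).__name__ for row in rows for value in row.values()}

# A missing value is an empty string, indistinguishable from an empty text value.
missing_status = rows[1]["status"]

print(types)
print(repr(missing_status))
# Two date notations and a numeric column containing "n/a" all pass through unchecked.
print(rows[0]["joined"], rows[1]["joined"], rows[1]["score"])
```

Whether `status` is an enum, which date notation is intended, and whether `n/a` means null are all left for the consumer to guess.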

I suppose you could say the spiritual successor to CSV is JSON.
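For comparison, a small sketch of what JSON adds over delimited text: nulls, numbers, and strings are distinct in the format itself, although dates still travel as plain strings. The record below is hypothetical.

```python
import json

# Hypothetical record, for illustration only.
raw = '{"id": 2, "status": null, "score": 3.5, "joined": "2021-04-15"}'

record = json.loads(raw)

# Null, integer, and float are distinguished by the format itself.
print(record["status"] is None)
print(type(record["id"]).__name__, type(record["score"]).__name__)

# JSON has no date type, so the date is still just a string.
print(type(record["joined"]).__name__)
```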

Honourable(?) mention goes to YAML, which is yet another markup language, or liability, or borderline-Turing-complete data serialization format.

Anything that originates as an actual spreadsheet

Actual spreadsheets import all the problems of CSV and have plenty of their own.

I worked on a project last year where I interfaced with a teammate primarily through a spreadsheet. Being a site map, it shook out as a pseudo-hierarchy of about 200 entities.

I want to underscore that I believe there is great value in being able to type in data with your fingers and move it around ad hoc, as a spreadsheet affords. I also believe that very same ad-hockery is what has limited the spreadsheet's functionality since it was invented over four decades ago.

The original design was never meant to grow past the confines of an individual PC.

How do you preserve this beefed-up, ad-hoc flexibility while adding the guardrails necessary to make the data more valuable to the larger ecosystem? I see two main challenges: data semantics and user interface.

Getting a modest quantity of data, by hand, into a persistent structure, that doesn't demand any up-front work on schema definition, and provides a rudimentary computational vocabulary, is what I believe to be the core strength of the spreadsheet.

