(2019-05-09) The Empty Promise Of Data Moats

The Empty Promise of Data Moats: we constantly hear about data moats and about network effects, and, increasingly, about the combination of the two: “data network effects”

But for enterprise startups — which is where we focus — we now wonder if there’s practical evidence of data network effects at all.

This isn’t just an academic question: it has important implications for where founders invest their time and resources. If you’re a startup that assumes the data you’re collecting equals a durable moat, you might underinvest in the other areas that actually do increase the defensibility of your business long term (verticalization, go-to-market dominance, post-sales account control, a winning brand, etc.).

Data + network effects ≠ data network effects

Systems with network effects generally have the property of direct interactions between the nodes over a defined interface or protocol.

There generally isn’t an inherent network effect that comes from merely having more data.

Most “data network effects” are really scale effects (economies of scale).

Most discussions around data defensibility actually boil down to scale effects, a dynamic that fits a looser definition of network effects in which there is no direct interaction between nodes. For instance, the Netflix recommendation engine improves as Netflix aggregates more viewing data across its user base, even though those users never interact with one another directly.

Yet even with scale effects, our observation is that data is rarely a strong enough moat. Unlike traditional economies of scale, where the economics of fixed, upfront investment can get increasingly favorable with scale over time, the exact opposite dynamic often plays out with data scale effects: The cost of adding unique data to your corpus may actually go up, while the value of incremental data goes down!

The ability to stay ahead of the pack tends to slow down, not speed up, with data scale. Instead of getting stronger, the defensible moat erodes as the data corpus grows and the competition races to catch up.
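To make the intuition concrete, here is a minimal sketch of that dynamic, assuming (purely for illustration) a power-law learning curve for model quality and a rising cost of finding each additional unique record; neither curve is measured from any real corpus:

```python
import numpy as np

# Illustrative assumptions, not empirical measurements: model quality
# follows a power-law learning curve, and each unique record gets harder
# to find as the corpus grows (duplicates dominate the easy sources).
def quality(n, a=0.5, b=0.35):
    """Assumed model quality after n unique records: 1 - a * n^-b."""
    return 1.0 - a * n ** -b

def unit_cost(n, c0=1.0, k=0.25):
    """Assumed cost of acquiring the n-th unique record: c0 * n^k."""
    return c0 * n ** k

for n in [1_000, 10_000, 100_000, 1_000_000]:
    marginal_value = quality(n + 1) - quality(n)   # value of one more record
    marginal_cost = unit_cost(n)                   # cost of one more record
    print(f"n={n:>9,}  marginal value={marginal_value:.2e}  "
          f"marginal cost={marginal_cost:.2f}")

# Marginal value falls by orders of magnitude while marginal cost rises:
# the ROI of "just collect more data" collapses at scale.
```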

Unless you understand the lifecycle of the data journey for your target domain, you’re not guaranteed defensibility; the following framework may help.

A practical framework for understanding the data journey

Minimum Viable Corpus

When most people talk about network effects, they focus on overcoming the bootstrapping or cold-start problem (colloquially called the “chicken-egg” problem).

But this isn’t necessarily true for many enterprise businesses with a data scale effect. Bootstrapping what we think of as the “minimum viable corpus” is sufficient to start training against, and is the first inflection point along a startup’s data journey.

Data Acquisition Cost

In a given corpus, the next piece of unique data tends to become more expensive to capture over time.

Incremental Data Value

As you gather data, each incremental piece also tends to become less valuable to add to the corpus.

Data Freshness

In many real-world use cases, data goes stale over time… it is no longer relevant
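A common practical response is to down-weight records by age when training or scoring. A minimal sketch, assuming a simple exponential decay and an arbitrary 90-day half-life (both are assumptions for illustration, not recommendations for any particular domain):

```python
import numpy as np

# Illustrative assumption: a record's relevance halves every 90 days.
HALF_LIFE_DAYS = 90.0
decay_rate = np.log(2) / HALF_LIFE_DAYS

def freshness_weight(age_days):
    """Exponential-decay sample weight: 1.0 when new, 0.5 at the half-life."""
    return np.exp(-decay_rate * np.asarray(age_days, dtype=float))

ages = np.array([0, 30, 90, 365, 730])
print(dict(zip(ages.tolist(), freshness_weight(ages).round(3).tolist())))
# {0: 1.0, 30: 0.794, 90: 0.5, 365: 0.06, 730: 0.004}
# Two-year-old data contributes almost nothing, so a large stale corpus
# can be worth far less than a small fresh one.
```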

When IS data defensible, and what can you do to manage this?

Because data moats clearly don’t last (or automatically happen) through data collection alone, carefully thinking through the strategies that map onto each stage of the data journey can help you compete with, and more intentionally and proactively maintain, a data advantage.

Bootstrap the initial corpus to compete with incumbents

Founders can actually use this to their advantage to go head to head with incumbents that have data but fail to apply it properly.

Then use that know-how to accelerate ahead of the incumbent competition before those incumbents figure out how to make sense of the data.

Know the distribution of the data

The goal is to intimately understand the shape of the distribution and to craft the right strategy to capture it. Is there a fat tail of critical data that’s hard to acquire? If so, what’s the plan to scale the corpus into the long tail? How important is accuracy in your domain?

The challenge we shared earlier, that so much of the learning in many domains lives in the long tail of exceptional use cases, can also be an advantage if you’re a first mover.
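A rough way to see why the tail is expensive: if case frequencies follow a power law (a Zipf-like distribution, assumed here purely for illustration), each tenfold increase in observations buys ever-smaller gains in tail coverage:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: 10,000 distinct cases whose frequencies
# follow a Zipf-like power law (exponent 1.1).
n_cases = 10_000
ranks = np.arange(1, n_cases + 1)
probs = ranks ** -1.1
probs /= probs.sum()

for n_samples in [10_000, 100_000, 1_000_000]:
    draws = rng.choice(n_cases, size=n_samples, p=probs)
    coverage = np.unique(draws).size / n_cases
    print(f"{n_samples:>9,} observations -> {coverage:.0%} of cases seen")

# Each 10x increase in data buys ever-smaller gains in tail coverage,
# and the tail is exactly where the hard, differentiating cases live.
```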

Understand the extent to which data improves your product

In some domains, having more data results in a dramatically better product.

Of course, understanding the extent to which data contributes to a product is not always straightforward. Often the choice of algorithms or other product-feature tuning has a far greater impact than more data alone.
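One practical way to measure this rather than assume it is a learning curve: hold the model fixed and train it on growing slices of the corpus. A minimal sketch using scikit-learn, with a synthetic stand-in for your own dataset and model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Stand-in dataset; substitute your own corpus and model.
X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.array([0.05, 0.1, 0.25, 0.5, 1.0]),
    cv=5, scoring="accuracy",
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>6} training examples -> {score:.3f} validation accuracy")

# If the curve flattens early, more data is not your moat; if it is still
# climbing at full corpus size, scale may genuinely matter in your domain.
```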

Weigh the tradeoff between quality and quantity

One of the trickiest tradeoffs in nurturing a data corpus is how to balance quality versus quantity.
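One way to make the tradeoff concrete is to simulate it: compare a model trained on a small clean sample against one trained on a much larger sample with corrupted labels. A minimal sketch on synthetic data (the sizes and the 30% noise rate are arbitrary assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=50_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def accuracy(n, label_noise):
    """Train on n examples with a fraction of labels flipped; return test accuracy."""
    idx = rng.choice(len(X_train), size=n, replace=False)
    y_noisy = y_train[idx].copy()
    flips = rng.random(n) < label_noise
    y_noisy[flips] ^= 1  # flip a random subset of the binary labels
    model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_noisy)
    return model.score(X_test, y_test)

print(f"2,000 clean examples:        {accuracy(2_000, 0.00):.3f}")
print(f"20,000 examples, 30% noise:  {accuracy(20_000, 0.30):.3f}")

# Depending on the noise rate, the small clean corpus can match or beat
# the much larger noisy one; quantity does not automatically win.
```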

Secure proprietary data sources

Accumulating proprietary data is a defensible strategy that is strongest when the sources are scarce or are reluctant to provide data to more than one vendor (such as government buyers).

Whither data moats…

