The more people search online, the more data they generate; the more data Google collects, the more it can improve its algorithm's performance and personalise the search experience, making its engine in turn more valuable to each user. Pretty amazing, huh?
This is called a “data network effect” (nfx). Within a data network, these effects theoretically occur when every node in the network feeds useful data back to the central database. As the aggregated data accretes, the value of the service provided to each individual node grows accordingly. Think of Waze, where users of the app feed back real-time data (e.g. roadworks, traffic bottlenecks, etc.), thereby increasing the utility of the app for other users.
Yet data network effects have their limits. First, the value of incremental data typically goes down. At first, your machine learning algorithm learns very quickly, but you can soon reach an asymptote, where new data added to the model no longer translates into the same level of accuracy improvement. Second, the “cost of data” can go up significantly: it can gradually become more expensive and time-consuming to collect useful data and to maintain the data infrastructure (i.e. growing computational costs, extensive data-point labelling, increasing security costs, etc.). Third, the relationship between product usage and the amount of useful new data gathered is often asymmetrical: on TripAdvisor, not everyone is a contributor; most users passively consume the platform. And finally, data network effects are only one type of defensibility one can build with data: they are often confused with data advantages that come from mere scale; data “embedding” can also ensure product stickiness through high switching costs.
In the end, can data really generate a strong moat? Are data nfx a sustainable source of competitive advantage for your company?
It appears that it can be, but only under the very specific preconditions defined in the seven-point framework below.
The Data-nfx 7-Point Framework
1. The product/service value should increase with more incremental data.
Short, positive feedback loops should be built into the product to ensure that newly generated data is quickly incorporated back into the model, so that the product's or service's value increases automatically as more data is added.
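As a toy illustration of such a loop (the class and scenario below are invented for this sketch), a model can fold each new interaction back in immediately, so the very next prediction already benefits from it:

```python
from collections import defaultdict

class FeedbackLoopModel:
    """Toy model with a short feedback loop: every new user interaction
    is incorporated immediately, so predictions improve with usage."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, item, rating):
        # New usage data flows straight back into the model state.
        self.sums[item] += rating
        self.counts[item] += 1

    def predict(self, item, default=3.0):
        # With no data yet, fall back to a neutral prior.
        if self.counts[item] == 0:
            return default
        return self.sums[item] / self.counts[item]

model = FeedbackLoopModel()
model.ingest("route_A", 5.0)
model.ingest("route_A", 4.0)
print(model.predict("route_A"))  # → 4.5
```

The point is structural rather than algorithmic: value accrues per interaction without any manual retraining step, which is what makes the loop "short".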
2. Higher product/service usage generates more useful data.
Ideally, data production is native, with proprietary data generated as a by-product of usage (vs. inputting third-party data into the model), and the new data produced is of high quality.
3. The value created by the data is core to the model’s value proposition & to the way the product/service benefits the user.
Data needs to be central to the way the product benefits its users. For instance, recommendation engines that use data to push personalised content are typically only a feature of the product or service, not its core: the data nfx of a Netflix or a Spotify tend to be weaker than advertised. At the end of the day, the main value of such platforms lies in the content itself.
4. There is no asymptotic value of additional data.
This is typically the case when the service provided is real-time or dynamic: highly dynamic datasets entail perpetual learning curves. An asymptote can be acceptable, but if one exists, the threshold at which it occurs needs to be very high.
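To make the asymptote concrete, here is a purely illustrative power-law learning curve (all parameters are invented): each additional tenfold increase in data buys a smaller accuracy gain, so the marginal value of data shrinks as the curve approaches its ceiling.

```python
def accuracy(n, ceiling=0.95, scale=0.5, alpha=0.5):
    # Illustrative power-law learning curve: accuracy approaches
    # a ceiling as dataset size n grows (numbers are made up).
    return ceiling - scale * n ** (-alpha)

for n in [100, 1_000, 10_000]:
    gain = accuracy(n * 10) - accuracy(n)
    print(f"n={n:>6,}: accuracy={accuracy(n):.4f}, "
          f"gain from 10x more data={gain:.4f}")
```

A dynamic dataset effectively keeps resetting this curve, which is why real-time services can sustain the learning advantage for longer.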
5. There are no asymmetries in data contribution by members of the network.
All nodes need to contribute to the data network and produce useful data. For instance, the ML model behind Gboard is fed by anonymised typing data coming from all Android devices.
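One simple way to gauge this asymmetry (a sketch with invented numbers, not real platform data) is to measure what share of all data the most active contributors produce:

```python
def top_share(contributions, fraction=0.1):
    """Share of all data produced by the top `fraction` of contributors."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Symmetric network: every node contributes (à la Gboard typing data).
symmetric = [10] * 100
# Asymmetric network: a few power users, many lurkers (à la review sites).
asymmetric = [500] * 5 + [1] * 95

print(top_share(symmetric))   # → 0.1
print(top_share(asymmetric))  # well above 0.9
```

The closer the top-contributor share sits to the fraction itself, the more evenly the network produces data, and the stronger the data nfx.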
6. The data production/collection process is scalable and cost effective.
Ideally data capture and data labelling can be automated to keep the cost of new data production & incorporation low.
7. The “minimally viable corpus” (i.e. the minimum amount of data required to start training the model) needs to be high.
The threshold for the amount of data required before the product starts providing value needs to be high, so as to provide scale defensibility against competitors entering the market. For instance, one of the key strengths of the research platform Owkin is its proprietary network of hospitals and clinics, providing it with large, real-world health datasets to fuel its disease detection models. This is closely linked to marketplaces' infamous chicken-and-egg challenge.
Under those 7 conditions, data network effects can be an interesting source of defensibility. At Samaipata, we invest in digital platforms displaying increasing returns, and we believe that data nfx are becoming a possibility for a much broader group of companies, even at early stages. This is enabled by recent advances in technology: i) Big Data tools allowing for cheaper and faster infrastructure to process large amounts of data, ii) advances in Machine Learning / Deep Learning and a growing number of off-the-shelf tools and algorithms to automatically analyse and learn from those large amounts of data, and iii) the rise of Cloud computing to process it all. The next generation of engineers increasingly moving from R&D-type projects to business cases is also contributing to the market momentum, as tech talent flows into the startup scene.
At Samaipata, we are always looking for ways to improve. Do not hesitate to send us your thoughts. We strive to partner with early-stage founders and to support them in taking their business to the next level. Check out more ways in which we can help here, or find all our other content here.
And as always, if you’re a European digital business founder looking for Seed funding, please send us your deck here or subscribe to our Quarterly updates here.