How & When to Set Up a Proper Data Stack in an Early Stage Startup
September 14, 2021
This article is part of a series we’re writing in collaboration with our network of Operating Partners at Samaipata. These posts strive to be useful & actionable for early stage startup founders, and cover most of the key challenges faced at this stage, ranging from tech hiring to cultural growing pains. You can find them all here!
What do GitLab, Airbnb and Stitch Fix have in common? Not a lot at first sight! Very different business models, spanning different industries… Yet those 3 companies understood very early on how data can be leveraged as a key source of competitive advantage: in each of their internal cultures, data is not considered a service, but rather a product.
Yet, at Samaipata, we have found that this simple truth is often overlooked by founders when data is not at the core of their value proposition. This article came out of an enlightening conversation we had on the topic with one of our Operating Partners, Enrique Colin, who explained what the purpose of a data team is and how & when to go about structuring one’s data & analytics stack.
On top of advising us at Samaipata, Enrique has been working at financial services company Ebury for the last 7 years. As VP of Data there, he is leading a team of 15 data scientists, analytics engineers, BI analysts and data engineers. He also helps our portfolio companies on data topics ranging from data science & analytics, data warehousing, BI & management reporting, to developing a data team.
Disclaimer: the content displayed in this article is mostly relevant for late Seed / Series A companies, in which a data product is not the core of the value proposition. Here by “data” we refer to any type of data: Sales, Marketing, Finance, Operations, Supply Chain, Customer Service, Online Platform… plus any other functional area specific to the business.
The purpose of the data team — What role should it play?
In early stage companies, the purpose of the data team should be to build a data product that enables the business to make better decisions. It shouldn’t be another “service” function, busy answering questions (i.e. reactive work) instead of generating useful insights (i.e. proactive work), and spending a lot of energy trying to convince C-levels about the veracity of their analysis.
The “data product” should span the entire organisation and include anything that people in the company use to make decisions, i.e. all spreadsheets, all analytical tools used, all pieces of data flowing through people, platforms and processes…
The data team should have a clear vision and strategy for the data product being built, and it should be linked to revenue and efficiency-generating activities.
The data team should iterate with “customers” (i.e. co-workers) to improve the product.
Its “success” should be defined, not by its output (i.e. dashboards) but by its business impact on the 3–5 KPIs that define your business (e.g. number of decisions it is enabling).
The data team should not be serving the entire business too early on but should focus on the right market first (i.e. focus on serving a few functions of the business at first before expanding).
The modern data stack — How should you structure & set-up your data & analytics stack? And when should you start doing so?
The modern “data stack” can provide great leverage to the data team if set up properly. The good news is that, unlike 10 years ago when analytics tech was only accessible to large companies with huge resources, anyone now has a number of awesome, user-friendly tools available off-the-shelf for a fraction of the cost. It’s not about the How anymore, but about the What.
Catalysed by the launch of Amazon Redshift in 2012, the advent of Massively Parallel Processing (MPP) and the shift from ETL to ELT (more details on the evolution of the ecosystem here), the modern data stack comprises 4 building blocks or layers, all now fully interoperable:
1. Ingestion with data pipeline services & ETL tools (e.g. Fivetran, Stitch): this step is about transporting the data from various sources (e.g. CRM, back-end, website etc.) to a storage medium (the warehouse); it does the E & L in ELT (Extract, Load, Transform).
2. Warehousing with cloud-based data warehouses (e.g. Google BigQuery, Amazon Redshift, Snowflake): this is about storing the data in a single place. Those tools can cost-effectively scale compute and storage resources with low latency; this allows data engineers to skip the pre-load transformations and load the organisation’s raw data straight into the data warehouse.
3. In-warehouse transformation with a data transformation tool (e.g. dbt): this is about transforming the data already loaded in the warehouse through data models written in SQL. It does the T in ELT (Extract, Load, Transform). You can make the data easier to work with by, for example, linking together different types of data.
4. Business Intelligence (BI) with tools such as Looker, Google Data Studio or Power BI: this is about building analytics reports on top of the processed business data layer. Basically dashboards with computed ratios!
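To make the four layers concrete, here is a minimal sketch of the same flow in Python. It is an illustration under loud assumptions, not a recommended setup: the standard-library `sqlite3` module stands in for a cloud warehouse, hard-coded rows stand in for what an ingestion tool would extract, and every table, column and function name is hypothetical.

```python
import sqlite3

# Hypothetical rows, as an ingestion tool might extract them from a CRM and a billing system.
CRM_ROWS = [(1, "acme", "paid_search"), (2, "globex", "referral"), (3, "initech", "paid_search")]
INVOICE_ROWS = [(1, 1200.0), (1, 300.0), (3, 500.0)]


def load_raw(conn):
    """Layers 1 and 2 (E and L): copy source rows into the warehouse untouched."""
    conn.execute("CREATE TABLE raw_crm_accounts (account_id INTEGER, name TEXT, channel TEXT)")
    conn.execute("CREATE TABLE raw_invoices (account_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO raw_crm_accounts VALUES (?, ?, ?)", CRM_ROWS)
    conn.executemany("INSERT INTO raw_invoices VALUES (?, ?)", INVOICE_ROWS)


def transform(conn):
    """Layer 3 (T): build a modelled table in the warehouse by linking CRM and billing data."""
    conn.execute("""
        CREATE TABLE account_revenue AS
        SELECT a.account_id, a.name, a.channel,
               COALESCE(SUM(i.amount), 0.0) AS revenue
        FROM raw_crm_accounts AS a
        LEFT JOIN raw_invoices AS i ON i.account_id = a.account_id
        GROUP BY a.account_id, a.name, a.channel
    """)


def revenue_per_account_by_channel(conn):
    """Layer 4 (BI): a computed ratio on top of the modelled table."""
    rows = conn.execute("""
        SELECT channel, 1.0 * SUM(revenue) / COUNT(*) AS revenue_per_account
        FROM account_revenue
        GROUP BY channel
        ORDER BY channel
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load_raw(conn)
transform(conn)
report = revenue_per_account_by_channel(conn)
print(report)
```

The point of the sketch is the ordering: raw data lands first, and all business logic lives in SQL that runs inside the warehouse, which is what makes the transformation layer easy to review and change later.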
All of those tools are increasingly well connected to each other, with native integrations being rolled out. It has thus become much easier to set up your data infrastructure. Yet there is a right time for everything!
For companies before Series A:
Do not over-engineer analytics! It might be too early for a data warehouse, a BI platform and complex analytics.
You can rely on the analytics functionalities and built-in reporting of your existing SaaS tools and CRM, or set up Google Analytics and Mixpanel. At this stage it is more about having a “clean” tech stack to do some data analysis on top.
Above all what is key is to measure your product: it’s your product metrics that will help you iterate quickly in this phase. Before Series A, c.80% of decisions can be taken based on less than 2–3 hours of analysis. Once again, no need to over-engineer analytics when you basically have no time at all!
You don’t necessarily need a “proper” data team: at this stage, it’s largely about infusing a “data mindset” into the entire organisation, ideally with most people able to run simple SQL queries to access information that might not be available on the dashboards. This will boost empowerment and efficiency. The last thing you want is for people to start annoying back-end developers with data requests.
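As an illustration of the kind of simple SQL request meant here, the sketch below answers a one-off question without involving a developer. The `signups` table, its columns and its rows are invented for the example, and `sqlite3` is only a stand-in for whatever database your team can query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (user_id INTEGER, signup_week TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?, ?)",
    [
        (1, "2021-W35", "free"),
        (2, "2021-W35", "pro"),
        (3, "2021-W36", "free"),
        (4, "2021-W36", "free"),
    ],
)

# "How many users signed up each week, and how many of them chose the paid plan?"
QUERY = """
    SELECT signup_week,
           COUNT(*) AS signups,
           SUM(CASE WHEN plan = 'pro' THEN 1 ELSE 0 END) AS pro_signups
    FROM signups
    GROUP BY signup_week
    ORDER BY signup_week
"""
weekly = conn.execute(QUERY).fetchall()
for row in weekly:
    print(row)
```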
For Series A companies (>20–50 FTEs):
You can start setting up your data infrastructure. This means choosing a data warehouse, an ETL tool and a BI tool.
You can start hiring a team starting with a strong “Analytics Engineer” and/or a consultant. More on this crucial first hire below.
Finding the right data talent — How do you create the right data & analytics competency in your startup?
Those new technologies are redefining the careers of data professionals. We have witnessed the birth of the “Analytics Engineer”: technical analysts who apply software engineering best practices to the production and the maintenance of analytics code.
This profile differs from your average data analyst (more engineering skills) and from your average data engineer (more curiosity for analytical business questions). It’s key that they are business-oriented people.
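One software engineering practice that carries over naturally to analytics code is automated testing of the data itself. The sketch below is a hand-rolled illustration, similar in spirit to the uniqueness and not-null checks that tools such as dbt offer, but it is not dbt’s implementation; the helper names, the `customers` table and its rows are all hypothetical.

```python
import sqlite3


def null_count(conn, table, column):
    """Count rows where `column` is NULL (a not-null check passes when this is 0)."""
    # Identifiers are interpolated directly: only pass trusted, hard-coded names.
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]


def duplicate_values(conn, table, column):
    """Return values of `column` that appear more than once (a uniqueness check passes when empty)."""
    rows = conn.execute(
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "c@example.com")],  # deliberately dirty rows
)

null_emails = null_count(conn, "customers", "email")
dup_ids = duplicate_values(conn, "customers", "customer_id")
print(null_emails, dup_ids)
```

Running checks like these on every change to a data model is what lets a small team trust its core data sets without re-verifying each dashboard by hand.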
Your startup’s first Data hire should be an Analytics Engineer who owns the entire data stack, sets solid foundations and starts building a team around him or her.
Then, as you hire more people, you can think along the lines of 2 development paths:
The Engineering path: Data engineers building custom-made data plumbing infrastructure in Python (when custom work is required) + Machine Learning engineers training & deploying models as data transformations.
The Analytics path: Analytics engineers building, documenting and maintaining core data sets + Data analysts performing frontline support and deep-dive work.
All in all, Enrique is convinced that you should not overlook the importance of data & analytics. Analytics should power decisions at every stage of a startup’s lifetime. Yet as with everything, it’s about proper sequencing: at Seed stage, data is essentially a tool to help decision making, not necessarily a team of its own, since, as always, you have limited time & resources.
Check out dbt’s resources if you want to dig into some of the points: they are simply amazing. GitLab and Netlify are great references as well.
At Samaipata, we are always looking for ways to improve. Do not hesitate to send us your thoughts. We strive to partner with early-stage founders and to support them in taking their business to the next level. Check out more ways in which we can help here, or all our other content here.
And as always, if you’re a European digital business founder looking for Seed funding, please send us your deck here or subscribe to our Quarterly updates here.