

How & When to Set Up a Proper Data Stack in an Early Stage Startup


September 14, 2021

This article is part of a series we’re writing in collaboration with our network of Operating Partners at Samaipata. These posts strive to be useful & actionable for early stage startup founders, and cover most of the key challenges faced at this stage, ranging from tech hiring to cultural growing pains. You can find them all here!

What do GitLab, Airbnb and Stitch Fix have in common? Not a lot at first sight! Very different business models, spanning different industries… Yet those 3 companies understood very early on how data can be leveraged as a key source of competitive advantage: in each of their internal cultures, data is not considered a service, but rather a product.

Yet at Samaipata, we have found that this simple truth is often overlooked by founders when data is not at the core of their value proposition. This article came from an enlightening conversation we had on the topic with one of our Operating Partners, Enrique Colin, who explained what the purpose of a data team is and how & when to go about structuring one’s data & analytics stack.

On top of advising us at Samaipata, Enrique has been working at financial services company Ebury for the last 7 years. As VP of Data there, he is leading a team of 15 data scientists, analytics engineers, BI analysts and data engineers. He also helps our portfolio companies on data topics ranging from data science & analytics, data warehousing, BI & management reporting, to developing a data team.

Disclaimer: the content displayed in this article is mostly relevant for late Seed / Series A companies, in which a data product is not the core of the value proposition. Here by “data” we refer to any type of data: Sales, Marketing, Finance, Operations, Supply Chain, Customer Service, Online Platform… plus any other functional area specific to the business.

The purpose of the data team — What role should it play?

In early stage companies, the purpose of the data team should be to build a data product that enables the business to make better decisions. It shouldn’t be another “service” function, busy answering questions (i.e. reactive work) instead of generating useful insights (i.e. proactive work), and spending a lot of energy trying to convince C-level executives of the validity of its analyses.

Instead, you should run your data team as a product team, with a true product mindset:

The “data product” should span the entire organisation and include anything that people in the company use to make decisions, i.e. all spreadsheets, all analytical tools used, all pieces of data flowing through people, platforms and processes…
The data team should have a clear vision and strategy for the data product being built, and it should be linked to revenue and efficiency-generating activities.
The data team should iterate with “customers” (i.e. co-workers) to improve the product.
Its “success” should be defined not by its output (e.g. dashboards) but by its business impact on the 3–5 KPIs that define your business (e.g. number of decisions it is enabling).
The data team should not try to serve the entire business too early on but should focus on the right “market” first (i.e. serve a few functions of the business at first before expanding).

The modern data stack — How should you structure & set-up your data & analytics stack? And when should you start doing so?

The modern “data stack” can provide great leverage to the data team if set up properly. The good news is that, unlike 10 years ago when analytics tech was only accessible to large companies with huge resources, anyone now has a number of awesome, user-friendly tools available off-the-shelf for a fraction of the cost. It’s not about the How anymore, but about the What.

Catalysed by the launch of Amazon Redshift in 2012, the advent of Massively Parallel Processing (MPP) and the shift from ETL to ELT (more details on the evolution of the ecosystem here), the modern data stack comprises 4 building blocks or layers, all now fully interoperable:

1. Ingestion with data pipeline services & ETL tools (e.g. Fivetran, Stitch): this step is about transporting the data from various sources (e.g. CRM, back-end, website etc.) to a storage medium (the warehouse); it does the E & L in ELT (Extract, Load, Transform).

2. Warehousing with Cloud-based data warehouses (e.g. Google BigQuery, Amazon Redshift, Snowflake): it’s about storing the data in a single place. Those tools can cost-effectively scale compute and storage resources with low latency; this allows data engineers to skip the preload transformations and load the organisation’s raw data straight into the data warehouse.

3. In-Warehouse Transformation with a data transformation tool (e.g. dbt): this is about transforming the data already loaded in the warehouse through data models written in SQL. It does the T in ELT (Extract, Load, Transform). This makes the data easier to analyse, e.g. by linking together different types of data.

4. Business Intelligence (BI) with tools such as Looker, Google Data Studio or Power BI: this is about building analytics reports on top of the processed business data layer. Basically dashboards with computed ratios!
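To make the four layers concrete, here is a minimal sketch of the ELT flow in Python. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the cloud warehouse, hard-coded rows stand in for what an ingestion tool would extract, and every table, column and function name (raw_customers, raw_invoices, customer_revenue, run_pipeline…) is hypothetical rather than taken from any of the tools above.

```python
import sqlite3

# Hypothetical raw records, standing in for what an ingestion tool
# would extract from a CRM and a billing system.
RAW_CUSTOMERS = [(1, "Acme", "ES"), (2, "Globex", "FR"), (3, "Initech", "ES")]
RAW_INVOICES = [(10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0)]


def load_raw(conn):
    """E & L: land the source data in the warehouse untouched."""
    conn.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE raw_invoices (id INTEGER, customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)", RAW_CUSTOMERS)
    conn.executemany("INSERT INTO raw_invoices VALUES (?, ?, ?)", RAW_INVOICES)


def transform(conn):
    """T: build a modelled table inside the warehouse, in SQL, linking two sources."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT c.id AS customer_id,
               c.name,
               c.country,
               COALESCE(SUM(i.amount), 0) AS revenue
        FROM raw_customers AS c
        LEFT JOIN raw_invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name, c.country
        """
    )


def revenue_by_country(conn):
    """BI: a report query that reads only the modelled layer."""
    rows = conn.execute(
        """
        SELECT country, SUM(revenue)
        FROM customer_revenue
        GROUP BY country
        ORDER BY country
        """
    ).fetchall()
    return dict(rows)


def run_pipeline():
    conn = sqlite3.connect(":memory:")  # stand-in for the cloud warehouse
    load_raw(conn)
    transform(conn)
    return revenue_by_country(conn)


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the design is that the raw tables are never modified: if a business definition changes, only the SQL in the transformation step is rewritten and re-run, and the report query keeps reading the same modelled table.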

All of those tools are increasingly well connected to each other, with native integrations being rolled out. It has thus become much easier to set up your data infrastructure. Yet there is a right time for everything!

As per Tristan Handy’s “Startup founder’s guide to Analytics”, for pre-Seed / Seed companies (<20 FTEs):

Do not over-engineer analytics! It might be too early for a data warehouse, a BI platform and complex analytics.
You can rely on the analytics functionalities and built-in reporting of your existing SaaS tools and CRM, or set up Google Analytics and Mixpanel. At this stage it is more about having a “clean” tech stack to do some data analysis on top.
Above all, what is key is to measure your product: it’s your product metrics that will help you iterate quickly in this phase. Before Series A, c. 80% of decisions can be taken based on less than 2–3 hours of analysis. Once again, no need to over-engineer analytics when you basically have no time at all!
You don’t necessarily need a “proper” data team: at this stage, it’s largely about infusing a “data mindset” into the entire organisation, ideally with most people able to run simple SQL queries to access information that might not be available on the dashboards. This will boost empowerment and efficiency. The last thing you want is for people to start annoying back-end developers with data requests.
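As an illustration of the kind of simple SQL request mentioned above, here is a hedged sketch in Python. The users table, its columns, the sample rows and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever database your team can actually query.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, channel TEXT, signup_date TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, "organic", "2021-08-30"),
            (2, "paid", "2021-09-02"),
            (3, "organic", "2021-09-05"),
            (4, "referral", "2021-09-07"),
            (5, "organic", "2021-09-10"),
        ],
    )
    return conn


def signups_by_channel(conn, since):
    """Count signups per acquisition channel from a given ISO date onwards."""
    query = """
        SELECT channel, COUNT(*) AS signups
        FROM users
        WHERE signup_date >= ?
        GROUP BY channel
        ORDER BY signups DESC, channel
    """
    # ISO-formatted dates (YYYY-MM-DD) compare correctly as plain strings.
    return conn.execute(query, (since,)).fetchall()


if __name__ == "__main__":
    for channel, signups in signups_by_channel(demo_connection(), "2021-09-01"):
        print(channel, signups)
```

A query of this size (one filter, one GROUP BY) is the kind of thing a non-technical colleague can learn to write and adapt, which is what keeps ad hoc questions away from the back-end developers.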

For Series A companies (>20–50 FTEs):

You can start setting up your data infrastructure. This means choosing a data warehouse, an ETL tool and a BI tool.
You can start building a team, beginning with a strong “Analytics Engineer” and/or a consultant. More on this crucial first hire below.

Finding the right data talent — How do you create the right data & analytics competency in your startup?

These new technologies are redefining the careers of data professionals. We have witnessed the birth of the “Analytics Engineer”: a technical analyst who applies software engineering best practices to the production and maintenance of analytics code.

This profile is different from your average data analyst (more engineering skills) and from your average data engineer (more curiosity for solving analytical business questions). It’s key that they are business-oriented people.

Your startup’s first data hire should be an Analytics Engineer who owns the entire data stack, sets solid foundations and starts building a team around them.

Then, as you hire more people, you can think along the lines of 2 development paths:

The Engineering path: Data engineers building custom-made data plumbing infrastructure in Python (when custom work is required) + Machine Learning engineers training & deploying models as data transformations.
The Analytics path: Analytics engineers building, documenting and maintaining core data sets + Data analysts performing frontline support and deep-dive work.

All in all, Enrique is convinced that you should not overlook the importance of data & analytics. Analytics should power decisions at every stage of a startup’s lifetime. Yet as with everything, it’s about proper sequencing: at Seed stage, data is essentially a tool to help decision-making, not necessarily a team of its own, since as always you have limited time & resources.

****

Check out dbt’s resources if you want to dig into some of the points: they are simply amazing. GitLab and Netlify are great references as well. More precisely, please find below some of the resources used:

At Samaipata, we are always looking for ways to improve. Do not hesitate to send us your thoughts. We strive to partner with early-stage founders and to support them in taking their business to the next level. Check out more ways in which we can help here or for all our other content here.

And as always, if you’re a European digital business founder looking for Seed funding, please send us your deck here or subscribe to our Quarterly updates here.


Contributors

Aurore Falque-Pierrotin