Theory First - The Multi-Armed Bandit Problem
Entrepreneurs manage exploration-exploitation tradeoffs. The best way to understand them is by exploring the multi-armed bandit problem, which is a classic problem in probability theory that showcases the complexity of decision-making. Imagine walking into a casino, and in front of you is a row of slot machines, each referred to as a 'one-armed bandit'. Now, let's say you've got a fixed number of coins to play with. The question then becomes: how do you leave the casino with more money in your pocket?
Each machine has its own unknown probability of payout. You could get lucky with the first machine, but there might be another one that pays out more. Do you stick with the first machine, or do you try your luck with the others? Finding an answer to those questions is the 'multi-armed bandit problem'.
Let's use three slot machines, Bandit A, B, and C, and generate some hypothetical payout data:
- Bandit A: [€2, €2, €2, €3, €2, €2, €2, €2, €3, €2]
- Bandit B: [€0, €0, €0, €0, €50, €0, €0, €0, €50, €0]
- Bandit C: [€1, €5, €1, €6, €1, €4, €1, €7, €1, €5]
Now, let's calculate the mean (average) and standard deviation for each bandit:
- Bandit A: Average = €2.2, varies little
- Bandit B: Average = €10, varies a lot
- Bandit C: Average = €3.2, varies more than Bandit A and less than Bandit B
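The figures above can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary and function names are illustrative, and it assumes the population standard deviation (treating the ten pulls as the full data set), which is one of several reasonable choices.

```python
from statistics import mean, pstdev

# Hypothetical payout data from the example above (in euros).
payouts = {
    "A": [2, 2, 2, 3, 2, 2, 2, 2, 3, 2],
    "B": [0, 0, 0, 0, 50, 0, 0, 0, 50, 0],
    "C": [1, 5, 1, 6, 1, 4, 1, 7, 1, 5],
}

def summarize(values):
    """Return (mean, population standard deviation) for a list of payouts."""
    return mean(values), pstdev(values)

summary = {name: summarize(values) for name, values in payouts.items()}

for name, (avg, sd) in summary.items():
    print(f"Bandit {name}: mean = €{avg:.2f}, standard deviation = €{sd:.2f}")
```

Bandit B has the highest mean but also by far the largest spread, which is why a handful of pulls says so little about it.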
How sure can you be that the observed average is the real average? Pulling each bandit only twice and calculating an average won’t help much (e.g. after the first two pulls in the data above, you would have concluded that Bandit B is the worst slot machine, even though it has the highest average).
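To get a feeling for how easily a small sample misleads, here is a rough sketch. It assumes that Bandit B pays out €50 independently on each pull with a probability of 20% (2 wins in the 10 observed pulls). That probability is an assumption taken from the example data, not a known property of the machine, and the function name is illustrative.

```python
# Assumption: Bandit B wins on 20% of pulls (2 out of the 10 observed pulls),
# and each pull is independent of the others.
WIN_PROBABILITY = 0.2

def chance_of_seeing_no_win(pulls, win_probability=WIN_PROBABILITY):
    """Probability that every one of `pulls` independent pulls pays nothing."""
    return (1 - win_probability) ** pulls

for pulls in (2, 5, 10, 20):
    chance = chance_of_seeing_no_win(pulls)
    print(f"{pulls:>2} pulls: {chance:.0%} chance of never seeing a payout")
```

Under this assumption, two pulls leave you with a 64% chance of writing off the machine with the highest average, and the risk only shrinks as you keep pulling.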
This demonstrates the need to evaluate the average outcome (mean), the variability of the outcome (standard deviation), and the certainty gained through enough experiments when making decisions under uncertainty, whether that means playing slot machines, playing blackjack, choosing the right marketing channel, or making strategic decisions in entrepreneurship.
Essentially, the multi-armed bandit problem presents a dilemma between exploration and exploitation. And it’s far closer to real life than one would guess when learning about it for the first time. 'Exploration' is about gathering more information, in this case, trying out different slot machines to see which one gives the best return. 'Exploitation' is about using the information you have to get the best possible outcome, like sticking with a machine you know gives decent payouts.
Running a start-up is a lot like standing in front of those slot machines. You have a finite amount of resources and an array of options on where to invest those resources for the best return. There are multiple start-ups you could build, and even once you’ve decided on a start-up, there are multiple customer groups you can target and multiple products you can build for those customer groups. These options could be different markets, product features, business strategies, and so on. 'Exploration' in this context means trying out new options, like entering a new market or changing the product. 'Exploitation' is about focusing on known options that work, such as investing in a successful product or doubling down on a profitable market.
If you do too much exploration, you spread your resources too thin, potentially missing out on capitalizing on what's already working well. But if you do too much exploitation, you might miss out on new opportunities that could be even more rewarding. In entrepreneurship, as in the multi-armed bandit problem, the goal is to find the right balance between exploration and exploitation to maximize return on your resources.
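One simple way to make this balance concrete is the epsilon-greedy strategy: with a small probability (epsilon) you explore a random machine, and otherwise you exploit the machine with the best average so far. The sketch below is illustrative only. It is one of many bandit strategies, the names are made up for this example, and it assumes each machine's future payouts look like random draws from the ten hypothetical payouts listed earlier.

```python
import random

# Hypothetical payout data from the example above (in euros).
PAYOUTS = {
    "A": [2, 2, 2, 3, 2, 2, 2, 2, 3, 2],
    "B": [0, 0, 0, 0, 50, 0, 0, 0, 50, 0],
    "C": [1, 5, 1, 6, 1, 4, 1, 7, 1, 5],
}

def epsilon_greedy(payouts, pulls, epsilon, seed=0):
    """Play `pulls` rounds; explore with probability `epsilon`, else exploit.

    Returns the total reward and the number of pulls per machine.
    """
    rng = random.Random(seed)
    names = list(payouts)
    counts = {name: 0 for name in names}
    totals = {name: 0.0 for name in names}

    for _ in range(pulls):
        untried = [name for name in names if counts[name] == 0]
        if untried:
            choice = untried[0]          # try every machine at least once
        elif rng.random() < epsilon:
            choice = rng.choice(names)   # explore: pick any machine at random
        else:
            # exploit: pick the machine with the best observed average
            choice = max(names, key=lambda n: totals[n] / counts[n])

        # Assumption: a pull is a random draw from the observed payouts.
        reward = rng.choice(payouts[choice])
        counts[choice] += 1
        totals[choice] += reward

    return sum(totals.values()), counts

for epsilon in (0.0, 0.1, 0.5):
    total, counts = epsilon_greedy(PAYOUTS, pulls=1000, epsilon=epsilon, seed=42)
    print(f"epsilon={epsilon}: total reward = €{total:.0f}, pulls = {counts}")
```

Setting epsilon to 0 corresponds to pure exploitation after one pull per machine, while a high epsilon keeps spending pulls on machines that are already known to be worse. Trying different values and seeds shows how both extremes can cost you.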
Most entrepreneurs enter the market with an exploitation mindset. They are already convinced that their product will sell like wildfire, and they just want to push their new technology or product into the market as soon as possible. The multi-armed bandit problem is a nice mental model that allows you to understand how detrimental this can be. Exploration often doesn’t feel productive because you are not making more sales or winning lots of new customers, but it is the most productive thing you can possibly do when starting your venture.
Founders Managing Exploration-Exploitation Trade-Offs
I’ve noticed that founders face these trade-offs all the time. They come in all shapes and forms along the founding journey, whether this is about the employees you hire or finding product-market fit. You need to explore for a while, decide when you’ve explored enough, and then focus on exploitation until another exploration-exploitation trade-off manifests itself. What follows is a typical founder journey described by means of those trade-offs.
Keep in mind that the ultimate question of any of those trade-offs is always: When do I stop exploring and start exploiting? In this article, I want to give some useful answers, some gut feeling, some heuristics that help you with your considerations on “optimal stopping”.
The First Tradeoff: To Start or Not to Start
The very first question great founders encounter is whether to start a company or not. I always encourage them to think differently and ask a different question. I have written an article about this dilemma which is linked here. In this article, I want to explore this question from a different angle: Is starting a start-up really for you? The only way to find out is to explore. The payoff is incredibly high, but without enough exploration, you don’t know if you have the resilience and capabilities to run one - and, more importantly, whether you enjoy doing it. Only when you’ve explored enough can you be certain that building a start-up is for you and switch to exploitation mode, which is, in this case, building one start-up first and giving it your all.
The Second Tradeoff: Problem-Solution Fit
Soon enough, once you’ve decided to build a start-up, the second trade-off presents itself: can you find a fitting solution to the problem you’re investigating? You can explore different problems and different solutions. It doesn’t matter where you start. I’ve seen great founders think of a solution or “tech” first. Max Levchin, the co-founder of PayPal and Affirm, built both of these companies by getting excited about tech. PayPal really was a cryptography business in the beginning, and Affirm was built out of his excitement about AI, which required multiple pivots. Pivoting multiple times is very typical when you start solution-first. Other founders focus purely on the problem and want to find the optimal solution in relation to the problem they are excited about. A good example of such a founder is Elon Musk, who’s known for mainly describing the problems his start-ups solve rather than focussing on the solution. In the end, all people care about is getting their problem fixed, so eventually your solution will have to fit a problem. As Theodore Levitt put it:
“People don't want a quarter-inch drill bit; they want a quarter-inch hole.”
No matter which strategy you pick, before you’ve found problem-solution fit there is nothing you can do but explore. Different problem-solution pairs will have different potential, and it is your job to find one of the best ones before starting to exploit. Unlike in the multi-armed bandit problem, you cannot calculate statistical significance here, as the nature of your problem is qualitative. But you can rely on the following useful proxies:
- Enthusiastic user/customer feedback, e.g. during exploration calls
- Organic growth of visitors on your website without active marketing
- Willingness to pay, i.e. customers are asking you when they can pay for your solution and how much it costs.
- MVP traction or positive feedback on a quick prototype you’ve built
- Expert validation, i.e. industry experts and investors telling you this is a great idea
- Market demand signals such as inbound inquiries or high engagement rates with your marketing content related to the problem you're solving.
- Competitive advantage, i.e. there is demand as indicated by competitors but you offer a clear differentiation from existing solutions that users can easily articulate
- Scalability potential, i.e. early indicators that your solution can address the problem for a broader market without significant changes (note: this is more a question of whether you actually want to build a start-up around this problem-solution fit than a question of problem-solution fit itself)
The Third Tradeoff: Product-Market Fit
Knowing that your solution fixes a problem isn’t enough. If you haven’t read about PMF, I encourage you to read Marc Andreessen’s phenomenal article about the subject. In short, you should explore until you know your product sells like wildfire. This can be indicated by (with some overlap with problem-solution fit):
- Exponential user growth, i.e. a rapid, often viral increase in user and customer acquisition without proportional marketing spend.
- Customer pull, i.e. prospects actively seeking out your product, sometimes even before you've officially launched in their market.
- Expanding use cases, i.e. users and customers finding innovative ways to apply your product beyond its initial intended purpose.
- Overwhelming demand, i.e. struggling to keep up with customer requests or server load due to high traffic.
- Industry buzz, i.e. unsolicited media coverage and industry influencers talking about your product leading to more sales
- Metrics exceeding benchmarks, i.e. key performance indicators significantly outperforming industry standards for your stage.
- Customer evangelism, i.e. users becoming vocal advocates, creating content, and actively promoting your product.
- Competitor reaction, i.e. established players in the market starting to respond to your presence or copy your features.
- Revenue acceleration, i.e. a sharp uptick in revenue growth rate, often accompanied by improving unit economics.
- Market leadership, i.e. emerging as the go-to solution in your niche, even if you're not the first mover.
As put in the article, this is how you feel it:
“You can always feel when product/market fit isn’t happening. The customers aren’t quite getting value out of the product, word of mouth isn’t spreading, usage isn’t growing that fast, press reviews are kind of “blah”, the sales cycle takes too long, and lots of deals never close.
And you can always feel product/market fit when it’s happening. The customers are buying the product just as fast as you can make it—or usage is growing just as fast as you can add more servers. Money from customers is piling up in your company checking account. You’re hiring sales and customer support staff as fast as you can. Reporters are calling because they’ve heard about your hot new thing and they want to talk to you about it. You start getting entrepreneur of the year awards from Harvard Business School. Investment bankers are staking out your house. You could eat free for a year at Buck’s.”
The Fourth Tradeoff: Channel Fit
Brian Balfour found that distribution follows a power law. You will likely get most of your growth from one channel, by far. Without exploring enough channels you might miss out on an immense growth opportunity that lets you outperform all of your competitors. I believe this is true for many start-ups: they home in on one channel that is good enough, without considering that others might be profoundly better.
He writes:
“If you look at most $100M+ companies, you will find this to be true:
UGC SEO: TripAdvisor, Yelp, Glassdoor, Pinterest, Houzz all got 70% of their growth from UGC SEO.
Virality: WhatsApp, Evernote, Dropbox, Slack all got 70%+ of their growth from some form of virality.
Paid Marketing: Supercell, Squarespace, Blue Apron all got 70%+ of their growth from some form of paid marketing.
The power law of distribution exists because of the concept of product channel fit. Companies that are able to achieve product channel fit with multiple channels are rare, but end up being monsters. LinkedIn is the perfect example where over time they've achieved Product Channel Fit with Virality, UGC SEO, and different forms of Inbound and Outbound Sales.”
Unlike problem-solution fit and product-market fit, product-channel fit can actually be explored quite quantitatively, and you can use statistical significance as a means of defining when to stop. However, here are some further considerations to power your gut feeling:
- The key is disproportionate ROI, i.e. the channel consistently delivers a significantly higher return on investment compared to other channels you've tried, as expected under a power law
- Scalability, i.e. as you increase resources (time, money, effort) into this channel, you see consistent or improving returns without diminishing effects.
- Natural product-channel alignment, i.e. you notice your product's core features or benefits align seamlessly with the channel's strengths and user behaviour.
- User acquisition velocity, i.e. the speed at which you acquire new users through this channel outpaces other channels by a significant margin.
- Lower customer acquisition costs (CAC) coming from this channel
- Higher customer lifetime value (LTV) from users acquired through this channel (remember that different channels address different ICPs, sometimes without you actively noticing)
- In the best case: Resistance to market changes, i.e. you believe the channel's effectiveness remains stable even as market conditions or competitors' strategies shift.
While focusing on your dominant channel is important, it's also key to keep exploring other channels. Market dynamics can change, and having a diversified approach, even if heavily weighted towards one channel, provides resilience and new growth opportunities.
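As a rough illustration of the quantitative side, the sketch below compares the conversion rates of two channels with a two-proportion z-test. This is one common way to check statistical significance, not the only one. The function name, the channel numbers, and the 5% threshold are all hypothetical and chosen purely for illustration.

```python
import math

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the hypothesis that both channels are equal.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        raise ValueError("Need at least one conversion and one non-conversion.")
    z = (rate_a - rate_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1,000 visitors sent through each channel.
z, p_value = two_proportion_z_test(
    conversions_a=50, visitors_a=1000,   # e.g. a paid-marketing test
    conversions_b=30, visitors_b=1000,   # e.g. an SEO test
)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:  # assumed significance threshold
    print("The difference is unlikely to be noise; consider doubling down.")
else:
    print("Not enough evidence yet; keep exploring.")
```

A test like this only tells you whether the difference in conversion rates is likely to be real. It does not account for the cost of each channel or the lifetime value of the customers it brings, so it complements the considerations above rather than replacing them.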
The Fifth Tradeoff: Organisational Growth
You need to constantly explore when hiring new people. First, you need to conduct enough interviews before you make a hire, and then you need to observe people for long enough before you decide whether they are a fit.
The golden rule in hiring is: Do at least 10 interviews for every position and give everyone a 3-month probationary period in which you can terminate the employment relationship within 1 week. You can never get it right just through interviewing. A good founder gets it right in ~50% of all cases and a brilliant one in ~80% of all cases. However, you will have to let people go who don’t perform in order to get the percentage of the high performers in your organisation close to 100%. The main question you should ask yourself after a 3-month probation period is: Would I hire this person again with everything I know about them now? If the answer is no: You need to let them go.
In essence, whenever you are hiring, you are managing exploration-exploitation trade-offs. The quality of your results is going to be defined by the balance you strike between exploration and exploitation, yet the exploration part is widely disregarded, especially by first-time founders building their first companies.
The Omnipresence of Exploration-Exploitation Trade-Offs
Even once you grow beyond 100 employees and have entered full growth mode, exploration-exploitation trade-offs will remain omnipresent. You will ponder:
- When to launch new products and features
- How to allocate resources between improving existing products and developing new ones
- When to enter new industries and expand your ICP
- Whether to expand into new geographies
- When to pivot your business model or revenue streams
- How to balance between organic growth and growth through acquisitions
- When to experiment with new marketing channels versus optimising existing ones
- Whether to develop in-house capabilities or outsource/partner for new competencies
- How to strike a balance between maintaining company culture and adapting to scale
If you manage those trade-offs well, you likely won’t have to pivot. I believe most founders go into exploitation-mode way too early and this leads to failure and pivots that could have been prevented early on. In short, being aware of those trade-offs and building useful mental models and heuristics that help you determine stopping points is one of the core skills you have to master as a founder. I personally think about these trade-offs all the time, and this led to building a multi-hundred-million € business in a few years. I am convinced that developing this muscle will provide you with an unfair advantage.