TL;DR: Every company in the business of building software needs to invest in software infrastructure, but building and maintaining infrastructure that provides value and is cost-effective is challenging. I've spent a lot of years as a software engineer thinking about this problem. Now I advise Bit Complete's clients on infrastructure and I've found myself needing to distill some of my thinking on the topic.
There's been a lot of talk about investing in infrastructure recently, but to be concrete, what exactly is infrastructure? At a societal level it refers to things like mass transit, power plants, and the water supply: so-called "hard" infrastructure. But there is also "soft" infrastructure, the institutions and organizational structures that are equally important to the functioning of society: schools, law enforcement, and parks.
From these, it's pretty easy to draw analogs for software. Things like continuous integration/deployment, observability systems, and application frameworks (all hard), along with code review processes, onboarding training, and on-call rotations (all soft), constitute software infrastructure.
Based on analogies in traditional infrastructure, here is a framework that can be applied when making decisions about investments in software infrastructure.
Align your investment with your business goals
When it comes to infrastructure investment, software or otherwise, the key question is: what is its purpose? Infrastructure isn't an end in itself; it's there to facilitate some other objective. The societal examples above might promote economic growth, improve the wellness of the citizenry, or protect society from some kind of harm. The relative investment in different kinds of infrastructure should reflect societal goals.
Analogously, software infrastructure investments should align with the objectives you have for your business. Maybe you're kicking off a large marketing campaign and site reliability is paramount to avoid wasting ad dollars. Or perhaps your engineering team size is expected to increase dramatically and therefore you need to ensure the effectiveness of your onboarding process. When you're deciding whether to invest in a project, make sure you understand how it will help you achieve your goals.
For example, Thumbtack kicked off its "mobile first" initiative in 2018 to capitalize on the significantly higher customer lifetime value of mobile users over web users. Unfortunately our mobile API was in rough shape because of our historic focus on the website, and so developing features was difficult, and the quality of our mobile experience wasn't where it should be. To address this I led a project to rebuild our API and surrounding infrastructure from the ground up. The result was a much more consistent and stable API, easier cross-platform feature development and experimentation, and a higher quality mobile product overall.
As software engineers, thinking about our company objectives helped us identify this as an impactful project. And for the leadership team, investing in this area was a no-brainer, since they had already established mobile as a priority.
Invest early ... but not too early
So when is the right time to invest in infrastructure? Intuitively it's easy to understand that building an airport and a subway system for a town of 800 is probably not the best use of public money. The same is true with software. In the early stages of a product's development, you're just trying to figure out the market fit. Scaling out your serving capacity or designing a detailed career ladder at this stage is a bit like building a highway to nowhere, and the opportunities lost to building and carrying unnecessary infrastructure may be significant.
On the flip side, failing to invest in critical infrastructure can lead to other kinds of bad outcomes. The current COVID-19 pandemic is an object lesson in the costs of failing to act early and decisively when confronted with exponential growth. In the context of a business this can manifest as growing pains, as shown below:
Note that "business size" may refer to one of a variety of dimensions: revenue, users, employees, etc. The key point is that infrastructure investment should be commensurate with the scale of the business.
Although it's uncommon for businesses to invest too early in infrastructure in aggregate, it's all too common to invest in the wrong kinds of infrastructure. Thus it's important to remember to align infrastructure projects with business goals as discussed above. As long as you're disciplined in this way, opportunities lost to over-investment should be rare.
Late infrastructure investment is much more common, and much more pernicious. Just like the proverbial frog in boiling water, growing pains caused by under-investment in software infrastructure can be difficult to perceive. Some symptoms include:
- Grumbling from engineers about how difficult certain basic tasks are, or about the complexity of particular parts of the system
- Persistent bugs having a material effect on product quality that simply cannot be fixed by the team, or that get fixed but frequently suffer regressions
- An uptick in outages or other system-wide issues
At Thumbtack, one of the core components of the customer/pro matching system is the Scheduling Service. When you use the app to find pros, this service is responsible for figuring out which pros are available for the times you specified. A couple of years ago, this service was plagued by all of the symptoms described above: it was difficult to build on, it had data consistency issues that were hard to track down, and it paged the on-call at least once a week. The team and I prioritized infrastructure work to get the service passing muster, after which it dropped off the radar completely.
Knowing when to dedicate resources to infrastructure is often as simple as paying attention to who's getting paged, which projects are late, and what parts of the product feel flaky.
Let ideas bubble up, then invest top down
In the public sector we rely on officials to make large infrastructure investments on our behalf, because (we hope) they have the information they need to make good decisions, and the resources to put money behind those decisions. But those officials aren’t usually directly identifying the problems and dreaming up solutions; they rely on public input for that.
This lesson applies well to the software domain. Engineering leadership must rely on those on the ground to get a handle on the variety and magnitude of the problems. But then they must be deliberate in investing based on that input. Hoping that local engineering teams will fix global problems is a bit like hoping that the citizenry will spontaneously organize and assemble a water treatment plant.
On the other hand, if engineering leaders push for infrastructure that lacks support on the ground, the project is likely to fail. There’s a two-way dynamic at play here: the ideas need to percolate from the bottom up, but the investment must come from the top down. Infrastructure projects that have both a broad base of support and a commitment from leaders are much more likely to be successful.
By way of example, in late 2016 YouTube's monolithic application server, which had served us well for a decade of intense growth, was showing its age. It had performance problems, it was expensive to run, the release processes had become unmanageable, and the code was difficult for new engineers to work with. These were all grassroots observations. Leaders in the engineering organization decided to invest in migrating critical parts of YouTube's application logic to new serving infrastructure to address these issues.
Fast forward to today: many of the high-traffic and user-critical flows have been moved out to other services, where they are easier to develop and release and are more performant and efficient. By many measures this was a successful project, and a big part of that success had to do with the way leadership deliberately and continuously invested in this area.
Not all infrastructure projects are quite as dramatic as YouTube's frontend migration, but few non-trivial projects will get off the ground without following this "ideas up, invest down" approach.
Next steps
In this article I've described a framework for thinking about the what, when, and how of software infrastructure investment:
- What you invest in should align with your business goals
- When symptoms demonstrate a real need, it's time to invest
- How you invest matters: listen to ideas, then commit real resources
If you'd like to apply this framework at your company but you're not quite sure where to start, Bit Complete can help.