(2012-10-30) Scaling Agile at Spotify

Henrik Kniberg: Scaling Agile at Spotify (PDF). Dealing with multiple teams in a product development organization is always a challenge! One of the most impressive examples we’ve seen so far is Spotify, which has kept an agile mindset despite having scaled to over 30 teams across 3 cities.

Alistair Cockburn (one of the founding fathers of agile software development) visited Spotify and said “Nice - I've been looking for someone to implement this matrix format since 1992 :) so it is really welcome to see.”

Squads

The basic unit of development at Spotify is the Squad. (Agile Squad)

designed to feel like a mini-startup

a self-organizing team that decides its own way of working (whole team)

Each squad has a long-term mission such as building and improving the Android client, creating the Spotify radio experience, scaling the backend systems, or providing payment solutions

Squads are encouraged to apply Lean Startup principles

Because each squad sticks with one mission and one part of the product for a long time, they can really become experts in that area

To promote learning and innovation, each squad is encouraged to spend roughly 10% of their time on “hack days”

A squad doesn’t have a formally appointed squad leader, but it does have a product owner (product manager)... responsible for prioritizing the work to be done by the team, but is not involved with how they do their work

The product owners of different squads collaborate with each other to maintain a high-level roadmap document that shows where Spotify as a whole is heading, and each product owner is responsible for maintaining a matching product backlog for their squad.

Ideally each squad is fully autonomous with direct contact with their stakeholders, and no blocking dependencies to other squads. Basically a mini-startup. With over 30 teams, that is a challenge!

To aid in this, we run a quarterly survey with each squad. This helps focus our improvement efforts and find out what kind of organizational support is needed

Tribes

A tribe is a collection of squads that work in related areas – such as the music player, or backend infrastructure.

The tribe can be seen as the “incubator” for the squad mini-startups, and has a fair degree of freedom and autonomy

Tribes are sized based on the concept of the “Dunbar number”

Tribes hold gatherings on a regular basis, an informal get-together where they show the rest of the tribe (or whoever shows up) what they are working on, what they have delivered and what others can learn from what they are currently doing

Squad dependencies

there will always be dependencies

we regularly ask all our squads which other squads they depend on, and to what extent those dependencies are blocking or slowing the squad down

We then discuss ways to eliminate the problematic dependencies, especially blocking and cross-tribe dependencies. This often leads to reprioritization, reorganization, architectural changes or technical solutions
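The dependency survey described above could be recorded and triaged roughly as follows. This is only a sketch of the idea, not Spotify's actual tooling: the `Dependency` record, the severity labels ("none", "slowing", "blocking"), the squad and tribe names, and the `problematic` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    squad: str       # squad reporting the dependency
    depends_on: str  # squad it depends on
    severity: str    # hypothetical labels: "none", "slowing", "blocking"


def problematic(deps, tribe_of):
    """Return blocking dependencies and cross-tribe ones that slow a squad down.

    Blocking dependencies are listed first; the sort is stable, so the
    original survey order is kept within each group.
    """
    flagged = [
        d for d in deps
        if d.severity == "blocking"
        or (d.severity != "none" and tribe_of[d.squad] != tribe_of[d.depends_on])
    ]
    return sorted(flagged, key=lambda d: d.severity != "blocking")


# Hypothetical survey answers and squad-to-tribe mapping.
tribe_of = {"android": "clients", "radio": "clients", "payments": "infrastructure"}
deps = [
    Dependency("radio", "android", "slowing"),    # same tribe, only slowing
    Dependency("radio", "payments", "slowing"),   # crosses tribes
    Dependency("android", "radio", "blocking"),   # blocking
]
to_discuss = problematic(deps, tribe_of)
```

Here `to_discuss` holds the blocking dependency followed by the cross-tribe one; the same-tribe "slowing" entry is left out, which mirrors the emphasis on blocking and cross-tribe dependencies.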

Scrum has a practice called “scrum of scrums”, a synchronization meeting where one person from each team meets to discuss dependencies. We don’t usually do scrum of scrums at Spotify, mainly because most of the squads are fairly independent and don’t need such a coordination meeting. Instead, scrum of scrums happens “on demand”. For example we recently had a large project that required the coordinated work of multiple squads for a few months.

At Spotify there is a separate operations team, but their job is not to make releases for the squads - their job is to give the squads the support they need to release code themselves; support in the form of infrastructure, scripts, and routines. They are, in a sense, “building the road to production”.

Chapters and guilds

There is a downside to everything, and the potential downside to full autonomy is a loss of economies of scale. The tester in squad A may be wrestling with a problem that the tester in squad B solved last week

The chapter is your small family of people having similar skills and working within the same general competency area, within the same tribe

The chapter lead is line manager for his chapter members, with all the traditional responsibilities such as developing people, setting salaries, etc. However, the chapter lead is also part of a squad and is involved in the day-to-day work. (I don't think I agree with this, since this line manager doesn't work that closely with the people in other squads. I prefer Product Team Members Report To The Team Leader.)

A Guild is a more organic and wide-reaching “community of interest”, a group of people that want to share knowledge, tools, code, and practices. Chapters are always local to a Tribe, while a guild usually cuts across the whole organization. (cf community of practice)

A guild often includes all the chapters working in that area and their members
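The difference in scope between chapters and guilds can be made concrete with a small model. This is a simplified sketch under stated assumptions: the `Person` record, the names, and the skill labels are hypothetical, and a guild is reduced here to "everyone with that skill", whereas a real guild is a voluntary community of interest that anyone may join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    squad: str
    tribe: str
    skill: str


def chapters(people):
    """Group people by (tribe, skill): a chapter never spans tribes."""
    groups = {}
    for p in people:
        groups.setdefault((p.tribe, p.skill), []).append(p.name)
    return groups


def guild(people, skill):
    """Simplified guild: members of every chapter with this skill, across all tribes."""
    return [p.name for p in people if p.skill == skill]


# Hypothetical staff list.
people = [
    Person("Ann", "A", "player", "testing"),
    Person("Bo", "B", "player", "testing"),
    Person("Cy", "C", "infrastructure", "testing"),
    Person("Di", "A", "player", "web"),
]
by_chapter = chapters(people)
testing_guild = guild(people, "testing")
```

Ann and Bo sit in different squads but share the testing chapter of the "player" tribe, while Cy belongs to a separate testing chapter in "infrastructure"; all three end up in the testing guild.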

Wait a sec, isn’t this just a matrix org? (matrix management)

Yes. Well, sort of. It’s a different type of matrix than what most of us are used to though

Our matrix is weighted towards delivery

people are grouped into stable co-located squads, where people with different skill sets collaborate and self-organize to deliver a great product. That’s the vertical dimension in the matrix, and it is the primary one since that is how people are physically grouped and where they spend most of their time.

The horizontal dimension is for sharing knowledge, tools, and code. The job of the chapter lead is to facilitate and support this

This matches the “professor and entrepreneur” model recommended by Mary Poppendieck and Tom Poppendieck. The PO is the “entrepreneur” or “product champion”, focusing on delivering a great product, while the chapter lead is the “professor” or “competency leader”, focusing on technical excellence.

What about architecture?

We have over 100 distinct systems, and each can be maintained and deployed separately. This includes backend services such as playlist management or search or payment, and clients such as the iPad player, and specific components such as the radio, or the “what’s new” section of the music player.

Technically, anyone is allowed to edit any system. Since the squads are effectively feature teams, they normally need to update multiple systems to get a new feature into production

The risk with this model is that the architecture of a system gets messed up if nobody focuses on the integrity of the system as a whole.

To mitigate this risk, we have a role called “System Owner”. All systems have a system owner, or a pair of system owners (we encourage pairing). For operationally critical systems, the System Owner is a Dev-Ops pair – that is, one person with a developer perspective and one person with an operations perspective.

The System Owner is not a bottleneck or ivory tower architect (architecture astronaut)

He is typically a squad member or chapter lead

Normally we try to keep this system ownership to less than a tenth of a person’s time

We also have a chief architect role, a person who coordinates work on high-level architectural issues that cut across multiple systems.

The chief architect’s feedback is always just suggestions and input - the decision for the final design of the system still lies with the squad building it.

over 3 years we have grown from 30 to 250 people in tech

