Your Most Recent Fire Is Not Your Biggest | by Nicholas Kwan | Aug, 2022

How to tell if it is and what to do about it


Working at a startup can sometimes feel like moving from one fire to the next. You never catch a break, and often multiple fires are going on at the same time. The constant firefighting can cause your team to feel like they are spinning their wheels and playing catch-up all the time.

In this article, I will go over the lessons I learned in the last few years and how I set up some structure to prevent the team from being overwhelmed by external requests.

The most telling symptom of constant firefighting is that the sprint items added during planning never get done when you expect them to. In the worst case, no one even has time to start on a story!

Depending on the stage of your company and your team’s mandate, this can be expected. However, in my experience, a development team that spends most of its effort supporting existing software is challenging to manage, and turnover will inevitably be high. High turnover, in turn, lengthens the time between raising a bug and fixing it.

For this article, we will assume that it is a big deal that your team cannot work on new features and make proactive improvements to tech debt. I have seen the inability to handle unexpected items on multiple teams, and the reaction to these items can vary from “let’s all jump into a meeting” to “let’s ignore it until the next sprint.” Both of these extremes will create further issues down the road.

If everything is urgent, nothing is urgent. — Patrick Lencioni

You have probably heard this cliche before. I repeat it to my team often. Still, different items are urgent to different people, so your team will need to agree on some guidelines before moving forward.

Let’s review the steps and how they can be integrated into your team. This assumes the team is currently using some flavour of agile with sprints.

  1. Form a consensus on what constitutes a Priority 1 (P1) issue and a Priority 2 (P2) issue
  2. Form a process so that new tickets are triaged by a subset of the team, not the whole team
  3. Abstract the inner workings of the team’s high-priority defect triage process from other teams
  4. Agree as a team on how much tech debt should be prioritized so urgent P1s and P2s do not overwhelm the team
  5. If P1s get out of control, consider a bug-fixing sprint to tackle P2s and high-priority tech debt items to decrease the chances of P1s surfacing
  6. Once the team understands what should be pulled into a sprint outside of planning, allow anyone to pull in a ticket, but make sure they mention it during daily scrum or stand-up

As a starting point, get the team to agree on what constitutes high priority. For me, a high-priority ticket is something that prevents a high number of customers from accessing basic functionality. The classic example of a P1 is that no users can log in.

As you move down the urgency and scope scale, you will have to draw a cutoff somewhere. The cutoff can be a function of how many users are affected and how severe the impact is. For example, if some members cannot use a key feature of a product, this would be a higher priority than if all members are unable to use a minor feature. To take it one step further, I would argue that neither of these issues is high priority.
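
To make the cutoff concrete, here is a minimal sketch of how such a rule could be written down. The `Impact` scale, the `Issue` fields, and the 50% threshold are all assumptions for illustration; your team would pick its own categories and numbers.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """How central the broken functionality is (hypothetical scale)."""
    MINOR = 1   # a minor feature
    KEY = 2     # a key feature of the product
    BASIC = 3   # basic functionality, such as logging in


@dataclass(frozen=True)
class Issue:
    title: str
    impact: Impact
    fraction_affected: float  # share of users affected, from 0.0 to 1.0


# Assumed cutoff for "a high number of customers"; tune this per team.
HIGH_FRACTION = 0.5


def is_high_priority(issue: Issue) -> bool:
    """True only when basic functionality is broken for many users."""
    return issue.impact is Impact.BASIC and issue.fraction_affected >= HIGH_FRACTION


def sort_key(issue: Issue) -> tuple[int, float]:
    """Rank by impact first, then by the share of users affected."""
    return (issue.impact, issue.fraction_affected)


issues = [
    Issue("All members cannot use a minor feature", Impact.MINOR, 1.0),
    Issue("Some members cannot use a key feature", Impact.KEY, 0.2),
    Issue("No users can log in", Impact.BASIC, 1.0),
]

# Highest priority first.
backlog = sorted(issues, key=sort_key, reverse=True)
urgent = [issue.title for issue in backlog if is_high_priority(issue)]
```

With these assumed numbers, only the login outage lands in `urgent`, and the key-feature issue still ranks above the minor-feature one in `backlog`, which matches the ordering argued for above.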

Next, have people outside the team log a ticket against the team. This closes many of the backdoors that cause requests to get lost. At first, this might seem too process-heavy, as engineers from different teams can quickly work out minor issues without involving others.

However, this can overwhelm engineers who have knowledge in many areas of the product. Instead, having a constant sprint backlog of items in priority order will remove bottlenecks on specific team members. At the same time, the product and the manager will have clearer visibility of what type of issues are affecting the team.

Once the team has a good process for handling incoming issues during the sprint, it is time to determine whether the amount of time dedicated to these issues is what the team wants. Of course, the amount of time will vary greatly depending on the product type and how much of it is legacy. The following sections cover some tricks for handling this.

As a side note, if your team is using scrum and it doesn’t seem to be working, this is a great time to revisit how your processes can be modified to handle interruptions. I will not digress into agile and its philosophies here. Instead, I will cover the topic in another article: transitioning from scrum to kanban is one way to stop constantly moving tickets from one sprint to the next because of unexpected mid-sprint items.

When the CEO and CTO work closely with the product development team, many individual contributors will jump on the most recent thing an executive mentions, however small or large it is.

I have seen something as trivial as an error message that had been in the logs for an extended period get prioritized by devs because the CTO noticed and mentioned it while investigating another item.

At an early-stage startup, the gatekeeper of this process will often have to make a lot of tough choices about turning down some fantastic features to create even better ones. Also, the team will have to hold off on some critical tech debt items from time to time. The one making the final call will need to know enough about the customers’ current issues, the tech stack and the development pain points, and future impactful features to pick and choose.

When I was the only engineering manager at the company, the CTO and I were constantly discussing prioritization for our engineering teams. We made many of these decisions together, with him having the final verdict if we disagreed. However, once we got into a rhythm, our decision on whether to pull something into a sprint or not was almost always the same.

As companies move beyond one team, some customer-facing problems will be difficult to trace to a specific part of the tech stack. Some companies have a centralized place to triage bugs, but no single team can know the inner workings of a product that dozens, if not hundreds, of engineers work on.

I have not come across a silver bullet to handle issues for large systems because the flow of data and the architecture of large systems all have unique challenges. However, I have found that the teams’ compositions play a large part.

Here are some of the ways I have seen this handled:

  1. Place a buffer team between the incoming issues and the team developing the product (aka support team vs product team)
  2. Add the necessary flex room to account for disruptions, such as only filling a sprint to 70% capacity
  3. Rotate the responsibility between teams or individual members, so the disruption is concentrated and the load is split evenly
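
The second option is simple arithmetic. As a rough sketch, assuming the team estimates in story points and knows its average velocity (the function name and the 40-point velocity are made up for illustration):

```python
def plannable_points(average_velocity: int, fill_percent: int = 70) -> int:
    """Return how many story points to commit to during planning.

    The rest of the sprint is left free for unplanned work.
    """
    if not 0 <= fill_percent <= 100:
        raise ValueError("fill_percent must be between 0 and 100")
    # Integer arithmetic; the result rounds down to whole points.
    return average_velocity * fill_percent // 100


committed = plannable_points(40)  # points planned for the sprint
reserved = 40 - committed         # points kept free for interruptions
```

For a hypothetical velocity of 40 points, this commits 28 points and leaves 12 for disruptions.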

I have found the third one to be the most productive in most situations. Not having a dedicated resource can sometimes leave high-priority fixes unattended, as everyone else is already occupied with in-progress items. The rotation also lets every member experience the pain points in the system, which leads to more fruitful discussions amongst the team(s) about addressing tech debt items.

Even if a product spans multiple teams, the rotation can still work if the teams have a working agreement about how top-level issues are triaged amongst the teams.
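
A rotation can be as simple as a round-robin schedule. The sketch below assumes one person is on interrupt duty per sprint; the team names are placeholders, and the same idea works if you rotate whole teams instead of individuals.

```python
from itertools import cycle, islice


def rotation_schedule(members: list[str], sprints: int) -> list[tuple[int, str]]:
    """Pair each sprint number (starting at 1) with the member on interrupt duty."""
    on_duty = islice(cycle(members), sprints)
    return list(enumerate(on_duty, start=1))


team = ["Ana", "Ben", "Chen"]  # hypothetical team members
schedule = rotation_schedule(team, 5)

for sprint, member in schedule:
    print(f"Sprint {sprint}: {member} handles incoming issues")
```

Everyone else stays on planned work, and each member takes a turn in the same order, so the load evens out over time.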

As the company grows, CEOs and other executives will become more distant from the day-to-day operations. However, they will continue to ask for specific features or bugs to be handled. Still, stay true to your new process with clear explanations about the decisions made. In most cases, the decision and rationale must be communicated to the executive in person or through email since they are unlikely to read through sprint tickets.

In this article, I went over some of my experiences ensuring that the team does not spend all their time responding to issues. As with all software engineering, your trade-offs depend on the team and the product.

When your team adopts a stricter way to control what can enter a sprint, designate a driver who will take the responsibility of making a judgment call on whether something should enter a sprint. Depending on how your team makes decisions about ticket priority, this could be the engineering manager, product manager, or Scrum Master.

Gather feedback from the team about the decisions made. With a strong process and practice, the team will learn to triage these distractions quickly and effectively without compromising the velocity or the customer’s needs. Distractions that pull the team away from planned work come in all shapes and forms; your team should be able to develop a good idea of what’s urgent and what’s not.

In my next article, I will talk about how our team transitioned our agile process from a scrum-inspired one to a kanban-inspired one, which further alleviated the problem of letting the most recent issue or outage derail the team’s work in progress.

Thanks for reading!
