6 ways staff engineers help reach clarity
How to bring clarity without mandate over people or product
One of the things I catch myself doing more and more as a staff engineer is helping individuals, teams or sometimes the whole organisation reach clarity. I’m no oracle and this is not rocket science, but in this article I’d like to go meta and clarify the process of reaching clarity with some examples.
Until we reach singularity, businesses are run by people. People come from different backgrounds and bring different perspectives to the table. Every person hired into the company adds to its DNA. However, no matter how great the employees are, misalignment can ruin the end result.
Despite different agendas and diverse backgrounds, the majority of people want to do good for their company. It gives meaning to their lives. They want to win and, for the right cause, they are willing to give it their best. Unfortunately, doing their best is not enough. They need to push in the same direction. This is where clarity matters: we need to make sure that everyone shares the same truth.
The ability to collaborate both flexibly and in large numbers is the key to our survival as a species.
At its core, clarity is about understanding. One of the key value propositions for staff engineers is to reach collective understanding across teams.
The lack of clarity usually presents itself in two ways:
Active: The person/team reaches out with a concrete question or request for feedback.
Passive: The output from the person/team shows that they are operating with the wrong assumptions and that their truth and perspective are not aligned with those of the business.
The difficulty of reaching clarity depends on many factors:
How invested are they in their truth? How much work has been implemented?
How open are they? Have they failed and acknowledged that they need a different perspective?
Where is the truth? In one person? Another team? Behind an experiment or passage of time?
0. Build the base
If you’ve ever taken an airplane, you’ve seen a version of the safety instruction to put on your own oxygen mask before helping others:
You can’t save anyone if you’re suffocating!
All the clarity methods discussed in the rest of this article depend on having a strong base yourself to be able to help others. You need to have a good understanding of:
People: who does what? What does the org look like? What are the organisational dynamics and forces? Who are the stakeholders and users? etc.
Domain: What’s the problem that the business is solving? What objects, relations, data and processes build the solution? etc.
Tech: What services are in place? Where do they run? How do they interact? How are they tested, deployed, monitored? etc.
I have an upcoming post about habits of highly effective staff engineers.
Alright! Let’s get to it.
Broadly speaking, there are 6 ways that staff engineers help teams and individuals reach clarity. They’re listed from easy to hard:
1. You have the answer
This is the simplest scenario where an individual or team needs help and the answer is within your domain of knowledge, mandate or responsibility.
Although it is satisfying to single-handedly untie other people’s knots, doing it too often may be a symptom of a broken ownership trio:
Knowledge: the team should have the knowledge to operate independently and, when needed, be able to acquire it autonomously at a lower cost than coming to you.
Mandate: instead of gatekeeping and acting as a benevolent babysitter, aim to establish processes which allow the team to autonomously make decisions.
Responsibility: if the knowledge and mandate are adjusted as described above, there’s no reason for the responsibility to reside outside the team with the staff engineer.
Example: when I rolled out the Architectural Decision Records at a new organisation (topic for an upcoming post), several people reached out asking different questions about the format and process. One of my professional principles is to deprecate myself. I answered each question exactly once because the answer went straight into the FAQ on a wiki. This has multiple benefits:
When people asked a question that I had already answered, I just pointed them to the FAQ, saving their time and mine.
When people checked the FAQ, they came across other questions which helped them understand the system even better.
Gradually, people learned to check the FAQ instead of asking me.
People started contributing to the FAQ either correcting the information or adding new questions to it. As someone who got the diff for every edit, I kept learning about the latest developments in the system.
I wasn’t a single point of failure (SPOF) for the organisation and, regardless of whether I was on vacation or sick leave, people could get further than if we didn’t have the FAQ.
Last but not least, it reduced my distraction level so I could be more efficient helping with other things.
Some days, you’re the go-to person!
2. The answer is within
Sometimes the person asking the question also has the answer. They just don’t see it that way.
In this case, you need to actively listen to the problem and ask the right questions. There’s nothing magical about the “right” here other than the fact that your experience and exposure to the business goals and insights give you a different perspective. You will both learn together through this discussion journey.
The classic connect-the-dots illustration applies perfectly to this situation: all you have to do is collaboratively connect the dots.
Example: I was the tech lead for a platform shared by some 150+ people. Every now and then someone encountered a use case that they deemed “special”. So they reached out for advice about where and how to solve it on the platform. They often knew the business case, the stakeholders and the requirements better than I did. They also knew the code (we had good documentation and an FAQ). So I asked questions to help them see the problem from different angles and judge for themselves where to put the solution. It was a useful learning experience for me as well. These special cases helped mature the platform API and developer experience (DX) over time.
These situations were tricky because as an engineer I love to just solve it myself but, as mentioned earlier, my ultimate goal is to deprecate myself. So I used these as an opportunity to hand over my knowledge and grow the team/person. There are two useful tools to achieve this goal:
Pair programming: we solve the problem together. There’s a risk that one person takes the lead and leaves the other one in the dust. It is important to use this time effectively as coaching, so that the other person gains hands-on experience.
Mob programming: multiple people work on the same problem, but for a solution to go through the keyboard, it has to go through someone else’s hands (hence mitigating the biggest risk with pair programming).
It is important to solve the problem together. That way, they own the solution and not me.
Some days, you’re a mentor.
3. The answer is out there
There are a few scenarios where the answer is out there and you approach it together.
3.1 Another person/team has the answer
Often for larger organisations and/or higher level questions, the answer is in another team but, due to knowledge silos or lack of ownership, the clarity falls between the cracks. It goes faster if the staff engineer has visibility into the other team but that’s not always the case. Besides, when you are new or the company has recently gone through a reorg, it is tricky to even find where to start.
Example: the iOS team was told to implement a podcast feature but they had little to no visibility into what the metadata should look like. I connected them to the upstream API team, where there was a wealth of documentation but some pieces were missing for the new podcast feature. In this case, the result of one team collaborating with another was to add a few fields to the API and improve the documentation as the formal contract between the two teams.
Some days, you’re just a catalyst!
3.2 The answer is scattered across multiple teams/people
Many staff engineers operate across multiple teams. Even a well-executed ownership trio may lead to silos. One of the main value propositions for the staff engineer role is to break knowledge silos. To be able to bring clarity, they need to actively involve themselves outside their immediate teams and have visibility across the larger organisation.
A common technique is to break the question into smaller pieces which fall within the domain of different teams or people. This breakdown may need the team(s) to do some work (e.g. drawing diagrams) to find the smaller problem boundaries within the larger need or problem domain.
This is by far one of the most time-consuming ways to bring clarity but also the one with the highest ROI (return on investment).
Example: a global company decided to migrate their infrastructure from ECS to EKS. For those not familiar with these AWS services, the former is older and proprietary, while the latter is based on Kubernetes. One of the goals of the migration was to understand the obscure role of the operations team behind the old system and modernise it with DevOps best practices. When I joined, the migration was already 2 months behind schedule. Several people had quit and there was a risk that this migration would turn into an operational hell. I was dropped into it along with an experienced TPM. We divided the effort between ourselves and covered for each other. I would talk to the engineers to learn as much as I could about the various teams, processes and technologies involved, and the TPM would spend most of the time making sure that we had an actionable, realistic and timely backlog. We synced between 5 teams as well as upper management.
The trickiest part of the project was to extract relevant knowledge, sync with relevant people from those teams and make sure that we were making the right technical decisions along the way. I relied heavily on documentation both to expose what I had learned (to get feedback) and to bring everyone on board. Fortunately those long hours paid off and we got enough traction to finish the main chunk of the migration 1 week ahead of time.
Some days, you are a detective.
3.3 The answer hasn’t trickled down
Sometimes the insight is in leadership’s head but it hasn’t trickled down in an actionable form. Sure, everyone got the memo or internal newsletter, but in a world where we’re bombarded with information, it is not always obvious how a specific message impacts the day-to-day lives of its targets.
The larger the organisation and the more layers between where the decision is made and where it is executed, the more pronounced this problem is.
The simplest mitigation is to make sure that the leadership keeps the people whose lives will be impacted by the decision in the loop and makes those decisions together with them. In a top-down culture this might be far from reality, and as a staff engineer you have a responsibility to educate up and anchor the need for a participatory decision-making process.
Sometimes you may be expected to represent your teams (as a practical substitute for getting tens of team members at the table). It can only work if you have a solid understanding of the domain, people and tech as well as keeping the teams in the loop. In practice the most effective staff engineers act as ambassadors of the teams they represent and strive to create a two-way transparent information flow between the leadership and the teams.
Example: the frontend team was impacted by an infrastructure migration project but they knew little about why the migration was done in the first place. Part of the reason was that they had been kept away from the infrastructure for too long.
As one of the people leading the migration, I got good exposure to the why, what and how of the migration, but this knowledge didn’t exist outside my head in a cohesive way.
So I set out to write a migration strategy document to put the pieces of the puzzle together clarifying the goals, approach and roadmap to everyone involved and impacted by the migration. I also collaborated closely with the team leads to arrange some education for their expanded ownership.
Some days, you’re just a communicator.
4. The answer is behind an experiment
Sometimes clarity cannot be reached without doing some experiments. This is often the case for technical questions and how the pieces fit together to build a solution. A PoC (proof of concept) is one solution. Sometimes teams confuse a successful PoC with an MVP (minimum viable product) worthy of going to production. In those cases, a risk assessment is in order to make sure that the full implication of taking an experiment to the customer is understood.
The PoC takes team bandwidth so it needs to be weighed against the team backlog. Sometimes, I work with the team to make a case and write a proposal to get the manager’s buy-in and dedicate enough bandwidth to the PoC.
As a staff engineer it is very tempting to go and do it yourself. But remember: you build it, you own it. Think about how many things you can own and still be productive. The better approach is to sit together, define the problem and support the team in doing the experimentation themselves. That’s how they grow. If you want to be close to ground zero, that’s a trade-off you need to make considering all of your other responsibilities and potential impact points.
Example: at one company, software reliability issues were crippling the company’s core business and led to many unhappy users. The management came up with the idea of using GitFlow as a solution to always have a predictable, ready-to-test version of the software. As you can imagine, there was huge pushback from the engineers because GitFlow hindered their feedback loop. At the core of the problem was a lack of trust from the management (backed by user feedback). As I always say, “you cannot fix people’s problems with technical solutions”. A better solution would be to clarify the ownership models and also extend them so that one team would be responsible for an end-to-end product instead of one team per component.
In this particular case, I worked with one team to write a proposal for an experiment with a clear hypothesis and research methodology complete with data gathering and analysis to objectively prove that there is a better way. The team loved it. In a nutshell the idea was to use error budgets to put the team in charge of their deliveries.
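For readers who haven’t met error budgets before, here is a minimal sketch of the arithmetic behind them. It is not the proposal we wrote: the SLO target, the request counts and the ship/stop policy are all hypothetical and only illustrate how a budget puts the delivery decision in the team’s hands.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Error budget for one team over one measurement window (illustrative)."""

    slo_target: float      # hypothetical target, e.g. 0.999 = 99.9% successful requests
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is whatever the SLO does not promise.
        return (1 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means untouched, 0.0 means spent, negative means overspent.
        if self.allowed_failures == 0:
            return 0.0
        return 1 - self.failed_requests / self.allowed_failures

    def can_ship_features(self) -> bool:
        # Assumed policy: ship freely while budget remains,
        # switch to reliability work once it is spent.
        return self.remaining_fraction > 0


# Made-up numbers for illustration only.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(round(budget.allowed_failures))
print(round(budget.remaining_fraction, 2))
print(budget.can_ship_features())
```

The point of the design is that the number, not a release process imposed from above, tells the team when to slow down.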
Some days, you’re a researcher.
5. The answer is in the future
This is an interesting one. Sometimes, the most reliable way to reach clarity is to wait and see. How will the market react to a new feature? How will the upcoming legislation impact our business? How will the new leadership change our daily lives? etc.
The problem is that in a competitive market, you cannot afford to just wait! Used well, that time can give you an edge over the competition.
To bring clarity, use this time to do a risk analysis and be prepared for different outcomes. You can do a PoC for different scenarios to have a more concrete assessment.
One may even build a product in anticipation of the future. But be mindful not to generate work just to have something to do. Always be asking: “am I doing the right thing?”
Example: I was hired as a frontend developer building an interactive web interface on top of a backend. The only problem was that, for practical purposes, the backend didn’t exist. According to the best estimates, the backend team was 3 months behind, all while the PM wanted to see something on screen to evaluate the product UX. So I mocked the backend API just enough to make a high-fidelity prototype and unblock the frontend work. This got a lot of praise and, as a side effect, helped the backend team define the shape of their API as our prototype evolved before they were ready. This reduced the pressure on them while enabling the UX team to do some work in parallel.
Some days you just wait, but hopefully not idly!
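To make the mocking idea concrete, here is a minimal sketch. It is written in Python for brevity rather than in the frontend stack of the original project, and the `BackendApi` contract, `MockBackend` class, `render_title` function and field names are all hypothetical.

```python
from typing import Protocol


class BackendApi(Protocol):
    """The contract the UI codes against (hypothetical shape)."""

    def get_item(self, item_id: str) -> dict: ...


class MockBackend:
    """In-memory stand-in used until the real backend exists."""

    def __init__(self, fixtures: dict[str, dict]) -> None:
        self._fixtures = fixtures

    def get_item(self, item_id: str) -> dict:
        # Return canned data, or an error payload in an assumed shape.
        if item_id not in self._fixtures:
            return {"error": "not_found", "id": item_id}
        return {"id": item_id, **self._fixtures[item_id]}


def render_title(api: BackendApi, item_id: str) -> str:
    # UI code depends only on the contract, so the real backend
    # can be swapped in later without changing this function.
    item = api.get_item(item_id)
    return item.get("title", "(missing)")


api = MockBackend({"42": {"title": "Hello, prototype"}})
print(render_title(api, "42"))
print(render_title(api, "7"))
```

Because the prototype only talks to the contract, every field the UI ends up needing becomes a concrete input for the team designing the real API.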
6. The answer is unknown
Occasionally, it may turn out that something is not clear and no experiment or passage of time is going to make it clearer. It is out of our control.
One needs to be pragmatic and work around the unknown. One tool that comes in handy is risk analysis, where the unknown factor is treated as a threat and its impact and likelihood are assessed together with mitigation methods. The idea is to minimise the risk of the “unknown” hurting the business or the people.
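A risk analysis like this can be as simple as a small register. The sketch below assumes the common convention of scoring exposure as likelihood times impact on 1 to 5 scales; the scales, the `Risk` type and the entries are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int   # 1 (rare) to 5 (almost certain), an assumed scale
    impact: int       # 1 (negligible) to 5 (severe), an assumed scale
    mitigation: str

    @property
    def exposure(self) -> int:
        # Assumed convention: exposure = likelihood x impact.
        return self.likelihood * self.impact


def prioritise(risks: list[Risk]) -> list[Risk]:
    # Highest exposure first, so mitigation effort goes where it matters most.
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical entries, not taken from a real project.
register = [
    Risk("Vendor API outage", likelihood=4, impact=4, mitigation="Serve cached data"),
    Risk("Vendor changes pricing", likelihood=2, impact=3, mitigation="Budget review"),
    Risk("Vendor shuts down", likelihood=1, impact=5, mitigation="Evaluate alternatives"),
]

for risk in prioritise(register):
    print(f"{risk.exposure:>2}  {risk.name}: {risk.mitigation}")
```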
Example: the system that one of my teams owned depended on an unstable 3rd party API. The 3rd party was a small startup and, despite their best efforts and promises, we genuinely couldn’t predict when their API would go down. We identified 2 workarounds: 1. Use circuit breakers to reduce the risk of our product going down when the 3rd party API had an issue, 2. Create a fallback that would fetch the 3rd party API and store the result on S3, which has a solid SLA (I’ve dissected the S3 SLA here). As a bonus, our workaround was way cheaper to run as we did not call the 3rd party API as often.
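For those who haven’t used the pattern, this is roughly what a circuit breaker with a cached fallback looks like. It is a minimal sketch and not the code we ran: the `CircuitBreaker` class, its thresholds and the in-memory `cache` dictionary (standing in for the copy stored on S3) are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker: stop calling a failing dependency for a while."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds to wait before retrying
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after:
            # Cool-down elapsed: allow a trial call (half-open state).
            # One more failure will open the circuit again.
            self._opened_at = None
            self._failures = self.max_failures - 1
            return False
        return True

    def call(self, primary: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self.is_open():
            return fallback()            # skip the dependency entirely
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0               # a success closes the circuit
        return result


cache = {"latest": "cached response"}    # stands in for the copy stored on S3


def flaky_third_party() -> str:
    raise ConnectionError("third-party API is down")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for _ in range(3):
    print(breaker.call(flaky_third_party, lambda: cache["latest"]))
print(breaker.is_open())
```

After two failures the breaker opens, so the third call never touches the 3rd party API and is served from the cached copy.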
Some days, you work around the unknown.
We went through 6 ways to reach clarity. Some are easy, some are hard and expensive. In practice, however, a combination of those methods is used to reach clarity.
We also mentioned a few tools: PoC, Strategy, Risk analysis, mob/pair programming, FAQ, etc.