Engineering Leadership: Applying Tech Scaling Principles to Team Management


The only constant in life is change, and businesses are no exception! Maybe it's pivoting projects based on business needs, maybe it's changing market conditions, and maybe it's keeping up with your ever-changing, ever-scaling customer demands. If you work in engineering leadership or DevOps, you are probably all too familiar with that last one. Having experienced the customer growth cycle from the engineering perspective at many different companies of many different scales, I have recently been thinking a lot about which systems and practices worked well, and which ones I've seen fall flat.

The concept of 'scale', even from the relatively narrow lens of engineering, can mean many different things: writing code that is extendable, optimizing servers, load balancing, standards and practices for growing teams, and on and on. There is no end to articles and resources on technical topics like load balancing, auto-scaling, and optimizing resources to handle increased workloads and traffic. What interests me, though, is how we can take those technical methodologies and apply them to management and leadership. All too often, SOPs (Standard Operating Procedures) and other three-letter acronyms are developed in a theoretical vacuum and then applied to practical situations. But there's no reason why we can't take some of the hard-learned lessons from technical best practices and apply them to the world of team leadership and design.

Take load balancing, for example. It's a practice applied to servers: spin up new instances of your application (or microservices) when under increased user load, then distribute incoming users across your instances in a 'balanced' manner. It's a way to keep up with extra load during high-use times. The same concept can be applied to technical teams during high-volume development or crunch times. Think about it: your teams are working at their normal capacity, on regular projects, when suddenly a major new contract requires a rapid ramp-up of engineering work for a short period. Just like a massive influx of users on the system, the same principles of load balancing can apply!

Let’s look at the definitions that are needed when defining a load-balancing system for a server and then apply them to a team-based environment.

1) Thresholds and Capacity

Technical load balancing: Server thresholds. First, we define warning signs that a server (or infrastructure) is nearing capacity. This is usually a metric with increasing gates, and can sometimes be as simple as user count or CPU utilization triggering warnings at over 10%, over 25%, over 50%, etc., and then automatically causing our load-balancing process to kick in at a specific threshold.
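As a minimal sketch, the tiered gates above might look like this. The specific gate values and the 50% scale-out trigger are illustrative assumptions, not a recommendation:

```python
# Illustrative threshold gates on a single metric (here, CPU utilization %).
# The gate values and the scale-out trigger are assumptions for this sketch.
WARNING_GATES = [10, 25, 50]  # warn as each gate is crossed
SCALE_OUT_AT = 50             # start load balancing past this point

def check_thresholds(cpu_percent):
    """Return (gates crossed, whether to kick off load balancing)."""
    crossed = [gate for gate in WARNING_GATES if cpu_percent > gate]
    return crossed, cpu_percent > SCALE_OUT_AT
```

In a real system the same check would run against whatever metric your monitoring stack already exposes.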

Team load balancing: Team capacity. We can apply the same principle directly to your teams by having metrics that measure their capacity. Luckily for us, there are already plenty of frameworks available, such as Agile, with its metrics for tracking team workload, capacity, and velocity. All we need to do now is define and apply thresholds that tell us when we are heading in a dangerous direction and when a team is nearing capacity. When these thresholds trigger, we can kick into load-balancing mode.

2) Bringing in resources

Technical load balancing: Spinning up new resources. So what happens when the threshold is triggered and the server is running low on resources? In the simplest terms, it starts spinning up new resources to take on the extra load. As load thresholds are crossed, additional resources (whether that be new instances, virtual resources, RAM, CPU, etc…) are brought in and turned on, allowing the infrastructure to handle the increased load for a short period. Once the activity spike subsides, those resources are spun back down again.
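A toy version of that scale-up/scale-down decision: given the current load and how much each instance can handle, compute how many instances to run. The per-instance capacity figure is an assumption for the sketch:

```python
import math

# Illustrative autoscaling rule: run just enough instances to keep each one
# under its capacity, never dropping below a minimum. Numbers are made up.
def desired_instances(current_load, per_instance_capacity, minimum=1):
    """How many instances should be running for the current load."""
    needed = math.ceil(current_load / per_instance_capacity)
    return max(minimum, needed)
```

As the load spike subsides, the same formula returns a smaller number and the extra instances are spun back down.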

Team load balancing: Bringing in additional support. So how do we apply the same lesson when one or more teams is nearing max capacity, or showing warning signs of being under extraordinary workloads? The same way the servers handle the extra load: bringing in additional support from alternate teams, or spinning up ad-hoc teams to begin to balance the workload between our new working groups. Sounds simple enough, but the real key is in the next couple of steps.

3) Load balancing

Technical load balancing: User allocation. This is where the real magic happens. Once we have the resources in play to handle the additional stress our infrastructure is under, we need to begin allocating the incoming load in a balanced way across all of the available resources. There are multiple methods of handling this from a technical perspective, but at a high level it comes down to 1) putting incoming work into a queue, 2) determining the next-up server to handle it (by lowest current usage, or something as simple as round-robin), and 3) assigning the work to that server. By now, you probably see where this is going.
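Those three steps can be sketched in a few lines. This assumes simple round-robin selection; the request and server names are placeholders:

```python
from collections import deque
from itertools import cycle

# Sketch of the three steps: queue incoming work, pick the next-up server
# round-robin, and assign. Request/server names are placeholders.
def dispatch(requests, servers):
    queue = deque(requests)   # 1) put incoming work into a queue
    next_up = cycle(servers)  # 2) round-robin "next-up" selection
    assignments = []
    while queue:
        assignments.append((queue.popleft(), next(next_up)))  # 3) assign
    return assignments
```

Swapping `cycle` for a pick-the-least-loaded lookup gives the lowest-current-usage variant mentioned above.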

Team load balancing: Task allocation. The same three steps can be applied to our teams as we spin up ad-hoc working groups or bring in additional resources. 1) Put incoming work into a queue; odds are you're already doing this via project management or issue tracking software (Jira, Monday.com, etc.). If you're not, there's no better time than the present to get started! 2) Use your teams' workload metrics to track available resources and begin task assignments, or set up some type of output pipeline for your queue. 3) Standardize and implement your queue system with your teams so that they can begin pulling issues and triumphing over the work spike!
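The least-loaded flavor of step 2 might look like this for teams. A hedged sketch: the team names and point estimates are illustrative, and real trackers would supply both:

```python
import heapq

# Illustrative least-loaded task assignment across ad-hoc teams.
# tasks: list of (task_name, story_points); teams: list of team names.
def assign_tasks(tasks, teams):
    """Give each task to whichever team has the fewest points so far."""
    heap = [(0, team) for team in teams]  # (assigned points, team)
    heapq.heapify(heap)
    assignments = {team: [] for team in teams}
    for name, points in tasks:
        load, team = heapq.heappop(heap)   # team with the lightest load
        assignments[team].append(name)
        heapq.heappush(heap, (load + points, team))
    return assignments
```

In practice this is exactly what a shared, prioritized backlog plus capacity-aware sprint planning gives you; the code just makes the rule explicit.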

4) Documentation and SOPs

As always, the key to all of this is going to be documentation and making sure that all your teams are fully aligned and working together. Just like a DevOps team would document and clearly define each step on the load-balancing pipeline, it is crucial to document responsibilities and workflows the same way you would enforce documentation of code. This allows teams to scale quickly when necessary and not get caught in the trap of having to learn the ropes as they are brought into ad-hoc working teams and spin up new tasks. The better your systems and processes are documented and shared between your teams, the better equipped they will be to overcome challenges like these as they are presented.
