The non-technical challenges to implementing technology at scale
…but will it scale!? Let’s talk patterns you can follow for implementing at scale.
Implementing any technology at scale has historically been complex and difficult. Each business has its own requirements to meet, but there are proven patterns that work across companies and make deploying and managing systems at large scale easier.
Let’s talk about proofs of concept
Before diving into those patterns, it is important to make a clear distinction between a proof of concept and deployment at scale. A proof of concept should be a short-lived experiment that shows whether or not an initial theory holds up. It is important not to get stuck in implementation details or “analysis paralysis” while working on one.
If you find yourself answering questions about scaling, performance and extensibility during a proof of concept, you’ve gone too deep into the proof of concept black hole.
A proof of concept is like writing a topic sentence for a paper… and obviously all engineers love writing, especially when it comes to documentation! Think back to writing a topic sentence for an essay in school… since we’ve all probably let this information leave our brains, here are some key points for writing a good topic sentence:
- The sentence is clear and easy to understand
- It is not too general or too vague
- The simpler it is, the more understandable
Stick to points like these for the proof of concept and it will be successful.
It is very important for managers and leadership to understand the difference between a proof of concept and a full-scale implementation. Too many times I’ve heard stories of proofs of concept going to production. Since an engineer may not always have a technical manager, it is important to frame the consequences of deploying a proof of concept to production in terms of business outcomes.
Here are some examples you could use:
- Deploying the proof of concept increases risk in the overall platform due to x, y and z.
  - Risk = loss of $$, production outages/downtime, uncertain behavior of the application
- It is much cheaper and easier to design the initial system with scaling and performance requirements than to retrofit a system in place.
  - The proof of concept never planned to go to production
- Work-life balance, mental health and burn out
  - The on-call team is going to get paged more, support is going to get more tickets, developers and operations sprint boards are going to start to grow at a rate the teams cannot handle
  - Burn out is bound to occur
Alright, so now the proof of concept is finished and management wants you to implement it at scale. What are some patterns you can use across products to help make everyone’s life easier?
Patterns to implementing at scale
Implementing at scale can feel like a momentous task. If you don’t do it properly, you could negatively impact the lives of many people at your organization. On the other hand, you could make a positive impact and maybe even get that promotion you’ve been working towards.
As engineers, we are good at technology. We can find examples and proper resources to implement the technology effectively. The other, non-technical stuff is the hardest part of our job, yet it is what often impacts our technology in the most complex ways.
In order to standardize, automate and deploy at scale, select a naming convention and, more importantly, STICK to that naming convention. Selecting a naming convention is one of those things we rarely talk about in a technical sense, but it has major implications. It can take days or weeks to nail down a naming convention, and that is okay as long as the discussions are productive and include multiple teams (yes, even security), not just the team working on the product.
If naming conventions are not standard, it is basically impossible to write effective automation at scale. Have fun with all those exceptions someone allowed.
A standardized naming convention should help the support team or anyone not familiar with the product easily identify the product and type of environment.
The only exception to this rule is… just kidding there is no exception.
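As a rough sketch of why a strict convention makes automation possible, the snippet below validates and parses names against a single pattern. The convention itself (`<product>-<environment>-<two-digit index>`), the environment names and the function names are all assumptions made up for illustration; substitute whatever your teams agree on.

```python
import re

# Hypothetical convention: <product>-<environment>-<two-digit index>,
# for example "billing-prod-01". Fields and environments are assumptions.
NAME_PATTERN = re.compile(
    r"(?P<product>[a-z][a-z0-9]*)-(?P<env>dev|test|prod)-(?P<index>\d{2})"
)


def is_valid(name: str) -> bool:
    """Return True only when the whole name follows the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def parse_name(name: str) -> dict:
    """Return the fields encoded in a name, or raise ValueError if it breaks the convention."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return match.groupdict()
```

Because every name either matches the pattern or is rejected, the support team and any automation can read the product and environment straight from the name, with no per-system special cases.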
Now that a standard naming convention has been agreed upon, start grouping. Create groups for the types of environments or deployments at your organization.
For example you may have groups such as:
From your naming convention, automatically add nodes into the proper group. I’d recommend having no more than 5 groups.
Be deliberate about adding new groups. The more exceptions allowed, the harder things become to manage effectively at scale. Just like the discussions and agreements made with naming conventions, the same type of discussions should take place for groups.
There are many ways you can implement groups today. Use what makes sense for your organization: it could be tags, resource group names, or any other logical container. Ensure that the group can easily be identified using automation across your organization.
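Here is a minimal sketch of deriving groups from names automatically. It assumes a hypothetical three-part name (`<product>-<environment>-<index>`) and a hypothetical set of three groups; neither comes from a specific tool, and in practice the group might be written out as a tag or resource group name instead of a dictionary key.

```python
from collections import defaultdict

# Hypothetical groups, kept deliberately small. Adding one should be a deliberate decision.
KNOWN_GROUPS = {"dev", "test", "prod"}


def group_for(name: str) -> str:
    """Derive the group from the second dash-separated field of a node name (assumed layout)."""
    parts = name.split("-")
    if len(parts) != 3 or parts[1] not in KNOWN_GROUPS:
        raise ValueError(f"cannot place {name!r} in a group")
    return parts[1]


def group_nodes(names):
    """Map each group to the sorted list of node names that belong to it."""
    groups = defaultdict(list)
    for name in names:
        groups[group_for(name)].append(name)
    return {group: sorted(members) for group, members in groups.items()}
```

A node whose name does not map to a known group raises an error instead of quietly landing in a catch-all bucket, which keeps exceptions visible rather than letting them accumulate.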
Markdown documentation stored alongside code
No one likes documentation, but the reality of it is that people will want to know how to make changes to the product or application. You’ll have new hires that need to reference docs. Fight it all you want, but I guarantee you in the long run, these docs save you time and interruptions during the day.
- Contributing.md (or include a section in the readme for it)
- Architectural diagram
Write each document to be clear and without assumptions.
Annoyed you’ll have to update docs? You shouldn’t be. There should be a culture of making things better when you find something incorrect or hard to understand. If that culture exists, you will spend most of your time reviewing documentation pull requests from others rather than writing the updates yourself.
The 80/20 rule
Like anything in life, you’ll never be able to make everyone happy. Aim for making 80% of your audience happy. The other 20% are outliers: they will either conform to the 80% or decide to continue down a harder, more complex path and stay in that 20% window.
Hold hard and fast to this rule. Make it 90/10 if you want a simpler, more maintainable solution. We once again circle back to the idea that making exceptions to initial decisions makes it nearly impossible to scale.
What if you need to allow others to integrate into the application or product? Not all products do, but it is a good idea to consider it for most. This concept is called extensibility, and it is separate from allowing customizations inside the code base.
If you build it, they will come, and want to make it better. People use software in ways the original developers never considered or intended. So how do you let people use your software or product without you having to own every single edge case? Make it extensible!
Extensibility is how easily others can add onto your code, without compromising the initial product code.
This is really the best of both worlds. The product team only needs to worry about the stability of the core product as planned. Any other team, or organization, has the ability to import the core product and extend it. When other teams extend on top of the original product, they are responsible for their own customization and usage. These consumers will often find bugs for you in the core product.
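One possible shape for this split, sketched with made-up names: the core product exposes a single registration point, and consuming teams plug in their own functions without touching the core code. `CoreProduct`, `register`, `build_config` and `add_cost_center` are all hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, Dict, List

# A hook receives the current config and returns the (possibly extended) config.
Hook = Callable[[Dict[str, str]], Dict[str, str]]


class CoreProduct:
    """Hypothetical core product: it owns the defaults, extensions own their hooks."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []

    def register(self, hook: Hook) -> None:
        """Extension point: other teams add behavior without editing this class."""
        self._hooks.append(hook)

    def build_config(self, name: str) -> Dict[str, str]:
        """Run the core behavior, then each registered extension in order."""
        config = {"name": name, "owner": "core-team"}
        for hook in self._hooks:
            # Each hook gets a copy, so it cannot mutate the config in place.
            config = hook(dict(config))
        return config


# Owned and supported by a consuming team, not by the core product team.
def add_cost_center(config: Dict[str, str]) -> Dict[str, str]:
    config["cost_center"] = "1234"
    return config
```

A consuming team would call `product.register(add_cost_center)` once and then use `product.build_config(...)` as usual. If the extension misbehaves, the fix belongs to the team that wrote it, while the core team only has to keep `register` and `build_config` stable.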
Finally, make sure the core product team understands the difference between a bug in the product and a request that falls outside how the product is meant to be used. This will help ensure that anyone extending the product also picks the best tool for the job, not the easiest or most convenient one.
A lot of the non-technical decisions made when architecting a product impact our day-to-day lives. These decisions should be given just as much time and consideration as the technical ones. If you stick to the above guidance, deploying and managing systems at scale becomes much easier to automate. Additionally, that awesome developer or operations employee now spends more time doing the work they were hired to do.