What Is Scalability and How Do You Build for It? 6 Engineers Weigh In.

What does it take to build a truly scalable system? Built In Chicago asked six engineering leaders from different industries to share what tools and processes they trust most when it comes to building scalable technology.

Written by Adrienne Teeley
Published on Jul. 21, 2020

When you think of scalability, think of Black Friday. 

At least that’s what Alex Bugosh, a principal software engineer at Jellyvision, does.

“The classic problem of scalability is that of an e-commerce system,” Bugosh said. “It needs to be able to handle the traffic of Black Friday while being economical enough to run the rest of the year.” 

An e-commerce system that lags or experiences downtime can impact sales and user experience dramatically, so it needs to be built with a rush of shoppers in mind. Otherwise, users will get frustrated and move on. This idea of scalability can be applied to not just online shopping but any industry that experiences surges: finance, healthcare and even certain SaaS applications. 

And while avoiding system overload is a core component of scalability, Ryan Fischer, founder of 20spokes, said there’s more to it than that. Technology needs to be adaptable enough for modifications and tools to be added to support future needs.  

“You want your product not only to be able to handle increased activity but also be able to adapt in its feature set to meet the needs of the customer,” Fischer said. “This is crucial in the technology you choose, as it needs to be flexible and remove roadblocks preventing change.”

So, what does it take to build a truly scalable system?

Built In Chicago asked six engineering leaders from different industries to share what tools and processes they trust most when it comes to building scalable technology.

 

Optiver
Will Wood
Team Lead Infrastructure & Control • Optiver

Optiver, a global electronic market maker, uses its own capital at its own risk to provide liquidity to financial markets. Optiver's engineers and traders come together to craft simple solutions to complex problems. Will Wood, based in Chicago, said that to keep tech scalable, his team uses load testing systems they build themselves to see how their tech reacts in specific situations.

 

Describe what scalability means to you. Why is scalability important for the technology you're building?

Scalability means being able to easily handle the next busy market day. The technology I'm building is at the center of the environment, so if it has performance problems, the impact will be large. If it scales well, the firm will be able to remain fully active through extreme market conditions. My goal is that this technology has the same performance characteristics on an extreme day as it has on an average one.
 



How do you build this tech with scalability in mind? 

I try to reduce the variance in performance as much as possible. In order to accomplish this, I choose algorithms with consistent performance, use simple programming language features and keep the resources my systems use isolated from interference. 

I also design my systems to handle a specific load, which is usually some estimate of an extreme day, and to behave in a deterministic way if that load is exceeded. Then, I monitor the actual performance and load in the production environment, with alerts that signal me when the load is approaching the designed threshold.
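
As a rough illustration of designing for a fixed load and failing predictably beyond it, the Python sketch below accepts work up to a stated capacity, rejects anything over that limit in the same way every time, and reports when the backlog nears the limit. It is not Optiver's implementation; the class name, the capacity of 10 and the warning level of 8 are all assumptions chosen for the example.

```python
from collections import deque


class BoundedIntake:
    """Accepts work up to a fixed design capacity and rejects the rest.

    Hypothetical example: the name and the numbers are illustrative only.
    """

    def __init__(self, design_capacity: int, warn_at: int):
        self.design_capacity = design_capacity  # load the system is built for
        self.warn_at = warn_at                  # backlog size that triggers a warning
        self.queue = deque()
        self.rejected = 0

    def offer(self, item) -> bool:
        """Queue the item, or reject it when the design capacity is reached."""
        if len(self.queue) >= self.design_capacity:
            self.rejected += 1  # same outcome every time the limit is hit
            return False
        self.queue.append(item)
        return True

    def near_threshold(self) -> bool:
        """True once the backlog has reached the warning level."""
        return len(self.queue) >= self.warn_at


intake = BoundedIntake(design_capacity=10, warn_at=8)
results = [intake.offer(n) for n in range(12)]  # two more items than capacity
```

Rejecting the overflow outright is only one possible policy. The point of the sketch is that the behavior past the limit is chosen in advance rather than left to chance.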


What tools or technologies does your team use to support scalability?

On my team, we build our own systems for load testing. This allows us to test very specific scenarios and adapt to changing market conditions and business requirements. We have also developed a system for monitoring the performance of our applications in the production environment. This gives us regular feedback into how our tech is behaving and allows us to quickly notice any degraded performance.

 

 

Jellyvision
Alex Bugosh
Principal Software Engineer • Jellyvision

When open enrollment for healthcare hits, Jellyvision’s benefits tools have to be able to handle an influx in traffic. Bugosh says his team relies on data from previous years, frequent load testing and tools like AWS to help ensure their systems are ready for the busy season.

 

Why is scalability important for the technology you’re building?

At Jellyvision, we face a fairly interesting set of scalability problems. Our ALEX Benefits Counselor product is used by employees to help them choose their healthcare each year during open enrollment. Since most of our customers have open enrollment during the same few months in the fall, we experience a spiked but predictable load pattern. The ability to have those systems scale up and down and deliver results quickly is key for our ability to help our users.
 



How do you build this tech with scalability in mind?

The first and most important strategy we have is identifying the expected usage of our systems when we are initially developing the requirements. Those requirements guide our expectations during code and design reviews and help us to choose the right core technologies and architecture patterns for our systems.

This process leads us to make deliberate choices about what logic should live in the front-end JavaScript applications and what logic should live in our back-end services. Our front end is served entirely by a CDN, which has moved much of the scalability concerns over to AWS. Since our back-end services operate mostly independently, we can add instances at will. The back-end services utilize autoscaling, in case we see any large unexpected spikes in traffic, but are efficient during our off-peak times.

The other part of our strategy depends on having years of data around our expected load levels, based on the number of customers and historic load levels. We perform extensive load testing leading up to our peak times and will adjust our baselines so that autoscaling has less work to do. 
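
The idea of raising a baseline ahead of a known peak so that autoscaling has less work to do can be sketched in a few lines of Python. This is a simplified model, not Jellyvision's configuration or an AWS API: the function name, the per-instance capacity of 100 requests per second and the instance limits are assumptions made for illustration.

```python
import math


def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float,
                      baseline: int,
                      ceiling: int) -> int:
    """Return an instance count clamped between a baseline and a ceiling.

    All parameters are hypothetical; a real autoscaler would use its own
    metrics and policies.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(baseline, min(ceiling, needed))


# Off-peak settings: a low baseline keeps costs down.
off_peak = desired_instances(50, 100, baseline=2, ceiling=20)

# Peak season: the same traffic spike is met from a higher starting point,
# so fewer instances have to be added while users are waiting.
peak_target = desired_instances(950, 100, baseline=8, ceiling=20)
instances_to_add_from_low_baseline = peak_target - 2
instances_to_add_from_raised_baseline = peak_target - 8
```

With these assumed numbers, the spike requires adding eight instances from the off-peak baseline but only two from the raised one.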


What tools or technologies does your team use to support scalability?

Jellyvision’s scalability story is AWS-centric. We use CloudFront and S3 together to serve our front-end JavaScript code and media assets. We are in the process of moving our core services over from Elastic Beanstalk to ECS and Fargate.

For logging and metrics, Sumo Logic is our vendor. We rely on Sumo Logic’s dashboards, alerts and access to our historical data to prepare for our busy season.

 

 

Hudson River Trading
Jason Mast
Lead Core Developer • Hudson River Trading

Fintech company HRT has built its own custom tools and proprietary reliability features to help with scalability issues. More than tools, however, Mast credits his team’s culture for having the biggest impact on scalability. He says that, by adopting a “long-term mindset,” engineers are inspired to design systems able to handle new features and stressors. 

 

Describe what scalability means to you. Why is scalability important for the technology you’re building?

Scalability means understanding how current capacity is being utilized and how technology will behave as it approaches or exceeds saturation. This is particularly important when operating in financial markets, where volume varies greatly and bursts of activity can unexpectedly overwhelm a system that appeared to have substantial headroom.

In finance, opportunities can be fleeting. Failure to scale to meet growth demands can quickly reflect on the bottom line. In proprietary trading, there are several dimensions of growth to consider: market volume, geographies and asset classes, our catalog of trading models and sophistication of machine learning. Our success depends on being able to respond to expansion in all of these dimensions.  
 


 

How do you build this tech with scalability in mind?

Building scalable technology starts with company culture. We strive to hire developers who can reason about complex systems and foster an engineering environment that values a long-term mindset. We cultivate that mindset through openly collaborative development and an iterative code review process where we encourage scalability and maintainability. 

Furthermore, we leverage an expansive compute cluster for research and development. This organically encourages modular design by improving parallelism in the cluster and increasing productivity. Since we run the same software in production, those aspects of scalability carry over nicely to the live environment. Meanwhile, a collection of automated stress tests provides continual insight into performance and scaling considerations.
 

What tools or technologies does your team use to support scalability? 

A foundational element of our technology that facilitates scale is a set of refined libraries that provide efficient communication among components. The straightforward access to concise data structures over shared memory or zero-copy network transports eases the burden of building distributed solutions. 

We’ve built proprietary reliability features into our network stack, enabling multicast communication using modern network hardware that provides high-speed fan-out of data to a potentially vast array of processing nodes. We’ve also built custom tools that assess production utilization daily and redistribute workload uniformly. 

Meanwhile, we’ve created a notification platform atop Redis and visualization tools using Grafana that continually communicate utilization of the system at the hardware, operating system and software levels. 
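
HRT's tools for redistributing workload are proprietary and not described in detail, so the following Python sketch shows only one common way to spread work evenly: a greedy heuristic that hands the heaviest remaining task to the least-loaded node. The function name, task names and load figures are assumptions, and the heuristic gives a reasonable balance rather than a guaranteed optimum.

```python
import heapq


def rebalance(task_loads: dict[str, float],
              node_names: list[str]) -> dict[str, list[str]]:
    """Assign each task to the currently least-loaded node, heaviest first.

    Hypothetical illustration of uniform redistribution, not HRT's tooling.
    """
    # Min-heap of (total load so far, node name).
    heap = [(0.0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}

    by_weight = sorted(task_loads.items(), key=lambda kv: kv[1], reverse=True)
    for task, load in by_weight:
        total, name = heapq.heappop(heap)   # least-loaded node right now
        assignment[name].append(task)
        heapq.heappush(heap, (total + load, name))
    return assignment


measured = {"a": 8.0, "b": 7.0, "c": 6.0, "d": 5.0, "e": 4.0}
plan = rebalance(measured, ["n1", "n2"])
totals = {node: sum(measured[t] for t in tasks) for node, tasks in plan.items()}
```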

 

20spokes

Ryan Fischer
Founder • 20spokes

To Fischer, founder of development agency 20spokes, building scalable tech should be simple. That is, he said, engineers should keep their architecture lightweight and “easy to update and change.” This way, products are able to accommodate the user’s needs today and in the future. 

 

Describe what scalability means to you. 

Scalability is the ability to adapt to change and new needs. In the tech world, it is mostly attributed to how many users a particular app or site can manage without having performance issues. It’s important to remember the original definition, as you want your product not only to be able to handle increased activity but also be able to adapt in its feature set to meet the needs of the customer. This is crucial in the technology you choose, as it needs to be flexible and remove roadblocks preventing change.
 



How do you build your tech with scalability in mind? 

Building to scale means keeping everything simple. All major frameworks used today scale just fine; when there are scaling issues, they tend to stem from how the product was architected. Many times, a product can be overengineered, leading to scaling problems not only with the demand on a server but also in creating new features. 

Keeping it simple means building components to be independent and small, which makes them easy to update and change. The same can be done with the server architecture. 

 

What tools or technologies does your team use to support scalability?

We use services such as Datadog to monitor performance. Google’s Firebase toolset has been improving and is very helpful to monitor the performance of mobile applications. 

During development, one of the best ways to improve scaling is not with tools but peer reviews. All of our code is peer-reviewed to ensure it is meeting our standards before being merged and deployed.

 

Sphera

Albion H.
Software Architect • Sphera

Sphera, a risk management software company, can’t afford to have its platforms experience even a moment of downtime. That’s why the engineering team turned to Microsoft Azure, a robust technology Albion says is equipped to handle the scaling and support of each one of its products. “The use of managed and proven technologies allows us to keep our teams focused on more creative endeavors and building business value,” Albion said. 

 

Why is scalability important for the technology you're building?

Our goal is to make SpheraCloud the best SaaS operational excellence platform. Our customers are running global operations with tens of thousands of users, and they rely on our software to manage operations in their plants and visualize risk in real time. In this environment, even very small amounts of downtime can result in a major disruption to operations and an increase in risk.
 



How do you build this tech with scalability in mind? 

People are the most important piece of the scale puzzle. We’ve seen that clarity into roles and responsibilities is critical for scale initiatives to be successful. Absence of clarity usually results in things not getting done because nobody felt they had ownership, or in multiple teams developing the same feature without any coordination. The latter scenario not only wastes money but can also create long-term resentment between teams and destroy employee morale.

Following research in the area of organizational behavior, our engineering department is organized into “squads” of no more than five people to maximize engagement and collaboration. Scaling cross-team collaboration and communication happens via chapters and guilds, as popularized by Spotify. 


What tools or technologies does your team use to support scalability?

Our software runs on Microsoft Azure, and we strive to make use of its core capabilities to enable scalability and robustness. SpheraCloud is composed of multiple modules that seamlessly integrate with each other to feel like a single, cohesive platform. They’re deployed in Azure App Service, which enables us to automatically scale each module horizontally based on load and performance metrics. 

We store our data in Azure SQL Database, which can scale and support even the most demanding web apps. We take advantage of this architecture to direct all our most demanding “read” operations, such as reporting or searching, to “read-only” replicas.
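
The general pattern of sending heavy reads to replicas can be sketched as a small router. The Python below is a generic illustration with hypothetical names, not Sphera's code; in Azure SQL Database itself, read-only routing is typically requested through the connection string's ApplicationIntent setting rather than application-side logic like this.

```python
import itertools


class ConnectionRouter:
    """Chooses a database target based on whether the work is read-only.

    Hypothetical sketch: targets are plain strings standing in for
    real connections.
    """

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Rotate through replicas so read traffic is spread across them.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, read_only: bool) -> str:
        """Reads go to a replica when one exists; writes go to the primary."""
        if read_only and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ConnectionRouter("primary", ["replica-1", "replica-2"])
report_targets = [router.target_for(read_only=True) for _ in range(3)]
write_target = router.target_for(read_only=False)
```

One trade-off to keep in mind with any replica-based design is that replicas can lag slightly behind the primary, so this approach suits reporting and search better than reads that must reflect a write made a moment earlier.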

We use Azure Cache for Redis, a fully managed, in-memory cache store for session data such as user cookies, roles and permissions, and application resources. Redis gives us sub-millisecond response time and diminishes the load on the database, enabling even greater scalability.
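
A cache in front of the database usually follows a "cache-aside" pattern: look in the cache first, and only query the database on a miss. The Python sketch below uses an in-memory dictionary as a stand-in for Redis so that it runs on its own; the class name, the expiry time and the `load_session` function are assumptions, not Sphera's implementation or the Redis client API.

```python
import time


class SessionCache:
    """Cache-aside lookup with expiry, using a dict as a stand-in for Redis."""

    def __init__(self, load_from_db, ttl_seconds: float = 300.0,
                 clock=time.monotonic):
        self._load = load_from_db      # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # session_id -> (expires_at, value)
        self.db_reads = 0              # counts how often the database is hit

    def get(self, session_id: str):
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database work
        self.db_reads += 1             # miss or expired: go to the database
        value = self._load(session_id)
        self._entries[session_id] = (now + self._ttl, value)
        return value


def load_session(session_id: str) -> dict:
    # Hypothetical stand-in for a database query.
    return {"session_id": session_id, "roles": ["viewer"]}


cache = SessionCache(load_session, ttl_seconds=60.0)
first = cache.get("abc")
second = cache.get("abc")  # served from memory, so db_reads stays at 1
```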

 

Carminati Consulting

Brittany Carminati
President • Carminati Consulting

When the pandemic began, Carminati Consulting had to scale its Immuware™ product to meet the needs of healthcare clients affected by COVID-19, and do it fast. Carminati said that because the software was designed around custom compliance requirements for each customer, they were able to quickly build out new functions to support facility administrators monitoring the disease. 

 

Why is scalability important for the technology you're building?

Our SaaS product, Immuware™, is a comprehensive employee and occupational health solution. Since the COVID-19 pandemic began, we have rapidly configured the product to handle the demands of an ever-changing landscape for healthcare organizations battling to keep healthcare professionals safe. We would not have been able to serve our new or existing customers if it weren’t for our flexible and scalable platform.
 



How do you build this tech with scalability in mind? 

From its inception, Immuware™ was designed to be an online community for healthcare customers, meaning it is the organization’s job to ensure their workforce is compliant for specific healthcare-related activities. In doing so, we have ensured employees, supervisors and occupational health administrators have access to the necessary data, reports and dashboards to drive compliance. 

Since this initial adoption, we’ve scaled Immuware™ to specific niche roles, such as COVID-19 administrators designated to oversee symptom monitoring. Because Immuware™ was designed with the notion of “record types” (meaning each customer has different compliance requirements and, therefore, different record types that need to be tracked), we are able to easily offer tailored and rapid deployment for customers just seeking a specific record type such as “daily COVID-19 wellness checks.”
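
One way to picture the "record types" idea is as per-customer configuration rather than per-customer code. The Python sketch below is purely illustrative: the class names, the field names and the example customer are assumptions, and nothing here reflects how Immuware is actually built.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordType:
    """A kind of compliance record and the fields it must contain."""
    name: str
    required_fields: tuple[str, ...]


@dataclass
class Customer:
    """A customer with its own set of enabled record types (hypothetical)."""
    name: str
    record_types: dict[str, RecordType] = field(default_factory=dict)

    def enable(self, record_type: RecordType) -> None:
        """Turn on a record type for this customer without new code."""
        self.record_types[record_type.name] = record_type

    def missing_fields(self, type_name: str, record: dict) -> list[str]:
        """List required fields that a submitted record does not include."""
        wanted = self.record_types[type_name].required_fields
        return [name for name in wanted if name not in record]


wellness_check = RecordType(
    name="daily_covid_wellness_check",
    required_fields=("employee_id", "date", "symptoms"),
)
clinic = Customer(name="Example Clinic")
clinic.enable(wellness_check)
gaps = clinic.missing_fields(
    "daily_covid_wellness_check",
    {"employee_id": "E100", "date": "2020-07-21"},
)
```

In a design like this, supporting a customer who wants only one record type is a matter of enabling that single definition, which is consistent with the rapid deployment described above.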

 

What tools or technologies does your team use to support scalability?

Azure Cloud hosting has been huge for us. Without Azure Web Services, we could not scale as easily during peak usage, and we would not be able to integrate with external systems.

 

Responses have been edited for length and clarity.
