One of the fun things we get to do as software engineers is to figure out how to build enterprise applications that scale.
Most of us are taught the fundamentals of programming by a computer science department at a college or university. We learned about algorithms, data structures, logic, concurrent programming, etc. Very few of us were taught how to build systems at scale. It’s a skill we have to acquire on the job. Quite often, building a scalable enterprise application takes looking at the problem from a different perspective.
Let’s say we’re building a photo gallery website. Users can upload their photos, which are stored on our servers, then browse both their own photos and their friends’ photos. It’s a relatively simple application which we can build using the n-tier architecture we’ve all learned.
Our user uploads a photo to our website.
The website sends the photo to an API.
The API writes the photo to an Amazon S3 bucket and stores the photo’s URL in a database.
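The synchronous flow above can be sketched in a few lines. This is a minimal illustration, not the real service: the S3 bucket and database are stood in for by in-memory dictionaries, and the URL scheme and function names are assumptions for the example.

```python
import uuid

# In-memory stand-ins for the real stores; in production these would be
# an S3 client (e.g. boto3) and a database connection.
S3_BUCKET: dict[str, bytes] = {}
PHOTO_DB: dict[str, str] = {}

def upload_photo(user_id: str, photo_bytes: bytes) -> str:
    """Synchronous n-tier flow: store the photo, then record its URL."""
    key = f"{user_id}/{uuid.uuid4()}.jpg"
    S3_BUCKET[key] = photo_bytes                   # write the photo to storage
    url = f"https://photos.example.com/{key}"      # assumed URL scheme
    PHOTO_DB[key] = url                            # record the URL in the DB
    return url

url = upload_photo("alice", b"\xff\xd8\xff...")    # fake JPEG bytes
```

The caller blocks until both writes finish, which is exactly the property the next requirement will strain.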
Now comes the requirement that whenever a photo is uploaded, we generate a thumbnail of the photo. We can add the thumbnail generation logic to our API easily enough. If we think a little harder about the solution, we’ll start to see it has some problems.
First, the design breaks the single responsibility principle in the API. It now has to save the photo, store its URL, generate a thumbnail, and store the thumbnail’s URL as well.
Second, it doesn’t scale well. Generating a thumbnail can be an expensive operation, so the API would spend more time generating thumbnails than storing photos and their URLs. Imagine a thousand concurrent users browsing galleries and uploading photos: the API would be so busy generating thumbnails that it would have fewer cycles left to process new uploads.
Third, it’s a poor user experience. The user doesn’t need to wait until our service is done storing the file and generating the thumbnail to continue using our application. Most users in our case are happy with eventual consistency; that is, they want the photo to be available to browse, but they don’t care whether that happens now or 10 seconds from now.
This is where an event-driven design helps. Instead of generating the thumbnail itself, the API sends a “photo uploaded” event to a queue and immediately returns a response to the website. At this point, the user has their response and can continue browsing and uploading photos while we do the heavy lifting in the background.
A listener receives the “photo uploaded” event from the queue and processes it, generating the thumbnail and storing its URL.
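The producer/consumer split can be sketched with the standard library’s `queue` and `threading` modules. This is a single-process stand-in for what would be a real message broker in production; the event shape, URL scheme, and the `None` shutdown sentinel are assumptions for the example.

```python
import queue
import threading

events: "queue.Queue[dict | None]" = queue.Queue()
THUMBNAIL_DB: dict[str, str] = {}

def api_upload(photo_key: str) -> str:
    """The API stores the photo (elided here), enqueues an event, and returns."""
    events.put({"type": "photo_uploaded", "key": photo_key})
    return "202 Accepted"  # the user is not kept waiting on thumbnail work

def listener() -> None:
    """Runs in the background, draining the queue and doing the slow work."""
    while True:
        event = events.get()
        if event is None:      # sentinel: shut down the worker
            break
        key = event["key"]
        # Expensive thumbnail generation would happen here.
        THUMBNAIL_DB[key] = f"https://thumbs.example.com/{key}"

worker = threading.Thread(target=listener, daemon=True)
worker.start()

response = api_upload("alice/photo1.jpg")   # returns immediately
events.put(None)                            # stop the worker after the event
worker.join()
```

The key property: `api_upload` finishes as soon as the event is on the queue, regardless of how long thumbnail generation takes.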
In this second example, the system is fully decoupled. Each part has a single responsibility and can scale according to its individual load. If the queue is getting backed up, we can add more event listeners without scaling the API or the website. If the website is bogged down but the queue is fine, we can scale the web tier on its own.
This solution has the added benefit of allowing us to add new functionality by adding new event types. We simply modify our API to add a new event to an event queue, then we add new listeners to this queue. These new listeners will process their new events, leaving the existing listeners and queues unaffected. Each set of listeners can scale independently based on their own unique requirements.
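One way to picture the extensibility is a registry that maps event types to listeners. This is a hypothetical in-process sketch (the decorator, handler names, and payload shape are invented for illustration); in production each event type would feed its own queue and its listeners would scale independently.

```python
from collections import defaultdict

# Hypothetical registry: each event type gets its own set of listeners.
listeners: defaultdict[str, list] = defaultdict(list)

def on(event_type: str):
    """Register a listener for one event type."""
    def register(handler):
        listeners[event_type].append(handler)
        return handler
    return register

def publish(event_type: str, payload: dict) -> None:
    """Deliver an event to every listener registered for its type."""
    for handler in listeners[event_type]:
        handler(payload)

thumbs, feeds = [], []

@on("photo_uploaded")
def generate_thumbnail(payload: dict) -> None:
    thumbs.append(payload["key"])

# A feature added later is just another listener; existing ones are untouched.
@on("photo_uploaded")
def update_friend_feed(payload: dict) -> None:
    feeds.append(payload["key"])

publish("photo_uploaded", {"key": "alice/photo1.jpg"})
```

Adding `update_friend_feed` required no change to `generate_thumbnail` or to the publisher, which is the property the paragraph above describes.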
As a practical example, this design pattern is what we used to scale out Trimble Maps’s Trip Management platform that my colleague Nirjhar wrote about recently. We had a business requirement to monitor our customers’ active trips for possible delivery window misses. We needed to calculate a vehicle’s estimated time of arrival (ETA) based on its current location and determine if the ETA fell within the promised delivery time.
The brute force way would be to find all the trips in progress in our system that had locations and calculate the ETA from the current location to the next stop. In theory this works, but it falls apart at scale. The query to get active trips is a passive, pull-based mechanism. ETAs are only calculated at a set interval, meaning a delivery could be in jeopardy of missing its window long before we get around to processing it. As more customers come on board and send more trips, the processing time would steadily increase. Parallelizing the ETA calculation, spreading trip analysis across multiple threads, would increase throughput. But there are still problems with this approach.
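A single pass of that brute-force poll might look like the sketch below. The trip records, field names, and the naive distance-over-speed ETA are all assumptions for illustration; the real system computes ETAs from routing data, not straight division.

```python
# Hypothetical trip records; in the real system these come from a DB query.
trips = [
    {"id": 1, "miles_to_stop": 120.0, "speed_mph": 55.0, "window_ends_hr": 2.0},
    {"id": 2, "miles_to_stop": 30.0,  "speed_mph": 50.0, "window_ends_hr": 1.0},
]

def poll_for_late_trips(trips: list[dict]) -> list[int]:
    """One polling pass: recompute every trip's ETA and flag possible misses."""
    late = []
    for trip in trips:
        eta_hr = trip["miles_to_stop"] / trip["speed_mph"]  # naive ETA
        if eta_hr > trip["window_ends_hr"]:
            late.append(trip["id"])
    return late

late = poll_for_late_trips(trips)   # → [1]  (120/55 ≈ 2.18 hr > 2.0 hr window)
```

Every pass recomputes every trip, whether or not anything changed since the last pass, and that cost grows with the number of active trips.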
Our customers’ IoT devices continuously send us GPS positions. By polling for active trips, we get the last position of the vehicle. If the vehicle hasn’t sent us its location recently, then we are repeating a calculation we’ve done already and generating ETAs based on stale information.
The aha moment came when we realized our customers were telling us when we needed to calculate new ETAs — when they sent us information. Whenever we receive a new position from a vehicle, or a notification that the vehicle has arrived at a stop, we queue an event that our listeners retrieve and process. It was a complete inversion of how we initially conceived of the alerting feature, but it made perfect sense. Now our customers get near-instant alerts when a delivery window is going to be missed, and we can independently scale each part of our stack based on current load.
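The inversion amounts to replacing the polling loop with a per-event handler: the ETA is recomputed only for the trip that just reported, on fresh data. As before, the trip ID, field values, and distance-over-speed ETA are invented for illustration.

```python
# Hypothetical delivery windows, in hours from now.
WINDOWS = {"trip-42": 2.0}
alerts: list[str] = []

def on_position_update(trip_id: str, miles_to_stop: float, speed_mph: float) -> None:
    """Triggered per incoming GPS ping, not on a polling timer."""
    eta_hr = miles_to_stop / speed_mph      # naive ETA for illustration
    if eta_hr > WINDOWS[trip_id]:
        alerts.append(trip_id)              # near-instant alert on fresh data

on_position_update("trip-42", 100.0, 60.0)  # ETA ≈ 1.67 hr: on time, no alert
on_position_update("trip-42", 100.0, 40.0)  # ETA = 2.5 hr: window at risk
```

No work happens for vehicles that haven’t reported, and each incoming position is evaluated the moment it arrives.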
As you can see, scaling up an enterprise-level application doesn’t mean throwing more hardware at the problem. Sometimes all you have to do is take a step back and look at what you’re trying to build from a different perspective.