Vertical scaling in the cloud

  • Author: Andrea Rákosfalvy

  • Date published: 7/8/2021

  • Time to read: 12 minutes

If you read our previous article on scaling modern web applications, you already know what vertical scaling is, but let’s do a quick recap in case you missed it: 

Vertical scaling is changing the amount of resources on the same server, as opposed to horizontal scaling, where you add or remove same-purpose servers in your stack. Using the transportation analogy, vertical scaling is deciding whether to go by car, minibus or double-decker to your favourite team’s away game.

The biggest advantage of vertical scaling vs horizontal scaling is that there’s nothing special needed in your code to accommodate it. Whilst with horizontally scaled apps you need to think about things such as load balancing, database replication etc., with vertical scaling there are no hoops to jump through. 

Various types of hosting solutions have emerged to meet the scalability needs of modern web apps, and many present scaling as this *magical* thing that just happens. We believe in transparency, so the aim of this article is to present the options and technologies behind vertical scaling, as well as to demystify the vertical scaling model offered by Enscale.

We hope that with this knowledge you will be able to better evaluate the available solutions, the benefits and drawbacks of each as applied to your application and make a more informed decision about a hosting solution that helps support your growth at a pace that is right for you.

Why is scalability important?

It’s probably obvious that as the number of your users increases, your application will require more resources to handle the concurrent requests, or that as your database grows you will need more disk space, but also more RAM to be able to handle database queries. 

Smaller servers are naturally cheaper and often the budget dictates the server size - we understand. However, there are times when skimping on the server can cost you dearly. 

When your server runs out of RAM to back the memory it has granted to running processes (in other words, it is out of memory), the kernel invokes what is called the OOM killer - a last-resort alternative to an OS crash. The OOM killer selects and kills running processes to release enough memory to stabilise the system. The process(es) killed may be something relatively small and unimportant to your application - in which case neither you nor your users will notice. Or the effect can be quite obvious, such as when your database gets corrupted due to multiple OOM kills and won’t start up again. In other cases, an OOM kill may affect a vital part of your application, but you wouldn’t realise until a user complains.
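To make the selection step a little more concrete: on Linux, the kernel exposes a per-process "badness" score in /proc/<pid>/oom_score, and processes with higher scores are more likely to be chosen. The sketch below simply reads that value for the current process. The helper name `oom_score` is our own for illustration, and the file does not exist on non-Linux systems, in which case the helper returns nil.

```ruby
# Read the kernel's current OOM "badness" score for a process.
# Returns nil where the file is unavailable (non-Linux systems,
# insufficient permissions, or a process that no longer exists).
def oom_score(pid = Process.pid)
  Integer(File.read("/proc/#{pid}/oom_score").strip)
rescue SystemCallError, ArgumentError
  nil
end

score = oom_score
if score
  puts "oom_score for this process: #{score}"
else
  puts "oom_score is not available on this system"
end
```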

Let’s say there’s a script in your application that does the following: 

  1. take customer payment
  2. send API request to warehouse for item to be picked and packed
  3. send API request to courier to request parcel dispatch

If step 1 completes, but the process is OOM killed during step 2 or 3, your customer won’t receive what they’ve paid for and you may not even find out about it until they write an angry email and tell all their friends to never order from you again.
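One way to make half-finished orders visible is to record each completed step somewhere durable, so that orders which were paid for but never dispatched can be listed afterwards. The following is only a minimal sketch of that idea, not a description of how any particular shop or platform does it: the step names, the journal file format and both helper methods are hypothetical.

```ruby
require "json"
require "tmpdir"

# Hypothetical step names matching the three steps of the script above.
STEPS = %w[payment_taken warehouse_notified courier_requested].freeze

# Append one line per completed step and flush it to disk, so the
# record survives even if the process is killed right afterwards.
def record_step(journal_path, order_id, step)
  File.open(journal_path, "a") do |file|
    file.puts(JSON.generate(order: order_id, step: step))
    file.fsync
  end
end

# Return the ids of orders that have entries in the journal but never
# reached the final step.
def incomplete_orders(journal_path)
  steps_by_order = Hash.new { |hash, key| hash[key] = [] }
  File.foreach(journal_path) do |line|
    entry = JSON.parse(line)
    steps_by_order[entry["order"]] << entry["step"]
  end
  steps_by_order.reject { |_, steps| steps.include?(STEPS.last) }.keys
end

Dir.mktmpdir do |dir|
  journal = File.join(dir, "orders.log")
  STEPS.each { |step| record_step(journal, 1001, step) } # completed order
  record_step(journal, 1002, STEPS.first)                # killed after payment
  p incomplete_orders(journal)                           # prints [1002]
end
```

A scheduled job could run a check like `incomplete_orders` and alert you, rather than leaving the discovery to an unhappy customer.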

Enscale has built-in email alerts to automatically notify you whenever an OOM kill occurs, detailing the processes affected, so such problems are impossible to miss. Regardless of which hosting solution you choose, you should make sure you have a way to identify when OOM kills occur as they can snowball quickly.
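If your hosting does not alert you, a rough do-it-yourself check is to scan the kernel log for OOM kill messages. The sketch below assumes the common Linux message shape "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, so treat the pattern as an assumption to verify against your own logs. The sample lines are made up for illustration, and `filter_map` needs Ruby 2.7 or newer.

```ruby
# Assumed message shape; check it against your own kernel's log output.
OOM_KILL = /Killed process (?<pid>\d+) \((?<name>[^)]+)\)/

# Extract the pid and process name from every line that reports an OOM kill.
def oom_kills(log_lines)
  log_lines.filter_map do |line|
    match = OOM_KILL.match(line)
    { pid: match[:pid].to_i, name: match[:name] } if match
  end
end

# Made-up sample lines: one OOM kill and one unrelated kernel message.
sample = [
  "kernel: Out of memory: Killed process 4312 (mysqld) total-vm:2097152kB",
  "kernel: usb 1-1: new high-speed USB device number 2"
]

oom_kills(sample).each do |kill|
  puts "OOM kill: #{kill[:name]} (pid #{kill[:pid]})"
end
```

In practice the lines would come from the kernel log, for example the output of `dmesg` or `journalctl -k`, rather than from a hard-coded array.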

The ability to scale vertically means that you can increase resource limits to prevent nasty things, such as OOMs, without having to change the way your application works.

Physical vs virtual vertical scaling

A bare-metal server can be scaled up vertically, but only if you didn’t get the most power out of the server from the initial setup.

RAM can only be scaled up to the level the processor allows for, and you also have to be mindful of available slots in the motherboard. Similarly, for upgrading CPU you can add multiple processors - provided your motherboard and the CPU model used support it.

Although in theory you can replace one CPU with one (or more) CPUs from the same generation, it’s not normally cost-effective to do that - by the time you decide that CPU X, which satisfied your needs some months or years ago, is no longer powerful enough, a new generation of CPUs is in town and you’ll get the best performance for cost by switching to one of those instead.

A new generation of CPUs will often have a different socket type and require a different motherboard chipset, so changing the processor usually goes hand-in-hand with changing the motherboard. At this point it’s generally more cost-effective to purchase a new server altogether.

Once you have your basic parts or new server - you also need to schedule downtime for the duration of the work. If you go with a new server you also need to migrate your data from the old server to the new one, test it to make sure it works and put it in production. If you only add more RAM to your existing server, you’d need to shut the server down, insert the RAM, boot up and make sure things still work and the new RAM is being utilised.

Considering that we’re talking about hardware, you can’t do ad hoc upgrades - everything needs to be planned in advance, hardware needs to be ordered, delivered, maintenance windows announced to your users etc.

Hosting providers who offer bare metal servers for lease do generally (but not always) have spare hardware to make upgrades happen faster, but dedicated servers (especially good ones) still cost a lot and the best practice is to plan ahead for growth when purchasing. Of course, this translates into paying more for a server that will also support your eventual growth, rather than getting one that’s just the right size for now.

Virtualisation technology made things a lot simpler and now most providers offer virtual private servers which can be scaled up and down a lot more easily (not to mention they are easier to handle - for example when you need to migrate).

With the hypervisor-based virtualisation model, a bare metal server is split up into virtual machines, which makes resizing less painful, as it’s a resize operation instead of a change in hardware. It’s much like repartitioning your computer’s disk, but for any resource type, not only disk space. While the process itself can be automated better and won’t require as much time, some sort of reboot or service restart is still required to finish the task, so scheduling these for times when your app is less busy is still advised.

The other virtualisation technology is based on containers: here a bare metal server hosts one or more containers. Unlike in the previous model, resources can be reallocated between containers at any time without disruption to service, so the technology is useful for ad hoc upgrades as well as downgrades.

Critics of virtualisation often bring up the hosting providers’ ability to oversell servers - and while some probably do, we continually monitor resources to ensure that your containers are able to grow on demand. A big advantage of virtualisation technology is that containers can also be live-migrated between hardware nodes without impacting your application. This allows our team to move servers around and ensure that the underlying hardware node can support the resource requirements of all of the containers on it.

Important vertical scaling considerations

We noted earlier that OOM events can cause a lot of harm, so you should always ensure that your application has enough available resources to prevent them.

Applications also like to be “comfortable”: if they have more RAM, they will likely use a little more than they strictly need, for example by keeping more objects in memory than necessary. There are measures in place to keep this in check, like the garbage collector (GC). Put very simply, the GC identifies objects in memory that are no longer needed and removes them. Garbage collection happens automatically in Ruby, but you can also trigger it manually with GC.start if need be. Removing objects from memory is only one part of the process though: the object might not be there anymore, but its “place” may not be returned to the available memory - this phenomenon is referred to as fragmentation.

You can counteract fragmentation by calling malloc_trim(), using jemalloc, or setting the MALLOC_ARENA_MAX=2 environment variable. Each comes with its own costs and benefits. Ruby 2.7 introduced heap compaction with the specific purpose of reducing fragmentation, but it had to be triggered manually with GC.compact. Ruby 3.0 added automatic compaction, which you can switch on with GC.auto_compact = true.
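As a rough illustration of the compaction options, the sketch below wraps the two standard calls, GC.compact (Ruby 2.7+) and GC.auto_compact= (Ruby 3.0+), in guards, because compaction is not available on every Ruby version or platform. The method names `compact_heap` and `enable_auto_compaction` are our own. Compaction adds overhead and has been known to expose bugs in some C extensions, so test before enabling it in production.

```ruby
# Run a full GC, then compact the heap if this Ruby build supports it.
def compact_heap
  return :unsupported unless GC.respond_to?(:compact)

  GC.start
  GC.compact
  :compacted
rescue NotImplementedError
  :unsupported
end

# Opt in to compaction during major GC runs (Ruby 3.0+ only).
# Returns the resulting setting, or false where it is unavailable.
def enable_auto_compaction
  return false unless GC.respond_to?(:auto_compact=)

  GC.auto_compact = true
  GC.auto_compact
rescue NotImplementedError
  false
end

puts "manual compaction: #{compact_heap}"
puts "automatic compaction enabled: #{enable_auto_compaction}"
```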

There are already a lot of amazing resources out there about memory allocation and garbage collection in Ruby, so we won’t go into more detail here, but if you want to learn more we strongly recommend checking out Jemma Issroff’s blog series: Ruby Garbage Collection Deep Dive.

With garbage collection and compaction, Ruby already handles a lot of memory usage optimisations on your behalf, but badly written or badly configured applications are still capable of eating up everything that is thrown at them. You should always make sure to monitor resource usage and optimise your code as well regardless of whether you’re hosting on an auto-scaling PaaS or a fixed-sized server. Memory problems don’t only affect your budget, but strongly affect application performance as well.

Due to Ruby’s garbage handling, provided your application doesn’t have any major memory leaks, you should see an increased memory usage on increased load, and a drop in memory usage as the load decreases and objects are being garbage collected. This fluctuation in memory basically means that your application already has automatic vertical scaling capabilities. However, it still needs to be translated to the server in order to reflect on your costs. This is where Enscale’s scaling model comes into play.
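You can observe this rise and fall from inside a Ruby process. The sketch below is a simplified stand-in for real traffic: it holds a large number of strings to simulate load, then releases them, and reads the number of live object slots from GC.stat after a full GC run each time. The method names and the object count are arbitrary choices for illustration, and live slots are only a proxy for the memory the operating system sees.

```ruby
# Number of live object slots after a full GC run.
def live_slots
  GC.start
  GC.stat(:heap_live_slots)
end

# Simulate a burst of load: hold many objects, measure while they are
# still referenced, then release them again.
def live_slots_under_load(count = 200_000)
  held = Array.new(count) { |i| "request-#{i}" }
  slots = live_slots
  held.clear
  slots
end

before = live_slots
during = live_slots_under_load
after  = live_slots

puts "idle: #{before}, under load: #{during}, after load: #{after}"
```

The exact numbers differ from run to run, but the count under load should be clearly higher than the counts before and after.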

Enscale’s vertical scaling model

Enscale uses Virtuozzo, a container-based virtualisation solution, to create multiple role-based servers that form your environment. From here on we will refer to these containers as nodes.

Each node is created with a predefined resource limit, also known as the scaling limit - which you can change from the dashboard any time to increase or decrease the RAM and CPU power allocated to each node type. 

So far there’s nothing special; most virtual servers work the same way. However, the resource limits in Enscale are just that: limits. The resources are not in fact allocated to your node up front, but you can use up to that maximum if and when required. Container-based virtualisation allows us to allocate the required resources automatically as they are needed, without any disruption to service.

Consider this like getting the largest server you think you’d ever need for that specific role. The big difference is that we don’t overcharge you by making you pay for resources you only think you might need (resource limit). Our billing model is based on your actual resource allocation to your nodes for any given hour, so you only pay for what your application really uses.

As RAM and CPU generally go together (increasing CPU is useless without enough RAM to serve it, and RAM by itself does nothing), in Enscale they are combined into a composite resource unit that consists of 128MiB RAM and 400MHz CPU. This association comes from Jelastic PaaS, the underlying platform for Enscale, and is based on years of experience and best practices. In fact, their original starting value for CPU was just 200MHz, which they increased on the recommendation of our very own service director, who pointed out that it would be a significant performance improvement, especially for users who prefer to set a lower limit based on RAM to keep their app “in check”.
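The arithmetic behind the resource unit can be sketched as follows. The unit size (128MiB RAM and 400MHz CPU) comes from the description above. Everything else is an assumption made for illustration: that the larger of the RAM and CPU requirements decides the unit count for an hour, the sample usage figures, and the 16-unit limit. Check your provider's billing documentation for the actual rules.

```ruby
RAM_PER_UNIT_MIB = 128
CPU_PER_UNIT_MHZ = 400

# Units needed to cover both the RAM and the CPU demand for one hour.
# Assumption: whichever resource needs more units decides the count.
def units_for(ram_mib:, cpu_mhz:)
  ram_units = (ram_mib.to_f / RAM_PER_UNIT_MIB).ceil
  cpu_units = (cpu_mhz.to_f / CPU_PER_UNIT_MHZ).ceil
  [ram_units, cpu_units].max
end

# Hypothetical hourly samples for one node.
hourly_usage = [
  { ram_mib: 300, cpu_mhz: 250 },  # quiet hour: 3 units (RAM-bound)
  { ram_mib: 900, cpu_mhz: 3500 }  # busy hour: 9 units (CPU-bound)
]

used_units  = hourly_usage.sum { |hour| units_for(**hour) }
limit_units = 16 # hypothetical scaling limit for this node

puts "units used over #{hourly_usage.size} hours: #{used_units}"
puts "units if the full limit were billed: #{limit_units * hourly_usage.size}"
```

With these made-up figures the node uses 12 unit-hours, compared with 32 if it were billed at its limit for both hours.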

The container-based virtualisation model allows for resource reallocation without disruption to the server. But there’s no point in having more power if your server configuration isn’t taking advantage of it. 

For example, if you up your MySQL database’s max resources, the `innodb_buffer_pool_size` value should be set higher to allow for more optimal use of the extra available RAM. The mysql service then needs to be restarted for the config changes to be applied.
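As a simple illustration of the kind of adjustment involved, the sketch below derives an `innodb_buffer_pool_size` line from a node's RAM limit. The 60% share and the helper name `buffer_pool_setting` are assumptions chosen for the example; this is not Enscale's actual tuning logic, and a suitable share depends on what else runs on the node.

```ruby
# Suggest an innodb_buffer_pool_size config line for a given RAM limit.
# The default share of 0.6 is an illustrative assumption only.
def buffer_pool_setting(ram_limit_mib, share: 0.6)
  size_mib = (ram_limit_mib * share).floor
  "innodb_buffer_pool_size = #{size_mib}M"
end

puts buffer_pool_setting(2048) # limit of 2GiB
puts buffer_pool_setting(4096) # limit raised to 4GiB
```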

While you can change the configuration files yourself, Enscale saves you the time and effort by automatically adjusting them to make the best use of the new resource limits and performing the necessary service restarts. This means that when you manually adjust your limits, there is a brief disruption in service (a few seconds). The disruption doesn’t happen on automatic scaling events though; only when you adjust the limits.

A word about “infinite scaling”

Whilst this is a popular tagline, the basis of the “cloud” is still bare metal servers, which naturally come with a limited amount of resources. These servers are split up into multiple virtual private servers or containers, but the underlying hardware still determines the maximum possible vertical scaling for any single VPS or container.

There’s naturally a limit to Enscale’s vertical scaling as well, but instead of possibly maxing out the underlying hardware with a single container, we set a soft limit of 32GB of RAM per container (which can be increased further by request). Our experience is that when applications have higher resource requirements than the imposed soft limits per node, it’s worth a discussion with our technical support team to validate the overall application topology and find the most suitable long-term solution - be this an increase in the soft limit for vertical scaling or an alternative scaling option, depending on your exact needs.


In conclusion

The ability to scale your application vertically with Enscale has two main benefits: 

Firstly, there is no added complexity: Ruby already takes care of memory management for you with garbage collection and compaction. Unlike horizontal scaling, where you also have to adjust your code to work with multiple servers, this saves you a lot of time and effort when coding. Enscale also handles the server config changes needed to take advantage of your new limits. You can adjust these at any time, but you don’t have to.

Secondly, Enscale’s billing model ensures that your wallet also benefits from your application’s scalability, as you only pay for the resources your app actually uses. Your visitor numbers rarely double from one day to the next, so why settle for being forced to double your resources (and costs)?

Tl;dr version

  • Virtualisation technology allows applications to scale quite easily and has become a sought-after option for modern applications
  • Vertical scaling allows for increasing resources within the same server/container
  • Ruby’s garbage collection and garbage compaction make your application vertically scalable by default
  • Enscale offers budget-friendly granular automatic vertical scaling
  • Our vertical scaling solution also performs config changes so you don’t have to

© 2022 Layershift Limited. All Rights Reserved.