What is it that we really mean by scalability?
A service is said to be scalable if increasing the resources in a system results in increased performance in a manner
proportional to the resources added. Increasing performance generally means serving more units of work, but it can
also mean handling larger units of work, such as when datasets grow.
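One way to make "proportional to resources added" concrete is to compare the observed speedup with the factor by which resources grew. The sketch below is only an illustration: the function name `scaling_efficiency` and the throughput figures are assumptions, not measurements from any real system.

```python
def scaling_efficiency(base_throughput, base_nodes, new_throughput, new_nodes):
    """Return the ratio of observed speedup to the resource increase.

    1.0 means performance grew in proportion to the resources added;
    values well below 1.0 mean part of the added capacity is wasted.
    """
    if min(base_throughput, base_nodes, new_throughput, new_nodes) <= 0:
        raise ValueError("all arguments must be positive")
    speedup = new_throughput / base_throughput
    resource_factor = new_nodes / base_nodes
    return speedup / resource_factor


# Hypothetical measurements in requests per second, before and after scale-out.
linear = scaling_efficiency(1000, 4, 2000, 8)     # nodes doubled, throughput doubled
sublinear = scaling_efficiency(1000, 4, 1400, 8)  # nodes doubled, throughput up 40%
print(linear, sublinear)
```

Under this definition, the first case scales proportionally while the second gets far less out of the added nodes than it paid for.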
In distributed systems there are other reasons for adding resources to a system, for example to improve the reliability
of the offered service. Introducing redundancy is an important first line of defense against failures. An always-on
service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
Why is scalability so hard?
Because scalability cannot be an afterthought. It requires applications and platforms to be designed with scaling in
mind, so that adding resources actually improves performance and introducing redundancy does not adversely affect it.
Many algorithms that perform reasonably well under low load and with small datasets can explode in cost if
request rates increase, the dataset grows, or the number of nodes in the distributed system increases.
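As a hedged illustration of cost exploding with node count, consider a hypothetical health-check scheme in which every node contacts every other node each round, compared with one in which each node contacts only a fixed number of peers. Both schemes and the function names are assumptions chosen for this example, not something the text prescribes.

```python
def full_mesh_messages(nodes):
    """Messages per round when every node contacts every other node."""
    return nodes * (nodes - 1)


def fixed_fanout_messages(nodes, fanout=3):
    """Messages per round when each node contacts at most `fanout` peers."""
    return nodes * min(fanout, max(nodes - 1, 0))


# Compare how the per-round message count grows with cluster size.
for n in (4, 40, 400):
    print(n, full_mesh_messages(n), fixed_fanout_messages(n))
```

At four nodes the two schemes cost the same, which is why the quadratic one can look fine in a small deployment. The gap only shows once the cluster grows.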
A second problem area is that growing a system through scale-out generally results in a system that has to come to terms
with heterogeneity. Resources in the system increase in diversity as new generations of hardware come online, as
bigger or more powerful resources become more cost-effective, or when some resources are placed further apart.
Heterogeneity means that some nodes will be able to process faster or store more data than other nodes in the system,
and algorithms that rely on uniformity either break down under these conditions or underutilize the newer resources.
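The underutilization problem can be sketched with a toy model in which a batch of work finishes only when the slowest node finishes its share. The capacities, the workload size, and the two splitting strategies below are assumptions made up for illustration.

```python
def completion_time(shares, capacities):
    """Time until the slowest node finishes its share of the work."""
    return max(share / cap for share, cap in zip(shares, capacities))


def uniform_split(total_work, capacities):
    """Give every node the same share, ignoring its capacity."""
    return [total_work / len(capacities)] * len(capacities)


def weighted_split(total_work, capacities):
    """Give every node a share proportional to its capacity."""
    total_capacity = sum(capacities)
    return [total_work * cap / total_capacity for cap in capacities]


capacities = [100, 100, 400]  # hypothetical units of work per second
total_work = 1200             # hypothetical units of work

uniform_time = completion_time(uniform_split(total_work, capacities), capacities)
weighted_time = completion_time(weighted_split(total_work, capacities), capacities)
print(uniform_time, weighted_time)
```

In this model the uniform split leaves the faster node idle for most of the run, while the capacity-weighted split lets all nodes finish together.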
Is achieving good scalability possible?
Absolutely, but only if we architect and engineer our systems to take scalability into account. For the systems we build
we must carefully inspect along which axes we expect the system to grow, where redundancy is required, and how
heterogeneity should be handled, and we must make sure that architects know which tools they can use
under which conditions, and what the common pitfalls are.
In short, a service is scalable if adding resources increases performance in a manner proportional to the resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.
When weighing performance against scalability, you should generally aim for maximal throughput with acceptable latency.
Latency
: Latency is the time required to perform some action or to produce some result. Latency is measured in
units of time: hours, minutes, seconds, nanoseconds or clock periods.
Throughput
: Throughput is the number of such actions executed or results produced per unit of time. This is
measured in units of whatever is being produced (cars, motorcycles, I/O samples, memory words, iterations) per unit of
time. The term “memory bandwidth” is sometimes used to specify the throughput of memory systems.
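
To make the two definitions concrete, here is a minimal sketch that computes both quantities from a trace of requests. The trace format of `(start, end)` timestamps in seconds and the function names are assumptions for this example.

```python
def mean_latency(requests):
    """Average time per request, in seconds, from (start, end) pairs."""
    return sum(end - start for start, end in requests) / len(requests)


def throughput(requests):
    """Requests completed per second over the observed time window."""
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    return len(requests) / window


# Hypothetical traces: four requests, each taking 2 seconds.
sequential = [(0, 2), (2, 4), (4, 6), (6, 8)]  # handled one at a time
concurrent = [(0, 2), (0, 2), (2, 4), (2, 4)]  # handled two at a time

print(mean_latency(sequential), throughput(sequential))
print(mean_latency(concurrent), throughput(concurrent))
```

Both traces have the same latency per request, but the second completes the same work in half the time, so its throughput is twice as high. This is why the two measures need to be tracked separately.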