As an administrator, you probably hear about the importance of having web servers that scale well. But what exactly is scalability? Simply, scalability is a web server's ability to maintain a site's availability, reliability, and performance as the amount of simultaneous web traffic, or load, hitting the web server increases.
The major issues that affect website scalability include:

- Performance
- Linear scalability
- Load management
Performance refers to how efficiently a site responds to browser requests according to defined benchmarks. You can design, tune, and measure application performance. Performance is also affected by many complex factors, including application design and construction, database connectivity, network capacity and bandwidth, back-office services (such as mail, proxy, and security services), and server hardware resources.
Web application architects and developers must design and code an application with performance in mind. Once the application is built, administrators can tune performance by setting specific flags and options on the database, the operating system, and often the application itself. Following the construction and tuning efforts, quality assurance testers should test and measure the application's performance prior to deployment to establish acceptable quality benchmarks. If these efforts are performed well, you can review the statistics generated by web server monitoring and logging programs and more accurately diagnose whether the website is operating within its established parameters.
Depending on its size and complexity, your web application may be able to handle anywhere from ten to several thousand concurrent users. The number of concurrent connections to your web server(s) ultimately has a direct impact on your site's performance. Therefore, your performance objectives must include two dimensions:

- The response times (benchmarks) your site must meet
- The number of concurrent users your site must support at those response times
Thus, you must establish response benchmarks for your site and then determine the highest number of concurrent users your site can serve while still meeting those benchmarks. By doing so, you can estimate a rough concurrent-user capacity for each web server and then scale your website by adding servers.
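To make this concrete, here is a minimal sketch of the rough capacity estimate described above. The figures (a per-server capacity of 250 concurrent users at your response benchmark, a peak of 2,000 users) are hypothetical, stand-ins for whatever your own load testing establishes:

```python
import math

def servers_needed(peak_concurrent_users: int, users_per_server: int) -> int:
    """Rough server count: peak concurrent users divided by the number of
    users one server can handle while still meeting response benchmarks."""
    return math.ceil(peak_concurrent_users / users_per_server)

# Hypothetical figures: load testing showed one server sustains 250
# concurrent users within the response benchmark; peak load is 2,000 users.
print(servers_needed(peak_concurrent_users=2000, users_per_server=250))  # 8
```

The calculation is deliberately coarse; it gives a starting server count that you then validate against the monitoring statistics your web servers produce.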
When your site runs on multiple web servers, you must monitor and manage the traffic and load across the group of servers. To learn how to do these tasks, see "Hardware planning" and "Creating scalable and highly available sites".
Perfect scalability (excluding cache initializations) is linear. Linear scalability relative to load means that, with fixed resources, performance decreases at a constant rate as load increases. Linear scalability relative to resources means that, with a constant load, performance improves at a constant rate as resources are added.
Caching and resource management overhead affect an application server's ability to approach linear scalability. Caching allows processing and resources to be reused, alleviating the need to reprocess pages or reallocate resources. Other influences aside, efficient caching brings application server scalability closer to linear.
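As an illustration of the processing reuse that caching provides, the following sketch caches a hypothetical page-rendering function so that repeat requests skip the expensive work. The function and its cost are illustrative only, not part of JRun or ColdFusion:

```python
import functools

# Hypothetical page renderer: without caching, every request would repeat
# the same expensive work (template processing, database queries, etc.).
@functools.lru_cache(maxsize=128)
def render_page(path: str) -> str:
    return f"<html>rendered {path}</html>"

render_page("/catalog")  # first request: full processing (cache miss)
render_page("/catalog")  # repeat request: served from cache (cache hit)
print(render_page.cache_info().hits)  # 1
```

Every cache hit is a request the server answered without reprocessing, which is exactly the effect that lets a well-cached application scale more nearly linearly.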
Resource management becomes more complicated as the quantity of resources increases. The extra overhead for resource management, including resource reuse mechanisms, reduces the ability of application servers to scale linearly relative to constraining resources. For example, when a processor is added to a single processor server, the operating system incurs extra overhead in synchronizing threads and resources across processors to provide symmetric multiprocessing. Part of the additional processing power that the second processor provides is used by the operating system to manage the additional processor, and is not available to help scale the application servers.
It is important to note that application servers can scale relative to resources only when the resource changes affect the constraining resources. For example, adding processor resources to an application server that is constrained by network bandwidth would provide, at best, minor performance improvements. When discussing linear scalability relative to server resources, you should assume that it is relative to the constraining server resources.
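The deviation from linear scaling described above can be quantified as the ratio of observed throughput to a linear projection. The throughput numbers below are hypothetical measurements, chosen only to illustrate the processor example:

```python
def linear_projection(base_throughput: float, resources: int) -> float:
    """Under perfectly linear scaling, throughput grows in direct
    proportion to the constraining resource (here, processor count)."""
    return base_throughput * resources

def scaling_efficiency(observed: float, base_throughput: float,
                       resources: int) -> float:
    """Fraction of the linear projection actually achieved; values below
    1.0 reflect overhead such as cross-processor synchronization."""
    return observed / linear_projection(base_throughput, resources)

# Hypothetical measurements: one processor serves 100 requests/sec, but a
# second processor yields 180 rather than 200 because the operating system
# spends part of the added capacity managing the extra processor.
print(scaling_efficiency(180.0, 100.0, 2))  # 0.9
```

An efficiency well below 1.0 for an added resource is also a hint that the resource you changed may not be the constraining one, as in the network-bandwidth example above.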
Understanding linear scalability in relation to your site's performance is important because it affects not only your application design and construction, but also indirectly related concerns, such as capital equipment budgets.
Load management refers to the method by which simultaneous user requests are distributed and balanced among multiple servers (Web, JRun, ColdFusion, DBMS, file, and search servers). Effectively balancing load across your servers ensures that they do not become overloaded and eventually unavailable.
There are several different methods you can use to achieve load management, including load-balancing hardware devices and load-balancing software such as ClusterCATS.
Each option has distinct merits.
Most load-balancing solutions today manage traffic based on IP packet flow. This approach works well for sites that are not application-centric. To manage web application traffic effectively, however, you must implement a mechanism that monitors and balances servers based on actual web application load. ClusterCATS ensures that the JRun or ColdFusion server, the web server, and the other servers on which your applications depend remain highly available.
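To illustrate the difference between simple traffic distribution and application-aware balancing, here is a minimal sketch contrasting a round-robin policy with a least-loaded policy. The server names, load figures, and helper functions are illustrative only and are not ClusterCATS APIs:

```python
import itertools

servers = ["web1", "web2", "web3"]  # hypothetical server pool

# Round-robin: each request goes to the next server in turn, regardless
# of how busy that server actually is.
round_robin = itertools.cycle(servers)

# Application-aware: route each request to the server currently reporting
# the lightest application load (a greatly simplified stand-in for the
# kind of monitoring ClusterCATS performs).
current_load = {"web1": 210, "web2": 90, "web3": 160}

def pick_least_loaded() -> str:
    return min(current_load, key=current_load.get)

print([next(round_robin) for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']
print(pick_least_loaded())  # 'web2'
```

Round-robin keeps sending traffic to web1 even though it is the most heavily loaded server; the load-aware policy steers new requests to web2 until the load evens out.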
For more information on using hardware and software for load balancing, see "Creating scalable and highly available sites".