Optimizing High Traffic Massive Data Websites
The big dilemma in web development is that web pages are expected to do more and more, yet they are also expected to load almost instantaneously. A web page that takes longer than 2 seconds to load is considered slow. By 10 seconds many users give up, and a 20-second load time is considered completely unusable. At the same time, with dynamic data, hundreds of data queries may need to be performed to display a single web page. When thousands of users access the page simultaneously, the server can get clogged with requests, increasing the time it takes to respond and render the page.
Client Side Optimization
What makes a page take time to be displayed? Two things: the loading time of additional files and script execution time. To decrease initial script execution time, the initial display should be as independent as possible from script execution, and scripts should be reserved for event handling. To decrease the loading time of additional files, there are a few helpful methods. First of all, each HTTP request has its own overhead, so combine JavaScript and CSS files to make fewer HTTP requests. Image files may also be combined into one larger sprite file; the CSS background-position property then determines which part of the sprite is displayed. Additionally, since these files are generally static, they may be loaded faster by storing them on a CDN, so that the end user downloads them from a nearby distributed server, decreasing network time.
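To make the sprite technique concrete, here is a minimal sketch in Python that generates the CSS rules for icons stacked vertically in a single sprite image. The names Icon and sprite_css, the .icon- class prefix, and the file name sprite.png are assumptions made for illustration, and the vertical layout is only one of several possible ways to arrange a sprite.

```python
from typing import List, NamedTuple


class Icon(NamedTuple):
    """One image inside the sprite (hypothetical structure for this sketch)."""
    name: str
    width: int
    height: int


def sprite_css(icons: List[Icon], sprite_url: str = "sprite.png") -> str:
    """Return one CSS rule per icon, assuming icons are stacked top to bottom."""
    rules = []
    y = 0  # vertical offset of the current icon inside the sprite
    for icon in icons:
        # A negative y offset shifts the sprite upward so that only
        # this icon shows through the element's width/height window.
        position = "0 0" if y == 0 else f"0 -{y}px"
        rules.append(
            f".icon-{icon.name} {{ background: url('{sprite_url}') "
            f"no-repeat {position}; width: {icon.width}px; "
            f"height: {icon.height}px; }}"
        )
        y += icon.height
    return "\n".join(rules)


if __name__ == "__main__":
    print(sprite_css([Icon("home", 16, 16), Icon("search", 16, 24)]))
```

With this approach the browser makes one request for the sprite instead of one request per icon, and each element simply shows a different window onto the same image.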
Database Optimization
On the server side, the biggest bottleneck is usually database access. To optimize database response time, there are two types of optimization: query optimization and server distribution. To optimize the queries, the first thing is to make sure that all the necessary indexes are in place. As a rule of thumb, every field that is used in a “WHERE”, “JOIN”, or “ORDER BY” should be indexed. This should eliminate the most severe inefficiencies. After that, overhead may be further reduced by monitoring the server for the queries that consume the most resources and using EXPLAIN to find where they are inefficient.
Server distribution means increasing the number of database servers, so that no single server has a long queue of requests. To do this, there should be a master server where all write commands, such as “INSERT”, “UPDATE”, etc., are executed. All changes should be replicated to the slave servers. All read commands, such as “SELECT”, should be performed on the slave servers. To speed things up further when traffic grows, additional slave servers should be created, and the read requests should be distributed across them. The disadvantage is that the information queried could be a split second out of date because replication takes time. It also creates another place where data retrieval can go wrong: when replication fails. Still, with large data and heavy traffic, server distribution is needed for the web site to run at a reasonable speed.
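The read/write split described above can be sketched as a small router that picks a server for each statement. This is only an illustration under stated assumptions: the class name QueryRouter is hypothetical, the servers are represented by plain strings rather than real connections, and the statement type is guessed from its first keyword, which a production setup would handle more carefully (for example, inside transactions).

```python
import itertools
from typing import List


class QueryRouter:
    """Send SELECTs to slave servers in round-robin order, everything else to the master."""

    def __init__(self, master: str, slaves: List[str]):
        self.master = master
        # cycle() yields slave1, slave2, ..., slave1, ... forever
        self._slaves = itertools.cycle(slaves) if slaves else None

    def pick(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # INSERT, UPDATE, DELETE and anything unrecognized go to the
        # master, which is the only server that accepts writes.
        return self.master


if __name__ == "__main__":
    router = QueryRouter("master", ["slave1", "slave2"])
    for statement in ("SELECT * FROM users", "UPDATE users SET name = 'x'"):
        print(statement, "->", router.pick(statement))
```

Routing unknown statements to the master is the conservative choice here: a read sent to the master is merely slower, while a write sent to a slave would be lost or rejected.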
Caching
After all the queries are optimized, the website may still be slow because a large number of queries are being executed. If the data is not constantly changing, there is no need for all the queries to be executed each time a page is loaded. Instead, the content should be stored in memcache. Cached content will then load almost instantaneously. Sometimes the specific page content is dynamic, but much of the underlying data is static. Here, the page may still load quicker when the model caches the static data. Just make sure to invalidate the cached content when the data is updated.
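The read-through and invalidate-on-update pattern can be sketched as follows. To keep the example self-contained, DictCache is an in-memory stand-in for a memcache client and a plain dictionary stands in for the database table; both, along with the ArticleStore name and the article: key prefix, are assumptions for illustration rather than any real client API.

```python
class DictCache:
    """In-memory stand-in for a memcache client (hypothetical get/set/delete)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class ArticleStore:
    """Model that caches rows on read and invalidates them on write."""

    def __init__(self, cache):
        self.cache = cache
        self.db = {}       # stand-in for a database table
        self.db_reads = 0  # counts how often the "database" is hit

    def _key(self, article_id):
        return f"article:{article_id}"

    def get(self, article_id):
        key = self._key(article_id)
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # cache hit: no database access
        self.db_reads += 1
        row = self.db.get(article_id)
        if row is not None:
            self.cache.set(key, row)
        return row

    def save(self, article_id, row):
        self.db[article_id] = row
        # Invalidate so the next read fetches the fresh row.
        self.cache.delete(self._key(article_id))


if __name__ == "__main__":
    store = ArticleStore(DictCache())
    store.save(1, {"title": "Hello"})
    store.get(1)
    store.get(1)
    print("database reads:", store.db_reads)
```

Deleting the key on update, rather than overwriting it, keeps the write path simple: the next reader repopulates the cache from the database.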
External Indexing
Still, if I want to search the website’s data, I need dynamic results that cannot always be cached. How do I avoid increased database usage? Use an external data index, such as Sphinx, to search the data. You can have it return the ids of the records that match the search. Once you have the ids, you just look up the records in memcache, and no direct database access is required to display the search results.
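The flow of searching an index for ids and then reading the records from the cache can be sketched like this. TinyIndex is a toy inverted index standing in for an external search engine such as Sphinx; it does not reflect Sphinx's actual API. A plain dictionary stands in for memcache, and the optional load_record callback is a hypothetical fallback for records that are missing from the cache.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each word to the ids of records containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record_id, text):
        for word in text.lower().split():
            self._postings[word].add(record_id)

    def search(self, query):
        """Return sorted ids of records that contain every word in the query."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(
            *(self._postings.get(word, set()) for word in words)
        )
        return sorted(matches)


def search_records(index, cache, query, load_record=None):
    """Resolve search hits to full records, reading from the cache first."""
    results = []
    for record_id in index.search(query):
        key = f"record:{record_id}"
        record = cache.get(key)
        if record is None and load_record is not None:
            # Cache miss: fall back to the slower source and re-cache.
            record = load_record(record_id)
            if record is not None:
                cache[key] = record
        if record is not None:
            results.append(record)
    return results


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "fast web cache")
    index.add(2, "web database")
    cache = {"record:1": {"id": 1}, "record:2": {"id": 2}}
    print(search_records(index, cache, "web cache"))
```

The index only has to store ids, so it stays small, while the full records are served from memory whenever they are already cached.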