Optimizing High-Traffic, Massive-Data Websites

The big dilemma in web development is that web pages are expected to do more and more, yet they are also expected to load instantaneously.  A web page that takes longer than 2 seconds to load is considered slow.  By 10 seconds many users give up, and a 20-second load time is considered completely unusable.  At the same time, with dynamic data, hundreds of data queries can be needed to display a single web page.  When thousands of users are accessing the page simultaneously, the server can get clogged up, increasing the time it takes to render the page.

Client Side Optimization

What makes a page take time to be displayed? The loading time of additional files and script execution time.  To decrease initial script execution time, the initial display should be as independent as possible from script execution, and scripts should be reserved for event handling.  To decrease the loading time of additional files, there are a few helpful methods.  First of all, each HTTP request has its own overhead, so combine JavaScript and CSS files to make fewer HTTP requests.  Image files may also be combined into one larger sprite file; the CSS background-position property then determines which image is displayed.  Additionally, since these files are generally static, they may be loaded faster by storing them on a CDN, so that the end user downloads them from a nearby distributed server, decreasing network time.
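To make the sprite idea concrete, here is a minimal sketch in Python that generates the CSS rules for icons stacked vertically in one sprite file.  The function name, the class names, and the file name sprite.png are hypothetical, and the sketch assumes the icons have already been merged into the sprite in the listed order.

```python
def sprite_css(images, sprite_url="sprite.png"):
    """Build CSS rules for icons stacked top to bottom in one sprite.

    images: list of (css_class, width_px, height_px) tuples, in the
    same order in which the icons appear in the sprite file.
    """
    rules = []
    y_offset = 0
    for css_class, width, height in images:
        # A negative vertical offset shifts the sprite up so that only
        # this icon shows through the element's width/height window.
        rules.append(
            f".{css_class} {{ background: url('{sprite_url}') no-repeat "
            f"0 {-y_offset}px; width: {width}px; height: {height}px; }}"
        )
        y_offset += height
    return "\n".join(rules)


if __name__ == "__main__":
    print(sprite_css([("icon-home", 16, 16), ("icon-mail", 16, 16)]))
```

Each element still gets its own CSS class, but the browser makes a single HTTP request for the image instead of one per icon.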

Database Optimization

On the server side, the biggest bottleneck is database access.  There are two types of optimization for database response time: query optimization and server distribution.  To optimize the queries, the first thing is to make sure that all the necessary indexes are in place.  Every field that is used in a “WHERE”, “JOIN”, or “ORDER BY” should generally be indexed.  This should eliminate the most severe inefficiencies.  After that, overhead may be further reduced by monitoring the server for the queries that take the most resources and using EXPLAIN to find their inefficiencies.

Server distribution means increasing the number of database servers, so that no single server has a long queue of requests.  To do this, there should be a master server where all write commands, such as “INSERT” and “UPDATE”, are executed.  All changes should be replicated to the slave servers.  All read commands, such as “SELECT”, should be performed on the slave servers.  To speed things up further when there is greater traffic, additional slave servers should be created, and the requests should be distributed across them.  The disadvantage of this is that the information queried could be a split second out of date due to replication time.  It also creates another place where data retrieval can go wrong, namely when replication fails.  Still, with large data and traffic, server distribution is absolutely needed for the website to run at a reasonable speed.
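The read/write split described above can be sketched as a small router in Python.  This is only an illustration under stated assumptions: the class name QueryRouter is hypothetical, the "connections" can be any objects (plain strings in the usage example), and the replicas are chosen in simple round-robin order.  A real setup would usually rely on the database driver or a proxy for this.

```python
import itertools


class QueryRouter:
    """Send reads to the replicas in turn and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # cycle() hands out the replicas one after another, forever.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql):
        """Return the connection that should run this statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Only plain SELECT statements are treated as reads; anything
        # else (INSERT, UPDATE, DELETE, DDL, ...) goes to the master.
        if verb != "SELECT" or self._replicas is None:
            return self.master
        return next(self._replicas)


if __name__ == "__main__":
    router = QueryRouter("master", ["replica-1", "replica-2"])
    print(router.pick("UPDATE users SET name = 'x' WHERE id = 1"))
    print(router.pick("SELECT * FROM users"))
    print(router.pick("SELECT * FROM posts"))
```

Treating every non-SELECT statement as a write is the cautious choice here: a statement sent to the master by mistake only costs some load, while a write sent to a replica would be lost or would break replication.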

Caching

After all the queries are optimized, the website may still be slow because a large number of queries are being executed.  If the data is not constantly changing, there is no need for all the queries to be executed each time a page is loaded.  Instead, the content should be stored in memcache.  Content will then load almost instantaneously.  Sometimes the specific page content is dynamic, but much of the underlying data is static.  Here, the page may still load more quickly when the model caches the static data.  Just make sure to invalidate the memcached content when the data is updated.
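The caching pattern described above, including the invalidation step, might look roughly like this in Python.  To keep the sketch self-contained, plain dicts stand in for both the database and the memcache client; the class name ArticleStore and the key format are hypothetical.

```python
class ArticleStore:
    """Model that reads through a cache and invalidates it on update."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for the real database
        self.cache = cache        # dict standing in for a memcache client
        self.db_reads = 0         # counts how often the database is hit

    @staticmethod
    def _key(article_id):
        return f"article:{article_id}"

    def get(self, article_id):
        cached = self.cache.get(self._key(article_id))
        if cached is not None:
            return cached  # cache hit: no database access at all
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database[article_id]
        self.cache[self._key(article_id)] = value
        return value

    def update(self, article_id, body):
        self.database[article_id] = body
        # Invalidate so that the next read fetches the fresh value.
        self.cache.pop(self._key(article_id), None)


if __name__ == "__main__":
    store = ArticleStore({1: "first draft"}, {})
    store.get(1)
    store.get(1)
    print(store.db_reads)  # the second read was served from the cache
```

Deleting the cached entry on update, rather than overwriting it, keeps the write path simple: the next read repopulates the cache from the database.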

External Indexing

Still, if I want to search the website’s data, I need dynamic results that cannot always be cached.  How do I avoid increased database usage? Use an external data index, such as Sphinx, to search the data.  You can have it return the ids of the records in the search results.  Once you have the ids, you just look up the records in memcache, and no direct database access is required to display the search results.
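The flow of searching an external index for ids and then reading the records from the cache can be sketched in Python as follows.  The tiny in-memory word index here is only a stand-in for a real search engine such as Sphinx and does not use its API; all function names are hypothetical.  The database fallback on a cache miss is an added assumption for the case where the cache is not fully warm.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search_ids(index, query):
    """Return the sorted ids of records that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


def fetch_records(ids, cache, load_from_db):
    """Look up each id in the cache, touching the database only on a miss."""
    results = []
    for record_id in ids:
        record = cache.get(record_id)
        if record is None:
            record = load_from_db(record_id)
            cache[record_id] = record
        results.append(record)
    return results


if __name__ == "__main__":
    records = {1: "fast web cache", 2: "slow web page", 3: "fast database"}
    index = build_index(records)
    warm_cache = dict(records)  # assume the cache already holds the records
    ids = search_ids(index, "fast")
    print(fetch_records(ids, warm_cache, records.__getitem__))
```

With a warm cache, the search itself is handled by the index and the records come from the cache, so the database is not queried at all.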
