Monday, April 26, 2010

Achieving Fast Performance in Web-Based Applications

Fast performance is essential when serving web-based applications to thousands of simultaneous users. Performance issues can arise at several points: the network level (congestion caused by traffic loads and limited bandwidth), the application level (inefficiencies or bottlenecks in code), the web server level (overload), the user interface level (unresponsive web-based interfaces) and the platform level (overload and large datasets).

At Charonite we are constantly researching new ways of improving the performance of web-based applications. We have found from experience that measuring and logging performance is absolutely critical to troubleshooting performance problems efficiently.
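As a minimal sketch of what measuring and logging performance can look like in practice, the Python decorator below records how long each call to a function takes. The logger name "perf" and the function build_report are hypothetical names chosen for illustration, not part of any particular application.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # hypothetical logger name


def timed(func):
    """Log how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            logger.info("%s took %.2f ms", func.__name__, elapsed_ms)
    return wrapper


@timed
def build_report(n):
    # Stand-in for a real request handler or query.
    return sum(i * i for i in range(n))
```

Once a handler is attached to the logger, every call to build_report produces one timing line, which can then be collected and compared over time.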

At the network level, congestion can often be reduced by having applications prioritise data communications automatically, "rationing" bandwidth intelligently so that smaller, critical data takes priority over larger, less critical data. For example, our intelligent transport system applications give priority to small pieces of data like vehicle license plates, and give lower priority to routine images and videos coming from monitoring applications. Intelligent caching and layers of proxies can help reduce the overall number of queries and internal data transfers that the system needs to handle. Reducing the amount of disk-bound I/O also helps to increase data throughput significantly. Simple settings, such as enabling GZIP compression for HTML data in the web server, can also significantly reduce the bandwidth needed.
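The prioritisation idea can be illustrated with a small send queue that always releases the most urgent item first. This is only a sketch under assumptions: the class name PrioritySendQueue, the two priority levels and the sample payloads are all made up for the example and do not describe any specific production system.

```python
import heapq
import itertools


class PrioritySendQueue:
    """Send queue that releases items with the lowest priority number first."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties, so items of equal priority leave in FIFO order.
        self._counter = itertools.count()

    def put(self, priority, payload):
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def get(self):
        _priority, _order, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


# Hypothetical priority levels: 0 is the most urgent.
CRITICAL, ROUTINE = 0, 10

queue = PrioritySendQueue()
queue.put(ROUTINE, b"<camera frame bytes>")
queue.put(CRITICAL, b"ABC-123")
queue.put(ROUTINE, b"<video chunk bytes>")
queue.put(CRITICAL, b"XYZ-789")

# Both license plates are sent before either of the larger routine payloads.
send_order = [queue.get() for _ in range(len(queue))]
```

A strict priority queue like this can starve routine traffic while critical data keeps arriving, so a real system would probably also reserve some share of bandwidth for the lower priority levels.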

At the application level, profiling tools, good algorithm design and an appropriate choice of data structures generally produce the best improvements in performance. Profiling makes bottlenecks easy to find and ensures that as yet unnoticed problems are found and fixed proactively. Often, simple fixes in algorithms, such as removing unnecessary loops and disk reads, can provide significant performance boosts. Good knowledge of data structures and their behaviour under different real-life loads also helps. For database-bound applications, analysing slow query logs and the structure of SQL statements can improve performance by factors of hundreds through intelligent query modifications and rewrites. Asynchronous events are also generally useful for fast reaction times, while synchronous polling cycles, limited to a maximum number of items processed per cycle, are useful in realtime environments where data throughput and processing times need to be guaranteed and modelled formally.
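A bounded polling cycle of the kind mentioned above might be sketched as follows. The budget MAX_ITEMS_PER_CYCLE and the function name run_cycle are assumptions made for the example. The point is that each cycle does at most a fixed amount of work, which keeps the worst-case cycle time predictable.

```python
from collections import deque

MAX_ITEMS_PER_CYCLE = 3  # hypothetical budget; tune to the time allowed per cycle


def run_cycle(pending, handle, max_items=MAX_ITEMS_PER_CYCLE):
    """Process at most max_items from the queue; the rest wait for the next cycle."""
    processed = 0
    while pending and processed < max_items:
        handle(pending.popleft())
        processed += 1
    return processed


pending = deque(range(8))   # eight queued work items
results = []                # items in the order they were handled
per_cycle = []              # how many items each cycle processed

while pending:
    per_cycle.append(run_cycle(pending, results.append))
```

With eight queued items and a budget of three, the work is spread over three cycles of 3, 3 and 2 items, and no single cycle ever exceeds the budget.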

At the web server level, various strategies can be employed to improve performance, including separate web server farms for dynamic and static content. Dynamic content that does not change often can also be cached and then served by a simple web server, which automatically takes some of the load off the more complicated portions of the web application. We recommend the Apache web server with a customised configuration that loads only the necessary modules.
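Caching dynamic content that rarely changes can be illustrated with a small time-to-live cache. This is a sketch only: the class TTLCache, the 60-second lifetime and the render_home_page function are hypothetical, and a real deployment would more likely rely on a caching proxy or the web server's own cache.

```python
import time


class TTLCache:
    """Keep rendered pages for ttl_seconds, then render them again on demand."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_render(self, key, render):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive render
        value = render()
        self._entries[key] = (now + self._ttl, value)
        return value


render_calls = []


def render_home_page():
    # Stand-in for an expensive dynamic page build.
    render_calls.append(1)
    return "<html>home</html>"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_render("/home", render_home_page)
second = cache.get_or_render("/home", render_home_page)  # served from the cache
```

The second request returns the stored page without calling render_home_page again, which is the load reduction the paragraph above describes.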


At the user interface level, techniques such as AJAX help make the user interface faster and more responsive. Newer technologies such as HTML5 look promising and should provide an even better user experience in the near future. Thicker clients such as Flash or Java applets can also provide faster performance, especially for graphics-intensive uses. However, we generally use such techniques only as a last resort: proper use of HTML and CSS can deliver a lot of performance without sacrificing compatibility and standards-based development. A well-designed user interface can also work around performance issues by reducing the amount of work the user has to do, for example through auto-completion, auto-suggestion of data, intelligent highlighting, and hiding or showing whole sections of forms.



At the platform level, we have been researching massively parallel computing clusters and parallel algorithms to speed up performance and to handle large amounts of data simultaneously. The Obulus Platform has been designed with large datasets in mind, and uses features like federation and column-oriented databases to speed up the handling of datasets that reach into the terabyte range.
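To show why a column-oriented layout can help with large datasets, the sketch below stores the same records both row by row and column by column. It is a generic illustration with made-up sample records, and it makes no claim about how the Obulus Platform is actually implemented.

```python
# Row-oriented: all the fields of one record are stored together.
rows = [
    {"plate": "ABC-123", "speed_kmh": 62, "lane": 1},
    {"plate": "XYZ-789", "speed_kmh": 88, "lane": 2},
    {"plate": "JKL-456", "speed_kmh": 75, "lane": 1},
]

# Column-oriented: each field is stored as its own contiguous sequence.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def average_speed_rows(rows):
    # Has to visit every record, even though only one field is needed.
    return sum(row["speed_kmh"] for row in rows) / len(rows)


def average_speed_columns(columns):
    # Reads only the one column the query needs.
    speeds = columns["speed_kmh"]
    return sum(speeds) / len(speeds)
```

Both functions return the same average, but the column version never reads the plate or lane data. When data lives on disk and tables are wide, skipping the unused columns in this way can save a great deal of I/O for aggregate queries.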

Performance is critical to providing a good end-user experience. We hope that the issues discussed briefly here, and the tips given, will help in achieving better performance in web-based applications.

p.s. Charonite is hiring: if you're passionate about achieving performance on a large scale and want to work in a focused team, drop us an email.