Today is October 1st. Surprisingly, by this time many eCommerce shops have already locked down their systems against code changes to prepare for the holiday season. Expect to see spot sales on well-known eCommerce sites in the coming weeks, run to check for hot spots that may require a critical patch before the onslaught of Thanksgiving week. There are still opportunities to tune the environment outside of code changes. Over the next few weeks we'll look at items to implement today and items to improve efficiency in scaling. The bonus will be improved end-user response times. As end-client response time improves, so does a site’s conversion rate.
The items that we will be covering in this series are not the kind that you will discover with tools such as Google PageSpeed or YSlow, and possibly not even with traditional deep diagnostic tools. There are two reasons for this. First, we will be looking at 100% of the actions of the user population as captured by your HTTP logs. Second, many organizations ignore the actual web server statistics and concentrate their analysis on what happens on the back end or in the CDN. That focus is driven by concerns about available bandwidth and about keeping the cart system healthy and happy for revenue purposes. We’ll be covering fresh ground for a lot of organizations.
Common Log Elements you need to begin collecting today
Let's establish a common, objective unit of measurement across all tools and all sites. As part of this series we will leverage information captured by all current web servers and application servers with a web interface: the W3C extended attribute time-taken. This, then, is your opportunity to begin collecting the time-taken data if you are not currently collecting it. Information on how you may modify your logs for collection can be found at the LiteSquare Logs Page for some common application and web servers.
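As a quick check that time-taken is actually landing in your logs, here is a minimal Python sketch that reads log lines in the W3C extended format and keys each request by the names in the #Fields directive. The sample lines, the date, the URLs and the function name parse_w3c_extended are invented for illustration. The unit of time-taken varies by server, so confirm it in your server's documentation before comparing numbers across tools.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_extended(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request, keyed by the names in the #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as #Version are skipped; #Fields names the columns.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Only keep rows that match the declared column count.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample data; real field lists depend on your server configuration.
sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status time-taken",
    "2014-10-01 12:00:01 203.0.113.7 GET /cart/add 200 182",
    "2014-10-01 12:00:09 203.0.113.7 GET /checkout/complete 200 954",
]

records = list(parse_w3c_extended(sample))
has_time_taken = all("time-taken" in r for r in records)
print(has_time_taken, [r["time-taken"] for r in records])
```

If has_time_taken comes back False on a sample of your own logs, the field is not being written yet and the logging configuration is the place to look.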
These discussions will also assume that the HTTP Request logs are collecting the end-user IP addresses for the requests. Otherwise it may appear that 100% of your requests are coming from your CDN Provider or your firewall. Often filtering of the data will be required by IP address for analysis purposes. If you are not currently collecting host information in your logs then this is your opportunity to do so. No code changes required on your application.
Both the collection of the host information and the collection of the time-taken field are not typically changes to your site. These changes are limited to the CDN, the firewall and possibly the data being output as part of your HTTP logs. In short, configuration changes, not code changes. As we have moved into a period where code is locked from changes for the upcoming holiday season, configuration changes become the only options.
Shopping Cart Session Duration and Why Setting it too Long costs you Sales
In this first week, we’ll take a look at the session timeout on your shopping cart. The longer the cart session is held open, the more resources are locked up on the application servers. These locked-up resources cannot be used by new users. When many users arrive in a narrow window of time, the system can easily bog down with not enough cart resources to service all client needs. This market rush behavior is easy to observe in the stories of websites crashing or slowing to a crawl during high traffic times. Holding the client cart session longer than is necessary also results in a long recovery period for the website when there is a load-related issue. Setting a value that will cover your users effectively may seem a daunting task, but it is made simpler by looking at a small set of data in your logs.
Note, I use Splunk for my analysis but you may take advantage of other tools. So, the actual queries are not presented, just the process.
Filter the logs. We need a subset of data for analysis which just includes people who have purchased a widget or service from the website. Every site has a unique end page which is associated with the completion of a sale, so by collecting the unique host IP addresses associated with requests to this page, we have defined a list which can be used to filter all of the other requests. You will want a subset of requests which includes HOST IP addresses which appear in the list of IP addresses for people who have completed a sale on your website.
IP addresses outside of this list include a high percentage of window shoppers who will never convert to a sale, as well as automated agents. There will be some individuals who begin the sales process by adding an item to the cart but who do not complete the conversion - this will also include a fair share of window shoppers who just wanted to see a final price with taxes for comparison shopping.
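The filtering step can be sketched in a few lines of Python for readers who are not using Splunk. This is an illustration under assumptions, not the query I run: the field names c-ip and cs-uri-stem, the path /checkout/complete and the sample records are all hypothetical, so substitute the sale-completion page and field names from your own site.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical sale-completion page; every site has its own.
CHECKOUT_COMPLETE_PATH = "/checkout/complete"


def revenue_ips(records: Iterable[Dict[str, str]],
                complete_path: str = CHECKOUT_COMPLETE_PATH) -> Set[str]:
    """Collect the unique client IPs that requested the sale-completion page."""
    return {r["c-ip"] for r in records if r["cs-uri-stem"] == complete_path}


def filter_to_revenue(records: Iterable[Dict[str, str]],
                      ips: Set[str]) -> List[Dict[str, str]]:
    """Keep every request, on any page, made by an IP that completed a sale."""
    return [r for r in records if r["c-ip"] in ips]


# Invented sample: one buyer, one window shopper.
all_requests = [
    {"c-ip": "203.0.113.7", "cs-uri-stem": "/cart/add"},
    {"c-ip": "203.0.113.7", "cs-uri-stem": "/checkout/complete"},
    {"c-ip": "198.51.100.4", "cs-uri-stem": "/cart/add"},
    {"c-ip": "198.51.100.4", "cs-uri-stem": "/product/42"},
]

buyers = revenue_ips(all_requests)
revenue_requests = filter_to_revenue(all_requests, buyers)
print(sorted(buyers), len(revenue_requests))
```

The two passes are deliberate: the first builds the list of buyers from the completion page alone, and the second pulls in everything those buyers did elsewhere on the site.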
With this subset of data for revenue-only users we want to ask questions related to the cart itself. When is the cart created? What is the time between cart creation and the next time the cart is “touched” by the client? There are some sites which have a cart “strobe” on every page after cart instantiation. Other sites leverage local cookies for capturing data and then update the cart periodically in the checkout process. What is important to realize is that there is a minimum and maximum window for the time from cart request A to cart request A+1. By understanding the time between each touch of the cart we can better set the amount of time that the cart object is alive in memory before being timed out. Capture the time between cart requests grouped by IP address and hour per day.
The end goal is a set of statistics on the time between cart accesses: minimum, maximum, average, standard deviation and a percentile (I prefer the 99th in this case). The reason for an hourly breakout is to account for a user departing for a bit who then returns later in the day to complete the same sale. This hourly breakout provides a more realistic perspective on the natural cart maximum rather than holding the maximum to someone who adds an element to the cart at lunch and then checks out after the kids go to bed at night. The return visitor should be treated as a new client session.
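One way to produce these numbers outside of Splunk is sketched below in Python. It is a simplification under stated assumptions: the input is a list of (IP, timestamp) pairs for cart requests only, the sample hits are invented, the percentile uses the nearest-rank method, and the standard deviation is the population form. Gaps are measured only between requests that fall in the same IP, date and hour bucket, so a gap that straddles the top of the hour is dropped.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def cart_gaps_seconds(cart_hits: Iterable[Tuple[str, datetime]]) -> List[float]:
    """Seconds between consecutive cart requests within each (IP, date, hour) bucket."""
    buckets = defaultdict(list)
    for ip, ts in cart_hits:
        buckets[(ip, ts.date(), ts.hour)].append(ts)
    gaps: List[float] = []
    for stamps in buckets.values():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            gaps.append((later - earlier).total_seconds())
    return gaps


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(gaps: List[float]) -> Dict[str, float]:
    return {
        "min": min(gaps),
        "max": max(gaps),
        "avg": statistics.mean(gaps),
        "stdev": statistics.pstdev(gaps),  # population standard deviation
        "p99": percentile(gaps, 99),
    }


# Invented sample: the 21:15 hit is a return visit and starts a new bucket.
hits = [
    ("203.0.113.7", datetime(2014, 10, 1, 12, 0, 0)),
    ("203.0.113.7", datetime(2014, 10, 1, 12, 1, 30)),
    ("203.0.113.7", datetime(2014, 10, 1, 12, 4, 30)),
    ("203.0.113.7", datetime(2014, 10, 1, 21, 15, 0)),
    ("198.51.100.4", datetime(2014, 10, 1, 12, 10, 0)),
    ("198.51.100.4", datetime(2014, 10, 1, 12, 10, 30)),
]

gaps = cart_gaps_seconds(hits)
stats = summarize(gaps)
print(stats)
```

Note how the lunchtime-to-evening return never shows up as a nine-hour gap; it simply opens a new bucket, which matches treating the return visitor as a new client session.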
There are third party sources of data that may be useful as well. Omniture, mPulse & Webtrends are all examples of tools which can provide insight into page-to-page transition times on a site, as well as complete time on a site for users. The information would need to be paired with intimate knowledge of the client cart architecture for insight on when a cart is created related to the traversal of the site and how often it is “touched” on the back end from the client. This type of reporting is an excellent complementary data set to what is available from the HTTP request logs.
Collect information on your cart object timeout for your eCommerce cart system. This information should be located in a configuration file. If it is in code then the reconciliation path becomes more complex. Compare the cart timeout value to the observable, objective data from your own system on the amount of time between requests to the cart object. In most cases the cart timeout is set extremely high, on the order of 20 to 30 minutes, whereas the average time between requests for a revenue user, examined by hour, is typically five minutes at most. That figure accounts for someone who had to run to the next room to collect their credit cards, or who was social for a moment before returning to the PC to type in the data. It is rarely the case that double the 99th percentile value is greater than the existing cart timeout. As you can see in the figure, the 99th percentile page-to-page transition time with cart access on each page is substantially less than the example 30 minute session timeout.
Where the eCommerce cart session timeout is longer than the maximum observable time between cart handshakes, the session timeout should be trimmed so that system resources are released faster. Trimming the session directly supports a larger number of users on a site with the same pool of available resources. In the figure above, the gap between the longest handshake to the cart, represented by the Page B to C transition time, and the cart timeout at 30 minutes represents an opportunity to recover substantial resources by adjusting the cart session timeout lower. Even if the cart timeout were shaved by only ten minutes, from 30 to 20, resources would be recovered one-third faster without a degraded user experience related to resource scarcity. A faster cart leads directly to higher conversion rates. In most cases the cart-to-cart handshake is far less than what is represented in the diagram, at just a few minutes. The cart timeout in the figure, at 30 minutes, is very typical.
I don’t recommend the use of the maximum observed value for the session timeout. For one, this will frustrate at least one user at the end of the shopping experience, which is not a positive item for final sales conversion. Second, the holiday season is a time of maximum system stress due to the larger than average load on the system. The system will already be experiencing some level of natural degradation from higher than normal load, resulting in slower user experiences with a likely higher maximum on the cart-to-cart handshake. Having a value which is derived from observations from last year would be optimal, but such data is rarely available. Instead, look to a rule of thumb: set the timeout to twice the 99th percentile, provided that value is less than the current cart timeout. This allows for headroom when the system slows while still recovering resources faster. As more data is collected live in production, the value can be trimmed further to match the observable level of headroom required for the cart session under maximum user conditions.
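The rule of thumb reduces to a small calculation, sketched here in Python. The function name and the example values (a 99th percentile gap of four minutes against a 30 minute timeout) are illustrative assumptions, not measurements from any particular site.

```python
def recommended_timeout_seconds(p99_gap_seconds: float,
                                current_timeout_seconds: float) -> float:
    """Twice the 99th percentile gap, never raised above the current timeout."""
    candidate = 2 * p99_gap_seconds
    return min(candidate, current_timeout_seconds)


# Illustrative values only: 4 minute p99 gap, 30 minute current timeout.
p99_gap = 4 * 60
current_timeout = 30 * 60

new_timeout = recommended_timeout_seconds(p99_gap, current_timeout)
print(new_timeout / 60, "minutes")

# If doubling the p99 would exceed the current setting, leave the timeout where it is.
unchanged = recommended_timeout_seconds(20 * 60, current_timeout)
```

With these example inputs the timeout drops from 30 minutes to 8. Rerun the calculation as production data accumulates during the season, since the 99th percentile under peak load is likely to be higher than under normal load.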
So, to recap the first week:
- Make sure you are collecting time-taken in your HTTP request logs
- Make sure you are collecting the client HOST in your HTTP request logs
- Tune your cart timeout for an optimal value based upon the behavior of revenue customers