First off, I am sorry for all of the recent website issues. We are actively working on it. Thank you for your patience.
Growth & Limitations
Over the past few years nearly every stat has doubled, and we have finally maxed out our server hardware with the current database architecture.
One of the things that has made us so successful is that our site was built from the ground up specifically for card collectors. We don’t use any pre-canned shopping cart software because none of them are designed to handle millions of unique products or offer all of the specialized functionality that our users have come to love. This also means that we are often pioneering a new frontier, and we discover our limitations and issues as we grow.
I started to see the writing on the wall last fall when DDearing was trying to update 50,000 prices every day. So I priced out new server hardware. Unfortunately it was going to cost about $100,000 for us to make a significant hardware upgrade. So I took a very close look at our database architecture to see if there were ways we could be more efficient with our hardware.
It is actually quite amazing that the core of our database architecture has held up from its original design, when we had only a few thousand items, to today, with more than 3.7 million items. Fortunately I found some opportunities for significant savings. For example, we currently store about 1,000 bytes of historical data every time an item is modified (e.g. asking price changes, ownership changes, book price changes…). I designed a new storage schema that could reduce that to only about 50 bytes for the most common data changes. This should mean that we can process much larger transactions very efficiently. However, this will be a pretty big change to implement. Since we didn’t want to rock the boat for the critical holiday season, I shelved these efforts until January.
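To make the savings concrete, here is a rough sketch in Python of the general idea: instead of writing a full ~1,000-byte history row for every change, the most common changes (like a price update) can be stored as a small fixed-format delta record. The field layout, names, and sizes below are illustrative assumptions, not our actual schema.

```python
import struct

# Hypothetical compact delta record for a price change:
# item id (8 bytes), timestamp (8), field code (1),
# old value in cents (8), new value in cents (8) = 33 bytes total.
# "<" means little-endian with no padding bytes.
DELTA_FORMAT = "<QQBqq"

FIELD_ASKING_PRICE = 1  # made-up code identifying which field changed

def pack_price_change(item_id, timestamp, old_cents, new_cents):
    """Encode one asking-price change as a compact binary record."""
    return struct.pack(DELTA_FORMAT, item_id, timestamp,
                       FIELD_ASKING_PRICE, old_cents, new_cents)

record = pack_price_change(1234567, 1265000000, 499, 450)
print(len(record))  # 33 bytes, versus ~1,000 for a full history row
```

A few dozen bytes per change instead of a kilobyte means an update run like DDearing’s 50,000 daily price changes writes far less data, which is where the efficiency win comes from.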
Since mid January I have been finishing the design and porting our data to the new system. Now I am nearly to the point where I can start comparing the performance of the new architecture to the old one. However… issues on the current site continue to mount, and over the last few days it has been virtually unusable for many users. To keep the site limping along, I experimented with some temporary fixes. Some of them have helped a little, but ultimately it has just reconfirmed that with the current architecture our hardware cannot really serve more than our 8,000 daily visitors.
Search Engine Crawler Traffic
It used to be that Google made up about 90% of the crawler traffic on our site, and they were very responsible with how they did it. Now we are getting crawled by Yahoo, Bing, Google, and many others. Unfortunately they are not all as responsible as Google. Today I discovered that the Yahoo Slurp engine has been paging through our data 12 items at a time. Each request has been pegging our server for about 30 seconds, so Yahoo alone was basically monopolizing 1 of our 8 processors.
To address this issue, I have updated our robots.txt file to tell search engines not to page through our data. Instead they should be using our sitemap.xml to get all of the data in the most efficient manner. This should free up server resources so that more real visitors can use the site.
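For the curious, the relevant rules look something like the fragment below. The paths shown are placeholders, not our actual URLs; the idea is simply to block the paginated browse pages while pointing every crawler at the sitemap.

```
User-agent: *
# Keep crawlers off the paginated browse/search pages
# that peg the server for ~30 seconds per request.
Disallow: /browse

# All items are listed here, so crawlers can fetch
# everything in one efficient pass instead of paging.
Sitemap: http://www.checkoutmycards.com/sitemap.xml
```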
Right now the thing that taxes our servers the most is all of the searching and browsing of cards for sale. We are investigating the possibility of offloading the generation of search results to a farm of servers that will have access to their own cache of the live data. If we can move this to the cloud, we should be able to scale the site to handle virtually any amount of traffic.
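As a sketch of the idea (the class name, TTL, and interfaces here are assumptions for illustration, not our actual design), each search server in such a farm could keep a read-through cache of recent result sets and only fall back to the live data when an entry is missing or stale:

```python
import time

# How long a cached result set stays fresh (illustrative value).
CACHE_TTL_SECONDS = 60

class SearchCache:
    """Read-through cache: serve repeated searches from memory,
    hitting the expensive backend only on a miss or expiry."""

    def __init__(self, backend_search):
        self._backend = backend_search  # the expensive live-data query
        self._cache = {}                # query -> (results, cached_at)

    def search(self, query):
        entry = self._cache.get(query)
        if entry is not None:
            results, cached_at = entry
            if time.time() - cached_at < CACHE_TTL_SECONDS:
                return results  # fresh hit: no backend work at all
        # Miss or stale: run the real search and cache the results.
        results = self._backend(query)
        self._cache[query] = (results, time.time())
        return results
```

Because each cache server answers repeat searches without touching the main database, adding more of them scales read traffic almost linearly, which is why moving this to the cloud should let us handle virtually any load.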
In summary, we have several opportunities to make site performance much better, but none of them are quick fixes. Our main focus is on these major improvements, but we will continue to make minor adjustments to the current site where people are commonly experiencing website errors.
Thank you for your patience and understanding. We are all experiencing the growing pains of a successful startup.
CheckOutMyCards.com Founder, CEO & CTO