Better Results

Over the last couple of weeks, a good deal of work has gone into the 10C platform to curb the excessive number of requests coming from bots and, for the most part, the measures are working like a charm. Known content scrapers are given a 403 "Forbidden" error. Bots probing for WordPress, phpMyAdmin, or other exploits are given a 422 "Unprocessable Entity" with this happy response page. Contact form spam is way down, too. All of this means about 60,000 fewer SQL queries to process each day[1] and, more importantly, more accurate statistical data for everyone. The "Popular Posts" segment on people's blogs is a prime example of this.
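For anyone curious how a two-tier response like this might be wired up, here is a minimal sketch. It assumes a simple check on the user agent and the requested path. The agent names, path patterns, and the `classify_request` function are all hypothetical. This is not 10C's actual rule set, which isn't described here.

```python
import re

# Status codes described above.
OK = 200
FORBIDDEN = 403             # known content scrapers
UNPROCESSABLE_ENTITY = 422  # bots probing for exploits

# Hypothetical examples only; not the platform's real block lists.
KNOWN_SCRAPER_AGENTS = ("scrapy", "ahrefsbot", "mj12bot")
EXPLOIT_PATH_PATTERN = re.compile(
    r"(wp-login\.php|wp-admin|xmlrpc\.php|phpmyadmin)", re.IGNORECASE
)


def classify_request(user_agent: str, path: str) -> int:
    """Return the HTTP status a request should receive (hypothetical rules)."""
    agent = user_agent.lower()
    # Scrapers are checked first, so a scraper probing an exploit path still gets 403.
    if any(name in agent for name in KNOWN_SCRAPER_AGENTS):
        return FORBIDDEN
    # Requests for WordPress/phpMyAdmin paths on a non-WordPress site are exploit probes.
    if EXPLOIT_PATH_PATTERN.search(path):
        return UNPROCESSABLE_ENTITY
    return OK


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; MJ12bot/v1.4.8)", "/2024/05/post"))
    print(classify_request("Mozilla/5.0", "/wp-login.php"))
    print(classify_request("Mozilla/5.0 (Macintosh)", "/about"))
```

Rejecting these requests before any database work happens is what saves the SQL queries. A real filter would probably combine more signals than a user-agent string, which is trivial to fake.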

Popular Posts

Before the changes to better handle bots, every page load from a machine was treated the same way as a page load from a person. This produced some horribly skewed "popularity" numbers, as some posts from over a decade ago consistently remained near the top of the list. Because those posts were so old, just about every content scraper knew their URLs were valid and would come back to them quite regularly. However, if no real set of eyes is looking at a post, can it be considered "popular"?


So, with the filters and content loading processes better equipped to determine whether someone is actually looking at an article, the results are both more accurate and more relevant. Of the 9 most-read items on my site, 8 were written this year[2], which is good to see. Naturally, the Popular Posts lists on other 10C-powered sites will see a similar improvement in reliability.
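To show why filtering changes the list so much, here is a minimal sketch of a "most-read" ranking that only counts views that don't look like they came from a bot. It rests on a few assumptions. Views are `(post_id, user_agent)` pairs. `looks_like_bot` is a crude, hypothetical user-agent heuristic standing in for whatever 10C's content loading process actually does. The post IDs in the example are made up.

```python
from collections import Counter

# Hypothetical heuristic for illustration; real bot detection needs more than this.
BOT_MARKERS = ("bot", "crawler", "spider", "scrapy")


def looks_like_bot(user_agent: str) -> bool:
    """Crude check: does the user agent contain a common bot marker?"""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def popular_posts(page_views, limit=9):
    """Rank post IDs by human-looking views only.

    page_views: iterable of (post_id, user_agent) tuples.
    """
    counts = Counter(
        post_id for post_id, agent in page_views if not looks_like_bot(agent)
    )
    return [post_id for post_id, _ in counts.most_common(limit)]


if __name__ == "__main__":
    views = [
        ("old-2009-post", "Googlebot/2.1"),
        ("old-2009-post", "SomeCrawler/1.0"),
        ("old-2009-post", "Scrapy/2.11"),
        ("new-post", "Mozilla/5.0 (X11; Linux x86_64)"),
        ("new-post", "Mozilla/5.0 (iPhone)"),
        ("another-post", "Mozilla/5.0 (Windows NT 10.0)"),
    ]
    print(popular_posts(views, limit=2))  # ['new-post', 'another-post']
```

Without the filter, the old post's three bot hits would put it at the top of the list. With the filter, it doesn't appear at all.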

Hopefully the next round of updates to the platform is just as productive as the most recent dozen have been.

  1. 60,000 SQL queries is not very many, but it does work out to about 30 seconds of CPU time per day. Less processing means slightly "greener" operations.

  2. The post from 2012 is an odd aberration, but it seems to be legit.