Thinking About "Smarter" Crawlers

This past week has seen a number of much-needed updates to 10C get released, one of which was a direct result of seeing an excessive number of requests coming from unabashed slurping bots. There is no denying that organisations that exist for the sole purpose of earning money from the efforts of others tend to follow each and every URL that is found on our websites, but is this really the best use of everyone's resources? Any crawler that wants to index each and every bit of public content that is hosted on 10C will need to read several million pages' worth of text. Even accessing 5 URLs per second would mean that a crawling engine would need about 291 days to access each and every post. This is terribly inefficient for both the content-scraping scum and my infrastructure.

This got me thinking ….

As of this moment, every post on 10C works out to about 261 megabytes of text. Compressed it would be closer to 64MB. Updates are sequentially tracked and can be rolled back to any point at the drop of a hat. Would it not make more sense for crawling engines to have a URL that could generate a complete package of content on demand for them to download? Then, when the machines stop by in the future to look for updates, they could use the same URL with a query string saying something akin to "give me all updates since last Tuesday at 3 o'clock" and download a much smaller package than the first. Doing this would be a win-win for everyone in that web servers could save their resources for actual visitors and companies that crawl every page online would not need to wait darn near a year to completely archive a platform as sprawling as this one.
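A minimal sketch of how such an export URL might behave, assuming posts carry an `updated_at` timestamp and the query string uses a `since` parameter holding a naive ISO-8601 value in UTC. The path `/export`, the parameter name, and the dictionary layout are all hypothetical; nothing like this exists in 10C.

```python
from datetime import datetime
from urllib.parse import urlparse, parse_qs


def parse_since(url):
    """Read the optional `since` value (hypothetical parameter) from a URL."""
    values = parse_qs(urlparse(url).query).get("since")
    if not values:
        return None
    # Naive ISO-8601 timestamp, treated as UTC by convention in this sketch.
    return datetime.fromisoformat(values[0])


def export_posts(posts, url):
    """Return every post for a first visit, or only those changed after `since`."""
    since = parse_since(url)
    if since is None:
        return list(posts)
    return [post for post in posts if post["updated_at"] > since]


# Two illustrative posts; the data is made up for the example.
posts = [
    {"id": 1, "updated_at": datetime(2020, 2, 20, 12, 0)},
    {"id": 2, "updated_at": datetime(2020, 2, 26, 9, 0)},
]
full_package = export_posts(posts, "/export")
delta_package = export_posts(posts, "/export?since=2020-02-25T15:00:00")
```

The first call stands in for the complete package and the second for the much smaller follow-up download. A real version would also need to compress the result and decide how deleted posts are reported.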

Bandwidth consumption would go down. Page load times would drop. Server hosting requirements would decrease ever so slightly. All for the sake of an open mechanism that allows marketing companies to get what they want.

Then again, why would we want to make this easy for marketing organisations? No … I'm happy to have the recent updates that identify the bots and block them outright with a nice and clear message. The bandwidth usage has dropped. SQL operations per second have been halved. The server can actually relax a little bit now.
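For illustration, identifying a bot and turning it away with a clear message can be as small as the sketch below. This is an assumption about the general shape, not the code running on 10C: the token list, the 403 status, and the wording of the message are all placeholders, with AhrefsBot and MJ12Bot used only as example names.

```python
# Illustrative tokens only; a real list would be longer and maintained over time.
BLOCKED_BOT_TOKENS = ("ahrefsbot", "mj12bot")

BLOCK_MESSAGE = "Automated crawling by this bot is not permitted on this site."


def respond_to(user_agent):
    """Return (status, message) for a request based on its User-Agent header."""
    agent = (user_agent or "").lower()
    for token in BLOCKED_BOT_TOKENS:
        if token in agent:
            # 403 is an assumed choice; any clear refusal would serve the purpose.
            return 403, BLOCK_MESSAGE
    return 200, None
```

Matching on the User-Agent only catches bots that identify themselves honestly, which is why the check is cheap enough to run before any database work is done.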

Darkness Across the Screens

Allergy season has started early this year, bringing a great deal of discomfort along with a little loss of vision. Fortunately there's some relief from the itchy, watery eyes with some over-the-counter medicines. The loss of vision, however, is new.

Late last week I noticed that it was almost impossible to use my phone in a dim room. Even at the lowest brightness, the screen was far too bright to be readable. This has happened a couple of times in my life when over-tired or staring at a CRT screen for too long, but the issue is so rare that I actually had to remember that my eyes would become sensitive to light after staring at a CRT monitor for a dozen hours or more. On weekends I seldom use the notebook for very long as Saturday and Sunday are "family time", and yesterday there weren't any real issues after putting in a full day's work at the day job. However, today the vision problem returned shortly after dinnertime.

One of the things that I've noticed about my eyesight is that things in the periphery are essentially just fuzzy blurs. While our peripheral vision is not designed to provide a high resolution depiction of our surroundings, it was generally decent enough up until the age of 35 to identify the brand and make of a car1. Now I'm hard-pressed to see there's a person next to me without paying a good deal of attention. That said, because the vast majority of my day is spent working with text, I can focus really well on characters that many people complain are far too small to be readable. To make these characters as legible as possible, I've generally stuck with dark text on a white background. This preference started back in the late 90s when I was working a great deal with Visual C++ and Visual Basic 5.0. Microsoft's development tools were all like this; black text with blue keywords and red errors on a white background, and I've generally set up every system I use — including how I write notes on paper with my three pens; black, blue, and red — to follow the same colour scheme2.

Tonight, though, I couldn't stand to look at the screens on my work desk after putting the boy to bed. It's not only that they were too bright; the white backgrounds everywhere were causing my eyes to water and blur. Could this just be exhaustion? Most likely. With all the pollen in the air and a distinct lack of the typical humidity associated with this part of the globe, my eyes are likely expressing a desire to be protected with a pair of closed eyelids. Unfortunately, there's still work to be done.

This is where "dark mode" can actually be useful.

A lot of people who use their computers for most of the day have sworn by dark mode for as long as I've been writing software. There are certainly a number of benefits to the inverted colour scheme, but I've always found it hard to read light text on a dark background, particularly if the characters are in any colour other than white or very light grey. That said, when there's work to be done, sometimes it's better to make do3 with what works rather than not work at all.

Sublime Text, the program I use for the vast majority of my coding work, was the first to get the dark treatment. That helped. Then it was Safari, followed by Outlook4, followed by Teams, then the desktop wallpaper, and finally the websites I use on a regular basis5.

My eyes responded well to this. Better than expected, I'd venture. They're not feeling nearly as tired as before. While I don't know how visible a dark theme will be during the daylight hours, I'll certainly make use of it during the nighttime.

Of course, I'll also make an appointment with my eye doctor when all this coronavirus kerfuffle comes to an end. This could be fatigue. It could be something else. I'd rather give up working with computers for the rest of my life than give up my vision6.


  1. One could probably argue that the key information was subconsciously collected with a rapid, involuntary glance that identified the shape and logo of the vehicle. I'm no expert on how our eyes work or how our brain translates photons into a visual expression of the universe we inhabit.

  2. The one exception to this rule has been the tools I use to write essays and blog posts, where the text is only black and the background is a light grey.

  3. I originally typed "make due", which is supposedly "a historical variant that is no longer accepted". If it's no longer accepted, then why the heck have I been using it my whole life? There's no way it was deprecated after I left college then fully expunged from the language at some point in the intervening years. Language doesn't work that way!

  4. Even after changing Outlook to "Dark Mode", there's another toggle you need to hit to make the email section flip from a white background to a black one. Who thought this was a good idea?

  5. My personal site has an auto-dark mode feature, so long as JavaScript is enabled.

  6. Vision is the one sense that I would never want to lose. Smell? Sound? Touch? Taste? Yeah … I can get by without them. It wouldn't be fun, but it wouldn't be impossible. No sight? That would make many of the things that I enjoy most in life all but impossible.

Old Databases

Earlier today I stumbled across an older post while looking for something else and noticed that there is no footnote defined for the 1 in the first sentence. At first I thought this was something that might not have come over during the data migration that converted posts from the v4 database to the very different v5 structure. However, after looking at the last backup from v4, I can see this problem dates back even further: to the time when v2 switched over to v4.

That was in January 2016.

The footnotes were eventually found in the last backup of the v2 database as rendered HTML, telling me that the source of the post was most likely Evernote. Sure enough, all the requisite metadata was there showing that the post did originate from Evernote back in 2012 and it's been sitting untouched ever since.

This blogging engine has seen quite a bit of change over the decade that it has existed, going from an Evernote-linked tool to something that could accept posts from multiple sources to something that could host podcasts to the something that currently exists. Each major revision has seen the database schema change quite a bit as past lessons guide future directions. So, as this was the first time in several years that I've looked at the 10Cv2 database, what does the 40 year old me think of the design?

Well … it wasn't very sophisticated. Heck, I'd say it was downright barbaric in its design, which leaned heavily on the Evernote API structure. The only way it could ever be performant for more than a few dozen sites would be to have the contents of every table kept in memory at all times, which is not recommended when running any version of MySQL created since the Big Bang1. This lack of regard for previous efforts is not necessarily a bad thing, though. Working within the constraints of various shared hosting servers, then later VMs on Sakura and Amazon, I learned how to design databases that can run on very basic servers. The systems I make now, even if they're destined for powerful enterprise-grade server hardware, are designed to be as quick and nimble as possible while being flexible enough to allow for an endless amount of evolution going forward. Software projects are never finished, after all; they're abandoned. The early versions of 10C certainly prove the case.

I wonder what a future version of me would think of the v5 data schema.

Over the next few days I'll dedicate the time to restore these old footnotes so that posts from 2012 and earlier are properly structured. Hopefully there won't be too many items that need more than a straight update. The data structures have certainly evolved over the years, but the Markdown syntax that is used for Footnotes has remained incredibly consistent since I ripped it off from a WordPress plugin2 I had been using previously for the very same reason. With any luck, this will be the last time I have to go back to the v2 database to look for missing data.


  1. Yes. That Big Bang. Using the MEMORY engine is not recommended without a whole lot of failsafes put into place right from the onset and, even then, it's better to not use it.

  2. Many thanks to John Watson for creating FD Footnotes for WordPress. While the footnote implementation in 10C has evolved over the years to support various new requirements, the core functionality that went into Noteworthy, then later 10Cv2 and v4 was taken from John's plugin.

Blank Slate

How much do we actually know about the subjects that fascinate us? When the topic is related to computer technology or Star Trek, I can generally answer just about any typical question a person might have, but there’s a lot that I don’t know about specific technologies that exist or the minutiae of a universe that does not.

For most of the past decade I’ve been operating well within my comfort zone for things I know, occasionally branching out and expanding skill sets, and getting things done. Heck, it would be fair to say that because I’ve been operating primarily inside well-travelled patterns that my current lifestyle has been possible. This year there is a lot of new, though, which is showing me just how little I know about subjects that are going to be incredibly prevalent over the next few years, if not decades. It’s as though I’m a blank slate.

Fortunately there are a number of people I can call on to help me learn about some of these matters. For the work-related items there are colleagues. For kindergarten-related items there’s Reiko, the boy’s teachers, a next-door neighbour who happens to be an elementary school teacher, and the Internet. For parenting-related items there’s just about every person over the age of 40, plus a myriad of books, TV shows, and podcasts. For religion … this is a tricky one that I can’t quite quantify with words just yet but, having avoided any study or practice for two decades, there’s a lot to rediscover.

Being outside the comfort zone after such a long time is a good thing, though. We are challenged to explore new ideas, new activities, and new potential. We’re given an opportunity to take on more responsibility and maybe even re-examine the ones we currently have. We can learn. We can remember what it’s like to not know.

Some people have told me that I “always have the answer” they’re looking for when they come with complicated questions relating to databases, algorithms, complex math, and the like. A quarter century of experience can give anyone a vast pool of knowledge to draw from when helping others, and sometimes it can seem as though we’re answering the same question again and again and again. This has created some frustration with me recently. However, with my abject lack of knowledge about how kindergarten works in Japan, I have been able to see things from the other side again. I’m sure the boy’s teachers have already tired of my questions and incessant mixing up of ソ1 and ン2 when filling out the seemingly endless run of forms by hand3. The same with my questions to colleagues about something they know well that I’m just now coming up to speed on. The same with parents who likely roll their eyes when I ask about various tricks to get young kids to put their pants on without complaint rather than parade in front of the big glass sliding doors wearing nothing but a shirt and a smile.

There’s a whole lot that I don’t know and, the more I learn, the more blank my mental slates seem to become.


  1. The katakana character pronounced “so”.

  2. The katakana character pronounced “n”.

  3. Reiko has bugged me for years that I write my name as “Jainso” in katakana. ジェイソン vs. ジェインソ. Do you see the difference? It’s there if you know what to look for.

Feeling Disconnected

The Safari browser on both the phone and tablet has been limited to a handful of sites over the last few weeks, down from the dozen that would be visited on a daily basis beforehand. There was a time not too long ago when regular visits to popular news and technology sites were something I looked forward to but, as I examine the messages contained in the articles, there has been a growing disconnect between how events take place and how they’re reported. For twenty years I’ve kept abreast of numerous topics in order to be informed enough of something to make better decisions in the present and future. Occasionally there would be a break for a week or two in order to recenter myself, as an overdose of news can leave a person feeling terribly depressed, but I’ve always gone back for more.

Not anymore, though. The effort to extract truth from spin has become so great that it would be better to be clueless than to invest time reading three or more articles on the same topic to reach a general understanding of what probably took place and what that means for me, my family, my community, and the world at large.

What this means in the grand scheme of things is that there will be more books read going forward, and probably a branching out to read academic papers on subjects. There’s still plenty of bias and preconceived notions to sift through with these mediums, but the long-form expression of the ideas generally means a better-written argument or explanation. Both of these are welcome changes of pace to the endless he-said-she-said and you-wouldn’t-believe-what-so-and-so-said articles that permeate so many journalistic enterprises. Hopefully the change will also encourage a bit of a change in my mental state, as it’s been really hard to shake the mental confines of this slight depression I’ve been battling since late last year. There are always ups and downs, but too many downs will invariably lead to trouble.

Of course, the lack of news does make me feel isolated and disconnected more so than usual. It is as though the world is moving on without me while I putter around in an increasingly tiny1 microcosm. So long as this feeling passes in the near future there shouldn’t be a relapse to reading multiple news sites a day. Besides, the point of reading is to learn something useful. News sites rarely offer anything that can be classified as such.


  1. Can something tiny be emphasized with a word like increasingly? Perhaps it would be better to say “ever-smaller”

Ghost Town

On Thursday evening Japan's Prime Minister issued a "recommendation" that all elementary, junior high, and high schools across the country shut down from February 29th until the start of the new school year in April. This caught a lot of people by surprise, including the Ministry of Education, but also triggered a lot of companies to begin allowing their people to take time off, work remotely, or otherwise find ways to minimise the risk of contracting COVID-19, otherwise known as the novel coronavirus. My employer has also been hard at work over the last couple of weeks to find a way to deal with this situation given the number of people affected not only in Japan, but in China and other countries where we have schools. With so many people concerned about the contagious virus, this seems like a very logical thing to do. However, with most kids at home and fewer adults out and about during the day, the neighbourhood has become something of a ghost town.

This afternoon I had an opportunity to head out for a short walk to my favourite thinking spot and, along the way, I passed just 3 people. Afterwards there were a few things that needed to be picked up from the boy's kindergarten, so I walked the 600 metres there and passed just two people along the way. The school itself was also deserted, with just two teachers present. Finally, walking the 1.3km home, there were fewer than a dozen people along the way. I had spent the better part of an hour outside and saw perhaps 20 people in total; a number I find hard to believe given there are roughly 45,000 people living in the six neighbourhoods that make up this remote part of the city.

People are understandably nervous.

The boy will be out of school for the next month and Reiko's classes at university do not begin until mid-April. Hopefully the pandemic will be mostly contained by then as the country can't come to a complete stand-still. People need to buy food and supplies. People need to earn money. People need to accomplish goals. While some of these are certainly possible from the comfort of our homes, we cannot all work and live from our house. Heck, even I need to get outside a couple of times a day just to see the sky and get some exercise.

Bots On Notice

What do web crawling bots do and why do they do it? It's a simple question with simple answers, but today I found myself asking "Why do they do it so much?". There are about 20,000 requests from almost 100 self-admitted bots hitting my web server every day, which works out to one every 4.32 seconds. Given that the average amount of computational time given to each request is less than a second, this shouldn't be much of an issue. However, looking at what many of these tools are for, I don't see why they should have the luxury of crawling around each of the websites hosted on 10C without offering something of value.
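The "one every 4.32 seconds" figure is just the number of seconds in a day divided by the daily request count. A quick sketch of the arithmetic, using the rounded figures from the paragraph above (the function names are mine):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_between_requests(requests_per_day):
    """Average gap between requests if they were spread evenly over a day."""
    return SECONDS_PER_DAY / requests_per_day


def requests_per_bot(requests_per_day, bot_count):
    """Average daily requests per bot, ignoring how uneven the real split is."""
    return requests_per_day / bot_count


gap = seconds_between_requests(20_000)   # 4.32 seconds
per_bot = requests_per_bot(20_000, 100)  # 200 requests per bot per day
```

The per-bot average hides a lot, since a couple of crawlers account for thousands of hits while others make only a few dozen.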

Bot Hits

AhrefsBot and MJ12Bot both enjoy hitting several thousand pages a day from a multitude of servers while others limit themselves to a few hundred or a couple dozen. Doing the research on these crawlers, it appears that they're serving the goals of advertising companies in search of trends and "free content". There are some valid bots, such as Feedly, though these are few and far between.

So, rather than encourage advertising companies to build complete link maps of every site on 10Centuries, I'll put in a little bit of effort to sour the milk. AhrefsBot ignores the rules set out in robots.txt, as do a number of other content scrapers, so it makes no sense to provide anything of value to them. I considered two options:

  1. Return a blank page
  2. Return a dynamically generated page that contains 10,000 randomly generated links to imaginary places across the web embedded within a giant Lorem Ipsum

Both of these options would be rather easy to implement, though the second one would be more interesting to create. I already have a Lorem Ipsum generator that can go as far as 500 paragraphs and having every couple of words turned into a link to (potentially) false URLs across the web would reduce the perceived quality of 10C-hosted content to junk status in a matter of days. The bandwidth requirements wouldn't be an issue for the most part so long as I try to keep the pages smaller than 50KB when compressed … which is a rather large HTML document, I must admit.
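The second option could be sketched roughly as follows. This is not the existing Lorem Ipsum generator, just a small stand-in with a made-up word list, and the fake links deliberately use the reserved `.example` domain so that nothing real gets pointed at, which is my assumption rather than anything decided above.

```python
import gzip
import random
import string

# Tiny stand-in vocabulary; the real generator is assumed to be far richer.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit "
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def fake_url(rng):
    """Build a random URL under the reserved .example top-level domain."""
    host = "".join(rng.choice(string.ascii_lowercase) for _ in range(10))
    path = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return f"https://{host}.example/{path}"


def junk_page(link_count, seed=None):
    """Return an HTML page where every couple of words is a link to nowhere."""
    rng = random.Random(seed)
    parts = []
    for _ in range(link_count):
        anchor = " ".join(rng.choice(LOREM_WORDS) for _ in range(2))
        filler = " ".join(rng.choice(LOREM_WORDS) for _ in range(2))
        parts.append(f'<a href="{fake_url(rng)}">{anchor}</a> {filler}')
    return "<html><body><p>" + " ".join(parts) + "</p></body></html>"


def compressed_size(html):
    """Size in bytes of the page after gzip compression."""
    return len(gzip.compress(html.encode("utf-8")))
```

Calling `compressed_size(junk_page(n))` for a few values of `n` would show how many links fit inside the 50KB budget. Random URLs compress poorly, so the link count is something to measure rather than assume.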

By setting up one of these mechanisms directly into the 10C core functions there should quickly be a drop in the number of undesired bots accessing the site to see what's new. More importantly, less spam traffic will mean a better response time for people who are legitimately using the service.

Grappling With Questions

There are a lot of questions that a person can ask themselves in the course of a day and it's interesting how simple so many of them can be. Generally these queries will be in the form of a closed question, resulting in a Yes/No response. These sorts of questions are easy and require very little cognition. Would I like another cup of coffee? Should I have another cookie? Is it too early to take Nozomi out for another walk in the park? These are easy. Over the last couple of months, though, I've been finding myself thinking about some of the more philosophical questions that can have a person debating themselves for days. A lot of these would likely be good topics to write about as the act of assembling the ideas into a cohesive structure would help me better understand the matter, while potentially providing others with a perspective similar or different to their own. The problem with this, however, is that the subjects are often too complicated for me to fully think through. I don't know enough of the surrounding context to reach a complete, wholly consistent solution.

Take the sixth Commandment, לֹא תִּרְצָח (You shall not murder), as an example. This moral imperative to not engage in the unlawful killing of another human is very straightforward and easy to understand. We have the right to kill in self-defence if absolutely necessary. We have the right to kill as a form of capital punishment if absolutely necessary1. We do not have the right to end the life of another person outside of these two conditions2. This essential rule is incredibly cut and dried and, interestingly enough, most of us never think to break it … even if we're a non-practicing member of an Abrahamic faith, Pastafarianist, or atheist.

But why?

Why would an atheist find murder, defined as an unlawful killing, immoral? Where does an atheist get their morals and guiding principles from? What makes one thing good and another thing evil? Personal opinion? Federal laws? Something else?

This isn't to say that a person who does not abide by a religious text cannot be good. There are billions of good people around the globe. Not every person is going to have the same set of beliefs or morals, yet all of them will likely agree that murder is wrong.

So the question still stands. Why?

Do unto others as you would have done unto you? So if you don't murder me (or the people I care about), I won't murder you? This answer seems incomplete if not overly simplistic. What happens in the event a family member is murdered? Or two people? Or an entire branch of the family tree? Is it then okay to engage in some extra-judicial vengeance?

If there is no God, there is no punishment beyond that issued by society. Murder can then be considered "acceptable so long as you're not caught", sort of like speeding in a school zone, taking too many ketchup packets from McDonald's, or lying on a tax form.

But this is part of the problem I've run into when trying to weigh what's written in the ancient texts with a morally atheistic lifestyle. My arguments, which I will be the first to admit have been simplified for the sake of this post, are not deep enough on either position because I have simply not fully thought through the issues and quandaries that exist. One side requires faith and a belief that God provided humanity with a set of morals to follow and, because God is good3, the morals must also be good. The other requires a solid foundation of clear definitions for right and wrong based on empirical evidence compiled over our lifetime (or the lifetimes of others) and codified in a manner where morals can emerge free from the top-down, holier-than-thou format of a structured theology. The argument for a "Universal morality" is not convincing as morality cannot be any more universal than "common sense".

A person can read every book from authors like Richard Dawkins, Sam Harris, Kerry Wendell Thornley, and even Bobby Henderson, and still not have the answers they seek for where morals come from if not from a theological foundation. This would mean that, even if a person were to say they did not believe in God, they would be living by many of the core principles attributed to God.

So, try as I might, topics like this tend not to get published because I lack the cognitive sophistication and historical contexts required to approach the subjects with any sort of cohesive clarity.


  1. Leviticus 24 states: A murderer must be put to death

  2. This is where a lot of the arguments against abortion, right-to-die, and mercy-killing stem from.

  3. This was another topic I've tried to write about countless times over the years. If God is good, then why is there so much pain, suffering, and outright malevolence in the world? "Because there is no God" is a shallow and unfulfilling answer as it raises several hundred follow-up questions, each with their own answers weakened by degrees of incompleteness.

23:45

Cold rain is hitting the side of the house with a regularity that belies the random distribution of droplets falling from the sky. The sound is remarkably calming, particularly when a slight breeze pushes the frigid February precipitation towards the glass door that separates my working space from the 2˚C weather outside. Nozomi is asleep in her bed, snoring peacefully without a care in the world. I imagine Reiko and the boy are doing the same upstairs given that it's almost twelve o'clock on a school night.

Behind me the hum of the fridge is distinct from the air conditioner, which is scheduled to shut itself off for the night in a matter of minutes. Every so often the hot water heater fires up to keep its 10-litre tank at a stable 41˚C. I mustn't forget to turn that off on my way to bed.

Some nights are best enjoyed when listened to. Tonight is one of them. The neighbourhood becomes quiet around this time as people turn in with the hope of a solid six hours' rest before the next day begins. It's when everyone is asleep that I can hear the changes that have taken place over my lifetime.

On a similar night just five years ago I would have been reading on my phone while the sound of a NAS hummed noisily away in the closet that also doubled as a podcast recording space. Ten years ago the room would have been silent save for the ceaseless traffic of Highway 21 and the occasional passing of a JR train. Fifteen years ago the room would have been quite dark, but the bass from the neighbour's endless parties would shake my windows. Twenty years ago I would be sleeping in a bed much too large for one with a collection of replica Japanese swords by my side1 in the event someone broke into my apartment via the fire escape, as it was not exactly the safest neighbourhood to live in. Twenty five years ago the room would be completely dark and I'd have music piping into my ears via a pair of headphones.

What we remember about places tends to be the things we seldom think about. My memory of the places I've slept has certainly faded over time, but it's the minutiae that tend to stick in my mind the most. Sometimes I wonder whether I would again feel comfortable in any of those places if given the opportunity. Fortunately I'm happier here with the rain, snoring dog, humming appliances, and sleeping family than anywhere else.

Late at night, when I'm essentially alone with my ears, I like to listen to home.


  1. I wasn't very bright back then. One could argue that I'm not very bright now, either, but I know not to keep weapons around with the intent to use them in a country where the "victim" of a crime is determined based on who needs the greatest amount of medical attention. This is one way that Japan and Canada are remarkably similar.

Bizarre Pizza

Pizza in Japan has often struck me as an oddity given the unique angle from which places here approach the food. When I was growing up, a large pizza generally came in a rectangle that was cut into 18 pieces, cost $20, and could feed three teenagers … or a family of 8. Many years later, while living in Vancouver, there was a pizzeria near my apartment that sold an 8-slice medium pizza with 3 toppings for $5. In Japan, though, an 8-slice pizza starts at $18 and quickly goes up from there. Suffice it to say, Reiko and I haven't ordered pizza very often in the 13 years we've lived together. That said, a flyer hit the mailbox today with an offer that is ridiculous enough that it might just be worth the absurd sticker price: 全力!ソーセージピザ1.

Aoki's Pizza

Just for giggles, I went to the calorie information page for the pizza shop and discovered that every slice contained a whopping 409kcal!

全力!ソーセージピザ Calorie Information

This would explain why waistlines around the country have started to balloon much like those in Canada did throughout the 1990s to today. It's understandable that restaurants will try ludicrous things in an attempt to attract sales, but this one is a bit absurd … even for Aoki's Pizza.

If the boy were a decade older, Reiko and I might entertain the idea of trying one of these. For the moment, though, we'll pass and enjoy a healthier dinner.


  1. 全力 (ゼンリョク) ⇢ With all one's strength / might.
    ソーセージ ⇢ Sausage.
    ピザ ⇢ pizza.