Search Done (Almost) Right

Yes, it's yet another post about search algorithms and how I'm never satisfied with any of the versions that I write against a MySQL database1. That said, I've managed to cobble something together in the v5 API that doesn't completely frustrate me every time the thing runs, and it's live on my v5 test site right now.

Unlike many of my past attempts to work with search, I wanted the Anri theme to have a very focused way of enabling search from any page without requiring a page load. When the search modal is triggered, the entire screen will be covered and a single text box will appear for search criteria to be entered.

001 - Search For Nozomi

When the results come back, the search criteria are moved to the top of the page and the bottom 75% shows information with the specified words highlighted in off-yellow. Like other versions of 10C, an icon will appear on the left of the title to signify what kind of item was returned. The title is a proper link to the item but, for people who want a bit more of a peek, there's a "Show More" button2. Clicking this will open a simplified version of a post with keywords highlighted and all of the HTML stripped out. I may need to change this in the future to allow images, though, as my posts about Nozomi look a little weird without the visual elements.
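For anyone curious how a "Show More" preview like this could be put together, here is a minimal sketch in Python. It is not the code running in Anri; the function names and the use of a mark tag for the highlight are assumptions made purely for illustration. It strips every tag from a post, then wraps each keyword in a highlight element, ignoring case.

```python
import html
import re
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and silently discards every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup: str) -> str:
    """Return the text content of `markup` with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def highlight(text: str, keywords: list[str]) -> str:
    """Escape `text` and wrap each keyword match in a <mark> element.

    The <mark> tag is an assumption; any element styled off-yellow would do.
    """
    words = [re.escape(w) for w in keywords if w]
    if not words:
        return html.escape(text)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    # With one capturing group, re.split puts the matches at odd indices.
    pieces = pattern.split(text)
    out = []
    for index, piece in enumerate(pieces):
        safe = html.escape(piece)
        out.append(f"<mark>{safe}</mark>" if index % 2 == 1 else safe)
    return "".join(out)


if __name__ == "__main__":
    plain = strip_html("<p>Walking <b>Nozomi</b> today</p>")
    print(highlight(plain, ["nozomi"]))
```

Escaping the text before adding the highlight element keeps stray angle brackets in a post from being read as markup in the preview.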

002 - Search Results for Nozomi

This form of search is not as instantaneous as the one built for v4's default blogging theme, but it's a lot more comprehensive.

003 - Bright Yellow

With v4 I would often run into issues when trying to return search results for my own site in under two seconds, which is why I "cheated" with the EzReader theme by merging search with an archive page. A full list of blog posts would be retrieved from the API along with some additional metadata such as tags and stored in local memory. This information would be used to generate the full list of blog posts in reverse chronological order and, when filter criteria were entered into the search box on that page, the data stored in local memory would be read to show the results. Unfortunately this would only include blog posts. Social and other post types were completely ignored because including them would typically result in too large a volume of data to work with.
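The "cheat" is easier to picture with a little code. This is a rough Python sketch of the idea rather than the EzReader implementation (which ran in the browser); the CachedPost fields and the filter_posts name are made up for the example. The full list is held in memory, sorted newest first, and narrowed down as words are typed.

```python
from dataclasses import dataclass, field


@dataclass
class CachedPost:
    """One blog post as held in local memory (field names are hypothetical)."""
    title: str
    published_at: str  # ISO 8601 date, so string order matches date order
    tags: list[str] = field(default_factory=list)


def filter_posts(posts: list[CachedPost], criteria: str) -> list[CachedPost]:
    """Return posts in reverse chronological order that match every word.

    An empty search box returns the complete archive listing.
    """
    terms = criteria.lower().split()
    newest_first = sorted(posts, key=lambda p: p.published_at, reverse=True)
    matches = []
    for post in newest_first:
        haystack = (post.title + " " + " ".join(post.tags)).lower()
        if all(term in haystack for term in terms):
            matches.append(post)
    return matches


if __name__ == "__main__":
    cache = [
        CachedPost("Cherry Blossoms", "2019-03-20", ["japan"]),
        CachedPost("Walking Nozomi", "2019-03-22", ["puppy"]),
    ]
    print([p.title for p in filter_posts(cache, "nozomi")])
```

Nothing here touches the database after the first load, which is why it feels instant and also why it falls over once every post type is thrown into the cache.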

Lazy, lazy, lazy!

With v5 I've set aside the goal of returning data for my sites in under two seconds and instead opted to return a more complete set of results by querying the database properly for all post types. This will be important going forward as there is no limit to the number of post types a channel may contain.

004 - Expanded Results

The v4 API did have a Search API that could be called to query the database, but this was rarely ever used. What I plan on doing with the v5 implementation is seeing how well it returns data for people and then improving its ability to handle accounts with more than 100,000 items in a single channel.

Using the v5 Search API

If you'd like to see how well the v5 Search API responds to requests, you can do it like this:


Required variables:


Optional variables:


Including the HTML body will send the full, original text of the item; it is not included by default. The count value defaults to 75. Authentication is not a requirement but, if the request contains a valid authentication token, account-level search results are returned. This means that if there are private or "invisible" posts on a site, they will appear in search so long as the signed-in account has the appropriate level of permission.
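As a rough illustration of how a client might honour those defaults, here is a small Python sketch that builds a request without sending it. Everything specific in it is an assumption and not taken from the v5 documentation: the example.com address, the parameter names q, count and html, and the Bearer-style Authorization header are all placeholders. Only the behaviour comes from the description above: a count of 75 unless stated otherwise, no HTML body unless asked for, and an optional token.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

DEFAULT_COUNT = 75  # default number of results, as described above


def build_search_request(
    endpoint: str,
    terms: str,
    count: int = DEFAULT_COUNT,
    include_html: bool = False,
    token: Optional[str] = None,
) -> Request:
    """Build (but do not send) a search request.

    The parameter names "q", "count" and "html" and the Authorization
    header format are hypothetical placeholders, not the real API.
    """
    params = {"q": terms, "count": count}
    if include_html:
        # The HTML body is opt-in, so the flag is only sent when wanted.
        params["html"] = "1"
    headers = {}
    if token:
        # A valid token would unlock account-level results.
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"{endpoint}?{urlencode(params)}", headers=headers)


if __name__ == "__main__":
    request = build_search_request("https://example.com/api/search", "Nozomi")
    print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it, though the real address and variable names would need to be swapped in first.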

Hopefully this is a solid start to search done right.

  1. I have some pretty fancy code to whip out for SQL Server that gives me a proper-weighted search result with pretty good consistency.

  2. This will appear only when the full text is longer than the summary.

First Ubuntu Crash

So there I was, just about ready to publish a blog post about the upcoming cherry blossom festivals in Japan, when something so rare happened that I didn't quite understand what had actually happened at first: Ubuntu crashed.

Technically it was Xorg that crashed, but that's neither here nor there in the grand scheme of things. The only time I can remember having an honest-to-goodness issue with Ubuntu is when I would run too many VMs, starving the host OS of resources. The system is generally incredibly stable, which is one of the reasons I come back to this distribution of Linux time and again for both desktops and servers. Fortunately nothing important was being worked on, otherwise I might be tempted to use this as an excuse to install one of the daily builds of the upcoming 19.04 release.

There's less than a month to go before the next version of the popular Debian-based operating system is released and there are a couple of features that I'm looking forward to. The first is a bug fix for name resolution when disconnecting from an L2TP-based VPN. This issue has bugged me for months despite its relatively simple workaround. When disconnecting from the work VPN I lose all name resolution on the machine, meaning that I cannot use the network (or the web) at all. Browsers complain that there is no connection, and pings to known servers on the home network time out. The solution is to disable and re-enable WiFi, which is annoying given that I'm using a wired connection. 19.04 will include a fix for this.

The second item I'm looking forward to is the updated GNOME desktop environment. Version 3.32 has some noticeable performance improvements that make the system feel much, much faster. Applications load faster. Animations and transitions are smoother. Memory consumption is lower. Wins all around and, given how the 18.10 version of Ubuntu that is currently running on my notebooks is already a heck of a lot more performant than either macOS or Windows 10, the additional improvements might just make people using a commercial operating system a tad jealous.

Of course, Linux on the desktop may not be something that everyone would want to use on a daily basis. I feel that it is more than ready to be used by a majority of people who do not absolutely require the Microsoft Office suite or the best hardware support for gaming1. Given the opportunity, I would even push the management at the day job to consider ditching Windows 10 for Ubuntu given that the vast majority of the computers at the schools require little more than a browser and Skype.

Maybe I'll bring the topic up when the 20.04 LTS release is announced.

  1. Gaming on Linux is getting better from what I'm told, but this is still something I'll try to actively avoid in order to make better use of any spare time. A nice train simulator or some of the better Need for Speed offerings could really interfere with sleep, work, and other responsibilities.

This Tesla owner tested his car's Autopilot auto-braking on his (soon-to-be-ex-) wife


Some absolute genius thought it would be a great idea to try out his Tesla’s Autopilot Emergency Braking system by driving straight at his own wife.

What I find odd is that it seems the woman agreed to have a car driven at her …

Vengeful sacked IT bod destroyed ex-employer's AWS cloud accounts. Now he'll spend rest of 2019 in the clink


Bloke hit delete on £500,000 of 'business-critical data' after he was let go for 'poor' performance

What an idiot. So not only will he spend a few years in prison, but he’ll need to train for a completely different career path while in the slammer because he’ll never work in tech again.

Let's Do This

Believe it or not, one of the many things that I enjoy throughout the day is putting the glowing screens away and spending time in the physical world. This can be done by going outside for a walk around the neighbourhood, which I'll readily admit involves blocking out the world by listening to podcasts while enjoying the sights and smells of the local environment, or enjoying a moment with family. When I'm at the desk there are always a number of work-related distractions vying for attention, so stepping away allows some time to focus. What I've found interesting about my desire for focus is not so much the fact that I'm passing a lot less time online, though this is unexpected, but that distractions in everyday life are generally batted away with more ease than I ever expected. This is something I've wanted to do for years and, after months of work and effort, it seems the goal has been mostly reached.

Perhaps some examples are in order.

This past Monday the family and I went out for a walk in the morning. As one would expect, the boy was happy to be outside with both of his parents and we made our way to the park. A few minutes into the walk, Reiko noticed that the recycling truck hadn't come by and wanted everyone to go back to the house so that we could take out a single stack of magazines. I suggested we do that later if the truck still hadn't come by. As one would expect, this rebuttal was not at all appreciated. My response was simple: "We decided to bring our son to the park. We're here now. Let's do this."

Going back home would have been a distraction. It would have upset the boy, who generally doesn't like leaving the park to begin with. There were fewer than ten magazines to take out, so it wasn't like we were buried in unwanted paper. Being present now, focusing on now, was the better use of time1.

Yesterday Nozomi was in one of her playful moods, so I stepped away from the computer to give her some much-deserved attention. While we were playing there were two chimes from the notebook telling me that some emails had arrived. Given the rather strict filtering rules that govern what appears in the Inbox, any message that makes it through is generally something relatively important or information required for a task that I'm working on. However, just like on Monday, I chose to focus my time and attention on the playful puppy. Just glancing at the computer would have been a distraction. If something were truly important, there would be a phone call. Email can wait, and that's exactly what it did. Being present now, focusing on now, was the better use of time … and I didn't feel bad about it.

There are a half-dozen more examples from just this week alone and they all follow the same pattern: focusing on this because that can wait.

Thirty-year-old me would be shocked to see this change. Thirty-five-year-old me would as well. In two weeks I'll be 40 and my opinion on the evolution is simple: it's about time.

  1. Also, as one would expect, this line of reasoning was not appreciated.

More is More

There's a lot to be said for a minimalistic simplicity. There's even more to be said for something that just works. The Anri theme on the upcoming 10Cv5 platform is something that I would really like for people to use without thinking about the level of complexity that is operating under the covers. As with all my work, the code is human-readable for anyone who is interested in seeing why things do what they do, but only those interested should even have the thought cross their mind.

Graph Paper and Pencil

Over the last couple of months I've managed to fill an entire notebook with scribbles that describe how functions work, why certain decisions were made, why others were avoided, and in which order work should be performed. The amount of effort that has gone into Anri over the last three weeks covers barely a quarter of what's been planned for completion in the next little bit, though every update is incredibly important.

Last week the RSS and JSON Feed mechanisms were published. Today the ability to upload files and edit posts directly from Anri has been released. The next set of updates will focus on conversation threads and the OpsBar that runs along the top of a site when people are signed in. None of these are easy, nor should they be. If the v5 version of 10C was going to be easy, then I would have created a static site generator.

All in all, I'm quite happy with what's been accomplished so far this month. There's just one more quick little update I'd like to complete and send live before moving my main site over to the new platform, and I might just be able to get it written and deployed before the end of the day.

Here's hoping that the people who are testing the new platform are just as happy with the recent updates as I am.

Worse Than Failure

A "faulty server migration" is being blamed for the extinction-level removal of songs, photos and video from myspace. Tragic as it may seem, I very much doubt that the company has invested the necessary resources required to recover the data as it clearly does not "add value" for the current management team. What is disappointing about this affair isn't so much that it's the second time that myspace has suffered some catastrophic data loss1, but instead that the company has given up on trying to recover the four years of data for people who still actively use the service. An argument could certainly be made to skip or delay data restoration for accounts that have been idle for over a year and most people would likely agree that this makes sense. But never give up on your die-hard fans.

Scratched Platters

You Can Call It "Protection"

At the day job, management is making some similar noises about data. We're 16 months into a 30-month project that will see every location around the world move onto a single, cohesive set of cloud-based solutions. Students and instructors will finally have a consistent set of resources to use, regardless of where they happen to be on the globe. The company tried to do this once before with an in-house CMS and failed. This second attempt is being built on a business-oriented cloud platform with a larger group of people and expensive vendors. Failure is not going to be an option. That said, one of the decisions that was made early on is that we're really only going to import the last two years of data from the myriad of databases around the world. Some of our digital systems have student and lesson information going back to the late 90s. Does the company really want to toss all this away?

Some people are terrified of what might happen if two decades of information is intentionally left out of the new system and rightly so. Businesses cannot always make the best decisions about the future without understanding the past. Importing everything into the new system would be cost prohibitive2, which means there needs to be another system set up somewhere that can contain all of the data that was not converted for the new software. But how does one go about putting several dozen databases from different platforms with different schemas into a unified system that can be effectively indexed, searched, and reported from?

This is where Microsoft's Azure Data Lake may make sense, and I've been pushing hard to make it happen.

The day job currently has systems that use SQL Server, MySQL, Oracle, FileMaker, Access, and Excel as a back end3. A couple of schools in Europe even managed to build some tools with NoSQL databases. There's no way for all of this to be logically put into a single, unified system. Instead a data lake could be used to store unstructured and semi-structured data from all of these systems. This would make it possible for reports to be generated against the larger data set, pulling from all regions or just the specific locations a person wants to query. Data going back to the 90s would reside in this data lake as well as data that was recorded yesterday. More than this, data that will be created in the future could also be put into this data lake through regular synchronization processes, making it possible to have a comprehensive source of reporting data.
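To make the idea a little more concrete, here is a small Python sketch of what "semi-structured" could look like in practice. It is generic and has nothing to do with Azure's own tooling; the envelope fields (source_system, region, extracted_at, payload) and both function names are assumptions for the sake of the example. Each record keeps whatever shape its original system gave it and is wrapped with enough metadata to be filtered by location later.

```python
import json
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator, Optional


def to_lake_record(
    source_system: str,
    region: str,
    payload: dict[str, Any],
    extracted_at: Optional[datetime] = None,
) -> str:
    """Wrap one source row, whatever its schema, as a single JSON line."""
    stamp = extracted_at or datetime.now(timezone.utc)
    envelope = {
        "source_system": source_system,  # e.g. "mysql", "filemaker", "excel"
        "region": region,
        "extracted_at": stamp.isoformat(),
        "payload": payload,  # left untouched; no unified schema is forced
    }
    # default=str keeps dates and decimals from breaking serialisation.
    return json.dumps(envelope, sort_keys=True, default=str)


def query(lines: Iterable[str], regions: Optional[set[str]] = None) -> Iterator[dict]:
    """Yield records from every region, or only from the regions requested."""
    for line in lines:
        record = json.loads(line)
        if regions is None or record["region"] in regions:
            yield record


if __name__ == "__main__":
    lake = [
        to_lake_record("mysql", "asia", {"student_id": 7, "lessons": 42}),
        to_lake_record("excel", "europe", {"Student": "A. Example"}),
    ]
    for record in query(lake, {"europe"}):
        print(record["source_system"], record["payload"])
```

The point of the envelope is that a spreadsheet row from the 90s and a row synchronised yesterday can sit side by side without anyone first agreeing on a common schema.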

But there's more to the data lake idea than just reporting. My ultimate goal for this massively complex collection of data is not just to help the business answer questions, but to ensure the people who rely on our systems don't have to live through a myspace moment. Systems fail. Data gets lost accidentally. Vendors become undesirable. At no time should my employer have a single point of risk when it comes to our student (and instructor) data. Not having a backup strategy in place would be worse than failure.

One of the many services that I offer a lot of my freelance clients is the peace of mind of being their off-site backup keeper. Fortunately this is something I can manage pretty decently as few archives are over 50GB in size4. By using a data lake or something similar, I can ensure the day job has viable options should the unthinkable happen.

  1. The first time (that I can remember) myspace lost a bunch of data was in 2013 after a redesign that required everyone to rebuild their communities from scratch. This was one of the many problems that pushed the less-dedicated into Facebook's waiting arms.

  2. This is what I'm told, anyways. If it's true, then I'm quite upset with the senior executives who signed off on the vendor contracts, as they would have known full well what sort of lock-in we were getting into.

  3. Yes, I know that Excel is not a database. I know it should not be used as a "back end" for anything. Yet here we are …

  4. I generally burn backups to a DVD or Blu-ray disc, and my Blu-ray recorder only supports up to 50GB discs. I'd love to get a BDXL burner at some point, though, as fitting 100GB on a disc would free up a lot of media binders and reduce the amount of data I keep on the NAS at any point in time.

Do People Still Torrent?

Earlier today one of my Ubuntu torrents hit a ratio of 1000:1, meaning that I have transferred the equivalent of 1000 copies of the open source operating system to people around the world. I generally have a number of torrents running at any given time, and they're all different distributions and versions of Linux. The reason for this is pretty simple: I would like to help people obtain their copy of an installation disc as quickly as possible, and sharing Linux is pretty much the only legal use of BitTorrent in Japan. Yet when the notice popped up to let me know about the golden torrent ratio, I was disappointed that it took as long as it did. The file was completed on December 23rd of last year. Over the next 85 days I would upload the equivalent of 1,000 copies. It seems … unremarkable.

Chalkboard Downloading

BitTorrent was incredibly popular a little over a decade ago, with sites going up and being taken down with such regularity that it would sometimes be hard to find the latest episodes of a TV show or a decent-quality copy of a CD. Large trackers such as ThePirateBay and EZTV would regularly appear in newspaper articles, letting people new to the idea of downloading entertainment without paying for it know what to type into Google. While living in Vancouver, and for the first couple of years I lived in Japan, I would often make use of the nefarious sites to get the most recent episodes of The Simpsons, Futurama, and a myriad of interesting documentaries. These were programs that just weren't available in the country in any usable format without investing a great deal of money in shipping fees. Some time around 2012 the government passed a law that made it possible for ISPs to rat out their customers to various copyright holders. Within a few days of the law going into effect, some people were arrested to "send a message" and just about everyone I knew in the country who used torrents gave them up overnight.

But I kept going … albeit with the limit of sharing only open sourced Linux-based distributions.

Over time people moved on to Netflix and Hulu, or suffered with whatever could be found on YouTube. Talk of torrents almost completely disappeared, even online. Every now and again there will be a magnet link on someone's blog to download a presentation or a conference talk, but these are few and far between. It's as though the technology has been labelled "for criminals only". Maybe this is semi-accurate.

Over the decades that I've been online, file sharing has evolved quite a bit. I was first introduced to the concept in high school when people would download files from a BBS and share 3.5" floppy disks with friends. Then there were the ever-busy XDCC servers on IRC, where your connection had to do something in 10 seconds or risk being considered "idle" and disconnected. Then came the FTPs (that were broadcast on IRC). Afterwards I joined an ISP that had newsgroups and I would spend hours downloading RAR files, testing parity with PAR files, and screaming at the screen when part 23 of 25 of the last segment in a 150-piece RAR file didn't appear in the listings, rendering the entire download pointless. Later came Napster, which changed my relationship with music, as it was now possible to listen to the artists that Canadian radio stations refused to play. Later, for those who didn't mind viruses, LimeWire and eMule were the places to trade just about anything on your computer, including personal finance databases1. And then, when it seemed that downloading an entire file from a single source was no longer the best option, BitTorrent came along.

This is where I stopped. I don't know what came next, if any superior technology superseded torrents at all. I'm not particularly interested in downloading TV shows, music, movies, or anything like that from random strangers anymore, either. Streaming services are generally quite reliable and priced competitively. Spotify gets $10 a month from me, and I subscribe to Netflix two months of every year2. If there are movies that I'd really like to see, there are a number of providers who'll make the video available for anywhere between one and five dollars. It doesn't make sense to pirate content when the commercial offerings are generally good enough, even for people in geo-restricted countries.

Of course there are still going to be people who cannot or will not pay for the digital files they seek. There's no getting around this. The pervasiveness of digital piracy seems to have diminished, but it will never go away. Do people still torrent? Most certainly. Is it widely used in the Linux community to share the various distributions? Oddly enough … no.

  1. I wasn't the only one to discover that a lot of people would just share everything on their C drive, including the full contents of their personal documents folder.

  2. This is generally long enough for me to catch up on anything I'd like to see, minus a few shows on Fox.

Five Things

This week I decided to bump up the allergy medicine dosage to 1.5x of what is recommended in an effort to stave off this season’s excessive assault on my sinuses. With the pills weighing in at 88 Yen apiece and the recommended dosage being two pills every 12 hours, I’m now paying just over five dollars a day to have relatively decent vision and breathing, with a slightly less-runny nose. This could be much worse, of course, but $140 a month for a partial respite from a non-fatal condition irks me. That money could be going to something far more worthwhile, such as a new hard drive for the NAS. Hopefully there will be a “cure” for seasonal allergies at some point in the near future. Knowing how the world works, though, this cure would probably cost the equivalent of 50 years of medication and come with a possible risk of contracting a permanent bout of Montezuma's revenge.

All this aside, let’s get on with the list. In no particular order, this week I’ve been thinking about …

Mitsubishi Pens

Over the last couple of months I’ve been using a Frixion ballpoint pen when writing notes and thinking through data diagrams. While having the ability to erase ink from a page is nice, the feel of the writing tool is sub-par. If I wanted to write with a toy, I’d get one of my son’s colouring crayons. What I would really like is to find a stationery store around here that sells the Mitsubishi PiN felt-tipped pens or, barring that, the Mitsubishi UB-150 ballpoint pens that I used while working in the classroom.

A bad pen is like having a bad sword, in that you spend more energy battling the failings of the tool than accomplishing your goals.

Everything Old Is New Again

While watching the news a commercial came on to promote some stupid cell phone game. The announcer sounded really excited about the “all new, original characters” that are supposed to entice people into downloading the thing. A couple of these fictional characters appeared on the screen and I almost laughed at what I saw:

The Main Characters of Magic Knight Rayearth

  • Shidō Hikaru
  • Ryūzaki Umi
  • Hōōji Fū

These are the three primary protagonists in Magic Knight Rayearth, a manga that was published between 1994 and 1995.

There’s nothing new or original about taking characters from 90s manga and anime and injecting them into some predictably boring, digital card-based RPG. This is just about the only kind of non-hentai game that comes out of Japanese software studios anymore.

Resilient Nails

My feet have taken quite a bit of punishment since the boy has come along. Aside from stepping on things, I’ve been stubbing my toes on various safety gates that are supposed to keep him from getting into the kitchen or falling down the stairs. This culminated about a month ago with a definite crunch as my left little toe connected directly with a metal bar. The nail did not survive the encounter and I was afraid that it would never grow back again. My fears were misplaced, though, as a replacement appeared within a couple of weeks to do whatever it is that toenails do.

The human body is very interesting and quite resilient.

Auto-Carrot and Spiel Cheque

Maybe my typing on the phone has deteriorated to the point where the software has given up trying to understand what I mean from what I say, but it certainly feels like auto-correct on iOS has gotten a lot worse since the company moved away from skeuomorphic design principles. Words that should be understood are left with an incorrect consonant in the middle and a plethora of proper nouns get an “a” inserted before the word. “Jason a Irwin” is not a thing any human has ever spoken aloud.

A few years back I had disabled auto-correct for all the lag and bugginess on iOS 9. I turned it back on near the tail end of iOS 11 because everyone swore the system had gotten better. Maybe it’s just better to leave the toggle in the off position.

Asking for the Moon

Last week one of my bosses asked me if there was anything I would need this year to do my job a little better. While I generally say “no” to this question, this time I opted to ask for a more powerful Lenovo notebook. I outlined the reasons, explained the problems, provided the evidence, and listed out my ideal system: a Lenovo ThinkPad X1 Extreme with

  • a Core i7-8750H
  • 32GB DDR4 RAM
  • 2x 512GB NVMe SSD in RAID0
  • 4K display at 500nits

Total sticker price after discounts from Lenovo is just over 300,000 Yen, which is about $2800 USD. My boss was in support of the idea but no decision has been made just yet. Instead I may just be given some VM instances on a local server or EC2 to spin up when serious number crunching is required. The reason I asked for the beefy system is that many people have said that we don’t get the things we want unless we ask for them, so I asked. Let’s see if it comes to pass.

That’s it for this week. Tomorrow is the start of a 4-day week in Japan, and I hope it’s a good one for everyone.

Losing Consciousness

When bedtime for the boy rolls around, I typically bring him upstairs, read a book with him, then tuck him in for the night. Once he’s ensconced in his blankets, I sit on my bed and read the news for a bit. This has been the general pattern since moving to the new house, and it’s a good opportunity to spend a bit of quiet time with the kid as he drifts off to sleep. What usually happens, though, is that I drift in and out of consciousness a dozen times before the boy falls asleep, and every time I wake it’s with a little start.

The last couple of years have been pretty rough on the sleep cycles. Up until mid-2016 a full night of restful sleep could be obtained just by walking 10,000 steps in a day and doing some of the house chores. Working from home means that there are always more chores, but the 10K steps goal is just not feasible in the near future. Being a parent seems to require a great deal more energy than walking 8km.

So now I’m sitting on my bed, listening to my son talk to himself1, and trying to finish this short little post before giving in to the sandman. I’ve fallen asleep four times already, dropping my phone three times in the process, so it’s only a matter of time before my eyes close one last time for the day.

How do parents of multiple children manage to stay awake?

  1. Or an imaginary friend … or a ghost. He keeps talking about a Coco-chan, and I don’t know anyone by that name.