From Data to Information

For a lot of people it seems there is no difference between data and information. In reality, the two could not be more different. Data consists primarily of unstructured or semi-structured elements that exist without meaning or context. Information is structured data that has context and can be used to make decisions. One of the first questions I ask when I begin to design systems that will store data is what sort of information a person wants to get out of the tool later. The answer will fundamentally guide the sorts of data that are collected to ensure that the desired information can be returned and, more importantly, expanded upon as time goes on. This second part is just as crucial as the first.

Earlier today I found myself reverse-engineering a system designed by a vendor in order to solve a rather serious business problem created by the very same system. The problem amounted to an API that was designed to present a great deal of data rather than actionable information. As a result, teachers in the classroom have found it incredibly difficult to deliver their lessons. What struck me as odd about the software is that an API is generally used to present data in a structure, thereby ensuring it is parsed as information. This particular tool, however, appeared to have a consistent structure but wound up being little more than a data dump of keys and values. It was up to the JavaScript that read the data to determine the context and convert the data into information. Unfortunately, the implementation resulted in a website that would crash older tablets or present just a partial subset of information, which put the onus of "filling in the gaps" onto the teacher who had neither the time nor the resources to do anything of the sort.

So, being the corporate fool, I quietly waded into the mess and started reverse-engineering this system in order to extract the data to populate a database of my own design, then structure the data in a logical manner for the business need, then present it to the teacher in a format they can use. Given that this is a system designed to show textbooks, the lack of structure and clarity in the vendor's system has me questioning whether they understood the actual problem the business needed to solve in the first place.

In the space of six hours, I managed to reverse-engineer the entire system and copy the bulk of the data from the vendor's system into my own, then build a preliminary API structure to return to a browser. Tomorrow's task will be to take the information and turn it into a textbook with the same formatting and features, plus a bunch of other details that should have existed from the first day the system went live. Two days of work on my part and a brand new system can replace nearly two years of development from a high-priced vendor. This sort of turnaround is probably why a lot of senior managers at the day job allow me to break rules from time to time.

While I can generally turn around and solve problems like this through sheer force of will1, how can others avoid making the mistake of leaving data bloated and without form?

It comes down to understanding what a person wants out of the system, that early question I ask before writing the first line of code.

For this example, the goal of the project was to have an API return enough information to dynamically construct a textbook. Leaving the front-end code out of this, what would an effective structure be for a textbook, or a group of textbooks? Let's break down what sort of data makes up the information that is a textbook.

At a minimum we would need:

  • title ⇢ the title of the book
  • chapters ⇢ the sections of the book, allowing for a table of contents to be built
  • pages ⇢ the pages associated with the book, and possibly a chapter object

There is a whole host of metadata that could be included, such as a cover image, authors, publisher, ISBN numbers, MSRP, inventory on hand, search keywords, access permissions, and the like. The sky is really the limit when it comes to metadata, but the receiving software needn't be overloaded with data it never reads. If an API is going to return structured data, most of it should be used. If a complete dataset is only sometimes required, then an API filter should allow an application to request a limited amount of data or the whole shebang. What's nice about going this route is that websites that call the API will not be receiving large amounts of data only to discard or uselessly store it. The less data there is to transfer, the faster everything can operate.
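
To make this concrete, here is a rough sketch of what a trimmed response for a single book might look like. Every field name below is illustrative rather than anything pulled from the vendor's schema or my own:

// Illustrative structure only; the field names are placeholders
var textbook = {
    "title": "Introductory Biology",
    "isbn": "978-0-00-000000-0",
    "chapters": [
        { "id": 1, "title": "Cells", "sequence": 1 }
    ],
    "pages": [
        { "id": 101, "chapter_id": 1, "sequence": 1, "content_url": "/books/42/pages/101" }
    ]
};

A request parameter along the lines of ?fields=title,chapters (again, hypothetical) would then let a client ask for just the table of contents instead of the entire package.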

The original API decided to include everything about a digital textbook, including elements that would never be read by the front-end code. Details relating to the source system with index keys and when the chapter or page was last edited in that tertiary system. Details outlining the amount of storage space remaining on the API server, which is of no value unless the client is regularly uploading files. Details that appeared to be just random numbers thrown into an array. Details that included the address and contact information for the publisher of the book … which was attached to every page object, resulting in 477 sets of duplicated publisher information for one common textbook. The entire package was 6.68 MB to download, which took an average of 4.1 seconds.

Not cool.

My solution, which is probably not the best solution, stripped a lot of this information out. I put the title, chapters, and pages into their own objects and ensured the basic metadata was in place to show ISBN numbers and similar details. The entire package now weighs in at 682 KB and can be downloaded in under a quarter second. With some compression on the server, the JSON object can be reduced even further and expanded at the browser. The next step is to replicate the front end with less code and more functionality to aid the teacher in the classroom.

How did this happen?

The people who made the current system are not stupid. I've worked with them on a number of occasions in the past and know the main developers are doing the best they can within the bounds of the client-vendor relationship. One of the problems that I've seen time and again, though, is that people often fail to ask about the ultimate goal of any system. This one started out with a colleague saying "We need a digital textbook system" and then answering a hundred questions around the idea. Looking at the early notes from the project2, not once did the question of "What does the teacher see?" get asked. Heck, from the meeting notes, that question wasn't asked until 7 months into the project, well after the database and API were designed!

I'll admit that I tend to look at business problems from the point of view of the person who'll be stuck using the things I create rather than the managers authorizing my wage. This often means that I may not create something that leaders ask for and instead provide the solution their people want, which involves quickly turning data into information and getting the heck out of the way. Being an internal resource means I have a lot more flexibility and access to these people than a vendor might, which gives me an unfair advantage. Fortunately it's one that the right people have appreciated a few times in the past.

When it comes time to solve a business problem, one of the very first questions needs to be "what do you want out of the solution?" Everything else is just window dressing.


  1. Sheer force of will … and a quarter-century of experience writing software. I've made every mistake in the book, plus a bunch that have never been documented. It's important to remember past mistakes and their solutions so that future endeavours can be more successful from the start.

  2. Everything is recorded in JIRA … which is both a good and sad thing. Good because documentation is key. Sad because someone had to put all of this stuff into JIRA.

Gaps

For the better part of six months, I would keep two browser tabs open on my phone and notebooks for nice.social and beta.nice.social. The first site ran v4 of the platform while the beta ran v5. This was sub-optimal, but allowed for a good deal of testing to take place with the newer software in a realistic setting. Earlier this week when a server update took down the v4 service, the decision was made to move everyone and everything over to the new platform because I felt that it was ready despite a handful of incomplete items. As was to be expected, there were a whole lot more gaps in the tool than I had anticipated.

A good amount of time has been dedicated to migrating data and resolving reported bugs over the last three days and it has brought back memories of many other migrations I've done over the years for personal projects, client projects, and with several employers. When things go smoothly, it means that something is most probably wrong. When things are hectic, it means that something's wrong but the people reporting the issues give a darn. Crazy as it might sound, I generally prefer any sort of migration that is going to involve people who give a darn.

Some of the problems reported include missing posts, broken avatars, missing functions, and site routing issues. When something is reported, I write it to an ever-growing list of tasks, making sure to set aside the time to resolve the matter. If the missing or broken item is actively affecting people, then it gets pushed up closer to the top. As of this writing the critical items have been resolved1, and a half-dozen other issues remain. The ones that will be tackled next include:

  • change the font on the Anri blogging theme to a better sans serif font
  • resolve some of the reported CSS issues on the Anri theme
  • enable messages via the OpsBar (the OpsBar is the name of the bar that runs along the top of a 10C site when signed in)
  • return a JSON response for an object with a canonical URL when the HTTP header requests a JSON response
  • enable follow/block lists on the social site
  • complete password-protection handling in the Anri theme

There are also close to 1800 blog posts that still need to be brought over, and the podcasts need additional work to ensure all of the meta data is imported and sent properly in the syndication feeds. If all goes according to plan, all of the core items will be resolved on Monday or early Tuesday and then the focus can shift from "Identify and Repair" to "Converse and Extend".

If there's one thing I can take away from this experience, it's that I should really look at having data migrated daily in an automated fashion during the development phase. This would ensure that migration scripts were complete, meaning the actual migration would be done at the full speed of the server.


  1. If they weren't resolved, I wouldn't be blogging.

More is More

There's a lot to be said for a minimalistic simplicity. There's even more to be said for something that just works. The Anri theme on the upcoming 10Cv5 platform is something that I would really like for people to use without thinking about the level of complexity that is operating under the covers. As with all my work, the code is human-readable for anyone who is interested in seeing why things do what they do, but only those interested should even have the thought cross their mind.

Graph Paper and Pencil

Over the last couple of months I've managed to fill an entire notebook with scribbles that describe how functions work, why certain decisions were made, why others were avoided, and which order work should be performed. The amount of effort that has gone into Anri over the last three weeks covers barely a quarter of what's been planned for completion in the next little bit, though every update is incredibly important.

Last week the RSS and JSON Feed mechanisms were published. Today the ability to upload files and edit posts directly from Anri has been released. The next set of updates will focus on conversation threads and the OpsBar that runs along the top of a site when people are signed in. None of these are easy, nor should they be. If the v5 version of 10C was going to be easy, then I would have created a static site generator.

All in all, I'm quite happy with what's been accomplished so far this month. There's just one more quick little update I'd like to complete and send live before moving my main site over to the new platform, and I might just be able to get it written and deployed before the end of the day.

Here's hoping that the people who are testing the new platform are just as happy with the recent updates as I am.

A Missing Piece

Something has been bothering me about the structure of the 10Centuries data model recently, and I had been unable to identify exactly what the problem was until the train ride home from work yesterday when, in a flash of insight, things became clear. The underlying problem stems from a core limitation in 10Centuries with regards to accounts and how they're used in the content publishing mechanism: every account is treated as an identity, and every identity is essentially an account. If a person wants to publish posts on one website as "Tom" and on another website as "Jerry", they can easily use the same account. The problem is that when someone looks at a blog post via the RESTful API, they'll see that Tom and Jerry are both the same account but with a different author name.

This is silly, and it's something I've also blindly dealt with over the years writing the occasional post under an alias on various sites all the while using this very account. To make the matter even more blindingly obvious, publishing a social post alongside a blog post or podcast would come from the origin account, not an alias. So what exactly was the point of allowing aliases in the first place?

But this line of thinking is how the missing piece fell into place …

[Image: Missing Piece]

What 10Centuries could really use are "personas". A single account could have many personas, but a persona would belong to only one account. This would allow a person to have what appears to be multiple accounts on here that are all available through a single sign-in. But then comes the obvious question: why would anybody want to do this?

With the dwindling activity on 10C — across all functions, not just social — I've been looking at ways to potentially resolve the "dead timeline" problem that people can face when they look at one of the social apps or sites. A lot of people have micro.blog accounts, and that project appears to be "open" enough that it's possible to build the necessary interfaces to play nicely with the network. Making 10C an "easy" way for people to use their micro.blog account seems like a natural fit. People would see posts from that network in their timeline, and they'd see locally-created posts as well. The system would be smart enough to know where to send the response, and posts too long for a single 280-character object on micro.blog could be presented as a partial with a link back to the full message in a proper stream view on a person's personal 10C site. Of course, people who do not wish to use micro.blog wouldn't have to. They could continue to use the system however they choose and everything would continue to work as expected. As working with micro.blog would require the full adoption of microformats, 10C sites would also — finally — adapt to play nice in the IndieWeb space.

Why the multiple personas, though? Separation of identities, really. Not everyone may want their micro.blog account tied to their main 10C account, and not everyone would have just one account to link to. It makes sense to allow personal, professional, creative, and silly identities to exist all within one account, each appearing as a separate persona with its own degree of data visibility via the API.
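
In data terms the relationship is straightforward: one account has many personas, and each persona belongs to exactly one account. A rough sketch of how an account might appear through the API, reusing the Tom and Jerry example from above (every field name here is illustrative and nothing is final):

// Illustrative only; the v5 schema has not been settled
var account = {
    "id": 12345,
    "personas": [
        { "id": 1, "name": "Tom",   "visibility": "public" },
        { "id": 2, "name": "Jerry", "visibility": "public" },
        { "id": 3, "name": "Silly", "external": { "micro.blog": "someuser" }, "visibility": "limited" }
    ]
};

A blog post or social object would then reference a persona rather than the account itself, so the API never needs to reveal that Tom and Jerry share a sign-in.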

This is more or less a logical progression so far … right? I certainly hope so.

But wait. IndieWeb is an alternative to the "corporate web" and its silos. While 10C is most certainly not a corporate entity, it is still very much a silo. So this means in order to make this idea ideologically complete, I need to share the source code for 10C with the world.

Okay. I can do that. Version 5 is still very much slated to be the 100% open version of 10C that anyone and everyone can download, install, use — and abuse — to their heart's content, ideally finding issues and contributing back to the community whenever possible. This just means that Version 5 needs to have files to share.

Another Ambitious Goal?

As I explained recently, the last little bit at the day job has been an absolute slog. I'm not excited to go to work and solve problems there anymore. I'm dreading the inbox with every glance at the little mail icon. I'm just not much into the corporate tools right now. What I'd like to do is focus a bit on making something that I can be proud of, even for a little while. A basic version of 10C v5 might just pull me out of the doldrums and get the brain firing on an extra cylinder or two.

But this means I need to put a bit more thought into the data model and how everything will be structured. With 10Cv4 I started playing with the idea of "channels" and how every object was a self-contained entity that resided in a channel that could be accessed via a site or a pipe. While this model did solve some interesting problems, there's still room for improvement. The same can be said about the account structures, which still use the terrible term user and will be changed as soon as is feasible. It's already been decided that accounts can have personas, and accounts can own channels and be granted permissions to them. But what else can be improved about accounts and the model? Are the ToDo and Notes entities as effective as they could be? How about the Photo entities? There is a lot to think about, but not all at once.
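
To keep myself honest while thinking this through, the relationships I keep coming back to look roughly like the sketch below; every name is a placeholder and nothing has been committed to yet:

// A sketch of the entities under consideration, not a finalised model
var model = {
    "account": { "has_many": ["personas", "channels"] },
    "persona": { "belongs_to": "account" },
    "channel": { "owned_by": "account",
                 "contains": ["posts", "todos", "notes", "photos"],
                 "accessed_via": ["site", "pipe"] },
    "grant":   { "gives": "account", "access_to": "channel" }
};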

What I'm going to aim to do is release a proof-of-concept of a Version 5 implementation in the coming weeks. It will not be too ambitious at first, instead being a simple little tool that will contain an API, a web presentation layer, files, and the start of a self-hosted platform with a few ideas I've been toying around with for 10Cv4 that would probably make more sense to ship with a more modern API. If the project is something people see potential in, then maybe there will be a few people contributing to the project and building something much better than a single person could hope to accomplish. I won't get my hopes up, though.

There's a lot that I still don't understand about microformats and the IndieWeb project, and there are things in both projects that break my brain as I try to parse the logic behind decisions without knowing the context. That said, it's one of the few ways forward for people who truly care about owning their data and for the 10Centuries platform that I've invested so much time and effort into building.

Let's see if this concept of personas is the missing piece that will make 10C a little more interesting.

Books I'm Reading (or About to Read)

Since leaving the classroom and returning to the world of software development, I've tried to spend at least an hour a day reading about the changes that have taken place in technology since 2010. In July I started to dedicate my in-bed-before-sleep time to this task, and it's resulted in a lot of books being read and a bunch of new skills being acquired or refined. While it's impossible for any one person to know everything about a given subject, it should not be impossible for one person to know a healthy amount about a number of different subjects. This has certainly been the case with me while I learn more about data modelling, database design, and data warehousing. In fact, looking back at the technical books I've read in the last six months, it's easy to see that the vast majority are all related to databases in one way or another, and the four I have set aside for January, February, and March are all on SQL Server.

I think I may have a little bit of a database fetish.

SQL Server Books

Last week I finished Stacia Varga's Developing SQL Data Models exam reference and I'm currently going through Victor Isakov's Administering a SQL Database Infrastructure reference in preparation for an upcoming Microsoft certification exam. On deck is Jose Chinchilla's Implementing a SQL Data Warehouse as data warehousing is a topic that has recently piqued my interest. Randolph's SQL Server 2017 Administration Inside Out is expected to be released by the end of next month, and I'll likely set aside some additional time to ingest the wisdom contained in the book. There's just so much to learn and explore!

A few people have asked why I read so much. They want to know specifically why I read so many technical books. When I think about it, though, the answer is not so cut and dry. Sure, I'd like to learn more about these tools so that I can make better use of the technology, but this isn't the only reason. Buried deep in the curiosity is the desire to discover what I do not yet know. As Bart Simpson so eloquently said to Mrs. Lovejoy all those years ago, "what you don't know could fill a warehouse."

It's true. We generally do not know what we do not know, and it's because of this ignorance that incomplete or inefficient decisions can be made to solve problems that other, smarter people may have resolved years before. Not a day goes by where I don't learn something new about the tools I use, and I hope this does not change anytime soon.

There's a certain excitement that comes from reading about an interesting feature or function, then trying it out and thinking about how it might be used to solve a real problem elsewhere. Back in November, I said that 10C would switch from MySQL to SQL Server just because I wanted to gain some experience with the platform on Linux. The conversion was finished mid-December, and nobody has reported any issues with the service since the switch. It was, by all accounts, one of the smoothest migrations of 10C I've ever performed. Learning more about SQL Server will (hopefully) allow me to do even more with the database going forward. More than this, I'd like to better understand how complex business problems could be solved with better use of this powerful tool.

While 10C is a personal project that I take very seriously and put a lot of care into, there is simply not enough "hard" work for me to do with the database. Businesses, however, ask a lot of tough questions. Depending on the quality of those questions, businesses may ask the question again and again in the form of reports. Being able to build the SQL queries to quickly and accurately return the answers is certainly a worthy place to use the skills I'm working so hard to acquire.

I Need To Be Chris

Between 2002 and 2007, I worked at a medium-sized company in Canada that was best known for its calendars and other print materials. I started in the warehouse and, over the course of 3 years, moved into different roles that culminated in a position as a software developer. Along the way I worked with a number of very smart people who taught me a lot about software development, and a lot about how to ask the right questions to find out what people want the software to do, rather than making the wrong assumptions and delivering something that isn't at all what they're looking for. The person I learned from the most, however, was a man named Chris1.

Chris had a rather wide range of knowledge on just about every technical subject, no matter how obscure the tools might have been. His knowledge on certain subjects would often run circles around others, even when it was their area of focus. And, while he most certainly did complain when he was called in to fix somebody else's problem, he tried to make education part of the solution. There really isn't any point being "the only person who knows X" in a company, because that doesn't benefit anybody in the long run2. The guy seemed to know everything he needed and then some, and was honest enough to say "I don't know" when he really didn't know right before investigating whatever needed to be learned so that he wouldn't answer the same question the same way later.

I learned a lot from him in the two years or so we worked together, and would be happy to work with him again if the opportunity arose.

The way Chris handled situations was often incredibly efficient, and it's something I really need to work on myself. The last few weeks at the day job have been incredibly stressful as I attempt to do four very different tasks simultaneously in order to deliver a project that should have started limited trials back in August. I've recently complained that I shouldn't be doing four very different tasks if bugs and enhancements are going to be resolved by arbitrary deadlines, but complaining about reality will rarely resolve the problems one faces.

I've been incredibly fortunate over the last two decades to have worked with a lot of very different technologies and worked in a lot of very different roles. This sort of makes me a little like Chris, in that I can look at a problem from different angles, apply lots of experience to find a solution or — at the very least — know how to find a solution, and have the capacity to do it without necessarily asking for a great deal of help. What I need to learn is how to turn common distractions from various groups into learning experiences rather than seeing them as work blockages. When people have questions about databases, I need to guide rather than brusquely answer. When people have questions about X over Y, or the alternatives to Z, I need to outline the gist and provide some basic links to sites with more in-depth answers. The people I work with are not fools. They genuinely want to do a good job and go home knowing they accomplished something, and this is the same goal I have at the end of every day. The question I have now, though, is how to do this without coming across as dismissive or as though I'm "mansplaining"3 something.

Having spent the better part of 8 years working in a classroom, you'd think this would be natural. That said, the teacher-student dynamic doesn't work with peers, nor do I want to have that dynamic with my colleagues. So how does one turn a work-stoppage into a learning opportunity while also meeting all of the arbitrary and constantly shifting deadlines that managers are all too happy to create?


  1. He had a last name, too, but I'll just use his first one here.

  2. Seriously. You don't want to be the person to receive a 3:00am phone call when things go bad … especially if it's with something that isn't technically your responsibility.

  3. I hate this pseudo-word like you wouldn't believe … but it seems to be part of the lexicon, now.

First Time's the Charm?

The last few months have gone by in a blur as both my personal and professional projects have been kicked into overdrive. On the personal side, 10Centuries has seen a remarkable amount of work in the last 90 days which has culminated in the release of version 4.1, an update that's almost as big as the move from v2 to v4 in January of this year. While the platform is not yet mature enough for me to say it's "done", it is coming along nicely with a minimal amount of frustration and just a handful of reported bugs. This gives me hope that the system may be ready for a larger amount of traffic in the near future as more features are completed. Professionally, though … I'm in trouble.

Back in March I was able to finally move out of the classroom and into a full-time development role with the employer. Since then, I've managed to build a lot of the foundation of a new LMS that will be used across the country. However, the project has also massively grown in scope over the same time period. What was originally going to be a relatively simple reporting tool has now become a much more critical reporting tool that can affect the company's finances … as well as a digital textbook.

[Image: Visual Studio - ASP and MVC]

One of the things that I really like about the challenge of building a big, complicated piece of software for the employer is the fact that I'm creating a big, complicated piece of software. This project will test darn near every skill that I've learned over the last two decades and put them to good use. Every aspect of the code is being hand coded by me and me alone. A wonderful and terrifying prospect all at the same time. I would really much rather be working with another developer. Someone that I could bounce ideas off of. Someone who would disagree and challenge me to create something better.

This isn't to say that I'm working in isolation, though. Nothing could be further from the truth. There are a lot of people working on this project, and it's going to have an impact on the organization this year one way or another. It's my goal to make sure it's a positive one. The code must be solid. The UI must be exceedingly efficient and attractive. The features must be solid.

I'm nervous. With my personal projects, I can make mistakes. With this professional one, there are no second chances.

Complete Conversations

Earlier today I asked a question on App.Net that sparked a conversation that I think was misinterpreted as me bypassing even more of the system1 to allow people to view and interact with posts from people that have blocked them, or they themselves have blocked. While it would be a trivial matter to simply show all posts from all accounts regardless of whether blocks are in place to prevent such a thing or not, this is really not my intention. My question has more to do with completeness than anything else. The next few updates to Nice.Social will be attacking some of the more complex issues that people have been asking for, one of which is a different conversation view that will allow for a TreeView-style layout. Doing so will not be ridiculously difficult so long as conversations are complete entities. It’s when they are incomplete that the code needs to get tricky and begin making assumptions.

Allow me to illustrate.

[Image: conversation_a]

In the image above, we can see a very basic type of conversation. @person_a starts it off, and @person_b replies. Eventually we see others join in the conversation and it begins to split and evolve into three distinct conversations. This is par for the course on a social network like App.Net, but there’s a problem. @person_d has blocked @person_c from seeing their posts, and @person_c wants to see the conversation view in order to better understand the flow of the conversation. What does @person_c see?

[Image: conversation_b]

If @person_d blocked @person_c, the first and third conversations will be completely disconnected despite what happened in the second conversation. These two conversations may come from the same conversation thread but, for all intents and purposes, they’re completely different conversations … even if they’re the same subject with the same ultimate root.

What I posited was whether it would be a good idea to show @person_c that the second conversation did indeed happen, as they’d see it in their home timelines, anyway, and link it accordingly.

[Image: conversation_c]

The difference here is that, rather than seeing every message, we would see that the second conversation includes blocked content. Again, this could be the result of @person_d blocking @person_c or vice versa. It really doesn’t matter. The conversation between @person_a and @person_b that stemmed from @person_d’s comment will appear just as they should in a second tier, and everything would be good to go. This allows for a complete view of a conversation as best as possible. As someone who has worked with databases for over 15 years, data completeness is incredibly important to me. Broken data — especially obscured, broken data — infuriates me to no end as my mind wonders if something is wrong with systems if the human interactions are "out of alignment".

Of course there are a few ways to handle the comment from @person_d. The post can be completely eliminated from the view and a disconnected conversation can be shown, leaving the reader to wonder how a branch replying to @person_d with no discernible parent record came to exist.

Another way to resolve this issue would be to eliminate anything that came after @person_d’s post. The posts from @person_a and @person_b will continue to appear in @person_c’s timeline, but they will not be able to see it in the conversation view as an entire entity.

The third, my preferred way, would be to show that a post (or string of posts) does indeed exist, but collapsed in such a way that it’s understood the material is best ignored. Of course, if a person *really* wanted to know what those messages said, a client could be configured to collect the information without an API key and display it. This, too, would be trivial for someone to add to a personal branch of Nice.Social. Heck, this could be extended to show that posts have been deleted from the conversation, too!
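
A quick sketch of how a client might build such a view, assuming it already knows which post IDs are blocked; the function and field names below are mine for illustration and nothing here exists in the Nice.Social code yet:

/* Illustrative sketch: blocked posts become collapsed placeholders
   instead of disappearing, so replies keep a visible parent. */
function buildThread( posts, blockedIds ) {
    var nodes = {};
    var roots = [];

    posts.forEach(function(post) {
        nodes[post.id] = { post: post,
                           collapsed: ( blockedIds.indexOf(post.id) >= 0 ),
                           replies: [] };
    });

    posts.forEach(function(post) {
        var parent = nodes[post.reply_to];
        if ( parent ) {
            parent.replies.push( nodes[post.id] );
        } else {
            roots.push( nodes[post.id] );
        }
    });
    return roots;
}

A collapsed node would render as something like "This post contains blocked content" rather than the post body, keeping the shape of the conversation intact.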

As I said above, when I think about doing a proper conversation view, I think about data completeness. Hiding missing chunks of data is the default answer by many, but strikes me as overly simplistic for the types of complex discussion that can arise on the App.Net platform.

Of course there is the problem of having our posts — posts that we own according to the contract of App.Net — shown to people from whom we have revoked that privilege, and this is not something that I will blatantly disregard. There will be no stupid “opt out” function necessary to ensure our posts are not easily read by people we’ve blocked from view. The goal of the exercise was really more about answering the question of completeness and information display.

I’ve said this before and I’ll say it again here. No code has been written to allow a person to access or read posts that have been denied to them. This question was raised as a thought experiment with the hopes of collecting ideas from the people who actually use the network. You are free to examine the source code for Nice to confirm what I’ve said.

Posting Images to App.Net via Client-Side JavaScript

This past weekend I added a much needed feature to Nice.Social that allows people to upload images to their ADN Storage and share images without jumping through hoops. File Storage is something I struggled with in the past, eventually putting the functionality on the back burner while focusing on other parts of the application, but I share an awful lot of images on App.net. As a result, any client that I use will need to have this one function down pat. So, throwing myself at the problem, I dove into the documentation to make this crucial feature possible.

As with any web project, it’s important to think about which browsers the site might be used on, and this meant that using FormData as a way to collect and append information was not really an option. FormData is supported by almost every current browser, but versions of Internet Explorer prior to 10 do not support this feature. As a result, I needed something a bit simpler. In addition to this, I wanted to have a way to report upload progress back to the screen so that people could see how much of their file had been sent to the server. This meant making use of XMLHttpRequest(), something that is used quite extensively in the Nice.Social code.

Unfortunately, getting an XMLHttpRequest() to play nicely with App.net is not quite as cut and dry as one might hope. Despite what the documentation might state, it does not seem to be possible to upload a file object in one API call. Instead, the process needs to work like this:


  • send a JSON package outlining what the document is
  • receive a file_id and file_token back from the server
  • upload the file using the above-mentioned values

With this in mind, let’s step through the process of uploading a file to App.net’s File Storage. First step, send a JSON package outlining just what the document is.

Disclaimer: I’m making this code available with no warranty whatsoever.

document.getElementById('file').addEventListener('change', function(e) {
    var file = this.files[0];
    var xhr = new XMLHttpRequest();
    var url = 'https://api.app.net/files';

    /* Describe the file before sending it */
    var fileObj = { 'kind': 'image',
                    'type': 'com.example.photo',
                    'mime_type': file.type,
                    'name': file.name,
                    'public': true
                  };

    /* Once the metadata is accepted, hand the response off for the actual upload */
    xhr.onreadystatechange = function(e) { if ( 4 == this.readyState ) { parseFileUpload( e.target.response, file ); } };
    xhr.open('post', url, true);
    xhr.setRequestHeader("Authorization", "Bearer " + [ACCOUNT_ACCESS_TOKEN]);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify(fileObj));
}, false);

This code will create and upload the JSON object fileObj, which stipulates the mime type, file name, and whether the object should be publicly accessible or not. There is also a type, which allows you to specify a little more clearly what something is. In Nice, I set this to 'social.nice.image', though you may want to set one for your application if it’s going to be a unique type of upload. Be sure to pass the proper account access token to the API through the xhr.setRequestHeader() as well. Of course, the file itself has not been uploaded yet, just the precursors required to allow the upload to take place. So let’s upload some files!

Once the JSON data has been successfully uploaded, the App.net API will return a 200 status code along with a JSON package that contains all of the key information required to upload the actual file. Here is an example of what the return JSON will look like:

{
    "data": {
        "complete": false,
        "created_at": "2015-05-26T00:00:00Z",
        "file_token": "{a very, very long string}",
        "id": "999999",
        "kind": "other",
        "mime_type": "image/png",
        "name": "filename.png",
        "sha1": "ef0ccae4d36d4083b53e121a6cf9cc9d7ac1234a",
        "size": 1234567,
        "source": {
            "name": "Nice.Social",
            "link": "https://nice.social",
            "client_id": "abcdefghijklmnopqrstuvwxyz0123456789"
        },
        "total_size": 1234567,
        "type": "com.example.test",
        "url": "https://cdn.app.net/your-unique-file-name",
        "url_expires": "2015-05-26T03:00:00Z",
        "user": {user object}
    },
    "meta": {
        "code": 200
    }
}

Armed with this information, we can now upload a file to the App.net File Storage. In the above JavaScript snippet, you can see a parseFileUpload() function. This takes the JSON response from ADN and performs the appropriate action based on the meta.code response.

function parseFileUpload( response, file ) {
    var rslt = jQuery.parseJSON( response );
    var meta = rslt.meta;
    var showMsg = false;
    switch ( meta.code ) {
        case 400:
        case 507:
            alert('App.Net Returned a ' + meta.code + ' Error:<br>' + meta.error_message);
            break;

        case 200:
            var data = rslt.data;
            var xhr = new XMLHttpRequest();
            var url = 'https://api.app.net/files/' + data.id + '/content';

            /* Report upload progress as the file is sent */
            if ( xhr.upload ) {
                xhr.upload.onprogress = function(e) {
                    var done = e.loaded, total = e.total;
                    var progress = (Math.floor(done / total * 1000) / 10);
                    if ( progress > 0 && progress <= 100 ) {
                        console.log('Uploading … ' + progress + '% Complete');
                    } else {
                        console.log('Upload Complete');
                    }
                };
            }
            xhr.onreadystatechange = function(e) {
                if ( 4 == this.readyState ) {
                    switch ( e.target.status ) {
                        case 204:
                            /* The Upload is Good. Return the File Object Array */
                            return data;
                            break;

                        case 507:
                            /* A 507 Means "Insufficient Storage" */
                            alert('App.Net Returned a ' + meta.code + ' Error:<br>' + meta.error_message);
                            break;

                        default:
                            /* Do Nothing */
                    }
                }
            };

            /* Send the file itself with a PUT, using the file's own mime type */
            xhr.open('put', url, true);
            xhr.setRequestHeader("Authorization", "Bearer " + [ACCOUNT_ACCESS_TOKEN]);
            xhr.setRequestHeader("Content-Type", file.type);
            xhr.send(file);
            break;

        default:
            /* Do Nothing */
    }
}

Notice that we are using a PUT rather than a POST for the file upload, and that the Content-Type value is the file’s mime type rather than multipart/form-data. This code does work, and you can see it in operation on Nice.Social. After the file is uploaded to the server, you can embed the object into a post by attaching an oembed JSON annotation that references the file when creating the post. Be sure to use the file_token and id values that were returned when the initial JSON package was uploaded to the API.
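
For reference, the annotation attached to the new post looked roughly like the object below when I put this together. This is from memory rather than copied out of the documentation, so double-check the official File API docs before relying on the exact keys:

// Rough recollection of the file/oembed annotation format; verify against the docs
var post = {
    "text": "Sharing a photo from Nice.Social",
    "annotations": [ {
        "type": "net.app.core.oembed",
        "value": {
            "+net.app.core.file": {
                "file_id": "999999",
                "file_token": "{the file_token from the earlier response}",
                "format": "oembed"
            }
        }
    } ]
};

This object is then POSTed to https://api.app.net/posts with the same Authorization header used for the file calls.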

Wrap Up

This was not an easy problem for me to solve, as I ran into a lot of unforeseen complications that were not discussed in the official documentation. That said, after everything was operational, the system has worked without any problems whatsoever. I’ve not seen any hiccups when uploading files, though the public URL will occasionally alternate between photos.app.net and files.app.net. Truth be told, I think it has something to do with the data type we’re reporting, but I haven’t validated it 100% just yet.

While App.net may not be a popular platform to develop against, it’s still a pretty decent tool that can offer a lot of flexibility so long as you know how to make use of the API and its quirks. Hopefully there will be a few more exciting tools created in the future that allow people to explore new ways of using the service.

Re-Thinking Search

Search on this website is broken. Very broken. This is something I’ve known for quite some time but, as nobody was sending their complaints my way, it is something that I never really prioritised. Search on 10Centuries sites, after all, had a full month’s worth of optimisations just a little over a year ago. My ostrich move was brought to an abrupt end yesterday when Jeremy Cherfas remarked how difficult it was to find blog posts about food on this site. I went ahead and found the information he was looking for using some of my more advanced search tools but this isn’t something that I should have to do to help people find what they’re looking for. Search needs to be made much better in a very short period of time.

As of this writing there are just over 120,000 items that can be discovered through the search mechanisms on this website. This includes blog posts, ADN posts, Tweets, and a few other pieces of data that come from other applications such as SleepCycle. Typing various things into the search field will return results, but these results cannot be sorted in any meaningful way. Filters are also required to enable people to quickly find exactly what they want.

That last requirement is the tricky part. How does one go about adding filters into a search field without over-complicating matters? We’ve all seen those search tools used by online shops and forums that require a visitor to know how the website categorises information before a cromulent set of results is returned, and that’s not what I want to do with 10Centuries. A great deal of logic goes into the search algorithms already, but there needs to be more. The question is what this will look like.

The Game Plan

Scoring will still play a large role in what results are returned when people go looking for specific topics or keywords. At the moment, the scores come from how many keywords are found in an article. This is clearly insufficient. What will come next is a list of data types in the search results that people can use to filter the results before them in real time. This can be done with very simple toggle buttons. Grey means disabled, and the site’s primary colour means active. With this in place, people will be able to find what they need in a more interactive manner.
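
The client-side portion of this is not complicated. A sketch of the sort of filtering and sorting I have in mind, with made-up names for the result fields:

/* Illustrative sketch: keep only the enabled data types, then sort by score.
   The "type" and "score" field names are placeholders for whatever the API returns. */
function filterResults( results, enabledTypes ) {
    return results.filter(function(item) {
        return enabledTypes.indexOf( item.type ) >= 0;
    }).sort(function(a, b) {
        return b.score - a.score;
    });
}

Toggling a button would simply add or remove a type from enabledTypes and re-run the function against the results already in hand.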

Naturally people who don’t need or want the extra filters won’t ever need to use them. The search utility will continue to return results just as it should. The filters will just enable people to make more specific requests for data to find exactly what they need.

I hope to have this baked into the first release of the 10C v3 API, which will go live on this site before any others1.