Twitter Writes Hamlet is back!

See it live at http://twitter-writes-hamlet.com

Having learned from the first version of TWH, I decided to completely rewrite it. Here’s what changed:

Code rewrite

My initial approach was somewhat misguided. TWH listened to two realtime Twitter streams at once: statuses/sample and statuses/filter.

statuses/sample returns a random sample of tweets. statuses/filter was used to track a fixed list of recurring words, mostly character names (as they occur a lot). The idea was that between the two, I would get enough tweets to find the words.

In parallel, TWH would also query search/tweets, since not every term was tracked by the streams. This was the fallback strategy to make sure each word was eventually found.

The first issue with this approach was the amount of data pulled from Twitter: listening to statuses/sample around the clock produced a flood of tweets, and 99.9% of them were simply ignored.

This drove the operating costs way higher than they needed to be.

The statuses/filter approach made more sense and consumed less data, but because the list of tracked terms was fixed, it was of no use most of the time.

The new approach uses only statuses/filter and search/tweets. Instead of a fixed list, TWH now tracks just the next few words of the play. This brings the amount of data way down.

Ideally I would listen for only the current word and switch to the next one as soon as it’s found. Unfortunately, the API is designed for long-lived connections and requires reconnecting whenever the tracked terms change, so listening for a few words at a time means reconnecting less often.
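Roughly, the tracking loop looks like the sketch below, written against the classic twitter npm package. The nextWords and markFound helpers are hypothetical stand-ins for the real persistence code, and the real app also has to handle retries and rate limits:

```typescript
import Twitter = require("twitter");

// Hypothetical helpers; the real code reads and writes these via the database.
declare function nextWords(count: number): string[]; // upcoming words of the play
declare function markFound(word: string): void;      // record a word as tweeted

const client = new Twitter({
  consumer_key: process.env.TWITTER_CONSUMER_KEY!,
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET!,
  access_token_key: process.env.TWITTER_ACCESS_TOKEN_KEY!,
  access_token_secret: process.env.TWITTER_ACCESS_TOKEN_SECRET!,
});

const WINDOW = 5; // how many upcoming words each connection tracks

function listen(): void {
  const words = nextWords(WINDOW);
  let cursor = 0; // index of the current word within the window
  // Typed loosely: the stream is an EventEmitter that also exposes destroy().
  const stream: any = client.stream("statuses/filter", { track: words.join(",") });

  stream.on("data", (tweet: { text?: string }) => {
    const text = (tweet.text || "").toLowerCase();
    // Only the current word counts, since Hamlet is written in order.
    if (text.includes(words[cursor].toLowerCase())) {
      markFound(words[cursor]);
      cursor += 1;
      if (cursor === words.length) {
        // Changing the tracked terms requires a fresh connection,
        // so only reconnect once the whole window has been consumed.
        stream.destroy();
        listen();
      }
    }
  });

  stream.on("error", (err: Error) => console.error("stream error:", err));
}

listen();
```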

There’s still a catch: listening for highly common words like “the” or “have” returns so many tweets that Twitter starts skipping some. To work around this, I’m only listening to words with more than 4 letters.

This means I can’t track every term in realtime. To compensate, search/tweets is used for the short ones, as sketched below.
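That split could look something like this (again a sketch: resolveShortWord and its 5-second pause are illustrative, and markFound is the same hypothetical helper as above):

```typescript
import Twitter = require("twitter");

declare const client: Twitter;                  // configured as in the previous sketch
declare function markFound(word: string): void; // same hypothetical helper as above

// Words longer than 4 letters are rare enough for statuses/filter;
// shorter ones would flood the stream and get partially dropped.
const trackable = (word: string): boolean => word.length > 4;

// Fallback for short, common words: poll GET search/tweets until the
// word shows up (for words like "the", usually on the first try).
async function resolveShortWord(word: string): Promise<void> {
  if (trackable(word)) return; // the stream connection handles this one
  for (;;) {
    const res: any = await client.get("search/tweets", { q: word, count: 1 });
    if (res.statuses && res.statuses.length > 0) {
      markFound(word);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 5000)); // illustrative 5s pause
  }
}
```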

New infrastructure

The reason TWH was down for so long is that it was hosted on Google Cloud. I discovered that they ship more breaking changes to their services (like the datastore I was using) than I expected. As a result, I had to adapt my code every time they made a change, and I did not always have time for that.

With that experience in mind, I decided I wanted to own my platform and moved to DigitalOcean.

In order to ease deployment I went with a Docker setup: one container running the NodeJS app, and another for the database (Mongo). It works like a charm and made local setup way easier.
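For reference, a minimal docker-compose file for this kind of two-container setup could look like the following. This is an illustrative sketch, not TWH’s actual config: the port, environment variable and volume names are all made up:

```yaml
version: "3"
services:
  app:
    build: .                               # the NodeJS app, built from a local Dockerfile
    ports:
      - "3000:3000"                        # illustrative port
    environment:
      - MONGO_URL=mongodb://db:27017/twh   # hypothetical variable the app would read
    depends_on:
      - db
  db:
    image: mongo
    volumes:
      - mongo-data:/data/db                # persist the database across restarts

volumes:
  mongo-data:
```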

Running my own database also helps reduce costs. This app is not very data hungry: Hamlet has about 32 thousand words, which is not a big deal and certainly doesn’t need state-of-the-art cloud storage.

The first version cost about $20 a month to operate on Google Cloud. I’m now running it all on a $5/month Droplet.

The whole setup is also much simpler, as I no longer have to remember all the fancy Google tooling for building and deploying.

TypeScript

This is the first real-life project I’ve written with TypeScript.

Overall the experience was awesome: my IDE (VSCode) suddenly got a lot more useful. Yes, I did lose time here and there on TypeScript issues, but it was totally worth the price: instead of relaunching the app constantly, I’d catch most of the bugs right in the IDE.
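As a made-up example of the kind of bug that no longer reaches runtime:

```typescript
interface Word {
  text: string;
  position: number; // index of the word within the play
  foundAt?: Date;   // set once a matching tweet has been seen
}

function describe(word: Word): string {
  // A typo like `word.positon` used to blow up at runtime, somewhere,
  // eventually; the compiler (and VSCode) now flags it as I type.
  return `${word.text} (#${word.position})`;
}
```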

Conclusion

TWH now costs a quarter of what it used to, and the footprint of the app was drastically reduced too. The drawback is that common words are tracked less accurately: while words like “the” or “have” are probably tweeted all the time, it takes TWH a few seconds to see them, because they go through the search API.

Oh, and the code is much better. Check it out on GitHub!