This Might Be a Twitterbot

A while ago, I made a threat on Twitter that I was going to unfollow any account that wasn’t a bot. On an average day, I seem to get as much value out of these algorithmically generated accounts as I do from human-run ones, whether they’re just tweeting out Finnegans Wake or Gravity’s Rainbow, searching the Twitter stream for pairs of rhyming tweets in iambic pentameter, replacing the nouns and adjectives in William Carlos Williams’ “This is Just to Say”, or running a Markov chain-driven mashup of the King James Bible and Abelson & Sussman’s Structure and Interpretation of Computer Programs.
There’s a quote from near the beginning of Pynchon’s ‘The Crying of Lot 49’ that’s always resonated with me — “You know what a miracle is. Not what Bakunin said. But another world’s intrusion into this one. Most of the time we coexist peacefully, but when we do touch there’s cataclysm.” The tweets being generated by the bots I follow give me that same sense of tiny little blips of another incongruous universe briefly slipping into ours. I’m also reminded of Wintermute, the AI gone insane in Gibson’s Count Zero, quietly making Cornell-like shadowboxes.
Then, one day I saw this:

…and knew that I had to build my own bot. I’ve been a fan of the band since I was in grad school in ’88 or so, and their lyrics have the kind of wordplay, randomness, rapid context switching and general evocativeness that made the idea of building this seem like enough fun to spend a few nights putting together (and the fact that there’s an excellent fan-run wiki that maintains a full database of all their lyrics reduced the labor on my part greatly).

Selecting an API library

Given a choice, the languages I reach for out of reflex when starting a project are Python and C++; C++ doesn’t seem like an especially appropriate choice for this, so Python it is.
Twitter has implemented a very nice and well-documented RESTful API, and I certainly could have worked with that directly. For this project, I didn’t see any particular advantage or fun in dealing with that part of the plumbing manually, so it was off to search for a pre-existing library to handle communications with Twitter.
Most of the links that I found on the topic pointed directly at Tweepy, but looking a little more deeply turned up some newsgroup posts that made me concerned that Tweepy was likely to be abandoned by its creator. Having been burned by that before, I settled on Twython instead.

Authentication

The most confusing part of the whole project proved to be figuring out how to deal with authenticating with the Twitter server. All of the OAuth documentation that I turned up showed the (obviously much more common) use case where a twitter app would need to be able to authenticate on behalf of some number of human users. In this case, the bot app is the only user, and going through all of the OAuth handshaking seemed ridiculous.
It turns out (and I learned this from a Stack Overflow post that I’m unable to find again at the moment) that there’s a simple way to authenticate a bot: change a couple of options on the app’s settings page.

[Screenshot: my app’s settings tab on dev.twitter.com]


At the bottom of that settings tab, set the Application Type from the default “Read only” to “Read, Write, and Access direct messages”, then click the “Update this Twitter application’s settings” button at the very bottom of the page.
This change may take a few minutes to become effective. After a while (go make a cup of tea, maybe…), return to your app’s Details tab, and click on the button at the bottom of the page to generate your access token.
Once you have done that, the four pieces of information your bot needs to authenticate itself to Twitter will be listed on that page:

  • Consumer key
  • Consumer secret
  • Access token
  • Access token secret

In my bot’s source, we read those four values from a JSON-formatted config file, and pass them as initialization parameters to the Twython object we create.
[sourcecode language="python"]
self.settings = Settings(os.path.join(self.botPath, "tmbotg.json"))
s = self.settings
self.twitter = Twython(s.appKey, s.appSecret, s.accessToken,
                       s.accessTokenSecret)
[/sourcecode]
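For readers who want to see the shape of that config file, here is a minimal Python 3 sketch of loading it. The JSON key names are an assumption: they mirror the attribute names used above (appKey, appSecret, accessToken, accessTokenSecret). The `load_credentials` and `make_client` helpers are hypothetical and are not the Settings class from the bot's repository.

```python
import json

# Assumed key names, mirroring the attributes used in the snippet above.
REQUIRED_KEYS = ("appKey", "appSecret", "accessToken", "accessTokenSecret")


def load_credentials(path):
    """Read the four OAuth values from a JSON file and check none are missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("config file is missing: " + ", ".join(missing))
    return {key: config[key] for key in REQUIRED_KEYS}


def make_client(credentials):
    """Build a Twython client from the loaded credentials."""
    # Imported here so the loader above can be used without Twython installed.
    from twython import Twython
    return Twython(credentials["appKey"], credentials["appSecret"],
                   credentials["accessToken"], credentials["accessTokenSecret"])
```

Failing loudly on a missing key seemed preferable here to discovering a bad credential only when the first API call is rejected.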

Posting a status update

Posting a status update is trivial with Twython — assuming that we have an instance created, it’s as easy as
[sourcecode language="python"]
self.twitter.update_status(status="This is my awesome tweet")
[/sourcecode]
Of course, the matter of when to update, and what that awesome tweet text should be, is a little more complicated. I’ll handwave past the ‘what’ question a bit (full source for the bot, including the code that scrapes the HTML at tmbw.net, is available on GitHub at https://github.com/bgporter/tmbotg). My goal was to have tweets appear unpredictably. It seemed that one an hour wasn’t going to be too obnoxious to anyone, so my first stab at this was to have the bot executed once a minute using a cron job, with code inside the bot something like this:
[sourcecode language="python"]
# out of 1440 minutes in a day, post about 24 times.
tweet_probability = 24.0/1440
if random.random() < tweet_probability:
    txt = self.GetTweetText()
    self.twitter.update_status(status=txt)
[/sourcecode]
…but it turns out that in practice, not long after I put the bot live, random.random() decided to have us generate enough updates in a short period that I got a complaint about the frequency. Now we keep track of when the last update happened, and refuse to generate a new update until some reasonable period has elapsed after that (currently set at an hour). Right now, the code that creates tweets looks like this:
[sourcecode language="python"]
# Assumes module-level imports: from random import choice, random
# and: from time import time
def CreateUpdate(self):
    '''
    Called every time the bot is Run().
    If a random number is less than the probability that we should generate
    a tweet (or if we're told to force one), we look into the lyrics database
    and (we hope) append a status update to the list of tweets.

    1/11/14: Added a configurable 'minimumSpacing' variable to prevent us from
    posting an update too frequently. Starting at an hour.
    '''
    doUpdate = False
    if random() < self.settings.tweetProbability:
        last = self.settings.lastUpdate or 0
        now = time()
        # Make sure that we're not tweeting too frequently. Default is to enforce
        # a 1-hour gap between tweets (configurable using the 'minimumSpacing' key
        # in the config file, providing a number of seconds we must remain silent.)
        requiredSpace = self.settings.minimumSpacing
        if not requiredSpace:
            # no entry in the file -- let's create one. Default = 1 hour.
            requiredSpace = 60*60
            self.settings.minimumSpacing = requiredSpace
        if now - last > requiredSpace:
            # Our last tweet was long enough ago, so go ahead. Otherwise we
            # stay quiet this time.
            doUpdate = True
    if self.force:
        doUpdate = True
    if doUpdate:
        try:
            # Occasionally force some short(er) updates so they're not all
            # paragraph-length.. (these values arbitrarily chosen)
            maxLen = choice([120, 120, 120, 120, 100, 100, 100, 80, 80, 40])
            album, track, msg = self.GetLyric(maxLen)
            self.tweets.append({'status' : msg})
            self.settings.lastUpdate = int(time())
        except NoLyricError:
            # !!! TODO: we should log this.
            pass
[/sourcecode]
Note that we don’t actually post a tweet in that method; we just add a dict to the end of a list of tweets. That’s because I thought it would be cool to let this bot reply to people, too.
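Going back to the complaint about frequency for a moment: the burst is what a per-minute coin flip should be expected to produce. The gaps between tweets follow a geometric distribution, so short gaps are common even when the average gap is an hour. With a probability of 1/60 per run, the chance that the next tweet follows within ten minutes is 1 - (59/60)^10, or roughly 15%. The Python 3 sketch below computes that figure and isolates the gating decision as a pure function. The function names are hypothetical and not part of the bot's source, and the spacing is assumed to be in seconds, as the 60*60 default suggests.

```python
import random


def chance_of_gap_within(minutes, probability):
    """Probability that at least one of the next `minutes` runs posts a tweet."""
    return 1.0 - (1.0 - probability) ** minutes


def should_tweet(now, last_update, probability, minimum_spacing, force=False,
                 rand=random.random):
    """Return True when this run should queue a tweet.

    now and last_update are timestamps in seconds; minimum_spacing is in
    seconds. rand is injectable so the decision can be tested deterministically.
    """
    if force:
        return True
    if rand() >= probability:
        return False  # the coin flip said no
    # The coin flip said yes; only proceed if we've been quiet long enough.
    return now - (last_update or 0) > minimum_spacing
```

Keeping the decision free of side effects makes it easy to check the edge cases (no previous update, forced update, too-recent update) without touching the clock or the network.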

Responding

Every time we’re run, we also check the mentions timeline, which will return data about every tweet posted that’s mentioned our username. We go through that list, and favorite each mention (give those nice people a little drip of dopamine). If any of those mentions have a ‘?’ in them, we assume that they’re asking us a question, and reply to them with a single line of lyrics as a response. We’ll keep up a conversation as long as they keep asking us questions (and if you’re thinking like I am at this instant, I’m assuming that there are at least 1000 Eliza-bots on Twitter right now.)

Again, our replies are added as a dict to the end of a list; in this case, we also include the id of the tweet that we’re replying to as the value for the key in_reply_to_status_id. Once we’re all done, a separate SendTweets() method can iterate through that list and pass each of those dicts to the update_status API function, which will send them as plain tweets or replies as appropriate.
[sourcecode language="python"]
def HandleMentions(self):
    '''
    Get all the tweets that mention us since the last time we ran and process each
    one.
    Any time we're mentioned in someone's tweet, we favorite it. If they ask
    us a question, we reply to them.
    '''
    mentions = self.twitter.get_mentions_timeline(since_id=self.settings.lastMentionId)
    if mentions:
        # Remember the most recent tweet id, which will be the one at index zero.
        self.settings.lastMentionId = mentions[0]['id_str']
        for mention in mentions:
            who = mention['user']['screen_name']
            text = mention['text']
            theId = mention['id_str']

            # we favorite every mention that we see
            if self.debug:
                print("Faving tweet {0} by {1}:\n {2}".format(theId, who, text))
            else:
                self.twitter.create_favorite(id=theId)
            # if they asked us a question, reply to them.
            if "?" in text:
                # create a reply to them.
                maxReplyLen = 120 - len(who)
                album, track, msg = self.GetLyric(maxReplyLen)
                # get just the first line
                msg = msg.split('\n')[0]
                # In order to post a reply, you need to be sure to include
                # their username in the body of the tweet.
                replyMsg = "@{0} {1}".format(who, msg)
                self.tweets.append({'status': replyMsg, "in_reply_to_status_id" : theId})
[/sourcecode]
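The source for SendTweets() isn't shown in this post, so the following is only a sketch of what such a step might look like, written as a standalone Python 3 function. The `send_tweets` name, the `dry_run` flag, and the `RecordingClient` stub are assumptions for illustration. The one real API relied on is Twython's update_status(), which takes the same keyword parameters (status, in_reply_to_status_id) that are used as keys in the dicts above.

```python
def send_tweets(client, tweets, dry_run=False):
    """Post each queued tweet dict and return the ones that were sent.

    client: any object with an update_status(**params) method, such as a
    Twython instance. tweets: dicts like {'status': ...}, optionally with
    an 'in_reply_to_status_id' key.
    """
    sent = []
    for tweet in tweets:
        if dry_run:
            print("Would post: {0}".format(tweet))
            continue
        # The dict keys match the API parameter names, so unpack them directly.
        client.update_status(**tweet)
        sent.append(tweet)
    return sent


class RecordingClient:
    """Stand-in for Twython that records calls instead of hitting the network."""

    def __init__(self):
        self.calls = []

    def update_status(self, **params):
        self.calls.append(params)


queue = [
    {"status": "A plain tweet"},
    {"status": "@someone A reply", "in_reply_to_status_id": "12345"},
]
client = RecordingClient()
send_tweets(client, queue)
```

Because plain tweets and replies differ only in which keys the dict carries, the sending loop needs no special case for either one.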

In the wild

There are very few dependencies here:

On the machine where the bot lives, I created a virtualenv for all these packages, and once a minute, a cron job activates that environment, runs the bot, and is done.
They Might Be Giants are going to be working on two new albums this year, so when that happens, I’ll need to re-run the GetLyrics.py HTML-scraper code to fetch that data. Now that I’ve built this and seen it running, I can imagine extracting the underlying logic into a little twitterbot framework, so that the next time I get a weird urge to do something like this and have a few hours with nothing better to do, I can make another bot quickly.
 
UPDATE: I’ve added support for the Twitter Streaming API. Read about it here: Bot on the Stream
