Bot on the Stream

You may remember my post from a while back about my experiences writing a Twitter bot. On my desktop, I keep an instance of TweetDeck running throughout the day, and one of its columns is set to view the notifications for @tmbotg. One piece of the bot’s code ensures that any time another Twitter user @-mentions the bot (or does an old-style "RT" retweet), the bot favorites that tweet. Recently I’ve noticed that retweets have been showing up in that column, but not getting faved. What’s up with that?

Earlier this year, Twitter added a newer type of retweet that quotes the entire original tweet and gives you a full 140 characters to comment on that tweet. Very cool and useful except for one thing: the REST API that I’ve been using to talk to Twitter doesn’t let you know that this has happened.

When I noticed this, I dug into it a little, both in the official Twitter developer docs and in the docs for the Twython library that my bot uses to talk to Twitter. At the time, it seemed like just enough work that I’d need to shelve the idea until I had a reasonably sized chunk of time to sit down and understand how all the pieces fit together.

Now that I’ve got something working, I’m happy that it ended up being easier than I imagined it was going to be, once I wrapped my head around things.

The Streaming API

The streaming API works the opposite way from the REST API: instead of writing code that occasionally makes requests to the API endpoints and handles the resulting responses, your code keeps a persistent HTTP connection open to the Twitter server and blocks until Twitter has an event to tell you about. The streaming API also only pushes events out to your client; it can’t update statuses, fave tweets, or do anything like that.
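To make the push model concrete, here is a toy sketch of the "block until there is an event" loop. It does not talk to Twitter at all: an in-process queue stands in for the persistent connection, and every name in it (fake_server, consume_stream, run_demo) is made up for illustration. It is written for Python 3, where the module is called queue.

```python
import queue
import threading


def fake_server(events, out):
    """Hypothetical stand-in for Twitter: pushes events, then a sentinel."""
    for event in events:
        out.put(event)
    out.put(None)  # sentinel meaning "the connection was closed"


def consume_stream(incoming, handler):
    """Block until the next event arrives, handle it, then wait again."""
    handled = 0
    while True:
        event = incoming.get()  # blocks here; there is no polling interval
        if event is None:
            break
        handler(event)
        handled += 1
    return handled


def run_demo():
    """Push two made-up events through the loop and collect them."""
    incoming = queue.Queue()
    seen = []
    events = [{"event": "favorite"}, {"event": "quoted_tweet"}]
    server = threading.Thread(target=fake_server, args=(events, incoming))
    server.start()
    count = consume_stream(incoming, seen.append)
    server.join()
    return count, seen
```

The point of the sketch is only the shape of consume_stream: the consumer never asks "anything new?", it just sits in a blocking call until the other side has something to say.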

Working With Twython’s Streaming API Support

Twython provides a convenient TwythonStreamer class that abstracts away almost all of the streaming API: pass in the same OAuth parameters that are used to initialize the Twython class that works with the REST API, and you’re almost all of the way there. In practice, you’ll need to derive a new class from TwythonStreamer that overrides two callback methods, which are called whenever the streaming API returns a new message or an error.

In schematic form, this is as easy as:

from twython import TwythonStreamer

class BotStreamer(TwythonStreamer):
   def on_success(self, data):
      # data: a block of JSON data with details on this event.
      pass
   def on_error(self, status_code, data):
      # status_code: the non-200 error code identifying the error type
      # data: a block of JSON data with more details about the error
      pass

To process stream data, just create an instance of the streamer class, and then have it start receiving data from one of the three streaming endpoints:

Public Stream provides your app with either all tweets coming through Twitter (‘the firehose’, which requires special permission to access), or a restricted sampling of tweets.
Site Stream provides your app with tweet data for some larger number of users; this feature is currently in a closed beta, and they’re not accepting any more applications.
User Stream provides your app with tweet data about your currently logged in user. This is the one we’re interested in here — Twitter will send us realtime notifications of things like mentions, quoted tweets, and the like.

So, in our case it’s as simple as something like:

# initialize and log into twitter with our streamer class
self.twitter = BotStreamer(s.appKey, s.appSecret, s.accessToken, s.accessTokenSecret)
# ...and then start listening for events:
self.twitter.user()

Back in the BotStreamer class, our on_success handler looks something like this:

class BotStreamer(TwythonStreamer):
   def on_success(self, data):
      # for now, all we are interested in handling are quoted tweets.
      if "event" in data:
         if data["event"] == "quoted_tweet":
            # get the id of the tweet that quotes us:
            tweetId = data["target_object"]["id_str"]

(Consult Twitter’s official developer documentation for the definition of the JSON structures that make up a tweet.)
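The check in the handler above can also be pulled out into a small function that is easy to exercise without a live connection. This is only a sketch: quoting_tweet_id is a hypothetical helper name, and the payload shape it expects ("event", "target_object", "id_str") is simply the one the handler above relies on.

```python
def quoting_tweet_id(data):
    """Return the id string of the tweet that quotes us, or None.

    `data` is the decoded event dictionary handed to on_success.
    Hypothetical helper; the keys mirror the handler shown above.
    """
    if data.get("event") != "quoted_tweet":
        # Ordinary tweets and other event types are ignored for now.
        return None
    target = data.get("target_object") or {}
    return target.get("id_str")
```

With something like this in place, on_success only has to call the helper and act when the result is not None, and the parsing can be checked against hand-built dictionaries.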

Architecture

The new issue to consider is that listening to the stream effectively blocks your process indefinitely: it’s either waiting for an event to arrive or quickly processing one so it can return to its waiting state. We need to work out how to make this fit with our existing bot code, which is woken up periodically by cron, does some stuff, and then exits.

The decision that I made was to add support for streaming into the existing TmBot class. When the class is instantiated in streaming mode (which we expose to the outside world with a new --stream command line argument), the bot’s __init__ code creates a BotStreamer object instead of a Twython object:

class TmBot(object):
   def __init__(self, argDict=None):
      # code omitted for clarity....
      if self.stream:
         self.twitter = BotStreamer(s.appKey, s.appSecret, s.accessToken, s.accessTokenSecret)
      else:
         self.twitter = Twython(s.appKey, s.appSecret, s.accessToken, s.accessTokenSecret)
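The post doesn't show how the --stream switch gets parsed, so the following is only one plausible way to do it with the standard library's argparse. The parse_args name, the help strings, and the --debug flag are assumptions for illustration; the function returns a plain dictionary so it could be handed to a constructor that takes an argDict like the one above.

```python
import argparse


def parse_args(argv=None):
    """Parse the bot's command line into a plain dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Run the bot once, or listen to the user stream.")
    # --stream is the switch described in the post; the rest is assumed.
    parser.add_argument("--stream", action="store_true",
                        help="listen to the streaming API instead of tweeting")
    parser.add_argument("--debug", action="store_true",
                        help="print what would happen without calling Twitter")
    return vars(parser.parse_args(argv))
```

Called with ["--stream"], this yields a dictionary whose "stream" entry is True; with no arguments both entries are False, which keeps the cron-driven invocation unchanged.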

Later, in our Run() method, we either process @-mentions and generate a new tweet as we’ve always done, or we process the stream instead:

   def Run(self):
      if self.stream:
         if self.debug:
            print("About to stream from user account.")
         try:
            # The call to user() will sit forever waiting for events on
            # our user account to stream down. Those events will be handled
            # for us by the BotStreamer object that we created above
            self.twitter.user()
         except KeyboardInterrupt:
            # disconnect cleanly from the server.
            self.twitter.disconnect()
      else:
         self.CreateUpdate()
         # etc, as before...

… which is all fine and well, except we need a way for the stream-handling process to let the regular periodic bot process know that there’s been a new quoted tweet that needs to be favorited. In the name of simplicity, any time the stream-handling process detects a new quoted tweet, it creates a text file named "{tweetId}.fav" in the same directory as the bot’s code, containing a single line with the id of the tweet to favorite.

The next time the periodic instance of the bot wakes up, it looks for any *.fav files, and if it finds any, it loops through them and creates a favorite for each of the tweets:

   def HandleQuotes(self):
      ''' The streaming version of the bot may have detected some quoted tweets
         that we want to respond to. Look for files with the .fav extension, and
         if we find any, handle them.
      '''
      faves = glob("*.fav")
      for fileName in faves:
         with open(fileName, "rt") as f:
            tweetId = f.readline().strip()
            if self.debug:
               print("Faving quoted tweet {0}".format(tweetId))
            else:
               try:
                  self.twitter.create_favorite(id=tweetId)
               except TwythonError as e:
                  self.Log("EXCEPTION", str(e))
         os.remove(fileName)
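The writing side of this handoff isn't shown in the post, so here is a minimal sketch of what the stream-handling process might do when it sees a quoted tweet. The write_fav_marker name and the directory parameter are hypothetical. Writing to a temporary name first and then renaming is my own addition, not something the post describes; it keeps the periodic process from picking up a half-written file, since "*.fav" doesn't match a name ending in ".tmp". The sketch assumes Python 3.3 or later for os.replace.

```python
import os


def write_fav_marker(tweet_id, directory="."):
    """Create '<tweet_id>.fav' holding the id on a single line.

    Hypothetical helper: returns the path of the marker file it created.
    """
    path = os.path.join(directory, "{0}.fav".format(tweet_id))
    tmp_path = path + ".tmp"
    with open(tmp_path, "wt") as f:
        f.write("{0}\n".format(tweet_id))
    # Rename only once the contents are complete, so a reader globbing
    # for *.fav never sees a partially written file.
    os.replace(tmp_path, path)
    return path
```

The id is written as a string because the handler reads it from "id_str", and the reading side shown above strips the trailing newline before using it.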

After we fave the tweet, we delete the file so we don’t process it again, and we’re all set. At some point it may make sense to move the detection of @-mentions and other things like that over into the stream-handling process, and in that event we’ll probably move to doing something like writing more context formatted as JSON so the periodic process knows how to respond.

The only thing remaining is to make sure that we launch a copy of the bot in stream mode when my server restarts, and that it gets re-launched if anything crashes, etc. At some point I’ll get this all configured and running under Supervisor.
