Books, Cats, Tech

"Everything that moves serves to interest and amuse a cat." — F.A. Paradis Moncrif

Vicki Brown

My home on the WWW
Est: 1994

Email: vlb@cfcl.com

  • Background
  • How I Built My Twitter Reader
  • Install and Configure pyTwerp
  • Get the Tweets
  • TWikification of Output
  • Wrap It Up (Packaging)
  • The Code

Building a Twitter Reader

Using pyTwerp and TWiki

DISCLAIMER - Before you read this...

If you want to implement what I have here, you will need a computer capable of running Python, Perl, and the Unix cron command.

Capable systems include Linux, BSD, Mac OS X, and Windows (if properly configured, probably with cygwin installed). For any other system, your mileage will vary widely. I have no idea if Windows has something approximating cron.

I use Mac OS X, which is based on BSD Unix. I cannot help you with any other system!

Background

One of the features of Twitter is that it runs 24/7/365. (Un)fortunately, I don't. So, I miss things. I didn't want to miss things, so I looked for a solution.

Being a programmer myself, I wanted a solution I could control and tweak if necessary. However, I didn't want to write something from scratch if I didn't have to!

Twitter has a popular, published API, so I figured someone would have written what I wanted. Someone did. I found pyTwerp (written in Python). *

From the pyTwerp documentation:

jdhore on the #twitter channel (irc.wyldryde.org) was talking about the lack of a simple linux command line utility to post Twitter updates so I asked him what features he wanted and created pyTwerp.
...
The whole concept of Twerp is to allow your Twitter data stream to be pulled from Twitter, formatted via a template you can control and then output to the console.

It also allows you to post a status message or send a direct message.

That's it in a nutshell.

You can access the following data streams on Twitter:

  • Friends Timeline
  • Your Timeline
  • Your Replies
  • Direct Messages sent to you

I'm not a Python programmer, but pyTwerp looked like it did pretty much everything I needed, out of the box. That turned out to be exactly the case. In fact, pyTwerp does some things I didn't even know I wanted until after I had it!

The Front End

I could easily read the pyTwerp output files with any text editor, or as plain .txt files in my web browser. But I wanted something that looked a little nicer. Enter TWiki.

TWiki is a structured enterprise wiki with a lot of programmable features. By using TWiki, I don't need to convert the output to HTML in order to read it on the web, and I get the following useful features:

  • Blank lines are retained as "paragraph" breaks.
  • Anything that begins with http:// is rendered as a clickable link.
  • *word* is converted to bold; _word_ is converted to italic.
  • I can "script" the results, using TWiki to hide everything I'm not currently reading, or only show me this month's output.
  • I can easily adjust the look and feel with CSS.

How I Built My Twitter Reader

Steps

  • Install and Configure pyTwerp
  • Get the Tweets
  • TWikification of Output
  • Wrap It Up (Packaging)

Install and Configure pyTwerp

Start by downloading pyTwerp from code.google.com/p/pytwerp/.

You will also need two libraries:

  • setuptools http://cheeseshop.python.org/pypi/setuptools
  • simplejson http://cheeseshop.python.org/pypi/simplejson

Installation

  1. setuptools is easy to install. Follow the directions on the download page.
  2. simplejson doesn't come with instructions. Run
       easy_install simplejson
  3. Install pyTwerp by running
       python setup.py install
    This will install the pyTwerp library into your Python's site-packages location. A utility script named twerp will be installed in /usr/local/bin (by default); this lets you invoke pyTwerp using twerp <options>.

Configuration

The default configuration file ~/.twerp.cfg is created and populated the first time you run twerp. Configure twerp for your twitter account by running:
   twerp -U twitter-username -P twitter-password
You'll only need to do this once.

Get the Tweets

First, I wrote a small shell script I named gotwerp. This does some housekeeping tasks and runs twerp. (Full gotwerp code can be seen at the end of this article.)

   /usr/local/bin/twerp -f ...
      ... >> TwitterLog${DATE}.txt

gotwerp creates files with date-stamped names, for example: TwitterLog2008-Jun-26.txt. The first run after midnight creates a new file for the new day.

I configured cron to run gotwerp every 10 minutes, appending to my Log file every time it runs.

    0,10,20,30,40,50 * * * *  $HOME/bin/gotwerp

I also configured gotwerp to time-stamp the output every 30 minutes.
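
The half-hour rule is easy to sketch in Python. This is only an illustration, not part of gotwerp: the names should_stamp and stamp are made up here, and the sketch assumes the rule used in the gotwerp listing at the end of this article, where a run is stamped if its minute falls in 00-04 or 30-34.

```python
from datetime import datetime


def should_stamp(now: datetime) -> bool:
    """Return True when the run falls in the first five minutes of a half hour."""
    # Minutes 0-4 and 30-34 qualify, mirroring the listing's \d\d[03][0-4] test.
    return now.minute % 30 < 5


def stamp(now: datetime) -> str:
    """Format the time stamp as HHMM, like `date "+%H%M"`."""
    return now.strftime("%H%M")


if __name__ == "__main__":
    # Simulate one hour of cron runs, one every 10 minutes.
    for minute in (0, 10, 20, 30, 40, 50):
        run = datetime(2008, 6, 26, 0, minute)
        if should_stamp(run):
            print(stamp(run))
```

With cron firing every 10 minutes, only the :00 and :30 runs are stamped; the five-minute window leaves some slack if a run starts a little late.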

Output file format

Using the pyTwerp defaults, the output (TwitterLog) files are formatted like this:

0000
mdy: Devoting a hot and humid afternoon to home and electrical repairs.
dlpasco: George Carlin - Jammin' in New York  Still totally brilliant. ...
megfowler: just scratched my own face with a piece of fruit. i am epic.
vdichev: I'm amused... NOT. I'm annoyed that I cannot login to ...
MaryHodder: GirlGeekRevolution tomorrow night at Sugar Cafe/SF 6-9pm...

0030
Suw: Oh! Email says I've won 500k from Google UK to ...
al3x: Missed textures.

Reversing The Order

The Twitter API returns entries newest first (reverse chronological order). I prefer chronological order; otherwise, conversations get muddled. To handle this, I included the following filter in my gotwerp script:
        perl -e 'print reverse <>'
Now each chunk of output is internally ordered by the time of posting. (Actually, the entries are ordered by the time each posting reached Twitter, but that's close enough.)
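
For comparison, here is a rough Python equivalent of that Perl one-liner. It is a sketch only: the function name oldest_first is hypothetical, and it assumes one tweet per line, as in the output shown earlier.

```python
def oldest_first(chunk: str) -> str:
    """Reverse the order of the lines in one chunk of output."""
    # keepends=True keeps each line's newline attached, so the
    # reversed lines can simply be joined back together.
    lines = chunk.splitlines(keepends=True)
    return "".join(reversed(lines))


if __name__ == "__main__":
    newest_first = "al3x: Missed textures.\nSuw: Oh! Email says ...\n"
    print(oldest_first(newest_first), end="")
```

Like the Perl version, this reads the whole chunk into memory before writing anything, which is fine for ten minutes' worth of tweets.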

As of pyTwerp 0.4 this is no longer necessary.

TWikification of Output

Knowing I would be using TWiki, I made a few tweaks to add a little bit of TWiki markup code.

pyTwerp has a -T template option that provides more control of the output format. So, I changed the template like this

(Caution: the template line is broken to fit the screen. Do not break the template format across lines in a real script.)
  twerp -f \
   -T '__%(user_screen_name)s__: %(text)s
   [[Twitter:%(user_screen_name)s/statuses/%(id)s][view]]'

TWiki uses __ to signify bold italic type. I have defined Twitter: as shorthand using the TWiki Interwiki Plugin. This will cause [[Twitter:al3x]] to expand to http://twitter.com/al3x when viewed in TWiki.

The result (in the TwitterLog file):

__al3x__: Missed textures. [[Twitter:al3x/statuses/843948918][view]]

Viewed in TWiki, I'll see something like this:

al3x: Missed textures. view
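
The %(name)s placeholders look like Python's standard string formatting with a mapping. Assuming pyTwerp fills the template that way (an assumption; I haven't checked its source), the expansion can be sketched as follows. The tweet dictionary is a hypothetical stand-in for the data pyTwerp would supply, and the template string is written to reproduce the log line shown above.

```python
# Template with TWiki markup: __name__ for bold italic, plus an
# Interwiki-style [[Twitter:...][view]] link to the individual tweet.
TEMPLATE = (
    "__%(user_screen_name)s__: %(text)s "
    "[[Twitter:%(user_screen_name)s/statuses/%(id)s][view]]"
)

# Hypothetical tweet data; the keys match the placeholders in the template.
tweet = {
    "user_screen_name": "al3x",
    "text": "Missed textures.",
    "id": 843948918,
}

# The % operator looks up each %(key)s in the dictionary.
line = TEMPLATE % tweet

if __name__ == "__main__":
    print(line)
```

A key may appear more than once in the template, which is why the screen name can be used both for the label and inside the link.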

(Note: I'm a bit embarrassed to admit that I had totally missed the Template feature of pyTwerp until I had been running my Twitter reader for a few weeks! My earlier output simply set the "view" link to the person's Twitter page, not directly to the tweet in question. Duh.)

I can also put the template string into my .twerprc file (and actually have). However, for purposes of this article, we'll pretend it's still in gotwerp.

Prettification With CSS

Now I wanted to be even trickier. Under normal circumstances, TWiki would merge and wrap lines that aren't otherwise separated by a blank line or an explicit HTML <br> tag. I'd get

0030 Suw: Oh! Email says I've won ... view al3x: Missed textures. view

unless I use <pre> to preformat the text. And if I used <pre>, I'd run into other constraints: fixed-width fonts and lines approximately 140 characters long (no wrapping).

I did a little investigating and found the CSS declaration white-space: pre-wrap;. This is a relatively new value for the white-space property. It's not yet supported by all browsers (but there are workarounds for those).

pre-wrap is supported in Firefox 3 (and a variant is available for pre-Firefox-3 Mozilla browsers). Personally, that's all I care about. If you use a different browser, check this workaround or do a web search for "white-space pre-wrap".

As long as I was including CSS, I made a few tweaks to the look of the output, increasing the font size slightly, highlighting italic (<em>) in green...

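<style>
.preElement {
  white-space: pre;            /* fallback: keep line breaks, no wrapping */
  white-space: -moz-pre-wrap;  /* Mozilla browsers before Firefox 3 */
  white-space: pre-wrap;       /* Firefox 3 */
  width: 650px;
  padding: 0px;
  font-size: larger;
  line-height: 140%;
}
.preElement em {
  color: #363;                 /* green for italic (<em>) text */
}
</style>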

Here's what a Twitter log file looks like when viewed in TWiki:

[screenshot]

Additional Tweaks...

Just when you think everything is working, someone drops a monkeywrench into the soup...

In the first case, one of the people I follow has been reworking his webpage with CSS. He's twittering about it:

... Experimenting by deleting and moving and renaming random <div> tags until something happens...

Oops. TWiki processed that <div>. That is, it tried to. (Without a matching </div>, the results were "unexpected".)

So gotwerp now includes one more filter:

   s/</&lt;/g

Then I discovered that some feeders can send newlines to Twitter! That is, my expectation that all tweets were on one line was not 100% correct. Heresy! I added an end-of-tweet marker to my template and added a call to paste to gotwerp.
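
The two fixes, escaping < and rejoining tweets that contain newlines, can be sketched together in Python. This is an illustration under assumptions, not the code gotwerp runs: the function name clean_tweets is hypothetical, the marker text end-of-tweet is taken from the gotwerp listing at the end of this article, and unlike the Perl version this sketch also trims the spaces around each tweet.

```python
from typing import List

MARKER = "end-of-tweet"  # marker text appended to each tweet by the template


def clean_tweets(raw: str) -> List[str]:
    """Join multi-line tweets, split on the marker, and escape '<'."""
    # Join every line with a space, much as `paste -s -d' '` does.
    joined = " ".join(raw.splitlines())
    tweets = []
    for record in joined.split(MARKER):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the last marker
        # Escape '<' so TWiki shows tags as text instead of parsing them.
        tweets.append(record.replace("<", "&lt;"))
    return tweets


if __name__ == "__main__":
    sample = "a: one\ntwo end-of-tweet\nb: renaming <div> tags end-of-tweet\n"
    for tweet in clean_tweets(sample):
        print(tweet)
```

Splitting on an explicit marker rather than on newlines is what makes the output one tweet per line again, no matter how many newlines a tweet contained.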

Wrap It Up (Packaging)

Finally, I wanted to make my Twitter Reader into a handy application. I wanted:
  • An easy way to choose which Twitter logs to read
  • A reasonable default choice of log (today)
  • Hiding of any logs I'm not currently reading
  • A table of contents with quick links into the list of choices.
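
The first two wishes boil down to a small selection rule, sketched here in Python. This is an illustration, not how TWiki implements SEARCH: the function name select_logs is hypothetical, and it assumes log names that start with TwitterLog followed by the date, with today's date used when no date is chosen (as in the Twitter Reader listing at the end of this article).

```python
import re
from typing import List, Optional


def select_logs(topics: List[str], when: Optional[str], today: str) -> List[str]:
    """Return the log names for the chosen day, defaulting to today."""
    day = when or today
    # re.escape keeps characters such as '-' or '.' in the date literal.
    pattern = re.compile("TwitterLog" + re.escape(day))
    # match() anchors at the start of the name, so other topics are ignored.
    return [name for name in topics if pattern.match(name)]


if __name__ == "__main__":
    names = ["TwitterLog2008-Jun-25", "TwitterLog2008-Jun-26", "WebHome"]
    print(select_logs(names, None, today="2008-Jun-26"))
```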

You can view my Twitter Reader in action in my TWiki.

[Screenshot: reader_snap.jpg]

The Code

Code for gotwerp
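    #!/bin/sh
    # gotwerp: fetch new tweets with twerp and append them to the day's log.

    dir=$HOME/web/Twitter

    # Date stamp for the log name, e.g. 2008-Jun-26
    DATE=`date "+%Y-%b-%d"`

    twitterlog=TwitterLog${DATE}.txt

    cd "$dir" || exit 1

    # Create the log, group-writable, on the first run of the day
    if [ ! -f "$twitterlog" ]; then
        touch "$twitterlog"
        chmod g+w "$twitterlog"
    fi

    # paste joins all output into one line; perl then splits it back into
    # one tweet per line at each end-of-tweet marker (set in the template).
    twerp -c ~/.twerprc -f |
       paste -s -d' ' - |
       perl -e '
         local $/ = "end-of-tweet"; # set record separator
         $time = `date "+%H%M"`;
         while ($line = <>) {
             chomp $line;                 # strip the end-of-tweet marker
             next if ($line =~ /^\s*$/);  # skip empty records
             $line =~ s/</\&lt;/g;        # keep TWiki from parsing HTML tags
             print $line, "\n";
         }
         # time-stamp the log on runs at minutes 00-04 and 30-34
         if ($time =~ /\d\d[03][0-4]/) {
             print "\n$time-----\n";
         }
       ' >> "$twitterlog"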

Code for Twitter Reader

<noautolink>
---+!!  TWitter Reader

<form action="%SCRIPTURL{"view"}%/Vicki/Twitter/TwitterReader" >
  <input type="text" name="when"  size="20" value="%TODAY%" id="datecell" \
  class="twikiEditFormTextField" /><input type="image" name="calendar" \
  src="%PUBURL%/TWiki/JSCalendarContrib/img.gif" align=MIDDLE alt="Calendar" \
  onclick="return showCalendar('datecell', '%Y-%b-%d')" /> 
<br />
  <input type="submit">
</form>

Twitter Logs for %WHEN%

%TOC%

%~~ SEARCH{"TwitterLog%WHEN%.*" 
~~~   scope="topic" type="regex" nosearch="on" noheader="on"
~~~   format="---++ $percntSPACEOUT{ \"$topic\" }$percnt $n 
~~~   $percntTWISTY{remember="on"}$percnt $n
~~~   <div class=\"preElement\"> $n
~~~   $percntINCLUDE{\"$topic\"}$percnt $n
~~~   </div> $n
~~~   $percntENDTWISTY$percnt $n $n
~~~   "
~~~ }%

</noautolink>
<!---
   * Set TODAY = %SERVERTIME{"$year-$month-$day"}%
   * Set WHEN = %URLPARAM{"when" default="%TODAY%"}%
-->
<!-- JSCALENDAR -->
%INCLUDE{"%TWIKIWEB%/JSCalendarContribInline"}%

<link rel="stylesheet" href="%PUBURL%/%WEB%/WebHome/twitterstyles.css" type="text/css" media="all" />

Reference

pyTwerp man page


* Thank you to bear and decklin for pyTwerp; special thanks to decklin for answering bozo novice questions and pointing me toward seeing the template feature which I somehow had missed the first time out!