2009-02-21


I've recently started playing with lpeg, a parsing library for Lua. It is based on "Parsing Expression Grammars", which were recently popularized by the prolific Bryan Ford. PEGs have some nice properties: they're suitable for unified parsers that handle both the low-level lexical syntax and the higher-level hierarchical syntax; they have much simpler operational semantics than traditional extended regexes or context-free grammars; and as well as the familiar regex-like and CFG-like operators they have nice features for controlling lookahead and backtracking. PEGs were originally developed alongside a cute algorithm for linear-time parsing, which unfortunately also requires space linear in the input size, with a fairly large multiplier. Lpeg instead uses a simple parsing machine, implemented somewhat like a bytecode interpreter. Its performance is quite competitive: the long paper says it is comparable to pcre and glibc's POSIX regex implementation, and the short paper says it is comparable to lex and yacc.

Lpeg actually consists of two modules. The core lpeg module is written in C and allows you to compose parser objects using operator overloading, building them up from primitives returned by tersely named constructor functions. The resulting syntax is rather eccentric. On top of that is the re module, which provides a more conventional PEG syntax for parsers; despite the module's name, PEGs are rather different from regular expressions. This module is written in Lua, using an lpeg parser to parse PEGs and construct lpeg parsers from them. The PEG syntax is extended so that you can define "captures". Captures are the really nice thing about lpeg. You can use them like captures in Perl regexes, just to extract substrings of the subject, but you can often do better. Lpeg captures are more like the semantic actions that you can attach to rules in parser generators like yacc. So where in Perl you would do the match and then fiddle around with $1, $2, etc., with lpeg the match itself can incorporate the fiddling in a nice tidy way. (In fact, probably the closest comparison is with Perl 6 rules, but they're not yet practically usable.)
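To give a flavour of the two styles, here is a minimal example that matches a run of digits or a run of letters and captures whichever matched, first with the core module's overloaded operators and then with the re module's grammar syntax:

    local lpeg = require "lpeg"
    local R, C = lpeg.R, lpeg.C

    -- core module: + is ordered choice, ^1 means "one or more",
    -- and C() captures the substring matched by its pattern
    local digits  = R"09"^1
    local letters = R("az", "AZ")^1
    local token   = C(digits + letters)
    print(token:match("hello"))    --> hello

    -- the same pattern in the re module's PEG syntax:
    -- {...} is a capture, / is ordered choice
    local re = require "re"
    local token2 = re.compile("{ [0-9]+ / [a-zA-Z]+ }")
    print(token2:match("42"))      --> 42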

The program I was writing with lpeg was to process some logs. I needed to convert the timestamps from ISO 8601 format into POSIX time_t, which implied converting 8 fields from strings to numbers. Rather than having to convert each capture individually, or loop over the captures, I could write a single grammar rule to match a pair of digits and convert it to a number, then refer to that rule elsewhere in the grammar. (In fact Lua will coerce strings to numbers implicitly in most - but not all - circumstances. I was tripped up by comparing a number with a string, which doesn't coerce.) In the end it's nicest to let the parser drive all the program's activity through its semantic actions.
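The idea looks something like the following simplified sketch (not the real program: it ignores the timezone fields and all error handling):

    local lpeg = require "lpeg"
    local R, P, C = lpeg.R, lpeg.P, lpeg.C

    -- one reusable rule: match a pair of digits, convert it to a number
    local num2 = C(R"09" * R"09") / tonumber
    local num4 = C(R"09" * R"09" * R"09" * R"09") / tonumber

    -- e.g. "2009-02-21T18:30:00"
    local timestamp = num4 * P"-" * num2 * P"-" * num2 *
                      P"T" * num2 * P":" * num2 * P":" * num2

    -- a function capture receives all six numbers as arguments;
    -- os.time interprets the table as local time
    local time_t = timestamp / function (y, mo, d, h, mi, s)
        return os.time{ year = y, month = mo, day = d,
                        hour = h, min = mi, sec = s }
    end

    print(time_t:match("2009-02-21T18:30:00"))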


Since April 2002 I have been keeping a link log: a web page where I record links to other pages that are notable in some way, with a brief comment about each one. It's gatewayed to Delicious, LiveJournal, and Twitter. The Twitter feed is new.

I've been on Twitter since March 2007, when it made a big splash at the SXSW festival. I joined to see what the fuss was about but soon stopped using it. None of my social circle were using it much, and there wasn't a sensible way of finding people in the more distant reaches of my social network. For 18 months my last tweet said "Grumbling about the difficulty of finding out people's twitter usernames".

Recently Twitter has taken off in the UK, since a fairly large contingent of journalists and celebrities have started using it and talking about it. (Reminds me in a small way of the rise of the web in 1993/4.) Coincidentally at about the same time I decided to get back into using Facebook and Twitter a bit more, to keep in better touch with my family's activities. (They're on Facebook but I linked my Twitter and Facebook statuses together for convenience.)

The celebrity thing helped me to get more out of Twitter in a significant way. I had thought of it as more like Facebook or LiveJournal: a way of keeping up with the activities of people I already know. But in fact it's more like the wider blogging culture, where it's normal to follow someone's writing just because they are interesting. (Yes, LJ blurs into this kind of blogging too.) In fact even the Twitter guys didn't realise this at first: until June 2007 Twitter sent me email saying "You are foo's newest friend!" but the following month it started sending me email saying "foo is now following you on Twitter!". I now feel free to follow well-known people just because they are interesting, and I've found more interesting people by looking through the lists of people they follow.

Part of the Twitter culture is heavy use of short URL services when posting interesting links. I decided that my link log ought to be gatewayed to Twitter to join in the fun, so I embarked on a bit of hacking. Previously it had been just a static HTML page, with a command-line script that I used to insert a new link. There was another script to turn the HTML into an RSS feed for LJ, and some code for uploading the results to chiark. I had a rudimentary click tracker so that I could get some idea of what links other people followed, but it did not shorten URLs. (That meant it was an open redirector, until I found out what evil that laid it open to: an open redirector lets spammers and phishers hide their URLs behind your trusted domain.)

I replaced all the old static HTML and RSS gubbins with a CGI script which can format my link log in HTML or Atom, as well as providing short URLs for all the links. I decided to host my own short URLs partly to continue tracking what people find interesting, and partly because I have a vanity domain that's just right for this job :-) The command line script for adding new links remained basically the same, apart from new code to create short tags and to post the link to Twitter.
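A short tag doesn't need to be anything clever; as a toy illustration (not the actual scheme), a link's sequence number encoded in base 36 gives tags that stay short for a long time:

    -- illustrative tag generator: encode a sequence number in base 36
    local alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    local function shorttag(n)
        local tag = ""
        repeat
            local d = n % 36
            tag = alphabet:sub(d + 1, d + 1) .. tag
            n = math.floor(n / 36)
        until n == 0
        return tag
    end
    print(shorttag(46655))    --> zzz

The redirector then only has to look the tag up in its table of known links, log the hit, and redirect; because it refuses tags it doesn't know, it isn't an open redirector.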

After a couple of weeks of this new regime I decided I needed to be able to save links found when reading Twitter or LiveJournal on my iPod Touch. Its UI makes it easy to email links but not much else, so I set up a gateway from email to my link log. I send links to a specially tagged email address which is filtered to a special mailbox. A script polls this mailbox over IMAP every few minutes, looking for messages that match a number of paranoid checks. From these it extracts a URL and a comment, which it passes on to the old command-line script. (I still use the latter when I'm on a computer with a keyboard.)
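The extraction step amounts to something like this sketch (the address tag and the checks are simplified stand-ins for the real filtering):

    -- given the raw text of one message, apply some paranoid checks and
    -- pull out a URL plus a comment; return nil if anything looks off
    local function extract_link(msg)
        -- hypothetical check: only accept mail sent to the tagged address
        if not msg:match("\nTo:[^\n]*%+linklog") then return nil end
        -- take the first http(s) URL in the body
        local url = msg:match("(https?://%S+)")
        if not url then return nil end
        -- use the Subject header as the comment
        local comment = msg:match("\nSubject: ([^\n]*)") or ""
        return url, comment
    end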

The end result is pretty neat: I can easily save interesting links that I see in places where that was not previously practical. It's probably a good thing I don't have an iPhone because then I'd be able to spod everywhere.

PS. a brief hate: web sites that redirect deep links to the mobile version of their front page. Utter cWAP. They'll get no Google Juice from me. (In fact the iPhone browser is good enough that mobile sites are usually NOT an improvement.)
