fanf: (Default)
[personal profile] fanf
PEGs are an interesting new formalism for grammars. Unlike most older formalisms, which are based on Chomsky's generative grammars, their starting point is recognizing strings that are in a language, not generating them. As such they are a closer match to what we usually want a grammar for. The practical effect of this is that they naturally avoid common ambiguities without external rules, such as C's if/else ambiguity or the various rules about greediness imposed on regexes (e.g. perl's matching rules versus POSIX's longest-leftmost rule, discussed in Friedl's book). Even though PEGs can recognize some non-context-free languages (e.g. anbncn) they can be matched in linear time using a packrat parser (which can be implemented very beautifully in Haskell).

Bryan Ford's 2004 POPL paper establishes the formal foundations of PEGs and defines a concrete syntax for them, fairly similar to ABNF. The key differences are: the choice operator is ordered (prefers to match its left-hand sub-expression); repetition operators are maximally greedy and don't backtrack (so the second a in a*a can never match); and it includes positive and negative lookahead assertions of the form &a and !a (like (?=a) and (?!a) in perl).

It occurs to me that there is a useful analogy hidden in here, that would be made more obvious with a little change in syntax. Instead of a / b write a || b, and instead of &a b write a && b. With || and && I am referring to C's short-cutting "orelse" and "andalso" boolean operators - or rather the more liberal versions that can return more than just a boolean, since a PEG returns the amount of input matched when it succeeds. This immediately suggests some new identities on grammars based on De Morgan's laws, e.g. !a || !b === !(a && b). Note that !!a =/= a because the former never consumes any input, so not all of De Morgan's laws work with PEGs. This also suggests how to choose the operators to overload when writing a PEG parser combinator library for C++ (which has a much wider range of possibilities than Lua).

Date: 2007-06-11 10:02 (UTC)
From: [identity profile] hoiho.livejournal.com
Very nfty.
I suspect the rest of my morning has now been lost to this.

Date: 2007-06-11 10:32 (UTC)
vatine: Generated with some CL code and a hand-designed blackletter font (Default)
From: [personal profile] vatine
Eeeeep! I just spotted that CL-peg uses my genhash library. I Am Officially Surprised.

Date: 2007-06-11 13:54 (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Hmm. It's certainly an interesting thought, and I like the way it permits unification of the lexical and syntactic layers.

It does seem to me, though, that this heads directly away from what I actually in practice use formal grammars for. My usual process of language design involves writing down a formal grammar for the language, running it through yacc to check that it doesn't contain any non-obvious ambiguities, and then writing a recursive descent parser by hand (because I've never found yacc to give me decent parse-error reporting). Whereas PEGs' prioritised choice operator means that ambiguities are expected, which is very useful if you know there's an ambiguity and want to specify how it should be resolved, but rather less helpful if you wanted to be told about any ambiguities you'd missed.

The author has noticed this, of course, and section 5 mentions the possibility of introducing an unprioritised choice operator as well. I suspect I'd end up using that almost all the time, resorting to prioritised choice only to resolve an ambiguity I'd convinced myself was necessary.

Now to go and read up on the packrat parsing algorithm and see whether it looks as if it can be persuaded to do decent error reporting...

Date: 2007-06-11 14:00 (UTC)
From: [identity profile] gareth-rees.livejournal.com
It's a long time since I've used it, but what I recall is that if you want decent error reporting from yacc you have to write your own error productions. Tedious, but not more so than writing your own parser.

Date: 2007-06-11 20:51 (UTC)
pm215: (Default)
From: [personal profile] pm215
Is there any discussion of memory requirements for PEG parsers anywhere? (both in terms of code/data table size and any runtime memory allocation). The remarks in the wikipedia article about memoization suggest that it's likely to be quite space-hungry.

(The last parser I dealt with was for an application where memory usage was a big deal; and there's been a lot of work put in to the problem of minimising the data table size for LALR parsers...)

reference for a statement

Date: 2008-09-16 00:40 (UTC)
From: (Anonymous)
Hi there,

do you have a reference for your statement "Even though PEGs can recognize some non-context-free languages (e.g. a^nb^nc^n)"?

Thanks
Stefan

Re: reference for a statement

Date: 2008-09-16 14:48 (UTC)
From: (Anonymous)
Thanks very much for your quick response :o)
Have a great day.
Stefan

December 2025

S M T W T F S
 123456
78910111213
14151617181920
21222324 252627
28293031   

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated 2026-01-11 06:25
Powered by Dreamwidth Studios