fanf: (Default)
[personal profile] fanf
PEGs are an interesting new formalism for grammars. Unlike most older formalisms, which are based on Chomsky's generative grammars, their starting point is recognizing strings that are in a language, not generating them. As such they are a closer match to what we usually want a grammar for. The practical effect of this is that they naturally avoid common ambiguities without external rules, such as C's if/else ambiguity or the various rules about greediness imposed on regexes (e.g. perl's matching rules versus POSIX's longest-leftmost rule, discussed in Friedl's book). Even though PEGs can recognize some non-context-free languages (e.g. anbncn) they can be matched in linear time using a packrat parser (which can be implemented very beautifully in Haskell).

Bryan Ford's 2004 POPL paper establishes the formal foundations of PEGs and defines a concrete syntax for them, fairly similar to ABNF. The key differences are: the choice operator is ordered (prefers to match its left-hand sub-expression); repetition operators are maximally greedy and don't backtrack (so the second a in a*a can never match); and it includes positive and negative lookahead assertions of the form &a and !a (like (?=a) and (?!a) in perl).

It occurs to me that there is a useful analogy hidden in here, that would be made more obvious with a little change in syntax. Instead of a / b write a || b, and instead of &a b write a && b. With || and && I am referring to C's short-cutting "orelse" and "andalso" boolean operators - or rather the more liberal versions that can return more than just a boolean, since a PEG returns the amount of input matched when it succeeds. This immediately suggests some new identities on grammars based on De Morgan's laws, e.g. !a || !b === !(a && b). Note that !!a =/= a because the former never consumes any input, so not all of De Morgan's laws work with PEGs. This also suggests how to choose the operators to overload when writing a PEG parser combinator library for C++ (which has a much wider range of possibilities than Lua).

Date: 2007-06-11 14:00 (UTC)
From: [identity profile] gareth-rees.livejournal.com
It's a long time since I've used it, but what I recall is that if you want decent error reporting from yacc you have to write your own error productions. Tedious, but not more so than writing your own parser.

January 2026

S M T W T F S
    123
45678910
1112 1314151617
18192021222324
25262728293031

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated 2026-01-13 23:17
Powered by Dreamwidth Studios