parsr

parsr is a library for building parsers based on parsing expression grammars or PEGs.

You build a parser by making subparsers to match simple building blocks like numbers, strings, symbols, etc. and then composing them to reflect the higher level structure of your language.

Some means of combination are like those of regular expressions: sequences, alternatives, repetition, optional matching, etc. However, matching is always greedy. parsr also allows recursive definitions and the ability to transform the match of any subparser with a function. The parser can recognize and interpret its input at the same time.

Here’s an example that evaluates arithmetic expressions.

from insights.parsr import EOF, Forward, InSet, Many, Number, WS

def op(args):
    ans, rest = args
    for op, arg in rest:
        if op == "+":
            ans += arg
        elif op == "-":
            ans -= arg
        elif op == "*":
            ans *= arg
        else:
            ans /= arg
    return ans


LP = Char("(")
RP = Char(")")

expr = Forward()  # Forward declarations allow recursive structure
factor = WS >> (Number | (LP >> expr << RP)) << WS
term = (factor + Many(InSet("*/") + factor)).map(op)

# Notice the funny assignment of Forward definitions.
expr <= (term + Many(InSet("+-") + term)).map(op)

evaluate = expr << EOF
exception parsr.Backtrack(msg)[source]

Bases: Exception

Mapped or Lifted functions should Backtrack if they want to fail without causing parsing to fail.

class parsr.Char(char)[source]

Bases: parsr.Parser

Char matches a single character.

a = Char("a")     # parses a single "a"
val = a("a")      # produces an "a" from the data.
val = a("b")      # raises an exception
class parsr.Choice(children)[source]

Bases: parsr.Parser

A Choice requires at least one of its children to succeed, and it returns the value of the one that matched. Alternatives in a choice are tried left to right, so they have a definite priority. This a feature of PEGs over context free grammars.

Additional uses of | on the parser will cause it to accumulate parsers onto itself instead of creating new Choices. This has the desirable effect of increasing efficiency, but it can also have unintended consequences if a choice is used in multiple parts of a grammar as the initial element of another choice. Use a Wrapper to prevent that from happening.

abc = a | b | c   # alternation or choice.
val = abc("a")    # parses a single "a"
val = abc("b")    # parses a single "b"
val = abc("c")    # parses a single "c"
val = abc("d")    # raises an exception
class parsr.Context(lines, orig, src=None)[source]

Bases: object

An instance of Context is threaded through the process call to every parser. It stores an indention stack to track hanging indents, a tag stack for grammars like xml or apache configuration, the active parser stack for error reporting, and accumulated errors for the farthest position reached.

set(pos, msg)[source]

Every parser that encounters an error calls set with the current position and a message. If the error is at the farthest position reached by any other parser, the active parser stack and message are accumulated onto a list of errors for that position. If the position is beyond any previous errors, the error list is cleared before the active stack and new error are recorded. This is the “farthest failure heurstic.”

class parsr.EnclosedComment(s, e)[source]

Bases: parsr.Parser

EnclosedComment matches a start literal, an end literal, and all characters between. It returns the content between the start and end.

Comment = EnclosedComment("/*", "*/")
class parsr.EndTagName(parser, ignore_case=False)[source]

Bases: parsr.Wrapper

Wraps a parser that represents an end tag for grammars like xml, html, etc. The result is captured and compared to the last tag on the tag stack in the Context object. The tags must match for the parse to be successful.

class parsr.FollowedBy(child, follow)[source]

Bases: parsr.Parser

FollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate matches the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.

ab = Char("a") & Char("b") # matches an "a" followed by a "b", but
                           # the "b" isn't consumed from the input.
val = ab("ab")             # returns "a" and leaves "b" to be
                           # consumed.
val = ab("ac")             # raises an exception and doesn't
                           # consume "a".
class parsr.Forward[source]

Bases: parsr.Parser

Forward allows recursive grammars where a nonterminal’s definition includes itself directly or indirectly. You initially create a Forward nonterminal with regular assignment.

expr = Forward()

You later give it its real definition with the <= operator.

expr <= (term + Many(LowOps + term)).map(op)
class parsr.HangingString(chars, echars=None, min_length=1)[source]

Bases: parsr.Parser

HangingString matches lines with indented continuations like in ini files.

Key = WS >> PosMarker(String(key_chars)) << WS
Sep = InSet(sep_chars, "Sep")
Value = WS >> (Boolean | HangingString(value_chars))
KVPair = WithIndent(Key + Opt(Sep >> Value))
class parsr.InSet(s, name=None)[source]

Bases: parsr.Parser

InSet matches any single character from a set.

vowel = InSet("aeiou")  # or InSet(set("aeiou"))
val = vowel("a")  # okay
val = vowel("e")  # okay
val = vowel("i")  # okay
val = vowel("o")  # okay
val = vowel("u")  # okay
val = vowel("y")  # raises an exception
class parsr.KeepLeft(left, right)[source]

Bases: parsr.Parser

KeepLeft takes two parsers. It requires them both to succeed but only returns results for the first one. It consumes input for both.

a = Char("a")
q = Char('"')

aq = a << q      # like a + q except only the result of a is
                 # returned
val = aq('a"')   # returns "a". Keeps the thing on the left of the
                 # <<
class parsr.KeepRight(left, right)[source]

Bases: parsr.Parser

KeepRight takes two parsers. It requires them both to succeed but only returns results for the second one. It consumes input for both.

q = Char('"')
a = Char("a")

qa = q >> a      # like q + a except only the result of a is
                 # returned
val = qa('"a')   # returns "a". Keeps the thing on the right of the
                 # >>
class parsr.Lift(func)[source]

Bases: parsr.Parser

Lift wraps a function of multiple arguments. Use it with the multiplication operator on as many parsers as function arguments, and the results of those parsers will be passed to the function. The result of a Lift parser is the result of the wrapped function.

Example:

.. code-block:: python

    def comb(a, b, c):
        return "".join([a, b, c])

    # You'd normally invoke comb like comb("x", "y", "z"), but you can
    # "lift" it for use with parsers like this:

    x = Char("x")
    y = Char("y")
    z = Char("z")
    p = Lift(comb) * x * y * z

    # The * operator separates parsers whose results will go into the
    # arguments of the lifted function. I've used Char above, but x, y,
    # and z can be arbitrarily complex.

    val = p("xyz")  # would return "xyz"
    val = p("xyx")  # raises an exception. nothing would be consumed
class parsr.Literal(chars, value=<object object>, ignore_case=False)[source]

Bases: parsr.Parser

Match a literal string. The value keyword lets you return a python value instead of the matched input. The ignore_case keyword makes the match case insensitive.

lit = Literal("true")
val = lit("true")  # returns "true"
val = lit("True")  # raises an exception
val = lit("one")   # raises an exception

lit = Literal("true", ignore_case=True)
val = lit("true")  # returns "true"
val = lit("TRUE")  # returns "TRUE"
val = lit("one")   # raises an exception

t = Literal("true", value=True)
f = Literal("false", value=False)
val = t("true")  # returns the boolean True
val = t("True")  # raises an exception

val = f("false") # returns the boolean False
val = f("False") # raises and exception

t = Literal("true", value=True, ignore_case=True)
f = Literal("false", value=False, ignore_case=True)
val = t("true")  # returns the boolean True
val = t("True")  # returns the boolean True

val = f("false") # returns the boolean False
val = f("False") # returns the boolean False
class parsr.Many(parser, lower=0, upper=None)[source]

Bases: parsr.Parser

Many wraps another parser and requires it to match a certain number of times.

When Many matches zero occurences (lower=0), it always succeeds. Keep this in mind when using it in a list of alternatives or with FollowedBy or NotFollowedBy.

The results are returned as a list.

x = Char("x")
xs = Many(x)      # parses many (or no) x's in a row
val = xs("")      # returns []
val = xs("a")     # returns []
val = xs("x")     # returns ["x"]
val = xs("xxxxx") # returns ["x", "x", "x", "x", "x"]
val = xs("xxxxb") # returns ["x", "x", "x", "x"]

ab = Many(a + b)  # parses "abab..."
val = ab("")      # produces []
val = ab("ab")    # produces [["a", b"]]
val = ab("ba")    # produces []
val = ab("ababab")# produces [["a", b"], ["a", "b"], ["a", "b"]]

ab = Many(a | b)  # parses any combination of "a" and "b" like
                  # "aababbaba..."
val = ab("aababb")# produces ["a", "a", "b", "a", "b", "b"]
bs = Many(Char("b"), lower=1) # requires at least one "b"
class parsr.Map(child, func)[source]

Bases: parsr.Parser

Map wraps a parser and a function. It returns the result of using the function to transform the wrapped parser’s result.

Example:

.. code-block:: python

    Digit = InSet("0123456789")
    Digits = Many(Digit, lower=1)
    Number = Digits.map(lambda x: int("".join(x)))
class parsr.Mark(lineno, col, value, start, end)[source]

Bases: object

An object created by PosMarker to capture a value at a position in the input. Marks can give more context to a value transformed by mapped functions.

class parsr.Node[source]

Bases: object

Node is the base class of all parsers. It’s a generic tree structure with each instance containing a list of its children. Its main purpose is to simplify pretty printing.

class parsr.NotFollowedBy(child, follow)[source]

Bases: parsr.Parser

NotFollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate parser fails to match the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.

anb = Char("a") / Char("b") # matches an "a" not followed by a "b".
val = anb("ac")             # returns "a" and leaves "c" to be
                            # consumed
val = anb("ab")             # raises an exception and doesn't
                            # consume "a".
class parsr.OneLineComment(s)[source]

Bases: parsr.Parser

OneLineComment matches everything from a literal to the end of a line, excluding the end of line characters themselves. It returns the content between the start literal and the end of the line.

Comment = OneLineComment("#") | OneLineComment("//")
class parsr.Opt(p, default=None)[source]

Bases: parsr.Parser

Opt wraps a single parser and returns its value if it succeeds. It returns a default value otherwise. The input pointer is advanced only if the wrapped parser succeeds.

a = Char("a")
o = Opt(a)      # matches an "a" if its available. Still succeeds
                # otherwise but doesn't advance the read pointer.
val = o("a")    # returns "a"
val = o("b")    # returns None. Read pointer is not advanced.

o = Opt(a, default="x") # matches an "a" if its available. Returns
                        # "x" otherwise.
val = o("a")    # returns "a"
val = o("b")    # returns "x". Read pointer is not advanced.
class parsr.Parser[source]

Bases: parsr.Node

Parser is the common base class of all Parsers.

debug(d=True)[source]

Set to True to enable diagnostic messages before and after the parser is invoked.

map(func)[source]

Return a Map parser that transforms the results of the current parser with the function func.

sep_by(sep)[source]

Return a parser that matches zero or more instances of the current parser separated by instances of the parser sep.

until(pred, upper=None)[source]

Return an Until parser that matches zero or more instances of the current parser until the pred parser succeeds.

class parsr.PosMarker(parser)[source]

Bases: parsr.Wrapper

Save the line number and column of a subparser by wrapping it in a PosMarker. The value of the parser that handled the input as well as the initial input position will be returned as a Mark.

class parsr.Regex(pattern, flags=0, return_match=False)[source]

Bases: parsr.Parser

Match characters against a regular expression.

identifier = Regex("[a-zA-Z]([a-zA-Z0-9])*")
identifier("abcd1") # returns "abcd1"
identifier("1bcd1") # raises an exception
class parsr.Sequence(children)[source]

Bases: parsr.Parser

A Sequence requires all of its children to succeed. It returns a list of the values they matched.

Additional uses of + on the parser will cause it to accumulate parsers onto itself instead of creating new Sequences. This has the desirable effect of causing sequence results to be represented as flat lists instead of trees, but it can also have unintended consequences if a sequence is used in multiple parts of a grammar as the initial element of another sequence. Use a Wrapper to prevent that from happening.

a = Char("a")     # parses a single "a"
b = Char("b")     # parses a single "b"
c = Char("c")     # parses a single "c"

ab = a + b        # parses a single "a" followed by a single "b"
                  # (a + b) creates a "Sequence" object. Using `ab`
                  # as an element in a later sequence would modify
                  # its original definition.

abc = a + b + c   # parses "abc"
                  # (a + b) creates a "Sequence" object to which c
                  # is appended

val = ab("ab")    # produces a list ["a", "b"]
val = ab("a")     # raises an exception
val = ab("b")     # raises an exception
val = ab("ac")    # raises an exception
val = ab("cb")    # raises an exception

val = abc("abc")  # produces ["a", "b", "c"]
class parsr.StartTagName(parser)[source]

Bases: parsr.Wrapper

Wraps a parser that represents a starting tag for grammars like xml, html, etc. The tag result is captured and put onto a tag stack in the Context object.

class parsr.String(chars, echars=None, min_length=1)[source]

Bases: parsr.Parser

Match one or more characters in a set. Matching is greedy.

vowels = String("aeiou")
val = vowels("a")            # returns "a"
val = vowels("u")            # returns "u"
val = vowels("aaeiouuoui")   # returns "aaeiouuoui"
val = vowels("uoiea")        # returns "uoiea"
val = vowels("oouieaaea")    # returns "oouieaaea"
val = vowels("ga")           # raises an exception
class parsr.StringUntil(term, lower=None, upper=None)[source]

Bases: parsr.Parser

StringUntil matches any number of characters until a predicate is seen.

su  = StringUntil(Char("="))  # parses any number of characters until 'a'
val = su("ab=")               # produces "ab" from the data.
val = su("ab")                # raises an exception
class parsr.Until(parser, predicate, upper=None)[source]

Bases: parsr.Parser

Until wraps a parser and a terminal parser. It accumulates matches of the first parser until the terminal parser succeeds. Input for the terminal parser is left unread, and the results of the first parser are returned as a list.

Since Until can match zero occurences, it always succeeds. Keep this in mind when using it in a list of alternatives or with FollowedBy or NotFollowedBy.

cs = AnyChar.until(Char("y")) # parses many (or no) characters
                              # until a "y" is encountered.

val = cs("")                  # returns []
val = cs("a")                 # returns ["a"]
val = cs("x")                 # returns ["x"]
val = cs("ccccc")             # returns ["c", "c", "c", "c", "c"]
val = cs("abcdycc")           # returns ["a", "b", "c", "d"]
class parsr.WithIndent(parser)[source]

Bases: parsr.Wrapper

Consumes whitespace until a non-whitespace character is encountered, pushes the column position onto an indentation stack in the Context, and then calls the parser it’s wrapping. The wrapped parser and any of its children can make use of the saved indentation. Returns the value of the wrapped parser.

WithIndent allows HangingString to work by giving a way to mark how indented following lines must be to count as continuations.

Key = WS >> PosMarker(String(key_chars)) << WS
Sep = InSet(sep_chars, "Sep")
Value = WS >> (Boolean | HangingString(value_chars))
KVPair = WithIndent(Key + Opt(Sep >> Value))
class parsr.Wrapper(parser)[source]

Bases: parsr.Parser

Parser that wraps another parser. This can be used to prevent sequences and choices from accidentally accumulating other parsers when used in multiple parts of a grammar.

parsr.render(tree)[source]

Pretty prints a PEG.

parsr.text_format(tree)[source]

Converts a PEG into a pretty printed string.