parsr¶
parsr is a library for building parsers based on parsing expression grammars or PEGs.
You build a parser by making subparsers to match simple building blocks like numbers, strings, symbols, etc. and then composing them to reflect the higher level structure of your language.
Some means of combination are like those of regular expressions: sequences, alternatives, repetition, optional matching, etc. However, matching is always greedy. parsr also allows recursive definitions and the ability to transform the match of any subparser with a function. The parser can recognize and interpret its input at the same time.
Here’s an example that evaluates arithmetic expressions.
from insights.parsr import EOF, Forward, InSet, Many, Number, WS def op(args): ans, rest = args for op, arg in rest: if op == "+": ans += arg elif op == "-": ans -= arg elif op == "*": ans *= arg else: ans /= arg return ans LP = Char("(") RP = Char(")") expr = Forward() # Forward declarations allow recursive structure factor = WS >> (Number | (LP >> expr << RP)) << WS term = (factor + Many(InSet("*/") + factor)).map(op) # Notice the funny assignment of Forward definitions. expr <= (term + Many(InSet("+-") + term)).map(op) evaluate = expr << EOF
-
exception
parsr.Backtrack(msg)[source]¶ Bases:
ExceptionMapped or Lifted functions should Backtrack if they want to fail without causing parsing to fail.
-
class
parsr.Char(char)[source]¶ Bases:
parsr.ParserChar matches a single character.
a = Char("a") # parses a single "a" val = a("a") # produces an "a" from the data. val = a("b") # raises an exception
-
class
parsr.Choice(children)[source]¶ Bases:
parsr.ParserA Choice requires at least one of its children to succeed, and it returns the value of the one that matched. Alternatives in a choice are tried left to right, so they have a definite priority. This a feature of PEGs over context free grammars.
Additional uses of
|on the parser will cause it to accumulate parsers onto itself instead of creating new Choices. This has the desirable effect of increasing efficiency, but it can also have unintended consequences if a choice is used in multiple parts of a grammar as the initial element of another choice. Use aWrapperto prevent that from happening.abc = a | b | c # alternation or choice. val = abc("a") # parses a single "a" val = abc("b") # parses a single "b" val = abc("c") # parses a single "c" val = abc("d") # raises an exception
-
class
parsr.Context(lines, orig, src=None)[source]¶ Bases:
objectAn instance of Context is threaded through the process call to every parser. It stores an indention stack to track hanging indents, a tag stack for grammars like xml or apache configuration, the active parser stack for error reporting, and accumulated errors for the farthest position reached.
-
set(pos, msg)[source]¶ Every parser that encounters an error calls set with the current position and a message. If the error is at the farthest position reached by any other parser, the active parser stack and message are accumulated onto a list of errors for that position. If the position is beyond any previous errors, the error list is cleared before the active stack and new error are recorded. This is the “farthest failure heurstic.”
-
-
class
parsr.EnclosedComment(s, e)[source]¶ Bases:
parsr.ParserEnclosedComment matches a start literal, an end literal, and all characters between. It returns the content between the start and end.
Comment = EnclosedComment("/*", "*/")
-
class
parsr.EndTagName(parser, ignore_case=False)[source]¶ Bases:
parsr.WrapperWraps a parser that represents an end tag for grammars like xml, html, etc. The result is captured and compared to the last tag on the tag stack in the
Contextobject. The tags must match for the parse to be successful.
-
class
parsr.FollowedBy(child, follow)[source]¶ Bases:
parsr.ParserFollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate matches the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.
ab = Char("a") & Char("b") # matches an "a" followed by a "b", but # the "b" isn't consumed from the input. val = ab("ab") # returns "a" and leaves "b" to be # consumed. val = ab("ac") # raises an exception and doesn't # consume "a".
-
class
parsr.Forward[source]¶ Bases:
parsr.ParserForward allows recursive grammars where a nonterminal’s definition includes itself directly or indirectly. You initially create a Forward nonterminal with regular assignment.
expr = Forward()
You later give it its real definition with the
<=operator.expr <= (term + Many(LowOps + term)).map(op)
-
class
parsr.HangingString(chars, echars=None, min_length=1)[source]¶ Bases:
parsr.ParserHangingString matches lines with indented continuations like in ini files.
Key = WS >> PosMarker(String(key_chars)) << WS Sep = InSet(sep_chars, "Sep") Value = WS >> (Boolean | HangingString(value_chars)) KVPair = WithIndent(Key + Opt(Sep >> Value))
-
class
parsr.InSet(s, name=None)[source]¶ Bases:
parsr.ParserInSet matches any single character from a set.
vowel = InSet("aeiou") # or InSet(set("aeiou")) val = vowel("a") # okay val = vowel("e") # okay val = vowel("i") # okay val = vowel("o") # okay val = vowel("u") # okay val = vowel("y") # raises an exception
-
class
parsr.KeepLeft(left, right)[source]¶ Bases:
parsr.ParserKeepLeft takes two parsers. It requires them both to succeed but only returns results for the first one. It consumes input for both.
a = Char("a") q = Char('"') aq = a << q # like a + q except only the result of a is # returned val = aq('a"') # returns "a". Keeps the thing on the left of the # <<
-
class
parsr.KeepRight(left, right)[source]¶ Bases:
parsr.ParserKeepRight takes two parsers. It requires them both to succeed but only returns results for the second one. It consumes input for both.
q = Char('"') a = Char("a") qa = q >> a # like q + a except only the result of a is # returned val = qa('"a') # returns "a". Keeps the thing on the right of the # >>
-
class
parsr.Lift(func)[source]¶ Bases:
parsr.ParserLift wraps a function of multiple arguments. Use it with the multiplication operator on as many parsers as function arguments, and the results of those parsers will be passed to the function. The result of a Lift parser is the result of the wrapped function.
Example:
.. code-block:: python def comb(a, b, c): return "".join([a, b, c]) # You'd normally invoke comb like comb("x", "y", "z"), but you can # "lift" it for use with parsers like this: x = Char("x") y = Char("y") z = Char("z") p = Lift(comb) * x * y * z # The * operator separates parsers whose results will go into the # arguments of the lifted function. I've used Char above, but x, y, # and z can be arbitrarily complex. val = p("xyz") # would return "xyz" val = p("xyx") # raises an exception. nothing would be consumed
-
class
parsr.Literal(chars, value=<object object>, ignore_case=False)[source]¶ Bases:
parsr.ParserMatch a literal string. The
valuekeyword lets you return a python value instead of the matched input. Theignore_casekeyword makes the match case insensitive.lit = Literal("true") val = lit("true") # returns "true" val = lit("True") # raises an exception val = lit("one") # raises an exception lit = Literal("true", ignore_case=True) val = lit("true") # returns "true" val = lit("TRUE") # returns "TRUE" val = lit("one") # raises an exception t = Literal("true", value=True) f = Literal("false", value=False) val = t("true") # returns the boolean True val = t("True") # raises an exception val = f("false") # returns the boolean False val = f("False") # raises and exception t = Literal("true", value=True, ignore_case=True) f = Literal("false", value=False, ignore_case=True) val = t("true") # returns the boolean True val = t("True") # returns the boolean True val = f("false") # returns the boolean False val = f("False") # returns the boolean False
-
class
parsr.Many(parser, lower=0, upper=None)[source]¶ Bases:
parsr.ParserMany wraps another parser and requires it to match a certain number of times.
When Many matches zero occurences (
lower=0), it always succeeds. Keep this in mind when using it in a list of alternatives or withFollowedByorNotFollowedBy.The results are returned as a list.
x = Char("x") xs = Many(x) # parses many (or no) x's in a row val = xs("") # returns [] val = xs("a") # returns [] val = xs("x") # returns ["x"] val = xs("xxxxx") # returns ["x", "x", "x", "x", "x"] val = xs("xxxxb") # returns ["x", "x", "x", "x"] ab = Many(a + b) # parses "abab..." val = ab("") # produces [] val = ab("ab") # produces [["a", b"]] val = ab("ba") # produces [] val = ab("ababab")# produces [["a", b"], ["a", "b"], ["a", "b"]] ab = Many(a | b) # parses any combination of "a" and "b" like # "aababbaba..." val = ab("aababb")# produces ["a", "a", "b", "a", "b", "b"] bs = Many(Char("b"), lower=1) # requires at least one "b"
-
class
parsr.Map(child, func)[source]¶ Bases:
parsr.ParserMap wraps a parser and a function. It returns the result of using the function to transform the wrapped parser’s result.
Example:
.. code-block:: python Digit = InSet("0123456789") Digits = Many(Digit, lower=1) Number = Digits.map(lambda x: int("".join(x)))
-
class
parsr.Mark(lineno, col, value, start, end)[source]¶ Bases:
objectAn object created by
PosMarkerto capture a value at a position in the input. Marks can give more context to a value transformed by mapped functions.
-
class
parsr.Node[source]¶ Bases:
objectNode is the base class of all parsers. It’s a generic tree structure with each instance containing a list of its children. Its main purpose is to simplify pretty printing.
-
class
parsr.NotFollowedBy(child, follow)[source]¶ Bases:
parsr.ParserNotFollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate parser fails to match the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.
anb = Char("a") / Char("b") # matches an "a" not followed by a "b". val = anb("ac") # returns "a" and leaves "c" to be # consumed val = anb("ab") # raises an exception and doesn't # consume "a".
-
class
parsr.OneLineComment(s)[source]¶ Bases:
parsr.ParserOneLineComment matches everything from a literal to the end of a line, excluding the end of line characters themselves. It returns the content between the start literal and the end of the line.
Comment = OneLineComment("#") | OneLineComment("//")
-
class
parsr.Opt(p, default=None)[source]¶ Bases:
parsr.ParserOpt wraps a single parser and returns its value if it succeeds. It returns a default value otherwise. The input pointer is advanced only if the wrapped parser succeeds.
a = Char("a") o = Opt(a) # matches an "a" if its available. Still succeeds # otherwise but doesn't advance the read pointer. val = o("a") # returns "a" val = o("b") # returns None. Read pointer is not advanced. o = Opt(a, default="x") # matches an "a" if its available. Returns # "x" otherwise. val = o("a") # returns "a" val = o("b") # returns "x". Read pointer is not advanced.
-
class
parsr.Parser[source]¶ Bases:
parsr.NodeParser is the common base class of all Parsers.
-
debug(d=True)[source]¶ Set to
Trueto enable diagnostic messages before and after the parser is invoked.
-
map(func)[source]¶ Return a
Mapparser that transforms the results of the current parser with the function func.
-
-
class
parsr.PosMarker(parser)[source]¶ Bases:
parsr.WrapperSave the line number and column of a subparser by wrapping it in a PosMarker. The value of the parser that handled the input as well as the initial input position will be returned as a
Mark.
-
class
parsr.Regex(pattern, flags=0, return_match=False)[source]¶ Bases:
parsr.ParserMatch characters against a regular expression.
identifier = Regex("[a-zA-Z]([a-zA-Z0-9])*") identifier("abcd1") # returns "abcd1" identifier("1bcd1") # raises an exception
-
class
parsr.Sequence(children)[source]¶ Bases:
parsr.ParserA Sequence requires all of its children to succeed. It returns a list of the values they matched.
Additional uses of
+on the parser will cause it to accumulate parsers onto itself instead of creating new Sequences. This has the desirable effect of causing sequence results to be represented as flat lists instead of trees, but it can also have unintended consequences if a sequence is used in multiple parts of a grammar as the initial element of another sequence. Use aWrapperto prevent that from happening.a = Char("a") # parses a single "a" b = Char("b") # parses a single "b" c = Char("c") # parses a single "c" ab = a + b # parses a single "a" followed by a single "b" # (a + b) creates a "Sequence" object. Using `ab` # as an element in a later sequence would modify # its original definition. abc = a + b + c # parses "abc" # (a + b) creates a "Sequence" object to which c # is appended val = ab("ab") # produces a list ["a", "b"] val = ab("a") # raises an exception val = ab("b") # raises an exception val = ab("ac") # raises an exception val = ab("cb") # raises an exception val = abc("abc") # produces ["a", "b", "c"]
-
class
parsr.StartTagName(parser)[source]¶ Bases:
parsr.WrapperWraps a parser that represents a starting tag for grammars like xml, html, etc. The tag result is captured and put onto a tag stack in the
Contextobject.
-
class
parsr.String(chars, echars=None, min_length=1)[source]¶ Bases:
parsr.ParserMatch one or more characters in a set. Matching is greedy.
vowels = String("aeiou") val = vowels("a") # returns "a" val = vowels("u") # returns "u" val = vowels("aaeiouuoui") # returns "aaeiouuoui" val = vowels("uoiea") # returns "uoiea" val = vowels("oouieaaea") # returns "oouieaaea" val = vowels("ga") # raises an exception
-
class
parsr.StringUntil(term, lower=None, upper=None)[source]¶ Bases:
parsr.ParserStringUntil matches any number of characters until a predicate is seen.
su = StringUntil(Char("=")) # parses any number of characters until 'a' val = su("ab=") # produces "ab" from the data. val = su("ab") # raises an exception
-
class
parsr.Until(parser, predicate, upper=None)[source]¶ Bases:
parsr.ParserUntil wraps a parser and a terminal parser. It accumulates matches of the first parser until the terminal parser succeeds. Input for the terminal parser is left unread, and the results of the first parser are returned as a list.
Since Until can match zero occurences, it always succeeds. Keep this in mind when using it in a list of alternatives or with
FollowedByorNotFollowedBy.cs = AnyChar.until(Char("y")) # parses many (or no) characters # until a "y" is encountered. val = cs("") # returns [] val = cs("a") # returns ["a"] val = cs("x") # returns ["x"] val = cs("ccccc") # returns ["c", "c", "c", "c", "c"] val = cs("abcdycc") # returns ["a", "b", "c", "d"]
-
class
parsr.WithIndent(parser)[source]¶ Bases:
parsr.WrapperConsumes whitespace until a non-whitespace character is encountered, pushes the column position onto an indentation stack in the
Context, and then calls the parser it’s wrapping. The wrapped parser and any of its children can make use of the saved indentation. Returns the value of the wrapped parser.WithIndent allows
HangingStringto work by giving a way to mark how indented following lines must be to count as continuations.Key = WS >> PosMarker(String(key_chars)) << WS Sep = InSet(sep_chars, "Sep") Value = WS >> (Boolean | HangingString(value_chars)) KVPair = WithIndent(Key + Opt(Sep >> Value))
-
class
parsr.Wrapper(parser)[source]¶ Bases:
parsr.ParserParser that wraps another parser. This can be used to prevent sequences and choices from accidentally accumulating other parsers when used in multiple parts of a grammar.