parsr¶
parsr is a library for building parsers based on parsing expression grammars or PEGs.
You build a parser by making subparsers to match simple building blocks like numbers, strings, symbols, etc. and then composing them to reflect the higher level structure of your language.
Some means of combination are like those of regular expressions: sequences, alternatives, repetition, optional matching, etc. However, matching is always greedy. parsr also allows recursive definitions and the ability to transform the match of any subparser with a function. The parser can recognize and interpret its input at the same time.
Here’s an example that evaluates arithmetic expressions.
from insights.parsr import EOF, Forward, InSet, Many, Number, WS def op(args): ans, rest = args for op, arg in rest: if op == "+": ans += arg elif op == "-": ans -= arg elif op == "*": ans *= arg else: ans /= arg return ans LP = Char("(") RP = Char(")") expr = Forward() # Forward declarations allow recursive structure factor = WS >> (Number | (LP >> expr << RP)) << WS term = (factor + Many(InSet("*/") + factor)).map(op) # Notice the funny assignment of Forward definitions. expr <= (term + Many(InSet("+-") + term)).map(op) evaluate = expr << EOF
-
exception
parsr.
Backtrack
(msg)[source]¶ Bases:
Exception
Mapped or Lifted functions should Backtrack if they want to fail without causing parsing to fail.
-
class
parsr.
Char
(char)[source]¶ Bases:
parsr.Parser
Char matches a single character.
a = Char("a") # parses a single "a" val = a("a") # produces an "a" from the data. val = a("b") # raises an exception
-
class
parsr.
Choice
(children)[source]¶ Bases:
parsr.Parser
A Choice requires at least one of its children to succeed, and it returns the value of the one that matched. Alternatives in a choice are tried left to right, so they have a definite priority. This a feature of PEGs over context free grammars.
Additional uses of
|
on the parser will cause it to accumulate parsers onto itself instead of creating new Choices. This has the desirable effect of increasing efficiency, but it can also have unintended consequences if a choice is used in multiple parts of a grammar as the initial element of another choice. Use aWrapper
to prevent that from happening.abc = a | b | c # alternation or choice. val = abc("a") # parses a single "a" val = abc("b") # parses a single "b" val = abc("c") # parses a single "c" val = abc("d") # raises an exception
-
class
parsr.
Context
(lines, orig, src=None)[source]¶ Bases:
object
An instance of Context is threaded through the process call to every parser. It stores an indention stack to track hanging indents, a tag stack for grammars like xml or apache configuration, the active parser stack for error reporting, and accumulated errors for the farthest position reached.
-
set
(pos, msg)[source]¶ Every parser that encounters an error calls set with the current position and a message. If the error is at the farthest position reached by any other parser, the active parser stack and message are accumulated onto a list of errors for that position. If the position is beyond any previous errors, the error list is cleared before the active stack and new error are recorded. This is the “farthest failure heurstic.”
-
-
class
parsr.
EnclosedComment
(s, e)[source]¶ Bases:
parsr.Parser
EnclosedComment matches a start literal, an end literal, and all characters between. It returns the content between the start and end.
Comment = EnclosedComment("/*", "*/")
-
class
parsr.
EndTagName
(parser, ignore_case=False)[source]¶ Bases:
parsr.Wrapper
Wraps a parser that represents an end tag for grammars like xml, html, etc. The result is captured and compared to the last tag on the tag stack in the
Context
object. The tags must match for the parse to be successful.
-
class
parsr.
FollowedBy
(child, follow)[source]¶ Bases:
parsr.Parser
FollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate matches the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.
ab = Char("a") & Char("b") # matches an "a" followed by a "b", but # the "b" isn't consumed from the input. val = ab("ab") # returns "a" and leaves "b" to be # consumed. val = ab("ac") # raises an exception and doesn't # consume "a".
-
class
parsr.
Forward
[source]¶ Bases:
parsr.Parser
Forward allows recursive grammars where a nonterminal’s definition includes itself directly or indirectly. You initially create a Forward nonterminal with regular assignment.
expr = Forward()
You later give it its real definition with the
<=
operator.expr <= (term + Many(LowOps + term)).map(op)
-
class
parsr.
HangingString
(chars, echars=None, min_length=1)[source]¶ Bases:
parsr.Parser
HangingString matches lines with indented continuations like in ini files.
Key = WS >> PosMarker(String(key_chars)) << WS Sep = InSet(sep_chars, "Sep") Value = WS >> (Boolean | HangingString(value_chars)) KVPair = WithIndent(Key + Opt(Sep >> Value))
-
class
parsr.
InSet
(s, name=None)[source]¶ Bases:
parsr.Parser
InSet matches any single character from a set.
vowel = InSet("aeiou") # or InSet(set("aeiou")) val = vowel("a") # okay val = vowel("e") # okay val = vowel("i") # okay val = vowel("o") # okay val = vowel("u") # okay val = vowel("y") # raises an exception
-
class
parsr.
KeepLeft
(left, right)[source]¶ Bases:
parsr.Parser
KeepLeft takes two parsers. It requires them both to succeed but only returns results for the first one. It consumes input for both.
a = Char("a") q = Char('"') aq = a << q # like a + q except only the result of a is # returned val = aq('a"') # returns "a". Keeps the thing on the left of the # <<
-
class
parsr.
KeepRight
(left, right)[source]¶ Bases:
parsr.Parser
KeepRight takes two parsers. It requires them both to succeed but only returns results for the second one. It consumes input for both.
q = Char('"') a = Char("a") qa = q >> a # like q + a except only the result of a is # returned val = qa('"a') # returns "a". Keeps the thing on the right of the # >>
-
class
parsr.
Lift
(func)[source]¶ Bases:
parsr.Parser
Lift wraps a function of multiple arguments. Use it with the multiplication operator on as many parsers as function arguments, and the results of those parsers will be passed to the function. The result of a Lift parser is the result of the wrapped function.
Example:
.. code-block:: python def comb(a, b, c): return "".join([a, b, c]) # You'd normally invoke comb like comb("x", "y", "z"), but you can # "lift" it for use with parsers like this: x = Char("x") y = Char("y") z = Char("z") p = Lift(comb) * x * y * z # The * operator separates parsers whose results will go into the # arguments of the lifted function. I've used Char above, but x, y, # and z can be arbitrarily complex. val = p("xyz") # would return "xyz" val = p("xyx") # raises an exception. nothing would be consumed
-
class
parsr.
Literal
(chars, value=<object object>, ignore_case=False)[source]¶ Bases:
parsr.Parser
Match a literal string. The
value
keyword lets you return a python value instead of the matched input. Theignore_case
keyword makes the match case insensitive.lit = Literal("true") val = lit("true") # returns "true" val = lit("True") # raises an exception val = lit("one") # raises an exception lit = Literal("true", ignore_case=True) val = lit("true") # returns "true" val = lit("TRUE") # returns "TRUE" val = lit("one") # raises an exception t = Literal("true", value=True) f = Literal("false", value=False) val = t("true") # returns the boolean True val = t("True") # raises an exception val = f("false") # returns the boolean False val = f("False") # raises and exception t = Literal("true", value=True, ignore_case=True) f = Literal("false", value=False, ignore_case=True) val = t("true") # returns the boolean True val = t("True") # returns the boolean True val = f("false") # returns the boolean False val = f("False") # returns the boolean False
-
class
parsr.
Many
(parser, lower=0, upper=None)[source]¶ Bases:
parsr.Parser
Many wraps another parser and requires it to match a certain number of times.
When Many matches zero occurences (
lower=0
), it always succeeds. Keep this in mind when using it in a list of alternatives or withFollowedBy
orNotFollowedBy
.The results are returned as a list.
x = Char("x") xs = Many(x) # parses many (or no) x's in a row val = xs("") # returns [] val = xs("a") # returns [] val = xs("x") # returns ["x"] val = xs("xxxxx") # returns ["x", "x", "x", "x", "x"] val = xs("xxxxb") # returns ["x", "x", "x", "x"] ab = Many(a + b) # parses "abab..." val = ab("") # produces [] val = ab("ab") # produces [["a", b"]] val = ab("ba") # produces [] val = ab("ababab")# produces [["a", b"], ["a", "b"], ["a", "b"]] ab = Many(a | b) # parses any combination of "a" and "b" like # "aababbaba..." val = ab("aababb")# produces ["a", "a", "b", "a", "b", "b"] bs = Many(Char("b"), lower=1) # requires at least one "b"
-
class
parsr.
Map
(child, func)[source]¶ Bases:
parsr.Parser
Map wraps a parser and a function. It returns the result of using the function to transform the wrapped parser’s result.
Example:
.. code-block:: python Digit = InSet("0123456789") Digits = Many(Digit, lower=1) Number = Digits.map(lambda x: int("".join(x)))
-
class
parsr.
Mark
(lineno, col, value, start, end)[source]¶ Bases:
object
An object created by
PosMarker
to capture a value at a position in the input. Marks can give more context to a value transformed by mapped functions.
-
class
parsr.
Node
[source]¶ Bases:
object
Node is the base class of all parsers. It’s a generic tree structure with each instance containing a list of its children. Its main purpose is to simplify pretty printing.
-
class
parsr.
NotFollowedBy
(child, follow)[source]¶ Bases:
parsr.Parser
NotFollowedBy takes a parser and a predicate parser. The initial parser matches only if the predicate parser fails to match the input after it. On success, input for the predicate is left unread, and the result of the first parser is returned.
anb = Char("a") / Char("b") # matches an "a" not followed by a "b". val = anb("ac") # returns "a" and leaves "c" to be # consumed val = anb("ab") # raises an exception and doesn't # consume "a".
-
class
parsr.
OneLineComment
(s)[source]¶ Bases:
parsr.Parser
OneLineComment matches everything from a literal to the end of a line, excluding the end of line characters themselves. It returns the content between the start literal and the end of the line.
Comment = OneLineComment("#") | OneLineComment("//")
-
class
parsr.
Opt
(p, default=None)[source]¶ Bases:
parsr.Parser
Opt wraps a single parser and returns its value if it succeeds. It returns a default value otherwise. The input pointer is advanced only if the wrapped parser succeeds.
a = Char("a") o = Opt(a) # matches an "a" if its available. Still succeeds # otherwise but doesn't advance the read pointer. val = o("a") # returns "a" val = o("b") # returns None. Read pointer is not advanced. o = Opt(a, default="x") # matches an "a" if its available. Returns # "x" otherwise. val = o("a") # returns "a" val = o("b") # returns "x". Read pointer is not advanced.
-
class
parsr.
Parser
[source]¶ Bases:
parsr.Node
Parser is the common base class of all Parsers.
-
debug
(d=True)[source]¶ Set to
True
to enable diagnostic messages before and after the parser is invoked.
-
map
(func)[source]¶ Return a
Map
parser that transforms the results of the current parser with the function func.
-
-
class
parsr.
PosMarker
(parser)[source]¶ Bases:
parsr.Wrapper
Save the line number and column of a subparser by wrapping it in a PosMarker. The value of the parser that handled the input as well as the initial input position will be returned as a
Mark
.
-
class
parsr.
Regex
(pattern, flags=0, return_match=False)[source]¶ Bases:
parsr.Parser
Match characters against a regular expression.
identifier = Regex("[a-zA-Z]([a-zA-Z0-9])*") identifier("abcd1") # returns "abcd1" identifier("1bcd1") # raises an exception
-
class
parsr.
Sequence
(children)[source]¶ Bases:
parsr.Parser
A Sequence requires all of its children to succeed. It returns a list of the values they matched.
Additional uses of
+
on the parser will cause it to accumulate parsers onto itself instead of creating new Sequences. This has the desirable effect of causing sequence results to be represented as flat lists instead of trees, but it can also have unintended consequences if a sequence is used in multiple parts of a grammar as the initial element of another sequence. Use aWrapper
to prevent that from happening.a = Char("a") # parses a single "a" b = Char("b") # parses a single "b" c = Char("c") # parses a single "c" ab = a + b # parses a single "a" followed by a single "b" # (a + b) creates a "Sequence" object. Using `ab` # as an element in a later sequence would modify # its original definition. abc = a + b + c # parses "abc" # (a + b) creates a "Sequence" object to which c # is appended val = ab("ab") # produces a list ["a", "b"] val = ab("a") # raises an exception val = ab("b") # raises an exception val = ab("ac") # raises an exception val = ab("cb") # raises an exception val = abc("abc") # produces ["a", "b", "c"]
-
class
parsr.
StartTagName
(parser)[source]¶ Bases:
parsr.Wrapper
Wraps a parser that represents a starting tag for grammars like xml, html, etc. The tag result is captured and put onto a tag stack in the
Context
object.
-
class
parsr.
String
(chars, echars=None, min_length=1)[source]¶ Bases:
parsr.Parser
Match one or more characters in a set. Matching is greedy.
vowels = String("aeiou") val = vowels("a") # returns "a" val = vowels("u") # returns "u" val = vowels("aaeiouuoui") # returns "aaeiouuoui" val = vowels("uoiea") # returns "uoiea" val = vowels("oouieaaea") # returns "oouieaaea" val = vowels("ga") # raises an exception
-
class
parsr.
StringUntil
(term, lower=None, upper=None)[source]¶ Bases:
parsr.Parser
StringUntil matches any number of characters until a predicate is seen.
su = StringUntil(Char("=")) # parses any number of characters until 'a' val = su("ab=") # produces "ab" from the data. val = su("ab") # raises an exception
-
class
parsr.
Until
(parser, predicate, upper=None)[source]¶ Bases:
parsr.Parser
Until wraps a parser and a terminal parser. It accumulates matches of the first parser until the terminal parser succeeds. Input for the terminal parser is left unread, and the results of the first parser are returned as a list.
Since Until can match zero occurences, it always succeeds. Keep this in mind when using it in a list of alternatives or with
FollowedBy
orNotFollowedBy
.cs = AnyChar.until(Char("y")) # parses many (or no) characters # until a "y" is encountered. val = cs("") # returns [] val = cs("a") # returns ["a"] val = cs("x") # returns ["x"] val = cs("ccccc") # returns ["c", "c", "c", "c", "c"] val = cs("abcdycc") # returns ["a", "b", "c", "d"]
-
class
parsr.
WithIndent
(parser)[source]¶ Bases:
parsr.Wrapper
Consumes whitespace until a non-whitespace character is encountered, pushes the column position onto an indentation stack in the
Context
, and then calls the parser it’s wrapping. The wrapped parser and any of its children can make use of the saved indentation. Returns the value of the wrapped parser.WithIndent allows
HangingString
to work by giving a way to mark how indented following lines must be to count as continuations.Key = WS >> PosMarker(String(key_chars)) << WS Sep = InSet(sep_chars, "Sep") Value = WS >> (Boolean | HangingString(value_chars)) KVPair = WithIndent(Key + Opt(Sep >> Value))
-
class
parsr.
Wrapper
(parser)[source]¶ Bases:
parsr.Parser
Parser that wraps another parser. This can be used to prevent sequences and choices from accidentally accumulating other parsers when used in multiple parts of a grammar.