Update pyparsing to 3.2.1 #325


pyup-bot
This PR updates pyparsing from 3.0.7 to 3.2.1.

Changelog

3.2.1

------------------------------
- Updated generated railroad diagrams to make non-terminal elements links to their related
sub-diagrams. This _greatly_ improves navigation of the diagram, especially for
large, complex parsers.

- Simplified railroad diagrams emitted for parsers using `infix_notation`, by hiding
lookahead terms. Renamed internally generated expressions for clarity, and improved
diagramming.

- Improved performance of `cpp_style_comment`, `c_style_comment`, `common.fnumber`
and `common.ieee_float` Regex expressions. PRs submitted by Gabriel Gerlero,
nice work, thanks!

- Add missing type annotations to `match_only_at_col`, `replace_with`, `remove_quotes`,
`with_attribute`, and `with_class`. Issue 585 reported by rafrafrek.

- Added generated diagrams for many of the examples.

- Replaced old examples/0README.html file with examples/README.md file.

3.2.0

-------------------------------
- Discontinued support for Python 3.6, 3.7, and 3.8. Adopted new Python features from
Python versions 3.7-3.9:
  - Updated type annotations to use built-in container types instead of names
    imported from the `typing` module (e.g., `list[str]` vs `List[str]`).
  - Reworked portions of the packrat cache to leverage insertion-preserving ordering
    in dicts (including removal of uses of `OrderedDict`).
  - Changed `pdb.set_trace()` call in `ParserElement.set_break()` to `breakpoint()`.
  - Converted `typing.NamedTuple` to `dataclasses.dataclass` in railroad diagramming
    code.
  - Added `from __future__ import annotations` to clean up some type annotations.
(with assistance from ISyncWithFoo, issue 535, thanks for the help!)

- POSSIBLE BREAKING CHANGES

 The following bugfixes may result in subtle changes in the results returned or
 exceptions raised by pyparsing.

 - Fixed code in `ParseElementEnhance` subclasses that
   replaced detailed exception messages raised in contained expressions with a
   less-specific and less-informative generic exception message and location.

   If your code has conditional logic based on the message content in raised
   `ParseExceptions`, this bugfix may require changes in your code.

 - Fixed bug in `transform_string()` where whitespace
   in the input string was not properly preserved in the output string.

   If your code uses `transform_string`, this bugfix may require changes in
   your code.

 - Fixed bug where an `IndexError` raised in a parse action was
   incorrectly handled as an `IndexError` raised as part of the `ParserElement`
   parsing methods, and reraised as a `ParseException`. Now an `IndexError`
   raised inside a parse action will properly propagate out as an `IndexError`.
   (Issue 573, reported by August Karlstedt, thanks!)

   If your code raises `IndexError`s in parse actions, this bugfix may require
   changes in your code.
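
A minimal sketch of how calling code might cope with the `IndexError` change across versions. The `take_third` parse action and the `classify` helper are hypothetical names made up for this illustration. Based on the note above, the `IndexError` branch is expected on 3.2.0 or later and the `ParseException` branch on earlier releases, so the sketch handles both rather than assuming one.

```python
import pyparsing as pp


def take_third(tokens):
    # Deliberately buggy parse action (hypothetical): only one token is
    # matched, so indexing position 2 raises IndexError.
    return list(tokens)[2]


number = pp.Word(pp.nums).add_parse_action(take_third)


def classify(text):
    """Report which exception type, if any, escapes from parse_string."""
    try:
        number.parse_string(text)
    except IndexError:
        # Behavior described for 3.2.0 and later: the error propagates as-is.
        return "IndexError"
    except pp.ParseException:
        # Behavior described for earlier releases: reraised as a parse failure.
        return "ParseException"
    return "ok"


outcome = classify("123")
print(outcome)
```

Code that previously caught only `ParseException` around `parse_string` may want an explicit `IndexError` handler like the one above if its parse actions can raise that exception.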

- FIXES AND NEW FEATURES

 - Added type annotations to remainder of `pyparsing` package, and added `mypy`
   run to `tox.ini`, so that type annotations are now run as part of pyparsing's CI.
   Addresses Issue 373, raised by Iwan Aucamp, thanks!

 - Exception message format can now be customized, by overriding
   `ParseBaseException.formatted_message`:

       def custom_exception_message(exc) -> str:
           found_phrase = f", found {exc.found}" if exc.found else ""
           return f"{exc.lineno}:{exc.column} {exc.msg}{found_phrase}"

       ParseBaseException.formatted_message = custom_exception_message

   (PR 571 submitted by Odysseyas Krystalakos, nice work!)

 - `run_tests` now detects if an exception is raised in a parse action, and will
   report it with an enhanced error message, with the exception type, string,
   and parse action name.

 - `QuotedString` now handles translation of escaped integer, hex, octal, and
   Unicode sequences to their corresponding characters.

 - Fixed the displayed output of `Regex` terms to deduplicate repeated backslashes,
   for easier reading in debugging, printing, and railroad diagrams.

 - Fixed (or at least reduced) elusive bug when generating railroad diagrams,
   where some diagram elements were just empty blocks. Fix submitted by RoDuth,
   thanks a ton!

 - Fixed railroad diagrams that get generated with a parser containing a Regex element
   defined using a verbose pattern - the pattern gets flattened and comments removed
   before creating the corresponding diagram element.

 - Defined a more performant regular expression used internally by `common_html_entity`.

 - `Regex` instances can now be created using a callable that takes no arguments
   and just returns a string or a compiled regular expression, so that creating complex
   regular expression patterns can be deferred until they are actually used for the first
   time in the parser.

 - Added optional `flatten` Boolean argument to `ParseResults.as_list()`, to
   return the parsed values in a flattened list.

 - Added `indent` and `base_1` arguments to `pyparsing.testing.with_line_numbers`. When
   using `with_line_numbers` inside a parse action, set `base_1=False`, since the
   reported `loc` value is 0-based. `indent` can be a leading string (typically of
   spaces or tabs) to indent the numbered string passed to `with_line_numbers`.
   Added while working on 557, reported by Bernd Wechner.

- NEW/ENHANCED EXAMPLES

 - Added query syntax to `mongodb_query_expression.py` with:
   - better support for array fields ("contains all",
     "contains any", and "contains none")
   - "like" and "not like" operators to support SQL "%" wildcard matching
     and "=~" operator to support regex matching
   - text search using "search for"
   - dates and datetimes as query values
   - `a[0]` style array referencing

 - Added `lox_parser.py` example, a parser for the Lox language used as a tutorial in
   Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/).
   With helpful corrections from RoDuth.

 - Added `complex_chemical_formulas.py` example, to add parsing capability for
   formulas such as "3(C₆H₅OH)₂".

 - Updated `tag_emitter.py` to use new `Tag` class, introduced in pyparsing
   3.1.3.

3.1.4

----------------------------
- Fixed a regression introduced in pyparsing 3.1.3, addition of a type annotation that
referenced `re.Pattern`. Since this type was introduced in Python 3.7, using this type
definition broke Python 3.6 installs of pyparsing 3.1.3. PR submitted by Felix Fontein,
nice work!

3.1.3

----------------------------
- Added new `Tag` ParserElement, for inserting metadata into the parsed results.
This allows a parser to add metadata or annotations to the parsed tokens.
The `Tag` element also accepts an optional `value` parameter, defaulting to `True`.
See the new `tag_metadata.py` example in the `examples` directory.

Example:

     # add tag indicating mood
     end_punc = "." | ("!" + Tag("enthusiastic"))
     greeting = "Hello" + Word(alphas) + end_punc

     result = greeting.parse_string("Hello World.")
     print(result.dump())

     result = greeting.parse_string("Hello World!")
     print(result.dump())

prints:

     ['Hello', 'World', '.']

     ['Hello', 'World', '!']
     - enthusiastic: True

- Added example `mongodb_query_expression.py`, to convert human-readable infix query
expressions (such as `a==100 and b>=200`) and transform them into the equivalent
query argument for the pymongo package (`{'$and': [{'a': 100}, {'b': {'$gte': 200}}]}`).
Supports many equality and inequality operators - see the docstring for the
`transform_query` function for more examples.

- Fixed issue where PEP8 compatibility names for `ParserElement` static methods were
not themselves defined as `staticmethods`. When called using a `ParserElement` instance,
this resulted in a `TypeError` exception. Reported by eylenburg (548).

- To address a compatibility issue in RDFLib, added a property setter for the
`ParserElement.name` property, to call `ParserElement.set_name`.

- Modified `ParserElement.set_name()` to accept a None value, to clear the defined
name and corresponding error message for a `ParserElement`.

- Updated railroad diagram generation for `ZeroOrMore` and `OneOrMore` expressions with
`stop_on` expressions, while investigating 558, reported by user Gu_f.

- Added `<META>` tag to HTML generated for railroad diagrams to force UTF-8 encoding
with older browsers, to better display Unicode parser characters.

- Fixed some cosmetics/bugs in railroad diagrams:
  - fixed groups being shown even when `show_groups=False`
  - show results names as quoted strings when `show_results_names=True`
  - only use integer loop counter if repetition > 2

- Some type annotations added for parse action related methods, thanks August
Karlstedt (551).

- Added exception type to `trace_parse_action` exception output, while investigating
SO question posted by medihack.

- Added `set_name` calls to internal expressions generated in `infix_notation`, for
improved railroad diagramming.

- `delta_time`, `lua_parser`, `decaf_parser`, and `roman_numerals` examples cleaned up
to use latest PEP8 names and add minor enhancements.

- Fixed bug (and corresponding test code) in `delta_time` example that did not handle
weekday references in time expressions (like "Monday at 4pm") when the weekday was
the same as the current weekday.

- Minor performance speedup in `trim_arity`, to benefit any parsers using parse actions.

- Added early testing support for Python 3.13 with JIT enabled.

3.1.2

---------------------------
- Added `ieee_float` expression to `pyparsing.common`, which parses float values,
plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (538).

- Updated pep8 synonym wrappers for better type checking compatibility. PR submitted
by Ricardo Coccioli (507).

- Fixed empty error message bug, PR submitted by InSync (534). This _should_ return
pyparsing's exception messages to a former, more helpful form. If you have code that
parses the exception messages returned by pyparsing, this may require some code
changes.

- Added unit tests to test for exception message contents, with enhancement to
`pyparsing.testing.assertRaisesParseException` to accept an expected exception message.

- Updated example `select_parser.py` to use PEP8 names and added Groups for better retrieval
of parsed values from multiple SELECT clauses.

- Added example `email_address_parser.py`, as suggested by John Byrd (539).

- Added example `directx_x_file_parser.py` to parse DirectX template definitions, and
generate a Pyparsing parser from a template to parse .x files.

- Some code refactoring to reduce code nesting, PRs submitted by InSync.

- All internal string expressions using '%' string interpolation and `str.format()`
converted to f-strings.

3.1.1

--------------------------
- Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue 502)

- Fixed bug in bad exception messages raised by Forward expressions. PR submitted
by Kyle Sunden, thanks for your patience and collaboration on this (493).

- Fixed regression in SkipTo, where ignored expressions were not checked when looking
for the target expression. Reported by catcombo, Issue 500.

- Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue 498)

- Some general internal code cleanup. (Instigated by Michal Čihař, Issue 488)

3.1.0

--------------------------
- Added `tag_emitter.py` to examples. This example demonstrates how to insert
tags into your parsed results that are not part of the original parsed text.

3.1.0b2

---------------------------
- Updated `create_diagram()` code to be compatible with railroad-diagrams package
version 3.0. Fixes Issue 477 (railroad diagrams generated with black bars),
reported by Sam Morley-Short.

- Fixed bug in `NotAny`, where parse actions on the negated expr were not being run.
This could cause `NotAny` to incorrectly fail if the expr would normally match,
but would fail to match if a condition used as a parse action returned False.
Fixes Issue 482, raised by byaka, thank you!

- Fixed `create_diagram()` to accept keyword args, to be passed through to the
`template.render()` method to generate the output HTML (PR submitted by Aussie Schnore,
good catch!)

- Fixed bug in `python_quoted_string` regex.

- Added `examples/bf.py` Brainf*ck parser/executor example. Illustrates using
a pyparsing grammar to parse language syntax, and attach executable AST nodes to
the parsed results.

3.1.0b1

-----------------------------
- Added support for Python 3.12.

- API CHANGE: A slight change has been implemented when unquoting a quoted string
parsed using the `QuotedString` class. Formerly, when unquoting and processing
whitespace markers such as \t and \n, these substitutions would occur first, and
then any additional '\' escaping would be done on the resulting string. This would
parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed
in a single pass working left to right, so the quoted string "\\n" would get unquoted
to "\n" (a backslash followed by "n"). Fixes issue 474 raised by jakeanq,
thanks!

- Added named field "url" to `pyparsing.common.url`, returning the entire
parsed URL string.

- Fixed bug where the results name was not saved when a parse action returned an
empty string for an expression that had a results name. That is:

   expr = Literal("X").add_parse_action(lambda tokens: "")("value")
   result = expr.parse_string("X")
   print(result["value"])

would raise a `KeyError`. Now empty strings will be saved with the associated
results name. Raised in Issue 470 by Nicco Kunzmann, thank you.

- Fixed bug in `SkipTo` where ignore expressions were not properly handled while
scanning for the target expression. Issue 475, reported by elkniwt, thanks
(this bug has been there for a looooong time!).

- Updated `ci.yml` permissions to limit default access to source - submitted by Joyce
Brum of Google. Thanks so much!

- Updated the `lucene_grammar.py` example (better support for '*' and '?' wildcards)
and corrected the test cases - brought to my attention by Elijah Nicol, good catch!

3.1.0a1

-----------------------------
- API ENHANCEMENT: `Optional(expr)` may now be written as `expr | ""`

This will make this code:

   "{" + Optional(Literal("A") | Literal("a")) + "}"

writable as:

   "{" + (Literal("A") | Literal("a") | "") + "}"

Some related changes implemented as part of this work:
  - `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
  - `Empty` is now a subclass of `Literal`

Suggested by Antony Lee (issue 412), PR (413) by Devin J. Pohly.

- Added new class property `identifier` to all Unicode set classes in `pyparsing.unicode`,
using the class's values for `cls.identchars` and `cls.identbodychars`. Now Unicode-aware
parsers that formerly wrote:

   ppu = pyparsing.unicode
   ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)

can now write:

   ident = ppu.Greek.identifier
   # or
   # ident = ppu.Ελληνικά.identifier

- `ParseResults` now has a new method `deepcopy()`, in addition to the current
`copy()` method. `copy()` only makes a shallow copy - any contained `ParseResults`
are copied as references - changes in the copy will be seen as changes in the original.
In many cases, a shallow copy is sufficient, but some applications require a deep copy.
`deepcopy()` makes a deeper copy: any contained `ParseResults` or other mappings or
containers are built with copies from the original, and do not get changed if the
original is later changed. Addresses issue 463, reported by Bryn Pickering.

- Reworked `delimited_list` function into the new `DelimitedList` class.
`DelimitedList` has the same constructor interface as `delimited_list`, and
in this release, `delimited_list` changes from a function to a synonym for
`DelimitedList`. `delimited_list` and the older `delimitedList` method will be
deprecated in a future release, in favor of `DelimitedList`.

- Error messages from `MatchFirst` and `Or` expressions will try to give more details
if one of the alternatives matches better than the others, but still fails.
Question raised in Issue 464 by msdemlei, thanks!

- Added new class method `ParserElement.using_each`, to simplify code
that creates a sequence of `Literals`, `Keywords`, or other `ParserElement`
subclasses.

For instance, to define suppressible punctuation, you would previously
write:

   LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")

You can now write:

   LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")

`using_each` will also accept optional keyword args, which it will
pass through to the class initializer. Here is an expression for
single-letter variable names that might be used in an algebraic
expression:

   algebra_var = MatchFirst(
       Char.using_each(string.ascii_lowercase, as_keyword=True)
   )

- Added new builtin `python_quoted_string`, which will match any form
of single-line or multiline quoted strings defined in Python. (Inspired
by discussion with Andreas Schörgenhumer in Issue 421.)

- Extended `expr[]` notation for repetition of `expr` to accept a
slice, where the slice's stop value indicates a `stop_on`
expression:

   test = "BEGIN aaa bbb ccc END"
   BEGIN, END = Keyword.using_each("BEGIN END".split())
   body_word = Word(alphas)

   expr = BEGIN + Group(body_word[...:END]) + END
   # equivalent to
   # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END

   print(expr.parse_string(test))

Prints:

   ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']

- `ParserElement.validate()` is deprecated. It predates the support for left-recursive
parsers, and was prone to false positives (warning that a grammar was invalid when
it was in fact valid).  It will be removed in a future pyparsing release. In its
place, developers should use debugging and analytical tools, such as `ParserElement.set_debug()`
and `ParserElement.create_diagram()`.
(Raised in Issue 444, thanks Andrea Micheli!)

- Added bool `embed` argument to `ParserElement.create_diagram()`.
When passed as True, the resulting diagram will omit the `<DOCTYPE>`,
`<HEAD>`, and `<BODY>` tags so that it can be embedded in other
HTML source. (Useful when embedding a call to `create_diagram()` in
a PyScript HTML page.)

- Added `recurse` argument to `ParserElement.set_debug` to set the
debug flag on an expression and all of its sub-expressions. Requested
by multimeric in Issue 399.

- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.

- Fixed bug in `Word` when `max=2`. Also added performance enhancement
when specifying `exact` argument. Reported in issue 409 by
panda-34, nice catch!

- `Word` arguments are now validated: if `min` and `max` are both
given, `min` must be <= `max`; a `ValueError` is raised otherwise.

- Fixed bug in srange, when parsing escaped '/' and '\' inside a
range set.

- Fixed exception messages for some `ParserElements` with custom names,
which instead showed their contained expression names.

- Fixed bug in pyparsing.common.url, when input URL is not alone
on an input line. Fixes Issue 459, reported by David Kennedy.

- Multiple added and corrected type annotations. With much help from
Stephen Rosen, thanks!

- Some documentation and error message clarifications on pyparsing's
keyword logic, cited by Basil Peace.

- General docstring cleanup for Sphinx doc generation, PRs submitted
by Devin J. Pohly. A dirty job, but someone has to do it - much
appreciated!

- `invRegex.py` example renamed to `inv_regex.py` and updated to PEP-8
variable and method naming. PR submitted by Ross J. Duff, thanks!

- Removed examples `sparser.py` and `pymicko.py`, since each included its
own GPL license in the header. Since this conflicts with pyparsing's
MIT license, they were removed from the distribution to avoid
confusion among those making use of them in their own projects.

3.0.9

-------------------------
- Added Unicode set `BasicMultilingualPlane` (may also be referenced
as `BMP`) representing the Basic Multilingual Plane (Unicode
characters up to code point 65535). Can be used to parse
most language characters, but omits emojis, wingdings, etc.
Raised in discussion with Dave Tapley (issue 392).

- To address mypy confusion of `pyparsing.Optional` and `typing.Optional`
resulting in `error: "_SpecialForm" not callable` message
reported in issue 365, fixed the import in `exceptions.py`. Nice
sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you!
(Removed definitions of `OptionalType`, `DictType`, and `IterableType`
and replaced them with `typing.Optional`, `typing.Dict`, and
`typing.Iterable` throughout.)

- Fixed typo in jinja2 template for railroad diagrams, thanks for the
catch Nioub (issue 388).

- Removed use of deprecated `pkg_resources` package in
railroad diagramming code (issue 391).

- Updated `bigquery_view_parser.py` example to parse examples at
https://cloud.google.com/bigquery/docs/reference/legacy-sql

3.0.8

---------------------------
- API CHANGE: modified `pyproject.toml` to require Python version
3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6
fail in evaluating the `version_info` class (implemented using
`typing.NamedTuple`). If you are using an earlier version of Python
3.6, you will need to use pyparsing 2.4.7.

- Improved pyparsing import time by deferring regex pattern compiles.
PR submitted by Anthony Sottile to fix issue 362, thanks!

- Updated build to use flit, PR by Michał Górny, added `BUILDING.md`
doc and removed old Windows build scripts - nice cleanup work!

- More type-hinting added for all arithmetic and logical operator
methods in `ParserElement`. PR from Kazantcev Andrey, thank you.

- Fixed `infix_notation`'s definitions of `lpar` and `rpar`, to accept
parse expressions such that they do not get suppressed in the parsed
results. PR submitted by Philippe Prados, nice work.

- Fixed bug in railroad diagramming with expressions containing `Combine`
elements. Reported by Jeremy White, thanks!

- Added `show_groups` argument to `create_diagram` to highlight grouped
elements with an unlabeled bounding box.

- Added `unicode_denormalizer.py` to the examples as a demonstration
of how Python's interpreter will accept Unicode characters in
identifiers, but normalizes them back to ASCII so that identifiers
`print` and `𝕡𝓻ᵢ𝓃𝘁` and `𝖕𝒓𝗂𝑛ᵗ` are all equivalent.

- Removed imports of deprecated `sre_constants` module for catching
exceptions when compiling regular expressions. PR submitted by
Serhiy Storchaka, thank you.