==========
Change Log
==========
Version 3.0.0b2 - December, 2020
--------------------------------
- API CHANGE
`locatedExpr` is being replaced by the class `Located`. `Located` has the same
constructor interface as `locatedExpr`, but fixes bugs in the returned
`ParseResults` when the searched expression contains multiple tokens, or
has internal results names.
`locatedExpr` is deprecated, and will be removed in a future release.
Version 3.0.0b1 - November, 2020
--------------------------------
- API CHANGE
Diagnostic flags have been moved to an enum, `pyparsing.Diagnostics`, and
they are enabled through module-level methods:
- `pyparsing.enable_diag()`
- `pyparsing.disable_diag()`
- `pyparsing.enable_all_warnings()`
- API CHANGE
Most `SyntaxWarnings` previously emitted when pyparsing classes were used
incorrectly have been converted to `TypeError` and `ValueError` exceptions,
consistent with Python calling conventions. All warnings emitted by diagnostic
flags have been converted from `SyntaxWarnings` to `UserWarnings`.
- To support parsers that are intended to generate native Python collection
types such as lists and dicts, the `Group` and `Dict` classes now accept an
additional boolean keyword argument `aslist` and `asdict` respectively. See
the `jsonParser.py` example in the `pyparsing/examples` source directory for
how to return types as `ParseResults` and as Python collection types, and the
distinctions in working with the different types.
In addition, parse actions that must return a value of list type (which would
normally be converted internally to a ParseResults) can override this default
behavior by returning their list wrapped in the new `ParseResults.List` class:
    # this parse action tries to return a list, but pyparsing
    # will convert to a ParseResults
    def return_as_list_but_still_get_parse_results(tokens):
        return tokens.asList()

    # this parse action returns the tokens as a list, and pyparsing will
    # maintain its list type in the final parsing results
    def return_as_list(tokens):
        return ParseResults.List(tokens.asList())
This is the mechanism used internally by the `Group` class when defined
using `aslist=True`.
- A new `IndentedBlock` class is introduced, to eventually replace the
current `indentedBlock` helper method. The interface is largely the same,
however, the new class manages its own internal indentation stack, so
it is no longer necessary to maintain an external `indentStack` variable.
- API CHANGE
Added `cache_hit` keyword argument to debug actions. Previously, if packrat
parsing was enabled, the debug methods were not called in the event of cache
hits. Now these methods will be called, with an added argument
`cache_hit=True`.
If you are using packrat parsing and enable debug on expressions using a
custom debug method, you can add a `cache_hit=False` keyword argument to that
method's signature, and it will then be called on packrat cache hits. If you
choose not to add this keyword argument, the debug methods will fail silently
on cache hits, behaving as they did previously.
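A custom debug action that opts in to cache-hit reporting might look like the sketch below. The function name and the `(instring, loc, expr, cache_hit)` argument order for a "start" action are assumptions made for illustration and should be checked against the `setDebugActions` documentation. The function is called directly here rather than by pyparsing.

```python
def start_debug_action(instring, loc, expr, cache_hit=False):
    """Hypothetical 'start' debug action; returns the line it would log."""
    # Mark packrat cache hits with a leading '*'
    marker = "*" if cache_hit else ""
    return f"{marker}Match {expr} at loc {loc}"


# Direct calls standing in for what the parser would do
print(start_debug_action("1 2 3", 0, "integer"))
print(start_debug_action("1 2 3", 0, "integer", cache_hit=True))
```

Because `cache_hit` has a default value, the same function still works when it is called without that argument.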
- When using `setDebug` with packrat parsing enabled, packrat cache hits will
now be included in the output, shown with a leading '*'. (Previously, cache
hits and responses were not included in debug output.) For those using custom
debug actions, see the previous item regarding an optional API change
for those methods.
- `setDebug` output will also show more details about what expression
is about to be parsed (the current line of text being parsed, and
the current parse position):
Match integer at loc 0(1,1)
1 2 3
^
Matched integer -> ['1']
The current debug location will also be indicated after whitespace
has been skipped (was previously inconsistent, reported in Issue #244,
by Frank Goyens, thanks!).
- Modified the repr() output for `ParseResults` to include the class
name as part of the output. This is to clarify for new pyparsing users
who misread the repr output as a tuple of a list and a dict. pyparsing
results will now read like:
ParseResults(['abc', 'def'], {'qty': 100})
instead of just:
(['abc', 'def'], {'qty': 100})
- Fixed bugs in Each when passed OneOrMore or ZeroOrMore expressions:
. first expression match could be enclosed in an extra nesting level
. out-of-order expressions now handled correctly if mixed with required
expressions
. results names are maintained correctly for these expressions
- Fixed traceback trimming, and added `ParserElement.verbose_traceback`
save/restore to `reset_pyparsing_context()`.
- Default string for `Word` expressions now also includes indications of
`min` and `max` length specification, if applicable, similar to regex length
specifications:
Word(alphas) -> "W:(A-Za-z)"
Word(nums) -> "W:(0-9)"
Word(nums, exact=3) -> "W:(0-9){3}"
Word(nums, min=2) -> "W:(0-9){2,...}"
Word(nums, max=3) -> "W:(0-9){1,3}"
Word(nums, min=2, max=3) -> "W:(0-9){2,3}"
For expressions of the `Char` class (similar to `Word(..., exact=1)`), the default
string is simply the character range in parentheses:
Char(nums) -> "(0-9)"
Char(alphas) -> "(A-Za-z)"
- Removed `copy()` override in `Keyword` class which did not preserve definition
of ident chars from the original expression. PR #233 submitted by jgrey4296,
thanks!
- In addition to `pyparsing.__version__`, there is now also a `pyparsing.__version_info__`,
following the same structure and field names as in `sys.version_info`.
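As a rough illustration of what "the same structure and field names" allows, the sketch below builds a stand-in named tuple with the field names of `sys.version_info`. The `VersionInfo` type and the values shown are hypothetical, not copied from pyparsing.

```python
from collections import namedtuple

# Hypothetical stand-in using the field names of sys.version_info
VersionInfo = namedtuple("VersionInfo", "major minor micro releaselevel serial")

# Illustrative values only
version_info = VersionInfo(3, 0, 0, "beta", 2)

# Fields can be read by name, and slices compare like ordinary tuples
is_v3_or_later = version_info[:2] >= (3, 0)
print(version_info.major, version_info.releaselevel, is_v3_or_later)
```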
Version 3.0.0a2 - June, 2020
----------------------------
- Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0"
documentation.
- API CHANGE
Changed result returned when parsing using countedArray,
the array items are no longer returned in a doubly-nested
list.
- An excellent new enhancement is the new railroad diagram
generator for documenting pyparsing parsers:
import pyparsing as pp
from pyparsing.diagram import to_railroad, railroad_to_html
from pathlib import Path
# define a simple grammar for parsing street addresses such
# as "123 Main Street"
# number word...
number = pp.Word(pp.nums).setName("number")
name = pp.Word(pp.alphas).setName("word")[1, ...]
parser = number("house_number") + name("street")
parser.setName("street address")
# construct railroad track diagram for this parser and
# save as HTML
rr = to_railroad(parser)
Path('parser_rr_diag.html').write_text(railroad_to_html(rr))
Very nice work provided by Michael Milton, thanks a ton!
- Enhanced default strings created for Word expressions, now showing
string ranges if possible. `Word(alphas)` would formerly
print as `W:(ABCD...)`, now prints as `W:(A-Za-z)`.
- Added ignoreWhitespace(recurse:bool = True) and added a
recurse argument to leaveWhitespace, both added to provide finer
control over pyparsing's whitespace skipping. Also contributed
by Michael Milton.
- The unicode range definitions for the various languages were
recalculated by interrogating the unicodedata module by character
name, selecting characters that contained that language in their
Unicode name. (Issue #227)
Also, pyparsing_unicode.Korean was renamed to Hangul (Korean
is also defined as a synonym for compatibility).
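The name-based selection described above can be approximated with the standard `unicodedata` module. This is a minimal sketch of the idea, not pyparsing's actual code. The helper name `chars_named` and the scanned code point range are assumptions.

```python
import unicodedata


def chars_named(language, start, stop):
    """Return characters in range(start, stop) whose Unicode name contains `language`."""
    keyword = language.upper()
    return [
        chr(cp)
        for cp in range(start, stop)
        # unassigned code points have no name, so fall back to ""
        if keyword in unicodedata.name(chr(cp), "")
    ]


# Scan a small block for illustration
hangul_chars = chars_named("Hangul", 0x1100, 0x1200)
print(len(hangul_chars) > 0)
```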
- Enhanced ParseResults dump() to show both results names and list
subitems. Fixes bug where adding a results name would hide
lower-level structures in the ParseResults.
- Added new __diag__ warnings:
"warn_on_parse_using_empty_Forward" - warns that a Forward
has been included in a grammar, but no expression was
attached to it using '<<=' or '<<'
"warn_on_assignment_to_Forward" - warns that a Forward has
been created, but was probably later overwritten by
erroneously using '=' instead of '<<=' (this is a common
mistake when using Forwards)
(**currently not working on PyPy**)
- Added ParserElement.recurse() method to make it simpler for
grammar utilities to navigate through the tree of expressions in
a pyparsing grammar.
- Fixed bug in ParseResults repr() which showed all matching
entries for a results name, even if listAllMatches was set
to False when creating the ParseResults originally. Reported
by Nicholas42 on GitHub, good catch! (Issue #205)
- Modified refactored modules to use relative imports, as
pointed out by setuptools project member jaraco, thank you!
- Off-by-one bug found in the roman_numerals.py example, a bug
that has been there for about 14 years! PR submitted by
Jay Pedersen, nice catch!
- A simplified Lua parser has been added to the examples
(lua_parser.py).
- Added make_diagram.py to the examples directory to demonstrate
creation of railroad diagrams for selected pyparsing examples.
Also restructured some examples to make their parsers importable
without running their embedded tests.
Version 3.0.0a1 - April, 2020
-----------------------------
- Removed Py2.x support and other deprecated features. Pyparsing
now requires Python 3.5 or later. If you are using an earlier
version of Python, you must use a Pyparsing 2.4.x version.
Deprecated features removed:
. ParseResults.asXML() - if used for debugging, switch
to using ParseResults.dump(); if used for data transfer,
use ParseResults.asDict() to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format
. operatorPrecedence synonym for infixNotation -
convert to calling infixNotation
. commaSeparatedList - convert to using
pyparsing_common.comma_separated_list
. upcaseTokens and downcaseTokens - convert to using
pyparsing_common.upcaseTokens and downcaseTokens
. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
review use of names for MatchFirst and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use `__diag__.warn_multiple_tokens_in_named_alternation`
to help identify those expressions in your parsers that
will have changed as a result.
- Removed support for running `python setup.py test`. The setuptools
maintainers consider the test command deprecated (see
<https://github.com/pypa/setuptools/issues/1684>). To run the Pyparsing tests,
use the command `tox`.
- API CHANGE:
The staticmethod `ParseException.explain` has been moved to
`ParseBaseException.explain_exception`, and a new `explain` instance
method added to ParseBaseException. This will make calls to `explain`
much more natural:
    try:
        expr.parseString("...")
    except ParseException as pe:
        print(pe.explain())
- POTENTIAL API CHANGE:
ZeroOrMore expressions that have results names will now
include empty lists for their name if no matches are found.
Previously, no named result would be present. Code that tested
for the presence of any expressions using "if name in results:"
will now always return True. This code will need to change to
"if name in results and results[name]:" or just
"if results[name]:". Also, any parser unit tests that check the
asDict() contents will now see additional entries for parsers
having named ZeroOrMore expressions, whose values will be `[]`.
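The difference between a presence test and a truthiness test can be seen with plain dicts standing in for the named results. This is only an illustration: the dicts and the `has_items` helper are hypothetical, and real `ParseResults` objects are not used here.

```python
# Plain dicts standing in for named results before and after the change
old_results = {}             # no entry when ZeroOrMore matched nothing
new_results = {"items": []}  # an empty list is now stored under the name


def has_items(results, name="items"):
    """Presence test that also requires a non-empty value."""
    return name in results and bool(results[name])


print("items" in old_results, "items" in new_results)
print(has_items(old_results), has_items(new_results))
```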
- POTENTIAL API CHANGE:
Fixed a bug in which calls to ParserElement.setDefaultWhitespaceChars
did not change whitespace definitions on any pyparsing built-in
expressions defined at import time (such as quotedString, or those
defined in pyparsing_common). This would lead to confusion when
built-in expressions would not use updated default whitespace
characters. Now a call to ParserElement.setDefaultWhitespaceChars
will also go and update all pyparsing built-ins to use the new
default whitespace characters. (Note that this will only modify
expressions defined within the pyparsing module.) Prompted by
work on a StackOverflow question posted by jtiai.
- Expanded __diag__ and __compat__ to actual classes instead of
just namespaces, to add some helpful behavior:
- .enable() and .disable() methods to give extra
help when setting or clearing flags (detects invalid
flag names, detects when trying to set a __compat__ flag
that is no longer settable). Use these methods now to
set or clear flags, instead of directly setting to True or
False.
import pyparsing as pp
pp.__diag__.enable("warn_multiple_tokens_in_named_alternation")
- __diag__.enable_all_warnings() is another helper that sets
all "warn*" diagnostics to True.
pp.__diag__.enable_all_warnings()
- added new warning, "warn_on_match_first_with_lshift_operator" to
warn when using '<<' with a '|' MatchFirst operator, which will
create an unintended expression due to precedence of operations.
Example: This statement will erroneously define the `fwd` expression
as just `expr_a`, even though `expr_a | expr_b` was intended,
since '<<' operator has precedence over '|':
fwd << expr_a | expr_b
To correct this, use the '<<=' operator (preferred) or parentheses
to override operator precedence:
fwd <<= expr_a | expr_b
or
fwd << (expr_a | expr_b)
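The precedence problem can be reproduced without pyparsing. The `Expr` class below is a hypothetical stand-in that only records how the operators combine. It is a sketch of the operator behavior, not pyparsing's `Forward` implementation.

```python
class Expr:
    """Minimal stand-in that records how operators combine (not pyparsing code)."""

    def __init__(self, name):
        self.name = name
        self.contents = None

    def __or__(self, other):
        return Expr(f"({self.name} | {other.name})")

    def __lshift__(self, other):
        # attach the right-hand expression, as a Forward would
        self.contents = other.name
        return self

    __ilshift__ = __lshift__


expr_a, expr_b = Expr("expr_a"), Expr("expr_b")

fwd = Expr("fwd")
fwd << expr_a | expr_b       # evaluated as (fwd << expr_a) | expr_b
wrong = fwd.contents

fwd = Expr("fwd")
fwd <<= expr_a | expr_b      # right-hand side is evaluated first
right = fwd.contents

print(wrong, right)
```

Python gives `<<` higher precedence than `|`, while the augmented assignment `<<=` always evaluates its whole right-hand side before applying the operator.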
- Cleaned up default tracebacks when getting a ParseException when calling
parseString. Exception traces should now stop at the call in parseString,
and not include the internal traceback frames. (If the full traceback
is desired, then set ParserElement.verbose_traceback to True.)
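One way this kind of traceback trimming can work in general is to re-raise the exception with its traceback reset at the public entry point. The sketch below uses only the standard library. The function names and the `verbose` parameter are hypothetical, and it is not claimed to be pyparsing's implementation.

```python
import traceback


def _match():
    raise ValueError("Expected integer")  # stands in for a ParseException


def _parse_internal():
    _match()


def parse(verbose=False):
    try:
        _parse_internal()
    except ValueError as exc:
        if verbose:
            raise                        # keep every internal frame
        raise exc.with_traceback(None)   # restart the traceback at this frame


def traceback_depth(verbose):
    """Count the frames recorded in the traceback of the raised exception."""
    try:
        parse(verbose)
    except ValueError as exc:
        return len(traceback.extract_tb(exc.__traceback__))


print(traceback_depth(verbose=True), traceback_depth(verbose=False))
```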
- Fixed FutureWarnings that are sometimes raised when '[' is passed as a
character to Word.
- New namespace, assert methods and classes added to support writing
unit tests.
- assertParseResultsEquals
- assertParseAndCheckList
- assertParseAndCheckDict
- assertRunTestResults
- assertRaisesParseException
- reset_pyparsing_context context manager, to restore pyparsing
config settings
- Enhanced error messages and error locations when parsing fails on
the Keyword or CaselessKeyword classes due to the presence of a
preceding or trailing keyword character. Surfaced while
working with metaperl on issue #201.
- Enhanced the Regex class to be compatible with patterns compiled using the
third-party regex module, which provides an re-compatible interface. Individual
expressions can be built with regex-compiled patterns using:
import pyparsing as pp
import regex
# would use regex for this expression
integer_parser = pp.Regex(regex.compile(r'\d+'))
Inspired by PR submitted by bjrnfrdnnd on GitHub, very nice!
- Fixed handling of ParseSyntaxExceptions raised as part of Each
expressions, when sub-expressions contain '-' backtrack
suppression. As part of resolution to a question posted by John
Greene on StackOverflow.
- Potentially *huge* performance enhancement when parsing Word
expressions built from pyparsing_unicode character sets. Word now
internally converts ranges of consecutive characters to regex
character ranges (converting "0123456789" to "0-9" for instance),
resulting in as much as 50X improvement in performance! Work
inspired by a question posted by Midnighter on StackOverflow.
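The range conversion mentioned above can be sketched as follows. This is a simplified illustration rather than pyparsing's internal code. The function name `collapse_ranges` is hypothetical, and escaping of regex-special characters is deliberately left out.

```python
def collapse_ranges(chars):
    """Collapse runs of consecutive characters into regex-style ranges."""
    ordered = sorted(set(chars))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # extend the run while the next code point is adjacent
        while j + 1 < len(ordered) and ord(ordered[j + 1]) == ord(ordered[j]) + 1:
            j += 1
        if j - i >= 2:
            # three or more consecutive characters become "first-last"
            parts.append(f"{ordered[i]}-{ordered[j]}")
        else:
            parts.extend(ordered[i:j + 1])
        i = j + 1
    return "".join(parts)


print(collapse_ranges("0123456789"))
```

A character class built from the collapsed form is much shorter than one listing every character, which matters most for large Unicode sets.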
- Improvements in select_parser.py, to include new SQL syntax
from SQLite. PR submitted by Robert Coup, nice work!
- Fixed bug in PrecededBy which caused infinite recursion, issue #127
submitted by EdwardJB.
- Fixed bug in CloseMatch where end location was incorrectly
computed; and updated partial_gene_match.py example.
- Fixed bug in indentedBlock with a parser using two different
types of nested indented blocks with different indent values,
but sharing the same indent stack, submitted by renzbagaporo.
- Fixed bug in Each when using Regex, when Regex expression would
get parsed twice; issue #183 submitted by scauligi, thanks!
- BigQueryViewParser.py added to examples directory, PR submitted
by Michael Smedberg, nice work!
- booleansearchparser.py added to examples directory, PR submitted
by xecgr. Builds on searchparser.py, adding support for '*'
wildcards and non-Western alphabets.
- Fixed bug in delta_time.py example, when using a quantity
of seconds/minutes/hours/days > 999.
- Fixed bug in regex definitions for real and sci_real expressions in
pyparsing_common. Issue #194, reported by Michael Wayne Goodman, thanks!
- Fixed FutureWarning raised beginning in Python 3.7 for Regex expressions
containing '[' within a regex set.
- Minor reformatting of output from runTests to make embedded
comments more visible.
- And finally, many thanks to those who helped in the restructuring
of the pyparsing code base as part of this release. Pyparsing now
has more standard package structure, more standard unit tests,
and more standard code formatting (using black). Special thanks
to jdufresne, klahnakoski, mattcarmody, and ckeygusuz, to name just
a few.
Version 2.4.7 - March, 2020 (April, actually)
---------------------------------------------
- Backport of selected fixes from 3.0.0 work:
. Each bug with Regex expressions
. And expressions not properly constructing with generator
. Traceback abbreviation
. Bug in delta_time example
. Fix regexen in pyparsing_common.real and .sci_real
. Avoid FutureWarning on Python 3.7 or later
. Cleanup output in runTests if comments are embedded in test string
Version 2.4.6 - December, 2019
------------------------------
- Fixed typos in White mapping of whitespace characters, to use
correct "\u" prefix instead of "u\".
- Fix bug in left-associative ternary operators defined using
infixNotation. First reported on StackOverflow by user Jeronimo.
- Backport of pyparsing_test namespace from 3.0.0, including
TestParseResultsAsserts mixin class defining unittest-helper
methods:
. def assertParseResultsEquals(
self, result, expected_list=None, expected_dict=None, msg=None)
. def assertParseAndCheckList(
self, expr, test_string, expected_list, msg=None, verbose=True)
. def assertParseAndCheckDict(
self, expr, test_string, expected_dict, msg=None, verbose=True)
. def assertRunTestResults(
self, run_tests_report, expected_parse_results=None, msg=None)
. def assertRaisesParseException(self, exc_type=ParseException, msg=None)
To use the methods in this mixin class, declare your unittest classes as:
    import unittest
    from pyparsing import pyparsing_test as ppt

    class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
        ...
Version 2.4.5 - November, 2019
------------------------------
- NOTE: final release compatible with Python 2.x.
- Fixed issue with reading README.rst as part of setup.py's
initialization of the project's long_description, with a
non-ASCII space character causing errors when installing from
source on platforms where UTF-8 is not the default encoding.
Version 2.4.4 - November, 2019
--------------------------------
- Unresolved symbol reference in 2.4.3 release was masked by stdout
buffering in unit tests, thanks for the prompt heads-up, Ned
Batchelder!
Version 2.4.3 - November, 2019
------------------------------
- Fixed a bug in ParserElement.__eq__ that would for some parsers
create a recursion error at parser definition time. Thanks to
Michael Clerx for the assist. (Addresses issue #123)
- Fixed bug in indentedBlock where a block that ended at the end
of the input string could cause pyparsing to loop forever. Raised
as part of discussion on StackOverflow with geckos.
- Backports from pyparsing 3.0.0:
. __diag__.enable_all_warnings()
. Fixed bug in PrecededBy which caused infinite recursion, issue #127
. support for using regex-compiled RE to construct Regex expressions
Version 2.4.2 - July, 2019
--------------------------
- Updated the shorthand notation that has been added for repetition
expressions: expr[min, max], with '...' valid as a min or max value:
- expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent
to expr*n + ZeroOrMore(expr)
(read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
Better interpretation of [...] as ZeroOrMore raised by crowsonkb,
thanks for keeping me in line!
If upgrading from 2.4.1 or 2.4.1.1 and you have used `expr[...]`
for `OneOrMore(expr)`, it must be updated to `expr[1, ...]`.
- The defaults on all the `__diag__` switches have been set to False,
to avoid getting alarming warnings. To use these diagnostics, set
them to True after importing pyparsing.
Example:
import pyparsing as pp
pp.__diag__.warn_multiple_tokens_in_named_alternation = True
- Fixed bug introduced by the use of __getitem__ for repetition,
overlooking Python's legacy implementation of iteration
by sequentially calling __getitem__ with increasing numbers until
getting an IndexError. Found during investigation of problem
reported by murlock, merci!
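The legacy iteration protocol behind this bug can be shown with a small stand-in class. `Repeater` and `SafeRepeater` are hypothetical names, and the remedy shown is one possible approach, not necessarily the one pyparsing uses.

```python
from itertools import islice


class Repeater:
    """Stand-in whose __getitem__ accepts any key and never raises IndexError."""

    def __getitem__(self, key):
        return f"repeat[{key}]"


# With no __iter__ defined, iter() falls back to calling __getitem__(0),
# __getitem__(1), ... and would never stop, so only three items are taken
first_three = list(islice(iter(Repeater()), 3))


class SafeRepeater(Repeater):
    """One possible remedy: refuse iteration explicitly."""

    def __iter__(self):
        raise TypeError(f"{type(self).__name__} is not iterable")


print(first_three)
```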
Version 2.4.2a1 - July, 2019
----------------------------
It turns out I got the meaning of `[...]` absolutely backwards,
so I've deleted 2.4.1 and am repushing this release as 2.4.2a1
for people to give it a try before I can call it ready to go.
The `expr[...]` notation was pushed out to be synonymous with
`OneOrMore(expr)`, but this is really counter to most Python
notations (and even other internal pyparsing notations as well).
It should have been defined to be equivalent to ZeroOrMore(expr).
- Changed [...] to emit ZeroOrMore instead of OneOrMore.
- Removed code that treats ParserElements like iterables.
- Change all __diag__ switches to False.
Version 2.4.1.1 - July 24, 2019
-------------------------------
This is a re-release of version 2.4.1 to restore the release history
in PyPI, since the 2.4.1 release was deleted.
There are 3 known issues in this release, which are fixed in
the upcoming 2.4.2:
- API change adding support for `expr[...]` - the original
code in 2.4.1 incorrectly implemented this as OneOrMore.
Code using this feature under this release should explicitly
use `expr[0, ...]` for ZeroOrMore and `expr[1, ...]` for
OneOrMore. In 2.4.2 you will be able to write `expr[...]`
equivalent to `ZeroOrMore(expr)`.
- Bug if composing And, Or, MatchFirst, or Each expressions
using a single expression instead of a list. This only affects code which uses
explicit expression construction using the And, Or, etc.
classes instead of using overloaded operators '+', '^', and
so on. If constructing an And using a single expression,
you may get an error that "cannot multiply ParserElement by
0 or (0, 0)" or a Python `IndexError`. Change code like
cmd = Or(Word(alphas))
to
cmd = Or([Word(alphas)])
(Note that this is not the recommended style for constructing
Or expressions.)
- Some newly-added `__diag__` switches are enabled by default,
which may give rise to noisy user warnings for existing parsers.
You can disable them using:
import pyparsing as pp
pp.__diag__.warn_multiple_tokens_in_named_alternation = False
pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
pp.__diag__.warn_name_set_on_empty_Forward = False
pp.__diag__.warn_on_multiple_string_args_to_oneof = False
pp.__diag__.enable_debug_on_named_expressions = False
In 2.4.2 these will all be set to False by default.
Version 2.4.1 - July, 2019
--------------------------
- NOTE: Deprecated functions and features that will be dropped
in pyparsing 2.5.0 (planned next release):
. support for Python 2 - ongoing users running with
Python 2 can continue to use pyparsing 2.4.1
. ParseResults.asXML() - if used for debugging, switch
to using ParseResults.dump(); if used for data transfer,
use ParseResults.asDict() to convert to a nested Python
dict, which can then be converted to XML or JSON or
other transfer format
. operatorPrecedence synonym for infixNotation -
convert to calling infixNotation
. commaSeparatedList - convert to using
pyparsing_common.comma_separated_list
. upcaseTokens and downcaseTokens - convert to using
pyparsing_common.upcaseTokens and downcaseTokens
. __compat__.collect_all_And_tokens will not be settable to
False to revert to pre-2.3.1 results name behavior -
review use of names for MatchFirst and Or expressions
containing And expressions, as they will return the
complete list of parsed tokens, not just the first one.
Use __diag__.warn_multiple_tokens_in_named_alternation
(described below) to help identify those expressions
in your parsers that will have changed as a result.
- A new shorthand notation has been added for repetition
expressions: expr[min, max], with '...' valid as a min
or max value:
- expr[...] is equivalent to OneOrMore(expr)
- expr[0, ...] is equivalent to ZeroOrMore(expr)
- expr[1, ...] is equivalent to OneOrMore(expr)
- expr[n, ...] or expr[n,] is equivalent
to expr*n + ZeroOrMore(expr)
(read as "n or more instances of expr")
- expr[..., n] is equivalent to expr*(0, n)
- expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
- '...' can also be used as shorthand for SkipTo when used
in adding parse expressions to compose an And expression.
Literal('start') + ... + Literal('end')
And(['start', ..., 'end'])
are both equivalent to:
Literal('start') + SkipTo('end')("_skipped*") + Literal('end')
The '...' form has the added benefit of not requiring repeating
the skip target expression. Note that the skipped text is
returned with '_skipped' as a results name, and that the contents of
`_skipped` will contain a list of text from all `...`s in the expression.
- '...' can also be used as a "skip forward in case of error" expression:
expr = "start" + (Word(nums).setName("int") | ...) + "end"
expr.parseString("start 456 end")
['start', '456', 'end']
expr.parseString("start 456 foo 789 end")
['start', '456', 'foo 789 ', 'end']
- _skipped: ['foo 789 ']
expr.parseString("start foo end")
['start', 'foo ', 'end']
- _skipped: ['foo ']
expr.parseString("start end")
['start', '', 'end']
- _skipped: ['missing <int>']
Note that in all the error cases, the '_skipped' results name is
present, showing a list of the extra or missing items.
This form is only valid when used with the '|' operator.
- Improved exception messages to show what was actually found, not
just what was expected.
word = pp.Word(pp.alphas)
pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)
Former exception message:
pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)
New exception message:
pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)
- Added diagnostic switches to help detect and warn about common
parser construction mistakes, or enable additional parse
debugging. Switches are attached to the pyparsing.__diag__
namespace object:
- warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results
name is defined on a MatchFirst or Or expression with one or more And subexpressions
(default=True)
- warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results
name is defined on a containing expression with ungrouped subexpressions that also
have results names (default=True)
- warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined
with a results name, but has no contents defined (default=False)
- warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is
incorrectly called with multiple str arguments (default=True)
- enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent
calls to ParserElement.setName() (default=False)
warn_multiple_tokens_in_named_alternation is intended to help
those who currently have set __compat__.collect_all_And_tokens to
False as a workaround for using the pre-2.3.1 code with named
MatchFirst or Or expressions containing an And expression.
- Added ParseResults.from_dict classmethod, to simplify creation
of a ParseResults with results names using a dict, which may be nested.
This makes it easy to add a sub-level of named items to the parsed
tokens in a parse action.
- Added asKeyword argument (default=False) to oneOf, to force
keyword-style matching on the generated expressions.
- ParserElement.runTests now accepts an optional 'file' argument to
redirect test output to a file-like object (such as a StringIO,
or opened file). Default is to write to sys.stdout.
- conditionAsParseAction is a helper method for constructing a
parse action method from a predicate function that simply
returns a boolean result. Useful for those places where a
predicate cannot be added using addCondition, but must be
converted to a parse action (such as in infixNotation). May be
used as a decorator if default message and exception types
can be used. See ParserElement.addCondition for more details
about the expected signature and behavior for predicate condition
methods.
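A minimal sketch of wrapping a predicate with conditionAsParseAction. The is_even predicate and the accepts helper are hypothetical names used only for this illustration:

```python
import pyparsing as pp

# Predicate: a plain function that simply returns True or False.
def is_even(tokens):
    return int(tokens[0]) % 2 == 0

# Convert the predicate into a parse action that raises ParseException on False.
even_check = pp.conditionAsParseAction(is_even, message="expected an even integer")
even_int = pp.Word(pp.nums).addParseAction(even_check)

def accepts(text):
    """Return True if even_int parses the text, False on ParseException."""
    try:
        even_int.parseString(text)
        return True
    except pp.ParseException:
        return False
```

Here addParseAction is used for brevity; the same converted function could be passed anywhere a parse action is expected.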
- While investigating issue #93, I found that Or and
addCondition could interact to select an alternative that
is not the longest match. This is because Or first checks
all alternatives for matches without running attached
parse actions or conditions, orders by longest match, and
then rechecks for matches with conditions and parse actions.
Some expressions, when rechecked with conditions, may end
up matching a shorter token list than originally matched,
but would still be selected because of their original priority.
This matching code has been expanded to do more extensive
searching for matches when a second-pass check matches a
smaller list than in the first pass.
- Fixed issue #87, a regression in indented block.
Reported by Renz Bagaporo, who submitted a very nice repro
example, which makes the bug-fixing process a lot easier,
thanks!
- Fixed MemoryError issues #85 and #91 with str generation for
Forwards. Thanks decalage2 and Harmon758 for your patience.
- Modified setParseAction to accept None as an argument,
indicating that all previously-defined parse actions for the
expression should be cleared.
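A quick sketch of clearing parse actions this way, using a stand-in integer expression:

```python
import pyparsing as pp

# An integer expression with a parse action that converts to int.
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
converted = integer.parseString("12")[0]

# Passing None clears all previously-defined parse actions.
integer.setParseAction(None)
unconverted = integer.parseString("12")[0]
```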
- Modified pyparsing_common.real and sci_real to parse reals
without leading integer digits before the decimal point,
consistent with Python real number formats. Original PR #98
submitted by ansobolev.
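For instance, a sketch showing reals with no digits before the decimal point (the input strings are arbitrary examples):

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# No leading integer digits before the decimal point.
half = ppc.real.parseString(".5")[0]
fifty = ppc.sci_real.parseString(".5e2")[0]
```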
- Modified runTests to call postParse function before dumping out
the parsed results - allows for postParse to add further results,
such as indications of additional validation success/failure.
- Updated statemachine example: refactored state transitions to use
overridden classmethods; added <statename>Mixin class to simplify
definition of application classes that "own" the state object and
delegate to it to model state-specific properties and behavior.
- Added example nested_markup.py, showing a simple wiki markup with
nested markup directives, and illustrating the use of '...' for
skipping over input to match the next expression. (This example
uses syntax that is not valid under Python 2.)
- Rewrote delta_time.py example (renamed from deltaTime.py) to
fix some omitted formats and upgrade to latest pyparsing idioms,
beginning with writing an actual BNF.
- With help and encouragement from several contributors, including
Matěj Cepl and Cengiz Kaygusuz, I've started cleaning up the internal
coding styles in core pyparsing, bringing it up to modern coding
practices from pyparsing's early development days dating back to
2003. Whitespace has been largely standardized along PEP8 guidelines,
removing extra spaces around parentheses, and adding them around
arithmetic operators and after colons and commas. I was going to hold
off on doing this work until after 2.4.1, but after cleaning up a
few trial classes, the difference was so significant that I continued
on to the rest of the core code base. This should facilitate future
work and submitted PRs, allowing them to focus on substantive code
changes, and not get sidetracked by whitespace issues.
Version 2.4.0 - April, 2019
---------------------------
- Well, it looks like the API change that was introduced in 2.3.1 was more
drastic than expected, so for a friendlier forward upgrade path, this
release:
. Bumps the current version number to 2.4.0, to reflect this
incompatible change.
. Adds a pyparsing.__compat__ object for specifying compatibility with
future breaking changes.
. Conditionalizes the API-breaking behavior, based on the value
pyparsing.__compat__.collect_all_And_tokens. By default, this value
will be set to True, reflecting the new bugfixed behavior. To set this
value to False, add to your code:
import pyparsing
pyparsing.__compat__.collect_all_And_tokens = False
. User code that is dependent on the pre-bugfix behavior can restore
it by setting this value to False.
In 2.5 and later versions, the conditional code will be removed and
setting the flag to True or False in these later versions will have no
effect.
- Updated unitTests.py and simple_unit_tests.py to be compatible with
"python setup.py test". To run tests using setup, do:
python setup.py test
python setup.py test -s unitTests.suite
python setup.py test -s simple_unit_tests.suite
Prompted by issue #83 and PR submitted by bdragon28, thanks.
- Fixed bug in runTests handling '\n' literals in quoted strings.
- Added tag_body attribute to the start tag expressions generated by
makeHTMLTags, so that you can avoid using SkipTo to roll your own
tag body expression:
a, aEnd = pp.makeHTMLTags('a')
link = a + a.tag_body("displayed_text") + aEnd
for t in link.searchString(html_page):
    print(t.displayed_text, '->', t.startA.href)
- indentedBlock failure handling was improved; PR submitted by TMiguelT,
thanks!
- Addressed Py2 incompatibility in simpleUnitTests, plus explain() and
Forward str() cleanup; PRs graciously provided by eswald.
- Fixed docstring with embedded '\w', which creates SyntaxWarnings in
Py3.8, issue #80.
- Examples:
- Added example parser for rosettacode.org tutorial compiler.
- Added example to show how an HTML table can be parsed into a
collection of Python lists or dicts, one per row.
- Updated SimpleSQL.py example to handle nested selects, reworked
'where' expression to use infixNotation.
- Added include_preprocessor.py, similar to macroExpander.py.
- Examples using makeHTMLTags use new tag_body expression when
retrieving a tag's body text.
- Updated examples that are runnable as unit tests:
python setup.py test -s examples.antlr_grammar_tests
python setup.py test -s examples.test_bibparse
Version 2.3.1 - January, 2019
-----------------------------
- POSSIBLE API CHANGE: this release fixes a bug when results names were
attached to a MatchFirst or Or object containing an And object.
Previously, a results name on an And object within an enclosing MatchFirst
or Or could return just the first token in the And. Now, all the tokens
matched by the And are correctly returned. This may result in subtle
changes in the tokens returned if you have this condition in your pyparsing
scripts.
- New staticmethod ParseException.explain() to help diagnose parse exceptions
by showing the failing input line and the trace of ParserElements in
the parser leading up to the exception. explain() returns a multiline
string listing each element by name. (This is still an experimental
method, and the method signature and format of the returned string may
evolve over the next few releases.)
Example:
import pyparsing as pp

# define a parser to parse an integer followed by an
# alphabetic word
expr = (pp.Word(pp.nums).setName("int")
        + pp.Word(pp.alphas).setName("word"))
try:
    # parse a string with a numeric second value instead of alpha
    expr.parseString("123 355")
except pp.ParseException as pe:
    print(pp.ParseException.explain(pe))
Prints:
123 355
    ^
ParseException: Expected word (at char 4), (line:1, col:5)
__main__.ExplainExceptionTest
pyparsing.And - {int word}
pyparsing.Word - word
explain() will accept any exception type and will list the function
names and parse expressions in the stack trace. This is especially
useful when an exception is raised in a parse action.
Note: explain() is only supported under Python 3.
- Fixed bug in dictOf which could match an empty sequence, causing an
infinite loop if wrapped in a OneOrMore.
- Added unicode sets to pyparsing_unicode for Latin-A and Latin-B ranges.
- Added ability to define custom unicode sets as combinations of other sets
using multiple inheritance.
class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
    pass
turkish_word = pp.Word(Turkish_set.alphas)
- Updated state machine import examples, with state machine demos for:
. traffic light
. library book checkin/checkout
. document review/approval
In the traffic light example, you can use the custom 'statemachine' keyword
to define the states for a traffic light, and have the state classes
auto-generated for you:
statemachine TrafficLightState:
    Red -> Green
    Green -> Yellow
    Yellow -> Red
Similar for state machines with named transitions, like the library book
state example:
statemachine LibraryBookState:
    New -(shelve)-> Available
    Available -(reserve)-> OnHold
    OnHold -(release)-> Available
    Available -(checkout)-> CheckedOut
    CheckedOut -(checkin)-> Available
Once the classes are defined, additional Python code can reference those
classes to add class attributes, instance methods, etc.
See the examples in examples/statemachine
- Added an example parser for the decaf language. This language is used in
CS compiler classes in many colleges and universities.
- Fixup of docstrings to Sphinx format, inclusion of test files in the source
package, and conversion of markdown to rst throughout the distribution, great
job by Matěj Cepl!
- Expanded the whitespace characters recognized by the White class to include
all unicode defined spaces. Suggested in Issue #51 by rtkjbillo.
- Added optional postParse argument to ParserElement.runTests() to add a
custom callback to be called for test strings that parse successfully. Useful
for running tests that do additional validation or processing on the parsed
results. See updated chemicalFormulas.py example.
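A sketch of a postParse callback, assuming it is called with the test string and the parsed results and that any string it returns is added to the test output. The check_range function and the limit of 100 are invented for illustration:

```python
import pyparsing as pp

integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

seen = []

def check_range(test_string, parsed):
    """Extra validation, run only for test strings that parse successfully."""
    in_range = parsed[0] < 100
    seen.append((test_string, in_range))
    return "in range" if in_range else "out of range"

# printResults=False keeps this sketch quiet; the callback still runs.
success, results = integer.runTests("""
    5
    500
    """, postParse=check_range, printResults=False)
```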
- Removed distutils fallback in setup.py. If installing the package fails,
please update to the latest version of setuptools. Plus overall project code
cleanup (CRLFs, whitespace, imports, etc.), thanks Jon Dufresne!
- Fixed bug in CaselessKeyword, to make its behavior consistent with
Keyword(caseless=True). Fixes Issue #65 reported by telesphore.
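A small sketch comparing the two spellings on a made-up keyword. It only illustrates the intended consistency and is not a reproduction of the original issue; the matches helper is a hypothetical name:

```python
import pyparsing as pp

kw_a = pp.CaselessKeyword("select")
kw_b = pp.Keyword("select", caseless=True)

def matches(expr, text):
    """Return True if expr parses the start of text, False on ParseException."""
    try:
        expr.parseString(text)
        return True
    except pp.ParseException:
        return False

# Both forms should accept the keyword in any case, and reject it
# when it is only the prefix of a longer word.
same_on_keyword = matches(kw_a, "SELECT") and matches(kw_b, "SELECT")
same_on_prefix = not matches(kw_a, "SELECTED") and not matches(kw_b, "SELECTED")
```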
Version 2.3.0 - October, 2018
-----------------------------
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
This release introduces the pyparsing_unicode namespace class, defining
a series of language character sets to simplify the definition of alphas,
nums, alphanums, and printables in the following language sets:
. Arabic
. Chinese
. Cyrillic
. Devanagari
. Greek
. Hebrew
. Japanese (including Kanji, Katakana, and Hiragana subsets)
. Korean
. Latin1 (includes 7 and 8-bit Latin characters)
. Thai
. CJK (combination of Chinese, Japanese, and Korean sets)
For example, your code can define words using: