Change History
==============
Version 3.0.1
-------------
- Fixed bug when using `None` as a key into a unique index.
- Fixed output width when calling `Table.present()`. Previously, this would limit the
width of the output to the current console default width (often set to 80), even if
sending the output to a file. The width can now be overridden by passing an integer
`width` argument to `Table.present()`.
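  For example (the table name is illustrative):

      my_table.present(width=120)    # format output for a 120-column width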
- Updated `how_to_use_littletable.md` with notes on adding a `User-Agent` header when
importing data from HTTP/HTTPS URLs.
Version 3.0.0
-------------
- Removed deprecated features:
  - `DataObject` class removed; replace with `types.SimpleNamespace`, `dict`,
    `typing.NamedTuple`, `collections.namedtuple`, or another user-defined class
  - `Table.re_match(patt)` comparator removed; replace with `re.compile(patt).match`
- Added `median` to statistics returned from `Table.stats()`.
- Changed SSL context argument handling when importing from a URL, to address the
removal of the `cafile`, `capath`, and `cadata` args in Python 3.13.
- Added support for a `timeout` argument when importing from a URL. The default value
is defined in the module constant `littletable.DEFAULT_HTTP_TIMEOUT` (60 seconds),
which can be modified.
- Added exception types to `__all__`.
Version 2.3.3
-------------
- Fixed bug when using `all` on an indexed field (values would be reported in key
order, not in the order in which they appear in the table).
- Added importing from `.tar.gz` archives.
- Importing from `.zip`, `.gz`, and `.tar.gz` archives is more tolerant of the
archive file name not exactly matching the compressed contents file name,
as long as the archive contains only one file.
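  For example (a sketch, assuming `import littletable as lt`; the archive name is
  illustrative):

      tbl = lt.csv_import("data.csv.tar.gz")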
- Added support for wild card transforms in `csv_import`, using "*" as the transform
key. Imports of data files that have many numeric columns can define a single
wild card transform, rather than having to list all the numeric columns.
Instead of:

      data = """\
      label,a,b,c
      A,1,2,3
      B,4,5,6
      """
      table = littletable.csv_import(data, transforms={"a": int, "b": int, "c": int})

you can write:

      table = littletable.csv_import(data, transforms={"*": int})

"*" will try to convert all fields that are not otherwise listed in `transforms`;
if an exception occurs (as would happen when trying to do `int()` on a `label`
value), the field's value is left as-is.
More information on this feature is in `how_to_use_littletable.md`.
- Extended `Table.compute_field` to accept just a str argument naming an existing
field. Use this to add a field whose name is an easily referenced Python
identifier, mirroring the values of an existing field whose name may contain
embedded spaces, punctuation characters, etc.:

      table.compute_field("caloric_value", "Caloric Value")
- Added `food_data.py` example, working with CSV files downloaded from Kaggle.
Version 2.3.2
-------------
- Renamed `Table.add_field` to `Table.compute_field`, to better reflect that it
can be used to compute a new field, or overwrite the current values for that
field for all the rows in the table, and is not just for adding new fields.
`Table.add_field` will be retained as a compatibility synonym.
Version 2.3.1
-------------
- Added `Table.batched`, similar to `itertools.batched` added in Python 3.12. Returns
a generator that yields tables sliced into n-sized batches:

      for mini_table in tbl.batched(10):
          ... work with table containing only 10 entries ...
- Extended arguments support in `Table.splitby` to accept named arguments that define
the predicate splitting function in terms of the specific values of one or more
row attributes, so that:

      qa_data, production_assembly_data = data.splitby(
          lambda rec: rec.env == "prod" and rec.dept == "assembly"
      )

  can be written as:

      qa_data, production_assembly_data = data.splitby(env="prod", dept="assembly")
- Added `using` argument to `Table.create_search_index` to simplify creation of a
search index attribute built from multiple existing attributes. See this snippet
from the `explore_unicode.py` example:

      unicode.create_search_index(
          "name_words",
          using=["name", "unicode_1_name", "iso10646_comment"],
      )

  The example creates a new field `name_words` by combining the attributes `name`,
`unicode_1_name`, and `iso10646_comment`, and then builds a search index using
this new field.
Version 2.3.0
-------------
- Implemented `get()` for objects returned by `table.by.<indexname>`, to emulate
`dict.get()` behavior (for unique indexes) and `defaultdict.get()` behavior (for
non-unique indexes). <-- POTENTIAL BREAKING CHANGE
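  For example (a sketch, assuming a `catalog` table with a unique index on `sku`):

      item = catalog.by.sku.get("ANVIL-001")    # returns None if no match,
                                                # rather than raising KeyError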
- A new optional `errors` argument was added to `Table.splitby` to define what action
to take if an exception occurs while evaluating the predicate function. Valid values
for `errors` are:

      True     : return exceptions as True
      False    : return exceptions as False
      'discard': do not return table rows that raise exceptions
      'return' : return a third table containing rows that raise exceptions
      'raise'  : raise the exception

  `errors` can also be given as a dict mapping specific Exception types to one of
these 5 values. The default behavior (if `errors` is omitted, or if a raised
exception's type is not listed in an `errors` dict) is to discard the row.
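  For example (a sketch; the field name is illustrative):

      # rows whose predicate evaluation raises an exception are discarded
      small, large = tbl.splitby(lambda rec: int(rec.qty) > 100, errors="discard")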
- New `attrgetter` function accepts a `defaults` dict argument to fill in missing
attributes. It is now used in `Table` sorting, splitting, and exporting, which
would previously raise exceptions if one or more attributes were missing in any
row; missing attributes are now replaced with None, or a given default value.
`littletable.attrgetter` is supported as part of the public API.
- Reworked `groupby` to be more similar to `itertools.groupby`, yielding a `(key, Table)` tuple for
each grouped list of rows. Includes an optional `sort` argument to sort the table before grouping,
using the same key attributes or function used for grouping. The previous prototype `groupby`
method has been renamed to `groupby_with_summaries`, as it supports the addition of dynamically
computed summary fields.
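  For example (a sketch, assuming a `sales` table with a `state` attribute):

      for state, state_sales in sales.groupby("state", sort=True):
          print(state, len(state_sales))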
- Improved unit tests:
- added testing `Tables` containing `dataclasses` with `slots=True`
- added support for test running using `tox`
- modified tests to support running `tox run-parallel` (generate unique test
server ports for different Python versions)
Version 2.2.5
-------------
- Enhanced `Table.by`, `Table.all`, and `Table.search` methods to accept a field name
that is not a valid identifier, by using a callable interface:

      tbl = lt.csv_import(
          "https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true"
      )

      # field "region" is a valid Python identifier
      regions = list(tbl.all.region.unique)

      # field "sub-region" is not a valid Python identifier, use callable interface
      sub_regions = list(tbl.all("sub-region").unique)
- Added `examples/pyscript/matplotlib.html` example, showing how to use `littletable` within
a Pyscript static HTML file.
- Cleaned up bug-prone code in `Table.by` when determining whether an index is
unique. (Not a user-visible bug, just some bug-prone code.)
- Expanded `peps.py` example to Jupyter Notebook `PEPs data demo.ipynb`.
- Renamed `delete_index` to `drop_index` for more symmetry with SQL. `delete_index` is retained
for compatibility.
Version 2.2.4
-------------
- Added support for Python 3.13.
- Renamed `sort` to `orderby` to make the symmetry with relational SQL more apparent.
`sort` will be retained as a deprecated compatibility name.
- Added `Table.rank()` method, to add a ranking attribute to each object in the
table.
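  For example (a sketch; `score` is an illustrative field name):

      tbl.orderby("score")
      tbl.rank()    # adds a ranking attribute to each row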
- Added/cleaned up many type annotations.
Version 2.2.3
-------------
- Fixed bug when calling `add_field` on an existing field that has been indexed:
the index on that field would not reflect the new values.
- Added support for optional named arguments to `csv_import`,
`tsv_import`, `json_import`, and `excel_import` methods when the import
source is an HTTP or HTTPS URL:
  - `headers`: dict to be used as request headers
  - `body`: bytes for request body
  - `username`: str to be added for basic authentication
  - `password`: str to be added for basic authentication (default='')
  - `context`: SSL context passed to urlopen (see the urlopen docs at
    https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen);
    `capath` and `cafile` may be used instead, but use of these arguments is
    deprecated and so discouraged
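  For example (a sketch; the URL and header values are illustrative):

      tbl = lt.csv_import(
          "https://example.com/data.csv",
          headers={"User-Agent": "my-data-loader/1.0"},
      )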
- Added `Table.as_dataframe()` method to make a pandas `DataFrame` from a
`littletable.Table`, and example `table_to_dataframe.py`.
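  For example:

      df = my_table.as_dataframe()    # requires pandas to be installed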
Version 2.2.2
-------------
- Fixed bug using `datetime.UTC` (only available in Python 3.11 and later - sorry!)
Version 2.2.1
-------------
- Rewrote handling of multiple custom `JSONEncoder`s to use multiple
inheritance instead of explicit call chaining.
- 20-30% performance speedup when creating a search index.
- Better detection of English plurals when building search indexes. Searching
will also detect `Error`, `Exception`, and `Warning` word endings, which are
common in code documentation.
- Added module-level convenience methods for building `Table`s and importing from
CSV, TSV, JSON, or Excel files, so that in place of:

      import littletable as lt
      tbl = lt.Table()
      tbl.csv_import(csv_data_file)

  you can just write:

      import littletable as lt
      tbl = lt.csv_import(csv_data_file)
- Updated examples to new search return type, and new module-level `csv_import()`.
Also added new example `csv_import_examples.py` showing multiple snippets of
importing CSV data.
- Tables now keep track of timestamps when they were created, last modified, and
last imported.
Version 2.2.0
-------------
- BREAKING CHANGES:
  - Support for Python versions <3.9 is dropped in this version. To run on
    these older Python versions, use `littletable` 2.1.2.
  - The results from full text searches now return a Table by default.
- Added `DeprecationWarning` for usage of `DataObject` class. New code should
use `types.SimpleNamespace`, or just plain Python dicts (which get stored
as `SimpleNamespace` objects), namedtuples, or other user-defined classes.
- Text search handles common English regular and irregular plural forms, resolving
them to their singular forms for word searching.
- The Table of results returned from a full text search now gets titled with
the search query string.
- A new example for full text searches is included, `star_trek_tos.py`,
illustrating CSV import and table sorting, and searching episode descriptions
for keywords.
Version 2.1.2
-------------
- Added `json_encoder` argument to `Table.json_export`, so that custom data
fields can get exported without raising `JSONEncodeError`. `peps.py` example
has been modified to demonstrate this. The `json_encoder` argument can
take a single `JSONEncoder` subclass, or a tuple of subclasses, to be tried
in sequence. Each should follow the pattern given in the online Python
docs for the json module. (See updated code in examples/peps.py to see a
custom `JSONEncoder`.)
Also added `json_decoder` argument to `Table.json_import`, though it only
supports passing a single class.
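  For example, a minimal sketch of a custom encoder (following the pattern in the
  json module docs; the date handling shown here is illustrative):

      import datetime
      import json

      class DateEncoder(json.JSONEncoder):
          def default(self, o):
              # convert dates to ISO-format strings; defer everything else
              if isinstance(o, datetime.date):
                  return o.isoformat()
              return super().default(o)

      tbl.json_export("out.json", json_encoder=DateEncoder)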
Version 2.1.1
-------------
- Added `as_table` argument to text search functions, to return search results
as a table instead of as a list of (record, score) tuples. If `as_table` is
True and table records are of a type that will accept new attributes, each
record's score and optional match words are added as
`<search_attribute>_search_score` and `<search_attribute>_search_words` fields.
New example `peps.py` showcases some of these new JSON and full-text search
features, using PEP data gathered from python.org.
- New example `future_import_features.py` peeks into the `__future__` module
to list out all the features that are defined, and their related metadata.
- Added docstring and annotations for generated `table.search.<search_attr>` methods.
- Added docstring for generated `table.by.<index_attr>` methods, and more explanation
in `create_index()` docstring on how to use indexed fields.
- Passing an unknown path element in `Table.json_import(path=path)` now raises
`KeyError` instead of an unhelpful `TypeError`.
Version 2.1.0
-------------
- BREAKING CHANGES:
- littletable drops support for Python 3.6.
- `Table.json_import()` and `Table.json_export()` now default to non-streamed JSON.
Code that uses these methods in streaming mode must now call them with the
new `streaming=True` argument.
- Fixed type annotations for Table indexes, and verified type subclassing.
For this table:

      tbl = lt.Table()
      tbl.create_index("idx")

  the following `isinstance()` tests are True:

      Object       collections.abc Abstract Base Classes
      ─────────────────────────────────────────────────────
      tbl          Callable, Sized, Iterable, Container,
                   Collection, Reversible, Sequence
      tbl.by.idx   Mapping
- `Table.csv_export()`, `tsv_export()`, and `json_export()`, if called with None as
the output destination (None is now the default), will return a string
containing the exported data.

      # print first 10 rows of my_table as CSV data
      print(my_table[:10].csv_export())
- `Table.json_export()` takes an optional parameter, `streaming` to control
whether the resulting JSON is a single JSON list element (if `streaming` is False),
or a separate JSON element per Table item (if `streaming` is True); the default
value is False. `streaming` is useful when passing data over a streaming protocol,
so that the Table contents can be unmarshaled separately on the receiving end.
- `Table.json_import()` takes two optional parameters:
  - `streaming` to indicate that the input stream contains multiple JSON objects
    (if streaming=True), or a single JSON list of objects (if streaming=False);
    defaults to False
  - `path`, a dot-delimited path of keys to read a list of JSON objects from a
    sub-element of the input JSON text (only valid if streaming=False); defaults
    to ""
Version 2.0.7
-------------
- Added support for sliced indexing into `Table` indexes, as a simple
form of range selection and filtering:

      # employees.where(salary=Table.ge(50000))
      employees.create_index("salary")
      employees.by.salary[50000:]

  Unlike Python list slices, `Table` index slices can use non-integer data
types (as long as they support `>=` and `<` comparison operations):

      jan_01 = datetime.date(2000, 1, 1)
      apr_01 = datetime.date(2000, 4, 1)
      # first_qtr_sales = sales.where(date=Table.in_range(jan_01, apr_01))
      sales.create_index("date")
      first_qtr_sales = sales.by.date[jan_01: apr_01]

  Slices with a step field (as in `[start : stop : step]`) are not supported.
See full example code in examples/sliced_indexing.py.
- Added new transform methods for importing timestamps as part of
CSV imports:
  - Table.parse_datetime(pattern, empty, on_error)
  - Table.parse_date(pattern, empty, on_error)
  - Table.parse_timedelta(pattern, reference_time, empty, on_error)

  Each takes a pattern as would be used for `datetime.strptime()`, plus
optional values for empty inputs (default='') or error inputs
(default=None). `parse_timedelta` also takes a reference_time argument
to compute the resulting timedelta - default is 00:00:00.
See full example code in examples/time_conversions.py.
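  For example (a sketch; the field name and pattern are illustrative):

      tbl = lt.csv_import(
          data, transforms={"when": lt.Table.parse_datetime("%Y-%m-%d %H:%M:%S")}
      )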
- `as_html()` now accepts an optional dict argument `table_properties`, to add HTML
`<table>`-level attributes to generated HTML:

      tbl = lt.Table().csv_import("""\
      a,b,c
      1,2,3
      4,5,6
      """)
      html = tbl.as_html(fields="a b c", table_properties={"border": 1, "cellpadding": 5})
- Workaround for an issue when running `Table.present()` in a terminal environment
that does not support `isatty()`:

      AttributeError: 'OutputCtxManager' object has no attribute 'isatty'
Version 2.0.6
-------------
- Simplified `Table.where()` when a filtering method takes a single
value as found in a record attribute.
For example, to find the odd `a` values in a Table, you would
previously write:

      tbl.where(lambda rec: is_odd(rec.a))

  Now you can write:

      tbl.where(a=is_odd)
- The `Table.re_match` comparator is deprecated, and can be replaced with
this form:

      word_str = "DOT"

      # test if word_str is in field 'name' - DEPRECATED
      tbl.where(name=lt.Table.re_match(rf".*?\b{word_str}\b"))

      # test if word_str is in field 'name'
      contains_word = re.compile(rf"\b{word_str}\b").search
      tbl.where(name=contains_word)

  See the `explore_unicode.py` example (line 185).
`Table.re_match` will be removed in a future release.
- Added helper method `Table.convert_numeric` to simplify converting
imported CSV values from str to int or float, or to replace empty
values with placeholders such as "n/a". Use as a transform in the
`transforms` dict argument of `Table.csv_import`.
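  For example (a sketch of the basic use as a transform; the field name is
  illustrative):

      tbl = lt.csv_import(data, transforms={"qty": lt.Table.convert_numeric})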
- Added example `nfkc_normalization.py`, to show Unicode characters
that normalize to ASCII in Python scripts.
- Fixed internal bugs when using `groupby` in calls to `Table.as_html()`.
- Added tests when using `__slots__` defined using a dict (feature
added in Python 3.8 to attach docstrings to attributes defined
using `__slots__`).
Version 2.0.5
-------------
- Added support for import/export to local Excel .xlsx spreadsheets:

      tbl = Table().excel_import("data_table.xlsx")

  Requires installation of openpyxl to do the spreadsheet handling.
PR submitted by Brunno Vanelli, very nice work, thanks!
A simple example script examples/excel_data_types.py has been added
to show how the data types of values are preserved or converted to
standard Python types when importing from Excel.
- Fixed count(), index(), remove(), and remove_many() to accept dict
objects. Identified while addressing issue #3.
- Fixed default of pop() to be -1.
- Added test cases for storing objects created using typing.NamedTuple.
Version 2.0.4
-------------
- Added grouping support in Table.present(), Table.as_html() and
Table.as_markdown(). Pass the `groupby` parameter to indicate the grouping
field:

      tbl.present(groupby="ColumnA")

  will display the table with repeated consecutive values for ColumnA
suppressed.
Multiple fields can be specified for grouping using a space-delimited
string. If multiple fields are given, they will be grouped hierarchically;
that is, the second field will group only if already grouping on the first
field, the third field will group only if already grouping on the second
field, etc. (For best grouping output, sort the records first.)

      tbl = lt.Table("Academy Awards").csv_import("""\
      year,award,movie,recipient
      1960,Best Picture,Ben-Hur,
      1960,Best Actor,Ben-Hur,Charlton Heston
      1960,Best Actress,The Heiress,Simone Signoret
      1960,Best Director,Ben-Hur,William Wyler
      ...
      """)
      tbl.present(groupby="year")

                                Academy Awards
      Year  Award          Movie                  Recipient
      ──────────────────────────────────────────────────────────────────
      1960  Best Picture   Ben-Hur
            Best Actor     Ben-Hur                Charlton Heston
            Best Actress   The Heiress            Simone Signoret
            Best Director  Ben-Hur                William Wyler
      1961  Best Picture   The Apartment
            Best Actor     Elmer Gantry           Burt Lancaster
            Best Actress   Butterfield 8          Elizabeth Taylor
            Best Director  The Apartment          Billy Wilder
      1962  Best Picture   West Side Story
            Best Actor     Judgment at Nuremberg  Maximilian Schell
            Best Actress   Two Women              Sophia Loren
            Best Director  West Side Story        Robert Wise/Jerome Robbins
- Modified the table created by stats() to include a 'missing' column,
and reordered the columns so that when presented, the most likely interesting
stat (mean) is listed first.
- Added attributes to Table to record import source and type for csv_import,
tsv_import, and json_import. For imports from files or web urls, the
filename or url will be added as the table's default title.
- Added support and test cases for storing objects using the HasTraits
classes from the traits and traitlets packages. (Working with HasTraits
objects in a littletable.Table requires that they add littletable.HasTraitsMixin
to their base classes.)
- Added test cases for storing objects created using the attrs package.
Version 2.0.3
-------------
- Fixed bug introduced when adding support for auto-centering, if data
is not numeric or str and has no __len__ method.
- Cleaned up how_to_use_littletable.md docs to reflect recent changes and
new features.
Version 2.0.2
-------------
- Added unit tests to support using pydantic.BaseModel as Table contents.
- All xxx_import and xxx_export methods will now accept a pathlib.Path to
reference the source or destination file.
- Fixed internally-reported version number (2.0.1 erroneously reported
its version as 2.0.0).
Version 2.0.1
-------------
- Fixed setup.py and setup.cfg to generate Py3-only wheel.
Version 2.0.0
-------------
- Discontinued support for Python 2.x, and Python 3 versions earlier than
Python 3.6.
- Added new comparators:

      is_none     - attribute value is None
      is_not_none - attribute value is not None
      is_null     - attribute value is None, "", or not defined
      is_not_null - attribute value is defined, and is not None or ""
      startswith  - attribute value starts with a given string
      endswith    - attribute value ends with a given string
      re_match    - attribute value matches a regular expression

  Examples:

      # get customers whose address includes an apartment number
      has_apt = customers.where(address_apt_no=Table.is_not_null())

      # get employees whose first name starts with "X"
      x_names = employees.where(name=Table.startswith("X"))

      # get log records that match a regex (any word starts with
      # "warn" in the log description)
      # (re_match will accept an re flags argument)
      warnings = log.where(description=Table.re_match(r".*\bwarn", flags=re.I))
- Added type annotations for public API methods.
- New examples:
- explore_unicode.py - code for exploring subset of the Unicode database,
showing several interesting symbol groups
- dune_casts.py - joining 3 tables to create a table showing the casts of
the 3 studio productions of Dune
- Nicer present() output - auto-center columns where all values
are single characters (centers "X", "Y", "N", etc. values).
Version 1.5.0
-------------
- Table.insert_many() now accepts a sequence of Python dicts for objects
to insert into a table (and Table.insert() will accept a single dict).
They get stored as SimpleNamespace objects to support all attribute
accesses.
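  For example (field names are illustrative):

      tbl.insert_many([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 28}])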
- Added Table.as_markdown() to generate a string representing the
table using Markdown syntax.
- Added Table.splitby() - takes a predicate function (takes a table
record and returns True or False) and returns two tables: a table with
all the rows that returned False and a table with all the rows that
returned True. Will also accept a string indicating a particular field
name, and uses `bool(getattr(rec, field_name))` for the predicate
function.

      is_odd = lambda x: bool(x % 2)
      evens, odds = tbl.splitby(lambda rec: is_odd(rec.value))
      nulls, not_nulls = tbl.splitby("optional_data_field")
Version 1.4.1
-------------
- Fixed bug when present() failed if a Table contained a field named 'default'
(would also have failed with a field named 'table'). Issue #1 (!!!) reported
by edp-penso, thanks!
- Added optional 'force' argument to create_search_index to regenerate a search
index if the contents of a searchable attribute have changed.
- Some small optimizations to `Table.remove_many` and additional unit tests.
Version 1.4.0 -
---------------
- Added `Table.create_search_index()` and `Table.search.<attribute>` for full text
searching in a text attribute. Search terms may be prefixed with
'+' and '-' flags, to qualify the required or excluded nature of matching
for that term:

      +   strong preference
      -   strong anti-preference
      ++  required
      --  excluded

  Example:

      recipe_data = textwrap.dedent("""\
          title,ingredients
          Tuna casserole,tuna noodles cream of mushroom soup
          Hawaiian pizza,pizza dough pineapple ham tomato sauce
          BLT,bread bacon lettuce tomato mayonnaise
          Bacon cheeseburger,ground beef bun lettuce ketchup mustard pickle cheese bacon
          """)
      recipes = lt.Table().csv_import(recipe_data)
      recipes.create_search_index("ingredients")
      matches = recipes.search.ingredients("+bacon tomato --pineapple")

  The search index is valid only so long as no items are added to or removed from the
table, and the indexed attribute values stay unchanged; if a search is run on a
modified table, a `littletable.SearchIndexInconsistentError` exception is raised.
Version 1.3.0 -
---------------
_(Sorry, it looks like I rushed the Table.join() changes for outer
joins in 1.2.0!)_
- Reworked the API for `Table.join()`, this is now just for inner
joins. Outer joins are now performed using `Table.outer_join`,
with the leading `join_type` argument of `Table.RIGHT_OUTER_JOIN`,
`Table.LEFT_OUTER_JOIN`, or `Table.FULL_OUTER_JOIN`.
As part of this rework, also cleaned up some leftover debugging
print() statements and a bug in the outer join logic, and spruced
up the docs. Outer joins are still slow, but at least with this
version they are giving proper results.
Version 1.2.0 -
---------------
- Import directly from simple .zip, .gz, or .xz/.lzma archives,
such as `data.csv.gz` or `data.csv.zip`. (For zip archives, the zip
file name must be the same as the compressed file with ".zip" added.)
- Add `join` argument to `Table.join()` to indicate whether an
"inner" join or "outer" join is to be performed. Accepted values
are "inner", "left outer", "right outer", "full outer", and "outer"
(synonym for "right outer"). Default join type is "inner".
- Fixed bug preventing joining on more than one field.
- Added `filters` argument to csv_import, to screen records as they
are read from the input file *before* they are added to the table. Can
be useful when dealing with large input files, to pre-screen data
before it is added to the table.
- Added tsv_export, analogous to csv_export, but using <TAB> character
as a value separator.
- Added comparators `Table.is_in` and `Table.not_in` to support filtering
by presence in or absence from a given collection of values. (For best
performance, if there are more than 4 values to be tested, convert the
collection to a Python `set` to optimize "in" testing.)
- For Python versions that support it, `types.SimpleNamespace` is now used
as the default row_class for dynamically created tables, instead of
`littletable.DataObject`. (This gives a significant performance boost.)
- Converted HowToUseLittletable.txt to Markdown how_to_use_littletable.md.
Version 1.1.0 -
---------------
- Added the `Table.present()` method, using the `rich` module to format
table contents into a clean tabular format. Also added notes in the
"How to Use Littletable.txt" file on creating nice tabular output with
`rich`, Jupyter Notebook, and `tabulate`. (Note: `rich` only supports
Python versions 3.6 and later.)
- Added `head(n)` and `tail(n)` methods for easy slicing of the first
or last `n` items in a Table (`n` defaults to 10).
- Added `Table.clear()` method to clear all contents of a Table, but
leaving any index definitions intact.
- Added comparators `Table.between(a, b)`, `Table.within(a, b)`, and
`Table.in_range(a, b)` for easy range testing:

      Table.between(a, b)   matches a < x < b    - exclusive match
      Table.within(a, b)    matches a <= x <= b  - inclusive match
      Table.in_range(a, b)  matches a <= x < b   - range check, similar to
                            testing x in range(a, b) in Python
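  For example (a sketch; table and field names are illustrative):

      teens = people.where(age=Table.within(13, 19))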
- Updated `Table.stats()` to use the Python statistics module for those
versions of Python that support it. The Tables returned from this
method now also include unique indexes to support `.by.name[field_name]`
access to the stats for a particular field, or `.by.stat[stat_name]`
access to a particular stat for all fields, if `Table.stats` is called
with `by_field=False`.
- Fixed `Table.stats()` to return `None` for `min` and `max` values if the
source table is empty. `Table.stats()` also defaults to using all
field names if a list is not given, and guards against non-numeric
data. If `stats()` is called on an empty Table, an empty Table of
statistics is returned.
- Removed sorting of field names in `table.info()["fields"]` so that
attribute names are kept in default order for tabular output.
- Proper definition of `table.all.x` iterators so that `iter(table.all.x)`
returns self. (Necessary for modules like statistics that check
if an iterator is passed by testing `if iter(data) is data`.)
- Added support for a `formats` named argument to `Table.as_html()`.
`formats` takes a dict that maps field names or field data types to
string formats or callables. If a string format, the string should
be of the form used to format a placeholder in the str.format method
(such as "{:5.2f}" for a real value formatted to two decimal places).
If a callable is passed, it should take a single value argument and
return a str.
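  For example (field names are illustrative):

      html = tbl.as_html(fields="name price", formats={"price": "{:5.2f}"})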
- Fixed unit tests that fail under Python versions pre-3.6 that do not
preserve dict insertion order. (This was a bug in the unit test, not in
the littletable core code.)
Version 1.0.1 -
---------------
- Added support for optional .unique modifier for the .all.<attr> value accessor:

      for postal_code in customers.all.postal_code.unique:
          ...
- Added .all optimization when getting values for an indexed attribute.
- Added **kwargs support for csv_export, to permit passing arguments through
to the csv.DictWriter (such as `dialect`).
- Implemented `limit` named argument in csv_import.
- Closed issues when importing/exporting empty tables.
Version 1.0.0 -
---------------
- Add import directly from an HTTP/HTTPS url.
- Add Table.stats() method to return a table of common statistics for
selected numeric fields.
- Added methods Table.le, Table.lt, Table.ge, Table.gt, Table.ne, and
Table.eq as helpers for calling Table.where, so that:

      ret = table.where(lambda rec: rec.x > 100)

  can now be written:

      ret = table.where(x=Table.gt(100))
- Fixed bug when chaining multiple ".by" accesses:

      data = lt.Table()
      data.create_index("email", unique=True)
      data.create_index("name")
      data.create_index("city")
      data.create_index("state")

      for user in data.by.city['Springfield'].by.state['MO']:
          print(user)

  would formerly complain that the table has no index 'state'.
- `dir(table.by)` will now include the index names that can be used
as "by" attributes.
- Added unit tests to support using dataclasses as Table contents.
Version 0.13.2 -
----------------
- Fixed bug when deleting a slice containing None values from a table.
Special thanks to Chat Room 6 on StackOverflow.
- Fixed bug in insert_many when validating unique index keys.
- Fixed bugs in csv_import and tsv_import when named args were not
passed through to the internal csv DictReader.
- Fixed bug in csv_export where blank lines were included in the
exported CSV.
- Added Table.shuffle(), for randomizing the items in a table in place.
Can be useful for some games or simulations, also in internal testing.
- Fixed bug in Table.sort() when sorting on multiple attributes.
Version 0.13.1 -
----------------
- Modified all Table import methods to directly accept a string containing
the data to be imported. If the input is a multiline string, then it
is assumed to contain the actual data to be imported. If the input is a
string with no newlines, then it is treated as a filename, and the file
is opened for reading. The input can also still be specified as a
file-like object, such as would be the case if reading a file with
an encoding other than the default 'UTF-8'. This capability further
simplifies notebook integration and test and experimentation.
- Renamed format() to formatted_table().
- Introduced new format() method to generate a list of strings, one per
row in the table. format() is called passing a single string as an
argument, to be used as a str.format() template for converting each
row to a string.
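  For example (a sketch, assuming the template placeholders name row attributes):

      lines = tbl.format("{name}: {value}")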
Version 0.12.0 -
----------------
- Modified Table.select() to accept '*' and '-xxx'-style names, to
indicate '*' as 'all fields', and '-xxx' as "not including xxx". This
simplifies selecting "all fields except xxx".
- PivotTable.summary_counts() is renamed to as_table(). summary_counts()
is deprecated and will be dropped in a future release.
- Added Table.as_html() and PivotTable.as_table().as_html() methods
to support integration with Jupyter Notebook. Also included sample ipynb
file. as_html() takes a list of fields to be included in the table,
following the same syntax as Table.select(). This API is still
experimental, and may change before 1.0.0 release.
- Added Table.all.xxx attribute accessor, to yield out all the values of a
particular attribute in the table as a sequence.
- Added FixedWidthReader class to simplify import of data from files with
fixed width columns. See examples in HowToUseLittleTable.txt.
- Fixed bug where dict.iteritems() was used in some cases, all
now converted to using items().
- Updated xxx_import methods to return self, so that tables can be
declared and loaded in a single statement:

      data_table = Table().csv_import('data_file.csv')
- Added optional row_class argument to all xxx_import methods
to designate a user class to use when constructing the rows to be imported
to the table. The default is DataObject, but any class that supports
`Class(**attributes)` construction (including namedtuples and
SimpleNamespace) can be given. If the desired class does not support
this kind of initialization, a factory method can be given instead.
- Deleted deprecated Table.run() and Table.addfield methods. run()
did nothing more than return self; addfield has been replaced by
add_field.
- Also deleted deprecated attribute access to table indexes. Indexed
access is now done as:

      employee = emp_data.by.emp_id['12345']
      qa_dept_employees = emp_data.by.department_name['QA']
Version 0.11 -
--------------
- Performance enhancement in insert(), also speeds up pivot() and other
related methods.
- Fixed bug in fetching keys if keys evaluate to a falsey value (such as 0).
Would also manifest as omitting columns from pivot tables for falsey
key values.
Version 0.10 -
--------------
- Deprecated access to indexed fields using '<tablename>.<fieldname>', as
this obscures the fact that the fields are attributes of the table's objects,
not of the table itself. Instead, index access will be done using the 'by'
index accessor, introduced in version 0.6 (see comments below in notes
for release 0.9). Indexes-as-table-attributes will be completely removed
in release 1.0.
- Deprecated Table.run(). Will be removed in release 1.0.
- Added Table.info() method, to give summary information about a table's
name, columns, indexes, size, etc.
- Extended interface to Table.csv_import, to accept passthru of
additional named arguments (such as 'delimiter' or 'fieldnames') to the
DictReader constructor used to read the import data.
- Extended interface to Table.csv_export and json_export, to support
addition of 'encoding' argument (default='UTF-8') for the output file.
(Python 3 only)
- Added set item support to DataObject, to support "obj['a'] = 100" style
assignments. Note that DataObjects are only semi-mutable: a given key or
attribute can only be assigned once, not overwritten.
- Added more list-like access to Table, including del table[index],
del table[start:end:step] and pop().
- Added 'key' argument to Table.unique, to support passing a callable to
unique for special cases for defining what makes an object 'unique'.
Default is the prior behavior, which is a tuple of all of the values of
the object.
- Added exceptions to DataObject when attempting to modify an existing
attribute. New attributes are still supported, but existing attributes
cannot be overwritten. (Applies to both attribute and indexed assignments.)
Formerly, these assignments would simply fail silently.
- Using OrderedDict when supported, to preserve field order in JSON output.
- Miscellaneous documentation cleanup.
Version 0.9 -
-------------
- Python 3 compatibility.
- (feature previously released in version 0.6 but not documented)
Added 'by' index accessor on tables, to help distinguish that the index
attributes are not attributes of the table itself, but of the objects
in the table:

      # using unique index 'sku' on catalog table:
      print(catalog.by.sku["ANVIL-001"].descr)

      # using non-unique index 'state' on stations table:
      stations.create_index("state")
      for az_stn in stations.by.state['AZ']:
          print(az_stn)

  Updated inline examples to use '<table>.by.<index_name>' syntax.
Version 0.8 -
-------------
- Added json_import and json_export methods to Table, with same interface
as csv_import and csv_export. The import file should contain a JSON
object string per row, or a succession of JSON objects (can be pretty-
printed), but *not* a single JSON list of objects.
- Included pivot_demo.py as part of the source distribution.