Activity
Refactored auxiliary functions for segmenting with indices, with a bi…
AV evaluation now uses ChainedCounters to be able to average over fix…
Added ChainedCounter to the dictionary utils, for future moving-avera…
Overhauled the accessor variety (AV) evaluation with many more metrics.
Refactoring, and added a method for adding a progress bar to a NamedI…
Added multiplexer that selects based on maximal compression, and adde…
Added two new metrics for segmentation diversity.
Added more exceptional cases to Rényi entropy, and fixed two bugs wit…
Improved and added several evaluation tools.
Added iterable implementation of integerPartitions_k.
Better timer, fixed graph samplers, added identity tokeniser.
Added graph-based rejection sampler for uniformly random segmentations.
Added function for filtering Nones from iterables.
Small fix in KudoPiece32ki deserialiser.
Added histogram-based visualisation for token length and token amount.
Major preprocessing overhaul, now with full support for SentencePiece.
Refactored tktkt.builders and tktkt.files into one submodule tktkt.fa…
Moved vocabulary builders away from the tokeniser builders because th…
Bugfixes and added nlpaug support.
Added forwards and backwards segmentation graph samplers for usage in…
Vocabulary builders to complement tokeniser builders.
Fixed DeL's text data not being installed with the package (non-edita…
Added more BPE visualisation examples and fixed small bug.