All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- support for #<=> and #join, which were added to set in the meantime
- support for getting the (overall) character set of a Regexp with multiple expressions
- support for global and local case-insensitivity in Regexp inputs
- Regexp#{covered_by_character_set?,uses_character_set?} methods (if core ext is used)
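The difference between global and local case-insensitivity can be illustrated with Ruby's built-in Regexp alone. The sketch below does not call this library; the helper names, patterns and sample strings are made up for illustration.

```ruby
# Global case-insensitivity: the /i flag applies to the whole pattern.
GLOBAL_PATTERN = /\Aabc\z/i

# Local case-insensitivity: (?i:...) applies only inside the group.
LOCAL_PATTERN = /\A(?i:a)bc\z/

# Regexp#casefold? reports only the global flag, not inline groups.
def globally_case_insensitive?(regexp)
  regexp.casefold?
end

# Hypothetical helper: which of the sample strings does the pattern accept?
def accepted(regexp, samples = %w[abc Abc aBc ABC])
  samples.select { |sample| regexp.match?(sample) }
end

puts globally_case_insensitive?(GLOBAL_PATTERN) # true
puts globally_case_insensitive?(LOCAL_PATTERN)  # false
puts accepted(GLOBAL_PATTERN).inspect           # all four samples
puts accepted(LOCAL_PATTERN).inspect            # only "abc" and "Abc"
```

A character set derived from the local pattern would therefore need both cases of "a" but only the lowercase forms of "b" and "c".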
- new codepoints for ::assigned and ::emoji predefined sets, as in Ruby 3.2.0
- fixed processing of Strings that are not ASCII- or UTF8-encoded
- removed dependency on set and sorted_set
  - thanks to https://github.com/mikebaldry for reporting a related issue (#2)
- ::of now supports both String and Regexp arguments
- fixed segfault during String manipulation on Ruby 3.2.0-dev
- improved performance for String manipulation
- allow usage in Ractors
  - predefined sets must be pre-initialized for this, though
  - e.g. CharacterSet.ascii, keep_character_set(:ascii) etc.
  - call them once in the main Ractor to trigger initialization
- new codepoints for ::assigned and ::emoji predefined sets, as in Ruby 3.1.0
- latest Unicode case-folding data (for #case_insensitive)
- support for passing any Enumerable to #disjoint?, #intersect?
  - this matches recent broadening of these methods in ruby/set
- new instance method #secure_token (see README)
- class method ::of now accepts more than one String
- CharacterSet::ExpressionConverter can now build output of any Set-like class
- CharacterSet::Pure::of_expression now returns a CharacterSet::Pure
  - it used to return a regular CharacterSet
- multiple fixes for Ruby 3
  - fixed segfault for some String manipulation cases
  - added sorted_set as dependency, so CharacterSet::Pure (non-C fallback) works
- fixed error when parsing a Regexp with an empty intersection (e.g. /[a&&]/)
- #to_s_with_surrogate_ranges / Writer::write_surrogate_ranges
  - allows for much shorter astral plane representations e.g. in JavaScript
  - thanks to https://github.com/singpolyma for the suggestion and groundwork (#1)
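JavaScript regular expressions without the u flag operate on UTF-16 code units, so astral codepoints have to be written as surrogate pairs. The following is a minimal sketch of the underlying arithmetic and of one way to group pairs that share a high surrogate. Both helper names are hypothetical, and the output format is an assumption for illustration, not necessarily what this library emits.

```ruby
# Split an astral codepoint (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
def surrogate_pair(codepoint)
  unless (0x10000..0x10FFFF).cover?(codepoint)
    raise ArgumentError, "not an astral codepoint: #{codepoint}"
  end

  offset = codepoint - 0x10000
  high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
  low  = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
  [high, low]
end

# Render a range of astral codepoints that share one high surrogate as
# "\uHHHH[\uLLLL-\uLLLL]". Hypothetical helper, for illustration only.
def surrogate_range_expression(first, last)
  high_a, low_a = surrogate_pair(first)
  high_b, low_b = surrogate_pair(last)
  raise ArgumentError, 'range spans several high surrogates' unless high_a == high_b

  format('\u%04x[\u%04x-\u%04x]', high_a, low_a, low_b)
end

# U+1F600..U+1F64F (the Emoticons block) fits under a single high surrogate.
puts surrogate_range_expression(0x1F600, 0x1F64F)
```

Listing every pair individually would need one high/low pair per codepoint, whereas a shared high surrogate followed by a low-surrogate range covers the whole run in one short expression.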
- improved performance for #to_s / Writer by avoiding bugged Range#minmax
- '/' is now escaped by default when stringifying so as to work with //-regexp syntax
- improved String manipulation speed
- improved initialization and #merge speed when passing a large Range
- reduced memory consumption by > 90% for most use cases via dynamic resizing
  - before, every set instance required 136 KB for codepoints
  - now, 16 bytes for a CharacterSet in ASCII space, 8 KB for one in BMP space etc.
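The sizes quoted above are consistent with a bitmap that stores one bit per codepoint. That storage layout is an assumption made here to check the arithmetic; the changelog itself does not state it, and the helper and constant names are hypothetical.

```ruby
# Bytes needed for a bitmap with one bit per codepoint (assumed layout).
def bitmap_bytes(codepoint_count)
  (codepoint_count + 7) / 8
end

ASCII_CODEPOINTS   = 0x80      # U+0000..U+007F
BMP_CODEPOINTS     = 0x10000   # U+0000..U+FFFF
UNICODE_CODEPOINTS = 0x110000  # U+0000..U+10FFFF

puts bitmap_bytes(ASCII_CODEPOINTS)           # 16 (bytes)
puts bitmap_bytes(BMP_CODEPOINTS) / 1024      # 8 (KB)
puts bitmap_bytes(UNICODE_CODEPOINTS) / 1024  # 136 (KB)
```

Under this assumption, a set confined to ASCII needs 16 bytes instead of 136 KB, which matches the "> 90%" reduction claimed for typical use.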
- #count_in and #scan methods for String interaction
- new predefined sets ::any / ::all, ::assigned, ::surrogate
- conversion methods #assigned_part, #valid_part
- sectioning methods #ascii_part, #plane(n)
- section test methods #ascii_part?, #ascii_ratio, #ascii_only?, #astral_only?
- #count now supports passing an argument or block as usual
- CharacterSet::Pure#keep_in, #delete_in now preserve the original encoding
- added latest Unicode casefold data (for #case_insensitive)
- restored range_compressor as a runtime dependency for JRuby only
- improved messages for missing optional dependencies
- made range_compressor an optional dependency as it is almost never needed
- added option to reference a predefined set via Symbol in String extension methods
- added predefined sets ::ascii_alnum and ::ascii_letters
Initial release.