Merge branch 'rust-lang:master' into patch-1
barafael authored Nov 13, 2023
2 parents dabfeb0 + 837fd85 commit 4f940f6
Showing 107 changed files with 5,445 additions and 1,656 deletions.
51 changes: 34 additions & 17 deletions .github/workflows/ci.yml
@@ -54,10 +54,14 @@ jobs:
os: ubuntu-latest
rust: stable
target: i686-unknown-linux-gnu
- build: stable-mips
- build: stable-powerpc64
os: ubuntu-latest
rust: stable
target: mips64-unknown-linux-gnuabi64
target: powerpc64-unknown-linux-gnu
- build: stable-s390x
os: ubuntu-latest
rust: stable
target: s390x-unknown-linux-gnu
- build: beta
os: ubuntu-latest
rust: beta
@@ -77,7 +81,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.rust }}
- name: Install and configure Cross
@@ -92,12 +96,6 @@ jobs:
cd "$dir"
curl -LO "https://github.com/cross-rs/cross/releases/download/$CROSS_VERSION/cross-x86_64-unknown-linux-musl.tar.gz"
tar xf cross-x86_64-unknown-linux-musl.tar.gz
# We used to install 'cross' from master, but it kept failing. So now
# we build from a known-good version until 'cross' becomes more stable
# or we find an alternative. Notably, between v0.2.1 and current
# master (2022-06-14), the number of Cross's dependencies has doubled.
# cargo install --bins --git https://github.com/rust-embedded/cross --tag v0.2.1
echo "CARGO=cross" >> $GITHUB_ENV
echo "TARGET=--target ${{ matrix.target }}" >> $GITHUB_ENV
- name: Show command used for Cargo
@@ -141,9 +139,28 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: 1.60.0
toolchain: 1.65.0
# The memchr 2.6 release purportedly bumped its MSRV to Rust 1.60, but it
# turned out that on aarch64, it was using something that wasn't stabilized
# until Rust 1.61[1]. (This was an oversight on my part. I had previously
# thought everything I needed was on Rust 1.60.) To resolve that, I just
# bumped memchr's MSRV to 1.61. Since it was so soon after the memchr 2.6
# release, I treated this as a bugfix.
#
# But the regex crate's MSRV is at Rust 1.60, and it now depends on at
# least memchr 2.6 (to make use of its `alloc` feature). So we can't set
# a lower minimal version. And I can't just bump the MSRV in a patch
# release as a bug fix because regex 1.9 was released quite some time ago.
# I could just release regex 1.10 and bump the MSRV there, but eh, I don't
# want to put out another minor version release just for this.
#
# So... pin memchr to 2.6.2, which at least works on x86-64 on Rust 1.60.
#
# [1]: https://github.com/BurntSushi/memchr/issues/136
- name: Pin memchr to 2.6.2
run: cargo update -p memchr --precise 2.6.2
- name: Basic build
run: cargo build --verbose
- name: Build docs
@@ -162,7 +179,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Run full test suite
@@ -175,7 +192,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Run full test suite
@@ -188,7 +205,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Run full test suite
@@ -201,7 +218,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Run full test suite
@@ -216,7 +233,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
# We use nightly here so that we can use miri I guess?
# It caught me by surprise that miri seems to only be
@@ -233,7 +250,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
components: rustfmt
195 changes: 195 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,198 @@
1.10.2 (2023-10-16)
===================
This is a new patch release that fixes a search regression where incorrect
matches could be reported.

Bug fixes:

* [BUG #1110](https://github.com/rust-lang/regex/issues/1110):
Revert broadening of reverse suffix literal optimization introduced in 1.10.1.


1.10.1 (2023-10-14)
===================
This is a new patch release with a minor increase in the number of valid
patterns and a broadening of some literal optimizations.

New features:

* [FEATURE 04f5d7be](https://github.com/rust-lang/regex/commit/04f5d7be4efc542864cc400f5d43fbea4eb9bab6):
Loosen ASCII-compatible rules such that regexes like `(?-u:☃)` are now allowed.

Performance improvements:

* [PERF 8a8d599f](https://github.com/rust-lang/regex/commit/8a8d599f9d2f2d78e9ad84e4084788c2d563afa5):
Broaden the reverse suffix optimization to apply in more cases.


1.10.0 (2023-10-09)
===================
This is a new minor release of `regex` that adds support for start and end
word boundary assertions. That is, `\<` and `\>`. The minimum supported Rust
version has also been raised to 1.65, which was released about one year ago.

The new word boundary assertions are:

* `\<` or `\b{start}`: a Unicode start-of-word boundary (`\W|\A` on the left,
`\w` on the right).
* `\>` or `\b{end}`: a Unicode end-of-word boundary (`\w` on the left, `\W|\z`
on the right).
* `\b{start-half}`: half of a Unicode start-of-word boundary (`\W|\A` on the
left).
* `\b{end-half}`: half of a Unicode end-of-word boundary (`\W|\z` on the
right).

The `\<` and `\>` are GNU extensions to POSIX regexes. They have been added
to the `regex` crate because they enjoy somewhat broad support in other regex
engines as well (for example, vim). The `\b{start}` and `\b{end}` assertions
are aliases for `\<` and `\>`, respectively.

The `\b{start-half}` and `\b{end-half}` assertions are not found in any
other regex engine (although regex engines with general look-around support
can certainly express them). They were added principally to support the
implementation of word matching in grep programs, where one generally wants to
be a bit more flexible in what is considered a word boundary.
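
The four assertions above are defined purely by what sits to the left and right of a position, so their semantics can be sketched with nothing but the standard library. This is a hand-rolled illustration and not the crate's implementation. The helper names are hypothetical, `is_word` only approximates Unicode `\w`, and `at` is assumed to be a byte offset that falls on a `char` boundary.

```rust
/// Rough stand-in for `\w` (alphanumeric or underscore). The real Unicode
/// definition used by the crate is broader; this is an assumption.
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// The character ending just before byte offset `at`, if any.
fn before(hay: &str, at: usize) -> Option<char> {
    hay[..at].chars().next_back()
}

/// The character starting at byte offset `at`, if any.
fn after(hay: &str, at: usize) -> Option<char> {
    hay[at..].chars().next()
}

/// `\b{start-half}`: `\W|\A` on the left.
fn start_half(hay: &str, at: usize) -> bool {
    before(hay, at).map_or(true, |c| !is_word(c))
}

/// `\b{end-half}`: `\W|\z` on the right.
fn end_half(hay: &str, at: usize) -> bool {
    after(hay, at).map_or(true, |c| !is_word(c))
}

/// `\<` or `\b{start}`: `\W|\A` on the left and `\w` on the right.
fn word_start(hay: &str, at: usize) -> bool {
    start_half(hay, at) && after(hay, at).map_or(false, is_word)
}

/// `\>` or `\b{end}`: `\w` on the left and `\W|\z` on the right.
fn word_end(hay: &str, at: usize) -> bool {
    before(hay, at).map_or(false, is_word) && end_half(hay, at)
}

fn main() {
    let hay = "foo bar";
    for at in 0..=hay.len() {
        println!(
            "at={} start={} end={} start-half={} end-half={}",
            at,
            word_start(hay, at),
            word_end(hay, at),
            start_half(hay, at),
            end_half(hay, at),
        );
    }
}
```

The half assertions each check only one side, which is why they hold at more positions than the full assertions do.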

New features:

* [FEATURE #469](https://github.com/rust-lang/regex/issues/469):
Add support for `\<` and `\>` word boundary assertions.
* [FEATURE(regex-automata) #1031](https://github.com/rust-lang/regex/pull/1031):
DFAs now have a `start_state` method that doesn't use an `Input`.

Performance improvements:

* [PERF #1051](https://github.com/rust-lang/regex/pull/1051):
Unicode character class operations have been optimized in `regex-syntax`.
* [PERF #1090](https://github.com/rust-lang/regex/issues/1090):
Make patterns containing lots of literal characters use less memory.

Bug fixes:

* [BUG #1046](https://github.com/rust-lang/regex/issues/1046):
Fix a bug that could result in incorrect match spans when using a Unicode word
boundary and searching non-ASCII strings.
* [BUG(regex-syntax) #1047](https://github.com/rust-lang/regex/issues/1047):
Fix panics that can occur in `Ast->Hir` translation (not reachable from `regex`
crate).
* [BUG(regex-syntax) #1088](https://github.com/rust-lang/regex/issues/1088):
Remove guarantees in the API that connect the `u` flag with a specific HIR
representation.

`regex-automata` breaking change release:

This release includes a `regex-automata 0.4.0` breaking change release, which
was necessary in order to support the new word boundary assertions. For
example, the `Look` enum has new variants and the `LookSet` type now uses `u32`
instead of `u16` to represent a bitset of look-around assertions. These are
overall very minor changes, and most users of `regex-automata` should be able
to move to `0.4` from `0.3` without any changes at all.
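
Widening a bitset's integer type is what you would expect when each assertion kind occupies one bit: a `u16` has room for at most 16 kinds. The sketch below illustrates that idea only. `Assertion`, `AssertionSet`, and the bit positions are hypothetical and are not the `Look` or `LookSet` definitions from `regex-automata`.

```rust
/// Hypothetical assertion kinds, one bit each. Bit positions are
/// illustrative and do not mirror the real `Look` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
enum Assertion {
    Start = 1 << 0,
    End = 1 << 1,
    // Bits 16 and above cannot be stored in a `u16`.
    WordStart = 1 << 16,
    WordEndHalf = 1 << 17,
}

/// A set of assertions stored as a 32-bit mask.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct AssertionSet {
    bits: u32,
}

impl AssertionSet {
    /// Returns a copy of the set with `a` added.
    fn insert(self, a: Assertion) -> AssertionSet {
        AssertionSet { bits: self.bits | a as u32 }
    }

    /// Reports whether `a` is in the set.
    fn contains(self, a: Assertion) -> bool {
        self.bits & (a as u32) != 0
    }

    /// Number of assertions in the set.
    fn len(self) -> u32 {
        self.bits.count_ones()
    }
}

fn main() {
    let set = AssertionSet::default()
        .insert(Assertion::Start)
        .insert(Assertion::WordEndHalf);
    assert!(set.contains(Assertion::Start));
    assert!(set.contains(Assertion::WordEndHalf));
    assert!(!set.contains(Assertion::End));
    assert!(!set.contains(Assertion::WordStart));
    assert_eq!(set.len(), 2);
    // The mask needs more than 16 bits, so it would not fit in a `u16`.
    assert!(set.bits > u16::MAX as u32);
}
```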

`regex-syntax` breaking change release:

This release also includes a `regex-syntax 0.8.0` breaking change release,
which, like `regex-automata`, was necessary in order to support the new word
boundary assertions. This release also includes some changes to the `Ast`
type to reduce heap usage in some cases. If you are using the `Ast` type
directly, your code may require some minor modifications. Otherwise, users of
`regex-syntax 0.7` should be able to migrate to `0.8` without any code changes.

`regex-lite` release:

The `regex-lite 0.1.1` release contains support for the new word boundary
assertions. There are no breaking changes.


1.9.6 (2023-09-30)
==================
This is a patch release that fixes a panic that can occur when the default
regex size limit is increased to a large number.

* [BUG aa4e4c71](https://github.com/rust-lang/regex/commit/aa4e4c7120b0090ce0624e3c42a2ed06dd8b918a):
Fix a bug where computing the maximum haystack length for the bounded
backtracker could result in underflow and thus provoke a panic later in a search
due to a broken invariant.
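
The general shape of this kind of bug can be sketched as follows. This is not the crate's actual formula: `max_haystack_len`, its parameters, and the "visited bits divided by NFA states, minus one" budget are assumptions chosen only to show how an unchecked subtraction wraps around and how a saturating one avoids it.

```rust
/// Hypothetical budget calculation: with one "visited" bit per
/// (NFA state, haystack position) pair, the number of positions that fit is
/// `visited_bits / nfa_states`, and a haystack of length `n` has `n + 1`
/// positions. The saturating subtraction keeps the result at 0 when no
/// position fits, instead of wrapping around to a huge value.
fn max_haystack_len(visited_bits: usize, nfa_states: usize) -> usize {
    let positions = visited_bits / nfa_states.max(1);
    positions.saturating_sub(1)
}

fn main() {
    // Plenty of room: 10 positions fit, so haystacks up to length 9 are fine.
    assert_eq!(max_haystack_len(100, 10), 9);
    // Too many states for the budget: zero positions fit.
    assert_eq!(max_haystack_len(5, 10), 0);
    // What a wrapping subtraction would have produced for the same input.
    assert_eq!((5usize / 10).wrapping_sub(1), usize::MAX);
}
```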


1.9.5 (2023-09-02)
==================
This is a patch release that hopefully mostly fixes a performance bug that
occurs when sharing a regex across multiple threads.

Issue [#934](https://github.com/rust-lang/regex/issues/934)
explains this in more detail. It is [also noted in the crate
documentation](https://docs.rs/regex/latest/regex/#sharing-a-regex-across-threads-can-result-in-contention).
The bug can appear when sharing a regex across multiple threads simultaneously,
as might be the case when using a regex from a `OnceLock`, `lazy_static` or
similar primitive. Usually high contention only results when using many threads
to execute searches on small haystacks.

One can avoid the contention problem entirely through one of two methods.
The first is to use lower level APIs from `regex-automata` that require passing
state explicitly, such as [`meta::Regex::search_with`](https://docs.rs/regex-automata/latest/regex_automata/meta/struct.Regex.html#method.search_with).
The second is to clone a regex and send it to other threads explicitly. This
will not use any additional memory compared to sharing the regex. The only
downside of this approach is that it may be less convenient. For example, it
won't work with things like `OnceLock` or `lazy_static` or `once_cell`.

With that said, as of this release, the contention performance problems have
been greatly reduced. This was achieved by changing the free-list so that it
was sharded across threads, and by ensuring that each sharded mutex occupies a
single cache line to mitigate false sharing. So while contention may still
impact performance in some cases, it should be a lot better now.
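
A sharded free-list with cache-line-aligned mutexes can be sketched roughly as below. This is a simplified illustration and not the crate's pool. `Shard`, `ShardedPool`, the 64-byte line size, and picking a shard from a caller-supplied `thread_index` are all assumptions made for the example.

```rust
use std::sync::Mutex;

/// One shard of the free-list. The alignment pads each shard to its own
/// (assumed 64-byte) cache line so that locking one shard does not
/// invalidate the line holding a neighboring shard.
#[repr(align(64))]
struct Shard<T>(Mutex<Vec<T>>);

/// Hypothetical pool whose free-list is split across several shards.
struct ShardedPool<T> {
    shards: Vec<Shard<T>>,
    create: fn() -> T,
}

impl<T> ShardedPool<T> {
    fn new(num_shards: usize, create: fn() -> T) -> Self {
        let shards = (0..num_shards.max(1))
            .map(|_| Shard(Mutex::new(Vec::new())))
            .collect();
        ShardedPool { shards, create }
    }

    /// Takes a value from the caller's shard, creating one if it is empty.
    fn get(&self, thread_index: usize) -> T {
        let shard = &self.shards[thread_index % self.shards.len()];
        // The lock guard is dropped at the end of this statement.
        let popped = shard.0.lock().unwrap().pop();
        popped.unwrap_or_else(self.create)
    }

    /// Returns a value to the caller's shard for later reuse.
    fn put(&self, thread_index: usize, value: T) {
        let shard = &self.shards[thread_index % self.shards.len()];
        shard.0.lock().unwrap().push(value);
    }
}

fn main() {
    let pool: ShardedPool<Vec<u8>> = ShardedPool::new(4, Vec::new);
    let mut scratch = pool.get(1);
    scratch.push(42);
    pool.put(1, scratch);
    // The same shard hands back the value that was returned to it.
    assert_eq!(pool.get(1), vec![42]);
    // A different shard has nothing cached and creates a fresh value.
    assert!(pool.get(2).is_empty());
    assert_eq!(std::mem::align_of::<Shard<Vec<u8>>>(), 64);
}
```

Threads that map to different shards never touch the same mutex, which is what reduces contention. The alignment addresses the separate problem of false sharing between adjacent shards.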

Because of the changes to how the free-list works, please report any issues you
find with this release. That not only includes search time regressions but also
significant regressions in memory usage. Reporting improvements is welcome as
well! If possible, provide a reproduction.

Bug fixes:

* [BUG #934](https://github.com/rust-lang/regex/issues/934):
Fix a performance bug where high contention on a single regex led to massive
slowdowns.


1.9.4 (2023-08-26)
==================
This is a patch release that fixes a bug where `RegexSet::is_match(..)` could
incorrectly return false (even when `RegexSet::matches(..).matched_any()`
returns true).

Bug fixes:

* [BUG #1070](https://github.com/rust-lang/regex/issues/1070):
Fix a bug where a prefilter was incorrectly configured for a `RegexSet`.


1.9.3 (2023-08-05)
==================
This is a patch release that fixes a bug where some searches could result in
incorrect match offsets being reported. It is difficult to characterize the
types of regexes susceptible to this bug. They generally involve patterns
that contain no prefix or suffix literals, but have an inner literal along with
a regex prefix that can conditionally match.

Bug fixes:

* [BUG #1060](https://github.com/rust-lang/regex/issues/1060):
Fix a bug with the reverse inner literal optimization reporting incorrect match
offsets.


1.9.2 (2023-08-05)
==================
This is a patch release that fixes another memory usage regression. This
particular regression occurred only when using a `RegexSet`. In some cases,
much more heap memory (by one or two orders of magnitude) was allocated than in
versions prior to 1.9.0.

Bug fixes:

* [BUG #1059](https://github.com/rust-lang/regex/issues/1059):
Fix a memory usage regression when using a `RegexSet`.


1.9.1 (2023-07-07)
==================
This is a patch release which fixes a memory usage regression. In the regex
11 changes: 6 additions & 5 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "regex"
version = "1.9.1" #:version
version = "1.10.2" #:version
authors = ["The Rust Project Developers", "Andrew Gallant <[email protected]>"]
license = "MIT OR Apache-2.0"
readme = "README.md"
@@ -15,7 +15,7 @@ categories = ["text-processing"]
autotests = false
exclude = ["/scripts/*", "/.github/*"]
edition = "2021"
rust-version = "1.60.0"
rust-version = "1.65"

[workspace]
members = [
@@ -52,6 +52,7 @@ std = [
# to actually emit the log messages somewhere.
logging = [
"aho-corasick?/logging",
"memchr?/logging",
"regex-automata/logging",
]
# The 'use_std' feature is DEPRECATED. It will be removed in regex 2. Until
@@ -167,20 +168,20 @@ optional = true

# For skipping along search text quickly when a leading byte is known.
[dependencies.memchr]
version = "2.5.0"
version = "2.6.0"
optional = true

# For the actual regex engines.
[dependencies.regex-automata]
path = "regex-automata"
version = "0.3.1"
version = "0.4.3"
default-features = false
features = ["alloc", "syntax", "meta", "nfa-pikevm"]

# For parsing regular expressions.
[dependencies.regex-syntax]
path = "regex-syntax"
version = "0.7.3"
version = "0.8.2"
default-features = false

[dev-dependencies]