
Commit

Merge pull request #122 from Alexhuszagh/safety
Initial Litany of Safety Enhancements.

This removes a lot of unsafe code, documents the cases where removing it would have significant performance impacts but the safety invariants can be easily guaranteed, and makes other enhancements to remove potentially unsafe behavior. It also reworks some of the architecture so that more code is wrapped in safe variants: for example, rather than checking `if x.get(0) == b'0'` and then doing an unchecked index, the code now peeks and steps in a single function, where applicable. This also simplifies the code base a lot.

Part of many commits to address #100.
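
The "peek and step in a single function" idea can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Cursor` type and its `peek`, `read_if`, and `position` methods are hypothetical names chosen for the example.

```rust
/// Hypothetical byte cursor; the names here are illustrative assumptions,
/// not lexical's actual iterator API.
struct Cursor<'a> {
    bytes: &'a [u8],
    index: usize,
}

impl<'a> Cursor<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, index: 0 }
    }

    /// Look at the next byte without consuming it (bounds-checked).
    fn peek(&self) -> Option<u8> {
        self.bytes.get(self.index).copied()
    }

    /// Peek and step in one call: consume the next byte only if it
    /// equals `expected`, and report whether it was consumed.
    fn read_if(&mut self, expected: u8) -> bool {
        match self.peek() {
            Some(c) if c == expected => {
                self.index += 1;
                true
            }
            _ => false,
        }
    }

    /// Number of bytes consumed so far.
    fn position(&self) -> usize {
        self.index
    }
}
```

Because the check and the advance live in the same safe function, callers never need a separate unchecked index after the comparison, so there is no safety invariant left for them to uphold.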
Alexhuszagh authored Sep 11, 2024
2 parents 5611efb + 13194a5 commit 19bf353
Showing 44 changed files with 968 additions and 1,825 deletions.
1 change: 1 addition & 0 deletions CHANGELOG
@@ -30,6 +30,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed

- Support for mips (MIPS), mipsel (MIPS LE), mips64 (MIPS64 BE), and mips64el (MIPS64 LE) on Linux.
- All `_unchecked` API methods, since the performance benefits are dubious and it makes safety invariant checking much harder.

## [0.8.5] 2022-06-06

50 changes: 25 additions & 25 deletions CODE_OF_CONDUCT.md
@@ -21,29 +21,29 @@ In the interest of fostering an open and welcoming environment, we as contributo

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language.
* Being respectful of differing viewpoints and experiences.
* Gracefully accepting constructive feedback.
* Focusing on what is best for the community.
* Showing empathy and kindness towards other community members.
* Encouraging and raising up your peers in the project so you can all bask in hacks and glory.
- Using welcoming and inclusive language.
- Being respectful of differing viewpoints and experiences.
- Gracefully accepting constructive feedback.
- Focusing on what is best for the community.
- Showing empathy and kindness towards other community members.
- Encouraging and raising up your peers in the project so you can all bask in hacks and glory.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances, including when simulated online. The only exception to sexual topics is channels/spaces specifically for topics of sexual identity.
* Casual mention of slavery or indentured servitude and/or false comparisons of one's occupation or situation to slavery. Please consider using or asking about alternate terminology when referring to such metaphors in technology.
* Making light of/making mocking comments about trigger warnings and content warnings.
* Trolling, insulting/derogatory comments, and personal or political attacks.
* Public or private harassment, deliberate intimidation, or threats.
* Publishing others' private information, such as a physical or electronic address, without explicit permission. This includes any sort of "outing" of any aspect of someone's identity without their consent.
* Publishing private screenshots or quotes of interactions in the context of this project without all quoted users' *explicit* consent.
* Publishing of private communication that doesn't have to do with reporting harrassment.
* Any of the above even when [presented as "ironic" or "joking"](https://en.wikipedia.org/wiki/Hipster_racism).
* Any attempt to present "reverse-ism" versions of the above as violations. Examples of reverse-isms are "reverse racism", "reverse sexism", "heterophobia", and "cisphobia".
* Unsolicited explanations under the assumption that someone doesn't already know it. Ask before you teach! Don't assume what people's knowledge gaps are.
* [Feigning or exaggerating surprise](https://www.recurse.com/manual#no-feigned-surprise) when someone admits to not knowing something.
* "[Well-actuallies](https://www.recurse.com/manual#no-well-actuallys)"
* Other conduct which could reasonably be considered inappropriate in a professional or community setting.
- The use of sexualized language or imagery and unwelcome sexual attention or advances, including when simulated online. The only exception to sexual topics is channels/spaces specifically for topics of sexual identity.
- Casual mention of slavery or indentured servitude and/or false comparisons of one's occupation or situation to slavery. Please consider using or asking about alternate terminology when referring to such metaphors in technology.
- Making light of/making mocking comments about trigger warnings and content warnings.
- Trolling, insulting/derogatory comments, and personal or political attacks.
- Public or private harassment, deliberate intimidation, or threats.
- Publishing others' private information, such as a physical or electronic address, without explicit permission. This includes any sort of "outing" of any aspect of someone's identity without their consent.
- Publishing private screenshots or quotes of interactions in the context of this project without all quoted users' *explicit* consent.
- Publishing of private communication that doesn't have to do with reporting harrassment.
- Any of the above even when [presented as "ironic" or "joking"](https://en.wikipedia.org/wiki/Hipster_racism).
- Any attempt to present "reverse-ism" versions of the above as violations. Examples of reverse-isms are "reverse racism", "reverse sexism", "heterophobia", and "cisphobia".
- Unsolicited explanations under the assumption that someone doesn't already know it. Ask before you teach! Don't assume what people's knowledge gaps are.
- [Feigning or exaggerating surprise](https://www.recurse.com/manual#no-feigned-surprise) when someone admits to not knowing something.
- "[Well-actuallies](https://www.recurse.com/manual#no-well-actuallys)"
- Other conduct which could reasonably be considered inappropriate in a professional or community setting.

## Scope

@@ -70,12 +70,12 @@ You may get in touch with the maintainer team through any of the following metho

### Further Enforcement

If you've already followed the [initial enforcement steps](#enforcement), these are the steps maintainers will take for further enforcement, as needed:
If you've already followed the [initial enforcement steps](#maintainer-enforcement-process), these are the steps maintainers will take for further enforcement, as needed:

1. Repeat the request to stop.
2. If the person doubles down, they will have offending messages removed or edited by a maintainers given an official warning. The PR or Issue may be locked.
3. If the behavior continues or is repeated later, the person will be blocked from participating for 24 hours.
4. If the behavior continues or is repeated after the temporary block, a long-term (6-12mo) ban will be used.
1. Repeat the request to stop.
2. If the person doubles down, they will have offending messages removed or edited by a maintainers given an official warning. The PR or Issue may be locked.
3. If the behavior continues or is repeated later, the person will be blocked from participating for 24 hours.
4. If the behavior continues or is repeated after the temporary block, a long-term (6-12mo) ban will be used.

On top of this, maintainers may remove any offending messages, images, contributions, etc, as they deem necessary.

35 changes: 17 additions & 18 deletions README.md
@@ -1,5 +1,4 @@
lexical
=======
# lexical

High-performance numeric conversion routines for use in a `no_std` environment. This does not depend on any standard library features, nor a system allocator.

@@ -26,7 +25,7 @@ If you want a minimal, stable, and compile-time friendly version of lexical's fl
- [License](#license)
- [Contributing](#contributing)

# Getting Started
## Getting Started

Add lexical to your `Cargo.toml`:

@@ -67,7 +66,7 @@ where
}
```

# Partial/Complete Parsers
## Partial/Complete Parsers

Lexical has both partial and complete parsers: the complete parsers ensure the entire buffer is used while parsing, without ignoring trailing characters, while the partial parsers parse as many characters as possible, returning both the parsed value and the number of parsed digits. Upon encountering an error, lexical will return an error indicating both the error type and the index at which the error occurred inside the buffer.

@@ -88,7 +87,7 @@ let x: i32 = lexical_core::parse(b"123 456")?;
let (x, count): (i32, usize) = lexical_core::parse_partial(b"123 456")?;
```

# no_std
## no_std

`lexical-core` does not depend on a standard library, nor a system allocator. To use `lexical-core` in a `no_std` environment, add the following to `Cargo.toml`:

@@ -120,7 +119,7 @@ let d: f64 = lexical_core::parse(b"3.5")?; // Ok(3.5), error checking parse.
let d: f64 = lexical_core::parse(b"3a")?; // Err(Error(_)), failed to parse.
```

# Features
## Features

Lexical feature-gates each numeric conversion routine, resulting in faster compile times if certain numeric conversions. These features can be enabled/disabled for both `lexical-core` (which does not require a system allocator) and `lexical`. By default, all conversions are enabled.

@@ -149,7 +148,7 @@ To ensure the safety when bounds checking is disabled, we extensively fuzz the a

Lexical also places a heavy focus on code bloat: with algorithms both optimized for performance and size. By default, this focuses on performance, however, using the `compact` feature, you can also opt-in to reduced code size at the cost of performance. The compact algorithms minimize the use of pre-computed tables and other optimizations at the cost of performance.

# Customization
## Customization

> **WARNING:** If changing the number of significant digits written, disabling the use of exponent notation, or changing exponent notation thresholds, `BUFFER_SIZE` may be insufficient to hold the resulting output. `WriteOptions::buffer_size` will provide a correct upper bound on the number of bytes written. If a buffer of insufficient length is provided, lexical-core will panic.
@@ -176,7 +175,7 @@ Due the high variability in the syntax of numbers in different programming and d

A limited subset of functionality is documented in examples below, however, the complete specification can be found in the API reference documentation.

## Number Format API
### Number Format API

The number format class provides numerous flags to specify number syntax when parsing or writing. When the `power-of-two` feature is enabled, additional flags are added:

@@ -213,7 +212,7 @@ const FORMAT: u128 = lexical_core::NumberFormatBuilder::new()
debug_assert!(lexical_core::format_is_valid::<FORMAT>());
```

## Options API
### Options API

The options API allows customizing number parsing and writing at run-time, such as specifying the maximum number of significant digits, exponent characters, and more.

@@ -239,7 +238,7 @@ let options = lexical_core::WriteFloatOptions::builder()
.unwrap();
```

# Documentation
## Documentation

Lexical's API reference can be found on [docs.rs](https://docs.rs/lexical), as can [lexical-core's](lexical-core). Detailed descriptions of the algorithms used can be found here:

@@ -250,7 +249,7 @@ Lexical's API reference can be found on [docs.rs](https://docs.rs/lexical), as c

In addition, descriptions of how lexical handles [digit separators](https://github.com/Alexhuszagh/rust-lexical/blob/main/docs/DigitSeparators.md) and implements [big-integer arithmetic](https://github.com/Alexhuszagh/rust-lexical/blob/main/lexical-parse-float/docs/BigInteger.md) are also documented.

# Validation
## Validation

**Float-Parsing**

@@ -264,7 +263,7 @@ Float parsing is difficult to do correctly, and major bugs have been found in im

Although lexical may contain bugs leading to rounding error, it is tested against a comprehensive suite of random-data and near-halfway representations, and should be fast and correct for the vast majority of use-cases.

# Metrics
## Metrics

Various benchmarks, binary sizes, and compile times are shown here:

@@ -305,13 +304,13 @@ A benchmark on writing floats generated via a random-number generator and parsed

![Random Data](https://raw.githubusercontent.com/Alexhuszagh/rust-lexical/main/lexical-write-float/assets/json.svg)

# Safety
## Safety

Due to the use of memory unsafe code in the integer and float writers, we extensively fuzz our float writers and parsers. The fuzz harnesses may be found under [fuzz](https://github.com/Alexhuszagh/rust-lexical/tree/main/fuzz), and are run continuously. So far, we've parsed and written over 72 billion floats.

Due to the simple logic of the integer writers, and the lack of memory safety in the integer parsers, we minimally fuzz both, and test it with edge-cases, which has shown no memory safety issues to date.

# Platform Support
## Platform Support

lexical-core is tested on a wide variety of platforms, including big and small-endian systems, to ensure portable code. Supported architectures include:
- x86_64 Linux, Windows, macOS, Android, iOS, FreeBSD, and NetBSD.
@@ -326,7 +325,7 @@ lexical-core is tested on a wide variety of platforms, including big and small-e

lexical-core should also work on a wide variety of other architectures and ISAs. If you have any issue compiling lexical-core on any architecture, please file a bug report.

# Versioning and Version Support
## Versioning and Version Support

**Version Support**

@@ -349,15 +348,15 @@ Please report any errors compiling a supported lexical-core version on a compati

lexical uses [semantic versioning](https://semver.org/). Removing support for Rustc versions newer than the latest stable Debian or Ubuntu version is considered an incompatible API change, requiring a major version change.

# Changelog
## Changelog

All changes are documented in [CHANGELOG](https://github.com/Alexhuszagh/rust-lexical/blob/main/CHANGELOG).

# License
## License

Lexical is dual licensed under the Apache 2.0 license as well as the MIT license. See the [LICENSE.md](LICENSE.md) file for full license details.

# Contributing
## Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in lexical by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions. Contributing to the repository means abiding by the [code of conduct](https://github.com/Alexhuszagh/rust-lexical/blob/main/CODE_OF_CONDUCT.md).

4 changes: 2 additions & 2 deletions docs/BinarySize.md
@@ -6,7 +6,7 @@ Each binary is generated using all optimization levels, and includes the result

All these binaries sizes are *relative* to the size of an empty Rust binary: that is, the size of the empty executable is subtracted from the total binary's size. For some cases, this leads to results of 0 bytes, which isn't real, but in practice leads to no additional size in the resulting executable.

# Default
## Default

**Optimization Level "0"**

@@ -50,7 +50,7 @@ All these binaries sizes are *relative* to the size of an empty Rust binary: tha
![Parse Stripped - Optimization Level "z"](https://raw.githubusercontent.com/Alexhuszagh/rust-lexical/main/assets/size_parse_stripped_optz_posix.svg)
![Write Stripped - Optimization Level "z"](https://raw.githubusercontent.com/Alexhuszagh/rust-lexical/main/assets/size_write_stripped_optz_posix.svg)

# Compact
## Compact

**Optimization Level "0"**

10 changes: 5 additions & 5 deletions docs/Development.md
@@ -7,7 +7,7 @@ cargo +nightly build
cargo +nightly test
```

# Code Structure
## Code Structure

Lexical is broken up into compact, relatively isolated workspaces to separate functionality based on the numeric conversion, minimizing compile times and simplifying testing feature-dependent code. The workspaces are:

@@ -26,7 +26,7 @@ Furthermore, any unsafe code uses the following conventions:
1. Each unsafe function must contain a `# Safety` section.
2. Unsafe operations/calls in unsafe functions must be marked as unsafe, with their safety guarantees clearly documented via a `// SAFETY:` section.

# Dependencies
## Dependencies

In order to fully test and develop lexical, a recent, nightly compiler along with following Rust dependencies is required:

@@ -57,7 +57,7 @@ In addition, the following non-Rust dependencies must be installed:
- python-magic (python-magic-win64 on Windows)
- Valgrind

# Development Process
## Development Process

The [scripts](https://github.com/Alexhuszagh/rust-lexical/tree/main/scripts) directory contains numerous scripts for testing, fuzzing, analyzing, and formatting code. Since many development features are nightly-only, this ensures the proper compiler features are used. This requires a recent version of a nightly compiler (1.65.0+) installed via Rustup, which can be invoked as `cargo +nightly`.

@@ -87,7 +87,7 @@ scripts/check.sh
SKIP_MIRI=1 scripts/test.sh
```

# Safety
## Safety

In order to ensure memory safety even when using unsafe features, we have the following requirements.

@@ -106,6 +106,6 @@ RUSTFLAGS="--deny warnings" cargo +nightly build --features=lint
cargo +nightly clippy --all-features -- --deny warnings
```

# Algorithm Changes
## Algorithm Changes

Each workspace has a "docs" directory containing detailed descriptions of algorithms and benchmarks. If you make any substantial changes to an algorithm, you should both update the algorithm description and the provided benchmarks.
13 changes: 6 additions & 7 deletions docs/DigitSeparators.md
@@ -1,5 +1,4 @@
Digit Separators
================
# Digit Separators

Supporting performant parsers using digit separators in a no-allocator context is difficult to support correctly with adequate performance. One of the major issues is that the syntax of numbers that accept digit separators varies between implementations.

@@ -25,11 +24,11 @@ double x = 1._0; // invalid

This means any parser must be context-aware, and also understand control characters: a digit separator followed by a decimal point is a trailing digit separator, while one followed by a digit is an internal one.

# Defining Grammar
## Defining Grammar

Due to the context-aware nature, it's important to define the grammar on how digit separators work:

1. Leading digit separators come before any other input, or after control characters. Any digit separators after a leading digit separator are considered leading, even if consecutive digit separators are not allowed.
- Leading digit separators come before any other input, or after control characters. Any digit separators after a leading digit separator are considered leading, even if consecutive digit separators are not allowed.

Examples therefore include:

@@ -44,7 +43,7 @@ __1.0
1.0e__5
```

2. Trailing digit separators come after any other input, or before control characters. Any digit separators before another trailing digit separator are considered trailing, even if consecutive digit separators are not allowed.
- Trailing digit separators come after any other input, or before control characters. Any digit separators before another trailing digit separator are considered trailing, even if consecutive digit separators are not allowed.

Examples therefore include:

@@ -59,7 +58,7 @@ Examples therefore include:
1.0e5__
```

3. Internal digit separators therefore are any digit separators that cannot be classified as leading or trailing. Likewise, any digit separators that are adjacent to another internal digit separator are considered internal, even if consecutive digit separators are not allowed.
- Internal digit separators therefore are any digit separators that cannot be classified as leading or trailing. Likewise, any digit separators that are adjacent to another internal digit separator are considered internal, even if consecutive digit separators are not allowed.

Examples therefore include:

@@ -78,7 +77,7 @@ Examples therefore include:

This opens up a lot of possibilities: what is a valid control character? In practice, it's much easier to define control characters as every character that's not a valid digit, and therefore to handle parsing we just need to check against valid digits and the digit separator.
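
The grammar above can be sketched as a small classifier. This is an illustration only and not lexical's implementation: `SeparatorKind` and `classify_separators` are hypothetical names, only ASCII decimal digits count as digits, and a run of separators with control characters on both sides is (as an assumption) treated as leading.

```rust
/// Hypothetical classification of a digit separator, following the
/// leading/internal/trailing grammar described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SeparatorKind {
    Leading,
    Internal,
    Trailing,
}

/// Classify every separator byte in `input`, returning `(index, kind)` pairs.
/// A "control character" is any byte that is neither a digit nor the separator.
fn classify_separators(input: &[u8], separator: u8) -> Vec<(usize, SeparatorKind)> {
    let mut result = Vec::new();
    let mut index = 0;
    while index < input.len() {
        if input[index] != separator {
            index += 1;
            continue;
        }
        // Find the whole run of consecutive separators: the run shares one kind.
        let start = index;
        while index < input.len() && input[index] == separator {
            index += 1;
        }
        let digit_before = start > 0 && input[start - 1].is_ascii_digit();
        let digit_after = index < input.len() && input[index].is_ascii_digit();
        let kind = if !digit_before {
            // Start of input, or preceded by a control character.
            SeparatorKind::Leading
        } else if !digit_after {
            // End of input, or followed by a control character.
            SeparatorKind::Trailing
        } else {
            // Digits on both sides.
            SeparatorKind::Internal
        };
        for position in start..index {
            result.push((position, kind));
        }
    }
    result
}
```

For instance, under these assumptions `classify_separators(b"1_.0", b'_')` reports the separator at index 1 as trailing, while the one in `1._0` is reported as leading because it follows the decimal point.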

# Iterator Design
## Iterator Design

The iterator is therefore a generic based on the format specification: this allows the iterator to resolve all unnecessary branching at compile time.

3 changes: 1 addition & 2 deletions fuzz/README.md
@@ -1,4 +1,3 @@
lexical-fuzz
============
# lexical-fuzz

Fuzzing routines to minimize the risk of any memory unsafety. See [scripts/fuzz.sh](/scripts/fuzz.sh) for use.
3 changes: 1 addition & 2 deletions lexical-asm/README.md
@@ -1,5 +1,4 @@
lexical-asm
===========
# lexical-asm

Utilities to carefully monitor the assembly generation of lexical's numeric conversion routines. See [scripts/asm.sh](/scripts/asm.sh) for use.

5 changes: 2 additions & 3 deletions lexical-benchmark/README.md
@@ -1,9 +1,8 @@
lexical-benchmark
=================
# lexical-benchmark

Benchmarks comparing lexical to other numeric conversion routines.

# Running the Benchmark
## Running the Benchmark

The benchmark requires the following:
