Improve documentation
- Move comparison table to a separate section.
- Use CSS icons to make table more readable.
- Refer to the table from backend documentations.
- Explain how backends store and manipulate interned data.

Signed-off-by: Tin Švagelj <[email protected]>
Caellian committed Nov 21, 2024
1 parent 6dcb898 commit dfa3285
Showing 6 changed files with 467 additions and 133 deletions.
60 changes: 27 additions & 33 deletions src/backend/bucket/mod.rs
@@ -9,39 +9,33 @@ use crate::{symbol::expect_valid_symbol, DefaultSymbol, Symbol};
use alloc::{string::String, vec::Vec};
use core::{iter::Enumerate, marker::PhantomData, slice};

/// An interner backend that reduces memory allocations by using string buckets.
///
/// # Note
///
/// Implementation inspired by matklad's blog post that can be found here:
/// <https://matklad.github.io/2020/03/22/fast-simple-rust-interner.html>
///
/// # Usage Hint
///
/// Use when deallocations or copy overhead is costly or when
/// interning of static strings is especially common.
///
/// # Usage
///
/// - **Fill:** Efficiency of filling an empty string interner.
/// - **Resolve:** Efficiency of interned string look-up given a symbol.
/// - **Allocations:** The number of allocations performed by the backend.
/// - **Footprint:** The total heap memory consumed by the backend.
/// - **Contiguous:** True if the returned symbols have contiguous values.
/// - **Iteration:** Efficiency of iterating over the interned strings.
///
/// Rating varies between **bad**, **ok**, **good** and **best**.
///
/// | Scenario | Rating |
/// |:------------|:--------:|
/// | Fill | **good** |
/// | Resolve | **best** |
/// | Allocations | **good** |
/// | Footprint | **ok** |
/// | Supports `get_or_intern_static` | **yes** |
/// | `Send` + `Sync` | **yes** |
/// | Contiguous | **yes** |
/// | Iteration | **best** |
/// An interner backend that reduces memory allocations by using buckets.
///
/// # Overview
/// This interner uses fixed-size buckets to store interned strings. Each bucket is
/// allocated once and holds a set number of strings. When a bucket becomes full, a new
/// bucket is allocated to hold more strings. Buckets are never deallocated, which reduces
/// the overhead of frequent memory allocations and copying.
///
/// ## Trade-offs
/// - **Advantages:**
///   - Strings in already used buckets remain valid and accessible even as new strings
///     are added.
/// - **Disadvantages:**
///   - Slightly slower access times due to double indirection (looking up the string
///     involves an extra level of lookup through the bucket).
///   - Memory may be used inefficiently if many buckets are allocated but only partially
///     filled because of large strings.
///
/// ## Use Cases
/// This backend is ideal when interned strings must remain valid even after new ones are
/// added, and it is also suitable for general use.
///
/// Refer to the [comparison table][crate::_docs::comparison_table] for comparison with
/// other backends.
///
/// Implementation is inspired by [matklad's blog post].
///
/// [matklad's blog post]:
/// https://matklad.github.io/2020/03/22/fast-simple-rust-interner.html
#[derive(Debug)]
pub struct BucketBackend<'i, S: Symbol = DefaultSymbol> {
spans: Vec<InternedStr>,
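
The bucket scheme described in the new overview can be illustrated with a minimal sketch. This is not the crate's implementation: `BucketSketch`, its fields, and the growth policy are hypothetical, and it stores `(bucket, start, end)` triples in safe Rust, so that the double indirection mentioned under Trade-offs is visible in `resolve`.

```rust
/// Hypothetical, simplified bucket store (an assumption for illustration,
/// not the layout used by `BucketBackend`).
struct BucketSketch {
    /// Retired buckets. Their heap buffers are never reallocated again.
    full: Vec<String>,
    /// The bucket currently being filled.
    head: String,
    /// One entry per symbol: (bucket index, start offset, end offset).
    spans: Vec<(usize, usize, usize)>,
}

impl BucketSketch {
    fn with_capacity(cap: usize) -> Self {
        Self {
            full: Vec::new(),
            head: String::with_capacity(cap),
            spans: Vec::new(),
        }
    }

    /// Appends `s` and returns its symbol (a contiguous index).
    fn intern(&mut self, s: &str) -> usize {
        // If `s` does not fit into the remaining capacity, retire the current
        // bucket and allocate a new one that is large enough. Existing string
        // data is never copied or moved.
        if self.head.capacity() - self.head.len() < s.len() {
            let new_cap = usize::max(self.head.capacity(), s.len()).next_power_of_two();
            let old = std::mem::replace(&mut self.head, String::with_capacity(new_cap));
            self.full.push(old);
        }
        let start = self.head.len();
        self.head.push_str(s);
        // `head` will sit at index `full.len()` once it is retired.
        let bucket = self.full.len();
        self.spans.push((bucket, start, start + s.len()));
        self.spans.len() - 1
    }

    /// First lookup: the span. Second lookup: the bucket it points into.
    fn resolve(&self, sym: usize) -> Option<&str> {
        let &(bucket, start, end) = self.spans.get(sym)?;
        let buf = self.full.get(bucket).unwrap_or(&self.head);
        Some(&buf[start..end])
    }
}
```

A large string arriving while the current bucket is almost empty leaves the rest of that bucket unused, which is the footprint cost listed under Disadvantages.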
38 changes: 13 additions & 25 deletions src/backend/buffer.rs
@@ -5,34 +5,22 @@ use crate::{symbol::expect_valid_symbol, DefaultSymbol, Symbol};
use alloc::vec::Vec;
use core::{mem, str};

/// An interner backend that appends all interned string information in a single buffer.
/// An interner backend that concatenates all interned string contents into one large
/// buffer [`Vec`]. Unlike [`StringBackend`][crate::backend::StringBackend], the length
/// of each string is stored in the same buffer, immediately preceding the respective
/// string data.
///
/// # Usage Hint
/// ## Trade-offs
/// - **Advantages:**
///   - Accessing interned strings is fast, as it requires a single lookup.
/// - **Disadvantages:**
///   - Iteration is slow because it requires consecutive reading of lengths to advance.
///
/// Use this backend if memory consumption is what matters most to you.
/// Note though that unlike all other backends symbol values are not contiguous!
/// ## Use Cases
/// This backend is ideal for storing many small (<255 characters) strings.
///
/// # Usage
///
/// - **Fill:** Efficiency of filling an empty string interner.
/// - **Resolve:** Efficiency of interned string look-up given a symbol.
/// - **Allocations:** The number of allocations performed by the backend.
/// - **Footprint:** The total heap memory consumed by the backend.
/// - **Contiguous:** True if the returned symbols have contiguous values.
/// - **Iteration:** Efficiency of iterating over the interned strings.
///
/// Rating varies between **bad**, **ok**, **good** and **best**.
///
/// | Scenario | Rating |
/// |:------------|:--------:|
/// | Fill | **best** |
/// | Resolve | **bad** |
/// | Allocations | **best** |
/// | Footprint | **best** |
/// | Supports `get_or_intern_static` | **no** |
/// | `Send` + `Sync` | **yes** |
/// | Contiguous | **no** |
/// | Iteration | **bad** |
/// Refer to the [comparison table][crate::_docs::comparison_table] for comparison with
/// other backends.
#[derive(Debug)]
pub struct BufferBackend<'i, S: Symbol = DefaultSymbol> {
len_strings: usize,
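
The length-prefixed layout described above can be sketched as follows. The names are hypothetical, and the fixed 4-byte little-endian length header is an assumption made for brevity; the crate's actual length encoding may differ. The symbol here is the byte offset of the header, which shows why resolving takes a single buffer lookup, why symbol values are not contiguous, and why iteration has to read each length in turn.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical, simplified single-buffer store (illustration only).
#[derive(Default)]
struct BufferSketch {
    /// Layout: [len0: u32 LE][bytes0][len1: u32 LE][bytes1]...
    buf: Vec<u8>,
}

impl BufferSketch {
    /// Appends the length header and the string bytes.
    /// The returned symbol is the byte offset of the header.
    fn intern(&mut self, s: &str) -> usize {
        let sym = self.buf.len();
        let len = u32::try_from(s.len()).expect("string too long for this sketch");
        self.buf.extend_from_slice(&len.to_le_bytes());
        self.buf.extend_from_slice(s.as_bytes());
        sym
    }

    /// Reads the header at `sym`, then slices the string that follows it.
    fn resolve(&self, sym: usize) -> Option<&str> {
        let header = self.buf.get(sym..sym.checked_add(4)?)?;
        let len = u32::from_le_bytes(header.try_into().ok()?) as usize;
        let start = sym + 4;
        let bytes = self.buf.get(start..start.checked_add(len)?)?;
        std::str::from_utf8(bytes).ok()
    }

    /// Iteration must decode every header to find where the next entry starts.
    fn iter(&self) -> impl Iterator<Item = (usize, &str)> + '_ {
        let mut pos = 0;
        std::iter::from_fn(move || {
            let s = self.resolve(pos)?;
            let sym = pos;
            pos += 4 + s.len();
            Some((sym, s))
        })
    }
}
```

In this sketch every entry pays a fixed 4-byte header. A more compact length encoding would favor short strings, but the sketch does not model that.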
47 changes: 18 additions & 29 deletions src/backend/string.rs
@@ -5,38 +5,27 @@ use crate::{symbol::expect_valid_symbol, DefaultSymbol, Symbol};
use alloc::{string::String, vec::Vec};
use core::{iter::Enumerate, slice};

/// An interner backend that accumulates all interned string contents into one string.
/// An interner backend that concatenates all interned string contents into one large
/// buffer and keeps track of string bounds in a separate [`Vec`].
///
/// Implementation is inspired by [CAD97's](https://github.com/CAD97)
/// [`strena`](https://github.com/CAD97/strena) crate.
///
/// # Note
/// ## Trade-offs
/// - **Advantages:**
///   - Separated length tracking allows fast iteration.
/// - **Disadvantages:**
///   - Many insertions separated by external allocations can cause the buffer to drift
///     far away (in memory) from the `Vec` storing string ends, which impedes the
///     performance of all interning operations.
///   - Resolving a symbol requires two heap lookups because data and length are stored in
///     separate containers.
///
/// Implementation inspired by [CAD97's](https://github.com/CAD97) research
/// project [`strena`](https://github.com/CAD97/strena).
/// ## Use Cases
/// This backend is good for storing fewer large strings and for general use.
///
/// # Usage Hint
///
/// Use this backend if runtime performance is what matters most to you.
///
/// # Usage
///
/// - **Fill:** Efficiency of filling an empty string interner.
/// - **Resolve:** Efficiency of interned string look-up given a symbol.
/// - **Allocations:** The number of allocations performed by the backend.
/// - **Footprint:** The total heap memory consumed by the backend.
/// - **Contiguous:** True if the returned symbols have contiguous values.
/// - **Iteration:** Efficiency of iterating over the interned strings.
///
/// Rating varies between **bad**, **ok**, **good** and **best**.
///
/// | Scenario | Rating |
/// |:------------|:--------:|
/// | Fill | **good** |
/// | Resolve | **ok** |
/// | Allocations | **good** |
/// | Footprint | **good** |
/// | Supports `get_or_intern_static` | **no** |
/// | `Send` + `Sync` | **yes** |
/// | Contiguous | **yes** |
/// | Iteration | **good** |
/// Refer to the [comparison table][crate::_docs::comparison_table] for comparison with
/// other backends.
#[derive(Debug)]
pub struct StringBackend<'i, S: Symbol = DefaultSymbol> {
ends: Vec<usize>,
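
A minimal sketch of the layout described above: one shared string buffer plus a separate vector of end offsets. Only the `ends: Vec<usize>` field appears in the diff; the `buffer` field name and all methods are assumptions for illustration. Deduplication is left out, on the assumption that the interner's lookup table handles it rather than the backend.

```rust
/// Hypothetical, simplified version of the "one buffer plus ends" layout.
#[derive(Default)]
struct StringSketch {
    /// All interned strings, concatenated.
    buffer: String,
    /// `ends[i]` is the end offset of string `i`; its start is `ends[i - 1]` (or 0).
    ends: Vec<usize>,
}

impl StringSketch {
    /// Appends `s` to the buffer and records where it ends.
    /// Symbols are contiguous indices into `ends`.
    fn intern(&mut self, s: &str) -> usize {
        self.buffer.push_str(s);
        self.ends.push(self.buffer.len());
        self.ends.len() - 1
    }

    /// Two heap lookups: one into `ends`, one into `buffer`.
    fn resolve(&self, sym: usize) -> Option<&str> {
        let end = *self.ends.get(sym)?;
        let start = if sym == 0 { 0 } else { self.ends[sym - 1] };
        Some(&self.buffer[start..end])
    }

    /// Iteration walks `ends` linearly without decoding anything from `buffer`.
    fn iter(&self) -> impl Iterator<Item = (usize, &str)> + '_ {
        let mut start = 0;
        self.ends.iter().enumerate().map(move |(sym, &end)| {
            let s = &self.buffer[start..end];
            start = end;
            (sym, s)
        })
    }
}
```

Because the bounds live in their own vector, the per-string overhead is one `usize` regardless of string length, which is relatively cheap for a small number of large strings.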