Version 2 #13

Draft
wants to merge 26 commits into
base: master
69 changes: 0 additions & 69 deletions .github/workflows/rust.yml

This file was deleted.

9 changes: 1 addition & 8 deletions .gitignore
@@ -1,9 +1,2 @@
/target
Cargo.lock
benches/alternatives.html
mail.txt
.vscode/settings.json
mail3.txt
mail2.txt
# This is a file for testing 491179 emails. Too big for git
tests/local_emails.rs
/Cargo.lock
15 changes: 11 additions & 4 deletions Cargo.toml
@@ -1,4 +1,11 @@
[workspace]
members = [
    "email-parser",
]
[package]
name = "email-parser2"
authors = ["Mubelotix <[email protected]>"]
version = "0.1.0"
edition = "2021"

[dependencies]
faster-pest = { path="../../pest-based-parser/faster-pest" }

[dev-dependencies]
email-parser = { path="../email-parser" }
58 changes: 11 additions & 47 deletions README.md
@@ -1,53 +1,17 @@
# email-parser

The fastest and lightest email parsing Rust library!\
This library has no dependencies by default (and only a small optional one).
The fastest and lightest email parsing Rust library!

## Goal
## History

The goal of this library is to be fully compliant with RFC 5322. However, this library does not intend to support the obsolete syntax because it has been obsolete for 12 years, and it would slow down everything.\
This library supports MIME and will support PGP in the future.
Started in mid-2020, this library was originally designed to be a handmade, zero-dependency email parsing library.
The performance of this library was unrivalled.
However, the maintenance cost was too high.

## Example
I started wondering if I could use Rust macros to generate the parsing code.
I liked [pest](https://pest.rs/) but it had terrible performance.
Using the knowledge I got writing the first email parser, I wrote [a better code generator](https://github.com/Mubelotix/faster-pest) for pest grammars.

```rust
let email = Email::parse(
    b"\
    From: Mubelotix <mubelotix@mubelotix.dev>\r\n\
    Subject:Example Email\r\n\
    To: Someone <[email protected]>\r\n\
    Message-id: <[email protected]>\r\n\
    Date: 5 May 2003 18:58:34 +0000\r\n\
    \r\n\
    Hey!\r\n",
)
.unwrap();

assert_eq!(email.subject.unwrap(), "Example Email");
assert_eq!(email.sender.name.unwrap(), vec!["Mubelotix"]);
assert_eq!(email.sender.address.local_part, "mubelotix");
assert_eq!(email.sender.address.domain, "mubelotix.dev");
```

## Pay for what you use

Emails can be elaborate. No matter what you are building, you are almost certainly not using all of their features.\
So why would you pay the parsing cost of header fields you are not using? This library lets you enable only the headers you need; all other header values are parsed as unstructured headers, which is much faster.\
By disabling all header value parsing, this library can parse an entire email twice as fast! But don't worry if you need everything enabled; this library is blazing fast anyway!

## Zero-Copy (almost)

This library avoids owned `String`s as much as possible and uses `Cow<str>` instead.\
Thanks to this method, around 90% of the strings are references.

## Benchmarks

This chart shows the time taken to parse a single email.

![Benchmark](https://cdn.discordapp.com/attachments/694923348844609597/789162705494868020/unknown.png)

Run these benchmarks yourself with `rustup run nightly cargo bench` and `rustup run nightly cargo bench --no-default-features`.\
Tests require a `mail.txt` file containing a raw email next to the `Cargo.toml`.\
Some libraries suffer from huge performance variations depending on the content of the email, so this library is not **always** the fastest.

License: MIT
Then, I made the current version of this library, whose code is almost **entirely automatically generated**.
The generated code is so heavily optimized that it beats my older handmade parser.
This is the **fastest email parser** among all those I have tested.
17 changes: 17 additions & 0 deletions benches/alternatives.rs
@@ -0,0 +1,17 @@
#![feature(test)]

extern crate test;
use test::Bencher;

const MAIL: &[u8] = include_bytes!("../mail.txt");
const MAIL2: &str = include_str!("../mail.txt");

#[bench]
fn email_parser2(b: &mut Bencher) {
    b.iter(|| email_parser2::Email::parse(MAIL2));
}

#[bench]
fn email_parser(b: &mut Bencher) {
    b.iter(|| email_parser::email::Email::parse(MAIL));
}
45 changes: 0 additions & 45 deletions email-parser/Cargo.toml

This file was deleted.

104 changes: 0 additions & 104 deletions email-parser/benches/alternatives.html

This file was deleted.

29 changes: 0 additions & 29 deletions email-parser/benches/alternatives.rs

This file was deleted.

17 changes: 0 additions & 17 deletions email-parser/benches/encodings.rs

This file was deleted.