Merge branch 'main' into add_examples_comments

mrobinson authored Apr 3, 2024
2 parents 1eb6d32 + 9b94335 commit d6e184c
Showing 56 changed files with 700 additions and 716 deletions.
49 changes: 39 additions & 10 deletions .github/workflows/main.yml
@@ -2,7 +2,7 @@ name: CI

 on:
   push:
-    branches: [master]
+    branches: [main]
   pull_request:
   merge_group:
     types: [checks_requested]
@@ -13,9 +13,9 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        version: [1.60.0, stable, beta, nightly]
+        version: [stable, beta, nightly]
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - name: Set toolchain
         run: |
@@ -26,16 +26,10 @@ jobs:
         run: git submodule update --init

       - name: Cargo bench
-        if: matrix.version != '1.41.0'
         run: cargo bench --all
         env:
           RUSTFLAGS: --cfg bench

-      - name: Test "rustc-test/capture" feature
-        if: matrix.version == 'nightly'
-        working-directory: rcdom
-        run: cargo test --features "rustc-test/capture"
-
       - name: Cargo test
         if: matrix.version != 'nightly'
         run: cargo test --all
@@ -44,6 +38,19 @@ jobs:
         if: matrix.version == 'nightly'
         run: cargo doc

+  msrv:
+    name: MSRV
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install stable toolchain
+        run: |
+          rustup set profile minimal
+          rustup override set 1.60.0
+      - run: cargo check --lib --all-features
+
   build_result:
     name: Result
     runs-on: ubuntu-latest
@@ -56,4 +63,26 @@ jobs:
         if: success()
       - name: Mark the job as unsuccessful
         run: exit 1
-        if: "!success()"
+        if: ${{ !success() }}
+
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install stable toolchain
+        run: |
+          rustup set profile minimal
+          rustup override set stable
+      - name: Install clippy
+        run: |
+          rustup component add clippy
+          rustup component add rustfmt
+      - name: Format
+        run: cargo fmt --all -- --check
+
+      - name: Run clippy
+        run: cargo clippy --all-features --all-targets -- -D warnings
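The `Cargo bench` step keeps `RUSTFLAGS: --cfg bench`, which tells rustc to treat `bench` as an enabled configuration flag for the crates compiled in that step. Nothing changes unless code opts in with `#[cfg(bench)]`. A minimal sketch of such gating; the function `build_mode` is hypothetical and not part of html5ever:

```rust
// Compiled only when rustc receives `--cfg bench`,
// e.g. via `RUSTFLAGS="--cfg bench" cargo bench`.
#[cfg(bench)]
pub fn build_mode() -> &'static str {
    "bench"
}

// Compiled in every other build (no `--cfg bench` flag).
#[cfg(not(bench))]
pub fn build_mode() -> &'static str {
    "normal"
}

fn main() {
    println!("built in {} mode", build_mode());
}
```

A normal build selects the `cfg(not(bench))` version; with `RUSTFLAGS="--cfg bench"` the other one is compiled instead. Recent toolchains may report an `unexpected_cfgs` warning for an undeclared custom cfg name like this one.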
1 change: 1 addition & 0 deletions Cargo.toml
@@ -5,3 +5,4 @@ members = [
     "rcdom",
     "xml5ever"
 ]
+resolver = "2"
15 changes: 8 additions & 7 deletions README.md
@@ -7,11 +7,11 @@

 html5ever is an HTML parser developed as part of the [Servo][] project.

-It can parse and serialize HTML according to the [WHATWG](https://whatwg.org/) specs (aka "HTML5"). However, there are some differences in the actual behavior currently, most of which are documented [in the bug tracker][]. html5ever passes all tokenizer tests from [html5lib-tests][], with most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. `document.write`.
+It can parse and serialize HTML according to the [WHATWG](https://whatwg.org/) specs (aka "HTML5"). However, there are some differences in the actual behavior currently, most of which are documented [in the bug tracker][]. html5ever passes all tokenizer tests from [html5lib-tests][], with most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. `document.write`.

-Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (That said, many XHTML documents in the wild are serialized in an HTML-compatible form).
+Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (that said, many XHTML documents in the wild are serialized in an HTML-compatible form).

-html5ever is written in [Rust][], therefore it avoids the notorious security problems that come along with using C. Being built with Rust also makes the library come with the high-grade performance you would expect from an HTML parser written in C. html5ever is basically a C HTML parser, but without needing a garbage collector or other heavy runtime processes.
+html5ever is written in [Rust][], therefore it avoids the notorious security problems that come along with using C. Being built with Rust also makes the library come with the high-grade performance you would expect from an HTML parser written in C. html5ever is basically a C HTML parser, but without needing a garbage collector or other heavy runtime processes.


 ## Getting started in Rust
@@ -20,11 +20,12 @@ Add html5ever as a dependency in your [`Cargo.toml`](https://crates.io/) file:

 ```toml
 [dependencies]
-html5ever = "0.26"
+html5ever = "0.27"
 ```

 You should also take a look at [`examples/html2html.rs`], [`examples/print-rcdom.rs`], and the [API documentation][].

+
 ## Getting started in other languages

 Bindings for Python and other languages are much desired.
@@ -45,7 +46,7 @@ Run `cargo doc` in the repository root to build local documentation under `targe

 html5ever uses callbacks to manipulate the DOM, therefore it does not provide any DOM tree representation.

-html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 `document.write`) by converting input.
+html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 `document.write`) by converting input.

 The code is cross-referenced with the WHATWG syntax spec, and eventually we will have a way to present code and spec side-by-side.

@@ -56,5 +57,5 @@ html5ever builds against the official stable releases of Rust, though some optim
 [Rust]: https://www.rust-lang.org/
 [in the bug tracker]: https://github.com/servo/html5ever/issues?q=is%3Aopen+is%3Aissue+label%3Aweb-compat
 [html5lib-tests]: https://github.com/html5lib/html5lib-tests
-[`examples/html2html.rs`]: https://github.com/servo/html5ever/blob/master/rcdom/examples/html2html.rs
-[`examples/print-rcdom.rs`]: https://github.com/servo/html5ever/blob/master/rcdom/examples/print-rcdom.rs
+[`examples/html2html.rs`]: https://github.com/servo/html5ever/blob/main/rcdom/examples/html2html.rs
+[`examples/print-rcdom.rs`]: https://github.com/servo/html5ever/blob/main/rcdom/examples/print-rcdom.rs
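The README notes that html5ever manipulates the DOM through callbacks rather than providing its own tree type. The toy sketch below illustrates that split: a stand-in "parser" (`drive`) only calls methods on a sink, and the sink alone decides what a node handle is and what to store. The `ToySink` trait, `ParentLinks`, and `drive` are all hypothetical; html5ever's real callback interface is the `TreeSink` trait implemented by the example sinks in this commit, and its methods and signatures differ.

```rust
/// Hypothetical callback interface: the "parser" never sees a node type,
/// only an opaque `Handle` chosen by the sink.
pub trait ToySink {
    type Handle: Copy;
    fn document(&mut self) -> Self::Handle;
    fn create_element(&mut self, name: &str) -> Self::Handle;
    fn append(&mut self, parent: Self::Handle, child: Self::Handle);
}

/// A sink that keeps no tree structure, only names and parent links by id.
#[derive(Default)]
pub struct ParentLinks {
    names: Vec<String>,
    parent: Vec<Option<usize>>,
}

impl ToySink for ParentLinks {
    type Handle = usize;

    fn document(&mut self) -> usize {
        self.create_element("#document")
    }

    fn create_element(&mut self, name: &str) -> usize {
        self.names.push(name.to_string());
        self.parent.push(None);
        self.names.len() - 1 // the new node's id
    }

    fn append(&mut self, parent: usize, child: usize) {
        self.parent[child] = Some(parent);
    }
}

impl ParentLinks {
    /// Names from `node` up to the root.
    pub fn ancestry(&self, mut node: usize) -> Vec<&str> {
        let mut out = vec![self.names[node].as_str()];
        while let Some(p) = self.parent[node] {
            out.push(self.names[p].as_str());
            node = p;
        }
        out
    }
}

/// Stand-in for a parser: emits the callbacks for `<html><body></body></html>`.
pub fn drive<S: ToySink>(sink: &mut S) -> S::Handle {
    let doc = sink.document();
    let html = sink.create_element("html");
    sink.append(doc, html);
    let body = sink.create_element("body");
    sink.append(html, body);
    doc
}

fn main() {
    let mut sink = ParentLinks::default();
    let doc = drive(&mut sink);
    println!("document handle: {doc}, body ancestry: {:?}", sink.ancestry(2));
}
```

Using plain `usize` ids as handles mirrors the choice visible in `noop-tree-builder.rs`, where the sink's callbacks take `&usize` targets.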
12 changes: 5 additions & 7 deletions html5ever/Cargo.toml
@@ -1,30 +1,28 @@
 [package]

 name = "html5ever"
-version = "0.26.0"
+version = "0.27.0"
 authors = [ "The html5ever Project Developers" ]
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/servo/html5ever"
 description = "High-performance browser-grade HTML5 parser"
 documentation = "https://docs.rs/html5ever"
 build = "build.rs"
 categories = [ "parser-implementations", "web-programming" ]
-edition = "2018"
+edition = "2021"

 [dependencies]
 log = "0.4"
 mac = "0.1"
-markup5ever = { version = "0.11", path = "../markup5ever" }
+markup5ever = { version = "0.12", path = "../markup5ever" }

 [dev-dependencies]
-typed-arena = "2.0.2"
-
-[target.'cfg(bench)'.dev-dependencies]
 criterion = "0.3"
+typed-arena = "2.0.2"

 [build-dependencies]
 quote = "1"
-syn = { version = "1", features = ["extra-traits", "full", "fold"] }
+syn = { version = "2", features = ["extra-traits", "full", "fold"] }
 proc-macro2 = "1"

 [[bench]]
5 changes: 2 additions & 3 deletions html5ever/benches/html5ever.rs
@@ -27,12 +27,11 @@ fn run_bench(c: &mut Criterion, name: &str) {
     let mut path = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
     path.push("data/bench/");
     path.push(name);
-    let mut file = fs::File::open(&path).ok().expect("can't open file");
+    let mut file = fs::File::open(&path).expect("can't open file");

     // Read the file and treat it as an infinitely repeating sequence of characters.
     let mut file_input = ByteTendril::new();
     file.read_to_tendril(&mut file_input)
-        .ok()
         .expect("can't read file");
     let file_input: StrTendril = file_input.try_reinterpret().unwrap();
     let size = file_input.len();
@@ -55,7 +54,7 @@ fn run_bench(c: &mut Criterion, name: &str) {
     c.bench_function(&test_name, move |b| {
         b.iter(|| {
             let mut tok = Tokenizer::new(Sink, Default::default());
-            let mut buffer = BufferQueue::new();
+            let mut buffer = BufferQueue::default();
             // We are doing clone inside the bench function, this is not ideal, but possibly
             // necessary since our iterator consumes the underlying buffer.
             for buf in input.clone().into_iter() {
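The benchmark now calls `expect` directly on the `io::Result` instead of first converting it with `.ok()`. The difference shows up on failure: `Option::expect` can only report the message, while `Result::expect` appends the underlying error's `Debug` output. This std-only sketch captures both panic messages; the helpers `panic_text` and `message_of` and the `MISSING` path are hypothetical.

```rust
use std::any::Any;
use std::fs::File;
use std::panic;

/// Hypothetical path that should not exist, so `File::open` fails.
pub const MISSING: &str = "no-such-dir/definitely-missing.html";

/// Turns a panic payload into text; payloads are usually `String` or `&str`.
pub fn panic_text(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else {
        String::from("<non-string panic payload>")
    }
}

/// Runs `f`, which is expected to panic, and returns its panic message.
pub fn message_of<F: FnOnce() + panic::UnwindSafe>(f: F) -> String {
    match panic::catch_unwind(f) {
        Ok(()) => String::from("<no panic>"),
        Err(payload) => panic_text(payload),
    }
}

fn main() {
    // Old form: `.ok()` converts the io::Error into `None`, discarding it.
    let old = message_of(|| {
        File::open(MISSING).ok().expect("can't open file");
    });
    // New form: `Result::expect` keeps the io::Error and appends its Debug output.
    let new = message_of(|| {
        File::open(MISSING).expect("can't open file");
    });
    println!("old: {old}");
    println!("new: {new}");
}
```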
8 changes: 4 additions & 4 deletions html5ever/examples/arena.rs
@@ -22,7 +22,7 @@ use std::ptr;
 /// By using our Sink type, the arena is filled with parsed HTML.
 fn html5ever_parse_slice_into_arena<'a>(bytes: &[u8], arena: Arena<'a>) -> Ref<'a> {
     let sink = Sink {
-        arena: arena,
+        arena,
         document: arena.alloc(Node::new(NodeData::Document)),
         quirks_mode: QuirksMode::NoQuirks,
     };
@@ -88,7 +88,7 @@ impl<'arena> Node<'arena> {
             next_sibling: Cell::new(None),
             first_child: Cell::new(None),
             last_child: Cell::new(None),
-            data: data,
+            data,
         }
     }

@@ -211,7 +211,7 @@ impl<'arena> TreeSink for Sink<'arena> {

     fn get_template_contents(&mut self, target: &Ref<'arena>) -> Ref<'arena> {
         if let NodeData::Element {
-            template_contents: Some(ref contents),
+            template_contents: Some(contents),
             ..
         } = target.data
         {
@@ -257,7 +257,7 @@ impl<'arena> TreeSink for Sink<'arena> {

     fn create_pi(&mut self, target: StrTendril, data: StrTendril) -> Ref<'arena> {
         self.new_node(NodeData::ProcessingInstruction {
-            target: target,
+            target,
             contents: data,
         })
     }
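Two kinds of `arena.rs` edits are purely syntactic. `arena: arena`, `data: data`, and `target: target` become field init shorthand, and `Some(ref contents)` becomes `Some(contents)`. The second relies on the field type being `Copy`: assuming `Ref<'arena>` is a plain shared reference, binding it by value just copies the reference out of `target.data` without moving anything. The sketch reproduces both patterns on simplified, made-up `Node`/`NodeData` types, not the example's real definitions.

```rust
/// Simplified, hypothetical stand-ins for the arena example's types.
pub enum NodeData<'a> {
    Document,
    Element {
        name: &'static str,
        template_contents: Option<&'a Node<'a>>,
    },
}

pub struct Node<'a> {
    pub data: NodeData<'a>,
}

impl<'a> Node<'a> {
    pub fn new(data: NodeData<'a>) -> Self {
        // Field init shorthand: equivalent to `Node { data: data }`.
        Node { data }
    }
}

/// Returns the template contents of `target`, if it has any.
pub fn template_contents<'a>(target: &Node<'a>) -> Option<&'a Node<'a>> {
    // `contents` binds by value. Its type, `&'a Node<'a>`, is `Copy`, so the
    // reference is copied out of `target.data`; no `ref` is needed and nothing moves.
    if let NodeData::Element {
        template_contents: Some(contents),
        ..
    } = target.data
    {
        Some(contents)
    } else {
        None
    }
}

fn main() {
    let fragment = Node::new(NodeData::Document);
    let template = Node::new(NodeData::Element {
        name: "template",
        template_contents: Some(&fragment),
    });
    if let NodeData::Element { name, .. } = template.data {
        let has_contents = template_contents(&template).is_some();
        println!("<{name}> has contents: {has_contents}");
    }
}
```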
5 changes: 2 additions & 3 deletions html5ever/examples/noop-tokenize.rs
@@ -11,7 +11,6 @@

 extern crate html5ever;

-use std::default::Default;
 use std::io;

 use html5ever::tendril::*;
@@ -37,8 +36,8 @@ fn main() {
     // Read HTML from standard input
     let mut chunk = ByteTendril::new();
     io::stdin().read_to_tendril(&mut chunk).unwrap();

-    let mut input = BufferQueue::new();
+    let mut input = BufferQueue::default();
     input.push_back(chunk.try_reinterpret().unwrap());

     let mut tok = Tokenizer::new(Sink(Vec::new()), Default::default());
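The examples now build their input queue with `BufferQueue::default()` and drop `use std::default::Default;`, since `Default` is already in the standard prelude. The sketch shows the same pattern on a hypothetical `ChunkQueue` (a `VecDeque` of string chunks with a derived `Default`); it only mimics the `push_back` call visible in the diff and is not html5ever's `BufferQueue`.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an input queue (not html5ever's `BufferQueue`):
/// chunks are appended at the back and consumed from the front.
#[derive(Debug, Default)]
pub struct ChunkQueue {
    chunks: VecDeque<String>,
}

impl ChunkQueue {
    pub fn push_back(&mut self, chunk: String) {
        self.chunks.push_back(chunk);
    }

    pub fn pop_front(&mut self) -> Option<String> {
        self.chunks.pop_front()
    }

    pub fn is_empty(&self) -> bool {
        self.chunks.is_empty()
    }
}

fn main() {
    // No `use std::default::Default;` needed: the trait is in the prelude,
    // and `#[derive(Default)]` yields an empty queue.
    let mut input = ChunkQueue::default();
    input.push_back(String::from("<p>Hello, "));
    input.push_back(String::from("world</p>"));
    while let Some(chunk) = input.pop_front() {
        print!("{chunk}");
    }
    println!();
}
```

Deriving `Default` provides the empty-queue constructor for free, so a separate `new()` with the same body is unnecessary.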
5 changes: 2 additions & 3 deletions html5ever/examples/noop-tree-builder.rs
@@ -12,7 +12,6 @@ extern crate html5ever;

 use std::borrow::Cow;
 use std::collections::HashMap;
-use std::default::Default;
 use std::io;

 use html5ever::parse_document;
@@ -49,7 +48,7 @@ impl TreeSink for Sink {
     }

     fn get_template_contents(&mut self, target: &usize) -> usize {
-        if let Some(expanded_name!(html "template")) = self.names.get(&target).map(|n| n.expanded())
+        if let Some(expanded_name!(html "template")) = self.names.get(target).map(|n| n.expanded())
         {
             target + 1
         } else {
@@ -96,7 +95,7 @@ impl TreeSink for Sink {

     fn append_doctype_to_document(&mut self, _: StrTendril, _: StrTendril, _: StrTendril) {}
     fn add_attrs_if_missing(&mut self, target: &usize, _attrs: Vec<Attribute>) {
-        assert!(self.names.contains_key(&target), "not an element");
+        assert!(self.names.contains_key(target), "not an element");
     }
     fn remove_from_parent(&mut self, _target: &usize) {}
     fn reparent_children(&mut self, _node: &usize, _new_parent: &usize) {}
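In `noop-tree-builder.rs` the node handles are plain `usize` ids and the callbacks receive them as `&usize`. `HashMap::get` and `contains_key` already take the key by reference, so `target` can be passed straight through; the extra `&` in `get(&target)` was redundant. A std-only sketch with a hypothetical `Names` table (the example's own map uses a different value type; `String` is used here for simplicity):

```rust
use std::collections::HashMap;

/// Hypothetical id -> element-name table standing in for the example's map.
#[derive(Default)]
pub struct Names {
    names: HashMap<usize, String>,
}

impl Names {
    pub fn insert(&mut self, id: usize, name: &str) {
        self.names.insert(id, name.to_string());
    }

    /// `target` is already a `&usize`, the `&K` that `HashMap::get` expects,
    /// so it is passed through unchanged instead of as `&target`.
    pub fn is_template(&self, target: &usize) -> bool {
        self.names.get(target).map(|n| n.as_str()) == Some("template")
    }

    /// Same for `contains_key`.
    pub fn is_element(&self, target: &usize) -> bool {
        self.names.contains_key(target)
    }
}

fn main() {
    let mut names = Names::default();
    names.insert(1, "template");
    names.insert(2, "div");
    for id in [1, 2, 3] {
        println!(
            "{id}: element={} template={}",
            names.is_element(&id),
            names.is_template(&id)
        );
    }
}
```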
1 change: 0 additions & 1 deletion html5ever/examples/print-tree-actions.rs
@@ -12,7 +12,6 @@ extern crate html5ever;

 use std::borrow::Cow;
 use std::collections::HashMap;
-use std::default::Default;
 use std::io;

 use html5ever::parse_document;
3 changes: 1 addition & 2 deletions html5ever/examples/tokenize.rs
@@ -9,7 +9,6 @@

 extern crate html5ever;

-use std::default::Default;
 use std::io;

 use html5ever::tendril::*;
@@ -91,7 +90,7 @@ fn main() {
     let mut chunk = ByteTendril::new();
     io::stdin().read_to_tendril(&mut chunk).unwrap();

-    let mut input = BufferQueue::new();
+    let mut input = BufferQueue::default();
     input.push_back(chunk.try_reinterpret().unwrap());

     let mut tok = Tokenizer::new(