chore(bench): add large benches
j-mendez committed Dec 27, 2023
1 parent 50ad1c6 commit d1a899d
Showing 4 changed files with 53 additions and 3 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -2,6 +2,13 @@

The [spider](https://github.com/spider-rs/spider) project ported to Python.

Test url: `https://espn.com`

| `libraries` | `pages` | `speed` |
| :----------------------------- | :-------- | :------ |
| **`spider-rs(python): crawl`** | `150,387` | `186s` |
| **`scrapy(python): crawl`** | `49,598` | `1h` |
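
For scale, the table above implies roughly these throughputs (a quick back-of-the-envelope check; the durations are approximate, with scrapy's `1h` taken as 3,600 seconds):

```python
# Back-of-the-envelope throughput implied by the benchmark table.
runs = {"spider-rs": (150_387, 186), "scrapy": (49_598, 3_600)}  # pages, seconds
for name, (pages, seconds) in runs.items():
    print(f"{name}: {pages / seconds:.0f} pages/s")  # spider-rs ≈ 809, scrapy ≈ 14
```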

## Getting Started

1. `pip install spider_rs`
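
A minimal crawl script, sketched under the assumption that the package exposes an async `Website` API (`crawl()`, `get_links()`) like the upstream spider-rs bindings; names may differ in your installed version:

```python
import asyncio

async def crawl(url: str):
    # Deferred import so the sketch can be read without the package installed.
    from spider_rs import Website  # assumed API from the spider-rs Python bindings

    website = Website(url)
    await website.crawl()       # fetch every page reachable from the start url
    return website.get_links()  # urls discovered during the crawl

# usage: links = asyncio.run(crawl("https://choosealicense.com"))
```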
22 changes: 20 additions & 2 deletions bench/README.md
@@ -3,7 +3,7 @@
You can run the benches with python in terminal.

```sh
python scrapy.py && python spider.py
python scrappy.py && python spider.py
```

## Cases
@@ -32,4 +32,22 @@ pages found 200
elapsed duration 5.860108852386475
```

Linux performance for Spider-RS increases around 10x especially on Arm.
Test url: `https://a11ywatch.com` (medium)
648 pages

| `libraries` | `speed` |
| :-------------------------------- | :------ |
| **`spider-rs: crawl 10 samples`** | `2s` |
| **`scrapy: crawl 10 samples`** | `7.7s` |

Test url: `https://espn.com` (large)
150,387 pages

| `libraries` | `pages` | `speed` |
| :---------------------------------------- | :-------- | :------ |
| **`spider-rs(python): crawl 10 samples`** | `150,387` | `186s` |
| **`scrapy(python): crawl 10 samples`** | `49,598` | `1h` |

Scrapy used too much memory; the crawl was cancelled after an hour.

Note: the performance gap grows with the size of the website and when throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
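
The `10 samples` figures in these tables are mean-over-runs timings; a harness along these lines reproduces the shape of the measurement (`crawl_once` is a stand-in for the real crawl call):

```python
import time
from statistics import mean

def crawl_once() -> None:
    # Stand-in workload; swap in the real spider-rs or scrapy crawl here.
    time.sleep(0.01)

def bench(samples: int = 10) -> float:
    """Return the mean wall-clock duration over `samples` runs."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        crawl_once()
        durations.append(time.perf_counter() - start)
    return mean(durations)

print(f"mean over 10 samples: {bench():.3f}s")
```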
7 changes: 7 additions & 0 deletions book/src/README.md
@@ -13,3 +13,10 @@
- Written in [Rust](https://www.rust-lang.org/) for speed, safety, and simplicity

Spider powers some big tools and, with the correct setup, brings crawling downtime to almost none; view the [spider](https://github.com/spider-rs/spider) project to learn more.

Test url: `https://espn.com`

| `libraries` | `pages` | `speed` |
| :----------------------------- | :-------- | :------ |
| **`spider-rs(python): crawl`** | `150,387` | `186s` |
| **`scrapy(python): crawl`** | `49,598` | `1h` |
20 changes: 19 additions & 1 deletion book/src/benchmarks.md
@@ -50,4 +50,22 @@ Test url: `https://rsseau.fr` (medium)
| **`spider-rs: crawl 10 samples`** | `2.5s` |
| **`scrapy: crawl 10 samples`** | `10s` |

The performance scales the larger the website and if throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
Test url: `https://a11ywatch.com` (medium)
648 pages

| `libraries` | `speed` |
| :-------------------------------- | :------ |
| **`spider-rs: crawl 10 samples`** | `2s` |
| **`scrapy: crawl 10 samples`** | `7.7s` |

Test url: `https://espn.com` (large)
150,387 pages

| `libraries` | `pages` | `speed` |
| :-------------------------------- | :-------- | :------ |
| **`spider-rs: crawl 10 samples`** | `150,387` | `186s` |
| **`scrapy: crawl 10 samples`** | `49,598` | `1h` |

Scrapy used too much memory; the crawl was cancelled after an hour.

Note: the performance gap grows with the size of the website and when throttling is needed. Linux benchmarks are about 10x faster than macOS for spider-rs.
