🌐 INSDC Benchmarking Scripts

Automated benchmarking tools for testing INSDC data download performance across repositories (ENA, SRA, and DDBJ) and multiple transfer protocols.

🚀 Quick Start

1. Install

pip install insdc-benchmarking-scripts

2. Configure

cp config.yaml.example config.yaml
# Edit config.yaml:
# site: nci
# api_endpoint: https://your.api/submit
# api_token: YOUR_TOKEN   # optional

3. Run a Benchmark

HTTP/HTTPS (wget-based)

benchmark-http --dataset DRR12345678 --repository ENA --site nci

SRA Cloud .sra Objects (AWS/GCS)

benchmark-http \
  --dataset DRR000001 \
  --repository SRA \
  --sra-mode sra_cloud \
  --mirror auto \
  --no-submit

ENA FASTQ via HTTPS

benchmark-http \
  --dataset SRR000001 \
  --repository ENA \
  --no-submit
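
For scripted runs, the command lines above can be assembled programmatically. This is a minimal sketch and not part of the package: `build_benchmark_cmd` is a hypothetical helper, and it uses only the flags shown in this README.

```python
import shlex
from typing import List, Optional


def build_benchmark_cmd(
    dataset: str,
    repository: str,
    site: Optional[str] = None,
    sra_mode: Optional[str] = None,
    mirror: Optional[str] = None,
    submit: bool = True,
) -> List[str]:
    """Assemble a benchmark-http argument list (hypothetical helper)."""
    cmd = ["benchmark-http", "--dataset", dataset, "--repository", repository]
    if site:
        cmd += ["--site", site]
    if sra_mode:
        cmd += ["--sra-mode", sra_mode]
    if mirror:
        cmd += ["--mirror", mirror]
    if not submit:
        # Skip the optional API submission step.
        cmd.append("--no-submit")
    return cmd


if __name__ == "__main__":
    cmd = build_benchmark_cmd(
        "DRR000001", "SRA", sra_mode="sra_cloud", mirror="auto", submit=False
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems when the command is later run with `subprocess.run(cmd, check=True)`.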

🧠 Key Features

✅ HTTP/HTTPS benchmarking using wget
✅ SRA Cloud (AWS/GCS) .sra object downloads
✅ ENA FASTQ over HTTPS
🧩 Automatic system metrics: CPU %, memory (MB), disk write speed
🌍 Network baselines: ping/traceroute latency and route
🧾 JSON output aligned with INSDC Benchmarking Schema v1.2
📤 Optional API submission (secure HTTP POST)
🧪 Repeatable tests with --repeats and aggregate stats
🧰 Mirror control for SRA: --mirror [aws|gcs|auto], --require-mirror, --explain
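
To illustrate what aggregate statistics over repeated runs might look like, here is a minimal sketch using Python's standard `statistics` module. The function name, the output field names, and the sample speeds are illustrative assumptions, not the tool's actual implementation or measured results.

```python
import statistics
from typing import Dict, List


def aggregate_speeds(speeds_mbps: List[float]) -> Dict[str, float]:
    """Summarise per-run download speeds (field names are illustrative)."""
    if not speeds_mbps:
        raise ValueError("need at least one run")
    return {
        "runs": len(speeds_mbps),
        "mean_mbps": statistics.mean(speeds_mbps),
        "median_mbps": statistics.median(speeds_mbps),
        "min_mbps": min(speeds_mbps),
        "max_mbps": max(speeds_mbps),
        # Sample standard deviation is undefined for a single run.
        "stdev_mbps": statistics.stdev(speeds_mbps) if len(speeds_mbps) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Made-up speeds for three hypothetical repeats.
    print(aggregate_speeds([50.0, 52.0, 54.0]))
```

Reporting the median alongside the mean helps when one repeat is an outlier, for example a run that hit a cold cache or a congested route.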

📦 Supported Protocols

Protocol	Implementation	Status
HTTP/HTTPS	wget	✅ Stable
FTP	ftplib	✅ Stable
Globus	Python SDK	🔄 Planned
Aspera	CLI SDK	🔄 Planned
SRA Toolkit	fasterq-dump (wrapper)	🔄 Planned

⚙️ Configuration

See config.yaml.example:

site: nci
api_endpoint: https://your.api/submit
api_token: your-secret-token
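
As a rough illustration of how these settings could be used for result submission, the sketch below builds (but does not send) an HTTP POST request with the standard library. `build_submit_request` is a hypothetical helper, and the `Bearer` authorization scheme is an assumption; the scheme your results API expects may differ.

```python
import json
import urllib.request
from typing import Optional


def build_submit_request(
    api_endpoint: str, result: dict, api_token: Optional[str] = None
) -> urllib.request.Request:
    """Build a JSON POST request for a benchmark result (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Assumption: the API accepts a Bearer token; check your endpoint's docs.
        headers["Authorization"] = f"Bearer {api_token}"
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        api_endpoint, data=body, headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_submit_request(
        "https://your.api/submit", {"site": "nci"}, "your-secret-token"
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=30)
```

Because `api_token` is optional in the configuration, the helper only adds the authorization header when a token is present.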

📊 Example Output

{
  "timestamp": "2025-11-06T06:21:33Z",
  "end_timestamp": "2025-11-06T06:23:05Z",
  "site": "nci",
  "protocol": "http",
  "repository": "SRA",
  "dataset_id": "DRR000001",
  "duration_sec": 92.3,
  "file_size_bytes": 596137898,
  "average_speed_mbps": 51.6,
  "cpu_usage_percent": 7.2,
  "memory_usage_mb": 10300.5,
  "status": "success",
  "checksum_md5": "bf11d3ea9d7e0b6e984998ea2dfd53ca",
  "write_speed_mbps": 3350.3,
  "network_latency_ms": 8.9,
  "tool_version": "GNU Wget 1.21.4",
  "notes": "Resolved from AWS ODP mirror"
}
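
The relationship between `file_size_bytes`, `duration_sec`, and `average_speed_mbps` can be checked with a short calculation. The sketch below assumes that "mbps" means megabits per second with 1 megabit = 1,000,000 bits; that reading is inferred from the example values, not confirmed by the schema.

```python
def throughput_mbps(file_size_bytes: int, duration_sec: float) -> float:
    """Average speed in megabits per second (assumes 1 Mb = 1,000,000 bits)."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return file_size_bytes * 8 / duration_sec / 1_000_000


# Values taken from the example output above.
print(round(throughput_mbps(596_137_898, 92.3), 2))  # prints 51.67
```

The result is close to the reported `51.6`, which suggests the field is in megabits per second; reading it as megabytes per second would give roughly 6.46 instead.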

🧱 Repository Structure

insdc-benchmarking-scripts/
├── scripts/
│   ├── benchmark_http.py        # HTTP/HTTPS benchmarking CLI (Click)
│   ├── benchmark_ftp.py         # FTP benchmarking (ftplib)
│   └── benchmark_aspera.py      # Future Aspera integration
│
├── insdc_benchmarking_scripts/
│   ├── utils/
│   │   ├── repositories/        # ENA/SRA/DDBJ resolvers
│   │   ├── system_metrics.py    # CPU/memory sampler
│   │   ├── network_baseline.py  # ping/traceroute helpers
│   │   ├── submit.py            # HTTP POST to results API
│   │   └── config.py            # Config loader
│   └── __init__.py
│
├── docs/
│   ├── INSTALLATION.md          # Setup and verification instructions
│   ├── USAGE.md                 # CLI usage and examples
│   ├── protocols/               # Protocol-specific notes
│   └── schema/                  # INSDC Benchmarking Schema v1.2
│
├── config.yaml.example          # Example configuration file
├── requirements.txt             # Dependencies for pip installs
├── pyproject.toml               # Poetry build config
├── README.md                    # This file
└── LICENSE

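📚 Documentation

docs/INSTALLATION.md: Setup and verification instructions
docs/USAGE.md: CLI usage and examples
docs/protocols/: Protocol-specific notes
docs/schema/: INSDC Benchmarking Schema v1.2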

🧭 Roadmap

Add Globus and Aspera benchmarking
Unified results ingestion API (FastAPI backend)
Web dashboard for live performance visualization
Scheduled batch benchmarking for curated datasets
Add object checksum validation and retry support
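
As a sketch of what the checksum-validation item could involve, the following computes a file's MD5 in chunks and compares it to an expected value such as the `checksum_md5` field in the example output. The function names are hypothetical, and retry handling is not covered here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected_md5: str) -> bool:
    """Compare the file's MD5 with an expected hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte sequence files.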

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request to add protocols, metrics, or infrastructure integrations.

Development Workflow

# Fork and clone
git clone https://github.com/AustralianBioCommons/insdc-benchmarking-scripts
cd insdc-benchmarking-scripts

# Install dependencies
poetry install

# Run a test benchmark
poetry run benchmark-http --dataset DRR000001 --repository ENA --no-submit

Maintained by: Australian BioCommons
📍 University of Melbourne
🔗 Licensed under the Apache 2.0 License
