Download past papers, faster.
- 📍 Overview
- 👾 Features
- 📁 Project Structure
- 🚀 Getting Started
- 📌 Project Roadmap
- 🔰 Contributing
- 🎗 License
- 🙌 Acknowledgments
`gce-scraper` downloads past GCE exam papers efficiently. Users specify subjects, years, and paper types, and the tool fetches the matching papers on multiple threads and saves them to disk. This spares students and educators the time and effort of collecting study materials one file at a time.
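As a rough illustration of how a selection of subjects, years, and paper types could expand into individual downloads, here is a minimal sketch. The names `Job` and `expand_jobs`, and the sample values in the comments, are hypothetical and are not taken from the crate; the real data model lives in `src/configuration.rs` and may look quite different.

```rust
/// One paper to fetch. Hypothetical type for illustration, not the crate's API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Job {
    pub subject: String,    // syllabus code, e.g. "9709" (illustrative value)
    pub year: u16,          // exam year
    pub paper_type: String, // e.g. "qp" or "ms" (illustrative labels)
}

/// Expand a selection into one job per (subject, year, paper type) combination.
pub fn expand_jobs(subjects: &[&str], years: &[u16], paper_types: &[&str]) -> Vec<Job> {
    let mut jobs = Vec::with_capacity(subjects.len() * years.len() * paper_types.len());
    for &subject in subjects {
        for &year in years {
            for &paper_type in paper_types {
                jobs.push(Job {
                    subject: subject.to_string(),
                    year,
                    paper_type: paper_type.to_string(),
                });
            }
        }
    }
    jobs
}
```

Calling `expand_jobs(&["9709", "9702"], &[2021, 2022], &["qp", "ms"])` yields eight jobs, one for each combination, ordered by subject, then year, then paper type.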
| | Feature | Summary |
|---|---|---|
| ⚙️ | Architecture | |
| 🔩 | Code Quality | |
| 📄 | Documentation | |
| 🔌 | Integrations | |
| 🧩 | Modularity | |
    └── /
        ├── Cargo.lock
        ├── Cargo.toml
        └── src
            ├── config_gen.rs
            ├── configuration.rs
            ├── download.rs
            ├── lib.rs
            ├── main.rs
            └── scraper.rs
/
__root__
Cargo.toml - `gce-scraper` defines a Rust project using various crates for command-line argument parsing, logging, HTTP requests, and parallel processing
- It uses `reqwest` for HTTP requests, `clap` for the command-line interface, and `par-stream` for concurrent operations
- The project's core functionality centers on data scraping and likely involves processing the acquired data using `serde` for serialization.
src
main.rs - The `src/main.rs` file serves as the main entry point, orchestrating the GCE-Guide past paper scraper
- It parses command-line arguments to either generate a configuration file specifying download parameters (papers, years, subjects, seasons) or download past papers using a provided configuration
- The program utilizes multi-threading for efficient I/O operations, managing logging verbosity based on user input.
config_gen.rs - `config_gen.rs` generates a TOML configuration file
- It retrieves syllabus information, paper details across specified years and seasons, and consolidates this data
- The resulting configuration file, written to the designated output path, is used by other parts of the application to manage and process academic papers, leveraging multi-threading for efficient data retrieval.
configuration.rs - The `src/configuration.rs` file defines a `Configuration` struct and implements its loading from a TOML configuration file
- This struct, used throughout the application (as indicated by its `pub` visibility), holds application-wide settings, specifically details about papers (`PaperType`) and subject-year configurations (`YearConfiguration`)
- It acts as a central point for managing the application's configurable parameters.
scraper.rs - The `scraper.rs` module facilitates web scraping of examination papers from a specific website
- It retrieves available years and papers based on syllabus codes, paper types, and seasons
- The module then downloads and saves the requested papers to specified file paths
- Error handling is implemented to manage network and parsing issues, ensuring robust data acquisition.
download.rs - `download.rs` manages the downloading and saving of academic papers
- It reads configuration data, creates necessary directories, and then concurrently downloads papers for specified subjects and years, leveraging multiple threads for efficiency
- The module handles potential errors during configuration parsing and file system operations, ensuring robust download management within the larger application.
lib.rs - `src/lib.rs` establishes the core library for the project, providing foundational modules
- It initializes logging and exposes modules responsible for configuration management (`config_gen`, `configuration`), web scraping (`scraper`), and data downloading (`download`)
- These modules collectively form the building blocks for the application's primary functionality.
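The concurrent download step described for `download.rs` can be pictured with the sketch below. It is only an approximation under stated assumptions: the project is described as using `par-stream` and `reqwest`, whereas this sketch swaps in scoped threads from the standard library and a stub `fetch` function so that it runs without extra crates or network access. The names `fetch` and `download_all` are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Stand-in for a real HTTP download. It only formats a message so the sketch
/// runs offline. Hypothetical helper, not the crate's API.
fn fetch(url: &str) -> Result<String, String> {
    if url.is_empty() {
        Err("empty URL".to_string())
    } else {
        Ok(format!("saved {url}"))
    }
}

/// Process every URL using `workers` threads that pull from a shared queue.
/// Returns one result per URL; the order of results is not guaranteed.
pub fn download_all(urls: Vec<String>, workers: usize) -> Vec<Result<String, String>> {
    let queue = Mutex::new(VecDeque::from(urls));
    let results = Mutex::new(Vec::new());

    // Scoped threads may borrow `queue` and `results` and are joined
    // automatically when the scope ends.
    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            scope.spawn(|| loop {
                // Take the next URL; the lock is released before the fetch starts.
                let next = queue.lock().unwrap().pop_front();
                let Some(url) = next else { break };
                let outcome = fetch(&url);
                results.lock().unwrap().push(outcome);
            });
        }
    });

    results.into_inner().unwrap()
}
```

A shared queue keeps every worker busy until the list is empty, which suits downloads whose durations vary. A failed download is recorded as an `Err` entry and does not stop the remaining work.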
Before getting started with gce-scraper, ensure your runtime environment meets the following requirements:
- Programming Language: Rust
- Package Manager: Cargo
Install gce-scraper by building it from source:
- Clone the repository:
❯ git clone https://github.com/NightSling/GCE-Scraper.git
- Navigate to the project directory:
❯ cd GCE-Scraper
- Build the project (Cargo fetches the dependencies automatically):
❯ cargo build
Run using the following command:
Using cargo
❯ cargo run -- --help
Run the test suite using the following command:
Using cargo
❯ cargo test
- Task 1: Parallel config generation.
- Task 2: Parallel downloading based on config.
- Task 3: Extend the use cases to other miscellaneous files, such as specimen papers.
- Task 4: Extend support from A-Levels to the other boards supported by GCE Guide.
- 💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
- 🐛 Report Issues: Submit bugs found or log feature requests for the `gce-scraper` project.
- 💡 Submit Pull Requests: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a git client.
git clone <URL of your fork>
- Create a New Branch: Always work on a new branch, giving it a descriptive name.
git checkout -b new-feature-x
- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.
git commit -m 'Implemented new feature x.'
- Push to GitHub: Push the changes to your forked repository.
git push origin new-feature-x
- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
This project is licensed under the Apache License, Version 2.0. For more details, refer to the LICENSE file.
- Access all past papers easily through GCE-Guide. All papers are the property of Cambridge Assessment International Education (CAIE). The purpose of the software is not to promote piracy or the sharing of proprietary content, but rather to serve as an educational tool.
- The README.md is generated through readme-ai.