GCE Scraper

Download past papers, faster.

🔗 Table of Contents

📍 Overview
👾 Features
📁 Project Structure
- 📂 Project Index
🚀 Getting Started
📌 Project Roadmap
🔰 Contributing
🎗 License
🙌 Acknowledgments

📍 Overview

gce-scraper efficiently downloads past GCE exam papers. Users specify subjects, years, and paper types, and the tool uses multi-threading to quickly acquire and save them. This saves students and educators significant time and effort in accessing vital study materials.

👾 Features

	Feature	Summary
⚙️	Architecture	The project is split into modules for configuration generation (`config_gen.rs`), configuration management (`configuration.rs`), web scraping (`scraper.rs`), and downloading (`download.rs`). The main application logic in `src/main.rs` orchestrates these modules. The project builds with Cargo and reads its settings from a TOML configuration file. See `src/configuration.rs` and `Cargo.toml`.
🔩	Code Quality	The codebase is written in Rust, which emphasizes memory safety and performance. It relies on established crates such as `clap` for command-line argument parsing and `log` for logging. The modular design promotes code reuse and maintainability.
📄	Documentation	Documentation currently lives mainly in the source files and `Cargo.toml`, and would benefit from more detailed comments and external documentation. Install, usage, and test commands are provided under Getting Started.
🔌	Integrations	The project uses `reqwest` to make the HTTP requests that scrape data from the website, and `tokio` for asynchronous operations during scraping and downloading. `serde` is used for serialization, likely for the configuration file and potentially the scraped data. The `clap` crate handles command-line argument parsing.
🧩	Modularity	The codebase is divided into several modules (`config_gen.rs`, `configuration.rs`, `scraper.rs`, `download.rs`), and `lib.rs` exposes them from a single place. This keeps the code organized and makes individual components easier to test and maintain. Dependencies are managed with Cargo.

📁 Project Structure

└── /
    ├── Cargo.lock
    ├── Cargo.toml
    └── src
        ├── config_gen.rs
        ├── configuration.rs
        ├── download.rs
        ├── lib.rs
        ├── main.rs
        └── scraper.rs

📂 Project Index

/

__root__

Cargo.toml - `gce-scraper` defines a Rust project using various crates for command-line argument parsing, logging, HTTP requests, and parallel processing
- It leverages `reqwest` for web scraping, `clap` for user interface, and `par-stream` for concurrent operations
- The project's core functionality centers on data scraping and likely involves processing the acquired data using `serde` for serialization.

src

main.rs - The `src/main.rs` file serves as the main entry point, orchestrating the GCE-Guide past paper scraper
- It parses command-line arguments to either generate a configuration file specifying download parameters (papers, years, subjects, seasons) or download past papers using a provided configuration
- The program utilizes multi-threading for efficient I/O operations, managing logging verbosity based on user input.
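
The two modes described above can be pictured with a small argument parser. This is only a sketch under stated assumptions: it deliberately uses the standard library instead of `clap`, and the mode names `gen-config` and `download`, the `-v` flag, and the `Command` type are all hypothetical rather than the tool's actual interface (run `cargo run -- --help` for that).

```rust
use std::path::PathBuf;

/// The two modes described above. Variant and field names are hypothetical.
#[derive(Debug, PartialEq)]
enum Command {
    /// Write a configuration file to `output`.
    GenerateConfig { output: PathBuf },
    /// Download the papers listed in the configuration at `config`.
    Download { config: PathBuf },
}

/// Parses `<mode> <path>` plus any number of `-v` flags.
/// Returns the command and the verbosity (the number of `-v` flags seen).
fn parse_args<I>(args: I) -> Result<(Command, u8), String>
where
    I: IntoIterator<Item = String>,
{
    let mut verbosity: u8 = 0;
    let mut positional = Vec::new();
    for arg in args {
        if arg == "-v" {
            verbosity = verbosity.saturating_add(1);
        } else {
            positional.push(arg);
        }
    }
    match positional.as_slice() {
        [mode, path] if mode == "gen-config" => Ok((
            Command::GenerateConfig { output: PathBuf::from(path) },
            verbosity,
        )),
        [mode, path] if mode == "download" => Ok((
            Command::Download { config: PathBuf::from(path) },
            verbosity,
        )),
        _ => Err("expected: <gen-config|download> <path> [-v ...]".to_string()),
    }
}
```

In a real binary the iterator would be `std::env::args().skip(1)`, and the verbosity count would be mapped to a log level before the chosen mode runs.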

config_gen.rs - `config_gen.rs` generates a TOML configuration file
- It retrieves syllabus information, paper details across specified years and seasons, and consolidates this data
- The resulting configuration file, written to the designated output path, is used by other parts of the application to manage and process academic papers, leveraging multi-threading for efficient data retrieval.

configuration.rs - The `src/configuration.rs` file defines a `Configuration` struct and implements its loading from a TOML configuration file
- This struct, used throughout the application (as indicated by its `pub` visibility), holds application-wide settings, specifically details about papers (`PaperType`) and subject-year configurations (`YearConfiguration`)
- It acts as a central point for managing the application's configurable parameters.
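
As a rough illustration of how these types could fit together, the sketch below models a configuration in memory and expands it into individual download jobs. Only the type names `Configuration`, `PaperType`, and `YearConfiguration` come from the description above; every field, variant, and the `jobs` method are assumptions, and the real struct is presumably deserialized from TOML rather than built by hand.

```rust
/// Kind of document to fetch. The variants shown are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PaperType {
    QuestionPaper,
    MarkScheme,
}

/// One subject-year entry. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct YearConfiguration {
    syllabus_code: String,
    year: u16,
    seasons: Vec<char>,
}

/// Application-wide settings. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Configuration {
    papers: Vec<PaperType>,
    years: Vec<YearConfiguration>,
}

impl Configuration {
    /// Expands the settings into one (syllabus, year, season, paper type)
    /// tuple per paper to download.
    fn jobs(&self) -> Vec<(String, u16, char, PaperType)> {
        let mut jobs = Vec::new();
        for entry in &self.years {
            for &season in &entry.seasons {
                for &paper in &self.papers {
                    jobs.push((entry.syllabus_code.clone(), entry.year, season, paper));
                }
            }
        }
        jobs
    }
}
```

Keeping the paper types separate from the subject-year entries means one list of types applies to every entry, so the number of jobs is the product of seasons and paper types for each entry.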

scraper.rs - The `scraper.rs` module facilitates web scraping of examination papers from a specific website
- It retrieves available years and papers based on syllabus codes, paper types, and seasons
- The module then downloads and saves the requested papers to specified file paths
- Error handling is implemented to manage network and parsing issues, ensuring robust data acquisition.
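
To make the inputs concrete, here is a minimal sketch of turning a syllabus code, season, year, and paper type into a file name. It assumes the common `<syllabus>_<season><yy>_<type>_<variant>.pdf` naming pattern for past papers (for example `9709_s19_qp_12.pdf`); whether `scraper.rs` builds names this way is an assumption, and `paper_file_name` is a hypothetical helper.

```rust
/// Builds a file name such as `9709_s19_qp_12.pdf`.
/// `season` is a one-letter code, `year` is the full year (only its last
/// two digits are used), and `paper_type` is a short code such as "qp".
fn paper_file_name(
    syllabus: &str,
    season: char,
    year: u16,
    paper_type: &str,
    variant: u8,
) -> String {
    format!(
        "{}_{}{:02}_{}_{}.pdf",
        syllabus,
        season,
        year % 100,
        paper_type,
        variant
    )
}
```

The `{:02}` format keeps years such as 2005 as `05`, so names stay a fixed width and sort predictably.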

download.rs - `download.rs` manages the downloading and saving of academic papers
- It reads configuration data, creates necessary directories, and then concurrently downloads papers for specified subjects and years, leveraging multiple threads for efficiency
- The module handles potential errors during configuration parsing and file system operations, ensuring robust download management within the larger application.
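
The concurrent download step can be sketched with scoped threads from the standard library. This is not the project's implementation: `Cargo.toml` lists `par-stream` for concurrency, whereas this example swaps in `std::thread::scope`, and `download_all`, its parameters, and the job strings are hypothetical. The `fetch` callback stands in for the actual HTTP request and file write.

```rust
use std::sync::Mutex;
use std::thread;

/// Runs `fetch` on every job using up to `workers` threads and returns
/// the jobs that failed, sorted for deterministic output.
fn download_all<F>(jobs: &[String], workers: usize, fetch: F) -> Vec<String>
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    let failed = Mutex::new(Vec::new());
    let workers = workers.max(1);
    // Ceiling division: each worker gets one contiguous chunk of jobs.
    let chunk_size = ((jobs.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        for chunk in jobs.chunks(chunk_size) {
            let failed = &failed;
            let fetch = &fetch;
            scope.spawn(move || {
                for job in chunk {
                    // Record the failure and keep going with the rest.
                    if fetch(job.as_str()).is_err() {
                        failed.lock().unwrap().push(job.clone());
                    }
                }
            });
        }
    });

    let mut failed = failed.into_inner().unwrap();
    failed.sort();
    failed
}
```

Collecting failures instead of aborting on the first error lets one unavailable paper be reported without stopping the rest of the batch.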

lib.rs - `src/lib.rs` establishes the core library for the project, providing foundational modules
- It initializes logging and exposes modules responsible for configuration management (`config_gen`, `configuration`), web scraping (`scraper`), and data downloading (`download`)
- These modules collectively form the building blocks for the application's primary functionality.

🚀 Getting Started

☑️ Prerequisites

Before getting started with gce-scraper, ensure your runtime environment meets the following requirements:

Programming Language: Rust
Package Manager: Cargo

⚙️ Installation

Build gce-scraper from source:

Clone the repository:

❯ git clone https://github.com/NightSling/GCE-Scraper.git

Navigate to the project directory:

❯ cd GCE-Scraper

Build the project with Cargo, which also fetches the dependencies:

❯ cargo build

🤖 Usage

Run gce-scraper with Cargo. The following command lists the available options:

❯ cargo run -- --help

🧪 Testing

Run the test suite with Cargo:

❯ cargo test

📌 Project Roadmap

Task 1: ~~Parallel config generation.~~
Task 2: ~~Parallel downloading based on config.~~
Task 3: Support other miscellaneous files, such as specimen papers.
Task 4: Extend support beyond A-Levels to the other boards available on GCE Guide.

🔰 Contributing

💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
🐛 Report Issues: Submit bugs found or log feature requests for the `gce-scraper` project.
💡 Submit Pull Requests: Review open PRs, and submit your own PRs.

Contributing Guidelines

Fork the Repository: Start by forking the project repository to your GitHub account.
Clone Locally: Clone the forked repository to your local machine using a git client.
```
git clone https://github.com/<your-username>/GCE-Scraper.git
```
Create a New Branch: Always work on a new branch, giving it a descriptive name.
```
git checkout -b new-feature-x
```
Make Your Changes: Develop and test your changes locally.
Commit Your Changes: Commit with a clear message describing your updates.
```
git commit -m 'Implemented new feature x.'
```
Push to GitHub: Push the changes to your forked repository.
```
git push origin new-feature-x
```
Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!

🎗 License

This project is licensed under the Apache License, Version 2.0. For more details, refer to the LICENSE.txt file.

🙌 Acknowledgments

Access all past papers easily through GCE-Guide. All papers are the property of Cambridge Assessment International Education (CAIE). The purpose of the software is not to promote piracy or the sharing of proprietary content, but rather to serve as an educational tool.
The README.md is generated through readme-ai.
