This project is a Rust-based web scraping tool designed to retrieve NEET scorecard data programmatically. It sends POST requests to the official NEET website with various combinations of application numbers and birthdates, parses the HTML responses, and extracts relevant details such as the application number, candidate's name, rank, and total marks obtained.
- Web Scraping: Uses reqwest for HTTP requests and scraper for HTML parsing.
- Asynchronous Execution: Built with tokio for handling multiple requests concurrently.
- Robust Parsing: Extracts scorecard details using precise selectors and handles missing data gracefully.
- Error Handling: Implements basic error handling for network and parsing errors.
- Rust: Ensure you have Rust and Cargo installed. You can install Rust using rustup.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Dependencies: The following crates are used:
  - reqwest: For making HTTP requests.
  - scraper: For parsing HTML documents.
  - tokio: For asynchronous runtime support.
Install dependencies using Cargo:
cargo build
- Clone the Repository
  git clone https://github.com/ANIR1604/NeetScrapper.git
  cd NeetScrapper
- Build the Project
  Compile the project and ensure dependencies are resolved:
  cargo build
- Run the Scraper
  Execute the scraper with:
  cargo run
  The tool will iterate through application numbers and date ranges, logging parsed data to the console.
- HTTP Requests:
  - The reqwest::Client is used to send POST requests to the NEET scorecard endpoint.
- HTML Parsing:
  - scraper is used to locate and extract specific data fields (e.g., application number, rank).
- Concurrency:
  - Asynchronous functions and tokio are utilized to handle multiple requests simultaneously.
- Error Handling:
  - The program gracefully handles network timeouts and parsing failures by skipping invalid responses.
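The "skip invalid responses" behaviour described under Error Handling can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the ScrapeError type and the process and collect_valid functions are hypothetical names, and process is only a stand-in for the real fetch-and-parse step, which would use reqwest and scraper.

```rust
use std::fmt;

/// Hypothetical error type covering the two failure classes named above.
#[derive(Debug, PartialEq)]
enum ScrapeError {
    Network(String),
    Parse(String),
}

impl fmt::Display for ScrapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ScrapeError::Network(msg) => write!(f, "network error: {}", msg),
            ScrapeError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

/// Stand-in for the fetch-and-parse step (assumption: the real tool would
/// issue the request and parse HTML here). An empty body is treated as a
/// network failure; a body without the expected prefix is a parse failure.
fn process(response: &str) -> Result<String, ScrapeError> {
    if response.is_empty() {
        return Err(ScrapeError::Network("empty response".to_string()));
    }
    response
        .strip_prefix("name=")
        .map(|value| value.to_string())
        .ok_or_else(|| ScrapeError::Parse(format!("unexpected body: {}", response)))
}

/// Keeps successful results; failures are logged to stderr and skipped.
fn collect_valid(responses: &[&str]) -> Vec<String> {
    responses
        .iter()
        .filter_map(|response| match process(response) {
            Ok(value) => Some(value),
            Err(err) => {
                eprintln!("skipping response: {}", err);
                None
            }
        })
        .collect()
}
```

Calling collect_valid(&["name=A", "", "garbage", "name=B"]) keeps only the two well-formed entries. Returning a Result from the inner step and filtering at the call site keeps the decision to skip in one place.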
- solve: Sends a POST request with specific form data and parses the response.
- parse_html: Extracts scorecard data from HTML content.
- main_loop: Iterates through possible date combinations and gathers data for a given application number.
- solve_all_applications: Handles multiple application numbers sequentially.
The ParsedData struct represents the extracted details:
    struct ParsedData {
        application_number: String,
        candidate_name: String,
        all_india_rank: String,
        marks: String,
    }
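One way to handle missing data gracefully when filling this struct is to normalise every field through a single helper. The sketch below is an assumption about how that could look, not the project's implementation: field_or_placeholder, from_fields, and the "N/A" placeholder are hypothetical, and the Option<&str> inputs stand in for whatever the HTML selectors return.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ParsedData {
    application_number: String,
    candidate_name: String,
    all_india_rank: String,
    marks: String,
}

/// Trims a raw cell value and substitutes a placeholder when the cell is
/// absent or blank. The "N/A" placeholder is an illustrative choice.
fn field_or_placeholder(raw: Option<&str>) -> String {
    match raw.map(str::trim) {
        Some(text) if !text.is_empty() => text.to_string(),
        _ => "N/A".to_string(),
    }
}

impl ParsedData {
    /// Builds a record from optional raw fields (hypothetical constructor).
    fn from_fields(
        application_number: Option<&str>,
        candidate_name: Option<&str>,
        all_india_rank: Option<&str>,
        marks: Option<&str>,
    ) -> Self {
        ParsedData {
            application_number: field_or_placeholder(application_number),
            candidate_name: field_or_placeholder(candidate_name),
            all_india_rank: field_or_placeholder(all_india_rank),
            marks: field_or_placeholder(marks),
        }
    }
}
```

With this shape, a response that lacks a rank still yields a complete record, and downstream code never has to check for empty strings.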
- Rate Limiting: The tool does not currently handle rate limiting. Be cautious of sending too many requests to avoid being blocked.
- Legal Considerations: Ensure that your use of this scraper complies with the terms of service of the NEET website.
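Since the tool does not currently limit its request rate, a simple way to reduce load on the server is to enforce a minimum interval between requests. The following is a hedged sketch using only the standard library; the Throttle type and its methods are hypothetical and not part of the project. In the async code path, the blocking std::thread::sleep call would presumably be replaced with tokio::time::sleep.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that spaces requests at least `min_interval` apart.
struct Throttle {
    min_interval: Duration,
    last: Option<Instant>,
}

impl Throttle {
    fn new(min_interval: Duration) -> Self {
        Throttle { min_interval, last: None }
    }

    /// How long a caller arriving at `now` still has to wait.
    /// Returns zero if no request has been made yet or enough time has passed.
    fn delay_at(&self, now: Instant) -> Duration {
        match self.last {
            Some(last) => self
                .min_interval
                .saturating_sub(now.saturating_duration_since(last)),
            None => Duration::ZERO,
        }
    }

    /// Blocks until the next request is allowed, then records its time.
    fn wait(&mut self) {
        let delay = self.delay_at(Instant::now());
        if !delay.is_zero() {
            std::thread::sleep(delay);
        }
        self.last = Some(Instant::now());
    }
}
```

A caller would create one Throttle, for example with Duration::from_millis(500), and call wait() before each request. Keeping the delay calculation in delay_at, separate from the sleep, makes the timing logic easy to check without actually waiting.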
The project uses the following dependencies:
[dependencies]
reqwest = { version = "0.11", features = ["json"] }
scraper = "0.14"
tokio = { version = "1", features = ["full"] }
This tool is intended for educational purposes only. The developers are not responsible for any misuse or legal issues arising from its use.
Happy scraping!