To run this program, ensure that Rust is installed on your system. You can download and install Rust by following the official installation guide at https://www.rust-lang.org/tools/install.
Clone the repository using the following command:
git clone https://github.com/TechfaneTechnologies/phpd.git
Then, navigate to the project directory and build the project:
cd phpd
cargo build --release
To generate dummy data, execute the following command:
cargo run --release --bin generate_dummy_data
Alternatively, you can run the compiled binary directly:
./target/release/generate_dummy_data
A sample run produces output like the following:
$ ./target/release/generate_dummy_data
Proceeding with the generation of dummy data for the following instruments: ["BANKNIFTY", "BANKEX", "FINNIFTY", "MIDCPNIFTY", "NIFTY", "NIFTYNXT50", "SENSEX"]
For the year 2024
At directory: /Users/DrJuneMoone/Document/hive_partitioned_data
Successfully generated dummy data at: /Users/DrJuneMoone/Document/hive_partitioned_data
Generated 5502 CSV files across 1841 subfolders, totaling 8.69 GiB
Processing speed: 923.87 MiB per second in 9.63 seconds
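The paths in the output above imply the following on-disk layout (inferred from the sample logs; the base directory shown is just the example location, and the sequence ids here are illustrative):

hive_partitioned_data/
  BANKNIFTY/                       one folder per instrument
    20240101/                      one folder per trading date (YYYYMMDD)
      BANKNIFTY-1.csv              one file per sequence id
      BANKNIFTY-2.csv
    20240102/
      ...
  SENSEX/
    ...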
To process the generated hive partitioned dummy data, run the following command:
cargo run --release --bin process_dummy_data
Or execute the binary directly:
./target/release/process_dummy_data
A sample run:
$ ./target/release/process_dummy_data
Found 7 instruments
Instrument: SENSEX Grouped CSV Files {2: [CsvFile { path: "/Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240101/SENSEX-2.csv", date: "20240101", seq_id: 2 }, CsvFile { path: "/Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240102/SENSEX-2.csv", date: "20240102", seq_id: 2 }, .... ]}
Processing instrument: BANKNIFTY
Found 3 sequence groups
Processing sequence group: 2
Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/NIFTYNXT50/20240102/NIFTYNXT50-1.csv
Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/NIFTY/20240102/NIFTY-2.csv
Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240103/SENSEX-2.csv
............
Successfully merged dummy data at: /Users/DrJuneMoone/Document/hive_partitioned_data
Generated 21 sequentially merged CSV files, totaling 8.69 GiB
Processed 5502 CSV files across 1841 subfolders, totaling 8.69 GiB
Processing speed: 3.30 GiB per second in 5.27 seconds
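Note that the file counts are consistent: if, as the BANKNIFTY log line suggests, each instrument has 3 sequence groups, then 7 instruments x 3 groups = 21 sequentially merged CSV files, matching the count reported above. The Debug output also hints at how the merge is organized: each instrument's files are grouped by seq_id and ordered by date within a group. Below is a minimal sketch of that grouping step, assuming plausible field types (the field names path, date, and seq_id come from the log output; the types and the helper function itself are assumptions, not the project's actual code):

use std::collections::BTreeMap;

#[derive(Debug, Clone)]
struct CsvFile {
    path: String,  // e.g. ".../SENSEX/20240101/SENSEX-2.csv"
    date: String,  // e.g. "20240101"
    seq_id: u32,   // e.g. 2 in "SENSEX-2.csv"
}

// Group one instrument's files by sequence id, sorted by date within each
// group, so every group can be concatenated into a single merged CSV.
fn group_by_seq_id(mut files: Vec<CsvFile>) -> BTreeMap<u32, Vec<CsvFile>> {
    files.sort_by(|a, b| a.date.cmp(&b.date));
    let mut groups: BTreeMap<u32, Vec<CsvFile>> = BTreeMap::new();
    for f in files {
        groups.entry(f.seq_id).or_default().push(f);
    }
    groups
}

fn main() {
    let files = vec![
        CsvFile { path: "SENSEX/20240102/SENSEX-2.csv".into(), date: "20240102".into(), seq_id: 2 },
        CsvFile { path: "SENSEX/20240101/SENSEX-2.csv".into(), date: "20240101".into(), seq_id: 2 },
    ];
    println!("{:#?}", group_by_seq_id(files));
}

Each resulting group can then be written out in date order as one merged file per (instrument, seq_id) pair.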
To change the location of the hive partitioned data, modify the base_path variable in the source code (a sketch of the change follows the list):
- Edit src/generate_dummy_data.rs, lines 18-20.
- Edit src/process_dummy_data.rs, lines 8-10.
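In both files the change amounts to pointing a path variable at a different directory. A minimal, self-contained sketch of the idea, assuming base_path is a std::path::PathBuf (only the variable name base_path comes from this README; the surrounding code is hypothetical, not the project's actual source):

use std::path::PathBuf;

fn main() {
    // Replace the placeholder with the directory you want to use.
    // (Hypothetical sketch; the real assignment lives at the lines cited above.)
    let base_path: PathBuf = PathBuf::from("/path/to/your/hive_partitioned_data");
    println!("base_path = {}", base_path.display());
}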
After making the necessary changes, rebuild and run the program to regenerate and process the data with the updated location.
To process your actual hive partitioned data, update the base_path variable in src/process_dummy_data.rs (lines 8-10) and run the following command:
cargo run --release --bin process_dummy_data
Or execute the compiled binary:
./target/release/process_dummy_data