
PHPD (Process Hive Partitioned Data)

Usage Guidelines

To build and run this program, Rust must be installed on your system. You can download and install it from the official website: https://www.rust-lang.org/tools/install.

Installation

Clone the repository using the following command:

git clone https://github.com/TechfaneTechnologies/phpd.git

Then, navigate to the project directory and build the project:

cd phpd
cargo build --release

Generating Dummy Data

To generate dummy data, execute the following command:

cargo run --release --bin generate_dummy_data

Alternatively, you can run the compiled binary directly:

./target/release/generate_dummy_data

Example Terminal Output:

$ ./target/release/generate_dummy_data
    Proceeding with the generation of dummy data for the following instruments: ["BANKNIFTY", "BANKEX", "FINNIFTY", "MIDCPNIFTY", "NIFTY", "NIFTYNXT50", "SENSEX"]
    For the year 2024
    At directory: /Users/DrJuneMoone/Document/hive_partitioned_data
    Successfully generated dummy data at: /Users/DrJuneMoone/Document/hive_partitioned_data
    Generated 5502 CSV files across 1841 subfolders, totaling 8.69 GiB
    Processing speed: 923.87 MiB per second in 9.63 seconds
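The printed paths above suggest a partition layout of the form {base_path}/{instrument}/{YYYYMMDD}/{INSTRUMENT}-{seq_id}.csv. A minimal sketch of how such a path could be assembled; the helper name and base path here are illustrative, not taken from the repository:

```rust
use std::path::PathBuf;

// Hypothetical helper: builds a partition path matching the layout
// implied by the terminal output above.
fn partition_path(base: &str, instrument: &str, date: &str, seq_id: u32) -> PathBuf {
    PathBuf::from(base)
        .join(instrument)          // e.g. SENSEX
        .join(date)                // e.g. 20240101
        .join(format!("{instrument}-{seq_id}.csv"))
}

fn main() {
    let p = partition_path("/data/hive", "SENSEX", "20240101", 2);
    println!("{}", p.display());
}
```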

Processing Hive Partitioned Dummy Data

To process the generated hive-partitioned dummy data, run the following command:

cargo run --release --bin process_dummy_data

Or execute the binary directly:

./target/release/process_dummy_data

Example Terminal Output:

$ ./target/release/process_dummy_data
    Found 7 instruments

    Instrument: SENSEX Grouped CSV Files {2: [CsvFile { path: "/Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240101/SENSEX-2.csv", date: "20240101", seq_id: 2 }, CsvFile { path: "/Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240102/SENSEX-2.csv", date: "20240102", seq_id: 2 }, .... ]}

    Processing instrument: BANKNIFTY
    Found 3 sequence groups
    Processing sequence group: 2

    Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/NIFTYNXT50/20240102/NIFTYNXT50-1.csv
    Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/NIFTY/20240102/NIFTY-2.csv
    Processing file: /Users/DrJuneMoone/Document/hive_partitioned_data/SENSEX/20240103/SENSEX-2.csv
    ............
    ............
    Successfully merged dummy data at: /Users/DrJuneMoone/Document/hive_partitioned_data
    Generated 21 sequentially merged CSV files, totaling 8.69 GiB
    Processed 5502 CSV files across 1841 subfolders, totaling 8.69 GiB
    Processing speed: 3.30 GiB per second in 5.27 seconds
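The sequence groups shown in the output above can be sketched in a few lines: files sharing a seq_id are collected together, then ordered by date so each group can be merged chronologically. The CsvFile struct below mirrors the fields printed by the program (path, date, seq_id); the grouping function and sample data are hypothetical, not the repository's actual implementation:

```rust
use std::collections::HashMap;

// Hypothetical mirror of the CsvFile entries printed by process_dummy_data;
// the real definition in the source may differ.
#[derive(Debug, Clone)]
struct CsvFile {
    path: String,
    date: String,
    seq_id: u32,
}

// Group files by seq_id, then sort each group by its YYYYMMDD date string
// so a sequence can be merged in chronological order.
fn group_by_seq(files: Vec<CsvFile>) -> HashMap<u32, Vec<CsvFile>> {
    let mut groups: HashMap<u32, Vec<CsvFile>> = HashMap::new();
    for f in files {
        groups.entry(f.seq_id).or_default().push(f);
    }
    for group in groups.values_mut() {
        group.sort_by(|a, b| a.date.cmp(&b.date));
    }
    groups
}

fn main() {
    // Illustrative sample data (paths shortened).
    let files = vec![
        CsvFile { path: ".../SENSEX/20240102/SENSEX-2.csv".into(), date: "20240102".into(), seq_id: 2 },
        CsvFile { path: ".../SENSEX/20240101/SENSEX-2.csv".into(), date: "20240101".into(), seq_id: 2 },
    ];
    let groups = group_by_seq(files);
    println!("{}", groups[&2][0].date); // earliest date comes first
}
```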

Changing the Data Directory

To change where the hive-partitioned data is stored, modify the base_path variable in the source code:

  1. Edit src/generate_dummy_data.rs, lines 18-20.
  2. Edit src/process_dummy_data.rs, lines 8-10.
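As a rough illustration, the base_path assignment being edited might look something like this; the variable name comes from the text above, while the concrete path and surrounding code are assumptions:

```rust
use std::path::PathBuf;

fn main() {
    // Hypothetical: point this at the root directory that holds
    // (or will hold) the hive-partitioned data.
    let base_path = PathBuf::from("/data/hive_partitioned_data");
    println!("{}", base_path.display());
}
```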

After making the necessary changes, rebuild and run the program to regenerate and process the data with the updated location.

Processing Actual Hive Partitioned Data

To process your actual hive-partitioned data, update the base_path variable in src/process_dummy_data.rs (lines 8-10) and run the following command:

cargo run --release --bin process_dummy_data

Or execute the compiled binary:

./target/release/process_dummy_data

Example Video

Watch On YouTube
