At the Johns Hopkins University Applied Physics Laboratory (JHU APL), the acronym "CIRCUIT" stands for "Cohort-based Integrated Research Community for Undergraduate Innovation and Trailblazing." This program provides undergraduate students with immersive, hands-on research experiences across various STEM disciplines, including artificial intelligence, precision medicine, planetary exploration, and cybersecurity. This repository showcases a machine learning and data analysis framework developed under this program. It includes end-to-end examples of data preprocessing, visualization, classification, regression, and clustering using Python and popular libraries such as pandas, numpy, matplotlib, and scikit-learn.
- Policy Data Handling: A new workflow, `main.py`, streamlines the different `.py` files that comprise this project and computes correlations.
- Population & Deaths Merge: The code now supports reading population and total deaths (e.g., from `pop.csv` or `deaths.csv`) and attaching them to the policy data for further analysis.
- Correlation with `% of Population Dead`: Introduces `pct_of_pop_dead` to show how policy timings correlate with mortality rates.
- Multi-Index Melt: Optionally melts the policy date offset columns so that each row is `(POSTCODE, PolicyType, EffectiveOffset, …)`, facilitating advanced analyses.
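The merge, mortality-rate, correlation, and melt steps listed above can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual code: the in-memory tables stand in for `pop.csv` and `deaths.csv`, and the column names `population`, `total_deaths`, `StayAtHome`, and `MaskMandate`, along with all of the values, are hypothetical.

```python
import pandas as pd

# Hypothetical policy table: one row per POSTCODE, one column per policy type,
# each holding the day offset at which that policy took effect.
policy = pd.DataFrame({
    "POSTCODE": ["AA", "BB", "CC", "DD"],
    "StayAtHome": [10, 20, 30, 40],
    "MaskMandate": [15, 25, 20, 60],
})

# Hypothetical stand-ins for the contents of pop.csv and deaths.csv.
pop = pd.DataFrame({
    "POSTCODE": ["AA", "BB", "CC", "DD"],
    "population": [1_000_000, 2_000_000, 500_000, 4_000_000],
})
deaths = pd.DataFrame({
    "POSTCODE": ["AA", "BB", "CC", "DD"],
    "total_deaths": [1_000, 3_000, 1_000, 12_000],
})

# Attach population and total deaths to the policy data.
merged = (
    policy
    .merge(pop, on="POSTCODE", how="left")
    .merge(deaths, on="POSTCODE", how="left")
)

# Percentage of the population that died.
merged["pct_of_pop_dead"] = 100 * merged["total_deaths"] / merged["population"]

# Pearson correlation of each policy offset column with the mortality rate.
offset_cols = ["StayAtHome", "MaskMandate"]
correlations = merged[offset_cols].corrwith(merged["pct_of_pop_dead"])

# Melt the offset columns so each row is (POSTCODE, PolicyType, EffectiveOffset, ...).
long_form = merged.melt(
    id_vars=["POSTCODE", "pct_of_pop_dead"],
    value_vars=offset_cols,
    var_name="PolicyType",
    value_name="EffectiveOffset",
)

print(correlations)
print(long_form)
```

The long format produced by `melt` makes it straightforward to group or filter by `PolicyType` without hard-coding one column per policy.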
To run this project on your local machine, follow the instructions below.
- Clone this repository to your local machine or download the source code as a ZIP file.
- Open a terminal (e.g., Command Prompt, bash shell) and navigate to the project directory.
- Install project dependencies using:
pip install -r requirements.txt
This will install the required Python packages.
- Run the main workflow:
python main.py
The code will perform data preprocessing, visualization, classification, regression, and clustering based on the provided functions and print the results to the terminal. You may customize the code and functions according to your specific requirements.
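As a starting point for customization, the sketch below shows what a small preprocessing, classification, and clustering run might look like with scikit-learn. It is an assumption-laden illustration rather than the project's own functions: the data is synthetic, and the choice of `LogisticRegression` and `KMeans` is only an example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the project's real inputs (an assumption).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing + classification: scale the features, then fit a linear classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Clustering: group the scaled samples into two clusters, ignoring the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(StandardScaler().fit_transform(X))

print(f"Test accuracy: {accuracy:.2f}")
print(f"Cluster sizes: {[int((cluster_labels == k).sum()) for k in range(2)]}")
```

Wrapping the scaler and the model in a single pipeline keeps the scaling parameters fitted on the training split only, which avoids leaking test data into preprocessing.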
Contributions are welcome! If you find issues or want to add new features, open an issue or submit a pull request.
This project was inspired by the need for a code template for data analysis and machine learning tasks. Thanks to the creators and maintainers of the pandas, numpy, matplotlib, and scikit-learn libraries for providing powerful tools for data manipulation and analysis.