KIDS23-Team6 - Visualization and prediction of HPC resource utilization

The purpose of this project is to visualize the historical usage of the HPCF resources and to predict future usage. Our efforts focus on CPU, GPU, and memory utilization.

The aim is to summarize historical data, provide useful forecasts of utilization, and ultimately provide a dashboard that displays these results interactively. We provide:

Historical usage trends.
Current usage statistics of the cluster queues.
A forecast of future usage to inform the HPCF team and users.

So, what's the need?

The HPCF team has received multiple requests for usage figures covering the cluster as a whole as well as individual queues and departments. An existing tool provides some visualization capabilities, but it is too complicated for everyday use, especially for researchers who cannot dedicate time to learning all of its intricacies.

So, we're designing a simple and intuitive dashboard where researchers can view data relevant to them, their departments, or the institution as a whole.

Furthermore, the HPCF team needs to be able to accurately predict future resource usage so that we can plan for expansions and refreshes of the infrastructure. This project provides information and visualization tools that will assist both researchers and the HPCF team in accomplishing these goals.

About this project

This project was built with:

tidyverse, dplyr – R packages for data science and data manipulation
PostgreSQL – Relational database management system
Docker – Containerization software
LSTM – Deep learning model for time-series prediction
R Shiny – R package for developing web applications

Preprocessing Pipeline

This project changed from a data processing effort to a data engineering and reduction effort once we realized how large the dataset is. The following is an illustration of the transformation and reduction of data that had to take place before we could start visualization and machine-learning efforts:

The following is a visual representing the transformation of the original data records into a time series for visualization and training purposes:
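As a rough sketch of this kind of record-to-time-series transformation, the Python below turns job records into a count of cores in use over time. It is an illustration only, not the project's actual pipeline: the `(start, end, cores)` tuple layout, the function names `cores_in_use` and `resample`, and the sample jobs are all assumptions made for the example.

```python
from collections import defaultdict


def cores_in_use(jobs):
    """Convert job records into change points of total cores in use.

    `jobs` is an iterable of (start, end, cores) tuples (hypothetical layout),
    with integer timestamps in seconds and `end` treated as exclusive.
    Returns a sorted list of (timestamp, cores_in_use_from_that_time_on).
    """
    deltas = defaultdict(int)
    for start, end, cores in jobs:
        if end <= start:
            continue  # ignore zero-length or inverted records
        deltas[start] += cores  # cores are claimed when the job starts
        deltas[end] -= cores    # and released when it ends
    series = []
    running = 0
    for t in sorted(deltas):
        running += deltas[t]
        series.append((t, running))
    return series


def resample(series, start, stop, step):
    """Sample the step function from `cores_in_use` on a regular grid."""
    out = []
    i = 0
    current = 0
    for t in range(start, stop, step):
        # Advance past every change point at or before this grid time.
        while i < len(series) and series[i][0] <= t:
            current = series[i][1]
            i += 1
        out.append((t, current))
    return out


if __name__ == "__main__":
    # Made-up jobs for demonstration only.
    sample_jobs = [(0, 10, 4), (5, 15, 2), (20, 25, 8)]
    for timestamp, cores in resample(cores_in_use(sample_jobs), 0, 30, 5):
        print(timestamp, cores)
```

Storing only the change points keeps the intermediate result small when there are many jobs; the regular grid is produced only at the resolution needed for plotting or training.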

Data files and inputs

The input data we used comes primarily from LSF log files and LSF command outputs. Here is a sample of the command outputs:

The LSF log files are large. We have logs dating back to September 2021, with a total size of around 360 GB. The data files are comma-delimited, with over 130 columns per line and over a hundred million lines in total. Here is a sample of the data:
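Files of this size are awkward to load whole, so one common way to reduce them is to stream them row by row and keep only the columns of interest. The sketch below shows that idea with Python's standard `csv` module. The function name `reduce_columns` and the column indices are hypothetical; the actual LSF column positions used by the project are not given here.

```python
import csv


def reduce_columns(lines, keep):
    """Yield only the selected columns from each comma-delimited row.

    `lines` is any iterable of text lines (for example an open file), so the
    input is processed one row at a time instead of being read into memory.
    `keep` is a non-empty list of zero-based column indices (assumed values).
    Rows with too few columns are skipped.
    """
    needed = max(keep) + 1
    for row in csv.reader(lines):
        if len(row) < needed:
            continue  # malformed or truncated line
        yield [row[i] for i in keep]
```

A typical use would be to open a log file with `open(path, newline="")`, pass the file object to `reduce_columns`, and write the reduced rows out with `csv.writer`, so that later steps work on a much smaller file.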

