combine workflow for csv files and cloud db, update README
TheTallJerry committed Nov 30, 2023
1 parent 04ff17e commit d882449
Showing 5 changed files with 44 additions and 110 deletions.
32 changes: 0 additions & 32 deletions .github/workflows/data_db.yml

This file was deleted.

.github/workflows/fetch_data.yml
@@ -1,4 +1,4 @@
-name: Fetch data and save as csv to github
+name: Fetch data and save data to database, and as csv to github

on:
workflow_dispatch:
@@ -11,6 +11,10 @@ jobs:
timeout-minutes: 15
runs-on: ubuntu-latest

+env:
+  CLOUD_PSQL_USERNAME: ${{ secrets.CLOUD_PSQL_USERNAME }}
+  CLOUD_PSQL_PASSWORD: ${{ secrets.CLOUD_PSQL_PASSWORD }}

steps:
- name: Set branch name
id: branch_name
@@ -33,8 +37,8 @@ jobs:
run: npm install

- name: Run Node.js script
-run: node query_nodb.js

+run: node query_db.js cloud
- name: Commit and Push Changes
run: |
git add .
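Reassembled from the hunks above, the relevant part of the updated workflow might look roughly like the following. Only the workflow name, the `env` block, and the `node query_db.js cloud` step come from the diff; the job id, the checkout and install steps, and the indentation are illustrative assumptions.

```yaml
name: Fetch data and save data to database, and as csv to github

on:
  workflow_dispatch:
  # schedule: omitted here; see the cron sketch after the README diff below

jobs:
  fetch-data:                       # job id is not visible in the diff; hypothetical
    timeout-minutes: 15
    runs-on: ubuntu-latest

    env:
      # Cloud Postgres credentials injected from repository secrets,
      # read by the Node script via process.env
      CLOUD_PSQL_USERNAME: ${{ secrets.CLOUD_PSQL_USERNAME }}
      CLOUD_PSQL_PASSWORD: ${{ secrets.CLOUD_PSQL_PASSWORD }}

    steps:
      - uses: actions/checkout@v4   # assumed; not shown in the diff

      - name: Install dependencies
        run: npm install

      - name: Run Node.js script
        run: node query_db.js cloud # writes to the cloud DB and emits CSV files

      - name: Commit and Push Changes
        run: |
          git add .
          # commit and push continue as in the original workflow
```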
11 changes: 5 additions & 6 deletions README.md
@@ -2,13 +2,12 @@

## Overview

-The project currently has 4 workflows, as illustrated under `.github/workflows`:
-1. `data_db.yml`: This workflow queries data about uoft bike stations and saves them into a Postgres database on the cloud - currently on ElephantSQL.
-2. `cut_branch.yml`: This workflow cuts a daily branch where the data collection will occur.
-3. `data_nodb.yml`: This workflow queries about the same data as `data_db.yml` and instead saves them as individual CSV files which are pushed onto the daily branch.
-4. `clean_csv.yml`: This workflow collects the above individual CSV files on a daily basis, combines them into a single CSV named as `combined_daily_branch_${date_in_EST}` and removes the individual files afterwards, then pushes the combined CSV file onto the daily branch, then merges the daily branch into main and deletes the daily branch afterwards. The combined CSV is saved under `/collected_data`.
+The project currently has 3 workflows, located under `.github/workflows`:
+1. `cut_branch.yml`: This workflow cuts a daily branch where data collection will occur.
+2. `fetch_data.yml`: This workflow queries data about UofT bike stations and saves it in two places: a Postgres database on the cloud (currently on ElephantSQL) and individual CSV files pushed onto the daily branch.
+3. `clean_csv.yml`: This workflow runs once a day: it combines the individual CSV files above into a single CSV named `combined_daily_branch_${date_in_EST}`, removes the individual files, pushes the combined CSV onto the daily branch, then merges the daily branch into main and deletes it. The combined CSV is saved under `/collected_data`.

-`cut_branch.yml` is run on 04:23. `data_db.yml` and `data_nodb.yml` are run with a 5 minute segment, between minutes 10-59, on hours 08:00 to 10:00 and 16:00 to 18:00. `clean_csv.yml` is run on 22:17. All times here are in EST - github actions currently only accepts UTC for their cron, so the times in the actual workflow files are in UTC.
+`cut_branch.yml` runs at 04:23. `fetch_data.yml` runs every 5 minutes between minutes 10 and 59 of the hours 08:00 to 10:00 and 16:00 to 18:00. `clean_csv.yml` runs at 22:17. All times here are in EST; GitHub Actions currently only accepts UTC in its cron schedules, so the times in the actual workflow files are in UTC.

## Development

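The schedule paragraph in the README gives times in EST, while GitHub Actions cron runs in UTC. The actual `schedule:` blocks are not part of this commit, so the following is only a sketch of what the UTC cron expressions might look like, assuming a fixed UTC-5 offset (daylight saving time would shift everything by an hour).

```yaml
# Hypothetical UTC equivalents of the EST times above (UTC = EST + 5 hours);
# each expression would live in the schedule: block of its own workflow file.

- cron: "23 9 * * *"                 # cut_branch.yml, 04:23 EST
- cron: "10-59/5 13-15,21-23 * * *"  # fetch_data.yml, minutes 10-59 every 5 min, 08:00-10:00 and 16:00-18:00 EST
- cron: "17 3 * * *"                 # clean_csv.yml, 22:17 EST (03:17 UTC the next day)
```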
42 changes: 32 additions & 10 deletions query_db.js → query.js
@@ -22,7 +22,9 @@ if (process.argv.length < 3) {
port: "5432",
});
} else {
-console.error('Invalid argument. Please specify either "cloud" or "docker".');
+console.error(
+  'Invalid argument. Please specify either "cloud" or "docker".'
+);
process.exit(1);
}
}
@@ -33,6 +35,17 @@ var uoftBikeStations = new Set([
7252, 7273, 7274, 7285, 7335, 7358, 7457, 7600, 7667, 7762,
]);

+// [
+//   "station_id",
+//   "time_checked",
+//   "num_bikes_avail",
+//   "num_bikes_disabled",
+//   "num_docks_avail",
+//   "num_docks_disabled",
+//   "station_status",
+// ]
+const csvArray = [];
+var csvData = "";
var lastUpdated = "";
async function fetchDataAndInsert() {
try {
@@ -53,20 +66,23 @@ async function fetchDataAndInsert() {
);

// Load data into the PostgreSQL database
+// and save data into csv file
for (const station of uoftStationsData) {
+data = [
+  Number(station.station_id),
+  lastUpdated,
+  station.num_bikes_available,
+  station.num_bikes_disabled,
+  station.num_docks_available,
+  station.num_docks_disabled,
+  station.status,
+];
await dbClient.query(
"insert into station_emptiness (station_id, time_checked, num_bikes_avail, num_docks_avail, num_bikes_disabled, num_docks_disabled, station_status) values ($1, to_timestamp($2), $3, $4, $5, $6, $7)",
-[
-  Number(station.station_id),
-  lastUpdated,
-  station.num_bikes_available,
-  station.num_bikes_disabled,
-  station.num_docks_available,
-  station.num_docks_disabled,
-  station.status,
-]
+data
);
console.log(`Inserted ${station.station_id} into the database`);
+csvArray.push(data);
}
} else {
console.error("Failed to fetch data from the API");
@@ -77,6 +93,12 @@
// Ensure the database connection is closed
await dbClient.end();
console.log("Connection closed");
+// Create CSV file
+for (const row of csvArray) {
+  csvData += row.join(",") + "\n";
+}
+fs.writeFileSync(`${lastUpdated}.csv`, csvData);
+console.log(`Created ${lastUpdated}.csv`);
}
}

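Reading the query.js hunks together, the combined flow after this commit is: fetch station status, insert each UofT station's row into Postgres, collect the same row for a CSV file, and write the CSV once the connection is closed. Below is a minimal, self-contained sketch of that flow. The API URL and response shape, the `pg` connection fields, and the use of a local `const row` are assumptions not taken from the diff; the cloud/docker argument handling from the first hunk is omitted for brevity, and the row order here is written to follow the column order in the INSERT statement.

```javascript
// Minimal sketch, not the repository's exact code.
const fs = require("fs");
const { Client } = require("pg");

// Placeholder endpoint: the real station_status URL is not visible in this diff.
const STATION_STATUS_URL = "https://example.com/gbfs/en/station_status.json";

const uoftBikeStations = new Set([
  7252, 7273, 7274, 7285, 7335, 7358, 7457, 7600, 7667, 7762,
]);

// Credentials come from the workflow's env block (repository secrets).
// Host and database name are hypothetical; only the port appears in the diff.
const dbClient = new Client({
  host: process.env.CLOUD_PSQL_HOST,
  user: process.env.CLOUD_PSQL_USERNAME,
  password: process.env.CLOUD_PSQL_PASSWORD,
  database: process.env.CLOUD_PSQL_USERNAME,
  port: 5432,
});

async function fetchDataAndInsert() {
  const csvRows = [];
  let lastUpdated = "";
  try {
    await dbClient.connect();

    // Node 18+ global fetch; response assumed to be a standard GBFS station_status feed.
    const response = await fetch(STATION_STATUS_URL);
    if (!response.ok) {
      console.error("Failed to fetch data from the API");
      return;
    }
    const body = await response.json();
    lastUpdated = body.last_updated;
    const uoftStationsData = body.data.stations.filter((s) =>
      uoftBikeStations.has(Number(s.station_id))
    );

    for (const station of uoftStationsData) {
      // Row order follows the column order in the INSERT statement below.
      const row = [
        Number(station.station_id),
        lastUpdated,
        station.num_bikes_available,
        station.num_docks_available,
        station.num_bikes_disabled,
        station.num_docks_disabled,
        station.status,
      ];
      await dbClient.query(
        "insert into station_emptiness (station_id, time_checked, num_bikes_avail, num_docks_avail, num_bikes_disabled, num_docks_disabled, station_status) values ($1, to_timestamp($2), $3, $4, $5, $6, $7)",
        row
      );
      console.log(`Inserted ${station.station_id} into the database`);
      csvRows.push(row); // the same row is reused for the CSV
    }
  } catch (err) {
    console.error("Error while fetching or inserting data:", err);
  } finally {
    await dbClient.end();
    console.log("Connection closed");
    if (csvRows.length > 0) {
      // One headerless CSV per run, named after the feed's last_updated timestamp.
      const csvData = csvRows.map((r) => r.join(",")).join("\n") + "\n";
      fs.writeFileSync(`${lastUpdated}.csv`, csvData);
      console.log(`Created ${lastUpdated}.csv`);
    }
  }
}

fetchDataAndInsert();
```

In the workflow this would be invoked as `node query_db.js cloud`, with the two `CLOUD_PSQL_*` secrets exposed as environment variables, and the resulting CSV committed to the daily branch by the following step.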
59 changes: 0 additions & 59 deletions query_nodb.js

This file was deleted.
