Add workflow to run evaluation on a subset of datasets #222

abdulfatir · 2024-11-29T17:03:48Z

Issue #, if available:

Description of changes: This PR adds a workflow that runs the evaluation script on chronos-bolt-small for a subset of datasets specified in ci/evaluate/backtest_configs.yaml and then posts the resulting metrics as a comment on the PR. The workflow runs only if the run-eval label is present on the PR. The end-to-end workflow is split into two workflows:

eval-model.yml: has read-only access, so it can be run from forks. It evaluates the model and uploads the metrics CSV file as a GitHub artifact.
eval-pr-comment.yml: has read and write access and can only be run from the main branch. It is triggered when the first workflow finishes, downloads the CSV artifact from the evaluation run, and posts the comment. According to this post, splitting the work into two workflows like this is the recommended, secure approach.
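As a minimal sketch of two pieces of logic this split relies on, the Python below checks a pull_request event payload for the run-eval label and renders the metrics CSV artifact as a Markdown comment body. The function names `should_run_eval` and `render_metrics_comment`, the trimmed event payload, and the CSV layout (header `dataset,model,MASE,WQL`, inferred from the metric comments on this PR) are hypothetical; the actual workflows in this PR are YAML files and may implement these steps differently.

```python
import csv
import io
import json

# Label name taken from the PR description; everything else here is illustrative.
RUN_EVAL_LABEL = "run-eval"


def should_run_eval(event: dict, label: str = RUN_EVAL_LABEL) -> bool:
    """Return True if a pull_request event payload carries the gating label."""
    labels = event.get("pull_request", {}).get("labels", [])
    return any(item.get("name") == label for item in labels)


def render_metrics_comment(csv_text: str, heading: str = "Evaluation Metrics") -> str:
    """Render a metrics CSV (assumed header: dataset,model,MASE,WQL) as a Markdown table."""
    # Skip blank rows so a stray empty line in the artifact does not break the table.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return f"### {heading}\n\n_No metrics found._"
    header, body = rows[0], rows[1:]
    lines = [f"### {heading}", ""]
    lines.append("| " + " | ".join(header) + " |")
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in body:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical, trimmed-down payload; a real workflow would read it from $GITHUB_EVENT_PATH.
    event = json.loads('{"pull_request": {"labels": [{"name": "run-eval"}]}}')
    sample_csv = (
        "dataset,model,MASE,WQL\n"
        "ETTh,amazon/chronos-bolt-tiny,0.7948305944396085,0.0748624476805626\n"
    )
    if should_run_eval(event):
        print(render_metrics_comment(sample_csv))
```

Keeping the rendering separate from the step that posts the comment means it can be exercised locally without the write permissions that only the second workflow holds.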

NOTE: The first step works as expected, but the second step can only be tested after merging, because this workflow must be on the main branch before it can run.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

abdulfatir · 2024-11-29T17:06:49Z

Evaluation Metrics

dataset	model	MASE	WQL
ETTh	amazon/chronos-bolt-tiny	0.7948305944396085	0.0748624476805626
monash_covid_deaths	amazon/chronos-bolt-tiny	40.640740664078855	0.0670648919735542
monash_fred_md	amazon/chronos-bolt-tiny	0.6133426935299197	0.04746329539586572
monash_m3_quarterly	amazon/chronos-bolt-tiny	1.3633424628367963	0.07873051309371093
monash_nn5_weekly	amazon/chronos-bolt-tiny	0.9325957817519015	0.084742535576688
monash_tourism_yearly	amazon/chronos-bolt-tiny	4.038943100958313	0.1726199493328989
taxi_30min	amazon/chronos-bolt-tiny	0.9122173974466276	0.28005610322283847

github-actions · 2024-11-29T17:14:42Z

Evaluation Metrics

dataset	model	MASE	WQL
ETTh	amazon/chronos-bolt-tiny	0.7948305944396085	0.0748624476805626
monash_covid_deaths	amazon/chronos-bolt-tiny	40.640740664078855	0.0670648919735542
monash_fred_md	amazon/chronos-bolt-tiny	0.6133426935299197	0.04746329539586572
monash_m3_quarterly	amazon/chronos-bolt-tiny	1.3633424628367963	0.07873051309371093
monash_nn5_weekly	amazon/chronos-bolt-tiny	0.9325957817519015	0.084742535576688
monash_tourism_yearly	amazon/chronos-bolt-tiny	4.038943100958313	0.1726199493328989
taxi_30min	amazon/chronos-bolt-tiny	0.9122173974466276	0.28005610322283847

github-actions · 2024-11-29T17:18:24Z

Evaluation Metrics

dataset	model	MASE	WQL
ETTh	amazon/chronos-bolt-tiny	0.7948305944396085	0.0748624476805626
monash_covid_deaths	amazon/chronos-bolt-tiny	40.640740664078855	0.0670648919735542
monash_fred_md	amazon/chronos-bolt-tiny	0.6133426935299197	0.04746329539586572
monash_m3_quarterly	amazon/chronos-bolt-tiny	1.3633424628367963	0.07873051309371093
monash_nn5_weekly	amazon/chronos-bolt-tiny	0.9325957817519015	0.084742535576688
monash_tourism_yearly	amazon/chronos-bolt-tiny	4.038943100958313	0.1726199493328989
taxi_30min	amazon/chronos-bolt-tiny	0.9122173974466276	0.28005610322283847

github-actions · 2024-11-29T17:24:25Z

Evaluation Metrics

dataset	model	MASE	WQL
ETTh	amazon/chronos-bolt-tiny	0.7948305944396085	0.0748624476805626
monash_covid_deaths	amazon/chronos-bolt-tiny	40.640740664078855	0.0670648919735542
monash_fred_md	amazon/chronos-bolt-tiny	0.6133426935299197	0.04746329539586572
monash_m3_quarterly	amazon/chronos-bolt-tiny	1.3633424628367963	0.07873051309371093
monash_nn5_weekly	amazon/chronos-bolt-tiny	0.9325957817519015	0.084742535576688
monash_tourism_yearly	amazon/chronos-bolt-tiny	4.038943100958313	0.1726199493328989
taxi_30min	amazon/chronos-bolt-tiny	0.9122173974466276	0.28005610322283847

.github/workflows/eval-pr-comment.yml

Abdul Fatir Ansari and others added 24 commits November 29, 2024 14:44

Initial eval CI

d146ee8

Fix directory

3325494

Fix trigger

48cee93

Fix trigger

d65a2ed

Remove upload and download

4985b32

Make comment

5c70113

Do eval instead

9385d01

Use CPU

b170f37

Use fp32

d300717

Update backtest config

4e6300a

Update heading

5ba5bf3

Remove eval_model.py

fe78953

Merge branch 'main' into ci-auto-eval

0854cf8

Set up repo secret

08dd592

Remove env

15492ac

debug token

bcf080c

fix

1498d58

Fix

d51a39f

Fix

793b62d

test

c7ad116

fix

0f38127

Test

9c0fc94

Change into labeled

0047336

revert

b019de1

abdulfatir added the run-eval Run evaluation CI workflow label Nov 29, 2024

Abdul Fatir Ansari added 2 commits November 29, 2024 17:08

Check with github token

827d44b

Fix

b7f1167

Use default token

4e22e75

Only run on PR to main branch

164b257

Abdul Fatir Ansari added 5 commits December 1, 2024 15:29

Split into two workflows

b93c240

test download

f9adcf8

Fix

7e46046

Run on small

4d1759e

Polish

5959808

abdulfatir changed the title ~~Add workflow to run evaluation for chronos-bolt-tiny on a subset of datasets~~ Add workflow to run evaluation for chronos-bolt-small on a subset of datasets Dec 1, 2024

abdulfatir commented Dec 1, 2024


.github/workflows/eval-pr-comment.yml (review thread resolved)

Abdul Fatir Ansari added 2 commits December 1, 2024 16:02

Update Run ID

20b7e8d

scope down permissions

18a7340

abdulfatir changed the title ~~Add workflow to run evaluation for chronos-bolt-small on a subset of datasets~~ Add workflow to run evaluation on a subset of datasets Dec 1, 2024

lostella reviewed Dec 2, 2024


.github/workflows/eval-pr-comment.yml (review thread resolved)

lostella approved these changes Dec 2, 2024


abdulfatir merged commit eac768c into main Dec 2, 2024
6 checks passed

abdulfatir deleted the ci-auto-eval branch December 2, 2024 09:06
