Add workflow to run evaluation on a subset of datasets #222

Merged 35 commits on Dec 2, 2024

@abdulfatir abdulfatir commented Nov 29, 2024

Issue #, if available:

Description of changes: This PR adds a workflow that will run the evaluation script on chronos-bolt-small for a subset of datasets specified in ci/evaluate/backtest_configs.yaml. After evaluation, a comment will be made on the PR. The workflow will only run if the run-eval label is present on a PR. The end-to-end workflow has been split into two workflows:

  • eval-model.yml: has read access only, so it can be run from forks. It evaluates the model and uploads the metrics CSV file as a GitHub artifact.
  • eval-pr-comment.yml: has read and write access, so it can only be run from the main branch. It is triggered when the first workflow finishes, downloads the CSV produced by the eval job, and posts the comment. According to this post, splitting the work in two as done here is the recommended and secure way to do this.

NOTE: The first step works as expected, but the second step can only be tested after merging, because that workflow must be part of the main branch to run.
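
As a rough illustration of the second step, the sketch below turns a metrics CSV into the Markdown body of a PR comment. It is not the code used by eval-pr-comment.yml: the function name `metrics_csv_to_markdown`, the assumption that the CSV has a header row with the columns `dataset,model,MASE,WQL`, and the comment heading are all hypothetical, chosen only to match the metrics shown in this conversation.

```python
import csv
import io


def metrics_csv_to_markdown(csv_text: str) -> str:
    """Render a metrics CSV (header row plus data rows) as a Markdown table.

    Hypothetical helper: the real workflow may format its comment differently.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Assumed CSV layout; the two data rows are copied from the metrics in this PR.
sample_csv = """dataset,model,MASE,WQL
ETTh,amazon/chronos-bolt-tiny,0.7948305944396085,0.0748624476805626
taxi_30min,amazon/chronos-bolt-tiny,0.9122173974466276,0.28005610322283847
"""

comment_body = "### Evaluation Metrics\n\n" + metrics_csv_to_markdown(sample_csv)
print(comment_body)
```

Keeping the formatting in a small pure function like this means the write-access workflow only has to read an artifact and post text; it never needs to run code from the PR branch.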

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@abdulfatir abdulfatir added the run-eval Run evaluation CI workflow label Nov 29, 2024
@abdulfatir
Contributor Author

Evaluation Metrics

| dataset | model | MASE | WQL |
| --- | --- | --- | --- |
| ETTh | amazon/chronos-bolt-tiny | 0.7948305944396085 | 0.0748624476805626 |
| monash_covid_deaths | amazon/chronos-bolt-tiny | 40.640740664078855 | 0.0670648919735542 |
| monash_fred_md | amazon/chronos-bolt-tiny | 0.6133426935299197 | 0.04746329539586572 |
| monash_m3_quarterly | amazon/chronos-bolt-tiny | 1.3633424628367963 | 0.07873051309371093 |
| monash_nn5_weekly | amazon/chronos-bolt-tiny | 0.9325957817519015 | 0.084742535576688 |
| monash_tourism_yearly | amazon/chronos-bolt-tiny | 4.038943100958313 | 0.1726199493328989 |
| taxi_30min | amazon/chronos-bolt-tiny | 0.9122173974466276 | 0.28005610322283847 |

@abdulfatir abdulfatir changed the title from "Add workflow to run evaluation for chronos-bolt-tiny on a subset of datasets" to "Add workflow to run evaluation for chronos-bolt-small on a subset of datasets" Dec 1, 2024
@abdulfatir abdulfatir changed the title from "Add workflow to run evaluation for chronos-bolt-small on a subset of datasets" to "Add workflow to run evaluation on a subset of datasets" Dec 1, 2024
@abdulfatir abdulfatir merged commit eac768c into main Dec 2, 2024
6 checks passed
@abdulfatir abdulfatir deleted the ci-auto-eval branch December 2, 2024 09:06