diff --git a/docs/_freeze/posts/ci-analysis/index/execute-results/html.json b/docs/_freeze/posts/ci-analysis/index/execute-results/html.json
index 2c529381d079..fa4a2204cfa6 100644
--- a/docs/_freeze/posts/ci-analysis/index/execute-results/html.json
+++ b/docs/_freeze/posts/ci-analysis/index/execute-results/html.json
@@ -1,7 +1,7 @@
 {
-  "hash": "68f0afd28a0c7a6e975bb04cf7f07235",
+  "hash": "47294033e490cc53cd08275f84de9edd",
   "result": {
-    "markdown": "---\ntitle: \"Analysis of Ibis's CI performance\"\nauthor: \"Phillip Cloud\"\ndate: \"2023-01-09\"\ncategories:\n - blog\n - bigquery\n - continuous integration\n - data engineering\n - dogfood\n---\n\n## Summary\n\nThis notebook takes you through an analysis of Ibis's CI data using ibis on top of [Google BigQuery](https://cloud.google.com/bigquery).\n\n- First, we load some data and poke around at it to see what's what.\n- Second, we figure out some useful things to calculate based on our poking.\n- Third, we visualize the results of those calculations to showcase what changed and how.\n\n## Imports\n\nLet's start out by importing ibis and turning on interactive mode.\n\n::: {#55fba71a .cell execution_count=1}\n``` {.python .cell-code}\nimport ibis\nfrom ibis import _\n\nibis.options.interactive = True\n```\n:::\n\n\n## Connect to BigQuery\n\nWe connect to BigQuery using the `ibis.connect` API, which accepts a URL string indicating the backend and the various bits of information needed to connect to it. 
Here we're using BigQuery, so we need the project id (`ibis-gbq`) and the dataset id (`workflows`).\n\nDatasets are analogous to schemas in other systems.\n\n::: {#a414988f .cell execution_count=2}\n``` {.python .cell-code}\nurl = \"bigquery://ibis-gbq/workflows\"\ncon = ibis.connect(url)\n```\n:::\n\n\nLet's see what tables are available.\n\n::: {#24c17c0e .cell execution_count=3}\n``` {.python .cell-code}\ncon.list_tables()\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n['analysis', 'jobs', 'workflows']\n```\n:::\n:::\n\n\n## Analysis\n\nHere we've got our first bit of interesting information: the `jobs` and `workflows` tables.\n\n### Terminology\n\nBefore we jump in, it helps to lay down some terminology.\n\n- A **workflow** corresponds to an individual GitHub Actions YAML file in a GitHub repository under the `.github/workflows` directory.\n- A **job** is a named set of steps to run inside a **workflow** file.\n\n### What's in the `workflows` table?\n\nEach row in the `workflows` table corresponds to a **workflow run**.\n\n- A **workflow run** is an instance of a workflow that was triggered by a GitHub user, bot, or other entity.\n\n### What's in the `jobs` table?\n\nSimilarly, each row in the `jobs` table is a **job run**; for a given **workflow run** there is a set of job runs associated with it.\n\n- A **job run** is an instance of a job *in a workflow*. It is associated with a single **workflow run**.\n\n## Rationale\n\nThe goal of this analysis is to understand ibis's CI performance, and whether the amount of time we spent waiting on CI has decreased, stayed the same, or increased. 
Ideally, we can understand the pieces that contribute to the change or lack thereof.\n\n### Metrics\n\nTo that end, there are a few interesting metrics to look at:\n\n- **job run** *duration*: the amount of time it takes for a given job to complete.\n- **workflow run** *duration*: the amount of time it takes for *all* job runs in a workflow run to complete.\n- **queueing** *duration*: the amount of time spent waiting for the *first* job run to commence.\n\n### Mitigating Factors\n\n- Around October 2021, we changed our CI infrastructure to use [Poetry](https://python-poetry.org/) instead of [Conda](https://docs.conda.io/en/latest/). The goal there was to see if we could cache dependencies using the lock file generated by poetry. We should see whether that had any effect.\n- At the end of November 2022, we switched to the Team Plan (a paid GitHub plan) for the Ibis organization. This tripled the number of **job runs** that could execute in parallel. We should see if that helped anything.\n\nAlright, let's jump into some data!\n\n::: {#ac165685 .cell execution_count=4}\n``` {.python .cell-code}\njobs = con.tables.jobs[_.started_at < \"2023-01-09\"]\njobs\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ url                                                                    steps                                                                             status     started_at                 runner_group_name  run_attempt  name                               labels         node_id                       id         runner_id  run_url                                                                run_id     check_run_url                                                        html_url                                                                    runner_name  runner_group_id  head_sha                                  conclusion  completed_at              
┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ stringarray<struct<status: string, conclusion: string, started_at: timestamp('UTC'), …stringtimestamp('UTC')stringint64stringarray<string>stringint64int64stringint64stringstringstringint64stringstringtimestamp('UTC')          │\n├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼───────────────────────────┼───────────────────┼─────────────┼───────────────────────────────────┼───────────────┼──────────────────────────────┼───────────┼───────────┼───────────────────────────────────────────────────────────────────────┼───────────┼─────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────────────┼──────────────────────────────────────────┼────────────┼───────────────────────────┤\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950708339[{...}, {...}, ... 
+9]completed2020-08-05 19:01:16+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA3MDgzMzk=950708339NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191196667191https://api.github.com/repos/ibis-project/ibis/check-runs/950708339https://github.com/ibis-project/ibis/runs/950708339?check_suite_focus=trueNULLNULL08855609f1e9ebdeb6197887cf64ecda015d99a8success   2020-08-05 19:51:29+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950545071[{...}, {...}, ... +9]completed2020-08-05 18:12:43+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA1NDUwNzE=950545071NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109196617109https://api.github.com/repos/ibis-project/ibis/check-runs/950545071https://github.com/ibis-project/ibis/runs/950545071?check_suite_focus=trueNULLNULL7472797f3e4da39d18e53c09566dba5e373094b0success   2020-08-05 18:59:32+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950493219[{...}, {...}, ... +9]completed2020-08-05 17:59:27+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0OTMyMTk=950493219NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751196598751https://api.github.com/repos/ibis-project/ibis/check-runs/950493219https://github.com/ibis-project/ibis/runs/950493219?check_suite_focus=trueNULLNULL7452ea048908149a672f681ffd94e3fd0953ab2cfailure   2020-08-05 18:05:28+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950486464[{...}, {...}, ... 
+9]completed2020-08-05 17:56:41+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0ODY0NjQ=950486464NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932196596932https://api.github.com/repos/ibis-project/ibis/check-runs/950486464https://github.com/ibis-project/ibis/runs/950486464?check_suite_focus=trueNULLNULL59daccd16de041b14fa48b9ba53e8aac6495a578failure   2020-08-05 18:23:47+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950480141[{...}, {...}, ... +9]completed2020-08-05 17:54:44+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTA0ODAxNDE=950480141NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357196595357https://api.github.com/repos/ibis-project/ibis/check-runs/950480141https://github.com/ibis-project/ibis/runs/950480141?check_suite_focus=trueNULLNULL1d4f2db372834da7fb33b53c60b59d3f3e40cf7cfailure   2020-08-05 17:54:51+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950177717[{...}, {...}, ... +10]completed2020-08-05 16:29:08+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTAxNzc3MTc=950177717NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866196505866https://api.github.com/repos/ibis-project/ibis/check-runs/950177717https://github.com/ibis-project/ibis/runs/950177717?check_suite_focus=trueNULLNULLc36dd6504d86d1994fb36d6a84fb3f302a57642cfailure   2020-08-05 17:17:42+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/950129587[{...}, {...}, ... 
+10]completed2020-08-05 16:15:54+00:00NULL1Lint, package and benckmark      []MDg6Q2hlY2tSdW45NTAxMjk1ODc=950129587NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206196491206https://api.github.com/repos/ibis-project/ibis/check-runs/950129587https://github.com/ibis-project/ibis/runs/950129587?check_suite_focus=trueNULLNULL5ffc4dcb3857eae64b5b36f46b378149c0bb2d74failure   2020-08-05 16:47:02+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/949724768[{...}, {...}, ... +14]completed2020-08-05 14:36:49+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDk3MjQ3Njg=949724768NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166196367166https://api.github.com/repos/ibis-project/ibis/check-runs/949724768https://github.com/ibis-project/ibis/runs/949724768?check_suite_focus=trueNULLNULLe88d621425c939857b3b9391794c5ddfd7615981failure   2020-08-05 15:10:08+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/949565949[{...}, {...}, ... +13]completed2020-08-05 14:01:36+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDk1NjU5NDk=949565949NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939196316939https://api.github.com/repos/ibis-project/ibis/check-runs/949565949https://github.com/ibis-project/ibis/runs/949565949?check_suite_focus=trueNULLNULL702446a96a1b9e6b463084f2f09f2f2106fef8d4failure   2020-08-05 14:32:10+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947307233[{...}, {...}, ... 
+13]completed2020-08-05 00:48:26+00:00NULL1Docs, lint, package and benckmark[]MDg6Q2hlY2tSdW45NDczMDcyMzM=947307233NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439195537439https://api.github.com/repos/ibis-project/ibis/check-runs/947307233https://github.com/ibis-project/ibis/runs/947307233?check_suite_focus=trueNULLNULL2ab26f385b87f39b66cf51783d7ab8904fdb4677failure   2020-08-05 01:19:56+00:00 │\n│                          │\n└───────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴───────────────────────────┴───────────────────┴─────────────┴───────────────────────────────────┴───────────────┴──────────────────────────────┴───────────┴───────────┴───────────────────────────────────────────────────────────────────────┴───────────┴─────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────────┴──────────────────────────────────────────┴────────────┴───────────────────────────┘\n
\n```\n:::\n:::\n\n\nThe first few columns in the `jobs` table aren't that interesting, so let's look at what else is there.\n\n::: {#67783451 .cell execution_count=5}\n``` {.python .cell-code}\njobs.columns\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n['url',\n 'steps',\n 'status',\n 'started_at',\n 'runner_group_name',\n 'run_attempt',\n 'name',\n 'labels',\n 'node_id',\n 'id',\n 'runner_id',\n 'run_url',\n 'run_id',\n 'check_run_url',\n 'html_url',\n 'runner_name',\n 'runner_group_id',\n 'head_sha',\n 'conclusion',\n 'completed_at']\n```\n:::\n:::\n\n\nA bunch of these aren't useful for our purposes; however, `run_id`, `started_at`, and `completed_at` are. The [GitHub documentation for job information](https://docs.github.com/en/rest/actions/workflow-jobs?apiVersion=2022-11-28#get-a-job-for-a-workflow-run) provides useful detail about the meaning of these fields.\n\n- `run_id`: the workflow run associated with this job run\n- `started_at`: when the job started\n- `completed_at`: when the job completed\n\nTo a first approximation, we're interested in the job duration, so let's compute that.\n\nWe also need to compute when the last job for a given `run_id` started and when it completed. We'll use the former to compute the queueing duration, and the latter to compute the total time it took for a given workflow run to complete.\n\n::: {#c89b9246 .cell execution_count=6}\n``` {.python .cell-code}\nrun_id_win = ibis.window(group_by=_.run_id)\njobs = jobs.select(\n _.run_id,\n job_duration=_.completed_at.cast(\"int\") - _.started_at.cast(\"int\"),\n last_job_started_at=_.started_at.max().over(run_id_win),\n last_job_completed_at=_.completed_at.max().over(run_id_win),\n)\njobs\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ run_id     job_duration  last_job_started_at        last_job_completed_at     ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ int64int64timestamp('UTC')timestamp('UTC')          │\n├───────────┼──────────────┼───────────────────────────┼───────────────────────────┤\n│ 20596172613530000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2059617264540000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2059617264470000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2059617267100000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2059617263640000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2059617263220000002020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 20596172602020-08-12 18:41:31+00:002020-08-12 18:41:31+00:00 │\n│ 2128049343530000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │\n│ 2128049344000000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │\n│ 21280493410270000002020-08-18 00:36:24+00:002020-08-18 00:36:24+00:00 │\n│                                  │\n└───────────┴──────────────┴───────────────────────────┴───────────────────────────┘\n
\n```\n:::\n:::\n\n\nLet's take a look at the `workflows` table.\n\n::: {#31835a03 .cell execution_count=7}\n``` {.python .cell-code}\nworkflows = con.tables.workflows\nworkflows\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ workflow_url                                                              workflow_id  triggering_actor  run_number  run_attempt  updated_at                 cancel_url                                                                    rerun_url                                                                    check_suite_node_id               pull_requests                                    
                                 id         node_id                           status     repository                                                                                                                                                     jobs_url                                                                    previous_attempt_url  artifacts_url                                                                    html_url                                                     head_sha                                  head_repository                                                                                                                                                    run_started_at             head_branch    url                                                                    event         name    actor  created_at                 check_suite_url                                                         check_suite_id  conclusion  head_commit                                                                                                                            logs_url                                                                   
┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ stringint64struct<subscrip…int64int64timestamp('UTC')stringstringstringarray<struct<number: int64, url: string, id: int64, head: struct<sha: string, r…int64stringstringstruct<trees_url: string, teams_url: string, statuses_url: string, subscribers_…stringstringstringstringstringstruct<trees_url: string, teams_url: string, statuses_url: string, 
subscribers_…timestamp('UTC')stringstringstringstringstru…timestamp('UTC')stringint64stringstruct<tree_id: string, timestamp: timestamp('UTC'), message: string, id: strin…string                                                                     │\n├──────────────────────────────────────────────────────────────────────────┼─────────────┼──────────────────┼────────────┼─────────────┼───────────────────────────┼──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼──────────────────────────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼──────────────────────┼─────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────┼──────────────┼────────┼───────┼───────────────────────────┼────────────────────────────────────────────────────────────────────────┼────────────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┤\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2812020-08-05 
20:01:17+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/rerunMDEwOkNoZWNrU3VpdGUxMDEzMDI5NDkw[]196667191MDExOldvcmtmbG93UnVuMTk2NjY3MTkxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/artifactshttps://github.com/ibis-project/ibis/actions/runs/19666719108855609f1e9ebdeb6197887cf64ecda015d99a8{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 19:01:08+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191pull_requestMain  NULL2020-08-05 19:01:08+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10130294901013029490success   {'tree_id': 'c4277198178ae73c3d9611af464ee75eadbceedc', 'timestamp': datetime.datetime(2020, 8, 5, 19, 0, 57, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2712020-08-05 18:59:37+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/rerunMDEwOkNoZWNrU3VpdGUxMDEyODM0OTQ3[]196617109MDExOldvcmtmbG93UnVuMTk2NjE3MTA5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/artifactshttps://github.com/ibis-project/ibis/actions/runs/1966171097472797f3e4da39d18e53c09566dba5e373094b0{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 18:12:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109pull_requestMain  NULL2020-08-05 18:12:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10128349471012834947success   {'tree_id': '451472455efc6f20b81f6e1762ac712ec75e77b3', 'timestamp': datetime.datetime(2020, 8, 5, 18, 12, 25, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2612020-08-05 18:05:32+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzc0OTQ4[]196598751MDExOldvcmtmbG93UnVuMTk2NTk4NzUxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/artifactshttps://github.com/ibis-project/ibis/actions/runs/1965987517452ea048908149a672f681ffd94e3fd0953ab2c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}2020-08-05 17:58:48+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751pull_requestMain  NULL2020-08-05 17:58:48+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127749481012774948failure   {'tree_id': 'e753c3d693a15eeb99a0d2bd074414ab90dbc85d', 'timestamp': datetime.datetime(2020, 8, 5, 17, 58, 39, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2512020-08-05 18:23:52+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzY2NTk2[]196596932MDExOldvcmtmbG93UnVuMTk2NTk2OTMycompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/artifactshttps://github.com/ibis-project/ibis/actions/runs/19659693259daccd16de041b14fa48b9ba53e8aac6495a578{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:56:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932pull_requestMain  NULL2020-08-05 17:56:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127665961012766596failure   {'tree_id': '342312b7ce508d4d7c91259dd7919cee06508f19', 'timestamp': datetime.datetime(2020, 8, 5, 17, 56, 25, tzinfo=<UTC>), ... 
… (remaining rows of the wide workflows-table preview elided: each row is a single workflow run, with URL, ID, timestamp, and status columns) …\n
\n```\n:::\n:::\n\n\nAgain we have a bunch of columns that aren't so useful to us, so let's see what else is there.\n\n::: {#bf73436e .cell execution_count=8}\n``` {.python .cell-code}\nworkflows.columns\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\n['workflow_url',\n 'workflow_id',\n 'triggering_actor',\n 'run_number',\n 'run_attempt',\n 'updated_at',\n 'cancel_url',\n 'rerun_url',\n 'check_suite_node_id',\n 'pull_requests',\n 'id',\n 'node_id',\n 'status',\n 'repository',\n 'jobs_url',\n 'previous_attempt_url',\n 'artifacts_url',\n 'html_url',\n 'head_sha',\n 'head_repository',\n 'run_started_at',\n 'head_branch',\n 'url',\n 'event',\n 'name',\n 'actor',\n 'created_at',\n 'check_suite_url',\n 'check_suite_id',\n 'conclusion',\n 'head_commit',\n 'logs_url']\n```\n:::\n:::\n\n\nWe don't care about many of these for the purposes of this analysis; we only need the `id` and a few values derived from the `run_started_at` column.\n\n- `id`: the unique identifier of the **workflow run**\n- `run_started_at`: the time the workflow run started\n\nWe compute the date the run started so we can later compare it to the dates when we added poetry and switched to the team plan.\n\n::: {#69dfa49b .cell execution_count=9}\n``` {.python .cell-code}\nworkflows = workflows.select(\n _.id, _.run_started_at, started_date=_.run_started_at.date()\n)\nworkflows\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓\n┃ id        ┃ run_started_at            ┃ started_date ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩\n│ int64     │ timestamp('UTC')          │ date         │\n├───────────┼───────────────────────────┼──────────────┤\n│ 195478382 │ 2020-08-04 23:54:29+00:00 │ 2020-08-04   │\n│ 195476517 │ 2020-08-04 23:51:44+00:00 │ 2020-08-04   │\n│ 195475525 │ 2020-08-04 23:50:11+00:00 │ 2020-08-04   │\n│ 195468677 │ 2020-08-04 23:39:51+00:00 │ 2020-08-04   │\n│ 195465343 │ 2020-08-04 23:34:11+00:00 │ 2020-08-04   │\n│ 195460611 │ 2020-08-04 23:29:07+00:00 │ 2020-08-04   │\n│ 195452505 │ 2020-08-04 23:17:29+00:00 │ 2020-08-04   │\n│ 195447886 │ 2020-08-04 23:11:35+00:00 │ 2020-08-04   │\n│ 195435521 │ 2020-08-04 23:02:34+00:00 │ 2020-08-04   │\n│ 195433385 │ 2020-08-04 23:01:00+00:00 │ 2020-08-04   │\n│ …         │ …                         │ …            │\n└───────────┴───────────────────────────┴──────────────┘\n
\n```\n:::\n:::\n\n\nWe need to associate jobs and workflows somehow, so let's join them on the relevant key fields.\n\n::: {#8d4786e3 .cell execution_count=10}\n``` {.python .cell-code}\njoined = jobs.join(workflows, jobs.run_id == workflows.id)\njoined\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓\n┃ run_id     ┃ job_duration ┃ last_job_started_at       ┃ last_job_completed_at     ┃ id         ┃ run_started_at            ┃ started_date ┃\n┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩\n│ int64      │ int64        │ timestamp('UTC')          │ timestamp('UTC')          │ int64      │ timestamp('UTC')          │ date         │\n├────────────┼──────────────┼───────────────────────────┼───────────────────────────┼────────────┼───────────────────────────┼──────────────┤\n│  287271818 │            0 │ 2020-10-04 01:42:03+00:00 │ 2020-10-04 01:42:03+00:00 │  287271818 │ 2020-10-04 01:41:55+00:00 │ 2020-10-04   │\n│ 1405828076 │     27000000 │ 2021-10-31 23:36:44+00:00 │ 2021-10-31 23:37:11+00:00 │ 1405828076 │ 2021-10-31 23:36:37+00:00 │ 2021-10-31   │\n│ 1405808044 │     34000000 │ 2021-10-31 23:26:54+00:00 │ 2021-10-31 23:27:28+00:00 │ 1405808044 │ 2021-10-31 23:24:52+00:00 │ 2021-10-31   │\n│ 1405797869 │      3000000 │ 2021-10-31 23:26:44+00:00 │ 2021-10-31 23:26:47+00:00 │ 1405797869 │ 2021-10-31 23:19:08+00:00 │ 2021-10-31   │\n│ 1405797870 │            0 │ 2021-10-31 23:26:41+00:00 │ 2021-10-31 23:26:41+00:00 │ 1405797870 │ 2021-10-31 23:19:08+00:00 │ 2021-10-31   │\n│ 1405796070 │    280000000 │ 2021-10-31 23:37:03+00:00 │ 2021-10-31 23:37:03+00:00 │ 1405796070 │ 2021-10-31 23:18:07+00:00 │ 2021-10-31   │\n│ 1405796070 │    128000000 │ 2021-10-31 23:37:03+00:00 │ 2021-10-31 23:37:03+00:00 │ 1405796070 │ 2021-10-31 23:18:07+00:00 │ 2021-10-31   │\n│ 1405796070 │    477000000 │ 2021-10-31 23:37:03+00:00 │ 2021-10-31 23:37:03+00:00 │ 1405796070 │ 2021-10-31 23:18:07+00:00 │ 2021-10-31   │\n│ 1405796070 │    607000000 │ 2021-10-31 23:37:03+00:00 │ 2021-10-31 23:37:03+00:00 │ 1405796070 │ 2021-10-31 23:18:07+00:00 │ 2021-10-31   │\n│ 1405796070 │    154000000 │ 2021-10-31 23:37:03+00:00 │ 2021-10-31 23:37:03+00:00 │ 1405796070 │ 2021-10-31 23:18:07+00:00 │ 2021-10-31   │\n│ …          │ …            │ …                         │ …                         │ …          │ …                         │ …            │\n└────────────┴──────────────┴───────────────────────────┴───────────────────────────┴────────────┴───────────────────────────┴──────────────┘\n
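Conceptually, the inner join above matches each job run to its workflow run through the `run_id == id` key. Here's a stdlib-only Python sketch of the same semantics; the rows are made up for illustration, not values from the real tables.

```python
# The inner join matches each job run to its workflow run via run_id -> id.
workflow_rows = [
    {"id": 195478382, "started_date": "2020-08-04"},
    {"id": 1405828076, "started_date": "2021-10-31"},
]
job_rows = [
    {"run_id": 195478382, "job_duration": 0},
    {"run_id": 1405828076, "job_duration": 27_000_000},
]

# Index the workflow rows by key, then look each job row's key up;
# unmatched job rows are dropped, as in an inner join.
workflows_by_id = {row["id"]: row for row in workflow_rows}
joined_rows = [
    {**job, **workflows_by_id[job["run_id"]]}
    for job in job_rows
    if job["run_id"] in workflows_by_id
]
```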
\n```\n:::\n:::\n\n\nSweet! Now that we have workflow runs and job runs together in the same table, let's start summarizing.\n\nLet's encode our knowledge about when the poetry move happened and when we moved to the team plan.\n\n::: {#1bc567e0 .cell execution_count=11}\n``` {.python .cell-code}\nfrom datetime import date\n\nPOETRY_MERGED_DATE = date(2021, 10, 15)\nTEAMIZATION_DATE = date(2022, 11, 28)\n```\n:::\n\n\nLet's compute indicator variables that mark whether a given row contains data from after the poetry change, and do the same for the team plan.\n\nLet's also compute queueing time and workflow duration.\n\n::: {#c12f7377 .cell execution_count=12}\n``` {.python .cell-code}\nstats = joined.select(\n _.started_date,\n _.job_duration,\n has_poetry=_.started_date > POETRY_MERGED_DATE,\n has_team=_.started_date > TEAMIZATION_DATE,\n queueing_time=_.last_job_started_at.cast(\"int\")\n - _.run_started_at.cast(\"int\"),\n workflow_duration=_.last_job_completed_at.cast(\"int\")\n - _.run_started_at.cast(\"int\"),\n)\nstats\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓\n┃ started_date ┃ job_duration ┃ has_poetry ┃ has_team ┃ queueing_time ┃ workflow_duration ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩\n│ date         │ int64        │ boolean    │ boolean  │ int64         │ int64             │\n├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┤\n│ 2022-08-15   │     84000000 │ True       │ False    │      11000000 │          95000000 │\n│ 2022-08-15   │     24000000 │ True       │ False    │      11000000 │          35000000 │\n│ 2022-08-15   │            0 │ True       │ False    │       1000000 │           1000000 │\n│ 2022-08-15   │      1000000 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │            0 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │      2000000 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │      2000000 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │      2000000 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │      2000000 │ True       │ False    │      18000000 │          20000000 │\n│ 2022-08-15   │    411000000 │ True       │ False    │     712000000 │         712000000 │\n│ …            │ …            │ …          │ …        │ …             │ …                 │\n└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┘\n
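Casting a timestamp to an integer yields epoch microseconds, so the `queueing_time` and `workflow_duration` expressions above are plain integer subtraction. A stdlib-only sketch of the same arithmetic, using illustrative timestamps rather than values from the real data:

```python
from datetime import datetime, timezone


def epoch_usecs(ts: datetime) -> int:
    # Microseconds since the Unix epoch, matching what casting a
    # timestamp to an integer yields in the query above.
    return int(ts.timestamp() * 1_000_000)


# Illustrative timestamps, not taken from the tables above.
run_started_at = datetime(2021, 10, 31, 23, 18, 7, tzinfo=timezone.utc)
last_job_started_at = datetime(2021, 10, 31, 23, 18, 25, tzinfo=timezone.utc)
last_job_completed_at = datetime(2021, 10, 31, 23, 37, 3, tzinfo=timezone.utc)

# 18 seconds of queueing; 18 minutes 56 seconds of wall-clock workflow time.
queueing_time = epoch_usecs(last_job_started_at) - epoch_usecs(run_started_at)
workflow_duration = epoch_usecs(last_job_completed_at) - epoch_usecs(run_started_at)
```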
\n```\n:::\n:::\n\n\nLet's create a column ranging from 0 to 2 inclusive where:\n\n- 0: no improvements\n- 1: just poetry\n- 2: poetry and the team plan\n\nLet's also give them some names that'll look nice on our plots.\n\n::: {#d40f4002 .cell execution_count=13}\n``` {.python .cell-code}\nstats = stats.mutate(\n raw_improvements=_.has_poetry.cast(\"int\") + _.has_team.cast(\"int\")\n).mutate(\n improvements=(\n _.raw_improvements.case()\n .when(0, \"None\")\n .when(1, \"Poetry\")\n .when(2, \"Poetry + Team Plan\")\n .else_(\"NA\")\n .end()\n ),\n team_plan=ibis.where(_.raw_improvements > 1, \"Poetry + Team Plan\", \"None\"),\n)\nstats\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ started_date ┃ job_duration ┃ has_poetry ┃ has_team ┃ queueing_time ┃ workflow_duration ┃ raw_improvements ┃ improvements ┃ team_plan ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ date         │ int64        │ boolean    │ boolean  │ int64         │ int64             │ int64            │ string       │ string    │\n├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┼──────────────────┼──────────────┼───────────┤\n│ 2021-03-29   │    377000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    414000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    519000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    642000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    861000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    873000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    455000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    637000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    798000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ 2021-03-29   │    822000000 │ False      │ False    │      13000000 │         978000000 │                0 │ None         │ None      │\n│ …            │ …            │ …          │ …        │ …             │ …                 │ …                │ …            │ …         │\n└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┴──────────────────┴──────────────┴───────────┘\n
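The case expression above boils down to summing two booleans and mapping the sum to a label. In plain Python the same logic looks like this (a sketch, with a hypothetical helper name):

```python
def improvements_label(has_poetry: bool, has_team: bool) -> str:
    # int(bool) gives 0 or 1, mirroring the cast("int") calls above,
    # so the sum ranges from 0 to 2 inclusive.
    raw_improvements = int(has_poetry) + int(has_team)
    return {0: "None", 1: "Poetry", 2: "Poetry + Team Plan"}.get(
        raw_improvements, "NA"
    )
```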
\n```\n:::\n:::\n\n\nFinally, we can summarize by averaging the different durations, grouping on the variables of interest.\n\n::: {#dc30c9b1 .cell execution_count=14}\n``` {.python .cell-code}\nUSECS_PER_MIN = 60_000_000\n\nagged = stats.group_by([_.started_date, _.improvements, _.team_plan]).agg(\n job=_.job_duration.div(USECS_PER_MIN).mean(),\n workflow=_.workflow_duration.div(USECS_PER_MIN).mean(),\n queueing_time=_.queueing_time.div(USECS_PER_MIN).mean(),\n)\nagged\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n┃ started_date ┃ improvements       ┃ team_plan          ┃ job       ┃ workflow  ┃ queueing_time ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n│ date         │ string             │ string             │ float64   │ float64   │ float64       │\n├──────────────┼────────────────────┼────────────────────┼───────────┼───────────┼───────────────┤\n│ 2020-09-02   │ None               │ None               │ 14.154762 │ 57.420833 │     57.148929 │\n│ 2022-09-12   │ Poetry             │ None               │  3.125569 │ 13.845560 │     12.839561 │\n│ 2022-09-25   │ Poetry             │ None               │  3.357446 │ 15.095343 │     14.073260 │\n│ 2021-09-24   │ None               │ None               │  7.910915 │ 20.358392 │     20.336150 │\n│ 2020-11-25   │ None               │ None               │ 13.750612 │ 59.672401 │     59.493807 │\n│ 2020-08-07   │ None               │ None               │ 48.229167 │ 48.391667 │      0.162500 │\n│ 2021-04-09   │ None               │ None               │  0.000000 │  0.166667 │      0.166667 │\n│ 2022-02-22   │ Poetry             │ None               │  2.341966 │ 11.377564 │     10.467503 │\n│ 2022-10-13   │ Poetry             │ None               │  6.676853 │ 35.344046 │     28.530969 │\n│ 2022-12-19   │ Poetry + Team Plan │ Poetry + Team Plan │  4.113391 │ 10.273196 │      7.780035 │\n│ …            │ …                  │ …                  │ …         │ …         │ …             │\n└──────────────┴────────────────────┴────────────────────┴───────────┴───────────┴───────────────┘\n
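The aggregation above is a keyed mean: convert microseconds to minutes, then average within each group. A stdlib-only sketch of the job-duration piece, using hypothetical rows rather than the real data:

```python
from collections import defaultdict

USECS_PER_MIN = 60_000_000

# Hypothetical (started_date, job_duration in microseconds) rows.
rows = [
    ("2022-08-15", 60_000_000),
    ("2022-08-15", 120_000_000),
    ("2022-08-16", 30_000_000),
]

# Convert each duration to minutes, then average within each date group,
# mirroring _.job_duration.div(USECS_PER_MIN).mean() above.
durations_by_date = defaultdict(list)
for started_date, usecs in rows:
    durations_by_date[started_date].append(usecs / USECS_PER_MIN)

job_means = {d: sum(mins) / len(mins) for d, mins in durations_by_date.items()}
```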
\n```\n:::\n:::\n\n\nIf at any point you want to inspect the SQL you'll be running, ibis has you covered with `ibis.to_sql`.\n\n::: {#f3e910a7 .cell execution_count=15}\n``` {.python .cell-code}\nibis.to_sql(agged)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```sql\nWITH t0 AS (\n SELECT\n t6.*\n FROM `ibis-gbq`.workflows.jobs AS t6\n WHERE\n t6.`started_at` < '2023-01-09'\n), t1 AS (\n SELECT\n t6.`id`,\n t6.`run_started_at`,\n DATE(t6.`run_started_at`) AS `started_date`\n FROM `ibis-gbq`.workflows.workflows AS t6\n), t2 AS (\n SELECT\n t0.`run_id`,\n UNIX_MICROS(t0.`completed_at`) - UNIX_MICROS(t0.`started_at`) AS `job_duration`,\n MAX(t0.`started_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_started_at`,\n MAX(t0.`completed_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_completed_at`\n FROM t0\n), t3 AS (\n SELECT\n `started_date`,\n `job_duration`,\n `started_date` > CAST('2021-10-15' AS DATE) AS `has_poetry`,\n `started_date` > CAST('2022-11-28' AS DATE) AS `has_team`,\n UNIX_MICROS(`last_job_started_at`) - UNIX_MICROS(`run_started_at`) AS `queueing_time`,\n UNIX_MICROS(`last_job_completed_at`) - UNIX_MICROS(`run_started_at`) AS `workflow_duration`\n FROM t2\n INNER JOIN t1\n ON t2.`run_id` = t1.`id`\n), t4 AS (\n SELECT\n t3.*,\n CAST(t3.`has_poetry` AS INT64) + CAST(t3.`has_team` AS INT64) AS `raw_improvements`\n FROM t3\n)\nSELECT\n t5.`started_date`,\n t5.`improvements`,\n t5.`team_plan`,\n avg(IEEE_DIVIDE(t5.`job_duration`, 60000000)) AS `job`,\n avg(IEEE_DIVIDE(t5.`workflow_duration`, 60000000)) AS `workflow`,\n avg(IEEE_DIVIDE(t5.`queueing_time`, 60000000)) AS `queueing_time`\nFROM (\n SELECT\n t4.*,\n CASE t4.`raw_improvements`\n WHEN 0\n THEN 'None'\n WHEN 1\n THEN 'Poetry'\n WHEN 2\n THEN 'Poetry + Team Plan'\n ELSE 'NA'\n END AS `improvements`,\n CASE WHEN t4.`raw_improvements` > 1 THEN 'Poetry + Team Plan' ELSE 'None' END AS `team_plan`\n FROM t4\n) AS t5\nGROUP BY\n 1,\n 2,\n 3\n```\n:::\n:::\n\n\n# Plot the 
Results\n\nIbis doesn't have built-in plotting support, so we need to pull our results into pandas.\n\nHere I'm using `plotnine` (a Python port of `ggplot2`), which has great integration with pandas DataFrames.\n\n::: {#baa6d5bb .cell execution_count=16}\n``` {.python .cell-code}\nraw_df = agged.execute()\nraw_df\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
     started_date        improvements           team_plan        job   workflow  queueing_time
0      2022-05-21              Poetry                None   3.315453  12.056705      10.856003
1      2021-06-23                None                None   8.804329  18.838528       0.799567
2      2022-05-12              Poetry                None   4.912492  17.443804      13.617164
3      2022-09-11              Poetry                None   3.318782  12.665244      11.561670
4      2021-04-08                None                None   8.366981  13.957233       0.276730
..            ...                 ...                 ...        ...        ...            ...
779    2022-10-09              Poetry                None   3.472283  12.489749       9.092648
780    2021-03-24                None                None   9.499082  16.419903       1.801063
781    2022-03-06              Poetry                None   2.727943  11.757324      10.942026
782    2021-11-22              Poetry                None   2.608860  10.306637       7.481462
783    2022-06-08              Poetry                None   3.214470  12.617276      11.713546

784 rows × 6 columns
\n```\n:::\n:::\n\n\nGenerally, `plotnine` works with long, tidy data, so let's use `pandas.melt` to get there.\n\n::: {#045cdbb1 .cell execution_count=17}\n``` {.python .cell-code}\nimport pandas as pd\n\ndf = pd.melt(\n raw_df,\n id_vars=[\"started_date\", \"improvements\", \"team_plan\"],\n var_name=\"entity\",\n value_name=\"duration\",\n)\ndf.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
  started_date improvements team_plan entity  duration
0   2022-05-21       Poetry      None    job  3.315453
1   2021-06-23         None      None    job  8.804329
2   2022-05-12       Poetry      None    job  4.912492
3   2022-09-11       Poetry      None    job  3.318782
4   2021-04-08         None      None    job  8.366981
\n```\n:::\n:::\n\n\nLet's make our theme lighthearted by using `xkcd`-style plots.\n\n::: {#029124e7 .cell execution_count=18}\n``` {.python .cell-code}\nfrom plotnine import *\n\ntheme_set(theme_xkcd())\n```\n:::\n\n\nCreate a few labels for our plot.\n\n::: {#cb9f40c8 .cell execution_count=19}\n``` {.python .cell-code}\npoetry_label = f\"Poetry\\n{POETRY_MERGED_DATE}\"\nteam_label = f\"Team Plan\\n{TEAMIZATION_DATE}\"\n```\n:::\n\n\nWithout the following line you may see a large number of inconsequential warnings that make the notebook unusable.\n\n::: {#539939bc .cell execution_count=20}\n``` {.python .cell-code}\nimport logging\n\n# without this, findfont logging spams the notebook making it unusable\nlogging.getLogger('matplotlib.font_manager').disabled = True\n```\n:::\n\n\nHere we show job durations, coloring the points differently depending on whether they have no improvements, poetry, or poetry + team plan.\n\n::: {#6db462db .cell execution_count=21}\n``` {.python .cell-code}\n(\n ggplot(\n df.loc[df.entity == \"job\"].reset_index(drop=True),\n aes(x=\"started_date\", y=\"duration\", color=\"factor(improvements)\"),\n )\n + geom_point()\n + geom_vline(\n xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],\n colour=[\"blue\", \"green\"],\n linetype=\"dashed\",\n )\n + scale_color_brewer(\n palette=7,\n type='qual',\n limits=[\"None\", \"Poetry\", \"Poetry + Team Plan\"],\n )\n + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=15, color=\"blue\")\n + geom_text(x=TEAMIZATION_DATE, label=team_label, y=10, color=\"blue\")\n + stat_smooth(method=\"lm\")\n + labs(x=\"Date\", y=\"Duration (minutes)\")\n + ggtitle(\"Job Duration\")\n + theme(\n figure_size=(22, 6),\n legend_position=(0.67, 0.65),\n legend_direction=\"vertical\",\n )\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-22-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n
\n```\n:::\n:::\n\n\n## Result #1: Job Duration\n\nThis result is pretty interesting.\n\nA few things pop out to me right away:\n\n- The move to poetry decreased the average job run duration by quite a bit. No, I'm not going to do any statistical tests.\n- The variability of job run durations also decreased by quite a bit after introducing poetry.\n- Moving to the team plan had little to no effect on job run duration.\n\n::: {#c1e62289 .cell execution_count=22}\n``` {.python .cell-code}\n(\n ggplot(\n df.loc[df.entity != \"job\"].reset_index(drop=True),\n aes(x=\"started_date\", y=\"duration\", color=\"factor(improvements)\"),\n )\n + facet_wrap(\"entity\", ncol=1)\n + geom_point()\n + geom_vline(\n xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],\n linetype=\"dashed\",\n )\n + scale_color_brewer(\n palette=7,\n type='qual',\n limits=[\"None\", \"Poetry\", \"Poetry + Team Plan\"],\n )\n + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=75, color=\"blue\")\n + geom_text(x=TEAMIZATION_DATE, label=team_label, y=50, color=\"blue\")\n + stat_smooth(method=\"lm\")\n + labs(x=\"Date\", y=\"Duration (minutes)\")\n + ggtitle(\"Workflow Duration\")\n + theme(\n figure_size=(22, 13),\n legend_position=(0.68, 0.75),\n legend_direction=\"vertical\",\n )\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-23-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n
\n```\n:::\n:::\n\n\n## Result #2: Workflow Duration and Queueing Time\n\nAnother interesting result.\n\n### Queueing Time\n\n- It almost looks like moving to poetry made average queueing time worse. This is probably surprising because we tend to assume that faster jobs mean faster CI; as we see here, that isn't the case.\n- Moving to the team plan cut down the queueing time by quite a bit.\n\n### Workflow Duration\n\n- Overall workflow duration appears to be strongly influenced by moving to the team plan, which is almost certainly due to the drop in queueing time, since job durations themselves changed very little.\n- Perhaps it's obvious, but queueing time and workflow duration appear to be highly correlated.\n\nIn the next plot we'll look at that correlation.\n\n::: {#878f8b27 .cell execution_count=23}\n``` {.python .cell-code}\n(\n ggplot(raw_df, aes(x=\"workflow\", y=\"queueing_time\"))\n + geom_point()\n + geom_rug()\n + facet_grid(\". ~ team_plan\")\n + labs(x=\"Workflow Duration (minutes)\", y=\"Queueing Time (minutes)\")\n + ggtitle(\"Workflow Duration vs. Queueing Time\")\n + theme(figure_size=(22, 6))\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-24-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```\n
\n```\n:::\n:::\n\n\n## Result #3: Workflow Duration and Queueing Duration are correlated\n\nIt also seems that moving to the team plan (though also the move to poetry might be related here) reduced the variability of both metrics.\n\nWe're lacking data compared to the past so we should wait for more to come in.\n\n## Conclusions\n\nIt appears that you need both a short queue time **and** fast individual jobs to minimize time spent in CI.\n\nIf you have a short queue time, but long job runs then you'll be bottlenecked on individual jobs, and if you have more jobs than queue slots then you'll be blocked on queueing time.\n\nI think we can sum this up nicely:\n\n- slow jobs, slow queue: 🤷 blocked by jobs or queue\n- slow jobs, fast queue: ❓ blocked by jobs, if jobs are slow enough\n- fast jobs, slow queue: ❗ blocked by queue, with enough jobs\n- fast jobs, fast queue: ✅\n\n", + "markdown": "---\ntitle: \"Analysis of Ibis's CI performance\"\nauthor: \"Phillip Cloud\"\ndate: \"2023-01-09\"\ncategories:\n - blog\n - bigquery\n - continuous integration\n - data engineering\n - dogfood\n---\n\n## Summary\n\nThis notebook takes you through an analysis of Ibis's CI data using ibis on top of [Google BigQuery](https://cloud.google.com/bigquery).\n\n- First, we load some data and poke around at it to see what's what.\n- Second, we figure out some useful things to calculate based on our poking.\n- Third, we'll visualize the results of calculations to showcase what changed and how.\n\n## Imports\n\nLet's start out by importing ibis and turning on interactive mode.\n\n::: {#02f86f58 .cell execution_count=1}\n``` {.python .cell-code}\nimport ibis\nfrom ibis import _\n\nibis.options.interactive = True\n```\n:::\n\n\n## Connect to BigQuery\n\nWe connect to BigQuery using the `ibis.connect` API, which accepts a URL string indicating the backend and various bit of information needed to connect to the backend. 
Here we're using BigQuery, so we need the project id (`ibis-gbq`) and the dataset id (`workflows`).\n\nDatasets are analogous to schemas in other systems.\n\n::: {#1cd51565 .cell execution_count=2}\n``` {.python .cell-code}\nurl = \"bigquery://ibis-gbq/workflows\"\ncon = ibis.connect(url)\n```\n:::\n\n\nLet's see what tables are available.\n\n::: {#c8e5249f .cell execution_count=3}\n``` {.python .cell-code}\ncon.list_tables()\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```\n['analysis', 'jobs', 'workflows']\n```\n:::\n:::\n\n\n## Analysis\n\nHere we've got our first bit of interesting information: the `jobs` and `workflows` tables.\n\n### Terminology\n\nBefore we jump in, it helps to lay down some terminology.\n\n- A **workflow** corresponds to an individual GitHub Actions YAML file in a GitHub repository under the `.github/workflows` directory.\n- A **job** is a named set of steps to run inside a **workflow** file.\n\n### What's in the `workflows` table?\n\nEach row in the `workflows` table corresponds to a **workflow run**.\n\n- A **workflow run** is an instance of a workflow that was triggered by some entity: a GitHub user, bot, or other entity. Each row of the `workflows` table is a **workflow run**.\n\n### What's in the `jobs` table?\n\nSimilarly, each row in the `jobs` table is a **job run**. That is, for a given **workflow run** there are a set of jobs run with it.\n\n- A **job run** is an instance of a job *in a workflow*. It is associated with a single **workflow run**.\n\n## Rationale\n\nThe goal of this analysis is to try to understand ibis's CI performance, and whether the amount of time we spent waiting on CI has decreased, stayed the same or increased. 
Ideally, we can understand the pieces that contribute to the change or lack thereof.\n\n### Metrics\n\nTo that end, there are a few interesting metrics to look at:\n\n- **job run** *duration*: the amount of time it takes for a given job to complete.\n- **workflow run** *duration*: the amount of time it takes for *all* job runs in a workflow run to complete.\n- **queueing** *duration*: the amount of time spent waiting for the *first* job run to commence.\n\n### Mitigating Factors\n\n- Around October 2021, we changed our CI infrastructure to use [Poetry](https://python-poetry.org/) instead of [Conda](https://docs.conda.io/en/latest/). The goal there was to see if we could cache dependencies using the lock file generated by poetry. We should see whether that had any effect.\n- At the end of November 2022, we switched to the Team Plan (a paid GitHub plan) for the Ibis organization. This tripled the number of **job runs** that could execute in parallel. We should see if that helped anything.\n\nAlright, let's jump into some data!\n\n::: {#2a119bb7 .cell execution_count=4}\n``` {.python .cell-code}\njobs = con.tables.jobs[_.started_at < \"2023-01-09\"]\njobs\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ url                                                                    steps                                                                             status     started_at                 runner_group_name  run_attempt  name                       labels         node_id                       id         runner_id  run_url                                                                run_id     check_run_url                                                        html_url                                                                    runner_name  runner_group_id  head_sha                                  conclusion  completed_at              
┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ stringarray<struct<status: string, conclusion: string, started_at: timestamp('UTC'), …stringtimestamp('UTC')stringint64stringarray<string>stringint64int64stringint64stringstringstringint64stringstringtimestamp('UTC')          │\n├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼───────────────────────────┼───────────────────┼─────────────┼───────────────────────────┼───────────────┼──────────────────────────────┼───────────┼───────────┼───────────────────────────────────────────────────────────────────────┼───────────┼─────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼─────────────┼─────────────────┼──────────────────────────────────────────┼────────────┼───────────────────────────┤\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947152589[{...}, {...}, ... 
+12]completed2020-08-04 23:54:37+00:00NULL1Test Conda               []MDg6Q2hlY2tSdW45NDcxNTI1ODk=947152589NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195478382195478382https://api.github.com/repos/ibis-project/ibis/check-runs/947152589https://github.com/ibis-project/ibis/runs/947152589?check_suite_focus=trueNULLNULL29c148e9679a53c7bb99755347e336d6c1f4d8c8success   2020-08-04 23:56:50+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947152601[{...}, {...}, ... +5]completed2020-08-04 23:54:37+00:00NULL1Test setup miniconda task[]MDg6Q2hlY2tSdW45NDcxNTI2MDE=947152601NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195478382195478382https://api.github.com/repos/ibis-project/ibis/check-runs/947152601https://github.com/ibis-project/ibis/runs/947152601?check_suite_focus=trueNULLNULL29c148e9679a53c7bb99755347e336d6c1f4d8c8failure   2020-08-04 23:56:59+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947147571[{...}, {...}, ... +5]completed2020-08-04 23:51:54+00:00NULL1Test setup miniconda task[]MDg6Q2hlY2tSdW45NDcxNDc1NzE=947147571NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195476517195476517https://api.github.com/repos/ibis-project/ibis/check-runs/947147571https://github.com/ibis-project/ibis/runs/947147571?check_suite_focus=trueNULLNULL501d9cc1f2f8d016b1e4fe44ebfc3e1facaa3c91failure   2020-08-04 23:54:48+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947147586[{...}, {...}, ... 
+12]completed2020-08-04 23:51:53+00:00NULL1Test Conda               []MDg6Q2hlY2tSdW45NDcxNDc1ODY=947147586NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195476517195476517https://api.github.com/repos/ibis-project/ibis/check-runs/947147586https://github.com/ibis-project/ibis/runs/947147586?check_suite_focus=trueNULLNULL501d9cc1f2f8d016b1e4fe44ebfc3e1facaa3c91failure   2020-08-04 23:54:15+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947144553[{...}, {...}, ... +11]completed2020-08-04 23:50:19+00:00NULL1Test Conda               []MDg6Q2hlY2tSdW45NDcxNDQ1NTM=947144553NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195475525195475525https://api.github.com/repos/ibis-project/ibis/check-runs/947144553https://github.com/ibis-project/ibis/runs/947144553?check_suite_focus=trueNULLNULL963821c370fb8f10f915e4b29e1c78f053c6e7b0failure   2020-08-04 23:52:45+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947144585[{...}, {...}, ... +5]completed2020-08-04 23:50:20+00:00NULL1Test setup miniconda task[]MDg6Q2hlY2tSdW45NDcxNDQ1ODU=947144585NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195475525195475525https://api.github.com/repos/ibis-project/ibis/check-runs/947144585https://github.com/ibis-project/ibis/runs/947144585?check_suite_focus=trueNULLNULL963821c370fb8f10f915e4b29e1c78f053c6e7b0failure   2020-08-04 23:52:53+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947123154[{...}, {...}, ... 
+5]completed2020-08-04 23:39:58+00:00NULL1Test setup miniconda task[]MDg6Q2hlY2tSdW45NDcxMjMxNTQ=947123154NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195468677195468677https://api.github.com/repos/ibis-project/ibis/check-runs/947123154https://github.com/ibis-project/ibis/runs/947123154?check_suite_focus=trueNULLNULLfcab3265e8afd70dddc518601b33661d15e19f62failure   2020-08-04 23:42:36+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947123167[{...}, {...}, ... +9]completed2020-08-04 23:39:57+00:00NULL1Test Conda               []MDg6Q2hlY2tSdW45NDcxMjMxNjc=947123167NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195468677195468677https://api.github.com/repos/ibis-project/ibis/check-runs/947123167https://github.com/ibis-project/ibis/runs/947123167?check_suite_focus=trueNULLNULLfcab3265e8afd70dddc518601b33661d15e19f62failure   2020-08-04 23:42:15+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947111435[{...}, {...}, ... +9]completed2020-08-04 23:34:19+00:00NULL1Test Conda               []MDg6Q2hlY2tSdW45NDcxMTE0MzU=947111435NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195465343195465343https://api.github.com/repos/ibis-project/ibis/check-runs/947111435https://github.com/ibis-project/ibis/runs/947111435?check_suite_focus=trueNULLNULL0b496d97d80f22c5a1a7db27cffd3f71a0d28941failure   2020-08-04 23:36:26+00:00 │\n│ https://api.github.com/repos/ibis-project/ibis/actions/jobs/947111464[{...}, {...}, ... 
+5]completed2020-08-04 23:34:19+00:00NULL1Test setup miniconda task[]MDg6Q2hlY2tSdW45NDcxMTE0NjQ=947111464NULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195465343195465343https://api.github.com/repos/ibis-project/ibis/check-runs/947111464https://github.com/ibis-project/ibis/runs/947111464?check_suite_focus=trueNULLNULL0b496d97d80f22c5a1a7db27cffd3f71a0d28941failure   2020-08-04 23:36:28+00:00 │\n│                          │\n└───────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴───────────────────────────┴───────────────────┴─────────────┴───────────────────────────┴───────────────┴──────────────────────────────┴───────────┴───────────┴───────────────────────────────────────────────────────────────────────┴───────────┴─────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┴─────────────┴─────────────────┴──────────────────────────────────────────┴────────────┴───────────────────────────┘\n
\n```\n:::\n:::\n\n\nThese first few columns in the `jobs` table aren't that interesting, so let's look at what else is there.\n\n::: {#2d796f1f .cell execution_count=5}\n``` {.python .cell-code}\njobs.columns\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```\n['url',\n 'steps',\n 'status',\n 'started_at',\n 'runner_group_name',\n 'run_attempt',\n 'name',\n 'labels',\n 'node_id',\n 'id',\n 'runner_id',\n 'run_url',\n 'run_id',\n 'check_run_url',\n 'html_url',\n 'runner_name',\n 'runner_group_id',\n 'head_sha',\n 'conclusion',\n 'completed_at']\n```\n:::\n:::\n\n\nMany of these aren't useful for our purposes; however, `run_id`, `started_at`, and `completed_at` are. The [GitHub documentation for job information](https://docs.github.com/en/rest/actions/workflow-jobs?apiVersion=2022-11-28#get-a-job-for-a-workflow-run) provides useful detail about the meaning of these fields.\n\n- `run_id`: the workflow run associated with this job run\n- `started_at`: when the job started\n- `completed_at`: when the job completed\n\nFirst and foremost, we're interested in job duration, so let's compute that.\n\nWe also need to compute when the last job for a given `run_id` started and when it completed. We'll use the former to compute the queueing duration, and the latter to compute the total time it took for a given workflow run to complete.\n\n::: {#f989d187 .cell execution_count=6}\n``` {.python .cell-code}\nrun_id_win = ibis.window(group_by=_.run_id)\njobs = jobs.select(\n    _.run_id,\n    job_duration=_.completed_at.delta(_.started_at, \"microsecond\"),\n    last_job_started_at=_.started_at.max().over(run_id_win),\n    last_job_completed_at=_.completed_at.max().over(run_id_win),\n)\njobs\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ run_id     job_duration  last_job_started_at        last_job_completed_at     ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ int64int64timestamp('UTC')timestamp('UTC')          │\n├───────────┼──────────────┼───────────────────────────┼───────────────────────────┤\n│ 23650377011090000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 2365037706360000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 2365037705940000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 2365037704590000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 2365037704300000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 23650377032680000002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 23650377002020-09-02 19:06:50+00:002020-09-02 19:06:50+00:00 │\n│ 2438355375650000002020-09-08 02:45:31+00:002020-09-08 02:45:31+00:00 │\n│ 2438355375810000002020-09-08 02:45:31+00:002020-09-08 02:45:31+00:00 │\n│ 2438355376440000002020-09-08 02:45:31+00:002020-09-08 02:45:31+00:00 │\n│                                  │\n└───────────┴──────────────┴───────────────────────────┴───────────────────────────┘\n
\n```\n:::\n:::\n\n\nLet's take a look at the `workflows` table.\n\n::: {#775b8356 .cell execution_count=7}\n``` {.python .cell-code}\nworkflows = con.tables.workflows\nworkflows\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n┃ workflow_url                                                              workflow_id  triggering_actor  run_number  run_attempt  updated_at                 cancel_url                                                                    rerun_url                                                                    check_suite_node_id               pull_requests                                    
                                 id         node_id                           status     repository                                                                                                                                                     jobs_url                                                                    previous_attempt_url  artifacts_url                                                                    html_url                                                     head_sha                                  head_repository                                                                                                                                                    run_started_at             head_branch    url                                                                    event         name    actor  created_at                 check_suite_url                                                         check_suite_id  conclusion  head_commit                                                                                                                            logs_url                                                                   
┃\n┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n│ stringint64struct<subscrip…int64int64timestamp('UTC')stringstringstringarray<struct<number: int64, url: string, id: int64, head: struct<sha: string, r…int64stringstringstruct<trees_url: string, teams_url: string, statuses_url: string, subscribers_…stringstringstringstringstringstruct<trees_url: string, teams_url: string, statuses_url: string, 
subscribers_…timestamp('UTC')stringstringstringstringstru…timestamp('UTC')stringint64stringstruct<tree_id: string, timestamp: timestamp('UTC'), message: string, id: strin…string                                                                     │\n├──────────────────────────────────────────────────────────────────────────┼─────────────┼──────────────────┼────────────┼─────────────┼───────────────────────────┼──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┼───────────┼──────────────────────────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┼──────────────────────┼─────────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────┼──────────────┼────────┼───────┼───────────────────────────┼────────────────────────────────────────────────────────────────────────┼────────────────┼────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────┤\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2812020-08-05 
20:01:17+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/rerunMDEwOkNoZWNrU3VpdGUxMDEzMDI5NDkw[]196667191MDExOldvcmtmbG93UnVuMTk2NjY3MTkxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/artifactshttps://github.com/ibis-project/ibis/actions/runs/19666719108855609f1e9ebdeb6197887cf64ecda015d99a8{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 19:01:08+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191pull_requestMain  NULL2020-08-05 19:01:08+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10130294901013029490success   {'tree_id': 'c4277198178ae73c3d9611af464ee75eadbceedc', 'timestamp': datetime.datetime(2020, 8, 5, 19, 0, 57, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196667191/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2712020-08-05 18:59:37+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/rerunMDEwOkNoZWNrU3VpdGUxMDEyODM0OTQ3[]196617109MDExOldvcmtmbG93UnVuMTk2NjE3MTA5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/artifactshttps://github.com/ibis-project/ibis/actions/runs/1966171097472797f3e4da39d18e53c09566dba5e373094b0{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 18:12:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109pull_requestMain  NULL2020-08-05 18:12:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10128349471012834947success   {'tree_id': '451472455efc6f20b81f6e1762ac712ec75e77b3', 'timestamp': datetime.datetime(2020, 8, 5, 18, 12, 25, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196617109/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2612020-08-05 18:05:32+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzc0OTQ4[]196598751MDExOldvcmtmbG93UnVuMTk2NTk4NzUxcompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/artifactshttps://github.com/ibis-project/ibis/actions/runs/1965987517452ea048908149a672f681ffd94e3fd0953ab2c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}2020-08-05 17:58:48+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751pull_requestMain  NULL2020-08-05 17:58:48+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127749481012774948failure   {'tree_id': 'e753c3d693a15eeb99a0d2bd074414ab90dbc85d', 'timestamp': datetime.datetime(2020, 8, 5, 17, 58, 39, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196598751/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2512020-08-05 18:23:52+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzY2NTk2[]196596932MDExOldvcmtmbG93UnVuMTk2NTk2OTMycompleted{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/artifactshttps://github.com/ibis-project/ibis/actions/runs/19659693259daccd16de041b14fa48b9ba53e8aac6495a578{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:56:34+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932pull_requestMain  NULL2020-08-05 17:56:34+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127665961012766596failure   {'tree_id': '342312b7ce508d4d7c91259dd7919cee06508f19', 'timestamp': datetime.datetime(2020, 8, 5, 17, 56, 25, tzinfo=<UTC>), ... 
+4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196596932/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2412020-08-05 17:54:55+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/rerunMDEwOkNoZWNrU3VpdGUxMDEyNzU5MjQ1[]196595357MDExOldvcmtmbG93UnVuMTk2NTk1MzU3completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/artifactshttps://github.com/ibis-project/ibis/actions/runs/1965953571d4f2db372834da7fb33b53c60b59d3f3e40cf7c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 17:54:35+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357pull_requestMain  NULL2020-08-05 17:54:35+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10127592451012759245failure   {'tree_id': '82f0fdad2916ec10d33f5b6c589dbfe8e4decccd', 'timestamp': datetime.datetime(2020, 8, 5, 17, 54, 26, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196595357/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2312020-08-05 17:17:46+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/rerunMDEwOkNoZWNrU3VpdGUxMDEyNDExMDA0[]196505866MDExOldvcmtmbG93UnVuMTk2NTA1ODY2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/artifactshttps://github.com/ibis-project/ibis/actions/runs/196505866c36dd6504d86d1994fb36d6a84fb3f302a57642c{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}2020-08-05 16:28:57+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866pull_requestMain  NULL2020-08-05 16:28:57+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10124110041012411004failure   {'tree_id': 'c3bf70f2809e48fc5c1dd5b0e7e2321bae4879ea', 'timestamp': datetime.datetime(2020, 8, 5, 16, 28, 48, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196505866/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2212020-08-05 16:47:06+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/rerunMDEwOkNoZWNrU3VpdGUxMDEyMzQ4MjUz[]196491206MDExOldvcmtmbG93UnVuMTk2NDkxMjA2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/artifactshttps://github.com/ibis-project/ibis/actions/runs/1964912065ffc4dcb3857eae64b5b36f46b378149c0bb2d74{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}2020-08-05 16:15:44+00:00actions-lint https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206pull_requestMain  NULL2020-08-05 16:15:44+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10123482531012348253failure   {'tree_id': '3781a799e538f99a2e05399fea4237e5d06d5df2', 'timestamp': datetime.datetime(2020, 8, 5, 16, 12, 4, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196491206/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2112020-08-05 15:10:13+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/rerunMDEwOkNoZWNrU3VpdGUxMDExODM5NTE5[]196367166MDExOldvcmtmbG93UnVuMTk2MzY3MTY2completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/artifactshttps://github.com/ibis-project/ibis/actions/runs/196367166e88d621425c939857b3b9391794c5ddfd7615981{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 14:36:39+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/196367166pull_requestMain  NULL2020-08-05 14:36:39+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10118395191011839519failure   {'tree_id': 'e093ce6398be8a2fb5331d944d07ef0c5518cc84', 'timestamp': datetime.datetime(2020, 8, 5, 14, 36, 30, tzinfo=<UTC>), ... 
+4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196367166/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL2012020-08-05 14:32:14+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/rerunMDEwOkNoZWNrU3VpdGUxMDExNjUzNDY5[]196316939MDExOldvcmtmbG93UnVuMTk2MzE2OTM5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... +44}https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/artifactshttps://github.com/ibis-project/ibis/actions/runs/196316939702446a96a1b9e6b463084f2f09f2f2106fef8d4{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 14:01:24+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/196316939pull_requestMain  NULL2020-08-05 14:01:24+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10116534691011653469failure   {'tree_id': 'cdac62bce1914add5faceafd210f32965aa00fe7', 'timestamp': datetime.datetime(2020, 8, 5, 14, 1, 11, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/196316939/logs │\n│ https://api.github.com/repos/ibis-project/ibis/actions/workflows/21009862100986NULL1912020-08-05 01:20:00+00:00https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/cancelhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/rerunMDEwOkNoZWNrU3VpdGUxMDA5MDQ3OTk5[]195537439MDExOldvcmtmbG93UnVuMTk1NTM3NDM5completed{'trees_url': 'https://api.github.com/repos/ibis-project/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/ibis-project/ibis/teams', ... 
+44}https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/jobsNULLhttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/artifactshttps://github.com/ibis-project/ibis/actions/runs/1955374392ab26f385b87f39b66cf51783d7ab8904fdb4677{'trees_url': 'https://api.github.com/repos/datapythonista/ibis/git/trees{/sha}', 'teams_url': 'https://api.github.com/repos/datapythonista/ibis/teams', ... +44}2020-08-05 00:48:17+00:00conda-windowshttps://api.github.com/repos/ibis-project/ibis/actions/runs/195537439pull_requestMain  NULL2020-08-05 00:48:17+00:00https://api.github.com/repos/ibis-project/ibis/check-suites/10090479991009047999failure   {'tree_id': '33ca23ad93f84344f03894d952d7ffeaf8fb5990', 'timestamp': datetime.datetime(2020, 8, 5, 0, 48, 8, tzinfo=<UTC>), ... +4}https://api.github.com/repos/ibis-project/ibis/actions/runs/195537439/logs │\n│                                                                           │\n└──────────────────────────────────────────────────────────────────────────┴─────────────┴──────────────────┴────────────┴─────────────┴───────────────────────────┴──────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┴───────────┴──────────────────────────────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┴──────────────────────┴─────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────┴──────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────┴───────────────────────────┴───────────────┴───────────────────────────────────────────────────────────────────────┴──────────────┴────────┴───────┴───────────────────────────┴────────────────────────────────────────────────────────────────────────┴────────────────┴────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────┘\n
\n```\n:::\n:::\n\n\nAgain we have a bunch of columns that aren't so useful to us, so let's see what else is there.\n\n::: {#846952c9 .cell execution_count=8}\n``` {.python .cell-code}\nworkflows.columns\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\n['workflow_url',\n 'workflow_id',\n 'triggering_actor',\n 'run_number',\n 'run_attempt',\n 'updated_at',\n 'cancel_url',\n 'rerun_url',\n 'check_suite_node_id',\n 'pull_requests',\n 'id',\n 'node_id',\n 'status',\n 'repository',\n 'jobs_url',\n 'previous_attempt_url',\n 'artifacts_url',\n 'html_url',\n 'head_sha',\n 'head_repository',\n 'run_started_at',\n 'head_branch',\n 'url',\n 'event',\n 'name',\n 'actor',\n 'created_at',\n 'check_suite_url',\n 'check_suite_id',\n 'conclusion',\n 'head_commit',\n 'logs_url']\n```\n:::\n:::\n\n\nWe don't care about most of these for the purposes of this analysis; we only need the `id` column and a few values derived from the `run_started_at` column.\n\n- `id`: the unique identifier of the **workflow run**\n- `run_started_at`: the time the workflow run started\n\nWe compute the date each run started so we can later compare it to the dates when we added poetry and switched to the team plan.\n\n::: {#d1f82209 .cell execution_count=9}\n``` {.python .cell-code}\nworkflows = workflows.select(\n    _.id, _.run_started_at, started_date=_.run_started_at.date()\n)\nworkflows\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓\n┃ id        ┃ run_started_at            ┃ started_date ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩\n│ int64     │ timestamp('UTC')          │ date         │\n├───────────┼───────────────────────────┼──────────────┤\n│ 195478382 │ 2020-08-04 23:54:29+00:00 │ 2020-08-04   │\n│ 195476517 │ 2020-08-04 23:51:44+00:00 │ 2020-08-04   │\n│ 195475525 │ 2020-08-04 23:50:11+00:00 │ 2020-08-04   │\n│ 195468677 │ 2020-08-04 23:39:51+00:00 │ 2020-08-04   │\n│ 195465343 │ 2020-08-04 23:34:11+00:00 │ 2020-08-04   │\n│ 195460611 │ 2020-08-04 23:29:07+00:00 │ 2020-08-04   │\n│ 195452505 │ 2020-08-04 23:17:29+00:00 │ 2020-08-04   │\n│ 195447886 │ 2020-08-04 23:11:35+00:00 │ 2020-08-04   │\n│ 195435521 │ 2020-08-04 23:02:34+00:00 │ 2020-08-04   │\n│ 195433385 │ 2020-08-04 23:01:00+00:00 │ 2020-08-04   │\n│ …         │ …                         │ …            │\n└───────────┴───────────────────────────┴──────────────┘\n
\n```\n:::\n:::\n\n\nWe need to associate jobs and workflows somehow, so let's join them on the relevant key fields.\n\n::: {#0d322a2d .cell execution_count=10}\n``` {.python .cell-code}\njoined = jobs.join(workflows, jobs.run_id == workflows.id)\njoined\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓\n┃ run_id    ┃ job_duration ┃ last_job_started_at       ┃ last_job_completed_at     ┃ id        ┃ run_started_at            ┃ started_date ┃\n┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩\n│ int64     │ int64        │ timestamp('UTC')          │ timestamp('UTC')          │ int64     │ timestamp('UTC')          │ date         │\n├───────────┼──────────────┼───────────────────────────┼───────────────────────────┼───────────┼───────────────────────────┼──────────────┤\n│ 647137270 │      1000000 │ 2021-03-12 18:59:42+00:00 │ 2021-03-12 18:59:43+00:00 │ 647137270 │ 2021-03-12 18:59:31+00:00 │ 2021-03-12   │\n│ 647113324 │    801000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    832000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    380000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    371000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    748000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    775000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    384000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    395000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ 647113324 │    863000000 │ 2021-03-12 18:51:40+00:00 │ 2021-03-12 19:06:13+00:00 │ 647113324 │ 2021-03-12 18:51:26+00:00 │ 2021-03-12   │\n│ …         │            … │ …                         │ …                         │ …         │ …                         │ …            │\n└───────────┴──────────────┴───────────────────────────┴───────────────────────────┴───────────┴───────────────────────────┴──────────────┘\n
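\n```\n:::\n:::\n\n\nUnder the hood this is a plain inner equi-join on `jobs.run_id == workflows.id`. A minimal pure-Python sketch of the same idea, using toy rows rather than the real tables (the names and values below are our own illustration, not part of the ibis API):\n\n
```python
# Toy rows standing in for the jobs and workflows tables (made-up values).
jobs_rows = [
    {"run_id": 1, "job_duration": 5_000_000},
    {"run_id": 1, "job_duration": 7_000_000},
    {"run_id": 2, "job_duration": 4_000_000},
]
workflow_rows = [
    {"id": 1, "started_date": "2021-03-12"},
    {"id": 2, "started_date": "2021-03-13"},
]

# Inner equi-join: each job row picks up the columns of its matching workflow.
by_id = {w["id"]: w for w in workflow_rows}
joined_rows = [{**j, **by_id[j["run_id"]]} for j in jobs_rows if j["run_id"] in by_id]
print(len(joined_rows))  # 3: every job row found its workflow run
```
\n\n::: {.cell}\n::: {.cell-output .cell-output-display}\n```{=html}\n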
\n```\n:::\n:::\n\n\nSweet! Now that we have workflow runs and job runs together in the same table, let's start summarizing.\n\nLet's encode our knowledge of when the poetry move happened and when we moved to the team plan.\n\n::: {#8cad3b01 .cell execution_count=11}\n``` {.python .cell-code}\nfrom datetime import date\n\nPOETRY_MERGED_DATE = date(2021, 10, 15)\nTEAMIZATION_DATE = date(2022, 11, 28)\n```\n:::\n\n\nLet's compute indicator variables that flag whether a given row contains data from after the poetry switch, and do the same for the team plan.\n\nLet's also compute queueing time and workflow duration.\n\n::: {#1c5210e6 .cell execution_count=12}\n``` {.python .cell-code}\nstats = joined.select(\n    _.started_date,\n    _.job_duration,\n    has_poetry=_.started_date > POETRY_MERGED_DATE,\n    has_team=_.started_date > TEAMIZATION_DATE,\n    queueing_time=_.last_job_started_at.delta(_.run_started_at, \"microsecond\"),\n    workflow_duration=_.last_job_completed_at.delta(_.run_started_at, \"microsecond\"),\n)\nstats\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓\n┃ started_date ┃ job_duration ┃ has_poetry ┃ has_team ┃ queueing_time ┃ workflow_duration ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩\n│ date         │ int64        │ boolean    │ boolean  │ int64         │ int64             │\n├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┤\n│ 2021-07-25   │      1000000 │ False      │ False    │       7000000 │           8000000 │\n│ 2021-07-25   │            0 │ False      │ False    │      10000000 │          10000000 │\n│ 2021-07-25   │    389000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    482000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    451000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    519000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    733000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    758000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    388000000 │ False      │ False    │      11000000 │        1010000000 │\n│ 2021-07-25   │    403000000 │ False      │ False    │      11000000 │        1010000000 │\n│ …            │            … │ …          │ …        │             … │                 … │\n└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┘\n
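\n```\n:::\n:::\n\n\nThe microsecond `delta` calls above produce plain integer differences. Here's a small pure-Python sketch of what the `queueing_time` column computes, using made-up timestamps (`delta_microseconds` is our own helper name, not an ibis function):\n\n
```python
from datetime import datetime, timedelta, timezone


def delta_microseconds(end: datetime, start: datetime) -> int:
    # integer difference end - start in microseconds, like the microsecond delta above
    return (end - start) // timedelta(microseconds=1)


# Made-up run: the last job started 11 seconds after the workflow run started.
run_started_at = datetime(2021, 7, 25, 12, 0, 0, tzinfo=timezone.utc)
last_job_started_at = datetime(2021, 7, 25, 12, 0, 11, tzinfo=timezone.utc)

queueing_time = delta_microseconds(last_job_started_at, run_started_at)
print(queueing_time)  # 11000000 microseconds, i.e. 11 seconds of queueing
```
\n\n::: {.cell}\n::: {.cell-output .cell-output-display}\n```{=html}\n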
\n```\n:::\n:::\n\n\nLet's create a column ranging from 0 to 2 inclusive where:\n\n- 0: no improvements\n- 1: just poetry\n- 2: poetry and the team plan\n\nLet's also give them some names that'll look nice on our plots.\n\n::: {#8b6cf051 .cell execution_count=13}\n``` {.python .cell-code}\nstats = stats.mutate(\n    raw_improvements=_.has_poetry.cast(\"int\") + _.has_team.cast(\"int\")\n).mutate(\n    improvements=(\n        _.raw_improvements.case()\n        .when(0, \"None\")\n        .when(1, \"Poetry\")\n        .when(2, \"Poetry + Team Plan\")\n        .else_(\"NA\")\n        .end()\n    ),\n    team_plan=ibis.where(_.raw_improvements > 1, \"Poetry + Team Plan\", \"None\"),\n)\nstats\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓\n┃ started_date ┃ job_duration ┃ has_poetry ┃ has_team ┃ queueing_time ┃ workflow_duration ┃ raw_improvements ┃ improvements ┃ team_plan ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩\n│ date         │ int64        │ boolean    │ boolean  │ int64         │ int64             │ int64            │ string       │ string    │\n├──────────────┼──────────────┼────────────┼──────────┼───────────────┼───────────────────┼──────────────────┼──────────────┼───────────┤\n│ 2022-10-08   │     43000000 │ True       │ False    │       8000000 │          51000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │     15000000 │ True       │ False    │     458000000 │         473000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │            0 │ True       │ False    │      16000000 │          16000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │      8000000 │ True       │ False    │      16000000 │          24000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │            0 │ True       │ False    │      16000000 │          16000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │            0 │ True       │ False    │      27000000 │          27000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │      1000000 │ True       │ False    │      25000000 │          25000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │            0 │ True       │ False    │      25000000 │          25000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │      1000000 │ True       │ False    │      25000000 │          25000000 │                1 │ Poetry       │ None      │\n│ 2022-10-08   │      2000000 │ True       │ False    │      25000000 │          25000000 │                1 │ Poetry       │ None      │\n│ …            │            … │ …          │ …        │             … │                 … │                … │ …            │ …         │\n└──────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────────┴──────────────────┴──────────────┴───────────┘\n
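\n```\n:::\n:::\n\n\nThe encoding works because casting the two booleans to integers makes the pair sum to 0, 1, or 2. A pure-Python sketch of the same case-when logic (the function and mapping here are our own illustration, not ibis API):\n\n
```python
def label_improvements(has_poetry: bool, has_team: bool) -> str:
    # bool -> int casts sum to 0, 1, or 2, mirroring `raw_improvements` above
    raw = int(has_poetry) + int(has_team)
    return {0: "None", 1: "Poetry", 2: "Poetry + Team Plan"}.get(raw, "NA")


print(label_improvements(False, False))  # None
print(label_improvements(True, False))   # Poetry
print(label_improvements(True, True))    # Poetry + Team Plan
```
\n\n::: {.cell}\n::: {.cell-output .cell-output-display}\n```{=html}\n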
\n```\n:::\n:::\n\n\nFinally, we can summarize by averaging the different durations, grouping on the variables of interest.\n\n::: {#427e669d .cell execution_count=14}\n``` {.python .cell-code}\nUSECS_PER_MIN = 60_000_000\n\nagged = stats.group_by([_.started_date, _.improvements, _.team_plan]).agg(\n    job=_.job_duration.div(USECS_PER_MIN).mean(),\n    workflow=_.workflow_duration.div(USECS_PER_MIN).mean(),\n    queueing_time=_.queueing_time.div(USECS_PER_MIN).mean(),\n)\nagged\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n┃ started_date ┃ improvements       ┃ team_plan          ┃ job       ┃ workflow  ┃ queueing_time ┃\n┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n│ date         │ string             │ string             │ float64   │ float64   │ float64       │\n├──────────────┼────────────────────┼────────────────────┼───────────┼───────────┼───────────────┤\n│ 2022-08-10   │ Poetry             │ None               │  5.358063 │ 27.103921 │     22.845206 │\n│ 2020-10-02   │ None               │ None               │ 11.536410 │ 29.325641 │     28.941538 │\n│ 2022-12-29   │ Poetry + Team Plan │ Poetry + Team Plan │  2.612843 │  6.538934 │      5.663799 │\n│ 2022-05-25   │ Poetry             │ None               │  2.171620 │  9.641242 │      9.223480 │\n│ 2021-03-23   │ None               │ None               │  9.908273 │ 18.121004 │      1.824016 │\n│ 2022-10-21   │ Poetry             │ None               │  3.347873 │ 12.195076 │      9.156614 │\n│ 2021-01-31   │ None               │ None               │  0.000000 │  0.266667 │      0.266667 │\n│ 2021-07-27   │ None               │ None               │  0.016667 │  0.183333 │      0.166667 │\n│ 2022-03-15   │ Poetry             │ None               │  2.275418 │  9.091200 │      8.498640 │\n│ 2021-12-12   │ Poetry             │ None               │  4.245767 │ 15.027579 │     10.329464 │\n│ …            │ …                  │ …                  │         … │         … │             … │\n└──────────────┴────────────────────┴────────────────────┴───────────┴───────────┴───────────────┘\n
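\n```\n:::\n:::\n\n\nThe `div`/`mean` pattern just converts microseconds to minutes before averaging. The same arithmetic in plain Python, on made-up sample durations rather than the real CI data:\n\n
```python
USECS_PER_MIN = 60 * 1_000_000  # 60_000_000 microseconds per minute

# Sample job durations in microseconds (made-up values).
durations_usec = [389_000_000, 482_000_000, 451_000_000]

# Divide each duration into minutes, then take the mean.
mean_minutes = sum(d / USECS_PER_MIN for d in durations_usec) / len(durations_usec)
print(round(mean_minutes, 6))  # about 7.34 minutes
```
\n\n::: {.cell}\n::: {.cell-output .cell-output-display}\n```{=html}\n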
\n```\n:::\n:::\n\n\nIf at any point you want to inspect the SQL you'll be running, ibis has you covered with `ibis.to_sql`.\n\n::: {#59c65db6 .cell execution_count=15}\n``` {.python .cell-code}\nibis.to_sql(agged)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```sql\nWITH t0 AS (\n  SELECT\n    t6.*\n  FROM `ibis-gbq`.workflows.jobs AS t6\n  WHERE\n    t6.`started_at` < '2023-01-09'\n), t1 AS (\n  SELECT\n    t6.`id`,\n    t6.`run_started_at`,\n    DATE(t6.`run_started_at`) AS `started_date`\n  FROM `ibis-gbq`.workflows.workflows AS t6\n), t2 AS (\n  SELECT\n    t0.`run_id`,\n    TIMESTAMP_DIFF(t0.`completed_at`, t0.`started_at`, MICROSECOND) AS `job_duration`,\n    MAX(t0.`started_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_started_at`,\n    MAX(t0.`completed_at`) OVER (PARTITION BY t0.`run_id`) AS `last_job_completed_at`\n  FROM t0\n), t3 AS (\n  SELECT\n    `started_date`,\n    `job_duration`,\n    `started_date` > CAST('2021-10-15' AS DATE) AS `has_poetry`,\n    `started_date` > CAST('2022-11-28' AS DATE) AS `has_team`,\n    TIMESTAMP_DIFF(`last_job_started_at`, `run_started_at`, MICROSECOND) AS `queueing_time`,\n    TIMESTAMP_DIFF(`last_job_completed_at`, `run_started_at`, MICROSECOND) AS `workflow_duration`\n  FROM t2\n  INNER JOIN t1\n    ON t2.`run_id` = t1.`id`\n), t4 AS (\n  SELECT\n    t3.*,\n    CAST(t3.`has_poetry` AS INT64) + CAST(t3.`has_team` AS INT64) AS `raw_improvements`\n  FROM t3\n)\nSELECT\n  t5.`started_date`,\n  t5.`improvements`,\n  t5.`team_plan`,\n  avg(IEEE_DIVIDE(t5.`job_duration`, 60000000)) AS `job`,\n  avg(IEEE_DIVIDE(t5.`workflow_duration`, 60000000)) AS `workflow`,\n  avg(IEEE_DIVIDE(t5.`queueing_time`, 60000000)) AS `queueing_time`\nFROM (\n  SELECT\n    t4.*,\n    CASE t4.`raw_improvements`\n      WHEN 0\n      THEN 'None'\n      WHEN 1\n      THEN 'Poetry'\n      WHEN 2\n      THEN 'Poetry + Team Plan'\n      ELSE 'NA'\n    END AS `improvements`,\n    IF(t4.`raw_improvements` > 1, 'Poetry + Team Plan', 'None') AS `team_plan`\n  FROM t4\n) AS t5\nGROUP BY\n  1,\n  2,\n  3\n```\n:::\n:::\n\n\n## Plot the Results\n\nIbis
doesn't have built-in plotting support, so we need to pull our results into pandas.\n\nHere I'm using `plotnine` (a Python port of `ggplot2`), which has great integration with pandas DataFrames.\n\n::: {#4ea62468 .cell execution_count=16}\n``` {.python .cell-code}\nraw_df = agged.execute()\nraw_df\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n<pre>\n    started_date        improvements           team_plan        job   workflow  queueing_time\n0     2021-11-03              Poetry                None   3.948251  18.438061      17.952411\n1     2020-10-01                None                None   9.915315  26.179842      26.083483\n2     2022-08-23              Poetry                None   2.744350  12.839580      12.064237\n3     2021-06-09                None                None   8.044477  15.938178       1.141473\n4     2022-06-13              Poetry                None   3.117226  15.782421      14.715766\n..           ...                 ...                 ...        ...        ...            ...\n779   2020-12-03                None                None  10.913713  39.732489      39.495992\n780   2021-10-21              Poetry                None   3.781108  31.423465      28.041193\n781   2021-12-14              Poetry                None   3.240217  13.778852      10.919449\n782   2023-01-02  Poetry + Team Plan  Poetry + Team Plan   3.144575  10.116722       7.886025\n783   2022-02-02              Poetry                None   3.119334  25.054407      23.989267\n\n[784 rows x 6 columns]\n</pre>\n
\n```\n:::\n:::\n\n\nGenerally, `plotnine` works with long, tidy data, so let's use `pandas.melt` to get there.\n\n::: {#5b55980e .cell execution_count=17}\n``` {.python .cell-code}\nimport pandas as pd\n\ndf = pd.melt(\n    raw_df,\n    id_vars=[\"started_date\", \"improvements\", \"team_plan\"],\n    var_name=\"entity\",\n    value_name=\"duration\",\n)\ndf.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n<pre>\n  started_date improvements team_plan entity  duration\n0   2021-11-03       Poetry      None    job  3.948251\n1   2020-10-01         None      None    job  9.915315\n2   2022-08-23       Poetry      None    job  2.744350\n3   2021-06-09         None      None    job  8.044477\n4   2022-06-13       Poetry      None    job  3.117226\n</pre>\n
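\n```\n:::\n:::\n\n\nTo see what `melt` is doing, here's a tiny wide frame with the same column layout as `raw_df` (toy numbers, not the real CI data) melted the same way:\n\n
```python
import pandas as pd

# Toy wide frame: one row per date, one column per metric (made-up values).
wide = pd.DataFrame(
    {
        "started_date": ["2022-01-01", "2022-01-02"],
        "improvements": ["Poetry", "Poetry"],
        "team_plan": ["None", "None"],
        "job": [3.0, 4.0],
        "workflow": [12.0, 15.0],
        "queueing_time": [9.0, 11.0],
    }
)

# Melt the three metric columns into (entity, duration) pairs.
tidy = pd.melt(
    wide,
    id_vars=["started_date", "improvements", "team_plan"],
    var_name="entity",
    value_name="duration",
)
print(tidy.shape)  # (6, 5): 2 dates x 3 metrics, one metric per row
```
\n\n::: {.cell}\n::: {.cell-output .cell-output-display}\n```{=html}\n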
\n```\n:::\n:::\n\n\nLet's make our theme lighthearted by using `xkcd`-style plots.\n\n::: {#d149c514 .cell execution_count=18}\n``` {.python .cell-code}\nfrom plotnine import *\n\ntheme_set(theme_xkcd())\n```\n:::\n\n\nCreate a few labels for our plot.\n\n::: {#ee3add6c .cell execution_count=19}\n``` {.python .cell-code}\npoetry_label = f\"Poetry\\n{POETRY_MERGED_DATE}\"\nteam_label = f\"Team Plan\\n{TEAMIZATION_DATE}\"\n```\n:::\n\n\nWithout the following line you may see a large number of inconsequential warnings that make the notebook unusable.\n\n::: {#1bc4c94f .cell execution_count=20}\n``` {.python .cell-code}\nimport logging\n\n# without this, findfont logging spams the notebook making it unusable\nlogging.getLogger('matplotlib.font_manager').disabled = True\n```\n:::\n\n\nHere we show job durations, coloring the points differently depending on whether they have no improvements, poetry, or poetry + team plan.\n\n::: {#3b549c85 .cell execution_count=21}\n``` {.python .cell-code}\n(\n    ggplot(\n        df.loc[df.entity == \"job\"].reset_index(drop=True),\n        aes(x=\"started_date\", y=\"duration\", color=\"factor(improvements)\"),\n    )\n    + geom_point()\n    + geom_vline(\n        xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],\n        colour=[\"blue\", \"green\"],\n        linetype=\"dashed\",\n    )\n    + scale_color_brewer(\n        palette=7,\n        type='qual',\n        limits=[\"None\", \"Poetry\", \"Poetry + Team Plan\"],\n    )\n    + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=15, color=\"blue\")\n    + geom_text(x=TEAMIZATION_DATE, label=team_label, y=10, color=\"blue\")\n    + stat_smooth(method=\"lm\")\n    + labs(x=\"Date\", y=\"Duration (minutes)\")\n    + ggtitle(\"Job Duration\")\n    + theme(\n        figure_size=(22, 6),\n        legend_position=(0.67, 0.65),\n        legend_direction=\"vertical\",\n    )\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-22-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n
\n```\n:::\n:::\n\n\n## Result #1: Job Duration\n\nThis result is pretty interesting.\n\nA few things pop out to me right away:\n\n- The move to poetry decreased the average job run duration by quite a bit. No, I'm not going to do any statistical tests.\n- The variability of job run durations also decreased by quite a bit after introducing poetry.\n- Moving to the team plan had little to no effect on job run duration.\n\n::: {#bdc51c16 .cell execution_count=22}\n``` {.python .cell-code}\n(\n    ggplot(\n        df.loc[df.entity != \"job\"].reset_index(drop=True),\n        aes(x=\"started_date\", y=\"duration\", color=\"factor(improvements)\"),\n    )\n    + facet_wrap(\"entity\", ncol=1)\n    + geom_point()\n    + geom_vline(\n        xintercept=[TEAMIZATION_DATE, POETRY_MERGED_DATE],\n        linetype=\"dashed\",\n    )\n    + scale_color_brewer(\n        palette=7,\n        type='qual',\n        limits=[\"None\", \"Poetry\", \"Poetry + Team Plan\"],\n    )\n    + geom_text(x=POETRY_MERGED_DATE, label=poetry_label, y=75, color=\"blue\")\n    + geom_text(x=TEAMIZATION_DATE, label=team_label, y=50, color=\"blue\")\n    + stat_smooth(method=\"lm\")\n    + labs(x=\"Date\", y=\"Duration (minutes)\")\n    + ggtitle(\"Workflow Duration\")\n    + theme(\n        figure_size=(22, 13),\n        legend_position=(0.68, 0.75),\n        legend_direction=\"vertical\",\n    )\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-23-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n
\n```\n:::\n:::\n\n\n## Result #2: Workflow Duration and Queueing Time\n\nAnother interesting result.\n\n### Queueing Time\n\n- It almost looks like moving to poetry made average queueing time worse. This is probably an artifact of our perception that faster jobs mean faster CI; as we see here, that isn't the case.\n- Moving to the team plan cut queueing time down by quite a bit.\n\n### Workflow Duration\n\n- Overall workflow duration appears to be strongly influenced by moving to the team plan, which is almost certainly due to the drop in queueing time, since job durations themselves were largely unchanged.\n- Perhaps it's obvious, but queueing time and workflow duration appear to be highly correlated.\n\nIn the next plot we'll look at that correlation.\n\n::: {#a411cf31 .cell execution_count=23}\n``` {.python .cell-code}\n(\n    ggplot(raw_df, aes(x=\"workflow\", y=\"queueing_time\"))\n    + geom_point()\n    + geom_rug()\n    + facet_grid(\". ~ team_plan\")\n    + labs(x=\"Workflow Duration (minutes)\", y=\"Queueing Time (minutes)\")\n    + ggtitle(\"Workflow Duration vs. Queueing Time\")\n    + theme(figure_size=(22, 6))\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-24-output-1.png){}\n:::\n\n::: {.cell-output .cell-output-display execution_count=23}\n```\n
\n```\n:::\n:::\n\n\n## Result #3: Workflow Duration and Queueing Time are correlated\n\nIt also seems that moving to the team plan (though the move to poetry may also play a role here) reduced the variability of both metrics.\n\nWe have much less data for this period than for earlier ones, so we should wait for more to come in before drawing strong conclusions.\n\n## Conclusions\n\nIt appears that you need both a short queue time **and** fast individual jobs to minimize time spent in CI.\n\nIf you have a short queue time but long job runs, then you'll be bottlenecked on individual jobs; if you have more jobs than queue slots, then you'll be blocked on queueing time.\n\nI think we can sum this up nicely:\n\n- slow jobs, slow queue: 🤷 blocked by jobs or queue\n- slow jobs, fast queue: ❓ blocked by jobs, if jobs are slow enough\n- fast jobs, slow queue: ❗ blocked by queue, with enough jobs\n- fast jobs, fast queue: ✅\n\n", "supporting": [ "index_files" ], diff --git a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-22-output-1.png b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-22-output-1.png index fd37b9f1c69f..db66edc13873 100644 Binary files a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-22-output-1.png and b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-22-output-1.png differ diff --git a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-23-output-1.png b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-23-output-1.png index 235aa83fb977..9a30d20994a8 100644 Binary files a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-23-output-1.png and b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-23-output-1.png differ diff --git a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-24-output-1.png b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-24-output-1.png index 0a4fb81d97f0..e96615b7026a 100644 Binary files a/docs/_freeze/posts/ci-analysis/index/figure-html/cell-24-output-1.png and 
b/docs/_freeze/posts/ci-analysis/index/figure-html/cell-24-output-1.png differ diff --git a/docs/posts/ci-analysis/index.qmd b/docs/posts/ci-analysis/index.qmd index 271161cecaa1..5babc2c6d0c6 100644 --- a/docs/posts/ci-analysis/index.qmd +++ b/docs/posts/ci-analysis/index.qmd @@ -119,7 +119,7 @@ We also need to compute when the last job for a given `run_id` started and when run_id_win = ibis.window(group_by=_.run_id) jobs = jobs.select( _.run_id, - job_duration=_.completed_at.cast("int") - _.started_at.cast("int"), + job_duration=_.completed_at.delta(_.started_at, "microsecond"), last_job_started_at=_.started_at.max().over(run_id_win), last_job_completed_at=_.completed_at.max().over(run_id_win), ) @@ -184,10 +184,8 @@ stats = joined.select( _.job_duration, has_poetry=_.started_date > POETRY_MERGED_DATE, has_team=_.started_date > TEAMIZATION_DATE, - queueing_time=_.last_job_started_at.cast("int") - - _.run_started_at.cast("int"), - workflow_duration=_.last_job_completed_at.cast("int") - - _.run_started_at.cast("int"), + queueing_time=_.last_job_started_at.delta(_.run_started_at, "microsecond"), + workflow_duration=_.last_job_completed_at.delta(_.run_started_at, "microsecond"), ) stats ``` diff --git a/ibis/backends/bigquery/registry.py b/ibis/backends/bigquery/registry.py index 1e0489ae73a2..81b7bb0cbe7e 100644 --- a/ibis/backends/bigquery/registry.py +++ b/ibis/backends/bigquery/registry.py @@ -746,6 +746,34 @@ def _count_distinct_star(t, op): ) +def _time_delta(t, op): + left = t.translate(op.left) + right = t.translate(op.right) + return f"TIME_DIFF({left}, {right}, {op.part.value.upper()})" + + +def _date_delta(t, op): + left = t.translate(op.left) + right = t.translate(op.right) + return f"DATE_DIFF({left}, {right}, {op.part.value.upper()})" + + +def _timestamp_delta(t, op): + left = t.translate(op.left) + right = t.translate(op.right) + left_tz = op.left.dtype.timezone + right_tz = op.right.dtype.timezone + args = f"{left}, {right}, {op.part.value.upper()}" + 
if left_tz is None and right_tz is None: + return f"DATETIME_DIFF({args})" + elif left_tz is not None and right_tz is not None: + return f"TIMESTAMP_DIFF({args})" + else: + raise NotImplementedError( + "timestamp difference with mixed timezone/timezoneless values is not implemented" + ) + + OPERATION_REGISTRY = { **operation_registry, # Literal @@ -906,6 +934,9 @@ def _count_distinct_star(t, op): ops.CountDistinctStar: _count_distinct_star, ops.Argument: lambda _, op: op.name, ops.Unnest: unary("UNNEST"), + ops.TimeDelta: _time_delta, + ops.DateDelta: _date_delta, + ops.TimestampDelta: _timestamp_delta, } _invalid_operations = { diff --git a/ibis/backends/clickhouse/compiler/values.py b/ibis/backends/clickhouse/compiler/values.py index 6d3770b9020e..084e9c4dba0a 100644 --- a/ibis/backends/clickhouse/compiler/values.py +++ b/ibis/backends/clickhouse/compiler/values.py @@ -1063,3 +1063,9 @@ def _scalar_udf(op, **kw) -> str: @translate_val.register(ops.AggUDF) def _agg_udf(op, *, where, **kw) -> str: return agg[op.__full_name__](*kw.values(), where=where) + + +@translate_val.register(ops.DateDelta) +@translate_val.register(ops.TimestampDelta) +def _delta(op, *, part, left, right, **_): + return sg.exp.DateDiff(this=left, expression=right, unit=part) diff --git a/ibis/backends/duckdb/registry.py b/ibis/backends/duckdb/registry.py index f9caf8505957..1ad137e0113a 100644 --- a/ibis/backends/duckdb/registry.py +++ b/ibis/backends/duckdb/registry.py @@ -326,6 +326,11 @@ def _try_cast(t, op): return try_cast(arg, type_=to) +_temporal_delta = fixed_arity( + lambda part, start, end: sa.func.date_diff(part, end, start), 3 +) + + operation_registry.update( { ops.ArrayColumn: ( @@ -469,6 +474,9 @@ def _try_cast(t, op): ops.First: reduction(sa.func.first), ops.Last: reduction(sa.func.last), ops.ArrayIntersect: _array_intersect, + ops.TimeDelta: _temporal_delta, + ops.DateDelta: _temporal_delta, + ops.TimestampDelta: _temporal_delta, } ) diff --git 
a/ibis/backends/mssql/registry.py b/ibis/backends/mssql/registry.py index a9b1e044c926..b57cc6fb4033 100644 --- a/ibis/backends/mssql/registry.py +++ b/ibis/backends/mssql/registry.py @@ -106,6 +106,12 @@ def _timestamp_truncate(t, op): return sa.func.datetrunc(sa.text(_truncate_precisions[unit]), arg) +def _temporal_delta(t, op): + left = t.translate(op.left) + right = t.translate(op.right) + return sa.func.datediff(sa.literal_column(op.part.value.upper()), right, left) + + operation_registry = sqlalchemy_operation_registry.copy() operation_registry.update(sqlalchemy_window_functions_registry) @@ -197,6 +203,9 @@ def _timestamp_truncate(t, op): ops.ExtractMicrosecond: fixed_arity( lambda arg: sa.func.datepart(sa.literal_column("microsecond"), arg), 1 ), + ops.TimeDelta: _temporal_delta, + ops.DateDelta: _temporal_delta, + ops.TimestampDelta: _temporal_delta, } ) diff --git a/ibis/backends/mysql/registry.py b/ibis/backends/mysql/registry.py index 2359bd4ec470..d9227d9bcde6 100644 --- a/ibis/backends/mysql/registry.py +++ b/ibis/backends/mysql/registry.py @@ -80,23 +80,29 @@ def _interval_from_integer(t, op): def _literal(_, op): - if op.dtype.is_interval(): - if op.dtype.unit.short in {"ms", "ns"}: + dtype = op.dtype + value = op.value + if value is None: + return sa.null() + if dtype.is_interval(): + if dtype.unit.short in {"ms", "ns"}: raise com.UnsupportedOperationError( - "MySQL does not allow operation " - f"with INTERVAL offset {op.dtype.unit}" + f"MySQL does not allow operation with INTERVAL offset {dtype.unit}" ) - text_unit = op.dtype.resolution.upper() + text_unit = dtype.resolution.upper() sa_text = sa.text(f"INTERVAL :value {text_unit}") - return sa_text.bindparams(value=op.value) - elif op.dtype.is_binary(): + return sa_text.bindparams(value=value) + elif dtype.is_binary(): # the cast to BINARY is necessary here, otherwise the data come back as # Python strings # # This lets the database handle encoding rather than ibis - return 
sa.cast(sa.literal(op.value), type_=sa.BINARY())
+        return sa.cast(sa.literal(value), type_=sa.BINARY())
+    elif dtype.is_time():
+        return sa.func.maketime(
+            value.hour, value.minute, value.second + value.microsecond / 1e6
+        )
     else:
-        value = op.value
         with contextlib.suppress(AttributeError):
             value = value.to_pydatetime()
@@ -167,6 +173,13 @@ def compiles_mysql_trim(element, compiler, **kw):
     )
 
 
+def _temporal_delta(t, op):
+    left = t.translate(op.left)
+    right = t.translate(op.right)
+    part = sa.literal_column(op.part.value.upper())
+    return sa.func.timestampdiff(part, right, left)
+
+
 operation_registry.update(
     {
         ops.Literal: _literal,
@@ -242,6 +255,8 @@ def compiles_mysql_trim(element, compiler, **kw):
         ops.Strip: unary(lambda arg: _mysql_trim(arg, "both")),
         ops.LStrip: unary(lambda arg: _mysql_trim(arg, "leading")),
         ops.RStrip: unary(lambda arg: _mysql_trim(arg, "trailing")),
+        ops.TimeDelta: _temporal_delta,
+        ops.DateDelta: _temporal_delta,
     }
 )
 
diff --git a/ibis/backends/snowflake/registry.py b/ibis/backends/snowflake/registry.py
index 98e308920139..e283d43855ca 100644
--- a/ibis/backends/snowflake/registry.py
+++ b/ibis/backends/snowflake/registry.py
@@ -59,6 +59,9 @@ def _literal(t, op):
         return sa.func.timestamp_from_parts(*args)
     elif dtype.is_date():
         return sa.func.date_from_parts(value.year, value.month, value.day)
+    elif dtype.is_time():
+        nanos = value.microsecond * 1_000
+        return sa.func.time_from_parts(value.hour, value.minute, value.second, nanos)
     elif dtype.is_array():
         return sa.func.array_construct(*value)
     elif dtype.is_map() or dtype.is_struct():
@@ -461,6 +464,15 @@ def _map_get(t, op):
         ops.Levenshtein: fixed_arity(sa.func.editdistance, 2),
         ops.ArraySort: unary(sa.func.ibis_udfs.public.array_sort),
         ops.ArrayRepeat: fixed_arity(sa.func.ibis_udfs.public.array_repeat, 2),
+        ops.TimeDelta: fixed_arity(
+            lambda part, left, right: sa.func.timediff(part, right, left), 3
+        ),
+        ops.DateDelta: fixed_arity(
+            lambda part, left, right: sa.func.datediff(part, right, left), 3
+        ),
+        ops.TimestampDelta: fixed_arity(
+            lambda part, left, right: sa.func.timestampdiff(part, right, left), 3
+        ),
     }
 )
 
diff --git a/ibis/backends/tests/test_temporal.py b/ibis/backends/tests/test_temporal.py
index 26b2c4c5b654..28384bb97856 100644
--- a/ibis/backends/tests/test_temporal.py
+++ b/ibis/backends/tests/test_temporal.py
@@ -2400,3 +2400,62 @@ def test_timestamp_precision_output(con, ts, scale, unit):
     result = con.execute(expr)
     expected = pd.Timestamp(ts).floor(unit)
     assert result == expected
+
+
+@pytest.mark.notimpl(
+    [
+        "dask",
+        "datafusion",
+        "druid",
+        "flink",
+        "impala",
+        "oracle",
+        "pandas",
+        "polars",
+        "pyspark",
+        "sqlite",
+    ],
+    raises=com.OperationNotDefinedError,
+)
+@pytest.mark.notyet(
+    ["postgres"],
+    reason="postgres doesn't have any easy way to accurately compute the delta in specific units",
+    raises=com.OperationNotDefinedError,
+)
+@pytest.mark.parametrize(
+    ("start", "end", "unit", "expected"),
+    [
+        param(
+            ibis.time("01:58:00"),
+            ibis.time("23:59:59"),
+            "hour",
+            22,
+            id="time",
+            marks=[
+                pytest.mark.notimpl(
+                    ["clickhouse"],
+                    raises=NotImplementedError,
+                    reason="time types not yet implemented in ibis for the clickhouse backend",
+                )
+            ],
+        ),
+        param(ibis.date("1992-09-30"), ibis.date("1992-10-01"), "day", 1, id="date"),
+        param(
+            ibis.timestamp("1992-09-30 23:59:59"),
+            ibis.timestamp("1992-10-01 01:58:00"),
+            "hour",
+            2,
+            id="timestamp",
+            marks=[
+                pytest.mark.notimpl(
+                    ["mysql"],
+                    raises=com.OperationNotDefinedError,
+                    reason="timestampdiff rounds after subtraction and mysql doesn't have a date_trunc function",
+                )
+            ],
+        ),
+    ],
+)
+def test_delta(con, start, end, unit, expected):
+    expr = end.delta(start, unit)
+    assert con.execute(expr) == expected
diff --git a/ibis/backends/trino/registry.py b/ibis/backends/trino/registry.py
index 6c30fce4fe6e..ca9ca4554da6 100644
--- a/ibis/backends/trino/registry.py
+++ b/ibis/backends/trino/registry.py
@@ -317,6 +317,13 @@ def _array_intersect(t, op):
     )
 
 
+_temporal_delta = fixed_arity(
+    lambda part, left, right: sa.func.date_diff(
+        part, sa.func.date_trunc(part, right), sa.func.date_trunc(part, left)
+    ),
+    3,
+)
+
 operation_registry.update(
     {
         # conditional expressions
@@ -503,6 +510,11 @@ def _array_intersect(t, op):
         ),
         ops.Levenshtein: fixed_arity(sa.func.levenshtein_distance, 2),
        ops.ArrayIntersect: _array_intersect,
+        # trino truncates _after_ computing the delta, whereas many other
+        # backends truncate each operand
+        ops.TimeDelta: _temporal_delta,
+        ops.DateDelta: _temporal_delta,
+        ops.TimestampDelta: _temporal_delta,
     }
 )
 
diff --git a/ibis/expr/operations/temporal.py b/ibis/expr/operations/temporal.py
index f79b776e0712..8a5bf0827875 100644
--- a/ibis/expr/operations/temporal.py
+++ b/ibis/expr/operations/temporal.py
@@ -349,4 +349,28 @@ class BetweenTime(Between):
     upper_bound: Value[dt.Time | dt.String]
 
 
+class TemporalDelta(Value):
+    part: Value[dt.String]
+    shape = rlz.shape_like("args")
+    dtype = dt.int64
+
+
+@public
+class TimeDelta(TemporalDelta):
+    left: Value[dt.Time]
+    right: Value[dt.Time]
+
+
+@public
+class DateDelta(TemporalDelta):
+    left: Value[dt.Date]
+    right: Value[dt.Date]
+
+
+@public
+class TimestampDelta(TemporalDelta):
+    left: Value[dt.Timestamp]
+    right: Value[dt.Timestamp]
+
+
 public(ExtractTimestampField=ExtractTemporalField)
diff --git a/ibis/expr/types/temporal.py b/ibis/expr/types/temporal.py
index 39a8dd648a72..e93a3bb29cc2 100644
--- a/ibis/expr/types/temporal.py
+++ b/ibis/expr/types/temporal.py
@@ -253,6 +253,67 @@ def __rsub__(self, other: ops.Value[dt.Interval | dt.Time, ds.Any]):
 
     rsub = __rsub__
 
+    def delta(
+        self, other: datetime.time | Value[dt.Time], part: str
+    ) -> ir.IntegerValue:
+        """Compute the number of `part`s between two times.
+
+        ::: {.callout-note}
+        ## The order of operands matches standard subtraction
+
+        The second argument is subtracted from the first.
+        :::
+
+        Parameters
+        ----------
+        other
+            A time expression
+        part
+            The unit of time to compute the difference in
+
+        Returns
+        -------
+        IntegerValue
+            The number of `part`s between `self` and `other`
+
+        Examples
+        --------
+        >>> import ibis
+        >>> ibis.options.interactive = True
+        >>> start = ibis.time("01:58:00")
+        >>> end = ibis.time("23:59:59")
+        >>> end.delta(start, "hour")
+        22
+        >>> data = '''tpep_pickup_datetime,tpep_dropoff_datetime
+        ... 2016-02-01T00:23:56,2016-02-01T00:42:28
+        ... 2016-02-01T00:12:14,2016-02-01T00:21:41
+        ... 2016-02-01T00:43:24,2016-02-01T00:46:14
+        ... 2016-02-01T00:55:11,2016-02-01T01:24:34
+        ... 2016-02-01T00:11:13,2016-02-01T00:16:59'''
+        >>> with open("/tmp/triptimes.csv", "w") as f:
+        ...     _ = f.write(data)
+        ...
+        >>> taxi = ibis.read_csv("/tmp/triptimes.csv")
+        >>> ride_duration = (
+        ...     taxi.tpep_dropoff_datetime.time()
+        ...     .delta(taxi.tpep_pickup_datetime.time(), "minute")
+        ...     .name("ride_minutes")
+        ... )
+        >>> ride_duration
+        ┏━━━━━━━━━━━━━━┓
+        ┃ ride_minutes ┃
+        ┡━━━━━━━━━━━━━━┩
+        │ int64        │
+        ├──────────────┤
+        │           19 │
+        │            9 │
+        │            3 │
+        │           29 │
+        │            5 │
+        └──────────────┘
+        """
+        return ops.TimeDelta(left=self, right=other, part=part).to_expr()
+
 
 @public
 class TimeScalar(TemporalScalar, TimeValue):
@@ -338,6 +399,62 @@ def __rsub__(self, other: ops.Value[dt.Date | dt.Interval, ds.Any]):
 
     rsub = __rsub__
 
+    def delta(
+        self, other: datetime.date | Value[dt.Date], part: str
+    ) -> ir.IntegerValue:
+        """Compute the number of `part`s between two dates.
+
+        ::: {.callout-note}
+        ## The order of operands matches standard subtraction
+
+        The second argument is subtracted from the first.
+        :::
+
+        Parameters
+        ----------
+        other
+            A date expression
+        part
+            The unit of time to compute the difference in
+
+        Returns
+        -------
+        IntegerValue
+            The number of `part`s between `self` and `other`
+
+        Examples
+        --------
+        >>> import ibis
+        >>> ibis.options.interactive = True
+        >>> start = ibis.date("1992-09-30")
+        >>> end = ibis.date("1992-10-01")
+        >>> end.delta(start, "day")
+        1
+        >>> prez = ibis.examples.presidential.fetch()
+        >>> prez.mutate(
+        ...     years_in_office=prez.end.delta(prez.start, "year"),
+        ...     hours_in_office=prez.end.delta(prez.start, "hour"),
+        ... ).drop("party")
+        ┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
+        ┃ name       ┃ start      ┃ end        ┃ years_in_office ┃ hours_in_office ┃
+        ┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
+        │ string     │ date       │ date       │ int64           │ int64           │
+        ├────────────┼────────────┼────────────┼─────────────────┼─────────────────┤
+        │ Eisenhower │ 1953-01-20 │ 1961-01-20 │               8 │           70128 │
+        │ Kennedy    │ 1961-01-20 │ 1963-11-22 │               2 │           24864 │
+        │ Johnson    │ 1963-11-22 │ 1969-01-20 │               6 │           45264 │
+        │ Nixon      │ 1969-01-20 │ 1974-08-09 │               5 │           48648 │
+        │ Ford       │ 1974-08-09 │ 1977-01-20 │               3 │           21480 │
+        │ Carter     │ 1977-01-20 │ 1981-01-20 │               4 │           35064 │
+        │ Reagan     │ 1981-01-20 │ 1989-01-20 │               8 │           70128 │
+        │ Bush       │ 1989-01-20 │ 1993-01-20 │               4 │           35064 │
+        │ Clinton    │ 1993-01-20 │ 2001-01-20 │               8 │           70128 │
+        │ Bush       │ 2001-01-20 │ 2009-01-20 │               8 │           70128 │
+        │ …          │ …          │ …          │               … │               … │
+        └────────────┴────────────┴────────────┴─────────────────┴─────────────────┘
+        """
+        return ops.DateDelta(left=self, right=other, part=part).to_expr()
+
 
 @public
 class DateScalar(TemporalScalar, DateValue):
@@ -436,6 +553,65 @@ def __rsub__(self, other: ops.Value[dt.Timestamp | dt.Interval, ds.Any]):
 
     rsub = __rsub__
 
+    def delta(
+        self, other: datetime.datetime | Value[dt.Timestamp], part: str
+    ) -> ir.IntegerValue:
+        """Compute the number of `part`s between two timestamps.
+
+        ::: {.callout-note}
+        ## The order of operands matches standard subtraction
+
+        The second argument is subtracted from the first.
+        :::
+
+        Parameters
+        ----------
+        other
+            A timestamp expression
+        part
+            The unit of time to compute the difference in
+
+        Returns
+        -------
+        IntegerValue
+            The number of `part`s between `self` and `other`
+
+        Examples
+        --------
+        >>> import ibis
+        >>> ibis.options.interactive = True
+        >>> start = ibis.timestamp("1992-09-30 23:59:59")
+        >>> end = ibis.timestamp("1992-10-01 01:58:00")
+        >>> end.delta(start, "hour")
+        2
+        >>> data = '''tpep_pickup_datetime,tpep_dropoff_datetime
+        ... 2016-02-01T00:23:56,2016-02-01T00:42:28
+        ... 2016-02-01T00:12:14,2016-02-01T00:21:41
+        ... 2016-02-01T00:43:24,2016-02-01T00:46:14
+        ... 2016-02-01T00:55:11,2016-02-01T01:24:34
+        ... 2016-02-01T00:11:13,2016-02-01T00:16:59'''
+        >>> with open("/tmp/triptimes.csv", "w") as f:
+        ...     _ = f.write(data)
+        ...
+        >>> taxi = ibis.read_csv("/tmp/triptimes.csv")
+        >>> ride_duration = taxi.tpep_dropoff_datetime.delta(
+        ...     taxi.tpep_pickup_datetime, "minute"
+        ... ).name("ride_minutes")
+        >>> ride_duration
+        ┏━━━━━━━━━━━━━━┓
+        ┃ ride_minutes ┃
+        ┡━━━━━━━━━━━━━━┩
+        │ int64        │
+        ├──────────────┤
+        │           19 │
+        │            9 │
+        │            3 │
+        │           29 │
+        │            5 │
+        └──────────────┘
+        """
+        return ops.TimestampDelta(left=self, right=other, part=part).to_expr()
+
 
 @public
 class TimestampScalar(TemporalScalar, TimestampValue):
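A note on the semantics this diff pins down: the Trino comment and the MySQL `notimpl` reason in `test_delta` both hinge on *when* truncation to the unit happens. Here is a minimal stdlib-only sketch (hypothetical helper names, hours only, not part of the patch) contrasting the two orderings on the timestamp test case:

```python
from datetime import datetime


def hours_delta_truncate_operands(left: datetime, right: datetime) -> int:
    # The semantics ibis targets, and what the Trino registry above gets
    # by date_trunc-ing each operand before date_diff: truncate both
    # values to the hour, then subtract, i.e. count hour boundaries
    # crossed going from `right` to `left`.
    trunc = lambda dt: dt.replace(minute=0, second=0, microsecond=0)
    return int((trunc(left) - trunc(right)).total_seconds()) // 3600


def hours_delta_truncate_result(left: datetime, right: datetime) -> int:
    # Subtract first, truncate the result afterward: the behavior of a
    # raw TIMESTAMPDIFF-style difference, which is why the timestamp
    # case is marked notimpl for mysql in test_delta.
    return int((left - right).total_seconds()) // 3600


start = datetime(1992, 9, 30, 23, 59, 59)
end = datetime(1992, 10, 1, 1, 58, 0)
print(hours_delta_truncate_operands(end, start))  # 2, the value test_delta expects
print(hours_delta_truncate_result(end, start))    # 1, since only 1h58m01s elapsed
```

The two disagree whenever the operands straddle more unit boundaries than whole units elapsed, which is exactly the situation the `1992-09-30 23:59:59` → `1992-10-01 01:58:00` test case exercises.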