Merge pull request #11 from davidkelliott/feature/add-dates
Adding date created parameter
davidkelliott authored Jun 14, 2023
2 parents 75e3b79 + 7eba185 commit 70905c8
Showing 6 changed files with 70 additions and 32 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -8,15 +8,17 @@ The repo contains five python scripts, `cfr.py`, `df.py`, `ltfc.py` and `mttr.py

## How are the metrics calculated?

* `cfr.py` -- computes Change Failure Rate by retrieving the last 1000 workflow runs against the `main` branch and then dividing the number of unsuccesful runs by the total number of runs and multiplying bu 100% to get a percentage. Assumptions are made regarding what each failure represents, and I offer no opinion on how reasonable this assumption is.
* `df.py` -- computes Deployment Frequency by retrieving the last 1000 workflow runs against the `main` branch, computing the number of days between the first and the last run and the dividing the number of runs by the number of days to get an approximate number of deployments to production over the 24 hour period.
* `ltfc.py` -- computes Lead Time for Change and is the longest-executing script for large number of PRs, as it must retrieve commits for every merged PR. The script only retrieves commits for up to 500 merged PRs as a way to balance the execution time and having large enough data set for the metric to be meaningfull. The metric is computed by averaging out the time period between the last commit to each PR (when development can be considered complete and the change is ready for production) and the time it is merged (kicking off the deployment into production.)
* `mttr.py` -- computes Mean Time to Recovery and takes failed workflow runs on main as a proxy for failures and subsequent successful execution of the same workflow as recovery. The reasonableness of these assumptions are left up to the judgement of the reader. Doubtless, using actual monitoring outputs and alerts would provide a more accurate measurement of this metric. The metric is computed by taking the mean of the periods between the first failure of the workflow run on main and the first susequent successful run of the same workflow. Failures with no subsequent successful runs are discarded. The sum of the lengths of the failure/success periods is divided by the total number of periods to compute the metric.
* `cfr.py` -- computes Change Failure Rate by retrieving a maximum of 1000 workflow runs against the `main` branch between two dates, then dividing the number of unsuccessful runs by the total number of runs and multiplying by 100 to get a percentage. Assumptions are made regarding what each failure represents, and I offer no opinion on how reasonable these assumptions are.
* `df.py` -- computes Deployment Frequency by retrieving a maximum of 1000 workflow runs against the `main` branch between two dates, computing the number of days between the first and the last run, and then dividing the number of runs by the number of days to get an approximate number of deployments to production per 24-hour period.
* `ltfc.py` -- computes Lead Time for Change and is the longest-executing script for a large number of PRs, as it must retrieve commits for every merged PR. The script only retrieves commits for up to 500 merged PRs as a way to balance the execution time against having a large enough data set for the metric to be meaningful. It then checks whether each PR falls within the date range. The metric is computed by averaging the time period between the last commit to each PR (when development can be considered complete and the change is ready for production) and the time it is merged (kicking off the deployment into production).
* `mttr.py` -- computes Mean Time to Recovery between two dates and takes failed workflow runs on main as a proxy for failures and the subsequent successful execution of the same workflow as recovery. The reasonableness of these assumptions is left up to the judgement of the reader. Doubtless, using actual monitoring outputs and alerts would provide a more accurate measurement of this metric. The metric is computed by taking the mean of the periods between the first failure of the workflow run on main and the first subsequent successful run of the same workflow. Failures with no subsequent successful runs are discarded. The sum of the lengths of the failure/success periods is divided by the total number of periods to compute the metric.

## How to run them?

These scripts rely heavily on workflow runs, which only authenticated users have access to, so running them requires a [GitHub Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) with enough permissions to get workflow runs for GitHub Actions on the target repositories. The scripts retrieve the token from the `ACCESS_TOKEN` environment variable. With the envvar unset, the scripts will fail with authentication errors.

Each script takes as a parameter a `json` file which contains a list of repositories against which it should run. See `modernisation-platform.json` for an example. The reason for the `json` file is that the scripts don't compute DORA metrics for a single repo; the metrics are computed for a team, with the name of the file used as the team name in the script outputs. The list should contain all repositories that hold team-managed code, and the metrics are computed over all the repos in the list.
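
As a rough illustration of the team-file convention described above, the sketch below reads a file laid out like `modernisation-platform.json` (a top-level `repos` list) and derives the team name from the file name. The helper name `load_team` is hypothetical and is not part of the scripts, and stripping the directory with `os.path.basename` is an assumption made here for tidier output.

```python
import json
import os


def load_team(path):
    """Return (team_name, repos) for a team JSON file.

    Hypothetical helper: the team name is the file name without its
    extension, and the repositories come from the top-level "repos" list.
    """
    team_name, _extension = os.path.splitext(os.path.basename(path))
    with open(path, "r") as f:
        repos = json.load(f)["repos"]
    return team_name, repos
```

Calling `load_team("modernisation-platform.json")` on a file of that shape would give the team name `modernisation-platform` together with its list of repository names.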

To execute each script, run `python3 script_name.py team.json`
To execute each script, run `python3 script_name.py team.json 2023-04-01..2023-05-01`

Note: The date range runs from the beginning of the first day to the beginning of the last day, e.g. `2023-04-01 00:00:00` to `2023-05-01 00:00:00`.
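
The date-range argument can be split into two midnight timestamps along the lines of the sketch below. This is only an illustration of the `START..END` format; the function name `parse_date_range` and its validation checks are assumptions, not code from this commit.

```python
from datetime import datetime


def parse_date_range(date_query):
    """Split 'YYYY-MM-DD..YYYY-MM-DD' into two midnight datetimes."""
    start_text, separator, end_text = date_query.partition("..")
    if not separator:
        raise ValueError(f"expected START..END, got {date_query!r}")
    start = datetime.strptime(start_text, "%Y-%m-%d")
    end = datetime.strptime(end_text, "%Y-%m-%d")
    if start > end:
        raise ValueError("start date is after end date")
    return start, end


start, end = parse_date_range("2023-04-01..2023-05-01")
print(start, end)  # 2023-04-01 00:00:00 2023-05-01 00:00:00
```

Because both ends are parsed as midnight, a timestamp later on the end date (for example `2023-05-01 12:00:00`) falls outside the range.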
3 changes: 2 additions & 1 deletion cfr.py
@@ -13,6 +13,7 @@
# set up the command-line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('filename', help='path to the input JSON file')
parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
args = parser.parse_args()

# load the repository names from a JSON file
@@ -27,7 +28,7 @@
per_page = 100
for repo in repos:
# Define the query parameters to retrieve all workflow runs
params = {"branch": "main", "status": "completed", "per_page": per_page}
params = {"branch": "main", "status": "completed", "per_page": per_page, "created": args.date_query}

# Retrieve the workflow runs for the given repository using the provided query parameters
workflow_runs = get_workflow_runs(OWNER, repo, ACCESS_TOKEN, params)
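
The percentage described in the README for `cfr.py` can be sketched as follows. The rest of `cfr.py` is not shown in this diff, so treating every run whose `conclusion` is not `"success"` as a failure is an assumption, and `change_failure_rate` is a hypothetical name.

```python
def change_failure_rate(workflow_runs):
    """Percentage of runs whose conclusion is not "success".

    Assumption: each run is a dict with a "conclusion" key, as in the
    GitHub Actions API; the real script may classify failures differently.
    """
    if not workflow_runs:
        return 0.0
    failed = sum(1 for run in workflow_runs if run.get("conclusion") != "success")
    return failed / len(workflow_runs) * 100


runs = [
    {"conclusion": "success"},
    {"conclusion": "failure"},
    {"conclusion": "success"},
    {"conclusion": "success"},
]
print(f"{change_failure_rate(runs):.1f}%")  # 25.0%
```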
3 changes: 2 additions & 1 deletion df.py
@@ -20,6 +20,7 @@
# set up the command-line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('filename', help='path to the input JSON file')
parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
args = parser.parse_args()

filename, file_extension = os.path.splitext(args.filename)
@@ -31,7 +32,7 @@
num_successful_runs = 0

for repo in repos:
params = {"branch": "main", "status": "success", "per_page": per_page}
params = {"branch": "main", "status": "success", "per_page": per_page, "created": args.date_query}
try:
runs += get_workflow_runs(OWNER,repo, ACCESS_TOKEN,params)
# Count the number of successful runs
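
The runs-per-day calculation described in the README for `df.py` might look roughly like this. The input shape (dicts with an ISO 8601 `created_at` value) follows the GitHub Actions API, while the function name and the guard against a zero-day span are assumptions added for the sketch.

```python
from datetime import datetime


def deployment_frequency(runs):
    """Runs per day between the earliest and the latest run.

    Assumption: each run is a dict with an ISO 8601 "created_at" value
    ending in "Z", as returned by the GitHub Actions API.
    """
    if not runs:
        return 0.0
    created = [
        datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        for run in runs
    ]
    days = (max(created) - min(created)).days
    # Guard against a zero-day span when all runs fall on the same day.
    return len(runs) / max(days, 1)
```

For three runs created on 1, 2 and 4 April, the span is three days, so the sketch reports one deployment per day.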
54 changes: 31 additions & 23 deletions ltfc.py
@@ -17,12 +17,17 @@
# set up the command-line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('filename', help='path to the input JSON file')
parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
args = parser.parse_args()

filename, file_extension = os.path.splitext(args.filename)

team_merged_pull_requests = 0
team_lead_time = timedelta()
date_query = args.date_query
date_range = date_query.split("..")
start_date = datetime.strptime(date_range[0], '%Y-%m-%d')
end_date = datetime.strptime(date_range[-1], '%Y-%m-%d')

# load the repository names from a JSON file
with open(args.filename, 'r') as f:
@@ -41,39 +46,42 @@
# Grabbing commits for all PRs takes too long. 500 is a reasonable compromise between execution time and accuracy

total_lead_time = timedelta()
num_pull_requests = len(merged_pull_requests[0:500:1])
num_pull_requests = 0
# print(f'Number of Pull Requests: {num_pull_requests}')
for pr in merged_pull_requests[0:500:1]:
# Get the time the PR was merged
merged_at = datetime.fromisoformat(pr["merged_at"][:-1])

# Get the time of the last commit to the PR branch prior to the merge
commits_url = pr["url"] + "/commits"
try:
commits = make_github_api_call(commits_url, ACCESS_TOKEN)
# for commit in commits:
# commit_date = datetime.fromisoformat(commit["commit"]["committer"]["date"][:-1])
# print(f"Commit date: {commit_date}")
if len(commits) > 0:
last_commit = commits[-1] # change to commits[0] to get first rather than commit.
commit_time = datetime.fromisoformat(last_commit["commit"]["committer"]["date"][:-1])
# print(f"Commit date: {commit_time}")
else:
number = pr["number"]
if start_date <= merged_at <= end_date:
# print(f"PR {pr['number']} is between {start_date} and {end_date} with {merged_at}")
num_pull_requests += 1
# Get the time of the last commit to the PR branch prior to the merge
commits_url = pr["url"] + "/commits"
try:
commits = make_github_api_call(commits_url, ACCESS_TOKEN)
# for commit in commits:
# commit_date = datetime.fromisoformat(commit["commit"]["committer"]["date"][:-1])
# print(f"Commit date: {commit_date}")
if len(commits) > 0:
last_commit = commits[-1] # change to commits[0] to use the first commit rather than the last
commit_time = datetime.fromisoformat(last_commit["commit"]["committer"]["date"][:-1])
# print(f"Commit date: {commit_time}")
else:
commit_time = merged_at
except Exception as e:
# Log message if there's a problem retrieving commits
print(f"Error retrieving commits: {e}")
commit_time = merged_at
except Exception as e:
# Log message if there's a problem retrieving commits
print(f"Error retrieving commits: {e}")
commit_time = merged_at
break
break

# Calculate the lead time for the pull request
lead_time = merged_at - commit_time
total_lead_time += lead_time
# Calculate the lead time for the pull request
lead_time = merged_at - commit_time
total_lead_time += lead_time
team_lead_time += total_lead_time
team_merged_pull_requests += num_pull_requests

if team_merged_pull_requests > 0:
mean_lead_time = team_lead_time / team_merged_pull_requests
print(f"\033[32m\033[1mMean lead time for {filename} team over {team_merged_pull_requests} merged pull requests: {mean_lead_time.days} days, {mean_lead_time.seconds // 3600} hours, {(mean_lead_time.seconds % 3600) // 60} minutes\033[0m")
else:
print("No merged pull requests found.")
print("No merged pull requests found.")
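
The date filter and averaging in the `ltfc.py` hunk above can be exercised without network access using the sketch below. It takes pre-fetched data rather than calling the GitHub API; the `last_commit_at` key and the function name `mean_lead_time` are hypothetical stand-ins for the commit lookup the real script performs.

```python
from datetime import datetime, timedelta


def mean_lead_time(pull_requests, start_date, end_date):
    """Mean of (merged_at - last_commit_at) for PRs merged in the range.

    Hypothetical input: dicts with "merged_at" and "last_commit_at"
    ISO 8601 strings ending in "Z". The real script fetches the commit
    time from the GitHub API instead.
    """
    total = timedelta()
    count = 0
    for pr in pull_requests:
        # Strip the trailing "Z" so fromisoformat yields a naive datetime.
        merged_at = datetime.fromisoformat(pr["merged_at"][:-1])
        if not (start_date <= merged_at <= end_date):
            continue
        commit_time = datetime.fromisoformat(pr["last_commit_at"][:-1])
        total += merged_at - commit_time
        count += 1
    # None signals that no PR was merged inside the range.
    return total / count if count else None
```

Only PRs merged inside the range count towards the mean, which mirrors the change from `len(merged_pull_requests[0:500:1])` to a counter incremented inside the date check.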
27 changes: 26 additions & 1 deletion modernisation-platform.json
@@ -1,6 +1,31 @@
{
"repos": [
"modernisation-platform",
"modernisation-platform-environments"
"modernisation-platform-terraform-ec2-autoscaling-group",
"modernisation-platform-ami-builds",
"modernisation-platform-terraform-ec2-instance",
"modernisation-platform-configuration-management",
"modernisation-platform-terraform-ecs",
"modernisation-platform-cp-network-test",
"modernisation-platform-terraform-ecs-cluster",
"modernisation-platform-terraform-environments",
"modernisation-platform-github-oidc-provider",
"modernisation-platform-terraform-iam-superadmins",
"modernisation-platform-github-oidc-role",
"modernisation-platform-terraform-lambda-function",
"modernisation-platform-terraform-loadbalancer",
"modernisation-platform-incident-response",
"modernisation-platform-terraform-member-vpc",
"modernisation-platform-infrastructure-test",
"modernisation-platform-instance-scheduler",
"modernisation-platform-terraform-pagerduty-integration",
"modernisation-platform-terraform-aws-vm-import",
"modernisation-platform-terraform-s3-bucket",
"modernisation-platform-terraform-baselines",
"modernisation-platform-terraform-s3-bucket-replication-role",
"modernisation-platform-terraform-bastion-linux",
"modernisation-platform-terraform-ssm-patching",
"modernisation-platform-terraform-cross-account-access",
"modernisation-platform-terraform-trusted-advisor"
]
}
3 changes: 2 additions & 1 deletion mttr.py
@@ -18,6 +18,7 @@
# set up the command-line argument parser
parser = argparse.ArgumentParser()
parser.add_argument('filename', help='path to the input JSON file')
parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
args = parser.parse_args()


@@ -36,7 +37,7 @@
for repo in repos:

# Get all workflow runs on the main branch
params = {"branch": "main", "per_page": per_page}
params = {"branch": "main", "per_page": per_page, "created": args.date_query}
try:
repo_run = get_workflow_runs(OWNER,repo, ACCESS_TOKEN,params)
print(f"Retrieved {len(repo_run)} workflow runs for {OWNER}/{repo}")
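
The recovery calculation described in the README for `mttr.py` (first failure of a workflow, paired with its first subsequent success, unrecovered failures discarded) could be sketched as below. The body of `mttr.py` is not part of this diff, so the function name and the choice of the `workflow_id`, `conclusion` and `created_at` fields are assumptions.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(runs):
    """Mean gap between a workflow's first failure and its next success.

    Assumption: each run is a dict with "workflow_id", "conclusion" and
    an ISO 8601 "created_at" value; the field choice is illustrative.
    """
    open_failures = {}  # workflow_id -> time of first unrecovered failure
    periods = []
    for run in sorted(runs, key=lambda r: r["created_at"]):
        when = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        workflow = run["workflow_id"]
        if run["conclusion"] == "failure":
            # Only the first failure in a streak starts the clock.
            open_failures.setdefault(workflow, when)
        elif run["conclusion"] == "success" and workflow in open_failures:
            periods.append(when - open_failures.pop(workflow))
    # Failures still open at the end have no recovery and are discarded.
    return sum(periods, timedelta()) / len(periods) if periods else None
```

Sorting on the raw `created_at` strings relies on every timestamp sharing the same ISO 8601 format, in which case lexical order matches chronological order.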
