Merge pull request #11 from davidkelliott/feature/add-dates

Adding date created parameter
ministryofjustice · Jun 14, 2023 · 70905c8
2 parents 75e3b79 + 7eba185
Showing 6 changed files with 70 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -8,15 +8,17 @@ The repo contains five python scripts, `cfr.py`, `df.py`, `ltfc.py` and `mttr.py
 
 ## How are the metrics calculated?
 
-* `cfr.py` -- computes Change Failure Rate by retrieving the last 1000 workflow runs against the `main` branch and then dividing the number of unsuccesful runs by the total number of runs and multiplying bu 100% to get a percentage. Assumptions are made regarding what each failure represents, and I offer no opinion on how reasonable this assumption is. 
-* `df.py` -- computes Deployment Frequency by retrieving the last 1000 workflow runs against the `main` branch, computing the number of days between the first and the last run and the dividing the number of runs by the number of days to get an approximate number of deployments to production over the 24 hour period.
-* `ltfc.py` -- computes Lead Time for Change and is the longest-executing script for large number of PRs, as it must retrieve commits for every merged PR. The script only retrieves commits for up to 500 merged PRs as a way to balance the execution time and having large enough data set for the metric to be meaningfull. The metric is computed by averaging out the time period between the last commit to each PR (when development can be considered complete and the change is ready for production) and the time it is merged (kicking off the deployment into production.)
-* `mttr.py` -- computes Mean Time to Recovery and takes failed workflow runs on main as a proxy for failures and subsequent successful execution of the same workflow as recovery. The reasonableness of these assumptions are left up to the judgement of the reader. Doubtless, using actual monitoring outputs and alerts would provide a more accurate measurement of this metric. The metric is computed by taking the mean of the periods between the first failure of the workflow run on main and the first susequent successful run of the same workflow. Failures with no subsequent successful runs are discarded. The sum of the lengths of the failure/success periods is divided by the total number of periods to compute the metric.
+* `cfr.py` -- computes Change Failure Rate by retrieving a maximum of 1000 workflow runs between two dates against the `main` branch, then dividing the number of unsuccessful runs by the total number of runs and multiplying by 100 to get a percentage. Assumptions are made regarding what each failure represents, and I offer no opinion on how reasonable these assumptions are.
+* `df.py` -- computes Deployment Frequency by retrieving a maximum of 1000 workflow runs between two dates against the `main` branch, computing the number of days between the first and the last run, and then dividing the number of runs by the number of days to get an approximate number of deployments to production per 24-hour period.
+* `ltfc.py` -- computes Lead Time for Change and is the longest-executing script for a large number of PRs, as it must retrieve commits for every merged PR. The script only considers up to 500 merged PRs as a way to balance the execution time against having a large enough data set for the metric to be meaningful. It then keeps only the PRs whose merge time falls within the date range. The metric is computed by averaging the time between the last commit to each PR (when development can be considered complete and the change is ready for production) and the time it is merged (kicking off the deployment into production).
+* `mttr.py` -- computes Mean Time to Recovery between two dates and takes failed workflow runs on main as a proxy for failures and the subsequent successful execution of the same workflow as recovery. The reasonableness of these assumptions is left up to the judgement of the reader. Doubtless, using actual monitoring outputs and alerts would provide a more accurate measurement of this metric. The metric is computed by taking the mean of the periods between the first failure of the workflow run on main and the first subsequent successful run of the same workflow. Failures with no subsequent successful runs are discarded. The sum of the lengths of the failure/success periods is divided by the total number of periods to compute the metric.
 
 ## How to run them?
 
 These scripts rely heavily on workflow runs which only authenticated users have access to, so running them requires having a [Github Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) with enough permissions to get workflow runs for Github Actions on the target repositories. The scripts will retrieve the token from the `ACCESS_TOKEN` environment variable. With the envvar unset, the scripts will fail with authentication errors. 
 
 Each script takes as a parameter a `json` file which contains a list of repositories against which it should run. See `modernisation-platform.json` for an example. The reason for the `json` is that the scripts don't compute DORA metrics for a single repo; the metrics are computed for a team, with the name of the file used as the team name in the script outputs. The list of repos should contain all repositories which hold the team-managed code. The metrics are computed over all the repos in the list. 
 
-To execute each script, run `python3 script_name.py team.json`
+To execute each script, run `python3 script_name.py team.json 2023-04-01..2023-05-01`
+
+Note: The date range runs from the beginning of each day, e.g. `2023-04-01 00:00:00` to `2023-05-01 00:00:00`.
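
The README note above can be illustrated with a minimal sketch of how a `START..END` argument turns into two midnight timestamps. The helper name `parse_date_range` and its validation are assumptions made for illustration; the scripts in this PR split the string inline and do not validate it.

```python
from datetime import datetime


def parse_date_range(date_query: str) -> tuple[datetime, datetime]:
    """Split 'YYYY-MM-DD..YYYY-MM-DD' into two datetimes at 00:00:00.

    Hypothetical helper: not part of the PR, shown only to make the
    'beginning of the day' behaviour concrete.
    """
    parts = date_query.split("..")
    if len(parts) != 2:
        raise ValueError(f"expected START..END, got {date_query!r}")
    # strptime with a date-only format leaves the time at midnight
    start, end = (datetime.strptime(p, "%Y-%m-%d") for p in parts)
    if start > end:
        raise ValueError("start date is after end date")
    return start, end


start, end = parse_date_range("2023-04-01..2023-05-01")
print(start, end)  # 2023-04-01 00:00:00 2023-05-01 00:00:00
```

Because the end bound is midnight at the start of the end day, anything timestamped later on `2023-05-01` would fall outside a range parsed this way.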
diff --git a/cfr.py b/cfr.py
@@ -13,6 +13,7 @@
 # set up the command-line argument parser
 parser = argparse.ArgumentParser()
 parser.add_argument('filename', help='path to the input JSON file')
+parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
 args = parser.parse_args()
 
 # load the repository names from a JSON file
@@ -27,7 +28,7 @@
 per_page = 100
 for repo in repos:
 # Define the query parameters to retrieve all workflow runs
-    params = {"branch": "main", "status": "completed", "per_page": per_page}
+    params = {"branch": "main", "status": "completed", "per_page": per_page, "created": args.date_query}
 
     # Retrieve the workflow runs for the given repository using the provided query parameters
     workflow_runs = get_workflow_runs(OWNER, repo, ACCESS_TOKEN, params)
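
For readers following the hunk above, here is a rough sketch of the Change Failure Rate arithmetic described in the README. It assumes each run is a dict with a `conclusion` field and that any conclusion other than `success` counts as a failure; the rest of `cfr.py` is not shown in this diff, so its actual rule may differ.

```python
def change_failure_rate(runs: list[dict]) -> float:
    """Percentage of runs that did not succeed.

    Assumption: 'unsuccessful' means conclusion != 'success'.
    """
    if not runs:
        return 0.0
    failed = sum(1 for run in runs if run.get("conclusion") != "success")
    return failed / len(runs) * 100


# Made-up sample data, not real workflow runs
sample_runs = [
    {"conclusion": "success"},
    {"conclusion": "failure"},
    {"conclusion": "success"},
    {"conclusion": "success"},
]
print(f"{change_failure_rate(sample_runs):.1f}%")  # 25.0%
```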

diff --git a/df.py b/df.py
@@ -20,6 +20,7 @@
 # set up the command-line argument parser
 parser = argparse.ArgumentParser()
 parser.add_argument('filename', help='path to the input JSON file')
+parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
 args = parser.parse_args()
 
 filename, file_extension = os.path.splitext(args.filename)
@@ -31,7 +32,7 @@
 num_successful_runs = 0
 
 for repo in repos:
-    params = {"branch": "main", "status": "success", "per_page": per_page}
+    params = {"branch": "main", "status": "success", "per_page": per_page, "created": args.date_query}
     try:
         runs += get_workflow_runs(OWNER,repo, ACCESS_TOKEN,params)
         # Count the number of successful runs
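
A small sketch of the Deployment Frequency calculation as the README describes it: number of runs divided by the number of days between the first and last run. The function name and the plain list of timestamps are assumptions for illustration; the remainder of `df.py` is not visible in this hunk.

```python
from datetime import datetime


def deployments_per_day(run_timestamps: list[str]):
    """Runs divided by whole days between the first and last run.

    Returns None when the span is under one day, since the division
    would be undefined. Hypothetical helper, not taken from df.py.
    """
    times = sorted(datetime.fromisoformat(t.rstrip("Z")) for t in run_timestamps)
    if len(times) < 2:
        return None
    days = (times[-1] - times[0]).days
    if days == 0:
        return None
    return len(times) / days


# Made-up timestamps spanning exactly two days
sample = [
    "2023-04-01T09:00:00Z",
    "2023-04-01T15:00:00Z",
    "2023-04-02T10:00:00Z",
    "2023-04-03T09:00:00Z",
]
print(deployments_per_day(sample))  # 2.0
```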

diff --git a/ltfc.py b/ltfc.py
@@ -17,12 +17,17 @@
 # set up the command-line argument parser
 parser = argparse.ArgumentParser()
 parser.add_argument('filename', help='path to the input JSON file')
+parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
 args = parser.parse_args()
 
 filename, file_extension = os.path.splitext(args.filename)
 
 team_merged_pull_requests = 0
 team_lead_time = timedelta()
+date_query = args.date_query
+date_range = date_query.split("..")
+start_date = datetime.strptime(date_range[0], '%Y-%m-%d')
+end_date = datetime.strptime(date_range[-1], '%Y-%m-%d')
 
 # load the repository names from a JSON file
 with open(args.filename, 'r') as f:
@@ -41,39 +46,42 @@
     # Grabbing commits for all PRs takes too long. 500 is a reasonable compromise between execution time and accuracy
 
     total_lead_time = timedelta()
-    num_pull_requests = len(merged_pull_requests[0:500:1])
+    num_pull_requests = 0
     # print(f'Number of Pull Requests: {num_pull_requests}')
     for pr in merged_pull_requests[0:500:1]:
         # Get the time the PR was merged
         merged_at = datetime.fromisoformat(pr["merged_at"][:-1])
-
-        # Get the time of the last commit to the PR branch prior to the merge
-        commits_url = pr["url"] + "/commits"
-        try:
-            commits = make_github_api_call(commits_url, ACCESS_TOKEN)
-            # for commit in commits:
-            #     commit_date = datetime.fromisoformat(commit["commit"]["committer"]["date"][:-1])
-            #     print(f"Commit date: {commit_date}")
-            if len(commits) > 0:
-                last_commit = commits[-1] # change to commits[0] to get first rather than commit.
-                commit_time = datetime.fromisoformat(last_commit["commit"]["committer"]["date"][:-1])
-                # print(f"Commit date: {commit_time}")
-            else:
+        number = pr["number"]
+        if start_date <= merged_at <= end_date:
+            # print(f"PR {number} is between {start_date} and {end_date} with {merged_at}")
+            num_pull_requests += 1
+            # Get the time of the last commit to the PR branch prior to the merge
+            commits_url = pr["url"] + "/commits"
+            try:
+                commits = make_github_api_call(commits_url, ACCESS_TOKEN)
+                # for commit in commits:
+                #     commit_date = datetime.fromisoformat(commit["commit"]["committer"]["date"][:-1])
+                #     print(f"Commit date: {commit_date}")
+                if len(commits) > 0:
+                    last_commit = commits[-1] # change to commits[0] to use the first commit rather than the last.
+                    commit_time = datetime.fromisoformat(last_commit["commit"]["committer"]["date"][:-1])
+                    # print(f"Commit date: {commit_time}")
+                else:
+                    commit_time = merged_at
+            except Exception as e:
+                # Log message if there's a problem retrieving commits
+                print(f"Error retrieving commits: {e}")
                 commit_time = merged_at
-        except Exception as e:
-            # Log message if there's a problem retrieving commits
-            print(f"Error retrieving commits: {e}")
-            commit_time = merged_at
-            break
+                break
 
-        # Calculate the lead time for the pull request
-        lead_time = merged_at - commit_time
-        total_lead_time += lead_time
+            # Calculate the lead time for the pull request
+            lead_time = merged_at - commit_time
+            total_lead_time += lead_time
     team_lead_time += total_lead_time
     team_merged_pull_requests += num_pull_requests
 
 if team_merged_pull_requests > 0:
     mean_lead_time = team_lead_time / team_merged_pull_requests
     print(f"\033[32m\033[1mMean lead time for {filename} team over {team_merged_pull_requests} merged pull requests: {mean_lead_time.days} days, {mean_lead_time.seconds // 3600} hours, {(mean_lead_time.seconds % 3600) // 60} minutes\033[0m")
 else:
-    print("No merged pull requests found.")
+    print("No merged pull requests found.")
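
The date filter added to `ltfc.py` can be sketched in isolation as follows. This is a simplified illustration, not the script itself: the `last_commit_at` field is hypothetical (the script fetches commits through the API instead), and here a PR is only counted once its lead time has been added.

```python
from datetime import datetime, timedelta


def mean_lead_time(prs, start_date, end_date):
    """Average (merged_at - last_commit_at) over PRs merged within the range.

    Returns (mean, count); mean is None when no PR falls in the range.
    """
    total = timedelta()
    counted = 0
    for pr in prs:
        merged_at = datetime.fromisoformat(pr["merged_at"].rstrip("Z"))
        if not (start_date <= merged_at <= end_date):
            continue  # merged outside the requested window
        last_commit_at = datetime.fromisoformat(pr["last_commit_at"].rstrip("Z"))
        total += merged_at - last_commit_at
        counted += 1
    return (total / counted if counted else None), counted


# Made-up pull requests; the third one is merged after the end date
sample_prs = [
    {"merged_at": "2023-04-10T12:00:00Z", "last_commit_at": "2023-04-10T10:00:00Z"},
    {"merged_at": "2023-04-20T09:00:00Z", "last_commit_at": "2023-04-19T09:00:00Z"},
    {"merged_at": "2023-05-02T09:00:00Z", "last_commit_at": "2023-05-01T09:00:00Z"},
]
mean, count = mean_lead_time(sample_prs, datetime(2023, 4, 1), datetime(2023, 5, 1))
print(mean, count)  # 13:00:00 2
```

Note that in the diff above, `num_pull_requests` is incremented before the commits are fetched, so a PR whose commit lookup fails is still counted, with a lead time of zero, before the loop breaks.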
diff --git a/modernisation-platform.json b/modernisation-platform.json
@@ -1,6 +1,31 @@
 {
     "repos": [
         "modernisation-platform",
-        "modernisation-platform-environments"
+        "modernisation-platform-terraform-ec2-autoscaling-group",
+        "modernisation-platform-ami-builds",
+        "modernisation-platform-terraform-ec2-instance",
+        "modernisation-platform-configuration-management",
+        "modernisation-platform-terraform-ecs",
+        "modernisation-platform-cp-network-test",
+        "modernisation-platform-terraform-ecs-cluster",
+        "modernisation-platform-terraform-environments",
+        "modernisation-platform-github-oidc-provider",
+        "modernisation-platform-terraform-iam-superadmins",
+        "modernisation-platform-github-oidc-role",
+        "modernisation-platform-terraform-lambda-function",
+        "modernisation-platform-terraform-loadbalancer",
+        "modernisation-platform-incident-response",
+        "modernisation-platform-terraform-member-vpc",
+        "modernisation-platform-infrastructure-test",
+        "modernisation-platform-instance-scheduler",
+        "modernisation-platform-terraform-pagerduty-integration",
+        "modernisation-platform-terraform-aws-vm-import",
+        "modernisation-platform-terraform-s3-bucket",
+        "modernisation-platform-terraform-baselines",
+        "modernisation-platform-terraform-s3-bucket-replication-role",
+        "modernisation-platform-terraform-bastion-linux",
+        "modernisation-platform-terraform-ssm-patching",
+        "modernisation-platform-terraform-cross-account-access",
+        "modernisation-platform-terraform-trusted-advisor"
     ]
 }
diff --git a/mttr.py b/mttr.py
@@ -18,6 +18,7 @@
 # set up the command-line argument parser
 parser = argparse.ArgumentParser()
 parser.add_argument('filename', help='path to the input JSON file')
+parser.add_argument('date_query', help='date range in the format 2023-04-01..2023-05-01')
 args = parser.parse_args()
 
 
@@ -36,7 +37,7 @@
 for repo in repos:
 
     # Get all workflow runs on the main branch
-    params = {"branch": "main", "per_page": per_page}
+    params = {"branch": "main", "per_page": per_page, "created": args.date_query}
     try:
         repo_run = get_workflow_runs(OWNER,repo, ACCESS_TOKEN,params)
         print(f"Retrieved {len(repo_run)} workflow runs for {OWNER}/{repo}")
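
As a reading aid for the MTTR rule in the README (first failure of a workflow to its first subsequent success, with unrecovered failures discarded), a minimal sketch follows. It assumes each run carries `workflow_id`, `conclusion` and `created_at` fields; how `mttr.py` itself pairs failures with recoveries is not shown in this hunk and may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_time_to_recovery(runs):
    """Mean of (first subsequent success - first failure) per workflow."""
    by_workflow = defaultdict(list)
    for run in runs:
        by_workflow[run["workflow_id"]].append(run)

    periods = []
    for workflow_runs in by_workflow.values():
        # ISO timestamps in the same format sort chronologically as strings
        workflow_runs.sort(key=lambda r: r["created_at"])
        failed_at = None
        for run in workflow_runs:
            when = datetime.fromisoformat(run["created_at"].rstrip("Z"))
            if run["conclusion"] == "failure" and failed_at is None:
                failed_at = when  # start of an outage
            elif run["conclusion"] == "success" and failed_at is not None:
                periods.append(when - failed_at)  # recovered
                failed_at = None
        # a trailing failure with no later success is discarded
    if not periods:
        return None
    return sum(periods, timedelta()) / len(periods)


# Made-up runs: workflow 1 recovers after 1h, workflow 2 after 3h,
# and workflow 2's final failure never recovers
sample_runs = [
    {"workflow_id": 1, "conclusion": "failure", "created_at": "2023-04-01T10:00:00Z"},
    {"workflow_id": 1, "conclusion": "failure", "created_at": "2023-04-01T10:30:00Z"},
    {"workflow_id": 1, "conclusion": "success", "created_at": "2023-04-01T11:00:00Z"},
    {"workflow_id": 2, "conclusion": "failure", "created_at": "2023-04-01T09:00:00Z"},
    {"workflow_id": 2, "conclusion": "success", "created_at": "2023-04-01T12:00:00Z"},
    {"workflow_id": 2, "conclusion": "failure", "created_at": "2023-04-01T13:00:00Z"},
]
print(mean_time_to_recovery(sample_runs))  # 2:00:00
```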