This post serves as a technical journal for the development process of the
concluding stretch of Automating Quantifying the Commons, a project initiative
for the 2024 Google Summer of Code program. Please visit **[Part 1][part1]** for more context
if you haven't already done so.

At the point of the midterm evaluation, I successfully completed Phases 1, 2, and 3
(`fetch`, `process`, and `report`) of the Google Custom Search (GCS) data source, with working `README` report generation
for each quarter. My documented goal for the second half of the period was to complete baseline automation
software for these processes across all data sources.


## Development Process
***

### I. Midpoint Reassessment
If you read my previous post, you might have seen that my next steps involved completing the phases for the remaining data sources.
However, I soon realized that the GCS phases, along with the base analysis and visualization code from the Data Discovery Program,
already serve as a standard reference for these tasks. Given that the primary goal of this project is to develop automation software
for these phases, my mentor suggested shifting the focus of the final time period towards programming the Git functions for automation.
This approach, though it would require more time and effort, would ensure that anyone working on the remaining data sources could easily integrate
them using the existing code as a reference.

### II. GitHub Actions Development
We chose GitHub Actions to host our CI/CD workflows. Workflows are written in YAML, and since I had never used YAML before,
I needed to learn and familiarize myself with this new technology. As with learning any new technology, there were challenges,
particularly in developing the Git automation: for example, I would encounter errors during workflow runs with no clear way to debug them.
These challenges are why my mentor emphasized focusing on the Git programming first.

In my previous post, I shared three strategies that helped me familiarize myself with new technology during the first half of the summer. Here, I’m sharing two additional strategies that were particularly useful for GitHub Actions programming:

1. **GitHub Actions Extension for Visual Studio Code:** As I was using VSCode for development, I initially struggled to debug issues during workflow runs. Discovering the GitHub Actions Extension for VSCode was a game-changer. The extension highlights issues directly in the workflow file, making it much easier to diagnose and fix problems. I highly recommend searching for extensions relevant to whatever development task you're working on, as having the right tools can make programming much easier.

2. **Creating Mini-Tasks for Experimentation:** I set up my own GitHub repository with minimal, functional code to experiment with GitHub Actions in a low-risk environment (a sketch of such a sandbox workflow follows this list). This approach facilitated easier debugging and comparison, helping me understand why certain things weren’t working. Although I gained more repository privileges after being accepted for GSoC, I still didn’t have the same access level as my mentor. By using a separate repository, I gained a better understanding of GitHub Actions and was able to interpret error logs more effectively. For instance, I realized that the automation wasn’t working initially due to outdated repository secrets, which I was able to deduce from the logs without having access to the secrets themselves.
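
For illustration, a sandbox workflow in a personal repository can be as small as the sketch below. The file name and steps are hypothetical, not taken from the Quantifying repository; triggering on every push gives fast feedback while learning to read error logs:

```yaml
# .github/workflows/experiment.yml (hypothetical sandbox workflow)
name: Actions Experiment

on:
  push: # run on every push for immediate feedback
  workflow_dispatch: # also allow manual runs from the Actions tab

jobs:
  experiment:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Inspect the event context
        # Dumping the event payload shows what information
        # a workflow actually receives from GitHub
        run: cat "$GITHUB_EVENT_PATH"
```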

After successfully compiling the initial steps, I focused on refining the scripts for optimal performance. I moved the commit functions into a shared module, which reduced the risk of crashes by allowing functions to be called within individual scripts rather than directly in the YAML workflow. Once the workflows ran successfully, I implemented Cron functions to schedule them quarterly.
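
To make this concrete, here is a minimal sketch of what such a scheduled workflow can look like. The file name, script path, Python version, and secret name are hypothetical placeholders rather than the exact ones in the Quantifying repository:

```yaml
# .github/workflows/quarterly-fetch.yml (hypothetical name and path)
name: Quarterly Fetch

on:
  schedule:
    # Cron runs at 00:00 UTC on the 1st of January, April, July, and October
    - cron: "0 0 1 1,4,7,10 *"
  workflow_dispatch: # allow manual runs while testing

jobs:
  fetch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run fetch script
        # The script calls the shared commit function itself, so the
        # YAML stays thin and there is less scope for the workflow to crash
        run: python scripts/1-fetch/gcs_fetch.py
        env:
          GCS_DEVELOPER_KEY: ${{ secrets.GCS_DEVELOPER_KEY }}
```

Keeping the commit logic in the Python layer rather than the YAML also means it can be tested and reused by every data source's script.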

### III. Engineering a Custom Error Handling and Exception System
A key innovation in this project was the creation of a custom `QuantifyingException` class tailored specifically for the unique needs of the data pipeline.
While testing this system across all phases, I made sure to purposely include "errors" in the data and scripts to confirm that each phase caught, logged, and reported them as intended.
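
As a rough sketch of the idea (the class body, module path, and call site here are illustrative assumptions, not the exact implementation), the custom exception carries pipeline-specific context, such as an exit code, that generic exceptions would lose:

```python
# shared/exceptions.py (illustrative module path)
import logging
import sys

logger = logging.getLogger(__name__)


class QuantifyingException(Exception):
    """Exception for failures specific to the Quantifying data pipeline."""

    def __init__(self, message, exit_code=1):
        self.exit_code = exit_code
        super().__init__(message)


def run_fetch_phase():
    # A phase raises the custom exception with a meaningful exit code
    raise QuantifyingException("GCS fetch failed: API quota exceeded", exit_code=2)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    try:
        run_fetch_phase()
    except QuantifyingException as e:
        # One uniform place to log pipeline errors and exit cleanly
        logger.error(e)
        sys.exit(e.exit_code)
```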

Upon completion of a robust error and exception handling system, I completed all phase outlines of the remainder of the data sources. For fetching data from these sources, I have developed
codebases combining the GCS fetch system and the original Data Discovery API fetching for a complete fetching system. However, it should be noted that I have not actually fetched data from these
APIs using the new codebase, as Timid Robot will undertake an initiative to add GitHub bots for the API keys after the GSoC period. This is a matter of best practice, as it is fundamental
to create dedicated accounts for clear API usage and automated git commits. Therefore, these fetch files may need to be slightly tweaked after that, which will be discussed in **Next Steps**.
However, I have made sure to utilize fake data to ensure that the third phase successfully generates reports within the respective README file for ALL data sources.
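
As an illustrative sketch of that fallback (the function, environment variable, and data values are assumptions, not the actual implementation), a fetch script can return placeholder data whenever real credentials are not yet available, so the downstream `process` and `report` phases still have something to run against:

```python
import os


def fetch_license_counts(api_key=None):
    """Return license counts from the live API, or fake data when no
    API key is available (e.g., before the bot accounts are created)."""
    api_key = api_key or os.environ.get("API_KEY")
    if not api_key:
        # Placeholder data keeps the process/report phases testable
        return {"CC BY 4.0": 1000, "CC BY-SA 4.0": 500, "CC0 1.0": 250}
    # Live fetching is deferred until dedicated bot accounts exist
    raise NotImplementedError("Enable after GitHub bot API keys are set up")
```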

### IV. Finalized Flow of System + Data
In Part 1, I had shared the initial data flow diagram (DFD) for arranging the codebase. By the end of the program, however, the DFD and the overall system had solidified into something different.
Below is the final data flow diagram, which establishes an official framework for future endeavors.

![DFD](Final DFD.png)

## Final Conclusions
***

### I. All Deliverables Completed Over the Course of the Program

Although this 12-week period allowed significant expansion of the Quantifying codebase, there were still time and resource constraints that we had to consider; primarily, the lack
of data we could collect using the given APIs over this time period. However, as mentioned earlier, with strategic implementation I was still able to complete the summer goal of developing baseline
automation software for data gathering, flow, and report generation, ensuring the scripts run on a quarterly basis. The **Next Steps** section will elaborate on how this software will be solidified over
the upcoming quarters and years.

130+ commits, 7,615+ net code additions, and 360+ hours of work later, I present ten pivotal deliverables that I have completed over the summer period:

| Deliverable | Description|
| ------------- | ------------- |


### II. Acknowledgements, Impact, Next Steps
This project would not have been possible without the constant guidance and insights of my mentors: **[Timid Robot Zehta][timid-robot]** (lead), **[Shafiya Heena][shafiya]** (supporting), and **[Sara Lovell][sara]** (supporting).
I appreciate how they created a safe space for working from the very beginning. I never felt hesitant to ask questions
and never felt out of place in the organization, despite my introductory-level skillset at the start. In fact, this openness allowed
me to undertake side projects that facilitated my growth. I truly believe that being able to work in an environment like this
has played a large role in my ability to perform well, and it was the sole reason behind the success of my work this summer.
As for overall impact, it is very evident that Creative Commons is integral to facilitating the sharing and utilization of creative works worldwide. With over 2.5 billion
licenses globally, Creative Commons and its open-source initiatives hold heavy impact, promising to empower researchers, policymakers, and stakeholders with up-to-date insights into the global
usage patterns of public domain and CC-licensed content. Therefore, I'm looking forward to witnessing the direct influence this project holds in paving the way for future advancements
in leveraging open content licenses globally. I am extremely grateful and honored to be able to play such a major role in contributing to this organization, and am excited to see
the future contributions I facilitate alongside other CC open-source developers.

As for next steps, I am opening several post-GSoC issues in the Quantifying repository that can be worked on by any open-source contributor.
These issues cover some of the necessary adjustments that will need to be made as we pass certain time periods and expand the codebase.
If you're interested in getting involved, please visit the **[Issues][issues]** page linked for your convenience.
Your contributions will be invaluable as we continue to enhance and expand this project,
and I’m eager to see the innovative solutions and improvements that will unfold these upcoming years!

## Additional Readings
***

- [Automating Quantifying the Commons: Part 1][part1] | Author: Naisha Sinha | Jul. 2024
- [Data Science Discovery: Quantifying the Commons][quantifying] | Author: Dun-Ming Huang (Brandon Huang) | Dec. 2022

[quantifying]: https://opensource.creativecommons.org/blog/entries/2022-12-07-berkeley-quantifying/
[logging]: https://github.com/creativecommons/quantifying/pull/97
[AWS-whitepaper]: https://docs.aws.amazon.com/whitepapers/latest/microservices-on-aws/distributed-data-management.html
[meta-whitepaper]: https://engineering.fb.com/2024/05/22/data-infrastructure/composable-data-management-at-meta/
[timid-robot]: https://opensource.creativecommons.org/blog/authors/TimidRobot/
[documentation]: https://unmarred-gym-686.notion.site/Automating-Quantifying-the-Commons-Documentation-441056ae02364d8a9a51d5e820401db5?pvs=4
[shafiya]: https://opensource.creativecommons.org/blog/authors/shafiya/
[sara]: https://opensource.creativecommons.org/blog/authors/sara/
[part1]: https://opensource.creativecommons.org/blog/entries/2024-07-10-automating-quantifying/
[issues]: https://github.com/creativecommons/quantifying/issues


