
Commit 10509b1

Merge pull request #374 from smilingprogrammer/week-11/12-docs
chore(docs): Data pipeline for safaa week 11 and week 12 blog

Reviewed-by: [email protected]
2 parents cc2a73d + aed0b62 commit 10509b1

7 files changed: +79 -4 lines changed


docs/2025/data-pipeline/updates/2025-07-30.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>
 -->

 # WEEK 9
-*(July 23, 2025)*
+*(July 30, 2025)*

 ## Attendees:
 - [Ayush Kumar Bhardwaj](https://github.com/hastagAB)

docs/2025/data-pipeline/updates/2025-08-06.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>
 -->

 # WEEK 10
-*(July 23, 2025)*
+*(August 06, 2025)*

 ## Attendees:
 - [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
@@ -27,7 +27,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>

 * To successfully integrate the testing process into the pipeline, I introduced the entity_recognizer and declutter_model folders into the directory where we saved our model during training.
 * This adjustment will allow us to test the model we saved in the new path from training.
-* With this we will be able to test and visualize our testing metrics anytime the pipeline runs. This will enable us to make decisions in the future on the next steps for any newly trained model coming from the pipeline, based on the metrics.
+* With this, we will be able to test and visualize our testing metrics anytime the pipeline runs. This will enable us to make decisions in the future on the next steps for any newly trained model coming from the pipeline, based on the metrics.
 ![image](/img/data-pipeline/test.png)


Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
title: Week 11
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>
-->

# WEEK 11
*(August 13, 2025)*

## Attendees:
- [Ayush Kumar Bhardwaj](https://github.com/hastagAB)
- [Kaushlendra Pratap](https://github.com/Kaushl2208)


## Engagements

This week we automated raising a PR from the retraining pipeline. The feature adds the following to the pipeline:
- Extraction of the retrained model's metrics from the workflow
- Creation of a new branch for each triggered retraining run, named with a timestamp
- Automated PR creation using a GitHub bot
- Inclusion of the metrics in the PR description
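
The branch naming and PR description steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `make_branch_name` and `format_pr_body`, the `retrain` prefix, and the metric names are assumptions, and the metric values are placeholders rather than results from the pipeline.

```python
from datetime import datetime, timezone


def make_branch_name(now: datetime, prefix: str = "retrain") -> str:
    """Build a branch name that embeds the run timestamp (hypothetical scheme)."""
    return f"{prefix}-{now.strftime('%Y%m%d-%H%M%S')}"


def format_pr_body(metrics: dict[str, float]) -> str:
    """Render the metrics as a Markdown table suitable for a PR description."""
    lines = ["## Retraining metrics", "", "| Metric | Value |", "| --- | --- |"]
    for name, value in sorted(metrics.items()):
        lines.append(f"| {name} | {value:.4f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real results.
    example_metrics = {"precision": 0.5, "recall": 0.25}
    print(make_branch_name(datetime.now(timezone.utc)))
    print(format_pr_body(example_metrics))
```

With a timestamp in the branch name, runs started at different seconds get distinct branches, and a table in the description lets reviewers compare metrics without opening the workflow logs.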

I also fixed the existing preprocessing and decluttering steps so that DataFrame alignment is maintained while the pipeline is running:

- Updated `preprocess_data` and `declutter_data` to return a Series with the original index
- Ensured the copyright column is updated in place without breaking row alignment
- Resolved the length-mismatch errors during pipeline execution
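
The index-preserving idea behind these fixes can be illustrated with a small pandas sketch. The body of `preprocess_data` below is a hypothetical stand-in (simple whitespace normalisation), and the sample rows and index labels are made up. Only the pattern of returning a Series that keeps the caller's index, then assigning by label, reflects what is described above.

```python
import pandas as pd


def preprocess_data(texts: pd.Series) -> pd.Series:
    """Hypothetical stand-in: normalise whitespace and keep the input index."""
    return texts.astype(str).str.strip().str.replace(r"\s+", " ", regex=True)


# Made-up sample data with a non-default index.
df = pd.DataFrame(
    {"copyright": ["  Copyright 2025  Foo ", "junk", "(c)  Bar"]},
    index=[10, 20, 30],
)

# Process only a subset of rows; its index is [10, 30], not [0, 1].
subset = df.loc[df["copyright"] != "junk", "copyright"]
cleaned = preprocess_data(subset)

# Assigning through .loc aligns on index labels, so row 20 is untouched
# and the shorter result does not cause a length mismatch.
df.loc[cleaned.index, "copyright"] = cleaned
print(df)
```

By contrast, assigning a plain list of two values to the whole three-row column would raise a length error, which is the kind of failure an index-aligned Series avoids.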

## Meeting Discussion:
* This week, I discussed with my mentors what I have done so far regarding creating a new PR from a separate branch.
* I presented the PRs that were raised when we tested the functionality.

![image](/img/data-pipeline/pipetest.png)

## Subsequent Steps
I will continue fixing the remaining errors, such as the precision value that was not showing in the metrics of the raised PR, among others.

docs/2025/data-pipeline/updates/2025.07.09.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>
 -->

 # WEEK 5
-*(July 02, 2025)*
+*(July 09, 2025)*

 ## Attendees:
 - [Kaushlendra Pratap](https://github.com/Kaushl2208)
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
---
title: Week 12
author: Abdulsobur Oyewale
tags: [gsoc25, Data Pipeline for Safaa]
---

<!--
SPDX-License-Identifier: CC-BY-SA-4.0

SPDX-FileCopyrightText: 2025 Abdulsobur Oyewale <[email protected]>
-->

# WEEK 12
*(August 20, 2025)*

## Attendees:
- [Kaushlendra Pratap](https://github.com/Kaushl2208)
- Sushant Kumar

## Engagements
This week I focused mainly on documentation. I wrote:
- My GSoC final evaluation submission.
- The project documentation: what we have accomplished so far and what still needs to be included.

I also took time to modify the current `pipeline.yml`, for example to fix the precision metric that was not being exported and displayed properly in previously created PRs.

![image](/img/data-pipeline/pipetest.png)
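
The post does not say how `pipeline.yml` passes metrics between steps. One common GitHub Actions mechanism is to append `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable, and a metric that is never written there shows up empty in later steps. A minimal sketch of that approach, assuming a hypothetical helper `export_metrics` and placeholder metric values:

```python
import os


def export_metrics(metrics: dict[str, float], output_path: str) -> None:
    """Append one name=value line per metric to a step-output file."""
    with open(output_path, "a", encoding="utf-8") as fh:
        for name, value in metrics.items():
            fh.write(f"{name}={value:.4f}\n")


# Example use inside a workflow step (placeholder values, hypothetical names):
#   export_metrics({"precision": 0.5, "recall": 0.25}, os.environ["GITHUB_OUTPUT"])
```

Writing every metric through one helper makes it harder to export some values and silently omit another, such as precision.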

## Meeting Discussion:

I had a meeting with my mentor, and we mainly discussed what needs to be included in my evaluation documentation and the requirements for the project documentation.

I was also given links to two repositories that I can use as a guide for my evaluation documentation.