- Objective
- Summary of Challenges Encountered and Corresponding Solutions
- Recommendations
- How to Run Task 3
- Data Flow
- Additional Tasks and Data Collection Efforts to Close the Gap
- Appendix
Task 3 ("Unique Grant Recipient Approach Proof-of-Concept") is a test project to improve how we identify and support the organizations and individuals that receive grants. By trying out new tools and methods, we aim to make recipient identification faster, more accurate, and better tailored to program needs, and to build a framework for matching funding to recipients efficiently. The Raft Team (the team) aims to provide a Proof-of-Concept (PoC) approach for uniquely identifying ACF grantees.
For the PoC, the team worked with grant data from three Administration for Children and Families (ACF) Program Offices:
- Office of Head Start (OHS)
- Office of Family Assistance (OFA)
- Office of Family Violence Prevention and Services (OFVPS)
Sub-Objective | Description |
---|---|
SO1 | Establish a primary dataset and ETL process for loading and updating USA Spending Grants Data. |
SO2 | Identify entity evolution over time by mapping old and new versions of entities. |
SO3 | Enhance data quality by improving the completeness and accuracy of records. |
SO4 | Provide a PoC for entity reconciliation using USA Spending Grants Data. |
SO5 | Offer recommendations for resolving unmatched entities. |
Problem: Many entities change over time (e.g., refined names, updated Unique Entity Identifiers [UEIs]) without clear documentation, which makes them difficult to track.
Solution: Using auxiliary data sources (e.g., facility data, Bureau of Indian Affairs [BIA] name change notices) to create bridge event data that links old and new versions of each entity.
Pros:
- Leverages available data sources.
- Improves historical tracking.
- Builds scalable solutions for future program offices.
Cons:
- Limited data coverage across offices.
- Scalability challenges requiring continuous updates.
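The following is a minimal sketch of how bridge events could be applied. It assumes each event records an old identifier, the identifier that replaced it, and the source of the link. The `BridgeEvent` and `EntityLineage` names are hypothetical and are not part of the project's R code. A union-find structure chains the events so that every historical identifier resolves to one shared canonical key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeEvent:
    """One documented change linking an old entity version to a new one (hypothetical schema)."""
    old_id: str   # superseded identifier, e.g. an old UEI
    new_id: str   # identifier that replaced it
    source: str   # where the link came from, e.g. "BIA name change notice"


class EntityLineage:
    """Union-find over entity identifiers: IDs joined by bridge events share one root."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, entity_id: str) -> str:
        self._parent.setdefault(entity_id, entity_id)
        # Path halving: point each visited node at its grandparent to keep chains short.
        while self._parent[entity_id] != entity_id:
            self._parent[entity_id] = self._parent[self._parent[entity_id]]
            entity_id = self._parent[entity_id]
        return entity_id

    def add_event(self, event: BridgeEvent) -> None:
        old_root = self._find(event.old_id)
        new_root = self._find(event.new_id)
        if old_root != new_root:
            # Attach the old lineage under the new one, so a simple
            # A -> B -> C chain resolves to its newest identifier.
            self._parent[old_root] = new_root

    def canonical_id(self, entity_id: str) -> str:
        """Return the shared key for every identifier linked to `entity_id`."""
        return self._find(entity_id)
```

For example, after adding the events `OLD -> MID` and `MID -> NEW` in either order, `canonical_id("OLD")` returns `"NEW"`. An identifier with no bridge events resolves to itself.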
Problem: The documentation for USA Spending Grants Data is inconsistent and incomplete.
Solution: Deploying data validation scripts to identify and correct quality issues.
Pros:
- Improves data reliability and enables systematic identification of issues.
Cons:
- Does not address root causes of data inconsistencies.
- Requires ongoing maintenance.
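The sketch below is a hypothetical validation script in the spirit described above. It is not the project's `assert_grant_identifiers()`. It assumes each record carries `award_id` and `recipient_uei` fields, and it assumes that a valid UEI is exactly 12 letters or digits. The script flags missing or malformed UEIs. It also flags awards whose recipient UEI differs across records, which can signal an undocumented entity change.

```python
import re
from collections import defaultdict

# Assumption: a valid UEI is exactly 12 uppercase letters or digits.
UEI_PATTERN = re.compile(r"[A-Z0-9]{12}")


def validate_grant_records(records: list[dict]) -> dict[str, list[str]]:
    """Flag common quality issues; field names are hypothetical, not the USA Spending schema."""
    issues: dict[str, list[str]] = {"missing_uei": [], "malformed_uei": [], "award_uei_conflict": []}
    ueis_by_award: dict[str, set[str]] = defaultdict(set)

    for rec in records:
        award_id = rec["award_id"]
        # Treat None and blank strings alike; normalize case before checking the format.
        uei = (rec.get("recipient_uei") or "").strip().upper()
        if not uei:
            issues["missing_uei"].append(award_id)
            continue
        if UEI_PATTERN.fullmatch(uei) is None:
            issues["malformed_uei"].append(award_id)
        ueis_by_award[award_id].add(uei)

    # The same award tied to more than one UEI across records (e.g., fiscal years).
    issues["award_uei_conflict"] = sorted(
        award for award, ueis in ueis_by_award.items() if len(ueis) > 1
    )
    return issues
```

Such a check surfaces issues for review. As noted above, it does not fix the root cause in the source data.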
Problem: Without a validation dataset, entity evolutions over time had to be validated manually.
Solution: Developing a validation dataset to evaluate reconciliation methods.
Pros:
- Improves accuracy and enables scalable reconciliation.
- Supports iterative improvement and increases confidence in data reliability.
Cons:
- Requires significant resources for development.
- Partial solutions still require manual effort.
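One way a validation dataset could be used to evaluate reconciliation methods is to compute pairwise precision and recall. This is sketched here as an assumption, not as the team's chosen metric. Both the validation set and a reconciliation run assign each record to an entity cluster. The score then compares the record pairs each one places together. The function names are hypothetical.

```python
from itertools import combinations


def pairs_from_clusters(assignments: dict[str, str]) -> set[frozenset[str]]:
    """Convert a record -> cluster mapping into the set of record pairs placed together."""
    members_by_cluster: dict[str, list[str]] = {}
    for record, cluster in assignments.items():
        members_by_cluster.setdefault(cluster, []).append(record)
    return {
        frozenset(pair)
        for members in members_by_cluster.values()
        for pair in combinations(members, 2)
    }


def pairwise_scores(predicted: dict[str, str], truth: dict[str, str]) -> tuple[float, float]:
    """Pairwise precision and recall of predicted clusters against a labeled validation set."""
    predicted_pairs = pairs_from_clusters(predicted)
    true_pairs = pairs_from_clusters(truth)
    hits = len(predicted_pairs & true_pairs)
    # Convention: with no pairs to judge, the score is vacuously perfect.
    precision = hits / len(predicted_pairs) if predicted_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall
```

Low precision means the method merges distinct grantees. Low recall means it misses links, such as undocumented name or UEI changes.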
- Facilitate knowledge-sharing workshops with ACF stakeholders.
- Capture institutional knowledge and validate entity reconciliation results.
- Extend facility data collection to other offices beyond OHS.
- Address confidentiality concerns in data collection.
- USAspending.gov provides critical data for entity reconciliation.
- Data cleaning, normalization, and validation improve grant recipient identification.
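As an illustration of the cleaning and normalization step, the following hypothetical Python function builds a comparison key for recipient names. It is not a translation of the project's R code, and the list of legal suffixes is an assumption that a real implementation would derive from the grants data.

```python
import re
import unicodedata

# Hypothetical list of legal-form suffixes that vary between records for the same entity.
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "LLC", "CORP", "CORPORATION", "CO", "LTD"}


def normalize_recipient_name(name: str) -> str:
    """Return an uppercase, punctuation-free comparison key for a recipient name."""
    # Decompose accented characters and drop the accents ("Niño" -> "Nino").
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = text.upper().replace("&", " AND ")
    # Replace remaining punctuation with spaces; split() then collapses whitespace.
    tokens = re.sub(r"[^A-Z0-9 ]+", " ", text).split()
    # Strip trailing legal suffixes such as ", Inc." or "LLC".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

For example, `"Community Action Partnership, Inc."` becomes `"COMMUNITY ACTION PARTNERSHIP"`, so it matches a record that omits the suffix.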
- Source the `00-source.R` file.
- Run the ETL scripts for the USA Spending, ACF Program Mapping, and Head Start Facilities data.
- Apply the entity reconciliation functions.
- USA Spending Grants Data
- ACF Program Mapping Data
- Head Start Facility Locations Data
Data Source | Input Format | Output Format |
---|---|---|
USA Spending Grants Data | ZIP archive (via URL) | Feather |
ACF Program Mapping Data | XLSX | Feather |
Head Start Facility Locations Data | CSV (via URL) | Feather |
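The ZIP-to-Feather path in the table above can be sketched in Python as follows. This is an assumption-laden illustration and not the project's R ETL. It assumes the downloaded archive contains one or more CSV files, and the function names are hypothetical. Writing Feather relies on pandas' `DataFrame.to_feather`, which requires pyarrow. For that reason, pandas is imported only inside the writer.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Download a file (e.g., a ZIP export) into memory."""
    with urlopen(url) as response:
        return response.read()


def read_csv_from_zip(zip_bytes: bytes) -> list[dict[str, str]]:
    """Read every CSV member of an in-memory ZIP archive into a list of row dicts."""
    rows: list[dict[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue  # skip data dictionaries, READMEs, etc.
            with archive.open(member) as raw:
                # utf-8-sig strips a byte-order mark if the export includes one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                rows.extend(csv.DictReader(text))
    return rows


def write_feather(rows: list[dict[str, str]], path: str) -> None:
    """Persist rows as a Feather file (requires pandas and pyarrow)."""
    import pandas as pd  # lazy import: the ZIP step works without pandas

    pd.DataFrame(rows).to_feather(path)
```

A typical call would be `write_feather(read_csv_from_zip(fetch_bytes(url)), "grants.feather")`, where `url` stands for whichever USA Spending download link the ETL targets.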
- Expand facility data collection beyond OHS.
- Evaluate feasibility based on confidentiality and data regulations.
- Conduct workshops to validate entity reconciliation methods.
- Create an office-approved validation dataset.
- Use clustering methods:
  - DBSCAN for density-based clustering.
  - K-means for partitioning data.
  - Agglomerative clustering for nested (hierarchical) structures.
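DBSCAN and K-means are available in libraries such as scikit-learn. The dependency-free sketch below instead illustrates the agglomerative option. It applies single-linkage clustering to recipient names, using a distance derived from Python's `difflib.SequenceMatcher`. The distance choice, the threshold, and the function names are illustrative assumptions, not a tested configuration.

```python
from difflib import SequenceMatcher


def name_distance(a: str, b: str) -> float:
    """1 minus difflib's similarity ratio: 0.0 for identical names, near 1.0 for unrelated ones."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def single_linkage_clusters(names: list[str], threshold: float) -> list[list[str]]:
    """Single-linkage agglomerative clustering, stopped at a distance threshold.

    Start with one cluster per name, then repeatedly merge the two clusters whose
    closest members are nearest, until the nearest pair is farther apart than `threshold`.
    """
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest cluster pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(name_distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        # j > i, so popping j does not shift the position of cluster i.
        clusters[i].extend(clusters.pop(j))
    return clusters
```

This naive loop is cubic in the number of names, which is fine for a sketch. A library implementation would be the practical choice at USA Spending scale, and names would ideally be normalized before clustering.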
Data Source | Type | Function | Description & Notes |
---|---|---|---|
USA Spending Grants Data | ETL | get_grants_data() | Pulls the most recent USA Spending data. |
USA Spending Grants Data | Data Validation | assert_grant_identifiers() | Summarizes grant consistency over time. |
ACF Program Mapping Data | ETL | rename_acf_program_map() | Standardizes column names. |
ACF Program Mapping Data | Data Validation | acf_program_info() | Summarizes program distributions. |
Head Start Facility Locations Data | ETL | expand_abbreviations() | Standardizes address formats. |
Head Start Facility Locations Data | Data Validation | assert_head_start_diff() | Summarizes expired grants and discrepancies. |
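To show what address standardization like `expand_abbreviations()` involves, here is a hypothetical Python analogue. It is not the R function itself, and its small abbreviation map is an assumption. USPS Publication 28 is the usual authoritative source for street suffix abbreviations.

```python
import re

# Hypothetical, deliberately small abbreviation map (uppercase token -> expansion).
ABBREVIATIONS = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "RD": "ROAD",
    "BLVD": "BOULEVARD",
    "DR": "DRIVE",
    "STE": "SUITE",
    "N": "NORTH",
    "S": "SOUTH",
    "E": "EAST",
    "W": "WEST",
}


def expand_address_abbreviations(address: str) -> str:
    """Uppercase an address, drop periods and commas, and expand whole-word abbreviations."""
    tokens = re.sub(r"[.,]", " ", address.upper()).split()
    # Only whole tokens are replaced, so "STATE" is never mistaken for "ST".
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)
```

For example, `"123 N. Main St., Ste 4"` becomes `"123 NORTH MAIN STREET SUITE 4"`. A context-free map like this cannot tell "St" meaning Street from "St" meaning Saint. Ambiguous tokens would need position-aware rules.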
This document provides a structured and scalable approach to entity reconciliation using USA Spending grants data, facilitating improved grant tracking and administration for ACF Program Offices.