Bulk data export #4769

Open
15 tasks
jadudm opened this issue Mar 11, 2025 · 0 comments
jadudm commented Mar 11, 2025

Problem

Users need to access our full dataset (e.g. for multi-year lookback and oversight), and the FAC does not provide this feature. Not all users can write programs to pull the data from the API.

How did we discover this problem?

Census provided bulk downloads, and we have been asked for them since the FAC came to GSA.

Job Story(s)

  • As a PhD student studying government, I want to download the entire FAC database so that I can conduct my research.
  • As an Inspector General, I need to assure the quality of audits for my entire agency, and this requires doing multi-year data analysis over all entities receiving money from our agency.
  • As an audit resolution team, we want to verify what funding was issued, and what audits were received, so we can be certain we have carried out all of our required oversight.
  • As the GAO or OMB, I want to do analyses of funding across government for oversight, investigation, and policy-making.

What are we planning to do about it?

  • Generate CSV files of each table (e.g. general, federal_awards). The federal_awards file will be ~6M rows.
    • For Excel users, we will need to break these up, since a worksheet holds at most 1,048,576 rows. Perhaps export the files by year. (This is what Census did.)
    • Perhaps we should just generate Excel files directly, so we don't have to explain how to open CSVs?
  • Generate an SQLite database file containing all of the data
    • This might be preferable to a PostgreSQL dump, as it is 1) standalone, and 2) more easily integrated into more things.
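
To make the two proposed outputs concrete, here is a minimal sketch using only the Python standard library. It is not the FAC's implementation: the function names, the `audit_year` column used to split files, and the in-memory list-of-dicts input are all assumptions for illustration. It writes one CSV per year for a table (so each file stays under the Excel row limit if a single year is small enough) and loads every table into a single standalone SQLite file.

```python
import csv
import sqlite3
from collections import defaultdict
from pathlib import Path


def export_csv_by_year(table_name, rows, year_field, out_dir):
    """Write one CSV per year for a table and return the paths written.

    `rows` is a list of dicts sharing the same keys. `year_field` is a
    hypothetical column name that holds the year used to split the files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by year so each year lands in its own file.
    by_year = defaultdict(list)
    for row in rows:
        by_year[row[year_field]].append(row)

    paths = []
    for year, year_rows in sorted(by_year.items()):
        path = out_dir / f"{table_name}_{year}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(year_rows[0].keys()))
            writer.writeheader()
            writer.writerows(year_rows)
        paths.append(path)
    return paths


def export_sqlite(tables, db_path):
    """Write every table into one SQLite file.

    `tables` maps a table name to a list of dict rows. Columns are created
    without declared types, which SQLite permits; a real export would
    likely declare a schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        for name, rows in tables.items():
            if not rows:
                continue
            columns = list(rows[0].keys())
            col_sql = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            conn.execute(f'DROP TABLE IF EXISTS "{name}"')
            conn.execute(f'CREATE TABLE "{name}" ({col_sql})')
            conn.executemany(
                f'INSERT INTO "{name}" ({col_sql}) VALUES ({placeholders})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        conn.commit()
    finally:
        conn.close()
```

For a table the size of federal_awards (~6M rows), a real export would stream rows from the database rather than hold them in memory; the grouping and file-naming logic would stay the same.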

What are we not planning to do about it?

We are not planning to offer filtering and custom exports; we'll do a standard (weekly/monthly) export of the data, and document it for our users.

How will we measure success?

  • Inquiries for bulk data go to zero. This implies...
    • The data exists and can be downloaded
    • People know about it and can find it (IA)
    • The use of each format is documented clearly, with examples
  • Tickets asking for more features. ("Give a mouse a cookie...") Asking for more is good feedback.
  • The process is automatic, and requires no intervention day-to-day, week-to-week, month-to-month.
  • Bonus: External recognition for our exceptional work.

Security Considerations

Required per CM-4.

The FAC data is public. The only security consideration is that there is some data that is suppressed and should not be exported. Otherwise, providing this data for bulk download by the public and Federal government has no particular concerns.
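
One way to enforce the suppression rule in an export step is sketched below. The field names `is_public` and `report_id` and both function names are assumptions for illustration, not confirmed details of the FAC schema. The filter fails closed: a row is exported only if it is explicitly marked public, and rows in related tables are kept only when their parent report survived the filter.

```python
def drop_suppressed(rows, public_field="is_public"):
    """Keep only rows explicitly marked public.

    A missing or non-True flag is treated as suppressed, so a data
    problem withholds a row instead of leaking it.
    """
    return [row for row in rows if row.get(public_field) is True]


def keep_children_of(child_rows, public_parents, key="report_id"):
    """Keep rows from a related table whose parent report is public."""
    allowed = {parent[key] for parent in public_parents}
    return [row for row in child_rows if row.get(key) in allowed]
```

Applying the same filter before every output format (CSV, Excel, SQLite) would keep the suppressed records out of all of them with a single check.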


Process checklist
  • Has a clear story statement
  • Can reasonably be done in a few days (otherwise, split this up!)
  • Shepherds have been identified
  • UX youexes all the things
  • Design designs all the things
  • Engineering engineers all the things
  • Meets acceptance criteria
  • Meets QASP conditions
  • Presented in a review
  • Includes screenshots or references to artifacts
  • Tagged with the sprint where it was finished
  • Archived

If there's UI...

  • Screen reader - Listen to the experience with a screen reader extension, ensure the information is presented in order.
  • Keyboard navigation - Run through acceptance criteria with keyboard tabs, ensure it works.
  • Text scaling - Adjust viewport to 1280 pixels wide and zoom to 200%, ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.

github-project-automation (bot) moved this to Triage in FAC on Mar 11, 2025
jadudm moved this from Triage to Backlog in FAC on Mar 11, 2025
jadudm moved this from Backlog to In Progress in FAC on Mar 12, 2025
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

4 participants