This repository contains the following:
- A cloud-native data pipeline architecture for onboarding public datasets to Google Cloud Datasets.
- A documentation set containing tutorials, samples, and other articles that use the datasets hosted by the program.

For detailed documentation, see the Wiki Pages.

Here are some of the featured datasets onboarded using this repository and its architecture:
- Google Search Trends
- Political Advertising on Google
- DeepMind AlphaFold
- Google's Diversity Annual Report
- Google Cloud Release Notes
- Google's Open Source Insights (deps.dev)
- Global Biodiversity Information Facility (GBIF)
- Cancer Imaging Data from Imaging Data Commons (IDC)
- The New York Times US Coronavirus Database