feat: Add Parquet format to data page #629

Open
wants to merge 4 commits into
base: main
Changes from 2 commits
11 changes: 11 additions & 0 deletions lang/aa/texts/data.html
@@ -55,6 +55,17 @@ <h3>JSONL data export</h3>

<p>A convenient way to work with the database is DuckDB, an in-process analytical database designed to process large amounts of data in a fraction of a second. You can read our <a href="https://blog.openfoodfacts.org/en/news/food-transparency-in-the-palm-of-your-hand-explore-the-largest-open-food-database-using-duckdb-%f0%9f%a6%86x%f0%9f%8d%8a">blog post</a>, where we walk you through exploring and processing the Open Food Facts database with DuckDB.</p>

<h3>Parquet Data Export on Hugging Face</h3>

<p>A cleaner version of the JSONL dump is also available in the <a href="https://parquet.apache.org/">Parquet format</a>. This columnar format is optimized for queries that read only a subset of columns, which makes it particularly convenient for data analysis.</p>

<p>The dataset is available on <a href="https://huggingface.co/datasets/openfoodfacts/product-database">Hugging Face</a>, a collaborative machine learning platform where developers and researchers can share models and datasets.</p>
<dl>
<dt>Link</dt>
<dd><a href="https://huggingface.co/datasets/openfoodfacts/product-database/resolve/main/products.parquet">https://huggingface.co/datasets/openfoodfacts/product-database/resolve/main/products.parquet</a>
</dd>
</dl>
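
As an illustration of how the Parquet file linked above might be queried, here is a minimal Python sketch using DuckDB. It is not part of the page itself. It assumes the `duckdb` package is installed and that the `httpfs` extension can be loaded for reading over HTTPS. The helper names `build_preview_query` and `preview_products` are hypothetical, and no column names are assumed because the schema is not described here.

```python
# Sketch only: preview the Parquet export remotely with DuckDB.
# Assumes `pip install duckdb`; helper names are illustrative, not an official API.

PARQUET_URL = (
    "https://huggingface.co/datasets/openfoodfacts/product-database"
    "/resolve/main/products.parquet"
)


def build_preview_query(url: str = PARQUET_URL, limit: int = 5) -> str:
    """Build a SQL statement that returns the first `limit` rows of the file."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # read_parquet() is DuckDB's table function for reading Parquet files.
    return f"SELECT * FROM read_parquet('{url}') LIMIT {limit}"


def preview_products(limit: int = 5):
    """Run the preview query and return the rows as a list of tuples."""
    import duckdb  # third-party dependency, imported lazily

    con = duckdb.connect()  # in-memory database
    # httpfs lets DuckDB read files over HTTP(S).
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(build_preview_query(limit=limit)).fetchall()
```

Calling `preview_products(3)` should return three rows without downloading the whole file first, since DuckDB can read remote Parquet files in ranges. Actual transfer size and speed depend on the file layout and the network.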

<h3>CSV Data Export</h3>
<p>Data for all products, or for a subset of products, can be downloaded in CSV format (readable with LibreOffice, Excel and many other spreadsheet programs) through the <a href="https://world.openfoodfacts.org/cgi/search.pl">advanced search form</a>.</p>
