Commit

Update data page
jeremyarancio committed Oct 28, 2024
1 parent 9696184 commit 3f51ae9
Showing 1 changed file with 0 additions and 10 deletions.
10 changes: 0 additions & 10 deletions lang/aa/texts/data.html
@@ -55,16 +55,6 @@ <h3>JSONL data export</h3>

<p>A convenient way to explore the database is to use DuckDB, an in-process analytical database engine designed to process large amounts of data in a fraction of a second. You can read our <a href="https://blog.openfoodfacts.org/en/news/food-transparency-in-the-palm-of-your-hand-explore-the-largest-open-food-database-using-duckdb-%f0%9f%a6%86x%f0%9f%8d%8a">blog post</a>, where we walk you through exploring and processing the Open Food Facts database with DuckDB.</p>

<h3>CSV Data Export</h3>
<p>Data for all products, or a subset of them, can be downloaded in CSV format (readable with LibreOffice, Excel, and many other spreadsheet programs) through the <a href="https://world.openfoodfacts.org/cgi/search.pl">advanced search form</a>.</p>

<dl>
<dt>Links</dt>
<dd><a href="https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz">https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz</a> (compressed CSV in GZIP format: ~ 0.9 GB, uncompressed: ~ 9 GB)</dd>
</dl>

<p>The file is encoded in Unicode UTF-8. Fields are separated by the tab character (&lt;tab&gt;).</p>

<h3>Parquet Data Export on Hugging Face</h3>

<p>A cleaner version of the JSONL dump is also available in the <a href="https://parquet.apache.org/">Parquet format</a>. This data format is optimized for columnar queries, which is particularly convenient for data analysis.</p>
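
Note on the CSV section in this hunk: it describes the export as a GZIP-compressed, UTF-8 encoded file with tab-separated fields. A minimal Python sketch of reading such a file with the standard library might look like the following. The helper name `iter_products`, the column names `code` and `product_name`, and the in-memory sample data are illustrative assumptions, not taken from the page. Disabling quote handling is also an assumption about how the export is written.

```python
import csv
import gzip
import io


def iter_products(source, limit=5):
    """Yield up to `limit` rows as dicts from a gzip-compressed TSV.

    `source` may be a file path or a binary file object; gzip.open
    accepts both. The function name is hypothetical.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as f:
        # Fields are tab-separated. QUOTE_NONE treats quote characters
        # as plain data (an assumption about the export format).
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if index >= limit:
                break
            yield row


# Build a tiny gzip-compressed sample in memory so the sketch runs
# without downloading anything. Column names are illustrative only.
sample = "code\tproduct_name\n123\tCrème dessert\n456\tOrange juice\n"
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(sample.encode("utf-8"))
buffer.seek(0)

rows = list(iter_products(buffer))
for row in rows:
    print(row["code"], row["product_name"])
```

Because the rows are streamed one at a time, the same helper could be pointed at a local copy of the full export without loading the whole uncompressed file into memory.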