From dca2f42d536cbd56cd5042d233c78b23e294bf05 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Tue, 21 May 2024 11:12:07 -0400
Subject: [PATCH] PARQUET-2479: Update README with link to parquet website, clarify contents (#243)

---
 README.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 18a75077f..42578c7be 100644
--- a/README.md
+++ b/README.md
@@ -19,12 +19,17 @@
 
 # Parquet [![Build Status](https://github.com/apache/parquet-format/actions/workflows/test.yml/badge.svg)](https://github.com/apache/parquet-format/actions)
 
-Parquet is a columnar storage format that supports nested data.
+This repository contains the specification for [Apache Parquet] and
+[Apache Thrift] definitions to read and write Parquet metadata.
 
-Parquet metadata is encoded using Apache Thrift.
+Apache Parquet is an open source, column-oriented data file format
+designed for efficient data storage and retrieval. It provides high
+performance compression and encoding schemes to handle complex data in
+bulk and is supported in many programming language and analytics
+tools.
 
-The `Parquet-format` project contains all Thrift definitions that are necessary to create readers
-and writers for Parquet files.
+[Apache Parquet]: https://parquet.apache.org
+[Apache Thrift]: https://thrift.apache.org
 
 ## Motivation
 
@@ -176,7 +181,7 @@ following rules:
 * If the min is +0, the row group may contain -0 values as well.
 * If the max is -0, the row group may contain +0 values as well.
 * When looking for NaN values, min and max should be ignored.
-
+
 * BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY - Lexicographic unsigned byte-wise comparison.