PARQUET-2479: Update README with link to parquet website, clarify contents (#243)
alamb authored May 21, 2024
1 parent 079a2df commit dca2f42
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions README.md
@@ -19,12 +19,17 @@

# Parquet [![Build Status](https://github.com/apache/parquet-format/actions/workflows/test.yml/badge.svg)](https://github.com/apache/parquet-format/actions)

-Parquet is a columnar storage format that supports nested data.
+This repository contains the specification for [Apache Parquet] and
+[Apache Thrift] definitions to read and write Parquet metadata.

-Parquet metadata is encoded using Apache Thrift.
+Apache Parquet is an open source, column-oriented data file format
+designed for efficient data storage and retrieval. It provides high
+performance compression and encoding schemes to handle complex data in
+bulk and is supported in many programming languages and analytics
+tools.

-The `Parquet-format` project contains all Thrift definitions that are necessary to create readers
-and writers for Parquet files.
+[Apache Parquet]: https://parquet.apache.org
+[Apache Thrift]: https://thrift.apache.org

## Motivation

@@ -176,7 +181,7 @@ following rules:
* If the min is +0, the row group may contain -0 values as well.
* If the max is -0, the row group may contain +0 values as well.
* When looking for NaN values, min and max should be ignored.

* BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY - Lexicographic unsigned byte-wise
comparison.
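
The context lines in this hunk describe two ordering rules. As a rough illustration only (not part of the commit or the README), the sketch below shows how a reader might apply them. The function names `compare_unsigned_bytes` and `row_group_may_contain` are hypothetical, and the conservative handling of NaN statistics is an assumption that these lines do not state.

```python
import math


def compare_unsigned_bytes(a: bytes, b: bytes) -> int:
    """Lexicographic unsigned byte-wise comparison: returns -1, 0, or 1.

    Hypothetical helper. Python's bytes already iterate as unsigned ints
    (0..255), so 0x80 sorts after 0x7f, unlike with signed bytes.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal, so the shorter value sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def row_group_may_contain(value: float, stat_min: float, stat_max: float) -> bool:
    """Decide whether float min/max statistics allow `value` in a row group.

    Hypothetical helper sketching the rules quoted in the hunk.
    """
    # When looking for NaN values, min and max are ignored.
    if math.isnan(value):
        return True
    # Assumption (not in the quoted lines): NaN statistics carry no
    # information, so stay conservative and do not skip the row group.
    if math.isnan(stat_min) or math.isnan(stat_max):
        return True
    # IEEE 754 comparison treats -0.0 and +0.0 as equal, so a min of +0
    # still admits -0 and a max of -0 still admits +0.
    return stat_min <= value <= stat_max
```

In a language with signed bytes, the same comparison would need to mask each byte to its unsigned value before comparing.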
