Minimum Viable Product (MVP) Roadmap

Below are planned roadmap for MVP as discussed in different places (e.g. dev ML, Github issues & PRs, slack channel, etc.). Note that it is only for the native C++ implementation. For the effort of Rust C++ binding, please refer to https://lists.apache.org/thread/hotlcdw86nrmt7cf5o5o7kq6gwo98758.

# Convention

- Platform: Linux, MacOS, Windows
- Compilers: Clang, GCC, MSVC
- Build: CMake
- C++ standard: C++20
- Coding style: Follow what Apache Arrow C++ does: https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci
- Error handling: use ported `expected` similar to `std::expected`

# Goal

- Implement read path for parsing metadata files of Iceberg v1 & v2. It is a nice-to-have feature to read data files depending on the bandwidth of contributors.
- Provide a light-weight io-less `iceberg` library with minimal dependencies (like `apache/nanoarrow` and `nlohmann/json`) to mainly deal with the Iceberg metadata. Downstream projects are required to provide their own implementations like I/O, Parquet, Avro and write adaptation code.
- Provide a battery-included `iceberg-bundle` library backed by Apache Arrow C++ and Apache Avro C++ libraries.

# Workitems

(Disclaimer: this is not an exhaustive list and is subject to change as the development goes on)

## API of metadata or building block

- [x] Add `Schema` (including data types)
- [ ] Add `DataFile`
- [ ] Add `DeleteFile`
- [x] Add `ManifestFile`
- [x] Add `ManifestEntry` @zhjwpku 
- [x] Add `Snapshot`  @zhjwpku 
- [x] Add `PartitionSpec`
- [x] Add `SortOrder`  @zhjwpku 
- [x] Add `ManifestList`
- [x] Add `TableMetadata`

## Catalog

- [x] Define `Catalog` interface.
- [x] Implement an in-memory catalog. @gty404

## IO

- [x] Define `FileIO` interface with minimal operations.
- [x] Provide default FileIO implementation backed by `arrow::FileSystem` for different storage providers.

## Table

- [x] Define `Table` interface.
- [ ] Provide a basic Table implementation to have access to its metadata. @lishuxu
- [ ] Implement `Table::NewScan` function and `TableScan` class to support planning files for reading a specific snapshot. @gty404
- [ ] Implement partition pruning and data file pruning in the `TableScan`.

## JSON Serialization

- [x] SortOrder  @gty404
- [x] PartitionSpec   @gty404
- [x] Schema
- [x] Snapshot
- [x] TableMetadata
- [x] NameMapping

## Metadata File Reader

- [x] JSON file reader.
- [x] JSON parser for metadata objects.
- [x] Reading gzip-compressed metadata json file.

## File Format Reader

- [x] Define format-agnostic `FileReader` interface with Arrow C Data as the contract.
- [ ] Implement manifest list file reader.
- [ ] Implement manifest file reader.
- [ ] Provide default Avro reader implementation in the `iceberg-bundle` library.
- [ ] Provide default Parquet reader implementation in the `iceberg-bundle` library.

## Schema/Data conversion

- [x] Bi-directional conversion to Arrow C schema.
- [ ] ArrowArray -> avro::GenericDatum
- [ ] avro::GenericDatum -> ArrowArray
- [ ] StructLike interface

## Expression
- [x] Transform metadata
- [ ] Transform function
- [ ] Expression components: expression, term, literal, reference, etc.
- [ ] Expression visitor

## Third-party library

- [x] Add `nanoarrow` to `libiceberg`
- [x] Add `nlohmann/json` to `libiceberg` @yingcai-cy
- [x] Add `avro-cpp` to `libiceberg-bundle`
- [x] Add `arrow-cpp` to `libiceberg-bundle`

## First release

- [ ] Check licenses of dependencies.
- [ ] Add and check documentations.
- [ ] Add release script.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Minimum Viable Product (MVP) Roadmap #2

Convention

Goal

Workitems

API of metadata or building block

Catalog

IO

Table

JSON Serialization

Metadata File Reader

File Format Reader

Schema/Data conversion

Expression

Third-party library

First release

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Minimum Viable Product (MVP) Roadmap #2

Description

Convention

Goal

Workitems

API of metadata or building block

Catalog

IO

Table

JSON Serialization

Metadata File Reader

File Format Reader

Schema/Data conversion

Expression

Third-party library

First release

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions