40 changes: 40 additions & 0 deletions docs/docs/fileio.md
@@ -0,0 +1,40 @@
---
title: "FileIO"
---
<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->

# Iceberg FileIO

## Overview

Iceberg comes with a flexible abstraction around reading and writing data and metadata files. The FileIO interface allows the Iceberg library to communicate with the underlying storage layer. FileIO is used for all metadata operations during the job planning and commit stages.

## Iceberg Files

The metadata for an Iceberg table tracks the absolute path of each data file, which allows greater abstraction over the physical layout. Additionally, changes to table state are performed by writing new metadata files and never involve renaming files. This keeps the set of required file operations small. The essential functionality for a FileIO implementation is that it can read files, write files, and seek to any position within a stream.
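
As a rough illustration of how small that contract is, the sketch below models the three capabilities (write, read, seek) with an in-memory store. The names `MiniFileIO`, `SeekableReader`, `InMemoryFileIO`, and the `mem://` paths are hypothetical and exist only for this example; they are not the Iceberg `FileIO` API, which has its own input and output file types.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for illustration only; not the Iceberg FileIO API.
public class MiniFileIOSketch {

  /** The essential capabilities: write a file and open it for seekable reads. */
  interface MiniFileIO {
    void write(String absolutePath, byte[] contents) throws IOException;

    SeekableReader open(String absolutePath) throws IOException;
  }

  /** A readable stream that can jump to any position. */
  interface SeekableReader {
    void seek(long position) throws IOException;

    /** Returns the number of bytes read, or -1 at end of file. */
    int read(byte[] buffer) throws IOException;
  }

  /** In-memory implementation keyed by absolute path. */
  static final class InMemoryFileIO implements MiniFileIO {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public void write(String absolutePath, byte[] contents) throws IOException {
      // Files are written once under a new path; nothing is renamed or mutated in place.
      if (files.putIfAbsent(absolutePath, contents.clone()) != null) {
        throw new IOException("File already exists: " + absolutePath);
      }
    }

    @Override
    public SeekableReader open(String absolutePath) throws IOException {
      byte[] data = files.get(absolutePath);
      if (data == null) {
        throw new FileNotFoundException(absolutePath);
      }
      return new SeekableReader() {
        private int pos = 0;

        @Override
        public void seek(long position) throws IOException {
          if (position < 0 || position > data.length) {
            throw new IOException("Cannot seek to " + position);
          }
          pos = (int) position;
        }

        @Override
        public int read(byte[] buffer) {
          if (pos >= data.length) {
            return -1;
          }
          int n = Math.min(buffer.length, data.length - pos);
          System.arraycopy(data, pos, buffer, 0, n);
          pos += n;
          return n;
        }
      };
    }
  }

  /** Seeks to an offset and reads up to length bytes as UTF-8 text. */
  static String readFrom(MiniFileIO io, String path, long offset, int length) throws IOException {
    SeekableReader reader = io.open(path);
    reader.seek(offset);
    byte[] buffer = new byte[length];
    int n = reader.read(buffer);
    return n <= 0 ? "" : new String(buffer, 0, n, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    MiniFileIO io = new InMemoryFileIO();
    String path = "mem://warehouse/db/tbl/metadata/v1.metadata.json";
    io.write(path, "{\"format-version\":2}".getBytes(StandardCharsets.UTF_8));
    System.out.println(readFrom(io, path, 2, 14));
  }
}
```

Note that the sketch has no rename or list operation: each new table state would be written under a new absolute path, which is why object stores without atomic rename can satisfy the contract.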

## Usage in Processing Engines

The responsibility of reading and writing data files lies with the processing engines and happens during task execution. However, after data files are written, processing engines use FileIO to write new Iceberg metadata files that capture the new state of the table.

Different FileIO implementations are used depending on the type of storage. Iceberg comes with a set of FileIO implementations for popular storage providers:

- Amazon S3
- Google Cloud Storage
- Object Service Storage (including https)
- Dell Elastic Cloud Storage (ECS)
- Hadoop (adapts any Hadoop FileSystem implementation)
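
One way to picture how an implementation is chosen per storage type is the sketch below, which maps a warehouse location's URI scheme to a FileIO class name and places it in a catalog property map under the `io-impl` key. The helper class `FileIOSelectionSketch` and its methods are hypothetical, the scheme mapping is an assumption made for this example, and only three of the implementations listed above are shown.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; the scheme-to-class mapping is an assumption.
public class FileIOSelectionSketch {

  // Class names of FileIO implementations, keyed by URI scheme.
  private static final Map<String, String> IMPL_BY_SCHEME = Map.of(
      "s3", "org.apache.iceberg.aws.s3.S3FileIO",
      "gs", "org.apache.iceberg.gcp.gcs.GCSFileIO");

  // Anything else falls back to the Hadoop adapter, which wraps a Hadoop FileSystem.
  private static final String FALLBACK = "org.apache.iceberg.hadoop.HadoopFileIO";

  /** Picks a FileIO class name from the scheme of a storage location. */
  static String implFor(String location) {
    String scheme = URI.create(location).getScheme();
    if (scheme == null) {
      return FALLBACK;
    }
    return IMPL_BY_SCHEME.getOrDefault(scheme.toLowerCase(Locale.ROOT), FALLBACK);
  }

  /** Builds catalog properties that name the warehouse and the FileIO to use for it. */
  static Map<String, String> catalogProperties(String warehouse) {
    return Map.of(
        "warehouse", warehouse,
        "io-impl", implFor(warehouse));
  }

  public static void main(String[] args) {
    System.out.println(catalogProperties("s3://my-bucket/warehouse"));
    System.out.println(catalogProperties("hdfs://namenode:8020/warehouse"));
  }
}
```

In practice the property map would be passed to a catalog when it is initialized; the bucket and namenode addresses here are placeholders.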
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -38,6 +38,7 @@ nav:
- API:
- Quickstart: java-api-quickstart.md
- API: api.md
- File I/O: fileio.md
- Javadoc: ../../javadoc/latest/
- Integrations:
- Apache Spark:
3 changes: 3 additions & 0 deletions site/nav.yml
@@ -90,6 +90,8 @@ nav:
- Project:
- Contributing: contribute.md
- Multi-engine support: multi-engine-support.md
- Benchmarks: benchmarks.md
- Security: security.md
- How to release: how-to-release.md
- ASF:
- Sponsorship: https://www.apache.org/foundation/sponsorship.html
@@ -102,6 +104,7 @@ nav:
- Community: community.md
- Talks: talks.md
- Vendors: vendors.md

- Specification:
- Terms: terms.md
- REST Catalog Spec: rest-catalog-spec.md