[Read] Implementation of Hash Bucketing and Batch Processing #4

Open
wants to merge 314 commits into
base: main
47eac52
Spark: Flaky test due temp directory (#10811)
manuzhang Oct 28, 2024
1e3ee1e
Core: Add portable Roaring bitmap for row positions (#11372)
aokolnychyi Oct 28, 2024
602c2b2
Flink 1.20: Update Flink to use planned Avro reads (#11386)
jbonofre Oct 29, 2024
740d4e7
open-api: Fix `testFixtures` dependencies (#11422)
ajantha-bhat Oct 29, 2024
469c556
Core: use ManifestFiles.open when possible (#11414)
dramaticlly Oct 29, 2024
5359bea
GCS: Refresh vended credentials (#11282)
nastra Oct 30, 2024
bedc711
Docs: Add 21 blogs / fix one broken link (#11424)
AlexMercedCoder Oct 30, 2024
9e895cb
AWS: Refresh vended credentials (#11389)
nastra Oct 30, 2024
7c39086
Build: Bump Hadoop to 3.4.1 (#11428)
Fokko Oct 30, 2024
dec84c0
Core: Remove credentials from LoadViewResponse (#11432)
nastra Oct 30, 2024
f4b36a5
OpenAPI: Remove credentials from LoadViewResult (#11433)
nastra Oct 30, 2024
91e04c9
API: Add compatibility checks for Schemas with default values (#11434)
rdblue Oct 30, 2024
57fb6d5
Doc: Update rewrite data files spark procedure (#11396)
dramaticlly Oct 31, 2024
ea61ee4
Docs: warn `parallelism > 1` doesn't work for migration procedures (#…
manuzhang Oct 31, 2024
1d4df34
Core: Log retry sleep time (#11413)
sullis Oct 31, 2024
caf424a
Core: Use RoaringPositionBitmap in position index (#11441)
aokolnychyi Nov 1, 2024
fe23584
Core: Add validation for table commit properties (#11437)
dramaticlly Nov 2, 2024
8b4ebc6
Core: Add cardinality to PositionDeleteIndex (#11442)
aokolnychyi Nov 2, 2024
b9ebc71
Puffin: Add deletion-vector-v1 blob type (#11238)
rdblue Nov 2, 2024
d368a5f
Spec: Add deletion vectors to the table spec (#11240)
rdblue Nov 2, 2024
d9b9768
API, Core: Add data file reference to DeleteFile (#11443)
aokolnychyi Nov 2, 2024
e47fa6a
Build: Bump software.amazon.awssdk:bom from 2.29.1 to 2.29.6 (#11454)
dependabot[bot] Nov 4, 2024
29ee906
Build: Bump com.gradleup.shadow:shadow-gradle-plugin from 8.3.3 to 8.…
dependabot[bot] Nov 4, 2024
7ffb6a3
Build: Bump com.google.cloud:libraries-bom from 26.49.0 to 26.50.0 (#…
dependabot[bot] Nov 4, 2024
aa27d90
Build: Bump org.apache.httpcomponents.client5:httpclient5 (#11450)
dependabot[bot] Nov 4, 2024
c7f0f80
Build: Bump com.azure:azure-sdk-bom from 1.2.28 to 1.2.29 (#11453)
dependabot[bot] Nov 4, 2024
0669bcb
Flink: Maintenance - TableManager + ExpireSnapshots (#11144)
pvary Nov 4, 2024
ec0eef4
Build: Bump mkdocs-material from 9.5.42 to 9.5.43 (#11455)
dependabot[bot] Nov 4, 2024
8357f65
Build: Bump net.snowflake:snowflake-jdbc from 3.19.1 to 3.20.0 (#11447)
dependabot[bot] Nov 4, 2024
3298468
Build: Bump kafka from 3.8.0 to 3.8.1 (#11449)
dependabot[bot] Nov 4, 2024
9dcf0d3
Build: Bump jackson-bom from 2.18.0 to 2.18.1 (#11448)
dependabot[bot] Nov 4, 2024
af5be32
Core: Fix generated position delete file spec (#11458)
aokolnychyi Nov 4, 2024
ec269ee
API, Core: Add content offset and size to DeleteFile (#11446)
aokolnychyi Nov 4, 2024
7cc16fa
Revert "Build: Bump parquet from 1.13.1 to 1.14.3 (#11264)" (#11462)
RussellSpitzer Nov 4, 2024
d0cca38
Spark 3.5: Preserve data file reference during manifest rewrites (#11…
aokolnychyi Nov 4, 2024
43b2f7d
Core: Make PositionDeleteIndex serializable (#11463)
aokolnychyi Nov 5, 2024
592b3b1
Spark 3.5: Preserve content offset and size during manifest rewrites …
aokolnychyi Nov 5, 2024
20e0e3d
Spark 3.5: Fix flaky test due to temp directory not empty during dele…
manuzhang Nov 5, 2024
67ee082
Core, Data, Flink, Spark: Improve tableDir initialization for tests (…
nastra Nov 5, 2024
5bd314b
Core: Support DVs in DeleteFileIndex (#11467)
aokolnychyi Nov 5, 2024
549674b
Core: Adapt commit, scan, and snapshot stats for DVs (#11464)
aokolnychyi Nov 5, 2024
ad24d4b
Spark: Synchronously merge new position deletes with old deletes (#11…
amogh-jahagirdar Nov 5, 2024
9be7f00
Fix ADLSLocation file parsing (#11395)
mrcnc Nov 5, 2024
2ffd3b0
open-api: Build runtime jar for test fixture (#11279)
ajantha-bhat Nov 6, 2024
7938403
Core, Puffin: Add DV file writer (#11476)
aokolnychyi Nov 6, 2024
1e82c47
Flink: Fix config key typo in error message of SplitComparators (#11482)
liuml07 Nov 6, 2024
11e7230
API: Removes Explicit Parameterization of Schema Tests (#11444)
RussellSpitzer Nov 7, 2024
5c8a5d6
Docs: Fix verifying release candidate with Spark and Flink (#11461)
manuzhang Nov 7, 2024
3da64d3
Flink: Port #11144 to v1.19 (#11473)
pvary Nov 8, 2024
fff9ec3
Docs: Fix format of verifying release candidate with Flink (#11487)
manuzhang Nov 8, 2024
166edc7
Core: Support DVs in DeleteLoader (#11481)
aokolnychyi Nov 8, 2024
dda6215
Infra: Update DOAP.RDF for Apache Iceberg 1.7.0 (#11492)
RussellSpitzer Nov 8, 2024
6a16340
Docs: Site Update for 1.7.0 Release (#11494)
RussellSpitzer Nov 8, 2024
bbc0d9a
Infra: Add 1.7.0 to issue template (#11491)
RussellSpitzer Nov 8, 2024
df0917d
Build: Let revapi compare against 1.7.0 (#11490)
RussellSpitzer Nov 8, 2024
cd25937
Docs: Adds Release notes for 1.6.1 (#11500)
RussellSpitzer Nov 8, 2024
dfee4cb
Docs: Fixes Release Formatting for 1.7.0 Release Notes (#11499)
RussellSpitzer Nov 8, 2024
82a2362
DOCS: Explicitly specify `operation` as a _required_ field of `summar…
sungwy Nov 8, 2024
1c576c5
Spark: Exclude reading _pos column if it's not in the scan list (#11390)
huaxingao Nov 8, 2024
ea21a53
Build: Bump mkdocs-redirects from 1.2.1 to 1.2.2 (#11511)
dependabot[bot] Nov 11, 2024
aa0aeb0
Build: Bump mkdocs-material from 9.5.43 to 9.5.44 (#11510)
dependabot[bot] Nov 11, 2024
981f1ea
Build: Bump software.amazon.awssdk:bom from 2.29.6 to 2.29.9 (#11509)
dependabot[bot] Nov 11, 2024
d1fd492
Docs: Update multi-engine support after 1.7.0 release (#11503)
manuzhang Nov 11, 2024
5530605
Spark: Fix typo in Spark ddl comment (#11517)
hantangwangd Nov 11, 2024
af3fbfe
Core: Support commits with DVs (#11495)
aokolnychyi Nov 11, 2024
11d21b2
Kafka Connect: fix Hadoop dependency exclusion (#11516)
bryanck Nov 11, 2024
e3f3997
Build: Upgrade to Gradle 8.11.0 (#11521)
jbonofre Nov 12, 2024
5054578
Spark 3.5: Iceberg parser should passthrough unsupported procedure to…
pan3793 Nov 12, 2024
4a3817b
Release: Use `dist/release` KEYS (#11526)
kevinjqliu Nov 12, 2024
0280885
Pig: Remove iceberg-pig (#11380)
jbonofre Nov 13, 2024
e06b069
Core, Flink, Spark: Test DVs with format-version=3 (#11485)
nastra Nov 13, 2024
3659ded
Spark: Update tests which assume file format to use enum instead of s…
huaxingao Nov 13, 2024
9923ac9
Spark 3.4: Support Spark Column Stats (#11532)
saitharun15 Nov 14, 2024
daa24f9
Docs: Fix rendering lists (#11546)
manuzhang Nov 14, 2024
071d9e2
Build: Bump kafka from 3.8.1 to 3.9.0 (#11508)
dependabot[bot] Nov 14, 2024
0963485
Support WASB scheme in ADLSFileIO (#11504)
mrcnc Nov 14, 2024
0a705b0
Docs: 4 Spaces are Requried for Sublists (#11549)
RussellSpitzer Nov 14, 2024
4c0288a
API, Core, Spark: Ignore schema merge updates from long -> int (#11419)
rocco408 Nov 15, 2024
307593f
Docs: Fix level of Deletion Vectors (#11547)
manuzhang Nov 15, 2024
50d310a
API, Core: Replace deprecated ContentFile#path usage with location (#…
amogh-jahagirdar Nov 15, 2024
821aec3
API: Add Variant data type (#11324)
aihuaxu Nov 15, 2024
315e154
Spark 3.5: Adapt DeleteFileIndexBenchmark for DVs (#11529)
aokolnychyi Nov 15, 2024
7e4fd1b
Spark 3.5: Adapt PlanningBenchmark for DVs (#11531)
aokolnychyi Nov 15, 2024
acd7cc1
Spark 3.5: Add DVReaderBenchmark (#11537)
aokolnychyi Nov 15, 2024
8c83fb7
Build: Bump datamodel-code-generator from 0.26.2 to 0.26.3 (#11572)
dependabot[bot] Nov 17, 2024
3934b13
Build: Bump software.amazon.awssdk:bom from 2.29.9 to 2.29.15 (#11568)
dependabot[bot] Nov 17, 2024
fc11dc4
Build: Bump io.netty:netty-buffer from 4.1.114.Final to 4.1.115.Final…
dependabot[bot] Nov 17, 2024
b38951d
Data, Flink, MR, Spark: Test deletes with format-version=3 (#11538)
nastra Nov 17, 2024
491718c
Build: Bump nessie from 0.99.0 to 0.100.0 (#11567)
dependabot[bot] Nov 18, 2024
f9256c6
Build: Bump orc from 1.9.4 to 1.9.5 (#11571)
dependabot[bot] Nov 18, 2024
97542ab
API, Arrow, Core, Data, Spark: Replace usage of deprecated ContentFil…
amogh-jahagirdar Nov 18, 2024
bf8d25f
Core: Serialize `null` when there is no current snapshot (#11560)
Fokko Nov 18, 2024
209781a
Spark 3.4: Iceberg parser should passthrough unsupported procedure to…
pan3793 Nov 18, 2024
568940f
Spark 3.3: Iceberg parser should passthrough unsupported procedure to…
pan3793 Nov 18, 2024
e71e3cb
Core: Inherited classes from SnapshotProducer has TableOperations red…
gaborkaszab Nov 19, 2024
3badfe0
Revert "Core: Use encoding/decoding methods for namespaces and deprec…
nastra Nov 19, 2024
f6d02de
Core: Delete temp metadata file when version already exists (#11350)
leesf Nov 20, 2024
657fa86
Build: Bump Apache Parquet 1.14.4 (#11502)
Fokko Nov 20, 2024
3c4a710
Core: Filter on live entries when reading the manifest (#9996)
Fokko Nov 20, 2024
918f81f
Core: Fix CCE when retrieving TableOps (#11585)
nastra Nov 20, 2024
d19e3ff
API, Core: Remove unnecessary casts to Iterable<T> (#11601)
nastra Nov 20, 2024
799925a
Spark 3.5: Fix NotSerializableException when migrating Spark tables (…
manuzhang Nov 20, 2024
c9ece12
Spark 3.3: Deprecate support (#11596)
aokolnychyi Nov 20, 2024
6e9e07a
Hive: Bugfix for incorrect Deletion of Snapshot Metadata Due to OutOf…
ZhendongBai Nov 20, 2024
a8f42d1
docs: Add `iceberg-go` to doc site (#11607)
zeroshade Nov 20, 2024
3b5c9f7
Spark 3.5: Procedure to compute table stats (#10986)
karuppayya Nov 20, 2024
93a0633
Parquet: Use native getRowIndexOffset support instead of calculating …
wypoon Nov 21, 2024
c448a4b
Spark: Fix changelog table bug for start time older than current snap…
Acehaidrey Nov 21, 2024
652fcc6
Spark 3.5: Fix flaky TestRemoveOrphanFilesAction3 (#11616)
manuzhang Nov 21, 2024
c1f1f8b
Build: Upgrade to Gradle 8.11.1 (#11619)
jbonofre Nov 21, 2024
90be5d7
Core: Optimize MergingSnapshotProducer to use referenced manifests to…
amogh-jahagirdar Nov 21, 2024
12845d4
Revert "Core: Update TableMetadataParser to ensure all streams closed…
hussein-awala Nov 21, 2024
a52afdc
Add REST Catalog tests to Spark 3.5 integration test (#11093)
haizhou-zhao Nov 21, 2024
ce4c447
Spark 3.5: Correct the two-stage parsing strategy of antlr parser (#1…
pan3793 Nov 22, 2024
f717ebd
Spark 3.4: Correct the two-stage parsing strategy of antlr parser (#7…
pan3793 Nov 22, 2024
f2b1b91
Docs: Add new blog post to Iceberg Blogs (#11627)
ismailsimsek Nov 22, 2024
5851ca6
Docs: Mention HIVE-28121 for MySQL/MariaDB-based HMS users (#11631)
pan3793 Nov 23, 2024
9cc13b1
Spark 3.4: IcebergSource extends SessionConfigSupport (#7732)
pan3793 Nov 23, 2024
5e09cdc
Spark 3.5: IcebergSource extends SessionConfigSupport (#11624)
pan3793 Nov 23, 2024
eddf9a1
Spark 3.3: IcebergSource extends SessionConfigSupport (#11625)
pan3793 Nov 23, 2024
b1fbef7
Build: Bump testcontainers from 1.20.3 to 1.20.4 (#11640)
dependabot[bot] Nov 25, 2024
3aebcfe
Build: Bump mkdocs-material from 9.5.44 to 9.5.45 (#11641)
dependabot[bot] Nov 25, 2024
4337040
Spark 3.3: Correct the two-stage parsing strategy of antlr parser (#1…
pan3793 Nov 25, 2024
1f23dcd
Build: Bump software.amazon.awssdk:bom from 2.29.15 to 2.29.20 (#11639)
dependabot[bot] Nov 25, 2024
4b52dbd
Core,Open-API: Don't expose the `last-column-id` (#11514)
Fokko Nov 25, 2024
cb1ad79
Build: Bump com.google.errorprone:error_prone_annotations (#11638)
dependabot[bot] Nov 25, 2024
fa47f31
Flink: Add table.exec.iceberg.use-v2-sink option (#11244)
arkadius Nov 25, 2024
cdd944e
Docs: Use DataFrameWriterV2 in example (#11647)
pan3793 Nov 25, 2024
f7ff0dc
Docs: Add `WHEN NOT MATCHED BY SOURCE` to Spark doc (#11636)
hussein-awala Nov 25, 2024
fa00482
Build: Bump nessie from 0.100.0 to 0.100.2 (#11637)
dependabot[bot] Nov 25, 2024
430ebff
Build: Delete branch automatically on PR merge (#11635)
manuzhang Nov 26, 2024
f356087
Flink: Test both "new" Flink Avro planned reader and "deprecated" Avr…
jbonofre Nov 26, 2024
38d054e
Spark 3.4: Add procedure to compute table stats (#11652)
jeesou Nov 27, 2024
57527d7
Flink: Backport #11244 to Flink 1.19 (Add table.exec.iceberg.use-v2-s…
arkadius Nov 27, 2024
e9f24f8
Doc: Fix some Javadoc URLs. (#11666)
zhongyujiang Nov 27, 2024
2b3cb5b
Docs: Add blog post showing Nussknacker with Iceberg integration (#11…
arkadius Nov 27, 2024
9288d98
Kafka Connect: Add config to prefix the control consumer group (#11599)
hugofriant Nov 27, 2024
bd7cff1
Flink: Backport Avro planned reader (and corresponding tests) on Flin…
jbonofre Nov 27, 2024
3a8bf57
Docs: Add RisingWave (#11642)
hengm3467 Nov 28, 2024
163e206
REST: Docker file for REST Catalog Fixture (#11283)
ajantha-bhat Nov 28, 2024
a95943e
Core: Propagate custom metrics reporter when table is created/replace…
nastra Nov 28, 2024
8e0031c
Spark: remove ROW_POSITION from project schema (#11610)
huaxingao Nov 28, 2024
3a04257
Default to `overwrite` when operation is missing (#11421)
Fokko Nov 28, 2024
8fccdec
Core,API: Set `503: added_snapshot_id` as required (#11626)
Fokko Nov 28, 2024
7c7b4ba
Add GitHub Action to publish the `docker-rest-fixture` container (#11…
sungwy Nov 29, 2024
e770fac
Core, GCS, Spark: Replace wrong order of assertion (#11677)
ebyhr Nov 29, 2024
f978fe5
REST: Clean up `iceberg-rest-fixture` docker image naming (#11676)
ajantha-bhat Dec 1, 2024
2333640
Build: Bump software.amazon.awssdk:bom from 2.29.20 to 2.29.23 (#11683)
dependabot[bot] Dec 2, 2024
578dda8
Build: Bump mkdocs-material from 9.5.45 to 9.5.46 (#11680)
dependabot[bot] Dec 2, 2024
bc36f5e
REST: Use `HEAD` request to check table existence (#10999)
ebyhr Dec 2, 2024
a993b79
Build: Bump jackson-bom from 2.18.1 to 2.18.2 (#11681)
dependabot[bot] Dec 2, 2024
bfeaaeb
Build: Bump org.xerial:sqlite-jdbc from 3.47.0.0 to 3.47.1.0 (#11682)
dependabot[bot] Dec 2, 2024
d8326d8
Spark 3.5: Make where clause case sensitive in rewrite data files (#1…
ludlows Dec 2, 2024
af8e3f5
Spark: Remove extra columns for ColumnarBatch (#11551)
huaxingao Dec 3, 2024
6501d29
Spark: Add view support to SparkSessionCatalog (#11388)
nastra Dec 3, 2024
15bf9ca
Core: Fix warning message for deprecated OAuth2 server URI (#11694)
ebyhr Dec 4, 2024
c7cef9b
Build: Bump Parquet to 1.15.0 (#11656)
Fokko Dec 4, 2024
3278b69
Spark 3.3, 3.4: Make where clause case sensitive in rewrite data file…
ludlows Dec 4, 2024
8c04bcb
Core: Generalize Util.blockLocations (#11053)
okumin Dec 4, 2024
36140b8
Docs: Spark procedure for stats collection (#11606)
karuppayya Dec 4, 2024
38c8daa
Spark 3.5: Align RewritePositionDeleteFilesSparkAction filter with Sp…
huaxingao Dec 6, 2024
c91d3b7
Spark 3.5: Write DVs in Spark for V3 tables (#11561)
amogh-jahagirdar Dec 6, 2024
f931a3d
Infra: Add 1.7.1 to issue template (#11711)
bryanck Dec 6, 2024
2210e28
Update ASF doap.rdf to Release 1.7.1 (#11712)
bryanck Dec 6, 2024
deeb04b
Spark 3.3,3.4: Align RewritePositionDeleteFilesSparkAction filter wit…
huaxingao Dec 7, 2024
5d1dc1a
Build: Bump software.amazon.awssdk:bom from 2.29.23 to 2.29.29 (#11723)
dependabot[bot] Dec 9, 2024
b39253f
Build: Bump com.google.cloud:libraries-bom from 26.50.0 to 26.51.0 (#…
dependabot[bot] Dec 9, 2024
3eed132
Build: Bump mkdocs-material from 9.5.46 to 9.5.47 (#11726)
dependabot[bot] Dec 9, 2024
0699c8d
Add C++ to the list of languages in `doap.rdf` (#11714)
Fokko Dec 9, 2024
0662373
Build: Bump com.azure:azure-sdk-bom from 1.2.29 to 1.2.30 (#11725)
dependabot[bot] Dec 9, 2024
b18ab74
Add `curl` to the `iceberg-rest-fixture` Docker image (#11705)
dominikhei Dec 9, 2024
70d87f1
Build: Bump nessie from 0.100.2 to 0.101.0 (#11722)
dependabot[bot] Dec 9, 2024
2b2efd7
docs: 1.7.1 Release notes (#11717)
bryanck Dec 9, 2024
d402f83
Docs: Add guidelines for contributors to become committers (#11670)
rdblue Dec 9, 2024
28e8180
Core, Flink, Spark: Drop deprecated APIs scheduled for removal in 1.8…
findepi Dec 10, 2024
ac6509a
Flink: Fix range distribution npe when value is null (#11662)
Guosmilesmile Dec 10, 2024
ff81344
AWS: Enable RetryMode for AWS KMS client (#11420)
hsiang-c Dec 11, 2024
da53495
Core, Flink, Spark, KafkaConnect: Remove usage of deprecated path API…
amogh-jahagirdar Dec 11, 2024
fe2f593
Infra: Build Iceberg REST fixture docker image for `arm64` architectu…
Fokko Dec 11, 2024
af5e156
Docs: fix typos in spec (#11759)
xxchan Dec 12, 2024
587620b
Spark 3.4,3.5: Fix issue when views group by an ordinal (#11729)
Ppei-Wang Dec 12, 2024
5c00b29
Spark: Remove deprecated SparkAppenderFactory (#11727)
ajantha-bhat Dec 12, 2024
6c05f35
Core: Log where the missing metadata file is located for Hadoop (#11643)
manuzhang Dec 12, 2024
3053540
Core: Use HEAD request to check if view exists (#11760)
nastra Dec 12, 2024
1e126e2
Core: Use HEAD request to check if namespace exists (#11761)
nastra Dec 12, 2024
a3dcfd1
Hive: Optimize tableExists API in hive catalog (#11597)
dramaticlly Dec 12, 2024
540d6a6
GCS: Suppress JavaUtilDate in OAuth2RefreshCredentialsHandler (#11773)
ebyhr Dec 13, 2024
c2fd77a
Flink: Add RowConverter for Iceberg Source (#11301)
abharath9 Dec 14, 2024
bcf7b63
Spark 3.5: Fix assertion mismatch in PartitionedWritesTestBase/TestRe…
wzx140 Dec 15, 2024
fd739b3
Build: Bump nessie from 0.101.0 to 0.101.2 (#11791)
dependabot[bot] Dec 15, 2024
592b604
Core: Add missing REST endpoint definitions (#11756)
ajreid21 Dec 16, 2024
1851ca1
Build: Bump software.amazon.awssdk:bom from 2.29.29 to 2.29.34 (#11793)
dependabot[bot] Dec 16, 2024
2a5b089
Spark: Read DVs when reading from .position_deletes table (#11657)
nastra Dec 16, 2024
f40ec20
Core: Add TableUtil to provide access to a table's format version (#1…
nastra Dec 16, 2024
16cc4e9
Build: Bump mkdocs-material from 9.5.47 to 9.5.48 (#11790)
dependabot[bot] Dec 16, 2024
791d0fa
Spark 3.4: Add REST catalog to Spark integration tests (#11698)
nastra Dec 16, 2024
57ea310
Parquet: Implement defaults for generic data (#11785)
rdblue Dec 16, 2024
b9b61b1
Avro: Support default values for generic data (#11786)
rdblue Dec 16, 2024
ac865e3
REST: Use `apache/iceberg-rest-fixture` docker image (#11673)
ajantha-bhat Dec 17, 2024
5c170ae
docs: Default value of table level distribution-mode should be not se…
manuzhang Dec 17, 2024
3adcd89
Docs: Fix Spark catalog `table-override` description (#11684)
manuzhang Dec 17, 2024
ce7a4b4
API: Add missing deprecations (#11734)
Fokko Dec 17, 2024
ed06c9c
Core, Spark 3.5: Fix test failures due to timeout (#11654)
manuzhang Dec 17, 2024
a6cfc12
Auth Manager API part 1: HTTPRequest, HTTPHeader (#11769)
adutra Dec 17, 2024
e3628c1
Flink: make `StatisticsOrRecord` to be correctly serialized and deser…
huyuanfeng2018 Dec 18, 2024
b428fbc
Spark 3.4,3.5: Use correct identifier in view DESCRIBE cmd (#11751)
Ppei-Wang Dec 18, 2024
204a49c
Use try-with-resources in TestParallelIterable (#11810)
sopel39 Dec 18, 2024
7e1a4c9
Spark 3.5: Support default values in Parquet reader (#11803)
rdblue Dec 18, 2024
d0effc6
Data: Fix Parquet and Avro defaults date/time representation (#11811)
rdblue Dec 18, 2024
91a1505
Revert "Support WASB scheme in ADLSFileIO (#11504)" (#11812)
mrcnc Dec 18, 2024
88a2596
Core, Spark, Flink, Hive: Remove unused failsafe dependency from core…
amogh-jahagirdar Dec 19, 2024
3535240
Docs: Change to Flink directory for instructions (#11031)
liuml07 Dec 19, 2024
7033667
Spark 3.5: Support default values in vectorized reads (#11815)
rdblue Dec 19, 2024
ed36a9f
Spark 3.5: Remove numbers from assert description in TestRewritePosit…
TQJADE Dec 20, 2024
cdf748e
Auth Manager API part 2: AuthManager (#11809)
adutra Dec 20, 2024
dea2fd1
Core: Add Variant implementation to read serialized objects (#11415)
rdblue Dec 20, 2024
cd187c5
Spark: Test reading default values in Spark (#11832)
rdblue Dec 21, 2024
dbf26d7
Build: Bump datamodel-code-generator from 0.26.3 to 0.26.4 (#11856)
dependabot[bot] Dec 22, 2024
4ceb96d
Build: Bump mkdocs-awesome-pages-plugin from 2.9.3 to 2.10.0 (#11855)
dependabot[bot] Dec 22, 2024
b0a119c
Build: Bump mkdocs-material from 9.5.48 to 9.5.49 (#11854)
dependabot[bot] Dec 22, 2024
5c5d7c9
Build: Bump io.netty:netty-buffer from 4.1.115.Final to 4.1.116.Final…
dependabot[bot] Dec 22, 2024
bbf5d6f
Build: Bump software.amazon.awssdk:bom from 2.29.34 to 2.29.39 (#11851)
dependabot[bot] Dec 22, 2024
556969a
Build: Bump guava from 33.3.1-jre to 33.4.0-jre (#11850)
dependabot[bot] Dec 22, 2024
e0ccebc
Build: Bump junit from 5.11.3 to 5.11.4 (#11849)
dependabot[bot] Dec 22, 2024
e4d9c1d
Build: Bump org.assertj:assertj-core from 3.26.3 to 3.27.0 (#11847)
dependabot[bot] Dec 22, 2024
12d7ee5
Build: Bump nessie from 0.101.2 to 0.101.3 (#11852)
dependabot[bot] Dec 22, 2024
f7748f2
Build: Bump junit-platform from 1.11.3 to 1.11.4 (#11848)
dependabot[bot] Dec 22, 2024
55f10ca
Doc: Add status page for different implementations. (#11772)
liurenjie1024 Dec 23, 2024
ca3db93
Upgrade to Gradle 8.12 (#11861)
jbonofre Dec 23, 2024
dbd7d1c
Build: Fix ignoring `.asf.yaml` in PR (#11860)
manuzhang Dec 23, 2024
c6d9e0c
Gradle: Update `gradlew` with better `APP_HOME` definition (#11869)
jbonofre Dec 24, 2024
d6d3cf5
Core, Spark: Avoid deprecated methods in Guava Files (#11865)
ebyhr Dec 24, 2024
1b5886d
Core: Don't clear snapshotLog in `TableMetadata.removeRef` (#11779)
ebyhr Dec 24, 2024
4eb9f7f
Core: Replace deprecated Schema.toString with SchemaFormatter (#11867)
ebyhr Dec 25, 2024
bb27030
Build: Fix ignoring `license-check.yml` in PR (#11873)
manuzhang Dec 25, 2024
fc3f705
API: Replace deprecated `asList` with `asInstanceOf` (#11875)
ebyhr Dec 26, 2024
de54a08
Flink: Avoid RANGE mode broken chain when write parallelism changes (…
huyuanfeng2018 Dec 27, 2024
607e2fb
Update `README.md` with `iceberg-cpp` (#11882)
gabeiglio Dec 28, 2024
0029d6a
Build: Bump software.amazon.awssdk:bom from 2.29.39 to 2.29.43 (#11886)
dependabot[bot] Dec 29, 2024
3fa8a46
Build: Bump mkdocs-awesome-pages-plugin from 2.10.0 to 2.10.1 (#11885)
dependabot[bot] Dec 29, 2024
7f14032
Core: Fix typo in HadoopTableOperations (#11880)
okumin Dec 29, 2024
e3f50e5
Revert "Hive: close the fileIO client when closing the hive catalog (…
Fokko Dec 30, 2024
ab6365d
Docs: Add history to Hive's metadata tables (#11902)
okumin Jan 2, 2025
3b00043
Doc: Fix format of Hive (#11892)
ebyhr Jan 3, 2025
4d35682
Flink: Backport #11662 Fix range distribution npe when value is null …
Guosmilesmile Jan 3, 2025
c0d6d42
Spark: Change delete file granularity to file in Spark 3.5 (#11478)
amogh-jahagirdar Jan 3, 2025
dbfefb0
Bump Apache Spark to 3.5.4 (#11731)
pan3793 Jan 3, 2025
fcd5dd9
Kafka-connect-runtime: remove code duplications in integration tests …
wombatu-kun Jan 4, 2025
2247f59
Work in progress HashBucketing and Batch Processing
Jan 5, 2025
7d38197
Merge branch 'apache:main' into test/read-batch+hashbucketing
TTTechnoPro Jan 5, 2025
2 changes: 2 additions & 0 deletions .asf.yaml
@@ -39,6 +39,8 @@ github:
  required_approving_review_count: 1

  required_linear_history: true

+ del_branch_on_merge: true
+
  features:
  wiki: true
4 changes: 3 additions & 1 deletion .github/ISSUE_TEMPLATE/iceberg_bug_report.yml
@@ -28,7 +28,9 @@ body:
  description: What Apache Iceberg version are you using?
  multiple: false
  options:
- - "1.6.1 (latest release)"
+ - "1.7.1 (latest release)"
+ - "1.7.0"
+ - "1.6.1"
  - "1.6.0"
  - "1.5.2"
  - "1.5.1"
6 changes: 0 additions & 6 deletions .github/labeler.yml
@@ -130,12 +130,6 @@ MR:
  'mr/**/*'
  ]

- PIG:
- - changed-files:
- - any-glob-to-any-file: [
- 'pig/**/*'
- ]
-
  AWS:
  - changed-files:
  - any-glob-to-any-file: [
5 changes: 2 additions & 3 deletions .github/workflows/delta-conversion-ci.yml
@@ -37,23 +37,22 @@ on:
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/kafka-connect-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/spark-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'mr/**'
  - 'hive3/**'
  - 'hive3-orc-bundle/**'
  - 'hive-runtime/**'
  - 'flink/**'
  - 'kafka-connect/**'
- - 'pig/**'
  - 'docs/**'
  - 'site/**'
  - 'open-api/**'
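The `paths-ignore` corrections in this hunk (`licence-check.yml` to `license-check.yml`, `.asf.yml` to `.asf.yaml`) are typos in literal file names, which a small local check could flag. A minimal sketch, assuming the literal entries are supplied one per line on stdin; `missing_paths` is a hypothetical helper and is not part of the repository or of GitHub Actions:

```shell
# Hypothetical helper: print every literal (non-glob) entry read from stdin
# that does not exist under the directory given as $1.
missing_paths() {
  root="$1"
  while IFS= read -r entry; do
    case "$entry" in
      ''|*'*'*) continue ;;   # skip blank lines and glob patterns such as 'dev/**'
    esac
    [ -e "$root/$entry" ] || printf '%s\n' "$entry"
  done
}
```

For example, `printf '%s\n' '.asf.yml' '.asf.yaml' 'dev/**' | missing_paths .` prints `.asf.yml` when run in a checkout that contains only `.asf.yaml`.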
5 changes: 2 additions & 3 deletions .github/workflows/flink-ci.yml
@@ -37,23 +37,22 @@ on:
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/kafka-connect-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/spark-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'mr/**'
  - 'hive3/**'
  - 'hive3-orc-bundle/**'
  - 'hive-runtime/**'
  - 'kafka-connect/**'
  - 'spark/**'
- - 'pig/**'
  - 'docs/**'
  - 'site/**'
  - 'open-api/**'
5 changes: 2 additions & 3 deletions .github/workflows/hive-ci.yml
@@ -37,21 +37,20 @@ on:
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/kafka-connect-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/spark-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'arrow/**'
  - 'spark/**'
  - 'flink/**'
  - 'kafka-connect/**'
- - 'pig/**'
  - 'docs/**'
  - 'site/**'
  - 'open-api/**'
4 changes: 2 additions & 2 deletions .github/workflows/java-ci.yml
@@ -37,15 +37,15 @@ on:
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/kafka-connect-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/spark-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'docs/**'
  - 'site/**'
5 changes: 2 additions & 3 deletions .github/workflows/kafka-connect-ci.yml
@@ -37,23 +37,22 @@ on:
  - '.github/workflows/java-ci.yml'
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/spark-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'mr/**'
  - 'flink/**'
  - 'hive3/**'
  - 'hive3-orc-bundle/**'
  - 'hive-runtime/**'
  - 'spark/**'
- - 'pig/**'
  - 'docs/**'
  - 'site/**'
  - 'open-api/**'
63 changes: 63 additions & 0 deletions .github/workflows/publish-iceberg-rest-fixture-docker.yml
@@ -0,0 +1,63 @@
+ #
+ # Licensed to the Apache Software Foundation (ASF) under one
+ # or more contributor license agreements. See the NOTICE file
+ # distributed with this work for additional information
+ # regarding copyright ownership. The ASF licenses this file
+ # to you under the Apache License, Version 2.0 (the
+ # "License"); you may not use this file except in compliance
+ # with the License. You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing,
+ # software distributed under the License is distributed on an
+ # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ # KIND, either express or implied. See the License for the
+ # specific language governing permissions and limitations
+ # under the License.
+ #
+
+ name: Build and Push 'iceberg-rest-fixture' Docker Image
+
+ on:
+   push:
+     tags:
+       - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
+   workflow_dispatch:
+
+ env:
+   DOCKER_IMAGE_TAG: iceberg-rest-fixture
+   DOCKER_IMAGE_VERSION: latest
+   DOCKER_REPOSITORY: apache
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+       - uses: actions/setup-java@v4
+         with:
+           distribution: zulu
+           java-version: 21
+       - name: Build Iceberg Open API project
+         run: ./gradlew :iceberg-open-api:shadowJar
+       - name: Login to Docker Hub
+         run: |
+           docker login -u ${{ secrets.DOCKERHUB_USER }} -p ${{ secrets.DOCKERHUB_TOKEN }}
+       - name: Set the tagged version
+         # for tag 'apache-iceberg-1.7.1', publish image 'apache/iceberg-rest-fixture:1.7.1'
+         if: github.event_name == 'push' && contains(github.ref, 'refs/tags/')
+         run: |
+           echo "DOCKER_IMAGE_VERSION=`echo ${{ github.ref }} | tr -d -c 0-9.`" >> "$GITHUB_ENV"
+       - name: Set up QEMU
+         uses: docker/setup-qemu-action@v3
+       - name: Set up Docker Buildx
+         uses: docker/setup-buildx-action@v3
+       - name: Build and Push
+         uses: docker/build-push-action@v6
+         with:
+           context: ./
+           file: ./docker/iceberg-rest-fixture/Dockerfile
+           platforms: linux/amd64,linux/arm64
+           push: true
+           tags: ${{ env.DOCKER_REPOSITORY }}/${{ env.DOCKER_IMAGE_TAG }}:${{ env.DOCKER_IMAGE_VERSION }}
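The "Set the tagged version" step derives the image version by deleting every character of the Git ref that is not a digit or a dot. A minimal sketch of that transformation, runnable outside the workflow; `version_from_ref` is a hypothetical name introduced here for illustration and does not appear in the workflow:

```shell
# Hypothetical helper mirroring the workflow's `tr -d -c 0-9.` pipeline.
version_from_ref() {
  # -c complements the set, -d deletes: everything except digits and dots is removed
  printf '%s' "$1" | tr -d -c '0-9.'
}

version_from_ref 'refs/tags/apache-iceberg-1.7.1'   # prints 1.7.1
```

Because the filter keeps every digit in the ref, a ref such as `refs/tags/apache-iceberg-1.7.1-rc2` would yield `1.7.12`. Whether such a tag could reach this step depends on the `tags:` filter in the trigger, so this is a property of the `tr` expression rather than a claim about the workflow's behavior.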
5 changes: 2 additions & 3 deletions .github/workflows/spark-ci.yml
@@ -38,14 +38,14 @@ on:
  - '.github/workflows/jmh-benchmarks-ci.yml'
  - '.github/workflows/kafka-connect-ci.yml'
  - '.github/workflows/labeler.yml'
- - '.github/workflows/licence-check.yml'
+ - '.github/workflows/license-check.yml'
  - '.github/workflows/open-api.yml'
  - '.github/workflows/publish-snapshot.yml'
  - '.github/workflows/recurring-jmh-benchmarks.yml'
  - '.github/workflows/site-ci.yml'
  - '.github/workflows/stale.yml'
  - '.gitignore'
- - '.asf.yml'
+ - '.asf.yaml'
  - 'dev/**'
  - 'site/**'
  - 'mr/**'
@@ -54,7 +54,6 @@ on:
  - 'hive-runtime/**'
  - 'flink/**'
  - 'kafka-connect/**'
- - 'pig/**'
  - 'docs/**'
  - 'open-api/**'
  - 'format/**'
26 changes: 26 additions & 0 deletions .palantir/revapi.yml
@@ -1145,6 +1145,32 @@ acceptedBreaks:
  new: "method org.apache.iceberg.BaseMetastoreOperations.CommitStatus org.apache.iceberg.BaseMetastoreTableOperations::checkCommitStatus(java.lang.String,\
  \ org.apache.iceberg.TableMetadata)"
  justification: "Removing deprecated code"
+ "1.7.0":
+ org.apache.iceberg:iceberg-core:
+ - code: "java.method.removed"
+ old: "method <T extends org.apache.iceberg.StructLike> org.apache.iceberg.deletes.PositionDeleteIndex\
+ \ org.apache.iceberg.deletes.Deletes::toPositionIndex(java.lang.CharSequence,\
+ \ java.util.List<org.apache.iceberg.io.CloseableIterable<T>>)"
+ justification: "Removing deprecated code"
+ - code: "java.method.removed"
+ old: "method <T extends org.apache.iceberg.StructLike> org.apache.iceberg.deletes.PositionDeleteIndex\
+ \ org.apache.iceberg.deletes.Deletes::toPositionIndex(java.lang.CharSequence,\
+ \ java.util.List<org.apache.iceberg.io.CloseableIterable<T>>, java.util.concurrent.ExecutorService)"
+ justification: "Removing deprecated code"
+ - code: "java.method.removed"
+ old: "method <T> org.apache.iceberg.io.CloseableIterable<T> org.apache.iceberg.deletes.Deletes::streamingFilter(org.apache.iceberg.io.CloseableIterable<T>,\
+ \ java.util.function.Function<T, java.lang.Long>, org.apache.iceberg.io.CloseableIterable<java.lang.Long>)"
+ justification: "Removing deprecated code"
+ - code: "java.method.removed"
+ old: "method <T> org.apache.iceberg.io.CloseableIterable<T> org.apache.iceberg.deletes.Deletes::streamingFilter(org.apache.iceberg.io.CloseableIterable<T>,\
+ \ java.util.function.Function<T, java.lang.Long>, org.apache.iceberg.io.CloseableIterable<java.lang.Long>,\
+ \ org.apache.iceberg.deletes.DeleteCounter)"
+ justification: "Removing deprecated code"
+ - code: "java.method.removed"
+ old: "method <T> org.apache.iceberg.io.CloseableIterable<T> org.apache.iceberg.deletes.Deletes::streamingMarker(org.apache.iceberg.io.CloseableIterable<T>,\
+ \ java.util.function.Function<T, java.lang.Long>, org.apache.iceberg.io.CloseableIterable<java.lang.Long>,\
+ \ java.util.function.Consumer<T>)"
+ justification: "Removing deprecated code"
  apache-iceberg-0.14.0:
  org.apache.iceberg:iceberg-api:
  - code: "java.class.defaultSerializationChanged"
1 change: 1 addition & 0 deletions LICENSE
@@ -298,6 +298,7 @@ License: https://www.apache.org/licenses/LICENSE-2.0
  This product includes code from Delta Lake.

  * AssignmentAlignmentSupport is an independent development but UpdateExpressionsSupport in Delta was used as a reference.
+ * RoaringPositionBitmap is a Java implementation of RoaringBitmapArray in Delta.

  Copyright: 2020 The Delta Lake Project Authors.
  Home page: https://delta.io/
2 changes: 1 addition & 1 deletion README.md
@@ -74,7 +74,6 @@ Iceberg also has modules for adding Iceberg support to processing engines:
  * `iceberg-spark` is an implementation of Spark's Datasource V2 API for Iceberg with submodules for each spark versions (use runtime jars for a shaded version)
  * `iceberg-flink` contains classes for integrating with Apache Flink (use iceberg-flink-runtime for a shaded version)
  * `iceberg-mr` contains an InputFormat and other classes for integrating with Apache Hive
- * `iceberg-pig` is an implementation of Pig's LoadFunc API for Iceberg

  ---
  **NOTE**
@@ -98,3 +97,4 @@ This repository contains the Java implementation of Iceberg. Other implementatio
  * **Go**: [iceberg-go](https://github.com/apache/iceberg-go)
  * **PyIceberg** (Python): [iceberg-python](https://github.com/apache/iceberg-python)
  * **Rust**: [iceberg-rust](https://github.com/apache/iceberg-rust)
+ * **C++**: [iceberg-cpp](https://github.com/apache/iceberg-cpp)
Original file line number Diff line number Diff line change
@@ -65,7 +65,7 @@ public static AliyunOSSExtension initialize() {
  } else {
  LOG.info(
  "Initializing AliyunOSSExtension implementation with default AliyunOSSMockExtension");
- extension = AliyunOSSMockExtension.builder().silent().build();
+ extension = AliyunOSSMockExtension.builder().build();
  }

  return extension;
Original file line number Diff line number Diff line change
@@ -70,7 +70,7 @@ public void testWrite() throws IOException {
  reset(ossMock);

  // Write large file.
- writeAndVerify(ossMock, uri, randomData(32 * 1024 * 1024), arrayWrite);
+ writeAndVerify(ossMock, uri, randomData(32 * 1024), arrayWrite);
  verify(ossMock, times(1)).putObject(any());
  reset(ossMock);
  }