[WIP] Provide spark catalog, dsv2 and use parquet for copy/unload #120

Closed
wants to merge 175 commits into from
717a4ad
Notes about inlining this in Databricks Runtime.
rxin Nov 8, 2017
184b442
Make the note more obvious.
rxin Nov 8, 2017
a3a39a2
Remove itests. Fix jdbc url. Update Redshift jdbc driver
dichiarafrancesco May 9, 2018
cafa05f
Merge pull request #1 from Yelp/fdc_DATALAKE-563_remove-itests-from-p…
dichiarafrancesco May 9, 2018
ab8124a
Fix double type to float and cleanup
dichiarafrancesco May 9, 2018
3230aaa
Avoid logging creds. log sql query statement only
dichiarafrancesco May 9, 2018
3384333
Add bit and default types
dichiarafrancesco May 9, 2018
58fb829
Fix test
dichiarafrancesco May 9, 2018
040b4a9
Merge pull request #2 from Yelp/fdc_DATALAKE-488_cleanup-fix-double-t…
dichiarafrancesco May 10, 2018
967dddb
Merge pull request #3 from Yelp/fdc_DATALAKE-486_avoid-log-creds
dichiarafrancesco May 10, 2018
3ae6a9b
Fix Empty string is converted to null
dichiarafrancesco May 11, 2018
475e7a1
Fix convertion bit and test
dichiarafrancesco May 12, 2018
d16317e
Fix indentation
dichiarafrancesco May 12, 2018
e15ccb5
Fix parenthesis
dichiarafrancesco May 14, 2018
d06fe3b
Fix scalastyle
dichiarafrancesco May 14, 2018
689635c
Fix File line length exceeds 100 characters
dichiarafrancesco May 14, 2018
0d2a130
Merge pull request #4 from Yelp/fdc_DATALAKE-4899_empty-string-to-null
dichiarafrancesco May 14, 2018
fbb58b3
First Yelp release
dichiarafrancesco May 14, 2018
50dfd98
Merge pull request #5 from Yelp/fdc_first-version
dichiarafrancesco May 14, 2018
90581a8
Fixed NewFilter - including hadoop-aws - s3n test is failing
lucagiovagnoli May 14, 2019
834f0d6
Upgraded jackson by excluding it in aws
lucagiovagnoli May 15, 2019
ea5da29
force spark.avro - hadoop 2.7.7 and awsjavasdk downgraded
lucagiovagnoli May 17, 2019
0fe37d2
Compiles with spark 2.4.0 - amazon unmarshal error
lucagiovagnoli May 31, 2019
da10897
Compiling - managed to run tests but they mostly fail
lucagiovagnoli May 31, 2019
95cdf94
Removing conn.commit() everywhere - got 88% of integration tests to r…
lucagiovagnoli Jun 1, 2019
b1fa3f6
Ignoring a bunch of tests as did snowflake - close to have a green bu…
lucagiovagnoli Jun 6, 2019
f3bbdb7
sbt assembly the package into a fat jar - found the perfect coordinat…
lucagiovagnoli Jun 7, 2019
0666bc6
aws_variables.env gitignored
lucagiovagnoli Jun 12, 2019
094cc15
remove in Memory FileSystem class and clean up comments in the sbt bu…
lucagiovagnoli Jun 13, 2019
866d4fd
Moving to external github issues - rename spName to spark-redshift-co…
lucagiovagnoli Jun 18, 2019
25acded
Revert sbt scripts to an older version
lucagiovagnoli Jun 19, 2019
5b0f949
Merge pull request #6 from spark-redshift-community/luca-spark-2.4
lucagiovagnoli Jun 26, 2019
7746c51
Moving package from com.databricks.spark.redshift to com.spark.redshi…
lucagiovagnoli Jun 27, 2019
bef9893
Better CHANGELOG - modernize SparkRedshiftBuild to build.sbt
lucagiovagnoli Jun 27, 2019
94449d0
Fix all broken databricks spark-redshift substitutions to the communi…
lucagiovagnoli Jun 27, 2019
235468b
Remove merge_pr utility - minor README update
lucagiovagnoli Jun 27, 2019
492b1ca
Renaming package to com.spark_redshift_community.spark.redshift
lucagiovagnoli Jul 1, 2019
4a210a9
Merge pull request #12 from spark-redshift-community/luca-COREML-867-…
lucagiovagnoli Jul 2, 2019
5fd10e3
Removing AWSCredentialsInUriIntegrationSuite test - credentials in th…
lucagiovagnoli Jul 10, 2019
b2dc8ff
Update CHANGELOG and version
lucagiovagnoli Jul 10, 2019
9574e44
Merge pull request #15 from spark-redshift-community/luca-COREML-889-…
lucagiovagnoli Jul 12, 2019
d2690f8
Move to previewDATE rather than SNAPSHOT releases before 4.0.0
lucagiovagnoli Jul 15, 2019
0b8f706
Refactor package and organization to be io.github.spark_redshift_comm…
lucagiovagnoli Jul 16, 2019
9fb479a
Merge pull request #18 from spark-redshift-community/luca-move-to-pre…
lucagiovagnoli Jul 16, 2019
c53c055
Travis CI using hadoop 2.7.7, spark 2.4.3
lucagiovagnoli Jul 23, 2019
92d6d56
Removing assembly plugin (unused) - add icons for build and coverage …
lucagiovagnoli Jul 23, 2019
034fd5a
Merge pull request #21 from spark-redshift-community/luca-set-up-trav…
lucagiovagnoli Jul 24, 2019
2515795
Add a simple how to build from source tutorial
Aug 2, 2019
e31778c
Re-enable RedshiftSourceSuite using a new InMemoryS3AFileSystem
lucagiovagnoli Aug 5, 2019
c0af333
Merge pull request #24 from lucagiovagnoli/luca-re-enable-RedshiftSou…
lucagiovagnoli Aug 6, 2019
2c37142
Address review comments on using repo's sbt
Aug 7, 2019
1b6fbf7
Merge pull request #23 from smoy/how_to_build
lucagiovagnoli Aug 7, 2019
c1bec40
Modernize datetime parsing - use DateTimeFormatter from Java 8
lucagiovagnoli Aug 14, 2019
5f83d87
Fix line too long and update version
lucagiovagnoli Aug 14, 2019
5a3c56e
Fix DateTimeFormatter String - timezone in Redshift in the form of +00
lucagiovagnoli Aug 15, 2019
2987ba7
Fix ZonedDateTime conversion to timestamp
lucagiovagnoli Aug 15, 2019
890be17
Fix timezone test to run correctly on machines in any timezone
lucagiovagnoli Aug 15, 2019
0c8ad55
Merge pull request #26 from lucagiovagnoli/luca-COREML-952-parse-time…
lucagiovagnoli Aug 16, 2019
87cceaa
Expected value in timstamptz test was wrong
lucagiovagnoli Aug 20, 2019
768130c
Merge pull request #27 from lucagiovagnoli/luca-COREML-timestamptz-fi…
lucagiovagnoli Aug 22, 2019
c199b2e
Change groupId to use hyphens
lucagiovagnoli Sep 4, 2019
4359ec0
successfully publishLocalSigned
lucagiovagnoli Sep 10, 2019
fcb3641
correct checkThatBucketHasObjectLifecycleConfiguration when s3 lifecy…
julienbachmann Sep 10, 2019
5c2a400
Merge pull request #1 from nagra-insight/fix/checkThatBucketHasObject…
julienbachmann Sep 10, 2019
8311244
Add changelog and snapshot version to test publishing to Maven Central
lucagiovagnoli Sep 10, 2019
aec0fdf
remove extra space
lucagiovagnoli Sep 16, 2019
147c843
Merge pull request #28 from lucagiovagnoli/luca-issue19-publish-maven…
lucagiovagnoli Sep 16, 2019
58ec6f6
Stable 4.0.0 release to publish to maven central
lucagiovagnoli Sep 17, 2019
4df21f0
Merge pull request #29 from lucagiovagnoli/luca-publish-maven-central…
lucagiovagnoli Sep 17, 2019
bd350e2
test all 4 branches in the `checkThatBucketHasObjectLifecycleConfigur…
julienbachmann Sep 18, 2019
9ebb338
correct missing dot
julienbachmann Sep 22, 2019
c354a8f
Fixed group ids as per issue #33
grmontpetit Oct 24, 2019
da4b715
Addressed PR's comments.
grmontpetit Oct 25, 2019
ae6280c
Merge pull request #38 from sniggel/fix/group-ids
lucagiovagnoli Oct 25, 2019
51271a0
README improvements
eeshugerman Nov 6, 2019
ccab1aa
Merge pull request #49 from eeshugerman/master
lucagiovagnoli Nov 8, 2019
a26007b
Handle microseconds from redshift
lucagiovagnoli Nov 14, 2019
1ed1fc1
Bump version and changelog
lucagiovagnoli Nov 14, 2019
dec70a6
Handle 4 and 5 digits after the comma
lucagiovagnoli Nov 14, 2019
e9d4653
Merge pull request #51 from lucagiovagnoli/luca-handle-microseconds
lucagiovagnoli Nov 21, 2019
657c2e8
ISSUE-56 | Trimming preactions and postactions before splitting to av…
chandanatalef Dec 7, 2019
1c1b421
ISSUE-56 | Reformatting
meetchandan Dec 7, 2019
1d38a5f
ISSUE-56 | Added tests
meetchandan Dec 7, 2019
c9b6d9c
ISSUE-56 | fixing tests
meetchandan Dec 7, 2019
78465a8
Merge pull request #57 from meetchandan/ISSUE-56
lucagiovagnoli Dec 13, 2019
c28e985
Bump version and changelog to 4.0.2 - bug fix sql text trimming
lucagiovagnoli Dec 13, 2019
849dd82
add 'include_column_list' parameter
Dec 6, 2019
edebea4
fix scalastyle check
Dec 9, 2019
bb252e3
code review: don't use null for schema in test
Dec 14, 2019
5d8f9d2
code review: add test for include_column_list=false
Dec 14, 2019
bc113fa
code review: changelog, bump version
Dec 14, 2019
d4396f8
Merge pull request #58 from eeshugerman/column-list
lucagiovagnoli Dec 16, 2019
9dd950b
Update "How to help" section
lucagiovagnoli Dec 16, 2019
21e3ec8
Merge pull request #59 from spark-redshift-community/lucagiovagnoli-p…
lucagiovagnoli Dec 17, 2019
9af6b6c
Update README.md
lucagiovagnoli Feb 25, 2020
84ebe9d
Merge pull request #68 from spark-redshift-community/lucagiovagnoli-R…
lucagiovagnoli Feb 26, 2020
42fa95e
fix
karuppayya Apr 1, 2020
2e82f97
Refactor
karuppayya Apr 17, 2020
d0c5279
Fix build.sbt
karuppayya Apr 17, 2020
6283202
Refactoring, fixes
karuppayya Apr 23, 2020
07da20b
cross publish to 2.12
joprice Jun 15, 2020
da88eb7
Fix compilation
karuppayya Jul 9, 2020
55b63db
Rename class name
karuppayya Jul 9, 2020
c89ce05
Fix checkstyle
karuppayya Jul 9, 2020
633e1e4
Temporarily unblock build
karuppayya Jul 10, 2020
45b3e1b
Fix spark version
karuppayya Jul 10, 2020
888b650
Update README.md
wseaton Jul 14, 2020
e9e6a65
Merge pull request #73 from wseaton/patch-1
lucagiovagnoli Sep 15, 2020
e8b354a
Merge pull request #72 from joprice/crossPublish
lucagiovagnoli Sep 16, 2020
d39ef0b
Bump to 4.1.1 to cross publish for scala 2.12
lucagiovagnoli Sep 16, 2020
674d6d9
Merge pull request #75 from spark-redshift-community/luca-bump-to-4.1.1
lucagiovagnoli Sep 16, 2020
d6f2281
spark-3.0.1 compatible
Oct 1, 2020
020051a
spark-3.0.1 compatible - updated CI
Oct 1, 2020
9f68173
spark-3.0.1 compatible - updated test suites
Oct 1, 2020
f34026b
used enableHiveSupport for sqlContext
Oct 1, 2020
7dbf620
address review comment
Oct 6, 2020
ada001a
keep CI testing spark 2.x
Oct 6, 2020
8f5f700
we can't test the same code with 2 versions of sparks
Oct 6, 2020
636f8aa
bump minor version per review
Oct 6, 2020
6d3d3fe
removed trailing space
Oct 7, 2020
8e34bbf
updated version
Oct 8, 2020
cd1aea7
Merge pull request #77 from vanhoale/spark3.x-compatible
lucagiovagnoli Oct 8, 2020
de59930
Merge branch 'master' of https://github.com/spark-redshift-community/…
julienbachmann Oct 27, 2020
8230764
do not display stacktrace on error on getBucketLifecycle
julienbachmann Oct 27, 2020
5288e24
Fix compilation errors
88manpreet Dec 14, 2020
0b232cf
Ignore remaining 6 failing unit tests and fix dependencies
88manpreet Dec 18, 2020
bda5ec1
Fix hadoop version for travis
88manpreet Dec 18, 2020
4349270
Improve readability
88manpreet Jan 7, 2021
3ed0e5c
Bump version for hadoop 3 support in spark-redshift
88manpreet Jan 13, 2021
a9e2604
Bump version to 5.0.0 Hadoop 3 support
88manpreet Jan 13, 2021
453e3d0
Merge pull request #79 from spark-redshift-community/hadoop3_2_1_support
88manpreet Jan 13, 2021
cf58172
Creating a serializer per row has a really low performance.
gumartinm Apr 22, 2021
1c24b8c
Merge pull request #88 from gumartinm/spark3_low_performance
88manpreet Apr 24, 2021
7726c10
Bump version to 5.0.1
88manpreet May 1, 2021
11494b1
add sse kms support
May 5, 2021
20827fb
Put comment on two lines to appease the linter
May 6, 2021
160c82d
whitespace fix
May 6, 2021
dfc2b8e
Merge pull request #89 from munk/jd/add-sse-kms-support
88manpreet May 6, 2021
c9999a9
Bump version to v5.0.2
88manpreet May 6, 2021
59234aa
Remove reliance on sbt-spark-package.
jsleight May 7, 2021
3cb8667
Merge pull request #90 from spark-redshift-community/jsleight_remove_…
jsleight May 7, 2021
02b2ff5
Bump version to v5.0.3
88manpreet May 10, 2021
bc040d5
Update Spark version and aws java sdk version for consistency purposes
88manpreet Jul 9, 2021
cb39659
Bump version to 5.0.4
88manpreet Jul 9, 2021
846cc23
Merge pull request #92 from spark-redshift-community/d/manpreet/upgra…
88manpreet Jul 9, 2021
daf2ecf
Merge pull request #30 from nagra-insight/master
88manpreet Nov 9, 2021
bfe823b
Bump version to 5.0.5 to avoid warning when bucket if configured for …
88manpreet Nov 9, 2021
557783f
Merge pull request #97 from 88manpreet/d/manpreet/bump_version_bucket…
88manpreet Nov 9, 2021
dfa47dc
fix: log4j-api not compatibility with spark 3.2.0
Jan 13, 2022
1ede80a
Merge pull request #100 from nvlong198/fix/log4jcompatibility
jsleight Jan 13, 2022
ab96178
Upgrade to spark v3.2.0
jsleight Jan 27, 2022
1f12470
Merge pull request #101 from jsleight/u/jsleight/spark3.2
jsleight Jan 27, 2022
b8eb032
Add catalyst type mapping for LONGVARCHAR
Mar 29, 2022
a675a3c
Merge pull request #105 from matthewrj/support-super
jsleight Mar 30, 2022
f0d98f6
make manifest file path use s3a/n scheme
nanflasted Sep 9, 2022
a51b440
Merge pull request #110 from nanflasted/ych_tries_fixing_manifest_s3a
jsleight Sep 21, 2022
84870fa
Bump version to 5.1.0 to support Spark 3.2
88manpreet Sep 22, 2022
78792a9
Merge pull request #111 from spark-redshift-community/d/manpreet/bump…
88manpreet Sep 22, 2022
88948bd
Fix build
parisni Dec 23, 2022
3479ac6
Merge remote-tracking branch 'karu/redshift_dsv2' into dsv2
parisni Dec 29, 2022
69b1445
Fix dsv2 for spark3.2
parisni Dec 29, 2022
eca6ea8
Make sbt build on local
parisni Dec 30, 2022
a72bcd1
Add redshift catalog
parisni Dec 30, 2022
7381aea
Implement pushdown agg
parisni Dec 30, 2022
cf64c1d
Revert "Implement pushdown agg"
parisni Dec 30, 2022
ee6ea20
Handle redshift empty parquet files
parisni Dec 30, 2022
8fad123
Implement table ttl in minutes
parisni Dec 30, 2022
b133fbb
Implement catalog write support
parisni Jan 1, 2023
511a209
Implement copy in parquet format
parisni Jan 2, 2023
89e436f
Document features
parisni Jan 3, 2023
885f7b3
Support drop, alter, rename table and database
parisni Jan 4, 2023
6d17ad1
Support refresh table to invalidate cache
parisni Jan 4, 2023
e04356f
Support desc redshift comments
parisni Jan 12, 2023
869c49e
Speedup redshift schema discovery
parisni Jan 12, 2023
48dbdd3
Use cache manifest
parisni Jan 21, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ project/target
.idea_modules/
*.DS_Store
build/*.jar
aws_variables.env
derby.log
35 changes: 3 additions & 32 deletions .travis.yml
@@ -9,40 +9,11 @@ before_cache:
# Tricks to avoid unnecessary cache updates
- find $HOME/.ivy2 -name "ivydata-*.properties" -delete
- find $HOME/.sbt -name "*.lock" -delete
# There's no nicer way to specify this matrix; see
# https://github.com/travis-ci/travis-ci/issues/1519.
matrix:
include:
# Scala 2.10.5 tests:
- jdk: openjdk7
scala: 2.10.5
env: HADOOP_VERSION="2.2.0" SPARK_VERSION="2.0.0" SPARK_AVRO_VERSION="3.0.0" AWS_JAVA_SDK_VERSION="1.10.22"
# Scala 2.11 tests:
- jdk: openjdk7
scala: 2.11.7
env: HADOOP_VERSION="2.2.0" SPARK_VERSION="2.0.0" SPARK_AVRO_VERSION="3.0.0" AWS_JAVA_SDK_VERSION="1.10.22"
# Test with an old version of the AWS Java SDK
- jdk: openjdk7
scala: 2.11.7
env: HADOOP_VERSION="2.2.0" SPARK_VERSION="2.0.0" SPARK_AVRO_VERSION="3.0.0" AWS_JAVA_SDK_VERSION="1.7.4"
env:
global:
# AWS_REDSHIFT_JDBC_URL
- secure: "RNkxdKcaKEYuJqxli8naazp42qO5/pgueIzs+J5rHwl39jcBvJMgW3DX8kT7duzdoBb/qrolj/ttbQ3l/30P45+djn0BEwcJMX7G/FGpZYD23yd03qeq7sOKPQl2Ni/OBttYHJMah5rI6aPmAysBZMQO7Wijdenb/RUiU2YcZp0="
# AWS_REDSHIFT_PASSWORD
- secure: "g5li3gLejD+/2BIqIm+qHiqBUvCc5l0qnftVaVlLtL7SffErp/twDiFP4gW8eqnFqi2GEC1c9Shf7Z9cOIUunNSBQZdYIVG0f38UfBeDP14nOoIuwZ974O5yggbgZhX0cKvJzINcENGoRNk0FzRwgOdCCiF05IMnRqQxI3C24fE="
# AWS_REDSHIFT_USER
- secure: "LIkY/ZpBXK3vSFsdpBSRXEsgfD2wDF52X8OZOlyBJOiZpS4y1/obj8b3VQABDPyPH95bGX/LOpM0vVM137rYgF0pskgVEzLMyZOPpwYqNGPf/d4BtQhBRc8f7+jmr6D4Hrox4jCl0cCKaeiTazun2+Y9E+zgCUDvQ8y9qGctR2k="
# TEST_AWS_ACCESS_KEY_ID
- secure: "bsB6YwkscUxtzcZOKja4Y69IR3JqvCP3W/4vFftW/v33/hOC3EBz7TVNKS+ZIomBUQYJnzsMfM59bj7YEc3KZe8WxIcUdLI40hg0X5O1RhJDNPW+0oGbWshmzyua+hY1y7nRja+8/17tYTbAi1+MhscRu+O/2aWaXolA9BicuX0="
# TEST_AWS_SECRET_ACCESS_KEY
- secure: "cGxnZh4be9XiPBOMxe9wHYwEfrWNw4zSjmvGFEC9UUV11ydHLo5wrXtcTVFmY7qxUxYeb0NB2N+CQXE0GcyUKoTviKG9sOS3cxR1q30FsdOVcWDKAzpBUmzDTMwDLAUMysziyOtMorDlNVydqYdYLMpiUN0O+eDKA+iOHlJp7fo="
# STS_ROLE_ARN
- secure: "cuyemI1bqPkWBD5B1FqIKDJb5g/SX5x8lrzkO0J/jkyGY0VLbHxrl5j/9PrKFuvraBK3HC56HEP1Zg+IMvh+uv0D+p5y14C97fAzE33uNgR2aVkamOo92zHvxvXe7zBtqc8rztWsJb1pgkrY7SdgSXgQc88ohey+XecDh4TahTY="
# AWS_S3_SCRATCH_SPACE
- secure: "LvndQIW6dHs6nyaMHtblGI/oL+s460lOezFs2BoD0Isenb/O/IM+nY5K9HepTXjJIcq8qvUYnojZX1FCrxxOXX2/+/Iihiq7GzJYdmdMC6hLg9bJYeAFk0dWYT88/AwadrJCBOa3ockRLhiO3dkai7Ki5+M1erfaFiAHHMpJxYQ="
# AWS_S3_CROSS_REGION_SCRATCH_SPACE
- secure: "esYmBqt256Dc77HT68zoaE/vtsFGk2N+Kt+52RlR0cjHPY1q5801vxLbeOlpYb2On3x8YckE++HadjL40gwSBsca0ffoogq6zTlfbJYDSQkQG1evxXWJZLcafB0igfBs/UbEUo7EaxoAJQcLgiWWwUdO0a0iU1ciSVyogZPagL0="
- jdk: openjdk8
scala: 2.12.11
env: HADOOP_VERSION="3.2.1" SPARK_VERSION="3.0.2" AWS_JAVA_SDK_VERSION="1.11.1033"

script:
- ./dev/run-tests-travis.sh
156 changes: 156 additions & 0 deletions CHANGELOG
@@ -0,0 +1,156 @@
# spark-redshift Changelog

## 5.1.0 (2022-09-22)

- Make manifest file path use s3a/n scheme
- Add catalyst type mapping for LONGVARCHAR
- Upgrade to Spark 3.2
- Fix log4j-api compatibility with Spark 3.2

## 5.0.5 (2021-11-09)

- Avoid warning when tmp bucket is configured with a lifecycle without prefix.

## 5.0.4 (2021-07-08)

- Upgrade Spark version to 3.0.2 and the test AWS Java SDK version to the latest

## 5.0.3 (2021-05-10)

- Remove sbt-spark-package plugin dependency (#90)

## 5.0.2 (2021-05-06)

- Add sse kms support (#82)

## 5.0.1 (2021-04-30)

- Address low performance issue while reading csv files (#87)

## 5.0.0 (2021-01-13)

- Upgrade spark-redshift to support hadoop3

## 4.2.0 (2020-10-08)

- Make spark-redshift Spark 3.0.1 compatible

## 4.1.1

- Cross publish for scala 2.12 in addition to 2.11

## 4.1.0

- Add `include_column_list` parameter
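
As a rough illustration of what a column-list option changes, the sketch below builds a COPY statement with and without an explicit column list. It is a minimal sketch, not the library's code: `build_copy_statement` and its arguments are hypothetical names, and credentials and format clauses are left out for brevity.

```python
def build_copy_statement(table, columns, manifest_url, include_column_list=False):
    """Build an illustrative Redshift COPY statement (hypothetical helper).

    Credentials and format options are intentionally omitted.
    """
    column_clause = ""
    if include_column_list:
        # Quote each column name and list them after the table name.
        quoted = ", ".join('"{}"'.format(name) for name in columns)
        column_clause = " ({})".format(quoted)
    return "COPY {}{} FROM '{}' MANIFEST".format(table, column_clause, manifest_url)


with_columns = build_copy_statement(
    "public.events", ["id", "name"], "s3://example-bucket/tmp/manifest", True
)
without_columns = build_copy_statement(
    "public.events", ["id", "name"], "s3://example-bucket/tmp/manifest"
)
```

Naming the columns explicitly lets a load target a table whose column order differs from the DataFrame schema, or one that has extra columns with defaults.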

## 4.0.2

- Trim SQL text for preactions and postactions, to fix empty SQL queries bug.
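
The kind of bug this entry describes can be sketched as follows, assuming that preactions and postactions arrive as one semicolon-separated string. `split_sql_actions` is a hypothetical helper, not the library's implementation; it also drops empty fragments, which goes slightly beyond plain trimming.

```python
def split_sql_actions(actions):
    """Split a ';'-separated SQL string into non-empty, trimmed statements.

    Hypothetical helper: trailing whitespace or a newline after the last ';'
    would otherwise produce an empty statement.
    """
    statements = []
    for fragment in actions.strip().split(";"):
        statement = fragment.strip()
        if statement:
            statements.append(statement)
    return statements


example = split_sql_actions("DELETE FROM t WHERE id = 1;\n  ANALYZE t;\n")
```

Splitting on every semicolon is naive: a statement containing a semicolon inside a string literal would be cut in two, so this sketch only suits simple action lists.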

## 4.0.1

- Fix bug when parsing microseconds from Redshift
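
One way to handle a variable number of fractional-second digits is to normalise the fraction to six digits before parsing. The sketch below assumes timestamps of the form `YYYY-MM-DD HH:MM:SS[.f]` with no time zone suffix; `parse_redshift_timestamp` is a hypothetical name and this is not the library's Scala implementation.

```python
from datetime import datetime


def parse_redshift_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.f]' with zero to six fractional digits.

    Hypothetical helper: the fraction is right-padded with zeros so that
    '.1234' is read as 123400 microseconds rather than 1234.
    """
    date_part, _, fraction = text.partition(".")
    fraction = (fraction + "000000")[:6]
    return datetime.strptime(date_part + "." + fraction, "%Y-%m-%d %H:%M:%S.%f")


four_digits = parse_redshift_timestamp("2019-11-14 10:15:30.1234")
no_fraction = parse_redshift_timestamp("2019-11-14 10:15:30")
```

Python's `%f` directive already accepts one to six digits, so the padding here mainly makes the intended scaling explicit and covers the case with no fraction at all.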

## 4.0.0

This major release makes spark-redshift compatible with spark 2.4. This was tested in production.

While upgrading the package we dropped some features due to time constraints.

- Support for hadoop 1.x has been dropped.
- STS and IAM authentication support has been dropped.
- postgresql driver tests are inactive.
- SaveMode tests (or functionality?) are broken. This is a bit scary, but I'm not sure we use the functionality,
and fixing them didn't make it into this version (spark-snowflake removed them too).
- S3Native has been deprecated. We created an InMemoryS3AFileSystem to test S3A.

## 4.0.0-SNAPSHOT
- SNAPSHOT version to test publishing to Maven Central.

## 4.0.0-preview20190730 (2019-07-30)

- The library is tested in production using spark2.4
- RedshiftSourceSuite is again among the scala test suites.

## 4.0.0-preview20190715 (2019-07-15)

Move to pre-4.0.0 'preview' releases rather than SNAPSHOT

## 4.0.0-SNAPSHOT-20190710 (2019-07-10)

Remove AWSCredentialsInUriIntegrationSuite test and require s3a path in CrossRegionIntegrationSuite.scala

## 4.0.0-SNAPSHOT-20190627 (2019-06-27)

Baseline SNAPSHOT version working with 2.4

#### Deprecation
In order to get this baseline snapshot out, we dropped some features and package versions,
and disabled some tests.
Some of these changes are temporary, others - such as dropping hadoop 1.x - are meant to stay.

Our intent is to do the best job possible supporting the minimal set of features
that the community needs. Other non-essential features may be dropped before the
first non-snapshot release.
The community's feedback and contributions are vitally important.


* Support for hadoop 1.x has been dropped.
* STS and IAM authentication support has been dropped (so are tests).
* postgresql driver tests are inactive.
* SaveMode tests (or functionality?) are broken. This is a bit scarier but I'm not sure we use the functionality and fixing them didn't make it in this version (spark-snowflake removed them too).
* S3Native has been deprecated. It's our intention to phase it out from this repo. The test util ‘inMemoryFilesystem’ is not present anymore so an entire test suite RedshiftSourceSuite lost its major mock object and I had to remove it. We plan to re-write it using s3a.

#### Commits changelog
- 5b0f949 (HEAD -> master, origin_community/master) Merge pull request #6 from spark-redshift-community/luca-spark-2.4
- 25acded (origin_community/luca-spark-2.4, origin/luca-spark-2.4, luca-spark-2.4) Revert sbt scripts to an older version
- 866d4fd Moving to external github issues - rename spName to spark-redshift-community
- 094cc15 remove in Memory FileSystem class and clean up comments in the sbt build file
- 0666bc6 aws_variables.env gitignored
- f3bbdb7 sbt assembly the package into a fat jar - found the perfect coordination between different libraries versions! Tests pass and can compile spark-on-paasta and spark successfullygit add src/ project/
- b1fa3f6 Ignoring a bunch of tests as did snowflake - close to have a green build to try out
- 95cdf94 Removing conn.commit() everywhere - got 88% of integration tests to run - fix for STS token aws access in progress
- da10897 Compiling - managed to run tests but they mostly fail
- 0fe37d2 Compiles with spark 2.4.0 - amazon unmarshal error
- ea5da29 force spark.avro - hadoop 2.7.7 and awsjavasdk downgraded
- 834f0d6 Upgraded jackson by excluding it in aws
- 90581a8 Fixed NewFilter - including hadoop-aws - s3n test is failing
- 50dfd98 (tag: v3.0.0, tag: gtig, origin/master, origin/HEAD) Merge pull request #5 from Yelp/fdc_first-version
- fbb58b3 (origin/fdc_first-version) First Yelp release
- 0d2a130 Merge pull request #4 from Yelp/fdc_DATALAKE-4899_empty-string-to-null
- 689635c (origin/fdc_DATALAKE-4899_empty-string-to-null) Fix File line length exceeds 100 characters
- d06fe3b Fix scalastyle
- e15ccb5 Fix parenthesis
- d16317e Fix indentation
- 475e7a1 Fix convertion bit and test
- 3ae6a9b Fix Empty string is converted to null
- 967dddb Merge pull request #3 from Yelp/fdc_DATALAKE-486_avoid-log-creds
- 040b4a9 Merge pull request #2 from Yelp/fdc_DATALAKE-488_cleanup-fix-double-to-float
- 58fb829 (origin/fdc_DATALAKE-488_cleanup-fix-double-to-float) Fix test
- 3384333 Add bit and default types
- 3230aaa (origin/fdc_DATALAKE-486_avoid-log-creds) Avoid logging creds. log sql query statement only
- ab8124a Fix double type to float and cleanup
- cafa05f Merge pull request #1 from Yelp/fdc_DATALAKE-563_remove-itests-from-public
- a3a39a2 (origin/fdc_DATALAKE-563_remove-itests-from-public) Remove itests. Fix jdbc url. Update Redshift jdbc driver
- 184b442 Make the note more obvious.
- 717a4ad Notes about inlining this in Databricks Runtime.
- 8adfe95 (origin/fdc_first-test-branch-2) Fix decimal precision loss when reading the results of a Redshift query
- 8da2d92 Test infra housekeeping: reduce SBT memory, update plugin versions, update SBT
- 79bac6d Add instructions on using JitPack master SNAPSHOT builds
- 7a4a08e Use PreparedStatement.getMetaData() to retrieve Redshift query schemas
- b4c6053 Wrap and re-throw Await.result exceptions in order to capture full stacktrace
- 1092c7c Update version in README to 3.0.0-preview1
- 320748a Setting version to 3.0.0-SNAPSHOT
- a28832b (tag: v3.0.0-preview1, origin/fdc_30-review) Setting version to 3.0.0-preview1
- 8afde06 Make Redshift to S3 authentication mechanisms mutually exclusive
- 9ed18a0 Use FileFormat-based data source instead of HadoopRDD for reads
- 6cc49da Add option to use CSV as an intermediate data format during writes
- d508d3e Add documentation and warnings related to using different regions for Redshift and S3
- cdf192a Break RedshiftIntegrationSuite into smaller suites; refactor to remove some redundancy
- bdf4462 Pass around AWSCredentialProviders instead of AWSCredentials
- 51c29e6 Add codecov.yml file.
- a9963da Update AWSCredentialUtils to be uniform between URI schemes.

## 3.0.0-SNAPSHOT (2017-11-08)

Databricks spark-redshift pre-fork, changes not tracked.
6 changes: 6 additions & 0 deletions NOTICE
@@ -0,0 +1,6 @@
Apache Accumulo
Copyright 2011-2019 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
