Include Column List option #343

kyrozetera · 2017-04-28T21:36:35Z

Closes #340


frankfarrell · 2018-05-16T11:32:35Z

This looks really useful. It would also be handy to have an option to ignore certain columns, e.g. columns that use IDENTITY, or columns with default values such as timestamps. That would solve #381.
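A column list addresses this because, under Amazon Redshift's COPY column-mapping rules, any target column omitted from the list is loaded from its DEFAULT expression, and an omitted IDENTITY column generates its own values. The sketch below assumes the column list is exactly the DataFrame's column names and reports which table columns would be left for Redshift to fill. `columns_left_to_redshift` and the example schema are hypothetical, not part of the package.

```python
from typing import List, Sequence


def columns_left_to_redshift(table_columns: Sequence[str],
                             df_columns: Sequence[str]) -> List[str]:
    """Return target-table columns that a column-listed COPY would not load.

    Hypothetical helper: assumes the COPY column list is exactly df_columns.
    Redshift fills each omitted column from its DEFAULT expression (or, for
    an IDENTITY column, generates the value).
    """
    # Compare case-insensitively, since Redshift folds unquoted identifiers
    # to lower case by default.
    table_lower = [c.lower() for c in table_columns]
    df_lower = {c.lower() for c in df_columns}

    # Every listed column must exist in the target table, or COPY fails.
    unknown = sorted(df_lower - set(table_lower))
    if unknown:
        raise ValueError(f"columns not in target table: {unknown}")

    # Keep table order so the result mirrors the table definition.
    return [name for name, low in zip(table_columns, table_lower)
            if low not in df_lower]


if __name__ == "__main__":
    table = ["id", "event_name", "payload", "created_at"]  # illustrative schema
    df_cols = ["event_name", "payload"]
    print(columns_left_to_redshift(table, df_cols))  # ['id', 'created_at']
```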

toddwildey · 2019-10-16T22:56:16Z

This would be very useful for our purposes and would make loading data into Redshift using this package far more robust for us.

kyrozetera · 2019-10-17T02:08:50Z

@toddwildey Unfortunately, this package is dead: they haven't accepted or merged any PRs in 2 years. We've decided to decouple our Spark processes from Redshift, handle Redshift data access in a different layer, and move away from this package, since it's no longer maintained and, from what I could tell, a potential fork got no traction.

eeshugerman · 2019-12-06T02:02:04Z

Hi @kyrozetera, I'm using a (maintained) fork of this library and this patch is just what I need. Would you be interested in contributing this patch to the fork? It applies without issue once the file paths are updated (see the tweaked diff attached). If you'd rather not be bothered, I'd be happy to make the contribution on your behalf.

columns-list-patch.txt

kyrozetera · 2019-12-09T20:15:33Z

@eeshugerman Sorry, I was out of town for the weekend so wasn't able to look at this until now. Looks like you've made the PR though so 👍

sidrahsayyad · 2020-05-19T04:20:07Z

Which version of the package includes this change?

eeshugerman · 2020-05-19T04:43:31Z

@sidrahsayyad https://github.com/spark-redshift-community/spark-redshift

sidrahsayyad · 2020-05-19T05:12:01Z

I'm using pyspark --packages io.github.spark-redshift-community:spark-redshift_2.11:4.0.1 to test this change, but I consistently get a "Delimiter not found" error logged in stl_load_errors.

eeshugerman · 2020-05-19T16:14:38Z

Did you set include_column_list to true? It's false by default.
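A minimal PySpark sketch of enabling the flag on a write follows. The data source name `io.github.spark_redshift_community.spark.redshift` is an assumption based on the fork's README and may differ between versions; the URL, table, and S3 temp dir are placeholders; and the S3 credential option your setup needs is deliberately left out. `redshift_write_options` and `save_with_column_list` are hypothetical helpers. As background for the error above: with a delimited temp format, "Delimiter not found" usually means a row had fewer fields than COPY expected, which can happen when the DataFrame omits table columns and no column list is sent.

```python
from typing import Dict

# Assumed data source name for the spark-redshift-community fork; check the
# README of the version you are running.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_write_options(url: str, table: str, tempdir: str,
                           include_column_list: bool = True) -> Dict[str, str]:
    """Build writer options (hypothetical helper).

    include_column_list is false by default in the library, so it is set
    explicitly here. S3 credential options are intentionally omitted.
    """
    return {
        "url": url,
        "dbtable": table,
        "tempdir": tempdir,
        # Spark data source options are strings, so pass "true"/"false".
        "include_column_list": "true" if include_column_list else "false",
    }


def save_with_column_list(df, url: str, table: str, tempdir: str) -> None:
    """Append a PySpark DataFrame to Redshift with the column list enabled."""
    (df.write
       .format(REDSHIFT_FORMAT)
       .options(**redshift_write_options(url, table, tempdir))
       .mode("append")
       .save())
```

For example, `save_with_column_list(df, "jdbc:redshift://host:5439/db?user=<user>&password=<password>", "public.events", "s3a://bucket/tmp/")`, run with the package on the classpath via `--packages`.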

kyrozetera added 6 commits April 28, 2017 12:33

introduce the 'include_column_list' parameter

3a9d49a

implement 'include_column_list' in the RedshiftWriter to add the columns to the COPY command

39245ae

update comment

5cc0bb7

remove extra comma when not using column list

9fc299a

remove println statements

39873d0

add include_column_list parameter to README

6b8c9fb
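The commits above describe the core change: when `include_column_list` is enabled, the writer adds the DataFrame's column names to the COPY command it issues. The Python sketch below illustrates that string-building step only; it is not the PR's Scala code, `build_copy_sql` and `quote_ident` are hypothetical names, and the `CREDENTIALS` placeholder and `FORMAT AS CSV` clause are assumptions chosen for illustration.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote a Redshift identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table: str, columns: Sequence[str], s3_path: str,
                   include_column_list: bool) -> str:
    """Build a COPY statement, optionally with an explicit column list.

    Hypothetical helper for illustration. `table` is assumed to be already
    quoted/qualified; credentials are left as a placeholder.
    """
    column_list = ""
    if include_column_list and columns:
        # e.g. ("event_name", "payload"); nothing at all when disabled.
        column_list = " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    s3_literal = s3_path.replace("'", "''")  # escape quotes in the SQL literal
    return (f"COPY {table}{column_list} FROM '{s3_literal}' "
            f"CREDENTIALS '<credentials>' FORMAT AS CSV")


if __name__ == "__main__":
    cols = ["event_name", "payload"]
    print(build_copy_sql('"public"."events"', cols, "s3://bucket/tmp/", True))
    print(build_copy_sql('"public"."events"', cols, "s3://bucket/tmp/", False))
```

In this sketch the disabled path produces a COPY with no column list at all, so the default behavior is unchanged, while the enabled path maps the CSV fields, in order, to exactly the listed columns.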

eeshugerman mentioned this pull request Dec 9, 2019

Add 'include_column_list' parameter spark-redshift-community/spark-redshift#58

Merged
