Include Column List option #343

Open
wants to merge 6 commits into
base: master

Conversation

kyrozetera

Closes #340

@frankfarrell

frankfarrell commented May 16, 2018

This looks really useful. It would also be handy to have an option to ignore certain columns, e.g. columns that use IDENTITY or columns with default values such as timestamps. That would solve #381.
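As a rough illustration of why a column list helps here: when a Redshift COPY statement names its target columns, any table column left out of the list (an IDENTITY column, or one with a DEFAULT) is filled in by Redshift rather than read from the file. The sketch below only builds such a statement from a list of DataFrame column names. It is not the code in this PR; `build_copy_statement`, the table name, the S3 path, and the IAM role are all hypothetical.

```python
def build_copy_statement(table, df_columns, s3_path, iam_role):
    """Build a COPY statement that names its target columns explicitly.

    Hypothetical helper for illustration only, not part of the library.
    """
    # Quote each identifier and escape embedded double quotes.
    column_list = ", ".join(
        '"{}"'.format(name.replace('"', '""')) for name in df_columns
    )
    return (
        f"COPY {table} ({column_list}) "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV"
    )


# Assumed table: events(id IDENTITY, name, value, created_at DEFAULT ...).
# The DataFrame carries only name and value, so id and created_at are omitted.
statement = build_copy_statement(
    "events",
    ["name", "value"],
    "s3://example-bucket/tmp/",  # placeholder path
    "arn:aws:iam::123456789012:role/example",  # placeholder role
)
print(statement)
```

Because the columns are matched by name, the order of columns in the DataFrame no longer has to match the order in the table.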

@toddwildey

This would be very useful for our purposes and would make loading data into Redshift using this package far more robust for us.

@kyrozetera
Author

@toddwildey Unfortunately this package is dead; they haven't accepted or merged any PRs on it in 2 years. We've decided to decouple our Spark processes from Redshift, handle Redshift data access in a different layer, and move away from this package, since it's no longer maintained and, from what I could tell, a potential fork got no traction.

@eeshugerman

eeshugerman commented Dec 6, 2019

Hi @kyrozetera, I'm using a (maintained) fork of this library and this patch is just what I need. Would you be interested in contributing this patch to the fork? It applies without issue once the file paths are updated (see the tweaked diff attached). If you'd rather not be bothered, I'd be happy to make the contribution on your behalf.

columns-list-patch.txt

@kyrozetera
Author

@eeshugerman Sorry, I was out of town for the weekend, so I wasn't able to look at this until now. Looks like you've made the PR already though, so 👍

@sidrahsayyad

Which version of the package can this change be found in?

@eeshugerman

@sidrahsayyad

I'm using pyspark --packages io.github.spark-redshift-community:spark-redshift_2.11:4.0.1 to test this change, but I consistently get a "Delimiter not found" error logged in stl_load_errors.

@eeshugerman

Did you set include_column_list to true? It's false by default.
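For reference, a minimal PySpark sketch of a write with the option enabled. It assumes the community fork's data source name and its usual `url`, `dbtable`, and `tempdir` options. The helper names, JDBC URL, table, and bucket are hypothetical placeholders, and credential options are left out for brevity.

```python
# Data source name of the community fork (assumed; check the fork's README).
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_write_options(jdbc_url, table, tempdir, include_column_list=True):
    """Collect writer options; include_column_list is false unless set."""
    return {
        "url": jdbc_url,
        "dbtable": table,
        "tempdir": tempdir,
        # Spark passes option values as strings.
        "include_column_list": "true" if include_column_list else "false",
    }


def write_to_redshift(df, options):
    """Append a Spark DataFrame to Redshift using the options above."""
    df.write.format(REDSHIFT_FORMAT).options(**options).mode("append").save()


options = redshift_write_options(
    "jdbc:redshift://example-host:5439/dev",  # placeholder URL
    "my_table",  # placeholder table
    "s3a://example-bucket/tmp/",  # placeholder temp dir
)
print(options["include_column_list"])
# With a live SparkSession and cluster: write_to_redshift(df, options)
```

With the option on, the generated COPY should name the DataFrame's columns explicitly instead of relying on their position.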

Development

Successfully merging this pull request may close these issues.

COPY command relies on column order in DataFrame
5 participants