Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(c/driver/postgresql): fix ingest with multiple batches #1393

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

lidavidm
Copy link
Member

@lidavidm lidavidm commented Dec 21, 2023

The COPY writer was ending the COPY command after each batch, so any dataset with more than one batch would fail. Instead, write the header once and don't end the command until we've written all batches.

Fixes #1310.

@github-actions github-actions bot added this to the ADBC Libraries 0.9.0 milestone Dec 21, 2023
@lidavidm
Copy link
Member Author

CC @WillAyd do the COPY changes here look reasonable?

Copy link
Contributor

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice - I think this is a great extension of what is in place

PQerrorMessage(conn));
return ADBC_STATUS_IO;
}
if (rows_affected) *rows_affected += array->length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we might want to do (certainly fine as a follow up) is send this data to the server at the end of each loop iteration using PQputCopyData:

https://www.postgresql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean? We're still calling PQputCopyData for each batch, we're just not calling PQputCopyEnd until we've written all the batches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We could also be smarter here and flush to the server at a certain amount of data instead of each batch to smooth things out between large/small batches)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah perfect. Ignore what I said then - I just missed it because I didn't see it in the diff

@lidavidm lidavidm merged commit 0f06843 into apache:main Dec 22, 2023
53 of 55 checks passed
@lidavidm lidavidm deleted the gh1310 branch December 22, 2023 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

python: errors using pyarrow Dataset with adbc_ingest() for adbc_driver_postgres()
2 participants