-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enhance bulk insert speed using copy command #370
Closed
Closed
Changes from all commits
Commits
Show all changes
23 commits
Select commit
Hold shift + click to select a range
a91c94b
Generate a copy statement
kinghuang c11a929
Bulk insert data using copy
kinghuang 4af0c97
Remove generate_insert_statement
kinghuang 943419b
Directly handle csv generation
kinghuang 734c9b5
Quote table name
kinghuang a97ab70
Add a comment about copy_expert
kinghuang 72f2b37
Improve csv generation
kinghuang e8b438c
Remove unused imports
kinghuang 20cdb55
Improve comment about unquoted null values
kinghuang 534a38a
Escape backslashes in array values
kinghuang ed4838f
Escape backslashes in string values
kinghuang c9f4e3f
Merge branch 'main' into bulk-insert-copy
edgarrmondragon 373f2a3
Merge branch 'main' into bulk-insert-copy
edgarrmondragon b09b4c4
Merge branch 'main' into bulk-insert-copy
edgarrmondragon dd1622a
Merge branch 'main' into bulk-insert-copy
edgarrmondragon cc1c6bc
Merge branch 'main' into bulk-insert-copy
edgarrmondragon 4ca89fa
Merge branch 'main' into bulk-insert-copy
edgarrmondragon 12a85b2
Update imports
edgarrmondragon 519d15b
Merge branch 'main' into bulk-insert-copy
edgarrmondragon f325009
Merge branch 'main' into bulk-insert-copy
edgarrmondragon a7d4263
Merge branch 'main' into bulk-insert-copy
edgarrmondragon a99021e
Merge branch 'main' into bulk-insert-copy
edgarrmondragon a11c76b
Merge branch 'main' into bulk-insert-copy
edgarrmondragon File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think once the tests are passing, we're good to go here. This method of escaping arrays stands out as awkward, but I think it's necessary. I tried putting together an implementation that avoids it using a custom dialect of csv writer. But I think that anything I come up with there is going to be even more janky.
Thoughts @visch ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, surprising but if this is better that makes sense to me!
@kinghuang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check those out. The csv.QUOTE_NOTNULL and csv.QUOTE_STRINGS would both be more convenient, but they are new in Python 3.12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kinghuang are you interested in getting this PR to the finish line? Is there anything I can do to help?
Would it be worth using those to make the tests pass on Python 3.12? If they significantly simplify the implementation, we can think about backporting the
csv
library from Python 3.12+ and publish the wheels to PyPI.