
A question about performance #74

Open
hs41-18 opened this issue Nov 25, 2023 · 1 comment
Labels
question Further information is requested

Comments


hs41-18 commented Nov 25, 2023

Hi. Many thanks for all the work you are doing.

I have a question about getting data from SQLite into DuckDB. In SQLite, I have tables/views with a massive number of rows (around 10 billion) of transactional data. What is the easiest way to copy those tables into DuckDB to perform some analytics?
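
(For illustration only, not from the original thread: a minimal sketch of one way to do such a copy through DuckDB's sqlite extension, using the Python client. The database paths and table name below are placeholders.)

```python
# Sketch: attach the SQLite file via DuckDB's sqlite extension and
# materialize the table inside DuckDB's own storage for analytics.
# 'analytics.duckdb', 'transactions.db', and 'transactions' are placeholders.
import duckdb

con = duckdb.connect("analytics.duckdb")
con.install_extension("sqlite")
con.load_extension("sqlite")

# Expose the SQLite database as an attached catalog inside DuckDB.
con.execute("ATTACH 'transactions.db' AS src (TYPE sqlite)")

# Copy one table; for 10 billion rows this is a long-running, storage-heavy operation.
con.execute("CREATE TABLE transactions AS SELECT * FROM src.transactions")
```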

@szarnyasg added the question (Further information is requested) label on Nov 25, 2023

Nintorac commented Jan 5, 2024

My experience here:

I am using a Python Arrow UDF to process results from a query against the sqlite3 source. I'm seeing many small vectors of 1 or 2 records going through (as opposed to the 2048 you would expect at full capacity). I have an expensive setup step in my UDF, so this kills performance.

I wonder whether this behavior also applies to internal DuckDB operations.

I get larger vectors when accessing the SQLite database in order (with respect to the rowid), so maybe a copy wouldn't suffer from the performance penalty.

Is there a way to control the vector size here?
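
(For illustration only: roughly the shape of the setup described above, sketched against the DuckDB Python client's `create_function` with `type="arrow"`. The UDF body, column, and table names are placeholders; the print call is only there to observe how many rows arrive per chunk.)

```python
# Sketch of an Arrow UDF registered with the DuckDB Python client.
# Printing the chunk length shows how many rows DuckDB hands the UDF per call.
import duckdb

def expensive_udf(values):
    # ...expensive per-call setup would go here...
    print("rows in this chunk:", len(values))
    return values  # pass-through; a real UDF would transform the chunk

con = duckdb.connect()
con.install_extension("sqlite")
con.load_extension("sqlite")
con.execute("ATTACH 'transactions.db' AS src (TYPE sqlite)")

con.create_function("expensive_udf", expensive_udf,
                    [duckdb.typing.BIGINT], duckdb.typing.BIGINT, type="arrow")

# 'amount' and 'transactions' are placeholder names for the SQLite source.
con.execute("SELECT expensive_udf(amount) FROM src.transactions").fetchall()
```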
