Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize arguments handling #127

Merged
merged 2 commits into from
Jul 18, 2024
Merged

Optimize arguments handling #127

merged 2 commits into from
Jul 18, 2024

Conversation

haaawk
Copy link
Contributor

@haaawk haaawk commented Jul 18, 2024

Remove memory/cpu overhead of handling arguments when they are empty. This is important in case of user loading a dump of their database. In such a case, there are no arguments at all and the number of statements is usually big.

Refs tursodatabase/turso-cli#735

@haaawk haaawk requested review from penberg and sivukhin July 18, 2024 11:31
Remove memory/cpu overhead of handling arguments when they are empty.
This is important in case of user loading a dump of their database.
In such a case, there are no arguments at all and the number of
statements is usually big.

When testing on a 650M sql file this change reduces number of total
allocation from 122772583kB to 38511376 so ~69%.

Signed-off-by: Piotr Jastrzebski <[email protected]>
stmts, _ := sqliteparserutils.SplitStatement(sql)

if len(args) == 0 {
return stmts, nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return array for params here too: return stmts, make([]Params, len(stmts)), nil?
It's a bit of overhead but it should be pretty negligible, no? But the benefit is that API will be "uniform" and there will be less special cases in the dependent code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but this is proportional to the size of the input which is bad. For example for 100k or 1M statements that's not great especially that this is continuous memory which is hard to find in a fragmented heap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1M statements this array will occupy at most 32MB. But from what I saw locally - dump process consumes GBs of RAM to handle 1M statements or so...

So, I agree that this is not great, but I doubt if this is actually that critical...
I would rather keep API uniform but also work on truly streaming approach (but this will require more work...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streaming is my next step. Just fixing stuff on the way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to some extend but this is not only about memory. Without special casing the arguments handling code will execute on those empty arrays and it does some logic including conditions. I don't want to execute this for 1M statements when I don't have to.

This reduces number of allocations during batch creation from
5636431kB to 1679694kB which is ~70%.

Signed-off-by: Piotr Jastrzebski <[email protected]>
stmt := &Stmt{
Sql: &sql,
WantRows: wantRows,
}
if err := stmt.AddArgs(params); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add guard in the AddArgs method ... } else if len(params.Positional()) > 0 { ... and not change argument to pointer then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then we're not really removing special cases and we're doing additional check for 1M statements potentially.

Copy link
Contributor

@sivukhin sivukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's merge this

@haaawk
Copy link
Contributor Author

haaawk commented Jul 18, 2024

Ok, let's merge this

Thanks. It seems that I will have to change this significantly for streaming so I will try to improve the situation then.

@haaawk haaawk merged commit 9bc6b51 into main Jul 18, 2024
2 checks passed
@haaawk haaawk deleted the improve_batch branch July 18, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants