-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize arguments handling #127
Conversation
Remove memory/cpu overhead of handling arguments when they are empty. This is important in case of user loading a dump of their database. In such a case, there are no arguments at all and the number of statements is usually big. When testing on a 650M sql file this change reduces number of total allocation from 122772583kB to 38511376 so ~69%. Signed-off-by: Piotr Jastrzebski <[email protected]>
stmts, _ := sqliteparserutils.SplitStatement(sql) | ||
|
||
if len(args) == 0 { | ||
return stmts, nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return array for params here too: return stmts, make([]Params, len(stmts)), nil
?
It's a bit of overhead but it should be pretty negligible, no? But the benefit is that API will be "uniform" and there will be less special cases in the dependent code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but this is proportional to the size of the input which is bad. For example for 100k or 1M statements that's not great especially that this is continuous memory which is hard to find in a fragmented heap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1M statements this array will occupy at most 32MB. But from what I saw locally - dump process consumes GBs of RAM to handle 1M statements or so...
So, I agree that this is not great, but I doubt if this is actually that critical...
I would rather keep API uniform but also work on truly streaming approach (but this will require more work...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Streaming is my next step. Just fixing stuff on the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to some extend but this is not only about memory. Without special casing the arguments handling code will execute on those empty arrays and it does some logic including conditions. I don't want to execute this for 1M statements when I don't have to.
This reduces number of allocations during batch creation from 5636431kB to 1679694kB which is ~70%. Signed-off-by: Piotr Jastrzebski <[email protected]>
stmt := &Stmt{ | ||
Sql: &sql, | ||
WantRows: wantRows, | ||
} | ||
if err := stmt.AddArgs(params); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add guard in the AddArgs
method ... } else if len(params.Positional()) > 0 { ...
and not change argument to pointer then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then we're not really removing special cases and we're doing additional check for 1M statements potentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's merge this
Thanks. It seems that I will have to change this significantly for streaming so I will try to improve the situation then. |
Remove memory/cpu overhead of handling arguments when they are empty. This is important in case of user loading a dump of their database. In such a case, there are no arguments at all and the number of statements is usually big.
Refs tursodatabase/turso-cli#735