offset command not working as expected #9488
Comments
This might be related to DataFusion's built-in parallelism. By default, DataFusion executes in parallel if the query planner thinks it is helpful. You can force DataFusion to work in a single partition by executing command Another option might be to specify the desired order in the query. if you add
The query in the issue occasionally returns the correct answer; however, increasing the offset to a large enough number (still below the total row count) almost never works as expected. The first option works well, but it kills the parallelism. The second option isn't always viable, especially when the data is not originally sorted. Two possible solutions I can think of:
I agree with @mustafasrepo's analysis. Fundamentally, if you want to treat the file as though it is ordered, you could consider defining it with a https://arrow.apache.org/datafusion/user-guide/sql/ddl.html#create-external-table
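To make the analysis above concrete, here is a minimal Python sketch of why OFFSET without a defined order can return different rows under parallel execution. It is a toy model under stated assumptions, not DataFusion's actual implementation: the helper names (`scan_partitions`, `offset_limit`, `offset_limit_ordered`) are hypothetical, and partition completion order is simulated with a shuffle.

```python
import random

def scan_partitions(rows, num_partitions):
    """Split rows into contiguous chunks (hypothetical stand-in for per-partition scans)."""
    size = -(-len(rows) // num_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def offset_limit(partitions, offset, limit, rng):
    """Concatenate partitions in an arbitrary order, then skip `offset` rows and take `limit`."""
    order = list(range(len(partitions)))
    rng.shuffle(order)  # assumption: models nondeterministic partition completion order
    merged = [row for p in order for row in partitions[p]]
    return merged[offset:offset + limit]

def offset_limit_ordered(partitions, offset, limit, rng):
    """Same as offset_limit, but sorts the merged rows first (models an explicit ORDER BY)."""
    order = list(range(len(partitions)))
    rng.shuffle(order)
    merged = sorted(row for p in order for row in partitions[p])
    return merged[offset:offset + limit]

rows = list(range(1000))
expected = rows[900:905]

# Many partitions, no ordering: the result depends on the merge order.
unordered = {
    tuple(offset_limit(scan_partitions(rows, 4), 900, 5, random.Random(seed)))
    for seed in range(50)
}

# A single partition, or an explicit sort, always gives the same rows.
single = {
    tuple(offset_limit(scan_partitions(rows, 1), 900, 5, random.Random(seed)))
    for seed in range(50)
}
ordered = {
    tuple(offset_limit_ordered(scan_partitions(rows, 4), 900, 5, random.Random(seed)))
    for seed in range(50)
}

print(len(unordered), len(single), len(ordered))
```

In this model, `num_partitions=1` plays the role of the first option (single partition, no parallelism) and the sorted variant plays the role of the second option (an explicit order in the query). The sorted variant keeps the parallel scan but pays for a sort, which matches the trade-off described in the comments.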
Describe the bug
OFFSET gives wrong results when the table has a large number of rows.
To Reproduce
With fewer than 793459 rows, it gives the following result. This is the data for 793458 rows:
┌────────┬────────┐
│  data  │ index  │
│ int64  │ int64  │
├────────┼────────┤
│      1 │      0 │
│      2 │      1 │
│      3 │      2 │
│      4 │      3 │
│      5 │      4 │
│      6 │      5 │
│      7 │      6 │
│      8 │      7 │
│      9 │      8 │
│     10 │      9 │
│     11 │     10 │
│     12 │     11 │
│     13 │     12 │
│     14 │     13 │
│     15 │     14 │
│     16 │     15 │
│     17 │     16 │
│     18 │     17 │
│     19 │     18 │
│     20 │     19 │
│   ·    │   ·    │
│   ·    │   ·    │
│   ·    │   ·    │
│ 793439 │ 793438 │
│ 793440 │ 793439 │
│ 793441 │ 793440 │
│ 793442 │ 793441 │
│ 793443 │ 793442 │
│ 793444 │ 793443 │
│ 793445 │ 793444 │
│ 793446 │ 793445 │
│ 793447 │ 793446 │
│ 793448 │ 793447 │
│ 793449 │ 793448 │
│ 793450 │ 793449 │
│ 793451 │ 793450 │
│ 793452 │ 793451 │
│ 793453 │ 793452 │
│ 793454 │ 793453 │
│ 793455 │ 793454 │
│ 793456 │ 793455 │
│ 793457 │ 793456 │
│ 793458 │ 793457 │
├────────┴────────┤
│   793458 rows   │
│   (40 shown)    │
└─────────────────┘
With more rows, it gives a different result.
This is the data for 793459 rows:
For 793460 rows:
Expected behavior
No response
Additional context
No response