bug: Table.Sample doesn't sample on DuckDB after ibis version > 9.0.0 #10116
Comments
Hi @balintpto -- thanks for the report! I can't reproduce this -- can you try also updating
[ins] In [1]: import ibis as ib
...:
...: t = ib.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
...:
...: t.sample(0.5, seed=1234).execute()
Out[1]:
x y
0 2 b
1 3 c
[ins] In [2]: import duckdb
[ins] In [3]: duckdb.__version__
Out[3]: '1.1.0' |
Hm, that's odd. I can't tell from the screenshot what kind of IDE you're running in, but could you try running in a non-fancy Python REPL so we can rule out outside factors? |
Sure, I'm using vscode. I tried from terminal with prints added and the same happens |
Also if you can run in DuckDB without ibis that'd be useful in ruling out a bug on our side. |
Can you post an example command? Never used duckdb without ibis |
Here's one you can run from Python:
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict({"a": [1, 2, 3, 4], "b": ["a", "b", "c", "d"]})
>>> import duckdb
>>> con = duckdb.connect()
>>> con.sql("SELECT * FROM df TABLESAMPLE bernoulli (50.0 PERCENT) REPEATABLE (1234)")
┌───────┬─────────┐
│ a │ b │
│ int64 │ varchar │
├───────┼─────────┤
│ 2 │ b │
│ 3 │ c │
└───────┴─────────┘ |
I tried this on 70de8db
9.5.0
|
@balintpto Can you also show the generated SQL? It should look something like this:
|
I'll just note that if I remove the seed, this operation returns anywhere from 0 to all 4 rows. Since sampling is non-deterministic, it's possible this is just the result you get in your environment with this particular seed. Could you try other seeds? Clearly this isn't ideal (ideally, you'd get exactly half of the rows), but I'm not sure there's a ton Ibis can do about it. |
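To illustrate why a 50% Bernoulli sample of 4 rows can legitimately return all 4 rows, here is a minimal pure-Python model of the idea. It is only a sketch: `bernoulli_sample` and `prob_of_size` are hypothetical helpers, and Python's `random` module is not DuckDB's generator, so a given seed will not reproduce DuckDB's row selection.

```python
import random
from math import comb


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Illustrative model only; this is not how DuckDB implements TABLESAMPLE.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def prob_of_size(n_rows, fraction, k):
    """Binomial probability that exactly k of n_rows survive the sample."""
    return comb(n_rows, k) * fraction**k * (1 - fraction) ** (n_rows - k)


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# Probability of each possible sample size for 4 rows at a 50% rate.
size_probs = {k: prob_of_size(len(rows), 0.5, k) for k in range(len(rows) + 1)}

# The same seed always yields the same subset within this model.
first = bernoulli_sample(rows, 0.5, seed=1234)
second = bernoulli_sample(rows, 0.5, seed=1234)
```

Under this model, `size_probs[4]` is 0.0625, so roughly one seed in sixteen keeps every row of a 4-row table. That is why trying several seeds, or a larger table, helps separate bad luck from sampling that is not being applied at all.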
Hi @balintpto -- well, you've got our attention, certainly! Could you also try this out with a (slightly) larger dataset? It could be that on your machine, with the
>>> import ibis
>>> import string
>>> t = ibis.memtable({"num": range(26), "let": list(string.ascii_lowercase)})
>>> ibis.options.interactive = True
>>> t.sample(0.5, seed=1234) |
Thanks for all the help, well this is interesting:
|
I uninstalled and reinstalled sqlglot but the sampling is still not showing up after the .to_sql method |
Okay, this is interesting. I made a fresh venv and sampling works fine. So maybe it's some library not updating? I don't think creating a new venv with every minor version update is a good solution though. |
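When a fresh venv behaves differently from an existing one, comparing the installed package versions in both is a quick way to spot a dependency that did not get updated. The sketch below uses the standard-library `importlib.metadata`; `installed_versions` is a hypothetical helper, and the list of distributions to check (`ibis-framework`, `duckdb`, `sqlglot`) is an assumption based on the packages mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# Assumed set of packages relevant to this issue; adjust as needed.
report = installed_versions(["ibis-framework", "duckdb", "sqlglot"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both the old and the new environment and diffing the output shows which package, if any, is pinned to an older release.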
Thanks, @balintpto ! We try to keep our version bounds wide to avoid making people upgrade unnecessarily, but we can't always control which versions get updated or which don't. Glad you got it sorted out. |
What happened?
Sampling doesn't seem to have an effect with ibis versions higher than 9.0.0 (I have to use 9.0.0 because of this)
9.5.0:
9.0.0:
Code:
What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response