Test ddf isin with large list #414

hayesgb · 2022-10-03T02:55:39Z

Adds a test for filtering a dataframe column with isin against a large list of values

tests/benchmarks/test_dataframe.py

ncclementi · 2022-10-06T14:53:41Z

tests/benchmarks/test_dataframe.py

+    ddf = timeseries(end="2000-05-01", dtypes={"A": float, "B": int}, seed=42)
+    ddf.A = ddf.A.mul(1e7)
+    ddf.A = ddf.A.astype(int).persist()
+    a_column_unique_values = np.arange(1, n // 10)


Nitpick: it looks like we only use n once. Do we need to create a variable for it (line 71)? Is this a number that could potentially change, or did we choose it arbitrarily?

Cleared up by aligning 1e7 to N. Yes, the value could change.
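For readers following the thread, here is a minimal sketch of what aligning the scale factor with N might look like. It is not the code from the PR: it uses plain pandas instead of a Dask DataFrame so it runs without a cluster, N is scaled down from the 10_000_000 in the diff, and the random column is a hypothetical stand-in for the output of timeseries (assumed here to be floats in [-1, 1)).

```python
import numpy as np
import pandas as pd

# Hypothetical, scaled-down stand-in for the N = 10_000_000 used in the PR.
N = 100_000

rs = np.random.RandomState(42)
# Assumption: column "A" starts as floats in [-1, 1), column "B" as ints.
df = pd.DataFrame(
    {
        "A": rs.uniform(-1, 1, size=N),
        "B": rs.randint(0, 1000, size=N),
    }
)

# A single constant drives both the scaling and the size of the filter list,
# so changing N keeps the two in step.
df["A"] = df["A"].mul(N).astype(int)
a_column_unique_values = np.arange(1, N // 10)

# Keep only rows whose "A" value appears in the large list.
filtered = df.loc[df["A"].isin(a_column_unique_values)]
```

With one constant, the fraction of rows that match the list stays roughly the same when N changes, which is presumably the point of aligning the two values.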

ncclementi · 2022-10-06T14:55:53Z

tests/benchmarks/test_dataframe.py

+    n = 10_000_000
+    rs = np.random.RandomState(42)
+    ddf = timeseries(end="2000-05-01", dtypes={"A": float, "B": int}, seed=42)
+    ddf.A = ddf.A.mul(1e7)


It might be worth a comment here on why we need these next two lines. Is it a cardinality issue?

Added comments.
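One possible reading of the cardinality question, sketched under assumptions: if the float column holds values in [-1, 1) (assumed here, not confirmed in the thread), casting it straight to int would collapse nearly everything to 0, while scaling first spreads the values over many distinct integers. The snippet uses pandas and a hypothetical, scaled-down n; the comments actually added in the PR are not shown in this conversation.

```python
import numpy as np
import pandas as pd

# Hypothetical, scaled-down value; the PR uses n = 10_000_000.
n = 100_000

rs = np.random.RandomState(42)
# Assumption: floats in [-1, 1), standing in for the timeseries column "A".
a = pd.Series(rs.uniform(-1, 1, size=n))

# Without scaling, values strictly inside (-1, 1) truncate to 0.
unscaled = a.astype(int)

# Scaling by n first yields integers spread across roughly [-n, n].
scaled = a.mul(n).astype(int)

print("distinct values without scaling:", unscaled.nunique())
print("distinct values with scaling:", scaled.nunique())
```

A column with many distinct integers gives isin a realistic amount of work when matched against a large list, which is what the benchmark is meant to exercise.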

ncclementi · 2022-10-06T16:26:58Z

Thanks, @hayesgb !
When CI finishes, would you uncomment the test.yaml and add [skip ci] to the commit message?

ncclementi · 2022-10-07T22:47:27Z

I'll merge main and uncomment the test.yaml code

hayesgb added 13 commits September 13, 2022 21:34

Test for filtering ddf using isin with large list

fd32c96

testing isin

4b94f0d

Moves filter isin to remote datasets

723d853

Merge branch 'main' into test_ddf_isin

c4d167e

Update dataframe

065d0cd

Merge branch 'main' into test_ddf_isin

3f76775

update test.yml

6c21dff

linting

66fd4a0

Linting

548a84b

Merge branch 'main' into test_ddf_isin

cb53440

Update test_isin to use timeseries

56e68b6

Cleaning up test_isin

dcc4a52

Linting

acdf054

ncclementi reviewed Oct 3, 2022


tests/benchmarks/test_dataframe.py (outdated, resolved)

hayesgb requested a review from ncclementi October 4, 2022 16:55

hayesgb added 2 commits October 5, 2022 15:44

Removing print statements

181196a

Linting

201bcec

ncclementi reviewed Oct 6, 2022


hayesgb added 2 commits October 6, 2022 11:16

Aligns value of N across the test implementation

be7deb4

Adding comments

3407ca7

ncclementi added 2 commits October 7, 2022 18:44

Merge branch 'main' into test_ddf_isin

935de43

merge main and clean up

fc0b7f0
