-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC add example for Cramer V for column_associations #1186
Conversation
Hi @reshamas thanks for opening this! such examples directly in the reference documentation are super useful to understand how to use a library. Your example is great! I think |
Also I see you add the example section to the |
Super good idea to have done this as a docstring example |
>>> df = pd.DataFrame({f"c_{i}": rng.random(size=20)*10 for i in range(5)}) | ||
>>> df["c_str"] = [f"val {i}" for i in range(df.shape[0])] | ||
>>> df.shape | ||
>>> df.head() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same for this line and the one past associations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added output for associations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the output is too wide (> 80 chars):
left_column_name left_column_idx right_column_name right_column_idx cramer_v
0 c_3 3 c_str 5 0.8215
1 c_1 1 c_4 4 0.8215
2 c_0 0 c_1 1 0.8215
3 c_2 2 c_str 5 0.7551
4 c_0 0 c_str 5 0.7551
5 c_0 0 c_3 3 0.7551
6 c_1 1 c_3 3 0.6837
7 c_0 0 c_4 4 0.6837
8 c_4 4 c_str 5 0.6837
9 c_3 3 c_4 4 0.6053
10 c_2 2 c_3 3 0.6053
11 c_1 1 c_str 5 0.6053
12 c_0 0 c_2 2 0.6053
13 c_2 2 c_4 4 0.5169
14 c_1 1 c_2 2 0.4122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something to do with different configurations for different pandas versions, tests fail on the min reqs configuration 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding >>> pd.set_option('expand_frame_repr', False)
seems to print it properly, however now there's a problem with the ordering of the rows, which for whatever reason changes between pandas 1.5 and pandas 2.2.
Not sure how we can get around this, @jeromedockes any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I think this should be ok for the display:
>>> pd.set_option('display.width', 200)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.precision', 4)
and then at the end of the example
>>> pd.reset_option('display.width')
>>> pd.reset_option('display.max_columns')
>>> pd.reset_option('display.precision')
Hi @reshamas, thanks for contributing! I added a few comments that should fix the tests. |
You might also want to run |
sorry for the delay I'll look at it now |
I think we should have a little function in the also we probably can set pandas display options before running the doctests and perhaps reset them before and after running each module's doctests I don't want this stuff to block this PR though, because it really helps clarify what the output of the |
Cramér's V is a measure of association between two nominal variables, | ||
giving a value between 0 and +1 (inclusive). | ||
|
||
* `Cramer's V <https://en.wikipedia.org/wiki/Cramér%27s_V>`_ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding the explanation and the link that's super useful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!! thanks a ton @reshamas . this function was moved to the public api in the last release as we realized it can be very useful on its own ... but it was lacking a proper documentation and examples! battling with pandas output formatting details across multiple versions of pandas and python is not fun but thanks a lot for pushing through :)
Thank you @jeromedockes and @rcap107. It was a very good learning experience. |
Likewise, thanks for contributing! |
yep, thanks a lot! I know you had also started working on adding Pearson correlations to the output, so if you have any questions about that don't hesitate to reach us here on github or on the skrub discord server: https://discord.gg/ABaPnm7fDC |
* bump version after 0.4.0 (#1162) * DOC Correct typo in TableReport docstring (#1168) * Add codespell support (#1126) * add python 3.13 (#1170) * add sections for the next release's changelogs (#1183) * Add a verbosity parameter to TableReport to control the printing of messages in stdout when opening summary report (#1182) * add verbose parameter to print_display to toggle on or off printing of progress messages when generating table report (#1188) * DOC add example for Cramer V for column_associations (#1186) * ENH adding alias "regression" and "classification" (#1180) * Bump codecov/codecov-action from 5.0.7 to 5.1.1 in the actions group (#1191) * remove use of matplotlib.rc_context (#1172) * MAINT adapt for scikit-learn 1.6 (#1135) * Revert "MAINT adapt for scikit-learn 1.6 (#1135)" (#1194) This reverts commit 18af508. * MAINT compatibility sklearn 1.6.0 (#1169) * MAINT fix nightly builds (#1193) * Updating contributing doc to add more detail on how to start (#1185) * FIX better display of object columns with multi-line repr in tablereport (#1196) * update changelog --------- Co-authored-by: Matt J. <[email protected]> Co-authored-by: Yaroslav Halchenko <[email protected]> Co-authored-by: priscilla baah <[email protected]> Co-authored-by: Reshama Shaikh <[email protected]> Co-authored-by: mrastgoo <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Riccardo Cappuzzo <[email protected]>
add example for Cramer V
Will be rendered here in documentation:
https://skrub-data.org/stable/reference/generated/skrub.column_associations.html
Towards #1181