-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
return data class instead of dictionary #1887
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - definitely think this is a bit nicer to use.
Couple of comments, but happy for you to decide what to do with them. If you do decide to implement the iteration will be happy to have another look over after
splink/cluster_metrics.py
Outdated
```node_metrics, edge_metrics, cluster_metrics = ( | ||
df_graph_metrics.nodes, | ||
df_graph_metrics.edges, | ||
df_graph_metrics.clusters, | ||
)``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the suggestion was to be able to unpack directly, node_metrics, edge_metrics, cluster_metrics = df_graph_metrics
, or more likely in practice:
node_metrics, edge_metrics, cluster_metrics = linker.compute_graph_metrics(...)
so that you can immediately bypass the class itself if you want. Might want to check with Tom though if that's what he meant.
I'm personally not massively fussed about this feature though, so happy to leave if it's going to be too fiddly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to leave for now and implement at a later date if the user need becomes apparent :)
splink/cluster_metrics.py
Outdated
return """ | ||
A data class of Splink dataframes containing metrics for nodes, edges and clusters. | ||
|
||
Access dataframes via attributes: | ||
`compute_graph_metrics.nodes` for node metrics, | ||
`compute_graph_metrics.edges` for edge metrics and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just personal preference but I always find it weird having multi-line strings that mess with the indentation like this - I think that using implicit continuation and explicit newline "\n"
characters reads clearer (see for example SplinkDataFrame.__repr__
).
But not a big deal at all, so am fine if I'm in the minority on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really useful feedback! It looks icky to me too so have updated 👍
Type of PR
Is your Pull Request linked to an existing Issue or Pull Request?
#1677 and #1806
Give a brief description for the solution you have provided
Following user feedback, the
_compute_graph_metrics()
method has been updated to return a data class, instead of a dictionary of splink dataframes.I have tested on the fake_100 dataset. To reproduce:
PR Checklist