function: 'lowest' common type #157
Labels
enhancement
New feature or request
Comments
Hey Majid, great observation. Although it’s not exactly what you’re looking for, we have a performance enhancement that leverages this fact in ‘visions.typesets.typeset’ called ‘traverse_graph_with_sampled_series’, which you can invoke directly for a quick speed-up.
More broadly, if you use ‘detect’ instead of the ‘detect_type’ method (and likewise for the infer counterparts), you can pull the full inference path, which is a list of nodes from root to final. You can then find the intersection of those paths for the same column across your discrete data sets to determine a best representation.
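One way to read "intersection" here is the longest shared prefix of the root-to-final paths. The sketch below is only an illustration of that idea, not part of visions: the function name `lowest_common_type` and the string labels standing in for type nodes are hypothetical, and real paths depend on how the typeset's graph is laid out.

```python
from typing import Hashable, List, Sequence


def lowest_common_type(paths: Sequence[Sequence[Hashable]]) -> Hashable:
    """Return the deepest node shared by every root-to-final path."""
    if not paths:
        raise ValueError("need at least one inference path")
    common: List[Hashable] = []
    # zip stops at the shortest path, so we walk level by level from the root.
    for level in zip(*paths):
        if any(node != level[0] for node in level):
            break
        common.append(level[0])
    if not common:
        raise ValueError("paths do not share a root")
    return common[-1]


# Hypothetical paths for the same column read from three different CSV files.
paths = [
    ["Generic", "Float", "Integer"],
    ["Generic", "Float"],
    ["Generic", "Object", "String"],
]
print(lowest_common_type(paths[:2]))  # Float
print(lowest_common_type(paths))      # Generic
```

Note that under this reading the worst case is the root of the graph rather than a string type; falling back to string would need an explicit cast on top of this.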
On Tue, Dec 22 2020 at 12:37 PM, Majid alDosari wrote:
Sometimes going through a whole array is not needed. You have subsets of
the array and you just want to get a compatible data type for all subsets.
A common scenario when assembling horrible csvs is that the same column
might be inferred as different types in different csvs. For example,
(float <-- int). Worst case is to 'fall back' to string.
I should add: if you are interested in making a PR for this use case, it would be more than welcome. A basic implementation would look something like this:

    def cast_along_path(series, graph, path, state=None):
        # Avoid sharing a mutable default dict between calls.
        state = {} if state is None else state
        base_type = path[0]
        for vision_type in path[1:]:
            # Each edge stores the relation that maps base_type to vision_type.
            relation = graph[base_type][vision_type]["relationship"]
            series = relation.transform(series, state)
            # Advance so the next lookup uses the next edge on the path.
            base_type = vision_type
        return series

Which could be invoked as:

    import pandas as pd

    T = typeset  # your typeset instance
    s = pd.Series([your data])
    path = [Generic, Object, String]
    # Type Detection
    new_s = cast_along_path(s, T.base_graph, path)
    # Type Inference
    new_s = cast_along_path(s, T.relation_graph, path)
Sometimes going through a whole array is not needed. You have the types of the subsets of the array and you just want to get a compatible data type for all subsets.
A common scenario when assembling horrible csvs is that the same column might be inferred as different types in different csvs. For example, (float <-- int). Worst case is to 'fall back' to string.