readCsv replaces missing values in last column with 0 #19
Apparently there are other situations where this happens (I mean the unexpected 0), not only in the last column. `echo df.pretty()` prints a "Dataframe with 4 columns and 4 rows" in which ERR gets replaced with 0 (or rather, it stays at the default-initialized int value, if I got your point correctly), while NaN or N/A are "just fine". In Pandas I would write `df = df.replace("ERR","NaN")`; I have yet to figure out IF and HOW I can do the same in datamancer.
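As a point of comparison with the `df.replace("ERR","NaN")` idea, here is a minimal stdlib-only Nim sketch of the behavior being asked for: sentinel cells become NaN instead of a silent 0. It is not datamancer's API. It assumes one column's raw CSV cells are available as a `seq[string]`, and both the helper name `parseWithSentinels` and its default sentinel list are hypothetical.

```nim
import std/[math, strutils]

# Hypothetical helper (not part of datamancer): converts the raw cells of one
# column to floats, mapping known sentinel strings to NaN instead of 0.
proc parseWithSentinels(cells: seq[string],
                        sentinels = @["ERR", "N/A", "NaN", ""]): seq[float] =
  result = newSeqOfCap[float](cells.len)
  for raw in cells:
    let cell = raw.strip()
    if cell in sentinels:
      result.add NaN             # explicit missing value
    else:
      try:
        result.add parseFloat(cell)
      except ValueError:
        result.add NaN           # any other unparsable text is treated as missing

when isMainModule:
  let col = parseWithSentinels(@["1", "ERR", "3.5", "N/A", ""])
  echo col
```

Treating every unparsable cell as NaN is a design choice. Raising instead would surface unexpected values rather than hiding them.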
So, I've addressed the things that are broken in my opinion. What I mean is: explicit appearing of However, the case of Note that indeed, the value of
I think this is reasonable. To be honest, I'm not the biggest fan of having to convert the first column here into a float column, but it is what it is. And sure, data cleaning is a very common task. But keep in mind that there is a reason the R community has a whole package just for this, tidyr. In any case, with the DF as it stands now, you can apply rules to the
Hello Vindaar, that's excellent (it's more than reasonable). The new layout you show is exactly the one I expected, and pretty much the same one Python/Pandas would have produced (I'm not an R guy... shame on me).
What I would personally do here is the following:

```nim
df = df.mutate(f{Value -> int: "x" ~
(if `x` == %~ "ERR" or `x` == %~ "N/A":
  0
else:
  `x`.toInt)})
```

An explanation:
See the documentation of the There are other ways to do this, of course, but this would be the most "idiomatic" one, if you will. Given the use case, though, doing it manually by getting the tensor using
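The `mutate` formula above applies a per-element rule: the sentinels "ERR" and "N/A" become 0, and every other value is converted with `toInt`. The sketch below shows that same rule in plain Nim, outside datamancer, assuming the raw cells are available as a `seq[string]`. The name `cleanToInt` is hypothetical and is not a datamancer function.

```nim
import std/strutils

# Hypothetical helper mirroring the rule of the `mutate` formula:
# sentinel strings become 0, every other cell is parsed as an int.
proc cleanToInt(cells: seq[string]): seq[int] =
  for cell in cells:
    if cell == "ERR" or cell == "N/A":
      result.add 0                 # sentinel -> 0, as in the formula
    else:
      result.add parseInt(cell)    # raises ValueError for any other non-integer text

when isMainModule:
  echo cleanToInt(@["3", "ERR", "7", "N/A"])
```

Mapping sentinels to 0 makes them indistinguishable from real zeros. Keeping a float column with NaN, or filtering those rows out, preserves that information.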
Thank you, Vindaar. Illuminating. I tried some initial experiments in the same direction, but I struggled with the type to use for the object (Value... I see... not string), and I would never have thought to use `~` or `%~` that way. Explained examples like this are a goldmine for datamancer newcomers. To me, this one is worth a further chapter in the datamancer data wrangling tutorial (maybe it's there and I just missed it), especially because replacement (even if, in cases like this, "filtering out" could often be more appropriate) and data type conversion are routine operations when "cleaning" data.
Yes, I understand that. The documentation is, on the one hand, clearly still lacking, and on the other, everything related to For the specific use cases of comparisons, I suppose I could add overloads to It's a good idea for an additional section in the data wrangling tutorial for sure, though! I've opened an issue for that.
The expected behavior is to use NaN, as it currently does when values are missing in inner columns.