
Custom Parser Error #53

Open
MaximilianJHuber opened this issue Mar 15, 2018 · 2 comments

@MaximilianJHuber

I am trying to parse a CSV and convert a column that holds the sex of a person into a `Nullable{Bool}`.

My attempt failed:

```julia
sex_parser = CustomParser(Bool) do str, i, len, opts
    return (len == 0 ? Nullable{Bool}() : Nullable{Bool}(str[i] == 'M'), len == 0 ? i : i + 1)
end
```

I spent a couple of hours reading the docs and the code, but I do not understand what `str` is when `tryparsenext` receives it in a `readcsv` context, or whether my `CustomParser` must be aware of the delimiter.

For example, consider a CSV with a `;` delimiter:

```
123;F;6789;
124;M;6712;
125;;6716;
```

Is `str` a single row, so that my custom parser receives pos=5, len=1 for the first row and pos=5, len=0 for the third row?
And after a successful parse in row one, do I return 6 or 7? The docs say "position the next token, if any, starts at", which sounds like my parser needs to be aware of my delimiter!

@MaximilianJHuber
Author

I did some sleuthing: `str` appears to contain multiple (if not all) rows, and `len` is always the same very large number.
However, my new custom parser fails too:

```julia
sex_parser = CustomParser(Bool) do str, i, len, opts
    return (str[i] == ';' ? Nullable{Bool}(nothing) : Nullable{Bool}(str[i] == 'M'), str[i] == ';' ? i + 1: i + 2)
end
```

A JuliaDB `loadtable` call yields an error:

```
MethodError: no method matching isless(::TextParse.StrRange, ::TextParse.StrRange)
Closest candidates are:
  isless(::Missings.Missing, ::Any) at C:\Users\Max\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\v0.6\Missings\src\Missings.jl:74
  isless(::DataValues.DataValue{Union{}}, ::Any) at C:\Users\Max\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\v0.6\DataValues\src\scalar/core.jl:257
  isless(::DataValues.DataValue{S}, ::T) where {S, T} at C:\Users\Max\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\v0.6\DataValues\src\scalar/core.jl:251
  ...
cmp at .\operators.jl:303 [inlined]
cmpelts at C:\Users\Max\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\v0.6\IndexedTables\src\columns.jl:343 [inlined]
macro expansion at C:\Users\Max\AppData\Local\JuliaPro-0.6.2.1\pkgs-0.6.2.1\v0.6\IndexedTables\src\columns.jl:363 [inlined]
....
```

@MaximilianJHuber
Author

I found a solution:

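```julia
import DataValues  # needed for DataValues.DataValue

sex_parser = CustomParser(Bool) do str, i, len, opts
    # Wrapped in NAToken, this only sees non-empty fields: consume one char, return the next index
    return (DataValues.DataValue{Bool}(str[i] == 'M'), i + 1)
end
```

This inner parser is wrapped in `TextParse.NAToken(sex_parser)`. That way an empty field is handled by `NAToken`, and I can be sure that the field starting at `str[i]` is not empty.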
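The contract this thread converges on can be illustrated without TextParse itself. Below is a minimal, self-contained sketch in current Julia (1.x) under stated assumptions: `sex_inner`, `na_wrap` and `parse_second_field` are hypothetical names, not TextParse API; `nothing`/`missing` stand in for `Nullable`/`DataValue`; and the position convention (the inner parser receives the whole buffer with `len` as the buffer's end, returns the index just past what it consumed, and leaves the delimiter to the caller) is inferred from the working `i + 1` solution, not taken from TextParse's source.

```julia
# Hypothetical stand-in for a TextParse-style inner parser. It receives the
# whole buffer `str`, the field start `i`, and `len` (end of the buffer, NOT
# the field width). It consumes one character and returns (value, next_index);
# a value of `nothing` signals a failed parse.
function sex_inner(str::AbstractString, i::Int, len::Int)
    i > len && return (nothing, i)
    c = str[i]
    c == 'M' && return (true, i + 1)
    c == 'F' && return (false, i + 1)
    return (nothing, i)
end

# Hypothetical wrapper playing the role NAToken plays in the thread: an empty
# field (already at the delimiter or past the end) yields `missing` and
# consumes nothing; otherwise it defers to the inner parser.
function na_wrap(inner, delim::Char)
    return (str, i, len) -> (i > len || str[i] == delim) ? (missing, i) : inner(str, i, len)
end

# Tiny hypothetical driver: parse the second field of one ';'-delimited row.
function parse_second_field(line::AbstractString, parser, delim::Char)
    i = findfirst(==(delim), line) + 1          # start of the second field
    val, next = parser(line, i, lastindex(line))
    val === nothing && error("could not parse field at index $i")
    # The field parser stops AT the delimiter; skipping it is the caller's job.
    @assert next > lastindex(line) || line[next] == delim
    return val
end

p = na_wrap(sex_inner, ';')
parse_second_field("123;F;6789;", p, ';')  # false
parse_second_field("124;M;6712;", p, ';')  # true
parse_second_field("125;;6716;", p, ';')   # missing
```

Under these assumptions, the answer to "do I return 6 or 7?" for row one is 6: the inner parser is called with `i = 5`, consumes `F`, and returns the index of the following `;`. Only the wrapper and the caller ever look at the delimiter.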

Might be worth adding to the docs!
