readDelimiter variant for Regex as delimiter #746
Say I have (output from
Then I have multiple spaces as delimiters... In some command-line outputs, I have two words in one column:
Like that NOT SENT... that's where a regex can help. It's not just tabs, it's a bunch of spaces. Also, how would you parse Markdown tables (or similar)? Unless the library trims all those extra spaces... but I guess with Markdown there might be more complications than just a delimiter.
Good questions indeed. I think such tables should be parsed by readDelimStr in the future. For now I can only suggest something like this for Markdown.
I think that's a bit of an advanced technique for most people with this kind of use case, and it involves parsing in two steps. I wonder if some kind of readDSL would be better here. It could work line by line and provide helpers for extracting the titles and values?
Please share the desired API or examples of usage that you have in mind. Maybe something like this could be added.
I'm closing this for now. We're working on a new CSV implementation based on Deephaven CSV (#827), since it's faster and lighter. Unfortunately, it also doesn't allow regexes as delimiters, just a We plan to have an experimental version of it in 0.15. If that still does not work, I'd recommend modifying the string manually, potentially adding quote characters, and then parsing it as delimStr. Edit: well, apparently it seems to have some issues with
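The manual workaround suggested in the comment above (rewriting the string and adding quote characters before parsing) could look roughly like the following. This is only a sketch using the Kotlin standard library; `toQuotedCsv` is a hypothetical helper name, not part of DataFrame or Deephaven CSV, and the sample input is invented for illustration.

```kotlin
// Hypothetical helper (not a library API): turns whitespace-aligned text into
// comma-separated text in which every cell is wrapped in double quotes.
// Assumption: columns are separated by two or more whitespace characters.
fun toQuotedCsv(text: String, gap: Regex = Regex("""\s\s+""")): String =
    text.lines()
        .filter { it.isNotBlank() }
        .joinToString("\n") { line ->
            line.trim().split(gap).joinToString(",") { cell ->
                // Double any embedded quotes, then wrap the cell in quotes.
                "\"" + cell.replace("\"", "\"\"") + "\""
            }
        }

fun main() {
    // Invented sample resembling command-line output.
    val raw = "NAME    STATUS\njob-a   NOT SENT\njob-b   SENT"
    println(toQuotedCsv(raw))
}
```

The resulting string uses a plain comma as its delimiter, so it could then be handed to a regular delimited-text reader such as the readDelimStr mentioned earlier in this thread.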
Hi, since you mentioned you were developing your own CSV library, I thought I would comment here. Whether you decide to use Deephaven's CSV library or develop your own, there are a variety of things we learned along the way that may benefit you. We used some clever ideas for high performance and also some cute tricks for automatic "type inference". I'd be happy to discuss in more detail in some appropriate forum if you would find that helpful. Best, Corey Kosak @ Deephaven
@kosak We're not developing our own CSV library. We're simply replacing our Apache Commons CSV integration in DataFrame with Deephaven's :) exactly for the reasons you mentioned: performance, type inference, etc. Plus, while we currently don't store our data primitively, using Deephaven, that remains a viable option in the future.
deephaven/deephaven-csv#212 is merged :) We'll add it in #903. Simply set You can also manually specify
Since this function exists specifically to read delimited text, it might be useful to have an overload that takes a Regex as the delimiter. This could be used for command-line output tables, which are usually space-separated but sometimes contain a single space inside a column value, so I need to use "\s\s+" to read them correctly.
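To make the request concrete, here is a minimal sketch of the splitting behaviour being asked for, written with the Kotlin standard library only. `splitColumns` is a hypothetical name and the sample table is invented; nothing here is an existing DataFrame API.

```kotlin
// Two or more whitespace characters mark a column boundary;
// a single space stays inside the cell value.
val columnGap = Regex("""\s\s+""")

// Hypothetical helper: splits one line of aligned output into its cells.
fun splitColumns(line: String): List<String> =
    line.trim().split(columnGap)

fun main() {
    // Invented sample; "NOT SENT" is a single cell containing one space.
    val output = """
        NAME      STATUS      AGE
        job-a     NOT SENT    2d
        job-b     SENT        5h
    """.trimIndent()

    val rows = output.lines().filter { it.isNotBlank() }.map(::splitColumns)
    rows.forEach(::println)
}
```

With a single-character delimiter, "NOT SENT" would be split into two cells; the regex keeps it together because only runs of two or more whitespace characters count as a boundary.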