
Question: Parsing without Type Checking? #192

Closed
djchapm opened this issue Jul 18, 2024 · 4 comments

djchapm commented Jul 18, 2024

If I want every field returned as a literal byte[], with no type checking etc., just a UTF-8 character-to-byte conversion for every column, how would I do that?

I'm guessing I need to inject a custom parser, but there are no examples, so I thought I'd ask.

Thanks!

devinrsmith (Member) commented:

You should be able to set the specific parser for a given field index or field name, which forgoes inference:

https://github.com/deephaven/deephaven-csv/blob/v0.14.0/src/main/java/io/deephaven/csv/CsvSpecs.java#L56-L66
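A rough sketch of what that could look like, assuming the builder methods named in the linked `CsvSpecs` source and the `Parsers` constants from `io.deephaven.csv.parsers`; verify the exact names and whether column indices are 0- or 1-based against your version:

```java
// Sketch: pin every column of interest to the String parser so no type
// inference runs. Method and constant names are taken from the linked
// CsvSpecs source and may differ in your version.
import io.deephaven.csv.CsvSpecs;
import io.deephaven.csv.parsers.Parsers;

CsvSpecs specs = CsvSpecs.builder()
        .putParserForIndex(1, Parsers.STRING)       // by column index (check 0- vs 1-based)
        .putParserForName("price", Parsers.STRING)  // or by column name
        .build();
```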

djchapm (Author) commented Jul 18, 2024:

Thanks... trying it out.
How does this handle large files? I'm not seeing a way to stream through the file, only CsvReader.read, which pulls everything into memory?

devinrsmith (Member) commented:

So, the CsvReader will hand off column chunks to the sinks. The basic sinks are, well, basic, and put everything into arrays. A more advanced sink could do whatever it wants with the data (write to disk, some other format, etc). For example, https://github.com/deephaven/deephaven-core uses custom sinks to write into its Table / column format.

https://github.com/deephaven/deephaven-csv/blob/main/ADVANCED.md may be of interest to you
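To make the "a sink could do whatever it wants" point concrete, here is a hedged sketch of a column sink that forwards each chunk downstream instead of buffering. The `Sink` interface shape (the `write` signature and `getUnderlying`) is an assumption based on ADVANCED.md; check `io.deephaven.csv.sinks.Sink` in your version, and the `StreamingStringSink` name and `Writer` target are illustrative:

```java
// Rough sketch only: a sink for one String column that streams values out
// as chunks arrive, so nothing accumulates in memory.
final class StreamingStringSink implements Sink<String[]> {
    private final java.io.Writer out;

    StreamingStringSink(java.io.Writer out) {
        this.out = out;
    }

    @Override
    public void write(String[] src, boolean[] isNull, long destBegin, long destEnd, boolean appending) {
        // Receives one chunk of this column's values; forward them downstream.
        try {
            for (int i = 0; i < destEnd - destBegin; i++) {
                out.write(isNull[i] ? "" : src[i]);
                out.write('\n');
            }
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Object getUnderlying() {
        return null; // nothing retained in memory
    }
}
```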

djchapm (Author) commented Jul 19, 2024:

Thanks for the quick responses. Using sinks to load a record seems non-trivial, since I believe there is one sink instance per column, and each sink receives every row value for that column from the input stream. So if I want to parse one row at a time and send that data downstream, it might be a bit challenging. I didn't look at customizing the parser. It's a nice project. If I had a wishlist item, it would be something like:
CsvReader.read(specs, csvIn, consumer)
where the consumer in my case would receive an Object[] of size numCols.

I can see how this would mess up your threading though, which speeds things up quite a bit.

Cheers
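The row-at-a-time consumer shape proposed above can be illustrated in plain Java. This is not part of the deephaven-csv API; it only shows what transposing already-parsed column arrays into per-row Object[] records would look like (a real integration would do this inside custom sinks as chunks arrive, and `RowEmitter` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: turns column-oriented data into per-row Object[] records
// and hands each row to a Consumer, the shape the wishlist item asks for.
public final class RowEmitter {
    public static void emitRows(Object[][] columns, int numRows, Consumer<Object[]> consumer) {
        for (int r = 0; r < numRows; r++) {
            Object[] row = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][r]; // gather row r across all columns
            }
            consumer.accept(row);
        }
    }

    public static void main(String[] args) {
        Object[][] columns = {
                {"AAPL", "MSFT"},   // symbol column
                {189.5, 411.2},     // price column
        };
        List<Object[]> rows = new ArrayList<>();
        emitRows(columns, 2, rows::add);
        System.out.println(Arrays.toString(rows.get(0))); // [AAPL, 189.5]
    }
}
```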

djchapm closed this as completed Jul 19, 2024