Automatically determine the encoding of the file #29
Comments
I think what you're looking for is this line. This means, when creating a
I see, so there is no way to determine the encoding automatically then. Tough :( There is a method on NSString that computes it from an NSData object. I will try to look into that on my own then. Thanks for having a look.
As you can see here, we already have logic in place which will automatically determine the type of line ending of the file when Would you be up for adding this feature yourself and sending a PR with tests and docs updated? If there's a method on NSString/NSData which can handle that, then it should be pretty straightforward to implement, since the same logic is already in place for line endings. That would be really awesome. I'm reopening this issue and renaming it to describe this feature.
I can make a PR, probably in the next few days. By the way, I have already found a solution to this and made a simple String extension that returns the encoding to me in String.Encoding format. The only other issues, a bit off topic here, are the delimiters (sometimes ; is used as well) and whether one can process a string from memory as a CSV file. The NSString method I am referring to not only guesses the encoding but also returns the decoded string, which the importer would potentially need to handle on the fly rather than from a file.
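For readers following along, here is a minimal sketch of what such a String extension might look like. It assumes the NSString method meant above is `stringEncoding(for:encodingOptions:convertedString:usedLossyConversion:)` from Apple's Foundation; the name `guessEncoding(of:)` and the tuple it returns are made up for illustration and are not part of CSVImporter. The block is guarded for Apple platforms because it is not certain that the open-source Foundation provides this method.

```swift
import Foundation

#if canImport(Darwin)
extension String {
    /// Hypothetical helper: guesses the encoding of `data` and returns it
    /// together with the decoded text. Returns nil when Foundation cannot
    /// settle on an encoding.
    static func guessEncoding(of data: Data) -> (encoding: String.Encoding, text: String, lossy: Bool)? {
        var converted: NSString?
        var lossy: ObjCBool = false
        let raw = NSString.stringEncoding(for: data,
                                          encodingOptions: nil,
                                          convertedString: &converted,
                                          usedLossyConversion: &lossy)
        // A raw value of 0 means the encoding could not be determined.
        guard raw != 0, let text = converted as String? else { return nil }
        return (String.Encoding(rawValue: raw), text, lossy.boolValue)
    }
}
#endif
```

Note that the whole file has to be loaded into a `Data` value first (for example with `Data(contentsOf:)`), so this approach holds the entire file in memory.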
Sounds good. Note that one of the advantages of CSVImporter is that it's able to read big files faster and more safely since it doesn't read the entire file at once, which your solution probably does. So that's another plus for implementing this in CSVImporter. I don't really understand your other problems though. You would probably need to post some code so I can understand. Note though that if it's a different problem than this one, it's probably better to open a separate issue for each problem.
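To illustrate why detection does not have to load the whole file, here is a rough sketch of a different, much simpler technique: looking only at a byte-order mark (BOM) in the first four bytes. This is not what CSVImporter does and not the NSString heuristic discussed above. The function names are hypothetical, and files without a BOM simply yield nil.

```swift
import Foundation

/// Maps a leading byte-order mark to an encoding; nil when no BOM is present.
func encodingFromBOM(_ head: [UInt8]) -> String.Encoding? {
    // UTF-32 LE must be checked before UTF-16 LE: both start with FF FE.
    if head.starts(with: [0xFF, 0xFE, 0x00, 0x00]) { return .utf32LittleEndian }
    if head.starts(with: [0x00, 0x00, 0xFE, 0xFF]) { return .utf32BigEndian }
    if head.starts(with: [0xEF, 0xBB, 0xBF]) { return .utf8 }
    if head.starts(with: [0xFF, 0xFE]) { return .utf16LittleEndian }
    if head.starts(with: [0xFE, 0xFF]) { return .utf16BigEndian }
    return nil
}

/// Reads only the first four bytes of the file, so large files stay cheap.
func sniffEncoding(at url: URL) -> String.Encoding? {
    guard let handle = try? FileHandle(forReadingFrom: url) else { return nil }
    defer { handle.closeFile() }
    return encodingFromBOM([UInt8](handle.readData(ofLength: 4)))
}
```

Many CSV files carry no BOM at all, so a real implementation would still need a fallback, for example a caller-supplied default encoding.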
How does CSVImporter handle garbage? I have a specific data structure, but it can be corrupted, or fields can be missing or added, so I need to add some regex.
CSVImporter generally expects a valid CSV file according to RFC 4180, which specifies: "Each line should contain the same number of fields throughout the file."
When a line, for example, doesn't have the same number of fields, then at the moment the entire line is simply ignored. That's not required by the RFC though (that's why it's a "should" not a "must"), so we could implement multiple different fallback strategies and let the user choose between them. Can you give examples of lines and how they are "corrupted"? Depending on the case, I'm perfectly okay with a little more accommodating behavior, so long as it doesn't conflict with the RFC. Feel free to post a PR with the changes you need and I'll have a look. As long as it is an opt-in feature, is documented (in the README) and is covered by tests (your corrupted file), I'm happy to merge it!
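As a starting point for discussion, opt-in fallback strategies like the ones mentioned above could be sketched roughly as follows. Everything here is hypothetical: `FieldCountPolicy` and `normalize(_:expectedCount:policy:)` are illustrative names, not existing CSVImporter API, and the sketch assumes the line has already been split into fields.

```swift
/// Hypothetical strategies for lines whose field count differs from the expected one.
enum FieldCountPolicy {
    case skipLine       // drop the line entirely (the behavior described above)
    case padOrTruncate  // force the line to the expected width
    case keepAsIs       // pass the line through unchanged
}

/// Returns the fields to use for a line, or nil when the line should be dropped.
func normalize(_ fields: [String], expectedCount: Int, policy: FieldCountPolicy) -> [String]? {
    if fields.count == expectedCount { return fields }
    switch policy {
    case .skipLine:
        return nil
    case .keepAsIs:
        return fields
    case .padOrTruncate:
        // Append empty fields when too short, then cut off any surplus.
        let padding = Array(repeating: "", count: max(0, expectedCount - fields.count))
        return Array((fields + padding).prefix(expectedCount))
    }
}
```

With rows held as `[[String]]`, something like `rows.compactMap { normalize($0, expectedCount: 3, policy: .skipLine) }` would reproduce the "ignore the line" behavior, while the other cases give more lenient handling.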
Hi there, thanks again for making this, since it saves tons of time.
Could you point me to the place in the code, or explain, how the importer determines which encoding a file is in when importing? I need to extract this information somehow and am not sure how to do that. Maybe you can give me a hint where to look. This is not a bug, more a request for information. And is there actually automatic encoding determination, or am I misinterpreting things?
guard let csv = CSVImporter<[String: String]>(url: fileURL) else {
    return
}