Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to escape Excel-unfriendly strings #40

Open
MrTemple opened this issue Jun 25, 2021 · 5 comments
Open

Option to escape Excel-unfriendly strings #40

MrTemple opened this issue Jun 25, 2021 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@MrTemple
Copy link

Is your feature request related to a problem?

When a csv file contains a string with a leading -, +, or =, Excel will treat it as a formula field and throw an error.

Describe the solution you'd like

Love an option to auto-detect these leading characters and escape them properly with a single leading single-quote '.

@MrTemple MrTemple added the enhancement New feature or request label Jun 25, 2021
@dehesa
Copy link
Owner

dehesa commented Jun 28, 2021

Hi @MrTemple,

Thank you for the suggestion. That is quite a neat feature. I have added tho the backlog. Currently, I am a bit busy with regular work, so I would encourage you to give it a try and implement it yourself, since I don't know when I can get to it, sorry.

In any case, thank you for the feedback.

@MrTemple
Copy link
Author

MrTemple commented Jun 29, 2021

Thanks @dehesa.

Where do you suggest I start? Maybe in _lowlevelWrite(field: String), somewhere either in 3.A or between 3.A and 3.B? (This is used for the Encoder too, right?) Using an option created in Writer.Configuration and flowing through CSVWriter.settings?

And do you think I'd need to add another char to the result.reserveCapacity?

Also, I haven't contributed to an open source project yet. Is this the general process to follow? https://github.com/firstcontributions/first-contributions

@aehlke
Copy link

aehlke commented Mar 1, 2024

@MrTemple did you ever implement this? thank you

@MrTemple
Copy link
Author

MrTemple commented Mar 1, 2024

@aehlke No, I didn't have time to dig under the hood and make the change the way I'd have liked. For my purposes it was faster to just extend String with an excelSafe property.

This has all the escaping that seems necessary for excel. I'm pulling many tens of thousands of very long, complexly-formatted, code-containing issue descriptions from various systems (Jira, Github, etc). These seem to have pretty much the ugliest content you're likely to run into, and this escaping logic seems to wrangle them properly without spilling out of the cell. (@dehesa, if you've got an idea where I can put this in CodableCSV, I could give it a whirl.)

Note about the character length, I settled on 1000 kind of randomly because it worked on my version of excel, and I didn't happen to need the extra info in the fields I was pulling from. You may want to experiment with a higher limit.

Also the excelNumSafeHashPrefixed is just an uber hack to prevent excel monkeying with long numbers. When you open a csv there's no way to prevent excel from turning a serial number or barcode value like 64342363424634 into 6.43e13. If you remember, you can reformat the cells (custom format of 0 is the only way), but if you accidentally save the csv, excel squashes the number forever. 🤦‍♂️ I do a lot of opening and saving and reimporting of the csv that I export. I use this to force it to write out the field as #64342363424634 and then I have to remember to remove it with excelNumSafeHashPrefixRemoved. Ugly, but we're dealing with software written by boomers, for boomers here. 🤣


extension String {
    
        /// Trims whitespaces and newlines, and wraps any unsafe strings for CSV import to excel using ="stringContents", while escaping any double-quotes within the field.
    public var excelSafe: String {
            // Trim whitespaces and newlines.
        let string = self.trimmingCharacters(in: .whitespacesAndNewlines)
        
            // Any double-quotes or newlines must be escaped.
        let forbiddenCharacters = CharacterSet(charactersIn: "\"\r\n")
        
            // If the first character is an equal, minus, or plus sign, excel will treat it like a formula and display "#NAME". If the string contains double-quotes or newlines it will spill out of the cell.
        guard
            !string.hasPrefix("="),
            !string.hasPrefix("-"),
            !string.hasPrefix("+"),
            string.rangeOfCharacter(from: forbiddenCharacters) == nil
        else {
                // Note: Different versions of Excel dislike long formulae (255, 8192 characters?), but will still import and display them (with the ="..." visible in the field.
            let escapedString = string.replacingOccurrences(of: "\"", with: "\"\"")
            let truncatedString = String(escapedString.prefix(1000))
            return "=\"\(truncatedString)\""
        }
        
        return string
    }
    
    public var excelSafeOneLine: String {
        return self.replacingOccurrences(of: "\n", with: " ")
            .replacingOccurrences(of: "\r", with: " ")
            .replacingOccurrences(of: "\t", with: " ")
            .excelSafe
    }
    
    public var excelNumSafeHashPrefixed: String {
        return "#\(self)"
    }
    
    public var excelNumSafeHashPrefixRemoved: String {
        var string = self
        if string.hasPrefix("#") {
            string.removeFirst()
        }
        return string
    }
}

@aehlke
Copy link

aehlke commented Mar 1, 2024

Thanks for the detailed info! I ended up using a different library but it also has this issue so I appreciate this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants