This repository has been archived by the owner on May 24, 2022. It is now read-only.

--fixlengths option #12

Open
seamusabshere opened this issue Nov 14, 2019 · 3 comments
Labels
enhancement faraday Requested by Faraday

Comments

@seamusabshere
Member

$ cat a.csv
a,b,c
1,2,3
1

$ scrubcsv a.csv
3 rows (1 bad) in 0.00 seconds, 106.58 KiB/sec
a,b,c
1,2,3
Too many rows (1 of 3) were bad

It would be nice if I could do:

$ scrubcsv --fixlengths a.csv
3 rows (0 bad) in 0.00 seconds, 106.58 KiB/sec
a,b,c
1,2,3
1,,

@seamusabshere
Member Author

Obviously this duplicates xsv fixlengths, but it's inconvenient to install and call two binaries for what is properly a CSV scrubbing task.
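
For illustration, the padding behaviour requested in the example above can be sketched in a few lines of Python. This is not how scrubcsv or xsv implement anything; `fix_lengths` is a hypothetical name, and the choice to pad to the header's width and to leave longer rows untouched is an assumption made only for this sketch.

```python
import csv
import io


def fix_lengths(text: str) -> str:
    """Pad records shorter than the header with empty fields.

    Hypothetical helper for illustration only; not part of scrubcsv.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    width = len(rows[0])  # assumption: the header defines the expected width
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # A negative multiplier yields an empty list, so rows that are
        # already wide enough (or too wide) pass through unchanged.
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


# The a.csv contents from the original report.
print(fix_lengths("a,b,c\n1,2,3\n1\n"), end="")
```

On the sample input this prints the three rows from the requested output, with the last one padded to `1,,`.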

@emk
Contributor

emk commented Jan 12, 2020

I think this is out of scope for scrubcsv, unfortunately. scrubcsv is designed around the idea that incorrect line lengths are generally a sign of buggy CSV writers that can't get quoting right, and scrubcsv's most fundamental job is to throw out those lines.

If you know that your CSV is 100% valid but has variable-length lines, then try xsv fixlengths first. Since this feature would conflict with the most basic feature of scrubcsv, I feel like it belongs in another tool.

@emk
Contributor

emk commented Jan 12, 2020

I suppose we could consider a mode where short lines were OK, and long lines weren't, but there's a good chance you'd need to know exactly how each specific input file was broken.
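
A rough sketch of what such a mode might look like, in Python rather than scrubcsv's own code: short rows are padded and kept, long rows are counted as bad and dropped. The name `scrub_allow_short` and the return shape are hypothetical, and treating the first row as the reference width is an assumption.

```python
import csv
import io


def scrub_allow_short(text):
    """Return (kept_rows, bad_count).

    Hypothetical sketch: rows shorter than the header are padded with
    empty fields; rows longer than the header are counted as bad and dropped.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [], 0
    kept = [header]
    bad = 0
    for row in reader:
        if len(row) > len(header):
            # Too many fields: treated here as a sign of broken quoting.
            bad += 1
            continue
        kept.append(row + [""] * (len(header) - len(row)))
    return kept, bad


kept, bad = scrub_allow_short("a,b,c\n1,2,3\n1\n1,2,3,4\n")
print(f"{len(kept) + bad} rows ({bad} bad)")
```

Note that this cannot tell a legitimately short row from one that lost fields to a quoting bug, which is the concern raised above about needing to know how each input file is broken.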
