Added a feature to remove empty .janno columns with rectify #326
base: master
…noFileWithoutEmptyCols
…ack out of the janno module
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #326      +/-   ##
==========================================
+ Coverage   60.43%   60.51%   +0.08%
==========================================
  Files          29       29
  Lines        4218     4242      +24
  Branches      489      490       +1
==========================================
+ Hits         2549     2567      +18
- Misses       1180     1185       +5
- Partials      489      490       +1
Very nice. I haven't tested it, but the unit tests look convincing. I left a few comments about the empty-string to "n/a" feature.
replaceInJannoBytestring :: Bch.ByteString -> Bch.ByteString -> Bch.ByteString -> Bch.ByteString
replaceInJannoBytestring from to tsv =
    let tsvRows = Bch.lines tsv
        tsvCells = map (Bch.splitWith (=='\t')) tsvRows
This will be problematic if a tab is hidden inside quotes, like "a\tweird\tbut\tlegal\tfield-value". Maybe that's OK. It's a bit tragic that we have all this fancy machinery to parse TSV and don't use it here. I understand why (this is so much simpler), but if we wanted to be semantically 100% correct it would have to be more complicated. Not sure.
Is this shortcut actually needed? I think the only client that uses this function is explicitNA, so I suppose we could get rid of these two functions and simply augment our various janno-writing functions to make sure empty strings are always output as n/a? It would then just be a matter of parsing and writing a Janno.
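To make the quoting concern concrete, here is a minimal sketch that uses only the bytestring package. The names naiveCells and quotedRow are made up for illustration and are not part of the PR.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as Bch

-- Naive cell splitting as in the diff: every tab counts as a delimiter,
-- regardless of whether it sits inside a quoted field.
naiveCells :: Bch.ByteString -> [Bch.ByteString]
naiveCells = Bch.splitWith (== '\t')

-- A hypothetical row with three logical fields; the middle field is quoted
-- and contains a tab.
quotedRow :: Bch.ByteString
quotedRow = "a\t\"b\tc\"\td"

-- naiveCells quotedRow yields four cells: the quoted field is cut in two.
```

A quote-aware TSV parser would treat the quoted part as one field, so the naive split and the parsed view of the same file can disagree on the number of columns.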
replaceInJannoBytestring from to tsv =
    let tsvRows = Bch.lines tsv
        tsvCells = map (Bch.splitWith (=='\t')) tsvRows
        tsvCellsUpdated = map (map (\y -> if y == from || y == Bch.append from "\r" then to else y)) tsvCells
What's going on with "\r"? Why would that happen?
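One plausible source of the "\r" (an assumption, the thread does not confirm it) is a file with Windows-style CRLF line endings: Bch.lines splits on '\n' only, so the last cell of every row keeps a trailing '\r'. The sketch below shows the effect and one way to strip it; crlfTsv, cellsOf and stripCR are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as Bch

-- A hypothetical two-row TSV with CRLF line endings and an empty last cell.
crlfTsv :: Bch.ByteString
crlfTsv = "ID\tNote\r\nA\t\r\n"

-- Split into rows and cells the same way the diff does.
cellsOf :: Bch.ByteString -> [[Bch.ByteString]]
cellsOf = map (Bch.splitWith (== '\t')) . Bch.lines

-- Drop a single trailing carriage return, if there is one.
stripCR :: Bch.ByteString -> Bch.ByteString
stripCR s = case Bch.unsnoc s of
  Just (rest, '\r') -> rest
  _                 -> s

-- Same as cellsOf, but with line endings normalised first.
cellsOfNormalised :: Bch.ByteString -> [[Bch.ByteString]]
cellsOfNormalised = map (Bch.splitWith (== '\t') . stripCR) . Bch.lines
```

With cellsOf the empty last cell shows up as "\r" rather than "", which is what the extra comparison in the diff appears to compensate for. Normalising each row first would make that special case unnecessary.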
    Left _ -> error "internal error, please report"
    Right x -> do
        let janno = V.toList $ V.map V.toList x
            jannoTransposed = transpose janno
Clever! I didn't know Cassava can simply parse into a vector of vectors. That's great. I think in that case we could just improve replaceInJannoBytestring above using that same parsing code, to catch empty strings? Or build it into here.
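A rough sketch of what that could look like, assuming the cassava and vector packages and a lazy ByteString as input. The function name replaceInParsedTsv and the two option values are invented for this example; the actual code in the PR may be wired differently.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8      as Bch
import qualified Data.ByteString.Lazy.Char8 as BLch
import           Data.Char                  (ord)
import qualified Data.Csv                   as Csv
import qualified Data.Vector                as V

-- Cassava options for tab-separated input and output.
tabDecode :: Csv.DecodeOptions
tabDecode = Csv.defaultDecodeOptions { Csv.decDelimiter = fromIntegral (ord '\t') }

tabEncode :: Csv.EncodeOptions
tabEncode = Csv.defaultEncodeOptions { Csv.encDelimiter = fromIntegral (ord '\t') }

-- Parse the whole file into a vector of vectors of cells, replace every cell
-- that equals `from` with `to`, and encode the result again.
-- Quoted fields containing tabs stay intact because cassava does the splitting.
replaceInParsedTsv
  :: Bch.ByteString -> Bch.ByteString -> BLch.ByteString -> Either String BLch.ByteString
replaceInParsedTsv from to tsv = do
  rows <- Csv.decodeWith tabDecode Csv.NoHeader tsv
            :: Either String (V.Vector (V.Vector Bch.ByteString))
  let updated = V.map (V.map (\cell -> if cell == from then to else cell)) rows
  pure (Csv.encodeWith tabEncode (V.toList updated))
```

For instance, replaceInParsedTsv "" "n/a" applied to a file would turn every empty cell into n/a. One thing to keep in mind: as far as I know, cassava's default encode options write CRLF line endings and quote only where necessary, so the output bytes may differ from the input in more than the replaced cells.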
I started to think about #298 and realized rectify doesn't touch .janno files at all so far. So I thought I'd first add a simpler .janno-editing feature: removing empty columns. That is also a pretty common task. Maybe one day rectify will rather be modify 🤔
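The column-removal idea can be sketched with plain lists and transpose, in the spirit of the jannoTransposed snippet above. This is an illustration under assumptions, not the PR's implementation: the first row is taken to be the header, the table is assumed to be rectangular, and a cell counts as empty if it is "" or "n/a". The names isEmptyCell and dropEmptyColumns are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as Bch
import           Data.List             (transpose)

-- Assumption: both the empty string and "n/a" mark a missing value.
isEmptyCell :: Bch.ByteString -> Bool
isEmptyCell c = c == "" || c == "n/a"

-- Transpose rows into columns, keep only columns with at least one
-- non-empty value below the header, and transpose back.
-- If every column is dropped, the result is the empty list.
dropEmptyColumns :: [[Bch.ByteString]] -> [[Bch.ByteString]]
dropEmptyColumns rows = transpose (filter hasValue (transpose rows))
  where
    hasValue []         = False
    hasValue (_ : vals) = not (all isEmptyCell vals)
```

For a table with header A, B, C where column B holds only "n/a" and "", dropEmptyColumns keeps A and C and leaves the remaining cells untouched.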