Added a feature to remove empty .janno columns with rectify #326

Open: wants to merge 8 commits into master


@nevrome (Member) commented Jan 3, 2025

I started to think about #298 and realized that rectify doesn't touch .janno files at all so far. So I thought I'd first add a simpler .janno-editing feature: removing empty columns. That is also a pretty common task. Maybe one day rectify will become modify 🤔
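The column-removal idea can be sketched in a few lines of Haskell. This is a minimal sketch, not the PR's implementation: it assumes the .janno table has already been parsed into a list of rows (header first, all rows the same length), and it assumes a cell counts as empty when it is blank or holds "n/a". The names `isEmptyCell` and `removeEmptyColumns` are hypothetical.

```haskell
import Data.List (transpose)

-- Hypothetical helper (assumption, not the PR's code): a cell is "empty"
-- if it is blank or holds the missing-value marker "n/a".
isEmptyCell :: String -> Bool
isEmptyCell c = c == "" || c == "n/a"

-- Drop every column whose data cells (all rows after the header) are empty.
-- Input: list of rows, header row first; assumes rectangular input.
removeEmptyColumns :: [[String]] -> [[String]]
removeEmptyColumns rows =
  let columns  = transpose rows                         -- one list per column, header first
      keep col = not (all isEmptyCell (drop 1 col))    -- inspect only the data cells
  in transpose (filter keep columns)                   -- back to row layout
```

Transposing turns columns into plain lists that can be filtered with ordinary list functions, and transposing back restores the row layout. With this definition a header-only table loses all of its columns, because `all` is trivially true for a column without data cells.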

@nevrome requested a review from stschiff on January 3, 2025, 10:28

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 77.77778% with 8 lines in your changes missing coverage. Please review.

Project coverage is 60.51%. Comparing base (d918c11) to head (bfdcdbc).

Files with missing lines                          Patch %   Lines
src/Poseidon/CLI/Rectify.hs                       60.00%    3 Missing and 1 partial ⚠️
src/Poseidon/CLI/OptparseApplicativeParsers.hs     0.00%    3 Missing ⚠️
src/Poseidon/Janno.hs                             95.65%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #326      +/-   ##
==========================================
+ Coverage   60.43%   60.51%   +0.08%     
==========================================
  Files          29       29              
  Lines        4218     4242      +24     
  Branches      489      490       +1     
==========================================
+ Hits         2549     2567      +18     
- Misses       1180     1185       +5     
- Partials      489      490       +1     


@stschiff (Member) left a comment

Very nice. I haven't tested it, but the unit tests look convincing. I left a few comments about the empty-string to "n/a" feature.

replaceInJannoBytestring :: Bch.ByteString -> Bch.ByteString -> Bch.ByteString -> Bch.ByteString
replaceInJannoBytestring from to tsv =
    let tsvRows = Bch.lines tsv
        tsvCells = map (Bch.splitWith (=='\t')) tsvRows
Review comment (Member):

This will be problematic if a tab is hidden inside quotes, like "a\tweird\tbut\tlegal\tfield-value". Maybe that's OK. It's a bit tragic that we have all this fancy machinery to parse TSV and don't use it here. I understand why (this is so much simpler), but if we wanted to be semantically 100% correct, it would have to be more complicated. Not sure.
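To make the quoting problem concrete, here is a small sketch contrasting a naive tab split (the same idea as `Bch.splitWith (=='\t')`, but on `String`) with a simplified quote-aware split. Both functions are hypothetical illustrations: the quote-aware version only toggles an "inside quotes" flag on `"`, keeps the quote characters in the cell, and ignores escaped quotes, so it is not a full TSV parser. The project's existing TSV parsing would be the proper tool.

```haskell
-- Naive split: every tab is treated as a separator, even inside quotes.
splitNaive :: String -> [String]
splitNaive s = case break (== '\t') s of
  (cell, [])       -> [cell]
  (cell, _ : rest) -> cell : splitNaive rest

-- Simplified quote-aware split (sketch): tabs between double quotes stay
-- part of the cell. Quote characters are kept; escaped quotes ("") are
-- not handled specially.
splitQuoteAware :: String -> [String]
splitQuoteAware = go False ""
  where
    -- go insideQuotes currentCellReversed remainingInput
    go _ acc [] = [reverse acc]
    go inQ acc (c : cs)
      | c == '"'             = go (not inQ) (c : acc) cs
      | c == '\t' && not inQ = reverse acc : go inQ "" cs
      | otherwise            = go inQ (c : acc) cs
```

For the line `x<TAB>"a<TAB>weird<TAB>but<TAB>legal<TAB>field-value"<TAB>y`, the naive split yields seven cells, while the quote-aware split yields the three intended ones.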

Is this shortcut actually needed? I think the only caller of this function is explicitNA, so I suppose we could get rid of these two functions and simply augment our various janno-writing functions to make sure empty strings are always written as n/a? Then it would just be a matter of parsing and writing a Janno.
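The writer-side alternative suggested here could look roughly like the sketch below: if every janno-writing path sends its cell values through one encoding function, empty strings never reach the file and no byte-level search-and-replace is needed afterwards. `encodeCell` and `renderJannoRow` are hypothetical names, and the sketch assumes field values have already been serialised to `String`.

```haskell
import Data.List (intercalate)

-- Hypothetical writer-side encoder: an empty value is always written
-- as the missing-value marker "n/a"; everything else passes through.
encodeCell :: String -> String
encodeCell "" = "n/a"
encodeCell s  = s

-- Render one row of serialised field values as a tab-separated line.
renderJannoRow :: [String] -> String
renderJannoRow = intercalate "\t" . map encodeCell
```

Because the substitution happens per cell rather than on raw bytes, it is unaffected by quoted tabs or line-ending differences in the input.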

replaceInJannoBytestring from to tsv =
    let tsvRows = Bch.lines tsv
        tsvCells = map (Bch.splitWith (=='\t')) tsvRows
        tsvCellsUpdated = map (map (\y -> if y == from || y == Bch.append from "\r" then to else y)) tsvCells
Review comment (Member):

What's going on with "\r"? Why would that happen?
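One plausible explanation (an assumption, the thread itself does not say): `Bch.lines` splits on `'\n'` only, so a .janno file with Windows (CRLF) line endings leaves a trailing `'\r'` on every line, and after the tab split it ends up in the last cell of each row. An empty last cell then reads `"\r"` rather than `""`. The sketch below shows this and strips the `'\r'` once per line instead; `cellsNaive`, `stripCR`, and `cellsNormalised` are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as Bch

-- Splits like the excerpt: lines on '\n', cells on '\t'. With CRLF input
-- the '\r' survives in the last cell of every row.
cellsNaive :: Bch.ByteString -> [[Bch.ByteString]]
cellsNaive = map (Bch.split '\t') . Bch.lines

-- Remove a single trailing '\r' from a line, if present.
stripCR :: Bch.ByteString -> Bch.ByteString
stripCR l
  | not (Bch.null l) && Bch.last l == '\r' = Bch.init l
  | otherwise                              = l

-- Normalise line endings before splitting, so cells can be compared
-- against `from` directly instead of also against `from <> "\r"`.
cellsNormalised :: Bch.ByteString -> [[Bch.ByteString]]
cellsNormalised = map (Bch.split '\t' . stripCR) . Bch.lines
```

For the input `"a\tb\r\nc\t\r\n"`, `cellsNaive` gives `[["a","b\r"],["c","\r"]]`, whereas `cellsNormalised` gives `[["a","b"],["c",""]]`.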

Left _ -> error "internal error, please report"
Right x -> do
    let janno = V.toList $ V.map V.toList x
        jannoTransposed = transpose janno
Review comment (Member):

Clever! I didn't know Cassava could simply parse into a vector of vectors. That's great. In that case, I think we could improve replaceInJannoBytestring above by using the same parsing code to catch empty strings. Or build it in here?


2 participants