Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bump RAS format to 1.3 #247

Closed
wants to merge 2 commits into from
Closed

feat: bump RAS format to 1.3 #247

wants to merge 2 commits into from

Conversation

deltork
Copy link
Collaborator

@deltork deltork commented Oct 10, 2024

PR Goal?

To fix an issue created by the editor

Fixes?

ReadAlongs/Studio-Web#347

Feedback sought?

sanity check

Priority?

High

Tests added?

yes

How to test?

Confidence?

High

Version change?

no

@deltork deltork linked an issue Oct 10, 2024 that may be closed by this pull request
Copy link

semanticdiff-com bot commented Oct 10, 2024

Review changes with SemanticDiff.

Analyzed 6 of 10 files.

Overall, the semantic diff is 47% smaller than the GitHub diff.

Filename Status
✔️ test/test_dtd.py Analyzed
test/data/ras-dtd-1.1.readalong Unsupported file format
test/data/ras-dtd-1.3.readalong Unsupported file format
✔️ readalongs/_version.py 61.26% smaller
✔️ readalongs/align.py Analyzed
✔️ readalongs/web_api.py Analyzed
✔️ readalongs/text/make_package.py 58.01% smaller
✔️ readalongs/text/util.py 11.36% smaller
readalongs/static/read-along-1.3.dtd Unsupported file format
docs/cli-guide.md Unsupported file format

@deltork deltork requested review from joanise and roedoejet October 10, 2024 13:52
Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.12%. Comparing base (eec5662) to head (4cf9841).
Report is 17 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
- Coverage   87.51%   87.12%   -0.40%     
==========================================
  Files          21       21              
  Lines        1786     1786              
  Branches      323      323              
==========================================
- Hits         1563     1556       -7     
- Misses        185      191       +6     
- Partials       38       39       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, but requires a few changes.

For CI to pass on Windows (see https://github.com/ReadAlongs/Studio/actions/runs/11276460698/job/31360285739?pr=247) you also need to update test/data/cs-ref.readalong. I should change that test so it only looks at the line where not outputting utf-8 correctly would make a difference, instead of diffing the whole file...

More comments inline.

readalongs/static/read-along-1.3.dtd Show resolved Hide resolved
dur CDATA #IMPLIED
annotation-id CDATA #IMPLIED
sentence-id CDATA #IMPLIED
xmlns CDATA #IMPLIED>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need xmlns here. I get it for the top-level read-along element, and I would get it if every element needed to allow it, but only one additional element I don't get.

id CDATA #IMPLIED
class CDATA #IMPLIED
do-not-align CDATA #IMPLIED
ARPABET CDATA #IMPLIED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this case insensitive?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same. However, I am worried about backwards compatibility. Attributes in HTML are case insensitive but are case sensitive in XML (.readalong is a subset of XML). We need a good look at the whole pipeline before we make the switch. We use both XML and HTML parsers in the various parts of the pileline. As far as I know, only the CLIs consume the ARPABET attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an option is to put both ARPABET and arpabet is the dtd?

id CDATA #IMPLIED
class CDATA #IMPLIED
do-not-align CDATA #IMPLIED
ARPABET CDATA #IMPLIED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto?

@@ -20,13 +20,17 @@

from lxml import etree

from readalongs._version import VERSION
from readalongs._version import CURRENT_WEB_APP_VERSION, VERSION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea, moving CURRENT_WEB_APP_VERSION to _version.py and reusing it here.

@deltork deltork closed this Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DTD 1.3
2 participants