Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply one-sentence-per-line rule for rst files via github workflow. #4996

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

fujitatomoya
Copy link
Collaborator

@fujitatomoya fujitatomoya commented Feb 5, 2025

follow-up of #4990, that adds
please make sure to write only one sentence per line as it makes reviewing and modifying your file much easier.

this is still good, but without checking it by system, eventually we could miss and extra iteration.
having this as custom rule using python script, possibly support this requirement and maintenance & developer friendly environment.

this can be still draft, there could be other way to explore. (AFAIK, sphinx-lint does not support this kind of rule.) any comments and suggestions are welcome 🤔

@fujitatomoya fujitatomoya force-pushed the fujitatomoya/one-sentence-per-line branch from 96762ad to 621e2a2 Compare February 5, 2025 05:58
Copy link

github-actions bot commented Feb 5, 2025

HTML artifacts: https://github.com/ros2/ros2_documentation/actions/runs/13151376674/artifacts/2538770703.

To view the resulting site:

  1. Click on the above link to download the artifacts archive
  2. Extract it
  3. Open html-artifacts-4996/index.html in your favorite browser

@fujitatomoya fujitatomoya force-pushed the fujitatomoya/one-sentence-per-line branch 2 times, most recently from 1ca12ba to 3edda48 Compare February 5, 2025 06:05
@fujitatomoya fujitatomoya force-pushed the fujitatomoya/one-sentence-per-line branch from 3edda48 to bcf5d90 Compare February 5, 2025 06:13
@fujitatomoya fujitatomoya changed the title one-sentence-per-line github workflow test. apply one-sentence-per-line rule for rst files via github workflow. Feb 5, 2025
@fujitatomoya
Copy link
Collaborator Author

I did some test if this actually fails the github workflow as following test patch.

diff --git a/source/Citations.rst b/source/Citations.rst
index a433d9fc..2f4e3a27 100644
--- a/source/Citations.rst
+++ b/source/Citations.rst
@@ -3,6 +3,13 @@
 Citations
 =========

+This is a test.
+This is a test. This should fail.
+Is this a test?
+Is this a test? Shoud this fail?
+This is a test!
+This is a test! This should fail!
+
 If you use ROS 2 in your work please cite the 2022 Science Robotics paper `Robot Operating System 2: Design, architecture, and uses in the wild <https://www.science.org/doi/10.1126/scirobotics.abm6074>`_.

     | S. Macenski, T. Foote, B. Gerkey, C. Lalancette, W. Woodall, "Robot Operating System 2: Design, architecture, and uses in the wild," Science Robotics vol. 7, May 2022.

and it works as expected.

image

image

.github/workflows/rst_check.yml Show resolved Hide resolved
.github/workflows/rst_check.yml Show resolved Hide resolved
Comment on lines +26 to +27
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]', line)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be more rules applied here? i would like to have some feedback for more use cases.

Copy link
Member

@christophebedard christophebedard Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to find out would be to run it on all files and see.

Copy link

@Santti4go Santti4go Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm one use case I can think of is for example using ellipses ....
As this is right now this f() would treat it as multiple sentences instead of just one.

Another not so strange use case could be using multiple exclamation marks: "This is one sentence!!!"

I suggest slightly changing the regex to allow multiple appearances. Well, at least that would solve the problems above.

Suggested change
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]', line)
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]+', line)

An small example applying above change
Notice that the last 3 sentences do not cause the test to fail, as expected.

image

@fujitatomoya
Copy link
Collaborator Author

@kscottz @christophebedard got some spare time for this, what do you think? would love to hear some opinions 😃

if violations:
print(f"\n⚠️ Found {len(violations)} violations: One sentence per line is required.")
for lineno, line in violations:
print(f" ❌ Line {lineno}: {line}")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, this is not user-friendly output, it would be better to output original file name and line to be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment about annotations: #4996 (comment)

@clalancette
Copy link
Contributor

@kscottz @christophebedard got some spare time for this, what do you think? would love to hear some opinions 😃

I think this is a great idea, for what it is worth. See also #4592, which attempts to do a similar thing.

If we do this, here are a few thoughts:

  1. We need to have a way to run this locally. It looks like this PR accomplishes that, but it is an important use-case.
  2. The output needs to point to exactly which line and exactly which file caused the violation, without too much other output. Otherwise contributors tend to be confused about where the problem is (we have that problem in rosdistro a lot).
  3. We have tons of existing violations of this, so this definitely should only be done on new changes for now. This PR also accomplishes that, so that is great.
  4. We may need a way to have exceptions to the rule. There are some places, particularly in bullet-points, where we may need to have multiple sentences per line.

@christophebedard
Copy link
Member

I think this is good if it doesn't have many false positives.

  1. The output needs to point to exactly which line and exactly which file caused the violation, without too much other output. Otherwise contributors tend to be confused about where the problem is (we have that problem in rosdistro a lot).

I don't want to increase the initial scope of this tool, but this would be a nice use-case for PR file annotations. It doesn't seem that complex to set up: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-an-error-message

@fujitatomoya
Copy link
Collaborator Author

@clalancette @christophebedard #4592 looks good too.

  • plugins/ros_checkers.py of Sentence checker #4592, tries to address in different design that 1st tries to split the line into sentences, this could be much complicated than i am trying to here. i am not even sure which one is better, one thing i am concerned is false alarms.

One way to find out would be to run it on _all) files and see.

yeah, that would be good practice. i will try with all .md and .rst files to see if what is gonna happen. probably we can have 2 cases, the one is to run the checker for all .md and .rst files, the other is to run the checker for only changes part via workflow.

We need to have a way to run this locally. It looks like this PR accomplishes that, but it is an important use-case.

1st i wanted to try with github workflow, but yes definitely this should be done locally. i would consider Makefile target for local check.

The output needs to point to exactly which line and exactly which file caused the violation, without too much other output.

agree. thanks for the comment and info. the thing is that we are doing here is checking the only changed part via git, that means we need to implement this functionality in the python, i guess. i will try to give it a shot.

We have tons of existing violations of this, so this definitely should only be done on new changes for now. This PR also accomplishes that, so that is great.

totally agree. i was considering 2 different perspectives. as maintainer's one, we should probably run the checker for all .md and .rst files once in a while (including updating rules). as contributor's aspect, they only want to focus on what has been changed.

We may need a way to have exceptions to the rule. There are some places, particularly in bullet-points, where we may need to have multiple sentences per line.

related to doing One way to find out would be to run it on _all) files and see., i will see how complicated it would be... 😓 probably.

i also would love to hear #4592 (comment)

@kscottz
Copy link
Collaborator

kscottz commented Feb 5, 2025

Tomoya you've made me the happiest women in the world and I love it. I'll take a look this afternoon.

@DLu
Copy link
Contributor

DLu commented Feb 6, 2025

What is the overhead to add a pip dependency to the ros2_documentation tool chain? I'd much rather write a sentence-parser package with a bunch of tests than just check code in here.

@fujitatomoya
Copy link
Collaborator Author

What is the overhead to add a pip dependency to the ros2_documentation tool chain?

i did not add any pip packages here, just simply add python script to detect that. i think adding pip dependency should be okay to ros2_documentation.

@DLu
Copy link
Contributor

DLu commented Feb 6, 2025

Sorry, to be clear, I was proposing rather than adding a bunch of sentence checking logic here, I want to create a separate sentence checking logic Python project, and upload it to pypi so that it can be depended on here.

@kscottz
Copy link
Collaborator

kscottz commented Feb 6, 2025

Let me preface this by saying I don't want great to be the enemy of good here. If this works, let's roll with it and we can always circle back. Perhaps we can get a rough set of ideas together and then find a summer GSoC student to work this all into an ament tool down the road?

Having said that, I am going to agree with @DLu on this, what if we just push all of this stuff down into Python package so we could (1) use that package as part of an ament lint utility that could deliver the same functionality to package maintainers (2) use it locally, and (3) use it part of CI task?

@mjcarroll can we add this to the ROS 2 meeting for next week? It seems like rolling this and #4997 into ament lint would be a great GSoC summer project. With the wiki being deprecated it seems like improved tooling and some guidance for package maintainers would be really useful right about now.

@mjcarroll
Copy link
Member

Added to agenda. Thanks.

"""
Check only changed lines for violations.

.. warning:: It checks only added / modified lines for the viaolation in the file,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. warning:: It checks only added / modified lines for the viaolation in the file,
::warning:: It checks only added / modified lines for the violation in the file,

Comment on lines +26 to +27
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]', line)
Copy link

@Santti4go Santti4go Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm one use case I can think of is for example using ellipses ....
As this is right now this f() would treat it as multiple sentences instead of just one.

Another not so strange use case could be using multiple exclamation marks: "This is one sentence!!!"

I suggest slightly changing the regex to allow multiple appearances. Well, at least that would solve the problems above.

Suggested change
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]', line)
# this assumes a sentence ends with a period, question mark, or exclamation mark.
sentence_endings = re.findall(r'[.!?]+', line)

An small example applying above change
Notice that the last 3 sentences do not cause the test to fail, as expected.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants