Avoid platform-specific code in markdown:check-links task #422

Merged: 2 commits, Dec 1, 2023


@per1234 per1234 commented Dec 1, 2023

The markdown:check-links task uses the markdown-link-check tool. This tool does not have a capability for discovering Markdown files, so it is necessary to use the find command to discover the files, then pass their paths to the markdown-link-check tool.

Since it is managed as a project dependency using npm, the markdown-link-check tool is invoked using npx. Since the find command must be run in combination with markdown-link-check, it is necessary to use the --call flag of npx. Even though Windows contributors are required to use a POSIX-compliant shell such as Git Bash when working with the assets, the commands run via the --call flag are executed using the native shell. On a Windows machine that is the Windows command interpreter, even if the task was invoked via a different shell. As a result, commands that are completely valid on a Linux or macOS machine fail to run on a Windows machine due to the significant differences in the Windows command interpreter syntax.

During the original development of the task, a reasonably maintainable cross-platform command could not be found. Lacking a better option, the hacky approach was taken of using a conditional to run a different command depending on whether the task was running on Windows or not, and not using npx for the Windows command. This resulted in a degraded experience for Windows contributors because they were forced to manually manage the markdown-link-check tool dependency and make it available in the system path. It also resulted in duplication of the fairly complex code contained in the task.

Following the elimination of unnecessary complexity in the task code, it became possible to use a single command on all platforms.

The Windows command interpreter syntax still posed a difficulty even for the simplified command. A beneficial practice, used throughout the assets, is to break commands into multiple lines to make them, and the diffs of their development, easier to read. With a POSIX-compliant shell this is accomplished by escaping the introduced newlines with a backslash. However, the Windows command interpreter does not recognize this syntax, making commands formatted in that manner invalid when the task was run on a Windows machine. The identified solution was to define the command via a Taskfile variable. The YAML syntax was carefully chosen to support the use of the familiar backslash escaping syntax, while also producing a string that did not contain this non-portable escaping syntax after passing through the YAML parser.
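To illustrate the POSIX side of this, here is a minimal sketch (not code from this PR) showing that a POSIX shell removes each backslash-newline pair before running the command, so the multi-line and single-line forms produce identical arguments. The variable names are made up for the demonstration, and `printf` only stands in for the real command. The Windows command interpreter uses a different continuation character (`^`), which is why the same text is not portable when handed to it.

```shell
#!/bin/sh
# Multi-line form: each trailing backslash escapes the newline that follows it.
multi_line="$(printf '%s|' \
  find . \
  -name "*.md")"

# Single-line form of the same command.
single_line="$(printf '%s|' find . -name "*.md")"

# Both lines print: find|.|-name|*.md|
printf '%s\n' "$multi_line"
printf '%s\n' "$single_line"
```

Because the shell strips the escapes itself, a string that has already had them removed (as the Taskfile variable approach aims for) is what any other interpreter would need to receive.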

Alternative solution

An alternative approach was taken initially during the work for this PR:

# Source: https://github.com/arduino/tooling-project-assets/blob/main/workflow-templates/assets/check-markdown-task/Taskfile.yml
markdown:check-links:
  desc: Check for broken links
  deps:
    - task: docs:generate
    - task: npm:install-deps
  cmds:
    - |
      # Using -regex instead of -name to avoid Task's behavior of globbing even when quoted on Windows
      # The odd method for escaping . in the regex is required for windows compatibility because mvdan.cc/sh gives
      # \ characters special treatment on Windows in an attempt to support them as path separators.
      find . \
        -type d -name ".git" -prune -o \
        -type d -name ".licenses" -prune -o \
        -type d -name "__pycache__" -prune -o \
        -type d -name "node_modules" -prune -o \
        -regex ".*[.]md" \
        -exec \
          npx \
            markdown-link-check \
              --quiet \
              --config "./.markdown-link-check.json" \
              '{}' \
              +

That code is superior in that it does not require the unintuitive use of a Taskfile variable and the associated confusing YAML quoting rules. The Taskfile variable is not needed in this version of the task because the npx --call flag is not used, which avoids the need for compatibility with the Windows command interpreter's syntax. The --call flag is not needed because here the find command executes the npx command (instead of vice versa, as done in the task from this PR).

Unfortunately there is a subtle problem with this approach: it fails with a "The command line is too long." error on Windows systems if used in a project with a large number of Markdown files.

This is unexpected because, when the find [...] -exec [...] '{}' + syntax is used, find is smart enough to limit the number of paths that replace {} in each executed command so that the system's maximum command line length is not exceeded. Instead, it executes the command multiple times (using as many paths as possible in each execution) until it has iterated through the whole list of paths. In fact, everything works perfectly under these conditions if markdown-link-check is invoked directly. It is only when it is invoked via npx that the error occurs. Perhaps there is a mismatch between the command line length limit recognized by find and the one in effect when npx makes an invocation.
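The batching behavior described above can be observed with a small sketch. This is an illustration only: the scratch directory, the file names, and the `sh -c` stub that stands in for markdown-link-check are all assumptions, and it does not attempt to reproduce the npx failure on Windows.

```shell
#!/bin/sh
# Hypothetical scratch directory holding a handful of Markdown files.
demo_dir="$(mktemp -d)"
i=1
while [ "$i" -le 5 ]; do
  touch "$demo_dir/file$i.md"
  i=$((i + 1))
done

# With '+', find appends as many paths as fit to a single invocation.
# The stub prints one line per invocation, so counting lines counts invocations.
plus_calls="$(find "$demo_dir" -name '*.md' -exec sh -c 'echo "$# paths"' sh '{}' + | wc -l | tr -d ' ')"

# With ';', find runs the command once for every path.
semi_calls="$(find "$demo_dir" -name '*.md' -exec sh -c 'echo "$# paths"' sh '{}' ';' | wc -l | tr -d ' ')"

echo "invocations with '+': $plus_calls"
echo "invocations with ';': $semi_calls"

# On POSIX systems this reports the argument length limit of the platform.
echo "ARG_MAX: $(getconf ARG_MAX)"

rm -rf "$demo_dir"
```

With only five short paths, the `+` form needs a single invocation. find starts additional invocations only when the accumulated paths would exceed the limit it works against.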

Since projects often contain many Markdown files and new files may be added or the paths of the existing files changed
frequently, the best approach for validating Markdown files is to search the project file tree recursively for all
Markdown files, with exclusions configured for the paths of any externally maintained files.

The `markdown:check-links` task uses the markdown-link-check tool. This tool does not have a capability for discovering
Markdown files so it is necessary to use the `find` command to discover the files, then pass their paths to the
markdown-link-check tool.

Previously the discovery code used `find` to generate an array of paths, which was iterated over in a `for` loop, with
each path passed individually to markdown-link-check. The `for` loop is unnecessary because `find` has an `-exec` flag
that can be used to execute commands using the discovered paths. Although the syntax and behavior of this flag are
unintuitive, that disadvantage is outweighed by the significant amount of code the flag replaces. Since the
`-exec`/`-execdir` flags are already in use in the assets and project infrastructure, the maintainer will be forced to
work with them regardless.
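
As a rough sketch of the equivalence (not the code that was replaced), the following compares a loop over the discovered paths with the `-exec` form. The scratch tree, the `check_stub` stand-in for markdown-link-check, and the use of a `while read` loop in place of the original array-based `for` loop are all assumptions made to keep the example in portable POSIX sh. `-name` is used here instead of `-regex` because the Task-specific globbing issue does not apply to a plain shell.

```shell
#!/bin/sh
# Hypothetical scratch tree: two project files and one externally maintained file.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/docs" "$demo_dir/node_modules/pkg"
touch "$demo_dir/README.md" "$demo_dir/docs/guide.md" "$demo_dir/node_modules/pkg/README.md"

# Stand-in for markdown-link-check: prints every path it receives.
check_stub='for path in "$@"; do printf "%s\n" "$path"; done'

# Loop style: print the discovered paths, then invoke the tool once per path.
loop_output="$(
  cd "$demo_dir" &&
    find . -type d -name "node_modules" -prune -o -name "*.md" -print |
    while IFS= read -r path; do
      sh -c "$check_stub" sh "$path"
    done | sort
)"

# -exec style: find passes the discovered paths to the tool itself.
exec_output="$(
  cd "$demo_dir" &&
    find . -type d -name "node_modules" -prune -o -name "*.md" \
      -exec sh -c "$check_stub" sh '{}' + | sort
)"

printf 'loop style:\n%s\n' "$loop_output"
printf 'exec style:\n%s\n' "$exec_output"

rm -rf "$demo_dir"
```

Both forms visit the same two files and skip the pruned `node_modules` folder. The `-exec` form does so without the intermediate list or the loop body.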
@per1234 per1234 added the type: enhancement (Proposed improvement) and topic: code (Related to content of the project itself) labels Dec 1, 2023
@per1234 per1234 self-assigned this Dec 1, 2023

@alessio-perugini alessio-perugini left a comment

Nice! 🚀

@per1234 per1234 merged commit 7530cab into arduino:main Dec 1, 2023
47 checks passed
@per1234 per1234 deleted the unify-markdown-link-check branch December 1, 2023 08:58