Avoid platform-specific code in `markdown:check-links` task #422

per1234 · 2023-12-01T04:31:34Z

The markdown:check-links task uses the markdown-link-check tool. This tool has no capability for discovering Markdown files, so it is necessary to use the find command to discover the files, then pass their paths to markdown-link-check.

Since it is managed as a project dependency using npm, the markdown-link-check tool is invoked using npx. Since the find command must be run in combination with markdown-link-check, it is necessary to use the --call flag of npx. Windows contributors are required to use a POSIX-compliant shell such as Git Bash when working with the assets. Even so, the commands run via the --call flag are executed using the native shell, which on a Windows machine is the Windows command interpreter, regardless of the shell the task was invoked from. As a result, commands that are completely valid on a Linux or macOS machine fail to run on a Windows machine due to the significant differences in the Windows command interpreter's syntax.

During the original development of the task, a reasonably maintainable cross-platform command could not be found. Lacking a better option, the hacky approach was taken of using a conditional to run a different command depending on whether the task was running on Windows or not, and not using npx for the Windows command. This resulted in a degraded experience for Windows contributors because they were forced to manually manage the markdown-link-check tool dependency and make it available in the system path. It also resulted in duplication of the fairly complex code contained in the task.

Following the elimination of unnecessary complexity in the task code, it became possible to use a single command on all platforms.

The Windows command interpreter syntax still posed a difficulty even for the simplified command. A beneficial practice, used throughout the assets, is to break commands into multiple lines to make them, and the diffs of their development, easier to read. With a POSIX-compliant shell this is accomplished by escaping the introduced newlines with a backslash. However, the Windows command interpreter does not recognize this syntax, so commands formatted in that manner were invalid when the task was run on a Windows machine. The identified solution was to define the command via a Taskfile variable. The YAML syntax was carefully chosen to support the use of the familiar backslash escaping syntax, while also producing a string that does not contain this non-portable escaping syntax after passing through the YAML parser.
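The POSIX side of this can be sketched as follows. This is a minimal illustration rather than the task's actual command: `count_args` is a hypothetical stand-in for the real tool, and the flags are borrowed only to have something to count. It shows that a POSIX shell discards each backslash-newline pair before running the command, so the multi-line layout and the single-line form are equivalent.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command: prints how many arguments it received.
count_args() { echo "$#"; }

# Multi-line layout: each backslash-newline pair is removed by the shell
# before the command runs, so it never reaches the command as an argument.
multi_line="$(
  count_args \
    --quiet \
    --config "./.markdown-link-check.json"
)"

# The same command written on a single line.
single_line="$(count_args --quiet --config "./.markdown-link-check.json")"

echo "multi-line: $multi_line arguments, single-line: $single_line arguments"
```

Both forms report three arguments. A shell that does not treat backslash-newline as a continuation would not see them as the same command, which is the portability problem described above.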

Alternative solution

An alternative approach was taken initially during the work for this PR:

```yaml
# Source: https://github.com/arduino/tooling-project-assets/blob/main/workflow-templates/assets/check-markdown-task/Taskfile.yml
markdown:check-links:
  desc: Check for broken links
  deps:
    - task: docs:generate
    - task: npm:install-deps
  cmds:
    - |
      # Using -regex instead of -name to avoid Task's behavior of globbing even when quoted on Windows
      # The odd method for escaping . in the regex is required for Windows compatibility because mvdan.cc/sh gives
      # \ characters special treatment on Windows in an attempt to support them as path separators.
      find . \
        -type d -name ".git" -prune -o \
        -type d -name ".licenses" -prune -o \
        -type d -name "__pycache__" -prune -o \
        -type d -name "node_modules" -prune -o \
        -regex ".*[.]md" \
        -exec \
          npx \
            markdown-link-check \
              --quiet \
              --config "./.markdown-link-check.json" \
              '{}' \
              +
```

That code is superior in that it does not require the unintuitive use of a Taskfile variable and the associated confusing YAML quoting rules. The Taskfile variable is not needed in this version of the task because the npx --call flag is not used, which avoids the need for compatibility with the Windows command interpreter's syntax. The --call flag is not needed because here the find command executes the npx command, instead of the other way around as done in the task from this PR.

Unfortunately there is a subtle problem with this approach: it fails with a "The command line is too long." error on Windows systems if used in a project with a large number of Markdown files.

This is unexpected. When the find [...] -exec [...] '{}' + syntax is used, find limits the number of paths substituted for the {} so that the executed command does not exceed the system's maximum command line length. It runs the command as many times as needed, using as many paths as possible in each execution, until it has iterated through the whole list of paths. In fact, everything works perfectly under these conditions if markdown-link-check is invoked directly. The error occurs only when it is invoked via npx. Perhaps there is a mismatch between the command line length limit recognized by find and the one in effect when npx makes an invocation.
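The batching behavior of `-exec [...] '{}' +` can be observed with a small sketch. The temporary tree and the `sh -c 'echo "$#"'` stand-in (used here in place of markdown-link-check) are assumptions for illustration only. With so few files, a single batched invocation is enough; find would split the list into several invocations only when the paths no longer fit within the limit it recognizes.

```shell
#!/bin/sh
# Hypothetical project tree with three Markdown files.
work_dir="$(mktemp -d)"
: > "$work_dir/a.md"
: > "$work_dir/b.md"
: > "$work_dir/c.md"

# '{}' + : paths are batched, so the stand-in command runs once and
# reports how many paths it received in that single invocation.
batched="$(find "$work_dir" -type f -name '*.md' -exec sh -c 'echo "$#"' sh '{}' +)"

# '{}' \; : the stand-in command runs once per path; count the runs.
per_file_runs="$(
  find "$work_dir" -type f -name '*.md' -exec sh -c 'echo "$#"' sh '{}' \; |
    wc -l |
    tr -d '[:space:]'
)"

echo "batched invocation received $batched paths"
echo "per-file mode ran the command $per_file_runs times"

rm -rf "$work_dir"
```

The sketch only shows how find groups paths. It does not reproduce the Windows "The command line is too long." error, which appears only when npx sits between find and the tool.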

Since projects often contain many Markdown files, and new files may be added or the paths of existing files changed frequently, the best approach for validating Markdown files is to search the project file tree recursively for all Markdown files, with exclusions configured for the paths of any externally maintained files. The markdown-link-check tool used by the `markdown:check-links` task cannot discover Markdown files itself, so the `find` command is used to discover the files and pass their paths to it. Previously the discovery code used `find` to generate an array of paths, which was then iterated over in a `for` loop, with each path passed individually to markdown-link-check. The `for` loop is unnecessary because `find` has an `-exec` flag that can be used to execute commands using the discovered paths. Although the syntax and behavior of this flag are unintuitive, these disadvantages are outweighed by the significant amount of code the flag replaces. Since the `-exec`/`-execdir` flags are already in use in the assets and project infrastructure, the maintainer will be forced to work with them regardless.
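A minimal sketch of recursive discovery with an exclusion, assuming a hypothetical tree and using `printf` as a stand-in for markdown-link-check. It runs directly in a shell rather than through Task, so it uses `-name '*.md'` instead of the `-regex` form needed in the Taskfile.

```shell
#!/bin/sh
# Hypothetical project tree: two project files and one externally maintained file.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/docs" "$work_dir/node_modules/pkg"
: > "$work_dir/README.md"
: > "$work_dir/docs/guide.md"
: > "$work_dir/node_modules/pkg/README.md"

# -prune stops find from descending into node_modules; every other path
# falls through to the -name test, and matches are handed to the stand-in
# command in one batch. No array or for loop is needed.
found="$(
  cd "$work_dir" &&
    find . \
      -type d -name "node_modules" -prune -o \
      -type f -name '*.md' \
      -exec printf '%s\n' '{}' +
)"

checked_count="$(printf '%s\n' "$found" | grep -c .)"

# Record whether the exclusion held.
case "$found" in
  *node_modules*) pruned=no ;;
  *) pruned=yes ;;
esac

echo "checked $checked_count files, node_modules excluded: $pruned"

rm -rf "$work_dir"
```

Only the two project files are passed to the command. The file under `node_modules` is never visited because its parent directory is pruned.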


alessio-perugini

Nice! 🚀

per1234 added 2 commits November 30, 2023 20:20

per1234 added the type: enhancement (Proposed improvement) and topic: code (Related to content of the project itself) labels Dec 1, 2023

per1234 requested review from alessio-perugini and MatteoPologruto December 1, 2023 04:31

per1234 self-assigned this Dec 1, 2023

alessio-perugini approved these changes Dec 1, 2023


per1234 merged commit 7530cab into arduino:main Dec 1, 2023

per1234 deleted the unify-markdown-link-check branch December 1, 2023 08:58
