Windows: Change command line interface to UTF-8 #12377

Open: @larskanis wants to merge 2 commits into master
@larskanis (Contributor) commented Dec 17, 2024

The following inputs are changed to consistently work with UTF-8:

  • script name
  • include paths
  • script input from stdin

They are currently an inconsistent mix of locale encoding and UTF-8, depending on the situation.

Given the following script and code page 850, the results change as shown in the `from:` (before this PR) and `to:` (with this PR) comments:

# execute with: ruby -It€st täst-locale-enc.rb
def pr(*strs)
  strs.each do |str|
    p [str, IO===str ? str.external_encoding&.name : str.encoding.name]
  end
end

if $0==__FILE__
  pr STDIN      # from: [#<IO:<STDIN>>, "CP850"]
                # to:   [#<IO:<STDIN>>, "UTF-8"]
  pr $0         # from: ["t\x84st-locale-enc.rb", "CP850"]
                # to:   ["täst-locale-enc.rb", "UTF-8"]
  pr __FILE__   # from: ["t\x84st-locale-enc.rb", "CP850"]
                # to:   ["täst-locale-enc.rb", "UTF-8"]
  pr __dir__    # from: ["C:/Users/Lars/ruby", "CP850"]
                # to:   ["C:/Users/Lars/ruby", "UTF-8"]
  pr 'ä'        # from: ["ä", "UTF-8"]
                # to:   ["ä", "UTF-8"]
  pr '€'        # from: ["€", "UTF-8"]
                # to:   ["€", "UTF-8"]
  pr $:.first   # from: ["C:/Users/Lars/ruby/t\xE2\x82\xACst", "ASCII-8BIT"]
                # to:   ["C:/Users/Lars/ruby/t€st", "UTF-8"]
  pr $:.last    # from: ["c:/Ruby34/lib/ruby/3.4.0+1/aarch64-mingw-ucrt", "CP850"]
                # to:   ["C:/Users/Lars/ruby/lib/ruby/3.4.0+1/aarch64-mingw-ucrt", "UTF-8"]
end
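
To read the `from:` values above, it may help to see how the same names look as raw bytes. The following is only a sketch and not part of the PR; the variable names are made up for illustration. It shows that `ä` is the single byte `0x84` in CP850, that `€` has no CP850 representation at all, and that the `ASCII-8BIT` include path simply carries the UTF-8 bytes of `€`.

```ruby
# Illustrative only: reproduce the byte sequences seen in the "from:" lines.

# "täst" transcoded to CP850: ä becomes the single byte 0x84.
cp850_name = "täst".encode("CP850")
cp850_bytes = cp850_name.bytes            # 0x74, 0x84, 0x73, 0x74

# "t€st" viewed as raw bytes (ASCII-8BIT): € is the UTF-8 sequence E2 82 AC.
binary_path = "t€st".b
binary_bytes = binary_path.bytes          # 0x74, 0xE2, 0x82, 0xAC, 0x73, 0x74

# CP850 has no euro sign, so a locale-encoded command line cannot carry it.
euro_representable =
  begin
    "t€st".encode("CP850")
    true
  rescue Encoding::UndefinedConversionError
    false
  end
```

This is presumably why a UTF-8 command line is the only consistent choice here: a path such as `t€st` cannot round-trip through code page 850.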

When the script is read from STDIN, the changes look like this:

# execute with: cat täst-locale-enc.rb | ruby -It€st
if $0==__FILE__
  pr STDIN      # from: [#<IO:<STDIN>>, "UTF-8"]
                # to:   [#<IO:<STDIN>>, "UTF-8"]
  pr $0         # from: ["-", "CP850"]
                # to:   ["-", "UTF-8"]
  pr __FILE__   # from: ["-", "UTF-8"]
                # to:   ["-", "UTF-8"]
  pr __dir__    # from: [".", "US-ASCII"]
                # to:   [".", "US-ASCII"]
  pr 'ä'        # from: ["\xC3\xA4", "CP850"]
                # to:   ["ä", "UTF-8"]
  pr '€'        # from: ["\xE2\x82\xAC", "CP850"]
                # to:   ["€", "UTF-8"]
  pr $:.first   # from: ["C:/Users/Lars/ruby/t\xE2\x82\xACst", "ASCII-8BIT"]
                # to:   ["C:/Users/Lars/ruby/t€st", "UTF-8"]
  pr $:.last    # from: ["c:/Ruby34/lib/ruby/3.4.0+1/aarch64-mingw-ucrt", "CP850"]
                # to:   ["C:/Users/Lars/ruby/lib/ruby/3.4.0+1/aarch64-mingw-ucrt", "UTF-8"]
end
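
The `from:` lines for `'ä'` and `'€'` in the STDIN case show UTF-8 bytes that are labeled as CP850. As a rough illustration (not code from this PR, and with hypothetical variable names), the sketch below recreates such a mislabeled string and contrasts transcoding it with relabeling it.

```ruby
# Illustrative only: UTF-8 bytes of "ä" (C3 A4) tagged with the wrong encoding,
# as in the line `pr 'ä'  # from: ["\xC3\xA4", "CP850"]`.
mislabeled = "ä".b.force_encoding("CP850")

# Transcoding trusts the wrong label and interprets each byte as a CP850
# character, so the result is two unrelated characters (mojibake).
transcoded = mislabeled.encode("UTF-8")

# Relabeling keeps the bytes and only corrects the tag, which recovers "ä".
relabeled = mislabeled.dup.force_encoding("UTF-8")
```

With this PR the script source from STDIN is tagged as UTF-8 in the first place, so no such relabeling should be needed.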

This PR doesn't change the STDIN encoding when keystrokes are read from the console; they are still in the locale encoding. Implementing UTF-8 there is difficult, and it is a separate topic from the command line interface.
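
A script that wants UTF-8 strings from locale-encoded input can ask Ruby to transcode on read. The sketch below is an assumption about how a user might handle this, not something this PR implements. It uses a pipe as a stand-in for the console and hard-codes CP850 as the locale encoding.

```ruby
# Illustrative only: a pipe delivering CP850 bytes stands in for console input.
reader, writer = IO.pipe
writer.binmode
writer.write("t\x84st\n".b)   # "täst\n" as CP850 bytes
writer.close

# External encoding: what the bytes are. Internal encoding: what the script wants.
reader.set_encoding("CP850", "UTF-8")
line = reader.gets             # transcoded to UTF-8 while reading
reader.close
```

For real console input the same call would be made on `STDIN`, with the external encoding taken from the active code page rather than hard-coded.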

This PR works equally with the Prism and parse.y parsers.

Closes [Bug #20774]
Closes [Bug #20699]
Closes oneclick/rubyinstaller2#265

@larskanis (Contributor, Author) commented:

The test is modified to show that arbitrary Unicode characters now work. However, changing the test is not strictly necessary: it passes both with and without the modification.

@larskanis (Contributor, Author) commented:

It would be nice if this PR could be merged into Ruby 3.4, since it fixes long-standing bugs. I added an item to the compatibility section of NEWS.md.
