Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnf5 CLI does not handle emojis (and potentially other unicode symbols?) in %description #1685

Closed
gotmax23 opened this issue Sep 9, 2024 · 15 comments · Fixed by #1713
Closed
Assignees

Comments

@gotmax23
Copy link
Contributor

gotmax23 commented Sep 9, 2024

The uv package's description in Fedora has emojis in it, which dnf5 does not handle correctly. See the difference between dnf5 repoquery and dnf5 info's output:

sudo dnf5 repoquery --qf='%{description}' uv properly renders emojis

dnf5 info does not

$ dnf5 --version
dnf5 version 5.1.17
dnf5 plugin API version 1.0
libdnf5 version 5.1.17
libdnf5 plugin API version 1.0

Loaded dnf5 plugins:
  name: builddep
  version: 1.0.0
  API version: 1.0

  name: changelog
  version: 1.0.0
  API version: 1.0

  name: config-manager
  version: 0.1.0
  API version: 1.0

  name: copr
  version: 0.1.0
  API version: 1.0

  name: needs_restarting
  version: 1.0.0
  API version: 1.0

  name: repoclosure
  version: 1.0.0
  API version: 1.0
@hroncok
Copy link

hroncok commented Sep 10, 2024

@musicinmybrain said in the Fedora Python matrix channel:

I am not deeply committed to the emoji in the description of uv in particular, although all else equal, I default to preserving them since they are a very intentional part of upstream’s writing.
music
I tend to fall on the side of thinking that emoji are part of Unicode support, and (subject to font availability) failure to display them is a bug. Emoji are not the only things that use codepoints outside the BMP or combine several codepoints into one glyph, but they are the most visible to English speakers. So I tend to assume that if emoji don’t work, something in CJK languages doesn’t work either.

@musicinmybrain
Copy link

I just tried this on Fedora 40:

dnf5 info python3-fastapi

and got (excerpted):

               : The key features are:
               : 
               :   \xe2\x80\xa2 Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette
               :     and Pydantic). One of the fastest Python frameworks available.

Now, these are merely Unicode bullets, ! So it looks like dnf5 is treating everything as ASCII and backslash-escaping everything that isn’t ASCII.

The above works fine with dnf 4.21.0 and with rpm -qi.

@musicinmybrain
Copy link

musicinmybrain commented Sep 10, 2024

There are also differences in localization, which kind of worked in rpm, kind of worked differently in dnf 4, and doesn’t work at all in dnf5. You can compare these in Fedora 40:

  • LC_ALL=zh_CN.utf8 rpm -qi python3-fastapi: uses localized summary and description text and build date; does not localize its own fixed strings (field labels)
  • LC_ALL=zh_CN.utf8 dnf info python3-fastapi: localizes all of its own strings (dates, field labels), but does not use the localized summary and description text
  • LC_ALL=zh_CN.utf8 dnf5 info python3-fastapi: ignores locale completely, hex-escapes in the default English description

(Don’t try other locales for now; I just fixed an issue with the other localized descriptions in the python-fastapi spec file, and the updates for this are not yet in stable repositories.)

I guess this should be filed as a separate issue.

@jrohel
Copy link
Contributor

jrohel commented Sep 19, 2024

dnf5 version 5.1.17 does not use locale.
Support was added to the upstream some time ago. I assume the problem was solved by this commit 3d302e0 .
That is, since dnf5 version 5.2.0.

@musicinmybrain
Copy link

Hmm, thanks for that! It does look like this works in Fedora 41. Both the Unicode bullets from python3-fastapi and the emoji from uv are displayed correctly when I try dnf info in a Fedora 41 mock chroot with --enable-network, currently with dnf5 version 5.2.5.0. I had no idea the “preview” version of dnf5 packaged in F39 and F40 was significantly different from the one in F41+.

@musicinmybrain
Copy link

It turns out, however, that setting LC_ALL – even to en_US.UTF-8 – seems to break this.

#1687 (comment)

@jrohel
Copy link
Contributor

jrohel commented Sep 19, 2024

It turns out, however, that setting LC_ALL – even to en_US.UTF-8 – seems to break this.

Is it a dnf problem? Can you please take a look at #1687 (comment)

@musicinmybrain
Copy link

musicinmybrain commented Sep 19, 2024

It turns out, however, that setting LC_ALL – even to en_US.UTF-8 – seems to break this.

Is it a dnf problem? Can you please take a look at #1687 (comment)

Indeed, as described in #1687 (comment), there are two things going on:

  • Setting locale (e.g. through setlocale) fails when the necessary glibc-langpack-## package is not present, so dnf ends up running with a default locale
  • In Fedora 41, with dnf 5.2.6.0, the message printed by dnf5 reports that default locale as C; in Fedora 40, with dnf 5.1.17, the message says C.UTF-8. In at least 5.2.6.0, that messge is hard-coded, and the discrepancy doesn’t reflect a difference in the actual locale.

So I think that at least with respect to this bug, dnf 5.2.x itself as shipped in Fedora 41 may be behaving as one would expect, as when setlocale fails dnf doesn’t necessarily have access to information about whether the expected output encoding is UTF-8, so the backslash-escaping behavior might be considered reasonable.

@musicinmybrain
Copy link

(Deleted a comment that was inaccurate because I did an experiment in a mock chroot that I had forgotten to clean first.)

@jrohel
Copy link
Contributor

jrohel commented Sep 19, 2024

We can try to improve it. If the locale setting fails, then instead of the defaulting to "C" message, we can try setting "C.UTF-8". If this succeeds the message defaulting to "C.UTF-8" will be displayed and only if it fails then defaulting to "C".
I think dnf4 does it this way.

@gotmax23
Copy link
Contributor Author

dnf5 version 5.1.17 does not use locale. Support was added to the upstream some time ago. I assume the problem was solved by this commit 3d302e0 . That is, since dnf5 version 5.2.0.

Can this be backported to the dnf5 version in Fedora 40 as well?

@musicinmybrain
Copy link

We can try to improve it. If the locale setting fails, then instead of the defaulting to "C" message, we can try setting "C.UTF-8". If this succeeds the message defaulting to "C.UTF-8" will be displayed and only if it fails then defaulting to "C". I think dnf4 does it this way.

If there aren’t any pitfalls I’m overlooking, this seems like a good idea and a solid improvement.

@ppisar
Copy link
Contributor

ppisar commented Sep 20, 2024

I don't think that falling to C.UTF-8 is great idea. If a user has a terminal in non-UTF-8 mode, he asks for e.g. ja_JP.EUCJP end DNF5 starts spitting out UTF-8 characters, it wil bodge his terminal. If glibc reverted a default locale from C.UTF-8 back to C, there was probably a good reason for it.

I would rather use nl_langfinfo() to retrieve current locale code set and print that in the error message. Or rather not mentioning "C" locale at all.

@jrohel
Copy link
Contributor

jrohel commented Sep 20, 2024

@ppisar
I think C.UTF-8 was created just for cases where the desired locale is not available. And we assume that a modern system uses UTF-8 as the default. https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

dnf4 is also trying a fallback to C.UTF-8 and so far this has proven to be a good solution.
Yes there is a risk that the user will have a misconfigured system. He will have a locale available that the terminal will not know. The setlocale() will pass and the terminal will break even without this fallback. Or the user will request an uninstalled/unsupported locale on a terminal without UTF-8 support and then it will break because of this fallback.

Or the user will be a root and type rm -rf / and the whole system will crash. I'm just considering where the line is between user-friendliness and protection from shooting yourself in the foot.

However, you could try tweaking the fallback in your PR to set the C++ locale. Maybe based on the nl_langfinfo() you mention.

@jrohel
Copy link
Contributor

jrohel commented Sep 20, 2024

Can this be backported to the dnf5 version in Fedora 40 as well?

I'm not sure. The current plan is to backport anything related to dnf5 to Fedora 40 or lower only if there is a strong reason to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants