
ispunct differs from the function with the same name in C, could be worth a note #56680

Open
inkydragon opened this issue Nov 26, 2024 · 7 comments
Labels: docs (This change adds or pertains to documentation), unicode (Related to unicode characters and encodings)

Comments

@inkydragon
Member

inkydragon commented Nov 26, 2024

Not sure if this is a bug.
It seems intentional, but Julia's ispunct is inconsistent with the C function of the same name, which is confusing.

Perhaps a warning could be added to clarify the inconsistency with C.

https://en.cppreference.com/w/cpp/string/byte/ispunct

julia> c = '+'
'+': ASCII/Unicode U+002B (category Sm: Symbol, math)

julia> ispunct(c)
false

julia> ( @ccall ispunct(c::Cchar)::Cint ) != 0
true

julia> c = '-'
'-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)

julia> (ispunct(c), ( @ccall ispunct(c::Cchar)::Cint )!= 0)
(true, true)

help?> ispunct
  ispunct(c::AbstractChar) -> Bool

  Tests whether a character belongs to the Unicode general category Punctuation, i.e. a character whose category code
  begins with 'P'.

And more characters (the loop prints those that Julia's ispunct rejects):

julia> for c in raw"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
           ispunct(c) || println(c)
       end
$
+
<
=
>
^
`
|
~
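
For comparison, a minimal Julia sketch of the C-locale behaviour. The helper name c_ispunct is hypothetical (it is not part of Base), and it assumes the default "C" locale, in which ispunct is true for every printable ASCII character that is not a space, letter, or digit. It collects exactly the characters on which the two definitions disagree.

```julia
# Hypothetical helper, NOT part of Base: mimics C's ispunct under the default "C" locale,
# where punctuation means printable ASCII that is not a space, letter, or digit.
c_ispunct(c::AbstractChar) =
    isascii(c) && isprint(c) && !isspace(c) && !isletter(c) && !isdigit(c)

# The 32 characters that the "C" locale classifies as punctuation.
const C_PUNCT = raw"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""

# Characters where Base.ispunct (Unicode category P*) and the C-locale rule disagree.
mismatch = [c for c in C_PUNCT if ispunct(c) != c_ispunct(c)]
println(join(mismatch))   # prints $+<=>^`|~
```

Under that assumption the mismatches are the same nine characters printed by the loop above. Outside ASCII, c_ispunct simply returns false, since the "C" locale classifies no non-ASCII character as punctuation.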
@Seelengrab
Contributor

Seelengrab commented Nov 26, 2024

I don't see a problem with the inconsistency: the Julia function's documentation states explicitly that it tests for Unicode punctuation (consistent with the domain of Char), while the C++ function depends on the current C locale (which, as your reference points out, considers + to be punctuation by default). The two definitions may or may not agree.
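
To make the Unicode side concrete, the sketch below prints each character's general category next to ispunct. It relies on Base.Unicode.category_abbrev, which is an unexported internal helper; we assume it returns the two-letter category abbreviation (the same kind shown in the REPL, e.g. "category Sm"), and it may change between Julia versions.

```julia
# Base.Unicode.category_abbrev is unexported and internal; we assume it returns the
# two-letter Unicode general category of a character as a string, e.g. "Sm".
using Base.Unicode: category_abbrev

for c in ('+', '-', '$', '^', '_', '\u2212')
    code = category_abbrev(c)
    # Base.ispunct is documented to test whether the category code begins with 'P'.
    println(repr(c), "  category ", code, "  ispunct = ", ispunct(c),
            "  starts with P = ", first(code) == 'P')
end
```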

@KristofferC
Member

I don't think there was any claim that this was an issue, just that explicitly pointing out the discrepancy might be a good idea.

@Seelengrab
Contributor

I don't think there was any claim that this was an issue,

Well, GitHub calls this type of ticket an issue, so what else should I call it? 🤷

just that explicitly pointing out the discrepancy might be a good idea.

Yes, and I was agreeing with/reaffirming OP that the two functions aren't even intended to do the same thing. Why should we point out that a similar function in a different programming language has different behavior? Should we also clarify that eval only evaluates julia expressions, and not LISP expressions?

From reading the documentation of both functions, it should already be plainly clear that the two behave differently.

@Seelengrab
Contributor

This came out a bit wrong. What I mean is that, IMO, there shouldn't be any expectation that the two functions behave the same, given the differences already visible in their documentation. So I'm partly questioning what such additional text would add 😅

@gbaraldi
Member

I think that, since it has exactly the same name as the C function, it would be nice to make it clear to the user that they specifically aren't the same.

@Keno
Member

Keno commented Nov 26, 2024

julia> c = '-'
'-': ASCII/Unicode U+002D (category Pd: Punctuation, dash)

Just pointing out that this is technically because that character is not a minus sign but a dash (although we do canonicalize the two for Julia input).
That said, they are different characters:

julia> ispunct('−')
false
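
A small sketch that makes the distinction explicit by printing code points: the ASCII '-' is U+002D HYPHEN-MINUS (category Pd), while '−' is U+2212 MINUS SIGN (category Sm), so only the former counts as punctuation. The variable names are illustrative only.

```julia
# U+002D HYPHEN-MINUS (the ASCII '-') versus U+2212 MINUS SIGN ('−').
hyphen = '-'
minus  = '\u2212'

for c in (hyphen, minus)
    # Print the character, its code point in U+XXXX form, and Base.ispunct's verdict.
    println(repr(c), "  U+", uppercase(string(codepoint(c), base = 16, pad = 4)),
            "  ispunct = ", ispunct(c))
end
```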

@stevengj stevengj added the docs and unicode labels Nov 27, 2024
@stevengj
Member

stevengj commented Nov 27, 2024

I think it would make sense to comment on this explicitly in the ispunct docs.

Should be an easy PR if someone wants to take a stab.

@KristofferC KristofferC changed the title from "ispunct('+') return false" to "ispunct differs from the function with the same name in C, could be worth a note" Nov 27, 2024
Projects
None yet
Development

No branches or pull requests

6 participants