Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

byref parameters starting with some non-ASCII identifiers cannot be set. #18164

Open
ijklam opened this issue Dec 19, 2024 · 6 comments
Open
Labels
Bug help wanted Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Milestone

Comments

@ijklam
Copy link
Contributor

ijklam commented Dec 19, 2024

byref parameters whose identifier starting with some non-ASCII (like Chinese characters) cannot be set.

Repro steps

Enter this code in fsi:

let f (x: int byref) = x <- x + 1
let f3 (a: int byref) = a叉 <- a叉 + 1
let f2 (: int byref) =<-+ 1 // fails to compile
let f4 (a: int byref) = 叉a <- 叉a + 1 // fails to compile

Expected behavior

The code can be compiled successfully

Actual behavior

A FS0027 was thrown.

Image

Known workarounds

Don't use non-ASCII identifiers.

Related information

Provide any related information (optional):

  • Windows 11 23H2
  • .NET 9 with F#9
  • Editing Tools: fsi or ionide-vscode
@github-actions github-actions bot added this to the Backlog milestone Dec 19, 2024
@ijklam ijklam changed the title byref parameters with non-ASCII identifiers cannot be set. byref parameters with starting with some non-ASCII identifiers cannot be set. Dec 19, 2024
@ijklam ijklam changed the title byref parameters with starting with some non-ASCII identifiers cannot be set. byref parameters starting with some non-ASCII identifiers cannot be set. Dec 19, 2024
@ijklam
Copy link
Contributor Author

ijklam commented Dec 20, 2024

Update: Even the capital-letter-leading byref parameters are considered as immutable. Is this by designed?

Image

It seems that this is because these functions are firstly be parsed to a lambda with a match rather than a normal lambda.

let f2 (叉: int byref) = 叉 <- 1;;
Image

let f2 (a: int byref) = a <- 1;;
Image

@brianrourkeboll
Copy link
Contributor

That seems pretty unlikely to me to be by design.

@abonie abonie added Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code. help wanted and removed Needs-Triage labels Jan 6, 2025
@brianrourkeboll
Copy link
Contributor

brianrourkeboll commented Jan 25, 2025

The AST is different for these two:

AST

let f (a: int byref) = a叉 <- a叉 + 1

AST

let f (: int byref) =<-+ 1

a叉 is parsed as a SynPat.Named; is parsed as a SynPat.LongIdent.

Image

I believe that comes from here:

fsharp/src/Compiler/pars.fsy

Lines 3763 to 3769 in c4d36d6

| atomicPatternLongIdent %prec prec_atompat_pathop
{ let vis, lidwd = $1
if not (isNilOrSingleton lidwd.LongIdent) || String.isLeadingIdentifierCharacterUpperCase (List.head lidwd.LongIdent).idText then
mkSynPatMaybeVar lidwd vis (lhs parseState)
else
let synIdent = List.head lidwd.IdentsWithTrivia
SynPat.Named(synIdent, false, vis, synIdent.Range) }

and here:

// Scripts that distinguish between upper and lower case (bicameral) DU Discriminators and Active Pattern identifiers are required to start with an upper case character.
// For valid identifiers where the case of the identifier cannot be determined because there is no upper and lower case we will allow DU Discriminators and upper case characters
// to be used. This means that developers using unicameral scripts such as hindi, are not required to prefix these identifiers with an Upper case latin character.
//
let isLeadingIdentifierCharacterUpperCase (s: string) =
let isUpperCaseCharacter c =
// if IsUpper and IsLower return the same value, then we can't tell if it's upper or lower case, so ensure it is a letter
// otherwise it is bicameral, so must be upper case
let isUpper = Char.IsUpper c
if isUpper = Char.IsLower c then
Char.IsLetter c
else
isUpper
s.Length >= 1 && isUpperCaseCharacter s[0]

isLeadingIdentifierCharacterUpperCase returns true for "叉", which causes it not to be parsed as a SynPat.Named.

Edit: see #18164 (comment).

We should really probably be calling String.EnumerateRunes and using Rune.IsUpper...

open System.Text

let s = ""

for rune in s.EnumerateRunes () do
    printfn $"Rune: {rune}. Is upper: %A{Rune.IsUpper rune}." // Rune: 叉. Is upper: false.

Too bad we don't have runes in netstandard2.0...

I guess we probably shouldn't try to add a full port of that to the compiler (e.g., stealing the logic from here, here, here, etc.), but otherwise I'm not sure what a suitable compromise between what we have now (which is just wrong for some inputs) and Rune.IsUpper would look like.

@brianrourkeboll
Copy link
Contributor

brianrourkeboll commented Jan 25, 2025

Edit: see #18164 (comment).

...Actually, maybe something like this would be good enough? (Edit: forgot the unicameral script part. But still, I think it might be workable.)

let isLeadingIdentifierCharacterUpperCase (s: string) =
    let isAscii c = uint c <= 0x007fu
    let isInRangeInclusive value (lo, hi) = (value - lo) <= (hi - lo)

    if s.Length = 0 then false
    elif isAscii s[0] then isInRangeInclusive s[0] ('A', 'Z')
    else Globalization.CharUnicodeInfo.GetUnicodeCategory (s, 0) = Globalization.UnicodeCategory.UppercaseLetter

Compare: Rune.IsUpper, UnicodeUtility.IsInRangeInclusive, UnicodeUtility.GetUnicodeCategoryNonAscii.

@brianrourkeboll
Copy link
Contributor

brianrourkeboll commented Jan 25, 2025

Oh, duh. isLeadingIdentifierCharacterUpperCase is really meant to be something more like "could the leading grapheme in the identifier be treated as though it were upper-case."

That is used to allow unicameral scripts (scripts that have no upper-case) to be used for identifiers in places where identifiers starting with an upper-case grapheme are otherwise required, e.g., type U = A | 叉 | हिन्दी.

In order to be able to use such identifiers in patterns in the same way as identifiers actually starting with upper-case graphemes, they must always be parsed as such.

E.g.,

type U = A || हिन्दी

let f (x : U) =
    match x with
    | A ->|->| हिन्दी ->

requires that those identifiers be parsed the same way in any other pattern context, including as parameters:

let f (A || हिन्दी) = ()

Note that this also fails to compile:

> let f (A : int byref) = A <- A + 1;;

  let f (A : int byref) = A <- A + 1;;
  ------------------------^^^^^^^^^^

stdin(1,25): error FS0027: This value is not mutable. Consider using the mutable keyword, e.g. 'let mutable A = expression'.

So we probably shouldn't change the AST, but we ought to be able to make the type-checking work for this.

This works just fine, for example:

let f () =
    let mutable A = 3
    A <- A + 1

...Playing around some more, there are some additional oddities with the typechecking of set <- expressions.

This fails with an internal error, for example:

let f a =
    let mutable A = a
    A <- A
    // error FS0073: internal error: Undefined or unsolved type variable: 'a

While even this fails with "error FS0971: Undefined value 'a'":

let f () =
    let mutable a = 3
    a <- a
    // error FS0971: Undefined value 'a'

@brianrourkeboll
Copy link
Contributor

So yeah, as @ijklam pointed out in #18164 (comment), basically the arg pat being parsed as a SynPat.LongIdent (just) causes the synthetic lambda body to be translated into a match expression, where the SynPat.LongIdent becomes the SynExpr.LongIdent input of the SynExpr.Match (as well as becoming the pattern of the match's single SynMatchClause).

When that synthetic match expression is typechecked, the SynExpr.LongIdent derived from the argument pattern gets explicitly typechecked as an expression before the SynExpr.LongIdentSet is typechecked, which means that it gets implicitly dereferenced before then as well.

When the SynExpr.LongIdent representing the same symbol is later typechecked inside of the SynExpr.LongIdentSet, its type has already been resolved to be the dereferenced type (e.g., int instead of byref<int>).

I don't see an easy way around this without a more invasive change. But maybe I'm not being creative enough?

(I tried a few things, like emitting a synthetic mutable let-binding instead of a match. That indeed fixes this scenario—but of course it then incorrectly allows any non-lower-case identifier used in a pattern to be treated as mutable 🙃.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug help wanted Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code.
Projects
Status: New
Development

No branches or pull requests

3 participants