`byref` parameters starting with some non-ASCII identifiers cannot be set. #18164

ijklam · 2024-12-19T09:10:13Z

byref parameters whose identifier starting with some non-ASCII (like Chinese characters) cannot be set.

Repro steps

Enter this code in fsi:

let f (x: int byref) = x <- x + 1
let f3 (a叉: int byref) = a叉 <- a叉 + 1
let f2 (叉: int byref) = 叉 <- 叉 + 1 // fails to compile
let f4 (叉a: int byref) = 叉a <- 叉a + 1 // fails to compile

Expected behavior

The code can be compiled successfully

Actual behavior

A FS0027 was thrown.

Known workarounds

Don't use non-ASCII identifiers.

Related information

Provide any related information (optional):

Windows 11 23H2
.NET 9 with F#9
Editing Tools: fsi or ionide-vscode

The text was updated successfully, but these errors were encountered:

ijklam · 2024-12-20T15:42:59Z

Update: Even the capital-letter-leading byref parameters are considered as immutable. Is this by designed?

It seems that this is because these functions are firstly be parsed to a lambda with a match rather than a normal lambda.

let f2 (叉: int byref) = 叉 <- 1;;

let f2 (a: int byref) = a <- 1;;

brianrourkeboll · 2024-12-20T20:07:32Z

That seems pretty unlikely to me to be by design.

brianrourkeboll · 2025-01-25T21:25:02Z

The AST is different for these two:

AST

let f (a叉: int byref) = a叉 <- a叉 + 1

AST

let f (叉: int byref) = 叉 <- 叉 + 1

a叉 is parsed as a SynPat.Named; 叉 is parsed as a SynPat.LongIdent.

I believe that comes from here:

fsharp/src/Compiler/pars.fsy

Lines 3763 to 3769 in c4d36d6

    
           | atomicPatternLongIdent %prec prec_atompat_pathop 
        
               { let vis, lidwd = $1 
        
                 if not (isNilOrSingleton lidwd.LongIdent) || String.isLeadingIdentifierCharacterUpperCase (List.head lidwd.LongIdent).idText then 
        
                     mkSynPatMaybeVar lidwd vis (lhs parseState) 
        
                 else 
        
                    let synIdent = List.head lidwd.IdentsWithTrivia 
        
                    SynPat.Named(synIdent, false, vis, synIdent.Range) }

and here:

fsharp/src/Compiler/Utilities/illib.fs

Lines 745 to 760 in c4d36d6

    
           // Scripts that distinguish between upper and lower case (bicameral) DU Discriminators and Active Pattern identifiers are required to start with an upper case character. 
        
           // For valid identifiers where the case of the identifier cannot be determined because there is no upper and lower case we will allow DU Discriminators and upper case characters 
        
           // to be used.  This means that developers using unicameral scripts such as hindi, are not required to prefix these identifiers with an Upper case latin character. 
        
           // 
        
           let isLeadingIdentifierCharacterUpperCase (s: string) = 
        
               let isUpperCaseCharacter c = 
        
                   // if IsUpper and IsLower return the same value, then we can't tell if it's upper or lower case, so ensure it is a letter 
        
                   // otherwise it is bicameral, so must be upper case 
        
                   let isUpper = Char.IsUpper c 
        
                   if isUpper = Char.IsLower c then 
        
                       Char.IsLetter c 
        
                   else 
        
                       isUpper 
        
               s.Length >= 1 && isUpperCaseCharacter s[0]

isLeadingIdentifierCharacterUpperCase returns true for "叉", which causes it not to be parsed as a SynPat.Named.

Edit: see #18164 (comment).

We should really probably be calling String.EnumerateRunes and using Rune.IsUpper...

open System.Text

let s = "叉"

for rune in s.EnumerateRunes () do
    printfn $"Rune: {rune}. Is upper: %A{Rune.IsUpper rune}." // Rune: 叉. Is upper: false.

Too bad we don't have runes in netstandard2.0...

I guess we probably shouldn't try to add a full port of that to the compiler (e.g., stealing the logic from here, here, here, etc.), but otherwise I'm not sure what a suitable compromise between what we have now (which is just wrong for some inputs) and Rune.IsUpper would look like.

brianrourkeboll · 2025-01-25T21:45:51Z

Edit: see #18164 (comment).

...Actually, maybe something like this would be good enough? (Edit: forgot the unicameral script part. But still, I think it might be workable.)

let isLeadingIdentifierCharacterUpperCase (s: string) =
    let isAscii c = uint c <= 0x007fu
    let isInRangeInclusive value (lo, hi) = (value - lo) <= (hi - lo)

    if s.Length = 0 then false
    elif isAscii s[0] then isInRangeInclusive s[0] ('A', 'Z')
    else Globalization.CharUnicodeInfo.GetUnicodeCategory (s, 0) = Globalization.UnicodeCategory.UppercaseLetter

Compare: Rune.IsUpper, UnicodeUtility.IsInRangeInclusive, UnicodeUtility.GetUnicodeCategoryNonAscii.

brianrourkeboll · 2025-01-25T22:18:33Z

Oh, duh. isLeadingIdentifierCharacterUpperCase is really meant to be something more like "could the leading grapheme in the identifier be treated as though it were upper-case."

That is used to allow unicameral scripts (scripts that have no upper-case) to be used for identifiers in places where identifiers starting with an upper-case grapheme are otherwise required, e.g., type U = A | 叉 | हिन्दी.

In order to be able to use such identifiers in patterns in the same way as identifiers actually starting with upper-case graphemes, they must always be parsed as such.

E.g.,

type U = A | 叉 | हिन्दी

let f (x : U) =
    match x with
    | A -> …
    | 叉 -> …
    | हिन्दी -> …

requires that those identifiers be parsed the same way in any other pattern context, including as parameters:

let f (A | 叉 | हिन्दी) = ()

Note that this also fails to compile:

> let f (A : int byref) = A <- A + 1;;

  let f (A : int byref) = A <- A + 1;;
  ------------------------^^^^^^^^^^

stdin(1,25): error FS0027: This value is not mutable. Consider using the mutable keyword, e.g. 'let mutable A = expression'.

So we probably shouldn't change the AST, but we ought to be able to make the type-checking work for this.

This works just fine, for example:

let f () =
    let mutable A = 3
    A <- A + 1

...Playing around some more, there are some additional oddities with the typechecking of set <- expressions.

This fails with an internal error, for example:

let f a =
    let mutable A = a
    A <- A
    // error FS0073: internal error: Undefined or unsolved type variable: 'a

While even this fails with "error FS0971: Undefined value 'a'":

let f () =
    let mutable a = 3
    a <- a
    // error FS0971: Undefined value 'a'

brianrourkeboll · 2025-01-26T23:32:16Z

So yeah, as @ijklam pointed out in #18164 (comment), basically the arg pat being parsed as a SynPat.LongIdent (just) causes the synthetic lambda body to be translated into a match expression, where the SynPat.LongIdent becomes the SynExpr.LongIdent input of the SynExpr.Match (as well as becoming the pattern of the match's single SynMatchClause).

When that synthetic match expression is typechecked, the SynExpr.LongIdent derived from the argument pattern gets explicitly typechecked as an expression before the SynExpr.LongIdentSet is typechecked, which means that it gets implicitly dereferenced before then as well.

When the SynExpr.LongIdent representing the same symbol is later typechecked inside of the SynExpr.LongIdentSet, its type has already been resolved to be the dereferenced type (e.g., int instead of byref<int>).

I don't see an easy way around this without a more invasive change. But maybe I'm not being creative enough?

(I tried a few things, like emitting a synthetic mutable let-binding instead of a match. That indeed fixes this scenario—but of course it then incorrectly allows any non-lower-case identifier used in a pattern to be treated as mutable 🙃.)

ijklam added Bug Needs-Triage labels Dec 19, 2024

github-project-automation bot added this to F# Compiler and Tooling Dec 19, 2024

github-project-automation bot moved this to New in F# Compiler and Tooling Dec 19, 2024

github-actions bot added this to the Backlog milestone Dec 19, 2024

ijklam changed the title ~~byref parameters with non-ASCII identifiers cannot be set.~~ byref parameters with starting with some non-ASCII identifiers cannot be set. Dec 19, 2024

ijklam changed the title ~~byref parameters with starting with some non-ASCII identifiers cannot be set.~~ byref parameters starting with some non-ASCII identifiers cannot be set. Dec 19, 2024

abonie added Impact-Low (Internal MS Team use only) Describes an issue with limited impact on existing code. help wanted and removed Needs-Triage labels Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`byref` parameters starting with some non-ASCII identifiers cannot be set. #18164

`byref` parameters starting with some non-ASCII identifiers cannot be set. #18164

ijklam commented Dec 19, 2024 •

edited

Loading

ijklam commented Dec 20, 2024 •

edited

Loading

brianrourkeboll commented Dec 20, 2024

brianrourkeboll commented Jan 25, 2025 •

edited

Loading

brianrourkeboll commented Jan 25, 2025 •

edited

Loading

brianrourkeboll commented Jan 25, 2025 •

edited

Loading

brianrourkeboll commented Jan 26, 2025

byref parameters starting with some non-ASCII identifiers cannot be set. #18164

byref parameters starting with some non-ASCII identifiers cannot be set. #18164

Comments

ijklam commented Dec 19, 2024 • edited Loading

ijklam commented Dec 20, 2024 • edited Loading

brianrourkeboll commented Dec 20, 2024

brianrourkeboll commented Jan 25, 2025 • edited Loading

brianrourkeboll commented Jan 25, 2025 • edited Loading

brianrourkeboll commented Jan 25, 2025 • edited Loading

brianrourkeboll commented Jan 26, 2025

`byref` parameters starting with some non-ASCII identifiers cannot be set. #18164

`byref` parameters starting with some non-ASCII identifiers cannot be set. #18164

ijklam commented Dec 19, 2024 •

edited

Loading

ijklam commented Dec 20, 2024 •

edited

Loading

brianrourkeboll commented Jan 25, 2025 •

edited

Loading

brianrourkeboll commented Jan 25, 2025 •

edited

Loading

brianrourkeboll commented Jan 25, 2025 •

edited

Loading