Single-character String literal (int type) #11704

Open
Ivorforce opened this issue Feb 5, 2025 · 9 comments

@Ivorforce

Ivorforce commented Feb 5, 2025

Describe the project you are working on

Godot internals.

Describe the problem or limitation you are having in your project

GDScript does not have a convenient way to express 'single character' values.
Ideally, it would have a literal type for them.

Single characters are most useful for their performance benefits, because they do not allocate memory (as opposed to String).

Some existing APIs already use single characters as arguments or return types (int type). It is currently inconvenient to call them (a is unicode 97):

  • string.get_slicec(97, 0) (ref)
  • input.is_key_label_pressed(97) (ref)
  • string.unicode_at(0) == 97 (ref)
  • Adjust functions that should already take single characters but are currently taking string, e.g. lpad: string.lpadc(97)
    • Changing input behavior would be better, but would break compatibility (5.0+)
  • String.chr(97) (ref)
    • Unlikely to be useful in this configuration as you can just use "a", but may be used transitively (through functions).

Describe the feature / enhancement and how it helps to overcome the problem or limitation

There should be a single-character literal for GDScript. Since GDScript uses int to describe characters in existing APIs, it would evaluate to int (UTF-32).

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

I propose the following syntax (analogous to others using String prefixes):

# Type is int
var char := c"a"  # Yields 97
var char := c'a'  # Yields 97

# Edge cases
var char := c"'"  # Valid, yields ' (39)
var char := c'"'  # Valid, yields " (34)
var char := c"\""  # Valid, yields " (34)
# Careful: Some characters are 1 char in UTF-32, but multiple chars in UTF-8 / UTF-16.
var char := c"Ä"  # Valid, yields Ä (196)
# Unlikely to be useful, could use var char := 0x2122 instead
var char := c"\u2122"  # Valid, yields ™ (8482 / 0x2122)

# Erroneous Use
var chars := c"aa" # Error: Expected one character in single-character literal, found string of size 2.
var chars := c"" # Error: Expected one character in single-character literal, found string of size 0.
# Emoji are often encoded using multiple code points (Godot strings are UTF-32, so each code point counts as one character).
var chars := c"👨🏻‍🦱" # Error: Expected one character in single-character literal, found string of size 4.

# Examples (each equal)
string.get_slicec(97, 0)
string.get_slicec(c"a", 0)

input.is_key_label_pressed(Key.KEY_A)
input.is_key_label_pressed(65)
input.is_key_label_pressed(c"A")

string.unicode_at(0) == 97
string.unicode_at(0) == c"a"

# Note: This function doesn't exist (yet?)
string.lpad("a")
string.lpadc(97)
string.lpadc(c"a")

"a"
String.chr(97)
String.chr(c"a")

If this enhancement will not be used often, can it be worked around with a few lines of script?

It is possible to use unicode value literals:

var char := 97  # a

These can be looked up on sites like unicodelookup.com.
It is also possible to use GDScript:

var char := "a".unicode_at(0)

However, this is slow.
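
One way to limit the cost of the second workaround is to look the code point up once and reuse it. The following is a minimal sketch under that assumption; `_char_a` and `count_a` are hypothetical names used only for illustration:

```gdscript
extends Node

# Hypothetical cached code point. The initializer runs once per instance,
# so unicode_at() is not called again inside the loop below.
var _char_a := "a".unicode_at(0)

# Counts how often the cached character occurs in `text`,
# comparing integer code points instead of allocating Strings.
func count_a(text: String) -> int:
	var count := 0
	for i in text.length():
		if text.unicode_at(i) == _char_a:
			count += 1
	return count
```

This keeps the lookup out of hot loops, but it is still more verbose than a literal would be.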

Is there a reason why this should be core and not an add-on in the asset library?

It's GDScript syntax.

@AThousandShips
Member

If it weren't for the fact that we allow single-quoted strings, I'd have pushed for using the C syntax for this, but unfortunately that's not convenient.

I'm not sure about the suggested syntax here, but something achieving this would be helpful.

@Ivorforce
Author

Ivorforce commented Feb 5, 2025

Yeah, I would probably also have suggested 'a' syntax for single characters otherwise. Given we already have a few prefixed string literals, and could reuse their code for this, I judged the c"a" syntax as most appropriate.

@KoBeWi
Member

KoBeWi commented Feb 5, 2025

> It is also possible to use GDScript:
> var char := "a".unicode_at(0)
> However, this is slow.

It could be fast with better constant support.

const char = "a".unicode_at(0)

This would be evaluated only once for the script, but it is currently not allowed, even though it is technically a constant expression.
This works:

const chr = char(10)

but we don't have a utility method that does the opposite.

@Ivorforce
Copy link
Author

> It could be fast with better constant support.

That's true, though I would argue this pattern would still rarely be used, since it's much longer than just "a". Convenience often trumps speed :)

@dalexeev
Member

dalexeev commented Feb 5, 2025

There is no char type in GDScript, so there is no point in adding character literals. However, if you want to implement a parser in GDScript, it's more efficient to use string.unicode_at(i) instead of string[i] and compare the result with an integer code. At the moment, you have to add the constants manually. With this PR, you will be able to use ord("x") in place, since GDScript performs constant folding.
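
The parser pattern with manually added constants might look roughly like the sketch below. The names `CHAR_0`, `CHAR_9`, and `skip_digits` are illustrative and not part of any Godot API:

```gdscript
extends RefCounted

# Manually maintained code-point constants (illustrative names).
const CHAR_0 := 48  # "0"
const CHAR_9 := 57  # "9"

# Returns the index of the first non-digit character at or after `from`,
# or text.length() if only digits remain.
func skip_digits(text: String, from: int) -> int:
	var i := from
	while i < text.length():
		# Integer comparison; no temporary String is created per character.
		var code := text.unicode_at(i)
		if code < CHAR_0 or code > CHAR_9:
			break
		i += 1
	return i
```

For example, `skip_digits("123abc", 0)` returns 3.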

@Ivorforce
Author

> There is no char type in GDScript, so there is no point in adding character literals.

I think this perspective is a little too technical.
Often, programming language design is just as much about psychology as it is about logic: You design the language in a way that patterns you want people to use are convenient, and patterns you don't want people to use are inconvenient. Unicode literals are fast (and often correct), so we want people to design functions like lpad to take char arguments instead of strings.

That being said, ord("a") isn't half bad. I'd still prefer c"a" to encourage the use of char based logic, but ord based code would be better than the status quo.

@dalexeev
Member

dalexeev commented Feb 5, 2025

> Often, programming language design is just as much about psychology as it is about logic: You design the language in a way that patterns you want people to use are convenient, and patterns you don't want people to use are inconvenient.

This is a very fine line between purity and practicality, so I don't have any absolute arguments. You're right, it could work as you described and wouldn't cause any major practical contradictions. However, in my opinion, supporting character literals at the language level while simultaneously lacking a character type has at least conceptual inconsistencies:

  1. Users might feel entitled to ask us to introduce a character type in the future, even if just as a distinct type over int. But in Godot/GDScript, String is not a sequence of characters; string[i] has the type String, not int or char.
  2. I'm not a fan of how c"a" looks very similar to "a", r"a", &"a", ^"a", yet is significantly different from them (there are implicit conversions between String, StringName, and NodePath). A character literal looks very much like a string but is actually an integer. Yes, that's the whole point of character literals in C, C++, and similar languages, but this isn't common in scripting languages. Some users might get confused or make accidental mistakes.
  3. ord("a") seems to me like the best compromise between brevity and clarity. It's only 4 characters longer than c"a", yet it doesn't introduce new functionality at the language level. Additionally, the ord() function has documentation that the user can view using a tooltip.
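
To make the distinction in point 1 concrete, here is a small sketch using only existing String methods; the variable names are placeholders:

```gdscript
extends Node

func _ready() -> void:
	var text := "abc"
	var first: String = text[0]         # Indexing yields a one-character String.
	var code: int = text.unicode_at(0)  # unicode_at() yields an int code point.
	print(first)             # a
	print(code)              # 97
	print(String.chr(code))  # a
```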

@Jesusemora

I was on board with this until I saw your proposed syntax.
ord("a") I think is good enough.

@Ivorforce
Author

@Jesusemora got any better ideas?
