-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove case insensitivity by changing the approach towards style insensitivity #547
Comments
As for me better to add (again) style skins (but more generalized): # possible syntax
import strutils
proc processIdent(s: string): string =
var afterWhitespace = false
for i in 0..<s.len - 1:
if s[i] == '_' and s[i + 1] != '_': result.add s[i + 1].toUpperAscii
elif not afterWhitespace: result.add s[i]
afterWhitespace = s[i] == '_' and s[i + 1] != '_'
if not afterWhitespace: result.add s[^1]
proc sameIdent(a, b: string): bool =
processIdent(a) == processIdent(b)
import std/syntaxskins # it contains also default skins
const mySkin = initSyntaxSkin(sameIdent = sameIdent)
{.syntaxSkin: mySkin.} It also solves problem with C wrappers (you can just disable case insensitivity for wrapper module and use case sensitivity wrapper from other module in case sensitivity manner) about interaction between modules: # gl.nim
import std/syntaxskins
{.syntaxSkin: caseSensive.} # or wrapperDefault, etc...
const GL_FLOAT = 0x1406
type GLfloat* = float32
# code.nim
import gl
# we have default syntax skin
let x = GL_FLOAT # external idents used with it's sameIdent logic
proc test(x: GLfloat) = discard
# but non external identifiers have normal logic
import snake_case_module
var x = weirdName # in snake_case_module it is weird_name |
Style skins seems like an interesting idea, but a bit irrelevant here as this RFC tries to address the issue of having multiple identifiers with different compound words that are treated as the same identifier, because the algorithm that converts them to an IR can't handle such cases. If your snake_case_module contains both weird_name and weirdname, Nim will still treat it as the same. Sure, you could just add {.syntaxSkin: caseSensive.} into snake_case_module, but I think it would be better to make Nim behave properly by default. |
Isn't the suggestion to remove case insensitivity? That is definitely breaking to code which uses different casings. |
Perhaps you still need to add a rule: It solves @metagn cases |
It is my understanding that it about changing the register insensitivity rules, not removing it. |
Sorry, I was wrong about |
newHTMLParser and newHtmlParser are the same |
With this proposal, they no longer become the same. |
this rule lets it be the same |
BTW, now I think that sameIdent shouldn't be in syntaxSkin, it should be another pragma like |
I think it would be better to convert every uppercase letter in a continuous sequence of uppercase letters to lowercase, except the first and the last ones, since a single identifier can have many words and (for me, at least) its not clear if compiler should alternate between upper and lower cases or make everything between first and last uppercase letter uppercase and what effects that may cause. With this logic newHTMLParser becomes newHtmlParser, GLFloat -> GlFloat, Gl_float, etc. What do you think? |
Maybe it's better, but I don't see much difference. getISLandFrame -> getIsLandFrame
get_is_land_frame -> getIsLandFrame # match but shouldn't
# must be valid only:
get_isl_and_frame -> getIslAndFrame # no match but should NOTE: for uppercase variant result is same # l - lowercase letter
# u - uppercase letter
# (l or u)* - zero and more matches
# (l or u)+ - one and more matches
# string with pattern:
l* u+ l+ u+ l* # in any case it will be the same Idealy, and must be camel case (And), but we can't detect it For getISLAndFrame it working as expected |
I thought about making Nim case sensitive but allowing for a |
I'm not sure I understand your examples correctly. get_is_land_frame and getIsLandFrame match and should match (both in current state of Nim and with proposed changes from this RFC). I don't see why get_isl_and_frame and getIslAndFrame woudn't match either (after both RFC changes and lowercasing letters between 2 capitals).
Get getISLAndFrame is problematic if you want to convert letters between 2 capitals to uppercase, because both it and getIslandFrame will be converted to getISLANDFrame. |
all ident matching and case insensitivity works via getIdent proc, if we add rules object for ident, we can
getISLandFrame this is a crazy example: getISLAndFrame will not be converted to getISLANDFrame it will be converted to getIslAndFrame |
This discussion flared up again because some people want to do this. type Foo = object
const FOO = 123 Before this, tons of people complained about not being able to use underscores on their own, like For any kind of distinction that style insensitivity in Nim removes, there are people that use that distinction in other languages, to the point that it supposedly stops them from using Nim. This has not changed since 8 years ago when I found out about Nim, and basically nothing has been done about it (I think the But it's still a feature that was promised to be part of Nim, hence 100+ people voted to keep it in #456. Removing it wouldn't be a development of Nim, it would create a new language called "Nim with complete identifier sensitivity". Many people would prefer or feel better about the original Nim over this new language. On the other end, it'll only be one thing potential new users/companies won't care about anymore while they find 20 other reasons not to use Nim. It's not a great compromise. I understand the desire here to find middle ground, but changing style insensitivity means people who are used to it will have to learn the more complex new rules, and only some of the people who are against the current rules will be accommodated while the rest will be equally placated and driven away due to the extra complexity. Instead, we can look at what exactly it is that the opponents of style insensitivity want that isn't possible in current Nim, and see if any of it is incompatible with what the proponents want that current Nim allows. If I had to say, it would be like: Opponents:
Proponents:
Notice the lack of overlap: the opponents don't care if we access the symbols with a different style than the definition as they're used to it just erroring, the proponents (probably) don't care if we define different symbols that would be considered the same under style insensitivity as, again, this would error in current Nim. Consider this design (ignore the complexity for the Nim compiler for now): Allow defining style sensitive different symbols. let foo = 1
let fO_o = 2 Allow referring to them with their exact definition style. echo foo # 1
echo fO_o # 2 If we refer to them with a different style, error only if more than 1 symbol matches the current style, otherwise allow the reference. echo fOo # error: ambiguous between foo and fO_o
let bar = 3
echo bA_r # 3 For routines defined with different styles (probably can be generalized to any overloaded symbol), if a style is used that exactly matches some of the definitions, only consider those as overloads. If a style is used that doesn't match any definition: if every definition has the same style, consider every overload, otherwise error due to ambiguous style. proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3
echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: ambiguous style between foo and fO_o
proc bar(x: int): int = x + 1
proc bar[T](x: T): T = x
echo bar(1) # 2
echo bA_r(1) # 2
echo bAr("abc") # 2 This part compromises on the style insensitivity camp because it doesn't let us swap the usage with any definition and have it still work in every case. But considering proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3
echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: both foo and fO_o match
proc bar(x: int): int = x + 1
proc bA_r[T](x: T): T = x # notice weaker match but different style
echo bar(1) # 2
echo bA_r(1) # 1
echo bAr(1) # 2 If this design is sound, then we have something that can make everyone happy, and we can iterate on it with regards to addressing complexity in the Nim compiler; or we can formulate other designs in the same way. But again, we can't compromise too much since Nim already promises style insensitivity and has for a long time. |
No benefits are known for SCREAMING constants, so this particular case is completely irrelevant for me personally. |
I personally don't like the fact that foo and fO_o are both compiled. But I think that idea of this RFC is quite logical because it’s strange to make gameStop == gamesTop. As for me we need case insensitivity but restricted where they have different meanings. The purpose of this RFC is to bring some clarity to case insensitivity. Yes maybe it need also aditional rules for equality foo and fO_o and Foo, FOO or make other rules in general. BTW I've never had a case where when using pure Nim, insensitivity gets in the way. However when wrapping C libraries it happens quite often, e.g. OpenGL, v4l2, libnx..... And I think you could add a pragma to specify case sensitivity: on | off, the same rules would apply for a module and identifiers from that module imported from other modules as case sensitivity in a module with a pragma. I find it quite useful, plus after these C-api wrappers there is likely to be a high-level api for nim. |
I also don't like the idea of having foo and fO_o in the same codebase, but mostly because in the context of Nim these things would be the same identifier, probably referring to the same thing in the same context, which is a huge code smell (that current Nim doesn't forbid anyway). And I do think that having style insensitivity for identifiers be allowed only if user's chosen style doesn't create any ambiguity when resolving identifier names is an interesting idea, but it's a bit out of scope of this RFC and is probably too complex for both camps to be happy with. The purpose of this RFC is to highlight a problem with current implementation of style insensitivity. People, when using snake_case or camelCase, don't place underscores and uppercase letters randomly, but tend to use them as delimiters for words inside of a single identifier name. This information is simply lost when Nim parses these names into IR, and in some cases this can create ambiguity amongst 2 different identifiers. And notice that people are mostly discussing snake_case and camelCase when talking about style insensitivity, and not styles like SCREAMINGCASE or nocase, so I personally don't find any value in a system that allows me to do C_R_A_Z_Y, sIlLy stufflikethis, but doesn't let me have new_freedom and new_free_dom in the same context. I consider that for styles that Nim realisitcally supports, case insensitivity is an improper tool to achieve insensitivity amongst them. If people find genuine issues with proposed changes that would break their workflow, I'd be happy to hear about those and have a discussion. But from what I can tell, these changes shouldn't affect anyone's code (that doesn't smell already) and shouldn't prohibit people from using their favorite style. |
No matter what algorithm we come up with some code will still break. In this sense, people do place underscores and uppercase letters randomly. There is also the problem that the simpler the algorithm is, the more code will break. If you don't want There is a case which compiles in current Nim: # a.nim
proc foo(x: int): int = x * 2 # b.nim
proc fO_o(x: int): int = x * 3 import a, b Trying to use either For this case, we can add another warning, for whenever we use a symbol when a symbol with another style was available. These warnings need to be opt in though because they will complain that people who prefer style sensitivity are using style sensitivity. |
From what I understand, I don't see how this is not possible to do in current compiler, since it requires a change in a compiler part that is already there and is responsible for current behavior in the first place (unless compiler, for some reason, is built around the fact that all identifiers internally are lowercase (except maybe for the first letter) and is now tightly coupled to this implementation detail).
If someone wants to use my Once again, if I'm missing something, and there is a genuine case where the proposed changes would break an otherwise reasonable code that doesn't adhere to either snake or camel case (or some mixture of both), then I'd like to see such an example (that isn't foobarbaz).
This will also compile with proposed changes, but will not be ambiguous, because you would be able to call |
The idea that the convenience style insensitivity is critical for deciding on using or ditching the language is hilarious. 🤷 |
I'm sure that's the rule that has been proposed here, but let me spell it out to see if there is disagreement: "Two identifiers are equal if they have separators in the same positions. A separator is either the underscore or a invisible between a case switch (lowercase followed by uppercase or vice versa). The first character is always case sensitive and other characters are not. An underscore cannot be followed by an underscore." This implies:
This is pretty intuitive, keeps style intensivity's benefits and tooling can easily provide a transition path. |
@Araq What about "A separator is either the underscore or a invisible between a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase). IMHO HTMLParser is more common style then HTMLparser so that
|
I am suggesting a new option entirely that would go with my proposal, it would just go under the styleCheck umbrella.
Yes, internally all identifiers are stored in a style insensitive way. You are suggesting to store them under a new scheme (no?), I am saying changing the scheme is going to cause more pain.
I don't want to indulge misunderstandings further here. The compiler, on a technical level, should be able to respect any different strings. The same way the compiler would allow
I meant that you cannot define both
I don't mean to be rude, but this is not an argument for your suggestion, it's a defense under the hypothetical argument "we should allow f_o_o_BaR". The point isn't that
I mean, this is the user's fault, there's nothing honest about changing
Yeah but are we really in a position to bargain against people who think this? We are only so influential.
I did not interpret the proposal like this at all, it's not what "convert snake case to camel case" usually means. The comment:
Implies that It's interesting though, and I'm not saying these conversion schemes don't make sense (if this was not clear) but I can't say that changing to a new scheme is a path we should be pursuing. To repeat, changing the scheme would:
If we are at the point that these are acceptable compromises then we can go nuts but I still have doubts. Final note: the majority use of style insensitivity is not the simultaneous use of camel case and snake case, most people adhere to NEP1 which says to use camel case. The majority use is cases like |
I agree but that's a (good!) argument against any proposal like this altogether. For now I'm interested in exploring the proposal, we can always reject it later. Of course, you can also say "waste of time" already. |
I like where this is going. Does this proposal handle all uppercase identifiers like this? |
@kuchta > What about "A separator is either the underscore or a invisible between a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase). Is that the same as "separator between uppercase letters AB if B is followed by a lowercase letter?" |
The ruleset is almost correct. Its just that a separator is either an underscore or a case switch from lowercase to uppercase (and not vice-versa). Also, like @metagn pointed out, a continuous sequence of uppercase letters gets converted into a separate word, except for the last character in such sequence, so people can use |
So AF_INET would be AfIneT? |
Perhaps there should be an exception would be made for the last character in sequence if that last character happens to be last character in identifier. So |
Maybe we could let the user decide what names the variables have, instead of having each variable name have x^n number of name permutations? It would make it easier for everyone, no inconsistencies. More power to the user. |
Sorry, didn't notice your comment earlier. Can you elaborate more on this? People here have already suggested using "style skins", and we already discussed their issues. |
I should probably read up on these style skins first, maybe they solve the issue already. |
Abstract
Instead of removing underscores and forcing every letter to lowercase, replace every occurrence of underscore followed by any-case letter with uppercase letter in snake_case identifiers, and keep camelCase identifiers as-is.
Motivation
Current style insensitivity algorithm can produce a few problems internally when trying to represent compound words that can be split more than one way in identifier names, as the internal all-lowercase identifier names are indistinguishable in certain scenarios (gameStop vs gamesTop, index vs inDEX, superb_owl vs super_bowl, island vs isLand, etc).
Description
I propose for identifiers to store their names with camelCase (or snake_case, or any other way to store separate words inside of a identifier), as a case of a letter or an underscore in the name clearly delimits the words used inside that name. This should resolve the issue presented above by allowing separate words to be represented properly.
Conversion between styles would preserve the words by forcing any letter preceded by an underscore to be uppercase when converting from snake_case to camelCase. When converting from camelCase to snake_case, just reverse the process.
Code Examples
No response
Backwards Compatibility
Macros will break if they generate an all-lowercase identifiers, but other than that I'm not aware of any other backwards-incompatible changes this would introduce to Nim.
The text was updated successfully, but these errors were encountered: