Why does GitHub still wrongly mislabel R / RMarkdown projects? #6869
-
In what way? Rmd files are recognised but are considered prose so Linguist doesn't include them in the language stats by default. The same applies to repos that contain nothing but Markdown files.
Because Linguist considers Jupyter a programming language but not Rmd files. Why? See #5208. Additionally, GitHub (independent of Linguist) has support for rendering Jupyter files but not Rmd. Also, Linguist doesn't know a language called "RStudio Notebook". If this is something distinct from a standalone Rmd file, please feel free to submit a pull request to add support.
This isn't likely to be your Rmd files. As I mentioned before, Linguist considers Rmd prose, so these files are not included in the stats and won't contribute to the languages shown in the sidebar. If you are seeing other names, it is because other languages are being detected in your other files, or your Rmd files are not using the expected extensions (see linguist/lib/linguist/languages.yml, lines 5759 to 5770 in e2012cd). If you provide a link to a repo showing the problem, I can take a look and explain the behaviour you are seeing more precisely.
A .gitattributes override is the only way to change Linguist's default behaviour (in this case, not counting prose files) or to tell it what you really want if you're not using the expected extensions.
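As a minimal sketch of such an override, assuming the notebooks use the standard `.Rmd` extension, a `.gitattributes` file at the repository root could look like the following. The `*.rnb` pattern is a hypothetical non-standard extension, included only to illustrate telling Linguist what a file really is.

```
# Count R Markdown files in the language stats, even though Linguist treats them as prose by default
*.Rmd linguist-detectable

# Hypothetical extension: declare the language explicitly and count it in the stats
*.rnb linguist-language=RMarkdown linguist-detectable
```

The stats shown in the sidebar are only recalculated after the file has been committed and pushed, so the change may not be visible immediately.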
-
So why don't you fix Linguist's hallucinations? Why does Linguist want to show biased language statistics? Perhaps some directive from the Ministry of Truth? How is an RStudio notebook (an .Rmd file), with very distinctive R code chunks and comments between the code chunks, conceptually different from a C file with comments? Why does Linguist not recognize R code chunks, which are easily parsable, as code? Why does Linguist not classify a C file with comments as prose? Isn't Linguist being inconsistent? Linguist is having a hallucination that needs to be fixed by looking at the code chunks in a notebook. If there are no code chunks, then perhaps "prose" is an adequate answer. If a C file has no C code but only comments, do you label it as "prose"? An .Rmd file (or a Jupyter notebook) can have R or Python code (or other languages, too). Why not classify a file by the percentages of code chunks in the file? You could have an .Rmd file that is 75% R and 25% Python or vice versa. And perhaps a percentage for "prose" for the comment chunks? Most .Rmd files are not 100% "prose" whether Linguist says so or not. And I'd argue that you should have Quarto notebooks instead of "prose," too. I'd accept no classification as a solution to .Rmd files instead of a wrong solution. Why is "wrong" the default? Why is a "wrong" default ever acceptable? Because the Ministry of Truth says so? I've had this same argument about Pascal and Delphi and got nowhere in the past. Delphi is very different from Pascal, but not to the Linguist.
-
Why does GitHub automatically "guess wrong" about repos with RMarkdown code (Rmd) files?
Why can't you label this an "RStudio Notebook" like you label a "Jupyter Notebook"?
Why is the "solution" to show wrong information instead of fixing this very old problem? My Rmd files are not "Jekyll using Docker image" or "SLSA Generic generator".
Forcing a .gitattributes file is not a good fix when the default is to show wrong information.