-
It's not trivial at all, primarily because of the performance cost it would carry and the question of how accurate the detection would be. It has come up a few times before: in the discussions around detecting Python in Jupyter files (#3316), R in R Markdown files (#5208), and JavaScript in HTML files (#5248), and more generically in #5326.
-
Hello. I am currently a student studying artificial intelligence!
-
As is known, if Linguist is not configured in `.gitattributes`, the generated content in IPython notebooks is likely to take over the language stats of a repo. Right now there are two "solutions" for handling this in `.gitattributes`.
Method 1
The first one, which I would not recommend, is to set the language for `.ipynb` to Python (or Julia, R, etc.). Not only are the Linguist stats heavily inflated by the JavaScript lines in the notebook, the commits get messed up as well. Output cells are counted as lines too, and comparing differences becomes a nightmare for memory, since even a simple, light notebook whose output cells display tensors can easily run past 100k lines.
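To make the inflation concrete, here is a minimal Python sketch (not part of Linguist, which is written in Ruby) that compares how many lines of a notebook are actual code and how many are serialized output. It assumes the nbformat v4 JSON layout (`cells`, `cell_type`, `source`, `outputs`); the helper `count_notebook_lines` and the `demo` notebook are hypothetical names made up for illustration.

```python
import json


def _as_lines(source):
    """nbformat stores cell text as either one string or a list of strings."""
    text = source if isinstance(source, str) else "".join(source)
    return text.splitlines()


def count_notebook_lines(nb):
    """Return rough line counts for code, markdown, and serialized outputs."""
    counts = {"code": 0, "markdown": 0, "output": 0}
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind in ("code", "markdown"):
            counts[kind] += len(_as_lines(cell.get("source", "")))
        for output in cell.get("outputs", []):
            # Approximation of the on-disk footprint: notebooks are saved as
            # indented JSON, so each nested element tends to get its own line.
            counts["output"] += len(json.dumps(output, indent=1).splitlines())
    return counts


# Hypothetical two-cell notebook, built in memory for illustration only.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n"]},
        {
            "cell_type": "code",
            "source": ["import torch\n", "torch.zeros(3)"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["tensor([0., 0., 0.])"]},
                    "metadata": {},
                    "execution_count": 1,
                }
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(count_notebook_lines(demo))
```

On a real file this could be called as `count_notebook_lines(json.load(open("notebook.ipynb")))`, where the path is a placeholder. Even in this tiny example the serialized output takes more lines than the code that produced it.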
Method 2
I believe this is the method most people are using right now: adding one of two attribute overrides for `*.ipynb` in `.gitattributes`. This excludes `*.ipynb` from the stats once and for all. However, there is a downside: any commit that changes `*.ipynb` is counted as just a 1-line change (UPDATE: this claim is wrong; it was tied to one specific file that somehow had messed-up lines...). This is also a nightmare for comparing differences between commits.
A semi feature proposal
I am curious how difficult it would be to implement a more adaptive code detector in Linguist that achieves the following:
`linguist-documentation` to exclude them from line-by-line differences.
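As a rough picture of what a notebook-aware detector could look at, the Python sketch below reads the kernel language from the notebook metadata and keeps only the code-cell sources, ignoring outputs. This is an illustration under stated assumptions, not Linguist's implementation or a proposed patch: it assumes nbformat v4 metadata (`language_info.name`, with `kernelspec.language` as a fallback), and `notebook_language`, `code_only`, and `sample` are hypothetical names.

```python
def notebook_language(nb, default="unknown"):
    """Best-effort language name taken from nbformat v4 metadata."""
    meta = nb.get("metadata", {})
    name = meta.get("language_info", {}).get("name")
    if not name:
        name = meta.get("kernelspec", {}).get("language")
    return name or default


def code_only(nb):
    """Concatenate the source of code cells, skipping markdown and outputs."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # Cell source may be a single string or a list of line strings.
        chunks.append(src if isinstance(src, str) else "".join(src))
    return "\n".join(chunks)


# Hypothetical notebook used only to exercise the two helpers.
sample = {
    "metadata": {"language_info": {"name": "python"}},
    "cells": [
        {"cell_type": "markdown", "source": ["Some notes\n"]},
        {"cell_type": "code", "source": ["x = 1"], "outputs": []},
        {
            "cell_type": "code",
            "source": "print(x)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
        },
    ],
}

print(notebook_language(sample))
print(code_only(sample))
```

Parsing every notebook as JSON like this is exactly where the performance and accuracy concerns raised in the replies come in: it costs more than a filename match, and the metadata can be missing or wrong.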