Hi, guys. Thank you for your research. It is extremely interesting and valuable for the community, and I mean it! I am curious why you didn't lemmatize or stem the words prior to analysis. Is it only due to the increased computational power required, or is there another reason I am missing?
From my perspective, your current approach may underestimate the frequency ratio of some words. For example, Figure 2 suggests that the frequency of "delve" as a lemma would be higher than shown, since "delves" and "delved" also appear in the figure as separate words.
I am asking because I am planning to conduct similar research on Earth Science manuscripts to find excess words specific to my domain.
To be honest, the main reason was "for simplicity", but a secondary reason was that we thought it might actually be interesting to look at all forms separately: "delves", "delved" and "delve" may each increase in usage by a different amount (because ChatGPT may prefer to use a specific form particularly often).
In retrospect I think it would actually be more sensible to lemmatize everything. We may change the analysis in future revisions, or possibly add a supplementary analysis with/without lemmatization. Depends also on how the peer review process will go.
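For anyone who wants to compare the two variants on their own corpus, here is a minimal sketch of what "with/without lemmatization" could look like. It is not the code used in the paper. It assumes frequency means the fraction of documents containing a word, and it uses a tiny hand-written lemma map (`LEMMAS`, hypothetical) and made-up toy texts in place of a real lemmatizer and real abstracts.

```python
from collections import Counter

# Hypothetical, hand-written lemma map for illustration only;
# a real analysis would use a proper lemmatizer instead.
LEMMAS = {"delves": "delve", "delved": "delve", "delving": "delve"}


def doc_frequency(docs, lemmatize=False):
    """Fraction of documents that contain each word (or lemma) at least once."""
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if lemmatize:
            words = {LEMMAS.get(w, w) for w in words}
        counts.update(words)
    return {w: c / len(docs) for w, c in counts.items()}


def frequency_ratio(word, before, after, lemmatize=False):
    """Ratio of document frequencies (after / before); inf if the word is unseen before."""
    f_before = doc_frequency(before, lemmatize).get(word, 0.0)
    f_after = doc_frequency(after, lemmatize).get(word, 0.0)
    return f_after / f_before if f_before > 0 else float("inf")


# Toy corpora (invented), standing in for texts from two time periods.
before = [
    "we delve into data",
    "results are shown",
    "methods are described",
    "we study rocks",
]
after = [
    "this study delves into data",
    "we delved into rocks",
    "we delve into methods",
    "results are shown",
]

per_form = frequency_ratio("delve", before, after)
per_lemma = frequency_ratio("delve", before, after, lemmatize=True)
print(per_form, per_lemma)
```

In this toy example the ratio for the exact form "delve" stays flat, while the pooled lemma shows the increase, because the growth sits in "delves" and "delved". Keeping both views side by side preserves the per-form information mentioned above while also giving the pooled estimate.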