presubmission inquiry for pangoling: Access to word predictability using large language (transformer) models. #573
Thanks @bnicenboim for your pre-submission. I'll come back to you ASAP.
Dear @bnicenboim, it's my first rotation as EiC and I'm still learning the nuances of assessing the eligibility of each submission. I need to show other editors the evidence that shows how pangoling meets our criteria for fit and overlap. Can you please help me by answering the following questions as succinctly and precisely as possible? I numbered the actionable items to help you respond to them specifically.

Package categories
"Scientific software wrappers: Packages that wrap non-R utility programs used for scientific research."
- What research field is pangoling specific for? Is that psycho-/neuro-linguists?
- Can you please expand on what value pangoling adds? That is, above a simple system() call or bindings, whether in parsing inputs and outputs, data handling, etc. An improved installation process, or extension of compatibility to more platforms, may constitute added value if installation is complex.

Other scope considerations
- The 'pangoling' package states that the overlapping package 'text' is more general. Can you please argue how, despite this, pangoling still meets rOpenSci's guidelines for fit and overlap?

Package overlap
- Can you please explain whether 'pangoling' duplicates functionality in 'text' or another R package, and, if it does, how 'pangoling' represents a significant improvement (see our guide for details on what we mean by 'significant improvement')?
Thanks for your patience :-)
Ok, sure, I'll answer inline.
Yes, it's common to use word predictability as a predictor in models, and pangoling extracts predictability from transformer models.
Transformer models are "meant" to be used for computational linguistic tasks. For example, GPT-like models produce a (random) continuation given a context. That's trivial to get, since there is a short-cut call for it. The point of using pangoling, however, is not to run those NLP tasks but to extract word-level predictability measures for analyzing experimental data. I hope it's clearer, but feel free to ask!
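For illustration only (this code is not from the thread): a minimal sketch of that kind of short-cut continuation, written in R via reticulate and assuming a Python environment with the transformers library installed.

library(reticulate)

transformers <- import("transformers")

# Standard transformers usage: a text-generation pipeline with GPT-2.
generator <- transformers$pipeline("text-generation", model = "gpt2")

# Produces a (random) continuation of the given context.
generator("The apple doesn't fall far from the", max_new_tokens = 10L)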
I think I answered this in the previous point. I'm not even sure that you can get the output of pangoling just using a simple system() call.
I'm not sure if I understand this. One would need to set up the models in Python, then extract the tensors and manipulate them. Finally, one needs to take care of the mapping between words and tokens. But Python is not needed in that last step.
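To make those steps concrete, here is a hedged sketch (not pangoling code; it assumes reticulate plus a Python environment with transformers and torch) of the manual route one would otherwise follow:

library(reticulate)

transformers <- import("transformers")
torch        <- import("torch")

# Set up the model in Python (via reticulate).
tok   <- transformers$AutoTokenizer$from_pretrained("gpt2")
model <- transformers$AutoModelForCausalLM$from_pretrained("gpt2")

# Tokenize; note that words and sub-word tokens do not align one-to-one.
enc <- tok("The apple doesn't fall far from the tree.", return_tensors = "pt")

# Extract and manipulate the tensors: logits have shape [batch, n_tokens, vocab_size].
out    <- model(enc$input_ids)
lprobs <- torch$log_softmax(out$logits, dim = -1L)

# What remains is the bookkeeping: shifting positions so that position i
# predicts token i + 1, and mapping sub-word tokens back onto words before
# summing their log-probabilities. That last mapping step needs no Python.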
Ok, there was a lot of overlap in my answers, so feel free to ask me more specific questions if something is not clear.
@bnicenboim, I'm still discussing the scope with other editors.
Ok, sure no problem.
- ml06. Did you consider submitting pangoling as a stats package <https://stats-devguide.ropensci.org/>? If so, what convinced you to submit as a general package?

Sorry, why as a stats package? It doesn't do any statistics. I guess it's an NLP package if I'm forced to put it in a category.
Thanks. I ask because the category "text analysis" in the standard-package guide states: "Machine-learning and packages implementing NLP analysis algorithms should be submitted under statistical software peer review <https://stats-devguide.ropensci.org/>."

Knowing that you at least considered it, I can now be sure the standard-package review is your informed decision.
Thanks for the clarification, I found this under statistical software:
1. Bayesian and Monte Carlo Routines
2. Dimensionality Reduction, Clustering, and Unsupervised Learning
3. Machine Learning
4. Regression and Supervised Learning
5. Exploratory Data Analysis (EDA) and Summary Statistics
6. Spatial Analyses
7. Time Series Analyses
And the package doesn't fall into any of those; it's not doing machine learning either. So I think it's fine under general.
I now have enough opinions from the editorial team to consider this package in scope. Please go ahead with a full submission. Thanks for your patience.
Thanks, should I do something about the pkgcheck failure caused by <<-?
Please use the same justification you wrote here.
@bnicenboim @maurolepore I've just updated pkgcheck.
Closing because there is now a full submission at #575 |
Submitting Author Name: Bruno Nicenboim
Submitting Author Github Handle: @bnicenboim
Repository: https://github.com/bnicenboim/pangoling
Submission type: Pre-submission
Language: en
Scope
Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):
Data Lifecycle Packages
Statistical Packages
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
The package is a wrapper around the transformers Python package, and it can tokenize text, get word predictability, and calculate perplexity, which is text analysis.
This is mostly for psycho-/neuro-linguists who use word predictability as a predictor in their research, such as in ERP and reading studies.
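As a hedged illustration of that use case (the function name and arguments below are indicative only and may not match the released pangoling API; see the package documentation for the actual interface):

library(pangoling)

# Illustrative call: word-by-word predictability (log-probabilities) of each
# word given its preceding context, from a GPT-2-type causal model.
words <- c("The", "apple", "doesn't", "fall", "far", "from", "the", "tree.")
lp    <- causal_lp(x = words, model = "gpt2")  # hypothetical signature

# These values would then enter the analysis of, e.g., reading times or ERP
# amplitudes as a predictor (often as surprisal, i.e., negative log-probability).
data.frame(word = words, lp = lp)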
Another R package that acts as a wrapper for transformers is text. However, text is more general, and its focus is on Natural Language Processing and Machine Learning. pangoling is much more specific, and the focus is on measures used as predictors in analyses of data from experiments, rather than NLP.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Any other questions or issues we should be aware of?:
Yes, the output of pkgcheck fails only because of the use of <<-. But this is done in order to use memoise, as recommended in its documentation. The <<- in the package appears inside .onLoad <- function(libname, pkgname) {...}.
The pkgcheck output is the following:
── pangoling 0.0.0.9002 ────────────────────────────
✔ Package name is available
✔ has a 'codemeta.json' file.
✔ has a 'contributing' file.
✔ uses 'roxygen2'.
✔ 'DESCRIPTION' has a URL field.
✔ 'DESCRIPTION' has a BugReports field.
✔ Package has at least one HTML vignette
✔ All functions have examples.
✖ Package uses global assignment operator ('<<-').
✔ Package has continuous integration checks.
✔ Package coverage is 94.4%.
✔ R CMD check found no errors.
✔ R CMD check found no warnings.
ℹ Current status:
✖ This package is not ready to be submitted.