Fixing typos, spelling & grammar in documentation #268

Merged: 4 commits, Apr 1, 2024
2 changes: 1 addition & 1 deletion R/clean_levels.R
@@ -24,7 +24,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `orginal`, `value`, and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/clean_names.R
@@ -19,7 +19,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
4 changes: 2 additions & 2 deletions R/dummy_hash.R
@@ -36,15 +36,15 @@
#' The argument `num_terms` controls the number of indices that the hashing
#' function will map to. This is the tuning parameter for this transformation.
#' Since the hashing function can map two different tokens to the same index,
-#' will a higher value of `num_terms` result in a lower chance of collision.
+#' a higher value of `num_terms` will result in a lower chance of collision.
#'
#' @template details-prefix
#'
#' @details
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, `num_terms`, `collapse`, and `id`:
#'
#' \describe{
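
To make the `num_terms` note in the R/dummy_hash.R hunk concrete, here is a minimal base-R sketch. The helpers `hash_token()` and `count_collisions()` are hypothetical, and the toy polynomial hash is an assumption chosen for illustration; it is not the hashing function the step itself uses. The only point is that fewer available indices force more tokens onto the same index.

```r
# Toy polynomial hash (an assumption for illustration, not the package's hash).
# Maps a token to an index in 1..num_terms.
hash_token <- function(token, num_terms) {
  h <- 0
  for (code in utf8ToInt(token)) {
    h <- (h * 31 + code) %% num_terms
  }
  h + 1
}

# Number of tokens that land on an index already taken by another token.
count_collisions <- function(tokens, num_terms) {
  idx <- vapply(tokens, hash_token, numeric(1), num_terms = num_terms)
  length(tokens) - length(unique(idx))
}

tokens <- c("cat", "act", "dog", "god", "owl")
count_collisions(tokens, num_terms = 1)     # every token shares index 1
count_collisions(tokens, num_terms = 2^20)  # enough indices to keep these tokens apart
```
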
6 changes: 3 additions & 3 deletions R/lda.R
@@ -9,10 +9,10 @@
#' @template args-trained
#' @template args-columns
#' @param lda_models A WarpLDA model object from the text2vec package. If left
-#' to NULL, the default, will it train its model based on the training data.
+#' to NULL, the default, it will train its model based on the training data.
#' Look at the examples for how to fit a WarpLDA model.
#' @param num_topics integer desired number of latent topics.
-#' @param prefix A prefix for generated column names, default to "lda".
+#' @param prefix A prefix for generated column names, defaults to "lda".
#' @template args-keep_original_cols
#' @template args-skip
#' @template args-id
@@ -21,7 +21,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `num_topics`, and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/lemma.R
@@ -23,7 +23,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
8 changes: 4 additions & 4 deletions R/ngram.R
@@ -23,14 +23,14 @@
#' @details
#'
#' The use of this step will leave the ordering of the tokens meaningless. If
-#' `min_num_tokens < num_tokens` then the tokens order in increasing fashion
-#' with respect to the number of tokens in the n-gram. If `min_num_tokens = 1`
-#' and `num_tokens = 3` then the output contains all the 1-grams followed by all
+#' `min_num_tokens < num_tokens` then the tokens will be ordered in increasing
+#' fashion with respect to the number of tokens in the n-gram. If `min_num_tokens = 1`
+#' and `num_tokens = 3` then the output will contain all the 1-grams followed by all
#' the 2-grams followed by all the 3-grams.
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
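
The ordering described in the R/ngram.R hunk (all 1-grams, then all 2-grams, then all 3-grams) can be sketched in base R. The helpers `make_ngrams()` and `ngrams_range()` are hypothetical names, and joining tokens with `"_"` is an assumption made for readability; this is not the step's own implementation.

```r
# Build all n-grams of a fixed size n from a token vector.
make_ngrams <- function(tokens, n, delim = "_") {
  if (length(tokens) < n) {
    return(character(0))
  }
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = delim),
    character(1)
  )
}

# Concatenate n-grams in increasing n, from min_num_tokens up to num_tokens.
ngrams_range <- function(tokens, num_tokens = 3, min_num_tokens = 1) {
  unlist(lapply(min_num_tokens:num_tokens, function(n) make_ngrams(tokens, n)))
}

ngrams_range(c("a", "b", "c"), num_tokens = 3, min_num_tokens = 1)
# "a" "b" "c" "a_b" "b_c" "a_b_c"
```
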
2 changes: 1 addition & 1 deletion R/pos_filter.R
@@ -25,7 +25,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
6 changes: 3 additions & 3 deletions R/sequence_onehot.R
@@ -18,7 +18,7 @@
#' @param vocabulary A character vector, characters to be mapped to integers.
#' Characters not in the vocabulary will be encoded as 0. Defaults to
#' `letters`.
-#' @param prefix A prefix for generated column names, default to "seq1hot".
+#' @param prefix A prefix for generated column names, defaults to "seq1hot".
#' @template args-keep_original_cols
#' @template args-skip
#' @template args-id
@@ -33,12 +33,12 @@
#'
#' The string will be capped by the sequence_length argument, strings shorter
#' then sequence_length will be padded with empty characters. The encoding will
-#' assign a integer to each character in the vocabulary, and will encode
+#' assign an integer to each character in the vocabulary, and will encode
#' accordingly. Characters not in the vocabulary will be encoded as 0.
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `vocabulary`, `token`, and `id`:
#'
#' \describe{
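
The encoding described in the R/sequence_onehot.R hunk can be illustrated with a short base-R sketch. `encode_sequence()` is a hypothetical helper; truncating and padding at the end of the string, and representing padding as 0, are assumptions made here, since the hunk does not say which side is used.

```r
# Map each character to its position in the vocabulary (0 if absent),
# cap the result at sequence_length, and pad shorter strings with 0.
encode_sequence <- function(string, sequence_length, vocabulary = letters) {
  chars <- strsplit(string, "")[[1]]
  codes <- match(chars, vocabulary, nomatch = 0L)
  codes <- head(codes, sequence_length)               # assumption: truncate at the end
  c(codes, integer(sequence_length - length(codes)))  # assumption: pad at the end
}

encode_sequence("cab!", sequence_length = 6)    # 3 1 2 0 0 0
encode_sequence("abcdef", sequence_length = 3)  # 1 2 3
```
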
4 changes: 2 additions & 2 deletions R/show_tokens.R
@@ -1,7 +1,7 @@
#' Show token output of recipe
#'
-#' Returns the tokens as a list of character vector of a recipe. This function
-#' can be useful for diagnostics doing recipe construction but should not be
+#' Returns the tokens as a list of character vectors of a recipe. This function
+#' can be useful for diagnostics during recipe construction but should not be
#' used in final recipe steps. Note that this function will both prep() and
#' bake() the recipe it is used on.
#'
2 changes: 1 addition & 1 deletion R/stem.R
@@ -28,7 +28,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `is_custom_stemmer`, and `id`:
#'
#' \describe{
6 changes: 3 additions & 3 deletions R/stopwords.R
@@ -23,18 +23,18 @@
#'
#' @details
#'
-#' Stop words are words which sometimes are remove before natural language
+#' Stop words are words which sometimes are removed before natural language
#' processing tasks. While stop words usually refers to the most common words in
#' the language there is no universal stop word list.
#'
#' The argument `custom_stopword_source` allows you to pass a character vector
-#' to filter against. With the `keep` argument one can specify to keep the words
+#' to filter against. With the `keep` argument one can specify words to keep
#' instead of removing thus allowing you to select words with a combination of
#' these two arguments.
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, `keep`, and `id`:
#'
#' \describe{
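
The interaction between a custom word list and `keep`, as described in the R/stopwords.R hunk, can be sketched in base R. `filter_stopwords()` is a hypothetical helper that only mirrors the documented behaviour on a plain character vector; it is not the step's own code.

```r
# Remove the listed words by default; with keep = TRUE, retain only those words.
filter_stopwords <- function(tokens, stopword_source, keep = FALSE) {
  is_listed <- tokens %in% stopword_source
  if (keep) tokens[is_listed] else tokens[!is_listed]
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
filter_stopwords(tokens, c("the", "on"))               # "cat" "sat" "mat"
filter_stopwords(tokens, c("the", "on"), keep = TRUE)  # "the" "on" "the"
```
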
2 changes: 1 addition & 1 deletion R/text_normalization.R
@@ -21,7 +21,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `normalization_form`, and `id`:
#'
#' \describe{
6 changes: 3 additions & 3 deletions R/textfeature.R
@@ -9,8 +9,8 @@
#' @template args-trained
#' @template args-columns
#' @param extract_functions A named list of feature extracting functions.
-#' default to `count_functions`. See details for more information.
-#' @param prefix A prefix for generated column names, default to "textfeature".
+#' Defaults to `count_functions`. See details for more information.
+#' @param prefix A prefix for generated column names, defaults to "textfeature".
#' @template args-keep_original_cols
#' @template args-skip
#' @template args-id
@@ -29,7 +29,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `functions`, and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/texthash.R
@@ -37,7 +37,7 @@
#'
#' @details # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, value and `id`:
#'
#' \describe{
6 changes: 3 additions & 3 deletions R/tf.R
@@ -33,12 +33,12 @@
#' issues. A good strategy is to start with a low token count and go up
#' according to how much RAM you want to use.
#'
-#' Term frequency is a weight of how many times each token appear in each
+#' Term frequency is a weight of how many times each token appears in each
#' observation. There are different ways to calculate the weight and this step
#' can do it in a couple of ways. Setting the argument `weight_scheme` to
#' "binary" will result in a set of binary variables denoting if a token is
#' present in the observation. "raw count" will count the times a token is
-#' present in the observation. "term frequency" will divide the count with the
+#' present in the observation. "term frequency" will divide the count by the
#' total number of words in the document to limit the effect of the document
#' length as longer documents tends to have the word present more times but not
#' necessarily at a higher percentage. "log normalization" takes the log of 1
@@ -54,7 +54,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
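
Three of the weighting schemes named in the R/tf.R hunk ("binary", "raw count", "term frequency") can be worked through on one small document. `term_weights()` is a hypothetical base-R helper written for this illustration; "log normalization" is left out because its description is cut off in the hunk.

```r
# Weight each vocabulary term for a single tokenized document.
term_weights <- function(tokens, vocabulary,
                         weight_scheme = c("raw count", "binary", "term frequency")) {
  weight_scheme <- match.arg(weight_scheme)
  counts <- vapply(vocabulary, function(term) sum(tokens == term), numeric(1))
  switch(weight_scheme,
    "binary" = (counts > 0) * 1,                # 1 if the token is present, else 0
    "raw count" = counts,                       # number of occurrences
    "term frequency" = counts / length(tokens)  # count divided by document length
  )
}

doc <- c("the", "cat", "saw", "the", "dog")
vocab <- c("the", "cat", "bird")
term_weights(doc, vocab, "raw count")       # the = 2, cat = 1, bird = 0
term_weights(doc, vocab, "binary")          # the = 1, cat = 1, bird = 0
term_weights(doc, vocab, "term frequency")  # the = 0.4, cat = 0.2, bird = 0
```
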
2 changes: 1 addition & 1 deletion R/tfidf.R
@@ -51,7 +51,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `token`, `weight`, and `id`:
#'
#' \describe{
4 changes: 2 additions & 2 deletions R/tokenfilter.R
@@ -29,7 +29,7 @@
#'
#' @details
#'
-#' This step allow you to limit the tokens you are looking at by filtering on
+#' This step allows you to limit the tokens you are looking at by filtering on
#' their occurrence in the corpus. You are able to exclude tokens if they appear
#' too many times or too few times in the data. It can be specified as counts
#' using `max_times` and `min_times` or as percentages by setting `percentage`
@@ -44,7 +44,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
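
Count-based filtering, as described in the R/tokenfilter.R hunk, can be sketched in base R. `filter_by_count()` is a hypothetical helper that treats one flat token vector as the whole corpus; the defaults shown are assumptions for this sketch, and the `percentage` variant is omitted because its description is truncated in the hunk.

```r
# Keep only tokens whose corpus-wide count lies within [min_times, max_times].
filter_by_count <- function(tokens, min_times = 0, max_times = Inf) {
  times <- vapply(
    tokens,
    function(tok) sum(tokens == tok),
    numeric(1),
    USE.NAMES = FALSE
  )
  tokens[times >= min_times & times <= max_times]
}

corpus <- c("a", "b", "a", "c", "a", "b")
filter_by_count(corpus, min_times = 2)                 # drops "c" (appears once)
filter_by_count(corpus, min_times = 2, max_times = 2)  # keeps only "b"
```
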
10 changes: 5 additions & 5 deletions R/tokenize.R
@@ -29,16 +29,16 @@
#' options(width = 55)
#' ```
#'
-#' Tokenization is the act of splitting a character string into smaller parts to
+#' Tokenization is the act of splitting a character vector into smaller parts to
#' be further analyzed. This step uses the `tokenizers` package which includes
#' heuristics on how to to split the text into paragraphs tokens, word tokens,
#' among others. `textrecipes` keeps the tokens as a [`token`][tokenlist()]
#' variable and other steps will do their tasks on those [`token`][tokenlist()]
-#' variable before transforming them back to numeric variables.
+#' variables before transforming them back to numeric variables.
#'
-#' Working will `textrecipes` will almost always start by calling
+#' Working with `textrecipes` will almost always start by calling
#' `step_tokenize` followed by modifying and filtering steps. This is not always
-#' the case as you sometimes want to do apply pre-tokenization steps, this can
+#' the case as you sometimes want to apply pre-tokenization steps; this can
#' be done with [recipes::step_mutate()].
#'
#' # Engines
@@ -182,7 +182,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/tokenize_bpe.R
@@ -23,7 +23,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/tokenize_sentencepiece.R
@@ -28,7 +28,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/tokenize_wordpiece.R
@@ -22,7 +22,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
4 changes: 2 additions & 2 deletions R/tokenmerge.R
@@ -9,7 +9,7 @@
#' @template args-role_predictors
#' @template args-trained
#' @template args-columns
-#' @param prefix A prefix for generated column names, default to "tokenmerge".
+#' @param prefix A prefix for generated column names, defaults to "tokenmerge".
#' @template args-keep_original_cols
#' @template args-skip
#' @template args-id
@@ -20,7 +20,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms` and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/untokenize.R
@@ -23,7 +23,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
2 changes: 1 addition & 1 deletion R/word_embeddings.R
@@ -44,7 +44,7 @@
#'
#' # Tidying
#'
-#' When you [`tidy()`][tidy.recipe()] this step, a tibble is retruned with
+#' When you [`tidy()`][tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `embedding_rows`, `aggregation`, and `id`:
#'
#' \describe{
4 changes: 2 additions & 2 deletions man/show_tokens.Rd
2 changes: 1 addition & 1 deletion man/step_clean_levels.Rd
2 changes: 1 addition & 1 deletion man/step_clean_names.Rd
4 changes: 2 additions & 2 deletions man/step_dummy_hash.Rd

Some generated files are not rendered by default.