[style] Replace Huggingface with Hugging Face (UKPLab#2905)
tomaarsen authored Aug 23, 2024
1 parent fbe7d6a commit 2e51740
Showing 14 changed files with 24 additions and 24 deletions.
2 changes: 1 addition & 1 deletion docs/cross_encoder/training_overview.md
@@ -6,7 +6,7 @@
The CrossEncoder training approach has not been updated in v3.0 when `training Sentence Transformer models <../sentence_transformer/training_overview.html>`_ was improved. Improving training CrossEncoders is planned for a future major update.
```

-The `CrossEncoder` class is a wrapper around Huggingface `AutoModelForSequenceClassification`, but with some methods to make training and predicting scores a little bit easier. The saved models are 100% compatible with Huggingface and can also be loaded with their classes.
+The `CrossEncoder` class is a wrapper around Hugging Face `AutoModelForSequenceClassification`, but with some methods to make training and predicting scores a little bit easier. The saved models are 100% compatible with Hugging Face and can also be loaded with their classes.

First, you need some sentence pair data. You can either have a continuous score, like:

2 changes: 1 addition & 1 deletion docs/pretrained-models/ce-msmarco.md
@@ -59,4 +59,4 @@ In the following table, we provide various pre-trained Cross-Encoders together w
| amberoad/bert-multilingual-passage-reranking-msmarco | 68.40 | 35.54 | 330 |
| sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco | 72.82 | 37.88 | 720

-Note: Runtime was computed on a V100 GPU with Huggingface Transformers v4.
+Note: Runtime was computed on a V100 GPU with Hugging Face Transformers v4.
2 changes: 1 addition & 1 deletion examples/applications/cross-encoder/README.md
@@ -39,7 +39,7 @@ scores = model.predict([["My first", "sentence pair"], ["Second text", "pair"]])

You pass to `model.predict` a list of sentence **pairs**. Note, Cross-Encoder do not work on individual sentence, you have to pass sentence pairs.

-As model name, you can pass any model or path that is compatible with Huggingface [AutoModel](https://huggingface.co/transformers/model_doc/auto.html) class
+As model name, you can pass any model or path that is compatible with Hugging Face [AutoModel](https://huggingface.co/transformers/model_doc/auto.html) class


For a full example, to score a query with all possible sentences in a corpus see [cross-encoder_usage.py](cross-encoder_usage.py).
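A minimal, self-contained sketch of this usage (the checkpoint name is an illustrative pretrained MS MARCO cross-encoder, not something introduced in this commit):

```python
from sentence_transformers import CrossEncoder

# Any model name or path compatible with the Hugging Face AutoModel class works here.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# predict() takes a list of sentence pairs and returns one relevance score per pair.
scores = model.predict(
    [
        ["How many people live in Berlin?", "Berlin has a population of roughly 3.7 million."],
        ["How many people live in Berlin?", "A cross-encoder scores a pair of texts jointly."],
    ]
)
print(scores)  # higher score = more relevant pair
```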
4 changes: 2 additions & 2 deletions examples/training/cross-encoder/README.md
@@ -9,7 +9,7 @@ See the following examples how to train Cross-Encoders:

## Training CrossEncoders

-The `CrossEncoder` class is a wrapper around Huggingface `AutoModelForSequenceClassification`, but with some methods to make training and predicting scores a little bit easier. The saved models are 100% compatible with Huggingface and can also be loaded with their classes.
+The `CrossEncoder` class is a wrapper around Hugging Face `AutoModelForSequenceClassification`, but with some methods to make training and predicting scores a little bit easier. The saved models are 100% compatible with Hugging Face and can also be loaded with their classes.

First, you need some sentence pair data. You can either have a continuous score, like:
```python
@@ -32,7 +32,7 @@ train_samples = [
]
```

-Then, you define the base model and the number of labels. You can take any [Huggingface pre-trained model](https://huggingface.co/transformers/pretrained_models.html) that is compatible with AutoModel:
+Then, you define the base model and the number of labels. You can take any [Hugging Face pre-trained model](https://huggingface.co/transformers/pretrained_models.html) that is compatible with AutoModel:
```
model = CrossEncoder('distilroberta-base', num_labels=1)
```
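For a fuller picture, a minimal end-to-end sketch under the classic (pre-v3) `CrossEncoder` training API; the samples, batch size, and epoch count are illustrative assumptions, not part of this change:

```python
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Sentence pairs with a continuous similarity score in [0, 1]
train_samples = [
    InputExample(texts=["My first sentence", "A closely related sentence"], label=0.8),
    InputExample(texts=["Another pair", "Something entirely unrelated"], label=0.1),
]

model = CrossEncoder("distilroberta-base", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# fit() runs the classic CrossEncoder training loop
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=10)

# Persist the fine-tuned model (loadable with Hugging Face Transformers as well)
model.save("output/my-cross-encoder")
```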
(additional changed file; path not shown)
@@ -61,7 +61,7 @@
+ datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
)

-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
model = SentenceTransformer(model_name)

# Load the STSB dataset: https://huggingface.co/datasets/sentence-transformers/stsb
(additional changed file; path not shown)
@@ -75,13 +75,13 @@

###### Cross-encoder (simpletransformers) ######
logging.info(f"Loading cross-encoder model: {model_name}")
-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for cross-encoder model
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for cross-encoder model
cross_encoder = CrossEncoder(model_name, num_labels=1)


###### Bi-encoder (sentence-transformers) ######
logging.info(f"Loading bi-encoder model: {model_name}")
-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)

# Apply mean pooling to get one fixed sized sentence vector
(additional changed file; path not shown)
@@ -83,14 +83,14 @@
###### Cross-encoder (simpletransformers) ######

logging.info(f"Loading cross-encoder model: {model_name}")
-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for cross-encoder model
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for cross-encoder model
cross_encoder = CrossEncoder(model_name, num_labels=1)

###### Bi-encoder (sentence-transformers) ######

logging.info(f"Loading bi-encoder model: {model_name}")

-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)

# Apply mean pooling to get one fixed sized sentence vector
(additional changed file; path not shown)
@@ -72,7 +72,7 @@
num_epochs = 1
model_save_path = "output/bi-encoder/training_stsbenchmark_" + model_name + "/seed-" + str(seed)

-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name)

# Apply mean pooling to get one fixed sized sentence vector
2 changes: 1 addition & 1 deletion examples/unsupervised_learning/CT/train_ct_from_file.py
@@ -46,7 +46,7 @@
model_output_path = "output/train_ct{}-{}".format(output_name, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))


-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)

# Apply mean pooling to get one fixed sized sentence vector
(additional changed file; path not shown)
@@ -46,7 +46,7 @@
model_output_path = "output/train_ct-improved{}-{}".format(output_name, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))


-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)

# Apply mean pooling to get one fixed sized sentence vector
(additional changed file; path not shown)
@@ -46,7 +46,7 @@
model_output_path = "output/train_simcse{}-{}".format(output_name, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))


-# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
+# Use Hugging Face/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name, max_seq_length=max_seq_length)

# Apply mean pooling to get one fixed sized sentence vector
6 changes: 3 additions & 3 deletions sentence_transformers/SentenceTransformer.py
@@ -80,7 +80,7 @@ class SentenceTransformer(nn.Sequential, FitMixin):
use_auth_token (bool or str, optional): Deprecated argument. Please use `token` instead.
truncate_dim (int, optional): The dimension to truncate sentence embeddings to. `None` does no truncation. Truncation is
only applicable during inference when :meth:`SentenceTransformer.encode` is called.
-model_kwargs (Dict[str, Any], optional): Additional model configuration parameters to be passed to the Huggingface Transformers model.
+model_kwargs (Dict[str, Any], optional): Additional model configuration parameters to be passed to the Hugging Face Transformers model.
Particularly useful options are:
- ``torch_dtype``: Override the default `torch.dtype` and load the model under a specific `dtype`.
@@ -105,11 +105,11 @@ class SentenceTransformer(nn.Sequential, FitMixin):
See the `PreTrainedModel.from_pretrained
<https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained>`_
documentation for more details.
-tokenizer_kwargs (Dict[str, Any], optional): Additional tokenizer configuration parameters to be passed to the Huggingface Transformers tokenizer.
+tokenizer_kwargs (Dict[str, Any], optional): Additional tokenizer configuration parameters to be passed to the Hugging Face Transformers tokenizer.
See the `AutoTokenizer.from_pretrained
<https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoTokenizer.from_pretrained>`_
documentation for more details.
-config_kwargs (Dict[str, Any], optional): Additional model configuration parameters to be passed to the Huggingface Transformers config.
+config_kwargs (Dict[str, Any], optional): Additional model configuration parameters to be passed to the Hugging Face Transformers config.
See the `AutoConfig.from_pretrained
<https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoConfig.from_pretrained>`_
documentation for more details.
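As an illustration of the `model_kwargs` and `tokenizer_kwargs` options documented above (a minimal sketch; the checkpoint name, dtype, and the assumption of a CUDA GPU are illustrative, not part of this change):

```python
import torch
from sentence_transformers import SentenceTransformer

# model_kwargs / tokenizer_kwargs are forwarded to the underlying
# Hugging Face Transformers model and tokenizer.
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",     # illustrative checkpoint
    device="cuda",                                # assumes a CUDA GPU; fp16 inference on CPU may not be supported
    model_kwargs={"torch_dtype": torch.float16},  # load the weights in fp16
    tokenizer_kwargs={"model_max_length": 256},
)

embeddings = model.encode(["A first sentence.", "A second sentence."])
print(embeddings.shape)  # (2, 384) for this checkpoint
```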
2 changes: 1 addition & 1 deletion sentence_transformers/losses/DenoisingAutoEncoderLoss.py
@@ -29,7 +29,7 @@ def __init__(
Args:
model (SentenceTransformer): The SentenceTransformer model.
-decoder_name_or_path (str, optional): Model name or path for initializing a decoder (compatible with Huggingface's Transformers). Defaults to None.
+decoder_name_or_path (str, optional): Model name or path for initializing a decoder (compatible with Hugging Face's Transformers). Defaults to None.
tie_encoder_decoder (bool): Whether to tie the trainable parameters of encoder and decoder. Defaults to True.
References:
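For context, a minimal sketch of how this loss is typically wired together (the encoder name, corpus sentences, and batch size are illustrative assumptions; the dataset's default noise function relies on the `nltk` package):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses

model = SentenceTransformer("bert-base-uncased")  # illustrative encoder

# DenoisingAutoEncoderDataset adds noise to plain sentences; the loss then
# trains a decoder to reconstruct the original sentence (TSDAE-style).
train_sentences = ["An unlabeled sentence from your corpus.", "Another plain sentence."]
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)

# decoder_name_or_path accepts any Hugging Face-compatible model name or path;
# tie_encoder_decoder=True shares the weights between encoder and decoder.
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```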
12 changes: 6 additions & 6 deletions sentence_transformers/models/Transformer.py
@@ -10,20 +10,20 @@


class Transformer(nn.Module):
"""Huggingface AutoModel to generate token embeddings.
"""Hugging Face AutoModel to generate token embeddings.
Loads the correct class, e.g. BERT / RoBERTa etc.
Args:
-model_name_or_path: Huggingface models name
+model_name_or_path: Hugging Face models name
(https://huggingface.co/models)
max_seq_length: Truncate any inputs longer than max_seq_length
-model_args: Keyword arguments passed to the Huggingface
+model_args: Keyword arguments passed to the Hugging Face
Transformers model
-tokenizer_args: Keyword arguments passed to the Huggingface
+tokenizer_args: Keyword arguments passed to the Hugging Face
Transformers tokenizer
-config_args: Keyword arguments passed to the Huggingface
+config_args: Keyword arguments passed to the Hugging Face
Transformers config
-cache_dir: Cache dir for Huggingface Transformers to store/load
+cache_dir: Cache dir for Hugging Face Transformers to store/load
models
do_lower_case: If true, lowercases the input (independent if the
model is cased or not)
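A minimal sketch that uses these arguments in the same Transformer-plus-mean-pooling pattern seen in the scripts above (the model name and sequence length are illustrative):

```python
from sentence_transformers import SentenceTransformer, models

# models.Transformer wraps a Hugging Face AutoModel checkpoint and its tokenizer
word_embedding_model = models.Transformer("distilroberta-base", max_seq_length=256)

# Mean pooling turns per-token embeddings into one fixed-size sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
embeddings = model.encode(["This is an example sentence."])
```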
