John Snow Labs Spark NLP 3.3.0: New ALBERT, XLNet, RoBERTa, XLM-RoBERTa, and Longformer for Token Classification, up to 50x faster model saving, new ways to discover pretrained models and pipelines, new state-of-the-art models, and lots more! #6194
maziyarpanahi
Overview
We are very excited to release Spark NLP 🚀 3.3.0! This release comes with new ALBERT, XLNet, RoBERTa, XLM-RoBERTa, and Longformer annotators for Token Classification that can load existing or fine-tuned models from HuggingFace 🤗, up to 50x faster saving of Spark NLP models & pipelines, no more 2 GB limit on the size of imported TensorFlow models, lots of new functions to filter and display pretrained models & pipelines inside Spark NLP, bug fixes, and more!
We are proud to say Spark NLP 3.3.0 is still compatible across all major releases of Apache Spark, whether used locally, on Cloud providers such as EMR, or on managed services such as Databricks. The major releases of Apache Spark include Apache Spark 3.0.x/3.1.x (spark-nlp), Apache Spark 2.4.x (spark-nlp-spark24), and Apache Spark 2.3.x (spark-nlp-spark23).
As always, we would like to thank our community for their feedback, questions, and feature requests.
Major features and improvements
No size limitation when you import TensorFlow models! You can now import TF Hub & HuggingFace models larger than 2 GB.
Up to 50x faster saving of Spark NLP models & pipelines! For instance, saving the xlm_roberta_base model took much longer before Spark NLP 3.3.0, and now it only takes up to 15 seconds!
AlbertForTokenClassification can load ALBERT models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity Recognition (NER) tasks. This annotator is compatible with all models trained/fine-tuned using AlbertForTokenClassification or TFAlbertForTokenClassification in HuggingFace 🤗.
XlnetForTokenClassification can load XLNet models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity Recognition (NER) tasks. This annotator is compatible with all models trained/fine-tuned using XLNetForTokenClassification or TFXLNetForTokenClassification in HuggingFace 🤗.
RoBertaForTokenClassification can load RoBERTa models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity Recognition (NER) tasks. This annotator is compatible with all models trained/fine-tuned using RobertaForTokenClassification or TFRobertaForTokenClassification in HuggingFace 🤗.
XlmRoBertaForTokenClassification can load XLM-RoBERTa models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity Recognition (NER) tasks. This annotator is compatible with all models trained/fine-tuned using XLMRobertaForTokenClassification or TFXLMRobertaForTokenClassification in HuggingFace 🤗.
LongformerForTokenClassification can load Longformer models with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity Recognition (NER) tasks. This annotator is compatible with all models trained/fine-tuned using LongformerForTokenClassification or TFLongformerForTokenClassification in HuggingFace 🤗.
New functions to display and filter all available pretrained models & pipelines inside Spark NLP by language, version, or the name of the annotator.
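All five token-classification annotators share the same head: a linear layer maps each token's hidden state to one score per label, and the highest-scoring label is the prediction. The plain-Python sketch below illustrates only that head, using made-up weights, hidden states, and labels; the function name token_classification_head is hypothetical, and this is not Spark NLP's implementation, which runs the imported TensorFlow model end to end.

```python
from typing import List, Sequence


def token_classification_head(
    hidden_states: Sequence[Sequence[float]],  # one hidden vector per token
    weights: Sequence[Sequence[float]],        # hidden_size x num_labels matrix
    bias: Sequence[float],                     # one bias per label
    labels: Sequence[str],
) -> List[str]:
    """Hypothetical sketch: linear layer on each hidden state, then argmax over labels."""
    num_labels = len(labels)
    tags = []
    for h in hidden_states:
        # logits[j] = sum_i h[i] * W[i][j] + b[j]
        logits = [
            sum(h[i] * weights[i][j] for i in range(len(h))) + bias[j]
            for j in range(num_labels)
        ]
        # Pick the label with the highest score for this token.
        best = max(range(num_labels), key=lambda j: logits[j])
        tags.append(labels[best])
    return tags


# Toy example with made-up numbers (hidden size 3, three NER labels).
labels = ["O", "B-PER", "B-LOC"]
weights = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.5, 0.0]
tokens = ["John", "lives", "in", "Paris"]
hidden_states = [[0.1, 1.0, 0.2], [0.9, 0.1, 0.3], [0.8, 0.2, 0.1], [0.2, 0.3, 1.4]]

tags = token_classification_head(hidden_states, weights, bias, labels)
print(list(zip(tokens, tags)))
```

In a real model the hidden states come from the ALBERT, XLNet, RoBERTa, XLM-RoBERTa, or Longformer encoder and the weights are learned during fine-tuning; only the small linear layer is task-specific.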
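Filtering pretrained models by language, version, or annotator name comes down to matching metadata fields. The plain-Python sketch below shows that idea over a hypothetical in-memory catalog; ModelInfo, filter_models, and every catalog entry are illustrative assumptions, not Spark NLP's API or real model names. It treats a version such as "3.3" as a prefix, so it matches "3.3.0" but not "3.30.0".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModelInfo:
    # Hypothetical metadata record for one pretrained model.
    name: str
    lang: str
    version: str
    annotator: str


def _version_matches(model_version: str, wanted: str) -> bool:
    # Component-wise prefix match: "3.1" matches "3.1.0" but not "3.10.0".
    want = wanted.split(".")
    return model_version.split(".")[: len(want)] == want


def filter_models(
    catalog: Iterable[ModelInfo],
    lang: Optional[str] = None,
    version: Optional[str] = None,
    annotator: Optional[str] = None,
) -> List[ModelInfo]:
    # Each filter is optional; None means "do not filter on this field".
    return [
        m
        for m in catalog
        if (lang is None or m.lang == lang)
        and (version is None or _version_matches(m.version, version))
        and (annotator is None or m.annotator == annotator)
    ]


# Hypothetical catalog entries for illustration only.
catalog = [
    ModelInfo("example_ner_en", "en", "3.1.0", "NerDLModel"),
    ModelInfo("example_ner_fa", "fa", "3.3.0", "NerDLModel"),
    ModelInfo("example_token_clf_xx", "xx", "3.3.0", "XlmRoBertaForTokenClassification"),
    ModelInfo("example_old_en", "en", "2.7.0", "NerDLModel"),
]

for m in filter_models(catalog, lang="en", annotator="NerDLModel"):
    print(m.name, m.version)
```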
Bug Fixes
Fix a bug in the explain_document_ml and explain_document_dl pretrained pipelines triggered by some inputs
Fix minCount and classCount in Python for the ContextSpellCheckerApproach annotator
Fix the explain_document_ml pretrained pipeline for Spark NLP 3.x on Apache Spark 2.x
Fix the wordseg_best model for Thai language
Fix the wordseg_large model for Chinese language
Models and Pipelines
Spark NLP 3.3.0 comes with:
New Transformer Models for the following languages: English (en), Persian (fa), multilingual (xx), Luganda (lg), Kinyarwanda (rw), Igbo (ig), Hausa (ha), and Amharic (am).
The complete list of all 3700+ models & pipelines in 200+ languages is available on Models Hub.
New Notebooks
Import hundreds of models in different languages to Spark NLP
Documentation
Installation
Python
pip install spark-nlp==3.3.0  # PyPI
Spark Packages
spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):
GPU
spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):
GPU
spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):
GPU
Maven
spark-nlp on Apache Spark 3.0.x and 3.1.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.4.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.3.x:
spark-nlp-gpu:
FAT JARs
CPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.3.0.jar
GPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.3.0.jar
CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.3.0.jar
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.3.0.jar
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.3.0.jar
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.3.0.jar
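The package to pick depends on the Apache Spark version and on CPU vs GPU. A small helper can choose it; the coordinate pattern it builds (group com.johnsnowlabs.nlp, Scala 2.12 for Apache Spark 3.0.x/3.1.x and 2.11 for 2.4.x/2.3.x, and the -gpu/-spark24/-spark23 artifact suffixes) is an assumption inferred from the package and FAT JAR names above, and spark_nlp_coordinate is a hypothetical name, so double-check the result against the installation documentation.

```python
def spark_nlp_coordinate(spark_version: str, gpu: bool = False, nlp_version: str = "3.3.0") -> str:
    """Hypothetical helper: map an Apache Spark version (at least 'major.minor')
    to an assumed Spark NLP package coordinate."""
    parts = spark_version.split(".")
    major_minor = (int(parts[0]), int(parts[1]))

    # Assumed mapping, based on the supported Spark lines listed in this release.
    if major_minor in ((3, 0), (3, 1)):
        suffix, scala = "", "2.12"
    elif major_minor == (2, 4):
        suffix, scala = "-spark24", "2.11"
    elif major_minor == (2, 3):
        suffix, scala = "-spark23", "2.11"
    else:
        raise ValueError(f"Apache Spark {spark_version} is not covered by this mapping")

    artifact = "spark-nlp" + ("-gpu" if gpu else "") + suffix
    return f"com.johnsnowlabs.nlp:{artifact}_{scala}:{nlp_version}"


print(spark_nlp_coordinate("3.1.2"))
print(spark_nlp_coordinate("2.4.8", gpu=True))
```

The returned string is the kind of value you would pass to spark-shell --packages or pyspark --packages.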