diff --git a/README.md b/README.md
index 0ac5844460c923..cb7c32736e8638 100644
--- a/README.md
+++ b/README.md
@@ -139,7 +139,7 @@ documentation and examples
- Text-To-Text Transfer Transformer (Google T5)
- Generative Pre-trained Transformer 2 (OpenAI GPT2)
- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
-- Chat and Conversational LLMs (Facebook Llama-22)
+- Chat and Conversational LLMs (Facebook Llama-2)
- Vision Transformer (Google ViT)
- Swin Image Classification (Microsoft Swin Transformer)
- ConvNext Image Classification (Facebook ConvNext)
@@ -149,10 +149,10 @@ documentation and examples
- Automatic Speech Recognition (HuBERT)
- Automatic Speech Recognition (OpenAI Whisper)
- Named entity recognition (Deep learning)
-- Easy ONNX and TensorFlow integrations
+- Easy ONNX, OpenVINO, and TensorFlow integrations
- GPU Support
- Full integration with Spark ML functions
-- +30000 pre-trained models in +200 languages!
+- +31000 pre-trained models in +200 languages!
- +6000 pre-trained pipelines in +200 languages!
- Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
@@ -166,7 +166,7 @@ To use Spark NLP you need the following requirements:
**GPU (optional):**
-Spark NLP 5.4.0-rc2 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:
+Spark NLP 5.4.0 is built with the ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The following minimum NVIDIA® software is required only for GPU support:
- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
@@ -182,7 +182,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1
```
In Python console or Jupyter `Python3` kernel:
@@ -227,10 +227,11 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh
## Apache Spark Support
-Spark NLP *5.4.0-rc2* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *5.4.0* is built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO |
@@ -240,12 +241,6 @@ Spark NLP *5.4.0-rc2* has been built on top of Apache Spark 3.4 while fully supp
| 4.2.x | NO | NO | YES | YES | YES | YES | NO | NO |
| 4.1.x | NO | NO | YES | YES | YES | YES | NO | NO |
| 4.0.x | NO | NO | YES | YES | YES | YES | NO | NO |
-| 3.4.x | NO | NO | N/A | Partially | YES | YES | YES | YES |
-| 3.3.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.2.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.1.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.0.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 2.7.x | NO | NO | NO | NO | NO | NO | YES | YES |
Find out more about `Spark NLP` versions from our [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases).
@@ -262,16 +257,10 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
| 4.2.x | YES | YES | YES | YES | YES | NO | YES |
| 4.1.x | YES | YES | YES | YES | NO | NO | YES |
| 4.0.x | YES | YES | YES | YES | NO | NO | YES |
-| 3.4.x | YES | YES | YES | YES | NO | YES | YES |
-| 3.3.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.2.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.1.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.0.x | YES | YES | YES | NO | NO | YES | YES |
-| 2.7.x | YES | YES | NO | NO | NO | YES | NO |
## Databricks Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtimes:
+Spark NLP 5.4.0 has been tested and is compatible with the following runtimes:
**CPU:**
@@ -344,7 +333,7 @@ Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtime
## EMR Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.4.0 has been tested and is compatible with the following EMR releases:
- emr-6.2.0
- emr-6.3.0
@@ -394,11 +383,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
The `spark-nlp` has been published to
@@ -407,11 +396,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# GPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
```
@@ -421,11 +410,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# AArch64
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
```
@@ -435,11 +424,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# M1/M2 (Apple Silicon)
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
```
@@ -453,7 +442,7 @@ set in your SparkSession:
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
## Scala
@@ -471,7 +460,7 @@ coordinates:
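<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>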
```
@@ -482,7 +471,7 @@ coordinates:
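<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>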
```
@@ -493,7 +482,7 @@ coordinates:
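<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>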
```
@@ -504,7 +493,7 @@ coordinates:
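<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>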
```
@@ -514,28 +503,28 @@ coordinates:
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0"
```
**spark-nlp-gpu:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0"
```
**spark-nlp-aarch64:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0"
```
**spark-nlp-silicon:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0"
```
Maven
@@ -557,7 +546,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:
```bash
-pip install spark-nlp==5.4.0-rc2
+pip install spark-nlp==5.4.0
```
Conda:
@@ -586,7 +575,7 @@ spark = SparkSession.builder
.config("spark.driver.memory", "16G")
.config("spark.driver.maxResultSize", "0")
.config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
.getOrCreate()
```
@@ -657,7 +646,7 @@ Use either one of the following options
- Add the following Maven Coordinates to the interpreter's library list
```bash
-com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
- Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is
@@ -668,7 +657,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
Apart from the previous step, install the python module through pip
```bash
-pip install spark-nlp==5.4.0-rc2
+pip install spark-nlp==5.4.0
```
Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -696,7 +685,7 @@ launch the Jupyter from the same Python environment:
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1 jupyter
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1 jupyter
$ jupyter notebook
```
@@ -713,7 +702,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -740,7 +729,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)
@@ -763,7 +752,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live
@@ -782,9 +771,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP
3. In `Libraries` tab inside your cluster you need to follow these steps:
- 3.1. Install New -> PyPI -> `spark-nlp==5.4.0-rc2` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==5.4.0` -> Install
- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0` -> Install
4. Now you can attach your notebook to the cluster and use Spark NLP!
@@ -835,7 +824,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
"spark.kryoserializer.buffer.max": "2000M",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0"
}
}]
```
@@ -844,7 +833,7 @@ A sample of AWS CLI to launch EMR cluster:
```.sh
aws emr create-cluster \
---name "Spark NLP 5.4.0-rc2" \
+--name "Spark NLP 5.4.0" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -908,7 +897,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -951,7 +940,7 @@ spark = SparkSession.builder
.config("spark.kryoserializer.buffer.max", "2000m")
.config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
.config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
.getOrCreate()
```
@@ -965,7 +954,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
**pyspark:**
@@ -978,7 +967,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
**Databricks:**
@@ -1250,7 +1239,7 @@ spark = SparkSession.builder
.config("spark.driver.memory", "16G")
.config("spark.driver.maxResultSize", "0")
.config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc2.jar")
+ .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0.jar")
.getOrCreate()
```
@@ -1259,7 +1248,7 @@ spark = SparkSession.builder
version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need
to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
- i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc2.jar`)
+ e.g., `hdfs:///tmp/spark-nlp-assembly-5.4.0.jar`)
Example of using pretrained Models and Pipelines in offline:
diff --git a/build.sbt b/build.sbt
index 9e5dd18adfb3a4..9e0e57ac29e51b 100644
--- a/build.sbt
+++ b/build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)
organization := "com.johnsnowlabs.nlp"
-version := "5.4.0-rc2"
+version := "5.4.0"
(ThisBuild / scalaVersion) := scalaVer
diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html
index 4d88b8a4797399..ee4766b9904aa2 100755
--- a/docs/_layouts/landing.html
+++ b/docs/_layouts/landing.html
@@ -201,7 +201,7 @@
{{ _section.title }}
{% highlight bash %}
# Using PyPI
- $ pip install spark-nlp==5.4.0-rc2
+ $ pip install spark-nlp==5.4.0
# Using Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp
diff --git a/docs/en/concepts.md b/docs/en/concepts.md
index bf7695a7ab8a9e..61295da699db91 100644
--- a/docs/en/concepts.md
+++ b/docs/en/concepts.md
@@ -66,7 +66,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1 jupyter
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1 jupyter
$ jupyter notebook
```
diff --git a/docs/en/examples.md b/docs/en/examples.md
index 5d4a893687975b..adc9b982acf24b 100644
--- a/docs/en/examples.md
+++ b/docs/en/examples.md
@@ -18,7 +18,7 @@ $ java -version
# should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1
```
@@ -40,7 +40,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -p is for pyspark
# -s is for spark-nlp
# by default they are set to the latest
-!bash colab.sh -p 3.2.3 -s 5.4.0-rc2
+!bash colab.sh -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines.
diff --git a/docs/en/hardware_acceleration.md b/docs/en/hardware_acceleration.md
index 73934372cfff44..eaa8802d53a55f 100644
--- a/docs/en/hardware_acceleration.md
+++ b/docs/en/hardware_acceleration.md
@@ -49,7 +49,7 @@ Since the new Transformer models such as BERT for Word and Sentence embeddings a
| DeBERTa Large | +477%(5.8x) |
| Longformer Base | +52%(1.5x) |
-Spark NLP 5.4.0-rc2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 5.4.0 is built with TensorFlow 2.7.1, and the following NVIDIA® software is required only for GPU support:
- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
diff --git a/docs/en/install.md b/docs/en/install.md
index d7ef9dc38b3322..4bc861a2c0d496 100644
--- a/docs/en/install.md
+++ b/docs/en/install.md
@@ -17,22 +17,22 @@ sidebar:
```bash
# Install Spark NLP from PyPI
-pip install spark-nlp==5.4.0-rc2
+pip install spark-nlp==5.4.0
# Install Spark NLP from Anaconda/Conda
conda install -c johnsnowlabs spark-nlp
# Load Spark NLP with Spark Shell
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
# Load Spark NLP with PySpark
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
# Load Spark NLP with Spark Submit
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
# Load Spark NLP as external JAR after compiling and building Spark NLP by `sbt assembly`
-spark-shell --jars spark-nlp-assembly-5.4.0-rc2.jar
+spark-shell --jars spark-nlp-assembly-5.4.0.jar
```
@@ -55,7 +55,7 @@ $ java -version
# should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1
```
Of course you will need to have jupyter installed in your system:
@@ -92,7 +92,7 @@ spark = SparkSession.builder \
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
.config("spark.kryoserializer.buffer.max", "2000M") \
.config("spark.driver.maxResultSize", "0") \
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2") \
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0") \
.getOrCreate()
```
@@ -109,7 +109,7 @@ spark = SparkSession.builder \
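<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>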
```
@@ -120,7 +120,7 @@ spark = SparkSession.builder \
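<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>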
```
@@ -131,7 +131,7 @@ spark = SparkSession.builder \
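<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>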
```
@@ -142,7 +142,7 @@ spark = SparkSession.builder \
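<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>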
```
@@ -154,28 +154,28 @@ spark = SparkSession.builder \
```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0"
```
**spark-nlp-gpu:**
```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0"
```
**spark-nlp-silicon:**
```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0"
```
**spark-nlp-aarch64:**
```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0"
```
Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp)
@@ -257,7 +257,7 @@ maven coordinates like these:
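<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>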
```
@@ -265,7 +265,7 @@ or in case of sbt:
```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0"
```
If everything went well, you can now start Spark NLP with the `m1` flag set to `true`:
@@ -302,7 +302,7 @@ spark = sparknlp.start(apple_silicon=True)
## Installation for Linux Aarch64 Systems
-Starting from version 5.4.0-rc2, Spark NLP supports Linux systems running on an aarch64
+Starting from version 5.4.0, Spark NLP supports Linux systems running on an aarch64
processor architecture. The necessary dependencies have been built on Ubuntu 16.04, so a
recent system with an environment of at least that will be needed.
@@ -350,7 +350,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -p is for pyspark
# -s is for spark-nlp
# by default they are set to the latest
-!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
+!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines.
@@ -372,7 +372,7 @@ Run the following code in Kaggle Kernel and start using spark-nlp right away.
## Databricks Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtimes:
+Spark NLP 5.4.0 has been tested and is compatible with the following runtimes:
**CPU:**
@@ -454,7 +454,7 @@ Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtime
3.1. Install New -> PyPI -> `spark-nlp` -> Install
- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0` -> Install
4. Now you can attach your notebook to the cluster and use Spark NLP!
@@ -474,7 +474,7 @@ Note: You can import these notebooks by using their URLs.
## EMR Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.4.0 has been tested and is compatible with the following EMR releases:
- emr-6.2.0
- emr-6.3.0
@@ -537,7 +537,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
"spark.kryoserializer.buffer.max": "2000M",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0"
}
}
]
@@ -547,7 +547,7 @@ A sample of AWS CLI to launch EMR cluster:
```sh
aws emr create-cluster \
---name "Spark NLP 5.4.0-rc2" \
+--name "Spark NLP 5.4.0" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -812,7 +812,7 @@ We recommend using `conda` to manage your Python environment on Windows.
Now you can use the downloaded binary by navigating to `%SPARK_HOME%\bin` and
running
-Either create a conda env for python 3.6, install *pyspark==3.3.1 spark-nlp numpy* and use Jupyter/python console, or in the same conda env you can go to spark bin for *pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2*.
+Either create a conda env for Python 3.6, install *pyspark==3.3.1 spark-nlp numpy*, and use a Jupyter/Python console, or, in the same conda env, go to the Spark bin directory and run *pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0*.
@@ -840,12 +840,12 @@ spark = SparkSession.builder \
.config("spark.driver.memory","16G")\
.config("spark.driver.maxResultSize", "0") \
.config("spark.kryoserializer.buffer.max", "2000M")\
- .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc2.jar")\
+ .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0.jar")\
.getOrCreate()
```
- You can download provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases), please pay attention to pick the one that suits your environment depending on the device (CPU/GPU) and Apache Spark version (3.x)
-- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc2.jar`)
+- If you are running locally, you can load the Fat JAR from your local file system; in a cluster setup, however, you need to put the Fat JAR on a distributed file system such as HDFS, DBFS, or S3 (e.g., `hdfs:///tmp/spark-nlp-assembly-5.4.0.jar`)
Example of using pretrained Models and Pipelines in offline:
diff --git a/docs/en/spark_nlp.md b/docs/en/spark_nlp.md
index d96db75e028196..dac35142b800e6 100644
--- a/docs/en/spark_nlp.md
+++ b/docs/en/spark_nlp.md
@@ -25,7 +25,7 @@ Spark NLP is built on top of **Apache Spark 3.x**. For using Spark NLP you need:
**GPU (optional):**
-Spark NLP 5.4.0-rc2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 5.4.0 is built with TensorFlow 2.7.1, and the following NVIDIA® software is required only for GPU support:
- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
diff --git a/python/README.md b/python/README.md
index 062875565dacc8..cb7c32736e8638 100644
--- a/python/README.md
+++ b/python/README.md
@@ -114,6 +114,7 @@ documentation and examples
- INSTRUCTOR Embeddings (HuggingFace models)
- E5 Embeddings (HuggingFace models)
- MPNet Embeddings (HuggingFace models)
+- UAE Embeddings (HuggingFace models)
- OpenAI Embeddings
- Sentence & Chunk Embeddings
- Unsupervised keywords extraction
@@ -138,7 +139,7 @@ documentation and examples
- Text-To-Text Transfer Transformer (Google T5)
- Generative Pre-trained Transformer 2 (OpenAI GPT2)
- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
-- Chat and Conversational LLMs (Facebook Llama-22)
+- Chat and Conversational LLMs (Facebook Llama-2)
- Vision Transformer (Google ViT)
- Swin Image Classification (Microsoft Swin Transformer)
- ConvNext Image Classification (Facebook ConvNext)
@@ -148,10 +149,10 @@ documentation and examples
- Automatic Speech Recognition (HuBERT)
- Automatic Speech Recognition (OpenAI Whisper)
- Named entity recognition (Deep learning)
-- Easy ONNX and TensorFlow integrations
+- Easy ONNX, OpenVINO, and TensorFlow integrations
- GPU Support
- Full integration with Spark ML functions
-- +30000 pre-trained models in +200 languages!
+- +31000 pre-trained models in +200 languages!
- +6000 pre-trained pipelines in +200 languages!
- Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
@@ -165,7 +166,7 @@ To use Spark NLP you need the following requirements:
**GPU (optional):**
-Spark NLP 5.4.0-rc2 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support:
+Spark NLP 5.4.0 is built with the ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The following minimum NVIDIA® software is required only for GPU support:
- NVIDIA® GPU drivers version 450.80.02 or higher
- CUDA® Toolkit 11.2
@@ -181,7 +182,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1
```
In Python console or Jupyter `Python3` kernel:
@@ -226,10 +227,11 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh
## Apache Spark Support
-Spark NLP *5.4.0-rc2* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *5.4.0* is built on top of Apache Spark 3.4 and fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x.
| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO |
@@ -239,12 +241,6 @@ Spark NLP *5.4.0-rc2* has been built on top of Apache Spark 3.4 while fully supp
| 4.2.x | NO | NO | YES | YES | YES | YES | NO | NO |
| 4.1.x | NO | NO | YES | YES | YES | YES | NO | NO |
| 4.0.x | NO | NO | YES | YES | YES | YES | NO | NO |
-| 3.4.x | NO | NO | N/A | Partially | YES | YES | YES | YES |
-| 3.3.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.2.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.1.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 3.0.x | NO | NO | NO | NO | YES | YES | YES | YES |
-| 2.7.x | NO | NO | NO | NO | NO | NO | YES | YES |
Find out more about `Spark NLP` versions from our [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases).
@@ -261,16 +257,10 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
| 4.2.x | YES | YES | YES | YES | YES | NO | YES |
| 4.1.x | YES | YES | YES | YES | NO | NO | YES |
| 4.0.x | YES | YES | YES | YES | NO | NO | YES |
-| 3.4.x | YES | YES | YES | YES | NO | YES | YES |
-| 3.3.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.2.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.1.x | YES | YES | YES | NO | NO | YES | YES |
-| 3.0.x | YES | YES | YES | NO | NO | YES | YES |
-| 2.7.x | YES | YES | NO | NO | NO | YES | NO |
## Databricks Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtimes:
+Spark NLP 5.4.0 has been tested and is compatible with the following runtimes:
**CPU:**
@@ -343,7 +333,7 @@ Spark NLP 5.4.0-rc2 has been tested and is compatible with the following runtime
## EMR Support
-Spark NLP 5.4.0-rc2 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.4.0 has been tested and is compatible with the following EMR releases:
- emr-6.2.0
- emr-6.3.0
@@ -393,11 +383,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x,
```sh
# CPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
The `spark-nlp` has been published to
@@ -406,11 +396,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# GPU
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0
```
@@ -420,11 +410,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# AArch64
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0
```
@@ -434,11 +424,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s
```sh
# M1/M2 (Apple Silicon)
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0-rc2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0
```
@@ -452,7 +442,7 @@ set in your SparkSession:
spark-shell \
--driver-memory 16g \
--conf spark.kryoserializer.buffer.max=2000M \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
## Scala
@@ -470,7 +460,7 @@ coordinates:
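<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>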
```
@@ -481,7 +471,7 @@ coordinates:
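<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>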
```
@@ -492,7 +482,7 @@ coordinates:
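<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>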
```
@@ -503,7 +493,7 @@ coordinates:
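<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
-    <version>5.4.0-rc2</version>
+    <version>5.4.0</version>
</dependency>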
```
@@ -513,28 +503,28 @@ coordinates:
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0"
```
**spark-nlp-gpu:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0"
```
**spark-nlp-aarch64:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0"
```
**spark-nlp-silicon:**
```sbtshell
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0-rc2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0"
```
Maven
@@ -556,7 +546,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through
Pip:
```bash
-pip install spark-nlp==5.4.0-rc2
+pip install spark-nlp==5.4.0
```
Conda:
@@ -585,7 +575,7 @@ spark = SparkSession.builder
.config("spark.driver.memory", "16G")
.config("spark.driver.maxResultSize", "0")
.config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
.getOrCreate()
```
@@ -656,7 +646,7 @@ Use either one of the following options
- Add the following Maven Coordinates to the interpreter's library list
```bash
-com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
- Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is
@@ -667,7 +657,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
Apart from the previous step, install the python module through pip
```bash
-pip install spark-nlp==5.4.0-rc2
+pip install spark-nlp==5.4.0
```
Or you can install `spark-nlp` from inside Zeppelin by using Conda:
@@ -695,7 +685,7 @@ launch the Jupyter from the same Python environment:
$ conda create -n sparknlp python=3.8 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.4.0-rc2 pyspark==3.3.1 jupyter
+$ pip install spark-nlp==5.4.0 pyspark==3.3.1 jupyter
$ jupyter notebook
```
@@ -712,7 +702,7 @@ export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp`
@@ -739,7 +729,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb)
@@ -762,7 +752,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi
# -s is for spark-nlp
# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage
# by default they are set to the latest
-!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0-rc2
+!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0
```
[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live
@@ -781,9 +771,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP
3. In `Libraries` tab inside your cluster you need to follow these steps:
- 3.1. Install New -> PyPI -> `spark-nlp==5.4.0-rc2` -> Install
+ 3.1. Install New -> PyPI -> `spark-nlp==5.4.0` -> Install
- 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2` -> Install
+ 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0` -> Install
4. Now you can attach your notebook to the cluster and use Spark NLP!
@@ -834,7 +824,7 @@ A sample of your software configuration in JSON on S3 (must be public access):
"spark.kryoserializer.buffer.max": "2000M",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.driver.maxResultSize": "0",
- "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2"
+ "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0"
}
}]
```
@@ -843,7 +833,7 @@ A sample of AWS CLI to launch EMR cluster:
```.sh
aws emr create-cluster \
---name "Spark NLP 5.4.0-rc2" \
+--name "Spark NLP 5.4.0" \
--release-label emr-6.2.0 \
--applications Name=Hadoop Name=Spark Name=Hive \
--instance-type m4.4xlarge \
@@ -907,7 +897,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \
--enable-component-gateway \
--metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \
--initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \
- --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI.
@@ -950,7 +940,7 @@ spark = SparkSession.builder
.config("spark.kryoserializer.buffer.max", "2000m")
.config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained")
.config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage")
- .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2")
+ .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0")
.getOrCreate()
```
@@ -964,7 +954,7 @@ spark-shell \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
**pyspark:**
@@ -977,7 +967,7 @@ pyspark \
--conf spark.kryoserializer.buffer.max=2000M \
--conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \
--conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \
- --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0-rc2
+ --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0
```
**Databricks:**
@@ -1249,7 +1239,7 @@ spark = SparkSession.builder
.config("spark.driver.memory", "16G")
.config("spark.driver.maxResultSize", "0")
.config("spark.kryoserializer.buffer.max", "2000M")
- .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0-rc2.jar")
+ .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0.jar")
.getOrCreate()
```
@@ -1258,7 +1248,7 @@ spark = SparkSession.builder
version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x)
- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need
to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (
- i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0-rc2.jar`)
+ i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0.jar`)
Example of using pretrained Models and Pipelines in offline:
diff --git a/python/docs/conf.py b/python/docs/conf.py
index e70ce34034473d..88d28fb0e8a4e8 100644
--- a/python/docs/conf.py
+++ b/python/docs/conf.py
@@ -23,7 +23,7 @@
author = "John Snow Labs"
# The full version, including alpha/beta/rc tags
-release = "5.4.0-rc2"
+release = "5.4.0"
pyspark_version = "3.2.3"
# -- General configuration ---------------------------------------------------
diff --git a/python/setup.py b/python/setup.py
index 5075f6e4a79380..53fb03dbfdd3e5 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -41,7 +41,7 @@
# project code, see
# https://packaging.python.org/en/latest/single_source_version.html
- version='5.4.0-rc2', # Required
+ version='5.4.0', # Required
# This is a one-line description or tagline of what your project does. This
# corresponds to the 'Summary' metadata field:
diff --git a/python/sparknlp/__init__.py b/python/sparknlp/__init__.py
index affe3d86179961..2ccd94083a04fb 100644
--- a/python/sparknlp/__init__.py
+++ b/python/sparknlp/__init__.py
@@ -128,7 +128,7 @@ def start(gpu=False,
The initiated Spark session.
"""
- current_version = "5.4.0-rc2"
+ current_version = "5.4.0"
if params is None:
params = {}
@@ -309,4 +309,4 @@ def version():
str
The current Spark NLP version.
"""
- return '5.4.0-rc2'
+ return '5.4.0'
diff --git a/scripts/colab_setup.sh b/scripts/colab_setup.sh
index 1f89570a223ae4..1871e9364d837d 100644
--- a/scripts/colab_setup.sh
+++ b/scripts/colab_setup.sh
@@ -1,7 +1,7 @@
#!/bin/bash
#default values for pyspark, spark-nlp, and SPARK_HOME
-SPARKNLP="5.4.0-rc2"
+SPARKNLP="5.4.0"
PYSPARK="3.2.3"
while getopts s:p:g option
diff --git a/scripts/kaggle_setup.sh b/scripts/kaggle_setup.sh
index 4dc900fc53c74c..847624604a69a9 100644
--- a/scripts/kaggle_setup.sh
+++ b/scripts/kaggle_setup.sh
@@ -1,7 +1,7 @@
#!/bin/bash
#default values for pyspark, spark-nlp, and SPARK_HOME
-SPARKNLP="5.4.0-rc2"
+SPARKNLP="5.4.0"
PYSPARK="3.2.3"
while getopts s:p:g option
diff --git a/scripts/sagemaker_setup.sh b/scripts/sagemaker_setup.sh
index fdced8a0b19452..2b147480f4ed5a 100644
--- a/scripts/sagemaker_setup.sh
+++ b/scripts/sagemaker_setup.sh
@@ -1,7 +1,7 @@
#!/bin/bash
# Default values for pyspark, spark-nlp, and SPARK_HOME
-SPARKNLP="5.4.0-rc2"
+SPARKNLP="5.4.0"
PYSPARK="3.2.3"
echo "Setup SageMaker for PySpark $PYSPARK and Spark NLP $SPARKNLP"
diff --git a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
index c20ee554181283..d87a3f5d47e860 100644
--- a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
+++ b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala
@@ -20,7 +20,7 @@ import org.apache.spark.sql.SparkSession
object SparkNLP {
- val currentVersion = "5.4.0-rc2"
+ val currentVersion = "5.4.0"
val MavenSpark3 = s"com.johnsnowlabs.nlp:spark-nlp_2.12:$currentVersion"
val MavenGpuSpark3 = s"com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:$currentVersion"
val MavenSparkSilicon = s"com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:$currentVersion"
diff --git a/src/main/scala/com/johnsnowlabs/util/Build.scala b/src/main/scala/com/johnsnowlabs/util/Build.scala
index 09b269d771f8c1..7d20d0cf72106a 100644
--- a/src/main/scala/com/johnsnowlabs/util/Build.scala
+++ b/src/main/scala/com/johnsnowlabs/util/Build.scala
@@ -17,5 +17,5 @@
package com.johnsnowlabs.util
object Build {
- val version: String = "5.4.0-rc2"
+ val version: String = "5.4.0"
}