diff --git a/CHANGELOG b/CHANGELOG
index 8912a2e7a41dac..5a09975bb542a6 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,36 @@
+========
+5.5.1
+========
+----------------
+New Features & Enhancements
+----------------
+* `BertForMultipleChoice` Transformer Added. Extends BERT's capabilities to multiple-choice tasks such as standardized test questions and survey or quiz automation.
+* Integrated New Tasks and Documentation:
+  * Added support and documentation for the following tasks:
+    * Automatic Speech Recognition
+    * Dependency Parsing
+    * Image Captioning
+    * Image Classification
+    * Landing Page
+    * Question Answering
+    * Summarization
+    * Table Question Answering
+    * Text Classification
+    * Text Generation
+    * Text Preprocessing
+    * Token Classification
+    * Translation
+    * Zero-Shot Classification
+    * Zero-Shot Image Classification
+* `PromptAssembler` Annotator Introduced. A new annotator that constructs prompts for LLMs from a chat template and a sequence of messages. It accepts an array of tuples with roles ("system", "user", "assistant") and message texts, and uses llama.cpp as a backend for template parsing, supporting basic template applications.
+
+----------------
+Bug Fixes
+----------------
+* Resolved Pretrained Model Loading Issue on DBFS Systems.
+* Fixed a bug where pretrained models were not found when running AutoGGUF model pipelines on Databricks due to incorrect path handling of gguf files.
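The multiple-choice task that `BertForMultipleChoice` targets can be illustrated without Spark: score each (question, choice) pair and keep the highest-scoring choice. The sketch below is a deliberately naive stand-in — `overlap_score` replaces the BERT model with word overlap, and every name in it is hypothetical, not the Spark NLP API.

```python
# Hedged sketch of the multiple-choice task shape: rank candidate answers
# for a question and return the best one. `overlap_score` is a toy stand-in
# for the real BERT scorer.
def overlap_score(question: str, choice: str) -> int:
    # Count lowercase words shared between the question and the choice.
    return len(set(question.lower().split()) & set(choice.lower().split()))

def pick_choice(question: str, choices: list[str]) -> str:
    # Score every candidate and return the argmax.
    return max(choices, key=lambda c: overlap_score(question, c))

best = pick_choice(
    "Which library runs NLP pipelines on Apache Spark?",
    ["A relational database",
     "Spark NLP runs pipelines on Apache Spark",
     "A web framework"],
)
```

The real transformer would replace the scorer with model logits, but the select-the-argmax contract is the same.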
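`PromptAssembler`'s job — turning (role, text) messages into a single templated prompt string — can be sketched in plain Python. The `<|role|>` markers below are a generic chat-template shape chosen only for illustration; the actual annotator delegates template parsing to llama.cpp, and none of these names are its API.

```python
# Hedged sketch: render (role, text) tuples into one prompt string with a
# minimal, made-up chat template. Illustrates the concept, not the real
# template llama.cpp would apply.
def apply_chat_template(messages: list[tuple[str, str]]) -> str:
    parts = [f"<|{role}|>\n{text}" for role, text in messages]
    # Leave the assistant turn open so the LLM continues from here.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = apply_chat_template([
    ("system", "You are a helpful assistant."),
    ("user", "Summarize what Spark NLP does."),
])
```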
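The DBFS bug fix concerns path translation: on Databricks, a `dbfs:/` URI is exposed to local file APIs through the `/dbfs/` FUSE mount. A hedged illustration of that mapping (the helper name and exact handling inside Spark NLP are assumptions):

```python
# Hedged illustration of the kind of path handling the DBFS fix addresses:
# a "dbfs:/" URI corresponds to the "/dbfs/" mount for local file APIs on
# Databricks. Illustrative only, not the Spark NLP implementation.
def dbfs_to_local(path: str) -> str:
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path  # already a local filesystem path

local = dbfs_to_local("dbfs:/models/llama.gguf")
```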
+
+
 ========
 5.5.0
 ========
diff --git a/README.md b/README.md
index 77ae82f10edec9..e5af113964073d 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.5.0 pyspark==3.3.1
+$ pip install spark-nlp==5.5.1 pyspark==3.3.1
 ```
 
 In Python console or Jupyter `Python3` kernel:
@@ -129,7 +129,7 @@ For a quick example of using pipelines and models take a look at our official [d
 ### Apache Spark Support
 
-Spark NLP *5.5.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *5.5.1* has been built on top of Apache Spark 3.4 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
 
 | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
@@ -157,7 +157,7 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http
 ### Databricks Support
 
-Spark NLP 5.5.0 has been tested and is compatible with the following runtimes:
+Spark NLP 5.5.1 has been tested and is compatible with the following runtimes:
 
 | **CPU** | **GPU** |
 |--------------------|--------------------|
@@ -174,7 +174,7 @@ We are compatible with older runtimes. For a full list check databricks support
 ### EMR Support
 
-Spark NLP 5.5.0 has been tested and is compatible with the following EMR releases:
+Spark NLP 5.5.1 has been tested and is compatible with the following EMR releases:
 
 | **EMR Release** |
 |--------------------|
@@ -205,7 +205,7 @@ deployed to Maven central.
 To add any of our packages as a dependency in your application
 from our official documentation.
 
 If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your
 projects [Spark NLP SBT Starter](https://github.com/maziyarpanahi/spark-nlp-starter)
 
 ### Python
 
@@ -250,7 +250,7 @@ In Spark NLP we can define S3 locations to:
 
 Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration)
 from our official documentation.
 
 ## Documentation
 
 ### Examples
 
@@ -283,7 +283,7 @@ the Spark NLP library:
 
   keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
   abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}
 }
 ```
 
 ## Community support
 
diff --git a/build.sbt b/build.sbt
index 4f35f22f8ae570..7a75e4bb134db8 100644
--- a/build.sbt
+++ b/build.sbt
@@ -6,7 +6,7 @@
 name := getPackageName(is_silicon, is_gpu, is_aarch64)
 
 organization := "com.johnsnowlabs.nlp"
 
-version := "5.5.0"
+version := "5.5.1"
 
 (ThisBuild / scalaVersion) := scalaVer
 
@@ -156,7 +156,8 @@ lazy val utilDependencies = Seq(
     exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor"),
   greex,
   azureIdentity,
-  azureStorage)
+  azureStorage,
+  jsoup)
 
 lazy val typedDependencyParserDependencies = Seq(junit)
 
@@ -185,8 +186,8 @@ val llamaCppDependencies =
     Seq(llamaCppGPU)
   else if (is_silicon.equals("true"))
     Seq(llamaCppSilicon)
-//  else if (is_aarch64.equals("true"))
-//    Seq(openVinoCPU)
+  else if (is_aarch64.equals("true"))
+    Seq(llamaCppAarch64)
   else
     Seq(llamaCppCPU)
diff --git a/conda/meta.yaml b/conda/meta.yaml
index b6b5439c39f6d2..a9f693b9fc12ad 100644
--- a/conda/meta.yaml
+++ b/conda/meta.yaml
@@ -1,5 +1,5 @@
 {% set name = "spark-nlp" %}
-{% set version = "5.5.0" %}
+{% set version = "5.5.1" %}
 
 package:
   name: {{ name|lower }}
@@ -7,7 +7,7 @@ package:
 source:
   url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz
-  sha256: edc71585f462f548770bd13899686f10d88fa4a4a6e201bc1bf9c7711e398dc0
+  sha256: e8ddaf939a1b0acbe0d7b6d6a67f7fa0c5a73339d9e4563e3c1aba1cf0039409
 
 build:
   noarch: python
diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
index c6e75a2a846237..85688b6c357880 100755
--- a/docs/_data/navigation.yml
+++ b/docs/_data/navigation.yml
@@ -44,6 +44,8 @@ sparknlp:
       url: /docs/en/pipelines
     - title: General Concepts
       url: /docs/en/concepts
+    - title: Tasks
+      url: /docs/en/tasks/landing_page
     - title: Annotators
       url: /docs/en/annotators
     - title: Transformers
diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html
index c67ff52b47e214..105f3bde451c47 100755
--- a/docs/_layouts/landing.html
+++ b/docs/_layouts/landing.html
@@ -201,7 +201,7 @@
 
 {{ _section.title }}
 
 {% highlight bash %}
 # Using PyPI
-$ pip install spark-nlp==5.5.0
+$ pip install spark-nlp==5.5.1
 
 # Using Anaconda/Conda
 $ conda install -c johnsnowlabs spark-nlp
diff --git a/docs/_posts/ahmedlone127/2023-12-02-a2_en.md b/docs/_posts/ahmedlone127/2023-12-02-a2_en.md
index b5e31ebec81df4..04aaac443bed1e 100644
--- a/docs/_posts/ahmedlone127/2023-12-02-a2_en.md
+++ b/docs/_posts/ahmedlone127/2023-12-02-a2_en.md
@@ -7,7 +7,7 @@
 date: 2023-12-02
 tags: [roberta, en, open_source, sequence_classification, onnx]
 task: Text Classification
 language: en
-edition: Spark NLP 5.2.0
+edition: Spark NLP 5.3.0
 spark_version: 3.0
 supported: true
 engine: onnx
diff --git a/docs/api/com/index.html b/docs/api/com/index.html
index 770f6f64dbc4fc..771e3d19586f9b 100644
--- a/docs/api/com/index.html
+++ b/docs/api/com/index.html
@@ -3,9 +3,9 @@
-Spark NLP 5.5.0 ScalaDoc - com
-
-
+Spark NLP 5.5.1 ScalaDoc - com
+
+
@@ -28,7 +28,7 @@