diff --git a/docs/en/ocr_pipeline_components.md b/docs/en/ocr_pipeline_components.md
index 913c754df5..6ee254374c 100644
--- a/docs/en/ocr_pipeline_components.md
+++ b/docs/en/ocr_pipeline_components.md
@@ -1356,6 +1356,89 @@ data.select("dicom").show()
+#### Input Columns
+| Param name | Type | Default | Column Data Description |
+| --- | --- | --- | --- |
+| inputCol | string | content | Binary DICOM object |
+| inputRegionsCol | string | regions | Detected Array[Coordinates] from PositionFinder |
+#### Parameters
+| Param name | Type | Default | Description |
+| --- | --- | --- | --- |
+| scaleFactor | float | 1.0 | Scaling factor for regions. |
+| rotated | boolean | False | Enable or disable support for rotated rectangles |
+| keepInput | boolean | False | Keep the original input column |
+| compression | string | RLELossless | Compression type |
+| forceCompress | boolean | False | True: always compress the image. False: compress only if the original image was compressed |
+| aggCols | Array[string] | ['path'] | Sets the columns to be included in aggregation. These columns are preserved in the output DataFrame after transformations |
## Image pre-processing
Next section describes the transformers for image pre-processing: scaling, binarization, skew correction, etc.
@@ -2896,6 +2979,11 @@ val result = modelPipeline.transform(df)
| lineWidth | Int | 4 | Line width for draw rectangles |
| fontSize | Int | 12 | Font size for render labels and score |
| rotated | boolean | False | Support rotated regions |
+| rectColor | Color | Color.black | Outline color for bounding boxes |
+| filledRect | boolean | False | Enable or disable filled rectangles |
+| sourceImageHeightCol | string | height_dimension | Column with the original annotation reference height |
+| sourceImageWidthCol | string | width_dimension | Column with the original annotation reference width |
+| scaleBoundingBoxes | boolean | True | Scale bounding boxes to match the transformed image. Requires the source image height and width. Needed to keep regions accurate after image transformations. |
@@ -2915,13 +3003,12 @@ val result = modelPipeline.transform(df)
from pyspark.ml import PipelineModel
from sparkocr.transformers import *
+from sparkocr.enums import *
imagePath = "path to image"
# Read image file as binary file
-df = spark.read
- .format("binaryFile")
- .load(imagePath)
+df = spark.read.format("binaryFile").load(imagePath)
binary_to_image = BinaryToImage() \
.setInputCol("content") \
@@ -2935,6 +3022,7 @@ layout_analyzer = ImageLayoutAnalyzer() \
draw = ImageDrawRegions() \
.setInputCol("image") \
.setRegionCol("regions") \
+ .setRectColor(Color.red) \
# Define pipeline
@@ -2950,17 +3038,16 @@ data.show()
import org.apache.spark.ml.Pipeline
+import java.awt.Color
import com.johnsnowlabs.ocr.transformers.{ImageSplitRegions, ImageLayoutAnalyzer}
import com.johnsnowlabs.ocr.OcrContext.implicits._
val imagePath = "path to image"
// Read image file as binary file
-val df = spark.read
- .format("binaryFile")
- .load(imagePath)
- .asImage("image")
+val df = spark.read.format("binaryFile").load(imagePath).asImage("image")
// Define transformer for detect regions
val layoutAnalyzer = new ImageLayoutAnalyzer()
@@ -2970,6 +3057,7 @@ val layoutAnalyzer = new ImageLayoutAnalyzer()
val draw = new ImageDrawRegions()
+ .setRectColor(Color.RED)
// Define pipeline
-Fits a model to the input dataset with optional parameters.
(dataset, paramMaps)
-Fits a model to the input dataset for each param map in paramMaps .
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets a path to the the classification model if it has been already trained.
-Sets whether to fit an intercept term, default is true.
-Sets column names of input annotations.
-Sets column with the value result we are trying to predict.
-Sets array to output the label in the original form.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets maximum number of iterations.
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets convergence tolerance after each iteration.
-Sets a path to the the classification model if it has been already trained.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-fit ( dataset , params = None )
-Fits a model to the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset.
-params dict or list or tuple, optional an optional param map that overrides embedded params. If a list/tuple of
-param maps is given, this calls fit on each param map and returns a list of
or a list of Transformer
fitted model(s)
-fitMultiple ( dataset , paramMaps )
-Fits a model to the input dataset for each param map in paramMaps .
-dataset pyspark.sql.DataFrame
input dataset.
-paramMaps collections.abc.Sequence
A Sequence of param maps.
A thread safe iterable which contains one model for each param map. Each
-call to next(modelIterator) will return (index, model) where model was fit
-using paramMaps[index] . index values may not be sequential.
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setClassificationModelPath ( value ) [source]
-Sets a path to the the classification model if it has been already trained.
-label str Path to the the classification model if it has been already trained.
-setFitIntercept ( merge ) [source]
-Sets whether to fit an intercept term, default is true.
-label str Whether to fit an intercept term, default is true.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLabelColumn ( label ) [source]
-Sets column with the value result we are trying to predict.
-label str Column with the value result we are trying to predict.
-setLabels ( value ) [source]
-Sets array to output the label in the original form.
-label list array to output the label in the original form.
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a
-value bool Whether Annotator should be evaluated lazily in a
-setMaxIter ( k ) [source]
-Sets maximum number of iterations.
-k int Maximum number of iterations.
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setTol ( dist ) [source]
-Sets convergence tolerance after each iteration.
-dist float Convergence tolerance after each iteration.
-setVectorizationModelPath ( value ) [source]
-Sets a path to the the classification model if it has been already trained.
-label str Path to the the classification model if it has been already trained.
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DocumentLogRegClassifierModel.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DocumentLogRegClassifierModel.html
deleted file mode 100644
index 2e88ca74be..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DocumentLogRegClassifierModel.html
+++ /dev/null
@@ -1,1097 +0,0 @@
sparknlp_jsl.annotator.DocumentLogRegClassifierModel — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. DocumentLogRegClassifierModel ( classname = 'com.johnsnowlabs.nlp.annotators.classification.DocumentLogRegClassifierModel' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel
-Classifies documents with a Logarithmic Regression algorithm.
-Input Annotation types
-Output Annotation type
-mergeChunks Whether to merge all chunks in a document or not (Default: false)
-labels Array to output the label in the original form.
-vectorizationModel Vectorization model if it has been already trained.
-classificationModel Classification model if it has been already trained.
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(name[, lang, remote_loc])
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets a path to the the classification model if it has been already trained.
-Sets column names of input annotations.
-Sets array to output the label in the original form.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets hether to merge all chunks in a document or not (Default: false)
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets a path to the the classification model if it has been already trained.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setClassificationModel ( merge ) [source]
-Sets a path to the the classification model if it has been already trained.
-label: :class:`pyspark.ml.PipelineModel` Classification model if it has been already trained.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLabels ( value ) [source]
-Sets array to output the label in the original form.
-label list array to output the label in the original form.
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a
-value bool Whether Annotator should be evaluated lazily in a
-setMergeChunks ( merge ) [source]
-Sets hether to merge all chunks in a document or not (Default: false)
-label list whether to merge all chunks in a document or not (Default: false)
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setVectorizationModel ( merge ) [source]
-Sets a path to the the classification model if it has been already trained.
-label: :class:`pyspark.ml.PipelineModel` Classification model if it has been already trained.
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DrugNormalizer.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DrugNormalizer.html
deleted file mode 100644
index 655a68a2a4..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.DrugNormalizer.html
+++ /dev/null
@@ -1,1072 +0,0 @@
sparknlp_jsl.annotator.DrugNormalizer — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. DrugNormalizer [source]
-Bases: sparknlp.common.AnnotatorModel
-Annotator which normalizes raw text from clinical documents, e.g. scraped web pages or xml documents, from document type columns into Sentence. Removes all dirty characters from text following one or more input regex patterns.
-Can apply non wanted character removal which a specific policy.
-Can apply lower case normalization.
-Input Annotation types
-Output Annotation type
-lowercase whether to convert strings to lowercase
-policy policy to remove patterns from text. Defaults “all”
->>> data = spark . createDataFrame ([
-... [ "Sodium Chloride/Potassium Chloride 13bag" ],
-... [ "interferon alfa-2b 10 million unit ( 1 ml ) injec" ],
-... [ "aspirin 10 meq/ 5 ml oral sol" ]
-... ]) . toDF ( "text" )
->>> document = DocumentAssembler () . setInputCol ( "text" ) . setOutputCol ( "document" )
->>> drugNormalizer = DrugNormalizer () . setInputCols ([ "document" ]) . setOutputCol ( "document_normalized" )
->>> trainingPipeline = Pipeline ( stages = [ document , drugNormalizer ])
->>> result = trainingPipeline . fit ( data ) . transform ( data )
->>> result . selectExpr ( "explode(document_normalized.result) as normalized_text" ) . show ( truncate = False )
-|normalized_text |
-|Sodium Chloride / Potassium Chloride 13 bag |
-|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|
-|aspirin 2 meq/ml oral solution |
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets whether to convert strings to lowercase
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets policy to remove patterns from text.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a
-value bool Whether Annotator should be evaluated lazily in a
-setLowercase ( value ) [source]
-Sets whether to convert strings to lowercase
-p bool Whether to convert strings to lowercase
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setPolicy ( value ) [source]
-Sets policy to remove patterns from text.
-p str policy to remove patterns from text.
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.EntityChunkEmbeddings.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.EntityChunkEmbeddings.html
deleted file mode 100644
index d3b57419a6..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.EntityChunkEmbeddings.html
+++ /dev/null
@@ -1,1356 +0,0 @@
sparknlp_jsl.annotator.EntityChunkEmbeddings — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. EntityChunkEmbeddings ( classname = 'com.johnsnowlabs.nlp.annotators.embeddings.EntityChunkEmbeddings' , java_model = None ) [source]
-Bases: sparknlp.annotator.BertSentenceEmbeddings
-Weighted average embeddings of multiple named entities chunk annotations
-Input Annotation types
-Output Annotation type
-Target entities and their related entities
-entityWeights Relative weights of entities.
-maxSyntacticDistance Maximal syntactic distance between related entities. Default value is 2.
-metformin 125 mg
-250 mg coumadin
-one pill paracetamol
-[-0.267413, 0.07614058, -0.5620966, 0.83838946, 0.8911504]
-[0.22319649, -0.07094894, -0.6885556, 0.79176235, 0.82672405]
-[-0.10939768, -0.29242, -0.3574444, 0.3981813, 0.79609615]
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current batch size.
-Gets whether to ignore case in tokens for embeddings matching.
-Gets embeddings dimension.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Gets unique reference name for identification.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(folder, spark_session)
-Loads a locally saved model.
([name, lang, remote_loc])
-Downloads and loads a pretrained model.
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size.
-Sets whether to ignore case in tokens for embeddings matching.
-Sets configProto from tensorflow, serialized into byte array.
-Sets embeddings dimension.
-Sets the relative weights of the embeddings of specific entities. By default the dictionary is empty and
-Sets column names of input annotations.
-Sets whether to use Long type instead of Int type for inputs buffer.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets max sentence length to process.
-Sets the maximal syntactic distance between related entities. Default value is 2. Parameters ---------- distance : int Maximal syntactic distance.
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets unique reference name for identification.
-Sets the target entities and maps them to their related entities.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getBatchSize ( )
-Gets current batch size.
-int Current batch size
-getCaseSensitive ( )
-Gets whether to ignore case in tokens for embeddings matching.
-bool Whether to ignore case in tokens for embeddings matching
-getDimension ( )
-Gets embeddings dimension.
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-getStorageRef ( )
-Gets unique reference name for identification.
-str Unique reference name for identification
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-static loadSavedModel ( folder , spark_session )
-Loads a locally saved model.
-folder str Folder of the saved model
-spark_session pyspark.sql.SparkSession The current SparkSession
-BertSentenceEmbeddings The restored model
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-static pretrained ( name = 'sbiobert_base_cased_mli' , lang = 'en' , remote_loc = 'clinical/models' ) [source]
-Downloads and loads a pretrained model.
-name str, optional Name of the pretrained model, by default “sent_small_bert_L2_768”
-lang str, optional Language of the pretrained model, by default “en”
-remote_loc str, optional Optional remote address of the resource, by default None. Will use
-Spark NLPs repositories otherwise.
-BertSentenceEmbeddings The restored model
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( v )
-Sets batch size.
-v int Batch size
-setCaseSensitive ( value )
-Sets whether to ignore case in tokens for embeddings matching.
-value bool Whether to ignore case in tokens for embeddings matching
-setConfigProtoBytes ( b )
-Sets configProto from tensorflow, serialized into byte array.
-b List[str] ConfigProto from tensorflow, serialized into byte array
-setDimension ( value )
-Sets embeddings dimension.
-value int Embeddings dimension
-setEntityWeights ( weights = {} ) [source]
-Sets the relative weights of the embeddings of specific entities. By default the dictionary is empty and all entities have equal weights. If non-empty and some entity is not in it, then its weight is set to 0.
-weights: dict[str, float] Dictionary with the relative weights of entities. The notation TARGET_ENTITY:RELATED_ENTITY can be used to
-specify the weight of an entity which is related to a specific target entity (e.g. “DRUG:SYMPTOM”: 0.3).
-Entity names are case insensitive.
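The lookup rules described for setEntityWeights can be sketched in plain Python. This is only an illustration under stated assumptions, not the annotator's implementation: the helper name `entity_weight` is hypothetical, "equal weights" is modelled as 1.0, and a TARGET:RELATED key is assumed to take precedence over a plain entity key.

```python
def entity_weight(weights, entity, target=None):
    """Hypothetical helper: resolve the relative weight of one entity."""
    if not weights:
        # Empty dictionary: all entities have equal weights (modelled here as 1.0).
        return 1.0
    # Entity names are case insensitive, so normalize the keys once.
    normalized = {key.upper(): float(value) for key, value in weights.items()}
    if target is not None:
        # Assumption: a TARGET:RELATED key wins over a plain entity key.
        specific = f"{target}:{entity}".upper()
        if specific in normalized:
            return normalized[specific]
    # Non-empty dictionary and entity not listed: weight is 0.
    return normalized.get(entity.upper(), 0.0)
```

For instance, with `{"DRUG:SYMPTOM": 0.3}` a SYMPTOM entity related to a DRUG target resolves to 0.3, while any entity absent from a non-empty dictionary resolves to 0.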
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setIsLong ( value )
-Sets whether to use Long type instead of Int type for inputs buffer.
-Some Bert models require Long instead of Int.
-value bool Whether to use Long type instead of Int type for inputs buffer
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setMaxSentenceLength ( value )
-Sets max sentence length to process.
-value int Max sentence length to process
-setMaxSyntacticDistance ( distance ) [source]
-Sets the maximal syntactic distance between related entities. Default value is 2.
-distance : int
-Maximal syntactic distance
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setStorageRef ( value )
-Sets unique reference name for identification.
-value str Unique reference name for identification
-setTargetEntities ( entities = {} ) [source]
-Sets the target entities and maps them to their related entities. A target entity with an empty list of
-related entities means all other entities are assumed to be related to it.
-entities: dict[str, list[str]] Dictionary with target and related entities (TARGET: [RELATED1, RELATED2,…]). If the list of related
-entities is empty, then all non-target entities are considered.
-Entity names are case insensitive.
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierApproach.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierApproach.html
deleted file mode 100644
index 088f52f995..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierApproach.html
+++ /dev/null
@@ -1,1286 +0,0 @@
sparknlp_jsl.annotator.GenericClassifierApproach — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. GenericClassifierApproach ( classname = 'com.johnsnowlabs.nlp.annotators.generic_classifier.GenericClassifierApproach' ) [source]
-Bases: sparknlp.common.AnnotatorApproach
-Trains a TensorFlow model for generic classification of feature vectors. It takes FEATURE_VECTOR annotations from
-FeaturesAssembler as input, classifies them and outputs CATEGORY annotations.
-Input Annotation types
-Output Annotation type
-labelColumn Column with one label per document
-batchSize Size for each batch in the optimization process
-epochsN Number of epochs for the optimization process
-learningRate Learning rate for the optimization process
-dropout Dropout at the output of each layer
-validationSplit Validation split - how much data to use for validation
-modelFile File name to load the model from
-fixImbalance A flag indicating whether to balance the training set
-featureScaling Feature scaling method. Possible values are ‘zscore’, ‘minmax’ or empty (no scaling)
-outputLogsPath Path to folder where logs will be saved. If no path is specified, no logs are generated
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.common import *
->>> from sparknlp.annotator import *
->>> from sparknlp.training import *
->>> import sparknlp_jsl
->>> from sparknlp_jsl.base import *
->>> from sparknlp_jsl.annotator import *
->>> from pyspark.ml import Pipeline
->>> features_asm = FeaturesAssembler () \
-... . setInputCols ([ "feature_1" , "feature_2" , "..." , "feature_n" ]) \
-... . setOutputCol ( "features" )
->>> gen_clf = GenericClassifierApproach () \
-... . setLabelCol ( "target" ) \
-... . setInputCols ([ "features" ]) \
-... . setOutputCol ( "prediction" ) \
-... . setModelFile ( "/path/to/graph_file.pb" ) \
-... . setEpochsNumber ( 50 ) \
-... . setBatchSize ( 100 ) \
-... . setFeatureScaling ( "zscore" ) \
-... . setLearningRate ( 0.001 ) \
-... . setFixImbalance ( True ) \
-... . setOutputLogsPath ( "logs" ) \
-... . setValidationSplit ( 0.2 ) # keep 20% of the data for validation purposes
->>> pipeline = Pipeline () . setStages ([
-... features_asm ,
-... gen_clf
-... ])
->>> clf_model = pipeline . fit ( data )
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
(dataset[, params])
-Fits a model to the input dataset with optional parameters.
(dataset, paramMaps)
-Fits a model to the input dataset for each param map in paramMaps .
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets the size for each batch in the optimization process
-Sets the dropout at the output of each layer
-Sets number of epochs for the optimization process
-Sets Feature scaling method.
-Sets a flag indicating whether to balance the training set.
-Sets column names of input annotations.
-Sets the column with the value we are trying to predict.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets learning rate for the optimization process
-Sets file name to load the model from
-Sets output column name of annotations.
-Sets path to folder where logs will be saved.
-Sets the value of a parameter.
-Sets validation split - how much data to use for validation
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
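The precedence stated above (default param values < user-supplied values < extra) can be sketched with plain dictionaries. This is a minimal illustration, not the pyspark implementation; the function name `merge_param_maps` and the sample parameter names are hypothetical.

```python
def merge_param_maps(defaults, user_supplied, extra=None):
    """Hypothetical helper: flatten three param maps, later maps win."""
    merged = dict(defaults)        # lowest priority
    merged.update(user_supplied)   # overrides defaults
    merged.update(extra or {})     # highest priority
    return merged


merged = merge_param_maps(
    defaults={"batchSize": 32, "dropout": 0.05},
    user_supplied={"batchSize": 100},
    extra={"dropout": 0.2},
)
```

Here `batchSize` comes from the user-supplied map and `dropout` from `extra`, since the latter value is used whenever keys conflict.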
-fit ( dataset , params = None )
-Fits a model to the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset.
-params dict or list or tuple, optional an optional param map that overrides embedded params. If a list/tuple of
-param maps is given, this calls fit on each param map and returns a list of
or a list of Transformer
fitted model(s)
-fitMultiple ( dataset , paramMaps )
-Fits a model to the input dataset for each param map in paramMaps .
-dataset pyspark.sql.DataFrame
input dataset.
-paramMaps collections.abc.Sequence
A Sequence of param maps.
A thread safe iterable which contains one model for each param map. Each
-call to next(modelIterator) will return (index, model) where model was fit
-using paramMaps[index] . index values may not be sequential.
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( size ) [source]
-Size for each batch in the optimization process
-size int Size for each batch in the optimization process
-setDropout ( dropout ) [source]
-Sets the dropout at the output of each layer
-dropout float Dropout at the output of each layer
-setEpochsNumber ( epochs ) [source]
-Sets number of epochs for the optimization process
-epochs int Number of epochs for the optimization process
-setFeatureScaling ( feature_scaling ) [source]
-Sets Feature scaling method. Possible values are ‘zscore’, ‘minmax’ or empty (no scaling)
-feature_scaling str Feature scaling method. Possible values are ‘zscore’, ‘minmax’ or empty (no scaling)
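The two named scaling methods can be illustrated on a single feature column in plain Python. This is a sketch of the standard definitions only; how the annotator computes them internally is not shown in this page, so the use of the population standard deviation and the handling of constant columns are assumptions, and `scale_feature` is a hypothetical name.

```python
from statistics import mean, pstdev


def scale_feature(values, method=""):
    """Hypothetical helper: scale one feature column."""
    if method == "zscore":
        # z-score: subtract the mean, divide by the (population) standard deviation.
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma if sigma else 0.0 for v in values]
    if method == "minmax":
        # min-max: map the observed range onto [0, 1].
        low, high = min(values), max(values)
        return [(v - low) / (high - low) if high != low else 0.0 for v in values]
    if method == "":
        # Empty string: no scaling.
        return list(values)
    raise ValueError(f"unknown feature scaling method: {method!r}")
```

With `"minmax"` the column `[1.0, 2.0, 3.0]` maps to `[0.0, 0.5, 1.0]`; with `"zscore"` the result is centred on zero.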
-setFixImbalance ( fix_imbalance ) [source]
-Sets a flag indicating whether to balance the training set.
-fix_imbalance bool A flag indicating whether to balance the training set.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLabelCol ( label_column ) [source]
-Sets the column with the value we are trying to predict.
-label_column str Column with the value we are trying to predict.
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setLearningRate ( lamda ) [source]
-Sets learning rate for the optimization process
-lamda float Learning rate for the optimization process
-setModelFile ( mode_file ) [source]
-Sets the file name to load the model from
-mode_file str File name to load the model from
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setOutputLogsPath ( output_logs_path ) [source]
-Sets path to folder where logs will be saved. If no path is specified, no logs are generated
-output_logs_path str Path to folder where logs will be saved. If no path is specified, no logs are generated
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setValidationSplit ( validation_split ) [source]
-Sets validation split - how much data to use for validation
-validation_split float Validation split - how much data to use for validation
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierModel.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierModel.html
deleted file mode 100644
index df8e8916d1..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.GenericClassifierModel.html
+++ /dev/null
@@ -1,1033 +0,0 @@
sparknlp_jsl.annotator.GenericClassifierModel — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. GenericClassifierModel ( classname = 'com.johnsnowlabs.nlp.annotators.generic_classifier.GenericClassifierModel' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel
-Generic classifier of feature vectors. It takes FEATURE_VECTOR annotations from
-FeaturesAssembler as input, classifies them and outputs CATEGORY annotations.
-Input Annotation types
-Output Annotation type
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.common import *
->>> from sparknlp.annotator import *
->>> from sparknlp.training import *
->>> import sparknlp_jsl
->>> from sparknlp_jsl.base import *
->>> from sparknlp_jsl.annotator import *
->>> from pyspark.ml import Pipeline
->>> features_asm = FeaturesAssembler () \
-... . setInputCols ([ "feature_1" , "feature_2" , "..." , "feature_n" ]) \
-... . setOutputCol ( "features" )
->>> gen_clf = GenericClassifierModel . pretrained () \
-... . setInputCols ([ "features" ]) \
-... . setOutputCol ( "prediction" )
->>> pipeline = Pipeline () . setStages ([
-... features_asm ,
-... gen_clf
-... ])
->>> clf_model = pipeline . fit ( data )
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(name[, lang, remote_loc])
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets output column name of annotations.
-Sets the value of a parameter.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.IOBTagger.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.IOBTagger.html
deleted file mode 100644
index 1478af6feb..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.IOBTagger.html
+++ /dev/null
@@ -1,1063 +0,0 @@
sparknlp_jsl.annotator.IOBTagger — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. IOBTagger ( classname = 'com.johnsnowlabs.nlp.annotators.ner.IOBTagger' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel
-Merges token tags and NER labels from chunks in the specified format.
-For example output columns as inputs from
-Input Annotation types
-Output Annotation type
-Scheme Format of tags, either IOB or BIOES
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.common import *
->>> from sparknlp.annotator import *
->>> from sparknlp.training import *
->>> import sparknlp_jsl
->>> from sparknlp_jsl.base import *
->>> from sparknlp_jsl.annotator import *
->>> from pyspark.ml import Pipeline
->>> data = spark . createDataFrame ([[ "A 63-year-old man presents to the hospital ..." ]]) . toDF ( "text" )
->>> documentAssembler = DocumentAssembler () . setInputCol ( "text" ) . setOutputCol ( "document" )
->>> sentenceDetector = SentenceDetector () . setInputCols ([ "document" ]) . setOutputCol ( "sentence" )
->>> tokenizer = Tokenizer () . setInputCols ([ "sentence" ]) . setOutputCol ( "token" )
->>> embeddings = WordEmbeddingsModel . pretrained ( "embeddings_clinical" , "en" , "clinical/models" ) . setOutputCol ( "embs" )
->>> nerModel = MedicalNerModel . pretrained ( "ner_jsl" , "en" , "clinical/models" ) . setInputCols ([ "sentence" , "token" , "embs" ]) . setOutputCol ( "ner" )
->>> nerConverter = NerConverter () . setInputCols ([ "sentence" , "token" , "ner" ]) . setOutputCol ( "ner_chunk" )
->>> iobTagger = IOBTagger () . setInputCols ([ "token" , "ner_chunk" ]) . setOutputCol ( "ner_label" )
->>> pipeline = Pipeline ( stages = [ documentAssembler , sentenceDetector , tokenizer , embeddings , nerModel , nerConverter , iobTagger ])
->>> result = pipeline . fit ( data ) . transform ( data )
->>> result . selectExpr ( "explode(ner_label) as a" ) \
-... . selectExpr ( "a.begin" , "a.end" , "a.result as chunk" , "a.metadata.word as word" ) \
-... . where ( "chunk!='O'" ) . show ( 5 , False )
-|begin|end|chunk |word |
-|5 |15 |B-Age |63-year-old|
-|17 |19 |B-Gender |man |
-|64 |72 |B-Modifier |recurrent |
-|98 |107|B-Diagnosis|cellulitis |
-|110 |119|B-Diagnosis|pneumonias |
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets format of tags, either IOB or BIOES
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir()
to get all attributes of type
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setScheme ( f ) [source]
-Sets format of tags, either IOB or BIOES
-f str Format of tags, either IOB or BIOES
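The difference between the two tag formats can be shown by converting a well-formed IOB sequence to BIOES in plain Python. This is a sketch of the standard tagging conventions, not of how IOBTagger produces its output; the function name `iob_to_bioes` is hypothetical.

```python
def iob_to_bioes(tags):
    """Hypothetical helper: convert IOB tags (B-, I-, O) to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The chunk continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # Single-token chunks become S-, multi-token chunks keep B-.
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:
            # The last token of a multi-token chunk becomes E-.
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes
```

In BIOES, single-token chunks such as `B-Age` become `S-Age`, and the final token of a longer chunk is marked with `E-`.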
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForSequenceClassification.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForSequenceClassification.html
deleted file mode 100644
index a5364a3b21..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForSequenceClassification.html
+++ /dev/null
@@ -1,1264 +0,0 @@
sparknlp_jsl.annotator.MedicalBertForSequenceClassification — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. MedicalBertForSequenceClassification ( classname = 'com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel
, sparknlp.common.HasCaseSensitiveProperties
, sparknlp.common.HasBatchedAnnotate
-MedicalBertForSequenceClassification can load BERT models with a sequence classification/regression head on top
-(a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
-Pretrained models can be loaded with pretrained()
of the companion
-For available pretrained models please see the Models Hub .
-Models from the HuggingFace 🤗 Transformers library are also compatible with
-Spark NLP 🚀. To see which models are compatible and how to import them see
-Import Transformers into Spark NLP 🚀 .
-Input Annotation types
-Output Annotation type
-batchSize Batch size. Larger values allow faster processing but require more
-memory, by default 8
-caseSensitive Whether to ignore case in tokens for embeddings matching, by default
-configProtoBytes ConfigProto from tensorflow, serialized into byte array.
-maxSentenceLength Max sentence length to process, by default 128
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.annotator import *
->>> from pyspark.ml import Pipeline
->>> documentAssembler = DocumentAssembler () \
-... . setInputCol ( "text" ) \
-... . setOutputCol ( "document" )
->>> tokenizer = Tokenizer () \
-... . setInputCols ([ "document" ]) \
-... . setOutputCol ( "token" )
->>> tokenClassifier = MedicalBertForSequenceClassification . pretrained () \
-... . setInputCols ([ "token" , "document" ]) \
-... . setOutputCol ( "label" ) \
-... . setCaseSensitive ( True )
->>> pipeline = Pipeline () . setStages ([
-... documentAssembler ,
-... tokenizer ,
-... tokenClassifier
-... ])
->>> data = spark . createDataFrame ([[ "I felt a bit drowsy and had blurred vision after taking Aspirin." ]]) . toDF ( "text" )
->>> result = pipeline . fit ( data ) . transform ( data )
->>> result . select ( "label.result" ) . show ( truncate = False )
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current batch size.
-Gets whether to ignore case in tokens for embeddings matching.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(folder, spark_session)
-Loads a locally saved model. Parameters: folder (str), folder of the saved model; spark_session (pyspark.sql.SparkSession), the current SparkSession.
-Loads a locally saved model.
([name, lang, remote_loc])
-Downloads and loads a pretrained model.
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size.
-Sets whether to ignore case in tokens for embeddings matching.
-Instead of 1 class per sentence (if inputCols is 'sentence'), output 1 class per document by averaging probabilities in all sentences.
-Sets configProto from tensorflow, serialized into byte array.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets max sentence length to process, by default 128.
-Sets output column name of annotations.
-Sets the value of a parameter.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
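The merge order described above (default param values < user-supplied values < extra) can be illustrated with plain dictionaries. This is only a sketch of the precedence rule, not pyspark's implementation; the helper name `merge_param_maps` and the sample values are hypothetical.

```python
# Sketch of the precedence rule only; `merge_param_maps` is a hypothetical helper,
# not part of pyspark or sparknlp_jsl.
def merge_param_maps(defaults, user_supplied, extra=None):
    """Return a flat param map where later sources win on conflict:
    defaults < user_supplied < extra."""
    merged = dict(defaults)          # start from the embedded defaults
    merged.update(user_supplied)     # user-supplied values override defaults
    merged.update(extra or {})       # extra values override everything else
    return merged


# Illustrative values (assumed for the example).
defaults = {"batchSize": 8, "maxSentenceLength": 128}
user_supplied = {"batchSize": 16}
extra = {"maxSentenceLength": 256}

merged = merge_param_maps(defaults, user_supplied, extra)
print(merged)
```

Copying `defaults` before updating keeps the input maps untouched, so the same defaults can be merged repeatedly with different `extra` values.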
-getBatchSize ( )
-Gets current batch size.
-int Current batch size
-getCaseSensitive ( )
-Gets whether to ignore case in tokens for embeddings matching.
-bool Whether to ignore case in tokens for embeddings matching
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-static loadSavedModel ( folder , spark_session ) [source]
-Loads a locally saved model.
-folder str Folder of the saved model
-spark_session pyspark.sql.SparkSession The current SparkSession
-MedicalBertForSequenceClassification The restored model
-static loadSavedModelOpenSource ( bertForTokenClassifierPath , tfModelPath , spark_session ) [source]
-Loads a locally saved model.
-bertForTokenClassifierPath str Folder of the bertForTokenClassifier
-tfModelPath str Folder that contains the TensorFlow model
-spark_session pyspark.sql.SparkSession The current SparkSession
-MedicalBertForSequenceClassification The restored model
-property params
-Returns all params ordered by name. The default implementation
-uses dir() to get all attributes of type Param.
-static pretrained ( name = 'bert_sequence_classifier_ade' , lang = 'en' , remote_loc = 'clinical/models' ) [source]
-Downloads and loads a pretrained model.
-name str, optional Name of the pretrained model.
-lang str, optional Language of the pretrained model, by default “en”
-remote_loc str, optional Optional remote address of the resource, by default 'clinical/models'
-(per the signature above). If None, Spark NLP's repositories are used.
-MedicalBertForSequenceClassification The restored model
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( v )
-Sets batch size.
-v int Batch size
-setCaseSensitive ( value )
-Sets whether to ignore case in tokens for embeddings matching.
-value bool Whether to ignore case in tokens for embeddings matching
-setCoalesceSentences ( value ) [source]
-Instead of 1 class per sentence (if inputCols is 'sentence'), output 1 class per document by averaging probabilities in all sentences.
-Because almost all transformer models such as BERT have a maximum sequence length (512 tokens), this parameter helps feed all the sentences
-into the model and average the probabilities over the entire document instead of reporting probabilities per sentence. (Default: true)
-value bool If the output of all sentences will be averaged to one output
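The averaging that setCoalesceSentences describes can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the annotator's actual code: the function `coalesce_sentences`, the label names `pos`/`neg`, and the probabilities are all hypothetical.

```python
# Simplified sketch of "one class per document by averaging sentence probabilities".
# `coalesce_sentences` and the labels "pos"/"neg" are hypothetical, for illustration only.
def coalesce_sentences(sentence_probs):
    """Average per-sentence class probabilities and pick the top document label."""
    if not sentence_probs:
        raise ValueError("need at least one sentence")
    labels = sentence_probs[0].keys()
    n = len(sentence_probs)
    # Mean probability of each label across all sentences of the document.
    averaged = {label: sum(p[label] for p in sentence_probs) / n for label in labels}
    # The document-level class is the label with the highest mean probability.
    best = max(averaged, key=averaged.get)
    return best, averaged


sentence_probs = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.2, "neg": 0.8},
    {"pos": 0.7, "neg": 0.3},
]
label, averaged = coalesce_sentences(sentence_probs)
print(label, averaged)
```

With coalescing off, the same input would yield three separate predictions, one per sentence; with it on, the document gets a single label from the averaged scores.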
-setConfigProtoBytes ( b ) [source]
-Sets configProto from tensorflow, serialized into byte array.
-b List[str] ConfigProto from tensorflow, serialized into byte array
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setMaxSentenceLength ( value ) [source]
-Sets max sentence length to process, by default 128.
-value int Max sentence length to process
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForTokenClassifier.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForTokenClassifier.html
deleted file mode 100644
index 19c0723a8a..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalBertForTokenClassifier.html
+++ /dev/null
@@ -1,1256 +0,0 @@
sparknlp_jsl.annotator.MedicalBertForTokenClassifier — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. MedicalBertForTokenClassifier ( classname = 'com.johnsnowlabs.nlp.annotators.classification.MedicalBertForTokenClassifier' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel, sparknlp.common.HasCaseSensitiveProperties, sparknlp.common.HasBatchedAnnotate
-MedicalBertForTokenClassifier can load Bert Models with a token
-classification head on top (a linear layer on top of the hidden-states
-output) e.g. for Named-Entity-Recognition (NER) tasks.
-Pretrained models can be loaded with pretrained() of the companion object:
->>> embeddings = MedicalBertForTokenClassifier . pretrained () \
-... . setInputCols ([ "token" , "document" ]) \
-... . setOutputCol ( "label" )
-The default model is "bert_token_classifier_ner_bionlp", if no name is provided.
-For available pretrained models please see the Models Hub .
-Models from the HuggingFace 🤗 Transformers library are also compatible with
-Spark NLP 🚀. To see which models are compatible and how to import them see
-Import Transformers into Spark NLP 🚀 .
-Input Annotation types
-Output Annotation type
-batchSize Batch size. Large values allow faster processing but require more
-memory, by default 8
-caseSensitive Whether to ignore case in tokens for embeddings matching, by default
-configProtoBytes ConfigProto from tensorflow, serialized into byte array.
-maxSentenceLength Max sentence length to process, by default 128
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.annotator import *
->>> from pyspark.ml import Pipeline
->>> documentAssembler = DocumentAssembler () \
-... . setInputCol ( "text" ) \
-... . setOutputCol ( "document" )
->>> tokenizer = Tokenizer () \
-... . setInputCols ([ "document" ]) \
-... . setOutputCol ( "token" )
->>> tokenClassifier = MedicalBertForTokenClassifier . pretrained () \
-... . setInputCols ([ "token" , "document" ]) \
-... . setOutputCol ( "label" ) \
-... . setCaseSensitive ( True )
->>> pipeline = Pipeline () . setStages ([
-... documentAssembler ,
-... tokenizer ,
-... tokenClassifier
-... ])
->>> data = spark . createDataFrame ([[ "Both the erbA IRES and the erbA/myb virus constructs transformed erythroid cells after infection of bone marrow or blastoderm cultures." ]]) . toDF ( "text" )
->>> result = pipeline . fit ( data ) . transform ( data )
->>> result . select ( "label.result" ) . show ( truncate = False )
-|[O, O, B-Organism, I-Organism, O, O, B-Organism, I-Organism, O, O, B-Cell, I-Cell, O,
-O, O, B-Multi-tissue_structure, I-Multi-tissue_structure, O, B-Cell, I-Cell, O]|
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current batch size.
-Gets whether to ignore case in tokens for embeddings matching.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(folder, spark_session)
-Loads a locally saved model.
-Loads a locally saved model.
([name, lang, remote_loc])
-Downloads and loads a pretrained model.
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size.
-Sets whether to ignore case in tokens for embeddings matching.
-Sets configProto from tensorflow, serialized into byte array.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets max sentence length to process, by default 128.
-Sets output column name of annotations.
-Sets the value of a parameter.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getBatchSize ( )
-Gets current batch size.
-int Current batch size
-getCaseSensitive ( )
-Gets whether to ignore case in tokens for embeddings matching.
-bool Whether to ignore case in tokens for embeddings matching
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-static loadSavedModel ( folder , spark_session ) [source]
-Loads a locally saved model.
-folder str Folder of the saved model
-spark_session pyspark.sql.SparkSession The current SparkSession
-MedicalBertForTokenClassifier The restored model
-static loadSavedModelOpenSource ( bertForTokenClassifierPath , tfModelPath , spark_session ) [source]
-Loads a locally saved model.
-bertForTokenClassifierPath str Folder of the bertForTokenClassifier
-tfModelPath str Folder that contains the TensorFlow model
-spark_session pyspark.sql.SparkSession The current SparkSession
-MedicalBertForTokenClassifier The restored model
-property params
-Returns all params ordered by name. The default implementation
-uses dir() to get all attributes of type Param.
-static pretrained ( name = 'bert_token_classifier_ner_bionlp' , lang = 'en' , remote_loc = 'clinical/models' ) [source]
-Downloads and loads a pretrained model.
-name str, optional Name of the pretrained model, by default 'bert_token_classifier_ner_bionlp'
-lang str, optional Language of the pretrained model, by default “en”
-remote_loc str, optional Optional remote address of the resource, by default 'clinical/models'
-(per the signature above). If None, Spark NLP's repositories are used.
-MedicalBertForTokenClassifier The restored model
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( v )
-Sets batch size.
-v int Batch size
-setCaseSensitive ( value )
-Sets whether to ignore case in tokens for embeddings matching.
-value bool Whether to ignore case in tokens for embeddings matching
-setConfigProtoBytes ( b ) [source]
-Sets configProto from tensorflow, serialized into byte array.
-b List[str] ConfigProto from tensorflow, serialized into byte array
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setMaxSentenceLength ( value ) [source]
-Sets max sentence length to process, by default 128.
-value int Max sentence length to process
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalDistilBertForSequenceClassification.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalDistilBertForSequenceClassification.html
deleted file mode 100644
index f879904e5d..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalDistilBertForSequenceClassification.html
+++ /dev/null
@@ -1,1278 +0,0 @@
sparknlp_jsl.annotator.MedicalDistilBertForSequenceClassification — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. MedicalDistilBertForSequenceClassification ( classname = 'com.johnsnowlabs.nlp.annotators.classification.MedicalDistilBertForSequenceClassification' , java_model = None ) [source]
-Bases: sparknlp.common.AnnotatorModel, sparknlp.common.HasCaseSensitiveProperties, sparknlp.common.HasBatchedAnnotate
-MedicalDistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on
-top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
-Pretrained models can be loaded with pretrained() of the companion object:
->>> sequenceClassifier = MedicalDistilBertForSequenceClassification . pretrained () \
-... . setInputCols ([ "token" , "document" ]) \
-... . setOutputCol ( "label" )
-Models from the HuggingFace 🤗 Transformers library are also compatible with
-Spark NLP 🚀. To see which models are compatible and how to import them see
-Import Transformers into Spark NLP 🚀 .
-Input Annotation types
-Output Annotation type
-batchSize Batch size. Large values allow faster processing but require more
-memory, by default 8
-caseSensitive Whether to ignore case in tokens for embeddings matching, by default
-configProtoBytes ConfigProto from tensorflow, serialized into byte array.
-maxSentenceLength Max sentence length to process, by default 128
-coalesceSentences Instead of 1 class per sentence (if inputCols is sentence) output 1 class per document by averaging
-probabilities in all sentences.
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.annotator import *
->>> from pyspark.ml import Pipeline
->>> documentAssembler = DocumentAssembler () \
-... . setInputCol ( "text" ) \
-... . setOutputCol ( "document" )
->>> tokenizer = Tokenizer () \
-... . setInputCols ([ "document" ]) \
-... . setOutputCol ( "token" )
->>> sequenceClassifier = MedicalDistilBertForSequenceClassification . pretrained () \
-... . setInputCols ([ "token" , "document" ]) \
-... . setOutputCol ( "label" ) \
-... . setCaseSensitive ( True )
->>> pipeline = Pipeline () . setStages ([
-... documentAssembler ,
-... tokenizer ,
-... sequenceClassifier
-... ])
->>> data = spark . createDataFrame ([[ "I felt a bit drowsy and had blurred vision after taking Aspirin." ]]) . toDF ( "text" )
->>> result = pipeline . fit ( data ) . transform ( data )
->>> result . select ( "label.result" ) . show ( truncate = False )
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current batch size.
-Gets whether to ignore case in tokens for embeddings matching.
-Returns labels used to train this model
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(folder, spark_session)
-Loads a locally saved model.
-Loads a locally saved model.
([name, lang, remote_loc])
-Downloads and loads a pretrained model.
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size.
-Sets whether to ignore case in tokens for embeddings matching.
-Instead of 1 class per sentence (if inputCols is 'sentence'), output 1 class per document by averaging probabilities in all sentences.
-Sets configProto from tensorflow, serialized into byte array.
-Sets column names of input annotations.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets max sentence length to process, by default 128.
-Sets output column name of annotations.
-Sets the value of a parameter.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then make a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
-getBatchSize ( )
-Gets current batch size.
-int Current batch size
-getCaseSensitive ( )
-Gets whether to ignore case in tokens for embeddings matching.
-bool Whether to ignore case in tokens for embeddings matching
-getClasses ( ) [source]
-Returns labels used to train this model
-getInputCols ( )
-Gets current column names of input annotations.
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-static loadSavedModel ( folder , spark_session ) [source]
-Loads a locally saved model.
-folder str Folder of the saved model
-spark_session pyspark.sql.SparkSession The current SparkSession
-DistilBertForSequenceClassification The restored model
-static loadSavedModelOpenSource ( destilBertForTokenClassifierPath , tfModelPath , spark_session ) [source]
-Loads a locally saved model.
-destilBertForTokenClassifierPath str Folder of the destilBertForTokenClassifier
-tfModelPath str Folder that contains the TensorFlow model
-spark_session pyspark.sql.SparkSession The current SparkSession
-MedicalDistilBertForSequenceClassification The restored model
-property params
-Returns all params ordered by name. The default implementation
-uses dir() to get all attributes of type Param.
-static pretrained ( name = 'distilbert_sequence_classifier_ade' , lang = 'en' , remote_loc = 'clinical/models' ) [source]
-Downloads and loads a pretrained model.
-name str, optional Name of the pretrained model, by default 'distilbert_sequence_classifier_ade'
-lang str, optional Language of the pretrained model, by default “en”
-remote_loc str, optional Optional remote address of the resource, by default 'clinical/models'
-(per the signature above). If None, Spark NLP's repositories are used.
-MedicalDistilBertForSequenceClassification The restored model
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( v )
-Sets batch size.
-v int Batch size
-setCaseSensitive ( value )
-Sets whether to ignore case in tokens for embeddings matching.
-value bool Whether to ignore case in tokens for embeddings matching
-setCoalesceSentences ( value ) [source]
-Instead of 1 class per sentence (if inputCols is 'sentence'), output 1 class per document by averaging probabilities in all sentences.
-Because almost all transformer models such as BERT have a maximum sequence length (512 tokens), this parameter helps feed all the sentences
-into the model and average the probabilities over the entire document instead of reporting probabilities per sentence. (Default: true)
-value bool If the output of all sentences will be averaged to one output
-setConfigProtoBytes ( b ) [source]
-Sets configProto from tensorflow, serialized into byte array.
-b List[int] ConfigProto from tensorflow, serialized into byte array
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setMaxSentenceLength ( value ) [source]
-Sets max sentence length to process, by default 128.
-value int Max sentence length to process
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-transform ( dataset , params = None )
-Transforms the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset
-params dict, optional an optional param map that overrides embedded params.
transformed dataset
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerApproach.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerApproach.html
deleted file mode 100644
index ea25f76581..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerApproach.html
+++ /dev/null
@@ -1,1764 +0,0 @@
sparknlp_jsl.annotator.MedicalNerApproach — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator. MedicalNerApproach [source]
-Bases: sparknlp.common.AnnotatorApproach, sparknlp.annotator.NerApproach
-This Named Entity Recognition annotator trains a generic NER model
-based on neural networks.
-The architecture of the neural network is Char CNNs - BiLSTM - CRF, which
-achieves state-of-the-art results on most datasets.
-For instantiated/pretrained models, see NerDLModel
-The training data should be a labeled Spark Dataset, in the format of
-CoNLL 2003 IOB with Annotation type columns. The data should
-have columns of type DOCUMENT, TOKEN, WORD_EMBEDDINGS and an additional
-label column of annotator type NAMED_ENTITY.
-Excluding the label, this can be done with for example:
-For extended examples of usage, see the Spark NLP Workshop .
-Input Annotation types
-Output Annotation type
-labelColumn Column with label per each token
-entities Entities to recognize
-minEpochs Minimum number of epochs to train, by default 0
-maxEpochs Maximum number of epochs to train, by default 50
-verbose Level of verbosity during training, by default 2
-randomSeed Random seed
-lr Learning Rate, by default 0.001
-po Learning rate decay coefficient. Real learning rate = lr / (1 + po *
-epoch), by default 0.005
-batchSize Batch size, by default 8
-dropout Dropout coefficient, by default 0.5
-graphFolder Folder path that contains external graph files
-configProtoBytes ConfigProto from tensorflow, serialized into byte array.
-useContrib whether to use contrib LSTM Cells. Not compatible with Windows. Might
-slightly improve accuracy
-validationSplit Proportion of the training dataset to validate the model against
-on each epoch. The value should be between 0.0 and 1.0; by default 0.0,
-which turns validation off
-evaluationLogExtended Whether logs for validation to be extended, by default False.
-testDataset Path to test dataset. If set, it is used to calculate statistics on it during training
-includeConfidence whether to include confidence scores in annotation metadata, by default
-includeAllConfidenceScores whether to include all confidence scores in annotation metadata or just
-the score of the predicted tag, by default False
-enableOutputLogs Whether to use stdout in addition to Spark logs, by default False
-outputLogsPath Folder path to save training logs
-enableMemoryOptimizer Whether to optimize for large datasets or not. Enabling this option can
-slow down training, by default False
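The decay rule given for `po` in the parameter list above, lr / (1 + po * epoch), can be worked through numerically. The sketch below uses the documented defaults (lr = 0.001, po = 0.005); the function name `effective_learning_rate` is hypothetical, and counting epochs from 0 is an assumption, since the documentation does not say how the trainer indexes them.

```python
# Worked example of the documented decay formula: lr / (1 + po * epoch).
# `effective_learning_rate` is a hypothetical helper; epoch indexing from 0 is assumed.
def effective_learning_rate(lr, po, epoch):
    """Learning rate actually applied at a given epoch."""
    return lr / (1.0 + po * epoch)


LR = 0.001   # documented default for lr
PO = 0.005   # documented default for po

# Sample the schedule every 10 epochs up to the documented default maxEpochs of 50.
schedule = [(epoch, effective_learning_rate(LR, PO, epoch)) for epoch in range(0, 50, 10)]
for epoch, rate in schedule:
    print(f"epoch {epoch:2d}: {rate:.6f}")
```

With these defaults the decay is gentle: the rate falls by under 5% in the first 10 epochs, so a larger `po` is needed when a faster drop is wanted.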
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.common import *
->>> from sparknlp.annotator import *
->>> from sparknlp.training import *
->>> import sparknlp_jsl
->>> from sparknlp_jsl.base import *
->>> from sparknlp_jsl.annotator import *
->>> from pyspark.ml import Pipeline
-First extract the prerequisites for the MedicalNerApproach
->>> documentAssembler = DocumentAssembler () \
-... . setInputCol ( "text" ) \
-... . setOutputCol ( "document" )
->>> sentence = SentenceDetector () \
-... . setInputCols ([ "document" ]) \
-... . setOutputCol ( "sentence" )
->>> tokenizer = Tokenizer () \
-... . setInputCols ([ "sentence" ]) \
-... . setOutputCol ( "token" )
->>> embeddings = BertEmbeddings . pretrained () \
-... . setInputCols ([ "sentence" , "token" ]) \
-... . setOutputCol ( "embeddings" )
-Then the training can start
->>> nerTagger = MedicalNerApproach() \
-...     .setInputCols(["sentence", "token", "embeddings"]) \
-...     .setLabelColumn("label") \
-...     .setOutputCol("ner") \
-...     .setMaxEpochs(1) \
-...     .setRandomSeed(0) \
-...     .setVerbose(0)
->>> pipeline = Pipeline().setStages([
-...     documentAssembler,
-...     sentence,
-...     tokenizer,
-...     embeddings,
-...     nerTagger
-... ])
->>> conll = CoNLL()
->>> trainingData = conll.readDataset(spark, "src/test/resources/conll2003/eng.train")
->>> pipelineModel = pipeline.fit(trainingData)
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
(dataset[, params])
-Fits a model to the input dataset with optional parameters.
(dataset, paramMaps)
-Fits a model to the input dataset for each param map in paramMaps .
-Gets current column names of input annotations.
-Gets column for label per each token.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size, by default 64.
-Sets configProto from tensorflow, serialized into byte array.
-Sets dropout coefficient, by default 0.5.
-Sets early stopping criterion.
-Sets the number of epochs with no performance improvement before training is terminated.
-Sets whether to optimize for large datasets or not, by default False.
-Sets whether to use stdout in addition to Spark logs, by default False.
-Sets entities to recognize.
-Sets whether to extend the validation logs, by default False.
-Sets path that contains the external graph file.
-Sets folder path that contains external graph files.
-Sets whether to include all confidence scores in annotation metadata or just the score of the predicted tag, by default False.
-Sets whether to include confidence scores in annotation metadata, by default False.
-Sets column names of input annotations.
-Sets name of column for data labels.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets folder path to save training logs.
-Sets Learning Rate, by default 0.001.
-Sets maximum number of epochs to train.
-Sets minimum number of epochs to train.
-Sets output column name of annotations.
-Sets folder path to save training logs.
-Sets whether to override already learned tags when using a pretrained model to initialize the new model.
-Sets the value of a parameter.
-Sets Learning rate decay coefficient, by default 0.005.
-Sets path to an already trained MedicalNerModel, which is used as a starting point for training the new model.
-Sets random seed for shuffling.
-Sets a map specifying how old tags are mapped to new ones.
(path[, read_as, options])
-Sets Path to test dataset.
-Sets whether to restore and use the model that has achieved the best performance at the end of the training.
-Sets whether to use contrib LSTM Cells.
-Sets the proportion of training dataset to be validated against the model on each Epoch, by default it is 0.0 and off.
-Sets level of verbosity during training.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then makes a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
-explainParam ( param )
-Explains a single param and returns its name, doc, and optional
-default value and user-supplied value in a string.
-explainParams ( )
-Returns the documentation of all params with their optionally
-default values and user-supplied values.
-Extracts the embedded default param values and user-supplied
-values, and then merges them with extra values from input into
-a flat param map, where the latter value is used if there exist
-conflicts, i.e., with ordering: default param values <
-user-supplied values < extra.
-extra dict, optional extra param values
-dict merged param map
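The merge ordering described above (default param values < user-supplied values < extra) can be illustrated with plain dictionaries. This is only a sketch of the precedence rule, not the PySpark implementation; `merge_param_maps` and the sample values are hypothetical.

```python
def merge_param_maps(defaults, user_supplied, extra=None):
    """Merge three param maps; later maps win on conflicting keys.

    Hypothetical helper that mirrors the documented ordering:
    default param values < user-supplied values < extra.
    """
    merged = dict(defaults)        # lowest precedence
    merged.update(user_supplied)   # overrides defaults
    merged.update(extra or {})     # highest precedence
    return merged


# Illustrative values only.
defaults = {"batchSize": 64, "dropout": 0.5, "lr": 0.001}
user_supplied = {"batchSize": 32}
extra = {"lr": 0.01, "batchSize": 16}

merged = merge_param_maps(defaults, user_supplied, extra)
print(merged)
```

Here `batchSize` appears in all three maps, so the value from `extra` is the one that survives.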
-fit ( dataset , params = None )
-Fits a model to the input dataset with optional parameters.
-dataset pyspark.sql.DataFrame
input dataset.
-params dict or list or tuple, optional An optional param map that overrides embedded params. If a list/tuple of
-param maps is given, this calls fit on each param map and returns a list of models.
Transformer or a list of Transformer
fitted model(s)
-fitMultiple ( dataset , paramMaps )
-Fits a model to the input dataset for each param map in paramMaps .
-dataset pyspark.sql.DataFrame
input dataset.
-paramMaps collections.abc.Sequence
A Sequence of param maps.
A thread safe iterable which contains one model for each param map. Each
-call to next(modelIterator) will return (index, model) where model was fit
-using paramMaps[index] . index values may not be sequential.
-getInputCols ( )
-Gets current column names of input annotations.
-getLabelColumn ( )
-Gets column for label per each token.
-str Column with label per each token
-getLazyAnnotator ( )
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-getOrDefault ( param )
-Gets the value of a param in the user-supplied param map or its
-default value. Raises an error if neither is set.
-getOutputCol ( )
-Gets output column name of annotations.
-getParam ( paramName )
-Gets a param by its name.
-getParamValue ( paramName )
-Gets the value of a parameter.
-paramName str Name of the parameter
-hasDefault ( param )
-Checks whether a param has a default value.
-hasParam ( paramName )
-Tests whether this instance contains a param with a given
-(string) name.
-isDefined ( param )
-Checks whether a param is explicitly set by user or has
-a default value.
-isSet ( param )
-Checks whether a param is explicitly set by user.
-classmethod load ( path )
-Reads an ML instance from the input path, a shortcut of read().load(path) .
-property params
-Returns all params ordered by name. The default implementation
-uses dir() to get all attributes of type Param.
-classmethod read ( )
-Returns an MLReader instance for this class.
-save ( path )
-Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
-set ( param , value )
-Sets a parameter in the embedded param map.
-setBatchSize ( v ) [source]
-Sets batch size, by default 64.
-v int Batch size
-setConfigProtoBytes ( b ) [source]
-Sets configProto from tensorflow, serialized into byte array.
-b List[str] ConfigProto from tensorflow, serialized into byte array
-setDropout ( v ) [source]
-Sets dropout coefficient, by default 0.5.
-v float Dropout coefficient
-setEarlyStoppingCriterion ( criterion ) [source]
-Sets early stopping criterion. A value 0 means no early stopping.
-criterion float Early stopping criterion.
-setEarlyStoppingPatience ( patience ) [source]
-Sets the number of epochs with no performance improvement before training is terminated.
-patience int Early stopping patience.
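How the early stopping criterion and patience might interact can be sketched as follows. This is a guess at the general technique, not the library's implementation: it assumes the criterion is the minimum improvement in a validation metric (higher is better) that counts as progress, and that training stops after `patience` consecutive epochs without such progress. The function name `should_stop` is hypothetical.

```python
def should_stop(scores, patience, criterion):
    """Return True if training would stop early on this score history.

    Assumptions (not confirmed by the source): `scores` holds one
    validation metric per epoch, higher is better, and `criterion` is
    the minimum improvement over the best score so far that resets
    the patience counter. A criterion of 0 disables early stopping,
    as the documentation states.
    """
    if criterion <= 0 or patience <= 0:
        return False

    best = float("-inf")
    epochs_without_improvement = 0
    for score in scores:
        if score > best + criterion:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return True
    return False


plateau = should_stop([0.5, 0.6, 0.6, 0.6], patience=2, criterion=0.01)
improving = should_stop([0.5, 0.6, 0.7, 0.8], patience=2, criterion=0.01)
disabled = should_stop([0.5, 0.6, 0.6, 0.6], patience=2, criterion=0.0)
print(plateau, improving, disabled)
```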
-setEnableMemoryOptimizer ( value ) [source]
-Sets whether to optimize for large datasets or not, by default False.
-Enabling this option can slow down training.
-value bool Whether to optimize for large datasets
-setEnableOutputLogs ( value ) [source]
-Sets whether to use stdout in addition to Spark logs, by default False.
-value bool Whether to use stdout in addition to Spark logs
-setEntities ( tags )
-Sets entities to recognize.
-tags List[str] List of entities
-setEvaluationLogExtended ( v ) [source]
-Sets whether to extend the validation logs, by default False.
-Displays time and evaluation of each label.
-v bool Whether to extend the validation logs
-setGraphFile ( ff ) [source]
-Sets path that contains the external graph file. When specified, the provided file will be used, and no graph search will happen.
-ff str Path that contains the external graph file. When specified, the provided file will be used, and no graph search will happen.
-setGraphFolder ( p ) [source]
-Sets folder path that contains external graph files.
-p str Folder path that contains external graph files
-setIncludeAllConfidenceScores ( value ) [source]
-Sets whether to include all confidence scores in annotation metadata
-or just the score of the predicted tag, by default False.
-value bool Whether to include all confidence scores in annotation metadata or
-just the score of the predicted tag
-setIncludeConfidence ( value ) [source]
-Sets whether to include confidence scores in annotation metadata, by
-default False.
-value bool Whether to include the confidence value in the output.
-setInputCols ( * value )
-Sets column names of input annotations.
-*value str Input columns for the annotator
-setLabelColumn ( value )
-Sets name of column for data labels.
-value str Column for data labels
-setLazyAnnotator ( value )
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-value bool Whether Annotator should be evaluated lazily in a RecursivePipeline
-setLogPrefix ( s ) [source]
-Sets folder path to save training logs.
-s str Folder path to save training logs
-setLr ( v ) [source]
-Sets Learning Rate, by default 0.001.
-v float Learning Rate
-setMaxEpochs ( epochs )
-Sets maximum number of epochs to train.
-epochs int Maximum number of epochs to train
-setMinEpochs ( epochs )
-Sets minimum number of epochs to train.
-epochs int Minimum number of epochs to train
-setOutputCol ( value )
-Sets output column name of annotations.
-value str Name of output column
-setOutputLogsPath ( p ) [source]
-Sets folder path to save training logs.
-p str Folder path to save training logs
-setOverrideExistingTags ( value ) [source]
-Sets whether to override already learned tags when using a pretrained model to initialize the new model. Default is ‘true’
-value bool Whether to override already learned tags when using a pretrained model to initialize the new model. Default is ‘true’
-setParamValue ( paramName )
-Sets the value of a parameter.
-paramName str Name of the parameter
-setPo ( v ) [source]
-Sets Learning rate decay coefficient, by default 0.005.
-The effective learning rate is lr / (1 + po * epoch).
-v float Learning rate decay coefficient
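The decay formula lr / (1 + po * epoch) can be checked with a short calculation. The sketch below uses the documented defaults (lr = 0.001, po = 0.005) and assumes epochs are counted from 0, which the source does not state; `effective_learning_rate` is a hypothetical helper, not part of the library.

```python
def effective_learning_rate(lr, po, epoch):
    """Learning rate after decay: lr / (1 + po * epoch).

    `epoch` is assumed to be zero-based here; the source does not
    say where counting starts.
    """
    return lr / (1.0 + po * epoch)


# Documented defaults: lr = 0.001, po = 0.005.
rates = [effective_learning_rate(0.001, 0.005, epoch) for epoch in range(0, 101, 25)]
for epoch, rate in zip(range(0, 101, 25), rates):
    print(f"epoch {epoch:3d}: lr = {rate:.6f}")
```

With these defaults the decay is gentle: the rate only falls to two thirds of its starting value by epoch 100.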
-setPretrainedModelPath ( value ) [source]
-Sets path to an already trained MedicalNerModel, which is used as a starting point for training the new model.
-value str Path to an already trained MedicalNerModel, which is used as a starting point for training the new model.
-setRandomSeed ( seed )
-Sets random seed for shuffling.
-seed int Random seed for shuffling
-setTagsMapping ( value ) [source]
-Sets a map specifying how old tags are mapped to new ones. It only works if setOverrideExistingTags
-value list A map specifying how old tags are mapped to new ones. It only works if setOverrideExistingTags
-setTestDataset ( path , read_as = 'SPARK' , options = {'format': 'parquet'} ) [source]
-Sets path to test dataset. If set, it is used to calculate statistics
-during training.
-path str Path to test dataset
-read_as str, optional How to read the resource, by default ReadAs.SPARK
-options dict, optional Options for reading the resource, by default {“format”: “parquet”}
-setUseBestModel ( value ) [source]
-Sets whether to restore and use the model that has achieved the best performance at the end of the training.
-The metric that is being monitored is macro F1 for the following cases (highest precedence first),
-value bool Whether to return the model that has achieved the best metrics across epochs.
-setUseContrib ( v ) [source]
-Sets whether to use contrib LSTM Cells. Not compatible with Windows.
-Might slightly improve accuracy.
-v bool Whether to use contrib LSTM Cells
-Exception Windows not supported to use contrib
-setValidationSplit ( v ) [source]
-Sets the proportion of training dataset to be validated against the
-model on each Epoch, by default it is 0.0 and off. The value should be
-between 0.0 and 1.0.
-v float Proportion of training dataset to be validated
-setVerbose ( verboseValue )
-Sets level of verbosity during training.
-verboseValue int Level of verbosity
-A unique id for the object.
-write ( )
-Returns an MLWriter instance for this ML instance.
\ No newline at end of file
diff --git a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerModel.html b/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerModel.html
deleted file mode 100644
index 7833f017ed..0000000000
--- a/docs/licensed/api/python/reference/autosummary/_autosummary/sparknlp_jsl.annotator.MedicalNerModel.html
+++ /dev/null
@@ -1,1251 +0,0 @@
sparknlp_jsl.annotator.MedicalNerModel — Spark NLP 3.3.0 documentation
-class sparknlp_jsl.annotator.MedicalNerModel(classname='com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel', java_model=None) [source]
-Bases: sparknlp.common.AnnotatorModel, sparknlp.common.HasStorageRef, sparknlp.common.HasBatchedAnnotate
-This Named Entity recognition annotator is a generic NER model based on
-Neural Networks.
-The neural network architecture is Char CNNs - BiLSTM - CRF, which achieves
-state-of-the-art results on most datasets.
-This is the instantiated model of the MedicalNerApproach. For training
-your own model, please see the documentation of that class.
-Pretrained models can be loaded with pretrained() of the companion object:
->>> nerModel = MedicalNerModel.pretrained() \
-...     .setInputCols(["sentence", "token", "embeddings"]) \
-...     .setOutputCol("ner")
-The default model is "ner_dl", if no name is provided.
-For available pretrained models please see the Models Hub .
-Additionally, pretrained pipelines are available for this module, see
-Pipelines .
-Note that some pretrained models require specific types of embeddings,
-depending on which ones they were trained on. For example, the default model
-requires the WordEmbeddings "glove_100d".
-For extended examples of usage, see the Spark NLP Workshop .
-Input Annotation types
-Output Annotation type
-batchSize Size of every batch, by default 8
-configProtoBytes ConfigProto from tensorflow, serialized into byte array.
-includeConfidence Whether to include confidence scores in annotation metadata, by default False
-includeAllConfidenceScores Whether to include all confidence scores in annotation metadata or just
-the score of the predicted tag, by default False
-inferenceBatchSize Number of sentences to process in a single batch during inference
-classes Tags used to train this model
-labelCasing Sets all labels of the NER model to upper or lower case. Values: upper|lower
->>> import sparknlp
->>> from sparknlp.base import *
->>> from sparknlp.common import *
->>> from sparknlp.annotator import *
->>> from sparknlp.training import *
->>> import sparknlp_jsl
->>> from sparknlp_jsl.base import *
->>> from sparknlp_jsl.annotator import *
->>> from pyspark.ml import Pipeline
->>> documentAssembler = DocumentAssembler() \
-...     .setInputCol("text") \
-...     .setOutputCol("document")
->>> sentence = SentenceDetector() \
-...     .setInputCols(["document"]) \
-...     .setOutputCol("sentence")
->>> tokenizer = Tokenizer() \
-...     .setInputCols(["sentence"]) \
-...     .setOutputCol("token")
->>> embeddings = WordEmbeddingsModel.pretrained() \
-...     .setInputCols(["sentence", "token"]) \
-...     .setOutputCol("bert")
->>> nerTagger = MedicalNerModel.pretrained() \
-...     .setInputCols(["sentence", "token", "bert"]) \
-...     .setOutputCol("ner")
->>> pipeline = Pipeline().setStages([
-...     documentAssembler,
-...     sentence,
-...     tokenizer,
-...     embeddings,
-...     nerTagger
-... ])
->>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
->>> result = pipeline.fit(data).transform(data)
([classname, java_model])
-Initialize this instance with a Java model object.
-Clears a param from the param map if it has been explicitly set.
-Creates a copy of this instance with the same uid and some extra params.
-Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
-Returns the documentation of all params with their optionally default values and user-supplied values.
-Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
-Gets current batch size.
-Gets current column names of input annotations.
-Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Gets the value of a param in the user-supplied param map or its default value.
-Gets output column name of annotations.
-Gets a param by its name.
-Gets the value of a parameter.
-Gets unique reference name for identification.
-Checks whether a param has a default value.
-Tests whether this instance contains a param with a given (string) name.
-Checks whether a param is explicitly set by user or has a default value.
-Checks whether a param is explicitly set by user.
-Reads an ML instance from the input path, a shortcut of read().load(path) .
(ner_model_path, folder, ...)
([name, lang, remote_loc])
-Returns an MLReader instance for this class.
-Save this ML instance to the given path, a shortcut of 'write().save(path)'.
(param, value)
-Sets a parameter in the embedded param map.
-Sets batch size.
-Sets configProto from tensorflow, serialized into byte array.
-Sets whether to include confidence scores in annotation metadata, by default False.
-Sets number of sentences to process in a single batch during inference
-Sets column names of input annotations.
-Setting all labels of the NER models upper/lower case.
-Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
-Sets output column name of annotations.
-Sets the value of a parameter.
-Sets unique reference name for identification.
(dataset[, params])
-Transforms the input dataset with optional parameters.
-Returns an MLWriter instance for this ML instance.
-Returns all params ordered by name.
-clear ( param )
-Clears a param from the param map if it has been explicitly set.
-copy ( extra = None )
-Creates a copy of this instance with the same uid and some
-extra params. This implementation first calls Params.copy and
-then makes a copy of the companion Java pipeline component with
-extra params. So both the Python wrapper and the Java pipeline
-component get copied.
-extra dict, optional Extra parameters to copy to the new instance
Copy of this instance
