
Commit

Merge branch 'cognitive_to_ai_service' of https://github.com/JessicaXYWang/SynapseML into cognitive_to_ai_service
JessicaXYWang committed Oct 30, 2023
2 parents 3075732 + 34698aa commit 611a561
Showing 249 changed files with 22,644 additions and 475 deletions.
17 changes: 0 additions & 17 deletions .github/workflows/on-pull-request-target-review.yml

This file was deleted.

30 changes: 15 additions & 15 deletions README.md
@@ -11,10 +11,10 @@ SynapseML requires Scala 2.12, Spark 3.2+, and Python 3.8+.
| Topics | Links |
| :------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Build | [![Build Status](https://msdata.visualstudio.com/A365/_apis/build/status/microsoft.SynapseML?branchName=master)](https://msdata.visualstudio.com/A365/_build/latest?definitionId=17563&branchName=master) [![codecov](https://codecov.io/gh/Microsoft/SynapseML/branch/master/graph/badge.svg)](https://codecov.io/gh/Microsoft/SynapseML) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) |
- | Version | [![Version](https://img.shields.io/badge/version-0.11.3-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
- | Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/0.11.3/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/0.11.3/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
+ | Version | [![Version](https://img.shields.io/badge/version-0.11.4-blue)](https://github.com/Microsoft/SynapseML/releases) [![Release Notes](https://img.shields.io/badge/release-notes-blue)](https://github.com/Microsoft/SynapseML/releases) [![Snapshot Version](https://mmlspark.blob.core.windows.net/icons/badges/master_version3.svg)](#sbt) |
+ | Docs | [![Website](https://img.shields.io/badge/SynapseML-Website-blue)](https://aka.ms/spark) [![Scala Docs](https://img.shields.io/static/v1?label=api%20docs&message=scala&color=blue&logo=scala)](https://mmlspark.blob.core.windows.net/docs/0.11.4/scala/index.html#package) [![PySpark Docs](https://img.shields.io/static/v1?label=api%20docs&message=python&color=blue&logo=python)](https://mmlspark.blob.core.windows.net/docs/0.11.4/pyspark/index.html) [![Academic Paper](https://img.shields.io/badge/academic-paper-7fdcf7)](https://arxiv.org/abs/1810.08744) |
| Support | [![Gitter](https://badges.gitter.im/Microsoft/MMLSpark.svg)](https://gitter.im/Microsoft/MMLSpark?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) [![Mail](https://img.shields.io/badge/mail-synapseml--support-brightgreen)](mailto:[email protected]) |
- | Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v0.11.3?labpath=notebooks%2Ffeatures) |
+ | Binder | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/SynapseML/v0.11.4?labpath=notebooks%2Ffeatures) |
| Usage | [![Downloads](https://static.pepy.tech/badge/synapseml)](https://pepy.tech/project/synapseml) |
<!-- markdownlint-disable MD033 -->
<details open>
@@ -95,7 +95,7 @@ In Azure Synapse notebooks please place the following in the first cell of your
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.3-spark3.3",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
"spark.yarn.user.classpath.first": "true",
@@ -111,7 +111,7 @@ In Azure Synapse notebooks please place the following in the first cell of your
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.3,org.apache.spark:spark-avro_2.12:3.3.1",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4,org.apache.spark:spark-avro_2.12:3.3.1",
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
"spark.yarn.user.classpath.first": "true",
@@ -131,15 +131,15 @@ cloud](http://community.cloud.databricks.com), create a new [library from Maven
coordinates](https://docs.databricks.com/user-guide/libraries.html#libraries-from-maven-pypi-or-spark-packages)
in your workspace.

- For the coordinates use: `com.microsoft.azure:synapseml_2.12:0.11.3`
+ For the coordinates use: `com.microsoft.azure:synapseml_2.12:0.11.4`
with the resolver: `https://mmlspark.azureedge.net/maven`. Ensure this library is
attached to your target cluster(s).

Finally, ensure that your Spark cluster has at least Spark 3.2 and Scala 2.12. If you encounter Netty dependency issues please use DBR 10.1.

You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks import the following databricks archive:

- `https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv0.11.3.dbc`
+ `https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv0.11.4.dbc`

### Microsoft Fabric

@@ -152,7 +152,7 @@ In Microsoft Fabric notebooks please place the following in the first cell of yo
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.3-spark3.3",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
"spark.yarn.user.classpath.first": "true",
@@ -168,7 +168,7 @@ In Microsoft Fabric notebooks please place the following in the first cell of yo
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.3,org.apache.spark:spark-avro_2.12:3.3.1",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4,org.apache.spark:spark-avro_2.12:3.3.1",
"spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
"spark.yarn.user.classpath.first": "true",
@@ -187,7 +187,7 @@ the above example, or from python:
```python
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.11.3") \
.config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.11.4") \
.getOrCreate()
import synapse.ml
```
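As a quick smoke test that the package actually resolved, a small end-to-end sketch; the DataFrame and column names here are made up, and `LightGBMClassifier` is one of the estimators SynapseML ships:

```python
import pyspark
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.11.4") \
    .getOrCreate()

# A tiny synthetic frame: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.0, 1.0, 0), (1.0, 0.2, 1), (0.4, 0.9, 0), (1.3, 0.1, 1)],
    ["f1", "f2", "label"])
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

# Fitting a SynapseML estimator confirms the JARs are on the classpath.
model = LightGBMClassifier(featuresCol="features", labelCol="label").fit(assembled)
model.transform(assembled).select("label", "prediction").show()
```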
@@ -198,9 +198,9 @@ SynapseML can be conveniently installed on existing Spark clusters via the
`--packages` option, examples:

```bash
- spark-shell --packages com.microsoft.azure:synapseml_2.12:0.11.3
- pyspark --packages com.microsoft.azure:synapseml_2.12:0.11.3
- spark-submit --packages com.microsoft.azure:synapseml_2.12:0.11.3 MyApp.jar
+ spark-shell --packages com.microsoft.azure:synapseml_2.12:0.11.4
+ pyspark --packages com.microsoft.azure:synapseml_2.12:0.11.4
+ spark-submit --packages com.microsoft.azure:synapseml_2.12:0.11.4 MyApp.jar
```
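If the package is not visible from the cluster's default resolvers, the Maven repository used throughout this README can be supplied on the same command line; a sketch using Spark's standard `--repositories` flag:

```bash
spark-shell --packages com.microsoft.azure:synapseml_2.12:0.11.4 \
            --repositories https://mmlspark.azureedge.net/maven
```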

### SBT
@@ -209,7 +209,7 @@ If you are building a Spark application in Scala, add the following lines to
your `build.sbt`:

```scala
- libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "0.11.3"
+ libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "0.11.4"
```
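If the dependency does not resolve from the default repositories, the resolver mentioned in the Databricks section above can be added alongside it; a sketch:

```scala
// Assumes the mmlspark.azureedge.net repository named elsewhere in this README.
resolvers += "MMLSpark Azure Maven" at "https://mmlspark.azureedge.net/maven"
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "0.11.4"
```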

### Apache Livy and HDInsight
@@ -223,7 +223,7 @@ Excluding certain packages from the library may be necessary due to current issu
{
"name": "synapseml",
"conf": {
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.3",
"spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4",
"spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind"
}
}
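This JSON matches the request body of Livy's session-creation endpoint, so the same configuration can be posted directly when starting a session. A sketch, assuming Livy's default port 8998 and the payload saved in a hypothetical file `synapseml-session.json`:

```bash
# Create a Livy session with SynapseML on the classpath (default Livy port: 8998).
curl -s -X POST \
     -H "Content-Type: application/json" \
     -d @synapseml-session.json \
     "http://<livy-host>:8998/sessions"
```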
4 changes: 2 additions & 2 deletions build.sbt
@@ -70,7 +70,7 @@ pomPostProcess := pomPostFunc

val getDatasetsTask = TaskKey[Unit]("getDatasets", "download datasets used for testing")
val datasetName = "datasets-2023-04-03.tgz"
- val datasetUrl = new URL(s"https://mmlspark.blob.core.windows.net/installers/$datasetName")
+ val datasetUrl = new URI(s"https://mmlspark.blob.core.windows.net/installers/$datasetName").toURL()
val datasetDir = settingKey[File]("The directory that holds the dataset")
ThisBuild / datasetDir := {
join((Compile / packageBin / artifactPath).value.getParentFile,
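Aside on the hunk above: the move from `new URL(...)` to `new URI(...).toURL()` tracks the JDK's deprecation of the `java.net.URL` string constructors (deprecated since JDK 20). The general replacement pattern, as a standalone sketch with a placeholder URL:

```scala
import java.net.{URI, URL}

// Deprecated in recent JDKs: constructing a URL directly from a string.
// val datasetUrl: URL = new URL("https://example.com/data.tgz")

// Preferred: parse (and validate) as a URI first, then convert.
val datasetUrl: URL = new URI("https://example.com/data.tgz").toURL()
```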
@@ -221,7 +221,7 @@ publishDotnetBase := {
packDotnetAssemblyCmd(join(dotnetBaseDir, "target").getAbsolutePath, dotnetBaseDir)
val packagePath = join(dotnetBaseDir,
// Update the version whenever there's a new release
"target", s"SynapseML.DotnetBase.${dotnetedVersion("0.11.3")}.nupkg").getAbsolutePath
"target", s"SynapseML.DotnetBase.${dotnetedVersion("0.11.4")}.nupkg").getAbsolutePath
publishDotnetAssemblyCmd(packagePath, genSleetConfig.value)
}

@@ -8,7 +8,7 @@ import com.microsoft.azure.synapse.ml.cognitive.anomaly.AnomalyDetectorProtocol.
import com.microsoft.azure.synapse.ml.core.contracts.HasOutputCol
import com.microsoft.azure.synapse.ml.core.schema.DatasetExtensions
import com.microsoft.azure.synapse.ml.io.http.ErrorUtils
- import com.microsoft.azure.synapse.ml.logging.SynapseMLLogging
+ import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.param.ServiceParam
import org.apache.http.entity.{AbstractHttpEntity, StringEntity}
import org.apache.spark.injections.UDFUtils
@@ -148,7 +148,7 @@ abstract class AnomalyDetectorBase(override val uid: String) extends CognitiveSe
object DetectLastAnomaly extends ComplexParamsReadable[DetectLastAnomaly] with Serializable

class DetectLastAnomaly(override val uid: String) extends AnomalyDetectorBase(uid) with SynapseMLLogging {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("DetectLastAnomaly"))

@@ -165,7 +165,7 @@ class DetectLastAnomaly(override val uid: String) extends AnomalyDetectorBase(ui
object DetectAnomalies extends ComplexParamsReadable[DetectAnomalies] with Serializable

class DetectAnomalies(override val uid: String) extends AnomalyDetectorBase(uid) with SynapseMLLogging {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("DetectAnomalies"))

@@ -183,7 +183,7 @@ object SimpleDetectAnomalies extends ComplexParamsReadable[SimpleDetectAnomalies

class SimpleDetectAnomalies(override val uid: String) extends AnomalyDetectorBase(uid)
with HasOutputCol with SynapseMLLogging {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("SimpleDetectAnomalies"))

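The change repeated through the rest of this commit is mechanical: every `logClass()` call gains a `FeatureNames` argument so that usage telemetry is tagged with its feature area. A minimal sketch of the shape, with a hypothetical class standing in for the real transformers; only the import and logging call mirror this diff:

```scala
import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import org.apache.spark.ml.util.Identifiable

// Hypothetical service class, not from this commit.
class MyAnomalyService(val uid: String) extends SynapseMLLogging {
  // Before: logClass(); after: tag telemetry with the owning feature area.
  logClass(FeatureNames.AiServices.Anomaly)

  def this() = this(Identifiable.randomUID("MyAnomalyService"))
}
```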
@@ -14,7 +14,7 @@ import com.microsoft.azure.synapse.ml.core.schema.DatasetExtensions
import com.microsoft.azure.synapse.ml.io.http.HandlingUtils.{convertAndClose, sendWithRetries}
import com.microsoft.azure.synapse.ml.io.http.RESTHelpers.{Client, retry}
import com.microsoft.azure.synapse.ml.io.http._
- import com.microsoft.azure.synapse.ml.logging.SynapseMLLogging
+ import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.stages._
import com.microsoft.azure.synapse.ml.param.CognitiveServiceStructParam
import org.apache.commons.io.IOUtils
@@ -416,7 +416,7 @@ object SimpleFitMultivariateAnomaly extends ComplexParamsReadable[SimpleFitMulti

class SimpleFitMultivariateAnomaly(override val uid: String) extends Estimator[SimpleDetectMultivariateAnomaly]
with MADBase {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("SimpleFitMultivariateAnomaly"))

@@ -569,7 +569,7 @@ object SimpleDetectMultivariateAnomaly extends ComplexParamsReadable[SimpleDetec

class SimpleDetectMultivariateAnomaly(override val uid: String) extends Model[SimpleDetectMultivariateAnomaly]
with MADBase with HasHandler with DetectMAParams {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("SimpleDetectMultivariateAnomaly"))

@@ -654,7 +654,7 @@ class DetectLastMultivariateAnomaly(override val uid: String) extends CognitiveS
with HasSetLocation with HasCognitiveServiceInput with HasBatchSize
with ComplexParamsWritable with Wrappable
with HasErrorCol with SynapseMLLogging with DetectMAParams {
- logClass()
+ logClass(FeatureNames.AiServices.Anomaly)

def this() = this(Identifiable.randomUID("DetectLastMultivariateAnomaly"))

@@ -5,7 +5,7 @@ package com.microsoft.azure.synapse.ml.cognitive.bing

import com.microsoft.azure.synapse.ml.cognitive._
import com.microsoft.azure.synapse.ml.core.utils.AsyncUtils
- import com.microsoft.azure.synapse.ml.logging.SynapseMLLogging
+ import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.param.ServiceParam
import com.microsoft.azure.synapse.ml.stages.Lambda
import org.apache.commons.io.IOUtils
@@ -67,7 +67,7 @@ object BingImageSearch extends ComplexParamsReadable[BingImageSearch] with Seria
class BingImageSearch(override val uid: String)
extends CognitiveServicesBase(uid)
with HasCognitiveServiceInput with HasInternalJsonOutputParser with SynapseMLLogging with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.BingImage)

override protected lazy val pyInternalWrapper = true

@@ -5,7 +5,7 @@ package com.microsoft.azure.synapse.ml.cognitive.face

import com.microsoft.azure.synapse.ml.cognitive._
import com.microsoft.azure.synapse.ml.cognitive.vision.HasImageUrl
- import com.microsoft.azure.synapse.ml.logging.SynapseMLLogging
+ import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.param.ServiceParam
import org.apache.http.entity.{AbstractHttpEntity, StringEntity}
import org.apache.spark.ml.ComplexParamsReadable
@@ -21,7 +21,7 @@ class DetectFace(override val uid: String)
extends CognitiveServicesBase(uid) with HasImageUrl with HasServiceParams
with HasCognitiveServiceInput with HasInternalJsonOutputParser with HasSetLocation with SynapseMLLogging
with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.Face)

def this() = this(Identifiable.randomUID("DetectFace"))

@@ -100,7 +100,7 @@ class FindSimilarFace(override val uid: String)
with HasMaxNumOfCandidatesReturned with HasFaceIds
with HasCognitiveServiceInput with HasInternalJsonOutputParser with HasSetLocation with SynapseMLLogging
with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.Face)

def this() = this(Identifiable.randomUID("FindSimilarFace"))

@@ -189,7 +189,7 @@ class GroupFaces(override val uid: String)
extends CognitiveServicesBase(uid) with HasServiceParams
with HasFaceIds with HasSetLocation
with HasCognitiveServiceInput with HasInternalJsonOutputParser with SynapseMLLogging with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.Face)

def this() = this(Identifiable.randomUID("GroupFaces"))

@@ -212,7 +212,7 @@ class IdentifyFaces(override val uid: String)
with HasMaxNumOfCandidatesReturned with HasFaceIds
with HasCognitiveServiceInput with HasInternalJsonOutputParser with HasSetLocation with SynapseMLLogging
with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.Face)

def this() = this(Identifiable.randomUID("IdentifyFaces"))

@@ -281,7 +281,7 @@ class VerifyFaces(override val uid: String)
extends CognitiveServicesBase(uid) with HasServiceParams
with HasCognitiveServiceInput with HasInternalJsonOutputParser with HasSetLocation with SynapseMLLogging
with HasSetLinkedService {
- logClass()
+ logClass(FeatureNames.AiServices.Face)

def this() = this(Identifiable.randomUID("VerifyFaces"))

@@ -5,7 +5,7 @@ package com.microsoft.azure.synapse.ml.cognitive.form

import com.microsoft.azure.synapse.ml.codegen.Wrappable
import com.microsoft.azure.synapse.ml.core.contracts.{HasInputCol, HasOutputCol}
- import com.microsoft.azure.synapse.ml.logging.SynapseMLLogging
+ import com.microsoft.azure.synapse.ml.logging.{FeatureNames, SynapseMLLogging}
import com.microsoft.azure.synapse.ml.param.DataTypeParam
import org.apache.spark.injections.UDFUtils
import org.apache.spark.ml.param.ParamMap
@@ -42,7 +42,7 @@ object FormOntologyLearner extends DefaultParamsReadable[FormOntologyLearner] {

class FormOntologyLearner(override val uid: String) extends Estimator[FormOntologyTransformer]
with SynapseMLLogging with DefaultParamsWritable with HasInputCol with HasOutputCol with Wrappable {
- logClass()
+ logClass(FeatureNames.AiServices.Form)

def this() = this(Identifiable.randomUID("FormOntologyLearner"))

@@ -87,7 +87,7 @@ object FormOntologyTransformer extends ComplexParamsReadable[FormOntologyTransfo

class FormOntologyTransformer(override val uid: String) extends Model[FormOntologyTransformer]
with SynapseMLLogging with ComplexParamsWritable with HasInputCol with HasOutputCol with Wrappable {
- logClass()
+ logClass(FeatureNames.AiServices.Form)

val ontology: DataTypeParam = new DataTypeParam(
parent = this,
(Remaining changed files did not load and are not shown here.)
