
Commit c3dbec6

Update latest versions in documentation (#21247)
* Updating latest Spark .NET version, .NET Core version, and microsoft-spark jar version
* Fixing markdown errors
* PR comments
1 parent 369a348 commit c3dbec6

20 files changed (+129 −117 lines)

docs/spark/how-to-guides/broadcast-guide.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -92,6 +92,11 @@ Func<Column, Column> udf2 = Udf<string, string>(
 df.Select(udf2(df["_1"])).Show();
 ```
 
+## FAQs
+
+**Why don't Broadcast Variables work with .NET Interactive?**
+
+Broadcast variables don't work in interactive scenarios because .NET Interactive wraps each object defined in a cell in its cell submission class, which is not marked serializable and therefore fails with the same exception shown previously. For more information, please check out [this article](dotnet-interactive-udf-issue.md).
 ## Next steps
 
 * [Get started with .NET for Apache Spark](../tutorials/get-started.md)
````
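For context on the FAQ added above, here is a minimal sketch of the broadcast pattern it refers to, assuming the `Microsoft.Spark` API; it is not part of this commit, and the type name `ExampleData` is illustrative:

```csharp
// Minimal sketch (assumed API and names; not part of this commit).
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

[Serializable]
public class ExampleData
{
    public string Suffix = "-broadcast";
}

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Range(0, 3);

        // In a compiled app this works because ExampleData is [Serializable].
        // In .NET Interactive, the cell submission class wrapping ExampleData
        // is not serializable, so the same call throws the exception the FAQ describes.
        var bv = spark.SparkContext.Broadcast(new ExampleData());

        Func<Column, Column> udf = Udf<long, string>(id => id + bv.Value().Suffix);
        df.Select(udf(df["id"])).Show();
    }
}
```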

docs/spark/how-to-guides/connect-to-mongo-db.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -77,7 +77,7 @@ In order to get .NET for Apache Spark to talk to your MongoDB instance you need
 In order to run your .NET for Apache Spark application, you should define the `mongo-spark-connector` module as part of the build definition in your Spark project, using `libraryDependency` in `build.sbt` for sbt projects. For Spark environments such as `spark-submit` (or `spark-shell`) you should use the `--packages` command-line option like so:
 
 ```bash
-spark-submit --master local --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.0 --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-<version>.jar yourApp.exe
+spark-submit --master local --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.0 --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar yourApp.exe
 ```
 
 > [!NOTE]
````
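The new jar-name placeholder recurs throughout this commit. As a hedged illustration only (the concrete versions are assumptions, not mandated by the commit), with Spark 3.0, Scala 2.12, and .NET for Apache Spark 1.0.0 the command expands to:

```bash
# Illustrative expansion of the placeholders; substitute your own versions.
spark-submit --master local \
  --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.0 \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  microsoft-spark-3-0_2.12-1.0.0.jar yourApp.exe
```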

docs/spark/how-to-guides/databricks-deploy-methods.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -17,7 +17,7 @@ You can use the [spark-submit](https://spark.apache.org/docs/latest/submitting-a
 1. Navigate to your Databricks Workspace and create a job. Choose a title for your job, and then select **Configure spark-submit**. Paste the following parameters in the job configuration, then select **Confirm**.
 
 ```
-["--files","/dbfs/<path-to>/<app assembly/file to deploy to worker>","--class","org.apache.spark.deploy.dotnet.DotnetRunner","/dbfs/<path-to>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar","/dbfs/<path-to>/<app name>.zip","<app bin name>","app arg1","app arg2"]
+["--files","/dbfs/<path-to>/<app assembly/file to deploy to worker>","--class","org.apache.spark.deploy.dotnet.DotnetRunner","/dbfs/<path-to>/microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar","/dbfs/<path-to>/<app name>.zip","<app bin name>","app arg1","app arg2"]
 ```
 
 > [!NOTE]
@@ -35,7 +35,7 @@ Alternatively, you can use [Set Jar](/azure/databricks/jobs#--create-a-job) in y
 
 1. Navigate to your Databricks cluster and select **Jobs** from the left-side menu, followed by **Set JAR**.
 
-2. Upload the appropriate `microsoft-spark-<spark-version>-<spark-dotnet-version>.jar`.
+2. Upload the appropriate `microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar`.
 
 3. Modify the following parameters to include the correct name for the executable that you published in place of `<your-app-name>`:
````
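As a hedged illustration of the spark-submit parameter array above with its placeholders filled in (every path, name, and version here is an assumption):

```json
["--files","/dbfs/apps/mySparkApp.dll",
 "--class","org.apache.spark.deploy.dotnet.DotnetRunner",
 "/dbfs/jars/microsoft-spark-3-0_2.12-1.0.0.jar",
 "/dbfs/apps/mySparkApp.zip","mySparkApp","app arg1","app arg2"]
```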

docs/spark/how-to-guides/deploy-worker-udf-binaries.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -55,7 +55,7 @@ Once the Spark application is [bundled](https://spark.apache.org/docs/latest/sub
 ### After submitting my Spark application, I get the error `System.TypeLoadException: Could not load type 'System.Runtime.Remoting.Contexts.Context'`.
 
 > **Error:** [Error] [TaskRunner] [0] ProcessStream() failed with exception: System.TypeLoadException: Could not load type 'System.Runtime.Remoting.Contexts.Context' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=...'.
 
-**Answer:** Check the `Microsoft.Spark.Worker` version you are using. There are two versions: **.NET Framework 4.6.1** and **.NET Core 2.1.x**. In this case, `Microsoft.Spark.Worker.net461.win-x64-<version>` (which you can [download](https://github.com/dotnet/spark/releases)) should be used since `System.Runtime.Remoting.Contexts.Context` is only for .NET Framework.
+**Answer:** Check the `Microsoft.Spark.Worker` version you are using. There are two versions: **.NET Framework 4.6.1** and **.NET Core 3.1.x**. In this case, `Microsoft.Spark.Worker.net461.win-x64-<version>` (which you can [download](https://github.com/dotnet/spark/releases)) should be used since `System.Runtime.Remoting.Contexts.Context` is only for .NET Framework.
 
 ### How do I run my spark application with UDFs on YARN? Which environment variables and parameters should I use?
 
@@ -69,7 +69,7 @@ spark-submit \
 --conf spark.yarn.appMasterEnv.DOTNET_WORKER_DIR=./worker/Microsoft.Spark.Worker-<version> \
 --conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs \
 --archives hdfs://<path to your files>/Microsoft.Spark.Worker.net461.win-x64-<version>.zip#worker,hdfs://<path to your files>/mySparkApp.zip#udfs \
-hdfs://<path to jar file>/microsoft-spark-2.4.x-<version>.jar \
+hdfs://<path to jar file>/microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar \
 hdfs://<path to your files>/mySparkApp.zip mySparkApp
 ```
````

docs/spark/how-to-guides/dotnet-spark-jupyter-notebooks.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -43,7 +43,7 @@ To work with Jupyter Notebooks, you'll need two things.
 
 ## Start .NET for Apache Spark
 
-Run the following command to start .NET for Apache Spark in debug mode. This `spark-submit` command starts a process and waits for connections from a [SparkSession](xref:Microsoft.Spark.Sql.SparkSession). Make sure to provide the path to the `microsoft-spark-<version>.jar` for the respective version of .NET for Apache Spark you're using.
+Run the following command to start .NET for Apache Spark in debug mode. This `spark-submit` command starts a process and waits for connections from a [SparkSession](xref:Microsoft.Spark.Sql.SparkSession). Make sure to provide the path to the `microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar` for the respective version of .NET for Apache Spark you're using.
 
 **Ubuntu**
````

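The Ubuntu command itself falls outside this hunk; a sketch of the typical debug-mode invocation, assuming an illustrative jar path and version:

```bash
# Sketch only; adjust the jar path and versions to your install.
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  /path/to/microsoft-spark-3-0_2.12-1.0.0.jar \
  debug
```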
docs/spark/how-to-guides/hdinsight-deploy-methods.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -18,13 +18,13 @@ You can use the [spark-submit](https://spark.apache.org/docs/latest/submitting-a
 
 2. Copy the ssh login information and paste the login into a terminal. Sign in to your cluster using the password you set during cluster creation. You should see messages welcoming you to Ubuntu and Spark.
 
-3. Use the **spark-submit** command to run your app on your HDInsight cluster. Remember to replace **mycontainer** and **mystorageaccount** in the example script with the actual names of your blob container and storage account. Also, be sure to replace `microsoft-spark-2.3.x-0.6.0.jar` with the appropriate jar file you're using for deployment. `2.3.x` represents the version of Apache Spark, and `0.6.0` represents the version of the [.NET for Apache Spark worker](https://github.com/dotnet/spark/releases).
+3. Use the **spark-submit** command to run your app on your HDInsight cluster. Remember to replace **mycontainer** and **mystorageaccount** in the example script with the actual names of your blob container and storage account. Also remember to replace the microsoft-spark jar with the version of Spark and .NET for Apache Spark being used.
 
 ```bash
 $SPARK_HOME/bin/spark-submit \
 --master yarn \
 --class org.apache.spark.deploy.dotnet.DotnetRunner \
-wasbs://mycontainer@mystorageaccount.blob.core.windows.net/microsoft-spark-2.3.x-0.6.0.jar \
+wasbs://mycontainer@mystorageaccount.blob.core.windows.net/microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar \
 wasbs://mycontainer@mystorageaccount.blob.core.windows.net/publish.zip mySparkApp
 ```
 
@@ -41,7 +41,7 @@ curl -k -v -X POST "https://<your spark cluster>.azurehdinsight.net/livy/batches
 -H "X-Requested-By: <hdinsight username>" \
 -d @- << EOF
 {
-    "file":"abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
+    "file":"abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/microsoft-spark-<spark_majorversion-spark_minorversion>_<scala_majorversion.scala_minorversion>-<spark_dotnet_version>.jar",
     "className":"org.apache.spark.deploy.dotnet.DotnetRunner",
     "files":["abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<udf assembly>", "abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<file>"],
     "args":["abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<some dir>/<your app>.zip","<your app>","<app arg 1>","<app arg 2>","...","<app arg n>"]
````

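A hedged sketch of the full Livy request body with the placeholders above filled in; the file system, storage account, directory, and version names are all assumptions:

```json
{
    "file": "abfss://myfs@mystorageaccount.dfs.core.windows.net/apps/microsoft-spark-3-0_2.12-1.0.0.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["abfss://myfs@mystorageaccount.dfs.core.windows.net/apps/myUdfs.dll"],
    "args": ["abfss://myfs@mystorageaccount.dfs.core.windows.net/apps/mySparkApp.zip", "mySparkApp", "app arg 1", "app arg 2"]
}
```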
docs/spark/how-to-guides/hdinsight-notebook-installation.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -68,7 +68,7 @@ In the Azure portal, select the **HDInsight Spark cluster** you created in the p
 | Name | *Install .NET for Apache Spark Interactive Notebook Experience* |
 | Bash script URI | The URI to which you uploaded `install-interactive-notebook.sh`. |
 | Node type(s)| Head and Worker |
-| Parameters | .NET for Apache Spark version. You can check [.NET for Apache Spark releases](https://github.com/dotnet/spark/releases). For example, if you want to install Sparkdotnet version 0.6.0 then it would be `0.6.0`.
+| Parameters | .NET for Apache Spark version. You can check [.NET for Apache Spark releases](https://github.com/dotnet/spark/releases). For example, if you want to install Sparkdotnet version 1.0.0 then it would be `1.0.0`.
 
 Move to the next step when green checkmarks appear next to the status of the script action.
 
@@ -96,7 +96,7 @@ Follow the instructions in the [Stop Livy server](#stop-the-livy-server) section
 
 * **Property 2** Use the version of .NET for Apache Spark which you had included in the previous script action.
   * Key:&ensp;&ensp;`spark.dotnet.packages`
-  * Value: `["nuget: Microsoft.Spark, 0.6.0", "nuget: Microsoft.Spark.Extensions.Delta, 0.6.0"]`
+  * Value: `["nuget: Microsoft.Spark, 1.0.0", "nuget: Microsoft.Spark.Extensions.Delta, 1.0.0"]`
 
 * **Property 3**
   * Key:&ensp;&ensp;`spark.dotnet.interpreter`
````

docs/spark/how-to-guides/java-udf-from-dotnet.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -66,7 +66,7 @@ A basic example to illustrate the above steps:
 4. Submit this application using `spark-submit` by passing the previously compiled Java UDF jar through the `--jars` option:
 
 ```bash
-spark-submit --master local --jars UdfApp-0.0.1.jar --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-3.0.x-0.12.1.jar InterRuntimeUDFs.exe
+spark-submit --master local --jars UdfApp-0.0.1.jar --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-2-4_2.11-1.0.0.jar InterRuntimeUDFs.exe
 ```
 
 The resultant `dfUdf` DataFrame had the number 5 added to each row of the input column as defined by `JavaUdf`:
````
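For readers of this hunk, a hedged sketch of the .NET side that produces `dfUdf`; the registration call and the Java class name `com.example.JavaUdf` are assumptions based on the `Microsoft.Spark` API, not this commit's content:

```csharp
// Sketch of InterRuntimeUDFs; assumed API and names, not part of this commit.
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class InterRuntimeUDFs
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Range(0, 3);

        // Register the Java UDF shipped in UdfApp-0.0.1.jar under a SQL-callable name.
        spark.Udf().RegisterJava<int>("javaUdf", "com.example.JavaUdf");

        // Per the doc, JavaUdf adds 5 to each value of the input column.
        DataFrame dfUdf = df.Select(CallUDF("javaUdf", df["id"]));
        dfUdf.Show();
    }
}
```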

0 commit comments