[FSTORE-1248] Papermill integration (#359)
* Add papermill job guide and update other jobs guides to 3.8.0

---------

Co-authored-by: marcopellegrinoit <[email protected]>
Marco Pellegrino and marcopellegrinoit authored Mar 20, 2024
1 parent bea4385 commit 8dcf0e3
Showing 21 changed files with 241 additions and 40 deletions.
Binary file modified docs/assets/images/guides/jobs/create_new_job.png
Binary file added docs/assets/images/guides/jobs/job_py_args.png
Binary file added docs/assets/images/guides/jobs/job_spark_args.png
Binary file modified docs/assets/images/guides/jobs/jobs_overview.png
Binary file modified docs/assets/images/guides/jobs/spark_main_class.gif
Binary file modified docs/assets/images/guides/jobs/start_job_py.gif
Binary file modified docs/assets/images/guides/jobs/start_job_pyspark.gif
Binary file modified docs/assets/images/guides/jobs/upload_job_py_file.gif
Binary file modified docs/assets/images/guides/jobs/upload_job_spark_file.gif
177 changes: 177 additions & 0 deletions docs/user_guides/projects/jobs/notebook_job.md
@@ -0,0 +1,177 @@
---
description: Documentation on how to configure and execute a Jupyter Notebook job on Hopsworks.
---

# How To Run A Jupyter Notebook Job

## Introduction

All members of a project in Hopsworks can launch the following types of applications through a project's Jobs service:

- Python (*Hopsworks Enterprise only*)
- Apache Spark

Launching a job of any type is a very similar process; what mostly differs between job types is the configuration parameters each job type comes with. After following this guide you will be able to create a Jupyter Notebook job.

!!! note "Kubernetes integration required"
Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.

Hopsworks can be integrated with [Amazon EKS](../../../setup_installation/aws/eks_ecr_integration.md), [Azure AKS](../../../setup_installation/azure/aks_acr_integration.md) and on-premise Kubernetes clusters.

## UI

### Step 1: Jobs overview

The image below shows the Jobs overview page in Hopsworks and is accessed by clicking `Jobs` in the sidebar.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/jobs_overview.png" alt="Jobs overview">
<figcaption>Jobs overview</figcaption>
</figure>
</p>

### Step 2: Create new job dialog

Click `New Job` and the following dialog will appear.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/create_new_job.png" alt="Create new job dialog">
<figcaption>Create new job dialog</figcaption>
</figure>
</p>

### Step 3: Set the job type

By default, the dialog will create a Spark job. To instead configure a Jupyter Notebook job, select `PYTHON`.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/jobs_select_python.gif" alt="Select Python job type">
<figcaption>Select Python job type</figcaption>
</figure>
</p>

### Step 4: Set the script

The next step is to select the Jupyter Notebook to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/upload_job_notebook_file.gif" alt="Configure program">
<figcaption>Configure program</figcaption>
</figure>
</p>

Then click `Create job` to create the job.

### Step 5 (optional): Set the Jupyter Notebook arguments

In the job settings, you can specify arguments for your notebook script.
Arguments must be in the format `-arg1 value1 -arg2 value2`: for each argument, provide the parameter name (e.g. `arg1`) preceded by a hyphen (`-`), followed by its value (e.g. `value1`).
You do not need to handle the arguments in your notebook. Our system uses [Papermill](https://papermill.readthedocs.io/en/latest/) to insert a new cell containing the initialized parameters, as sketched below the figure.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/job_notebook_args.png" alt="Configure notebook arguments">
<figcaption>Configure notebook arguments</figcaption>
</figure>
</p>
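For illustration, here is a minimal sketch of how the parameter injection works, assuming hypothetical arguments `-a 2 -b 5` and a notebook whose first cell is tagged `parameters` (Papermill's convention for where to inject values):

```python
# Hypothetical cell tagged "parameters" in your notebook:
# these defaults apply when the job is run without arguments.
a = 1
b = 2

# Given the arguments `-a 2 -b 5`, Papermill injects a new cell after the
# "parameters" cell (or at the top if no cell is tagged), roughly:
a = 2
b = 5
# Depending on how the values are parsed, they may arrive as strings,
# e.g. a = "2".
```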

### Step 6 (optional): Additional configuration

It is also possible to set the following configuration settings for a `PYTHON` job.

* `Container memory`: The amount of memory in MB to be allocated to the Jupyter Notebook script
* `Container cores`: The number of cores to be allocated for the Jupyter Notebook script
* `Additional files`: List of files that will be locally accessible by the application

You can always modify the arguments in the job settings.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/configure_py.png" alt="Set the job type">
<figcaption>Set the job type</figcaption>
</figure>
</p>

### Step 7: Execute the job

Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/start_job_notebook.gif" alt="Start job execution">
<figcaption>Start job execution</figcaption>
</figure>
</p>

### Step 8: Visualize output notebook
Once the execution has finished, click `Logs` and then `notebook out` to view the executed notebook with its output.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/job_view_out_notebook.gif" alt="Visualize output notebook">
<figcaption>Visualize output notebook</figcaption>
</figure>
</p>

You can directly edit and save the output notebook by clicking `Open Notebook`.

## Code

### Step 1: Upload the Jupyter Notebook script

This snippet assumes the Jupyter Notebook script is in the current working directory and named `notebook.ipynb`.

It will upload the Jupyter Notebook script to the `Resources` dataset in your project.

```python

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

uploaded_file_path = dataset_api.upload("notebook.ipynb", "Resources")

```
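The returned `uploaded_file_path` is the path to the notebook inside the project, and is used to configure the job in the next step.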


### Step 2: Create Jupyter Notebook job

In this snippet we use the `JobsApi` object to fetch the default job configuration for a `PYTHON` job, set the Jupyter Notebook to run, and create the `Job` object.

```python

jobs_api = project.get_jobs_api()

notebook_job_config = jobs_api.get_configuration("PYTHON")

notebook_job_config['appPath'] = uploaded_file_path

job = jobs_api.create_job("notebook_job", notebook_job_config)

```
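The configuration is a plain dictionary, so the settings described in step 6 of the UI guide can also be adjusted here before the job is created. A sketch only: the key names below are assumptions based on the UI labels, so inspect the dictionary returned by `get_configuration` to confirm them for your Hopsworks version.

```python
# Assumed key names - verify against the configuration dictionary.
notebook_job_config['resourceConfig']['memory'] = 2048  # container memory (MB)
notebook_job_config['resourceConfig']['cores'] = 1      # container cores
```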

### Step 3: Execute the job

In this code snippet, we execute the job with arguments and wait until it reaches a terminal state.

```python

# Run the job
execution = job.run(args='-a 2 -b 5', await_termination=True)
```
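Once the execution has terminated, you can inspect the outcome and fetch the logs. A minimal sketch, assuming `Execution.success` and `Execution.download_logs()` behave as in recent versions of the `hopsworks` library:

```python
# Check the outcome and download the stdout/stderr logs locally
print(f"Succeeded: {execution.success}")

out_log_path, err_log_path = execution.download_logs()
print(f"stdout log downloaded to: {out_log_path}")
```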

### API Reference

[Jobs](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/jobs/)

[Executions](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/executions/)

## Conclusion

In this guide you learned how to create and run a Jupyter Notebook job.
32 changes: 19 additions & 13 deletions docs/user_guides/projects/jobs/pyspark_job.md
@@ -36,7 +36,7 @@ The image below shows the Jobs overview page in Hopsworks and is accessed by cli

### Step 2: Create new job dialog

To configure a job, click `Advanced options`, which will open up the advanced configuration page for the job.
Click `New Job` and the following dialog will appear.

<p align="center">
<figure>
@@ -45,29 +45,36 @@ To configure a job, click `Advanced options`, which will open up the advanced c
</figure>
</p>

### Step 3: Set the script
### Step 3: Set the job type

Next step is to select the program to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file` which lets you select a file from your local filesystem as demonstrated below.
By default, the dialog will create a Spark job. Make sure `SPARK` is chosen.

### Step 4: Set the script

The next step is to select the program to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/upload_job_py_file.gif" alt="Configure program">
<img src="../../../../assets/images/guides/jobs/upload_job_pyspark_file.gif" alt="Configure program">
<figcaption>Configure program</figcaption>
</figure>
</p>

### Step 4: Set the job type
Then click `Create job` to create the job.

Next step is to set the job type to `SPARK` to indicate it should be executed as a spark job. Then specify [advanced configuration](#step-5-optional-advanced-configuration) or click `Create New Job` to create the job.
### Step 5 (optional): Set the PySpark script arguments

In the job settings, you can specify arguments for your PySpark script.
Remember to handle these arguments inside the script, as sketched below the figure.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/advanced_configuration_pyspark.png" alt="Set the job type">
<figcaption>Set the job type</figcaption>
<img src="../../../../assets/images/guides/jobs/job_py_args.png" alt="Configure PySpark script arguments">
<figcaption>Configure PySpark script arguments</figcaption>
</figure>
</p>
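As a sketch of what handling the arguments can look like (argument names are hypothetical), the script can read them from `sys.argv`, which is how command-line arguments typically reach a PySpark script:

```python
import sys

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# For arguments passed as `-a 2 -b 5`, sys.argv[1:] is
# ['-a', '2', '-b', '5']; pair each flag with its value.
args = dict(zip(sys.argv[1::2], sys.argv[2::2]))  # {'-a': '2', '-b': '5'}
print(args)
```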

### Step 5 (optional): Advanced configuration
### Step 6 (optional): Advanced configuration

Resource allocation for the Spark driver and executors can be configured, as well as the number of executors and whether dynamic allocation should be enabled.

@@ -115,10 +122,9 @@ Line-separates [properties](https://spark.apache.org/docs/3.1.1/configuration.ht
</figure>
</p>

### Step 6: Execute the job

Now click the `Run` button to start the execution of the job, and then click on `Executions` to see the list of all executions.
### Step 7: Execute the job

Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.

<p align="center">
<figure>
Expand All @@ -127,7 +133,7 @@ Now click the `Run` button to start the execution of the job, and then click on
</figure>
</p>

### Step 7: Application logs
### Step 8: Application logs

To monitor logs while the execution is running, click `Spark UI` to open the Spark UI in a separate tab.

33 changes: 22 additions & 11 deletions docs/user_guides/projects/jobs/python_job.md
@@ -34,7 +34,7 @@ The image below shows the Jobs overview page in Hopsworks and is accessed by cli

### Step 2: Create new job dialog

By default, the dialog will create a Spark job. To instead configure a Python job, click `Advanced options`, which will open up the advanced configuration page for the job.
Click `New Job` and the following dialog will appear.

<p align="center">
<figure>
@@ -43,9 +43,20 @@ By default, the dialog will create a Spark job. To instead configure a Python jo
</figure>
</p>

### Step 3: Set the script
### Step 3: Set the job type

Next step is to select the python script to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file` which lets you select a file from your local filesystem as demonstrated below.
By default, the dialog will create a Spark job. To instead configure a Python job, select `PYTHON`.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/jobs_select_python.gif" alt="Select Python job type">
<figcaption>Select Python job type</figcaption>
</figure>
</p>

### Step 4: Set the script

The next step is to select the Python script to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.

<p align="center">
<figure>
@@ -54,18 +65,19 @@ Next step is to select the python script to run. You can either select `From pro
</figure>
</p>

### Step 4: Set the job type
### Step 5 (optional): Set the Python script arguments

Next step is to set the job type to `PYTHON` to indicate it should be executed as a simple python script. Then click `Create New Job` to create the job.
In the job settings, you can specify arguments for your Python script.
Remember to handle these arguments inside the script, as sketched below the figure.

<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/advanced_configuration_py.gif" alt="Set the job type">
<figcaption>Set the job type</figcaption>
<img src="../../../../assets/images/guides/jobs/job_notebook_args.png" alt="Configure Python script arguments">
<figcaption>Configure Python script arguments</figcaption>
</figure>
</p>
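As a sketch of how such arguments might be handled (argument names are hypothetical), the standard-library `argparse` module accepts the `-arg1 value1` style directly:

```python
import argparse

# Parse arguments passed to the job, e.g. `-arg1 foo -arg2 bar`
parser = argparse.ArgumentParser()
parser.add_argument("-arg1")
parser.add_argument("-arg2")
args = parser.parse_args()

print(args.arg1, args.arg2)
```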

### Step 5 (optional): Additional configuration
### Step 6 (optional): Additional configuration

It is also possible to set the following configuration settings for a `PYTHON` job.

@@ -80,13 +92,12 @@ It is possible to also set following configuration settings for a `PYTHON` job.
</figure>
</p>

### Step 6: Execute the job
### Step 7: Execute the job

Now click the `Run` button to start the execution of the job, and then click on `Executions` to see the list of all executions.
Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.

Once the execution is finished, click on `Logs` to see the logs for the execution.


<p align="center">
<figure>
<img src="../../../../assets/images/guides/jobs/start_job_py.gif" alt="Start job execution">
