
Commit 8dcf0e3

Marco Pellegrino and marcopellegrinoit authored
[FSTORE-1248] Papermill integration (#359)
* Add papermill job guide and update other jobs guides to 3.8.0 --------- Co-authored-by: marcopellegrinoit <[email protected]>
1 parent bea4385 commit 8dcf0e3

21 files changed: +241 -40 lines changed
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
description: Documentation on how to configure and execute a Jupyter Notebook job on Hopsworks.
3+
---
4+
5+
# How To Run A Jupyter Notebook Job
6+
7+
## Introduction
8+
9+
All members of a project in Hopsworks can launch the following types of applications through a project's Jobs service:
10+
11+
- Python (*Hopsworks Enterprise only*)
12+
- Apache Spark
13+
14+
Launching a job of any type is a very similar process; what mostly differs between job types is
15+
the various configuration parameters each job type comes with. After following this guide you will be able to create a Jupyter Notebook job.
16+
17+
!!! note "Kubernetes integration required"
18+
Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.
19+
20+
Hopsworks can be integrated with [Amazon EKS](../../../setup_installation/aws/eks_ecr_integration.md), [Azure AKS](../../../setup_installation/azure/aks_acr_integration.md) and on-premise Kubernetes clusters.
21+
22+
## UI
23+
24+
### Step 1: Jobs overview
25+
26+
The image below shows the Jobs overview page in Hopsworks, which is accessed by clicking `Jobs` in the sidebar.
27+
28+
<p align="center">
29+
<figure>
30+
<img src="../../../../assets/images/guides/jobs/jobs_overview.png" alt="Jobs overview">
31+
<figcaption>Jobs overview</figcaption>
32+
</figure>
33+
</p>
34+
35+
### Step 2: Create new job dialog
36+
37+
Click `New Job` and the following dialog will appear.
38+
39+
<p align="center">
40+
<figure>
41+
<img src="../../../../assets/images/guides/jobs/create_new_job.png" alt="Create new job dialog">
42+
<figcaption>Create new job dialog</figcaption>
43+
</figure>
44+
</p>
45+
46+
### Step 3: Set the job type
47+
48+
By default, the dialog will create a Spark job. To instead configure a Jupyter Notebook job, select `PYTHON`.
49+
50+
<p align="center">
51+
<figure>
52+
<img src="../../../../assets/images/guides/jobs/jobs_select_python.gif" alt="Select Python job type">
53+
<figcaption>Select Python job type</figcaption>
54+
</figure>
55+
</p>
56+
57+
### Step 4: Set the script
58+
59+
The next step is to select the Jupyter Notebook to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.
60+
61+
<p align="center">
62+
<figure>
63+
<img src="../../../../assets/images/guides/jobs/upload_job_notebook_file.gif" alt="Configure program">
64+
<figcaption>Configure program</figcaption>
65+
</figure>
66+
</p>
67+
68+
Then click `Create job` to create the job.
69+
70+
### Step 5 (optional): Set the Jupyter Notebook arguments
71+
72+
In the job settings, you can specify arguments for your notebook script.
73+
Arguments must be in the format of `-arg1 value1 -arg2 value2`. For each argument, you must provide the parameter name (e.g. `arg1`) preceded by a hyphen (`-`), followed by its value (e.g. `value1`).
74+
You do not need to handle the arguments in your notebook. Our system uses [Papermill](https://papermill.readthedocs.io/en/latest/) to insert a new cell containing the initialized parameters.
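
As a rough illustration of what this means in practice, the sketch below parses an argument string in the `-arg1 value1 -arg2 value2` format and renders the kind of parameters cell that Papermill injects (Papermill places it after a cell tagged `parameters` if the notebook has one, otherwise at the top). The helpers `parse_job_args` and `render_parameters_cell` are hypothetical names used only for this example and are not part of Hopsworks or Papermill. Keeping every value as a string is also an assumption; this guide does not state how Hopsworks types the values.

```python
import shlex


def parse_job_args(arg_string):
    """Parse '-name value' pairs into a dict (illustrative helper, not a Hopsworks API)."""
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-name value' pairs")
    params = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-") or len(name) < 2:
            raise ValueError(f"argument name must start with '-': {name!r}")
        # Drop the leading hyphen; values stay strings (an assumption of this sketch)
        params[name[1:]] = value
    return params


def render_parameters_cell(params):
    """Render the source of a parameters cell similar to the one Papermill injects."""
    lines = ["# Parameters"]
    lines += [f"{name} = {value!r}" for name, value in params.items()]
    return "\n".join(lines)


params = parse_job_args("-arg1 value1 -arg2 value2")
print(render_parameters_cell(params))
```

With the arguments above, the notebook could then refer to `arg1` and `arg2` as ordinary variables in any cell that runs after the injected one.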
75+
76+
<p align="center">
77+
<figure>
78+
<img src="../../../../assets/images/guides/jobs/job_notebook_args.png" alt="Configure notebook arguments">
79+
<figcaption>Configure notebook arguments</figcaption>
80+
</figure>
81+
</p>
82+
83+
### Step 6 (optional): Additional configuration
84+
85+
It is also possible to set the following configuration settings for a `PYTHON` job.
86+
87+
* `Container memory`: The amount of memory in MB to be allocated to the Jupyter Notebook script
88+
* `Container cores`: The number of cores to be allocated for the Jupyter Notebook script
89+
* `Additional files`: List of files that will be locally accessible by the application
90+
You can always modify the arguments in the job settings.
91+
92+
<p align="center">
93+
<figure>
94+
<img src="../../../../assets/images/guides/jobs/configure_py.png" alt="Set the job type">
95+
<figcaption>Set the job type</figcaption>
96+
</figure>
97+
</p>
98+
99+
### Step 7: Execute the job
100+
101+
Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.
102+
103+
<p align="center">
104+
<figure>
105+
<img src="../../../../assets/images/guides/jobs/start_job_notebook.gif" alt="Start job execution">
106+
<figcaption>Start job execution</figcaption>
107+
</figure>
108+
</p>
109+
110+
### Step 8: Visualize output notebook
111+
Once the execution is finished, click `Logs` and then `notebook out` to view the output notebook for the execution.
112+
113+
<p align="center">
114+
<figure>
115+
<img src="../../../../assets/images/guides/jobs/job_view_out_notebook.gif" alt="Visualize output notebook">
116+
<figcaption>Visualize output notebook</figcaption>
117+
</figure>
118+
</p>
119+
120+
You can directly edit and save the output notebook by clicking `Open Notebook`.
121+
122+
## Code
123+
124+
### Step 1: Upload the Jupyter Notebook script
125+
126+
This snippet assumes the Jupyter Notebook script is in the current working directory and named `notebook.ipynb`.
127+
128+
It will upload the Jupyter Notebook script to the `Resources` dataset in your project.
129+
130+
```python
import hopsworks

# Log in and get a handle to the project
project = hopsworks.login()

dataset_api = project.get_dataset_api()

# Upload the notebook to the Resources dataset and keep its path in the project
uploaded_file_path = dataset_api.upload("notebook.ipynb", "Resources")
```
141+
142+
143+
### Step 2: Create Jupyter Notebook job
144+
145+
This snippet uses the `JobsApi` object to get the default job configuration for a `PYTHON` job, sets the Jupyter Notebook to run, and creates the `Job` object.
146+
147+
```python
jobs_api = project.get_jobs_api()

# Start from the default configuration for a PYTHON job
notebook_job_config = jobs_api.get_configuration("PYTHON")

# Point the job at the notebook uploaded in Step 1
notebook_job_config['appPath'] = uploaded_file_path

job = jobs_api.create_job("notebook_job", notebook_job_config)
```
158+
159+
### Step 3: Execute the job
160+
161+
In this code snippet, we execute the job with arguments and wait until it reaches a terminal state.
162+
163+
```python
# Run the job
execution = job.run(args='-a 2 -b 5', await_termination=True)
```
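
If the argument values are computed in code, the string passed to `args` can be assembled from a dictionary. The following is a minimal sketch; `build_args` is a hypothetical helper rather than part of the Hopsworks API. It rejects values containing whitespace because this guide does not describe how quoting is handled.

```python
def build_args(params):
    """Render a dict as '-name value' pairs (hypothetical helper, not a Hopsworks API)."""
    parts = []
    for name, value in params.items():
        text = str(value)
        # Quoting rules are not documented in this guide, so refuse ambiguous values
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"value for {name!r} must be non-empty and contain no whitespace")
        parts.extend([f"-{name}", text])
    return " ".join(parts)


args = build_args({"a": 2, "b": 5})
print(args)
```

The resulting string could then be passed as `job.run(args=args, await_termination=True)`.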
168+
169+
### API Reference
170+
171+
[Jobs](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/jobs/)
172+
173+
[Executions](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/executions/)
174+
175+
## Conclusion
176+
177+
In this guide you learned how to create and run a Jupyter Notebook job.

docs/user_guides/projects/jobs/pyspark_job.md

Lines changed: 19 additions & 13 deletions
@@ -36,7 +36,7 @@ The image below shows the Jobs overview page in Hopsworks and is accessed by cli
3636

3737
### Step 2: Create new job dialog
3838

39-
To configure a job, click `Advanced options`, which will open up the advanced configuration page for the job.
39+
Click `New Job` and the following dialog will appear.
4040

4141
<p align="center">
4242
<figure>
@@ -45,29 +45,36 @@ To configure a job, click `Advanced options`, which will open up the advanced c
4545
</figure>
4646
</p>
4747

48-
### Step 3: Set the script
48+
### Step 3: Set the job type
4949

50-
Next step is to select the program to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file` which lets you select a file from your local filesystem as demonstrated below.
50+
By default, the dialog will create a Spark job. Make sure `SPARK` is chosen.
51+
52+
### Step 4: Set the script
53+
54+
The next step is to select the program to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.
5155

5256
<p align="center">
5357
<figure>
54-
<img src="../../../../assets/images/guides/jobs/upload_job_py_file.gif" alt="Configure program">
58+
<img src="../../../../assets/images/guides/jobs/upload_job_pyspark_file.gif" alt="Configure program">
5559
<figcaption>Configure program</figcaption>
5660
</figure>
5761
</p>
5862

59-
### Step 4: Set the job type
63+
Then click `Create job` to create the job.
6064

61-
Next step is to set the job type to `SPARK` to indicate it should be executed as a spark job. Then specify [advanced configuration](#step-5-optional-advanced-configuration) or click `Create New Job` to create the job.
65+
### Step 5 (optional): Set the PySpark script arguments
66+
67+
In the job settings, you can specify arguments for your PySpark script.
68+
Remember to handle the arguments inside your PySpark script.
6269

6370
<p align="center">
6471
<figure>
65-
<img src="../../../../assets/images/guides/jobs/advanced_configuration_pyspark.png" alt="Set the job type">
66-
<figcaption>Set the job type</figcaption>
72+
<img src="../../../../assets/images/guides/jobs/job_py_args.png" alt="Configure PySpark script arguments">
73+
<figcaption>Configure PySpark script arguments</figcaption>
6774
</figure>
6875
</p>
6976

70-
### Step 5 (optional): Advanced configuration
77+
### Step 6 (optional): Advanced configuration
7178

7279
Resource allocation for the Spark driver and executors can be configured, as well as the number of executors and whether dynamic execution should be enabled.
7380

@@ -115,10 +122,9 @@ Line-separates [properties](https://spark.apache.org/docs/3.1.1/configuration.ht
115122
</figure>
116123
</p>
117124

118-
### Step 6: Execute the job
119-
120-
Now click the `Run` button to start the execution of the job, and then click on `Executions` to see the list of all executions.
125+
### Step 7: Execute the job
121126

127+
Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.
122128

123129
<p align="center">
124130
<figure>
@@ -127,7 +133,7 @@ Now click the `Run` button to start the execution of the job, and then click on
127133
</figure>
128134
</p>
129135

130-
### Step 7: Application logs
136+
### Step 8: Application logs
131137

132138
To monitor logs while the execution is running, click `Spark UI` to open the Spark UI in a separate tab.
133139

docs/user_guides/projects/jobs/python_job.md

Lines changed: 22 additions & 11 deletions
@@ -34,7 +34,7 @@ The image below shows the Jobs overview page in Hopsworks and is accessed by cli
3434

3535
### Step 2: Create new job dialog
3636

37-
By default, the dialog will create a Spark job. To instead configure a Python job, click `Advanced options`, which will open up the advanced configuration page for the job.
37+
Click `New Job` and the following dialog will appear.
3838

3939
<p align="center">
4040
<figure>
@@ -43,9 +43,20 @@ By default, the dialog will create a Spark job. To instead configure a Python jo
4343
</figure>
4444
</p>
4545

46-
### Step 3: Set the script
46+
### Step 3: Set the job type
4747

48-
Next step is to select the python script to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file` which lets you select a file from your local filesystem as demonstrated below.
48+
By default, the dialog will create a Spark job. To instead configure a Python job, select `PYTHON`.
49+
50+
<p align="center">
51+
<figure>
52+
<img src="../../../../assets/images/guides/jobs/jobs_select_python.gif" alt="Select Python job type">
53+
<figcaption>Select Python job type</figcaption>
54+
</figure>
55+
</p>
56+
57+
### Step 4: Set the script
58+
59+
The next step is to select the Python script to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem, as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.
4960

5061
<p align="center">
5162
<figure>
@@ -54,18 +65,19 @@ Next step is to select the python script to run. You can either select `From pro
5465
</figure>
5566
</p>
5667

57-
### Step 4: Set the job type
68+
### Step 5 (optional): Set the Python script arguments
5869

59-
Next step is to set the job type to `PYTHON` to indicate it should be executed as a simple python script. Then click `Create New Job` to create the job.
70+
In the job settings, you can specify arguments for your Python script.
71+
Remember to handle the arguments inside your Python script.
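
One common way to handle arguments inside a Python script is the standard library's `argparse` module. The sketch below is only an example: the argument names `-a` and `-b`, their integer types, and the single-hyphen style are assumptions chosen for illustration, not requirements stated by this guide.

```python
import argparse


def parse_args(argv=None):
    """Parse the job's command-line arguments (names and types are illustrative)."""
    parser = argparse.ArgumentParser(description="Example job script")
    parser.add_argument("-a", type=int, default=0, help="first operand")
    parser.add_argument("-b", type=int, default=0, help="second operand")
    # With argv=None, argparse reads the arguments from sys.argv[1:]
    return parser.parse_args(argv)


# An explicit list is passed here so the example is reproducible;
# a real job script would call parse_args() with no arguments.
args = parse_args(["-a", "2", "-b", "5"])
print(args.a + args.b)
```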
6072

6173
<p align="center">
6274
<figure>
63-
<img src="../../../../assets/images/guides/jobs/advanced_configuration_py.gif" alt="Set the job type">
64-
<figcaption>Set the job type</figcaption>
75+
<img src="../../../../assets/images/guides/jobs/job_notebook_args.png" alt="Configure Python script arguments">
76+
<figcaption>Configure Python script arguments</figcaption>
6577
</figure>
6678
</p>
6779

68-
### Step 5 (optional): Additional configuration
80+
### Step 6 (optional): Additional configuration
6981

7082
It is also possible to set the following configuration settings for a `PYTHON` job.
7183

@@ -80,13 +92,12 @@ It is possible to also set following configuration settings for a `PYTHON` job.
8092
</figure>
8193
</p>
8294

83-
### Step 6: Execute the job
95+
### Step 7: Execute the job
8496

85-
Now click the `Run` button to start the execution of the job, and then click on `Executions` to see the list of all executions.
97+
Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions.
8698

8799
Once the execution is finished, click on `Logs` to see the logs for the execution.
88100

89-
90101
<p align="center">
91102
<figure>
92103
<img src="../../../../assets/images/guides/jobs/start_job_py.gif" alt="Start job execution">
