| 1 | +--- |
| 2 | +description: Documentation on how to configure and execute a Jupyter Notebook job on Hopsworks. |
| 3 | +--- |
| 4 | + |
| 5 | +# How To Run A Jupyter Notebook Job |
| 6 | + |
| 7 | +## Introduction |
| 8 | + |
| 9 | +All members of a project in Hopsworks can launch the following types of applications through a project's Jobs service: |
| 10 | + |
| 11 | +- Python (*Hopsworks Enterprise only*) |
| 12 | +- Apache Spark |
| 13 | + |
| 14 | +Launching a job of any type is a very similar process; what mostly differs between job types is |
| 15 | +the configuration parameters each job type comes with. After following this guide, you will be able to create a Jupyter Notebook job. |
| 16 | + |
| 17 | +!!! note "Kubernetes integration required" |
| 18 | + Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster. |
| 19 | + |
| 20 | + Hopsworks can be integrated with [Amazon EKS](../../../setup_installation/aws/eks_ecr_integration.md), [Azure AKS](../../../setup_installation/azure/aks_acr_integration.md) and on-premise Kubernetes clusters. |
| 21 | + |
| 22 | +## UI |
| 23 | + |
| 24 | +### Step 1: Jobs overview |
| 25 | + |
| 26 | +The image below shows the Jobs overview page in Hopsworks, which is accessed by clicking `Jobs` in the sidebar. |
| 27 | + |
| 28 | +<p align="center"> |
| 29 | + <figure> |
| 30 | + <img src="../../../../assets/images/guides/jobs/jobs_overview.png" alt="Jobs overview"> |
| 31 | + <figcaption>Jobs overview</figcaption> |
| 32 | + </figure> |
| 33 | +</p> |
| 34 | + |
| 35 | +### Step 2: Create new job dialog |
| 36 | + |
| 37 | +Click `New Job` and the following dialog will appear. |
| 38 | + |
| 39 | +<p align="center"> |
| 40 | + <figure> |
| 41 | + <img src="../../../../assets/images/guides/jobs/create_new_job.png" alt="Create new job dialog"> |
| 42 | + <figcaption>Create new job dialog</figcaption> |
| 43 | + </figure> |
| 44 | +</p> |
| 45 | + |
| 46 | +### Step 3: Set the job type |
| 47 | + |
| 48 | +By default, the dialog will create a Spark job. To instead configure a Jupyter Notebook job, select `PYTHON`. |
| 49 | + |
| 50 | +<p align="center"> |
| 51 | + <figure> |
| 52 | + <img src="../../../../assets/images/guides/jobs/jobs_select_python.gif" alt="Select Python job type"> |
| 53 | + <figcaption>Select Python job type</figcaption> |
| 54 | + </figure> |
| 55 | +</p> |
| 56 | + |
| 57 | +### Step 4: Set the script |
| 58 | + |
| 59 | +The next step is to select the Jupyter Notebook to run. You can either select `From project`, if the file was previously uploaded to Hopsworks, or `Upload new file`, which lets you select a file from your local filesystem as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown. |
| 60 | + |
| 61 | +<p align="center"> |
| 62 | + <figure> |
| 63 | + <img src="../../../../assets/images/guides/jobs/upload_job_notebook_file.gif" alt="Configure program"> |
| 64 | + <figcaption>Configure program</figcaption> |
| 65 | + </figure> |
| 66 | +</p> |
| 67 | + |
| 68 | +Then click `Create job` to create the job. |
| 69 | + |
| 70 | +### Step 5 (optional): Set the Jupyter Notebook arguments |
| 71 | + |
| 72 | +In the job settings, you can specify arguments for your notebook script. |
| 73 | +Arguments must be in the format of `-arg1 value1 -arg2 value2`. For each argument, you must provide the parameter name (e.g. `arg1`) preceded by a hyphen (`-`), followed by its value (e.g. `value1`). |
| 74 | +You do not need to handle the arguments in your notebook. Our system uses [Papermill](https://papermill.readthedocs.io/en/latest/) to insert a new cell containing the initialized parameters. |
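
The argument format described above can be illustrated with a small, self-contained sketch. This is not the parsing code used by Hopsworks or Papermill; `parse_job_args` and `render_parameters_cell` are hypothetical helpers that only show how `-name value` pairs map to the variables an injected parameters cell might define. Values are kept as strings here, since this guide does not specify whether they reach the notebook as strings or as typed values.

```python
import shlex


def parse_job_args(arg_string: str) -> dict[str, str]:
    """Split a '-name value -name value' string into a dict.

    Hypothetical helper for illustration only; not part of the Hopsworks API.
    """
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-name value' pairs")
    params = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        # Each parameter name must be preceded by a single hyphen.
        if not name.startswith("-") or len(name) < 2:
            raise ValueError(f"argument name must start with '-': {name!r}")
        params[name[1:]] = value
    return params


def render_parameters_cell(params: dict[str, str]) -> str:
    """Render the source of a cell that initializes the given parameters."""
    lines = ["# Parameters"]
    lines += [f"{name} = {value!r}" for name, value in params.items()]
    return "\n".join(lines)


params = parse_job_args("-a 2 -b 5")
print(render_parameters_cell(params))
```

Because the values in this sketch are strings, a notebook that needs numbers would convert them explicitly, for example with `int(a)`.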
| 75 | + |
| 76 | +<p align="center"> |
| 77 | + <figure> |
| 78 | + <img src="../../../../assets/images/guides/jobs/job_notebook_args.png" alt="Configure notebook arguments"> |
| 79 | + <figcaption>Configure notebook arguments</figcaption> |
| 80 | + </figure> |
| 81 | +</p> |
| 82 | + |
| 83 | +### Step 6 (optional): Additional configuration |
| 84 | + |
| 85 | +You can also set the following configuration settings for a `PYTHON` job. |
| 86 | + |
| 87 | +* `Container memory`: The amount of memory in MB to be allocated to the Jupyter Notebook script |
| 88 | +* `Container cores`: The number of cores to be allocated for the Jupyter Notebook script |
| 89 | +* `Additional files`: List of files that will be locally accessible by the application |
| 90 | +You can always modify the arguments in the job settings. |
| 91 | + |
| 92 | +<p align="center"> |
| 93 | + <figure> |
| 94 | +    <img src="../../../../assets/images/guides/jobs/configure_py.png" alt="Additional configuration"> |
| 95 | +    <figcaption>Additional configuration</figcaption> |
| 96 | + </figure> |
| 97 | +</p> |
| 98 | + |
| 99 | +### Step 7: Execute the job |
| 100 | + |
| 101 | +Now click the `Run` button to start the execution of the job. You will be redirected to the `Executions` page where you can see the list of all executions. |
| 102 | + |
| 103 | +<p align="center"> |
| 104 | + <figure> |
| 105 | + <img src="../../../../assets/images/guides/jobs/start_job_notebook.gif" alt="Start job execution"> |
| 106 | + <figcaption>Start job execution</figcaption> |
| 107 | + </figure> |
| 108 | +</p> |
| 109 | + |
| 110 | +### Step 8: Visualize output notebook |
| 111 | +Once the execution is finished, click `Logs` and then `notebook out` to see the output notebook for the execution. |
| 112 | + |
| 113 | +<p align="center"> |
| 114 | + <figure> |
| 115 | + <img src="../../../../assets/images/guides/jobs/job_view_out_notebook.gif" alt="Visualize output notebook"> |
| 116 | + <figcaption>Visualize output notebook</figcaption> |
| 117 | + </figure> |
| 118 | +</p> |
| 119 | + |
| 120 | +You can directly edit and save the output notebook by clicking `Open Notebook`. |
| 121 | + |
| 122 | +## Code |
| 123 | + |
| 124 | +### Step 1: Upload the Jupyter Notebook script |
| 125 | + |
| 126 | +This snippet assumes the Jupyter Notebook script is in the current working directory and named `notebook.ipynb`. |
| 127 | + |
| 128 | +It will upload the Jupyter Notebook script to the `Resources` dataset in your project. |
| 129 | + |
| 130 | +```python |
| 131 | + |
| 132 | +import hopsworks |
| 133 | + |
| 134 | +project = hopsworks.login() |
| 135 | + |
| 136 | +dataset_api = project.get_dataset_api() |
| 137 | + |
| 138 | +uploaded_file_path = dataset_api.upload("notebook.ipynb", "Resources") |
| 139 | + |
| 140 | +``` |
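
If you do not have a notebook at hand, the following sketch writes a minimal `notebook.ipynb` that you can create before running the upload snippet above. It uses only the standard library and the notebook JSON format (version 4). The cell tagged `parameters` follows Papermill's convention for marking default values. The `python3` kernel name, the default values, and the helper name `build_notebook` are assumptions made for illustration, not requirements stated in this guide.

```python
import json


def build_notebook(defaults: dict) -> dict:
    """Build a minimal nbformat-4 notebook with a tagged parameters cell.

    Hypothetical helper for illustration only.
    """
    parameters_cell = {
        "cell_type": "code",
        "execution_count": None,
        # Papermill looks for a cell tagged "parameters" to hold defaults.
        "metadata": {"tags": ["parameters"]},
        "outputs": [],
        "source": [f"{name} = {value!r}\n" for name, value in defaults.items()],
    }
    work_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": ["print(a, b)\n"],
    }
    return {
        "cells": [parameters_cell, work_cell],
        "metadata": {
            # Assumed kernel; adjust to the kernel available in your environment.
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    # Write the notebook to the current working directory.
    with open("notebook.ipynb", "w", encoding="utf-8") as f:
        json.dump(build_notebook({"a": 1, "b": 1}), f, indent=1)
```

The defaults `a` and `b` match the argument names used later in `job.run(args='-a 2 -b 5')`, so the values passed at run time replace the defaults in the executed copy of the notebook.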
| 141 | + |
| 142 | + |
| 143 | +### Step 2: Create Jupyter Notebook job |
| 144 | + |
| 145 | +In this snippet, we use the `JobsApi` object to get the default job configuration for a `PYTHON` job, set the Jupyter Notebook script to run, and create the `Job` object. |
| 146 | + |
| 147 | +```python |
| 148 | + |
| 149 | +jobs_api = project.get_jobs_api() |
| 150 | + |
| 151 | +notebook_job_config = jobs_api.get_configuration("PYTHON") |
| 152 | + |
| 153 | +notebook_job_config['appPath'] = uploaded_file_path |
| 154 | + |
| 155 | +job = jobs_api.create_job("notebook_job", notebook_job_config) |
| 156 | + |
| 157 | +``` |
| 158 | + |
| 159 | +### Step 3: Execute the job |
| 160 | + |
| 161 | +In this code snippet, we execute the job with arguments and wait until it reaches a terminal state. |
| 162 | + |
| 163 | +```python |
| 164 | + |
| 165 | +# Run the job |
| 166 | +execution = job.run(args='-a 2 -b 5', await_termination=True) |
| 167 | +``` |
| 168 | + |
| 169 | +### API Reference |
| 170 | + |
| 171 | +[Jobs](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/jobs/) |
| 172 | + |
| 173 | +[Executions](https://docs.hopsworks.ai/hopsworks-api/{{{ hopsworks_version }}}/generated/api/executions/) |
| 174 | + |
| 175 | +## Conclusion |
| 176 | + |
| 177 | +In this guide you learned how to create and run a Jupyter Notebook job. |