Add short info about scheduling in Jobs guide (#381)
* Add short info about scheduling in Jobs guide

* Apply suggestions from code review

Co-authored-by: MagicLex <[email protected]>

---------

Co-authored-by: MagicLex <[email protected]>
vatj and MagicLex authored May 8, 2024
1 parent b625bda commit a4cae5c
Showing 4 changed files with 14 additions and 5 deletions.
8 changes: 6 additions & 2 deletions docs/user_guides/projects/jobs/pyspark_job.md
@@ -12,9 +12,13 @@ All members of a project in Hopsworks can launch the following types of applicat
- Apache Spark

Launching a job of any type is very similar process, what mostly differs between job types is
the various configuration parameters each job type comes with. After following this guide you will be able to create a PySpark job.
the various configuration parameters each job type comes with. Hopsworks clusters support scheduling to run jobs on a regular basis,
e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
checkout [our Scheduling guide](schedule_job.md).

The PySpark program can either be a `.py` script or a `.ipynb` file.

PySpark program can either be a `.py` script or a `.ipynb` file, however be mindful of how to access/create
the spark session based on the extension you provide.

!!! notice "Instantiate the SparkSession"
For a `.py` file, remember to instantiate the SparkSession i.e `spark=SparkSession.builder.getOrCreate()`
4 changes: 3 additions & 1 deletion docs/user_guides/projects/jobs/python_job.md
@@ -12,7 +12,9 @@ All members of a project in Hopsworks can launch the following types of applicat
- Apache Spark

Launching a job of any type is very similar process, what mostly differs between job types is
the various configuration parameters each job type comes with. After following this guide you will be able to create a Python job.
the various configuration parameters each job type comes with. Hopsworks support scheduling jobs to run on a regular basis,
e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
checkout [our Scheduling guide](schedule_job.md).

!!! note "Kubernetes integration required"
Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.
2 changes: 1 addition & 1 deletion docs/user_guides/projects/jobs/schedule_job.md
@@ -6,7 +6,7 @@ description: Documentation on how to schedule a job on Hopsworks.

## Introduction

Hopsworks jobs can be scheduled to run at regular intervals using the scheduling function provided by Hopsworks. Each job can be configured to have a single schedule.
Hopsworks clusters can run jobs on a schedule, allowing you to automate the execution. Whether you need to backfill your feature groups on a nightly basis or run a model training pipeline every week, the Hopsworks scheduler will help you automate these tasks. Each job can be configured to have a single schedule. For more advanced use cases, Hopsworks integrates with any DAG manager and directly with the open-source [Apache Airflow](https://airflow.apache.org/use-cases/), check out our [Airflow Guide](../airflow/airflow.md).

Schedules can be defined using the drop down menus in the UI or a Quartz [cron](https://en.wikipedia.org/wiki/Cron) expression.

5 changes: 4 additions & 1 deletion docs/user_guides/projects/jobs/spark_job.md
@@ -12,7 +12,10 @@ All members of a project in Hopsworks can launch the following types of applicat
- Apache Spark

Launching a job of any type is very similar process, what mostly differs between job types is
the various configuration parameters each job type comes with. After following this guide you will be able to create a Spark job.
the various configuration parameters each job type comes with. Hopsworks support scheduling to run jobs on a regular basis,
e.g backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the python API,
checkout [our Scheduling guide](schedule_job.md).


## UI
