From a506ab77b197358ed0fbdc2d3cd3e66f18199236 Mon Sep 17 00:00:00 2001
From: Victor Jouffrey <37411285+vatj@users.noreply.github.com>
Date: Wed, 8 May 2024 11:27:21 +0200
Subject: [PATCH] Add short info about scheduling in Jobs guide (#381) (#385)

* Add short info about scheduling in Jobs guide

* Apply suggestions from code review

---------

Co-authored-by: MagicLex <64143547+MagicLex@users.noreply.github.com>
---
 docs/user_guides/projects/jobs/pyspark_job.md  | 8 ++++++--
 docs/user_guides/projects/jobs/python_job.md   | 4 +++-
 docs/user_guides/projects/jobs/schedule_job.md | 2 +-
 docs/user_guides/projects/jobs/spark_job.md    | 5 ++++-
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/docs/user_guides/projects/jobs/pyspark_job.md b/docs/user_guides/projects/jobs/pyspark_job.md
index c5dd1c75b..770eaf6bd 100644
--- a/docs/user_guides/projects/jobs/pyspark_job.md
+++ b/docs/user_guides/projects/jobs/pyspark_job.md
@@ -12,9 +12,13 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark
 
 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a PySpark job.
+the various configuration parameters each job type comes with. Hopsworks clusters support scheduling jobs to run on a regular basis,
+e.g. backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the Python API;
+check out [our Scheduling guide](schedule_job.md).
 
-The PySpark program can either be a `.py` script or a `.ipynb` file.
+
+The PySpark program can either be a `.py` script or a `.ipynb` file; however, be mindful of how to access/create
+the Spark session based on the file extension you provide.
 
 !!! notice "Instantiate the SparkSession"
     For a `.py` file, remember to instantiate the SparkSession i.e `spark=SparkSession.builder.getOrCreate()`
diff --git a/docs/user_guides/projects/jobs/python_job.md b/docs/user_guides/projects/jobs/python_job.md
index 3814e4716..3d28948ad 100644
--- a/docs/user_guides/projects/jobs/python_job.md
+++ b/docs/user_guides/projects/jobs/python_job.md
@@ -12,7 +12,9 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark
 
 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a Python job.
+the various configuration parameters each job type comes with. Hopsworks supports scheduling jobs to run on a regular basis,
+e.g. backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the Python API;
+check out [our Scheduling guide](schedule_job.md).
 
 !!! note "Kubernetes integration required"
     Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.
diff --git a/docs/user_guides/projects/jobs/schedule_job.md b/docs/user_guides/projects/jobs/schedule_job.md
index 427dfc3b9..0cd57f9ee 100644
--- a/docs/user_guides/projects/jobs/schedule_job.md
+++ b/docs/user_guides/projects/jobs/schedule_job.md
@@ -6,7 +6,7 @@ description: Documentation on how to schedule a job on Hopsworks.
 
 ## Introduction
 
-Hopsworks jobs can be scheduled to run at regular intervals using the scheduling function provided by Hopsworks. Each job can be configured to have a single schedule.
+Hopsworks clusters can run jobs on a schedule, allowing you to automate their execution. Whether you need to backfill your feature groups on a nightly basis or run a model training pipeline every week, the Hopsworks scheduler will help you automate these tasks. Each job can be configured to have a single schedule. For more advanced use cases, Hopsworks integrates with any DAG manager and directly with the open-source [Apache Airflow](https://airflow.apache.org/use-cases/); check out our [Airflow Guide](../airflow/airflow.md).
 
 Schedules can be defined using the drop down menus in the UI or a Quartz [cron](https://en.wikipedia.org/wiki/Cron) expression.
diff --git a/docs/user_guides/projects/jobs/spark_job.md b/docs/user_guides/projects/jobs/spark_job.md
index bad61bc07..19256eeee 100644
--- a/docs/user_guides/projects/jobs/spark_job.md
+++ b/docs/user_guides/projects/jobs/spark_job.md
@@ -12,7 +12,10 @@ All members of a project in Hopsworks can launch the following types of applicat
 - Apache Spark
 
 Launching a job of any type is very similar process, what mostly differs between job types is
-the various configuration parameters each job type comes with. After following this guide you will be able to create a Spark job.
+the various configuration parameters each job type comes with. Hopsworks supports scheduling jobs to run on a regular basis,
+e.g. backfilling a Feature Group by running your feature engineering pipeline nightly. Scheduling can be done both through the UI and the Python API;
+check out [our Scheduling guide](schedule_job.md).
+
 
 ## UI
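
The text added above repeatedly points readers to the Python scheduling API. As a companion, here is a minimal sketch of scheduling an existing job nightly with the `hopsworks` client library. The job name `feature_engineering_pipeline` is hypothetical, and the `schedule()` call follows the pattern described in the Scheduling guide; verify the exact signature in `schedule_job.md`.

```python
# Minimal sketch: schedule an existing Hopsworks job to run nightly via the
# Python API. The job name below is hypothetical; check schedule() against
# the Scheduling guide (schedule_job.md).
from datetime import datetime, timezone

import hopsworks

project = hopsworks.login()        # connect to the Hopsworks cluster
jobs_api = project.get_jobs_api()  # Jobs API handle for this project

job = jobs_api.get_job("feature_engineering_pipeline")  # hypothetical name

# Quartz cron fields: second minute hour day-of-month month day-of-week.
# "0 0 2 * * ?" fires every night at 02:00 UTC, e.g. for a nightly backfill.
job.schedule(
    cron_expression="0 0 2 * * ?",
    start_time=datetime.now(tz=timezone.utc),
)
```

Since each job holds a single schedule, the same cron expression could equally be configured from the drop down menus in the UI described in the guide.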