diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml index 743b7e1556..64622bb2e7 100644 --- a/.github/workflows/tests.yml +++ b/.github/workflows/tests.yml @@ -5,6 +5,23 @@ on: - master pull_request: jobs: + + docs: + runs-on: ubuntu-latest + steps: + - name: Fetch the code + uses: actions/checkout@v2 + - name: Set up Python 3.8 + uses: actions/setup-python@v2 + with: + python-version: "3.8" + - name: Install dependencies + run: | + python -m pip install --upgrade pip + if [ -f doc-requirements.txt ]; then pip install -r doc-requirements.txt; fi + - name: Build the documentation + run: make docs + end-to-end: runs-on: ubuntu-latest steps: diff --git a/Makefile b/Makefile index 62293f792b..2f112b124e 100644 --- a/Makefile +++ b/Makefile @@ -65,3 +65,8 @@ helm_install: .PHONY: helm_upgrade helm_upgrade: helm upgrade flyte --debug ./helm -f helm/values-sandbox.yaml --create-namespace --namespace=flyte + bash script/prepare_artifacts.sh + +.PHONY: docs +docs: + make -C rsts clean html SPHINXOPTS=-W diff --git a/doc-requirements.in b/doc-requirements.in index 609042e8d1..8fa1da952a 100644 --- a/doc-requirements.in +++ b/doc-requirements.in @@ -9,3 +9,4 @@ sphinx-tabs sphinxext-remoteliteralinclude sphinx-issues sphinx_fontawesome +sphinx-panels diff --git a/doc-requirements.txt b/doc-requirements.txt index 391f58e797..4e428a8149 100644 --- a/doc-requirements.txt +++ b/doc-requirements.txt @@ -21,6 +21,7 @@ chardet==4.0.0 docutils==0.16 # via # sphinx + # sphinx-panels # sphinx-tabs git+git://github.com/flyteorg/furo@main # via -r doc-requirements.in @@ -71,6 +72,8 @@ sphinx-fontawesome==0.0.6 # via -r doc-requirements.in sphinx-issues==1.2.0 # via -r doc-requirements.in +sphinx-panels==0.6.0 + # via -r doc-requirements.in sphinx-prompt==1.4.0 # via -r doc-requirements.in sphinx-tabs==3.0.0 @@ -84,6 +87,7 @@ sphinx==3.5.4 # sphinx-copybutton # sphinx-fontawesome # sphinx-issues + # sphinx-panels # sphinx-prompt # sphinx-tabs # sphinxext-remoteliteralinclude diff --git a/rsts/Makefile b/rsts/Makefile index d06c8c4804..a761edf1c9 100644 --- a/rsts/Makefile +++ b/rsts/Makefile @@ -18,6 +18,3 @@ help: # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). %: Makefile @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) - - - diff --git a/rsts/community/contribute.rst b/rsts/community/contribute.rst index bf1212adc1..a8a419ae84 100644 --- a/rsts/community/contribute.rst +++ b/rsts/community/contribute.rst @@ -5,28 +5,28 @@ Contributing to Flyte Thank you for taking the time to contribute to Flyte! Here are some guidelines for you to follow, which will make your first and follow-up contributions easier. .. note:: - Please read our `Code of Conduct `_ before contributing to Flyte. + Please read our `Code of Conduct `__ before contributing to Flyte. Code ==== -An issue tagged with ``good first issue`` is the best place to start for first-time contributors. Look into them `here `_. +An issue tagged with ``good first issue`` is the best place to start for first-time contributors. Look into them `here `__. -To take a step ahead, check out the repositories available under `flyteorg `_. +To take a step ahead, check out the repositories available under `flyteorg `__. **Appetizer (for every repo): Fork and clone the concerned repository. Create a new branch on your fork and make the required changes. Create a pull request once your work is ready for review.** .. note:: - Note: To open a pull request, follow this `guide `_. 
+ Note: To open a pull request, follow this `guide `__. *A piece of good news -- You can be added as a committer to any ``flyteorg`` repo as you become more involved with the project.* -Example PR for your reference: `GitHub PR `_. A couple of checks are introduced to help in maintaining the robustness of the project. +Example PR for your reference: `GitHub PR `__. A couple of checks are introduced to help in maintaining the robustness of the project. -#. To get through DCO, sign off on every commit. (`Reference `_) +#. To get through DCO, sign off on every commit. (`Reference `__) #. To improve code coverage, write unit tests to test your code. .. note:: - Format your Go code with ``golangci-lint`` followed by ``goimports`` (we used the same in the `Makefile `_), and Python code with ``black`` (use ``make fmt`` command which contains both black and isort). + Format your Go code with ``golangci-lint`` followed by ``goimports`` (we used the same in the `Makefile `__), and Python code with ``black`` (use ``make fmt`` command which contains both black and isort). Environment Setup ***************** @@ -38,14 +38,14 @@ Environment Setup The dependency graph between various flyteorg repos -#. `flyte `_ +#. `flyte `__ | Purpose: Deployment, Documentation, and Issues | Languages: Kustomize & RST -#. `flyteidl `_ - | Purpose: The Flyte Workflow specification in `protocol buffers `_ which forms the core of Flyte +#. `flyteidl `__ + | Purpose: The Flyte Workflow specification in `protocol buffers `__ which forms the core of Flyte | Language: Protobuf - | Setup: Refer to the `README `_ -#. `flytepropeller `_ + | Setup: Refer to the `README `__ +#. `flytepropeller `__ | Purpose: Kubernetes native execution engine for Flyte Workflows and Tasks | Language: Go @@ -56,7 +56,7 @@ Environment Setup * ``make test_unit`` * ``make link`` * To compile, run ``make compile`` -#. `flyteadmin `_ +#. `flyteadmin `__ | Purpose: Control Plane | Language: Go @@ -72,18 +72,18 @@ Environment Setup * To run integration tests locally: * ``make integration`` * (or, to run in containerized dockernetes): ``make k8s_integration`` -#. `flytekit `_ +#. `flytekit `__ | Purpose: Python SDK & Tools | Language: Python - | Setup: Refer to the `Flytekit Contribution Guide `_ -#. `flyteconsole `_ + | Setup: Refer to the `Flytekit Contribution Guide `__ +#. `flyteconsole `__ | Purpose: Admin Console | Language: Typescript - | Setup: Refer to the `README `_ -#. `datacatalog `_ + | Setup: Refer to the `README `__ +#. `datacatalog `__ | Purpose: Manage Input & Output Artifacts | Language: Go -#. `flyteplugins `_ +#. `flyteplugins `__ | Purpose: Flyte Plugins | Language: Go @@ -93,10 +93,10 @@ Environment Setup * ``make generate`` * ``make test_unit`` * ``make link`` -#. `flytestdlib `_ +#. `flytestdlib `__ | Purpose: Standard Library for Shared Components | Language: Go -#. `flytesnacks `_ +#. `flytesnacks `__ | Purpose: Examples, Tips, and Tricks to use Flytekit SDKs | Language: Python (In future, Java shall be added) @@ -106,7 +106,7 @@ Environment Setup * Run the ``make start`` command in the root directory of the flytesnacks repo * Visit https://localhost:30081 to view the Flyte console consisting of the examples present in ``flytesnacks/cookbook/core`` directory * To fetch the new dependencies and rebuild the image, run ``make register`` -#. `flytectl `_ +#. `flytectl `__ | Purpose: A Standalone Flyte CLI | Language: Go @@ -119,28 +119,29 @@ Environment Setup Issues ====== -`GitHub Issues `_ is used for issue tracking. 
There are a variety of issue types available that you could use while filing an issue. +`GitHub Issues `__ is used for issue tracking. There are a variety of issue types available that you could use while filing an issue. -* `Plugin Request `_ -* `Bug Report `_ -* `Documentation Bug/Update Request `_ -* `Core Feature Request `_ -* `Flytectl Feature Request `_ -* `Housekeeping `_ -* `UI Feature Request `_ +* `Plugin Request `__ +* `Bug Report `__ +* `Documentation Bug/Update Request `__ +* `Core Feature Request `__ +* `Flytectl Feature Request `__ +* `Housekeeping `__ +* `UI Feature Request `__ -If none of the above fits your requirements, file a `blank `_ issue. +If none of the above fits your requirements, file a `blank `__ issue. Documentation ============= Flyte uses Sphinx for documentation and ``godocs`` for Golang. ``godocs`` is quite simple -- comment your code and you are good to go! -Sphinx spans across multiple repositories under the `flyteorg `_ repository. It uses reStructured Text (rst) files to store the documentation content. For both the API and code-related content, it extracts docstrings from the code files. +Sphinx spans across multiple repositories under the `flyteorg `__ repository. It uses reStructured Text (rst) files to store the documentation content. For both the API and code-related content, it extracts docstrings from the code files. -To get started, look into `reStructuredText reference `_. +To get started, look into `reStructuredText reference `__. + +Docs Environment Setup +********************** -Environment Setup -***************** Install all the requirements from the `docs-requirements.txt` file present in the root of a repository. .. code-block:: console @@ -176,7 +177,7 @@ The edit option is found at the bottom of a page, as shown below. Intersphinx *********** -`Intersphinx `_ can generate automatic links to the documentation of objects in other projects. +`Intersphinx `__ can generate automatic links to the documentation of objects in other projects. To establish a reference to any other documentation from Flyte or within it, use intersphinx. @@ -192,7 +193,7 @@ For example: } .. note:: - ``docs/source`` is present in the repository root. Click `here `_ to view the intersphinx configuration. + ``docs/source`` is present in the repository root. Click `here `__ to view the intersphinx configuration. The key refers to the name used to refer to the file (while referencing the documentation), and the URL denotes the precise location. @@ -216,7 +217,7 @@ Output: | -Linking to Python elements changes based on what you're linking to. Check out this `section `_ to learn more. +Linking to Python elements changes based on what you're linking to. Check out this `section `__ to learn more. | diff --git a/rsts/community/index.rst b/rsts/community/index.rst index fcc34bdfd7..5d79960e30 100644 --- a/rsts/community/index.rst +++ b/rsts/community/index.rst @@ -9,21 +9,74 @@ amazing community. We are a completely open community and we vouch to treat every member with respect. You will find the community very welcoming and warm, so please join us on: -- `Slack `_ -- `Email `_ -- `Twitter `_ -- Community Sync every other Tuesday, 9:00 AM PDT/PST. Please check out the `calendar `_ and feel free to pop in! `Zoom link `_ +.. panels:: + :container: container-lg pb-4 + :column: col-lg-4 col-md-4 col-sm-4 col-xs-12 p-2 + :body: text-center -Although Slack is currently our primary discussion platform, we also have discussion boards on: + .. 
link-button:: http://flyte-org.slack.com + :type: url + :text: Slack + :classes: btn-block stretched-link -- `GitHub `_, featuring a list of Q&As and How-to questions + :fa:`slack` -- `Flyte OSS LinkedIn Discussion Group `_ + --- -We love contributions, so please contribute to - - docs - - examples - - new plugins or plugin ideas - - general feedback and discussions + .. link-button:: https://groups.google.com/a/flyte.org/d/forum/users + :type: url + :text: Google Group + :classes: btn-block stretched-link + + :fa:`google` + + --- + + .. link-button:: https://twitter.com/flyteorg + :type: url + :text: Twitter + :classes: btn-block stretched-link + + :fa:`twitter` + + --- + + .. link-button:: https://github.com/flyteorg/flyte/discussions + :type: url + :text: Github Discussions + :classes: btn-block stretched-link + + :fa:`github` + + --- + + .. link-button:: https://www.linkedin.com/groups/13962256 + :type: url + :text: LinkedIn Group + :classes: btn-block stretched-link + + :fa:`linkedin` + + +Open Source Community Sync +-------------------------- + +We host an Open Source Community Sync every other Tuesday, 9:00 AM PDT/PST. +Please check out the `calendar `_ +and feel free to pop in! + +.. link-button:: https://zoom.us/s/93875115830?pwd=YWZWOHl1ODRRVjhjVkxSV0pmZkJaZz09#success + :type: url + :text: Zoom Link + :classes: btn-outline-secondary + +.. toctree:: + :caption: Community + :maxdepth: -1 + :name: communitytoc + :hidden: -Thank you for being part of this amazing community! + contribute + roadmap + Frequently Asked Questions + troubleshoot diff --git a/rsts/concepts/admin.rst b/rsts/concepts/admin.rst index 2a9084e279..ecdb341ac6 100644 --- a/rsts/concepts/admin.rst +++ b/rsts/concepts/admin.rst @@ -1,8 +1,8 @@ .. _divedeep-admin: -########## -FlyteAdmin -########## +########### +Flyte Admin +########### Admin Structure =============== @@ -49,10 +49,10 @@ Additional Components The managers utilize additional components to process requests. These additional components include: -- **:ref:`workflow engine `**: compiles workflows and launches workflow executions from launch plans. -- **:ref:`data ` (remote cloud storage)**: offloads data blobs to the configured cloud provider. -- **:ref:`runtime `**: loads values from a config file to assign task resources, initialization values, execution queues and more. -- **:ref:`async processes `**: provides functions for scheduling and executing workflows as well as enqueuing and triggering notifications +- :ref:`workflow engine `: compiles workflows and launches workflow executions from launch plans. +- :ref:`data ` (remote cloud storage): offloads data blobs to the configured cloud provider. +- :ref:`runtime `: loads values from a config file to assign task resources, initialization values, execution queues and more. +- :ref:`async processes `: provides functions for scheduling and executing workflows as well as enqueuing and triggering notifications .. _divedeep-admin-repository: @@ -135,3 +135,332 @@ Workflowengine This directory contains interfaces to build and execute workflows leveraging flytepropeller compiler and client components. .. [0] Unfortunately, given unique naming constraints, some models are redefined in `migration_models `__ to guarantee unique index values. + +.. _divedeep-admin-service: + + +FlyteAdmin Service Background +============================= + +Entities +--------- + +The :std:ref:`admin service definition ` defines REST operations for the entities +flyteadmin administers. 
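+
+As a quick, illustrative example, listing the registered projects is a single REST call. The host and
+port below are assumptions based on a local sandbox deployment where the admin HTTP gateway is
+reachable on ``localhost:30081``; adjust them to your own deployment:
+
+.. code-block:: python
+
+   import requests
+
+   # Assumed base URL for the FlyteAdmin HTTP gateway (sandbox-style deployment).
+   base_url = "http://localhost:30081"
+
+   # ListProjects is one of the REST operations exposed by the admin service.
+   response = requests.get(f"{base_url}/api/v1/projects")
+   response.raise_for_status()
+   print(response.json())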
+ +As a refresher, the primary :ref:`entities ` across Flyte map similarly to FlyteAdmin entities. + +Static entities ++++++++++++++++ + +These include: + +- Workflows +- Tasks +- Launch Plans + +Permitted operations: + +- Create +- Get +- List + +The above are designated by an :std:ref:`identifier ` +which consists of a project, domain, name and version specification. These entities are for the most part immutable. To update one of these specific entities, the updated +version must be reregistered with a unique and new version identifier attribute. + +One caveat is that launch plan state can toggle between :std:ref:`ACTIVE or INACTIVE `. +At most one launch plan version across a shared project, domain and name specification can be active at a time. The state affects scheduled launch plans only. +An inactive launch plan can still be used to launch individual executions. However, only an active launch plan runs on a schedule (if it has a schedule defined). + + +Static entities metadata (Named Entities) ++++++++++++++++++++++++++++++++++++++++++ +A :std:ref:`named entity ` includes metadata for one of the above entities +(workflow, task or launch plan) across versions. A named entity includes a resource type (workflow, task or launch plan) and an +:std:ref:`id ` which is composed of project, domain and name. +A named entity also includes metadata, which are mutable attributes about the referenced entity. + +This metadata includes: + +- Description: a human readable description for the Named Entity collection +- State (workflows only): this determines whether the workflow is shown on the overview list of workflows scoped by project and domain + +Permitted operations: + +- Create +- Update +- Get +- List + + +Execution entities +++++++++++++++++++ + +These include: + +- (Workflow) executions +- Node executions +- Task executions + +Permitted operations: + +- Create +- Get +- List + +After an execution begins, flyte propeller monitors the execution and sends events which admin uses to update the above executions. + +These :std:ref:`events ` include + +- WorkflowExecutionEvent +- NodeExecutionEvent +- TaskExecutionEvent + +and include information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change. + +These events are the **only** way to update an execution. No raw Update endpoint exists. + +To track the lifecycle of an execution admin stores attributes such as duration, timestamp at which an execution transitioned to running, and end time. + +For debug purposes admin also stores Workflow and Node execution events in its database, but does not currently expose them through an API. Because array tasks can yield very many executions, +admin does **not** store TaskExecutionEvents. + + +Platform entities ++++++++++++++++++ +Projects: like named entities, project have mutable metadata such as human-readable names and descriptions, in addition to their unique string ids. + +Permitted project operations: + +- Register +- List + +Matchable resources ++++++++++++++++++++ + +A thorough background on :std:ref:`matchable resources ` explains +their purpose and application logic. As a summary, these are used to override system level defaults for kubernetes cluster +resource management, default execution values, and more all across different levels of specificity. 
+ +These entities consist of: + +- ProjectDomainAttributes +- WorkflowAttributes + +Where ProjectDomainAttributes configure customizable overrides at the project and domain level and WorkflowAttributes configure customizable overrides at the project, domain and workflow level. + +Permitted attribute operations: + +- Update (implicitly creates if there is no existing override) +- Get +- Delete + +Using the Admin Service +----------------------- + +Adding request filters +++++++++++++++++++++++ + +We use `gRPC Gateway `_ to reverse proxy http requests into gRPC. +While this allows for a single implementation for both HTTP and gRPC, an important limitation is that fields mapped to the path pattern cannot be +repeated and must have a primitive (non-message) type. Unfortunately this means that repeated string filters cannot use a proper protobuf message. Instead use +the internal syntax shown below:: + + func(field,value) or func(field, value) + +For example, multiple filters would be appended to an http request:: + + ?filters=ne(version, TheWorst)+eq(workflow.name, workflow) + +Timestamp fields use the RFC3339Nano spec (ex: "2006-01-02T15:04:05.999999999Z07:00") + +The fully supported set of filter functions are + +- contains +- gt (greater than) +- gte (greter than or equal to) +- lt (less than) +- lte (less than or equal to) +- eq (equal) +- ne (not equal) +- value_in (for repeated sets of values) + +"value_in" is a special case where multiple values are passed to the filter expression. For example:: + + value_in(phase, 1;2;3) + +Filterable fields vary based on entity types: + +- Task + + - project + - domain + - name + - version + - created_at +- Workflow + + - project + - domain + - name + - version + - created_at +- Launch plans + + - project + - domain + - name + - version + - created_at + - updated_at + - workflows.{any workflow field above} (for example: workflow.domain) + - state (you must use the integer enum e.g. 1) + - States are defined in :std:ref:`launchplanstate `. +- Named Entity Metadata + + - state (you must use the integer enum e.g. 1) + - States are defined in :std:ref:`namedentitystate `. +- Executions (Workflow executions) + + - project + - domain + - name + - workflow.{any workflow field above} (for example: workflow.domain) + - launch_plan.{any launch plan field above} (for example: launch_plan.name) + - phase (you must use the upper-cased string name e.g. RUNNING) + - Phases are defined in :std:ref:`workflowexecution.phase `. + - execution_created_at + - execution_updated_at + - duration (in seconds) + - mode (you must use the integer enum e.g. 1) + - Modes are defined in :std:ref:`executionmode `. + - user (authenticated user or role from flytekit config) + +- Node Executions + + - node_id + - execution.{any execution field above} (for example: execution.domain) + - phase (you must use the upper-cased string name e.g. QUEUED) + - Phases are defined in :std:ref:`nodeexecution.phase `. + - started_at + - node_execution_created_at + - node_execution_updated_at + - duration (in seconds) +- Task Executions + + - retry_attempt + - task.{any task field above} (for example: task.version) + - execution.{any execution field above} (for example: execution.domain) + - node_execution.{any node execution field above} (for example: node_execution.phase) + - phase (you must use the upper-cased string name e.g. SUCCEEDED) + - Phases are defined in :std:ref:`taskexecution.phase `. 
+ - started_at + - task_execution_created_at + - task_execution_updated_at + - duration (in seconds) + +Putting it all together +----------------------- + +If you wanted to do query on specific executions that were launched with a specific launch plan for a workflow with specific attributes, you could do something like: + +:: + + gte(duration, 100)+value_in(phase,RUNNING;SUCCEEDED;FAILED)+eq(lauch_plan.project, foo) + +eq(launch_plan.domain, bar)+eq(launch_plan.name, baz) + +eq(launch_plan.version, 1234) + +lte(workflow.created_at,2018-11-29T17:34:05.000000000Z07:00) + + + +Adding sorting to requests +++++++++++++++++++++++++++ + +Only a subset of fields are supported for sorting list queries. The explicit list is below: + +- ListTasks + + - project + - domain + - name + - version + - created_at +- ListTaskIds + + - project + - domain +- ListWorkflows + + - project + - domain + - name + - version + - created_at +- ListWorkflowIds + + - project + - domain +- ListLaunchPlans + + - project + - domain + - name + - version + - created_at + - updated_at + - state (you must use the integer enum e.g. 1) + - States are defined in :std:ref:`launchplanstate `. +- ListWorkflowIds + + - project + - domain +- ListExecutions + + - project + - domain + - name + - phase (you must use the upper-cased string name e.g. RUNNING) + - Phases are defined in :std:ref:`workflowexecution.phase `. + - execution_created_at + - execution_updated_at + - duration (in seconds) + - mode (you must use the integer enum e.g. 1) + - Modes are defined :std:ref:`execution.proto `. +- ListNodeExecutions + + - node_id + - retry_attempt + - phase (you must use the upper-cased string name e.g. QUEUED) + - Phases are defined in :std:ref:`nodeexecution.phase `. + - started_at + - node_execution_created_at + - node_execution_updated_at + - duration (in seconds) +- ListTaskExecutions + + - retry_attempt + - phase (you must use the upper-cased string name e.g. SUCCEEDED) + - Phases are defined in :std:ref:`taskexecution.phase `. + - started_at + - task_execution_created_at + - task_execution_updated_at + - duration (in seconds) + +Sorting syntax +-------------- + +Adding sorting to a request requires specifying the ``key``, e.g. the attribute you wish to sort on. Sorting can also optionally specify the direction (one of ``ASCENDING`` or ``DESCENDING``) where ``DESCENDING`` is the default. + +Example sorting http param: + +:: + + sort_by.key=created_at&sort_by.direction=DESCENDING + +Alternatively, since descending is the default, the above could be rewritten as + +:: + + sort_by.key=created_at + diff --git a/rsts/concepts/admin_service.rst b/rsts/concepts/admin_service.rst deleted file mode 100644 index 68175618d7..0000000000 --- a/rsts/concepts/admin_service.rst +++ /dev/null @@ -1,324 +0,0 @@ -.. _divedeep-admin-service: - -############################# -FlyteAdmin Service Background -############################# - -Entities -======== -The :std:ref:`admin service definition ` defines REST operations for the entities -flyteadmin administers. - -As a refresher, the primary :ref:`entities ` across Flyte map similarly to FlyteAdmin entities. - -Static entities -+++++++++++++++ - -These include: - -- Workflows -- Tasks -- Launch Plans - -Permitted operations: - -- Create -- Get -- List - -The above are designated by an :std:ref:`identifier ` -which consists of a project, domain, name and version specification. These entities are for the most part immutable. 
To update one of these specific entities, the updated -version must be reregistered with a unique and new version identifier attribute. - -One caveat is that launch plan state can toggle between :std:ref:`ACTIVE or INACTIVE `. -At most one launch plan version across a shared project, domain and name specification can be active at a time. The state affects scheduled launch plans only. -An inactive launch plan can still be used to launch individual executions. However, only an active launch plan runs on a schedule (if it has a schedule defined). - - -Static entities metadata (Named Entities) -+++++++++++++++++++++++++++++++++++++++++ -A :std:ref:`named entity ` includes metadata for one of the above entities -(workflow, task or launch plan) across versions. A named entity includes a resource type (workflow, task or launch plan) and an -:std:ref:`id ` which is composed of project, domain and name. -A named entity also includes metadata, which are mutable attributes about the referenced entity. - -This metadata includes: - -- Description: a human readable description for the Named Entity collection -- State (workflows only): this determines whether the workflow is shown on the overview list of workflows scoped by project and domain - -Permitted operations: - -- Create -- Update -- Get -- List - - -Execution entities -++++++++++++++++++ - -These include: - -- (Workflow) executions -- Node executions -- Task executions - -Permitted operations: - -- Create -- Get -- List - -After an execution begins, flyte propeller monitors the execution and sends events which admin uses to update the above executions. - -These :std:ref:`events ` include - -- WorkflowExecutionEvent -- NodeExecutionEvent -- TaskExecutionEvent - -and include information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change. - -These events are the **only** way to update an execution. No raw Update endpoint exists. - -To track the lifecycle of an execution admin stores attributes such as duration, timestamp at which an execution transitioned to running, and end time. - -For debug purposes admin also stores Workflow and Node execution events in its database, but does not currently expose them through an API. Because array tasks can yield very many executions, -admin does **not** store TaskExecutionEvents. - - -Platform entities -+++++++++++++++++ -Projects: like named entities, project have mutable metadata such as human-readable names and descriptions, in addition to their unique string ids. - -Permitted project operations: - -- Register -- List - -Matchable resources -A thorough background on :ref:`matchable resources ` explains their purpose and application logic. As a summary, these are used to override system level defaults -for kubernetes cluster resource management, default execution values, and more all across different levels of specificity. - -These entities consist of: - -- ProjectDomainAttributes -- WorkflowAttributes - -Where ProjectDomainAttributes configure customizable overrides at the project and domain level and WorkflowAttributes configure customizable overrides at the project, domain and workflow level. - -Permitted attribute operations: - -- Update (implicitly creates if there is no existing override) -- Get -- Delete - -Using the Admin Service -======================= - -Adding request filters -++++++++++++++++++++++ - -We use `gRPC Gateway `_ to reverse proxy http requests into gRPC. 
-While this allows for a single implementation for both HTTP and gRPC, an important limitation is that fields mapped to the path pattern cannot be -repeated and must have a primitive (non-message) type. Unfortunately this means that repeated string filters cannot use a proper protobuf message. Instead use -the internal syntax shown below:: - - func(field,value) or func(field, value) - -For example, multiple filters would be appended to an http request:: - - ?filters=ne(version, TheWorst)+eq(workflow.name, workflow) - -Timestamp fields use the RFC3339Nano spec (ex: "2006-01-02T15:04:05.999999999Z07:00") - -The fully supported set of filter functions are - -- contains -- gt (greater than) -- gte (greter than or equal to) -- lt (less than) -- lte (less than or equal to) -- eq (equal) -- ne (not equal) -- value_in (for repeated sets of values) - -"value_in" is a special case where multiple values are passed to the filter expression. For example:: - - value_in(phase, 1;2;3) - -Filterable fields vary based on entity types: - -- Task - - - project - - domain - - name - - version - - created_at -- Workflow - - - project - - domain - - name - - version - - created_at -- Launch plans - - - project - - domain - - name - - version - - created_at - - updated_at - - workflows.{any workflow field above} (for example: workflow.domain) - - state (you must use the integer enum e.g. 1) - - States are defined in :std:ref:`launchplanstate `. -- Named Entity Metadata - - - state (you must use the integer enum e.g. 1) - - States are defined in :std:ref:`namedentitystate `. -- Executions (Workflow executions) - - - project - - domain - - name - - workflow.{any workflow field above} (for example: workflow.domain) - - launch_plan.{any launch plan field above} (for example: launch_plan.name) - - phase (you must use the upper-cased string name e.g. RUNNING) - - Phases are defined in :std:ref:`workflowexecution.phase `. - - execution_created_at - - execution_updated_at - - duration (in seconds) - - mode (you must use the integer enum e.g. 1) - - Modes are defined in :std:ref:`executionmode `. - - user (authenticated user or role from flytekit config) - -- Node Executions - - - node_id - - execution.{any execution field above} (for example: execution.domain) - - phase (you must use the upper-cased string name e.g. QUEUED) - - Phases are defined in :std:ref:`nodeexecution.phase `. - - started_at - - node_execution_created_at - - node_execution_updated_at - - duration (in seconds) -- Task Executions - - - retry_attempt - - task.{any task field above} (for example: task.version) - - execution.{any execution field above} (for example: execution.domain) - - node_execution.{any node execution field above} (for example: node_execution.phase) - - phase (you must use the upper-cased string name e.g. SUCCEEDED) - - Phases are defined in :std:ref:`taskexecution.phase `. 
- - started_at - - task_execution_created_at - - task_execution_updated_at - - duration (in seconds) - -Putting it all together ------------------------ - -If you wanted to do query on specific executions that were launched with a specific launch plan for a workflow with specific attributes, you could do something like: - -:: - - gte(duration, 100)+value_in(phase,RUNNING;SUCCEEDED;FAILED)+eq(lauch_plan.project, foo) - +eq(launch_plan.domain, bar)+eq(launch_plan.name, baz) - +eq(launch_plan.version, 1234) - +lte(workflow.created_at,2018-11-29T17:34:05.000000000Z07:00) - - - -Adding sorting to requests -++++++++++++++++++++++++++ - -Only a subset of fields are supported for sorting list queries. The explicit list is below: - -- ListTasks - - - project - - domain - - name - - version - - created_at -- ListTaskIds - - - project - - domain -- ListWorkflows - - - project - - domain - - name - - version - - created_at -- ListWorkflowIds - - - project - - domain -- ListLaunchPlans - - - project - - domain - - name - - version - - created_at - - updated_at - - state (you must use the integer enum e.g. 1) - - States are defined in :std:ref:`launchplanstate `. -- ListWorkflowIds - - - project - - domain -- ListExecutions - - - project - - domain - - name - - phase (you must use the upper-cased string name e.g. RUNNING) - - Phases are defined in :std:ref:`workflowexecution.phase `. - - execution_created_at - - execution_updated_at - - duration (in seconds) - - mode (you must use the integer enum e.g. 1) - - Modes are defined :std:ref:`execution.proto `. -- ListNodeExecutions - - - node_id - - retry_attempt - - phase (you must use the upper-cased string name e.g. QUEUED) - - Phases are defined in :std:ref:`nodeexecution.phase `. - - started_at - - node_execution_created_at - - node_execution_updated_at - - duration (in seconds) -- ListTaskExecutions - - - retry_attempt - - phase (you must use the upper-cased string name e.g. SUCCEEDED) - - Phases are defined in :std:ref:`taskexecution.phase `. - - started_at - - task_execution_created_at - - task_execution_updated_at - - duration (in seconds) - -Sorting syntax --------------- - -Adding sorting to a request requires specifying the ``key``, e.g. the attribute you wish to sort on. Sorting can also optionally specify the direction (one of ``ASCENDING`` or ``DESCENDING``) where ``DESCENDING`` is the default. - -Example sorting http param: - -:: - - sort_by.key=created_at&sort_by.direction=DESCENDING - -Alternatively, since descending is the default, the above could be rewritten as - -:: - - sort_by.key=created_at - diff --git a/rsts/concepts/architecture.rst b/rsts/concepts/architecture.rst index 1c654cdec5..5b4f369ee7 100644 --- a/rsts/concepts/architecture.rst +++ b/rsts/concepts/architecture.rst @@ -1,8 +1,8 @@ .. _divedeep-architecture-overview: -############################# -Architecture Overview -############################# +###################### +Component Architecture +###################### This document aims to demystify how Flyte's major components ``FlyteIDL``, ``FlyteKit``, ``FlyteCLI``, ``FlyteConsole``, ``FlyteAdmin``, ``FlytePropeller``, and ``FlytePlugins`` fit together at a high level. @@ -16,8 +16,8 @@ The Flyte IDL (Interface Definition Language) is where shared Flyte entities are FlyteIDL uses the `protobuf `_ schema to describe entities. Clients are generated for Python, Golang, and JavaScript and imported by Flyte components. -Planes โœˆ๏ธ -========= +Planes +====== Flyte components are separated into 3 logical planes. 
The planes are summarized here and explained in further detail below. The goal is that any of these planes can be replaced by an alternate implementation. diff --git a/rsts/concepts/basics.rst b/rsts/concepts/basics.rst index 8f970b2a5c..98afc85294 100644 --- a/rsts/concepts/basics.rst +++ b/rsts/concepts/basics.rst @@ -1,18 +1,70 @@ -.. _basics: +.. _divedeep: -###### -Basics -###### +############# +Core Concepts +############# -.. NOTE:: +.. panels:: + :header: text-center + + .. link-button:: divedeep-tasks + :type: ref + :text: Tasks + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + A **Task** is any independent unit of processing. They can be pure functions or functions with side-effects. + Each definition of a task also has associated configuration and requirements specifications. + + --- + + .. link-button:: divedeep-workflows + :type: ref + :text: Workflows + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Workflows** are programs that are guaranteed to eventually reach a terminal state and are represented as + Directed Acyclic Graphs (DAGs) expressed in protobuf. + + --- + + .. link-button:: divedeep-nodes + :type: ref + :text: Nodes + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + A **Node** is an encapsulation of an instance of a Task and represent the unit of work, where multiple Nodes that are + interconnected via workflows + + --- + + .. link-button:: divedeep-launchplans + :type: ref + :text: Launch Plans + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Launch Plans** provide a mechanism to specialize input parameters for workflows associated different schedules. + + --- + + .. link-button:: divedeep-executions + :type: ref + :text: Executions + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Executions** are instances of workflows, nodes or tasks created in the system as a result of a user-requested + execution or a scheduled execution. + +The diagram below shows how inputs flow through tasks and workflows to produce outputs. + +.. image:: ./flyte_wf_tasks_high_level.png - Coming soon ๐Ÿ›  .. toctree:: :maxdepth: 1 - :name: Basics + :name: Core Concepts + :hidden: - flyte_ui - flyte_cli - deployment_options - glossary + tasks + workflows_nodes + launchplans_schedules + executions diff --git a/rsts/concepts/console.rst b/rsts/concepts/console.rst index 7e97bf7f06..89af3a84e3 100644 --- a/rsts/concepts/console.rst +++ b/rsts/concepts/console.rst @@ -27,8 +27,9 @@ Before we can run the server, we need to set up an environment variable or two. The Flyte console displays information fetched from the Flyte Admin API. This environment variable specifies the host prefix used in constructing API requests. -*Note*: this is only the host portion of the API endpoint, consisting of the -protocol, domain, and port (if not using the standard 80/443). +.. NOTE:: + this is only the host portion of the API endpoint, consisting of the + protocol, domain, and port (if not using the standard 80/443). This value will be combined with a suffix (such as ``/api/v1``) to construct the final URL used in an API request. diff --git a/rsts/concepts/control_plane.rst b/rsts/concepts/control_plane.rst index 83ebec2d36..a134061006 100644 --- a/rsts/concepts/control_plane.rst +++ b/rsts/concepts/control_plane.rst @@ -2,12 +2,62 @@ Control Plane ################ +.. panels:: + :header: text-center + + .. 
link-button:: divedeep-projects + :type: ref + :text: Projects + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Projects** are a multi-tenancy primitive in Flyte that allow logical grouping of Flyte workflows and tasks, which + often correspond to source code repositories. + + --- + + .. link-button:: divedeep-domains + :type: ref + :text: Domains + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Domains** enable workflows to be executed in different environments, with separate resource isolation and feature + configuration. + + --- + + .. link-button:: divedeep-registration + :type: ref + :text: Registration + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Registration** is the process of uploading a workflow and its task definitions to the FlyteAdmin service. + Registration creates an inventory of available tasks, workflows and launchplans declared per project and domain. + + --- + + .. link-button:: divedeep-admin + :type: ref + :text: Flyte Admin + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Flyte Admin** is the backend that serves the main Flyte API, processing all client requests to the system. + + --- + + .. link-button:: divedeep-console + :type: ref + :text: Flyte Console + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + **Flyte Console** is the web UI for the Flyte platform. + + .. toctree:: :maxdepth: 1 + :hidden: projects domains - admin - admin_service registration + admin console diff --git a/rsts/concepts/core.rst b/rsts/concepts/core.rst deleted file mode 100644 index d6eb0fb9c8..0000000000 --- a/rsts/concepts/core.rst +++ /dev/null @@ -1,15 +0,0 @@ -.. _divedeep: - -############################ -Core Concepts & Architecture -############################ - -.. toctree:: - :maxdepth: 1 - :name: Concepts & Architecture - - overview - tasks - workflows_nodes - launchplans_schedules - architecture diff --git a/rsts/concepts/customizable_resources.rst b/rsts/concepts/customizable_resources.rst index ae87d8a097..6a9e6c7b1e 100644 --- a/rsts/concepts/customizable_resources.rst +++ b/rsts/concepts/customizable_resources.rst @@ -4,7 +4,10 @@ Adding customizable resources ############################# -For background on customizable resources, see :ref:`howto-managing-customizable-resources`. As a quick refresher, custom resources allow you to manage configurations for specific combinations of user projects, domains and workflows that override default values. Examples of such resources include execution clusters, task resource defaults, and :std:ref:`more `. +For background on customizable resources, see the :std:ref:`User Guide `. +As a quick refresher, custom resources allow you to manage configurations for specific combinations of user projects, +domains and workflows that override default values. Examples of such resources include execution clusters, task resource +defaults, and :std:ref:`more `. Example diff --git a/rsts/concepts/deployment_options.rst b/rsts/concepts/deployment_options.rst deleted file mode 100644 index 0ae2e5b1d1..0000000000 --- a/rsts/concepts/deployment_options.rst +++ /dev/null @@ -1,7 +0,0 @@ -################################### -Deployment options (Local & Remote) -################################### - -.. 
NOTE:: - - Coming soon ๐Ÿ›  \ No newline at end of file diff --git a/rsts/concepts/execution_time.rst b/rsts/concepts/execution_time.rst deleted file mode 100644 index c19980680b..0000000000 --- a/rsts/concepts/execution_time.rst +++ /dev/null @@ -1,14 +0,0 @@ -###################### -Execution Time Details -###################### - -.. toctree:: - :maxdepth: 1 - - executions - state_machine - execution_timeline - observability - dynamic_spec - catalog - customizable_resources \ No newline at end of file diff --git a/rsts/concepts/executions.rst b/rsts/concepts/executions.rst index aae12e2396..6aa273129e 100644 --- a/rsts/concepts/executions.rst +++ b/rsts/concepts/executions.rst @@ -1,16 +1,27 @@ .. _divedeep-executions: -######################################## -Overview of the Execution of a Workflow -######################################## - -.. image:: https://raw.githubusercontent.com/lyft/flyte/assets/img/flyte_wf_execution_overview.svg?sanitize=true +########## +Executions +########## Typical flow using flyte-cli ----------------------------- - * When you request an execution of a Workflow using the UI, Flyte CLI or other stateless systems, the system first calls the - getLaunchPlan endpoint and retrieves a Launch Plan matching the name for a version. The Launch Plan definition includes the definitions of all the input variables declared for the Workflow. - * The user-side component then ensures that all required inputs are supplied and requests the FlyteAdmin service for an execution - * The Flyte Admin service validates the inputs, making sure that they are all specified and, if required, within the declared bounds. - * Flyte Admin then fetches the previously validated and compiled workflow closure and translates it to an executable format with all of the inputs. - * This executable Workflow is then launched on Kubernetes with an execution record in the database. +* When you request an execution of a Workflow using the UI, Flyte CLI or other stateless systems, the system first calls the + getLaunchPlan endpoint and retrieves a Launch Plan matching the name for a version. The Launch Plan definition includes the definitions of all the input variables declared for the Workflow. +* The user-side component then ensures that all required inputs are supplied and requests the FlyteAdmin service for an execution +* The Flyte Admin service validates the inputs, making sure that they are all specified and, if required, within the declared bounds. +* Flyte Admin then fetches the previously validated and compiled workflow closure and translates it to an executable format with all of the inputs. +* This executable Workflow is then launched on Kubernetes with an execution record in the database. + +.. image:: https://raw.githubusercontent.com/lyft/flyte/assets/img/flyte_wf_execution_overview.svg?sanitize=true + +.. toctree:: + :caption: Execution Details + :maxdepth: 1 + + state_machine + execution_timeline + observability + dynamic_spec + catalog + customizable_resources diff --git a/rsts/concepts/flyte_cli.rst b/rsts/concepts/flyte_cli.rst deleted file mode 100644 index da251e189e..0000000000 --- a/rsts/concepts/flyte_cli.rst +++ /dev/null @@ -1,7 +0,0 @@ -############## -Flyte CLI -############## - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/concepts/flyte_ui.rst b/rsts/concepts/flyte_ui.rst deleted file mode 100644 index e26edf382a..0000000000 --- a/rsts/concepts/flyte_ui.rst +++ /dev/null @@ -1,7 +0,0 @@ -################# -Flyte UI -################# - -.. 
NOTE:: - - Coming soon ๐Ÿ›  \ No newline at end of file diff --git a/rsts/concepts/glossary.rst b/rsts/concepts/glossary.rst deleted file mode 100644 index 0be14da2ad..0000000000 --- a/rsts/concepts/glossary.rst +++ /dev/null @@ -1,12 +0,0 @@ -############ -Glossary -############ - -.. glossary:: - - Memoization - Memoization ensures that a method doesn't run for the same inputs more than once by keeping a record of the results for the given inputs. - -.. NOTE:: - - Coming soon ๐Ÿ›  \ No newline at end of file diff --git a/rsts/concepts/launchplans_schedules.rst b/rsts/concepts/launchplans_schedules.rst index d292436894..5230001a97 100644 --- a/rsts/concepts/launchplans_schedules.rst +++ b/rsts/concepts/launchplans_schedules.rst @@ -6,7 +6,9 @@ Launch plans are used to execute workflows. A workflow can have many launch plan Launch plans provide a way to templatize Flyte workflow invocations. Launch plans contain a set of bound workflow inputs that are passed as arguments to create an execution. Launch plans do not necessarily contain the entire set of required workflow inputs, but a launch plan is always necessary to trigger an execution. Additional input arguments can be provided at execution time to supplement launch plan static input values. -In addition to templatizing inputs, launch plans allow you to run your workflow on one or multiple schedules. Each launch plan can optionally define a single schedule (which can be easily disabled by disabling the launch plan) as well as optional notifications. Refer to :ref:`howto-notifications` for a deep dive into available notifications. +In addition to templatizing inputs, launch plans allow you to run your workflow on one or multiple schedules. Each launch +plan can optionally define a single schedule (which can be easily disabled by disabling the launch plan) as well as +optional notifications. Refer to the :std:ref:`User Guide ` for a deep dive into available notifications. See `here `__ for an overview. @@ -40,7 +42,7 @@ Fixed inputs cannot be overridden. If a workflow is executed with a launch plan .. _concepts-schedules: Schedules -========= +--------- Workflows can be run automatically using schedules associated with launch plans. Schedules can either define a cron_expression_. or rate_unit_. At most one launch plan version for a given {Project, Domain, Name} combination can be active, which means at most one schedule can be active for a launch plan. However, many unique launch plans and corresponding schedules can be defined for the same workflow. diff --git a/rsts/concepts/observability.rst b/rsts/concepts/observability.rst index c9b4d69b0b..3ebaf13a65 100644 --- a/rsts/concepts/observability.rst +++ b/rsts/concepts/observability.rst @@ -3,7 +3,7 @@ Metrics for your executions =========================== -.. tip:: Refer to :ref:`howto-monitoring` to see/use prebuilt dashboards published to Grafana Marketplace. The following section explains some other metrics that are very important. +.. tip:: Refer to the :std:ref:`User Guide ` to see how to use prebuilt dashboards published to Grafana Marketplace. The following section explains some other metrics that are very important. Flyte-Provided Metrics ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/rsts/concepts/overview.rst b/rsts/concepts/overview.rst deleted file mode 100644 index 341662bc20..0000000000 --- a/rsts/concepts/overview.rst +++ /dev/null @@ -1,42 +0,0 @@ -.. 
_divedeep_overview: - -#################### -Logical Overview -#################### - -Illustration of a workflow with tasks ----------------------------------------- - -.. image:: ./flyte_wf_tasks_high_level.png - - -:ref:`Tasks ` are at the core of Flyte. A Task is any independent unit of -processing. Tasks can be pure functions or functions with side-effects. Tasks also have -configuration and requirements specification associated with each definition of the task. - -:ref:`Workflows ` are programs that are guaranteed to reach a terminal -state eventually. They are represented as Directed Acyclic Graphs (DAGs) expressed in protobuf. -The Flyte specification language expresses DAGs with branches, parallel steps and nested -Workflows. Workflow can optionally specify typed inputs and produce typed outputs, which -are captured by the framework. Workflows are composed of one or more -:ref:`Nodes `. A Node is an encapsulation of an instance of a Task. - -:ref:`Executions ` are instances of workflows, nodes or tasks created -in the system as a result of a user-requested execution or a scheduled execution. - -:ref:`Projects ` are a multi-tenancy primitive in Flyte that allow -logical grouping of Flyte workflows and tasks. Projects often correspond to source code -repositories. For example the project *Save Water* may include multiple `Workflows` -that analyze wastage of water etc. - -:ref:`Domains ` enable workflows to be executed in different environments, -with separate resource isolation and feature configuration. - -:ref:`Launchplans ` provide a mechanism to specialize input parameters -for workflows associated different schedules. - -:ref:`Registration ` is the process of uploading a workflow and its -task definitions to the :ref:`FlyteAdmin ` service. Registration creates -an inventory of available tasks, workflows and launchplans declared per project -and domain. A scheduled or on-demand execution can then be launched against one of -the registered entities. diff --git a/rsts/concepts/registration.rst b/rsts/concepts/registration.rst index 464cdf7c9b..a4f3464569 100644 --- a/rsts/concepts/registration.rst +++ b/rsts/concepts/registration.rst @@ -1,8 +1,8 @@ .. _divedeep-registration: -################################## -Understanding Registration process -################################## +############ +Registration +############ .. image:: https://raw.githubusercontent.com/lyft/flyte/assets/img/flyte_wf_registration_overview.svg?sanitize=true diff --git a/rsts/concepts/state_machine.rst b/rsts/concepts/state_machine.rst index d9162fa714..92206ab9bb 100644 --- a/rsts/concepts/state_machine.rst +++ b/rsts/concepts/state_machine.rst @@ -26,11 +26,11 @@ The State diagram above illustrates the various states through which a Workflow A Workflow always starts in the Ready State and ends either in Failed, Succeeded or Aborted state. Any system error within a state causes a retry on that state. These retries are capped by system retries and will eventually lead to an Aborted state. -Every transition between states is recorded in Flyteadmin using :std:ref:`workflowexecutionevent protos/docs/event/event:workflowexecutionevent` +Every transition between states is recorded in Flyteadmin using :std:ref:`workflowexecutionevent ` -The phases in the above state diagram are captured in the Admin database as specified here :std:ref:`workflowexecution.phase protos/docs/core/core:workflowexecution.phase` and are sent as part of the Execution Event. 
+The phases in the above state diagram are captured in the Admin database as specified here :std:ref:`workflowexecution.phase ` and are sent as part of the Execution Event. -The state machine specification for the illustration can be found `here `_ +The state machine specification for the illustration can be found `here `__ Node States @@ -47,19 +47,19 @@ Once a Workflow enters a ``Running`` state, it triggers the phantom ``start node Nodes can be of different types, as follows, but all the nodes traverse through the same transitions #. Start Node - Only exists during the execution and is not modeled in the core spec -#. :std:ref:`tasknode protos/docs/core/core:tasknode` -#. :std:ref:`branchnode protos/docs/core/core:branchnode ` -#. :std:ref:`workflownode protos/docs/core/core:workflownode` +#. :std:ref:`tasknode ` +#. :std:ref:`branchnode ` +#. :std:ref:`workflownode ` #. Dynamic node - which is just a task node that does not return outputs, but a dynamic workflow. When the task runs, it stays in a `RUNNING` state. Once the task completes and Flyte starts executing the dynamic workflow, the overarching node that contains both the original task and the dynamic workflow enters `DYNAMIC_RUNNING` state. #. End Node - only exists during the execution and is not modeled in the core spec -Every transition between states is recorded in Flyteadmin using :std:ref:`nodeexecutionevent protos/docs/event/event:nodeexecutionevent` +Every transition between states is recorded in Flyteadmin using :std:ref:`nodeexecutionevent ` -Every NodeExecutionEvent can have one of the :std:ref:`nodeexecution.phase protos/docs/core/core:nodeexecution.phase` +Every NodeExecutionEvent can have one of the :std:ref:`nodeexecution.phase ` .. note:: TODO add explanation for each phase -The state machine specification for the illustration can be found `here `_ +The state machine specification for the illustration can be found `here `__ Task States ================ @@ -69,10 +69,10 @@ Task States The State diagram above illustrates the various states through which a Task transitions. -Every transition between states is recorded in Flyteadmin using :std:ref:`taskexecutionevent protos/docs/event/event:taskexecutionevent` +Every transition between states is recorded in Flyteadmin using :std:ref:`taskexecutionevent ` -Every TaskExecutionEvent can have one of the :std:ref:`taskexecution.phase protos/docs/core/core:taskexecution.phase` +Every TaskExecutionEvent can have one of the :std:ref:`taskexecution.phase ` .. note:: TODO add explanation for each phase -The state machine specification for the illustration can be found `here `_ +The state machine specification for the illustration can be found `here `__ diff --git a/rsts/concepts/tasks.rst b/rsts/concepts/tasks.rst index 741bd9e77f..2d31259fc8 100644 --- a/rsts/concepts/tasks.rst +++ b/rsts/concepts/tasks.rst @@ -37,8 +37,9 @@ In abstract, a task in the system is characterized by: 4. *Optional* Task interface definition In order for tasks to exchange data with each other, a task can define a signature (much like a function/method - signature in programming languages). A task interface defines the input and output variables - :std:ref:`variablesentry protos/docs/core/core:variablemap.variablesentry` - as well as their types :std:ref:`literaltype protos/docs/core/core:literaltype`. + signature in programming languages). A task interface defines the input and output variables - + :std:ref:`variablesentry ` + as well as their types :std:ref:`literaltype `. 
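+
+   For example, with the Python SDK (Flytekit) the interface is derived from ordinary type
+   annotations. A minimal, illustrative sketch (the function name and body are hypothetical):
+
+   .. code-block:: python
+
+      from flytekit import task
+
+      @task
+      def count_words(text: str) -> int:
+          """The annotations (str in, int out) become this task's typed interface."""
+          return len(text.split())
+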
Requirements ------------ @@ -59,7 +60,7 @@ Types ----- Since it's impossible to define the unit of execution of a task the same way for all kinds of tasks, Flyte allows different task types in the system. Flyte comes with a set of defined, battle tested task types but also allows for a very flexible model to -introducing new :ref:`plugins_extend_intro`. +:std:ref:`define new types `. Fault tolerance --------------- @@ -76,4 +77,4 @@ Timeouts Memoization ----------- Flyte supports memoization for task outputs to ensure identical invocations of a task are not repeatedly executed wasting compute resources. -For more information on memoization please refer to :ref:`howto-enable-use-memoization`. +For more information on memoization please refer to the :std:ref:`User Guide `. diff --git a/rsts/concepts/workflows_nodes.rst b/rsts/concepts/workflows_nodes.rst index ae449d08bf..002557a482 100644 --- a/rsts/concepts/workflows_nodes.rst +++ b/rsts/concepts/workflows_nodes.rst @@ -33,7 +33,9 @@ Executions ---------- A workflow can only be executed through a :ref:`launch plan `. -A workflow can be launched many times with a variety of launch plans and inputs. Workflows that produce inputs and outputs can take advantage of :ref:`task caching ` to cache intermediate inputs and outputs and speed-up subsequent executions. +A workflow can be launched many times with a variety of launch plans and inputs. Workflows that produce inputs and +outputs can take advantage of :std:ref:`User Guide ` to cache +intermediate inputs and outputs and speed-up subsequent executions. .. _divedeep-nodes: diff --git a/rsts/conf.py b/rsts/conf.py index b769998c21..845ca1b537 100644 --- a/rsts/conf.py +++ b/rsts/conf.py @@ -61,7 +61,7 @@ "sphinxext.remoteliteralinclude", "sphinx_issues", "sphinx_search.extension", - "sphinx_fontawesome", + "sphinx_panels", ] extlinks = { @@ -107,7 +107,6 @@ html_theme = "furo" html_title = "Flyte Docs" -html_static_path = ["_static"] templates_path = ["_templates"] pygments_style = "tango" diff --git a/rsts/getting_started.rst b/rsts/getting_started.rst index ad750baf12..7669101764 100644 --- a/rsts/getting_started.rst +++ b/rsts/getting_started.rst @@ -103,7 +103,7 @@ Steps .. rubric:: ๐ŸŽ‰ Congratulations, you just ran your first Flyte workflow ๐ŸŽ‰ - Next Steps: User Guide - ####################### - - To experience the full capabilities of Flyte, take a look at the `User Guide `__ ๐Ÿ›ซ +Next Steps: User Guide +####################### + +To experience the full capabilities of Flyte, take a look at the `User Guide `__ ๐Ÿ›ซ diff --git a/rsts/howto/authentication/index.rst b/rsts/howto/authentication/index.rst deleted file mode 100644 index d9fa602c97..0000000000 --- a/rsts/howto/authentication/index.rst +++ /dev/null @@ -1,110 +0,0 @@ -.. _howto_authentication: - -####################### -Authentication in Flyte -####################### - -Flyte ships with a canonical implementation of OpenIDConnect client and OAuth2 Server, integrating seamlessly into an organization's existing identity provider. - -.. toctree:: - :maxdepth: 1 - :caption: Setting up Flyte Authentication - :name: howtosetupauthtoc - - setup - migration - -******** -Overview -******** - -Flyte system consists of multiple components. For the purposes of this document, let's categorize them into server-side and client-side components: - -- **Admin**: A server-side control plane component accessible from console, cli and other backends. 
-- **Catalog**: A server-side control plane component accessible from console, cli and other backends. -- **Console**: A client-side single page react app. -- **flyte-cli**: A python-based client-side command line interface that interacts with Admin and Catalog. -- **flytectl**: A go-based client-side command line interface that interacts with Admin and Catalog. -- **Propeller**: A server-side data plane component that interacts with both admin and catalog services. - -************** -OpenID Connect -************** - -Flyte supports OpenID Connect. A defacto standard for user authentication. After configuring OpenID Connect, users accessing flyte console or flytectl -(or other 3rd party apps) will be prompted to authenticate using the configured provider. - -.. image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0Jyb3dzZXI6IC9ob21lXG4gICAgQnJvd3Nlci0-PitDb25zb2xlOiAvaG9tZVxuICAgIENvbnNvbGUtPj4tQnJvd3NlcjogMzAyIC9sb2dpblxuICAgIEJyb3dzZXItPj4rQWRtaW46IC9sb2dpblxuICAgIEFkbWluLT4-LUJyb3dzZXI6IElkcC5jb20vb2lkY1xuICAgIEJyb3dzZXItPj4rSWRwOiBJZHAuY29tL29pZGNcbiAgICBJZHAtPj4tQnJvd3NlcjogMzAyIC9sb2dpblxuICAgIEJyb3dzZXItPj4tVXNlcjogRW50ZXIgdXNlci9wYXNzXG4gICAgVXNlci0-PitCcm93c2VyOiBsb2dpblxuICAgIEJyb3dzZXItPj4rSWRwOiBTdWJtaXQgdXNlcm5hbWUvcGFzc1xuICAgIElkcC0-Pi1Ccm93c2VyOiBhZG1pbi8_YXV0aENvZGU9PGFiYz5cbiAgICBCcm93c2VyLT4-K0FkbWluOiBhZG1pbi9hdXRoQ29kZT08YWJjPlxuICAgIEFkbWluLT4-K0lkcDogRXhjaGFuZ2UgVG9rZW5zXG4gICAgSWRwLT4-LUFkbWluOiBpZHQsIGF0LCBydFxuICAgIEFkbWluLT4-K0Jyb3dzZXI6IFdyaXRlIENvb2tpZXMgJiBSZWRpcmVjdCB0byAvY29uc29sZVxuICAgIEJyb3dzZXItPj4rQ29uc29sZTogL2hvbWVcbiAgICBCcm93c2VyLT4-LVVzZXI6IFJlbmRlciAvaG9tZVxuIiwibWVybWFpZCI6eyJ0aGVtZSI6Im5ldXRyYWwifSwidXBkYXRlRWRpdG9yIjpmYWxzZX0 - :target: https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0Jyb3dzZXI6IC9ob21lXG4gICAgQnJvd3Nlci0-PitDb25zb2xlOiAvaG9tZVxuICAgIENvbnNvbGUtPj4tQnJvd3NlcjogMzAyIC9sb2dpblxuICAgIEJyb3dzZXItPj4rQWRtaW46IC9sb2dpblxuICAgIEFkbWluLT4-LUJyb3dzZXI6IElkcC5jb20vb2lkY1xuICAgIEJyb3dzZXItPj4rSWRwOiBJZHAuY29tL29pZGNcbiAgICBJZHAtPj4tQnJvd3NlcjogMzAyIC9sb2dpblxuICAgIEJyb3dzZXItPj4tVXNlcjogRW50ZXIgdXNlci9wYXNzXG4gICAgVXNlci0-PitCcm93c2VyOiBsb2dpblxuICAgIEJyb3dzZXItPj4rSWRwOiBTdWJtaXQgdXNlcm5hbWUvcGFzc1xuICAgIElkcC0-Pi1Ccm93c2VyOiBhZG1pbi8_YXV0aENvZGU9PGFiYz5cbiAgICBCcm93c2VyLT4-K0FkbWluOiBhZG1pbi9hdXRoQ29kZT08YWJjPlxuICAgIEFkbWluLT4-K0lkcDogRXhjaGFuZ2UgVG9rZW5zXG4gICAgSWRwLT4-LUFkbWluOiBpZHQsIGF0LCBydFxuICAgIEFkbWluLT4-K0Jyb3dzZXI6IFdyaXRlIENvb2tpZXMgJiBSZWRpcmVjdCB0byAvY29uc29sZVxuICAgIEJyb3dzZXItPj4rQ29uc29sZTogL2hvbWVcbiAgICBCcm93c2VyLT4-LVVzZXI6IFJlbmRlciAvaG9tZVxuIiwibWVybWFpZCI6eyJ0aGVtZSI6Im5ldXRyYWwifSwidXBkYXRlRWRpdG9yIjpmYWxzZX0 - :width: 600 - :alt: Flyte UI Swimlane - -****** -OAuth2 -****** - -Flyte supports OAuth2 to control access to 3rd party and native apps. FlyteAdmin comes with a built in Authorization Server that can perform 3-legged -and 2-legged OAuth2 flows. It also supports delegating these responsibilities to an external Authorization Server. 
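As a hedged illustration of the 2-legged (client credentials) grant mentioned above -- the token endpoint, client id, and secret below are placeholders rather than Flyte defaults, and in practice the Flyte clients perform this exchange for you -- a bare token request looks roughly like this:

.. code-block:: python

    import requests

    # Placeholder values: substitute the token endpoint of your Authorization
    # Server and the client credentials provisioned for the calling service.
    TOKEN_URL = "https://idp.example.com/oauth2/default/v1/token"

    resp = requests.post(
        TOKEN_URL,
        data={"grant_type": "client_credentials", "scope": "all"},
        auth=("my-client-id", "my-client-secret"),  # HTTP Basic client auth
    )
    resp.raise_for_status()
    access_token = resp.json()["access_token"]
    # The access_token is then sent as a bearer credential on calls to FlyteAdmin.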
- -Service Authentication using OAuth2 -=================================== - -Propeller (and potentially other non-user facing services) can also authenticate using client_credentials to the Idp and be granted an -access_token valid to be used with admin and other backend services. - -Using FlyteAdmin's builtin Authorization Server: - -.. image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K0FkbWluOiAvdG9rZW4_Y2xpZW50X2NyZWRzJnNjb3BlPWh0dHBzOi8vYWRtaW4vXG4gICAgQWRtaW4tPj4tUHJvcGVsbGVyOiBhY2Nlc3NfdG9rZW5cbiAgICBQcm9wZWxsZXItPj4rQWRtaW46IC9saXN0X3Byb2plY3RzP3Rva2VuPWFjY2Vzc190b2tlbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :target: https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K0FkbWluOiAvdG9rZW4_Y2xpZW50X2NyZWRzJnNjb3BlPWh0dHBzOi8vYWRtaW4vXG4gICAgQWRtaW4tPj4tUHJvcGVsbGVyOiBhY2Nlc3NfdG9rZW5cbiAgICBQcm9wZWxsZXItPj4rQWRtaW46IC9saXN0X3Byb2plY3RzP3Rva2VuPWFjY2Vzc190b2tlbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :width: 600 - :alt: Service Authentication Swimlane - -Using an External Authorization Server: - -.. image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K0V4dGVybmFsIEF1dGhvcml6YXRpb24gU2VydmVyOiAvdG9rZW4_Y2xpZW50X2NyZWRzJnNjb3BlPWh0dHBzOi8vYWRtaW4vXG4gICAgRXh0ZXJuYWwgQXV0aG9yaXphdGlvbiBTZXJ2ZXItPj4tUHJvcGVsbGVyOiBhY2Nlc3NfdG9rZW5cbiAgICBQcm9wZWxsZXItPj4rQWRtaW46IC9saXN0X3Byb2plY3RzP3Rva2VuPWFjY2Vzc190b2tlbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :target: https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K0V4dGVybmFsIEF1dGhvcml6YXRpb24gU2VydmVyOiAvdG9rZW4_Y2xpZW50X2NyZWRzJnNjb3BlPWh0dHBzOi8vYWRtaW4vXG4gICAgRXh0ZXJuYWwgQXV0aG9yaXphdGlvbiBTZXJ2ZXItPj4tUHJvcGVsbGVyOiBhY2Nlc3NfdG9rZW5cbiAgICBQcm9wZWxsZXItPj4rQWRtaW46IC9saXN0X3Byb2plY3RzP3Rva2VuPWFjY2Vzc190b2tlbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :width: 600 - :alt: Service Authentication Swimlane - -User Authentication in other clients (e.g. Cli) using OAuth2-Pkce -================================================================== - -Users accessing backend services through Cli should be able to use OAuth2-Pkce flow to authenticate (in a browser) to the Idp and be issued -an access_token valid to communicate with the intended backend service on behalf of the user. - -Using FlyteAdmin's builtin Authorization Server: - -.. 
image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0NsaTogZmx5dGVjdGwgbGlzdC1wcm9qZWN0c1xuICAgIENsaS0-PitBZG1pbjogYWRtaW4vY2xpZW50LWNvbmZpZ1xuICAgIEFkbWluLT4-LUNsaTogQ2xpZW50X2lkPTxhYmM-LCAuLi5cbiAgICBDbGktPj4rQnJvd3NlcjogL29hdXRoMi9hdXRob3JpemU_cGtjZSZjb2RlX2NoYWxsZW5nZSxjbGllbnRfaWQsc2NvcGVcbiAgICBCcm93c2VyLT4-K0FkbWluOiAvb2F1dGgyL2F1dGhvcml6ZT9wa2NlLi4uXG4gICAgQWRtaW4tPj4tQnJvd3NlcjogMzAyIGlkcC5jb20vbG9naW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixBZG1pbjogVGhlIHByaW9yIE9wZW5JRCBDb25uZWN0IGZsb3dcbiAgICBCcm93c2VyLT4-K0FkbWluOiBhZG1pbi9sb2dnZWRfaW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixBZG1pbjogUG90ZW50aWFsbHkgc2hvdyBjdXN0b20gY29uc2VudCBzY3JlZW5cbiAgICBBZG1pbi0-Pi1Ccm93c2VyOiBsb2NhbGhvc3QvP2F1dGhDb2RlPTxhYmM-XG4gICAgQnJvd3Nlci0-PitDbGk6IGxvY2FsaG9zdC9hdXRoQ29kZT08YWJjPlxuICAgIENsaS0-PitBZG1pbjogL3Rva2VuP2NvZGUsY29kZV92ZXJpZmllclxuICAgIEFkbWluLT4-LUNsaTogYWNjZXNzX3Rva2VuXG4gICAgQ2xpLT4-K0FkbWluOiAvcHJvamVjdHMvICsgYWNjZXNzX3Rva2VuXG4gICAgQWRtaW4tPj4tQ2xpOiBwcm9qZWN0MSwgcHJvamVjdDJcbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :target: https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0NsaTogZmx5dGVjdGwgbGlzdC1wcm9qZWN0c1xuICAgIENsaS0-PitBZG1pbjogYWRtaW4vY2xpZW50LWNvbmZpZ1xuICAgIEFkbWluLT4-LUNsaTogQ2xpZW50X2lkPTxhYmM-LCAuLi5cbiAgICBDbGktPj4rQnJvd3NlcjogL29hdXRoMi9hdXRob3JpemU_cGtjZSZjb2RlX2NoYWxsZW5nZSxjbGllbnRfaWQsc2NvcGVcbiAgICBCcm93c2VyLT4-K0FkbWluOiAvb2F1dGgyL2F1dGhvcml6ZT9wa2NlLi4uXG4gICAgQWRtaW4tPj4tQnJvd3NlcjogMzAyIGlkcC5jb20vbG9naW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixBZG1pbjogVGhlIHByaW9yIE9wZW5JRCBDb25uZWN0IGZsb3dcbiAgICBCcm93c2VyLT4-K0FkbWluOiBhZG1pbi9sb2dnZWRfaW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixBZG1pbjogUG90ZW50aWFsbHkgc2hvdyBjdXN0b20gY29uc2VudCBzY3JlZW5cbiAgICBBZG1pbi0-Pi1Ccm93c2VyOiBsb2NhbGhvc3QvP2F1dGhDb2RlPTxhYmM-XG4gICAgQnJvd3Nlci0-PitDbGk6IGxvY2FsaG9zdC9hdXRoQ29kZT08YWJjPlxuICAgIENsaS0-PitBZG1pbjogL3Rva2VuP2NvZGUsY29kZV92ZXJpZmllclxuICAgIEFkbWluLT4-LUNsaTogYWNjZXNzX3Rva2VuXG4gICAgQ2xpLT4-K0FkbWluOiAvcHJvamVjdHMvICsgYWNjZXNzX3Rva2VuXG4gICAgQWRtaW4tPj4tQ2xpOiBwcm9qZWN0MSwgcHJvamVjdDJcbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIn0sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 - :width: 600 - :alt: CLI Authentication with Admin's own Authorization Server - -Using an External Authorization Server: - -.. 
image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0NsaTogZmx5dGVjdGwgbGlzdC1wcm9qZWN0c1xuICAgIENsaS0-PitBZG1pbjogYWRtaW4vY2xpZW50LWNvbmZpZ1xuICAgIEFkbWluLT4-LUNsaTogQ2xpZW50X2lkPTxhYmM-LCAuLi5cbiAgICBDbGktPj4rQnJvd3NlcjogL29hdXRoMi9hdXRob3JpemU_cGtjZSZjb2RlX2NoYWxsZW5nZSxjbGllbnRfaWQsc2NvcGVcbiAgICBCcm93c2VyLT4-K0V4dGVybmFsSWRwOiAvb2F1dGgyL2F1dGhvcml6ZT9wa2NlLi4uXG4gICAgRXh0ZXJuYWxJZHAtPj4tQnJvd3NlcjogMzAyIGlkcC5jb20vbG9naW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixFeHRlcm5hbElkcDogVGhlIHByaW9yIE9wZW5JRCBDb25uZWN0IGZsb3dcbiAgICBCcm93c2VyLT4-K0V4dGVybmFsSWRwOiAvbG9nZ2VkX2luXG4gICAgTm90ZSBvdmVyIEJyb3dzZXIsRXh0ZXJuYWxJZHA6IFBvdGVudGlhbGx5IHNob3cgY3VzdG9tIGNvbnNlbnQgc2NyZWVuXG4gICAgRXh0ZXJuYWxJZHAtPj4tQnJvd3NlcjogbG9jYWxob3N0Lz9hdXRoQ29kZT08YWJjPlxuICAgIEJyb3dzZXItPj4rQ2xpOiBsb2NhbGhvc3QvYXV0aENvZGU9PGFiYz5cbiAgICBDbGktPj4rRXh0ZXJuYWxJZHA6IC90b2tlbj9jb2RlLGNvZGVfdmVyaWZpZXJcbiAgICBFeHRlcm5hbElkcC0-Pi1DbGk6IGFjY2Vzc190b2tlblxuICAgIENsaS0-PitBZG1pbjogL3Byb2plY3RzLyArIGFjY2Vzc190b2tlblxuICAgIEFkbWluLT4-LUNsaTogcHJvamVjdDEsIHByb2plY3QyXG4iLCJtZXJtYWlkIjp7InRoZW1lIjoibmV1dHJhbCJ9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ - :target: https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4lJXtjb25maWc6IHsgJ2ZvbnRGYW1pbHknOiAnTWVubG8nLCAnZm9udFNpemUnOiAxMCwgJ2ZvbnRXZWlnaHQnOiAxMDB9IH0lJVxuICAgIGF1dG9udW1iZXJcbiAgICBVc2VyLT4-K0NsaTogZmx5dGVjdGwgbGlzdC1wcm9qZWN0c1xuICAgIENsaS0-PitBZG1pbjogYWRtaW4vY2xpZW50LWNvbmZpZ1xuICAgIEFkbWluLT4-LUNsaTogQ2xpZW50X2lkPTxhYmM-LCAuLi5cbiAgICBDbGktPj4rQnJvd3NlcjogL29hdXRoMi9hdXRob3JpemU_cGtjZSZjb2RlX2NoYWxsZW5nZSxjbGllbnRfaWQsc2NvcGVcbiAgICBCcm93c2VyLT4-K0V4dGVybmFsSWRwOiAvb2F1dGgyL2F1dGhvcml6ZT9wa2NlLi4uXG4gICAgRXh0ZXJuYWxJZHAtPj4tQnJvd3NlcjogMzAyIGlkcC5jb20vbG9naW5cbiAgICBOb3RlIG92ZXIgQnJvd3NlcixFeHRlcm5hbElkcDogVGhlIHByaW9yIE9wZW5JRCBDb25uZWN0IGZsb3dcbiAgICBCcm93c2VyLT4-K0V4dGVybmFsSWRwOiAvbG9nZ2VkX2luXG4gICAgTm90ZSBvdmVyIEJyb3dzZXIsRXh0ZXJuYWxJZHA6IFBvdGVudGlhbGx5IHNob3cgY3VzdG9tIGNvbnNlbnQgc2NyZWVuXG4gICAgRXh0ZXJuYWxJZHAtPj4tQnJvd3NlcjogbG9jYWxob3N0Lz9hdXRoQ29kZT08YWJjPlxuICAgIEJyb3dzZXItPj4rQ2xpOiBsb2NhbGhvc3QvYXV0aENvZGU9PGFiYz5cbiAgICBDbGktPj4rRXh0ZXJuYWxJZHA6IC90b2tlbj9jb2RlLGNvZGVfdmVyaWZpZXJcbiAgICBFeHRlcm5hbElkcC0-Pi1DbGk6IGFjY2Vzc190b2tlblxuICAgIENsaS0-PitBZG1pbjogL3Byb2plY3RzLyArIGFjY2Vzc190b2tlblxuICAgIEFkbWluLT4-LUNsaTogcHJvamVjdDEsIHByb2plY3QyXG4iLCJtZXJtYWlkIjp7InRoZW1lIjoibmV1dHJhbCJ9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ - :width: 600 - :alt: CLI Authentication with an external Authorization Server - -Identity Providers Support -========================== - -+-----------------+--------+-------------+---------------------+----------+-------+----------+--------+ -| Feature | Okta | Google free | GC Identity Service | Azure AD | Auth0 | KeyCloak | Github | -+=================+========+=============+=====================+==========+=======+==========+========+ -| OpenIdConnect | Yes | Yes | Yes | Yes | Yes | Yes | No | -+-----------------+--------+-------------+---------------------+----------+-------+----------+--------+ -| Custom RP | Yes | No | Yes | Yes | ? | Yes | No | -+-----------------+--------+-------------+---------------------+----------+-------+----------+--------+ - -********** -References -********** - -This collection of RFCs may be helpful to those who wish to investigate the implementation in more depth. 
- -* `OAuth2 RFC 6749 `_ -* `OAuth Discovery RFC 8414 `_ -* `PKCE RFC 7636 `_ -* `JWT RFC 7519 `_ - diff --git a/rsts/howto/authentication/migration.rst b/rsts/howto/authentication/migration.rst deleted file mode 100644 index fcde70b5f9..0000000000 --- a/rsts/howto/authentication/migration.rst +++ /dev/null @@ -1,151 +0,0 @@ -.. _howto_authentication_migrate: - -###################################################### -How to Migrate Your Authentication Config (pre 0.13.0) -###################################################### - -Using Okta as an example, you would have previously seen something like the following: - -On the Okta side: - -* An Application (OpenID Connect Web) for Flyte Admin itself (e.g. **0oal5rch46pVhCGF45d6**). -* An Application (OpenID Native app) for Flyte-cli/flytectl (e.g. **0oal62nxuD6OSFSRq5d6**). -* These two applications would be assigned to the relevant users. -* An Application (Web) for Flyte Propeller (e.g. **0abc5rch46pVhCGF9876**). -* These applications would either use the default Authorization server, or you would create a new one. - -On the Admin side, you would have had the following configuration: - -.. code-block:: yaml - - server: - # ... other settings - security: - secure: false - useAuth: true - allowCors: true - allowedOrigins: - - "*" - allowedHeaders: - - "Content-Type" - oauth: - baseUrl: https://dev-62129345.okta.com/oauth2/default/ - scopes: - - profile - - openid - - email - claims: - iss: https://dev-62129345.okta.com/oauth2/default - aud: 0oal5rch46pVhCGF45d6 - clientId: 0oal5rch46pVhCGF45d6 - clientSecretFile: "/Users/ytong/etc/secrets/oauth/secret" - authorizeUrl: "https://dev-62129345.okta.com/oauth2/default/v1/authorize" - tokenUrl: "https://dev-62129345.okta.com/oauth2/default/v1/token" - callbackUrl: "http://localhost:8088/callback" - cookieHashKeyFile: "/Users/ytong/etc/secrets/hashkey/hashkey" - cookieBlockKeyFile: "/Users/ytong/etc/secrets/blockkey/blockkey" - redirectUrl: "/api/v1/projects" - thirdPartyConfig: - flyteClient: - clientId: 0oal62nxuD6OSFSRq5d6 - redirectUri: http://localhost:12345/callback - -From the Flyte-cli side, these two settings were needed: - -.. code-block:: bash - - FLYTE_PLATFORM_HTTP_URL=http://localhost:8088 FLYTE_CREDENTIALS_CLIENT_ID=0oal62nxuD6OSFSRq5d6 flyte-cli ... - -**FLYTE_PLATFORM_HTTP_URL** is used because **flyte-cli** uses only gRPC to communicate with Admin. It needs to know the HTTP port (which Admin hosts on a different port because of limitations of the -grpc-gateway library). **flyte-cli** uses this setting to talk to **/.well-known/oauth-authorization-server** to retrieve information regarding the auth endpoints. Previously this redirected to the -Okta Authorization Server's metadata endpoint. With this change, Admin now hosts its own (even if still using the external Authorization Service). - -After version `0.13.0 `__ of the platform, you can still use the IdP as the Authorization Server if you so choose. That configuration would now become: - -.. code-block:: yaml - - server: - # ... other settings - security: - secure: false - useAuth: true - allowCors: true - allowedOrigins: - - "*" - allowedHeaders: - - "Content-Type" - auth: - authorizedUris: - # This should point at your public http Uri. 
- - https://flyte.mycompany.com - # This will be used by internal services in the same namespace as flyteadmin - - http://flyteadmin:80 - # This will be used by internal services in the same cluster but different namespaces - - http://flyteadmin.flyte.svc.cluster.local:80 - userAuth: - openId: - # Put the URL of the OpenID Connect provider. - baseUrl: https://dev-62129345.okta.com/oauth2/default # Okta with a custom Authorization Server - scopes: - - profile - - openid - - offline_access # Uncomment if OIdC supports issuing refresh tokens. - # Replace with the client id created for Flyte. - clientId: 0oal5rch46pVhCGF45d6 - appAuth: - # External delegates app auth responsibilities to an external authorization server, Internal means Flyte Admin does it itself - authServerType: External - thirdPartyConfig: - flyteClient: - clientId: 0oal62nxuD6OSFSRq5d6 - redirectUri: http://localhost:12345/callback - scopes: - - all - - offline - -Specifically, - -* The original **oauth** section has been moved two levels higher into its own section and renamed **auth** but enabling/disabling of authentication remains in the old location. -* Secrets by default will now be looked up in **/etc/secrets**. Use the following command to generate them: - -.. code-block:: bash - - flyteadmin secrets init -p /etc/secrets - -This will generate the new cookie hash/block keys, as well as other secrets Admin needs to run the Authorization server. - -* The **clientSecretFile** has been moved to **/etc/secrets/oidc_client_secret** so move that there. -* **claims** has been removed, just delete that. -* **authorizeUrl** and **tokenUrl** are no longer necessary. -* The **baseUrl** for the external Authorization Server is now in the **appAuth** section. -* The **thirdPartyConfig** has been moved to **appAuth** as well. -* **redirectUrl** has been defaulted to **/console**. If that's the value you want, then you no longer need this setting. - -From Propeller side, you might have a configuration section that looks like this: - -.. code-block:: yaml - - admin: - endpoint: dns:///mycompany.domain.com - useAuth: true - clientId: flytepropeller - clientSecretLocation: /etc/secrets/client_secret - tokenUrl: https://demo.nuclyde.io/oauth2/token - scopes: - - all - -This can now be simplified to: - -.. code-block:: yaml - - admin: - endpoint: dns:///mycompany.domain.com - # If you are using the built-in authorization server, you can delete the following two lines: - clientId: flytepropeller - clientSecretLocation: /etc/secrets/client_secret - -Specifically, - -* **useAuth** is deprecated and will be removed in a future version. Auth requirement will be discovered through an anonymous admin discovery call. -* **tokenUrl** and **scopes** will also be discovered through a metadata call. -* **clientId** and **clientSecretLocation** have defaults that work out of the box with the built-in authorization server (e.g. if you setup Google OpenID Connect). diff --git a/rsts/howto/authentication/setup.rst b/rsts/howto/authentication/setup.rst deleted file mode 100644 index e064637af6..0000000000 --- a/rsts/howto/authentication/setup.rst +++ /dev/null @@ -1,252 +0,0 @@ -.. _howto_authentication_setup: - -############################ -How to Set Up Authentication -############################ - -***************** -IdP Configuration -***************** -Flyte Admin requires that the application in your identity provider be configured as a web client (i.e. with a client secret). 
We recommend allowing the application to be issued a refresh token to avoid interrupting the user's flow by frequently redirecting to the IdP. - -************************* -Flyte Admin Configuration -************************* -Please refer to the `inline documentation `_ on the ``Config`` object in the ``auth`` package for a discussion on the settings required. - -********************** -Example Configurations -********************** - -Below are listed some canonical examples of how to set up some of the common IdPs to secure your Fyte services. OpenID Connect enables users to authenticate, in the -browser, with an existing IdP. Flyte also allows connecting to an external OAuth2 Authorization Server to allow centrally managed third party app access. - -OpenID Connect -=============== - -OpenID Connect allows users to authenticate to Flyte in their browser using a familiar authentication provider (perhaps an organization-wide configured IdP). -Flyte supports connecting with external OIdC providers. Here are some examples for how to set these up: - -Google OpenID Connect ---------------------- - -Follow `Google Docs `__ on how to configure the IdP for OpenIDConnect. - -.. note:: - - Make sure to create an OAuth2 Client Credential. The `client_id` and `client_secret` will be needed in the following - steps. - -Okta OpenID Connect -------------------- - -Okta supports OpenID Connect protocol and the creation of custom OAuth2 Authorization Servers, allowing it to act as both the user and apps IdP. -It offers more detailed control on access policies, user consent, and app management. - -1. If you don't already have an Okta account, sign up for one `here `__. -2. Create an app (choose Web for the platform) and OpenID Connect for the sign-on method. -3. Add Login redirect URIs (e.g. http://localhost:30081/callback for sandbox or https:///callback) -4. OPTIONAL: Add logout redirect URIs (e.g. http://localhost:30081/logout for sandbox) -5. Write down the Client ID and Client Secret - -KeyCloak OpenID Connect ------------------------- - -`KeyCloak `__ is an open source solution for authentication, it supports both OpenID Connect and OAuth2 protocols (among others). -KeyCloak can be configured to be both the OpenID Connect and OAuth2 Authorization Server provider for Flyte. - -Apply configuration -------------------- - -1. Store the `client_secret` in a k8s secret as follows: - -.. prompt:: bash - - kubectl edit secret -n flyte flyte-admin-auth - -Add a new key under `stringData`: - -.. code-block:: yaml - - stringData: - oidc_client_secret: from the previous step - data: - ... - -Save and close your editor. - -2. Edit FlyteAdmin config to add `client_id` and configure auth as follows: - -.. prompt:: bash - - kubectl get deploy -n flyte flyteadmin -o yaml | grep "name: flyte-admin-config" - -This will output the name of the config map where the `client_id` needs to go. - -.. prompt:: bash - - kubectl edit configmap -n flyte - -Follow the inline comments to make the necessary changes: - -.. code-block:: yaml - - server: - ... - security: - secure: false - # 1. Enable Auth by turning useAuth to true - useAuth: true - ... - auth: - userAuth: - openId: - # 2. Put the URL of the OpenID Connect provider. - # baseUrl: https://accounts.google.com # Uncomment for Google - baseUrl: https://dev-14186422.okta.com/oauth2/default # Okta with a custom Authorization Server - scopes: - - profile - - openid - # - offline_access # Uncomment if OIdC supports issuing refresh tokens. - # 3. 
Replace with the client ID created for Flyte. - clientId: 0oakkheteNjCMERst5d6 - -Save and exit your editor. - -3. Restart `flyteadmin` for the changes to take effect: - -.. prompt:: bash - - kubectl rollout restart deployment/flyteadmin -n flyte - -OAuth2 Authorization Server -=========================== - -An OAuth2 Authorization Server allows external clients to request to authenticate and act on behalf of users (or as their own identities). Having -an OAuth2 Authorization Server enables Flyte administrators control over which apps can be installed and what scopes they are allowed to request or be granted (i.e. what privileges can they assume). - -Flyte comes with a built-in authorization server that can be statically configured with a set of clients to request and act on behalf of the user. -The default clients are defined `here `__ -and the corresponding section can be modified through configs. - -To set up an external OAuth2 Authorization Server, please follow the instructions below: - -Okta IdP --------- - -1. Under security -> API, click `Add Authorization Server`. Set the audience to the public URL of flyte admin (e.g. https://flyte.mycompany.io/). -2. Under `Access Policies`, click `Add New Access Policy` and walk through the wizard to allow access to the authorization server. -3. Under `Scopes`, click `Add Scope`. Set the name to `all` (required) and check `Require user consent for this scope` (recommended). -4. Create 2 apps (for fltyectl and flytepropeller) to enable these clients to communicate with the service. - Flytectl should be created as a `native client`. - FlytePropeller should be created as an `OAuth Service` and note the client ID and client Secrets provided. - -KeyCloak IdP ------------- - -`KeyCloak `__ is an open source solution for authentication, it supports both OpenID Connect and OAuth2 protocols (among others). -KeyCloak can be configured to be both the OpenID Connect and OAuth2 Authorization Server provider for flyte. - -Apply Configurations --------------------- - -1. It is possible to direct Flyte admin to use an external authorization server. To do so, edit the same config map once more and follow these changes: - -.. code-block:: yaml - - auth: - appAuth: - # 1. Choose External if you will use an external Authorization Server (e.g. a Custom Authorization server in Okta) - # Choose Self (or omit the value) to use Flyte Admin's internal (albeit limited) Authorization Server. - authServerType: External - - # 2. Optional: Set external auth server baseUrl if different from OpenId baseUrl. - externalAuthServer: - baseUrl: https://dev-14186422.okta.com/oauth2/auskngnn7uBViQq6b5d6 - thirdPartyConfig: - flyteClient: - # 3. Replace with a new Native Client ID provisioned in the custom authorization server - clientId: flytectl - - redirectUri: https://localhost:53593/callback - - # 4. "all" is a required scope and must be configured in the custom authorization server - scopes: - - offline - - all - userAuth: - openId: - baseUrl: https://dev-14186422.okta.com/oauth2/auskngnn7uBViQq6b5d6 # Okta with a custom Authorization Server - scopes: - - profile - - openid - # - offline_access # Uncomment if OIdC supports issuing refresh tokens. - clientId: 0oakkheteNjCMERst5d6 - -1. Store flyte propeller's `client_secret` in a k8s secret as follows: - -.. prompt:: bash - - kubectl edit secret -n flyte flyte-propeller-auth - -Add a new key under `stringData`: - -.. code-block:: yaml - - stringData: - client_secret: from the previous step - data: - ... 
- -Save and close your editor. - -2. Edit FlytePropeller config to add `client_id` and configure auth as follows: - -.. prompt:: bash - - kubectl get deploy -n flyte flytepropeller -o yaml | grep "name: flyte-propeller-config" - -This will output the name of the config map where the `client_id` needs to go. - -.. prompt:: bash - - kubectl edit configmap -n flyte - -Follow the inline comments to make the necessary changes: - -.. code-block:: yaml - - admin: - # 1. Replace with the client_id provided by the OAuth2 Authorization Server above. - clientId: flytepropeller - -Close the editor - -3. Restart `flytepropeller` for the changes to take effect: - -.. prompt:: bash - - kubectl rollout restart deployment/flytepropeller -n flyte - -*************************** -Continuous Integration - CI -*************************** - -If your organization does any automated registration, then you'll need to authenticate with the `basic authentication `_ flow (username and password effectively). After retrieving an access token from the IDP, you can send it along to Flyte Admin as usual. - -Flytekit configuration variables are automatically designed to look up values from relevant environment variables. However, to aid with continuous integration use-cases, Flytekit configuration can also reference other environment variables. - -For instance, if your CI system is not capable of setting custom environment variables like ``FLYTE_CREDENTIALS_CLIENT_SECRET`` but does set the necessary settings under a different variable, you may use ``export FLYTE_CREDENTIALS_CLIENT_SECRET_FROM_ENV_VAR=OTHER_ENV_VARIABLE`` to redirect the lookup. A ``FLYTE_CREDENTIALS_CLIENT_SECRET_FROM_FILE`` redirect is available as well, where the value should be the full path to the file containing the value for the configuration setting, in this case, the client secret. We found this redirect behavior necessary when setting up registration within our own CI pipelines. - -The following is a listing of the Flytekit configuration values we set in CI, along with a brief explanation. - -* ``FLYTE_CREDENTIALS_CLIENT_ID`` and ``FLYTE_CREDENTIALS_CLIENT_SECRET`` - When using basic authentication, this is the username and password. -* ``export FLYTE_CREDENTIALS_AUTH_MODE=basic`` - This tells the SDK to use basic authentication. If not set, Flytekit will assume you want to use the standard OAuth based three-legged flow. -* ``export FLYTE_CREDENTIALS_AUTHORIZATION_METADATA_KEY=text`` - At Lyft, the value is set to conform to this `header config `_ on the Admin side. -* ``export FLYTE_CREDENTIALS_SCOPE=text`` - When using basic authentication, you'll need to specify a scope to the IDP (instead of ``openid``, which is only for OAuth). Set that here. -* ``export FLYTE_PLATFORM_AUTH=True`` - Set this to force Flytekit to use authentication, even if not required by Admin. This is useful as you're rolling out the requirement. diff --git a/rsts/howto/enable_and_use_memoization.rst b/rsts/howto/enable_and_use_memoization.rst deleted file mode 100644 index 5a08be3915..0000000000 --- a/rsts/howto/enable_and_use_memoization.rst +++ /dev/null @@ -1,37 +0,0 @@ -.. _howto-enable-use-memoization: - -######################################### -How do I enable and use memoization? -######################################### - -Flyte provides the ability to cache the output of task executions in order to make subsequent executions faster. A well-behaved Flyte Task should generate deterministic output given the same inputs and task functionality. 
This is useful in situations where a user knows that many executions with the exact same inputs can occur. For example, your task may be periodically run on a schedule, run multiple times when debugging workflows, or commonly shared across different workflows but receive the same inputs. - -Enable Caching For a Task - SDK? ------------------------------------ - -In order to enable your task to be cached, mark ``cache=True`` below: - -.. code-block:: python - - @task(cache=True, cache_version='1.0.0') - def hash_string_task(original: str) -> str: - ... - -A task execution is cached based on the **Project, Domain, cache_version, the task signature and inputs** associated with the execution of the task. - -- *Project:* A task run under one project cannot use the cached task execution from another project. This could cause inadvertent results between project teams that could cause data corruption. -- *Domain:* For separation of test, staging, and production data, task executions are not shared across these environments. -- *cache_version:* When task functionality changes, you can change the cache_version of the task. Flyte will know not to use older cached task executions and create a new cache entry on the next execution. -- *Task signature:*: The cache is specific to the task signature that is associated with the execution. The signature is made up of task name, input parameter names/types and also the output parameter name/types. -- *Task input values*: A well-formed Flyte Task always produces deterministic outputs. This means given a set of input values, every execution should produce identical outputs. When a task execution is cached, the input values are part of the cache key. - -Notice that task executions can be cached across different versions of the task. This is because a change in SHA does not neccessarily mean that it correlates to a change in task functionality. - -Flyte provides several ways to break the old task execution cache, and cache new output: - -- ``cache_version``: this field indicates that the task functionality has changed. Flyte users can manually update this version and Flyte will cache the next execution instead of relying on the old cache. -- Task signature: If a Flyte user changes the task interface in any way (such as by adding, removing, or editing inputs/outputs), Flyte will treat that as a task functionality change. On the next execution, Flyte will run the task and store the outputs as new cached values. - - -Enable Caching in your FlytePlatform --------------------------------------- \ No newline at end of file diff --git a/rsts/howto/enable_and_use_schedules.rst b/rsts/howto/enable_and_use_schedules.rst deleted file mode 100644 index 9cf9dd91bd..0000000000 --- a/rsts/howto/enable_and_use_schedules.rst +++ /dev/null @@ -1,186 +0,0 @@ -.. _howto_scheduling: - -################################################# -How do I use Flyte scheduling? -################################################# - -******* -Usage -******* - -Launch plans can be set to run automatically on a schedule if the Flyte platform is properly configured. -You can even use the scheduled kick-off time in your workflow as an input. - -There are two types of schedules, cron schedules, and fixed rate intervals. - -Cron Schedules -============== - -Cron expression strings use the `AWS syntax `_. -These are validated at launch plan registration time. - -.. code-block:: - - from flytekit import CronSchedule - - schedule = CronSchedule( - cron_expression="0 10 * * ? 
*", - ) - - -This ``schedule`` object can then be used in the construction of a :py:class:`flytekit:flytekit.LaunchPlan`. - -Complete cron example ---------------------- - -For example, take the following workflow: - -.. code:: python - - from flytekit workflow - - @workflow - def MyWorkflow(an_input: int, another_input: int=10): - .... - -The above can be run on a cron schedule every 5 minutes like so: - -.. code:: python - - from flytekit import CronSchedule, LaunchPlan - - cron_lp = LaunchPlan.create( - "my_cron_lp", - MyWorkflow, - schedule=CronSchedule(cron_expression="0 5 * * ? *"), - fixed_inputs={"an_input": 5}, - ) - - -Fixed Rate Intervals -==================== - -Fixed rate schedules will run at the specified interval. - -.. code-block:: - - from flytekit import FixedRate - from datetime import timedelta - - schedule = FixedRate(duration=timedelta(minutes=10)) - - -Complete fixed rate example ---------------------------- - -.. code:: python - - from flytekit workflow - - @workflow - def MyOtherWorkflow(triggered_time: datetime, an_input: int, another_input: int=10): - .... - - -To run ``MyOtherWorkflow`` every 5 minutes with a value set for ``an_input`` and the scheduled execution time -assigned to the ``triggered_time`` input you could define the following launch plan: - -.. code:: python - - from datetime import timedelta - from flytekit import FixedRate, LaunchPlan - - fixed_rate_lp = LaunchPlan.create( - "my_fixed_rate_lp", - MyOtherWorkflow, - # Note that kickoff_time_input_arg matches the workflow input we defined above: triggered_time - schedule=FixedRate(duration=timedelta(minutes=5), kickoff_time_input_arg="triggered_time"), - fixed_inputs={"an_input": 3}, - ) - -Please see a more complete example in the :std:ref:`User Guide `. - -Activating a schedule -===================== - -Once you've initialized your launch plan, don't forget to set it to active so that the schedule is run. - -You can use pyflyte in container :: - - pyflyte lp -p {{ your project }} -d {{ your domain }} activate-all - -Or with flyte-cli view and activate launch plans :: - - flyte-cli -i -h localhost:30081 -p flyteexamples -d development list-launch-plan-versions - -Extract the URN returned for the launch plan you're interested in and make the call to activate it :: - - flyte-cli update-launch-plan -i -h localhost:30081 --state active -u {{ urn }} - -Verify your active launch plans:: - - flyte-cli -i -h localhost:30081 -p flyteexamples -d development list-active-launch-plans - -****************************** -Platform Configuration Changes -****************************** - -Scheduling features requires additional infrastructure to run so these will have to be created and configured. - -Setting up scheduled workflows -============================== - -In order to run workflow executions based on user-specified schedules you'll need to fill out the top-level ``scheduler`` portion of the flyteadmin application configuration. - -In particular you'll need to configure the two components responsible for scheduling workflows and processing schedule event triggers. - -Note this functionality is currently only supported for AWS installs. - -Event Scheduler ---------------- - -In order to schedule workflow executions, you'll need to set up an `AWS SQS `_ queue. A standard type queue should suffice. The flyteadmin event scheduler creates `AWS CloudWatch `_ event rules that invokes your SQS queue as a target. 
- -With that in mind, let's take a look at an example ``eventScheduler`` config section and dive into what each value represents: :: - - scheduler: - eventScheduler: - scheme: "aws" - region: "us-east-1" - scheduleRole: "arn:aws:iam::{{ YOUR ACCOUNT ID }}:role/{{ ROLE }}" - targetName: "arn:aws:sqs:us-east-1:{{ YOUR ACCOUNT ID }}:{{ YOUR QUEUE NAME }}" - scheduleNamePrefix: "flyte" - -* **scheme**: in this case because AWS is the only cloud back-end supported for scheduling workflows, only ``"aws"`` is a valid value. By default, the no-op scheduler is used. -* **region**: this specifies which region initialized AWS clients should will use when creating CloudWatch rules -* **scheduleRole** This is the IAM role ARN with permissions set to ``Allow`` - * ``events:PutRule`` - * ``events:PutTargets`` - * ``events:DeleteRule`` - * ``events:RemoveTargets`` -* **targetName** this is the ARN for the SQS Queue you've allocated to scheduling workflows -* **scheduleNamePrefix** this is an entirely optional prefix used when creating schedule rules. Because of AWS naming length restrictions, scheduled rules are a random hash and having a shared prefix makes these names more readable and indicates who generated the rules - -Workflow Executor ------------------ -Scheduled events which trigger need to be handled by the workflow executor, which subscribes to triggered events from the SQS queue you've configured above. - -.. NOTE:: - - Failure to configure a workflow executor will result in all your scheduled events piling up silently without ever kicking off workflow executions. - -Again, let's break down a sample config: :: - - scheduler: - eventScheduler: - ... - workflowExecutor: - scheme: "aws" - region: "us-east-1" - scheduleQueueName: "{{ YOUR QUEUE NAME }}" - accountId: "{{ YOUR ACCOUNT ID }}" - -* **scheme**: in this case because AWS is the only cloud back-end supported for executing scheduled workflows, only ``"aws"`` is a valid value. By default, the no-op executor is used. -* **region**: this specifies which region AWS clients should will use when creating an SQS subscriber client -* **scheduleQueueName**: this is the name of the SQS Queue you've allocated to scheduling workflows -* **accountId**: Your AWS `account id `_ diff --git a/rsts/howto/enable_backend_plugin.rst b/rsts/howto/enable_backend_plugin.rst deleted file mode 100644 index b5f1d054de..0000000000 --- a/rsts/howto/enable_backend_plugin.rst +++ /dev/null @@ -1,34 +0,0 @@ -.. _howto-enable-backend-plugins: - -################################# -How do I enable backend plugins? -################################# - -.. tip:: Flyte Backend plugins are awesome, but are not required to extend Flyte! You can always write a flytekit-only plugins. Refer to :ref:`plugins_extend_intro`. - -Flyte has a unique capability of adding backend plugins. Backend plugins enable Flyte platform to add new capabilities. This has several advantages, - -#. Advanced introspection capabilities - ways to improve logging etc -#. Service oriented architecture - ability to bugfix, deploy plugins without releasing new libraries and forcing all users to update their libraries -#. Better management of the system communication - For example in case of aborts, Flyte can guarantee cleanup of the remote resources -#. Reduced cost overhead, for many plugins which launch jobs on a remote service or cluster, the plugins are essentially just polling. This has a huge compute cost in traditional architectures like Airflow etc. 
Flyte on the other hand, can run these operations in its own control plane. -#. Potential to create drastically new interfaces, that work across multiple languages and platforms. - -Ok, How do I enable the backend plugins? -========================================= - -To enable a backend plugin you have to add the ``ID`` of the plugin to the enabled plugins list. The ``enabled-plugins`` is available under the ``tasks > task-plugins`` section of FlytePropeller's configuration. -The `plugin configuration structure is defined here `_. An example of the config follows, - -.. rli:: https://raw.githubusercontent.com/flyteorg/flyte/master/kustomize/overlays/sandbox/flyte/config/propeller/enabled_plugins.yaml - :language: yaml - -How do I find the ``ID`` of the backend plugin? -=============================================== -This is a little tricky and sadly at the moment you have to look at the source code of the plugin to figure out the ``ID``. In the case of Spark, for example, the value of ``ID`` is `used `_ here, defined as `spark `_. - -Enable a specific Backend Plugin in your own Kustomize generator -================================================================= -Flyte uses Kustomize to generate the the deployment configuration and it can be leveraged to `kustomize your own deployment `_. - -We will soon be supporting helm or a better deployment model - See issue :issue:`299`. diff --git a/rsts/howto/execute_single_task.rst b/rsts/howto/execute_single_task.rst deleted file mode 100644 index 9384c7d9f3..0000000000 --- a/rsts/howto/execute_single_task.rst +++ /dev/null @@ -1,91 +0,0 @@ -.. _howto_exec_single_task: - -###################### -Running a Single Task -###################### - - -What Are Single Task Executions? -================================ - -Tasks are the most atomic unit of execution in Flyte. Although workflows are traditionally composed of multiple tasks with dependencies -defined by shared inputs and outputs, it can be helpful to execute a single task during the process of iterating on its definition. -It can be tedious to write a new workflow definition every time you want to excecute a single task under development, but single task -executions can be used to easily iterate on task logic. - -Launching a Single Task -======================= - -After building an image with your updated task code, create an execution using launch: - -.. code-block:: python - - @inputs(plant=Types.String) - @outputs(out=Types.String) - @python_task - def my_task(wf_params, plant, out) - ... - - - my_single_task_execution = my_task.launch(project="my_flyte_project", domain="development", inputs={'plant': 'ficus'}) - print("Created {}".format(my_single_task_execution.id)) - -Just like workflow executions, you can optionally pass a user-defined name, labels, annotations, and/or notifications when launching a single task. - -The type of ``my_single_task_execution`` is `SdkWorkflowExecution `_ -and has the full set of methods and functionality available for conventional WorkflowExecutions. - - -Fetching and Launching a Single Task -==================================== - -Single task executions aren't limited to just tasks you've defined in your code. You can reference previously registered tasks and launch a single task execution like so: - -.. 
code-block:: python - - from flytekit.common.tasks import task as _task - - my_task = _task.SdkTask.fetch("my_flyte_project", "production", "workflows.my_task", "abc123") # project, domain, name, version - - my_task_exec = my_task.launch(project="my_other_project", domain="development", inputs={'plant': 'philodendron'}) - my_task_exec.wait_for_completion() - - -Launching a Single Task From the Commandline -============================================ - -Previously registered tasks can also be launched from the command-line using :ref:`flyte-cli ` - -.. code-block:: console - - $ flyte-cli -h example.com -p my_flyte_project -d development launch-task \ - -u tsk:my_flyte_project:production:my_complicated_task:abc123 -- an_input=hi \ - other_input=123 more_input=qwerty - - -Monitoring Single Task Executions in the Flyte Console -====================================================== - -Single task executions don't have native support in the Flyte console yet, but they are accessible using the same URLs as ordinary workflow executions. - -For a console hosted example.com, visit ``example.com/console/projects//domains//executions/`` to track the progress of your execution. Log links and status changes will be available as your execution progresses. - - -Registering and Launching a Single Task -======================================= - -A certain category of tasks don't rely on custom containers with registered images to run. Therefore, you may find it convenient to use -``register_and_launch`` on a task definition to immediately launch a single task execution, like so: - -.. code-block:: python - - containerless_task = SdkPrestoTask( - task_inputs=inputs(ds=Types.String, count=Types.Integer, rg=Types.String), - statement="SELECT * FROM flyte.widgets WHERE ds = '{{ .Inputs.ds}}' LIMIT {{ .Inputs.count}}", - output_schema=Types.Schema([("a", Types.String), ("b", Types.Integer)]), - routing_group="{{ .Inputs.rg }}", - ) - - my_single_task_execution = containerless_task.register_and_launch(project="my_flyte_project", domain="development", - inputs={'ds': '2020-02-29', 'count': 10, 'rg': 'my_routing_group'}) - diff --git a/rsts/howto/execute_workflow.rst b/rsts/howto/execute_workflow.rst deleted file mode 100644 index 3e4234831a..0000000000 --- a/rsts/howto/execute_workflow.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. _howto_exec_workflow: - -#################################### -How do I execute a workflow? -#################################### - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/fast_registration.rst b/rsts/howto/fast_registration.rst deleted file mode 100644 index 60b124dba4..0000000000 --- a/rsts/howto/fast_registration.rst +++ /dev/null @@ -1,97 +0,0 @@ -.. _fast_registration: - -******************************** -How do I use Fast Registration? -******************************** - -.. NOTE:: Experimental feature (beta) - -Are you frustrated by having to wait for an image build in order to test out simple code changes to your Flyte workflows? If you're interested in reducing to your iteration cycle to mere seconds, read on below. - -Caveats -======= - -Fast-registration only works when you're testing out code changes. If you need to update your container, say by installing a dependency or modifying a Dockerfile, you **must** use the conventional method of committing your changes and rebuilding a container image. 
- -Prerequisites -============= - -* Upgrade your flytekit dependency to ``>=0.16.0`` and re-run piptools if necessary - - -You'll need to build a base image with these changes incorporated before you can use fast registration. - - -Fast-registering -================ - -How-to: - -* After the above prerequisite changes are merged, pull the latest master and create a development branch on your local machine. -* Make some code changes. Save your files. -* Clear and/or create the directory used to store your serialized code archive: - -.. code-block:: text - - mkdir _pb_output || true - rm -f _pb_output/*.tar.gz - -* Using a python environment with flytekit installed fast-register your changes: - -.. code-block:: python - - pyflyte -c sandbox.config --pkgs recipes serialize --in-container-config-path /root/sandbox.config \ - --local-source-root --image fast workflows -f _pb_output/ - - -Or, from within your container: - - .. code-block:: text - - pyflyte --config /root/sandbox.config serialize fast workflows -f _pb_output/ - -* Assume a role that has write access to the intermittent directory you'll use to store fast registration code distributions . -* Fast-register your serialized files. You'll note the overlap with the existing register command (auth role and output location) - but with an new flag pointing to an additional distribution dir. This must be writable from the role you assume and readable from - the role your flytepropeller assumes: - - .. code-block:: text - - flyte-cli fast-register-files -p flytetester -d development --kubernetes-service-account ${FLYTE_AUTH_KUBERNETES_SERVICE_ACCOUNT} \ - --output-location-prefix ${FLYTE_AUTH_RAW_OUTPUT_DATA_PREFIX} -h ${FLYTE_PLATFORM_URL} \ - --additional-distribution-dir ${FLYTE_SDK_FAST_REGISTRATION_DIR} _pb_output/* - - -* Open the Flyte UI and launch the latest version of your workflow (under the domain you fast-registered above). It should run with your new code! - - -.. deprecated:: The following section is deprecated - -Older flytekit instructions (flytekit <0.16.0) -============================================== - -Flytekit releases prior to the introduction of native typing have a slightly modified workflow for fast-registering. - -Pre-Reqs -############# - -* Upgrade your flytekit dependency to ``>=0.15.0``. - -* Update your development flyte config and add a new required parameter to the sdk block specifying an intermittent directory for code distributions. Whichever role you use in the [auth] block must have read access to this directory. For example:: - - [sdk] - fast_registration_dir=s3://my-s3-bucket/distributions/ - -You'll need to build a base image with these changes incorporated before you can use fast registration. - -how? -#### - -How-to: - -#. After the above prerequisite changes are merged, pull the latest master and create a development branch on your local machine. -#. Make some code changes. Save your files. -#. Assume a role that has write access to the intermittent directory you'll use to store fast registration code distributions (specified in your flytekit config above). -#. Using a python environment with flytekit installed fast-register your changes: ``# flytekit_venv pyflyte -p myproject -d development -v -c /code/myproject/development.config fast-register workflows --source-dir /code/myproject/`` -#. Open the Flyte UI and launch the latest version of your workflow (under the domain you fast-registered above). It should run with your new code! 
- diff --git a/rsts/howto/flytecli.rst b/rsts/howto/flytecli.rst deleted file mode 100644 index 043fc13f5e..0000000000 --- a/rsts/howto/flytecli.rst +++ /dev/null @@ -1,169 +0,0 @@ -.. _howto-flytecli: - -######################## -How do I use Flyte CLI? -######################## - -.. note:: - - We are working hard on replacing flyte-cli, with a more robust, better designed and cross platform CLI. - Refer to :std:ref:`flytectl`. - -*************************************************** -A command-line interface for interacting with Flyte -*************************************************** - -The FlyteCLI is a command-line tool that allows users to perform administrative -tasks on their Flyte workflows and executions. It is an independent module but -installed as part of the `Flyte Kit `. It primarily -iteracts with the `FlyteAdmin ` service over its gRPC -interface, allowing users to list registered workflows, or get a currently -running execution. - ------- - -Installation -============ - -The easist way to install FlyteCLI is using virtual environments. -Follow the official doc_ to install the ``virtualenv`` package if -you don't already have it in your development environment. - -Install from source -------------------- -Now that you have virtualenv, you can either install flyte-cli from source. -To do this first clone the git repository and -after setting up and activating your virtual environment, change directory to -the root directory of the flytecli package, and install the dependencies with -``pip install -e .``. - - -.. _doc: https://virtualenv.pypa.io/en/latest/installation.html - -Install from pypi [recommended] -------------------------------- -Another option is to just install flyte-cli from prebuilt binaries - -Testing if you have a working installation ------------------------------------------- - -To test whether you have a successful installation of flytecli, run -``flyte-cli`` or ``flyte-cli --help``. - -If you see the following output, you have installed the FlyteCLI successfully. - -.. code-block:: console - - Usage: flyte-cli [OPTIONS] COMMAND [ARGS]... - - Command line tool for interacting with all entities on the Flyte Platform. - - Options: - -n, --name TEXT [Optional] The name to pass to the sub-command (if - ... - - Commands: - execute-launch-plan Kick off a launch plan. - ... - - ------- - -Terminology -=========== - -This section introduces and explains the most commonly used terms and concepts -the users will see in FlyteCLI. - -Host ----- -``Host`` refers to your running Flyte instance and is a common -argument for the commands in FlyteCLI. The FlyteCLI will only be interacting -with the Flyte instance at the URL you specify with the ``host`` argument. -parameter. This is a required argument for most of the FlyteCLI commands. - -Project -------- -``Project`` is a multi-tenancy primitive in Flyte and allows logical grouping -of instances of Flyte entities by users. Within Lyft's context, this term -usually refers to the name of the Github repository in which your workflow -code resides. - -For more information see :ref:`Projects ` - -Domain ------- -The term ``domain`` refers to development environment (or the service instance) -of your workflow/execution/launch plan/etc. You can specify it with the -``domain`` argument. Values can be either ``development``, ``staging``, or -``production``. 
See :ref:`Domains ` - - -Name ----- -The ``name`` of a named entity is a randomly generated hash assigned -automatically by the system at the creation time of the named entity. For some -commands, this is an optional argument. - - -Named Entity ------------- -``Name Entity`` is a primitive in Flyte that allows logical grouping of -processing entities across versions. The processing entities to which this term -can refer include unversioned ``launch plans``, ``workflows``, -``executions``, and ``tasks``. In other words, an unversioned ``workflow`` named -entity is essentially a group of multiple workflows that -have the same ``Project``, ``Domain``, and ``Name``, but different versions. - - -URN ---- - -.. note:: - - URN is a FlyteCLI-only concept. You won't see this term in other flyte modules. - -URN stands for "unique resource name", and is the identifier of -a version of a given named entity, such as a workflow, a launch plan, -an execution, or a task. Each URN uniquely identifies a named entity. -URNs are often used in FlyteCLI to interact with specific named entities. - -The URN of a version of a name entity is composible from the entity's -attributes. For example, the URN of a workflow can be composed of a prefix -`wf` and the workflow's ``project``, ``domain``, ``name``, and ``version``, -in the form of ``wf::::``. - -Note that execution is the sole exception here as an execution does not -have versions. The URN of an execution, therefore, is in the form of -``ex:::``. - ------- - -Flyte CLI User Configuration -============================== -The ``flyte-cli`` command line utility also supports default user-level configuration settings if the Admin service it accesses supports authentication. To get started either create or activate a Python 3 virtual environment :: - - $ python3 -m venv ~/envs/flyte - $ source ~/envs/flyte/bin/activate - -In general, we recommend installing and using Flyte CLI inside a virtualenv. Install ``flytekit`` (which installs ``flyte-cli``) as follows :: - - $ pip install wheel flytekit - -Use the setup-config command to create yourself a default config file. This will pull the necessary settings from Flyte's oauth metadata endpoint. :: - - (flyte) username:~ $ flyte-cli setup-config -h flyte.company.net - ------- - -Commands -======== - -For information on available commands in FlyteCLI, refer to FlyteCLI's help message. - -Subcommand Help ---------------- - -FlyteCLI uses subcommands. Whenever you feel unsure about the usage or -the arguments of a command or a subcommand, get help by running -``flyte-cli --help`` or ``flyte-cli --help`` diff --git a/rsts/howto/gcp.rst b/rsts/howto/gcp.rst deleted file mode 100644 index a8540fefa0..0000000000 --- a/rsts/howto/gcp.rst +++ /dev/null @@ -1,26 +0,0 @@ -.. _faq_gcp: - -############################################## -How do I Use Flyte with Google Cloud Platform? -############################################## - -I tried to run examples, but task fails with 401 error? -------------------------------------------------------- - Steps: - - #. Are you using Workload Identity, then you have to pass in the ServiceAccount when you create the launchplan. - - Refer to docs :ref:`howto-serviceaccounts` - - More information about WorkloadIdentity at https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity - #. If you are just using a simple Nodepool wide permissions then check the cluster's ServiceAccount for Storage permissions. Do they look fine? - - #. 
If not, then start a dummy pod in the intended namespace and check for - -:: - - gcloud auth list - - -.. note:: - - FlytePropeller uses Google Application credentials, but gsutil does not use these credentials - diff --git a/rsts/howto/index.rst b/rsts/howto/index.rst deleted file mode 100644 index 1c0bbe5e1c..0000000000 --- a/rsts/howto/index.rst +++ /dev/null @@ -1,38 +0,0 @@ -.. _howto: - -########################### -Frequently Asked Questions -########################### - -.. toctree:: - :maxdepth: 1 - :name: howtoguidestoc - - install_sdk - sandbox - flytecli - new_project - execute_single_task - execute_workflow - enable_and_use_memoization - productionize/index - launchplans - managing_customizable_resources - enable_and_use_schedules - enable_backend_plugin - monitoring/index - performance/index - authentication/index - resource_manager/index - resource_quota - fast_registration - multi_cluster/index - interruptible - labels_annotations - notifications - serviceaccount - gcp - secrets - - -.. _howto_extend: diff --git a/rsts/howto/install_sdk.rst b/rsts/howto/install_sdk.rst deleted file mode 100644 index 0d2cb45a18..0000000000 --- a/rsts/howto/install_sdk.rst +++ /dev/null @@ -1,26 +0,0 @@ -.. _install-flytekit-py: - -################################# -How to install Flytekit Python? -################################# - -Flytekit python is published as a regular python library to `pypi `_ - -Install is using the following command:: - - pip install flytekit - -Installing flytekit plugins ----------------------------- -All Flytekiplugins are also published to pypi as independent libraries and can be installed using pip. Refer to :ref:`plugins`. - - -.. _install-flytekit-java: - -################################# -How to install Flytekit Java? -################################# - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/interruptible.rst b/rsts/howto/interruptible.rst deleted file mode 100644 index 1abcb5f8ea..0000000000 --- a/rsts/howto/interruptible.rst +++ /dev/null @@ -1,54 +0,0 @@ -.. _howto-interruptible: - -########################################################### -How do I use Spot, Pre-emptible Instances? Interruptible? -########################################################### - -What is interruptible? -====================== - -Interruptible allows users to specify that their tasks are ok to be scheduled on machines that may get preempted such as AWS spot instances. -`Spot instances `_ can lead up to 90% savings over on-demand. Anyone looking to realize cost savings should look into interruptible. - -What are Spot Instances? -======================== - -Spot Instances are unused EC2 capacity in AWS. Spot instances are available at up to a 90% discount compared to on-demand prices. The caveat is that at any point these instances can be preempted and no longer be available for use. This can happen due to: - -* Price โ€“ The Spot price is greater than your maximum price. -* Capacity โ€“ If there are not enough unused EC2 instances to meet the demand for Spot Instances, Amazon EC2 interrupts Spot Instances. The order in which the instances are interrupted is determined by Amazon EC2. -* Constraints โ€“ If your request includes a constraint such as a launch group or an Availability Zone group, these Spot Instances are terminated as a group when the constraint can no longer be met. - -As a general rule of thumb, most spot instances are obtained for around 2 hours (median), with the floor being around 20 minutes, and the ceiling being unbounded duration. 
- -Setting Interruptible -===================== - -In order to run your workload on spot, you can set interruptible to True. Example: - -.. code-block:: python - - @task(cache_version='1', interruptible=True) - def add_one_and_print(value_to_print: int) -> int: - return value_to_print + 1 - - -By setting this value, Flyte will schedule your task on an ASG with only spot instances. In the case your task gets preempted, Flyte will retry your task on a non-spot instance. This retry will not count towards a retry that a user sets. - - -What tasks should be set to interruptible? -========================================== - -Most Flyte workloads should be good candidates for spot instances. If your task does not exhibit the following properties, then the recommendation would be to set interruptible to true. - -* Time sensitive. I need this to run now and can not have any unexpected delays. -* Side Effects. My task is not idempotent and retrying will cause issues. -* Long Running Task. My task takes > 2 hours. Having an interruption during this time frame could potentially waste a lot of computation already done. - - -How to recover from interruptions? -=================================== - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/labels_annotations.rst b/rsts/howto/labels_annotations.rst deleted file mode 100644 index 4cb54fdcc9..0000000000 --- a/rsts/howto/labels_annotations.rst +++ /dev/null @@ -1,54 +0,0 @@ -.. _howto_labels_annotations: - -#################################### -How to add Labels and Annotations? -#################################### -In Flyte, workflow executions are created as kubernetes resources. These can be extended with -`labels `_ and -`annotations `_. - -**Labels** and **annotations** are key value pairs which can be used to identify workflows for your own uses. -Labels are meant to be used as identifing attributes whereas annotations are arbitrary, *non-identifying* metadata. - -Using labels and annotations is entirely optional. They can be used to categorize and identify workflow executions. - -Labels and annotations are optional parameters to launch plan and execution invocations. In the case an execution -defines labels and/or annotations *and* the launch plan does as well, the execution spec values will be preferred. - -Launch plan usage example -------------------------- - -.. code:: python - - from flytekit.models.common import Labels, Annotations - - @workflow - class MyWorkflow(object): - ... - - my_launch_plan = MyWorkflow.create_launch_plan( - labels=Labels({"myexecutionlabel": "bar", ...}), - annotations=Annotations({"region": "SEA", ...}), - ... - ) - - my_launch_plan.execute(...) - -Execution example ------------------ - -.. code:: python - - from flytekit.models.common import Labels, Annotations - - @workflow - class MyWorkflow(object): - ... - - my_launch_plan = MyWorkflow.create_launch_plan(...) - - my_launch_plan.execute( - labels=Labels({"myexecutionlabel": "bar", ...}), - annotations=Annotations({"region": "SEA", ...}), - ... - ) diff --git a/rsts/howto/launchplans.rst b/rsts/howto/launchplans.rst deleted file mode 100644 index 511af6f8a1..0000000000 --- a/rsts/howto/launchplans.rst +++ /dev/null @@ -1,72 +0,0 @@ -.. _howto-lanuchplans: - -########################## -How do I add launch plans? -########################## - -When to use launchplans? -======================== - -- I want multiple schedules for my workflow with zero or more predefined inputs. -- I want to run the same workflow but with a different set of notifications. 
-- I want to share my workflow with another user with the inputs already set so that the other user can simply kick off an execution. -- I want to share my workflow to another user but also make sure that some inputs can be overridden if needed. -- I want to share my workflow with another user but make sure that some inputs are never changed. - -For preliminary examples on using launch plans in code, check out the canonical :std:ref:`User Guide ` examples. - -Partial Inputs for Launchplans -============================== -Launch plans bind a partial or complete list of inputs necessary to launch a workflow. Launch plan inputs must only assign inputs already defined in the reference workflow definition. Refer to to :ref:`primary launch plan concept documentation ` for a detailed introduction to launch plan input types. - -For example, let's say you had the following workflow definition: - -.. code:: python - - from datetime import datetime - from flytekit import workflow - - @workflow - def MyWorkflow(region: str, run_date: datetime, sample_size: int=100): - ... - -If you wanted to run the workflow over a set of date ranges for a specific region, you could define the following launch plan: - -.. code:: python - - from flytekit import LaunchPlan - - sea_launch_plan = LaunchPlan.create( - "sea_lp", - MyWorkflow, - default_inputs={'sample_size': 1000}, - fixed_inputs={'region': 'SEA'}, - ) - -Some things to note here, we redefine the ``sample_size`` input in the launch plan - this is perfectly fine. -Workflow inputs with default values can be redefined in launch plans. If we decide at execution creation time to adjust -``sample_size`` once more that's also perfectly fine because default_inputs in a launch plan can also be overriden. -However the region input is *fixed* and cannot be overriden. - -The launch plan doesn't assign run_date but there is nothing wrong with creating a launch plan that assigns -all workflow inputs (either as default or fixed inputs). The only requirement is that all required inputs (that is, those -without default values) must be resolved at execution time. The call to create an execution can still accept inputs -should your launch plan not define the complete set. - -Backfills with Launchplans -========================== - -Let's take a look at how to use the launch plan to create executions programmatically, for example to backfill: - -.. code:: python - - from datetime import timedelta, date - - run_date = date(2020, 1, 1) - end_date = date(2021, 1, 1) - one_day = timedelta(days=1) - while run_date < end_date: - sea_launch_plan(run_date=run_date) - run_date += one_day - -And boom, you've got a year's worth of executions created! diff --git a/rsts/howto/managing_customizable_resources.rst b/rsts/howto/managing_customizable_resources.rst deleted file mode 100644 index 1e6e13c380..0000000000 --- a/rsts/howto/managing_customizable_resources.rst +++ /dev/null @@ -1,185 +0,0 @@ -.. _howto-managing-customizable-resources: - -######################################################################################## -How do I configure my Flyte deployment to have specialized behavior per project/domain? -######################################################################################## - -As the complexity of your user base grows, you may find yourself tweaking resource assignments based on specific projects, domains and workflows. This document walks through how and in what ways you can configure your Flyte deployment. 
- - -*************************** -Configurable Resource Types -*************************** - -Flyte allows these custom settings along the following combination of dimensions - -- domain -- project and domain -- project, domain, and name (must be either a workflow name or a launch plan name) - -Please see the :ref:`divedeep-projects` document for more information on projects and domains. Along these dimensions, the following settings are configurable. Note that not all three of the combinations above are valid for each of these settings. - -- Defaults for task resource requests and limits (when not specified by the author of the task). -- Settings for project-namespaced cluster resource configuration that feeds into Admin's cluster resource manager. -- Execution queues that are used for Dynamic Tasks. Read more about execution queues here, but effectively they're meant to be used with constructs like AWS Batch. -- Determining how workflow executions get assigned to clusters in a multi-cluster Flyte deployment. - -The proto definition is the definitive source of which -`matchable attributes ` -can be customized. - -Each of the four above settings are discussed below. Eventually all of these customizations will be overridable using -:std:ref:`flytectl`. Until then, flyte-cli command line options can be used to modify frequent use-cases, and barring -that we show examples using curl. - - -Task Resources -============== - -This includes setting default value for task resource requests and limits for the following resources: - -- cpu -- gpu -- memory -- storage - -In the absence of an override the global -`default values `__ -in the flyteadmin config are used. - -The override values from the database are assigned at execution time. - -To update individual project-domain attributes, use the following as an example: - -.. prompt:: bash - - curl --request PUT 'https://flyte.company.net/api/v1/project_domain_attributes/projectname/staging' \ - --header 'Content-Type: application/json' --data-raw \ - '{"attributes":{"matchingAttributes":{"taskResourceAttributes":{"defaults":{"cpu": "1000", "memory": "5000Gi"}, "limits": {"cpu": "4000"}}}}' - - - -Cluster Resources -================= - -These are free-form key-value pairs which are used when filling in the templates that Admin feeds into its cluster manager. The keys represent templatized variables in `clusterresource template yaml `__ and the values are what you want to see filled in. - -In the absence of custom override values, templateData from the `flyteadmin config `__ is used as a default. - -Note that these settings can only take on domain, or a project and domain specificity. Since Flyte has not tied in the notion of a workflow or a launch plan to any Kubernetes constructs, specifying a workflow or launch plan name doesn't make any sense. - -Running the following, will make it so that when Admin fills in cluster resource templates, the K8s namespace ``flyteexamples-development`` will have a resource quota of 1000 CPU cores and 5TB of memory. - -.. prompt:: bash - - flyte-cli -h localhost:30081 -p flyteexamples -d development update-cluster-resource-attributes \ - --attributes projectQuotaCpu 1000 --attributes projectQuotaMemory 5000Gi - - -Similarly this can be done through `flytectl update cluster-resource-attribute `__ - - -These values will in turn be used to fill in the template fields, for example: - -.. 
rli:: https://raw.githubusercontent.com/flyteorg/flyte/master/kustomize/base/single_cluster/headless/config/clusterresource-templates/ab_project-resource-quota.yaml - -from the base of this repository for the ``flyteexamples-development`` namespace and that namespace only. -For other namespaces, the `platform defaults `__ would still be applied. - -.. note:: - - The template values, e.g. ``projectQuotaCpu`` or ``projectQuotaMemory``, are freeform strings. You must ensure that - they match the template placeholders in your `template file `__ - for your changes to take effect. - -Execution Queues -================ - -Execution queues are used to determine where tasks yielded by a dynamic :py:func:`flytekit:flytekit.maptask` run. - -Execution queues themselves are currently defined in the -`flyteadmin config `__. - -The **attributes** associated with an execution queue must match the **tags** for workflow executions. The tags are associated with configurable resources -stored in the Admin database. - -.. prompt:: bash - - flyte-cli -h localhost:30081 -p flyteexamples -d development update-execution-queue-attributes \ - --tags critical --tags gpu_intensive - -You can view existing attributes for which tags can be assigned by visiting `http://localhost:30081/api/v1/matchable_attributes?resource_type=3 `__. - -Execution Cluster Label -======================= - -This allows forcing a matching execution to always execute on a specific kubernetes cluster. - -You can set this using flyte-cli: - -.. prompt:: bash - - flyte-cli -h localhost:30081 -p flyteexamples -d development update-execution-cluster-label --value mycluster - - -********* -Hierarchy -********* - -Increasing specificity defines how matchable resource attributes get applied. The available configurations, in order of decreasing specificity, are: - - -#. Domain, project, workflow name and launch plan - -#. Domain, project and workflow name - -#. Domain and project - -#. Domain - -Default values for all and per-domain attributes may be specified in the flyteadmin config as documented above. - - -Example -======= - -Let's say that our database includes the following: - -+------------+--------------+----------+-------------+-----------+ -| Domain | Project | Workflow | Launch Plan | Tags | +============+==============+==========+=============+===========+ -| production | widgetmodels | | | critical | -+------------+--------------+----------+-------------+-----------+ -| production | widgetmodels | Demand | | supply | -+------------+--------------+----------+-------------+-----------+ - -Any inbound CreateExecution requests with **[Domain: production, Project: widgetmodels, Workflow: Demand]** for any launch plan would have a tag value of "supply". -Any inbound CreateExecution requests with **[Domain: production, Project: widgetmodels]** for any workflow other than Demand and for any launch plan would have a tag value of "critical". 
- -All other inbound CreateExecution requests would use the default values specified in the flyteadmin config (if any). - -********* -Debugging -********* - -Use the `get `__ endpoint -to see if overrides exist for a specific resource. - -E.g. `https://example.com/api/v1/project_domain_attributes/widgetmodels/production?resource_type=2 `__ - -To get the global state of the world, use the list all endpoint, e.g. `https://example.com/api/v1/matchable_attributes?resource_type=2 `__. - -The resource type enum (int) is defined in the :std:ref:`matchableresource `. diff --git a/rsts/howto/monitoring/index.rst b/rsts/howto/monitoring/index.rst deleted file mode 100644 index d8fa308e18..0000000000 --- a/rsts/howto/monitoring/index.rst +++ /dev/null @@ -1,20 +0,0 @@ -.. _howto-monitoring: - -###################################### -How do I monitor my Flyte deployment? -###################################### - -.. tip:: The flyte core team publishes a maintains Grafana dashboards built using prometheus data source and can be found `here `__. - -Flyte Backend is written in Golang and exposes stats using Prometheus. The Stats themselves are labeled with the Workflow, Task, Project & Domain whereever appropriate. - -The dashboards are divided into primarily 2 types - -- User facing dashboards. These are dashboards that a user can use to triage/investigate/observe performance and characterisitics for their Workflows and tasks. - The User facing dashboard is published under Grafana marketplace ID `13980 `_ - -- System Dashboards. These dashboards are useful for the system maintainer to maintain their Flyte deployments. These are further divided into - - DataPlane/FlytePropeller dashboards published @ `13979 `_ - - ControlPlane/Flyteadmin dashboards published @ `13981 `_ - -These are basic dashboards and do no include all the metrics that are exposed by Flyte. You can contribute to the dashboards and help us improve them - by referring to the build scripts `here `__. diff --git a/rsts/howto/multi_cluster/index.rst b/rsts/howto/multi_cluster/index.rst deleted file mode 100644 index 0bf2ff0105..0000000000 --- a/rsts/howto/multi_cluster/index.rst +++ /dev/null @@ -1,150 +0,0 @@ -.. _howto-multi-cluster: - -########################################## -How do I use multiple Kubernetes clusters? -########################################## - -Scaling Beyond Kubernetes -------------------------- - -As described in the high-level architecture doc, the Flyte Control Plane sends workflows off to the Data Plane for execution. -The Data Plane fulfills these workflows by launching pods in kubernetes. - -At some point, your total compute needs could exceed the limits of a single kubernetes cluster. -To address this, you can deploy the Data Plane to several isolated kubernetes clusters. -The Control Plane (FlyteAdmin) can be configured to load-balance workflows across these isolated Data Planes. -This protects you from a failure in a single kubernetes cluster, and increases scalability. - -First, you'll need to create additional kubernetes clusters. For this example, we'll assume you have 3 kubernetes clusters, and can access them all with ``kubectl``. We'll call these clusters "cluster1", "cluster2", and "cluster3". - -We want to deploy **just** the Data Plane to these clusters. To do this, we'll remove the DataPlane components from the ``flyte`` overlay, and create a new overlay containing **only** the dataplane resources. 
- -Data Plane Deployment -********************* - -NOTE: - With v0.8.0 and the entire setup overhaul, this section is getting revisited. Keep on the lookout for an update soon - -To create the "Data Plane only" overlay, lets make a ``dataplane`` subdirectory inside our main deployment directory (my-flyte-deployment). This directory will contain contain only the dataplane resources. :: - - mkdir dataplane - -Now, lets copy the ``flyte`` config into the dataplane config :: - - cp flyte/kustomization.yaml dataplane/kustomization.yaml - -Since the dataplane resources will live in the new deployment, they are no longer needed in the main ``flyte`` deployment. Remove the Data Plane resources from the flyte deploy by opening ``flyte/kustomization.yaml`` and removing everything in the ``DATA PLANE RESOURCES`` section. - -Likewise, the User Plane / Control Plane resources are not needed in the dataplane deployment. Remove these resources from the dataplane deploy by opening ``dataplane/kustomization.yaml`` and removing everything in the ``USER PLANE / CONTROL PLANE RESOURCES`` section. - -Now Run :: - - kustomize build dataplane > dataplane_generated.yaml - -You will notice that the only the Data Plane resources are included in this file. - -You can point your ``kubectl`` context at each of the 3 clusters and deploy the dataplane with :: - - kubectl apply -f dataplane_generated.yaml - -User and Control Plane Deployment -********************************* - -In order for FlyteAdmin to create "flyteworkflows" on the 3 remote clusters, it will need a secret ``token`` and ``cacert`` to access each cluster. - -Once you have deployed the dataplane as described above, you can retrieve the "token" and "cacert" by pointing your ``kubectl`` context each dataplane cluster and executing the following commands. - -:token: - ``kubectl get secrets -n flyte | grep flyteadmin-token | awk '{print $1}' | xargs kubectl get secret -n flyte -o jsonpath='{.data.token}'`` - -:cacert: - ``kubectl get secrets -n flyte | grep flyteadmin-token | awk '{print $1}' | xargs kubectl get secret -n flyte -o jsonpath='{.data.ca\.crt}'`` - -These credentials will need to be included in the Control Plane. Create a new file ``admindeployment/secrets.yaml`` that looks like this :: - - apiVersion: v1 - kind: Secret - metadata: - name: cluster_credentials - namespace: flyte - type: Opaque - data: - cluster_1_token: {{ cluster 1 token here }} - cluster_1_cacert: {{ cluster 1 cacert here }} - cluster_2_token: {{ cluster 2 token here }} - cluster_2_cacert: {{ cluster 2 cacert here }} - cluster_3_token: {{ cluster 3 token here }} - cluster_3_cacert: {{ cluster 3 cacert here }} - -Include the new ``secrets.yaml`` file in the ``admindeployment`` by opening ``admindeployment/kustomization.yaml`` and add the following line under ``resources:`` to include the secrets in the deploy :: - - - secrets.yaml - -Next, we'll need to attach these secrets to the FlyteAdmin pods so that FlyteAdmin can access them. Open ``admindeployment/deployment.yaml`` and add an entry under ``volumes:`` :: - - volumes: - - name: cluster_credentials - secret: - secretName: cluster_credentials - -Now look for the container labeled ``flyteadmin``. Add a ``volumeMounts`` to that section. :: - - volumeMounts: - - name: cluster_credentials - mountPath: /var/run/credentials - -This mounts the credentials inside the FlyteAdmin pods, however, FlyteAdmin needs to be configured to use these credentials. 
Open the ``admindeployment/configmap.yaml`` file and add a ``clusters`` key to the configmap, with an entry for each cluster :: - - clusters: - - name: "cluster_1" - endpoint: {{ your-cluster-1-kubeapi-endpoint.com }} - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/cluster_1_token" - certPath: "/var/run/credentials/cluster_1_cacert" - - name: "cluster_2" - endpoint: {{ your-cluster-2-kubeapi-endpoint.com }} - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/cluster_2_token" - certPath: "/var/run/credentials/cluster_2_cacert" - - name: "cluster_3" - endpoint: {{ your-cluster-3-kubeapi-endpoint.com }} - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/cluster_3_token" - certPath: "/var/run/credentials/cluster_3_cacert" - -Now re-run :: - - kustomize build flyte > flyte_generated.yaml - -You will notice that the Data Plane resources have been removed from the ``flyte_generated.yaml`` file, and your new configurations have been added. - -Deploy the user/control plane to one cluster (you could use one of the 3 existing clusters, or an entirely separate cluster). :: - - kubectl apply -f flyte_generated.yaml - - -FlyteAdmin Remote Cluster Access -********************************* - -Some deployments of Flyte may choose to run the control plane separately from the data plane. Flyte Admin is designed to create kubernetes resources in one or more Flyte data plane clusters. For Admin to access remote clusters, it needs credentials to each cluster. In kubernetes, scoped service credentials are created by configuring a "Role" resource in a Kubernetes cluster. When you attach that role to a "ServiceAccount", Kubernetes generates a bearer token that permits access. We create a flyteadmin `ServiceAccount `_ in each data plane cluster to generate these tokens. - -When you first create the Flyte Admin ServiceAccount in a new cluster, a bearer token is generated, and will continue to allow access unless the ServiceAccount is deleted. Once we create the Flyte Admin ServiceAccount on a cluster, we should never delete it. In order to feed the credentials to Flyte Admin, you must retrieve them from your new data plane cluster, and upload them to Admin somehow (within Lyft, we use Confidant for example). - -The credentials have two parts (ca cert, bearer token). Find the generated secret via :: - - kubectl get secrets -n flyte | grep flyteadmin-token - -Once you have the name of the secret, you can copy the ca cert to your clipboard with :: - - kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy - -You can copy the bearer token to your clipboard with :: - - kubectl get secret -n flyte {secret-name} -o jsonpath='{.data.token}' | base64 -D | pbcopy - diff --git a/rsts/howto/new_project.rst b/rsts/howto/new_project.rst deleted file mode 100644 index b605a04ebd..0000000000 --- a/rsts/howto/new_project.rst +++ /dev/null @@ -1,32 +0,0 @@ -.. _howto_new_project: - -########################## -Registering a New Project -########################## - -Using Flytectl ---------------- - -.. NOTE:: - - Coming soon 🛠 - - - -Using Flyte-cli ----------------- - -After installing flytekit, you can use ``flyte-cli`` to register a project :: - - flyte-cli register-project -i -h localhost:80 -p myflyteproject --name "My Flyte Project" \ - --description "My very first project onboarding onto Flyte" - - -If you refresh your console you'll see your new project appear!
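The same registration can also be done programmatically from Python instead of shelling out to ``flyte-cli``. This is only a sketch under the assumption that your installed flytekit version exposes ``SynchronousFlyteClient.register_project`` and the ``Project`` model with this signature; the endpoint and project values below are placeholders matching the CLI example above:

.. code-block:: python

    from flytekit.clients.friendly import SynchronousFlyteClient
    from flytekit.models.project import Project

    # Placeholder endpoint -- point this at your FlyteAdmin service
    # (drop insecure=True if Admin is behind TLS).
    client = SynchronousFlyteClient("localhost:80", insecure=True)

    # id, name, and description mirror the flags passed to `flyte-cli register-project`.
    client.register_project(
        Project(
            id="myflyteproject",
            name="My Flyte Project",
            description="My very first project onboarding onto Flyte",
        )
    )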
- -FlyteAdmin API Reference -------------------------- - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/notifications.rst b/rsts/howto/notifications.rst deleted file mode 100644 index e85414da00..0000000000 --- a/rsts/howto/notifications.rst +++ /dev/null @@ -1,118 +0,0 @@ -.. _howto-notifications: - -#################################################### -How do I use and enable notifications on Flyte? -#################################################### - -When a workflow completes, users can be notified by - -* email -* `pagerduty `__ -* `slack `__. - -The content of these notifications is configurable at the platform level. - -***** -Usage -***** - -When a workflow reaches a specified `terminal workflow execution phase `__ -the :py:class:`flytekit:flytekit.Email`, :py:class:`flytekit:flytekit.PagerDuty`, or :py:class:`flytekit:flytekit.Slack` -objects can be used in the construction of a :py:class:`flytekit:flytekit.LaunchPlan`. - -For example - -.. code:: python - - from flytekit import Email, LaunchPlan - from flytekit.models.core.execution import WorkflowExecutionPhase - - # This launch plan triggers email notifications when the workflow execution it triggered reaches the phase `SUCCEEDED`. - my_notifiying_lp = LaunchPlan.create( - "my_notifiying_lp", - my_workflow_definition, - default_inputs={"a": 4}, - notifications=[ - Email( - phases=[WorkflowExecutionPhase.SUCCEEDED], - recipients_email=["admin@example.com"], - ) - ], - ) - - -See detailed usage examples in the :std:ref:`User Guide ` - -Notifications can be combined with schedules to automatically alert you when a scheduled job succeeds or fails. - -Future work -=========== - -Work is ongoing to support a generic event egress system that can be used to publish events for tasks, workflows and -workflow nodes. When this is complete, generic event subscribers can asynchronously process these vents for a rich -and fully customizable experience. - - -****************************** -Platform Configuration Changes -****************************** - -Setting up workflow notifications -================================= - -The ``notifications`` top-level portion of the flyteadmin config specifies how to handle notifications. - -As like in schedules, the notifications handling is composed of two parts. One handles enqueuing notifications asynchronously and the second part handles processing pending notifications and actually firing off emails and alerts. - -This is only supported for Flyte instances running on AWS. - -Config ------- - -To publish notifications, you'll need to set up an `SNS topic `_. - -In order to process notifications, you'll need to set up an `AWS SQS `_ queue to consume notification events. This queue must be configured as a subscription to your SNS topic you created above. - -In order to actually publish notifications, you'll need a `verified SES email address `_ which will be used to send notification emails and alerts using email APIs. - -The role you use to run flyteadmin must have permissions to read and write to your SNS topic and SQS queue. - -Let's look at the following config section and go into what each value represents: :: - - notifications: - type: "aws" - region: "us-east-1" - publisher: - topicName: "arn:aws:sns:us-east-1:{{ YOUR ACCOUNT ID }}:{{ YOUR TOPIC }}" - processor: - queueName: "{{ YOUR QUEUE NAME }}" - accountId: "{{ YOUR ACCOUNT ID }}" - emailer: - subject: "Notice: Execution \"{{ workflow.name }}\" has {{ phase }} in \"{{ domain }}\"." 
- sender: "flyte-notifications@company.com" - body: > - Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at - - http://flyte.company.com/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}. {{ error }} - -* **type**: in this case because AWS is the only cloud back-end supported for executing scheduled workflows, only ``"aws"`` is a valid value. By default, the no-op executor is used. -* **region**: this specifies which region AWS clients should will use when creating SNS and SQS clients -* **publisher**: This handles pushing notification events to your SNS topic - * **topicName**: This is the arn of your SNS topic -* **processor**: This handles the recording notification events and enqueueing them to be processed asynchronously - * **queueName**: This is the name of the SQS queue which will capture pending notification events - * **accountId**: Your AWS `account id `_ -* **emailer**: This section encloses config details for sending and formatting emails used as notifications - * **subject**: Configurable subject line used in notification emails - * **sender**: Your verified SES email sender - * **body**: Configurable email body used in notifications - -The full set of parameters which can be used for email templating are checked into `code `_. - -.. _admin-config-example: - -Example config -============== - -.. rli:: https://raw.githubusercontent.com/flyteorg/flyteadmin/master/flyteadmin_config.yaml - :lines: 66-80 diff --git a/rsts/howto/performance/index.rst b/rsts/howto/performance/index.rst deleted file mode 100644 index 9c862102e5..0000000000 --- a/rsts/howto/performance/index.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. _howto_performance: - -###################################################### -How do I optimize performance of my Flyte Deployment? -###################################################### - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/productionize/index.rst b/rsts/howto/productionize/index.rst deleted file mode 100644 index 7dbde0d04c..0000000000 --- a/rsts/howto/productionize/index.rst +++ /dev/null @@ -1,13 +0,0 @@ -.. _howto_productionize: - -############################################## -How do I productionize my Flyte cluster -############################################## - -.. toctree:: - :maxdepth: 1 - :caption: How To Productionize My Flyte Cluster Guides - :name: howtoprovguidestoc - - production - production_eks diff --git a/rsts/howto/productionize/production.rst b/rsts/howto/productionize/production.rst deleted file mode 100644 index e9c0d9ed98..0000000000 --- a/rsts/howto/productionize/production.rst +++ /dev/null @@ -1,173 +0,0 @@ -.. _production: - -Handling Production Load ------------------------- - -In order to handle production load, you'll want to replace the sandbox's object store and PostgreSQL database with production grade storage systems. To do this, you'll need to modify your Flyte configuration to remove the sandbox datastores and reference new ones. - -Flyte Configuration -******************* - -A Flyte deployment contains around 50 kubernetes resources. -The Flyte team has chosen to use the "kustomize" tool to manage these configs. -Take a moment to read the `kustomize docs `__. Understanding kustomize will be important to modifying Flyte configurations. - -The ``/kustomize`` directory in the `flyte repository `__ is designed for use with ``kustomize`` to tailor Flyte deployments to your needs. -Important subdirectories are described below. 
- -base - The `base directory `__ contains minimal configurations for each Flyte component. - -dependencies - The `dependencies directory `__ contains deploy configurations for components like ``PostgreSQL`` that Flyte depends on. - -These directories were designed so that you can modify them using ``kustomize`` to generate a custom Flyte deployment. -In fact, this is how we create the ``sandbox`` deployment. - -Understanding the sandbox deployment will help you to create your own custom deployments. - -Understanding the Sandbox -************************* - -The sandbox deployment is managed by a set of kustomize `overlays `__ that alter the ``base`` configurations to compose the sandbox deployment. - -The sandbox overlays live in the `kustomize/overlays/sandbox `__ directory. There are overlays for each component, and a "flyte" overlay that aggregates the components into a single deploy file. - -**Component Overlays** - For each modified component, there is an kustomize overlay at ``kustomize/overlays/sandbox/{{ component }}``. - The overlay will typically reference the ``base`` for that component, and modify it to the needs of the sandbox. - - Using kustomize "patches", we add or override specific configs from the ``base`` resources. For example, in the "console" overlay, we specify a patch in the `kustomization.yaml `__. This patch adds memory and cpu limits to the console deployment config. - - Each Flyte component requires at least one configuration file. The configuration files for each component live in the component overlay. For example, the FlyteAdmin config lives at `kustomize/overlays/sandbox/admindeployment/flyteadmin_config.yaml `__. These files get included as Kubernetes configmaps and mounted into pods. - -**Flyte Overlay** - The ``flyte`` overlay is meant to aggregate the components into a single deployment file. - The `kustomization.yaml overlay `__ in that directory lists the components to be included in the deploy. - - We run ``kustomize build`` against the ``flyte`` directory to generate the complete `sandbox deployment yaml `__ we used earlier to deploy Flyte sandbox. - -Creating Your Own Deployment -**************************** - -Before you create a custom deployment, you'll need to `install kustomize `__. - -The simplest way to create your own custom deployment is to clone the sandbox deploy and modify it to your liking. - -NOTE: - This section is getting updated to use the new kustomize installation of Flyte. Link to kustomize - -To do this, check out the ``flyte`` repo :: - - git clone https://github.com/flyteorg/flyte.git - -Copy the sandbox configuration to a new directory on your machine, and enter the new directory :: - - cp -r flyte/kustomize/overlays/sandbox my-flyte-deployment - cd my-flyte-deployment - -Since the ``base`` files are not in your local copy, you'll need to make some slight modifications to reference the ``base`` files from our GitHub repository. :: - - find . -name kustomization.yaml -print0 | xargs -0 sed -i.bak 's~../../../base~github.com/flyteorg/flyte/kustomize/base~' - find . -name kustomization.yaml -print0 | xargs -0 sed -i.bak 's~../../../dependencies~github.com/flyteorg/flyte/kustomize/dependencies~' - find . 
-name '*.bak' | xargs rm - -You should now be able to run kustomize against the ``flyte`` directory :: - - kustomize build flyte > flyte_generated.yaml - -This will generate a deployment file identical to the sandbox deploy, and place it in a file called ``flyte_generated.yaml`` - -Going Beyond the Sandbox -************************ - -Let's modify the sandbox deployment to use cloud providers for the database and object store. - -Production Grade Database -************************* - -The ``FlyteAdmin`` and ``DataCatalog`` components rely on PostgreSQL to store persistent records. - -In this section, we'll modify the Flyte deploy to use a remote PostgreSQL database instead. - -First, you'll need to set up a reliable PostgreSQL database. The easiest way achieve this is to use a cloud provider like AWS `RDS `__, GCP `Cloud SQL `__, or Azure `PostgreSQL `__ to manage the PostgreSQL database for you. Create one and make note of the username, password, endpoint, and port. - -Next, remove old sandbox database by opening up the ``flyte/kustomization.yaml`` file and deleting database component. :: - - - github.com/flyteorg/flyte/kustomize/dependencies/database - -With this line removed, you can re-run ``kustomize build flyte > flyte_generated.yaml`` and see that the the postgres deployment has been removed from the ``flyte_generated.yaml`` file. - -Now, let's re-configure ``FlyteAdmin`` to use the new database. -Edit the ``admindeployment/flyteadmin_config.yaml`` file, and change the ``storage`` key like so :: - - database: - host: {{ your-database.endpoint }} - port: {{ your database port }} - username: {{ your_database_username }} - password: {{ your_database_password }} - dbname: flyteadmin - -Do the same thing in ``datacatalog/datacatalog_config.yaml``, but use the dbname ``datacatalog`` :: - - database: - host: {{ your-database.endpoint }} - port: {{ your database port }} - username: {{ your_database_username }} - password: {{ your_database_password }} - dbname: datacatalog - -Note: *You can mount the database password into the pod and use the "passwordPath" config to point to a file on disk instead of specifying the password here* - -Next, remove the "check-db-ready" init container from `admindeployment/admindeployment.yaml `__. This check is no longer needed. - -Production Grade Object Store -***************************** - -``FlyteAdmin``, ``FlytePropeller``, and ``DataCatalog`` components rely on an Object Store to hold files. - -In this section, we'll modify the Flyte deploy to use `AWS S3 `__ for object storage. -The process for other cloud providers like `GCP GCS `__ should be similar. - -To start, `create an s3 bucket `__. - -Next, remove the old sandbox object store by opening up the ``flyte/kustomization.yaml`` file and deleting the storage line. :: - - - github.com/flyteorg/flyte/kustomize/dependencies/storage - -With this line gone, you can re-run ``kustomize build flyte > flyte_generated.yaml`` and see that the sandbox object store has been removed from the ``flyte_generated.yaml`` file. - -Next, open the configs ``admindeployment/flyteadmin_config.yaml``, ``propeller/config.yaml``, ``datacatalog/datacatalog_config.yaml`` and look for the ``storage`` configuration. 
- -Change the ``storage`` configuration in each of these configs to use your new s3 bucket like so :: - - storage: - type: s3 - container: {{ YOUR-S3-BUCKET }} - connection: - auth-type: accesskey - access-key: {{ YOUR_AWS_ACCESS_KEY }} - secret-key: {{ YOUR_AWS_SECRET_KEY }} - region: {{ YOUR-AWS-REGION }} - -Note: *To use IAM roles for authentication, switch to the "iam" auth-type.* - -Next, open ``propeller/plugins/config.yaml`` and remove the `default-env-vars `__ (no need to replace them, the default behavior is sufficient). - -Now if you re-run ``kustomize build flyte > flyte_generated.yaml``, you should see that the configmaps have been updated. - -Run ``kubectl apply -f flyte_generated.yaml`` to deploy these changes to your cluster for a production-ready deployment. - -Dynamically Configured Projects -******************************* - -As your Flyte user-base evolves, adding new projects is as simple as registering them through the cli :: - - flyte-cli register-project -h {{ your-flyte-admin-host.com }} -p myflyteproject --name "My Flyte Project" \ - --description "My very first project onboarding onto Flyte" - -A cron which runs at the cadence specified in flyteadmin config will ensure that all the kubernetes resources necessary for the new project are created and new workflows can successfully -be registered and executed under the new project. - -This project should immediately show up in the Flyte console after refreshing. - diff --git a/rsts/howto/productionize/production_eks.rst b/rsts/howto/productionize/production_eks.rst deleted file mode 100644 index 56731f738c..0000000000 --- a/rsts/howto/productionize/production_eks.rst +++ /dev/null @@ -1,21 +0,0 @@ -.. _production-eks: - -Using AWS EKS to host Flyte ------------------------------- - -Illustration -************* - -.. note:: - - - Flyte needs a prefix in an AWS S3 bucket to store all its metadata. This is where the data about executions, workflows, tasks is stored - - this S3 bucket/prefix should be accessible to all FlytePropeller, FlyteAdmin, Datacatalog and running executions (user pods) - - FlyteAdmin can use any RDBMS database but we recommend Postgres. At scale we have used AWS Aurora - - Datacatalog also uses a postgres database similar to admin. They both could share the same physical instance, but prefer to have 2 logically separate databases - - If you want to use AWS IAM role for SeviceAccounts, then you have to manage the provisioning of the service account and providing it to Flyte at the time of execution - - For secrets, you can use Vault, Kube secrets etc, we are working on getting first class support for this - -.. image:: https://raw.githubusercontent.com/flyteorg/flyte/static-resources/img/core/flyte_single_cluster_eks.png - :alt: Illustration of setting up Flyte Cluster in a single AWS EKS (or any K8s cluster on AWS) - - diff --git a/rsts/howto/resource_manager/index.rst b/rsts/howto/resource_manager/index.rst deleted file mode 100644 index 3ed54ad535..0000000000 --- a/rsts/howto/resource_manager/index.rst +++ /dev/null @@ -1,10 +0,0 @@ -.. _howto_resource_manager: - - -################################################# -How do I enable and configure resource manager? -################################################# - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/resource_quota.rst b/rsts/howto/resource_quota.rst deleted file mode 100644 index ac609818c2..0000000000 --- a/rsts/howto/resource_quota.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. 
_howto-resource-quota: - -############################################### -How do I limit resources per project/domain? -############################################### - -.. NOTE:: - - Coming soon ๐Ÿ›  diff --git a/rsts/howto/sandbox.rst b/rsts/howto/sandbox.rst deleted file mode 100644 index d8305fca72..0000000000 --- a/rsts/howto/sandbox.rst +++ /dev/null @@ -1,157 +0,0 @@ -.. _howto-sandbox: - -################################ -How do I try out/install Flyte? -################################ - - -********************** -What is Flyte Sandbox? -********************** -Flyte can be run using a Kubernetes cluster only. This installs all the dependencies as kubernetes deployments. We call this a Sandbox deployment. Flyte sandbox can be deployed by simply applying a kubernetes YAML. - -.. note:: - - #. A Sandbox deployment takes over the entire cluster - #. It needs special cluster roles that will need access to create namespaces, pods etc - #. The sandbox deployment is not suitable for production environments. For an in-depth overview of how to productionize your flyte deployment, checkout the :ref:`howto_productionize`. - - -.. image:: https://raw.githubusercontent.com/flyteorg/flyte/static-resources/img/core/flyte_sandbox_single_k8s_cluster.png - :alt: Architecture of Sandbox deployment of Flyte. Single K8s cluster - - -********************************************************* -Deploy Flyte Sandbox environment locally - on your laptop -********************************************************* - -Ensure ``kubectl`` is installed. Follow `kubectl installation docs `_. On Mac:: - - brew install kubectl - - - -.. tabs:: - - .. tab:: Docker Image - - Refer to :ref:`getting-started-firstrun` - - .. tab:: k3d - - #. Install k3d Using ``curl``:: - - curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash - - Or Using ``wget`` :: - - wget -q -O - https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash - - #. Start a new K3s cluster called flyte:: - - k3d cluster create -p "30081:30081" --no-lb --k3s-server-arg '--no-deploy=traefik' --k3s-server-arg '--no-deploy=servicelb' flyte - - #. Ensure the context is set to the new cluster:: - - kubectl config set-context flyte - - #. Install Flyte:: - - kubectl create -f https://raw.githubusercontent.com/flyteorg/flyte/master/deployment/sandbox/flyte_generated.yaml - - - #. Connect to `FlyteConsole `__ - #. [Optional] You can delete the cluster once you are done with the tutorial using - :: - - k3d cluster delete flyte - - - .. note:: - - #. Sometimes Flyteconsole will not open up. This is probably because your docker networking is impacted. One solution is to restart docker and re-do the previous steps. - #. To debug you can try a simple excercise - run nginx as follows:: - - docker run -it --rm -p 8083:80 nginx - - Now connect to `locahost:8083 `__. If this does not work, then for sure the networking is impacted, please restart docker daemon. - - .. tab:: Docker-Mac + K8s - - #. `Install Docker for mac with Kubernetes as explained here `_ - #. Make sure Kubernetes is started and once started make sure your kubectx is set to the `docker-desktop` cluster, typically :: - - kubectl config set-context docker-desktop - - #. Install Flyte:: - - kubectl create -f https://raw.githubusercontent.com/flyteorg/flyte/master/deployment/sandbox/flyte_generated.yaml - - - #. Connect to `FlyteConsole `__ - - .. tab:: Using Minikube (Not recommended) - - #. Install `Minikube `_ - - #. 
Install Flyte:: - - kubectl create -f https://raw.githubusercontent.com/flyteorg/flyte/master/deployment/sandbox/flyte_generated.yaml - - - .. note:: - - - Minikube runs in a Virtual Machine on your host - - So if you try to access the flyte console on localhost, that will not work, because the Virtual Machine has a different IP address. - - Flyte runs within Kubernetes (minikube), thus to access FlyteConsole, you cannot just use https://localhost:30081/console, you need to use the IP address of the minikube VM instead of localhost - - Refer to https://kubernetes.io/docs/tutorials/hello-minikube/ to understand how to access a - also to register workflows, tasks etc or use the CLI to query Flyte service, you have to use the IP address. - - If you are building an image locally and want to execute on Minikube hosted Flyte environment, please push the image to docker registry running on the Minikube VM. - - Another alternative is to change the docker host, to build the docker image on the Minikube hosted docker daemon. https://minikube.sigs.k8s.io/docs/handbook/pushing/ provides more detailed information about this process. As a TL;DR, Flyte can only run images that are accessible to Kubernetes. To make an image accessible, you could either push it to a remote registry or to a regisry that is available to Kuberentes. In case on minikube this registry is the one thats running on the VM. - - -.. _howto-sandbox-dedicated-k8s-cluster: - -****************************************************************** -Deploy Flyte Sandbox environment to a Cloud Kubernetes cluster -****************************************************************** - -Cluster Requirements -==================== - -Ensure you have kubernetes up and running on your choice of cloud provider: - -- `AWS EKS `_ (Amazon) -- `GCP GKE `_ (Google) -- `Azure AKS `_ (Microsoft) - -If you can access your cluster with ``kubectl cluster-info``, you're ready to deploy Flyte. - - -Deployment -========== - -We'll proceed like with :ref:`locally hosted flyte ` with deploying the sandbox -Flyte configuration on your remote cluster. - -.. warning:: - The sandbox deployment is not suitable for production environments. For an in-depth overview of how to productionize your flyte deployment, checkout the :ref:`howto_productionize`. - -#. The Flyte sandbox can be deployed with a single command :: - - kubectl create -f https://raw.githubusercontent.com/flyteorg/flyte/master/deployment/sandbox/flyte_generated.yaml - - -#. You can now port-forward (or if you have load-balancer enabled then get an LB) to connect to remote FlyteConsole, as follows:: - - kubectl port-forward svc/envoy 30081:80 - - -#. Open console http://localhost:30081/console. - -*************************************************************** -Deploy Flyte Sandbox environment to a shared kubernetes cluster -*************************************************************** - -The goal here is to deploy to an existing Kubernetes cluster - within one namespace only. This would allow multiple Flyte clusters to run within one K8s cluster. - -.. NOTE:: coming soon! diff --git a/rsts/howto/secrets.rst b/rsts/howto/secrets.rst deleted file mode 100644 index 939c00f5d6..0000000000 --- a/rsts/howto/secrets.rst +++ /dev/null @@ -1,62 +0,0 @@ -.. _howto-secrets: - -################################ -How to Inject Secrets Into Tasks -################################ - - -************************** -What Is Secrets Injection? 
-************************** - -Flyte supports running a wide variety of tasks; from containers to sql queries and service calls. In order for flyte-run -containers to request and access secrets, flyte now natively supports a Secret construct. - -For a simple task that launches a Pod, the flow will look something like this: - -.. image:: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K1BsdWdpbnM6IENyZWF0ZSBLOHMgUmVzb3VyY2VcbiAgICBQbHVnaW5zLT4-LVByb3BlbGxlcjogUmVzb3VyY2UgT2JqZWN0XG4gICAgUHJvcGVsbGVyLT4-K1Byb3BlbGxlcjogU2V0IExhYmVscyAmIEFubm90YXRpb25zXG4gICAgUHJvcGVsbGVyLT4-K0FwaVNlcnZlcjogQ3JlYXRlIE9iamVjdCAoZS5nLiBQb2QpXG4gICAgQXBpU2VydmVyLT4-K1BvZCBXZWJob29rOiAvbXV0YXRlXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IExvb2t1cCBnbG9iYWxzXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IEluamVjdCBTZWNyZXQgQW5ub3RhdGlvbnMgKGUuZy4gSzhzLCBWYXVsdC4uLiBldGMuKVxuICAgIFBvZCBXZWJob29rLT4-LUFwaVNlcnZlcjogTXV0YXRlZCBQb2RcbiAgICBcbiAgICAgICAgICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ - :target: https://mermaid.ink/img/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gICAgUHJvcGVsbGVyLT4-K1BsdWdpbnM6IENyZWF0ZSBLOHMgUmVzb3VyY2VcbiAgICBQbHVnaW5zLT4-LVByb3BlbGxlcjogUmVzb3VyY2UgT2JqZWN0XG4gICAgUHJvcGVsbGVyLT4-K1Byb3BlbGxlcjogU2V0IExhYmVscyAmIEFubm90YXRpb25zXG4gICAgUHJvcGVsbGVyLT4-K0FwaVNlcnZlcjogQ3JlYXRlIE9iamVjdCAoZS5nLiBQb2QpXG4gICAgQXBpU2VydmVyLT4-K1BvZCBXZWJob29rOiAvbXV0YXRlXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IExvb2t1cCBnbG9iYWxzXG4gICAgUG9kIFdlYmhvb2stPj4rUG9kIFdlYmhvb2s6IEluamVjdCBTZWNyZXQgQW5ub3RhdGlvbnMgKGUuZy4gSzhzLCBWYXVsdC4uLiBldGMuKVxuICAgIFBvZCBXZWJob29rLT4-LUFwaVNlcnZlcjogTXV0YXRlZCBQb2RcbiAgICBcbiAgICAgICAgICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ - -Where: - -1. Flyte invokes a plugin to create the K8s object. This can be a Pod or a more complex CRD (e.g. Spark, PyTorch, etc.) - - .. tip:: The plugin will ensure that labels and annotations are passed through to any Pod that will be spawned due to the creation of the CRD. - -3. Flyte will apply labels and annotations that are referenced to all secrets the task is requesting access to. -4. Flyte will send a POST request to ApiServer to create the object. -5. Before persisting the Pod, ApiServer will invoke all registered Pod Webhooks. Flyte's Pod Webhook will be called. -6. Flyte Pod Webhook will then lookup globally mounted secrets for each of the requested secrets. -7. If found, Pod Webhook will mount them directly in the Pod. If not found, it will inject the appropriate annotations to load the secrets for K8s (or Vault or Confidant or any other secret management system plugin configured) into the Pod. - -****************************** -How to Enable Secret Injection -****************************** - -This feature is available in Flytekit v0.17.0+. Here is an example of an annotated task: - -The webhook is included in all overlays in this repo. The deployment file creates (mainly) two things; a Job and a Deployment. - -1) flyte-pod-webhook-secrets Job: This job runs ``flytepropeller webhook init-certs`` command that issues self-signed - CA Certificate as well as a derived TLS certificate and its private key. It stores them into a new secret ``flyte-pod-webhook-secret``. -2) flyte-pod-webhook Deployment: This deployment creates the Webhook pod which creates a MutatingWebhookConfiguration - on startup. This serves as the registration contract with the ApiServer to know about the Webhook before it starts serving - traffic. 
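As a companion to the annotated task mentioned above, here is a minimal sketch of a task that requests a secret. It assumes flytekit v0.17.0+ with the ``Secret`` class and the ``secret_requests`` argument to ``@task``; the group and key names are placeholders for whatever your secret management system actually stores:

.. code-block:: python

    import flytekit
    from flytekit import Secret, task

    # "user-info" / "user_secret" are placeholder group/key names.
    @task(secret_requests=[Secret(group="user-info", key="user_secret")])
    def my_secret_task() -> int:
        # At runtime the webhook has either mounted the secret as a file or injected it
        # as an environment variable; flytekit's secrets manager resolves both.
        secret_value = flytekit.current_context().secrets.get("user-info", "user_secret")
        # Never log or return a real secret value; the length is returned purely for illustration.
        return len(secret_value)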
- -******************* -Scaling the Webhook -******************* - -Vertical Scaling -================= - -To scale the Webhook to be able to process the number/rate of pods you need, you may need to configure a vertical `pod -autoscaler `_. - -Horizontal Scaling -================== - -The Webhook does not make any external API Requests in response to Pod mutation requests. It should be able to handle traffic -quickly, but a benchmark is needed. For horizontal scaling, adding additional replicas for the Pod in the -deployment should be sufficient. A single MutatingWebhookConfiguration object will be used, the same TLS certificate -will be shared across the pods and the Service created will automatically load balance traffic across the available pods. diff --git a/rsts/howto/serviceaccount.rst b/rsts/howto/serviceaccount.rst deleted file mode 100644 index bf720e9f0c..0000000000 --- a/rsts/howto/serviceaccount.rst +++ /dev/null @@ -1,22 +0,0 @@ -.. _howto-serviceaccounts: - -###################################################################### -How do I use Kubernetes ServiceAccounts to access my cloud resources? -###################################################################### - -Kubernetes serviceaccount examples ----------------------------------- - -Configure project-wide kubernetes serviceaccounts by adding the following to your config: - -.. code:: python - - [auth] - kubernetes_service_account=my-kube-service-acct - - -Alternatively, pass the role as an argument to ``create_launch_plan``: - -.. code:: python - - my_lp = MyWorkflow.create_launch_plan(kubernetes_service_account='my-kube-service-acct') \ No newline at end of file diff --git a/rsts/index.rst b/rsts/index.rst index 1a2fd9b92f..eb83c69f1c 100644 --- a/rsts/index.rst +++ b/rsts/index.rst @@ -20,9 +20,8 @@ :hidden: concepts/basics - concepts/core concepts/control_plane - concepts/execution_time + concepts/architecture .. toctree:: :caption: Community @@ -31,9 +30,6 @@ :hidden: Join the Community - community/contribute - community/roadmap - community/troubleshoot .. toctree:: :caption: API Reference @@ -43,14 +39,6 @@ References -.. toctree:: - :caption: How-Tos - :maxdepth: 1 - :name: howtotoc - :hidden: - - plugins/index - howto/index Meet Flyte ========== @@ -122,5 +110,5 @@ Whether you want to write Flyte workflows, deploy the Flyte platform to your k8 * :ref:`Get Started ` * :ref:`Main Concepts ` -* :ref:`Extend Flyte ` +* :ref:`Extend Flyte ` * :ref:`Join the Community ` diff --git a/rsts/plugins/index.rst b/rsts/plugins/index.rst deleted file mode 100644 index d502c7a060..0000000000 --- a/rsts/plugins/index.rst +++ /dev/null @@ -1,12 +0,0 @@ -.. _plugins: - -==================== -Available Extensions -==================== -The following is a list of maintained plugins for Flyte and guides on how to install / use them. - -.. toctree:: - :maxdepth: 1 - :name: pluginstoc - - spark_k8s \ No newline at end of file diff --git a/rsts/plugins/spark_k8s.rst b/rsts/plugins/spark_k8s.rst deleted file mode 100644 index 7acc84342d..0000000000 --- a/rsts/plugins/spark_k8s.rst +++ /dev/null @@ -1,89 +0,0 @@ -.. _plugins-spark-k8s: - -######################################## -Run Spark on your Kubernetes Cluster -######################################## - -.. tip:: If you just looking for examples of spark on flyte - refer to :std:ref:`Cookbook Spark Plugin ` - - -Flyte has an optional plugin that makes it possible to run `Apache Spark `_ jobs native on your kubernetes cluster. 
This plugin has been used extensively at Lyft and is battle tested. -This makes it extremely easy to run your pyspark (coming soon scala/java) code as a task. The plugin creates a new virtual cluster for the spark execution dynamically and Flyte will manage the execution, auto-scaling -for the spark job. - -.. NOTE:: - - This has been tested at scale and more than 100k Spark Jobs run through Flyte at Lyft. This still needs a large capacity on Kubernetes and careful configuration. - We recommend using Multi-cluster mode - :ref:`howto-multi-cluster`, and enabling :ref:`howto-resource-quota` for large and extremely frequent Spark Jobs. - For extremely short running jobs, this is still not a recommended approach, and it might be better to use a pre-spawned cluster. - -Why use K8s Spark? -=================== -Managing Python dependencies is hard. Flyte makes it easy to version and manage dependencies using Containers. K8s Spark plugin brings all the benefits of containerization -to spark and without needing to manage special spark clusters. - -Pros: ------- -#. Extremely easy to get started and get complete isolation between workloads -#. Every job runs in isolation and has its own virtual cluster - no more nightmarish dependency management -#. Flyte manages everything for you! - -Cons: ------ -#. Short running, bursty jobs are not a great fit - because of the container overhead -#. No interactive spark capabilities available with Flyte K8s spark which is more suited for running, adhoc and/or scheduled jobs - - -How to enable Spark in flyte backend? -====================================== -Flyte Spark uses the `Spark On K8s Operator `_ and a custom built `Flyte Spark Plugin `_. -The plugin is a backend plugin and you have to enable it in your deployment. To enable a plugin follow the steps in :ref:`howto-enable-backend-plugins`. - -You can optionally configure the Plugin as per the - `backend Config Structure `_ and an example Config is defined -`here `_, which looks like, - -.. rli:: https://raw.githubusercontent.com/flyteorg/flyte/master/kustomize/overlays/sandbox/config/propeller/plugins/spark.yaml - :language: yaml - - -Spark in Flytekit -======================== -For a more complete example refer to the :std:ref:`User Guide ` - -#. Ensure you have ``flytekit>=0.16.0`` -#. Enable Spark in backend, following the previous section. -#. Install the `flytekit spark plugin `_ :: - - pip install flytekitplugins-spark - -#. Write regular pyspark code - with one change in ``@task`` decorator. Refer to the example - - .. code-block:: python - - @task( - task_config=Spark( - # this configuration is applied to the spark cluster - spark_conf={ - "spark.driver.memory": "1000M", - "spark.executor.instances": "2", - "spark.driver.cores": "1", - } - ), - cache_version="1", - cache=True, - ) - def hello_spark(partitions: int) -> float: - ... - sess = flytekit.current_context().spark_session - # Regular Pypsark code - ... - - -#. Run it locally - - .. code-block:: python - - hello_spark(partitions=10) - -#. Use it in a workflow (check cookbook) -#. Run it on a remote cluster - To do this, you have to build the correct dockerfile, as explained here :std:ref:`spark-docker-image`. You can also you the `Standard Dockerfile recommended by Spark `_. diff --git a/rsts/reference/index.rst b/rsts/reference/index.rst index c27aeaa579..45491e039d 100644 --- a/rsts/reference/index.rst +++ b/rsts/reference/index.rst @@ -4,22 +4,97 @@ API Reference ############# +.. panels:: + :header: text-center + + .. 
link-button:: https://flytectl.readthedocs.io + :type: url + :text: FlyteCTL + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The official Flyte Command-line Interface. + + --- + + .. link-button:: https://flyteidl.readthedocs.io + :type: url + :text: FlyteIDL + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The core language specification and backend service API specification for Flyte. + + --- + + .. link-button:: https://flytekit.readthedocs.io + :type: url + :text: Flytekit + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The Python SDK for Flyte. + + --- + + .. link-button:: https://github.com/spotify/flytekit-java + :type: url + :text: Flytekit-java + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The Java/Scala SDK for Flyte. + + --- + + .. link-button:: https://pkg.go.dev/mod/github.com/flyteorg/flytepropeller + :type: url + :text: FlytePropeller + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The K8s-native operator that executes Flyte workflows. + + --- + + .. link-button:: https://pkg.go.dev/mod/github.com/flyteorg/flyteadmin + :type: url + :text: FlyteAdmin + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + The service responsible for managing Flyte entities and administering workflow executions. + + --- + + .. link-button:: https://pkg.go.dev/mod/github.com/flyteorg/flyteplugins + :type: url + :text: FlytePlugins + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + Flyte Backend Plugins + + --- + + .. link-button:: https://pkg.go.dev/mod/github.com/flyteorg/datacatalog + :type: url + :text: DataCatalog + :classes: btn-block stretched-link + ^^^^^^^^^^^^ + Service that catalogs data to allow for data discovery, lineage and tagging + .. toctree:: :maxdepth: 1 :caption: API Reference :name: apitoc + :hidden: + FlyteCTL + FlyteIDL Flytekit Python Flytekit Java - FlyteIDL - Flytectl .. toctree:: :maxdepth: 1 :caption: Component Reference (Code docs) :name: componentreftoc + :hidden: FlytePropeller FlyteAdmin