docs: update the tutorial for Config UI
merico-devlake authored and d4x1 committed Dec 8, 2023
1 parent d88a6cf commit a8d5dc8
Showing 8 changed files with 132 additions and 272 deletions.
105 changes: 33 additions & 72 deletions docs/Configuration/Tutorial.md
@@ -4,97 +4,58 @@ sidebar_position: 1
description: Config UI instruction
---

## Overview
The Apache DevLake Config UI allows you to configure the data you wish to collect through a graphical user interface. Visit config-ui at `http://localhost:4000`.
The Apache DevLake Config UI provides a user-friendly interface for configuring the data collection process. To access the Config UI, please visit http://localhost:4000.

## Create a Project
Starting from v0.15, DevLake has introduced the Project feature to allow viewing project-based metrics, such as DORA. To create a project, go to Project on the main navigation, click on the "+ New Project" button and fill out the info in the dialog below.

![img](/img/ConfigUI/BlueprintCreation-v0.15/project.png)

## Create a Blueprint

### Introduction
A Blueprint is a plan that covers all the work to get your raw data ready for query and metric computation in the dashboards. Blueprints can either be used to collect data for a Project or be used alone without being dependent on any Project. To use the Blueprint within a Project, you can create the Blueprint once a Project is created; to use it alone, you can create the Blueprint from the Blueprint page from the main navigation.

For either usage of the Blueprint, creating it consists of four steps:

1. Adding Data Connections: Add new or select from existing data connections for the data you wish to collect
2. Setting Data Scope: Select the scope of data (e.g. GitHub projects or Jira boards) for your data connections
3. Adding Transformations (Optional): Add transformation rules for the data scope you have selected in order to view corresponding metrics
4. Setting the Sync Policies: Set the sync frequency, time range and the skip-on-fail option for your data

For detailed instructions of each data source, please go to their individual configuration docs from the sidebar.
## Basic Configuration
To ensure the proper functioning of DevLake, follow these two key steps:

### Step 1 - Add Data Connections
There are two ways to add data connections to your Blueprint: adding them during the creation of a Blueprint and adding them separately on the Data Integrations page. There is no difference between these two ways.

When adding data connections from the Blueprint, you can either create a new or select from existing data connections.

![img](/img/ConfigUI/BlueprintCreation-v0.15/step1.png)
![img](images/data-connections.png)

- Step 1.1 - Add a connection. Configure the endpoint and authentication details to connect to the source data.

### Step 2 - Set Data Scope
After adding data connections, click on "Next Step" and you will be prompted to select the data scope of each data connection. For instance, for a GitHub connection, you will need to select or enter the projects you wish to sync, and for Jira, you will need to select from your boards.
- Step 1.2 - Add data scope, such as Git repositories, issue boards, or CI/CD pipelines, to determine what data should be collected.

![img](/img/ConfigUI/BlueprintCreation-v0.15/step2-1.png)
![img](/img/ConfigUI/BlueprintCreation-v0.15/step2-2.png)
- Step 1.3 - Add scope config (optional). Define the specific data entities within the data scope for collection or apply transformation rules to the raw API responses.

### Step 3 - Add Transformations (Optional)
This step is required for viewing certain metrics (e.g. Bug Age, Bug Count per 1k Lines of Code and DORA) in the pre-built dashboards that require data transformation. We highly recommend adding Transformations to your data for the best display of the metrics, but you can still view the basic metrics if you skip this step.
### Step 2 - Collect Data in a Project
- Step 2.1 - Create a project. DevLake assesses DORA metrics at the project level. For more information, please refer to [how to organize DevLake projects](/docs/GettingStarted/HowToOrganizeDevlakeProjects.md).

![github-add-transformation-rules-list](images/github-set-transformation1.png)
![github-add-transformation-rules](images/github-set-transformation2.png)
- Step 2.2 - Associate connection(s) with the project. When associating a connection with a project, you can select specific data scopes. All connections linked to the same project will be considered part of the same project for calculating DORA metrics.

### Step 4 - Set the Sync Policies
Time Filter: You can select the time range of the data you wish to sync to speed up the collection process.
- Step 2.3 - Set the synchronization policy. Specify the sync frequency, time range and the skip-on-fail option for your data.

Frequency: You can choose how often you would like to sync your data in this step by selecting a sync frequency option or entering a cron code to specify your preferred schedule.
- Step 2.4 - Start data collection. Choose the desired [mode](#step-2---collect-data-in-a-project) for collecting data.

Running Policy: By default, `Skip failed tasks` is checked so that a few errors during data collection do not cost you all of your data, which matters most when you are collecting a large volume of data, e.g. 10+ GitHub repositories or Jira boards. For clarity, a task is a unit of a pipeline, and a pipeline is an execution of a blueprint. Without this option, when a task fails, the whole pipeline fails and all the data that has been collected is discarded. With failed tasks skipped, the pipeline continues to run, and the data collected by successful tasks is not affected. After the pipeline finishes, you can rerun the failed tasks on the blueprint's detail page.
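
The difference between the two running policies can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not DevLake's implementation: `run_pipeline`, the task callables, and the status strings are all hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    tasks: Dict[str, Callable[[], list]], skip_failed: bool
) -> Tuple[str, Dict[str, list], List[str]]:
    """Run tasks in order; return (status, collected data, failed task names).

    Hypothetical model of the "Skip failed tasks" policy, not DevLake code.
    """
    collected: Dict[str, list] = {}
    failed: List[str] = []
    for name, task in tasks.items():
        try:
            collected[name] = task()
        except Exception:
            failed.append(name)
            if not skip_failed:
                # Policy off: the whole pipeline fails and the data
                # collected so far is discarded.
                return "failed", {}, failed
            # Policy on: remember the failure and keep going.
    status = "partial" if failed else "completed"
    return status, collected, failed
```

With the policy on, the returned list of failed task names corresponds to the tasks you would rerun afterwards from the blueprint's detail page.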

![img](/img/ConfigUI/BlueprintCreation-v0.15/step4.png)
## Examples
For detailed examples, please refer to the respective documentation files available in this folder, such as [GitHub configuration](GitHub.md), [GitLab configuration](GitLab.md), [Jira configuration](Jira.md) and more. They provide step-by-step instructions and guidance for configuring DevLake with different platforms.

### Step 5 - Collect Data
Upon completing the blueprint configuration, you can proceed to the 'Status' tab to initiate data collection in DevLake. There are three available modes for data collection:
## Q&A

- Collect Data (Default): This mode collects data within the configured time range. Tools and entities that support incremental refresh will utilize this method, while those that only support full refresh will perform a full refresh. This mode is the default for recurring pipelines.
- Collect Data in Full Refresh Mode: In this mode, all existing data within the configured time range will be deleted and re-collected. It is useful for removing outdated or irrelevant data from DevLake that no longer exists in the original tools.
- Re-transform Data: This mode does not collect new data but instead applies the latest transformation rules from the Scope Config to the existing data.
#### Q1. What are the specific sync policies to configure?
- Time Filter: This allows you to select the desired time range for syncing data, optimizing the collection process.

- Frequency: You can determine the frequency of data synchronization by choosing a sync frequency option or specifying a cron code for a custom schedule.

### View the Blueprint Status and Download Logs for Historical Runs
After setting up the Blueprint, you will be taken to the Blueprint's status page, where you can track the progress of the current run and wait for it to finish before the dashboards become available. You can also view all historical runs of previously created Blueprints from the list on the Blueprint page.
- Running Policy: By default, the "Skip failed tasks" option is enabled. This helps prevent data loss in scenarios where you are collecting a large volume of data (e.g., 10+ GitHub repositories, Jira boards, etc.). When a task fails, this policy allows the pipeline to continue running, preserving the data collected by successful tasks. You can rerun the failed tasks later from the blueprint's detail page.
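
For the Frequency option above, a custom schedule is written as a cron code. The sketch below assumes the conventional five-field cron syntax (minute, hour, day of month, month, day of week); whether Config UI accepts exactly this form is an assumption, so check the input field's own hint. `split_cron` is a hypothetical helper that only names the fields and does not validate their values.

```python
# Field order of a conventional five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr: str) -> dict:
    """Map each field of a five-field cron expression to its name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


daily = split_cron("0 0 * * *")   # every day at 00:00
weekly = split_cron("0 0 * * 1")  # every Monday at 00:00
```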

If you run into any errors, you can also download the pipeline logs and share them with us on Slack so that our developers can help you debug.

![img](/img/ConfigUI/BlueprintEditing/blueprint-edit3.png)
#### Q2. What data collection modes does DevLake support?
Three modes.
- _Collect Data (Default)_: This mode retrieves data within the specified time range. Tools and entities that support incremental refresh will utilize this method, while those that only support full refresh will perform a full refresh. This mode is the default choice for recurring pipelines.
- _Collect Data in Full Refresh Mode_: In this mode, all existing data within the designated time range will be deleted and re-collected. It is useful for removing outdated or irrelevant data from DevLake that no longer exists in the original tools.
- _Re-transform Data_: This mode does not collect new data. Instead, it applies the latest transformation rules from the Scope Config to the existing data.
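
One way to read the three modes is as a decision on what happens to existing data. The following is a rough sketch of that reading, not DevLake's code: `Mode`, `plan`, and the step names are hypothetical, and it assumes a full refresh in the default mode behaves like the Full Refresh mode.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    COLLECT = "Collect Data"
    FULL_REFRESH = "Collect Data in Full Refresh Mode"
    RETRANSFORM = "Re-transform Data"


def plan(mode: Mode, supports_incremental: bool) -> List[str]:
    """Return the (hypothetical) steps a run would take for one tool."""
    if mode is Mode.RETRANSFORM:
        # No new data is collected; only the rules are re-applied.
        return ["apply_transformation_rules"]
    if mode is Mode.FULL_REFRESH or not supports_incremental:
        # Existing data in the time range is deleted and re-collected.
        return ["delete_existing_in_time_range", "collect_in_time_range"]
    # Default mode for a tool that supports incremental refresh.
    return ["collect_incrementally_in_time_range"]
```

For example, `plan(Mode.COLLECT, supports_incremental=False)` yields the same steps as the Full Refresh mode, which mirrors the fallback described in the first bullet.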

## Edit a Blueprint (Normal Mode)
If you switch to the Configuration tab on the Blueprint detail page, you can see the settings of your Blueprint and edit them.

In the current version, the Blueprint editing feature **allows** editing:
- The Blueprint's name
- The sync policies
- The data scope of a connection
- The data entities of the data scope
- The transformation rules of any data scope
- Editing any connections

Please note:
If you have created the Blueprint in the Normal mode, you will only be able to edit it in the Normal Mode; if you have created it in the Advanced Mode, please refer to [this guide](AdvancedMode.md#editing-a-blueprint-advanced-mode) for editing.

![img](/img/ConfigUI/BlueprintEditing/blueprint-edit1.png)

## Create and Manage Data Connections

The Data Connections page allows you to view, create and manage all your data connections in one place.
![img](/img/ConfigUI/BlueprintCreation-v0.15/connections.png)
## Troubleshooting

## Manage Transformations
The Transformations page allows you to manage all your transformation rules.
![img](/img/ConfigUI/BlueprintCreation-v0.15/transformations.png)
#### 1. What can be done when a data collection fails or only partially succeeds?
- First, re-run the failed task once all other tasks have completed. If the task still fails, proceed to the next steps.
- Capture a screenshot of the error message associated with the failed task.
- Download the logs from the pipeline for further analysis.
- Visit the [GitHub repository](https://github.com/apache/incubator-devlake/issues) and create a bug report. Include the captured screenshot and the downloaded logs in the bug report.

![img](/img/ConfigUI/BlueprintEditing/blueprint-edit3.png)

## Troubleshooting

If you run into any problem, please check [Troubleshooting](/Troubleshooting/Configuration.md), contact us on [Slack](https://join.slack.com/t/devlake-io/shared_invite/zt-17b6vuvps-x98pqseoUagM7EAmKC82xQ) or [create an issue](https://github.com/apache/incubator-devlake/issues).
For other problems, please check the [troubleshooting](/Troubleshooting/Configuration.md) doc, [create an issue](https://github.com/apache/incubator-devlake/issues), or contact us on [Slack](https://join.slack.com/t/devlake-io/shared_invite/zt-17b6vuvps-x98pqseoUagM7EAmKC82xQ).
Binary file added docs/Configuration/images/data-connections.png