Docs how to organize project #637

Merged

KucherenkoSerhiy
Contributor

@KucherenkoSerhiy KucherenkoSerhiy commented Sep 3, 2023

⚠️   Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • I have npm run build and npm run serve locally before submitting this PR
  • I have read through the Contributing Documentation

Summary

WIP, continues PR 634.

Does this close any open issues?

Closes #612

Screenshots

Other Information

I recreated the feature branch to fix a minor mistake without realizing that it force-closed the PR.

@KucherenkoSerhiy KucherenkoSerhiy changed the title Docs how to organize project WIP Docs how to organize project Sep 3, 2023

## 2. What is a DevLake project?
In the real world, a project is something being built and/or researched to solve some problem or to open new grounds.
In software development, a project is just a grouping of something. In DevLake, a `project` is a grouping of `pull requests`, `deployments`, or `incidents`.
Contributor

I think we can make the definition straightforward, something like:

"A DevLake project is a grouping of pull requests, deployments, or incidents. It can be seen as a real-world project or product line. DevLake measures DORA metrics for each project."
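A minimal sketch of that definition, for illustration only. The `Project` class, its field names, and the sample identifiers are assumptions made up here; they are not DevLake's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a project as a plain grouping (not DevLake's real schema)."""

    name: str
    pull_requests: list[str] = field(default_factory=list)
    deployments: list[str] = field(default_factory=list)
    incidents: list[str] = field(default_factory=list)

    def entity_count(self) -> int:
        # Total number of entities grouped under this project.
        return len(self.pull_requests) + len(self.deployments) + len(self.incidents)


# Illustrative identifiers only; real data would come from the connected tools.
devlake = Project(
    name="DevLake",
    pull_requests=["pr-1", "pr-2"],
    deployments=["deploy-1"],
    incidents=["incident-1"],
)
```

The point of the sketch is that a project owns nothing by itself: it only groups the three kinds of entities that DORA metrics are computed from.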

In the real world, a project is something being built and/or researched to solve some problem or to open new grounds.
In software development, a project is just a grouping of something. In DevLake, a `project` is a grouping of `pull requests`, `deployments`, or `incidents`.

![](project_simple.png)
Contributor

I came up with a simple version of the project pic, it introduced the idea of domains, i.e. repos, boards and cicd_scopes based on the original pic. What do you think?
project

Contributor Author

It looks simple and gives a pipeline-oriented view of what is happening. Since DevLake carries much of the pipeline nature, it is accurate.

The second `project` only has 2 `repositories` worked equal time among them.
The structure will look like the following:

![](project_use_case_1.png)
Contributor

I looked at this pic and I can understand it, but I'm worried that it might be a bit complicated for users. I have a few thoughts for your reference:

  1. Make all use cases more true to life. E.g., use 2 Jira boards and 3 GitLab repos, with deployments executed via GitLab CI in each GitLab repo, to replace the current description and the wording in the picture.
    image

  2. Split the single image into several steps that follow the Config UI workflow, reducing the cognitive load so that users can follow it step by step. An example for the pic:
    image

Contributor Author

Yes, it is complex. Thank you for pointing out some possible ways to break it down; that is what I was looking for.

I'd rather use a different CI/CD platform than GitLab CI. With GitLab you can configure both pull requests and deployments through a single connection, so readers might miss a point. Using a single connection for multiple entities is a great feature, but as an example it hides the fact that pull requests and deployments are completely separate entities.

Contributor

@KucherenkoSerhiy You can use a different CI/CD platform for sure. What I was thinking was that in some use cases we can use GitLab CI as the CI/CD platform, while in others we can use Jenkins or CircleCI (via webhook). The idea is to make the use cases closer to real life. I hope it makes sense.

Contributor

I thought about the annotations here again. After we introduce the toolchains and how incidents, repos, and deployments are organized in each use case, I think we can:

  1. First, add a text summary describing how many DevLake projects should be created.
  2. Then, show which boards (for incidents), repos (for pull requests), and deployments belong to each project.
  3. Then, add a picture with steps showing how to configure the projects in DevLake.

I think this flow will be easier to understand. Looking forward to your feedback.
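The first two steps of that flow could be sketched as follows. This is only an illustration: the project names, board, repo, and job names, and the `summarize` helper are all hypothetical and are not taken from the doc under review.

```python
# Hypothetical layout: every name below is illustrative only.
projects = {
    "Project A": {
        "boards": ["Jira Board A"],
        "repos": ["GitHub Repo 1", "GitHub Repo 2"],
        "deployments": ["Jenkins Job 1"],
    },
    "Project B": {
        "boards": ["Jira Board B"],
        "repos": ["GitHub Repo 2", "GitHub Repo 3"],
        "deployments": ["Jenkins Job 2"],
    },
}


def summarize(projects: dict) -> list[str]:
    """Return the text summary: how many projects, then what belongs to each."""
    lines = [f"Create {len(projects)} DevLake project(s)."]
    for name, scopes in projects.items():
        lines.append(
            f"{name}: boards={scopes['boards']}, "
            f"repos={scopes['repos']}, deployments={scopes['deployments']}"
        )
    return lines


for line in summarize(projects):
    print(line)
```

Writing the mapping down first makes shared scopes visible (here, "GitHub Repo 2" appears in both projects) before any configuration screenshots are shown.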


For JIRA `incident boards` we will create 1 connection per each board.

#### 4.2.1 GitHub connection
Contributor

Hi @KucherenkoSerhiy, I think adding the diagrams of creating GitHub connections step by step makes the manual clearer. I suggest adding a relative link to this doc (https://devlake.apache.org/docs/Configuration/GitHub#step-1---add-data-connections) so users can find more details on creating a GitHub connection.

Contributor Author

Oh, that is exactly what I was looking for but did not find in time. Thank you very much for pointing it out!

#### 4.2.2 GitHub connection scope
TODO

#### JIRA
Contributor

@Startrekzky Startrekzky Sep 7, 2023

With the example of creating a GitHub connection above, to make the doc concise and also reduce the workload, we don't need to add the steps to create connections for every other data source. Instead, we can just attach the existing URLs here.

The URLs can be found in this folder: https://devlake.apache.org/docs/Configuration, please use the relative URL.

Contributor

I think the diagram is much simpler. I only have a few thoughts on the wording:

  1. To make the wording consistent, if Jira Board A is used, we should use GitHub Repo 1, Jenkins Job 1 instead.
  2. To make the example more realistic, we can add a caption under each of the Scopes, such as adding "Project A's features" under "Jira Board A", "Project A's main app" under "GitHub Repo1", "Shared libraries" under "Repo2", etc.

Contributor Author

Hi @Startrekzky, Thank you once again for the corrections. I am not sure I understand the second point, but I think I get it. Let me know your thoughts in any case!

The same would apply to other repos (e.g. GitLab or BitBucket), boards (e.g. TAPD),
or CI/CD (e.g. GitLab CI, Azure DevOps).

## 4. Building use case 1
Contributor

I think the 2nd-level heading should be "Use Cases", with "Use case 1" and "Use case 2" added as 3rd-level headings.


## 4. Building use case 1

There are `2 teams` with `2 boards`, 3 `repos`, and 3 `cicd pipelines`.
Contributor

Hi @KucherenkoSerhiy, I'm not sure how you define team or project.

From my perspective, a team is an organizational concept (people-wise), while a project is a unit of work (work-wise). Therefore, in this use case, we might use 2 projects instead of 2 teams.

Also, to make it more realistic, you can use real-life open-source projects as the example, for instance DevLake and [DevStream](https://github.com/devstream-io/devstream). DevLake is a project that manages 3 repos: apache/incubator-devlake, apache/incubator-devlake-website, and apache/incubator-devlake-helm-chart.

I hope it makes sense.

Contributor Author

Hey @Startrekzky, thank you for the tips! Sure, using 2 projects sounds more consistent with DevLake.
I also left a side note that for DORA there are no teams, only projects.

Regarding DevLake, I agree and also think it should come first as a base use case. The one already in the document is rather an expansion, introducing intersections between projects.

Quick note 2: if you use webhooks, check the [quick note](HowToOrganizeDevlakeProjects.md#5-note-about-webhooks) about them below.

### 4.1. Use Case 1: Projects DevLake and DevStream
DevLake and [DevStream](https://github.com/devstream-io/devstream) are both Apache `projects`.
Contributor

Haha, DevStream is not an Apache project. Let's use another project. For your reference:
"Apache DevLake and Spark are two independent projects of the ASF. Assume that the ASF wants to compare the DORA metrics of the two projects. What should the ASF do?"

Contributor Author

Haha, thanks for pointing out the funny mistake and for the guidance!

Regarding comparing projects, I'd rather use DORA to check the health of development and maintenance for each project separately than compare the projects with each other. There may be many differences between them (size of the team/community, the problem the project solves, the project's age, etc.).

Comparing projects with each other might give a misleading idea of why to use DORA in the first place. Many companies end up with projects organized by teams, so this leads to comparing those teams.
That might increase competitiveness and toxicity and create more problems than it solves.

DevLake manages 3 `repos`: [incubator-devlake](https://github.com/apache/incubator-devlake),
[incubator-devlake-website](https://github.com/apache/incubator-devlake-website),
and [incubator-devlake-helm-chart](https://github.com/apache/incubator-devlake-helm-chart).
DevStream also manages 3 `repos`: [devstream](https://github.com/devstream-io/devstream),
Contributor

Please update it to Spark accordingly.


### 4.1.5 Resulting Metrics

To know if the data of a project is successfully collected go to your DORA Dashboard:
Contributor

Should the "go" be removed from the sentence? I highly recommend you use ChatGPT or Grammarly to get all wording/typos fixed in this doc.

Contributor Author

Sorry for the grammar clumsiness. I am used to running the Grammarly-ChatGPT-Grammarly pipeline in the end. Also, the files need to be sorted after finishing everything.

First, we create 2 `projects` on DevLake platform, representing both project DevLake and DevStream.
These steps will suffice for now:

![](create_project_1.png)
Contributor

I retook the screenshots to make them look better.
image
image


To know if the data of a project is successfully collected go to your DORA Dashboard:

![](navigate_to_dora_1.png)
Contributor

Is the arrow pointing to the dashboard menu missing here?
image


Following diagram describes the situation.

![](project_use_case_2.png)
Contributor

This is awesome. So clear.

- 1 webhook connection for `incidents` exclusively for it-new `project`


##### QA
Contributor

Q&A? QA usually refers to quality assurance.

- 1 connection to Jira to gather `incidents`

For it-legacy `project`:
- 1 connection for `repos` _it-legacy-1_ and _it-legacy-1_
Contributor

Normally, we don't recommend that users create multiple connections for a shared data scope, for instance, GitHub repos. This increases the time to collect the data because the shared repos are stored in multiple copies in the DB. See pic:
image

Instead, we encourage users to create ONE connection and add all data scopes (p-1, p2, ... , p-10, it-legacy-1, it-legacy-2) on the Data Connection page. Then, when users:

  • bind this connection to Project payments, they only need to choose 'p-1, p2, ... , p-10'
  • bind this connection to Project it-legacy, they only need to choose 'it-legacy-1, it-legacy-2'

In this way, the data of 'p-1, p2, ... , p-10, it-legacy-1, it-legacy-2' is stored in a single copy only. That is to say, the sync-up time will not be doubled or tripled either.

image

So, in my opinion, there should be a '4.2.0. Create Data Connections' before 4.2.1. And the current '4.2.2. Adding Connections' should be renamed to '4.2.2. Adding Connections and repos to Projects'
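The storage argument above can be illustrated with a toy model. It assumes, as the comment describes, that data is stored once per connection that holds a scope; the connection names, the `shared-lib` repo, and the `stored_copies` helper are hypothetical and only serve the illustration.

```python
def stored_copies(connections: dict[str, set[str]]) -> int:
    """Count stored scopes, assuming one copy per (connection, scope) pair."""
    return sum(len(scopes) for scopes in connections.values())


# Option 1: one connection per project. The shared repo is added twice.
per_project = {
    "conn-payments": {"p-1", "shared-lib"},
    "conn-it-legacy": {"it-legacy-1", "it-legacy-2", "shared-lib"},
}

# Option 2: ONE connection holding every scope.
single = {"conn-github": {"p-1", "it-legacy-1", "it-legacy-2", "shared-lib"}}

# With option 2, each project only selects a subset of the connection's scopes.
bindings = {
    "payments": {"p-1", "shared-lib"},
    "it-legacy": {"it-legacy-1", "it-legacy-2", "shared-lib"},
}

print(stored_copies(per_project))  # 5: "shared-lib" is stored twice
print(stored_copies(single))       # 4: every scope is stored once
```

Under this model the per-project setup grows with every project that shares a repo, while the single connection stays at one copy per scope regardless of how many projects bind to it.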

Contributor Author

That's a great point. In fact, I think it's good to mention that in the guide.

docs: rework use case
  * change one of the example projects
docs: improve project creation and navigation screenshots
@KucherenkoSerhiy KucherenkoSerhiy changed the title WIP Docs how to organize project Docs how to organize project Sep 25, 2023
Contributor

@Startrekzky Startrekzky left a comment

LGTM. @KucherenkoSerhiy Thanks for your continuous input. I'll merge it first and refine it (mostly about the flow and wording) within the next week.

@Startrekzky Startrekzky merged commit dbeafe3 into apache:main Oct 8, 2023
1 check passed

Successfully merging this pull request may close these issues.

[Doc][FAQ] Add the doc of "how to organize DevLake project"
2 participants