Skip to content

Commit

Permalink
Merge pull request #1 from NCKU-CSIE-Union/document
Browse files Browse the repository at this point in the history
Document
  • Loading branch information
jason810496 authored Jan 22, 2024
2 parents 5ee5e43 + 2290d18 commit 2cad833
Show file tree
Hide file tree
Showing 6 changed files with 482 additions and 92 deletions.
125 changes: 33 additions & 92 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,102 +1,43 @@
# TSMC-Hackathon-2024-IT-Infra
# TSMC Hackathon 2024 IT Infra

> [!IMPORTANT]
> We use **`Poetry`** to manage python package and virtual environment !!!
## Cloud Run CI/CD
<!--
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#alerts
-->

> reference : https://medium.com/@vngauv/from-github-to-gce-automate-deployment-with-github-actions-27e89ba6add8
## Idea Note

### Google Cloud Project

https://console.cloud.google.com/projectcreate

set project name
## TODO

> Archived : use `Artifact Registry` instead of `Container Registry` 👇
> ### Container Registry API Enable
>
> search `container registry` on top search bar
- [AI](#AI)
- [DevOps](#DevOps)
- [Monitor System (GCE)](#Monitor-System-GCE)
- [Consumer (Consumer Cloud)](#Consumer-Cloud-Run)
- [Discord Bot](#Service-Discord-Bot)

### Artifact Registry
## Gitflow

enable Artifact Registry API
### branch
- main
- develop
- test
- document
- feature/xxx
- fix/xxx
- hotfix/xxx

### Auth

two ways to auth
1. Service Account
2. Workload Identity

> Service Account is **much easier** to setup !!!

### IAM

setup service account permission
1. create service account
- using terminal
- using GCP console UI
2. create service account key
3. download service account key
4. set service account key to github secret

#### Create Service Account
1. using terminal
> in local terminal
```
export PROJECT_ID=tsmc-test-412003
gcloud iam service-accounts create "github-service-account" \
--project "${PROJECT_ID}"
```

2. using GCP console UI

#### Create Service Account Key

> both are easy to do
1. using terminal
> in local terminal
```
gcloud iam service-accounts keys create "github-service-account.json" \
--project "${PROJECT_ID}" \
--iam-account "github-service-account@${PROJECT_ID}.iam.gserviceaccount.com"
```

2. using GCP console UI

#### Workload Identity

> in local terminal
create workload identity pool :
```
gcloud iam workload-identity-pools create "github-pool"
--project="${PROJECT_ID}" \
--location="global" \
--display-name="GitHub Deployment Poll"
```
get workload identity pool id :
```
gcloud iam workload-identity-pools describe "github-pool" \
--project="${PROJECT_ID}" \
--location="global" \
--format="value(name)"
```
> return : `projects/111111111/locations/global/workloadIdentityPools/my-pool`

## Github Action : Cloud Run CI/CD from source ( include Build and Deploy )

https://github.com/google-github-actions/example-workflows/blob/main/workflows/deploy-cloudrun/cloudrun-source.yml

### env setup

```
PROJECT_ID: tsmc-test-412003 # TODO: update Google Cloud project id
SERVICE: stateless-service # TODO: update Cloud Run service name
REGION: asia-east1 # TODO: update Cloud Run region
```
- PROJECT_ID : Google Cloud Project ID
- SERVICE : Cloud Run Service Name to be set
- REGION : https://cloud.google.com/compute/docs/regions-zones
### message
- feat: 新增/修改功能 (feature)。
- fix: 修補 bug (bug fix)。
- docs: 文件 (documentation)。
- style: 格式 (不影響程式碼運行的變動 white-space, formatting, missing semicolons, etc.)。
- refactor: 重構 (既不是新增功能,也不是修補 bug 的程式碼變動)。
- perf: 改善效能 (A code change that improves performance)。
- test: 增加測試 (when adding missing tests)。
- chore: 建構程序或輔助工具的變動 (maintain)。
- revert: 撤銷回覆先前的 commit
- ci: DevOps 相關設定
251 changes: 251 additions & 0 deletions docs/Idea.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,251 @@
# TSMC Hackathon 2024

[TOC]

身份證影本
學生證影本
每個參賽者都有裝雙螢幕
可以帶線


Careerhack : hack time
- Resume check

## Questions

### 1. training data

## Cloud Resource

### Cloud Resource Monitering

:::danger

**Cloud Run** Metrics 研究 :

:::

### System Log Monitoring

## Creative

## Presentation

## Evaluation
![截圖 2024-01-19 下午3.46.54](https://hackmd.io/_uploads/rJzZtivt6.png)


# Idea

## AI part

## Data analyze
Backend log levels :
- Warn
- Fatal

### Log Input Format
GCP Provide :
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-run <br>
> time/cpu/ram/req token num/res token num

per timestamp :
- `timestamp` : timestamp
- `level` : log level
- `cpu` : cpu usage rate
- `ram` : ram usage rate
- `remain_count` : remain query count
- `avg_in_token_count` : average token count
- `avg_out_token_count` : average token count
- `detail` : any detail message

threshold :


### Output Format


### Warn
> CPU , RAM , instance count
### Fatal
> DC alerting message with details + LLM interative
## Cloud Run Preformance

## Simulate System status

## Completeness

- [AI](#AI)
- [DevOps](#DevOps)
- [Discord Bot](#Discord-Bot)

## Creativity

## Presentation


# Gitflow

### branch
- main
- develop
- feature/xxx
- fix/xxx

### message
feat: 新增/修改功能 (feature)。
fix: 修補 bug (bug fix)。
docs: 文件 (documentation)。
style: 格式 (不影響程式碼運行的變動 white-space, formatting, missing semicolons, etc.)。
refactor: 重構 (既不是新增功能,也不是修補 bug 的程式碼變動)。
perf: 改善效能 (A code change that improves performance)。
test: 增加測試 (when adding missing tests)。
chore: 建構程序或輔助工具的變動 (maintain)。
revert: 撤銷回覆先前的 commit


### folder

- `RAG` : [name=Jerry] AI Part
- `DC-Bot` : [name=Henry] Discord Bot Part
- `Consumer` : [Consumer (Consumer Cloud)](#Consumer-Cloud-Run)
- `MonitorSystem` [Monitor System (GCE)](#Monitor-System-GCE)

# TODO Before Hackathon

- [AI](#AI)
- [DevOps](#DevOps)
- [Monitor System (GCE)](#Monitor-System-GCE)
- [Consumer (Consumer Cloud)](#Consumer-Cloud-Run)
- [Discord Bot](#Service-Discord-Bot)


## AI

> RAG
## DevOps
- Flow
- Git Flow
- TDD
- Scrum
- CI / CD
- CI
- [ ] Python Code Quality Check
- ruff ( lint )
- [bandit](bandit.readthedocs.io)
- [x] GCP Image Registry ( Github action )
- Pytest
- CD
- GCE
- [x] Cloud Run Deploy ( Github action )
- Discord Bot notify action update
- [drone.io](https://www.drone.io/)
- https://docs.drone.io/server/ha/developer-setup/
- https://github.com/Jim-Chang/KodingWork/blob/master/devops/painless_set_up_drone_ci_cd/docker-compose.yml
- https://koding.work/painless-set-up-drone-ci-cd/
- https://ithelp.ithome.com.tw/articles/10235164
- https://ithelp.ithome.com.tw/articles/10235165

---

## Consumer (Cloud Run)

==用 counter 來實作 queue 就好, **記得加 lock**==

### Random Generate

- [ ] `remain_count` : remain query count
- [ ] random inference time by:
- [ ] `avg_in_token_count` : average token count
- [ ] `avg_out_token_count` : average token count
- [ ] 研究怎麼大量佔用 cpu, mem

### 平常狀態

每個 interval 都先 random(0,1) 看要不要往 queue 塞東西
如果有要塞, 往 queue 裡塞 random(0,N) 個任務

### 開些 API 可以讓系統開始爆炸

範例:
- `/full/cpu?duration=<duration>`
- `/full/mem?duration=<duration>`
- `/full/enqueue?num=<num>`
- `/sleep?duration=<duration>`

### 開些 API 可以達到 xx% 狀態

- `/state/cpu/xx?duration=<duration>`
- `/state/cpu/ram?duration=<duration>`

## Monitor System (GCE)

### Service : Auto scaling

- CPU +-
- RAM +-
- CPU & RAM +-
- Instance Count +-
- instance count +-
- min = max = current running instance
- auto get log

reference :
- [sdk pipy](https://pypi.org/project/google-cloud-run/)
- [github](https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-run)
- [gcloud commad list](https://cloud.google.com/sdk/gcloud/reference/run)
- [gcloud update reference](https://cloud.google.com/sdk/gcloud/reference/run/services/update)
- [ ] python call `gcloud run service update`

### Service : sync log
> for AI input
> call `RAG` afterward
- get current metrics
- get current system log
- notify Discord Bot
- notify DC Bot to send error message

reference :
[cloud run metrics api](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-run)


### Service: Discord Bot

- [discord bot demo repo](https://github.com/NCKU-CSIE-Union/Discord-Bot-Alert-Bot)
- error 通知
- [ ] 研究要怎麼讓其他服務主動發通知
- [ ] 研究 discord broadcast
- 要怎麼發到指定頻道, 以我們伺服器來說, 要打: `1199009519448109127` 這個 channel
- 也可以讓使用者主動要求訂閱 (假設使用者在頻道裡打了 `!sub`, 之後有新 alert 就往那些有訂閱的頻道發通知), 用這個做感覺比較好
- [ ] 漂亮的 Discord 通知 - keyword: `discord embbed message`
- LLM 互動訊息
- [ ] 收到互動訊息要 call jerry 提供的`answer_question(user_message: str) -> str` function

---

## Test

### Unit Test ( Code Coverage )

### Stress Test

# Tasks

## AI
- Jerry

要幫 system 開 1 個 function
- `insert_log` : 讓 system 可以 sync 當前的 log 到 vertor DB

## Discord Bot
- Henry

## DevOps & Monitor System & Consumer
- Peter
- Jason

Loading

0 comments on commit 2cad833

Please sign in to comment.