Merge Discord, ai, monitor and provide a sample runner for demo #7

Merged · 34 commits · Jan 25, 2024

Commits
97e34bf
chore: add google-cloud-run dependency
peterxcli Jan 23, 2024
3cebd28
docs: add README.md for Cloud Run Monitor System
peterxcli Jan 23, 2024
85a15b6
feat: add CloudRunManager class for managing Cloud Run instances
peterxcli Jan 23, 2024
ce30c66
Sending Warning Feature with fake function
HenryChang6 Jan 23, 2024
16a7454
chore: add pyproject.toml file with project dependencies
peterxcli Jan 23, 2024
6db83e9
docs: add more logging and metrics API documentation
peterxcli Jan 23, 2024
f0a1781
feat: add get_metrics method for CloudRunManager class
peterxcli Jan 23, 2024
40b10f5
docs: add init gcloud setting
peterxcli Jan 23, 2024
9076e42
add: Training data
jerrykal Jan 24, 2024
0d37b55
feat: Add data preprocessing function and requirements.txt
jerrykal Jan 24, 2024
ac68130
test embedded message
HenryChang6 Jan 24, 2024
d06ff00
Move DC Bot into env
HenryChang6 Jan 24, 2024
ff2f646
feat: sync log debug
jason810496 Jan 24, 2024
414417d
feat: implement the `increase` and `update` functionality of `instanc…
peterxcli Jan 24, 2024
02e4f4f
fix: update CPU and RAM basic type and number checking in cpu and mem…
peterxcli Jan 24, 2024
4c4ff76
docs: add python cloud run sdk docs
peterxcli Jan 24, 2024
d2810ee
chore: add cloud run manager test code to provide more understandability
peterxcli Jan 24, 2024
46bf51a
refactor: Update preprocess function
jerrykal Jan 24, 2024
451e433
feat: Add functions to use LLM to analyze metric data
jerrykal Jan 24, 2024
ecdd9aa
move cloudrun manager to service
peterxcli Jan 24, 2024
478ba7c
Merge remote-tracking branch 'origin/feature/monitor/sync-log' into f…
peterxcli Jan 24, 2024
fc736e1
test thread function
HenryChang6 Jan 25, 2024
79adca2
Merge remote-tracking branch 'origin/feature/genAI' into feat/monitor…
peterxcli Jan 25, 2024
3660c74
refactor: set CloudRunManager class to include project_id and locatio…
peterxcli Jan 25, 2024
dfc16bf
Merge remote-tracking branch 'origin/feature/discord-bot' into feat/m…
peterxcli Jan 25, 2024
dd008fc
chore: add discord dependencies
peterxcli Jan 25, 2024
ea24980
refactor: CloudRunManager set runner and monitor client default in ini…
peterxcli Jan 25, 2024
72a050e
chore: add __init__.py in service
peterxcli Jan 25, 2024
3786548
refactor: move poetry files to project root
peterxcli Jan 25, 2024
465394b
change tail log func to return generator and logging library version
peterxcli Jan 25, 2024
0dc2d74
feat: add a runner to the monitor process
peterxcli Jan 25, 2024
eba7525
Merge remote-tracking branch 'origin/dev' into feat/monitor/auto-scaling
peterxcli Jan 25, 2024
9eb6d66
chore: regenerate poetry lock for consumer
peterxcli Jan 25, 2024
fb2d5b8
chore: add pyarrow dependency version 15.0.0 for the following reason:
peterxcli Jan 25, 2024
1 change: 0 additions & 1 deletion Discord-Bot/Error_Notify.py

This file was deleted.

63 changes: 58 additions & 5 deletions Discord-Bot/bot.py
@@ -1,6 +1,15 @@
import discord
# link Error_Notify
# connect Receive_Mess
import random
import asyncio
import os
from message import send_embedded_warning, send_embedded_error, send_embedded_info
from feedback import get_active_threads, process_feedback
from dotenv import load_dotenv
load_dotenv()
DISCORD_BOT_TOKEN = os.getenv('DISCORD_BOT_TOKEN')

last_warning = None
active_threads = []

# permission settings
intents = discord.Intents.default()
@@ -9,13 +18,57 @@

client = discord.Client(intents = intents)

async def update_active_threads():
    global active_threads
    active_threads = await get_active_threads()

@client.event
async def on_ready():
    slash = await client.tree.sync()
    print("TSMC System Bot is Online!")
    print(f'Logged in as {client.user}')
    test_channel_id = 1199372364870340810
    channel = client.get_channel(test_channel_id)
    client.loop.create_task(update_active_threads())
    if channel:
        await send_embedded_warning(channel)
        # await asyncio.sleep(5)
        # await send_embedded_error(channel)
        # await asyncio.sleep(5)
        # await send_embedded_info(channel)

# listen for messages in threads
@client.event
async def on_message(message):
    if message.author == client.user:
        return

    for thread in active_threads:
        if message.channel.id == thread.id:
            await process_feedback(message, thread)


# test functions
# async def pull_warning():
#     num = random.randint(0, 100)
#     return f"Warning #{num}: this is a new warning!"

# async def warning_task():
#     global last_warning
#     while True:
#         # call pull_warning to fetch the latest warning
#         new_warning = await pull_warning()
#         if new_warning != last_warning:
#             # if the new warning differs from the last one, send it to the Discord channel
#             last_warning = new_warning
#             channel = client.get_channel(1199372364870340810)
#             await channel.send(new_warning)
#         await asyncio.sleep(5)


# @client.event
# async def on_ready():
#     print(f'{client.user} has connected to Discord!')
#     print("TSMC System Bot is Online!")
#     client.loop.create_task(warning_task())


client.run("<REDACTED_BOT_TOKEN>")
client.run(DISCORD_BOT_TOKEN)
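The token change above (moving the bot token out of source and into the environment) can be sketched as a small helper. This is a sketch rather than code from the PR, and it assumes a `DISCORD_BOT_TOKEN` entry in `.env` or the shell environment:

```python
import os


def read_bot_token(env_var: str = "DISCORD_BOT_TOKEN") -> str:
    # Read the bot token from the environment instead of hardcoding it in bot.py.
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set; add it to .env or export it in the shell")
    return token
```

In the PR itself, `load_dotenv()` from python-dotenv populates `os.environ` from a local `.env` file before `os.getenv` runs, so the same lookup works both locally and in deployment.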
29 changes: 29 additions & 0 deletions Discord-Bot/feedback.py
@@ -0,0 +1,29 @@
import discord

active_threads = []

async def get_active_threads():
    return active_threads

async def send_feedback(content):
    # TBD: depends on how Jerry's side wants to receive this
    pass

async def create_thread(message):
    # create a thread
    thread = await message.create_thread(name="Feedback Discussion")
    # send a welcome message in the thread
    await thread.send("Send me a message if you have any suggestion!")
    # add the thread to active_threads
    active_threads.append(thread)
    print(active_threads)


async def process_feedback(message, thread):
    await thread.send(f"This is your feedback\n {message.content}\n Thanks for your feedback, Jerry will take care of it!")
    print("Successfully sent the feedback-received acknowledgement")
    # await send_feedback(message)

84 changes: 84 additions & 0 deletions Discord-Bot/message.py
@@ -0,0 +1,84 @@
import discord


async def send_embedded_warning(channel):
    # create an Embed object
    embed = discord.Embed(
        title = "WARNING",
        description = "We're out of CPU !!!",
        color = 0xf31cc8
    )

    # add fields
    embed.add_field(
        name = "Suggestion1: Auto Scale \n",
        value = "We have to auto-scale the CPU to keep the system working.",
        inline = False
    )
    embed.add_field(
        name = "Suggestion2: Give up the system \n",
        value = "There's nothing we can do bruh",
        inline = False
    )

    # set the footer
    embed.set_footer(text = "Warning Message by your system buddy")

    message = await channel.send(embed = embed)

    # create a thread
    from feedback import create_thread
    await create_thread(message)



async def send_embedded_error(channel):
    # create an Embed object
    embed = discord.Embed(
        title = "ERROR",
        description = "We encountered a Network Error!!!",
        color = discord.Color.red()
    )

    # add fields
    embed.add_field(
        name = "Suggestion1: Go get 張鴈光's help \n",
        value = "He's the man in charge of CSIE wifi, but the wifi always sucks :(",
        inline = False
    )
    embed.add_field(
        name = "Suggestion2: Give up the system \n",
        value = "There's nothing we can do bruh",
        inline = False
    )

    # set the footer
    embed.set_footer(text = "Error Message by your system buddy")

    await channel.send(embed = embed)

async def send_embedded_info(channel):
    # create an Embed object
    embed = discord.Embed(
        title = "Info",
        description = "Hey! Here's some info",
        color = 0x25f533
    )

    # add fields
    embed.add_field(
        name = "Info1: We're going to win the competition! \n",
        value = "We're the best!",
        inline = False
    )
    embed.add_field(
        name = "Info2: we're bringing ipad back! \n",
        value = "We're the best!!",
        inline = False
    )

    # set the footer
    embed.set_footer(text = "Info Message by your system buddy")

    await channel.send(embed = embed)

154 changes: 154 additions & 0 deletions ai/analyze.py
@@ -0,0 +1,154 @@
import pandas as pd
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.prompts import PromptTemplate
from langchain_google_vertexai import VertexAI


def analyze_cpu_usage(metric_df) -> str:
    feedback = ""
    label = "Container CPU Utilization (%)"
    for i in range(len(metric_df) - 1):
        curr_entry = metric_df.iloc[i][label]
        next_entry = metric_df.iloc[i + 1][label]
        if (
            not pd.isna(curr_entry)
            and not pd.isna(next_entry)
            and curr_entry > 60.0
            and next_entry > 60.0
        ):
            feedback += f"\
- ERROR: Container CPU Utilization (%) is above 60% for two minutes, at {metric_df.iloc[i]['Time']} and {metric_df.iloc[i + 1]['Time']}\n"

    if feedback == "":
        feedback = f"- INFO: Container CPU Utilization (%) is below 60% over the last {len(metric_df)} minutes.\n"

    return feedback


def analyze_mem_usage(metric_df) -> str:
    feedback = ""
    label = "Container Memory Utilization (%)"
    for i in range(len(metric_df) - 1):
        curr_entry = metric_df.iloc[i][label]
        next_entry = metric_df.iloc[i + 1][label]
        if (
            not pd.isna(curr_entry)
            and not pd.isna(next_entry)
            and curr_entry > 60.0
            and next_entry > 60.0
        ):
            feedback += f"\
- ERROR: Container Memory Utilization (%) is above 60% for two minutes, at {metric_df.iloc[i]['Time']} and {metric_df.iloc[i + 1]['Time']}\n"

    if feedback == "":
        feedback = f"- INFO: Container Memory Utilization (%) is below 60% over the last {len(metric_df)} minutes.\n"

    return feedback


def analyze_restart(metric_df) -> str:
    feedback = ""
    for _, row in metric_df.iterrows():
        if not pd.isna(row["Container Startup Latency (ms)"]):
            feedback += f"\
- ERROR: Cloud run restarted at {row['Time']}, with Container Startup Latency (ms) of {row['Container Startup Latency (ms)']} ms\n"

    if feedback == "":
        feedback = f"- INFO: Cloud run did not restart over the last {len(metric_df)} minutes.\n"

    return feedback


def analyze_instance_count(metric_df) -> str:
    feedback = ""
    for _, row in metric_df.iterrows():
        if pd.isna(row["Instance Count (active)"]):
            continue

        total_instance_count = (
            row["Instance Count (active)"] + row["Instance Count (idle)"]
        )
        if total_instance_count > 2:
            feedback += f"\
- ERROR: Total instance count is above 2 at {row['Time']}, with Instance Count (active) of {int(row['Instance Count (active)'])} and Instance Count (idle) of {int(row['Instance Count (idle)'])}\n"

    if feedback == "":
        feedback = f"- INFO: Total instance count is less than or equal to 2 over the last {len(metric_df)} minutes.\n"

    return feedback


def analyze_by_rule(metric_df: pd.DataFrame) -> str:
    feedback = ""
    feedback += analyze_cpu_usage(metric_df)
    feedback += analyze_mem_usage(metric_df)
    feedback += analyze_restart(metric_df)
    feedback += analyze_instance_count(metric_df)

    return feedback


def analyze_by_llm(metric_df: pd.DataFrame) -> dict:
    # Analysis feedback by heuristic rules
    heuristic_feedback = analyze_by_rule(metric_df)

    # Define response schema
    severity_schema = ResponseSchema(
        name="severity",
        description='Severity level of the analysis feedback. \
Use "ERROR" if the analysis detects errors, "WARNING" for potential issues, or "INFO" if no problems are identified.',
    )
    message_schema = ResponseSchema(
        name="message",
        description="In-depth analysis feedback based on provided metrics (The description can span multiple lines, use '\\n' to separate lines.)",
    )
    response_schema = [severity_schema, message_schema]
    output_parser = StructuredOutputParser.from_response_schemas(response_schema)
    format_instruction = output_parser.get_format_instructions()

    # Define the model and prompt template
    llm = VertexAI(
        model_name="text-bison@001",
        temperature=0,
        max_output_tokens=512,
        top_p=0.8,
        top_k=40,
    )
    prompt_template = PromptTemplate.from_template(
        """\
The following text contains metric data for a Google Cloud Run application. \
This data is presented in CSV format and encompasses the most recent {time_span} minutes:
{metric_data}

The following text is a heuristic analysis feedback of the metric data:
{heuristic_feedback}

The heuristic analysis feedback is based on the following rules:
- CPU limit > 60% (lasts 2 minutes)
- Memory limit > 60% (lasts 2 minutes)
- Cloud run re-start
- Instance count > 2
- Fail response (4xx, 5xx)

Based on the provided metrics, an in-depth \
analysis is required to evaluate the cloud resource status and the operational health of the system. The analysis \
should identify and report any errors, anticipate potential problems, and propose appropriate remediation strategies.

{format_instruction}
"""
    )

    # Invoke the model
    chain = prompt_template | llm
    feedback = chain.invoke(
        {
            "time_span": len(metric_df),
            "metric_data": metric_df.to_string(),
            "heuristic_feedback": heuristic_feedback,
            "format_instruction": format_instruction,
        }
    )

    # Parse the feedback to a dictionary
    feedback_dict = output_parser.parse(feedback)
    return feedback_dict
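For reference, here is a minimal, self-contained sketch of the DataFrame shape these analyzers expect, with the two-consecutive-minutes CPU rule from `analyze_cpu_usage` reproduced inline. The column names are taken from the code above; the timestamps and values are made up:

```python
import pandas as pd

# Hypothetical 3-minute metric window in the column layout analyze.py expects.
metric_df = pd.DataFrame(
    {
        "Time": ["12:00", "12:01", "12:02"],
        "Container CPU Utilization (%)": [70.0, 65.0, 30.0],
        "Container Memory Utilization (%)": [40.0, 45.0, 42.0],
        "Container Startup Latency (ms)": [float("nan")] * 3,
        "Instance Count (active)": [1, 1, 1],
        "Instance Count (idle)": [0, 0, 0],
    }
)

# The CPU rule, reproduced inline: flag any pair of consecutive
# minutes where utilization stays above 60%.
label = "Container CPU Utilization (%)"
feedback = ""
for i in range(len(metric_df) - 1):
    curr, nxt = metric_df.iloc[i][label], metric_df.iloc[i + 1][label]
    if not pd.isna(curr) and not pd.isna(nxt) and curr > 60.0 and nxt > 60.0:
        feedback += (
            f"- ERROR: {label} is above 60% for two minutes, "
            f"at {metric_df.iloc[i]['Time']} and {metric_df.iloc[i + 1]['Time']}\n"
        )

print(feedback)
```

With these sample values, only the 12:00/12:01 pair exceeds the threshold, so a single ERROR line is produced; `analyze_by_rule` concatenates this kind of output from all four analyzers before it is handed to the LLM as `heuristic_feedback`.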