feat: support host monitor #1890

Open. Wants to merge 1 commit into base: main.

Abingcbc (Collaborator):

  1. Wire up the end-to-end collection pipeline for process metadata.

TODO:

  1. Support more fields.
  2. Observability metrics.

Abingcbc force-pushed the host_monitor branch 6 times, most recently from 6150e52 to bfdd9c2 on November 19, 2024 06:27.
core/common/timer/HostMonitorTimerEvent.cpp (resolved)
@@ -58,6 +58,8 @@ enum class EventGroupMetaKey {
PROMETHEUS_SCRAPE_TIMESTAMP_MILLISEC,
PROMETHEUS_UP_STATE,

HOST_MONITOR_COLLECT_TIME,
Collaborator:

Could this field be made business-agnostic instead, i.e. a generic field not tied to one specific use case?

LOG_DEBUG(
sLogger,
("send http request succeeded, item address", request->mItem)(
"config-flusher-dst", QueueKeyManager::GetInstance()->GetName(request->mItem->mQueueKey))(
"response time", ToString(responseTimeMs) + "ms")("try cnt", ToString(request->mTryCnt))(
"sending cnt", ToString(FlusherRunner::GetInstance()->GetSendingBufferCount())));
static_cast<HttpFlusher*>(request->mItem->mFlusher)->OnSendDone(request->mResponse, request->mItem);
Collaborator:

Why was this file changed?

Collaborator Author:

This debug log can cause a core dump; it is a pre-existing issue.

* limitations under the License.
*/

#include "MockCollector.h"
Collaborator:

Delete this.


namespace logtail {

CollectorManager::CollectorManager() {
Collaborator:

There isn't much need to delete this, is there?

const std::string ProcessorHostMetaNative::sName = "processor_host_meta_native";

bool ProcessorHostMetaNative::Init(const Json::Value& config) {
auto hostType = ToString(getenv(DEFAULT_ENV_KEY_HOST_TYPE.c_str()));
Collaborator:

This mechanism isn't reasonable either. There should be a global management center, and each business module should simply read the value from it. Let's raise this at the daily meeting and discuss it.
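A minimal sketch of the suggested global management center, assuming it is a process-wide singleton that reads the environment once and exposes read-only accessors to every module. `HostInfoRegistry` and the environment variable name `HOST_MONITOR_HOST_TYPE` are hypothetical; the PR itself reads `DEFAULT_ENV_KEY_HOST_TYPE`, whose value is not shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical env var name; the PR uses DEFAULT_ENV_KEY_HOST_TYPE.
constexpr const char* kHostTypeEnvKey = "HOST_MONITOR_HOST_TYPE";

// Hypothetical global registry: the environment is read once, on first access,
// and every consumer (processors, collectors) reads the cached value.
class HostInfoRegistry {
public:
    static const HostInfoRegistry& Instance() {
        // Function-local static: initialized exactly once, thread-safe since C++11.
        static const HostInfoRegistry instance;
        return instance;
    }

    const std::string& HostType() const { return mHostType; }

private:
    HostInfoRegistry() {
        const char* value = std::getenv(kHostTypeEnvKey);
        // getenv returns nullptr when the variable is unset; constructing
        // std::string from nullptr is undefined behavior, so fall back to "".
        mHostType = value != nullptr ? value : "";
    }

    std::string mHostType;
};
```

Because the value is frozen at first access, later changes to the environment are ignored; that matches "read once, read everywhere", but a registry that must pick up config changes would need an explicit refresh path.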


// for process entity
const std::string DEFAULT_CONTENT_VALUE_ENTITY_TYPE_PROCESS = "process";
const std::string DEFAULT_CONTENT_KEY_PROCESS_PID = "process_pid";
Collaborator:

Try to keep these consistent with the log and metric types in TagConstants.cpp.

Also look at node-exporter's metric labels for reference.

Collaborator Author:

node-exporter metrics are also joined with underscores:
fixed prefix node_ + metric type + metric name
e.g. node_memory_HugePages_Total
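The underscore convention described above can be captured in one helper so every key is built the same way. This is a sketch; `BuildMetricName` is a hypothetical name, not an existing iLogtail function.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical helper: joins non-empty parts with '_', following the
// node-exporter pattern prefix + metric type + metric name.
std::string BuildMetricName(std::initializer_list<std::string> parts) {
    std::string name;
    for (const auto& part : parts) {
        if (part.empty()) {
            continue;  // skip missing segments instead of emitting "__"
        }
        if (!name.empty()) {
            name += '_';
        }
        name += part;
    }
    return name;
}
```

For example, `BuildMetricName({"node", "memory", "HugePages_Total"})` yields `node_memory_HugePages_Total`, and `BuildMetricName({"process", "", "pid"})` yields `process_pid`, the same string as `DEFAULT_CONTENT_KEY_PROCESS_PID`.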

namespace logtail {

int64_t GetSystemBootSeconds() {
static int64_t systemBootSeconds;
Collaborator:

Shouldn't this file be part of the collector?

Collaborator Author:

These are system-level metrics that multiple collectors use. For example, this boot-time getter is needed by both the CPU collector and the process collector.
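A sketch of a shared, cached boot-time getter, assuming Linux, where `/proc/stat` exposes the boot time (seconds since the epoch) on its `btime` line. The PR's actual body of `GetSystemBootSeconds` is not shown, so this is an illustration rather than that implementation; `ParseBootSeconds` is a hypothetical helper split out so the parsing can be tested without a real `/proc`.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scans /proc/stat-formatted text for "btime <seconds>".
// Returns -1 if the line is missing or malformed.
std::int64_t ParseBootSeconds(std::istream& procStat) {
    std::string line;
    while (std::getline(procStat, line)) {
        std::istringstream fields(line);
        std::string key;
        long long value = 0;
        if (fields >> key && key == "btime" && fields >> value) {
            return static_cast<std::int64_t>(value);
        }
    }
    return -1;
}

// Shared by the CPU and process collectors: /proc/stat is read once,
// later calls return the cached value (thread-safe static init since C++11).
std::int64_t GetSystemBootSeconds() {
    static const std::int64_t systemBootSeconds = [] {
        std::ifstream procStat("/proc/stat");
        return ParseBootSeconds(procStat);  // -1 if the file cannot be opened
    }();
    return systemBootSeconds;
}
```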

targetEvent->SetContent("binary", sourceEvent.GetContent(DEFAULT_CONTENT_KEY_PROCESS_BINARY));
targetEvent->SetContent("arguments", sourceEvent.GetContent(DEFAULT_CONTENT_KEY_PROCESS_ARGUMENTS));
targetEvent->SetContent("language", sourceEvent.GetContent(DEFAULT_CONTENT_KEY_PROCESS_LANGUAGE));
targetEvent->SetContent("containerID", sourceEvent.GetContent(DEFAULT_CONTENT_KEY_PROCESS_CONTAINER_ID));
Collaborator:

Is DEFAULT_CONTENT_KEY_PROCESS_CONTAINER_ID ever set anywhere? Also, it is not necessarily a container.
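One way to avoid emitting a key that nothing sets is to copy each field only when the source event actually carries a value. This sketch uses a plain `std::map` as a stand-in for event contents (hypothetical; it is not the `PipelineEvent` API), and the source key names in the usage are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an event's key/value contents.
using Contents = std::map<std::string, std::string>;

// Copies sourceKey -> targetKey only when the source has a non-empty value,
// so processes without a container ID (or other runtime ID) get no empty field.
void CopyIfPresent(const Contents& source, const std::string& sourceKey,
                   Contents& target, const std::string& targetKey) {
    auto it = source.find(sourceKey);
    if (it != source.end() && !it->second.empty()) {
        target[targetKey] = it->second;
    }
}
```

Whether the target key should stay `containerID` or get a more generic name is a separate naming decision that this sketch does not settle.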


const size_t ProcessTopN = 20;

void ProcessCollector::Collect(PipelineEventGroup& group) {
Collaborator:

Why is concurrency needed here?
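For context on the question: selecting the top `ProcessTopN` (20) processes is itself cheap on a single thread. A sketch using `std::partial_sort`, assuming a hypothetical `ProcessSample` record with a precomputed CPU percentage; it says nothing about the cost of reading per-process data from `/proc`, which is where any real need for concurrency would have to come from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal per-process record.
struct ProcessSample {
    int pid;
    double cpuPercent;
};

// Keeps the n processes with the highest CPU usage, sorted in descending order.
// std::partial_sort does roughly N*log(n) comparisons, on one thread.
std::vector<ProcessSample> TopNByCpu(std::vector<ProcessSample> samples, std::size_t n) {
    n = std::min(n, samples.size());
    auto middle = samples.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(samples.begin(), middle, samples.end(),
                      [](const ProcessSample& a, const ProcessSample& b) {
                          return a.cpuPercent > b.cpuPercent;
                      });
    samples.resize(n);
    return samples;
}
```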

core/common/FileSystemUtil.h (resolved)
core/common/StringTools.cpp (resolved)
core/common/timer/HostMonitorTimerEvent.cpp (resolved)
Comment on lines +37 to +39
std::string mDomain;
std::string mEntityType;
std::string mHostEntityID;
Collaborator:

Are these three members used at all?

bool Init(const Json::Value& config, Json::Value& optionalGoPipeline) override;
bool Start() override;
bool Stop(bool isPipelineRemoving) override;
bool SupportAck() const override { return false; }
Collaborator:

Change this to true for now.


bool InputHostMeta::Stop(bool isPipelineRemoving) {
LOG_INFO(sLogger, ("input host meta stop", mContext->GetConfigName()));
HostMonitorInputRunner::GetInstance()->RemoveCollector(mContext->GetConfigName());
Collaborator:

If isPipelineRemoving is false, don't call this.
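A sketch of the requested guard, assuming that `isPipelineRemoving == false` means the pipeline is being replaced by an updated config that will keep using the same collector. `CollectorRegistry` and `StopInput` are hypothetical stand-ins for `HostMonitorInputRunner` and `InputHostMeta::Stop`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the runner's collector registry.
class CollectorRegistry {
public:
    void Update(const std::string& configName) { mConfigs.insert(configName); }
    void Remove(const std::string& configName) { mConfigs.erase(configName); }
    bool Has(const std::string& configName) const { return mConfigs.count(configName) > 0; }

private:
    std::set<std::string> mConfigs;
};

// Only drop the collector when the pipeline is really going away; on a
// config update (isPipelineRemoving == false) its schedule is left in place.
bool StopInput(CollectorRegistry& registry, const std::string& configName, bool isPipelineRemoving) {
    if (isPipelineRemoving) {
        registry.Remove(configName);
    }
    return true;
}
```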

bool InputHostMeta::Start() {
LOG_INFO(sLogger, ("input host meta start", mContext->GetConfigName()));
HostMonitorInputRunner::GetInstance()->Init();
HostMonitorInputRunner::GetInstance()->UpdateCollector(
Collaborator:

Inside the function, check whether this config has already been registered, and use that to decide whether to add the initial event.
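A sketch of that check, assuming the runner keeps a map from config name to collector settings. `HostMonitorRunnerSketch` is hypothetical, and `mInitialEvents` stands in for pushing the first `HostMonitorTimerEvent` onto the timer. Settings are refreshed on every call, but only a previously unseen config gets an initial event, so an update does not start a second timer chain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical runner sketch: schedules an initial event only for new configs.
class HostMonitorRunnerSketch {
public:
    // Returns true when an initial event was scheduled for configName.
    bool UpdateCollector(const std::string& configName, const std::string& collectorName, int intervalSec) {
        const bool isNewConfig = mConfigs.find(configName) == mConfigs.end();
        // Always refresh the settings so an updated config takes effect.
        mConfigs[configName] = CollectorConfig{collectorName, intervalSec};
        if (isNewConfig) {
            // Stands in for pushing the first timer event.
            mInitialEvents.push_back(configName);
        }
        return isNewConfig;
    }

    std::size_t InitialEventCount() const { return mInitialEvents.size(); }

private:
    struct CollectorConfig {
        std::string collectorName;
        int intervalSec;
    };
    std::unordered_map<std::string, CollectorConfig> mConfigs;
    std::vector<std::string> mInitialEvents;
};
```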

Comment on lines +116 to +126
if (ProcessQueueManager::GetInstance()->IsValidToPush(processQueueKey)) {
ProcessQueueManager::GetInstance()->PushQueue(processQueueKey, std::move(item));
} else {
std::this_thread::sleep_for(std::chrono::milliseconds(100));
// try again
if (ProcessQueueManager::GetInstance()->IsValidToPush(processQueueKey)) {
ProcessQueueManager::GetInstance()->PushQueue(processQueueKey, std::move(item));
} else {
LOG_WARNING(sLogger, ("process queue is full", "discard data")("config", configName));
}
}
Collaborator:

Use ProcessorRunner::GetInstance()->PushQueue instead.
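The signature and retry policy of `ProcessorRunner::GetInstance()->PushQueue` are not shown in this PR, so the sketch below does not reproduce it. It only illustrates the general idea behind the suggestion: one shared push-with-retry helper instead of an inline check/sleep/check at each call site. `BoundedQueue` and `PushWithRetry` are hypothetical and single-threaded for brevity; combining the capacity check and the push into one `TryPush` call also means the item is moved from only on success.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <thread>
#include <utility>

// Hypothetical bounded queue standing in for a process queue (not thread-safe).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : mCapacity(capacity) {}

    // Moves from item only when there is room; on failure item is untouched.
    bool TryPush(T& item) {
        if (mItems.size() >= mCapacity) {
            return false;
        }
        mItems.push_back(std::move(item));
        return true;
    }

    std::size_t Size() const { return mItems.size(); }

private:
    std::size_t mCapacity;
    std::deque<T> mItems;
};

// Hypothetical shared helper: try once, then retry up to `retries` times.
template <typename T>
bool PushWithRetry(BoundedQueue<T>& queue, T& item, int retries, std::chrono::milliseconds interval) {
    for (int attempt = 0; attempt <= retries; ++attempt) {
        if (queue.TryPush(item)) {
            return true;
        }
        if (attempt < retries) {
            std::this_thread::sleep_for(interval);
        }
    }
    return false;  // caller decides whether to log and discard
}
```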

Comment on lines +106 to +112
mThreadPool->Add([this, eventCopy]() mutable {
auto configName = eventCopy.GetConfigName();
auto collectorName = eventCopy.GetCollectorName();
auto processQueueKey = eventCopy.GetProcessQueueKey();
PipelineEventGroup group(std::make_shared<SourceBuffer>());
auto collector = CollectorManager::GetInstance()->GetCollector(collectorName);
collector->Collect(group);
Collaborator:

This doesn't look right; there shouldn't be a copy here.

LOG_DEBUG(sLogger, ("schedule host monitor collector again", configName)("collector", collectorName));

eventCopy.ResetForNextExec();
mTimer->PushEvent(std::make_unique<HostMonitorTimerEvent>(eventCopy));
Collaborator:

Construct a new one directly here.
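A sketch covering both of the last two comments: the task captures only the fields it needs (moved in, not a copy of the whole event), and the next run is scheduled by constructing a fresh event rather than resetting and re-pushing a copied one. `TimerEventSketch`, `TimerSketch` and `MakeCollectTask` are hypothetical; the real `HostMonitorTimerEvent` carries more state than shown.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal timer event.
struct TimerEventSketch {
    std::string configName;
    std::string collectorName;
    std::int64_t execTimeSec;
    std::int64_t intervalSec;
};

// Hypothetical timer that records the events pushed to it.
struct TimerSketch {
    std::vector<std::unique_ptr<TimerEventSketch>> pending;
    void PushEvent(std::unique_ptr<TimerEventSketch> event) { pending.push_back(std::move(event)); }
};

// Builds the task for one collection run; the lambda owns just the fields it needs.
std::function<void()> MakeCollectTask(TimerSketch& timer, std::string configName, std::string collectorName,
                                      std::int64_t execTimeSec, std::int64_t intervalSec) {
    return [&timer, configName = std::move(configName), collectorName = std::move(collectorName),
            execTimeSec, intervalSec]() {
        // ... run the collector for collectorName and push its output here ...
        // Schedule the next run with a newly constructed event.
        timer.PushEvent(std::make_unique<TimerEventSketch>(
            TimerEventSketch{configName, collectorName, execTimeSec + intervalSec, intervalSec}));
    };
}
```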


3 participants