
Store the database separately on two HDD RAIDs #667

Open
syncpark opened this issue Jan 25, 2024 · 2 comments
Labels
performance Performance improvement

Comments

@syncpark
Contributor

syncpark commented Jan 25, 2024

Issue

In the TIS project, the input traffic is 10 Gbps or 20 Gbps for each collector machine.
Since Giganto cannot process all events sent by a single Piglet, Giganto's storage/retrieval performance needs to be improved.

Purpose

Let's improve storage/retrieval performance by storing Conn events on a different HDD RAID than the other protocols.

Background

Event ratio by protocol:

  • Total: 5 billion events. Piglet generates this many events per day, but they cover only about 30% of the total bandwidth.
    • Conn events: 4.3 billion (86%)
    • Dns events: 0.11 billion (2.2%)
    • HTTP events: 0.034 billion (0.68%)
    • TLS (HTTPS) events: 0.51 billion (10.2%)
  • REconverge mainly analyzes protocols other than Conn.

If Conn events are stored on a separate HDD RAID, disk I/O contention with storage and search requests for the other protocols is reduced.
As a result, we can expect improved performance.

TODOs

  • Support setting different DB storage paths for Conn and the other protocols
  • Create and manage the Conn DB and the DB for the other protocols separately
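The TODOs above could be sketched as follows. This is a minimal illustration, not Giganto's actual configuration or API: the struct, field names, and paths are all hypothetical, standing in for a real setting that points each event class at a different RAID volume.

```rust
// Sketch only: hypothetical config fields and routing logic for storing
// Conn events on a separate HDD RAID from the other protocols.
use std::path::PathBuf;

/// Hypothetical storage settings: one DB path per RAID volume.
struct StorageConfig {
    conn_db_path: PathBuf,  // e.g. HDD RAID #1, dedicated to Conn events
    other_db_path: PathBuf, // e.g. HDD RAID #2, for all other protocols
}

/// Pick the DB path for an event based on its protocol kind.
fn db_path_for<'a>(cfg: &'a StorageConfig, kind: &str) -> &'a PathBuf {
    match kind {
        "conn" => &cfg.conn_db_path,
        _ => &cfg.other_db_path,
    }
}

fn main() {
    let cfg = StorageConfig {
        conn_db_path: PathBuf::from("/raid1/giganto/conn"),
        other_db_path: PathBuf::from("/raid2/giganto/other"),
    };
    println!("conn -> {}", db_path_for(&cfg, "conn").display());
    println!("dns  -> {}", db_path_for(&cfg, "dns").display());
}
```

With a routing function like this, the two RocksDB instances can be opened and compacted independently, so heavy Conn writes never queue behind reads of the other protocols' DB.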
@sehkone
Contributor

sehkone commented Jan 25, 2024

@syncpark
I'd like you to collect and organize the issues related to Giganto's performance, so I think the first step is to think about a big-picture strategy for improving it.

@msk, @sophie-cluml, let's discuss this together.

@sophie-cluml sophie-cluml added the performance Performance improvement label Jan 26, 2024
@msk
Contributor

msk commented Jan 26, 2024

This approach to parallelize the storage of Conn events and other protocols could result in a latency decrease of about 14% (100% - 86%), which is not negligible. However, I have concerns that this alone might not sufficiently address Giganto's scalability issues under heavy traffic.

To tackle the core of the problem, we first need to identify where the bottleneck lies. @syncpark's suggestion hints at the physical disk I/O being the constraint. If that's the case, a potential solution could be to increase the number of stripes in our RAID configuration. This might offer a simpler and possibly more effective way to enhance performance compared to separating Conn and other events.

On the other hand, if the bottleneck is at the level of RocksDB operations, like locking or transaction handling, splitting the events across multiple RocksDB instances on different disks could be beneficial. However, dividing them based on event type may not be the most efficient, particularly when a single type (e.g., Conn) dominates. A more balanced approach could be to distribute events evenly, perhaps using hash values.
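The even-distribution idea above could look like the following sketch: hash each event key and take the result modulo the number of RocksDB instances. The shard count of 4 and the key format are illustrative assumptions, not anything Giganto currently does.

```rust
// Sketch: spread events evenly across N RocksDB instances by key hash,
// as an alternative to splitting by event type (where Conn would dominate).
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_SHARDS: u64 = 4; // illustrative: one RocksDB instance per disk/RAID volume

/// Map an event key (e.g. source address + timestamp) to a shard index.
fn shard_for(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % NUM_SHARDS
}

fn main() {
    for key in [b"10.0.0.1:1706140800".as_slice(), b"10.0.0.2:1706140801"] {
        println!("{:?} -> shard {}", key, shard_for(key));
    }
}
```

Unlike a per-protocol split, hashing keeps write load roughly balanced regardless of the traffic mix, at the cost of fanning a single-protocol range query out to every shard.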

Additionally, it’s crucial to consider how much CPU time is currently idle. If we have sufficient CPU resources available, we might explore more aggressive methods. These could include batching events for storage (e.g., storing 1,000 events in a single RocksDB column family entry), compressing events before storage, or implementing both strategies.
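The batching idea above could be sketched like this: pack many small events into one value before writing it as a single RocksDB column family entry. The length-prefixed encoding is an assumption for illustration, not Giganto's storage format.

```rust
// Sketch: batch many events into one value so a single RocksDB write
// covers e.g. 1,000 events, trading some CPU for far fewer key-value entries.

/// Concatenate events as [u32 little-endian length][bytes] records.
fn pack(events: &[Vec<u8>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for e in events {
        buf.extend_from_slice(&(e.len() as u32).to_le_bytes());
        buf.extend_from_slice(e);
    }
    buf
}

/// Split a packed buffer back into the individual events.
fn unpack(mut buf: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    while buf.len() >= 4 {
        let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
        out.push(buf[4..4 + len].to_vec());
        buf = &buf[4 + len..];
    }
    out
}

fn main() {
    let events = vec![b"conn-1".to_vec(), b"conn-2".to_vec()];
    let packed = pack(&events);
    println!("packed {} events into {} bytes", events.len(), packed.len());
    assert_eq!(unpack(&packed), events);
}
```

Compressing the packed buffer (e.g. with LZ4 or zstd) before the write would stack naturally on top of this, which is the "both strategies" option mentioned above.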
