
Store the database separately on two HDD RAIDs #667

Open
syncpark opened this issue Jan 25, 2024 · 2 comments
Labels
performance Performance improvement

Comments

@syncpark
Contributor

syncpark commented Jan 25, 2024

Issue

In the TIS project, the input traffic is 10 Gbps or 20 Gbps for each collector machine.
Since Giganto cannot process all events sent by a single Piglet, Giganto's storage/retrieval performance needs to be improved.

Purpose

Let's improve storage/retrieval performance by storing Conn events on a different HDD RAID than the other protocols.

Background

Event ratio by protocol:

  • Total: 5 billion events. Piglet generates this many events per day, but they cover only about 30% of the total bandwidth.
    • Conn events: 4.3 billion (86%)
    • Dns events: 0.11 billion (2.2%)
    • HTTP events: 0.034 billion (0.68%)
    • TLS (HTTPS) events: 0.51 billion (10.2%)
  • REconverge mainly analyzes protocols other than Conn.

If Conn events are stored on a separate HDD RAID, disk I/O contention with storage and search requests for the other protocols is reduced.
As a result, we can expect improved performance.

TODOs

  • Support setting different DB storage paths for Conn and the other protocols
  • Create and manage the Conn DB and the DB for the other protocols separately
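The TODOs above could be sketched as follows. This is a minimal illustration, not Giganto's actual configuration or API: the struct, field names, and paths are all hypothetical, standing in for a real setting that points each event class at a different RAID volume.

```rust
// Sketch only: hypothetical config fields and routing logic for storing
// Conn events on a separate HDD RAID from the other protocols.
use std::path::PathBuf;

/// Hypothetical storage settings: one DB path per RAID volume.
struct StorageConfig {
    conn_db_path: PathBuf,  // e.g. HDD RAID #1, dedicated to Conn events
    other_db_path: PathBuf, // e.g. HDD RAID #2, for all other protocols
}

/// Pick the DB path for an event based on its protocol kind.
fn db_path_for<'a>(cfg: &'a StorageConfig, kind: &str) -> &'a PathBuf {
    match kind {
        "conn" => &cfg.conn_db_path,
        _ => &cfg.other_db_path,
    }
}

fn main() {
    let cfg = StorageConfig {
        conn_db_path: PathBuf::from("/raid1/giganto/conn"),
        other_db_path: PathBuf::from("/raid2/giganto/other"),
    };
    println!("conn -> {}", db_path_for(&cfg, "conn").display());
    println!("dns  -> {}", db_path_for(&cfg, "dns").display());
}
```

With a routing function like this, the two RocksDB instances can be opened and compacted independently, so heavy Conn writes never queue behind reads of the other protocols' DB.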
@sehkone
Contributor

sehkone commented Jan 25, 2024

@syncpark
I'd like you to collect and organize the issues related to Giganto's performance, so I think the first step is to think about a big-picture strategy for improving it.

@msk, @sophie-cluml, let's discuss this together.

@sophie-cluml sophie-cluml added the performance Performance improvement label Jan 26, 2024
@msk
Contributor

msk commented Jan 26, 2024

This approach to parallelize the storage of Conn events and other protocols could result in a latency decrease of about 14% (100% - 86%), which is not negligible. However, I have concerns that this alone might not sufficiently address Giganto's scalability issues under heavy traffic.

To tackle the core of the problem, we first need to identify where the bottleneck lies. @syncpark's suggestion hints at the physical disk I/O being the constraint. If that's the case, a potential solution could be to increase the number of stripes in our RAID configuration. This might offer a simpler and possibly more effective way to enhance performance compared to separating Conn and other events.

On the other hand, if the bottleneck is at the level of RocksDB operations, like locking or transaction handling, splitting the events across multiple RocksDB instances on different disks could be beneficial. However, dividing them based on event type may not be the most efficient, particularly when a single type (e.g., Conn) dominates. A more balanced approach could be to distribute events evenly, perhaps using hash values.
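The even-distribution idea above could look like the following sketch: hash each event key and take the result modulo the number of RocksDB instances. The shard count of 4 and the key format are illustrative assumptions, not anything Giganto currently does.

```rust
// Sketch: spread events evenly across N RocksDB instances by key hash,
// as an alternative to splitting by event type (where Conn would dominate).
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_SHARDS: u64 = 4; // illustrative: one RocksDB instance per disk/RAID volume

/// Map an event key (e.g. source address + timestamp) to a shard index.
fn shard_for(key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % NUM_SHARDS
}

fn main() {
    for key in [b"10.0.0.1:1706140800".as_slice(), b"10.0.0.2:1706140801"] {
        println!("{:?} -> shard {}", key, shard_for(key));
    }
}
```

Unlike a per-protocol split, hashing keeps write load roughly balanced regardless of the traffic mix, at the cost of fanning a single-protocol range query out to every shard.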

Additionally, it’s crucial to consider how much CPU time is currently idle. If we have sufficient CPU resources available, we might explore more aggressive methods. These could include batching events for storage (e.g., storing 1,000 events in a single RocksDB column family entry), compressing events before storage, or implementing both strategies.
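The batching idea above could be sketched like this: pack many small events into one value before writing it as a single RocksDB column family entry. The length-prefixed encoding is an assumption for illustration, not Giganto's storage format.

```rust
// Sketch: batch many events into one value so a single RocksDB write
// covers e.g. 1,000 events, trading some CPU for far fewer key-value entries.

/// Concatenate events as [u32 little-endian length][bytes] records.
fn pack(events: &[Vec<u8>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for e in events {
        buf.extend_from_slice(&(e.len() as u32).to_le_bytes());
        buf.extend_from_slice(e);
    }
    buf
}

/// Split a packed buffer back into the individual events.
fn unpack(mut buf: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    while buf.len() >= 4 {
        let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
        out.push(buf[4..4 + len].to_vec());
        buf = &buf[4 + len..];
    }
    out
}

fn main() {
    let events = vec![b"conn-1".to_vec(), b"conn-2".to_vec()];
    let packed = pack(&events);
    println!("packed {} events into {} bytes", events.len(), packed.len());
    assert_eq!(unpack(&packed), events);
}
```

Compressing the packed buffer (e.g. with LZ4 or zstd) before the write would stack naturally on top of this, which is the "both strategies" option mentioned above.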
