[FEA] Distributed processing of Event Logs #1249
Comments
I am not sure I understand the problem. Is it about processing apps at runtime, or about the tool's resource requirements? Processing event logs requires large resources; for instance, the Spark History Server is known to need a lot of memory and compute to process event logs.
Previously, the python CLI had an option to submit the Tools jar as a Spark job. This was mainly a way to work with large event logs, since the CLI could spin up distributed Spark jobs.
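As a rough illustration of that distributed approach (not the actual CLI code), the sketch below parallelizes a list of event-log paths across a Spark cluster and runs a per-log handler on the executors. `analyze_event_log`, the app paths, and the app name are all hypothetical placeholders; in practice the per-log work would be whatever the Tools jar does for a single event log.

```python
from pyspark.sql import SparkSession


def analyze_event_log(path: str) -> tuple:
    """Hypothetical per-log handler; a real implementation would invoke the
    Tools parsing/analysis logic against the single event log at `path`."""
    return (path, "processed")


if __name__ == "__main__":
    spark = SparkSession.builder.appName("distributed-eventlog-processing").getOrCreate()
    sc = spark.sparkContext

    # Event-log locations; in a real run these could come from listing an HDFS/S3 directory.
    event_logs = [
        "hdfs:///spark-history/app-0001",
        "hdfs:///spark-history/app-0002",
    ]

    # Each executor processes a subset of the logs, so the work is no longer
    # bounded by the memory and compute of a single host.
    results = (
        sc.parallelize(event_logs, numSlices=len(event_logs))
          .map(analyze_event_log)
          .collect()
    )

    for path, status in results:
        print(path, status)

    spark.stop()
```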
Note that scaling can also be achieved by making a single machine run more efficiently, for example by storing the data in an on-disk database such as RocksDB instead of in memory. This issue should likely be split into multiple issues for the various improvements being made.
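A minimal sketch of that on-disk alternative, assuming the third-party `python-rocksdb` binding is available (the actual Tools core is JVM-based, so this is only illustrative): parsed events are written to a RocksDB store keyed by app id and event index instead of being held in an in-memory structure.

```python
import json

import rocksdb  # third-party binding; assumed installed via `pip install python-rocksdb`

# Open (or create) an on-disk store so parsed events do not have to stay in memory.
db = rocksdb.DB("parsed_events.db", rocksdb.Options(create_if_missing=True))

# Write a parsed event keyed by (app id, event index); the value is serialized JSON.
event = {"eventType": "SparkListenerTaskEnd", "stageId": 3, "durationMs": 1200}
db.put(b"app-0001:000042", json.dumps(event).encode("utf-8"))

# Later lookups read back from disk instead of an in-memory map.
raw = db.get(b"app-0001:000042")
if raw is not None:
    print(json.loads(raw))
```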
@tgravescs, yes, I agree. We had a previous issue #815 to track that.
Please note there are 2 other issues to improve processing of event logs on a single machine: |
Currently, we run the Tool (python+jar) on a single machine, which is limited by the memory and compute of the host machine. However, the Tools should have the capability to process event logs at large scale.
Although we do support running the Tools as a Spark Listener, that is not useful for apps that have already been processed.
Some of the ideas are:
- … `rapids_4_spark_qualification_output` directories.

cc: @viadea @kuhushukla