Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite kafka connector #444

Merged
merged 60 commits into from
Oct 13, 2023
Merged

rewrite kafka connector #444

merged 60 commits into from
Oct 13, 2023

Conversation

ekneg54
Copy link
Collaborator

@ekneg54 ekneg54 commented Sep 21, 2023

this PR rewrites the confluent kafka input and output processors with focus on the input side.
please take a look at the CHANGELOG.md for changes.

@ekneg54 ekneg54 self-assigned this Sep 21, 2023
@ekneg54 ekneg54 force-pushed the feature-rewrite-kafka-connector branch from a6169c0 to 73c8d9b Compare September 26, 2023 16:10
@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (9458a24) 91.97% compared to head (6a036bd) 92.01%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #444      +/-   ##
==========================================
+ Coverage   91.97%   92.01%   +0.03%     
==========================================
  Files         132      132              
  Lines        9389     9476      +87     
==========================================
+ Hits         8636     8719      +83     
- Misses        753      757       +4     
Files Coverage Δ
logprep/abc/component.py 100.00% <100.00%> (ø)
logprep/abc/input.py 99.30% <100.00%> (ø)
logprep/connector/confluent_kafka/output.py 100.00% <100.00%> (ø)
logprep/connector/elasticsearch/output.py 98.24% <100.00%> (ø)
logprep/framework/pipeline.py 97.89% <ø> (-0.04%) ⬇️
logprep/util/prometheus_exporter.py 95.45% <100.00%> (+0.10%) ⬆️
logprep/util/validators.py 96.36% <66.66%> (-3.64%) ⬇️
logprep/connector/confluent_kafka/input.py 92.59% <90.62%> (-0.82%) ⬇️

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ekneg54 ekneg54 force-pushed the feature-rewrite-kafka-connector branch from 4402f56 to eae00ea Compare September 28, 2023 15:51
@ekneg54 ekneg54 added the enhancement New feature or request label Oct 2, 2023
@ekneg54 ekneg54 requested a review from ppcad October 3, 2023 18:42
@ekneg54 ekneg54 marked this pull request as ready for review October 3, 2023 18:42
@ekneg54 ekneg54 marked this pull request as draft October 4, 2023 10:56
@ekneg54 ekneg54 marked this pull request as ready for review October 5, 2023 17:22
@ekneg54 ekneg54 force-pushed the feature-rewrite-kafka-connector branch from 5128344 to 9f7facd Compare October 6, 2023 06:30
@ekneg54
Copy link
Collaborator Author

ekneg54 commented Oct 6, 2023

rebased from main.

@ppcad now it`s ready to rumble 👍

@ekneg54 ekneg54 force-pushed the feature-rewrite-kafka-connector branch from 9f7facd to 724f31d Compare October 9, 2023 15:41
Copy link
Collaborator

@ppcad ppcad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever group.id is missing from the kafka_config the error message is Required option is missing: input instead of telling what is missing.

Whenever bootstrap.servers is missing from the kafka_config Logprep crashes with a stacktrace.

group: "cgroup" in the confluent_kafka input example should be updated to group.id: "cgroup".

I have ran a couple of tests to compare it with the old version. I used the same config parameters. I activated only the labeler with no rules. In that case, the new version was about 25% slower than the old version. Do you have an idea why that could be?

@ekneg54
Copy link
Collaborator Author

ekneg54 commented Oct 11, 2023

it is possible, that the statistics.interval.ms default was too low. It was set to 1s. I increased the default to 30 seconds.

@ekneg54
Copy link
Collaborator Author

ekneg54 commented Oct 11, 2023

I added a validation for mandatory keys in kafka_config object. This prevents the Exceptions during start and logs the missing keys.

@ekneg54 ekneg54 requested a review from ppcad October 11, 2023 15:15
@ekneg54 ekneg54 force-pushed the feature-rewrite-kafka-connector branch from cb5d712 to 6a036bd Compare October 11, 2023 15:17
@ppcad
Copy link
Collaborator

ppcad commented Oct 12, 2023

I added a validation for mandatory keys in kafka_config object. This prevents the Exceptions during start and logs the missing keys.

Thanks for the changes. This solves the problem.

@ppcad
Copy link
Collaborator

ppcad commented Oct 12, 2023

it is possible, that the statistics.interval.ms default was too low. It was set to 1s. I increased the default to 30 seconds.

This does not seem to be the reason for the performance loss.

It seems that the new version is faster than the old one with 1 process, but slower once I increase the amount up to the partition count.

@ekneg54 ekneg54 merged commit eb4db4b into main Oct 13, 2023
@ekneg54 ekneg54 deleted the feature-rewrite-kafka-connector branch February 7, 2024 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants