fix(kubernetes_logs): delaying applied new pod #13886

Closed
wants to merge 3 commits into from

sillent
Contributor

@sillent sillent commented Aug 8, 2022

If my guess described in issue #13467 is correct, then as a quick fix I suggest putting the event for the creation of a new pod into a delayed queue, with a delay equal to the delay used for deleting a pod plus one second.
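To illustrate the ordering this change relies on, here is a minimal sketch using only the standard library. The `DelayQueue`, `Event`, and `replay` names are hypothetical and are not Vector's actual types; the 60-second delete delay in the usage note is also an assumed value. The sketch assumes metadata is keyed by pod name, so a delayed delete for an old pod can remove the entry of a new pod with the same name unless the apply is processed later.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical pod lifecycle events (not Vector's real event type).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Apply { name: String, uid: String },
    Delete { name: String },
}

/// Hypothetical delayed queue: an event becomes due at `received_at + delay`.
#[derive(Default)]
pub struct DelayQueue {
    pending: Vec<(Duration, Event)>,
}

impl DelayQueue {
    pub fn push(&mut self, received_at: Duration, delay: Duration, event: Event) {
        self.pending.push((received_at + delay, event));
    }

    /// Return all events ordered by due time (the sort is stable on ties).
    pub fn drain_in_order(mut self) -> Vec<Event> {
        self.pending.sort_by_key(|(due, _)| *due);
        self.pending.into_iter().map(|(_, event)| event).collect()
    }
}

/// Replay events against a name-keyed metadata map and return the final state.
pub fn replay(events: Vec<Event>) -> HashMap<String, String> {
    let mut by_name = HashMap::new();
    for event in events {
        match event {
            Event::Apply { name, uid } => {
                by_name.insert(name, uid);
            }
            Event::Delete { name } => {
                by_name.remove(&name);
            }
        }
    }
    by_name
}
```

If the delete is received at t=0 with a 60s delay and the apply for the same name is received at t=1s with no delay, the delete is processed last and the map ends up empty. Giving the apply a delay of 61s makes it due at t=62s, after the delete, so the new pod's entry survives. The result depends entirely on the relative timing of the two events.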

@netlify

netlify bot commented Aug 8, 2022

Deploy Preview for vector-project canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | f366ef3 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/vector-project/deploys/62f1f306b9fdf20008a7484b |

@github-actions

github-actions bot commented Aug 8, 2022

Soak Test Results

Baseline: 8f2ae47
Comparison: d51e1a4
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed by a pull request and to what degree. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
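To make these thresholds concrete, here is a minimal sketch of the two checks described in this explanation. The constant and function names are hypothetical, and this is not the soak rig's actual code. It assumes the coefficient of variation is the standard deviation divided by the mean, and that the Δ mean threshold applies to the absolute percentage change.

```rust
/// Thresholds quoted in the soak report explanation.
const ERRATIC_COV: f64 = 0.3;
const MIN_CONFIDENCE_PCT: f64 = 90.0;
const MIN_ABS_DELTA_PCT: f64 = 8.87;

/// Coefficient of variation: standard deviation relative to the mean.
/// Both arguments must use the same unit (for example MiB).
pub fn coefficient_of_variation(mean: f64, stdev: f64) -> f64 {
    stdev / mean
}

/// An experiment is erratic if its coefficient of variation exceeds 0.3.
pub fn is_erratic(mean: f64, stdev: f64) -> bool {
    coefficient_of_variation(mean, stdev) > ERRATIC_COV
}

/// A change is reported only if it is significant at the 90% level and
/// the mean throughput moved by more than 8.87% in either direction.
pub fn is_interesting(delta_mean_pct: f64, confidence_pct: f64) -> bool {
    confidence_pct >= MIN_CONFIDENCE_PCT && delta_mean_pct.abs() > MIN_ABS_DELTA_PCT
}
```

With the figures reported for `http_to_http_acks` (baseline mean 17.44MiB, baseline stdev 8.12MiB), the coefficient of variation is about 0.47, which is above the 0.3 threshold, so the experiment counts as erratic.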

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ 8.87%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_log2metric_splunk_hec_metrics | 659.68KiB | 3.61 | 100.00% | 17.85MiB | 620.48KiB | 12.64KiB | 0 | 0.0339313 | 18.5MiB | 721.71KiB | 14.7KiB | 0 | 0.0380928 | False | False |
| syslog_humio_logs | 585.81KiB | 3.45 | 100.00% | 16.56MiB | 127.26KiB | 2.6KiB | 0 | 0.00750173 | 17.13MiB | 111.08KiB | 2.27KiB | 0 | 0.00632975 | False | False |
| http_to_http_acks | 565.99KiB | 3.17 | 98.13% | 17.44MiB | 8.12MiB | 169.77KiB | 0 | 0.465617 | 17.99MiB | 8.17MiB | 170.42KiB | 0 | 0.454096 | True | True |
| syslog_regex_logs2metric_ddmetrics | 342.5KiB | 2.72 | 100.00% | 12.32MiB | 538.04KiB | 10.96KiB | 0 | 0.0426466 | 12.65MiB | 649.93KiB | 13.24KiB | 0 | 0.0501534 | False | False |
| syslog_splunk_hec_logs | 433.47KiB | 2.63 | 100.00% | 16.1MiB | 848.69KiB | 17.26KiB | 0 | 0.0514555 | 16.53MiB | 714.64KiB | 14.56KiB | 0 | 0.0422186 | False | False |
| syslog_loki | 309.99KiB | 2.13 | 100.00% | 14.25MiB | 306.33KiB | 6.27KiB | 0 | 0.0209949 | 14.55MiB | 732.35KiB | 14.89KiB | 0 | 0.0491494 | False | False |
| http_text_to_http_json | 786.89KiB | 2.02 | 100.00% | 38.11MiB | 1.12MiB | 23.51KiB | 0 | 0.0295062 | 38.88MiB | 1.21MiB | 25.33KiB | 0 | 0.0311694 | False | False |
| splunk_hec_route_s3 | 378.07KiB | 1.95 | 100.00% | 18.9MiB | 2.2MiB | 45.89KiB | 0 | 0.116531 | 19.27MiB | 2.13MiB | 44.46KiB | 0 | 0.110353 | False | False |
| syslog_log2metric_humio_metrics | 243.24KiB | 1.89 | 100.00% | 12.55MiB | 246.93KiB | 5.04KiB | 0 | 0.0192059 | 12.79MiB | 483.13KiB | 9.84KiB | 0 | 0.0368792 | False | False |
| datadog_agent_remap_datadog_logs_acks | 257.39KiB | 0.41 | 96.73% | 61.53MiB | 3.54MiB | 73.97KiB | 0 | 0.057503 | 61.78MiB | 4.57MiB | 95.08KiB | 0 | 0.0739038 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 31.79KiB | 0.13 | 98.15% | 23.81MiB | 572.31KiB | 11.67KiB | 0 | 0.023472 | 23.84MiB | 331.46KiB | 6.77KiB | 0 | 0.0135763 | False | False |
| datadog_agent_remap_blackhole | 66.9KiB | 0.1 | 41.36% | 62.36MiB | 4.52MiB | 94.29KiB | 0 | 0.0725061 | 62.43MiB | 3.78MiB | 78.9KiB | 0 | 0.0605577 | False | False |
| splunk_hec_indexer_ack_blackhole | -1.73KiB | -0.01 | 5.35% | 23.75MiB | 882.37KiB | 17.94KiB | 0 | 0.0362806 | 23.74MiB | 907.77KiB | 18.46KiB | 0 | 0.0373276 | False | False |
| enterprise_http_to_http | -2.0KiB | -0.01 | 21.78% | 23.85MiB | 248.69KiB | 5.08KiB | 0 | 0.0101819 | 23.84MiB | 252.86KiB | 5.17KiB | 0 | 0.0103536 | False | False |
| http_pipelines_blackhole_acks | -373.81B | -0.03 | 9.27% | 1.12MiB | 119.12KiB | 2.43KiB | 0 | 0.104195 | 1.12MiB | 97.39KiB | 1.99KiB | 0 | 0.0852117 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -11.22KiB | -0.05 | 36.46% | 23.76MiB | 800.34KiB | 16.29KiB | 0 | 0.032882 | 23.75MiB | 843.54KiB | 17.16KiB | 0 | 0.0346731 | False | False |
| file_to_blackhole | -58.77KiB | -0.06 | 43.39% | 95.34MiB | 3.16MiB | 65.49KiB | 0 | 0.0331309 | 95.28MiB | 3.78MiB | 78.73KiB | 0 | 0.0397037 | False | False |
| socket_to_socket_blackhole | -18.05KiB | -0.07 | 88.78% | 23.7MiB | 280.93KiB | 5.73KiB | 0 | 0.0115722 | 23.68MiB | 480.18KiB | 9.8KiB | 0 | 0.0197943 | False | False |
| http_to_http_json | -32.45KiB | -0.13 | 98.81% | 23.84MiB | 355.24KiB | 7.25KiB | 0 | 0.0145472 | 23.81MiB | 522.51KiB | 10.67KiB | 0 | 0.021425 | False | False |
| datadog_agent_remap_blackhole_acks | -196.29KiB | -0.3 | 91.61% | 62.98MiB | 4.69MiB | 97.69KiB | 0 | 0.0744991 | 62.79MiB | 2.76MiB | 57.85KiB | 0 | 0.0440005 | False | False |
| http_to_http_noack | -119.94KiB | -0.49 | 100.00% | 23.85MiB | 256.01KiB | 5.24KiB | 0 | 0.0104822 | 23.73MiB | 1.17MiB | 24.43KiB | 0 | 0.0493944 | False | False |
| datadog_agent_remap_datadog_logs | -332.64KiB | -0.53 | 99.92% | 61.57MiB | 2.22MiB | 46.6KiB | 0 | 0.0360783 | 61.24MiB | 4.21MiB | 87.72KiB | 0 | 0.0687559 | False | False |
| fluent_elasticsearch | -502.91KiB | -0.62 | 100.00% | 79.47MiB | 53.7KiB | 1.08KiB | 0 | 0.000659681 | 78.98MiB | 4.69MiB | 96.2KiB | 0 | 0.0593516 | False | False |
| http_pipelines_no_grok_blackhole | -68.28KiB | -0.62 | 99.90% | 10.76MiB | 122.34KiB | 2.5KiB | 0 | 0.0110988 | 10.7MiB | 1010.98KiB | 20.57KiB | 0 | 0.0922918 | False | False |
| http_pipelines_blackhole | -17.28KiB | -1.06 | 100.00% | 1.6MiB | 36.77KiB | 769.44B | 0 | 0.0224863 | 1.58MiB | 96.41KiB | 1.97KiB | 0 | 0.059594 | False | False |

@sillent changed the title from "enhancement(kubernetes_logs): delaying applied new pod" to "fix(kubernetes_logs): delaying applied new pod" on Aug 9, 2022
@github-actions

github-actions bot commented Aug 9, 2022

Soak Test Results

Baseline: afcee9b
Comparison: f366ef3
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed by a pull request and to what degree. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ 8.87%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_log2metric_splunk_hec_metrics | 703.07KiB | 3.86 | 100.00% | 17.8MiB | 605.24KiB | 12.33KiB | 0 | 0.033202 | 18.48MiB | 793.91KiB | 16.16KiB | 0 | 0.0419343 | False | False |
| syslog_humio_logs | 595.26KiB | 3.52 | 100.00% | 16.51MiB | 154.04KiB | 3.15KiB | 0 | 0.00911044 | 17.09MiB | 158.97KiB | 3.26KiB | 0 | 0.00908246 | False | False |
| syslog_splunk_hec_logs | 576.07KiB | 3.47 | 100.00% | 16.19MiB | 799.06KiB | 16.25KiB | 0 | 0.0481784 | 16.76MiB | 552.12KiB | 11.27KiB | 0 | 0.032172 | False | False |
| http_to_http_acks | 609.83KiB | 3.41 | 98.98% | 17.49MiB | 8.01MiB | 167.37KiB | 0 | 0.457644 | 18.08MiB | 8.06MiB | 168.22KiB | 0 | 0.445476 | True | True |
| syslog_regex_logs2metric_ddmetrics | 397.57KiB | 3.13 | 100.00% | 12.39MiB | 534.65KiB | 10.89KiB | 0 | 0.0421283 | 12.78MiB | 521.55KiB | 10.64KiB | 0 | 0.0398481 | False | False |
| http_text_to_http_json | 1.08MiB | 2.78 | 100.00% | 39.03MiB | 782.67KiB | 15.98KiB | 0 | 0.0195813 | 40.11MiB | 719.8KiB | 14.7KiB | 0 | 0.0175213 | False | False |
| splunk_hec_route_s3 | 403.04KiB | 2.09 | 100.00% | 18.8MiB | 2.27MiB | 47.18KiB | 0 | 0.120532 | 19.19MiB | 2.17MiB | 45.39KiB | 0 | 0.113068 | False | False |
| datadog_agent_remap_datadog_logs_acks | 972.82KiB | 1.5 | 100.00% | 63.23MiB | 3.49MiB | 72.88KiB | 0 | 0.0551809 | 64.18MiB | 4.32MiB | 89.88KiB | 0 | 0.0672386 | False | False |
| datadog_agent_remap_datadog_logs | 789.12KiB | 1.23 | 100.00% | 62.44MiB | 1.56MiB | 32.8KiB | 0 | 0.0250262 | 63.21MiB | 4.28MiB | 89.16KiB | 0 | 0.0676999 | False | False |
| datadog_agent_remap_blackhole | 738.94KiB | 1.13 | 100.00% | 63.65MiB | 4.29MiB | 89.28KiB | 0 | 0.0673429 | 64.37MiB | 2.39MiB | 49.93KiB | 0 | 0.0371378 | False | False |
| syslog_loki | 114.85KiB | 0.76 | 100.00% | 14.83MiB | 264.9KiB | 5.43KiB | 0 | 0.0174428 | 14.94MiB | 741.73KiB | 15.08KiB | 0 | 0.0484748 | False | False |
| socket_to_socket_blackhole | 107.81KiB | 0.46 | 100.00% | 23.06MiB | 539.17KiB | 11.01KiB | 0 | 0.0228251 | 23.17MiB | 523.16KiB | 10.68KiB | 0 | 0.0220469 | False | False |
| datadog_agent_remap_blackhole_acks | 96.11KiB | 0.15 | 59.00% | 60.88MiB | 4.57MiB | 95.18KiB | 0 | 0.0750415 | 60.97MiB | 3.22MiB | 67.4KiB | 0 | 0.0528093 | False | False |
| syslog_log2metric_humio_metrics | 13.21KiB | 0.1 | 74.00% | 12.96MiB | 245.1KiB | 5.0KiB | 0 | 0.0184651 | 12.97MiB | 520.72KiB | 10.6KiB | 0 | 0.0391906 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 18.86KiB | 0.08 | 88.52% | 23.82MiB | 483.29KiB | 9.86KiB | 0 | 0.0198099 | 23.84MiB | 331.6KiB | 6.77KiB | 0 | 0.0135817 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 3.55KiB | 0.01 | 11.70% | 23.75MiB | 847.0KiB | 17.22KiB | 0 | 0.0348198 | 23.75MiB | 831.68KiB | 16.92KiB | 0 | 0.0341851 | False | False |
| enterprise_http_to_http | -528.5B | -0 | 5.74% | 23.85MiB | 250.39KiB | 5.11KiB | 0 | 0.0102522 | 23.85MiB | 245.62KiB | 5.03KiB | 0 | 0.0100573 | False | False |
| splunk_hec_indexer_ack_blackhole | -8.64KiB | -0.04 | 27.73% | 23.76MiB | 824.84KiB | 16.78KiB | 0 | 0.033895 | 23.75MiB | 867.94KiB | 17.65KiB | 0 | 0.0356784 | False | False |
| file_to_blackhole | -48.65KiB | -0.05 | 36.18% | 95.34MiB | 3.33MiB | 69.03KiB | 0 | 0.0349204 | 95.29MiB | 3.71MiB | 77.06KiB | 0 | 0.0388786 | False | False |
| http_to_http_json | -53.67KiB | -0.22 | 99.98% | 23.85MiB | 347.67KiB | 7.1KiB | 0 | 0.0142342 | 23.79MiB | 600.48KiB | 12.24KiB | 0 | 0.0246392 | False | False |
| http_to_http_noack | -96.29KiB | -0.39 | 99.99% | 23.84MiB | 403.54KiB | 8.26KiB | 0 | 0.0165274 | 23.75MiB | 1.09MiB | 22.78KiB | 0 | 0.0460104 | False | False |
| fluent_elasticsearch | -468.73KiB | -0.58 | 100.00% | 79.47MiB | 53.08KiB | 1.07KiB | 0 | 0.000652058 | 79.02MiB | 4.43MiB | 91.03KiB | 0 | 0.0561143 | False | False |
| http_pipelines_no_grok_blackhole | -103.1KiB | -0.89 | 100.00% | 11.26MiB | 68.41KiB | 1.4KiB | 0 | 0.00593076 | 11.16MiB | 1.11MiB | 23.21KiB | 0 | 0.0998472 | False | False |
| http_pipelines_blackhole_acks | -11.65KiB | -0.96 | 100.00% | 1.18MiB | 103.4KiB | 2.11KiB | 0 | 0.0853527 | 1.17MiB | 69.51KiB | 1.42KiB | 0 | 0.0579307 | False | False |
| http_pipelines_blackhole | -21.2KiB | -1.27 | 100.00% | 1.63MiB | 28.02KiB | 585.68B | 0 | 0.0167766 | 1.61MiB | 104.98KiB | 2.14KiB | 0 | 0.063662 | False | False |

@tobz
Contributor

tobz commented Aug 9, 2022

Reading through the original issue, I don't believe this is a suitable fix.

Anything that is based on the timing of an event is too fragile to be considered reliable in production workloads. Based on the comments that @jszwedko left, it seems like the correct solution should revolve around including a better unique identifier so that when we delete the metadata for the pod, we only delete the metadata for the previous pod, not the new pod that shares the same name.

If you're willing to try to work up a solution based on that approach, feel free to do so in this PR. Otherwise, we would end up closing this one.
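As a rough illustration of the identifier-based approach described above, here is a minimal sketch. It assumes the pod's Kubernetes `metadata.uid` serves as the unique identifier; the thread does not say which identifier was intended. The `PodMeta` and `MetaStore` types are hypothetical and are not Vector's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified pod metadata (not Vector's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct PodMeta {
    pub namespace: String,
    pub name: String,
    pub uid: String,
}

/// Metadata store keyed by pod UID rather than by pod name.
#[derive(Default)]
pub struct MetaStore {
    by_uid: HashMap<String, PodMeta>,
}

impl MetaStore {
    /// Insert or update the metadata for one pod.
    pub fn apply(&mut self, pod: PodMeta) {
        self.by_uid.insert(pod.uid.clone(), pod);
    }

    /// Remove only the entry with this exact UID. A newer pod that reuses
    /// the same name has a different UID and is left untouched.
    pub fn delete(&mut self, uid: &str) -> Option<PodMeta> {
        self.by_uid.remove(uid)
    }

    /// Find a stored pod by namespace and name.
    pub fn find_by_name(&self, namespace: &str, name: &str) -> Option<&PodMeta> {
        self.by_uid
            .values()
            .find(|pod| pod.namespace == namespace && pod.name == name)
    }
}
```

With this layout, a delayed delete for the old pod removes only the old entry, no matter when it is processed relative to the apply for the new pod, so no timing assumption is needed.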

@sillent
Contributor Author

sillent commented Aug 9, 2022

No problem. I do not have a complete understanding of the architecture of this solution. If, in your opinion, this is not the right place to fix the problem, then the PR can be closed.

@sillent sillent closed this Aug 9, 2022
@sillent sillent deleted the 13467-fix-kubernets_logs branch September 18, 2022 06:51