fix(kubernetes_logs): delaying applied new pod #13886

sillent · 2022-08-08T20:22:55Z

If my guess described in issue #13467 is correct, then as a quick solution I suggest putting the event for the creation of a new pod into a delayed queue, with a delay equal to the pod-deletion delay plus one second.
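A minimal sketch of the ordering argument behind this proposal. It is written in Python as a language-neutral illustration, not Vector's actual Rust watcher code. `DELETE_DELAY`, `replay`, and the `(time, kind, name)` event tuples are hypothetical; the sketch only assumes that deletions are held back by a fixed delay and that, under the proposal, "applied" events are held back by that delay plus one second.

```python
import heapq

# Hypothetical stand-in for the watcher's pod-deletion delay, in seconds.
DELETE_DELAY = 60.0


def replay(events, delay_applied):
    """Replay (time, kind, name) watch events through a simulated delayed queue.

    Deletes are always held back by DELETE_DELAY. When delay_applied is True,
    "applied" events are held back by DELETE_DELAY + 1 second, as proposed here.
    Returns the set of pod names whose metadata is still present at the end.
    """
    queue = []
    for seq, (t, kind, name) in enumerate(events):
        if kind == "deleted":
            due = t + DELETE_DELAY
        elif delay_applied:
            due = t + DELETE_DELAY + 1.0
        else:
            due = t
        # seq breaks ties so events due at the same time keep arrival order.
        heapq.heappush(queue, (due, seq, kind, name))

    store = set()
    while queue:
        _, _, kind, name = heapq.heappop(queue)
        if kind == "applied":
            store.add(name)
        else:
            store.discard(name)
    return store


events = [
    (0.0, "applied", "web-0"),   # original pod appears
    (10.0, "deleted", "web-0"),  # original pod is deleted
    (12.0, "applied", "web-0"),  # replacement pod with the same name appears
]
print(replay(events, delay_applied=False))  # set()
print(replay(events, delay_applied=True))   # {'web-0'}
```

Without the extra delay, the old pod's delayed delete fires after the replacement pod was applied and removes its metadata; with the extra delay, the replacement is applied last. The outcome depends only on the relative size of the two delays, and in this sketch every newly applied pod also has no metadata until its delay expires.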

netlify · 2022-08-08T20:22:59Z

✅ Deploy Preview for vector-project canceled.

Name	Link
🔨 Latest commit	`f366ef3`
🔍 Latest deploy log	https://app.netlify.com/sites/vector-project/deploys/62f1f306b9fdf20008a7484b

github-actions · 2022-08-08T23:50:41Z

Soak Test Results

Baseline: 8f2ae47
Comparison: d51e1a4
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether and to what degree a pull request changes vector's performance. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
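The selection rules in this explanation can be restated as a small Python sketch. This is one reading of the text, not the soak-test harness's actual code; the function names are hypothetical, and the 8.87% threshold is treated as inclusive. The coefficient of variation is standard deviation divided by mean, which matches the table (for example, 8.12 MiB / 17.44 MiB ≈ 0.4656 for `http_to_http_acks`).

```python
def coefficient_of_variation(stdev, mean):
    """CoV = standard deviation / mean (unitless)."""
    return stdev / mean


def is_erratic(cov, limit=0.3):
    """An experiment is erratic if its CoV is greater than the limit."""
    return cov > limit


def is_interesting(delta_mean_pct, confidence, baseline_cov, comparison_cov,
                   min_confidence=0.90, min_delta_pct=8.87):
    """Return True if an experiment would be listed in the abbreviated table.

    Listed if the change is both confident and large enough, OR if the
    comparison run is erratic while the baseline run was not (newly erratic).
    """
    significant = (confidence >= min_confidence
                   and abs(delta_mean_pct) >= min_delta_pct)
    newly_erratic = is_erratic(comparison_cov) and not is_erratic(baseline_cov)
    return significant or newly_erratic


# http_to_http_acks (first run): erratic in both SHAs, so not *newly* erratic,
# and its 3.17% change is below the threshold.
print(round(coefficient_of_variation(8.12, 17.44), 4))        # 0.4656
print(is_interesting(3.17, 0.9813, 0.465617, 0.454096))       # False
```

This is why `http_to_http_acks` is flagged as erratic in the detailed table yet no abbreviated table of interesting changes is produced.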

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ 8.87%.

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
syslog_log2metric_splunk_hec_metrics	659.68KiB	3.61	100.00%	17.85MiB	620.48KiB	12.64KiB	0.0339313	18.5MiB	721.71KiB	14.7KiB	0.0380928	False	False
syslog_humio_logs	585.81KiB	3.45	100.00%	16.56MiB	127.26KiB	2.6KiB	0.00750173	17.13MiB	111.08KiB	2.27KiB	0.00632975	False	False
http_to_http_acks	565.99KiB	3.17	98.13%	17.44MiB	8.12MiB	169.77KiB	0.465617	17.99MiB	8.17MiB	170.42KiB	0.454096	True	True
syslog_regex_logs2metric_ddmetrics	342.5KiB	2.72	100.00%	12.32MiB	538.04KiB	10.96KiB	0.0426466	12.65MiB	649.93KiB	13.24KiB	0.0501534	False	False
syslog_splunk_hec_logs	433.47KiB	2.63	100.00%	16.1MiB	848.69KiB	17.26KiB	0.0514555	16.53MiB	714.64KiB	14.56KiB	0.0422186	False	False
syslog_loki	309.99KiB	2.13	100.00%	14.25MiB	306.33KiB	6.27KiB	0.0209949	14.55MiB	732.35KiB	14.89KiB	0.0491494	False	False
http_text_to_http_json	786.89KiB	2.02	100.00%	38.11MiB	1.12MiB	23.51KiB	0.0295062	38.88MiB	1.21MiB	25.33KiB	0.0311694	False	False
splunk_hec_route_s3	378.07KiB	1.95	100.00%	18.9MiB	2.2MiB	45.89KiB	0.116531	19.27MiB	2.13MiB	44.46KiB	0.110353	False	False
syslog_log2metric_humio_metrics	243.24KiB	1.89	100.00%	12.55MiB	246.93KiB	5.04KiB	0.0192059	12.79MiB	483.13KiB	9.84KiB	0.0368792	False	False
datadog_agent_remap_datadog_logs_acks	257.39KiB	0.41	96.73%	61.53MiB	3.54MiB	73.97KiB	0.057503	61.78MiB	4.57MiB	95.08KiB	0.0739038	False	False
splunk_hec_to_splunk_hec_logs_noack	31.79KiB	0.13	98.15%	23.81MiB	572.31KiB	11.67KiB	0.023472	23.84MiB	331.46KiB	6.77KiB	0.0135763	False	False
datadog_agent_remap_blackhole	66.9KiB	0.1	41.36%	62.36MiB	4.52MiB	94.29KiB	0.0725061	62.43MiB	3.78MiB	78.9KiB	0.0605577	False	False
splunk_hec_indexer_ack_blackhole	-1.73KiB	-0.01	5.35%	23.75MiB	882.37KiB	17.94KiB	0.0362806	23.74MiB	907.77KiB	18.46KiB	0.0373276	False	False
enterprise_http_to_http	-2.0KiB	-0.01	21.78%	23.85MiB	248.69KiB	5.08KiB	0.0101819	23.84MiB	252.86KiB	5.17KiB	0.0103536	False	False
http_pipelines_blackhole_acks	-373.81B	-0.03	9.27%	1.12MiB	119.12KiB	2.43KiB	0.104195	1.12MiB	97.39KiB	1.99KiB	0.0852117	False	False
splunk_hec_to_splunk_hec_logs_acks	-11.22KiB	-0.05	36.46%	23.76MiB	800.34KiB	16.29KiB	0.032882	23.75MiB	843.54KiB	17.16KiB	0.0346731	False	False
file_to_blackhole	-58.77KiB	-0.06	43.39%	95.34MiB	3.16MiB	65.49KiB	0.0331309	95.28MiB	3.78MiB	78.73KiB	0.0397037	False	False
socket_to_socket_blackhole	-18.05KiB	-0.07	88.78%	23.7MiB	280.93KiB	5.73KiB	0.0115722	23.68MiB	480.18KiB	9.8KiB	0.0197943	False	False
http_to_http_json	-32.45KiB	-0.13	98.81%	23.84MiB	355.24KiB	7.25KiB	0.0145472	23.81MiB	522.51KiB	10.67KiB	0.021425	False	False
datadog_agent_remap_blackhole_acks	-196.29KiB	-0.3	91.61%	62.98MiB	4.69MiB	97.69KiB	0.0744991	62.79MiB	2.76MiB	57.85KiB	0.0440005	False	False
http_to_http_noack	-119.94KiB	-0.49	100.00%	23.85MiB	256.01KiB	5.24KiB	0.0104822	23.73MiB	1.17MiB	24.43KiB	0.0493944	False	False
datadog_agent_remap_datadog_logs	-332.64KiB	-0.53	99.92%	61.57MiB	2.22MiB	46.6KiB	0.0360783	61.24MiB	4.21MiB	87.72KiB	0.0687559	False	False
fluent_elasticsearch	-502.91KiB	-0.62	100.00%	79.47MiB	53.7KiB	1.08KiB	0.000659681	78.98MiB	4.69MiB	96.2KiB	0.0593516	False	False
http_pipelines_no_grok_blackhole	-68.28KiB	-0.62	99.90%	10.76MiB	122.34KiB	2.5KiB	0.0110988	10.7MiB	1010.98KiB	20.57KiB	0.0922918	False	False
http_pipelines_blackhole	-17.28KiB	-1.06	100.00%	1.6MiB	36.77KiB	769.44B	0.0224863	1.58MiB	96.41KiB	1.97KiB	0.059594	False	False

github-actions · 2022-08-09T07:19:43Z

Soak Test Results

Baseline: afcee9b
Comparison: f366ef3
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ 8.87%.

Fine details of change detection per experiment.

experiment	Δ mean	Δ mean %	confidence	baseline mean	baseline stdev	baseline stderr	baseline CoV	comparison mean	comparison stdev	comparison stderr	comparison CoV	erratic	declared erratic
syslog_log2metric_splunk_hec_metrics	703.07KiB	3.86	100.00%	17.8MiB	605.24KiB	12.33KiB	0.033202	18.48MiB	793.91KiB	16.16KiB	0.0419343	False	False
syslog_humio_logs	595.26KiB	3.52	100.00%	16.51MiB	154.04KiB	3.15KiB	0.00911044	17.09MiB	158.97KiB	3.26KiB	0.00908246	False	False
syslog_splunk_hec_logs	576.07KiB	3.47	100.00%	16.19MiB	799.06KiB	16.25KiB	0.0481784	16.76MiB	552.12KiB	11.27KiB	0.032172	False	False
http_to_http_acks	609.83KiB	3.41	98.98%	17.49MiB	8.01MiB	167.37KiB	0.457644	18.08MiB	8.06MiB	168.22KiB	0.445476	True	True
syslog_regex_logs2metric_ddmetrics	397.57KiB	3.13	100.00%	12.39MiB	534.65KiB	10.89KiB	0.0421283	12.78MiB	521.55KiB	10.64KiB	0.0398481	False	False
http_text_to_http_json	1.08MiB	2.78	100.00%	39.03MiB	782.67KiB	15.98KiB	0.0195813	40.11MiB	719.8KiB	14.7KiB	0.0175213	False	False
splunk_hec_route_s3	403.04KiB	2.09	100.00%	18.8MiB	2.27MiB	47.18KiB	0.120532	19.19MiB	2.17MiB	45.39KiB	0.113068	False	False
datadog_agent_remap_datadog_logs_acks	972.82KiB	1.5	100.00%	63.23MiB	3.49MiB	72.88KiB	0.0551809	64.18MiB	4.32MiB	89.88KiB	0.0672386	False	False
datadog_agent_remap_datadog_logs	789.12KiB	1.23	100.00%	62.44MiB	1.56MiB	32.8KiB	0.0250262	63.21MiB	4.28MiB	89.16KiB	0.0676999	False	False
datadog_agent_remap_blackhole	738.94KiB	1.13	100.00%	63.65MiB	4.29MiB	89.28KiB	0.0673429	64.37MiB	2.39MiB	49.93KiB	0.0371378	False	False
syslog_loki	114.85KiB	0.76	100.00%	14.83MiB	264.9KiB	5.43KiB	0.0174428	14.94MiB	741.73KiB	15.08KiB	0.0484748	False	False
socket_to_socket_blackhole	107.81KiB	0.46	100.00%	23.06MiB	539.17KiB	11.01KiB	0.0228251	23.17MiB	523.16KiB	10.68KiB	0.0220469	False	False
datadog_agent_remap_blackhole_acks	96.11KiB	0.15	59.00%	60.88MiB	4.57MiB	95.18KiB	0.0750415	60.97MiB	3.22MiB	67.4KiB	0.0528093	False	False
syslog_log2metric_humio_metrics	13.21KiB	0.1	74.00%	12.96MiB	245.1KiB	5.0KiB	0.0184651	12.97MiB	520.72KiB	10.6KiB	0.0391906	False	False
splunk_hec_to_splunk_hec_logs_noack	18.86KiB	0.08	88.52%	23.82MiB	483.29KiB	9.86KiB	0.0198099	23.84MiB	331.6KiB	6.77KiB	0.0135817	False	False
splunk_hec_to_splunk_hec_logs_acks	3.55KiB	0.01	11.70%	23.75MiB	847.0KiB	17.22KiB	0.0348198	23.75MiB	831.68KiB	16.92KiB	0.0341851	False	False
enterprise_http_to_http	-528.5B	-0	5.74%	23.85MiB	250.39KiB	5.11KiB	0.0102522	23.85MiB	245.62KiB	5.03KiB	0.0100573	False	False
splunk_hec_indexer_ack_blackhole	-8.64KiB	-0.04	27.73%	23.76MiB	824.84KiB	16.78KiB	0.033895	23.75MiB	867.94KiB	17.65KiB	0.0356784	False	False
file_to_blackhole	-48.65KiB	-0.05	36.18%	95.34MiB	3.33MiB	69.03KiB	0.0349204	95.29MiB	3.71MiB	77.06KiB	0.0388786	False	False
http_to_http_json	-53.67KiB	-0.22	99.98%	23.85MiB	347.67KiB	7.1KiB	0.0142342	23.79MiB	600.48KiB	12.24KiB	0.0246392	False	False
http_to_http_noack	-96.29KiB	-0.39	99.99%	23.84MiB	403.54KiB	8.26KiB	0.0165274	23.75MiB	1.09MiB	22.78KiB	0.0460104	False	False
fluent_elasticsearch	-468.73KiB	-0.58	100.00%	79.47MiB	53.08KiB	1.07KiB	0.000652058	79.02MiB	4.43MiB	91.03KiB	0.0561143	False	False
http_pipelines_no_grok_blackhole	-103.1KiB	-0.89	100.00%	11.26MiB	68.41KiB	1.4KiB	0.00593076	11.16MiB	1.11MiB	23.21KiB	0.0998472	False	False
http_pipelines_blackhole_acks	-11.65KiB	-0.96	100.00%	1.18MiB	103.4KiB	2.11KiB	0.0853527	1.17MiB	69.51KiB	1.42KiB	0.0579307	False	False
http_pipelines_blackhole	-21.2KiB	-1.27	100.00%	1.63MiB	28.02KiB	585.68B	0.0167766	1.61MiB	104.98KiB	2.14KiB	0.063662	False	False

tobz · 2022-08-09T16:17:08Z

Reading through the original issue, I don't believe this is a suitable fix.

Anything that is based on the timing of an event is too fragile to be considered reliable in production workloads. Based on the comments that @jszwedko left, it seems like the correct solution should revolve around including a better unique identifier so that when we delete the metadata for the pod, we only delete the metadata for the previous pod, not the new pod that shares the same name.

If you're willing to try to work up a solution based on that approach, feel free to do so in this PR. Otherwise, we would end up closing this one.
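A minimal sketch of the approach suggested above, assuming the metadata cache is keyed by namespace and pod name and that each entry also records the pod's Kubernetes UID (`metadata.uid`). `PodMetadataStore` and its methods are hypothetical and are not Vector's actual types. The point is that a delayed delete for the old pod becomes a no-op once a replacement pod with the same name but a different UID has been applied, regardless of timing.

```python
class PodMetadataStore:
    """Hypothetical metadata cache keyed by (namespace, name) that remembers each pod's UID."""

    def __init__(self):
        self._pods = {}  # (namespace, name) -> (uid, metadata)

    def apply(self, namespace, name, uid, metadata):
        # A new or updated pod always replaces whatever is cached under its name.
        self._pods[(namespace, name)] = (uid, metadata)

    def delete(self, namespace, name, uid):
        # Evict only if the cached entry still belongs to the deleted pod.
        # If a replacement pod with the same name but a new UID has been applied
        # in the meantime, its metadata is left untouched.
        current = self._pods.get((namespace, name))
        if current is not None and current[0] == uid:
            del self._pods[(namespace, name)]
            return True
        return False

    def get(self, namespace, name):
        entry = self._pods.get((namespace, name))
        return None if entry is None else entry[1]


store = PodMetadataStore()
store.apply("default", "web-0", "uid-1", {"node": "a"})  # original pod
store.apply("default", "web-0", "uid-2", {"node": "b"})  # replacement, same name
print(store.delete("default", "web-0", "uid-1"))  # False: stale delete ignored
print(store.get("default", "web-0"))              # {'node': 'b'}
```

Because the delete carries the UID of the pod that was actually removed, the order in which the delayed delete and the new "applied" event are processed no longer matters.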

sillent · 2022-08-09T18:51:03Z

No problem. I don't have a complete understanding of the architecture of this code; if, in your opinion, the problem is not here, then the PR can be closed.

delay watcher applied event · f04fef9

refactor failed test · d51e1a4

fix test applied_should_add_object · f366ef3

sillent changed the title from `enhancement(kubernetes_logs): delaying applied new pod` to `fix(kubernetes_logs): delaying applied new pod` on Aug 9, 2022

sillent closed this Aug 9, 2022

sillent deleted the 13467-fix-kubernets_logs branch September 18, 2022 06:51
