Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-1995: Use global map, spinlock, split maps #469

Merged
merged 8 commits into from
Dec 12, 2024

Conversation

jotak
Copy link
Member

@jotak jotak commented Nov 27, 2024

Also:

  • remove MAC and eth_protocol from map key (they're moved to flow
    metrics ie. map values)
  • in every hook, we don't update flow counters, even when the flow doesn't exist and is created. Previously, we were
    adding bytes/packet counters from hooks, which doesn't seem correct as
    counting bytes/packets is the role of the TC hook only - else it can
    lead to overcounts I guess
  • remove code that sets MIN_RTT on TCP flows (not sure why we had that)
  • small refactoring of the "Accumulate" functions to adapt to new model

Copy link

openshift-ci bot commented Nov 27, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jotak
Copy link
Member Author

jotak commented Nov 27, 2024

} else {
// Key does not exist in the map, and will need to create a new entry.
u64 rtt = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msherif1234 I don't understand the rationale for this code. We're setting default RTT for TCP flows, how could that be accurate? I don't see how this wouldn't produce fake data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this the min value of ythe hook isn't fire IIRC it was a define in the kernel and I believe we doc that as the default min RTT, when I recall more I will update here

Copy link

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 54.54545% with 95 lines in your changes missing coverage. Please review.

Project coverage is 29.68%. Comparing base (cbe3140) to head (f8c24aa).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/tracer/tracer.go 0.00% 33 Missing ⚠️
pkg/model/flow_content.go 64.47% 18 Missing and 9 partials ⚠️
pkg/exporter/ipfix.go 0.00% 14 Missing ⚠️
pkg/model/record.go 36.36% 6 Missing and 1 partial ⚠️
pkg/tracer/tracer_legacy.go 0.00% 7 Missing ⚠️
pkg/decode/decode_protobuf.go 90.00% 2 Missing and 1 partial ⚠️
pkg/agent/agent.go 0.00% 2 Missing ⚠️
pkg/ebpf/bpf_x86_bpfel.go 0.00% 1 Missing ⚠️
pkg/exporter/kafka_proto.go 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #469      +/-   ##
==========================================
+ Coverage   29.65%   29.68%   +0.03%     
==========================================
  Files          50       51       +1     
  Lines        4863     4915      +52     
==========================================
+ Hits         1442     1459      +17     
- Misses       3316     3350      +34     
- Partials      105      106       +1     
Flag Coverage Δ
unittests 29.68% <54.54%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/flow/account.go 84.74% <100.00%> (ø)
pkg/flow/decorator.go 100.00% <100.00%> (ø)
pkg/flow/deduper.go 94.54% <100.00%> (-0.05%) ⬇️
pkg/flow/tracer_map.go 82.35% <100.00%> (+4.57%) ⬆️
pkg/test/tracer_fake.go 52.77% <100.00%> (ø)
pkg/ebpf/bpf_x86_bpfel.go 0.00% <0.00%> (ø)
pkg/exporter/kafka_proto.go 70.00% <83.33%> (ø)
pkg/agent/agent.go 33.74% <0.00%> (ø)
pkg/decode/decode_protobuf.go 29.49% <90.00%> (-0.48%) ⬇️
pkg/model/record.go 78.00% <36.36%> (+9.11%) ⬆️
... and 4 more

@jotak jotak marked this pull request as ready for review November 28, 2024 12:51
@jotak jotak requested a review from msherif1234 November 28, 2024 12:51
@jotak
Copy link
Member Author

jotak commented Nov 28, 2024

cc @tohojo

@jotak
Copy link
Member Author

jotak commented Nov 28, 2024

removing packed attribute gives another 20% CPU gain
I'm currently at about -40% CPU and -10% memory
And haven't done the in-kernel dedup yet ...

@jotak jotak changed the title Use global map, spinlock, split maps NETOBSERV-1995: Use global map, spinlock, split maps Nov 28, 2024
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Nov 28, 2024

@jotak: This pull request references NETOBSERV-1995 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Also:

  • remove MAC and eth_protocol from map key (they're moved to flow
    metrics ie. map values)
  • in every hook, we don't update flow counters, even when the flow doesn't exist and is created. Previously, we were
    adding bytes/packet counters from hooks, which doesn't seem correct as
    counting bytes/packets is the role of the TC hook only - else it can
    lead to overcounts I guess
  • remove code that sets MIN_RTT on TCP flows (not sure why we had that)
  • small refactoring of the "Accumulate" functions to adapt to new model

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@@ -1,107 +0,0 @@
package model
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @msherif1234 , I'm removing this test because I think it is of quite low value: it only tests ReadFrom, which is really a direct call to binary.Read. Meaning all it does is testing that the encoding/binary standard lib is right. This, at a price of sometimes painful maintenance. Does it worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason u removing it because u removed packed so the structure no longer aligns ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the advantage of this make sure the auto generated structure are sane too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get, what sort of issue could it catch, do you have an example?
Anyway, I'm fine reintroducing it. It was not only for the padding change, it's any change in the bpf types that make this test painful to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go: 119061c
But I'd still appreciate if you can give me a concrete example. It doesn't really test anything about the types.h / go struct mapping, it only test that a go struct is what it claims to be ... that doesn't make really sense to me.

bpf/flows.c Outdated Show resolved Hide resolved
@tohojo
Copy link

tohojo commented Nov 29, 2024

Looked through the BPF code, and it basically LGTM :)

bpf/flows.c Outdated
@@ -93,6 +93,7 @@ static inline int flow_monitor(struct __sk_buff *skb, u8 direction) {
// a spinlocked alternative version and use it selectively https://lwn.net/Articles/779120/
flow_metrics *aggregate_flow = (flow_metrics *)bpf_map_lookup_elem(&aggregated_flows, &id);
if (aggregate_flow != NULL) {
bpf_spin_lock(&aggregate_flow->lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need to check the return code for spin lock and unlock and log an error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there's no error code if I'm correct. The doc for bpf_spin_lock says

 * Returns
 * 	0

In linux helper.c it returns void: https://elixir.bootlin.com/linux/v6.12.1/source/kernel/bpf/helpers.c#L283

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc said it was returning something https://docs.ebpf.io/linux/helper-function/bpf_spin_lock/ maybe doc is wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bpf/flows.c Outdated
@@ -80,7 +80,7 @@ static inline int flow_monitor(struct __sk_buff *skb, u8 direction) {
id.direction = direction;

// check if this packet need to be filtered if filtering feature is enabled
bool skip = check_and_do_flow_filtering(&id, pkt.flags, 0);
bool skip = check_and_do_flow_filtering(&pkt, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need to replace id with pkt struct that generated lots of unnecessary changes and I also planned to even delete the pkt info struct at some point ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is due to removing stuff from flow_id, and moving them to flow_metrics, like MAC addresses. As a result, id isn't sufficient to pass it to all these functions in utils. So instead of adding tons of arguments to the signatures I just use the pkt that already has everything needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkt is convenient to not have extra long signatures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u just removed l2 and interface right ? or am I missing something else ? why would flow filter or util core apis need to change ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and eth_protocol and direction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember datapath has to be efficient adding additional dereferrence isn't efficient IMO that stuff not used and I don't think will impact the apis to change the signature much

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reaching the 5 args limit of ebpf calls if I want to remove them from pkt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed via 8e69cfd
that allows also to get rid of a few unnecessary MAC copying


// even if we know that the entry is new, another CPU might be concurrently inserting a flow
// so we need to specify BPF_ANY
long ret = bpf_map_update_elem(&aggregated_flows, &id, &new_flow, BPF_ANY);
long ret = bpf_map_update_elem(&aggregated_flows, &id, &new_flow, BPF_NOEXIST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will probably need to spin lock/unlock here as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lock is on the value object, not the key, and it's a new object in that case, so I don't think it makes sense to lock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u have concurrent new flows on different cpus one should create and the other will update ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the BPF_NOEXIST flag does: If two CPUs call this concurrently, one of them will fail with EEXIST, and then fall back to the lookup/update path (which does take the lock)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with eexist lookup and update we need to take the lock as well what about BPF_F_LOCK can I set that flag to takecare of the lock/unlock for me ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BPF_F_LOCK is for when you want to replace the whole contents of the map value; but that's not what you're doing, you are increasing a couple of counters. So you just re-use exactly the same update code as for when the map lookup succeeds in the first place :)

u16 flags;
u64 latency;
u8 errno;
} dns_record;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we keep dns record here and not move to additional metrics ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I mentioned that in the TODO here #470
I didn't do it at first because it's modified from the TC hook so it isn't required to move it ... but still it would probably be better to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed via 01950d5

bpf/types.h Outdated
const struct filter_value_t *unused9 __attribute__((unused));

// Force emitting enums/structs into the ELF
const enum tcp_flags_t *unused0 __attribute__((unused));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally added that line after each structure definition as reminder so bpf2go can generate eqv go struct for userspace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're just moved at the end of the file. I can rollback that, I don't really care, but I was finding it easier to choose the next "unusedX" number when having all of them in sight at once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not grouping them to gether make it easy to forget but at the cost of looking for the next unused manually

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i'll rollback that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed via 01950d5

@@ -106,6 +106,7 @@ prereqs: ## Check if prerequisites are met, and install missing dependencies
fmt: ## Run go fmt against code.
@echo "### Formatting code"
go fmt ./...
find ./bpf -type f -not -path "./bpf/headers/*" -name "*.[ch]" | xargs clang-format -i --Werror
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this rule as part of the lint target we can make it part of the fmt but in that case we need to remove it from the lint so we won't run it twice when build ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's for two different things: the format rule allows to auto-format as part of the local build process on your machine, whereas the lint rule is still useful on the CI to make sure we didn't commit & push unformatted code.

@jotak jotak force-pushed the global-map branch 2 times, most recently from 4c5c1af to 314c328 Compare December 3, 2024 10:24
@jotak
Copy link
Member Author

jotak commented Dec 3, 2024

I appear to have broken DNS somehow ... fixed

@@ -51,13 +51,27 @@
*/
#include "network_events_monitoring.h"

static inline void update_dns(additional_metrics *extra_metrics, pkt_info *pkt, int dns_errno) {
if (pkt->dns_id != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can check dns_err here and if its 0 then add DNS info else add dns_error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can be an error and still have DNS data: when the dns request wasn't found, we can't generate latency, but we still provide the other data, and return ENOENT.
So it shouldn't be a if/else here, if we want to still provide what we have

aggregate_flow->dns_record.flags = pkt->dns_flags;
aggregate_flow->dns_record.latency = pkt->dns_latency;
aggregate_flow->dns_record.errno = dns_errno;
bpf_spin_unlock(&aggregate_flow->lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer needs

// it might happen that start_mono_time hasn't been set due to

as that was specific to perCPU map

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, good catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

SrcPort uint16
DstPort uint16
TransportProtocol uint8
IcmpType uint8
IcmpCode uint8
_ [3]byte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example of what removing packed will cause BTW we could avoid so gaps by moving the fields around if u move direction to the bottom might help this struct ?

Copy link
Member Author

@jotak jotak Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in this PR I haven't care of these little optimizations because in the next PR I still move stuff around and there I'm caring about these optimizations

case "icmpCodeIPv4", "icmpCodeIPv6":
ieVal.SetUnsigned8Value(record.Id.IcmpCode)
ieVal.SetUnsigned8Value(record.ID.IcmpCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change from Id to ID generated tons of unnecessary changes here in many other places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but it's the naming convention & makes golint happier ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might make linter happier but screw up all inflightes PRs :(

@msherif1234
Copy link
Contributor

few nits but I am more less done with review, will be good to run features testing with this PR pre merge
/lgtm

@msherif1234
Copy link
Contributor

pls update the arch doc reference to perCPu hashmap and make sure to run #172 (comment) to make sure we have no concurrency issues again

Checking for zero-values made sense only with per-cpu maps
@openshift-ci openshift-ci bot removed the lgtm label Dec 5, 2024
@jotak
Copy link
Member Author

jotak commented Dec 5, 2024

btw added a diagram for kernel space https://github.com/jotak/netobserv-agent/blob/global-map/docs/architecture.md

@msherif1234
Copy link
Contributor

btw added a diagram for kernel space https://github.com/jotak/netobserv-agent/blob/global-map/docs/architecture.md

can u share the tool used to draw as we add new hooks on a regular basis so that diag will be quickly become out of date

@jotak
Copy link
Member Author

jotak commented Dec 6, 2024

@msherif1234 you don't necessarily need a tool you can directly edit script:

flowchart TD
    A[TC/X Hooks] -->|Accumulate packet data| M1(Global map: aggregated_flows)
    D{If DNS} -->|Req: store req info| MD(Global map: dns_flows)
    D -->|Resp: compute latency| MD
    A -->D
    D -->|Store DNS info| M2(PerCPU map: additional_flow_metrics)
    B[Drops Hook: kfree_skb] -->|Accumulate drop data| M2(PerCPU map: additional_flow_metrics)
    C[RTT Hook: tcp_rcv_established] -->|Extract & store sRTT| M2(PerCPU map: additional_flow_metrics)
    E[Events Hook: psample_sample_packet] -->|Accumulate net events| M2(PerCPU map: additional_flow_metrics)
    A -->F{If busy map / error}
    F -->|Single-packet flow| RB(RingBuffer)
    M1 --> |Polling|U[User space]
    M2 --> |Polling|U
    RB --> |Push|U

So you can easily add a new hook by adding something like:

F[My new hook] --> |any interaction written here| M2

(M2 stands for the additional metrics map)

Or if you prefer there's also online tools such as this: https://mermaid.live/

@jotak
Copy link
Member Author

jotak commented Dec 6, 2024

I took some approximative measurements on the ebpf side, written down there: https://docs.google.com/spreadsheets/d/1gdwX4ZLjKicaJzANVq-r-lAW6uZxefqjBsIw3Orx4bo/edit?gid=0#gid=0
That's on a single (worker) node, running my usual stress tests. I looked at the sum of each data structure size, then removed a constant 16MB that's taken by ring buffer to make the gains on maps more apparent.
Measurements are done by:

while true; do bpftool map show | grep -B 3 netobserv | grep memlock | sed -re 's/.*memlock ([0-9]+)B.*/\1/' | awk '{s+=$1} END {print s}' ; sleep 1; done

For CPU I used bpftop, with even more approximative manual measurement (don't rely too much on my CPU measurements!)

Between -28% and -54% memory used depending on scenarios. And between -4% and -36% CPU.

The next PR improves that a little bit further with between -42% and -61% memory, and around -20% CPU.

Note that if I count the RB size, the improvements are much lower because on my nodes & scenarios it takes most of the size, but since it is constant I think it's worth looking without it.

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Dec 6, 2024
@jotak jotak requested a review from msherif1234 December 6, 2024 10:42
Copy link

github-actions bot commented Dec 6, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:1cd8c04

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=1cd8c04 make set-agent-image

@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 6, 2024
@Amoghrd
Copy link

Amoghrd commented Dec 11, 2024

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Dec 11, 2024
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Dec 11, 2024

@jotak: This pull request references NETOBSERV-1995 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Also:

  • remove MAC and eth_protocol from map key (they're moved to flow
    metrics ie. map values)
  • in every hook, we don't update flow counters, even when the flow doesn't exist and is created. Previously, we were
    adding bytes/packet counters from hooks, which doesn't seem correct as
    counting bytes/packets is the role of the TC hook only - else it can
    lead to overcounts I guess
  • remove code that sets MIN_RTT on TCP flows (not sure why we had that)
  • small refactoring of the "Accumulate" functions to adapt to new model

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jotak
Copy link
Member Author

jotak commented Dec 12, 2024

thanks @msherif1234 @tohojo @Amoghrd !
/approve

Copy link

openshift-ci bot commented Dec 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jotak jotak merged commit 70003f2 into netobserv:main Dec 12, 2024
10 of 12 checks passed
msherif1234 pushed a commit to msherif1234/netobserv-ebpf-agent that referenced this pull request Dec 12, 2024
* NETOBSERV-1987: split maps

Split BPF maps to avoid having one huge map for flows. This should move
us farther from the "stack size limit" that potentially shows up when
adding new features.

This is also a pre-req for later being able to move away from per-cpu
maps, which is wasting memory by duplicating flows across CPUs

Also:
- remove MAC and eth_protocol from map key (they're moved to flow
  metrics ie. map values)
- in every hook, we don't try anymore to create the flow it it didn't
  exist. We assume it should exist. Note that, previously, we were
adding bytes/packet counters from hooks, which doesn't seem correct as
counting bytes/packets is the role of the TC hook only - else it can
lead to overcounts
- remove code that sets MIN_RTT on TCP flows (not sure why we had that)
- small refactoring of the "Accumulate" functions to adapt to new model

Use global map & spinlock

regen .o

Remove packing attributes

Packed attributes make linter unhappy with the spin-lock
Also, turns out it improves quite a lot the performances to remove that:
my tests show another 20% gain

Remove low value test for binary read

TestRecordBinaryEncoding only tests ReadFrom which... basically is a
simple call to binary.Read. So we basically testing the encoding/binary
standard lib.

Avoid unnecessary indirections & copies of MAC

Move DNS to secondary map, revert types.h reorg

Fix DNS issue - retry in case of concurrent write

Also add an error counter for DNS map updates

* Address review comments (renamings)

* Make BpfFlowMetrics a pointer in records

* Reinsert binary encoding test

* Remove unnecessary checks

Checking for zero-values made sense only with per-cpu maps

* update architecture doc

* kernel space mermaid arch

* Update architecture.md
@msherif1234 msherif1234 mentioned this pull request Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants