This repository has been archived by the owner on Jul 5, 2023. It is now read-only.

Implement support for named ports in NetPols #134

Merged
merged 7 commits into master from gh-issue-131
Feb 26, 2021

Conversation

Member

@Phil1602 Phil1602 commented Jan 15, 2021

Fixes #131

Changes

Illuminatio will look up the port name during test case orchestration and use the numerical port from then on. Currently the first matching pod/container's port is used, which should be sufficient for most cases. Note that some cases are still not covered and could lead to false positives (see the second point in the TODOs).

While testing the implementation using https://github.com/cloudogu/k8s-security-demos I discovered some more issues.
Most of them need further investigation, but I have added some sanity checks (e.g. `!= None`) to handle the errors and get at least a result. I will split those related issues into separate GitHub issues so they can be addressed individually.
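A minimal sketch of the lookup described above, using simplified pod dicts instead of the real kubernetes-client objects (all names here are illustrative, not the project's actual code):

```python
def resolve_named_port(pods, port_name):
    """Return the numerical port behind *port_name*, taking the first
    matching container port found across *pods* (first-match semantics,
    as described above)."""
    for pod in pods:
        for container in pod.get("containers", []):
            for port in container.get("ports", []):
                if port.get("name") == port_name:
                    return port["containerPort"]
    return None  # sanity check: caller can skip the case instead of crashing


pods = [
    {"name": "web-1", "containers": [
        {"name": "nginx", "ports": [{"name": "http", "containerPort": 80}]},
    ]},
]
print(resolve_named_port(pods, "http"))  # -> 80
```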

Implementation

The named ports are printed as-is for the generated test cases, but are resolved to numerical ports later on. This is basically the same approach as for the pod label selectors, which are translated to actual pods or dummy pods.

TODOs

  • Clean up commits
  • Think about the problem that different pods with the same label set could have ports with different numbers but equal names. It would be nice if we were able to actually test this, because there could also be differences between the CNIs in their implementation/support of this "edge case".
    -> I will create a follow-up issue for this, since it requires architectural redesign in some places.
  • Unit tests
  • e2e test cases
  • Create follow-up issues for the discovered problems
  • Check whether egress ports can also be specified using the name of a label-selected pod + implement if necessary
    -> Egress is currently not (fully) supported anyway(?), see Support egress policies #65. Since named ports are rarely used in egress policies, I decided to skip this feature for now.

@Phil1602 Phil1602 force-pushed the gh-issue-131 branch 6 times, most recently from 56d9d00 to 5bb572d on January 17, 2021 13:17

codecov bot commented Jan 17, 2021

Codecov Report

Merging #134 (a2e94de) into master (dd57599) will decrease coverage by 1.00%.
The diff coverage is 12.00%.


@@            Coverage Diff             @@
##           master     #134      +/-   ##
==========================================
- Coverage   39.21%   38.20%   -1.01%     
==========================================
  Files          11       11              
  Lines        1321     1369      +48     
  Branches      264      276      +12     
==========================================
+ Hits          518      523       +5     
- Misses        754      798      +44     
+ Partials       49       48       -1     
Impacted Files Coverage Δ
src/illuminatio/illuminatio.py 0.00% <0.00%> (ø)
src/illuminatio/illuminatio_runner.py 28.12% <0.00%> (-0.45%) ⬇️
src/illuminatio/test_orchestrator.py 23.22% <10.00%> (-0.95%) ⬇️
src/illuminatio/k8s_util.py 20.00% <16.66%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd57599...e4c283c.

@Phil1602 Phil1602 force-pushed the gh-issue-131 branch 5 times, most recently from 7ea674c to c8f992f on January 24, 2021 21:13
@Phil1602 Phil1602 force-pushed the gh-issue-131 branch 4 times, most recently from dc7e280 to 3e7b08d on January 25, 2021 22:28
@Phil1602 Phil1602 self-assigned this Jan 26, 2021
@Phil1602 Phil1602 added the bug Something isn't working label Jan 26, 2021
@Phil1602 Phil1602 force-pushed the gh-issue-131 branch 4 times, most recently from 016bfb6 to 2dc5d68 on January 31, 2021 19:26
Member

@maxbischoff maxbischoff left a comment

Some minor comments and one potential issue with named ports containing a -. Otherwise LGTM, thanks for contributing :)

@@ -11,7 +11,8 @@ minikube start \
--cni=calico \
--container-runtime=docker \
--host-only-cidr=172.17.17.1/24 \
--kubernetes-version="${KUBERNETES_VERSION}"
--kubernetes-version="${KUBERNETES_VERSION}" \
--insecure-registry=localhost:5000
Member


This seems to have not been needed up until now, do you know why?

Member Author


Kubelet was not able to pull the images from the internal minikube registry addon.
After adding this flag, it worked for me. But I did not investigate further beyond the kubelet event, so I'm not sure what the exact cause is or why the pipeline does not have this issue.

@@ -264,6 +264,14 @@ def transform_results(
LOGGER.debug("mapped_sender_pod: %s", mapped_sender_pod)
LOGGER.debug("mapped_receiver_pod: %s", mapped_receiver_pod)
LOGGER.debug("raw_results: %s", json.dumps(raw_results))
if mapped_receiver_pod not in raw_results[mapped_sender_pod]:
LOGGER.error(
Member


I think I would prefer the results to contain an entry for these failures, where the state is ERR. This way other output formats would also show the error without checking logs and test results would not vanish "silently" (when only looking at the result).
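A sketch of what such an explicit entry could look like (the field names are assumptions for illustration, not the project's actual result schema):

```python
def error_result(port, reason):
    """Build a result entry for a case that could not run, so the failure
    shows up in every output format instead of only in the logs."""
    return {port: {"success": False, "error": reason}}


entry = error_result("-80", "receiver pod missing in raw results")
```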

Member Author


Sounds good, yes.

I thought about refactoring this whole error handling in a separate GitHub issue (basically differentiating between success, failure (case failed), and error (prerequisites etc. failed)). But you are right, it should at least be consistent for now.
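The three states mentioned could be modelled roughly like this (a sketch for the follow-up issue, not existing code):

```python
from enum import Enum


class CaseState(Enum):
    SUCCESS = "success"  # case ran and the observed result matched the expectation
    FAILURE = "failure"  # case ran, but the observed result did not match
    ERROR = "error"      # prerequisites failed; the case could not run at all
```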

@@ -151,6 +151,9 @@ def run_tests_for_sender_pod(sender_pod, cases):
# TODO check if network ns is None -> HostNetwork is set
results = {}
for target, ports in cases[from_host_string].items():
if target is None:
LOGGER.error("Target is none. Skipping")
Member


Again, I think I would prefer the results to contain an entry for these failures, where the state is ERR. This way other output formats would also show the error without checking logs and test results would not vanish "silently" (when only looking at the result).



def resolve_port_name(namespace, pod_label_selector_string, portname):
api = k8s.client.CoreV1Api()
Member


I would prefer passing the pod list. In test_orchestrator all calculations so far have been done on one momentary state of the API, so that the state is consistent across the different steps of translating test cases to the test run.
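A sketch of the suggested signature: resolve against a pod list the caller fetched once, so all translation steps work on a single consistent snapshot of the API (simplified dicts stand in for V1Pod objects; names are illustrative):

```python
def matches_selector(pod_labels, selector):
    # a pod matches when it carries every key/value pair of the selector
    return all(pod_labels.get(k) == v for k, v in selector.items())


def resolve_port_name(pods, selector, port_name):
    """Resolve *port_name* against *pods* fetched once by the caller,
    instead of querying CoreV1Api again inside this function."""
    for pod in pods:
        if not matches_selector(pod.get("labels", {}), selector):
            continue
        for container in pod.get("containers", []):
            for port in container.get("ports", []):
                if port.get("name") == port_name:
                    return port["containerPort"]
    return None


pods = [{"labels": {"app": "web"}, "containers": [
    {"ports": [{"name": "http", "containerPort": 8080}]}]}]
```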

p for p in service_ports if p.target_port == port_int
p
for p in service_ports
# TODO Services could also contain named ports. We should cross-compare here (resolvedName == numerical)
Member


This would be done at another point, as we are iterating over numbered_ports here, wouldn't it?

Member Author


The comment is related to the following issue (I guess we are thinking about different things):

We are iterating over numbered_ports, but it could be the case that the NetworkPolicy itself contains a numerical port (e.g. 80), while the Service uses a name (e.g. http) that is defined within the container.
In this case we are fine, since there is a service for port 80, but the comparison (e.g. if 80 == http) would fail and make the case fail, hence we would need to resolve the service ports here first to handle each edge case correctly.

I decided to skip this and only leave a comment, as this is an edge case. In general, I assume someone using named ports in the NetPol will use them in the Svc accordingly.
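The edge case described could be handled by resolving named targetPorts before comparing, roughly like this (the resolve callback and dict shapes are assumed for illustration):

```python
def service_matches_port(service_ports, port_int, resolve):
    """True when any service port targets *port_int*; named targetPorts
    (e.g. "http") are resolved to numbers first via *resolve*."""
    for p in service_ports:
        target = p["targetPort"]
        if isinstance(target, str):
            target = resolve(target)  # e.g. "http" -> 80
        if target == port_int:
            return True
    return False


service_ports = [{"port": 80, "targetPort": "http"}]
```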


# Strip off the "-" and add it again after the name has been resolved
prefix = "-" if "-" in named_port else ""
named_port_without_prefix = named_port.replace("-", "")
Member


Port names can also contain a -, as far as I know.

Member Author


Good catch, thanks.
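Splitting off only a *leading* "-" (the negation prefix) would keep dashes inside the name intact; a possible fix along these lines:

```python
def split_prefix(named_port):
    """Separate an optional leading "-" (a denied port) from the name.

    Dashes inside the name (e.g. "my-http-port") are part of the port
    name itself and must survive, unlike with str.replace("-", "").
    """
    if named_port.startswith("-"):
        return "-", named_port[1:]
    return "", named_port
```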

str(service_ports_for_port[0].port),
)
else:
# TODO change to exception (explanation see above)
Member


Which explanation is this referring to?

Member Author


The same as for the numerical ports. Unfortunately the comment is not within the "review window":

"# TODO this was a hotfix for recipe 11, where ports 53 were allowed but not for any target,
# resulting in test to 53 being written despite no service matching them existing.
# That error should be handled in test generation, an exception here would be fine"
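Turning the silent else branch into an exception, as the quoted TODO suggests, could look roughly like this (names are illustrative, not the project's actual code):

```python
def service_ports_for(service_ports, named_port):
    """Return the service ports targeting *named_port*, failing loudly
    when none match: test generation should never emit a case without a
    matching service, so an empty match is a real bug."""
    matches = [p for p in service_ports if p["targetPort"] == named_port]
    if not matches:
        raise ValueError(f"no service port matches {named_port!r}")
    return matches
```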

Member Author

@Phil1602 Phil1602 left a comment


I will provide fixes during this week (hopefully ;) )

@maxbischoff maxbischoff merged commit b2892d8 into inovex:master Feb 26, 2021
maxbischoff added a commit that referenced this pull request Feb 26, 2021
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Named ports in NetPols lead to error
2 participants