Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR For the Charm Review #68

Closed
wants to merge 1 commit into from
Closed

Conversation

jdkandersson
Copy link
Contributor

This PR is intended to be used for the purpose of the GitHub runner charm review

@github-actions
Copy link
Contributor

Test coverage for 89e8919

Name                                    Stmts   Miss Branch BrPart  Cover   Missing
-----------------------------------------------------------------------------------
src/charm.py                              308     55     64     13    81%   254-264, 280-282, 291-292, 299, 310, 321->338, 326-331, 348-349, 353-354, 359-362, 378-379, 413-421, 451-453, 456->exit, 460-462, 485-490, 500-501, 503-504, 506-507, 584-585, 595
src/errors.py                              18      0      0      0   100%
src/event_timer.py                         42      8      4      0    74%   96-99, 119-122
src/github_type.py                         36      0      0      0   100%
src/lxd_type.py                            34      0      2      0   100%
src/repo_policy_compliance_service.py       5      5      0      0     0%   4-14
src/runner.py                             203     32     48     15    81%   137-145, 156-157, 164-171, 177-183, 242, 279-281, 288, 302, 312, 331, 365-370, 380, 464, 497, 523, 528-540, 554, 572
src/runner_manager.py                     195     20     80      8    88%   201-202, 216-222, 226-228, 238-239, 292, 316-320, 339, 357, 471, 491
src/runner_type.py                         46      0     10      0   100%
src/utilities.py                           55      2     12      4    91%   74->76, 78->84, 89-91
-----------------------------------------------------------------------------------
TOTAL                                     942    122    220     40    85%

Static code analysis report

Run started:2023-06-15 06:54:16.189508

Test results:
  No issues identified.

Code scanned:
  Total lines of code: 2062
  Total lines skipped (#nosec): 0
  Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 9

Run metrics:
  Total issues (by severity):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
  Total issues (by confidence):
  	Undefined: 0
  	Low: 0
  	Medium: 0
  	High: 0
Files skipped (0):

Copy link
Collaborator

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed until src/errors.py :D

README.md Show resolved Hide resolved
config.yaml Show resolved Hide resolved
@@ -0,0 +1,8 @@
ghapi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would be worth limiting the versions for the libraries to .?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to pinning to versions of the deps?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

script/deploy_runner.sh Show resolved Hide resolved
Comment on lines +171 to +174
if token is None:
token = self.config["token"]
if path is None:
path = self.config["path"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is this not a charm state because of when this charm was written? :D
Also, I'm thinking if the token, path and service_token validation can be done at a higher level on the controller function so that _get_runner_manager can be purely about getting the RunnerManager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. You can find an example of a similar situation on this Synapse PR where we have a configuration (server_name) that needs to be set before the deployment too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not a charm state when this charm was written. Also this function was already in this module.
This can be refactored.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

return

runner_manager = self._get_runner_manager()
if runner_manager:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, I think it would be better if the not case can come earlier so that it can set BlockedStatus and return first to reduce indentation.
i.e.

Suggested change
if runner_manager:
if not runner_manager:
self.unit.status = BlockedStatus(...)
return
... # continue with execution flow

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Comment on lines +280 to +285
except TimerEnableError as ex:
logger.exception("Failed to start the event timer")
self.unit.status = BlockedStatus(
f"Failed to start timer for regular reconciliation and binary update checks: {ex}"
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, if the exception is caught, should it return after setting BlockedStatus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is to trigger the next event, execution of the current event should probably proceed anyway. Perhaps the blocked state should say what the operator can/ should do to resolve?

Copy link
Collaborator

@yhaliaw yhaliaw Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return should be added.
In terms of the block message, I am not sure what the operator can do there.
Currently, changing the config again to trigger this hook, would retry the installing the timer. However, this does not seem to be a good design.

Maybe an action to manually install the timer, or an automatic retry for this can be added.
Currently, I am thinking not catching the TimerEnableError which causes ErrorStatus and juju would be retrying this event.

Might need to give this more thought.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

Comment on lines +296 to +299
if runner_manager:
self.unit.status = ActiveStatus()
else:
self.unit.status = BlockedStatus("Missing token or org/repo path config")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about having a not case first and continuing the execution flow.
i.e.

Suggested change
if runner_manager:
self.unit.status = ActiveStatus()
else:
self.unit.status = BlockedStatus("Missing token or org/repo path config")
if not runner_manager:
self.unit.status = BlockedStatus(...)
return
... # continue execution flow

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

event: Event of reconciling the runner state.
"""
if not RunnerManager.runner_bin_path.is_file():
logger.warning("Unable to reconcile due to missing runner binary")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, would there be a reason why this doesn't fall to BlockedStatus?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the runner binary is updated/downloaded every hour by default, the charm should be able to resolve missing binary by itself.
So a better status would be Maintenance.
Might be good for the code to set to maintenance status.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

src/charm.py Show resolved Hide resolved
@@ -0,0 +1,9 @@
[flake8]
Copy link
Contributor

@amandahla amandahla Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Shouldn't this file be removed? So flake8 will only be configured using pyproject.toml

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Copy link
Collaborator

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed until runner.py!

src/event_timer.py Show resolved Hide resolved
"""
self._pylxd_client = pylxd_client

def all(self) -> list[LxdInstance]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, I'm wondering if there would be a reason to use a mutable list, instead of a tuple?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least from my perspective and in my humble opinion, usually a list or an iterator is returned. If the user of the class wants to use a tuple, it can be converted.

By the way the pylxd lib uses a list (here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface class might be redesign, so I have put this down as something to take notes of.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

token: Charm token configured for the repo policy compliance service.
"""

def __init__(self, session: requests.Session, url: str, charm_token: str) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is a session passed in rather than a new requests session within? Since we're passing in a url, i'm assuming that the requests will be made to different url from where the session comes from, and in that case would a new requests session be better in this case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The session argument can be remove.
Currently, all access to the same host is done in this class. And it is unlikely this will change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

busy (bool): Whether GitHub marks this runner as busy.
"""

runner_application = Path("/opt/github-runner")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick

Suggested change
runner_application = Path("/opt/github-runner")
runner_application_path = Path("/opt/github-runner")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough one, there are arguments for going both ways:

  • If there are no other variables related to this path (e.g., the contents or something like that), adding the _path postfix doesn't add a lot of value
    • If there are related variables that are defined, then it can be worthwhile to indicate that one hols the path handle, another the content, another the deserialised content etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment

Comment on lines +496 to +497
if self.instance is None:
raise RunnerError("Runner operation called prior to runner creation.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the data flow could be unidirectional, without referring back to self so that we have a controller function and immutable data flow through it.

Copy link
Collaborator

@yhaliaw yhaliaw Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity of the runner class comes from it is possible to have runner without a LXD instance and not registered on GitHub, runner with LXD instance that is registered on GitHub, runner without a LXD instance and registered to GitHub (the LXD instance might crashed), etc.

It is possible to represent all these states as different type of objects with state pattern, which would avoid assert done in the line 496, 497. This would require a major rewrite so I did not did it.

This can be a future story to explore what pattern is best suited to reduce the complexity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

name: end to end test
runs-on: [self-hosted, linux, x64, e2e-runner]
steps:
- name: Echo hello world
Copy link
Contributor

@amandahla amandahla Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this echo here?
Would be possible to add a comment explaining that this echo is a connectivity test?

#70 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

README.md Show resolved Hide resolved
event: Event of charm upgrade.
"""
logger.info("Reinstalling dependencies...")
self._install_deps()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the SubprocessError be handled here like it is on _on_install?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the error handling is missing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Copy link
Contributor Author

@jdkandersson jdkandersson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to github_type.py

.copier-answers.yml Show resolved Hide resolved
parts:
charm:
charm-python-packages:
- setuptools # for jinja2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these and below still required?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember the ones for cryptography are for pylxd, and the installation would fail without those.
git is needed for installing from git source for pylxd, and repo-policy-compliance.
jinja2 is needed for the templating.
I think cffi is also for pylxd.

However, this was tested a while ago. Reviewing it again might be a good idea.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

runners under a repository.
token:
type: string
default: ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a default? It seems that if not provided the charm isn't going to work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we could have a default that works? (If that's the case we should mention in the description that this is required (I think we mention that path and token are required elsewhere in docs, but should do here as well).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This token refers to the github personal token, which does not have a default that could work. So this needs to be mentioned as required.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

]

[tool.coverage.report]
fail_under = 38
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A known issue that this is below our requirements

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is obsoleted by the automated e2e tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

event: Event of configuration change.
"""
try:
self._event_timer.ensure_event_timer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a way to periodically trigger custom events? Not a bad pattern until perhaps pebble notixes if we can find a better source for these events than cron

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhoyt any comment?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems like a nice workaround!

Comment on lines +280 to +285
except TimerEnableError as ex:
logger.exception("Failed to start the event timer")
self.unit.status = BlockedStatus(
f"Failed to start timer for regular reconciliation and binary update checks: {ex}"
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is to trigger the next event, execution of the current event should probably proceed anyway. Perhaps the blocked state should say what the operator can/ should do to resolve?


runner_manager = self._get_runner_manager()
if not runner_manager:
self.unit.status = BlockedStatus("Missing token or org/repo path config")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific? I.e., which one is missing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

event.fail("Missing runner binary")
return

online = 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in a separate module?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

event: Action event of reconciling the runner.
"""
runner_manager = self._get_runner_manager()
if not runner_manager:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps get runner manager can raise an exception instead of returning None which provides the error message? It seems this message is repeated a lot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps injecting the runner manager can even be a decorator

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Comment on lines +315 to +316
self.instance.execute(["/usr/bin/who"])
self.instance.execute(["/usr/bin/nslookup", "github.com"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exit code of the execution is not checked, same for all occurrences of self.instance.execute in the charm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

Copy link

@gregory-schiano gregory-schiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped at _on_install for charm.py
Will continue later


jobs:
promote-charm:
uses: canonical/operator-workflows/.github/workflows/promote_charm.yaml@promote-charm-base

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use @main ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
metadata.yaml Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pin all dependencies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

src/charm.py Show resolved Hide resolved
class GithubRunnerCharm(CharmBase):
"""Charm for managing GitHub self-hosted runners."""

_stored = StoredState()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we need to get away from it should be the main comment here

self.framework.observe(self.on.check_runners_action, self._on_check_runners_action)
self.framework.observe(self.on.reconcile_runners_action, self._on_reconcile_runners_action)
self.framework.observe(self.on.flush_runners_action, self._on_flush_runners_action)
self.framework.observe(self.on.update_runner_bin_action, self._on_update_runner_bin)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's sharing the same hook as "update_runner_bin" event
Each event should have a dedicated hook and eventually call the same private method

- name: Deploy nginx for testing
run: sudo microk8s kubectl create deployment nginx --image=nginx
- name: Wait for nginx to be ready
run: sudo microk8s kubectl rollout status deployment/nginx --timeout=30m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't 30m too much? Maybe 5m?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@@ -0,0 +1,7 @@
.tox/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss a standard for this file on our charms. What do you think about using https://github.com/github/gitignore/blob/main/Python.gitignore as a base?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

README.md Show resolved Hide resolved
charmcraft pack
# Ensure you're connected to a juju k8s model
# Configure the machine resource created by the model
juju set-model-constraints mem=8G cores=2 root-disk=50G
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the constraints a requirement or suggestion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a sense they are a requirement as the default constraints would not have enough resource.
The memory would need to be increased and root-disk decreased, once we settle on using RAM as disk for the runners.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@@ -0,0 +1,72 @@
# Copyright 2023 Canonical Ltd.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Flask named this file as exceptions.py. What's the standard? Related canonical/jenkins-k8s-operator#9 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exceptions.py is probably a better name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

from jinja2 import Environment, FileSystemLoader


class TimerEnableError(Exception):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in errors.py?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

run: touch /usr/local/bin/test_file
# "Install microk8s" step will test if the proxies settings are correct.
- name: Proxy set in /etc/environment
run: cat /etc/environment | grep HTTP_PROXY
Copy link
Collaborator

@arturo-seijas arturo-seijas Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this cat /etc/environment | grep HTTP_PROXY ? Debugging? If so, how about of the rest of the proxy settings?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to test whether the /etc/environment file was successful written to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

# "Update apt in python docker container" step will test docker client default proxy settings.
- name: Proxy set in docker client
run: cat /home/ubuntu/.docker/config.json | grep httpProxy
- name: Install microk8s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge all the microk8s steps in ones?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which steps are you thinking about merging together?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

needs: e2e-test
runs-on: [self-hosted, linux, x64, e2e-runner]
steps:
- name: Docker version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge all these steps together? It will make debugging easier


This means that each interval, each unit will make one or more API calls to GitHub. The interval may need to be adjusted if the number of units is large enough to trigger [Rate Limiting](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting).

## Development
Copy link
Collaborator

@arturo-seijas arturo-seijas Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this too much detail? There's already a how to contribute doc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

* All enhancements require review before being merged. Code review typically examines
* code quality
* test coverage
* user experience for Juju administrators of this charm.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/administrators/operators?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@@ -0,0 +1,5 @@
GitHub self-hosted runner offer a way to run GitHub action workloads on non-GitHub servers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/offer/offers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Args:
dir: Name of the directory to create.
"""
self.instance.execute(["/usr/bin/mkdir", "-p", dir_name])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error check?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

LxdException: Unable to load the file into the LXD instance.
"""
lxc_cmd = [
"/snap/bin/lxc",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could "/snap/bin/lxc" be a constant?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

LxdException: Unable to load the file to the LXD instance.
"""
if isinstance(content, str):
content = content.encode("utf-8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle UnicodeError or add errors = ignore?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

with tempfile.NamedTemporaryFile() as file:
self.pull_file(filepath, file.name)

return file.read().decode("utf-8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about handling errors here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

src/lxd.py Show resolved Hide resolved
src/lxd.py Show resolved Hide resolved
src/lxd.py Show resolved Hide resolved
src/lxd.py Show resolved Hide resolved
"""
self._pylxd_client = pylxd_client

def all(self) -> list[LxdInstance]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something but is this only used for the tests to get the number of instances? If this is the case, why not change to return this number instead of the entire list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not specifically related to this, addressing the question/ comment regarding list vs tuple. The order of preference for return values depending on what is possible:

  1. A generator, the caller can decide what kind of data structure to use
  2. A tuple, passing around lists can be dangerous since anyone can modify them accidentally, e.g., in a subroutine which can't happen with a tuple
  3. A list

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something but is this only used for the tests to get the number of instances? If this is the case, why not change to return this number instead of the entire list?

I indeed only found references in test_runner and it seems it's systematically using a mocked object. So if that's the only use case we could probably find another way around

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class original was created to replace the pylxd with the load file function changed. Hence, the interface is exactly like pylxd.
Now a interface to represent the virtualization layer will be created, this old interface will be removed.
No action needed.

"""
try:
pylxd_instance = self._pylxd_client.instances.create(config=config, wait=wait)
return LxdInstance(config["name"], pylxd_instance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that "name" key does not exist for some reason?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it is not possible as the name is passed in on Runner class creation.
But checking can be added to the __init__ method to ensure a valid name is set.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@amandahla
Copy link
Contributor

Stopped on utilities.py. I'll review the tests tomorrow.

Copy link

@nrobinaubertin nrobinaubertin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review everything

.github/workflows/end_to_end_test.yaml Show resolved Hide resolved
README.md Show resolved Hide resolved
# Assuming you're on amd64
juju deploy ./github-runner_ubuntu-22.04-amd64_ubuntu-20.04-amd64.charm
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PAT tokens, just linking to the doc should be enough

templates/start.j2 Show resolved Hide resolved


@pytest.mark.abort_on_fail
async def test_build_and_deploy(ops_test):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually this is a fixture like here in Flask.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the progress of being change as a part of add integration tests.

self.deleted = True

def execute(self, cmd: Sequence[str], cwd: Optional[str] = None) -> tuple[int, str, str]:
return 0, "", ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't exit code be a parameter to test the behavior in case of error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current method in the tests is to overwrite the method with a different stub method.

@patch("charm.RunnerManager")
@patch("pathlib.Path.write_text")
@patch("subprocess.run")
def test_org_register(self, run, wt, rm):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: it took me some time to get that rm is runner manager

Copy link
Collaborator

@yhaliaw yhaliaw Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file is heavy out of date and is still using unittest. This will be updated in the migrate to pytest story.


@patch("pathlib.Path.write_text")
@patch("subprocess.run")
def test_install(self, run, wt):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the tests: I missed the docstrings as described in the contributing guide.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

),
proxies={},
)
mock_rm.reconcile.assert_called_with(0, VirtualMachineResources(2, "7GiB", "10GiB")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is because of the default values, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.


harness.charm._reconcile_runners = raise_error
harness.charm.on.install.emit()
assert harness.charm.unit.status == BlockedStatus("mock error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of error is expected?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconcile runners can have various type of errors such as LXD error, hardware related errors (out of memory). This test checks if these unexpected errors, sets the charm into BlockedStatus.

if request.param[0] is None:
return None

attrs = {"status": request.param[0], "execute.return_value": (0, "", "")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the possibility of exit_code being a parameter to test execution error case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

bandit[toml]
-r{toxinidir}/requirements.txt
commands =
bandit -c {toxinidir}/pyproject.toml -r {[vars]src_path}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tst_path?

@amandahla
Copy link
Contributor

Finished 👍

src/github_type.py Show resolved Hide resolved
"""
self._pylxd_client = pylxd_client

def all(self) -> list[LxdInstance]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not specifically related to this, addressing the question/ comment regarding list vs tuple. The order of preference for return values depending on what is possible:

  1. A generator, the caller can decide what kind of data structure to use
  2. A tuple, passing around lists can be dangerous since anyone can modify them accidentally, e.g., in a subroutine which can't happen with a tuple
  3. A list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly speaking, this class feels like it is a bit specific to LXD. It would be good to think about the virtualisation management interface the charm requires without thinking about LXD too much and make this module that interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also some virtualisation specific things in the other parts of the code (like in charm.py) that would be good to move to be a part of the interface

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I feel like the API is generic enough, the naming of the class are specific and should be renamed, we can easily derive an interface from the methods
Only the ProfileManager seems pretty LXD specific

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes and the exception raised also should be generic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

busy (bool): Whether GitHub marks this runner as busy.
"""

runner_application = Path("/opt/github-runner")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough one, there are arguments for going both ways:

  • If there are no other variables related to this path (e.g., the contents or something like that), adding the _path postfix doesn't add a lot of value
    • If there are related variables that are defined, then it can be worthwhile to indicate that one hols the path handle, another the content, another the deserialised content etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the key files for the runners, it would be good to write an interface definition to make it easier to understand it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for runner_manager.py

try:
return func(*args, **kwargs)
# Error caught is set by the input of the function.
except exception as err: # pylint: disable=broad-exception-caught
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to take this as an argument so that we're not handling arbitrary exceptions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the exception is an argument, but it is defaulting to the Exception class.
It will take a major effort to find all exceptions to retry on for every instances of usage.
Also the exception argument would need to be change to a tuple of Exceptions rather than a single type of exception.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

src/utilities.py Show resolved Hide resolved
TimerDisableError: Timer cannot be stopped. Events will be emitted continuously.
"""
try:
# Don't check for errors in case the timer wasn't registered.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth it throwing an exception when you try to disable something already disabled ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really should use this "trick" to trigger regularly the charm
I know there's no good way to do it, but this seems to bring some complication here using systemd

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major comments

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I feel like the API is generic enough, the naming of the class are specific and should be renamed, we can easily derive an interface from the methods
Only the ProfileManager seems pretty LXD specific

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes and the exception raised also should be generic

"""
self._pylxd_client = pylxd_client

def all(self) -> list[LxdInstance]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something but is this only used for the tests to get the number of instances? If this is the case, why not change to return this number instead of the entire list?

I indeed only found references in test_runner and it seems it's systematically using a mocked object. So if that's the only use case we could probably find another way around

src/lxd_type.py Show resolved Hide resolved

self.service_token = None

self.on.define_event("reconcile_runners", ReconcileRunnersEvent)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 events are the one triggered by the timer service which is using SystemD scheduling to call the dispatch script from a juju run command

* `token` is a [GitHub Personal Access Token (PAT)](https://github.com/settings/tokens) (note: this is not the same as the token given in the Add a Runner instructions). The PAT token requires either:
* the **`repo`** ("Full control of private repositories") permission for
use with repositories or;
* both the **`repo`** and **`admin:org`** ("Full control of orgs and teams, read and write org projects") permissions for use with an organization. This is necessary because the charm will create and remove runners as needed to ensure that each runner executes only one job to protect jobs from leaking information to other jobs running on the same runner.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fine-grained token are in beta right now, but it might be good to include the instruction for fine-graned tokens.
https://docs.github.com/en/rest/overview/permissions-required-for-fine-grained-personal-access-tokens?apiVersion=2022-11-28

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@merkata
Copy link
Contributor

merkata commented Jul 10, 2024

@jdkandersson can this be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.