WALL-E role updates #7

neoformit · 2024-09-20T00:51:35Z

You may wish to have a discussion about this, but I made quite a few changes to get WALL-E working on Galaxy AU. I think these changes should improve interoperability between Galaxy servers, but would require a minor update to EU's playbook to work (additional vars).

Add debug logging with --debug option for walle.py
Modify cron job to fix environment variable issues
Add all required env variables to walle_bashrc
Catch permission error in walle.py
Add --kill option for walle.py to kill malicious jobs with gxadmin
Include a modified version of galaxy_jwd.py that accepts XML or YAML format for object_store_conf
Ignore Ansible failure of "Clone malware database (WallE)"
Update README

Add debug logging

walle.py --debug was really useful for getting this going. It's very verbose, so you probably don't want to leave it on in production.
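A minimal sketch of how a flag like this can switch the log level, assuming an argparse-based CLI. The function name `configure_logging` and the logger name `walle_sketch` are hypothetical and not taken from walle.py; the log format is only modelled on the walle.log line quoted later in this thread.

```python
import argparse
import logging


def configure_logging(argv):
    """Return a logger whose level depends on a --debug flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Stand-in for a walle.py-style CLI")
    parser.add_argument("--debug", action="store_true", help="emit verbose debug logs")
    args = parser.parse_args(argv)

    logger = logging.getLogger("walle_sketch")
    logger.setLevel(logging.DEBUG if args.debug else logging.INFO)
    if not logger.handlers:
        # Attach a single handler so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s - %(levelname)s - %(message)s",
                datefmt="%Y-%m-%d %H:%M",
            )
        )
        logger.addHandler(handler)
    return logger


# Without the flag, debug messages are suppressed.
quiet_level = configure_logging([]).level

# With the flag, debug messages are emitted.
loud_logger = configure_logging(["--debug"])
loud_level = loud_logger.level
loud_logger.debug("Scanning job working directory")
```

Keeping the flag off by default means the cron job only logs warnings and findings unless someone opts in.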

Modify cron job

I found that cron is not able to source our Galaxy .bashrc file, probably because it contains code that is specific to an interactive shell. The result is that none of the env vars that are set in walle_bashrc make it into WALL-E's env. I fixed this by creating a new .bashrc file for WALL-E (with all required env variables) and sourcing it in crontab like so:

# crontab
0 */1 * * * BASH_ENV=/mnt/galaxy/walle/.bashrc bash -c "source /mnt/galaxy/venv/bin/activate; /mnt/galaxy/venv/bin/python /usr/local/bin/walle.py --tool interactive_tool --max-size 10 --since 24 --debug --kill >> /var/log/walle/walle.log 2>&1"
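The mechanism the crontab entry relies on can be checked in isolation. The sketch below assumes `bash` is on the PATH; the helper name `run_with_bash_env` and the temporary file are made up for illustration. It starts a non-interactive `bash -c` with `BASH_ENV` pointing at a small env file and confirms that the variable exported there is visible to the command.

```python
import os
import subprocess
import tempfile


def run_with_bash_env(env_file, command):
    """Run a command in non-interactive bash that reads env_file via BASH_ENV."""
    # Start from a near-empty environment, similar to what cron provides.
    env = {
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
        "BASH_ENV": env_file,
    }
    result = subprocess.run(
        ["bash", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Write a throwaway env file standing in for the dedicated WALL-E .bashrc.
with tempfile.NamedTemporaryFile("w", suffix=".bashrc", delete=False) as handle:
    handle.write("export PGUSER=galaxy\n")
    env_path = handle.name

try:
    pguser = run_with_bash_env(env_path, 'echo "$PGUSER"')
finally:
    os.unlink(env_path)

print(pguser)
```

Because the env file contains only plain exports, it avoids the interactive-shell guards that can stop a regular .bashrc from being read under cron.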

Add all required env variables

To enable the dedicated walle_bashrc, all required env vars are added by the walle role. This requires additional Ansible variables, which are documented in README.md. Additional env vars can be easily added in the playbook with walle_extra_env_vars.

Permission error in `walle.py`

Instead of failing, log a warning if a PermissionError is raised when reading a JWD file.
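A self-contained sketch of this pattern, modelled on the diff fragment quoted in the review further down. The function name `sha1_of_file`, the chunk size, and the `None` return value are assumptions for illustration and may differ from the actual walle.py code.

```python
import hashlib
import logging
import os
import tempfile

logger = logging.getLogger("walle_sketch")


def sha1_of_file(path, chunksize=65536):
    """Return the SHA-1 hex digest of a file, or None if it cannot be read."""
    sha1 = hashlib.sha1()
    try:
        with open(path, "rb") as specimen:
            # Read in chunks so large JWD files are never loaded whole.
            while chunk := specimen.read(chunksize):
                sha1.update(chunk)
    except PermissionError:
        # Log and move on rather than aborting the whole scan.
        logger.warning(f"Permission denied for file: {path}")
        return None
    return sha1.hexdigest()


# Hash a throwaway file to show the normal path.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example payload")
    sample_path = handle.name

try:
    digest = sha1_of_file(sample_path)
finally:
    os.unlink(sample_path)
```

Returning `None` lets the caller skip the checksum lookup for unreadable files while still scanning the rest of the directory.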

Add `--kill` option for `walle.py`

Optional, of course. We won't use this yet but perhaps in future when we're confident in WALL-E's abilities. We also would like to add an --alert option to notify us in Slack when malicious files are detected - this will most likely be our default. The kill option assumes that gxadmin is accessible (can point to it with env var GXADMIN_PATH), which will be run like:

gxadmin mutate fail-job $JOB_ID --commit
gxadmin mutate fail-terminal-datasets --commit
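As a rough sketch, the two commands above could be assembled in Python like this. The helper name `build_kill_commands` and the example job ID are hypothetical; only the gxadmin subcommands and the GXADMIN_PATH variable come from the description above. The sketch only builds and prints the argument lists and does not run gxadmin.

```python
import os
import shlex


def build_kill_commands(job_id, gxadmin=None):
    """Return the argv lists used to fail a job and its terminal datasets."""
    if gxadmin is None:
        # Fall back to whatever is on PATH if GXADMIN_PATH is not set.
        gxadmin = os.environ.get("GXADMIN_PATH", "gxadmin")
    return [
        [gxadmin, "mutate", "fail-job", str(job_id), "--commit"],
        [gxadmin, "mutate", "fail-terminal-datasets", "--commit"],
    ]


commands = build_kill_commands(12345, gxadmin="/usr/local/bin/gxadmin")
for argv in commands:
    # Print a shell-safe rendering instead of executing anything.
    print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the commands; building them as lists rather than strings avoids shell quoting issues with the job ID.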

Ignore Ansible failure of "Clone malware database (WallE)"

Useful if you want to make a local modification of checksums.yml for testing.

neoformit · 2024-09-20T00:52:58Z

Oh dear, I just noticed that this overlaps a bit with your existing PR @mira-miracoli 😬

mira-miracoli · 2024-09-25T07:10:21Z

Hi, thank you for your contribution! :)
Especially thank you for integrating the galaxy_jwd.py part into the script, I shied away from that task😅
The --kill option is a cool feature. I thought about it, but dropped it in favor of the --delete-user option,
because Galaxy automatically kills all running jobs then. And I did not see many cases for EU where we would not delete a user when we find stuff that would make us want to kill the job.
Would it be okay for you if we merge my PR first and then you rebase?

neoformit · 2024-09-25T19:41:22Z

Hey Mira, yes please feel free to merge your PR first and I'll clean up mine. The --kill option I thought was just a bit less aggressive than --delete-user but really we should not be doing either of these until we're confident of no type 1 errors! Nice to give admins a choice of actions I suppose. I hope to push something like --slack-alert in the next week or two.

mira-miracoli · 2024-10-08T14:58:41Z

@neoformit
I merged my PR now, please feel free to rebase and then we can merge yours :)

neoformit · 2024-10-10T21:49:47Z

Hey @mira-miracoli I've rebased, there were lots of conflicts that I think I've resolved correctly but please check the diff on walle.py. It was a bit confusing at times - I noticed that in some diffs my changes were marked as "current" and other times "incoming" so it was hard to tell whose work I was accepting 🙄 I'm having a look over the diff now.

neoformit

I think this looks ok now, I'll try updating the role and running our playbook to check for issues.

neoformit · 2024-10-10T22:16:24Z

files/walle.py

+            while chunk := specimen.read(chunksize):
+                sha1.update(chunk)
+    except PermissionError:
+        logger.warning(f"Permission denied for file: {path}")


There are two cases where I caught permission errors here, in my case it was only for one file in the JWD (I think command.sh) but this error could be fatal if all JWD files are PermissionDenied. I guess in that case it would be pretty obvious in walle.log that walle is not working.

Yep, in my walle.log I get:

2024-10-11 04:48 - WARNING - Permission denied for file: /mnt/galaxy/tmp/job_working_directory/_interactive/24546/command.sh

Thank you! I think walle should run as root, because the jupyter users have root access inside their jupyter notebook and can save files as uid and gid 0

WallE could cleanup everything non-root and the rest we could leave to our normal cleanup scripts?

Walle does not clean up, it just scans files. But in order to do so, it needs read access

neoformit · 2024-10-11T04:52:58Z

Tested on the AU dev server and working ok with --verbose --debug

mira-miracoli

Sorry some of the comments are just ideas, not worth changing.
I am not sure how --kill and --debug will be used in the script, I don't see a change in the main function
(You're probably still working on it?)

tasks/main.yml

mira-miracoli · 2024-10-14T08:21:05Z

defaults/main.yml

@@ -30,6 +31,13 @@ walle_envs_database:
    value: "{{ galaxy_config_dir }}/galaxy.yml"


Suggested change

value: "{{ galaxy_config_dir }}/galaxy.yml"

value: "{{ galaxy_config_file }}"

I would also change the following (but gh does not allow suggestions on unchanged code parts :/ or I don't know how):

-  - key: PGHOST
-    value: 127.0.0.1
-  - key: PGUSER
-    value: galaxy
-  - key: PGDATABASE
-    value: galaxy
+  - key: PGHOST
+    value: "{{ galaxy_pg_host }}"
+  - key: PGUSER
+    value: "{{ galaxy_pg_user }}"
+  - key: PGDATABASE
+    value: "{{ galaxy_pg_db }}"

galaxy_pg_host, galaxy_pg_user and galaxy_pg_db seem to be EU-specific playbook vars, we don't have them in AU or in the galaxyproject.galaxy role? I assumed that admins would change these values with walle_extra_env_vars if they wanted to customize them.

oh okay, to me it looked like you added them in the README

Yep you're right, I did 🤦 I have committed this suggestion: 4ce5886. It does make it easier for admins to control these basic vars, but can still get more flexibility with walle_extra_env_vars.

defaults/main.yml

mira-miracoli · 2024-10-14T08:48:53Z

files/walle.py

+            while chunk := specimen.read(chunksize):
+                sha1.update(chunk)
+    except PermissionError:
+        logger.warning(f"Permission denied for file: {path}")


Walle does not clean up, it just scans files. But in order to do so, it needs read access

files/walle.py

tasks/main.yml

neoformit · 2024-10-14T20:22:27Z

Thanks a lot for the thorough review @mira-miracoli 🎉
It looks like --debug is redundant now, so I have dropped that in favour of --verbose.
The --kill logic was lost in the merge so I've replaced that - thanks for noticing. I need to get better at doing a complex rebase 🙄

I think we just have to decide how to handle the walle_env_vars which don't always have a playbook variable to default to.

mira-miracoli · 2024-10-15T07:42:33Z

Of course, many thanks for the great contribution! :)

I think we just have to decide how to handle the walle_env_vars which don't always have a playbook variable to default to.

I am not happy with the dictionaries either. Ansible does not allow changing only specific k/v pairs from default, so you need to copy everything and then do your changes.
I am not sure if there are variables for all database k/v pairs. Maybe we could leave it this way with hardcoded default values? I think

  - key: PGHOST
    value: 127.0.0.1
  - key: PGUSER
    value: galaxy
  - key: PGDATABASE
    value: galaxy
  - key: GXADMIN_PATH
    value: /usr/local/bin/gxadmin

should work for most Galaxy instances(?)
The only catch is if people rely on --kill working because Ansible ran without errors, when they don't actually have gxadmin installed.
But I think if we write that in the docs, that would be the user's responsibility.

Co-authored-by: Mira <[email protected]>

neoformit · 2024-10-15T08:20:48Z

Ansible does not allow changing only specific k/v pairs from default

True, but I think they can just override them by appending to walle_extra_env_vars? Here's my logic:

walle_envs_database: 
  - key: PGHOST
    value: 127.0.0.1
  - key: PGUSER
    value: galaxy

walle_extra_env_vars:  # Defined in playbook
  - key: PGUSER
    value: custom_db_user

walle_env_vars: "{{ walle_envs_database + walle_extra_env_vars }}"
# Results in:
walle_env_vars:
  - key: PGHOST
    value: 127.0.0.1
  - key: PGUSER
    value: galaxy
  - key: PGUSER
    value: custom_db_user

If you do lineinfile with walle_env_vars, PGUSER=custom_db_user would overwrite PGUSER=galaxy, right?
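The override logic can be modelled outside Ansible. The sketch below is plain Python, not the role's actual task; it assumes that a later entry for a key ends up winning, either because its line replaces the earlier one or because the later assignment takes effect when the file is sourced. The function name `render_env_lines` is hypothetical.

```python
def render_env_lines(*var_lists):
    """Merge key/value lists in order; later entries override earlier ones."""
    merged = {}
    for var_list in var_lists:
        for item in var_list:
            # Assigning to an existing key keeps its position but updates the value.
            merged[item["key"]] = item["value"]
    return [f"{key}={value}" for key, value in merged.items()]


walle_envs_database = [
    {"key": "PGHOST", "value": "127.0.0.1"},
    {"key": "PGUSER", "value": "galaxy"},
]
walle_extra_env_vars = [  # would be defined in the playbook
    {"key": "PGUSER", "value": "custom_db_user"},
]

env_lines = render_env_lines(walle_envs_database, walle_extra_env_vars)
print("\n".join(env_lines))
```

Under that assumption, admins only need to list the keys they want to change in walle_extra_env_vars, without copying the whole default list.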

neoformit · 2024-10-15T08:25:09Z

Sorry, didn't mean to close this. I did a typing blooper and hit some shortcut by accident 💩

mira-miracoli · 2024-10-15T08:36:49Z

If you do lineinfile with walle_env_vars, PGUSER=custom_db_user would overwrite PGUSER=galaxy, right?

oh smart! I did not think of this, that you could use the walle_extra_env_vars to overwrite.
Maybe we could include that in the README?

neoformit · 2024-10-15T08:38:59Z

Maybe we could include that in the README?

Yeah for sure! Do you think it's good to also have the required playbook vars that I defined in the README?

galaxy_config_file: /path/to/galaxy.yml
galaxy_log_dir: /path/to/galaxy/log/dir
galaxy_pg_db: galaxy
galaxy_pg_user: galaxy
galaxy_pg_host: my-db-server.usegalaxy.org
galaxy_pulsar_app_conf: /path/to/pulsar/app.yml

mira-miracoli · 2024-10-15T08:49:58Z

If they only appear in eu's playbooks, I think we could remove that

mira-miracoli · 2024-10-16T07:41:38Z

Sorry I know this is very annoying but pyright gave me some errors.
Could you check with pyright (I just pushed a gh action) and format with black?
I can also do that, I thought it is nicer to ask before pushing stuff to your PR

mira-miracoli · 2024-10-16T07:56:14Z

Closed and reopened, so the pyright action triggers (I don't know a more subtle way 🙄 )

neoformit · 2024-10-16T22:51:13Z

That is annoying, it seems that pyright raises linting errors for type hints on dependencies (i.e. we can't add type hinting to os.environ.get). But I can see how the linting is useful. I've done the best I can there - feel free to push if you see any others that need fixing?
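For context, `os.environ.get` returns `Optional[str]`, so a type checker can complain when the result is passed where a plain `str` is expected. One common way to satisfy it is to narrow the value explicitly, as in the sketch below. The helper `require_env` and the variable name `WALLE_SKETCH_VAR` are made up for illustration, and this may not match the exact errors or fixes in this PR.

```python
import os
from typing import Optional


def require_env(name: str) -> str:
    """Return an environment variable as str, failing loudly if it is unset."""
    value: Optional[str] = os.environ.get(name)
    if value is None:
        # After this check, type checkers treat value as str.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a throwaway variable.
os.environ["WALLE_SKETCH_VAR"] = "/usr/local/bin/gxadmin"
gxadmin_path = require_env("WALLE_SKETCH_VAR")
print(gxadmin_path)
```

Failing early on a missing variable also surfaces a misconfigured env file at startup rather than midway through a scan.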

neoformit marked this pull request as ready for review September 23, 2024 04:14

mira-miracoli mentioned this pull request Sep 26, 2024

Add delete user fuctionality, increase verbosity and general refactoring #2

Merged

1 task

neoformit added 15 commits October 11, 2024 06:25

Add logging with debug

d094cf2

Catch permission error

b740c29

Fix cron job env vars by invoking bash shell

4166f93

Add option to kill jobs with gxadmin

e51a97e

Update README

35ad742

Add galaxy_jwd.py script

6ef5e70

Modify galaxy_jwd.py to accept YAML object_store_conf

dfbc582

Document walle_virtualenv in defaults

e1e66fb

Fix walle_bashrc file

59ac1d3

Move python scripts to files

2fde4fd

Set bashrc env vars from playbook vars

f19593f

Improved debug logging with subprocess

3b7c50a

Fix subprocess kwarg

b3f2173

Add walle_extra_env_vars to make env vars configurable

393a4d1

Apply github/super-linter@v4 diffs

287f6b4

neoformit force-pushed the main branch from 3409093 to 287f6b4 Compare October 10, 2024 21:48

neoformit added 5 commits October 11, 2024 07:50

Remove duplicate logging.getLogger

44d8acf

Remove unused CURRENT_TIME

fb3ac66

Revert merge conflicts in walle.py

b36eff9

copy -> ansible.builtin.copy

5414f9e

walle_env_vars merge conflict

5a07b78

neoformit commented Oct 10, 2024


Remove old report_matching_malware

ba5d363

Remove invalid logger.debug param

7d747eb

neoformit mentioned this pull request Oct 13, 2024

Wall-E config updates usegalaxy-au/infrastructure#2231

Open

mira-miracoli reviewed Oct 14, 2024


neoformit added 5 commits October 15, 2024 05:46

Set exit code zero on no jobs found

a812136

Replace call to kill_job()

317f031

Use ansible.builtin.file

20b5cee

Remove redundant walle_debug

a2f2246

Create var walle_malware_database_force_update

816fca9

Debug log calculated SHA1 hash

a934a31

Co-authored-by: Mira <[email protected]>

neoformit closed this Oct 15, 2024

neoformit reopened this Oct 15, 2024

neoformit added 3 commits October 15, 2024 18:35

Fix README

ac620a3

Set env vars from ansible vars

4ce5886

Merge branch 'main' of https://github.com/usegalaxy-au/WallE

f08e0f8

Remove required env vars

b688e8a

mira-miracoli closed this Oct 16, 2024

mira-miracoli reopened this Oct 16, 2024

neoformit added 2 commits October 17, 2024 08:25

Black format walle

91cebf5

Fix pyright typing lint issues

71e15e4
