Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database and scheduler overhaul #2037

Merged

Conversation

tbeadle
Copy link
Contributor

@tbeadle tbeadle commented Apr 1, 2024

First off, I apologize for the size of this diff.

See the individual commits for a little more detail about the specific changes, but this PR aims to do a few things:

  • Make maintenance of the scheduler simpler
  • Make interaction with the database use the "unit of work" pattern so that transactions can be rolled back easily if an error occurs during some piece of it.
  • Remove the need for the machine lock and "scheduling" of machines for tasks, while keeping it performant.
  • Improve test coverage and make testing easier because configs and the database will start at known, clean state.

This must not be merged without vetting it in the staging environment for a while. I would love it if people using a cloud environment for the VM's could try it out as well because I do not have a way to test that.

Tommy Beadle added 9 commits April 1, 2024 16:38
…odule import.

Calling Database() just returns a proxy to a real database object that
may or may not exist when the Database object is created. That's fine as
long as no attributes/methods are accessed until the init_database
function has been called, which must be called only once during the
program's startup. For testing purposes, the reset_database function can
be called before calling init_database again. This is useful to create a
fresh in-memory database for the test. Note that the Database objects
that were created in other modules at module-level scope will still be
valid and calling methods on them will be proxied to the new database
instance.

load_vms_tags() and load_vms_exists() have been modified so that they
are not called during import of web_utils but are called where needed
and the database is only contacted the first time--the resulting value
is cached. No database queries should be made simply as a result of
importing modules!
Use a scoped_session to make it easy to always get the current database
session with one per thread. In the scheduler and web requests, wrap the
unit of work (a main loop iteration in the scheduler and a
request/response in the Django app) in a transaction so that the whole
thing or none of it will be committed. This also removes the need for
the classlock decorator around all of the Database methods. It also
removes the need for handling commit/rollback behavior in each of those
methods.

This does make us be more aware of when the database is getting
used, by explicitly starting a transaction. In sqlalchemy 2.0+, we can
add `autobegin=False` to the sessionmaker arguments, which will make it
easier to see when database access is being requested (like by using an
attribute of an object that is "expired", i.e. outside of the
transaction it was retrieved in). Without this knowledge, we could be
making unnecessary calls to the database and not know it.

This type of transaction handling also makes it easier to do things like
find and reserve a machine. Improvements to this will be coming in a PR
soon, but for now, we can at least use `with_for_update()` to lock the
row of the machine that is found and, since that single request is not
its own transaction, it can be locked later in the same transaction and
we are guaranteed that no other thread will grab it.
- Use fixtures appropriately (i.e. be more pytest-y)
- Honor the updated transaction handling. This generally consists of
  wrapping the API call in a transaction as it would be during a web
  request or an iteration of the scheduler. Allow that transaction to be
  committed to make sure rows are written to the database correctly. Use
  another transaction to do validation that the database was updated as
  expected. We need to wrap those in a transaction so that it doesn't
  create an implicit one upon the first database command sent to it.
- Enable engine.echo so that, upon test failure or if `-s` is supplied
  to `pytest`, the SQL statements issued to the database are output.
  This enables for manual verification that it's issuing the commands we
  expect it to.
It is, so far, unchanged except for things required to make it work from
a file separate from scheduler.py.
- Improve the handling of tasks, machines, and how they are updated in
  the database. Use transactions and locks appropriately so that changes
  are more atomic.
- Remove the need for holding the machine_lock for long periods of time
  in the main loop and remove the need for batch scheduling tasks.
- Make the code a lot cleaner and readable, including separation of concerns
  among various classes. Introduce a MachineryManager class that does
  what the name suggests. In the future, this could have an API added
  that could provide us a way to dynamically update machines in the
  database without having to update a conf file and restart cuckoo.py.
Also rework how Config caching is invalidated to make it work better for
tests.
More tests still need to be added for the Scheduler and AnalysisManager,
but this at least gets us a little more testing than what was being done
before.
…sk if the only machines with that tag are "reserved."
…arent process after forking.

Use `engine.dispose` as described in
https://docs.sqlalchemy.org/en/14/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork.
The benefit of this approach is that connection pools can still be used
in the child processes.
@doomedraven doomedraven changed the base branch from master to staging April 2, 2024 08:28
@doomedraven doomedraven merged commit ed5f57e into kevoreilly:staging Apr 2, 2024
3 of 5 checks passed
@@ -483,31 +482,6 @@ def availables(self, label=None, platform=None, tags=None, arch=None, include_re
label=label, platform=platform, tags=tags, arch=arch, include_reserved=include_reserved, os_version=os_version
)

def acquire(self, machine_id=None, platform=None, tags=None, arch=None, os_version=[], need_scheduled=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cccs-mog note that this method is removed when testing

@kevoreilly
Copy link
Owner

I am seeing:

Traceback (most recent call last):
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 497, in run
    self.launch_analysis()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 459, in launch_analysis
    success = self.perform_analysis()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 445, in perform_analysis
    self.run_analysis_on_guest()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 402, in run_analysis_on_guest
    options = self.build_options()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 236, in build_options
    "options": self.get_machine_specific_options(self.task.options),
AttributeError: 'AnalysisManager' object has no attribute 'get_machine_specific_options'

Can't see get_machine_specific_options() in the code

@tbeadle
Copy link
Contributor Author

tbeadle commented Apr 2, 2024

I'll be pushing up another commit shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants