Database and scheduler overhaul #2037

tbeadle · 2024-04-01T20:53:27Z

First off, I apologize for the size of this diff.

See the individual commits for a little more detail about the specific changes, but this PR aims to do a few things:

Make maintenance of the scheduler simpler
Make interaction with the database use the "unit of work" pattern so that transactions can be rolled back easily if an error occurs during some piece of it.
Remove the need for the machine lock and "scheduling" of machines for tasks, while keeping it performant.
Improve test coverage and make testing easier because configs and the database will start at known, clean state.

This must not be merged without vetting it in the staging environment for a while. I would love it if people using a cloud environment for the VM's could try it out as well because I do not have a way to test that.

…odule import. Calling Database() just returns a proxy to a real database object that may or may not exist when the Database object is created. That's fine as long as no attributes/methods are accessed until the init_database function has been called, which must be called only once during the program's startup. For testing purposes, the reset_database function can be called before calling init_database again. This is useful to create a fresh in-memory database for the test. Note that the Database objects that were created in other modules at module-level scope will still be valid and calling methods on them will be proxied to the new database instance. load_vms_tags() and load_vms_exists() have been modified so that they are not called during import of web_utils but are called where needed and the database is only contacted the first time--the resulting value is cached. No database queries should be made simply as a result of importing modules!

Use a scoped_session to make it easy to always get the current database session with one per thread. In the scheduler and web requests, wrap the unit of work (a main loop iteration in the scheduler and a request/response in the Django app) in a transaction so that the whole thing or none of it will be committed. This also removes the need for the classlock decorator around all of the Database methods. It also removes the need for handling commit/rollback behavior in each of those methods. This does make us be more aware of when the database is getting used, by explicitly starting a transaction. In sqlalchemy 2.0+, we can add `autobegin=False` to the sessionmaker arguments, which will make it easier to see when database access is being requested (like by using an attribute of an object that is "expired", i.e. outside of the transaction it was retrieved in). Without this knowledge, we could be making unnecessary calls to the database and not know it. This type of transaction handling also makes it easier to do things like find and reserve a machine. Improvements to this will be coming in a PR soon, but for now, we can at least use `with_for_update()` to lock the row of the machine that is found and, since that single request is not its own transaction, it can be locked later in the same transaction and we are guaranteed that no other thread will grab it.

- Use fixtures appropriately (i.e. be more pytest-y) - Honor the updated transaction handling. This generally consists of wrapping the API call in a transaction as it would be during a web request or an iteration of the scheduler. Allow that transaction to be committed to make sure rows are written to the database correctly. Use another transaction to do validation that the database was updated as expected. We need to wrap those in a transaction so that it doesn't create an implicit one upon the first database command sent to it. - Enable engine.echo so that, upon test failure or if `-s` is supplied to `pytest`, the SQL statements issued to the database are output. This enables for manual verification that it's issuing the commands we expect it to.

It is, so far, unchanged except for things required to make it work from a file separate from scheduler.py.

- Improve the handling of tasks, machines, and how they are updated in the database. Use transactions and locks appropriately so that changes are more atomic. - Remove the need for holding the machine_lock for long periods of time in the main loop and remove the need for batch scheduling tasks. - Make the code a lot cleaner and readable, including separation of concerns among various classes. Introduce a MachineryManager class that does what the name suggests. In the future, this could have an API added that could provide us a way to dynamically update machines in the database without having to update a conf file and restart cuckoo.py.

Also rework how Config caching is invalidated to make it work better for tests. More tests still need to be added for the Scheduler and AnalysisManager, but this at least gets us a little more testing than what was being done before.

…sk if the only machines with that tag are "reserved."

…arent process after forking. Use `engine.dispose` as described in https://docs.sqlalchemy.org/en/14/core/pooling.html#using-connection-pools-with-multiprocessing-or-os-fork. The benefit of this approach is that connection pools can still be used in the child processes.

cccs-kevin · 2024-04-02T13:28:02Z

modules/machinery/az.py

            label=label, platform=platform, tags=tags, arch=arch, include_reserved=include_reserved, os_version=os_version
        )

-    def acquire(self, machine_id=None, platform=None, tags=None, arch=None, os_version=[], need_scheduled=False):


@cccs-mog note that this method is removed when testing

kevoreilly · 2024-04-02T13:57:01Z

I am seeing:

Traceback (most recent call last):
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 497, in run
    self.launch_analysis()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 459, in launch_analysis
    success = self.perform_analysis()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 445, in perform_analysis
    self.run_analysis_on_guest()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 402, in run_analysis_on_guest
    options = self.build_options()
  File "/opt/CAPEv2/lib/cuckoo/core/analysis_manager.py", line 236, in build_options
    "options": self.get_machine_specific_options(self.task.options),
AttributeError: 'AnalysisManager' object has no attribute 'get_machine_specific_options'

Can't see get_machine_specific_options() in the code

tbeadle · 2024-04-02T14:13:11Z

I'll be pushing up another commit shortly.

Tommy Beadle added 9 commits April 1, 2024 16:38

Move the AnalysisManager to its own module.

8392843

It is, so far, unchanged except for things required to make it work from a file separate from scheduler.py.

Update the unit tests for the database.

08c5aab

Update the remaining unit tests.

8af4e18

Also rework how Config caching is invalidated to make it work better for tests. More tests still need to be added for the Scheduler and AnalysisManager, but this at least gets us a little more testing than what was being done before.

Allow a "reserved" machine to be used when a tag is provided for a ta…

a68f2fe

…sk if the only machines with that tag are "reserved."

doomedraven changed the base branch from master to staging April 2, 2024 08:28

doomedraven merged commit ed5f57e into kevoreilly:staging Apr 2, 2024

cccs-kevin reviewed Apr 2, 2024

View reviewed changes

tbeadle mentioned this pull request Apr 2, 2024

Updates to scheduler overhaul #2040

Merged

xiangchen96 mentioned this pull request Jun 3, 2024

Fix stop, use guest name #2152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Database and scheduler overhaul #2037

Database and scheduler overhaul #2037

Uh oh!

tbeadle commented Apr 1, 2024

Uh oh!

cccs-kevin Apr 2, 2024

Uh oh!

kevoreilly commented Apr 2, 2024

Uh oh!

tbeadle commented Apr 2, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Database and scheduler overhaul #2037

Database and scheduler overhaul #2037

Uh oh!

Conversation

tbeadle commented Apr 1, 2024

Uh oh!

cccs-kevin Apr 2, 2024

Choose a reason for hiding this comment

Uh oh!

kevoreilly commented Apr 2, 2024

Uh oh!

tbeadle commented Apr 2, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants