This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Implementing a Driver

Alexander Dubrawski edited this page Aug 27, 2020 · 14 revisions

Basic Idea

[Figure: driver architecture]

The Cockpit's only job is to execute tasks and collect all relevant meta information. We created drivers for the workloads/benchmarks that define their logic: the generator uses a driver to generate tasks, and the database uses it to execute them. We decided that the best generic implementation is an abstraction over workloads. A workload is defined by a name (type), a scale factor, a frequency, and weights. The Cockpit understands a workload only through these attributes and uses them to communicate with the driver. How exactly the tables are named in the database and in which folder the queries are located (or how they are generated) is implemented in the driver. The drivers and the Cockpit are connected via the connector, which manages a dictionary of all drivers for the database component and a dictionary of Workload objects (which include the driver) for every workload the generator can use.
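To make the abstraction concrete, here is a minimal sketch of how a workload object might delegate task generation to its driver. Workload, EchoDriver, the get method, and the drivers dictionary are hypothetical names for illustration, not the Cockpit's actual classes, and the sketch assumes frequency is the number of tasks produced per call.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class EchoDriver:
    """Hypothetical stand-in driver; real drivers live in hyrisecockpit.drivers."""

    def generate(
        self, scalefactor: float, frequency: int, weights: Dict[str, float]
    ) -> List[Dict[str, Any]]:
        # One trivial task per unit of frequency (assumed semantics).
        return [
            {"query": "SELECT 1;", "args": None, "scalefactor": scalefactor, "query_type": "echo"}
            for _ in range(frequency)
        ]


@dataclass
class Workload:
    """Hypothetical workload object holding only the abstract attributes."""

    workload_type: str
    scalefactor: float
    frequency: int
    weights: Dict[str, float]
    driver: Any = field(repr=False)

    def get(self) -> List[Dict[str, Any]]:
        # The Cockpit passes only the abstract attributes; the driver decides
        # what the queries and table names look like.
        return self.driver.generate(self.scalefactor, self.frequency, self.weights)


# Hypothetical connector-style registry: workload name -> driver instance.
drivers = {"echo": EchoDriver()}
workload = Workload("echo", 1.0, 3, {}, drivers["echo"])
tasks = workload.get()
```

The Workload object never inspects the tasks it returns; that separation is what lets new benchmarks be added by writing a driver alone.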

Implementing a Driver

The following instructions explain how to implement a new driver. A driver contains the execution logic of a workload and must implement the following interface methods:

  • get_scalefactors()
  • get_default_weights()
  • generate(scalefactor, frequency, weights)
  • get_table_names(scalefactor)
  • get_load_queries(scalefactor)
  • get_delete_queries(scalefactor)
  • execute_task(task, cursor, worker_id)
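One way to make sure a new driver covers the whole interface is an abstract base class. The sketch below is hypothetical: the existing drivers are plain classes, and AbstractDriver as well as the type hints (for example, worker_id as Any) are assumptions inferred from the descriptions on this page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class AbstractDriver(ABC):
    """Hypothetical base class listing the driver interface described above."""

    @abstractmethod
    def get_scalefactors(self) -> List[float]: ...

    @abstractmethod
    def get_default_weights(self) -> Dict[str, float]: ...

    @abstractmethod
    def generate(
        self, scalefactor: float, frequency: int, weights: Dict[str, float]
    ) -> List[Dict[str, Any]]: ...

    @abstractmethod
    def get_table_names(self, scalefactor: float) -> Dict[str, str]: ...

    @abstractmethod
    def get_load_queries(self, scalefactor: float) -> Dict[str, str]: ...

    @abstractmethod
    def get_delete_queries(self, scalefactor: float) -> Dict[str, str]: ...

    @abstractmethod
    def execute_task(
        self, task: Dict[str, Any], cursor: Any, worker_id: Any
    ) -> Tuple[int, int, float, str, bool]: ...
```

A subclass that forgets one of the seven methods raises a TypeError when it is instantiated, instead of failing later inside the generator or a worker.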

The new driver X must be implemented in hyrisecockpit.drivers. Inside this folder, create a new folder for the driver (driver_x in the example below) and implement the interface in a class named XDriver in x_driver.py:

hyrisecockpit
|-> drivers
		|-> job
		|-> tpch
		|-> driver_x
				|-> x_driver.py
class XDriver:
    """Driver for workload X."""

    def get_scalefactors(self):
        ...

    # ... remaining interface methods ...

    def execute_task(self, task, cursor, worker_id):
        ...

Once the driver is implemented, add it to the connector in hyrisecockpit.drivers.connector: import the driver and add it to the dictionary returned by get_workload_drivers:

...
from hyrisecockpit.drivers.tpch.tpch_driver import TpchDriver
from hyrisecockpit.drivers.driver_x.x_driver import XDriver

...

class Connector:
    ...

    @classmethod
    def get_workload_drivers(cls):
        """Return a dictionary with workload drivers."""
        return {
            "tpch": TpchDriver(),
            ...
            "x": XDriver(),
        }

Interface Function Definitions

get_scalefactors() -> List

Returns the scale factors supported by this workload as a list of floats.
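A minimal sketch of get_scalefactors for the hypothetical driver X; the values 0.1 and 1.0 are assumptions for illustration, not scale factors any particular driver is known to support.

```python
from typing import List


class XDriver:
    """Hypothetical driver X; only the method discussed here is shown."""

    def get_scalefactors(self) -> List[float]:
        # Scale factors must be floats, even when they are whole numbers.
        return [0.1, 1.0]
```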

get_default_weights() -> Dict

Returns the default weights for the queries of the workload. The weights are passed to the generate function. The function must return a dictionary whose keys are the query names and whose values are the query weights. Each weight must be a float, where 1.0 is the neutral (default) weight. If the workload has no weights, return an empty dictionary.
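A possible get_default_weights for driver X, assuming its queries are named q1, q2 and q3 (hypothetical names). Every query gets the neutral weight 1.0, so no query is favored:

```python
from typing import Dict, List


class XDriver:
    # Hypothetical query names of workload X.
    QUERIES: List[str] = ["q1", "q2", "q3"]

    def get_default_weights(self) -> Dict[str, float]:
        # Every query starts with the neutral weight 1.0.
        return {query: 1.0 for query in self.QUERIES}
```

A workload without weights would simply return {} here.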

generate(scalefactor, frequency, weights) -> List

This function must return a list of workload tasks. The tasks are published by the workload generator, picked up by the task_worker, and executed by this driver's execute_task function. Use the frequency and weights arguments to generate the right number of tasks. Each task must be a dictionary with the keys query, args (the query parameters), scalefactor, and query_type.
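The sketch below shows one way generate could turn frequency and weights into tasks, by sampling query names with random.choices. It assumes frequency is the number of tasks to produce per call, that queries missing from weights default to 1.0, and that parameters use psycopg2-style %s placeholders; the queries and the table names customer_x and orders_x are hypothetical.

```python
import random
from typing import Any, Dict, List


class XDriver:
    # Hypothetical queries of workload X: query name -> SQL text.
    QUERIES: Dict[str, str] = {
        "q1": "SELECT COUNT(*) FROM customer_x;",
        "q2": "SELECT * FROM orders_x WHERE o_id = %s;",
    }

    def generate(
        self, scalefactor: float, frequency: int, weights: Dict[str, float]
    ) -> List[Dict[str, Any]]:
        names = list(self.QUERIES)
        # Queries without an explicit weight get the neutral weight 1.0.
        query_weights = [weights.get(name, 1.0) for name in names]
        # Draw `frequency` query names, proportionally to their weights.
        chosen = random.choices(names, weights=query_weights, k=frequency)
        tasks = []
        for name in chosen:
            sql = self.QUERIES[name]
            # Only parameterized queries get arguments.
            args = (random.randint(1, 100),) if "%s" in sql else None
            tasks.append(
                {"query": sql, "args": args, "scalefactor": scalefactor, "query_type": name}
            )
        return tasks
```

Setting a weight to 0.0 removes that query from the mix, while a weight of 2.0 makes it roughly twice as likely as a query with weight 1.0.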

get_table_names(scalefactor) -> Dict

Returns a dictionary with all tables that belong to this workload. The key is the table name within the workload and the value is the table's name inside Hyrise. For example:

return {
	"customer": "customer_tpcds_1_0",
	...
}
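The table names in the example follow the pattern <table>_<benchmark>_<scale factor with the dot replaced by an underscore>. A sketch that derives them, assuming this pattern and the hypothetical attributes TABLES and BENCHMARK:

```python
from typing import Dict, List


class XDriver:
    # Hypothetical tables of workload X (borrowing TPC-DS table names).
    TABLES: List[str] = ["customer", "store_sales"]
    BENCHMARK = "tpcds"

    def get_table_names(self, scalefactor: float) -> Dict[str, str]:
        # 1.0 -> "1_0", 0.1 -> "0_1": dots are not valid in unquoted SQL identifiers.
        suffix = str(scalefactor).replace(".", "_")
        return {table: f"{table}_{self.BENCHMARK}_{suffix}" for table in self.TABLES}
```

Encoding the scale factor in the name lets tables of different scale factors coexist in the same Hyrise instance.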

get_load_queries(scalefactor) -> Dict

Returns a dictionary where the key is the table name in Hyrise and the value is the SQL command that loads the table. For example:

return {
	"customer_tpcds_1_0": "COPY customer_tpcds_1_0 FROM '/usr/local/hyrise/cached_tables/tpcds_1_0/customer.bin';",
	...
}
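Load queries can be derived from the table names. This sketch assumes the cached binaries for a scale factor live in a folder named tpcds_<scale factor> below /usr/local/hyrise/cached_tables, with one <table>.bin file per table; TABLES is a hypothetical attribute, and the actual folder layout is whatever your Dockerfile produces.

```python
from typing import Dict, List

CACHED_TABLES = "/usr/local/hyrise/cached_tables"


class XDriver:
    # Hypothetical tables of workload X.
    TABLES: List[str] = ["customer", "store_sales"]

    def get_table_names(self, scalefactor: float) -> Dict[str, str]:
        suffix = str(scalefactor).replace(".", "_")
        return {table: f"{table}_tpcds_{suffix}" for table in self.TABLES}

    def get_load_queries(self, scalefactor: float) -> Dict[str, str]:
        # Assumed layout: <CACHED_TABLES>/tpcds_<sf>/<table>.bin
        folder = f"{CACHED_TABLES}/tpcds_{str(scalefactor).replace('.', '_')}"
        return {
            hyrise_name: f"COPY {hyrise_name} FROM '{folder}/{table}.bin';"
            for table, hyrise_name in self.get_table_names(scalefactor).items()
        }
```

Reusing get_table_names keeps the load queries and the table names from drifting apart.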

By convention, all binary tables are stored in /usr/local/hyrise/cached_tables on the machine running Hyrise. If you implement a driver, please update the Dockerfile (it needs to build all necessary binary tables) and the guide https://github.com/hyrise/Cockpit/wiki/Hyrise-Things on how to build the binary tables.

get_delete_queries(scalefactor) -> Dict

Returns a dictionary where the key is the table name in Hyrise and the value is the SQL command that deletes the table. For example:

return {
	"customer_tpcds_1_0": "DROP TABLE customer_tpcds_1_0;",
	...
}
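Delete queries can be derived the same way. This sketch issues DROP TABLE, which is an assumption about how a table should be removed from Hyrise; adjust the statement if your setup expects something else. TABLES is again a hypothetical attribute.

```python
from typing import Dict, List


class XDriver:
    # Hypothetical tables of workload X.
    TABLES: List[str] = ["customer", "store_sales"]

    def get_table_names(self, scalefactor: float) -> Dict[str, str]:
        suffix = str(scalefactor).replace(".", "_")
        return {table: f"{table}_tpcds_{suffix}" for table in self.TABLES}

    def get_delete_queries(self, scalefactor: float) -> Dict[str, str]:
        # Assumption: DROP TABLE removes the whole table from Hyrise.
        return {
            hyrise_name: f"DROP TABLE {hyrise_name};"
            for hyrise_name in self.get_table_names(scalefactor).values()
        }
```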

execute_task(task, cursor, worker_id) -> Tuple[int, int, float, str, bool]

This function handles the execution of a task created by the generate function. For this, it receives a cursor object as an argument. The cursor is defined in hyrisecockpit.database_manager.cursor.HyriseCursor and builds its core functionality on psycopg2 (https://www.psycopg.org/). The function must return a tuple containing, in this order: the start timestamp of the execution, the latency (end timestamp - start timestamp), the scale factor of the workload, the query type, and whether the task was committed successfully.
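A minimal sketch of execute_task, assuming the cursor exposes a psycopg2-style execute(query, args) and that a task counts as successful when execute raises no exception. It records the start timestamp with time.time_ns() but measures the latency with the monotonic time.perf_counter_ns(), so clock adjustments cannot distort it; returning both values in nanoseconds is an assumption about the units the Cockpit expects.

```python
import time
from typing import Any, Dict, Tuple


class XDriver:
    """Hypothetical driver X; only execute_task is shown."""

    def execute_task(
        self, task: Dict[str, Any], cursor: Any, worker_id: Any
    ) -> Tuple[int, int, float, str, bool]:
        start = time.time_ns()  # wall-clock start timestamp (ns)
        t0 = time.perf_counter_ns()  # monotonic clock, used only for the latency
        try:
            cursor.execute(task["query"], task["args"])
            succeeded = True
        except Exception:  # assumption: any error marks the task as failed
            succeeded = False
        latency = time.perf_counter_ns() - t0
        # worker_id is unused in this sketch; a real driver might log it.
        return start, latency, task["scalefactor"], task["query_type"], succeeded
```

Catching the exception instead of letting it propagate keeps a single failing query from stopping the worker, while still reporting the failure through the last tuple element.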

Quick fixes

  • If you get the error ModuleNotFoundError: No module named 'hyrisecockpit.drivers.driver_x', add an empty __init__.py to the driver_x directory of your new driver.

Default driver