Data Mining Course Project developed during my MSc in Data Science studies

Leoo99G/RouteAligner

RouteAligner

RouteAligner was developed for a Data Mining course. This file gives a brief overview of the project; for a comprehensive description, refer to the PDF report file Report_RouteAligner.

Problem description

In this scenario, drivers transport goods between cities by truck. A trip is a truck moving between two cities while carrying one or more types of merchandise, each in a specified quantity. A route is a sequence of trips. Based on the frequent orders it receives, the company generates standard routes (SR), i.e., sequences of cities with the merchandise types and quantities for each trip. However, the drivers who execute these routes are assumed to deviate from the plan often. They might load slightly more or less of a particular merchandise, add extra merchandise, or omit some altogether. They may also add cities to their route or skip some. The routes as actually driven, deviations included, are referred to as actual routes (AR).

An example of a standard route:

{
    "id": "s1",
    "route": [
        {
            "from": "city_1",
            "to": "city_2",
            "merchandise": {
                "item_1": 16
            }
        },
        {
            "from": "city_2",
            "to": "city_3",
            "merchandise": {
                "item_1": 4,
                "item_2": 7,
                "item_3": 19
            }
        }
    ]
}

An example of an actual route:

  {
      "id": "a1_1",
      "driver": "D23",
      "sroute": "s1",
      "route": [
          {
              "from": "city_1",
              "to": "city_2",
              "merchandise": {
                  "item_1": 16
              }
          },
          {
              "from": "city_2",
              "to": "city_3",
              "merchandise": {
                  "item_1": 3,
                  "item_2": 7
              }
          }
      ]
  }

Note that each actual route records the standard route it is meant to implement (the sroute field) and the driver who performed it (the driver field).
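To make the kinds of deviation concrete, here is a minimal sketch that compares an actual route with its standard route. The function names (trip_key, route_deviations) are hypothetical and not taken from the repository, and the sketch assumes a route never repeats the same (from, to) pair. It is not the custom similarity metric described in the report.

```python
def trip_key(trip):
    """Identify a trip by its (from, to) city pair."""
    return (trip["from"], trip["to"])


def route_deviations(standard, actual):
    """Summarize how an actual route differs from its standard route."""
    std_trips = {trip_key(t): t["merchandise"] for t in standard["route"]}
    act_trips = {trip_key(t): t["merchandise"] for t in actual["route"]}

    report = {
        "missing_trips": sorted(set(std_trips) - set(act_trips)),
        "extra_trips": sorted(set(act_trips) - set(std_trips)),
        "merchandise_changes": {},
    }
    for key in sorted(set(std_trips) & set(act_trips)):
        planned, done = std_trips[key], act_trips[key]
        # Quantity difference per item; an item absent on one side counts as 0.
        diff = {
            item: done.get(item, 0) - planned.get(item, 0)
            for item in sorted(set(planned) | set(done))
            if done.get(item, 0) != planned.get(item, 0)
        }
        if diff:
            report["merchandise_changes"][key] = diff
    return report


standard = {
    "id": "s1",
    "route": [
        {"from": "city_1", "to": "city_2", "merchandise": {"item_1": 16}},
        {"from": "city_2", "to": "city_3",
         "merchandise": {"item_1": 4, "item_2": 7, "item_3": 19}},
    ],
}
actual = {
    "id": "a1_1",
    "driver": "D23",
    "sroute": "s1",
    "route": [
        {"from": "city_1", "to": "city_2", "merchandise": {"item_1": 16}},
        {"from": "city_2", "to": "city_3",
         "merchandise": {"item_1": 3, "item_2": 7}},
    ],
}

deviations = route_deviations(standard, actual)
print(deviations)
```

For the two example routes, the driver visited the planned cities but loaded one unit less of item_1 and left out item_3 on the second trip.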

Tasks

The following tasks need to be performed:

  • Task 1: Recommend to the company which standard routes it should adopt, so that the standard routes better match the historical actual routes.
  • Task 2: For each driver, create an ordered list of standard routes. The higher a standard route is placed in the list, the less likely the driver is to deviate from it. The routes are drawn from the standard routes originally provided by the company and those recommended.
  • Task 3: For each driver, generate an ideal standard route that minimizes deviations from the driver’s actual routes.
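As a rough illustration of the ordering asked for in Task 2 (not the method from the report), suppose each actual route has already been given a similarity score in [0, 1] against its standard route. A driver's list could then be ordered by average score. The function name and the input triples below are hypothetical, and this sketch only ranks routes the driver has already driven.

```python
from collections import defaultdict


def rank_standard_routes(scored_runs):
    """Rank standard routes per driver by mean similarity, best first.

    scored_runs: iterable of (driver, standard_route_id, similarity) triples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (driver, sroute) -> [sum, count]
    for driver, sroute, similarity in scored_runs:
        entry = totals[(driver, sroute)]
        entry[0] += similarity
        entry[1] += 1

    per_driver = defaultdict(list)
    for (driver, sroute), (total, count) in totals.items():
        per_driver[driver].append((sroute, total / count))

    # Highest mean similarity first; ties broken by route id for determinism.
    return {
        driver: [s for s, _ in sorted(pairs, key=lambda p: (-p[1], p[0]))]
        for driver, pairs in per_driver.items()
    }


# Made-up scores, for illustration only.
runs = [
    ("D23", "s1", 0.9),
    ("D23", "s1", 0.7),
    ("D23", "s2", 0.95),
    ("D7", "s1", 0.4),
]
rankings = rank_standard_routes(runs)
print(rankings)
```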

In solving these tasks, we assume the following inputs:

  • A JSON file containing the standard routes provided by the company.
  • A JSON file containing the actual routes executed by the drivers. Neither dataset is pre-existing; both are synthetically generated as part of the project.
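A synthetic actual route can be thought of as a perturbed copy of a standard route. The sketch below is an assumption about how such a generator might look, not the logic of DatasetGeneration.py: the function perturb_route and its parameters are hypothetical, and it only perturbs merchandise (dropping items and shifting quantities), leaving out added or skipped cities for brevity.

```python
import copy
import random


def perturb_route(standard, driver, rng, actual_id,
                  p_drop_item=0.2, max_qty_shift=2):
    """Return an actual route derived from a standard route."""
    actual = {
        "id": actual_id,
        "driver": driver,
        "sroute": standard["id"],
        # Deep copy so the standard route itself is never modified.
        "route": copy.deepcopy(standard["route"]),
    }
    for trip in actual["route"]:
        merch = trip["merchandise"]
        for item in list(merch):
            if len(merch) > 1 and rng.random() < p_drop_item:
                # Omit this merchandise, but never leave a trip empty.
                del merch[item]
            else:
                # Load slightly more or less, keeping at least one unit.
                shift = rng.randint(-max_qty_shift, max_qty_shift)
                merch[item] = max(1, merch[item] + shift)
    return actual


standard = {
    "id": "s1",
    "route": [
        {"from": "city_1", "to": "city_2", "merchandise": {"item_1": 16}},
        {"from": "city_2", "to": "city_3",
         "merchandise": {"item_1": 4, "item_2": 7, "item_3": 19}},
    ],
}

# A seeded generator keeps the synthetic data reproducible.
actual = perturb_route(standard, "D23", random.Random(0), "a1_1")
print(actual)
```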

The solutions rely on several Data Mining techniques, including frequent itemset mining and minhashing; the PDF report describes them in detail. The project is implemented in Python, and the synthetically generated datasets consist of lists of SR (standard routes) and AR (actual routes).
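For readers unfamiliar with minhashing, the sketch below shows the general idea on routes: each route is reduced to a set of tokens, and a short signature of minimum hash values is used to estimate the Jaccard similarity between two sets. Representing a route as a set of "from->to" strings is an assumption made here for illustration; the report defines the representation and similarity actually used. All function names are hypothetical.

```python
import hashlib


def route_tokens(route):
    """Represent a route as a set of 'from->to' strings (illustrative choice)."""
    return {f'{trip["from"]}->{trip["to"]}' for trip in route["route"]}


def seeded_hash(seed, token):
    """A 64-bit hash of token; each seed yields a different hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=128):
    """Signature = minimum hash value under each of num_hashes functions.

    Assumes tokens is non-empty.
    """
    return [min(seeded_hash(seed, tok) for tok in tokens)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


route_a = {"id": "s1", "route": [
    {"from": "city_1", "to": "city_2", "merchandise": {"item_1": 16}},
    {"from": "city_2", "to": "city_3", "merchandise": {"item_2": 7}},
    {"from": "city_3", "to": "city_4", "merchandise": {"item_3": 19}},
]}
route_b = {"id": "s2", "route": [
    {"from": "city_2", "to": "city_3", "merchandise": {"item_2": 7}},
    {"from": "city_3", "to": "city_4", "merchandise": {"item_3": 19}},
    {"from": "city_4", "to": "city_5", "merchandise": {"item_1": 5}},
]}

sig_a = minhash_signature(route_tokens(route_a))
sig_b = minhash_signature(route_tokens(route_b))
# The exact Jaccard similarity of the two token sets is 2 / 4 = 0.5;
# the estimate approaches it as num_hashes grows.
estimate = estimated_jaccard(sig_a, sig_b)
print(estimate)
```

The benefit of signatures is that they have a fixed length regardless of route size, so many routes can be compared cheaply.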

Project structure

The code is organized as follows:

  • Datasets_Generation contains scripts (in particular, DatasetGeneration.py) to synthetically generate SR and AR.
  • data contains 4 JSON files of SR and AR that have been synthetically generated using the scripts in the Datasets_Generation folder.
  • src is the main folder and contains the code that solves the three tasks defined above. In particular,
    • update_routes.py contains functions that solve Task 1;
    • top5routes.py contains functions that solve Task 2;
    • idealroutes.py contains functions that solve Task 3;
    • similarity.py contains functions that compute the (custom) similarity metrics between routes introduced in the report.
  • results contains the output files generated by running the code to solve the 3 tasks on the 4 datasets in the data folder.
  • Routes_tables contains a function that converts JSON files of AR and SR into CSV format. This conversion allows for easier visualization of the routes using tools like Excel.
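The JSON-to-CSV conversion amounts to flattening the nested route structure into one row per merchandise entry. The sketch below shows one possible layout; the column names and the functions routes_to_rows and rows_to_csv are hypothetical and may differ from what the Routes_tables code produces.

```python
import csv
import io

FIELDS = ["route_id", "driver", "sroute", "step", "from", "to",
          "item", "quantity"]


def routes_to_rows(routes):
    """Flatten a list of routes into one row per (trip, item) pair."""
    rows = []
    for route in routes:
        for step, trip in enumerate(route["route"], start=1):
            for item, quantity in trip["merchandise"].items():
                rows.append({
                    "route_id": route["id"],
                    # Standard routes have no driver or sroute field.
                    "driver": route.get("driver", ""),
                    "sroute": route.get("sroute", ""),
                    "step": step,
                    "from": trip["from"],
                    "to": trip["to"],
                    "item": item,
                    "quantity": quantity,
                })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


standard_routes = [{
    "id": "s1",
    "route": [
        {"from": "city_1", "to": "city_2", "merchandise": {"item_1": 16}},
        {"from": "city_2", "to": "city_3",
         "merchandise": {"item_1": 4, "item_2": 7, "item_3": 19}},
    ],
}]

rows = routes_to_rows(standard_routes)
csv_text = rows_to_csv(rows)
print(csv_text)
```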

Please note that the data and results directories were not uploaded because their files are too large. However, you can regenerate the datasets using the provided configuration files. Instructions for generating the data and for executing the code are provided in the README.txt file in the src folder.
