An example of a microservice that can be used to classify vehicles by make, model and variant. I have used this microservice to explore the Azure serverless stack, specifically:
- Functions (.NET 8)
- CosmosDb
- Bicep
- Azure Pipelines
This example is also available in the following stacks:
The taxonomy API will use a simplified hierarchy of classifications and only covers a single vehicle type (cars):
- Make: AKA Brand or Marque. The retail name of the company that produces a range of vehicle models. Note that a parent company may own a range of makes e.g. "Volkswagen", "Audi" and "Porche" are all makes owned by the "Volkswagen Group" organisation.
- Model: A model of vehicle often with a distinct shape and chassis e.g. for make "Volkswagen" modles could be "Polo", "ID.3", "Golf Plus" etc. A Make has many models, and a model can have many configurations or "variants".
- Variant: AKA derivative or configuration. A specific configuration of a vehicle model combining attributes such as the "trim" level, fuel type and engine size e.g. Volkswagen Polo variants include "POLO MATCH TDI 1.5l Diesel", "POLO S 1.2l Petrol" and "POLO S 75 AUTO 1.4l Petrol".
In addition we'll also capture some example attributes against a variant including the fuel type and engine size. This structure can be represented by the entity relationship diagram (ERD) below:
The microservice will need to cover the following behavior for each of the three entities in the make-model-variant hierarchy:
- List all, by parent entity id if applicable
- Determine if a new entity meets uniqueness criteria i.e. an "is unique" query
- Add entity
- Delete entity
Additionally we'll need a way to bulk import taxonomy data via a CSV file.
As example data we'll be using vehicle statistics data from the UK Government, available under the open government licence v3. Here's an extract of the data:
An export of the data from 2024 can be found in the data directory at /data/MakesModelsFuelType.csv, which is around 6mb and has around 75,000 rows. Once imported the dataset would be expected to only grow slowly over time as new makes and models enter production. It is not expected that older data will need to be updated often or at all.
This data does not explicitly include a variant or derivative so we will need to construct one from the "Model", "Fuel" and "EngineSizeSimple" columns in the CSV. Confusingly, it will be the CSV "GenModel" value (AKA "Generation" Model) that we will use for our interpretation of a vehicle "Model", as this value best aligns with a customer expectations when, for example, filtering vehicle selection by model.
The dataset also include data for non-car vehicle types, but including this data is a non-goal.
I used DynamoDb in the original AWS version of this example, and for CosmosDb many of the considerations are similar. Choosing the correct partition scheme is particularly important to avoid hot partitions as well as understanding how to model a relational schema in nosql.
For our vehicle taxonomy data we'll use a single-container design, overloading the container to store all our entity types. Here's a description of the data fields:
- EntityType: Used to determine the entity type in the overload cosmos db container e.g. "Make", "Model" or "Variant". This isn't used for querying and is only stored for debugging purposes.
- Id: The internal CosmosDb identifier for the entity, unique across all entity types. This is the full identity path (ParentPath plus PublicId), which is then hashed and encoded because the CosmosDb recommendation is for alpha-numerical values only.
- ParentPath: The hierarchical path to the entity using the public ids of parent entities. e.g. for a model this might be '/volkswagen', or for a variant this might be '/volkswagen/polo'. For makes this is always "/". This value is used as the partition key.
- PublicId: A slugified version of the name used as the public identifier for item, unique within the ParentPath e.g. "volkswagen", "3-series" or "se-petrol-1-3".
- Name: Publicly displayable name or title of the record e.g. "BMW" or "3 Series".
- CreateDate: The date the record was created in UTC.
- VariantData: Additional data fields relating only to "Variant" entities.
Points of note:
- Because the
id
is composed from the full path we can use it to determine uniqueness via point reads and enforce uniqueness during inserts. - The
ParentPath
partition key can be used to efficiently return all makes as well as lists of models and variants filtered to their parent entity. The ordering of items is facilitated by an index on thename
property.
Here's an example variant entry:
{
"entityType": "Variant",
"id": "odalcqp34gotpvwjgwjyndwssu3cgcyxldwo5yh2bf6nhyn3dnfyzakj7h",
"parentPath": "/alfa-romeo/alfa-romeo-159",
"publicId": "159-jts-3-2l-petrol",
"name": "159 JTS 3.2l Petrol",
"createDate": "2024-09-06T15:56:58.5962699",
"variantData": {
"fuelCategory": "Petrol",
"engineSizeInCC": 3200
},
}
Currently there's two approaches to developing Azure Functions in .NET, "in-process" and "isolated". This example uses the isolated worker model because it is the recommended approach, with support for the in-process model ending in Nov 2024. The Durable Functions framework is used to handle the long running data import process.
Auth is not covered by this example project so all API functions are set to allow anonymous access.
There are a number of IaC frameworks for getting Azure infrastructure deployed:
- ARM Templates: Base-level framework for defining and provisioning Azure resources through JSON templates.
- Bicep: A concise DSL that builds on top of ARM templates making them simpler and easier to work with. Only supports Azure resources.
- Terraform: Popular cross-platform IaC via configuration files (HCL). OpenTofu is an open source fork of Terraform.
- Pulumi: Open source cross platform IaC using programming languages such as TypeScript and C#.
This example deals only with Azure resources so I've chosen to use Bicep.
- .NET 8 SDK
- Visual Studio Azure workload or equivalent tooling for your environment
- Docker environment e.g. Docker Desktop.
Tested on Visual Studio 2022 and Docker Desktop for Windows with WSL2, but other environments should be supported.
First startup local resources via the docker compose file in the root of the repository:
docker compose -f docker-compose.dev.yml up
The local ComsosDb emulator can take a while to startup, so make sure you wait until it's done.
IaC is used for resource creation in Azure, but this cannot be run against local emulators. For your local API environment you can run the VehicleTaxonomy.Azure.LocalInfraInitializer console app to create the required CosmosDb containers and Azure Storage accounts.
First set the connection strings in your local user secrets configuration to match the following, then run the console app:
{
"CosmosDb": {
"ConnectionString": "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
"UseLocalDb": true
},
"BlobStorage": {
"ConnectionString": "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:18080/devstoreaccount1;QueueEndpoint=http://127.0.0.1:18081/devstoreaccount1;TableEndpoint=http://127.0.0.1:18082/devstoreaccount1;"
}
}
Note that the above keys are public, well known keys for the Azure emulators.
Next configure the VehicleTaxonomy.Azure.Api project to connect to your local docker resources by configuring your user secrets with the same settings used in step 2.
Start up the VehicleTaxonomy.Azure.Api project.
- Swagger API docs are available at
http://localhost:7177/api/swagger/ui
. - The docker-based CosmosDb emulator GUI is available at
https://localhost:8081/_explorer/index.html
Many of the domain project tests run integrated with a local CosmosDb instance provided via test containers which is automatically regenerated for each test run. As long as you have a local docker runtime running you will be able to run the tests. Unfortunately the long startup time of the CosmosDb emulator significantly delays the running of integration tests, so when running locally it's more performant to test against an long-lived emulator instance. You can do this using the same local development docker compose file:
docker compose -f docker-compose.dev.yml up
Then change the connection string in your VehicleTaxonomy.Azure.Domain.Tests
project user secrets configuration:
{
"CosmosDb": {
"ConnectionString": "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==",
"UseLocalDb": true
}
}
Tests are run against a separate CosmosDb database, so there's no conflict with your local API development environment.
There are currently no tests for the VehicleTaxonomy.Azure.Api
project due to a lack of test server support in Azure Functions. Issue 1 covers resolving this.
Azure resources are deployed via Bicep using the files located in the /infra folder, see the infra readme for deployment instructions.
Once the infrastructure is in place you can deploy the VehicleTaxonomy.Azure.Api
functions project either via the Visual Studio publish dialog or via the command line with Azure Functions Core Tools. No additional application configuration is required, it is all handled by the bicep files.
Automated deployment is supported via Azure Pipelines. See the deployment readme for more information.
Swagger API docs are available at /api/swagger/ui
.
Bruno is an open source API client similar to postman, but does not require a login and saves API collections to your file system.
A Bruno API collection is available in the /docs/api folder, all you need to do is add a baseUrl
environment variable for your functions deployment.