NLP-Based Semantic Matching on ECLASS

About this Project

This repository contains the implementation and evaluation code for my master's thesis "NLP-Based Semantic Matching on ECLASS: Design and Validation of an Industrie 4.0 Matching Service" at the Chair of Information and Automation Systems for Process and Material Technology at RWTH Aachen University.

The aim is to develop a proof-of-concept Semantic Matching Service that leverages NLP techniques to semantically match concept definitions from the IEC 61360-2-compliant ECLASS dictionary. Furthermore, the project aims to investigate occurring matching patterns, outliers and errors.

Project Structure

semantic-matching-nlp-eclass/
│
├── data/                           # All data
│   ├── embedded/                   # Embeddings
│   │   ├── filtered/               # Filtered Embeddings
│   │   └── unfiltered/             # Unfiltered Embeddings
│   ├── extracted/                  # Extracted data
│   ├── original/                   # Original data
│   └── scores/                     # Matching scores
│
├── src/                            # Source code
│   ├── embedding/                  # Data preprocessing and embeddings generation
│   ├── evaluation/                 # Data evaluation and visualisation
│   ├── service/                    # Semantic Matching Service
│   └── utils/                      # Helper functions
│
├── test/                           # Unit testing
│
├── test_data/                      # Data for testing
│
└── visualisation/                  # Visualised results

Please note that, due to ECLASS copyright restrictions, files in data/ and visualisation/ cannot be included in this public repository.

Getting Started

How to get this project running, assuming you have a cuda-capable GPU on your Windows machine:

Run nvidia-smi to see which version of cuda you have (For me that was cuda v12.9)
Visit https://pytorch.org/get-started/locally/ and select the correct properties to get an installation link that looks like this:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

To verify if it worked, try the following snippet:

import torch
torch.cuda.is_available()

Run pip install -r requirements.txt (Note that pip was a bit unhappy with the pytorch dependency, but since we already installed it above I quickly commented it out.)
Add the raw ECLASS Basic files (ECLASS15_0_BASIC_EN_SG_01.xml) to the ./data/raw/ directory
Run src/data/extract_xml_to_csv.py
Run src/data/embeddings_<model>.py

Note

Theoretically, the script is supposed to download the necessary model files, however I had to manually download the models and use a local filepath, since the script download kept getting stuck for some reason. If that's the case, you can simply edit the line: model = SentenceTransformer("<put path here instead of model name>, ...).

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
data		data
src		src
test		test
test_data		test_data
visualisation		visualisation
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NLP-Based Semantic Matching on ECLASS

About this Project

Project Structure

Getting Started

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

moritzsommer/nlp-semantic-matching-service

Folders and files

Latest commit

History

Repository files navigation

NLP-Based Semantic Matching on ECLASS

About this Project

Project Structure

Getting Started

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages