Developers
The following steps are only required for local development and testing. The containerized version is recommended for production use.
-
Install the following packages using your OS package manager (apt, yum, homebrew, etc.):
- make
- shellcheck
- shfmt
-
Start by cloning this repository.
git clone git@github.com:center-for-threat-informed-defense/tram.git
-
Change to the TRAM directory.
cd tram/
-
Create a virtual environment and activate the new virtual environment.
-
Mac and Linux
python3 -m venv venv
source venv/bin/activate
-
Windows
venv\Scripts\activate.bat
-
-
Install Python application requirements.
pip install -r requirements/requirements.txt
pip install -r requirements/test-requirements.txt
-
Install pre-commit hooks
pre-commit install
-
Set up the application database.
tram makemigrations tram
tram migrate
-
Run the machine learning training.
tram attackdata load
tram pipeline load-training-data
tram pipeline train --model nb
tram pipeline train --model logreg
tram pipeline train --model nn_cls
-
Download the pre-trained tokenizer and BERT models.
python3 -c "import os; import transformers; mdl = transformers.AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased'); mdl.save_pretrained('data/ml-models/priv-allenai-scibert-scivocab-uncased')"
curl -L "https://ctidtram.blob.core.windows.net/tram-models/single-label-202308303/config.json" \
  -o data/ml-models/bert_model/config.json
curl -L "https://ctidtram.blob.core.windows.net/tram-models/single-label-202308303/pytorch_model.bin" \
  -o data/ml-models/bert_model/pytorch_model.bin
-
Create a superuser (web login)
```sh
tram createsuperuser
```
-
Run the application server
DJANGO_DEBUG=1 tram runserver
-
Open the application in your web browser.
- Navigate to http://localhost:8000 and use the superuser to log in
-
In a separate terminal window, run the ML pipeline
cd tram/
source venv/bin/activate
tram pipeline run
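Before working through the steps above, it can help to confirm that the OS packages from the first step (make, shellcheck, shfmt) are actually on your PATH. The following is a minimal sketch using only the Python standard library; the names `REQUIRED_TOOLS` and `missing_tools` are illustrative and not part of TRAM.

```python
import shutil

# Tools listed in the first setup step; adjust if your workflow needs more.
REQUIRED_TOOLS = ("make", "shellcheck", "shfmt")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are installed.")
```

If the script reports a missing tool, install it with your OS package manager before continuing.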
The repository includes a Makefile with handy shortcuts for common development tasks:
- Run TRAM application
make start-container
- Stop TRAM application
make stop-container
- View TRAM logs
make container-logs
- Build Python virtualenv
make venv
- Install production Python dependencies
make install
- Install prod and dev Python dependencies
make install-dev
- Manually run pre-commit hooks without performing a commit
make pre-commit-run
- Build container image (By default, the container is tagged with the timestamp and git hash of the codebase. See the note below about custom CA certificates in the Docker build.)
make build-container
- Run linting locally
make lint
- Run unit tests, safety, and bandit locally
make test
The automated test suite runs inside tox, which guarantees a consistent testing environment, but also has considerable overhead. When writing code, it may be useful to run pytest directly, which is considerably faster and can also be used to run a specific test. Here are some useful pytest commands:
# Run the entire test suite:
$ pytest tests/
# Run tests in a specific file:
$ pytest tests/tram/test_models.py
# Run a test by name:
$ pytest tests/ -k test_mapping_repr_is_correct
# Run tests with code coverage tracking, and show which lines are missing coverage:
$ pytest --cov=tram --cov-report=term-missing tests/
If you are building the container in an environment that intercepts SSL connections, you can specify a root CA certificate to inject into the container at build time. (This is only necessary for the TRAM application container. The TRAM Nginx container does not make outbound connections.)
Export the following two variables in your environment.
$ export TRAM_CA_URL="http://your.domain.com/root.crt"
$ export TRAM_CA_THUMBPRINT="C7:E0:F9:69:09:A4:A3:E7:A9:76:32:5F:68:79:9A:85:FD:F9:B3:BD"
The first variable is a URL to a PEM certificate containing a root certificate that you want to inject into the container. (If you use an `https` URL, then certificate checking is disabled.) The second variable is a SHA-1 certificate thumbprint that is used to verify that the correct certificate was downloaded.
You can obtain the thumbprint with the following OpenSSL command:
$ openssl x509 -in <your-cert.crt> -fingerprint -noout
SHA1 Fingerprint=C7:E0:F9:69:09:A4:A3:E7:A9:76:32:5F:68:79:9A:85:FD:F9:B3:BD
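If OpenSSL is not available, the same thumbprint can be computed in Python. The following is a minimal sketch using only the standard library. It assumes the input holds exactly one PEM certificate and nothing else, and the function name `sha1_thumbprint` is illustrative rather than part of TRAM. The result uses the same colon-separated uppercase hex format as the value after `SHA1 Fingerprint=` above.

```python
import hashlib
import ssl


def sha1_thumbprint(pem_text: str) -> str:
    """Return the SHA-1 thumbprint of one PEM certificate.

    Assumes `pem_text` contains a single certificate block and no other text.
    """
    # Decode the base64 PEM body into the certificate's raw DER bytes.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    # The thumbprint is the SHA-1 digest of the DER encoding.
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    # Group the hex digits into colon-separated byte pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For example, passing the contents of your downloaded certificate file to `sha1_thumbprint` should give a string suitable for `TRAM_CA_THUMBPRINT`.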
After exporting these two variables, you can run `make build-container` as usual, and the TRAM container will contain your specified root certificate.
All source code related to machine learning is located in TRAM src/tram/ml.
TRAM has five machine learning models that can be used out-of-the-box:
- SKLearn models
  - LogisticRegressionModel - Uses SKLearn's Logistic Regression.
  - NaiveBayesModel - Uses SKLearn's Multinomial NB.
  - Multilayer Perceptron - Uses SKLearn's MLPClassifier.
  - DummyModel - Uses SKLearn's Dummy Classifier for testing purposes.
- Large Language Models (PyTorch)
  - BERT Classifier - Uses Hugging Face's transformers library with a fine-tuned BERT model.
The SKLearn models are each implemented as an SKLearn Pipeline. Machine learning engineers will find that it's pretty easy to plug in a new SKLearn model (see Creating Your Own SKLearn Model).
In order to write your own model, take the following steps:
-
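Create a subclass of `tram.ml.base.SKLearnModel` that implements the `get_model` function. See existing ML Models for examples that can be copied.

from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

from tram.ml.base import SKLearnModel

class DummyModel(SKLearnModel):
    def get_model(self):
        # Your model goes here
        return Pipeline([
            ("features", CountVectorizer(lowercase=True, stop_words='english', min_df=3)),
            ("clf", DummyClassifier(strategy='uniform'))
        ])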
-
Add your model to the `ModelManager` registry.
- Note: This method can be improved. Pull requests welcome!

class ModelManager(object):
    model_registry = {
        'dummy': DummyModel,
        'nb': NaiveBayesModel,
        'logreg': LogisticRegressionModel,
        # Your model on the line below
        'your-model': python.path.to.your.model
    }
-
You can now train your model, and the model will appear in the application interface.
tram pipeline train --model your-model
-
If you are interested in sharing your model with the community, thank you! Please open a Pull Request with your model, and please include performance statistics in your Pull Request description.
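The registry step above maps a model name to a model class, and the note there invites improvements. One possible direction is a decorator-based registry, sketched below in plain Python. This is not how TRAM currently works: `MODEL_REGISTRY`, `register_model`, `create_model`, and `BaseModel` are all hypothetical names, and `BaseModel` is only a stand-in for a base class such as `tram.ml.base.SKLearnModel`.

```python
# Hypothetical sketch: none of these names are TRAM's actual API.
MODEL_REGISTRY = {}


def register_model(name):
    """Return a class decorator that stores the class in MODEL_REGISTRY under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"Model name already registered: {name!r}")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


class BaseModel:
    """Stand-in for a base class such as tram.ml.base.SKLearnModel."""

    def get_model(self):
        raise NotImplementedError


@register_model("dummy")
class DummyModel(BaseModel):
    def get_model(self):
        # A real model would return an SKLearn Pipeline here.
        return "placeholder pipeline"


def create_model(name):
    """Look up a registered model class by name and instantiate it."""
    try:
        model_class = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model {name!r}; registered models: {known}") from None
    return model_class()
```

With this pattern, a new model registers itself where it is defined, so adding one does not require editing a central dictionary.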