Datamodel docs #7

Merged
merged 18 commits into from
Jan 27, 2025
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,2 +1,5 @@
site/
venv/
venv/

# ide
.idea
110 changes: 110 additions & 0 deletions docs/IntelOwl/contribute.md
@@ -430,6 +430,116 @@ You can use the django management command `dumpplugin` to automatically create t

Example: `docker exec -ti intelowl_uwsgi python3 manage.py dumpplugin PlaybookConfig <new_analyzer_name>`

## How to create a DataModel

After the successful execution of an `Analyzer`, a `DataModel` will be created only if `_do_create_data_model` returns `True` and at least one of the following conditions is true:
1. The `mapping_data_model` field is defined in the `AnalyzerConfig`
2. The `Analyzer` overrides `_update_data_model`
3. The `Analyzer` overrides `_create_data_model_mtm`
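
The rule above can be read as a single boolean expression. The following is a minimal sketch, not IntelOwl's actual implementation: the function name and its parameters are hypothetical and only illustrate how the conditions combine.

```python3
def should_create_data_model(
    do_create: bool,
    has_mapping: bool,
    overrides_update: bool,
    overrides_mtm: bool,
) -> bool:
    """Hypothetical helper: combine the conditions listed above."""
    # `_do_create_data_model` must return True, and at least one of the
    # three sources of DataModel content must be present.
    return do_create and (has_mapping or overrides_update or overrides_mtm)
```

In this reading, an `Analyzer` with no mapping and no overrides never produces a `DataModel`, even when `_do_create_data_model` returns `True`.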

### AnalyzerConfig.mapping_data_model
Each `AnalyzerConfig` now has a new field called `mapping_data_model`: a dictionary whose keys are the paths used to retrieve values from the `AnalyzerReport`, and whose values are the names of the corresponding fields in the `DataModel`.

If a key is prefixed with the symbol `$`, it is treated as a constant value rather than as a path.

Example:
```python3
report = {
    "data": {
        "urls": [
            {"url": "intelowl.com"},
            {"url": "https://intelowl.com"}
        ],
        "country": "IT",
        "tags": [
            "social_engineering",
            "random_things"
        ]
    }
}
mapping_data_model = {
    "data.urls.url": "external_urls",  # unmarshaling of the array is done automatically
    "data.country": "country_code",
    "$malicious": "evaluation",  # the $ specifies that this is a constant
    "data.tags.0": "tags"  # we just want the first tag
}
```

With the `AnalyzerReport` and mapping shown above, the resulting `DataModel` satisfies the following conditions:
```python3
# the values are lowercase because everything inside the DataModel is converted to lowercase
assert external_urls == ["intelowl.com", "https://intelowl.com"]
assert country_code == "it"
assert evaluation == "malicious"
```

If a mapping value refers to a field that is not present in the `DataModel`, an error will be added to the job.
If a mapping key refers to a path that is not present in the `AnalyzerReport`, a warning will be added to the job.
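
To make the path semantics concrete, here is a minimal sketch of how such a mapping could be resolved. It is not IntelOwl's implementation: `resolve_path`, `lowercase` and `apply_mapping` are hypothetical helpers written only for illustration, and paths that cannot be resolved are simply collected in a list instead of being attached to a job.

```python3
from typing import Any

_MISSING = object()  # sentinel for "path not found"


def resolve_path(data: Any, path: str) -> Any:
    """Follow a dotted path; lists are traversed element-wise unless the step is an index."""
    current = data
    for step in path.split("."):
        if isinstance(current, list):
            if step.isdigit():
                index = int(step)
                if index >= len(current):
                    return _MISSING
                current = current[index]
            else:
                # Unmarshal the array: apply the step to every element that has it.
                current = [
                    item[step]
                    for item in current
                    if isinstance(item, dict) and step in item
                ]
                if not current:
                    return _MISSING
        elif isinstance(current, dict) and step in current:
            current = current[step]
        else:
            return _MISSING
    return current


def lowercase(value: Any) -> Any:
    """Lowercase strings, recursing into lists; leave other values untouched."""
    if isinstance(value, str):
        return value.lower()
    if isinstance(value, list):
        return [lowercase(item) for item in value]
    return value


def apply_mapping(report: dict, mapping: dict) -> tuple:
    """Return (fields, missing): resolved field values and unresolved paths."""
    fields, missing = {}, []
    for key, field in mapping.items():
        if key.startswith("$"):
            value = key[1:]  # constant: use the key itself, without the `$`
        else:
            value = resolve_path(report, key)
            if value is _MISSING:
                missing.append(key)
                continue
        fields[field] = lowercase(value)
    return fields, missing


report = {
    "data": {
        "urls": [{"url": "intelowl.com"}, {"url": "https://intelowl.com"}],
        "country": "IT",
        "tags": ["social_engineering", "random_things"],
    }
}
mapping_data_model = {
    "data.urls.url": "external_urls",
    "data.country": "country_code",
    "$malicious": "evaluation",
    "data.tags.0": "tags",
}
fields, missing = apply_mapping(report, mapping_data_model)
```

With the report and mapping from the example above, `fields` holds the lowercase values listed in the assertions and `missing` stays empty.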

### Analyzer._do_create_data_model
Every `Analyzer` can override this function: it returns a boolean and, if the value is `False`, the `DataModel` will not be created.
This can be useful when a specific `Analyzer` succeeds without retrieving useful results.
Let's use the `UrlHaus` Analyzer as an example: if the analyzed domain is not present in its database, the result will be
```python3
{"query_status": "no_results"}
```
meaning that we can use the following code to consider only _real_ results:
```python3
def _do_create_data_model(self) -> bool:
    return (
        super()._do_create_data_model()
        and self.report.report.get("query_status", "no_results") != "no_results"
    )
```
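
To try the override outside IntelOwl, the following sketch wraps it in stand-in classes. `BaseAnalyzer` and `UrlHausLike` are hypothetical names, the base class is reduced to the bare minimum, and the `"ok"` status is only an illustrative value for a successful lookup.

```python3
from types import SimpleNamespace


class BaseAnalyzer:
    """Hypothetical stand-in for the real Analyzer base class."""

    def __init__(self, raw_report: dict):
        # Mimics the `self.report.report` access used in the snippet above.
        self.report = SimpleNamespace(report=raw_report)

    def _do_create_data_model(self) -> bool:
        return True


class UrlHausLike(BaseAnalyzer):
    def _do_create_data_model(self) -> bool:
        return (
            super()._do_create_data_model()
            and self.report.report.get("query_status", "no_results") != "no_results"
        )


empty = UrlHausLike({"query_status": "no_results"})
found = UrlHausLike({"query_status": "ok"})
```

Because the default passed to `get` is `"no_results"`, a report that lacks the `query_status` key is treated the same way as an empty result.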

### Analyzer._create_data_model_mtm
Every `Analyzer` can override this function: it returns a dictionary whose keys are the names of the fields and whose values are the objects that will be added to the `DataModel` through a many-to-many relationship.
This is useful when you want to save part of a report in a separate model and reference it through a many-to-many relationship.
Let's use the `Yara` Analyzer as an example.

```python3
def _create_data_model_mtm(self):
    from api_app.data_model_manager.models import Signature

    signatures = []
    for yara_signatures in self.report.report.values():
        for yara_signature in yara_signatures:
            url = yara_signature.pop("rule_url", None)
            sign = Signature.objects.create(
                provider=Signature.PROVIDERS.YARA.value,
                signature=yara_signature,
                url=url if url else "",
                score=1,
            )
            signatures.append(sign)

    return {"signatures": signatures}
```
Here we are creating many `Signature` objects (using the signatures that matched the sample analyzed) and adding them to the `signatures` field.
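
How the returned dictionary might be consumed can be sketched without Django. In the snippet below every name (`FakeRelation`, `FakeDataModel`, `attach_mtm`) is hypothetical; `FakeRelation.add` only imitates the `add(*objs)` method of a Django many-to-many manager, and this is not the code IntelOwl actually runs.

```python3
class FakeRelation:
    """Hypothetical stand-in for a Django many-to-many manager."""

    def __init__(self):
        self.items = []

    def add(self, *objs):
        self.items.extend(objs)


class FakeDataModel:
    """Hypothetical DataModel with a single many-to-many field."""

    def __init__(self):
        self.signatures = FakeRelation()


def attach_mtm(data_model, mtm: dict) -> None:
    # Each key names a many-to-many field; each value lists the objects to link.
    for field_name, objects in mtm.items():
        getattr(data_model, field_name).add(*objects)


data_model = FakeDataModel()
attach_mtm(data_model, {"signatures": ["rule_a", "rule_b"]})
```

The strings `"rule_a"` and `"rule_b"` stand in for the `Signature` instances created by the analyzer.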

### Analyzer._update_data_model
This is the last function that you can override in the `Analyzer` class: it returns nothing and is called after every other check.
This means that you can use it for more elaborate data transformations when parsing the `AnalyzerReport` into a `DataModel`.

Again, let's use an example, this time with the analyzer `AbuseIPDB`.
```python3
def _update_data_model(self, data_model) -> None:
    super()._update_data_model(data_model)
    if self.report.report.get("totalReports", 0):
        self.report: AnalyzerReport
        if self.report.report["isWhitelisted"]:
            evaluation = (
                self.report.data_model_class.EVALUATIONS.TRUSTED.value
            )
        else:
            evaluation = self.report.data_model_class.EVALUATIONS.MALICIOUS.value
        data_model.evaluation = evaluation
```
We are setting the `evaluation` field according to custom logic based on the data inside the report.
If the IP address has been reported by some AbuseIPDB users but is also whitelisted by AbuseIPDB, we set its `evaluation` to `trusted`. Otherwise, if it is not whitelisted, we set it to `malicious`.
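
The same branching can be isolated in a plain function, which makes it easy to test in isolation. This is only a sketch: `abuseipdb_evaluation` is a hypothetical helper, and the returned strings are assumed to match the `trusted` and `malicious` values mentioned above.

```python3
from typing import Optional


def abuseipdb_evaluation(report: dict) -> Optional[str]:
    """Hypothetical helper mirroring the branching of the snippet above."""
    if not report.get("totalReports", 0):
        # No reports at all: the evaluation is left untouched.
        return None
    return "trusted" if report["isWhitelisted"] else "malicious"
```

Returning `None` corresponds to the case where the original method never assigns `data_model.evaluation`.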


## How to modify a plugin

If the changes that you have to make should stay local, you can just change the configuration inside the `Django admin` page.
15 changes: 15 additions & 0 deletions docs/IntelOwl/usage.md
@@ -295,6 +295,21 @@ The form will open with the fields to fill in to create the analyzer.

![img.png](./static/analyzer_creation.png)

### DataModels

_Available from version > 6.2.0_

The main purpose of a `DataModel` is to map an `Analyzer` result onto a set of predefined keys, allowing users to easily search, evaluate and use the analyzer result.
The author of an `AnalyzerConfig` can decide the mapping between each field of the `AnalyzerReport` and the corresponding one in the `DataModel`.

There are three types of `DataModel`:
1. `DomainDataModel` is the `DataModel` for domains and URLs
2. `IPDataModel` is the `DataModel` for IP addresses
3. `FileDataModel` is the `DataModel` for files and hashes

The `DataModel` will not be created for generic observables. This is a design choice and could change in the future.
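
The correspondence between observable types and `DataModel` classes can be summarized as a lookup table. The sketch below is illustrative only: the dictionary, the function name and the lowercase type labels are assumptions, and the class names appear as plain strings rather than as real imports.

```python3
from typing import Optional

# Hypothetical lookup table; the keys are illustrative observable types.
DATA_MODEL_BY_TYPE = {
    "domain": "DomainDataModel",
    "url": "DomainDataModel",
    "ip": "IPDataModel",
    "file": "FileDataModel",
    "hash": "FileDataModel",
}


def data_model_name(observable_type: str) -> Optional[str]:
    # Generic observables (and anything unknown) get no DataModel.
    return DATA_MODEL_BY_TYPE.get(observable_type.lower())
```

A call with `"generic"` returns `None`, reflecting the design choice described above.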

This feature is still in the development phase. At the moment, the DataModels created are saved in the database, but they are not being used for further operations.

### Connectors
