diff --git a/.gitignore b/.gitignore
index 9b72d31..76fef04 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,5 @@
 site/
-venv/
\ No newline at end of file
+venv/
+
+# ide
+.idea
\ No newline at end of file
diff --git a/docs/IntelOwl/contribute.md b/docs/IntelOwl/contribute.md
index 7f04bfb..93a2ba9 100644
--- a/docs/IntelOwl/contribute.md
+++ b/docs/IntelOwl/contribute.md
@@ -430,6 +430,116 @@ You can use the django management command `dumpplugin` to automatically create t
 
 Example: `docker exec -ti intelowl_uwsgi python3 manage.py dumpplugin PlaybookConfig `
 
+## How to create a DataModel
+
+After the successful execution of an `Analyzer`, a `DataModel` will be created only if `_do_create_data_model` returns `True` and at least one of the following conditions is true:
+1. The `mapping_data_model` field is defined in the `AnalyzerConfig`
+2. The `Analyzer` overrides `_update_data_model`
+3. The `Analyzer` overrides `_create_data_model_mtm`
+
+### AnalyzerConfig.mapping_data_model
+Each `AnalyzerConfig` now has a new field, called `mapping_data_model`: this is a dictionary whose keys represent the path used to retrieve the value in the `AnalyzerReport`, while its values represent the fields in the `DataModel`.
+
+If you precede the key with the symbol `$`, the key is treated as a constant.
+
+Example:
+```python3
+report = {
+    "data": {
+        "urls": [
+            {"url": "intelowl.com"},
+            {"url": "https://intelowl.com"}
+        ],
+        "country": "IT",
+        "tags": [
+            "social_engineering",
+            "random_things"
+        ]
+    }
+}
+mapping_data_model = {
+    "data.urls.url": "external_urls",  # unmarshaling of the array is done automatically
+    "data.country": "country_code",
+    "$malicious": "evaluation",  # the $ specifies that this is a constant
+    "data.tags.0": "tags"  # we just want the first tag
+}
+```
+
+With the previously shown `AnalyzerReport` and its mapping, we will create a `DataModel` that satisfies these conditions:
+```python3
+# the values are lowercase because everything inside the DataModel is converted to lowercase
+assert external_urls == ["intelowl.com", "https://intelowl.com"]
+assert country_code == "it"
+assert evaluation == "malicious"
+```
+
+If you specify a path that is not present in the `DataModel`, an error will be added to the job.
+If you specify a path that is not present in the `AnalyzerConfig`, a warning will be added to the job.
+
+### Analyzer._do_create_data_model
+This is a function that every `Analyzer` can override: it returns a boolean and, if that value is `False`, the `DataModel` will not be created.
+This can be useful when a specific `Analyzer` succeeds without retrieving useful results.
+Let's use the `UrlHaus` Analyzer as an example: if the analyzed domain is not present in its database, the result will be
+```python3
+{"query_status": "no_results"}
+```
+meaning that we can use the following code to consider only _real_ results:
+```python3
+def _do_create_data_model(self) -> bool:
+    return (
+        super()._do_create_data_model()
+        and self.report.report.get("query_status", "no_results") != "no_results"
+    )
+```
+
+### Analyzer._create_data_model_mtm
+This is a function that every `Analyzer` can override: it returns a dictionary where the keys are the names of the fields and the values are the objects that will be added to a many-to-many relationship in the `DataModel`.
+This is useful when you want to save part of a report in a separate Model and reference it through a many-to-many relationship.
+Let's use the `Yara` Analyzer as an example.
+
+```python3
+def _create_data_model_mtm(self):
+    from api_app.data_model_manager.models import Signature
+
+    signatures = []
+    for yara_signatures in self.report.report.values():
+        for yara_signature in yara_signatures:
+            url = yara_signature.pop("rule_url", None)
+            sign = Signature.objects.create(
+                provider=Signature.PROVIDERS.YARA.value,
+                signature=yara_signature,
+                url=url if url else "",
+                score=1,
+            )
+            signatures.append(sign)
+
+    return {"signatures": signatures}
+
+```
+Here we are creating many `Signature` objects (using the signatures that matched the analyzed sample) and adding them to the `signatures` field.
+
+### Analyzer._update_data_model
+This is the last function that you can override in the `Analyzer` class: it returns nothing and is called after every other check.
+This means that you can use it for more complex data transformations when parsing the `AnalyzerReport` into a `DataModel`.
+
+Again, let's use an example, this time with the analyzer `AbuseIPDB`.
+```python3
+def _update_data_model(self, data_model) -> None:
+    super()._update_data_model(data_model)
+    if self.report.report.get("totalReports", 0):
+        self.report: AnalyzerReport
+        if self.report.report["isWhitelisted"]:
+            evaluation = (
+                self.report.data_model_class.EVALUATIONS.TRUSTED.value
+            )
+        else:
+            evaluation = self.report.data_model_class.EVALUATIONS.MALICIOUS.value
+        data_model.evaluation = evaluation
+```
+We are setting the field `evaluation` depending on some logic that we constructed, using the data inside the report.
+If the IP address has been reported by some AbuseIPDB users but, at the same time, is whitelisted by AbuseIPDB, then we set its `evaluation` to `trusted`. Otherwise, if it is not whitelisted, we set it to `malicious`.
+
+
 ## How to modify a plugin
 
 If the changes that you have to make should stay local, you can just change the configuration inside the `Django admin` page.
diff --git a/docs/IntelOwl/usage.md b/docs/IntelOwl/usage.md
index 97ebc1c..7d7d6af 100644
--- a/docs/IntelOwl/usage.md
+++ b/docs/IntelOwl/usage.md
@@ -295,6 +295,21 @@ The form will open with the fields to fill in to create the analyzer.
 
 ![img.png](./static/analyzer_creation.png)
 
+### DataModels
+
+_Available from version > 6.2.0_
+
+The main purpose of a `DataModel` is to map an `Analyzer` result onto a set of predefined keys, allowing users to easily search, evaluate and use the analyzer result.
+The author of an `AnalyzerConfig` decides the mapping between each field of the `AnalyzerReport` and the corresponding one in the `DataModel`.
+
+There are three types of `DataModel`:
+1. `DomainDataModel` is the `DataModel` for domains and URLs
+2. `IPDataModel` is the `DataModel` for IP addresses
+3. `FileDataModel` is the `DataModel` for files and hashes
+
+The `DataModel` will not be created for generic observables. This is a design choice and could change in the future.
+
+This feature is still in the development phase. At the moment, the DataModels created are saved in the database, but they are not being used for further operations.
 ### Connectors