
Feature/visualization api #62

Draft · wants to merge 53 commits into base: develop

53 commits:
4562ffa
Merge pull request #3 from IFRCGo/fix/docker-file
susilnem Nov 15, 2024
9baaf3a
Add gdacs extraction.
Rup-Narayan-Rajbanshi Nov 15, 2024
6218511
Remove old table and add different table for Logs and Extraction.
Rup-Narayan-Rajbanshi Nov 18, 2024
5a6a482
Add migration command to import Gdacs data.
Rup-Narayan-Rajbanshi Nov 18, 2024
55fb4fe
Update etl model structure.
Rup-Narayan-Rajbanshi Nov 19, 2024
3fe76ce
Add celery-beat-scheduler to schedule task.
Rup-Narayan-Rajbanshi Nov 21, 2024
76a9419
Add the validator for gdacs event
ranjan-stha Nov 21, 2024
a1fdecd
Implementation first level, second level, third level extraction in
Rup-Narayan-Rajbanshi Nov 22, 2024
d7fc7d2
Fix error in html scrapping for GDACS.
Rup-Narayan-Rajbanshi Nov 25, 2024
2dd7d45
Modify admin table for Extraction Data.
Rup-Narayan-Rajbanshi Nov 25, 2024
af81446
Set all fields to readonly in admin.
Rup-Narayan-Rajbanshi Nov 26, 2024
2872b22
Add sample.env file.
Rup-Narayan-Rajbanshi Nov 26, 2024
e6ca2c3
Add validator for response data.
Rup-Narayan-Rajbanshi Nov 27, 2024
cbd2c9c
Code clean up.
Rup-Narayan-Rajbanshi Nov 27, 2024
7b33117
Add file hash for response data.
Rup-Narayan-Rajbanshi Nov 28, 2024
472e637
Manage dublicate files.
Rup-Narayan-Rajbanshi Nov 28, 2024
ccb5d67
Add the validation for events geometry data;
ranjan-stha Nov 29, 2024
8299211
Separate independent task for extraction.
Rup-Narayan-Rajbanshi Dec 3, 2024
a84aec4
Solve issue arised after converting the ext into celery task.
Rup-Narayan-Rajbanshi Dec 4, 2024
21c7296
WIP: Apply retry mechanism for extraction.
Rup-Narayan-Rajbanshi Dec 4, 2024
9424b7e
WIP - Transformation logic structure added.
Rup-Narayan-Rajbanshi Dec 5, 2024
b30d28b
Refactor save extraction data.
Rup-Narayan-Rajbanshi Dec 6, 2024
e15d9f0
Fix issue for retry mechanism.
Rup-Narayan-Rajbanshi Dec 9, 2024
0d08aef
Add Population Exposure validator;
ranjan-stha Dec 10, 2024
a34a4b6
Integrate validation for population exposure data.
Rup-Narayan-Rajbanshi Dec 12, 2024
7be4829
Code clean up.
Rup-Narayan-Rajbanshi Dec 13, 2024
d2bba2a
Code clean up.
Rup-Narayan-Rajbanshi Dec 15, 2024
a3f50a9
Fix attempt no.
Rup-Narayan-Rajbanshi Dec 15, 2024
f222650
Code clean up.
Rup-Narayan-Rajbanshi Dec 15, 2024
41da836
Add logging settings.
Rup-Narayan-Rajbanshi Dec 15, 2024
6af626b
code clean up using pre-commit.
Rup-Narayan-Rajbanshi Dec 16, 2024
c39a77e
Add validators and rename filenames
ranjan-stha Dec 17, 2024
684f380
fetch event data from gdacs api.
Rup-Narayan-Rajbanshi Dec 17, 2024
633d8e2
Solve issue in validation.
Rup-Narayan-Rajbanshi Dec 17, 2024
82b41c1
pass hazard type in each extraction object.
Rup-Narayan-Rajbanshi Dec 17, 2024
1a485a5
feat: add `pystac-monty` as a submodule and integrate it
samshara Dec 20, 2024
bb556c2
Integrate docker with pystac-monty-submodule
Rup-Narayan-Rajbanshi Dec 20, 2024
3b0db7e
Merge pull request #53 from IFRCGo/feat/pystac-monty-submodule
Rup-Narayan-Rajbanshi Dec 20, 2024
d5af9e5
WIP transformation.
Rup-Narayan-Rajbanshi Dec 23, 2024
5e9b292
Wip: transformation
Rup-Narayan-Rajbanshi Dec 23, 2024
a7497d5
Add model for gdacs transformation.
Rup-Narayan-Rajbanshi Dec 25, 2024
aea6c28
Solve issue for transformation geo data.
Rup-Narayan-Rajbanshi Dec 26, 2024
96123f1
Add task for loading into stac api.
Rup-Narayan-Rajbanshi Dec 26, 2024
fc49b88
Add monty_etl id in the imput for stac api.
Rup-Narayan-Rajbanshi Dec 27, 2024
d014baf
Setup strawbettry
Rup-Narayan-Rajbanshi Dec 16, 2024
f1b3833
Add api for extraction data.
Rup-Narayan-Rajbanshi Dec 16, 2024
b3d050c
Update README file.
Rup-Narayan-Rajbanshi Dec 18, 2024
8ec8b6a
Add command to generate schema.graphql file.
Rup-Narayan-Rajbanshi Dec 18, 2024
17e6f44
Resolve cors issue.
Rup-Narayan-Rajbanshi Dec 19, 2024
b699f03
Add api for /me.
Rup-Narayan-Rajbanshi Dec 20, 2024
97ab669
Generate new .schema file.
Rup-Narayan-Rajbanshi Dec 20, 2024
0135cc2
- Add extraction queries
sudan45 Dec 30, 2024
64c1d42
Refactor code
sudan45 Dec 31, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -129,3 +129,4 @@ dmypy.json

# editors
.idea/
source_raw_data/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "libs/pystac-monty"]
	path = libs/pystac-monty
	url = https://github.com/IFRCGo/pystac-monty.git
2 changes: 2 additions & 0 deletions Dockerfile
@@ -8,6 +8,8 @@ WORKDIR /code

COPY pyproject.toml poetry.lock /code/

COPY libs /code/libs

RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
    # Build required packages
25 changes: 25 additions & 0 deletions README.md
@@ -0,0 +1,25 @@
## Getting started

- Clone this repository: git@github.com:IFRCGo/montandon-etl.git
- Go to the directory containing manage.py.
- Create a .env file and copy all the environment variables from sample.env into it.
- Set your own environment variables in the .env file.
- Build the containers using this command:
```bash
docker compose up --build -d
```
- Run migration using this command:
```bash
docker-compose exec web python manage.py migrate
```
- Import GDACS data using this command:
```bash
docker-compose exec web python manage.py import_gdacs_data
```
- To view the imported data in the admin panel, create a superuser:
```bash
docker-compose exec web python manage.py createsuperuser
```
Fill in the prompts to create the superuser.
- Once the user is created, open localhost:8000/admin/ in your browser to view the data in the Extraction data table.
- To access the GraphQL server, go to: localhost:8000/graphql
Empty file added apps/common/__init__.py
Empty file added apps/common/admin.py
6 changes: 6 additions & 0 deletions apps/common/apps.py
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class CommonConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "apps.common"
14 changes: 14 additions & 0 deletions apps/common/dataloaders.py
@@ -0,0 +1,14 @@
import typing

from django.db import models

DjangoModel = typing.TypeVar("DjangoModel", bound=models.Model)


def load_model_objects(
    Model: typing.Type[DjangoModel],
    keys: list[int],
) -> list[DjangoModel]:
    qs = Model.objects.filter(id__in=keys)
    _map = {obj.pk: obj for obj in qs}
    return [_map[key] for key in keys]
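`load_model_objects` re-orders the fetched rows to match `keys`, which is the contract a DataLoader batch function must honour: the i-th result corresponds to the i-th key. A framework-free sketch of the same re-ordering idea, with a plain dict standing in for the queryset (all names here are illustrative, not part of the codebase):

```python
def load_objects_in_key_order(fetch_by_ids, keys):
    # fetch_by_ids may return rows in any order; re-map them by id
    # so the result lines up position-for-position with `keys`.
    _map = {obj["id"]: obj for obj in fetch_by_ids(keys)}
    return [_map[k] for k in keys]


def fake_fetch(ids):
    # Stand-in for Model.objects.filter(id__in=keys): rows come back unordered.
    rows = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    return [r for r in rows if r["id"] in ids]


print([o["name"] for o in load_objects_in_key_order(fake_fetch, [2, 3, 1])])
# ['b', 'c', 'a'] — order follows keys, not fetch order
```

Note that, like the helper above, a key with no matching row raises `KeyError`; a production loader may prefer to return `None` for missing keys.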
23 changes: 23 additions & 0 deletions apps/common/management/commands/generate_schema.py
@@ -0,0 +1,23 @@
import argparse

from django.core.management.base import BaseCommand
from strawberry.printer import print_schema

from main.graphql.schema import schema


class Command(BaseCommand):
    help = "Create schema.graphql file"

    def add_arguments(self, parser):
        parser.add_argument(
            "--out",
            type=argparse.FileType("w"),
            default="schema.graphql",
        )

    def handle(self, *args, **options):
        file = options["out"]
        file.write(print_schema(schema))
        file.close()
        self.stdout.write(self.style.SUCCESS(f"{file.name} file generated"))
37 changes: 37 additions & 0 deletions apps/common/migrations/0001_initial.py
@@ -0,0 +1,37 @@
# Generated by Django 5.1.3 on 2024-11-21 11:08

import django.db.models.deletion
from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        migrations.CreateModel(
            name='Region',
            fields=[
                ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.IntegerField(choices=[(0, 'Africa'), (1, 'Americas'), (2, 'Asia Pacific'), (3, 'Europe'), (4, 'Middle East & North Africa')], verbose_name='name')),
            ],
        ),
        migrations.CreateModel(
            name='Country',
            fields=[
                ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(blank=True, max_length=255, null=True, verbose_name='name')),
                ('iso3', models.CharField(blank=True, max_length=3, null=True, verbose_name='iso3')),
                ('iso', models.CharField(blank=True, max_length=2, null=True, verbose_name='iso2')),
                ('record_type', models.IntegerField(blank=True, choices=[(1, 'Country'), (2, 'Cluster'), (3, 'Region'), (4, 'Country Office'), (5, 'Representative Office')], help_text='Type of entity', null=True, verbose_name='type')),
                ('bbox', models.JSONField(blank=True, default=dict, null=True, verbose_name='bbox')),
                ('centroid', models.JSONField(blank=True, default=dict, null=True, verbose_name='centroid')),
                ('independent', models.BooleanField(default=None, help_text='Is this an independent country?', null=True)),
                ('is_deprecated', models.BooleanField(default=False, help_text='Is this an active, valid country?')),
                ('region', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.SET_NULL, to='common.region', verbose_name='region')),
            ],
        ),
    ]
76 changes: 76 additions & 0 deletions apps/common/models.py
@@ -0,0 +1,76 @@
from django.db import models
from django.utils.translation import gettext_lazy as _


class UserResource(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    modified_at = models.DateTimeField(auto_now=True)
    # Typing
    id: int
    pk: int

    class Meta:
        abstract = True
        ordering = ["-id"]


class Region(models.Model):
    class RegionName(models.IntegerChoices):
        AFRICA = 0, _("Africa")
        AMERICAS = 1, _("Americas")
        ASIA_PACIFIC = 2, _("Asia Pacific")
        EUROPE = 3, _("Europe")
        MENA = 4, _("Middle East & North Africa")

    name = models.IntegerField(
        verbose_name=_("name"),
        choices=RegionName.choices,
    )

    def __str__(self):
        return f"{self.name}"


class Country(models.Model):
    class CountryType(models.IntegerChoices):
        """
        We use the Country model for some things that are not "Countries". This helps classify the type.
        """

        COUNTRY = 1, _("Country")
        CLUSTER = 2, _("Cluster")
        REGION = 3, _("Region")
        COUNTRY_OFFICE = 4, _("Country Office")
        REPRESENTATIVE_OFFICE = 5, _("Representative Office")

    name = models.CharField(max_length=255, verbose_name=_("name"), null=True, blank=True)
    iso3 = models.CharField(max_length=3, verbose_name=_("iso3"), null=True, blank=True)
    iso = models.CharField(max_length=2, verbose_name=_("iso2"), null=True, blank=True)
    record_type = models.IntegerField(
        choices=CountryType.choices, verbose_name=_("type"), null=True, blank=True, help_text=_("Type of entity")
    )
    region = models.ForeignKey(Region, verbose_name=_("region"), null=True, blank=True, on_delete=models.SET_NULL)
    bbox = models.JSONField(
        default=dict,
        null=True,
        blank=True,
        verbose_name=_("bbox"),
    )
    centroid = models.JSONField(
        default=dict,
        null=True,
        blank=True,
        verbose_name=_("centroid"),
    )
    independent = models.BooleanField(default=None, null=True, help_text=_("Is this an independent country?"))
    is_deprecated = models.BooleanField(default=False, help_text=_("Is this an active, valid country?"))

    def __str__(self):
        return f"{self.name} - {self.iso3}"

    def save(self, *args, **kwargs):
        if self.iso3:
            self.iso3 = self.iso3.lower()
        if self.iso:
            self.iso = self.iso.lower()
        return super().save(*args, **kwargs)
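The `save` override normalizes ISO codes to lower case before writing, so lookups don't depend on the caller's casing. The normalization rule in isolation (a hypothetical helper, not part of the model):

```python
def normalize_iso(code):
    # Mirrors Country.save(): lower-case when a code is present,
    # pass falsy values (None, "") through untouched.
    return code.lower() if code else code


print(normalize_iso("NPL"))  # npl
print(normalize_iso(None))   # None
```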
Empty file added apps/common/tests.py
14 changes: 14 additions & 0 deletions apps/common/types.py
@@ -0,0 +1,14 @@
import strawberry_django
from django.contrib.auth.models import User
from strawberry import auto


@strawberry_django.type(User)
class UserMeType:
    id: auto
    username: auto
    first_name: auto
    last_name: auto
    email: auto
    is_staff: auto
    is_superuser: auto
Empty file added apps/common/views.py
Empty file added apps/etl/__init__.py
47 changes: 47 additions & 0 deletions apps/etl/admin.py
@@ -0,0 +1,47 @@
from django.contrib import admin

# Register your models here.
from .models import ExtractionData, GdacsTransformation


@admin.register(ExtractionData)
class ExtractionDataAdmin(admin.ModelAdmin):
    def get_readonly_fields(self, request, obj=None):
        # Use the model's fields to populate readonly_fields
        if obj:  # If the object exists (edit page)
            return [field.name for field in self.model._meta.fields]
        return []

    list_display = (
        "id",
        "source",
        "resp_code",
        "status",
        "parent__id",
        "resp_data_type",
        "source_validation_status",
        "hazard_type",
        "created_at",
    )
    list_filter = ("status",)
    autocomplete_fields = ["parent"]
    search_fields = ["parent"]


@admin.register(GdacsTransformation)
class GdacsTransformationAdmin(admin.ModelAdmin):
    def get_readonly_fields(self, request, obj=None):
        # Use the model's fields to populate readonly_fields
        if obj:  # If the object exists (edit page)
            return [field.name for field in self.model._meta.fields]
        return []

    list_display = (
        "id",
        "extraction",
        "item_type",
        "status",
    )
    list_filter = ("status",)
    autocomplete_fields = ["extraction"]
    search_fields = ["extraction"]
6 changes: 6 additions & 0 deletions apps/etl/apps.py
@@ -0,0 +1,6 @@
from django.apps import AppConfig


class EtlConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "apps.etl"
21 changes: 21 additions & 0 deletions apps/etl/dataloaders.py
@@ -0,0 +1,21 @@
import typing

from asgiref.sync import sync_to_async
from common.dataloaders import load_model_objects
from django.utils.functional import cached_property
from strawberry.dataloader import DataLoader

from .models import ExtractionData

if typing.TYPE_CHECKING:
    from .types import ExtractionDataType


def load_extraction(keys: list[int]) -> list["ExtractionDataType"]:
    # Load ExtractionData objects for the requested keys (the original imported
    # and queried User here, which does not match this loader's purpose).
    return load_model_objects(ExtractionData, keys)  # type: ignore[reportReturnType]


class ExtractionDataLoader:
    @cached_property
    def load_extraction(self):
        return DataLoader(load_fn=sync_to_async(load_extraction))
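The `DataLoader` wrapper exists so that many per-object `load_extraction` calls issued while resolving one GraphQL query collapse into a single database batch. A toy illustration of that coalescing idea in plain asyncio (no strawberry dependency; all names illustrative):

```python
import asyncio


class ToyLoader:
    """Collect keys requested during one event-loop tick, then fetch them in a single batch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self._pending = {}  # key -> Future awaiting its value

    async def load(self, key):
        if key not in self._pending:
            self._pending[key] = asyncio.get_running_loop().create_future()
            if len(self._pending) == 1:
                # First key this tick: schedule one dispatch for the whole batch.
                asyncio.get_running_loop().call_soon(self._dispatch)
        return await self._pending[key]

    def _dispatch(self):
        pending, self._pending = self._pending, {}
        for key, value in zip(pending, self.batch_fn(list(pending))):
            pending[key].set_result(value)


calls = []


def batch_fetch(keys):
    calls.append(keys)  # record that one call served every key
    return [k * 10 for k in keys]


async def main():
    loader = ToyLoader(batch_fetch)
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))


print(asyncio.run(main()))  # three loads, resolved from one batch_fetch call
```

strawberry's `DataLoader` adds per-key caching and per-request scoping on top of this; the sketch only shows the batching behaviour.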
9 changes: 9 additions & 0 deletions apps/etl/enums.py
@@ -0,0 +1,9 @@
import strawberry

from .models import ExtractionData

ExtractionDataStatusTypeEnum = strawberry.enum(ExtractionData.Status, name="ExtractionDataStatusTypeEnum")
ExtractionValidationTypeEnum = strawberry.enum(
    ExtractionData.ValidationStatus, name="ExtractionDataValidationStatusTypeEnum"
)
ExtractionSourceTypeEnum = strawberry.enum(ExtractionData.Source, name="ExtractionDataSourceTypeEnum")
85 changes: 85 additions & 0 deletions apps/etl/extract.py
@@ -0,0 +1,85 @@
import requests
from celery.utils.log import get_task_logger
from django.core.exceptions import ObjectDoesNotExist

from .models import ExtractionData

logger = get_task_logger(__name__)


class Extraction:
    def __init__(self, url: str):
        self.url = url

    def _get_file_extension(self, content_type):
        mappings = {
            "application/json": "json",
            "text/html": "html",
            "application/xml": "xml",
            "text/csv": "csv",
        }
        return mappings.get(content_type, "txt")

    def pull_data(self, source: int, retry_count: int, timeout: int = 30, ext_object_id: int | None = None):
        resp_status = ExtractionData.Status.IN_PROGRESS
        source_validation_status = ExtractionData.ValidationStatus.NO_VALIDATION
        instance_obj = None

        # Update extraction object status to in_progress
        if ext_object_id:
            try:
                instance_obj = ExtractionData.objects.get(id=ext_object_id)
                instance_obj.resp_code = resp_status
                instance_obj.attempt_no = retry_count
                instance_obj.save(update_fields=["resp_code", "attempt_no"])
            except ExtractionData.DoesNotExist:
                raise ObjectDoesNotExist(f"ExtractionData object with ID {ext_object_id} not found")

        try:
            response = requests.get(self.url, timeout=timeout)
            resp_type = response.headers.get("Content-Type", "")
            file_extension = self._get_file_extension(resp_type)

            # A 204 is a successful response with no body; check it before the
            # generic failure branch so it is not treated as an error.
            if response.status_code == 204:
                source_validation_status = ExtractionData.ValidationStatus.NO_DATA

            # Try saving the data in case of failure
            elif response.status_code != 200:
                data = {
                    "source": source,
                    "url": self.url,
                    "attempt_no": retry_count,
                    "resp_code": response.status_code,
                    "status": ExtractionData.Status.FAILED,
                    "resp_data": None,
                    "resp_data_type": "text",
                    "file_extension": None,
                    "source_validation_status": ExtractionData.ValidationStatus.NO_VALIDATION,
                    "content_validation": "",
                    "resp_text": response.text,
                }

                # instance_obj is only set when ext_object_id was passed in
                if instance_obj is not None:
                    for key, value in data.items():
                        setattr(instance_obj, key, value)
                    instance_obj.save()

                logger.error(f"Request failed with status {response.status_code}")
                raise Exception("Request failed")

            resp_status = ExtractionData.Status.SUCCESS

            return {
                "source": source,
                "url": self.url,
                "attempt_no": retry_count,
                "resp_code": response.status_code,
                "status": resp_status,
                "resp_data": response,
                "resp_data_type": resp_type,
                "file_extension": file_extension,
                "source_validation_status": source_validation_status,
                "content_validation": "",
                "resp_text": "",
            }
        except requests.exceptions.RequestException as e:
            logger.error(f"Extraction failed for source {source}: {str(e)}")
            raise Exception(f"Request failed: {e}")
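`_get_file_extension` does an exact match on the Content-Type value, so real-world headers that carry parameters (e.g. `text/html; charset=utf-8`) fall through to the `txt` default — worth keeping in mind when reading stored `file_extension` values. A standalone sketch of the lookup (a hypothetical free function mirroring the method):

```python
def get_file_extension(content_type):
    # Mirrors Extraction._get_file_extension: exact Content-Type lookup,
    # with "txt" as the fallback for anything unrecognised.
    mappings = {
        "application/json": "json",
        "text/html": "html",
        "application/xml": "xml",
        "text/csv": "csv",
    }
    return mappings.get(content_type, "txt")


print(get_file_extension("application/json"))          # json
print(get_file_extension("text/html; charset=utf-8"))  # txt — parameters defeat the exact match
print(get_file_extension("image/png"))                 # txt
```

Stripping parameters first (`content_type.split(";")[0].strip()`) would make the mapping more robust.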