Commit

Merge pull request #42 from bento-platform/refact/fastapi-rewrite
refact!: remove tables concept + rewrite with FastAPI
davidlougheed authored Aug 28, 2023
2 parents b7065ec + 9f8f914 commit cb61481
Showing 31 changed files with 1,922 additions and 1,673 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -14,7 +14,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.10"]
python-version: ["3.10", "3.11"]

steps:

@@ -34,4 +34,4 @@ jobs:
run: python -m poetry install

- name: Lint
run: flake8 ./bento_aggregation_service ./tests
run: poetry run flake8 ./bento_aggregation_service ./tests
8 changes: 4 additions & 4 deletions .github/workflows/test.yml
@@ -14,7 +14,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.8", "3.10"]
python-version: ["3.10", "3.11"]

steps:

@@ -34,7 +34,7 @@ jobs:
run: python -m poetry install

- name: Test
run: coverage run -m unittest -v
run: poetry run coverage run -m unittest -v

- name: Codecov
run: codecov
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
2 changes: 1 addition & 1 deletion .idea/bento_federation_service.iml

2 changes: 1 addition & 1 deletion .idea/misc.xml

6 changes: 3 additions & 3 deletions .vscode/launch.json
@@ -9,17 +9,17 @@
"name": "Attach Debugger",
"type": "python",
"request": "attach",
"port": 5879,
"port": 5684,
"host": "0.0.0.0",
"justMyCode": true,
},
{
"name": "Fed. + dep. - Attach Debugger",
"type": "python",
"request": "attach",
"port": 5879,
"port": 5684,
"host": "0.0.0.0",
"justMyCode": false,
},
]
}
}
12 changes: 6 additions & 6 deletions Dockerfile
@@ -1,32 +1,32 @@
FROM ghcr.io/bento-platform/bento_base_image:python-debian-2023.03.22
FROM ghcr.io/bento-platform/bento_base_image:python-debian-2023.08.16

# Run as root in the Dockerfile until we drop down to the service user in the entrypoint
USER root

# Use uvicorn (instead of hypercorn) in production since I've found
# multiple benchmarks showing it to be faster - David L
RUN pip install --no-cache-dir "uvicorn[standard]==0.20.0"
RUN pip install --no-cache-dir "uvicorn[standard]==0.23.2"

WORKDIR /aggregation

COPY pyproject.toml .
COPY poetry.toml .
COPY poetry.lock .

# Install production dependencies
# Without --no-root, we get errors related to the code not being copied in yet.
# But we don't want the code here, otherwise Docker cache doesn't work well.
RUN poetry install --without dev --no-root
RUN poetry config virtualenvs.create false && \
poetry install --without dev --no-root

# Manually copy only what's relevant
# (Don't use .dockerignore, which allows us to have development containers too)
COPY bento_aggregation_service bento_aggregation_service
COPY LICENSE .
COPY README.md .
COPY run.py .
COPY run.bash .

# Install the module itself, locally (similar to `pip install -e .`)
RUN poetry install --without dev

# Use base image entrypoint for dropping down into bento_user & running this CMD
CMD [ "python3", "run.py" ]
CMD ["bash", "./run.bash"]
18 changes: 10 additions & 8 deletions README.md
@@ -3,17 +3,19 @@
A service for aggregating search results across Bento data services.

## Environment Variables
The following environment variables are required for the Aggregation Service:

`CHORD_DEBUG`: `true` (insecure) or `false`; default is `false`
- `BENTO_DEBUG`: `true` (insecure) or `false`; default is `false`

`CHORD_URL`: ex. `http://127.0.0.1:5000/`
- `USE_GOHAN`: `true` or `false` to use Gohan; default is `true`

By convention, this *should* have a trailing slash; however as of v0.9.1 this
is optional.
- `KATSU_URL`: katsu service url (e.g. https://portal.bentov2.local/api/metadata/)

- `SERVICE_REGISTRY_URL`: service registry url (e.g. https://bentov2.local/api/service-registry/)

`PORT`: Specified when running via `./run.py`; defaults to `5000`
- `BENTO_AUTHZ_SERVICE_URL`: authorization service url (e.g. https://bentov2.local/api/authorization/)

`SERVICE_URL_BASE_PATH`: Base URL fragment (e.g. `/test/`) for endpoints
By convention, URLs *should* have a trailing slash; however as of v0.9.1 this
is optional.

Should usually be blank; set to non-blank to locally emulate a proxy prefix
like `/api/aggregation`.
Note that when deployed in a [Bento](https://github.com/bento-platform/bento) node, these environment variables are provided by the Aggregation docker-compose [file](https://github.com/bento-platform/bento/blob/main/lib/aggregation/docker-compose.aggregation.yaml).
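The README says service URLs such as `KATSU_URL` should, by convention, end with a trailing slash, but that the slash is optional. One subtlety behind that rule: `urllib.parse.urljoin` drops the last path segment of a base URL that lacks a trailing slash. The following is a minimal sketch of how a client might tolerate both forms; the helper `join_service_url` and the `service-info` path are illustrative assumptions, not code from this commit.

```python
from urllib.parse import urljoin


def join_service_url(base_url: str, path: str) -> str:
    """Join a service base URL and a relative path (hypothetical helper).

    urljoin() treats the last segment of a base URL without a trailing slash
    as a "file" and replaces it, so the slash is added first to keep
    e.g. /api/metadata intact.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    # Strip any leading slash so the path resolves relative to the base path,
    # not relative to the host root.
    return urljoin(base_url, path.lstrip("/"))


# Both spellings of KATSU_URL resolve to the same endpoint.
print(join_service_url("https://portal.bentov2.local/api/metadata/", "service-info"))
# → https://portal.bentov2.local/api/metadata/service-info
print(join_service_url("https://portal.bentov2.local/api/metadata", "/service-info"))
# → https://portal.bentov2.local/api/metadata/service-info
```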
136 changes: 53 additions & 83 deletions bento_aggregation_service/app.py
@@ -2,33 +2,48 @@

import asyncio
import bento_aggregation_service
import tornado.gen
import tornado.ioloop
import tornado.web

from bento_lib.types import GA4GHServiceInfo
from tornado.web import RequestHandler, url
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from .config import ConfigDependency, get_config
from .constants import (
BENTO_SERVICE_KIND,
SERVICE_ID,
SERVICE_TYPE,
SERVICE_NAME,
PORT,
BASE_PATH,
CHORD_DEBUG,
CHORD_URL_SET,
DEBUGGER_PORT,
)
from .logger import logger
from .search.handlers.datasets import DatasetsSearchHandler
from .search.handlers.private_dataset import PrivateDatasetSearchHandler
from .logger import LoggerDependency
from .search.handlers.datasets import dataset_search_router


# noinspection PyAbstractClass,PyAttributeOutsideInit
class ServiceInfoHandler(RequestHandler):
SERVICE_INFO: GA4GHServiceInfo = {
"id": SERVICE_ID,
application = FastAPI()

# TODO: Find a way to DI this
config_for_setup = get_config()

application.add_middleware(
CORSMiddleware,
allow_origins=config_for_setup.cors_origins,
allow_headers=["Authorization"],
allow_credentials=True,
allow_methods=["*"],
)

application.include_router(dataset_search_router)


async def _git_stdout(*args) -> str:
git_proc = await asyncio.create_subprocess_exec(
"git", *args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
res, _ = await git_proc.communicate()
return res.decode().rstrip()


@application.get("/service-info")
async def service_info(config: ConfigDependency, logger: LoggerDependency):
info: GA4GHServiceInfo = {
"id": config.service_id,
"name": SERVICE_NAME, # TODO: Should be globally unique?
"type": SERVICE_TYPE,
"description": "Aggregation service for a Bento platform node.",
@@ -41,71 +56,26 @@ class ServiceInfoHandler(RequestHandler):
"bento": {
"serviceKind": BENTO_SERVICE_KIND,
},
"environment": "prod",
}

@staticmethod
async def _git_stdout(*args) -> str:
git_proc = await asyncio.create_subprocess_exec(
"git", *args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
res, _ = await git_proc.communicate()
return res.decode().rstrip()

async def get(self):
# Spec: https://github.com/ga4gh-discovery/ga4gh-service-info

if not CHORD_DEBUG:
# Cache production service info, since no information should change
self.set_header("Cache-Control", "private")
self.write({**self.SERVICE_INFO, "environment": "prod"})
return

service_info = {
**self.SERVICE_INFO,
"environment": "dev",
}

try:
if res_tag := await self._git_stdout("describe", "--tags", "--abbrev=0"):
# noinspection PyTypeChecker
service_info["bento"]["gitTag"] = res_tag
if res_branch := await self._git_stdout("branch", "--show-current"):
# noinspection PyTypeChecker
service_info["bento"]["gitBranch"] = res_branch
if res_commit := await self._git_stdout("rev-parse", "HEAD"):
# noinspection PyTypeChecker
service_info["bento"]["gitCommit"] = res_commit

except Exception as e:
logger.warning(f"Could not retrieve git information: {type(e).__name__}")

self.write(service_info)


class Application(tornado.web.Application):
def __init__(self, base_path: str):
super().__init__([
url(f"{base_path}/service-info", ServiceInfoHandler),
url(f"{base_path}/dataset-search", DatasetsSearchHandler),
url(f"{base_path}/private/dataset-search/([a-zA-Z0-9\\-_]+)", PrivateDatasetSearchHandler),
])


application = Application(BASE_PATH)


def run(): # pragma: no cover
if not CHORD_URL_SET:
logger.critical("CHORD_URL is not set, terminating...")
exit(1)

if CHORD_DEBUG:
try:
# noinspection PyPackageRequirements,PyUnresolvedReferences
import debugpy
debugpy.listen(("0.0.0.0", DEBUGGER_PORT))
logger.info("debugger attached")
except ImportError:
logger.info("debugpy not found")

application.listen(PORT)
tornado.ioloop.IOLoop.current().start()
if not config.bento_debug:
return info

info["environment"] = "dev"

try:
if res_tag := await _git_stdout("describe", "--tags", "--abbrev=0"):
# noinspection PyTypeChecker
info["bento"]["gitTag"] = res_tag
if res_branch := await _git_stdout("branch", "--show-current"):
# noinspection PyTypeChecker
info["bento"]["gitBranch"] = res_branch
if res_commit := await _git_stdout("rev-parse", "HEAD"):
# noinspection PyTypeChecker
info["bento"]["gitCommit"] = res_commit

except Exception as e:
logger.warning(f"Could not retrieve git information: {type(e).__name__}")

return info
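The rewritten `/service-info` endpoint returns the static payload with `environment: "prod"` unless `bento_debug` is set. In debug mode it switches to `"dev"` and adds `gitTag`, `gitBranch`, and `gitCommit` when the corresponding git commands produce output. The sketch below pulls that logic out of FastAPI so it can be exercised directly. `build_service_info`, the injectable `git` parameter, and the payload values are hypothetical placeholders, not the real `SERVICE_TYPE` or `BENTO_SERVICE_KIND` constants.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Any coroutine function taking git arguments and returning stripped stdout.
GitRunner = Callable[..., Awaitable[str]]


async def git_stdout(*args: str) -> str:
    """Run `git <args>` and return its stdout with trailing whitespace removed."""
    proc = await asyncio.create_subprocess_exec(
        "git", *args, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    out, _ = await proc.communicate()
    return out.decode().rstrip()


async def build_service_info(base_info: dict, debug: bool, git: GitRunner = git_stdout) -> dict:
    # Copy the nested "bento" dict too, so git fields never leak into base_info.
    info = {**base_info, "bento": dict(base_info.get("bento", {})), "environment": "prod"}
    if not debug:
        return info

    info["environment"] = "dev"
    try:
        for key, git_args in (
            ("gitTag", ("describe", "--tags", "--abbrev=0")),
            ("gitBranch", ("branch", "--show-current")),
            ("gitCommit", ("rev-parse", "HEAD")),
        ):
            # Only record a field when the command printed something.
            if value := await git(*git_args):
                info["bento"][key] = value
    except Exception as e:  # e.g. FileNotFoundError if git is not installed
        logger.warning(f"Could not retrieve git information: {type(e).__name__}")
    return info


async def fake_git(*args: str) -> str:
    # Stub: only `rev-parse` returns output, as if no tag or branch were available.
    return {"rev-parse": "cb61481"}.get(args[0], "")


base = {"id": "example-id", "bento": {"serviceKind": "example-kind"}}
print(asyncio.run(build_service_info(base, debug=True, git=fake_git)))
# → {'id': 'example-id', 'bento': {'serviceKind': 'example-kind', 'gitCommit': 'cb61481'}, 'environment': 'dev'}
```

Copying the nested `bento` dict matters. In the removed Tornado handler, `{**self.SERVICE_INFO, "environment": "dev"}` was only a shallow copy, so assigning `service_info["bento"]["gitTag"]` also modified the class-level `SERVICE_INFO`. The FastAPI version avoids this by building the `info` dict literal inside the handler on every request.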
64 changes: 64 additions & 0 deletions bento_aggregation_service/config.py
@@ -0,0 +1,64 @@
import json

from fastapi import Depends
from functools import lru_cache
from pydantic.fields import FieldInfo
from pydantic_settings import BaseSettings, EnvSettingsSource, PydanticBaseSettingsSource, SettingsConfigDict
from typing import Annotated, Any, Literal

from .constants import SERVICE_TYPE

__all__ = [
"Config",
"get_config",
"ConfigDependency",
]


class CorsOriginsParsingSource(EnvSettingsSource):
def prepare_field_value(self, field_name: str, field: FieldInfo, value: Any, value_is_complex: bool) -> Any:
if field_name == "cors_origins":
return tuple(x.strip() for x in value.split(";")) if value is not None else ()
return json.loads(value) if value_is_complex else value


class Config(BaseSettings):
bento_debug: bool = False

service_id: str = str(":".join(list(SERVICE_TYPE.values())[:2]))

request_timeout: int = 180 # seconds

bento_authz_service_url: str # Bento authorization service base URL
authz_enabled: bool = True

# Other services - settings and flags
use_gohan: bool = False
katsu_url: str
service_registry_url: str # used for fetching list of data services, so we can get data type providers

cors_origins: tuple[str, ...] = ("*",)

log_level: Literal["debug", "info", "warning", "error"] = "debug"

# Make Config instances hashable + immutable
model_config = SettingsConfigDict(frozen=True)

@classmethod
def settings_customise_sources(
cls,
settings_cls: type[BaseSettings],
init_settings: PydanticBaseSettingsSource,
env_settings: PydanticBaseSettingsSource,
dotenv_settings: PydanticBaseSettingsSource,
file_secret_settings: PydanticBaseSettingsSource,
) -> tuple[PydanticBaseSettingsSource, ...]:
return (CorsOriginsParsingSource(settings_cls),)


@lru_cache()
def get_config() -> Config:
return Config()


ConfigDependency = Annotated[Config, Depends(get_config)]
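To show what the pydantic-settings `Config` above does without that dependency, here is a standard-library sketch that covers a subset of its fields: required URLs, boolean flags, semicolon-separated `CORS_ORIGINS`, a frozen (hashable, immutable) instance, and an `lru_cache`-wrapped getter so the environment is read once per process. The names `SketchConfig`, `from_env`, `parse_cors_origins`, and `get_sketch_config` are hypothetical. The boolean parsing is a simplified assumption rather than pydantic's exact rules, and the fallback to `("*",)` for an unset `CORS_ORIGINS` is this sketch's own choice. Defaults mirror the commit's `Config`: for example, `use_gohan` is `False` there, even though the README in this commit says it defaults to `true`.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Mapping, Optional, Tuple


def _parse_bool(raw: str) -> bool:
    # Simplified assumption; pydantic accepts a somewhat larger set of spellings.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def parse_cors_origins(raw: Optional[str]) -> Tuple[str, ...]:
    # Semicolon-separated list with whitespace trimmed, as in CorsOriginsParsingSource.
    # Unset falls back to ("*",) here (an assumption of this sketch).
    if raw is None:
        return ("*",)
    return tuple(x.strip() for x in raw.split(";"))


@dataclass(frozen=True)  # frozen + eq => immutable and hashable, like the pydantic model
class SketchConfig:
    katsu_url: str
    service_registry_url: str
    bento_authz_service_url: str
    bento_debug: bool = False
    use_gohan: bool = False
    cors_origins: Tuple[str, ...] = ("*",)

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "SketchConfig":
        return cls(
            # Fields without defaults are required: a missing variable raises KeyError.
            katsu_url=env["KATSU_URL"],
            service_registry_url=env["SERVICE_REGISTRY_URL"],
            bento_authz_service_url=env["BENTO_AUTHZ_SERVICE_URL"],
            bento_debug=_parse_bool(env.get("BENTO_DEBUG", "false")),
            use_gohan=_parse_bool(env.get("USE_GOHAN", "false")),
            cors_origins=parse_cors_origins(env.get("CORS_ORIGINS")),
        )


@lru_cache()
def get_sketch_config() -> SketchConfig:
    # Cached like get_config(): every caller shares one instance per process.
    return SketchConfig.from_env(os.environ)


cfg = SketchConfig.from_env({
    "KATSU_URL": "https://portal.bentov2.local/api/metadata/",
    "SERVICE_REGISTRY_URL": "https://bentov2.local/api/service-registry/",
    "BENTO_AUTHZ_SERVICE_URL": "https://bentov2.local/api/authorization/",
    "CORS_ORIGINS": "https://bentov2.local; https://portal.bentov2.local",
})
print(cfg.cors_origins)  # → ('https://bentov2.local', 'https://portal.bentov2.local')
```

In the FastAPI app, the equivalent of `get_sketch_config` is wrapped as `Annotated[Config, Depends(get_config)]`, so handlers such as `service_info` receive the cached instance as a parameter.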
