fix #60: Creating pipeline insertion module #65

Merged
merged 90 commits into main from 60-pipeline-version-insertion-module on Apr 16, 2024
90 commits
3fbf0e7
fixes #51: connecting to swin model
Feb 7, 2024
04df34e
fixes #51: connecting seed detector to swin
Feb 7, 2024
f833979
fixes #51: refactor exception
Feb 8, 2024
8ebec41
fixes #51: incorporate pipelines_endpoints in CACHE
Feb 8, 2024
f3ab024
fixes #51: add request_factory function to bring a standard to call m…
Feb 8, 2024
e8f7b60
fixes #51: add models_Utils.py
Feb 9, 2024
4748844
fixes #51: Update model_utilitary_functions import
Feb 12, 2024
f19b63b
fixes #51: update result parsing and add default pipeline
Feb 12, 2024
40c298b
fixes #51: update inference_request to call swin in loop
Feb 13, 2024
1e77768
fixes #51: Add documentation to image_slicing and swin_result_parser …
Feb 13, 2024
ddd8553
fixes #51: Standardize headers with azureml-model-deployment being ma…
Feb 14, 2024
92353ad
fixes #51: updates devcontainer
Feb 14, 2024
dc95875
fixes #51: Refactor swin_result_parser function to add all result to …
Feb 14, 2024
7ff78cb
fixes #51: Change model_utilitary_function to model_request module an…
Feb 14, 2024
73ceff9
fixes #51: fix bug with request import
Feb 14, 2024
4502f3b
fixes #51: Add workflow.yml with following jobs: standard, lint test,…
Feb 14, 2024
b030591
fixes #51: Add repo-standard, markdown-check, and yaml-check workflow…
Feb 14, 2024
61f0324
fixes #51: correct failed check from workflows
Feb 14, 2024
9e43553
fixes #51: correct typo in testing.md and workflows.yml
Feb 14, 2024
ac07bc8
delete .MD file
Feb 14, 2024
266217b
fixes #51: Refactor inference result processing and add test file for…
Feb 16, 2024
aff08a3
fixes #51: add model_module module
Feb 16, 2024
84f9403
fixes #51: Add type 3 model to inference request
Feb 20, 2024
44a3227
fixes #51: change model type
Feb 20, 2024
5dd5621
fixes #51: Update doc to include input and output of inference request
Feb 20, 2024
cbd900e
Fixes #51: update doc
Feb 21, 2024
753757b
fixes #51: Add model documentation for the backend
Feb 22, 2024
d72aac9
fixes #51: Change the categories model name
Feb 22, 2024
3e0ca86
fixes #51: implement model module request inference function
Feb 23, 2024
63dddec
fixes #51: add function to retrieve pipeline info from blob storage
Feb 26, 2024
1d3514e
fixes #51: Add script to upload to blob storage json file containing …
Feb 26, 2024
4f3166c
fixes #51: update documentation on pipeline
Feb 26, 2024
1f14dab
fixes #51: correct lint error
Feb 27, 2024
f8805f4
fixes #51: correct lint error
Feb 27, 2024
384930e
Update nachet-inference-documentation.md to reflect code change
Feb 27, 2024
070f5bc
fixes #51: Add manual test case in testing.md
Feb 27, 2024
6785da1
Correct lint error
Feb 27, 2024
3cddb43
fixes #51: Add test get pipeline info unsuccessful
Feb 28, 2024
4a4a731
fixes #51: Add get pipeline info successful test
Feb 28, 2024
13b78d4
fixes #51: Update sequence diagram to reflect change in code
Feb 28, 2024
6d63d20
fixes #51: Update sequence diagram
Feb 28, 2024
16e2193
fixes #51: Update doc string
Feb 28, 2024
3fcc61d
fixes #51: update README and TESTING
Feb 29, 2024
c57c822
fixes #51: Add inferences request automatic test
Feb 29, 2024
ea80c9c
fixes #51: correcting markdown lint
Mar 1, 2024
6651708
fixes #51: Add documentation to pipeline function
Mar 1, 2024
60482b3
fixes #51: Add inference request test
Mar 1, 2024
46aadb9
fixes #51: inference test with Quart.test_client
Mar 1, 2024
d94fb95
fixes #51: Correct lint ruff error and tests
Mar 1, 2024
cbfe42c
fixes #51: change from loop to asyncio.run
Mar 1, 2024
0101d40
fixes #51: Correct trailing whitespace and EOF.
Mar 4, 2024
a9375ff
fixes #51: implement Ricky reviews
Mar 6, 2024
216cf5e
fixes #51: implement William's reviews
Mar 6, 2024
2b2100f
fixes #51: fixes lint
Mar 6, 2024
c56569a
fixes #51: raise ConnectionStringError in insert_new_version_pipeline
Mar 6, 2024
f449137
Update .github/workflows/workflows.yml
Mar 7, 2024
0f50e07
Update .github/workflows/workflows.yml
Mar 7, 2024
269e321
fixes #51: removes run_test.py
Mar 7, 2024
098cc29
fixes #51 remove literal path
Mar 7, 2024
4985ad2
fixes #51: add print statement to log the result from models
Mar 7, 2024
587732e
fixes #51: change to pipeline insertion script
Mar 8, 2024
29d2233
fixes #51: Move insert_new_version_pipeline to pipelines_version_inse…
Mar 14, 2024
11c8b97
fixes #51: add explanation for `box` value
Mar 14, 2024
8bd5685
Add red box image
Mar 14, 2024
dd7d752
fixes #60:
Mar 14, 2024
274b051
fixes #60: correct yaml
Mar 14, 2024
3f741a7
fixes #60: template yaml fix
Mar 14, 2024
64dd55e
fixes #60: template lint error
Mar 14, 2024
77982e9
fixes #60: template fix
Mar 14, 2024
2d22557
fixes #60: try copilot fix
Mar 14, 2024
67ff50f
fixes #60: Add unittest
Mar 19, 2024
276f56c
fixes #60: correct typo in template
Mar 19, 2024
82d2f6c
fixes #60: Correct lint error
Mar 19, 2024
29aaaa6
fixes #60: Add EOF
Mar 19, 2024
7df0e16
fixes #60: Implement grammar correction
Mar 20, 2024
9e8671e
fixes #60: Add environment variable check
Mar 20, 2024
cfd27e6
fixes #60: update environment variables readme
Mar 21, 2024
10b723f
fixes #60: remove act from devcontainer
Mar 25, 2024
91f1b46
fixes #60: Correct Markdown lint error
Mar 25, 2024
f8710c7
#Issue 60: Add a default key for pipeline
Apr 2, 2024
155b96e
issue #60: Modify tests to check error message
Apr 2, 2024
1ce2caa
issue #60: Create Fernet Key and Blob Service Client in main
Apr 3, 2024
9cf2d57
issue #60: update tests
Apr 3, 2024
b4c40dc
issue #60: Merge branch 'main' into 60-pipeline-version-insertion-module
Apr 10, 2024
ca1ec07
issue #60: add warning for image environment variable
Apr 10, 2024
c910abf
issue #60: Change to image validation
Apr 10, 2024
662af44
issue #60: remove legacy folder
Apr 10, 2024
f3c5558
issue #60: Eliminate catching generic exceptions
Apr 11, 2024
7df50f0
issue #65: Move CONSTANT to upper function
Apr 11, 2024
9124f99
fixes #60: Move pipeline related files into
Apr 15, 2024
4 changes: 2 additions & 2 deletions TESTING.md
@@ -27,7 +27,7 @@ pipeline information.
**Preconditions:**

- [ ] Nachet backend is set up and running. Use the command `hypercorn -b :8080
-  app:app` to start the quartz server.
+  app:app` to start the quart server.
- [ ] The environment variables are all set.
- [ ] :exclamation: The frontend is not running yet

@@ -66,7 +66,7 @@ expected.
**Preconditions:**

- [ ] Nachet backend is set up and running. Use the command `hypercorn -b :8080
-  app:app` to start the quartz server.
+  app:app` to start the quart server.
- [ ] The environment variables are all set.
- [ ] The frontend is running.
- [ ] Start the frontend application
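A note on the `hypercorn -b :8080 app:app` command in the preconditions above: the same serve step can be expressed programmatically. A minimal sketch using Hypercorn's asyncio API; the Quart app below is a self-contained stand-in for the real one defined in app.py:

```python
# Minimal sketch of what `hypercorn -b :8080 app:app` does, using
# Hypercorn's asyncio API. The Quart app here is a stand-in for the
# real one defined in app.py.
import asyncio

from hypercorn.asyncio import serve
from hypercorn.config import Config
from quart import Quart

app = Quart(__name__)

config = Config()
config.bind = [":8080"]  # same bind address as the documented command

if __name__ == "__main__":
    asyncio.run(serve(app, config))
```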
186 changes: 107 additions & 79 deletions app.py
@@ -8,17 +8,17 @@
import time
import warnings

-import model.inference as inference
-from model import request_function
-
from PIL import Image, UnidentifiedImageError
from datetime import date
from dotenv import load_dotenv
from quart import Quart, request, jsonify
from quart_cors import cors
from collections import namedtuple
from cryptography.fernet import Fernet

import azure_storage.azure_storage_api as azure_storage_api
+import model.inference as inference
+from model import request_function

class APIErrors(Exception):
pass
@@ -60,24 +60,45 @@ class MaxContentLengthWarning(APIWarnings):
pass

load_dotenv()

connection_string_regex = r"^DefaultEndpointsProtocol=https?;.*;FileEndpoint=https://[a-zA-Z0-9]+\.file\.core\.windows\.net/;$"
-connection_string = os.getenv("NACHET_AZURE_STORAGE_CONNECTION_STRING")
pipeline_version_regex = r"\d.\d.\d"

+CONNECTION_STRING = os.getenv("NACHET_AZURE_STORAGE_CONNECTION_STRING")

FERNET_KEY = os.getenv("NACHET_BLOB_PIPELINE_DECRYPTION_KEY")
PIPELINE_VERSION = os.getenv("NACHET_BLOB_PIPELINE_VERSION")
PIPELINE_BLOB_NAME = os.getenv("NACHET_BLOB_PIPELINE_NAME")

NACHET_DATA = os.getenv("NACHET_DATA")
NACHET_MODEL = os.getenv("NACHET_MODEL")

Model = namedtuple(
'Model',
[
'entry_function',
'name',
'endpoint',
'api_key',
'inference_function',
'content_type',
'deployment_platform',
]
)
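For readers skimming the diff: `get_pipelines()` further down fills one `Model` tuple per entry in the pipelines JSON. An illustrative instance, with every value a placeholder and the field meanings inferred from the namedtuple definition above:

```python
# Illustrative Model instance; all values are placeholders.
model = Model(
    entry_function=None,                        # callable looked up in request_function
    name="seed-detector",                       # placeholder model name
    endpoint="https://example.endpoint/score",  # decrypted endpoint URL
    api_key="decrypted-api-key",                # decrypted API key
    inference_function=None,                    # callable that parses the raw result
    content_type="application/json",
    deployment_platform="azure",
)
```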

try:
VALID_EXTENSION = json.loads(os.getenv("NACHET_VALID_EXTENSION"))
VALID_DIMENSION = json.loads(os.getenv("NACHET_VALID_DIMENSION"))
-except TypeError:
+except (TypeError, json.decoder.JSONDecodeError):
# For testing
VALID_DIMENSION = {"width": 1920, "height": 1080}
VALID_EXTENSION = {"jpeg", "jpg", "png", "gif", "bmp", "tiff", "webp"}
warnings.warn(
f"""
NACHET_VALID_EXTENSION or NACHET_VALID_DIMENSION is not set,
using default values: {", ".join(list(VALID_EXTENSION))} and dimension: {tuple(VALID_DIMENSION.values())}
""",
ImageWarning
)

try:
MAX_CONTENT_LENGTH_MEGABYTES = int(os.getenv("NACHET_MAX_CONTENT_LENGTH"))
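A quick sketch of how the two validation regexes defined above behave. The connection string is fabricated; note that the dots in `pipeline_version_regex` are unescaped, so the version check is looser than a strict `X.Y.Z` test:

```python
# Hedged sketch: exercising the validation regexes defined above.
# The connection string below is a fabricated example.
import re

connection_string_regex = r"^DefaultEndpointsProtocol=https?;.*;FileEndpoint=https://[a-zA-Z0-9]+\.file\.core\.windows\.net/;$"
pipeline_version_regex = r"\d.\d.\d"

sample = (
    "DefaultEndpointsProtocol=https;AccountName=example;AccountKey=abc123;"
    "FileEndpoint=https://example.file.core.windows.net/;"
)
print(bool(re.match(connection_string_regex, sample)))  # True
print(bool(re.match(pipeline_version_regex, "0.1.3")))  # True
print(bool(re.match(pipeline_version_regex, "0a1b3")))  # also True: unescaped dots
```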
@@ -113,6 +134,51 @@ class MaxContentLengthWarning(APIWarnings):
app.config["MAX_CONTENT_LENGTH"] = MAX_CONTENT_LENGTH_MEGABYTES * 1024 * 1024


+@app.before_serving
+async def before_serving():
+    try:
+        # Check: do environment variables exist?
+        if CONNECTION_STRING is None:
+            raise ServerError("Missing environment variable: NACHET_AZURE_STORAGE_CONNECTION_STRING")
+
+        if FERNET_KEY is None:
+            raise ServerError("Missing environment variable: FERNET_KEY")
+
+        if PIPELINE_VERSION is None:
+            raise ServerError("Missing environment variable: PIPELINE_VERSION")
+
+        if PIPELINE_BLOB_NAME is None:
+            raise ServerError("Missing environment variable: PIPELINE_BLOB_NAME")
+
+        if NACHET_DATA is None:
+            raise ServerError("Missing environment variable: NACHET_DATA")
+
+        # Check: are environment variables correct?
+        if not bool(re.match(connection_string_regex, CONNECTION_STRING)):
+            raise ServerError("Incorrect environment variable: NACHET_AZURE_STORAGE_CONNECTION_STRING")
+
+        if not bool(re.match(pipeline_version_regex, PIPELINE_VERSION)):
+            raise ServerError("Incorrect environment variable: PIPELINE_VERSION")
+
+        CACHE["seeds"] = await fetch_json(NACHET_DATA, "seeds", "seeds/all.json")
+        CACHE["endpoints"] = await get_pipelines(
+            CONNECTION_STRING, PIPELINE_BLOB_NAME,
+            PIPELINE_VERSION, Fernet(FERNET_KEY)
+        )
+
+        print(
+            f"""Server started with the current configuration:\n
+            date: {date.today()}
+            file version of pipelines: {PIPELINE_VERSION}
+            pipelines: {[pipeline for pipeline in CACHE["pipelines"].keys()]}\n
+            """
+        )  # TODO Transform into logging
+
+    except ServerError as e:
+        print(e)
+        raise


@app.post("/del")
async def delete_directory():
"""
@@ -215,10 +281,21 @@ async def image_validation():
image_base64 = data["image"]

header, encoded_image = image_base64.split(",", 1)

image_bytes = base64.b64decode(encoded_image)

image = Image.open(io.BytesIO(image_bytes))

+        # size check
+        if image.size[0] > VALID_DIMENSION["width"] and image.size[1] > VALID_DIMENSION["height"]:
+            raise ImageValidationError(f"invalid file size: {image.size[0]}x{image.size[1]}")
+
+        # resizable check
+        try:
+            size = (100,150)
+            image.thumbnail(size)
+        except IOError:
+            raise ImageValidationError("invalid file not resizable")

magic_header = magic.from_buffer(image_bytes, mime=True)
image_extension = magic_header.split("/")[1]

@@ -232,23 +309,12 @@ async def image_validation():
if header.lower() != expected_header:
raise ImageValidationError(f"invalid file header: {header}")

-        # size check
-        if image.size[0] > VALID_DIMENSION["width"] and image.size[1] > VALID_DIMENSION["height"]:
-            raise ImageValidationError(f"invalid file size: {image.size[0]}x{image.size[1]}")
-
-        # resizable check
-        try:
-            size = (100,150)
-            image.thumbnail(size)
-        except IOError:
-            raise ImageValidationError("invalid file not resizable")

validator = await azure_storage_api.generate_hash(image_bytes)
CACHE['validators'].append(validator)

return jsonify([validator]), 200

-    except (FileNotFoundError, ValueError, TypeError, UnidentifiedImageError, ImageValidationError) as error:
+    except (UnidentifiedImageError, ImageValidationError) as error:
print(error)
return jsonify([error.args[0]]), 400
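For reference, `image_validation()` expects the client to send a data URI and splits it on the first comma. A sketch of building such a payload; the file name is illustrative:

```python
# Sketch: constructing the `data:image/...;base64,` string that
# image_validation() splits apart. The file name is illustrative.
import base64

with open("seed.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

image_base64 = f"data:image/jpg;base64,{encoded}"
header, encoded_image = image_base64.split(",", 1)
assert header == "data:image/jpg;base64"
```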

@@ -335,14 +401,6 @@ async def inference_request():
print(error)
return jsonify(["InferenceRequestError: " + error.args[0]]), 400

-    except Exception as error:
-        print(error)
-        return jsonify(["Unexpected error occured"]), 500
-
-@app.get("/coffee")
-async def get_coffee():
-    return jsonify("Tea is great!"), 418


@app.get("/seed-data/<seed_name>")
async def get_seed_data(seed_name):
@@ -363,8 +421,9 @@ async def reload_seed_data():
try:
await fetch_json(NACHET_DATA, 'seeds', "seeds/all.json")
return jsonify(["Seed data reloaded successfully"]), 200
-    except Exception as e:
-        return jsonify({"error": str(e)}), 500
+    except urllib.error.HTTPError as e:
+        return jsonify(
+            {f"An error happened when reloading the seed data: {e.args[0]}"}), 500


@app.get("/model-endpoints-metadata")
@@ -406,25 +465,28 @@ async def test():

return CACHE["endpoints"], 200


async def fetch_json(repo_URL, key, file_path):
    """
-    Fetches JSON document from a GitHub repository and caches it
-    """
-    try:
-        if key != "endpoints":
-            json_url = os.path.join(repo_URL, file_path)
-            with urllib.request.urlopen(json_url) as response:
-                result = response.read()
-                result_json = json.loads(result.decode("utf-8"))
-            return result_json
-    except urllib.error.HTTPError as error:
-        raise ValueError(str(error))
-    except Exception as e:
-        raise ValueError(str(e))
+    Fetches JSON document from a GitHub repository.
+
+    Parameters:
+    - repo_URL (str): The URL of the GitHub repository.
+    - key (str): The key to identify the JSON document.
+    - file_path (str): The path to the JSON document in the repository.
+
+    Returns:
+    - dict: The JSON document as a Python dictionary.
+    """
+    if key != "endpoints":
+        json_url = os.path.join(repo_URL, file_path)
+        with urllib.request.urlopen(json_url) as response:
+            result = response.read()
+            result_json = json.loads(result.decode("utf-8"))
+        return result_json
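A hypothetical call to the rewritten `fetch_json()`; the repository URL is a placeholder for the value normally read from `NACHET_DATA`:

```python
# Hypothetical usage of fetch_json(); assumes it is imported from app.py.
import asyncio

async def main():
    seeds = await fetch_json(
        "https://example.com/nachet-data",  # placeholder for NACHET_DATA
        "seeds",
        "seeds/all.json",
    )
    print(type(seeds))  # dict parsed from the fetched JSON

asyncio.run(main())
```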


-async def get_pipelines():
+async def get_pipelines(connection_string, pipeline_blob_name, pipeline_version, cipher_suite):
"""
Retrieves the pipelines from the Azure storage API.

@@ -433,16 +495,15 @@
"""
try:
app.config["BLOB_CLIENT"] = await azure_storage_api.get_blob_client(connection_string)
-        result_json = await azure_storage_api.get_pipeline_info(app.config["BLOB_CLIENT"], PIPELINE_BLOB_NAME, PIPELINE_VERSION)
-        cipher_suite = Fernet(FERNET_KEY)
+        result_json = await azure_storage_api.get_pipeline_info(app.config["BLOB_CLIENT"], pipeline_blob_name, pipeline_version)
except (azure_storage_api.AzureAPIErrors) as error:
print(error)
raise ServerError("server errror: could not retrieve the pipelines") from error

models = ()
for model in result_json.get("models"):
m = Model(
request_function.get(model.get("api_call_function")),
request_function.get(model.get("endpoint_name")),
model.get("model_name"),
            # To protect sensitive data (API key and model endpoint), we encrypt it when
# it's pushed into the blob storage. Once we retrieve the data here in the
@@ -461,38 +522,5 @@ async def get_pipelines():
return result_json.get("pipelines")


-@app.before_serving
-async def before_serving():
-    try:
-        # Check: do environment variables exist?
-        if connection_string is None:
-            raise ServerError("Missing environment variable: NACHET_AZURE_STORAGE_CONNECTION_STRING")
-
-        if FERNET_KEY is None:
-            raise ServerError("Missing environment variable: FERNET_KEY")
-
-        # Check: are environment variables correct?
-        if not bool(re.match(connection_string_regex, connection_string)):
-            raise ServerError("Incorrect environment variable: NACHET_AZURE_STORAGE_CONNECTION_STRING")
-
-        CACHE["seeds"] = await fetch_json(NACHET_DATA, "seeds", "seeds/all.json")
-        CACHE["endpoints"] = await get_pipelines()
-
-        print(
-            f"""Server start with current configuration:\n
-            date: {date.today()}
-            file version of pipelines: {PIPELINE_VERSION}
-            pipelines: {[pipeline for pipeline in CACHE["pipelines"].keys()]}\n
-            """
-        ) #TODO Transform into logging
-
-    except ServerError as e:
-        print(e)
-        raise
-
-    except Exception as e:
-        print(e)
-        raise ServerError("Failed to retrieve data from the repository")

if __name__ == "__main__":
app.run(debug=True, host="0.0.0.0", port=8080)
5 changes: 4 additions & 1 deletion azure_storage/azure_storage_api.py
@@ -72,7 +72,10 @@ async def generate_hash(image):

async def get_blob_client(connection_string: str):
"""
-    given a connection string, returns the blob client object
+    given a connection string and a container name, mounts the container and
+    returns the container client as an object that can be used in other
+    functions. if a specified container doesn't exist, it creates one with the
+    provided uuid, if create_container is True
"""
try:
blob_service_client = BlobServiceClient.from_connection_string(
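For reference, `get_blob_client()` wraps the azure-storage-blob factory method shown in the hunk above (whether the sync or aio client is imported is not visible in this diff; the sync form is sketched). The connection string is a placeholder, not a working credential:

```python
# Sketch of the SDK call get_blob_client() wraps; placeholder credentials.
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=example;"
    "AccountKey=...;EndpointSuffix=core.windows.net"
)
```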
7 changes: 3 additions & 4 deletions docs/nachet-inference-documentation.md
@@ -242,7 +242,7 @@ the `CACHE["endpoint"]` variable. This is the variable that feeds the `models`
information and metadata to the frontend.

```python
-async def get_pipelines():
+async def get_pipelines(connection_string, pipeline_blob_name, pipeline_version, cipher_suite):
"""
Retrieves the pipelines from the Azure storage API.

@@ -251,9 +251,8 @@
"""
try:
app.config["BLOB_CLIENT"] = await azure_storage_api.get_blob_client(connection_string)
-        result_json = await azure_storage_api.get_pipeline_info(app.config["BLOB_CLIENT"], PIPELINE_BLOB_NAME, PIPELINE_VERSION)
-        cipher_suite = Fernet(FERNET_KEY)
-    except (ConnectionStringError, PipelineNotFoundError) as error:
+        result_json = await azure_storage_api.get_pipeline_info(app.config["BLOB_CLIENT"], pipeline_blob_name, pipeline_version)
+    except (azure_storage_api.AzureAPIErrors) as error:
print(error)
raise ServerError("server errror: could not retrieve the pipelines") from error
