From 9fc77923ff0392e18cbcebe9b6ce9a8284d939b1 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Thu, 26 Sep 2024 12:21:11 +0200 Subject: [PATCH 01/19] fix typos --- ammico/notebooks/DemoNotebook_ammico.ipynb | 18 +++++++++--------- pyproject.toml | 1 + 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/ammico/notebooks/DemoNotebook_ammico.ipynb b/ammico/notebooks/DemoNotebook_ammico.ipynb index e2d375e5..c2ff55ff 100644 --- a/ammico/notebooks/DemoNotebook_ammico.ipynb +++ b/ammico/notebooks/DemoNotebook_ammico.ipynb @@ -155,7 +155,7 @@ "| `limit` | `int` | maximum number of files to read (defaults to `20`, for all images set to `None` or `-1`) |\n", "| `random_seed` | `str` | the random seed for shuffling the images; applies when only a few images are read and the selection should be preserved (defaults to `None`) |\n", - "The `find_files` function returns a nested dict that contains the file ids and the paths to the files and is empty otherwise. This dict is filled step by step with more data as each detector class is run on the data (see below).\n", + "The `find_files` function returns a nested dictionary that contains the file ids and the paths to the files and is empty otherwise. This dictionary is filled step by step with more data as each detector class is run on the data (see below).\n", "\n", "If you downloaded the test dataset above, you can directly provide the path you already set for the test directory, `data_path`. The below cell is already set up for the test dataset.\n", "\n", @@ -187,9 +187,9 @@ "\n", "If you want to run an analysis using the EmotionDetector detector type, you first have to respond to an ethical disclosure statement. This disclosure statement ensures that you only use the full capabilities of the EmotionDetector after you have been made aware of its shortcomings.\n", "\n", - "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this envorinment variable, add it to your shell via your `.profile` or `.bashr` file.\n", + "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashrc` file.\n", - "If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification dependend on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected." + "If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification depending on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected."
] }, { @@ -722,7 +722,7 @@ "| output key | output type | output value |\n", "| ---------- | ----------- | ------------ |\n", "| `const_image_summary` | `str` | when `analysis_type=\"summary\"` or `\"summary_and_questions\"`, constant image caption (does not change upon re-running the analysis for the same model) |\n", - "| `3_non-deterministic_summary` | `list[str]` | when `analysis_type=\"summary\"` or s`ummary_and_questions`, three different captions generated with different random seeds |\n", + "| `3_non-deterministic_summary` | `list[str]` | when `analysis_type=\"summary\"` or `summary_and_questions`, three different captions generated with different random seeds |\n", "| *a user-defined input question* | `str` | when `analysis_type=\"questions\"` or `summary_and_questions`, the answer to the user-defined input question | \n" ] }, @@ -837,7 +837,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can also ask sequential questions if you pass the argument `cosequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. " + "You can also ask sequential questions if you pass the argument `consequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. " ] }, { @@ -1103,7 +1103,7 @@ "You can filter your results in 3 different ways:\n", "- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n", "- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n", - "- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35." + "- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_similarity_value - best_similarity_value_in_current_search)/best_similarity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35." ] }, { @@ -1227,7 +1227,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images querys, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images." + "Then using the same output function you can add the `itm=True` argument to output the new image order. 
Remember that for images queries, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images." ] }, { @@ -1252,7 +1252,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Convert the dictionary of dictionarys into a dictionary with lists:" + "Convert the dictionary of dictionaries into a dictionary with lists:" ] }, { @@ -1406,7 +1406,7 @@ ], "metadata": { "kernelspec": { - "display_name": "ammico", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, diff --git a/pyproject.toml b/pyproject.toml index e615a0a3..6d3fd33b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -32,6 +32,7 @@ dependencies = [ "importlib_metadata", "importlib_resources", "ipython", + "jupyter", "jupyter_dash", "matplotlib", "numpy<=1.23.4", From 3acaed162ab1524a91dac9d3195f6e2e116ac4b7 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Thu, 26 Sep 2024 12:54:04 +0200 Subject: [PATCH 02/19] add buttons for google colab everywhere --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 3dfa5dc4..fca8af46 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ ![codecov](https://img.shields.io/codecov/c/github/ssciwr/AMMICO) ![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=ssciwr_ammico&metric=alert_status) ![Language](https://img.shields.io/github/languages/top/ssciwr/AMMICO) +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/DemoNotebook_ammico.ipynb) This package extracts data from images such as social media posts that contain an image part and a text part. The analysis can generate a very large number of features, depending on the user input. See [our paper](https://dx.doi.org/10.31235/osf.io/v8txj) for a more in-depth description. @@ -104,14 +105,14 @@ Be careful, it requires around 7 GB of disk space. ## Usage -The main demonstration notebook can be found in the `notebooks` folder and also on [google colab](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/DemoNotebook_ammico.ipynb). +The main demonstration notebook can be found in the `notebooks` folder and also on google colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/DemoNotebook_ammico.ipynb). There are further sample notebooks in the `notebooks` folder for the more experimental features: 1. Topic analysis: Use the notebook `get-text-from-image.ipynb` to analyse the topics of the extracted text.\ -**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/get-text-from-image.ipynb)** +**You can run this notebook on google colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/get-text-from-image.ipynb)** Place the data files and google cloud vision API key in your google drive to access the data. 1. To crop social media posts use the `cropposts.ipynb` notebook.
-**You can run this notebook on google colab: [Here](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/cropposts.ipynb)** +**You can run this notebook on google colab: [![Open In Colab](https://colab.research.google.com/github/ssciwr/ammico/blob/main/ammico/notebooks/cropposts.ipynb)** ## Features ### Text extraction From 0564cfcfcccfe1acb9c977b9c2021ce1388938f4 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 10:29:31 +0200 Subject: [PATCH 03/19] update readme, separate out FAQ --- FAQ.md | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 101 ++---------------------------------------------------- 2 files changed, 101 insertions(+), 99 deletions(-) create mode 100644 FAQ.md diff --git a/FAQ.md b/FAQ.md new file mode 100644 index 00000000..c58377fe --- /dev/null +++ b/FAQ.md @@ -0,0 +1,99 @@ +# FAQ + +## Compatibility problems solving + +Some ammico components require `tensorflow` (e.g. Emotion detector), some `pytorch` (e.g. Summary detector). Sometimes there are compatibility problems between these two frameworks. To avoid these problems on your machines, you can prepare proper environment before installing the package (you need conda on your machine): + +### 1. First, install tensorflow (https://www.tensorflow.org/install/pip) +- create a new environment with python and activate it + + ```conda create -n ammico_env python=3.10``` + + ```conda activate ammico_env``` +- install cudatoolkit from conda-forge + + ``` conda install -c conda-forge cudatoolkit=11.8.0``` +- install nvidia-cudnn-cu11 from pip + + ```python -m pip install nvidia-cudnn-cu11==8.6.0.163``` +- add script that runs when conda environment `ammico_env` is activated to put the right libraries on your LD_LIBRARY_PATH + + ``` + mkdir -p $CONDA_PREFIX/etc/conda/activate.d + echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh + echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh + source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh + ``` +- deactivate and re-activate conda environment to call script above + + ```conda deactivate``` + + ```conda activate ammico_env ``` + +- install tensorflow + + ```python -m pip install tensorflow==2.12.1``` + +### 2. Second, install pytorch + +- install pytorch for same cuda version as above + + ```python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118``` + +### 3. After we prepared right environment we can install the ```ammico``` package + +- ```python -m pip install ammico``` + +It is done. + +### Micromamba +If you are using micromamba you can prepare environment with just one command: + +```micromamba create --no-channel-priority -c nvidia -c pytorch -c conda-forge -n ammico_env "python=3.10" pytorch torchvision torchaudio pytorch-cuda "tensorflow-gpu<=2.12.3" "numpy<=1.23.4"``` + +### Windows + +To make pycocotools work on Windows OS you may need to install `vs_BuildTools.exe` from https://visualstudio.microsoft.com/visual-cpp-build-tools/ and choose following elements: +- `Visual Studio extension development` +- `MSVC v143 - VS 2022 C++ x64/x86 build tools` +- `Windows 11 SDK` for Windows 11 (or `Windows 10 SDK` for Windows 10) + +Be careful, it requires around 7 GB of disk space. 
+ +![Screenshot 2023-06-01 165712](https://github.com/ssciwr/AMMICO/assets/8105097/3dfb302f-c390-46a7-a700-4e044f56c30f) + +## What happens to the images that are sent to google Cloud Vision? + +You have to accept the privacy statement of ammico to run this type of analysis. + +According to the [google Vision API](https://cloud.google.com/vision/docs/data-usage), the images that are uploaded and analysed are not stored and not shared with third parties: + +> We won't make the content that you send available to the public. We won't share the content with any third party. The content is only used by Google as necessary to provide the Vision API service. Vision API complies with the Cloud Data Processing Addendum. + +> For online (immediate response) operations (`BatchAnnotateImages` and `BatchAnnotateFiles`), the image data is processed in memory and not persisted to disk. +For asynchronous offline batch operations (`AsyncBatchAnnotateImages` and `AsyncBatchAnnotateFiles`), we must store that image for a short period of time in order to perform the analysis and return the results to you. The stored image is typically deleted right after the processing is done, with a failsafe Time to live (TTL) of a few hours. +Google also temporarily logs some metadata about your Vision API requests (such as the time the request was received and the size of the request) to improve our service and combat abuse. + +## What happens to the text that is sent to google Translate? + +You have to accept the privacy statement of ammico to run this type of analysis. + +According to [google Translate](https://cloud.google.com/translate/data-usage), the data is not stored after processing and not made available to third parties: + +> We will not make the content of the text that you send available to the public. We will not share the content with any third party. The content of the text is only used by Google as necessary to provide the Cloud Translation API service. Cloud Translation API complies with the Cloud Data Processing Addendum. + +> When you send text to Cloud Translation API, text is held briefly in-memory in order to perform the translation and return the results to you. + +## What happens if I don't have internet access - can I still use ammico? + +Some features of ammico require internet access; a general answer to this question is not possible, some services require an internet connection, others can be used offline: + +- Text extraction: To extract text from images, and translate the text, the data needs to be processed by google Cloud Vision and google Translate, which run in the cloud. Without internet access, text extraction and translation is not possible. +- Image summary and query: After an initial download of the models, the `summary` module does not require an internet connection. +- Facial expressions: After an initial download of the models, the `faces` module does not require an internet connection. +- Multimodal search: After an initial download of the models, the `multimodal_search` module does not require an internet connection. +- Color analysis: The `color` module does not require an internet connection. + +## Why don't I get probabilistic assessments of age, gender and race when running the Emotion Detector? +Due to well documented biases in the detection of minorities with computer vision tools, and to the ethical implications of such detection, these parts of the tool are not directly made available to users.
To access these capabilities, users must first agree with a ethical disclosure statement that reads: "The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and race of the detected faces. Such assessments may not reflect how the individuals identified by the tool view themselves. Additionally, the classification is carried out in simplistic categories and contains only the most basic classes, for example “male” and “female” for gender. By continuing to use the tool, you certify that you understand the ethical implications such assessments have for the interpretation of the results." +This disclosure statement is included as a separate line of code early in the flow of the Emotion Detector. Once the user has agreed with the statement, further data analyses will also include these assessments. diff --git a/README.md b/README.md index fca8af46..dda4f78f 100644 --- a/README.md +++ b/README.md @@ -39,69 +39,7 @@ The `AMMICO` package can be installed using pip: ``` pip install ammico ``` -This will install the package and its dependencies locally. If after installation you get some errors when running some modules, please follow the instructions below. - -## Compatibility problems solving - -Some ammico components require `tensorflow` (e.g. Emotion detector), some `pytorch` (e.g. Summary detector). Sometimes there are compatibility problems between these two frameworks. To avoid these problems on your machines, you can prepare proper environment before installing the package (you need conda on your machine): - -### 1. First, install tensorflow (https://www.tensorflow.org/install/pip) -- create a new environment with python and activate it - - ```conda create -n ammico_env python=3.10``` - - ```conda activate ammico_env``` -- install cudatoolkit from conda-forge - - ``` conda install -c conda-forge cudatoolkit=11.8.0``` -- install nvidia-cudnn-cu11 from pip - - ```python -m pip install nvidia-cudnn-cu11==8.6.0.163``` -- add script that runs when conda environment `ammico_env` is activated to put the right libraries on your LD_LIBRARY_PATH - - ``` - mkdir -p $CONDA_PREFIX/etc/conda/activate.d - echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh - echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh - source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh - ``` -- deactivate and re-activate conda environment to call script above - - ```conda deactivate``` - - ```conda activate ammico_env ``` - -- install tensorflow - - ```python -m pip install tensorflow==2.12.1``` - -### 2. Second, install pytorch - -- install pytorch for same cuda version as above - - ```python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118``` - -### 3. After we prepared right environment we can install the ```ammico``` package - -- ```python -m pip install ammico``` - -It is done. 
- -### Micromamba -If you are using micromamba you can prepare environment with just one command: - -```micromamba create --no-channel-priority -c nvidia -c pytorch -c conda-forge -n ammico_env "python=3.10" pytorch torchvision torchaudio pytorch-cuda "tensorflow-gpu<=2.12.3" "numpy<=1.23.4"``` - -### Windows - -To make pycocotools work on Windows OS you may need to install `vs_BuildTools.exe` from https://visualstudio.microsoft.com/visual-cpp-build-tools/ and choose following elements: -- `Visual Studio extension development` -- `MSVC v143 - VS 2022 C++ x64/x86 build tools` -- `Windows 11 SDK` for Windows 11 (or `Windows 10 SDK` for Windows 10) - -Be careful, it requires around 7 GB of disk space. - -![Screenshot 2023-06-01 165712](https://github.com/ssciwr/AMMICO/assets/8105097/3dfb302f-c390-46a7-a700-4e044f56c30f) +This will install the package and its dependencies locally. If after installation you get some errors when running some modules, please follow the instructions in the [FAQ](FAQ.md). ## Usage @@ -131,7 +69,7 @@ The [Hugging Face transformers library](https://huggingface.co/) is used to perf ### Content extraction -The image content ("caption") is extracted using the [LAVIS](https://github.com/salesforce/LAVIS) library. This library enables vision intelligence extraction using several state-of-the-art models, depending on the task. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the images that match the query. +The image content ("caption") is extracted using the [LAVIS](https://github.com/salesforce/LAVIS) library. This library enables vision intelligence extraction using several state-of-the-art models such as BLIP and BLIP2, depending on the task and user selection. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the images that match the query. ### Emotion recognition @@ -144,38 +82,3 @@ Color detection is carried out using [colorgram.py](https://github.com/obskyr/co ### Cropping of posts Social media posts can automatically be cropped to remove further comments on the page and restrict the textual content to the first comment only. - - -# FAQ - -## What happens to the images that are sent to google Cloud Vision? - -According to the [google Vision API](https://cloud.google.com/vision/docs/data-usage), the images that are uploaded and analysed are not stored and not shared with third parties: - -> We won't make the content that you send available to the public. We won't share the content with any third party. The content is only used by Google as necessary to provide the Vision API service. Vision API complies with the Cloud Data Processing Addendum. - -> For online (immediate response) operations (`BatchAnnotateImages` and `BatchAnnotateFiles`), the image data is processed in memory and not persisted to disk. -For asynchronous offline batch operations (`AsyncBatchAnnotateImages` and `AsyncBatchAnnotateFiles`), we must store that image for a short period of time in order to perform the analysis and return the results to you. 
The stored image is typically deleted right after the processing is done, with a failsafe Time to live (TTL) of a few hours. -Google also temporarily logs some metadata about your Vision API requests (such as the time the request was received and the size of the request) to improve our service and combat abuse. - -## What happens to the text that is sent to google Translate? - -According to [google Translate](https://cloud.google.com/translate/data-usage), the data is not stored after processing and not made available to third parties: - -> We will not make the content of the text that you send available to the public. We will not share the content with any third party. The content of the text is only used by Google as necessary to provide the Cloud Translation API service. Cloud Translation API complies with the Cloud Data Processing Addendum. - -> When you send text to Cloud Translation API, text is held briefly in-memory in order to perform the translation and return the results to you. - -## What happens if I don't have internet access - can I still use ammico? - -Some features of ammico require internet access; a general answer to this question is not possible, some services require an internet connection, others can be used offline: - -- Text extraction: To extract text from images, and translate the text, the data needs to be processed by google Cloud Vision and google Translate, which run in the cloud. Without internet access, text extraction and translation is not possible. -- Image summary and query: After an initial download of the models, the `summary` module does not require an internet connection. -- Facial expressions: After an initial download of the models, the `faces` module does not require an internet connection. -- Multimodal search: After an initial download of the models, the `multimodal_search` module does not require an internet connection. -- Color analysis: The `color` module does not require an internet connection. - -## Why don't I get probabilistic assessments of age, gender and race when running the Emotion Detector? -Due to well documented biases in the detection of minorities with computer vision tools, and to the ethical implications of such detection, these parts of the tool are not directly made available to users. To access these capabilities, users must first agree with a ethical disclosure statement that reads: "The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and race of the detected faces. Such assessments may not reflect how the individuals identified by the tool view themselves. Additionally, the classification is carried out in simplistic categories and contains only the most basic classes, for example “male” and “female” for gender. By continuing to use the tool, you certify that you understand the ethical implications such assessments have for the interpretation of the results." -This disclosure statement is included as a separate line of code early in the flow of the Emotion Detector. Once the user has agreed with the statement, further data analyses will also include these assessments. 
From c1fe64da13364a7efe19eb542559cb4670f5e7b7 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 12:40:42 +0200 Subject: [PATCH 04/19] add privacy disclosure statement --- ammico/__init__.py | 3 +- ammico/display.py | 26 +++++- ammico/faces.py | 30 ++++--- ammico/notebooks/DemoNotebook_ammico.ipynb | 32 +++++++- ammico/test/test_display.py | 4 + ammico/test/test_text.py | 96 ++++++++++++++-------- ammico/text.py | 73 ++++++++++++++++ 7 files changed, 212 insertions(+), 52 deletions(-) diff --git a/ammico/__init__.py b/ammico/__init__.py index 9ec19fc2..ead8dad6 100644 --- a/ammico/__init__.py +++ b/ammico/__init__.py @@ -8,7 +8,7 @@ from ammico.faces import EmotionDetector, ethical_disclosure from ammico.multimodal_search import MultimodalSearch from ammico.summary import SummaryDetector -from ammico.text import TextDetector, TextAnalyzer, PostprocessText +from ammico.text import TextDetector, TextAnalyzer, PostprocessText, privacy_disclosure from ammico.utils import find_files, get_dataframe # Export the version defined in project metadata @@ -28,4 +28,5 @@ "find_files", "get_dataframe", "ethical_disclosure", + "privacy_disclosure", ] diff --git a/ammico/display.py b/ammico/display.py index e8e73e53..2f359db2 100644 --- a/ammico/display.py +++ b/ammico/display.py @@ -99,6 +99,7 @@ def __init__(self, mydict: dict) -> None: State("setting_Text_analyse_text", "value"), State("setting_Text_model_names", "value"), State("setting_Text_revision_numbers", "value"), + State("setting_privacy_env_var", "value"), State("setting_Emotion_emotion_threshold", "value"), State("setting_Emotion_race_threshold", "value"), State("setting_Emotion_gender_threshold", "value"), @@ -171,7 +172,24 @@ def _create_setting_layout(self): id="setting_Text_analyse_text", style={"margin-bottom": "10px"}, ), - ), # row 1 + ), + # row 1 + dbc.Row( + dbc.Col( + [ + html.P( + "Privacy disclosure acceptance environment variable" + ), + dcc.Input( + type="text", + value="PRIVACY_AMMICO", + id="setting_privacy_env_var", + style={"width": "100%"}, + ), + ], + align="start", + ), + ), # text row 2 dbc.Row( [ @@ -469,6 +487,7 @@ def _right_output_analysis( settings_text_analyse_text: list, settings_text_model_names: str, settings_text_revision_numbers: str, + setting_privacy_env_var: str, setting_emotion_emotion_threshold: int, setting_emotion_race_threshold: int, setting_emotion_gender_threshold: int, @@ -521,6 +540,11 @@ def _right_output_analysis( if (settings_text_revision_numbers is not None) else None ), + accept_privacy=( + setting_privacy_env_var + if setting_privacy_env_var + else "PRIVACY_AMMICO" + ), ) elif detector_value == "EmotionDetector": detector_class = identify_function( diff --git a/ammico/faces.py b/ammico/faces.py index 30b5b17e..d6990c10 100644 --- a/ammico/faces.py +++ b/ammico/faces.py @@ -79,6 +79,19 @@ def _processor(fname, action, pooch): ), ) +ETHICAL_STATEMENT = """This analysis uses the DeepFace and RetinaFace libraries. + DeepFace and RetinaFace provide wrappers to trained models in face recognition and + emotion detection. Age, gender and race / ethnicity models were trained + on the backbone of VGG-Face with transfer learning. + ETHICAL DISCLOSURE STATEMENT: + The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and + race of the detected faces. Such assessments may not reflect how the individuals + identify. 
Additionally, the classification is carried + out in simplistic categories and contains only the most basic classes, for example + “male” and “female” for gender. By continuing to use the tool, you certify that you + understand the ethical implications such assessments have for the interpretation of + the results.""" + def ethical_disclosure(accept_disclosure: str = "DISCLOSURE_AMMICO"): """ @@ -106,22 +119,7 @@ def _ask_for_disclosure_acceptance(accept_disclosure: str = "DISCLOSURE_AMMICO") """ Asks the user to accept the disclosure. """ - print("This analysis uses the DeepFace and RetinaFace libraries.") - print( - """ - DeepFace and RetinaFace provide wrappers to trained models in face recognition and - emotion detection. Age, gender and race / ethnicity models were trained - on the backbone of VGG-Face with transfer learning. - ETHICAL DISCLOSURE STATEMENT: - The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and - race of the detected faces. Such assessments may not reflect how the individuals - identified by the tool view themselves. Additionally, the classification is carried - out in simplistic categories and contains only the most basic classes, for example - “male” and “female” for gender. By continuing to use the tool, you certify that you - understand the ethical implications such assessments have for the interpretation of - the results. - """ - ) + print(ETHICAL_STATEMENT) answer = input("Do you accept the disclosure? (yes/no): ") answer = answer.lower().strip() if answer == "yes": diff --git a/ammico/notebooks/DemoNotebook_ammico.ipynb b/ammico/notebooks/DemoNotebook_ammico.ipynb index c2ff55ff..ee8bde72 100644 --- a/ammico/notebooks/DemoNotebook_ammico.ipynb +++ b/ammico/notebooks/DemoNotebook_ammico.ipynb @@ -170,7 +170,7 @@ "source": [ "image_dict = ammico.find_files(\n", " # path=\"/content/drive/MyDrive/misinformation-data/\",\n", - " path=str(data_path),\n", + " path=\"data-test/\",\n", " limit=15,\n", ")" ] @@ -207,6 +207,34 @@ "_ = ammico.ethical_disclosure(accept_disclosure=accept_disclosure)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Privacy disclosure statement\n", + "\n", + "If you want to run an analysis using the TextDetector detector type, you first have to respond to a privacy disclosure statement. This disclosure statement ensures that you are aware that your data will be sent to google cloud vision servers for analysis.\n", + "\n", + "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_privacy`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_privacy, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashrc` file.\n", + "\n", + "If the privacy disclosure statement is accepted, the TextDetector will perform the text extraction, translation and if selected, analysis. If the privacy disclosure is rejected, no text processing will be carried out and you cannot use the TextDetector."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# respond to the privacy disclosure statement\n", + "# this will set an environment variable for you\n", + "# if you do not want to re-accept the privacy disclosure every time, you can set this environment variable in your shell\n", + "# to re-set the environment variable, uncomment the below line\n", + "accept_privacy = \"PRIVACY_AMMICO\"\n", + "os.environ.pop(accept_privacy, None)\n", + "_ = ammico.privacy_disclosure(accept_privacy=accept_privacy)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -1424,5 +1452,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/ammico/test/test_display.py b/ammico/test/test_display.py index 75baa5f2..b43d118c 100644 --- a/ammico/test/test_display.py +++ b/ammico/test/test_display.py @@ -43,6 +43,7 @@ def test_AnalysisExplorer(get_AE, get_options): def test_right_output_analysis_summary(get_AE, get_options, monkeypatch): + monkeypatch.setenv("SOME_VAR", "True") monkeypatch.setenv("OTHER_VAR", "True") get_AE._right_output_analysis( 2, @@ -52,6 +53,7 @@ def test_right_output_analysis_summary(get_AE, get_options, monkeypatch): True, None, None, + "SOME_VAR", 50, 50, 50, @@ -64,6 +66,7 @@ def test_right_output_analysis_summary(get_AE, get_options, monkeypatch): def test_right_output_analysis_emotions(get_AE, get_options, monkeypatch): + monkeypatch.setenv("SOME_VAR", "True") monkeypatch.setenv("OTHER_VAR", "True") get_AE._right_output_analysis( 2, @@ -73,6 +76,7 @@ def test_right_output_analysis_emotions(get_AE, get_options, monkeypatch): True, None, None, + "SOME_VAR", 50, 50, 50, diff --git a/ammico/test/test_text.py b/ammico/test/test_text.py index 99714959..67ae0c76 100644 --- a/ammico/test/test_text.py +++ b/ammico/test/test_text.py @@ -24,9 +24,34 @@ def set_testdict(get_path): LANGUAGES = ["de", "en", "en"] -def test_TextDetector(set_testdict): +@pytest.fixture +def accepted(monkeypatch): + monkeypatch.setenv("OTHER_VAR", "True") + tt.TextDetector({}, accept_privacy="OTHER_VAR") + return "OTHER_VAR" + + +def test_privacy_statement(monkeypatch): + # test pre-set variables: privacy + monkeypatch.delattr("builtins.input", raising=False) + monkeypatch.setenv("OTHER_VAR", "something") + with pytest.raises(ValueError): + tt.TextDetector({}, accept_privacy="OTHER_VAR") + monkeypatch.setenv("OTHER_VAR", "False") + with pytest.raises(ValueError): + tt.TextDetector({}, accept_privacy="OTHER_VAR") + with pytest.raises(ValueError): + tt.TextDetector({}, accept_privacy="OTHER_VAR").get_text_from_image() + with pytest.raises(ValueError): + tt.TextDetector({}, accept_privacy="OTHER_VAR").translate_text() + monkeypatch.setenv("OTHER_VAR", "True") + pd = tt.TextDetector({}, accept_privacy="OTHER_VAR") + assert pd.accepted + + +def test_TextDetector(set_testdict, accepted): for item in set_testdict: - test_obj = tt.TextDetector(set_testdict[item]) + test_obj = tt.TextDetector(set_testdict[item], accept_privacy=accepted) assert not test_obj.analyse_text assert not test_obj.skip_extraction assert test_obj.subdict["filename"] == set_testdict[item]["filename"] @@ -39,17 +64,21 @@ def test_TextDetector(set_testdict): assert test_obj.revision_summary == "a4f8f3e" assert test_obj.revision_sentiment == "af0f99b" assert test_obj.revision_ner == "f2482bf" - test_obj = tt.TextDetector({}, analyse_text=True, skip_extraction=True) + test_obj = tt.TextDetector( + {}, analyse_text=True, skip_extraction=True, 
accept_privacy=accepted + ) assert test_obj.analyse_text assert test_obj.skip_extraction with pytest.raises(ValueError): - tt.TextDetector({}, analyse_text=1.0) + tt.TextDetector({}, analyse_text=1.0, accept_privacy=accepted) with pytest.raises(ValueError): - tt.TextDetector({}, skip_extraction=1.0) + tt.TextDetector({}, skip_extraction=1.0, accept_privacy=accepted) -def test_run_spacy(set_testdict, get_path): - test_obj = tt.TextDetector(set_testdict["IMG_3755"], analyse_text=True) +def test_run_spacy(set_testdict, get_path, accepted): + test_obj = tt.TextDetector( + set_testdict["IMG_3755"], analyse_text=True, accept_privacy=accepted + ) ref_file = get_path + "text_IMG_3755.txt" with open(ref_file, "r") as file: reference_text = file.read() @@ -58,18 +87,18 @@ def test_run_spacy(set_testdict, get_path): assert isinstance(test_obj.doc, spacy.tokens.doc.Doc) -def test_clean_text(set_testdict): +def test_clean_text(set_testdict, accepted): nlp = spacy.load("en_core_web_md") doc = nlp("I like cats and fjejg") - test_obj = tt.TextDetector(set_testdict["IMG_3755"]) + test_obj = tt.TextDetector(set_testdict["IMG_3755"], accept_privacy=accepted) test_obj.doc = doc test_obj.clean_text() result = "I like cats and" assert test_obj.subdict["text_clean"] == result -def test_init_revision_numbers_and_models(): - test_obj = tt.TextDetector({}) +def test_init_revision_numbers_and_models(accepted): + test_obj = tt.TextDetector({}, accept_privacy=accepted) # check the default options assert test_obj.model_summary == "sshleifer/distilbart-cnn-12-6" assert test_obj.model_sentiment == "distilbert-base-uncased-finetuned-sst-2-english" @@ -79,7 +108,7 @@ def test_init_revision_numbers_and_models(): assert test_obj.revision_ner == "f2482bf" # provide non-default options model_names = ["facebook/bart-large-cnn", None, None] - test_obj = tt.TextDetector({}, model_names=model_names) + test_obj = tt.TextDetector({}, model_names=model_names, accept_privacy=accepted) assert test_obj.model_summary == "facebook/bart-large-cnn" assert test_obj.model_sentiment == "distilbert-base-uncased-finetuned-sst-2-english" assert test_obj.model_ner == "dbmdz/bert-large-cased-finetuned-conll03-english" @@ -91,6 +120,7 @@ def test_init_revision_numbers_and_models(): {}, model_names=model_names, revision_numbers=revision_numbers, + accept_privacy=accepted, ) assert test_obj.model_summary == "facebook/bart-large-cnn" assert test_obj.model_sentiment == "distilbert-base-uncased-finetuned-sst-2-english" @@ -100,30 +130,32 @@ def test_init_revision_numbers_and_models(): assert test_obj.revision_ner == "f2482bf" # now test the exceptions with pytest.raises(ValueError): - tt.TextDetector({}, analyse_text=1.0) + tt.TextDetector({}, analyse_text=1.0, accept_privacy=accepted) with pytest.raises(ValueError): - tt.TextDetector({}, model_names=1.0) + tt.TextDetector({}, model_names=1.0, accept_privacy=accepted) with pytest.raises(ValueError): - tt.TextDetector({}, revision_numbers=1.0) + tt.TextDetector({}, revision_numbers=1.0, accept_privacy=accepted) with pytest.raises(ValueError): - tt.TextDetector({}, model_names=["something"]) + tt.TextDetector({}, model_names=["something"], accept_privacy=accepted) with pytest.raises(ValueError): - tt.TextDetector({}, revision_numbers=["something"]) + tt.TextDetector({}, revision_numbers=["something"], accept_privacy=accepted) @pytest.mark.gcv -def test_analyse_image(set_testdict, set_environ): +def test_analyse_image(set_testdict, set_environ, accepted): for item in set_testdict: - test_obj = 
tt.TextDetector(set_testdict[item]) + test_obj = tt.TextDetector(set_testdict[item], accept_privacy=accepted) test_obj.analyse_image() - test_obj = tt.TextDetector(set_testdict[item], analyse_text=True) + test_obj = tt.TextDetector( + set_testdict[item], analyse_text=True, accept_privacy=accepted + ) test_obj.analyse_image() @pytest.mark.gcv -def test_get_text_from_image(set_testdict, get_path, set_environ): +def test_get_text_from_image(set_testdict, get_path, set_environ, accepted): for item in set_testdict: - test_obj = tt.TextDetector(set_testdict[item]) + test_obj = tt.TextDetector(set_testdict[item], accept_privacy=accepted) test_obj.get_text_from_image() ref_file = get_path + "text_" + item + ".txt" with open(ref_file, "r", encoding="utf8") as file: @@ -131,9 +163,9 @@ def test_get_text_from_image(set_testdict, get_path, set_environ): assert test_obj.subdict["text"].replace("\n", " ") == reference_text -def test_translate_text(set_testdict, get_path): +def test_translate_text(set_testdict, get_path, accepted): for item, lang in zip(set_testdict, LANGUAGES): - test_obj = tt.TextDetector(set_testdict[item]) + test_obj = tt.TextDetector(set_testdict[item], accept_privacy=accepted) ref_file = get_path + "text_" + item + ".txt" trans_file = get_path + "text_translated_" + item + ".txt" with open(ref_file, "r", encoding="utf8") as file: @@ -148,8 +180,8 @@ def test_translate_text(set_testdict, get_path): assert word in translated_text -def test_remove_linebreaks(): - test_obj = tt.TextDetector({}) +def test_remove_linebreaks(accepted): + test_obj = tt.TextDetector({}, accept_privacy=accepted) test_obj.subdict["text"] = "This is \n a test." test_obj.subdict["text_english"] = "This is \n another\n test." test_obj.remove_linebreaks() @@ -157,9 +189,9 @@ def test_remove_linebreaks(): assert test_obj.subdict["text_english"] == "This is another test." -def test_text_summary(get_path): +def test_text_summary(get_path, accepted): mydict = {} - test_obj = tt.TextDetector(mydict, analyse_text=True) + test_obj = tt.TextDetector(mydict, analyse_text=True, accept_privacy=accepted) ref_file = get_path + "example_summary.txt" with open(ref_file, "r", encoding="utf8") as file: reference_text = file.read() @@ -169,18 +201,18 @@ def test_text_summary(get_path): assert mydict["text_summary"] == reference_summary -def test_text_sentiment_transformers(): +def test_text_sentiment_transformers(accepted): mydict = {} - test_obj = tt.TextDetector(mydict, analyse_text=True) + test_obj = tt.TextDetector(mydict, analyse_text=True, accept_privacy=accepted) mydict["text_english"] = "I am happy that the CI is working again." test_obj.text_sentiment_transformers() assert mydict["sentiment"] == "POSITIVE" assert mydict["sentiment_score"] == pytest.approx(0.99, 0.02) -def test_text_ner(): +def test_text_ner(accepted): mydict = {} - test_obj = tt.TextDetector(mydict, analyse_text=True) + test_obj = tt.TextDetector(mydict, analyse_text=True, accept_privacy=accepted) mydict["text_english"] = "Bill Gates was born in Seattle." 
test_obj.text_ner() assert mydict["entity"] == ["Bill Gates", "Seattle"] diff --git a/ammico/text.py b/ammico/text.py index 277a6b13..139f245b 100644 --- a/ammico/text.py +++ b/ammico/text.py @@ -3,12 +3,68 @@ from googletrans import Translator import spacy import io +import os from ammico.utils import AnalysisMethod import grpc import pandas as pd from bertopic import BERTopic from transformers import pipeline +PRIVACY_STATEMENT = """PRIVACY STATEMENT: The Text Detector uses Google Services + for text extraction and translation, and requires a Google Cloud Vision API Key + to work. Instructions about how to get such a key are provided here: + https://ssciwr.github.io/AMMICO/build/html/notebooks/DemoNotebook_ + ammico.html#Step-0:-Create-and-set-a-Google-Cloud-Vision-Key. + Google's privacy policy can be read here: + https://policies.google.com/privacy. By continuing to use this Detector, + you agree to send the data you want analyzed to the Google servers for + extraction and translation. """ + + +def privacy_disclosure(accept_privacy: str = "PRIVACY_AMMICO"): + """ + Asks the user to accept the privacy statement. + + Args: + accept_privacy (str): The name of the disclosure variable (default: "PRIVACY_AMMICO"). + """ + if not os.environ.get(accept_privacy): + accepted = _ask_for_privacy_acceptance(accept_privacy) + elif os.environ.get(accept_privacy) == "False": + accepted = False + elif os.environ.get(accept_privacy) == "True": + accepted = True + else: + print( + "Could not determine privacy disclosure - skipping \ + text detection and translation." + ) + accepted = False + return accepted + + +def _ask_for_privacy_acceptance(accept_privacy: str = "PRIVACY_AMMICO"): + """ + Asks the user to accept the disclosure. + """ + print(PRIVACY_STATEMENT) + answer = input("Do you accept the privacy disclosure? (yes/no): ") + answer = answer.lower().strip() + if answer == "yes": + print("You have accepted the privacy disclosure.") + print("""Text detection and translation will be performed.""") + os.environ[accept_privacy] = "True" + accepted = True + elif answer == "no": + print("You have not accepted the privacy disclosure.") + print("No text detection and translation will be performed.") + os.environ[accept_privacy] = "False" + accepted = False + else: + print("Please answer with yes or no.") + accepted = _ask_for_privacy_acceptance() + return accepted + class TextDetector(AnalysisMethod): def __init__( @@ -18,6 +74,7 @@ def __init__( skip_extraction: bool = False, model_names: list = None, revision_numbers: list = None, + accept_privacy: str = "PRIVACY_AMMICO", ) -> None: """Init text detection class. @@ -41,6 +98,9 @@ def __init__( Defaults to None, except if the default models are used; then it defaults to "a4f8f3e" (summary, distilbart), "af0f99b" (sentiment, distilbert), "f2482bf" (NER, bert). + accept_privacy (str, optional): Environment variable to accept the privacy + statement for the Google Cloud processing of the data. Defaults to + "PRIVACY_AMMICO". """ super().__init__(subdict) # disable this for now @@ -48,6 +108,11 @@ def __init__( # the reason is that they are inconsistent depending on the selected # options, and also this may not be really necessary and rather restrictive # self.subdict.update(self.set_keys()) + self.accepted = privacy_disclosure(accept_privacy) + if not self.accepted: + raise ValueError( + "Privacy disclosure not accepted - skipping text detection." 
+ ) self.translator = Translator() if not isinstance(analyse_text, bool): raise ValueError("analyse_text needs to be set to true or false") @@ -186,6 +251,10 @@ def analyse_image(self) -> dict: def get_text_from_image(self): """Detect text on the image using Google Cloud Vision API.""" + if not self.accepted: + raise ValueError( + "Privacy disclosure not accepted - skipping text detection." + ) path = self.subdict["filename"] try: client = vision.ImageAnnotatorClient() @@ -221,6 +290,10 @@ def get_text_from_image(self): def translate_text(self): """Translate the detected text to English using the Translator object.""" + if not self.accepted: + raise ValueError( + "Privacy disclosure not accepted - skipping text translation." + ) translated = self.translator.translate(self.subdict["text"]) self.subdict["text_language"] = translated.src self.subdict["text_english"] = translated.text From 4a1003e7c4231bc9b7864bf21a9abc80a1298865 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 12:41:59 +0200 Subject: [PATCH 05/19] do not install using uv --- .github/workflows/ci.yml | 2 -- 1 file changed, 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 598f29ea..0aa19825 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -30,9 +30,7 @@ jobs: brew install ffmpeg - name: Install dependencies run: | - # python -m pip install uv pip install -e . - # uv pip install --system -e . - name: Run pytest test_colors run: | cd ammico From 76627e669e1a8af662223ca67acf626a59d3cca8 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 12:54:38 +0200 Subject: [PATCH 06/19] update docs notebook --- ammico/notebooks/DemoNotebook_ammico.ipynb | 4 +- .../notebooks/DemoNotebook_ammico.ipynb | 114 ++++++++++++++---- 2 files changed, 95 insertions(+), 23 deletions(-) diff --git a/ammico/notebooks/DemoNotebook_ammico.ipynb b/ammico/notebooks/DemoNotebook_ammico.ipynb index ee8bde72..90087f44 100644 --- a/ammico/notebooks/DemoNotebook_ammico.ipynb +++ b/ammico/notebooks/DemoNotebook_ammico.ipynb @@ -170,7 +170,7 @@ "source": [ "image_dict = ammico.find_files(\n", " # path=\"/content/drive/MyDrive/misinformation-data/\",\n", - " path=\"data-test/\",\n", + " path=str(data_path),\n", " limit=15,\n", ")" ] @@ -1434,7 +1434,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "ammico", "language": "python", "name": "python3" }, diff --git a/docs/source/notebooks/DemoNotebook_ammico.ipynb b/docs/source/notebooks/DemoNotebook_ammico.ipynb index fdfd71c2..e5cdc4b8 100644 --- a/docs/source/notebooks/DemoNotebook_ammico.ipynb +++ b/docs/source/notebooks/DemoNotebook_ammico.ipynb @@ -18,7 +18,10 @@ "metadata": {}, "outputs": [], "source": [ - "# if running on google colab\n", + "# if running on google colab\\\n", + "# PLEASE RUN THIS ONLY AS CPU RUNTIME\n", + "# for a GPU runtime, there are conflicts with pre-installed packages - \n", + "# you first need to uninstall them (prepare a clean environment with no pre-installs) and then install ammico\n", "# flake8-noqa-cell\n", "\n", "if \"google.colab\" in str(get_ipython()):\n", @@ -26,9 +29,10 @@ " # install setuptools\n", " # %pip install setuptools==61 -qqq\n", " # uninstall some pre-installed packages due to incompatibility\n", - " %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y -qqq\n", + " %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata 
torchtext orbax-checkpoint flex-y jax jaxlib -qqq\n", " # install ammico\n", " %pip install git+https://github.com/ssciwr/ammico.git -qqq\n", + " # install older version of jax to support transformers use of diffusers\n", " # mount google drive for data and API key\n", " from google.colab import drive\n", "\n", @@ -92,12 +96,12 @@ "outputs": [], "source": [ "import os\n", + "# jax also sometimes leads to problems on google colab\n", + "# if this is the case, try restarting the kernel and executing this \n", + "# and the above two code cells again\n", "import ammico\n", "# for displaying a progress bar\n", - "from tqdm import tqdm\n", - "# to get the reference data for text_dict\n", - "import importlib_resources\n", - "pkg = importlib_resources.files(\"ammico\")" + "from tqdm import tqdm" ] }, { @@ -151,7 +155,7 @@ "| `limit` | `int` | maximum number of files to read (defaults to `20`, for all images set to `None` or `-1`) |\n", "| `random_seed` | `str` | the random seed for shuffling the images; applies when only a few images are read and the selection should be preserved (defaults to `None`) |\n", "\n", - "The `find_files` function returns a nested dict that contains the file ids and the paths to the files and is empty otherwise. This dict is filled step by step with more data as each detector class is run on the data (see below).\n", + "The `find_files` function returns a nested dictionary that contains the file ids and the paths to the files and is empty otherwise. This dict is filled step by step with more data as each detector class is run on the data (see below).\n", "\n", "If you downloaded the test dataset above, you can directly provide the path you already set for the test directory, `data_path`. The below cell is already set up for the test dataset.\n", "\n", @@ -183,9 +187,9 @@ "\n", "If you want to run an analysis using the EmotionDetector detector type, you have first have to respond to an ethical disclosure statement. This disclosure statement ensures that you only use the full capabilities of the EmotionDetector after you have been made aware of its shortcomings.\n", "\n", - "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this envorinment variable, add it to your shell via your `.profile` or `.bashr` file.\n", + "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n", "\n", - "If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification dependend on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected." + "If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification depending on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected." 
] }, { @@ -203,6 +207,34 @@ "_ = ammico.ethical_disclosure(accept_disclosure=accept_disclosure)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Privacy disclosure statement\n", + "\n", + "If you want to run an analysis using the TextDetector detector type, you have first have to respond to a privacy disclosure statement. This disclosure statement ensures that you are aware that your data will be sent to google cloud vision servers for analysis.\n", + "\n", + "For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_privacy`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_privacy, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n", + "\n", + "If the privacy disclosure statement is accepted, the TextDetector will perform the text extraction, translation and if selected, analysis. If the privacy disclosure is rejected, no text processing will be carried out and you cannot use the TextDetector." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# respond to the privacy disclosure statement\n", + "# this will set an environment variable for you\n", + "# if you do not want to re-accept the privacy disclosure every time, you can set this environment variable in your shell\n", + "# to re-set the environment variable, uncomment the below line\n", + "accept_privacy = \"PRIVACY_AMMICO\"\n", + "# os.environ.pop(accept_privacy, None)\n", + "_ = ammico.privacy_disclosure(accept_privacy=accept_privacy)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -253,9 +285,17 @@ "metadata": {}, "outputs": [], "source": [ + "# set the thresholds for the emotion detection\n", + "emotion_threshold = 50 # this is the default value for the detection confidence\n", + "# the lowest possible value is 0\n", + "# the highest possible value is 100\n", + "race_threshold = 50\n", + "gender_threshold = 50\n", "for num, key in tqdm(enumerate(image_dict.keys()), total=len(image_dict)): # loop through all images\n", - " image_dict[key] = ammico.EmotionDetector(image_dict[key]).analyse_image() # analyse image with EmotionDetector and update dict\n", - " \n", + " image_dict[key] = ammico.EmotionDetector(image_dict[key],\n", + " emotion_threshold=emotion_threshold,\n", + " race_threshold=race_threshold,\n", + " gender_threshold=gender_threshold).analyse_image() # analyse image with EmotionDetector and update dict\n", " if num % dump_every == 0 or num == len(image_dict) - 1: # save results every dump_every to dump_file\n", " image_df = ammico.get_dataframe(image_dict)\n", " image_df.to_csv(dump_file)" @@ -405,8 +445,7 @@ "metadata": {}, "outputs": [], "source": [ - "csv_path = pkg / \"data\" / \"ref\" / \"test.csv\"\n", - "ta = ammico.TextAnalyzer(csv_path=str(csv_path), column_key=\"text\")" + "ta = ammico.TextAnalyzer(csv_path=\"../data/ref/test.csv\", column_key=\"text\")" ] }, { @@ -530,6 +569,17 @@ " image_df.to_csv(dump_file)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# write output to csv\n", + "image_df = ammico.get_dataframe(image_dict)\n", + "image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -759,7 +809,7 @@ "# analysis_type can be \n", "# \"summary\",\n", "# \"questions\",\n", - "# 
\"summary_and_questions\".\n" + "# \"summary_and_questions\"." ] }, { @@ -806,7 +856,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can also ask sequential questions if you pass the argument `cosequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. " + "You can also ask sequential questions if you pass the argument `consequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. " ] }, { @@ -840,6 +890,17 @@ "image_dict" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# write output to csv\n", + "image_df = ammico.get_dataframe(image_dict)\n", + "image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -855,7 +916,7 @@ "\n", "From the seven facial expressions, an overall dominating emotion category is identified: negative, positive, or neutral emotion. These are defined with the facial expressions angry, disgust, fear and sad for the negative category, happy for the positive category, and surprise and neutral for the neutral category.\n", "\n", - "A similar threshold as for the emotion recognition is set for the race/ethnicity and gender detection, `race_threshold` and `gender_threshold`, with the default set to 50% so that a confidence for race / gender above 0.5 only will return a value in the analysis.\n", + "A similar threshold as for the emotion recognition is set for the race/ethnicity and gender detection, `race_threshold` and `gender_threshold`, with the default set to 50% so that a confidence for race / gender above 0.5 only will return a value in the analysis. \n", "\n", "For age unfortunately no confidence value is accessible so that no threshold values can be set for this type of analysis. The [reported MAE of the model is ± 4.65](https://sefiks.com/2019/02/13/apparent-age-and-gender-prediction-in-keras/).\n", "\n", @@ -876,6 +937,17 @@ " accept_disclosure=\"DISCLOSURE_AMMICO\").analyse_image()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# write output to csv\n", + "image_df = ammico.get_dataframe(image_dict)\n", + "image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -986,7 +1058,7 @@ "source": [ "The images are then processed and stored in a numerical representation, a tensor. These tensors do not change for the same image and same model - so if you run this analysis once, and save the tensors giving a path with the keyword `path_to_save_tensors`, a file with filename `.__saved_features_image.pt` will be placed there.\n", "\n", - "This can save you time if you want to analyse same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file. Any subsequent query of the model will run in a fraction of the time than it run in initially." + "This can save you time if you want to analyse the same images with the same model but different questions. 
To run using the saved tensors, execute the below code giving the path and name of the tensor file. Any subsequent query of the model will run in a fraction of the time than it run in initially." ] }, { @@ -1050,7 +1122,7 @@ "You can filter your results in 3 different ways:\n", "- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n", "- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n", - "- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35." + "- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_similarity_value - best_similarity_value_in_current_search)/best_similarity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35." ] }, { @@ -1174,7 +1246,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images querys, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images." + "Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images queries, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images." ] }, { @@ -1199,7 +1271,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Convert the dictionary of dictionarys into a dictionary with lists:" + "Convert the dictionary of dictionaries into a dictionary with lists:" ] }, { @@ -1367,7 +1439,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.11.9" } }, "nbformat": 4, From 599bdc05fe1ba4a9023bfd7d13e53359e172eb85 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 12:59:01 +0200 Subject: [PATCH 07/19] explicit install of libopenblas --- .github/workflows/ci.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 0aa19825..b79163f3 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -30,6 +30,7 @@ jobs: brew install ffmpeg - name: Install dependencies run: | + apt install libopenblas-dev # for scipy pip install -e . 
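If it is unclear whether the `libopenblas-dev` install above is actually picked up, a quick local check is to print the build configuration of NumPy and SciPy; this is only a sketch for debugging on a developer machine and is not part of the workflow file.

```python
# Local sanity check (not part of the CI workflow): print the build configuration
# of NumPy and SciPy and look for an "openblas" entry in the BLAS/LAPACK sections.
import numpy
import scipy

numpy.show_config()
scipy.show_config()
```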
- name: Run pytest test_colors run: | From a9215ddf17861cb5179f3f6043ae9e0173fa02b2 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 13:02:29 +0200 Subject: [PATCH 08/19] explicit install of libopenblas --- .github/workflows/ci.yml | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index b79163f3..45e5a523 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -28,9 +28,17 @@ jobs: if: matrix.os == 'macos-latest' run: | brew install ffmpeg + brew install openblas + - name: install openblas on linux + if: matrix.os == 'ubuntu-22.04' + run: | + sudo apt-get install libopenblas-dev + - name: install openblas on windows: + if: matrix.os == 'windows-latest' + run: | + choco install openblas - name: Install dependencies run: | - apt install libopenblas-dev # for scipy pip install -e . - name: Run pytest test_colors run: | From f22ed6035538ea36aa12ca2136f170590b1b7362 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 13:02:54 +0200 Subject: [PATCH 09/19] explicit install of libopenblas --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 45e5a523..2a54e534 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -33,7 +33,7 @@ jobs: if: matrix.os == 'ubuntu-22.04' run: | sudo apt-get install libopenblas-dev - - name: install openblas on windows: + - name: install openblas on windows if: matrix.os == 'windows-latest' run: | choco install openblas From 7fc233f7d3294ae67c8f1878662ae17390693236 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 13:27:59 +0200 Subject: [PATCH 10/19] try to get scipy installed using uv --- .github/workflows/ci.yml | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 2a54e534..d0a57f3e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -15,7 +15,8 @@ jobs: strategy: fail-fast: false matrix: - os: [ubuntu-22.04,windows-latest,macos-latest] + # os: [ubuntu-22.04,windows-latest,macos-latest] + os: [ubuntu-22.04] python-version: [3.11] steps: - name: Checkout repository @@ -32,14 +33,12 @@ jobs: - name: install openblas on linux if: matrix.os == 'ubuntu-22.04' run: | - sudo apt-get install libopenblas-dev - - name: install openblas on windows - if: matrix.os == 'windows-latest' - run: | - choco install openblas + sudo apt-get install libopenblas-dev meson - name: Install dependencies run: | - pip install -e . + # pip install -e . + python -m pip install uv + uv pip install --system -e . 
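For reference, the per-module pytest steps in the surrounding hunk can be reproduced locally; the sketch below uses pytest's Python entry point with the same flags as the workflow and assumes it is run from inside the `ammico` directory (the workflow does this with `cd ammico`) with pytest and pytest-cov installed.

```python
# Sketch of running one test module locally the same way the CI steps do.
import pytest

pytest.main(
    [
        "test/test_colors.py",
        "-svv",
        "--cov=.",
        "--cov-report=xml",
        "--cov-append",
    ]
)
```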
- name: Run pytest test_colors run: | cd ammico From c47acf74c673dca7c2cb890b42cd49267cf22fd3 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 13:29:14 +0200 Subject: [PATCH 11/19] use ubuntu 24.04 --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d0a57f3e..ddccfcf2 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -16,7 +16,7 @@ jobs: fail-fast: false matrix: # os: [ubuntu-22.04,windows-latest,macos-latest] - os: [ubuntu-22.04] + os: [ubuntu-24.04] python-version: [3.11] steps: - name: Checkout repository From 2e81c6642cd1db5210a7046cd349643901672d84 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Fri, 4 Oct 2024 13:30:35 +0200 Subject: [PATCH 12/19] go back to pip --- .github/workflows/ci.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ddccfcf2..d7baa16e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -36,9 +36,9 @@ jobs: sudo apt-get install libopenblas-dev meson - name: Install dependencies run: | - # pip install -e . - python -m pip install uv - uv pip install --system -e . + pip install -e . + # python -m pip install uv + # uv pip install --system -e . - name: Run pytest test_colors run: | cd ammico From 3056f9072bd4ca3a7dc70954fdcbcd849dd843a7 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 07:59:25 +0200 Subject: [PATCH 13/19] try with scipy only --- .github/workflows/ci.yml | 103 ++++++++++++++++++++------------------- 1 file changed, 52 insertions(+), 51 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d7baa16e..d0c9d043 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -33,58 +33,59 @@ jobs: - name: install openblas on linux if: matrix.os == 'ubuntu-22.04' run: | - sudo apt-get install libopenblas-dev meson + sudo apt-get install libopenblas-dev - name: Install dependencies run: | - pip install -e . + pip install scipy # python -m pip install uv # uv pip install --system -e . - - name: Run pytest test_colors - run: | - cd ammico - python -m pytest test/test_colors.py -svv --cov=. --cov-report=xml --cov-append - - name: Run pytest test_cropposts - run: | - cd ammico - python -m pytest test/test_cropposts.py -svv --cov=. --cov-report=xml --cov-append - - name: Run pytest test_display - run: | - cd ammico - python -m pytest test/test_display.py -svv --cov=. --cov-report=xml --cov-append - - name: Run pytest test_faces - run: | - cd ammico - python -m pytest test/test_faces.py -svv --cov=. --cov-report=xml --cov-append - - name: Run pytest test_multimodal_search - run: | - cd ammico - python -m pytest test/test_multimodal_search.py -m "not long" -svv --cov=. --cov-report=xml --cov-append - - name: Clear cache ubuntu 1 - if: matrix.os == 'ubuntu-22.04' - run: | - rm -rf ~/.cache/* - - name: Run pytest test_summary - run: | - cd ammico - python -m pytest test/test_summary.py -m "not long" -svv --cov=. --cov-report=xml --cov-append - - name: Clear cache ubuntu 2 - if: matrix.os == 'ubuntu-22.04' - run: | - rm -rf ~/.cache/* - - name: Run pytest test_text - run: | - cd ammico - python -m pytest test/test_text.py -m "not gcv" -svv --cov=. --cov-report=xml --cov-append - - name: Run pytest test_utils - run: | - cd ammico - python -m pytest test/test_utils.py -svv --cov=. 
--cov-report=xml --cov-append - - name: Upload coverage - if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9' - uses: codecov/codecov-action@v3 - env: - CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} - with: - fail_ci_if_error: false - files: ammico/coverage.xml - verbose: true + # - name: Run pytest test_colors + # run: | + # cd ammico + # python -m pytest test/test_colors.py -svv --cov=. --cov-report=xml --cov-append + # - name: Run pytest test_cropposts + # run: | + # cd ammico + # python -m pytest test/test_cropposts.py -svv --cov=. --cov-report=xml --cov-append + # - name: Run pytest test_display + # run: | + # cd ammico + # python -m pytest test/test_display.py -svv --cov=. --cov-report=xml --cov-append + # - name: Run pytest test_faces + # run: | + # cd ammico + # python -m pytest test/test_faces.py -svv --cov=. --cov-report=xml --cov-append + # - name: Run pytest test_multimodal_search + # run: | + # cd ammico + # python -m pytest test/test_multimodal_search.py -m "not long" -svv --cov=. --cov-report=xml --cov-append + # - name: Clear cache ubuntu 1 + # if: matrix.os == 'ubuntu-22.04' + # run: | + # rm -rf ~/.cache/* + # - name: Run pytest test_summary + # run: | + # cd ammico + # python -m pytest test/test_summary.py -m "not long" -svv --cov=. --cov-report=xml --cov-append + # - name: Clear cache ubuntu 2 + # if: matrix.os == 'ubuntu-22.04' + # run: | + # rm -rf ~/.cache/* + # - name: Run pytest test_text + # run: | + # cd ammico + # python -m pytest test/test_text.py -m "not gcv" -svv --cov=. --cov-report=xml --cov-append + # - name: Run pytest test_utils + # run: | + # cd ammico + # python -m pytest test/test_utils.py -svv --cov=. --cov-report=xml --cov-append + # - name: Upload coverage + # if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9' + # uses: codecov/codecov-action@v3 + # env: + # CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} + # with: + # fail_ci_if_error: false + # files: ammico/coverage.xml + # verbose: true +# \ No newline at end of file From 6625e529d137679c6af94d2fdd5107459ffc7c3b Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 08:06:00 +0200 Subject: [PATCH 14/19] try with a few others --- .github/workflows/ci.yml | 2 +- pyproject.toml | 22 ---------------------- 2 files changed, 1 insertion(+), 23 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index d0c9d043..90486eee 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -36,7 +36,7 @@ jobs: sudo apt-get install libopenblas-dev - name: Install dependencies run: | - pip install scipy + pip install -e . # python -m pip install uv # uv pip install --system -e . 
# - name: Run pytest test_colors diff --git a/pyproject.toml b/pyproject.toml index 6d3fd33b..4768d1c5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -22,41 +22,19 @@ classifiers = [ ] dependencies = [ - "bertopic<=0.14.1", - "dash>=2.11.0", - "datasets", - "deepface<=0.0.92", - "googletrans==4.0.0rc1", - "google-cloud-vision", - "grpcio", - "importlib_metadata", - "importlib_resources", "ipython", "jupyter", - "jupyter_dash", "matplotlib", "numpy<=1.23.4", "pandas", - "Pillow", - "pooch", - "protobuf", "pytest", "pytest-cov", - "Requests", - "retina_face", "ammico-lavis>=1.0.2.3", "setuptools", - "spacy", "tensorflow>=2.13.0", "torch<2.4.0", "transformers", - "google-cloud-vision", - "dash_bootstrap_components", - "colorgram.py", - "webcolors>1.13", - "colour-science", "scikit-learn>1.3.0", - "tqdm" ] [project.scripts] From dbfbf792c4f445f18fbae9137b935c57f5025ec1 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 08:06:58 +0200 Subject: [PATCH 15/19] use hatchling --- pyproject.toml | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index 4768d1c5..f69a9c79 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,8 +1,6 @@ [build-system] -requires = [ - "setuptools>=61", -] -build-backend = "setuptools.build_meta" +requires = ["hatchling"] +build-backend = "hatchling.build" [project] name = "ammico" @@ -30,7 +28,6 @@ dependencies = [ "pytest", "pytest-cov", "ammico-lavis>=1.0.2.3", - "setuptools", "tensorflow>=2.13.0", "torch<2.4.0", "transformers", From d17d44056a4667bef0c189337a4cd4f2f4affdfd Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 08:34:31 +0200 Subject: [PATCH 16/19] wording changes, install all requirements --- .github/workflows/ci.yml | 2 +- FAQ.md | 7 ++++- ammico/faces.py | 28 +++++++++++-------- ammico/notebooks/DemoNotebook_ammico.ipynb | 3 +- ammico/text.py | 16 +++++------ .../notebooks/DemoNotebook_ammico.ipynb | 8 +++++- pyproject.toml | 23 +++++++++++++++ 7 files changed, 62 insertions(+), 25 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 90486eee..f48823b7 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -31,7 +31,7 @@ jobs: brew install ffmpeg brew install openblas - name: install openblas on linux - if: matrix.os == 'ubuntu-22.04' + if: matrix.os == 'ubuntu-24.04' run: | sudo apt-get install libopenblas-dev - name: Install dependencies diff --git a/FAQ.md b/FAQ.md index c58377fe..caffea1c 100644 --- a/FAQ.md +++ b/FAQ.md @@ -95,5 +95,10 @@ Some features of ammico require internet access; a general answer to this questi - Color analysis: The `color` module does not require an internet connection. ## Why don't I get probabilistic assessments of age, gender and race when running the Emotion Detector? -Due to well documented biases in the detection of minorities with computer vision tools, and to the ethical implications of such detection, these parts of the tool are not directly made available to users. To access these capabilities, users must first agree with a ethical disclosure statement that reads: "The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and race of the detected faces. Such assessments may not reflect how the individuals identified by the tool view themselves. Additionally, the classification is carried out in simplistic categories and contains only the most basic classes, for example “male” and “female” for gender. 
By continuing to use the tool, you certify that you understand the ethical implications such assessments have for the interpretation of the results." +Due to well documented biases in the detection of minorities with computer vision tools, and to the ethical implications of such detection, these parts of the tool are not directly made available to users. To access these capabilities, users must first agree with a ethical disclosure statement that reads: + +"DeepFace and RetinaFace provide wrappers to trained models in face recognition and emotion detection. Age, gender and race/ethnicity models were trained on the backbone of VGG-Face with transfer learning. +ETHICAL DISCLOSURE STATEMENT: +The Emotion Detector uses DeepFace and RetinaFace to probabilistically assess the gender, age and race of the detected faces. Such assessments may not reflect how the individuals identify. Additionally, the classification is carried out in simplistic categories and contains only the most basic classes (for example, “male” and “female” for gender, and seven non-overlapping categories for ethnicity). To access these probabilistic assessments, you must therefore agree with the following statement: “I understand the ethical and privacy implications such assessments have for the interpretation of the results and that this analysis may result in personal and possibly sensitive data, and I wish to proceed.” + This disclosure statement is included as a separate line of code early in the flow of the Emotion Detector. Once the user has agreed with the statement, further data analyses will also include these assessments. diff --git a/ammico/faces.py b/ammico/faces.py index d6990c10..d731c2a0 100644 --- a/ammico/faces.py +++ b/ammico/faces.py @@ -79,18 +79,22 @@ def _processor(fname, action, pooch): ), ) -ETHICAL_STATEMENT = """This analysis uses the DeepFace and RetinaFace libraries. - DeepFace and RetinaFace provide wrappers to trained models in face recognition and - emotion detection. Age, gender and race / ethnicity models were trained - on the backbone of VGG-Face with transfer learning. - ETHICAL DISCLOSURE STATEMENT: - The Emotion Detector uses RetinaFace to probabilistically assess the gender, age and - race of the detected faces. Such assessments may not reflect how the individuals - identify. Additionally, the classification is carried - out in simplistic categories and contains only the most basic classes, for example - “male” and “female” for gender. By continuing to use the tool, you certify that you - understand the ethical implications such assessments have for the interpretation of - the results.""" +ETHICAL_STATEMENT = """DeepFace and RetinaFace provide wrappers to trained models in face +recognition and emotion detection. Age, gender and race/ethnicity models were trained on +the backbone of VGG-Face with transfer learning. + +ETHICAL DISCLOSURE STATEMENT: +The Emotion Detector uses DeepFace and RetinaFace to probabilistically assess the gender, +age and race of the detected faces. Such assessments may not reflect how the individuals +identify. Additionally, the classification is carried out in simplistic categories and +contains only the most basic classes (for example, "male" and "female" for gender, and seven +non-overlapping categories for ethnicity). 
To access these probabilistic assessments, you +must therefore agree with the following statement: "I understand the ethical and privacy +implications such assessments have for the interpretation of the results and that this +analysis may result in personal and possibly sensitive data, and I wish to proceed." +Please type your answer in the adjacent box: "YES" for "I agree with the statement" or "NO" +for "I disagree with the statement." +""" def ethical_disclosure(accept_disclosure: str = "DISCLOSURE_AMMICO"): diff --git a/ammico/notebooks/DemoNotebook_ammico.ipynb b/ammico/notebooks/DemoNotebook_ammico.ipynb index 90087f44..1c6a8589 100644 --- a/ammico/notebooks/DemoNotebook_ammico.ipynb +++ b/ammico/notebooks/DemoNotebook_ammico.ipynb @@ -44,7 +44,8 @@ "metadata": {}, "source": [ "## Use a test dataset\n", - "You can download a dataset for test purposes. Skip this step if you use your own data." + "\n", + "You can download this dataset for test purposes. Skip this step if you use your own data. If the data set on Hugging Face is gated or private, Hugging Face will ask you for a login token. However, for the default dataset in this notebook you do not need to provide one." ] }, { diff --git a/ammico/text.py b/ammico/text.py index 139f245b..7481c462 100644 --- a/ammico/text.py +++ b/ammico/text.py @@ -10,15 +10,13 @@ from bertopic import BERTopic from transformers import pipeline -PRIVACY_STATEMENT = """PRIVACY STATEMENT: The Text Detector uses Google Services - for text extraction and translation, and requires a Google Cloud Vision API Key - to work. Instructions about how to get such a key are provided here: - https://ssciwr.github.io/AMMICO/build/html/notebooks/DemoNotebook_ - ammico.html#Step-0:-Create-and-set-a-Google-Cloud-Vision-Key. - Google's privacy policy can be read here: - https://policies.google.com/privacy. By continuing to use this Detector, - you agree to send the data you want analyzed to the Google servers for - extraction and translation. """ +PRIVACY_STATEMENT = """The Text Detector uses Google Cloud Vision + and Google Translate. Detailed information about how information + is being processed is provided here: + https://ssciwr.github.io/AMMICO/build/html/readme_link.html#faq. + Google’s privacy policy can be read here: https://policies.google.com/privacy. + By continuing to use this Detector, you agree to send the data you want analyzed + to the Google servers for extraction and translation.""" def privacy_disclosure(accept_privacy: str = "PRIVACY_AMMICO"): diff --git a/docs/source/notebooks/DemoNotebook_ammico.ipynb b/docs/source/notebooks/DemoNotebook_ammico.ipynb index e5cdc4b8..90c3f8aa 100644 --- a/docs/source/notebooks/DemoNotebook_ammico.ipynb +++ b/docs/source/notebooks/DemoNotebook_ammico.ipynb @@ -44,9 +44,15 @@ "metadata": {}, "source": [ "## Use a test dataset\n", - "You can download a dataset for test purposes. Skip this step if you use your own data." + "\n", + "You can download this dataset for test purposes. Skip this step if you use your own data. If the data set on Hugging Face is gated or private, Hugging Face will ask you for a login token. However, for the default dataset in this notebook you do not need to provide one." 
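A sketch of fetching such a test data set from Hugging Face with the `datasets` package is given below; the dataset identifier is a placeholder, use the one given in the notebook cell, and the login step is only needed for gated or private data sets.

```python
# Sketch: download a test data set from Hugging Face (placeholder dataset id).
# from huggingface_hub import login
# login()  # only needed for gated or private data sets; prompts for your token
from datasets import load_dataset

dataset = load_dataset("some-org/some-test-dataset")
print(dataset)
```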
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, { "cell_type": "code", "execution_count": null, diff --git a/pyproject.toml b/pyproject.toml index f69a9c79..138f967e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -20,18 +20,41 @@ classifiers = [ ] dependencies = [ + "bertopic<=0.14.1", + "dash>=2.11.0", + "datasets", + "deepface<=0.0.92", + "googletrans==4.0.0rc1", + "google-cloud-vision", + "grpcio", + "importlib_metadata", + "importlib_resources", "ipython", "jupyter", + "jupyter_dash", "matplotlib", "numpy<=1.23.4", "pandas", + "Pillow", + "pooch", + "protobuf", "pytest", "pytest-cov", + "Requests", + "retina_face", "ammico-lavis>=1.0.2.3", + "setuptools", + "spacy", "tensorflow>=2.13.0", "torch<2.4.0", "transformers", + "google-cloud-vision", + "dash_bootstrap_components", + "colorgram.py", + "webcolors>1.13", + "colour-science", "scikit-learn>1.3.0", + "tqdm" ] [project.scripts] From 89e92009b8dccad22423fa7dfa16ed0effc7bc65 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 13:30:05 +0200 Subject: [PATCH 17/19] fix offending spacy version --- .github/workflows/ci.yml | 104 ++++++++++++++++++--------------------- pyproject.toml | 2 +- 2 files changed, 50 insertions(+), 56 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index f48823b7..7abaa466 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -29,63 +29,57 @@ jobs: if: matrix.os == 'macos-latest' run: | brew install ffmpeg - brew install openblas - - name: install openblas on linux - if: matrix.os == 'ubuntu-24.04' - run: | - sudo apt-get install libopenblas-dev - name: Install dependencies run: | pip install -e . # python -m pip install uv # uv pip install --system -e . - # - name: Run pytest test_colors - # run: | - # cd ammico - # python -m pytest test/test_colors.py -svv --cov=. --cov-report=xml --cov-append - # - name: Run pytest test_cropposts - # run: | - # cd ammico - # python -m pytest test/test_cropposts.py -svv --cov=. --cov-report=xml --cov-append - # - name: Run pytest test_display - # run: | - # cd ammico - # python -m pytest test/test_display.py -svv --cov=. --cov-report=xml --cov-append - # - name: Run pytest test_faces - # run: | - # cd ammico - # python -m pytest test/test_faces.py -svv --cov=. --cov-report=xml --cov-append - # - name: Run pytest test_multimodal_search - # run: | - # cd ammico - # python -m pytest test/test_multimodal_search.py -m "not long" -svv --cov=. --cov-report=xml --cov-append - # - name: Clear cache ubuntu 1 - # if: matrix.os == 'ubuntu-22.04' - # run: | - # rm -rf ~/.cache/* - # - name: Run pytest test_summary - # run: | - # cd ammico - # python -m pytest test/test_summary.py -m "not long" -svv --cov=. --cov-report=xml --cov-append - # - name: Clear cache ubuntu 2 - # if: matrix.os == 'ubuntu-22.04' - # run: | - # rm -rf ~/.cache/* - # - name: Run pytest test_text - # run: | - # cd ammico - # python -m pytest test/test_text.py -m "not gcv" -svv --cov=. --cov-report=xml --cov-append - # - name: Run pytest test_utils - # run: | - # cd ammico - # python -m pytest test/test_utils.py -svv --cov=. 
--cov-report=xml --cov-append - # - name: Upload coverage - # if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9' - # uses: codecov/codecov-action@v3 - # env: - # CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} - # with: - # fail_ci_if_error: false - # files: ammico/coverage.xml - # verbose: true -# \ No newline at end of file + - name: Run pytest test_colors + run: | + cd ammico + python -m pytest test/test_colors.py -svv --cov=. --cov-report=xml --cov-append + - name: Run pytest test_cropposts + run: | + cd ammico + python -m pytest test/test_cropposts.py -svv --cov=. --cov-report=xml --cov-append + - name: Run pytest test_display + run: | + cd ammico + python -m pytest test/test_display.py -svv --cov=. --cov-report=xml --cov-append + - name: Run pytest test_faces + run: | + cd ammico + python -m pytest test/test_faces.py -svv --cov=. --cov-report=xml --cov-append + - name: Run pytest test_multimodal_search + run: | + cd ammico + python -m pytest test/test_multimodal_search.py -m "not long" -svv --cov=. --cov-report=xml --cov-append + - name: Clear cache ubuntu 1 + if: matrix.os == 'ubuntu-22.04' + run: | + rm -rf ~/.cache/* + - name: Run pytest test_summary + run: | + cd ammico + python -m pytest test/test_summary.py -m "not long" -svv --cov=. --cov-report=xml --cov-append + - name: Clear cache ubuntu 2 + if: matrix.os == 'ubuntu-22.04' + run: | + rm -rf ~/.cache/* + - name: Run pytest test_text + run: | + cd ammico + python -m pytest test/test_text.py -m "not gcv" -svv --cov=. --cov-report=xml --cov-append + - name: Run pytest test_utils + run: | + cd ammico + python -m pytest test/test_utils.py -svv --cov=. --cov-report=xml --cov-append + - name: Upload coverage + if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9' + uses: codecov/codecov-action@v3 + env: + CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} + with: + fail_ci_if_error: false + files: ammico/coverage.xml + verbose: true diff --git a/pyproject.toml b/pyproject.toml index 138f967e..717887c3 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -44,7 +44,7 @@ dependencies = [ "retina_face", "ammico-lavis>=1.0.2.3", "setuptools", - "spacy", + "spacy<=3.7.5", "tensorflow>=2.13.0", "torch<2.4.0", "transformers", From 3d810952fa0ed21c55bc3c3c4c8cb5920d53d7d4 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 13:53:46 +0200 Subject: [PATCH 18/19] run all tests --- .github/workflows/ci.yml | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 7abaa466..d6238bbe 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -15,8 +15,7 @@ jobs: strategy: fail-fast: false matrix: - # os: [ubuntu-22.04,windows-latest,macos-latest] - os: [ubuntu-24.04] + os: [ubuntu-24.04,windows-latest,macos-latest] python-version: [3.11] steps: - name: Checkout repository @@ -63,7 +62,7 @@ jobs: cd ammico python -m pytest test/test_summary.py -m "not long" -svv --cov=. --cov-report=xml --cov-append - name: Clear cache ubuntu 2 - if: matrix.os == 'ubuntu-22.04' + if: matrix.os == 'ubuntu-24.04' run: | rm -rf ~/.cache/* - name: Run pytest test_text @@ -75,7 +74,7 @@ jobs: cd ammico python -m pytest test/test_utils.py -svv --cov=. 
--cov-report=xml --cov-append - name: Upload coverage - if: matrix.os == 'ubuntu-22.04' && matrix.python-version == '3.9' + if: matrix.os == 'ubuntu-24.04' && matrix.python-version == '3.11' uses: codecov/codecov-action@v3 env: CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} From de7ba83972e0df334703b433b80da4f77304db80 Mon Sep 17 00:00:00 2001 From: Inga Ulusoy Date: Mon, 7 Oct 2024 14:08:03 +0200 Subject: [PATCH 19/19] include faq in documentation, fix link --- ammico/text.py | 2 +- docs/source/faq_link.md | 2 ++ docs/source/index.rst | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) create mode 100644 docs/source/faq_link.md diff --git a/ammico/text.py b/ammico/text.py index 7481c462..8d79e322 100644 --- a/ammico/text.py +++ b/ammico/text.py @@ -13,7 +13,7 @@ PRIVACY_STATEMENT = """The Text Detector uses Google Cloud Vision and Google Translate. Detailed information about how information is being processed is provided here: - https://ssciwr.github.io/AMMICO/build/html/readme_link.html#faq. + https://ssciwr.github.io/AMMICO/build/html/faq_link.html. Google’s privacy policy can be read here: https://policies.google.com/privacy. By continuing to use this Detector, you agree to send the data you want analyzed to the Google servers for extraction and translation.""" diff --git a/docs/source/faq_link.md b/docs/source/faq_link.md new file mode 100644 index 00000000..9e07425d --- /dev/null +++ b/docs/source/faq_link.md @@ -0,0 +1,2 @@ +```{include} ../../FAQ.md +``` \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst index 839cfd5c..95e23e64 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -11,6 +11,7 @@ Welcome to AMMICO's documentation! :caption: Contents: readme_link + faq_link create_API_key_link notebooks/DemoNotebook_ammico notebooks/Example cropposts
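Taken together, the disclosure handling touched by this patch series can be exercised along the following lines; this is a sketch only, the data path, the limit and the output file name are placeholders, and all calls shown are the ones used in the demo notebook above.

```python
# Sketch combining the pieces changed in this patch series. Both disclosure
# prompts cache the answer in the named environment variables, so they are only
# asked once per session.
import ammico

_ = ammico.ethical_disclosure(accept_disclosure="DISCLOSURE_AMMICO")
_ = ammico.privacy_disclosure(accept_privacy="PRIVACY_AMMICO")

image_dict = ammico.find_files(path="data/", limit=5)
for key in image_dict:
    image_dict[key] = ammico.EmotionDetector(
        image_dict[key],
        emotion_threshold=50,  # detection confidence in percent (0-100), 50 is the default
        race_threshold=50,
        gender_threshold=50,
        accept_disclosure="DISCLOSURE_AMMICO",
    ).analyse_image()

ammico.get_dataframe(image_dict).to_csv("data_out.csv")
```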