This repository contains the work of the Capstone project "EyeTism - Eye Movement Based Autism Diagnostics", developed within the intensive Data Science Bootcamp provided by neuefische GmbH.
Our "EyeTism" project focused on developing a tool for diagnosing Autism Spectrum Disorder (ASD) in children aged 8 to 15 years. ASD is a developmental disability that affects social interaction and learning. Hence, early diagnosis of affected children is crucial for child development. Although individuals with ASD often exhibit distinct gaze behavior compared to typically developing (TD) individuals, detecting ASD remains challenging. Our tool applies machine learning to eye tracking data from high-functioning ASD and TD children to support pediatricians in diagnosing ASD, based on the visual attention patterns of patients on a selected subset of images.
Gaze behaviors of 14 children with ASD and 14 TD children were analyzed while they viewed diverse visual stimuli. The Saliency4ASD dataset (https://saliency4asd.ls2n.fr/datasets/) comprises 300 images featuring diverse scenes:
- 40 images featuring animals
- 88 with buildings or objects
- 20 depicting natural scenes
- 36 portraying multiple people in one image
- 41 displaying multiple people and objects in one image
- 32 with a single person
- 43 with a single person and objects in one image
Reference dataset: H. Duan, G. Zhai, X. Min, Z. Che, Y. Fang, X. Yang, J. Gutiérrez, P. Le Callet, “A Dataset of Eye Movements for the Children with Autism Spectrum Disorder”, ACM Multimedia Systems Conference (MMSys’19), Jun. 2019
git clone [email protected]:eockfen/EyeTism.git
cd EyeTism
- Open the terminal
- Create a virtual environment with the tool of your choice
- Install Python 3.11.3
- Depending on how you manage your virtual environments, install all the dependencies either
  - via conda:
    conda env create -f environment.yml
  - or via pip:
    pip install -r requirements.txt
- Important notes:
  - to install dlib, you need to have CMake and a working C++ compiler installed
  - in case you are on a Mac and run into problems while pip-installing lightgbm, it could be that brew install libomp helps (see here)
- Download the following .zip archives and store them in the /source folder:
- Run the Python script in the /scripts folder:
  cd ./scripts
  python unzip_data.py
  - This will extract the full Saliency4ASD dataset, as well as the saliency predictions of the 300 images for three different visual attention models: DeepGazeIIE (repo) and the ResNet and VGG versions of SAM (repo).
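The extraction step can be pictured with a minimal sketch. This is not the repository's unzip_data.py, whose contents are not shown here; the function name extract_archives and its two folder arguments are assumptions made for illustration, and only Python's standard zipfile module is used.

```python
from pathlib import Path
import zipfile


def extract_archives(source_dir, target_dir):
    """Extract every .zip archive found in source_dir into target_dir.

    Hypothetical helper: the real unzip_data.py may lay out its output differently.
    Returns the names of the archives that were extracted.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(source.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Each archive gets its own subfolder named after the archive file.
            zf.extractall(target / archive.stem)
        extracted.append(archive.name)
    return extracted
```

Extracting each archive into its own subfolder keeps the dataset and the individual model predictions apart, which is one plausible layout among several.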
Re-do saliency predictions
- The extracted zip files contain the saliency maps already predicted by DeepGazeIIE and SAM, but you can reproduce the steps we took to obtain these maps.
- The originally downloaded saliency prediction maps of the SAM model were named differently from the images in the Saliency4ASD dataset, therefore the following steps were performed:
  - Matching differently named files to the Saliency4ASD files
  - Renaming / copying the predicted saliency maps
- DeepGazeIIE predictions were generated by running their actual model.
- To re-do these steps, run the following code:
  cd ./scripts
  python unzip_data.py sam
  python prepare_saliency_maps.py sam
  python prepare_saliency_maps.py dg
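The matching and renaming of the SAM maps could look roughly like the sketch below. It assumes a mapping from the downloaded file names to the Saliency4ASD file names is already available; how prepare_saliency_maps.py actually derives that mapping is not described here, and copy_with_new_names is a hypothetical name used only for illustration.

```python
from pathlib import Path
import shutil


def copy_with_new_names(src_dir, dst_dir, name_map):
    """Copy saliency maps from src_dir to dst_dir under dataset-style names.

    name_map maps a downloaded file name to the matching dataset file name
    (an assumed input; the real matching logic is not shown in this README).
    Returns the list of source names that could not be found.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    missing = []
    for old_name, new_name in name_map.items():
        old_path = src / old_name
        if not old_path.is_file():
            missing.append(old_name)
            continue
        # Copying (rather than moving) leaves the downloaded files untouched.
        shutil.copy2(old_path, dst / new_name)
    return missing
```

Returning the missing names makes it easy to spot images for which no prediction was matched.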
- Check out and run the extract_features.ipynb notebook in the /notebooks folder.
- Extracted features will be saved in the /data/df_deep_sam.csv file. This process can take a few hours, depending on your machine.
- If you are as impatient as me and don't want to waste precious time waiting for our "slow" script to calculate all the features, you can download the df_deep_sam.csv and store it in the /data folder.
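Because feature extraction is slow, a common pattern is to check whether the CSV already exists and recompute only when it does not. The sketch below illustrates that idea with the standard library; load_or_compute and compute_rows are hypothetical names, not functions from this repository.

```python
import csv
from pathlib import Path


def load_or_compute(csv_path, compute_rows):
    """Return feature rows from csv_path, computing and saving them if absent.

    compute_rows is a hypothetical callable returning a non-empty list of
    dicts that all share the same keys.
    """
    path = Path(csv_path)
    if path.is_file():
        # Cached file found: skip the expensive computation.
        with path.open(newline="") as fh:
            return list(csv.DictReader(fh))

    rows = compute_rows()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Note that rows read back from the cached file contain strings, so numeric columns need to be converted before modeling.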
After running the notebook, three outputs are generated:
- All individual scanpaths are overlaid onto the stimuli images.
- All detected objects (whose probability scores will be saved in a .txt file) and faces are overlaid onto the stimuli images.
- Individual scanpaths, detected objects and faces are overlaid onto the stimuli images.
Outputs will be saved in the /data/obj_detection folder and in the /data/individual_scanpaths folder, respectively.
will be provided
Check out the notebook baseline.ipynb in the /notebooks folder to run the baseline model and see the results.
The final models were selected after evaluating the 30-image test set by defining the best model-image pairs, as detailed in the notebooks in the /modeling folder.
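Selecting the best model-image pairs amounts to picking, for every image, the model with the highest evaluation score. A minimal sketch of that selection follows; the nested-dictionary layout, the function name best_model_per_image and the numbers are illustrative assumptions, not results or code from the project.

```python
def best_model_per_image(scores):
    """Pick the highest-scoring model for every image.

    scores: {image_name: {model_name: score}} (this layout is an assumption).
    Returns {image_name: (model_name, score)}.
    """
    best = {}
    for image, per_model in scores.items():
        model = max(per_model, key=per_model.get)
        best[image] = (model, per_model[model])
    return best


# Illustrative numbers only, not results from the project.
example_scores = {
    "image_a": {"RF": 0.71, "XGBoost": 0.78, "SVC": 0.64},
    "image_b": {"RF": 0.82, "XGBoost": 0.80, "SVC": 0.75},
}
pairs = best_model_per_image(example_scores)
```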
The results were generated as follows:
1. In notebook create_basemodel_pipelines.ipynb
   - All models use a different set of features, therefore pipelines are built so that stacking and voting classifiers can also be generated
   - This results in uncalibrated base models of RF, XGBoost and SVC, which are saved in /models/uncalibrated_pipelines/<MODEL>_uncalib.pickle
2. In notebook calib_RF_XGB_SVC_threshold.ipynb
   - models mentioned in 1. are calibrated
   - threshold analysis is performed to find the optimal decision threshold for each model in order to maximize the f1 score
   - calibrated models are saved in /models/calibrated/<MODEL>_calib.pickle
3. In notebook voting_RF_XGB_SVC_threshold.ipynb
   - a voting classifier is built on top of the previously calibrated models (RF, XGBoost and SVC)
   - also, the optimal (max. f1) threshold is found for this voting classifier
   - voting model is saved in /models/calibrated/VTG_calib.pickle
4. In notebook stacking_<MODEL>_calib.ipynb
   - stacking classifiers are built for 4 different final estimators
     - Logistic regression (LR)
     - K-nearest neighbors (KNN)
     - Light gradient boosting machine (LGBM)
     - Naive Bayes (NB)
   - base estimators are the calibrated base models RF, XGBoost and SVC
   - stacking models are saved in /models/calibrated/stacking_<MODEL>_calib.pickle
5. In notebook stacking_thresholding.ipynb
   - threshold analysis is performed for the four stacking models
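The voting and threshold-analysis steps above can be illustrated with a small dependency-free sketch. It assumes soft voting (averaging the positive-class probabilities of the base models) and a simple grid search over thresholds; the README does not state which voting scheme or search the notebooks use, and all function names here are hypothetical.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 score when predicting the positive class for probs >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def soft_vote(prob_lists):
    """Average the positive-class probabilities of several models per sample."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def best_threshold(y_true, probs, candidates=None):
    """Return (threshold, f1) with the highest F1 over the candidate grid.

    Ties are resolved in favor of the first (lowest) candidate threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    scored = ((t, f1_at_threshold(y_true, probs, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])
```

A call such as best_threshold(y_true, soft_vote([rf_probs, xgb_probs, svc_probs])) would then return the threshold to apply to the averaged probabilities, where the three probability lists stand for the outputs of the calibrated base models.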
The 8 models developed are then evaluated on our 30-image test set as reported in the notebook FINAL_EVALUATION.ipynb.
We selected 9 images, and defined the optimal models to classify the eye tracking data for the respective image. The following figure shows the model performance for each of these selected images:
Overall, the performance metrics for our diagnostic tool are:
- f2-score: 90.5 %
- accuracy: 82.1 %
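For readers unfamiliar with the f2-score: it is the F-beta score with beta = 2, which weights recall more heavily than precision. A minimal sketch computing it from confusion-matrix counts is shown below; the function name and the example counts are illustrative assumptions and do not reproduce the project's figures.

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 weights recall more.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Illustrative counts only: 8 true positives, 2 false positives, 1 false negative.
example_f2 = fbeta_score(tp=8, fp=2, fn=1)
```

In a diagnostic setting, favoring recall in this way penalizes missed cases more strongly than false alarms.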
To showcase the basic functionality of our diagnostic tool, we've constructed a Streamlit application. If you're inclined towards practical demonstrations rather than delving into intricate code details, this application is tailor-made for you. Feel free to explore and experience the practical side of our project!
To delve into its workings, you have two options:
- Local Installation:
  - clone this repository onto your system
  - next, set up a virtual environment to ensure a clean and isolated setup
  - finally, start the dashboard by running cd Dashboard and then executing the command streamlit run app.py within your terminal
  - this method allows you to explore the tool's capabilities firsthand, right from the comfort of your own machine
- Online Access:
  - Prefer a hassle-free experience? Look no further!
  - Simply visit EyeTism to access the application online.
Whichever route you choose, we hope this demonstration offers valuable insights into the potential of diagnostic tools and inspires further exploration in the realm of data-driven solutions.
We had the opportunity to present our Capstone project at the graduation event of the neuefische Data Science Bootcamp. You can download the slides, or even watch our presentation on YouTube.
All authors express their profound gratitude to the coaches and the organization of neuefische GmbH.
/CNN
- This folder contains the work done for the CNN modelling part (not integrated in the workflow)
- The README.md in this folder guides you through its content
/Dashboard
- This folder contains the Streamlit application we designed to demonstrate what a simple version of a diagnostic tool could look like.
- You can either clone this repository, set up a virtual environment and run the dashboard yourself via streamlit run /Dashboard/app.py
- or, you can visit the online version at LINK WILL FOLLOW
/data
- All data generated while running the scripts and notebooks will be saved here.
/images
- In this folder you will find:
  - final_set.png contains the final set of images
  - test_set.png contains the set of images used for generating the predictions of the models
  - val_set.png was another candidate for the test set
  - figures which are used in this README
/modeling
- In this folder you will find:
  - the subfolder /dev, where several models were developed, trained and tested, but did not make it into the final set of models
  - the notebooks generated for the 8 final models, containing the pipelines to realize voting and stacking classifiers (see Roadmap above)
  - the final evaluation of the models, FINAL_EVALUATION.ipynb
/models
- In this folder you find the subfolders:
  - /dev contains subfolders with all the models generated during the development, finetuning and optimization phase as pickle files
  - /mediapipe contains mediapipe models used for object detection
  - /uncalibrated_pipelines contains uncalibrated models as pickle files
  - /calibrated contains the calibrated models as pickle files
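Models stored as pickle files can be loaded back with Python's standard pickle module. The helper below is a minimal sketch; load_model is a hypothetical name, and loading the project's models will presumably also require the libraries they were trained with to be installed in the active environment.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled object, e.g. a fitted classifier pipeline.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

The path passed in would follow the patterns listed above, for example a file matching <MODEL>_calib.pickle in the /calibrated subfolder.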
/notebooks
- In this folder you find the notebooks generated for the EDA, the baseline modeling part, and the extraction of the features
/scripts
- This folder contains all scripts and functions used by the different notebooks