This repository is related to the article:
Model-based and model-free replay mechanisms for reinforcement learning in neurorobotics (2022, Accepted)
Elisa Massi, Rémi Dromnelle, Julianne Mailly, Jeanne Barthélemy, Julien Canitrot, Esther Poniatowski, Benoît Girard and Mehdi Khamassi.
Institute of Intelligent Systems and Robotics, CNRS, Sorbonne University, F-75005 Paris, France
It contains the code and data used and generated for the part:
3 Simulation of individual replay strategies with an autonomously learned state decomposition
Keywords: hippocampal replay, reinforcement learning, neurorobotics, model-based, model-free
Project Link: https://github.com/esther-poniatowski/Massi2022
To study the implications of offline learning in spatial navigation, from rodents' behavior to robotics, the article investigated the role of several Reinforcement Learning (RL) algorithms by simulating artificial agents. The task of the agents mimics the classical Morris water maze task (Morris, 1981). The environment is a circular maze, consistent with the original experimental paradigm in terms of environment/robot size ratio. The goal of the task is to navigate the environment until reaching the rewarded location, starting from a fixed initial point. Agents learn over 50 trials, and the reward location is changed in the middle of the simulation (trial 25). In this robotic framework, the task is a Markov Decision Process (MDP), where agents visit discrete states using a finite set of discrete actions.
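As a minimal sketch of this tabular setting (illustrative only, not the repository's actual code: the state/action counts, learning rate and discount factor below are assumptions):

```python
import numpy as np

# Illustrative sketch of the discretized task (not the repository's actual code).
# States and actions are discrete; the reward location changes halfway through learning.
n_states = 38            # assumption: number of discrete states in the learned decomposition
n_actions = 8            # assumption: e.g. 8 possible movement directions
n_trials = 50            # trials per agent, as described above
switch_trial = 25        # trial at which the rewarded state is moved
alpha, gamma = 0.1, 0.9  # illustrative learning rate and discount factor

q_table = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One online temporal-difference update (learning during behavior)."""
    td_error = r + gamma * np.max(q_table[s_next]) - q_table[s, a]
    q_table[s, a] += alpha * td_error
    return td_error
```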
The learning performances of the agents are tested here in two conditions:
- Deterministic environment: In this version of the task, any action performed in a given state always leads the agent to the same arrival state (with probability 1).
- Stochastic environment: In this version of the task, performing an action in a given state can lead to several possible arrival states (non-null probabilities for several states); see the sketch after this list.
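The difference between the two conditions can be pictured with one hypothetical row of the transition matrix P(s'|s,a) (illustrative values, not those of the provided `.txt` transition matrices):

```python
import numpy as np

# Hypothetical transition probabilities P(s' | s, a) for one (state, action) pair.
# Deterministic condition: a single arrival state receives all the probability mass.
p_deterministic = np.array([0.0, 1.0, 0.0, 0.0])

# Stochastic condition: several arrival states have non-null probability.
p_stochastic = np.array([0.1, 0.7, 0.1, 0.1])

# In both cases each row of the transition matrix sums to 1.
assert np.isclose(p_deterministic.sum(), 1.0) and np.isclose(p_stochastic.sum(), 1.0)

# Sampling the arrival state from the distribution:
rng = np.random.default_rng(0)
s_next = rng.choice(len(p_stochastic), p=p_stochastic)
```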
Four learning strategies are compared. Three of them include replays of the experienced state-action-state transitions during each inter-trial interval.
- Model Free (MF) No replay: In this classical reinforcement learning framework, the artificial agent learns only online, during behavior.
- Model Free (MF) Backward replay: This agent stores the most recently experienced state-action-state transitions in a memory buffer, and replays them from the most recent (rewarded) one to the most remote one.
- Model Free (MF) Shuffled replay: This agent stores the most recently experienced state-action-state transitions in a memory buffer, and replays them in random order.
- Model Based (MB) Prioritized sweeping: This agent stores the experienced state-action-state transitions in a memory buffer and replays them in order of priority, determined by the magnitude of their reward prediction error. Note that one more replay strategy (Most diverse sequence replay) appears in the code, but is not investigated in the related article. A minimal sketch of these replay orderings follows this list.
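The sketch below only illustrates how the three replay orderings differ; the actual implementations are in `algorithms_MF_MB.py`, and the priority measure here is a placeholder:

```python
import random

def abs_prediction_error(transition):
    """Hypothetical priority measure: magnitude of the reward prediction error."""
    s, a, r, s_next = transition
    return abs(r)  # placeholder; the real priority would use the learned model and values

# Memory buffer of experienced (state, action, reward, next_state) transitions,
# stored in the order in which they were experienced during the trial.
buffer = [(0, 1, 0.0, 3), (3, 2, 0.0, 7), (7, 0, 1.0, 12)]

# MF backward replay: from the most recent (rewarded) transition back to the most remote one.
backward_order = list(reversed(buffer))

# MF shuffled replay: the same transitions, in random order.
shuffled_order = random.sample(buffer, len(buffer))

# MB prioritized sweeping (schematically): transitions ordered by priority, largest first.
prioritized_order = sorted(buffer, key=abs_prediction_error, reverse=True)
```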
All the code is written in Python and uses the following libraries:
- numpy
- random
- bisect
- itertools
- copy
- scipy
- bioinfokit
- statsmodels
- similaritymeasures
- pandas
- pickle
- matplotlib
- seaborn
These packages can be installed with the following command:
pip install -r requirements.txt
INFO
More details are provided in the Jupyter notebooks and the code comments.
The project is made up of the following files and directories:
- Two Jupyter notebooks guide the execution of the main functionalities. `Navigation_generate_data.ipynb` can be used to generate data, with arbitrary parameters and different versions of the task. `Navigation_alanysis.ipynb` provides graphical visualization of the results, reproducing in particular the figures of the article.
- Nine Python files correspond to the modules called by the Jupyter notebooks.
- The folder `Data/` is the location where generated data are stored. It already contains most of the data files required to plot the figures from the Jupyter notebooks. File formats are either `.csv` (for dataframes) or `.pickle` (for dictionaries, arrays, lists).
  - The sub-folder `Data_indiv/` specifically contains detailed data for 100 individual artificial agents.
- The folder `Figures/` is the location where generated figures can be saved. It already contains the file `map1.pgm`, necessary to plot one type of figure representing the environment.
- Three `.txt` files contain the transition matrices which define the properties of the environment.
- The folder `data+code_2generate_the_paper_figures/` contains all the data and scripts needed to generate the figures of Section 3 of the paper.
Except for the module `parameters_MF_MB.py`, all the modules only contain functions (no scripts), which are called by the Jupyter notebooks. More details about these modules and functions are available in the code documentation (accessed via `help()`).
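For instance, from a Python session started in the repository root (a minimal illustration; `algorithms_MF_MB` is one of the modules listed below):

```python
# Minimal illustration of accessing the built-in documentation of a module.
import algorithms_MF_MB
help(algorithms_MF_MB)  # prints the module documentation and the docstrings of its functions
```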
- `parameters_MF_MB.py`: Defines the parameters of the simulation and the transition matrices. All the parameters are collected in a dictionary, which is provided to the main functions as a default argument (as the module is imported in the preamble of all other files); this pattern is illustrated in the sketch after this list.
- `algorithms_MF_MB.py`: Implements the reinforcement learning procedure and the different replay strategies, necessary to perform one trial (behavior + replay).
- `simulations_MF_MB.py`: Generates simulations of `n_individuals` (100) agents over `n_trials` (50) trials, in a given environmental condition (deterministic/stochastic), and saves the data in the appropriate folder.
- `analyzes_MF_MB.py`: Extracts relevant features of the data: computes summary statistics, performs statistical analyses, etc.
- `figures_MF_MB`: Generates the main figures of the article.
- `figures_indiv`, `figures_pop`, `figures_qvalue_map`, `figures_utils`: Other graphical functions to display results more flexibly in exploratory investigations.
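The shared-parameter-dictionary design mentioned for `parameters_MF_MB.py` can be pictured with a generic sketch (names and values below are illustrative, not those of the actual module):

```python
# Generic sketch of the "single parameter dictionary" pattern used across the modules
# (names and values are illustrative, not those of parameters_MF_MB.py).
params = {
    "n_individuals": 100,  # number of simulated agents
    "n_trials": 50,        # trials per agent
    "alpha": 0.1,          # learning rate
    "gamma": 0.9,          # discount factor
}

def run_simulation(replay_type, params=params):
    """Each main function receives the shared dictionary as a default argument,
    so the notebooks can override single entries without redefining everything."""
    print(f"Simulating {params['n_individuals']} agents with {replay_type} replay")

run_simulation("backward")
run_simulation("shuffled", params={**params, "n_trials": 10})
```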