This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

Tf2 migration #430

Open · wants to merge 153 commits into base: master
Changes from all commits
153 commits
ed96453
Added WIP warning in the README.md file header
Jul 7, 2019
0c26e7c
Added tf2 components folder. The folder contains the same files as te…
Jul 14, 2019
0a49ea4
Added __init__ files to tf2_components subfolders
Jul 16, 2019
c9429d7
Renaming tf2_components folder to tensorflow_components
Jul 16, 2019
4261f32
Nothing important in this commit. Mostly blanks were inserted in order…
Jul 16, 2019
c262ef0
Started input embedder migration to Keras
Jul 24, 2019
2d71311
Input embedder migrated to tf2
Jul 28, 2019
4a162af
Middleware migrated to tf2. Added DnnModel class, this class instanti…
Jul 29, 2019
ecc07bb
Initial version of general_network migrated to tf2. The network is im…
Jul 30, 2019
1fe2719
Forward pass is running, not tested for correctness. Training is not …
Aug 1, 2019
f1424be
Separated the loss function from the DNN head. Loss is now in a separ…
Aug 5, 2019
8c362f4
get_model() in general network creates a new model and returns it, in…
Aug 6, 2019
8199a11
Optimizers migrated to Keras optimizers. Optimizers are now separated…
Aug 6, 2019
cac4717
Learning rate scheduler moved to optimizer. Model is built before inp…
Aug 7, 2019
668d493
Added activation functions, batch normalization and dropout layers to…
Aug 7, 2019
64d6142
Added input clipping, scaling and offset reduction (translation) in t…
Aug 7, 2019
135502b
Input embedder and vector embedder migration is completed
Aug 7, 2019
2738f96
Added batchnorm, dropout and activation functions to middleware
Aug 8, 2019
8759fb8
Loss was removed from DNN head
Aug 8, 2019
31dbcbd
get_input_embedder is called with explicit vector embedder parameters…
Aug 8, 2019
69aa86b
get_middleware is called with explicit vector embedder parameters and…
Aug 11, 2019
296f8fc
Cleaning network head constructor
Aug 11, 2019
7699dd5
get_output_head is called with explicit vector embedder parameters an…
Aug 12, 2019
ff751e4
Checkpoint before mxnet enablement
Aug 13, 2019
cc628b5
Rolled back API changes to re-enable mxnet
Aug 13, 2019
d3d84e6
Single worker model is instantiated in a separate class. Need to chec…
Aug 13, 2019
1b14ff6
general_network migrated to tf2 (mostly)
Aug 14, 2019
ccbc182
Deleting redundant files
Aug 14, 2019
2cb3e12
GeneralTensorFlowNetwork extends TensorFlowArchitecture. We should co…
Aug 15, 2019
dea5335
Fixed division bug and embedder name
Aug 15, 2019
7c450b1
Full batch forward through the DNN model
Aug 15, 2019
d1cef10
Changed model inputs to one input for each embedder. Parallel predict…
Aug 19, 2019
30ca9cb
Loss is calculated on input batch
Aug 20, 2019
4387e3a
Gradient of the loss is calculated w.r.t model parameters
Aug 20, 2019
43620d8
Gradient accumulation converted to tf2
Aug 21, 2019
70b266d
Model weights are updated with optimization step
Aug 21, 2019
739b5bf
DQN is running. Not tested.
Aug 21, 2019
7a3dd29
Shallow testing on DQN shows convergence
Aug 21, 2019
2efc676
Cosmetic changes
Aug 21, 2019
1fb347e
Removed generalLoss class. Loss class is defined in the same file as …
Aug 21, 2019
c542f61
Added temporary preparation for gradient rescaling, needs testing tha…
Aug 22, 2019
4d625b0
Removed gradient rescaling wrapper class
Aug 22, 2019
d09a4c1
get_input_embedder, get_middleware and get_output_head are member fun…
Aug 22, 2019
bcd38e4
DNN Model moved to a separate module
Aug 22, 2019
ccdf110
Added losses directory with loss base class
Aug 22, 2019
4856501
Starting loss separation from head.
Aug 27, 2019
967b7f3
DNN Head is separated from loss function
Aug 27, 2019
85ce5e6
Modules renaming
Aug 27, 2019
a2ce044
Removed loss from head module
Aug 27, 2019
bbef6cc
Image embedder converted to tf1 compatible version
Aug 28, 2019
a1c2587
Added small code snippet that enables running the script from command…
Aug 29, 2019
e09e4b1
GalN code review fixes
Aug 29, 2019
a1062a3
Adding Gal Leibovitch code review comments
Sep 1, 2019
99407ae
Adding Gal Leibovitch code review comments
Sep 1, 2019
3716e75
Added gradient clipping
Sep 1, 2019
c3e98a1
Removed input layer. DNN inputs are cast to 32-bit precision. This …
Sep 2, 2019
d044c67
Atari DQN is showing signs of convergence on breakout environment. Ne…
Sep 2, 2019
a22eb3d
Saver save and restore partly migrated, only saving the weights, not …
Sep 4, 2019
6f76420
Removed concurrency support in order to avoid broken pipe error
Sep 8, 2019
0f94381
Fixed memory leak bug
Sep 9, 2019
62c0084
Added support for configurable input embedder and configurable middle…
Sep 10, 2019
7c8c8b6
Wrapped tensor dense layer for mxnet compatibility
Sep 10, 2019
dfc8205
Starting clipped PPO infrastructure
Sep 11, 2019
afeb809
Added value loss
Sep 11, 2019
6ca380b
Adding an RL loss base class on top of the Keras loss class
Sep 12, 2019
81e6bc3
Renamed output_heads to heads
Sep 12, 2019
790a148
Adding an RL loss base class on top of the Keras loss class
Sep 12, 2019
480c039
Added a loss forward function. This is where each agent implements it…
Sep 15, 2019
cf5e44d
loss_forward is implemented in parent class, each child should implem…
Sep 15, 2019
4fee6e1
Temporary fix: Changed /presets/Mujoco_ClippedPPO to import tf dense …
Sep 15, 2019
852479c
Removed heads that are not implemented in tensorflow 2
Sep 15, 2019
cd586a3
dummy_model_inputs moved to dnn_model
Sep 16, 2019
99b152c
Splitting the targets into targets per loss
Sep 18, 2019
9d9627f
Head Loss extends Keras layer and not Keras loss. This is due to call…
Sep 18, 2019
d51fa30
Added output schema to the head loss output
Sep 18, 2019
4fbe85a
Added output schema to the head loss output
Sep 18, 2019
ee0e03e
Added output schema to the head loss output
Sep 19, 2019
eb87303
Merge branch 'tf2_migration' of https://github.com/NervanaSystems/coa…
Sep 19, 2019
4e6c8c7
Bug fix in tensorflow architecture. Dimensionality check added
Sep 19, 2019
ef888f2
Added _num_outputs property to output head to be compared against num…
Sep 22, 2019
baa32f9
Changed loss output to dictionary, added a check on the loss outputs
Sep 23, 2019
85d0300
Clipped PPO is running. Gradient tape is not working as expected with…
Sep 24, 2019
4400f50
Checkpoint
Sep 24, 2019
7c330fb
Starting to add support for stochastic policy
Sep 25, 2019
c400d9a
Agent is outputting probability for each action instead of mean and ST…
Sep 25, 2019
f03e065
Changed PPO agent to output probability distribution
Sep 26, 2019
d1d8f0a
Dimensions Bug fix in PPO loss
Sep 26, 2019
b4d13ce
PPO learning is extremely slow
Sep 26, 2019
4504d8d
DQN is learning OK. Memory usage is very high
Sep 26, 2019
2a600b8
Started learning with PPO, reaching reward of 200 on the inverted pen…
Oct 2, 2019
41cdd05
Reaching evaluation reward of 250 on inverted pendulum and then perfo…
Oct 2, 2019
7d770d1
Removed loss from model.compile
Oct 3, 2019
cbdb07d
Actor is outputting log std instead of std in order to constrain po…
Oct 3, 2019
9c60056
Detaching the value of the std from the network output. Reaching rewa…
Oct 7, 2019
f603099
Input shape is derived from embedder parameters instead of from inst…
Oct 10, 2019
6aa49bd
Added network wrapper to hold network inputs and losses
Oct 13, 2019
d6fdef1
Added functional keras wrapper on top layer. mirrored_strategy does n…
Oct 13, 2019
35b5f9a
Fixed Bug in value loss calculation. Reaching reward of 1000, not stable
Oct 15, 2019
fe61c16
Policy network std is a variable and not network output. Reaching 100…
Oct 15, 2019
deb08fc
Changed PPO Head to functional form. Reaching reward of 1000
Oct 17, 2019
f77fb18
Changed model_wrapper class to create_model function
Oct 17, 2019
e779725
Changed actor critic to functional form. DQN gives errors
Oct 17, 2019
89f7834
DQN bug fixed, PPO is buggy
Oct 18, 2019
18da38a
Sanity PPO reaching 1000
Oct 18, 2019
fc4eaa4
Functional top model DQN and PPO running
Oct 18, 2019
78ec6dd
Changed head value output shape and value loss accordingly
Oct 19, 2019
604592d
Reverted unnecessary changes from initial commit. PPO reaches 1000
Oct 20, 2019
2c5eeb6
Changed back ppo agent to the original fetches form
Oct 20, 2019
5e00670
Starting single network migration to functional form
Oct 21, 2019
bba9ed2
Both DQN and PPO are running
Oct 21, 2019
0f508cc
Removed SingleDNN PPO Head models subclass implementation. PPO Reward …
Oct 22, 2019
2eef34e
Changed heads to functional form. PPO 1000
Oct 22, 2019
06569ce
Added functional losses
Oct 23, 2019
bd90a32
PPO is working, everything else is broken. Unintentionally shared par…
Nov 20, 2019
3f55ca6
Rolled back unintentional change to mxnet
Nov 21, 2019
99d7e67
Back to old accumulate_gradients, not verified
Nov 21, 2019
d149568
Should verify performance. Both DQN and PPO are training. Should imple…
Nov 26, 2019
eb88df4
Changed keras Dense to framework agnostic dense in PPO
Nov 26, 2019
be50f36
Changed keras Dense to framework agnostic dense in PPO. Should change…
Nov 26, 2019
254a9fa
Humanoid is running. Inverted pendulum is checked with clipped PPO
Nov 27, 2019
1c24b6d
Humanoid reaches benchmark. Number of timesteps is OK. Wall clock tim…
Dec 1, 2019
e32b608
1. PEP 8 fixes and removed stale comments. 2. DQN runs with convolution…
Dec 2, 2019
6de3d19
Removed loss function for head. Loss is implemented only via loss class
Dec 2, 2019
e4a5630
Inverted Pendulum verified
Dec 4, 2019
e85fbf9
Reverted coach main to original version. Renaming and comments
Dec 4, 2019
4a499de
Checkpoint, before fixing additional input
Dec 4, 2019
82cd05f
Report generation
Dec 4, 2019
72d16dd
Presentation
Dec 5, 2019
5b6ffe8
Humanoid multiple seeds experiment done on this checkpoint
Dec 15, 2019
24fc264
Removed output schema
Dec 15, 2019
e6298e6
Changed loss input schema to split inputs based on trainable and non …
Dec 15, 2019
13bfe86
Updated copyright header in TF2 files
Dec 15, 2019
4bb017e
Each head_loss is responsible for extracting its own args. Inverted p…
Dec 16, 2019
8a19944
Updated input schema and input checking
Dec 16, 2019
1e10728
Removed PolicyHead and PPO VHead
Dec 16, 2019
086e5b8
Added comments and typings
Dec 16, 2019
c1c0d9b
Removed LSTM middleware, not supported in TF2
Dec 16, 2019
373c738
Added documentation
Dec 16, 2019
e5be943
Added documentation
Dec 16, 2019
fe47023
Added wrappers on dense layers and changed casting. Should test for c…
Dec 17, 2019
d87461f
Loss type is not hard-coded, can be configured from agent params
Dec 17, 2019
7f60eb1
QHead and VHead are generated via function call, not class constructor
Dec 17, 2019
e082a77
Added GPU support
Dec 17, 2019
54fc45a
Checkpoint before running experiments
Dec 17, 2019
e545d76
Comments and typing hints were inserted
Dec 18, 2019
3a00afa
Restrict TensorFlow to only allocate 2GB of memory on the first GPU, …
Dec 22, 2019
bd87a2d
Added benchmark for humanoid_clipped_pp
Dec 25, 2019
c576eed
Added benchmark for DQN pong
Dec 29, 2019
4501581
Added benchmark for DQN breakout
Dec 29, 2019
12a40e0
Added benchmark for inverted pendulum, only have 2.5 M timesteps
Dec 29, 2019
6a5fa7b
Added benchmark for space invaders DQN
Dec 30, 2019
e6a7e52
Added benchmarks for clipped PPO double pendulum and half cheetah
Jan 8, 2020
70b7ad4
Added ant clipped ppo benchmark
Jan 12, 2020
3 changes: 2 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
# Coach
# Warning:
## This branch of Coach is WIP for migration to tf2 and should not be checked out

[![CI](https://img.shields.io/circleci/project/github/NervanaSystems/coach/master.svg)](https://circleci.com/gh/NervanaSystems/workflows/coach/tree/master)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/NervanaSystems/coach/blob/master/LICENSE)
Binary file added benchmarks/clipped_ppo/ant_clipped_ppo_tf2.png
Binary file added benchmarks/clipped_ppo/ant_clipped_ppo_tf2_3.png
Binary file added benchmarks/dqn/breakout_dqn_tf2.png
Binary file added benchmarks/dqn/pong_dqn_tf2.png
Binary file added benchmarks/dqn/space_invaders_dqn_tf2.png
13 changes: 9 additions & 4 deletions rl_coach/agents/clipped_ppo_agent.py
@@ -202,10 +202,15 @@ def train_network(self, batch, epochs):
'entropy': []
}

fetches = [self.networks['main'].online_network.output_heads[1].kl_divergence,
self.networks['main'].online_network.output_heads[1].entropy,
self.networks['main'].online_network.output_heads[1].likelihood_ratio,
self.networks['main'].online_network.output_heads[1].clipped_likelihood_ratio]
# fetches = [self.networks['main'].online_network.output_heads[1].kl_divergence,
# self.networks['main'].online_network.output_heads[1].entropy,
# self.networks['main'].online_network.output_heads[1].likelihood_ratio,
# self.networks['main'].online_network.output_heads[1].clipped_likelihood_ratio]

fetches = [(1, 'kl_divergence'),
(1, 'entropy'),
(1, 'likelihood_ratio'),
(1, 'clipped_likelihood_ratio')]

# TODO-fixme if batch.size / self.ap.network_wrappers['main'].batch_size is not an integer, we do not train on
# some of the data
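The hunk above replaces direct references to TF1 graph tensors with plain (head_index, output_name) tuples, so the agent can describe which head statistics to fetch without holding on to session tensors. A minimal sketch of how such tuples might be resolved at training time, assuming each head exposes the named quantities as attributes; the helper name and structure below are illustrative, not the PR's actual code:

```python
# Illustrative only: resolve (head_index, output_name) fetch tuples against a
# list of head objects. This is an assumed helper, not the one the PR adds.
from typing import Any, List, Sequence, Tuple


def resolve_fetches(heads: Sequence[Any], fetches: List[Tuple[int, str]]) -> List[Any]:
    """Look up each requested output on the head it refers to."""
    return [getattr(heads[head_index], output_name)
            for head_index, output_name in fetches]
```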
7 changes: 3 additions & 4 deletions rl_coach/architectures/architecture.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2017 Intel Corporation
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -14,15 +14,14 @@
# limitations under the License.
#

from typing import Any, Dict, List, Tuple

import numpy as np

from typing import Any, Dict, List, Tuple
from rl_coach.base_parameters import AgentParameters
from rl_coach.saver import SaverCollection
from rl_coach.spaces import SpacesDefinition



class Architecture(object):
@staticmethod
def construct(variable_scope: str, devices: List[str], *args, **kwargs) -> 'Architecture':
2 changes: 1 addition & 1 deletion rl_coach/architectures/embedder_parameters.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2017 Intel Corporation
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion rl_coach/architectures/head_parameters.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2017 Intel Corporation
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion rl_coach/architectures/layers.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2017 Intel Corporation
# Copyright (c) 2019 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
693 changes: 693 additions & 0 deletions rl_coach/architectures/legacy_tf_components/architecture.py

Large diffs are not rendered by default.

103 changes: 103 additions & 0 deletions rl_coach/architectures/legacy_tf_components/distributed_tf_utils.py
@@ -0,0 +1,103 @@
#
# Copyright (c) 2017 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from typing import Tuple

import tensorflow as tf


def create_cluster_spec(parameters_server: str, workers: str) -> tf.train.ClusterSpec:
"""
Creates a ClusterSpec object representing the cluster.
:param parameters_server: comma-separated list of hostname:port pairs to which the parameter servers are assigned
:param workers: comma-separated list of hostname:port pairs to which the workers are assigned
:return: a ClusterSpec object representing the cluster
"""
# extract the parameter servers and workers from the given strings
ps_hosts = parameters_server.split(",")
worker_hosts = workers.split(",")

# Create a cluster spec from the parameter server and worker hosts
cluster_spec = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})

return cluster_spec


def create_and_start_parameters_server(cluster_spec: tf.train.ClusterSpec, config: tf.ConfigProto=None) -> None:
"""
Create and start a parameter server
:param cluster_spec: the ClusterSpec object representing the cluster
:param config: the tensorflow config to use
:return: None
"""
# create a server object for the parameter server
server = tf.train.Server(cluster_spec, job_name="ps", task_index=0, config=config)

# wait for the server to finish
server.join()


def create_worker_server_and_device(cluster_spec: tf.train.ClusterSpec, task_index: int,
use_cpu: bool=True, config: tf.ConfigProto=None) -> Tuple[str, tf.device]:
"""
Creates a worker server and a device setter used to assign the worker's operations to
:param cluster_spec: a ClusterSpec object representing the cluster
:param task_index: the index of the worker task
:param use_cpu: if use_cpu=True, all the agent operations will be assigned to a CPU instead of a GPU
:param config: the tensorflow config to use
:return: the target string for the tf.Session and the worker device setter object
"""
# Create and start a worker
server = tf.train.Server(cluster_spec, job_name="worker", task_index=task_index, config=config)

# Assign ops to the local worker
worker_device = "/job:worker/task:{}".format(task_index)
if use_cpu:
worker_device += "/cpu:0"
else:
worker_device += "/device:GPU:0"
device = tf.train.replica_device_setter(worker_device=worker_device, cluster=cluster_spec)

return server.target, device


def create_monitored_session(target: tf.train.Server, task_index: int,
checkpoint_dir: str, checkpoint_save_secs: int, config: tf.ConfigProto=None) -> tf.Session:
"""
Create a monitored session for the worker
:param target: the target string for the tf.Session
:param task_index: the task index of the worker
:param checkpoint_dir: a directory path where the checkpoints will be stored
:param checkpoint_save_secs: number of seconds between checkpoint saves
:param config: the tensorflow configuration (optional)
:return: the session to use for the run
"""
# we chose the first task to be the chief
is_chief = task_index == 0

# Create the monitored session
sess = tf.train.MonitoredTrainingSession(
master=target,
is_chief=is_chief,
hooks=[],
checkpoint_dir=checkpoint_dir,
save_checkpoint_secs=checkpoint_save_secs,
config=config,
log_step_count_steps=0 # disable logging of steps to avoid TF warning during inference
)

return sess
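
A short usage sketch for the legacy TF1 helpers above; the host addresses, task index, and checkpoint settings are placeholder values rather than anything taken from this PR, and the code requires a TF1 runtime (these APIs do not exist in TF2):

```python
# Sketch: wiring the helpers into a parameter-server/worker setup.
import tensorflow as tf

from rl_coach.architectures.legacy_tf_components.distributed_tf_utils import (
    create_cluster_spec, create_worker_server_and_device, create_monitored_session)

cluster_spec = create_cluster_spec(parameters_server="localhost:2222",
                                   workers="localhost:2223,localhost:2224")

# A parameter-server process would instead call:
#   create_and_start_parameters_server(cluster_spec)  # blocks until the job ends

# On a worker process (here the chief, task_index=0):
target, device = create_worker_server_and_device(cluster_spec, task_index=0, use_cpu=True)
with tf.device(device):
    # build the model under the replica device setter
    global_step = tf.train.get_or_create_global_step()

sess = create_monitored_session(target=target, task_index=0,
                                checkpoint_dir="/tmp/coach_checkpoints",
                                checkpoint_save_secs=600)
```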

5 changes: 5 additions & 0 deletions rl_coach/architectures/legacy_tf_components/embedders/__init__.py
@@ -0,0 +1,5 @@
from .image_embedder import ImageEmbedder
from .vector_embedder import VectorEmbedder
from .tensor_embedder import TensorEmbedder

__all__ = ['ImageEmbedder', 'VectorEmbedder', 'TensorEmbedder']
157 changes: 157 additions & 0 deletions rl_coach/architectures/legacy_tf_components/embedders/embedder.py
@@ -0,0 +1,157 @@
#
# Copyright (c) 2017 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from typing import List, Union, Tuple
import copy

import numpy as np
import tensorflow as tf

from rl_coach.architectures.tensorflow_components.layers import BatchnormActivationDropout, convert_layer, Dense
from rl_coach.base_parameters import EmbedderScheme, NetworkComponentParameters

from rl_coach.core_types import InputEmbedding
from rl_coach.utils import force_list


class InputEmbedder(object):
"""
An input embedder is the first part of the network, which takes the input from the state and produces a vector
embedding by passing it through a neural network. The embedder will mostly be input type dependent, and there
can be multiple embedders in a single network
"""
def __init__(self, input_size: List[int], activation_function=tf.nn.relu,
scheme: EmbedderScheme=None, batchnorm: bool=False, dropout_rate: float=0.0,
name: str= "embedder", input_rescaling=1.0, input_offset=0.0, input_clipping=None, dense_layer=Dense,
is_training=False):
self.name = name
self.input_size = input_size
self.activation_function = activation_function
self.batchnorm = batchnorm
self.dropout_rate = dropout_rate
self.input = None
self.output = None
self.scheme = scheme
self.return_type = InputEmbedding
self.layers_params = []
self.layers = []
self.input_rescaling = input_rescaling
self.input_offset = input_offset
self.input_clipping = input_clipping
self.dense_layer = dense_layer
if self.dense_layer is None:
self.dense_layer = Dense
self.is_training = is_training

# layers order is conv -> batchnorm -> activation -> dropout
if isinstance(self.scheme, EmbedderScheme):
self.layers_params = copy.copy(self.schemes[self.scheme])
self.layers_params = [convert_layer(l) for l in self.layers_params]
else:
# if scheme is specified directly, convert to TF layer if it's not a callable object
# NOTE: if layer object is callable, it must return a TF tensor when invoked
self.layers_params = [convert_layer(l) for l in copy.copy(self.scheme)]

# we allow adding batchnorm, dropout or activation functions after each layer.
# The motivation is to simplify the transition between a network with batchnorm and a network without
# batchnorm to a single flag (the same applies to activation function and dropout)
if self.batchnorm or self.activation_function or self.dropout_rate > 0:
for layer_idx in reversed(range(len(self.layers_params))):
self.layers_params.insert(layer_idx+1,
BatchnormActivationDropout(batchnorm=self.batchnorm,
activation_function=self.activation_function,
dropout_rate=self.dropout_rate))

def __call__(self, prev_input_placeholder: tf.placeholder=None) -> Tuple[tf.Tensor, tf.Tensor]:
"""
Wrapper for building the module graph including scoping and loss creation
:param prev_input_placeholder: the input to the graph
:return: the input placeholder and the output of the last layer
"""
with tf.variable_scope(self.get_name()):
if prev_input_placeholder is None:
self.input = tf.placeholder("float", shape=[None] + self.input_size, name=self.get_name())
else:
self.input = prev_input_placeholder
self._build_module()

return self.input, self.output

def _build_module(self) -> None:
"""
Builds the graph of the module
This method is called early on from __call__. It is expected to store the graph
in self.output.
:return: None
"""
# NOTE: for image inputs, we expect the data format to be uint8 in order to be memory efficient. We chose not
# to implement the rescaling as an input filter (filters.observation.observation_filter), as this would have
# caused the input to the network to be float, which is 4x more expensive in memory,
# thus making each saved transition stored in the memory 4x more expensive as well.

input_layer = self.input / self.input_rescaling
input_layer -= self.input_offset
# clip the input using the given range
if self.input_clipping is not None:
input_layer = tf.clip_by_value(input_layer, self.input_clipping[0], self.input_clipping[1])

self.layers.append(input_layer)

for idx, layer_params in enumerate(self.layers_params):
self.layers.extend(force_list(
layer_params(input_layer=self.layers[-1], name='{}_{}'.format(layer_params.__class__.__name__, idx),
is_training=self.is_training)
))

self.output = tf.contrib.layers.flatten(self.layers[-1])

@property
def input_size(self) -> List[int]:
return self._input_size

@input_size.setter
def input_size(self, value: Union[int, List[int]]):
if isinstance(value, np.ndarray) or isinstance(value, tuple):
value = list(value)
elif isinstance(value, int):
value = [value]
if not isinstance(value, list):
raise ValueError((
'input_size expected to be a list, found {value} which has type {type}'
).format(value=value, type=type(value)))
self._input_size = value

@property
def schemes(self):
raise NotImplementedError("Inheriting embedder must define schemes matching its allowed default "
"configurations.")

def get_name(self) -> str:
"""
Get a formatted name for the module
:return: the formatted name
"""
return self.name

def __str__(self):
result = ['Input size = {}'.format(self._input_size)]
if self.input_rescaling != 1.0 or self.input_offset != 0.0:
result.append('Input Normalization (scale = {}, offset = {})'.format(self.input_rescaling, self.input_offset))
result.extend([str(l) for l in self.layers_params])
if not self.layers_params:
result.append('No layers')

return '\n'.join(result)
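
A minimal sketch of how InputEmbedder above is meant to be extended, assuming a TF1 runtime and the EmbedderScheme values and Dense wrapper imported at the top of the file; the subclass name and layer sizes are illustrative only, not the embedders defined elsewhere in this PR:

```python
# Illustrative subclass: the only thing an inheriting embedder must provide is
# the `schemes` property mapping each EmbedderScheme to a list of layer constructors.
from rl_coach.architectures.legacy_tf_components.embedders.embedder import InputEmbedder
from rl_coach.architectures.tensorflow_components.layers import Dense
from rl_coach.base_parameters import EmbedderScheme


class MinimalVectorEmbedder(InputEmbedder):
    @property
    def schemes(self):
        return {
            EmbedderScheme.Empty: [],
            EmbedderScheme.Shallow: [Dense(64)],
            EmbedderScheme.Medium: [Dense(256)],
            EmbedderScheme.Deep: [Dense(128), Dense(128), Dense(128)],
        }


# Calling the embedder builds its TF1 graph and returns the input placeholder
# together with the flattened embedding tensor.
embedder = MinimalVectorEmbedder(input_size=[10], scheme=EmbedderScheme.Medium)
input_placeholder, embedding = embedder()
```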