A3C LSTM GA for language grounding #7

Open
beduffy wants to merge 98 commits into master

Conversation

@beduffy (Contributor) commented Feb 23, 2019

  • Added A3C_LSTM_GA model from DeepRL-Grounding by Devendra Chaplot
  • Contains cozmo_env PR
  • Added checkpointing and experiment IDs of models
  • Added tensorboardX which outputs to experiment folder with config+argparse settings
  • Added avg_episode_return
  • Added random actions on init
  • Added 4 natural language tasks with variants specified by config files (NaturalLanguageLookAtObjectTask, NaturalLanguageNavigateToObjectTask, NaturalLanguagePickUpObjectTask, NaturalLanguagePickUpMultipleObjectTask)
  • Added test case for 2 natural language tasks
  • Added logic to choose between the A3C and A3C_LSTM_GA models depending on whether the task has natural language instructions (see the sketch after this list)
  • Added 3 config files and renamed main config to default_config.json
  • Created task_utils.py for visualisation, preprocessing and reward functions
  • Fixed config overwrite bug
  • Fixed many other bugs I've forgotten about (e.g. one test case was broken)
  • Renamed PickUp to PickUpTask
  • Added ability to remove lookupdown_actions, moveupdown_actions and the Put action individually (pickup tasks need Pickup but not Put)
  • Added ai2thor_examples.py for previous test cases that were purely just ai2thor run examples
  • Added inbuilt_interactive_mode.py for script for exploring ai2thor with unity build path and .interact()
  • Added pdb_interactive_and_check_bbox_reward_functions.py for testing reward functions and printing bboxes using pdb debugging
  • VizDoom integration as in the original DeepRL-Grounding
  • Random scene reset choice and setting from config
  • ASCII art figures for architectures within model.py
  • Documentation
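A minimal sketch of the model-selection logic described above. ActorCritic and A3C_LSTM_GA follow this PR's model.py, but the A3C_LSTM_GA constructor arguments shown here are assumptions; check model.py for the real signature:

if env.task.task_has_language_instructions:
    # language-grounded tasks also need the instruction vocabulary and max episode length
    model = A3C_LSTM_GA(env.observation_space.shape[0], env.action_space.n,
                        args.frame_dim, len(env.task.word_to_idx),
                        args.max_episode_length)
else:
    model = ActorCritic(env.observation_space.shape[0], env.action_space.n,
                        args.frame_dim)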

beduffy and others added 30 commits September 30, 2018 00:13
…with bounding boxes and visualising them and checking if we are close and focusing on an object

Added sample event metadata json for easy viewing
Added scene id and current object type params. Cleared up some stuff. Fixed bug in total reward.
…f colorBounds

This avoided "the colored mug changing color as you place it and it not being in color_to_object_id matrix" bug. Put everything into main
Added interaction param to env which removes actions which use objects. Removed some code.
…crowave task

Saved files by episode number rather than steps.
TODO add more general dense reward for groups of objects
Fixed 2 bugs (time embedding index failure and action output) and allowed colour channels and higher resolution. Grayscale option to environment
renamed a3c_lst_ga model file. Had to make lots of things a double though.
Stopped creating environment in main file
Renamed main file
Created environment guid and saved plots into folder identified by that.
Added todos. Changed A3C to use 128x128 grayscale
Don't spend your life looking at numbers. Automate your experiments
…w model to deal with it.

For 3 mugs: top_left_screen_x, top_left_screen_y, width, height, x, y, z, 3d distance, class.
Removed legend in one plot
Append all last actions to list. Added done break condition to wrapper random walk
Made bounding crosshair function one line. Added todos
task == 6 and fixed up crosshair check (centre crosshair has to be in
the bounding box centre of the object).
Added success return value to avoid calculating done twice (can still do
 better refactoring)
Show coloured preprocess to 128 resolution (still works in 0-1 range)
Print average of p_losses and v_losses. Two new charts made showing these
print last reward on done
interaction=False
MOVEMENT_REWARD_DISCOUNT up one order.
…object type

Added better test of task to get microwave or cup and 2 embeddings.
Unsqueeze and expand in model (will check in future if is right with sentences)
many todos
CUDA device to 0 but no effect of course.
Stopped rendering depth and class to get 5-10 or so FPS faster
Fixed bug with checking to checkpoint at every step

Many todos
… timing and benchmarking

print_all and print for rank = 0.
some todos, commented out eids
Adapted from RL-adventure repo. First running version, although results look odd.
There was a bug in the channel order returned from the environment's step function.
Rainbow DQN code also adapted to use the fix
…-experiment-ids.txt, model.py, check_latest_plots)

Moved sample event metadata to gym_ai2thor folder
Similar to the calculate_lstm_input_size_for_A3C function, except that there are only
3 conv layers, the kernel_size, stride and number of channels vary, and there is no padding;
therefore these are hardcoded. Check the A3C_LSTM_GA class for these numbers.
Also returns a tuple representing (width, height, num_output_filters) instead of a size
Member

As in its equivalent for A3C, it returns a tuple bla

Contributor Author

A3C's one returns the size not the tuple

height = (height - 4) // 2 + 1
height = (height - 4) // 2 + 1

return width, height, 64
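For reference, a minimal sketch of what such a helper could look like, assuming the three conv layers use (kernel, stride) pairs of (8, 4), (4, 2), (4, 2) with 64 output filters and no padding. The function name and exact layer parameters here are assumptions; the real ones are hardcoded in the A3C_LSTM_GA class:

def calculate_conv_output_size_for_a3c_lstm_ga(width, height):
    # With no padding, each conv layer shrinks a dimension to (in - kernel) // stride + 1
    for kernel, stride in [(8, 4), (4, 2), (4, 2)]:  # assumed layer params
        width = (width - kernel) // stride + 1
        height = (height - kernel) // stride + 1
    return width, height, 64  # (width, height, num_output_filters)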

def normalized_columns_initializer(weights, std=1.0):
@fernandotorch (Member) Mar 9, 2019

Why do I need column normalization? Add an explanation here

Member

Also elaborate more on this particular type of normalization being used, as opposed to other ones
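For context, the widely used pytorch-a3c implementation this appears to follow defines the initializer roughly like this (a sketch; the full body isn't shown in this excerpt):

import torch

def normalized_columns_initializer(weights, std=1.0):
    # Sample Gaussian noise, then rescale so each output unit's weight vector
    # has L2 norm equal to std. Keeping the initial policy logits and value
    # estimate small and on the same scale regardless of layer width is the
    # usual motivation in A3C code.
    out = torch.randn(weights.size())
    out *= std / torch.sqrt(out.pow(2).sum(1, keepdim=True))
    return out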

return out


def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
Member

I see that you are initializing layers differently, but what are the magic numbers like 6? Can you make a docstring for the function?
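For what it's worth, in the common A3C initialization code this appears to be based on, the 6 comes from the Glorot/Xavier uniform bound sqrt(6 / (fan_in + fan_out)); a hedged sketch of the full function:

import numpy as np

def weights_init(m):
    # Glorot/Xavier-style uniform init; the "magic" 6 is the Glorot bound numerator
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in = np.prod(weight_shape[1:4])                      # in_channels * kH * kW
        fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]   # kH * kW * out_channels
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)
    elif classname.find('Linear') != -1:
        weight_shape = list(m.weight.data.size())
        fan_in, fan_out = weight_shape[1], weight_shape[0]
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)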

# Gated-Attention layers
self.attn_linear = nn.Linear(self.gru_hidden_size, 64)

# Time embedding layer, helps in stabilizing value prediction
Member

Could you explain more to me about the time embedding layer and why it stabilizes value?

Contributor Author

I don't actually know "exactly" how, but let's work it out here:

If an episode is 1000 steps long, the step number from 0 to 999 gets passed in as tx, and an extra self.time_emb_dim dimensions are input to the critic and actor. That helps the agent use time to its advantage (e.g. rush when it's running out of time), or at least know that the minimum reward it can get in 1000 steps with a movement reward of -0.001 is -1.0. In sparse reward settings the agent can therefore work out that, with only 3 steps left and no cup in sight, its value prediction will be accurate towards the end.

How about something like that?

@beduffy (Contributor Author) Mar 9, 2019

I don't see exactly how it can help the actor though. Should you act drastically differently with 30 steps left than with 300? Maybe if you have to choose between going to the closer cup or not

@beduffy (Contributor Author) Mar 18, 2019

Yusun helped me figure this out. Without the time input, for the same image state the value head can't learn to predict less value when there is 1 step left in the episode than when there are 900. Added this comment:

# Time embedding layer, helps in stabilizing value prediction e.g. if only 1 step is left
# in episode not much reward can be earned vs the same image state with 500 steps left
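A sketch of how the time embedding is typically wired into this architecture, with names assumed to mirror the original DeepRL-Grounding model (exact dimensions may differ in this PR):

# in __init__ (nn is torch.nn):
self.time_emb_dim = 32
self.time_emb_layer = nn.Embedding(max_episode_length + 1, self.time_emb_dim)

# in forward(), tx is the current step index within the episode:
time_emb = self.time_emb_layer(tx)
x = torch.cat((x, time_emb.view(-1, self.time_emb_dim)), 1)
value = self.critic_linear(x)          # value head now knows how much time is left
policy_logits = self.actor_linear(x)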

x_attention = x_attention.unsqueeze(2).unsqueeze(3)
x_attention = x_attention.expand(1, self.num_output_filters, self.output_height,
                                 self.output_width)
assert x_image_rep.size() == x_attention.size()
Member

No asserts. Check and raise if it doesn't work

Contributor Author

Hmmm, I really like the conciseness of this assert, and I can add a comment?
Check-and-raise is lame to have in a forward function: 2 lines, and then you have ugly raise ValueErrors in your beautifully clear forward function

beduffy added 3 commits March 9, 2019 19:49
…her comments/todos in model.py, my_optim.py and train.py

Removed square image talk.
In README.md mentioned needing to install ViZDoom for ViZDoom command
2 lines between functions
…de clearer. Split A3C_LSTM_GA docstring in two and explained language grounding. self.num_output_filters = 64. Explained GA expand notation and depth slices

 Removed learning during training from README.md
…isode only in done, put 4 appends before done (why was it otherwise in the first place?) and removed 2nd "if done", fixed ai2thor_examples.py multithreading example fps printing

removed printing of 3d euclidean
env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
@fernandotorch (Member) Mar 10, 2019

Could we have a convention of tasks with language starting with "Language" or something like that? Then we would check if env.task.__name__.startswith("Language"). That would look better IMO

@beduffy (Contributor Author) Mar 10, 2019

I do have NaturalLanguage at the start of all of them and NL at the start of the config files. But a boolean is just easier and clearer, no?
Booleans are MUCH clearer than startswith checks, which could be wrong. That way requires us to stick to the standard.

@fernandotorch (Member) Mar 21, 2019

And isn't sticking to a standard a good thing, rather than having to add 1 extra param?

Contributor Author

It's added automatically in NaturalLanguageBaseTask?
Zero work, super clear, very readable, very DRY, and it doesn't require us to forcibly stick to a standard as we create 100+ tasks

beduffy added 4 commits March 14, 2019 22:15
…dded eid info to README, default reward is 1 for 5 objects

removed printing of num random actions at init
…cified object types) and example to pdb_interactive py file
…m code: No more import star for vizdoom. Style changes, docs and avoid repetition
…tObjectOpened, lastObjectPickedUp, lastObjectPut, lastObjectPutReceptacle. (need to test). Added todos, time embedding explanation

from gym_ai2thor.utils import InvalidTaskParams
from gym_ai2thor.task_utils import get_word_to_idx, check_if_focus_and_close_enough_to_object_type


class TaskFactory:
Member

let's get rid of this class entirely by calling getattr(gym_ai2thor.tasks, config['task']['task_name'])(**config) from AI2ThorEnv
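A sketch of that suggestion inside AI2ThorEnv, assuming the config layout used elsewhere in this PR:

import gym_ai2thor.tasks

task_class = getattr(gym_ai2thor.tasks, config['task']['task_name'])
self.task = task_class(**config)

This removes the need for a factory class, at the cost of the task_name in config having to match a class name in gym_ai2thor.tasks exactly.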

return super(NaturalLanguageNavigateToObjectTask, self).reset()


class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask):
Member

I would restructure this to have this class inherit from both NaturalLanguageBaseTask and PickupTask. We can discuss this in slack
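A sketch of that restructuring, assuming both base classes use cooperative __init__:

class NaturalLanguagePickUpObjectTask(NaturalLanguageBaseTask, PickupTask):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # NaturalLanguageBaseTask first in the MRO, then PickupTask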

env.seed(args.seed + rank)

model = ActorCritic(env.observation_space.shape[0], env.action_space.n, args.frame_dim)
if env.task.task_has_language_instructions:
Member

Totally different issue. What happens if the task is not natural language? task.has_language_instructions will raise a ValueError

closest_openable = None
for obj in visible_objects:
    # look for closest closed receptacle to open it
    if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
            obj['objectType'] in self.allowed_objects['openables']:
        closest_openable = obj
        distance = closest_openable['distance']
    if closest_openable:
Member

This is wrong. If there are two objects to close and the furthest one is first in visible_objects iterable you will close both the furthest and the closest. Moreover, we are executing the close action as many times as remaining objects in visible_objects after the closest_openable. Unindenting this if block will solve the issue. The same applies to "CloseObject" action
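A sketch of the fix being described: select the closest openable inside the loop, then act once after it (the controller.step call is assumed from the surrounding diff):

closest_openable = None
for obj in visible_objects:
    if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
            obj['objectType'] in self.allowed_objects['openables']:
        closest_openable = obj
        distance = closest_openable['distance']
if closest_openable:  # unindented: runs once, only for the closest object found
    self.event = self.controller.step(
        dict(action=action_str, objectId=closest_openable['objectId']))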

    interaction_obj = closest_pickupable
    self.event = self.controller.step(
        dict(action=action_str, objectId=interaction_obj['objectId']))
    self.event.metadata['lastObjectPickedUp'] = interaction_obj
elif action_str.startswith('Open'):
    closest_openable = None
    for obj in visible_objects:
        # look for closest closed receptacle to open it
        if obj['openable'] and obj['distance'] < distance and not obj['isopen'] and \
Member

I'd like to make this more understandable. Can we take the whole condition in the if to a variable called is_closest_receptacle and then just do if is_closest_receptacle? Then the comment above this line won't be needed anymore

Member

Analogously we can do the same for is_closest_pickupable and is_closest_opened_receptacle if you know what I mean
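A sketch of the naming the reviewer is suggesting, applied to the Open branch (variable name as proposed; the condition is copied from the diff above):

is_closest_closed_receptacle = (obj['openable'] and not obj['isopen']
                                and obj['distance'] < distance
                                and obj['objectType'] in self.allowed_objects['openables'])
if is_closest_closed_receptacle:
    closest_openable = obj
    distance = closest_openable['distance']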

"""
def __init__(self, seed=None, config_file='config_files/config_example.json', config_dict=None):
def __init__(self, config_file='config_files/default_config.json', config_dict=None):
Member

there is no seed parameter although it is used in main.py?

@@ -66,106 +73,185 @@ def __init__(self, seed=None, config_file='config_files/config_example.json', co
if not self.config['pickup_put_interaction']:
    self.action_names = tuple([action_name for action_name in self.action_names
                               if 'Pickup' not in action_name and 'Put' not in action_name])
if self.config.get('put_interaction'):
@fernandotorch (Member) Mar 23, 2019

Can we simplify all these gets in one single for loop?

self.action_names = list(self.action_names)
for action_name in list(self.action_names):
    if (action_name.startswith('Look') and not self.config.get('lookupdown_actions', True)) or \
       (action_name.startswith('Move') and not self.config.get('moveupdown_actions', True)):
        self.action_names.remove(action_name)
self.action_names = tuple(self.action_names)

And so on. It might not be 100% correct but you get the idea

beduffy added 3 commits March 23, 2019 23:39
…ean vars so if statement looks easy to read, set last vars to None
…urposes), writer problems, printing latest_sys_args, args.env, 3 objects config
Labels: Algorithm, enhancement (New feature or request)
2 participants