
Add an unsupervised warm up for the models #140

Open · wants to merge 23 commits into master
Conversation

@gabrieltseng (Contributor) commented Dec 10, 2019:

Inspired by Tile2Vec, this pretrains the models by training them to make embeddings of instances that are geographically far apart more different than embeddings of instances that are close to one another.

It's a less rigid way of communicating the lat/lon information to the models.
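For reference, here is a minimal sketch of a Tile2Vec-style triplet loss in PyTorch. This is an illustration, not the PR's exact implementation; the function name and the margin value are assumptions.

import torch

def triplet_loss(anchor: torch.Tensor, neighbour: torch.Tensor,
                 distant: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    # Pull the anchor's embedding towards its spatial neighbour and
    # push it away from the distant instance, up to a margin.
    positive = torch.norm(anchor - neighbour, dim=-1)
    negative = torch.norm(anchor - distant, dim=-1)
    return torch.clamp(positive - negative + margin, min=0).mean()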

@gabrieltseng changed the title from "Adds an unsupervised warm up for the models" to "Add an unsupervised warm up for the models" on Dec 10, 2019
@tommylees112 (Contributor) commented:

This is super cool Gabi! Thanks so much. Just reviewing now.

@tommylees112 left a comment:

This is amazing work dude!! Just a few questions - thanks so much for implementing.

src/models/data.py (comments resolved)
src/models/neural_networks/triplet_data.py (outdated; comments resolved)
neighbour_indices: List[int] = []
distant_indices: List[int] = []

outer_distance = tuple(multiplier * val for val in distance)
@tommylees112 (Contributor) commented:

What's the role of the multiplier?

@gabrieltseng (Contributor, Author) commented:

It's basically to enforce a minimum distance between the neighbouring instance and the distant instance.

The neighbour will be within neighbouring_distance of the anchor. The distant instance will be further than multiplier * neighbouring_distance from the anchor.
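To make that concrete, here is a hedged sketch of how neighbour and distant indices might be selected around an anchor. The function name and the (lat, lon) candidate representation are hypothetical; only `distance`, `multiplier`, and the `outer_distance` line mirror the snippet above.

from typing import List, Tuple

def split_by_distance(
    anchor: Tuple[float, float],            # (lat, lon) of the anchor
    candidates: List[Tuple[float, float]],  # (lat, lon) of all instances
    distance: Tuple[float, float],          # neighbouring distance per axis
    multiplier: float,
) -> Tuple[List[int], List[int]]:
    neighbour_indices: List[int] = []
    distant_indices: List[int] = []

    outer_distance = tuple(multiplier * val for val in distance)

    for idx, (lat, lon) in enumerate(candidates):
        diff = (abs(lat - anchor[0]), abs(lon - anchor[1]))
        if diff[0] <= distance[0] and diff[1] <= distance[1]:
            neighbour_indices.append(idx)   # within the inner box
        elif diff[0] > outer_distance[0] or diff[1] > outer_distance[1]:
            distant_indices.append(idx)     # beyond the outer box
    return neighbour_indices, distant_indices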

@tommylees112 (Contributor) commented:

Gotcha! So basically enforcing how large an area our spatial differences should be over.

@gabrieltseng (Contributor, Author) commented:

Yup!

@@ -288,11 +289,14 @@ def forward(

x = self.rnn_dropout(hidden_state[:, -1, :])

if return_embedding:
@tommylees112 (Contributor) commented:

Is this for interpreting the static embedding?

@gabrieltseng (Contributor, Author) commented:

No - the loss in Tile2Vec compares the embedding, not the final value. This is to return that embedding for the loss, before it gets put through the final linear layer.

@tommylees112 (Contributor) commented:

Yes, makes sense! Could this be used for interpreting the embedding layer too, though?

@gabrieltseng (Contributor, Author) commented:

Yeah, 100%. Although here the "embedding" is the final output of the model before the linear regression layer.
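To illustrate the pattern this thread discusses, here is a hypothetical minimal module with a `return_embedding` flag. The class name and layer sizes are assumptions; only `rnn_dropout`, the `hidden_state[:, -1, :]` slice, and the flag itself come from the diff above.

import torch
from torch import nn

class RNNRegressor(nn.Module):
    def __init__(self, input_size: int, hidden_size: int) -> None:
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.rnn_dropout = nn.Dropout(0.25)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor, return_embedding: bool = False):
        hidden_state, _ = self.rnn(x)
        # the "embedding": the last timestep's hidden state, after dropout
        x = self.rnn_dropout(hidden_state[:, -1, :])
        if return_embedding:
            # the unsupervised warmup computes its loss on this embedding,
            # before the final linear layer
            return x
        return self.fc(x)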

# initialize the model
if self.model is None:
    x_ref, _, _ = next(iter(train_dataloader))
    model = self._initialize_model(self._input_to_tuple(x_ref))
@tommylees112 (Contributor) commented:

Does this train the LSTM model? Don't we need to initialise with a CNN, as they use in Tile2Vec?

@gabrieltseng (Contributor, Author) commented:

The principles of Tile2Vec can be used with any model that takes a raw input and outputs an embedding.

So yeah, in this case it can also train the (EA)LSTM model.
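Putting the pieces together, here is a hedged sketch of a model-agnostic warmup loop. The dataloader yielding (anchor, neighbour, distant) batches, the epoch count, the learning rate, and the margin are all assumptions, not the PR's actual values.

import torch

def unsupervised_warmup(model, triplet_dataloader, epochs: int = 5,
                        lr: float = 1e-3) -> None:
    # Works for any encoder (CNN, LSTM, EALSTM, ...) that can return
    # an embedding instead of its final prediction.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for anchor, neighbour, distant in triplet_dataloader:
            z_a = model(anchor, return_embedding=True)
            z_n = model(neighbour, return_embedding=True)
            z_d = model(distant, return_embedding=True)
            # Tile2Vec-style triplet loss on the embeddings
            loss = torch.clamp(
                torch.norm(z_a - z_n, dim=-1)
                - torch.norm(z_a - z_d, dim=-1)
                + 0.1,  # margin (assumed value)
                min=0,
            ).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()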

@tommylees112 (Contributor) commented:

Okay, gotcha. So have I interpreted this correctly:

"We use the unsupervised learning algorithm described in Tile2Vec to pretrain (initialise) the weights of the EALSTM. This allows us to produce weights in the network that produce sensible spatial patterns: namely, that pixels close together are more similar than pixels that are far apart."

@gabrieltseng (Contributor, Author) commented:

Yeah, that's exactly right.
