
Eigenpooling #90

Draft
wants to merge 6 commits into base: main
Conversation

thazhemadam
Member

@thazhemadam thazhemadam commented Oct 9, 2021

Starting to sketch out EigenPooling layers, which would resolve (at least part of) #10.

This will be largely based on the following work: Graph Convolutional Networks with EigenPooling.

Presently, AGNPool simply does a max/mean pooling. The goal of this PR is to implement pooling layers that don't flatten the node representations into the graph representation.

This will be performed using a new type of pooling layer (named EigenPool for now).
TL;DR of what these pooling layers would do:

  • The initial graph adjacency matrix that represents the structure of the crystal would get coarsened into subgraphs.
  • The node features of each "super-node" generated from the coarsening are computed using the EigenPooling pooling operator.
  • Finally, it returns a (new? or mutated?) FeaturizedAtoms object that corresponds to the sub-graph.
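For concreteness, the coarsening step above can be sketched as follows. This is a language-agnostic Python sketch under stated assumptions, not code from this PR: `matmul`, `transpose`, and `coarsen` are hypothetical names, and the hard cluster-assignment matrix `S` (as used in the EigenPooling paper) is assumed to be given.

```python
# Hedged sketch of graph coarsening: given adjacency A and a cluster
# assignment S where S[i][c] = 1 iff node i belongs to subgraph c,
# the super-node adjacency is S' A S. Pure-Python helpers keep it
# dependency-free; the actual PR would use Julia's LinearAlgebra.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def coarsen(A, S):
    """Coarsen adjacency A into the super-node adjacency S' A S."""
    return matmul(matmul(transpose(S), A), S)

# 4-node path graph 0-1-2-3, coarsened into subgraphs {0,1} and {2,3}
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
S = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
A_coarse = coarsen(A, S)  # diagonal counts intra-subgraph links (both
                          # directions), off-diagonal the inter-subgraph edge
```

The node features of each super-node would then come from the EigenPooling operator applied within each subgraph, rather than from this assignment matrix alone.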

Signed-off-by: Anant Thazhemadam <[email protected]>
@thazhemadam thazhemadam marked this pull request as draft October 9, 2021 11:55
@thazhemadam
Member Author

thazhemadam commented Oct 9, 2021

A couple more notes:

CC @rkurchin

@codecov

codecov bot commented Oct 9, 2021

Codecov Report

Merging #90 (77024f0) into main (cebdbd5) will decrease coverage by 13.44%.
The diff coverage is 56.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main      #90       +/-   ##
===========================================
- Coverage   70.58%   57.14%   -13.45%     
===========================================
  Files           2        4        +2     
  Lines          68       84       +16     
===========================================
  Hits           48       48               
- Misses         20       36       +16     
Impacted Files Coverage Δ
src/layers/pool/eigenpool.jl 0.00% <0.00%> (ø)
src/layers/conv/agnconv.jl 55.00% <55.00%> (ø)
src/layers/pool/agnpool.jl 82.85% <82.85%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cebdbd5...77024f0. Read the comment docs.

@thazhemadam thazhemadam changed the title Hierarchial pooling + Eigenpooling Eigenpooling Oct 11, 2021
@rkurchin
Member

> I'm leaning towards the pooling layers mutating the FeaturizedAtoms objects (mainly because the memory footprint would be smaller), but I could be persuaded to have them return new FeaturizedAtoms objects instead too

Isn't there going to be an issue with AD for actual model training if they mutate? cc @DhairyaLGandhi

@thazhemadam
Member Author

> Isn't there going to be an issue with AD for actual model training if they mutate? cc @DhairyaLGandhi

Oh, nvm, you're right, I think that might be an issue.

On an unrelated note: implementing this as a global pooling mechanism makes some sense to me, primarily because the graphs we deal with aren't as big as the ones in the original paper. But there's also a chance that I could be wrong.

in theory would give us the overall graph representation
=#

# TBD - what other fields would be necessary for the pooling layer itself?
Member

Maybe we could set it up so there's an option for the user to input a value of H?

Member

(see more detailed comment below)


# using an agreeable H, then return the H elements of result hcat-ed into a single 1 x dH vector
result = hcat(result...)'
reshape(result, length(result), 1) # return it as a dH x 1 Matrix
Member

Yeah, but returning something of length N x D isn't going to work because ultimately the size can't depend on N if you want to be able to feed in graphs of different sizes.

As per my comment above, I think a sensible way to do this could be that there's a parameter H that says how many eigenvectors to keep, then we can guarantee a return length of H x D.

OR, if we're worried about that having inconsistent performance across graph sizes, we could instead specify a parameter (say h) in (0,1) that says the fraction of eigenvectors to keep and then do a standard (e.g. max or mean) pooling across a list of h vectors of length D to always return a vector of length D.
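For concreteness, here is a language-agnostic Python sketch of the fixed-H option. Everything here is an illustrative assumption rather than code from this PR: the eigenvector matrix `U` is taken as precomputed (columns would be graph Laplacian eigenvectors), and `pool_with_H` is a hypothetical name.

```python
# Hedged sketch: project node features X (N x D) onto the first H
# eigenvectors, i.e. take the first H "Fourier coefficients" U[:, :H]' X.
# The result is H x D, so its size no longer depends on the graph size N.

def pool_with_H(U, X, H):
    N, D = len(X), len(X[0])
    # theta[h][d] = sum_n U[n][h] * X[n][d]
    return [[sum(U[n][h] * X[n][d] for n in range(N)) for d in range(D)]
            for h in range(H)]

# Toy 3-node graph with 2 features per node; an identity U keeps the
# arithmetic transparent (any orthonormal basis would do for the sketch).
U = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
theta = pool_with_H(U, X, H=2)  # 2 x 2 output, regardless of N = 3
```

The fractional variant would instead pick `H = ceil(h * N)` per graph and then max/mean-pool the resulting rows down to a single length-D vector.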

Member Author

@thazhemadam thazhemadam Oct 12, 2021

> Yeah, but returning something of length N x D isn't going to work because ultimately the size can't depend on N if you want to be able to feed in graphs of different sizes.

Fair enough. We need to standardize across different graph sizes.

> As per my comment above, I think a sensible way to do this could be that there's a parameter H that says how many eigenvectors to keep, then we can guarantee a return length of H x D.

Would having H::Integer as a field in EigenPool be the best solution for this? I figured that the H for a graph of size N1 might not be the appropriate (or rather, the "best") H for another graph of size N2.

Instead, what if, like AGNPool, we let users determine what they want their pooled feature length to be? Essentially,
d * H + length(zero padding) = pooled_feature_length, so pooled features would all be of the same length regardless of the graph sizes?
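A small Python sketch of the bookkeeping this implies (`pooled_layout` and `pad_pooled` are hypothetical names, not from this PR): pick the largest H whose d * H fits inside the requested length, then zero-pad the remainder.

```python
# Hedged sketch of d * H + length(zero padding) = pooled_feature_length:
# given feature dimension d and the user's requested output length,
# choose H and the amount of zero padding.

def pooled_layout(d, pooled_feature_length):
    H = pooled_feature_length // d        # eigenvectors we can afford
    pad = pooled_feature_length - d * H   # zeros needed to top it off
    return H, pad

def pad_pooled(flat, pooled_feature_length):
    """Zero-pad a flattened d*H feature vector up to the target length."""
    return flat + [0.0] * (pooled_feature_length - len(flat))

H, pad = pooled_layout(d=3, pooled_feature_length=10)  # H = 3, pad = 1
v = pad_pooled([1.0] * (3 * H), 10)                    # length-10 vector
```

This way every graph, whatever its size, produces a vector of length pooled_feature_length, which is what makes the downstream Chain compatible.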

Member

Yeah, that's probably the best way to go about it, and the most transparent way to ensure a "compatible" Chain.

Member

So we'd just need an analogous function to the one for AGNPool that works out the actual parameters to make that happen, which shouldn't be too hard.

Member Author

Addressed in a38fdf4

@@ -0,0 +1,48 @@
module Layers
Member

I'm not sure this really needs to be a separate module, since it's kind of the main/only thing the package does apart from the convenience functions for building standard model architectures, and I don't really see a risk of any sort of namespace conflicts...

Member Author

I made it a module because, as I was re-organizing the files, I felt like this could be more coherently organized if it were all in one place/module, now that we have different types of pooling layers and all that.
I'm not really particular about it being a module or not, so whatever works.

Member

Oh I'm 💯 fine with the file reorganization, I just don't think we need an actual explicit module.

Member Author

Resolved in 77024f0.

Return an output feature of specified size such that
`d * H` + `length(zero padding)` = `pooled_feature_length`
Signed-off-by: Anant Thazhemadam <[email protected]>
@thazhemadam thazhemadam requested a review from rkurchin October 18, 2021 18:03
Member

@rkurchin rkurchin left a comment

Made one performance-related comment, but also can we add some tests so the codecov bot will chill out? 😆


result = Vector()

for i = 1:H
Member

I'm pretty sure this whole loop could be a single matrix multiply, no?
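To illustrate the suggestion, a quick Python sketch (illustrative names and toy data, not code from this PR) showing that the per-eigenvector loop and a single matrix multiply compute the same projection:

```python
# Hedged sketch: H separate dot products vs. one multiply by U[:, 1:H]'.
# In Julia this would collapse the loop into U[:, 1:H]' * x.

def project_loop(U, x, H):
    # one dot product per loop iteration, H iterations in total
    return [sum(U[n][h] * x[n] for n in range(len(x))) for h in range(H)]

def project_matmul(U, x, H):
    # the equivalent single "matrix multiply": build U[:, :H]' (H x N)
    # and apply it to x in one pass
    Ut = [[U[n][h] for n in range(len(x))] for h in range(H)]
    return [sum(row[n] * x[n] for n in range(len(x))) for row in Ut]

U = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
x = [1.0, 2.0, 3.0, 4.0]
assert project_loop(U, x, 2) == project_matmul(U, x, 2)
```

Besides readability, the single multiply lets BLAS (and AD) handle the whole projection at once instead of materializing H intermediate vectors.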

2 participants