
refactored compact flows #7

Closed · wants to merge 61 commits into from

Conversation

@jonkhler (Collaborator) commented May 7, 2021

@jonkhler requested a review from Olllom on May 7, 2021 16:49
@jonkhler (Collaborator, Author) commented May 9, 2021

@Olllom I think I fixed the shape bug. I am not perfectly happy with the semantics, so this might need some deeper refactoring in the long run:

  1. To speed things up, I stack function evaluations along dimension 0. This happens in two places: (a) when I compute the ramp twice and (b) when I compute the zero-offset in the sigmoid transform. As a result, any input shape [d1, d2, ..., dk] becomes [2, 2, d1, ..., dk]. The stacked dimensions are pushed to the front to simplify broadcasting (see the sketch below this list).
  2. However, by convention we treat the first n-1 dimensions as batch dimensions, and the number of mixture components is inferred from the last dimension. This follows the PyTorch convention that the leading dimensions are batch dimensions, so the output of the parameter-generating network always lives in the last dimension.
    Thus, a parameter such as the alpha exponent of the smooth ramp has shape [2, 2, d1, ..., dk, n_components], which makes it awkward to combine with what we already have. I see two options: (1) ignore it for now and adopt "last dim = mixture index" for mixtures in the future (see the last paragraph of this comment), or (2) refactor substantially, since we would then have to transpose back and forth to keep both the speed-up and the existing conventions.
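For concreteness, here is a minimal sketch of that stacking and the shape bookkeeping it creates; the tensor names and sizes are made up for illustration and are not the PR's exact code:

```python
import torch

x = torch.rand(16, 8)                # input of shape [d1, d2] = [16, 8]
shift = torch.rand_like(x)

# a. evaluate the ramp at two points with a single batched call:
stacked = torch.stack([x, x + shift], dim=0)                        # [2, 16, 8]

# b. additionally evaluate the zero-offset of the sigmoid transform:
stacked = torch.stack([stacked, torch.zeros_like(stacked)], dim=0)  # [2, 2, 16, 8]

# By convention, a per-component parameter produced by the conditioner lives in
# the last dimension, e.g. the alpha exponent of the smooth ramp:
n_components = 4
alpha = torch.rand(16, 8, n_components)                             # [d1, d2, n_components]

# Combining the two requires either an extra trailing axis on the stacked tensor
# or transposes on alpha; this is the bookkeeping referred to above:
combined = stacked.unsqueeze(-1) * alpha                            # [2, 2, 16, 8, n_components]
```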

--

I am not entirely sure, but it seems to be slower now. I have only checked on CPU so far, and the model will benefit heavily from GPU support. Can you run a comparison to see whether that is true? If it is too slow, we should check whether the fix introduces spurious memory allocations (e.g. from broadcasting).
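In case it helps with the comparison, a rough timing helper along these lines would do; `avg_runtime` and the transformer objects are placeholders, not part of this PR:

```python
import time
import torch

def avg_runtime(fn, *args, n_repeats=100, device="cpu"):
    """Average wall-clock time per call; rough numbers only."""
    args = [a.to(device) for a in args]
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_repeats):
        fn(*args)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_repeats

# e.g. with placeholder transformer objects:
# x = torch.rand(1024, 8)
# print(avg_runtime(old_transformer, x), avg_runtime(new_transformer, x))
```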

--

I did not merge your docstrings/comments yet.

--

Inverses via bisection work again.
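For reference, a bisection inverse for a monotone elementwise transform boils down to something like the following generic sketch (not the exact routine in this PR):

```python
import torch

def bisection_inverse(f, y, lo, hi, n_iters=60):
    """Invert an elementwise, monotonically increasing map f by bisection.

    Assumes f(lo) <= y <= f(hi) elementwise; y, lo, hi share the same shape.
    """
    lo, hi = lo.clone(), hi.clone()
    for _ in range(n_iters):
        mid = 0.5 * (lo + hi)
        go_right = f(mid) < y            # true where the preimage lies above mid
        lo = torch.where(go_right, mid, lo)
        hi = torch.where(go_right, hi, mid)
    return 0.5 * (lo + hi)

# e.g. inverting a sigmoid-shaped CDF transform:
y = torch.rand(16, 8)
x = bisection_inverse(torch.sigmoid, y, torch.full_like(y, -20.0), torch.full_like(y, 20.0))
```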

--

I also changed the AffineSigmoidTransformer into an AffineSigmoidComponents class. There is a semantic difference between a transformer that is a single CDF transform and a bundle of CDF transforms that gets aggregated into a mixture; this difference is what caused the shape mismatch when summing up the log PDFs. I think we need some API decisions here: (a) obviously any transformer can be seen as a trivial bundle of transformers; (b) a bundle of transformations that is aggregated, e.g. by a sum (mixture) or a product (product of experts / rejection sampling), is no longer a single CDF transform. "Components" is a somewhat generic term; it fits for now, but maybe we can find something better.
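To illustrate the distinction (names and signatures are illustrative, not the actual API): a components bundle keeps the component index in the last dimension, and only the aggregation step turns it back into a single CDF transform.

```python
import torch

def mix_components(y_components, log_det_components, log_weights):
    """Aggregate a bundle of per-component CDF transforms into one mixture CDF.

    All inputs carry the component index in the last dimension:
    y_components, log_det_components, log_weights: [..., n_components]
    log_weights is assumed to be normalized in log-space.
    """
    # mixture CDF: convex combination of the component CDFs
    y = (log_weights.exp() * y_components).sum(dim=-1)
    # mixture log-density: log-sum-exp of the weighted component densities
    log_det = torch.logsumexp(log_weights + log_det_components, dim=-1)
    return y, log_det
```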

@Olllom (Contributor) commented Jan 20, 2022

Closing this in favor of its duplicate (#28).

@Olllom closed this on Jan 20, 2022