[ ] update the default complex layer initialization from Kaiming to independent (check Trabelsi et al. 2018)
[+] deal with the `torch.nonzero(..., as_tuple=True)` deprecation warning in `utils.spectrum`
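A minimal sketch of the fix, assuming the warning comes from calling `torch.nonzero` without an explicit `as_tuple` argument (the actual call site in `utils.spectrum` may differ):

```python
import torch

x = torch.tensor([0.0, 1.5, 0.0, -2.0])

# old call, emits the deprecation warning on the affected torch versions:
# idx = torch.nonzero(x)

# passing as_tuple explicitly silences it; as_tuple=True returns a tuple of
# index tensors (one per dimension), as_tuple=False the old (n, ndim) matrix
idx, = torch.nonzero(x, as_tuple=True)
values = x[idx]                      # tensor([ 1.5000, -2.0000])
```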
[ ] Improve implementation
- (Bernoulli Dropout) need 1d (exists), 2d and 3d (see the sketch after this list)
- (Convolutions) implement 3d convolutions and 3d VarDropout convolutions, both real and complex
- (Transposed Convolutions) figure out the math and implement var dropout for transposed convolutions
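A minimal sketch of the missing 2d case, assuming a `Cplx` input exposes `.real`/`.imag` tensors and that one Bernoulli mask should be shared by both parts so a dropped channel is dropped as a whole; the 3d case only swaps in `dropout3d`:

```python
import torch
import torch.nn.functional as F
from cplxmodule import Cplx

def cplx_dropout2d(z, p=0.5, training=True):
    """Channelwise Bernoulli dropout for Cplx input: a single mask is drawn
    once and applied to the real and imaginary parts alike."""
    if not training or p == 0.0:
        return z
    # dropout2d on a ones-tensor yields a mask of 0 and 1/(1-p) entries,
    # i.e. the usual inverted-dropout scaling comes for free
    mask = F.dropout2d(torch.ones_like(z.real), p, training=True)
    return Cplx(z.real * mask, z.imag * mask)
```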
[+] Figure out the issues with ONNX support
[ ] begin migration to complex tensors in pytorch>=1.6
- For C->R real-valued loss functions `grad.conj()` gives a descent direction (see the sketch after this list).
- complex autograd
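A minimal sketch of what the migration unlocks, assuming PyTorch's native complex dtypes and autograd (the Wirtinger-derivative convention of `.grad` changed between early releases, hence the hedged comment):

```python
import torch

# a complex parameter and a real-valued (C -> R) loss
z = torch.randn(4, dtype=torch.cfloat, requires_grad=True)
loss = z.abs().pow(2).sum()          # sum |z|^2, real-valued
loss.backward()

# Mathematically the descent direction is -conj(dL/dz) (Wirtinger calculus).
# Recent torch releases document that .grad already stores the *conjugated*
# Wirtinger derivative for real-valued losses, so the plain step below
# descends; check the convention of the exact release being targeted.
with torch.no_grad():
    z -= 0.1 * z.grad
```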
[ ] Consider replacing `Real` with `Tensor` in format-conversion layers, like `RealToCplx`, `CplxToReal`
- the term `Real` has connotations with real numbers, making it very unintuitive to convert between `Cplx`, which is perceived as a complex number, and a torch `Tensor`, which serves merely as a storage format.
- need a deprecation cycle for these and related functions (see the sketch after this list)
  - in `cplx`: `from_interleaved_real`, `from_concatenated_real`, `to_interleaved_real`, `to_concatenated_real`, aliases `from_real` and `to_real` (affects `__init__.py`)
  - in `nn.modules.casting`: `InterleavedRealToCplx`, `ConcatenatedRealToCplx`, `CplxToInterleavedReal`, `CplxToConcatenatedReal`, also base classes `BaseRealToCplx` and `BaseCplxToReal`
- three basic types? `Tensor` -- aka Storage, `Real` -- real-valued tensor, `Cplx` -- complex-valued tensor
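A possible shape for the deprecation cycle, a minimal sketch in which the old callables become warning aliases of the renamed ones (the `from_tensor`/`from_real` pairing here is illustrative, not the final naming):

```python
import functools
import warnings

def deprecated(old_name, replacement):
    """Expose `replacement` under its old name with a DeprecationWarning."""
    @functools.wraps(replacement)
    def wrapper(*args, **kwargs):
        warnings.warn(f"`{old_name}` is deprecated, use "
                      f"`{replacement.__name__}` instead.",
                      DeprecationWarning, stacklevel=2)
        return replacement(*args, **kwargs)
    return wrapper

def from_tensor(data):                # hypothetical new name for `from_real`
    ...                               # the actual conversion lives in `cplx`

from_real = deprecated("from_real", from_tensor)   # old alias keeps warning
```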
[ ] Implement scheduled mag-pruning of Zhu and Gupta (2017) or the thresholded pruning of Wu et al. (2019) (see the sketch after this list)
- use `nn.masked` as a backend -- this will automatically support real and Cplx layers!
- implement as either a wrapper around the optimizer (bad), or as a separate entity (better)
  - settings of the target sparsity per eligible layer (`dict`)
  - method `.step()` which updates the masks according to the schedule and the current sorted magnitudes of the parameters
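A minimal sketch of the separate-entity variant with the Zhu-Gupta cubic schedule, assuming the masked layers expose an assignable `mask` buffer (that attribute name, and `abs()` returning a modulus tensor for Cplx weights, are assumptions):

```python
import torch

class MagnitudePruner:
    """Scheduled magnitude pruning (Zhu & Gupta, 2017) as a separate entity,
    not an optimizer wrapper: call .step() once per pruning interval."""

    def __init__(self, sparsity, n_steps, t_start=0):
        # `sparsity` maps each eligible masked layer to its target sparsity
        self.sparsity, self.n_steps, self.t_start = sparsity, n_steps, t_start
        self.t = 0

    def current(self, target):
        # cubic ramp from zero up to `target` over `n_steps` pruning steps
        frac = min(max(self.t - self.t_start, 0) / self.n_steps, 1.0)
        return target * (1.0 - (1.0 - frac) ** 3)

    @torch.no_grad()
    def step(self):
        self.t += 1
        for layer, target in self.sparsity.items():
            magnitude = abs(layer.weight)   # for Cplx weights abs() is assumed
                                            # to yield the modulus tensor
            k = int(self.current(target) * magnitude.numel())
            if k > 0:
                threshold = magnitude.flatten().kthvalue(k).values
                mask = magnitude > threshold
            else:
                mask = torch.ones_like(magnitude, dtype=torch.bool)
            # assumed interface: masked layers accept an assignable {0,1} mask
            layer.mask = mask.to(magnitude)
```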
[+] make `load_state_dict` respect components of `CplxParameter` and allow promoting real tensors to complex tensors, provided the state dict has no `.real` or `.imag`, but a correct key referring to the parameter.
[+] fix the incorrect naming of bayesian methods in `nn.relevance`
- rename `*ARD` named layers in `.real` and `.complex` to `*VD` layers, since they use the log-uniform prior and thus are in fact Variational Dropout layers
- start deprecating importing `*ARD` named layers from `.real` and `.complex`
- fix aliases of imported layers in `.extensions`
- expose all base VD/ARD layers in `__init__.py` and require importing modifications from `.extensions`
- fix the text in nn/relevance/README.md
[+] fix the names for the L0-regularized layer, which in fact performs probabilistic sparsification and is not related to variational inference
[+] check if setup.py has correct requirements and specify them explicitly
- `requires` is not a keyword, use `install_requires` and `tests_require` (see the sketch below)
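A minimal sketch of the corrected `setup()` call; the package name and version pins are placeholders, not the project's actual requirements:

```python
from setuptools import setup, find_packages

setup(
    name="cplxmodule",                          # placeholder metadata
    packages=find_packages(),
    # unlike `requires`, these keywords are actually honoured by setuptools
    install_requires=["torch>=1.4", "numpy"],   # placeholder version pins
    tests_require=["pytest"],
)
```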
[+] investigate reordering the base classes in `LinearMasked(MaskedWeightMixin, Linear, _BaseRealMixin)` and similar in `nn.masked`
- could moving it further into the bases result in a slower property lookup? It seems no:
  - from the python descriptors doc: "The implementation works through a precedence chain that gives data descriptors priority over instance variables, instance variables priority over non-data descriptors, and assigns lowest priority to `__getattr__`"
  - the lookup order via `__getattribute__` is thus: data descriptors (aka `@property`), the instance `__dict__`, class attributes in the `__dict__`-s along the MRO, and lastly `__getattr__` (see the sketch below)
- moved `MaskedWeightMixin` into `_BaseMixin`
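A quick self-contained check of that precedence, showing that a data descriptor found anywhere in the MRO still beats the instance `__dict__`, so pushing the mixin deeper only changes which class supplies the descriptor, not whether it wins:

```python
class DeepBase:
    @property
    def weight(self):                 # data descriptor, deep in the MRO
        return "from DeepBase.property"

class Mixin:
    pass

class Layer(Mixin, DeepBase):
    pass

obj = Layer()
obj.__dict__["weight"] = "from instance dict"   # bypass the missing setter

# the data descriptor wins even though it lives two classes up the MRO
print(obj.weight)                     # -> "from DeepBase.property"
print(Layer.__mro__)                  # Layer, Mixin, DeepBase, object
```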
[+] get rid of `torch_module` from `.utils` and declare the activations explicitly
[+] clean up the `nn` module itself
- remove crap from `.sequential`: `CplxResidualBottleneck`, `CplxResidualSequential` and `CplxBusResidualSequential` must go, and move `CplxSequential` to base layers
- split `.layers`, `.activation`, and `.sequential` into:
  - `.modules.base`: base classes (CplxToCplx, BaseRealToCplx, BaseCplxToReal), and parameter type (CplxParameter, CplxParameterAccessor)
  - `.modules.casting`: converting real tensors in various formats to and from Cplx (InterleavedRealToCplx, ConcatenatedRealToCplx, CplxToInterleavedReal, CplxToConcatenatedReal, AsTypeCplx)
  - `.modules.linear`: Linear, Bilinear, Identity, PhaseShift
  - `.modules.conv`: everything convolutional
  - `.modules.activation`: activations (CplxModReLU, CplxAdaptiveModReLU, CplxModulus, CplxAngle) and layers (CplxReal, CplxImag)
  - `.modules.container`: CplxSequential
  - `.modules.extra`: Dropout, AvgPool1d
- move `.batchnorm` to modules, keep `.init` in `.nn`
- fix imports from adjacent modules: `nn.masked` and `nn.relevance`
[+] in `nn.relevance.complex`: drop `Cplx(*map(torch.randn_like, (s2, s2)))` and write `Cplx(torch.randn_like(s2), torch.randn_like(s2))` explicitly
- implemented `cplx.randn` and `cplx.randn_like` (see the sketch below)
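For reference, the three forms should be equivalent (a sketch, assuming `Cplx` takes the real and imaginary parts as its two positional arguments and `cplx.randn_like` accepts a real template tensor):

```python
import torch
from cplxmodule import cplx, Cplx

s2 = torch.ones(3, 5)                    # stand-in for the noise scale tensor

noise_old = Cplx(*map(torch.randn_like, (s2, s2)))            # obscure
noise_new = Cplx(torch.randn_like(s2), torch.randn_like(s2))  # explicit
noise_lib = cplx.randn_like(s2)          # the newly implemented helper
```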
[+] residual clean up in the `nn` module
- `.activation`: `CplxActivation` is the same as `CplxToCplx[...]`
  - CplxActivation promotes classic (real) torch functions to split activations, so yes.
  - See if it is possible to implement function promotion through CplxToCplx[...]
    - it is possible: just reuse CplxActivation
  - Currently CplxToCplx promotes layers and real functions to independently applied (split) layers/functions (see the sketch after this list)
  - how should we proceed with the `cplx` trig functions? a wrapper, or hardcoded activations?
    - the latter seems more natural, as the trig functions are vendored by this module
    - since torch is the base, and implements a great number of univariate tensor functions and could potentially be extended, it is more natural to use a wrapper (the rationale behind CplxToCplx[...])
- `.modules.extra`: this needs thorough cleaning
  - drop CplxResidualBottleneck, CplxResidualSequential and CplxBusResidualSequential
  - abandon `torch_module` and code the trig activations by hand
  - remove the alias CplxDropout1d: use torch.nn names as much as possible
  - deprecate CplxAvgPool1d: it can be created at runtime with `CplxToCplx[torch.nn.AvgPool1d]`
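The promotion mechanism sketched from the items above, assuming `CplxToCplx[...]` accepts both `torch.nn` module classes and univariate tensor functions; import paths and exact signatures may differ from the final layout:

```python
import torch
from cplxmodule import Cplx
from cplxmodule.nn import CplxToCplx

# promote a real layer class to a split (independently applied) complex layer
CplxAvgPool1d = CplxToCplx[torch.nn.AvgPool1d]

# promote a univariate real tensor function to a split activation
CplxTanh = CplxToCplx[torch.tanh]

z = Cplx(torch.randn(2, 4, 8), torch.randn(2, 4, 8))
pooled = CplxAvgPool1d(kernel_size=2)(z)   # applied to .real and .imag alike
activated = CplxTanh()(z)
```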
[+] documentation for bayesian and maskable layers
- in `nn.relevance.base`, making it like in `nn.masked`
- classes in `nn.relevance.real` and `.complex` should also be documented properly, the same goes for `.extensions`
[+] restructure the extensions and non-bayesian layers
- new folder structure
  - take ard-related declarations and move them to `relevance/ard.py`, everything else to a submodule `.extensions`
  - `.extensions` submodule: `complex` for cplx-specific extended layers (bogus penalties, approximations and other stuff not directly related to variational dropout or automatic relevance determination), `real` for supplementary real-valued layers
- decide the fate of the `lasso` class in `nn.relevance`:
  - it is irrelevant to Bayesian methods: move it to `extensions/real`
[+] documentation
- go through the README-s in each submodule to make sure that the info there is correct and typical use cases are described
- `nn.init`: document the initializations according to Trabelsi et al. (2018)
  - seems to be automatically documented using `functools.wraps` from the original `torch.nn.init` procedures (see the sketch below)
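Why the docs come for free, a minimal sketch of the `functools.wraps` pattern (the `adapt_init` helper and `cplx_kaiming_normal_` name are illustrative, not the library's actual adaptors):

```python
import functools
import torch

def adapt_init(torch_init):
    """Adapt a real-valued torch.nn.init routine to Cplx weights, inheriting
    its docstring and signature metadata via functools.wraps."""
    @functools.wraps(torch_init)
    def cplx_init(cplx_weight, *args, **kwargs):
        # the real adaptors implement the Trabelsi et al. (2018) schemes;
        # the naive part-wise call here only keeps the sketch short
        torch_init(cplx_weight.real, *args, **kwargs)
        torch_init(cplx_weight.imag, *args, **kwargs)
        return cplx_weight
    return cplx_init

cplx_kaiming_normal_ = adapt_init(torch.nn.init.kaiming_normal_)
print(cplx_kaiming_normal_.__doc__.splitlines()[0])   # docstring copied over
```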
[+] add missing tests to the unit test suite
- tests for `*state_dict` api compliance of `nn.masked` and `nn.base.CplxParameter` (see the sketch below)
  - implementing these tests helped figure out and fix edge cases, so yay for TDD!
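A hedged sketch of what such a compliance test can look like (pytest-style; `CplxLinear`'s constructor, the `.real`/`.imag` accessors on the parameter, and the zero-imaginary promotion are assumptions drawn from the `load_state_dict` item above):

```python
import torch
from cplxmodule.nn import CplxLinear

def test_real_state_dict_promotes_to_cplx():
    # a plain real tensor stored under the parameter's own key (no .real or
    # .imag suffix) is expected to be promoted to the complex parameter
    source = torch.nn.Linear(4, 3)
    target = CplxLinear(4, 3)

    target.load_state_dict(source.state_dict(), strict=False)

    assert torch.allclose(target.weight.real, source.weight.data)
    assert torch.allclose(target.weight.imag, torch.zeros_like(source.weight))
```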