SAE script multiple parameters #161

Merged: 14 commits, Jul 3, 2024
1 change: 1 addition & 0 deletions .github/workflows/Latex.yml
@@ -35,6 +35,7 @@ jobs:
make put_figures_outside_of_minted_environment -C docs
make do_correct_quotation_marks -C docs
make make_correct_thrm_and_dfntn_and_xmpl_and_rmrk_and_proof_environment -C docs
make references_instead_of_hyperlinks -C docs
- name: compile tex document
run: |
cd docs/build
5 changes: 4 additions & 1 deletion docs/Makefile
@@ -11,7 +11,7 @@ latex: latex_no_pdf
$(MAKE) compile_tex;
$(MAKE) compile_tex

latex_no_pdf_no_images: install_brenier_two_fluid latex_docs_no_pdf copy_png_files put_figures_outside_of_minted_environment do_correct_quotation_marks make_correct_thrm_and_dfntn_and_xmpl_and_rmrk_and_proof_environment
latex_no_pdf_no_images: install_brenier_two_fluid latex_docs_no_pdf copy_png_files put_figures_outside_of_minted_environment do_correct_quotation_marks make_correct_thrm_and_dfntn_and_xmpl_and_rmrk_and_proof_environment references_instead_of_hyperlinks

latex_no_pdf: latex_images latex_no_pdf_no_images

@@ -95,6 +95,9 @@ do_correct_quotation_marks:
sed -i'' -e 's/{\\textquotedbl}/"/g' build/G*.tex;
sed -i'' -e 's/ "/ ``/g' build/G*.tex

references_instead_of_hyperlinks:
sed -i'' -e 's/\\hyperlinkref{\([0-9]*\)}{\([a-zA-Z -]*\)}/\2 (see \\Cref{\1})/g' build/G*.tex

copy_png_files:
find build/manifolds -name \*.png -exec cp {} build \; ;
find build/optimizers/manifold_related -name \*.png -exec cp {} build \; ;
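To illustrate the substitution that the new `references_instead_of_hyperlinks` rule performs, here is a small sketch that mirrors the sed pattern in Julia (the example string and reference number are made up):

```julia
# mirrors the sed rule added above: \hyperlinkref{<number>}{<text>} -> <text> (see \Cref{<number>})
line    = raw"see \hyperlinkref{42}{the snapshot matrix} for details"
pattern = r"\\hyperlinkref\{([0-9]*)\}\{([a-zA-Z -]*)\}"

replace(line, pattern => s"\2 (see \\Cref{\1})")
# "see the snapshot matrix (see \Cref{42}) for details"
```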
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -25,7 +25,7 @@ const html_format = Documenter.HTML(;
# specifies that we do not display the package name again (it's already in the logo)
sidebar_sitename = false,
# we should get rid of this line again eventually. We will be able to do this once we got rid of library.md
size_threshold = 262144,
size_threshold = 524288,
)

# if platform is set to "none" then no output pdf is generated
45 changes: 27 additions & 18 deletions docs/src/data_loader/data_loader.md
@@ -1,9 +1,6 @@
# Data Loader

```@eval
using GeometricMachineLearning, Markdown
Markdown.parse(description(Val(:DataLoader)))
```
The `DataLoader` in `GeometricMachineLearning` is designed to make training convenient.

The data loader can be called with various types of arrays as input, for example a [snapshot matrix](snapshot_matrix.md):
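A minimal sketch of such a call (the array sizes here are chosen arbitrarily):

```julia
using GeometricMachineLearning

# a snapshot matrix: system dimension × number of time steps
SnapshotMatrix = rand(Float32, 10, 100)

dl = DataLoader(SnapshotMatrix)
dl.input_dim  # 10
```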

@@ -25,7 +22,8 @@ SnapshotTensor = rand(Float32, 10, 100, 5)
dl = DataLoader(SnapshotTensor)
```

Here the `DataLoader` has different properties `:RegularData` and `:TimeSeries`. This indicates that in the first case we treat all columns in the input tensor independently (this is mostly used for autoencoder problems), whereas in the second case we have *time series-like data*, which are mostly used for integration problems.
Here the `DataLoader` has different properties `:RegularData` and `:TimeSeries`. This indicates that in the first case we treat all columns in the input tensor independently (this is mostly used for autoencoder problems), whereas in the second case we have *time series-like data*, which are mostly used for integration problems. The default when using a matrix is `:RegularData` and the default when using a tensor is `:TimeSeries`.

We can also treat a problem with a matrix as input as a time series-like problem by providing an additional keyword argument: `autoencoder=false`:

```@example
@@ -37,11 +35,7 @@ dl = DataLoader(SnapshotMatrix; autoencoder=false)
dl.input_time_steps
```


```@eval
using GeometricMachineLearning, Markdown
Markdown.parse(description(Val(:data_loader_for_named_tuple)))
```
If we deal with Hamiltonian systems we typically split the coordinates into a ``q`` and a ``p`` part. Such data can also be used as input arguments for `DataLoader`:

```@example named_tuple_tensor
using GeometricMachineLearning # hide
@@ -51,17 +45,15 @@ SymplecticSnapshotTensor = (q = rand(Float32, 10, 100, 5), p = rand(Float32, 10,
dl = DataLoader(SymplecticSnapshotTensor)
```

The dimension of the system is then the sum of the dimensions of the ``q`` and the ``p`` component:

```@example named_tuple_tensor
dl.input_dim
```

## The `Batch` struct

```@eval
using GeometricMachineLearning, Markdown
Markdown.parse(description(Val(:Batch)))
```

If we want to draw mini batches from a data loader, we need to allocate an instance of [`Batch`](@ref). If we call the corresponding functor on an instance of [`DataLoader`](@ref) we get the following result:

```@example
using GeometricMachineLearning # hide
@@ -73,7 +65,15 @@ batch = Batch(3)
batch(dl)
```

This also works if the data are in ``qp`` form:
The output of applying the batch functor is always of the form:

```math
([(b_{1,1}^t, b_{1,1}^p), (b_{1,2}^t, b_{1,2}^p), \ldots], [(b_{2,1}^t, b_{2,1}^p), (b_{2,2}^t, b_{2,2}^p), \ldots], [(b_{3,1}^t, b_{3,1}^p), \ldots], \ldots),
```

so it is a tuple of vectors of tuples. One vector represents one batch. The tuples in one vector always have two entries: a *time index* and a *parameter index*, indicated by the superscripts ``t`` and ``p`` respectively in the equation above.
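A hedged sketch of how this output can be read (shapes as in the example above):

```julia
using GeometricMachineLearning

dl    = DataLoader(rand(Float32, 10, 100, 5))
batch = Batch(3)

batches      = batch(dl)       # tuple of vectors of (time index, parameter index) tuples
first_batch  = batches[1]      # one batch: a vector of index tuples
t_idx, p_idx = first_batch[1]  # one sample: a time index and a parameter index
```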

This also works if the data are in ``(q, p)`` form:

```@example
using GeometricMachineLearning # hide
@@ -103,7 +103,7 @@ Specifically the routines do the following:
3. ``\mathcal{I}_i \leftarrow \mathtt{indices[(i - 1)} \cdot \mathtt{batch\_size} + 1 \mathtt{:} i \cdot \mathtt{batch\_size]}\text{ for }i=1, \ldots, (\mathrm{last} -1),``
4. ``\mathcal{I}_\mathtt{last} \leftarrow \mathtt{indices[}(\mathtt{n\_batches} - 1) \cdot \mathtt{batch\_size} + 1\mathtt{:end]}.``

Note that the routines are implemented in such a way that no two indices appear double.
Note that the routines are implemented in such a way that no index appears twice, i.e. we *sample without replacement*.
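The following sketch roughly reproduces the partitioning described above in plain Julia (not the package implementation; the sizes are made up):

```julia
using Random: randperm

n_indices, batch_size = 10, 3
n_batches = ceil(Int, n_indices / batch_size)

indices = randperm(n_indices)   # shuffle all indices once per epoch
batches = [indices[(i - 1) * batch_size + 1 : min(i * batch_size, n_indices)]
           for i in 1:n_batches] # the last batch may be smaller than batch_size
```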

## Sampling from a tensor

@@ -133,4 +133,13 @@ This algorithm can be visualized the following way (here `batch_size = 4`):
Main.include_graphics("../tikz/tensor_sampling") # hide
```

Here the sampling is performed over the second axis (the *time step dimension*) and the third axis (the *parameter dimension*). Whereas each block has thickness 1 in the ``x`` direction (i.e. pertains to a single parameter), its length in the ``y`` direction is `seq_length`. In total we sample as many such blocks as the batch size is big. By construction those blocks are never the same throughout a training epoch but may intersect each other!
Here the sampling is performed over the second axis (the *time step dimension*) and the third axis (the *parameter dimension*). Whereas each block has thickness 1 in the ``x`` direction (i.e. pertains to a single parameter), its length in the ``y`` direction is `seq_length`. In total we sample as many such blocks as the batch size is big. By construction those blocks are never the same throughout a training epoch but may intersect each other!
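A hedged sketch of drawing a single such block (all names and sizes here are illustrative):

```julia
dim, time_steps, n_params, seq_length = 10, 100, 5, 4
snapshot_tensor = rand(Float32, dim, time_steps, n_params)

t_start = rand(1:(time_steps - seq_length + 1))  # position along the time step axis
p_index = rand(1:n_params)                       # which parameter instance
block   = snapshot_tensor[:, t_start:(t_start + seq_length - 1), p_index]

size(block)  # (10, 4)
```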

## Library Functions

```@docs; canonical=false
DataLoader
Batch
GeometricMachineLearning.number_of_batches
GeometricMachineLearning.convert_input_and_batch_indices_to_array
```
54 changes: 26 additions & 28 deletions docs/src/layers/sympnet_gradient.md
@@ -19,36 +19,34 @@ Main.proof(raw"""Proving this is straightforward by looking at the gradient of
""" * Main.indentation * raw""" \nabla_q^2f & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix},
""" * Main.indentation * raw"""```
""" * Main.indentation * raw"""
""" * Main.indentation * raw"""where ``\nabla_q^2f`` is the Hessian of ``f``. This matrix is symmetric and for any symmetric matrix ``A`` we have that:
""" * Main.indentation * raw"""
""" * Main.indentation * raw"""```math
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix}^T \mathbb{J}_{2n}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix} =
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & A \\
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I} \\
""" * Main.indentation * raw""" -\mathbb{I} & \mathbb{O}
""" * Main.indentation * raw""" \end{pmatrix}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix} =
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I} \\
""" * Main.indentation * raw""" -\mathbb{I} & \mathbb{O}
""" * Main.indentation * raw""" \end{pmatrix} = \mathbb{J}_{2n}.
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix}^T \mathbb{J}_{2n}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix} =
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & A \\
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I} \\
""" * Main.indentation * raw""" -\mathbb{I} & \mathbb{O}
""" * Main.indentation * raw""" \end{pmatrix}
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{I} & \mathbb{O} \\
""" * Main.indentation * raw""" A & \mathbb{I}
""" * Main.indentation * raw""" \end{pmatrix} =
""" * Main.indentation * raw""" \begin{pmatrix}
""" * Main.indentation * raw""" \mathbb{O} & \mathbb{I} \\
""" * Main.indentation * raw""" -\mathbb{I} & \mathbb{O}
""" * Main.indentation * raw""" \end{pmatrix} = \mathbb{J}_{2n},
""" * Main.indentation * raw"""```
""" * Main.indentation * raw"""Thus showing symplecticity.""")
""" * Main.indentation * raw"""thus showing symplecticity.""")
```
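As a quick numerical sanity check of this proof we can verify ``(\nabla\Phi)^T\mathbb{J}_{2n}\nabla\Phi = \mathbb{J}_{2n}`` for the map ``(q, p) \mapsto (q, p + \nabla_qf(q))``; the following sketch assumes the `ForwardDiff` package is available (it is not part of the docs build):

```julia
using LinearAlgebra
import ForwardDiff

n = 2
f(q) = sum(q .^ 3)                                # an arbitrary scalar function of q
Φ(z) = vcat(z[1:n], z[n+1:2n] .+ ForwardDiff.gradient(f, z[1:n]))

J  = [zeros(n, n) I(n); -I(n) zeros(n, n)]        # the Poisson matrix J_{2n}
DΦ = ForwardDiff.jacobian(Φ, randn(2n))

DΦ' * J * DΦ ≈ J  # true: the Jacobian is symplectic
```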

If we deal with [`GSympNet`](@ref)s this function ``f`` is
@@ -60,7 +58,7 @@ If we deal with [`GSympNet`](@ref)s this function ``f`` is
where ``a, b\in\mathbb{R}^m``, ``K\in\mathbb{R}^{m\times{}n}`` and ``\Sigma`` is the antiderivative of some common activation function ``\sigma``. We routinely refer to ``m`` as the *upscaling dimension* in `GeometricMachineLearning`. Computing the gradient of ``f`` gives:

```math
[\nabla_qf]_k = \sum_{i=1}^m a_i \sigma(\sum_{j=1}^nk_{ij}q_j + b_i)k_{ik} = K^T a \odot \sigma(Kq + b),
[\nabla_qf]_k = \sum_{i=1}^m a_i \sigma(\sum_{j=1}^n k_{ij}q_j + b_i)k_{ik} = [K^T (a \odot \sigma(Kq + b))]_k,
```

where ``\odot`` is the element-wise product, i.e. ``[a\odot{}v]_k = a_kv_k``. This is the form that *gradient layers* take. In addition to gradient layers `GeometricMachineLearning` also has *linear* and *activation* layers implemented. Activation layers are simplified versions of *gradient layers*. These are equivalent to taking ``m = n`` and ``K = \mathbb{I}.``
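The gradient formula can be checked with a small numerical sketch (sizes and the activation function are chosen arbitrarily):

```julia
σ(x) = tanh(x)
m, n = 5, 3
K, a, b, q = randn(m, n), randn(m), randn(m), randn(n)

# the k-th entry is Σ_i a_i σ(Σ_j K_ij q_j + b_i) K_ik
∇qf = K' * (a .* σ.(K * q .+ b))
```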
26 changes: 6 additions & 20 deletions docs/src/tutorials/symplectic_autoencoder.md
@@ -139,30 +139,14 @@ We can see that the autoencoder approach has much more approximation capabilitie
Instead of using a standard integrator we can also use a neural network that is trained on the reduced data. For this:

```@example toda_lattice
data_unprocessed = encoder(sae_nn)(dl.input)
data_processed = ( q = reshape(data_unprocessed.q, reduced_dim ÷ 2, length(data_unprocessed.q)),
p = reshape(data_unprocessed.p, reduced_dim ÷ 2, length(data_unprocessed.p))
)

dl_reduced = DataLoader(data_processed; autoencoder = false)
integrator_batch_size = 128
integrator_train_epochs = 4
integrator_train_epochs = 10

integrator_nn = NeuralNetwork(GSympNet(reduced_dim))
o_integrator = Optimizer(AdamOptimizer(Float64), integrator_nn)
struct ReducedLoss{ET, DT} <: GeometricMachineLearning.NetworkLoss
encoder::ET
decoder::DT
end
function (loss::ReducedLoss)(model::Chain, params::Tuple, input::CT, output::CT) where {AT <:Array, CT <: NamedTuple{(:q, :p), Tuple{AT, AT}}}
GeometricMachineLearning._compute_loss(loss.decoder(model(loss.encoder(input), params)), output)
end

loss = ReducedLoss(encoder(sae_nn), decoder(sae_nn))
dl_integration = DataLoader((q = reshape(dl.input.q, size(dl.input.q, 1), size(dl.input.q, 3)),
p = reshape(dl.input.p, size(dl.input.p, 1), size(dl.input.p, 3)));
autoencoder = false
)
dl_integration = DataLoader(dl; autoencoder = false)

o_integrator(integrator_nn, dl_integration, Batch(integrator_batch_size), integrator_train_epochs, loss)

@@ -172,15 +156,17 @@ nothing # hide
We can now evaluate the solution:

```@example toda_lattice
ics = (q = dl_reduced.input.q[:, 1], p = dl_reduced.input.p[:, 1])
ics = encoder(sae_nn)((q = dl.input.q[:, 1, 1], p = dl.input.p[:, 1, 1]))
time_series = iterate(integrator_nn, ics; n_points = t_steps)
prediction = (q = time_series.q[:, end], p = time_series.p[:, end])
sol = decoder(sae_nn)(prediction)

plot!(sol.q; label = "Neural Network Integrator")
```


```@eval
Main.remark(raw"The results presented here are not ideal. In order to be able to run them quickly on a relatively weak GPU we have chosen a very low number of epochs. For more useful results the number of epochs should be increased to at least 1000.")
```


## References
22 changes: 11 additions & 11 deletions scripts/symplectic_autoencoders/online_sympnet.jl
@@ -1,18 +1,19 @@
using CUDA
using GeometricIntegrators: integrate, ImplicitMidpoint
using CairoMakie
using GeometricProblems.TodaLattice: hodeproblem
using GeometricProblems.TodaLattice: hodeensemble
using GeometricMachineLearning
import Random

backend = CUDABackend()

pr = hodeproblem(; tspan = (0.0, 100.))
params = [(α = α̃ ^ 2, N = 200) for α̃ in 0.8 : .1 : 2.0]
pr = hodeensemble(; tspan = (0.0, 100.), parameters = params)
sol = integrate(pr, ImplicitMidpoint())
dl_cpu = DataLoader(sol; autoencoder = true)
dl = DataLoader(dl_cpu, backend, Float32)

const reduced_dim = 2
const reduced_dim = 4

psd_arch = PSDArch(dl.input_dim, reduced_dim)
sae_arch = SymplecticAutoencoder(dl.input_dim, reduced_dim; n_encoder_blocks = 4, n_decoder_blocks = 4, n_encoder_layers = 2, n_decoder_layers = 2)
@@ -22,11 +23,13 @@ psd_nn = NeuralNetwork(psd_arch, backend)
sae_nn = NeuralNetwork(sae_arch, backend)

const n_epochs = 8192
const batch_size = 512
const batch_size = 2048

sae_method = AdamOptimizerWithDecay(n_epochs)
o = Optimizer(sae_nn, sae_method)

println("Number of batches: ", GeometricMachineLearning.number_of_batches(dl, Batch(batch_size)))

psd_error = solve!(psd_nn, dl)
sae_error = o(sae_nn, dl, Batch(batch_size), n_epochs)

@@ -65,14 +68,11 @@ sol_sae_reduced = integrate_reduced_system(sae_rs)

# train sympnet

data_unprocessed = encoder(sae_nn)(dl.input)
data_processed = ( q = reshape(data_unprocessed.q, reduced_dim ÷ 2, length(data_unprocessed.q)),
p = reshape(data_unprocessed.p, reduced_dim ÷ 2, length(data_unprocessed.p))
)
data_processed = encoder(sae_nn)(dl.input)

dl_reduced = DataLoader(data_processed; autoencoder = false)
integrator_train_epochs = 8192
integrator_batch_size = 512
integrator_batch_size = 2048

seq_length = 4
integrator_architecture = StandardTransformerIntegrator(reduced_dim; transformer_dim = 10, n_blocks = 3, n_heads = 5, L = 2, upscaling_activation = tanh)
@@ -129,7 +129,7 @@ end
# plot the reduced data (should be a closed orbit)
reduced_data_matrix = vcat(dl_reduced.input.q, dl_reduced.input.p)
fig_reduced = Figure()
ax_reduced = Axis(reduced_fig[1, 1])
lines!(ax_reduced, reduced_data_matrix[1, :, 1], reduced_data_matrix[2, :, 1]; color = mgreen, label = "Reduced Data")
ax_reduced = Axis(fig_reduced[1, 1])
lines!(ax_reduced, reduced_data_matrix[1, :, 1] |> mtc, reduced_data_matrix[2, :, 1] |> mtc; color = mgreen, label = "Reduced Data")
axislegend(; position = (.82, .75), backgroundcolor = :transparent, color = text_color)
save("reduced_data.png", fig_reduced)