Batching and database benchmarks. (#148)
* Understanding the costs of batching.

* No virtualenvs from poetry.

* Notes on codespaces error.

* Improve install.sh to always install our package.

* Performance benchmarks

* ignore poetry.toml
tbenthompson authored Nov 10, 2022
1 parent c41756e commit acc7479
Showing 14 changed files with 3,220 additions and 123 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -50,3 +50,7 @@ terraform.*

# explicitly ignore a file.
*.gitignore

# poetry.toml should not be committed because it will vary between machines
# (e.g. CI will want to use virtualenvs but locally we use conda)
poetry.toml
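
For concreteness, the local poetry.toml that install.sh (below) generates via `poetry config virtualenvs.create false --local` should look like this, assuming Poetry's standard local-config behavior; a CI machine's copy would differ, which is why the file is ignored:

```toml
[virtualenvs]
create = false
```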
13 changes: 13 additions & 0 deletions cloud/README.md
@@ -44,6 +44,19 @@ For personal customizations, you get to do a few things. These are not mutually

I am happy to share my dotfiles and VSCode settings if you'd like. To share the dotfiles, I'll need to scrub out some passwords first, but that's probably an improvement anyway. :embarrassed:

## Codespaces and Dev Containers bugs

This stuff is on shaky footing. See the issue here: https://github.com/Confirm-Solutions/confirmasaurus/issues/146

If you get the error:

```
2022-11-09 19:19:50.047Z: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/ec3218afb8e82841b8d25c74cf6c5686c7e7c953e61b9c554c7ec5c73821df87/merged/run/nvidia-persistenced/socket: no such device or address: unknown.
```

then it's likely that you did not select the `smalldev`, `bigdev`, or `clouddev` container and instead used the default devcontainer. This is a temporary bug and should be fixed by upcoming Codespaces releases. The blocker is a release of the Dev Containers extension that incorporates the outcome of this PR: https://github.com/devcontainers/cli/pull/173.


## Getting started launching AWS infra

Installing and configuring your tools:
1 change: 0 additions & 1 deletion cloud/images/bigdev/Dockerfile
@@ -108,7 +108,6 @@ RUN apt-get update \
# https://gitlab.com/nvidia/container-images/cuda/-/tree/master/dist/11.7.0/ubuntu2204
# We combine the packages from the base, runtime and devel images.
ARG CUDA_PKG_VERSION="11-7"
-ENV CUDA_VERSION 11.7.0
ENV NVARCH=x86_64
RUN curl -fsSLO https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/${NVARCH}/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb \
1 change: 0 additions & 1 deletion cloud/images/smalldev/Dockerfile
@@ -36,7 +36,6 @@ RUN apt-get update \
&& apt-get autoremove -y

ARG CUDA_PKG_VERSION="11-7"
-ENV CUDA_VERSION 11.7.0
ENV NVARCH=x86_64
RUN curl -fsSLO https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/${NVARCH}/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb \
22 changes: 15 additions & 7 deletions confirm/lewislib/batch.py
@@ -142,6 +142,11 @@ def batch(f, batch_size: int, in_axes, out_axes=None):
    specified axis. If the function has multiple outputs, each output is
    concatenated along the corresponding axis.
+
+    NOTE: In performance-critical situations, it may be better to use batch_all
+    and decide for yourself how to concatenate or process the output. For
+    example, np.concatenate can be slower than jnp.concatenate when the
+    batched function outputs JAX arrays.

    Args:
        f: Function to be batched.
        batch_size: The batch size.
@@ -180,13 +185,16 @@ def entry(i, j):
        else:
            return outs[j][i]

-    return_vals = [
-        np.concatenate(
-            [entry(i, j) for j in range(len(outs))],
-            axis=internal_out_axes[i],
-        )
-        for i in range(len(outs[0]))
-    ]
+    if len(outs) == 1:
+        return_vals = [entry(i, 0) for i in range(len(outs[0]))]
+    else:
+        return_vals = [
+            np.concatenate(
+                [entry(i, j) for j in range(len(outs))],
+                axis=internal_out_axes[i],
+            )
+            for i in range(len(outs[0]))
+        ]
    if return_first:
        return return_vals[0]
    else:
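To make the NOTE in the batch docstring concrete, here is a minimal standalone sketch, not part of this commit and with arbitrary array shapes, of why np.concatenate can lose to jnp.concatenate when the batched function outputs JAX arrays:

```python
import time

import jax.numpy as jnp
import numpy as np

# Stand-in for the per-batch outputs of a batched JAX function.
outs = [jnp.ones((2000, 200)) for _ in range(50)]
jnp.concatenate(outs, axis=0).block_until_ready()  # warm-up

start = time.time()
# np.concatenate converts each device array to a host numpy array first,
# forcing a device-to-host transfer per batch.
host_result = np.concatenate(outs, axis=0)
print(f"np.concatenate:  {time.time() - start:.4f}s")

start = time.time()
# jnp.concatenate stays on the device; block_until_ready gives honest timing.
device_result = jnp.concatenate(outs, axis=0).block_until_ready()
print(f"jnp.concatenate: {time.time() - start:.4f}s")
```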
11 changes: 10 additions & 1 deletion install.sh
@@ -1,7 +1,16 @@
#!/bin/bash

+# We don't use poetry envs. The caller should have already activated a conda
+# environment.
+poetry config virtualenvs.create false --local
+
+# Install dependencies. This might fail in Codespaces or Dev Containers due to
+# not being run as root. That's okay because in those settings, we've already
+# installed our dependencies in the Dockerfile.
+poetry install || true
+
# Install our package:
-poetry install
+poetry install --only-root

# Set up pre-commit so it's fast the first time it gets used
pre-commit install --install-hooks
89 changes: 34 additions & 55 deletions poetry.lock

Some generated files are not rendered by default.

243 changes: 240 additions & 3 deletions research/adagrid/inspector.ipynb

Large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions research/adagrid/inspector.md
@@ -67,12 +67,78 @@ load_iter = "latest"
S, load_iter, fn = adastate.load(name, load_iter)
```

```python
plt.hist(S.B_lam.min(axis=0))
plt.show()
```

```python
cr = Criterion(lei_obj, P, S, D)
assert S.twb_max_lam[cr.twb_worst_tile] == np.min(S.twb_max_lam)
assert S.twb_min_lam[cr.twb_worst_tile] == np.min(S.twb_min_lam[cr.ties])
```

```python
Blamsort = S.B_lam.argsort(axis=0)
```

```python
origlamsort = S.orig_lam.argsort()
```

```python
Blamsort[0]
```

```python
import confirm.mini_imprint.lewis_drivers as ld

for i in [0, 1, 10, 100, 200, 300, 500, 750, 1000, 5000, 10000, 100000]:
    B_lamss_idx = Blamsort[i, :]
    B_lamss = S.B_lam[B_lamss_idx, np.arange(S.B_lam.shape[1])]
    overall_tile = origlamsort[i]
    overall_lam = S.orig_lam[overall_tile]
    # Entry 0 is the overall tuned lambda; entries 1..B are the bootstrap lambdas.
    bootstrap_min_lams = np.concatenate(([overall_lam], B_lamss))
    overall_stats = ld.one_stat(
        lei_obj,
        S.g.theta_tiles[overall_tile],
        S.g.null_truth[overall_tile],
        S.sim_sizes[overall_tile],
        D.unifs,
        D.unifs_order,
    )
    overall_typeI_sum = (overall_stats[None, :] < bootstrap_min_lams[:, None]).sum(
        axis=1
    )
    bias = (overall_typeI_sum[0] - overall_typeI_sum[1:].mean()) / S.sim_sizes[
        overall_tile
    ]
    print(f"index={i} bias={bias:.5f}")
```

```python
overall_typeI_sum = (overall_stats[None, :] < bootstrap_min_lams[:, None]).sum(axis=1)
bias = (overall_typeI_sum[0] - overall_typeI_sum[1:].mean()) / S.sim_sizes[overall_tile]
```

```python
cr.bias
```

```python
tie = cr.overall_typeI_sum / S.sim_sizes[cr.overall_tile]
tie[0] - np.mean(tie[1:])
```
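
The cell above is the standard bootstrap bias estimate. In symbols (notation assumed here for exposition): with $\widehat{\mathrm{TIE}}_0$ the type I error estimate at the overall tuned $\lambda^{*}$ and $\widehat{\mathrm{TIE}}_1, \dots, \widehat{\mathrm{TIE}}_B$ the estimates at the $B$ bootstrap-tuned lambdas,

$$\widehat{\mathrm{bias}} = \widehat{\mathrm{TIE}}_0 - \frac{1}{B} \sum_{b=1}^{B} \widehat{\mathrm{TIE}}_b.$$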

```python
biases = [tie[i] - np.mean(np.delete(tie, i)) for i in range(1, len(tie))]
plt.hist(biases)
plt.show()
```

```python
# Assumed intent: the distribution of the bootstrap type I error estimates.
plt.hist(tie[1:])
plt.show()
```

```python
idxs = cr.dangerous[:10]
alpha0_new = adastate.AdaRunner(P, lei_obj).batched_invert_bound(