User guide for environment variables (#227)
A small user guide covering the user-facing environment variables,
based on #226.

rendered version on fork:
https://scienfitz.github.io/baybe-dev/userguide/envvars.html
AdrianSosic authored May 21, 2024
2 parents 80b478f + 7171c60 commit e759ca5
Showing 5 changed files with 120 additions and 78 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Basic deserialization tests using different class type specifiers
- `GammaPrior`, `HalfCauchyPrior`, `NormalPrior`, `HalfNormalPrior`, `LogNormalPrior`
and `SmoothedBoxPrior` can now be chosen as lengthscale prior
- Environment variables user guide

### Changed
- Reorganized acquisition.py into `acquisition` subpackage
6 changes: 5 additions & 1 deletion README.md
@@ -298,12 +298,16 @@ Darmstadt, Germany and/or its affiliates. The recording of metrics is turned off
all other users and is impossible due to a VPN block. In any case, the usage statistics
do **not** involve logging of recorded measurements, targets/parameters or their names
or any project information that would allow for reconstruction of details. The user and
host machine names are irreversibly anonymized.
host machine names are anonymized via truncated hashing.
- You can verify the above statements by studying the open-source code in the
`telemetry` module.
- You can always deactivate all telemetry by setting the environment variable
`BAYBE_TELEMETRY_ENABLED` to `false` or `off`. For details please consult
[this page](https://emdgroup.github.io/baybe/_autosummary/baybe.telemetry.html).
- If you want to be absolutely sure, you can uninstall internet-related packages such
as `opentelemetry*` or its secondary dependencies from the environment. Because
opt-out dependencies cannot be specified, these are installed by default, but the
package works without them.

## Authors

80 changes: 3 additions & 77 deletions baybe/telemetry.py
@@ -1,78 +1,4 @@
"""Telemetry functionality for BayBE.
Important:
BayBE collects anonymous usage statistics **only** for employees of Merck KGaA,
Darmstadt, Germany and/or its affiliates. The recording of metrics is turned off
for all other users and impossible due to a VPN block. In any case, the usage
statistics do **not** involve logging of recorded measurements, targets or any
project information that would allow for reconstruction of details. The user and
host machine names are irreversibly anonymized.
**Monitored quantities are:**
* ``batch_size`` used when querying recommendations
* Number of parameters in the search space
* Number of constraints in the search space
* How often ``recommend`` was called
* How often ``add_measurements`` was called
* How often a search space is newly created
* How often initial measurements are added before recommendations were calculated
("naked initial measurements")
* The fraction of measurements added that correspond to previous recommendations
* Each measurement is associated with an irreversible hash of the user- and hostname
**The following environment variables control the behavior of BayBE telemetry:**
``BAYBE_TELEMETRY_ENABLED``
Flag that can turn off telemetry entirely (default is `true`). To turn it off set it
to `false`.
``BAYBE_TELEMETRY_ENDPOINT``
The receiving endpoint URL for telemetry data.
``BAYBE_TELEMETRY_VPN_CHECK``
Flag turning an initial telemetry connectivity check on/off (default is `true`).
``BAYBE_TELEMETRY_VPN_CHECK_TIMEOUT``
The timeout in seconds for the check whether the endpoint URL is reachable.
``BAYBE_TELEMETRY_USERNAME``
The name of the user executing BayBE code. Defaults to an irreversible hash of
the username according to the OS.
``BAYBE_TELEMETRY_HOSTNAME``
The name of the machine executing BayBE code. Defaults to an irreversible hash of
the machine name.
If you wish to disable logging, you can set the following environment variable:
.. code-block:: console
export BAYBE_TELEMETRY_ENABLED=false
or in Python:
.. code-block:: python
import os
os.environ["BAYBE_TELEMETRY_ENABLED"] = "false"
before calling any BayBE functionality.
Telemetry can be re-enabled by simply removing the variable:
.. code-block:: console
unset BAYBE_TELEMETRY_ENABLED
or in Python:
.. code-block:: python
os.environ.pop("BAYBE_TELEMETRY_ENABLED")
Note, however, that (un-)setting the variable in the shell will not affect the running
Python session.
"""
"""Telemetry functionality for BayBE."""

import getpass
import hashlib
@@ -155,7 +81,7 @@ def is_enabled() -> bool:
try:
DEFAULT_TELEMETRY_USERNAME = (
hashlib.sha256(getpass.getuser().upper().encode()).hexdigest().upper()[:10]
) # this hash is irreversible and cannot identify the user or their machine
)
except ModuleNotFoundError:
# getpass.getuser() does not work on Windows if all the environment variables
# it checks are empty. Since then there is no way of inferring the username, we
@@ -164,7 +90,7 @@ def is_enabled() -> bool:

DEFAULT_TELEMETRY_HOSTNAME = (
hashlib.sha256(socket.gethostname().encode()).hexdigest().upper()[:10]
) # this hash is irreversible and cannot identify the user or their machine
)

_endpoint_url = os.environ.get(
VARNAME_TELEMETRY_ENDPOINT, DEFAULT_TELEMETRY_ENDPOINT
110 changes: 110 additions & 0 deletions docs/userguide/envvars.md
@@ -0,0 +1,110 @@
# Environment Variables

Several aspects of BayBE can be configured via environment variables.

## Basic Instructions
An environment variable named `ENVVAR_NAME` is best set before any Python code runs.
It must be set in the same shell session that launches Python unless it is made
persistent, e.g. via `.bashrc` or similar:
```bash
export ENVVAR_NAME="my_value"
python do_baybe_work.py
```
Or on Windows:
```shell
set ENVVAR_NAME=my_value
```
Note that variables set this way are read as text and converted internally to the
required type. See, for instance, the [`strtobool`](baybe.utils.boolean.strtobool)
converter for the values that BayBE interprets as booleans.
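
As a rough illustration of this text-to-boolean conversion, the sketch below parses
typical flag strings. The helper name `parse_bool_flag` and the accepted spellings are
assumptions made for this example; consult the `strtobool` documentation for the values
BayBE actually accepts.

```python
# Hypothetical helper for illustration only; this is not BayBE's converter.
# The accepted spellings below are an assumption.
TRUE_STRINGS = {"1", "true", "yes", "on"}
FALSE_STRINGS = {"0", "false", "no", "off"}


def parse_bool_flag(raw: str) -> bool:
    """Convert the text value of an environment variable into a boolean."""
    normalized = raw.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError(f"Cannot interpret {raw!r} as a boolean.")
```

Raising an error on unknown strings, rather than silently falling back to a default,
makes typos in a flag value visible immediately.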

It is also possible to set environment variables in Python:
```python
import os

os.environ["ENVVAR_NAME"] = "my_value"

# proceed with BayBE code ...
```
However, this must be done at the entry point of your script or session, before any
BayBE functionality is called, and it will not persist between sessions.

## Telemetry

```{admonition} Telemetry Scope
:class: important
BayBE collects anonymous usage statistics **only** for employees of Merck KGaA,
Darmstadt, Germany and/or its affiliates. The recording of metrics is turned off for
all other users and impossible due to a VPN block. In any case, the usage statistics
do **not** involve logging of recorded measurements, targets or any project information
that would allow for reconstruction of details. The user and host machine names are
anonymized.
```

Monitored quantities:
* `batch_size` used when querying recommendations
* Number of parameters in the search space
* Number of constraints in the search space
* How often [`recommend`](baybe.campaign.Campaign.recommend) was called
* How often [`add_measurements`](baybe.campaign.Campaign.add_measurements) was called
* How often a search space is newly created
* How often initial measurements are added before recommendations were calculated
("naked initial measurements")
* The fraction of measurements added that correspond to previous recommendations
* Each measurement is associated with a truncated hash of the user- and hostname
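
The truncated hashing mentioned in the last item can be sketched as follows. The
expression follows the defaults shown in `baybe/telemetry.py` (SHA-256, upper-cased hex
digest, first 10 characters); the helper name `truncated_hash` and the sample inputs are
made up for this example.

```python
import hashlib


def truncated_hash(name: str, length: int = 10) -> str:
    """Return the first `length` characters of the upper-cased SHA-256 hex digest."""
    return hashlib.sha256(name.encode()).hexdigest().upper()[:length]


# Hypothetical sample inputs. The username is upper-cased before hashing,
# while the hostname is hashed as-is.
user_id = truncated_hash("jane.doe".upper())
host_id = truncated_hash("lab-workstation-01")
```

The same input always yields the same identifier, so repeated calls from one user can be
grouped without storing the name itself.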

The following environment variables control the behavior of BayBE telemetry:
- `BAYBE_TELEMETRY_ENABLED`: Flag that can turn off telemetry entirely (default is
`True`). To turn it off set it to `False`.
- `BAYBE_TELEMETRY_ENDPOINT`: The receiving endpoint URL for telemetry data.
- `BAYBE_TELEMETRY_VPN_CHECK`: Flag turning an initial telemetry connectivity check
on/off (default is `True`).
- `BAYBE_TELEMETRY_VPN_CHECK_TIMEOUT`: The timeout in seconds for the check whether the
endpoint URL is reachable.
- `BAYBE_TELEMETRY_USERNAME`: The name of the user executing BayBE code. Defaults to a
truncated hash of the username according to the OS.
- `BAYBE_TELEMETRY_HOSTNAME`: The name of the machine executing BayBE code. Defaults to
a truncated hash of the machine name.
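
For instance, telemetry could be switched off from within Python along these lines. This
is a minimal sketch: the two helper names are hypothetical, and the variable has to be
set before any BayBE functionality is called.

```python
import os

VARNAME = "BAYBE_TELEMETRY_ENABLED"


def disable_telemetry() -> None:
    """Turn telemetry off for the current process and its child processes."""
    os.environ[VARNAME] = "false"


def restore_telemetry_default() -> None:
    """Remove the variable again so that the default behavior applies."""
    os.environ.pop(VARNAME, None)  # no error if the variable was never set


disable_telemetry()
# ... call BayBE functionality only after this point ...
```

Setting or unsetting the variable in a shell afterwards does not affect an already
running Python session.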

```{admonition} Uninstalling Internet Packages
:class: important
If you do not trust the instructions above, you are free to uninstall all
internet-related packages such as `opentelemetry*` or its secondary dependencies. These
are shipped with the default dependencies because there is no good way to declare
opt-out dependencies, but the BayBE package works without them.
```

## Disk Caching
For some components, such as the
[`SubstanceParameter`](baybe.parameters.substance.SubstanceParameter), some of the
computation results are cached in local storage.

By default, BayBE determines the location of temporary files on your system and puts
cached data into a subfolder `.baybe_cache` there. To change the location of the disk
cache, set:
```bash
BAYBE_CACHE_DIR="/path/to/your/desired/cache/folder"
```

By setting
```bash
BAYBE_CACHE_DIR=""
```
you can turn off disk caching entirely.
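
The three cases described above (variable unset, set to a path, set to an empty string)
can be summarized in a short sketch. The function `resolve_cache_dir` is hypothetical and
only restates the documented behavior; it is not BayBE's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return the cache folder implied by `env`, or None if caching is off."""
    value = env.get("BAYBE_CACHE_DIR")
    if value is None:
        # Variable not set: subfolder of the system's temporary directory.
        return Path(tempfile.gettempdir()) / ".baybe_cache"
    if value == "":
        # Empty string: disk caching is turned off entirely.
        return None
    return Path(value)
```

Passing the environment as an argument, e.g. `resolve_cache_dir(os.environ)`, keeps the
sketch easy to try out with different settings.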

## Floating Point Precision
In general, double precision is recommended because single precision can degrade
numerical stability during optimization. This affects gradient-based optimization,
i.e. search spaces with continuous parameters, more than gradient-free optimization.

If you still want to use single precision, you can set the following boolean variables:
- `BAYBE_NUMPY_USE_SINGLE_PRECISION` (defaults to `False`)
- `BAYBE_TORCH_USE_SINGLE_PRECISION` (defaults to `False`)
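
To get a feeling for what is lost in single precision, the following sketch emulates
float32 rounding with the standard-library `struct` module. It does not use BayBE, NumPy
or PyTorch, and the helper name `to_float32` is an assumption made for this example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


step = 1e-8  # a small update, e.g. a tiny gradient step

# Double precision still registers the update.
double_result = 1.0 + step

# In single precision the update is smaller than the spacing between
# neighboring representable numbers around 1.0 and is rounded away.
single_result = to_float32(1.0 + to_float32(step))

print(double_result > 1.0)   # True
print(single_result == 1.0)  # True
```

Small increments like this one occur frequently in gradient-based routines, which is one
reason they tend to suffer more from reduced precision.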

```{admonition} Continuous Constraints in Single Precision
:class: warning
Currently, due to explicit casting in BoTorch,
[`ContinuousConstraint`](baybe.constraints.base.ContinuousConstraint)s do not support
single precision and cannot be used if the corresponding environment variables are
activated.
```
1 change: 1 addition & 0 deletions docs/userguide/userguide.md
@@ -3,6 +3,7 @@
```{toctree}
Campaigns <campaigns>
Constraints <constraints>
Environment Vars <envvars>
Objective <objective>
Parameters <parameters>
Recommenders <recommenders>
