[installation] fix installation and some name typo #146

Merged — 4 commits merged on Oct 21, 2024 (showing changes from 3 commits).
2 changes: 1 addition & 1 deletion .github/workflows/examples_test.yml
@@ -26,7 +26,7 @@ jobs:
sed -i 's/range(1000)/range(100)/g' examples/noisy_label_detection/trak_noisy_label.py
python examples/noisy_label_detection/trak_noisy_label.py --device cpu
python examples/pretrained_benchmark/influence_function_lds.py --device cpu
- python examples/pretrained_benchmark/trak_lds.py --device cpu
+ python examples/pretrained_benchmark/trak_loo.py --device cpu
python examples/brittleness/mnist_lr_brittleness.py --method cg --device cpu
- name: Uninstall the package
run: |
13 changes: 7 additions & 6 deletions README.md
@@ -26,20 +26,20 @@ git clone https://github.com/TRAIS-Lab/dattri
pip install -e .
```

- If you want to use all features on CUDA and accelerate the library, you may install the full version by
+ If you want to use `fast_jl` to accelerate the random projection, you may install the full version by
Contributor review comment: "the full version" -> "the version with fast_jl"

```bash
- pip install -e .[all]
+ pip install -e .[fast_jl]
```

> [!NOTE]
- > It's highly recommended to use a device support CUDA to run `dattri`, especially for moderately large or larger models or datasets. And it's required to have CUDA if you want to install the full version `dattri`.
+ > It's highly recommended to use a device support CUDA to run `dattri`, especially for moderately large or larger models or datasets.

> [!NOTE]
- > If you are using `dattri[all]`, please use `pip<23` and `torch<2.3` due to some known issue of `fast_jl` library.
+ > It's required to have CUDA if you want to install and use the fast_jl version `dattri[fast_jl]` to accelerate the random projection. The projection is mainly used in `TRAKAttributor`. Please use `pip<23` and `torch<2.3` due to some known issue of `fast_jl` library.
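The note above says `fast_jl` accelerates the random projection used mainly in `TRAKAttributor`. As a rough illustration of what such a projection does, here is a minimal CPU sketch of a Johnson-Lindenstrauss-style random projection in NumPy; the function name `random_project` and the toy shapes are assumptions for illustration, not dattri's or fast_jl's API.

```python
import numpy as np

def random_project(features, proj_dim, seed=0):
    """Sketch of a JL-style random projection: multiply per-sample
    feature (e.g. gradient) vectors by a fixed Gaussian matrix to
    shrink their dimension while approximately preserving inner
    products. fast_jl performs this kind of operation on CUDA."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    proj = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    return features @ proj

# 8 toy samples with 1000-dimensional features, projected down to 32
grads = np.random.default_rng(1).standard_normal((8, 1000))
projected = random_project(grads, proj_dim=32)
print(projected.shape)  # (8, 32)
```

With a fixed seed the projection matrix is deterministic, so repeated calls give identical outputs, which matters when projections of training and test gradients must live in the same subspace.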

#### Recommended enviroment setup
- It's not required to follow the exact same steps in this section. But this is a verified environment setup flow that may help users to avoid most of the issues during the installation.
+ It's **not** required to follow the exact same steps in this section. But this is a verified environment setup flow that may help users to avoid most of the issues during the installation.

```bash
conda create -n dattri python=3.10
@@ -49,7 +49,7 @@ conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
pip3 install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/TRAIS-Lab/dattri
- pip install -e .[all]
+ pip install -e .[fast_jl]
```

### Apply data attribution methods on PyTorch models
@@ -171,6 +171,7 @@ model = activate_dropout(model, ["dropout1", "dropout2"], dropout_prob=0.2)
```
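The `activate_dropout` call shown in the hunk above enables specific dropout layers by name. Below is a hedged, framework-agnostic sketch of what such a helper may do; the `Dropout` stand-in class and the `activate_dropout_sketch` name are invented for illustration and are not dattri's implementation.

```python
class Dropout:
    """Stand-in for a framework dropout module (e.g. torch.nn.Dropout):
    holds a drop probability and a training flag."""
    def __init__(self, p=0.5):
        self.p = p
        self.training = False

def activate_dropout_sketch(named_modules, layer_names, dropout_prob):
    # Switch only the named dropout modules into training mode with the
    # requested probability, leaving every other module untouched, so
    # stochastic predictions can be sampled even at inference time.
    for name, module in named_modules.items():
        if name in layer_names and isinstance(module, Dropout):
            module.p = dropout_prob
            module.training = True
    return named_modules

modules = {"dropout1": Dropout(), "dropout2": Dropout(), "fc": object()}
modules = activate_dropout_sketch(
    modules, ["dropout1", "dropout2"], dropout_prob=0.2
)
print(modules["dropout1"].p)  # 0.2
```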

## Algorithms Supported
+ We have implemented most of the state-of-the-art methods. The categories and reference paper of the algorithms are listed in the following table.
| Family | Algorithms |
|:------:|:-------------------------------------:|
| [IF](https://arxiv.org/abs/1703.04730) | [Explicit](https://arxiv.org/abs/1703.04730) |
6 changes: 5 additions & 1 deletion dattri/metric/ground_truth.py
@@ -160,7 +160,8 @@ def target_func(ckpt_path, dataloader):
target function calculated on all test samples under `num_subsets` models,
each retrained on a subset of the training data. The second tensor has the
shape (num_subsets, subset_size), where each row refers to the indices of
- the training samples used to retrain the model.
+ the training samples used to retrain the model. The targeted value will be
+ flipped to be consistent with the score calculated by the attributors.

Contributor review comment: "targeted" -> "target"
"""
retrain_dir = Path(retrain_dir)

@@ -186,4 +187,7 @@ def target_func(ckpt_path, dataloader):
target_values[i] += target_func(ckpt_path, test_dataloader)
target_values /= num_runs_per_subset

+ # flip the target values
+ target_values = -target_values

return target_values, indices
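The sign flip added in this hunk can be illustrated with a toy example (numbers made up): lower-is-better target values, such as losses from retrained models, are negated so that larger values point the same way as attribution scores, where larger means more helpful training data.

```python
# Toy ground-truth target values: e.g. average test loss under models
# retrained on 2 different training subsets, for 3 test samples each.
target_values = [
    [0.9, 1.2, 0.7],
    [0.8, 1.1, 0.6],
]

# Flip the sign so the ground truth ranks in the same direction as the
# scores produced by the attributors (higher = more beneficial).
flipped = [[-v for v in row] for row in target_values]
print(flipped[0])  # [-0.9, -1.2, -0.7]
```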
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -21,13 +21,14 @@ dependencies = [
"numpy>=1.25",
"scipy>=1.11",
"pyyaml",
+ "pretty_midi"
]

[project.urls]
homepage = "https://github.com/TRAIS-Lab/dattri"

[project.optional-dependencies]
- all = ["fast_jl"]
+ fast_jl = ["fast_jl"]
test = ["build", "pytest", "pre-commit", "ruff", "darglint", "scikit-learn", "pretty_midi", "requests"]

[tool.setuptools.packages]