Wheest
Contributor

@Wheest Wheest commented Oct 3, 2025

This PR introduces a new dvc purge command to remove DVC-tracked outputs and their cache, while leaving stage metadata (.dvc files, dvc.yaml) intact. It's intended as a safer/faster alternative to manually deleting files and cache when cleaning up a workspace.

CLI

dvc purge [targets...] [-r|--recursive] [--dry-run] [-f|--force] [-y|--yes]
  • targets...: optional list of specific files/directories to purge. If omitted, the entire repo is considered.
  • --recursive, -r: recurse into directories.
  • --dry-run: show what would be removed, without deleting anything.
  • --force, -f: bypass safety checks (dirty outputs, remote backup).
  • --yes, -y: skip confirmation prompt.
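
For illustration only, here is how the flags above could be declared with plain argparse. This is a stand-alone sketch, not the parser code in this PR: `build_parser` is a hypothetical name, and DVC's real command classes are wired up differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(prog="dvc purge")
    # Zero or more targets; an empty list means "consider the entire repo".
    parser.add_argument("targets", nargs="*", help="files/directories to purge")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="recurse into directories")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be removed without deleting")
    parser.add_argument("-f", "--force", action="store_true",
                        help="bypass safety checks")
    parser.add_argument("-y", "--yes", action="store_true",
                        help="skip confirmation prompt")
    return parser


args = build_parser().parse_args(["data/", "-r", "--dry-run"])
print(args.targets, args.recursive, args.dry_run, args.force, args.yes)
```

With `nargs="*"`, omitting targets yields an empty list, which matches the "entire repo" default described above.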

Behaviour

  • Collect outputs (outs) from .dvc files and dvc.yaml.
  • For each output:
    • Remove workspace copies (files/dirs).
    • Remove corresponding objects from the local cache.
  • Stage metadata remains intact.
  • Non-DVC files are never touched.
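
The behaviour above can be sketched with a toy model. This is not the implementation in dvc/repo/purge.py: `purge_outputs` is a hypothetical helper, the "cache" is just a flat directory of files named by content hash, and the `.dvc` file is a placeholder rather than real stage metadata.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def purge_outputs(outs: Dict[Path, str], cache_dir: Path, dry_run: bool = False) -> int:
    """Remove each output's workspace copy and its cache object.

    `outs` maps a workspace path to its content hash (a toy stand-in for DVC outs).
    Returns the number of outputs removed.
    """
    removed = 0
    for path, digest in outs.items():
        if dry_run:
            print(f"[dry-run] Would remove {path.name}")
            continue
        if path.is_dir():
            shutil.rmtree(path)
        elif path.exists():
            path.unlink()
        # Drop the matching cache object, if it is there.
        (cache_dir / digest).unlink(missing_ok=True)
        removed += 1
    return removed


# Toy workspace: one tracked output, its metadata file, and one untracked file.
root = Path(tempfile.mkdtemp())
cache = root / "cache"
cache.mkdir()
data = root / "data.bin"
data.write_bytes(b"payload")
digest = hashlib.md5(b"payload").hexdigest()
(cache / digest).write_bytes(b"payload")
meta = root / "data.bin.dvc"
meta.write_text("placeholder metadata\n")
notes = root / "notes.txt"
notes.write_text("untracked\n")

removed = purge_outputs({data: digest}, cache)
print(removed, data.exists(), (cache / digest).exists(), meta.exists(), notes.exists())
```

Only paths listed in `outs` are ever touched, which is why the metadata file and the untracked file survive.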

Safety Checks

Before purging, DVC performs two safety checks:

  1. Dirty outputs – if an output has been modified in the workspace and differs from cache:
    • Abort with PurgeError unless --force is used.
  2. Remote backup – if a default remote is configured, verify that all outputs are present remotely:
    • If missing -> abort unless --force.
    • If no remote is configured -> abort unless --force.
    • With --force, purge proceeds but logs a warning that data may be permanently lost.
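
A minimal sketch of the two checks, under stated assumptions: `check_safety` and its parameters are hypothetical, the `PurgeError` class here is a local stand-in for the one named above, and the message wording is illustrative. The real code works on DVC outputs and remotes rather than plain lists.

```python
from typing import List


class PurgeError(Exception):
    """Local stand-in for the PurgeError named above (not DVC's actual class)."""


def check_safety(dirty: List[str], has_default_remote: bool,
                 missing_in_remote: List[str], force: bool = False) -> List[str]:
    """Run both checks; return warnings to log, or raise PurgeError."""
    warnings: List[str] = []

    # Check 1: dirty outputs abort the purge unless --force is given.
    if dirty and not force:
        raise PurgeError("Some tracked outputs have uncommitted changes: "
                         + ", ".join(dirty))

    # Check 2: remote backup must be verifiable and complete.
    if not has_default_remote:
        problem = "No default remote configured."
    elif missing_in_remote:
        problem = ("Some outputs are not present in the remote: "
                   + ", ".join(missing_in_remote))
    else:
        problem = ""

    if problem:
        if not force:
            raise PurgeError(problem + " Use --force to purge anyway.")
        # With --force the purge proceeds, but the risk is surfaced as a warning.
        warnings.append(problem + " Proceeding due to --force;"
                        " outputs may be permanently lost.")
    return warnings


try:
    check_safety([], has_default_remote=False, missing_in_remote=[])
except PurgeError as exc:
    print(f"ERROR: {exc}")

print(check_safety([], has_default_remote=False, missing_in_remote=[], force=True))
```

The dirty check runs first in this sketch, so a modified output is reported before any remote problem.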

Example

$ dvc purge --dry-run
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.
Use `--force` to purge anyway.
$ dvc purge --force -y
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
WARNING: No default remote configured. Proceeding with purge due to --force. Outputs may be permanently lost.
Removed 5 outputs (workspace + cache).

Tests

  • ✅ Purge removes both workspace + cache copies, leaves .dvc metadata.
  • ✅ Purge with targets removes only matching outs.
  • ✅ Recursive purge works on nested dirs.
  • ✅ Dry-run lists removals without making changes.
  • ✅ Dirty outs raise error unless --force.
  • ✅ Missing remote / missing objects raise error unless --force.
  • ✅ CLI tests for confirmation, -y, and force behavior.

Fixes #10874
Docs will be added in iterative/dvc.org#5464

@github-project-automation github-project-automation bot moved this to Backlog in DVC Oct 3, 2025

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 94.97908% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.98%. Comparing base (2431ec6) to head (8a218df).
⚠️ Report is 135 commits behind head on main.

Files with missing lines    Patch %    Lines
dvc/repo/purge.py           86.66%     6 Missing and 4 partials ⚠️
dvc/commands/purge.py       94.11%     1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10880      +/-   ##
==========================================
+ Coverage   90.68%   90.98%   +0.29%     
==========================================
  Files         504      508       +4     
  Lines       39795    41025    +1230     
  Branches     3141     3257     +116     
==========================================
+ Hits        36087    37325    +1238     
- Misses       3042     3054      +12     
+ Partials      666      646      -20     

@Wheest
Contributor Author

Wheest commented Oct 3, 2025

Specific question for reviewers: are there existing helpers in the codebase that I could be using for any part of this code? I haven't done much development in DVC, so I may have missed some.

@Wheest
Contributor Author

Wheest commented Oct 3, 2025

Note: reviewers can get a feel for the tool by running:

# 1. Initialize a Git repo
git init dvc-repo
cd dvc-repo

# 2. Initialize DVC
dvc init

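# 3. Create a few 1MB junk files (fish shell syntax)
for i in (seq 1 5)
    head -c 1M </dev/urandom > file_$i.bin
end
# bash/zsh equivalent:
# for i in $(seq 1 5); do head -c 1M </dev/urandom > "file_$i.bin"; done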

# 4. Add the files to DVC
dvc add file_*.bin

# 5. Commit changes to Git
git add .
git commit -m "Initialize DVC repo with 1MB junk files"

(Be sure to have the DVC version from this PR installed.)

1. Preview what files would be deleted

$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.
Use `--force` to purge anyway.

2. Preview what files would be deleted (with --force)

$ dvc purge --dry-run --force
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
WARNING: No default remote configured. Proceeding with purge due to --force. Outputs may be permanently lost.
[dry-run] Would remove file_4.bin
[dry-run] Would remove file_5.bin
[dry-run] Would remove file_1.bin
[dry-run] Would remove file_3.bin
[dry-run] Would remove file_2.bin
Nothing to purge.

3. Try to purge files that aren't backed up

$ dvc purge
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
Are you sure you want to proceed? [y/n]: y
ERROR: Some outputs are not present in the remote cache and would be permanently lost if purged:
  - file_4.bin
  - file_5.bin
  - file_1.bin
  - file_3.bin
  - file_2.bin
Use `--force` to purge anyway.

4. Change a file, preview warnings

# append 10 random bytes at the end
$ dd if=/dev/urandom bs=1 count=10 >> file_1.bin

$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: Some tracked outputs have uncommitted changes. Use `--force` to purge anyway.
  - file_1.bin

5. Set up remote, preview what would be removed

$ mkdir -p /tmp/dvc-remote
$ dvc remote add -d local_remote /tmp/dvc-remote
$ dvc push
$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
[dry-run] Would remove file_4.bin
[dry-run] Would remove file_5.bin
[dry-run] Would remove file_1.bin
[dry-run] Would remove file_3.bin
[dry-run] Would remove file_2.bin
Nothing to purge.

6. Purge files that are confirmed to be backed up

$ dvc purge -y
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
Removed 5 outputs (workspace + cache).

@skshetry
Collaborator

skshetry commented Oct 4, 2025

Hi, thank you for creating the pull request. I am OOO, so please give me a few days to review this (and the problem statement/issue itself).

@Wheest
Contributor Author

Wheest commented Oct 13, 2025

@rgoya does this fit your needs? Are there any features you need that aren't represented?

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Remove all locally downloaded data

2 participants