Inconsistent output paths when acquiring Windows container from Linux #171

Open
JazzCore opened this issue May 30, 2024 · 1 comment

JazzCore commented May 30, 2024

When acquiring any non-live Windows container (HDD, VM image) from Linux on a case-sensitive filesystem, the output tar/directory contains duplicate directories that differ only in case.

For example, running acquire windows-vm.qcow2 on Linux with btrfs produces the following directories (truncated for readability):

$ tree
.
└── C:
    ├── $Recycle.bin
    ├── $Recycle.Bin
    ├── windows
    │   ├── appcompat
    │   ├── system32
    │   │   ├── config
    │   │   ├── drivers
    │   │   ├── sru
    │   │   ├── tasks
    │   │   ├── wbem
    │   │   └── winevt
    │   └── tasks
    └── Windows
        └── System32
            └── WDI

Notice the duplicated $Recycle.Bin, Windows and System32 directories, which differ only in case.
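To check an extracted output directory for this problem, one option is to group every directory path by its lowercased form: any group with more than one spelling is a case-only duplicate like the ones above. The following is a minimal sketch that assumes the output has already been extracted to a local directory on a case-sensitive filesystem; find_case_duplicates is a hypothetical helper, not part of acquire or dissect.

```python
import os
from collections import defaultdict


def find_case_duplicates(root: str) -> dict[str, list[str]]:
    """Find directories under root whose relative paths differ only in case.

    Returns a mapping from the lowercased relative path to every distinct
    spelling found on disk (sorted), keeping only groups with 2+ spellings.
    """
    groups: defaultdict[str, set[str]] = defaultdict(set)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            groups[rel.lower()].add(rel)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

Run against the tree shown above, this would report groups for $Recycle.Bin, Windows and Windows/System32, since both the parent and the nested directory exist in two spellings.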
I managed to partially fix it by replacing all sysvol/windows/ and /sysvol/windows/system32 strings in acquire.py with the correct case, but this approach also requires similar changes in other dissect libraries, since acquire calls them to get collection paths. Surely there is a better fix than hard-coding the correct case in collection paths, e.g. using the proper path from the filesystem for the output path.
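The idea of taking the output path from the filesystem rather than from the spelling of the collection path could look roughly like the sketch below: each path component is matched case-insensitively against the entries that actually exist in its parent directory, and the on-disk spelling is used. This is an illustration only. resolve_case is a hypothetical helper that works on a local directory via os.listdir; it is not an acquire or dissect API, and a real fix would have to query the image's filesystem through dissect instead.

```python
import os


def resolve_case(root: str, requested: str) -> str:
    """Map a collection path with arbitrary casing onto the real on-disk names.

    Each component is matched against the entries of its parent directory:
    an exact match wins, otherwise the first case-insensitive match (in sorted
    order) is used. Components that do not exist are kept as given.
    """
    current = root
    parts: list[str] = []
    for component in requested.strip("/").split("/"):
        if not component:
            continue  # skip empty components from doubled or trailing slashes
        try:
            entries = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            entries = []
        if component in entries:
            match = component
        else:
            lowered = component.lower()
            match = next((e for e in sorted(entries) if e.lower() == lowered), component)
        parts.append(match)
        current = os.path.join(current, match)
    return "/".join(parts)
```

With this approach, collection paths such as sysvol/windows/system32 and /sysvol/Windows/System32 would both resolve to the single spelling stored on disk, so they would land in the same output directory.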

cecinestpasunepipe (Contributor) commented

We appreciate you reporting the duplication issue with Dissect artefacts. It arises because artefacts are curated from a diverse range of sources, which do not all spell paths the same way. While our tools do deduplicate for certain functions, duplicates may persist in raw tar extracts because each source collects its paths differently.

Currently, the best solution is to post-process the output after extraction, for example with a script or a third-party Dissect tool. However, we do not plan to add such a feature to our core software. The tar archive should be regarded as an intermediate data store, intended for further processing rather than direct manual extraction.
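A post-processing script of the kind suggested here might copy an extracted tree into a fresh directory while merging directories whose paths differ only in case. The sketch below is one possible approach, not an official Dissect tool; merge_case_duplicates is a hypothetical name. Its canonical-name rule is an assumption: at each level the first spelling in sorted order wins, which puts uppercase first (e.g. Windows over windows) but is only a heuristic.

```python
import os
import shutil


def merge_case_duplicates(src: str, dst: str) -> None:
    """Copy src into dst, merging directories whose paths differ only in case.

    For every directory, the first spelling visited (sorted order) becomes the
    canonical one; contents of other spellings are copied into it. If the same
    file name appears under two spellings, the first copy is kept.
    """
    canonical: dict[str, str] = {}  # lowercased relative dir -> chosen relative dir

    def canon_dir(rel_dir: str) -> str:
        if not rel_dir:
            return ""
        parent, name = os.path.split(rel_dir)
        parent_c = canon_dir(parent)  # canonicalise the parent first
        key = rel_dir.lower()
        if key not in canonical:
            canonical[key] = os.path.join(parent_c, name)
        return canonical[key]

    for dirpath, dirnames, filenames in os.walk(src):
        dirnames.sort()  # deterministic visiting order decides the canonical spelling
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, canon_dir("" if rel == "." else rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            target = os.path.join(target_dir, name)
            if not os.path.exists(target):
                shutil.copy2(os.path.join(dirpath, name), target)
```

Writing to a separate destination rather than merging in place keeps the original extraction untouched, so a wrong choice of canonical spelling can be redone without re-running acquire.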
