Rework chunk creation and processing workflow

This issue describes a refactoring idea that came up while reviewing #357. The steps below are described in decreasing level of detail, because new ideas that come up while carrying out the earlier steps can significantly change the later ones.

### `search_chunks` should return all the chunks

Currently, chunks are identified in two distinct places while a file is processed: `finder.search_chunks` returns a list of known chunks only, and `processing._FileTask.process` then fills in the gaps between those chunks and the beginning and end of the input file.

The basic idea is that `search_chunks` could do this gap filling itself **and, in addition,** return an unknown chunk covering the whole file when no known chunk is found. Essentially, the following test change needs to pass:

```diff
diff --git a/tests/test_processing.py b/tests/test_processing.py
index 82ff857..71fcf1d 100644
--- a/tests/test_processing.py
+++ b/tests/test_processing.py
@@ -90,7 +90,7 @@ def test_remove_inner_chunks(
     "chunks, file_size, expected",
     [
         ([], 0, []),
-        ([], 10, []),
+        ([], 10, [UnknownChunk(0, 10)]),
         ([ValidChunk(0x0, 0x5)], 5, []),
         ([ValidChunk(0x0, 0x5), ValidChunk(0x5, 0xA)], 10, []),
         ([ValidChunk(0x0, 0x5), ValidChunk(0x5, 0xA)], 12, [UnknownChunk(0xA, 0xC)]),
```
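The expected behavior can be sketched as a single gap-filling pass. This is only an illustration: the `ValidChunk` and `UnknownChunk` classes below are simplified stand-ins, and `find_unknown_chunks` is a hypothetical name, not the actual unblob function.

```python
from dataclasses import dataclass
from typing import List


# Simplified stand-ins for the real chunk models (assumption).
@dataclass(frozen=True)
class ValidChunk:
    start_offset: int
    end_offset: int


@dataclass(frozen=True)
class UnknownChunk:
    start_offset: int
    end_offset: int


def find_unknown_chunks(chunks: List[ValidChunk], file_size: int) -> List[UnknownChunk]:
    """Return the gaps not covered by any known chunk, in offset order."""
    unknown = []
    position = 0  # end of the area covered so far
    for chunk in sorted(chunks, key=lambda c: c.start_offset):
        if chunk.start_offset > position:
            unknown.append(UnknownChunk(position, chunk.start_offset))
        position = max(position, chunk.end_offset)
    # Trailing gap; with no known chunks this covers the whole file.
    if position < file_size:
        unknown.append(UnknownChunk(position, file_size))
    return unknown
```

With no known chunks and a non-empty file, the trailing-gap branch yields one unknown chunk spanning the file, which is the case the test change adds.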

This requires a separate change: `carve_unknown_chunk` should be guarded the same way `carve_valid_chunk` is:
https://github.com/IoT-Inspector/unblob/blob/dbc104ffd3cbebd584af60e8f4fea548824b2a64/unblob/processing.py#L249-L256
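The linked lines are not reproduced here, so the exact condition is an assumption. A minimal sketch of one plausible guard, assuming it skips carving when a chunk spans the entire input file (carving would then only duplicate the file), could look like this; `Span`, `covers_whole_file` and `carve_if_needed` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical minimal chunk representation used only for this sketch.
@dataclass(frozen=True)
class Span:
    start_offset: int
    end_offset: int


def covers_whole_file(chunk: Span, file_size: int) -> bool:
    return chunk.start_offset == 0 and chunk.end_offset == file_size


def carve_if_needed(chunk: Span, file_size: int, carve: Callable[[Span], None]) -> bool:
    """Call `carve` unless the chunk is the whole file; report whether it was carved."""
    if covers_whole_file(chunk, file_size):
        return False  # assumed guard: nothing to carve out of the file
    carve(chunk)
    return True
```

The same helper could then be shared by the valid and unknown code paths, so both are guarded identically.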

Since the returned chunks are also ordered, the chunks in the metadata become ordered as an added bonus.

### OO wrapping of chunks with operations to do on them

After the above changes, there are several possible ways forward; I'll outline just one of them here.

- Create a factory that maps the incoming Chunk structs to objects with behavior. We can add additional data fields, such as the input file path, which is needed for further operations. We shouldn't change the constructor arguments of `ValidChunk`, as they are part of the public interface and should stay convenient to use. (Side note: given careful API design, we could expose ways for handlers to return custom Chunk subclasses that alter the way they are handled. That change is out of scope for this issue.)
- Move the chunk-type specific aspects like carving, extraction, entropy calculation of `_FileTask` to these objects.
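One possible shape for the two points above is sketched below. All names (`ChunkTask`, `ValidChunkTask`, `UnknownChunkTask`, `wrap_chunk`) are hypothetical, the chunk classes are simplified stand-ins, and the split of operations per chunk type is an assumption made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


# Simplified stand-ins for the public chunk structs (assumption).
@dataclass(frozen=True)
class ValidChunk:
    start_offset: int
    end_offset: int


@dataclass(frozen=True)
class UnknownChunk:
    start_offset: int
    end_offset: int


@dataclass
class ChunkTask:
    """Wraps a chunk struct and adds the extra data the operations need."""
    chunk: object
    input_path: Path

    def operations(self) -> List[str]:
        raise NotImplementedError


class ValidChunkTask(ChunkTask):
    def operations(self) -> List[str]:
        return ["carve", "extract"]  # assumed steps for known chunks


class UnknownChunkTask(ChunkTask):
    def operations(self) -> List[str]:
        return ["carve", "entropy"]  # assumed steps for unknown chunks


# isinstance lookup, so subclasses of the structs map to the same wrapper.
_WRAPPERS = [(ValidChunk, ValidChunkTask), (UnknownChunk, UnknownChunkTask)]


def wrap_chunk(chunk: object, input_path: Path) -> ChunkTask:
    for struct_type, wrapper in _WRAPPERS:
        if isinstance(chunk, struct_type):
            return wrapper(chunk, input_path)
    raise TypeError(f"unsupported chunk type: {type(chunk).__name__}")
```

The structs keep their two-argument constructors, while the input path lives only on the wrapper created inside the processing code.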

### Adjust metadata creation

The primary goal of these changes is that chunk metadata handling can be encapsulated entirely within the `processing` module. For example, chunk-related information can be added to the new wrapping objects created in the steps above. The new chunk abstractions also have access to the file path and to any extra information we may add to them, so we could also generate predictable IDs from the input path and offset-length pairs.
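As an illustration of predictable IDs, the sketch below derives an ID by hashing the input path together with the offset and length. The function name `chunk_id`, the choice of SHA-256, and the 16-character truncation are all assumptions, not part of the proposal.

```python
import hashlib
from pathlib import PurePosixPath


def chunk_id(input_path: str, offset: int, length: int) -> str:
    """Derive a stable ID from the input path and the chunk's offset and length."""
    key = f"{PurePosixPath(input_path)}:{offset}:{length}"
    # Truncated for readability; the full digest could be kept instead.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Because the ID depends only on its inputs, processing the same file twice yields the same IDs, and metadata produced by separate runs can be matched up.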

Rework chunk creation and processing workflow #369
