This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

ENH: Adds id to support output caching #83

Closed
wants to merge 1 commit into from

Conversation

@thomasjpfan thomasjpfan commented Jul 1, 2018

Fixes #39

This PR adds an optional id field to the data dictionary. When cache_output is set to True, the id field is appended to step.name to distinguish between output caches produced by different data dictionaries.

For example:

import numpy as np

# Step and IdentityOperation are assumed to be imported from this library.
data_train = {
    'id': 'data_train',
    'input': {
        'features': np.array([
            [1, 6],
            [2, 5],
            [3, 4]
        ]),
        'labels': np.array([2, 5, 3]),
    }
}
step = Step(
    name='test_cache_output_with_key',
    transformer=IdentityOperation(),
    input_data=['input'],
    experiment_directory='/exp_dir',
    cache_output=True
)
step.fit_transform(data_train)

This will produce an output cache file at /exp_dir/cache/test_cache_output_with_key__data_train.
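As a rough illustration of the naming rule described above, the sketch below builds the cache path from the experiment directory, the step name, and the optional id. The helper name `output_cache_path` is hypothetical and not part of the library, and the fallback to the bare step name when no id is present is an assumption rather than something this PR states.

```python
import posixpath


def output_cache_path(experiment_directory, step_name, data):
    """Hypothetical helper: build the output cache path for a step.

    Joins the step name and the data dictionary's 'id' with a double
    underscore, matching the example path in this PR. Falling back to the
    plain step name when 'id' is missing is an assumption.
    """
    data_id = data.get('id')
    if data_id is None:
        filename = step_name
    else:
        filename = '{}__{}'.format(step_name, data_id)
    return posixpath.join(experiment_directory, 'cache', filename)


train_path = output_cache_path(
    '/exp_dir', 'test_cache_output_with_key', {'id': 'data_train'})
valid_path = output_cache_path(
    '/exp_dir', 'test_cache_output_with_key', {'id': 'data_valid'})
```

With this scheme, two data dictionaries that carry different id values map to different cache files for the same step, so one cached output does not overwrite the other.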

@thomasjpfan thomasjpfan changed the title ENH: Adds auto output caching ENH: Adds id to support output caching Jul 1, 2018
@jakubczakon
Collaborator

@thomasjpfan Sorry for the late answer.

It's a very interesting idea from the production point of view where your training/dev/test data can easily change. Having an Id here could save you time and trouble.

We're gonna think it through shortly and get back to you.

@kamil-kaczmarek
Member

@thomasjpfan As mentioned in the issue, I will take a closer look at it next week. Thank you for your PR!


3 participants