How to use dataloader without NVTabular? #50

bschifferer · 2022-11-17T10:51:41Z

@radekosmulski developed the examples for dataloaders with native TensorFlow/PyTorch: #47

I wonder how we recommend using the dataloaders without NVTabular. When we do not use NVTabular, we do not have a data schema, so all columns are treated as input features.

In particular, TensorFlow Keras expects the output of the dataloader to be (x, y). For now, @radekosmulski has added a parsing function: https://github.com/NVIDIA-Merlin/dataloader/blob/8157c7650248359201545934fd7d3e7f95b0eea8/examples/01a-Getting-started-Tensorflow.ipynb

label_column = 'rating'

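def process_batch(data, _):
    # Every column except the label is treated as an input feature.
    x = {col: data[col] for col in data.keys() if col != label_column}
    # The label column becomes the target.
    y = data[label_column]

    return (x, y)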

loader._map_fns = [process_batch]

I think the parsing function will always be required, and therefore this is not the best user experience.
Previously, the dataloader accepted the arguments cat_names, cont_names, and label_names.

I think there are multiple options:

An easy tool to define a schema manually
Enabling the parameters cat_names, cont_names, label_names, etc.
Maybe there are more?
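
To make the second option more concrete, here is a minimal sketch of a helper that builds the parsing function from a list of label names, so the user does not have to write it by hand. The name make_label_splitter and its behaviour for several labels are hypothetical, not part of the dataloader; it only keeps the same (data, _) signature as process_batch above.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Batch = Dict[str, Any]


def make_label_splitter(
    label_names: Sequence[str],
) -> Callable[[Batch, Any], Tuple[Batch, Any]]:
    """Build a map function that splits a batch dict into (x, y).

    Hypothetical helper for illustration; not part of the dataloader API.
    """
    labels = list(label_names)
    if not labels:
        raise ValueError("label_names must contain at least one column")

    def split(data: Batch, _: Any) -> Tuple[Batch, Any]:
        missing = [name for name in labels if name not in data]
        if missing:
            raise KeyError(f"label columns not found in batch: {missing}")
        # Everything that is not a label is treated as an input feature.
        x = {col: value for col, value in data.items() if col not in labels}
        # A single label is returned bare; several labels come back as a dict.
        if len(labels) == 1:
            y = data[labels[0]]
        else:
            y = {name: data[name] for name in labels}
        return x, y

    return split


# Plain lists stand in for tensors in this toy batch.
batch = {"userId": [1, 2], "movieId": [10, 20], "rating": [4.0, 3.5]}
x, y = make_label_splitter(["rating"])(batch, None)
```

With the loader from the snippet above, this could presumably be attached the same way, i.e. loader._map_fns = [make_label_splitter(["rating"])], which still relies on the private _map_fns attribute.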


oliverholworthy · 2022-11-17T11:54:57Z

In the example notebook linked above, when the Dataset is created, it automatically infers and creates a schema if one is not already present in the data being loaded.

The dataloader uses the TARGET tag to identify the target columns.

https://github.com/NVIDIA-Merlin/dataloader/blob/v0.0.2/merlin/loader/loader_base.py#L113

Based on that, it seems that the way to specify the target in this example would be to add a line that assigns the target tag:

dataset.schema[label_column] = dataset.schema[label_column].with_tags(Tags.TARGET)
