fitting the model using generators #16

Open
naarkhoo opened this issue Jan 18, 2021 · 2 comments

naarkhoo commented Jan 18, 2021

I am working with a huge file that cannot be fully loaded into memory, so I have to use generators.

Here is the iris example, where I am trying to read the data from the CSV file in batches:

!wget https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv
    
import numpy as np
import pandas as pd
import tensorflow as tf
import tabnet

def generate_data_from_file(params):
    # Read the CSV lazily in chunks so the full file never has to sit in memory.
    data_out = pd.read_csv('iris.csv',
                           skiprows=range(1, params['skiprows']),
                           index_col=0,
                           nrows=params['train_upto_row_num'],
                           chunksize=params['num_observation'])

    for item_df in data_out:
        
        target = item_df['variety']
        item_df = item_df[["sepal.length","sepal.width","petal.length","petal.width"]]
        
        yield np.array(item_df), np.array(target)

types = (tf.float32, tf.int16)

training_params = {'skiprows': 0, 
                   'train_upto_row_num': 40, 
                   'num_observation': 5}

valid_params = {'skiprows': 40,
                'train_upto_row_num': 20,
                'num_observation': 5}

training_dataset = tf.data.Dataset.from_generator(lambda: generate_data_from_file(training_params),
                                                  output_types=types
                                                  #, output_shapes=shapes
                                                  ).repeat(1)

validation_dataset = tf.data.Dataset.from_generator(lambda: generate_data_from_file(valid_params),
                                                    output_types=types
                                                    #, output_shapes=shapes
                                                    ).repeat(1)

col_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

feature_columns = []
for col_name in col_names:
    feature_columns.append(tf.feature_column.numeric_column(col_name))

    
model = tabnet.TabNetClassifier(feature_columns, num_classes=3,
                                feature_dim=8, output_dim=4,
                                num_decision_steps=4, relaxation_factor=1.0,
                                sparsity_coefficient=1e-5, batch_momentum=0.98,
                                virtual_batch_size=None, norm_type='group',
                                num_groups=1)

lr = tf.keras.optimizers.schedules.ExponentialDecay(0.01, decay_steps=100, decay_rate=0.9, staircase=False)
optimizer = tf.keras.optimizers.Adam(lr)
model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(training_dataset, 
          epochs=100, 
          validation_data=validation_dataset, 
          verbose=2)

model.summary()

However, it gives me the following error:

ValueError: in user code:

    /opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
        return step_function(self, iterator)
    /opt/conda/lib/python3.8/site-packages/tabnet/tabnet.py:421 call  *
        self.activations = self.tabnet(inputs, training=training)
    /opt/conda/lib/python3.8/site-packages/tabnet/tabnet.py:213 call  *
        features = self.input_features(inputs)
    /opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1012 __call__  **
        outputs = call_fn(inputs, *args, **kwargs)
    /opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/feature_column/dense_features.py:158 call  **
        raise ValueError('We expected a dictionary here. Instead we got: ',

    ValueError: ('We expected a dictionary here. Instead we got: ', <tf.Tensor 'IteratorGetNext:0' shape=<unknown> dtype=float32>)
    

I wonder if you can help me with formatting the data. Thanks!
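The traceback points at the tf.keras DenseFeatures layer that is built from feature_columns, and that layer expects a dictionary mapping column names to tensors rather than a single feature matrix. Below is a minimal sketch of one way to satisfy it while keeping the feature columns: have the generator yield a dict of per-column arrays whose keys match the feature-column names. The dot-to-underscore renaming and the label encoding are assumptions made here to line the CSV headers and the string labels up with the column names and integer classes used above.

# Hedged sketch, not a documented recipe: yield a dict keyed by the
# feature-column names so DenseFeatures receives the mapping it expects.
def generate_dict_data_from_file(params):
    data_out = pd.read_csv('iris.csv',
                           skiprows=range(1, params['skiprows']),
                           nrows=params['train_upto_row_num'],
                           chunksize=params['num_observation'])  # index_col dropped so sepal.length stays a column
    label_map = {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}   # assumed label encoding
    for item_df in data_out:
        item_df = item_df.rename(columns=lambda c: c.replace('.', '_'))  # "sepal.length" -> "sepal_length"
        target = item_df['variety'].map(label_map).to_numpy(dtype=np.int32)
        features = {name: item_df[name].to_numpy(dtype=np.float32) for name in col_names}
        yield features, target

dict_types = ({name: tf.float32 for name in col_names}, tf.int32)
dict_shapes = ({name: tf.TensorShape([None]) for name in col_names}, tf.TensorShape([None]))

training_dataset = tf.data.Dataset.from_generator(
    lambda: generate_dict_data_from_file(training_params),
    output_types=dict_types, output_shapes=dict_shapes)

With integer class labels like these, the loss would also need to be 'sparse_categorical_crossentropy' rather than 'categorical_crossentropy'.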


gfade commented Feb 25, 2021

I changed feature_columns to None and it started working for me, but I'm not sure why.
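For reference, a minimal sketch of that workaround, under the assumption (suggested by the traceback above) that with feature_columns=None the model skips its internal DenseFeatures layer and consumes the raw (batch, 4) float matrix the original generator already yields. The num_features=4 argument is also an assumption: the tabnet package may require it when no feature columns are given.

model = tabnet.TabNetClassifier(feature_columns=None, num_classes=3,
                                num_features=4,  # assumed to be needed when feature_columns is None
                                feature_dim=8, output_dim=4,
                                num_decision_steps=4, relaxation_factor=1.0,
                                sparsity_coefficient=1e-5, batch_momentum=0.98,
                                virtual_batch_size=None, norm_type='group',
                                num_groups=1)

That would also explain the error message: DenseFeatures only accepts a dictionary of named tensors, while the generator yields a plain tensor, so removing the feature columns removes the layer that complains.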

@mainguyenanhvu

Have you found the reason why it works when feature_columns is changed to None?
