Potential Bugs in the hybrid_node.py Example Script #293

Open

vladislavalerievich opened this issue Feb 15, 2025 · 1 comment

vladislavalerievich commented Feb 15, 2025

Several possible issues have been observed in the example script examples/hybrid_node.py that may lead to unexpected behavior or unfair comparisons. The following points outline the bugs along with the problematic code lines and suggested improvements:

  1. LightGBM Test Metrics Printing:
    The script prints outdated test metrics (e.g., using metrics from the GNN model when evaluating LightGBM predictions).
    Bug:

     pred = model.predict(tf_test).numpy()
     print(f"Test: {test_metrics}")

    Improvement:

     pred = model.predict(tf_test).numpy()
     test_metrics = task.evaluate(pred)
     print(f"LightGBM Test metrics: {test_metrics}")
  2. State Dict Path Formatting:
    Bug:
    The state dict file path is defined with placeholders in a plain string:

    STATE_DICT_PTH = "results/{args.dataset}_{args.task}_state_dict.pth"

    Improvement:
    Use an f-string so that the placeholders are replaced with actual values:

    STATE_DICT_PTH = f"results/{args.dataset}_{args.task}_state_dict.pth"
  3. Sample Size Usage for GNN Model:
    Issue:
    The GNN model does not use the sample_size parameter during training. As a result, while the LightGBM model is trained on a subsampled dataset (the first sample_size rows), the GNN model is trained on the full training set. This discrepancy can lead to an unfair comparison between the two models (one possible fix is sketched after this list).

  4. Entity Table Overwriting:
    Bug:
    The script reassigns the entity_table variable in a loop when creating loaders for each split. For example:

    for split in ["train", "val", "test"]:
        table = task.get_table(split)
        table_input = get_node_train_table_input(table=table, task=task)
        entity_table = table_input.nodes[0]  # This gets overwritten each iteration
        ...

    Improvement:
    Instead of overwriting, maintain a mapping for each split and reference the correct table. For instance:

    entity_table_mapping: Dict[str, str] = {}
    for split in ["train", "val", "test"]:
        table = task.get_table(split)
        table_input = get_node_train_table_input(table=table, task=task)
        entity_table_mapping[split] = table_input.nodes[0]
        ...
    # Later reference the appropriate entity table, e.g., using task.entity_table = entity_table_mapping["train"]
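
Regarding point 3, one possible fix is sketched below: restrict the GNN's training table to the same first sample_size rows that LightGBM is trained on before building the loader. This is only a sketch; the args.sample_size guard and the in-place truncation of table.df are assumptions about the script's structure, not its actual code:

    for split in ["train", "val", "test"]:
        table = task.get_table(split)
        if split == "train" and args.sample_size is not None:
            # Keep only the first sample_size rows, mirroring the LightGBM
            # subsample, so both models see the same training data.
            table.df = table.df.head(args.sample_size)
        table_input = get_node_train_table_input(table=table, task=task)
        entity_table_mapping[split] = table_input.nodes[0]
        ...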

Addressing these issues may help improve the robustness of the example script and ensure a fair comparison between the GNN and LightGBM models.


@vladislavalerievich vladislavalerievich changed the title from "Potential Bugs in the Example Script: Entity Table Overwrite, State Dict Path, and Sample Size Usage" to "Potential Bugs in the hybrid_node.py Example Script" on Feb 15, 2025
@rishabh-ranjan
Collaborator

Thanks for pointing this out @vladislavalerievich! Since you have proposed the improvements, can you make a quick PR with these changes? We will be happy to merge to main!
