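Several possible issues have been observed in the example script `examples/hybrid_node.py` that may lead to unexpected behavior or unfair comparisons. The following points outline the bugs along with the problematic code lines and suggested improvements:
State Dict Path Formatting:
Bug:
The state dict file path is defined with placeholders in a plain string, so the placeholders are never replaced with actual values.
Improvement:
Use an f-string so that the placeholders are replaced with actual values.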
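One of the reported items concerns the state dict file path, which is built with placeholders in a plain string instead of an f-string. The sketch below shows the difference. The variable names `dataset_name` and `task_name` and the filename pattern are hypothetical, since the exact path used in `hybrid_node.py` is not reproduced in this issue.

```python
# Hypothetical values; the real script builds the path from its own arguments.
dataset_name = "rel-example"
task_name = "example-task"

# Bug: a plain string keeps the braces literally, so every run gets the same name.
buggy_path = "{dataset_name}_{task_name}_state_dict.pt"

# Fix: the f prefix makes Python substitute the current values.
fixed_path = f"{dataset_name}_{task_name}_state_dict.pt"

print(buggy_path)  # {dataset_name}_{task_name}_state_dict.pt
print(fixed_path)  # rel-example_example-task_state_dict.pt
```

With the plain string, runs for different datasets or tasks would all resolve to the same literal filename and could overwrite each other's checkpoints.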
LightGBM Test Metrics Printing:
Bug:
The script prints outdated test metrics (e.g., it reports metrics computed for the GNN model when evaluating LightGBM predictions).
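A minimal sketch of the intended behavior, assuming the fix is to evaluate the LightGBM predictions directly and store the result in its own variable. The `evaluate` function (mean absolute error here) and all data are hypothetical stand-ins for the task's real evaluation call and test set.

```python
from typing import Dict, List


def evaluate(pred: List[float], target: List[float]) -> Dict[str, float]:
    """Hypothetical stand-in for the task's evaluation: mean absolute error."""
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / len(target)
    return {"mae": mae}


# Hypothetical test targets and predictions from each model.
test_target = [1.0, 2.0, 3.0]
gnn_test_pred = [1.5, 2.5, 3.5]
lgbm_test_pred = [1.0, 2.0, 4.0]

gnn_test_metrics = evaluate(gnn_test_pred, test_target)

# Evaluate the LightGBM predictions themselves and keep them in a separate
# variable, so the LightGBM report cannot silently reuse the GNN metrics.
lgbm_test_metrics = evaluate(lgbm_test_pred, test_target)

print(f"GNN test metrics: {gnn_test_metrics}")
print(f"LightGBM test metrics: {lgbm_test_metrics}")
```

Using distinct names per model (rather than one shared `test_metrics` variable) makes it harder for a stale value to be printed under the wrong label.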
Sample Size Usage for GNN Model:
Issue:
The GNN model does not use the `sample_size` parameter during training. While the LightGBM model is trained on a subsampled dataset (the first `sample_size` rows), the GNN model is trained on the full training set. This discrepancy can lead to an unfair comparison between the two models.
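One way to make the comparison fair is to apply the same "first `sample_size` rows" rule to the data used by both models. The sketch below uses plain Python lists as hypothetical stand-ins for the training table rows; the helper `first_rows` is hypothetical and not part of the script. With a pandas DataFrame, the equivalent selection would be `df.iloc[:sample_size]`.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_rows(rows: Sequence[T], sample_size: Optional[int]) -> List[T]:
    """Return the first `sample_size` rows, or all rows if sample_size is None."""
    if sample_size is None:
        return list(rows)
    # Slicing past the end is safe: it simply returns every row.
    return list(rows[:sample_size])


# Hypothetical training rows: (entity_id, label) pairs.
train_rows = [(i, i % 2) for i in range(10)]
sample_size = 4

# Apply the same subsampling to both models' training data.
lgbm_train_rows = first_rows(train_rows, sample_size)
gnn_train_rows = first_rows(train_rows, sample_size)

print(len(lgbm_train_rows), len(gnn_train_rows))  # 4 4
```

Routing both models through one helper keeps the subsampling rule in a single place, so the two training sets cannot drift apart.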
Entity Table Overwriting:
Bug:
The script reassigns the `entity_table` variable in a loop when creating loaders for each split. For example:
```python
for split in ["train", "val", "test"]:
    table = task.get_table(split)
    table_input = get_node_train_table_input(table=table, task=task)
    entity_table = table_input.nodes[0]  # This gets overwritten each iteration
    ...
```
Improvement:
Instead of overwriting, maintain a mapping for each split and reference the correct table. For instance:
```python
from typing import Dict

from relbench.modeling.graph import get_node_train_table_input

entity_table_mapping: Dict[str, str] = {}
for split in ["train", "val", "test"]:
    table = task.get_table(split)
    table_input = get_node_train_table_input(table=table, task=task)
    entity_table_mapping[split] = table_input.nodes[0]
    ...

# Later, reference the appropriate entity table, e.g., task.entity_table = entity_table_mapping["train"]
```
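The self-contained sketch below contrasts the two patterns from the Entity Table Overwriting item. It replaces `get_node_train_table_input` with a hypothetical stub, `fake_table_input`, whose `nodes[0]` holds an entity table name; the stub deliberately gives each split a different name so the overwrite becomes visible, which may not match how the real task behaves.

```python
from typing import Dict, List, NamedTuple, Tuple


class FakeTableInput(NamedTuple):
    """Hypothetical stand-in for the object returned by get_node_train_table_input."""

    nodes: Tuple[str, List[int]]


def fake_table_input(split: str) -> FakeTableInput:
    # Hypothetical: each split gets a distinct table name so overwrites show up.
    return FakeTableInput(nodes=(f"entity_{split}", []))


splits = ["train", "val", "test"]

# Buggy pattern: one variable, overwritten on every iteration.
for split in splits:
    entity_table = fake_table_input(split).nodes[0]
print(entity_table)  # entity_test

# Fixed pattern: one entry per split, so earlier values survive the loop.
entity_table_mapping: Dict[str, str] = {}
for split in splits:
    entity_table_mapping[split] = fake_table_input(split).nodes[0]
print(entity_table_mapping["train"])  # entity_train
```

After the buggy loop only the last split's value remains, whereas the mapping still answers lookups for every split.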
Addressing these issues may help improve the robustness of the example script and ensure a fair comparison between the GNN and LightGBM models.
vladislavalerievich changed the title from "Potential Bugs in the Example Script: Entity Table Overwrite, State Dict Path, and Sample Size Usage" to "Potential Bugs in the hybrid_node.py Example Script" on Feb 15, 2025.
Thanks for pointing this out, @vladislavalerievich! Since you have proposed the improvements, can you make a quick PR with these changes? We will be happy to merge it into main!