Improve the accuracy on ogbn-papers100M with GAT model, add undirected graph on ogbn-products #9467
base: master
Conversation
for more information, see https://pre-commit.ci
2. Add undirected graph on ogbn-products with Sage model
3. Add undirected graph on ogbn-products with GAT model
…ub/pytorch_geometric into performance_analysis
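The "add undirected graph" change in the commit messages above can be illustrated with a plain-Python sketch (no PyG dependency; the function name is my own): converting a directed edge list to an undirected one by adding every reverse edge, which is the effect of PyG's `ToUndirected` transform on `edge_index`.

```python
def to_undirected(edges):
    """Return a sorted, de-duplicated edge list containing both directions.

    Illustrative stand-in for torch_geometric.transforms.ToUndirected,
    operating on (src, dst) tuples instead of an edge_index tensor.
    """
    symmetric = set()
    for src, dst in edges:
        symmetric.add((src, dst))
        symmetric.add((dst, src))  # add the reverse edge
    return sorted(symmetric)

edges = [(0, 1), (1, 2), (2, 0)]
print(to_undirected(edges))
# -> [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```

In the actual examples, the same effect is achieved by applying the transform to the dataset so that message passing can flow along both edge directions.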
@akihironitta @rusty1s anything else needed to merge?
This is great! 🚀
inference_start = time.time()
val_acc = test(val_loader)
test_acc = test(test_loader)
I don't understand why we need to evaluate the model on the test set every epoch.
Agreed, @guanxingithub please make the test run only once, at the end of training, after all epochs.
All evaluations are used for data analysis after training is done. With all of this statistical data collected during training, we can do further performance and accuracy analysis.
If you agree, I can put all of this extra testing and evaluation under a verbose mode. From the user's point of view, they don't need these extra evaluations, but someone who wants to do performance and accuracy analysis can turn on verbose mode to collect raw performance and accuracy data for further statistical analysis.
No, you should just not be running the test set during training; all other metrics are fine. Please adjust accordingly.
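The evaluation schedule being requested can be sketched as follows (hypothetical stand-in names, not the PR's actual code): validate every epoch, but touch the test split exactly once, after training has finished.

```python
def run_training(num_epochs, train_one_epoch, evaluate, val_loader, test_loader):
    """Validate every epoch; evaluate on the test split only once, at the end.

    train_one_epoch and evaluate are caller-supplied callables, standing in
    for the train() and test() functions of the example script.
    """
    val_history = []
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        val_history.append(evaluate(val_loader))  # per-epoch val tracking is fine
    test_acc = evaluate(test_loader)  # the only time the test split is touched
    return val_history, test_acc
```

Model selection (best epoch, checkpointing) is then based solely on `val_history`, and `test_acc` is reported once.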
If we keep the evaluation results on the test set for each epoch, that will be very helpful for further analysis. I did the same per-epoch evaluation on the test set in other tools, so if we can do the same evaluation in PyG, we can compare the accuracy between PyG and other tools apples to apples. Thanks.
that will be very helpful to make further analysis.
What kind of analysis are you referring to specifically?
Performance and accuracy analysis across platforms, across different tools, and across different versions of those tools.
For example, for platform1 and platform2, we collect all of this performance and accuracy data and then analyze the data from both platforms to test hypotheses such as: is the performance the same or different, and is the accuracy different from a statistical point of view? If there is a significant difference, what could cause it? This builds the foundation for further investigation and improvement. We have done a lot of this kind of analysis not only in PyG but also in other tools.
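As a minimal sketch of this kind of cross-platform comparison (the helper and the accuracy values are made up for illustration), one could summarize the per-epoch validation accuracies collected on each platform and compare the means:

```python
from statistics import mean, stdev

def summarize(name, accs):
    """Reduce a per-epoch accuracy series to summary statistics."""
    return {"run": name, "mean": mean(accs), "stdev": stdev(accs)}

# Hypothetical per-epoch validation accuracies from two platforms.
platform1 = [0.61, 0.63, 0.64, 0.64]
platform2 = [0.60, 0.62, 0.62, 0.63]

s1 = summarize("platform1", platform1)
s2 = summarize("platform2", platform2)
print(f"mean difference: {s1['mean'] - s2['mean']:.4f}")
```

A real analysis would add a significance test on top of these summaries; this only shows the raw-data collection and reduction step.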
if val_acc > best_val:
    best_val = val_acc
if test_acc > best_test:
    best_test = test_acc
Related to my above comment: I don't understand why it's ok to pick the best metric evaluated on the test set across all epochs.
Well, this was just for tracking, it seems. There is no checkpointing being done, but yeah, the test set should only be run once at the end anyway. @guanxingithub please remove the test acc tracking, since the test set should only be run once at the end and reported then.
The best metric evaluated on the test set is one of the indexes used for statistical accuracy analysis. With all the information collected during training, we can compare the performance and accuracy of a particular model across different platforms and across different versions of the tools used to run the training.
@guanxingithub in general in machine learning it is not proper to run the test set before training/validation has finished.
Tracking the best val acc across multiple epochs for perflab is fine, but there should only be one final test accuracy for them to track. Running the test set during training is sort of "cheating". Please address my and Akihiro's concerns in order to get this merged.
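The pattern described here can be sketched like this (hypothetical helper, not the PR's code): track which epoch achieved the best validation accuracy during training, then report a single test accuracy at the end.

```python
def select_and_test(val_accs, evaluate_test):
    """Pick the best epoch by validation accuracy; run the test set once.

    val_accs: per-epoch validation accuracies collected during training.
    evaluate_test: callable that runs the single, final test evaluation.
    """
    best_epoch = max(range(len(val_accs)), key=val_accs.__getitem__)
    # In a real script you would reload the checkpoint saved at best_epoch
    # before calling evaluate_test; no checkpointing is shown here.
    return best_epoch, evaluate_test()

best_epoch, test_acc = select_and_test([0.58, 0.62, 0.61], lambda: 0.60)
print(best_epoch, test_acc)  # best epoch is 1
```

There is no `best_test` variable at all in this scheme: the test split is consulted exactly once, so there is nothing to take a maximum over.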
We don't train on the test set; we just evaluate the accuracy on the test set after the model has been trained on the training set.
Hmmm, still not sure why it makes sense to evaluate the model on the test set during training. Just to make sure, I'm suggesting deleting L233 and L241-L242.
best_test = test_acc
times.append(time.time() - train_start)

if verbose:
nit: The concern I previously had was having too many redundant/meaningless print statements in the example, but now the script looks pretty minimal. I think it's ok to just remove this verbose flag.
The redundant/meaningless prints were intended to collect raw performance and accuracy data so that we can do further performance and accuracy analysis after running the scripts on different platforms with different versions of the tools.
It's fine now, just remove the verbose flag and let all the prints come out (except remove the test accuracy, and please only run the test set once at the end).
Let's also make sure the filenames are updated in the entire repo, e.g.,
$ git grep ogbn_products_sage.py
README.md:- **[SAGEConv](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.SAGEConv.html)** from Hamilton *et al.*: [Inductive Representation Learning on Large Graphs](https://arxiv.org/abs/1706.02216) (NIPS 2017) \[[**Example1**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/reddit.py), [**Example2**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/ogbn_products_sage.py), [**Example3**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/graph_sage_unsup.py), [**Example4**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/graph_sage_unsup_ppi.py)\]
README.md:- **[NeighborLoader](https://pytorch-geometric.readthedocs.io/en/latest/modules/loader.html#torch_geometric.loader.NeighborLoader)** from Hamilton *et al.*: [Inductive Representation Learning on Large Graphs](https://arxiv.org/abs/1706.02216) (NIPS 2017) \[[**Example1**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/reddit.py), [**Example2**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/ogbn_products_sage.py), [**Example3**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/ogbn_products_gat.py), [**Example4**](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/to_hetero_mag.py)\]
torch_geometric/loader/neighbor_sampler.py: `examples/ogbn_products_sage.py
torch_geometric/loader/neighbor_sampler.py: ogbn_products_sage.py>`_.
Co-authored-by: Akihiro Nitta <[email protected]>
Improve the test accuracy to >60% on ogbn-papers100M; consolidate repetitive examples