Vertices in unconnected_vertex_pairs do not exist #1

Open
mahmoodtareq opened this issue Sep 6, 2021 · 7 comments
Labels
question Further information is requested

Comments

@mahmoodtareq

mahmoodtareq commented Sep 6, 2021

From what I understood, the vertices mentioned in unconnected_vertex_pairs should also be present among the vertices in full_dynamic_graph_sparse. But after running the following snippet,

u_v = set(unconnected_vertex_pairs[:, 0]) | set(unconnected_vertex_pairs[:, 1])
c_v = set(full_dynamic_graph_sparse[:, 0]) | set(full_dynamic_graph_sparse[:, 1])

the output of len(u_v - c_v) is 13152. That means that many vertices in the unconnected list do not appear in the full dynamic graph. I may have understood it wrong, so please clarify.

@franciscoandrades

Hey mahmoodtareq. Another participant here. It appears that there is a significant number of nodes with degree 0. Did you reach the same conclusion?

@MarioKrenn6240
Contributor

MarioKrenn6240 commented Sep 9, 2021

Hi @mahmoodtareq and @franciscoandrades,
Sorry for the late answer.

I use this code:

import pickle

data_source='CompetitionSet2017_3.pkl'

with open(data_source, "rb" ) as pkl_file:
    full_dynamic_graph_sparse,unconnected_vertex_pairs,year_start,years_delta = pickle.load(pkl_file)

c_v = set(full_dynamic_graph_sparse[:, 0]) | set(full_dynamic_graph_sparse[:, 1])
u_v = set(unconnected_vertex_pairs[:, 0]) | set(unconnected_vertex_pairs[:, 1])

print('vertices in full_dynamic_graph_sparse: ', len(c_v))
print('vertices in unconnected_vertex_pairs: ', len(u_v))

And get the output:

vertices in full_dynamic_graph_sparse:  55973
vertices in unconnected_vertex_pairs:  41310

Let me know if this helps. Happy to explain more.

@MarioKrenn6240
Contributor

Dear @mahmoodtareq and @franciscoandrades,
Thank you again for this question. We just saw that the datasets actually do not have any restrictions on the degrees of the vertices. This fact results in situations where you are asked to predict edges with vertices of degree=0.

It doesn't change the competition, but it makes it more interesting, because now you can exploit this fact in building your solutions. We will post a more detailed analysis of that fact on the GitHub main page in a few days.

Thank you a lot for investigating the data so thoroughly - you are certainly on the right track! :)

@MarioKrenn6240 MarioKrenn6240 pinned this issue Sep 9, 2021
@mahmoodtareq
Author

Hey mahmoodtareq. Another participant here. It appears that there is a significant number of nodes with degree 0. Did you reach the same conclusion?

@franciscoandrades Yeah, same conclusion. I got confused because the tutorial notebook says "unconnected_vertex_pairs: This is a list of vertex pairs v1,v2 with deg(v1)>=10, deg(v2)>=10". I thought I had made a mistake reading the data. Thanks to @MarioKrenn6240 for mentioning "This fact results in situations where you are asked to predict edges with vertices of degree=0". So the issue was in the data after all.
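
The degree condition quoted from the tutorial can be checked directly. Below is a minimal sketch on made-up toy arrays (not the competition data), assuming the first two columns of the edge list hold vertex ids, as the snippets in this thread do. The names `min_degree` and `pairs_ok` are introduced here for illustration only.

```python
import numpy as np

# Toy stand-ins for the real arrays (hypothetical values, not competition data).
# Edge list columns are assumed to be: v1, v2, timestamp.
full_dynamic_graph_sparse = np.array([[0, 1, 10], [0, 2, 11], [1, 2, 12], [0, 3, 13]])
unconnected_vertex_pairs = np.array([[1, 3], [2, 4], [4, 5]])

endpoints = full_dynamic_graph_sparse[:, :2]
num_vertices = int(max(endpoints.max(), unconnected_vertex_pairs.max())) + 1

# Degree = number of edge endpoints at each vertex id.
# Repeated edges, if any, are counted once per occurrence.
degrees = np.bincount(endpoints.ravel(), minlength=num_vertices)

min_degree = 1  # the tutorial states 10; 1 keeps the toy example small
pair_degrees = degrees[unconnected_vertex_pairs]      # shape (n_pairs, 2)
pairs_ok = (pair_degrees >= min_degree).all(axis=1)   # both endpoints pass

print("degrees:", degrees)
print("pairs meeting the threshold:", int(pairs_ok.sum()), "of", len(pairs_ok))
```

On the real data, loading the pickle as shown earlier in the thread and setting `min_degree = 10` would show how many pairs actually satisfy the stated condition.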

@mkk20
Contributor

mkk20 commented Sep 13, 2021

Hi @mahmoodtareq and @franciscoandrades. First of all, thanks for your interest in our competition and for your astute observations. As @MarioKrenn6240 said, this does not change the competition, but you might want to change the training data set we suggested (TrainSet2014_3). In it, 27% of vertices are in the unconnected vertex pairs data set but are never connected to the rest of the graph. This is suboptimal, as one has no information on these vertices (you could permute them ... ). Removing these vertices from the training set might lead to better model performance. Note that in our competition data set (CompetitionSet2017_3) you only find a small percentage of such vertices.
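
One way to reduce the training set as suggested is to drop every pair that touches a vertex absent from the edge list. This is a minimal sketch on toy arrays, not an official preprocessing step. The helper name `drop_isolated_pairs` is hypothetical, and the first two edge columns are assumed to hold vertex ids.

```python
import numpy as np

def drop_isolated_pairs(edges, pairs):
    """Keep only pairs whose two vertices both occur in the edge list.

    Returns the filtered pairs and the boolean mask that selected them.
    """
    connected = np.unique(edges[:, :2])
    keep = np.isin(pairs, connected).all(axis=1)
    return pairs[keep], keep

# Toy data (hypothetical values): vertices 3, 4 and 5 never appear in an edge.
edges = np.array([[0, 1, 10], [1, 2, 11]])
pairs = np.array([[0, 2], [0, 3], [4, 5]])

filtered, keep = drop_isolated_pairs(edges, pairs)
print(filtered)
print(f"kept {keep.sum()} of {len(keep)} pairs")
```

If labels are stored in an array aligned with the pairs, the same `keep` mask can be applied to them so the two stay in sync.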

@MarioKrenn6240 MarioKrenn6240 added the question Further information is requested label Sep 24, 2021
@trdavidson

trdavidson commented Oct 2, 2021

Hi @mkk20 - some confusion here that I hope you can resolve. In the competition prediction set, 18.08% of vertices are 0-degree in 2017, which is not a small percentage (see code). Should we read your comments as:

  1. The 0-degree vertices will be ignored for leaderboard computations, i.e. do not account for 0-degree nodes
  2. The 0-degree vertices will count, and you should design your model accordingly, i.e. account for 0-degree nodes

As you can imagine, this makes quite the difference in inductive biases one would like to inject. Thanks!

code

import pickle
import numpy as np

# load data (the context manager closes the file after reading)
data_source = 'CompetitionSet2017_3.pkl'
with open(data_source, "rb") as pkl_file:
    full_dynamic_graph_sparse, unconnected_vertex_pairs, year_start, years_delta = pickle.load(pkl_file)

# find unique node ids (the first two columns are the edge endpoints)
edges = full_dynamic_graph_sparse[:, :2]
nodes = set(edges.flatten())
print(f'all unique connected nodes: {len(nodes)}')

# find unique node ids of prediction set
nodes_eval = set(unconnected_vertex_pairs.flatten())
print(f'all unique nodes in prediction set: {len(nodes_eval)}')

# find 0-degree nodes in prediction set --> should be 0, per competition evaluation description
zero_degree_nodes = nodes_eval - nodes
print(f'nodes in prediction set with degree 0: {len(zero_degree_nodes)} [{len(zero_degree_nodes) / len(nodes_eval): .2%}]')

@lcfalors, @nicola-decao

@MarioKrenn6240
Contributor

Hi @mkk20 - some confusion here that I hope you can resolve. In the competition prediction set, 18.08% of vertices are 0-degree in 2017, which is not a small percentage (see code). Should we read your comments as:

1. The 0-degree vertices will be ignored for leaderboard computations, i.e. do not account for 0-degree nodes

2. The 0-degree vertices will count, and you should design your model accordingly, i.e. account for 0-degree nodes

Dear @trdavidson, thanks for your question. The 0-degree vertices will be used (i.e. your 2nd reading is the correct one). You have information about edges formed with the zero-degree vertices that can be used (those edges are formed with other vertices, from which you can extract information). You could consider this subtask as a first implicit attempt to perform predictions of new vertices.

Please let me know if you have additional questions on this. I am happy to explain more. Thanks!
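
Since the 0-degree vertices count, it may help to know, for each prediction pair, how many of its endpoints have no edges, so that such pairs can be handled separately. The sketch below is one possible way to do that on toy arrays. The helper name `count_unseen_endpoints` is hypothetical, and the first two edge columns are assumed to hold vertex ids.

```python
import numpy as np

def count_unseen_endpoints(edges, pairs):
    """For each pair, count the endpoints (0, 1 or 2) that occur in no edge."""
    seen = np.unique(edges[:, :2])
    return (~np.isin(pairs, seen)).sum(axis=1)

# Toy data (hypothetical values): vertices 3, 4 and 5 have degree 0.
edges = np.array([[0, 1, 10], [1, 2, 11]])
pairs = np.array([[0, 2], [0, 3], [4, 5]])

unseen = count_unseen_endpoints(edges, pairs)
for k in range(3):
    print(f"pairs with {k} zero-degree endpoint(s): {int((unseen == k).sum())}")
```

Pairs with one zero-degree endpoint still carry information through the other endpoint. Pairs with two carry none from the graph itself, so a model would need some fallback for them.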
