Vertices in unconnected_vertex_pairs does not exists #1

mahmoodtareq · 2021-09-06T18:50:24Z

From what I understood, the vertices that appear in unconnected_vertex_pairs should also appear in full_dynamic_graph_sparse. But after running the following snippet,

u_v = set(unconnected_vertex_pairs[:, 0]) | set(unconnected_vertex_pairs[:, 1])
c_v = set(full_dynamic_graph_sparse[:, 0]) | set(full_dynamic_graph_sparse[:, 1])

len(u_v - c_v) returns 13152. In other words, that many vertices in the unconnected list do not appear in the full dynamic graph. I may have misunderstood something, so please clarify.

franciscoandrades · 2021-09-08T22:55:38Z

Hey mahmoodtareq. Another participant here. It appears that a significant number of nodes have degree 0. Did you reach the same conclusion?

MarioKrenn6240 · 2021-09-09T03:42:35Z

Hi @mahmoodtareq and @franciscoandrades,
Sorry for the late answer.

I use this code:

import pickle

data_source='CompetitionSet2017_3.pkl'

with open(data_source, "rb" ) as pkl_file:
    full_dynamic_graph_sparse,unconnected_vertex_pairs,year_start,years_delta = pickle.load(pkl_file)

c_v = set(full_dynamic_graph_sparse[:, 0]) | set(full_dynamic_graph_sparse[:, 1])
u_v = set(unconnected_vertex_pairs[:, 0]) | set(unconnected_vertex_pairs[:, 1])

print('vertices in full_dynamic_graph_sparse: ', len(c_v))
print('vertices in unconnected_vertex_pairs: ', len(u_v))

And get the output:

vertices in full_dynamic_graph_sparse:  55973
vertices in unconnected_vertex_pairs:  41310

Let me know if this helps. Happy to explain more.

MarioKrenn6240 · 2021-09-09T16:28:51Z

Dear @mahmoodtareq and @franciscoandrades,
Thank you again for this question. We just saw that the datasets actually do not have any restrictions on the degrees of the vertices. This fact results in situations where you are asked to predict edges with vertices of degree=0.

It doesn't change the competition, but it makes it more interesting, because you can now exploit this fact when building your solutions. We will post a more detailed analysis of this on the GitHub main page in a few days.

Thank you a lot for investigating the data so thoroughly - you are certainly on the right track! :)

mahmoodtareq · 2021-09-13T09:51:03Z

> Hey mahmoodtareq. Another participant here. It appears that a significant number of nodes have degree 0. Did you reach the same conclusion?

@franciscoandrades Yeah, same conclusion. I got confused because the tutorial notebook says "unconnected_vertex_pairs: This is a list of vertex pairs v1,v2 with deg(v1)>=10, deg(v2)>=10", so I thought I had made a mistake reading the data. Thanks to @MarioKrenn6240 for mentioning that "This fact results in situations where you are asked to predict edges with vertices of degree=0". So the issue was in the data after all.
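The degree condition quoted from the tutorial notebook can be checked directly. The following is a minimal sketch on toy arrays, not the organizers' code: `vertex_degrees`, `toy_edges` and `toy_pairs` are hypothetical names, and it assumes the first two columns of the edge array hold vertex ids (as in the snippets in this thread) and that ids are non-negative integers. With the real data, the two arrays loaded from the pickle file would take the place of the toy ones, and the threshold would be 10 rather than 2.

```python
import numpy as np

def vertex_degrees(edges, num_vertices):
    """Count how often each vertex id appears as an edge endpoint.

    Repeated edges between the same two vertices are counted each time.
    """
    endpoints = np.asarray(edges)[:, :2].ravel()
    return np.bincount(endpoints, minlength=num_vertices)

# Toy stand-ins for full_dynamic_graph_sparse and unconnected_vertex_pairs.
# The third edge column is assumed to be a time stamp and is ignored here.
toy_edges = np.array([[0, 1, 10], [0, 2, 11], [1, 2, 12]])
toy_pairs = np.array([[0, 3], [1, 2], [3, 4]])

num_vertices = int(max(toy_edges[:, :2].max(), toy_pairs.max())) + 1
deg = vertex_degrees(toy_edges, num_vertices)

# Smaller of the two endpoint degrees for every candidate pair.
min_deg_per_pair = deg[toy_pairs].min(axis=1)

print("degrees:", deg.tolist())
print("pairs with both degrees >= 2:", int((min_deg_per_pair >= 2).sum()))
```

Any pair whose smaller endpoint degree is 0 contains a vertex that never appears in the edge list, which is the situation described above.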

mkk20 · 2021-09-13T10:03:24Z

Hi @mahmoodtareq and @franciscoandrades. First of all, thanks for your interest in our competition and for your astute observations. As @MarioKrenn6240 said, this does not change the competition, but you might want to change the training data set we suggested (TrainSet2014_3). In it, 27% of vertices are in the unconnected vertex pairs data set but are never connected to the rest of the graph. This is suboptimal, as one has no information on these vertices (you could permute them ... ). Removing these vertices from the training set might lead to better model performance. Note that in our competition data set (CompetitionSet2017_3) you only find a small percentage of such vertices.
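One possible way to reduce a training set along these lines is sketched below. This is only an illustration under assumptions: `drop_pairs_with_isolated_vertices` is a hypothetical helper, the toy arrays stand in for the arrays loaded from the pickle file, and the first two edge columns are assumed to hold vertex ids. The boolean mask is returned as well, so that any per-pair labels stored alongside the pairs could be filtered the same way.

```python
import numpy as np

def drop_pairs_with_isolated_vertices(edges, pairs):
    """Keep only the pairs whose two vertices both appear in at least one edge.

    Returns the filtered pairs and the boolean mask that selected them.
    """
    edges = np.asarray(edges)
    pairs = np.asarray(pairs)
    connected = np.unique(edges[:, :2])            # vertex ids with degree >= 1
    keep = np.isin(pairs, connected).all(axis=1)   # both endpoints are connected
    return pairs[keep], keep

# Toy stand-ins for full_dynamic_graph_sparse and unconnected_vertex_pairs.
toy_edges = np.array([[0, 1, 5], [1, 2, 6]])
toy_pairs = np.array([[0, 2], [0, 3], [4, 5]])

filtered_pairs, keep_mask = drop_pairs_with_isolated_vertices(toy_edges, toy_pairs)
print(filtered_pairs.tolist())
print(keep_mask.tolist())
```

Whether dropping these pairs actually helps depends on the model, so comparing validation scores with and without the filter would be a sensible check.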

trdavidson · 2021-10-02T12:44:29Z

Hi @mkk20 - some confusion here that I hope you can resolve. In the competition prediction set, 18.08% of the vertices are 0-degree in 2017, which is not a small percentage (see code). Should we read your comments as:

1. The 0-degree vertices will be _ignored_ for leaderboard computations, i.e. do not account for 0-degree nodes
2. The 0-degree vertices will _count_, and you should design your model accordingly, i.e. account for 0-degree nodes

As you can imagine, this makes quite the difference in inductive biases one would like to inject. Thanks!

code

import pickle
import numpy as np

# load data
data_source = 'CompetitionSet2017_3.pkl'
with open(data_source, "rb") as pkl_file:
    full_dynamic_graph_sparse, unconnected_vertex_pairs, year_start, years_delta = pickle.load(pkl_file)

# find unique node ids that appear in at least one edge
edges = full_dynamic_graph_sparse[:, :2]
nodes = set(edges.flatten())
print(f'all unique connected nodes: {len(nodes)}')

# find unique node ids of the prediction set
nodes_eval = set(unconnected_vertex_pairs.flatten())
print(f'all unique nodes in prediction set: {len(nodes_eval)}')

# find 0-degree nodes in prediction set --> should be 0, per competition evaluation description
zero_degree_nodes = nodes_eval - nodes
print(f'nodes in prediction set with degree 0: {len(zero_degree_nodes)} [{len(zero_degree_nodes) / len(nodes_eval):.2%}]')

@lcfalors, @nicola-decao

MarioKrenn6240 · 2021-10-02T22:18:26Z

> Hi @mkk20 - some confusion here that I hope you can resolve. In the competition prediction set, 18.08% of the vertices are 0-degree in 2017, which is not a small percentage (see code). Should we read your comments as:
>
> 1. The 0-degree vertices will be _ignored_ for leaderboard computations, i.e. do not account for 0-degree nodes
> 2. The 0-degree vertices will _count_, and you should design your model accordingly, i.e. account for 0-degree nodes

Dear @trdavidson, thanks for your question. The 0-degree vertices will be used (i.e. your 2nd reading is the correct one). You have information about edges formed with the zero-degree vertices that can be used (the edges are formed with other vertices, which you can use to extract information). You could consider this subtask as a first implicit attempt to perform predictions of new vertices.

Please let me know if you have additional questions on this, I am happy to explain more -- thanks!
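Since the 0-degree vertices count toward the evaluation, it may help to see how many candidate pairs involve zero, one, or two such vertices before deciding how a model should treat them. The sketch below is an illustration only: `count_pairs_by_isolated_endpoints` is a hypothetical helper, the toy arrays stand in for the arrays loaded from the pickle file, and the first two edge columns are assumed to hold vertex ids.

```python
import numpy as np

def count_pairs_by_isolated_endpoints(edges, pairs):
    """Return [n0, n1, n2]: pairs with 0, 1 or 2 endpoints missing from the edge list."""
    connected = np.unique(np.asarray(edges)[:, :2])
    # For each pair, count the endpoints that never appear in any edge.
    n_isolated = (~np.isin(np.asarray(pairs), connected)).sum(axis=1)
    return np.bincount(n_isolated, minlength=3)

# Toy stand-ins for full_dynamic_graph_sparse and unconnected_vertex_pairs.
toy_edges = np.array([[0, 1, 5], [1, 2, 6]])
toy_pairs = np.array([[0, 2], [0, 3], [4, 5], [1, 3]])

pair_counts = count_pairs_by_isolated_endpoints(toy_edges, toy_pairs)
print(pair_counts.tolist())
```

For pairs with exactly one 0-degree endpoint, the connected partner still carries graph information (its degree, for instance), which is one way to read the hint about using the other vertices.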

MarioKrenn6240 pinned this issue Sep 9, 2021

MarioKrenn6240 added the "question" label (Further information is requested) Sep 24, 2021
