Run the ipynb notebooks 01-10 in sequence, or run the train sh script.
01 Get text (title, abstract, keyword, venue) embeddings from TF-IDF, word2vec, chatglm3 and bge-m3
02 For each author ID, calculate the similarity between each pid and the other pids of that author
03 Extract strongly correlated information (co-author, co-org, co-keyword...)
04 Tree model
05 GNN model and post-processing
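
A minimal sketch of the idea behind steps 01-03, not the code in the notebooks. It assumes a toy layout where each author ID maps to its pids; the field names (title, coauthors), the sample data and all function names are hypothetical. A hand-written TF-IDF stands in for the embedding models so the example needs only the standard library, and Jaccard overlap is one possible choice of co-author feature.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: author ID -> {pid -> paper fields}.
papers_by_author = {
    "author_1": {
        "p1": {"title": "graph neural networks for name disambiguation",
               "coauthors": {"li", "wang"}},
        "p2": {"title": "name disambiguation with graph embeddings",
               "coauthors": {"wang", "chen"}},
        "p3": {"title": "protein folding simulation",
               "coauthors": {"smith"}},
    }
}


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Overlap of two sets, e.g. the co-author names of two papers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_features(papers):
    """Compute features for every pair of pids of a single author."""
    pids = sorted(papers)
    vectors = dict(zip(pids, tfidf_vectors([papers[p]["title"] for p in pids])))
    features = {}
    for a, b in combinations(pids, 2):
        features[(a, b)] = {
            "text_sim": cosine(vectors[a], vectors[b]),
            "coauthor_jaccard": jaccard(papers[a]["coauthors"],
                                        papers[b]["coauthors"]),
        }
    return features


# Pairs are only formed within one author ID, never across authors.
features = {
    author: pairwise_features(papers)
    for author, papers in papers_by_author.items()
}
```

Pair features of this kind could then serve as inputs to the tree model in step 04 or as edge weights for the GNN in step 05; whether the notebooks do it this way is an assumption.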