- pandas 1.2.4
- sklearn 0.24.2
- numpy 1.19.5
- scipy 1.4.1
- sentence-transformers 2.0.0
- torch 1.8.2 (required by sentence-transformers)
- spherical_kmeans (included as source)
- b3 (included as source)
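To verify the pinned versions are installed, a quick sanity check may help (a minimal sketch; the module names are the standard import names of the packages above):

```python
# Print the installed version of each pinned dependency.
import numpy, pandas, scipy, sklearn, torch
import sentence_transformers

for mod in (pandas, sklearn, numpy, scipy, sentence_transformers, torch):
    print(mod.__name__, mod.__version__)
```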
- Download the raw data sets from the link above, or prepare your own data set ('.csv', '.json', ...) where each row follows the format ['title', 'date', 'text', 'id', 'story' (if available)] (see the sketch after this list)
- Run Dataset_preprocessing.ipynb to preprocess the data set
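For illustration, a minimal sketch of building a custom data set in the expected row format (the file name and all values below are hypothetical):

```python
import pandas as pd

# Two hypothetical articles in the expected row format:
# ['title', 'date', 'text', 'id', 'story' (if available)].
rows = [
    {"title": "Example headline A", "date": "2021-01-01",
     "text": "Full article body ...", "id": 0, "story": 0},
    {"title": "Example headline B", "date": "2021-01-02",
     "text": "Another article body ...", "id": 1, "story": 0},
]
pd.DataFrame(rows).to_csv("my_dataset.csv", index=False)
```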
- file_path: the path to a preprocessed data set file
- window_size: the size of a window in desired time units (e.g., days) - default = 7
- slide_size: the size of a slide in desired time units (e.g., days) - default = 1
- begin_date: the date at which the simulation begins (see the usage example below)
- num_windows: the total number of windows to evaluate - default = 365
- min_articles: the minimum number of articles to form a story (defaults: 8 for Newsfeed14; 18 for WCEP18/19 and USNews)
- N: the number of thematic keywords - default = 10
- T: the temperature for scaling the confidence score - default = 2
- keyword_score: the type of keyword score function in ["tfidf", "bm25"] - default = "tfidf"
- verbose: whether to print the intermediate process, in [True, False] - default = False
- story_label: whether the data set has story labels (used to evaluate accuracy), in [True, False] - default = True
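Putting the defaults together, a call could look like the following sketch (file_path and begin_date are placeholder values, and it is assumed that simulate accepts these as keyword arguments matching the parameter names above):

```python
from USTORY import simulate

output = simulate(
    file_path="my_dataset_preprocessed.csv",  # placeholder path
    window_size=7,
    slide_size=1,
    begin_date="2021-01-01",                  # placeholder start date
    num_windows=365,
    min_articles=8,                           # e.g., the Newsfeed14 setting
    N=10,
    T=2,
    keyword_score="tfidf",
    verbose=False,
    story_label=True,
)
```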
The simulate function returns the following tuple:
(all_window, cluster_keywords_df, final_num_cluster, avg_win_proc_time, nmi, ami, ri, ari, precision, recall, fscore)
- all_window: includes all article information and cluster assignment/confidence information
- cluster_keywords_df: the lists of thematic keywords for the clusters in every window
```python
from USTORY import simulate

output = simulate(file_path, window_size, slide_size, begin_date,
                  num_windows, min_articles, N, T, keyword_score,
                  verbose, story_label)
print("fscore:", output[-1])
```
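The remaining outputs can be unpacked in the order given above, for example:

```python
# Unpack the full output tuple (names follow the return list above).
(all_window, cluster_keywords_df, final_num_cluster, avg_win_proc_time,
 nmi, ami, ri, ari, precision, recall, fscore) = output

print("final number of clusters:", final_num_cluster)
print("NMI:", nmi, "AMI:", ami, "ARI:", ari)
print("precision/recall/F1:", precision, recall, fscore)
```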