formatforlicense

The aim of this repository is to resolve the problem of license issues while sharing datasets and labels in token level. We are proposing a representation for sharing dataset among researchers and communities.

For each token, one word from left and one word from right will be given. There is a transmitter and receiver in this process.

Transmitter(which is sharing the tokens) is converting the tokens from folia format to OUR format and is sharing the links of news.

Receiver is getting news from links(In this case, using justext). Unlabeled Conll files will be created using news and script will try to match tokens of shared format from transmitter with the text of created empty Conll file. If it matches, It will change the label.

1- Get news with Justext 2- Get justext to folia 3- Folia to Conll 4- Matching

Dependencies:

requests pynlpl justext

Convert Folia Files to Conll Format

This script converts Folia Files to Conll files.

python3 folia2conll.py source_dir save_dir

Convert Folia Files to OUR Format

This script converts all folia files in source_dir to our format and saving to save_dir

python3 foliatoconlllike.py source_dir save_dir

Getting News with justext

This script gets all news from links in source_dir using justext and writes to files in a save_dir

python3 justextnews.py source_dir save_dir

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
README.md		README.md
folia2conll.py		folia2conll.py
foliatoconlllike.py		foliatoconlllike.py
justext_folia_conll.py		justext_folia_conll.py
justextnews.py		justextnews.py
matchfile.py		matchfile.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

formatforlicense

Convert Folia Files to Conll Format

Convert Folia Files to OUR Format

Getting News with justext

About

Releases

Packages

Languages

emerging-welfare/formatforlicense

Folders and files

Latest commit

History

Repository files navigation

formatforlicense

Convert Folia Files to Conll Format

Convert Folia Files to OUR Format

Getting News with justext

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages