JieyuZhao/toWino

Convert the WinoBias dataset from .txt format to .conll format

Based on the Berkeley Coref system (please check their website for more information).

  1. Extract the sentences from winobias.txt (in our case, winobias.txt stands for a file such as anti_stereotyped_type1.txt.dev):

mkdir wino_sentences
python toSentences.py data/anti_stereotyped_type1.txt.dev wino_sentences/

  2. Run the Berkeley Coref preprocessing script (refer to the "preprocessing" section here).

  3. Add all the side information obtained from Berkeley Coref to our data, then convert the result to .conll format:

mkdir wino_berkeley
python addCoref.py data/winobias.txt data/wino_preprocess/ wino_berkeley/
mkdir wino_conll
python toWino.py wino_berkeley/ wino_conll/
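
The commands above can also be driven from a small Python script. The following is only a sketch, not part of this repository: the helper names `build_steps` and `run_steps` are hypothetical, and it assumes that the same WinoBias .txt file is passed to both toSentences.py and addCoref.py (as the note in step 1 suggests) and that the Berkeley Coref preprocessing output is in data/wino_preprocess/. The Berkeley Coref preprocessing itself is not automated here and still has to be run by hand between the first step and the remaining two.

```python
import subprocess
import sys
from pathlib import Path


def build_steps(wino_txt, preprocess_dir="data/wino_preprocess/"):
    """Return (output_dir, command) pairs mirroring the README commands.

    wino_txt is the WinoBias .txt file, e.g.
    data/anti_stereotyped_type1.txt.dev.
    preprocess_dir is assumed to hold the Berkeley Coref preprocessing output.
    """
    py = sys.executable  # run the scripts with the current interpreter
    return [
        ("wino_sentences/",
         [py, "toSentences.py", wino_txt, "wino_sentences/"]),
        ("wino_berkeley/",
         [py, "addCoref.py", wino_txt, preprocess_dir, "wino_berkeley/"]),
        ("wino_conll/",
         [py, "toWino.py", "wino_berkeley/", "wino_conll/"]),
    ]


def run_steps(steps, runner=subprocess.run):
    """Create each output directory (like mkdir), then run its command."""
    for out_dir, cmd in steps:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        # check=True stops the pipeline if a script exits with an error.
        runner(cmd, check=True)
```

A possible way to use it: call `steps = build_steps("data/anti_stereotyped_type1.txt.dev")`, run `run_steps(steps[:1])` to extract the sentences, run the Berkeley Coref preprocessing manually, and then call `run_steps(steps[1:])` for the last two scripts.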