Commit 74ac51e

Fixed bug concerning newline chars (affecting york). Updated corpora.
1 parent 1bb3585 commit 74ac51e

11 files changed, +102,026 -38,712 lines changed

clean_corpus.py

+1 -1
@@ -195,7 +195,7 @@ def get_participants(chafiles):
 child_age = re.sub('[;\.]',' ',child_age) # Replace age written as "y;m.d" by "y m d"
 continue_line = 0
 processed_lines = ''
-with open(input_location +'\\'+ file) as corpus:
+with open(input_location +'\\'+ file, mode = 'rU') as corpus:
     for j, line in enumerate(corpus):
         line = line.strip()
         is_adult_speaking = ((line[0] == '*') and (line[1:4] in participants)) or (continue_line == 1)
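
The change above opens each corpus file in universal-newline mode so that `\r`, `\r\n`, and `\n` line endings are all treated as line breaks. The `'U'` mode flag is a Python 2 idiom; it was deprecated in Python 3 and removed in Python 3.11, where text mode with `newline=None` (the default) gives the same translation. The following is a minimal sketch of the equivalent read on Python 3, not the repository's code: the helper name `read_stripped_lines`, the UTF-8 encoding, and the use of `os.path.join` in place of the hard-coded `'\\'` separator are all assumptions made for illustration.

```python
import os


def read_stripped_lines(input_location, file_name):
    """Return the lines of a corpus file with surrounding whitespace removed.

    Hypothetical helper: the name, the UTF-8 encoding, and os.path.join
    are illustrative assumptions, not taken from clean_corpus.py.
    """
    path = os.path.join(input_location, file_name)
    # newline=None enables universal newlines in Python 3 text mode:
    # '\r', '\r\n' and '\n' are all translated to '\n' while reading.
    with open(path, mode='r', newline=None, encoding='utf-8') as corpus:
        return [line.strip() for line in corpus]
```

Because `newline=None` is already the default for text mode, passing it explicitly only documents the intent; a file with mixed line endings is split the same way either way.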

compiled_corpus/corpus_ortho_0y0m_2y0m.txt

+720
Large diffs are not rendered by default.

compiled_corpus/corpus_phono_L_D_E_0y0m_2y0m.txt

+720
Large diffs are not rendered by default.

corpora/york/clean/extract.txt

+12,583
Large diffs are not rendered by default.

output/york/enchainement_cases.txt

+32,875 -25,306
Large diffs are not rendered by default.

output/york/liaison_cases.txt

+4,307 -3,359
Large diffs are not rendered by default.

output/york/liquid_deletion_cases.txt

+2,478 -2,034
Large diffs are not rendered by default.

output/york/phonologized_L.txt

+12,583
Large diffs are not rendered by default.

output/york/phonologized_L_D.txt

+12,583
Large diffs are not rendered by default.

output/york/phonologized_L_D_E.txt

+12,583
Large diffs are not rendered by default.

output/york/rejected_liaison_cases.txt

+10,593 -8,012
Large diffs are not rendered by default.
