{\centering A STATISTICAL METHOD FOR SYNTACTIC DIALECTOMETRY
}
This dissertation establishes the utility and reliability of a
statistical distance measure for syntactic dialectometry, expanding
dialectometry's methods to include syntax as well as phonology and
the lexicon. It establishes the measure's reliability by comparing
its results to those of dialectology and phonological dialectometry
on Swedish dialects and by evaluating variant parameter
settings.
% Dialectology studies varieties of language. Dialectometry is a
% subfield which studies language varieties quantitatively.
The
research questions of this dissertation are (1) whether a
statistical measure of syntax for dialectometry will reproduce the
results of syntactic dialectology and phonological dialectometry and
(2) what parameter settings produce results most similar to
dialectology's results.
% Answering these questions will establish
% whether a statistical measure of syntax is useful, and if so, how it
% can best be used in future research.
Statistical dialect distance is defined in two parts: a feature set
that captures linguistic properties and a measure of dissimilarity
that combines two sites' features into a single number. This
dissertation uses feature sets from previous work: trigrams
(Nerbonne \& Wiersma, 2006) and leaf-ancestor paths (Sanders,
2007). In addition, it introduces two other feature sets: leaf-head
paths, which are based on dependencies, and phrase-structure rules.
This dissertation uses the measure $R$ (Nerbonne \& Wiersma, 2006) as
well as Kullback-Leibler and Jensen-Shannon divergences from
information theory and cosine similarity.
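As a sketch of the standard textbook definitions (the notation here is
an assumption for illustration, not necessarily that of the
dissertation), let $P$ and $Q$ be the relative-frequency distributions
of features at two sites, with $p_i$ and $q_i$ the relative
frequencies of feature $i$, and let $M = \frac{1}{2}(P + Q)$ be their
average. Then
\[
  D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i},
\]
\[
  D_{\mathrm{JS}}(P, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M)
                        + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\]
\[
  \cos(P, Q) = \frac{\sum_i p_i q_i}
                    {\sqrt{\sum_i p_i^2}\,\sqrt{\sum_i q_i^2}}.
\]
Kullback-Leibler divergence is asymmetric and is undefined when some
$q_i = 0$ while $p_i > 0$, whereas Jensen-Shannon divergence is
symmetric and always finite. Cosine similarity measures similarity
rather than distance, so it must be converted (for example, as
$1 - \cos(P, Q)$) before it can be used as a dissimilarity.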
This statistical distance is tested on the Swediasyn, a corpus of
interviews recorded in villages throughout Sweden.
% Before measuring
% distance between interview sites, the interview transcriptions were
% annotated linguistically to support the above feature sets.
After
the distances were measured, they were processed and then
compared with existing dialectology results.
Unlike previous work, this dissertation measures significant
distances between dialect corpora. When these distances are
mapped to the geography of Sweden, they reproduce the traditional
dialect regions of Sweden. There is weak correlation with geographic
distance, but good agreement between dialectometric syntactic and
phonological distance. A comparison of specific dialect features with
those of dialectology is inconclusive; better comparison methods are
needed.
\noindent{}\rule{4in}{1pt} \\
\rule{4in}{1pt} \\
\rule{4in}{1pt} \\
\rule{4in}{1pt}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "dissertation.tex"
%%% End: