AUTOTYP version 1.0.0 is a completely new release that focuses on usability, documentation and completeness. It has been radically overhauled compared to the earlier 0.1.x version, and the sheer number of differences makes it impossible to provide a comprehensive list of changes. What follows is a quick summary of the most important changes in the new release, followed by notes on migrating from the old database releases.
- New naming conventions for datasets and variables, focusing on usability and clarity. All names now consistently follow the CamelCase convention and are based on verbose descriptions that provide more context about the variable (e.g. `Position` -> `VerbInflectionMarkerPosition`). Hundreds of variables have been renamed to fit these criteria.
- The datasets are now organized into thematic modules, rather than each dataset constituting a module on its own.
- Published data now includes the raw exported database data, in addition to the previously published derived aggregated tables. All aggregation scripts used to compute derived data are published as well (see `aggregation-scripts`). Please feel free to inspect the scripts and modify them to suit your own needs.
- Many improvements to variable descriptions and metadata. The metadata YAML files are now simpler and more compact, which should make the documentation more accessible.
- Overhauled the data architecture to allow nested and repeated table fields (see Data Architecture). This allows many datasets to be expressed in a more natural, conceptually simpler fashion.
- New R and JSON exports for users who want quick access to the data using their preferred data wrangling environment.
- Language name and glottocode are now exported for every dataset in addition to the internal language ID.
- The `GrammaticalRelations` module now encompasses all data on grammatical relations and alignments. We now fully provide the underlying raw database data in addition to the aggregated alignment data and the scripts used to produce these aggregations.
- `VerbSynthesis` has been overhauled to include a detailed list of inflectional categories expressed on verbs.
- The `LocusOfMarking` module now contains the raw database data in addition to the previously published aggregations.
- The `GrammaticalMarkers` dataset has been overhauled to include a detailed list of marker hosts and marked categories.
- `MorphemeClasses` replaces the previous aggregated `Morpheme_types` dataset and exposes the information about individual language-specific morpheme classes. The information previously available in `Morpheme_types` is now integrated into the improved `MorphologyPerLanguage` aggregated dataset.
- New module `Categories` groups together datasets that provide information about selected grammatical categories.
- New module `Definitions` provides access to the underlying definitions of categorical variables used across AUTOTYP.
- New module `PerLanguageSummaries` groups together various per-language aggregated summaries (code to generate these summaries is available under `aggregation-scripts`).
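To give a rough idea of what nested and repeated table fields look like in practice, here is a minimal Python sketch. The record layout, field names (`LanguageID`, `Markers`, `Categories`) and values are invented purely for illustration and are not taken from the actual AUTOTYP exports; consult the dataset metadata for the real structure.

```python
import json

# Hypothetical record with a repeated, nested field.
# All field names and values are invented for illustration only.
raw = """
[
  {"LanguageID": 1, "Language": "ExampleLanguage",
   "Markers": [
     {"MarkerLabel": "m1", "Categories": ["tense", "person"]},
     {"MarkerLabel": "m2", "Categories": ["number"]}
   ]}
]
"""

def flatten(records):
    """Expand nested fields into one flat row per (language, marker, category)."""
    rows = []
    for rec in records:
        for marker in rec["Markers"]:
            for category in marker["Categories"]:
                rows.append(
                    (rec["LanguageID"], rec["Language"],
                     marker["MarkerLabel"], category)
                )
    return rows

rows = flatten(json.loads(raw))
for row in rows:
    print(row)
```

A nested layout keeps everything about one language in a single record, while the flattened rows resemble what a purely tabular export would need several joined tables to express.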
If you have been using AUTOTYP version 0.1.x, you will notice that many datasets have been moved or renamed. The following list should help you find the new location of the data:
- `Agreement` is now exported as `Categories/Agreement`
- `Alienability` is now exported as `Categories/Alienability`
- `Alignment` is now exported as `GrammaticalRelations/Alignment`
- `Alignment_per_language` is now `PerLanguageSummaries/AlignmentForDefaultPredicatesPerLanguage`
- `Clause_linkage` is now `Sentence/ClauseLinkage`
- `Clause_word_order` is now `Sentence/ClauseWordOrder`
- `Clusivity` is now exported as `Categories/Clusivity`
- `Gender` is now exported as `Categories/Gender`
- `Grammatical_markers` is now exported as `Morphology/GrammaticalMarkers`
- `GR_per_language` has been superseded by `GrammaticalRelations/GrammaticalRelationCoverage`
- `Locus_per_language` is now `PerLanguageSummaries/LocusOfMarkingPerLanguage`
- `Locus_per_macrorelation` has been superseded by `Morphology/DefaultLocusOfMarkingPerMacrorelation`
- `Locus_per_microrelation` has been superseded by `Morphology/LocusOfMarkingPerMicrorelation`
- `Markers_per_language` is now `PerLanguageSummaries/GrammaticalMarkersPerLanguage`
- `Morpheme_types` has been superseded by `Morphology/MorphemeClasses` and `PerLanguageSummaries/MorphologyPerLanguage`
- `Morphology_per_language` is now `PerLanguageSummaries/MorphologyPerLanguage`
- `NP_per_language` is now `PerLanguageSummaries/NPStructurePerLanguage`
- `NP_structure` is now `NP/NPStructure`
- `NP_structure_presence` is now `PerLanguageSummaries/NPStructurePresence`
- `Numeral_classifiers` is now exported as `Categories/NumeralClassifiers`
- `Register` is still `Register`
- `Synthesis` is now `Morphology/VerbSynthesis`
- `Valence_classes` is now `GrammaticalRelations/PredicateClasses`
- `Valence_classes_per_language` is now `PerLanguageSummaries/PredicateClassesSemanticsPerLanguage`
- `VInfl_counts_per_position` is now `PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition`
- `VInfl_cat_*` is now `PerLanguageSummaries/VerbInflectionCategoriesAggregatedBy*`
- `VInfl_macrocat_*` is now `PerLanguageSummaries/VerbInflectionMacrocategories*`
- `VAgr_*` is now `PerLanguageSummaries/VerbAgreementAggregatedBy*`
- `Word_domains` is now `Word/WordDomains`
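If you have scripts that refer to the old dataset names, the mapping above can be encoded as a small lookup helper. The following Python sketch covers only a subset of the list and is not part of AUTOTYP itself; the function name `new_locations` is hypothetical. For the wildcard entries it assumes the part matched by `*` is carried over unchanged, which may not hold for the actual dataset names, so check the results against the published files.

```python
# Old 0.1.x dataset names mapped to their 1.0.0 locations
# (a subset of the migration list; extend as needed).
RENAMED = {
    "Agreement": ["Categories/Agreement"],
    "Alignment": ["GrammaticalRelations/Alignment"],
    "Morpheme_types": [
        "Morphology/MorphemeClasses",
        "PerLanguageSummaries/MorphologyPerLanguage",
    ],
    "Register": ["Register"],
    "Synthesis": ["Morphology/VerbSynthesis"],
    "VInfl_counts_per_position": [
        "PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition"
    ],
    "Word_domains": ["Word/WordDomains"],
}

# Wildcard families: the old prefix is swapped for the new one.
# ASSUMPTION: the suffix matched by "*" is kept as-is.
PREFIXES = {
    "VInfl_cat_": "PerLanguageSummaries/VerbInflectionCategoriesAggregatedBy",
    "VInfl_macrocat_": "PerLanguageSummaries/VerbInflectionMacrocategories",
    "VAgr_": "PerLanguageSummaries/VerbAgreementAggregatedBy",
}

def new_locations(old_name):
    """Return the list of 1.0.0 locations for a 0.1.x dataset name."""
    # Exact names take priority over wildcard prefixes.
    if old_name in RENAMED:
        return RENAMED[old_name]
    for old_prefix, new_prefix in PREFIXES.items():
        if old_name.startswith(old_prefix):
            return [new_prefix + old_name[len(old_prefix):]]
    raise KeyError(f"No known 1.0.0 location for {old_name!r}")

print(new_locations("Synthesis"))
print(new_locations("Morpheme_types"))
```

Returning a list rather than a single string accommodates datasets such as `Morpheme_types`, whose contents are now split across two locations.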