Corpus Linguistics slides, labs, assignments and data

juletx/corpus-linguistics

Corpus Linguistics

This course is an introduction to corpus linguistics. We will start with a brief introduction to textual corpora, including linguistic annotation and representation schemas. We will then cover the extraction of relevant information from corpora, such as collocations and keywords, using statistical and distributional techniques. Finally, we will learn the XML markup language. Throughout the module we will introduce several corpora in various languages (English, Spanish, Basque, etc.).
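As a rough illustration of what a statistical approach to collocations can look like, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI). It is not taken from the course materials: the function name `pmi_bigrams`, the `min_count` threshold, and the toy sentence are all assumptions made for this example, and the labs may use different association measures.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not part of the course code.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get inflated PMI scores, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest PMI first: pairs that co-occur more often than chance predicts.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus, whitespace-tokenised (an assumption; real corpora need a tokenizer).
text = "new york is big . new york is busy . the city is big ."
ranked = pmi_bigrams(text.split())
for pair, score in ranked:
    print(pair, round(score, 2))
```

PMI favours pairs whose parts rarely appear apart, which is why a frequency threshold such as `min_count` is commonly applied before ranking.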

Syllabus

  • Introduction to Corpus Linguistics

    • Introduction
    • Corpus Linguistics
    • Uses of corpora
    • Corpus types
    • Corpus annotation and standards for linguistic representation
  • XML

    • XML introduction
    • XML schemas and validation
    • XPath
  • Laboratories

    • Linux commands
    • Word frequencies and Zipf's law
    • Collocations
    • Keyword extraction
    • XML and XPath
  • Assignments

    • Brown collocations
    • Hyperpartisan log-odds ratios

Software

Evaluation

  • Attendance and participation: 10%
  • Class assignments: 55%
  • Assignments: three choices
    • Regular assignment: 20%
    • 'Hard' assignment: 35%
    • Propose a subject for the final project: 35%