Course: LIN386M  Applied Text Analysis
Semester: Spring 2012

Instructor Contact Information

Jason Baldridge
office hours:  Mon 10-noon, Fri 2-3
office: Calhoun 510
phone: 232-7682


Prerequisites

Graduate standing.

Syllabus and Text

This page serves as the syllabus for this course.

There is no required course textbook. Readings will primarily come from online sources, including tutorials. There will also be readings assigned from the following draft book:

  • Dickinson, M., C. Brew, and D. Meurers. Language and Computers. Unpublished manuscript book. A PDF of the book will be made available on this course's Blackboard site.

Students are encouraged to obtain a copy of the following standard natural language processing book:

Additional required readings will be made available for download from the schedule page of the course website.

For learning Scala, there is no official course book. I have created a series of Scala tutorials for beginning programmers to get students started, and an overview is given on the links page. In addition, here are some resources:

Exams and Assignments

There will be no midterm or final exam.

There will be seven homework assignments. Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates.

Goals and Overview

Text analytics is an applied arm of the field of computational linguistics, which has experienced significant growth in the last two decades. Some of the most important factors behind this include the use of machine learning techniques, the availability of large (sometimes annotated) corpora (including the web itself), and the availability of relatively cheap and powerful computers. Together, these factors have played a major part in making computational linguistics very relevant in applied settings.

The foremost goal of this course is to expose the student to the core techniques and applications of text analysis. By the end, students will understand the motivations for and capabilities of several core natural language processing and machine learning algorithms and techniques used in text analysis, including:

  • regular expressions
  • vector space models
  • clustering (e.g. k-means)
  • classification (e.g. naive Bayes, perceptrons, and maximum entropy)
  • n-gram language models
  • topic models
  • part-of-speech tagging
  • named entity recognition

We will show, on a few chosen topics, how natural language processing builds on and uses the fundamental data structures and algorithms presented in this course. In particular, we will discuss:

  • authorship attribution
  • sentiment analysis
  • information extraction
  • geolocation

The course will also serve as an introduction to Scala programming and programming for text analysis. Students will learn to write non-trivial programs for text analysis that take advantage of existing open source toolkits.

The course will help prepare students both for jobs in industry and for doing original research that involves text analysis.

    See the course schedule for details.

    Course Requirements

    The course grade is based on seven assignments. The first is worth 10% of the overall course grade and the last six are each worth 15% of the overall course grade.

    Grading scale. The grading scale differs from the one typically used in the USA.

    80+ A
    77-80 A-
    74-77 B+
    70-74 B
    67-70 B-
    64-67 C+
    60-64 C
    57-60 C-
    54-57 D+
    50-54 D
    47-50 D-
    0-47 F

    This scale is inspired by the typical British grading scale. It allows us to give you a better sense of where you can improve by taking off points while still giving an A for quality work. A score of 90+ means you did an amazingly good job, above and beyond expectations.
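As an illustration of how the assignment weights and the scale above combine, here is a minimal Scala sketch. The object and function names are hypothetical and not part of any course code. It assumes each assignment is scored from 0 to 100 and that a score falling exactly on a boundary (e.g. 80) receives the higher grade, which the table does not state explicitly.

```scala
// Hypothetical helper; weights and cutoffs are copied from the syllabus.
object GradeSketch {
  // The first assignment counts 10%, the remaining six count 15% each.
  val weights: Seq[Double] = 0.10 +: Seq.fill(6)(0.15)

  // Lower bound of each letter band, highest first.
  // Assumption: a boundary score (e.g. exactly 80) gets the higher grade.
  val cutoffs: Seq[(Double, String)] = Seq(
    80.0 -> "A", 77.0 -> "A-", 74.0 -> "B+", 70.0 -> "B", 67.0 -> "B-",
    64.0 -> "C+", 60.0 -> "C", 57.0 -> "C-", 54.0 -> "D+", 50.0 -> "D",
    47.0 -> "D-"
  )

  // Weighted sum of the seven assignment scores (each assumed 0-100).
  def courseScore(scores: Seq[Double]): Double = {
    require(scores.length == weights.length, "expected seven assignment scores")
    scores.zip(weights).map { case (s, w) => s * w }.sum
  }

  // First band whose lower bound the score reaches; anything below 47 is an F.
  def letter(score: Double): String =
    cutoffs.collectFirst { case (min, grade) if score >= min => grade }
      .getOrElse("F")

  def main(args: Array[String]): Unit = {
    val scores = Seq(90.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0)
    val total = courseScore(scores)
    println(f"$total%.1f ${letter(total)}")
  }
}
```

With a 90 on the first assignment and 80 on each of the other six, the weighted score is 0.10 × 90 + 6 × 0.15 × 80 = 81, which falls in the A band.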

    Attendance is not required, and it is not used as part of determining the grade.

    Extension Policy

    Homework must be turned in on the due date in order to receive credit. Late homework will be accepted only under exceptional circumstances (e.g., medical or family emergency) and at the discretion of the instructor (i.e., exceptional denotes a rare event). This policy allowing for exceptional circumstances is not a right, but a privilege and courtesy to be used when needed and not abused. Should you encounter such circumstances, simply email the assignment to the instructor and note "late submission due to exceptional circumstances". You do not need to provide any further justification or personally revealing information regarding the details.

    Academic Honor Code

    You are encouraged to discuss assignments with classmates, but all written submissions must reflect your own original work. If in doubt, ask the instructor. Acts like plagiarism represent a serious violation of UT's Honor Code and standards of conduct:

    Students who violate University rules on academic dishonesty are subject to severe disciplinary penalties, such as automatically failing the course and potentially being dismissed from the University. Don't risk it. Because honor code violations ultimately harm you, other students, and the integrity of the University, policies on academic honesty will be strictly enforced.

    For further information please visit the Student Judicial Services Web site:

    Notice about students with disabilities

    The University of Texas at Austin provides appropriate accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 512-471-6529 or UT Services for Students with Disabilities. If they certify your needs, we will work with you to make appropriate arrangements.

    UT SSD Website:

    Notice about missed work due to religious holy days

    A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.