DUE: February 15, 2012 by 10am (submission on Blackboard)Your starting point is a set of stub programs that you can download from the course website in the file hw2-stubs.zip.
These are incomplete implementations that handle a few things for you
(such as code to test your code), and have comments embedded in them to
help you solve the problems.Your submission will be a zipped up file with your implementations for each problem in it. Submit it on Blackboard as <lastname>_<firstname>_hw2.zip. Please follow the same instructions as for HW1 with respect how to package up the solutions before submission.Note:
there may be some new concepts in this homework that have not been
covered in the tutorials or in class. I have provided hints and
guidelines for you to be able to use them, but you should not feel shy
to ask for help. A big part of programming is trying to figure out some
new things based on what you already know, so give it a try, but at the
end of the day, I’m here to help smooth things out for you.If any of the instructions don’t make sense to you, please get in touch with the instructor right away. 1. List transformations and computations.For
this problem, you need to process a string from the command line, and
do some simple transformations using Lists. Modify the stub file lengthSort.scala.(a) Split the string from the command line (which you get as args(0))
and sort it. The sorting order is first according to length of each
token and then alphabetically by the token (so “to” is before “and”,
which is before “the”). Print out the list in this sorted order as a
single line, with each token converted to the format word:length, e.g. the:3. (See example output below.)Tip: The split method on Strings returns an Array, and you may want to work with a List instead. Just use toList on the Array to convert it to a List, e.g. “a:b:c:d”.split(“:”).toList.(b) Working from the sorted list you obtained in (a), now output the list using a similar format, but remix it so that you have:- the first group of elements are the tokens of length 3 or less, in ascending order
- the next group of elements are the tokens of length greater than 3 and less than 11, in descending order
- the last group of elements are the tokens of length 11 or more, in ascending order
See the example output for guidance on formatting, which is similar to (a).Example output.$
scala lengthSort.scala "Introduction to Computational Linguistics
introduces the most important data structures and algorithmic techniques
underlying computational linguistics"Sorted:
to:2 and:3 the:3 data:4 most:4 important:9 introduces:10 structures:10
techniques:10 underlying:10 Linguistics:11 algorithmic:11 linguistics:11
Introduction:12 Computational:13 computational:13Remixed:
to:2 and:3 the:3 underlying:10 techniques:10 structures:10
introduces:10 important:9 most:4 data:4 Linguistics:11 algorithmic:11
linguistics:11 Introduction:12 Computational:13 computational:13Hint:
You’ll want to use pairs (Tuple2) for a number of reasons. One of the
biggest advantages is that they sort first on the first element and then
on the second element.Note: You must make sure to enclose the argument to lengthSort.scala in quotes so that the entire argument is available in the first element of the args array. 2. Text manipulationHere is a text passage from H.G.Wells, The War of the Worlds (from Project Gutenberg, http://www.gutenberg.org/etext/36):After
the glimpse I had had of the Martians emerging from the cylinder in
which they had come to the earth from their planet, a kind of
fascination paralysed my actions. I remained standing knee-deep in the
heather, staring at the mound that hid them. I was a battleground of
fear and curiosity. I
did not dare to go back towards the pit, but I felt a passionate
longing to peer into it. I began walking, therefore, in a big curve,
seeking some point of vantage and continually looking at the sand heaps
that hid these new-comers to our earth. Once a leash of thin black
whips, like the arms of an octopus, flashed across the sunset and was
immediately withdrawn, and afterwards a thin rod rose up, joint by
joint, bearing at its apex a circular disk that spun with a wobbling
motion. What could be going on there? For this problem, you will perform series of manipulations and operations on this passage. Modify the stub program wotw.scala,
looking at the comments in it for guidance. See the output at the end
of this question for guidance on what your program’s output should be.Note: even though this problem is a bit lengthy in its description, the Scala solutions themselves can be quite short.(a)
In the passage, how many word tokens are there that end with the
letters “ing”? (Note that we are looking for words ending in “ing”, not
just occurrences of the letter sequence “ing” anywhere in a word. You do
not need to deal with punctuation. Just count only the words that
actually end in “ing”, ignoring those that end in “ing,” or “ing.”.)
Print it out as specified in the output below.Hint: Use the String method split
to split a string of text into individual words (and split on sequences
of one or more whitespace characters), the string method endsWith to test whether a string ends with a particular substring, and the List method count to find how many words pass that test.Note: The result of split
is an Array, not a List. For present purposes, this will behave
similarly to a List, so you don’t need to change anything. However, if
you do want a list, call the method toList, e.g. “a:b:c:d”.split(“:”).toList.(b) Make a list chopped
that contains all the words from the above passage, but with the last
two letters cut off. Print out a string that concatenates the chopped
tokens using a space as the separator, as below. (This is a very crude
first stab at making a stemmer, a program that cuts off suffixes and
leaves only word stems. We will see improved implementations later.)Hint: Do this using map on the result of splitting the string from part (a). Also, recall the slice method, which excises a sublist from a List, e.g.:scala> val foo = List(3,4,0,32,3,9,13)foo: List[Int] = List(3, 4, 0, 32, 3, 9, 13)scala> foo.slice(2,5)res21: List[Int] = List(0, 32, 3)Strings are also a List-like data structure (they are sequences of characters) and the slice method works for them too.Note: It’s fine to work with either List[String] or Array[String].(c
) A common item of interest when we get to language models is the
bigram word probability of one word given that another precedes it, such
as the probability of seeing the word “the” next given that you’ve just
seen the word “at” in a sentence like “I’ll see you at … “ This value
is expressed as P(the|at). At its most basic level, this is a simple thing to compute. Given a corpus of texts, the probability is calculated as: P(the|at) = count(at,the) / count(at)In
words: out of all the times I’ve seen the word “at”, how many times was
it followed by the word “the”? Write the code to compute this from the
WotW passage, and print out the result, as below.Hint: There is a very useful function on lists called sliding, which gives you a sliding window of the values in the list. This is easy to see in an example:scala> "a b c d".split(" ").sliding(2).toListres0: List[Array[java.lang.String]] = List(Array(a, b), Array(b, c), Array(c, d))The argument to sliding
is an Int that specifies how many items will be in each window. With
such a list of two-element Arrays in hand, you can now use the count function with an appropriately defined predication to compute the numerator of the probability.Goal output: (with ### substituted for actual numerical values)$ scala wotw.scalaPart (a)Number of words ending in 'ing': ###Part (b)Aft
t glimp h h t Martia emergi fr t cylind whi th h co t ear fr the
plane ki fascinati paralys action remain standi knee-de t heathe
stari t mou th h the w battlegrou fe a curiosit d n da ba towar t
pi b fe passiona longi pe in i beg walkin therefor b curv seeki
so poi vanta a continual looki t sa hea th h the new-come o eart On
lea th bla whip li t ar octopu flash acro t suns a w immediate
withdraw a afterwar th r ro u joi join beari i ap circul di th sp wi
wobbli motio Wh cou goi therPart (c)P(the|at) = ### 3. Map data structures (a.k.a. associative arrays)For
this problem, you will write a program that takes a list of names and
room numbers on the command line, adds them to entries from a Map
relating people to their rooms, and then creates a reverse Map that
relates rooms to the people who are in them.Look at the stub program rooms.scala. A Map named defaultRoomNumbers
has already been defined with a few entries. As in previous problems,
there are some print statements like print ’’Part (a)’’ – make sure not
to delete those.Note: you must use immutable Maps for this problem.(a) The arguments on the command line are to be expected as a list of (person number) pairs, like Bill 315 Angela 412 Susan 325.
Get this information from the command line (making sure to check that
an even number of arguments has been provided), and use it to add create
a new Map that contains both these person to room pairings and those in
the defaultRoomNumbers Map. Then print out who is in which room, alphabetically by name.Hint:
use the mod function % to find out whether there are an even number of
arguments. (Try 5 % 2 and 4 % 2 in the Scala REPL and you’ll see what it
does.)(b)
Some people might share the same room. Create a new map that maps rooms
to the people in them, so it will have room numbers as keys, and lists
of people as values.Here’s an example of how your program should work on some example input:$ scala rooms.scala Bill 315 Angela 412 Susan 325Part (a)Angela: Room 412Bill: Room 315Jane: Room 312Sam: Room 312Susan: Room 325Ted: Room 325Part (b)Room 312: Sam,JaneRoom 315: BillRoom 325: Susan,TedRoom 412: Angela 4. Word-by-word translationFor
this problem you’ll create a very simple machine translator that looks
at each word in a source language sentence and maps it to a unique word
in a target language. This will involve working with the file system and
with Maps. Use the stub file translate.scala for this.You are given a (very) small translation dictionary in the file por2eng.txt.
You must read this in from the command line (as the first argument) to
create a Map that maps Portuguese words to the English words, as
specified in por2eng.txt.
Then, using that Map, you must translate any remaining arguments on the
command line (which you can expect to be Portuguese sentences) and
translate them one by one to English.As with the previous problem, you should use only immutable data structures.Here is an example call to translate.scala and the output it should produce for the given input.$ scala translate.scala por2eng.txt "a menina ouviu o cachorro ontem" "o cachorro viu o gato hoje"Portuguese: a menina ouviu o cachorro ontemEnglish: the girl heard the dog yesterdayPortuguese: o cachorro viu o gato hojeEnglish: the dog saw the cat today |
 Updating...
Jason Baldridge, Jan 30, 2012, 7:30 PM
|