assignments

Einstein’s Theory of General Relativity – A Personal Description

From the syllabus – Reading: Find and read descriptions of Einstein’s Theory of General Relativity and how it relates to time. Describe it in your own words on your blog and post a link to it on the wiki. We’ll go through your descriptions in class. How would you describe it to a 5 year old or your grandmother?

Nothing can go faster than the speed of light, and this restriction applies to influences (such as the force of gravity) between massive elements (planets, particles, people) in the universe. It is these forces that determine the movement of objects, the swinging of a pendulum in a clock, the workings of our consciousnesses, and the travel of light.

When the objects in question are moving with respect to some other perceiving entity, the things traveling between the moving objects have to travel slightly further because the destination object will have moved further away from the original point of the source object by the time the things moving between them can arrive. It follows that those things then have to either travel faster or take longer to get there. If those things are already traveling at the speed of light (such as influence due to gravity), then they cannot go any faster, so they must take longer, and the interactions between the moving objects slow from the point of view of the outside observer.

The moving object, however, still behaves just as it did before, albeit slightly more slowly. Because the workings of its own ability to perceive time – either with a clock, or with a human consciousness, or with something else – have all slowed down in precisely the same fashion, the moving object experiences time just as it did before.

Particularly useful sources:
http://www.physics.fsu.edu/Courses/Spring98/AST3033/Relativity/GeneralRelativity.htm – “When later it became clear that influences travel at finite speeds it was reasonable to suppose this true of gravity also.”
http://www.squidoo.com/relativity_explanation

ITP
Time
assignments

Programming A to Z – Assignment #9 Evolvocabulary

The last of the weekly assignments was relatively open ended –

Acquire some text. Visualize it. Source and methodology are up to you, but be prepared to justify your choices.

I decided to use my papers from college as my source text. I copy pasted the contents of the papers into plain text files, and had hoped to see how my vocabulary evolved through time (hence the project name… not the most clever one I’ve come up with, but it will do for a weekly assignment). (Note I didn’t include the writing I did for certain technical Linguistics, Computer Science, and Physics classes. I also didn’t include papers that were group-authored.)

In week four of the class we had looked at how to represent word counts as a hashmap of words to the number of times that they occurred in a text (see the WordCount class in the notes). WordCount extends TextFilter, however, and TextFilter is built to only be able to read data from a single file. I thought about combining the files into one or trying to use multiple TextFilters, but it seemed easier and more elegant to start from scratch.

Scala seemed like it would be well-suited to this sort of problem, and I was eager to find a use for it since it had been a couple months since I last worked on TwiTerra. My aforementioned friend Jorge pointed me towards a class written to parse Ruby log files; that code, which uses a Scala community library called Scalax, served as a useful starting point.

You can see the full source code for the assignment here, but I’ve pasted a particularly interesting function below. There’s some “deep Scalax magic” going on here (as Jorge says), which I’ll explain -

  def readAllFiles(fileNames: List[String]): Map[String, Map[String, Int]] = {
    val wordsInFiles: List[(String, String)] = for {
      filename <- fileNames
      line <- filename.toFile.readLines()
      word <- line.split("\\W+|\\d+")
    } yield (word.toLowerCase.trim, filename)
 
    val emptyMap: Map[String, Map[String, Int]] = Map.empty.withDefaultValue(Map.empty.withDefaultValue(0))
 
    wordsInFiles.foldLeft(emptyMap) { case (map, (word, filename)) => 
        map.update(word, map.apply(word).update(filename, map.apply(word).apply(filename) + 1))
      //map.update(word, map.apply(word).update(filename, map(word)(filename) + 1))
      //map.update(word, map.apply(word)(filename) = map(word)(filename) + 1)
      //map.update(word, map(word)(filename) = map(word)(filename) + 1)
      //map.update(word, map(word)(filename) += 1)
      //map(word) = (map(word)(filename) += 1)
    }
  }

The function takes a list of the names of the files mentioned above, and returns a map that has each word mapped to another map, and each of those maps has the names of the files in which that word occurred mapped to the number of times that word occurred in that file. There are three parts of the function:

  1. The first part of the function goes through all the files in that list, and then through each line in that file, and then through each token in that line (delimiting tokens by non-alphabetic characters), and puts each of those words in a list as a pair with the file in which they occurred. Thus wordsInFiles is a long list of words and file names, with an entry for every word on every line of every file.
  2. The function then initializes an empty map (emptyMap) with default values for the words and file names, and 0 for the word counts. This eliminates the need for a lot of hassle later on checking to see if words/file names are in the map – we can just assume they are there and trust it to use the default values if they aren’t.
  3. Finally, it operates on each pair in the wordsInFiles list and updates the map accordingly. foldLeft is explained thoroughly on the Ruby log file example linked above, but I’ll go through this specific case. It starts off with the emptyMap, goes through each pair in the wordsInFiles list, and performs a function on the pair of that map paired with that pair from the list ((map, (word, filename))) to fold that list pair into the map. The result of that fold is a map that is then used in the fold of the next item in the list, and this process continues for each (word, filename) pair in the wordsInFiles list.

    The function performed on each item in the list is not as complicated as it looks, and note that each of the commented lines is equivalent to the uncommented one – I left them to show the progressive application of syntactic sugar (which I won’t go into here). The purpose of this next line is to increment the count of the number of occurrences of the current word in the current file.

    The outermost map.update finds the word in the map and replaces the map with which it is associated with a new one. This new map needs to be an updated version of the previous map, which we retrieve with map.apply(word). We want to update only one of the values corresponding to the file names in that words’ map of filenames to occurrence-counts, so we need to get the previous count (using two map.apply’s to get to the value of a key in a map within a map) and increment it before the resulting map is sent to the update.

… “deep Scalax magic.”

I’ve saved the results of the visualization in the below images. The program didn’t quite create the effect that I had intended – that of showing how my vocabulary evolved over time – but I did give some sense of the topics about which I was thinking and the words I used to describe them. I thought about eliminating common words, but it seemed like it would be hard to make those decisions in a non-arbitrary manner. Click each image for a larger version in which the differences are more visibly apparent, and I recommend opening the below images in tabs and cycling through them with shortcut keys so that it’s easier to make quick comparisons.

A2Z
ITP
assignments

Programming A to Z – Delvicious, Django and Assignment #8

project page here with overview and previous posts

Since I had already worked with XML and web services for my midterm project, I decided to take this week’s assignment for Programming A to Z as an opportunity to continue work on that project. I thought that a good next step would be to use App Engine’s Datastore to store the necessary information about all of a user’s bookmarks, and I would again access that data initially as an XML document obtained from the Delicious API.

I had already begun to learn the Python web framework Django for the Textonic project for Design for UNICEF, and it seemed like it would provide a rich set of useful tools for this project as well. The Datastore does not work, however, like the relational databases with which I am more familiar. Django relies by default on a relational database and is thus incompatible with Google’s Datastore out-of-the-box, but there are two software projects that aim to reconcile these differences. Google’s App Engine Helper (tutorial, code) seemed less well- and less actively-developed than app-engine-patch (tutorial, code), so I decided to go with the latter.

Django is quite powerful and gives you a lot of functionality for free, but when trying to branch out from various tutorials I encountered the somewhat strange challenge of figuring out exactly what it did for you and what it didn’t. It took me a quite a while to get the hang of working with Django on App Engine, so I didn’t have time to actually get the XML stored in a reasonable set of Models in the database. I have, however, gotten successful storage of Delicious logins to work, which is a good first step. The updated code is available on GitHub, and I should be able to make additional improvements soon.

A2Z
ITP
assignments
projects

Programming A to Z – Assignments #6 and #7

Since it’s getting to be that time in the semester when I feel that I should be focusing on final projects, I didn’t spend too much time on the context free grammar and Bayesian classificiation assignments. My writeups for both are below.

I’ve studied context free grammars in the past (ahh undergrad memories of Ling120 (syllabus here, but not the one I had)), so I have a good sense of how they work. I made a few quick modifications to the grammar file used by Adam’s ContextFilter class to handle certain conjunctions, adverbs/adverbial phrases, and prepositional phrases. I also made some modifications to support quotations, but they aren’t particularly refined – I couldn’t come up with a simple solution to the nesting problem in the below example that didn’t involve duplication of many of the other rules:

this smug dichotomy thought ” that smug walrus foregrounds this time that yelled ” the seagull that spins this important sea said ” that restaurant that sneezes has come ” ” ”

Thinking about ways to solve this problem highlighted how CFGs can get rather complex rather quickly. When, for example, do you want a period/comma at the end of the quotation? When is one period/comma sufficient for multiple quotations? How do you resolve those situations programmatically?

My test.grammar file is online here, and you can run it with Adam’s Java classes that are zipped here. I recommend you only test a few of my rules at a time and comment out the others – otherwise you might get sentences like this:

the important trombone yelled ” wow, the blue amoeba said ” oh, this blue thing interprets this seagull that said ” wow, the trombone sneezes ” ” but this amoeba said ” this suburb that quickly whines in that sea that daydreams by that corsage that quickly or habitually computes this dichotomy slowly yet habitually vocalizes and damn, that luxurious restaurant habitually prefers this seagull ” but that suburb interprets the time yet damn, that sea said ” that restaurant tediously slobbers ” yet wow, that seagull quickly yet slowly foregrounds the restaurant or the boiling hot time spins this bald restaurant of this trombone but that amoeba computes this smug restaurant but the seagull that quickly or slowly prefers this sea yelled ” oh, the thing that spins this restaurant tediously yet quietly whines ” or wow, this amoeba coughs through the important sea and oh, that time tediously yet slowly coughs yet oh, that trombone habitually computes that luxurious suburb or wow, that thing that said ” my, that amoeba that said ” my, that time tediously or slowly sneezes ” quietly yet tediously foregrounds that trombone that has come ” said ” my, the restaurant that habitually but quietly prefers the dichotomy that said ” damn, this boiling hot trombone quietly slobbers by the trombone that quietly coughs ” said ” oh, the amoeba habitually coughs ” ” yet damn, this seagull that spins the seagull that sneezes for the important trombone quietly coughs but my, this sea that slowly yet habitually has come foregrounds this dichotomy and that blue thing tediously slobbers “

 
 
For the assignment on Bayesian classification I combined Adam’s BayesClassifier.java with the previous Markov chain examples to use n-grams instead of words as the tokens for analysis. BayesNGramClassifier.java can be found here as a .txt file, and you can download all of the required files here. Note you might have to increase the amount of memory available to Java to run the analysis with higher values for n. Try something like java -Xmx1024m BayesNGramClassifier 10 shakespeare.txt twain.txt austen.txt if you're having trouble.

I compared sonnets.txt to shakespeare.txt, twain.txt and austen.txt as in the example using various values of n for the analysis. The data is below, with the word-level analysis first. Note that higher numbers (i.e. those closer to zero) indicate a greater degree of similarity.

n shakespeare.txt twain.txt austen.txt
word -59886.12916378722 -64716.741899235174 -66448.68994538345
1 -311.94997977326625 -348.2797252624347 -351.8612295074756
2 -6688.356420467105 -6824.843204592283 -7076.488251510615
3 -46332.8806624305 -47629.58502376338 -49557.906858505885
4 -155190.04376334642 -161815.95665896614 -167839.50470553883
5 -350322.9494161118 -369897.08857782563 -379600.90797560615
6 -581892.4161591302 -620871.7848829604 -629557.118086935
7 -798094.5896325088 -851043.4785550251 -856926.3903304675
8 -977428.4318098201 -1033391.2297240103 -1037851.0025613104
9 -1106125.9701775634 -1153251.0919529926 -1159479.8816597122
10 -1184654.361656962 -1218599.6808817217 -1227484.9929278728
11 -1221770.2880299168 -1242286.351341775 -1255024.535274092
12 -1228641.7908902294 -1238848.5031651617 -1254404.827728626
13 -1214247.043351669 -1217403.6233457213 -1235480.9184978919
14 -1187489.0276571538 -1186476.2556523846 -1205959.6178494398
15 -1153511.2780243065 -1150529.1594209142 -1170746.8132369826

When the n-grams become 14 characters long (which is very long, considering the average length of English words) the analysis finally starts to break down, and it no longer correctly classifies sonnets.txt as being most similar to shakespeare.txt. Some values of n certainly perform better than others, but I'd need to delve further into the mathematics of how these numbers are calculated in order to do more detailed analysis.

A2Z
ITP
assignments

Programming A to Z – Assignment #5 Markov v vokraM

The class preceding our fifth assignment was on Markov chains, which involve statistical models of text that can be used to predict what character (or word, or some other unit) would be likely to follow a preceeding series of n characters (or words or other units). (An interesting paper of such an algorithm from the assigned reading can be found here.) Our assignment, with a few suggestions, was simply to:

Modify, augment, or replace the in-class Markov chain example.

As presented by Adam, the MarkovFilter example looked at each series of, say, three characters to build a model of what the next character is most likely to be, and then used this model to generate new lines of text by starting with an initial series of three characters used by the actual text, choosing the next character based on the first three characters, choosing the character after that based on the new most recent three characters (the latter two from the first set of three, and the new fourth character), and so on.

It occurred to me that perhaps it was unnecessary that the algorithm examine the text as we read it, from left to right. Instead, I wanted to rewrite MarkovFilter.java to work backwards – starting with the last three characters of a line and working backwards instead of forwards, looking at the set of three characters at the (temporary) beginning of the line, choosing a new first character for the line from a modified statistical model about which character is most likely to precede them, and repeating until the line is of the desired length. VokramFilter.java represents this reversed Markov filter, and the zip file of all the classes is here.

The text this generated seemed approximately as similar to English as that generated using the forward looking method, and I considered for a while how to be sure this is the case. I suspected that the operation performed by backward-looking Vokram analysis was equivalent to reversing the text that was input to a forward-looking Markov algorithm and then reversing the result, and it seemed like that operation would do the same thing as the Markov algorithm on its own. Yet I couldn’t quite work out a more thorough proof of that intuition, and will see if Adam has any insights. I considered doing a comparative analysis of several large texts using both MarkovFilter and VokramFilter (by comparing the Markov analysis of a text generated from applying VokramFilter to an original text to a Markov analysis of that actual original text), but didn’t have a chance.

A2Z
ITP
assignments