Category Archives: assignments

Wanderli.st – Thesis Presentation

A couple weeks ago I presented Wanderli.st during Thesis Week at NYU’s Interactive Telecommunications Program. The list of all of the presentations is here, and ITP has a copy of the video hosted here, but I’ve also embedded it below. Some of the slides are difficult to read in the video, so they are embedded as well. If you’re in a hurry and think you can read faster than I was talking (hah!), the notes on which the talk was based are below each slide.

I’ve been thinking a lot the best way to continue this work now that I am free from academia, and about how Wanderli.st fits with other proposals such as Diaspora (which has gotten incredible support). I’ll continue to publish updates here, and let me know if you have any ideas you’d like to discuss!

Note: I had originally wanted to synchronize the PDF of the slides with the video, but I couldn’t find a good tool to help me with this – Omnisio has disabled the ability to create new presentations since they were acquired by Google, and the Zentation player was simply too ugly (despite their much prettier main website). Let me know if you can recommend something!

Einstein’s Theory of General Relativity – A Personal Description

From the syllabus – Reading: Find and read descriptions of Einstein’s Theory of General Relativity and how it relates to time. Describe it in your own words on your blog and post a link to it on the wiki. We’ll go through your descriptions in class. How would you describe it to a 5 year old or your grandmother?

Nothing can go faster than the speed of light, and this restriction applies to influences (such as the force of gravity) between massive elements (planets, particles, people) in the universe. It is these forces that determine the movement of objects, the swinging of a pendulum in a clock, the workings of our consciousnesses, and the travel of light.

When the objects in question are moving with respect to some other perceiving entity, the things traveling between the moving objects have to travel slightly further because the destination object will have moved further away from the original point of the source object by the time the things moving between them can arrive. It follows that those things then have to either travel faster or take longer to get there. If those things are already traveling at the speed of light (such as influence due to gravity), then they cannot go any faster, so they must take longer, and the interactions between the moving objects slow from the point of view of the outside observer.

The moving object, however, still behaves just as it did before, albeit slightly more slowly. Because the workings of its own ability to perceive time – either with a clock, or with a human consciousness, or with something else – have all slowed down in precisely the same fashion, the moving object experiences time just as it did before.

Particularly useful sources:

  1. http://www.physics.fsu.edu/Courses/Spring98/AST3033/Relativity/GeneralRelativity.htm – “When later it became clear that influences travel at finite speeds it was reasonable to suppose this true of gravity also.”
  2. http://www.squidoo.com/relativity_explanation

Programming A to Z – Assignment #9 Evolvocabulary

The last of the weekly assignments was relatively open ended –

Acquire some text. Visualize it. Source and methodology are up to you, but be prepared to justify your choices.

I decided to use my papers from college as my source text. I copy pasted the contents of the papers into plain text files, and had hoped to see how my vocabulary evolved through time (hence the project name… not the most clever one I’ve come up with, but it will do for a weekly assignment). (Note I didn’t include the writing I did for certain technical Linguistics, Computer Science, and Physics classes. I also didn’t include papers that were group-authored.)

In week four of the class we had looked at how to represent word counts as a hashmap of words to the number of times that they occurred in a text (see the WordCount class in the notes). WordCount extends TextFilter, however, and TextFilter is built to only be able to read data from a single file. I thought about combining the files into one or trying to use multiple TextFilters, but it seemed easier and more elegant to start from scratch.

Scala seemed like it would be well-suited to this sort of problem, and I was eager to find a use for it since it had been a couple months since I last worked on TwiTerra. My aforementioned friend Jorge pointed me towards a class written to parse Ruby log files; that code, which uses a Scala community library called Scalax, served as a useful starting point.

You can see the full source code for the assignment here, but I’ve pasted a particularly interesting function below. There’s some “deep Scalax magic” going on here (as Jorge says), which I’ll explain -

  def readAllFiles(fileNames: List[String]): Map[String, Map[String, Int]] = {
    val wordsInFiles: List[(String, String)] = for {
      filename <- fileNames
      line <- filename.toFile.readLines()
      word <- line.split("\\W+|\\d+")
    } yield (word.toLowerCase.trim, filename)
 
    val emptyMap: Map[String, Map[String, Int]] = Map.empty.withDefaultValue(Map.empty.withDefaultValue(0))
 
    wordsInFiles.foldLeft(emptyMap) { case (map, (word, filename)) => 
        map.update(word, map.apply(word).update(filename, map.apply(word).apply(filename) + 1))
      //map.update(word, map.apply(word).update(filename, map(word)(filename) + 1))
      //map.update(word, map.apply(word)(filename) = map(word)(filename) + 1)
      //map.update(word, map(word)(filename) = map(word)(filename) + 1)
      //map.update(word, map(word)(filename) += 1)
      //map(word) = (map(word)(filename) += 1)
    }
  }

The function takes a list of the names of the files mentioned above, and returns a map that has each word mapped to another map, and each of those maps has the names of the files in which that word occurred mapped to the number of times that word occurred in that file. There are three parts of the function:

  1. The first part of the function goes through all the files in that list, and then through each line in that file, and then through each token in that line (delimiting tokens by non-alphabetic characters), and puts each of those words in a list as a pair with the file in which they occurred. Thus wordsInFiles is a long list of words and file names, with an entry for every word on every line of every file.
  2. The function then initializes an empty map (emptyMap) with default values for the words and file names, and 0 for the word counts. This eliminates the need for a lot of hassle later on checking to see if words/file names are in the map – we can just assume they are there and trust it to use the default values if they aren’t.
  3. Finally, it operates on each pair in the wordsInFiles list and updates the map accordingly. foldLeft is explained thoroughly on the Ruby log file example linked above, but I’ll go through this specific case. It starts off with the emptyMap, goes through each pair in the wordsInFiles list, and performs a function on the pair of that map paired with that pair from the list ((map, (word, filename))) to fold that list pair into the map. The result of that fold is a map that is then used in the fold of the next item in the list, and this process continues for each (word, filename) pair in the wordsInFiles list.

    The function performed on each item in the list is not as complicated as it looks, and note that each of the commented lines is equivalent to the uncommented one – I left them to show the progressive application of syntactic sugar (which I won’t go into here). The purpose of this next line is to increment the count of the number of occurrences of the current word in the current file.

    The outermost map.update finds the word in the map and replaces the map with which it is associated with a new one. This new map needs to be an updated version of the previous map, which we retrieve with map.apply(word). We want to update only one of the values corresponding to the file names in that words’ map of filenames to occurrence-counts, so we need to get the previous count (using two map.apply‘s to get to the value of a key in a map within a map) and increment it before the resulting map is sent to the update.

… “deep Scalax magic.”

I’ve saved the results of the visualization in the below images. The program didn’t quite create the effect that I had intended – that of showing how my vocabulary evolved over time – but I did give some sense of the topics about which I was thinking and the words I used to describe them. I thought about eliminating common words, but it seemed like it would be hard to make those decisions in a non-arbitrary manner. Click each image for a larger version in which the differences are more visibly apparent, and I recommend opening the below images in tabs and cycling through them with shortcut keys so that it’s easier to make quick comparisons.

Programming A to Z – Delvicious, Django and Assignment #8

project page here with overview and previous posts

Since I had already worked with XML and web services for my midterm project, I decided to take this week’s assignment for Programming A to Z as an opportunity to continue work on that project. I thought that a good next step would be to use App Engine’s Datastore to store the necessary information about all of a user’s bookmarks, and I would again access that data initially as an XML document obtained from the Delicious API.

I had already begun to learn the Python web framework Django for the Textonic project for Design for UNICEF, and it seemed like it would provide a rich set of useful tools for this project as well. The Datastore does not work, however, like the relational databases with which I am more familiar. Django relies by default on a relational database and is thus incompatible with Google’s Datastore out-of-the-box, but there are two software projects that aim to reconcile these differences. Google’s App Engine Helper (tutorial, code) seemed less well- and less actively-developed than app-engine-patch (tutorial, code), so I decided to go with the latter.

Django is quite powerful and gives you a lot of functionality for free, but when trying to branch out from various tutorials I encountered the somewhat strange challenge of figuring out exactly what it did for you and what it didn’t. It took me a quite a while to get the hang of working with Django on App Engine, so I didn’t have time to actually get the XML stored in a reasonable set of Models in the database. I have, however, gotten successful storage of Delicious logins to work, which is a good first step. The updated code is available on GitHub, and I should be able to make additional improvements soon.

Programming A to Z – Assignments #6 and #7

Since it’s getting to be that time in the semester when I feel that I should be focusing on final projects, I didn’t spend too much time on the context free grammar and Bayesian classificiation assignments. My writeups for both are below.

I’ve studied context free grammars in the past (ahh undergrad memories of Ling120 (syllabus here, but not the one I had)), so I have a good sense of how they work. I made a few quick modifications to the grammar file used by Adam’s ContextFilter class to handle certain conjunctions, adverbs/adverbial phrases, and prepositional phrases. I also made some modifications to support quotations, but they aren’t particularly refined – I couldn’t come up with a simple solution to the nesting problem in the below example that didn’t involve duplication of many of the other rules:

this smug dichotomy thought ” that smug walrus foregrounds this time that yelled ” the seagull that spins this important sea said ” that restaurant that sneezes has come ” ” ”

Thinking about ways to solve this problem highlighted how CFGs can get rather complex rather quickly. When, for example, do you want a period/comma at the end of the quotation? When is one period/comma sufficient for multiple quotations? How do you resolve those situations programmatically?

My test.grammar file is online here, and you can run it with Adam’s Java classes that are zipped here. I recommend you only test a few of my rules at a time and comment out the others – otherwise you might get sentences like this:

the important trombone yelled ” wow, the blue amoeba said ” oh, this blue thing interprets this seagull that said ” wow, the trombone sneezes ” ” but this amoeba said ” this suburb that quickly whines in that sea that daydreams by that corsage that quickly or habitually computes this dichotomy slowly yet habitually vocalizes and damn, that luxurious restaurant habitually prefers this seagull ” but that suburb interprets the time yet damn, that sea said ” that restaurant tediously slobbers ” yet wow, that seagull quickly yet slowly foregrounds the restaurant or the boiling hot time spins this bald restaurant of this trombone but that amoeba computes this smug restaurant but the seagull that quickly or slowly prefers this sea yelled ” oh, the thing that spins this restaurant tediously yet quietly whines ” or wow, this amoeba coughs through the important sea and oh, that time tediously yet slowly coughs yet oh, that trombone habitually computes that luxurious suburb or wow, that thing that said ” my, that amoeba that said ” my, that time tediously or slowly sneezes ” quietly yet tediously foregrounds that trombone that has come ” said ” my, the restaurant that habitually but quietly prefers the dichotomy that said ” damn, this boiling hot trombone quietly slobbers by the trombone that quietly coughs ” said ” oh, the amoeba habitually coughs ” ” yet damn, this seagull that spins the seagull that sneezes for the important trombone quietly coughs but my, this sea that slowly yet habitually has come foregrounds this dichotomy and that blue thing tediously slobbers “

 
 
For the assignment on Bayesian classification I combined Adam’s BayesClassifier.java with the previous Markov chain examples to use n-grams instead of words as the tokens for analysis. BayesNGramClassifier.java can be found here as a .txt file, and you can download all of the required files here. Note you might have to increase the amount of memory available to Java to run the analysis with higher values for n. Try something like java -Xmx1024m BayesNGramClassifier 10 shakespeare.txt twain.txt austen.txt if you're having trouble.

I compared sonnets.txt to shakespeare.txt, twain.txt and austen.txt as in the example using various values of n for the analysis. The data is below, with the word-level analysis first. Note that higher numbers (i.e. those closer to zero) indicate a greater degree of similarity.

n shakespeare.txt twain.txt austen.txt
word -59886.12916378722 -64716.741899235174 -66448.68994538345
1 -311.94997977326625 -348.2797252624347 -351.8612295074756
2 -6688.356420467105 -6824.843204592283 -7076.488251510615
3 -46332.8806624305 -47629.58502376338 -49557.906858505885
4 -155190.04376334642 -161815.95665896614 -167839.50470553883
5 -350322.9494161118 -369897.08857782563 -379600.90797560615
6 -581892.4161591302 -620871.7848829604 -629557.118086935
7 -798094.5896325088 -851043.4785550251 -856926.3903304675
8 -977428.4318098201 -1033391.2297240103 -1037851.0025613104
9 -1106125.9701775634 -1153251.0919529926 -1159479.8816597122
10 -1184654.361656962 -1218599.6808817217 -1227484.9929278728
11 -1221770.2880299168 -1242286.351341775 -1255024.535274092
12 -1228641.7908902294 -1238848.5031651617 -1254404.827728626
13 -1214247.043351669 -1217403.6233457213 -1235480.9184978919
14 -1187489.0276571538 -1186476.2556523846 -1205959.6178494398
15 -1153511.2780243065 -1150529.1594209142 -1170746.8132369826

When the n-grams become 14 characters long (which is very long, considering the average length of English words) the analysis finally starts to break down, and it no longer correctly classifies sonnets.txt as being most similar to shakespeare.txt. Some values of n certainly perform better than others, but I'd need to delve further into the mathematics of how these numbers are calculated in order to do more detailed analysis.