Posts in the coding category:

  1. Google Interview Part 1 03 Nov 2009
  2. Multiple window.onload functions in JavaScript 01 Jun 2009
  3. Get and Set 27 Apr 2009
  4. Lambda 22 Aug 2008
  5. Installing Pubcookie 3.3 on Ubuntu 7.10 with Apache 2 29 Dec 2007

So I had my first Google interview last week. I submitted an online resume and cover letter and was asked to come in for a 1.5-hour, 2-session interview on campus. (No, I’m no longer a student. I guess that doesn’t matter.) I got asked in about a month after submitting a resume.

Here’s a brief summary on how each of the interview sessions went. It’s a bit sloppy, but you can deal with that.

The First

Friendly software test guy. Not super chatty but nice.

Question:

Design an algorithm to visit every room in a building once.

This was really open-ended: there were no constraints on what the structure of the input might be. “You decide” was the response for most questions I asked about clarifications.

Since I knew I was going to have to code this up, I decided on using an adjacency matrix to represent the graph of rooms. This also turned out to be wise because your solution can be optimized very easily.

The answer is pretty easy. I’ll let you write the code, but here’s a pretty detailed description of the algorithm and data structure used:

Ryan's Room Algorithm
-----------------------
Let A be the square adjacency matrix of size n by n with
  A_i,j = A_j,i = 1 if room i is connected to room j and
  0 otherwise.  Represent this as n arrays of n boolean
  values:  boolean A[n][n]; If you have rooms A, B, C, D
  with a door between A->B->C->D, your matrix would look
  like this:
        [[ 1 1 0 0 ]
         [ 1 1 1 0 ]
         [ 0 1 1 1 ]
         [ 0 0 1 1 ]]
Let V be an empty collection of rooms (can use size-n
  array or a hash table or ....).
Function Do(Room R):
    For each entry E in A[R]
    if E is not in V, go from R to E and put E in V
    Perform Do(E)
    Go back from E to R
You will end up back in R and will have visited every
  reachable room.

This is, of course, O(n^2) space, and it turns out to be O(n^2) time complexity as well. You can do much better by using a simple Java class:

class Room {
    int name;
    ... other data
    Collection<Room> children;
}

You can use a HashMap<Room,Boolean> for your V in the above algorithm, and your solution ends up being O(n) worst-case in both time and space. I was asked specifically about the optimizations that might be made if there were lots of other data about the rooms and if there were millions of rooms. Specifically, you might want to leave the Room Java class pretty small and make it have a pointer to a RoomData object for the other data about the room. Make this accessible via a method like getRoomData() or something so it can be made to do lazy initialization from the disk if necessary. Note that V isn’t really something you need to worry about.

Then I was asked how I would test my solution. I drew some graph structures that might cause problems including an empty graph, a disconnected graph (sets of rooms that aren’t connected to any other sets of rooms):

+---+    +---+       +---+    +---+
| A | -> | B |       | C | -> | D |
+---+    +---+       +---+    +---+

and cycles (A->B->C->A). The interviewer was very happy with my answers.

Important that you should think of other implementations: this might have been implemented using a Stack<Room> or something of that sort, so giving a test case where that sort of thing might end in an infinite loop is wise.

The Second

Really nice software engineer on the search team.

Question:

Given a List<List<T>>, write an implementation for an Iterator<T>.

This is a “list of lists of items of type T”, and you’re asked to write an Iterator that returns the next T. (I didn’t remember everything the Iterator interface had, but the important bits are the next() and hasNext() methods.)

This is actually surprisingly easy if you use anonymous classes. This is a good way to show off your question-asking ability, though like what the structure of the data might look like (are there allowed to be null values for the “inner” list:

List<T> |  List<T> |  null | List<T> | List<T>

and what is the expected behavior of it, etc. Say “as long as it’s documented you can do whatever you want.” Developers love that kind of thing. PMs and users hate it, but you’re being jovial with a coder here, right?

Using an anonymous inner class was deemed to be a good solution by my interviewer who hadn’t thought of doing something so clever. I decided to declare that there could be no inner null lists. Each element of the list of lists had to be a valid list of items. This makes the coding a lot easier.

Here’s a very basic outline of my solution:

class ListOfList<T> implements Iterable {
    
    private List<List<T>> data;
    
    public Iterator<T> iterator() {
        ListOfList<T> self = this;
        
        return new Iterator<T>() {
            Iterator<List<T>> outer = 
                self.data.iterator();
            Iterator<T> inner = outer.next();
            
            public boolean hasNext() {
                if ( ! outer.hasNext() )
                    outer = outer.next().iterator();
                return hasNext();
            }
            
            public T next() {
                if ( !hasNext() )
                    return null;
                return inner.next();
            }
            
        };
    }   
}

I noted to the interviewer the “clever” use of recursion and how the assumptions work to my advantage. Also note that the assumption isn’t particularly reasonable (maybe) and that the code isn’t so clear at first, so there should be lots of documentation. Also note the slightly ugly usage of the self variable. Note why this is necessary (because the this keyword is scoped to the inner class) and discuss closures.

I was then asked about how to test it and what the complexities are. Testing is pretty easy; complexity is constant time (again using the assumptions).

I then asked the guy a bunch of questions about working for Google, and we chatted it up for about 20 minutes past when the interview was supposed to be finished. This was mostly about some interesting projects at Google but was also about my past projects and interests as well. It went long because the interviewer seemed to be legitimately enjoying the conversation.

In all, I think I’ll get a callback! The process was very “fair”: no “how would you move Mount Fuji” or weighing pills on balances questions; the questions were practical, open-ended and allowed me to show my stuff pretty well.

Writing in JavaScript often requires triggering code to execute after the page has finished loading (that is, when the Document Object Model DOM is ready).

Say you want to set the contents of the document’s <title> tag to the contents of the first <h1> tag. Here is the na”ive solution:

<html><head><title></title>
  <script type="text/javascript" charset="utf-8">
  
    document.title =
        document.getElementsByTagName('h1')[0].innerHTML;
  
  </script>
</head><body><h1>Hello!</h1></body></html>

The problem, of course, is that this is executed while still in the <head> section: before the <body> has even been seen. Thus, when we ask for document.getElementsByTagName('h1'), we will get null back, and this will cause an error.

The solution is to wait until after the document has loaded:

<html><head><title></title>
  <script type="text/javascript" charset="utf-8">
  
    window.onload = function() {
        document.title =
            document.getElementsByTagName('h1')[0].innerHTML;
    };
    
  </script>
</head><body><h1>Hello!</h1></body></html>

This sets up a “callback” that the browser knows to execute as soon as it’s loaded the whole Document.

The problem, though, is that we might want many such functions to execute. Lots of (or at least more than one) window.onload. And if we overwrite the contents of document.onload as we’ve done here, then we’ve shot the rest of our code in the foot and thrown it out to the curb to die.

The solution is to create a stack of functions to execute and simply add them one by one. Then set window.onload to pop off all the functions on the stack and execute them using the call() method for functions.

jQuery does this automatically. So we could simply have this in the <head> section:

<script type="text/javascript" charset="utf-8">

    $(function(){
        document.title = $("h1:first").html();
    });

</script>

…but if we didn’t have jQuery at our disposal, we could do this manually:

<html><head><title></title>
  <script type="text/javascript" charset="utf-8">
    
    // Prevent overwriting loadstack if it's already set:
    var loadstack = loadstack || [];
    
    // Add our function that changes the <title>
    // to the contents of the first <h1>:
    loadstack.push(function(){
        document.title =
            document.getElementsByTagName('h1')[0].innerHTML;
    });
    
    // ... lots of other additions to the loadstack ...

    // this *must* be the last code in the <head>
    // section's JS block (or at least the very last
    // thing to touch window.onload):
    
    if ( window.onload )
        loadstack.push(window.onload);
    
    window.onload = function()
    {
        while ( loadstack.length > 0 )
            // `this` will be bound to the window object
            // inside the called function
            loadstack.pop().call(this);
    };
  </script>
</head><body><h1>Hello!</h1></body></html>

Adding code to execute is simple as this:

loadstack.push(function(win){
    // code to execute here.
    // (win is the window object if you need it)
});

All said, I usually just use jQuery’s built-in handling of this, but it’s nice to know that it’s pretty easy to to it all ourselves if we had to.

Get and Set

27 Apr 2009

Although I work with Java on a regular basis at work, I’ve long been scared away from Java for personal projects based solely on its verbosity.

Rather than having object members (“properties”) be declared public, Java tends to have everything declared private and then accessed for reading/writing via “getters” and “setters.” So if a Document object has a title property, the Document class would likely have the methods String getTitle() and void setTitle(String t).

What is simpler, however, is to overload a method having the same name as the property. When called with no arguments, the method serves as the getter, and when called with a single argument, the method serves as a setter - setting the value to the indicated argument.

This is less verbose and more in-line with what happen “nowadays” with the more scripting-inclined languages like Ruby and PHP.

The old way:

Document d = new Document();
String oldTitle = d.getTitle();
d.setTitle("New Title");

The new way:

Document d = new Document();
String oldTitle = d.title();
d.title("New Title");

And here is a sample POJO “Plain Old Java Object” that utilizes this:

class MyDocument
{

    private String title;

    public MyDocument()
    {
    }

    public String title()
    {
        return title;
    }

    public void title(String t)
    {
        this.title = t;
    }

}

Just my $0.02.

Lambda

22 Aug 2008

PHP has been in dire need of some new features for some time. PHP 5.3 plans to add a slew of new features, finally adding lambdas a/k/a anonymous functions to the language.

That is… …once it gets out of alpha. Then out of beta. Then onto production systems.

(PHP haters can now stop sipping quite so much hater-ade in that PHP now also supports namespaces, but I don’t find that new feature particularly interesting.)

What makes lambdas so interesting is how much simpler and cleaner they can make your code.

Consider the problem of wanting to sort a list where some of the items have the words ‘The’ or ‘A’ as a prefix. We want to sort the list

  • Beck
  • The Starting Line
  • Blue October
  • The All American Rejects

to be

  • The All American Rejects
  • Beck
  • Blue October
  • The Starting Line

and not

  • Beck
  • Blue October
  • The All American Rejects
  • The Starting Line

Without the use of lambdas, we have to do something like this in old PHP:

function stem_the($a)
{
    return preg_replace( 
    '/^(a|the) *(.*)$/i', '$2', $a 
    );
}

function cmp_the($a,$b)
{
    $a = stem_the($a);
    $b = stem_the($b);
    return strcmp($a,$b);
}

$bands = array( 'The All American Rejects',
        'Beck', 'Blue October',
        'The Starting Line' );

usort( $bands, 'cmp_the' );

usort()’s second argument is a string. This is really ugly. Old PHP didn’t have functions as first-class citizens, so that was the only way we could do something like this. It’s ugly and slow.

Moreover, we’re forced to create two functions stem_the() and cmp_the() (my names are always this creative). We’ve cluttered the global function namespace for our very trivial task.

Enter stage left, Dr. Martin Luther Lambda. His dream is promoting functions to first-class citizens…

With lambdas, we can rewrite that entire mess to be the clean, simple, elegant mess that follows:

$bands = array( 'The All American Rejects',
        'Beck', 'Blue October',
        'The Starting Line' );

usort( $bands, function($a,$b)
{
    list($a,$b) = array_map(function($s) {
    return preg_replace('/^(a|the) *(.*)$/i','$2', $s);
    }, array($a,$b));
    return strcmp($a,$b);
});

Notice that this code doesn’t create any new functions in the global namespace. It’s slightly harder to read initially (mostly because of the gratuitous use of the list() language construct), but it is vastly cleaner and, in a word elegant.

(Aside: calling super geeky things elegant is not even close to pass’e yet.)

We can also now perform the operation of currying-partially instantiating a function with multiple parameters. Let’s say we want to apply a function to each item in a list that adds 7 to that item. PHP has a built-in array_map() function, used above, that allows us to apply a function to every item in an array, but its argument is again a string, and we still have to create a global-namespace function to accomplish this.

Let’s just say we have a function add() such as

function add($x,$y) { return $x + $y; }

and we want to create a function add7($x) that returns $x + 7. We could simply create it:

function add7($x){ return $x + 7; }

Now what we want to do is partially instantiate our add() function to have a sort of ‘hard-coded’ first argument 7. In this way, we would define add7() to be

    function add7($x){ return add(7,$x); }

We just had to create another function in the global namespace: bloating our memory footprint and creating an obtrusive disaster of our codebase to accomplish a trivial task.

With currying, we could simply do this:

$add  = function()( $x, $y ) { return $x + $y; };
$add7 = curry($add,7);

The punchline is that curry() returns a function. It’s a robot that gives birth to new robots. Elegant?

So we can increment every item in an array by 7 very simply:

$nums = array( 1, 2, 3, 4 );
$add  = function()( $x, $y ) { return $x + $y; };
$nums = array_map( curry($add,7), $num );

(This example is very contrived, of course, since there was no reason to use currying here-we could have simply had

$nums = array( 1, 2, 3, 4 );
$nums = array_map( function($x){ return $x + 7; }, $num );

but cut me some slack.)

Unfortunately curry() is not yet part of the PHP standard library. It’s also not inherently obvious how to implement it-the necessary language constructs were only introduced a mere three weeks ago today.

Fortunately, I have implemented curry() for you. It’s a bit ugly, but it works. It’s actually fairly fast as well (although my initial tests of it were not exhaustive).

function curry($fn, $arg)
{
    return function() use ($fn, $arg)
    {
        $args = func_get_args();
        array_unshift( $args, $arg );
        return call_user_func_array( $fn, $args );
    };
}

It’s cool because you can chain it to itself. So if you wanted a function that always returned 7, you could simply say:

$return7 = curry(curry($add,3),4);

Or if you had a function that took three arguments and you wanted a new function with the first two ‘hard-coded’ (imagine g(x) = f(1,3,x) from math), you could simply “curry f()” twice:

$g = curry(curry($f,3),1);

A better implementation would be for curry() to take a variable number of arguments and to curry all of them in:

$g = curry($f,3,1); // or `curry($f,1,3)` maybe?

I’ll leave this as ‘an exercise for the reader’, but it’s not too difficult to implement.

Awesome, n’est-ce pas?

You, too, can have some PHP 5.3 alpha goodness by running the following nonsense on your command line:

mkdir -p "$HOME/src/php53alpha/build"
cd "$HOME/src/php53alpha"
wget http://downloads.php.net/johannes/php-5.3.0alpha1.tar.gz
tar zxf php-5.3.0alpha1.tar.gz
cd php-5.3.0alpha1
./configure --prefix="$PWD/../build"
make
make install

This installs the php binary in ~/src/php53alpha/build/bin, so you can run your .php scripts by simply calling, e.g.,

~/src/php53alpha/build/bin/php my_script_file.php

Also read:

Enjoy.

I recently had the wonderful experience of configuring UW’s Pubcookie for use on Ubuntu’s package-manager version of Apache2. I took modest notes while doing this, and it was spread out over several weeks whenever I could find the time, so maybe don’t take this as a guide so much as a few installation notes, but these are the spots that gave me the most trouble.

This accompanies the Pubcookie Apache Module installation guide at http://pubcookie.org/docs/install-mod_pubcookie-3.3.html and the UW-specific guide notes at http://www.washington.edu/computing/pubcookie/uwash-install.html.

(Of course following the convention that $ are commands run as a limited user while # commands are run as root.)

  1. Ensure Apache2 is installed and configured with OpenSSL:

    # apt-get install apache2-threaded-dev apache2 openssl
    # mkdir -p /etc/apache2/ssl
  2. Obtain Weblogin Server Registration for your hostname at https://server-reg.cac.washington.edu/pubcookie/

  3. Get the UW Root Cert from http://certs.cac.washington.edu/?req=svpem

    I put this file at /etc/apache2/ssl/server.pem. This is the server’s public key.

  4. Get the CA Bundle from http://www.washington.edu/computing/pubcookie/ca-bundle.crt

    I put this file at /etc/apache2/ssl/ca-bundle.crt

    This file allows the server to verify peers’ certificates and is used by keyclient.

  5. Generate your cert’s private key and have it signed by your CA.

    Information on how to generate a private key and a signature signing request are probably documented on whatever site is signing your certificate. The UW CA’s Technical Information can be found at https://www.washington.edu/computing/ca/infra/.

    Generating a request for the UW CA (and probably all other CAs as well) is simply a matter of:

    # cd /etc/apache2/ssl
    # openssl req -nodes -newkey 1024 \
        -keyout key.pem -out req.pem

    When I went through the request process, the CA gave me the following values to fill in to the request UI:

    Country (C)         US
    State (ST)          WA or Washington
    Organization (O)    Optional
    Organizational Unit (OU)    Optional
    Common Name (CN)    Your host's fully qualified domain name
  6. Put your private key at /etc/apache2/ssl/key.pem and your CA-signed certificate at /etc/apache2/ssl/cert.pem.

    (Note that the above step should generate the key and the request at these locations already.)

  7. Working in $HOME, get the Pubcookie tarball and unzip:

    $ mkdir -p $HOME/pubcookie
    $ wget http://www.pubcookie.org/downloads/pubcookie-3.3.3.tar.gz
    $ tar xzf pubcookie-3.3.3.tar.gz
  8. Modify the configure script to know where apache’s PREFIX is. This problem seems to come from the fact that Apache isn’t built from source locally when using aptitude.

    The diff for this modification is

    3783c3783
     <   APACHE_PREFIX=`$APXS -q PREFIX`
     ---
     >   APACHE_PREFIX="/usr/share/apache2" #`$APXS -q PREFIX`

    This via a message from the Pubcookie mailing list.

  9. Configure, compile install:

    $ cd $HOME/pubcookie/pubcookie-3.3.3/
    $ ./configure   \
        --enable-apache  \
        --prefix=/usr/local/pubcookie  \
        --with-apxs=/usr/bin/apxs2
    $ make
    $ sudo make install
  10. Based on information from the installation guide, the following serves as a good checkpoint:

    $ ls -F /usr/local/pubcookie
    keyclient*      keys/
  11. Here is my keyclient configuration file, /usr/local/pubcookie/config

    # ssl config
    ssl_key_file: /etc/apache2/ssl/key.pem
    ssl_cert_file: /etc/apache2/ssl/cert.pem
    
    # keyclient-specific config
    keymgt_uri: https://weblogin.washington.edu:2222
    ssl_ca_file: /etc/apache2/ssl/ca-bundle.crt
  12. Run keyclient to request a new key and to download the “granting” certificate:

    # cd /user/local/pubcookie
    # ./keyclient
    # ./keyclient -G keys/pubcookie_granting.cert
  13. Create a pubcookie load file so we can continue to use Ubuntu’s methodology for managing Apache extensions (e.g. using a2enmod and a2dismod, which really only create/modify symlinks in /etc/apache2/mods-enabled but are sometimes reportedly used by other installation scripts):

    # echo 'LoadModule pubcookie_module /usr/lib/apache2/modules/mod_pubcookie.so' \
    > /etc/apache2/mods-available/pubcookie.load
  14. Stop Apache and load the pubcookie module:

    # apache2ctl stop
    # a2enmod pubcookie
  15. Set Pubcookie directives in /etc/apache2/httpd.conf:

    PubcookieGrantingCertFile /usr/local/pubcookie/keys/pubcookie_granting.cert
    PubcookieSessionKeyFile /etc/apache2/ssl/key.pem
    PubcookieSessionCertFile /etc/apache2/ssl/cert.pem
    PubcookieLogin https://weblogin.washington.edu/
    PubcookieLoginMethod POST
    PubcookieDomain .washington.edu
    PubcookieKeyDir /usr/local/pubcookie/keys/
    PubCookieAuthTypeNames UWNetID null SecurID

    Note that Ubuntu’s Apache likes to have lots of configuration files. The main configuration happens in /etc/apache2/apache2.conf while “user” modifications appen in the httpd.conf file as per above. You will also need to have Apache listen for SSL requests on port 443 by modifying /etc/apache2/ports.conf to include

    Listen *:443
  16. Enable SSL on your default site. This can usually be done by modifying /etc/apache2/sites-available/default to include

    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/cert.pem
    SSLCertificateKeyFile /etc/apache2/ssl/key.pem

    You might be able to get away with only enabling SSL on select virtual hosts if your environment is such that you have multiple host names pointing to the same Apache instance. I was able to accomplish this to some degree but am still working out a few ambiguities that Apache isn’t telling me about.

  17. Restart Apache and you’re good to go:

    # apache2ctl -k start

You now have the usual .htaccess directives at your disposal. E.g.:

AuthType UWNetID
Require valid-user

Have some sorbet?