How to Solve Will's Puzzle With Code
Name a famous author whose last name starts with the letter C. Cross out four letters in it. The remaining letters, in order,
will name another famous author also starting with C. Who are these two writers?
As it turns out, I was able to identify eleven author pairs that
match each other in this way! And it was pretty straightforward too.
Algorithm Summary
-
Build a dictionary of authors, using their last name as a key
-
A dictionary (a type of hash table in .NET) is quite handy because it is very efficient at performing lookups, and
also because we can use it to store additional information, in this case the author's full name.
-
Look at each author (iterate the dictionary) and construct candidate variatons according to Will's instructions.
-
Check whether any candidate is in our authors dictionary, if so, we have a solution!
The only tricky part here is constructing the last name variations. I looked at a number of
possible techniques,
including recursion, before I realized I could
simply use four nested loops. Each loop controls the position of a character that must
be removed (blocked). Every inner loop starts just after its parent's value,
and runs as far as it can before bumping into its child loop (or the end of the string for
the innermost loop).
Nested Loops Control Your Blocking Positions
The illustration at left shows the first nine variations for the name 'Campbell'.
p1..p4 are the positions that are blocked when constructing a new variation. Observe
how p4 starts one position higher than it's containing loop (the p3 loop) and goes to the end. Meanwhile, p3 starts
one higher than p2. In the picture, p1 and p2 haven't
had a chance to move yet.
In case it isn't obvious, the unblocked letters are what we use to form our new variation.
Interpretation:
-
letter c is blocked every time above, because p1 hasn't moved yet
-
letter a is blocked each time in the picture, because p2 hasn't had a chance to move
-
p3 blocks the letter 'm' the first 5 iterations, then blocks the letter 'p' after p4 has
finished its first pass
-
p4 is blocking a variety of letters because it is moving in the innermost loop
A code sample follows which is probably easier to read than my explanation!
//p1..p4 are positions to block in the current variation of our last name
//Each is controlled by a nested loop so all possibilities are iterated
for (int p1 = 0; p1 < lastName.Length - 3; p1++) {
for (int p2 = p1 + 1; p2 < lastName.Length - 2; p2++) {
for (int p3 = p2 + 1; p3 < lastName.Length - 1; p3++) {
for (int p4 = p3 + 1; p4 < lastName.Length; p4++) {
//Use p1..p4 to construct a variation:
CopyBlocked(p1, p2, p3, p4, lastName, variations);
//look in the dictionary:
string candidate = new string(variations);
if (authorDic.ContainsKey(candidate)) {
//Score!
lstResults.Items.Add(lastName + " - "
+ authorDic[candidate]);
}
}
}
}
}
Note that CopyBlocked is a method I wrote (not shown above)
which utilizes p1..p4 to create a new variation. Meanwhile, lstResults is the name of
a listbox that displays the results.
Note that this code will generate duplicates if the last name contains duplicated letters.
The impact on efficiency is negligible, so I leave it as an exercise for the reader to construct
a method to suppress the duplicates in the output.
Algorithm Analysis (for Advanced Readers)
In terms of algorithm analysis, the main loop is O(n). (Take a look at the Wikipedia
article on "Big O Notation"
if you have no idea what I'm talking about here.) That's pretty speedy,
though I really can't take much credit for it, since I'd almost have to be a
dunce to make it slower! It makes exactly one pass through the data set,
and since I used a dictionary to look-up author names (which runs in O(1) time), no significant
cost is incurred for the lookup.
Calculating the variations on an author's name is somewhat interesting. Strictly
speaking, it runs in O((n-4)4)
time, where n is considered to be the name length. 'Interesting'
because the length of the names is short (relative to the length of the list
they are in), making it
debatable whether this additional processing is worthy of
consideration. At any rate, since the author data set is small, the overall question of
algorithm speed is academic here.
Getting the Author List
Once again, Wikipedia provides the best place to get lists. As a matter of fact, they have
a page dedicated to authors whose name begins with the letter c: http://en.wikipedia.org/wiki/List_of_authors_by_name:_C
As usual, I built the data set using my standard techniques of:
-
Writing a
spider
to scrape the Wikipedia screen
-
Matching author names via a
regular expression
to extract them from the page markup, and
-
Employing some simple
File IO
to write the regular expression matches to a file we can use.
If you're curious about the regular expression I utilized,
download my C# code and take a look.
Solve the Problem Now!
Name a famous author whose last name starts with the letter C. Cross out four letters in it.
The remaining letters, in order, will name another famous author also starting with C.
Who are these two writers?
By clicking the button, you will run a small javascript program in your browser to
solve the problem. It needs to fetch the author list from the server, and then it should
solve the problem in a few seconds. If you're curious, inspect the page source code!
(Search for "function dataFetched".)
Note that my JavaScript implementation is different than my C# implementation, mainly
because JavaScript doesn't have a Dictionary class, and I didn't feel like implementing
my own. So I used an array with a binary search instead.
That works fine except for the existence of duplicated last names; note that I am using
last name as a search value for my binary search. The upshot:
my C# solution identifies 11 pairs, and my JavaScript only identifies 7 pairs. The difference
is wholly due to the problem of duplicated last names.