Beat the Puzzle Master Home
Beat the Puzzle Master!
A web site dedicated to solving the NPR Sunday Puzzle

Puzzle Sept 7, 2008

Podcast of the original challenge

The End Justifies the Clue

Didn't get enough anagrams Sunday? Try these
Challenge: You are given clues that end in a six-letter word. Rearrange the letters in the last word to get the answer to the clue. For example, given "Jewels a pirate buries," the answer would be "rubies."

Clue:  
What's the Anagram?

 

From Merle Regal:

Kris Kristofferson's last name starts with his four-letter first name. Can you name a famous American whose last name ends with his four-letter first name? Hint: The last name has seven letters.

How to Solve the Puzzle With Code

This puzzle is quite easy if you can just get a list of famous Americans! Unfortunately, I was not able to find such a list per se. The closes thing is Wikipedia, which maintains a number of such lists. Like all pages in Wikipedia, the names we want are closely mixed-in with HTML markup, so extracting the names was quite the challenge.

This puzzle can be solved in two stages:

  • Get the list of names (hard!)
  • Looping through the list to find the person in question (easy!)
Once you've got the list, all you have to do is
  1. Loop through the list, examining every entry
  2. Ignore every entry with firstName.Length ≠ 4
  3. Ignore all with lastName.Length ≠ 7
  4. Spit-out any you find where the trailing characters of the last name match the first name
Most languages have as string function something like 'EndsWith', so, if you're programming with, for example, C#, the code inside your loop would look like this:
    if (fName.Length == 4 && lName.Length == 7) {
        if (lName.EndsWith(fName)) {
            Response.Write(fName + " " + lName);
        }
    }

This code fragment assumes that you have variables named 'fName' and 'lName' representing the first name and last name, respectively.

Get the Names!

Getting the names is a lot harder. The basic idea is to 1) write your own spider or robot program to read all the pages in question, 2) somehow extract just the names out of all the page markup. The list I built has over 10,000 entries, so editing it manually is out of the question, and if you've ever examined any page mark-up, you'll immediately realize that the names make up only a small portion of a typical HTML page.

Try right-clicking on this page and choosing 'view page source' to get an idea of the near-gibberish appearance of a typical HTML page. Or go to this page Wikipedia Famous Americans Page and right-click on their page! Either should quickly convince you to give up any idea of creating the list manually.

So, getting our names list boils down to this:
  1. Writing a program that can fetch the page content from Wikipedia, and
  2. Figuring out how to extract just the names from the page markup.

Since writing a spider to read other web pages is so common while solving puzzles, I dedicated a separate page to it: How to Write a Spider

Wikipedia has a 'master page' of famous Americans, with links to other pages. For example, it has links to all the states. For my purposes, it seemed likely that I could get an exhaustive list by visiting every state page and grabbing the people's names there

I needed two regular expressions to get the list:
  1. One to extract the links to the state lists
  2. Another to extract the people's names off the state pages

Since regular expressions are used for almost every puzzle, I don't explain them every time. Read my introductory article on regular expressions to learn how to use them to 'scrape a screen'.

The Wikipedia page in question has 50 links that look like this:
<a href="/wiki/List_of_people_from_Alabama" 
   title="List of people from Alabama">Alabama</a>
Here is the regular expression I used to extract these links.
<a                                    #Beginning of the anchor
\s                                    #Space
href="                                #Beginning of the address
(/wiki/List_of_people_from_[^"]*)     #The URL we seek
"                                     #The close quote

I'm sure you remember that # is the comment designator for regular expressions, and that it is easy to turn text above into a regular expression using the RegexOptions.IgnorePatternWhitespace option.

Once I had the links, it was fairly easy to use my spider/robot to visit all the pages in turn and read their contents. Now it's a matter of matching the names on each page with another regular expression and saving them to a file.

Here's a link to a typical page on Wikipedia listing famous Americans. If you look at the page source, you will quickly observe that each person listed looks like this:

<li><a href="/wiki/Hank_Aaron" title="Hank Aaron">Hank Aaron</a>,

Once again, we use a regular expression to match the link (refer to my regex article on getting all the matches for a regular expression and using them in code). Basically, we need a regular expression that has literal text for everything up to the title tag, and we place it inside parentheses to make it a match group.

Here's the final regex, with embedded comments:
((<li>)|(<td>))            #Starts with either a td or else a list item
(<b>){0,1}                 #2 pages inexplicably have bold tags; 48 do not
<a                         #Beginning of a list item and link
\s+                        #White space
href="/wiki/[A-Za-z_]+"    #Url of the reference with variable letter
\s+                        #White space
title="[a-zA-Z\s]+">       #The title
([a-zA-Z\s]+)              #The text of the anchor, our group
</a>                       #close the tag

The regex became complicated because some pages do not follow the same pattern. As a result, the regex has optional clauses for bold tags and table cell tags (<td>). In order to make this markup optional in the regex, a the notation {0,1} is used to indicate the markup will appear either 0 times, or else 1 time. We use parentheses around the markup that might occur 1 or 0 times.

At this point, all we have to do is plug-in our regular expression, extract the matches, and write them to a file. Refer to my article on simple file IO for a quick introduction to reading/writing to files in .NET.

Download the code! (C#)

Solve the Problem in Code!

By clicking the button, you will run a small javascript program in your browser to solve the problem. It needs to fetch the list from the server, and then it should solve the problem in a few seconds. If you're curious, inspect the page source code! (Search for "function famousAmericansFetched".)










Reader Comments

Be the first to comment!

Add a Comment Show comment controls
Comment Details
Your Comment:

Name:
Password:
Your Name*:
Password*:
Confirm*:
Email*: (will not be published!)
  Your Gravatar image
Website?
About Me: