The uniqueness of Caroline and baby name popularity


The Social Security Administration's baby name dataset is easy to access, easy to parse, and small enough to analyse on a laptop. Perhaps for this reason, and because names are so personal, it has been analysed many times. The best known is the Baby Names Voyager. Other very interesting ones are Hilary Parker's analysis of the most poisoned name in history and Nathan Yau's analysis of the most unisex names in history.

As is clear in the above analyses, names exhibit both chronological and geographical patterns, changing in popularity depending on when and where you live. Ethel is not a popular name today, especially relative to the popularity it once had during the early 1900s. Todd briefly spiked in the early 1970s. A standard and relatively intuitive way of representing and investigating these patterns is a heatmap.

I downloaded the name and birth year data from the Social Security Administration, parsed it by year and by state, clustered it (Euclidean distance), and mapped it, all in R. Note that all name differences are scaled to the total name popularity. The clustering and mapping were done with the heatmap.2 function from the gplots (not ggplot2) library.
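The scaling and distance steps can be sketched outside R as well. The following is a minimal Python illustration, not the original R code. It assumes that "scaled to the total name popularity" means dividing each name's yearly counts by that name's overall total, and the counts are invented toy numbers, not SSA data.

```python
import math

# Toy counts per year for three names (invented for illustration, not SSA data).
counts = {
    "Ethel": [900, 500, 100, 0],
    "Todd": [0, 100, 800, 100],
    "Joseph": [250, 250, 250, 250],
}


def scale_by_total(row):
    """Divide each count by the row total so every name's profile sums to 1.

    Assumption: this is one plausible reading of the scaling described above.
    """
    total = sum(row)
    return [c / total for c in row] if total else [0.0] * len(row)


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


scaled = {name: scale_by_total(row) for name, row in counts.items()}

# Pairwise distances between scaled profiles, one entry per unordered pair.
distances = {
    (a, b): euclidean(scaled[a], scaled[b])
    for a in scaled
    for b in scaled
    if a < b
}
```

With this scaling, a name as common as James and a name as rare as Sally can be compared by the shape of their popularity over time rather than by raw counts. A hierarchical clustering step would then work from distances like these.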


First, the chronological patterns. This includes only names that appear more than 200K times between 1901 and 2013, from James (5,001,709) to Sally (200,461). Are there any surprising patterns? I wouldn’t say so. There are some clear classic names – those that maintain their relative level of popularity year after year – Katherine, Kathryn, and Catherine among them (as indicated by the steady green colour across years). Samuel, Victor, Vincent, and Joseph also look to make the cut. Then there are the standard flashes in the pan – Madison, which has already peaked, Shirley, and others. Full heatmap pdf, larger pdf here (all names appearing more than 50K times). Names that do not appear at all in one year are coloured white.

Are there any notable geographical patterns? [Here I include all names that appear more than 50K times between 2001 and 2015. In addition, both the rows and the columns have been reordered to cluster similar names and states.]


Most notably, Caroline appears to be unique – no other name has the same pattern of popularity, as shown by the length of the branch leading to Caroline. Other surprises? The state most similar to California, by name popularity, is Texas. Part of this is driven by the large Hispanic populations present in these states – Jesus, Diego, Luis, and Jose are considerably more popular than in most states. But other factors are also in play – Connor, Lily, and Amelia are distinctly unpopular, while Adrian and Andrea are more popular than you might expect.
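One rough way to put a number on this kind of uniqueness is the distance from each name to its nearest neighbour: a name whose closest match is still far away sits on a long branch of its own. The Python sketch below illustrates the idea with invented per-state shares and placeholder names. It is only a proxy, since the actual branch length in a dendrogram also depends on the linkage method used by the clustering.

```python
import math

# Invented per-state popularity shares for four placeholder names (not SSA data).
profiles = {
    "Name A": [0.30, 0.30, 0.20, 0.20],
    "Name B": [0.28, 0.32, 0.20, 0.20],
    "Name C": [0.25, 0.25, 0.25, 0.25],
    "Name D": [0.70, 0.10, 0.10, 0.10],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(name, profiles):
    """Return (closest other name, distance to it) for the given name."""
    dist, other = min(
        (euclidean(profiles[name], vec), other)
        for other, vec in profiles.items()
        if other != name
    )
    return other, dist


# Isolation score: how far away each name's closest match is.
isolation = {name: nearest_neighbour(name, profiles)[1] for name in profiles}
most_isolated = max(isolation, key=isolation.get)
```

The same nearest-neighbour lookup, applied to states instead of names, is one way to ask which state is most similar to another.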

Other, perhaps surprising, groups of states cluster by relative name popularity. The outdoor states – Washington, Oregon, and Colorado – group together, but along with Kansas, Nebraska, and Idaho. Vermont, New Hampshire, and Maine group together, but they are considerably different from the other New England states, which form a Northeast coastal cohort. Washington DC and West Virginia stand out as being different (again, in terms of relative name popularity) from any other state.

Meanwhile, as you probably already expected, the Hayden-Landon-Colton-Peyton-Brooklyn-Addison-Brayden-Carson coterie is not (was never?) popular in the coastal states, but is wildly popular in the rest of the country.

Look for yourself here. Analysis for the years 1980 to 1999 here, with Caroline as unique as ever.

