visualizing piero scaruffi’s music database

Figure: Scaruffi’s music database

Since the mid-1980s, Piero Scaruffi has written essays on countless topics, and published them all for free on the internet – which he helped develop. You can learn more about him (and pretty much anything else that might interest you) on his legendary website.
I was introduced to the site when a friend began referring to certain records as “Scaruffi 7s,” in reference to their ratings in Scaruffi’s music database. One of the oldest components of the site, the database contains entries on thousands of acts. My initial reaction was along the lines of “7 doesn’t sound that great.” But scales are relative and Scaruffi is a judicious critic. Getting a 7 puts your record in the 84th percentile of albums that have received ratings (thousands don’t even get a number).

Figure: The distribution of Scaruffi’s favor

In music writing, where homogeneity of style and opinion is typical, Scaruffi is a complete outlier. He reps for overlooked bands and fulminates against the orthodoxy. If top-40 is your thing you’ll likely be disappointed, but if you want to find out which Georgia Anne Muldrow album to start with, or about the interactive CD-ROM album Todd Rundgren made in 1993, well, Scaruffi is your guy. Having read the music database for some time, I decided I wanted to see what it looks like.

Figure: close-up of the graph. If you squint you can see band names.

I wrote a web scraper to download Scaruffi’s entries on over 5000 bands. Using the networkx Python package, I converted the entries into a directed graph: nodes represent bands and edges represent hyperlinks between their entries. Finally, I plugged the results into the Gephi visualization tool.
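To give a feel for the entries-to-graph step, here is a minimal sketch using only the Python standard library. It is not the code behind this post: the `pages` dictionary, the file names, and the `LinkCollector` and `build_edges` helpers are all made up for illustration, and a real scraper would also have to fetch the pages and normalise their URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_edges(pages):
    """Return the set of (source, target) links between known band pages."""
    edges = set()
    for source, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Keep only links that point at another scraped band page.
            if href in pages and href != source:
                edges.add((source, href))
    return edges


# Hypothetical stand-ins for scraped entries (not real data).
pages = {
    "band_a.html": '<p>Influenced by <a href="band_b.html">Band B</a>.</p>',
    "band_b.html": '<p>See <a href="https://example.com/">elsewhere</a>.</p>',
    "band_c.html": '<p><a href="band_b.html">B</a> and <a href="band_a.html">A</a>.</p>',
}
edges = build_edges(pages)
```

With networkx installed, one way to carry on from here would be to create an `nx.DiGraph()`, call `add_edges_from(edges)` on it, and export the result with `nx.write_gexf`, a format Gephi can open.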

Loosely speaking, we can interpret a node’s in-degree (the number of other nodes pointing to it) as the corresponding band’s level of influence. Why? Suppose Scaruffi ends up mentioning band x in his discussion of 20 other bands. The odds are that many of these mentions will discuss a creative debt to band x.
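As a rough illustration of the in-degree idea, the sketch below counts incoming links in a tiny made-up edge list. The node names are placeholders, not bands from the database, and the counting is done with the standard library rather than with whatever the original analysis used.

```python
from collections import Counter

# Hypothetical (source, target) link pairs; "x" plays the role of band x.
edges = {("a", "x"), ("b", "x"), ("c", "x"), ("x", "d"), ("a", "d")}

# A node's in-degree is the number of distinct nodes that link to it.
in_degree = Counter(target for _, target in edges)

# Nodes sorted from most to least linked-to.
ranking = in_degree.most_common()
```

Here `x` comes out on top with three incoming links. On a networkx `DiGraph`, the same numbers are available through `G.in_degree()`.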

Figure: “I definitely don’t think of myself as being an influence.” – Kate Bush

Admittedly this interpretation is coarse. Not all references include hyperlinks, and not all links represent an “influenced by” relation; other possibilities include shared members and side-projects. But empirically, it kind of pans out: the nodes of influential artists tend to have high in-degree.

Figure: in-degree rankings

Alternatively we could run PageRank, the original algorithm behind Google. PageRank measures a node’s influence not just in terms of its immediate neighbours, but throughout the entire network. So, for example, a large number of links from ’90s nu-metal bands probably won’t change your PageRank, but a single link from Bob Dylan helps a lot. The top five PageRank results were Dylan, the Beatles, ELO (see: a single link from Dylan helps a lot), Zappa, and Television.
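To make the "one good link beats many weak ones" effect concrete, here is a small power-iteration sketch of PageRank on a made-up graph. The damping factor of 0.85 is the commonly used default and an assumption here, as are the node names and the `pagerank` helper; the settings used for the results above are not stated.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Approximate PageRank by repeated redistribution of rank along links."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node splits its rank evenly among the nodes it links to.
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: five fans link to "hub", and "hub" links once to "lucky".
# Three obscure nodes that nobody links to all link to "crowded".
edges = [(f"fan{i}", "hub") for i in range(5)]
edges.append(("hub", "lucky"))
edges += [(f"obscure{i}", "crowded") for i in range(3)]

rank = pagerank(edges)
```

In this toy graph `lucky` has a single incoming link and `crowded` has three, yet `lucky` ends up with the higher score because its one link comes from a well-connected node. With networkx, `nx.pagerank(G, alpha=0.85)` computes the same kind of score on a `DiGraph`.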

Figure: Something about selling 30,000 copies

The two metrics have different flavours. The in-degree counts are more closely tied to Scaruffi’s interests and opinions (and if you don’t believe me read the first sentence of this). Additionally, they are robust against outliers: missing and erroneous links won’t have a drastic effect. By contrast, PageRank is less tied to Scaruffi’s opinions. It may deem a band influential even when he doesn’t particularly like them. On the other hand, it can be extremely sensitive to outliers.

The in-degree metric is ultimately more interesting. The database is singular not because it is big enough to facilitate network analysis—both Wikipedia and AllMusic are considerably larger—but because it was crafted by a single person, and reflects his views.

(This post was updated on 10-02-2016)

Published by Dave Fernig

data@shopify

5 replies on “visualizing piero scaruffi’s music database”

  1. Nice, I like the definition of influential that transpires here. Those five bands were influential in alternative bands and were seminal in creating new sounds, rather than influential in the amount of bands that imitated them, where bands like Beatles, Rolling Stones, Dylan or Madonna would lead.

    What library have you used for the scraper, Scrapy?

  2. From eyeballing that graph it looks like Scaruffi’s given ratings to more than 20,000 albums. If we assume the average album he’s rated is 40 minutes long this would mean he’s reviewed over 13,000 hours of music – in other words, a little over 4.5 years of listening to new music 8 hours a day, 365 days a year. Somehow, though, he still finds the time to work as a full-time software consultant, travel the globe and hike the Sierra Nevadas.

    tldr: I’m calling bullshit.

    1. Must be impossible for someone to devote their free time to something they enjoy and are passionate about. I’m sure we could catalog your life and find something that you’ve spent countless hours on
