I’m not gonna write you a love song
Sara Bareilles

Post 1 of 4

A question I’ve always wondered about is whether there is a magic formula for creating a number one song. Many songwriters are prolific, but their songs don’t necessarily achieve commercial success. It may be difficult to quantify how the musical composition relates to a song’s success. However, I am going to attempt to evaluate song lyrics to determine whether we can accurately predict if a song will be commercially successful.

I am using the Billboard Hot 100 to determine how well a song did commercially. Ranks are in ascending order: the song that sold the most copies and, where applicable, had the most streams is ranked first. The ranking is unique and doesn’t necessarily align with how other sources may have ranked the commercial success of a song in prior periods. The rankings are issued weekly.
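Because the chart is issued weekly and a lower number means a better result, one song can hold many ranks over time. Here is a minimal sketch of one way to reduce those weekly ranks to a single number per song. Using the best weekly rank is my assumption for illustration, not necessarily the measure used later in this series, and the rows, song names, and the `peak_ranks` helper are all hypothetical placeholders, not real chart data.

```python
# Hypothetical weekly chart rows: (week, song title, rank).
# Placeholder values for illustration only, not real Billboard data.
weekly_rows = [
    ("week 1", "Song A", 5),
    ("week 1", "Song B", 1),
    ("week 2", "Song A", 2),
    ("week 2", "Song B", 3),
]


def peak_ranks(rows):
    """Return each song's best (numerically lowest) weekly rank."""
    best = {}
    for _week, title, rank in rows:
        # Rank 1 is the top of the chart, so a smaller number is better.
        if title not in best or rank < best[title]:
            best[title] = rank
    return best


peaks = peak_ranks(weekly_rows)

# Sort so the song with the best peak rank comes first.
ordered = sorted(peaks.items(), key=lambda item: item[1])
print(ordered)
```

Sorting in ascending order puts the most successful song first, which matches the way the chart itself is read.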


The project has multiple steps. To help with readability, I am going to break the steps up into their own posts.


Getting the Data

  1. Get the Songs Made by the Artists
  2. Extract the List of Songs by the Artist
  3. Scrape the Lyrics for Each Song by the Artist
  4. Extract the Lyrics from Each File
  5. Scrape Rankings by Artists
  6. Parse the Song Rankings Files


Preparing the Data

  1. Clean the Rankings
  2. Match the Song Ranks


Model Prediction

  1. Song Summary
  2. Visualizations
  3. Prepare the Lyrics for Analysis
  4. Model Building