A boring look into my Spotify playlist

Let's jam out

Sit tight and listen to music

For the past few weeks, I’ve taken breaks from staring blankly into the middle distance to dip deeper into my playlists than I have in years.

I’ve only been using Spotify since last year; before that I mostly used Apple Music. I like Apple Music’s feature of adding songs to a library, and out of inertia I still use the library to save songs that I like. After moving my Apple Music library to Spotify, I nerded out to see how my playlist looks by Spotify’s standards. All code can be found here.

Preparation

To get the data I used the Spotify API, with spotipy as the Python client. After creating a web application in the Spotify API Dashboard and gathering the credentials, I was able to initialize and authorize the client.

import spotipy
import spotipy.util as util

user_id = 'your user_id'
client_id = 'your client_id'
client_secret = 'your client_secret'
redirect_uri = 'your redirect_uri'  # arbitrary url you put in while registering in Spotify API

token = util.prompt_for_user_token(user_id,
                                   scope='user-top-read playlist-read-collaborative',
                                   client_id=client_id,
                                   client_secret=client_secret,
                                   redirect_uri=redirect_uri)
sp = spotipy.Spotify(auth=token)

Recent top songs

So here I’m gonna pull up my most played tracks in the last 4 weeks.

import pandas as pd

# time_range: long_term (calculated from several years of data),
# medium_term (approximately last 6 months), short_term (approximately last 4 weeks)
if token:
    sp = spotipy.Spotify(auth=token)
    artist_shortterm = []
    song_shortterm = []
    results = sp.current_user_top_tracks(time_range='short_term', limit=50)
    for item in results['items']:
        song_shortterm.append(item['name'])
        # keep only the first credited artist of each track
        artist_shortterm.append(item['artists'][0]['name'])

pd_top50 = pd.DataFrame({'track': song_shortterm, 'artist': artist_shortterm})
pd_top50.sample(12)
track              artist
What I Got         Sublime
Lovesong           The Cure
Heart-Shaped Box   Nirvana
Just Like Honey    The Jesus and Mary Chain
Gigantic (live)    Pixies
The Diamond Sea    Sonic Youth
Hey                Pixies
Schizophrenia      Sonic Youth
Feel Good Inc.     Gorillaz
Psycho Killer      Talking Heads
Lamb Of God        Marilyn Manson
Smelly Cat Medley  Phoebe Buffay And The Hairballs

Get audio features of song tracks

As everything is inside just one playlist, it was easy to gather. The only problem was that the user_playlist method in Spotipy doesn’t paginate and returns only the first 100 tracks, but that was easily solved by fetching page after page in a while more_songs loop.

def get_features_from_playlist(user='', playlist_id=''):
    """Return one row of audio features per track in the playlist."""
    if playlist_id != '' and user == '':
        print("Enter username for playlist")
        return

    feature_frames = []  # one DataFrame of audio features per page of tracks
    added_ts_list = []
    artist_list = []
    title_list = []

    more_songs = True  # as long as there are tracks not fetched from the API, continue looping
    offset_index = 0

    while more_songs:
        songs = sp.user_playlist_tracks(user, playlist_id=playlist_id, offset=offset_index)

        track_ids = []
        for song in songs['items']:
            track_ids.append(song['track']['id'])
            added_ts_list.append(song['added_at'])
            title_list.append(song['track']['name'])
            # join all credited artists into one comma-separated string
            artist_list.append(','.join(artist['name'] for artist in song['track']['artists']))

        # one audio-features request per page of tracks
        feature_frames.append(pd.DataFrame(sp.audio_features(track_ids)))

        if songs['next'] is None:
            more_songs = False
        else:
            offset_index += songs['limit']
            print('Progress: ' + str(offset_index) + ' of ' + str(songs['total']))

    # DataFrame.append was removed in pandas 2.0, so combine the pages with concat
    df_result = pd.concat(feature_frames, ignore_index=True)

    # add the timestamp added, title and artists of a song
    df_result['added_at'], df_result['song_title'], df_result['artists'] = added_ts_list, title_list, artist_list
    return df_result
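The offset loop is easier to reason about against a stand-in for the API. Here is a minimal sketch, assuming a made-up `fake_playlist_tracks` function that only mimics the shape of a paged response (`items`, `next`, `limit`, `total`); it is not part of Spotipy.

```python
def fake_playlist_tracks(all_tracks, offset=0, limit=100):
    """Hypothetical stand-in for a paged endpoint; returns one page as a dict."""
    page = all_tracks[offset:offset + limit]
    has_more = offset + limit < len(all_tracks)
    return {
        'items': page,
        'limit': limit,
        'total': len(all_tracks),
        # the real API puts a URL here; any non-None value means "more pages"
        'next': 'next-page' if has_more else None,
    }


def fetch_all(all_tracks, limit=100):
    """Collect every item by advancing the offset until 'next' is None."""
    collected = []
    offset = 0
    while True:
        page = fake_playlist_tracks(all_tracks, offset=offset, limit=limit)
        collected.extend(page['items'])
        if page['next'] is None:
            break
        offset += page['limit']
    return collected


tracks = ['track_%d' % i for i in range(250)]
fetched = fetch_all(tracks)
print(len(fetched))  # 250
```

Stopping on `next` rather than on a precomputed page count means the loop also ends cleanly when the total is an exact multiple of the page size.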

A glimpse of my playlists

Get all my playlists:

user_playlists = sp.user_playlists(user='lalala')

for playlist in user_playlists['items']:
    print(playlist['id'], playlist['name'])
id1 TUNE
xx2 Sonic Youth Radio
xx3 Driving
xx4 Kickkkk
...

The first column is the playlist id, the second is the name of the playlist.

Let’s dive into my quarantine playlist ‘TUNE’ 🙌

# assumption: parse_date is dateutil's parser; the original import is not shown
from dateutil.parser import parse as parse_date

playlist = sp.user_playlist(user_id, 'what ever id1 is')
tracks = playlist['tracks']['items']
page = playlist['tracks']
# follow the 'next' links until the API reports no further page
while page['next']:
    page = sp.next(page)
    tracks += page['items']

tracks_df = pd.DataFrame([(track['track']['id'],
                           track['track']['artists'][0]['name'],
                           track['track']['name'],
                           parse_date(track['track']['album']['release_date']) if track['track']['album']['release_date'] else None,
                           parse_date(track['added_at']))
                          for track in tracks],
                         columns=['id', 'artist', 'name', 'release_date', 'added_at'] )
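One wrinkle with release_date: Spotify reports it at day, month or year precision (for example 1994-09-13, 1994-09 or 1994), and the album's release_date_precision field says which one applies. A minimal sketch of parsing all three with the standard library only; the helper name `parse_release_date` is made up for illustration:

```python
from datetime import datetime

# strptime format for each value of release_date_precision
_FORMATS = {'day': '%Y-%m-%d', 'month': '%Y-%m', 'year': '%Y'}


def parse_release_date(release_date, precision='day'):
    """Return a datetime for a release date string, or None if it is empty.

    A missing month or day becomes 1, which is how strptime fills the gaps.
    """
    if not release_date:
        return None
    return datetime.strptime(release_date, _FORMATS[precision])


print(parse_release_date('1994-09-13'))     # 1994-09-13 00:00:00
print(parse_release_date('1994', 'year'))   # 1994-01-01 00:00:00
```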

Top artists

My first, vanilla idea was to list the artists that appear most often in my playlist:

tracks_df.groupby('artist').count()['id'].reset_index().sort_values('id', ascending=False).rename(columns={'id': 'amount'}).head(10)
artist        amount
Sublime       20
Dire Straits  18
The Cure      13
BANKS         12
Radiohead     11
Pink Floyd    11
Oasis         11
Eminem        11
Nirvana       11
Gorillaz      11
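The same count works without pandas. A small sketch using collections.Counter on a made-up list of artist names (the sample data is illustrative, not taken from my playlist):

```python
from collections import Counter

# illustrative sample: one entry per track, holding that track's artist
artists = ['Sublime', 'The Cure', 'Sublime', 'Nirvana', 'The Cure', 'Sublime']

# most_common sorts by count, highest first
top_artists = Counter(artists).most_common(2)
print(top_artists)  # [('Sublime', 3), ('The Cure', 2)]
```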

Audio features of song tracks

The Spotify API has an endpoint that provides per-track features such as danceability, energy and loudness, so I gathered the features for all tracks in the playlist. I don’t have years of records on Spotify, so it’s difficult to check how my taste has changed over the years. 🤷‍♀️
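The audio-features endpoint takes at most 100 track ids per request, so a long list of ids has to be sent in batches. A minimal sketch of the batching step; `chunked` is a hypothetical helper, and the commented-out line only marks where a call to sp.audio_features would go:

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


track_ids = ['id_%d' % i for i in range(230)]  # placeholder ids
batch_sizes = []
for batch in chunked(track_ids):
    # features.extend(sp.audio_features(batch))  # one API call per batch
    batch_sizes.append(len(batch))

print(batch_sizes)  # [100, 100, 30]
```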

Getting sad?

So I looked at whether my music habits changed under lockdown. It turns out only valence showed a visible difference:

Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Ahh. Lockdown unleashed my sadness, just like my old saying: ‘Depression feels like my 30-pound dog is sitting on my chest’.

Kidding. I luuuuuv my dog!!

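import matplotlib.pyplot as plt
import seaborn as sns

# df_2020: presumably the feature DataFrame restricted to tracks added in 2020,
# with added_at already converted to datetime (its construction is not shown)
plt.figure(figsize=(6,4))
sns.boxplot(x=df_2020.added_at.dt.month, y=df_2020.valence, color='#1eb954')
plt.title("Valence changes over months", fontsize=12,y=1.01,weight='bold')
plt.show()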

[Figure: box plot "Valence changes over months", valence by month added]

Have moves?

[Figure: danceability plot]

Huh? I have no words. What a disgrace for someone who has 100% danceability!!!

To get my head around the loss of my danceability, let’s check if my sadness has anything to do with the dancing!

tracks_w_features.plot(kind='scatter', x='danceability', y='valence')
plt.title("Danceability x Valence", fontsize=12, y=1.01,weight='bold')
plt.tight_layout()

[Figure: scatter plot "Danceability x Valence"]

Hmm. Interesting. So apparently my playlist doesn’t show a positive correlation between ‘upbeat’ and ‘danceable’. 🤔
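To put a number on what a scatter plot suggests, a correlation coefficient helps. A minimal sketch of Pearson's r in plain Python, using made-up values rather than my playlist data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# illustrative values only: a perfectly linear relationship
danceability = [0.2, 0.4, 0.6, 0.8]
valence = [0.1, 0.3, 0.5, 0.7]
print(round(pearson_r(danceability, valence), 3))  # 1.0
```

On the real data, tracks_w_features['danceability'].corr(tracks_w_features['valence']) computes the same statistic in pandas. A value near 0 would match the shapeless cloud in the plot.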

Let’s just say I’ve tried to be chipper under quarantine, because I’m afraid that if there’s one crack, I’ll fall apart completely.

How different or similar are the songs?

I took those features and calculated the distance between every pair of distinct tracks, pairing the tracks up with a Cartesian product of the track table with itself.

import numpy as np

encode_fields = ['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature']

# min and max of every feature, computed once instead of once per row
feature_min = tracks_w_features[encode_fields].min()
feature_max = tracks_w_features[encode_fields].max()

def encode(row):
    # min-max scale every feature of a track to the range [0, 1]
    return np.array([
        (row[k] - feature_min[k]) / (feature_max[k] - feature_min[k])
        for k in encode_fields])

tracks_w_features_encoded = tracks_w_features.assign(
    encoded=tracks_w_features.apply(encode, axis=1))

# cross join on a constant key: pairs every track with every track
tracks_w_features_encoded_product = tracks_w_features_encoded.assign(temp=0) \
    .merge(tracks_w_features_encoded.assign(temp=0), on='temp', how='left').drop(columns='temp')

tracks_w_features_encoded_product = tracks_w_features_encoded_product[
    tracks_w_features_encoded_product.id_x != tracks_w_features_encoded_product.id_y]

tracks_w_features_encoded_product['merge_id'] = tracks_w_features_encoded_product \
    .apply(lambda row: ''.join(sorted([row['id_x'], row['id_y']])), axis=1)

tracks_w_features_encoded_product['distance'] = tracks_w_features_encoded_product \
    .apply(lambda row: np.linalg.norm(row['encoded_x'] - row['encoded_y']), axis=1)
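In other words, each feature is min-max scaled to the range 0 to 1, and the distance between two tracks is the Euclidean norm of the difference between their scaled vectors. Without the scaling, a feature like tempo (in BPM) would swamp one like valence (0 to 1). A toy sketch with two features and three made-up tracks:

```python
import math

# made-up tracks: (tempo in BPM, valence)
tracks = {'a': (100.0, 0.25), 'b': (120.0, 0.5), 'c': (140.0, 0.75)}


def min_max_scale(rows):
    """Scale every column of `rows` (a dict of name -> tuple) to [0, 1].

    Assumes no column is constant; a constant column would divide by zero.
    """
    columns = list(zip(*rows.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs))
        for name, values in rows.items()
    }


scaled = min_max_scale(tracks)
print(scaled['b'])                                     # (0.5, 0.5)
print(round(math.dist(scaled['a'], scaled['c']), 3))   # 1.414
```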

Most similar songs

After that I was able to pull out the most similar songs, i.e. the pairs with the smallest distance, and the picks do look kind of similar:

tracks_w_features_encoded_product.sort_values('distance').drop_duplicates('merge_id') \
    [['artists_x', 'song_title_x', 'artists_y', 'song_title_y', 'distance']].head(10)
artists_x               song_title_x         artists_y                 song_title_y        distance
Florence + The Machine  The End Of Love      Glass Animals             Gooey               0.011732
The Stone Roses         Love Spreads         The Stone Roses           Love Spreads        0.038285
OK Go                   Here It Goes Again   The Jesus and Mary Chain  Some Candy Talking  0.108457
The Libertines          Can’t Stand Me Now   Foo Fighters              Monkey Wrench       0.117521
AC/DC                   Thunderstruck        Muse                      Starlight           0.141387
Marilyn Manson          The Nobodies         Foo Fighters              My Hero             0.147820
Pulp                    Something Changed    Ween                      Mutilated Lips      0.158513
Blur                    My Terracotta Heart  Men I Trust               Tailwhip            0.161328
Talking Heads           Road to Nowhere      HAIM                      The Steps           0.162264
Halestorm               Bad Romance          Dodgy                     Good Enough         0.162886

Surprisingly, it makes sense!

Most average songs

i.e. the songs with the smallest total distance to every other song:

# select the distance column before summing so only that column is aggregated
tracks_w_features_encoded_product \
    .groupby(['artists_x', 'song_title_x'])['distance'] \
    .sum() \
    .reset_index() \
    .sort_values('distance') \
    .head(10)
artists                        song_title                                distance
The Animals                    We Gotta Get Out Of This Place            758.802868
Tenacious D                    Fuck Her Gently                           761.310917
Arctic Monkeys                 Do I Wanna Know?                          767.353926
One Direction                  Story of My Life                          773.588932
Urge Overkill                  Girl, You’ll Be a Woman Soon              775.938550
alt-J                          Tessellate                                783.974366
Guns N’ Roses                  Knockin’ On Heaven’s Door                 786.414124
alt-J                          Breezeblocks                              787.258564
Divinyls                       I Touch Myself                            787.683498
Tears For Fears,Dave Bascombe  Head Over Heels - Dave Bascombe 7” N.Mix  789.882583

Most ‘So not me’ songs

artists_x          song_title_x                    distance
The Stone Roses    Love Spreads                    1859.548582
Piano Dreamers     Heaven’s Gate                   1504.676314
Per-Olov Kindgren  After Silence                   1406.159590
Men I Trust        All Night                       1348.317181
MONO               Ashes in the Snow - Remastered  1324.674595
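Both rankings come from the same quantity: the sum of a track's distances to all the others. The most average tracks have the smallest totals, and the ‘so not me’ tracks presumably the largest (the code for that table isn't shown above). A toy sketch with made-up tracks on a single scaled feature:

```python
# made-up tracks placed on one scaled feature axis
points = {'a': 0.0, 'b': 0.25, 'c': 0.5, 'd': 0.625, 'e': 1.0}

# total distance from each track to every other track
totals = {
    name: sum(abs(value - other)
              for other_name, other in points.items()
              if other_name != name)
    for name, value in points.items()
}

ranked = sorted(totals, key=totals.get)
print(ranked[0])   # c, the most average track
print(ranked[-1])  # e, the track least like the rest
```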

Next

I’ll use clustering to bucket my favorite tracks when I get really bored.

Since you made it this far

How do you find good music you’ve never heard before?

Try this gem I recently found: www.gnoosic.com. You put in three of your favorite bands/artists, and it will recommend similar stuff that you most likely haven’t listened to.

And then you will come back and thank me. You are welcome!

Stay safe.