Skip to content

Open In Colab

Python Foundations
X1: The Spotify API

Instructor: Wesley Beckner

Contact: wesleybeckner@gmail.com



For this workbook we're going to use the spotipy library to access the Spotify Web API!



Install and import libraries

First we will need to install it:

!pip install spotipy
Collecting spotipy
  Downloading spotipy-2.19.0-py3-none-any.whl (27 kB)
Requirement already satisfied: six>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from spotipy) (1.15.0)
Collecting urllib3>=1.26.0
  Downloading urllib3-1.26.8-py2.py3-none-any.whl (138 kB)
     |████████████████████████████████| 138 kB 14.7 MB/s 
[?25hCollecting requests>=2.25.0
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 687 kB/s 
[?25hRequirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests>=2.25.0->spotipy) (2.0.11)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.25.0->spotipy) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.25.0->spotipy) (2.10)
Installing collected packages: urllib3, requests, spotipy
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.24.3
    Uninstalling urllib3-1.24.3:
      Successfully uninstalled urllib3-1.24.3
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed requests-2.27.1 spotipy-2.19.0 urllib3-1.26.8

And then import

from spotipy import client
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Setup developer account

You'll need to visit this link to setup a developer account, then fill in your authorization information below

SPOTIPY_CLIENT_ID = ""
SPOTIPY_CLIENT_SECRET = ""

Top 10 tracks of an artist

We can grab the first 10 tracks of Led Zepplin:

alist = ['elephant', 'pinecone', 'toothbrush']
for index, item in enumerate(alist):
  print(str(index) + ' ' + item)
0 elephant
1 pinecone
2 toothbrush
lz_uri = 'spotify:artist:36QJpDe2go2KgaRleHCDTp'

spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(
    client_id=SPOTIPY_CLIENT_ID,
    client_secret=SPOTIPY_CLIENT_SECRET
))

results = spotify.artist_top_tracks(lz_uri)
ids = []

for i, track in enumerate(results['tracks']):
    ids.append(track['id'])
    if i < 10:
      print('track    : ' + track['name'])
      print('audio    : ' + track['preview_url'])
      print('cover art: ' + track['album']['images'][0]['url'])
      print()
track    : Stairway to Heaven - Remaster
audio    : https://p.scdn.co/mp3-preview/8226164717312bc411f8635580562d67e191a754?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : Immigrant Song - Remaster
audio    : https://p.scdn.co/mp3-preview/8455599677a13017978dcd3f4b210937f0a16bcb?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b27390a50cfe99a4c19ff3cbfbdb

track    : Whole Lotta Love - 1990 Remaster
audio    : https://p.scdn.co/mp3-preview/ce11b19a4d2de9976d7626df0717d0073863909c?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273fc4f17340773c6c3579fea0d

track    : Black Dog - Remaster
audio    : https://p.scdn.co/mp3-preview/9b76619fd9d563a48d38cc90ca00c3008327b52e?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : Kashmir - Remaster
audio    : https://p.scdn.co/mp3-preview/f3ca68c9ceaa3435d5bd55c0199ba0b09b916cce?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273765b0617b572bdd1dbdc7d8e

track    : Ramble On - 1990 Remaster
audio    : https://p.scdn.co/mp3-preview/83383aceb01ea27b0bffdedfaebe55e29b33aca2?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273fc4f17340773c6c3579fea0d

track    : Rock and Roll - Remaster
audio    : https://p.scdn.co/mp3-preview/e7ea8a13f7caf6942c5447e9cd96aac2a076d85a?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : Going to California - Remaster
audio    : https://p.scdn.co/mp3-preview/4bdae56c6a9f7a8ec42b753cb7bea2c77ec68f1e?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : Good Times Bad Times - 1993 Remaster
audio    : https://p.scdn.co/mp3-preview/c1f024eb57b569b926c8e68cab0a6056dc7d9654?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b2736f2f499c1df1f210c9b34b32

track    : D'yer Mak'er - Remaster
audio    : https://p.scdn.co/mp3-preview/863a26744fa4389f1dc61557133df3453be82d7b?cid=93cef3f9255042d7854a6014e0929504
cover art: https://i.scdn.co/image/ab67616d0000b2731816adce1d49e35d3ce9a1d1

the top tracks API only gives the top 10 tracks by an artist:

len(ids)
10
ids
['5CQ30WqJwcep0pYcV4AMNc',
 '78lgmZwycJ3nzsdgmPPGNx',
 '0hCB0YR03f6AmQaHbwWDe8',
 '3qT4bUD1MaWpGrTwcvguhb',
 '6Vjk8MNXpQpi0F4BefdTyq',
 '3MODES4TNtygekLl146Dxd',
 '4PRGxHpCpF2yoOHYKQIEwD',
 '70gbuMqwNBE2Y5rkQJE9By',
 '0QwZfbw26QeUoIy82Z2jYp',
 '4ItljeeAXtHsnsnnQojaO2']

Spotify has an audio features API that can be used for ML or data visualization:

features = spotify.audio_features(ids)
features
[{'acousticness': 0.58,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/5CQ30WqJwcep0pYcV4AMNc',
  'danceability': 0.338,
  'duration_ms': 482830,
  'energy': 0.34,
  'id': '5CQ30WqJwcep0pYcV4AMNc',
  'instrumentalness': 0.0032,
  'key': 9,
  'liveness': 0.116,
  'loudness': -12.049,
  'mode': 0,
  'speechiness': 0.0339,
  'tempo': 82.433,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/5CQ30WqJwcep0pYcV4AMNc',
  'type': 'audio_features',
  'uri': 'spotify:track:5CQ30WqJwcep0pYcV4AMNc',
  'valence': 0.197},
 {'acousticness': 0.013,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/78lgmZwycJ3nzsdgmPPGNx',
  'danceability': 0.564,
  'duration_ms': 146250,
  'energy': 0.932,
  'id': '78lgmZwycJ3nzsdgmPPGNx',
  'instrumentalness': 0.169,
  'key': 11,
  'liveness': 0.349,
  'loudness': -10.068,
  'mode': 1,
  'speechiness': 0.0554,
  'tempo': 112.937,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/78lgmZwycJ3nzsdgmPPGNx',
  'type': 'audio_features',
  'uri': 'spotify:track:78lgmZwycJ3nzsdgmPPGNx',
  'valence': 0.619},
 {'acousticness': 0.0484,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0hCB0YR03f6AmQaHbwWDe8',
  'danceability': 0.412,
  'duration_ms': 333893,
  'energy': 0.902,
  'id': '0hCB0YR03f6AmQaHbwWDe8',
  'instrumentalness': 0.131,
  'key': 9,
  'liveness': 0.405,
  'loudness': -11.6,
  'mode': 1,
  'speechiness': 0.405,
  'tempo': 89.74,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/0hCB0YR03f6AmQaHbwWDe8',
  'type': 'audio_features',
  'uri': 'spotify:track:0hCB0YR03f6AmQaHbwWDe8',
  'valence': 0.422},
 {'acousticness': 0.396,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/3qT4bUD1MaWpGrTwcvguhb',
  'danceability': 0.437,
  'duration_ms': 295387,
  'energy': 0.864,
  'id': '3qT4bUD1MaWpGrTwcvguhb',
  'instrumentalness': 0.0314,
  'key': 4,
  'liveness': 0.242,
  'loudness': -7.842,
  'mode': 0,
  'speechiness': 0.0904,
  'tempo': 81.394,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/3qT4bUD1MaWpGrTwcvguhb',
  'type': 'audio_features',
  'uri': 'spotify:track:3qT4bUD1MaWpGrTwcvguhb',
  'valence': 0.749},
 {'acousticness': 0.452,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/6Vjk8MNXpQpi0F4BefdTyq',
  'danceability': 0.483,
  'duration_ms': 517125,
  'energy': 0.615,
  'id': '6Vjk8MNXpQpi0F4BefdTyq',
  'instrumentalness': 0.000414,
  'key': 2,
  'liveness': 0.0512,
  'loudness': -8.538,
  'mode': 1,
  'speechiness': 0.0497,
  'tempo': 80.576,
  'time_signature': 3,
  'track_href': 'https://api.spotify.com/v1/tracks/6Vjk8MNXpQpi0F4BefdTyq',
  'type': 'audio_features',
  'uri': 'spotify:track:6Vjk8MNXpQpi0F4BefdTyq',
  'valence': 0.594},
 {'acousticness': 0.072,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/3MODES4TNtygekLl146Dxd',
  'danceability': 0.468,
  'duration_ms': 263333,
  'energy': 0.607,
  'id': '3MODES4TNtygekLl146Dxd',
  'instrumentalness': 0.000852,
  'key': 9,
  'liveness': 0.225,
  'loudness': -11.367,
  'mode': 1,
  'speechiness': 0.0336,
  'tempo': 98.429,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/3MODES4TNtygekLl146Dxd',
  'type': 'audio_features',
  'uri': 'spotify:track:3MODES4TNtygekLl146Dxd',
  'valence': 0.886},
 {'acousticness': 0.000582,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4PRGxHpCpF2yoOHYKQIEwD',
  'danceability': 0.317,
  'duration_ms': 220561,
  'energy': 0.887,
  'id': '4PRGxHpCpF2yoOHYKQIEwD',
  'instrumentalness': 0.00258,
  'key': 9,
  'liveness': 0.0891,
  'loudness': -7.292,
  'mode': 1,
  'speechiness': 0.0375,
  'tempo': 169.613,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/4PRGxHpCpF2yoOHYKQIEwD',
  'type': 'audio_features',
  'uri': 'spotify:track:4PRGxHpCpF2yoOHYKQIEwD',
  'valence': 0.871},
 {'acousticness': 0.943,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/70gbuMqwNBE2Y5rkQJE9By',
  'danceability': 0.503,
  'duration_ms': 212161,
  'energy': 0.265,
  'id': '70gbuMqwNBE2Y5rkQJE9By',
  'instrumentalness': 0.045,
  'key': 2,
  'liveness': 0.0867,
  'loudness': -15.913,
  'mode': 1,
  'speechiness': 0.0333,
  'tempo': 78.044,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/70gbuMqwNBE2Y5rkQJE9By',
  'type': 'audio_features',
  'uri': 'spotify:track:70gbuMqwNBE2Y5rkQJE9By',
  'valence': 0.522},
 {'acousticness': 0.0382,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/0QwZfbw26QeUoIy82Z2jYp',
  'danceability': 0.476,
  'duration_ms': 166267,
  'energy': 0.717,
  'id': '0QwZfbw26QeUoIy82Z2jYp',
  'instrumentalness': 7.61e-05,
  'key': 9,
  'liveness': 0.0818,
  'loudness': -9.192,
  'mode': 1,
  'speechiness': 0.0949,
  'tempo': 93.584,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/0QwZfbw26QeUoIy82Z2jYp',
  'type': 'audio_features',
  'uri': 'spotify:track:0QwZfbw26QeUoIy82Z2jYp',
  'valence': 0.753},
 {'acousticness': 0.262,
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4ItljeeAXtHsnsnnQojaO2',
  'danceability': 0.525,
  'duration_ms': 262748,
  'energy': 0.929,
  'id': '4ItljeeAXtHsnsnnQojaO2',
  'instrumentalness': 2.9e-05,
  'key': 9,
  'liveness': 0.0754,
  'loudness': -8.56,
  'mode': 0,
  'speechiness': 0.0784,
  'tempo': 163.503,
  'time_signature': 4,
  'track_href': 'https://api.spotify.com/v1/tracks/4ItljeeAXtHsnsnnQojaO2',
  'type': 'audio_features',
  'uri': 'spotify:track:4ItljeeAXtHsnsnnQojaO2',
  'valence': 0.556}]
pd.DataFrame(features)
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo type id uri track_href analysis_url duration_ms time_signature
0 0.338 0.340 9 -12.049 0 0.0339 0.580000 0.003200 0.1160 0.197 82.433 audio_features 5CQ30WqJwcep0pYcV4AMNc spotify:track:5CQ30WqJwcep0pYcV4AMNc https://api.spotify.com/v1/tracks/5CQ30WqJwcep... https://api.spotify.com/v1/audio-analysis/5CQ3... 482830 4
1 0.564 0.932 11 -10.068 1 0.0554 0.013000 0.169000 0.3490 0.619 112.937 audio_features 78lgmZwycJ3nzsdgmPPGNx spotify:track:78lgmZwycJ3nzsdgmPPGNx https://api.spotify.com/v1/tracks/78lgmZwycJ3n... https://api.spotify.com/v1/audio-analysis/78lg... 146250 4
2 0.412 0.902 9 -11.600 1 0.4050 0.048400 0.131000 0.4050 0.422 89.740 audio_features 0hCB0YR03f6AmQaHbwWDe8 spotify:track:0hCB0YR03f6AmQaHbwWDe8 https://api.spotify.com/v1/tracks/0hCB0YR03f6A... https://api.spotify.com/v1/audio-analysis/0hCB... 333893 4
3 0.437 0.864 4 -7.842 0 0.0904 0.396000 0.031400 0.2420 0.749 81.394 audio_features 3qT4bUD1MaWpGrTwcvguhb spotify:track:3qT4bUD1MaWpGrTwcvguhb https://api.spotify.com/v1/tracks/3qT4bUD1MaWp... https://api.spotify.com/v1/audio-analysis/3qT4... 295387 4
4 0.483 0.615 2 -8.538 1 0.0497 0.452000 0.000414 0.0512 0.594 80.576 audio_features 6Vjk8MNXpQpi0F4BefdTyq spotify:track:6Vjk8MNXpQpi0F4BefdTyq https://api.spotify.com/v1/tracks/6Vjk8MNXpQpi... https://api.spotify.com/v1/audio-analysis/6Vjk... 517125 3
5 0.468 0.607 9 -11.367 1 0.0336 0.072000 0.000852 0.2250 0.886 98.429 audio_features 3MODES4TNtygekLl146Dxd spotify:track:3MODES4TNtygekLl146Dxd https://api.spotify.com/v1/tracks/3MODES4TNtyg... https://api.spotify.com/v1/audio-analysis/3MOD... 263333 4
6 0.317 0.887 9 -7.292 1 0.0375 0.000582 0.002580 0.0891 0.871 169.613 audio_features 4PRGxHpCpF2yoOHYKQIEwD spotify:track:4PRGxHpCpF2yoOHYKQIEwD https://api.spotify.com/v1/tracks/4PRGxHpCpF2y... https://api.spotify.com/v1/audio-analysis/4PRG... 220561 4
7 0.503 0.265 2 -15.913 1 0.0333 0.943000 0.045000 0.0867 0.522 78.044 audio_features 70gbuMqwNBE2Y5rkQJE9By spotify:track:70gbuMqwNBE2Y5rkQJE9By https://api.spotify.com/v1/tracks/70gbuMqwNBE2... https://api.spotify.com/v1/audio-analysis/70gb... 212161 4
8 0.476 0.717 9 -9.192 1 0.0949 0.038200 0.000076 0.0818 0.753 93.584 audio_features 0QwZfbw26QeUoIy82Z2jYp spotify:track:0QwZfbw26QeUoIy82Z2jYp https://api.spotify.com/v1/tracks/0QwZfbw26QeU... https://api.spotify.com/v1/audio-analysis/0QwZ... 166267 4
9 0.525 0.929 9 -8.560 0 0.0784 0.262000 0.000029 0.0754 0.556 163.503 audio_features 4ItljeeAXtHsnsnnQojaO2 spotify:track:4ItljeeAXtHsnsnnQojaO2 https://api.spotify.com/v1/tracks/4ItljeeAXtHs... https://api.spotify.com/v1/audio-analysis/4Itl... 262748 4

Add additional artists

Let's get some other artist data to compare with:

artist_dict = {'zep': '36QJpDe2go2KgaRleHCDTp',
               'tswift': '06HL4z0CvFAxyc27GXpf02',
               'debussy': '1Uff91EOsvd99rtAupatMP',
               'luttrell': '4EOyJnoiiOJ4vuNhSBArB2',
               'johnwill': '3dRfiJ2650SZu6GbydcHNb'}

color_dict = {'zep': 'tab:blue',
               'tswift': 'tab:green',
               'debussy': 'tab:orange',
               'luttrell': 'tab:red',
               'johnwill': 'tab:pink'}
for artist, uri in artist_dict.items():
  print(artist + ' ' + uri)
zep 36QJpDe2go2KgaRleHCDTp
tswift 06HL4z0CvFAxyc27GXpf02
debussy 1Uff91EOsvd99rtAupatMP
luttrell 4EOyJnoiiOJ4vuNhSBArB2
johnwill 3dRfiJ2650SZu6GbydcHNb
ids = []
artists = []
colors = []
for artist, uri in artist_dict.items():
  results = spotify.artist_top_tracks('spotify:artist:' + uri)


  for i, track in enumerate(results['tracks']):
      ids.append(track['id'])
      artists.append(artist)
      colors.append(color_dict[artist])
      if i < 1:
        print('track    : ' + track['name'])
        print('cover art: ' + track['album']['images'][0]['url'])
        print()
track    : Stairway to Heaven - Remaster
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : All Too Well (10 Minute Version) (Taylor's Version) (From The Vault)
cover art: https://i.scdn.co/image/ab67616d0000b273318443aab3531a0558e79a4d

track    : Clair de Lune, L. 32
cover art: https://i.scdn.co/image/ab67616d0000b2736e7bb273ff9cb1de1e1d4d0a

track    : Twin Souls
cover art: https://i.scdn.co/image/ab67616d0000b2735dea3da9d2751a0fa7b23fd3

track    : Carol of the Bells
cover art: https://i.scdn.co/image/ab67616d0000b273a68c06155b7c3cf82b00cb96
features = spotify.audio_features(ids)
df = pd.DataFrame(features)
df['artist'] = artists
df['color'] = colors
feat_names = df.columns[:11]
print(feat_names)
Index(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo'],
      dtype='object')
df.head()
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo type id uri track_href analysis_url duration_ms time_signature artist color
0 0.338 0.340 9 -12.049 0 0.0339 0.5800 0.003200 0.1160 0.197 82.433 audio_features 5CQ30WqJwcep0pYcV4AMNc spotify:track:5CQ30WqJwcep0pYcV4AMNc https://api.spotify.com/v1/tracks/5CQ30WqJwcep... https://api.spotify.com/v1/audio-analysis/5CQ3... 482830 4 zep tab:blue
1 0.564 0.932 11 -10.068 1 0.0554 0.0130 0.169000 0.3490 0.619 112.937 audio_features 78lgmZwycJ3nzsdgmPPGNx spotify:track:78lgmZwycJ3nzsdgmPPGNx https://api.spotify.com/v1/tracks/78lgmZwycJ3n... https://api.spotify.com/v1/audio-analysis/78lg... 146250 4 zep tab:blue
2 0.412 0.902 9 -11.600 1 0.4050 0.0484 0.131000 0.4050 0.422 89.740 audio_features 0hCB0YR03f6AmQaHbwWDe8 spotify:track:0hCB0YR03f6AmQaHbwWDe8 https://api.spotify.com/v1/tracks/0hCB0YR03f6A... https://api.spotify.com/v1/audio-analysis/0hCB... 333893 4 zep tab:blue
3 0.437 0.864 4 -7.842 0 0.0904 0.3960 0.031400 0.2420 0.749 81.394 audio_features 3qT4bUD1MaWpGrTwcvguhb spotify:track:3qT4bUD1MaWpGrTwcvguhb https://api.spotify.com/v1/tracks/3qT4bUD1MaWp... https://api.spotify.com/v1/audio-analysis/3qT4... 295387 4 zep tab:blue
4 0.483 0.615 2 -8.538 1 0.0497 0.4520 0.000414 0.0512 0.594 80.576 audio_features 6Vjk8MNXpQpi0F4BefdTyq spotify:track:6Vjk8MNXpQpi0F4BefdTyq https://api.spotify.com/v1/tracks/6Vjk8MNXpQpi... https://api.spotify.com/v1/audio-analysis/6Vjk... 517125 3 zep tab:blue

Visualize the audio features

# <groupby>.<select>.<agg operation>
df.groupby('artist')[feat_names].mean()
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo
artist
debussy 0.2868 0.019716 3.6 -30.3510 0.8 0.04209 0.990800 0.927000 0.07817 0.07747 81.2996
johnwill 0.2783 0.151770 3.9 -20.4347 0.5 0.03766 0.903000 0.615920 0.18085 0.19182 87.2151
luttrell 0.5310 0.706000 4.5 -9.3468 0.7 0.03504 0.127682 0.669259 0.18714 0.16701 117.5059
tswift 0.5073 0.621700 4.2 -6.4014 0.9 0.05926 0.198158 0.000263 0.13591 0.40550 125.7815
zep 0.4523 0.705800 7.3 -10.2421 0.7 0.09121 0.280518 0.038355 0.17212 0.61690 105.0253
fig, ax = plt.subplots(figsize=(10,10))
df.groupby('artist')[feat_names].mean().plot(kind='barh', ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x7f7e11316ad0>

png

we see that it is difficult to see all the features together at once. One tactic may be to normalize the features

# (x - x.min()) / (x.max() - x.min())
scaled = df[feat_names].apply(lambda x: (x - x.mean()) / (x.std()))
scaled['artist'] = df['artist']
fig, ax = plt.subplots(figsize=(10,10))
scaled.groupby('artist')[feat_names].mean().plot(kind='barh', ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x7f9c3412a390>

png

We can then investigate if these scaled and centered features separate out under a dimensionality reduction (a topic we explore in unsupervised learning):

scaled[feat_names].head()
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo
0 -0.469349 -0.300769 1.218214 0.351423 -1.587451 -0.342620 0.186938 -1.068262 -0.305866 -0.356202 -0.626640
1 0.980922 1.462203 1.784825 0.561988 0.617342 0.042005 -1.138510 -0.671989 1.739793 1.230425 0.286535
2 0.005519 1.372864 1.218214 0.399148 0.617342 6.296175 -1.055757 -0.762811 2.231454 0.489749 -0.407896
3 0.165947 1.259700 -0.198314 0.798594 -1.587451 0.668137 -0.243189 -1.000862 0.800371 1.719197 -0.657743
4 0.461135 0.518179 -0.764925 0.724615 0.617342 -0.059966 -0.112281 -1.074920 -0.874787 1.136431 -0.682231
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(scaled[feat_names])
PCA(n_components=2)
color_dict
{'debussy': 'tab:orange',
 'johnwill': 'tab:pink',
 'luttrell': 'tab:red',
 'tswift': 'tab:green',
 'zep': 'tab:blue'}
X_pca = pca.transform(df[feat_names])

plt.scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.8, c=df['color'].values,
            edgecolor='grey')
plt.xlabel('First PC')
plt.ylabel('Second PC')
Text(0, 0.5, 'Second PC')

png

🏋️‍♀️ Exercises

🕵️‍♀️ Exercise 1: Find artist urls and build dataset

a. Navigate to the Spotify web application, pick 5 artists and update the the dictionary below

import matplotlib.colors as mcolors
mcolors.TABLEAU_COLORS
OrderedDict([('tab:blue', '#1f77b4'),
             ('tab:orange', '#ff7f0e'),
             ('tab:green', '#2ca02c'),
             ('tab:red', '#d62728'),
             ('tab:purple', '#9467bd'),
             ('tab:brown', '#8c564b'),
             ('tab:pink', '#e377c2'),
             ('tab:gray', '#7f7f7f'),
             ('tab:olive', '#bcbd22'),
             ('tab:cyan', '#17becf')])
# Cell for 1.a

artist_dict = {'zep': '36QJpDe2go2KgaRleHCDTp',
               'tswift': '06HL4z0CvFAxyc27GXpf02',
               'debussy': '1Uff91EOsvd99rtAupatMP',
               'luttrell': '4EOyJnoiiOJ4vuNhSBArB2',
               'johnwill': '3dRfiJ2650SZu6GbydcHNb',
               'chopin': '7y97mc3bZRFXzT2szRM4L4',
               '2pac': '1ZwdS5xdxEREPySFridCfh',
               'ganja': '1a6oIpEh4DGgaqgWg5xwd3'}

color_dict = {'zep': 'tab:blue',
               'tswift': 'tab:green',
               'debussy': 'tab:orange',
               'luttrell': 'tab:red',
               'johnwill': 'tab:pink',
               'chopin': 'tab:brown',
               '2pac': 'tab:gray',
               'ganja': 'tab:purple'}

b. build feature set using spotify.artist_top_tracks('spotify:artist:' + uri) and storing the resultant track information

# Cell for 1.b
ids = []
artists = []
colors = []
for artist, uri in artist_dict.items():
  results = spotify.artist_top_tracks('spotify:artist:' + uri)


  for i, track in enumerate(results['tracks']):
      ids.append(track['id'])
      artists.append(artist)
      colors.append(color_dict[artist])
      if i < 1:
        print('track    : ' + track['name'])
        print('cover art: ' + track['album']['images'][0]['url'])
        print()
track    : Stairway to Heaven - Remaster
cover art: https://i.scdn.co/image/ab67616d0000b273c8a11e48c91a982d086afc69

track    : All Too Well (10 Minute Version) (Taylor's Version) (From The Vault)
cover art: https://i.scdn.co/image/ab67616d0000b273318443aab3531a0558e79a4d

track    : Clair de Lune, L. 32
cover art: https://i.scdn.co/image/ab67616d0000b2736e7bb273ff9cb1de1e1d4d0a

track    : Twin Souls
cover art: https://i.scdn.co/image/ab67616d0000b2735dea3da9d2751a0fa7b23fd3

track    : Carol of the Bells
cover art: https://i.scdn.co/image/ab67616d0000b273a68c06155b7c3cf82b00cb96

track    : Nocturne No. 2 in E-Flat Major, Op. 9 No. 2
cover art: https://i.scdn.co/image/ab67616d0000b27355c82855070525581e2c6fee

track    : Hit 'Em Up - Single Version
cover art: https://i.scdn.co/image/ab67616d0000b273d81a092eb373ded457d94eec

track    : Miss You
cover art: https://i.scdn.co/image/ab67616d0000b273bbfbb9cd91d8676f68676e59
features = spotify.audio_features(ids)
df = pd.DataFrame(features)
df['artist'] = artists
df['color'] = colors
feat_names = df.columns[:11]
print(feat_names)
Index(['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
       'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo'],
      dtype='object')

💫 Exercise 2: Visualize Features

a. Create a boxplot of each feature, grouped by artist (similar to example above)

# Cell for 2.a
fig, ax = plt.subplots(6, 2, figsize=(10,25))
indices = np.argwhere(ax)
for index, feature in enumerate(feat_names):
  df.boxplot(by='artist', column=feature, ax=ax[indices[index][0],
                                                indices[index][1]])
plt.tight_layout()

png

b. Normalize the features this time, then create the boxplot

# Cell for 2.b
scaled = df[feat_names].apply(lambda x: (x - x.mean()) / (x.std()))
scaled['artist'] = df['artist']

fig, ax = plt.subplots(6, 2, figsize=(10,25))
indices = np.argwhere(ax)
for index, feature in enumerate(feat_names):
  scaled.boxplot(by='artist', column=feature, ax=ax[indices[index][0],
                                                indices[index][1]])
plt.tight_layout()

png

⚖️ Exercise 3: Write a function that returns a similarity score between two artists

# Cell for 3
avg = df.groupby('artist')[feat_names].mean()
avg
danceability energy key loudness mode speechiness acousticness instrumentalness liveness valence tempo
artist
2pac 0.8102 0.692400 5.0 -6.2682 0.8 0.18134 0.097199 0.000569 0.20875 0.62680 100.6579
chopin 0.3255 0.018298 4.6 -30.1471 0.5 0.04479 0.988800 0.906600 0.08981 0.10532 86.1455
debussy 0.2868 0.019716 3.6 -30.3510 0.8 0.04209 0.990800 0.927000 0.07817 0.07747 81.2996
ganja 0.6506 0.701500 5.2 -5.0898 0.3 0.08868 0.083128 0.114312 0.12767 0.40886 108.9341
johnwill 0.2783 0.151770 3.9 -20.4347 0.5 0.03766 0.903000 0.615920 0.18085 0.19182 87.2151
luttrell 0.5310 0.706000 4.5 -9.3468 0.7 0.03504 0.127682 0.669259 0.18714 0.16701 117.5059
tswift 0.5073 0.621700 4.2 -6.4014 0.9 0.05926 0.198158 0.000263 0.13591 0.40550 125.7815
zep 0.4523 0.705800 7.3 -10.2421 0.7 0.09121 0.280518 0.038355 0.17212 0.61690 105.0253
x = avg.loc['2pac'].values
y = avg.loc['zep'].values
def distance(xlabel, ylabel):
  x = avg.loc[xlabel].values
  y = avg.loc[ylabel].values
  return np.sqrt(((x-y)**2).sum())
distance('2pac', 'ganja')
8.38304605268043
distance('chopin', 'debussy')
4.9615683289484185
scores = np.zeros((avg.shape[0], avg.shape[0]))
for index, name in enumerate(avg.index):
  for index2, name2 in enumerate(avg.index):
    if index < 1:
      print('{:.2f}'.format(distance(name, name2)) + ' ' + name + ' ' + name2 + ' ')
    scores[index, index2] = distance(name,name2)
0.00 2pac 2pac 
27.99 2pac chopin 
30.97 2pac debussy 
8.38 2pac ganja 
19.61 2pac johnwill 
17.16 2pac luttrell 
25.14 2pac tswift 
6.35 2pac zep
dists = pd.DataFrame(scores, index=avg.index, columns=avg.index)
import seaborn as sns
sns.heatmap(dists)
<matplotlib.axes._subplots.AxesSubplot at 0x7f9c32ee0190>

png

import numpy as np
fig, ax = plt.subplots(1, 1, figsize = (10,10))

# create a mask to white-out the upper triangle
mask = np.triu(np.ones_like(dists, dtype=bool))

# we'll want a divergent colormap for this so our eye
# is not attracted to the values close to 0
cmap = sns.diverging_palette(230, 20, as_cmap=True)

sns.heatmap(dists, mask=mask, cmap=cmap, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x7f9c32a1b450>

png