Music Recommendation Through the Lens of Emotion Detection

In today's digital age, personalized experiences are paramount. From tailored shopping suggestions to curated news feeds, we've come to expect technology to understand and cater to our individual preferences. Music is no exception. While traditional music recommendation systems often rely on listening history, genre preferences, or collaborative filtering, a more nuanced approach involves understanding the listener's current emotional state. This blog post explores the fascinating intersection of emotion detection and music recommendation, detailing how we built a system that recommends music based on how you're feeling.

Unveiling Emotions: The Foundation of Our System

The first crucial step in building an emotion-aware music recommendation system is accurately detecting human emotions. This is a complex task, as emotions are subtle and multifaceted. Our approach focused on analyzing facial expressions, a powerful indicator of a person's internal state. We utilized the FER2013 dataset, which is widely used for facial emotion recognition and contains images categorized into seven distinct emotions: angry, disgusted, fearful, happy, neutral, sad, and surprised.
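If you want to follow along, here is a minimal sketch of loading the dataset, assuming the Kaggle fer2013.csv release (emotion, pixels, Usage columns; 48x48 grayscale faces). The file name and label ordering shown are assumptions for the example and may differ from the exact files we used.

```python
import numpy as np
import pandas as pd

# FER2013 label indices in the Kaggle release (assumed here):
# 0=angry, 1=disgusted, 2=fearful, 3=happy, 4=sad, 5=surprised, 6=neutral.
EMOTIONS = ["angry", "disgusted", "fearful", "happy", "sad", "surprised", "neutral"]

def load_fer2013(csv_path="fer2013.csv"):
    df = pd.read_csv(csv_path)
    # Each row stores a 48x48 grayscale face as 2304 space-separated pixel values.
    faces = np.stack([
        np.asarray(p.split(), dtype="float32").reshape(48, 48, 1)
        for p in df["pixels"]
    ]) / 255.0
    labels = df["emotion"].to_numpy()   # integer class index, 0-6
    usage = df["Usage"].to_numpy()      # Training / PublicTest / PrivateTest split
    return faces, labels, usage
```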

We explored various methods for facial emotion recognition, primarily leveraging the power of Convolutional Neural Networks (CNNs). CNNs are a class of deep learning models particularly well-suited for analyzing visual data like images. We experimented with building custom CNN architectures from scratch, carefully designing layers and configurations to effectively learn features from facial images that correlate with different emotions.

Our initial attempt was a VGG-inspired architecture, with stacked convolutional blocks designed to learn facial features that correlate with different emotions. Trained on the FER2013 dataset, this model yielded an accuracy of 51.98%.
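To make "VGG-inspired" concrete, here is a minimal Keras sketch of the kind of stack we mean. The filter counts and layer sizes are illustrative assumptions, not a record of the exact configuration we trained.

```python
from tensorflow.keras import layers, models

def build_vgg_style_cnn(num_classes=7):
    # Repeated conv-conv-pool blocks, the pattern VGG popularised,
    # followed by a small dense classifier head.
    model = models.Sequential([
        layers.Input(shape=(48, 48, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```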

Building on this, we explored a custom ResNet-inspired model. Its residual (skip) connections make it practical to train a deeper network, and that extra depth helped it learn more complex facial features. Trained on the FER2013 dataset, this model reached a higher accuracy of 73.15%.
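The key ingredient is the residual block, where the block's input is added back to its output so gradients have a short path through the network. The Keras sketch below shows the general pattern; the filter counts are illustrative, not our exact model.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    # Two conv layers on the main path, identity (or 1x1 projection) on the skip path.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Match the channel count on the skip path if it differs.
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```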

In addition to custom models, we also investigated pre-trained models. These networks, trained on massive image-recognition datasets, have already learned to identify a wide range of visual patterns. We fine-tuned a pre-trained ResNet50, adapting its learned representations to facial emotion recognition by retraining its later layers to classify emotions from the features it already extracts. This approach proved the most effective, reaching a final accuracy of 92.8% and consistently outperforming our custom CNN architectures. The gap is likely explained by the vast amount of data the pre-trained model was initially exposed to, which gives it a robust understanding of visual hierarchies and features that are also relevant to facial expressions.
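A minimal Keras sketch of this fine-tuning setup is shown below, assuming faces are resized to 224x224 RGB for ResNet50's expected input; the unfreezing schedule and hyperparameters are placeholders rather than our exact recipe.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_finetuned_resnet50(num_classes=7):
    # ImageNet-pretrained backbone without its original classification head.
    base = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))
    base.trainable = False          # first train only the new head
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(base.input, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# After the new head converges, the later ResNet stages can be unfrozen and
# trained further with a small learning rate to adapt the pretrained features.
```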

Crafting the Soundtrack to Your Mood: The Music Recommendation Engine

With a reliable emotion detection system in place, the next step was to build a music recommendation engine that could leverage this emotional understanding. Our goal was to provide music that aligns with or helps to enhance the detected emotion.

Our music recommendation system was built around a curated collection of songs categorized by emotion. We compiled lists of songs associated with various emotional states, such as happy, sad, angry, neutral, fearful, disgusted, and surprised. This categorization was based on a combination of factors, including lyrical content, musical style, and common associations.
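Conceptually, the curated collection is just a mapping from an emotion label to candidate tracks. The sketch below uses placeholder titles, not our actual playlists.

```python
import random

# Placeholder tracks per emotion; the real curated lists are much longer.
SONGS_BY_EMOTION = {
    "happy":     ["Walking on Sunshine", "Happy"],
    "sad":       ["Someone Like You", "Fix You"],
    "angry":     ["Break Stuff", "Killing in the Name"],
    "neutral":   ["Clair de Lune", "Weightless"],
    "fearful":   ["Breathe Me", "Mad World"],
    "disgusted": ["Creep", "Boulevard of Broken Dreams"],
    "surprised": ["Bohemian Rhapsody", "Paranoid Android"],
}

def pick_song(emotion):
    # Fall back to the neutral list if an emotion has no curated songs.
    return random.choice(SONGS_BY_EMOTION.get(emotion, SONGS_BY_EMOTION["neutral"]))
```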

To access and play music, we integrated with a music streaming service using the Spotipy library, a Python library for the Spotify Web API. This allowed us to programmatically search for and play songs from our categorized lists based on the emotion detected.
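Here is a minimal Spotipy sketch of that integration: search for a track by name and start playback on the user's active device. It assumes Spotify API credentials are configured in the environment and the "user-modify-playback-state" scope is granted, and it omits error handling.

```python
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Assumes SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI
# are set in the environment.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-modify-playback-state"))

def play_track(track_name):
    results = sp.search(q=track_name, type="track", limit=1)
    items = results["tracks"]["items"]
    if items:
        # Playback requires an active Spotify device (e.g. the desktop client).
        sp.start_playback(uris=[items[0]["uri"]])
```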

The Harmony of Emotion and Music: Bringing It All Together

The true innovation of our system lies in the seamless integration of emotion detection and music recommendation. The process unfolds as follows:

First, the system captures a facial image of the user. This image is then fed into the trained emotion detection model. The model analyzes the facial features and outputs a predicted emotional state.
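A sketch of that capture-and-classify step is below, using OpenCV's bundled Haar cascade to crop the face. The 48x48 grayscale preprocessing matches the custom CNNs described earlier; this is an illustration of the idea rather than our exact pipeline, and the ResNet50 variant would resize to its own input size instead.

```python
import cv2
import numpy as np

# Must match the label order used during training (assumed here).
EMOTIONS = ["angry", "disgusted", "fearful", "happy", "sad", "surprised", "neutral"]
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_emotion(model):
    # Grab a single frame from the default webcam.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Crop the first detected face and preprocess it for the CNN.
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
    probs = model.predict(face.reshape(1, 48, 48, 1))
    return EMOTIONS[int(np.argmax(probs))]
```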

Once the emotion is detected (e.g., happy, sad, neutral), the system accesses the corresponding category of songs in our curated music collection. Using the Spotipy library, it then selects and plays a song from that category.
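Tying together the hypothetical helpers from the earlier sketches (detect_emotion, pick_song, play_track), the whole loop is only a few lines:

```python
# model is the trained emotion classifier loaded beforehand.
emotion = detect_emotion(model)
if emotion:
    play_track(pick_song(emotion))
```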

This creates a dynamic and personalized music listening experience. Instead of relying on static preferences, the system adapts to the user's current mood, offering a soundtrack that resonates with their feelings. Feeling down? The system might recommend uplifting tunes. Feeling stressed? Perhaps calming melodies are in order.

By combining the power of computer vision for emotion detection with a carefully curated music library and seamless integration with a music streaming service, we've created a system that goes beyond traditional recommendation methods, offering a truly emotionally intelligent music experience. This approach opens up exciting possibilities for more personalized and empathetic interactions between humans and technology.