November 21, 2019

How User Rating Scores Correlate with Audience Content Preferences

Data Analysis Netflix Python
Written during BloomTech (formerly Lambda School), 2019. This is where the data journey started.

Most of us consume content on a variety of platforms and in different ways. Some prefer old-fashioned shows, while others seek out new or trending titles. That brings us to one of the most popular platforms, Netflix, which has upended the status quo and changed the way we watch.

As a former filmmaker and aspiring data scientist, I was curious about the variety of content available to viewers and the reasoning behind how that content is selected and suggested to them.

The Dataset

The dataset was built from Netflix's suggestion engine, since collecting 1,000 shows individually would have taken far too long. On inspection, more than half of the entries turned out to be duplicates, leaving 495 unique titles to examine.
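
The de-duplication step can be sketched as follows. This is a minimal illustration rather than the original notebook code: the `title` field name and the sample rows are assumptions, and it sticks to the standard library (with pandas, `DataFrame.drop_duplicates` does the same job).

```python
def drop_duplicate_titles(rows, key="title"):
    """Keep the first row seen for each title, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize lightly so "Show B" and "show b " count as the same title.
        title = row[key].strip().lower()
        if title not in seen:
            seen.add(title)
            unique.append(row)
    return unique


# Placeholder rows for illustration only (not taken from the dataset).
sample_rows = [
    {"title": "Show A", "rating": "TV-14"},
    {"title": "Show B", "rating": "PG"},
    {"title": "Show A", "rating": "TV-14"},
    {"title": "show b ", "rating": "PG"},
]

unique_rows = drop_duplicate_titles(sample_rows)
print(f"{len(sample_rows)} rows -> {len(unique_rows)} unique titles")
```

Matching on a normalized title is a design choice: it catches near-identical entries, but it would also merge two genuinely different shows that happen to share a name.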

Rating Distribution

First, I sought to determine whether Netflix had a diverse library of content across a range of ratings. Within the dataset, I discovered twelve categorical ratings, ranging from G to TV-MA.

The rating system breaks the audience down into four categories: Little Kids (G, TV-Y, TV-G), Older Kids (PG, TV-Y7, TV-Y7-FV, TV-PG), Teens (PG-13, TV-14), and Mature (R, NC-17, TV-MA).

Age Group Correlation

I added an audience column to the dataset, a simple piece of feature engineering that links each show's rating to a specific age group. This made it possible to analyze the correlation between audience age group and user rating scores.
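
A minimal sketch of that step, assuming each row is a dictionary with hypothetical `rating` and `user_rating_score` fields (the field names and sample scores are illustrative, not taken from the dataset). It maps each rating to the audience groups listed earlier, then averages scores per group as a simple first look before any formal correlation measure.

```python
from collections import defaultdict
from statistics import mean

# Audience groups and their ratings, as listed in the Rating Distribution section.
AUDIENCE_RATINGS = {
    "Little Kids": ["G", "TV-Y", "TV-G"],
    "Older Kids": ["PG", "TV-Y7", "TV-Y7-FV", "TV-PG"],
    "Teens": ["PG-13", "TV-14"],
    "Mature": ["R", "NC-17", "TV-MA"],
}

# Invert into a rating -> audience lookup (twelve ratings in total).
RATING_TO_AUDIENCE = {
    rating: audience
    for audience, ratings in AUDIENCE_RATINGS.items()
    for rating in ratings
}


def add_audience(rows):
    """Return copies of the rows with an added 'audience' field."""
    return [
        {**row, "audience": RATING_TO_AUDIENCE.get(row["rating"], "Unknown")}
        for row in rows
    ]


def mean_score_by_audience(rows):
    """Average user rating score per audience group, skipping missing scores."""
    scores = defaultdict(list)
    for row in rows:
        if row["user_rating_score"] is not None:
            scores[row["audience"]].append(row["user_rating_score"])
    return {audience: mean(values) for audience, values in scores.items()}


# Made-up rows for illustration only.
sample_rows = [
    {"title": "Show A", "rating": "TV-MA", "user_rating_score": 90},
    {"title": "Show B", "rating": "R", "user_rating_score": 80},
    {"title": "Show C", "rating": "TV-Y", "user_rating_score": 70},
    {"title": "Show D", "rating": "PG-13", "user_rating_score": None},
]

labeled = add_audience(sample_rows)
print(mean_score_by_audience(labeled))
```

Ratings that fall outside the twelve known values are labeled "Unknown" instead of raising an error, so unexpected data shows up in the output rather than stopping the analysis.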

One might question how a score submitted by a minor can be validated. Since this dataset was created, Netflix has introduced a thumbs up/thumbs down system that makes rating simpler.

Content Popularity

I used a word cloud to visualize the most popular titles within the dataset. The result suggests that Netflix does well at meeting consumer demand by keeping the titles viewers want in its library.
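
A word cloud sizes each entry by a weight, typically a frequency count. As a rough sketch, assuming the weight is simply how often a title appears in the raw list (an assumption for illustration; the titles below are placeholders), the counts can be built with the standard library. The third-party `wordcloud` package can then render such a mapping through `WordCloud.generate_from_frequencies`.

```python
from collections import Counter


def title_frequencies(titles):
    """Count how often each title appears, ignoring case and extra spaces."""
    return Counter(title.strip().lower() for title in titles)


# Placeholder titles for illustration only.
raw_titles = ["Show A", "Show B", "show a", "Show C", "Show A "]
frequencies = title_frequencies(raw_titles)

# The highest counts would be drawn largest in the cloud.
for title, count in frequencies.most_common(2):
    print(title, count)
```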

Figure: Netflix word cloud visualization

Conclusion

Working with this dataset was fascinating, particularly given my past career in motion pictures. It became clearer how studios select which projects to move forward with, and why they stick to these models to make prediction easier and streamline the selection process. For a platform like Netflix, a vast and varied library lets audiences watch on demand, keeping subscribers engaged and satisfied.

Original article on Medium · View the code
