Odds of winning an Oscar? 1 in 11,500.

As a child, I was captivated by the magic of movies. One of my earliest memories is watching Cary Grant dodge a crop duster in “North by Northwest,” a film that was nominated for three Academy Awards in 1959 but ultimately came up empty-handed. As a former documentary filmmaker, I’ve always been curious about what it takes to win a golden statue.


“You can’t approach baseball from a statistical bean-counting point of view, it’s won on the field with fundamental play, you have to steal you have to bunt, you have to sacrifice, you got to get men in scoring position, and you got to bring them in. You don’t do that with a bunch of statistical gimmicks. Nobody reinvents this game.” ~ from the motion picture “Moneyball,” nominated for six Oscars.

There is a lot to be said about this quote, adapted from Moneyball: The Art of Winning an Unfair Game. As a data scientist, I have come to appreciate that predicting potential outcomes in the movie industry takes more than just machine learning and AI. It requires the expertise of talented individuals who can write a great screenplay, someone with a vision, and a crew that can handle pre-production, principal photography, and post-production to bring the film to the finish line. This process requires a human touch that AI can never fully replicate. Making a motion picture is a complex art that involves a wide range of skills, and I have firsthand experience of just how much hard work and dedication goes into it.

“In feature films the director is God; in documentary films, God is the director.” ~ Alfred Hitchcock

So, I decided to use my skills as a data scientist to predict the winners of the Oscars. I gathered data from other award ceremonies, critics’ scores, and historical records spanning 1980 to 2016. Before running anything through models, I familiarized myself with the data firsthand by analyzing correlations between awards, release dates, ratings, and running time.
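As a rough illustration of that correlation step, here is a minimal sketch using only the Python standard library. The column names (`golden_globe_win`, `running_time`, `oscar_win`) and every row are made up for the example; they are assumptions, not my actual dataset or the exact method I used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)

# Made-up rows for illustration only (hypothetical column names).
films = [
    {"golden_globe_win": 1, "running_time": 130, "oscar_win": 1},
    {"golden_globe_win": 0, "running_time": 95, "oscar_win": 0},
    {"golden_globe_win": 1, "running_time": 123, "oscar_win": 1},
    {"golden_globe_win": 0, "running_time": 110, "oscar_win": 0},
    {"golden_globe_win": 0, "running_time": 141, "oscar_win": 0},
    {"golden_globe_win": 1, "running_time": 118, "oscar_win": 0},
]

# Correlate each candidate feature with the Oscar outcome.
oscar = [film["oscar_win"] for film in films]
correlations = {
    column: pearson([film[column] for film in films], oscar)
    for column in ("golden_globe_win", "running_time")
}

for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

A correlation like this only hints at which features deserve a closer look; it says nothing about causation.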


The data came from award ceremonies such as the BAFTAs and the guild awards, along with critics’ scores and historical records.

“A total of 3,096 Oscar statuettes have been awarded from the inception of the award through the 91st ceremony.”

Winners are chosen from 24 categories. I selected eight out of the 24 for my model, as shown below:

[Figure: the eight categories selected for the model]

I left out the technical awards. After analyzing the data and running it through models, I found that the three awards that best anticipate the Oscars are the BAFTAs, the Golden Globes, and the guild awards. Releasing a motion picture in Q4 or Q1 is also advantageous since it’s close to award season and helps during the campaigning process.

[Figure: precursor award wins compared with Oscar wins]

When observing this chart, it becomes evident that the Golden Globes may be a significant factor in predicting the Oscars. However, let’s further analyze the data to determine if this hypothesis remains consistent.
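One simple way to check a hypothesis like this is to count how often each precursor award picked the same winner as the Academy. The sketch below is an assumption about how such a check could look, not the model I ran; the film titles are placeholders and the field names (`oscar`, `bafta`, `golden_globe`, `guild`) are hypothetical.

```python
def agreement_rate(rows, precursor):
    """Share of years in which the precursor award's winner also won the Oscar."""
    # Skip years where the precursor award has no recorded winner.
    matches = [row[precursor] == row["oscar"] for row in rows if row.get(precursor)]
    if not matches:
        raise ValueError(f"no data recorded for precursor {precursor!r}")
    return sum(matches) / len(matches)

# Placeholder history: one row per ceremony year, one winner per award.
history = [
    {"oscar": "Film A", "bafta": "Film A", "golden_globe": "Film B", "guild": "Film A"},
    {"oscar": "Film C", "bafta": "Film C", "golden_globe": "Film C", "guild": "Film D"},
    {"oscar": "Film E", "bafta": "Film F", "golden_globe": "Film E", "guild": "Film E"},
    {"oscar": "Film G", "bafta": "Film G", "golden_globe": "Film H", "guild": "Film G"},
]

for award in ("bafta", "golden_globe", "guild"):
    print(f"{award}: {agreement_rate(history, award):.0%} agreement with the Oscars")
```

With real data, a precursor whose agreement rate stays high across decades is a stronger signal than one that only looks good in a single chart.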

I also investigated the potential impact of the release quarter on the likelihood of winning an Oscar, as shown in the following image:

[Figure: count plot of Oscar wins by release quarter]

In the count plot above, one can see that there are benefits to releasing a motion picture in Q4 and Q1, which makes a lot of sense since it’s close to award season. Releasing a movie during this time helps with the campaigning process while it’s still fresh in the minds of the Academy members. Additionally, most of the films that go on to win an Oscar are “R” rated films, as seen in the chart below.
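The numbers behind a release-quarter count plot like the one discussed above can be tallied in a few lines. This is a minimal sketch with made-up titles and release dates; the helper name `release_quarter` is my own for this example.

```python
from collections import Counter
from datetime import date

def release_quarter(release_date):
    """Map a release date to its calendar quarter label, e.g. 'Q4'."""
    return f"Q{(release_date.month - 1) // 3 + 1}"

# Hypothetical winners with made-up release dates (illustration only).
winners = [
    ("Film A", date(2001, 12, 14)),
    ("Film B", date(2003, 1, 24)),
    ("Film C", date(2005, 11, 4)),
    ("Film D", date(2007, 6, 15)),
]

# Count winners per quarter; these counts are what a count plot would draw.
wins_by_quarter = Counter(release_quarter(released) for _, released in winners)

for quarter in ("Q1", "Q2", "Q3", "Q4"):
    print(quarter, wins_by_quarter[quarter])
```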

[Figure: Oscar wins by film rating]

According to my analysis, a motion picture with a running time of 123 minutes has the highest chance of winning an Oscar.

[Figure: running time of Oscar-winning films]

Based on the current information, those who are planning to produce a film with the goal of winning an Oscar should consider making a drama film with a 123-minute runtime, and aim to release it in either the first or fourth quarter. This may increase their chances of success, albeit still with a probability of one in 11,500.

I also performed feature engineering for this project: I added a “wins” column and extracted the films nominated in each category, then grouped them by nominations and wins for further analysis.
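Here is a minimal sketch of what that “wins” column and grouping could look like in plain Python. The record layout (`film`, `category`, `winner`) and the rows are assumptions made up for the example; my actual data is structured differently.

```python
from collections import defaultdict

# Made-up nomination records (hypothetical layout, illustration only).
nominations = [
    {"film": "Film A", "category": "Best Picture", "winner": True},
    {"film": "Film A", "category": "Best Director", "winner": False},
    {"film": "Film B", "category": "Best Picture", "winner": False},
    {"film": "Film B", "category": "Lead Actress", "winner": True},
    {"film": "Film B", "category": "Original Screenplay", "winner": True},
]

def add_wins_column(records):
    """Return copies of the records with an integer 'wins' column (1 = won, 0 = nominated only)."""
    return [{**record, "wins": int(record["winner"])} for record in records]

def totals_by_film(records):
    """Group records by film and total their nominations and wins."""
    totals = defaultdict(lambda: {"nominations": 0, "wins": 0})
    for record in records:
        totals[record["film"]]["nominations"] += 1
        totals[record["film"]]["wins"] += record["wins"]
    return dict(totals)

summary = totals_by_film(add_wins_column(nominations))

for film, counts in summary.items():
    print(film, counts)
```

Turning the win flag into a number makes it easy to sum per film and to feed into a model later.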


These predictions draw on many variables, so the output can vary widely depending on the data collected from the award shows leading up to the Oscars. After thoroughly analyzing the data, I ran the model on the remaining features.

While conducting research for this project, I came across a potential candidate that may go on to win Best Picture. This film has landed three crucial guild nominations, which indicate strong support from multiple quadrants across the Academy: SAG’s Cast in a Motion Picture, PGA, and DGA.

Despite having limited time, I plan to continue exploring and gathering more data. The model I developed predicts a 55% probability for this film to win Best Picture.

The top 3 reasons for this prediction include:

  1. The Rotten Tomatoes audience score is 96, indicating widespread popularity and positive reviews.

  2. The film has achieved box office success, grossing 389,900,000.

  3. The star count is 6, indicating strong performances and acting talent.

See the chart below for a visual representation of these factors:

[Figure: top factors behind the Best Picture prediction]

Fans can have an impact by supporting their favorite films at the box office. The Academy may have a significant influence, but ultimately, ticket sales can also make a difference. As shown in the partial dependence plot (PDP), box office performance is a significant factor that can affect the outcome of the Oscars. You can view the relationship between box office performance and the probability of winning in this graph:

[Figure: partial dependence plot of box office performance against the probability of winning]
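For readers unfamiliar with partial dependence, the idea is to fix one feature at a given value for every film, average the model’s predictions, and repeat across a grid of values. The sketch below computes that by hand. The `toy_model` function, its coefficients, and the rows are hypothetical stand-ins, not my trained model or its data.

```python
from math import exp

def toy_model(row):
    """Hypothetical stand-in for a trained classifier: returns a win probability."""
    score = -3.0 + 0.004 * row["box_office_millions"] + 0.02 * row["audience_score"]
    return 1.0 / (1.0 + exp(-score))

def partial_dependence(model, rows, feature, grid):
    """For each grid value, set `feature` to that value in every row and average the predictions."""
    if not rows:
        raise ValueError("need at least one row")
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

# Made-up films (illustration only).
rows = [
    {"box_office_millions": 50, "audience_score": 70},
    {"box_office_millions": 200, "audience_score": 85},
    {"box_office_millions": 390, "audience_score": 96},
]

curve = partial_dependence(toy_model, rows, "box_office_millions", [0, 100, 200, 300, 400])

for value, average_probability in curve:
    print(f"box office {value}M -> average predicted probability {average_probability:.3f}")
```

Plotting the grid values against the averaged probabilities gives the PDP curve. Because the toy model has a positive box office coefficient, its curve rises with ticket sales by construction.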

The model predicts that “The Irishman” will win the Adapted Screenplay category, while Noah Baumbach’s “Marriage Story” will take home the Original Screenplay award. Joaquin Phoenix is predicted to win Best Actor for his role in “Joker,” while Renee Zellweger is predicted to win Lead Actress for “Judy.” Brad Pitt is predicted to win Supporting Actor for “Once Upon a Time in… Hollywood,” and Laura Dern is predicted to win Supporting Actress for “Marriage Story.” Sam Mendes is predicted to win Best Director for “1917,” and “1917” is predicted to win Best Picture.

While my model provides a glimpse into who might win, it’s not a sure thing. There’s more to winning an Oscar than statistics and machine learning. It takes someone with a vision, a great screenplay, and a talented crew in pre-production, principal photography, and post-production to make a motion picture. Nonetheless, I had a blast working on this project, and I will continue exploring the intersection of film and data science.

So, mark your calendars for February 9, 2020, at 6:30 PM EST, and see how my predictions stack up against the real winners.
