  • gittra

CS401
ADA EPFL
2022

7th ART & BOOKS

BASED ON REAL DATA

The global film industry was worth $136 billion in 2018. Often called the "7ème art" (the seventh art), it is an anchor of entertainment and soft power. Indeed, in 1946, it was even the subject of a political agreement between France and the USA, known as the Blum-Byrnes agreement.

In recent years, we have witnessed a growing number of films inspired by books. We were therefore particularly interested in investigating the influence of books on the cinema industry through the following questions.

 

 

What is our data, and how are the movies connected to books?

Our story starts with the CMU dataset, which contains 81'471 movies, 42'000 of which come with plot summaries. We enriched it with the following additional information:

- Movie subtitles (3’000 movies annotated with English subtitles).

- Movie review scores and movie awards (17’000 movies with Rotten Tomatoes scores).

- Book review scores and book awards (3'300 book ratings from Amazon and Goodreads).

 

To connect movies and books, we used WikiData and collected 7’000 films that are based on books with their book labels.
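As a rough sketch of how such a lookup could be phrased, the snippet below builds a SPARQL query for the Wikidata query service. It is not the query used in this project, which is not shown here. It assumes the Wikidata property "based on" (P144), the class "film" (Q11424), and the Freebase ID property (P646) as a possible join key with the CMU metadata; the function name `build_based_on_query` is hypothetical. Filtering the sources down to books would be a separate step.

```python
def build_based_on_query(limit=100):
    """Return a SPARQL query string for films that carry a 'based on' statement.

    Sketch only: the property and class identifiers are assumptions about
    how the link could be expressed, not the project's actual query.
    """
    return f"""
SELECT ?film ?filmLabel ?source ?sourceLabel ?freebaseId WHERE {{
  ?film wdt:P31/wdt:P279* wd:Q11424 .          # instance of (a subclass of) film
  ?film wdt:P144 ?source .                     # based on
  OPTIONAL {{ ?film wdt:P646 ?freebaseId . }}  # Freebase ID, a possible join key
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


query = build_based_on_query(limit=10)
print(query)
```

The resulting string can be pasted into the Wikidata query service or sent with any HTTP client.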

  • Does our dataset represent the world cinema industry?

A closer look shows that India, the most prolific movie producer in the world, ranks only 13th in our dataset, while the USA, the UK, and France make up the top three. Our dataset is therefore mostly representative of the Western cinema industry.

top-countries.png

  • What is the impact of books in the cinema?

Cinema is an industry with huge fees and enormous money flows. Using our dataset, we looked at the biggest commercial successes in cinema history and found that 14 of the 15 highest-grossing movies are based on books!

revenues.png
  • Are books critical to a successful movie?

 

To find out, we collected film reviews from Rotten Tomatoes. For clarity, we framed the question in causal terms, treating the property “based on a book” as the treatment and the review score as the outcome variable. After propensity score matching, films based on books get higher review scores on average than the others. An interesting outcome!
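The matching step can be illustrated with a minimal sketch. It assumes that a propensity score (the estimated probability of being based on a book) has already been computed for every movie, and it uses greedy one-to-one nearest-neighbour matching with a caliper, which may differ from the procedure actually used here. All names and numbers below are invented for illustration.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1-to-1 nearest-neighbour matching on a precomputed propensity score.

    treated, control: lists of (movie_id, propensity, review_score) tuples.
    Returns a list of (treated_row, control_row) pairs.
    """
    available = list(control)
    pairs = []
    for t in sorted(treated, key=lambda row: row[1]):
        if not available:
            break
        # Closest remaining control movie in terms of propensity score.
        best = min(available, key=lambda c: abs(c[1] - t[1]))
        if abs(best[1] - t[1]) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # match without replacement
    return pairs


def mean_matched_difference(pairs):
    """Average (treated - control) review score over the matched pairs."""
    return sum(t[2] - c[2] for t, c in pairs) / len(pairs)


# Toy data, invented for illustration only.
treated = [("t1", 0.30, 7.0), ("t2", 0.55, 6.5), ("t3", 0.90, 8.0)]
control = [("c1", 0.28, 6.0), ("c2", 0.57, 6.5), ("c3", 0.10, 5.0)]

pairs = match_on_propensity(treated, control, caliper=0.05)
effect = mean_matched_difference(pairs)
print(len(pairs), effect)
```

In this toy example, "t3" stays unmatched because no control movie has a propensity score within the caliper, which is the point of matching: only comparable movies are compared.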

Do films based on books get higher movie review scores?

 

Yes, according to a t-test.
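For readers who want to see what such a test computes, here is a small sketch of the two-sample t statistic. It assumes Welch's unequal-variance variant, which the text does not specify, and the scores are toy values. A statistics library such as SciPy (`scipy.stats.ttest_ind`) would also return the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy review scores, invented for illustration only.
book_based = [7.0, 8.0, 6.0, 7.0]
other = [5.0, 6.0, 4.0, 5.0]

t_stat, dof = welch_t(book_based, other)
print(round(t_stat, 3), round(dof, 1))
```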

Can we predict the movie review score from the plot, and by how much does being based on a book improve the review score?

  • Predicting review scores based on the movie plot.

We saw that movies inspired by books have higher review scores. However, we usually choose a movie depending on its plot and description. We decided to check how correct it is to judge a film by its story. To do this, we first tried to predict the movie review score from the movie summaries.

Given the complexity of aggregating textual information, we got pretty good results. The table shows that, on average, our best model is off by only about 1 point and reaches an R-squared of 0.27.

To visualize the results, we display our model's predictions for the most frequent genre in the test subsample (Drama, Pearson correlation ≈ 0.52).
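The metrics mentioned above can be computed as in the sketch below. It assumes that "1 point wrong" refers to the mean absolute error, which the text does not state explicitly (it could also be a root mean squared error). The scores are toy values.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance of y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Toy scores, invented for illustration only.
y_true = [4.0, 6.0, 8.0, 6.0]
y_pred = [5.0, 6.0, 7.0, 6.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
r = pearson(y_true, y_pred)
print(mae, r2, r)
```

Note that a high correlation does not imply a high R-squared: in the toy data the predictions are perfectly correlated with the true scores but too compressed, so the R-squared stays below 1.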

  • In addition to the previous point, how much does the fact that a movie is based on a book improve the score compared to other movies?

We decided to test whether the fact that a movie is based on a book improves its review score. Films based on books may inspire more trust among viewers, and they often turn out better thanks to a more detailed story. We used the movie summaries together with the book-based variable to predict review scores, and the latter turned out to be significant in the predictions. Indeed, the table shows that movies based on a book receive an additional 0.23 points.
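To make the role of the book-based variable concrete, here is a minimal sketch of a linear regression with a 0/1 dummy. It is an illustration under strong assumptions: `plot_feature` is a hypothetical single number standing in for whatever text features were extracted from the summaries, and the toy targets are constructed with a planted effect of 0.23 so that the fit can be checked. It does not reproduce the model or the data behind the reported table.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    """Least-squares coefficients; an intercept column is added here."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy rows: (plot_feature, based_on_book). Both are hypothetical.
rows = [(0.0, 0), (1.0, 1), (0.5, 0), (0.2, 1), (0.8, 0)]
# Targets built with a planted dummy effect of 0.23 (toy construction).
targets = [5.0 + 2.0 * p + 0.23 * b for p, b in rows]

intercept, plot_coef, book_coef = fit_ols(rows, targets)
print(round(intercept, 2), round(plot_coef, 2), round(book_coef, 2))
```

The coefficient of the dummy is read as the average score difference between book-based and other movies once the plot feature is held fixed.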

Can the vocabulary used in book-based movies be the secret of their success?

As we have seen, movies based on books are more successful than other movies, and movie plots can predict movie scores. Let's examine movie subtitles to see whether speech differs between book-based movies and the rest. Maybe this is the secret of their success?

  • What are the most representative words used in book-based movies and in other movies?
     

If we look at the most important words in movies based on books and in movies not based on books, we hardly notice a difference, although the words from movies based on books seem slightly more complex.

Movies based on books

Movies not based on books

11.png
  • What if we divide movies by genre? Will there be a noticeable difference?
     

Again, the difference is not very noticeable, although the words on the right (movies not based on books) are often more colloquial and slangy, while the words on the left (movies based on books) are more literary.

An interesting difference can be seen in the Adventure genre. Movies based on books lean more towards fantasy (magic, wizards, pirates), while movies not based on books are more realistic (the Middle Ages, the Wild West).

22.png

Movies based on books

Movies not based on books

  • Most successful and unsuccessful movies.
     

Let's divide the movies differently and look for differences between the most and the least successful films, that is, the movies with the highest and the lowest ratings.

We look at roughly the 10% most successful movies and the 10% least successful ones.

Here the difference in word style is more noticeable: on the right (films not based on books) the words are colloquial, while on the left (films based on books) the words are rarer, more elegant, and more literary.

Movies based on books

Movies not based on books

  • The complexity of words and their uniqueness
     

Let's try to quantify the difference in vocabulary.

We look at two charts: the number of unique words and the word complexity.

The left plot shows the average number of unique words for segments of a given length.
Book-based movies have slightly more unique words, but the difference is insignificant.
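One way to compute such a measure is sketched below. It assumes that the subtitles are split into consecutive, non-overlapping segments of a fixed number of tokens and that an incomplete final segment is dropped; the exact tokenization and segmentation behind the plot are not described here. The sentence is a made-up example.

```python
def mean_unique_words(tokens, segment_length):
    """Average number of distinct tokens over consecutive full segments."""
    counts = [
        len(set(tokens[start:start + segment_length]))
        for start in range(0, len(tokens) - segment_length + 1, segment_length)
    ]
    if not counts:
        raise ValueError("text is shorter than one segment")
    return sum(counts) / len(counts)


# Made-up subtitle line, lowercased and split on whitespace.
tokens = "the wizard raised the wand and the pirate raised the flag".split()

# Two full segments of 4 tokens; the last 3 tokens are dropped.
avg_unique = mean_unique_words(tokens, segment_length=4)
print(avg_unique)
```

Using fixed-length segments keeps the comparison fair, because the raw number of unique words grows with the length of the text.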

The right plot shows the average word difficulty for 3 datasets: all movies, the 10% most successful, and the 10% least successful.


As we can see, the situation is even reversed here: movies based on books use slightly more accessible and common vocabulary (the larger the number, the more familiar and therefore the easier the word). However, the difference is again insignificant, and it does not matter whether the movie is successful: all values are very similar.
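A score of this kind can be averaged as in the following sketch. The lookup table `familiarity` and its values are hypothetical: the text only states that a larger number means a more familiar word, not which frequency scale was used. Words missing from the table are skipped here, which is also an assumption.

```python
def mean_familiarity(tokens, familiarity):
    """Average familiarity score over the tokens found in the lookup table.

    Higher values mean more common, easier words. Unknown tokens are skipped.
    """
    scores = [familiarity[token] for token in tokens if token in familiarity]
    if not scores:
        raise ValueError("no token has a familiarity score")
    return sum(scores) / len(scores)


# Hypothetical scores, invented for illustration only.
familiarity = {"the": 7.0, "house": 5.0, "wizard": 3.0}

avg_familiarity = mean_familiarity(["the", "house", "wizard", "xyzzy"], familiarity)
print(avg_familiarity)
```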

Summing up the analysis of subtitles, there is a slight difference in the speech used in movies based on books compared to other films.

Specifically, the word clouds suggest that movies based on books use a slightly more literary vocabulary and contain less slang and fewer swear words, although the numerical measures do not confirm a more complex vocabulary. The difference is small, so it remains unclear whether the style of speech is a significant factor in the success of these movies. Other elements, such as the plot, the acting, the direction, and the marketing, might be more critical in determining the success of films based on books.

So far, we have mainly investigated the movie dataset and how its features relate to movie scores, but...

Are books more appreciated than their movie counterparts? 

We all know someone (perhaps ourselves) who says something like:

 

 

But is that really true? Before diving into this question, let's first take a look at book and movie awards and how they relate to their respective scores. Scores were normalized to range from 0 to 10, and 602 exact title matches between movies and books were found.
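The two preparation steps can be sketched as follows. The sketch assumes a simple linear rescaling and assumes original ranges of 0 to 100 for movie scores and 1 to 5 for book ratings; neither the normalization formula nor the original scales are given in the text. Titles and scores are placeholders.

```python
def rescale(score, low, high):
    """Linearly map a score from the range [low, high] to [0, 10]."""
    return 10.0 * (score - low) / (high - low)


def match_exact_titles(movies, books):
    """Pair movies and books whose titles are exactly identical.

    movies, books: dicts mapping title -> normalized score.
    Returns a dict mapping title -> (movie_score, book_score).
    """
    return {
        title: (movies[title], books[title])
        for title in movies.keys() & books.keys()
    }


# Placeholder titles and scores, invented for illustration only.
movies = {"Title A": rescale(80, 0, 100), "Title B": rescale(55, 0, 100)}
books = {"Title A": rescale(4.0, 1, 5), "Title C": rescale(3.0, 1, 5)}

matched = match_exact_titles(movies, books)
print(matched)
```

Exact matching is strict: a movie whose title differs from the book's title by a subtitle or an article would be missed.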

  • Scores and Awards

 

Left: movie awards and scores. Right: book awards and scores.

On the left graph, we notice that movies with higher scores are more likely to receive an award, which makes sense. On the right graph, we observe that the most successful and awarded books are more likely to be turned into a movie.

Now, let's investigate the most intriguing question.

Books clearly have higher review scores (around 8 on average) than their movie counterparts (around 6.5 on average).

So, are books really better than their movies?

 

Yes, according to a paired t-test, books have higher ratings than their movie counterparts.
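A paired test works on the score difference within each book and movie pair rather than on two independent groups. The sketch below computes the paired t statistic on toy values; a statistics library such as SciPy (`scipy.stats.ttest_rel`) would also return the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Toy normalized scores for four matched pairs, invented for illustration only.
book_scores = [8.0, 9.0, 7.0, 8.0]
movie_scores = [6.0, 8.0, 6.0, 5.0]

t_paired, dof_paired = paired_t(book_scores, movie_scores)
print(round(t_paired, 3), dof_paired)
```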

CONCLUSION

bvs_edited.png
  1. The biggest box-office successes (with the exception of Titanic) are inspired by books.

  2. Movies based on books get higher review scores.

  3. The movie plot helps predict the review score. In addition, the dummy variable 'Based on books' has a significant effect on the review score (an additional 0.23 points).

  4. Word clouds suggest that movies based on books use more literary words than those that are not book-based, although the numerical differences are small.

  5. Successful books are more likely to be turned into a movie.

  6. Even though movies inspired by books score higher than other movies, they cannot compete with their book counterparts: books have higher scores than their respective movies.
