Writing the Significance article Picturing the Pictures would not have been possible without the data available from Barry Salt’s database on the Cinemetrics web site. The site is a rich source of data that others may find interesting.
Cinemetrics has been defined by Yuri Tsivian as ‘an open-access interactive website designed to collect, store, and process digital data related to film editing’. This manifests itself, in one database, as information on over 10,000 films in the form of ordered shot lengths that can, equivalently, be represented as cut-points in a film.
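To make the equivalence between ordered shot lengths and cut-points concrete, here is a minimal sketch in Python. The shot lengths are made-up values in seconds, not taken from the Cinemetrics database, and the function names are hypothetical ones chosen for illustration. Cut-points are taken to be the cumulative times at which each shot ends, so the last one marks the end of the film.

```python
from itertools import accumulate


def shot_lengths_to_cut_points(shot_lengths):
    """Return the cumulative time at which each shot ends."""
    return list(accumulate(shot_lengths))


def cut_points_to_shot_lengths(cut_points):
    """Recover shot lengths as differences between successive cut-points."""
    starts = [0.0] + list(cut_points[:-1])
    return [end - start for start, end in zip(starts, cut_points)]


# Illustrative (made-up) shot lengths in seconds, in screen order.
shot_lengths = [4.5, 2.0, 7.5, 3.0]

cut_points = shot_lengths_to_cut_points(shot_lengths)
recovered = cut_points_to_shot_lengths(cut_points)

print(cut_points)
print(recovered)
```

Either representation carries the same information, so the choice between them is a matter of convenience for the analysis in hand.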
Ignoring the time-series structure, questions have arisen as to whether the distribution of shot lengths tends to be lognormal, and how best to summarise and compare distributions. On a log-transformed scale, normality and most forms of deviation from it are manifest, so the data are ideal for exploring both informal graphical and formal hypothesis-testing approaches for assessing normality. The potential for providing examples to play with in introductory statistics teaching is obvious.
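As one possible teaching illustration of the kind suggested above, the sketch below computes the correlation underlying a normal quantile-quantile plot, before and after log transformation. It is only an assumption-laden example: the shot lengths are simulated from a lognormal distribution with arbitrary parameters rather than drawn from Cinemetrics, the function name is hypothetical, and the Q-Q correlation is just one informal summary among many, not a method taken from the paper.

```python
import math
import random
from statistics import NormalDist, mean


def normal_qq_correlation(values):
    """Correlation between the sorted data and standard normal quantiles.

    Values close to 1 indicate a nearly straight normal Q-Q plot.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n for i = 0, ..., n - 1.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_x, mean_q = mean(ordered), mean(quantiles)
    sxy = sum((x - mean_x) * (q - mean_q) for x, q in zip(ordered, quantiles))
    sxx = sum((x - mean_x) ** 2 for x in ordered)
    sqq = sum((q - mean_q) ** 2 for q in quantiles)
    return sxy / math.sqrt(sxx * sqq)


# Simulated shot lengths (seconds); the parameters are arbitrary assumptions.
random.seed(1)
shot_lengths = [random.lognormvariate(1.5, 0.8) for _ in range(500)]

r_raw = normal_qq_correlation(shot_lengths)
r_log = normal_qq_correlation([math.log(x) for x in shot_lengths])

print(f"Q-Q correlation, raw scale: {r_raw:.3f}")
print(f"Q-Q correlation, log scale: {r_log:.3f}")
```

With real shot-length data, a log-scale correlation noticeably below 1 would point to a departure from lognormality that a plot could then be used to characterise.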
At a different level, problems are posed in taking into account the time-series information available. For those interested in film and/or what might be done along these lines, Tsivian provides a fascinating analysis of the cutting structure in D.W. Griffith’s seminal Intolerance (1916). This exposes interpretable patterns across the film as a whole, and contrasting patterns within the four stories, themselves intercut, that constitute the film. Time-series analysis is not my area, but it seems to me that fruitfully applying its ideas to explore cutting patterns both within films and across films is challenging.
In another vein, research has been undertaken to investigate whether localised patterns of cutting within films act as markers that subdivide the film into ‘acts’ analogous to those familiar from theatrical productions, and whether three- or four-act structures are common. I think the question is undecided.
My paper used Barry Salt’s database, specifically the section on shot scale data. He has also made available data on camera movement and average shot lengths (ASLs) for several thousand films. ASLs are the aspect of film quantification that has attracted most attention to date, and how they can be exploited statistically, categorised by period and nationality for example, is illustrated in Salt’s books referenced in the paper.
Pattern recognition is at the heart of a considerable body of statistical methodology. It requires quantification, and establishing quantification as a valuable approach to scholarly study in fields where it does not come ‘naturally’ is a hurdle that often has to be overcome. Cinemetrics is taking important steps in this direction in what may seem, to some, to be an unlikely area, and I hope this note may persuade other statisticians to take an interest.
Yuri Tsivian’s article What is cinema? An agnostic answer, which discusses the genesis of cinemetrics and its applications, can be found via Google Scholar. The site is located at http://www.cinemetrics.lv/ and a debate on statistical issues, on which comment is welcome, has recently been initiated at http://www.cinemetrics.lv/dev/on_statistics.php