Comparing book citations in humanities journals to library holdings:

Scholarly use versus 'perceived cultural benefit'.

Alesia Zuccala (1) and Raf Guns (2)

(1) Institute for Logic, Language and Computation, University of Amsterdam, P.O. Box 94242, Amsterdam, 1090 GE (The Netherlands)

(2) University of Antwerp, IBW, City Campus, Venusstraat 35, B-2000 Antwerpen (Belgium)


In this paper we examine the statistical relationship between citation counts to books referenced in SCOPUS humanities journals and library holding counts ('libcitations') retrieved from WorldCat®. Our focus is on books (with ISBN numbers) published between 2001 and 2006, which received citations in History and Literature & Literary Theory journals during the period 2007-2011. We used Spearman's rank correlation coefficient, and the test yielded significant correlations between citations and 'libcitations'. We present and discuss the details of our dataset (extracted from a much larger, newly constructed database) and comment on why the 'perceived cultural benefit' of holding a book in a research library can lead to, but does not necessarily lead to, use (i.e., a citation) of that book in new humanities research.

Conference Topic Scientometrics Indicators: Relevance to Humanities (Topic 1), Old and New Data Sources for Scientometric Studies: Coverage, Accuracy and Reliability (Topic 2), and Bibliometrics in Library and Information Science (Topic 3).

Introduction

Books or monographs published in the humanities capture the research efforts of scholars concerned with human achievements. These texts are as much a part of our cultural heritage as they are part of scholarship (Garfield, 1979). In books we observe the story of a research discipline, that is, how it has evolved in different regions, over a specific time period, and within a particular "interpretive" community (Fish, 1980). Although books are, for many humanities topics, the principal mode of output, little is known about their scholarly impact. Bibliometricians have been reluctant to approach the subject of impact, because it is normally associated with high citation counts to and from articles published in scientific journals covered by the Web of Science (e.g., the Journal Impact Factor). Some journals published in the humanities are amenable to impact factors (see Elsevier, 2010), but for the most part, these measures have been avoided in favour of general citation monitoring (Nederhof, 2006). Since the late 1970s, research has focused primarily on the characteristics of cited works in humanities texts, or on classifying citations to or from small monograph collections in disciplines where they are most prevalent (Budd, 1986; Cullars, 1985; 1989; 1998; Frost, 1979; Hammarfelt, 2012; Heinzkill, 1980; Hellqvist, 2010; Jones et al., 1972; Lindholm-Romantschuk & Warner, 1996; Nolan, 2010; Stern, 1983; Thomson, 2002).

For books in general, the absence of source metadata (i.e., internal identification codes) in the main commercial citation indices (i.e., Thomson Reuters’ Web of Knowledge and Elsevier SCOPUS) has made it difficult to develop reliable indicators. Books have always been recorded in the Thomson Reuters’ Web of Science (i.e., Science-SCI-E, Social Science-SSCI and Arts & Humanities-A&HCI) as ‘non-sourced’ cited materials, but some ‘book chapters’ and ‘books’ started to appear in all three indices as far back as 2005. Growth rates indicate that their appearance has occurred irregularly, and the classification of books in the Web of Science is problematic: many have been misclassified as articles or reviews (Leydesdorff & Felt, 2012). Thomson Reuters' new Book Citation Index (BKCI) is expected to be a more accurate resource for bibliometric analyses (Adams & Testa, 2011). With the introduction of this index, we have been promised a ‘complete picture’ (Thomson Reuters, 2013). Only research based on this new Book Citation Index can tell us how useful it will be for evaluating citation-based impacts over the long term.

While the new indices are still in production, some researchers from the bibliometrics community have been considering alternative ways to study the impact of books. Kousha and Thelwall (2009) confirm that there are substantial numbers of citations to academic books in Google Books and Google Scholar, which can help evaluate book-oriented disciplines. Torres-Salinas and Moed (2009) as well as White et al. (2009) have focused on the potential of library catalogues for impact-based analyses, where an analogy may be created between journal-based citations and library holdings. Torres-Salinas and Moed (2009) studied the number of catalogue inclusions per book title in WorldCat®, while White et al. (2009) introduced the term ‘libcitation’ as “an indicator of perceived cultural benefit” (p. 1087). Linmans (2010) later suggested that researchers use a three-level approach for assessing books, focusing on citation counts, library holdings, and productivity.

The present study is motivated by the contributions of Torres-Salinas and Moed (2009) and White et al. (2009). Our objective is to further this earlier work using a special database that we have constructed to include books cited in journal articles covered by SCOPUS (History and Literature & Literary Theory) and the corresponding library holding counts in both Association of Research Libraries (ARL) and non-ARL libraries. These counts were gleaned from WorldCat®. ARL is a non-profit membership organization of 125 research libraries in North America. Here, we explain how this database was developed for a much larger project (i.e., research still in progress) and present some preliminary analyses pertaining to the scholarly use of books (i.e., cited in journals) and their ‘perceived cultural benefit’ (catalogued in ARL and non-ARL libraries).

Overview of the datasets and database

Data were granted to us by Elsevier through the Elsevier Bibliometrics Research Program. In our application to this program we requested two separate datasets, each limited to citations recorded in journals classified as History or Literature & Literary Theory (Table 1).

Table 1. Journals and journal citation data granted by Elsevier SCOPUS (April 2011).


Upon receiving the SCOPUS data, we examined the citations recorded in the 1023 journals (both time periods together) to determine the overall frequency of citations to books, research articles (ar), conference proceedings (cp), review papers (re), notes (no) and other non-sourced materials. Cited materials that did not have an internal SCOPUS identification number, or did not meet the criteria that we established for identifying other materials (e.g., a non-sourced journal/proceedings article), were classified as a 'book'. All 'other' documents will be re-examined and classified at a later date.
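As a minimal sketch of the classification rule described above, the following Python fragment assigns a category to one cited reference. It assumes one possible reading of the rule: a reference is treated as a 'book' only when it has no SCOPUS identification number and also fails the criteria for other non-sourced materials. The class CitedRef, its fields, and the flag looks_like_article are hypothetical and do not come from the study's actual implementation; only the document-type codes (ar, cp, re, no) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Document-type codes named in the text.
SCOPUS_TYPES = {
    "ar": "research article",
    "cp": "conference proceedings",
    "re": "review paper",
    "no": "note",
}


@dataclass
class CitedRef:
    scopus_id: Optional[str]    # internal SCOPUS identifier; None if non-sourced
    doc_type: Optional[str]     # "ar", "cp", "re", "no", or None
    looks_like_article: bool    # hypothetical flag: reference string resembles
                                # a non-sourced journal/proceedings article


def classify(ref: CitedRef) -> str:
    """Return a category label for one cited reference."""
    if ref.scopus_id is not None:
        # Sourced item: use its SCOPUS document type when it is a known code.
        return SCOPUS_TYPES.get(ref.doc_type, "other")
    if ref.looks_like_article:
        # Non-sourced, but matches the criteria for another kind of material.
        return "other"
    # Non-sourced and not identified as anything else.
    return "book"
```

Under this reading, the 'book' category is a residual one, which is why a later matching step against a library catalogue is needed to confirm that an item really is a book.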


For all items classified as a 'book' we performed a set of queries in WorldCat®, using an API developer key granted to us by the Online Computer Library Center (OCLC). This key allowed us to match cited titles with the titles recorded in one or more of the international libraries covered by WorldCat®. For every matched title (confirming that it was a book), we retrieved an OCLC identification number, ISBN number, publisher name, publisher location, and a library holding count for both ARL and non-ARL libraries. With all the journal citing-to-cited data obtained from Elsevier, together with the results of the API queries, we created a unique new SCOPUS-WorldCat® relational database.
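To illustrate what a relational structure of this kind might look like, the sketch below builds a small in-memory SQLite database with one table for matched books and one for citing-to-cited links, then counts citations per book alongside its holding counts. The table names, column names, and the two-table layout are assumptions made for illustration; the schema of the actual SCOPUS-WorldCat® database is not described in the text.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE book (
            oclc_id            INTEGER PRIMARY KEY,  -- OCLC identification number
            isbn               TEXT,
            publisher          TEXT,
            publisher_location TEXT,
            arl_holdings       INTEGER,              -- holding count, ARL libraries
            non_arl_holdings   INTEGER               -- holding count, other libraries
        );
        CREATE TABLE citation (
            citing_article_id  TEXT,                 -- citing journal article
            oclc_id            INTEGER REFERENCES book(oclc_id),
            citing_year        INTEGER
        );
        """
    )
    return conn


def citations_and_holdings(conn: sqlite3.Connection):
    """Return (oclc_id, citations, arl_holdings, non_arl_holdings) per book."""
    return conn.execute(
        """
        SELECT b.oclc_id,
               COUNT(c.citing_article_id) AS citations,
               b.arl_holdings,
               b.non_arl_holdings
        FROM book AS b
        LEFT JOIN citation AS c ON c.oclc_id = b.oclc_id
        GROUP BY b.oclc_id
        ORDER BY b.oclc_id
        """
    ).fetchall()
```

The LEFT JOIN keeps books that have holdings but no recorded citations, so they appear with a citation count of zero rather than being dropped.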

Table 3, below, presents some descriptive statistics resulting from queries made to our new database. Here we show the total number of documents cited by articles or reviews published in History and Literature & Literary Theory journals for two citation windows: 1996-2001 and 2007-2011. Some of these cited documents have been categorized as follows: a) sourced in SCOPUS only, b) non-sourced in SCOPUS, but matched in WorldCat®, or c) sourced in SCOPUS and matched in WorldCat®. In this paper, we are concerned with a subset of books that were non-sourced in SCOPUS, but matched in WorldCat®.


Data analyses and results

The aim of this study is to statistically examine the relationship between citation counts to books in SCOPUS journals and WorldCat® library holding counts ('libcitations') for both ARL and non-ARL libraries worldwide. We expect to find a strong positive relationship, where 'libcitations' or library holdings support or lead to journal citations, but not vice versa.

The idea behind this hypothesis is that humanities scholars borrow books from their university/academic library and 'use' these books by reading and citing them in research articles/review papers. Time is required for a book to be published, marketed to and purchased by a library before it is cited; hence we focus on a book publication window of six years (2001-2006), followed by a journal citation window of five years (2007-2011). With respect to library holdings, we assume that a book published in 2001 would have been added to a library in that year or at some later point (up to and including Nov. 2012).
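The two windows described above can be expressed as a simple filter over citation records. The sketch below is illustrative only: the record layout (book identifier, publication year, citing year) and the function name count_citations are assumptions, while the year ranges are those stated in the text.

```python
from collections import Counter

PUBLICATION_WINDOW = range(2001, 2007)  # 2001-2006 inclusive
CITATION_WINDOW = range(2007, 2012)     # 2007-2011 inclusive


def count_citations(records):
    """Count citations per book, keeping only records inside both windows.

    records: iterable of (book_id, publication_year, citing_year) tuples
    (a hypothetical layout used for illustration).
    """
    counts = Counter()
    for book_id, publication_year, citing_year in records:
        if publication_year in PUBLICATION_WINDOW and citing_year in CITATION_WINDOW:
            counts[book_id] += 1
    return counts
```

Because the publication window closes before the citation window opens, every counted citation is made at least one year after the cited book appeared.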

Figure 1. Citation and 'libcitation' frequency distributions for books published in 2001 to 2006 (cited in SCOPUS History journals, 2007-2011).

Table 4. Spearman's rank correlation coefficients for citations and libcitations (History, 2007-2011 and Literature & Literary Theory, 2007-2011).

To select the best test for our hypothesis we first observed and compared the frequency distributions for all citation counts and library holding counts in the separate fields. Figure 1, above, presents the book citation and library holding distributions for History (N=59,436) only. Note that the data are skewed and thus do not fit a normal curve. Similar non-normal distributions were found for Literature & Literary Theory (N=41,853). With these data the most appropriate test to use is Spearman's rank correlation coefficient. Spearman's rho is computed in the same way as a Pearson correlation, but on the ranks of the values rather than on the values themselves, which makes it less sensitive to strong outliers. Table 4, above, presents some general statistics related to our datasets and indicates that we found significant, though not especially strong, correlations between citations and 'libcitations' in History as well as Literature & Literary Theory.
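For readers less familiar with the statistic, the following self-contained sketch computes Spearman's rho as a Pearson correlation on ranks. It assumes the common convention of giving tied values the average of the ranks they occupy; the text does not say which software or tie-handling rule was used, and the function names are hypothetical. The sketch does not compute significance levels.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear pair such as x = [1, 2, 3, 4, 5] and y = [1, 4, 9, 16, 25], spearman_rho returns 1 up to floating-point rounding, while pearson returns a smaller value, because only the ordering of the values enters the rank-based statistic.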

Discussion and Considerations for Further Research

First it is important to comment on the 'cleanliness' of the History and Literature & Literary Theory datasets. Since we were working with thousands of 'non-sourced' book references, it was necessary to examine repeat iterations of the reference strings to be sure that they referred to the same book. For the most part, they did, but without a deep manual cleaning effort, we cannot say that the datasets were 'perfectly' clean. With respect to our correlation results, it is possible that, if we were given access to book references cited in other books as well, Spearman's rho might be even more significant. Also, there is much to be said about conducting this type of test, especially when correlation measures are not necessarily the best for understanding 'causes' and 'effects'. Many 'in between' variables can influence a correlation, some of which may be the browsing habits of humanities scholars, the concentrated nature of their work, and their habit of citing books from their collegial network regardless of whether or not these books are present in their institutional research library. Nevertheless, it is the goal of their institution to hold books that are 'perceived' to be beneficial to the culture of their research. From a bibliometric perspective, it is helpful to know if research-oriented libraries are doing for the humanities what they aim to do, which is to make quality books available for scholarly use.

A sizable portion of the books are present in many libraries but infrequently cited in the data set. The reverse (highly cited, but present in few libraries) is less common. When we examined the list of books that were proportionally cited least compared to their holding count, we discovered two main reasons for the divergence. First, reference works such as the Oxford English Dictionary have a very high ‘perceived cultural benefit’ but are not typically cited. The presence of this kind of book indicates that citations and 'libcitations' are not entirely interchangeable – they measure (partially) different dimensions. Second, the list contains many books that stem from other disciplines (e.g., Diagnostic and Statistical Manual of Mental Disorders). It seems likely that most books in this category would be highly cited if journals from their disciplines were part of the data set. More work may be done with larger datasets, for instance, expanding the data to include other humanities subjects, and/or making comparisons with cited books in the social sciences and sciences. There is also a strong opportunity here to further examine the role of book reviews, as 'gateway' documents, i.e., documents that encourage or discourage librarians to purchase books, and the motivation of scholars to read and cite books that were or were not selected based on reviews.
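The list of books that are cited least in proportion to their holding count, mentioned above, could be produced along the following lines. This is a sketch under stated assumptions: the ratio used here is simply citations divided by total holdings, and the function name and record layout are hypothetical, since the text does not specify how the proportion was calculated.

```python
def least_cited_per_holding(books, top_n=10):
    """Rank books by citations per library holding, lowest ratio first.

    books: iterable of (title, citations, holdings) tuples
    (a hypothetical layout). Books with no holdings are skipped
    because the ratio is undefined for them.
    """
    with_ratio = [
        (title, citations / holdings)
        for title, citations, holdings in books
        if holdings > 0
    ]
    return sorted(with_ratio, key=lambda item: item[1])[:top_n]
```

Reference works and books from other disciplines would be expected near the top of such a list, since they combine high holding counts with few citations in the History and Literature & Literary Theory journals.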

Acknowledgements

The authors are grateful to both the Elsevier Bibliometrics Research Programme (http://ebrp.elsevier.com/) and OCLC WorldCat® for granting access to the data that were used to build the unique database required for this study. We also wish to thank Dr. Roberto Cornacchia for assisting us with the development of our database, as well as Maurits van Bellen and Robert Iepsma for their data cleaning and standardisation work.

References

Adams, J., & Testa, J. (2011). Thomson Reuters Book Citation Index. In E. Noyons, P. Ngulube & J. Leta (Eds.), The 13th Conference of the International Society for Scientometrics and Informetrics (Vol. I, pp. 13–18). Durban, South Africa: ISSI, Leiden University and the University of Zululand.

Budd, J. (1986). Characteristics of written scholarship in American literature: A citation study. Library and Information Science Research, 8, 189–211.

Cullars, J. (1985). Characteristics of the monographic literature of British and American literary studies. College & Research Libraries, 46, 511–522.

