I've mentioned this paper before (in the Young Cosmos thread?): A Mathematical Theory of Citing, by Mikhail V. Simkin and Vwani P. Roychowdhury. Journal of the American Society for Information Science and Technology, 58(11): 1661-1673, 2007.
|A theory of citing was long called for by information scholars (Cronin, 1981). From a mathematical perspective, an advance was recently made with the formulation and solution of the model of random-citing scientists (Simkin & Roychowdhury, 2005a). According to the model, when a scientist writes a manuscript, he picks up several random papers, cites them, and also copies a fraction of their references. The model was stimulated by the recursive literature search model (Vazquez, 2001) and justified by the fact that a majority of scientific citations are copied from the lists of references used in other papers (Simkin & Roychowdhury, 2003, 2005b). The model leads to the cumulative advantage (Price, 1976) (also known today as preferential attachment; Barabasi & Albert, 1999) process, so that the rate of citing a particular paper is proportional to the number of citations it has already received. Despite its simplicity, the model appeared to account for several major properties of empirically observed distributions of citations.|
A more involved analysis, however, reveals that certain subtleties of the citation distribution are not accounted for by the model. It is known that the cumulative advantage process would lead to the oldest papers being most highly cited (Barabasi & Albert, 1999; Gunter, Levitin, Schapiro, & Wagner, 1996; Krapivsky & Redner, 2001). In reality, the average citation rate decreases as the paper in question gets older (Glanzel & Schoepflin, 1994; Nakamoto, 1988; Pollmann, 2000; Price, 1965). The cumulative advantage process also would lead to an exponential distribution of citations to papers of the same age (Gunter et al., 1996; Krapivsky & Redner, 2001). Empirically, it was found that citations to papers published during the same year are distributed according to a power law (see the ISI dataset in Figure 1a in Redner, 1998).
In the present article, we propose the modified model of random-citing scientists: When a scientist writes a manuscript, he picks several random recent papers, cites them, and also copies some of their references. The difference with the original model is the word recent. We solve the model using methods of the theory of branching processes (Harris, 1963) (for a review of its relevant elements, see Appendix A), and show that it explains both the power-law distribution of citations to papers published during the same year and literature aging. A somewhat similar model was recently proposed by Bentley, Hahn, and Shennan (2004) in the context of patents citations; however, those authors use it to explain only a power law in citation distribution (for what the usual cumulative advantage model will do) and did not address the topics just mentioned.
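For anyone who wants to play with the idea, here is a minimal Python sketch of the modified model as the abstract describes it. It is my own toy version, not the authors' code. I am assuming that "recent" means "published in the previous year", that every year produces the same number of papers, and that each reference of a cited paper is copied independently with a fixed probability. The function name and all parameter values are made up for illustration.

```python
import random
from collections import Counter


def simulate_recent_citing(n_years=30, papers_per_year=200,
                           n_direct=3, p_copy=0.25, seed=0):
    """Toy simulation of the 'recent' random-citing model.

    Assumptions (not taken from the paper): 'recent' means the previous
    year, yearly output is constant, and each reference of a directly
    cited paper is copied independently with probability p_copy.
    """
    rng = random.Random(seed)
    refs = {}              # paper id -> sorted list of cited paper ids
    year_of = {}           # paper id -> publication year
    citations = Counter()  # paper id -> number of citations received
    previous_year = []     # ids of the papers published last year
    next_id = 0
    for year in range(n_years):
        this_year = []
        for _ in range(papers_per_year):
            pid = next_id
            next_id += 1
            cited = set()
            if previous_year:
                k = min(n_direct, len(previous_year))
                # Pick several random recent papers and cite them...
                for src in rng.sample(previous_year, k):
                    cited.add(src)
                    # ...and copy some of their references.
                    for ref in refs[src]:
                        if rng.random() < p_copy:
                            cited.add(ref)
            refs[pid] = sorted(cited)
            year_of[pid] = year
            for c in cited:
                citations[c] += 1
            this_year.append(pid)
        previous_year = this_year
    return refs, year_of, citations


if __name__ == "__main__":
    refs, year_of, citations = simulate_recent_citing()
    # Citation counts of papers that all share one publication year.
    cohort = [citations[p] for p in refs if year_of[p] == 5]
    print(sorted(Counter(cohort).items()))
```

With these values each citation spawns on average n_direct * p_copy = 0.75 copied citations in the next year, so the process dies out for any single paper and the reference lists stay short. Whether the cohort histogram looks like a power law at this small scale is something to check by eye, not something this sketch guarantees.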
Unfortunately, this paper is available only by subscription through JASIST, so I'm going to have to do this in shifts.
"Branching citations" are described.
"Scientific Darwinism," the bibliometric measure of scientific "fangs and claws" that help a paper "to fight for citations with its competitors" is described.
The exponential growth of literature, much as Asimov described it, is described.
The situation of numerical simulations and unread citations is described.
The "aging of scientific literature" is described.
I like this part: "Sleeping Beauties in Science":
|Figure 3 [not pictured] shows two distinct citation histories. The paper, whose citation history is shown by the squares, is an ordinary paper. It merely followed some trend. When 10 years later that trend got out of fashion, the paper was forgotten. The paper, whose citation history is depicted by the triangles, reported an important but premature (Garfield, 1980; Glanzel & Garfield, 2004) discovery, the significance of which was not immediately realized by scientific peers. Only 10 years after its publication did the paper get recognition, and got cited widely and increasingly. Such papers are called "Sleeping Beauties" (Raan, 2004). Surely the reader has realized that both citation histories are merely the outcomes of numerical simulations of the modified model of random-citing scientists.|
After the original version of this paper was submitted for publication, there appeared an article by Burrell (2005) which used a phenomenological stochastic model of citation process to show that some sleeping beauties are to be expected by ordinary chance. An earlier paper by Glanzel, Schlemmer, and Thijs (2003) addressed a similar issue using the cumulative advantage model. In this case, the authors were specifically concentrating on papers that were little cited during the 2 years after publication. (This is the standard time frame used in bibliometrics to determine the impact of a publication.)
Relation to Self-Organized Criticality:
|Those familiar with the Self-Organized Criticality (SOC) of Bak et al. (1988) may be interested to know that it is directly related to our study. We model scientific citing as a random branching process. In its mean-field version, SOC also can be described as a branching process (Alstrom, 1988; Lauritsen, Zapperi, & Stanley, 1996). Here, the sand grains, which are moved during the original toppling, are equivalent to sons. These displaced grains can cause further toppling, resulting in the motion of more grains, which are equivalent to grandsons, and so on. The total number of displaced grains is the size of the avalanche and is equivalent to the total offspring in the case of a branching process. Distribution of offspring is equivalent to distribution of avalanches in SOC.|
Bak (1999) himself had emphasized the major role of chance in works of nature: One sand grain falls, and nothing happens; another one (identical) falls, and causes an avalanche. Applying these ideas to biological evolution, Bak and Sneppen (1993) argued that no cataclysmic external event was necessary to cause a mass extinction of dinosaurs. It could have been caused by one of many minor external events. Similarly, in the model of random-citing scientists: One paper goes unnoticed, but another one (identical in merit) causes an avalanche of citations. Therefore, apart from explanations of 1/f noise, avalanches in sand piles, and extinction of dinosaurs, the highly cited Science of Self Organized Criticality (Bak, 1999) also can account for its own success.
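The "identical grain, different outcome" point is easy to see in a toy branching process. The sketch below is mine, not the paper's: I am assuming a Galton-Watson process with Poisson-distributed offspring, and the helper names (`poisson`, `avalanche_size`) and the `cap` safety limit are hypothetical. Each ancestor is statistically identical, yet the total offspring (the "avalanche size") varies from zero to very large.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def avalanche_size(rng, mean_offspring, cap=100_000):
    """Total offspring of a single ancestor (the ancestor is not counted).

    `cap` is an illustrative safety limit: new generations stop being
    produced once the running total reaches it, so a near-critical run
    cannot loop for an unreasonably long time.
    """
    active = 1  # individuals in the current generation
    total = 0   # all descendants produced so far
    while active and total < cap:
        children = sum(poisson(rng, mean_offspring) for _ in range(active))
        total += children
        active = children
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [avalanche_size(rng, 0.95) for _ in range(10_000)]
    print("runs with no offspring at all:", sum(s == 0 for s in sizes))
    print("largest avalanche:", max(sizes))
```

For a mean offspring number m below 1, the expected total offspring is m / (1 - m), so pushing m toward 1 makes rare huge avalanches more and more dominant. That is the regime in which one paper goes unnoticed while an equally good one collects an avalanche of citations.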
Well, I think the reference to dinosaur death was unnecessary, even silly, but the paper is interesting.
Based on this, I would predict that citations in creationist literature, if any, do not exhibit this branching process.