The number of dies used to produce a particular coin issue is of considerable interest to both numismatists and those interested in the economics of the ancient world in general. For the benefit of interested readers I summarise here some formulae which can be used for estimating the number of dies from the data now available to us*.

Basic estimators – Good's Formula.
For die estimation we would ideally have a large sample of the type in question, with full details of the make-up of the sample, i.e., we would know the total number of coins in the sample (n), the number of "singletons", i.e., dies represented by only one coin (d1), the number of dies represented by two coins (d2), and so on, together with the total number of different dies represented in the sample (d). In practice we normally will have only some of these figures, but fortunately we don't need all of them to obtain a reasonable estimate of the total number of dies originally used, which we will denote by D.
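As a minimal sketch of how the sample figures n, d and d1 described above might be tallied, suppose each coin in the sample has been assigned a die label. The function name `summarise_sample` and the labels used are purely illustrative assumptions, not part of the original note.

```python
from collections import Counter


def summarise_sample(die_ids):
    """Return (n, d, d1) for a list giving the die identified on each coin.

    n  = total number of coins in the sample
    d  = number of different dies represented
    d1 = number of "singletons" (dies represented by only one coin)
    """
    coins_per_die = Counter(die_ids)
    n = len(die_ids)
    d = len(coins_per_die)
    d1 = sum(1 for count in coins_per_die.values() if count == 1)
    return n, d, d1


# Hypothetical sample of eight coins struck from five dies (A to E).
sample = ["A", "B", "B", "C", "D", "D", "D", "E"]
print(summarise_sample(sample))  # (8, 5, 3)
```

Here dies A, C and E are singletons, die B is represented twice and die D three times.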
The basic factor we have to estimate is the "coverage" of the sample, meaning, in the current context, the ratio of the number of different dies in the sample to the total number of dies in the underlying population of coins**, i.e., we need to estimate the ratio:

C = d/D = d/(d + d0)   (1)

where d0 is the (unknown) number of dies not present in the sample. You might think that the factor C depends in a complicated way on the detailed distribution of the coins in the sample, but in fact, following on the work of Good in 1953, it turns out that C can be estimated by the simple formula#:

Ce = 1 - d1/n   (2)

so that our first approximation to the total number of dies D is just

D = d/Ce = nd/(n - d1)   (3)
To get a feel for this formula we note that if most of the coins in the sample are singletons then the sample will likely include only a small fraction of the original dies, and the coverage will be small; mathematically, d1 won't be much less than n, and hence Ce will be considerably less than one, so that D in formula (3) will be >> (very much greater than) d, the number of dies in the sample.
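The basic estimate can be sketched in a few lines, taking the coverage estimate as Ce = 1 - d1/n (the form stated in the note on the proof) and the die estimate as D = d/Ce. The function names are hypothetical, and the sketch assumes equal die outputs, as the basic formula does.

```python
def good_coverage(n, d1):
    """Estimated coverage Ce = 1 - d1/n (Good's estimate)."""
    if not 0 <= d1 < n:
        raise ValueError("need 0 <= d1 < n (at least one die duplicate)")
    return 1 - d1 / n


def good_dies_equal_output(n, d, d1):
    """Equal-output die estimate D = d/Ce, written as n*d/(n - d1)."""
    if not 0 <= d1 < n:
        raise ValueError("need 0 <= d1 < n (at least one die duplicate)")
    return n * d / (n - d1)


# Hypothetical sample: 20 coins with a single die match, so d = 19, d1 = 18.
print(round(good_coverage(20, 18), 3))     # 0.1
print(good_dies_equal_output(20, 19, 18))  # 190.0
```

With almost every coin a singleton, the estimated coverage is only about a tenth, and the estimate of D is ten times the number of dies actually seen.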
For the simple case where we have n coins with only one die match, d1 will equal n - 2, so that D = ½ nd (or ½ d^2, close enough); with two die matches, D = ¼ nd, and so on (for not too many matches). On the other hand, if most of the dies in the sample are represented more than once, then the sample probably includes most of the original dies. In this case d1 will be << n, and hence Ce will not be much less than 1, so that D will be only slightly greater than d.

The above formula (3) is a good start, but it makes a key simplifying assumption, namely that each die produced exactly the same number of coins. This is of course quite unrealistic, and probably tends to underestimate the actual number of dies because low output dies will tend not to show up in the sample; however, we can improve the formula by including a "spread" factor (S) to allow for the spread in the number of coins produced by the different dies, so that our basic formula now becomes:

D = S·d/Ce   (4)
(Alternatively, you can write this formula as D = d/C'e, where C'e = Ce/S is the estimated coverage allowing for the spread.) To make the mathematics tractable, we assume that the spread can be described by a certain family of probability distributions known as gamma distributions. We needn't go into these in detail, but they produce a distribution of coins something like a Gaussian bell curve, but centred on the
average number of coins produced per die. The relative width of the distribution is described by a "shape parameter" p – a low value of p means a broad distribution, while a large value gives a narrow curve. Assuming
this type of distribution, the spread factor works out to be approximately:
Note that the higher the value of p, the closer S tends to 1, as you would expect, and that even for quite a wide spread, with, say, p = 2, S is still less than 1.5, so that in practice this correction factor generally makes a relatively small difference to the final estimate (which also means that it doesn't need to be very accurate). On the basis of experience we usually assume that p = 2, so that our formula finally becomes:
(p = 2 here corresponds essentially to Poissonian die lifetimes, as would result if the die life was determined by random breakage at a constant average rate; p < 2 corresponds to an excess of extreme values, e.g., extra early breakage of dies and/or an excess of long life dies.)

Carter's Formulae.
Good's formula gives, in theory at least, a good estimate of the total number of dies from only three descriptors of the sample, namely n, d and d1. Quite often, however, we will be given only the two most basic factors in the sample, namely n and d. All is not lost, however, since in a real sample d1 will be statistically related to n and d, and we can still get quite a reasonable estimate for D from the formula:
where R = n/d is the average number of coins per die in the sample (sqrt means the positive square root). This formula is, according to Esty, essentially a single formula version of Carter's set of three linear formulae with which some readers may be familiar. It might lack the direct intuitive appeal of Good's Formula (6) above, but it is apparently based on the same assumptions, and the same value of the parameter p, namely 2. Note that provided there are at least some die duplicates in the sample then R > 1, so that the denominator in the formula will be greater than zero, ensuring a sensible result. (If there are no die duplicates, then Good's formula doesn't work either, since in this case d1 = n.)

Discussion.
While the above formulae may seem to be based on some fairly specific assumptions they seem to give good results in
practice. In particular, computer simulations have been run on model populations of coins with varying compositions (i.e., numbers of dies, and spread factors) to produce test samples of various sizes, and from these
simulations we find that estimations of D made using the formulae listed here generally match the original die numbers quite well, or at least, within the expected margin of error, with the important proviso that we
have a random sample to work from. Also, these formulae (with p = 2, or a bit less) seem to give the correct answers in real world cases where the dies are individually numbered and hence we actually have a good idea
of their total numbers anyway (e.g., the Norbanus and, particularly, the Crepusius denarius issues). Alternatively, if you don't want to make assumptions about die output spreads, you can simply take the basic Good Formula 3 above (or my Formula 9 below, or in fact any equal output formula) as giving the number of "efuds", i.e., equivalent full use dies, or the number of dies that would have been used if all dies had the same
standard "full use" output. Such a figure may not be strictly realistic, but it can be sufficient for purposes such as the calculation of the relative sizes of different issues. In any case it should be realised that
these estimations of die numbers are only approximate, and hence the expected margin of error can be quite large. We can also produce estimates of the possible error range, but I will not do that in detail here, except
to say that a rough estimate of the total 95% error range is given by (6D/n) x sqrt D – this means that for smaller samples (say a few dozen coins, with a value of C less than 0.5), the possible error range in the die
estimate can easily be plus or minus a factor of 2. This may seem a lot, but it is still sufficient to tell us whether the issue we are dealing with involved only a dozen or so dies, or scores of dies, or many hundreds.
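The rough error range quoted above, (6D/n) x sqrt D, is easy to evaluate. The following is only a sketch: the function name is hypothetical, and reading "total" as the full width of the 95% range (so that the half-width is half this figure) is an interpretation rather than something the text spells out.

```python
import math


def error_range_95(D, n):
    """Rough total 95% error range for a die estimate D from n coins."""
    if D <= 0 or n <= 0:
        raise ValueError("D and n must be positive")
    return (6 * D / n) * math.sqrt(D)


# Hypothetical case: an estimate of 60 dies from a sample of 30 coins.
total_range = error_range_95(60, 30)
print(round(total_range))      # 93
print(round(total_range / 2))  # 46
```

On this reading an estimate of 60 dies from 30 coins could plausibly be anywhere from the low teens to a little over 100, which is in line with the "factor of 2" warning for small samples.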

My version of Good's Formula.
Finally I come to a formula of my own devising. This is a very simple formula which in practice gives a good approximation to Good's Formula 2 above for the coverage, using only the number of different dies d in a sample of size n; i.e., it does not involve d1, the number of singletons in the sample. Specifically, I approximate d1 by d/R, where R = n/d as before, so that from Formula 2 the coverage is now estimated by##:

Ce = 1 - d/(Rn) = 1 - 1/R^2   (8)

Applying this as in Equation 3, we get:

D = d/Ce = d R^2/(R^2 - 1) = n^2 d/(n^2 - d^2)   (9)
This last formula assumes equal die outputs, so add a spread factor of your own choice. Formula 5 above becomes here:
So, taking p = 2 as usual, we get:
giving ultimately:
In practice this last formula agrees very closely with the full Carter formula, i.e., Eqn 7 above, which also assumes p = 2, but it can also be easily adapted to any value of p.

Conclusion.
Now that you have the basic formulae, go to an archive site (CNG, Coin Archives or the like), search on your favourite coin, save off the results and get to work on a die number estimate. Note that for large issues you don't need to check every single coin that comes up; just select a manageable number (say not more than 30) of coins in reasonable condition and go to it. (To make life easier for yourself, set the formulae up in a small spreadsheet.) Note that for scyphate types you need to check both the left and right sides of the "obverse" (the convex side in this case), as these types are usually double struck, and not infrequently different obverse dies are used for the two strikes; it seems the die holders often dropped their dies and had to grab another (cooler) one. But be warned: die estimating can become addictive, and is rather time consuming.
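For readers who prefer a short script to a spreadsheet, here is a hedged sketch of the n-and-d-only estimate. The equal-output part follows directly from the text (d1 approximated by d/R, so Ce = 1 - 1/R^2 and D = d/Ce). The spread factor is a different matter: the expression for S is not reproduced in this copy of the note, so the form S = 1 + 1/(pR) used below is an assumption. It comes from supposing S = 1 + d1/(p·d) and substituting d1 = d/R, and it does at least respect the stated property that S stays below 1.5 for p = 2. The function name is hypothetical.

```python
def dies_from_n_and_d(n, d, p=None):
    """Estimate the original number of dies from n coins showing d dies.

    With p=None this is the equal-output estimate D = d*R^2/(R^2 - 1),
    where R = n/d. If a shape parameter p is given, the result is
    multiplied by an ASSUMED spread factor S = 1 + 1/(p*R).
    """
    if not 0 < d < n:
        raise ValueError("need 0 < d < n (at least one die duplicate)")
    R = n / d
    equal_output = d * R**2 / (R**2 - 1)
    if p is None:
        return equal_output
    spread = 1 + 1 / (p * R)  # assumed form of S, not taken from the note
    return spread * equal_output


# Hypothetical sample: 30 coins showing 20 different dies (R = 1.5).
print(dies_from_n_and_d(30, 20))                 # 36.0
print(round(dies_from_n_and_d(30, 20, p=2), 2))  # 48.0
```

The first figure can be read as a count of equivalent full use dies; the second applies the usual p = 2 allowance under the assumption stated above.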

* The formulae in this note are mostly taken from an article by Warren Esty in Numismatic Chronicle 2007, pp. 359-364; they are discussed in more detail in an article by the same author in Numismatic Chronicle 1986, pp. 185-216.

** Note that this is a more restricted definition of coverage than that originally introduced by Good in 1953.

# The essence of the proof is to show that, while the proportion in the sample (the frequency) of coins from all dies appearing r times in the sample is r·d_r/n (where d_r is the number of dies represented by r coins), the proportion of coins from these dies in the population is best estimated by (r+1)·d_(r+1)/n, the difference being due to the coins in the population from dies not in the sample. The sum of these latter frequencies from r = 1 upwards is the coverage C, and is evidently 1 - d1/n (since the sum of all the sample frequencies r·d_r/n from r = 1 upwards will necessarily total 1). We therefore see that the proportion in the population of coins from dies not appearing in the sample can be estimated by d1/n, which, assuming equal die outputs, equals d0/D, the proportion in the population of dies not present in the sample. In this case, therefore, Ce = 1 - d1/n also estimates 1 - d0/D, the coverage of dies in the sample. Informally, we can argue like this: given that we have a sample that includes d dies, the probability that the next coin examined will be from a known die is d/D, i.e., it equals the coverage C. Now the relative fraction of singletons in the sample is d1/n, so that we might intuitively expect that the probability that the next coin will be another singleton, and hence from a new die, is also d1/n, so that the probability that the next coin will be from a known die is 1 - d1/n. Hence Ce = 1 - d1/n. However, in the end intuition is not formally rigorous proof.

## Formula 8 is derived from the fact that if d = n - x, then for x << n (i.e., R not much more than 1), d1
will usually equal d - x, or not much more, so that d1 = n - 2x = d^2/n = d/R, approximately, just as d = n/R. In practice the formula also works well (i.e., agrees with Formulae 3 and 7 above) for higher values of R, which suggests that our estimate for d1 (d/R) is essentially valid for all R values; in fact this may well be the basis of the Carter formula above. This is reasonable considering that for R > 2 this formula gives a value for d1 in the lower part of the possible range of the actual values of d1 in the sample, namely 0 to d-1; in any case for higher values of R the factor d1/n is small, and hence an accurate estimate of it is not needed anyway for reasonable results.

Ross Glanfield. January 2010.
Latest revisions:
21 May '10: My own formula added.
22 May '10: Discussion section rewritten, modifying views on value of p.
22 June '10: Mistake in error range estimate corrected.