In my last post introducing Bayesian stats through the simplest non-trivial distribution, the binomial, I introduced moment-matching “empirical Bayes” point estimates of the beta priors. In this post, I’ll introduce the so-called “Bayesian estimator” point estimate for the beta priors. In addition, I’ll show why maximum a posteriori (MAP) estimates of batting average differ from the Bayesian estimates for batting average.

### Bayesian Estimators

Suppose we provide an estimate $\hat{\theta}$ for a parameter that has true value $\theta$. We define a loss function $L(\hat{\theta}, \theta)$, lower values of which indicate that $\hat{\theta}$ is a better estimate of $\theta$.

The Bayesian estimator minimizes the expected loss given a Bayesian posterior distribution $p(\theta \mid y)$ over parameter $\theta$:

$\hat{\theta}(y) = \arg\min_{\theta'} \mathbb{E}[L(\theta', \theta) \mid y] = \arg\min_{\theta'} \int L(\theta', \theta) \, p(\theta \mid y) \, d\theta$

### With Squared Loss, Bayesian Estimates are Posterior Means

The most common loss function, due to its convenience, is squared loss:

$L(\theta', \theta) = (\theta' - \theta)^2$

As stated, this assumes a single real-valued parameter; the natural extension is to squared distance for parameter vectors.

With squared loss, the expected loss is minimized by the mean of the posterior. In other words, the Bayes estimator for a parameter under squared loss is the posterior mean of that parameter:

$\hat{\theta}(y) = \mathbb{E}[\theta \mid y] = \int \theta \, p(\theta \mid y) \, d\theta$
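A quick numeric check of this fact in Python (my own sketch, not code from the post): approximate the expected squared loss under a Beta posterior by sampling, grid-search over candidate estimates, and observe that the minimizer lands on the posterior mean.

```python
import random

# Illustration: under squared loss, the candidate estimate minimizing
# expected posterior loss is the posterior mean.  The expectation is
# approximated with draws from an arbitrary Beta(110, 290) posterior.
random.seed(7)
draws = [random.betavariate(110, 290) for _ in range(5000)]

def expected_sq_loss(est):
    # Monte Carlo estimate of E[(est - theta)^2 | y]
    return sum((est - theta) ** 2 for theta in draws) / len(draws)

# grid of candidate estimates in [0, 1] at resolution 0.005
candidates = [i / 200.0 for i in range(201)]
best = min(candidates, key=expected_sq_loss)

post_mean = sum(draws) / len(draws)  # should be near 110/400 = 0.275
```

The grid minimizer agrees with the posterior mean up to the grid resolution.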

### Beta-Binomial Batting Model

Our model for batting so far is very simple, with player $j$'s ability $\theta_j$ being drawn from a beta prior with fixed hyperparameters $\alpha$ (prior hits plus 1) and $\beta$ (prior outs plus 1):

$\theta_j \sim \mathsf{Beta}(\alpha, \beta)$

The number of hits $y_j$ for player $j$ in $n_j$ at bats is drawn from a binomial sampling distribution:

$y_j \sim \mathsf{Binom}(n_j, \theta_j)$

The observed batting average is just $y_j / n_j$.

Recall that the binomial is defined by:

$\mathsf{Binom}(y \mid n, \theta) = \binom{n}{y} \, \theta^{y} \, (1 - \theta)^{n - y}$

and the beta prior by:

$\mathsf{Beta}(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \, \theta^{\alpha - 1} \, (1 - \theta)^{\beta - 1}$

Given the convenient exponential form of the prior and likelihood function, the posterior is easily computed as:

$p(\theta_j \mid y_j, n_j, \alpha, \beta) \propto \mathsf{Binom}(y_j \mid n_j, \theta_j) \, \mathsf{Beta}(\theta_j \mid \alpha, \beta) \propto \theta_j^{y_j + \alpha - 1} \, (1 - \theta_j)^{n_j - y_j + \beta - 1}$

Given this final form, which is proportional to a beta, we have

$\theta_j \mid y_j, n_j, \alpha, \beta \sim \mathsf{Beta}(y_j + \alpha, \; n_j - y_j + \beta)$.

Thus the beta prior is conjugate to the binomial distribution, because when the prior over abilities is a beta density and the sampling distribution for the number of hits a binomial distribution, the posterior over abilities is also a beta density.
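The conjugate update can be verified in a few lines of Python (my own sketch; function names are mine): compute the closed-form posterior parameters, then check on a grid that the renormalized product of prior and likelihood matches the closed-form beta posterior.

```python
from math import comb, gamma

def beta_pdf(theta, a, b):
    # beta density via the gamma function
    B = gamma(a) * gamma(b) / gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / B

def posterior_params(y, n, alpha, beta):
    # conjugate update: Beta(alpha, beta) prior + binomial likelihood
    return y + alpha, n - y + beta

alpha, beta, y, n = 2.0, 3.0, 7, 10
grid = [i / 200.0 for i in range(1, 200)]

# unnormalized posterior: likelihood * prior, pointwise on the grid
unnorm = [comb(n, y) * t ** y * (1 - t) ** (n - y) * beta_pdf(t, alpha, beta)
          for t in grid]
a_post, b_post = posterior_params(y, n, alpha, beta)
closed = [beta_pdf(t, a_post, b_post) for t in grid]

# normalize both over the grid and compare pointwise
s1, s2 = sum(unnorm), sum(closed)
max_diff = max(abs(u / s1 - c / s2) for u, c in zip(unnorm, closed))
```

The two normalized curves agree to floating-point precision, which is just the conjugacy statement in numbers.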

### Bayesian Estimator for Batting Ability

The posterior mean for batting ability for player $j$ is thus the mean of the posterior beta distribution. The mean of a $\mathsf{Beta}(\alpha, \beta)$ distribution is $\alpha/(\alpha + \beta)$. So for the posterior $\mathsf{Beta}(y_j + \alpha, \; n_j - y_j + \beta)$, the Bayes estimators for batting ability are just:

$\hat{\theta}_j = \frac{y_j + \alpha}{n_j + \alpha + \beta}$

### Contrast with the MAP Estimate

Note that this is not the same estimate as the maximum a posteriori (MAP) estimate, because the mode (maximum point) and mean of a beta distribution are not the same. The MAP estimate is based on the mode, which for a $\mathsf{Beta}(a, b)$ density with $a, b > 1$ is $(a - 1)/(a + b - 2)$:

$\hat{\theta}_j^{\mathrm{MAP}} = \frac{y_j + \alpha - 1}{n_j + \alpha + \beta - 2}$
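The gap between the two estimates is easy to see numerically (a Python sketch of my own, with arbitrary illustrative parameters):

```python
def post_mean(y, n, alpha, beta):
    # Bayes estimator under squared loss: mean of Beta(y+a, n-y+b)
    return (y + alpha) / (n + alpha + beta)

def post_mode(y, n, alpha, beta):
    # MAP estimate: mode of Beta(a, b) is (a-1)/(a+b-2) for a, b > 1
    return (y + alpha - 1) / (n + alpha + beta - 2)

# 40 hits in 100 at bats with a Beta(2, 3) prior
mean_est = post_mean(40, 100, 2, 3)  # 42/105 = 0.4
mode_est = post_mode(40, 100, 2, 3)  # 41/103, slightly lower
```

With a weak prior the two estimates are close but not equal; they coincide only in degenerate cases.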

### Diffuse Priors for the Beta Priors

In order to derive a Bayes estimate of the prior parameters and , we need to extend our model with a prior for the priors (as we said last time, it’s elephants all the way down). As is common in these priors-for-priors situations, we’ll put a very diffuse prior on the priors themselves.

For convenience, we'll reparameterize the beta distribution using its mean $\phi = \alpha/(\alpha + \beta)$ and scale $\kappa = \alpha + \beta$. Given the mean and scale, we can recover $\alpha$ and $\beta$ by:

$\alpha = \phi \kappa \qquad \beta = (1 - \phi) \kappa$.
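The round trip between the two parameterizations is a one-liner each way (my own Python sketch, not part of the post's code):

```python
def to_mean_scale(alpha, beta):
    # (alpha, beta) -> (mean phi, scale kappa)
    return alpha / (alpha + beta), alpha + beta

def to_alpha_beta(phi, kappa):
    # (mean phi, scale kappa) -> (alpha, beta)
    return phi * kappa, (1 - phi) * kappa

# round trip: Beta(110, 290) <-> mean 0.275, scale 400
phi, kappa = to_mean_scale(110.0, 290.0)
alpha, beta = to_alpha_beta(phi, kappa)
```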

We then put priors directly on the mean and scale. Because the mean has a range of 0 to 1, we can use a uniform prior:

$\phi \sim \mathsf{Unif}(0, 1)$

Because the scale ranges from 0 to infinity, it's impossible to have a (proper) uniform prior, because the integral would diverge. Instead, we use a Pareto prior,

$\kappa \sim \mathsf{Pareto}(1.5, 1)$

The Pareto distribution is defined on the interval $[c, \infty)$ by

$\mathsf{Pareto}(x \mid \alpha, c) = \frac{\alpha \, c^{\alpha}}{x^{\alpha + 1}}$

In other words, the Pareto distribution is a power-law distribution with exponent $-(\alpha + 1)$, here $-2.5$. This is a fairly diffuse prior, slightly favoring smaller values of the scale.
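As a sanity check on the density (my own Python sketch, not part of the post's code): numerically integrating the Pareto(1.5, 1) density from 1 up to some bound $B$ should reproduce the closed-form CDF $1 - B^{-\alpha}$.

```python
alpha, c = 1.5, 1.0

def pareto_pdf(x):
    # Pareto density alpha * c^alpha / x^(alpha+1) on [c, infinity)
    return alpha * c ** alpha / x ** (alpha + 1)

def numeric_cdf(B, steps=50000):
    # trapezoidal rule over [c, B]
    h = (B - c) / steps
    total = 0.5 * (pareto_pdf(c) + pareto_pdf(B))
    total += sum(pareto_pdf(c + i * h) for i in range(1, steps))
    return total * h

approx = numeric_cdf(10.0)
exact = 1 - 10.0 ** (-alpha)  # closed-form CDF at B = 10
```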

### BUGS Code

Here’s the full model coded in BUGS:

```
model {
  for (j in 1:J) {
    ability[j] ~ dbeta(prior.hits, prior.outs)
    hits[j] ~ dbin(ability[j], atbats[j])
  }
  prior.hits <- prior.avg * prior.atbats
  prior.outs <- (1 - prior.avg) * prior.atbats
  prior.avg ~ dbeta(1, 1)
  prior.atbats ~ dpar(1.5, 1)
}
```

This model assumes J batters, with abilities coded as ability[j], hits as hits[j], and at bats as atbats[j]. The prior parameter $\alpha$ is coded as prior.hits, and $\beta$ as prior.outs. The reparameterized prior mean $\phi$ is coded as prior.avg, and the reparameterized prior scale $\kappa$ as prior.atbats.

### Gibbs Sampling

With the model coded in BUGS, we can automatically run Gibbs sampling to collect a number of samples from the posterior distribution (we suppressed the conditioning on the hyperparameters).

We can then approximate the posterior marginal means (to any desired accuracy, given enough samples) by taking the average of the samples for the respective variables.

It’s very easy to call BUGS from R using the R2WinBUGS package. So we read the data into R, use it to call BUGS, and then collect the results back in R, where we can calculate their means.

```
data.tsv = scan("2006ALnonPitch.tsv")
J = length(data.tsv) / 2
y = matrix(data.tsv, nrow=J, ncol=2, byrow=TRUE)
hits = y[,1]
atbats = y[,2]

library("R2WinBUGS")
data = list("atbats", "hits", "J")
parameters = c("ability", "prior.avg", "prior.atbats")
inits = function() {
  list(ability=rep(0.30, J),
       prior.avg=1.0/3.0,
       prior.atbats=3)
}
anno = bugs(data, inits, parameters,
            "c:/carp/devmitz/carp/baseball/src/baseballone/binomial.bug",
            n.chains=3, n.iter=1000,
            debug=TRUE, clearWD=TRUE,
            bugs.directory="c:/WinBUGS/WinBUGS14")
print(anno)
plot(anno)
attach.bugs(anno)
```

This presupposes that you’ve installed BUGS at the location specified by bugs.directory, have provided the BUGS program at the specified full path (I can’t get relative paths to work here), and have provided the data in the file 2006ALnonPitch.tsv (see the last section for a dump of the data in this file). The format of the tsv (tab-separated values) file is one player per line, with the number of hits, a tab, the number of at bats, and a newline.

The parameters specify three simultaneous Markov chains for 1000 iterations each. By default, BUGS discards the first half of the iterations as burn-in, removing samples drawn before the chains converge.

We initialize the ability values to 0.30 for every player, the prior mean prior.avg to 1/3, and the prior scale prior.atbats to 3. Better initialization helps models converge to the posterior more quickly, but isn’t even necessary for easy-to-fit models like these that run relatively quickly.

The data and parameters are specified. Data must be defined in R before calling BUGS, and parameters are returned as a two-dimensional array of samples (indexed by chain and sample number).

BUGS is not super-fast, taking about 15 minutes to take 3000 samples in 3 chains on my old notebook (it’s about twice as fast on my newish workstation).

### Bayesian Estimates

While I don’t have time to explain Gibbs sampling in detail in this context, let me just say that the three chains mixed very well, with $\hat{R}$ values all very near 1.0.
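The usual convergence diagnostic for multiple chains is the potential scale reduction statistic $\hat{R}$, which is near 1.0 when the chains mix well. Here's a bare-bones version in Python (a simplified sketch of my own, not the computation R2WinBUGS performs):

```python
import random

def rhat(chains):
    # basic Gelman-Rubin potential scale reduction statistic
    m = len(chains)      # number of chains
    n = len(chains[0])   # samples per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # between-chain variance
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # average within-chain variance
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n
    return (var_plus / W) ** 0.5

random.seed(0)
# three well-mixed chains drawn from the same distribution
chains = [[random.gauss(0.271, 0.01) for _ in range(500)] for _ in range(3)]
r = rhat(chains)  # near 1.0 for well-mixed chains
```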

The mean prior batting average is 0.271, and the mean prior number of at bats is 403. Note that this average is lower and the number of at bats higher than with the moment-matching estimates. I calculated these values in R, after closing the BUGS window, by:

```
> mean(prior.avg)
[1] 0.2709107
> mean(prior.atbats)
[1] 403.1252
```

This corresponds to beta density parameters $\alpha = \phi \kappa \approx 109.2$ and $\beta = (1 - \phi) \kappa \approx 293.9$.

I’ll discuss more about the joint marginal average/at-bats posterior and the marginal ability posteriors in the next and (hopefully) final post in the series.

### Who Do We Think is Better?

Who do we think is a better batter, someone we’ve seen go 40/100 (a 0.400 average) or someone we’ve seen go 175/500 (a 0.350 average)?

For now, suppose we fix an empirical Bayes prior to the Bayesian estimate, so that $\alpha \approx 109.2$ and $\beta \approx 293.9$. The Bayesian estimator (posterior mean) for a player’s ability when we observe the player getting $y$ hits in $n$ at bats is then the expected value of the posterior, which is:

$\hat{\theta} = \frac{y + \alpha}{n + \alpha + \beta}$

If we plug in $y = 40, n = 100$, we get an estimated ability of 0.295. If we plug in $y = 175, n = 500$, we get an estimated ability of 0.314! If we believe this model, we can’t just compare batting averages — the number of at bats is a critical element of our posterior estimate for low numbers of at bats. This is the effect of the prior, not the Bayesian inference *per se*. Of course, the scale of the prior being so large is the result of using the Bayesian estimator for the prior rather than the moment-matching estimate, so the effect is greater.
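The comparison is easy to reproduce (a Python sketch of my own; with the rounded prior estimates the results agree with the figures above to within a couple of thousandths):

```python
# estimated prior: mean phi ~ 0.271, scale kappa ~ 403
phi, kappa = 0.271, 403.0
alpha, beta = phi * kappa, (1 - phi) * kappa

def bayes_estimate(y, n):
    # posterior mean under the fixed empirical Bayes prior
    return (y + alpha) / (n + alpha + beta)

est_40_100 = bayes_estimate(40, 100)    # 0.400 observed average
est_175_500 = bayes_estimate(175, 500)  # 0.350 observed average
```

The 0.350 hitter with 500 at bats gets the higher estimate: the prior pulls the 100-at-bat line much harder toward the league mean.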

### In the Limit

In the limit, of course, the Bayesian estimate approaches the maximum likelihood estimate:

$\frac{y + \alpha}{n + \alpha + \beta} \rightarrow \frac{y}{n} \quad \text{as} \ n \rightarrow \infty$

(holding the observed average $y/n$ fixed as $n$ grows).
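A quick numerical illustration of the limit (my own sketch), holding the observed average fixed at 0.350 while $n$ grows:

```python
# the estimated prior from above
alpha, beta = 109.2, 293.9

def bayes_estimate(y, n):
    # posterior mean under the fixed prior
    return (y + alpha) / (n + alpha + beta)

# gap between the Bayes estimate and the MLE 0.350 shrinks with n
gaps = [abs(bayes_estimate(0.350 * n, n) - 0.350)
        for n in (100, 10000, 1000000)]
```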

### Data from Baseball1

I gathered the data for the 2006 American League position players from Baseball1 (click on the “download database” menu item on the nav bar). It requires merging the player table (to get positions and exclude pitchers) and the batting table. For completeness, here’s the munged data:

161 558 156 555 17 89 103 333 10 39 22 60 0 2 3 20 41 183 57 201 2 8 9 30 16 74 65 237 8 35 54 185 121 423 82 365 159 546 5 22 3 9 12 56 143 491 147 506 1 13 12 45 2 13 1 4 43 175 11 39 155 596 3 14 59 251 46 212 141 556 2 14 55 251 3 10 114 393 54 193 76 267 10 40 129 460 31 154 60 229 0 8 27 110 2 3 16 60 6 11 0 1 0 9 15 99 3 20 6 27 8 55 5 29 165 482 37 136 37 164 47 193 1 11 43 146 2 10 170 603 16 70 61 268 16 43 14 49 5 12 11 50 8 24 103 413 1 6 103 352 29 130 27 115 0 1 190 655 30 115 67 286 2 7 53 220 11 45 10 27 40 126 136 450 59 246 91 371 18 65 33 131 0 3 159 569 12 79 76 333 21 87 7 24 102 373 35 127 181 521 83 385 145 491 82 358 75 234 3 32 8 40 6 27 146 569 190 592 53 234 18 87 4 13 10 98 147 461 72 243 52 172 110 364 5 25 181 581 62 224 161 604 5 18 140 454 109 413 97 365 0 6 1 4 14 76 51 156 2 17 158 557 1 7 89 348 21 82 11 50 3 39 18 75 28 103 122 444 24 96 62 189 138 541 91 279 110 441 133 459 157 591 18 80 17 87 53 222 30 99 3 13 132 494 154 520 87 294 56 235 17 81 59 226 10 41 185 611 51 174 4 16 1 5 52 197 23 80 217 691 177 628 164 547 166 572 91 320 70 308 33 130 131 437 68 218 65 230 51 221 45 132 7 49 183 600 151 527 17 64 136 461 9 44 80 325 224 695 5 28 137 524 174 543 62 251 14 67 38 176 138 502 8 42 4 17 22 104 163 552 74 255 7 36 77 343 69 209 214 648 9 33 6 36 138 501 170 539 117 485 43 152 12 48 165 557 6 37 5 50 118 420 83 337 139 448 129 465 214 623 113 446 71 320 14 56 200 607 1 9 171 607 150 509 160 558 7 40 89 314 73 270 155 557 113 401 2 10 156 591 102 350 32 153 163 548 166 620 56 217 13 65 181 626 111 474 1 13 79 290 89 289 5 28 61 179 117 430 163 573 99 388 173 584 43 172 137 542 109 463 37 156 141 490 58 260 181 572 7 27 34 169 97 351 9 61 18 79 83 276 0 4 36 156 45 184 39 164 86 268 89 354 128 463 0 2 28 128 126 466 38 146 177 593 76 236 177 566 7 45 59 220 154 544 2 9 10 46 144 449 87 365 102 381 31 161 4 19 89 296 169 593 161 563 171 624 194 620 25 121 18 88 95 343 103 389 42 178 45 151 12 63 74 279 82 288 88 344 56 211 54 
225 123 433 123 451 136 540 21 95 152 543 5 27 46 197
