# Learning the Ways of Bayes

Authors: Byron Jones and Scott Haughie

One very useful application of Bayesian methods is to reduce the size of a clinical trial by using prior information on the effects of a control or placebo treatment. Here Byron Jones and Scott Haughie illustrate how this can be done.

Introduction

Thomas Bayes' tomb, Bunhill Fields cemetery, London. Photo: Anna Mair

Mention the use of Bayesian methods to statisticians in the pharmaceutical industry and you are likely to get one of two reactions. Some will enthusiastically endorse their use and will no doubt regale you with examples of past successes in applying such methods. Others will be much less enthusiastic and will tell you that such methods are inappropriate, difficult to use, or poorly understood by non-statistical colleagues. Among the sceptics are the so-called “Frequentists”, who have ideological reasons for not endorsing the Bayesian way. We will say a little more about the Frequentist approach in the next section. We take the view that a modern statistician needs a selection of tools in his or her technical kit bag, and will on some occasions need to take out the Bayesian screwdriver and on others the Frequentist spanner. The purpose of this article is to illustrate a situation where we believe that using tools from the Bayesian section of the kit bag is a good idea. Nor do we think that Bayesian methods are difficult to apply, even in complex experimental situations, given the availability of excellent, easy-to-use software such as WinBUGS1.

Who is Bayes?

There are many publications that will answer this question in great detail and give you lots of information on the late Reverend Thomas Bayes. Please look these up if you have time. One we recommend is Bellhouse2. However, all you really need to know for now is that the Reverend Bayes proposed a theorem3 showing how to formally combine past information on parameters of interest with experimental data on those parameters. The result is an integrated description of the experimental results and a way of making probabilistic statements about present and future values of the data and parameters. A significant bonus that comes from using “Bayesian” methods is that we can make probabilistic statements about the parameters of interest themselves.

The usual Frequentist approach to analysing a study involves calculating the probability of observing data at least as extreme as those actually seen (the ‘p-value’), assuming a particular value for the parameter of interest (e.g., the difference in means, δ, is zero). In contrast, the Bayesian approach involves calculating the probability that the parameter of interest takes a particular value, or lies in a particular range, given the observed data. For example, when comparing two active treatments for pain relief we can make statements such as “there is a 0.65 probability that treatment A gives a 10% better reduction in a standard measure of pain than treatment B”. In the Bayesian world the parameters are random variables and statements about them are made conditional on a fixed set of observed data. The alternative, “Frequentist” view is to consider the parameters as fixed (but unknown) and the experimental data as a particular outcome of a random process that could, in principle, generate an infinite number of realisations of the data. Probabilistic statements from this approach depend on the long-run frequency with which events occur. For a good introduction to Bayesian methods in clinical trials see Spiegelhalter et al.4.

In summary, there are three main ingredients to a Bayesian analysis: (1) the prior distribution (the description of past knowledge), (2) the likelihood (the description of the experimental data) and (3) the posterior distribution (the result of formally integrating the prior and the likelihood).
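To make the three ingredients concrete, here is a minimal sketch in Python for the simplest case: a normal prior for a mean, combined with normally distributed data whose standard deviation is treated as known. The numbers, the `Normal` container and the function name `posterior_for_mean` are ours, chosen purely for illustration; they are not taken from the trial described later.

```python
from dataclasses import dataclass


@dataclass
class Normal:
    """A normal distribution summarised by its mean and variance."""
    mean: float
    var: float


def posterior_for_mean(prior: Normal, sample_mean: float,
                       sigma: float, n: int) -> Normal:
    """Combine a normal prior with n observations (known sd sigma).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the sample mean.
    """
    prior_precision = 1.0 / prior.var   # ingredient 1: the prior
    data_precision = n / sigma ** 2     # ingredient 2: the likelihood
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior.mean
                            + data_precision * sample_mean)
    return Normal(post_mean, post_var)  # ingredient 3: the posterior


# Hypothetical inputs: prior mean 3 (variance 1); 36 patients, mean 4, sd 6.
informative = posterior_for_mean(Normal(3.0, 1.0), 4.0, sigma=6.0, n=36)
print(informative)   # posterior mean 3.5, variance 0.5

# The same data with a very vague prior (variance one million).
vague = posterior_for_mean(Normal(3.0, 1e6), 4.0, sigma=6.0, n=36)
print(vague)         # posterior mean very close to the sample mean of 4
```

With these made-up inputs the prior and the data carry equal weight, so the posterior mean lands halfway between them. With the vague prior the posterior is driven almost entirely by the data.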

In the next section we will briefly discuss where Bayesian methods can add value in clinical trials. In the following section we will describe a real clinical trial that used a Bayesian approach to reduce the number of patients needed to test for a treatment difference. For this example we will describe each of the above three ingredients and, most importantly, discuss how the prior distributions were chosen.

Using Bayesian methods in clinical trials

Of great importance to pharmaceutical companies and patients is the time it takes to complete the clinical trials that are run as part of a drug development programme. Trials with fewer patients cost less to run and will complete sooner than larger trials. This is obviously beneficial for patients as it should reduce the time it takes a successful drug to reach the market place. It is beneficial to the pharmaceutical industry for not only this reason but also because the additional cost savings mean that, if a drug is shown to be less effective than anticipated, the decision to stop the development of that drug and shift resources to other programmes can be made sooner.

Sometimes a new drug is a member of a class of drugs that a company has a lot of experience in developing, or there is a wealth of information in the literature about that class of drugs. If this is the case, then it makes eminent sense to make use of that information, not only in the planning of a trial to examine the effects of the new drug, but also in the analysis of the trial results. This is an ideal situation for the application of Bayesian methods. The archive of past information held within a company, or a summary of information from past trials reported in the literature, provides an excellent basis for constructing prior distributions for the parameters of interest in the new trial. Often, one of the treatments included in a trial is a placebo, a dummy drug that looks and tastes like an active drug but has no active constituents. The effects of different doses of the active drug are compared with that of the placebo to measure how beneficial the doses of the new drug are. Usually, there is good prior information on the anticipated ‘placebo’ effect. Sometimes, an active control treatment is also included in the trial, to give a measure of how well the new drug compares with this control. Often, the control is a marketed competitor drug whose effects have been well established. As well as being a means of calibrating the effectiveness of the new drug, the active control also provides reassurance that the trial has been conducted correctly. If the anticipated responses on the control drug are not seen in the trial, this will raise doubts about the integrity of the results on the new drug.

So in summary, there are often situations where informative prior distributions can be constructed and this naturally leads to the use of Bayesian methods. Of course, we should add that all priors are subjective to some degree. At the extreme they are not based directly on data but on ‘expert’ opinion. Even when data are available, they can only represent a sample of the total amount of information that could exist. For example, a company will have run only a finite (usually small) number of previous trials on a drug. The information in the literature usually represents only a selection of what could have been published. So in general, all prior distributions are subject to criticism and perhaps also to claims that they have been selected to help ensure a particular result is obtained. Of course, in any scientific endeavour, such as the development of a new drug, it is in no one’s interest to use prior distributions that are not based on a careful and objective assessment of the available information. From a technical perspective, the prior information should also be ‘exchangeable’ with the data obtained in the current trial4. Sometimes, little prior information is available or there is a concern that the results will not be believed because of the use of a ‘subjective’ prior distribution. In this case so-called ‘uninformative’ priors are used. These allow the values of the parameters of interest to vary over a wide (even infinite) range of values and so do not have much influence when the prior and likelihood are integrated to produce the posterior distribution. One might wonder what benefit might be gained from doing this. After all, Bayesian methods are designed to make use of prior information, aren’t they? Well, here the ideological advantage comes into play. Even with an uninformative prior, the Bayesian approach allows probability statements to be made about the parameters of interest based on the posterior distributions.

An example that uses Bayesian methods

This example is based on a Phase II study that was designed to test an experimental treatment (E) against a placebo treatment (P) in an indication where improvements in the condition being treated can be detected using a symptom severity score (SSS). For a treatment to be effective in this indication, reductions in the SSS (relative to the baseline score) are desired. The larger the reduction, the more effective the treatment.

From three previous clinical trials in the same indication, the mean reduction in the SSS for P is expected to be 3.2 points. (These trials were selected because they were conducted in a similar patient population and had similar designs to the current study.)

From a previous trial using a drug in the same class, the mean reduction in the SSS for E is expected to be 5.7 points. So the expected difference in means between E and P is δ = 2.5 points. The common standard deviation for the reduction in SSS is expected to be 6. A standard Frequentist sample size calculation, using a 1-sided 5% significance level, indicates that 80 patients per group would be required for just over 80% power.
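The power figure quoted above can be checked with a short Python sketch. It assumes a normal approximation with the standard deviation treated as known, which may differ slightly from the method used for the original calculation; the function name `power_two_sample` is ours.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta: float, sigma: float, n_per_group: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-sample z-test.

    delta is the true difference in means, sigma the common standard
    deviation, and alpha the one-sided significance level.
    """
    z = NormalDist()                              # standard normal
    se = sigma * sqrt(2.0 / n_per_group)          # SE of the difference
    critical = z.inv_cdf(1.0 - alpha)             # one-sided cut-off
    return z.cdf(delta / se - critical)


# Values from the text: difference 2.5, sd 6, 80 patients per group.
power = power_two_sample(delta=2.5, sigma=6.0, n_per_group=80)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the power comes out at roughly 0.84, which is consistent with the description of "just over 80%".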

This sample size was considered too large for the planned Phase II study and so a Bayesian approach was taken. For several reasons, the prior distribution for the E mean was taken to be non-informative (i.e., we did not use any prior data from the single previous trial using a similar active treatment). In contrast, an informative prior distribution was used for the P mean. To obtain this, we used the same three previous trials that were used to derive the assumptions which informed the Frequentist sample size calculation. In the Bayesian context, however, the previous results are used more explicitly, to derive a prior distribution for the P mean. Then the prior distribution for δ is derived from the prior distributions for the E mean and the P mean.

Bayesian sample size calculation methods are described in Whitehead et al.5 In basic terms, the prior data represent a set of “pseudo-patients” and we need to work out how many additional patients to include in our new trial so that our posterior distribution (pseudo-patients + new patients) has the required properties. When the new sample size is calculated in this way, we find that no new patients are required for P. The three previous trials contributed 287 patients in total for P and it turns out that this is sufficient to meet the stated objectives. However, it makes intuitive sense to collect at least some data for P in the new trial so that we are not relying entirely on historical data. How can we do this? Well, it is possible to “discount” a proportion of the previous data, so that there is a greater requirement to collect new data. For example, a 90% “discount” on the prior data would be equivalent to using only 28.7 “pseudo-patients” (i.e., 10% of 287). This is equivalent to multiplying the prior variance for the placebo mean by a factor of 10. It is possible to decide on the discount factor and then re-calculate the sample size. In practice, however, a sensible sample size for P can be chosen first, and the discount factor then follows. For our study, we ended up choosing 80 patients for E and 30 patients for P. Compared with the 160 patients (80 per group) required by the Frequentist design, this is a saving of 50 patients (over 30%).
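The arithmetic behind the “pseudo-patients” and the discount can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions of our own: the placebo mean has a normal prior whose variance is the common variance (standard deviation 6, as quoted earlier) divided by the number of pseudo-patients. It is not the full method of Whitehead et al., and the function name `discounted_prior` is hypothetical.

```python
def discounted_prior(n_hist: float, sigma: float, discount: float):
    """Return (effective pseudo-patients, prior variance of the mean).

    discount is the proportion of historical information thrown away,
    e.g. 0.90 keeps only 10% of the historical patients.
    """
    n_eff = n_hist * (1.0 - discount)
    prior_var = sigma ** 2 / n_eff    # variance of a mean of n_eff patients
    return n_eff, prior_var


SIGMA = 6.0     # common standard deviation assumed in the text
N_HIST = 287    # historical placebo patients from the three trials

n_full, var_full = discounted_prior(N_HIST, SIGMA, discount=0.0)
n_disc, var_disc = discounted_prior(N_HIST, SIGMA, discount=0.90)

print(f"pseudo-patients after discount: {n_disc:.1f}")
print(f"variance inflation factor:      {var_disc / var_full:.1f}")

# Saving relative to the Frequentist design of 80 patients per group.
frequentist_total = 2 * 80
bayesian_total = 80 + 30
saving = frequentist_total - bayesian_total
print(f"patients saved: {saving} ({saving / frequentist_total:.1%})")
```

Discounting by 90% leaves 28.7 pseudo-patients and inflates the prior variance tenfold, matching the figures in the text; the saving of 50 patients is 31.25% of the Frequentist total of 160.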

After deciding on the sample size, the study was run and analysed using the planned Bayesian approach. Figure 1 shows the posterior distribution for the treatment difference (δ), which has mean 2.91 and standard deviation 1. The grey area in the left-hand plot represents the probability that the difference exceeds zero (treatment better than placebo) and the grey area in the right-hand plot represents the probability that the treatment is better than placebo by at least 2.5 points (the clinically meaningful difference). The 90% credible interval4 for the treatment difference is (1.30, 4.55) and this is identified in the left-hand plot. Given the data and the prior distributions, there is a 90% probability that the true difference lies between 1.30 and 4.55.
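The kind of probability statement read off Figure 1 can be approximated from the posterior summary alone. The Python sketch below assumes, purely for illustration, that the posterior is exactly normal with the quoted mean of 2.91 and standard deviation of 1. The actual posterior from the study need not be exactly normal, so the numbers are indicative only.

```python
from statistics import NormalDist

# Assumed normal approximation to the posterior for the treatment difference.
posterior = NormalDist(mu=2.91, sigma=1.0)

# Probability that the treatment is better than placebo at all.
p_better = 1.0 - posterior.cdf(0.0)

# Probability that the benefit is at least the clinically meaningful 2.5 points.
p_meaningful = 1.0 - posterior.cdf(2.5)

# Central 90% credible interval: the 5th and 95th percentiles.
lower, upper = posterior.inv_cdf(0.05), posterior.inv_cdf(0.95)

print(f"P(difference > 0)    = {p_better:.3f}")
print(f"P(difference >= 2.5) = {p_meaningful:.3f}")
print(f"90% credible interval: ({lower:.2f}, {upper:.2f})")
```

Under this approximation the probability of any benefit is above 0.99 and the probability of a clinically meaningful benefit is about 0.66. The approximate interval is close to, but not identical with, the reported (1.30, 4.55); the small difference presumably reflects rounding of the summary values or a posterior that is not exactly normal.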

Figure 1. Posterior distribution

Summary

Bayesian methods have wide application in pharmaceutical research in particular and in medical and health research more generally (see Spiegelhalter et al.4, for example). One important application of Bayesian methods is in the fitting of dose-response relationships. These relationships describe how the response to treatment changes as the dose of a drug is increased. Typically these relationships can be summarised by simple mathematical functions with three or four unknown parameters. Often, there are good prior data on these parameters. The use of modelling in conjunction with a Bayesian analysis can then bring large increases in precision in the estimation of these parameters compared with simple pair-wise comparisons of observed dose-group means.

Byron Jones is a Biometrical Fellow in Statistical Methodology at Novartis Pharma AG, Basel, Switzerland. Scott Haughie is a Director in the Primary Care Business Unit at Pfizer Ltd, Sandwich, UK.

## References

• 1. WinBUGS, Version 1.4.3, 2007. Medical Research Council, UK.
• 2. Bellhouse, D.R. (2004). The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth. Statistical Science, 19, 3-43.
• 3. Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 418.
• 4. Spiegelhalter, D.J., Abrams, K.R. and Myles, J.P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley: Chichester.
• 5. Whitehead, J., Valdes-Marquez, E., Johnson, P. and Graham, G. (2008). Bayesian sample size for exploratory clinical trials incorporating historical data. Statistics in Medicine, 27, 2307-2327.
