Bayesian and frequentist reasoning in plain English

Source: Internet. Published by: java实现平衡二叉树. Editor: 程序博客网. Time: 2024/06/02 06:55

Here is how I would explain the basic difference to my grandma:

I have misplaced my phone somewhere in the home. I can use the phone locator on the base of the instrument to locate the phone and when I press the phone locator the phone starts beeping.

Problem: Which area of my home should I search?

Frequentist Reasoning:

I can hear the phone beeping. I also have a mental model which helps me identify the area the sound is coming from. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.

Bayesian Reasoning:

I can hear the phone beeping. Now, apart from a mental model which helps me identify the area the sound is coming from, I also know the locations where I have misplaced the phone in the past. So, I combine my inferences from the beeps with my prior information about where I have misplaced the phone in the past to identify an area I must search to locate the phone.

answered Jul 19 '10 at 19:42

user28

I like the analogy. I would find it very useful if there were a defined question (based on a dataset) in which an answer was derived using frequentist reasoning and an answer was derived using Bayesian - preferably with R script to handle both reasonings. Am I asking too much? – Farrel Jul 19 '10 at 19:56

The simplest thing that I can think of is tossing a coin n times and estimating the probability of heads (denoted by p). Suppose we observe k heads. Then the probability of getting k heads is: P(k heads in n trials) = C(n, k) p^k (1-p)^(n-k), where C(n, k) is the binomial coefficient. Frequentist inference would maximize the above to arrive at an estimate of p = k / n. A Bayesian would say: Hey, I know that p ~ Beta(1,1) (which is equivalent to assuming that p is uniform on [0,1]). So the updated inference would be p ~ Beta(1+k, 1+n-k), and thus the Bayesian estimate of p (the posterior mean) would be p = (1+k) / (2+n). I do not know R, sorry. – user28 Jul 19 '10 at 20:11
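Not R, but the two coin estimates in this comment can be sketched in a few lines of Python. This is only an illustration: the function names and the 7-heads-in-10-tosses data are made up here, and the Bayesian estimate is taken to be the posterior mean under the Beta(1,1) prior mentioned in the comment.

```python
from fractions import Fraction


def mle_estimate(k: int, n: int) -> Fraction:
    """Frequentist maximum-likelihood estimate of p: the observed proportion k / n."""
    return Fraction(k, n)


def posterior_mean(k: int, n: int, a: int = 1, b: int = 1) -> Fraction:
    """Posterior mean of p under a Beta(a, b) prior.

    The posterior is Beta(a + k, b + n - k), whose mean is (a + k) / (a + b + n).
    With the uniform prior a = b = 1 this is (1 + k) / (2 + n).
    """
    return Fraction(a + k, a + b + n)


# Illustrative data (an assumption, not from the thread): 7 heads in 10 tosses.
k, n = 7, 10
print("MLE:", mle_estimate(k, n))               # 7/10
print("Posterior mean:", posterior_mean(k, n))  # 2/3
```

With only a handful of tosses the two estimates can differ noticeably; as n grows, the prior's contribution shrinks and they converge.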

It should be pointed out that, from the frequentist's point of view, there is no reason that you can't incorporate the prior knowledge into the model. In this sense, the frequentist view is simpler: you only have a model and some data. There is no need to separate the prior information from the model. – Robby McKilliam Sep 9 '10 at 22:29

@user28 As a comment on your comment, if n=3, then the frequentist would estimate p=0 (respectively p=1) upon seeing a result of k=0 heads (respectively k=3 heads), i.e., the coin is two-tailed or two-headed. The Bayesian estimates 1/5 and 4/5 respectively do allow for the possibility that it is a somewhat less biased coin. – Dilip Sarwate Oct 4 '11 at 13:25

@Farrel - the recent question at stats.stackexchange.com/questions/21439/… and my answer in two parts (unintentionally) is a nice simple example of this. It would be fairly easy to knock together an example dataset and R script showing the two approaches. – Peter Ellis Jan 23 '12 at 18:30

43 votes

Tongue firmly in cheek:

A Bayesian defines a "probability" in exactly the same way that most non-statisticians do - namely an indication of the plausibility of a proposition or a situation. If you ask him a question, he will give you a direct answer assigning probabilities describing the plausibilities of the possible outcomes for the particular situation (and state his prior assumptions).

A Frequentist is someone who believes probabilities represent long run frequencies with which events occur; if need be, he will invent a fictitious population from which your particular situation could be considered a random sample so that he can meaningfully talk about long run frequencies. If you ask him a question about a particular situation, he will not give a direct answer, but instead make a statement about this (possibly imaginary) population. Many non-frequentist statisticians will be easily confused by the answer and interpret it as Bayesian probability about the particular situation.

However, it is important to note that most Frequentist methods have a Bayesian equivalent that in most circumstances will give essentially the same result; the difference is largely a matter of philosophy, and in practice it is a matter of "horses for courses".

As you may have guessed, I am a Bayesian and an engineer. ;o)

edited Feb 5 '13 at 18:57

whuber♦

answered Aug 12 '10 at 14:53

Dikran Marsupial

As a non-expert, I think that the key to the entire debate is that people actually reason like Bayesians. You have to be trained to think like a frequentist, and even then it's easy to slip up and either reason or present your reasoning as if it were Bayesian. "There's a 95% chance that the value is within this confidence interval." Enough said. – Wayne Apr 7 '11 at 21:08

Downvoter - why the downvote? The feedback would be helpful. – Dikran Marsupial May 15 at 8:41

24 votes

Very crudely I would say that:

Frequentist: Sampling is infinite and decision rules can be sharp. Data are a repeatable random sample - there is a frequency. Underlying parameters are fixed i.e. they remain constant during this repeatable sampling process.

Bayesian: Unknown quantities are treated probabilistically and the state of the world can always be updated. Data are observed from the realised sample. Parameters are unknown and described probabilistically. It is the data which are fixed.

There is a brilliant blog post which gives an in-depth example of how a Bayesian and a Frequentist would tackle the same problem. Why not answer the problem for yourself and then check?

The problem (taken from Panos Ipeirotis' blog):

You have a coin that when flipped ends up head with probability p and ends up tail with probability 1-p. (The value of p is unknown.)

Trying to estimate p, you flip the coin 100 times. It ends up head 71 times.

Then you have to decide on the following event: "In the next two tosses we will get two heads in a row."

Would you bet that the event will happen or that it will not happen?

answered Jul 21 '10 at 15:50

Graham Cookson

Since 0.71^2 = 0.5041, I would regard this as close enough to an even bet to be prepared to go modestly either way just for fun (and to ignore any issues over the shape of the prior). I sometimes buy insurance and lottery tickets with far worse odds. – Henry Oct 4 '11 at 13:35
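For anyone who wants to check the numbers, here is a small Python sketch of the two-heads bet. It compares the plug-in calculation (square the observed proportion) with one possible Bayesian calculation. The uniform Beta(1, 1) prior and the helper name `prob_two_heads` are assumptions made for this illustration, not something taken from the blog post.

```python
from fractions import Fraction

heads, flips = 71, 100

# Plug-in approach: estimate p by the observed proportion, then square it.
p_hat = Fraction(heads, flips)
plug_in = p_hat ** 2


def prob_two_heads(a: int, b: int) -> Fraction:
    """E[p^2] when p ~ Beta(a, b), i.e. a(a + 1) / ((a + b)(a + b + 1))."""
    return Fraction(a * (a + 1), (a + b) * (a + b + 1))


# Bayesian approach with a uniform Beta(1, 1) prior (an assumption):
# the posterior is Beta(1 + 71, 1 + 29) = Beta(72, 30).
bayes = prob_two_heads(1 + heads, 1 + flips - heads)

print(float(plug_in))  # 0.5041
print(float(bayes))    # roughly 0.5003
```

Under these assumptions both numbers sit just above one half, so the choice of prior matters more to the decision than the choice of school.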

At the end of that blog post it says "instead of using the uniform distribution as a prior, we can be even more agnostic. In this case, we can use the Beta(0,0) distribution as a prior. Such a distribution corresponds to the case where any mean of the distribution is equally likely. In this case, the two approaches, Bayesian and frequentist give the same results." which kind of sums it up really! – tdc Feb 8 '12 at 8:39

The big problem with that blog post is it does not adequately characterize what a non-Bayesian (but rational) decision maker would do. It's little more than a straw man. – whuber♦ May 4 '12 at 22:36

@tdc: the Bayesian (Jeffreys) prior is Beta(0.5, 0.5) and some would say that it is the only justifiable prior. – Neil G Aug 3 '12 at 18:59

20 votes

Just a little bit of fun...

A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

From this site:

http://www2.isye.gatech.edu/~brani/isyebayes/jokes.html

and from the same site, a nice essay...

"An Intuitive Explanation of Bayes' Theorem"

http://yudkowsky.net/rational/bayes

edited Dec 4 '10 at 3:12

answered Dec 4 '10 at 3:06

Brett Magill

In which case, wouldn't the frequentist be one who knows the ratio of donkey, mule and horse populations, and upon observing a pack of mules starts to calculate the p-value to determine whether there has been a statistically significant increase in the population ratio of mules? – Andrew Apr 20 '12 at 6:22

18 votes

Let us say a man rolls a six sided die and it has outcomes 1, 2, 3, 4, 5, or 6. Furthermore, he says that if it lands on a 3, he'll give you a free textbook.

Then informally:

The Frequentist would say that each outcome has an equal 1 in 6 chance of occurring. She views probability as being derived from long run frequency distributions.

The Bayesian however would say: hang on a second, I know that man, he's David Blaine, a famous trickster! I have a feeling he's up to something. I'm going to say that there's only a 1% chance of it landing on a 3, BUT I'll re-evaluate that belief and change it the more times he rolls the die. If I see the other numbers come up equally often, then I'll iteratively increase the chance from 1% to something slightly higher; otherwise I'll reduce it even further. She views probability as degrees of belief in a proposition.
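The "re-evaluate as he rolls" idea can be sketched with a conjugate Beta update on the chance of rolling a 3. Everything specific here is an assumption for illustration: the Beta(1, 99) prior is just one way to encode "about 1%", and the list of rolls is made up.

```python
from fractions import Fraction


def update(alpha: int, beta: int, rolled_three: bool) -> tuple[int, int]:
    """One conjugate Beta update: a 3 adds to alpha, anything else adds to beta."""
    return (alpha + 1, beta) if rolled_three else (alpha, beta + 1)


def belief(alpha: int, beta: int) -> Fraction:
    """Current degree of belief that the next roll is a 3 (the Beta mean)."""
    return Fraction(alpha, alpha + beta)


# Hypothetical prior: Beta(1, 99), i.e. a 1% belief that a 3 comes up.
alpha, beta = 1, 99
rolls = [3, 1, 6, 3, 2, 5, 4, 3, 6, 1, 2, 4]  # made-up rolls; 3 appears three times
for r in rolls:
    alpha, beta = update(alpha, beta, r == 3)

print(belief(1, 99))        # 1/100
print(belief(alpha, beta))  # 1/28
```

The belief moves from 1% towards the observed frequency of threes, slowly at first because the prior counts as 100 imaginary rolls.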

edited Sep 18 '11 at 10:09

answered Jul 19 '10 at 23:40

Tony Breyal

I think the frequentist would (verbosely) point out his assumptions and would avoid making any useful prediction. Maybe he'd say, "Assuming the die is fair, each outcome has an equal 1 in 6 chance of occurring. Furthermore, if the die rolls are fair and David Blaine rolls the die 17 times, there is only a 5% chance that it will never land on 3, so such an outcome would make me doubt that the die is fair." – Thomas Levine Jun 16 '11 at 3:41
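The 17-roll figure in this comment is easy to check. A short Python sketch, assuming a fair die and independent rolls (the function name is made up for the illustration):

```python
from fractions import Fraction


def prob_no_three(n_rolls: int) -> Fraction:
    """Probability that a fair die shows no 3 in n_rolls independent rolls."""
    return Fraction(5, 6) ** n_rolls


# Smallest number of rolls for which "no 3 at all" has probability below 5%.
n = 1
while prob_no_three(n) >= Fraction(5, 100):
    n += 1

print(n)                         # 17
print(float(prob_no_three(17)))  # a little under 0.05
```

So 17 is the first roll count at which never seeing a 3 would be surprising at the 5% level under the fair-die assumption.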

@ThomasLevine I like that. Have an upvote from me :) – Tony Breyal Feb 29 '12 at 21:35

14 votes

The Bayesian is asked to make bets, which may include anything from which fly will crawl up a wall faster to which medicine will save most lives, or which prisoners should go to jail. He has a big box with a handle. He knows that if he puts absolutely everything he knows into the box, including his personal opinion, and turns the handle, it will make the best possible decision for him.

The frequentist is asked to write reports. He has a big black book of rules. If the situation he is asked to make a report on is covered by his rulebook, he can follow the rules and write a report so carefully worded that it is wrong, at worst, one time in 100 (or one time in 20, or one time in whatever the specification for his report says).

The frequentist knows (because he has written reports on it) that the Bayesian sometimes makes bets that, in the worst case, when his personal opinion is wrong, could turn out badly. The frequentist also knows (for the same reason) that if he bets against the Bayesian every time he differs from him, then, over the long run, he will lose.

answered Dec 4 '10 at 6:54

mcdowella

14 votes

In plain English, I would say that Bayesian and Frequentist reasoning are distinguished by two different ways of answering the question:

What is probability?

Most differences will essentially boil down to how each answers this question, for it basically defines the domain of valid applications of the theory. Now you can't really give either answer in terms of "plain English" without generating further questions. For me the answer is (as you could probably guess)

probability is logic

My "non-plain English" reason for this is that the calculus of propositions is a special case of the calculus of probabilities, if we represent truth by 1 and falsehood by 0. Additionally, the calculus of probabilities can be derived from the calculus of propositions. This conforms with "Bayesian" reasoning most closely - although it also extends Bayesian reasoning in applications by providing principles to assign probabilities, in addition to principles to manipulate them.

Of course, this leads to the follow-up question "what is logic?" For me, the closest thing I could give as an answer to this question is "logic is the common sense judgements of a rational person, with a given set of assumptions" (what is a rational person? etc. etc.). Logic has all the same features that Bayesian reasoning has. For example, logic does not tell you what to assume or what is "absolutely true". It only tells you how the truth of one proposition is related to the truth of another one. You always have to supply a logical system with "axioms" for it to get started on the conclusions. It also has the same limitations, in that you can get arbitrary results from contradictory axioms. But "axioms" are nothing but prior probabilities which have been set to 1.

For me, to reject Bayesian reasoning is to reject logic. For if you accept logic, then because Bayesian reasoning "logically flows from logic" (how's that for plain English :P ), you must also accept Bayesian reasoning.

For the frequentist reasoning, we have the answer:

probability is frequency

although I'm not sure "frequency" is a plain English term in the way it is used here - perhaps "proportion" is a better word. I wanted to add into the frequentist answer that the probability of an event is thought to be a real, measurable (observable?) quantity, which exists independently of the person/object who is calculating it. But I couldn't do this in a "plain English" way.

So perhaps a "plain English" version of one difference could be that frequentist reasoning is an attempt at reasoning from "absolute" probabilities, whereas Bayesian reasoning is an attempt at reasoning from "relative" probabilities.

Another difference is that frequentist foundations are more vague in how you translate the real world problem into the abstract mathematics of the theory. A good example is the use of "random variables" in the theory - they have a precise definition in the abstract world of mathematics, but there is no unambiguous procedure one can use to decide if some observed quantity is or isn't a "random variable".

In the Bayesian way of reasoning, the notion of a "random variable" is not necessary. A probability distribution is assigned to a quantity because it is unknown - which means that it cannot be deduced logically from the information we have. This provides at once a simple connection between the observable quantity and the theory - as "being unknown" is unambiguous.

You can also see in the above example a further difference in these two ways of thinking - "random" vs "unknown". "Randomness" is phrased in such a way that it seems to be a property of the actual quantity. Conversely, "being unknown" depends on which person you are asking about that quantity - hence it is a property of the statistician doing the analysis. This gives rise to the "objective" versus "subjective" adjectives often attached to each theory. It is easy to show that "randomness" cannot be a property of some standard examples, by simply asking two frequentists who are given different information about the same quantity to decide if it is "random". One is the usual Bernoulli Urn: frequentist 1 is blindfolded while drawing, whereas frequentist 2 is standing over the urn, watching frequentist 1 draw the balls from the urn. If the declaration of "randomness" is a property of the balls in the urn, then it cannot depend on the different knowledge of frequentists 1 and 2 - and hence the two frequentists should give the same declaration of "random" or "not random".

answered Aug 28 '11 at 15:51

probabilityislogic

I'd be interested if you could rewrite this without the reference to common sense. – Peter Ellis Jan 24 '12 at 10:03

@PeterEllis - What's wrong with common sense? We all have it, and it is usually foolish not to use it... – probabilityislogic Jan 24 '12 at 12:15

It's too contested what it actually is, and too culturally specific. "Common sense" is shorthand for whatever is the perceived sensible way of doing things in this particular culture (which all too often looks far from sensible to another culture in time and space), so referring to it in a definition ducks the key questions. It's particularly unhelpful as part of a definition of logic (and so, I would argue, is the concept of a "rational person" in that particular context - particularly as I am guessing your definition of a "rational person" would be a logical person who has common sense!) – Peter Ellis Jan 24 '12 at 18:04

I fail to understand why using common sense is bad. Using your definition of it, why would we not want to do what is sensible at the time? And what are the "key questions" that are being dodged? You say common sense has no well defined meaning, and then go and provide one! – probabilityislogic Jan 26 '12 at 0:56

He can't provide one; his argument is that there is no universal definition, only culturally-specific ones. Two people from different cultural backgrounds (and that includes different styles of statistical education) will quite possibly have two different understandings of what is sensible to do in a given situation. – naught101 Feb 21 '12 at 3:05

5 votes

In reality, I think much of the philosophy surrounding the issue is just grandstanding. That's not to dismiss the debate, but it is a word of caution. Sometimes, practical matters take priority - I'll give an example below.

Also, you could just as easily argue that there are more than two approaches:

Neyman-Pearson ('frequentist')
Likelihood-based approaches
Fully Bayesian

A senior colleague recently reminded me that "many people in common language talk about frequentist and Bayesian. I think a more valid distinction is likelihood-based and frequentist. Both maximum likelihood and Bayesian methods adhere to the likelihood principle whereas frequentist methods don't."

I'll start off with a very simple practical example:

We have a patient. The patient is either healthy (H) or sick (S). We will perform a test on the patient, and the result will be either Positive (+) or Negative (-). If the patient is sick, they will always get a Positive result. We'll call this the correct (C) result and say that

P(+ | S) = 1

P(Correct | S) = 1

If the patient is healthy, the test will be negative 95% of the time, but there will be some false positives.

P(- | H) = 0.95

P(+ | H) = 0.05

In other words, the probability of the test being Correct, for Healthy people, is 95%.

So, the test is either 100% accurate or 95% accurate, depending on whether the patient is sick or healthy. Taken together, this means the test is at least 95% accurate.

So far so good. Those are the statements that would be made by a frequentist. Those statements are quite simple to understand and are true. There's no need to waffle about a 'frequentist interpretation'.

But, things get interesting when you try to turn things around. Given the test result, what can you learn about the health of the patient? Given a negative test result, the patient is obviously healthy, as there are no false negatives.

But we must also consider the case where the test is positive. Was the test positive because the patient was actually sick, or was it a false positive? This is where the frequentist and Bayesian diverge. Everybody will agree that this cannot be answered at the moment. The frequentist will refuse to answer. The Bayesian will be prepared to give you an answer, but you'll have to give the Bayesian a prior first - i.e. tell it what proportion of the patients are sick.

To recap, the following statements are true:

For healthy patients, the test is very accurate.
For sick patients, the test is very accurate.

If you are satisfied with statements such as that, then you are using frequentist interpretations. This might change from project to project, depending on what sort of problems you're looking at.

But you might want to make different statements and answer the following question:

For those patients that got a positive test result, how accurate is the test?

This requires a prior and a Bayesian approach. Note also that this is the only question of interest to the doctor. The doctor will say "I know that the patients will either get a positive result or a negative result. I also know that the negative result means the patient is healthy and can be sent home. The only patients that interest me now are those that got a positive result -- are they sick?"
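To make the doctor's question concrete, here is a minimal Python sketch of Bayes' theorem applied to this test. The test characteristics P(+ | S) = 1 and P(+ | H) = 0.05 come from the example above; the two prior values (1% and 50% of patients sick) and the function name are hypothetical, chosen only to show how strongly the answer depends on the prior.

```python
from fractions import Fraction


def prob_sick_given_positive(prevalence: Fraction,
                             p_pos_given_sick: Fraction = Fraction(1),
                             p_pos_given_healthy: Fraction = Fraction(5, 100)) -> Fraction:
    """Bayes' theorem: P(S | +) = P(+ | S) P(S) / P(+)."""
    # Total probability of a positive result across sick and healthy patients.
    p_pos = p_pos_given_sick * prevalence + p_pos_given_healthy * (1 - prevalence)
    return p_pos_given_sick * prevalence / p_pos


# Hypothetical priors: 1% and 50% of patients are sick.
print(prob_sick_given_positive(Fraction(1, 100)))  # 20/119
print(prob_sick_given_positive(Fraction(1, 2)))    # 20/21
```

With a 1% prior, a positive result means the patient is sick only about 17% of the time, even though the test is "at least 95% accurate"; with a 50% prior the figure is about 95%.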

To summarize: In examples such as this, the Bayesian will agree with everything said by the frequentist. But the Bayesian will argue that the frequentist's statements, while true, are not very useful; and will argue that the useful questions can only be answered with a prior.

A frequentist will consider each possible value of the parameter (H or S) in turn and ask "if the parameter is equal to this value, what is the probability of my test being correct?"

A Bayesian will instead consider each possible observed value (+ or -) in turn and ask "If I imagine I have just observed that value, what does that tell me about the conditional probability of H-versus-S?"

answered Jun 26 '12 at 18:36

Aaron McDaid

Do you mean "For sick patients, the test is NOT very accurate"? Did you forget the NOT? – agstudy Jan 6 at 23:44

It's very accurate in both cases, so no I did not forget a word. For healthy people, the result will be correct (i.e. 'Negative') 95% of the time. And for sick people, the result will be correct (i.e. 'Positive') 100% of the time. – Aaron McDaid Jan 7 at 20:56

3 votes

Schools of thought in Probability Theory

answered Jul 19 '10 at 19:30

adamo

2 votes

This question about drawing inferences about an individual bowl player when you have two data sets - other players' results, and the new player's results, is a good spontaneous example of the difference which my answer tries to address in plain English.

answered Jan 24 '12 at 10:01

Peter Ellis

2 votes

Bayesian and frequentist statistics are compatible in that they can be understood as two limiting cases of assessing the probability of future events based on past events and an assumed model, if one admits that in the limit of a very large number of observations, no uncertainty about the system remains, and that in this sense a very large number of observations is equal to knowing the parameters of the model.

Assume we have made some observations, e.g., the outcome of 10 coin flips. In Bayesian statistics, you start from what you have observed and then you assess the probability of future observations or model parameters. In frequentist statistics, you start from an idea (hypothesis) of what is true by assuming scenarios of a large number of observations that have been made, e.g., the coin is unbiased and gives 50% heads if you throw it many, many times. Based on these scenarios of a large number of observations (=hypothesis), you assess the frequency of making observations like the one you did, i.e., the frequency of different outcomes of 10 coin flips. It is only then that you take your actual outcome, compare it to the frequency of possible outcomes, and decide whether the outcome belongs to those that are expected to occur with high frequency. If this is the case, you conclude that the observation made does not contradict your scenarios (=hypothesis). Otherwise, you conclude that the observation made is incompatible with your scenarios, and you reject the hypothesis.
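The frequentist procedure described here can be sketched as an exact binomial test in Python. This is one common way to formalize "outcomes that occur with low frequency" (a two-sided p-value), not necessarily the procedure the author has in mind. The observed outcome of 9 heads in 10 flips and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Frequency of seeing exactly k heads in n flips if the hypothesis p is true."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_sided_p_value(observed: int, n: int, p: Fraction) -> Fraction:
    """Total frequency of outcomes no more likely than the observed one."""
    threshold = binomial_pmf(observed, n, p)
    return sum((binomial_pmf(k, n, p) for k in range(n + 1)
                if binomial_pmf(k, n, p) <= threshold), Fraction(0))


# Hypothetical outcome: 9 heads in 10 flips, tested against a fair coin.
p_value = two_sided_p_value(9, 10, Fraction(1, 2))
print(p_value)  # 11/512
```

A value of 11/512 (about 2%) falls below the conventional 5% threshold, so under this convention the fair-coin hypothesis would be rejected.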

Thus Bayesian statistics starts from what has been observed and assesses possible future outcomes. Frequentist statistics starts with an abstract experiment of what would be observed if one assumes something, and only then compares the outcomes of the abstract experiment with what was actually observed. Otherwise the two approaches are compatible. They both assess the probability of future observations based on some observations made or hypothesized.

I started to write this up in a more formal way:

Positioning Bayesian inference as a particular application of frequentist inference and vice versa. figshare.

http://dx.doi.org/10.6084/m9.figshare.867707

The manuscript is new. If you happen to read it, and have comments, please let me know.

answered Dec 13 '13 at 18:37

user36160

1 vote

Somewhat OT, but here are two poems I wrote (one about each school of thought):

For Bayesians and for frequentists

answered Sep 18 '11 at 11:15

Peter Flom♦

+1, for humour, even though it doesn't answer the OP – naught101 Feb 21 '12 at 2:57

1 vote

I would say that they look at probability in different ways. The Bayesian is subjective and uses prior beliefs to define a prior probability distribution on the possible values of the unknown parameters. So he relies on a theory of probability like de Finetti's. The frequentist sees probability as something that has to do with a limiting frequency based on an observed proportion. This is in line with the theory of probability as developed by Kolmogorov and von Mises.
A frequentist does parametric inference using just the likelihood function. A Bayesian takes that, multiplies it by a prior, and normalizes it to get the posterior distribution that he uses for inference.
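The "multiply the likelihood by a prior and normalize" step can be illustrated with a simple grid approximation in Python. This is a rough sketch under assumed inputs: a coin-tossing likelihood with 7 heads in 10 tosses, a flat prior, and a 1001-point grid, none of which come from the answer itself.

```python
def grid_posterior(k, n, prior, size=1001):
    """Approximate the posterior for a coin's p on an evenly spaced grid."""
    grid = [i / (size - 1) for i in range(size)]
    # Binomial likelihood up to a constant factor (the constant cancels on normalizing).
    likelihood = [p ** k * (1 - p) ** (n - k) for p in grid]
    unnormalized = [lik * prior(p) for lik, p in zip(likelihood, grid)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    return grid, likelihood, posterior


# Hypothetical data: 7 heads in 10 tosses, with a flat prior on p.
grid, likelihood, posterior = grid_posterior(7, 10, prior=lambda p: 1.0)

# Likelihood-only inference: the value of p that maximizes the likelihood.
mle = grid[likelihood.index(max(likelihood))]
# Bayesian inference: here summarized by the posterior mean.
post_mean = sum(p * w for p, w in zip(grid, posterior))

print(round(mle, 3), round(post_mean, 3))  # 0.7 0.667
```

Swapping in a different `prior` function changes the posterior but leaves the likelihood, and hence the maximum-likelihood estimate, untouched.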

edited May 4 '12 at 22:56

chl♦

answered May 4 '12 at 22:03

Michael Chernick

0 votes

I have studied an exciting example like this. Take a look at this picture:

[Image not preserved in the source]

What do you see?

If you say that this may be a half-black, half-white dog, you are a frequentist.

If you see a black dog, you are a Bayesian: the judgement rests on your prior knowledge that half-black, half-white dogs rarely exist.

So most of us are Bayesians; we just don't recognize it.

edited Dec 5 at 8:30

answered May 28 '13 at 10:07

Kinh Nguyen

At best, this is a caricature of the differences in the two approaches. – whuber♦ May 28 '13 at 15:17

-4 votes

A male cat and a female cat are penned up in a steel chamber, along with enough food and water for 70 days.

A Frequentist would say the average gestation period for felines is 66 days, the female was in heat when the cats were penned up, and once in heat she will mate repeatedly for 4 to 7 days. Since there were likely many acts of propagation and enough subsequent time for gestation, the odds are, when the box is opened on day 70, there's a litter of newborn kittens.

A Bayesian would say, I heard some serious Marvin Gaye coming from the box on day 1 and then this morning I heard many kitten-like sounds coming from the box. So without knowing much about cat reproduction, the odds are, when the box is opened on day 70, there's a litter of newborn kittens.

edited Jul 20 '10 at 22:52

answered Jul 20 '10 at 19:54

A Lion

The way I wrote it up, specifically with the Bayesian not knowing much about cat reproduction, at the beginning only the frequentist would bet on there being kittens. The relevant points of my very crude example were mostly that the frequentist made his prediction based on the data at the beginning, then sat back without incorporating new supplementary data, while the Bayesian didn't have much data to begin with, but continued to incorporate relevant data as it became available. – A Lion Jul 21 '10 at 16:09

...and why wouldn't a non-Bayesian avail herself of the additional data, too? – whuber♦ May 4 '12 at 22:38

0 0