UVa 10900 So you want to be a 2^n-aire?

Source: Internet. Editor: 程序博客网. Time: 2024/05/22 01:58

Original problem:
The player starts with a prize of $1, and is asked a sequence of n questions. For each question, he may
• quit and keep his prize.
• answer the question.
If wrong, he quits with nothing. If correct, the prize is doubled, and he continues with the next question. After the last question, he quits with his prize. The player wants to maximize his expected prize. Once each question is asked, the player is able to assess the probability p that he will be able to answer it. For each question, we assume that p is a random variable uniformly distributed over the range t..1.
Input
Input is a number of lines, each with two numbers: an integer 1 ≤ n ≤ 30, and a real 0 ≤ t ≤ 1. Input
is terminated by a line containing ‘0 0’. This line should not be processed.
Output
For each input n and t, print the player’s expected prize, if he plays the best strategy. Output should
be rounded to three fractional digits.
Sample Input
1 0.5
1 0.3
2 0.6
24 0.25
0 0
Sample Output
1.500
1.357
2.560
230.138
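Summary:
In a TV game show you start with 1 yuan. The host asks you n questions. After hearing each question you have two choices: give up on the question, leave the game and take your prize, or answer it. If you answer correctly, the prize is doubled; if you answer incorrectly, the game ends and you get nothing. If you answer all n questions correctly, you walk away with 2^n yuan and become a "2^n-aire". Answering is risky, of course. After hearing each question you can immediately estimate the probability of answering it correctly. Because the host picks questions at random, you may assume that this probability is uniformly distributed between t and 1 for every question. Given an integer n and a real number t (1≤n≤30, 0≤t≤1), compute the expected prize under the optimal strategy, where the optimal strategy is the one that makes the expected prize as large as possible.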
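Code:

#include <bits/stdc++.h>
using namespace std;

double PowTwo[31], t;   // PowTwo[i] = 2^i
int n;

int main()
{
    ios::sync_with_stdio(false);
    PowTwo[0] = 1;
    for (int i = 1; i <= 30; i++)
        PowTwo[i] = PowTwo[i-1] * 2;
    double ex;
    // Valid cases have n >= 1, so n == 0 marks the terminating "0 0" line.
    // Checking the stream state also stops the loop at end of input.
    while (cin >> n >> t && n != 0)
    {
        // t == 1: every answer is certain, and (1 - t) below would be zero.
        if (fabs(1 - t) < 1e-9)
        {
            cout << fixed << setprecision(3) << PowTwo[n] << endl;
            continue;
        }
        ex = PowTwo[n];                     // d[n] = 2^n
        for (int i = n - 1; i >= 0; i--)    // ex holds d[i+1], becomes d[i]
        {
            double p0 = PowTwo[i] / ex;     // break-even probability
            if (p0 <= t)                    // always answer
                ex = (1 + t) / 2 * ex;
            else
            {
                double p1 = (p0 - t) / (1 - t);   // probability of quitting
                ex = PowTwo[i] * p1 + (1 + p0) / 2 * ex * (1 - p1);
            }
        }
        cout << fixed << setprecision(3) << ex << endl;
    }
    return 0;
}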

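Solution:
I could not solve this one on my own, and then happened to find that it is one of the worked examples on mathematical expectation in Liu Rujia's "purple book" (刘汝佳, 紫书). It really is a nice problem, and the idea is not easy to come up with.
The explanation below is taken from that book.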
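Suppose you have just started the game. If you quit right away, the prize is 1. If you answer, what is the expected prize? It depends not only on the probability p of answering question 1 correctly, but also on what happens with the later questions. That is:
expected prize after choosing "answer question 1" = p * (maximum expected prize after answering 1 question correctly)
Note that "the maximum expected prize after answering 1 question correctly" does not depend on this p, which suggests a recurrence. Let d[i] denote "the maximum expected prize after answering i questions correctly". Adding the "do not answer" option gives: if the probability of answering question 1 correctly is p, then
maximum expected prize = max{2^0, p*d[1]}
It is written as 2^0 on purpose, to stress that this is the final prize for "quitting after answering 0 questions correctly".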
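This analysis generalizes, with one caveat: so far p has been assumed to be known, whereas p is actually not fixed but uniformly distributed over [t, 1]. By the definition of expectation for a continuous variable, d[i] is conceptually the integral of max{2^i, p*d[i+1]} over p from t to 1, weighted by the uniform density 1/(1-t), that is, the average of that expression over the interval. Do not be put off by the word "integral": although conceptually this is an integral, the actual computation only needs elementary math.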
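Because of the max, two cases have to be handled separately: p×d[i+1] < 2^i and p×d[i+1] ≥ 2^i.
Let p0 = max{t, 2^i/d[i+1]} (the max is there because the problem guarantees p ≥ t). Then:
When p < p0, p×d[i+1] < 2^i, so "do not answer" is better and the expected prize is 2^i.
When p ≥ p0, "answer" is better and the expected prize is d[i+1] times the average of p (d[i+1] is a constant, so it can be factored out), i.e. (1+p0)/2 × d[i+1]. (Taking the average here is just computing the expectation of a uniform distribution, this time over [p0, 1]; the result is the expected probability of answering correctly.)
In the first case the actual range of p is [t, p0), so its probability is p1 = (p0-t)/(1-t). By the law of total expectation, d[i] = 2^i × p1 + (1+p0)/2 × d[i+1] × (1-p1).
When t = 1 the expression for p1 divides by zero; every answer is then certain to be correct, so the result is simply 2^n, which is why the code above handles that case separately.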
The boundary condition is d[n] = 2^n; working backwards down to d[0] gives the answer to the problem.
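As a check against the samples, here is the recurrence worked by hand for the inputs "2 0.6" and "1 0.3" (approximate values are rounded to four decimals):

```latex
\begin{align*}
&\text{Input } 2\ 0.6: \\
&d[2] = 2^2 = 4 \\
&i = 1:\quad p_0 = \max\{0.6,\ 2/4\} = 0.6,\quad p_1 = 0,\quad
  d[1] = \tfrac{1 + 0.6}{2} \cdot 4 = 3.2 \\
&i = 0:\quad p_0 = \max\{0.6,\ 1/3.2\} = 0.6,\quad p_1 = 0,\quad
  d[0] = \tfrac{1 + 0.6}{2} \cdot 3.2 = 2.56 \\[1ex]
&\text{Input } 1\ 0.3: \\
&d[1] = 2^1 = 2 \\
&i = 0:\quad p_0 = \max\{0.3,\ 1/2\} = 0.5,\quad
  p_1 = \tfrac{0.5 - 0.3}{1 - 0.3} = \tfrac{2}{7} \approx 0.2857 \\
&d[0] = 1 \cdot \tfrac{2}{7} + \tfrac{1 + 0.5}{2} \cdot 2 \cdot \tfrac{5}{7}
      = \tfrac{9.5}{7} \approx 1.3571
\end{align*}
```

Both results agree with the sample outputs 2.560 and 1.357. In the first input the break-even probability never exceeds t, so the player always answers; in the second, the player quits whenever p falls below 0.5.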

0 0