Programming Collective Intelligence (4): Optimization

Source: Internet. Published by: 大众点评源码. Edited by: 程序博客网. Time: 2024/05/17 23:00

Blog: http://andyheart.me.

If you spot any mistakes, please message me or leave a comment. Thanks.

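This chapter introduces several common optimization algorithms. Optimization techniques are well suited to problems that depend on many variables and have many possible solutions.

The key to any optimization algorithm is defining the cost function.

The algorithms covered are:

Random search
Hill climbing (and random-restart hill climbing)
Simulated annealing
Genetic algorithms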

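Group travel

The chapter starts from a group travel problem.

Problem: family members from all over the United States need to fly to the same place, arriving on the same day and leaving on the same day. Design a sensible schedule for them.

Analysis: to implement this, we first need each member's name and home airport, and then the available flight information.

The corresponding Python code is: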

import time
import random
import math

# Each person's name and home airport
people = [('Seymour', 'BOS'),
          ('Franny', 'DAL'),
          ('Zooey', 'CAK'),
          ('Walt', 'MIA'),
          ('Buddy', 'ORD'),
          ('Les', 'OMA')]

# LaGuardia airport in New York
destination = 'LGA'

# Maps (origin, dest) to a list of (depart, arrive, price) tuples
flights = {}
with open('schedule.txt') as f:
    for line in f:
        origin, dest, depart, arrive, price = line.strip().split(',')
        flights.setdefault((origin, dest), [])
        # Add details to the list of possible flights
        flights[(origin, dest)].append((depart, arrive, int(price)))

def getminutes(t):
    # Convert an 'HH:MM' string to minutes since midnight
    x = time.strptime(t, '%H:%M')
    return x[3] * 60 + x[4]

def printschedule(r):
    # r holds two flight indices per person: outbound at 2*d, return at 2*d+1
    for d in range(len(r) // 2):
        name = people[d][0]
        origin = people[d][1]
        out = flights[(origin, destination)][int(r[2 * d])]
        ret = flights[(destination, origin)][int(r[2 * d + 1])]
        print('%10s%10s %5s-%5s $%3s %5s-%5s $%3s' % (name, origin,
                                                      out[0], out[1], out[2],
                                                      ret[0], ret[1], ret[2]))
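The code above reads a solution as a flat list of numbers, two per person: the index of the outbound flight followed by the index of the return flight. The following is a minimal sketch of that encoding. The two-person `people` list, the tiny `flights` table, and the helper `decode` are made up for illustration and are not taken from `schedule.txt`.

```python
# Hypothetical miniature schedule: two flights per route, illustration only.
people = [('Seymour', 'BOS'), ('Franny', 'DAL')]
destination = 'LGA'
flights = {
    ('BOS', 'LGA'): [('06:17', '08:26', 89), ('08:04', '10:11', 95)],
    ('LGA', 'BOS'): [('06:39', '08:09', 86), ('08:23', '10:28', 149)],
    ('DAL', 'LGA'): [('06:12', '10:22', 230), ('07:53', '11:37', 433)],
    ('LGA', 'DAL'): [('06:09', '09:49', 414), ('07:57', '11:15', 347)],
}

def decode(solution):
    """Map a flat list of flight indices to (name, outbound, return) tuples."""
    trips = []
    for d, (name, origin) in enumerate(people):
        # Person d owns positions 2*d (outbound) and 2*d+1 (return).
        out = flights[(origin, destination)][solution[2 * d]]
        ret = flights[(destination, origin)][solution[2 * d + 1]]
        trips.append((name, out, ret))
    return trips

# Seymour: 2nd outbound, 1st return. Franny: 1st outbound, 2nd return.
trips = decode([1, 0, 0, 1])
for name, out, ret in trips:
    print(name, out, ret)
```

With this encoding, every optimization algorithm only has to search over lists of small integers, regardless of what the numbers mean.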

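Cost function

As noted above, the cost function is the key to optimization. Once it is defined, the algorithm only has to make its value as small as possible. The goal of any optimization algorithm is to find a set of inputs that minimizes the value returned by the cost function.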

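In this example, the main factors that affect the cost are:

Price: the total fare of all flights, i.e. the financial factor.
Travel time: the time each person spends on the plane.
Waiting time: the time spent waiting at the airport.
Departure time: flights that leave very early may carry an extra cost.
Car rental period: if the group rents a car, returning it at a later time of day than it was picked up means paying for an extra day (the code below charges $50 for this).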

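Once the factors are identified, we need a way to combine them into a single value (the return value of the cost function). In this example we could, for instance, assume that each minute in the air is worth $1 and each minute waiting at the airport is worth $0.50, so that the cost of a whole schedule can be expressed as one number.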
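As a worked example of that weighting, here is a small sketch. The function name `weighted_cost` and the parameters `air_rate` and `wait_rate` are hypothetical and are not part of the chapter's code; the rates are the $1 and $0.50 per minute mentioned above.

```python
def weighted_cost(ticket_price, minutes_in_air, minutes_waiting,
                  air_rate=1.0, wait_rate=0.5):
    """Combine price and time into one number (dollars).

    air_rate and wait_rate are the assumed dollar value of one minute
    in the air and one minute waiting at the airport.
    """
    return (ticket_price
            + air_rate * minutes_in_air
            + wait_rate * minutes_waiting)

# A $200 ticket, 180 minutes flying, 60 minutes waiting:
# 200 + 1.0 * 180 + 0.5 * 60 = 410.0
cost = weighted_cost(200, 180, 60)
print(cost)
```

The weights are a modelling choice: raising `wait_rate` makes the optimizer prefer schedules where people arrive close together, even if the tickets cost more.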

Add the following function to the code:

def schedulecost(sol):
    totalprice = 0
    latestarrival = 0
    earliestdep = 24 * 60

    for d in range(len(sol) // 2):
        # Get the outbound and return flights (two indices per person)
        origin = people[d][1]
        outbound = flights[(origin, destination)][int(sol[2 * d])]
        returnf = flights[(destination, origin)][int(sol[2 * d + 1])]

        # Total price is the price of all outbound and return flights
        totalprice += outbound[2]
        totalprice += returnf[2]

        # Track the latest arrival and earliest departure
        if latestarrival < getminutes(outbound[1]):
            latestarrival = getminutes(outbound[1])
        if earliestdep > getminutes(returnf[0]):
            earliestdep = getminutes(returnf[0])

    # Every person must wait at the airport until the latest person arrives.
    # They also must arrive at the same time and wait for their flights.
    totalwait = 0
    for d in range(len(sol) // 2):
        origin = people[d][1]
        outbound = flights[(origin, destination)][int(sol[2 * d])]
        returnf = flights[(destination, origin)][int(sol[2 * d + 1])]
        totalwait += latestarrival - getminutes(outbound[1])
        totalwait += getminutes(returnf[0]) - earliestdep

    # Does this solution require an extra day of car rental? That'll be $50!
    # The car is picked up at the latest arrival and returned at the earliest
    # departure; returning it later in the day than pickup costs an extra day.
    if latestarrival < earliestdep:
        totalprice += 50

    return totalprice + totalwait

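With the cost function in place, the goal is to find the inputs that minimize its value.

Optimization algorithms

Given a fixed cost function, an optimization algorithm tries to get its value as low as possible.

Random search

As the name suggests, random search simply tries solutions at random: it generates a fixed number of random solutions, computes the cost of each one, and keeps the one with the lowest cost. The Python code is: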

def randomoptimize(domain, costf):
    best = 999999999
    bestr = None
    for i in range(0, 1000):
        # Create a random solution
        r = [float(random.randint(domain[j][0], domain[j][1]))
             for j in range(len(domain))]

        # Get the cost
        cost = costf(r)

        # Compare it to the best one so far
        if cost < best:
            best = cost
            bestr = r
    # Return the best solution found, not the last one generated
    return bestr
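To try random search without the flight data, the sketch below runs the same idea on a toy problem. Everything here is an assumption made for illustration: the function `random_search`, the `domain` of four values between 0 and 8, and `toy_cost`, which measures the squared distance to an arbitrary `target` list.

```python
import random

def random_search(domain, costf, tries=1000, rng=random):
    """Sample `tries` random solutions and keep the cheapest one."""
    best_cost = float('inf')
    best = None
    for _ in range(tries):
        # randint is inclusive on both ends, matching the (low, high) pairs.
        candidate = [rng.randint(lo, hi) for lo, hi in domain]
        cost = costf(candidate)
        if cost < best_cost:
            best_cost, best = cost, candidate
    return best, best_cost

# Toy problem: four variables, each between 0 and 8.
domain = [(0, 8)] * 4
target = [3, 1, 4, 1]

def toy_cost(sol):
    # Squared distance from the (arbitrary) target; 0 is the optimum.
    return sum((s - t) ** 2 for s, t in zip(sol, target))

# A seeded generator makes the run repeatable.
best, best_cost = random_search(domain, toy_cost, rng=random.Random(0))
print(best, best_cost)
```

Random search ignores everything it has learned from earlier samples, which is why it serves mainly as a baseline for the methods that follow.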

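Hill climbing (and random-restart hill climbing)

Hill climbing starts from a random solution and looks among its neighboring solutions for a better one (one with a lower cost). It repeats this until no neighbor improves on the current solution, and returns that solution. The weakness is that this point is only a local minimum and not necessarily the global minimum. One remedy is random-restart hill climbing: run hill climbing several times from different randomly generated starting solutions, in the hope that one of the runs reaches the global minimum. The Python code is: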

def hillclimb(domain, costf):
    # Create a random solution
    sol = [random.randint(domain[i][0], domain[i][1])
           for i in range(len(domain))]

    # Main loop
    while 1:
        # Create a list of neighboring solutions
        neighbors = []
        for j in range(len(domain)):
            # Move one step away in each direction, staying inside the domain
            if sol[j] > domain[j][0]:
                neighbors.append(sol[0:j] + [sol[j] - 1] + sol[j + 1:])
            if sol[j] < domain[j][1]:
                neighbors.append(sol[0:j] + [sol[j] + 1] + sol[j + 1:])

        # Look for the best solution among the neighbors
        current = costf(sol)
        best = current
        for j in range(len(neighbors)):
            cost = costf(neighbors[j])
            if cost < best:
                best = cost
                sol = neighbors[j]

        # If no neighbor is better, we have reached a (local) minimum
        if best == current:
            break

    return sol
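The random-restart variant described above has no code in this post, so here is a minimal sketch of one way it could look. The names `hill_climb`, `random_restart_hill_climb`, the `start` and `restarts` parameters, and the one-dimensional `bumpy_cost` table are all assumptions made for illustration.

```python
import random

def hill_climb(domain, costf, rng, start=None):
    """Greedy descent; returns (solution, cost) at a local minimum."""
    sol = start[:] if start else [rng.randint(lo, hi) for lo, hi in domain]
    while True:
        current = costf(sol)
        best, best_cost = sol, current
        for j, (lo, hi) in enumerate(domain):
            for delta in (-1, 1):
                # Only consider neighbors that stay inside the domain.
                if lo <= sol[j] + delta <= hi:
                    neighbor = sol[:j] + [sol[j] + delta] + sol[j + 1:]
                    cost = costf(neighbor)
                    if cost < best_cost:
                        best, best_cost = neighbor, cost
        if best_cost == current:
            return sol, current  # no neighbor improves: local minimum
        sol = best

def random_restart_hill_climb(domain, costf, restarts=10, seed=0):
    """Run hill_climb from several random starts and keep the best result."""
    rng = random.Random(seed)
    results = [hill_climb(domain, costf, rng) for _ in range(restarts)]
    return min(results, key=lambda pair: pair[1])

# Toy landscape with a local minimum at x=1 and the global minimum at x=6.
table = [5, 3, 4, 6, 2, 1, 0, 1, 7]
domain = [(0, 8)]

def bumpy_cost(sol):
    return table[sol[0]]

stuck = hill_climb(domain, bumpy_cost, random.Random(0), start=[0])
lucky = hill_climb(domain, bumpy_cost, random.Random(0), start=[8])
best = random_restart_hill_climb(domain, bumpy_cost)
print(stuck, lucky, best)
```

Starting at 0 the climb stops at x=1 with cost 3, while starting at 8 it reaches x=6 with cost 0. Restarting from several random points makes it likely, though not guaranteed, that at least one run lands in the better valley.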

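Simulated annealing

Principle: the algorithm starts from a random solution and uses a variable that represents temperature, which starts very high and gradually decreases. In each iteration it picks one of the numbers in the solution at random and changes it in some direction. The key point is this: if the new cost is lower, the new solution becomes the current solution; if the new cost is higher, the new solution may still become the current solution with some probability. This is an attempt to avoid getting stuck in a local minimum.

Early on, the algorithm is quite willing to accept worse solutions. As it proceeds and the temperature drops, it becomes less and less willing to accept them. The probability of accepting a worse solution is given by the following formula: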

$$P = e^{-(\text{highcost} - \text{lowcost}) / \text{temperature}}$$
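To get a feel for the formula, the sketch below evaluates it for one fixed cost increase at three temperatures. The function name `acceptance_probability` and the sample costs (1000 and 1100) are chosen purely for illustration.

```python
import math

def acceptance_probability(lowcost, highcost, temperature):
    """Probability of accepting a solution that is worse by highcost - lowcost."""
    return math.exp(-(highcost - lowcost) / temperature)

# The same cost increase of 100, evaluated as the temperature falls.
probabilities = {}
for temperature in (10000.0, 100.0, 1.0):
    probabilities[temperature] = acceptance_probability(1000, 1100, temperature)
    print(temperature, probabilities[temperature])
```

For a cost increase of 100, the probability is about 0.99 at a temperature of 10000, about 0.37 at 100, and practically zero at 1. A worse solution is therefore almost always accepted at the start and almost never at the end.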

For this example, the Python code of the algorithm is:

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the values randomly
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]

    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)

        # Choose a direction to change it
        direction = random.randint(-step, step)

        # Create a new list with one of the values changed,
        # clamped to the domain
        vecb = vec[:]
        vecb[i] += direction
        if vecb[i] < domain[i][0]:
            vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]:
            vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        # Acceptance probability e^(-(eb - ea) / T); note the parentheses
        p = pow(math.e, -(eb - ea) / T)

        # Is it better, or does it make the probability cutoff?
        if eb < ea or random.random() < p:
            vec = vecb

        # Decrease the temperature
        T = T * cool

    return vec
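The defaults `T=10000.0` and `cool=0.95` together with the stopping threshold of 0.1 fix how many iterations the loop above performs. The sketch below counts them; `cooling_steps` and its `floor` parameter are hypothetical names introduced for illustration.

```python
def cooling_steps(T=10000.0, cool=0.95, floor=0.1):
    """Count the iterations of a geometric cooling schedule.

    Mirrors the loop `while T > floor: ...; T = T * cool`.
    """
    steps = 0
    while T > floor:
        steps += 1
        T *= cool
    return steps

steps = cooling_steps()
slower = cooling_steps(cool=0.99)
print(steps, slower)
```

With the defaults this gives 225 iterations, so the cost function is evaluated only a few hundred times. A `cool` value closer to 1 lowers the temperature more slowly and gives the search more iterations, at the price of a longer run.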

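Genetic algorithms

Principle: the algorithm first generates a random set of solutions, called the population. It ranks them by cost and keeps the solutions with the lowest cost unchanged (this is called elitism). It then fills the rest of the new population by modifying those top solutions. There are two ways to modify a solution: mutation and crossover (breeding). Mutation randomly changes the value of one element of a solution. Crossover combines two solutions by taking the first part of one and the rest of the other. The code can be written as follows: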

def geneticoptimize(domain, costf, popsize=50, step=1,
                    mutprob=0.2, elite=0.2, maxiter=100):
    # Mutation operation: move one randomly chosen value by one step
    def mutate(vec):
        i = random.randint(0, len(domain) - 1)
        if random.random() < 0.5 and vec[i] > domain[i][0]:
            return vec[0:i] + [vec[i] - step] + vec[i + 1:]
        elif vec[i] < domain[i][1]:
            return vec[0:i] + [vec[i] + step] + vec[i + 1:]
        # No move was made (the value sits at a bound);
        # return an unchanged copy rather than None
        return vec[:]

    # Crossover operation: head of r1 joined with tail of r2
    def crossover(r1, r2):
        i = random.randint(1, len(domain) - 2)
        return r1[0:i] + r2[i:]

    # Build the initial population
    pop = []
    for i in range(popsize):
        vec = [random.randint(domain[j][0], domain[j][1])
               for j in range(len(domain))]
        pop.append(vec)

    # How many winners from each generation?
    topelite = int(elite * popsize)

    # Main loop
    for i in range(maxiter):
        scores = [(costf(v), v) for v in pop]
        scores.sort()
        ranked = [v for (s, v) in scores]

        # Start with the pure winners
        pop = ranked[0:topelite]

        # Add mutated and bred forms of the winners
        while len(pop) < popsize:
            if random.random() < mutprob:
                # Mutation
                c = random.randint(0, topelite)
                pop.append(mutate(ranked[c]))
            else:
                # Crossover
                c1 = random.randint(0, topelite)
                c2 = random.randint(0, topelite)
                pop.append(crossover(ranked[c1], ranked[c2]))

        # Print current best score
        print(scores[0][0])

    return scores[0][1]
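To see the two operators in isolation, the sketch below applies them to small hand-made lists. These standalone versions take `domain`, `rng`, and the crossover `point` as explicit arguments, which is an assumption made so the example runs on its own; the parent lists are arbitrary.

```python
import random

def mutate(vec, domain, step=1, rng=random):
    """Return a copy of vec with one randomly chosen value moved by step."""
    i = rng.randint(0, len(domain) - 1)
    lo, hi = domain[i]
    # Half of the time, try stepping down if that stays inside the domain.
    if rng.random() < 0.5 and vec[i] - step >= lo:
        return vec[:i] + [vec[i] - step] + vec[i + 1:]
    # Otherwise try stepping up.
    if vec[i] + step <= hi:
        return vec[:i] + [vec[i] + step] + vec[i + 1:]
    # No move was made; return an unchanged copy.
    return vec[:]

def crossover(r1, r2, point):
    """Take the first `point` values from r1 and the rest from r2."""
    return r1[:point] + r2[point:]

parent = [4, 4, 4, 4]
mutant = mutate(parent, [(0, 8)] * 4, rng=random.Random(1))
child = crossover([1, 1, 1, 1], [9, 9, 9, 9], 2)
print(mutant)
print(child)  # [1, 1, 9, 9]
```

Mutation makes small local changes, much like a hill-climbing step, while crossover can combine good partial solutions found by two different members of the population.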

0 0