The EM Algorithm and Its Extensions


Preface
The major algorithms covered in this topic were long ago summarized very thoroughly by predecessors such as JerryLead, abcjennifer, and pluskid. My purpose in opening this topic is simply to venture, on the foundation they laid, to add a few superfluous strokes of my own: to share a little of the research and experience from my own work, and to discuss and improve together with everyone. Your support is much appreciated.

EM part
  I. The history of EM
    (1) The story of EM
    (2) The core of EM
  II. The derivation of EM
    (1) Deriving EM from a lower bound
    (2) Proof that the EM iterations converge
    (3) Generalizing the Q function to other distributions
  III. Extensions of EM
    (1) GEM
    (2) ECM
  IV. Some references on EM
GMM part
  I. One-dimensional GMM
    (1) Theoretical derivation
    (2) Implementation
      1. Data simulation (R)
      2. GMM implementation (Scala)
  II. Multidimensional GMM
    (1) Theoretical derivation
    (2) Implementation
      1. Data simulation (R)
      2. GMMs implementation (Scala)
        (a) Simulating initial values
        (b) Determinant computation
        (c) Matrix inverse computation
        (d) Putting GMMs together

I. The history of EM

The history of EM is a story of blood and tears. No wonder Murphy, the author of MLAPP, remarks: "This is an unfortunate term, since there are many ways to generalize EM...."

(1) The story of EM

[Figure 1: Dempster, Laird, and Rubin, key pioneers of the EM algorithm]
The development of the EM (Expectation Maximization) algorithm has a watershed year: 1977. That year, Dempster, Laird, and Rubin (Figure 1) jointly published the paper "Maximum Likelihood from Incomplete Data via the EM Algorithm". From then on, EM left its scattered, piecemeal state and entered people's view with a rigorous mathematical formulation and proof.

In fact, even before DLR (Dempster, Laird, Rubin), some MLE (maximum likelihood estimation) problems had already been attacked with EM-like ideas, but only in rather specific domains; it was DLR's unifying summary that made EM truly shine. So why call EM's history a story of blood and tears? The story continues after DLR. Once DLR had given EM its name, the Q function and the M step became the focus of much criticism, with two issues drawing most of the attention: the complexity of the Q function and the efficiency of the optimization in the M step. A contest then broke out around them. Some felt EM could be interpreted from a different angle, which led to the GEM framework (discussed later): it holds that one need not fully maximize the M step every time, it suffices to find a parameter value with a larger likelihood than before, somewhat reminiscent of gradient-descent iteration. Others found GEM's idea attractive but still thought the M-step iterations too slow in high dimensions, which led to the ECM extension of EM (also discussed later). Soon afterwards, someone felt ECM's improvement did not go far enough, modified the optimization inside its CM steps, and proposed the ECME algorithm; some time after ECME appeared, it was in turn folded into a newer framework, the AECM algorithm, and so on. There are many more stories like this: an idea proposed today is improved upon by someone else tomorrow. It is an algorithmic contest where blood and tears coexist, and it is precisely this persistence that keeps pushing the wheel of history forward.
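To make the GEM idea above concrete, the relaxed M-step condition can be stated as follows (standard notation, mine rather than the original post's). With the Q function defined as the expected complete-data log-likelihood,

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\big[\log p(X, Z \mid \theta)\big],$$

ordinary EM sets \(\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})\), whereas GEM only requires

$$Q(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q(\theta^{(t)} \mid \theta^{(t)}),$$

which is already enough to guarantee that the observed-data likelihood \(p(X \mid \theta)\) does not decrease from one iteration to the next. ECM keeps the same guarantee while replacing the single M step with a sequence of simpler conditional maximizations, each over a subset of the parameters with the rest held fixed.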


To be organized… (what follows are the draft Scala implementations of the one-dimensional GMM and the multidimensional GMMs listed in the outline above.)

/**
  * Created by yangcheng on 2016/9/10.
  * One-dimensional Gaussian mixture model (GMM) fitted by EM.
  * Hope it works well for our work.
  */
object GMM {
    def main(args: Array[String]): Unit = {
        import scala.io.Source._
        import scala.collection.mutable
        import scala.util.Random

        // load the data: one observation per line
        val data = fromFile("C:/Users/yangcheng/Desktop/rnormData.txt").getLines().toArray
        //data.foreach(println)

        // k is the number of Gaussian components
        val k: Int = 4

        // initialize the parameters: equal mixing weights, random means and variances
        val alpha: Array[Double] = new Array[Double](k).map(_ + (1.0 / k))
        val theta = new mutable.HashMap[Int, Map[String, Double]]()
        //Random.setSeed(1000)
        for (i <- 0 until k) {
            theta.put(i, Map("mean" -> Random.nextInt(100), "var" -> Random.nextInt(100)))
        }
        //theta.foreach(println)

        // training loop
        var len: Double = 1.0
        var j: Int = 0
        while (len > math.pow(10, -1)) {
            j += 1
            val oldTheta = theta.clone()

            // E step: posterior responsibility of each component for each observation
            val gamma = data.map(a => {
                val rs = new Array[Double](k)
                for (i <- 0 until k) {
                    rs(i) = alpha(i) * (1 / math.sqrt(2 * math.Pi * theta(i)("var")) *
                      math.exp(-math.pow(a.toDouble - theta(i)("mean"), 2) / 2 / theta(i)("var")))
                }
                rs.map(_ / rs.sum)
            })
            //gamma.foreach(a => println(a.mkString(",")))

            // M step: maximize the Q function
            for (i <- 0 until k) {
                // update the variance of component i (note: it uses the mean from the previous iteration)
                val sdjk = gamma.map(_ (i)).zip(data.map(a => math.pow(a.toDouble - theta(i)("mean"), 2))).map(a => a._1 * a._2).sum
                theta(i) += ("var" -> sdjk / gamma.map(_ (i)).sum)
                // update the mean of component i
                val meanjk = gamma.map(_ (i)).zip(data).map(a => a._1 * a._2.toDouble).sum
                theta(i) += ("mean" -> meanjk / gamma.map(_ (i)).sum)
                // update the mixing weight
                alpha(i) = gamma.map(_ (i)).sum / data.length
            }

            // stop when the total parameter change is small enough
            len = 0.0
            for (i <- 0 until k) {
                len += (theta(i)("mean") - oldTheta(i)("mean")).abs
                len += (theta(i)("var") - oldTheta(i)("var")).abs
            }
            //oldTheta.foreach(a => println("oldTheta = " + a))
        }

        // print the result
        for (i <- 0 until k) println("alpha" + i + " = " + alpha(i))
        theta.foreach(a => println("theta = " + a))
        println("j = " + j)
    }
}
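The outline lists a "data simulation (R)" step that is not included in this draft. As a minimal stand-in so the GMM code above can be run end to end, the sketch below (in Scala rather than R; the file path, sample size, and the four (mean, sd) pairs are illustrative assumptions, not values from the original post) writes one number per line in the format GMM.main reads:

import java.io.PrintWriter
import scala.util.Random

object SimulateGMMData {
    def main(args: Array[String]): Unit = {
        // (mean, sd) for each simulated component -- illustrative values only
        val components = Seq((0.0, 1.0), (20.0, 2.0), (50.0, 3.0), (80.0, 5.0))
        val perComponent = 500
        val out = new PrintWriter("C:/Users/yangcheng/Desktop/rnormData.txt")
        try {
            for ((mean, sd) <- components; _ <- 0 until perComponent) {
                out.println(mean + sd * Random.nextGaussian())
            }
        } finally out.close()
    }
}

With well-separated components like these, the EM loop above should, up to a permutation of the component labels, recover means close to the simulated ones.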

/**
  * Created by yangcheng on 2016/9/12.
  * Multidimensional (multivariate) Gaussian mixture model (GMMs) fitted by EM.
  * This is the more general setting for real-world data.
  * Hope it works well.
  */
object GMMs {
    def main(args: Array[String]): Unit = {
        import scala.io.Source._
        import scala.Array._
        import scala.collection.mutable
        import scala.util.Random
        import scala.annotation.tailrec

        // load the data: one point per line, coordinates separated by a single space
        val mvdata = fromFile("C:/Users/yangcheng/Desktop/mvrnormData.txt").getLines().toArray
        //mvdata.map(_.split(" ")).foreach(a => println(a.mkString("[", ", ", "]")))

        // initialize the parameters
        // k is the number of Gaussian components
        // dim is the dimension of each Gaussian
        val dim: Int = 2
        val k: Int = 2
        val alpha: Array[Double] = new Array[Double](k).map(_ + 1.0 / k)
        val mvMu = mutable.HashMap[Int, Array[Double]]()
        val mvSigma = mutable.HashMap[Int, Array[Array[Double]]]()
        val Imatrix = mutable.HashMap[Int, Array[Array[Double]]]()

        // Generate random mean vectors and covariance matrices: symmetric, with large positive
        // diagonal entries and small off-diagonal ones, so they are positive definite in practice
        // (cf. the Hurwitz criterion)
        for (i <- 0 until k) {
            // mean vector
            mvMu.put(i, new Array[Double](dim).map(_ + Random.nextInt(100)))
            // covariance matrix and identity matrix
            mvSigma.put(i, ofDim[Double](dim, dim))
            Imatrix.put(i, ofDim[Double](dim, dim))
            for (j <- 0 until dim) {
                mvSigma(i)(j)(j) = (Random.nextInt(10) + 1) * 10
                Imatrix(i)(j)(j) = 1
                for (s <- j + 1 until dim) {
                    mvSigma(i)(j)(s) = Random.nextGaussian()
                    mvSigma(i)(s)(j) = mvSigma(i)(j)(s)
                    Imatrix(i)(j)(s) = 0
                    Imatrix(i)(s)(j) = Imatrix(i)(j)(s)
                }
            }
        }
        //mvdata.map(_.split(" ")).map(a => a.map(_.toDouble)).map(b => b.zip(mvMu(0)).map(c => c._1 - c._2)).foreach(a => println(a.mkString("\t")))
        //alpha.foreach(println)
        mvMu.foreach(a => println("mvMu = " + a))
        mvMu(0).foreach(a => println("mvMu0 = " + a))
        //mvSigma.foreach(a => println("mvSigma = " + a))
        //mvSigma.foreach(a => println(a._2.mkString("\t")))
        //mvSigma(0).foreach(a => println(a.mkString("\t\t")))
        //Imatrix.foreach(a => println(a._2.mkString("\t")))
        //Imatrix(0).foreach(a => println(a.mkString("\t\t")))

        // (unused) adjustment of the augmented matrix to avoid zero rows or columns
        //def adjust(M: Array[Array[Double]]): Array[Array[Double]] = {
        //    val allRowSum = M.reduceLeft(_.zip(_).map(a => 2 * a._1 + 2 * a._2))
        //    M.map(a => a.zip(allRowSum).map(b => b._1 + b._2))
        //}
        //adjust(mvSigma(0)).foreach(a => println("adjust mvSigma = " + a.mkString("\t")))

        // Matrix inverse via elementary row transformations on the augmented matrix [Sigma | E];
        // no pivoting is performed, so the pivot and the entries below it are assumed to be nonzero
        def calculateInverse(Sigma: Array[Array[Double]], E: Array[Array[Double]]): Array[Array[Double]] = {
            val l = Sigma.length
            // combine Sigma with E to form the augmented matrix
            val ExtensionMatrix = ofDim[Double](l, 2 * l)
            for (i <- 0 until l) {
                ExtensionMatrix(i) = Sigma(i).++(E(i))
            }
            //ExtensionMatrix.foreach(a => println("ExtensionMatrix = " + a.mkString("\t")))

            // forward elimination, then back-substitution once n reaches l
            @tailrec
            def elementaryTransform(eM: Array[Array[Double]], n: Int): Array[Array[Double]] = {
                if (n >= l) {
                    for (j <- (1 until l).reverse) {
                        for (k <- 0 until j) {
                            eM(k) = eM(k).zip(eM(j)).map(a => a._1 - eM(k)(j) * a._2)
                        }
                    }
                    eM
                } else {
                    eM(n) = eM(n).map(_ / eM(n)(n))
                    for (i <- n until l - 1) {
                        eM(i + 1) = eM(i + 1).map(_ / eM(i + 1)(n))
                        eM(i + 1) = eM(i + 1).zip(eM(n)).map(a => a._1 - a._2)
                    }
                    elementaryTransform(eM, n + 1)
                }
            }

            // the inverse is the right half of the reduced augmented matrix
            elementaryTransform(ExtensionMatrix, 0).map(a => {
                val gg = new Array[Double](l)
                for (i <- 0 until l) {
                    gg(i) = a(i + l)
                }
                gg
            })
        }
        // print the output
        calculateInverse(Sigma = mvSigma(0), E = Imatrix(0)).foreach(a => println("Inverse of Sigma = " + a.mkString("\t")))

        // Determinant via reduction to an upper triangular matrix (product of the diagonal);
        // the row references of M are reassigned, so callers pass a cloned outer array
        @tailrec
        def calculateDeteminant(M: Array[Array[Double]], n: Int): Double = {
            if (n >= M.length - 1) {
                var Deteminant: Double = 1.0
                for (i <- M.indices) {
                    Deteminant = Deteminant * M(i)(i)
                }
                Deteminant
            } else {
                for (i <- n until M.length - 1) {
                    M(i + 1) = M(i + 1).zip(M(n)).map(a => a._1 - a._2 * M(i + 1)(n) / M(n)(n))
                }
                calculateDeteminant(M, n + 1)
            }
        }
        //println(calculateDeteminant(mvSigma(0),0))

        // row vector times matrix: left^T * middle
        def matrixMultiplicationLeft(left: Array[Double], middle: Array[Array[Double]]): Array[Double] = {
            val col = middle(0).length
            val leftResult = new Array[Double](col)
            for (j <- 0 until col) {
                leftResult(j) = left.zip(middle.map(_ (j))).map(a => a._1 * a._2).sum
            }
            leftResult
        }
        //matrixMultiplicationLeft(alpha,mvSigma(0)).foreach(println)

        // training loop
        var len: Double = 1.0
        var j: Int = 0
        while (len > math.pow(10, -1)) {
            j += 1
            val oldmvMu = mvMu.clone()
            val oldmvSigma = mvSigma.clone()

            // E step: posterior responsibility of each component for each observation
            val gamma = mvdata.map(a => {
                val rs = new Array[Double](k)
                for (i <- 0 until k) {
                    val dx = a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
                    val temp = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(i), Imatrix(i))).zip(dx).map(a => a._1 * a._2).sum
                    val ISigma = mvSigma(i).clone()
                    // dim / 2.0 (not integer division) in the normalizing constant
                    rs(i) = (1.0 / math.pow(2 * math.Pi, dim / 2.0)) * (1.0 / math.pow(calculateDeteminant(ISigma, 0).abs, 0.5)) * math.exp(-0.5 * temp) * alpha(i)
                }
                rs.map(_ / rs.sum)
            })
            //gamma.foreach(a => println(a.mkString(",")))
            //val dx = mvdata(0).split(" ").zip(mvMu(0)).map(b => b._1.toDouble - b._2)
            //println("dx = " + dx.mkString("\t"))
            //val temp00 = calculateInverse(Sigma = mvSigma(0), E = Imatrix(0))
            //val temp11 = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(0), Imatrix(0))).zip(dx).map(a => a._1 * a._2).sum
            //println("temp0 = " + temp00(0).mkString("\t"))

            // M step: maximize the Q function
            for (i <- 0 until k) {
                // update the covariance of component i (note: it uses the mean from the previous iteration)
                val iSigma = ofDim[Double](dim, dim)
                val dx2 = mvdata.map(a => {
                    a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
                })
                for (n <- 0 until dim) {
                    for (m <- 0 until dim) {
                        iSigma(n)(m) = gamma.map(_ (i)).zip(dx2.map(_ (n)).zip(dx2.map(_ (m))).map(a => a._1.toDouble * a._2)).map(b => b._1 * b._2).sum /
                          gamma.map(_ (i)).sum
                    }
                }
                mvSigma.put(i, iSigma)
                // update the mean of component i
                val iMu = new Array[Double](dim)
                for (j <- 0 until dim) {
                    iMu(j) = gamma.map(_ (i)).zip(mvdata.map(_.split(" ")).map(_ (j))).map(a => a._1 * a._2.toDouble).sum / gamma.map(_ (i)).sum
                }
                mvMu.put(i, iMu)
                // update the mixing weight
                alpha(i) = gamma.map(_ (i)).sum / mvdata.length
            }

            // stop when the total parameter change is small enough
            len = 0.0
            for (i <- 0 until k) {
                len += mvMu(i).zip(oldmvMu(i)).map(a => (a._1 - a._2).abs).sum
                for (j <- 0 until dim) {
                    len += mvSigma(i)(j).zip(oldmvSigma(i)(j)).map(a => (a._1 - a._2).abs).sum
                }
            }
        }

        // print the result
        alpha.foreach(println)
        mvMu.foreach(a => println("mvMu = " + a._2.mkString("\t")))
        mvSigma.foreach(a => println("mvSigma = " + a._2.mkString("\t")))
        mvSigma(0).foreach(a => println("mvSigma0 = " + a.mkString("\t")))
        mvSigma(1).foreach(a => println("mvSigma1 = " + a.mkString("\t")))
        println(j)
    }
}
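As with the one-dimensional case, the "data simulation (R)" step for the multivariate model is not included in this draft. GMMs.main expects mvrnormData.txt to contain one point per line with coordinates separated by a single space. The following sketch is a hypothetical Scala stand-in (the path, sample sizes, means, and covariance matrices are assumptions, not the original post's values); it samples each 2-D point via the Cholesky factor of the chosen covariance:

import java.io.PrintWriter
import scala.util.Random

object SimulateGMMsData {
    // Draw one sample from N(mu, sigma) for a 2x2 covariance via its Cholesky factor L (x = mu + L z)
    def sample2d(mu: Array[Double], sigma: Array[Array[Double]]): Array[Double] = {
        val l11 = math.sqrt(sigma(0)(0))
        val l21 = sigma(1)(0) / l11
        val l22 = math.sqrt(sigma(1)(1) - l21 * l21)
        val z0 = Random.nextGaussian()
        val z1 = Random.nextGaussian()
        Array(mu(0) + l11 * z0, mu(1) + l21 * z0 + l22 * z1)
    }

    def main(args: Array[String]): Unit = {
        // (mean vector, covariance matrix) for each simulated component -- illustrative values only
        val comps = Seq(
            (Array(0.0, 0.0), Array(Array(4.0, 1.0), Array(1.0, 3.0))),
            (Array(30.0, 40.0), Array(Array(9.0, -2.0), Array(-2.0, 6.0)))
        )
        val out = new PrintWriter("C:/Users/yangcheng/Desktop/mvrnormData.txt")
        try {
            for ((mu, sigma) <- comps; _ <- 0 until 500) {
                out.println(sample2d(mu, sigma).mkString(" "))
            }
        } finally out.close()
    }
}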