Bayesian Network Structure Learning: Handling Continuous Variables

Source: Internet | Published under: 接收http报文数据 (receiving HTTP message data) | Editor: 程序博客网 | Time: 2024/04/19 17:37

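        First, a caveat: strictly speaking, this is not a complete write-up. It does not end with a definitive result, and I am not convinced that the programs it cites are correct.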

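        Earlier posts introduced three Matlab implementations of Bayesian network structure learning. All of them assume the dataset is discrete, i.e., each node of the Bayesian network can take only a finite number of values. How should we handle nodes whose values are continuous?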

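        After reading some of the literature, I found that the key to modeling continuous data with a Bayesian network is to discretize the continuous values. The functions introduced earlier fail only because each node now takes continuous values, so discretization solves the problem. The paper usually cited for this problem is: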

Tsai C J, Lee C I, Yang W P. A discretization algorithm based on class-attribute contingency coefficient[J]. Information Sciences, 2008, 178(3): 714-731.
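Concretely, once a set of cut points has been chosen, discretization just replaces each continuous value by the index of the interval it falls into. Below is a minimal Python sketch of that last step. It assumes right-closed intervals, as in Guangdi Li's DiscretWithInterval quoted later, so a value equal to a boundary goes to the lower interval. The function name is my own, and the cut points 10.5 and 30 are arbitrary rather than the output of any algorithm.

```python
from bisect import bisect_left
from typing import List, Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Replace each continuous value by the index of its interval.

    `cut_points` must be sorted inner boundaries b_1 < ... < b_k; they define
    the k + 1 intervals (-inf, b_1], (b_1, b_2], ..., (b_k, +inf), numbered 0..k.
    """
    # bisect_left counts the cut points strictly smaller than v, so a value
    # equal to a boundary lands in the interval on its left (right-closed).
    return [bisect_left(cut_points, v) for v in values]


if __name__ == "__main__":
    ages = [3, 5, 6, 15, 17, 21, 35, 45]
    print(discretize(ages, [10.5, 30.0]))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The hard part, and what CACC addresses, is choosing the cut points.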

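        Looking into it, I found that a small group of researchers works specifically on this, so it is a research direction in its own right. The paper proposes the CACC (Class-Attribute Contingency Coefficient) algorithm. I have spent more than a month, on and off, on Bayesian network structure learning, and I did not want to study CACC in detail and implement it myself. So I searched online for existing Matlab code and found the following link: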

http://cn.mathworks.com/matlabcentral/fileexchange/24343-discretization-algorithms--class-attribute-contingency-coefficient
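For reference, the quantity CACC maximizes can be written as below. I transcribe it from the caccvalue function in Julio Zaragoza's code quoted later, where log is the natural logarithm, rather than re-deriving it from the paper. The symbols are:

- S: the number of classes
- n: the number of intervals
- M: the number of examples
- q_{ir}: the number of class-i examples in interval r
- M_{i+}, M_{+r}: the row and column totals of this "quanta matrix"

Note that M(y-1) is Pearson's chi-square statistic of the quanta matrix.

```latex
y = \sum_{i=1}^{S}\sum_{r=1}^{n}\frac{q_{ir}^{2}}{M_{i+}\,M_{+r}},\qquad
y' = \frac{M\,(y-1)}{\log n},\qquad
\mathrm{cacc} = \sqrt{\frac{y'}{y'+M}}
```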

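        The link leads to MATLAB Central, MathWorks' official community. It works a bit like a BBS where users share their code. Switch to the Functions tab (the default is the Overview tab).

        There you can see the CACC code shared by Guangdi Li. It has two parts: the CACC_Discretization function (the CACC algorithm itself) and a test example, ControlCenter.m. The code of CACC_Discretization is as follows: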

```matlab
function [ DiscreData,DiscretizationSet ] = CACC_Discretization( OriginalData, C )
%Paper: Cheng-Jung Tsai , Chien-I. Lee , Wei-Pang Yang, A discretization
%algorithm based on Class-Attribute Contingency Coefficient, Information Sciences: an International Journal, v.178 n.3, p.714-731, February, 2008

%1 Input: Dataset with i continuous attribute, M examples and S target classes;
%2 Begin
%3 For each continuous attribute Ai
%4 Find the maximum dn and the minimum d0 values of Ai;
%5 Form a set of all distinct values of A in ascending order;
%6 Initialize all possible interval boundaries B with the minimum and maximum
%7 Calculate the midpoints of all the adjacent pairs in the set;
%8 Set the initial discretization scheme as D: {[d0,dn]}and Globalcacc = 0;
%9 Initialize k = 1;
%10 For each inner boundary B which is not already in scheme D,
%11 Add it into D;
%12 Calculate the corresponding cacc value;
%13 Pick up the scheme D' with the highest cacc value;
%14 If cacc > Globalcacc or k < S then
%15 Replace D with D'
%16 Globalcacc = cacc;
%17 k = k + 1;
%18 Goto Line 10;
%18 Else
%19 D' = D;
%20 End If
%21 Output the Discretization scheme D' with k intervals for continuous attribute Ai;
%22 End

% This code is implemented by Guangdi Li, 2009/06/04
% OriginalData is organized as F1,F2,...,Fm,C1,C2,...,Cn
F = size( OriginalData,2 ) - C ;
M = size( OriginalData,1 );
DiscreData = zeros( M,C+F );
DiscreData( :,F+1:F+C ) = OriginalData( :,F+1:F+C );
% Assume the maximum number of interval is M/(3*C)
MaxNumF = floor(M/(3*C));
% Save all the discretization intervals, which is saved in column
DiscretizationSet = zeros( MaxNumF,F );
for p = 1:F
    % Step 1
    %Dn = max( OriginalData( :,p )); % the maximum boundary
    %Do = min( OriginalData( :,p )); % the minimum boundary
    SortedInterval = unique( OriginalData( :,p ) );
    if length(SortedInterval) == 1 % all values are equal
        DiscretizationSet( 1,p )= SortedInterval;
        DiscreData( :,p ) = zeros(M,1);
        continue;
    end
    B = zeros( 1,length( SortedInterval )-1 );
    Len = length( B );
    for q = 1:Len
        B( q ) = ( SortedInterval( q ) + SortedInterval( q+1 ) )/ 2;
    end
    %B
    D = zeros( 1,MaxNumF ); % D save all discretizations for variable Fi
    %D( 1 ) = Do; D( 2 ) = Dn;
    GlobalCACC = -Inf;
    %B
    %p
    %Step 2
    k=0; % save the number of discretizations in D, the initiate state is 2
    while true
        CACC = - Inf; Local = 0;
        for q = 1:Len
            if isempty( find( D( 1:k )==B(q), 1 ) ) == 1
                DTemp = D;
                DTemp( k+1 ) = B( q );
                DTemp( 1:( k+1 ) ) = sort( DTemp( 1:( k+1 ) ) );
                CACCValue = CACC_Evaluation( OriginalData,C,p,DTemp( 1:( k+1 ) ) );
                if CACC < CACCValue
                    CACC = CACCValue;
                    Local= q;
                end
            end
        end
        %Local
        %CACC
        %GlobalCACC
        if GlobalCACC < CACC && k < MaxNumF
            GlobalCACC = CACC
            k = k + 1;
            D( k ) = B( Local );
            D( 1:k ) = sort( D( 1:k ) );
        elseif  k <= MaxNumF && k <= C && Local ~= 0
            k = k + 1;
            D( k ) = B( Local );
            D( 1:k ) = sort( D( 1:k ) );
        else
            break;
        end
    end
    DiscretizationSet( 1:k,p )= D( 1:k )';
    % do the discretization process according to intervals in D.
    DiscreData( :,p ) = DiscretWithInterval( OriginalData,C,p,D( 1:k ) );
end
end

function CACCValue = CACC_Evaluation( OriginalData, C, Feature, DiscretInterval )
%Paper: Kurgan, L. and Cios, K.J. (2002). CAIM Discretization Algorithm, IEEE Transactions of Knowledge and Data Engineering, 16(2): 145-153
% OriginalData is organized as F1,F2,...,Fm,C1,C2,...,Cn
M = size( OriginalData,1 );
k = length( DiscretInterval );
[ DiscretData,QuantaMatrix ] = DiscretWithInterval( OriginalData,C,Feature,DiscretInterval );
%Discrete the continuous data upon OriginalData
%QuantaMatrix
% Compute the value of CAIM via quanta matrix and equation (sum maxr/Mr)/n
RowQuantaMatrix = sum( QuantaMatrix,2 );
ColumnQuantaMatrix = sum( QuantaMatrix,1 );
CACCValue = 0 ;
for p = 1:C
    for q = 1:k
       if RowQuantaMatrix( p ) > 0 && ColumnQuantaMatrix( q ) > 0
          CACCValue = CACCValue + ( QuantaMatrix( p,q ) )^2/( RowQuantaMatrix( p )*ColumnQuantaMatrix( q )) ;
       end
    end
end
CACCValue = M*( CACCValue-1 )/log2(k+1) ;
end

function [ DiscretData,QuantaMatrix ] = DiscretWithInterval( OriginalData,C,Column,DiscretInterval )
% C is the number of class variables.
M = size( OriginalData,1 );
k = length( DiscretInterval );
F = size( OriginalData,2 ) - C;
DiscretData = zeros( M,1 );
%Discrete the continuous data upon OriginalData
for p = 1:M
    for t = 1:k
         if OriginalData( p,Column ) <= DiscretInterval( t )
             DiscretData( p ) = t-1;
             break;
         elseif OriginalData( p,Column ) > DiscretInterval( k )
             DiscretData( p ) = k;
         end
    end
end
%OriginalData( :,Column )
%Quanta matrix
CState = C;
FState = length( DiscretInterval ) + 1;
QuantaMatrix = zeros( CState,FState );
for p = 1:M
    for q = 1:C
        if OriginalData( p,F+q ) == 1
           Row = q;
           Column = DiscretData( p )+1;
           QuantaMatrix( Row,Column ) = QuantaMatrix( Row,Column ) + 1;
        end
    end
end
%QuantaMatrix
end
```
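The header comment above is essentially the paper's pseudocode. To have a third point of comparison, here is a compact Python sketch of that pseudocode for a single attribute. The design choices, all my own, are:

- It uses the cacc formula from Julio Zaragoza's caccvalue, quoted further down.
- It keeps the "or k < S" clause, so at least S intervals are produced when enough boundaries exist.
- It imposes no maximum number of intervals. Li's code caps the count through MaxNumF = floor(M/(3*C)), and Zaragoza's through maxnumintervals = floor(0.75*M).

The function names are my own. It is a hedged reading of the pseudocode, not a reproduction checked against Table 2 of the paper.

```python
import math
from typing import List, Sequence


def cacc_score(values: Sequence[float], labels: Sequence[int], scheme: Sequence[float]) -> float:
    """cacc of scheme [d0, b_1, ..., b_k, dn], using the formula of Zaragoza's caccvalue."""
    classes = sorted(set(labels))
    n = len(scheme) - 1                              # number of intervals
    quanta = [[0] * n for _ in classes]
    for v, c in zip(values, labels):
        r = sum(1 for b in scheme[1:-1] if v > b)    # right-closed intervals
        quanta[classes.index(c)][r] += 1
    rows = [sum(row) for row in quanta]
    cols = [sum(col) for col in zip(*quanta)]
    m = len(values)
    y = sum(quanta[i][r] ** 2 / (rows[i] * cols[r])
            for i in range(len(classes)) for r in range(n)
            if rows[i] > 0 and cols[r] > 0)
    y_prime = max(0.0, m * (y - 1) / math.log(n))    # n >= 2 here; clamp rounding noise
    return math.sqrt(y_prime / (y_prime + m))


def cacc_discretize(values: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Greedy top-down CACC for one attribute, following the quoted pseudocode."""
    distinct = sorted(set(values))
    scheme = [distinct[0], distinct[-1]]             # D = {[d0, dn]}
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n_classes = len(set(labels))                     # S
    global_cacc, k = 0.0, 1                          # k = number of intervals
    while candidates:
        # try each remaining inner boundary and keep the best scheme D'
        # (ties go to the larger boundary, an arbitrary choice)
        best_cacc, best_b = max(
            (cacc_score(values, labels, sorted(scheme + [b])), b) for b in candidates)
        if best_cacc > global_cacc or k < n_classes:
            scheme = sorted(scheme + [best_b])
            candidates.remove(best_b)
            global_cacc, k = best_cacc, k + 1
        else:
            break
    return scheme


if __name__ == "__main__":
    ages = [3, 5, 6, 15, 17, 21, 35, 45, 46, 51, 56, 57, 66, 70, 71]
    labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, 1]  # Care=1, Edu=2, Work=3
    print(cacc_discretize(ages, labels))
```

Like Zaragoza's cacc, it takes a single label column. On a toy input such as values 1, 2, 3, 10, 11, 12 with labels 1, 1, 1, 2, 2, 2, it returns the scheme [1, 6.5, 12].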

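        However, note the two user comments in the Comments and Ratings section of the Overview page.

        The second comment, by karin Zachinelly, says this implementation is incomplete, or in fact wrong ("This implementation is not complete and it is actually incorrect"). The first comment, by Julio Zaragoza, says he has deactivated the karin Zachinelly account (the account behind the second comment) and that his own CACC implementation can be found on his profile page. Following that lead, I reached Julio Zaragoza's profile: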

https://cn.mathworks.com/matlabcentral/profile/authors/3419760-julio-zaragoza

       His profile shows that he has shared a CACC implementation:

https://cn.mathworks.com/matlabcentral/fileexchange/41740-discretization-methods--class-attribute-contingency-coefficient--cacc-matlab-?s_tid=prof_contriblnk

       Again, the default is the Overview tab; switch to the Functions tab.

        The code has two parts: the cacc function and a test example, main.m. The cacc function is as follows:

```matlab
function [ discdata,discvalues,discscheme ] = cacc(data)

    % Author: Julio Zaragoza
    % The University of Adelaide, Adelaide, South Australia
    %
    % This function follows the paper:
    % 'A Discretization Algorithm based on Class-Attribute Contingency Coefficient' (CACC),
    % by Sheng-Jung Tsai, Shien-I Lee and Wei-Pang Yang, Information Sciences, Elsevier, 2008
    %
    % input:
    % data: a MxN matrix containing the data to be discretized where M is
    %       the number of examples and N is the number of features
    %       (including the class).
    %
    %       Data should be organized as follows F1,F2,...,Fn-1,S (the last
    %       column is taken as the class)

    % output:
    % discdata: a MxN matrix containing the discretized data (it can be
    %           normalized or not normalized... check the end of this file).
    % discvalues: a Nx1 cell containing the possible values of each feature.
    % discscheme: a N-1x1 cell containing the discretization scheme (the
    %             boundaries for each feature).

    fprintf('cacc discretization...\n');

    % variables for storing the discretized data, values and
    % the discretization scheme
    discdata = zeros(size(data));
    discvalues = cell([size(data,2) 1]);
    discscheme = cell([size(data,2)-1 1]);

    % s is the number of target classes of S (possible class values)
    s = unique(data(:,size(data,2)));

    % assume the maximum number of intervals is M*0.75 (there is no point in
    % using an algorithm to discretize a dataset if the resulting discretized
    % dataset contains a lot of possible values/intervals)
    maxnumintervals = floor(size(data,1)*0.75);

    % following the pseudo-code of the paper:
    % for each continuous attribute Ai
    for A = 1:size(data,2)-1 % (-1 because we are not discretizing the class)

        % find the maximum dn and the minimum d0 values of Ai
        d0 = min(data(:,A));
        dn = max(data(:,A));

        % form a set of all distinct values of A in ascending order
        distincvaluesA = sort(unique(data(:,A)));

        % calculate the midpoints of all the adjacent pairs in the set
        B = (distincvaluesA(1:length(distincvaluesA)-1)+distincvaluesA(2:length(distincvaluesA)))/2;

        % set the initial discretization scheme as D: {[d0,dn]} and globalcacc = 0;
        D = [d0 dn];
        globalcacc = 0;

        % initialize k = 1 (well... = 0 in this code), this is for helping
        % the algorithm to stop once we have reached the maximum number of
        % intervals
        k = 0;

        %for each inner boundary B which is not already in scheme D, Add it into D
        %   calculate the corresponding cacc value
        %   pick up the scheme D' with the highest cacc value:

        for i=1:length(B)
            auxB = B;
            maxcacc = 0;
            while length(auxB) > 0
                if(auxB(1) == d0)
                    auxB(1) = [];
                    continue;
                end
                % add the boundary B which is not already in scheme D
                D = unique(sort([D, auxB(1)]));

                % calculate cacc value
                caccval = caccvalue(data(:,A), D, s, data(:,size(data,2)));

                % print acc value
                caccval

                % pick up the scheme D' with the highest cacc value
                if(caccval > maxcacc)
                    Dprime = D;
                    maxcacc = caccval;
                    toremove = auxB(1);
                end
                % remove the boundary B (since we already tried with it)
                D=D(D~=auxB(1));
                auxB(1) = [];
            end

            % if cacc > globalcacc
            %    replace D with D', globalcacc = cacc:
            if maxcacc > globalcacc
                B = B(B~=toremove);
                D = Dprime;
                globalcacc = maxcacc;
                k = k + 1;
                if  k > maxnumintervals % if we have reached the maximun number
                    break;              % of intervals we stop and continue with
                end                     % the next attribute Ai+1
            end
        end

        % output the discretization scheme D' and discretized
        % data with k intervals for continuous attribute Ai
        discscheme{A} = D;
        discdata(:,A) = discretizedata(data(:,A),D);
        discvalues{A} = unique(discdata(:,A));
    end
    discdata(:,size(data,2)) = data(:,size(data,2));
    discvalues{size(data,2)} = unique(discdata(:,size(data,2)));
end

function caccval = caccvalue(data, discscheme, s, c)
    M = size(data,1);

    % discretize the continuous data and compute the quanta matrix:

    % I decided not to call the 'discretizedata' function (at the end of this file)
    % and then compute the quanta matrix since that would require two for-
    % loops iterating over each instance on the dataset each one (one for
    % discretizing the data in the 'discretizedata' function and another
    % one for computing the quanta matrix).
    % I discovered that I could use only one for-loop to discretize the data and compute
    % the quanta matrix at the same time as follows:

    discretizeddata = zeros(size(data,1),1);
    quantamatrix = zeros(length(s),length(discscheme)-1);
    for i = 1:size(data,1)
        for t = 2:length(discscheme)
           if data(i) <= discscheme(t)
               discretizeddata(i) = t-1; % discretize data
               break;
           end
        end
        % compute quanta matrix
        quantamatrix(c(i),discretizeddata(i)) = quantamatrix(c(i),discretizeddata(i)) + 1;
    end

    % compute y value by using the quanta matrix:
    y = 0;
    rowquantamatrix = sum(quantamatrix,2);
    columnquantamatrix = sum(quantamatrix,1);

    for p = 1:length(s)
        for q = 1:length(discscheme)-1
           if rowquantamatrix(p) > 0 && columnquantamatrix(q) > 0
              y = y + (quantamatrix(p,q))^2 / (rowquantamatrix(p)*columnquantamatrix(q));
           end
        end
    end

    % compute y' value from y value:
    yprime = M*(y-1)/log(length(discscheme)-1);

    % compute cacc value from y' value:
    caccval = sqrt(yprime/(yprime+M));
end

function discdata = discretizedata(data, discscheme)
    % discretize the continuous data upon the
    % discretization scheme,
    % this function is called at the end of the cacc algorithm, when we
    % already have the 'final' discretization scheme
    discdata = zeros(size(data,1),1);
    for p = 1:size(data,1)
        for t = 2:length(discscheme)
           if data(p) <= discscheme(t)
               discdata(p) = t-1;
               break;
           end
        end
    end

    % normalize discrete data
    % i.e., if the discretized data
    % for attribute Ai is e.g.: 1,9,13,15,
    % 'normalize' these discrete values as 1,2,3,4:
    % (you can comment the next 4 lines of code if you
    % don't want this functionality):
    normvalues = sort(unique(discdata));
    for i = 1:length(normvalues)
        discdata(find(discdata==normvalues(i))) = i;
    end
end
```
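Putting caccvalue next to Guangdi Li's CACC_Evaluation, quoted earlier, shows three differences:

- Li's version stops at M(y-1)/log2(k+1) and never takes the square root.
- Li's D holds only the k inner boundaries, so k+1 is the number of intervals n.
- If y were summed over all n intervals, Li's score would equal ln 2 · y'. Since sqrt(y'/(y'+M)) increases with y' >= 0, the two scores would then rank candidate schemes identically.

The Python sketch below checks this on the Age data for every single cut point; the function names are mine. If the reasoning holds, the differing results must come from other details. Candidates include:

- Li's inner loop runs q = 1:k although QuantaMatrix has k+1 columns, which as I read it leaves the last interval out of y.
- The two versions encode the class information differently.
- Their stopping rules differ.

```python
import math
from typing import List, Sequence


def quanta_matrix(values: Sequence[float], labels: Sequence[int],
                  inner_cuts: Sequence[float]) -> List[List[int]]:
    """Rows = sorted classes, columns = the len(inner_cuts) + 1 right-closed intervals."""
    classes = sorted(set(labels))
    q = [[0] * (len(inner_cuts) + 1) for _ in classes]
    for v, c in zip(values, labels):
        q[classes.index(c)][sum(1 for b in inner_cuts if v > b)] += 1
    return q


def y_stat(q: List[List[int]]) -> float:
    rows = [sum(r) for r in q]
    cols = [sum(c) for c in zip(*q)]
    return sum(q[i][r] ** 2 / (rows[i] * cols[r])
               for i in range(len(q)) for r in range(len(cols))
               if rows[i] > 0 and cols[r] > 0)


def y_prime(q: List[List[int]], m: int) -> float:
    # Zaragoza's caccvalue: y' = M (y - 1) / ln(n), n = number of intervals
    return m * (y_stat(q) - 1) / math.log(len(q[0]))


def cacc_zaragoza(q: List[List[int]], m: int) -> float:
    yp = max(0.0, y_prime(q, m))  # clamp tiny negative rounding noise
    return math.sqrt(yp / (yp + m))


def score_li_all_columns(q: List[List[int]], m: int) -> float:
    # Li's CACC_Evaluation ends with M (y - 1) / log2(k + 1), where k + 1 = n;
    # here y is summed over all n columns (the listing loops q = 1:k only)
    return m * (y_stat(q) - 1) / math.log2(len(q[0]))
```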

        I tested both Guangdi Li's and Julio Zaragoza's code:

```matlab
%Test CACC Algorithm(Class-Attribute Contingency Coefficient)
%Reference: Tsai C J, Lee C I, Yang W P. A discretization algorithm based on
%class-attribute contingency coefficient[J]. Information Sciences, 2008, 178(3): 714-731.
clear all;close all;clc;
%Age dataset of Table 2 in Reference
%Care-->1; Edu-->2; Work-->3
dataset = [ 3,  1;
            5,  1;
            6,  1;
            15, 2;
            17, 2;
            21, 2;
            35, 3;
            45, 3;
            46, 3;
            51, 2;
            56, 2;
            57, 2;
            66, 1;
            70, 1;
            71, 1];
[ discdata,discvalues,discscheme ] = cacc(dataset);
C = 1; % The input C of CACC_Discretization is the number of classes (i.e., how many columns of dataset are class columns)
[ DiscreData1,DiscretizationSet1 ] = CACC_Discretization( dataset, C );
%[ DiscreData2,DiscretizationSet2 ] = CAIM_Discretization( dataset, C );
```

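The two results really are different, and both also seem to differ from the results in the paper cited at the beginning. Note that CACC is a supervised discretization method, as the paper's abstract states.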

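        To understand what "supervised" means here, it helps to look at supervised learning in machine learning; the concept is the same. CACC therefore needs class information as input. Julio Zaragoza's implementation assumes that only the last column of the input data holds the class. Guangdi Li's implementation lets you declare that the last several columns all carry supervision information. Multi-label learning, a hot research topic in machine learning, is one example with multiple supervision signals. So, setting aside the question of whose implementation is correct, each probably has its own strengths and weaknesses.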
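One consequence of these different interfaces: in DiscretWithInterval, a row is counted for class column q only when OriginalData(p,F+q) == 1. If that reading is right, Li's functions expect each class column to be a 0/1 indicator. The test script above passes the Age data's single label column (values 1, 2, 3) with C = 1, so only the six Care rows would enter the quanta matrix, which may explain part of the discrepancy. Below is a hypothetical Python helper (not part of either package) that converts a label column into indicator columns, after which C would be 3. It also counts how many rows would pass that `== 1` test.

```python
from typing import List, Sequence, Tuple


def to_indicator_columns(rows: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """[f1, ..., label] rows -> [f1, ..., I(label == c_1), ..., I(label == c_S)] rows."""
    classes = sorted({row[-1] for row in rows})
    converted = [list(row[:-1]) + [1 if row[-1] == c else 0 for c in classes]
                 for row in rows]
    return converted, classes


def rows_counted(rows: Sequence[Sequence[float]], n_class_cols: int) -> int:
    """Rows passing DiscretWithInterval's `OriginalData(p,F+q) == 1` test for some q."""
    return sum(1 for row in rows if any(v == 1 for v in row[-n_class_cols:]))


if __name__ == "__main__":
    age = [(3, 1), (5, 1), (6, 1), (15, 2), (17, 2), (21, 2), (35, 3), (45, 3),
           (46, 3), (51, 2), (56, 2), (57, 2), (66, 1), (70, 1), (71, 1)]
    print(rows_counted(age, 1))          # 6: only the Care (label 1) rows
    converted, classes = to_indicator_columns(age)
    print(rows_counted(converted, 3))    # 15: every row, with C = 3
```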

        In addition, following the trail to Guangdi Li's profile:

https://cn.mathworks.com/matlabcentral/profile/authors/1714285-guangdi-li

        His Contributions list includes code for the CAIM algorithm, another discretization algorithm. The CACC paper mentions it and uses it as a comparison baseline:

https://cn.mathworks.com/matlabcentral/fileexchange/24344-caim-discretization-algorithm?s_tid=prof_contriblnk
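For reference, CAIM scores a scheme with n intervals using the criterion below. Here max_r is the largest count in column r of the quanta matrix and M_{+r} is that column's total. This is the quantity CAIM_Evaluation (quoted below) computes. Its comment writes "(sum maxr/Mr)/n", but the code squares max_r. As far as I know, the squared form matches Kurgan and Cios's definition.

```latex
\mathrm{CAIM} = \frac{1}{n}\sum_{r=1}^{n}\frac{\max_r^{2}}{M_{+r}}
```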

        Similarly, switch to the Functions tab.

        The code has four parts. ControlCenter.m is the test example, CAIM_Discretization is the main CAIM function, and CAIM_Evaluation and DiscretWithInterval are called by CAIM_Discretization. The three functions are as follows:

        Function CAIM_Discretization:

```matlab
function [ DiscreData,DiscretizationSet ] = CAIM_Discretization( OriginalData, C )
%CAIM Algorithm
%Given: M examples described by continuous attributes Fi, S classes
%For every Fi do:
%Step 1
%1.1 find maximum (dn) and minimum (do) values
%1.2 sort all distinct values of Fi in ascending order and initialize all possible interval boundaries, B, with the minimum, maximum, and all the midpoints of all adjacent pairs in the set
%1.3 set the initial discretization scheme to D:{[do,dn]}, set variable GlobalCAIM=0
%Step 2
%2.1 initialize k=1
%2.2 tentatively add an inner boundary, which is not already in D, from set B, and calculate the corresponding CAIM value
%2.3 after all tentative additions have been tried, accept the one with the highest corresponding value of CAIM
%2.4 if (CAIM >GlobalCAIM  or  k<S) then update D with the accepted, in step 2.3, boundary and set the GlobalCAIM=CAIM, otherwise terminate
%2.5 set k=k+1 and go to 2.2
%Result:Discretization scheme D
%Paper: Kurgan, L. and Cios, K.J. (2002). CAIM Discretization Algorithm, IEEE Transactions of Knowledge and Data Engineering, 16(2): 145-153
% This code is implemented by Guangdi Li, 2009/06/04
% OriginalData is organized as F1,F2,...,Fm,C1,C2,...,Cn
F = size( OriginalData,2 ) - C ;
M = size( OriginalData,1 );
DiscreData = zeros( M,C+F );
DiscreData( :,F+1:F+C ) = OriginalData( :,F+1:F+C );
% Assume the maximum number of interval is M/(3*C)
MaxNumF = floor(M/(3*C));
% Save all the discretization intervals, which is saved in column
DiscretizationSet = zeros( MaxNumF,F );
for p = 1:F
    % Step 1
    %Dn = max( OriginalData( :,p )); % the maximum boundary
    %Do = min( OriginalData( :,p )); % the minimum boundary
    SortedInterval = unique( OriginalData( :,p ) );
    B = zeros( 1,length( SortedInterval )-1 );
    Len = length( B );
    for q = 1:Len
        B( q ) = ( SortedInterval( q ) + SortedInterval( q+1 ) )/ 2;
    end
    %B
    D = zeros( 1,MaxNumF ); % D save all discretizations for variable Fi
    %D( 1 ) = Do; D( 2 ) = Dn;
    GlobalCAIM = -Inf;

    %Step 2
    k=0; % save the number of discretizations in D, the initiate state is 2
    while true
        CAIM = - Inf; Local = 0;
        for q = 1:Len
            if isempty( find( D( 1:k )==B(q), 1 ) ) == 1
                DTemp = D;
                DTemp( k+1 ) = B( q );
                DTemp( 1:( k+1 ) ) = sort( DTemp( 1:( k+1 ) ) );
                CAIMValue = CAIM_Evaluation( OriginalData,C,p,DTemp( 1:( k+1 ) ) );
                if CAIM < CAIMValue
                    CAIM = CAIMValue;
                    Local= q;
                end
            end
        end
        % CAIM
        % GlobalCAIM
        if GlobalCAIM < CAIM && k < MaxNumF
            GlobalCAIM = CAIM;
            k = k + 1;
            D( k ) = B( Local );
            D( 1:k ) = sort( D( 1:k ) );
        elseif  k <= MaxNumF && k <= C
            k = k + 1;
            D( k ) = B( Local );
            D( 1:k ) = sort( D( 1:k ) );
        else
            break;
        end
    end
    DiscretizationSet( 1:k,p )= D( 1:k )';
    % do the discretization process according to intervals in D.
    DiscreData( :,p ) = DiscretWithInterval( OriginalData,C,p,D( 1:k ) );
end
end
```

        Function CAIM_Evaluation:

```matlab
function CAIMValue = CAIM_Evaluation( OriginalData, C, Feature, DiscretInterval )
%Paper: Kurgan, L. and Cios, K.J. (2002). CAIM Discretization Algorithm, IEEE Transactions of Knowledge and Data Engineering, 16(2): 145-153
% OriginalData is organized as F1,F2,...,Fm,C1,C2,...,Cn
k = length( DiscretInterval );
[ DiscretData,QuantaMatrix ] = DiscretWithInterval( OriginalData,C,Feature,DiscretInterval );
%Discrete the continuous data upon OriginalData
%QuantaMatrix
% Compute the value of CAIM via quanta matrix and equation (sum maxr/Mr)/n
SumQuantaMatrix = sum( QuantaMatrix,1 );
CAIMValue = 0 ;
for p = 1:k
    if max( QuantaMatrix(:,p) ) > 0
       CAIMValue = CAIMValue + ( max( QuantaMatrix(:,p) ) )^2/SumQuantaMatrix(p) ;
    end
end
CAIMValue = CAIMValue/k ;
end
```

        Function DiscretWithInterval:
```matlab
function [ DiscretData,QuantaMatrix ] = DiscretWithInterval( OriginalData,C,Column,DiscretInterval )
% C is the number of class variables.
M = size( OriginalData,1 );
k = length( DiscretInterval );
F = size( OriginalData,2 ) - C;
DiscretData = zeros( M,1 );
%Discrete the continuous data upon OriginalData
for p = 1:M
    for t = 1:k
         if OriginalData( p,Column ) <= DiscretInterval( t )
             DiscretData( p ) = t-1;
             break;
         elseif OriginalData( p,Column ) > DiscretInterval( k )
             DiscretData( p ) = k;
         end
    end
end
%OriginalData( :,Column )
%Quanta matrix
CState = C;
FState = length( DiscretInterval ) + 1;
QuantaMatrix = zeros( CState,FState );
for p = 1:M
    for q = 1:C
        if OriginalData( p,F+q ) == 1
           Row = q;
           Column = DiscretData( p )+1;
           QuantaMatrix( Row,Column ) = QuantaMatrix( Row,Column ) + 1;
        end
    end
end
%QuantaMatrix
end
```
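One detail in CAIM_Evaluation as listed: k = length(DiscretInterval) counts only the inner boundaries, so QuantaMatrix has k+1 columns. Yet the loop runs p = 1:k and the sum is divided by k. If I read it correctly, the last interval never contributes; the same q = 1:k pattern appears in CACC_Evaluation. The Python sketch below contrasts the CAIM criterion over all n intervals with a mirror of the listed loop. It uses a quanta matrix I hand-counted from the Age data with two arbitrary cut points, 10.5 and 30. The function names are mine.

```python
from typing import Sequence


def caim(quanta: Sequence[Sequence[int]]) -> float:
    """CAIM = (1/n) * sum over all n interval columns of max_r^2 / M_{+r}."""
    n = len(quanta[0])
    total = 0.0
    for r in range(n):
        column = [row[r] for row in quanta]
        if sum(column) > 0:
            total += max(column) ** 2 / sum(column)
    return total / n


def caim_as_listed(quanta: Sequence[Sequence[int]]) -> float:
    """Mirror of CAIM_Evaluation's loop: first k = n - 1 columns only, divided by k (needs k >= 1)."""
    k = len(quanta[0]) - 1
    total = 0.0
    for r in range(k):
        column = [row[r] for row in quanta]
        if max(column) > 0:
            total += max(column) ** 2 / sum(column)
    return total / k


if __name__ == "__main__":
    # rows = Care, Edu, Work; columns = [3, 10.5], (10.5, 30], (30, 71]
    q = [[3, 0, 3],
         [0, 3, 3],
         [0, 0, 3]]
    print(caim(q), caim_as_listed(q))  # 2.333... and 3.0
```

The listed version rates this scheme higher than the full formula does, because the mixed last interval is never penalized.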

        CAIM_Discretization is called exactly like CACC_Discretization. You can add one line to the test code above:

```matlab
[ DiscreData2,DiscretizationSet2 ] = CAIM_Discretization( dataset, C );
```

Not surprisingly, CAIM's discretization result also differs from the previous ones.


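        Next, a word about how Guangdi Li and Julio Zaragoza are connected. Interestingly, they cross paths not only in this community over the CACC function but also in real life: both spent time as visiting researchers in the same research group.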

        Look at the first paper:

Bielza C, Li G, Larranaga P. Multi-dimensional classification with Bayesian networks[J]. International Journal of Approximate Reasoning, 2011, 52(6): 705-727.

        Its authors' full names are Concha Bielza, Guangdi Li and Pedro Larranaga.

        Now look at the second paper:

Zaragoza J H, Sucar L E, Morales E F, et al. Bayesian chain classifiers for multidimensional classification[C]//IJCAI. 2011, 11: 2192-2197.

        Its authors' full names are Julio H Zaragoza, Luis Enrique Sucar, Eduardo F Morales, Concha Bielza and Pedro Larranaga.

The two papers share two names: Concha Bielza and Pedro Larranaga.

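        I will come back to these two leading researchers shortly. First, let's rule out a mere coincidence of names.

        Guangdi Li's profile, mentioned above, says he is at K.U.Leuven (KU Leuven, Belgium), the same affiliation as on the first paper. Julio H Zaragoza's affiliation on the second paper is Mexico's National Institute for Astrophysics, Optics and Electronics. However, Julio Zaragoza's profile in the Matlab community shows The University of Adelaide (Australia), so the two do not match. Note, though, that the paper appeared at IJCAI 2011, while his comment in the Matlab community dates from 2013. His Matlab community profile links to a personal homepage: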

https://cs.adelaide.edu.au/~jzaragoza/doku.php


        That homepage is no longer in use ("This page is no longer maintained"). It does show that Julio Zaragoza was a PhD student at The University of Adelaide, and he has presumably graduated, around 2014, since the Publications list mentions a PhD Thesis.

        Next, searching Google Scholar (or glgoo) for any of Julio Zaragoza's papers leads to his Google Scholar profile.

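The profile shows a verified email at apple.com. Judging from his publication record, he was first at Mexico's National Institute for Astrophysics, Optics and Electronics, then did his PhD at The University of Adelaide, and after graduating apparently moved to Apple.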

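        Now for Concha Bielza (female) and Pedro Larranaga. Both are professors at the Technical University of Madrid (Spanish: Universidad Politécnica de Madrid) and are reportedly leading researchers in Spain and indeed in Europe. Together they founded the Computational Intelligence Group; its homepage has more information for anyone interested: http://cig.fi.upm.es/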

        Concha Bielza's homepage: http://cig.fi.upm.es/CIGmembers/concha_bielza

        Her homepage links to her CV (http://cig.fi.upm.es/also/cv_concha_bielza_english.pdf).

        Pedro Larranaga's homepage: http://cig.fi.upm.es/CIGmembers/pedro-larranaga

        His homepage links to his CV (http://cig.fi.upm.es/also/larranaga-cv-2016.pdf).

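        The CVs show that Concha Bielza has always been at the Technical University of Madrid. Pedro Larranaga moved there only in 2007; before that he was at the University of the Basque Country.

        Looking that university up, the most curious thing is its abbreviation: not the English initialism UBC, but UPV/EHU. What does that mean? The Technical University of Madrid's abbreviation explains part of it: UPM abbreviates the Spanish name, Universidad Politécnica de Madrid. But why does the University of the Basque Country have two abbreviations? Wikipedia explains: "The University of the Basque Country (Basque: Euskal Herriko Unibertsitatea, EHU; Spanish: Universidad del País Vasco, UPV; UPV/EHU)". So UPV is the Spanish abbreviation and EHU the Basque one.

        What kind of language is Basque? How is the Basque Country related to Spain? Wait, weren't we talking about discretization? Oops, I have drifted far off topic; let me find my way back...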


        Finally, here is an interesting online Bayesian network structure learning site: http://b-course.cs.helsinki.fi/obc/ . I will leave it to you to figure out how to use it.