[Translation] 080728-Thermal Face Recognition Over Time

Source: Internet | Published by: Taobao Mall Flowers | Editor: Programmer Blog Network | Time: 2024/06/03 07:05

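1. Database:

Over a period of ten weeks, visible and longwave infrared (LWIR) images of 240 distinct subjects were acquired under controlled conditions. In each weekly session, every subject was imaged under two illumination conditions (FERET and mugshot) and with two expressions (neutral and other). Visible images are in color at 1200×1600 resolution; thermal images are 320×240 at 12-bit depth.

Eye coordinates were located manually in every image → each image was registered to a standard geometry with fixed eye locations and a size of 99×132 pixels, with bilinear interpolation wherever interpolation was needed. Images used for the PCA tests were additionally histogram-equalized.

2. Algorithms used:

Two algorithms were tested on each of the two modalities: principal component analysis (PCA) and the (blinded for review) algorithm.

3. Experiments:

Data: one neutral-expression image of each subject per week; the first-week images form the gallery.

Procedure: with both algorithms, each week's images in each modality (V = visible, IR = thermal, F = fusion) were compared against the corresponding first-week images, and the top-rank recognition rate was computed.

Results: the recognition rates show no clear trend over the weeks for any modality, and over the ten weeks the curves are roughly flat → the weekly recognition performances for both algorithms and both modalities can be assumed to be independent draws from a locally constant distribution, taken to be Gaussian → the standard deviation of each distribution was estimated and error bars were plotted.

(1) Under time lapse, thermal face recognition with PCA performs significantly worse than its visible counterpart.

(2) With the (blinded for review) algorithm, overall recognition performance in both modalities improves markedly over PCA. More importantly, the weekly performance curves of the two modalities cross each other multiple times while remaining within each other's error bars → the performance difference between the two modalities with this algorithm is not statistically significant.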

Thermal Face Recognition Over Time

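The article presents a comparative study of face recognition performance with visible and thermal infrared imagery. It focuses on how the time lapse between acquiring enrollment images and test images affects the results. Most earlier work in this area acquired and tested images within the same session. The experimental results show that, under time lapse, the performance difference between visible and thermal imagery is smaller than previously assumed. On the available data it is in fact not statistically significant.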

1 Introduction

Face recognition with thermal infrared imagery has recently enjoyed renewed interest. While the volume of literature on the subject is notably smaller than that related to visible face recognition, there is nonetheless a steady stream of research [1, 2, 3, 4, 5, 6]. These papers have established that thermal imagery of human faces constitutes a valid biometric signature, though mostly relying on databases limited both in size and variability, due to the expense and complexity of extensive data collection. Early results were based on gallery and probe sets collected indoors during a single session. In that respect, they resemble the fa/fb tests in the FERET program [7].

More recently, a study involving imagery collected indoors in a laboratory setting over multiple weeks was presented in [4, 8]. In that study, the authors note that when using a PCA-based recognition system, visible face recognition of time-lapse images yields better results than its thermal counterpart. They go on to conjecture, based on their visual analysis of the thermal imagery, that large variations of the thermal emission patterns of the face over time were responsible for the degraded performance. The current paper seeks to reproduce and extend some of the results in [4, 8]. In particular, we show that while those results are reproducible, it may be premature to attribute the performance difference to a modality-specific phenomenon. The results below demonstrate that a statistically significant performance difference between modalities can be measured when recognition is performed using PCA. However, when a more sophisticated algorithm is used, no such difference is measurable. This indicates that the authors of [4, 8] may have observed a measurement effect, and that the “inherent” value of visible and thermal imagery for time-lapse face recognition under controlled conditions is equivalent.

2 Data Collection and Normalization

The data used in this study was generously provided by the authors of [4, 8]. A complete description of the data collection procedure can be found in those references; we include a brief summary here. Visible and longwave IR (LWIR) images of 240 distinct subjects were acquired under controlled conditions over a period of ten weeks. During each weekly session, each subject was imaged under two different illumination conditions (FERET and mugshot) and with two different expressions ("neutral" and "other"). Visible images were acquired in color at 1200 × 1600 resolution. Thermal images were acquired at 320 × 240 resolution with 12-bit depth.

Eye coordinates for all images, both visible and thermal, were manually located by the authors of [4, 8]. These coordinates were used to affinely register the images to a standard geometry with fixed eye locations and an image size of 99×132 pixels. All necessary interpolation was performed bilinearly. The visible and thermal cameras were boresighted during data collection. Even so, eye coordinates on corresponding images may not match exactly, since they were located manually in each modality separately. After alignment, all images were masked to remove everything but the inner face, excluding ears and hair. Images used for the PCA experiments were further histogram-equalized to match the processing in [4, 8]. The other algorithm performs its own internal image processing, so no equalization was applied to its input images.
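
The normalization above can be sketched in Python with NumPy. This is a minimal sketch and not the pipeline of [4, 8]. The canonical eye positions, the elliptical inner-face mask and the function names (`similarity_from_eyes`, `register_face`, `inner_face_mask`, `preprocess_for_pca`) are hypothetical. Two eye points fix only rotation, uniform scale and translation, so the sketch uses a similarity transform, which is a special case of an affine map. It inverts that transform and samples the input bilinearly at every output pixel.

```python
import numpy as np

# Hypothetical canonical geometry: 99 px wide x 132 px tall output. The eye
# positions are illustrative only; they are not taken from [4, 8].
OUT_W, OUT_H = 99, 132
CANON_LEFT_EYE = np.array([30.0, 45.0])   # (x, y)
CANON_RIGHT_EYE = np.array([69.0, 45.0])  # (x, y)


def similarity_from_eyes(src_left, src_right,
                         dst_left=CANON_LEFT_EYE, dst_right=CANON_RIGHT_EYE):
    """Return (A, t) with dst = A @ src + t, mapping source eyes onto canonical eyes."""
    # Treat 2-D points as complex numbers: z_dst = a * z_src + b, where a encodes
    # rotation plus uniform scale and b encodes translation.
    zs_l, zs_r = complex(*src_left), complex(*src_right)
    zd_l, zd_r = complex(*dst_left), complex(*dst_right)
    a = (zd_r - zd_l) / (zs_r - zs_l)
    b = zd_l - a * zs_l
    A = np.array([[a.real, -a.imag], [a.imag, a.real]])
    t = np.array([b.real, b.imag])
    return A, t


def bilinear_sample(img, xs, ys):
    """Sample img at float (x, y) positions with bilinear interpolation, clamping at edges."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy


def register_face(img, left_eye, right_eye):
    """Warp img so the given eye coordinates land on the canonical eye positions."""
    A, t = similarity_from_eyes(left_eye, right_eye)
    A_inv = np.linalg.inv(A)
    # Inverse mapping: for each output pixel, find its source location in img.
    gy, gx = np.mgrid[0:OUT_H, 0:OUT_W].astype(float)
    dst = np.stack([gx.ravel(), gy.ravel()])  # shape (2, N), row-major order
    src = A_inv @ (dst - t[:, None])
    out = bilinear_sample(np.asarray(img, float), src[0], src[1])
    return out.reshape(OUT_H, OUT_W)


def inner_face_mask(h=OUT_H, w=OUT_W):
    """Hypothetical elliptical mask keeping the inner face (drops ears and hair)."""
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = 0.55 * h, 0.5 * w
    return ((xx - cx) / (0.45 * w)) ** 2 + ((yy - cy) / (0.48 * h)) ** 2 <= 1.0


def histogram_equalize(values, levels=256):
    """CDF-based histogram equalization of a 1-D array of pixel values."""
    v = np.asarray(values, float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.zeros_like(v)
    q = np.round((v - lo) / (hi - lo) * (levels - 1)).astype(int)
    cdf = np.cumsum(np.bincount(q, minlength=levels)) / q.size
    return cdf[q] * (levels - 1)


def preprocess_for_pca(img, left_eye, right_eye):
    """Register, mask, and equalize; returns the inner-face pixels as a vector."""
    face = register_face(img, left_eye, right_eye)
    return histogram_equalize(face[inner_face_mask()])
```

Equalization runs only on the masked pixels, so background outside the inner face does not shape the histogram. That choice is also an assumption of this sketch.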

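[Interpolation (resampling) is an image-processing method that increases or decreases the number of pixels in a digital image. Some digital cameras use interpolation to create images with more pixels than the sensor actually captures, or to produce digitally zoomed images. Virtually all image-processing software supports one or more interpolation methods. How strongly jagged edges show up in an enlarged image directly reflects how mature the processor's interpolation is.]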
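
For reference, bilinear interpolation, the method used in the registration step, estimates the value at a non-integer position $(x, y)$ from the four surrounding pixels. With $x_0 = \lfloor x \rfloor$, $y_0 = \lfloor y \rfloor$, $f_x = x - x_0$ and $f_y = y - y_0$, the standard formula is:

```latex
I(x, y) \approx (1 - f_x)(1 - f_y)\, I(x_0, y_0) + f_x (1 - f_y)\, I(x_0 + 1, y_0)
        + (1 - f_x) f_y\, I(x_0, y_0 + 1) + f_x f_y\, I(x_0 + 1, y_0 + 1)
```

The four weights are non-negative and sum to 1. A constant image is therefore reproduced exactly, and each interpolated value lies between the smallest and largest of its four neighbors.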

3 Thermal Infrared Phenomenology

While the nature of face imagery in the visible domain is well-studied, particularly with respect to illumination dependence [9], its thermal counterpart has received less attention. In [4], the authors show some variability in thermal emission patterns during time-lapse experiments, and properly blame it for decreased recognition performance. Figure 1 shows comparable variability within data collected with our own LWIR sensor. Each column shows images acquired in different sessions. It is clear that thermal emission patterns around the eyes, nose and mouth are rather different in different sessions. Such variations can be induced by changing environmental conditions. For example, exposed to cold or wind, capillary vessels at the surface of the skin contract, reducing the effective blood flow and thereby the surface temperature of the face. Also, when a subject transitions from a cold outdoor environment to a warm indoor one, a reverse process occurs, whereby capillaries dilate, suddenly flushing the skin with warm blood in the body’s effort to regain normal temperature. We have no knowledge of the environmental conditions during the data collection by the authors of [4], although we presume that they were fairly constant throughout all sessions.

Other fluctuations in thermal appearance are unrelated to ambient conditions and stem instead from the subject's metabolism. Vigorous physical activity and consumption of food, alcohol or caffeine may all affect the thermal appearance of a subject's face. In addition, high-frequency temporal variation is associated with breathing. The nose or mouth appears cooler as the subject inhales and warmer as he or she exhales, because exhaled air is at core body temperature, several degrees warmer than the skin.

Just as recognition from visible imagery is affected by illumination, recognition with thermal imagery is affected by a number of exogenous and endogenous factors. While the appearance of some features may change, their underlying shape remains the same and continues to hold useful information for recognition. Thus, as with visible imagery, different algorithms are more or less sensitive to these image variations. Proper compensation for those variations is a critical step in any successful face (or, more generally, object) recognition algorithm, regardless of modality. Clearly, the better algorithms for thermal face recognition will perform equivalent compensation on the infrared imagery before comparing probe and gallery samples.

Figure 1: Variation in facial thermal emission from two subjects in different sessions. Left column: enrollment image; right column: test image.

4 Algorithms Tested

We performed experiments with two different algorithms in each of the two modalities: PCA with Mahalanobis angle distance, and the (blinded for review) algorithm. The first is a standard algorithm whose performance has been widely evaluated in the literature. This includes [2], where the authors present a comprehensive analysis of its performance on visible and thermal infrared imagery in a same-session recognition scenario. The second is a commercial algorithm made available for testing in binary form.¹
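
PCA with a Mahalanobis angle distance can be sketched in Python with NumPy. The sketch assumes one common formulation, similar to the "MahCosine" measure in the CSU Face Identification Evaluation System:

1. Project each image onto the top principal components.
2. Scale each coefficient by the inverse square root of its eigenvalue.
3. Use the negative cosine between the whitened vectors as the distance.

The class name and the number of components are hypothetical and are not details from the paper.

```python
import numpy as np


class PCAMahalanobisAngle:
    """PCA subspace with a Mahalanobis-angle (whitened negative-cosine) distance."""

    def __init__(self, n_components):
        # Keep n_components below the number of training images so every
        # retained eigenvalue is strictly positive.
        self.k = n_components

    def fit(self, X):
        """X: (n_samples, n_pixels) training images, disjoint from gallery and probes."""
        X = np.asarray(X, float)
        self.mean_ = X.mean(axis=0)
        # SVD of the centered data yields the principal axes without forming
        # the full pixel-by-pixel covariance matrix.
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.k]                         # (k, n_pixels)
        self.eigvals_ = (s[: self.k] ** 2) / (X.shape[0] - 1)  # variance per axis
        return self

    def transform(self, X):
        """Project onto the subspace and whiten: coefficient / sqrt(eigenvalue)."""
        coeffs = (np.asarray(X, float) - self.mean_) @ self.components_.T
        return coeffs / np.sqrt(self.eigvals_)

    def distance_matrix(self, probes, gallery):
        """Rows are probes, columns gallery entries; values in [-1, 1], lower = closer."""
        p = self.transform(probes)
        g = self.transform(gallery)
        p /= np.linalg.norm(p, axis=1, keepdims=True)
        g /= np.linalg.norm(g, axis=1, keepdims=True)
        return -(p @ g.T)
```

With `D = model.distance_matrix(probes, gallery)`, `D.argmin(axis=1)` gives each probe's top-rank gallery match. Whitening gives every retained component equal weight, so the angle is not dominated by the few highest-variance directions.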

The training set for both algorithms was completely disjoint, in time, space and subjects, from the gallery and probe images provided by the authors of [4]. That is, the training set was collected at an earlier time, at a different location, and from a disjoint set of subjects. This ensures that the results reported below are indicative of real-world performance. We should also note that our training set differs from the one used in [4], because their complete training set was not available to us. Instead, we used a larger set of images collected over the last several years with our own visible and thermal cameras. This further increases the realism of the results, since one cannot usually expect training imagery to come from the same camera as the testing imagery. Because of these differences from [4], our PCA results are somewhat different. However, as seen below, the qualitative nature of our results agrees strongly with those of [4].

5 Experimental Results and Discussion

To evaluate recognition performance with time-lapse data, we performed the following experiments. The first-week frontal-illumination images of each subject with neutral expression were used as the gallery, so the gallery contains a single image of each subject. For all weeks, the probe set contains neutral-expression images of each subject under mugshot lighting. The number of subjects in each week ranges from 44 to 68, while the number of subjects overlapping with the first week ranges from 31 to 56. We computed top-rank recognition rates for each of the weekly probe sets with both modalities and algorithms. The results are shown in Figures 2 and 3. Note that the first data point in each graph corresponds to same-session recognition performance.
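
The top-rank recognition rate for one weekly probe set can be computed from a probe-by-gallery distance matrix. This is a sketch, and `top_rank_rate` is a hypothetical helper. Restricting the rate to probes whose subject also appears in the first-week gallery (the 31 to 56 overlapping subjects) is our assumption about how non-overlapping subjects were handled.

```python
import numpy as np


def top_rank_rate(dist, probe_ids, gallery_ids):
    """Fraction of enrolled probes whose nearest gallery image has the same subject ID."""
    dist = np.asarray(dist, float)        # shape (n_probes, n_gallery), lower = closer
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Assumption: probes of subjects absent from the first-week gallery are
    # excluded, because they cannot be matched correctly at all.
    enrolled = np.isin(probe_ids, gallery_ids)
    if not enrolled.any():
        return float("nan")
    best = gallery_ids[np.argmin(dist[enrolled], axis=1)]
    return float(np.mean(best == probe_ids[enrolled]))
```

If unenrolled probes were instead counted as failures, the denominator would become the full weekly probe count and the rates would drop accordingly.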

Figure 2: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using PCA. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.

Figure 3: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using the (blinded for review) algorithm. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.

Focusing for a moment on the performance curves for weeks two through ten, we notice no clear trend in either the visible or the thermal modality. That is, neither modality shows a clearly decreasing performance trend. This suggests that whatever time-lapse effects cause performance to degrade relative to same-session results are roughly constant over the ten-week trial period. Other studies have shown that, over a period of years, face recognition performance degrades linearly with time [10]. Our observation here is simply that the slope of that degradation is small enough to be nearly flat over a ten-week period (except, of course, for the same-session result). Following that observation, we assume that the weekly recognition performances for each algorithm and modality are independent draws from a (locally) constant distribution, which we take to be Gaussian. Under this assumption, we estimate the standard deviation of that distribution and plot error bars at two standard deviations.
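
Under the Gaussian assumption above, the error bars reduce to a mean and a standard deviation computed over the time-lapse weeks. This sketch excludes the same-session week and uses the sample (n - 1) standard deviation. The text does not name the estimator, so that choice is an assumption, and `weekly_error_bars` is a hypothetical name.

```python
import statistics


def weekly_error_bars(weekly_rates, k=2.0):
    """Mean, standard deviation and +/- k*sigma error bars for the time-lapse weeks.

    Treats the rates (weeks 2..10, same-session week excluded) as i.i.d. draws
    from one Gaussian and uses the sample (n - 1) standard deviation.
    """
    if len(weekly_rates) < 2:
        raise ValueError("need at least two weeks to estimate a standard deviation")
    mean = statistics.mean(weekly_rates)
    sigma = statistics.stdev(weekly_rates)
    bars = [(r - k * sigma, r + k * sigma) for r in weekly_rates]
    return mean, sigma, bars
```

Called with the nine weekly rates for one algorithm and modality, it returns one ±2σ interval per weekly point, which is what the plotted error bars show.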

Figure 2 shows the week-by-week recognition rates using PCA-based recognition. Consistent with the results in [4, 8], thermal performance is lower than visible performance. In fact, for at least six of the nine time-lapse weeks the difference is statistically significant. Table 1 shows mean recognition rates over weeks two through nine for each algorithm and modality. As the last column shows, mean visible performance is higher than mean thermal performance by 2.17 standard deviations. This clearly indicates that, in a time-lapse scenario, thermal face recognition with PCA is significantly less reliable than its visible counterpart.
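
The separations quoted in the text (2.17 standard deviations for PCA and 0.21 for the other algorithm) divide the gap between mean weekly rates by a standard deviation. The text does not say which standard deviation is used. This sketch assumes a pooled value, the square root of the average of the two modalities' sample variances, so its output may not reproduce Table 1 exactly. The helper name is hypothetical.

```python
import math
import statistics


def separation_in_sigmas(rates_a, rates_b):
    """Difference of mean weekly rates, in units of a pooled standard deviation.

    Assumption: pooled sigma = sqrt((var_a + var_b) / 2), using sample variances.
    """
    mean_diff = statistics.mean(rates_a) - statistics.mean(rates_b)
    pooled = math.sqrt((statistics.variance(rates_a) + statistics.variance(rates_b)) / 2)
    return mean_diff / pooled
```

A positive result means the first modality (for example, visible) has the higher mean rate. Values near 2 or above are what the text treats as significant.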

Turning to Figure 3, we see the results of running the same experiments with the (blinded for review) algorithm. First, overall recognition performance is markedly improved in both modalities. More importantly, the weekly performance curves for the two modalities cross each other multiple times while remaining within each other's error bars. This indicates that the performance difference between modalities using this algorithm is not statistically significant. In fact, Table 1 shows that the difference between the mean performances of the two modalities is only 0.21 standard deviations, hardly a significant result. We should also note that the mean visible time-lapse performance with this algorithm is 88.65%, compared with approximately 86.5% for the FaceIt algorithm, as reported in [4]. This shows that the (blinded for review) algorithm is competitive with the commercial state of the art on this data set. It therefore provides a fair basis for evaluating thermal recognition performance, since comparing against a poor visible algorithm would make thermal recognition appear better than it is.
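
The two observations made for Figure 3 can be expressed as a small helper:

- whether the weekly curves cross;
- whether each weekly difference stays within both modalities' error bars.

This is a hypothetical sketch. The ±kσ half-width with k = 2 matches the error bars described above. Treating an exact tie as a non-crossing is our own choice.

```python
def compare_curves(rates_v, rates_t, sigma_v, sigma_t, k=2.0):
    """Count sign changes of (visible - thermal) and test mutual error-bar containment."""
    diffs = [v - t for v, t in zip(rates_v, rates_t)]
    # A crossing is a strict sign change between consecutive weeks.
    crossings = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)
    # Each point must lie inside the other curve's +/- k*sigma error bar.
    within = all(abs(d) <= k * sigma_v and abs(d) <= k * sigma_t for d in diffs)
    return crossings, within
```

Several crossings with `within` true is the pattern the text describes for the (blinded for review) algorithm. A curve that stays on one side with a gap wider than the error bars is the PCA pattern.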

Figures 2 and 3, as well as Table 1, also show the results of fusing both imaging modalities for recognition. Following [2] and [4], we simply add the scores from each modality to create a combined score. Recognition is performed by a nearest-neighbor classifier on the combined score. As many previous studies have shown [1, 2, 4], fusion greatly increases performance.
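
Sum-rule fusion followed by a nearest-neighbor decision takes only a few lines. The sketch assumes both modalities produce distance matrices (lower is better) on comparable scales, for example two Mahalanobis angle distances in [-1, 1]. With other matchers the scores would normally be normalized before they are added, a step the text does not mention. The function name is hypothetical.

```python
import numpy as np


def fuse_and_identify(dist_visible, dist_thermal, gallery_ids):
    """Add per-modality distance matrices, then return each probe's nearest gallery ID."""
    fused = np.asarray(dist_visible, float) + np.asarray(dist_thermal, float)
    return np.asarray(gallery_ids)[np.argmin(fused, axis=1)]
```

Consider a single probe with toy distances `[[0.2, 0.1]]` (visible) and `[[0.0, 0.5]]` (thermal) against gallery `["a", "b"]`. Visible alone would pick `"b"`, while the fused score picks `"a"`. Fusion can therefore correct an error when the other modality is confident.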

Table 1: Mean top-match recognition performance for time-lapse experiments with both algorithms.

6 Conclusions

The main conclusion of this paper is that one must be cautious when evaluating the value of an imaging modality for a specific recognition task. Ideally, this question should be framed as estimating the Bayes optimal error for a classification problem. Inevitably, that estimate is based on an empirical measure of performance that is inextricably tied to a particular classifier. While such an estimate can provide a valuable upper bound on the Bayes error, it cannot separate classifier effects from data-specific behavior. In this case, we show that while the results in [4] are reproducible, they do not imply that time-lapse face recognition with thermal infrared imagery is inferior to that performed with visible imagery. We have shown by example that, at least on this data set, the Bayes errors for each modality are comparable. A more detailed analysis will surely require a much larger pool of subjects.
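
The argument can be made concrete with the standard definition of the Bayes error. Write $R_m(h)$ for the error rate of classifier $h$ on modality $m$. The Bayes error $R^*_m$ is the smallest error any classifier can achieve on that modality, so every measured classifier only bounds it from above:

```latex
R^*_m = \inf_h R_m(h) \le R_m(h_{\mathrm{PCA}}), \qquad m \in \{\text{visible}, \text{thermal}\}
```

A measured gap $R_{\text{thermal}}(h_{\mathrm{PCA}}) > R_{\text{visible}}(h_{\mathrm{PCA}})$ therefore shows only that the two upper bounds differ. It is equally consistent with $R^*_{\text{thermal}} \le R^*_{\text{visible}}$. A stronger classifier lowers both bounds, which is the role the (blinded for review) algorithm plays in the comparison above.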

Based on the preceding analysis, and on recent results by the authors on time-lapse recognition with a larger, more diverse and more challenging data set [11], we firmly believe that the use of thermal imagery of faces for biometric authentication is not only viable but, in certain circumstances, even preferable to the use of visible images. Without a doubt, the use of fused visible and thermal imagery provides a level of performance not attainable by either modality alone.
