Comparing Floats: How To Determine if Floating Quantities Are Close Enough Once a Tolerance Has Been Reached


UNLIKE INTEGERS, which are either different or equal, floating point numbers may be different, equal, or "close enough." This is a down-to-earth practitioner's discussion on how to determine, in a robust manner, whether or not floating quantities are close enough, once a tolerance (the meaning of "close enough") has been assigned. The goal is a comparison procedure that is correct more often than not and reasonably efficient. Floating point jargon is taken from C/C++, and float means a 32-bit IEEE float. All reasoning can be extended to double precision and other floating point quantities.

While the inequality of floating point numbers is easily established using operators > or <, "close enough," or fuzzy equality, requires more work. Consider the following numbers:

float a1 = 0.123456;
float b1 = 0.123457;

Are a1 and b1 equal? Strictly speaking, a1==b1 will return false, but are a1 and b1 close enough to be considered equal? This decision requires some knowledge of the problem domain, usually represented by a tolerance. Let's call it feps, for float epsilon. The tolerance is related to how many significant digits must match so that two numbers are close enough to be, for practical purposes, equal.

Since IEEE floats have at least six significant digits in decimal notation, this discussion will arbitrarily allow losing approximately one digit to error and will require five matching digits for equality by establishing a tolerance like

float feps = 0.00001;

Thus, feps>(a1-b1) will return true, indicating that a1 and b1 are equal, as will feps>(b1-a1). The "close enough" equality test has become an inequality test against a difference. However, if you consider:

float a2 = 0.123456;
float b2 = -0.123457;

feps>(a2-b2) will return feps>0.246913, which is false, while feps>(b2-a2) will return feps> -0.246913, which is incorrectly true. Oops! Equality (and inequality as well) is symmetric: if a!=b is true, then b!=a must also be true. Fix: feps>fabsf(a2-b2) will return false, and so will feps>fabsf(b2-a2), at the cost of one fabsf (or your favorite work-alike) per comparison.
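
Wrapped as a tiny helper, the symmetric test reads as follows (a minimal sketch; the wrapper and the name fuzzyEqAbs are mine, not the article's):

#include <math.h>

/* True when a and b differ by less than feps.
   Symmetric by construction, since fabsf(a - b) == fabsf(b - a). */
static int fuzzyEqAbs(float a, float b, float feps)
{
    return feps > fabsf(a - b);
}

With feps = 0.00001, fuzzyEqAbs(a1, b1, feps) and fuzzyEqAbs(b1, a1, feps) both return 1, and both argument orders of a2 and b2 return 0.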

Turning a comparison into an inequality against a difference, however, has shortcomings. Let:

float big1 = 1.23456e28;
float big2 = 1.23457e28;

Are big1 and big2 equal? Their difference is a quantity of the order of 1.0e23, or, in U.S. dollars, much larger than the net worth of Bill Gates. If both are the result of computations involving thousands of floating point operations, however, big1 and big2 are most likely the same number. Yet feps>fabsf(big1-big2) will return false. How about setting feps=2.0e23? Just kidding. There's more. Let:

float small1 = 1.23456e-10;
float small2 = 3.45678e-11;

Here, small1 and small2 are truly different, and generously bigger than the smallest IEEE float larger than zero. However, feps>fabsf(small1-small2) returns true, so that small1 and small2 incorrectly appear to be equal. The same error happens if:

float small3 = 0.000004;
float small4 = -0.000004;

are used instead of small1 and small2. Of course, you can always change feps to a suitably smaller value. Or can you? Let:

feps = 1.0e-10;

Now feps>fabsf(small3-small4) is false, so they're different. But feps>fabsf(a1-b1) is also false, sigh. A feps this small tests for too many significant digits (more than a float can afford). Consequently, it is too often as persnickety as the == operator. And feps>fabsf(small1-small2) is still incorrectly true; the same happens with feps>fabsf(big1-big2), which is still (incorrectly) false.

Here is the point: a tolerance tested for inequality against a difference is a practical way to test "close enough" equality to zero. This method cannot account for "close enough" equality of two float numbers over their entire dynamic range.
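
To see the whole conflict in one place, here is a hypothetical driver (my sketch, not the article's) that replays the values above through the difference test:

#include <math.h>
#include <stdio.h>

int main(void)
{
    float a1 = 0.123456f,        b1 = 0.123457f;          /* should be "equal" */
    float big1 = 1.23456e28f,    big2 = 1.23457e28f;      /* should be "equal" */
    float small1 = 1.23456e-10f, small2 = 3.45678e-11f;   /* truly different   */

    float loose = 0.00001f;   /* five matching digits */
    float tight = 1.0e-10f;   /* too persnickety      */

    /* prints 1 0 1: correct, wrong, wrong */
    printf("%d %d %d\n", loose > fabsf(a1 - b1),
                         loose > fabsf(big1 - big2),
                         loose > fabsf(small1 - small2));
    /* prints 0 0 1: wrong, wrong, still wrong */
    printf("%d %d %d\n", tight > fabsf(a1 - b1),
                         tight > fabsf(big1 - big2),
                         tight > fabsf(small1 - small2));
    return 0;
}

No single value of feps rescues all three pairs at once.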

To test fuzzy equality over the possible range of two float numbers, the ratio (vs. the absolute difference) of the numbers must be "close enough" to unity (vs. smaller than feps). Taking a ratio for a comparison is perhaps expensive in some cases, but is better than returning an incorrect result.

The value of fabsf(a1/b1) is ~0.999992. That of fabsf(b1/a1) is ~1.000008; 1+feps is 1.00001, and 1-feps is 0.99999. A seemingly correct ratio-test predicate such as:

(((1-feps) < fabsf(n/d)) && (fabsf(n/d) < (1+feps)))

where n=numerator, d=denominator, is true for n<d or n>d indifferently if n is "close enough" to d. The value of fabsf(big1/big2) is the same as fabsf(a1/b1). The value of fabsf(small1/small2) is 3.57146, that of fabsf(small2/small1) is 0.280001; the ratio-test predicate is correctly false for both, indicating that small1 and small2 are different.
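
As a quick numeric check of the small1/small2 case (a hypothetical snippet, not the article's):

#include <math.h>
#include <stdio.h>

int main(void)
{
    float feps = 0.00001f;
    float small1 = 1.23456e-10f, small2 = 3.45678e-11f;
    float r = fabsf(small1 / small2);                    /* ~3.57146 */
    /* the ratio-test predicate above; prints 0: correctly different */
    printf("%d\n", ((1 - feps) < r) && (r < (1 + feps)));
    return 0;
}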

Here is the catch: the value of fabsf(a2/b2) is ~0.999992, that of fabsf(b2/a2) is ~1.000008, and the value of fabsf(small3/small4) is 1.0: in all three cases the seemingly correct ratio-test predicate is incorrectly true. But wait: fabsf is a carryover from the difference-test predicate. In fact, the ratio-test must not use fabsf, so the ratio must be signed:


(-)/(-) or (+)/(+) is positive, hence potentially close to positive 1.0 and capable of scoring true, but (-)/(+) or (+)/(-) is negative and automatically disqualified from scoring true, as it should be (there are exceptions as noted later on).

The correct predicate for the ratio-test is then:

(((1-feps) < (n/d)) && ((n/d) < (1+feps)))

Now the value of (small3/small4) is -1.0, so the predicate is false, which is correct. The same holds for a2 and b2. But don't breathe a sigh of relief just yet! Now, let:

float a3 = 1.0e36;
float b3 = 1.0e-4;

In this case, n/d can be either 1.0e40 (oops: IEEE float overflow) or 1.0e-40 (oops: IEEE float underflow). Perhaps 1.0e-40 will be interpreted as 0.0F, and perhaps not. Almost certainly 1.0e40 will cause indigestion to your code. Here is the fix: before taking the ratio, check for overflow or underflow. Here is how. The predicate ((d<1.0)&&(n>d*maxFloat)) is true (and safe, at least in C/C++) if n/d would overflow; if true, obviously the equality is false. The part (d<1.0) is needed because d*maxFloat may otherwise cause overflow.

The predicate ((d>1.0)&&(n<d*minFloat)) is true (and safe) if n/d would underflow. If d>1.0, then d*minFloat will not underflow. So much for floats.
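
As a recap of the float case, the pieces can be pulled together into one function. This is only a sketch, not the article's Listing 1: the name testFloatEq is mine, maxFloat and minFloat are realized as FLT_MAX and FLT_MIN from <float.h>, the zero case falls back on the difference test as suggested above, and opposite signs are dismissed up front, which stands in for the signed-ratio rule and lets the overflow/underflow guards work on magnitudes:

#include <float.h>
#include <math.h>

/* Sketch of the float comparison developed above; not Listing 1. */
static int testFloatEq(float n, float d, float feps)
{
    if (n == d)
        return 1;                          /* exact match, 0.0==0.0 included */
    if (n == 0.0f || d == 0.0f)
        return feps > fabsf(n - d);        /* "close enough" to zero         */
    if ((n < 0.0f) != (d < 0.0f))
        return 0;                          /* opposite signs: never equal    */
    n = fabsf(n);                          /* same sign from here on, so     */
    d = fabsf(d);                          /* let the magnitudes decide      */
    if ((d < 1.0f) && (n > d * FLT_MAX))
        return 0;                          /* n/d would overflow             */
    if ((d > 1.0f) && (n < d * FLT_MIN))
        return 0;                          /* n/d would underflow            */
    float r = n / d;
    return ((1.0f - feps) < r) && (r < (1.0f + feps));
}

With feps = 0.00001, this returns 1 for (a1, b1) and (big1, big2), and 0 for (small1, small2), (a2, b2), (small3, small4), and both orders of (a3, b3).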

Doubles, long doubles, and whatever else the future may provide require simple mechanical adjustments that a C++ compiler can be trained to take care of automatically. What about mixed-mode equality tests, like (double==float)? Your compiler will let you do it, as it will all sorts of other nasty things. Consider this admittedly contrived example:

#define PI 3.1415926535897932
double dpi = PI;
float fpi = PI;
double deps = 8.742278E-8;

Then deps>fabs(dpi-fpi) is false, but deps>fabsf(dpi-fpi) is true. Is fpi close enough to dpi, or not? How can this be decided? In case it matters, who's looking forward to debugging this? Or even worse, would you ever know it happened? A C++ compiler can be trained to complain when attempting mixed-mode testing. Let's make it whine, and let's give heed to its advice.

Listing 1 gives the code to make it all happen. The ANSI standard class numeric_limits<T> provides basic machinery; the class nTol adds typed tolerances for floating point types using specialization. Exact types are dismissed early and nTol<type> needs no specialization for them. If specialization for a type is needed, but missing, the code will abort.

The template function testNumEq takes two arguments of the same type, hence mixed-type comparisons will not compile. It will accept integer arguments (just in case) and get rid of them as soon as possible, as none of the "close enough" code is necessary. All the static inline function calls should be optimized away by the compiler, leaving constant values of the correct floating point type in their place.
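
Listing 1 itself is not reproduced here, but a rough sketch of the shape it describes (the bodies and the double tolerance are my guesses, with the actual fuzzy test left as a placeholder) might look like:

#include <cstdlib>
#include <limits>

/* Typed tolerances: one specialization per floating point type. */
template <typename T>
struct nTol {
    /* A needed specialization is missing: abort, as the article describes. */
    static T feps() { std::abort(); return T(); }
};

template <> struct nTol<float> {
    static float feps() { return 0.00001f; }    /* five matching digits */
};

template <> struct nTol<double> {
    static double feps() { return 1.0e-13; }    /* hypothetical value   */
};

/* Both arguments must deduce to the same T, so a mixed-mode call such as
   testNumEq(dpi, fpi) fails to compile instead of silently converting. */
template <typename T>
bool testNumEq(T a, T b)
{
    if (std::numeric_limits<T>::is_exact)   /* exact types dismissed early */
        return a == b;
    T feps = nTol<T>::feps();
    (void)feps;                 /* placeholder: the zero test and the guarded */
    return a == b;              /* ratio test sketched earlier would go here  */
}

The floating point branch would hold the testFloatEq logic sketched earlier, with FLT_MAX and FLT_MIN replaced by std::numeric_limits<T>::max() and std::numeric_limits<T>::min(), precisely the kind of mechanical adjustment the article alludes to.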

As published, the code of Listing 1 compiles on Microsoft Visual C++ 6.0 under NT 4.0. Removing the namespace std:: from calls to numeric_limits<> allows the code to compile happily on HP aCC 1.15 under HPUX 10.20. Results are equivalent on both platforms.

What is the runtime cost of all this?

At best, a function call and a comparison are required if a==b is true. When comparing to 0.0, a function call and no more than six comparisons are required. In most other cases, a function call plus one division, one multiplication, and no more than 14 comparisons are necessary.

Last but not least, user beware: all "close enough" or fuzzy evaluations have room for ambiguity. For instance, testNumEq(a, b) will return true for (a=0.0, b=1.0e-20), (a=0.0, b=1.0e-30), (a=0.0, b=1.0e-10), and (a=0.0, b=-1.0e-10), but will return false for (a=1.0e-20, b=1.0e-30) and (a=1.0e-10, b=-1.0e-10). That is to say, equality is not transitive when 0.0 is in the picture, but sometimes it makes sense. For instance, if for some problem domain 1.e-6 and -1.e-6 are indeed "close enough" to each other to be the same number, it is important to understand that this truly means each of them is "close enough" to zero, rather than both being equal to each other. The correct test is, therefore:


(testNumEq(1.E-6F, 0.0F) && testNumEq(-1.E-6F, 0.0F))

which is true (depending on your tolerance of course). Indeed, testNumEq(-1.E-6F, 1.E-6F) is false, as it should be.

