How floating point variables
Source: Internet. Editor: 程序博客网. Time: 2024/05/23 21:23
Integers are great for counting whole numbers, but sometimes we need to store very large numbers, or numbers with a fractional component. A floating point variable is a variable that can hold a real number, such as 4.0, 2.5, 3.33, or 0.1226. There are three floating point data types: float, double, and long double. A float is usually 4 bytes and a double 8 bytes, but these are not strict requirements, so sizes may vary. Long doubles were added to the language after its release for architectures that support even larger floating point numbers. Their size depends on the compiler and platform: on some (for example, Microsoft Visual C++) a long double is 8 bytes, equivalent to a double, while on others it is commonly 12 or 16 bytes. Floating point data types are always signed (can hold positive and negative values).
Here are some declarations of floating point numbers:
    float fValue;
    double dValue;
    long double dValue2;
The floating part of the name floating point refers to the fact that the decimal point can “float”: a floating point number can have a variable number of digits before and after it. For example, 2.5 has 1 decimal place, whereas 0.1226 has 4 decimal places.
When we assign values to floating point variables, it is convention to use at least one decimal place. This helps distinguish floating point values from integer values.
    int nValue = 5; // 5 means integer
    float fValue = 5.0; // 5.0 means floating point
How floating point variables store information is beyond the scope of this tutorial, but it is very similar to how numbers are written in scientific notation. Scientific notation is a useful shorthand for writing lengthy numbers in a concise manner. In scientific notation, a number has two parts: the significand, and a power of 10 called an exponent. The letter ‘e’ or ‘E’ is used to separate the two parts. Thus, a number such as 5e2 is equivalent to 5 * 10^2, or 500. The number 5e-2 is equivalent to 5 * 10^-2, or 0.05.
In fact, we can use scientific notation to assign values to floating point variables.
    double dValue1 = 500.0;
    double dValue2 = 5e2; // another way to assign 500
    double dValue3 = 0.05;
    double dValue4 = 5e-2; // another way to assign 0.05
Furthermore, if we output a number that is large enough, or has enough decimal places, it will be printed in scientific notation:
    #include <iostream>

    int main()
    {
        using namespace std;
        double dValue = 1000000.0;
        cout << dValue << endl;
        dValue = 0.00001;
        cout << dValue << endl;
        return 0;
    }
Outputs (the exponent is printed with two or three digits, depending on the compiler):
    1e+006
    1e-005
Precision
Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3s going out to infinity. An infinite-length number would require infinite memory, and we typically only have 4 or 8 bytes. Floating point numbers can only store a certain number of digits, and the rest are lost. The precision of a floating point number is how many digits it can represent without information loss.
When outputting floating point numbers, cout has a default precision of 6. That is, it assumes all variables are only significant to 6 digits, and hence it rounds the value to 6 significant digits and drops anything after that.
The following program shows cout limiting its output to 6 digits:
    #include <iostream>

    int main()
    {
        using namespace std;
        float fValue;
        fValue = 1.222222222222222f;
        cout << fValue << endl;
        fValue = 111.22222222222222f;
        cout << fValue << endl;
        fValue = 111111.222222222222f;
        cout << fValue << endl;
    }
This program outputs:
    1.22222
    111.222
    111111
Note that each of these is only 6 digits.
However, we can override the default precision that cout shows by using the setprecision() function that is defined in a header file called iomanip.
    #include <iostream>
    #include <iomanip> // for setprecision()

    int main()
    {
        using namespace std;
        cout << setprecision(16); // show 16 digits
        float fValue = 3.33333333333333333333333333333333333333f;
        cout << fValue << endl;
        double dValue = 3.3333333333333333333333333333333333333;
        cout << dValue << endl;
        return 0;
    }
Outputs (with a typical 4-byte float and 8-byte double):
    3.333333253860474
    3.333333333333333
Because we set the precision to 16 digits, each of the above numbers has 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits!
Variables of type float typically have a precision of about 7 significant digits (which is why everything after that many digits in our answer above is junk). Variables of type double typically have a precision of about 16 significant digits. Variables of type double are named so because they offer approximately double the precision of a float.
Now let’s consider a really big number:
    #include <iostream>

    int main()
    {
        using namespace std;
        float fValue = 123456789.0f;
        cout << fValue << endl;
        return 0;
    }
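Output:
    1.23457e+008
1.23457e+008 is 1.23457 * 10^8, which is 123457000. (Some compilers print the exponent with two digits, as 1.23457e+08.) Note that we have lost precision here too! Part of the loss comes from cout's default precision of 6 digits, but the variable itself is also inexact: a typical 4-byte float stores the nearest value it can represent, 123456792, rather than 123456789.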
Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.
Rounding errors
One of the reasons floating point numbers can be tricky is the non-obvious difference between binary and decimal (base 10) numbers. In normal decimal numbers, the fraction 1/3 is the infinite decimal sequence: 0.333333333… Similarly, consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011…
You can see the effects of this in the following program:
    #include <iostream>
    #include <iomanip> // for setprecision()

    int main()
    {
        using namespace std;
        cout << setprecision(17);
        double dValue = 0.1;
        cout << dValue << endl;
    }
This outputs:
    0.10000000000000001
Not quite 0.1! This is because the double had to truncate the approximation due to its limited memory, which resulted in a number that is not exactly 0.1. This is called a rounding error.
Rounding errors can play havoc with math-intense programs, as mathematical operations can compound the error. In the following program, we use 9 addition operations to add up ten copies of 0.1.
    #include <iostream>
    #include <iomanip>

    int main()
    {
        using namespace std;
        cout << setprecision(17);
        double dValue;
        dValue = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        cout << dValue << endl;
    }
This program should output 1, but it actually outputs:
    0.99999999999999989
Note that the error is no longer in the last column like in the previous example! It has propagated to the second to last column. As you continue to do mathematical operations, this error can propagate further, causing the actual number to drift farther and farther from the number the user would expect.
Comparison of floating point numbers
One of the things that programmers like to do with numbers and variables is see whether two numbers or variables are equal to each other. C++ provides an operator called the equality operator (==) precisely for this purpose. For example, we could write a code snippet like this:
    int x = 5; // integers have no precision issues
    if (x == 5)
        cout << "x is 5" << endl;
    else
        cout << "x is not 5" << endl;
This program would print “x is 5”.
However, when using floating point numbers, you can get some unexpected results if the two numbers being compared are very close. Consider:
    float fValue1 = 1.345f;
    float fValue2 = 1.123f;
    float fTotal = fValue1 + fValue2; // should be 2.468
    if (fTotal == 2.468)
        cout << "fTotal is 2.468";
    else
        cout << "fTotal is not 2.468";