C: an investigation of gets and scanf prompted by a small question

Source: Internet | Published by: java爬虫入门教程 | Editor: 程序博客网 | Time: 2024/06/05 08:39

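scanf() and gets() can both read a string, but they behave differently. To read the string "hi hello" from the keyboard, gets is the function to use: gets accepts spaces, whereas scanf with %s treats a space, a newline, or a Tab as the end of input and therefore cannot read a string that contains spaces.

char string[15];
gets(string); /* input ends at the newline */

scanf("%s",string); /* input ends at the first space */

So when the input string contains spaces, use gets rather than scanf("%s"). Note that gets performs no bounds checking and was removed from the language in C11; fgets is its safe replacement.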

Differences between scanf and gets when reading strings

In C, at least two functions can read a string:

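1. scanf()

Header: stdio.h

Syntax: scanf("format control string", list of variable addresses);

To read a string: scanf("%s", character array name or pointer);

2. gets()

Header: stdio.h

Syntax: gets(character array name or pointer);

When reading strings, the two compare as follows:

1. Differences:

scanf with %s skips leading whitespace and stops at the first space, Tab, or newline, so the string it stores never contains any of them;

gets stores spaces and Tabs and stops only at the newline, which it reads and discards.

2. Similarity:

Both append '\0' automatically once the string has been read.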

Example 1:

#include <stdio.h>

int main(void){

char ch1[10],ch2[10];

scanf("%s",ch1);

gets(ch2); /* removed in C11; shown only to illustrate its behavior */

return 0;

}

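Type asd, a space, fg, Enter, and then asd, a space, fg, Enter again. The result is ch1="asd\0" and ch2=" fg\0": scanf stops at the space, and gets then takes the rest of the first line, leading space included, without waiting for the second line.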

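Example 2:

#include <stdio.h>

int main(void){

char ch1[10],ch2[10];

int c1,c2; /* getchar returns int */

scanf("%s",ch1);

c1=getchar();

gets(ch2); /* removed in C11; shown only to illustrate its behavior */

c2=getchar();

return 0;

}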

Type asdfg, Enter, and then asdfg, Enter again. The result is ch1="asdfg\0", c1='\n', ch2="asdfg\0", and c2 waits for further input.

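scanf: on meeting a newline, space, or Tab it stops and appends '\0' to the string, but the newline, space, or Tab itself stays in the input buffer.

gets: accepts every character typed before the Enter key and replaces the '\n' with '\0'. The newline is not left in the input buffer.

gets() is used only for reading strings and ends input at the newline.

scanf() can read variables of many types, depending on the conversion specifier.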

-----------------------

1. Why is fflush(stdin) wrong?
First, consider the following program:

#include <stdio.h>

int main( void ){
    int i;
    for (;;)  {
        fputs("Please input an integer: ", stdout);
        scanf("%d", &i);
        printf("%d\n", i);
    }
    return 0;
}
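This program prompts the user for an integer and waits for input. If the user types an integer, the program prints it, prompts for an integer again, and waits for input again.
But as soon as the user types something that is not an integer (a letter, say, or the fractional part of a decimal number), and supposing the last integer scanf obtained was 2, the program prints "Please input an integer: 2" endlessly. This happens because scanf("%d", &i); accepts only an integer: if the user types a letter, the letter is left behind in the input buffer. Since the buffer contains data, scanf does not wait for the user and reads straight from the buffer, but what it finds there is the letter, which is left in the buffer once more. This repeats over and over, hence the endless "Please input an integer: 2".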
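Someone may say: "In that case, why not add 'fflush(stdin);' after the scanf call to empty the input buffer?"
But that is wrong! Neither the C standard nor the C++ standard has ever defined fflush(stdin). Someone may object: "But fflush(stdin) solved this problem for me, so how can you call it wrong?" It is true that some implementations (VC6, for example) support using fflush(stdin) to empty the input buffer, but not every implementation has to support it (gcc on Linux does not), because the standard simply does not define fflush(stdin).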
The MSDN documentation also states plainly that "fflush on input stream is an extension to the C standard". Of course, if you do not care about portability at all, using fflush(stdin) is not a big problem. C99 specifies fflush as follows:
int fflush(FILE *stream);
If stream points to an output stream or an update stream in which
the most recent operation was not input, the fflush function causes
any unwritten data for that stream to be delivered to the host environment
to be written to the file; otherwise, the behavior is undefined.
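Here the host environment can be understood as the operating system, the kernel, and so on.
It follows that if stream points to an input stream (such as stdin), the behavior of fflush is undefined. Using fflush(stdin) is therefore incorrect, or at the very least not portable.
2.   Ways to clear the input buffer
Although fflush(stdin) cannot be used, we can write our own code to clear the input buffer. A few simple lines after the scanf call are all it takes.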
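#include <stdio.h>

int main( void ) {
    int i, c, n;

    for ( ; ; ) {
        fputs("Please input an integer: ", stdout);
        n = scanf("%d", &i);    /* n is 1 only if an integer was read */
        if ( feof(stdin) || ferror(stdin) ) {
            break;              /* end of input or read error */
        }
        /* discard the rest of the line, newline included */
        while ( (c = getchar()) != '\n' && c != EOF ) ;
        if ( n == 1 )
            printf("%d\n", i);
    }

    return 0;
}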
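The same loop in C++:
#include <iostream>
#include <limits> // for numeric_limits

using std::cout;
using std::endl;
using std::cin;
using std::numeric_limits;
using std::streamsize;

int main() {
    int value;
    for ( ; ; ) {
        cout << "Enter an integer: ";
        cin >> value;
        if ( cin.eof() || cin.bad() )
        {   // If the user entered the end-of-file indicator (or the file
            // has been read to the end), or a read/write error occurred,
            // leave the loop.
            // do something
            break;
        }
        // After reading an invalid character the input stream is in a
        // failed state. To keep reading input, first call clear to
        // reset the stream's error flags; only then can ignore be
        // called to discard the data in the input stream.
        cin.clear();
        // numeric_limits<streamsize>::max() is the largest streamsize
        // value; passing it tells ignore to place no limit on the count.
        // ignore then discards everything up to and including '\n'.
        // See the library documentation for details of both functions.
        cin.ignore( numeric_limits<streamsize>::max(), '\n' );

        cout << value << '\n';
    }
    return 0;
}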

～～～～～～～～～～～～～～～～～～～～～～～～～

C has several basic input functions:

// Character input family
int fgetc(FILE *stream);
int getc(FILE *stream);
int getchar(void);
// Line input family
char *fgets(char * restrict s, int n, FILE * restrict stream);
char *gets(char *s); // may overflow the buffer; use fgets instead
// Formatted input family
int fscanf(FILE * restrict stream, const char * restrict format, ...);
int scanf(const char * restrict format, ...);
int sscanf(const char * restrict str, const char * restrict format, ...);
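Only the use of these input functions on standard input (stdin) is discussed here. Looking over the functions listed above:

The first three, fgetc, getc and getchar, make up the character input family. Taking getchar as the example: when the stdin buffer is empty it waits for input, and on a terminal it returns only after Enter has been pressed. When the stdin buffer is not empty, getchar returns immediately. On return, getchar takes one character out of the buffer, converts it to int, and returns that int value.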

Source of the FILE structure in MinGW 4.4.3:

typedef struct _iobuf
{
    char *_ptr;      // current read position in the buffer
    int   _cnt;      // number of bytes left in the buffer
    char *_base;
    int   _flag;
    int   _file;
    int   _charbuf;
    int   _bufsiz;
    char *_tmpfname;
} FILE;

Implementations differ between compilers; the character input functions here use only _ptr and _cnt.

The getchar() implementation in MinGW 4.4.3:

__CRT_INLINE int __cdecl __MINGW_NOTHROW getchar (void)
{
  return (--stdin->_cnt >= 0)
    ?  (int) (unsigned char) *stdin->_ptr++
    : _filbuf (stdin);
}

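Here stdin is of type FILE pointer. In MinGW 4.4.3, getc() and getchar() are implemented as inline functions, while fgetc() is an ordinary function. Incidentally, the C99 standard added support for inline functions.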

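Of the line input functions fgets and gets, gets has no way to know the size of the destination buffer and frequently causes overflows, so it is neither recommended nor discussed here. With fgets on interactive input, each press of Enter makes fgets return. On success, fgets copies the data from the input buffer, newline '\n' included, into the space its first argument points to. If the input is longer than that space, fgets stores only the first n-1 characters (n is the second argument, the size of the space the first argument points to) and then appends '\0', leaving the rest of the line in the input buffer for the next read. This is why fgets is safe. fgets(buf, BUF_LEN, stdin); is the usual replacement for gets(buf);.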
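Among the formatted input functions, fscanf is awkward for formatted input from a file stream, and scanf is the one most commonly used. For most conversions (%c and %[ are the exceptions), the functions in this family discard whitespace (spaces, tabs, newlines) at the front of the input (standard input or a string, depending on the function, e.g. sscanf) until a non-whitespace character is met, and then try to parse that character and the ones after it according to the format argument. They return the number of items successfully parsed and assigned, or EOF if end of file or an error occurs before the first conversion.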

=================Divider=================

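Any discussion of buffers has to mention setbuf and setvbuf, the two functions that configure a stream's buffer. They are declared as follows: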

void setbuf(FILE * restrict stream, char * restrict buf);
int setvbuf(FILE * restrict stream, char * restrict buf, int mode, size_t size);

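The mode argument of setvbuf can be:

_IOFBF (full buffering): data is read in when the buffer is empty, and written to the stream when the buffer is full.
_IOLBF (line buffering): data is read from or written to the stream one line at a time. Examples: stdin and stdout (typically, when attached to a terminal)
_IONBF (no buffering): data is read from or written to the stream directly, with no buffer. Example: stderr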

setbuf(stream, buf); behaves as follows:

buf == NULL: equivalent to (void)setvbuf(stream, NULL, _IONBF, 0);
buf points to a buffer of length BUFSIZ: equivalent to (void)setvbuf(stream, buf, _IOFBF, BUFSIZ);

Note: the BUFSIZ macro is defined in stdio.h.

The well-known classic setbuf mistake, described in "C Traps and Pitfalls", also deserves a mention here:

#include <stdio.h>

int main()
{
    int c;
    char buf[BUFSIZ];   /* BUG: automatic storage, gone once main returns */
    setbuf(stdout,buf);
    while((c = getchar()) != EOF)
        putchar(c);
    
    return 0;
}

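The problem is this: before the program hands control back to the operating system, the C runtime must clean up, and part of that cleanup is flushing the output buffer. By then main has already returned. buf is a local array of main, so its lifetime has ended and its storage has been released, and flushing through it produces strange garbage output.

Fix: declare buf as static, make it a global variable, or allocate it dynamically with malloc.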

=================Divider=================

Now for several popular ways to clear the input buffer:

The fflush(stdin); approach

From the C99 standard:

If stream points to an output stream or an update stream in which the most recent
operation was not input, the fflush function causes any unwritten data for that stream
to be delivered to the host environment to be written to the file; otherwise, the behavior is
undefined.

This shows that the behavior of fflush with an input stream as its argument is undefined. The fflush description on MSDN, however, says:

If the file associated with stream is open for output, fflush writes to that file the 
contents of the buffer associated with the stream. If the stream is open for input, 
fflush clears the contents of the buffer.

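So fflush(stdin) does work with VC! Because implementations differ in what they do with this undefined behavior, using fflush(stdin) to flush the input buffer is not recommended.

The setbuf(stdin, NULL); approach

From the description of setbuf above, setbuf(stdin, NULL); switches the stdin input stream from its default buffer to no buffer. With no buffer at all, the problem of data left in the buffer naturally goes away. But that is not what we want.

The scanf("%*[^\n]"); approach (mentioned in "C Programming: A Modern Approach", 2nd edition)

This uses the "*" in a scanf conversion specification, which suppresses assignment, and "%[^set]", which matches any sequence of characters not in the set. It brings a problem of its own: the newline '\n' stays in the buffer, and an extra step is needed to discard it.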

The classic approach

int c;
while((c = getchar()) != '\n' && c != EOF);

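As the code shows, getchar() is called repeatedly to take characters from the buffer until the character c is the newline '\n' or the end-of-file indicator EOF. This method reliably clears the rest of the pending input line and is portable.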

0 0