Determining Whether a File Is Binary

Source: Internet   Editor: 程序博客网   Time: 2024/06/04 18:55

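At work I process STL files. Sometimes the file I receive is binary and sometimes it is ASCII. So I wanted a function that detects which kind it is, and then chooses the right way to open it.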

Without further ado, here is the code:

#include <cstdio>

enum FileTypeEnum
{
  FileTypeUnknown,
  FileTypeBinary,
  FileTypeText
};

FileTypeEnum DetectFileType(const char *filename,
                            unsigned long length,
                            double percent_bin)
{
  if (!filename || percent_bin < 0)
    {
    return FileTypeUnknown;
    }

  FILE *fp = fopen(filename, "rb");
  if (!fp)
    {
    return FileTypeUnknown;
    }

  // Allocate a buffer and read at most 'length' bytes
  unsigned char *buffer = new unsigned char [length];
  size_t read_length = fread(buffer, 1, length, fp);
  fclose(fp);
  if (read_length == 0)
    {
    delete [] buffer;  // free the buffer on this early-return path too
    return FileTypeUnknown;
    }

  // Count textual bytes: printable ASCII [0x20, 0x7E] plus \n, \r, \t
  size_t text_count = 0;
  const unsigned char *ptr = buffer;
  const unsigned char *buffer_end = buffer + read_length;
  while (ptr != buffer_end)
    {
    if ((*ptr >= 0x20 && *ptr <= 0x7E) ||
        *ptr == '\n' ||
        *ptr == '\r' ||
        *ptr == '\t')
      {
      text_count++;
      }
    ptr++;
    }
  delete [] buffer;

  // Fraction of non-textual bytes among the bytes actually read
  double current_percent_bin =
    (static_cast<double>(read_length - text_count) /
     static_cast<double>(read_length));

  if (current_percent_bin >= percent_bin)
    {
    return FileTypeBinary;
    }
  return FileTypeText;
}

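Example call. It examines the first 256 bytes and treats the file as binary if at least 5% of them are non-textual:

DetectFileType(filename, 256, 0.05)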
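Here is the threshold made concrete for this call. Assume the file is at least 256 bytes long, so 256 bytes are read. Let n be the number of non-textual bytes among them. The file is then classified as binary exactly when:

```latex
\frac{n}{256} \ge 0.05 \iff n \ge 0.05 \times 256 = 12.8 \iff n \ge 13
```

So 13 non-textual bytes (13/256 ≈ 0.0508) already mark the file as binary, while 12 (12/256 = 0.046875) leave it classified as text. For a shorter file, the denominator is the number of bytes actually read (read_length), not 256.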

The principle is simple:

  Up to ‘length’ bytes are read from the file. If at least a fraction ‘percent_bin’ of those bytes (e.g. 0.05 for 5%) are non-textual, the file is considered binary; otherwise it is textual. Textual elements are bytes in the ASCII range [0x20, 0x7E], plus \n, \r and \t.
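The classification rule can be pulled out into two small helpers that work on an in-memory buffer. This makes the threshold logic easy to check without touching the file system. `IsTextByte` and `NonTextFraction` are hypothetical names introduced only for illustration. They mirror the counting loop inside `DetectFileType`; this is a sketch, not part of the original code.

```cpp
#include <cstddef>

// Hypothetical helper: true for bytes counted as "textual",
// i.e. printable ASCII [0x20, 0x7E] plus '\n', '\r' and '\t'.
bool IsTextByte(unsigned char c)
{
  return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
}

// Hypothetical helper: fraction of non-textual bytes in buf[0..n).
// Returns 0.0 for an empty buffer to avoid dividing by zero.
double NonTextFraction(const unsigned char *buf, std::size_t n)
{
  if (n == 0)
    {
    return 0.0;
    }
  std::size_t text_count = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
    if (IsTextByte(buf[i]))
      {
      ++text_count;
      }
    }
  return static_cast<double>(n - text_count) / static_cast<double>(n);
}
```

For example, the 4 bytes 'a', 'b', 'c', 0x00 contain one non-textual byte. That gives a fraction of 0.25, which is at least 0.05, so the data would be classified as binary.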

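In other words, read a chunk of bytes from the file and count how many of them are non-textual. If that count is at least percent_bin of the number of bytes read, the file is binary. Note that percent_bin is a fraction, so 0.05 means 5%.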

Textual characters here are \n, \r, \t and bytes whose ASCII values lie in the range [0x20, 0x7E].

The whole file never needs to be read into memory; only the first length bytes are examined.
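Only a prefix is examined, so the same check can also be written against a C++ stream that reads just those bytes. The sketch below is a hypothetical variant; `DetectStreamType` is a name introduced here and is not part of the original code. It uses `std::istream::read` and `gcount()` with a `std::vector` buffer, so no manual `delete[]` is needed on any return path. It repeats the `FileTypeEnum` definition so that it compiles on its own.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream: lets the check run on in-memory data
#include <vector>

enum FileTypeEnum
{
  FileTypeUnknown,
  FileTypeBinary,
  FileTypeText
};

// Hypothetical stream-based variant of DetectFileType: reads at most
// 'length' bytes from 'in' and applies the same textual-byte rule.
FileTypeEnum DetectStreamType(std::istream &in,
                              std::size_t length,
                              double percent_bin)
{
  if (!in || length == 0 || percent_bin < 0)
    {
    return FileTypeUnknown;
    }

  // Only the prefix is read; the rest of the stream is never touched.
  std::vector<unsigned char> buffer(length);
  in.read(reinterpret_cast<char *>(buffer.data()),
          static_cast<std::streamsize>(length));
  const std::size_t read_length = static_cast<std::size_t>(in.gcount());
  if (read_length == 0)
    {
    return FileTypeUnknown;
    }

  // Count textual bytes: printable ASCII [0x20, 0x7E] plus \n, \r, \t
  std::size_t text_count = 0;
  for (std::size_t i = 0; i < read_length; ++i)
    {
    const unsigned char c = buffer[i];
    if ((c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t')
      {
      ++text_count;
      }
    }

  // Same decision rule as DetectFileType: non-textual fraction >= threshold
  const double current_percent_bin =
    static_cast<double>(read_length - text_count) /
    static_cast<double>(read_length);
  return current_percent_bin >= percent_bin ? FileTypeBinary : FileTypeText;
}
```

Usage with a file (the path is only an example): open it as `std::ifstream file("model.stl", std::ios::binary);` and call `DetectStreamType(file, 256, 0.05)`. Taking a `std::istream&` rather than a file name is a design choice. It lets the same function be exercised on a `std::istringstream` without creating files.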
