Hive or Impala column data types incompatible with the data types in the underlying Parquet schema

Source: Internet. Published by: 合肥seo. Editor: 程序博客网. Time: 2024/06/01 19:19

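Background: the data types of some columns in a Hive table were changed, for example from STRING to DOUBLE, while the table's underlying files are stored in Parquet format. The change only updates the table metadata; the existing Parquet files still store the column with its original type. After the change, Impala's metadata was refreshed, and querying the altered column then failed because the column type was incompatible with the type recorded in the Parquet schema.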

For example, in Impala:

Fetching the results failed with the following error:

Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='D\x11\x18]\xf7\xa2E*\x8f\x99Ky\x9c\xc8\xda>', guid='D\x11\x18]\xf7\xa2E*\x8f\x99Ky\x9c\xc8\xda>')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=None, errorMessage="File 'hdfs://xxx:8020/user/hdfs/test/0f399649-1e1d-444b-9d71-24c8db0ac7f3.parquet' has an incompatible Parquet schema for column 'default.test.yyy'. Column type: DOUBLE, Parquet schema:\noptional byte_array QTY [i:30 d:1 r:0]\n", sqlState='HY000', infoMessages=None, statusCode=3), results=None, hasMoreRows=None)

The relevant error message inside it is:

errorMessage="File 'hdfs://xxx:8020/user/hdfs/test/0f399649-1e1d-444b-9d71-24c8db0ac7f3.parquet' has an incompatible Parquet schema for column 'default.test.yyy'. Column type: DOUBLE, Parquet schema:\noptional byte_array QTY [i:30 d:1 r:0]\n"
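When many files or columns are affected, it can help to pull the key fields out of this message programmatically. The following is a minimal sketch, assuming the message always follows the wording shown above; the function name `parse_schema_mismatch` and the regular expression are illustrative and not part of Impala.

```python
import re

# Error text copied from the message above; the "\n" sequences are real newlines here.
ERROR_MESSAGE = (
    "File 'hdfs://xxx:8020/user/hdfs/test/0f399649-1e1d-444b-9d71-24c8db0ac7f3.parquet' "
    "has an incompatible Parquet schema for column 'default.test.yyy'. "
    "Column type: DOUBLE, Parquet schema:\n"
    "optional byte_array QTY [i:30 d:1 r:0]\n"
)

# Assumed message layout, derived only from the single example above.
PATTERN = re.compile(
    r"File '(?P<file>[^']+)' has an incompatible Parquet schema "
    r"for column '(?P<column>[^']+)'\. "
    r"Column type: (?P<table_type>\w+), Parquet schema:\s*"
    r"(?P<repetition>\w+) (?P<parquet_type>\w+) (?P<parquet_name>\w+)"
)


def parse_schema_mismatch(message):
    """Return the fields of an 'incompatible Parquet schema' message, or None."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


info = parse_schema_mismatch(ERROR_MESSAGE)
print(info["column"], info["table_type"], info["parquet_type"])
```

Here the table metadata declares the column as DOUBLE, while the file still stores it as a `byte_array`, which is how the original string values were written.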

Hive behaves similarly. For solutions, see:

1. http://stackoverflow.com/questions/36085891/hive-doesnt-change-parquet-schema

2. https://issues.cloudera.org/browse/IMPALA-779
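One possible workaround, sketched here as an assumption rather than as the method described in the links above, is to set the column back to the type the Parquet files actually contain and then rewrite the data into a new Parquet table with an explicit CAST. The helper `build_rewrite_statements`, the `_fixed` table suffix, and the extra column `id` are hypothetical names used only for illustration; the generated statements should be reviewed before they are run in Hive or Impala.

```python
def build_rewrite_statements(table, column, old_type, new_type, other_columns):
    """Build SQL that restores the original column type, then copies the data
    into a new Parquet table with the column cast to the desired type."""
    fixed_table = f"{table}_fixed"  # hypothetical name for the rewritten table
    # Note: the cast column is placed last, so column order may differ from the source table.
    select_list = ", ".join(
        list(other_columns) + [f"CAST({column} AS {new_type}) AS {column}"]
    )
    return [
        # 1. Make the table metadata match what the Parquet files really store.
        f"ALTER TABLE {table} CHANGE {column} {column} {old_type}",
        # 2. Rewrite the data so the new files physically store the new type.
        f"CREATE TABLE {fixed_table} STORED AS PARQUET AS "
        f"SELECT {select_list} FROM {table}",
    ]


# "id" stands in for the table's remaining columns (an assumption).
statements = build_rewrite_statements("test", "yyy", "STRING", "DOUBLE", ["id"])
for statement in statements:
    print(statement + ";")
```

Generating the statements instead of running them keeps the sketch independent of any particular Hive or Impala client library.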

0 0

Hive 或 Impala 的数据类型与 对应底层的 Parquet schema的数据类型不兼容

Hive 或 Impala 的数据类型与对应底层的 Parquet schema的数据类型不兼容