Invalid OperationHandle when using Hive through HUE

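Recently I set up HUE, connected to HiveServer2, as a web interface to Hive for other people to use.
After it had been in use for a while, users began reporting that whenever a query ran for more than 10 minutes, the following error appeared: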

[17/Oct/2016 17:46:26 +0900] api          ERROR    error in <function watch_query_refresh_json at 0x7f645587e0c8>
Traceback (most recent call last):
  File "/usr/local/hue/apps/beeswax/src/beeswax/api.py", line 58, in decorator
    return view_fn(request, *args, **kwargs)
  File "/usr/local/hue/apps/beeswax/src/beeswax/api.py", line 198, in watch_query_refresh_json
    handle, state = _get_query_handle_and_state(query_history)
  File "/usr/local/hue/apps/beeswax/src/beeswax/views.py", line 865, in _get_query_handle_and_state
    state = dbms.get(query_history.owner, query_history.get_query_server_config()).get_state(handle)
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/dbms.py", line 621, in get_state
    return self.client.get_state(handle)
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 1043, in get_state
    res = self._client.get_operation_status(operationHandle)
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 855, in get_operation_status
    return self.call(self._client.GetOperationStatus, req)
  File "/usr/local/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 644, in call
    raise QueryServerException(Exception('Bad status for request %s:\n%s' % (req, res)), message=message)
QueryServerException: Bad status for request TGetOperationStatusReq(operationHandle=TOperationHandle(hasResultSet=False, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='F\x9aasK\xa9ML\x86\xbbs\xe2/\x97\x9f\xb9', guid='\xe4\x0b\xb2\xa8\xad]K=\x9f\xbf}\x9f\xc0\x0f\xfe\xa9'))):
TGetOperationStatusResp(status=TStatus(errorCode=0, errorMessage='Invalid OperationHandle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=e40bb2a8-ad5d-4b3d-9fbf-7d9fc00ffea9]', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Invalid OperationHandle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=e40bb2a8-ad5d-4b3d-9fbf-7d9fc00ffea9]:12:11', 'org.apache.hive.service.cli.operation.OperationManager:getOperation:OperationManager.java:154', 'org.apache.hive.service.cli.CLIService:getOperationStatus:CLIService.java:377', 'org.apache.hive.service.cli.thrift.ThriftCLIService:GetOperationStatus:ThriftCLIService.java:610', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus:getResult:TCLIService.java:1477', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus:getResult:TCLIService.java:1462', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617', 'java.lang.Thread:run:Thread.java:745'], statusCode=3), operationState=None, errorMessage=None, sqlState=None, errorCode=None)

Even though HUE reported this error, the Hive job kept running, and once it finished, its results were still returned.
The HiveServer2 log shows the same message:

2016-10-17 23:54:41,451 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue0]: provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=hiveServer2.async.batch, finalDestination=hiveServer2.async.batch.hdfs, interval=01:00.001 minutes, events=5, succcessCount=2, totalEvents=10269, totalSuccessCount=1049
org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=ba657134-71b6-4282-bd49-dbc0999738a2]
        at org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:154)
        at org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:377)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:610)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1477)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1462)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Environment:
HiveServer2: two instances behind a VIP for HA, version 1.2
HUE: 3.10

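At first I assumed it was some kind of timeout, because in hue.ini the Impala timeout is roughly 10 minutes.
However, the Hive settings have no corresponding parameter. Eventually I found the following references: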

https://community.hortonworks.com/questions/41933/hiveserver2-going-down-frequently.html
https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/3ZpbjhHOlcE

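It turns out the cause is the VIP I used for HA, which rotates requests between the HiveServer2 instances in turn.
The problem is that after some time HUE apparently rebuilds its session to HiveServer2, so the follow-up requests for a running query can be routed to a different HiveServer2.
In other words, a query that is executing on server A has its status polled on server B, which has no record of that operation handle, and that is what produces Invalid OperationHandle.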
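To make the failure mode concrete, here is a minimal Python model of it. It is only a sketch under stated assumptions: `HiveServer2Stub`, `RoundRobinVip`, and `run_query_with_reconnect` are hypothetical names, not HUE or Hive APIs. The model assumes each HiveServer2 keeps its running operations in its own in-memory table (consistent with the `OperationManager.getOperation` frame in the stack trace above), that the VIP hands each new connection to the next server in turn, and that HUE opens a new connection between submitting the query and polling its status.

```python
import itertools
import uuid


class InvalidOperationHandle(Exception):
    """Hypothetical stand-in for HiveSQLException('Invalid OperationHandle ...')."""


class HiveServer2Stub:
    """Hypothetical model of one HiveServer2 instance.

    Each instance keeps its own in-memory table of operations, so a handle
    created on one instance is unknown to every other instance.
    """

    def __init__(self, name):
        self.name = name
        self._operations = {}

    def execute_statement(self, sql):
        handle = str(uuid.uuid4())
        self._operations[handle] = "RUNNING"
        return handle

    def get_operation_status(self, handle):
        if handle not in self._operations:
            raise InvalidOperationHandle(
                f"Invalid OperationHandle {handle} on server {self.name}")
        return self._operations[handle]


class RoundRobinVip:
    """Hypothetical VIP that sends each new connection to the next server in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def connect(self):
        return next(self._cycle)


def run_query_with_reconnect(vip, sql):
    """Submit on one connection, then poll on a new one (assumed session rebuild)."""
    first = vip.connect()
    handle = first.execute_statement(sql)
    second = vip.connect()  # the rebuilt session may land on another server
    return second.get_operation_status(handle)


if __name__ == "__main__":
    # Two servers behind a rotating VIP: the status poll lands on server B.
    try:
        run_query_with_reconnect(
            RoundRobinVip([HiveServer2Stub("A"), HiveServer2Stub("B")]), "SELECT 1")
    except InvalidOperationHandle as exc:
        print("error:", exc)

    # Only one server reachable: the poll goes back to the server that owns the handle.
    print("status:", run_query_with_reconnect(
        RoundRobinVip([HiveServer2Stub("A")]), "SELECT 1"))
```

In this model the query itself is never cancelled; only the status lookup fails. That matches what users saw: the error shows up in HUE while the Hive job keeps running on server A.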

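There is no good fix for now. The only workaround is to stop using the VIP and configure HUE to connect directly to a single HiveServer2.
Checking HUE's issue tracker, it looks like proper HiveServer2 HA support only arrives in HUE 4.0 or later:
https://issues.cloudera.org/browse/HUE-2738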
