In impala while running queries over large data, sometimes we may get an error like this.
WARNINGS: Create file /tmp/impala-scratch/94869b99d0d6457:765d2bc009a914ad_94869b99d0d6457:765d2bc009a914af_516bff1b-7342-434e-8c95-c777bb7f237e failed with errno=2 description=Error(2): No such file or directory Backend 1:Create file /tmp/impala-scratch/94869b99d0d6457:765d2bc009a914ad_94869b99d0d6457:765d2bc009a914af_516bff1b-7342-434e-8c95-c777bb7f237e failed with errno=2 description=Error(2): No such file or directory
One of my friends faced this issue and on investigation I found that this issue is because of the unexpected deletion of files inside the impala scratch directory. The intermediate files used during large sort, join, aggregation, or analytic function operations are stored in this scratch directory. By default the impala scratch directory is /tmp/impala-scratch. These directoroes will be deleted after the query execution. The best solution for this problem is to change the scratch directory to some other directory. This can be done by starting the impala daemon with the option –scratch_dirs=”path_to_directory”. This directory is in the local linux file system. Not in the hdfs. Impala will not start if it is not having proper read/write access to the files in the “scratch” directory.
If you are using impala in EMR cluster, to modify the start up options, make the changes in the bootstrap action. If you want to modify this conf in an existing EMR cluster, stop the service nanny in all the nodes and restart the impala with this scratch directory property. If service nanny is running, you will not be able to restart the impala with this new argument because the service nanny will perform the service restart before your restart .. 🙂