Impala doesn't require HBase to operate; it can query data stored directly in HDFS. As a simple example, if you had a few terabytes of TSV files, you could copy the raw files into HDFS and then define a simple table schema over them. Queries against that data run in parallel across the nodes in the cluster, partly because HDFS already distributes the file blocks across those nodes.
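
To make the "simple schema" step concrete, here is a rough sketch in Python that only builds the DDL text for an external table over a directory of tab-separated files. The table name, column list, and HDFS path are made-up placeholders, and the helper `build_tsv_table_ddl` is hypothetical, not part of Impala. It assumes the files have already been copied into that directory (for instance with `hdfs dfs -put`).

```python
def build_tsv_table_ddl(table, columns, hdfs_dir):
    """Return a CREATE EXTERNAL TABLE statement for tab-separated text files.

    table    -- table name (hypothetical, chosen by you)
    columns  -- list of (column_name, impala_type) pairs, in file column order
    hdfs_dir -- HDFS directory that already holds the TSV files
    """
    column_defs = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"  {column_defs}\n"
        f")\n"
        # The SQL text must contain a literal backslash-t as the delimiter.
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


if __name__ == "__main__":
    # Placeholder schema and path, purely for illustration.
    ddl = build_tsv_table_ddl(
        "events",
        [("ts", "STRING"), ("user_id", "BIGINT"), ("url", "STRING")],
        "/data/events",
    )
    print(ddl + ";")
```

The printed statement could then be run through `impala-shell` (for example with its `-q` option). Because the table is declared `EXTERNAL`, the files stay where they are in HDFS and dropping the table leaves them untouched.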