How To Deploy Apache Kudu/Impala From Sources

Introducction

In this very practical article, we will build two of the most interesting technologies in the Big Data ecosystem: Apache kudu and Apache Impala. We will build both technologies from source code, package them and deploy them with the minimum configurations to make them functional (we will avoid performance related configuration on this occasion) and additionally we will use Apache Impala making it as independent as possible from HDFS.

It is known that Apache Kudu and Apache Impala release their versions in “Source Release” form, therefore they do not make the binaries of these technologies available. This forces users to build such projects from source code. Generally the projects of the Big Data ecosystem and the Apache Software Foundation are based on Java code, which makes them easier to build and in particular to distribute (in most cases only a JVM is needed on the machine to be deployed).

In the case of Apache Kudu and Apache Impala this is not the case, they are two projects whose code base is C++, which makes their construction and in particular their distribution in the target systems somewhat more complicated.

Continue reading “How To Deploy Apache Kudu/Impala From Sources”