Blog do projektu Open Source JavaHotel

poniedziałek, 9 października 2017

Hadoop and SQL engines

Every Hadoop distribution comes with several SQL engines. So I decided to create a simple test to compare them. Run the same queries against the same data set. So far I have been working with two Hadoop distribution, BigInsights 4.x and HDP 2.6.2 with Big SQL 5.0.1. The second is now the successor of BigInsights.
I was comparing the following SQL engines:

  • MySQL, embedded
  • Hive against data in different format: text files, Parquet and OCR
  • Big SQL on Hive tables, Parquet and OCR
  • Spark SQL
  • Phoenix, SQL engine for HBase.
It is not any kind of benchmarking, the purpose is not to prove the superiority of one SQL engine over another. I also haven't done any kind of tunning or reconfiguration to speed up. Just to conduct a simple check after installation and have several numbers at hand.
The test description and several results are here.
Although I do not claim any ultimate authority here, I can provide several conclusions.
  • Big SQL is a winner. Particularly comparing to Hive. Very important: Big SQL is running on the same physical data, the only difference is a different computational model. It even beats MySQL. But, of course, MySQL will get the upper hand for OLTP requests. 
  • Hive behaves much better paired with TEZ. On the other hand, the execution time is very fluid, can change from one execution to another drastically.
  • Spark SQL is outside competition but it is hard to outmatch in-memory execution.
  • Phoenix SQL is at the end of the race, but the execution time is very stable.


Brak komentarzy:

Prześlij komentarz