System Overview: LevelDB

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Many of its ideas and techniques are widely used in Big Data stacks, e.g., BigTable and HBase. The code in LevelDB is well written and well documented, which makes it an ideal project to learn from.

Written on March 17, 2017

In the Code: User-Defined Function (UDF) in Spark SQL

This post illustrates the implementation of UDFs in Spark SQL, where the targeted version is Spark 1.6.0 and the targeted language is Scala. I will talk about UDFs in roughly two parts: registration and execution.

Written on December 2, 2016

Bulk Loading in HBase with Practice in MR & Spark

Bulk loading is a feature of HBase for ingesting large volumes of data efficiently. In this post, I am going to share some basic concepts of bulk loading and its practice in MapReduce and Spark.

Written on May 15, 2016