Big data refers to datasets so large that they exceed what commonly used software can collect, store, manage, and analyze within acceptable limits. Since the beginning of this decade, big data has become a new trend and culture in both academia and industry. The importance of big data technology is widely recognized and continues to grow owing to recent technological developments. In particular, popular social media services and devices connected via the Internet of Things are accelerating the generation of big data. Cloud services, in turn, improve the accessibility of such data by allowing us to access it from anywhere. Furthermore, computing power has improved rapidly with the introduction of new CPU and GPU hardware technologies. Building on these environmental changes, MapReduce and Hadoop have contributed significantly to making big data processing prevalent today. Hadoop, an open-source implementation of MapReduce, enables high-performance computing on commodity machines without requiring expensive mainframe computers.
This keynote consists of two parts. The first part introduces recent trends in big data platforms that originated from Hadoop. The second part then addresses a few interesting big data applications enabled by such platforms.
In the first part, I would like to present the concept of the MapReduce paradigm and its significance in the history of big data processing. I will then discuss the advantages and limitations of Hadoop and systematically review research efforts to overcome these limitations in three categories: support for iterative processing, support for stream processing, and support for the SQL language. As a solution for the third category, NewSQL systems, such as Google’s Spanner and F1, have emerged as a new paradigm. I will also elaborate on ODYS, a massively parallel search engine developed at KAIST, as an example of a NewSQL system.
In the second part, I will address efforts to combine artificial intelligence (AI) with big data, since big data technology serves as an enabler of AI. For example, IBM Watson learned from 200 million pages of content, including Wikipedia and news articles. With this rich knowledge base, IBM Watson surprisingly beat human champions on the quiz show Jeopardy!. IBM is expanding Watson's application to medical science; Watson for Oncology, for instance, collaborates with human doctors to diagnose cancer. As another example, technology companies are developing intelligent personal assistants, such as Google Now, Apple Siri, and Amazon Alexa. These assistants benefit from big data because they become smarter by learning from huge volumes of user queries and feedback. I will give an overview of some of these data-driven services.
In summary, the keynote will address the characteristics of big data, recent trends of big data platforms, and emerging applications for big data intelligence.