Big Data analytics with Hadoop: the Smart Elephant is in the room
Abstract. This article presents basic problems of knowledge discovery in the Big Data context. The current rate of data growth is driven by the use of various data analysis methods adapted to detect valuable information across many uncorrelated locations. Aggregating key content generated by various sources is in line with the current trend in analytics of exploring the business value dormant in ever-increasing volumes of data. The article clarifies what Big Data is, describes the architecture of Big Data systems, and explains the MapReduce programming paradigm, the Hadoop Distributed File System (HDFS), and the layers, architecture, advantages, and disadvantages of Hadoop. An integrated approach to Big Data system architecture is illustrated with an example implementation of a NoSQL database and a Hadoop cluster. NoSQL is a technology for storing very large non-relational databases that support CRUD (Create, Read, Update, and Delete) operations. The results form the basis for a comparison of traditional database systems and large-scale data systems.
Author: Joanna Konopko