Data Engineering Skills - Hadoop Shell: A Comprehensive Guide to Hadoop FS Commands

Name: Data Engineering Skills - Hadoop Shell: A Comprehensive Guide to Hadoop FS Commands
Brand: CreateSpace Independent Publishing Platform
SKU: 9781717577511
Price: 39.06 USD
Availability: InStock

List Price: 39.99 USD

Condition: New

Hadoop is one of the most widely adopted frameworks for the distributed storage and processing of very large datasets. It started as part of Apache Nutch, an open-source web search engine project, and was split out into a separate subproject called Hadoop in 2006. Doug Cutting, one of Hadoop's founders, named it after his son's toy elephant, which his son called "Hadoop." The idea behind Hadoop originated in papers Google published describing how it built its applications around distributed storage and processing: "The Google File System" (2003) and "MapReduce: Simplified Data Processing on Large Clusters" (2004). Doug Cutting and Mike Cafarella took the same concepts and generalized them so they would fit the use cases of many other companies around the globe.

Hadoop is known for two core components: distributed storage, provided by the Hadoop Distributed File System (HDFS), and distributed processing, provided by MapReduce. MapReduce made it possible to process distributed datasets by running code where the data resides, a major paradigm shift compared with earlier generations of processing engines. Previously, data had to be transferred to the machines where the code resided before it could be processed and results generated. Because data is usually much larger than the code that processes it, moving the data often took more time than the processing itself. Hadoop takes the opposite approach: data rarely moves between machines. Instead, the code binaries are sent to the machines where the data resides, run locally there, and return their results. This substantially reduces data-transfer time and lets many processes run in parallel on the same data across a distributed network of machines.
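The "move the code to the data" idea described above can be illustrated with a toy, single-process Python sketch. It is not Hadoop code and uses no Hadoop API: the `cluster` dictionary is a hypothetical stand-in for HDFS blocks spread across nodes, and `map_words`, `run_locally`, and `reduce_counts` are illustrative names. The map function is "sent" to each node and runs only against that node's own blocks; only the small per-node word counts travel back to be merged in a reduce step.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical cluster layout: node name -> text blocks stored on that node.
# This is a toy stand-in for HDFS blocks, not a real Hadoop data structure.
Cluster = Dict[str, List[str]]


def map_words(block: str) -> Counter:
    """Map step: count the words in one block, on the node that stores it."""
    return Counter(block.split())


def run_locally(cluster: Cluster, map_fn: Callable[[str], Counter]) -> Dict[str, Counter]:
    """'Ship' map_fn to every node and run it against that node's own blocks.

    The raw blocks never leave their node; only the per-node Counter
    (usually much smaller than the data) is returned to the caller.
    """
    results: Dict[str, Counter] = {}
    for node, blocks in cluster.items():
        partial: Counter = Counter()
        for block in blocks:
            partial += map_fn(block)
        results[node] = partial
    return results


def reduce_counts(partials: Dict[str, Counter]) -> Counter:
    """Reduce step: merge the per-node partial counts into one final result."""
    total: Counter = Counter()
    for partial in partials.values():
        total += partial
    return total


if __name__ == "__main__":
    cluster = {
        "node1": ["big data big ideas", "data moves slowly"],
        "node2": ["code moves fast", "big code"],
    }
    partials = run_locally(cluster, map_words)
    print(dict(reduce_counts(partials)))
```

In real Hadoop, the framework's scheduler decides where map tasks run and shuffles intermediate results between nodes; this sketch collapses all of that into plain function calls so that only the data-locality idea remains visible.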
I decided to write this book as the first in a series of books I plan to publish on various big data technologies. The goal of this book is to help data engineers build a solid foundation in Hadoop before moving on to higher-level technologies such as Spark and Hive. The book is designed to be hands-on rather than purely theoretical. I will first explain the Hadoop framework and how it works behind the scenes, and then we will shift our focus specifically to the Hadoop shell. Hadoop comes with a built-in shell that is inspired by the Linux shell and shares many of its concepts. To make learning more engaging, I have grouped important shell commands around realistic problems they can be used to solve. These problems are inspired by real scenarios I encountered during several years of working as a big data specialist.

| Author: Neeraj Malhotra
| Publisher: CreateSpace Independent Publishing Platform
| Publication Date: Apr 27, 2018
| Number of Pages: 136 pages
| Language: English
| Binding: Paperback
| ISBN-10: 1717577512
| ISBN-13: 9781717577511