Big Data Analytics - 9781785884696

Name: Big Data Analytics - 9781785884696
Brand: Packt Publishing
SKU: 9781785884696
Price: 61.36 USD
Availability: InStock

Packt Publishing

SKU:

9781785884696

ISBN13:

9781785884696

$61.36

(No reviews yet)

Condition:

New

A handy reference guide for data analysts and data scientists to fetch "Value" out of big data analytics using Spark on Hadoop ClustersAbout This Book* Practical tutorial with real-world examples that explores Spark on Hadoop clusters* This book is based on the latest version of Apache Spark and Hadoop integrated with the most commonly used tools* Learn about all the Spark stack components including the latest topics such as DataFrames, DataSets, and SparkRWho This Book Is ForThough this book is primarily aimed at data analysts and data scientists, it will also help architects, programmers, and practitioners. Knowledge of either Spark or Hadoop would be beneficial. It is assumed that you have basic programming background in Scala, Python, SQL, or R programming with basic Linux experience. Working experience within big data environments is not mandatory.What You Will Learn* Find out about and implement the tools and techniques of big data analytics using Spark on Hadoop clusters* Understand all the Hadoop and Spark ecosystem components and how Spark replaced MapReduce* Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, DataSets, Streaming, MLLib, and Graphx* See batch and real-time data analytics using Spark Core, Spark SQL, and Spark Streaming* Get to grips with data science and machine learning using MLLib, H2O, Hivemall, Graphx, and SparkR* Get an introduction to all the new tools (based on Notebooks, Data Flow, and Spark as a Service) and their integrations with Spark and HadoopIn DetailThis book explains the fundamentals of Apache Spark and Hadoop, and how they are easily integrated together with the most commonly used tools and techniques. All the Spark components-Spark Core, Spark SQL, DataFrames, Data sets, Streaming, MLlib, Graphx, and Hadoop core components-HDFS, MapReduce, and Yarn are explored in greater depth with implementation examples on Spark and Hadoop clusters.The big data analytics industry is moving away from MapReduce to Spark. In this book, the advantages of Spark over MapReduce are explained at great depth so you can reap the benefits of in-memory speeds. The DataFrames API, Data Sources API, and new Data sets API are explained so you can build big data analytical applications.We'll explore real-time data analytics using Spark Streaming with Apache Kafka and HBase to help you build streaming applications. You'll get to know the machine learning techniques using MLLib and SparkR, and Graph Analytics with the GraphX component of Spark.You will also get the opportunity to start working with web-based notebooks such as Jupyter, Apache Zeppelin, and the data flow tool Apache NiFi to analyze and visualize data.

| Author: Venkat Ankam
| Publisher: Packt Publishing
| Publication Date: Sep 29, 2016
| Number of Pages: 326 pages
| Language: English
| Binding: Paperback
| ISBN-10: 1785884697
| ISBN-13: 9781785884696