Big Data Programming

To stay competitive, a business needs to know as much as it can about people, the environment it operates in, and who and where its competitors are. The amount of data companies collect keeps growing, and they urgently need a strategy for making sense of it all. Star Big Data Programming is a certification course that helps learners master the skills they need to establish a successful career as a data engineer. The program covers HDFS, MapReduce, HBase, Hive, Pig, YARN, Oozie, Flume, and Sqoop, using real-world use cases from the retail, social media, aviation, tourism, and finance industries. It equips learners with in-depth knowledge of writing code with the MapReduce framework and managing large data sets with HBase.

Audience

  • Intermediate

Big Data Programming Course Objectives

In this course, you will learn about:

  • Big data and its business applications
  • Apache Hadoop and its big data ecosystem
  • Deploying Hadoop in a clustered environment
  • Interacting with NoSQL databases
  • Managing key Hadoop components (HDFS, YARN, and Hive)
  • Spark - the next-generation computational framework
  • Installing and working with Hadoop
  • Hadoop-related technologies – Avro, Flume, Sqoop, Pig, Oozie, and others
  • Advanced topics such as Hadoop security, Cloudera, IBM InfoSphere, and more

Course Outcome

After completing this course, you will be able to:

  • Understand the finer nuances of Big Data technology
  • Work with Big Data tools, platforms, and their architectures to store, program, process, and manage data
  • Deploy Hadoop and its related technologies
  • Use the Hadoop ecosystem to manage your data
  • Apply machine learning concepts with Mahout

Table of Contents

  • Introducing Data and Big Data
  • Identifying the Business Applications of Big Data
  • Big Data and Hadoop
  • HDFS - Storing Data in Hadoop
  • Introduction to MapReduce
  • YARN and MapReduce - Processing Data in Hadoop
  • Developing a First Application for MapReduce
  • Exploring the Working of a MapReduce Process
  • Avro
  • Parquet
  • Flume - Service for Streaming Event Data
  • Sqoop (MySQL to Hadoop)
  • Apache Pig
  • Hive – Data Warehouse
  • Oozie – Workflow Scheduler
  • Exploring Crunch - Joining and Data Integration
  • Exploring Spark and Scala
  • Exploring HBase - Big Data Store
  • ZooKeeper - Coordination Service for Distributed Applications
  • Exploring Storm
  • Machine Learning with Mahout
  • Interacting with NoSQL Databases
  • Hadoop and Security
  • Apache Drill and Google BigQuery
  • Exploring Cloudera
  • Exploring Hortonworks
  • HDInsight
  • IBM InfoSphere
  • Hadoop and AWS
  • Appendix - Exploring Pivotal HD Case Studies

Labs

  • Chapter 1. Setting up the required environment for Apache Hadoop installation
  • Chapter 2. Installing the single-node Hadoop configuration on the system (see the configuration sketch after this list)
  • Chapter 3. Exploring the web-based user interface of the Hadoop cluster
  • Chapter 4. Implementing a MapReduce program for word count (see the Java sketch after this list)
  • Chapter 5. Implementing a basic Pig Latin script (see the Pig sketch after this list)
  • Chapter 6. Implementing basic Hive Query Language operations (see the HiveQL sketch after this list)
  • Chapter 7. Using Apache Flume to fetch publicly available tweets from Twitter
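
For Lab 2, a pseudo-distributed (single-node) Hadoop installation is usually configured through two small XML files before the NameNode is formatted (hdfs namenode -format) and the daemons are started. The sketch below shows a common minimal setup; the localhost address, port, and file contents are typical defaults assumed for illustration, not taken from the course material.

    <!-- core-site.xml: point clients at the local NameNode.
         hdfs://localhost:9000 is a common default, assumed here. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single node cannot host multiple block replicas,
         so the replication factor is lowered from the default of 3 to 1. -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>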
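
Lab 4's word count is the canonical MapReduce exercise: the mapper tokenizes each input line and emits (word, 1) pairs, and the reducer sums the counts per word. Below is a minimal Java sketch against the standard org.apache.hadoop.mapreduce API; the class names and the reuse of the reducer as a combiner are conventional choices, and the input and output HDFS paths are passed as command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emit (word, 1) for every token in each input line.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer (also used as combiner): sum the counts for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }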
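
For Lab 5, a basic Pig Latin script typically follows a load-transform-dump pattern. In the sketch below, the input file, field names, and delimiter are illustrative assumptions rather than course data:

    -- Load a comma-delimited file with a declared schema (assumed layout).
    sales = LOAD 'sales.csv' USING PigStorage(',')
            AS (store:chararray, product:chararray, amount:double);

    -- Group rows by store and total the sales amount per group.
    by_store = GROUP sales BY store;
    totals = FOREACH by_store GENERATE group AS store, SUM(sales.amount) AS total;

    -- Sort by the aggregate and print the result to the console.
    sorted = ORDER totals BY total DESC;
    DUMP sorted;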
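
For Lab 6, the basic Hive Query Language operations are creating a table, loading data into it, and querying it with SQL-like syntax. The table name, columns, and file path below are assumed for illustration:

    -- Define a table over comma-delimited text files (assumed schema).
    CREATE TABLE IF NOT EXISTS sales (
      store   STRING,
      product STRING,
      amount  DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Load a local file into the table's warehouse directory.
    LOAD DATA LOCAL INPATH '/tmp/sales.csv' INTO TABLE sales;

    -- Aggregate: total sales per store, largest first.
    SELECT store, SUM(amount) AS total
    FROM sales
    GROUP BY store
    ORDER BY total DESC;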