Overview
The Data Science and Machine Learning course will help you master data science and analytics using a range of machine learning techniques, gain a deep understanding of data manipulation in R, and get an introduction to the Hadoop architecture.
Objectives
By the end of the Machine Learning with Data Science training course, participants will be able to
Prerequisites
- A background in Java is required
- This machine learning and data science course is appropriate for developers who wish to write, maintain, and/or optimize Java code using the Hadoop framework
- Hands-on experience writing Java programs in the Eclipse IDE would be a plus
Course Outline
- Introduction
- Understanding Big Data
- Understand how different companies use big data for their business needs
- Big Data Challenges
- Introduction to Data Science
- Types of Data Scientists
- Data Science Components
- Data Science Use Cases
- Introduction to R and Hadoop
- R and Hadoop Integration
- Machine Learning with Mahout
- HDFS: Hadoop Distributed File System
- Assumptions and Goals
- CAP principle
- Anatomy of a Hadoop Cluster
- Anatomy of a File Write
- Anatomy of a File Read
- MapReduce Framework Architecture
- Hadoop Processes
- Understanding Various Configuration Properties of Hadoop
- Introduction to R
- Describe why R is used
- Implement R programming concepts
- Learn Data Import techniques
- Analyze the processing of the Data
- Observation and Experiments
- Sampling Methods
- Quantitative Variables
- Skewness, Modality and Measures of Center
- Variance, Standard Deviation, Interquartile Range
- Probability Rules
- Disjoint and Non-Disjoint Events, Independence
- Conditional Probability
- Probability Distributions
- Understand Machine Learning
- Use Cases Walkthrough
- Machine Learning Techniques
- Describe Clustering
- Analyze Clustering Scenarios using Clustering Algorithms
- Learn TF-IDF and Cosine Similarity
- Understand Supervised Learning Techniques
- Classification
- Recommendation
- Learn Decision Tree Classifier
- Learn how various Decision Tree algorithms work
- Apply these techniques to smaller datasets in R for better understanding
- Understand Unsupervised Learning Techniques
- Understand the implementation of Random Forest Classifier
- Understand the implementation of the Naïve Bayes Classifier
- Apply both techniques to smaller datasets using R
- Understand Association Rule Mining
- Understand the need for R integration with Hadoop
- Learn the ways to integrate R and Hadoop
- Understand the usage of RHadoop package
- Integrate R with Hadoop and run MapReduce examples
- Understand Mahout
- Gain insight into implementing Machine Learning with Mahout
- Understand Learning, Classification and Clustering techniques with Mahout
- Implement Recommendation technique and Frequent Pattern Mining in Mahout
- Understand Mahout Algorithms and Parallel Processing
- Learn Advanced techniques in R
- Implement Parallel Random Forest
- Understand Data Visualization
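The descriptive-statistics topics in the outline (skewness, measures of center, variance, standard deviation, interquartile range) can be illustrated with a short sketch. The course itself works in R; this example uses Python's standard `statistics` module purely for illustration, and the function name `describe` and the sample data are hypothetical. Skewness here is the simple moment-based coefficient; some tools apply a small-sample adjustment, so their values can differ slightly.

```python
import statistics


def describe(values):
    """Return sample summary statistics for a list of numbers."""
    data = sorted(values)
    n = len(data)
    mean = statistics.mean(data)
    # Sample variance and standard deviation divide by n - 1.
    variance = statistics.variance(data)
    sd = statistics.stdev(data)
    # "inclusive" quartiles use the same linear interpolation as R's default quantile().
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Moment-based skewness: third central moment over the second raised to 1.5.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skewness = m3 / m2 ** 1.5
    return {
        "mean": mean,
        "median": statistics.median(data),
        "variance": variance,
        "sd": sd,
        "iqr": q3 - q1,
        "skewness": skewness,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

For this sample the mean (5) exceeds the median (4.5) and the skewness is positive (0.65625), which is consistent with a right-skewed distribution; the interquartile range is 1.5.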
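The TF-IDF and cosine similarity topic in the outline can be sketched in a few lines. This is a minimal Python illustration (the course uses R), assuming raw term counts for TF and idf(t) = log(N / df(t)); smoothed or log-scaled variants are also common. The helper names `tf_idf` and `cosine_similarity` and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: raw term counts for TF and idf(t) = log(N / df(t)).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df[t] = number of documents containing term t at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document whose terms all have zero weight
    return dot / (norm_a * norm_b)


docs = [
    "hadoop stores big data",
    "hadoop processes big data",
    "r analyzes data",
]
vectors = tf_idf(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

The term "data" appears in every document, so its IDF is log(1) = 0 and it contributes nothing. The first and third documents share only that term, so their similarity is 0.0, while the first two share "hadoop" and "big" and score higher.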
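The Naïve Bayes classifier topic builds on the conditional-probability material in the outline: it picks the class c that maximizes P(c) multiplied by the product of P(x_i | c) over all features, assuming the features are conditionally independent given the class. Below is a minimal categorical sketch in Python (the course uses R), working in log space to avoid underflow and using Laplace smoothing so unseen feature values do not zero out a class. The function names, the smoothing constant `alpha`, and the toy weather data are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # counts[label][i][value]: how often feature i took `value` within `label`.
    counts = defaultdict(lambda: defaultdict(Counter))
    # values[i]: the distinct values seen for feature i (used for smoothing).
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, counts, values, alpha


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(feature i = value | class).
            numerator = counts[label][i][value] + alpha
            denominator = n_label + alpha * len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> whether a game was played.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "mild")))  # yes
print(predict(model, ("sunny", "hot")))   # no
```

Working with sums of logarithms instead of products of probabilities gives the same ranking of classes while staying numerically stable when there are many features.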