Overview
This Data Engineering on Google Cloud Platform training course teaches attendees how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning.
Objectives
At the end of this Google Data Engineer training course, participants will be able to
Prerequisites
- Basic proficiency with a common query language such as SQL
- Experience with data modeling and with extract, transform, load (ETL) activities
- Experience developing applications using a common programming language such as Python
- Familiarity with Machine Learning and/or statistics
Course Outline
- Creating and managing clusters
- Leveraging custom machine types and preemptible worker nodes
- Scaling and deleting clusters
- Running Pig and Hive jobs
- Separation of storage and compute
- Customizing clusters with initialization actions
- BigQuery support
- Google’s Machine Learning APIs
- Common ML Use Cases
- Invoking ML APIs
- Serverless Data Analysis with Google BigQuery and Cloud Dataflow
- What is BigQuery?
- Queries and Functions
- Loading data into BigQuery
- Exporting data from BigQuery
- Nested and repeated fields
- Querying multiple tables
- Performance and pricing
- The Beam programming model
- Data pipelines in Beam Python
- Data pipelines in Beam Java
- Scalable Big Data processing using Beam
- Incorporating additional data
- Handling stream data
- GCP Reference architecture
- Serverless Machine Learning with TensorFlow on Google Cloud Platform
- What is machine learning (ML)?
- Effective ML: concepts, types
- ML datasets: generalization
- Getting started with TensorFlow
- TensorFlow graphs and loops + lab
- Monitoring ML training
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
- Building Resilient Streaming Systems on Google Cloud Platform
- Stream data processing: Challenges
- Handling variable data volumes
- Dealing with unordered/late data
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions
- Challenges in stream processing
- Handling late data: watermarks, triggers, accumulation
- Streaming analytics: from data to decisions
- Querying streaming data with BigQuery
- What is Google Data Studio?
- What is Cloud Spanner?
- Designing Bigtable schema
- Ingesting into Bigtable