Overview
The Google Cloud Data Engineer certification training prepares learners to design, build, maintain, and troubleshoot data processing systems on Google Cloud. The course covers data management, data processing, and machine learning on Google Cloud’s data infrastructure, so that participants can use Google Cloud services to make data-driven decisions.
Objectives
At the end of the Google Data Engineer training course, participants will be able to design, build, maintain, and troubleshoot data processing systems on Google Cloud.
Prerequisites
- Basic knowledge of SQL and data modeling.
- Familiarity with general cloud computing concepts.
- Experience with data warehousing or data pipelines.
- Fundamental understanding of programming (Python or Java recommended).
- Interest in using data to solve business problems and drive decisions.
Course Outline
- Creating and managing clusters
- Leveraging custom machine types and preemptible worker nodes
- Scaling and deleting clusters
- Running Pig and Hive jobs
- Separation of storage and compute
- Customizing clusters with initialization actions
- BigQuery support
- Google’s Machine Learning APIs
- Common ML Use Cases
- Invoking ML APIs
- Serverless Data Analysis with Google BigQuery and Cloud Dataflow
- What is BigQuery?
- Queries and Functions
- Loading data into BigQuery
- Exporting data from BigQuery
- Nested and repeated fields
- Querying multiple tables
- Performance and pricing
- The Beam programming model
- Data pipelines in Beam Python
- Data pipelines in Beam Java
- Scalable Big Data processing using Beam
- Incorporating additional data
- Handling stream data
- GCP Reference architecture
- Serverless Machine Learning with TensorFlow on Google Cloud Platform
- What is machine learning (ML)?
- Effective ML: concepts, types
- ML datasets: generalization
- Getting started with TensorFlow
- TensorFlow graphs and loops (with lab)
- Monitoring ML training
- Why Cloud ML?
- Packaging up a TensorFlow model
- End-to-end training
- Creating good features
- Transforming inputs
- Synthetic features
- Preprocessing with Cloud ML
- Building Resilient Streaming Systems on Google Cloud Platform
- Stream data processing: Challenges
- Handling variable data volumes
- Dealing with unordered/late data
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions
- Challenges in stream processing
- Handling late data: watermarks, triggers, accumulation
- Streaming analytics: from data to decisions
- Querying streaming data with BigQuery
- What is Google Data Studio?
- What is Cloud Spanner?
- Designing Bigtable schema
- Ingesting into Bigtable