Storing data in HDFS

  • Achieving reliable and secure storage
  • Monitoring storage metrics
  • Controlling HDFS from the command line

Parallel processing with MapReduce

  • Detailing the MapReduce approach
  • Transferring algorithms, not data
  • Dissecting the key stages of a MapReduce job

Automating data transfer

  • Facilitating data ingress and egress
  • Aggregating data with Flume
  • Configuring data fan-in and fan-out
  • Moving relational data with Sqoop