Month 10: Big Data and Scalable Machine Learning

Week 1: Introduction to Big Data

  • Day 1: Introduction to Big Data: Challenges and Opportunities
  • Day 2: Big Data Technologies: Hadoop, MapReduce
  • Day 3: Big Data Storage: HDFS, S3
  • Day 4: Big Data Processing: Batch vs. Real-Time
  • Day 5: Big Data Tools: Hive, Pig, Sqoop
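
The MapReduce model named on Day 2 can be illustrated without a cluster. The following is a minimal single-process sketch in Python, not Hadoop's actual API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration. It shows the three conceptual stages: map emits key/value pairs, shuffle groups them by key, and reduce combines each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster, each document (or block) would be mapped on a
    # different node; here everything runs sequentially in one process.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input.
counts = word_count(["big data big ideas", "data at scale"])
print(counts)
```

Because map_phase sees one record at a time and reduce_phase sees one key at a time, both stages could in principle be spread across machines, which is the property the model relies on.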

Week 2: Distributed Computing with Apache Spark

  • Day 1: Introduction to Apache Spark
  • Day 2: Spark Core and Resilient Distributed Datasets (RDDs)
  • Day 3: Spark SQL and DataFrames
  • Day 4: Spark Streaming
  • Day 5: Machine Learning with Spark MLlib

Week 3: Large-Scale Machine Learning

  • Day 1: Introduction to Large-Scale Machine Learning
  • Day 2: Distributed Machine Learning: Data Parallelism, Model Parallelism
  • Day 3: Parameter Servers and Distributed Optimization
  • Day 4: Deep Learning at Scale: Distributed Training
  • Day 5: Large-Scale Machine Learning Libraries: TensorFlow, PyTorch
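
Data parallelism, as covered on Day 2, can be sketched in plain Python under simplifying assumptions: a one-variable linear model, two equally sized data shards standing in for workers, and synchronous gradient averaging. The names local_gradient and data_parallel_step, the synthetic data, and the learning rate are all hypothetical choices for illustration. This is not how TensorFlow or PyTorch implement distributed training.

```python
def local_gradient(w, b, shard):
    """Mean-squared-error gradient for y = w*x + b on one worker's shard."""
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b


def data_parallel_step(w, b, shards, lr):
    """One synchronous step: every worker computes a gradient on its own
    shard, the gradients are averaged, and all replicas apply the same update."""
    # In a real system these calls would run on separate devices or machines.
    grads = [local_gradient(w, b, shard) for shard in shards]
    avg_w = sum(g[0] for g in grads) / len(grads)
    avg_b = sum(g[1] for g in grads) / len(grads)
    return w - lr * avg_w, b - lr * avg_b


# Synthetic data generated from y = 3x + 1, split across two "workers".
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
shards = [data[0:4], data[4:8]]

w, b = 0.0, 0.0
for _ in range(5000):
    w, b = data_parallel_step(w, b, shards, lr=0.01)

print(round(w, 4), round(b, 4))
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the result matches single-machine training. Model parallelism would instead split the parameters themselves across workers.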

Week 4: Stream Processing and Real-Time Analytics

  • Day 1: Introduction to Stream Processing
  • Day 2: Stream Processing Engines: Apache Storm, Apache Flink
  • Day 3: Real-Time Analytics: Windowing, Aggregation, Joins
  • Day 4: Real-Time Machine Learning: Online Learning
  • Day 5: Practical Application: Building a Real-Time Analytics Pipeline
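
The windowing and aggregation topics from Day 3 can be illustrated with a small sketch of a tumbling (fixed-size, non-overlapping) window in Python. The function name tumbling_window_sums, the event layout (timestamp, key, value), and the sample events are assumptions for illustration. The sketch iterates over a finite list and ignores late or out-of-order events, which engines such as Storm or Flink must handle.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Sum values per (window start, key) for fixed, non-overlapping windows."""
    sums = defaultdict(float)
    for timestamp, key, value in events:
        # Every timestamp belongs to exactly one window, identified by
        # the window's start time.
        window_start = (timestamp // window_size) * window_size
        sums[(window_start, key)] += value
    return dict(sums)


# Hypothetical events: (timestamp in seconds, key, value).
events = [
    (1, "a", 2.0),
    (4, "b", 1.0),
    (9, "a", 3.0),
    (12, "a", 5.0),
    (19, "b", 4.0),
]

result = tumbling_window_sums(events, window_size=10)
print(result)
```

Sliding or session windows would assign an event to several windows or to data-dependent windows; the tumbling case is the simplest starting point.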