Month 10: Big Data and Scalable Machine Learning
Week 1: Introduction to Big Data
- Day 1: Introduction to Big Data: Challenges and Opportunities
- Day 2: Big Data Technologies: Hadoop, MapReduce
- Day 3: Big Data Storage: HDFS, S3
- Day 4: Big Data Processing: Batch vs. Real-Time
- Day 5: Big Data Tools: Hive, Pig, Sqoop
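To make the Day 2 MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job. It only simulates the map, shuffle, and reduce phases in plain Python. It does not use the Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key. In Hadoop the framework
    # does this between the map and reduce stages, across machines.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine every value collected for a single key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

On a real cluster, each map and reduce call runs as an independent task on a different node. That independence is what lets the job scale with the input size.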
Week 2: Distributed Computing with Apache Spark
- Day 1: Introduction to Apache Spark
- Day 2: Spark Core and Resilient Distributed Datasets (RDDs)
- Day 3: Spark SQL and DataFrames
- Day 4: Spark Streaming and Structured Streaming
- Day 5: Machine Learning with Spark MLlib
Week 3: Large-Scale Machine Learning
- Day 1: Introduction to Large-Scale Machine Learning
- Day 2: Distributed Machine Learning: Data Parallelism vs. Model Parallelism
- Day 3: Parameter Servers and Distributed Optimization
- Day 4: Deep Learning at Scale: Distributed Training
- Day 5: Large-Scale Machine Learning Libraries: TensorFlow, PyTorch
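The data-parallel idea from Day 2 and the gradient aggregation behind Days 3 and 4 can be sketched without any framework. The sketch assumes a toy linear-regression model and simulates the workers sequentially. Each simulated worker holds a full copy of the parameters and computes a gradient on its own data shard. The gradients are then averaged, a step that a parameter server or an all-reduce would perform in a real system, so that every replica applies the same update. All names here are hypothetical and are not TensorFlow or PyTorch APIs.

```python
import random


def shard(data, num_workers):
    # Split the dataset into disjoint, roughly equal shards, one per worker.
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, b, batch):
    # Mean-squared-error gradient for y ≈ w*x + b, computed on one worker's shard.
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def data_parallel_sgd(data, num_workers=4, lr=0.1, steps=500):
    shards = shard(data, num_workers)
    w, b = 0.0, 0.0  # every worker starts from an identical replica of the parameters
    for _ in range(steps):
        # In a real cluster these gradients are computed in parallel on separate devices.
        grads = [local_gradient(w, b, s) for s in shards]
        # Aggregation step (parameter server or all-reduce): average per-worker gradients.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-1, 1) for _ in range(200)]
    data = [(x, 2.0 * x + 1.0) for x in xs]  # noiseless samples from y = 2x + 1
    print(data_parallel_sgd(data))
```

When the shards are equal in size, averaging the shard gradients gives exactly the full-batch gradient. This is why plain data parallelism changes where the computation happens but not the result of each step. Model parallelism instead splits the parameters themselves across devices.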
Week 4: Stream Processing and Real-Time Analytics
- Day 1: Introduction to Stream Processing
- Day 2: Stream Processing Engines: Apache Storm, Apache Flink
- Day 3: Real-Time Analytics: Windowing, Aggregations, and Joins
- Day 4: Real-Time Machine Learning: Online Learning
- Day 5: Practical Application: Building a Real-Time Analytics Pipeline
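The windowing and aggregation topics from Day 3 can be illustrated with a small tumbling-window counter. A tumbling window is a fixed-size, non-overlapping time bucket. This is a minimal sketch that assumes events arrive as (timestamp, key) pairs in timestamp order, with integer timestamps in seconds. It emits a window's per-key counts as soon as the first event of the next window arrives. Engines such as Apache Flink also handle out-of-order and late events using watermarks, which this sketch deliberately omits. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_size: int
) -> Iterator[tuple[int, dict[str, int]]]:
    # Assumes events are sorted by timestamp; each window covers [start, start + window_size).
    current_start = None
    counts: dict[str, int] = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_size  # align the timestamp to its window's start
        if current_start is not None and start != current_start:
            # A new window has begun, so the previous one is complete: emit it.
            yield current_start, dict(counts)
            counts = defaultdict(int)
        current_start = start
        counts[key] += 1
    if current_start is not None:
        # Flush the final, still-open window when the input ends.
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(1, "click"), (4, "view"), (9, "click"), (12, "click"), (15, "view"), (21, "click")]
    for window_start, window_counts in tumbling_window_counts(stream, window_size=10):
        print(window_start, window_counts)
```

Windows that contain no events produce no output here. A production pipeline might emit explicit zero counts instead, depending on what downstream consumers expect.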