Big Data Management and Analysis
CUSP-GX 8083 / ECE-GY 9113
Course Overview
This course introduces the principles, technologies, and challenges of big data management and analysis. You will explore:
- Key technologies: SQL, NoSQL, MapReduce, Spark
- Techniques: Data preprocessing, querying, visualization
- Applications: Machine learning, text analysis, distributed computing
- Hands-on projects with real-world examples
Learning Outcomes
By the end of the course, you will be able to:
- Identify and compare core technologies for managing big data
- Optimize data management for different scenarios
- Apply best practices for extracting insights and visualizing data
- Communicate findings effectively
Critical Questions Addressed
- What are the core technologies for managing big data, and how do they differ?
- How can data management be optimized for various scenarios?
- What are the best practices for extracting insights and telling a compelling story through data visualization?
Prerequisites
- Basic Python knowledge
- Basic data analysis skills (e.g., spreadsheets)
Support
- Ask questions on Slack. The invite link is in this Google Doc.
Office Hours
Danny Y. Huang: Thursdays, 20:30–21:00
- In person: Stay after class to talk to Danny
- On Zoom: Remain on the Zoom call (same link as class)
About Danny
- Ex-Googler who used big data to uncover cybersecurity problems
- Researcher focused on everyday security and privacy issues using big data