The book is based on the Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

The book, like the course, is aimed at undergraduate computer science students and has no formal prerequisites. To support deeper exploration, most chapters are supplemented with references for further reading.


  1. Data Mining
  2. Map-Reduce and the New Software Stack
  3. Finding Similar Items
  4. Mining Data Streams
  5. Link Analysis
  6. Frequent Itemsets
  7. Clustering
  8. Advertising on the Web
  9. Recommendation Systems
  10. Mining Social-Network Graphs
  11. Dimensionality Reduction
  12. Large-Scale Machine Learning

