Data Mining, Search, and the World Wide Web
http://infolab.stanford.edu/~sergey/349/
CS 349: Data Mining, Search, and the World Wide Web
http://www-db.stanford.edu/~sergey/cs349.html
Tuesdays and Thursdays 4:15 - 5:30 in Bldg 370, Room 370 on the Main Quad
Instructors: Sergey Brin and Lawrence Page
Tues and Thurs 5:30 - 7:00 or by appointment.
sergey@cs.stanford.edu and page@cs.stanford.edu
Course Assistant: Diane Tang
Gates 416: Mon - Wed 11:15 - 12:15 or by appointment.
dtang@cs.stanford.edu
Description
Over the past two years there has been close collaboration between the Data Mining Group (MIDAS) and the Digital Libraries Group at Stanford in the area of Web research. It has culminated in the WebBase project, whose aims are to maintain a local copy of the World Wide Web (or at least a substantial portion of it) and to use it as a research tool for information retrieval, data mining, and other applications. This work has led to the PageRank algorithm, the Google search engine, the DIPRE algorithm, and a number of other results that represent the cutting edge of Web research today (see WebBase Publications).
The topics of this class are data mining and information retrieval in the context of the World Wide Web. First, we will cover background material in data mining and information retrieval that is relevant to the class. Second, we will cover recent advances made at Stanford (PageRank, DIPRE, ...) and elsewhere (Kleinberg, Mitchell, ...). Third, and most important, students will get the opportunity to work hands-on with the WebBase, since this is a project class. We have already modularized a large part of the code so that people can work with it, and we will continue to do so throughout the summer. Several people have already taken advantage of the code. The current WebBase repository consists of roughly 25 million web pages, amounting to 150 GB of HTML.
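PageRank, named above as one outcome of the WebBase work and scheduled for the 11/12 lecture, scores a page by the rank of the pages that link to it. The following is a minimal power-iteration sketch, written in Python for brevity even though the course code is C/C++. The damping factor of 0.85, uniform teleportation, and even redistribution of rank from pages with no out-links are assumptions of this sketch, not details taken from this page; the function name `pagerank` and the dict-of-lists link format are likewise illustrative.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it links to. Assumed, not from
    the course page: damping 0.85, uniform teleportation, and dangling pages
    (no out-links) spreading their rank evenly over all pages.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without out-links is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along each out-link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the L1 change is negligible
            break
    return rank


if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

On this three-page toy graph, C, which receives links from both A and B, ends up with the highest score. The scores always sum to 1, because the teleportation and dangling terms return exactly the rank that the link-following step does not distribute.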
Prerequisites
- A strong knowledge of C.
- Working knowledge of C++.
- Very basic statistics, graph theory and linear algebra.
Very Tentative Syllabus
- Introduction: 1
- 9/24 Introduction:
- 9/29 WebBase 1 (slides)
The Anatomy of a Large-Scale Hypertextual Web Search Engine
- Data Mining: 5
Publications of IBM's QUEST project
- 10/1 Market Basket (slides)
R. Agrawal, T. Imielinski, A. Swami: "Mining Association Rules between Sets of Items in Large Databases", Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993, pp. 207-216. (PDF, abstract)
Dynamic Itemset Counting and Implication Rules for Market Basket Data
by Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur.
We present an algorithm for counting large itemsets faster than previous algorithms. We rely on partial results to guide the mining process.
Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 255-264, Tucson, Arizona, May 13-15, 1997. (html, postscript, gzipped ps, bibtex)
- 10/6 Causality
Scalable Techniques for Mining Causal Structures by C. Silverstein, S. Brin, R. Motwani, and J. Ullman. VLDB '98.
(abstract, postscript)
- 10/8 WebBase 2
- 10/13 Classification and Singular Value Decomposition (slides - html postscript)
SGI's MLC++ Library
- 10/15 Clustering Techniques (slides - html postscript)
Berkeley Clustering Demo
- *** Project Proposals Due ***
- 10/20 Data Mining in the Real World
- Search: 3
- 10/22 Standard IR
- 10/27 New Technologies
- 10/29 Latent Semantic Indexing
Bellcore's LSI site
- 11/3 WebBase 3
- *** Milestone Due ***
- Web: 6
- 11/5 Search Engines 1 - basics, size, evaluation
- 11/10 Search Engines 2 - crawling, robots.txt, ...
- 11/12 PageRank, Kleinberg
- 11/17 DIPRE
- 11/19 DEC Research
- 11/24 Classification of Web Pages
- *** Final Project Due ***
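The Market Basket readings (10/1) are about counting "large" (frequent) itemsets, that is, sets of items that occur together in at least some minimum number of baskets. The sketch below shows the standard level-wise, Apriori-style counting: one pass over the data per itemset size, with candidates pruned when any subset is infrequent. It is a swapped-in baseline, not the Dynamic Itemset Counting algorithm from the reading, which, per its abstract, relies on partial results to guide the mining and speed it up. The function name `frequent_itemsets`, the basket data, and the `min_support` threshold are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Level-wise (Apriori-style) frequent itemset counting (illustrative).

    Returns {frozenset(itemset): count} for every itemset contained in at
    least min_support baskets. This is a baseline, not DIC.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into size-k candidates, keeping only
        # those whose every (k-1)-subset is itself frequent.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One full pass over the baskets counts all size-k candidates.
        counts = {cand: 0 for cand in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    for itemset, count in frequent_itemsets(baskets, min_support=3).items():
        print(sorted(itemset), count)
```

Each loop iteration waits for a complete pass before it starts counting larger itemsets. That full-pass-per-level cost is what a faster counting scheme such as DIC tries to reduce.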
Mailing List
Sergey Brin
Last modified: Sat Oct 24 23:18:37 PDT 1998