Information Retrieval Resources

A Guide to Information Retrieval
Organized by Hongfei Yan
Last updated on Sept. 16, 2009
---------------------
Contents

Books
+ Finding Out About: Search Engine Technology from a Cognitive Perspective (Belew, R.K., 2000)
  http://www-cse.ucsd.edu/~rik/foa/
+ Foundations of Statistical Natural Language Processing (C. Manning and H. Schutze, 1999)
+ Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979) (full text)
  http://www.dcs.gla.ac.uk/Keith/Preface.html
+ Information Retrieval: A Survey (Ed Greengrass, 2000)
  http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf
+ Information Retrieval: Data Structures & Algorithms (Frakes, W. and Baeza-Yates, R., 1992)
  http://www.dcc.uchile.cl/~rbaeza/iradsbook/irbook.html
+ Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
  http://www.db.dk/pi/iri/
+ Introduction to Information Retrieval (Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, 2008)
  http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
+ Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition (Ian H. Witten, Alistair Moffat, and Timothy Bell, 1999)
+ Mining the Web: Discovering Knowledge from Hypertext Data (Soumen Chakrabarti, 2003)
+ Modeling the Internet and the Web: Probabilistic Methods and Algorithms (Pierre Baldi, Paolo Frasconi and Padhraic Smyth, 2003)
+ Modern Information Retrieval (Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2000)
+ Readings in Information Retrieval (Sparck-Jones, K. and Willett, P., 1997)
+ Search Engines: Information Retrieval in Practice (B. Croft, D. Metzler, T. Strohman, 2009)
  http://www.pearsonhighered.com/croft1epreview/samples.html
+ Search Engine: Principle, Technology and Systems (搜索引擎-原理、技术与系统) (Xiaoming Li, et al., 2005) (full text)
  http://sewm.pku.edu.cn/book/dlbook.html
+ The Geometry of Information Retrieval (C.J.
van Rijsbergen, 2004)
  http://ir.dcs.gla.ac.uk/GeometryOfIR/
+ The Turn: Integration of Information Seeking and Retrieval in Context (Ingwersen, P., and Jarvelin, K., 2005)
+ TREC: Experiment and Evaluation in Information Retrieval (Voorhees, E.M., and Harman, D.K., 2005)
  http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10667

Conferences and Workshops
+ CIKM: Conference on Information and Knowledge Management
  http://www.csee.umbc.edu/cikm/
+ SIGIR: Special Interest Group on Information Retrieval
  http://www.sigir.org/
+ SIGKDD: Knowledge Discovery and Data Mining
  http://www.kdd.org/
+ World Wide Web
  http://www.iw3c2.org/
+ SEWM: Symposium of Search Engine and Web Mining (全国搜索引擎和网上信息挖掘学术研讨会)
  http://net.pku.edu.cn/~sewm/

Courses
+ CMU Information Retrieval (Spring 2006)
  http://nyc.lti.cs.cmu.edu/classes/11-741/
  Instructors: Jamie Callan and Yiming Yang
+ Cornell University The Structure of Information Networks (Spring 2006)
  http://www.cs.cornell.edu/courses/cs685/2006sp/
  Instructor: Jon Kleinberg
+ Peking University Web Based Information Architectures (Fall 2006)
  http://net.pku.edu.cn/~wbia/
  Instructors: Xiaoming Li, Jimin Wang and Bo Peng
+ Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
  http://www.stanford.edu/class/cs276/
  Instructors: Christopher Manning and Prabhakar Raghavan
+ UIUC Introduction to Text Information Systems (Spring 2007)
  http://sifaka.cs.uiuc.edu/course/410s07/
  Instructor: ChengXiang Zhai
+ UMass Information Retrieval course (Spring 2005)
  http://ciir.cs.umass.edu/cmpsci646/
  Instructor: James Allan
+ Washington Univ.
Search Engines course
  http://courses.washington.edu/lis544/

Evaluation Resources
+ CLEF: Cross-Language Evaluation Forum
  http://clef.iei.pi.cnr.it/
+ CWIRF: Chinese Web Information Retrieval Forum
  http://www.cwirf.org/
+ DUC: Document Understanding Conferences
  http://duc.nist.gov/
+ INEX: INitiative for the Evaluation of XML Retrieval
  http://inex.is.informatik.uni-duisburg.de/
+ NTCIR: NII-NACSIS Test Collection for IR Systems
  http://research.nii.ac.jp/ntcir/
+ TREC: Text REtrieval Conference
  http://trec.nist.gov/

Journals
+ Briefings in Bioinformatics (full text)
  http://bib.oxfordjournals.org/archive/
+ Computational Linguistics, The MIT Press
  http://mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=10
+ Data & Knowledge Engineering (DKE), Elsevier
  http://www.elsevier.com/wps/find/journaldescription.cws_home/505608/description?navopenmenu=-2
+ D-Lib Magazine
  http://www.dlib.org/
+ Information Processing Letters, Elsevier
  http://www.elsevier.com/locate/issn/00200190
+ Information Processing and Management (IP&M), Elsevier
  http://www.elsevier.com/locate/infoproman
+ Information Retrieval, Springer
  http://www.springer.com/sgw/cda/frontpage/0,11855,3-0-70-35744790-detailsPage%253Djournal%257Cdescription%257Cdescription,00.html
+ Information Research
  http://informationr.net/ir
+ International Journal on Digital Libraries, Springer
  http://link.springer.de/link/service/journals/00799/index.htm
+ International Journal of Cooperative Information Systems (IJCIS), World Scientific
  http://ejournals.wspc.com.sg/ijcis/ijcis.shtml
+ International Journal on Document Analysis and Recognition, Springer
  http://link.springer.de/link/service/journals/10032/index.htm
+ International Journal of Intelligent Systems, Wiley
  http://www3.interscience.wiley.com/cgi-bin/jhome/36062
+ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
  http://ejournals.wspc.com.sg/ijufks/ijufks.shtml
+ Journal of the American Society for Information Science and Technology (JASIST),
Wiley
  http://www3.interscience.wiley.com/cgi-bin/jhome/76501873
+ Journal of Documentation (JDoc), Emerald
  http://www.emeraldinsight.com/0022-0418.htm
+ Journal of Intelligent Information Systems (JIIS), Springer
  http://www.wkap.nl/journalhome.htm/0925-9902
+ Knowledge and Information Systems (KAIS), Springer
  http://link.springer.de/link/service/journals/10115/index.htm
+ Natural Language Engineering, Cambridge University Press
  http://www.cambridge.org/journals/journal_catalogue.asp?mnemonic=NLE
+ Transactions on Information Systems (TOIS), ACM
  http://www.acm.org/tois/
+ Transactions on Knowledge and Data Engineering (TKDE), IEEE
  http://www.computer.org/tkde/

List Archives
+ SIG-IRList, http://www.sigir.org/sigirlist/index.html

Organizations and Special Interest Groups
+ Cambridge NLIP, http://www.cl.cam.ac.uk/Research/NL/
+ CMU LTI, http://www.lti.cs.cmu.edu/
+ DEC laboratories in Palo Alto, Calif.
+ Glasgow Information Retrieval Group, http://www.dcs.gla.ac.uk/ir/
+ Google Labs, http://labs.google.com/
+ Massachusetts CIIR, http://ciir.cs.umass.edu/
+ MSR Asia, Web Search & Data Mining Group, http://research.microsoft.com/wsm/
+ Stanford InfoLab, http://infolab.stanford.edu/
+ UIUC Information Retrieval Group, http://sifaka.cs.uiuc.edu/ir/
+ PKU Tianwang Group (北大天网组), http://sewm.pku.edu.cn/
+ Institute of Computational Linguistics, Peking University (北京大学计算语言学研究所), http://icl.pku.edu.cn/
+ Fudan University Information Retrieval and Natural Language Processing Group (复旦大学信息检索和自然语言处理组), http://www.cs.fudan.edu.cn/mcwil/irnlp/
+ HIT Information Retrieval Group (哈工大信息检索组), http://ir.hit.edu.cn/
+ State Key Laboratory of Intelligent Technology and Systems, Tsinghua University (清华大学智能技术与系统国家重点实验室), http://www.csai.tsinghua.edu.cn/
+ CAS Large-Scale Content Computing Group (中科院大规模内容计算组), http://159.226.40.18/ (could not be reached)

Researchers
+ Andrew McCallum
  http://www.cs.umass.edu/~mccallum/
+ ChengXiang Zhai, developing Lemur
  http://www-faculty.cs.uiuc.edu/~czhai/
+ Gerard Salton
  http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html
+ Karen Sparck Jones, developing IDF
  http://www.cl.cam.ac.uk/users/ksj/
+ Keith van Rijsbergen
  http://www.dcs.gla.ac.uk/~keith/
+ Jamie Callan
  http://www.cs.cmu.edu/~callan/
+ Jon Kleinberg, developing HITS
  http://www.cs.cornell.edu/home/kleinber/
+ Li
Xiaoming, developing Tianwang & Infomall
+ Nick Craswell, developing Terabyte Track
  http://research.microsoft.com/~nickcr
+ Susan Dumais, developing LSI
  http://research.microsoft.com/~sdumais/
+ Yiming Yang, developing text categorization
  http://www.cs.cmu.edu/~yiming/
+ Stephen Robertson
  http://research.microsoft.com/users/robertson/
+ Tefko Saracevic
  http://www.scils.rutgers.edu/~tefko/
+ W. Bruce Croft
  http://ciir.cs.umass.edu/personnel/croft.html

Research-related Resources
+ http://www-faculty.cs.uiuc.edu/~czhai/research.html

Software
+ Apache Lucene: a full-featured text search engine library
  http://lucene.apache.org/java/docs/index.html
+ Gate: a general architecture for text engineering
  http://gate.ac.uk/
+ Lemur: a full-text search engine
  http://www.lemurproject.org/
+ MG: a full-text search engine
  http://www.math.utah.edu/pub/mg/
+ Porter Stemmer: English stemming algorithm
  http://www.tartarus.org/martin/PorterStemmer/
+ Nutch: an open source web search engine
  http://sourceforge.net/projects/nutch/
+ TSE: a Tiny Search Engine
  http://sewm.pku.edu.cn/src/TSE/
---------------------
References:
[1] Information Retrieval Resources, http://www.sigir.org/resources.html
[2] http://ir.dcs.gla.ac.uk/resources.html
[3] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[4] Diekemar, Information Retrieval Links, Jan. 28, 1999. http://web.syr.edu/~diekemar/ir.html
[5] Chen Hongbiao (陈鸿标), Studying Information Retrieval Online (网上研习信息检索), Nov. 1999.
http://159.226.40.18/freshman/resources/网上研习信息检索.doc
[6] Data Mining Research Institute (数据挖掘研究院), http://www.dmresearch.net/
[7] Speech and Natural Language Online (语音自然语言在线), http://www.snlpinfo.com/index.php
[8] PKU SEWM Group, http://sewm.pku.edu.cn/
[9] http://www.cs.cmu.edu/~callan/Teaching/Resources.html
[10] http://icl.pku.edu.cn/member/lisujian/maincontent.htm
[11] http://www.cs.fudan.edu.cn/mcwil/irnlp/link.htm
[12] Robert Krovetz, A Guide to the Literature of Information Retrieval, http://159.226.40.18/freshman/resources/guide-to-ir-lit.ps
[13] ACM Digital Library, http://portal.acm.org/portal.cfm
     http://acm.lib.tsinghua.edu.cn/acm/
[14] http://www.sigir.org/proceedings/Proc-Browse.html
[15] SIGIR, http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES278&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[16] WWW, International World Wide Web Conference, http://portal.acm.org/browse_dl.cfm?linked=1&part=series&idx=SERIES968&coll=portal&dl=ACM&CFID=72474811&CFTOKEN=69288563
[17] China Digital Journal Community, http://wanfang.calis.edu.cn/wf/szhqk/index.html
---------------------
More details are listed as follows
====================
CIIR (The Center for Intelligent Information Retrieval, University of Massachusetts)
http://ciir.cs.umass.edu/
The Center for Intelligent Information Retrieval, a National Science Foundation-created S/IUCRC Center, is one of the leading information retrieval research labs in the world.
The CIIR develops tools that provide effective and efficient access to large, heterogeneous, distributed, text and multimedia databases.
CIIR accomplishments include significant research advances in the areas of distributed information retrieval, information filtering, topic detection, multimedia indexing and retrieval, document image processing, terabyte collections, data mining, summarization, resource discovery, interfaces and visualization, and cross-lingual information retrieval.
The Center for Intelligent Information Retrieval continues to support the emerging information infrastructure, both through research and technology transfer.
====================
Glasgow Information Retrieval Group
http://www.dcs.gla.ac.uk/ir/
The information retrieval research group at the University of Glasgow, UK, led by Keith van Rijsbergen. The group gives equal weight to theory and practice and aims to build efficient, novel, and successful multimedia information retrieval systems that serve end users.
The Information Retrieval Group led by Professor Keith van Rijsbergen has a vigorous programme of research, based on both theory and experiment, aimed at giving end-users novel, effective, and efficient access to the world of multi-media information. The group, part of the Department of Computing Science, University of Glasgow, has a strong research history in a wide area of information retrieval research from theoretical modelling of the retrieval process to advanced system building and to the user-oriented evaluation of information retrieval systems. The group's interests also include many areas of Web information retrieval such as link analysis, summarisation and the development of novel interaction techniques (e.g., ostension, implicit feedback and graphical visualisation).
Our research preserves a strong emphasis on the evaluation of interactive IR systems, and the group maintains strong links with researchers in Human-Computer Interaction and Psychology.
------
Keith van Rijsbergen, http://www.dcs.gla.ac.uk/~keith/
University of Glasgow, UK. A leading representative of the logical-inference school of probabilistic IR, he published the classic IR textbook Information Retrieval, which focuses on probabilistic methods for studying information retrieval.
=====================
Cambridge NLIP Group (Natural Language and Information Processing Group)
http://www.cl.cam.ac.uk/Research/NL/
Research in NLIP has been done in the Computer Laboratory for nearly fifty years. The earliest work, by Roger Needham and Karen Sparck Jones, was on automatic thesaurus construction, in the context of document retrieval and machine translation. Subsequent research by Karen Sparck Jones during the 1960s and 70s focused on statistical approaches to retrieval and included innovative work on term weighting. From the later 1970s research in language processing developed, with work on syntax, semantics and discourse processing.
------
Karen Sparck Jones, http://www.cl.cam.ac.uk/users/ksj/
Karen Sparck Jones has been one of the most influential figures in Computing since the 1950s. Her work on Information Retrieval and Natural Language Processing has never been so central as it is today, with its implications for search engine technology, the semantic web and even bioinformatics.
In 1972, Karen Sparck Jones published in the Journal of Documentation the paper which defined the term weighting scheme now known as inverse document frequency (IDF).
Karen Sparck Jones is emeritus Professor of Computers and Information at the Computer Laboratory, University of Cambridge. She has worked in automatic language and information processing research since the late fifties, and has many publications including several books, most recently `Evaluating Natural Language Processing Systems' with Julia Galliers, and `Readings in Information Retrieval', edited with Peter Willett.
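To make the idea of inverse document frequency concrete, here is a minimal sketch. It assumes one common textbook formulation, idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t; this is not necessarily the exact form used in the 1972 paper. The function name and the toy corpus are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: log(N / df)} for a list of tokenized documents.

    Assumes the common formulation idf(t) = log(N / df(t)).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["information", "retrieval", "systems"],
    ["information", "processing"],
    ["information", "retrieval", "evaluation"],
]
idf = inverse_document_frequency(corpus)
```

A term that occurs in every document ("information" here) gets weight zero, while rarer terms get larger weights, which is the intuition behind using IDF for term weighting.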
Karen Sparck Jones won the 1988 Salton Award and is another founder of the modern probabilistic IR model. She has made substantial contributions to NLP, IR, and related fields, and has done a great deal of organizational work. She currently works at the Computer Laboratory, University of Cambridge, UK.
====================
LTI
CMU (Carnegie Mellon University) Language Technologies Institute, http://www.lti.cs.cmu.edu/
The Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University conducts research and provides graduate education in all aspects of language technology and information management. The LTI was established in 1996, as an expansion of the Center for Machine Translation (CMT).
The Center for Machine Translation (CMT) was a research branch of the School of Computer Science devoted to basic and applied research in all aspects of natural language processing, with a primary focus on machine translation, speech processing, and information retrieval. Containing a unique mix of academic and industrial researchers specializing in various aspects of computer science, artificial intelligence, computational linguistics and theoretical linguistics, the CMT provided a rich and diverse environment for collaboration among faculty, staff, visiting scholars, and qualified students.
------
Lemur Toolkit
Lemur is a collection of search engine algorithms and information retrieval applications used for IR research, development and education. Lemur provides a rich query language that supports search against simple texts, structured (XML) texts, and texts annotated with part-of-speech, named-entity, and other annotations used in NLP and text-mining applications. Lemur's search engines comfortably support collections ranging from a few gigabytes to a few terabytes of text. The software is distributed under an open-source license, and is used widely in the IR research community.
====================
Stanford InfoLab
http://infolab.stanford.edu/
The Stanford WebBase Project
http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/
The Stanford WebBase project is investigating various issues in crawling, storage, indexing, and querying of large collections of Web pages.
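The project builds on the previous Google activity that was part of the DLI1 initiative. The DLI2 WebBase project aims to build the necessary infrastructure to facilitate the development and testing of new algorithms for clustering, searching, mining, and classification of Web content.
====================
PKU Tianwang Group (北大天网组), http://sewm.pku.edu.cn/
    The Network Laboratory of Peking University has been engaged in search engine research and system development since 1997. It has deep technical expertise, and its overall strength and academic influence have consistently led the field in China. The "Tianwang" search engine we developed is the most influential campus-born search engine in the country and has been running continuously since October 1997. Tianwang has notable strengths in incremental search, fast retrieval, and massive information storage. Its continued development has trained successive cohorts of students with hands-on experience in processing massive volumes of web text, who are widely welcomed by Chinese and foreign IT companies.
    Since 2001, building on its search engine technology, the group has been collecting and archiving the history of information on the Chinese Internet, forming the "Chinese Internet Information Museum" (中国互联网信息博物馆). To date it holds 2 billion Chinese web pages that appeared at different times, making it the largest system in China for archiving and replaying historical web pages. We have also explored interdisciplinary research built on this archive.
====================
CAS Large-Scale Content Computing Group (中科院大规模内容计算组)
http://159.226.40.18/
    The information retrieval team focuses on the retrieval of text information. It has participated in TREC many times and achieved very good results. The Tianluo (天罗) retrieval system developed by the team is widely used in many important national information departments. Current research directions include Web information acquisition and Web information retrieval.
    The information analysis team focuses on the analysis and mining of large-scale, multi-source, heterogeneous information, including text classification and clustering, information filtering, personalized services, natural language question answering, and shallow natural language processing. The team has built a series of experimental platforms for text information processing, which can currently be tried through the "Demos" (成果演示) link on the home page. Notably, the team runs an open-source program, under which its high-performance word segmentation system ICTCLAS has been widely recognized and used by researchers.
====================
Fudan University Information Retrieval and Natural Language Processing Group (复旦大学信息检索和自然语言处理组), http://www.cs.fudan.edu.cn/mcwil/irnlp/
    Large-scale text processing research studies techniques and methods for processing natural language (especially Chinese information) and covers two areas. The first is foundational work, mainly basic theory and algorithms, including automatic word segmentation, unknown word recognition, part-of-speech and concept tagging, syntactic analysis, and semantic analysis, as well as corpus collection and curation. The second is applied technology for Chinese information processing, including automatic indexing, text retrieval, text summarization, text classification, and text filtering, and especially the application of these techniques in networked environments. This second area is the research focus of the text direction.
====================
HIT-IRLab, http://ir.hit.edu.cn/
    The HIT Information Retrieval Laboratory (HIT-IRLab) was founded in March 2001. Its research directions include text retrieval, question answering, automatic summarization, text mining, and language analysis. The lab treats language analysis as its basic research and text filtering as its applied research. It treats information extraction as the extension of language analysis from sentence understanding to discourse understanding, and sentence retrieval as an intelligent, precise retrieval technique supported by language analysis and discourse understanding.
====================
SIGIR (ACM Special Interest Group on Information Retrieval), TREC (Text REtrieval Conference), MUC (Message Understanding Conference), TIPSTER (the IR testbed program of the U.S. Defense Advanced Research Projects Agency)
====================
Institute of Computational Linguistics, Peking University (北京大学计算语言学研究所)
http://icl.pku.edu.cn/
    The Institute of Computational Linguistics at Peking University was founded in 1986. It is devoted to research in three areas: computational linguistics theory, foundational resources for language information processing, and applied technology.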
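    Centered on computational linguistics and natural language processing, its work covers three main directions. The first is research on and construction of foundational resources: computational lexicography and machine dictionaries, comprehensive language knowledge bases, corpus linguistics and corpus processing technology, and terminology, automatic term extraction, and term standardization. The second is basic theory and NLP models and methods: foundations of computational linguistics, core natural language processing technology, modern Chinese grammar, lexical, syntactic, and semantic analysis of Chinese, statistical NLP models, and information-theoretic methods for language processing. The third is applied technology: methods, techniques, and system implementation for machine translation; information retrieval and extraction; evaluation methods and techniques for natural language information processing systems; controlled Chinese and its writing-assistance systems; and computer-aided research on classical Chinese poetry.
====================
State Key Laboratory of Intelligent Technology and Systems, Tsinghua University (清华大学智能技术与系统国家重点实验室)
http://www.csai.tsinghua.edu.cn/
    The State Key Laboratory of Intelligent Technology and Systems is hosted by Tsinghua University. The laboratory opened to outside researchers in February 1990. It mainly conducts basic and applied basic research on the fundamental principles and methods of artificial intelligence, including intelligent information processing, machine learning, intelligent control, and neural network theory. It also studies AI-related applied technology and system integration, chiefly intelligent robots and the processing of sound, graphics, images, text, and language.
================
Susan Dumais, http://research.microsoft.com/~sdumais/
I am interested in algorithms and interfaces for improved information retrieval, as well as general issues in human-computer interaction. I joined Microsoft Research in July 1997. I work on a wide variety of information access and management issues, including: personal information management, web search, question answering, information retrieval, text categorization, collaborative filtering, interfaces for improved search and navigation, and user/task modeling.
Prior to coming to Microsoft, I worked on a statistical method for concept-based retrieval known as Latent Semantic Indexing. You can find pointers to this work on the Bellcore (now Telcordia) LSI page.
===============
UIUC Information Retrieval Group
http://sifaka.cs.uiuc.edu/ir/
The Information Retrieval (IR) group is part of the Database and Information Systems (DAIS) Lab of the Computer Science Department at University of Illinois at Urbana-Champaign.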
We work on a wide spectrum of problems in the general area of text information management, including retrieval, organization, filtering, and mining of textual information, aiming at developing advanced text information management techniques and systems that help people make better use of text information.
------
ChengXiang Zhai, http://www-faculty.cs.uiuc.edu/~czhai/
Research Interests: Information Retrieval, Text Mining, Natural Language Processing, Bioinformatics
ChengXiang Zhai, University of Illinois at Urbana-Champaign, is recognized for his work on user-centered, adaptive intelligent information access. His techniques are expected to improve search-engine performance, support better information organization and enable understanding of large volumes of information. Zhai's work in information retrieval is expected to enhance curricula and provide new educational tools for the growing information technology workforce.
===============
Stephen Robertson, http://research.microsoft.com/users/robertson/
Stephen Robertson joined Microsoft Research Cambridge in April 1998. In 1998, he was awarded the Tony Kent STRIX award by the Institute of Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR. He is a Fellow of Girton College, Cambridge.
At Microsoft, he runs a group called Information Retrieval and Analysis, which is concerned with core search processes such as term weighting, document scoring and ranking algorithms, and combination of evidence from different sources. These are studied theoretically through the use of formal models, mainly statistical, and statistical methods including machine learning methods, and experimentally, through activities such as the Text Retrieval Conference (TREC) and with internally generated evaluation sets. The group (with its Keenbow evaluation environment) has had some excellent results at TREC. The group works closely with product groups to transfer ideas and techniques.
His main research interests are in the design and evaluation of retrieval systems.
He is the author, jointly with Karen Sparck Jones, of a probabilistic theory of information retrieval, which has been moderately influential. A further development of that model, with Stephen Walker, led to the term weighting and document ranking function known as Okapi BM25, which is used in many experimental text retrieval systems.
Prior to joining Microsoft, he was at City University London, where he retains a part-time position as Professor of Information Systems in the Department of Information Science. He was Head of Department for eight years, during which time it achieved the highest possible rating in two successive research assessment exercises. He also started the Centre for Interactive Systems Research, the main research vehicle of which is the Okapi text retrieval system, which has also done well at TREC.
Before joining City, he was a research fellow at University College London, where he took his PhD in the School of Library Archive and Information Studies. Before that he was in the research department at Aslib. He has an MSc in Information Science from City and a first degree in mathematics from Cambridge.
===================
Nick Craswell
http://research.microsoft.com/~nickcr
I am an associate researcher at Microsoft Research Cambridge, in the Information Retrieval and Analysis Group.
Research Overview
I am interested in Web search evaluation, mostly on enterprise-scale webs but also the World Wide Web. I built the VLC, VLC2, WT2g and .GOV test collections, which have been made available to research groups around the world. David Hawking and I coordinated the TREC Web Track experiments. I am currently involved in the TREC Terabyte Track and Enterprise Track. Some publications: Book chapter preprint (pdf), IR'01 (citeseer) and CSIRO'01 (pdf).
I also work on effective Web search, which means making use of information in pages, link structure and URL structure to generate more useful Web search results.
Some papers: SIGIR'05 (pdf), SIGIR'01 (pdf), TOIS'03 (pdf) (copying is by permission of ACM, Inc.) and ADCS'03 (pdf).
My PhD was in distributed information retrieval (thesis pdf) which means building a system on top of multiple engines/databases that already exist. My recent work in the area has considered whether (or when) DIR is really practical. Some papers: ADC'99 (ps), DL'00 (pdf), ADC'03 (pdf) and ADC'04 (pdf).
===============
Web Search & Data Mining Group of MSR Asia
http://research.microsoft.com/wsm/
The goal of the Web Search & Data Mining Group of MSR Asia is to drive the next generation of Web search by leveraging data mining, machine learning, and knowledge discovery techniques for information analysis, organization, retrieval, and visualization. In addition, in contrast with current Web search methods, which essentially do document-level ranking and retrieval, the Web Search & Data Mining Group has created search at the object level to bring increased knowledge and intelligence to users.
A Glimpse at Several Core Innovations:
Large-scale Experimental Web Search Platform
The Web Search & Data Mining Group is creating a large scale search platform to efficiently store, parse, index and search billions of Web pages and other types of documents. The search platform is flexible enough to allow for testing of various state-of-the-art search techniques that have been created at the lab using new technologies.
Structuralizing the Web
The biggest challenge facing both users and search engines over the next several decades is the continued unstructured growth of the Internet.
As such, search functions that can effectively and efficiently dig out machine-understandable information and knowledge layers from unorganized and unstructured Web data will be the key to supporting relevant search results. To meet this challenge, the group is exploring technologies, namely Web information extraction, deep Web mining, and Web structure mining that can automatically classify structures and extract objects from the Web. The information and knowledge gathered using these new techniques greatly improves the performance of current Web search and even facilitates the creation of more sophisticated next generation search technologies.
Vertical Search
Today's conventional search engines can be described as page-level search engines whose main function is to rank web pages according to their relevance to a given query. Driving the future of the search industry are functions that delve deeper into vertical domains to provide knowledge and intelligence to query results. At MSR Asia, the Web Search & Data Mining Group is addressing the greatest challenges faced by vertical search including large scale web classification, object-level information extraction, object identification and integration, and object relationship mining and ranking. The results of these efforts are leading to more advanced search engines that deliver intelligence and insight to search results.
Mobile Search
The explosive growth of new computing devices such as handheld computers, Windows Mobile-based PocketPCs, and SmartPhones is driving demand for greater and more efficient information access. These devices, which leverage the power of the Web and allow greater access to information than ever before, are still not capable of performing at the level of a desktop PC. At MSR Asia, the Web Search & Data Mining Group is inventing new technologies to improve the mobile search and browsing experience and deliver the capabilities of a PC to users of these new devices.
Project initiatives include developing innovative presentation schemes and user interfaces to facilitate search and browsing tasks on mobile devices and developing context aware search technologies to address the special information needs of mobile users.
Multimedia Search
The Web Search & Data Mining Group is conducting research into new technologies that index multimedia content such as images, videos, and audio. Through content analysis and advanced visualization techniques, the group is transforming today's conventional text based search engines to include multimedia content thus delivering more intelligent search results to users. For example, the group recently developed a new multimedia news reader which mines large archival news databases presenting text, map information, images, and background music within a unique user interface providing readers with a more efficient news search engine and a more enjoyable reading experience.
------
Wei-Ying Ma
http://research.microsoft.com/users/wyma/
Senior Researcher, Research Manager, Microsoft Research Asia
Dr. Wei-Ying Ma received the B.S. degree in electrical engineering from the National Tsing Hua University in Taiwan in 1990, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California at Santa Barbara in 1994 and 1997, respectively. From 1994 to 1997 he was engaged in the Alexandria Digital Library (ADL) project in UCSB while completing his Ph.D. He developed a web-based image retrieval system called Netra which has been frequently cited by other researchers and is regarded as one of the most representative image retrieval systems. From 1997 to 2001, he was with HP Labs where he worked in the field of multimedia adaptation and distributed media services infrastructure. He joined Microsoft Research Asia in 2001. Since then, he has been leading a research group to conduct research in the areas of information retrieval, web search, data mining, mobile browsing, and multimedia management.
He currently serves as an Editor for the ACM/Springer Multimedia Systems Journal and Associate Editor for ACM Transactions on Information Systems (TOIS). He has served on the organizing and program committees of many international conferences including ACM Multimedia, ACM SIGIR, ACM CIKM, WWW, ICME, CVPR, SPIE Multimedia Storage and Archiving Systems, SPIE Multimedia Communication and Networking, etc. He is also the general co-chair of International Multimedia Modeling (MMM) Conference 2005 and International Conference on Image and Video Retrieval (CIVR) 2005. He has published 5 book chapters and over 100 international journal and conference papers.
====================
Google Labs
http://labs.google.com/
Google Labs is a playground for Google engineers and adventurous Google users. Google staffers with wild and crazy ideas post their prototypes on Google Labs and solicit feedback on how the technology could be used or improved. None of these experiments are guaranteed to make it onto Google.com, as this is really the first phase in the development process. Google users with a desire to jump over the cutting edge are invited to check out any or all of the posted prototypes and send their comments directly to the Googlers who developed them. Please, remember to wear your safety goggles while using this site.
Labs.google.com, Google's technology playground.
Google Labs showcases a few of our favorite ideas that aren't quite ready for prime time. Your feedback can help us improve them. Please play with these prototypes and send your comments directly to the Googlers who developed them.
Want to learn more about Google technology? Here are some papers.
http://labs.google.com/papers/index.html
Passionate about these topics?
You should work at Google.
algorithms, artificial intelligence, compiler optimization, computer architecture, computer graphics, data compression, data mining, file system design, genetic algorithms, information retrieval, machine learning, natural language processing, operating systems, profiling, robotics, text processing, user interface design, web information retrieval, and more!
http://www.google.com/press/podium.html
Google Press Center: The Google Podium
Here you'll find a selection of public presentations made by Google executives. From time to time, we will continue to add transcripts, audio or video clips and links to presentations hosted elsewhere.
====================
Jon Kleinberg
http://www.cs.cornell.edu/home/kleinber/
Professor of Computer Science, Cornell University
My research is concerned with algorithms that exploit the combinatorial structure of networks and information. My recent work has included
* link analysis and modeling of the World Wide Web and related information networks;
* discrete optimization and network algorithms; and
* algorithmic approaches to clustering, indexing, and data mining.
====================
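As a rough illustration of the link analysis mentioned above, the following is a minimal sketch of hub and authority scoring in the style of Kleinberg's HITS algorithm. It is a simplified version under stated assumptions: a fixed number of iterations instead of a convergence test, L2 normalization after each round, and a tiny hypothetical link graph (`toy_web`). The function name `hits` is chosen here for illustration and does not come from any library.

```python
import math


def hits(graph, iterations=50):
    """Compute hub and authority scores for a directed link graph.

    graph maps each page to the list of pages it links to.
    Returns (hubs, authorities) as dicts keyed by page.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auths[dst] += hubs[src]
        # Hub score: sum of authority scores of the pages linked to.
        hubs = {n: sum(auths[d] for d in graph.get(n, [])) for n in nodes}
        # L2-normalize both score vectors so they stay bounded.
        for scores in (auths, hubs):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hubs, auths


# Hypothetical toy graph: pages "a" and "b" both link to "c"; "c" links to "d".
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hubs, auths = hits(toy_web)
best_authority = max(auths, key=auths.get)
```

In this toy graph the page with the most in-links from good hubs ends up as the top authority, while pages that link to it become the strongest hubs.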