7 Important Data Science Papers
Reposted from: http://datascience101.wordpress.com/2013/08/26/7-important-data-science-papers/
It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.
Google Search
- PageRank – This is the paper that explains the algorithm behind Google search.
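To get a feel for the idea before reading the paper, here is a minimal sketch of PageRank computed by repeated iteration over a tiny link graph. The function name `pagerank`, the `toy_web` graph, the damping value of 0.85, and the iteration count are all assumptions chosen for illustration; this is a toy version, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by iterating over a link graph.

    `links` maps each page to the list of pages it links to.
    Assumption: every linked-to page also appears as a key.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # start with equal rank

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links[p])
        new_ranks = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes its rank on, split equally among its links.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page web: three pages link to "c", nobody links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, "c" ends up with the highest rank because the most pages link to it, and "d" ends up with the lowest because nothing links to it.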
Hadoop
- MapReduce – This paper explains a programming model for processing large datasets. In particular, it is the programming model used in Hadoop.
- Google File System – Part of Hadoop is HDFS. HDFS is an open-source version of the distributed file system explained in this paper.
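The MapReduce programming model is easier to picture with a small example. The sketch below counts words in a single Python process using separate map, shuffle, and reduce steps. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample `documents` are made up for illustration; this is not the Hadoop API, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input: each string stands in for one input file.
documents = ["big data big ideas", "data science"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The point of the model is that the map and reduce functions only ever see one record or one key at a time, so the framework is free to spread the work over as many machines as it likes.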
NoSQL
These are two of the papers that started the NoSQL debate. Each paper describes a different type of storage system intended to be massively scalable.
- Amazon Dynamo
- Google Bigtable
Machine Learning
- 10 algorithms in data mining (pdf download) – This paper covers 10 important machine learning algorithms.
- A Few Useful Things to Know about Machine Learning – This paper is filled with tips, tricks, and insights to make machine learning more successful.
Bonus Paper
- Random Forests – This paper introduces one of the most popular machine learning techniques. It is heavily used in Kaggle competitions, including by the winners.
Are there any other papers you feel should be on the list?