How can I become a data scientist?
Source: Internet. Editor: 程序博客网. Time: 2024/05/22 06:05
Quora User
Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist • Quora User, Australian Data Engineer • William Emmanuel Yu, Chief Nerd. PHB and PHD.
Originally Answered: How do I become a data scientist?
Strictly speaking, there is no such thing as "data science" (see What is data science?). See also: Vardi, Science has only two legs: http://portal.acm.org/ft_gateway...
Here are some resources I've collected about working with data. I hope you find them useful. (Note: I'm an undergrad student, so this is not an expert opinion in any way.)
1) Learn about matrix factorizations
Take a Computational Linear Algebra course (sometimes called Applied Linear Algebra, Matrix Computations, Numerical Analysis, or Matrix Analysis; it may be offered as either a CS or an Applied Math course). Matrix decomposition algorithms are fundamental to many data mining applications and are usually underrepresented in a standard "machine learning" curriculum. With terabytes of data, traditional tools such as Matlab are no longer suitable for the job: you cannot just run eig() on Big Data. Distributed matrix computation packages such as those included in Apache Mahout [1] are trying to fill this void, but you need to understand how the numerical algorithms and the LAPACK/BLAS routines [2][3][4][5] work in order to use them properly, adjust for special cases, build your own, and scale them up to terabytes of data on a cluster of commodity machines.[6] Numerics courses usually build on undergraduate algebra and calculus, so you should be fine with the prerequisites. I'd recommend these resources for self-study and reference:
See Jack Dongarra: Courses and What are some good resources for learning about numerical analysis?
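To make the point about eig() concrete, here is a minimal sketch of power iteration, one of the simplest iterative eigenvalue methods. It is only an illustration, not what Mahout or LAPACK actually do internally: the function names, the tolerance, and the 2x2 test matrix are my own assumptions. The thing to notice is that the algorithm only touches the matrix through matrix-vector products, which is the kind of operation that can be split across machines.

```python
import math


def matvec(A, x):
    """Multiply a dense matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def power_iteration(A, num_iters=200, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes A has a single eigenvalue of largest magnitude and that the
    starting vector is not orthogonal to its eigenvector.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n  # arbitrary unit-length starting vector
    eigenvalue = 0.0
    for _ in range(num_iters):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break  # A maps x to zero; nothing more to learn from iterating
        x_next = [v / norm for v in y]
        # Rayleigh quotient x^T A x (x_next has unit length)
        new_eigenvalue = sum(a * b for a, b in zip(x_next, matvec(A, x_next)))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        x, eigenvalue = x_next, new_eigenvalue
        if converged:
            break
    return eigenvalue, x


# Small symmetric example (hypothetical data)
A = [[2.0, 1.0], [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(A)
print(eigenvalue)
```

On a cluster, each row block of A could live on a different machine and compute its slice of the product locally, so only the vector has to travel over the network. Understanding why and how fast this converges is exactly what a numerics course teaches.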
2) Learn about distributed computing
If you want to work with big data, it is important to learn how to work with a Linux cluster and how to design scalable distributed algorithms (see Why the current obsession with big data?).
The Crays and Connection Machines of the past can now be replaced with farms of cheap cloud instances: computing costs dropped to less than $1.80/GFlop in 2011, versus $15M in 1984: http://en.wikipedia.org/wiki/FLOPS .
If you want to squeeze the most out of your (rented) hardware, it is also becoming increasingly important to be able to use the full power of multicore processors (see http://en.wikipedia.org/wiki/Moo... )
Note: this topic is not part of a standard Machine Learning track, but you can probably find courses such as Distributed Systems or Parallel Programming in your CS/EE catalog. See distributed computing resources, a systems course at UIUC, key works, and for starters: Introduction to Computer Networking.
After studying the basics of networking and distributed systems, I'd focus on distributed databases, which will soon become ubiquitous as the data deluge pushes systems past the limits of vertical scaling. See key works, research trends and for starters: Introduction to relational databases and Introduction to distributed databases (HBase in Action).
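As a rough sketch of what a "scalable distributed algorithm" looks like, here is a map/reduce-style mean over data split into shards. Everything in it is an assumption for illustration: the function names are made up, and a local thread pool stands in for a cluster of machines. The idea that carries over to real systems is that each worker reduces its shard to a small summary, and the summaries are merged with an associative operation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_shard(shard):
    """Map step: reduce one shard to a small (sum, count) summary."""
    return (sum(shard), len(shard))


def combine(a, b):
    """Reduce step: merge two summaries. Associative, so the merge order
    (and therefore the order in which workers finish) does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(shards, max_workers=4):
    """Compute the mean of all values without gathering them in one place."""
    # Threads stand in for cluster nodes here; this is not a real cluster.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_shard, shards))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else float("nan")


# Hypothetical data, already partitioned into three shards
shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(shards)
print(mean)
```

Note that averaging the per-shard means would give the wrong answer when shards have different sizes, which is why the summary carries the count as well as the sum.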
3) Learn about statistical analysis
Start learning statistics by coding in R (see What are essential references for R?) and experiment with real-world data (see Where can I find large datasets open to the public?).
Cosma Shalizi compiled some great materials on computational statistics; check out his lecture slides, and also see What are some good resources for learning about statistical analysis?
I've found that learning statistics in a particular domain (e.g. Natural Language Processing) is much more enjoyable than taking Stats 101. My personal recommendation is the course by Michael Collins at Columbia (also available on Coursera).
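To give a flavor of statistics inside a domain, here is a toy example of my own (not taken from the course): a maximum-likelihood estimate of bigram probabilities from a tiny corpus. The corpus, the function names, and the `<s>`/`</s>` boundary markers are assumptions for illustration. The estimate is just a ratio of counts, P(word | prev) = count(prev, word) / count(prev).

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count bigrams and their left contexts in whitespace-tokenized text."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (prev, word) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def bigram_probability(prev, word, context_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev), with no smoothing."""
    if context_counts[prev] == 0:
        return 0.0  # unseen context: the unsmoothed estimate is undefined
    return bigram_counts[(prev, word)] / context_counts[prev]


# Hypothetical three-sentence corpus
corpus = ["the dog barks", "the cat meows", "the dog sleeps"]
context_counts, bigram_counts = train_bigram_model(corpus)
p_dog = bigram_probability("the", "dog", context_counts, bigram_counts)
print(p_dog)
```

Unseen bigrams get probability zero here, which is the practical problem that motivates smoothing and is a good way to run into estimation questions with real data in hand.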
You can also choose a field where the use of quantitative statistics and causality principles [7] is inevitable, say molecular biology [8], or a fun sub-field such as cancer research [9], or an even narrower domain, e.g.