Why not choose R for big data mining?
Source: Internet  Editor: 程序博客网  Time: 2024/04/29 07:28
Tools: any thoughts on open source R vs. Rapid-I?
R Pros:
- it's a programming language: you can do whatever you want
- number of algorithms: many analysis and data transformation schemes already exist
- integration and expandability: embedding an R script into your own programs is pretty easy, and so is writing extensions, since you already know the language
- widely used in academia and in the education of statisticians: huge user base
- the number one option when it comes to pure statistics and the data is not too large
R Cons:
- it's a programming language: you have to write / adapt source code for every single step
- scalability is often an issue: with large data sets you will easily run into trouble (since R is licensed under the GPL and the IP rights are held by thousands of people, I highly doubt that there is a 100% legal way to use an open-core license model here, so that a company could jump in and help you out)
- hardly any native support for enterprise usage / deployment: no process definitions, no scheduling, no integration, no...
- nothing for web-based applications: harder deployment etc.
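The scalability point above largely comes down to base R holding a data set in memory. A common workaround, in any tool, is to stream the data in chunks and keep only running summaries. Here is a minimal sketch of that idea, written in Python rather than R for illustration only; the helper names `chunks` and `running_mean_var` and the chunk size are hypothetical, and nothing here is taken from R or RapidMiner.

```python
from typing import Iterable, Iterator, List, Tuple


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from any iterable."""
    batch: List[float] = []
    for v in values:
        batch.append(v)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def running_mean_var(batches: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Welford's online algorithm: count, mean and sample variance in one pass.

    Only three numbers are kept between batches, so memory use does not
    grow with the size of the data set.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for batch in batches:
        for x in batch:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance


if __name__ == "__main__":
    # A generator stands in for a file or database cursor too large to load at once.
    data = (float(i) for i in range(1, 1_000_001))
    n, mean, var = running_mean_var(chunks(data, size=10_000))
    print(n, mean, var)
```

The same pattern applies to any statistic that can be updated incrementally; algorithms that need all rows at once are the ones that hit the memory ceiling.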
RapidMiner Pros:
- gray-box programming: you can program your analysis but without writing source code (and you still have access to all details and are able to change them)
- processes: each program / analysis script is a parametrized process which can be re-used and, more importantly, connected to or embedded into business processes
- scalability: depending on the algorithms used, RapidMiner can handle much more data. The server version (RapidAnalytics) improves on this further, and the Enterprise Edition of RapidMiner offers methods for in-database mining
- cluster support: multiple servers can be used as a computation cluster
- business analytics: RM is much stronger when it comes to analytical ETL, data and text mining, and - especially with the Enterprise Edition of the server RapidAnalytics - predictive reporting and dashboards
- integration and expandability: easy integration of processes as web services (via RapidAnalytics), RM integrates Weka, R (best of both worlds), and offers options for additional extensions with scripts if something is missing
RapidMiner Cons:
- probably a smaller user base than R
- the integration between RapidMiner and R is not (yet!) as perfect as it should be
- fewer statistical methods than R, but more methods derived from machine learning / data mining
- "There is an operator for that" is an answer often heard from RapidMiner people. However, many operators (the basic building blocks of analysis processes) mean higher complexity
- the graphical user interface is really powerful, but this also adds to complexity
This is certainly not a complete list or overview, and there are many more aspects than those I have discussed above. Which tool is more appropriate depends on your background and requirements. However, I still hope that it helps.
The good thing is: both are open source and can be tested, and due to the open source nature you can also have the best of both worlds with a single solution (RapidMiner + R Extension). So just give them a try and test them yourself!