Machine Learning Portfolio in Manufacturing Intelligence Practice
Source: Internet  Published: 太湖超级计算机 知乎  Editor: 程序博客网  Time: 2024/06/08 16:13
Machine learning processes
- Step 1: Define your problem
- Step 2: Prepare your data
- Step 3: Spot-check algorithms
- Step 4: Improve results
- Step 5: Present results
Machine learning practice tools
- Weka (open-source GUI platform)
No coding and no deep mathematics required, and quick to get started.
- Azure Machine Learning Studio
Free account sign-up for practice.
Small in-memory datasets for practice
We choose some datasets from the UCI Machine Learning Repository and use the Weka platform to practice the basic machine learning process.
Dataset:
iris.data (iris flowers classification dataset) from UCI
Note: Weka's native file format is .arff, so we can load the .csv file into Weka and then save it as .arff.
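For readers who prefer to do the conversion outside Weka, the note above can be sketched in a few lines of Python. This is a minimal illustration, not Weka's own converter: the function name csv_to_arff and the attribute names are assumptions (iris.data ships without a header row), and the three sample rows are only a small excerpt of the dataset.

```python
import csv
import io

# Attribute names are assumptions; iris.data itself has no header row.
ATTRIBUTES = ["sepallength", "sepalwidth", "petallength", "petalwidth"]


def csv_to_arff(csv_text, relation="iris"):
    """Convert headerless CSV text (4 numeric columns + class label) to ARFF text."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]  # skip blank lines
    classes = sorted({r[-1] for r in rows})  # nominal values of the class attribute
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in ATTRIBUTES]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in rows]
    return "\n".join(lines) + "\n"


# A three-row excerpt, one row per species.
sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
arff_text = csv_to_arff(sample)
print(arff_text)
```

The header section declares each attribute and its type, and the rows after @data are the same comma-separated values as in the .csv file.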
Overall modeling steps
Problem definition
We will use the Iris flowers classification dataset.
Each instance in the iris dataset describes measurements of an iris flower.
The task is to predict which of the 3 iris species an observation belongs to.
Prepare your data
- Load dataset
In Weka, we can use the Explorer interface to load the dataset in .arff format and check the data.
- Analyze the dataset
Review the distribution of each attribute and the interactions between attributes, which may shed light on specific data transforms and specific modeling techniques that we could use.
- Summary statistics
- We notice a few things (a basic description of the dataset from a business view):
- The dataset is called iris.
- There are 150 instances. If we use 10-fold cross-validation later to evaluate the algorithms, then each fold will comprise 15 instances, which is quite small. We may want to think about using 5 folds of 30 instances instead.
- There are 5 attributes: 4 inputs and 1 output variable.
- There is a small number of attributes, and we could investigate further using feature selection methods.
- Click on each attribute in the Attributes pane and review the summary statistics in the Selected attribute pane.
- We can notice a few facts about our data (a more detailed, technical description of the dataset):
- There are no missing values for any of the attributes.
- All inputs are numeric and have values in the same range, between about 0 and about 8.
- The last attribute is the output variable called class. It is nominal and has three values.
- The classes are balanced, meaning that there is an equal number of instances in each class.
- If they were not balanced, we may want to think about balancing them.
We may see some benefit from either normalizing or standardizing the data.
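The checks above can also be reproduced by hand. The following is a minimal sketch using only the Python standard library. It does not reproduce Weka's filters: the function names are hypothetical, and the values list is made-up data standing in for one attribute column. The label list only mirrors the 150 balanced instances described above.

```python
from collections import Counter
from statistics import mean, pstdev


def fold_sizes(n_instances, k):
    """Sizes of k cross-validation folds that are as equal as possible."""
    base, extra = divmod(n_instances, k)
    return [base + 1 if i < extra else base for i in range(k)]


def summarize(column):
    """Min, max, mean and population standard deviation of one numeric attribute."""
    return {"min": min(column), "max": max(column),
            "mean": mean(column), "std": pstdev(column)}


def normalize(column):
    """Rescale values to the range [0, 1] (min-max normalization)."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def standardize(column):
    """Shift to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


print(fold_sizes(150, 10))  # ten folds of 15 instances
print(fold_sizes(150, 5))   # five folds of 30 instances

# Hypothetical values standing in for one attribute column.
values = [5.1, 4.9, 7.0, 6.4, 6.3, 5.8]
print(summarize(values))
print(normalize(values))
print(standardize(values))

# Class balance check: three classes with an equal number of instances.
labels = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
print(Counter(labels))
```

Normalizing keeps the shape of the distribution but squeezes it into [0, 1]. Standardizing instead expresses each value as a number of standard deviations from the mean, which suits algorithms that assume roughly Gaussian inputs.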
- Attribute distributions (a closer look at each attribute)
We can use the Visualize All button to review the graphical distribution of each attribute.
- We can see overlapping but differing distributions for each of the class values on each of the attributes. This is a good sign, as we can probably separate the classes.
- It looks like sepalwidth has a Gaussian-like distribution. If we had a lot more data, perhaps it would be even more Gaussian.
- It looks like the other 3 input attributes have nearly Gaussian distributions with a skew or a large number of observations at the low end of the distribution. Again, it makes me think that the data may be Gaussian if we had an order of magnitude more examples.
- We also get a visual indication that the classes are balanced.
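The visual impressions above (Gaussian-like shape, skew, many observations at the low end) can be backed with numbers. The sketch below is an illustration under stated assumptions rather than what Weka computes: skewness and histogram are hypothetical helper names, and both sample lists are made-up values, not rows from the iris dataset.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a longer right tail."""
    mu = mean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def histogram(values, bins=5):
    """Count values in equal-width bins; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        idx = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return [(lo + i * width, count) for i, count in enumerate(counts)]


# Made-up samples: one symmetric, one piled up at the low end.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
low_heavy = [1.0, 1.0, 1.0, 2.0, 10.0]

print(skewness(symmetric))
print(skewness(low_heavy))
for start, count in histogram(low_heavy):
    print(f"{start:5.2f} {'#' * count}")
```

A skewness near zero supports the Gaussian-like reading, while a clearly positive value matches a distribution with many observations at the low end.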