CatBoost: A machine learning library to handle categorical (CAT) data automatically
Source: Internet. Editor: 程序博客网. Time: 2024/05/27 21:50
Introduction
How many of you have seen this error while building your machine learning models using “sklearn”?
I bet most of us! At least in the initial days.
This error occurs when dealing with categorical (string) variables. In sklearn, you are required to convert these categories into a numerical format.
To do this conversion, we use pre-processing methods such as “label encoding”, “one-hot encoding” and others.
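To make the two encodings concrete, here is a minimal, dependency-free sketch of what each one produces. The `colors` column is made-up toy data, and the hand-rolled mapping is only for illustration; in practice you would more likely reach for `LabelEncoder` and `OneHotEncoder` from `sklearn.preprocessing`.

```python
# Toy categorical column; the values are made up for illustration.
colors = ["red", "green", "blue", "green", "red"]

# Label encoding: map each distinct category to an integer code.
# Sorting the categories keeps the mapping deterministic.
categories = sorted(set(colors))  # ['blue', 'green', 'red']
label_map = {cat: code for code, cat in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one 0/1 indicator column per category.
one_hot_encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(label_encoded)       # [2, 1, 0, 1, 2]
print(one_hot_encoded[0])  # [0, 0, 1]  ("red" is the third category)
```

Note that label encoding imposes an arbitrary order on the categories (here blue < green < red), while one-hot encoding avoids that at the cost of one extra column per category.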
In this article, I will discuss a recently open-sourced library, “CatBoost”, developed and contributed by Yandex. CatBoost can use categorical features directly, without manual encoding, and is scalable in nature.
“This is the first Russian machine learning technology that’s an open source,” said Mikhail Bilenko, Yandex’s head of machine intelligence and research.
P.S. You can also read an earlier article of mine, “How to deal with categorical variables?”.
Table of Contents
- What is CatBoost?
- Advantages of CatBoost library
- CatBoost in comparison to other boosting algorithms
- Installing CatBoost
- Solving ML challenge using CatBoost
- End Notes
1. What is CatBoost?
CatBoost is a recently open-sourced machine learning algorithm from Yandex. It can easily integrate with deep learning frameworks like Google’s TensorFlow and Apple’s Core ML. It can work with diverse data types to help solve a wide range of problems that businesses face today. To top it off, it provides best-in-class accuracy.
It is especially powerful in two ways:
- It yields state-of-the-art results without extensive data training typically required by other machine learning methods, and
- Provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems.
The name “CatBoost” comes from two words: “**Cat**egory” and “**Boost**ing”.
As discussed, the library works well with multiple categories of data, such as audio, text, and images, including historical data.
“Boost” comes from the gradient boosting machine learning algorithm, as this library is built on gradient boosting. Gradient boosting is a powerful machine learning algorithm that is widely and successfully applied to many types of business challenges, such as fraud detection, item recommendation, and forecasting. It can also return very good results with relatively little data, unlike deep learning (DL) models, which need to learn from a massive amount of data.
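For readers new to the idea, the core loop of gradient boosting can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small model to the current residuals and add a scaled-down copy of it to the ensemble. The sketch below assumes squared-error loss, a single numeric feature, and one-split "stumps" as the weak learners. The function names and toy data are hypothetical, and this is not CatBoost's actual implementation.

```python
def fit_stump(xs, residuals):
    """Pick the single threshold split that minimises squared error on the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive ensemble of stumps, each trained on the current residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up, roughly linear toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(gradient_boost(xs, ys), xs, ys)
print(baseline_mse, boosted_mse)
```

On the training data the boosted ensemble's error comes out below that of the constant baseline, which is the "boost" in action: many weak learners, each correcting what the previous ones left over.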
Here is a video message from Mikhail Bilenko, Yandex’s head of machine intelligence and research, and Anna Veronika Dorogush, head of Yandex machine learning systems.