从将蘑菇分类为可食用或不可食用的分类中学习随机森林分类器

本文主要是介绍从将蘑菇分类为可食用或不可食用的分类中学习随机森林分类器，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

There are about 50,000 species of mushrooms and out of which 1 to 2 % of them are poisonous. Predicting whether a mushroom is edible or not is a classic problem in the domain of Machine Learning. A mushroom is classified based on a number of features

大约有50,000种蘑菇，其中有1-2％是有毒的。预测蘑菇是否可食用是机器学习领域的经典问题。蘑菇根据许多特征分类

什么是RandomForest分类器？ (What is a RandomForest Classifier?)

A RandomForest Classifier operates as an ensemble algorithm. An ensemble algorithm is the one that combines two more algorithms to derive better results. A RandomForest Classifier is a combination of number of Decision tress. If you do not have an idea of what a Decision Tree is, please visit my blog on Decision Trees.

RandomForest分类器用作集成算法。集成算法是一种将两种以上算法组合在一起以得出更好结果的算法。 RandomForest分类器是决策树数量的组合。如果您不知道什么是决策树，请访问我关于决策树的博客。

From the number of decision trees derived the best are chosen and it is fitted to the model.

从得出的决策树数量中选择最佳，然后将其拟合到模型中。

让我们编码！ (Let’s Code!)

Now we are going to build a RandomForest Classifier machine learning model using python and some libraries. Libraries are set of programs already written to make the calculations simpler. If you do not know the common machine learning terminologies like Model, Training, etc. please do visit my article on Basic Terminologies of Machine Learning using this link. Let’s start to code!

现在，我们将使用python和一些库来构建RandomForest分类器机器学习模型。库是已经编写的一组程序，可以简化计算。如果您不了解诸如Model，Training等常见的机器学习术语，请使用此链接访问我有关机器学习的基本术语的文章。让我们开始编码！

Here we have imported the necessary libraries and packages for us to perform the simple linear regression. The libraries and packages imported are:

在这里，我们已经导入了必要的库和包，以执行简单的线性回归。导入的库和软件包为：

Numpy: This is a package that is used for scientific calculations and array calculations in python.
Numpy：这是一个用于在python中进行科学计算和数组计算的软件包。
Pandas: This is a powerful package that has some functions for Data Analysis and Manipulation.
熊猫：这是一个功能强大的软件包，具有一些用于数据分析和处理的功能。
Sklearn: This is a free machine learning library that contains many functions and methods that are necessary to build a machine learning model. From Sklearn we have imported three functions LabelEncoder, model_selection, and ensemble. LabelEncoder is used to convert the categorical variables into numerical variables and the Ensemble function contains the built-in package for RandomForestClassifier.
Sklearn：这是一个免费的机器学习库，其中包含构建机器学习模型所需的许多功能和方法。从Sklearn，我们导入了三个函数LabelEncoder，model_selection和ensemble。 LabelEncoder用于将分类变量转换为数值变量，并且Ensemble函数包含RandomForestClassifier的内置包。

Here we are importing the dataset named “mushrooms.csv” and displaying their values before LabelEncoding. The link to the dataset is here.

在这里，我们将导入名为“ mushrooms.csv”的数据集，并在LabelEncoding之前显示其值。数据集的链接在这里。

Here, we have dropped or deleted the rows that have null values in them to have better predictions. LabelEncoding is the process of converting the categorical or alphabetical values into numerical values as a computer can understand only numerics. And we have also done LabelEncoding here.

在这里，我们删除或删除了其中具有空值的行，以进行更好的预测。 LabelEncoding是将分类或字母值转换为数字值的过程，因为计算机只能理解数字。而且我们还在这里完成了LabelEncoding。

Here, we are selecting the target variable(y) whether a mushroom is edible or not and the features to predict the target variable in the variable(x) and we are splitting the dataset into train and test sets.

在这里，我们选择蘑菇是否可食用的目标变量(y)，以及在变量(x)中预测目标变量的特征，然后将数据集分为训练集和测试集。

Here, we are initializing the RandomForestClassifier model with the number of decision trees to be formed as “40”. And we are fitting our data to our model.

在这里，我们要初始化的RandomForestClassifier模型的决策树数为“ 40”。我们正在将数据拟合到模型中。

40 Decision trees will be formed and the best out of them for the two classes Edible or Not-Edible will be selected as the final model.

将形成40个决策树，并从两个类别的可食用或不可食用中选出最好的作为最终模型。