An effective method for detecting the wheat freshness by integrating biophotonics and machine learning algorithm


After descriptive statistical features are extracted from the measured time-series data, they are fed into several machine learning algorithms for validation in order to identify the most suitable classification algorithm. Many machine learning algorithms are available for discrimination25,26,27,28,29, such as Support Vector Machines and BP Neural Networks.

Since the wheat sample data constitutes a small sample, the Support Vector Machine (SVM) is selected as the classification algorithm to assess wheat freshness. Furthermore, to optimize the SVM hyperparameters, this study introduces the Particle Swarm Optimization (PSO) algorithm, thereby establishing a PSO-SVM classification methodology. For comparison, additional machine learning algorithms, including K-Nearest Neighbors (KNN), the Multilayer Perceptron (MLP), and Decision Trees, are also employed to validate the efficacy of the proposed method. This section gives a brief introduction to these algorithms.

The K-nearest neighbors (KNN) algorithm

The K-Nearest Neighbors (KNN) algorithm is a fundamental technique in machine learning. Its core principle is to classify an unlabeled data point by identifying the K closest points in the training set and using their categories to predict the class of the unknown point. A key advantage of the KNN algorithm is its simplicity and intuitiveness; it requires no complex model training, relying solely on distance measurements between data points. However, the choice of K and of the distance metric is crucial to the algorithm’s performance, and both may need to be tuned for the specific problem at hand.
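For concreteness, a minimal KNN sketch using scikit-learn is shown below; the synthetic feature matrix and the choices of K = 5 and the Euclidean metric are assumptions for demonstration, not values from this study.

```python
# Minimal KNN sketch with scikit-learn; data and K are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the statistical features extracted from the time-series data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K (n_neighbors) and the distance metric are the tunable choices discussed above.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))
```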

Decision tree algorithm

The Decision Tree algorithm is a supervised learning technique that simulates the human decision-making process by performing a series of data splits to achieve classification. The process begins at the root node; each internal node represents a test on a specific attribute, each branch represents an outcome of that test, and each leaf node signifies a category. The ID3 algorithm is a variant of the decision tree that selects the optimal splitting attribute by calculating information gain, aiming to minimize the depth of the tree. The C4.5 algorithm enhances ID3 with improvements such as handling of missing values, pruning techniques, and rule extraction. Overall, the decision tree algorithm is widely used across various fields due to its simplicity and interpretability.
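For illustration, the minimal sketch below fits a decision tree with scikit-learn; note that scikit-learn implements a CART-style tree rather than ID3/C4.5, though criterion="entropy" selects splits by information gain in the same spirit. The synthetic data and depth limit are assumptions for demonstration.

```python
# Minimal decision-tree sketch; scikit-learn implements a CART-style tree,
# and criterion="entropy" chooses splits by information gain, as in ID3/C4.5.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth acts as a simple pre-pruning control; the value 4 is illustrative.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("Decision tree test accuracy:", tree.score(X_test, y_test))
```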

The multilayer perceptron

The Multilayer Perceptron (MLP) is a feed-forward artificial neural network consisting of an input layer, one or more hidden layers, and an output layer. Its learning process involves two main steps: forward propagation and back-propagation. During forward propagation, data flows from the input layer through the hidden layers to the output layer. Within each layer, neurons compute a weighted sum of their inputs and produce an output through an activation function, which then serves as the input to the neurons of the subsequent layer, until the final output is produced. In back-propagation, the weights are adjusted backward through the layers based on the error between the network’s output and the actual target values, with the aim of reducing this error. This process is iterative and continues until the network’s performance meets specific criteria or a predetermined number of training iterations is reached. Activation functions play a crucial role in an MLP, determining its ability to learn complex patterns; popular choices include Sigmoid, Tanh, and ReLU. MLPs have a wide range of applications, addressing problems such as image recognition, speech recognition, and classification.
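As a minimal sketch, the example below trains an MLP classifier with scikit-learn; the layer sizes, ReLU activation, and iteration cap are illustrative assumptions rather than the architecture used in this study.

```python
# Minimal MLP sketch; layer sizes, activation, and iteration cap are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing inputs generally helps gradient-based training converge.
scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)
print("MLP test accuracy:", mlp.score(scaler.transform(X_test), y_test))
```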

Support vector machine

The Support Vector Machine (SVM) is a powerful machine learning algorithm primarily employed for classification and regression tasks. The fundamental idea behind the SVM is to find a hyperplane that optimally separates data points belonging to different classes. In two-dimensional space this hyperplane is a straight line; in three-dimensional space it is a plane; and in higher-dimensional spaces it is referred to as a hyperplane. The objective of the SVM is to find the hyperplane that maximizes the margin on both sides, defined as the distance from the hyperplane to the nearest data points. This margin is known as the “maximum margin”, and it is a key characteristic of the SVM because it helps ensure that the decision boundary generalizes well.

SVMs can handle linearly separable data, for which a distinct boundary exists that effectively separates the classes. For data that is not linearly separable, the SVM employs the kernel trick to map the data into a higher-dimensional space in which a separating hyperplane can be found. Training an SVM involves solving an optimization problem, formulated through its dual problem and Lagrange multipliers; these mathematical foundations give the SVM a solid theoretical grounding. To mitigate the risk of overfitting, SVMs apply regularization by incorporating the hinge loss function, trading off model complexity against the cost of misclassification. Due to their strong performance and theoretical underpinnings, SVMs are widely applied across various fields, including text classification, image recognition, bioinformatics, and financial time-series analysis.

The final optimization process of the Support Vector Machine can be summarized by the following formula:

$$ \max_{\lambda} \left( \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j \, \varphi(x_i) \cdot \varphi(x_j) \right) \quad \text{s.t.} \quad \sum_{i=1}^{n} \lambda_i y_i = 0, \quad 0 \le \lambda_i \le C $$

(1)

In the equation, \(\varphi(x_i) \cdot \varphi(x_j) = K(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right)\) with \(\gamma = \frac{1}{2\sigma^2}\), where K denotes the kernel function, and C and \(\sigma\) are hyperparameters.
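As a minimal illustration of Eq. (1) in practice, the sketch below trains an RBF-kernel SVM with scikit-learn; the synthetic data and the values of C and \(\sigma\) are assumptions for demonstration only, with gamma set to \(1/(2\sigma^2)\) to match the kernel definition above.

```python
# Minimal RBF-kernel SVM sketch; C and sigma are illustrative placeholders,
# with gamma = 1 / (2 * sigma**2) matching the kernel definition in Eq. (1).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

C, sigma = 1.0, 2.0                       # hyperparameters to be tuned (see below)
svm = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma**2))
svm.fit(X_train, y_train)
print("SVM test accuracy:", svm.score(X_test, y_test))
```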

The particle swarm optimization (PSO) algorithm

The Particle Swarm Optimization (PSO) algorithm, proposed by Eberhart and Kennedy in 1995, is an optimization technique grounded in swarm intelligence theory. It simulates the flocking behavior of birds and other groups in which individuals collaboratively search for food, each member continuously adapting its search strategy by learning from its own experience and that of other group members. By facilitating information sharing among individuals within the swarm, the algorithm aims to identify the optimal solution to a problem. In PSO, each particle represents a potential solution within the solution space, characterized by two key attributes: position and velocity. The position encodes a candidate solution, while the velocity determines the direction and magnitude of the particle’s movement. As particles traverse the search space, they adjust their positions and velocities based on both their individual experience and the collective experience of the swarm. Through iterative updates of velocities and positions, the algorithm converges towards a solution that satisfies the specified conditions. Due to its simplicity, rapid convergence, and minimal parameter tuning, PSO has been widely applied in various domains, including function optimization, neural network training, fuzzy system control, and other areas traditionally addressed by genetic algorithms.
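To make the update mechanism concrete, here is a minimal PSO sketch that minimizes a toy objective (the sphere function); the swarm size, inertia weight, and learning factors are illustrative assumptions, not values from this study.

```python
# Minimal PSO sketch on a toy objective; all swarm settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_particles, n_dims, n_iters = 20, 2, 100
w, c1, c2 = 0.7, 1.5, 1.5                      # inertia weight and learning factors

def objective(x):                               # toy fitness: sum of squares (minimize)
    return np.sum(x**2, axis=-1)

x = rng.uniform(-5, 5, (n_particles, n_dims))   # positions: candidate solutions
v = rng.uniform(-1, 1, (n_particles, n_dims))   # velocities: movement per iteration
p_best, p_val = x.copy(), objective(x)          # personal best of each particle
g_best = p_best[np.argmin(p_val)]               # global best of the swarm

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, n_dims))
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x = x + v
    val = objective(x)
    improved = val < p_val                      # update personal and global bests
    p_best[improved], p_val[improved] = x[improved], val[improved]
    g_best = p_best[np.argmin(p_val)]

print("PSO minimum found near:", g_best)
```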

PSO-SVM algorithm

The Support Vector Machine algorithm described above has two important hyperparameters, C and \(\sigma\), which would otherwise need to be determined manually, a process that often requires extensive experimentation and iterative fine-tuning on large datasets. This paper instead utilizes the Particle Swarm Optimization (PSO) method to find the hyperparameter values that maximize the model’s performance. Beginning with random solutions, the optimal values are sought iteratively, with the quality of each solution evaluated by a fitness function.

Then, the SVM algorithm is used to search for the optimal classification result within the solution space. This hybrid approach is referred to as the Particle Swarm Optimization-Support Vector Machine (PSO-SVM) algorithm. The steps of the algorithm can be outlined as follows:

Step 1: The parameters of the PSO are initialized. The position of each particle, denoted as Xi, is randomly assigned within the range [Xmin, Xmax], while its velocity is set within the range [Vmin, Vmax].

Step 2: The position and velocity of each particle are updated, and the fitness of each particle and of the swarm as a whole is re-evaluated.

Step 3: The termination condition is checked; if it is not satisfied, Step 2 is repeated; otherwise, the optimal C and \(\sigma\) of the SVM are obtained.

In the algorithm, the fitness function serves as the criterion for evaluating the accuracy and generalization ability of the SVM model. The Mean Squared Error (MSE) is frequently employed as the fitness function; in that case, a lower error indicates a better model, so the fitness is typically defined as the inverse (or negative) of the error such that better-performing models receive higher fitness values.

The iterative update formula is as follows:

$$ v_i = w v_i + c_1 r_1 \left( p_{best,i} - x_i \right) + c_2 r_2 \left( g_{best} - x_i \right), \qquad x_i = x_i + v_i $$

(2)

In the equation, \(p_{best,i}\) represents the best position found so far by particle i, \(g_{best}\) denotes the global best position of the swarm, c1 and c2 are two parameters known as learning factors, r1 and r2 are random numbers between 0 and 1, and w is called the inertia factor, which regulates the speed of convergence and the ability to locate optimal solutions.
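The condensed sketch below shows how the update rule of Eq. (2) can wrap an SVM to tune C and \(\sigma\), following Steps 1-3 above. Here cross-validated accuracy is used as the fitness (so higher is better); the search ranges, swarm settings, and synthetic data are illustrative assumptions, and the study's own fitness function may differ.

```python
# Condensed PSO-SVM sketch: each particle encodes (C, sigma); cross-validated
# accuracy serves as the fitness. All ranges and swarm settings are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

def fitness(params):
    C, sigma = params
    svm = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma**2))
    return cross_val_score(svm, X, y, cv=5).mean()   # higher is better

# Step 1: initialize positions in [Xmin, Xmax] and velocities in [Vmin, Vmax].
lo, hi = np.array([0.1, 0.1]), np.array([100.0, 10.0])  # search ranges for (C, sigma)
n_particles, n_iters = 10, 20
w, c1, c2 = 0.7, 1.5, 1.5
pos = rng.uniform(lo, hi, (n_particles, 2))
vel = rng.uniform(-1, 1, (n_particles, 2))

p_best = pos.copy()
p_fit = np.array([fitness(p) for p in pos])
g_best = p_best[np.argmax(p_fit)]

# Steps 2-3: update velocities/positions until the iteration budget is exhausted.
for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 2))
    vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
    pos = np.clip(pos + vel, lo, hi)                 # keep particles in range
    fit = np.array([fitness(p) for p in pos])
    improved = fit > p_fit
    p_best[improved], p_fit[improved] = pos[improved], fit[improved]
    g_best = p_best[np.argmax(p_fit)]

print("Best (C, sigma):", g_best, "CV accuracy:", p_fit.max())
```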

The algorithm flow chart is given in Fig. 4.

Fig. 4. The algorithm flow chart of PSO-SVM.

Table 1 summarizes the advantages and disadvantages of the KNN, MLP, Decision Tree, SVM, and PSO-SVM algorithms.

Table 1 Comparative summary of various feature classification algorithms.

