Machine learning and computer vision technology to analyze and discriminate soil samples

Selected properties of the studied soil sample groups are presented in Table 1. The Soil_1 sample class had the highest pH (8.46) and EC (0.827 dS m−1). Soil_5 had the greatest CaCO3 (21.120%) and organic matter (5.95%) contents. Soil_1 and Soil_3 had the greatest phosphorus values (5.494 and 5.179, respectively), while the lowest value (2.545) was found in the Soil_5 sample class. Clay, silt, and sand values ranged from 10.71 to 13.94, 18.18 to 28.28, and 59.60 to 67.91, respectively.

Table 1 Some properties of studied soil samples.

In this study, the analysis followed two approaches. In the first, four traditional machine learning algorithms (Fine Tree, Quadratic SVM, Fine KNN, and Subspace Discriminant) and four neural network algorithms (Narrow Neural Network, Wide Neural Network, Bilayered Neural Network, and Trilayered Neural Network) were run in MATLAB. In the second, the PART, Random Forest, Bayes Net, and Logistic algorithms were run in WEKA. In total, 12 algorithms were applied to the soil samples, and their results were compared in detail.

The overall accuracy of the Fine Tree algorithm was 99.7%. An FNR of 1% was found for the Soil_1 and Soil_4 samples, while the other sample groups were classified with 100% accuracy. The Quadratic SVM and Fine KNN algorithms also reached an overall accuracy of 99.7%. With these algorithms, the Soil_4 and Soil_6 groups were classified with 99% accuracy and the other soil groups with 100%. Among the algorithms run in MATLAB, the Subspace Discriminant algorithm had the greatest overall accuracy (99.8%). With this algorithm, all soil groups were classified with 100% accuracy except for the Soil_4 samples (TPR: 99%) (Fig. 3). Among the neural network algorithms, the overall accuracy of the Wide, Narrow, and Bilayered Neural Networks was 99.5%. In all three, the TPR was 99% for Soil_4, 98% for Soil_6, and 100% for all other soil samples. The overall accuracy of the Trilayered Neural Network (99.2%) was lower than that of the other neural network algorithms. Its FNR values for the Soil_1, Soil_4, Soil_5, and Soil_6 samples were 1%, 1%, 2%, and 1%, respectively, while the Soil_2 and Soil_3 samples were classified with 100% accuracy (Fig. 4).

Figure 3

Confusion matrices—selected traditional machine learning models (Overall accuracies: Fine Tree—99.7%, Quadratic SVM—99.7%, Fine KNN—99.7%, Subspace Discriminant—99.8%).

Figure 4

Confusion matrices—selected neural network models. (Overall accuracies: Narrow Neural Network—99.5%, Wide Neural Network—99.5%, Bilayered Neural Network—99.5%, Trilayered Neural Network—99.2%).
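The overall accuracy, TPR, and FNR values quoted with the confusion matrices can be derived directly from the matrix counts. The following is a minimal Python sketch of that arithmetic, assuming rows hold the true class and columns the predicted class. The 3-class matrix and the function names are hypothetical illustrations and are not data or code from the study.

```python
# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
# The counts are illustrative only and are NOT taken from the study.
confusion = [
    [99, 1, 0],
    [0, 100, 0],
    [2, 0, 98],
]


def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def tpr_fnr(cm):
    """Per-class (TPR, FNR) pairs; FNR is the complement of TPR."""
    rates = []
    for i, row in enumerate(cm):
        n_true = sum(row)          # all samples that truly belong to class i
        tpr = row[i] / n_true      # correctly recognised share of class i
        rates.append((tpr, 1.0 - tpr))
    return rates


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(confusion):.3f}")
    for k, (tpr, fnr) in enumerate(tpr_fnr(confusion)):
        print(f"class {k}: TPR={tpr:.2f} FNR={fnr:.2f}")
```

Because TPR and FNR always sum to one for a class, a class reported with a TPR of 99% has an FNR of 1%.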

Average accuracies and confusion matrices of the PART, Random Forest, Bayes Net, and Logistic algorithms are presented in Table 2, and model performance results are given in Table 3. The PART algorithm classified the studied soil samples with an average accuracy of 99.67%. Precision was 0.990 for Soil_5 and Soil_6 and 1.000 for the other soil groups, which supports this finding. FPR was 0.002 for the Soil_5 and Soil_6 classes and 0.000 for all other soil classes. The highest MCC and F-Measure values (1.000) were found in the Soil_1, Soil_2, and Soil_3 classes, and the lowest in Soil_6 (0.988 and 0.990, respectively).

With the Random Forest algorithm, the average accuracy was 99.67%. MCC and F-Measure were 0.988 and 0.990, respectively, for Soil_4 and Soil_6, and 1.000 for the other soil groups. Soil_1, Soil_2, Soil_3, and Soil_5 had the greatest TPR (1.000), whereas Soil_5 and Soil_6 had the highest FPR (0.002).

Among the WEKA algorithms, Bayes Net had the greatest average accuracy (99.83%) and the best model performance results. The TPR was 0.990 for the Soil_6 samples, and MCC and F-Measure were 0.994 and 0.995, respectively, for Soil_4 and Soil_6, while these values were 1.000 in the other soil groups. Precision was 0.990 for the Soil_4 group and 1.000 for the other soil groups.

Logistic had the lowest average accuracy among the WEKA algorithms (99.50%), with TPR values of 0.990 and 0.980 for Soil_4 and Soil_6, respectively. MCC values were 0.994, 0.988, and 0.982 for Soil_1, Soil_4, and Soil_6, respectively. The lowest F-Measure (0.985) was obtained for Soil_6, followed by Soil_4 (0.990) and Soil_1 (0.995). Soil_1, Soil_4, and Soil_6 had the lowest Precision (0.990), while Soil_2, Soil_3, and Soil_5 had the greatest (1.000).

Table 2 Average accuracies and confusion matrices of soil sample class.
Table 3 Classification performance results of soil sample class.
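The per-class metrics discussed above (Precision, TPR, FPR, F-Measure, and MCC) are commonly obtained by treating each class in turn as "positive" and all remaining classes as "negative". The sketch below shows this one-vs-rest calculation in plain Python under that assumption. The confusion matrix and the function name are hypothetical, and the code is not the WEKA implementation.

```python
import math

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
# Illustrative counts only; they are NOT results from the study.
confusion = [
    [99, 1, 0],
    [0, 100, 0],
    [2, 0, 98],
]


def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k of a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class k predicted as something else
    fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as k
    tn = total - tp - fn - fp

    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f_measure = (2 * precision * tpr / (precision + tpr)
                 if precision + tpr else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "tpr": tpr, "fpr": fpr,
            "f_measure": f_measure, "mcc": mcc}


if __name__ == "__main__":
    for k in range(len(confusion)):
        values = per_class_metrics(confusion, k)
        print(k, {name: round(v, 3) for name, v in values.items()})
```

In this formulation a single false positive lowers a class's Precision and raises its FPR, while a single false negative lowers its TPR; F-Measure and MCC respond to both kinds of error.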

Considering overall accuracy and performance metrics, the most successful models were, in descending order, Bayes Net (99.83%), Subspace Discriminant (99.80%), and Quadratic SVM, Fine KNN, and Fine Tree (99.7% each). Among the neural network models, the Narrow, Wide, and Bilayered Neural Networks were the most successful (99.5%), while the Trilayered Neural Network was the least successful (99.2%).

Pham et al.23 used AdaBoost, Tree, and artificial neural network (ANN) models to classify soil from soil properties (moisture content, specific gravity, clay content, plastic, void ratio, and liquid limit parameters). The authors reported that the AdaBoost model classified the soil well, with only 11 of the 88 samples incorrectly identified. Similar to the present study, Barman & Choudhury24 focused on the texture properties of soil and classified soils by image analysis and a support vector machine. In their multi-class classification, the average accuracy ranged between 81.25% and 96.84%. Li et al.22 reported soil classification based on machine learning and compared SVM with CNN (Convolutional Neural Network). The CNN was more successful than the SVM, with classification results between 85.91% and 95.58% for the CNN and between 88.37% and 91.16% for the SVM. In line with the present study, Azizi et al.25 used deep learning to classify aggregates of any size into specific classes. The authors trained the Inception-v4, ResNet50, VggNet16, and CNN architectures; all accuracies were above 95%, and the greatest accuracy was obtained with ResNet50 (98.72%).

Bahrens et al.26 compared a deep learning approach with random forest for digital soil mapping. In experiments on three different datasets, the extended Gaussian pyramid and mixed scaling produced the best-performing set of covariates, and deep learning produced the most accurate estimates, on average 4–7 percent more accurate than random forest. Mengistu and Alemayehu27 presented soil characterization and classification using a sensor network approach and computer vision. Characterization and classification were performed by a back-propagation neural network with 7 inputs and 6 neurons in its output layer, which achieved 89.7% accuracy.

Khatti and Grover42 compared regression and ANN models from the literature that relate the index properties of soil to the CBR (California Bearing Ratio). The performance (R) of the simple regression, multiple regression, and ANN models ranged between 0.5339 and 0.9736, with the highest value (0.9736) obtained by an ANN model. Khatti and Grover43 also determined appropriate ANN hyperparameters for the best prediction of soil geotechnical properties. They showed that ANN models based on the LM (Levenberg-Marquardt), BFG (BFGS Quasi-Newton), and SCG (Scaled Conjugate Gradient) algorithms require data sets with strong (0.61–0.80) to very strong (0.81–1.00) correlations. In contrast, ANN models based on the GDM (Gradient Descent with Momentum), GD (Gradient Descent), and GDA (Gradient Descent Algorithm with Adaptive Learning) algorithms need only data sets with strong correlations to achieve a performance higher than 0.90.

Bahmed et al.44 evaluated Gaussian process regression (GPR), ensemble tree (ET), support vector machine (SVM), and decision tree (DT) models, together with hybrid relevance vector machine (RVM) models, to determine the best-performing model for predicting the unconfined compressive strength (UCS) of soil. The authors measured model performance with three new index performance measures: the a20-index, the index of scatter (IOS), and the index of agreement (IOA). The PSO-optimized Laplacian kernel-based RVM model UCS16 was identified as the optimal model after it outperformed all other models in terms of a20-index (testing = 67.30, validation = 55.95), IOS (testing = 0.2799, validation = 0.3506), and IOA (testing = 0.8634, validation = 0.7795).
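For readers unfamiliar with the three indices named above, the sketch below computes them in Python using definitions that are common in the literature: the a20-index as the share of samples whose observed-to-predicted ratio lies within 0.80 to 1.20, the IOS as RMSE divided by the mean observed value, and the IOA as Willmott's index of agreement. These definitions, the percent scaling of the a20-index, and all function names are assumptions made for illustration; the cited study may define the indices differently.

```python
import math


def a20_index(observed, predicted):
    """Percent of samples with observed/predicted ratio in [0.80, 1.20] (assumed definition)."""
    hits = sum(
        1 for o, p in zip(observed, predicted)
        if p != 0 and 0.80 <= o / p <= 1.20
    )
    return 100.0 * hits / len(observed)


def index_of_scatter(observed, predicted):
    """RMSE normalised by the mean observed value (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement; 1.0 means perfect agreement (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


if __name__ == "__main__":
    # Hypothetical UCS values, for illustration only.
    obs = [100.0, 200.0, 300.0, 400.0]
    pred = [110.0, 190.0, 400.0, 400.0]
    print(a20_index(obs, pred), index_of_scatter(obs, pred), index_of_agreement(obs, pred))
```

Under these definitions a higher a20-index and IOA indicate a better model, whereas a lower IOS indicates less scatter around the observations.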

Soil texture is an important and highly variable physical attribute. It has a significant impact on numerous other soil properties, such as fertility and water retention capacity, which are highly relevant for agricultural productivity45,46. Thus, understanding soil texture diversity is essential for managing soil, developing agricultural policies, and monitoring how land use affects the ecosystem46,47. Accurate evaluations of soil groups and soil loss are essential for reducing the effects of erosion and enhancing the fertility of agricultural fields48. In the present study, many backgrounds were tried while acquiring images. To obtain better results, background colors that differed completely from the sample color were preferred for the surface on which the petri dish was placed. This was the most important limitation of the study. The developed techniques can be easily applied to different sample groups. In addition, the investigated technique can be combined with spectroscopic approaches to achieve higher accuracy49, so that very similar objects can be distinguished more easily. An important point here is to apply feature selection and determine the proper input features when working with many different band groups and color channels. Furthermore, spectra and textures can be reduced by down-sampling multi-feature data. In this way, the greatest classification or prediction success can be achieved.

Effective soil monitoring is essential for achieving sustainable development. Machine learning and computer vision technologies are revolutionizing soil texture analysis by making it effective, accurate, and accessible. These advances simplify soil assessment processes and increase the reliability of soil texture estimates, which is vital for informed decision-making in agricultural and environmental management. The integration of citizen science initiatives increases the accuracy and local validity of digital soil maps50. Machine learning and computer vision advance the understanding and management of soil resources and provide solutions for sustainable land use practices and precision agriculture.
