Automatic restoration and reconstruction of defective tooth based on deep learning technology | BMC Oral Health


This study was reviewed by the Scientific Ethics Committee of Guangzhou University and has been confirmed to fully comply with the Declaration of Helsinki and the relevant regulations of the People’s Republic of China on biomedical human research. The approval number is GUSE [2024]095. To demonstrate the effectiveness of our method, we conduct an extensive series of experiments that evaluate its performance across several dimensions, including defect levels, types of tooth roots, inference speed, and generalization ability, as described in the following sections.

Datasets and training setting

The proposed framework for tooth restoration and reconstruction integrates three distinct network models from various domains. To ensure optimal training performance for each network, three independent datasets were constructed, corresponding to the tooth image restoration network, the tooth image preprocessing network, and the tooth 3D reconstruction network. These datasets are designated as Dataset1, Dataset2, and Dataset3. The construction process and methods for these datasets are described in detail below.

Dataset1

A total of 166 freshly extracted human teeth, including canines and molars, were collected from the Guangdong University of Technology Affiliated Dental Hospital and the Sun Yat-sen University Affiliated Dental Hospital. Each tooth was thoroughly cleaned and dried before being photographed using an optical microscope equipped with an industrial camera (HIKVISION, China), resulting in 2D images with a resolution of 300 × 300 pixels. During evaluation and validation, higher-resolution images were also employed to assess the robustness and generalization ability of the proposed framework under more realistic conditions. Images were captured from frontal, dorsal, and lateral perspectives, yielding 5,296 complete 2D images of teeth. Subsequently, corresponding defective tooth images were generated based on LaMa [21]. To systematically assess the effectiveness of the proposed method, these images were categorized into slight, moderate, and severe defect classes based on the proportion of the damaged area relative to the total tooth area (see Table 1).

Table 1 The definition and classification of levels of tooth defect
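As an illustration, the area-ratio criterion can be sketched as follows; the threshold values here are hypothetical placeholders, since the actual class boundaries are defined in Table 1:

```python
import numpy as np

# Hypothetical thresholds -- the actual class boundaries are given in Table 1.
SLIGHT_MAX = 0.15    # damaged area <= 15% of total tooth area
MODERATE_MAX = 0.40  # damaged area <= 40% of total tooth area

def classify_defect(tooth_mask: np.ndarray, defect_mask: np.ndarray) -> str:
    """Classify the defect level from binary masks of the whole tooth and
    of the damaged region (True = foreground pixel)."""
    tooth_area = tooth_mask.sum()
    if tooth_area == 0:
        raise ValueError("empty tooth mask")
    ratio = defect_mask.sum() / tooth_area
    if ratio <= SLIGHT_MAX:
        return "slight"
    if ratio <= MODERATE_MAX:
        return "moderate"
    return "severe"
```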

Dataset2

This dataset is primarily used for training the tooth image preprocessing network, facilitating the conversion between two image styles. It comprises two types of 2D images from different domains: real RGB images of teeth and synthetic grayscale images, both sourced from Dataset1 and Dataset3. The total number of images is 10,446.

Dataset3

This 3D tooth dataset was constructed following the methodology of the standard public 3D dataset ShapeNet [34]. First, a high-precision 3D laser scanner (SIMSCAN, China) was utilized to capture the accurate 3D shape and structural information of the collected teeth, saving the data in .obj file format. These 3D tooth models were then imported into the open-source rendering software Blender to generate high-quality 2D images while recording the camera’s pose information (including elevation, azimuth, and height). This pose information serves as critical input data for network training, linking the 2D images to the 3D models. Ultimately, 5,150 synthetic tooth images and corresponding 3D point cloud files were generated. Specific details and image examples of the three datasets are outlined in Table 2. To expand the dataset size and enhance the model’s robustness, data augmentation techniques such as horizontal and vertical flipping, as well as random brightness adjustments, were applied. Furthermore, the datasets were divided into training, validation, and test sets in an 8:1:1 ratio.

Table 2 The related information and image examples of the three datasets
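The 8:1:1 split and the augmentation operations described above can be sketched as follows; the brightness-jitter range is an assumed placeholder, not necessarily the value used in the study:

```python
import random
import numpy as np

def augment(img: np.ndarray, rng: random.Random) -> np.ndarray:
    """Random horizontal/vertical flips and brightness jitter on a uint8 image."""
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    factor = rng.uniform(0.8, 1.2)  # hypothetical brightness range
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def split_811(items, seed=0):
    """Shuffle and split a dataset into train/val/test at an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```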

Notably, to minimize domain shift during dataset creation, all tooth images were captured using the same industrial camera and optical microscope under a fixed configuration. The scanning process for 3D models was also conducted using the same scanner, and all acquisition procedures were operated by the same trained technician. These strict controls ensured consistency in imaging conditions and minimized inter-device variability, thereby improving the reliability of model training and evaluation.

Training settings

The experiments are conducted using the Python programming language on a high-performance workstation. Because three distinct network models are involved, the software configurations, hardware requirements, and training parameter settings differ among them; specific details are provided in Table 3. For the network training process, several key hyperparameters are obtained through transfer learning [35], which allowed us to leverage pre-trained models and fine-tune them for our specific task, thereby improving training efficiency and model performance.

Table 3 The training setting parameters of the three different network models

Experimental design of tooth image restoration network

In this section, we first conducted convergence experiments to determine the optimal training iteration for each model. Figure 5 presents the FID score curves of GLCIC, LaMa, and CTSDG models throughout the training process. The optimal number of iterations was determined using a periodic validation strategy, where model performance was evaluated on the validation set at fixed intervals. Specifically, the model was assessed at each predefined iteration step by monitoring the trend of Fréchet Inception Distance (FID) [36] scores. According to the minimization criterion of the FID metric (lower values indicate a closer match between the generated and real data distributions), the optimal number of iterations for each model was identified. As shown in the figure, CTSDG achieves the lowest FID score under its optimal iteration configuration, demonstrating the best generative performance among the three.

Fig. 5 The FID–Iteration curves of three image inpainting models
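The minimization criterion for selecting the optimal iteration can be expressed as a small helper; the FID values in the usage example are illustrative only:

```python
def best_checkpoint(fid_by_iteration: dict[int, float]) -> tuple[int, float]:
    """Return the (iteration, fid) pair with the lowest FID score,
    implementing the minimization criterion used during periodic
    validation (lower FID = closer match to the real distribution)."""
    it = min(fid_by_iteration, key=fid_by_iteration.get)
    return it, fid_by_iteration[it]

# Illustrative usage with made-up validation scores:
# best_checkpoint({10000: 45.2, 20000: 30.1, 30000: 33.8}) -> (20000, 30.1)
```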

Subsequently, experiments are conducted to evaluate the performance of the GLCIC, LaMa, and CTSDG models in addressing varying degrees of tooth defects. Both qualitative and quantitative analyses are performed on the results, focusing on slight, moderate, and severe tooth defects. Additionally, the restoration effects are assessed in cases where both the tooth root and crown are defective. Following the evaluation criteria from LaMa and CTSDG, Peak Signal-to-Noise Ratio (PSNR) [37], Structural Similarity Index Measure (SSIM) [37], Learned Perceptual Image Patch Similarity (LPIPS) [38], and FID are selected as assessment metrics to comprehensively evaluate the restoration performance and generalization capability of these models, facilitating the identification of the most suitable model for tooth image restoration tasks. Their definitions are given in Table 4.

Table 4 The evaluation metrics for tooth image restoration network
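For instance, PSNR (reported in dB, higher is better) can be computed directly from the mean squared error; this is a generic sketch, not the exact evaluation code used in the study:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray,
         max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between a ground-truth image and a
    restored image; higher is better, infinite for identical images."""
    mse = np.mean((reference.astype(np.float64) -
                   restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```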

A. Slight defect

Figure 6 illustrates the restoration effects of the GLCIC, LaMa, and CTSDG models on teeth with slight defects, for both single and multiple tooth roots. The results indicate that all three models effectively restore the damaged regions. However, LaMa and CTSDG demonstrate superior performance overall, with the restored regions closely resembling the ground truth. Quantitative results in Tables 5 and 6 further corroborate this conclusion, showing that LaMa and CTSDG outperform GLCIC across all metrics. CTSDG also outperforms LaMa across all four metrics, with a particularly notable improvement in FID, where it achieves a 20% better score.

Fig. 6 Restoration results of different models on slight defects

Table 5 PSNR and SSIM comparison of different models under different levels of tooth defect
Table 6 LPIPS and FID comparison of different models under different levels of tooth defect

B. Moderate defect

Figure 7 presents the restoration results for teeth with moderate defects using the GLCIC, LaMa, and CTSDG models. As the defective area increases, the restoration quality of GLCIC becomes increasingly blurred, resulting in unnatural textures. In contrast, LaMa and CTSDG generate natural and realistic restoration results. Quantitative analyses reveal minimal differences in all metrics between LaMa and CTSDG. Notably, in terms of detail restoration, CTSDG exhibits more reasonable structural shapes and clearer textures, attributed to its dual-branch discriminator design, which supervises both pixel and edge effects for global optimization, thereby maintaining image clarity while generating realistic textures.

Fig. 7 Restoration results of different models on moderate defects

C. Severe defect

Figure 8 showcases the performance of the three models under severe tooth defect conditions. GLCIC fails to provide effective restoration, leading to excessive smoothing and structural distortion. Due to the large defective area, the available local information is reduced, resulting in a decline in restoration quality for both LaMa and CTSDG. However, they still produce impressive results. Interestingly, Tables 5 and 6 indicate that CTSDG slightly outperforms LaMa in severe defect scenarios. This can be attributed to CTSDG’s CFA module, which refines intricate details by drawing on distant spatial features, an advantage that is especially valuable in severe defect cases where preserving fine structural features is crucial.

Fig. 8 Restoration results of different models on severe defects

D. Simultaneous defect to tooth root and crown

Figure 9 displays the restoration results for cases with simultaneous defects to both the tooth root and crown. GLCIC continues to exhibit blurriness in crown restoration, while both LaMa and CTSDG effectively restore the defects to the root and crown, yielding satisfactory results. However, in terms of detail, both models perform better on root restoration, indicating an enhanced capability in addressing root defects.

Fig. 9 Restoration results of different models on tooth root and crown defects

After analyzing the qualitative and quantitative results in subsections A to D, CTSDG consistently demonstrates superior performance across various levels and types of defects. Its ability to capture fine structural details and generate more realistic textures makes it particularly effective in handling both slight and severe defects. This adaptability, combined with its higher-quality restorations, solidifies CTSDG as the more reliable choice for our application.

E. Robustness under varying illumination conditions

Based on the preceding qualitative and quantitative experiments, CTSDG demonstrated superior restoration performance under various defect conditions and was therefore selected as the core component of the tooth image restoration network in this study. However, dental image acquisition in clinical environments is often affected by varying lighting, which can interfere with the model’s ability to accurately identify and restore defective regions. To evaluate the robustness of CTSDG under the illumination conditions commonly encountered in practice, we adjusted the brightness levels of the input images to create low, standard, and high illumination conditions. The restoration performance of CTSDG was then evaluated on both single-root and multi-root teeth. As illustrated in Figs. 10 and 11, CTSDG consistently produced high-quality restorations across all lighting scenarios. The restored results maintained coherent morphology, fine texture, and clear edge boundaries, indicating strong robustness to illumination changes and confirming CTSDG’s practical applicability in real-world clinical settings.
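The brightness manipulation used to simulate the three illumination conditions can be sketched as simple intensity scaling; the scaling factors below are assumed placeholders, not the values used in the study:

```python
import numpy as np

def adjust_illumination(img: np.ndarray, factor: float) -> np.ndarray:
    """Simulate an illumination change by scaling the intensities of a
    uint8 image; factor < 1 darkens, factor > 1 brightens."""
    out = np.rint(img.astype(np.float32) * factor)
    return np.clip(out, 0, 255).astype(np.uint8)

# Hypothetical factors for the low / standard / high conditions.
LOW, STANDARD, HIGH = 0.6, 1.0, 1.4
```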

Fig. 10 The restoration results of single-root teeth under different illumination conditions. Columns a and c show the single-root tooth defect images under different light intensities; columns b and d show the corresponding restoration results

Fig. 11 The restoration results of multi-root teeth under different illumination conditions. Columns a and c show the multi-root tooth defect images under different light intensities; columns b and d show the corresponding restoration results

Experimental design of tooth image preprocessing network

In this section, we first conducted convergence experiments to determine the optimal training epochs for each model. Figure 12 shows the Kernel Inception Distance (KID) [39] score curves of four models including ACLGAN, UVCGAN, CycleGAN, and the LSeSim loss-based CycleGAN throughout the training process. Similar to the approach used in Sect. Experimental design of tooth image restoration network, a periodic validation strategy was employed, wherein model performance was evaluated at fixed intervals on the validation set. It can be observed that the LSeSim loss-based CycleGAN achieves the lowest KID score under its optimal training configuration, indicating superior performance in image translation.

Fig. 12 The KID–Epoch curves of four image translation models

As illustrated in Figs. 13 and 14, various models including ACLGAN, UVCGAN, CycleGAN, and LSeSim loss-based CycleGAN are compared to assess their qualitative results in converting complete RGB images of single-root and multi-root teeth to grayscale images. Both ACLGAN and UVCGAN fail to produce satisfactory conversion results for single- and multi-root teeth, exhibiting distortions in the shape and structure of the original images. While CycleGAN performs adequately with single-root teeth, its performance declines with multi-root teeth: it cannot accurately preserve the shape of the source images, leading to distortions and misalignments of multi-root tooth features. In contrast, the introduction of LSeSim loss into CycleGAN effectively addresses these issues. The conversion results not only exhibit higher quality but also demonstrate greater stylistic consistency with the target images. Specifically, the hue is adjusted based solely on the style of the target images, while the structural integrity and shape of the source tooth images are maintained. This is because the LSeSim loss is designed to preserve structural shape while permitting reasonable appearance adjustments, without penalizing appropriate appearance modifications in the target domain.

Fig. 13 Qualitative results of different models on single-root teeth preprocessing

Fig. 14 Qualitative results of different models on multi-root teeth preprocessing

For quantitative evaluation, two mainstream metrics for I2I translation performance are employed: FID and KID. FID calculates the distribution differences, including mean and covariance, between real and generated images, while KID serves as an advanced metric for assessing GAN convergence and image quality. Lower scores in both metrics indicate that the generated images are closer to the target domain. Additionally, the Inception Score (IS) [40] is used as a supplementary assessment metric, with higher IS values reflecting better image quality. Their formulas are shown in Table 7.

Table 7 The evaluation metrics for tooth image preprocessing network
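As an illustration, KID can be estimated as the unbiased squared MMD between Inception feature sets with the standard cubic polynomial kernel; this sketch assumes the feature vectors have already been extracted and is not the study's exact evaluation code:

```python
import numpy as np

def poly_kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 on rows of a and b."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(real: np.ndarray, fake: np.ndarray) -> float:
    """Unbiased KID estimate (squared MMD) between two sets of feature
    vectors of shape (n, d); lower values mean closer distributions."""
    n, m = len(real), len(fake)
    k_rr = poly_kernel(real, real)
    k_ff = poly_kernel(fake, fake)
    k_rf = poly_kernel(real, fake)
    # Exclude the diagonal for the unbiased within-set terms.
    term_r = (k_rr.sum() - np.trace(k_rr)) / (n * (n - 1))
    term_f = (k_ff.sum() - np.trace(k_ff)) / (m * (m - 1))
    return float(term_r + term_f - 2.0 * k_rf.mean())
```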

According to the results presented in Table 8, the CycleGAN model combined with LSeSim loss outperforms other methods across all metrics, particularly demonstrating significantly lower FID and KID scores, indicating superior image conversion capabilities. Therefore, we select the LSeSim loss-based CycleGAN model as the image preprocessing network due to its outstanding performance in preserving structural integrity during domain translation.

Table 8 Quantitative results of different models in tooth preprocessing tasks

Experimental design of tooth 3D reconstruction network

Based on the analyses in Sects. Experimental design of tooth image restoration network and Experimental design of tooth image preprocessing network, CTSDG and LSeSim loss-based CycleGAN are selected as the core frameworks for the tooth image restoration and preprocessing networks. In this section, the integration of the three stages of tooth restoration and reconstruction is presented. As shown in Figs. 15 and 16, the qualitative results illustrate the progression of defective teeth from image restoration and preprocessing to 3D reconstruction, considering various defect levels and forms: slight, moderate, severe, and simultaneous defect to the tooth root and crown.

Fig. 15 Qualitative results of different defective levels of teeth from image restoration and preprocessing to 3D reconstruction. Specifically, columns 4 to 6 show the 3D reconstruction results of Pixel2Mesh based on different backbone networks

Fig. 16 Qualitative results of defective tooth root and crown from image restoration and preprocessing to 3D reconstruction

For the tooth 3D reconstruction network, the Pixel2Mesh model is employed, with its backbone network modified to enhance reconstruction accuracy. Specifically, the reconstruction results of VGG16 (the original backbone), ResNet18, and ResNet50 are compared. Columns 4 to 6 of Figs. 15 and 16 display the reconstruction outcomes for each model. The results indicate that VGG16 and ResNet18 struggle to reconstruct the complex structures of multi-root teeth, producing rough models with insufficient detail. In contrast, the Pixel2Mesh model based on ResNet50 successfully reconstructs the tooth models, yielding smoother continuous surfaces, more precise geometries, and enhanced surface details. Additionally, to intuitively compare the 3D reconstruction results with the ground truth, we visualize the surface deviation between the predicted and actual tooth meshes using point cloud color difference maps. As shown in Figs. 17 and 18, the point cloud error maps of multi-root and single-root teeth reconstructed using Pixel2Mesh with different backbone networks demonstrate that the ResNet50-based results exhibit the smallest difference from the real point cloud, particularly in regions requiring significant deformation, such as the tooth root. In these areas, the error values fall within the 0–0.5 mm range, highlighting the model’s ability to capture complex structural changes with high precision. This improvement is attributed to the deeper architecture of ResNet50, which enables the learning of more complex features, allowing the model to capture subtle details of the tooth surface and significantly improve reconstruction accuracy.
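The per-point errors that drive such color difference maps can be sketched as nearest-neighbor distances between the predicted and ground-truth point clouds (brute force here for clarity; a k-d tree would be preferable for large clouds):

```python
import numpy as np

def nn_errors(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """For each predicted point (rows of an (n, 3) array), return the
    Euclidean distance to its nearest ground-truth point; these per-point
    errors are what a color difference map visualizes."""
    # (n, m) matrix of pairwise distances between the two clouds.
    diff = pred[:, None, :] - gt[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
```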

Fig. 17 The point-cloud color maps of multi-root teeth reconstructed by Pixel2Mesh based on different backbone networks

Fig. 18 The point-cloud color maps of single-root teeth reconstructed by Pixel2Mesh based on different backbone networks

To further evaluate the geometric quality of the reconstructed 3D tooth models, particularly the surface smoothness and potential existence of mesh holes, we conducted a qualitative comparison of reconstruction results using three different backbone networks in the Pixel2Mesh framework. As shown in Fig. 19, we present both posterior and left views of representative 3D reconstructions. From the visualizations, it can be observed that models using VGG16 and ResNet18 as backbones exhibit surface discontinuities, including jagged regions and visible mesh holes. These discontinuities indicate incomplete mesh connections and rough geometry, which may negatively impact downstream clinical applications. In contrast, the reconstruction based on ResNet50 demonstrates smooth and continuous surfaces without noticeable holes or structural gaps. This confirms that the deeper feature extraction capacity of ResNet50 contributes to a more accurate and geometrically reliable tooth surface reconstruction.

Fig. 19 The posterior and left views of 3D tooth models reconstructed using Pixel2Mesh based on different backbone networks

For evaluation metrics, standard and widely used measures of 3D reconstruction are employed, including F-Score [41], Chamfer Distance (CD), and Earth Mover’s Distance (EMD) [42]. Their mathematical expressions can be seen in Table 9.

Table 9 The evaluation metrics for tooth 3D reconstruction network
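For reference, Chamfer Distance and the F-Score at a threshold tau can be sketched for small point clouds as follows; implementations vary in whether squared or unsquared distances are used, so this is a generic version rather than the study's exact evaluation code:

```python
import numpy as np

def pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(n, m) matrix of Euclidean distances between rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds (n, 3) and (m, 3)."""
    d = pairwise_dist(a, b)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def f_score(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """F-Score at threshold tau: harmonic mean of precision (fraction of
    predicted points within tau of the ground truth) and recall (the reverse)."""
    d = pairwise_dist(pred, gt)
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```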

Tables 10 and 11 show that regardless of the defect situation, the Pixel2Mesh model based on ResNet50 achieves higher F-Score, with lower CD and EMD values, indicating superior reconstruction performance. Specifically, compared to the original model (VGG16-based), the average F-Score, CD, and EMD are improved by 26.5%, 34.7%, and 22.3%, respectively. Therefore, we select the ResNet50-based Pixel2Mesh model for tooth 3D reconstruction due to its ability to consistently deliver superior performance across different defect scenarios.

Table 10 The values of F-Score at different defect situations with different thresholds where \(\tau ={10}^{-4}\)
Table 11 The values of CD and EMD at different defect situations

To further investigate the regional accuracy of the reconstructed mesh, we conducted a statistical analysis of reconstruction errors across three key anatomical regions: crown, cervical, and root. From Fig. 20, we observe that the cervical region consistently exhibits the lowest reconstruction error, followed by the root region, while the crown region shows the highest error. This indicates that the network performs better in transitional regions (like the cervical zone) where geometric consistency is easier to learn. Additionally, across all three regions, the Pixel2Mesh model based on ResNet50 shows lower overall errors and tighter error distributions compared to those based on VGG16 and ResNet18.

Fig. 20 Error box plots of the Pixel2Mesh model based on different backbone networks in the crown, cervical, and root structure areas

Application performance testing

Inference speed evaluation

From a clinical perspective, the inference speed of the tooth restoration and reconstruction process is crucial for enhancing the patient experience. In situations requiring rapid decision-making, practitioners often need tooth models quickly to formulate treatment plans. Therefore, the inference speed of the proposed method is evaluated by recording the total time required for the three stages (image restoration, preprocessing, and 3D reconstruction) under different defect forms. The results presented in Table 12 indicate that the overall processing time is approximately 12 s, with minimal variation across different levels of tooth defect. Compared to traditional manual restoration techniques and CAD-based reconstruction methods, our approach therefore offers significant advantages in time efficiency, enabling the generation of high-quality reconstruction results within a short timeframe. Traditional manual restoration is often time-consuming, requiring professionals to spend hours or even days, particularly when dealing with complex defects, while CAD-based methods demand extensive interaction and specialized design expertise, limiting their convenience and responsiveness in real-time applications.

Table 12 The overall time(s) needed for the three stages including image restoration, preprocessing, and 3D reconstruction
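The per-stage timing measurement can be sketched with a small helper; the stage callables here are placeholders for the actual network inference functions:

```python
import time

def timed_pipeline(image, stages):
    """Run the (restoration, preprocessing, reconstruction) stages in order,
    returning the final output and per-stage wall-clock times in seconds.
    `stages` is a list of (name, callable) pairs standing in for the real
    network inference functions."""
    timings, x = {}, image
    for name, fn in stages:
        t0 = time.perf_counter()
        x = fn(x)
        timings[name] = time.perf_counter() - t0
    return x, timings
```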

Generalization evaluation

To comprehensively evaluate the generalization performance of the proposed tooth restoration and reconstruction framework, this section investigates its effectiveness on tooth types not included in the training dataset. During the experiments, several defective incisor and molar samples—unseen during training—were selected and input into the pre-trained tooth restoration network. The restoration results in the defective regions were then assessed. As shown in Figs. 21 and 22, the model demonstrated strong adaptability to these novel tooth types, successfully generating accurate restorations in the defective regions while maintaining high visual realism and texture consistency. Furthermore, in the 3D reconstruction task, the preprocessed incisor images were fed into the reconstruction network to evaluate the similarity between the generated 3D models and the ground-truth anatomical structures. The experimental results indicate that the model accurately reconstructed the overall geometry of both incisors and molars, effectively capturing their realistic anatomical features. These findings further confirm that the proposed method retains a high level of reconstruction accuracy and exhibits strong generalization ability when applied to previously unseen tooth types.

Fig. 21 Restoration and reconstruction results of intraoral incisors

Fig. 22 Restoration and reconstruction results of novel molars

Noise robustness

In practical applications, image acquisition is often affected by various types of noise due to sensor limitations, lighting variations, or patient movement. To verify the robustness of the proposed framework under such conditions, we conducted experiments to evaluate the restoration and reconstruction performance when the input images are corrupted with Gaussian noise of varying intensities. Specifically, Gaussian noise was added to the input RGB images of defective teeth at different Signal-to-Noise Ratio (SNR) levels, and the proposed restoration and reconstruction pipeline was then applied to these noisy inputs. As shown in Figs. 23 and 24, whether restoring single-root or multi-root teeth, the framework maintains high restoration and reconstruction quality even when the input images are degraded by noise. In the restoration stage, CTSDG consistently localizes and completes the missing regions, demonstrating strong resilience to noise perturbation. During the preprocessing stage, the CycleGAN model with LSeSim loss effectively suppresses noise artifacts while preserving structural details, resulting in clean grayscale images. Subsequently, the 3D reconstruction network accurately infers complete mesh structures regardless of noise level. These results highlight the robustness of the proposed method in noisy environments, ensuring reliable performance in real-world clinical settings where acquisition conditions may be less controlled.
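The noise injection at a target SNR can be sketched as follows; the noise variance is derived from the mean signal power and the requested SNR in dB:

```python
import numpy as np

def add_gaussian_noise(img: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Corrupt a uint8 image with zero-mean Gaussian noise whose power is
    chosen so that the result has the requested signal-to-noise ratio in dB."""
    rng = np.random.default_rng() if rng is None else rng
    x = img.astype(np.float64)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noisy = x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```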

Fig. 23 Qualitative results of restoration and reconstruction of single-root teeth under different Gaussian noise conditions

Fig. 24 Qualitative results of restoration and reconstruction of multi-root teeth under different Gaussian noise conditions

Failed cases

Despite the overall effectiveness of our proposed framework, we observed several failure cases under extreme or complex input conditions. As illustrated in Fig. 25, these cases reveal the limitations of the current method and point to opportunities for future improvement. When the defective tooth retains only the crown and almost the entire root is missing, the inpainting model fails to restore the correct anatomical structure; with so few image features available, the network struggles to infer accurate geometry. Furthermore, when the tooth root is not only completely missing but also has an uncommon or highly irregular geometry, the network fails to predict the correct topology. Finally, when the defect involves a complete loss of multiple roots in multi-root teeth, the network occasionally misclassifies the tooth as a single-root type, resulting in an oversimplified reconstructed mesh that does not match the true tooth morphology. These failure cases highlight current model limitations and emphasize the need for enhanced robustness, particularly under sparse input conditions and rare tooth morphologies.

Fig. 25 Failed cases of tooth restoration
