End-to-End Multi-Task Deep Learning Menggunakan SSDLite-MobileViT untuk Mengestimasi Fenotipe Pertumbuhan Tanaman Selada (Lactuca sativa) dari Citra RGB-D
End-to-End Multi-Task Deep Learning using SSDLite-MobileViT for Estimating Lettuce (Lactuca sativa) Growth Phenotypes from RGB-D Images

Date
2025
Author
Athariq, Ahmad Ghalib
Advisor(s)
Nainggolan, Pauzi Ibrahim
Seniman
Abstract
Traditional lettuce phenotyping methods are often destructive and labor-intensive, while many existing deep learning methods are too computationally demanding for mobile devices. This study develops an efficient end-to-end multi-task deep learning system that estimates two key growth-related traits of lettuce (Lactuca sativa), fresh weight (FW) and height (H), from RGB-D images. The system uses the lightweight SSDLite-MobileViT architecture, which is designed for inference on mobile platforms. The proposed method employs a dual-backbone architecture that processes RGB and depth data separately and then fuses the resulting features with Attentional Feature Fusion (AFF), with the aim of improving fusion performance. The model is trained to perform object detection and phenotype regression simultaneously on a combined dataset and is evaluated with 5-fold cross-validation using Average Precision (AP), the Coefficient of Determination (R²), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE). The final model is deployed on Android via the ExecuTorch runtime. The dual-backbone architecture achieves superior performance, with an R² of 96.66% for height estimation, an AP of 74.18%, and an Average Recall (AR) of 79.89%. Depth fusion reduces the MAPE for H estimation by 33.33% compared to the baseline model. CPU inference time on mobile devices ranges from 380 to 605 ms, indicating practical feasibility for real-world deployment. However, further development is needed to improve the reliability of the estimates and the generalizability of the model. While H estimation shows promising performance, FW estimation remains challenging, as evidenced by its relatively high MAPE.
In addition, the model's object detection ability remains limited, especially for small or large lettuce plants, and the test data cover only the early phase of plant growth. These findings demonstrate the potential of the proposed approach while underscoring the need for more representative datasets and a more effective loss function.
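
For readers unfamiliar with the regression metrics named in the abstract, the following is a minimal sketch of how R², MAPE, and RMSE are conventionally computed. It assumes the standard textbook definitions; the thesis may compute them differently. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the thesis. Under these definitions, an R² reported as 96.66% would correspond to a value of 0.9666.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mape_percent, rmse) for paired ground-truth and predicted values.

    Assumes standard definitions; requires non-zero ground-truth values (for MAPE)
    and non-constant ground truth (for R^2).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    r2 = 1.0 - ss_res / ss_tot                      # Coefficient of Determination
    mape = 100.0 / n * sum(abs((t - p) / t)         # Mean Absolute Percentage Error (%)
                           for t, p in zip(y_true, y_pred))
    rmse = math.sqrt(ss_res / n)                    # Root Mean Square Error

    return r2, mape, rmse


# Illustrative values only (hypothetical heights in cm, not thesis data)
heights_true = [10.0, 20.0, 30.0]
heights_pred = [12.0, 18.0, 33.0]
r2, mape, rmse = regression_metrics(heights_true, heights_pred)
print(f"R2={r2:.3f}  MAPE={mape:.2f}%  RMSE={rmse:.3f}")
```

MAPE is scale-free, which makes it convenient for comparing H and FW errors, but it grows quickly when the ground-truth values are small. Under these definitions, that sensitivity is one possible reason a trait measured on very young plants could show a high MAPE.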
Collections
- Undergraduate Theses [1248]