dc.contributor.advisor | Nasution, Umaya Ramadhani Putri | |
dc.contributor.advisor | Sawaluddin | |
dc.contributor.author | Manurung, Gery Jonathan | |
dc.date.accessioned | 2025-07-19T07:42:43Z | |
dc.date.available | 2025-07-19T07:42:43Z | |
dc.date.issued | 2025 | |
dc.identifier.uri | https://repositori.usu.ac.id/handle/123456789/105815 | |
dc.description.abstract | The proliferation of sophisticated deepfake videos poses a serious threat to digital trust and security. Practical applications need detection systems that are both accurate and computationally efficient. This research designs, implements, and evaluates a hybrid architecture for video-based deepfake detection that balances high accuracy with inference efficiency. The proposed model uses EfficientNet-B1 as a feature extractor and a Cross-Attention Multi-Scale Vision Transformer (Cross-ViT) for context modeling. The model was trained on a combination of the FaceForensics++ and Celeb-DF (v2) datasets. It was then evaluated on an out-of-distribution dataset, DeepFakeDetection (DFD), to test its generalization. In this evaluation, the model achieves an Area Under the Curve (AUC) of 92.35% and a video-level F1-score of 83.62%. Its primary advantage is computational efficiency: per-frame inference requires only 0.349 GFLOPs, despite a relatively large parameter count of 114.33 million. The study also finds that three choices contribute substantially to better generalization and more effective video-level aggregation: a small batch size, Face-Cutout augmentation, and a Binary Cross-Entropy (BCE) loss function. Overall, the research validates an efficient and scalable hybrid architecture. It offers a practical deepfake detection solution that balances accuracy, inference speed, and model size. | en_US |
dc.language.iso | id | en_US |
dc.publisher | Universitas Sumatera Utara | en_US |
dc.subject | Deepfake Detection | en_US |
dc.subject | Vision Transformer | en_US |
dc.subject | Cross-ViT | en_US |
dc.subject | EfficientNet | en_US |
dc.subject | Computational Efficiency | en_US |
dc.subject | Deep Learning | en_US |
dc.title | Deteksi Deepfake pada Wajah dalam Video dengan Cross-Attention Multi-Scale Vision Transformer dan EfficientNet | en_US |
dc.title.alternative | Deepfake Detection on Faces in Video with Cross-Attention Multi-Scale Vision Transformer and EfficientNet | en_US |
dc.type | Thesis | en_US |
dc.identifier.nim | NIM211402137 | |
dc.identifier.nidn | NIDN0011049114 | |
dc.identifier.nidn | NIDN0031125982 | |
dc.identifier.kodeprodi | KODEPRODI59201#Teknologi Informasi | |
dc.description.pages | 85 Pages | en_US |
dc.description.type | Skripsi Sarjana (Undergraduate Thesis) | en_US |
dc.subject.sdgs | SDG 16: Peace, Justice and Strong Institutions | en_US |