    •   USU-IR Home
    • Faculty of Computer Science and Information Technology
    • Department of Information Technology
    • Undergraduate Theses
    • View Item

    Deteksi Deepfake pada Wajah dalam Video dengan Cross-Attention Multi-Scale Vision Transformer dan EfficientNet

    Deepfake Detection on Faces in Video with Cross-Attention Multi-Scale Vision Transformer and EfficientNet

    Date
    2025
    Author
    Manurung, Gery Jonathan
    Advisor(s)
    Nasution, Umaya Ramadhani Putri
    Sawaluddin
    Abstract
    The proliferation of sophisticated deepfake videos threatens digital trust and security, and calls for detection systems that are both accurate and computationally efficient enough for practical use. This research designs, implements, and evaluates a hybrid architecture for video-based deepfake detection that balances accuracy with inference efficiency. The proposed model uses EfficientNet-B1 as a feature extractor and a Cross-Attention Multi-Scale Vision Transformer (Cross-ViT) for context modeling. It was trained on a combination of the FaceForensics++ and Celeb-DF (v2) datasets and evaluated on an out-of-distribution dataset, DeepFakeDetection (DFD), to test generalization. On DFD, the model achieves an Area Under the Curve (AUC) of 92.35% and a video-level F1-score of 83.62%. Its main advantage is computational efficiency: per-frame inference requires only 0.349 GFLOPs, despite a large parameter count (114.33 million). The study also finds that a small batch size, Face-Cutout augmentation, and a Binary Cross-Entropy (BCE) loss function contribute significantly to generalization and to effective video-level aggregation. Overall, the results support an efficient and scalable hybrid architecture as a practical option for deepfake detection that balances accuracy, inference speed, and model size.
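
    The abstract reports a video-level F1-score computed from per-frame predictions but does not say how frame scores are combined. The sketch below illustrates one common option, averaging per-frame fake probabilities and thresholding the mean. The function names, the mean-pooling rule, and the 0.5 threshold are assumptions made for illustration and are not taken from the thesis.

```python
from statistics import mean


def aggregate_video(frame_probs, threshold=0.5):
    """Combine per-frame fake probabilities into one video-level decision.

    Assumption: mean pooling with a fixed threshold. The thesis may use
    a different aggregation rule.
    """
    if not frame_probs:
        raise ValueError("a video needs at least one scored frame")
    score = mean(frame_probs)
    label = int(score >= threshold)  # 1 = fake, 0 = real
    return score, label


def f1_score(y_true, y_pred):
    """F1 for the positive (fake = 1) class over video-level labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical per-frame scores for three videos (not real data).
    videos = [[0.9, 0.8, 0.4], [0.1, 0.2], [0.6, 0.3, 0.2]]
    truth = [1, 0, 1]
    preds = [aggregate_video(v)[1] for v in videos]
    print(preds, f1_score(truth, preds))
```

    Mean pooling keeps a few misclassified frames from flipping the video label, at the cost of diluting short manipulated segments; a maximum or top-k average would be a reasonable alternative under the same interface.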
    URI
    https://repositori.usu.ac.id/handle/123456789/105815

    Repositori Institusi Universitas Sumatera Utara - 2025

    Universitas Sumatera Utara

    Perpustakaan

    Resource Guide

    Katalog Perpustakaan

    Journal Elektronik Berlangganan

    Buku Elektronik Berlangganan

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
