The model is a CNN-LSTM architecture for deepfake detection. The core premise is to feed the sequence of frames through a time-distributed CNN, then reshape the output and pass it through an LSTM that performs the sequence analysis. The final output comes from a Dense layer acting as a binary classifier.
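Below is a minimal Keras sketch of this architecture. The sequence length, image size, and layer widths are illustrative assumptions, not the exact values used in train.py:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed, illustrative dimensions -- the real values live in train.py
SEQ_LEN, IMG_SIZE, CHANNELS = 20, 128, 3

def build_model():
    # A small per-frame CNN, applied to every frame via TimeDistributed
    cnn = models.Sequential([
        layers.Conv2D(32, 3, activation='relu',
                      input_shape=(IMG_SIZE, IMG_SIZE, CHANNELS)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
    ])
    model = models.Sequential([
        layers.TimeDistributed(cnn,
                               input_shape=(SEQ_LEN, IMG_SIZE, IMG_SIZE, CHANNELS)),
        layers.LSTM(64),                        # sequence analysis over per-frame features
        layers.Dense(1, activation='sigmoid'),  # binary real/fake classifier
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```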
I have used the OpenCV library to assist with face detection and TensorFlow for everything else. The videos are preprocessed to extract frames and detect faces, and the resulting directories are used to build the data pipeline.
The sequence of images is used as input. The model currently does not take audio into consideration, but it can be extended to accept audio as well.
For my purposes I have used a subset of the data with 1600 videos, evenly split between real videos and deepfakes (50:50).
This can be achieved using OpenCV. Refer to the file generate_preprocessed_files.py for an idea of how to achieve this.
Using OpenCV, we can then detect individual faces in the images and use these as a sequence. Here are a few examples:
The reason we prefer faces over the whole body is that in this dataset the GANs have only modified the facial structure, not the bodily structure. Of course, other datasets could differ.
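As a rough sketch of the preprocessing idea (the actual implementation is in generate_preprocessed_files.py; the Haar cascade, capture length, and output size below are assumptions):

```python
import cv2

# Haar cascade shipped with opencv-python; the repository may use a different detector
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def extract_face_sequence(video_path, capture_sec=5, size=(128, 128)):
    """Read the first `capture_sec` seconds of a video and return cropped faces."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30   # fall back if FPS metadata is missing
    faces = []
    for _ in range(int(fps * capture_sec)):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) > 0:
            x, y, w, h = boxes[0]           # keep the first detected face
            faces.append(cv2.resize(frame[y:y+h, x:x+w], size))
    cap.release()
    return faces
```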
Refer to data_pipeline.py. Nothing too fancy here apart from the use of TensorArrays; it is an important concept for which I could not find any examples online.
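For readers hitting the same gap, here is a self-contained example of the TensorArray pattern: accumulating a variable number of decoded frames inside a tf.function. The file layout and image size are assumptions; the project's actual pipeline is in data_pipeline.py:

```python
import tensorflow as tf

@tf.function
def load_sequence(frame_paths):
    # frame_paths: 1-D string tensor of image file paths (assumed layout)
    n = tf.shape(frame_paths)[0]
    frames = tf.TensorArray(tf.float32, size=n)
    for i in tf.range(n):
        img = tf.io.decode_jpeg(tf.io.read_file(frame_paths[i]), channels=3)
        img = tf.image.resize(img, (128, 128)) / 255.0
        frames = frames.write(i, img)   # each write returns an updated handle
    return frames.stack()               # shape: (n, 128, 128, 3)
```

A TensorArray is needed here because a plain Python list cannot grow inside a graph-compiled loop.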
Refer to train.py:
- If you already have preprocessed files, run train.py with appropriate parameter changes:
$ python train.py
- If not, pass the directory of the videos to the function processing() in the file processing.py, and pass the resultant directory to train.py:
>>> import processing
>>> processing.processing(location="enter/directory/here", save_path="enter/save_path", capture_sec=5, num_vids=500)
and then, from the shell:
$ python train.py
- If you simply want to use the model, use load_model() available under Keras (assuming you are in the same directory as the model):
>>> import tensorflow as tf
>>> model = tf.keras.models.load_model('model')
and then use the model however you see fit.
Now, the model itself has a small capacity, and the complete dataset was not used, so a lack of generalisation, as well as overfitting, is to be expected.
The accuracy starts at 77.8% and steadily climbs over time, peaking at 91.2%. This is consistent with the reduction in loss from epoch 0 to epoch 8; however, we see a drop in accuracy after that. This happened because of repeated random shuffles and a small stabilisation factor attached to the loss. Another possible reason is that the learning rate was set too high, pushing the gradient step outside an optimum.
Evaluation, too, was performed on a reduced dataset and, as expected, the fit was poor. Due to limited capacity and insufficient training data (both caused by a lack of computational power), the model does not generalise well.
OpenCV is an amazing library, but it can at times fail to detect faces, or frankly, anything at all. To capture as much information from the sequences as possible, I averaged the bounding boxes from earlier frames in the sequence and used that average in frames where no face was found.
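A minimal sketch of that fallback (the function and its parameters are hypothetical, not the repository's API):

```python
import numpy as np

def face_box(detector, gray, history):
    """Detect a face; if none is found, fall back to the average of past boxes."""
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) > 0:
        history.append(boxes[0])        # remember this box for later frames
        return boxes[0]
    if history:                         # no detection: average the past boxes
        return np.mean(history, axis=0).astype(int)
    return None                         # nothing detected and no history yet
```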
All further details are provided in the code comments.