Playing with Realistic Neural Talking Head Models
Researchers at the Samsung AI Center in Moscow recently presented interesting work, often called “living portraits”: they brought the Mona Lisa and other subjects of photos and paintings to life using videos of real people. They presented a framework for meta-learning of adversarial generative models, described in the paper “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”.
You can read more details in the original paper.
Here we review a great PyTorch implementation of the algorithm. The author of this implementation is Vincent Thévenin, a research worker at the De Vinci Innovation Center.
Starting with Realistic-Neural-Talking-Head-Models
Clone the repo and move to the project root.
Download the required files from here. You will get two files: Pytorch_VGGFACE_IR.py (PyTorch code) and Pytorch_VGGFACE.pth (PyTorch model weights).
We will not train the model from scratch, as the author provides his own pretrained weights on Google Drive.
Install the required matplotlib, opencv and face_alignment libraries:
pip install matplotlib opencv-python face_alignment
sudo apt-get install python-tk
We also need to install the NVIDIA driver required to run embedder_inference.py. Download the NVIDIA driver for your GPU from here. For instance, for a Tesla K80:
wget http://us.download.nvidia.com/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run
Make the run file executable and install the driver:
chmod +x NVIDIA-Linux-x86_64-440.33.01.run
sudo ./NVIDIA-Linux-x86_64-440.33.01.run
In some cases this approach fails because of a running X server. An alternative way to install the NVIDIA driver is the following. Identify the recommended graphics driver for your system:
ubuntu-drivers devices
Then install the recommended NVIDIA driver and reboot:
sudo apt-get install nvidia-384
Finally, we need to install a PyTorch build that matches the installed CUDA version. Check which version of CUDA is installed:
nvcc --version
Let’s say we have CUDA v10.1
Cuda compilation tools, release 10.1, V10.1.243
Go to https://pytorch.org/ and select the appropriate command to install PyTorch:
pip install torch torchvision
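After installation it is worth checking that PyTorch can actually see the GPU; a quick sanity check:

import torch

print(torch.__version__)              # installed PyTorch version
print(torch.version.cuda)             # CUDA version this PyTorch build expects
print(torch.cuda.is_available())      # True if the driver and CUDA setup work
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. Tesla K80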
Let’s run the embedder (embedder_inference.py) on videos or images of a person to get the embedding vector:
python embedder_inference.py
Output:
Saving e_hat...
...Done saving
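The embedder writes its result to disk as ordinary PyTorch checkpoint files (named just below). If you want to sanity-check what was saved, here is a minimal sketch; the 'e_hat' key is my assumption about the save format, so check embedder_inference.py if it differs:

import torch

checkpoint = torch.load('e_hat_video.tar', map_location='cpu')  # a plain torch.save output despite the .tar extension
e_hat = checkpoint['e_hat']   # assumed key for the stored embedding; adjust to the actual format
print(e_hat.shape)            # shape of the embedding computed from the input video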
You will get two files: e_hat_images.tar and e_hat_video.tar. Let’s run finetuning_training.py:
python finetuning_training.py
Output:
What source to finetune on?
0: Video
1: Images
Enter 1 for images
Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /home/vladimir/.cache/torch/checkpoints/vgg19-dcbb9e9d.pth
...
avg batch time for batch size of 1 : 0:00:12.463398
[0/40][0/1] Loss_D: 2.0553 Loss_G: 5.2649 D(x): 1.0553 D(G(y)): 1.0553
At the end you will get an image similar to this one:
As you can see, the quality is poor.
Let’s try fine-tuning on the video instead:
avg batch time for batch size of 1 : 0:00:13.738919
[0/40][0/1] Loss_D: 2.0539 Loss_G: 3.3976 D(x): 1.0539 D(G(y)): 1.0539
This time the losses are lower and the final results look better.
Result of finetuning_training on my own image
And the original
There is a script, webcam_inference.py, available for testing image generation on live video from a camera. Run webcam_inference.py:
python webcam_inference.py
This script runs the model using the person from the embedding vector and the webcam input; it performs inference only. The script shows three images: the facial landmarks, the original (me), and the fake. This time inference is done with the model fine-tuned on the video.
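Under the hood the webcam pipeline needs facial landmarks for every frame. Independent of this repo, here is a minimal sketch of grabbing a webcam frame and extracting 68 landmarks with the face_alignment library (LandmarksType._2D is the 1.x API name; newer releases rename it to TWO_D):

import cv2
import face_alignment

# 2D landmark detector on the GPU (use device='cpu' if no GPU is available)
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cuda')

cap = cv2.VideoCapture(0)   # default webcam
ret, frame = cap.read()     # grab a single BGR frame
cap.release()

if ret:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # face_alignment expects RGB
    landmarks = fa.get_landmarks(rgb)             # list with one (68, 2) array per detected face
    print(None if landmarks is None else landmarks[0].shape)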
Now let’s try it with the model fine-tuned on the images:
Inference is quite slow; it took a few minutes to start on a GCE VM with an NVIDIA GPU.
We can retrain the generator on videos to get better results. The author of the project stated that the generator was trained for only 5 epochs, which is not optimal.
For training you can use the VoxCeleb2 dataset. To download the dataset you should request access by filling out the form here. I have a GitHub repo with a bash script for downloading all parts of the dataset. Run the script to download the dataset:
sh download.sh
Note: each part of the dataset weighs about 30 GB (around 270 GB in total).
Once all parts are downloaded, concatenate them into a single zip archive:
cat vox2_dev* > vox2_mp4.zip
Unzip the archive, then change the path to the mp4 folder with the training videos in train.py (line 21):
path_to_mp4 = '../../Data/vox2_mp4/dev/mp4'  # location of the extracted VoxCeleb2 dev videos
dataset = VidDataSet(K=8, path_to_mp4 = path_to_mp4, device=device)  # K frames are sampled from each training video
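Before launching a long training run, it may be worth verifying that path_to_mp4 actually points at the extracted videos; a small, repo-independent check:

import os
import glob

path_to_mp4 = '../../Data/vox2_mp4/dev/mp4'   # same path as in train.py
videos = glob.glob(os.path.join(path_to_mp4, '**', '*.mp4'), recursive=True)
print('found', len(videos), 'mp4 files')       # should be a large number for VoxCeleb2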
Now run the train.py script
python train.py
This should print out
Initiating new checkpoint...
...Done
Downloading the face detection CNN. Please wait...
Downloading the Face Alignment Network(FAN). Please wait...
If you get the error
ImportError: No module named dataset.dataset_class
run the script with Python 3 (python3 train.py) and make sure all the required libraries above were installed using pip3.