From 9023c7315d7bc7e532697dd71d0d56ac9b9a518d Mon Sep 17 00:00:00 2001
From: Shadow Cun
Date: Thu, 20 Apr 2023 22:58:35 +0800
Subject: [PATCH 1/9] Update README.md

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index af83e996..58e98743 100644
--- a/README.md
+++ b/README.md
@@ -121,9 +121,10 @@ Tutorials from communities: [中文windows教程](https://www.bilibili.com/video
 ### Windows ([中文windows教程](https://www.bilibili.com/video/BV1Dc411W7V6/)):
 
 1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH".
-2. Install [git](https://git-scm.com/download/win).
-3. Install `ffmpeg`, following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows).
+2. Install [git](https://git-scm.com/download/win) manually (OR `scoop install git` via [scoop](https://scoop.sh/)).
+3. Install `ffmpeg`, following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows) (OR using `scoop install ffmpeg` via [scoop](https://scoop.sh/)).
 4. Download our SadTalker repository, for example by running `git clone https://github.com/Winfredy/SadTalker.git`.
+5. Download the `checkpoint` and `gfpgan` [below↓](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
 5. Run `start.bat` from Windows Explorer as normal, non-administrator, user, a gradio WebUI demo will be started.
 
 ### Macbook:

From a930df3c2309305b702f5680cb62493f4e00e9a5 Mon Sep 17 00:00:00 2001
From: Shadow Cun
Date: Tue, 25 Apr 2023 01:12:40 +0800
Subject: [PATCH 2/9] Update FAQ.md

---
 docs/FAQ.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index fe758809..41e2dab3 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -26,3 +26,15 @@ Make sure you have downloaded the checkpoints and gfpgan as [here](https://githu
 **Q: RuntimeError: unexpected EOF, expected 237192 more bytes. The file might be corrupted.**
 
 The files are not automatically downloaded. Please update the code and download the gfpgan folders as [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
+
+**Q: CUDA out of memory error**
+
+please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-out-of-memory-how-setting-max-split-size-mb
+
+```
+# windows
+set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+
+# linux
+export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+```

From 643fc4c9d20bb23633428916db903e90d55729ba Mon Sep 17 00:00:00 2001
From: Shadow Cun
Date: Tue, 25 Apr 2023 01:13:00 +0800
Subject: [PATCH 3/9] Update FAQ.md

---
 docs/FAQ.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index 41e2dab3..763e24a4 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -33,8 +33,10 @@ please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-o
 
 ```
 # windows
-set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
+python inference.py ...
 
 # linux
-export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python inference.py ...
+export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
+python inference.py ...
 ```

From f8ad3222b259cc0c7486fe509c7226091fa5bd23 Mon Sep 17 00:00:00 2001
From: Shadow Cun
Date: Tue, 25 Apr 2023 01:15:48 +0800
Subject: [PATCH 4/9] Update FAQ.md

---
 docs/FAQ.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/FAQ.md b/docs/FAQ.md
index 763e24a4..6451a226 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -40,3 +40,7 @@ python inference.py ...
 export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
 python inference.py ...
 ```
+
+**Q: Error while decoding stream #0:0: Invalid data found when processing input [mp3float @ 0000015037628c00] Header missing**
+
+Our method only supports wav or mp3 files as input; please make sure the audio you feed in is in one of these formats.
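
Note on PATCH 2/9 through 4/9: the allocator setting from the FAQ can probably also be applied from inside Python, as long as it happens before PyTorch initializes CUDA. The sketch below is only an illustration under that assumption; `configure_cuda_allocator` is a hypothetical helper that does not exist in SadTalker, and 128 is simply the value the FAQ uses.

```python
import os


def configure_cuda_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF unless one is already exported.

    Hypothetical helper, not part of SadTalker. Returns the value in effect.
    """
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be a positive integer")
    # setdefault keeps a value exported in the shell, so the FAQ commands still win.
    return os.environ.setdefault(
        "PYTORCH_CUDA_ALLOC_CONF", f"max_split_size_mb:{max_split_size_mb}"
    )


if __name__ == "__main__":
    # Call this before the first `import torch` in the entry script.
    print(configure_cuda_allocator())
```

Using `setdefault` rather than plain assignment is a deliberate choice here: a value set with `set` or `export`, as the FAQ describes, is left untouched.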
From 0fc2f9c0e96f51ccf120dc3ee6ba55e9ad13e90a Mon Sep 17 00:00:00 2001
From: Chenxi
Date: Tue, 2 May 2023 06:55:12 +0000
Subject: [PATCH 5/9] replicate

---
 README.md                 |   3 +-
 cog.yaml                  |  35 +++++++
 predict.py                | 214 ++++++++++++++++++++++++++++++++++++++
 src/facerender/animate.py |   3 +-
 4 files changed, 252 insertions(+), 3 deletions(-)
 create mode 100644 cog.yaml
 create mode 100644 predict.py

diff --git a/README.md b/README.md
index 58e98743..14c7ad44 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,7 @@
-             [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)       [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)       [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)
-
+             [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)       [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)       [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)       [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker)
 Wenxuan Zhang *,1,2  
diff --git a/cog.yaml b/cog.yaml
new file mode 100644
index 00000000..05bcbd58
--- /dev/null
+++ b/cog.yaml
@@ -0,0 +1,35 @@
+build:
+  gpu: true
+  cuda: "11.3"
+  python_version: "3.8"
+  system_packages:
+    - "ffmpeg"
+    - "libgl1-mesa-glx"
+    - "libglib2.0-0"
+  python_packages:
+    - "torch==1.12.1"
+    - "torchvision==0.13.1"
+    - "torchaudio==0.12.1"
+    - "joblib==1.1.0"
+    - "scikit-image==0.19.3"
+    - "basicsr==1.4.2"
+    - "facexlib==0.3.0"
+    - "resampy==0.3.1"
+    - "pydub==0.25.1"
+    - "scipy==1.10.1"
+    - "kornia==0.6.8"
+    - "face_alignment==1.3.5"
+    - "imageio==2.19.3"
+    - "imageio-ffmpeg==0.4.7"
+    - "librosa==0.9.2" #
+    - "tqdm==4.65.0"
+    - "yacs==0.1.8"
+    - "gfpgan==1.3.8"
+    - "dlib-bin==19.24.1"
+    - "av==10.0.0"
+    - "trimesh==3.9.20"
+  run:
+    - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth" "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
+    - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip" "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip"
+
+predict: "predict.py:Predictor"
diff --git a/predict.py b/predict.py
new file mode 100644
index 00000000..1a44a663
--- /dev/null
+++ b/predict.py
@@ -0,0 +1,214 @@
+"""run bash scripts/download_models.sh first to prepare the weights file"""
+import os
+import shutil
+from argparse import Namespace
+from src.utils.preprocess import CropAndExtract
+from src.test_audio2coeff import Audio2Coeff
+from src.facerender.animate import AnimateFromCoeff
+from src.generate_batch import get_data
+from src.generate_facerender_batch import get_facerender_data
+from cog import BasePredictor, Input, Path
+
+checkpoints = "checkpoints"
+
+
+class Predictor(BasePredictor):
+    def setup(self):
+        """Load the model into memory to make running multiple predictions efficient"""
+        device = "cuda"
+
+        path_of_lm_croper = os.path.join(
+            checkpoints, "shape_predictor_68_face_landmarks.dat"
+        )
+        path_of_net_recon_model = os.path.join(checkpoints, "epoch_20.pth")
+        dir_of_BFM_fitting = os.path.join(checkpoints, "BFM_Fitting")
+        wav2lip_checkpoint = os.path.join(checkpoints, "wav2lip.pth")
+
+        audio2pose_checkpoint = os.path.join(checkpoints, "auido2pose_00140-model.pth")
+        audio2pose_yaml_path = os.path.join("src", "config", "auido2pose.yaml")
+
+        audio2exp_checkpoint = os.path.join(checkpoints, "auido2exp_00300-model.pth")
+        audio2exp_yaml_path = os.path.join("src", "config", "auido2exp.yaml")
+
+        free_view_checkpoint = os.path.join(
+            checkpoints, "facevid2vid_00189-model.pth.tar"
+        )
+
+        # init model
+        self.preprocess_model = CropAndExtract(
+            path_of_lm_croper, path_of_net_recon_model, dir_of_BFM_fitting, device
+        )
+
+        self.audio_to_coeff = Audio2Coeff(
+            audio2pose_checkpoint,
+            audio2pose_yaml_path,
+            audio2exp_checkpoint,
+            audio2exp_yaml_path,
+            wav2lip_checkpoint,
+            device,
+        )
+
+        self.animate_from_coeff = {
+            "full": AnimateFromCoeff(
+                free_view_checkpoint,
+                os.path.join(checkpoints, "mapping_00109-model.pth.tar"),
+                os.path.join("src", "config", "facerender_still.yaml"),
+                device,
+            ),
+            "others": AnimateFromCoeff(
+                free_view_checkpoint,
+                os.path.join(checkpoints, "mapping_00229-model.pth.tar"),
+                os.path.join("src", "config", "facerender.yaml"),
+                device,
+            ),
+        }
+
+    def predict(
+        self,
+        source_image: Path = Input(
+            description="Upload the source image, it can be video.mp4 or picture.png",
+        ),
+        driven_audio: Path = Input(
+            description="Upload the driven audio, accepts .wav and .mp4 file",
+        ),
+        enhancer: str = Input(
+            description="Choose a face enhancer",
+            choices=["gfpgan", "RestoreFormer"],
+            default="gfpgan",
+        ),
+        preprocess: str = Input(
+            description="how to preprocess the images",
+            choices=["crop", "resize", "full"],
+            default="full",
+        ),
+        ref_eyeblink: Path = Input(
+            description="path to reference video providing eye blinking",
+            default=None,
+        ),
+        ref_pose: Path = Input(
+            description="path to reference video providing pose",
+            default=None,
+        ),
+        still: bool = Input(
+            description="can crop back to the original videos for the full body animation when preprocess is full",
+            default=True,
+        ),
+    ) -> Path:
+        """Run a single prediction on the model"""
+
+        animate_from_coeff = (
+            self.animate_from_coeff["full"]
+            if preprocess == "full"
+            else self.animate_from_coeff["others"]
+        )
+
+        args = load_default()
+        args.pic_path = str(source_image)
+        args.audio_path = str(driven_audio)
+        device = "cuda"
+        args.still = still
+        args.ref_eyeblink = None if ref_eyeblink is None else str(ref_eyeblink)
+        args.ref_pose = None if ref_pose is None else str(ref_pose)
+
+        # crop image and extract 3dmm from image
+        results_dir = "results"
+        if os.path.exists(results_dir):
+            shutil.rmtree(results_dir)
+        os.makedirs(results_dir)
+        first_frame_dir = os.path.join(results_dir, "first_frame_dir")
+        os.makedirs(first_frame_dir)
+
+        print("3DMM Extraction for source image")
+        first_coeff_path, crop_pic_path, crop_info = self.preprocess_model.generate(
+            args.pic_path, first_frame_dir, preprocess, source_image_flag=True
+        )
+        if first_coeff_path is None:
+            print("Can't get the coeffs of the input")
+            return
+
+        if ref_eyeblink is not None:
+            ref_eyeblink_videoname = os.path.splitext(os.path.split(ref_eyeblink)[-1])[
+                0
+            ]
+            ref_eyeblink_frame_dir = os.path.join(results_dir, ref_eyeblink_videoname)
+            os.makedirs(ref_eyeblink_frame_dir, exist_ok=True)
+            print("3DMM Extraction for the reference video providing eye blinking")
+            ref_eyeblink_coeff_path, _, _ = self.preprocess_model.generate(
+                ref_eyeblink, ref_eyeblink_frame_dir
+            )
+        else:
+            ref_eyeblink_coeff_path = None
+
+        if ref_pose is not None:
+            if ref_pose == ref_eyeblink:
+                ref_pose_coeff_path = ref_eyeblink_coeff_path
+            else:
+                ref_pose_videoname = os.path.splitext(os.path.split(ref_pose)[-1])[0]
+                ref_pose_frame_dir = os.path.join(results_dir, ref_pose_videoname)
+                os.makedirs(ref_pose_frame_dir, exist_ok=True)
+                print("3DMM Extraction for the reference video providing pose")
+                ref_pose_coeff_path, _, _ = self.preprocess_model.generate(
+                    ref_pose, ref_pose_frame_dir
+                )
+        else:
+            ref_pose_coeff_path = None
+
+        # audio2coeff
+        batch = get_data(
+            first_coeff_path,
+            args.audio_path,
+            device,
+            ref_eyeblink_coeff_path,
+            still=still,
+        )
+        coeff_path = self.audio_to_coeff.generate(
+            batch, results_dir, args.pose_style, ref_pose_coeff_path
+        )
+        # coeff2video
+        print("coeff2video")
+        data = get_facerender_data(
+            coeff_path,
+            crop_pic_path,
+            first_coeff_path,
+            args.audio_path,
+            args.batch_size,
+            args.input_yaw,
+            args.input_pitch,
+            args.input_roll,
+            expression_scale=args.expression_scale,
+            still_mode=still,
+            preprocess=preprocess,
+        )
+        animate_from_coeff.generate(
+            data, results_dir, args.pic_path, crop_info,
+            enhancer=enhancer, background_enhancer=args.background_enhancer,
+            preprocess=preprocess)
+
+        output = "/tmp/out.mp4"
+        mp4_path = os.path.join(results_dir, [f for f in os.listdir(results_dir) if "enhanced.mp4" in f][0])
+        shutil.copy(mp4_path, output)
+
+        return Path(output)
+
+
+def load_default():
+    return Namespace(
+        pose_style=0,
+        batch_size=2,
+        expression_scale=1.0,
+        input_yaw=None,
+        input_pitch=None,
+        input_roll=None,
+        background_enhancer=None,
+        face3dvis=False,
+        net_recon="resnet50",
+        init_path=None,
+        use_last_fc=False,
+        bfm_folder="./checkpoints/BFM_Fitting/",
+        bfm_model="BFM_model_front.mat",
+        focal=1015.0,
+        center=112.0,
+        camera_d=10.0,
+        z_near=5.0,
+        z_far=15.0,
+    )
diff --git a/src/facerender/animate.py b/src/facerender/animate.py
index 3adea961..2ee28e73 100644
--- a/src/facerender/animate.py
+++ b/src/facerender/animate.py
@@ -177,7 +177,8 @@ def generate(self, x, video_save_dir, pic_path, crop_info, enhancer=None, backgr
         audio_name = os.path.splitext(os.path.split(audio_path)[-1])[0]
         new_audio_path = os.path.join(video_save_dir, audio_name+'.wav')
         start_time = 0
-        sound = AudioSegment.from_mp3(audio_path)
+        # cog will not keep the .mp3 filename
+        sound = AudioSegment.from_file(audio_path)
         frames = frame_num
         end_time = start_time + frames*1/25*1000
         word1=sound.set_frame_rate(16000)

From a60d62b13cc70294972425ff756c60668ebf2dbc Mon Sep 17 00:00:00 2001
From: Shadow Cun
Date: Thu, 4 May 2023 11:40:11 +0800
Subject: [PATCH 6/9] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 14c7ad44..7724c7cf 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
-             [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)       [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)       [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)       [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker)
+     [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)   [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)   [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)   [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker)
 Wenxuan Zhang *,1,2  

From a9034df0b3a1f6e52ffb5a906efdcabd53476638 Mon Sep 17 00:00:00 2001
From: kainstan
Date: Wed, 10 May 2023 21:21:05 +0800
Subject: [PATCH 7/9] .idea is a PyCharm configuration folder that should be ignored

---
 .gitignore | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 65365db2..851588a9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -157,7 +157,7 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/
 
 examples/results/*
 gfpgan/*

From 44889c3a8bd1fe8a2014d915e909673bef72afa6 Mon Sep 17 00:00:00 2001
From: kainstan
Date: Thu, 11 May 2023 16:52:26 +0800
Subject: [PATCH 8/9] .DS_Store is a hidden configuration file for Mac and should be ignored

---
 .gitignore | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 851588a9..0ecb8ed9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,4 +165,7 @@ checkpoints/
 results/*
 Dockerfile
 start_docker.sh
-start.sh
\ No newline at end of file
+start.sh
+
+# Mac
+.DS_Store

From bbe54e928d71bd5c0c0650972450fa4907f3e34b Mon Sep 17 00:00:00 2001
From: ribasoka
Date: Mon, 15 May 2023 23:09:05 +0300
Subject: [PATCH 9/9] Update app.py - Fixed timeout bug

---
 app.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app.py b/app.py
index edde0cf4..4bb206b0 100644
--- a/app.py
+++ b/app.py
@@ -144,6 +144,6 @@ def sadtalker_demo():
 if __name__ == "__main__":
     demo = sadtalker_demo()
-    demo.launch(share=True)
+    demo.queue().launch(share=True)