PaddleSpeech Happy Open Source Event (2025 H1) #3997

Open
zxcd opened this issue Feb 27, 2025 · 0 comments
zxcd commented Feb 27, 2025

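📣 PaddleSpeech Happy Open Source Event

This event aims to encourage more developers to take part in building the open-source PaddlePaddle large-model suite, helping the community fix bugs or contribute features and build PaddlePaddle together.

Task goals

Because of version changes, the documentation no longer keeps up with the code. The goals are:

  • Following the README runs end to end without errors
  • The documentation matches the code
  • Writing errors in the documentation are corrected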

Task 1: Fix the incorrect parameters in synthesize_e2e.sh for the synthesis vocoders

| No. | File to modify | Claimed by / Status / PR No. |
| --- | --- | --- |
| 1 | examples/csmsc/voc1/local/synthesize_e2e.sh | |
| 2 | examples/csmsc/voc3/local/synthesize_e2e.sh | |
| 3 | examples/csmsc/voc5/local/synthesize_e2e.sh | |

Task 2: Add the missing parameters to the scripts of the synthesis recipes

| No. | Files to modify | Claimed by / Status / PR No. |
| --- | --- | --- |
| 4 | examples/aishell3/tts3/run.sh, examples/aishell3/tts3/README.md | |
| 5 | examples/aishell3_vctk/ernie_sat/run.sh, examples/aishell3_vctk/ernie_sat/README.md | |
| 6 | examples/canton/tts3/run.sh, examples/canton/tts3/README.md | |
| 7 | examples/csmsc/tts0/run.sh, examples/csmsc/tts0/README.md | |
| 8 | examples/csmsc/tts2/run.sh, examples/csmsc/tts2/README.md | |
| 9 | examples/csmsc/tts3/run.sh, examples/csmsc/tts3/README.md | |
| 10 | examples/csmsc/tts3_rhy/run.sh, examples/csmsc/tts3_rhy/README.md | |
| 11 | examples/ljspeech/tts3/run.sh, examples/ljspeech/tts3/README.md | |
| 12 | examples/opencpop/svs1/run.sh, examples/opencpop/svs1/README.md | |
| 13 | examples/vctk/ernie_sat/run.sh, examples/vctk/ernie_sat/README.md | |
| 14 | examples/vctk/tts3/run.sh, examples/vctk/tts3/README.md | |

Task 3: Fix writing errors in the text (updated on an ongoing basis)

| No. | File to modify | Claimed by / Status / PR No. |
| --- | --- | --- |
| 15 | examples/csmsc/voc3/README.md | |

Task 1 example fix

Target files: examples/*/voc*/local/synthesize_e2e.sh, for example examples/csmsc/voc1/local/synthesize_e2e.sh

The code in synthesize_e2e.sh currently reads:
python3 ${BIN_DIR}/../synthesize.py \
    --am=tacotron2_aishell3 \
    --am_config=${config_path} \
    --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
    --am_stat=dump/train/speech_stats.npy \
    --voc=pwgan_aishell3 \
    --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
    --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
    --test_metadata=dump/test/norm/metadata.jsonl \
    --output_dir=${train_output_path}/test \
    --phones_dict=dump/phone_id_map.txt \
    --speaker_dict=dump/speaker_id_map.txt \
    --voice-cloning=True

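Because these recipes train the vocoder (voc) rather than the acoustic model (am), the arguments that contain train_output_path should be the vocoder ones, such as --voc, --voc_config, and the other voc-related arguments. The --am related arguments should instead point to the corresponding files under the fastspeech2_nosil_baker_ckpt_0.4 folder, as described in examples/csmsc/voc1/README.md.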
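The rule above (only vocoder arguments should reference train_output_path) can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `find_misplaced_am_args` that is not part of PaddleSpeech; it lists every `--am*` argument in a script whose line still mentions `train_output_path`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PaddleSpeech): print the --am* arguments
# in a vocoder recipe script whose lines still reference train_output_path.
find_misplaced_am_args() {
    local script="$1"
    # First grep: keep only lines that mention train_output_path (literal match).
    # Second grep: from those lines, print only the --am... argument names.
    # "|| true" keeps the exit status zero when nothing is found.
    grep -F 'train_output_path' "$script" | grep -oE -e '--am[a-z_]*' || true
}
```

Run against a script like the one quoted above, this would be expected to report `--am_ckpt`, the argument that should point into the pretrained acoustic model folder instead. An empty result suggests no acoustic model argument depends on the training output directory.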

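Task 2 example fix

Target files: examples/*/*/run.sh and examples/*/*/README.md
In some synthesize_e2e.sh and synthesize.sh scripts, changing stage switches inference between several models, but this parameter is not exposed in the corresponding run.sh and README.md. The parameter and its description need to be added to both.
For example, examples/aishell3/tts3/local/synthesize_e2e.sh uses stage to choose between pwgan and hifigan for inference.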
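The stage-based selection described above can be pictured with a small sketch. The function name `pick_vocoder` and its argument handling are assumptions made for illustration only; the real scripts are expected to run a synthesis command for each stage rather than print a name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual PaddleSpeech script.
# Reads an optional leading "--stage N" and maps it to a vocoder name.
pick_vocoder() {
    local stage=0   # default stage 0 corresponds to pwgan
    if [ "${1:-}" = "--stage" ]; then
        stage="${2:-}"
    fi
    case "${stage}" in
        0) echo "pwgan" ;;
        1) echo "hifigan" ;;
        *) echo "unsupported stage: ${stage}" >&2; return 1 ;;
    esac
}
```

Rejecting unknown stage values with a non-zero status makes a mistyped `--stage` visible to a caller that uses `|| exit -1`, instead of silently falling back to the default vocoder.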

  • Changes in run.sh:
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # synthesize, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi

if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
    # synthesize_e2e, vocoder is pwgan by default stage 0, stage 1 will use hifigan as vocoder
    CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
fi
  • Changes in README.md:
`./local/synthesize.sh` calls `${BIN_DIR}/../synthesize.py`, which can synthesize waveform from `metadata.jsonl`. 

CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh --stage 0 ${conf_path} ${train_output_path} ${ckpt_name}

`--stage` selects the vocoder model used during synthesis. It can be `0` or `1`: `0` uses `pwgan` and `1` uses `hifigan` as the vocoder.

Task 3 example fix

Modify examples/csmsc/voc3/README.md, which currently reads:

HiFiGAN checkpoint contains files listed below.
mb_melgan_csmsc_ckpt_0.1.1
├── default.yaml                    # default config used to train MultiBand MelGAN
├── feats_stats.npy                 # statistics used to normalize spectrogram when training MultiBand MelGAN
└── snapshot_iter_1000000.pdz       # generator parameters of MultiBand MelGAN

The model downloaded in this README.md is MultiBand MelGAN, but the file list is introduced as the HiFiGAN checkpoint.
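Mismatches of this kind can be located with a plain text search. The following is a minimal sketch, assuming a hypothetical helper named `find_model_mentions`; it prints every line (with its line number) of a README that mentions a given model name, so a contributor can review whether each mention belongs there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PaddleSpeech): list the lines of a README
# that mention a given model name, prefixed with their line numbers.
find_model_mentions() {
    local readme="$1" name="$2"
    # -n: show line numbers, -i: ignore case, -F: treat the name as a literal string.
    # "|| true" keeps the exit status zero when there are no matches.
    grep -n -i -F -e "$name" "$readme" || true
}
```

For instance, `find_model_mentions examples/csmsc/voc3/README.md HiFiGAN` would list the candidate lines to review in the MultiBand MelGAN recipe.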

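Dashboard

| Task track | Number of tasks | Submitted / Claimed | Submission rate | Completed | Completion rate |
| --- | --- | --- | --- | --- | --- |
| PaddleSpeech Happy Open Source Event | 15 | 0 / 0 | 0.0% | 0 | 0.0% |

Statistics

Contributors are listed in no particular order.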
