Add morphing feature #713

Open
wants to merge 62 commits into
base: main
62 commits
aa80388
Move `to_wav`
qryxip Dec 24, 2023
883803a
Add morphing feature
qryxip Dec 24, 2023
06556c9
Make `Permission` hold a `StyleId`
qryxip Dec 25, 2023
21e0715
Minor refactor
qryxip Dec 25, 2023
a31bd96
Update voicevox_core.h
qryxip Dec 25, 2023
2acd5e8
Stop using `readonly`
qryxip Dec 25, 2023
1162102
`Permission` → `MorphablePair`
qryxip Dec 25, 2023
82260ca
[skip ci] `MorphablePair` → `MorphableTargets`
qryxip Dec 25, 2023
ae080b5
[skip ci] Minor refactor
qryxip Dec 25, 2023
26a72e0
[skip ci] Minor refactor
qryxip Dec 25, 2023
599f6ad
[skip ci] Merge branch 'main' into add-morphing
qryxip Dec 30, 2023
58d6d7d
Update snapshots.toml
qryxip Dec 30, 2023
66be03f
Install `mingw-w64-x86_64-clang`
qryxip Dec 30, 2023
ccd3c81
Remove `can_skip_in_simple_test` from `windows-x86-cpu`
qryxip Dec 30, 2023
459a881
Use KyleMayes/install-llvm-action
qryxip Dec 30, 2023
7205c39
Try removing the Clang installation from `i686-pc-windows-msvc`
qryxip Dec 30, 2023
471264d
Revert "Remove `can_skip_in_simple_test` from `windows-x86-cpu`"
qryxip Dec 30, 2023
58f6f90
Revert "Try removing the Clang installation from `i686-pc-windows-msvc`"
qryxip Dec 30, 2023
8ff5a5e
Update sample.vvm
qryxip Dec 30, 2023
706fdac
Unit tests for `morphable_targets`
qryxip Dec 30, 2023
e21c61c
`24000` → `DEFAULT_SAMPLING_RATE`
qryxip Dec 31, 2023
f53fa11
Add FIXME
qryxip Dec 31, 2023
57a81f3
Rename internal methods
qryxip Dec 31, 2023
0b896ea
`Morph` → `SpeakerFeature`
qryxip Dec 31, 2023
c8c85b0
Move `to_wav`
qryxip Dec 31, 2023
e1f94b1
Change FIXME comment
qryxip Dec 31, 2023
38b8732
Remove "WARNING"
qryxip Dec 31, 2023
e283209
Update voicevox_core.h
qryxip Dec 31, 2023
bdf874f
Implement C API
qryxip Jan 1, 2024
51e22bf
Merge branch 'main' into add-morphing
qryxip Jan 2, 2024
503f035
Implement Python API
qryxip Jan 2, 2024
9c70222
Change `morph_rate` from `f32` to `f64`
qryxip Jan 2, 2024
27a4c7a
Implement Java API
qryxip Jan 2, 2024
5b014ec
Write docstrings
qryxip Jan 2, 2024
13dcca2
Fix spectrogram computation
qryxip Jan 2, 2024
6d2eb80
Add `SpeakerFeatureException`
qryxip Jan 3, 2024
dbbf89c
Remove unnecessary `todo!` branch
qryxip Jan 3, 2024
f04380e
Move the `Synthesizer` impl over to the `morph` side
qryxip Jan 3, 2024
03d5055
Add FIXME
qryxip Jan 4, 2024
9c41398
Tests for `synthesis_morphing`
qryxip Jan 4, 2024
8891bb8
`MorphableTargets` → `MorphableStyles`
qryxip Jan 4, 2024
8904be2
Handle spectrograms with ndarray
qryxip Jan 4, 2024
4444c69
Minor refactor
qryxip Jan 4, 2024
bdb7c3b
Test all 16 combinations in the C API as well
qryxip Jan 4, 2024
b6d81fa
Merge branch 'main' into add-morphing
qryxip Jan 28, 2024
3710699
Merge branch 'main' into add-morphing
qryxip Feb 10, 2024
5035740
Update tests
qryxip Feb 10, 2024
998977d
Merge branch 'main' into add-morphing
qryxip Feb 14, 2024
2443754
Merge branch 'main' into add-morphing
qryxip Feb 26, 2024
1925685
Merge branch 'main' into add-morphing
qryxip Mar 16, 2024
2827328
Update TODO comment
qryxip Mar 16, 2024
6eb6e40
Merge branch 'main' into add-morphing
qryxip Mar 30, 2024
6f86e8a
Merge branch 'main' into add-morphing
qryxip Apr 9, 2024
cfe60f0
Merge branch 'main' into add-morphing
qryxip Apr 20, 2024
c25cde9
Merge branch 'main' into add-morphing
qryxip Apr 30, 2024
d55a3a8
Merge branch 'main' into add-morphing
qryxip May 3, 2024
6bb862c
Merge branch 'main' into add-morphing
qryxip May 6, 2024
3b839d4
Merge branch 'main' into add-morphing
qryxip May 11, 2024
191ccea
Merge branch 'main' into add-morphing
qryxip May 19, 2024
3b8f429
Fix a test
qryxip May 19, 2024
0ddd0e4
Merge branch 'main' into add-morphing
qryxip May 22, 2024
97b2e81
fixup! Fix a test
qryxip May 22, 2024
5 changes: 5 additions & 0 deletions .github/workflows/build_and_deploy.yml
@@ -194,6 +194,11 @@ jobs:
git fetch private refs/tags/${{ env.PRODUCTION_REPOSITORY_TAG }}
git -c user.name=dummy -c user.email=dummy@dummy.dummy merge FETCH_HEAD
) > /dev/null 2>&1
- if: matrix.os == 'windows-2019'
  name: Install Clang
  uses: KyleMayes/install-llvm-action@v1
  with:
    version: "16.0"
- name: Set up Python 3.8
  if: matrix.whl_local_version
  uses: actions/setup-python@v4
5 changes: 5 additions & 0 deletions .github/workflows/test.yml
@@ -125,6 +125,11 @@ jobs:
runs-on: ${{ matrix.os }}
steps:
  - uses: actions/checkout@v3
  - if: matrix.os == 'windows-2019'
    name: Install Clang
    uses: KyleMayes/install-llvm-action@v1
    with:
      version: "16.0"
Comment on lines +128 to +132
Member Author

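This step is also needed for windows-x86-cpu (i686-pc-windows-msvc), which is skipped except at release time. With it the build succeeds; without it the build fails.

I confirmed this by removing can_skip_in_simple_test.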

  - name: Set up Python 3.8
    uses: actions/setup-python@v4
    with:
59 changes: 48 additions & 11 deletions Cargo.lock

Some generated files are not rendered by default.

11 changes: 9 additions & 2 deletions Cargo.toml
@@ -9,6 +9,7 @@ anstyle-query = "1.0.0"
anyhow = "1.0.65"
assert_cmd = "2.0.8"
async_zip = "=0.0.16"
az = "1.2.1"
bindgen = "0.69.4"
binstall-tar = "0.4.39"
bytes = "1.1.0"
@@ -46,12 +47,14 @@ jni = "0.21.1"
libc = "0.2.134"
libloading = "0.7.3"
libtest-mimic = "0.6.0"
lit2 = "1.0.9"
log = "0.4.17"
nanoid = "0.4.0"
ndarray = "0.15.6"
ndarray-stats = "0.5.1"
num-traits = "0.2.15"
octocrab = { version = "0.19.0", default-features = false }
-once_cell = "1.18.0"
+once_cell = "1.19.0"
ouroboros = "0.18.0"
parse-display = "0.8.2"
pretty_assertions = "1.3.0"
@@ -61,7 +64,7 @@ pyo3-asyncio = "0.20.0"
pyo3-log = "0.9.0"
quote = "1.0.33"
rayon = "1.6.1"
-regex = "1.10.0"
+regex = "1.10.4"
reqwest = { version = "0.11.13", default-features = false }
rstest = "0.15.0"
rstest_reuse = "0.6.0"
Expand Down Expand Up @@ -99,6 +102,10 @@ rev = "e1940f3fd61a48bed5bbec8cd2645e13923b1f80"
git = "/~https://github.com/VOICEVOX/process_path.git"
rev = "de226a26e8e18edbdb1d6f986afe37bbbf35fbf4"

[workspace.dependencies.world]
git = "/~https://github.com/White-Green/WORLD_rs.git"
rev = "37c0d11691afd42e37c627a2a964459c9eaf77b3"

[workspace.package]
version = "0.0.0"
edition = "2021"
4 changes: 4 additions & 0 deletions crates/voicevox_core/Cargo.toml
@@ -12,6 +12,7 @@ directml = ["voicevox-ort/directml"]
[dependencies]
anyhow.workspace = true
async_zip = { workspace = true, features = ["deflate"] }
az.workspace = true
camino.workspace = true
derive-getters.workspace = true
derive-new.workspace = true
@@ -27,6 +28,7 @@ itertools.workspace = true
jlabel.workspace = true
nanoid.workspace = true
ndarray.workspace = true
num-traits.workspace = true
once_cell.workspace = true
open_jtalk.workspace = true
ouroboros.workspace = true
@@ -44,10 +46,12 @@ tracing.workspace = true
uuid = { workspace = true, features = ["v4", "serde"] }
voicevox_core_macros = { path = "../voicevox_core_macros" }
voicevox-ort = { workspace = true, features = ["ndarray", "download-binaries"] }
world = { workspace = true, features = ["ndarray"] }
zip.workspace = true

[dev-dependencies]
heck.workspace = true
lit2.workspace = true
pretty_assertions.workspace = true
rstest.workspace = true
rstest_reuse.workspace = true
60 changes: 60 additions & 0 deletions crates/voicevox_core/src/engine/audio_file.rs
@@ -0,0 +1,60 @@
use std::io::{Cursor, Write as _};

use az::{Az as _, Cast};
use num_traits::Float;

use crate::{synthesizer::DEFAULT_SAMPLING_RATE, AudioQueryModel};

pub(crate) fn to_wav<T: Float + From<i16> + From<f32> + Cast<i16>>(
    wave: &[T],
    audio_query: &AudioQueryModel,
) -> Vec<u8> {
    // TODO: /~https://github.com/VOICEVOX/voicevox_core/issues/762

    let volume_scale = *audio_query.volume_scale();
    let output_stereo = *audio_query.output_stereo();
    let output_sampling_rate = *audio_query.output_sampling_rate();

    // TODO: support 44.1 kHz and similar sampling rates

    let num_channels: u16 = if output_stereo { 2 } else { 1 };
    let bit_depth: u16 = 16;
    let repeat_count: u32 = (output_sampling_rate / DEFAULT_SAMPLING_RATE) * num_channels as u32;
    let block_size: u16 = bit_depth * num_channels / 8;

    let bytes_size = wave.len() as u32 * repeat_count * 2;
    let wave_size = bytes_size + 44;

    let buf: Vec<u8> = Vec::with_capacity(wave_size as usize);
    let mut cur = Cursor::new(buf);

    cur.write_all("RIFF".as_bytes()).unwrap();
    cur.write_all(&(wave_size - 8).to_le_bytes()).unwrap();
    cur.write_all("WAVEfmt ".as_bytes()).unwrap();
    cur.write_all(&16_u32.to_le_bytes()).unwrap(); // fmt header length
    cur.write_all(&1_u16.to_le_bytes()).unwrap(); // linear PCM
    cur.write_all(&num_channels.to_le_bytes()).unwrap();
    cur.write_all(&output_sampling_rate.to_le_bytes()).unwrap();

    let block_rate = output_sampling_rate * block_size as u32;

    cur.write_all(&block_rate.to_le_bytes()).unwrap();
    cur.write_all(&block_size.to_le_bytes()).unwrap();
    cur.write_all(&bit_depth.to_le_bytes()).unwrap();
    cur.write_all("data".as_bytes()).unwrap();
    cur.write_all(&bytes_size.to_le_bytes()).unwrap();

    for &value in wave {
        let v = num_traits::clamp(
            value * <T as From<_>>::from(volume_scale),
            -T::one(),
            T::one(),
        );
        let data = (v * <T as From<_>>::from(0x7fff)).az::<i16>();
        for _ in 0..repeat_count {
            cur.write_all(&data.to_le_bytes()).unwrap();
        }
    }

    cur.into_inner()
}
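For readers checking the arithmetic in `to_wav`, the following is a minimal std-only sketch of the same 44-byte header layout and sample quantization. It is not the PR's code: `wav_header` and `quantize` are hypothetical helper names, the sample type is fixed to `f32` instead of being generic, a plain `as` cast stands in for the `az` cast, and `DEFAULT_SAMPLING_RATE` is assumed to be 24000 based on the commit titled "`24000` → `DEFAULT_SAMPLING_RATE`".

```rust
/// Assumed value; the PR imports this constant from `crate::synthesizer`.
const DEFAULT_SAMPLING_RATE: u32 = 24_000;

/// Length of a RIFF/WAVE header with a 16-byte `fmt ` chunk.
const HEADER_LEN: u32 = 44;

/// Hypothetical helper: builds only the header that `to_wav` writes
/// before the sample data.
fn wav_header(wave_len: u32, output_sampling_rate: u32, output_stereo: bool) -> Vec<u8> {
    let num_channels: u16 = if output_stereo { 2 } else { 1 };
    let bit_depth: u16 = 16;
    // Each input sample is written this many times: once per channel and
    // once per integer multiple of the default sampling rate.
    let repeat_count = (output_sampling_rate / DEFAULT_SAMPLING_RATE) * u32::from(num_channels);
    let block_size: u16 = bit_depth * num_channels / 8; // bytes per frame
    let bytes_size = wave_len * repeat_count * 2; // payload size in bytes

    let mut h = Vec::with_capacity(HEADER_LEN as usize);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(bytes_size + HEADER_LEN - 8).to_le_bytes());
    h.extend_from_slice(b"WAVEfmt ");
    h.extend_from_slice(&16_u32.to_le_bytes()); // fmt chunk length
    h.extend_from_slice(&1_u16.to_le_bytes()); // linear PCM
    h.extend_from_slice(&num_channels.to_le_bytes());
    h.extend_from_slice(&output_sampling_rate.to_le_bytes());
    h.extend_from_slice(&(output_sampling_rate * u32::from(block_size)).to_le_bytes());
    h.extend_from_slice(&block_size.to_le_bytes());
    h.extend_from_slice(&bit_depth.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&bytes_size.to_le_bytes());
    h
}

/// Hypothetical helper: scale by the volume, clamp to [-1, 1], then
/// quantize to a signed 16-bit sample (truncating toward zero).
fn quantize(sample: f32, volume_scale: f32) -> i16 {
    let v = (sample * volume_scale).clamp(-1.0, 1.0);
    (v * 32767.0) as i16
}
```

With 100 input samples, a 48000 Hz output rate, and stereo output, `repeat_count` is 4, so the `data` chunk holds 800 bytes and the RIFF size field is 836. Because `repeat_count` comes from integer division, rates that are not a multiple of 24000 are not resampled correctly, which is what the TODO about 44.1 kHz refers to.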
5 changes: 4 additions & 1 deletion crates/voicevox_core/src/engine/mod.rs
@@ -1,15 +1,18 @@
mod acoustic_feature_extractor;
pub(crate) mod audio_file;
mod full_context_label;
mod kana_parser;
mod model;
mod mora_list;
mod morph;
pub(crate) mod open_jtalk;

pub(crate) use self::acoustic_feature_extractor::OjtPhoneme;
pub(crate) use self::audio_file::to_wav;
pub(crate) use self::full_context_label::{
    extract_full_context_label, mora_to_text, FullContextLabelError,
};
pub(crate) use self::kana_parser::{create_kana, parse_kana, KanaParseError};
-pub use self::model::{AccentPhraseModel, AudioQueryModel, MoraModel};
+pub use self::model::{AccentPhraseModel, AudioQueryModel, MoraModel, MorphableTargetInfo};
pub(crate) use self::mora_list::mora2text;
pub use self::open_jtalk::FullcontextExtractor;
5 changes: 5 additions & 0 deletions crates/voicevox_core/src/engine/model.rs
@@ -82,6 +82,11 @@ impl AudioQueryModel {
    }
}

#[derive(Deserialize, Serialize, PartialEq, Debug)]
pub struct MorphableTargetInfo {
    pub is_morphable: bool,
}

#[cfg(test)]
mod tests {
    use pretty_assertions::assert_eq;