Convolution1D and Deconvolution1D layers #4811
Comments
currently, no vulkan conv1d / deconv1d
Thank you @nihui.
Additionally, this holds true for the vocoders of both VITS and DiffSinger; in short, all TTS synthesis relies on these layers.
I had to create
Convolution1D_vulkan.h
Main.cpp
Everything compiled well, but I don't know how to implement this in my custom layer without layer_shader_type.h and layer_shader_type_enum.h.
I found out how to do this.
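For anyone hitting the same wall: the built-in shader enums in layer_shader_type.h are only needed for layers compiled into ncnn itself; a layer living outside the ncnn tree is usually hooked in through register_custom_layer. A rough sketch of that route, assuming a MyConvolution1D class derived from ncnn::Layer (the class name and the custom type name in the .param file are placeholders, not the exact code used here):

```cpp
#include "layer.h"
#include "net.h"

// Placeholder custom layer; a real implementation would create its own
// Vulkan pipeline from the compiled convolution1d.comp SPIR-V in
// create_pipeline() and run it in forward().
class MyConvolution1D : public ncnn::Layer
{
public:
    MyConvolution1D() { support_vulkan = true; }
    // load_param / load_model / create_pipeline / destroy_pipeline / forward ...
};

DEFINE_LAYER_CREATOR(MyConvolution1D)

int main()
{
    ncnn::Net net;
    net.opt.use_vulkan_compute = true;

    // Register under the custom type name the .param file uses for these layers.
    net.register_custom_layer("MyConvolution1D", MyConvolution1D_layer_creator);

    // net.load_param(...); net.load_model(...);
    return 0;
}
```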
I have only one question, about the GLSL data type / C data type mapping (the "GLSL data type | C data type | Description" table).
I found the declarations here: gpu.cpp
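For reference, the correspondence that matters when filling specialization constants and push constants is the usual GLSL/C layout; a rough summary (vector alignment follows the std430 rules):

```cpp
// Rough GLSL <-> C correspondence for constants passed to the shader:
//   GLSL int    <->  int32_t      (4 bytes)
//   GLSL float  <->  float        (4 bytes)
//   GLSL ivec4  <->  int32_t[4]   (16 bytes, 16-byte aligned under std430)
//   GLSL vec4   <->  float[4]     (16 bytes, 16-byte aligned under std430)
//
// ncnn routes each 4-byte scalar through a small union (declared in gpu.h),
// roughly like this:
union vk_constant_type
{
    int i;
    float f;
};
```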
under construction ... |
Hi @nihui, thank you for the link and for helping.
Hi, you can join the ncnn QQ group if you use QQ (see the ncnn readme), through which I can provide more help in a timely manner.
Work-in-progress convolution1d.comp for kernel_w > 1 and elempack 1:
#version 450
#if NCNN_fp16_storage
#extension GL_EXT_shader_16bit_storage: require
#endif
#if NCNN_fp16_arithmetic
#extension GL_EXT_shader_explicit_arithmetic_types_float16: require
#endif
#extension GL_EXT_debug_printf : enable
#extension GL_GOOGLE_include_directive: enable
#include "vulkan_activation.comp"
layout (constant_id = 0) const int kernel_w = 1;
layout (constant_id = 1) const int dilation_w = 1;
layout (constant_id = 2) const int stride_w = 1;
layout (constant_id = 3) const int bias_term = 0;
layout (constant_id = 4) const int activation_type = 0;
layout (constant_id = 5) const float activation_param_0 = 0;
layout (constant_id = 6) const float activation_param_1 = 0;
#define shape_constant_id_offset 7
layout (constant_id = shape_constant_id_offset + 0) const int dims = 0;
layout (constant_id = shape_constant_id_offset + 1) const int w = 0;
layout (constant_id = shape_constant_id_offset + 2) const int h = 0;
layout (constant_id = shape_constant_id_offset + 3) const int c = 0;
layout (constant_id = shape_constant_id_offset + 4) const int cstep = 0;
layout (constant_id = shape_constant_id_offset + 5) const int outdims = 0;
layout (constant_id = shape_constant_id_offset + 6) const int outw = 0;
layout (constant_id = shape_constant_id_offset + 7) const int outh = 0;
layout (constant_id = shape_constant_id_offset + 8) const int outc = 0;
layout (constant_id = shape_constant_id_offset + 9) const int outcstep = 0;
#if NCNN_image_shader
layout (binding = 0) uniform unfp sampler2D bottom_blob;
layout (binding = 1, imfmtc1) writeonly uniform unfp image2D top_blob;
layout (binding = 2) uniform unfp sampler3D weight_blob;
layout (binding = 3) uniform unfp sampler3D bias_blob;
#else
layout (binding = 0) readonly buffer bottom_blob { sfp bottom_blob_data[]; };
layout (binding = 1) writeonly buffer top_blob { sfp top_blob_data[]; };
layout (binding = 2) readonly buffer weight_blob { sfp weight_data[]; };
layout (binding = 3) readonly buffer bias_blob { sfp bias_data[]; };
#endif
layout (push_constant) uniform parameter
{
int dims;
int w;
int h;
int c;
int cstep;
int outdims;
int outw;
int outh;
int outc;
int outcstep;
} p;
void print_bottom_blob()
{
int gx = int(gl_GlobalInvocationID.x);
int gy = int(gl_GlobalInvocationID.y);
int gz = int(gl_GlobalInvocationID.z);
if (gx >= 1 || gy >= 1)
return;
debugPrintfEXT("Hello %i, %i\n", gx, gy);
for (int i = 0; i < psc(w); ++i) {
for (int j = 0; j < psc(h); ++j) {
debugPrintfEXT("Elem %d %d: %f ", i, j, bottom_blob_data[i*psc(h)+j]);
}
debugPrintfEXT("\n");
}
}
void main()
{
int gx = int(gl_GlobalInvocationID.x) * 2;
int gy = int(gl_GlobalInvocationID.y) * 2;
int gz = int(gl_GlobalInvocationID.z) * 2;
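// each invocation computes a 2x2 output tile: two output positions (gx, gx+1)
// and two output channels / rows (gy, gy+1)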
//print_bottom_blob();
if (gx >= psc(outw) || gy >= psc(outh) || gz >= psc(outc))
return;
const ivec2 gx2 = gx + ivec2(0, 1);
const ivec2 gy2 = gy + ivec2(0, 1);
afp sum0 = afp(0.0f);
afp sum1 = afp(0.0f);
afp sum2 = afp(0.0f);
afp sum3 = afp(0.0f);
if (bias_term == 1)
{
#if NCNN_image_shader
//sum = image2d_ld1(bias_blob, ivec2(gx, 0));
#else
sum0 = buffer_ld1(bias_data, gy2.x);
sum2 = buffer_ld1(bias_data, gy2.y);
sum1 = sum0;
sum3 = sum2;
#endif
}
#if NCNN_image_shader
//
#else
ivec2 w_offsetv = kernel_w * psc(h) * gy2; // weight offset
for (int iny = 0; iny < psc(h); iny++)
{
ivec2 v_offsetv = iny * psc(w) + gx2 * stride_w; // value offset
for (int x = 0; x < kernel_w; x++)
{
afp v0 = buffer_ld1(bottom_blob_data, v_offsetv.x + x * dilation_w); // Load the value +0
afp v1 = buffer_ld1(bottom_blob_data, v_offsetv.y + x * dilation_w); // Load the value +1
afp k0 = buffer_ld1(weight_data, w_offsetv.x + x); // Load the weight value +0
afp k1 = buffer_ld1(weight_data, w_offsetv.y + x); // Load the weight value +1
sum0 += v0 * k0;
sum1 += v1 * k0;
sum2 += v0 * k1;
sum3 += v1 * k1;
}
w_offsetv += kernel_w; // Move to the next set of weights
}
#endif
sum0 = activation_afp(sum0, activation_type, activation_param_0, activation_param_1);
sum1 = activation_afp(sum1, activation_type, activation_param_0, activation_param_1);
sum2 = activation_afp(sum2, activation_type, activation_param_0, activation_param_1);
sum3 = activation_afp(sum3, activation_type, activation_param_0, activation_param_1);
#if NCNN_image_shader
//image2d_st1(top_blob, ivec3(gx2.x, gy2.x, gz2.x), sum0);
//image2d_st1(top_blob, ivec3(gx2.y, gy2.x, gz2.x), sum1);
//image2d_st1(top_blob, ivec3(gx2.x, gy2.y, gz2.x), sum2);
//image2d_st1(top_blob, ivec3(gx2.y, gy2.y, gz2.x), sum3);
#else
// guard each store separately so odd output sizes still get the last column / row
buffer_st1(top_blob_data, gy2.x * psc(outw) + gx2.x, sum0);
if (gx + 1 < psc(outw)) buffer_st1(top_blob_data, gy2.x * psc(outw) + gx2.y, sum1);
if (gy + 1 < psc(outh)) buffer_st1(top_blob_data, gy2.y * psc(outw) + gx2.x, sum2);
if (gy + 1 < psc(outh) && gx + 1 < psc(outw)) buffer_st1(top_blob_data, gy2.y * psc(outw) + gx2.y, sum3);
#endif
}
My convolution1d.comp for kernel_w > 1 and elempack 1 (unpacked float32) works correctly and produces correct results.
Output:
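For reference, a minimal way to sanity-check the Vulkan path is to run the same input through the CPU and GPU layers and compare the two output blobs element-wise; a sketch (the helper name and tolerance are just illustrative), assuming both outputs are unpacked float32:

```cpp
#include <cmath>
#include <cstdio>
#include "mat.h"

// Compare two dims<=2 float32 ncnn::Mat blobs element-wise.
static bool nearly_equal(const ncnn::Mat& a, const ncnn::Mat& b, float tol = 1e-4f)
{
    if (a.w != b.w || a.h != b.h || a.c != b.c)
        return false;

    for (int y = 0; y < a.h; y++)
    {
        const float* pa = a.row(y);
        const float* pb = b.row(y);
        for (int x = 0; x < a.w; x++)
        {
            if (std::fabs(pa[x] - pb[x]) > tol)
            {
                fprintf(stderr, "mismatch at (%d, %d): %f vs %f\n", x, y, pa[x], pb[x]);
                return false;
            }
        }
    }
    return true;
}
```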
convolution1d_pack4.comp (float32 pack4 blobs and unpacked input weights):
#version 450
#if NCNN_fp16_storage
#extension GL_EXT_shader_16bit_storage: require
#endif
#if NCNN_fp16_arithmetic
#extension GL_EXT_shader_explicit_arithmetic_types_float16: require
#endif
//#extension GL_EXT_debug_printf : enable
#extension GL_GOOGLE_include_directive: enable
#include "vulkan_activation.comp"
layout (constant_id = 0) const int kernel_w = 1;
layout (constant_id = 1) const int dilation_w = 1;
layout (constant_id = 2) const int stride_w = 1;
layout (constant_id = 3) const int bias_term = 0;
layout (constant_id = 4) const int activation_type = 0;
layout (constant_id = 5) const float activation_param_0 = 0;
layout (constant_id = 6) const float activation_param_1 = 0;
#define shape_constant_id_offset 7
layout (constant_id = shape_constant_id_offset + 0) const int dims = 0;
layout (constant_id = shape_constant_id_offset + 1) const int w = 0;
layout (constant_id = shape_constant_id_offset + 2) const int h = 0;
layout (constant_id = shape_constant_id_offset + 3) const int c = 0;
layout (constant_id = shape_constant_id_offset + 4) const int cstep = 0;
layout (constant_id = shape_constant_id_offset + 5) const int outdims = 0;
layout (constant_id = shape_constant_id_offset + 6) const int outw = 0;
layout (constant_id = shape_constant_id_offset + 7) const int outh = 0;
layout (constant_id = shape_constant_id_offset + 8) const int outc = 0;
layout (constant_id = shape_constant_id_offset + 9) const int outcstep = 0;
#if NCNN_image_shader
layout (binding = 0) uniform unfp sampler3D bottom_blob;
layout (binding = 1, imfmtc4) writeonly uniform unfp image3D top_blob;
layout (binding = 2) uniform unfp sampler3D weight_blob;
layout (binding = 3) uniform unfp sampler3D bias_blob;
#else
//layout (binding = 0) readonly buffer bottom_blob { sfp bottom_blob_data[]; };
layout (binding = 0) readonly buffer bottom_blob { sfpvec4 bottom_blob_data[]; };
//layout (binding = 1) writeonly buffer top_blob { sfp top_blob_data[]; };
layout (binding = 1) writeonly buffer top_blob { sfpvec4 top_blob_data[]; };
//layout (binding = 2) readonly buffer weight_blob { sfp weight_data[]; };
//layout (binding = 3) readonly buffer bias_blob { sfp bias_data[]; };
#if NCNN_fp16_packed || (NCNN_fp16_storage && !NCNN_fp16_arithmetic)
layout (binding = 2) readonly buffer weight_blob { sfpvec4 weight_data[]; };
#else
//layout (binding = 2) readonly buffer weight_blob { sfpmat4 weight_data[]; };
//layout (binding = 2) readonly buffer weight_blob { sfpvec4 weight_data[]; };
layout (binding = 2) readonly buffer weight_blob { sfp weight_data[]; };
#endif
layout (binding = 3) readonly buffer bias_blob { sfpvec4 bias_data[]; };
#endif
layout (push_constant) uniform parameter
{
int dims;
int w;
int h;
int c;
int cstep;
int outdims;
int outw;
int outh;
int outc;
int outcstep;
} p;
/*
void print_bottblob()
{
int gx = int(gl_GlobalInvocationID.x);
int gy = int(gl_GlobalInvocationID.y);
int gz = int(gl_GlobalInvocationID.z);
if (gx >= 1 || gy >= 1 || gz >= 1)
return;
//debugPrintfEXT("Hello %i, %i\n", gx, gy);
for (int i = 0; i < psc(h)/4; ++i) {
for (int j = 0; j < psc(w); ++j) {
//for (int j = 0; j < psc(h); ++j) {
//afp v = buffer_ld1(bottom_blob_data, 3);
//debugPrintfEXT("Elem %d %d: %f ", i, j, v);
//debugPrintfEXT("Bot_Blob %d %d: %f ", i, j, bottom_blob_data[i*psc(h)+j]);
afpvec4 test = buffer_ld4(bottom_blob_data, i*psc(w)+j);
debugPrintfEXT(" Top_Blob %d %d: %v4f ", i, j, test);
//afpvec4 value;
//value = buffer_ld4(bottom_blob_data, i*psc(h)+j );
//debugPrintfEXT("Bot_Blob %d %d: %f ", i, j, value);
}
debugPrintfEXT("\n");
}
}
void print_weight()
{
int gx = int(gl_GlobalInvocationID.x);
int gy = int(gl_GlobalInvocationID.y);
int gz = int(gl_GlobalInvocationID.z);
if (gx >= 1 || gy >= 1 || gz >= 1)
return;
debugPrintfEXT("Hello %i, %i\n", gx, gy);
for (int i = 0; i < psc(outh)*4; ++i) {
for (int j = 0; j < psc(outw)*kernel_w; ++j) {
//afp v = buffer_ld1(bottom_blob_data, 3);
//debugPrintfEXT("Elem %d %d: %f ", i, j, v);
debugPrintfEXT("Weight %d %d: %f ", i, j, weight_data[i*psc(outw)*kernel_w+j]);
//afpvec4 test = buffer_ld4(weight_data, i*psc(outw)+j);
//debugPrintfEXT(" Weight %d %d: %v4f ", i, j, test);
}
debugPrintfEXT("\n");
}
}
*/
void main()
{
int gx = int(gl_GlobalInvocationID.x) * 2;
int gy = int(gl_GlobalInvocationID.y) * 2;
int gz = int(gl_GlobalInvocationID.z) * 2;
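// each invocation computes two output positions (gx, gx+1) for two pack4 output
// rows (gy, gy+1), i.e. 2 positions x 8 output channels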
//print_bottblob();
//print_weight();
if (gx >= psc(outw) || gy >= psc(outh) || gz >= psc(outc))
return;
const ivec2 gx2 = gx + ivec2(0, 1);
const ivec2 gy2 = gy + ivec2(0, 1);
const ivec2 gy4 = gy*4 + ivec2(0, 4);
const ivec2 gz2 = gz + ivec2(0, 1);
afpvec4 sum0 = afpvec4(0.0f);
afpvec4 sum1 = afpvec4(0.0f);
afpvec4 sum2 = afpvec4(0.0f);
afpvec4 sum3 = afpvec4(0.0f);
afpvec4 sum4 = afpvec4(0.0f);
afpvec4 sum5 = afpvec4(0.0f);
afpvec4 sum6 = afpvec4(0.0f);
afpvec4 sum7 = afpvec4(0.0f);
afpvec4 sum8 = afpvec4(0.0f);
afpvec4 sum9 = afpvec4(0.0f);
afpvec4 sum10 = afpvec4(0.0f);
afpvec4 sum11 = afpvec4(0.0f);
afpvec4 sum12 = afpvec4(0.0f);
afpvec4 sum13 = afpvec4(0.0f);
afpvec4 sum14 = afpvec4(0.0f);
afpvec4 sum15 = afpvec4(0.0f);
afpvec4 sum16 = afpvec4(0.0f);
afpvec4 sum17 = afpvec4(0.0f);
afpvec4 sum18 = afpvec4(0.0f);
afpvec4 sum19 = afpvec4(0.0f);
if (bias_term == 1)
{
#if NCNN_image_shader
//sum = image2d_ld1(bias_blob, ivec2(gx, 0)); // image shader path not implemented yet
#else
sum4 = buffer_ld4(bias_data, gy2.x);
sum5 = sum4;
sum14 = buffer_ld4(bias_data, gy2.y);
sum15 = sum14;
#endif
}
#if NCNN_image_shader
//
#else
ivec4 gy4_0 = gy4.x + ivec4(0, 1, 2, 3);
ivec4 gy4_1 = gy4.y + ivec4(0, 1, 2, 3);
ivec4 w_offsetv4_0;
ivec4 w_offsetv4_1;
w_offsetv4_0 = kernel_w * psc(h) * 4 * gy4_0;
w_offsetv4_1 = kernel_w * psc(h) * 4 * gy4_1;
for (int iny = 0; iny < psc(h); iny++)
{
ivec2 v_offsetv = iny * psc(w) + gx2 * stride_w;
for (int x = 0; x < kernel_w; x++)
{
afpvec4 v0 = buffer_ld4(bottom_blob_data, v_offsetv.x + x * dilation_w);
afpvec4 v1 = buffer_ld4(bottom_blob_data, v_offsetv.y + x * dilation_w);
afp k0 = buffer_ld1(weight_data, (w_offsetv4_0.x + x) + kernel_w * 0); // Load the weight value
afp k1 = buffer_ld1(weight_data, (w_offsetv4_0.x + x) + kernel_w * 1); // Load the weight value
afp k2 = buffer_ld1(weight_data, (w_offsetv4_0.x + x) + kernel_w * 2); // Load the weight value
afp k3 = buffer_ld1(weight_data, (w_offsetv4_0.x + x) + kernel_w * 3); // Load the weight value
afp k4 = buffer_ld1(weight_data, (w_offsetv4_0.y + x) + kernel_w * 0); // Load the weight value
afp k5 = buffer_ld1(weight_data, (w_offsetv4_0.y + x) + kernel_w * 1); // Load the weight value
afp k6 = buffer_ld1(weight_data, (w_offsetv4_0.y + x) + kernel_w * 2); // Load the weight value
afp k7 = buffer_ld1(weight_data, (w_offsetv4_0.y + x) + kernel_w * 3); // Load the weight value
afp k8 = buffer_ld1(weight_data, (w_offsetv4_0.z + x) + kernel_w * 0); // Load the weight value
afp k9 = buffer_ld1(weight_data, (w_offsetv4_0.z + x) + kernel_w * 1); // Load the weight value
afp k10 = buffer_ld1(weight_data, (w_offsetv4_0.z + x) + kernel_w * 2); // Load the weight value
afp k11 = buffer_ld1(weight_data, (w_offsetv4_0.z + x) + kernel_w * 3); // Load the weight value
afp k12 = buffer_ld1(weight_data, (w_offsetv4_0.w + x) + kernel_w * 0); // Load the weight value
afp k13 = buffer_ld1(weight_data, (w_offsetv4_0.w + x) + kernel_w * 1); // Load the weight value
afp k14 = buffer_ld1(weight_data, (w_offsetv4_0.w + x) + kernel_w * 2); // Load the weight value
afp k15 = buffer_ld1(weight_data, (w_offsetv4_0.w + x) + kernel_w * 3); // Load the weight value
afp k16 = buffer_ld1(weight_data, (w_offsetv4_1.x + x) + kernel_w * 0); // Load the weight value
afp k17 = buffer_ld1(weight_data, (w_offsetv4_1.x + x) + kernel_w * 1); // Load the weight value
afp k18 = buffer_ld1(weight_data, (w_offsetv4_1.x + x) + kernel_w * 2); // Load the weight value
afp k19 = buffer_ld1(weight_data, (w_offsetv4_1.x + x) + kernel_w * 3); // Load the weight value
afp k20 = buffer_ld1(weight_data, (w_offsetv4_1.y + x) + kernel_w * 0); // Load the weight value
afp k21 = buffer_ld1(weight_data, (w_offsetv4_1.y + x) + kernel_w * 1); // Load the weight value
afp k22 = buffer_ld1(weight_data, (w_offsetv4_1.y + x) + kernel_w * 2); // Load the weight value
afp k23 = buffer_ld1(weight_data, (w_offsetv4_1.y + x) + kernel_w * 3); // Load the weight value
afp k24 = buffer_ld1(weight_data, (w_offsetv4_1.z + x) + kernel_w * 0); // Load the weight value
afp k25 = buffer_ld1(weight_data, (w_offsetv4_1.z + x) + kernel_w * 1); // Load the weight value
afp k26 = buffer_ld1(weight_data, (w_offsetv4_1.z + x) + kernel_w * 2); // Load the weight value
afp k27 = buffer_ld1(weight_data, (w_offsetv4_1.z + x) + kernel_w * 3); // Load the weight value
afp k28 = buffer_ld1(weight_data, (w_offsetv4_1.w + x) + kernel_w * 0); // Load the weight value
afp k29 = buffer_ld1(weight_data, (w_offsetv4_1.w + x) + kernel_w * 1); // Load the weight value
afp k30 = buffer_ld1(weight_data, (w_offsetv4_1.w + x) + kernel_w * 2); // Load the weight value
afp k31 = buffer_ld1(weight_data, (w_offsetv4_1.w + x) + kernel_w * 3); // Load the weight value
#if NCNN_fp16_packed || (NCNN_fp16_storage && !NCNN_fp16_arithmetic)
// GL_EXT_shader_16bit_storage does not define f16mat4 type :(
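// NOTE: this fp16 branch appears to be leftover from the pack4 convolution shader and
// is not functional here (k0/k1 collide with the scalar loads above); fp32 only for now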
afpmat4 k0 = afpmat4(
buffer_ld4(weight_data, (w_offsetv.x + x) * 4 + 0),
buffer_ld4(weight_data, (w_offsetv.x + x) * 4 + 1),
buffer_ld4(weight_data, (w_offsetv.x + x) * 4 + 2),
buffer_ld4(weight_data, (w_offsetv.x + x) * 4 + 3)
);
afpmat4 k1 = afpmat4(
buffer_ld4(weight_data, (w_offsetv.y + x) * 4 + 0),
buffer_ld4(weight_data, (w_offsetv.y + x) * 4 + 1),
buffer_ld4(weight_data, (w_offsetv.y + x) * 4 + 2),
buffer_ld4(weight_data, (w_offsetv.y + x) * 4 + 3)
);
#else
#endif
//debugPrintfEXT(" k0, k1, k2, k3 %f, %f, %f, %f \n", k0, k1, k2, k3);
//debugPrintfEXT(" k4, k5, k6, k7 %f, %f, %f, %f \n", k4, k5, k6, k7);
sum0 += v0 * afpvec4(k0, k1, k2, k3); //* k0;
sum1 += v1 * afpvec4(k0, k1, k2, k3); //* k0;
sum2 += v0 * afpvec4(k4, k5, k6, k7); //* k1;
sum3 += v1 * afpvec4(k4, k5, k6, k7); //* k1;
sum6 += v0 * afpvec4(k8, k9, k10, k11); //* k0;
sum7 += v1 * afpvec4(k8, k9, k10, k11); //* k0;
sum8 += v0 * afpvec4(k12, k13, k14, k15); //* k1;
sum9 += v1 * afpvec4(k12, k13, k14, k15); //* k1;
sum10 += v0 * afpvec4(k16, k17, k18, k19); //* k0;
sum11 += v1 * afpvec4(k16, k17, k18, k19); //* k0;
sum12 += v0 * afpvec4(k20, k21, k22, k23); //* k1;
sum13 += v1 * afpvec4(k20, k21, k22, k23); //* k1;
sum16 += v0 * afpvec4(k24, k25, k26, k27); //* k0;
sum17 += v1 * afpvec4(k24, k25, k26, k27); //* k0;
sum18 += v0 * afpvec4(k28, k29, k30, k31); //* k1;
sum19 += v1 * afpvec4(k28, k29, k30, k31); //* k1;
}
w_offsetv4_0 += kernel_w*4;
w_offsetv4_1 += kernel_w*4;
}
sum4.x += sum0.x + sum0.y + sum0.z + sum0.w;
sum4.y += sum2.x + sum2.y + sum2.z + sum2.w;
sum4.z += sum6.x + sum6.y + sum6.z + sum6.w;
sum4.w += sum8.x + sum8.y + sum8.z + sum8.w;
sum5.x += sum1.x + sum1.y + sum1.z + sum1.w;
sum5.y += sum3.x + sum3.y + sum3.z + sum3.w;
sum5.z += sum7.x + sum7.y + sum7.z + sum7.w;
sum5.w += sum9.x + sum9.y + sum9.z + sum9.w;
sum14.x += sum10.x + sum10.y + sum10.z + sum10.w;
sum14.y += sum12.x + sum12.y + sum12.z + sum12.w;
sum14.z += sum16.x + sum16.y + sum16.z + sum16.w;
sum14.w += sum18.x + sum18.y + sum18.z + sum18.w;
sum15.x += sum11.x + sum11.y + sum11.z + sum11.w;
sum15.y += sum13.x + sum13.y + sum13.z + sum13.w;
sum15.z += sum17.x + sum17.y + sum17.z + sum17.w;
sum15.w += sum19.x + sum19.y + sum19.z + sum19.w;
#endif
sum4 = activation_afpvec4(sum4, activation_type, activation_param_0, activation_param_1);
sum5 = activation_afpvec4(sum5, activation_type, activation_param_0, activation_param_1);
sum14 = activation_afpvec4(sum14, activation_type, activation_param_0, activation_param_1);
sum15 = activation_afpvec4(sum15, activation_type, activation_param_0, activation_param_1);
#if NCNN_image_shader
image2d_st1(top_blob, ivec3(gx2.x, gy2.x, gz2.x), sum0);
image2d_st1(top_blob, ivec3(gx2.y, gy2.x, gz2.x), sum1);
image2d_st1(top_blob, ivec3(gx2.x, gy2.y, gz2.x), sum2);
image2d_st1(top_blob, ivec3(gx2.y, gy2.y, gz2.x), sum3);
#else
// guard each store separately so odd output sizes still get the last column / row
buffer_st4(top_blob_data, gy2.x * psc(outw) + gx2.x, sum4);
if (gx + 1 < psc(outw)) buffer_st4(top_blob_data, gy2.x * psc(outw) + gx2.y, sum5);
if (gy + 1 < psc(outh)) buffer_st4(top_blob_data, gy2.y * psc(outw) + gx2.x, sum14);
if (gy + 1 < psc(outh) && gx + 1 < psc(outw)) buffer_st4(top_blob_data, gy2.y * psc(outw) + gx2.y, sum15);
#endif
}
I have finished creating a working convolution1d_vulkan for fp32.
convolution1d.comp
Inference duration for this mel spectrogram: 5 seconds
output.mp4
vulkan conv1d #5060
hi @nihui
try disabling fp16
The following test prints the same result on CPU and GPU:
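For reference, a minimal sketch of turning the fp16 paths off on the net options before loading the model (these are the standard ncnn::Option fields):

```cpp
#include "net.h"

int main()
{
    ncnn::Net net;
    net.opt.use_vulkan_compute = true;

    // Disable the fp16 fast paths so the GPU computes in fp32,
    // which makes the result directly comparable to the CPU path.
    net.opt.use_fp16_packed = false;
    net.opt.use_fp16_storage = false;
    net.opt.use_fp16_arithmetic = false;

    // load_param / load_model and extraction as usual ...
    return 0;
}
```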
Hi @nihui, thank you for your work. Now ncnn is open to new directions such as sound synthesis, voice conversion, music synthesis and TTS.
I also found what the problem was: Convolution1D expects an input with dims=2, but I passed an ncnn::Mat with dims=3. Convolution1D with vulkan=false correctly treats a dims=3 ncnn::Mat as dims=2, but with vulkan=true it produces the wrong result,
so I was getting an erroneous result with vulkan=true because dims=3.
Now I have changed the code (a sketch of the kind of adjustment is shown after the output below).
Output:
and I get the correct result with Convolution1D vulkan=true.
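A minimal sketch of the kind of fix, assuming the dims=3 blob has a single channel so it can be collapsed to the (w, h) shape Convolution1D expects; the actual change in the code above may differ. ncnn::Mat::reshape is the standard API for this:

```cpp
#include "mat.h"

// Collapse a dims=3 blob with c == 1 into the dims=2 shape (w, h)
// that Convolution1D expects, before running the layer.
static ncnn::Mat conv1d_input(const ncnn::Mat& blob)
{
    if (blob.dims == 3 && blob.c == 1)
        return blob.reshape(blob.w, blob.h); // dims=2: w x h

    return blob;
}
```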
Simple question.
My model has many Convolution1D and Deconvolution1D layers. The execution time on CPU and Vulkan is about the same. I just wanted to know: does ncnn support Vulkan acceleration for Convolution1D and Deconvolution1D layers?
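For context, this is how Vulkan compute is requested on the net in my setup (standard ncnn API; the model file names are placeholders):

```cpp
#include "net.h"

int main()
{
    ncnn::Net net;
    net.opt.use_vulkan_compute = true; // request the Vulkan path for supported layers

    net.load_param("model.param"); // placeholder file names
    net.load_model("model.bin");

    // ... create an Extractor, feed the input blob, extract the output as usual
    return 0;
}
```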