Input Tensor Shape Explanation

by anstdev - opened Aug 3

Aug 3

Hello, I am trying to integrate this model in my project (using Unity Sentis).
However, I am struggling to create the proper input tensor.

The input tensor shape is reported as [2, 'num_splits', 512, 1024] of type float (I read the input shape using the approach from https://stackoverflow.com/a/73955585).

My questions:

- What does each input dimension represent? Is there a mapping to well-known Spleeter parameters?
- Is the input tensor different from that of the original Spleeter model?
- How do I construct the proper input tensor from a float[] of audio samples? I assume the model expects 44100 Hz audio; is that correct?
  - More specifically, which parameters are needed for the short-time Fourier transform (STFT), such as windowSize and hopSize?
- Similarly, could you please also explain the output tensor shape?

Thanks for the help!

anstdev

Aug 3

You can find my current attempt on GitHub: https://github.com/achimmihca/SpleeterAiUnityDemo
More specifically: https://github.com/achimmihca/SpleeterAiUnityDemo/blob/main/Assets/Scenes/SpleeterAudioSeparator.cs

csukuangfj

Owner Aug 3

First of all, sherpa-onnx provides a C# API.

Second, if you don't want to use sherpa-onnx, you can have a look at
https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/spleeter/separate_onnx.py
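
For readers who want a rough picture before opening the script, here is a minimal sketch of how an input of shape [2, num_splits, 512, 1024] could be built from stereo samples. It is not taken from the linked script. It assumes the original Spleeter defaults (44100 Hz, window size 4096, hop size 1024, 1024 frequency bins kept, 512 frames per split) and assumes the dimensions mean [channels, splits, frames, bins]. The function names `magnitude_stft` and `build_input` are hypothetical, and details such as padding, centering, and window choice may differ in the real script, which remains the authoritative reference.

```python
import numpy as np

# Assumed values based on the original Spleeter defaults; verify against separate_onnx.py.
SAMPLE_RATE = 44100  # expected audio sample rate in Hz
N_FFT = 4096         # STFT window size
HOP = 1024           # STFT hop size
NUM_BINS = 1024      # frequency bins kept per frame (assumed last input dimension)
NUM_FRAMES = 512     # frames per split (assumed third input dimension)


def magnitude_stft(channel: np.ndarray) -> np.ndarray:
    """Magnitude spectrogram of one channel, shape (frames, NUM_BINS)."""
    window = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window
    # Zero-pad the tail so that every hop starts a full-length frame.
    num_frames = max(1, int(np.ceil(len(channel) / HOP)))
    padded = np.zeros((num_frames - 1) * HOP + N_FFT, dtype=np.float32)
    padded[: len(channel)] = channel
    frames = np.stack(
        [padded[i * HOP : i * HOP + N_FFT] * window for i in range(num_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)  # (frames, N_FFT // 2 + 1)
    return np.abs(spectrum[:, :NUM_BINS]).astype(np.float32)


def build_input(stereo: np.ndarray) -> np.ndarray:
    """stereo: float array of shape (2, num_samples).

    Returns an array of shape (2, num_splits, NUM_FRAMES, NUM_BINS).
    """
    mags = np.stack([magnitude_stft(ch) for ch in stereo])  # (2, frames, bins)
    num_splits = int(np.ceil(mags.shape[1] / NUM_FRAMES))
    pad = num_splits * NUM_FRAMES - mags.shape[1]
    mags = np.pad(mags, ((0, 0), (0, pad), (0, 0)))  # zero-pad the frame axis
    return mags.reshape(2, num_splits, NUM_FRAMES, NUM_BINS)
```

Under these assumptions, roughly 12 seconds of audio (512 frames at a hop of 1024 samples) fills one split, and longer inputs produce additional splits. Only magnitudes are computed here; reconstructing audio from the model output would also need the complex STFT, which this sketch does not keep.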

csukuangfj changed discussion status to closed Aug 3

anstdev

Aug 3

Thank you, the linked Python implementation is very helpful for me.
Somehow I was not able to find it. Sorry for the inconvenience.
