- Create a Docker image powered by the Piper engine for TTS.
- Run a fine-tuning job based on this image. This job produces an ONNX model for TTS.
- Deploy the model as a Serverless AI endpoint.
- Synthesize speech from text by using the deployed model.
Costs
Nebius AI Cloud charges you for the following billing items:
- Compute virtual machines (VMs).
- Boot disks attached to the VMs.
- Used space in Standard storage in an Object Storage bucket.
Steps
Prepare infrastructure
- Create a CPU-only VM. You need this VM to build the Docker image: its Linux operating system matches the image architecture.
  Configure SSH access to the VM so that you can connect to it later.
  Web console:
  - In the web console, go to Compute → Virtual machines.
  - Click Create virtual machine.
  - On the page that opens, set the following VM configuration:
    - Computing resources: Without GPU.
      - Platform: Non-GPU AMD EPYC Genoa.
      - Preset: 16 CPUs — 64 GiB RAM.
    - Boot disk size: at least 100 GiB.
    - Public IP address: Auto assign dynamic IP.
    - Username and SSH key: Configure access credentials.
  - Click Create VM.
- Create a bucket to store fine-tuning artifacts.
  Web console:
  - In the web console, go to Storage → Object Storage.
  - Click Create bucket.
  - In the Maximum size field, select Unlimited. Leave the other settings at their default values.
  - Click Create bucket.
Prepare a dataset
On a local machine, prepare a dataset for training the ONNX model. After that, upload the dataset to the bucket.
- Create a working directory:
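The tutorial does not prescribe a directory name; a minimal sketch, using a hypothetical `~/piper-dataset` directory:

```shell
# Create and enter a working directory for dataset preparation
# (the directory name ~/piper-dataset is an example, not prescribed by the tutorial)
mkdir -p ~/piper-dataset
cd ~/piper-dataset
```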
- Create and activate a virtual Python environment:
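For example, assuming the working directory from the previous step is `~/piper-dataset`:

```shell
# Create a virtual environment inside the working directory and activate it
python3 -m venv ~/piper-dataset/.venv
source ~/piper-dataset/.venv/bin/activate
```

Re-run the `source` line in every new terminal session before working with the dataset.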
- In this environment, install the required Python dependencies for the dataset preparation:
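The exact dependency list comes from the tutorial's files and is not reproduced here; based on the tools mentioned in the surrounding steps (Hugging Face datasets, TorchCodec), a plausible set might be:

```shell
# Assumed dependency list; the tutorial's actual requirements may differ
pip install datasets soundfile torch torchcodec
```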
- Install FFmpeg, a tool for recording and converting audio that is required by TorchCodec.
  You can install FFmpeg by running conda install ffmpeg or brew install ffmpeg (macOS only).
- Download five training samples from Hugging Face:
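The tutorial's actual download script is not shown here. A hypothetical sketch, using the public LJ Speech dataset as a stand-in source of (audio, text) samples, could look like this (the `datasets` audio-decoding API varies between library versions):

```shell
# Hypothetical download script: dataset name, file layout, and decoding details
# are assumptions and may differ from the tutorial's script.
mkdir -p input/raw
python - <<'EOF'
from itertools import islice
import soundfile as sf
from datasets import load_dataset

# Stream the dataset so only the first five samples are fetched
ds = load_dataset("keithito/lj_speech", split="train", streaming=True)
for i, row in enumerate(islice(ds, 5)):
    audio = row["audio"]  # decoded audio; structure may vary across datasets versions
    sf.write(f"input/raw/sample_{i}.wav", audio["array"], audio["sampling_rate"])
    with open(f"input/raw/sample_{i}.txt", "w", encoding="utf-8") as f:
        f.write(row["text"])
print("Done")
EOF
```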
- After the script prints Done, check that the samples are downloaded. The output should list the downloaded sample files.
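One way to list the downloaded files, assuming the `input/raw` layout used above:

```shell
# List the downloaded samples, one file per line
ls -1 input/raw
```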
- Upload the input folder to the bucket created earlier:
  Web console:
  - In the web console, go to Storage → Object Storage.
  - Open the bucket page.
  - Create the /mnt/data/input/raw directory. To do so, click Add → Folder for every directory in this path.
  - Go to /mnt/data/input/raw and then click Add → Object.
  - Upload the samples.
Prepare files for the Docker image
- To connect to the VM, get its public IP address:
  Web console:
  - In the web console, go to Compute → Virtual machines.
  - Open the VM page.
  - In Network → Public IPv4, copy the address.
- Connect to the VM by using SSH:
  Specify the username that you set when creating the VM.
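For example, with the placeholders filled in from the previous steps:

```shell
# Replace <username> with the VM username and <vm-ip> with the copied public IP
ssh <username>@<vm-ip>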
- On the VM, create a working directory:
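The directory name `~/piper-nebius` is the one the build step later in this tutorial refers to:

```shell
# Create and enter the working directory for the Docker image files
mkdir -p ~/piper-nebius
cd ~/piper-nebius
```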
- In this directory, create the following files for building the Docker image:
  - train.py
  - app.py
  - requirements.txt
  - sitecustomize.py
  - Dockerfile
  You can check that you stored all the files and didn’t miss any by using ls or tree.
Build and push the Docker image
On the VM:
- Install Docker.
- Install additional packages and prepare Docker for building the image:
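The tutorial does not list the packages here; a typical preparation on an Ubuntu VM, given as an assumption, might be:

```shell
# Assumed preparation steps; the tutorial's actual package list may differ.
sudo apt-get update && sudo apt-get install -y git
# Allow the current user to run docker without sudo
sudo usermod -aG docker "$USER"
newgrp docker   # or log out and back in for the group change to apply
```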
- Check that the Docker daemon is running:
  If Docker is running, this command returns a table of containers (which can be empty). If you don’t see the table, the daemon isn’t running; launch it.
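The description above matches the standard container-listing command:

```shell
# Prints a table of running containers (the table may be empty)
docker ps
```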
- Create an account in Docker Hub. Use it for authentication when you push your image to a repository.
- Create a public repository in Docker Hub. You will push your Docker image there.
- In the ~/piper-nebius directory, build the image:
  In the command, specify your public repository, for example, myrepository/tts:piper-nebius-ui-tutorial.
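Using the example repository name from this step, the build command could look like this:

```shell
# Run from ~/piper-nebius; replace myrepository with your Docker Hub repository
docker build -t myrepository/tts:piper-nebius-ui-tutorial .
```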
- Authenticate in Docker Hub:
  Specify your Docker Hub username and enter your password when prompted.
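For example:

```shell
# Replace <username> with your Docker Hub username; you are prompted for the password
docker login -u <username>
```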
- Push the image to the repository:
  This operation can take several minutes to complete.
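With the example tag used in the build step:

```shell
# Push the built image to your Docker Hub repository
docker push myrepository/tts:piper-nebius-ui-tutorial
```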
Create and deploy the ONNX model by using a Serverless AI job and endpoint
- Create a fine-tuning job that generates the ONNX model:
  Web console:
  - In the web console, go to AI Services → Jobs.
  - Click Create job.
  - On the page that opens, specify the following job parameters:
    - Image path: <repository>/<image>:piper-nebius-ui-tutorial. Set the image that you’ve pushed to the Docker repository.
    - Advanced settings → Entrypoint command: python3.
    - Advanced settings → Arguments: /app/train.py --raw-dir /mnt/data/input/raw --work-dir /tmp/work --output-dir /mnt/data/output --voice-name demo_voice --no-base-ckpt --max-epochs 50 --batch-size 4 --num-workers 0.
      Why these arguments:
      - --raw-dir /mnt/data/input/raw: Matches the uploaded files.
      - --work-dir /tmp/work: Ensures that files are properly saved to Object Storage.
      - --output-dir /mnt/data/output: Saves the exported ONNX model to the mounted volume.
      - --no-base-ckpt: Helps avoid checkpoint compatibility problems in the dataset path.
      - --batch-size 4 --num-workers 0: Standard settings for a small dataset.
    - Computing resources: Keep the predefined settings.
    - Mount volumes: Bucket.
      - Mount path: /mnt/data. After that, click Attach bucket and then select the bucket created earlier.
  - Click Create.
  After the job reaches the Complete status, the files output/model.onnx and output/model.onnx.json are created in the bucket. These files contain the produced model.
- Deploy the model on a Serverless AI endpoint:
  Web console:
  - In the web console, go to AI Services → Endpoints.
  - Click Create endpoint.
  - On the page that opens, specify the following endpoint parameters:
    - Image path: <repository>/<image>:piper-nebius-ui-tutorial. Set the image that you’ve pushed to the Docker repository.
    - Ports: 8000.
    - Advanced settings → Entrypoint command: uvicorn.
    - Advanced settings → Arguments: app:app --host 0.0.0.0 --port 8000.
    - Computing resources: Keep the predefined settings.
    - Mount volumes: Bucket.
      - Mount path: /mnt/data. After that, click Attach bucket and then select the bucket created earlier.
    - IP address: Public static IP.
  - Click Create.
  Wait until the endpoint reaches the Running status.
Synthesize speech
- Get the endpoint IP address:
  Web console:
  - In the web console, go to AI Services → Endpoints.
  - Open the page of the deployed endpoint.
  - Copy the IP address from the Network → Public endpoints field.
- To verify the endpoint health, run a health check:
  Expected output: a response containing the "ok":true message, which shows that the endpoint is healthy.
To synthesize speech, call the endpoint:
The method generates the
speech.wavfile with the recordedHello worldphrase. The audio quality can be low because only five samples from a dataset were used to train the model. That is expected because the tutorial’s purpose is only to showcase the process of the speech synthesis. To improve the audio quality, use a bigger dataset and more samples for the model training.
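The synthesis call could be sketched as follows; the route name and JSON field are assumptions and must match what app.py actually implements:

```shell
# Hypothetical request shape; adjust the route and payload to match app.py.
# Replace <endpoint-ip> with the endpoint's public IP address.
curl -X POST "http://<endpoint-ip>:8000/synthesize" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output speech.wav
```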