Using Piper With Foliate For Text-to-Speech

I recently discovered that the Piper text-to-speech utility can be paired with the Foliate e-book reader to really enhance the TTS experience. For those unaware, Piper is a fast, local TTS engine whose voices are trained with VITS, a neural network architecture for speech synthesis. What does that mean exactly? To put it in the simplest of terms, each voice is a neural network trained on recordings of human speech, so the output comes very close to sounding natural. These voices sound nothing like the robotic ones produced by Festival and eSpeak NG, the classic TTS solutions for Linux. Paired with Piper, Foliate becomes a pseudo-audiobook player. I say “pseudo” because the voices do slip up from time to time, but the result is much more pleasant to listen to than the traditional solutions.

Installing Foliate

Foliate should be available in most major distributions’ package repositories. Ubuntu and Debian users can install it with apt using the first command below; it’s also available through Snap and Flatpak, using the other two.

$ sudo apt install foliate
$ sudo snap install foliate
$ flatpak install com.github.johnfactotum.Foliate

Packages are also available for Arch, Fedora, and openSUSE.

Installing & Configuring Piper

Piper is a little trickier to set up, since it’s a Python application, but it’s straightforward if you have pip3 installed. If not, install it from your distro’s package repositories (on Debian and Ubuntu, it’s the python3-pip package), and then run the command below. Also, if you don’t have Python 2 installed, you should be able to just call pip instead of pip3.

$ pip3 install piper-tts
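If the piper-tts script later fails with “piper: command not found”, the usual culprit is PATH: per-user pip installs typically land in ~/.local/bin, which not every distro adds to PATH. Here’s a minimal sketch of a check, assuming that location; check_piper is just a hypothetical name for illustration.

```shell
# Sketch: confirm the piper command installed by pip is reachable.
# Assumption: pip put per-user scripts in ~/.local/bin (the usual spot);
# adjust the path below if your system uses a different one.
check_piper() {
  if command -v piper >/dev/null 2>&1; then
    echo "piper found at $(command -v piper)"
    return 0
  fi
  if [ -x "$HOME/.local/bin/piper" ]; then
    echo "piper is in ~/.local/bin, but that directory is not on your PATH"
    echo "try: export PATH=\"\$HOME/.local/bin:\$PATH\""
    return 1
  fi
  echo "piper not found; try re-running: pip3 install piper-tts"
  return 1
}
```

Paste it into a terminal and run check_piper. Keep in mind that Foliate inherits the PATH of your desktop session, so if you had to add ~/.local/bin by hand, you may need to log out and back in, or call piper by its full path in the script.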

Once that’s finished, create a directory to hold the voices.

$ mkdir -p ~/.local/share/piper-tts

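From here, we create a wrapper script that passes our arguments to Piper. This is what Foliate will call to use Piper as its TTS service: the script reads text from standard input, Piper turns it into raw audio, and aplay plays it. It assumes you have aplay installed (it’s part of the alsa-utils package), but you can pipe the audio to just about any player that accepts raw PCM.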

$ sudo vim /usr/bin/piper-tts

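#!/bin/bash
# Read text from stdin, synthesize it with Piper, and stream the raw audio to aplay in the background
piper --download-dir "$HOME/.local/share/piper-tts" --data-dir "$HOME/.local/share/piper-tts" --model en_GB-northern_english_male-medium --sentence-silence 0.5 --output-raw | aplay -r 22050 -f S16_LE -t raw - &
# On SIGINT, stop the background playback and exit cleanly
trap 'kill $!; exit 0' INT
wait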

$ sudo chmod +x /usr/bin/piper-tts
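The script above hands Piper’s raw audio to aplay. If you’d rather use something else, here’s a minimal sketch that picks the first raw-PCM player it finds: aplay, PulseAudio’s pacat, or SoX’s play. It assumes Piper’s output is mono, signed 16-bit little-endian audio at 22,050 Hz, matching the aplay flags above; pick_player and the PLAYER array are hypothetical names.

```shell
# Sketch: pick a raw-PCM player for Piper's --output-raw stream and store
# its command line in the PLAYER array. Assumes mono, signed 16-bit
# little-endian samples at RATE Hz (22050 for "medium" and "high" voices).
RATE=22050

pick_player() {
  if command -v aplay >/dev/null 2>&1; then
    PLAYER=(aplay -r "$RATE" -f S16_LE -t raw -)
  elif command -v pacat >/dev/null 2>&1; then
    PLAYER=(pacat --playback --format=s16le --rate="$RATE" --channels=1)
  elif command -v play >/dev/null 2>&1; then
    # SoX: global -q, then input format options, then "-" for stdin
    PLAYER=(play -q -t raw -r "$RATE" -e signed-integer -b 16 -c 1 -L -)
  else
    echo "no raw audio player found (aplay, pacat, or SoX's play)" >&2
    return 1
  fi
}
```

In the wrapper, you’d call pick_player || exit 1 before the piper line, then end the pipeline with | "${PLAYER[@]}" & instead of the aplay command. Storing the command in an array, rather than wrapping the player in a function, keeps the player itself as the last process in the pipeline, so the script’s kill $! still reaches it.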

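The voice model I use is a British voice, and in my opinion it sounds the most natural. Feel free to pick a different one, though. Voices are available on Hugging Face (in the rhasspy/piper-voices repository), and there are several to choose from in multiple languages. To switch voices, replace the --model value with another name that follows the same language_REGION-name-quality naming scheme. For example, if you wanted to use Joe from the US, his model name is “en_US-joe-medium”. Some voices also offer “high” and “low” variants, providing higher or lower audio quality. Keep in mind that “low” voices output 16,000 Hz audio rather than the 22,050 Hz of “medium” and “high” voices, so change aplay’s -r 22050 to -r 16000 if you pick one. Also, feel free to play with --sentence-silence, which sets how many seconds of silence Piper inserts between sentences. The 0.5 I set seems good enough, but you might have a different preference. Finally, to test that it’s working, you can pipe a string of text to the script.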

$ echo 'This is a test sentence.' | piper-tts
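Once the test sentence plays, Piper should have downloaded the voice into ~/.local/share/piper-tts as a .onnx model plus a matching .onnx.json config. Because “low” and “x_low” voices produce 16,000 Hz audio while “medium” and “high” voices produce 22,050 Hz, it can help to check which rate each downloaded voice needs before setting aplay’s -r flag. Here’s a minimal sketch that reads the sample_rate field from each config; it assumes that file layout, and voice_rate and list_voices are hypothetical helper names.

```shell
# Sketch: inspect the voices Piper has downloaded. Assumes each voice is
# stored as <name>.onnx next to a <name>.onnx.json config containing a
# line like "sample_rate": 22050.
VOICE_DIR="$HOME/.local/share/piper-tts"

# Print the sample rate for one voice, falling back to 22050 Hz (the rate
# of "medium" and "high" voices) when the config is missing.
voice_rate() {
  local config="$VOICE_DIR/$1.onnx.json"
  local rate=""
  if [ -f "$config" ]; then
    rate=$(grep -o '"sample_rate": *[0-9][0-9]*' "$config" | head -n 1 | grep -o '[0-9][0-9]*$')
  fi
  echo "${rate:-22050}"
}

# List every downloaded voice with the aplay -r value it needs.
list_voices() {
  local model name
  for model in "$VOICE_DIR"/*.onnx; do
    [ -e "$model" ] || continue   # the glob matched nothing
    name=$(basename "$model" .onnx)
    echo "$name $(voice_rate "$name")"
  done
}
```

Paste both functions into a terminal and run list_voices; each line shows a voice name followed by its rate, which for the medium voice used here should be 22050.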

Configuring Foliate

Now that Piper is set up, we just need to tell Foliate to use it. To do so, launch Foliate, then click the cog icon in the top-right corner to open its settings. Click the “Configure” button for the TTS settings, select “Other”, and enter piper-tts in the command box. Now click the “Speak” button, and Piper should read the sample text aloud! As a tip, you can press F5 to start reading the current page, or highlight a portion of text and have it start reading from there.