I was fairly impressed with kokoro-tts and ended up using it: hexgrad/Kokoro-82M on Hugging Face
I can’t run it locally (no NVIDIA GPU), but Google Colab works perfectly fine for my needs.
If anyone has a strong enough NVIDIA GPU, I would recommend Kokoro.