Accent conversion and controllability remain fundamental challenges in cross-lingual text-to-speech (TTS), particularly for low-resource and phonetically diverse Indic languages. While recent large language model (LLM)-based TTS systems exhibit strong cross-lingual generalization, they provide limited explicit control over accent characteristics and intensity. In this paper, we propose CrossAccent-TTS, a framework that enables both accent control and accent conversion while preserving speaker identity. Specifically, we introduce an Accent Intensity Controller (AIC) that injects weighted language embeddings into the accent subspace, allowing smooth interpolation between accents and fine-grained modulation of accent strength at inference time. Experiments on the Indic Multilingual and L2-ARCTIC datasets show that CrossAccent-TTS achieves precise control of accent intensity, outperforming strong baselines in accent similarity and controllability while maintaining speaker similarity and naturalness.
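The interpolation idea behind the AIC can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `blend_accent`, the linear blending rule, and the plain-list vectors are all assumptions made for illustration. It only shows how a weighted mix of language embeddings, scaled by an intensity value in [0, 1], could move an accent vector smoothly toward a target accent.

```python
from typing import List, Sequence


def blend_accent(
    source: Sequence[float],
    lang_embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
    intensity: float,
) -> List[float]:
    """Hypothetical sketch: shift an accent vector toward a weighted mix
    of language embeddings. The linear rule here is an assumption, not
    the method described in the paper."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize the weights so the target is a convex combination.
    norm = [w / total for w in weights]

    # Weighted average of the language embeddings, one dimension at a time.
    target = [
        sum(w * emb[i] for w, emb in zip(norm, lang_embeddings))
        for i in range(len(source))
    ]

    # intensity = 0 keeps the source accent; intensity = 1 gives the target.
    return [(1.0 - intensity) * s + intensity * t for s, t in zip(source, target)]


if __name__ == "__main__":
    # Two toy 2-dimensional language embeddings, mixed equally.
    embeddings = [[1.0, 0.0], [0.0, 1.0]]
    print(blend_accent([0.0, 0.0], embeddings, [1.0, 1.0], 0.5))
```

Under these assumptions, sweeping `intensity` from 0 to 1 at inference time traces a straight path from the source accent to the blended target, which is one simple way to read "fine-grained modulation of accent strength".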
* Equal Contributions
@inproceedings{crossaccentInterspeech,
title={CrossAccent-TTS: Cross-Lingual Accent-Intensity Controllable Text-to-Speech via Disentangled Speaker and Accent Representations},
author={Annamdevula, Ram and Tatawat, Ankit and Gudmalwar, Ashishkumar and Shah, Nirmesh and Wasnik, Pankaj},
booktitle={INTERSPEECH},
year={2026}
}