MODELYST
Generative media

Ranked by human-preference arena ELO (text-to-speech) · via Artificial Analysis · verified Jun 15, 2026
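| # | Model | Creator | ELO | ±95% | Votes | Released |
|---|---|---|---|---|---|---|
| 1 | Fun-Realtime-TTS | Alibaba | 1226 | -16/16 | | |
| 2 | Gemini 3.1 Flash TTS | Google | 1215 | -13/13 | | |
| 3 | Realtime TTS-2 - Research Preview | Inworld | 1208 | -14/14 | | |
| 4 | Sonic 3.5 | Cartesia | 1197 | -15/15 | | |
| 5 | xAI Text to Speech | xAI | 1196 | -14/14 | | |
| 6 | Realtime TTS 1.5 Max | Inworld | 1195 | -14/14 | | |
| 7 | Async Flash v1.5 | async | 1181 | -15/15 | | |
| 8 | StepAudio 2.5 TTS | StepFun | 1177 | -16/16 | | |
| 9 | Eleven v3 | ElevenLabs | 1176 | -12/12 | | |
| 10 | Speech 2.8 HD | MiniMax | 1170 | -12/12 | | |
| 11 | Inworld TTS 1 Max | Inworld | 1161 | -13/13 | | |
| 12 | Async Pro v1.0 | async | 1159 | -15/15 | | |
| 13 | Step TTS 2 (Mar 2026) | StepFun | 1155 | -14/14 | | |
| 14 | Speech 2.8 Turbo | MiniMax | 1152 | -12/12 | | |
| 15 | Realtime TTS 1.5 Mini | Inworld | 1151 | -12/12 | | |
| 16 | Speech 2.6 Turbo | MiniMax | 1137 | -11/11 | | |
| 17 | Speech 2.6 HD | MiniMax | 1136 | -12/12 | | |
| 18 | SIMBA 3.0 | Speechify | 1129 | -13/13 | | |
| 19 | Inworld TTS 1 | Inworld | 1121 | -12/12 | | |
| 20 | Azure HD 2.5 | Microsoft | 1121 | -13/13 | | |
| 21 | Fish Audio S2 Pro | Fish Audio | 1117 | -14/14 | | |
| 22 | Turbo v2.5 | ElevenLabs | 1109 | -10/10 | | |
| 23 | Speech-02-HD | MiniMax | 1108 | -12/12 | | |
| 24 | Step Audio EditX (Mar 2026) | StepFun | 1107 | -14/14 | | |
| 25 | Multilingual v2 | ElevenLabs | 1107 | -10/10 | | |
| 26 | Gradium TTS | Gradium | 1098 | -15/15 | | |
| 27 | TTS-1 HD | OpenAI | 1096 | -12/12 | | |
| 28 | Flash v2.5 | ElevenLabs | 1095 | -11/11 | | |
| 29 | Speech-02-Turbo | MiniMax | 1090 | -12/12 | | |
| 30 | Lightning V3.1 Pro | Smallest.ai | 1090 | -14/14 | | |
| 31 | Chatterbox HD | Resemble AI | 1086 | -13/13 | | |
| 32 | TTS-1 | OpenAI | 1086 | -10/10 | | |
| 33 | Sonic 3 | Cartesia | 1083 | -12/12 | | |
| 34 | Gemini 2.5 Flash Lite TTS | Google | 1082 | -12/12 | | |
| 35 | Studio | Google | 1081 | -11/11 | | |
| 36 | MiMo-V2.5-TTS | Xiaomi | 1078 | -14/14 | | |
| 37 | Voxtral TTS | Mistral | 1077 | -14/14 | | |
| 38 | Polly Generative | Amazon | 1072 | -12/12 | | |
| 39 | Azure Neural | Microsoft | 1067 | -12/12 | | |
| 40 | Polly Long-Form | Amazon | 1064 | -13/13 | | |
| 41 | T2A-01-HD | MiniMax | 1064 | -11/11 | | |
| 42 | Kokoro 82M v1.0 | Kokoro | 1061 | -11/11 | | |
| 43 | Magpie-Multilingual 357M (Feb 2026) | NVIDIA | 1060 | -14/14 | | |
| 44 | OpenAudio S1 | Fish Audio | 1060 | -11/11 | | |
| 45 | SIMBA 1.6 | Speechify | 1060 | -13/13 | | |
| 46 | Octave 2 | Hume AI | 1060 | -12/12 | | |
| 47 | Async Flash v1.0 | async | 1059 | -11/11 | | |
| 48 | Chirp 3: HD | Google | 1056 | -12/12 | | |
| 49 | Maya1 | Maya Research | 1051 | -12/12 | | |
| 50 | Journey | Google | 1050 | -13/13 | | |
| 51 | Sonic English (Oct 2024) | Cartesia | 1048 | -12/12 | | |
| 52 | Coda | Rime | 1044 | -14/14 | | |
| 53 | Gemini 2.5 Pro (Dec 2025) | Google | 1037 | -12/12 | | |
| 54 | SIMBA 1.0 | Speechify | 1034 | -11/11 | | |
| 55 | Gemini 2.5 Flash TTS (Dec 2025) | Google | 1032 | -12/12 | | |
| 56 | Lightning v3.1 | Smallest.ai | 1030 | -13/13 | | |
| 57 | T2A-01-Turbo | MiniMax | 1028 | -11/11 | | |
| 58 | Octave TTS | Hume AI | 1027 | -11/11 | | |
| 59 | MiMo-V2-TTS | Xiaomi | 1012 | -14/14 | | |
| 60 | Fish Speech 1.5 | Fish Audio | 1008 | -11/11 | | |
| 61 | Magpie-Multilingual 357M | NVIDIA | 1008 | -11/11 | | |
| 62 | Arcana v3 | Rime | 1006 | -14/14 | | |
| 63 | Chatterbox | Resemble AI | 1004 | -11/11 | | |
| 64 | MAI-Voice-1 | Microsoft | 1004 | -13/13 | | |
| 65 | Zonos-v0.1 | Zyphra | 1000 | 0/0 | | |
| 66 | VibeVoice 7B | Microsoft | 968 | -13/13 | | |
| 67 | LMNT | LMNT | 967 | -12/12 | | |
| 68 | Murf Speech Gen 2 | Murf AI | 967 | -11/11 | | |
| 69 | VibeVoice 1.5B | Microsoft | 961 | -13/13 | | |
| 70 | OpenVoice v2 | OpenVoice | 960 | -13/13 | | |
| 71 | Magpie Multilingual | NVIDIA | 939 | -14/14 | | |
| 72 | Neuphonic TTS | Neuphonic | 939 | -13/13 | | |
| 73 | Qwen3 TTS Flash | Alibaba | 930 | -14/14 | | |
| 74 | Qwen3 TTS | Alibaba | 917 | -13/13 | | |
| 75 | XTTS v2 | Coqui | 914 | -15/15 | | |
| 76 | WaveNet | Google | 895 | -11/11 | | |
| 77 | StyleTTS 2 | StyleTTS | 889 | -15/15 | | |
| 78 | Mist V2 | Rime | 881 | -15/15 | | |
| 79 | Polly Neural | Amazon | 881 | -14/14 | | |
| 80 | Standard | Google | 876 | -11/11 | | |
| 81 | Neural2 | Google | 875 | -12/12 | | |
| 82 | Noiz TTS | Noiz | 853 | -17/17 | | |
| 83 | MetaVoice v1 | MetaVoice | 838 | -18/18 | | |
| 84 | Falcon (Beta) | Murf AI | 826 | -15/15 | | |
| 85 | Polly Standard | Amazon | 812 | -15/15 | | |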

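ELO ratings come from blind pairwise human votes in the Artificial Analysis arenas, so they measure listener preference, not capability. The 95% confidence intervals (±95%) show how settled each rating is: when two models' intervals overlap, their relative order is not firmly established. Votes = arena appearances.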
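
To make the ratings and intervals easier to read, here is a minimal Python sketch. It assumes the conventional Elo logistic curve with a 400-point scale; the page does not state which rating model Artificial Analysis actually fits, so the resulting percentages are only a rough guide. The function names `expected_win_rate` and `intervals_overlap` are illustrative, not part of any published API.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B in one blind vote.

    Assumption: the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


def intervals_overlap(elo_a: float, ci_a: float, elo_b: float, ci_b: float) -> bool:
    """True when the two symmetric 95% intervals share at least one point."""
    low_a, high_a = elo_a - ci_a, elo_a + ci_a
    low_b, high_b = elo_b - ci_b, elo_b + ci_b
    return low_a <= high_b and low_b <= high_a


# (ELO, half-width of the 95% interval) copied from ranks 1, 2 and 85.
fun_realtime = (1226, 16)
gemini_flash = (1215, 13)
polly_standard = (812, 15)

# An 11-point gap is close to a coin flip under this assumption.
print(round(expected_win_rate(fun_realtime[0], gemini_flash[0]), 3))
# Overlapping intervals: the order of ranks 1 and 2 is not settled.
print(intervals_overlap(*fun_realtime, *gemini_flash))
# A 414-point gap corresponds to a strong preference.
print(round(expected_win_rate(fun_realtime[0], polly_standard[0]), 3))
```

Under these assumptions the top two models are nearly indistinguishable (roughly a 52% preference rate, with overlapping intervals), while the gap between first and last place corresponds to roughly a 92% preference rate.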