Speech-to-Text
This document explains how to use the IMKit SDK to convert voice message audio content into text.
Prerequisites
Ensure your SDK version is ≥ 5.22.0 and the speech-to-text feature is enabled in the RC Console.
This feature applies only to voice messages sent with SDK 5.22.0 or later. High-definition voice messages must also meet these requirements: a sample rate of 8000 Hz or 16000 Hz, a mono channel, AAC format, and a duration of ≤ 60 seconds. Historical voice messages don't support this feature.
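The audio requirements above can be expressed as a simple pre-check. The following is a minimal sketch only: `VoiceAudioInfo` and `meetsSpeechToTextRequirements` are hypothetical names made up for illustration and are not part of the IMKit or IMLib SDK.

```swift
// Hypothetical description of a voice message's audio; NOT an IMKit type.
struct VoiceAudioInfo {
    let sampleRateHz: Int
    let channelCount: Int
    let format: String          // assumed to be a lowercase codec name such as "aac"
    let durationSeconds: Double
}

// Returns true when the audio meets the documented requirements:
// 8000 Hz or 16000 Hz sample rate, mono channel, AAC format, duration <= 60 s.
func meetsSpeechToTextRequirements(_ audio: VoiceAudioInfo) -> Bool {
    let allowedSampleRates: Set<Int> = [8000, 16000]
    return allowedSampleRates.contains(audio.sampleRateHz)
        && audio.channelCount == 1
        && audio.format == "aac"
        && audio.durationSeconds <= 60
}
```

A check like this could run before the conversion entry point is offered, so that unsupported audio never triggers a request.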
Demo





Overview
The IMKit SDK supports recording voice messages of up to 60 seconds. In the chat UI, users can long-press a voice message bubble and select Convert to Text from the menu, which invokes the IMLib SDK's speech-to-text conversion. The SDK records whether each converted text is visible, and when the user re-enters the conversation it shows or hides the text UI according to that setting.
The Convert to Text option does not appear if the feature is disabled, the message failed to send, or a conversion is already in progress.
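The three conditions that hide the menu option can be sketched as a single predicate. This is an illustration under assumptions, not the SDK's implementation: `SendState`, `VoiceMessageState`, and `shouldShowConvertToText` are hypothetical names.

```swift
// Hypothetical types for illustration; NOT IMKit API.
enum SendState {
    case sent
    case failed
}

struct VoiceMessageState {
    let featureEnabled: Bool   // speech-to-text is enabled for the app
    let sendState: SendState   // delivery result of the voice message
    let isConverting: Bool     // a conversion request is currently running
}

// The option is shown only when none of the three hiding conditions applies.
func shouldShowConvertToText(_ state: VoiceMessageState) -> Bool {
    return state.featureEnabled
        && state.sendState != .failed
        && !state.isConverting
}
```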
Component Diagram
Key Classes
Class | Purpose | Description |
---|---|---|
RCSTTContentView | STT content view | Displays conversion states. |
RCSTTDetailView | Converted text view | Shows converted text. |
RCSTTFailureView | Failure view | Displays conversion failure. |
RCDotLoadingView | Loading view | Shows loading state during conversion. |
RCSTTContentViewModel | Core STT class | Initiates requests, calculates text height, notifies RCSTTContentView to refresh. |
RCSpeechToTextModel | STT data model | Tracks current conversion state. |
RCSTTObserverContext | STT observer context | Handles all STT requests, registers/deregisters listeners. |
Class diagram:
RCSpeechToTextModel is the data model used by the IMKit SDK to track STT states. Key properties are listed below (see the API docs for full details).
RCSpeechToTextModel properties:
Property | Type | Description |
---|---|---|
status | RCSpeechToTextStatus | Conversion status. |
sttInfo | RCSpeechToTextInfo | Corresponds to the sttInfo stored in the database. |
isVisible | BOOL | Whether sttInfo view is visible (default: NO). |
- After conversion is initiated, the default RCSpeechToTextInfo object in the database has isVisible set to YES.
Workflow
RCSTTContentViewModel binds to RCSTTContentView via RCSTTContentViewModelDelegate. When the RCSpeechToTextModel state changes, the ViewModel notifies RCSTTContentView to update the UI.
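The binding described above follows a common ViewModel-delegate pattern. The sketch below shows that pattern in isolation, using stand-in types: `ConversionStatus`, `ContentViewModelDelegate`, `ContentViewModel`, and `ContentView` are hypothetical and do not reproduce the signatures of RCSTTContentViewModel, RCSTTContentViewModelDelegate, or RCSTTContentView. The status cases are also assumed, not the actual RCSpeechToTextStatus values.

```swift
// Assumed status values for illustration; not the real RCSpeechToTextStatus cases.
enum ConversionStatus {
    case notConverted
    case converting
    case converted
    case failed
}

// Stand-in for the delegate protocol that links the view model to its view.
protocol ContentViewModelDelegate: AnyObject {
    func viewModelDidChange(status: ConversionStatus, text: String?)
}

// Stand-in view model: owns the state and notifies its delegate on every change.
final class ContentViewModel {
    weak var delegate: ContentViewModelDelegate?
    private(set) var status: ConversionStatus = .notConverted
    private(set) var text: String?

    func update(status: ConversionStatus, text: String? = nil) {
        self.status = status
        self.text = text
        delegate?.viewModelDidChange(status: status, text: text)
    }
}

// Stand-in view: picks what to display from the state it is handed.
final class ContentView: ContentViewModelDelegate {
    private(set) var displayed = ""

    func viewModelDidChange(status: ConversionStatus, text: String?) {
        switch status {
        case .notConverted: displayed = ""
        case .converting:   displayed = "loading"
        case .converted:    displayed = text ?? ""
        case .failed:       displayed = "failed"
        }
    }
}
```

In this sketch the view model holds the delegate weakly, so the view can own the view model without creating a retain cycle; whether IMKit does the same is not stated in this document.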