Speech-to-Text

This document explains how to use the IMKit SDK to convert voice message audio content into text.

Prerequisites

Ensure your SDK version is ≥ 5.22.0 and the speech-to-text feature is enabled in the RC Console.

Tip

This feature is supported from version 5.22.0 onward and applies only to voice messages sent with SDK 5.22.0 or later. High-definition voice messages must meet these requirements: a sample rate of 8000 Hz or 16000 Hz, a single (mono) channel, AAC format, and a duration of 60 seconds or less. Historical voice messages are not supported.
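The audio requirements above can be expressed as a small pre-check. The following is a minimal sketch, not part of IMKit: `VoiceClipInfo` and `meetsSpeechToTextRequirements` are hypothetical names introduced only for illustration, and the SDK may validate these constraints differently.

```swift
// Hypothetical description of a recorded clip; NOT an IMKit type.
struct VoiceClipInfo {
    let sampleRateHz: Int
    let channelCount: Int
    let format: String          // e.g. "aac"
    let durationSeconds: Double
}

// Returns true when the clip meets the documented requirements:
// 8000 Hz or 16000 Hz, mono, AAC, and at most 60 seconds long.
func meetsSpeechToTextRequirements(_ clip: VoiceClipInfo) -> Bool {
    let allowedSampleRates: Set<Int> = [8000, 16000]
    return allowedSampleRates.contains(clip.sampleRateHz)
        && clip.channelCount == 1
        && clip.format.lowercased() == "aac"
        && clip.durationSeconds <= 60
}
```

A check like this could run before offering conversion, so that clips that cannot be converted are filtered out early.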

Demo

Overview

The IMKit SDK supports recording voice messages of up to 60 seconds. In the chat UI, users can long-press a voice message bubble and select Convert to Text from the menu, which invokes the IMLib SDK's speech-to-text conversion. The SDK tracks whether the converted text of each message is visible and, when the user re-enters the conversation, shows or hides the text UI accordingly.

Tip

The Convert to Text option does not appear in any of the following cases: the feature is disabled, the message failed to send, or a conversion is already in progress.
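The three conditions above amount to a simple guard. The sketch below is only an illustration of that rule: `VoiceMessageMenuState` and `shouldShowConvertToText` are hypothetical names, not IMKit APIs, and the SDK's actual menu logic may take other factors into account.

```swift
// Hypothetical snapshot of the inputs the menu decision depends on;
// NOT an IMKit type.
struct VoiceMessageMenuState {
    let featureEnabled: Bool        // speech-to-text is enabled
    let deliveryFailed: Bool        // the message failed to send
    let conversionInProgress: Bool  // a conversion is already running
}

// Returns true only when none of the documented hiding conditions apply.
func shouldShowConvertToText(_ state: VoiceMessageMenuState) -> Bool {
    if !state.featureEnabled { return false }
    if state.deliveryFailed { return false }
    if state.conversionInProgress { return false }
    return true
}
```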

Component Diagram

Key Classes

Class	Purpose	Description
`RCSTTContentView`	STT content view	Displays conversion states.
`RCSTTDetailView`	Converted text view	Shows converted text.
`RCSTTFailureView`	Failure view	Displays conversion failure.
`RCDotLoadingView`	Loading view	Shows loading state during conversion.
`RCSTTContentViewModel`	Core STT class	Initiates requests, calculates text height, notifies RCSTTContentView to refresh.
`RCSpeechToTextModel`	STT data model	Tracks current conversion state.
`RCSTTObserverContext`	STT observer context	Handles all STT requests, registers/deregisters listeners.

Class diagram:

RCSpeechToTextModel is the data model used by IMKit SDK to track STT states. Key properties are listed below (see API docs for full details).

RCSpeechToTextModel properties:

Property	Type	Description
`status`	`RCSpeechToTextStatus`	Conversion status.
`sttInfo`	`RCSpeechToTextInfo`	Corresponds to the `sttInfo` stored in the database.
`isVisible`	`BOOL`	Whether the `sttInfo` view is visible (default: `NO`).

Tip

After a conversion is initiated, the default RCSpeechToTextInfo object in the database has isVisible set to YES.

Workflow

RCSTTContentViewModel is bound to RCSTTContentView through RCSTTContentViewModelDelegate. When the state of RCSpeechToTextModel changes, the view model notifies RCSTTContentView to update the UI.
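The binding described above follows the usual delegate pattern. The sketch below shows the general shape only. Every type in it (`SketchStatus`, `SketchViewModelDelegate`, `SketchViewModel`, `SketchContentView`) is a hypothetical stand-in, and the status cases are assumed for illustration; see the API docs for the real declarations of RCSTTContentViewModelDelegate and RCSpeechToTextStatus.

```swift
// Illustrative stand-ins only; these are NOT the IMKit declarations.
enum SketchStatus { case notConverted, converting, success, failed }

protocol SketchViewModelDelegate: AnyObject {
    func viewModelDidChange(status: SketchStatus, text: String?)
}

final class SketchViewModel {
    // Weak reference avoids a retain cycle between view and view model.
    weak var delegate: SketchViewModelDelegate?
    private(set) var status: SketchStatus = .notConverted
    private(set) var text: String?

    // Records the new state, then notifies the bound view.
    func update(status: SketchStatus, text: String? = nil) {
        self.status = status
        self.text = text
        delegate?.viewModelDidChange(status: status, text: text)
    }
}

final class SketchContentView: SketchViewModelDelegate {
    // Stores what would be drawn, so the sketch runs without UIKit.
    private(set) var renderedStates: [String] = []

    func viewModelDidChange(status: SketchStatus, text: String?) {
        switch status {
        case .notConverted: renderedStates.append("hidden")
        case .converting:   renderedStates.append("loading")
        case .success:      renderedStates.append("text: \(text ?? "")")
        case .failed:       renderedStates.append("failure")
        }
    }
}

// Usage: bind the view, then drive two state changes.
let view = SketchContentView()
let viewModel = SketchViewModel()
viewModel.delegate = view
viewModel.update(status: .converting)
viewModel.update(status: .success, text: "See you at noon")
```

In this sketch the view never queries the view model; it only reacts to the delegate callback, which matches the one-way notification described above.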
