
Speech-to-Text

This document explains how to use the IMKit SDK to convert voice message audio content into text.

Prerequisites

Ensure your SDK version is ≥ 5.22.0 and the speech-to-text feature is enabled in the RC Console.

tip

This feature is available from SDK 5.22.0 onward and applies only to voice messages sent with SDK 5.22.0 or later. To be converted, a high-definition voice message must use an 8000 Hz or 16000 Hz sample rate, a single (mono) channel, and the AAC format, and must be no longer than 60 seconds. Historical voice messages are not supported.
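
The requirements above can be restated as one eligibility check. The Swift sketch below only encodes the documented rules. `VoiceClip`, `isVersion(_:atLeast:)` and `isEligibleForSpeechToText(_:)` are hypothetical names, not IMKit or IMLib APIs, and this page does not describe how the SDK performs its own validation.

```swift
// Hypothetical description of a recorded HD voice message.
// These names are illustrative only; they are not part of the IMKit/IMLib API.
struct VoiceClip {
    let sampleRate: Int          // Hz
    let channelCount: Int
    let format: String           // e.g. "aac"
    let durationSeconds: Double
    let senderSDKVersion: String // SDK version the message was sent with
}

// Compares dotted version strings numerically, so "5.9.0" < "5.22.0".
func isVersion(_ version: String, atLeast minimum: String) -> Bool {
    let lhs = version.split(separator: ".").map { Int($0) ?? 0 }
    let rhs = minimum.split(separator: ".").map { Int($0) ?? 0 }
    for i in 0..<max(lhs.count, rhs.count) {
        let a = i < lhs.count ? lhs[i] : 0
        let b = i < rhs.count ? rhs[i] : 0
        if a != b { return a > b }
    }
    return true // all components equal
}

// Mirrors the documented rules: 8000 or 16000 Hz, mono, AAC,
// at most 60 seconds, and sent with SDK 5.22.0 or later.
func isEligibleForSpeechToText(_ clip: VoiceClip) -> Bool {
    return [8000, 16000].contains(clip.sampleRate)
        && clip.channelCount == 1
        && clip.format.lowercased() == "aac"
        && clip.durationSeconds <= 60
        && isVersion(clip.senderSDKVersion, atLeast: "5.22.0")
}
```

Comparing versions component by component matters here: a plain string comparison would wrongly rank "5.9.0" above "5.22.0".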

Overview

The IMKit SDK supports recording voice messages of up to 60 seconds. In the chat UI, users can long-press a voice message bubble and select Convert to Text from the menu, which invokes the IMLib SDK's speech-to-text conversion. The SDK also records whether the converted text is visible, so when the user re-enters the conversation, the text view is shown or hidden according to that saved state.

tip

The Convert to Text option does not appear if the feature is disabled, the message failed to send, or a conversion is already in progress.
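
The three conditions above amount to a single menu-visibility predicate. The sketch below is illustrative only: `VoiceMessageMenuContext` and `shouldShowConvertToText(_:)` are hypothetical names, not part of IMKit, and it covers only the conditions listed in this tip.

```swift
// Hypothetical per-message state used only for this illustration.
struct VoiceMessageMenuContext {
    let featureEnabled: Bool       // speech-to-text enabled in the RC Console
    let deliveryFailed: Bool       // the voice message failed to send
    let conversionInProgress: Bool // a conversion request is already running
}

// "Convert to Text" is hidden if any documented exclusion applies.
func shouldShowConvertToText(_ ctx: VoiceMessageMenuContext) -> Bool {
    guard ctx.featureEnabled else { return false }
    guard !ctx.deliveryFailed else { return false }
    return !ctx.conversionInProgress
}
```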

Component Diagram

Key Classes

| Class | Purpose | Description |
| --- | --- | --- |
| RCSTTContentView | Speech-to-text (STT) content view | Displays the conversion state. |
| RCSTTDetailView | Converted-text view | Shows the converted text. |
| RCSTTFailureView | Failure view | Indicates that conversion failed. |
| RCDotLoadingView | Loading view | Shows a loading indicator while conversion is in progress. |
| RCSTTContentViewModel | Core STT class | Initiates conversion requests, calculates the text height, and notifies RCSTTContentView to refresh. |
| RCSpeechToTextModel | STT data model | Tracks the current conversion state. |
| RCSTTObserverContext | STT observer context | Handles all STT requests and registers/deregisters listeners. |
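
RCSTTObserverContext's role of registering and deregistering listeners follows the common observer-registry pattern. The sketch below shows that pattern in general terms and assumes nothing about the real class: `STTObserverRegistry`, its token scheme, and its method names are hypothetical.

```swift
// Hypothetical observer registry illustrating the register/deregister pattern.
// This is not RCSTTObserverContext's actual interface.
final class STTObserverRegistry {
    typealias Listener = (_ messageId: Int, _ text: String?) -> Void

    private var listeners: [Int: [Int: Listener]] = [:] // messageId -> token -> listener
    private var nextToken = 0

    // Registers a listener for one message and returns a token for removal.
    @discardableResult
    func register(messageId: Int, _ listener: @escaping Listener) -> Int {
        nextToken += 1
        listeners[messageId, default: [:]][nextToken] = listener
        return nextToken
    }

    // Removes a listener, e.g. when its cell is reused or deallocated.
    func deregister(messageId: Int, token: Int) {
        listeners[messageId]?[token] = nil
        if listeners[messageId]?.isEmpty == true { listeners[messageId] = nil }
    }

    // Delivers a finished conversion (text) or a failure (nil) to listeners.
    func publish(messageId: Int, text: String?) {
        listeners[messageId]?.values.forEach { $0(messageId, text) }
    }

    func listenerCount(for messageId: Int) -> Int {
        return listeners[messageId]?.count ?? 0
    }
}
```

Centralizing listeners in one object means a result that arrives after a cell has been reused only reaches the views still registered for that message.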

RCSpeechToTextModel is the data model the IMKit SDK uses to track speech-to-text state. Its key properties are listed below; see the API reference for the full list.

RCSpeechToTextModel properties:

| Property | Type | Description |
| --- | --- | --- |
| status | RCSpeechToTextStatus | Conversion status. |
| sttInfo | RCSpeechToTextInfo | The speech-to-text info stored in the database (sttInfo). |
| isVisible | BOOL | Whether the sttInfo view is visible. Default: NO. |

tip
  • Once a conversion is initiated, the RCSpeechToTextInfo object stored in the database has isVisible set to YES by default.
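
A Swift mirror of these three properties shows how the defaults interact. It is a sketch under stated assumptions: `STTStatus`, `STTInfo`, and `SpeechToTextModelSketch` are hypothetical stand-ins, the status case names are guesses rather than the real RCSpeechToTextStatus values, and restoring `isVisible` from the stored record is an assumed reading of the Overview's note that visibility is restored when the conversation is reopened.

```swift
// Hypothetical status cases; the real RCSpeechToTextStatus values may differ.
enum STTStatus { case notConverted, converting, success, failed }

// Stands in for the database-backed RCSpeechToTextInfo record.
struct STTInfo {
    var status: STTStatus
    var text: String?
    var isVisible: Bool
}

// Illustrative mirror of RCSpeechToTextModel's three documented properties.
struct SpeechToTextModelSketch {
    var status: STTStatus
    var sttInfo: STTInfo?
    var isVisible: Bool

    // Assumption: on re-entering a conversation, visibility is restored from the
    // stored record; with no record, isVisible keeps its documented default (false).
    init(storedInfo: STTInfo? = nil) {
        sttInfo = storedInfo
        status = storedInfo?.status ?? .notConverted
        isVisible = storedInfo?.isVisible ?? false
    }

    // Per the tip: once conversion starts, the stored record has isVisible == true.
    mutating func beginConversion() {
        status = .converting
        sttInfo = STTInfo(status: .converting, text: nil, isVisible: true)
        isVisible = true
    }

    mutating func complete(with text: String) {
        status = .success
        sttInfo?.status = .success
        sttInfo?.text = text
    }

    mutating func fail() {
        status = .failed
        sttInfo?.status = .failed
    }
}
```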

Workflow

RCSTTContentViewModel is bound to RCSTTContentView through RCSTTContentViewModelDelegate. When the state of RCSpeechToTextModel changes, the view model notifies RCSTTContentView, which then updates its UI.
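
This binding is the standard delegate pattern. The Swift sketch below is an assumption-laden illustration, not the SDK's code. `STTContentViewModelDelegate`, `STTContentViewModelSketch`, and `STTContentViewSketch` are hypothetical stand-ins for RCSTTContentViewModelDelegate, RCSTTContentViewModel, and RCSTTContentView. The subview labels only name which documented view would be shown for each state.

```swift
// Hypothetical delegate protocol standing in for RCSTTContentViewModelDelegate.
protocol STTContentViewModelDelegate: AnyObject {
    func viewModelDidUpdate(_ viewModel: STTContentViewModelSketch)
}

enum ConversionState: Equatable { case idle, loading, converted(String), failed }

// Hypothetical view model: owns the state and notifies its delegate on change.
final class STTContentViewModelSketch {
    weak var delegate: STTContentViewModelDelegate? // weak to avoid a retain cycle

    private(set) var state: ConversionState = .idle {
        didSet { if state != oldValue { delegate?.viewModelDidUpdate(self) } }
    }

    func conversionStarted() { state = .loading }
    func conversionFinished(text: String) { state = .converted(text) }
    func conversionFailed() { state = .failed }
}

// Hypothetical content view: picks which subview to show for each state.
final class STTContentViewSketch: STTContentViewModelDelegate {
    private(set) var visibleSubview = "none"

    func viewModelDidUpdate(_ viewModel: STTContentViewModelSketch) {
        switch viewModel.state {
        case .idle: visibleSubview = "none"
        case .loading: visibleSubview = "loading"   // role of RCDotLoadingView
        case .converted: visibleSubview = "detail"  // role of RCSTTDetailView
        case .failed: visibleSubview = "failure"    // role of RCSTTFailureView
        }
    }
}
```

Keeping the delegate reference weak matters in this pattern. It stops the view model and its view from retaining each other.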