
Voice-to-Text Scenario Implementation

Note: First confirm which SDK version you have integrated.
  • Version 5.22.0 or later: the RongCloud SDK has built-in voice-to-text, so you do not need to integrate a third-party SDK. Enable the feature in the RongCloud Console, then follow the official documentation for the built-in feature to complete the integration.

  • Earlier than 5.22.0, or you need a custom third-party voice-to-text service: continue reading. This document describes an implementation for older versions and for custom integrations.
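The version rule above reduces to a numeric comparison against 5.22.0. The helper below is a hypothetical sketch. SdkVersionCheck is not part of the RongCloud SDK, and how you obtain the version string (for example, from your Gradle or CocoaPods dependency) is up to your project. It compares only the numeric major.minor.patch components and drops suffixes such as "-beta".

```java
// Hypothetical helper, not part of the RongCloud SDK: applies the 5.22.0 rule above.
final class SdkVersionCheck {
    // First version with built-in voice-to-text, per the note above.
    private static final int[] BUILT_IN_SINCE = {5, 22, 0};

    private SdkVersionCheck() {}

    /** Returns true if a "major.minor.patch" version string is 5.22.0 or later. */
    static boolean hasBuiltInSpeechToText(String version) {
        int[] v = parse(version);
        for (int i = 0; i < BUILT_IN_SINCE.length; i++) {
            if (v[i] != BUILT_IN_SINCE[i]) {
                return v[i] > BUILT_IN_SINCE[i];
            }
        }
        return true; // exactly 5.22.0
    }

    /** Parses up to three numeric components; missing ones count as 0, suffixes are dropped. */
    private static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            // Keep only the leading digits, e.g. "0-beta" becomes "0".
            String digits = parts[i].replaceAll("[^0-9].*$", "");
            out[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return out;
    }
}
```

For example, hasBuiltInSpeechToText("5.21.9") returns false, so an app on that version would follow the rest of this document.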

Preparation

Before you begin, make sure you have created an application and integrated the client SDK.

Tip: This document uses the IMKit 5.x SDK as an example. If you use the IMLib SDK, you must implement the feature yourself. Get the remote address (FileUrl) of the HQVoiceMessage and convert it to text with a third-party SDK. Then call RongCoreClient's setMessageExtra method to save the text in the message's extra field. The approach in this document does not apply to Web.
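For IMLib users, the flow in the tip above can be sketched without the SDK. This is a minimal sketch under stated assumptions. SpeechRecognizer, MessageExtraStore, InMemoryExtraStore, and VoiceToTextFlow are hypothetical names. In a real app, the recognizer wraps your third-party SDK and the remote URL comes from the HQVoiceMessage. The store implementation would call RongCoreClient's setMessageExtra.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for whichever third-party speech recognition SDK you use.
interface SpeechRecognizer {
    void recognize(String audioUrl, Consumer<String> onText, Consumer<Exception> onError);
}

// Hypothetical stand-in for persisting the converted text. In a real IMLib app this is
// where RongCoreClient.getInstance().setMessageExtra(messageId, text, callback) would be called.
interface MessageExtraStore {
    void setExtra(int messageId, String extra);
}

// Hypothetical in-memory store for trying the flow without the SDK.
final class InMemoryExtraStore implements MessageExtraStore {
    private final Map<Integer, String> extras = new HashMap<>();

    @Override
    public void setExtra(int messageId, String extra) {
        extras.put(messageId, extra);
    }

    String getExtra(int messageId) {
        return extras.get(messageId);
    }
}

// Hypothetical glue: remote URL -> third-party recognizer -> message extra.
final class VoiceToTextFlow {
    private final SpeechRecognizer recognizer;
    private final MessageExtraStore store;

    VoiceToTextFlow(SpeechRecognizer recognizer, MessageExtraStore store) {
        this.recognizer = recognizer;
        this.store = store;
    }

    /**
     * Sends the voice message's remote URL to the recognizer and, on success,
     * saves the text as the message's extra. Fails fast if there is no remote URL.
     */
    void convert(int messageId, String remoteUrl,
                 Consumer<String> onDone, Consumer<Exception> onError) {
        if (remoteUrl == null || remoteUrl.isEmpty()) {
            onError.accept(new IllegalArgumentException("voice message has no remote URL"));
            return;
        }
        recognizer.recognize(remoteUrl, text -> {
            store.setExtra(messageId, text); // persist first, then notify the caller
            onDone.accept(text);
        }, onError);
    }
}
```

The early return guards against messages that may not have a remote address yet, for example a message that has not finished uploading.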

How it works: IMKit records voice messages of up to 60 seconds. In the chat UI, the user long-presses a voice message bubble in the message list and taps "Convert to Text" in the menu that appears. Your app then calls a third-party SDK to convert the voice message to text, and the text is displayed with the message. Supported languages depend on the third-party service.


Operation Steps

1. Customize Long-Press Message Menu

Long-pressing a message on the conversation page opens a menu. The filter checks the message type and adds a "Convert to Text" item only for high-quality voice messages (HQVoiceMessage). Handle the item in the onMessageItemLongClick callback. Call your third-party API, save the converted text to the message's extra, and then call refreshMessage. IMKit then re-renders the message.

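// Imports (package paths assume IMKit/IMLib 5.x; also import MessageItemLongClickAction
// and MessageItemLongClickActionManager from the package used by your IMKit version):
import android.content.Context;
import io.rong.imkit.IMCenter;
import io.rong.imkit.model.UiMessage;
import io.rong.imlib.RongCoreClient;
import io.rong.imlib.model.Message;
import io.rong.message.HQVoiceMessage;

MessageItemLongClickAction speechToText =
        new MessageItemLongClickAction.Builder()
                .titleResId(R.string.speech_to_text)
                .actionListener(
                        new MessageItemLongClickAction.MessageItemLongClickListener() {
                            @Override
                            public boolean onMessageItemLongClick(
                                    Context context, UiMessage uiMessage) {
                                Message message = uiMessage.getMessage();
                                // Placeholder: replace "1231234" with the text returned by your
                                // third-party voice-to-text SDK (usually in an async callback).
                                String convertedText = "1231234";
                                // Update the in-memory message so the UI can show the text.
                                message.setExtra(convertedText);
                                // Persist the extra to the local database.
                                RongCoreClient.getInstance()
                                        .setMessageExtra(message.getMessageId(), convertedText, null);
                                // Re-render this single message in the message list.
                                IMCenter.getInstance().refreshMessage(message);
                                // Returning true indicates the click has been handled.
                                return true;
                            }
                        })
                .showFilter(
                        new MessageItemLongClickAction.Filter() {
                            @Override
                            public boolean filter(UiMessage uiMessage) {
                                // Show "Convert to Text" only for high-quality voice messages.
                                Message message = uiMessage.getMessage();
                                return message.getContent() instanceof HQVoiceMessage;
                            }
                        })
                .build();
// Register the custom action so it appears in the long-press menu.
MessageItemLongClickActionManager.getInstance()
        .addMessageItemLongClickAction(speechToText);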
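The listener above writes a fixed placeholder synchronously. Third-party recognition is usually asynchronous, so a user can tap "Convert to Text" again before the first result arrives. The following is a minimal sketch of one way to handle this. It assumes the handler is only used from one thread (for example, you post the third-party callback to the main thread first). ConvertToTextHandler, AsyncConverter, and Refresher are hypothetical names. Refresher stands in for the setMessageExtra and refreshMessage calls shown above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical handler behind the "Convert to Text" menu item. It ignores taps on
 * messages that are already converted or still converting, and calls the refresher
 * only once the third-party result arrives. Not thread-safe: call it from one thread.
 */
final class ConvertToTextHandler {
    /** Hypothetical wrapper around the asynchronous third-party call. */
    interface AsyncConverter {
        void convert(int messageId, Consumer<String> onText);
    }

    /** Hypothetical stand-in for setMessageExtra(...) followed by refreshMessage(...). */
    interface Refresher {
        void onConverted(int messageId, String text);
    }

    private final AsyncConverter converter;
    private final Refresher refresher;
    private final Set<Integer> inFlight = new HashSet<>();

    ConvertToTextHandler(AsyncConverter converter, Refresher refresher) {
        this.converter = converter;
        this.refresher = refresher;
    }

    /** Returns true if a new conversion request was started. */
    boolean onConvertClicked(int messageId, String currentExtra) {
        if (currentExtra != null && !currentExtra.isEmpty()) {
            return false; // already converted: the text is already stored in the extra
        }
        if (!inFlight.add(messageId)) {
            return false; // a request for this message is still pending
        }
        converter.convert(messageId, text -> {
            inFlight.remove(messageId);
            refresher.onConverted(messageId, text);
        });
        return true;
    }

    boolean isConverting(int messageId) {
        return inFlight.contains(messageId);
    }
}
```

Tracking requests by message ID keeps the cost at one third-party call per message. Checking the existing extra first also avoids re-converting messages whose text was saved earlier.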

2. Modify High-Quality Voice Message Display Style

Android Implementation Process

  1. Copy the SDK's default HQVoiceMessageItemProvider class and the rc_item_hq_voice_message.xml layout into your project, and rename them.
  2. Call the replaceMessageProvider method to replace the SDK's default provider for this message type with your custom one. After the replacement, the SDK renders high-quality voice messages with your custom template.
  3. In your custom copy of rc_item_hq_voice_message.xml, add a View (for example, a TextView) to display the converted text. In your custom HQVoiceMessageItemProvider, show that View with the converted text when the message's extra is not empty, and hide it otherwise.
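The check in step 3 can be moved into a small helper, which keeps the provider's bind logic simple. TranscriptViewState is a hypothetical name, not an IMKit class. In your custom provider, you would pass it the message's extra. You would then set the text view's text and its visibility (View.VISIBLE or View.GONE) from the result.

```java
/**
 * Hypothetical helper for a custom HQVoiceMessageItemProvider: derives whether the
 * converted-text view is shown, and what it shows, from the message's extra.
 */
final class TranscriptViewState {
    final boolean visible;
    final String text;

    private TranscriptViewState(boolean visible, String text) {
        this.visible = visible;
        this.text = text;
    }

    /** A null or blank extra means "not converted yet": hide the text view. */
    static TranscriptViewState fromExtra(String extra) {
        if (extra == null || extra.trim().isEmpty()) {
            return new TranscriptViewState(false, "");
        }
        return new TranscriptViewState(true, extra.trim());
    }
}
```

Apply both branches every time you bind. Message list items are typically recycled views, so a text view that showed a transcript for one message could otherwise keep showing it when bound to another.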


iOS Implementation Process

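Add your custom cell class (CustomHQVoiceMessageCell in this example) to the project. Then, in your chat view controller (a subclass of RCConversationViewController), register it for RCHQVoiceMessage: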

- (void)registerCustomCellsAndMessages {
    [super registerCustomCellsAndMessages];
    // Register the custom voice-to-text cell for high-quality voice messages
    [self registerClass:[CustomHQVoiceMessageCell class] forMessageClass:[RCHQVoiceMessage class]];
}

3. Modify Voice Message Sampling Rate

Most third-party voice-to-text services support audio sampled at 16 kHz. To change the sampling rate of voice messages, integrate IMKit from source and set the sampling rate at the locations shown below.

Android Modification Location

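// In IMKit's voice recorder: set a 16 kHz sampling rate (must be called before MediaRecorder.prepare())
mMediaRecorder.setAudioSamplingRate(SamplingRate.RC_SAMPLE_RATE_16000.getValue());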

iOS Modification Location

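rcHQVoiceRecorderHandler.recordSettings = @{
    AVFormatIDKey : @(kAudioFormatMPEG4AAC_HE), // HE-AAC encoding
    AVNumberOfChannelsKey : @1,                  // mono
    AVEncoderBitRateKey : @(16000)               // encoder bit rate, in bits per second
    // Note: AVEncoderBitRateKey sets the bit rate, not the sampling rate;
    // the sampling rate itself is set with AVSampleRateKey (for example, @(16000.0)).
};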
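Sampling rate (samples per second) and encoder bit rate (bits per second) are separate settings. On iOS they correspond to AVSampleRateKey and AVEncoderBitRateKey. The hypothetical helper below (AudioSizeMath is an illustrative name) works through the arithmetic for a 60-second voice message, the recording limit mentioned above, and assumes 16-bit mono audio. At 16 kHz, the uncompressed PCM equivalent is 16000 × 1 × 2 × 60 = 1,920,000 bytes, compared with 5,292,000 bytes at 44.1 kHz. Encoded at 16,000 bit/s, the same clip is 16000 × 60 / 8 = 120,000 bytes.

```java
/** Hypothetical, illustrative arithmetic for the audio settings above. */
final class AudioSizeMath {
    private AudioSizeMath() {}

    /** Uncompressed PCM size in bytes: sampleRate * channels * (bitsPerSample / 8) * seconds. */
    static long pcmBytes(int sampleRateHz, int channels, int bitsPerSample, int seconds) {
        return (long) sampleRateHz * channels * (bitsPerSample / 8) * seconds;
    }

    /** Encoded size in bytes at a constant bit rate: bitRate (bits/s) * seconds / 8. */
    static long encodedBytes(int bitRateBps, int seconds) {
        return (long) bitRateBps * seconds / 8;
    }
}
```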