Voice-to-Text Scenario Implementation
If you have integrated SDK version 5.22.0 or later, the RongCloud SDK has built-in voice-to-text functionality and no additional third-party SDK is needed. You only need to enable the feature in the RongCloud Console; refer to the official RongCloud voice-to-text documentation to complete the integration.
If you have integrated a version earlier than 5.22.0, or need to use your own third-party voice-to-text service, continue reading. This document describes an implementation for older versions and for custom integrations.
Preparation
Before getting started, please ensure you have created an application and completed client SDK integration.
This document uses the 5.x IMKit SDK as an example to implement voice-to-text. If you are using the IMLib SDK, you need to implement the feature yourself: get the FileUrl (remote address) of the HQVoiceMessage, convert the voice message to text with a third-party SDK, and then call RongCoreClient's setMessageExtra method to store the converted text in the message's extra (additional information). Web does not support implementing voice-to-text with this document.
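For IMLib integrations, the flow described above can be sketched in plain Java. This is a minimal sketch, not RongCloud's API: SpeechToTextProvider and MessageExtraStore are hypothetical interfaces standing in for your third-party SDK and for RongCoreClient's setMessageExtra, so the control flow can be compiled and tested without either SDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a third-party speech-to-text SDK.
interface SpeechToTextProvider {
    // Delivers the recognized text, or null on failure, through the callback.
    void transcribe(String remoteFileUrl, Consumer<String> onResult);
}

// Hypothetical stand-in for RongCoreClient.setMessageExtra(messageId, value, callback).
interface MessageExtraStore {
    void setMessageExtra(int messageId, String value);
}

final class ImLibVoiceToText {
    private final SpeechToTextProvider provider;
    private final MessageExtraStore store;

    ImLibVoiceToText(SpeechToTextProvider provider, MessageExtraStore store) {
        this.provider = provider;
        this.store = store;
    }

    // Transcribes the voice file at remoteFileUrl and saves the text as the
    // message's extra (the real setMessageExtra writes it to the local database).
    void convert(int messageId, String remoteFileUrl) {
        provider.transcribe(remoteFileUrl, text -> {
            // Only store successful, non-empty results.
            if (text != null && !text.isEmpty()) {
                store.setMessageExtra(messageId, text);
            }
        });
    }
}

// Fake provider: "recognizes" any non-empty URL, fails on an empty one.
final class FakeProvider implements SpeechToTextProvider {
    @Override
    public void transcribe(String remoteFileUrl, Consumer<String> onResult) {
        onResult.accept(remoteFileUrl.isEmpty() ? null : "hello world");
    }
}

// In-memory store keyed by message ID.
final class InMemoryExtraStore implements MessageExtraStore {
    final Map<Integer, String> extras = new HashMap<>();

    @Override
    public void setMessageExtra(int messageId, String value) {
        extras.put(messageId, value);
    }
}
```

In a real app, replace FakeProvider with an adapter around your provider's SDK and InMemoryExtraStore with a call to setMessageExtra.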
How the feature works: the IMKit SDK records voice messages of up to 60 seconds. In the chat UI, the user long-presses a voice message bubble in the message list and taps "Convert to Text" in the menu that appears. Your app then calls a third-party SDK to convert the voice message to text and displays the resulting text with the message. The supported languages depend on the third-party provider.
Effect Example


Operation Steps
1. Customize Long-Press Message Menu
Long-pressing a message on the conversation page opens a menu. Check the type of the pressed message and, if it is a high-quality voice message (HQVoiceMessage), add a "Convert to Text" option. Handle the option in the onMessageItemLongClick callback (Android) or in the menu item's action method (iOS): call your third-party speech-to-text API, then set the returned text as the message's extra. On Android, after you call refreshMessage, the SDK refreshes the message's display automatically; the iOS sample reloads the message list instead.
Android Implementation
// R.string.speech_to_text must be defined in your res/values/strings.xml
MessageItemLongClickAction speechToText =
    new MessageItemLongClickAction.Builder()
        .titleResId(R.string.speech_to_text)
        .actionListener(
            new MessageItemLongClickAction.MessageItemLongClickListener() {
                @Override
                public boolean onMessageItemLongClick(
                        Context context, UiMessage uiMessage) {
                    // Get the message entity
                    Message message = uiMessage.getMessage();
                    // Placeholder: replace with the text returned by your
                    // third-party speech-to-text SDK. Real conversions are
                    // usually asynchronous; in that case run the steps below
                    // from the conversion result callback.
                    String convertedText = "1231234";
                    // Set the converted text as the in-memory message's extra
                    message.setExtra(convertedText);
                    // Persist the extra to the local database
                    RongCoreClient.getInstance()
                        .setMessageExtra(message.getMessageId(), convertedText, null);
                    // Refresh this single message in the conversation UI
                    IMCenter.getInstance().refreshMessage(message);
                    return true;
                }
            })
        .showFilter(
            new MessageItemLongClickAction.Filter() {
                @Override
                public boolean filter(UiMessage uiMessage) {
                    // Show the "Convert to Text" item only for high-quality voice messages
                    Message message = uiMessage.getMessage();
                    return message.getContent() instanceof HQVoiceMessage;
                }
            })
        .build();
// Register the custom action so it appears in the long-press menu
MessageItemLongClickActionManager.getInstance()
    .addMessageItemLongClickAction(speechToText);
iOS Implementation
// 1. Override the long-press menu and add a "Voice to Text" item for HQ voice messages:
- (NSArray<UIMenuItem *> *)getLongTouchMessageCellMenuList:(RCMessageModel *)model {
    NSMutableArray<UIMenuItem *> *menuList = [[super getLongTouchMessageCellMenuList:model] mutableCopy];
    if ([model.content isKindOfClass:[RCHQVoiceMessage class]]) {
        // Remember which message was long-pressed so the action method can use it
        self.currentSelectedModel = model;
        UIMenuItem *speechToTextItem = [[UIMenuItem alloc] initWithTitle:@"Voice to Text"
                                                                  action:@selector(audioToString)];
        [menuList addObject:speechToTextItem];
    }
    return menuList.copy;
}
// 2. Implement the menu action: call your third-party API, then store the result in the message's extra.
// Declare this property in your conversation page subclass (for example, in its class extension):
@property (nonatomic, strong) RCMessageModel *currentSelectedModel;
// Implement the click method
- (void)audioToString {
    RCMessageModel *selectedModel = self.currentSelectedModel;
    // Placeholder: replace with the text returned by your third-party speech-to-text SDK
    NSString *result = @"Voice to text";
    __weak typeof(self) weakSelf = self;
    [[RCCoreClient sharedCoreClient] setMessageExtra:selectedModel.messageId value:result completion:^(BOOL ret) {
        if (!ret) {
            return; // Saving the extra failed; leave the UI unchanged
        }
        dispatch_async(dispatch_get_main_queue(), ^{
            __strong typeof(weakSelf) strongSelf = weakSelf;
            if (!strongSelf) {
                return;
            }
            selectedModel.extra = result;
            // Reset the cached size so the cell is re-measured with the converted text
            selectedModel.cellSize = CGSizeZero;
            [strongSelf.conversationMessageCollectionView reloadData];
            // If the converted message is the last one, keep it visible
            RCMessageModel *lastModel = [strongSelf.conversationDataRepository lastObject];
            if (lastModel.messageId == selectedModel.messageId) {
                [strongSelf scrollToBottomAnimated:YES];
            }
        });
    }];
}
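Both snippets above write a fixed placeholder string as soon as the menu item is tapped. Real speech-to-text calls usually return asynchronously, so the extra should be written, and the message refreshed, only when the result arrives. The plain-Java sketch below models that with CompletableFuture; the transcriber function and refresh callback are hypothetical stand-ins for your third-party SDK and for refreshMessage. Skipping messages whose extra is already set is an assumed design choice that avoids paying for the same conversion twice. On Android, UI updates must still happen on the main thread, which this sketch does not show.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.IntConsumer;

final class AsyncVoiceToText {
    // messageId -> converted text; stands in for each message's extra.
    final Map<Integer, String> extras = new ConcurrentHashMap<>();
    private final Function<String, String> transcriber; // hypothetical third-party call
    private final IntConsumer refresh;                  // hypothetical refreshMessage hook

    AsyncVoiceToText(Function<String, String> transcriber, IntConsumer refresh) {
        this.transcriber = transcriber;
        this.refresh = refresh;
    }

    // Returns a future that completes once the extra is stored and the refresh ran.
    CompletableFuture<Void> onConvertClicked(int messageId, String fileUrl) {
        String existing = extras.get(messageId);
        if (existing != null && !existing.isEmpty()) {
            // Already converted: refresh only, no second third-party call.
            refresh.accept(messageId);
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture
                .supplyAsync(() -> transcriber.apply(fileUrl)) // runs off the caller's thread
                .thenAccept(text -> {
                    // Store and refresh only when the conversion produced text.
                    if (text != null && !text.isEmpty()) {
                        extras.put(messageId, text);
                        refresh.accept(messageId);
                    }
                });
    }
}
```

Tapping "Convert to Text" twice on the same message calls the transcriber once; the second tap only refreshes the display.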
2. Modify High-Quality Voice Message Display Style
Android Implementation Process
- Copy the SDK's default HQVoiceMessageItemProvider class and rc_item_hq_voice_message.xml layout resource into your project directory and rename them.
- Call the replaceMessageProvider method to replace the SDK's default display template for high-quality voice messages. After the replacement, the SDK automatically renders these messages with your custom template.
- In your renamed copy of rc_item_hq_voice_message.xml, add a View to display the converted text. In your custom HQVoiceMessageItemProvider, check the message's extra: if it is not empty, display the converted text in that View.
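The extra check in the custom provider reduces to one decision each time a message is bound. The sketch below keeps that decision out of the Android View code so it can be unit-tested; TranscriptDisplay and its fields are hypothetical names, and treating a whitespace-only extra as empty is an assumption. In the real provider you would map the result onto your text View's visibility and text.

```java
// Hypothetical view state for the converted-text area under a voice bubble.
final class TranscriptDisplay {
    final boolean visible;
    final String text;

    private TranscriptDisplay(boolean visible, String text) {
        this.visible = visible;
        this.text = text;
    }

    // Hides the area when the extra is null or blank; otherwise shows the
    // converted text with surrounding whitespace removed.
    static TranscriptDisplay fromExtra(String extra) {
        if (extra == null || extra.trim().isEmpty()) {
            return new TranscriptDisplay(false, "");
        }
        return new TranscriptDisplay(true, extra.trim());
    }
}
```

Returning an explicit hidden state matters because list item views are recycled: a view that showed text for one message must be reset when it is bound to a message that has not been converted.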
Code class references:
iOS Implementation Process
Import the following classes into your project, then register the custom cell in your chat page so that it is bound to RCHQVoiceMessage:
- iOS CustomHQVoiceMessageCell.m
- iOS CustomVoicePlayer.m
- iOS CustomHQVoiceMsgDownloadManager.m
- iOS CustomVoicePlayer.h
- iOS CustomHQVoiceMsgDownloadInfo.h
- iOS CustomHQVoiceMessageCell.h
- iOS CustomHQVoiceMsgDownloadManager.h
- iOS CustomHQVoiceMsgDownloadInfo.m
- (void)registerCustomCellsAndMessages {
    [super registerCustomCellsAndMessages];
    // Register the custom voice-to-text cell for high-quality voice messages
    [self registerClass:[CustomHQVoiceMessageCell class] forMessageClass:[RCHQVoiceMessage class]];
}
3. Modify Voice Message Sampling Rate
Most third-party speech-recognition providers support 16 kHz audio for voice-to-text. If you need to change the sampling rate of voice messages, you must integrate the IMKit source code and set the sampling rate at the locations shown below.
Android Modification Location
mMediaRecorder.setAudioSamplingRate(SamplingRate.RC_SAMPLE_RATE_16000.getValue());

iOS Modification Location
rcHQVoiceRecorderHandler.recordSettings = @{
    AVFormatIDKey : @(kAudioFormatMPEG4AAC_HE),
    AVNumberOfChannelsKey : @1,
    // Encoder bit rate in bits per second; the sample rate itself is set with AVSampleRateKey
    AVEncoderBitRateKey : @(16000)
};
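To see what these settings mean for a maximum-length (60-second) recording, the arithmetic below compares the encoded size at the iOS encoder bit rate of 16,000 bit/s with uncompressed audio at 16 kHz and at 44.1 kHz. This is a sketch for estimation only: it assumes a constant bit rate for the encoded figure, and the 16-bit mono PCM format used for the uncompressed figures is an illustrative assumption, not a format the SDK produces.

```java
final class VoiceClipSize {
    // Approximate size in bytes of audio encoded at a constant bit rate.
    static long encodedBytes(int bitRateBitsPerSecond, int seconds) {
        return (long) bitRateBitsPerSecond * seconds / 8;
    }

    // Size in bytes of uncompressed PCM audio.
    static long pcmBytes(int sampleRateHz, int bitsPerSample, int channels, int seconds) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels * seconds;
    }
}
```

For a 60-second clip, encodedBytes(16000, 60) gives 120,000 bytes, pcmBytes(16000, 16, 1, 60) gives 1,920,000 bytes, and pcmBytes(44100, 16, 1, 60) gives 5,292,000 bytes, so recording at 16 kHz also reduces the amount of audio a recognizer has to receive.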
