kouwei qing

Posted on Dec 25

HarmonyOS Audio Collection in Audio and Video Practice

#harmonyos #video

Background

Many application scenarios require audio collection: voice messages in chat, real-time speech-to-text, real-time voice calls, and real-time video calls. On Android and iOS, the system supports two forms of collection:

Real-time audio stream collection
Audio file recording

The system also provides different forms of APIs. For example, on Android:

AudioRecord Java interface
MediaRecorder Java interface
OpenSL ES C++ interface
AAudio C++ interface

Adapting an application to HarmonyOS raises the same need. In this article, we will implement audio collection step by step.

Introduction to Audio Recording Interfaces

HarmonyOS provides two audio collection interfaces, one for TS and one for C++:

AudioCapturer
OHAudio

Each is introduced in turn below.

AudioCapturer

Using AudioCapturer to record audio involves creating an AudioCapturer instance, configuring the audio collection parameters, starting and stopping the collection, and releasing resources. The state diagram in the official documentation shows each method and the state transition it triggers.

createAudioCapturer

Creating a capturer mainly involves configuring its parameters:

import { audio } from '@kit.AudioKit';

let audioStreamInfo: audio.AudioStreamInfo = {
  samplingRate: audio.AudioSamplingRate.SAMPLE_RATE_48000, // Sampling rate
  channels: audio.AudioChannel.CHANNEL_2, // Number of channels
  sampleFormat: audio.AudioSampleFormat.SAMPLE_FORMAT_S16LE, // Sampling format
  encodingType: audio.AudioEncodingType.ENCODING_TYPE_RAW // Encoding format
};

let audioCapturerInfo: audio.AudioCapturerInfo = {
  source: audio.SourceType.SOURCE_TYPE_MIC,
  capturerFlags: 0
};

let audioCapturerOptions: audio.AudioCapturerOptions = {
  streamInfo: audioStreamInfo,
  capturerInfo: audioCapturerInfo
};

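audio.createAudioCapturer(audioCapturerOptions, (err, data) => {
  if (err) {
    console.error(`Failed to create AudioCapturer. Code is ${err.code}, message is ${err.message}`);
  } else {
    // Keep this instance for the on('readData'), start, stop, and release calls below
    let audioCapturer = data;
  }
});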

The parameters consist of two main parts:

AudioStreamInfo: Audio format configuration information
- samplingRate: Sampling rate
- channels: Number of channels
- sampleFormat: Sampling format
- encodingType: Audio encoding type. Currently, only the ENCODING_TYPE_RAW configuration for PCM is supported.
AudioCapturerInfo: Collection configuration information
- source: Audio source type, including:
  - SOURCE_TYPE_INVALID: Invalid audio source
  - SOURCE_TYPE_MIC: Microphone audio source
  - SOURCE_TYPE_VOICE_RECOGNITION: Voice recognition source
  - SOURCE_TYPE_PLAYBACK_CAPTURE: Audio source for recording the playback audio stream (internal recording)
  - SOURCE_TYPE_VOICE_COMMUNICATION: Audio source for voice call scenarios
  - SOURCE_TYPE_VOICE_MESSAGE: Audio source for short voice messages
- capturerFlags: Audio capturer flag. 0 represents the audio capturer.
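To get a feel for what the AudioStreamInfo values imply, here is a small sketch that works out the raw PCM data rate and the size of a buffer of a given duration. The names PcmStreamConfig, pcmBytesPerSecond, and pcmBufferBytes are made up for illustration and are not part of AudioKit; the arithmetic assumes uncompressed, interleaved PCM.

```typescript
// Hypothetical helpers for illustration only; not part of AudioKit.
type PcmFormat = 'U8' | 'S16LE' | 'S24LE' | 'S32LE';

// Bytes used by one sample of one channel for each PCM format
const BYTES_PER_SAMPLE: Record<PcmFormat, number> = { U8: 1, S16LE: 2, S24LE: 3, S32LE: 4 };

interface PcmStreamConfig {
  samplingRate: number; // in Hz
  channels: number;
  sampleFormat: PcmFormat;
}

// Raw data rate: samples per second * channels * bytes per sample
function pcmBytesPerSecond(cfg: PcmStreamConfig): number {
  return cfg.samplingRate * cfg.channels * BYTES_PER_SAMPLE[cfg.sampleFormat];
}

// Size of a buffer holding durationMs of audio, rounded down to whole frames
function pcmBufferBytes(cfg: PcmStreamConfig, durationMs: number): number {
  const frameBytes = cfg.channels * BYTES_PER_SAMPLE[cfg.sampleFormat];
  const frames = Math.floor((cfg.samplingRate * durationMs) / 1000);
  return frames * frameBytes;
}

// Same values as the configuration above: 48 kHz, 2 channels, S16LE
const cfg48kStereo: PcmStreamConfig = { samplingRate: 48000, channels: 2, sampleFormat: 'S16LE' };
console.log(pcmBytesPerSecond(cfg48kStereo)); // 192000
console.log(pcmBufferBytes(cfg48kStereo, 20)); // 3840
```

At roughly 192 KB per second, one minute of this stream is about 11.5 MB of raw PCM, which is why voice scenarios often pick a lower sampling rate or a single channel.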

on('readData')

The on('readData') method subscribes to the callback that delivers the captured audio data:

let readDataCallback = (buffer: ArrayBuffer) => {
    // Process the audio stream
};
audioCapturer.on('readData', readDataCallback);
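As an illustration of what processing the audio stream might look like, the sketch below computes the peak and RMS level of one buffer. It assumes the capturer was configured with SAMPLE_FORMAT_S16LE as above; pcm16Level is a hypothetical helper, not an AudioKit API.

```typescript
// Hypothetical helper: peak and RMS level of a 16-bit little-endian PCM buffer,
// both normalized to the range 0..1.
function pcm16Level(buffer: ArrayBuffer): { peak: number; rms: number } {
  // DataView with littleEndian = true reads S16LE regardless of host byte order
  const view = new DataView(buffer);
  const sampleCount = Math.floor(buffer.byteLength / 2);
  let peak = 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = view.getInt16(i * 2, true) / 32768;
    peak = Math.max(peak, Math.abs(sample));
    sumSquares += sample * sample;
  }
  // An empty buffer is reported as silence instead of dividing by zero
  const rms = sampleCount > 0 ? Math.sqrt(sumSquares / sampleCount) : 0;
  return { peak, rms };
}
```

Inside readDataCallback it could be called as pcm16Level(buffer), for example to drive a volume meter. The samples of all channels are mixed into one figure here; a per-channel meter would need to step through the interleaved samples separately.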

start

The start method is used to start recording:

import { BusinessError } from '@kit.BasicServicesKit';
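audioCapturer.start((err: BusinessError) => {
  if (err) {
    console.error(`Failed to start capturer. Code is ${err.code}, message is ${err.message}`);
  } else {
    console.info('Capturer started.');
  }
});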

stop

The stop method is used to stop recording:

import { BusinessError } from '@kit.BasicServicesKit';
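audioCapturer.stop((err: BusinessError) => {
  if (err) {
    console.error(`Failed to stop capturer. Code is ${err.code}, message is ${err.message}`);
  } else {
    console.info('Capturer stopped.');
  }
});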

release

The release method destroys the instance and releases resources:

import { BusinessError } from '@kit.BasicServicesKit';
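audioCapturer.release((err: BusinessError) => {
  if (err) {
    console.error(`Failed to release capturer. Code is ${err.code}, message is ${err.message}`);
  } else {
    console.info('Capturer released.');
  }
});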
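The create, start, stop, and release calls above form a small lifecycle. The sketch below models it as a transition table so that an invalid call order is caught early. It is a simplified model written for this article, not the AudioKit state enum: the state and action names are assumptions, and states such as paused are left out.

```typescript
// Simplified, hypothetical model of the capturer lifecycle (not the AudioKit enum).
type CapState = 'prepared' | 'running' | 'stopped' | 'released';
type CapAction = 'start' | 'stop' | 'release';

// For each state, the actions that are allowed and the state they lead to
const CAP_TRANSITIONS: Record<CapState, Partial<Record<CapAction, CapState>>> = {
  prepared: { start: 'running', release: 'released' },
  running: { stop: 'stopped', release: 'released' },
  stopped: { start: 'running', release: 'released' },
  released: {},
};

// Returns the next state, or throws if the action is not allowed in this state
function nextCapState(state: CapState, action: CapAction): CapState {
  const next = CAP_TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Cannot ${action} while ${state}`);
  }
  return next;
}

// A typical session: record, stop, record again, then release
const finalState = (['start', 'stop', 'start', 'release'] as CapAction[])
  .reduce(nextCapState, 'prepared' as CapState);
console.log(finalState); // released
```

A wrapper around AudioCapturer could consult such a table before calling start or stop, which keeps the error handling in one place.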

OHAudio

OHAudio is a set of C APIs introduced in API version 10. The API has a unified design that supports both the normal audio path and the low-latency path. It supports only the PCM format and suits scenarios where audio input is implemented at the Native layer. Many audio encoding libraries are written in C/C++. When such a library is migrated to HarmonyOS, collecting audio through OHAudio at the Native layer avoids the cost of passing data between the TS layer and the C++ layer and improves efficiency.

OHAudio depends on the libohaudio.so dynamic library. Include the <ohaudio/native_audiostreambuilder.h> and <ohaudio/native_audiocapturer.h> header files to use the APIs related to audio recording.

Creating the Builder

OH_AudioStreamBuilder* builder;
OH_AudioStreamBuilder_Create(&builder, AUDIOSTREAM_TYPE_CAPTURER);

Configuring Audio Stream Parameters

You can refer to the following example:

// Set the audio sampling rate
OH_AudioStreamBuilder_SetSamplingRate(builder, 48000);
// Set the number of audio channels
OH_AudioStreamBuilder_SetChannelCount(builder, 2);
// Set the audio sampling format
OH_AudioStreamBuilder_SetSampleFormat(builder, AUDIOSTREAM_SAMPLE_S16LE);
// Set the encoding type of the audio stream
OH_AudioStreamBuilder_SetEncodingType(builder, AUDIOSTREAM_ENCODING_TYPE_RAW);
// Set the working scenario of the input audio stream
OH_AudioStreamBuilder_SetCapturerInfo(builder, AUDIOSTREAM_SOURCE_TYPE_MIC);

These parameters play the same roles as their AudioCapturer counterparts.

Setting the Audio Callback Functions

// Custom read data function
int32_t MyOnReadData(
    OH_AudioCapturer* capturer,
    void* userData,
    void* buffer,
    int32_t length)
{
    // Read length bytes of recorded data from buffer
    return 0;
}
// Custom audio stream event function
int32_t MyOnStreamEvent(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioStream_Event event)
{
    // Update the recorder state and interface according to the audio stream event information represented by event
    return 0;
}
// Custom audio interruption event function
int32_t MyOnInterruptEvent(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioInterrupt_ForceType type,
    OH_AudioInterrupt_Hint hint)
{
    // Update the recorder state and interface according to the audio interruption information represented by type and hint
    return 0;
}
// Custom exception callback function
int32_t MyOnError(
    OH_AudioCapturer* capturer,
    void* userData,
    OH_AudioStream_Result error)
{
    // Make corresponding processing according to the audio exception information represented by error
    return 0;
}

OH_AudioCapturer_Callbacks callbacks;
// Configure callback functions
callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
callbacks.OH_AudioCapturer_OnStreamEvent = MyOnStreamEvent;
callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;
callbacks.OH_AudioCapturer_OnError = MyOnError;

// Set the callback for the audio input stream
OH_AudioStreamBuilder_SetCapturerCallback(builder, callbacks, nullptr);

The callbacks are registered on the builder through the OH_AudioStreamBuilder_SetCapturerCallback function. Its last argument is the userData pointer that is passed back to every callback.

Constructing the Recording Audio Stream

OH_AudioCapturer* audioCapturer;
OH_AudioStreamBuilder_GenerateCapturer(builder, &audioCapturer);

Using the Audio Stream

OH_AudioStream_Result OH_AudioCapturer_Start(OH_AudioCapturer* capturer): Start recording.
OH_AudioStream_Result OH_AudioCapturer_Pause(OH_AudioCapturer* capturer): Pause recording.
OH_AudioStream_Result OH_AudioCapturer_Stop(OH_AudioCapturer* capturer): Stop recording.
OH_AudioStream_Result OH_AudioCapturer_Flush(OH_AudioCapturer* capturer): Discard the buffered audio data.
OH_AudioStream_Result OH_AudioCapturer_Release(OH_AudioCapturer* capturer): Release the recording instance.

Destroying the Builder

OH_AudioStreamBuilder_Destroy(builder);

Audio Recording Best Practices

Let's walk through the full audio collection flow, using MP3 recording as the target scenario.

Permission Application

Audio collection requires the microphone permission, which must be declared in module.json5 and then requested from the user at runtime. Declare the permission first:

"requestPermissions": [  
  {  
    "name": "ohos.permission.MICROPHONE",  
    "reason": "$string:reason",  
    "usedScene": {  
      "abilities": [  
        "FormAbility"  
      ],  
      "when": "inuse"  
    }  
  }  
],

Then request the permission at runtime:

import { abilityAccessCtrl, common, Permissions } from '@kit.AbilityKit';
import { BusinessError } from '@kit.BasicServicesKit';

function reqPermissionsFromUser(permissions: Array<Permissions>, context: common.UIAbilityContext): void {
  let atManager: abilityAccessCtrl.AtManager = abilityAccessCtrl.createAtManager();
  // requestPermissionsFromUser checks the current authorization status to decide whether to show the permission dialog
  atManager.requestPermissionsFromUser(context, permissions).then((data) => {
    let grantStatus: Array<number> = data.authResults;
    let length: number = grantStatus.length;
    for (let i = 0; i < length; i++) {
      if (grantStatus[i] === 0) {
        // The user granted this permission; continue checking the rest
      } else {
        // The user refused. Explain that the permission is required for this page and guide the user to enable it in the system settings.
        return;
      }
    }
    // All permissions were granted
  }).catch((err: BusinessError) => {
    console.error(`Failed to request permissions from user. Code is ${err.code}, message is ${err.message}`);
  });
}
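The loop above stops at the first refused permission. If you want to know exactly which permissions were refused, one option is a small pure helper like the one below. deniedPermissions is a hypothetical name; the sketch only assumes, as in the code above, that an authResults entry of 0 means granted, and that the results come back in the same order as the permissions they belong to.

```typescript
// Hypothetical helper: returns the permissions whose result is not 0 (granted).
// Assumes permissions[i] and authResults[i] refer to the same permission.
function deniedPermissions(permissions: string[], authResults: number[]): string[] {
  return permissions.filter((_permission, index) => authResults[index] !== 0);
}

// Example with made-up results: the microphone request was refused
const denied = deniedPermissions(['ohos.permission.MICROPHONE'], [-1]);
console.log(denied.length === 0 ? 'all granted' : `denied: ${denied.join(', ')}`);
// prints "denied: ohos.permission.MICROPHONE"
```

An empty result means recording can start; otherwise the returned names can be used in the prompt that guides the user to the system settings.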

Call the permission request method in aboutToAppear and start recording once authorization succeeds:

const permissions: Array<Permissions> = ['ohos.permission.MICROPHONE'];
const context: common.UIAbilityContext = getContext(this) as common.UIAbilityContext;
reqPermissionsFromUser(permissions, context);

Configuring the C++ Project

After creating a C++ module, link it against the ohaudio dynamic library in CMakeLists.txt:

cmake_minimum_required(VERSION 3.5.0)  
project(audiorecorderdemo)  

set(NATIVERENDER_ROOT_PATH ${CMAKE_CURRENT_SOURCE_DIR})  

if(DEFINED PACKAGE_FIND_FILE)  
    include(${PACKAGE_FIND_FILE})  
endif()  

include_directories(${NATIVERENDER_ROOT_PATH}  
                    ${NATIVERENDER_ROOT_PATH}/include)  

add_library(capture SHARED napi_init.cpp)  
target_link_libraries(capture PUBLIC libace_napi.z.so)  
target_link_libraries(capture PUBLIC libohaudio.so)

Register the napi methods:

#include "napi/native_api.h"

// Exposed to ArkTS as start(); implemented in the next section
static napi_value start(napi_env env, napi_callback_info info)
{
    return nullptr;
}

// Exposed to ArkTS as stop()
static napi_value stop(napi_env env, napi_callback_info info)
{
    return nullptr;
}

EXTERN_C_START
static napi_value Init(napi_env env, napi_value exports)
{
    napi_property_descriptor desc[] = {
        { "start", nullptr, start, nullptr, nullptr, nullptr, napi_default, nullptr },
        { "stop", nullptr, stop, nullptr, nullptr, nullptr, napi_default, nullptr }
    };
    napi_define_properties(env, exports, sizeof(desc) / sizeof(desc[0]), desc);
    return exports;
}
EXTERN_C_END

Implementing the Start of Recording

// Custom read data function
int32_t MyOnReadData(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    void* buffer,  
    int32_t length)  
{  
    // TODO: Read length bytes of recorded data from buffer
    return 0;  
}  
// Custom audio stream event function  
int32_t MyOnStreamEvent(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioStream_Event event)  
{  
    // TODO: Update the recorder state and interface according to the audio stream event information represented by event
    return 0;  
}  
// Custom audio interruption event function  
int32_t MyOnInterruptEvent(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioInterrupt_ForceType type,  
    OH_AudioInterrupt_Hint hint)  
{  
    //TODO Update the recorder state and interface according to the audio interruption information represented by type and hint  
    return 0;  
}  
// Custom exception callback function  
int32_t MyOnError(  
    OH_AudioCapturer* capturer,  
    void* userData,  
    OH_AudioStream_Result error)  
{  
    //TODO Make corresponding processing according to the audio exception information represented by error  
    return 0;  
}  
// Kept at file scope so that stop() can reach the capturer created in start()
static OH_AudioStreamBuilder* builder = nullptr;
static OH_AudioCapturer* audioCapturer = nullptr;

static napi_value start(napi_env env, napi_callback_info info)
{
    OH_AudioStreamBuilder_Create(&builder, AUDIOSTREAM_TYPE_CAPTURER);
    // Set the audio sampling rate
    OH_AudioStreamBuilder_SetSamplingRate(builder, 48000);
    // Set the number of audio channels
    OH_AudioStreamBuilder_SetChannelCount(builder, 2);
    // Set the audio sampling format
    OH_AudioStreamBuilder_SetSampleFormat(builder, AUDIOSTREAM_SAMPLE_S16LE);
    // Set the encoding type of the audio stream
    OH_AudioStreamBuilder_SetEncodingType(builder, AUDIOSTREAM_ENCODING_TYPE_RAW);
    // Set the working scenario of the input audio stream
    OH_AudioStreamBuilder_SetCapturerInfo(builder, AUDIOSTREAM_SOURCE_TYPE_MIC);

    OH_AudioCapturer_Callbacks callbacks;
    // Configure callback functions
    callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
    callbacks.OH_AudioCapturer_OnStreamEvent = MyOnStreamEvent;
    callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;
    callbacks.OH_AudioCapturer_OnError = MyOnError;
    // Set the callback for the audio input stream
    OH_AudioStreamBuilder_SetCapturerCallback(builder, callbacks, nullptr);
    // Generate the capturer from the configured builder
    OH_AudioStreamBuilder_GenerateCapturer(builder, &audioCapturer);
    // Start recording; MyOnReadData is then called with the captured PCM data
    OH_AudioCapturer_Start(audioCapturer);
    return nullptr;
}

Best Practice 1:
To avoid unexpected behaviors, when setting audio callback functions, please ensure that each callback in OH_AudioCapturer_Callbacks is initialized with a custom callback method or a null pointer. For example:

OH_AudioCapturer_Callbacks callbacks;

// Configure callback functions. If you need to listen, assign values.
callbacks.OH_AudioCapturer_OnReadData = MyOnReadData;
callbacks.OH_AudioCapturer_OnInterruptEvent = MyOnInterruptEvent;

// (Required) If you don't need to listen, initialize with a null pointer.
callbacks.OH_AudioCapturer_OnStreamEvent = nullptr;
callbacks.OH_AudioCapturer_OnError = nullptr;

Best Practice 2:
On devices that support the low-latency mode, latency-sensitive scenarios such as voice calls can set the low-latency mode on the builder before generating the capturer, which gives a more responsive audio experience:

OH_AudioStream_LatencyMode latencyMode = AUDIOSTREAM_LATENCY_MODE_FAST;
OH_AudioStreamBuilder_SetLatencyMode(builder, latencyMode);

Audio File Processing

The audio callback is where the captured PCM data is processed: it can be handed to an ASR engine or written directly to a file. The next article will cover encoding it to MP3 and writing it to a file.
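Until an MP3 encoder is in place, a quick way to check the captured data is to wrap the raw PCM in a WAV container, which only needs a 44-byte header in front of the samples. The sketch below builds that header. buildWavHeader is a hypothetical helper and assumes uncompressed integer PCM (WAV format tag 1); it is written in TypeScript for brevity, but the same byte layout applies if the file is written from the C++ callback.

```typescript
// Hypothetical helper: builds the canonical 44-byte RIFF/WAVE header for integer PCM.
function buildWavHeader(
  dataBytes: number,      // size of the PCM payload that follows the header
  sampleRate: number,     // for example 48000
  channels: number,       // for example 2
  bitsPerSample: number   // for example 16 for S16LE
): ArrayBuffer {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      view.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per frame

  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true);           // remaining file size after this field
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true);                      // size of the fmt chunk body
  view.setUint16(20, 1, true);                       // format tag 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, 'data');
  view.setUint32(40, dataBytes, true);
  return header;
}
```

Because the payload size is only known when recording stops, a common approach is to write a placeholder header first and overwrite it with the final sizes at the end.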

Stopping Recording and Destroying the Instance

// Stop recording, release the capturer, then destroy the builder
OH_AudioCapturer_Stop(audioCapturer);
OH_AudioCapturer_Release(audioCapturer);
OH_AudioStreamBuilder_Destroy(builder);

Summary

This article introduced the two audio collection methods provided by HarmonyOS, AudioCapturer at the TS layer and OHAudio at the C++ layer, and implemented real-time audio collection using the OHAudio interface.
