DEV Community

Tristan Elliott

Using Android to stream to Twitch. Part 3. Manual video encoding

Table of contents

  1. Read before continuing
  2. Goal of this series
  3. Steps in this series
  4. Where to start?
  5. High level understanding
  6. Understanding the encoder
  7. The Surfaces
  8. Creating a recording session
  9. Creating a capture request
  10. The actual encoding
  11. Stopping the encoder
  12. Saving the data to the device's public storage
  13. Conclusion

My app on the Google Play Store

My app's GitHub code

Resources

PLEASE READ BEFORE CONTINUING

  • First, I would like to apologize: I set myself a two-hour limit when writing this, and to meet it I had to rush and could not give every section the time and detail it deserves.
  • THIS IS NOT A BEGINNER'S TUTORIAL! This blog series falls under the advanced category. We will not be using a library! We are using threads and manually passing the bytes to the encoder. I say this not to discourage people from reading, but simply to let people know that they may encounter some topics that seem complicated.

  • Also, it would help your understanding of this blog post if you have already implemented a camera preview

The Goal of this series

  • As the title states, this entire series will be about how to get the video from our Android device to stream on Twitch.

The steps you should take

  • 1) Get a preview working in your application
  • 2) Allow your application to capture video
  • 3) Create a secure socket to connect to the Twitch ingest servers
  • 4) Perform the RTMP handshake
  • 5) Encode the video from the device (very hard). This blog post
  • 6) Send the encoded data to the Twitch ingest server via the socket

Where to start?

  • Obviously, if you just want to record a video and have it saved to your device, you should use CameraX. However, we want a little more control, so we are going to use the Camera2 API
  • Also, manual encoding all starts with the MediaCodec class, which gives us access to low-level encoder/decoder components. Here is a visual demonstration from the documentation:

MediaCodec graphic

  • Long story short, the MediaCodec encoder is going to allow us to do this:

1) get a surface

2) pipe individual frames from the camera2 API into that surface

3) encode those frames into the format we want

4) save the video data on the device

What we are doing at a high level

Camera2 at a high level

  • So at a high level we have a camera, and we need to get the data from that camera to a preview (which shows the user what the camera is seeing) and to the encoder. The encoder will then allow us to encode the frames into an MP4 file via a MediaMuxer, which can then be saved to the public file system
  • Technically, the preview is not necessary, but the user will want to see what they are recording

Understanding the encoder

  • So let's talk a little about MediaCodec and how/what it is encoding. The main goal of the encoder is to take raw video frames and compress them into a more efficient format
  • With the encoder we must do 3 specific things:

    • 1) Create it
    • 2) Configure it
    • 3) Start it

1) Create it

private val mEncoder: MediaCodec? by lazy {
    MediaCodec.createEncoderByType("video/avc")
}

  • The code above creates an instance of an encoder for "video/avc" (H.264/AVC, i.e. Advanced Video Coding) and delays the instantiation until it is first used. Why video/avc? It's common and I got it to work, which is good enough for me.
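The by lazy part is plain Kotlin and easy to see in isolation: the initializer does not run until the property is first read, and it runs exactly once after that. A standalone sketch (EncoderHolder and the counter are my stand-ins, not code from the app):

```kotlin
// Stand-in for the encoder property: `by lazy` defers the initializer
// until the property is first read. `createCount` lets us observe when
// the initializer actually ran; names here are mine, not from the post.
class EncoderHolder {
    var createCount = 0
        private set

    val encoder: String by lazy {
        createCount++                 // in the post: MediaCodec.createEncoderByType("video/avc")
        "fake-encoder"
    }
}

fun main() {
    val holder = EncoderHolder()
    println(holder.createCount)   // 0 -- nothing created yet
    println(holder.encoder)       // first read runs the initializer
    println(holder.createCount)   // 1 -- created exactly once
    println(holder.encoder)       // cached; the initializer does not run again
    println(holder.createCount)   // still 1
}
```

This matters for the real encoder because creating a MediaCodec is not free, and `lazy` guarantees it only happens once we actually need it.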

2) Configure it

  • Now, according to the documentation, once the encoder is created it enters the Uninitialized state. To move it to the next state, we first have to configure the encoder. Our configuration code looks like this:
  • Official Google code example (different from the code below)
init {
    // 1 == MediaCodecInfo.CodecProfileLevel.AVCProfileBaseline
    val codecProfile = 1

    val format = MediaFormat.createVideoFormat(mMimeType, width, height)

    // Set some properties.  Failing to specify some of these can cause the MediaCodec
    // configure() call to throw an unhelpful exception.
    format.setInteger(
        MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
    format.setInteger(MediaFormat.KEY_BIT_RATE, bitRate)
    format.setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)
    format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, IFRAME_INTERVAL)

    format.setInteger(MediaFormat.KEY_PROFILE, codecProfile)
    format.setInteger(MediaFormat.KEY_COLOR_STANDARD, MediaFormat.COLOR_STANDARD_BT2020)
    format.setInteger(MediaFormat.KEY_COLOR_RANGE, MediaFormat.COLOR_RANGE_FULL)
    format.setInteger(MediaFormat.KEY_COLOR_TRANSFER, getTransferFunction())
    format.setFeatureEnabled(MediaCodecInfo.CodecCapabilities.FEATURE_HdrEditing, true)

    Log.d(TAG, "format: $format")

    // Create a MediaCodec encoder and configure it with our format.  Get a Surface
    // we can use for input and wrap it with a class that handles the EGL work.
    mEncoder!!.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
}

  • Notice that we first have to create the format: MediaFormat.createVideoFormat(mMimeType, width, height). mMimeType should be the same value used for createEncoderByType, and width and height are the dimensions of the frames we are encoding. The format is how we tell the encoder what it should do
  • Next you will see all the setInteger() calls, which set the values inside the format. MediaFormat stores all of its values as key/value pairs, and we change them by calling setInteger()
  • Lastly we call mEncoder!!.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE), which transitions the encoder to the Configured state. According to the documentation, our encoder is now ready to start
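One thing the snippet leaves open is what number to feed KEY_BIT_RATE. A common rule of thumb (my suggestion, not from the sample code) is the so-called "Kush gauge": pixels per frame times frame rate times a motion rank times 0.07. Both the constant and the motion rank (1 = low motion, 2 = medium, 4 = high) are heuristics:

```kotlin
// Rough bitrate heuristic (the "Kush gauge"): width * height * fps *
// motion rank * 0.07. The constant and the motion rank values are rules
// of thumb, not values taken from the original post.
fun estimateBitrate(width: Int, height: Int, frameRate: Int, motionRank: Int = 2): Int =
    (width.toLong() * height * frameRate * motionRank * 0.07).toInt()

fun main() {
    // 720p at 30 fps, medium motion: roughly 3.9 Mbps
    println(estimateBitrate(1280, 720, 30))
}
```

In practice you would still clamp the result to whatever your upload bandwidth and the ingest endpoint can actually handle.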

3) Start it

  • You should technically only call start() when you want to start recording. Calling start() transitions the encoder into the Executing state, meaning we can now process data through its buffer queues. In the official code example, you can see the start method looks like this:
    public fun start() {
        if (mUseMediaRecorder) {
            mMediaRecorder!!.apply {
                prepare()
                start()
            }
        } else {
            mEncoder!!.start()

            // Start the encoder thread last.  That way we're sure it can see all of the state
            // we've initialized.
            mEncoderThread!!.start()
            mEncoderThread!!.waitUntilReady()
        }
    }
  • In the code above, ignore the mUseMediaRecorder conditional (I completely deleted it in my code). We literally just call mEncoder!!.start(). The rest of the code associated with mEncoderThread will be covered later in this blog post
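The start-then-wait handshake that mEncoderThread!!.start() / waitUntilReady() performs can be sketched off-Android with a plain lock and condition variable. WorkerThread and its members below are my stand-ins, not the sample's EncoderThread:

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Minimal sketch of the start()/waitUntilReady() handshake, using a plain
// JVM thread in place of Android's encoder thread. The worker flips `ready`
// under a lock and signals; the caller blocks until that happens, so the
// worker is guaranteed to be fully set up before the caller proceeds.
class WorkerThread : Thread() {
    private val lock = ReentrantLock()
    private val readyCond = lock.newCondition()
    private var ready = false

    override fun run() {
        lock.withLock {
            ready = true          // real code: Looper prepared, handler created
            readyCond.signalAll()
        }
        // ...the message loop would run here...
    }

    fun waitUntilReady() = lock.withLock {
        while (!ready) readyCond.await()
    }

    fun isReadyNow(): Boolean = lock.withLock { ready }
}

fun main() {
    val worker = WorkerThread()
    worker.start()
    worker.waitUntilReady()        // returns only after the worker has signalled
    println(worker.isReadyNow())   // true
    worker.join()
}
```

This is why the comment in the sample says to start the encoder thread last: by the time waitUntilReady() returns, the thread has observed all previously initialized state.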

The Surfaces

  • Surfaces are handed to our camera and automatically handle the image buffers. In this application we are using two surfaces: a custom preview surface (which automatically shows the recording to the user) and an encoder surface, which is how we get the data to the actual encoder. As previously mentioned, these surfaces need to be handed to the camera device, and we can do that through a session, specifically a CameraCaptureSession

Creating a recording session

  • CameraCaptureSession documentation
  • What we are doing here is creating a CameraCaptureSession and then giving that session to the CameraDevice. This is the exact process we use to get data from our camera to the preview and the encoder. When we think session, we should think communication. Below is an example of the code found HERE

    /**
     * Starts a [CameraCaptureSession] and returns the configured session (as the result of the
     * suspended coroutine)
     */
    private suspend fun createCaptureSession(
        device: CameraDevice,
        targets: List<Surface>,
    ): CameraCaptureSession = suspendCoroutine { cont ->

        // Create a capture session using the predefined targets; this also involves defining the
        // session state callback to be notified of when the session is ready
        // Convert List<Surface> into OutputConfiguration
        val outputConfigs = targets.map { OutputConfiguration(it) }

        //  Create SessionConfiguration
        val sessionConfig = SessionConfiguration(
            SessionConfiguration.SESSION_REGULAR, // Use SESSION_HIGH_SPEED for high frame rates
            outputConfigs,
            Executors.newSingleThreadExecutor(), // Ensures callback execution
            object : CameraCaptureSession.StateCallback() {
                override fun onConfigured(session: CameraCaptureSession) {
                    cont.resume(session) // Resume coroutine with session
                }

                override fun onConfigureFailed(session: CameraCaptureSession) {
                    val exc = RuntimeException("Camera ${device.id} session configuration failed")
                    Log.e("CameraSession", exc.message, exc)
                    cont.resumeWithException(exc)
                }
            }
        )
        device.createCaptureSession(sessionConfig)

    }

  • I would like to point out that the targets: List<Surface> must be exactly the same targets used in the CaptureRequest

Creating a capture request

  • CaptureRequest documentation
  • The capture request is how we continuously notify the encoder that there is a new frame to be encoded. We can create one like so:
 fun createRecordRequest(session: CameraCaptureSession,
                            previewStabilization: Boolean = false): CaptureRequest {
        // Capture request holds references to target surfaces
        return session.device.createCaptureRequest(CameraDevice.TEMPLATE_RECORD).apply {
            // Add the preview and recording surface targets
            addTarget(viewFinder.holder.surface)
            addTarget(encoder.getInputSurface())

            // Sets user requested FPS for all targets
            set(CaptureRequest.CONTROL_AE_TARGET_FPS_RANGE, Range(fps, fps))

            if (previewStabilization) {
                set(CaptureRequest.CONTROL_VIDEO_STABILIZATION_MODE,
                    CaptureRequest.CONTROL_VIDEO_STABILIZATION_MODE_PREVIEW_STABILIZATION)
            }
        }.build()
    }


and then register it like this:

session.setRepeatingRequest(createRecordRequest(session),
            object : CameraCaptureSession.CaptureCallback() {
                override fun onCaptureCompleted(session: CameraCaptureSession,
                                                request: CaptureRequest,
                                                result: TotalCaptureResult
                ) {
                    Log.d("onCaptureCompleted","CAPTURE")

                        encoder.frameAvailable()

                }
            }, cameraHandler)



The actual encoding

-Official google code example

  • The actual data encoding starts when this code is run:

        val recordTargets = pipeline.getRecordTargets()
        session = createCaptureSession(camera, recordTargets)
        encoder.start()

        session.setRepeatingRequest(recordRequest,
            object : CameraCaptureSession.CaptureCallback() {
                override fun onCaptureCompleted(session: CameraCaptureSession,
                                                request: CaptureRequest,
                                                result: TotalCaptureResult
                ) {
                    Log.d("onCaptureCompleted","CAPTURE")

                        encoder.frameAvailable()

                }
            }, cameraHandler)

public fun frameAvailable() {
        val handler = mEncoderThread!!.getHandler()
        handler.sendMessage(handler.obtainMessage(
            EncoderThread.EncoderHandler.MSG_FRAME_AVAILABLE))
    }

  • This then runs the following code inside a custom thread:
  fun frameAvailable() {
           // Log.d("THREADframeAvailable", "frameAvailable")
            if (drainEncoder()) {
                synchronized (mLock) {
                    mFrameNum++
                    mLock.notify()
                }
            }
        }

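Notice that frameAvailable() on the capture side does no encoding at all; it only posts a MSG_FRAME_AVAILABLE message to the encoder thread's handler. Off Android, the same handler-loop pattern can be sketched with a blocking queue standing in for the Handler/Looper pair (all the names below are my stand-ins, not the actual EncoderThread code):

```kotlin
import java.util.concurrent.LinkedBlockingQueue

// Sketch of the Handler/Looper message pattern using a blocking queue.
// FRAME_AVAILABLE / SHUTDOWN mirror the message constants referenced in
// the post; everything else is a simplified stand-in.
enum class Msg { FRAME_AVAILABLE, SHUTDOWN }

class EncoderLoop {
    private val queue = LinkedBlockingQueue<Msg>()
    var framesHandled = 0
        private set

    fun send(msg: Msg) { queue.put(msg) }          // like handler.sendMessage(...)

    fun loop() {                                   // like Looper.loop() on the encoder thread
        while (true) {
            when (queue.take()) {
                Msg.FRAME_AVAILABLE -> framesHandled++   // would call drainEncoder()
                Msg.SHUTDOWN -> return                   // would stop the muxer and quit
            }
        }
    }
}

fun main() {
    val enc = EncoderLoop()
    val thread = Thread { enc.loop() }
    thread.start()
    repeat(3) { enc.send(Msg.FRAME_AVAILABLE) }    // three "new frame" notifications
    enc.send(Msg.SHUTDOWN)
    thread.join()
    println(enc.framesHandled)                     // 3
}
```

The point of the pattern is that capture callbacks return immediately while all the expensive drain work is serialized on one dedicated thread.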
  • On that same thread, this code is then run:
 public fun drainEncoder(): Boolean {
            val TIMEOUT_USEC: Long = 0     // no timeout -- check for buffers, bail if none
            var encodedFrame = false

            while (true) {


                val encoderStatus: Int = mEncoder.dequeueOutputBuffer(mBufferInfo, TIMEOUT_USEC)
                if (encoderStatus == MediaCodec.INFO_TRY_AGAIN_LATER) {
                    // no output available yet
                    break
                } else if (encoderStatus == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                    // Should happen before receiving buffers, and should only happen once.
                    // The MediaFormat contains the csd-0 and csd-1 keys, which we'll need
                    // for MediaMuxer.  It's unclear what else MediaMuxer might want, so
                    // rather than extract the codec-specific data and reconstruct a new
                    // MediaFormat later, we just grab it here and keep it around.
                    mEncodedFormat = mEncoder.getOutputFormat()
                    Log.d("drainEncoder", "encoder output format changed: " + mEncodedFormat)
                } else if (encoderStatus < 0) {
                    Log.w("drainEncoder", "unexpected result from encoder.dequeueOutputBuffer: " +
                            encoderStatus)
                    // let's ignore it
                } else {
                    // encodedData is the actual compressed video frame, encoded and ready for storage
                    val encodedData: ByteBuffer = mEncoder.getOutputBuffer(encoderStatus)
                        ?: throw RuntimeException("encoderOutputBuffer $encoderStatus was null")

                    if ((mBufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
                        // The codec config data was pulled out when we got the
                        // INFO_OUTPUT_FORMAT_CHANGED status.  The MediaMuxer won't accept
                        // a single big blob -- it wants separate csd-0/csd-1 chunks --
                        // so simply saving this off won't work.
                       Log.d("drainEncoder", "ignoring BUFFER_FLAG_CODEC_CONFIG")
                        mBufferInfo.size = 0
                    }

                    if (mBufferInfo.size != 0) {
                        // adjust the ByteBuffer values to match BufferInfo (not needed?)
                        //tells where the valid data starts and moves the buffer's read pointer to the start of the valid data.
                        encodedData.position(mBufferInfo.offset)
                        //prevents reading beyond the valid data.
                        //This ensures only the encoded frame data, and not extra padding or old data, is sent to
                        //the MediaMuxer
                        encodedData.limit(mBufferInfo.offset + mBufferInfo.size)

                        if (mVideoTrack == -1) {
                            //initialize the MediaMuxer if needed
                            mVideoTrack = mMuxer.addTrack(mEncodedFormat!!)
                            mMuxer.setOrientationHint(mOrientationHint)
                            mMuxer.start()
                            Log.d("drainEncoder", "Started media muxer")
                        }


                        //writes the encoded frame into the muxer.
                        mMuxer.writeSampleData(mVideoTrack, encodedData, mBufferInfo)
                        encodedFrame = true


                        Log.d("drainEncoder", "sent " + mBufferInfo.size + " bytes to muxer, ts=" +
                                    mBufferInfo.presentationTimeUs)

                    }

                    mEncoder.releaseOutputBuffer(encoderStatus, false)

                    if ((mBufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                        Log.w("drainEncoder", "reached end of stream unexpectedly")
                        break      // out of while
                    }
                }
            }

            return encodedFrame
        }

  • This will loop continuously because the encoder's surface was registered via setRepeatingRequest
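The position()/limit() adjustment inside drainEncoder() is worth seeing in isolation: it windows the output buffer so only the valid encoded bytes between offset and offset + size are visible, which is exactly the region writeSampleData() should consume. A standalone java.nio sketch with made-up sizes:

```kotlin
import java.nio.ByteBuffer

fun main() {
    // A 16-byte buffer standing in for an encoder output buffer, where only
    // bytes [4, 10) hold the encoded frame (offset = 4, size = 6; made-up values).
    val buf = ByteBuffer.allocate(16)
    val offset = 4
    val size = 6

    // The same adjustment drainEncoder() makes before writeSampleData():
    buf.position(offset)           // reads start at the valid data
    buf.limit(offset + size)       // reads stop right after the valid data

    println(buf.remaining())       // 6 -- exactly the encoded bytes, no padding
}
```

Anything that reads the buffer afterwards (like the muxer) sees only those six bytes, not the surrounding garbage.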

Stopping the encoder

 session.stopRepeating()
 session.close()
 delay(2000)

  • I added the delay because I was having a race condition that I could not figure out; adding a delay seemed to fix it.
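If you want to avoid the fixed delay, one option (my suggestion, untested against this app's actual race) is to have the teardown path count down a latch once it finishes, and wait on that latch with a timeout, so you wait only as long as needed:

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

// Sketch: replacing a fixed sleep with an explicit completion signal.
// In the real app, `stopped` would be counted down by the encoder thread
// after the muxer is released; the caller waits up to 2 s instead of
// always sleeping 2 s.
fun main() {
    val stopped = CountDownLatch(1)

    Thread {
        // ...stopRepeating()/close()/muxer teardown would happen here...
        stopped.countDown()               // signal: teardown finished
    }.start()

    val finished = stopped.await(2, TimeUnit.SECONDS)
    println(if (finished) "teardown finished" else "timed out waiting")
}
```

The timeout keeps the worst case identical to the current delay(2000), while the common case returns as soon as the teardown really is done.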
  • Next we need to shut down the encoder:
public fun shutdown(): Boolean {
        Log.d(TAG, "releasing encoder objects")
        Log.d("stopStreamEncoding", "releasing encoder objects")


        val handler = mEncoderThread!!.getHandler()
        handler.sendMessage(handler.obtainMessage(EncoderThread.EncoderHandler.MSG_SHUTDOWN))
        Log.d("stopStreamEncoding", "sendMessage to handler")
        try {
            Log.d("stopStreamEncoding", "sendMessage to JOIN")
            mEncoderThread!!.join()//
        } catch (ie: InterruptedException ) {
            Log.w(TAG, "Encoder thread join() was interrupted", ie)
        }

        Log.d("stopStreamEncoding", "STOP AND RELEASE")
        mEncoder!!.stop()
        mEncoder!!.release()

        return true
    }

  • Which then leads to us quitting the thread and stopping the muxer:
fun shutdown() {
    Log.d(TAG, "shutdown the mMuxer")
    Looper.myLooper()!!.quit()
    mMuxer.stop()
    mMuxer.release()
}


Saving the data to the device's public storage

  • This part was actually the most confusing to me, but we need to create a path and then save the file:
private val outputFile: File by lazy { createFile(requireContext(), "mp4") }

MediaScannerConnection.scanFile(
    context,
    arrayOf(outputFile.absolutePath),
    arrayOf("video/mp4"),
    null
)

if (outputFile.exists()) {
    Log.d("stopStreamEncoding", "EXISTS")
    // Launch external activity via intent to play video recorded using our provider
    startActivity(Intent().apply {
        action = Intent.ACTION_VIEW
        type = MimeTypeMap.getSingleton()
            .getMimeTypeFromExtension(outputFile.extension)
        val authority = "${BuildConfig.APPLICATION_ID}.provider"
        data = FileProvider.getUriForFile(view.context, authority, outputFile)
        flags = Intent.FLAG_GRANT_READ_URI_PERMISSION or
                Intent.FLAG_ACTIVITY_CLEAR_TOP
    })
}

  • Long story short: this code informs the Android system that a new media file (a video) has been created, triggers the media scanner to index the file (making it available in the system's media library), checks that the file exists, and then shows the recording to the user via an intent
  • Before this will work, we need to go to the Android manifest and register a provider:
  <!-- FileProvider used to share media with other apps -->
        <provider
            android:name="androidx.core.content.FileProvider"
            android:authorities="${applicationId}.provider"
            android:exported="false"
            android:grantUriPermissions="true">
            <meta-data
                android:name="android.support.FILE_PROVIDER_PATHS"
                android:resource="@xml/file_paths"/>
        </provider>

<paths xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Exposes the Movies directory on external storage -->
    <external-path
        name="external_storage"
        path="Movies/YourAppName/" />

</paths>


Conclusion

  • Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.
