
RockAndNull

Originally published at paleblueapps.com

Implementing Live Camera OCR with Jetpack Compose


Building apps that can seamlessly interpret real-world data is becoming increasingly essential, especially with the rise of AI and machine learning.

Integrating features like Optical Character Recognition (OCR) directly into mobile apps allows users to extract and process text from images or camera feeds, enhancing the app's interactivity and usefulness.

In this post, we’ll explore how to implement a live camera view with OCR in Jetpack Compose. Leveraging Compose’s modern UI toolkit, Jetpack’s CameraX, and the power of ML Kit, we’ll create a streamlined, intuitive experience for real-time text detection. Whether you’re building a document scanner, a data entry helper, or just want to experiment with cool tech, this guide will provide a practical, step-by-step approach to integrating these features into your app.

The Composable

@OptIn(ExperimentalPermissionsApi::class)
@Composable
private fun CameraView(
    modifier: Modifier,
    onTextDetected: (Text) -> Unit = {},
) {
    val context = LocalContext.current
    val lifecycleOwner = LocalLifecycleOwner.current
    val permissionState = rememberPermissionState(permission = Manifest.permission.CAMERA) // 1.

    val cameraController = remember {
        LifecycleCameraController(context).apply {
            setEnabledUseCases(CameraController.IMAGE_ANALYSIS)
            setImageAnalysisAnalyzer(
                ContextCompat.getMainExecutor(context),
                TextRecognitionAnalyzer(onTextDetected = onTextDetected), // 2.
            )
        }
    }

    Box(
        modifier = modifier.fillMaxWidth(),
        contentAlignment = Alignment.Center,
    ) {
        AndroidView(
            modifier = Modifier
                .fillMaxSize()
                .clip(RoundedCornerShape(12.dp)),
            factory = { context ->
                PreviewView(context).apply { // 3.
                    scaleType = PreviewView.ScaleType.FILL_CENTER
                    layoutParams = ViewGroup.LayoutParams(
                        ViewGroup.LayoutParams.MATCH_PARENT,
                        ViewGroup.LayoutParams.MATCH_PARENT,
                    )
                    this.controller = cameraController
                    cameraController.bindToLifecycle(lifecycleOwner) // 4.
                }
            },
        )

        if (!permissionState.status.isGranted) { // 5.
            Column(
                horizontalAlignment = Alignment.CenterHorizontally,
            ) {
                Text(
                    text = "Needs camera permission",
                )
                Spacer(modifier = Modifier.size(8.dp))
                Button(
                    onClick = {
                        permissionState.launchPermissionRequest()
                    },
                ) {
                    Text(text = "Request permission")
                }
            }
        }
    }
}
  1. Accessing the camera requires the appropriate permission. This example handles the permission request and grant flow using the Google Accompanist Permissions library (the dependency and manifest sketch after this list shows the setup this assumes).
  2. A custom analyzer that takes an image as input and outputs the recognized text. This will be covered in the next section.
  3. The PreviewView from Jetpack's CameraX handles the live preview from the camera. Unfortunately, this is not Compose-native, so we need to wrap the classic Android View in an AndroidView.
  4. The controller must know when the app moves to the foreground or background so it can allocate and release camera resources; binding it to the lifecycle conveniently takes care of this.
  5. The UI shown while the camera permission has not been granted, with a button to request it.
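
For reference, here is a minimal sketch of the Gradle dependencies this setup assumes (the version numbers are illustrative; check the official release notes for the current ones). You also need to declare `<uses-permission android:name="android.permission.CAMERA" />` in your AndroidManifest.xml.

// build.gradle.kts (module) — illustrative versions
dependencies {
    implementation("androidx.camera:camera-camera2:1.3.4")      // CameraX core
    implementation("androidx.camera:camera-lifecycle:1.3.4")    // LifecycleCameraController
    implementation("androidx.camera:camera-view:1.3.4")         // PreviewView
    implementation("com.google.mlkit:text-recognition:16.0.0")  // ML Kit OCR (Latin script)
    implementation("com.google.accompanist:accompanist-permissions:0.34.0") // permission state in Compose
}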

The Analyzer

internal class TextRecognitionAnalyzer(
    private val onTextDetected: (Text) -> Unit,
) : ImageAnalysis.Analyzer {

    private val scope: CoroutineScope = CoroutineScope(Dispatchers.IO + SupervisorJob())
    private val textRecognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS) // 1.

    @OptIn(ExperimentalGetImage::class)
    override fun analyze(imageProxy: ImageProxy) {
        scope.launch { // 2.
            val mediaImage = imageProxy.image ?: run {
                imageProxy.close()
                return@launch
            }

            val inputImage =
                InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

            suspendCoroutine { continuation ->
                textRecognizer.process(inputImage)
                    .addOnSuccessListener { visionText: Text ->
                        if (visionText.text.isNotBlank()) {
                            onTextDetected(visionText) // 3.
                        }
                    }
                    .addOnCompleteListener {
                        continuation.resume(Unit)
                    }
            }
            delay(100) // small pause to throttle how frequently frames are analyzed
        }.invokeOnCompletion { exception ->
            exception?.printStackTrace()
            imageProxy.close()
        }
    }
}
  1. This is Google's ML Kit Text Recognition client for doing OCR (Optical Character Recognition) using machine learning. Essentially, it will analyze images and return the recognized text.
  2. We are using coroutines to handle the background processing.
  3. This is called when the OCR processing is complete and we can proceed with the business logic for the extracted text (see the usage sketch after this list).
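
To show how the pieces fit together, here is a hypothetical usage sketch (the ScannerScreen name and layout are my own, assuming CameraView is accessible from the call site): it hoists the detected text into Compose state and renders it below the preview.

@Composable
fun ScannerScreen(modifier: Modifier = Modifier) {
    // Hold the most recently recognized text as Compose state.
    var detectedText by remember { mutableStateOf("") }

    Column(modifier = modifier.fillMaxSize()) {
        // Camera preview with the text recognition analyzer wired up.
        CameraView(
            modifier = Modifier.weight(1f),
            onTextDetected = { visionText -> detectedText = visionText.text },
        )
        // Display whatever text was last detected below the preview.
        Text(
            text = detectedText,
            modifier = Modifier.padding(16.dp),
        )
    }
}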

By integrating a live camera view with OCR in Jetpack Compose, you unlock powerful capabilities to process real-world data in real time, adding immense value to your app's user experience. This combination of CameraX, ML Kit, and Compose makes it possible to create seamless, modern, and efficient interfaces while leveraging cutting-edge AI tools.

Happy coding!
