This article explores the technical details of data-processing optimization for model lightweighting in the Huawei HarmonyOS Next system, summarized from actual development practice. It is intended mainly as a vehicle for technical sharing and communication; mistakes and omissions are inevitable, and colleagues are welcome to offer valuable opinions and raise questions so that we can make progress together. This article is original content, and any form of reprint must credit the source and the original author.
1. The Impact of Data Processing on Model Lightweighting
(1) Importance Analysis
In the model world of HarmonyOS Next, data processing is like a chef preparing ingredients for a model. The quality and processing method of the ingredients (data) directly affect the taste (performance) of the final dish (model). Data processing plays a crucial role in model lightweighting. Reasonable data processing can reduce the amount of data required for model training, improve training efficiency, and thus contribute to model lightweighting. For example, through effective data processing, noise and redundant information in the data can be removed, enabling the model to learn the key features in the data more quickly and reducing unnecessary computational and storage overhead.
(2) Impact on the Model Training and Optimization Process
- Impact on the Training Process The data processing method directly affects the model training process. If the data is not properly processed, the model may take more time and resources to learn the features in the data. For example, uneven data distribution may cause the model to be biased towards certain features during training, affecting the generalization ability of the model. Through data preprocessing, such as normalization or standardization, the data distribution can be made more reasonable, the convergence speed of the model can be accelerated, and the number of training rounds can be reduced. It's like providing athletes with a flat and regular track in a race, allowing them to reach the finish line more quickly.
- Impact on the Optimization Process Data processing also plays a key role in the model optimization stage. For example, during pruning, poorly processed data can lead to an incorrect evaluation of neuron importance: if the data contains outliers, some neurons may show high activity when processing those abnormal samples and be misidentified as important neurons and retained, degrading the pruning result. Reasonable data processing, such as cleaning the data to remove outliers, improves the accuracy of model optimization and yields a better lightweighting result.
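As a concrete illustration of such cleaning, here is a minimal sketch that drops samples lying more than three standard deviations from the mean of a single feature. The three-sigma threshold is a common rule of thumb and an assumption of this sketch, not something mandated by HarmonyOS Next:
// Z-score based outlier filter for a one-dimensional feature (illustrative sketch)
function removeOutliers(values: number[], zThreshold: number = 3): number[] {
  const mean = values.reduce((acc, v) => acc + v, 0) / values.length;
  const variance = values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  const stdDev = Math.sqrt(variance);
  if (stdDev === 0) {
    return values.slice(); // all values identical: nothing to remove
  }
  // Keep only samples within zThreshold standard deviations of the mean
  return values.filter((v) => Math.abs(v - mean) / stdDev <= zThreshold);
}
In practice the same filter would be applied per feature column, and valuable samples might be corrected rather than deleted.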
(3) Examples of the Indirect Impact of Different Data Processing Strategies on Model Performance
- Data Sampling Strategy Data sampling is a common data processing strategy. For example, when processing a large-scale image dataset, if a random down-sampling strategy is adopted to reduce the amount of training data, the computational cost of model training can be reduced. However, if the sampling ratio is inappropriate, some important information may be lost, resulting in a decrease in the accuracy of the model. Suppose the original image dataset has 100,000 images and is randomly down-sampled to 50,000 images. If the sampling process does not fully consider the diversity of the data, the accuracy of the model on the test set may drop from 90% to 85%. However, if stratified sampling is adopted, and reasonable sampling is carried out according to factors such as the category of the images, the diversity of the data can be maintained while reducing the amount of data, and the accuracy may only drop to 88%, while the computational cost is significantly reduced.
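A minimal sketch of such stratified down-sampling, assuming each sample carries a class label (the Sample shape and the 0.5 ratio below are illustrative assumptions): group samples by label, shuffle each group, and draw the same fraction from every group so all classes stay represented:
interface Sample {
  label: string;
  path: string;
}

// Fisher-Yates shuffle so each group is sampled uniformly at random
function shuffleInPlace<T>(arr: T[]): T[] {
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// Stratified down-sampling: keep the same fraction of every class (illustrative sketch)
function stratifiedSample(samples: Sample[], ratio: number): Sample[] {
  const byLabel = new Map<string, Sample[]>();
  for (const s of samples) {
    const group = byLabel.get(s.label) ?? [];
    group.push(s);
    byLabel.set(s.label, group);
  }
  const result: Sample[] = [];
  for (const group of byLabel.values()) {
    const keep = Math.max(1, Math.round(group.length * ratio));
    result.push(...shuffleInPlace(group.slice()).slice(0, keep));
  }
  return result;
}

// Usage: halve the dataset while preserving class diversity
// const reduced = stratifiedSample(allSamples, 0.5);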
- Data Transformation Strategy Data transformations such as flipping and rotating images also affect model performance. Taking an image classification model as an example, randomly flipping the training data can increase the diversity of the data and enable the model to learn more features of the images. However, if the flipping is excessive, it may introduce too much similar data, leading to overfitting of the model. For example, for a dataset containing animal pictures, if each picture is flipped multiple times, the model may over-focus on local features of the animals (such as left-right features being over-learned due to flipping) and ignore the overall features, so the accuracy on the test set may drop from 92% to 89%. But if the flipping is moderate, such as randomly flipping each picture 0-1 times, the accuracy may increase to 93%, and the generalization ability of the model is also enhanced.
2. Data Augmentation and Pre-processing Technologies
(1) Data Augmentation Technologies and Their Functions
- Flipping Operation Flipping is a simple and effective data augmentation technique. For image data, horizontal or vertical flipping increases the diversity of the data. For example, in a face recognition model, face pictures exhibit left-right symmetry; by horizontally flipping the pictures, the model can learn the symmetric features of the face, improving its recognition of faces at different angles. In HarmonyOS Next, a suitable image processing library (such as a HarmonyOS-adapted build of OpenCV) can be used to flip images easily. The following is a simplified, illustrative example (the module path is an assumption; substitute whichever image library your project actually uses):
import cv from '@ohos.multimedia.camera.cv';
// Load the image
let image = cv.imread('face_image.jpg');
// Horizontally flip the image
let flippedImage = cv.flip(image, 1); // 1 represents horizontal flipping
// Save the flipped image
cv.imwrite('flipped_face_image.jpg', flippedImage);
- Cropping Operation The cropping operation can make the model focus on different regions of the image and improve the robustness of the model. In a target detection model, randomly cropping the image can enable the model to learn the features of the target object in different positions and sizes. For example, in a vehicle detection model, cropping out part of the background or part of the vehicle in the image allows the model to still accurately detect the target object when it is partially occluded. In HarmonyOS Next, a similar image processing library can be used to achieve the cropping operation. Suppose we want to crop a specified-size area from the center of the image. The code example is as follows:
import cv from '@ohos.multimedia.camera.cv';
// Load the image
let image = cv.imread('car_image.jpg');
// Get the image size
let height = image.rows;
let width = image.cols;
// Define the cropping area (here, the central region, half the size of the original image);
// floor the values so they are valid integer pixel coordinates
let x = Math.floor(width / 4);
let y = Math.floor(height / 4);
let cropWidth = Math.floor(width / 2);
let cropHeight = Math.floor(height / 2);
// Crop the image (row range first, then column range)
let croppedImage = image.submat(y, y + cropHeight, x, x + cropWidth);
// Save the cropped image
cv.imwrite('cropped_car_image.jpg', croppedImage);
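The example above crops a fixed central region; for the random cropping described earlier, the crop offsets can be drawn at random. A sketch reusing the same illustrative API as above (the module path and submat signature are carried-over assumptions):
import cv from '@ohos.multimedia.camera.cv';

// Randomly crop a region covering `scale` of each image dimension (illustrative sketch)
function randomCrop(image: cv.Mat, scale: number = 0.8): cv.Mat {
  const cropHeight = Math.floor(image.rows * scale);
  const cropWidth = Math.floor(image.cols * scale);
  // Draw a random top-left corner so the crop stays fully inside the image
  const y = Math.floor(Math.random() * (image.rows - cropHeight + 1));
  const x = Math.floor(Math.random() * (image.cols - cropWidth + 1));
  return image.submat(y, y + cropHeight, x, x + cropWidth);
}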
- Rotation Operation The rotation operation can simulate the situation of the image at different angles. In an image classification model, randomly rotating the image can enable the model to learn the features of the object at different angles. For example, in a digit recognition model, rotating the digit pictures can enable the model to recognize digits at different tilt angles. The code example for implementing the rotation operation using the image processing library is as follows (taking a 30-degree rotation as an example):
import cv from '@ohos.multimedia.camera.cv';
// Load the image
let image = cv.imread('digit_image.jpg');
// Get the center coordinates of the image
let center = new cv.Point(image.cols / 2, image.rows / 2);
// Define the rotation matrix, here rotate 30 degrees
let rotationMatrix = cv.getRotationMatrix2D(center, 30, 1);
// Perform the rotation operation
let rotatedImage = cv.warpAffine(image, rotationMatrix, new cv.Size(image.cols, image.rows));
// Save the rotated image
cv.imwrite('rotated_digit_image.jpg', rotatedImage);
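Building on the example above, the random-angle rotation used later in the case study (a uniform angle in [-15, 15] degrees) can be sketched as follows; the module path and API signatures are the same illustrative assumptions as in the previous examples:
import cv from '@ohos.multimedia.camera.cv';

// Rotate by a uniformly random angle in [-maxAngle, maxAngle] degrees (illustrative sketch)
function randomRotate(image: cv.Mat, maxAngle: number = 15): cv.Mat {
  const angle = (Math.random() * 2 - 1) * maxAngle;
  const center = new cv.Point(image.cols / 2, image.rows / 2);
  // Scale factor 1 keeps the original image size
  const rotationMatrix = cv.getRotationMatrix2D(center, angle, 1);
  return cv.warpAffine(image, rotationMatrix, new cv.Size(image.cols, image.rows));
}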
(2) Data Pre-processing Methods and Optimization
- Normalization Method Normalization maps data into a specific interval, commonly 0 to 1 or -1 to 1. The purpose is to make different features comparable and to accelerate model training. For example, in a house price prediction model, if the input data includes the house area (in square meters, a relatively large value) and the number of rooms (a relatively small value), then without normalization the area feature may dominate training and cause the model to ignore other features such as the number of rooms. In HarmonyOS Next, normalization can be performed during the data loading stage. Suppose we have a dataset whose feature data is stored in a two-dimensional array named features. The following is a simple example that normalizes each feature to the 0-1 interval:
// Assume features is a two-dimensional array; each row holds the features of one sample
let maxValues = features[0].slice();
let minValues = features[0].slice();
// Find the maximum and minimum values of each feature (column)
for (let i = 1; i < features.length; i++) {
  for (let j = 0; j < features[i].length; j++) {
    if (features[i][j] > maxValues[j]) {
      maxValues[j] = features[i][j];
    }
    if (features[i][j] < minValues[j]) {
      minValues[j] = features[i][j];
    }
  }
}
// Min-max normalization to the 0-1 interval;
// a zero range means a constant feature, mapped to 0 to avoid division by zero
let normalizedFeatures = features.map((sample) => {
  return sample.map((value, index) => {
    const range = maxValues[index] - minValues[index];
    return range === 0 ? 0 : (value - minValues[index]) / range;
  });
});
- Standardization Method Standardization transforms data into a distribution with a mean of 0 and a standard deviation of 1. This method is very effective for data with roughly normal distribution characteristics. For example, in a stock price prediction model, price fluctuation data usually shows approximately normal behavior, and standardization helps the model learn the distribution of the data. The following is a simple standardization example computed directly in TypeScript; no extra library is required for this calculation, though in practice a statistics library could be used if one is available:
// Assume features is a two-dimensional array; each row holds the features of one sample
let meanValues: number[] = [];
let stdDevValues: number[] = [];
// Calculate the mean and standard deviation of each feature (column)
for (let j = 0; j < features[0].length; j++) {
  let sum = 0;
  for (let i = 0; i < features.length; i++) {
    sum += features[i][j];
  }
  meanValues.push(sum / features.length);
  let varianceSum = 0;
  for (let i = 0; i < features.length; i++) {
    varianceSum += Math.pow(features[i][j] - meanValues[j], 2);
  }
  stdDevValues.push(Math.sqrt(varianceSum / features.length));
}
// Standardization: subtract the mean and divide by the standard deviation;
// a zero standard deviation means a constant feature, mapped to 0 to avoid division by zero
let standardizedFeatures = features.map((sample) => {
  return sample.map((value, index) =>
    stdDevValues[index] === 0 ? 0 : (value - meanValues[index]) / stdDevValues[index]);
});
(3) Key Points for Optimizing the Data Processing Process
- Selection of Data Augmentation Strategies Choose data augmentation strategies according to the type of model and the application scenario. For image classification models, operations such as flipping and rotation may be more effective; for target detection models, cropping may be more critical. At the same time, watch the degree of augmentation to avoid over-augmentation leading to overfitting. For example, in a simple image classification model, if every picture undergoes excessive rotation and flipping, the model may learn too much noise information, resulting in a decrease in accuracy on the test set.
- Adjustment of Pre-processing Parameters When normalizing or standardizing data, adjust the parameters according to the actual distribution of the data. In normalization, outliers can distort the choice of maximum and minimum values and thus the normalization result; handle them first, for example by deleting or correcting them, or by clipping extreme values to percentile bounds as sketched below, and only then normalize. In standardization, if the data deviates significantly from a normal distribution, it may help to transform the data to bring it closer to normal before standardizing, which improves the training effect of the model.
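A minimal sketch of such percentile clipping before min-max normalization; the 1%/99% bounds are illustrative assumptions, not fixed recommendations:
// Clip one feature column to its [low, high] percentile range before normalization
function clipToPercentiles(values: number[], low: number = 0.01, high: number = 0.99): number[] {
  const sorted = values.slice().sort((a, b) => a - b);
  const lowVal = sorted[Math.floor(low * (sorted.length - 1))];
  const highVal = sorted[Math.ceil(high * (sorted.length - 1))];
  // Values outside the percentile bounds are pulled back to the bounds
  return values.map((v) => Math.min(Math.max(v, lowVal), highVal));
}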
3. A Case Study of Collaborative Optimization of Data Processing and the Model
(1) Case Background and Objectives
We take a plant recognition application running on a HarmonyOS Next device as an example. This application needs to classify the photographed plant pictures and identify the types of plants. Since the resources of HarmonyOS Next devices are limited, our goal is to make the model lightweight and improve the running efficiency of the model on the device through the collaborative application of data processing, model structure optimization, quantization and other technologies while ensuring the recognition accuracy.
(2) Collaborative Optimization Process
- Data Processing Stage
- Data Augmentation: First, perform data augmentation operations on the plant picture dataset. Random flipping, rotation (a random angle between -15 and 15 degrees), and cropping (randomly cropping 10% of the image edges) operations are adopted to increase the diversity of the data. Through these operations, the size of the dataset has increased by about 3 times, enabling the model to learn the features of plants in different postures and angles.
- Data Pre-processing: Normalize the augmented dataset, scaling image pixel values to the 0-1 interval. This helps to accelerate the training speed of the model and enables the model to converge better during the training process.
- Model Structure Optimization Stage Structured pruning technology is adopted to optimize the model. According to the analysis of neuron activity, some convolutional layers and fully-connected layers with low activity in the model are pruned. For example, in a convolutional neural network-based plant recognition model, about 40% of the neurons in the last fully-connected layer and one convolutional layer are pruned. After pruning, the number of model parameters is reduced by about 50%, and the computational complexity is significantly reduced.
- Quantization Stage Quantize the pruned model. The uniform quantization method is adopted to convert the model parameters from 32-bit floating-point numbers to 8-bit integers. After quantization, the storage size of the model is further reduced, and the computational efficiency is improved. During the quantization process, according to the distribution range of the model parameters, the quantization range is set to -0.5 to 0.5 with an 8-bit quantization width; a minimal sketch of this mapping follows this list.
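For concreteness, here is a minimal sketch of uniform 8-bit quantization over a fixed symmetric range. The [-0.5, 0.5] range mirrors the case study above; in a real pipeline the range would be derived from the actual weight distribution, and the function names here are illustrative:
// Uniformly quantize float weights into int8 over a fixed range (illustrative sketch)
function quantizeUniform(weights: number[], minVal: number = -0.5, maxVal: number = 0.5): Int8Array {
  const scale = (maxVal - minVal) / 255; // 255 steps across the int8 range [-128, 127]
  const quantized = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    // Clip to the quantization range, then map linearly onto [-128, 127]
    const clipped = Math.min(Math.max(weights[i], minVal), maxVal);
    quantized[i] = Math.round((clipped - minVal) / scale) - 128;
  }
  return quantized;
}

// Dequantization recovers an approximation of the original weights at inference time
function dequantize(q: Int8Array, minVal: number = -0.5, maxVal: number = 0.5): number[] {
  const scale = (maxVal - minVal) / 255;
  return Array.from(q, (v) => (v + 128) * scale + minVal);
}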
(3) Analysis of Performance Improvement Effects
- Accuracy Evaluation Before collaborative optimization, the accuracy of the model on the test set was 85%. After the collaborative optimization of data processing and the model, the accuracy increased to 90%. This is mainly because data augmentation enables the model to learn more features, and the fine-tuning performed during model structure optimization and quantization reduces the risk of overfitting and improves the generalization ability of the model.
- Resource Utilization Evaluation
- Model Size: Before optimization, the model size was 30MB. After structured pruning and quantization, the model size was reduced to 5MB. This is a huge advantage for HarmonyOS Next devices with limited storage resources, making it easier to deploy the model on the device.
- Computational Load: In the inference stage, the model before optimization required about 3 million operations; after optimization this fell to about 1 million operations, roughly tripling the computational speed. This enables the plant recognition application to return recognition results more quickly in actual use and improves the user experience.
(4) Summary of Key Points and Precautions
- Collaborative Optimization Order When collaboratively applying data processing, model structure optimization, quantization and other technologies, pay attention to the optimization order. Generally, data processing, such as data augmentation and pre-processing, is carried out first to provide a better data foundation for model training; then model structure optimization, such as pruning, is carried out to reduce the number of model parameters and computational complexity; finally, quantization is carried out to further compress the model size and improve computational efficiency. If the order is inappropriate, the optimization effect may suffer. For example, if quantization is carried out before pruning, the limited representation range of the quantized parameters may cause the importance of neurons to be mis-evaluated, affecting the pruning effect.
- Adaptation of Data and Model Data processing and model optimization should be adapted to each other. The data augmentation method should be selected according to the structure of the model and the application scenario to avoid introducing features that are not relevant to the model or difficult to learn. At the same time, the parameter settings of model structure optimization and quantization should consider the characteristics of the data, such as the distribution range of the data and the importance of features. For example, in the quantization process, if the numerical range of the data is large and the quantization range is set unreasonably, significant accuracy loss may occur.
- Performance Monitoring and Adjustment During the collaborative optimization process, continuously monitor the model's performance indicators, such as accuracy, model size, and computational load. If the optimization at some stage degrades performance, analyze the cause promptly and adjust the optimization strategy. For example, if the accuracy drops too much after pruning, the pruning ratio can be reduced, or other methods (such as fine-tuning) can be adopted to restore the accuracy.

It is hoped that this case analysis provides some practical experience and reference for data-processing optimization in model lightweighting on HarmonyOS Next, so that you can better apply these technologies in actual development and build more efficient, lightweight intelligent models. If you encounter other problems during practice, you are welcome to communicate and discuss them together!