When working with large datasets, rendering all points in a line chart can cause significant performance issues. For example, plotting 50,000 data points directly can overwhelm the browser and make the chart unresponsive. Tools like amCharts and ApexCharts struggle with such datasets, while ECharts performs better but still isn't optimized for extremely large datasets.
To address this, I combined ECharts with the Largest Triangle Three Buckets (LTTB) algorithm. LTTB is a downsampling algorithm that reduces data points while preserving the visual structure of the dataset.
In this post, I'll explain how the LTTB algorithm works step by step with an example of reducing 10 points to 4. For each step, I'll include the relevant code, explanations, and resulting data transformations.
LTTB Algorithm Code
Here's the full code for the LTTB algorithm:
interface Point {
  x: number;
  y: number;
}

export function lttb(data: Point[], threshold: number): Point[] {
  const dataLength = data.length;
  if (threshold >= dataLength || threshold === 0) {
    return data; // No downsampling needed
  }

  const sampled: Point[] = [];
  let sampledIndex = 0;
  const bucketSize = (dataLength - 2) / (threshold - 2);
  let a = 0; // Index of the most recently selected point
  let nextA = 0;

  sampled[sampledIndex++] = data[a]; // Always keep the first point

  for (let i = 0; i < threshold - 2; i++) {
    // Average point of the NEXT bucket (third vertex of each candidate triangle)
    let avgX = 0;
    let avgY = 0;
    let avgRangeStart = Math.floor((i + 1) * bucketSize) + 1;
    let avgRangeEnd = Math.floor((i + 2) * bucketSize) + 1;
    avgRangeEnd = avgRangeEnd < dataLength ? avgRangeEnd : dataLength;
    const avgRangeLength = avgRangeEnd - avgRangeStart;

    for (; avgRangeStart < avgRangeEnd; avgRangeStart++) {
      avgX += data[avgRangeStart].x;
      avgY += data[avgRangeStart].y;
    }
    avgX /= avgRangeLength;
    avgY /= avgRangeLength;

    // Scan the current bucket for the point forming the largest triangle
    let rangeOffs = Math.floor(i * bucketSize) + 1;
    const rangeTo = Math.floor((i + 1) * bucketSize) + 1;
    const pointAX = data[a].x;
    const pointAY = data[a].y;
    let maxArea = -1;

    for (; rangeOffs < rangeTo; rangeOffs++) {
      const area = Math.abs(
        (pointAX - avgX) * (data[rangeOffs].y - pointAY) -
          (pointAX - data[rangeOffs].x) * (avgY - pointAY)
      );
      if (area > maxArea) {
        maxArea = area;
        nextA = rangeOffs;
      }
    }

    sampled[sampledIndex++] = data[nextA]; // Keep the most significant point
    a = nextA;
  }

  sampled[sampledIndex++] = data[dataLength - 1]; // Always keep the last point
  return sampled;
}
Example: Reducing 10 Points to 4 Points
We'll reduce the following dataset of 10 points to 4 points using the LTTB algorithm:
Original Dataset
const data = [
  { x: 1, y: 10 },
  { x: 2, y: 20 },
  { x: 3, y: 15 },
  { x: 4, y: 25 },
  { x: 5, y: 30 },
  { x: 6, y: 20 },
  { x: 7, y: 40 },
  { x: 8, y: 35 },
  { x: 9, y: 45 },
  { x: 10, y: 50 }
];

const threshold = 4; // Reduce to 4 points
Step 1: Add the First Point
sampled[sampledIndex++] = data[a]; // Add the first point
✅ The algorithm always starts by adding the first data point, so the downsampled dataset begins exactly where the original does.
📊 Resulting Data:

sampled = [
  { x: 1, y: 10 }
];
Step 2: Calculate the Bucket Size
const bucketSize = (dataLength - 2) / (threshold - 2);
✅ The bucket size determines how many data points fall into each "bucket." In this case, bucketSize = (10 − 2) / (4 − 2) = 4, so each of the two interior buckets contains 4 data points (the first and last points are handled separately).
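The bucket arithmetic for this example can be spelled out as a standalone sketch, using the same floor-based index formulas as the function above:

```typescript
const dataLength = 10;
const threshold = 4;
const bucketSize = (dataLength - 2) / (threshold - 2); // (10 - 2) / (4 - 2) = 4

// Candidate index ranges for the two interior buckets (0-based, end exclusive)
const buckets = Array.from({ length: threshold - 2 }, (_, i) => ({
  from: Math.floor(i * bucketSize) + 1,
  to: Math.floor((i + 1) * bucketSize) + 1,
}));

console.log(bucketSize); // 4
console.log(buckets); // indices 1..4 (points 2-5) and 5..8 (points 6-9)
```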
Step 3: First Bucket (Points 2–5)
let avgX = 0;
let avgY = 0;
let avgRangeStart = Math.floor((i + 1) * bucketSize) + 1;
let avgRangeEnd = Math.floor((i + 2) * bucketSize) + 1;
avgRangeEnd = avgRangeEnd < dataLength ? avgRangeEnd : dataLength;
const avgRangeLength = avgRangeEnd - avgRangeStart;

for (; avgRangeStart < avgRangeEnd; avgRangeStart++) {
  avgX += data[avgRangeStart].x;
  avgY += data[avgRangeStart].y;
}
avgX /= avgRangeLength;
avgY /= avgRangeLength;
✅ To rank the candidate points of the first bucket (points 2–5), the algorithm first computes the average point of the next bucket (points 6–9) by summing their x and y values and dividing by the number of points. That average serves as the third vertex of every candidate triangle.
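Plugging in i = 0 gives avgRangeStart = 5 and avgRangeEnd = 9, so the averaged points are those with x = 6 to 9. A standalone check of that arithmetic:

```typescript
// The next bucket for i = 0: data indices 5..8 of the example dataset
const nextBucket = [
  { x: 6, y: 20 },
  { x: 7, y: 40 },
  { x: 8, y: 35 },
  { x: 9, y: 45 },
];

const avgX = nextBucket.reduce((sum, p) => sum + p.x, 0) / nextBucket.length;
const avgY = nextBucket.reduce((sum, p) => sum + p.y, 0) / nextBucket.length;

console.log(avgX, avgY); // 7.5 35
```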
Step 4: Find the Most Significant Point
let rangeOffs = Math.floor(i * bucketSize) + 1;
const rangeTo = Math.floor((i + 1) * bucketSize) + 1;
const pointAX = data[a].x;
const pointAY = data[a].y;
let maxArea = -1;

for (; rangeOffs < rangeTo; rangeOffs++) {
  const area = Math.abs(
    (pointAX - avgX) * (data[rangeOffs].y - pointAY) -
      (pointAX - data[rangeOffs].x) * (avgY - pointAY)
  );
  if (area > maxArea) {
    maxArea = area;
    nextA = rangeOffs;
  }
}
✅ The algorithm identifies the point in the current bucket that forms the largest triangle with the previously selected point and the average point. For this bucket, (2, 20) yields the largest area, so it is selected.
📊 Updated Data:

sampled = [
  { x: 1, y: 10 },
  { x: 2, y: 20 }
];
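The candidate areas can be recomputed outside the loop with the same cross-product formula, using A = (1, 10) and the bucket average (7.5, 35):

```typescript
const a = { x: 1, y: 10 }; // previously selected point
const avg = { x: 7.5, y: 35 }; // average of the next bucket
const candidates = [
  { x: 2, y: 20 },
  { x: 3, y: 15 },
  { x: 4, y: 25 },
  { x: 5, y: 30 },
];

// Twice the triangle area, via the cross product (the factor 1/2 is
// irrelevant for comparisons, so the algorithm omits it)
const areas = candidates.map((p) =>
  Math.abs((a.x - avg.x) * (p.y - a.y) - (a.x - p.x) * (avg.y - a.y))
);

console.log(areas); // [40, 17.5, 22.5, 30] -> (2, 20) wins
```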
Step 5: Second Bucket (Points 6–9)
The algorithm repeats the process for the second bucket (points 6–9). The remaining range contains only the last point, (10, 50), so that point acts as the average, and the newly selected (2, 20) becomes the triangle's first vertex. The most significant point in this bucket is (6, 20).
📊 Updated Data:

sampled = [
  { x: 1, y: 10 },
  { x: 2, y: 20 },
  { x: 6, y: 20 }
];
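The same area check for the second bucket, with A = (2, 20) and the last point (10, 50) standing in for the average:

```typescript
const a = { x: 2, y: 20 }; // point selected from the first bucket
const avg = { x: 10, y: 50 }; // only the last point remains after this bucket
const candidates = [
  { x: 6, y: 20 },
  { x: 7, y: 40 },
  { x: 8, y: 35 },
  { x: 9, y: 45 },
];

const areas = candidates.map((p) =>
  Math.abs((a.x - avg.x) * (p.y - a.y) - (a.x - p.x) * (avg.y - a.y))
);

console.log(areas); // [120, 10, 60, 10] -> (6, 20) wins
```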
Step 6: Add the Last Point
sampled[sampledIndex++] = data[dataLength - 1]; // Add the last point
✅ The algorithm always ends by adding the last point, so the downsampled dataset also finishes where the original does.
📊 Final Data:
sampled = [
  { x: 1, y: 10 },
  { x: 2, y: 20 },
  { x: 6, y: 20 },
  { x: 10, y: 50 }
];
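To confirm the walk-through end to end, here is a self-contained sketch that repeats the lttb function from above and runs it on the example data:

```typescript
interface Point {
  x: number;
  y: number;
}

function lttb(data: Point[], threshold: number): Point[] {
  const dataLength = data.length;
  if (threshold >= dataLength || threshold === 0) return data;

  const sampled: Point[] = [data[0]]; // first point is always kept
  const bucketSize = (dataLength - 2) / (threshold - 2);
  let a = 0;

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the next bucket (third triangle vertex)
    let avgX = 0;
    let avgY = 0;
    let avgRangeStart = Math.floor((i + 1) * bucketSize) + 1;
    const avgRangeEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, dataLength);
    const avgRangeLength = avgRangeEnd - avgRangeStart;
    for (; avgRangeStart < avgRangeEnd; avgRangeStart++) {
      avgX += data[avgRangeStart].x;
      avgY += data[avgRangeStart].y;
    }
    avgX /= avgRangeLength;
    avgY /= avgRangeLength;

    // Pick the candidate in the current bucket with the largest triangle area
    let nextA = a;
    let maxArea = -1;
    const rangeTo = Math.floor((i + 1) * bucketSize) + 1;
    for (let j = Math.floor(i * bucketSize) + 1; j < rangeTo; j++) {
      const area = Math.abs(
        (data[a].x - avgX) * (data[j].y - data[a].y) -
          (data[a].x - data[j].x) * (avgY - data[a].y)
      );
      if (area > maxArea) {
        maxArea = area;
        nextA = j;
      }
    }
    sampled.push(data[nextA]);
    a = nextA;
  }

  sampled.push(data[dataLength - 1]); // last point is always kept
  return sampled;
}

const data: Point[] = [
  { x: 1, y: 10 }, { x: 2, y: 20 }, { x: 3, y: 15 }, { x: 4, y: 25 },
  { x: 5, y: 30 }, { x: 6, y: 20 }, { x: 7, y: 40 }, { x: 8, y: 35 },
  { x: 9, y: 45 }, { x: 10, y: 50 },
];

const result = lttb(data, 4);
console.log(result); // the sampled points: (1,10), (2,20), (6,20), (10,50)
```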
Conclusion
By combining ECharts with the LTTB algorithm, I was able to downsample my dataset efficiently and improve chart performance significantly. If you work with large datasets, this method is a great way to balance performance and visual accuracy. Let me know if you've tried this approach or if you have other techniques to share!
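For completeness, here is roughly how the downsampled points can be fed into an ECharts line series. This is only an option-object sketch (ECharts itself is not imported here); also note that ECharts 5 ships a built-in `sampling: 'lttb'` setting on line series, which is worth trying before rolling your own:

```typescript
// Output of lttb(data, 4) from the example above
const sampled = [
  { x: 1, y: 10 },
  { x: 2, y: 20 },
  { x: 6, y: 20 },
  { x: 10, y: 50 },
];

// Pass this to chart.setOption(option) on an initialized ECharts instance
const option = {
  xAxis: { type: "value" },
  yAxis: { type: "value" },
  series: [
    {
      type: "line",
      data: sampled.map((p) => [p.x, p.y]), // [x, y] pairs
    },
  ],
};
```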