DEV Community

Cover image for Filling a 10 Million Image Grid with PHP for Internet History
Vincent Boon
Vincent Boon

Posted on

Filling a 10 Million Image Grid with PHP for Internet History

I am building 10MPage.com which captures the state of the internet in 2025. Every internet user is allowed to upload a small image of 64x64 pixels and contribute to this archive.

The flow for adding new images works like this: When someone uploads an image, it becomes a pending tile. Each pending tile must be approved, given the unpredictability of internet submissions...

After approval the tile will be placed on the grid. The grid is a database table called tiles where each row has a X and Y field.

Pending tiles are larger images that need to be placed into the tiles table. For example, there can be 20 pending tiles with a size of 1x1 and others with larger sizes. Larger pending tiles will be split up into multiple tiles. Ultimately, all tiles are 1x1.

In this post, I describe my journey of placing pending tiles onto a grid and optimizing the process for large-scale efficiency.
10MPage must be able to reach 10 million small tiles.
Spoiler alert: With my first approach and a few thousand tiles it took a few seconds to add a new tile. The larger the grid grew the slower it became. I made a quick calculation to see how long it would take to add 10 million tiles... a few years....

Let's take a look at my initial approach. It was simple, loop through the grid to find an empty spot and place the pending tile there. This includes some logic on how the grid expands as it grows larger.

Here is the method for finding an available spot based on the given width and height of the pending tile:

public function find(int $blockWidth, int $blockHeight): array
{
    $currentMaxX = Tile::query()->max('x') ?? 0;
    $currentMaxY = Tile::query()->max('y') ?? 0;
    // Calculate current grid width and height
    $currentWidth = $currentMaxX + 1;
    $currentHeight = $currentMaxY + 1;
    // Determine the new grid dimensions based on maintaining a square
    $newWidth = $currentWidth;
    $newHeight = $currentHeight;
    if ($currentWidth > $currentHeight) {
        $newHeight++;
    } elseif ($currentHeight > $currentWidth) {
        $newWidth++;
    } else {
        // If currently a square, expand the width by one
        $newWidth++;
    }
    // Look for a fitting spot within the updated dimensions
    for ($y = 0; $y < $newHeight; $y++) {
        for ($x = 0; $x < $newWidth; $x++) {
            if ($this->canPlaceBlock($x, $y, $blockWidth, $blockHeight)) {
                // Return the top-left coordinate where the block can be placed
                return ['x' => $x, 'y' => $y];
            }
        }
    }
    return [0,0];
}

protected function canPlaceBlock(int $startX, int $startY, int $blockWidth, int $blockHeight): bool
{
    for ($y = $startY; $y < $startY + $blockHeight; $y++) {
        for ($x = $startX; $x < $startX + $blockWidth; $x++) {
            if (Tile::query()->where('x', $x)->where('y', $y)->exists()) {
                return false;
            }
        }
    }
    return true;
}   
Enter fullscreen mode Exit fullscreen mode

This approach works well for small numbers of tiles, but as the grid grows larger, it becomes very slow. This is to be expected as the loop ALWAYS starts from zero.

A quick win for larges blocks was to rewrite the canPlaceBlock method to this so that it only runs a single query:

public function canPlaceBlock(int $startX, int $startY, int $blockWidth, int $blockHeight): bool
{
    $ys = range($startY, $startY + $blockHeight - 1);
    $xs = range($startX, $startX + $blockWidth - 1);

    return ! Tile::whereIn('x', $xs)->whereIn('y', $ys)->exists();
}
Enter fullscreen mode Exit fullscreen mode

Let's also try to optimize the find method by starting at the lowest x and y that currently exist in the database:

$maxX = Tile::query()->max('x') ?? 1000;
$maxY = Tile::query()->max('y') ?? 1000;
$minX = Tile::query()->min('x') ?? 0;
$minY = Tile::query()->min('y') ?? 0;

// Map the occupied tiles to a key-value array for faster lookups
$occupiedTiles = Tile::query()
    ->where('x', '>=', $minX)
    ->where('x', '<=', $maxX)
    ->where('y', '>=', $minY)
    ->where('y', '<=', $maxY)
    ->get()
    ->mapWithKeys(fn (Tile $tile) => [$tile->x.','.$tile->y => true]);

// Try finding a spot in the current grid dimensions
// Scan for a suitable spot within the current grid bounds
for ($x = $minX; $x <= $maxX; $x++) {
    for ($y = $minY; $y <= $maxY; $y++) {
        if (! isset($occupiedTiles["$x,$y"]) && $this->canPlaceBlock($x, $y, $blockWidth, $blockHeight)) {
            return ['x' => $x, 'y' => $y];
        }
    }
}

// Expand the grid in the shorter dimension to maintain a square-like shape
$currentWidth = $maxX - $minX + 1;
$currentHeight = $maxY - $minY + 1;

if ($currentWidth <= $currentHeight) {
    return [
        'x' => $maxX + 1,
        'y' => 0,
    ];
} else {
    return [
        'x' => 0,
        'y' => $maxY + 1,
    ];
}
Enter fullscreen mode Exit fullscreen mode

This approach however does not solve anything, the min values are still zero, it consumes more memory by loading the entire grid to allow faster checks for free tiles. However, a query still has to be done to check if the entire tile will fit (only for pending tiles with a size greater than one, but that isn't implemented).

With both these solutions there are these two issues:

  • Slow with large number of tiles
  • Only 1 pending tile can be placed at the same time

What if we worked with smaller blocks? Let's say 100x100 tiles. With this we can solve both issues, first of all we never have to check a larger grid than 100x100. And secondly we can run concurrent processes to place pending tiles in the different block.

For the concurrency to work we need to make sure that each block is only used once at the same time and that the pending tile does not overflow out of the block.

I've decided to call these blocks 'Placement blocks' and have created a new database table called placement_blocks. For each of these blocks we can store the min/max x/ and a boolean if the block is full.

Once all the blocks are full, new ones need to be created at the right and bottom of the grid. Which looks like this:

Placement block expansion

The placement process now has to start with finding an available placement block. First of all we have this recursive function to find an available block. If no blocks are left it will create new placement blocks:

/**
 * Find an available placement block
 * If no blocks are available a new layer on the grid will be created
 */
public function find(array $excludeBlocks = []): PlacementBlock
{
    $availableBlock = PlacementBlock::query()
        ->where('full', '=', false)
        ->orderBy('min_y')
        ->when($excludeBlocks !== [], function ($query) use ($excludeBlocks) {
            return $query->whereNotIn('id', $excludeBlocks);
        })
        ->first();

    if ($availableBlock !== null) {
        return $availableBlock;
    }

    $currentMaxX = PlacementBlock::query()->max('max_x');
    $currentMaxY = PlacementBlock::query()->max('max_y');

    if ($currentMaxY === null) {
        return PlacementBlock::query()->create([
            'min_x' => 0,
            'min_y' => 0,
            'max_x' => static::PLACEMENT_BLOCK_SIZE - 1,
            'max_y' => static::PLACEMENT_BLOCK_SIZE - 1,
        ]);
    }

    $newYBlockCount = round(round($currentMaxY + static::PLACEMENT_BLOCK_SIZE) / static::PLACEMENT_BLOCK_SIZE);
    $newXBlockCount = round(round($currentMaxX + static::PLACEMENT_BLOCK_SIZE) / static::PLACEMENT_BLOCK_SIZE);

    // Create new blocks on the right side
    for ($y = 0; $y < $newYBlockCount; $y++) {
        $targetX = $newXBlockCount - 1;

        PlacementBlock::query()->create([
            'min_x' => $targetX * static::PLACEMENT_BLOCK_SIZE,
            'min_y' => $y * static::PLACEMENT_BLOCK_SIZE,
            'max_x' => ($targetX + 1) * static::PLACEMENT_BLOCK_SIZE - 1,
            'max_y' => ($y + 1) * static::PLACEMENT_BLOCK_SIZE - 1,
        ]);
    }

    // minus one because we already created the right side
    for ($x = 0; $x < $newXBlockCount-1; $x++) {
        $targetY = $newYBlockCount - 1;

        PlacementBlock::query()->create([
            'min_x' => $x * static::PLACEMENT_BLOCK_SIZE,
            'min_y' => $targetY * static::PLACEMENT_BLOCK_SIZE,
            'max_x' => ($x + 1) * static::PLACEMENT_BLOCK_SIZE - 1,
            'max_y' => ($targetY + 1) * static::PLACEMENT_BLOCK_SIZE - 1,
        ]);
    }

    return $this->find($excludeBlocks);
}
Enter fullscreen mode Exit fullscreen mode

This is being used by the placement method. This approach uses two locks: one global lock and one specific to the placement block. First it will lock the global lock to find an available placement block. Once a placement block is found the it is locked and the global lock is released for the next process to also find a placement block.

public function place(PendingTile $pendingTile): void
{
    $availablePlacementBlockCount = PlacementBlock::query()->where('full', '=', false)->count();
    $attempts = $availablePlacementBlockCount === 0 ? 1 : $availablePlacementBlockCount;

    $attemptedBlockIds = [];

    // Global placement block lock
    $lock = Cache::lock('retrieving-block', 30);

    try {
        // Global placement block lock so that only 1 process can retrieve a block at a time
        $lock->block(10);

        for ($i = 0; $i < $attempts; $i++) {
            $block = $this->findPlacementBlock->find($attemptedBlockIds);

            $attemptedBlockIds[] = $block->id;

            $blockCacheKey = "block:{$block->id}";

            // Attempt to acquire block-specific lock
            if (cache()->add($blockCacheKey, true, 3600)) {

                // Found a free block, release the global lock
                $lock->release();

                try {
                    $spot = $this->findAvailableSpot->find($pendingTile->width, $pendingTile->height, $block);

                    if ($spot === null) {
                        cache()->forget($blockCacheKey);
                        continue;
                    }

                    if ($pendingTile->width === 1 && $pendingTile->height === 1) {
                        $this->createSingleTile($pendingTile, $spot);
                    } else {
                        $this->createMultipleTiles($pendingTile, $spot);
                    }

                    $pendingTile->delete();
                    $block->checkFull();

                    break;
                } finally {
                    cache()->forget($blockCacheKey);
                }
            }
        }

    } catch (LockTimeoutException) {
        // Lock timeout, ignore and try again later
    } finally {
        $lock->release();
    }
}
Enter fullscreen mode Exit fullscreen mode

The find available spot uses our canPlaceBlock that we wrote before with the tiles limited to the placement block to find a spot within this placement block. Once the spot is found it will add the tile and release the placement block lock.

But what happens if a pending tile does not fit in an available placement block? It will have to wait until the grid expands to be placed. Adding a pending tile larger than a single placement block is not supported.

I am running this via Laravel jobs with Laravel Horizon, there are multiple workers handling placement. With this implementation, the number of tiles that can be placed simultaneously is limited by the number of workers and available placement blocks.
As the project grows I can scale up the workers to place more tiles at the same time. I just need to keep the amount of workers equal or lower than the amount of placement blocks.

Thank you for reading this article. I hope you've learned something.
If you did, why not add your favorite programming language, crypto coin, or your pet to the 10MPage? It is free!

Top comments (0)