I am building 10MPage, a digital archive of the state of the internet in 2025 using images. Anyone on the internet can upload a small image that holds personal meaning to them. As the name of the project suggests, the project must support 10 million small images. Each image is 64x64 pixels.
This post is about handling those images.
Users can upload an image through a Livewire component which then gets placed in a pending_tiles table where the image needs to be stored temporarily for approval.
After figuring out how to render 10 million small images (see my other post for that), I wrote a quick command to generate a lot of images with random colors. This worked fine and I got a nice colorful grid.
Simple setup
I am hosting this project on a cheap VPS. My first question was: Can a single Linux directory handle 10 million files?
Apparently ext4 can handle this, and there is a large_dir feature that raises the per-directory limit even further!
So that's it then? All the small images will be placed on my cheap VPS and served through nginx. Well no, there are a few issues with that setup. For the initial phase of the project, with a few thousand images at most, it is fine, but to prevent future headaches I wanted a more scalable solution.
Investigating Laravel storage options
When looking at the Laravel documentation on file storage you'll see that it is powered by Flysystem. Flysystem is a wonderful package to handle files in PHP applications. It is very flexible with many drivers.
After examining the different options available I found S3, a cloud-based file storage solution originally developed by Amazon. S3, three S's, stands for Simple Storage Service. With S3 it is possible to generate URLs that point directly to the server hosting the file. This means that the requests do not have to go through our small server, which saves resources!
With S3 you are not tied to Amazon; I am using another S3 provider. Side quest: You can inspect the image URLs on 10MPage to find out which one ;)
After exploring the various storage options that Flysystem offers, I decided on S3 as the best choice for several reasons. Its ability to serve files directly from a scalable cloud storage system reduces the load on my VPS, allowing for better performance and future growth. It also integrates seamlessly with Laravel, making the transition straightforward.
The next step was configuring S3 in my Laravel application. Here's how I set it up and made some key adjustments to handle the transition smoothly.
Configuring S3 in Laravel
First of all I required the league/flysystem-aws-s3-v3 package and replaced the current local driver with s3 in the disk configuration.
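The disk definition itself is not shown here, so the following is only a sketch of what an S3-backed disk could look like in config/filesystems.php. It follows Laravel's default s3 disk layout; the disk name tiles comes from the code further down, while the environment variable names are Laravel's defaults and are an assumption about this project. The endpoint entry is the one that matters when using a provider other than Amazon.

```php
<?php

// config/filesystems.php (excerpt, hypothetical values)
return [
    'disks' => [
        'tiles' => [
            'driver' => 's3',
            'key' => env('AWS_ACCESS_KEY_ID'),
            'secret' => env('AWS_SECRET_ACCESS_KEY'),
            'region' => env('AWS_DEFAULT_REGION'),
            'bucket' => env('AWS_BUCKET'),
            // Base URL used when generating public links to files.
            'url' => env('AWS_URL'),
            // Custom endpoint for S3-compatible providers other than Amazon.
            'endpoint' => env('AWS_ENDPOINT'),
            'use_path_style_endpoint' => env('AWS_USE_PATH_STYLE_ENDPOINT', false),
            // With false, failed writes return false instead of throwing.
            'throw' => false,
        ],
    ],
];
```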
I had to make some changes to the tile publishing process, as it is no longer possible to move a file from location A to location B directly. Instead I must now download the file into memory from the pending tiles disk and then upload it to the tiles disk.
// Read the pending image into memory and copy it to the tiles disk.
$file = Storage::disk($pendingTile->disk)->get($pendingTile->path);
// Only remove the original once the copy has succeeded.
if ($file !== null && Storage::disk('tiles')->put($pendingTile->path, $file)) {
    Storage::disk($pendingTile->disk)->delete($pendingTile->path);
}
Before, I was base64 encoding the images to display them on the frontend, which was fine because they are so small. But with S3 it is possible to get a URL directly to the image. Here are the old base64 method and the new way of retrieving the image:
// Old
public function base64Image(): string
{
    $image = Storage::disk($this->disk)->get($this->path);
    $mimeType = Storage::disk($this->disk)->mimeType($this->path);
    $base64 = base64_encode($image);

    return "data:$mimeType;base64,$base64";
}

// New
public function imageUrl(): string
{
    return Storage::disk($this->disk)->url($this->path);
}
And in my view, the image source now comes from imageUrl() instead of base64Image().
One Exception from S3
In the form where users can upload a tile, the images are still uploaded to the local disk. The reason is simple: I don't want to track files in S3 that users upload but never finish submitting through the form. I want a temporary disk where I can delete all files that are older than X hours. I am doing that with this command:
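use Illuminate\Console\Command;
use Illuminate\Support\Facades\Process;
use Illuminate\Support\Facades\Storage;

class CleanupTmpFilesCommand extends Command
{
    protected $signature = 'app:cleanup-tmp-files';
    protected $description = 'Delete old files';

    public function handle(): int
    {
        // Absolute path to the root of the local tmp disk.
        $path = Storage::disk('tmp')->path('');
        $command = [
            'find',
            $path,
            '-type',
            'f',
            '-mmin',
            '+360', // 6 hours
            '-delete',
        ];
        // Run find and throw if it exits with a non-zero status.
        Process::command($command)->run()->throw();

        return static::SUCCESS;
    }
}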
This means that users have six hours to finish the upload form, which is plenty of time.
But what about scalability? If I scale to multiple web servers, will this setup still work? The answer, as always: it depends.
For example, if we have three web servers behind a round-robin load balancer, this approach will not work, because we cannot guarantee that the same user's requests always reach the same web server. The uploaded file would sit on one server's local disk while the form submission might land on another.
But the solution is simple: we enable sticky sessions on our load balancer. That way we can be sure that the same user always connects to the same web server.
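For environments where shelling out to find is not an option, the same cleanup can be sketched in plain PHP. This is not the command used on 10MPage, just a minimal illustration of what the find invocation does; the function name deleteFilesOlderThan and its parameters are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Delete regular files under $directory that were last modified more than
 * $maxAgeMinutes minutes ago. Returns the number of files removed.
 * Hypothetical helper, roughly comparable to:
 *   find $directory -type f -mmin +N -delete
 */
function deleteFilesOlderThan(string $directory, int $maxAgeMinutes): int
{
    $cutoff = time() - ($maxAgeMinutes * 60);
    $deleted = 0;

    // Walk the directory tree, skipping the "." and ".." entries.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        // Only regular files are removed; directories and symlinks are kept.
        if (!$file->isFile() || $file->isLink()) {
            continue;
        }

        if ($file->getMTime() < $cutoff && unlink($file->getPathname())) {
            $deleted++;
        }
    }

    return $deleted;
}
```

Calling deleteFilesOlderThan(Storage::disk('tmp')->path(''), 360) would apply the same six hour limit. Either version only helps if it runs periodically, for example from Laravel's scheduler.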
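To illustrate the idea, here is a small sketch in plain PHP of hash-based routing, one common way for a load balancer to keep a client on the same server (nginx's ip_hash directive works along similar lines). Real load balancers do this for you; the function name pickServer and the server names are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Pick a backend for a client by hashing a stable client key (here the IP).
 * Hypothetical helper: the same key always maps to the same server for as
 * long as the server list stays unchanged.
 */
function pickServer(string $clientKey, array $servers): string
{
    $servers = array_values($servers);

    if ($servers === []) {
        throw new InvalidArgumentException('At least one server is required.');
    }

    // crc32() returns a non-negative integer on 64-bit PHP builds.
    $index = crc32($clientKey) % count($servers);

    return $servers[$index];
}

$servers = ['web1', 'web2', 'web3'];

// Two requests from the same client are routed to the same server.
$first = pickServer('203.0.113.7', $servers);
$second = pickServer('203.0.113.7', $servers);

var_dump($first === $second); // bool(true)
```

Note that this mapping changes when a server is added or removed, so uploads in progress at that moment could still end up on a different machine.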
Conclusion
Building a scalable and efficient system to handle 10 million small images requires careful consideration of storage, processing, and scalability challenges. Starting with a simple setup on a VPS and gradually evolving to use S3 storage has enabled the project to grow while maintaining flexibility and performance. The decision to use S3 not only offloads resource demands from the server but also simplifies serving images to the frontend.
By addressing temporary file handling and scalability concerns like sticky sessions, the system is prepared for growth and robust enough to handle future challenges. This balance between simplicity and scalability ensures that 10MPage can fulfill its goal of preserving a digital snapshot of the internet in 2025 while remaining accessible and maintainable.
Thank you for reading this article. I hope you've learned something.
If you did, why not add your favorite programming language, crypto coin, or your pet to the 10MPage? It is free!