DEV Community

lechat

Building a Developer-Focused Search Engine in Rust: Lessons Learned and Challenges Overcome 🚀

As developers, we all know the struggle of wading through irrelevant search results to find that one golden line of code. So I thought: why not build a search engine tailored for us devs? Using Rust, Actix, Elasticsearch, React, and Next.js, I built one.

Here is what I made:
https://dev-search.com/

I am not a senior dev, so if I am doing something stupid, please let me know 😅

🎯 The Mission

The goal was simple: create a search engine focused on information for developers, with:

Frontend: React + Next.js (SSG for speed and SEO)

Backend: Rust and Elasticsearch for robust, scalable search functionality

🚧 Challenges Faced

Elasticsearch Search Is Slow 😢

With more than 10 million documents in the index, Elasticsearch searches were slow.

I found that the setting slowing them down was:

"track_total_hits": {big number like 10000}

The Solution

Keeping that number as large as 10000 forces Elasticsearch to count matches accurately up to that threshold instead of stopping early, which in my case felt about as slow as actually fetching 10000 documents. Changing it to

"track_total_hits": false

made the search a lot faster. But this change disables the ability to track how many records a search matched, so consider carefully whether that fits your use case.
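
To make the trade-off concrete, here is a minimal Python sketch of a search request body with hit counting turned off. The backend in this post is Rust, so treat this purely as an illustration: the `build_search_body` helper and the `content` field name are hypothetical, and only the `track_total_hits` setting comes from the fix above.

```python
import json


def build_search_body(query_text, size=10, track_total_hits=False):
    """Build an Elasticsearch search request body as a plain dict.

    track_total_hits may be False (skip counting), True (count every match),
    or an integer (count accurately only up to that many matches).
    """
    return {
        # "content" is a hypothetical field name; use the field you index.
        "query": {"match": {"content": query_text}},
        "size": size,
        "track_total_hits": track_total_hits,
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("rust lifetime error"), indent=2))
```

If you still want a rough count, passing a small integer instead of false should be a middle ground: the count stays accurate up to that number and becomes a lower bound beyond it.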

Too Many Malicious Users Scanning the Website 👽

Ah, the joys of running a public-facing site! Within days of launching, I noticed strange requests hitting my server logs. From bots pretending to be browsers to outright weird payloads like \x00\x00SMB, my site became a playground for malicious users. Here's a gem from my logs:

35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\x00\x00\x00\xAC\xFESMB..."

The Solution: Fail2Ban

Fail2Ban came to the rescue! This nifty tool monitors log files and dynamically bans IPs that show malicious behavior. Here's how I set it up:

Defined a Fail2Ban Jail for Nginx:

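[nginx-malicious]
enabled = true
# Defaults to the jail name; written out to make the link to the filter explicit
filter = nginx-malicious
port = http,https
logpath = /var/log/nginx/access.log
# Ban after 5 matches within 300 seconds, for 600 seconds
maxretry = 5
findtime = 300
bantime = 600
# iptables-multiport is the action variant that accepts a comma-separated port list
action = iptables-multiport[name=nginx-malicious, port="http,https", protocol=tcp]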
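
As a rough mental model of how maxretry, findtime, and bantime interact in the jail above, here is a small Python sketch. It is a simplification written for illustration, not Fail2Ban's actual implementation; the `BanTracker` class and its method names are hypothetical.

```python
from collections import defaultdict, deque


class BanTracker:
    """Toy model: ban an IP after max_retry matches within find_time seconds."""

    def __init__(self, max_retry=5, find_time=300, ban_time=600):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self.matches = defaultdict(deque)  # ip -> timestamps of recent matches
        self.banned_until = {}             # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0) > now

    def record_match(self, ip, now):
        """Register one matching log line; return True if the IP is now banned."""
        if self.is_banned(ip, now):
            return True
        window = self.matches[ip]
        window.append(now)
        # Forget matches that fell out of the find_time window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
        return self.is_banned(ip, now)
```

With the values from the jail, five matching requests inside five minutes earn a ten-minute ban, while a client that trips the filter only once every few minutes is never banned.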

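Filter to Detect Malicious Patterns (by default Fail2Ban looks for a filter named after the jail, typically /etc/fail2ban/filter.d/nginx-malicious.conf):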

[Definition]
failregex = ^<HOST> - - .*SMB.*
ignoreregex =
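
Before reloading Fail2Ban, it can help to sanity-check the pattern against real log lines. The Python sketch below approximates the filter by swapping `<HOST>` for a simple IPv4 pattern. That substitution is an assumption for illustration (Fail2Ban's own `<HOST>` expansion is broader and also covers IPv6 and hostnames), and the second log line is a made-up example of normal traffic.

```python
import re

# Simplified stand-in for Fail2Ban's <HOST> tag (IPv4 only; an assumption).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
FAILREGEX = re.compile(r"^<HOST> - - .*SMB.*".replace("<HOST>", HOST))


def offending_host(log_line):
    """Return the client IP if the line matches the filter, else None."""
    match = FAILREGEX.search(log_line)
    return match.group("host") if match else None


# nginx writes non-printable bytes as literal \xNN text, hence the raw strings.
probe = r'35.203.211.8 - - [30/Dec/2024:05:15:37 +0000] "\x00\x00\x00\xAC\xFESMB..."'
# Hypothetical normal request, for contrast.
normal = r'203.0.113.5 - - [30/Dec/2024:05:16:02 +0000] "GET / HTTP/1.1" 200 512'

if __name__ == "__main__":
    print(offending_host(probe))   # 35.203.211.8
    print(offending_host(normal))  # None
```

Keep in mind that the pattern matches "SMB" anywhere in the line, so a legitimate URL or user agent containing those letters would count as a hit too. Fail2Ban also ships a `fail2ban-regex` command that runs a filter file against a log file, which is the more faithful check once the filter is in place.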

Dynamic Blocking in Action:

When Fail2Ban detects malicious requests, it updates the firewall to block the offending IP. You can check the active rules with:

sudo iptables -L -n | grep DROP

With Fail2Ban, malicious IPs were swiftly banned, and my server logs became much cleaner. Lesson learned: Bots will come, but so will the ban hammer. 🛠️

Please note that if you are using Docker or Docker Compose, you might also need the workaround described here:
https://github.com/fail2ban/fail2ban/issues/2376#issuecomment-2565534465

AdSense Not Showing 😿

As you can see in the screenshot:

[Screenshot]

Even though AdSense is set up, the ads often don't show up...
I investigated why, and my guess is that there are 2 reasons:

  1. My website's reputation is low
  2. Google cannot find an ad for the specified ad size

Well, I cannot change the first reason, but maybe I can do something about the second one. Here is what I did.

The Solution

At first, I tried a fixed-size ad because I did not want the ad to be too large:

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'inline-block', width: '300px', height: '90px' }} // fixed size
        data-ad-client="ca-pub-{ad-client-id}"
        data-ad-slot="{slot id}"
    />
</GoogleAdUnit>

But this often failed to show an ad.

  • Please note that I am using the nextjs13_google_adsense package because the site is built with Next.js.

So next I tried a responsive ad. The default code for a responsive ad is:

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'block', width: '100%' }}
        data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
        data-ad-slot='{slot id}' // Replace with your Ad slot ID
        data-ad-format="auto"
        data-full-width-responsive="true"
    />
</GoogleAdUnit>

This is the best option because the slot resizes to match the ad that is served. But, to me, the auto-sized ad looked too big 😅

So I limited the height like this. Please note that I set data-ad-format to "horizontal" because I wanted a horizontal ad that is not too big.

<GoogleAdUnit>
    <ins
        className="adsbygoogle"
        style={{ display: 'block', width: '100%', height: '50px' }} // limit height
        data-ad-client="ca-pub-{ad-client-id}" // Replace with your AdSense client ID
        data-ad-slot='{slot id}' // Replace with your Ad slot ID
        data-ad-format="horizontal" // horizontal
        data-full-width-responsive="true"
    />
</GoogleAdUnit>

It still sometimes fails to show an ad, but ads now appear more often on my website because there is no limit on the width 😀

Unsolved Problems

  • The website design is too simple
  • The search accuracy is low
  • The results are almost always from Stack Overflow because a large share of the records in the database come from Stack Overflow. Not sure whether this is OK.

🙏 Thanks for Reading!
