In this article, we will walk through the process of setting up a search page that leverages MongoDB and Elasticsearch to search across multiple referenced collections. This approach is particularly useful when you have complex data relationships and need powerful search capabilities.
## Prerequisites
- MongoDB: Installed on your server or using a managed service like MongoDB Atlas.
- Elasticsearch: Installed on your server or using a managed service like Elastic Cloud.
- Node.js: For backend implementation.
- Logstash: To sync data from MongoDB to Elasticsearch.
To effectively utilize Elasticsearch's powerful search capabilities, it's crucial to transform and synchronize data from MongoDB into a format suitable for indexing. This involves flattening nested data structures and ensuring that only relevant, indexable data is included. In this section, we'll cover the process of data exchange, transformation, and synchronization using Logstash.
## Understanding Data Exchange
### Data Extraction
- MongoDB Source Configuration: Logstash can be configured to extract data directly from MongoDB. This involves setting up a connection to your MongoDB instance and specifying the collections to be monitored.
### Data Transformation
Flattening Data: MongoDB collections often contain nested documents and references to other collections. To make this data indexable in Elasticsearch, we need to flatten these nested structures. Flattening involves merging related data from different collections into a single, cohesive document.

Example: Suppose we have the following collections:
- `users`: Contains user information.
- `posts`: Contains posts made by users, with each post referencing a user by `user_id`.

We need to transform these collections into a single document structure containing both post and user information, as sketched below.
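As a rough illustration (the exact field names, such as `post_content`, are assumptions that match the search examples later in this article), the flattened document might look like this:

```javascript
// Source documents (shapes assumed for illustration)
const user = { _id: "u1", name: "Jane Doe", email: "jane@example.com" };
const post = { _id: "p1", user_id: "u1", post_content: "Hello, search!" };

// Flattened document to index in Elasticsearch: post fields plus an
// embedded `user` object, so no cross-collection lookup is needed at query time.
const postWithUser = {
  post_content: post.post_content,
  user: { name: user.name, email: user.email }
};
```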
## Example Configuration with Logstash
### Logstash Input Configuration
Logstash can be configured to read from MongoDB using the `mongodb` input plugin. Here's an example configuration:
```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}
```
- `uri`: Connection string for MongoDB.
- `placeholder_db_dir`: Directory to store state information.
- `placeholder_db_name`: Name of the SQLite database file used to store state information.
- `collection`: Name of the MongoDB collection to monitor.
- `batch_size`: Number of documents to process in each batch.
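Note that `mongodb` is not one of the inputs bundled with Logstash; it is provided by the community `logstash-input-mongodb` plugin, which you typically install with Logstash's plugin manager (`bin/logstash-plugin install logstash-input-mongodb`) before this configuration will load.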
### Logstash Filter Configuration
To flatten the data and enrich posts with user information, we use the `aggregate` filter plugin:
```plaintext
filter {
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}
```
- `task_id`: A unique identifier for aggregating related data, in this case `user_id`.
- `code`: The Ruby script used to enrich and flatten the data.
- `push_previous_map_as_event`: Ensures the aggregated data is pushed as a single event.
- `timeout`: Time in seconds to wait before pushing the event.
### Logstash Output Configuration
Finally, configure the output to send the transformed data to Elasticsearch:
```plaintext
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```
- `hosts`: Elasticsearch server address.
- `index`: Name of the Elasticsearch index to store the data.
## Keeping Data Updated
### Real-time Synchronization
- Change Data Capture: Utilize MongoDB Change Streams to capture real-time changes in the MongoDB collections. This ensures that any updates, insertions, or deletions in MongoDB are reflected in Elasticsearch.
- Logstash Configuration: When configured with the appropriate input plugins, Logstash can continuously monitor MongoDB collections for changes and apply them to Elasticsearch.
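If you prefer a custom script over Logstash, a minimal Change Streams sketch using the official Node.js clients might look like the following. The database, collection, and field names are assumptions carried over from the examples above, and this sketch only re-indexes post fields; enriching with user data would need an extra lookup.

```javascript
const { MongoClient } = require('mongodb');
const { Client } = require('@elastic/elasticsearch');

const mongo = new MongoClient('mongodb://localhost:27017');
const es = new Client({ node: 'http://localhost:9200' });

async function syncPosts() {
  // Change Streams require MongoDB to run as a replica set.
  await mongo.connect();
  const posts = mongo.db('mydatabase').collection('posts');

  // Watch the posts collection; updateLookup returns the full document on updates.
  const changeStream = posts.watch([], { fullDocument: 'updateLookup' });

  changeStream.on('change', async (change) => {
    if (change.operationType === 'insert' || change.operationType === 'update') {
      const doc = change.fullDocument;
      // Re-index the changed post, keyed by its MongoDB _id.
      await es.index({
        index: 'posts_with_users',
        id: String(doc._id),
        body: { post_content: doc.post_content, user_id: doc.user_id }
      });
    } else if (change.operationType === 'delete') {
      // Remove deleted posts from the index.
      await es.delete({ index: 'posts_with_users', id: String(change.documentKey._id) });
    }
  });
}

syncPosts().catch(console.error);
```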
### Indexing Only Relevant Data
- Selective Indexing: Focus on indexing only the fields that are relevant for search queries. This reduces the index size and improves search performance.
- Example: If you are only interested in searching posts by content and user details, configure Logstash to only include `post_content`, `user.name`, and `user.email` in the events sent to Elasticsearch.
## Example Logstash Pipeline
Here is a complete example of a Logstash pipeline that extracts, transforms, and loads data from MongoDB to Elasticsearch:
```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}

filter {
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }

  # Select only relevant fields
  mutate {
    remove_field => ["_id", "user_id"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```
## Step 1: Set Up MongoDB and Elasticsearch
### MongoDB Installation
Follow the [MongoDB installation guide](https://docs.mongodb.com/manual/installation/) for your operating system. Alternatively, you can use a managed service like MongoDB Atlas.
### Elasticsearch Installation
Follow the [Elasticsearch installation guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) for your operating system. Alternatively, you can use a managed service like Elastic Cloud.
## Step 2: Data Modeling and Indexing
### Identify Collections and Relationships
Assume we have two collections:
- `users`: Contains user information.
- `posts`: Contains posts made by users, with each post referencing a user.
### Flatten Data for Elasticsearch
Elasticsearch works best with denormalized (flattened) data. This means we need to create a single document structure that includes fields from both `users` and `posts`.
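Elasticsearch can create this index dynamically when the first document arrives, but as an optional illustration you could also define the flattened structure up front with an explicit mapping. This sketch uses the Node.js client introduced later in the article, and the field names mirror the examples in this tutorial:

```javascript
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// Create the denormalized index with an explicit mapping so the
// flattened post + user fields are typed consistently.
async function createIndex() {
  await client.indices.create({
    index: 'posts_with_users',
    body: {
      mappings: {
        properties: {
          post_content: { type: 'text' },
          user: {
            properties: {
              name: { type: 'text' },
              email: { type: 'keyword' }
            }
          }
        }
      }
    }
  });
}

createIndex().catch(console.error);
```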
## Step 3: Sync Data from MongoDB to Elasticsearch
### Use a Data Sync Tool
To keep your Elasticsearch index updated with data from MongoDB, you can use tools like Logstash, Mongo-Connector, or custom scripts using MongoDB Change Streams.
### Example with Logstash
#### Install Logstash
Follow the [Logstash installation guide](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html) for your operating system.
#### Create a Logstash Configuration File
Here’s an example configuration that denormalizes data from `users` and `posts` collections into a single index:
```plaintext
input {
  mongodb {
    uri => 'mongodb://localhost:27017/mydatabase'
    placeholder_db_dir => '/opt/logstash-mongodb/'
    placeholder_db_name => 'logstash_sqlite.db'
    collection => 'posts'
    batch_size => 5000
  }
}

filter {
  # Enrich posts with user data
  aggregate {
    task_id => "%{user_id}"
    code => "
      map['user'] ||= {}
      event.to_hash.each { |k, v| map['user'][k] = v }
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts_with_users"
  }
}
```
#### Run Logstash
Start Logstash with your configuration file.
```sh
bin/logstash -f logstash.conf
```
## Step 4: Index Data in Elasticsearch
Ensure that your data is indexed correctly in Elasticsearch. You can verify this by querying the Elasticsearch index:
```sh
curl -X GET "localhost:9200/posts_with_users/_search?pretty"
```
## Step 5: Create the Search Page
### Backend Setup
#### Choose a Backend Framework
We will use Node.js for this example.
#### Install Dependencies
Install Express and the Elasticsearch client library for Node.js:
```sh
npm install express @elastic/elasticsearch
```
#### Example Code
Create a file named `app.js` and add the following code:
```javascript
const { Client } = require('@elastic/elasticsearch');
const express = require('express');

const app = express();
const client = new Client({ node: 'http://localhost:9200' });

// Run a full-text query against the denormalized index,
// matching on post content and the embedded user fields.
async function search(query) {
  const { body } = await client.search({
    index: 'posts_with_users',
    body: {
      query: {
        multi_match: {
          query: query,
          fields: ['post_content', 'user.name', 'user.email']
        }
      }
    }
  });
  return body.hits.hits;
}

// GET /search?q=<term> returns the raw Elasticsearch hits as JSON.
app.get('/search', async (req, res) => {
  try {
    const query = req.query.q;
    const results = await search(query);
    res.json(results);
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Search failed' });
  }
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
```
#### Run the Server
Start your Node.js server:
```sh
node app.js
```
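With the server running, a request such as `http://localhost:3000/search?q=example` should return the matching posts, each enriched with the associated user details.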