One of the best ways to drive traffic to your website is strong Search Engine Optimization (SEO). You can provide search engines with all the URLs for your website using a sitemap, which allows for easier indexing and more efficient crawling by search engines.
Maintaining a static sitemap is tedious and becomes more work as your website changes. The best solution is to generate one dynamically.
Let's check out a couple of ways we can accomplish this.
Create a sitemap using a script at build time
If all of your content and pages are local to your project, you can easily use a script at build time to create a sitemap.xml.
My blog uses MDX files instead of a CMS, so I do not have to worry about my content changing after build time.
My script uses globby to traverse the file system and return all of my routes. The first thing we need to do is install it as a dev dependency:
npm i -D globby
Then we can create the script:
scripts/generate-sitemap.js
const fs = require('fs');
const globby = require('globby'); // NOTE: globby v12+ is ESM-only; use v11 or earlier with require()

const generateSitemap = async () => {
  // Fetch all routes based on glob patterns.
  // Your folder structure might be different, so change the patterns below to match your needs.
  const pages = await globby([
    'pages/**/*.{ts,tsx,mdx}', // All routes inside /pages
    '_content/**/*.mdx', // All MDX files inside my /_content
    '!pages/**/[*.{ts,tsx}', // Ignore dynamic route files, e.g. /pages/blog/[slug].tsx
    '!pages/_*.{ts,tsx}', // Ignore Next.js files such as _app.tsx and _document.tsx
    '!pages/api', // Ignore API routes
    '!pages/admin.tsx' // Ignore pages not meant to be indexed
  ]);

  const urlSet = pages
    .map((page) => {
      // Strip the non-route parts of the filename (directories and extensions)
      const path = page
        .replace('pages', '')
        .replace('_content', '')
        .replace(/\.(ts|tsx|mdx)$/, '');

      // Remove the word index from the route
      const route = path === '/index' ? '' : path;

      // Build the url portion of sitemap.xml
      return `<url><loc>https://codebycorey.com${route}</loc></url>`;
    })
    .join('');

  // Wrap urlSet in the full sitemap XML document
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urlSet}</urlset>`;

  // Write the sitemap file
  fs.writeFileSync('public/sitemap.xml', sitemap);
};

module.exports = generateSitemap;
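If you want to sanity-check the script before wiring it into the build, you can invoke the exported function directly with Node (a quick one-off check, not part of the build setup):
node -e "require('./scripts/generate-sitemap')()"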
To run the script at build time, you can create a next.config.js file, which hooks into the Next.js build process. Next.js invokes the webpack function below once for the client bundle and once for the server bundle, so the isServer check ensures the sitemap is only generated once per build.
const generateSitemap = require('./scripts/generate-sitemap');

module.exports = {
  webpack: (config, { isServer }) => {
    if (isServer) {
      generateSitemap();
    }
    return config;
  }
};
Now when you build your website, you should see a freshly created public/sitemap.xml.
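For reference, the generated file should look roughly like this (the blog post URL is illustrative, and the real output is on a single line):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://codebycorey.com</loc></url>
  <url><loc>https://codebycorey.com/blog</loc></url>
  <url><loc>https://codebycorey.com/blog/my-first-post</loc></url>
</urlset>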
Lastly, I recommend adding public/sitemap.xml to your .gitignore since it is a generated file.
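In other words, add this one line to .gitignore:
public/sitemap.xml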
Create a sitemap using a route
You cannot create a sitemap at build time when you are using a content management system (CMS). It might work when you first build your project, but if you publish new content after the build, your sitemap will be outdated.
Instead, we can create an API route that fetches the data, and rewrite requests for the sitemap to that API route.
First create the API route:
pages/api/sitemap.ts
import { NextApiRequest, NextApiResponse } from 'next';

export default async (req: NextApiRequest, res: NextApiResponse) => {
  // Fetch data from a CMS.
  const resp = await fetch('MOCK_URL');
  const externalPosts = await resp.json();

  // Build routes from the CMS posts, plus the local routes we know about
  const routes = externalPosts.map((post) => `/blog/${post.slug}`);
  const localRoutes = ['/index', '/blog'];
  const pages = routes.concat(localRoutes);

  const urlSet = pages
    .map((page) => {
      // Remove the word index from the route
      const route = page === '/index' ? '' : page;

      // Build the url portion of sitemap.xml
      return `<url><loc>https://codebycorey.com${route}</loc></url>`;
    })
    .join('');

  // Wrap urlSet in the full sitemap XML document
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urlSet}</urlset>`;

  // Set the response content type to XML
  res.setHeader('Content-Type', 'text/xml');

  // Write the sitemap and end the response
  res.write(sitemap);
  res.end();
};
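This sketch assumes the CMS (behind the MOCK_URL placeholder) returns a JSON array of posts that each carry a slug field, something like the hypothetical shape below; adjust the mapping to whatever your CMS actually returns:
// Hypothetical response shape -- your CMS will differ.
type ExternalPost = {
  slug: string; // e.g. 'my-first-post' becomes /blog/my-first-post
};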
Now we can create a route rewrite to make /sitemap.xml actually call /api/sitemap.
Create next.config.js and add:
module.exports = {
  async rewrites() {
    return [
      {
        source: '/sitemap.xml',
        destination: '/api/sitemap'
      }
    ];
  }
};
Now when you navigate to http://localhost:3000/sitemap.xml, you should see your sitemap based on both your local files and your CMS content.
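Because this version builds the sitemap on every request, you may want to let your host's CDN cache the response. As an optional tweak (not part of the original route), you could set a Cache-Control header in the API route above, just before res.write(sitemap):
// Cache at the edge for a day, serving stale content while revalidating
res.setHeader('Cache-Control', 's-maxage=86400, stale-while-revalidate');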
Bonus: Robots.txt
One additional thing you can add to your website to improve SEO is a robots.txt file (AKA the robots exclusion standard). It tells search engine crawlers which routes they are not allowed to crawl.
Create public/robots.txt and add:
User-agent: *
Disallow:
Sitemap: https://your-url.com/sitemap.xml
This will tell search engines they are welcome to crawl your entire website.
If you would like to block any pages from being crawled, add them as Disallow fields:
User-agent: *
Disallow: /admin
Disallow: /secret-page
Sitemap: https://your-url.com/sitemap.xml
Top comments (3)
I am using a similar generate-sitemap script manually, and I'm thinking about how to automate it on every new blog post (dynamic route). But I need to add the new sitemap.xml to the commit in order to push it to origin master, from where it is atomically deployed to Vercel.
Is that possible with your first solution at build time?
Thank you very much for your article, I really enjoyed it 👍
I deploy my website using Vercel. Anytime my main branch changes, it triggers a build and runs my generate-sitemap script through next.config.js. Since Vercel runs the script at build time, I do not need to commit the generated file to the repository.

Alright, so that would work out for me. Thank you!