Configuring a Dynamic Sitemap on Wagtail

Creating a dynamic sitemap

A sitemap lists a website’s most important pages, making sure search engines can find and crawl them. It's important to keep your sitemap up to date for optimal SEO. With a quick bit of coding, you can set your sitemap to be created dynamically on demand, ensuring it always reflects the latest content.

Creating a dynamic sitemap for your site is straightforward in Wagtail. A fresh copy is rendered each time it is requested, so it reflects the current content. After a brief setup, and without additional coding, it lists all the live Wagtail pages in the default language of your site.

In your base.py settings file, add the Django and Wagtail sitemap apps to INSTALLED_APPS:

'wagtail.contrib.sitemaps',
'django.contrib.sitemaps',

In the urls.py file in the root folder, import the sitemap view:

from django.urls import path
from wagtail.contrib.sitemaps.views import sitemap

and in the urlpatterns, above the Wagtail catch-all:

path('sitemap.xml', sitemap),

Now, browsing to example.com/sitemap.xml shows something similar to:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2021-05-17</lastmod>
</url>
<url>
<loc>https://example.com/contact/</loc>
<lastmod>2021-05-17</lastmod>
</url>
<url>
<loc>https://example.com/blog/</loc>
<lastmod>2021-06-02</lastmod>
</url>
<url>
<loc>https://example.com/services/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/about/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/privacy/</loc>
<lastmod>2021-05-19</lastmod>
</url>
</urlset>

You can see the auto-generated sitemap.xml for this site here.
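As a quick sanity check, the generated XML can be parsed with Python's standard library. This is a minimal sketch, not something Wagtail provides: the function name parse_sitemap and the inline sample are assumptions for illustration, and in practice you would pass in the response body fetched from your own /sitemap.xml.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by the sitemap protocol (matches the xmlns in the output above).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples from sitemap XML.

    lastmod is a datetime.date, or None when the element is absent.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # Only the first 10 characters (YYYY-MM-DD) are used, so a full
        # timestamp is accepted as well.
        parsed = date.fromisoformat(lastmod[:10]) if lastmod else None
        entries.append((loc, parsed))
    return entries


# Hypothetical sample, shortened from the output above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.com/</loc><lastmod>2021-05-17</lastmod></url>
<url><loc>https://example.com/blog/</loc><lastmod>2021-06-02</lastmod></url>
</urlset>"""

for loc, lastmod in parse_sitemap(SAMPLE):
    print(loc, lastmod)
```

If a page you expect is missing from the list, check that it is live and that its class does not override get_sitemap_urls.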

Adding Support for Routable Pages

If you're using routable pages on your site, you might want to add these as well.

Go to each class with routable pages and override the default get_sitemap_urls method called for each page. Add the following method to the class:

def get_sitemap_urls(self, request):
    sitemap = super().get_sitemap_urls(request)
    sitemap.append(
        {
            "location": self.full_url + self.reverse_subpage('routable_page_name'),
            "lastmod": self.last_published_at or self.latest_revision_created_at,
        }
    )
    return sitemap

Note: using the page's own publish date as the lastmod value may not be appropriate for a routable view. For a listing, for example, the date of the most recently published item in the list may be a better choice.
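One way to act on that note is to take the newest date among the listed items and fall back to the page's own date when the list is empty. This is a minimal sketch in plain Python: newest_lastmod is a hypothetical helper, and how you collect the item dates (for example, from the listed pages' publish dates) depends on your own models.

```python
from datetime import datetime


def newest_lastmod(page_date, item_dates):
    """Pick the most recent date among a page and the items it lists.

    page_date: the page's own date (may be None).
    item_dates: iterable of item dates; None values are ignored.
    """
    candidates = [d for d in item_dates if d is not None]
    if page_date is not None:
        candidates.append(page_date)
    return max(candidates) if candidates else None


# Made-up dates for illustration: the newest item is later than the page date.
page_date = datetime(2021, 5, 19)
item_dates = [datetime(2021, 6, 2), None, datetime(2021, 4, 1)]
print(newest_lastmod(page_date, item_dates))
```

Inside get_sitemap_urls, the result could stand in for the "lastmod" expression used above.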

You could also add a priority item to the dictionary to tell search engines how important the page is relative to the rest of your site:

{ 
    "location": self.full_url + self.reverse_subpage('routable_page_name'),
    "lastmod": self.last_published_at or self.latest_revision_created_at,
    "priority": 0.9,
}
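If a page class exposes several routes, the same dictionary can be built in a loop. This is a sketch only: build_route_entries is a hypothetical helper, not a Wagtail function, and it assumes (as the snippet above does) that the page URL ends with a slash and each subpage path has no leading slash.

```python
def build_route_entries(full_url, subpage_paths, lastmod, priority=None):
    """Build one sitemap entry dict per routable subpage path."""
    entries = []
    for path in subpage_paths:
        entry = {"location": full_url + path, "lastmod": lastmod}
        # Only include "priority" when a value was given.
        if priority is not None:
            entry["priority"] = priority
        entries.append(entry)
    return entries


# Made-up values for illustration; inside get_sitemap_urls the paths would
# come from self.reverse_subpage(...) for each route you want listed.
entries = build_route_entries(
    "https://example.com/blog/", ["archive/", "tags/"], "2021-06-02", priority=0.9
)
for entry in entries:
    print(entry)
```

The returned list could then be added to the sitemap with sitemap.extend(...) before returning it.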

If, for some reason, you have a page class that you don’t want to show in the sitemap (any pages that you don’t want indexed, or an empty redirect page), override get_sitemap_urls and return an empty list:

def get_sitemap_urls(self, request):
    return []
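If several page classes need to be left out, that override could be shared through a small mixin. This is a sketch under assumptions: ExcludeFromSitemapMixin is a hypothetical name, and StubPage is only a stand-in so the example runs without Wagtail installed. In a real project, the mixin would be listed before Wagtail's Page in the class bases.

```python
class ExcludeFromSitemapMixin:
    """Hypothetical mixin: pages using it contribute no sitemap entries."""

    def get_sitemap_urls(self, request=None):
        return []


class StubPage:
    """Stand-in for a page base class (NOT Wagtail's Page), for demonstration."""

    def __init__(self, url):
        self.url = url

    def get_sitemap_urls(self, request=None):
        return [{"location": self.url}]


class RedirectPage(ExcludeFromSitemapMixin, StubPage):
    """Listing the mixin first makes its method take precedence."""


print(StubPage("https://example.com/about/").get_sitemap_urls())
print(RedirectPage("https://example.com/old/").get_sitemap_urls())
```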

Make sure to add the noindex meta instruction to the corresponding template:

<meta name="robots" content="noindex, follow" />

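Do not add the page as a Disallow rule in your robots.txt: a crawler that is blocked from fetching the page never gets to see the noindex instruction (more on that in the next post).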

Configuring Multi-lingual Sites

Using wagtail-localize, I found that only the default language pages get added to the sitemap. I've added it as a bug/feature request, so I hope to see it in a future release.

In the meantime, I used the following workaround on my home page model. It tells sitemaps to find the published siblings of the home page (which should be the home pages of your secondary languages) and add those to the sitemap along with all of their published descendants.

def get_sitemap_urls(self, request):
    # Entries for this (default language) home page.
    sitemap = super().get_sitemap_urls(request)

    # The published siblings of this page are the home pages of the other locales.
    for locale_home in self.get_siblings(inclusive=False).live():
        sitemap.extend(locale_home.get_sitemap_urls(request))
        # Add every published page below that locale's home page.
        for child_page in locale_home.get_descendants().live():
            sitemap.extend(child_page.get_sitemap_urls(request))
    return sitemap

Now, browsing back to the sitemap for the multi-lingual version of example.com, you would see:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/en/</loc>
<lastmod>2021-05-17</lastmod>
</url>
<url>
<loc>https://example.com/es/</loc>
<lastmod>2021-08-18</lastmod>
</url>
<url>
<loc>https://example.com/es/contacto/</loc>
<lastmod>2021-05-17</lastmod>
</url>
<url>
<loc>https://example.com/es/privacidad/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/es/servicios/</loc>
<lastmod>2021-08-18</lastmod>
</url>
<url>
<loc>https://example.com/es/blog/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/es/sobre/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/fr/</loc>
<lastmod>2021-05-18</lastmod>
</url>
<url>
<loc>https://example.com/fr/contact/</loc>
<lastmod>2021-08-05</lastmod>
</url>
<url>
<loc>https://example.com/fr/confidentialite/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/fr/services/</loc>
<lastmod>2021-08-05</lastmod>
</url>
<url>
<loc>https://example.com/fr/blog/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/fr/a-propos/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/en/contact/</loc>
<lastmod>2021-05-17</lastmod>
</url>
<url>
<loc>https://example.com/en/blog/</loc>
<lastmod>2021-06-02</lastmod>
</url>
<url>
<loc>https://example.com/en/language-coaching/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/en/translation-services/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/en/about/</loc>
<lastmod>2021-05-19</lastmod>
</url>
<url>
<loc>https://example.com/en/privacy/</loc>
<lastmod>2021-05-19</lastmod>
</url>
</urlset>
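To confirm that every locale made it into the sitemap, the entries can be counted per language prefix. This is a rough sketch using only the standard library: count_by_locale is a hypothetical helper, and it assumes the locale code is the first path segment of each URL, as in the output above.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_by_locale(xml_text):
    """Count sitemap entries per first URL path segment (e.g. 'en', 'es')."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for loc in root.findall("sm:url/sm:loc", NS):
        path = urlparse((loc.text or "").strip()).path
        segments = [s for s in path.split("/") if s]
        # URLs without any path segment are grouped under an empty key.
        counts[segments[0] if segments else ""] += 1
    return counts


# Hypothetical sample, shortened from the output above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.com/en/</loc></url>
<url><loc>https://example.com/es/</loc></url>
<url><loc>https://example.com/es/contacto/</loc></url>
<url><loc>https://example.com/fr/</loc></url>
</urlset>"""

print(count_by_locale(SAMPLE))
```

A locale that is missing from the counts, or has far fewer entries than the others, is a hint that its home page is not a published sibling of the default home page.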
 

You're all set to go. If you have any comments, questions or suggestions on the above, feel free to leave those in the comment section below.

 