Jekyll Static Site Search With lunr.js

One thing static site generators are not good at is making content discoverable once a user is actually on the site. You’re largely forced to rely on external search engines to make your content findable to readers. Thankfully, with a little JavaScript magic, it is possible to overcome this limitation without running any additional services, which makes it very attractive for situations like GitHub Pages or low-powered hosting. Today, we will implement this with lunr.js and some creative use of Jekyll templates, but the same approach could be applied to any static site generator.

We will start in the root of our Jekyll site by creating a new layout template for our search page in ./_layouts/search.html.

---
layout: default
---
<h1>{{ page.title }}</h1>
<div class="search">
  <form action="search" method="get">
    <input type="text" id="search-box" name="query">
    <input type="submit" value="Search">
  </form>
  <ul id="search-results" class="search-results"></ul>
</div>
<script src="/assets/js/search-content.js"></script>
<script src="/assets/js/lunr.js"></script>
<script src="/assets/js/search.js"></script>

This page is very basic and includes a simple GET request form as well as the three pieces of JavaScript we will need to make this system work. Because search.js depends on both lunr.js and the generated search data, the scripts have to execute in order. That rules out async, which runs each script as soon as it finishes downloading. The defer attribute would preserve document order, but plain script tags at the end of the page are the simplest option. I decided to use a GET form instead of a POST because a static site has no server-side component to receive a POST, whereas the query string of a GET request can be read directly by JavaScript on the page. Since my entire site is served over HTTPS, the query parameters are encrypted in transit, although they remain visible in the address bar and browser history.

Next, we create a simple stub page, ./search.html, which mainly exists to make Jekyll render the template we just created. You can also use it to change the search page title if you’d like by editing the title key in the YAML frontmatter.

---
layout: search
title: Search
---

Now that we have the HTML rendering it’s time to create the JavaScript that makes it all work. We will start with the template that generates our search data. Since the entire site is static we have to pregenerate the JSON array that we’ll be asking Lunr to search later and the easiest way to do this is by hooking into the Jekyll build process. I store all of my site’s JS in ./assets/js so we will set the page permalink for ./search-content.js accordingly.

---
permalink: /assets/js/search-content.js
---
window.store = {
  {% assign searchable_pages = site.pages | where_exp: "page", "page.menu == 'main'" %}
  {% assign searchable_documents = site.documents %}
  {% for page in searchable_pages %}
    {% assign searchable_documents = searchable_documents | push: page %}
  {% endfor %}
  {% for doc in searchable_documents %}
    "{{ doc.url | slugify }}": {
      "title": "{{ doc.title | xml_escape }}",
      "author": "{{ doc.author | xml_escape }}",
      "category": "{{ doc.category | xml_escape }}",
      "content": {{ doc.content | strip_html | jsonify }},
      "url": "{{ doc.url | xml_escape }}"
    }
    {% unless forloop.last %},{% endunless %}
  {% endfor %}
}

This template does a couple of clever things to get its job done the way I wanted for my site. First of all, I have some pages which I want included in the search (e.g. About Me) while there are others that I don’t, like the search page itself. If I just iterated through Jekyll’s site.documents I wouldn’t have access to those pages, and I couldn’t simply combine site.documents and site.pages without losing control of which pages are searchable. I overcame this by reusing a bit of YAML frontmatter tagging that I had previously added to control which pages appear on my main menu bar, and filtering on it with a where_exp in the template. From there I push the matching pages into a combined array called searchable_documents, which otherwise contains mostly my blog posts. This would also allow me to make other content or collections searchable, or not, by adding them to or removing them from this master searchable_documents array.
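To make the selection logic easier to follow, here is a rough JavaScript equivalent of what the Liquid filter and push loop do. It is only an illustration: the sitePages and siteDocuments arrays and the selectSearchable function are hypothetical stand-ins, not anything Jekyll provides.

```javascript
// Hypothetical stand-ins for Jekyll's site.pages and site.documents.
const sitePages = [
  { title: 'About Me', url: '/about/', menu: 'main' },
  { title: 'Search', url: '/search.html' } // no menu key, so it is excluded
];
const siteDocuments = [
  { title: 'Hello World', url: '/2020/01/15/hello-world/' }
];

// Mirrors: site.pages | where_exp: "page", "page.menu == 'main'"
function selectSearchable(pages, documents) {
  const searchablePages = pages.filter(function (page) {
    return page.menu === 'main';
  });
  // Mirrors the push loop: documents first, then the tagged pages.
  return documents.concat(searchablePages);
}

const searchableDocuments = selectSearchable(sitePages, siteDocuments);
console.log(searchableDocuments.map(function (doc) { return doc.title; }));
```

Only pages tagged with menu: main make it into the combined list, so the search page itself stays out of its own results.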

From there I iterate through the documents array and build a JSON object which maps a slugified version of each document URL to its searchable data. This strategy, particularly using doc.content, can make the generated JSON file fairly large because it includes the full readable text of every page as a searchable element. In my case, at the time of writing, this means about 210KB of data for 35 blog posts. You could reduce the load by swapping doc.content for a smaller snippet like doc.excerpt, but this would limit your searching to just those excerpts instead of what amounts to full-text search of your entire site. This is probably more of a problem for a more active site; at the rate I’m going it will be years before the search content exceeds what will load quickly for most users.
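If you want a rough feel for that trade-off on your own site, a small sketch like the following can help. The sample entry, its field values, and the storeSizeInBytes and withTruncatedContent helpers are all made up for illustration. The numbers are only approximate because the file Jekyll generates contains extra whitespace that JSON.stringify does not emit.

```javascript
// Hypothetical sample of what the generated window.store could contain.
const sampleStore = {
  '2020-01-15-hello-world': {
    title: 'Hello World',
    author: 'Jane Doe',
    category: 'general',
    content: 'Full text of the post with the HTML stripped out.',
    url: '/2020/01/15/hello-world/'
  }
};

// Size in bytes of the store once serialized as UTF-8 JSON.
function storeSizeInBytes(store) {
  return new TextEncoder().encode(JSON.stringify(store)).length;
}

// Copy of the store with content cut to the first maxChars characters,
// roughly simulating a switch from doc.content to a shorter excerpt.
function withTruncatedContent(store, maxChars) {
  const trimmed = {};
  Object.keys(store).forEach(function (key) {
    trimmed[key] = Object.assign({}, store[key], {
      content: store[key].content.substring(0, maxChars)
    });
  });
  return trimmed;
}

console.log('full:', storeSizeInBytes(sampleStore), 'bytes');
console.log('trimmed:', storeSizeInBytes(withTruncatedContent(sampleStore, 20)), 'bytes');
```

In the browser console on the search page you could pass window.store instead of sampleStore to measure the real payload.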

Once we’ve got the content generated by Jekyll it’s time to pull down the search library that we will be using, lunr.js. The GitHub repository will have the latest releases available.

wget https://github.com/olivernn/lunr.js/archive/v2.3.8.tar.gz -P /tmp
tar xzvf /tmp/v2.3.8.tar.gz -C /tmp
cp /tmp/lunr.js-2.3.8/lunr.js ./assets/js

Finally, we just need some basic JavaScript to handle the search page interaction and to link everything together. We’ll create this file in ./assets/js/search.js so that it matches the path referenced in the layout.

(function() {
  function displaySearchResults(results, store) {
    var searchResults = document.getElementById('search-results');

    if (results.length) { // Are there any results?
      var appendString = '';

      for (var i = 0; i < results.length; i++) {  // Iterate over the results
        var item = store[results[i].ref];
        appendString += '<li><a href="' + item.url + '"><h3>' + item.title + '</h3></a>';
        appendString += '<p>' + item.content.substring(0, 150) + '...</p></li>';
      }

      searchResults.innerHTML = appendString;
    } else {
      searchResults.innerHTML = '<li>No results found</li>';
    }
  }

  function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');

    for (var i = 0; i < vars.length; i++) {
      var pair = vars[i].split('=');

      if (pair[0] === variable) {
        // pair[1] is undefined for a bare "?query", so fall back to an empty string
        return decodeURIComponent((pair[1] || '').replace(/\+/g, '%20'));
      }
    }
  }

  var searchTerm = getQueryVariable('query');

  if (searchTerm) {
    document.getElementById('search-box').setAttribute("value", searchTerm);

    // Initialize lunr with the fields it will be searching on. I've given title
    // a boost of 10 to indicate matches on this field are more important.
    var idx = lunr(function () {
      this.field('id');
      this.field('title', { boost: 10 });
      this.field('author');
      this.field('category');
      this.field('content');

      // Add data to lunr
      for (var key in window.store) {
        this.add({
          'id': key,
          'title': window.store[key].title,
          'author': window.store[key].author,
          'category': window.store[key].category,
          'content': window.store[key].content
        });
      }
    });

    var results = idx.search(searchTerm); // Get lunr to perform a search
    displaySearchResults(results, window.store); // Render the results with the function defined above
  }
})();

The functions displaySearchResults() and getQueryVariable() are fairly straightforward: the first renders each search result as an li element and the second pulls the search term from the URL. The real fun is where lunr gets initialized and loaded with all of our search data. During the field definition we can do additional tuning, like boosting the importance of certain fields, to control our search output. Once that’s done we perform the search and render any results, completing our search page!
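As an aside, if you only need to support browsers that ship the standard URLSearchParams API, the hand-rolled query parsing can be replaced with something shorter. This is a sketch of an alternative rather than what the script above uses, and the getQueryParam name is my own.

```javascript
// Read a single query-string parameter; returns null when it is absent.
// URLSearchParams decodes percent-escapes and treats "+" as a space.
function getQueryParam(search, name) {
  return new URLSearchParams(search).get(name);
}

// Example with a hard-coded query string instead of window.location.search.
console.log(getQueryParam('?query=static+site%20search', 'query'));
```

On the search page itself you would call getQueryParam(window.location.search, 'query') and keep the existing if (searchTerm) check, which already skips both null and empty values.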