Jekyll Static Site Search With lunr.js
One thing static site generators handle poorly is making content discoverable once a user is actually on the site. You’re largely forced to rely on external search engines to promote your content and make it findable to readers. Thankfully, with a little JavaScript magic, it is possible to overcome this limitation without running any additional services, which makes the approach very attractive for situations like GitHub Pages or low-power hosting. Today, we will be implementing this with lunr.js and some creative use of Jekyll templates, but the same theory could be applied to any static site system.
We will start in the root of our Jekyll site by creating a new layout template
for our search page in ./_layouts/search.html.
---
layout: default
---
<h1>{{ page.title }}</h1>
<div class="search">
  <form action="search" method="get">
    <input type="text" id="search-box" name="query">
    <input type="submit" value="Search">
  </form>
  <ul id="search-results" class="search-results"></ul>
</div>
<script src="/assets/js/search-content.js"></script>
<script src="/assets/js/lunr.js"></script>
<script src="/assets/js/search.js"></script>
This page is very basic and includes a simple GET request form as well as the
three pieces of JavaScript we will need to make this system work. Because each
script depends on the one loaded before it, they have to execute in order, which
rules out the async attribute: async scripts run as soon as they finish
downloading, in whatever order that happens. The defer attribute, on the other
hand, does preserve document order, so it could safely be added to all three
tags. I decided to use a GET form instead of a POST because it avoided the need
for any server-side component: the search term simply ends up in the URL, where
the search script can read it. Since my entire site is served over HTTPS anyway,
the query parameters are as secure as they’re going to get within that GET
request.
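Because the form uses GET, the browser serializes the search box into the URL with standard form encoding, in which spaces become plus signs; that is why the search script later replaces + before decoding. The minimal sketch below shows that round trip using the URLSearchParams API available in browsers and Node.js. The helper names encodeSearchUrl and decodeSearchTerm and the /search path are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helpers illustrating what a GET form submission does.
// The parameter name "query" matches the input's name attribute in the layout.
function encodeSearchUrl(path, term) {
  var params = new URLSearchParams({ query: term });
  return path + '?' + params.toString(); // form encoding turns spaces into '+'
}

// Reads the term back out of a query string, decoding '+' as a space.
// The URLSearchParams constructor ignores a leading '?'.
function decodeSearchTerm(queryString) {
  return new URLSearchParams(queryString).get('query'); // null if absent
}

var url = encodeSearchUrl('/search', 'static site search');
console.log(url); // /search?query=static+site+search
console.log(decodeSearchTerm(url.slice(url.indexOf('?')))); // static site search
```

A compact alternative to a hand-written parser in the search script would be new URLSearchParams(window.location.search).get('query'), assuming you don’t need to support Internet Explorer.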
Next, we create a simple stub page, ./search.html, which mainly exists to make
Jekyll render the template we just created. You can also use it to change the
search page title if you’d like by editing the title key in the YAML frontmatter.
---
layout: search
title: Search
---
Now that we have the HTML rendering, it’s time to create the JavaScript that
makes it all work. We will start with the template that generates our search
data. Since the entire site is static, we have to pregenerate the data that we’ll
be asking Lunr to search later (a JSON-style object assigned to window.store),
and the easiest way to do this is by hooking into the Jekyll build process. I
store all of my site’s JS in ./assets/js, so we will create ./search-content.js
and set its permalink accordingly.
---
permalink: /assets/js/search-content.js
---
window.store = {
  {% assign searchable_pages = site.pages | where_exp: "page", "page.menu == 'main'" %}
  {% assign searchable_documents = site.documents %}
  {% for page in searchable_pages %}
    {% assign searchable_documents = searchable_documents | push: page %}
  {% endfor %}
  {% for doc in searchable_documents %}
    "{{ doc.url | slugify }}": {
      "title": "{{ doc.title | xml_escape }}",
      "author": "{{ doc.author | xml_escape }}",
      "category": "{{ doc.category | xml_escape }}",
      "content": {{ doc.content | strip_html | jsonify }},
      "url": "{{ doc.url | xml_escape }}"
    }
    {% unless forloop.last %},{% endunless %}
  {% endfor %}
}
This template does a couple of clever things to get its job done the way I wanted
for my site. First of all, I have some pages that I want included in the search
(e.g. About Me) and others that I don’t, like the search page itself. If I just
iterated through Jekyll’s site.documents, I wouldn’t have access to those pages,
and I couldn’t simply combine site.documents and site.pages without losing
control of which pages are searchable. I overcame this by reusing a bit of YAML
frontmatter tagging (menu: main) that I had previously used to automate which
pages appear on my main menubar, and filtering on it with a where_exp in the
template. From there I push the matching pages into a combined array called
searchable_documents, which otherwise holds everything in site.documents, mainly
my blog posts. This would also let me make other content or collections
searchable or not by adding them to, or removing them from, this master
searchable_documents array.
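To make the Liquid filtering concrete, here is a rough plain-JavaScript equivalent of what the template does at build time, run against hypothetical page and post data. It assumes, as in my setup, that searchable pages carry menu: main in their frontmatter; the variable names are illustrative stand-ins for the Liquid ones.

```javascript
// Rough plain-JavaScript mirror of the Liquid logic in search-content.js.
// All page and post data here is hypothetical sample data.
var pages = [
  { url: '/about/', title: 'About Me', menu: 'main' },
  { url: '/search.html', title: 'Search' } // no menu tag, so it stays unsearchable
];
var documents = [
  { url: '/2020/01/hello-world.html', title: 'Hello World' } // e.g. a blog post
];

// Equivalent of: site.pages | where_exp: "page", "page.menu == 'main'"
var searchablePages = pages.filter(function (page) {
  return page.menu === 'main';
});

// Equivalent of the for loop that pushes each tagged page onto the list.
var searchableDocuments = documents.concat(searchablePages);

console.log(searchableDocuments.map(function (doc) { return doc.url; }).join(', '));
// /2020/01/hello-world.html, /about/
```

Like Jekyll’s push filter, concat returns a new array instead of modifying the original, which is why the Liquid version reassigns searchable_documents on every loop iteration.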
From there I iterate through the documents array and build a JSON object that
maps each document’s slugified URL to its searchable data. This strategy,
particularly using doc.content, can make the generated file fairly large,
because it includes the full readable text of every page as a searchable
element. In my case, at the time of writing, this means about 210KB of data for
35 blog posts. You could reduce the load by swapping doc.content for a smaller
snippet like doc.excerpt, but this would limit your searching to just those
excerpts instead of what amounts to full-text search of your entire site. This is
probably more of a problem for a more active site; at the rate I’m going, it will
be years before the search content exceeds what will load quickly for most users.
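The sketch below approximates the shape of the generated store and shows why swapping doc.content for doc.excerpt shrinks the file. It is a hypothetical illustration, not part of the build: slugify() only approximates Jekyll’s default slugify mode, buildStore() is an invented helper, the store is trimmed to title, content and url, and the post data is made up.

```javascript
// Hypothetical sketch of the data search-content.js generates; not part of the build.
// Approximates Jekyll's default slugify mode: each run of characters that are not
// letters, marks or digits becomes one hyphen, edge hyphens are trimmed, and the
// result is lowercased.
function slugify(str) {
  return str
    .replace(/[^\p{L}\p{M}\p{Nd}]+/gu, '-')
    .replace(/^-|-$/g, '')
    .toLowerCase();
}

// Builds the slug -> data map. field is 'content' for full-text search or
// 'excerpt' for a smaller file; author and category are omitted for brevity.
function buildStore(docs, field) {
  var store = {};
  docs.forEach(function (doc) {
    store[slugify(doc.url)] = {
      title: doc.title,
      content: doc[field],
      url: doc.url
    };
  });
  return store;
}

// Made-up sample post; the repeated sentence stands in for a full article body.
var docs = [{
  url: '/2020/01/hello-world.html',
  title: 'Hello World',
  excerpt: 'A short first post.',
  content: 'A short first post. '.repeat(50)
}];

var full = JSON.stringify(buildStore(docs, 'content'));
var small = JSON.stringify(buildStore(docs, 'excerpt'));

console.log(Object.keys(buildStore(docs, 'content')).join(', ')); // 2020-01-hello-world-html
console.log('content-based store:', full.length, 'characters');
console.log('excerpt-based store:', small.length, 'characters');
```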
Once we’ve got the content generated by Jekyll, it’s time to pull down the search library we will be using, lunr.js. The GitHub repository has the latest releases available.
wget https://github.com/olivernn/lunr.js/archive/v2.3.8.tar.gz -P /tmp
tar xzvf /tmp/v2.3.8.tar.gz -C /tmp
cp /tmp/lunr.js-2.3.8/lunr.js ./assets/js
Finally, we just need some basic JavaScript to handle the search page interaction
and to link everything together. We’ll create this file in ./assets/js/search.js,
the path the layout template loads it from.
(function() {
  function displaySearchResults(results, store) {
    var searchResults = document.getElementById('search-results');
    if (results.length) { // Are there any results?
      var appendString = '';
      for (var i = 0; i < results.length; i++) { // Iterate over the results
        var item = store[results[i].ref]; // ref is the slug key from window.store
        appendString += '<li><a href="' + item.url + '"><h3>' + item.title + '</h3></a>';
        appendString += '<p>' + item.content.substring(0, 150) + '...</p></li>';
      }
      searchResults.innerHTML = appendString;
    } else {
      searchResults.innerHTML = '<li>No results found</li>';
    }
  }
  function getQueryVariable(variable) {
    var query = window.location.search.substring(1); // drop the leading '?'
    var vars = query.split('&');
    for (var i = 0; i < vars.length; i++) {
      var pair = vars[i].split('=');
      if (pair[0] === variable) {
        // Form encoding turns spaces into '+'; a bare key with no '=' has no value.
        return decodeURIComponent((pair[1] || '').replace(/\+/g, '%20'));
      }
    }
  }
  var searchTerm = getQueryVariable('query');
  if (searchTerm) {
    document.getElementById('search-box').setAttribute("value", searchTerm);
    // Initialize lunr with the fields it will be searching on. I've given title
    // a boost of 10 to indicate matches on this field are more important.
    var idx = lunr(function () {
      this.ref('id'); // lunr's default ref, made explicit: each result's ref is the store key
      this.field('id');
      this.field('title', { boost: 10 });
      this.field('author');
      this.field('category');
      this.field('content');
      // Add data to lunr
      for (var key in window.store) {
        this.add({
          'id': key,
          'title': window.store[key].title,
          'author': window.store[key].author,
          'category': window.store[key].category,
          'content': window.store[key].content
        });
      }
    });
    var results;
    try {
      results = idx.search(searchTerm); // Get lunr to perform a search
    } catch (e) {
      // lunr throws a lunr.QueryParseError when the input isn't valid query syntax
      results = [];
    }
    displaySearchResults(results, window.store);
  }
})();
Both displaySearchResults() and getQueryVariable() are fairly straightforward:
the first renders each search result as an li element, and the second pulls the
search term from the URL. The real fun is where lunr gets initialized and loaded
with all of our search data. During the field definition we can do additional
tuning, like increasing the importance of certain fields, to control our search
output. Once that’s done, we perform the search and render any results, creating
our search page!
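If you want to test the rendering without a browser, one option is to split displaySearchResults() into a pure function that returns the markup plus a one-line DOM update. The renderResults name below is hypothetical, the store entry is made-up sample data, and the results array only mimics the ref and score fields of the objects lunr returns; the markup it builds matches the original function.

```javascript
// Hypothetical refactor of displaySearchResults(): build the markup as a string
// so it can be tested without a DOM. results mimics lunr's { ref, score } objects.
function renderResults(results, store) {
  if (!results.length) {
    return '<li>No results found</li>';
  }
  return results.map(function (result) {
    var item = store[result.ref]; // ref is the slug key from window.store
    return '<li><a href="' + item.url + '"><h3>' + item.title + '</h3></a>' +
      '<p>' + item.content.substring(0, 150) + '...</p></li>';
  }).join('');
}

// Made-up store entry in the same shape search-content.js produces.
var store = {
  '2020-01-hello-world-html': {
    title: 'Hello World',
    content: 'A short first post.',
    url: '/2020/01/hello-world.html'
  }
};

console.log(renderResults([{ ref: '2020-01-hello-world-html', score: 1.2 }], store));
console.log(renderResults([], store)); // <li>No results found</li>
```

In search.js, displaySearchResults() would then reduce to document.getElementById('search-results').innerHTML = renderResults(results, store);.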