
May 26, 2020

Full-Text Search for Lessons

Zoe LeBlanc

In an effort to make finding lessons more user-friendly, we’ve officially launched full-text searching for all our lessons. Previously you could use filter buttons to select lessons based on topic or activity, and sort them by date and difficulty. However, you weren’t able to find lessons based on their content.

As of today, you can dig even deeper and find the exact lesson to match your interests in all of our supported languages! This feature has been a long time coming (our initial issue ticket was opened on September 20, 2018), and we hope this new addition will make Programming Historian even more accessible.

To use the search feature, go to the lessons page and click on the Start Searching button.

Initial lesson home page. Click Start Searching to enter search queries.

You’ll now see a search bar and button. You can enter your search terms and get a list of the relevant lessons, with the search terms highlighted.

Search results for Twitter and Network, with the search terms highlighted.

The results are ranked by relevance and you can also filter them using our existing buttons.

Search results for Twitter and Network, filtered by the topic Python.

For more details on how to use this feature, click the information button.

Search info section, displaying additional details on how to search.

How does the search work?

Behind the scenes, this search feature uses LunrJS, a software package for enabling full-text search on static sites.

Inverted index diagram
Book index (from Wikipedia entry on Book of Knowledge)

Lunr builds an inverted index of all our lessons, which works much like the index at the back of a book: it maps each term to the places where it appears. Each time you enter a search term, Lunr looks the term up in the index, finds all the lessons that contain it, and returns those lessons ranked by the relevance of the term, which is calculated using an information retrieval algorithm called Okapi BM25.
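To make the idea concrete, here is a minimal Python sketch of an inverted index with BM25 ranking. It is not Lunr's code (Lunr is JavaScript and also applies stemming, stop-word filtering, and field boosts); the tokenizer, the function names, and the sample lessons are illustrative assumptions, and k1 = 1.2 and b = 0.75 are common BM25 default values.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters.
    (Simplified: Lunr also applies stemming and stop-word removal.)"""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}, like a back-of-book index."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index, lengths

def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Rank every document containing a query term by its Okapi BM25 score."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        df = len(postings)  # number of lessons containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Saturate term frequency and normalize by lesson length.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy lessons (hypothetical), not real Programming Historian content.
lessons = {
    "twitter-analysis": "Exploring Twitter data with network analysis in Python",
    "network-intro": "An introduction to network analysis and network visualization",
    "ocr-basics": "Cleaning OCR text with regular expressions",
}
index, lengths = build_inverted_index(lessons)
print(index["network"])  # {'twitter-analysis': 1, 'network-intro': 2}
print([doc for doc, _ in bm25_search("twitter network", index, lengths)])
# ['twitter-analysis', 'network-intro']
```

Because "twitter" appears in only one lesson, it carries a higher inverse document frequency than "network", which is why the Twitter lesson ranks first; the OCR lesson contains neither term and is not returned at all.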

For optimal results, we recommend using multiple search terms, as well as the + and - symbols to require or exclude a term, respectively. You can read more about how to search with Lunr in its searching documentation.
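The + and - operators can be read as set operations over the inverted index. The sketch below is a simplified model of that behavior, not Lunr's query parser: `parse_query`, `filter_docs`, and the tiny hand-built index are hypothetical, and ranking is left out so the filtering logic stays visible.

```python
def parse_query(query):
    """Split a query into required (+), prohibited (-), and optional terms."""
    required, prohibited, optional = set(), set(), set()
    for raw in query.lower().split():
        if raw.startswith("+") and len(raw) > 1:
            required.add(raw[1:])
        elif raw.startswith("-") and len(raw) > 1:
            prohibited.add(raw[1:])
        else:
            optional.add(raw)
    return required, prohibited, optional

def filter_docs(query, index, all_docs):
    """Return the set of documents that satisfy the +/- constraints."""
    required, prohibited, optional = parse_query(query)

    def docs_with(term):
        return set(index.get(term, {}))

    if required:
        # Every required term must appear; optional terms would only affect ranking.
        candidates = set.intersection(*(docs_with(t) for t in required))
    elif optional:
        # With no required terms, any optional term is enough to match.
        candidates = set().union(*(docs_with(t) for t in optional))
    else:
        # Only prohibited terms: start from everything and subtract.
        candidates = set(all_docs)
    for t in prohibited:
        candidates -= docs_with(t)
    return candidates

# Hypothetical index: term -> {doc_id: term frequency}.
index = {
    "twitter": {"twitter-analysis": 1},
    "network": {"twitter-analysis": 1, "network-intro": 2},
    "python": {"twitter-analysis": 1},
}
all_docs = ["twitter-analysis", "network-intro", "ocr-basics"]
print(sorted(filter_docs("+network -twitter", index, all_docs)))  # ['network-intro']
print(sorted(filter_docs("network python", index, all_docs)))     # ['network-intro', 'twitter-analysis']
print(sorted(filter_docs("-twitter", index, all_docs)))           # ['network-intro', 'ocr-basics']
```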

In adding full-text search, we have tried to optimize both speed and accuracy of results. Most search engines (like Solr or Elasticsearch) use inverted indices, but they still expect you to have some sort of database that dynamically returns results to your queries. Since we use a static site architecture, we don't have any live databases, which means that our entire search index needs to be built before the site is loaded (otherwise users wouldn't get any search results).
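The difference between a database-backed engine and a static site comes down to when the index is built. In this hypothetical Python sketch, a build step writes the whole index to a static JSON file once, and the "client" step only reads that file and looks up terms. The file name and function names are assumptions, and a real Lunr index stores more than raw term counts.

```python
import json
import os
import re
import tempfile
from collections import Counter

def build_index_file(docs, path):
    """Build step: tokenize every lesson once and write the index to a static file."""
    index = {}
    for doc_id, text in docs.items():
        # Count each lowercase alphanumeric token in this lesson.
        for term, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
            index.setdefault(term, {})[doc_id] = count
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def load_index_file(path):
    """Client step: no database query, just parse the prebuilt file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Toy lessons (hypothetical).
lessons = {
    "network-intro": "An introduction to network analysis",
    "ocr-basics": "Cleaning OCR text with regular expressions",
}
# Hypothetical output location standing in for the site's static assets.
path = os.path.join(tempfile.mkdtemp(), "search-index.json")
build_index_file(lessons, path)
index = load_index_file(path)
print(index.get("network", {}))  # {'network-intro': 1}
```

All the expensive work (tokenizing and counting) happens before deployment; at query time the browser only needs the file, which is also why index size matters so much for static sites.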

Lunr remains one of the most common solutions for adding search to static sites, but it has a few drawbacks. One is that building the search index takes a lot of time, and the resulting index files can be fairly large to load into the browser. We also had the additional complication of wanting to separate search results by language.

In the end, we implemented a fairly novel approach (as far as I know): we generate the search corpora using Jekyll, and then build the search indices with a separate NodeJS app, in essence creating a microservice. You can view the code for building the search indices in the search-index repository; we used TravisCI to rebuild the indices automatically every night. Separating out the index building let us limit the JavaScript dependencies in our main repository and minimize the time needed to build the site locally. Building the indices separately also allowed us to use GitHub's built-in CDN functionality to serve them, and to limit the JavaScript payload for readers on slower connections.
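Splitting the index by language keeps each download smaller, since a reader only needs the index for the language they are browsing. The Python sketch below illustrates that partitioning under stated assumptions: the lesson records, the `lang` field, and the `search-index-{lang}.json` naming are hypothetical, and it does not reproduce the actual Jekyll and NodeJS pipeline in the search-index repository.

```python
import json
import re
from collections import Counter, defaultdict

def build_indices_by_language(lessons):
    """Group lessons by language, then build one inverted index per language."""
    by_lang = defaultdict(dict)
    for lesson in lessons:
        by_lang[lesson["lang"]][lesson["id"]] = lesson["body"]
    indices = {}
    for lang, docs in by_lang.items():
        index = {}
        for doc_id, text in docs.items():
            # \w+ is Unicode-aware in Python 3, so accented words stay whole.
            for term, count in Counter(re.findall(r"\w+", text.lower())).items():
                index.setdefault(term, {})[doc_id] = count
        indices[lang] = index
    return indices

# Toy lesson records (hypothetical schema and content).
lessons = [
    {"id": "network-intro", "lang": "en", "body": "An introduction to network analysis"},
    {"id": "analisis-redes", "lang": "es", "body": "Introducción al análisis de redes"},
    {"id": "ocr-basics", "lang": "en", "body": "Cleaning OCR text"},
]
indices = build_indices_by_language(lessons)
for lang, index in sorted(indices.items()):
    # Hypothetical per-language file that the browser would fetch on its own.
    filename = f"search-index-{lang}.json"
    payload = json.dumps(index, ensure_ascii=False)
    print(filename, len(payload.encode("utf-8")), "bytes")
```

An English-language reader never downloads the Spanish terms, and vice versa, so each payload grows only with the lessons in its own language.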

For more information about the technical features behind our full-text search, feel free to visit our technical documentation on search.

We hope that search allows users to more easily access lessons, as well as discover lessons in new ways. As Programming Historian continues to produce new lessons and support for additional languages, we hope features like full-text search help us maintain a user-friendly and sustainable web infrastructure.

About the author

Zoe LeBlanc is a Postdoctoral Associate and Weld Fellow at the Center for Digital Humanities, Princeton University.