How Ember Observer searches addon source code

In an earlier post I wrote about Ember Observer's code search feature, which allows searching through all Ember addon codebases. Ember Observer's code search was built using google/codesearch. Google Code Search was originally a Google Labs product that allowed users to search for open-source code on the web. The service was eventually shut down, but in 2012 author Russ Cox published a write-up of the technical details and released google/codesearch, an open-source implementation of similar functionality written in Go.

google/codesearch is a set of command-line tools that provides fast, indexed regular expression search of source code. It provides a cindex command to build an index from a list of directories or files, and a csearch command to search the index for matches. There is also a cgrep command that will search an arbitrary set of files, but it does not use the index, so unlike csearch it requires opening each file to be searched.

Building the index

To use google/codesearch, an index must first be built with cindex:

$ cindex -h
usage: cindex [-list] [-reset] [path...]

Files must be on the local filesystem to be indexed. Ember Observer fetches copies of all addons' master branches to index them. The index is not automatically updated when underlying files change. To keep search results up-to-date, we run a daily cron job that pulls down changes to addons and then rebuilds the index. Directories within each addon that we do not want to search against, such as node_modules, are removed before indexing.
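The prune-and-reindex part of that daily job could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Ember Observer's actual job: SOURCE_ROOT, EXCLUDED_DIRS, and the helper names are hypothetical, only top-level excluded directories are pruned, and the step that pulls down addon changes is left out. The only cindex option used is the -reset flag from the usage line above, on the assumption that it discards the old index before the paths are indexed again.

```ruby
require 'fileutils'

# Hypothetical values for illustration; the real paths and exclusion list may differ.
SOURCE_ROOT   = '/source'
EXCLUDED_DIRS = %w[node_modules].freeze

# Delete every excluded directory found at the top level of an addon checkout,
# so that cindex never sees it. Returns the paths that were removed.
def prune_excluded_dirs(source_root, excluded = EXCLUDED_DIRS)
  doomed = excluded.flat_map do |dir_name|
    Dir.glob(File.join(source_root, '*', dir_name)).select { |path| File.directory?(path) }
  end
  doomed.each { |path| FileUtils.rm_rf(path) }
end

# Build the cindex invocation as an argument array (no shell interpolation).
def cindex_command(source_root)
  ['cindex', '-reset', source_root]
end

# Prune, then rebuild the index. Assumes cindex is on the PATH.
def rebuild_index(source_root = SOURCE_ROOT)
  prune_excluded_dirs(source_root)
  system(*cindex_command(source_root)) or raise 'cindex failed'
end
```

Calling rebuild_index from a scheduled task after the addon checkouts have been updated would cover the "rebuild" half of the job described above.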

Once the directory containing the addons' source code has been indexed, the csearch command can be used to search through those addons.

Searching for addons

Ember Observer's code search passes search terms to csearch. csearch takes an RE2 regexp as input and outputs a line of text per match, much like grep:

$ csearch 'registerAsyncHelper'
/source/affinity-engine-curtain/tests/helpers/delay.js:export default Ember.Test.registerAsyncHelper('delay', function(app, duration) {
/source/affinity-engine-menu-bar-button-load/tests/helpers/start-app.js:    Ember.Test.registerAsyncHelper('delay', function(app, duration = 0) {
/source/affinity-engine-menu-bar-button-rewind/tests/helpers/start-app.js:    Ember.Test.registerAsyncHelper('delay', function(app, duration = 0) {
/source/affinity-engine-menu-bar-button-save/tests/helpers/start-app.js:    Ember.Test.registerAsyncHelper('delay', function(app, duration = 0) {
/source/affinity-engine-stage/test-support/helpers/affinity-engine/stage/register-test-helpers.js:  Ember.Test.registerAsyncHelper('delay', delay);
/source/affinity-engine-stage/test-support/helpers/affinity-engine/stage/register-test-helpers.js:  Ember.Test.registerAsyncHelper('step', step);
...

To find the addons containing a search term, we extract addon names from the output using a regular expression. Usage counts are obtained by counting the number of matches for each addon.
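As an illustration of that extraction step, here is a minimal sketch. It assumes every addon lives in its own directory directly under /source, as in the sample output above; ADDON_PATTERN and usage_counts are hypothetical names, not Ember Observer's actual code.

```ruby
# Captures the addon directory name from a csearch output line.
# Assumes paths of the form /source/<addon>/..., as in the sample output.
ADDON_PATTERN = %r{\A/source/([^/]+)/}

# Returns a hash of addon name => number of matching lines.
# Lines that do not start with /source/<addon>/ are ignored.
def usage_counts(csearch_output)
  csearch_output.each_line.each_with_object(Hash.new(0)) do |line, counts|
    addon = line[ADDON_PATTERN, 1]
    counts[addon] += 1 if addon
  end
end
```

The keys of the returned hash are the addons that contain the search term, and each value is that addon's usage count.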

Searching for usages

Finding usages of a search term within a single addon requires a different approach. The index contains all addons, so by default csearch will search all addons. To restrict the search to a specific addon, we set the -f flag (file name regex) to the path of that addon’s source tree. We also use the -n flag so that output lines contain the line number of the match:

$ csearch -f '/source/ember-basic-dropdown/' -n 'registerAsyncHelper'
/source/ember-basic-dropdown/test-support/helpers/ember-basic-dropdown.js:55:  Ember.Test.registerAsyncHelper('clickDropdown', function(app, cssPath, options = {}) {
/source/ember-basic-dropdown/test-support/helpers/ember-basic-dropdown.js:59:  Ember.Test.registerAsyncHelper('tapDropdown', function(app, cssPath, options = {}) {
/source/ember-basic-dropdown/tests/helpers/native-click.js:12:export default Ember.Test.registerAsyncHelper('nativeClick', function(app, selector, context) {
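A command like the one above could be assembled from Ruby along these lines. This is a sketch only: csearch_args and search_addon are hypothetical helpers, and the leading ^ anchor and the Regexp.escape call are additions of this sketch rather than something the command above uses. They assume that -f is matched against the absolute paths shown in the output and that the escaped characters are read as literals.

```ruby
require 'open3'

# Builds the argument list for a single-addon search.
# The /source root mirrors the sample output; Regexp.escape keeps characters
# in the addon name (such as ".") from acting as regex operators in -f.
def csearch_args(addon_name, term, source_root: '/source')
  file_regex = "^#{Regexp.escape(File.join(source_root, addon_name))}/"
  ['csearch', '-f', file_regex, '-n', term]
end

# Runs csearch and returns its output lines. Assumes csearch is on the PATH.
# Passing an argument array avoids any shell interpretation of the search term.
def search_addon(addon_name, term)
  output, _status = Open3.capture2(*csearch_args(addon_name, term))
  output.lines(chomp: true)
end
```

The search term itself is passed through unchanged, so it is still interpreted as a regular expression by csearch.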

We can extract addon names, file names, and line numbers of matches from the output with a regular expression. However, csearch only returns the exact line that was matched. It does not provide a way to retrieve context along with a match. In order to show the surrounding lines we must open the file containing the match:

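# filename is the file containing the match; from_line and to_line are
# the 1-based bounds of the context window to display.
lines = File.readlines(filename)
# Pair each line of the window with its line number.
lines[from_line - 1..to_line - 1].zip(from_line..to_line)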
=> [["import Ember from 'ember';\n", 1], ["\n", 2], ["export default Ember.Test.registerAsyncHelper('flashMessage',\n", 3], ["  function(app, messageType, messageContent) {\n", 4], ["    const container = app.__container__;\n", 5], ["    const route = container.lookup('route:application');\n", 6]]
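Putting the two steps together, parsing a line of csearch -n output and choosing the window of lines to read might look like the sketch below. The Match struct, MATCH_PATTERN, the helper names, and the default radius of 2 are all hypothetical. The sketch also assumes paths of the form /source/<addon>/<file> with no ":<digits>:" sequence inside the file name.

```ruby
# One parsed line of `csearch -n` output.
Match = Struct.new(:addon, :file, :line_number, :text)

# Splits "/source/<addon>/<file>:<line>:<text>" into its four parts.
MATCH_PATTERN = %r{\A/source/([^/]+)/(.+?):(\d+):(.*)\z}

# Returns a Match, or nil if the line does not have the expected shape.
def parse_match(output_line)
  m = MATCH_PATTERN.match(output_line.chomp)
  m && Match.new(m[1], m[2], m[3].to_i, m[4])
end

# Returns [text, line_number] pairs for the matched line and up to `radius`
# lines on either side, clamped to the start and end of the file.
def context_lines(lines, line_number, radius = 2)
  from_line = [line_number - radius, 1].max
  to_line   = [line_number + radius, lines.length].min
  return [] if from_line > to_line
  lines[from_line - 1..to_line - 1].zip(from_line..to_line)
end
```

Clamping the window avoids asking for lines before the first or after the last one when a match sits near either end of a file.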

Wrapping up

google/codesearch is a powerful tool for searching through large codebases. We use its cindex command to build an index of a local copy of Ember addons once per day, and its csearch command to search the indexed addons. The searches run against the index, which limits the number of files that need to be opened and keeps most searches fast. For searches with a small number of results, csearch completes almost instantly, even against the 3,000 or so addons we currently index. However, for searches with many results, such as anything containing Ember, some extra work is needed to make searches fast enough to be usable. In a later post I'll write about the process of making Ember Observer's code search performant even for worst-case searches.
