Comby

a tool for changing :[code]

Code is so interesting because simple linear sequences of characters express rich non-linear structures (like trees) and arbitrary information (like comments). Humans are very creative, and have developed a variety of ways to order these characters to mean different things. For example, C-like comments can start with // but Python-like comments use #. Problematically, we use the same characters to mean different things across languages and in different contexts: a # in C could be the start of a macro. Or it could be part of a string like "#". Or it could be a meaningless character inside a C comment like // #.

Yet at some level, the choice of characters in all languages corresponds to similar underlying structures or information: most languages have comments, and balanced delimiters like parentheses () or braces {} nest expressions, as found in typical if-statements. The key idea in Comby is to match code that respects the basics of this richer structure on a per-language basis, and not (only) as arbitrary sequences of characters.

Matching sequences of characters is something that regex is really good at, but regex is not generally powerful enough to recognize nested code structures. Regex also supports a lot of additional functionality that can lead to complex patterns. Comby tries to cut down on this complexity to make changing code easy, so you'll find that regex-style match operators are absent in Comby. And that's OK.

Basic Usage

Comby is a tool for matching and rewriting code. You start by writing a simple template to match syntax. Look at this Go function:

func main() {
    fmt.Println("hello world")
}

We can match the arguments to fmt.Println with this match template:

fmt.Println(:[arguments])

The :[arguments] part is called a hole. It saves the matched part to a variable. In this case, the variable is called arguments, but we could have called it something else, like :[1] or :[the_1st_arg]. Your choice, as long as the name contains only alphabetic characters, numbers, or underscores.

The :[arguments] hole matches the "hello world" string. We can use it in a rewrite template to rewrite the function, like this one:

fmt.Println(fmt.Sprintf("comby says %s", :[arguments]))

Comby takes the match and rewrite templates and replaces the matched part in place:

func main() {
    fmt.Println(fmt.Sprintf("comby says %s", "hello world"))
}

Holes are the only special part in match templates, and they always have the form :[...]. All other characters are interpreted literally (there's a bit of detail about whitespace that we'll talk about in the next part). The point is that you never have to escape any characters in your template. Just say what you mean!
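To make the match-then-rewrite flow concrete, here is a minimal Python sketch of the Go example above. It is not how Comby is implemented: the hole is approximated by a hand-written lazy regex group, and the names HOLE and rewrite are hypothetical helpers introduced only for this illustration.

```python
import re

# Hole names: letters, digits, or underscores, written as :[name].
HOLE = re.compile(r":\[(\w+)\]", re.ASCII)

def rewrite(source, match_regex, rewrite_template):
    """Replace every match in place, filling holes in the rewrite template."""
    def fill(match):
        # Substitute each :[name] with the text captured under that name.
        return HOLE.sub(lambda h: match.group(h.group(1)), rewrite_template)
    return match_regex.sub(fill, source)

# Hand-written stand-in for the match template fmt.Println(:[arguments]).
# Literal parts are escaped; the hole becomes a lazy named group.
match_regex = re.compile(
    re.escape("fmt.Println(") + r"(?P<arguments>.*?)" + re.escape(")"),
    re.DOTALL,
)

source = 'func main() {\n    fmt.Println("hello world")\n}'
result = rewrite(
    source,
    match_regex,
    'fmt.Println(fmt.Sprintf("comby says %s", :[arguments]))',
)
print(result)
```

Note that the regex version needs re.escape for the literal parts, which is exactly the escaping burden that Comby templates avoid.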

:[hole] matches all characters, including newlines. If the match template was just :[file_content], it would match all the file content. The way :[hole] starts and stops matching depends on the code structure around it. Let's look at an example, matching on this Javascript code:

if (width <= 1280 && height <= 800) {
    return 1;
}

using this match template:

if (:[var] <= :[rest])

:[var] matches until it sees the <= part coming after it, so it matches width. :[rest] matches the rest of the condition: 1280 && height <= 800. These holes match lazily: they look for the shortest way to satisfy the match. One way to refine matching is to add concrete context around holes based on what we care about. For example, we could match height to :[height] with either of the following templates, which depend on matching different parts of the surrounding code:

if (:[_] && :[height] :[_])
if (:[_] :[height] <= 800)
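The lazy behaviour described above can be loosely imitated with non-greedy regex groups. The following Python sketch is only an approximation under that assumption (it ignores the delimiter awareness that Comby adds), and the regexes are hand-written stand-ins for two of the templates, not something Comby generates.

```python
import re

source = "if (width <= 1280 && height <= 800) {\n    return 1;\n}"

# Rough stand-in for: if (:[var] <= :[rest])
first = re.search(r"if \((?P<var>.*?) <= (?P<rest>.*?)\)", source, re.DOTALL)

# Rough stand-in for: if (:[_] && :[height] :[_])
# The unnamed groups play the role of the throwaway :[_] holes.
second = re.search(
    r"if \((?:.*?) && (?P<height>.*?) (?:.*?)\)", source, re.DOTALL
)

print(first["var"])      # width
print(first["rest"])     # 1280 && height <= 800
print(second["height"])  # height
```

Each group stops at the first place where the literal text after it can match, which is the "shortest way to satisfy the match" idea.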

Comby tries to make matching code forgiving. Whitespace in the template, whether a single space, multiple contiguous spaces, or newlines, is interpreted all the same: Comby will match the corresponding whitespace in the source code, but will not care about matching the exact number of spaces, or distinguish between spaces and newlines. Not being strict about whitespace tends to be the right default decision for code in most languages. It means our previous match templates all still work in these cases where our Javascript code is formatted differently:

if (width <= 1280
    && height <= 800) {
    return 1;
}

if (width     <= 1280
    && height <= 800) {
    return 1;
}
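One way to picture this whitespace handling is a small template compiler that turns any run of whitespace in the template into "one or more whitespace characters" in the source. The Python sketch below does that under stated assumptions: holes become lazy named groups, delimiter balancing is ignored, and compile_template and literal are hypothetical helper names, not part of Comby.

```python
import re

HOLE = re.compile(r":\[(\w+)\]", re.ASCII)

def literal(text):
    """Escape literal template text; any whitespace run matches \\s+."""
    pieces = re.split(r"(\s+)", text)
    return "".join(r"\s+" if p.isspace() else re.escape(p) for p in pieces if p)

def compile_template(template):
    """Translate a template with uniquely named holes into a regex."""
    parts, pos = [], 0
    for hole in HOLE.finditer(template):
        parts.append(literal(template[pos:hole.start()]))
        parts.append(f"(?P<{hole.group(1)}>.*?)")
        pos = hole.end()
    parts.append(literal(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)

pattern = compile_template("if (:[var] <= :[rest])")

variants = [
    "if (width <= 1280 && height <= 800) {\n    return 1;\n}",
    "if (width <= 1280\n    && height <= 800) {\n    return 1;\n}",
    "if (width     <= 1280\n    && height <= 800) {\n    return 1;\n}",
]
for code in variants:
    m = pattern.search(code)
    print(m["var"], "|", " ".join(m["rest"].split()))
```

The same compiled pattern matches all three formattings, and :[var] binds to width each time.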

If you're wondering about indentation-sensitive languages like Python, be sure to check out the FAQ.

If holes only matched lazily and indiscriminately up to patterns like <=, Comby wouldn't be much more special than matching a sequence of characters. But matching is smarter than that. In many languages, delimiters like (), [], and {} must always be balanced. By default, a match template like (:[1]) will only match characters inside well-balanced parentheses. Here are two example matches in this code:

result = foo(bar(x)) + foobar(baz(x));

The hole binds to bar(x) and baz(x) respectively, which we can easily rewrite to a different call qux(x), for example. The observant reader will notice that the inner (x) groups are nested matches. By default, Comby will match at the toplevel, but nested matches can be found with added context (e.g., bar(:[1])), or by extracting and rerunning Comby on modified code. Note that writing a regular expression to do the same is not easy (simple attempts like \(.*\) or \(.*?\) don't work).
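To see why the two simple regex attempts fall short, the Python sketch below runs them on the example line and compares them with a small depth counter that tracks balanced parentheses. The balanced_groups function is a hypothetical illustration of the idea, not Comby's matcher.

```python
import re

source = "result = foo(bar(x)) + foobar(baz(x));"

# Greedy: swallows everything from the first ( to the last ).
greedy = re.findall(r"\(.*\)", source)

# Lazy: stops at the first ), cutting nested calls in half.
lazy = re.findall(r"\(.*?\)", source)

def balanced_groups(text):
    """Return the contents of each top-level (...) group in text."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # content begins after the opening paren
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups

print(greedy)                   # ['(bar(x)) + foobar(baz(x))']
print(lazy)                     # ['(bar(x)', '(baz(x)']
print(balanced_groups(source))  # ['bar(x)', 'baz(x)']
```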

Let's change the code above and make it a little more interesting. Suppose it was this Javascript:

var result = foo(bar(x /* arg 1) */)) + foobar("(");

Now there's quite a bit of complexity if we want to match the arguments of foo and foobar. A block comment /* arg 1) */ is inlined for bar. Because this is a comment, it shouldn't matter whether the parentheses inside are balanced or not. The same goes for the string literal argument to foobar: the ( inside it is not a parenthesis in the code. The special thing here is that our original match template (:[1]) can stay exactly the same and still matches the two arguments, comment and string included: it captures bar(x /* arg 1) */) for foo and "(" for foobar.

Comby understands this interaction between delimiters, strings, and comments and makes reasonable guesses for your language based on file extension (you can also force a particular matcher with a command line option, see the Quick Reference). And, you can always fall back to a generic matcher for files or languages that are not explicitly supported. See the FAQ for language support and extension.

Note that if we tried to use a regex above, our pattern would need to understand that /* */ delineates comments, otherwise it would get confused by the parenthesis inside! The same problem comes up for the string literal argument, which contains an unbalanced parenthesis. A regular expression that takes all of this into account would get ugly fast, and that's only for Javascript!
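The interaction between delimiters, comments, and strings can be sketched as a scanner that skips over comments and string literals before counting parentheses. This Python sketch is a simplified assumption of that idea: it only knows /* */ block comments and double-quoted strings with backslash escapes, and top_level_groups is a hypothetical name, not a Comby API.

```python
def top_level_groups(text):
    """Contents of top-level (...) groups, ignoring parens in comments/strings."""
    groups, depth, start, i = [], 0, None, 0
    while i < len(text):
        if text.startswith("/*", i):
            # Skip the whole block comment, whatever it contains.
            end = text.find("*/", i + 2)
            i = len(text) if end == -1 else end + 2
            continue
        if text[i] == '"':
            # Skip the string literal, honouring backslash escapes.
            i += 1
            while i < len(text) and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            i += 1
            continue
        if text[i] == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif text[i] == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
        i += 1
    return groups

source = 'var result = foo(bar(x /* arg 1) */)) + foobar("(");'
print(top_level_groups(source))
```

Even this small scanner has to special-case two kinds of syntax for one language, which hints at why a single regex covering the same cases gets unwieldy.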

Comby supports the following syntax, which carries special meaning for matching:

:[hole] matches zero or more characters (including whitespace, and across newlines) in a lazy fashion. When :[hole] is used inside delimiters, as in {:[h1], :[h2]} or (:[h]), those delimiters set a boundary for what the hole can match, and the hole will then only match patterns within those delimiters. Holes can be used outside of delimiters as well.

:[[hole]] matches one or more alphanumeric characters and _.

:[hole.] (with a period at the end) matches one or more alphanumeric characters and punctuation (like ., ;, and -).

:[hole\n] (with a \n at the end) matches one or more characters up to a newline, including the newline.

:[ ] (with a space) matches only whitespace characters, excluding newlines. To assign the matched whitespace to a variable, put the variable name after the space, like :[ hole].

:[?hole] (with a ? before the variable name) optionally matches syntax. Optional holes work like ordinary holes, except that if they fail to match syntax, the variable is assigned the empty string "". Optional hole support is currently an experimental feature.

  Using :[hole] inside string quotes will match only within the string. This is implemented for most languages. Comby also understands the difference between escapable string literals (like "string" in C) and raw string literals (like `string` in Go), and will know to stop between these delimiters.

  Use :[[hole]] to match only alphanumeric and underscore characters. This hole does not match across newlines or punctuation.

  You can refer to the same variable using either :[[hole]] or :[hole] in the rewrite template.

  You almost never want to start a template with :[hole], since it matches everything including newlines up to its suffix. This can make things slow. :[hole] is typically useful inside balanced delimiters.

  Consider combinations of holes to match interesting properties. For example, to capture leading indentation of a line, use a template like
:[ leading_indentation]:[everything_until_newline\n].

  Optional holes are useful for matching syntax that may or may not exist. For example, a Go function may or may not have a receiver. So, to match all Go functions in a project, including ones with a receiver, use an optional :[?receiver] hole.

  Looking for inspiration? Check out these simple code rewrites and the FAQ.
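As a rough mental model for the hole combination in the indentation tip above, some hole kinds can be approximated by regex character classes. The Python sketch below is an assumption-laden analogy, not Comby's semantics: in particular, treating :[ hole] as zero or more blanks is a guess made for this illustration, and all names are hypothetical.

```python
import re

# Loose regex stand-ins (assumptions, not Comby's implementation):
IDENTIFIER = r"[A-Za-z0-9_]+"  # roughly :[[hole]]
LINE = r"[^\n]*\n"             # roughly :[hole\n], newline included
BLANKS = r"[ \t]*"             # roughly :[ hole]; zero-or-more is assumed

# Stand-in for :[ leading_indentation]:[everything_until_newline\n]
INDENTED_LINE = re.compile(
    rf"(?P<leading_indentation>{BLANKS})(?P<everything_until_newline>{LINE})"
)

def indentation_of_lines(source):
    """Return the leading blanks of every newline-terminated line."""
    return [m["leading_indentation"] for m in INDENTED_LINE.finditer(source)]

print(indentation_of_lines("def f():\n    return 1\n"))
print(re.fullmatch(IDENTIFIER, "the_1st_arg") is not None)
```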

Advanced Usage

You can refine matches and rewrite templates with rules in Comby. Rules start with the word where. A rule can check whether two variables are syntactically equal. For example, we can check for duplicate expressions in if-conditions with the following match template and rule:

if (:[left_side] && :[right_side])
where :[left_side] == :[right_side]

This matches code where the programmer perhaps made a mistake and duplicated an expression without changing a variable like x to y:

if (x == 500 && x == 500)

You can use the != operator to check inequality. Multiple conditions can be separated by a comma, and mean "logical and". The following adds a condition to ignore our match case above:

where :[left_side] == :[right_side], :[left_side] != "x == 500"

Variables can be compared to other variables or string contents (enclosed by double quotes).
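The effect of such a rule can be pictured as a filter applied to each match after the holes are bound. In the Python sketch below, a lazy regex stands in for the match template and the function duplicated_conditions is a hypothetical helper; the ignore parameter plays the role of the != condition.

```python
import re

# Rough stand-in for: if (:[left_side] && :[right_side])
CONDITION = re.compile(r"if \((?P<left_side>.*?) && (?P<right_side>.*?)\)")

def duplicated_conditions(source, ignore=()):
    """Keep matches where both sides are textually equal and not ignored."""
    hits = []
    for m in CONDITION.finditer(source):
        left, right = m["left_side"], m["right_side"]
        # where :[left_side] == :[right_side], :[left_side] != "<ignored>"
        if left == right and left not in ignore:
            hits.append(m.group(0))
    return hits

source = (
    "if (x == 500 && x == 500)\n"
    "if (x == 500 && y == 500)\n"
    "if (z > 1 && z > 1)"
)
print(duplicated_conditions(source))
print(duplicated_conditions(source, ignore=("x == 500",)))
```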

Comby includes experimental language features for sub-matching and rewriting in rules. These features might change slightly in meaning or syntax, but are currently available if you want to experiment with them.

Here is an example using the sub-matching syntax:

where match :[left_side] {
      | "x == 600" -> false
      | "x == 500" -> true
      }

The match { ... } says to match the text bound to :[left_side] against each of the match cases | match_case, and to perform the filter on the right-hand side of the -> when the pattern matches. Sub-matching statements can nest:

where match :[left_side] {
      | "x == 500" ->
            match :[right_side] {
            | "x == 500" -> true
            | "x == 600" -> false
            }
      | "x == 600" -> false
      }
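The nested rule above behaves like a small decision tree over the bound variables. The Python sketch below models it with nested tuples and dictionaries. It makes two simplifying assumptions that the text does not confirm: cases are compared as exact strings, and a value that matches no case filters the match out. RULE and evaluate are hypothetical names.

```python
# (variable, {case: outcome}) where outcome is a bool or a nested rule.
RULE = (
    "left_side",
    {
        "x == 500": ("right_side", {"x == 500": True, "x == 600": False}),
        "x == 600": False,
    },
)

def evaluate(rule, bindings):
    """Walk the nested cases until a boolean outcome is reached."""
    variable, cases = rule
    # Assumption: no matching case means the match is rejected.
    outcome = cases.get(bindings[variable], False)
    if isinstance(outcome, bool):
        return outcome
    return evaluate(outcome, bindings)

print(evaluate(RULE, {"left_side": "x == 500", "right_side": "x == 500"}))
print(evaluate(RULE, {"left_side": "x == 500", "right_side": "x == 600"}))
print(evaluate(RULE, {"left_side": "x == 600", "right_side": "x == 500"}))
```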

A rewrite { ... } expression can rewrite syntax captured in a hole. This is useful for rewriting repetitions of a pattern. This example converts arguments of a dict to a JSON-like format, where dict(foo=bar, baz=qux) becomes {"foo": bar, "baz": qux}:

dict(:[args])
{:[args]}
where rewrite :[args] {  ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" }

The pattern rewrites every matching instance of :[[k]]=:[[v]] to ":[k]": :[v]. The contents of the :[args] hole are overwritten if the rewrite pattern fires. Note that the left and right hand sides inside the { ... } need enclosing string quotes. This means that our pattern needs to escape the double quotes on the right hand side.

Conceptually, a rewrite rule works the same way as a toplevel match and rewrite template, but only for a particular hole, and has the effect of overwriting the hole contents when there are substitutions.
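As a rough analogy, the rewrite rule acts like an inner search-and-replace that runs on the captured hole before the outer template is filled in. The Python sketch below mirrors the dict example with regexes. It assumes the outer rewrite wraps the arguments in braces to produce the JSON-like form described above, and rewrite_dict_call is a hypothetical helper.

```python
import re

def rewrite_dict_call(source):
    """Turn dict(k=v, ...) into {"k": v, ...} using a nested substitution."""
    def convert(call):
        args = call.group("args")
        # Inner rewrite on the hole: :[[k]]=:[[v]] -> ":[k]": :[v]
        args = re.sub(r"(\w+)=(\w+)", r'"\1": \2', args)
        # Outer rewrite: place the updated hole contents inside braces.
        return "{" + args + "}"
    # Outer match: dict(:[args]), simplified to arguments without parens.
    return re.sub(r"dict\((?P<args>[^()]*)\)", convert, source)

print(rewrite_dict_call("dict(foo=bar, baz=qux)"))
```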

It is possible to have sequences of rewrite expressions in a rule. Here a second rewrite expression adds quotes around :[v]:

where
rewrite :[args] {  ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" },
rewrite :[args] {  ": :[[v]]" -> ": \":[v]\"" }

The rewrite expressions are evaluated in a left-to-right sequence and overwrite :[args] in every case where expressions succeed. Rewrite expressions always return true, even if they don't succeed in rewriting a pattern. What this means for the example above is that the first rewrite expression will be attempted on :[args]. Even if it does not succeed in rewriting any patterns, the second rewrite expression will also be attempted. If neither rewrite expression changes the contents of :[args], it remains unchanged in the output of the toplevel rewrite template.
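The evaluation order described here resembles a pipeline of substitutions where each step runs regardless of whether the previous one changed anything. A minimal Python sketch of that reading, with regexes standing in for the two rewrite expressions (REWRITES and apply_rewrites are hypothetical names):

```python
import re

# Each entry: (pattern, replacement), applied left to right.
REWRITES = [
    (re.compile(r"(\w+)=(\w+)"), r'"\1": \2'),  # :[[k]]=:[[v]] -> ":[k]": :[v]
    (re.compile(r": (\w+)"), r': "\1"'),        # : :[[v]]      -> : ":[v]"
]

def apply_rewrites(args):
    """Run every rewrite in order; a step that matches nothing is a no-op."""
    for pattern, replacement in REWRITES:
        args = pattern.sub(replacement, args)
    return args

print(apply_rewrites("foo=bar, baz=qux"))  # both steps fire
print(apply_rewrites('"a": b'))            # only the second step fires
print(apply_rewrites("nothing here"))      # neither fires, text unchanged
```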

It is not currently possible to nest rewrite statements, though there are plans for support in the near future. For more about future plans, see the roadmap.

Quick Reference

The project is available on GitHub: https://github.com/comby-tools/comby.

brew install comby (Mac OS X)

bash <(curl -sL get.comby.dev) (Ubuntu Linux)

docker pull comby/comby (Docker)

See the GitHub project for others or to build from source.

All the examples on comby.live and in the catalog are just a copy away from working on your command-line (just click on terminal in comby.live).

echo "foo(a, b)" | comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' -stdin

Output:

------ /dev/null
++++++ /dev/null
@|-1,1 +1,1 ============================================================
-|foo(a, b)
+|bar(b, a)

comby 'foo :[template]' 'bar :[template]' .go

comby 'foo :[template]' 'bar :[template]' .go -i

comby 'foo :[template]' 'bar :[template]' .go -rule 'where :[template] == "main"'

comby 'foo :[template]' 'bar :[template]' file.go -d foo

comby 'foo :[template]' 'bar :[template]' .txt -matcher .js

echo "foo(a, b)" | comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' -stdin -diff

Output:

--- /dev/null
+++ /dev/null
@@ -1,1 +1,1 @@
-foo(a, b)
+bar(b, a)

comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' sub.js -review

Output:

------ sub.js
++++++ sub.js
@|-1,3 +1,3 ============================================================
-|function subtract(param1, param2) { 
+|function subtract(param2, param1) { 
 |  return param1 - param2;
 |}
Accept change (y = yes [default], n = no, e = edit original, E = apply+edit, q = quit)?

comby .go -templates /path/to/directory

A rewrite pattern should be described by two files in /path/to/directory, one named match and the other named rewrite. An optional rule can be put in the same directory, in a file called rule. See the catalog directory layout for a sample catalog of templates.

FAQ

Comby supports basic delimiter matching for common characters like (), {}, and [] using a generic matcher. This works as a fallback for data formats like JSON, new languages, and existing ones that may not have explicit support yet (like VHDL). The grammars for the following languages have been refined to take into account basic language-specific delimiters, comments, and string literals:

Assembly, Bash, C/C++, C#, Clojure, CSS, Dart, Elm, Elixir, Erlang, Fortran, F#, Go, Haskell, HTML/XML, Java, Javascript, JSX, JSON, Julia, LaTeX, Lisp, Nim, OCaml, Pascal, PHP, Python, Reason, Ruby, Rust, Scala, SQL, Swift, Plain Text, TSX, Typescript

Note: Comby cannot recognize arbitrary matching tags like <foo>...</foo> in HTML or XML yet (we do have plans to support it soon). Matching tags within angle brackets <...> works.

Hopefully the language you're interested in is already supported. If not, you can define your own language in a simple JSON file and pass it as a custom matcher. Define the supported language constructs in JSON, as follows:

{
   "user_defined_delimiters":[
      [
         "case",
         "esac"
      ]
   ],
   "escapable_string_literals":{
      "delimiters":[
         "\""
      ],
      "escape_character":"\\"
   },
   "raw_string_literals": [],
   "comments":[
      [
         "Multiline",
         "/*",
         "*/"
      ],
      [
         "Until_newline",
         "//"
      ]
   ]
}

Put the contents above in a JSON file, like my-language.json, and then specify your file with the -custom-matcher flag. The following runs the custom language rewrite on all files with the extension .newlang:

comby -custom-matcher my-language.json 'match...' 'rewrite...' .newlang

If you want your missing language to be built into Comby, open a feature request, or have a look at the languages file which can be modified for additional languages.

Note that languages can currently be added and expanded with respect to syntactic code structures that Comby recognizes: balanced delimiters, comments, and kinds of string literals. By design, it currently isn't possible to further refine the meaning of syntax into keywords or high-level structures like functions.

Regular expressions are sometimes enough for changing code. But often, small changes and refactorings are complicated by nested expressions, comments, or strings. Consider the following C-like snippet. Say the challenge is to rewrite the two if conditions to the value 1. Can you write a regular expression that matches the contents of the two if condition expressions, and only those two? Feel free to share your pattern with @rvtond on Twitter.

if (fgets(line, 128, file_pointer) == NULL) // 1) if (...) returns 0
      return 0;
...
if (scanf("%d) %d", &x, &y) == 2) // 2) if (scanf("%d) %d", &x, &y) == 2) returns 0
      return 0;

To match these with Comby, all you need to write is if (:[condition]), and specify one flag indicating that this language is C-like. The replacement is if (1). See the live example.

Comby is well-suited for matching and changing coarse syntactic structures. Uses include:

  Custom linter checks and refactorings. See the example catalog for checks in existing tools.

  Bug hunting. Find unchecked functions, incorrect API calls, or copy-paste errors with structured matching that is easier and more powerful than regex.

  Temporarily changing or removing code for tests or analyses. Stubbing or changing code is useful for suppressing spurious warnings, and for refining static analyses or fuzzing.

  A custom templating engine. Because Comby understands balanced delimiters generically, you can easily roll your own templating engine. This web page mixes HTML, LaTeX-like, and Markdown-like syntax to generate the final page using custom templates.


Note: Comby is not well-suited to stylistic changes and formatting like "insert a line break after 80 characters". Pair Comby with a language-specific formatter to preserve formatting (like gofmt for the Go language) after performing a change.

Comby does not currently consider whitespace indentation significant. We have plans to support it though! The idea is that your declarative templates will match on code that happens at the correct relative indentation level, for languages like Python. Stay tuned! Of course, a lot of Python code is not sensitive to whitespace indentation, so Comby is still useful (for example, a lot of Python 2 to Python 3 conversions can be written with Comby).

See the feature table and roadmap in the GitHub repository.

Pop in at the Gitter channel for more support.

Resources and demos