a tool for changing :[code]
Code is so interesting because simple linear sequences of characters express rich non-linear structures (like trees) and arbitrary information (like comments). Humans are very creative, and have developed a variety of ways to order these characters to mean different things. For example, C-like comments can start with // but Python-like comments use #. Problematically, we use the same characters to mean different things across languages and in different contexts: a # in C could be the start of a macro. Or it could be part of a string like "#". Or it could be a meaningless character inside a C comment like // #. Yet at some level, the choice of characters in all languages corresponds to similar underlying structures or information: most languages have comments, and balanced delimiters like parentheses () or braces {} nest expressions, as found in typical if-statements. The key idea in Comby is to match code that respects the basics of this richer structure on a per-language basis, and not (only) as arbitrary sequences of characters. Matching sequences of characters is something that regex is really good for, but regex is not generally powerful enough to recognize nested code structures. Regex also supports a lot of additional functionality that can lead to complex patterns. Comby tries to cut down on this complexity to make changing code easy, so you'll find that regex-style match operators are absent in Comby. And that's OK.
Comby is a tool for matching and rewriting code. You start by writing a simple template to match syntax. Look at this Go function:
func main() {
	fmt.Println("hello world")
}
We can match the arguments to fmt.Println with this match template:
fmt.Println(:[arguments])
The :[arguments] part is called a hole. It saves the matched part to a variable. In this case, the variable is called arguments, but we could have called it something else, like :[1] or :[the_1st_arg]. Your choice! The name just has to contain only alphabet characters, numbers, or underscores.
The :[arguments] hole matches the "hello world" string. We can use it in a rewrite template to rewrite the function, like this one:
fmt.Println(fmt.Sprintf("comby says %s", :[arguments]))
Comby takes the match and rewrite templates and replaces the matched part in place:
func main() {
	fmt.Println(fmt.Sprintf("comby says %s", "hello world"))
}
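To make the match-then-rewrite flow above concrete, here is a toy sketch in Python. It is not how Comby is implemented: it assumes holes can be approximated by lazy regex groups, so it knows nothing about balanced delimiters, strings, or comments, and each hole name may appear only once in the match template. The names template_to_regex and toy_rewrite are made up for this illustration.

```python
import re

# Recognizes holes of the form :[name] in a template.
HOLE = re.compile(r":\[(\w+)\]")

def template_to_regex(template):
    """Holes become lazy named groups; all other text is matched literally."""
    parts, pos = [], 0
    for hole in HOLE.finditer(template):
        parts.append(re.escape(template[pos:hole.start()]))
        # Prefix the group name so numeric hole names like :[1] stay valid.
        parts.append(f"(?P<h_{hole.group(1)}>.*?)")
        pos = hole.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)

def toy_rewrite(source, match_template, rewrite_template):
    """Replace every match in source, filling holes in the rewrite template."""
    def fill(match):
        return HOLE.sub(lambda h: match.group("h_" + h.group(1)), rewrite_template)
    return template_to_regex(match_template).sub(fill, source)

source = 'func main() {\n\tfmt.Println("hello world")\n}'
result = toy_rewrite(
    source,
    "fmt.Println(:[arguments])",
    'fmt.Println(fmt.Sprintf("comby says %s", :[arguments]))',
)
print(result)
```

The sketch only captures the idea that a hole binds text to a name which the rewrite template can reuse. The structural matching described further on is exactly what this regex approximation lacks.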
Holes are the only special part in match templates, and they always have the form :[...]. All other characters are interpreted literally (there's a bit of detail about whitespace that we'll talk about in the next part). The point is that you never have to escape any characters in your template. Just say what you mean!
:[hole] matches all characters, including newlines. If the match template was just :[file_content], it would match all the file content. The way :[hole] starts and stops matching depends on the code structure around it. Let's look at an example, matching on this Javascript code using the match template that follows it:
if (width <= 1280 && height <= 800) {
    return 1;
}
if (:[var] <= :[rest])
:[var] matches until it sees the <= part coming after it and matches width. :[rest] matches the rest of the condition: 1280 && height <= 800. These holes match lazily: they look for the shortest way to satisfy the match. One way to refine matching is to add concrete context around holes based on what we care about. For example, we could match height to :[height] with either of the following templates, which depend on matching different parts of the surrounding code:
if (:[_] && :[height] :[_])
if (:[_] :[height] <= 800)
Comby tries to make matching code forgiving. Whitespace in the template, like a single space, multiple contiguous spaces, or newlines are interpreted all the same: Comby will match the corresponding whitespace in the source code, but will not care about matching the exact number of spaces, or distinguish between spaces and newlines. Not being strict about whitespace tends to be the right default decision for code in most languages. It means our previous match templates all still work in these cases where our Javascript code is formatted differently:
if (width <= 1280
&& height <= 800) {
return 1;
}
if (width <= 1280
&& height <= 800) {
return 1;
}
If you're wondering about indentation-sensitive languages like Python, be sure to check out the FAQ.
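The whitespace behavior described above can be approximated in a few lines of Python. This is a sketch under the assumption that "forgiving whitespace" means any run of whitespace in the template matches any run of whitespace in the source; the helper name whitespace_flexible and the sample strings are invented for the illustration.

```python
import re

def whitespace_flexible(literal):
    """Compile literal text so each whitespace run in it matches any run of
    spaces, tabs, or newlines in the source."""
    chunks = literal.split()  # split on whitespace runs, dropping the runs
    return re.compile(r"\s+".join(re.escape(chunk) for chunk in chunks))

pattern = whitespace_flexible("width <= 1280 && height <= 800")

compact = "if (width <= 1280 && height <= 800) {"
spread = "if (width    <=   1280\n    && height\n    <= 800) {"

matches_compact = pattern.search(compact) is not None
matches_spread = pattern.search(spread) is not None
# Whitespace is still required where the template has it.
matches_squashed = pattern.search("if (width<=1280&&height<=800) {") is not None

print(matches_compact, matches_spread, matches_squashed)
```

Both differently formatted conditions match the same pattern, while the version with no whitespace at all does not.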
If holes only matched lazily and indiscriminately up to patterns like <= they wouldn't be much more special than matching a sequence of characters. But matching is smarter than that. In many languages, balanced delimiters like (), [] and {} are always balanced. By default, a match template like (:[1]) will only match characters inside well-balanced parentheses. Here are two example matches in this code:
result = foo(bar(x)) + foobar(baz(x));
The hole binds to bar(x) and baz(x) respectively, which we can easily rewrite to a different call qux(x), for example. The observant reader will notice that (x) are nested matches. By default, Comby will match at the toplevel, but nested matches can be found with added context (e.g., bar(:[1])), or by extracting and rerunning Comby on modified code. Note that writing a regular expression to do the same is not easy (simple attempts like \(.*\) or \(.*?\) don't work).
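To see why the two simple regex attempts fall short, here is a small Python comparison on the same line of code. The balanced_groups function is a hypothetical helper written for this illustration; it shows the general depth-counting idea, not Comby's actual matcher.

```python
import re

code = "result = foo(bar(x)) + foobar(baz(x));"

# Greedy: runs from the first "(" to the last ")" on the line.
greedy = re.findall(r"\(.*\)", code)
# Lazy: stops at the first ")", leaving the outer parenthesis unclosed.
lazy = re.findall(r"\(.*?\)", code)

def balanced_groups(text):
    """Return the contents of each top-level (...) group in text."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # contents begin after the opening parenthesis
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups

balanced = balanced_groups(code)

print(greedy)    # ['(bar(x)) + foobar(baz(x))']
print(lazy)      # ['(bar(x)', '(baz(x)']
print(balanced)  # ['bar(x)', 'baz(x)']
```

Only the depth-counting version recovers bar(x) and baz(x), the same bindings the (:[1]) template produces above.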
Let's change the code above and make it a little more interesting. Suppose it was this Javascript:
var result = foo(bar(x /* arg 1) */)) + foobar("(");
Now there's quite a bit of complexity if we want to match the arguments of foo and foobar. A block comment /* arg 1) */ is inlined for bar. Because this is a comment, it shouldn't matter whether the parentheses inside are balanced or not. The same goes for the string literal argument to foobar: it's not a parenthesis in the code. The special thing here is that our original match template (:[1]) can stay exactly the same and still matches the two arguments (in this case, it captures the comment and string):
var result = foo(bar(x /* arg 1) */)) + foobar("(");
Comby understands this interaction between delimiters, strings, and comments and makes reasonable guesses for your language based on file extension (you can also force a particular matcher with a command line option, see the Quick Reference). And, you can always fall back to a generic matcher for files or languages that are not explicitly supported. See the FAQ for language support and extension.
Note that if we tried to use a regex above, our pattern would need to understand that /* */ delineates comments, otherwise it would get confused by the parenthesis inside! The same problem comes up for the string literal argument, which contains an unbalanced parenthesis. A regular expression that takes all of this into account would get ugly fast, and that's only for Javascript!
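The interaction between delimiters, strings, and comments can be sketched as a small scanner. The Python below is an illustration only: it assumes just two language features (double-quoted strings with backslash escapes and /* ... */ block comments), and the function name balanced_groups_js is made up. Comby's real matchers cover far more than this.

```python
def balanced_groups_js(text):
    """Contents of each top-level (...) group, ignoring parentheses that
    appear inside double-quoted strings or /* ... */ comments."""
    groups, depth, start, i = [], 0, None, 0
    while i < len(text):
        if text.startswith("/*", i):
            # Skip the whole block comment (or to the end if unterminated).
            end = text.find("*/", i + 2)
            i = len(text) if end == -1 else end + 2
            continue
        ch = text[i]
        if ch == '"':
            # Skip the string literal, honoring backslash escapes.
            i += 1
            while i < len(text) and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            i += 1  # step over the closing quote
            continue
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
        i += 1
    return groups

code = 'var result = foo(bar(x /* arg 1) */)) + foobar("(");'
groups = balanced_groups_js(code)
print(groups)  # ['bar(x /* arg 1) */)', '"("']
```

The two captured groups include the comment and the string, just like the (:[1]) matches described above, because the parentheses inside them never affect the depth count.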
Comby supports the following syntax, which carries special meaning for matching:
:[hole] matches zero or more characters (including whitespace, and across newlines) in a lazy fashion. When :[hole] is used inside delimiters, as in {:[h1], :[h2]} or (:[h]), those delimiters set a boundary for what the hole can match, and the hole will then only match patterns within those delimiters. Holes can be used outside of delimiters as well.
:[[hole]] matches one or more alphanumeric characters and _.
:[hole.] (with a period at the end) matches one or more alphanumeric characters and punctuation (like ., ;, and -).
:[hole\n] (with a \n at the end) matches one or more characters up to a newline, including the newline.
:[ ] (with a space) matches only whitespace characters, excluding newlines. To assign the matched whitespace to a variable, put the variable name after the space, like :[ hole].
:[?hole] (with a ? before the variable name) optionally matches syntax. Optional holes work like ordinary holes, except that if they fail to match syntax, the variable is assigned the empty string "". Optional hole support is currently an experimental feature.
Using :[hole] inside string quotes will match only within the string. This is implemented for most languages. Comby also understands the difference between escapable string literals (like "string" in C) and raw string literals (like `string` in Go), and will know to stop between these delimiters.
Optional holes are useful for matching syntax that may or may not exist. For example, a Go function may or may not have a receiver. So, to match all Go functions in a project, including ones with a receiver, use an optional :[?receiver] hole.
Looking for inspiration? Check out these simple code rewrites and the FAQ.
You can refine matches and rewrite templates with rules in Comby. Rules start with the word where. A rule can check whether two variables are syntactically equal. For example, we can check for duplicate expressions in if-conditions with the following match template and rule:
if (:[left_side] && :[right_side])
where :[left_side] == :[right_side]
This matches code where the programmer perhaps made a mistake and duplicated an expression without changing a variable like x to y:
if (x == 500 && x == 500)
You can use the != operator to check inequality. Multiple conditions can be separated by a comma, and mean "logical and". The following adds a condition to ignore our match case above:
where :[left_side] == :[right_side], :[left_side] != "x == 500"
Variables can be compared to other variables or string contents (enclosed by double quotes).
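One way to think about a rule is as a filter applied to the variables of each match. The Python sketch below illustrates that reading; it is an assumption about the mental model, not Comby's implementation. The regex CONDITION is a rough stand-in for the match template, and the sample source lines are invented.

```python
import re

# Rough stand-in for the match template: if (:[left_side] && :[right_side])
CONDITION = re.compile(r"if \((?P<left_side>.*?) && (?P<right_side>.*?)\)")

def rule(env):
    # where :[left_side] == :[right_side], :[left_side] != "x == 500"
    # The comma between the two conditions means "logical and".
    return env["left_side"] == env["right_side"] and env["left_side"] != "x == 500"

source = """
if (x == 500 && x == 500) {}
if (y == 1 && y == 1) {}
if (a == 1 && b == 2) {}
"""

matches = [m.groupdict() for m in CONDITION.finditer(source)]
kept = [env for env in matches if rule(env)]

print(len(matches))  # 3
print(kept)          # [{'left_side': 'y == 1', 'right_side': 'y == 1'}]
```

All three conditions match the template, but only the duplicated y == 1 condition survives both parts of the rule.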
Comby includes experimental language features for sub-matching and rewriting in rules. These features might change slightly in meaning or syntax, but are currently available if you want to experiment with them.
Here is an example using the sub-matching syntax:
where match :[left_side] {
| "x == 600" -> false
| "x == 500" -> true
}
The match { ... } says to match the text bound to :[left_side] against each of the match cases | match_case, and to perform the filter on the right-hand side of the -> when the pattern matches. Sub-matching statements can nest:
where match :[left_side] {
| "x == 500" ->
    match :[right_side] {
    | "x == 500" -> true
    | "x == 600" -> false
    }
| "x == 600" -> false
}
A rewrite { ... } expression can rewrite syntax captured in a hole. This is useful for rewriting repetitions of a pattern. This example converts arguments of a dict to a JSON-like format, where dict(foo=bar,baz=qux) becomes {"foo": bar, "baz": qux}:
dict(:[args])
:[args]
where rewrite :[args] { ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" }
The pattern rewrites every matching instance of :[[k]]=:[[v]] to ":[k]": :[v]. The contents of the :[args] hole are overwritten if the rewrite pattern fires. Note that the left and right hand sides inside the { ... } need enclosing string quotes. This means that our pattern needs to escape the double quotes on the right hand side.
Conceptually, a rewrite rule works the same way as a toplevel match and rewrite template, but only for a particular hole, and has the effect of overwriting the hole contents when there are substitutions.
It is possible to have sequences of rewrite expressions in a rule. Here a second rewrite expression adds quotes around :[v]:
where
rewrite :[args] { ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" },
rewrite :[args] { ": :[[v]]" -> ": \":[v]\"" }
The rewrite expressions are evaluated in a left-to-right sequence and overwrite :[args] in every case where expressions succeed. Rewrite expressions always return true, even if they don't succeed in rewriting a pattern. What this means for the example above is that the first rewrite expression will be attempted on :[args]. Even if it does not succeed in rewriting any patterns, the second rewrite expression will also be attempted. If neither rewrite expression changes the contents of :[args], it remains unchanged in the output of the toplevel rewrite template.
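The left-to-right sequencing can be mimicked with two substitutions over the text bound to :[args]. This Python sketch assumes \w+ is a fair approximation of the :[[k]] and :[[v]] holes (alphanumerics and underscore); the function name rewrite_args is hypothetical.

```python
import re

def rewrite_args(args):
    """Apply two rewrite steps in order to the text bound to :[args]."""
    # First rewrite expression: k=v becomes "k": v
    args = re.sub(r"(\w+)=(\w+)", r'"\1": \2', args)
    # Second rewrite expression: quote the value that follows ": "
    args = re.sub(r": (\w+)", r': "\1"', args)
    return args

converted = rewrite_args("foo=bar,baz=qux")
untouched = rewrite_args("nothing to rewrite")

print(converted)  # "foo": "bar","baz": "qux"
print(untouched)  # nothing to rewrite
```

As in the description above, each step runs whether or not the previous one changed anything, and text that matches neither pattern passes through unchanged.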
It is not currently possible to nest rewrite statements, though there are plans for support in the near future. For more about future plans, see the roadmap.
The project is available on GitHub: https://github.com/comby-tools/comby.
brew install comby (Mac OS X)
bash <(curl -sL get.comby.dev) (Ubuntu Linux)
docker pull comby/comby (Docker)
See the GitHub project for others or to build from source.
All the examples on comby.live and in the catalog are just a copy away from working on your command-line (just click on terminal in comby.live).
echo "foo(a, b)" | comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' -stdin
Output:
------ /dev/null
++++++ /dev/null
@|-1,1 +1,1 ============================================================
-|foo(a, b)
+|bar(b, a)
comby 'foo :[template]' 'bar :[template]' .go
comby 'foo :[template]' 'bar :[template]' .go -i
comby 'foo :[template]' 'bar :[template]' .go -rule 'where :[template] == "main"'
comby 'foo :[template]' 'bar :[template]' file.go -d foo
comby 'foo :[template]' 'bar :[template]' .txt -matcher .js
echo "foo(a, b)" | comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' -stdin -diff
Output:
--- /dev/null
+++ /dev/null
@@ -1,1 +1,1 @@
-foo(a, b)
+bar(b, a)
comby 'foo(:[1], :[2])' 'bar(:[2], :[1])' sub.js -review
Output:
------ sub.js
++++++ sub.js
@|-1,3 +1,3 ============================================================
-|function subtract(param1, param2) {
+|function subtract(param2, param1) {
 |  return param1 - param2;
 |}
Accept change (y = yes [default], n = no, e = edit original, E = apply+edit, q = quit)?
comby .go -templates /path/to/directory
A rewrite pattern should be described by two files in path/to/directory, one named match and the other named rewrite. An optional rule can be put in the same directory, in a file called rule. See the catalog directory layout for a sample catalog of templates.
Comby supports basic delimiter matching for common characters like (), {}, and [] using a generic matcher. This works as a fallback for data formats like JSON, new languages, and existing ones that may not have explicit support yet (like VHDL). The grammars for the following languages have been refined to take into account basic language-specific delimiters, comments, and string literals:
Assembly, Bash, C/C++, C#, Clojure, CSS, Dart, Elm, Elixir,
Erlang, Fortran, F#, Go, Haskell, HTML/XML, Java, Javascript, JSX,
JSON, Julia, LaTeX, Lisp, Nim, OCaml, Pascal, PHP, Python, Reason,
Ruby, Rust, Scala, SQL, Swift, Plain Text, TSX, Typescript
Note: Comby cannot recognize arbitrary matching tags like <foo>...</foo> in HTML or XML yet (we do have plans to support it soon). Matching tags within angle brackets <...> works.
Hopefully the language you're interested in is already supported. If not, you can define your own language in a simple JSON file and pass it as a custom matcher. Define the supported language constructs in JSON, as follows:
{
  "user_defined_delimiters": [
    ["case", "esac"]
  ],
  "escapable_string_literals": {
    "delimiters": ["\""],
    "escape_character": "\\"
  },
  "raw_string_literals": [],
  "comments": [
    ["Multiline", "/*", "*/"],
    ["Until_newline", "//"]
  ]
}
Put the contents above in a JSON file, like my-language.json, and then specify your file with the -custom-matcher flag. The following runs the custom language rewrite on all files with the extension .newlang:
comby -custom-matcher my-language.json 'match...' 'rewrite...' .newlang
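If you'd rather generate the language definition than write it by hand (for example, to avoid mistakes when escaping quotes and backslashes), a short script can serialize it for you. The Python sketch below reuses exactly the keys and values from the JSON above; the variable names are just for illustration.

```python
import json

# Same constructs as the hand-written definition above.
language = {
    "user_defined_delimiters": [["case", "esac"]],
    "escapable_string_literals": {
        "delimiters": ['"'],
        "escape_character": "\\",
    },
    "raw_string_literals": [],
    "comments": [
        ["Multiline", "/*", "*/"],
        ["Until_newline", "//"],
    ],
}

# json.dumps takes care of escaping the quote and the backslash.
definition = json.dumps(language, indent=2)
print(definition)
```

Save the printed text as my-language.json and pass it with the -custom-matcher flag as shown above.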
If you want your missing language to be built into Comby, open a feature request, or have a look at the languages file which can be modified for additional languages.
Note that languages can currently be added and expanded with respect to syntactic code structures that Comby recognizes: balanced delimiters, comments, and kinds of string literals. By design, it currently isn't possible to further refine the meaning of syntax into keywords or high-level structures like functions.
Sometimes, yes. But often, small changes and refactorings are complicated by nested expressions, comments, or strings. Consider the following C-like snippet. Say the challenge is to rewrite the two if conditions to the value 1. Can you write a regular expression that matches the contents of the two if condition expressions, and only those two? Feel free to share your pattern with @rvtond on Twitter.
if (fgets(line, 128, file_pointer) == Null) // 1) if (...) returns 0
return 0;
...
if (scanf("%d) %d", &x, &y) == 2) // 2) if (scanf("%d) %d", &x, &y) == 2) returns 0
return 0;
To match these with comby, all you need to write is if (:[condition]), and specify one flag that this language is C-like. The replacement is if (1). See the live example.
Comby is well-suited for matching and changing coarse syntactic structures. Uses include:
Custom linter checks and refactorings. See the example catalog for checks in existing tools.
Bug hunting. Find unchecked functions, incorrect API calls, or copy-paste errors with structured matching that is easier and more powerful than regex.
Temporarily changing or removing code for tests or analyses. Stubbing or changing code is useful for suppressing spurious warnings, and for refining static analyses or fuzzing.
A custom templating engine. Because Comby understands balanced delimiters generically, you can easily roll your own templating engine. This web page mixes HTML, LaTeX-like, and Markdown-like syntax to generate the final page using custom templates.
Note: Comby is not well-suited to stylistic changes and formatting like "insert a line break after 80 characters". Pair Comby with a language-specific formatter to preserve formatting (like gofmt for the Go language) after performing a change.
Comby does not currently consider whitespace indentation significant. We have plans to support it though! The idea is that your declarative templates will match on code that happens at the correct relative indentation level, for languages like Python. Stay tuned! Of course, a lot of Python code is not sensitive to whitespace indentation, so Comby is still useful (for example, a lot of Python 2 to Python 3 conversions can be written with Comby).
See the feature table and roadmap in the GitHub repository.
Pop in at the Gitter channel for more support.
A demo of how it's possible to support future rewrites for proposed syntax extensions like optional chaining in ES/JS. This just one syntactic variant of course; the fun part is how it's possible to support new syntax changes on concrete syntax trees even before it's adopted. https://t.co/jQuhNeTmXA pic.twitter.com/ZzaAXHsLqK
— Rijnard van Tonder (@rvtond) September 18, 2019
Here's a declarative way to rewrite a list concat of strings using an infix operator in #Haskell instead (this change inspired by a HLint check). Replace repeated patterns like ',' with '++', but only inside a list. Hear more tomorrow @ 1:30PM ST Ovation @strangeloop_stl pic.twitter.com/hIxO8uFRHV
— Rijnard van Tonder (@rvtond) September 13, 2019