I strongly believe that the Crystal programming language has great potential. The only thing that, in my mind, is holding it back a bit is its slow compile times, which become more and more noticeable as your project grows.
Why is the compiler slow? And can we improve the situation? That's what I'm going to talk about here, thinking about possible solutions as we go.
Why is the compiler slow?
The main reason is that the compiler compiles the entire program, from scratch, every time you invoke it. Well, it at least does a full semantic analysis of your program: there is some caching going on regarding producing object files. But semantic analysis is always done, from scratch, for the entire program.
And when I say "the entire program", I mean the code that you wrote, but also the code from the shards you use, and even the standard library. And, well, it's not quite the "entire program", because only the types and methods that are effectively used need to be analyzed and compiled, but that can still be quite a lot of code.
The actual algorithms used in the compiler are good and fast. We could try to make them even faster. But, as you add more and more code to your program, inevitably, no matter how fast the algorithms, it will take more and more time to compile programs.
Why is semantic analysis done from scratch every time?
Not doing semantic analysis from scratch means doing it in chunks, and reusing knowledge from previous compilations.
Most, if not all, compiled programming languages will analyze and compile individual files and then link them afterwards. Can we do the same in Crystal?
Let's consider this file:
# dependency.cr
def add(x, y)
x + y
end
Well, we have no idea what the types of x and y are. If we call it with two ints, then we can type and compile that method with that information. The same is true if we pass it two strings, or any object that has a + method taking an argument. So we can't type and compile that file just like that.
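For example, two hypothetical call sites with different argument types would each need their own typing of the same method (the file name and values below are just for illustration):

# caller.cr (hypothetical)
require "dependency"

add(1, 2)          # typed as add(Int32, Int32), returns Int32
add("foo", "bar")  # typed as add(String, String), returns String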
That's one thing. Another thing is not knowing what types resolve to. For example in this file:
# dependency.cr
def something
  Foo.bar
end

something
Here something doesn't take any arguments, so we could try to type and compile it right away. But if we do so... what's Foo? And what's the class method bar in it? We can't know. Well, maybe a require is missing in that file:
# dependency.cr
require "foo"

def something
  Foo.bar
end

something
and now if foo.cr defines Foo, all is good.
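For instance, a hypothetical foo.cr could be as simple as this (its exact contents are just an illustration):

# foo.cr (hypothetical)
class Foo
  def self.bar
    1
  end
end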
But Crystal (and Ruby) allow not requiring a dependency like that; instead, the definition can come from somewhere else. For example:
# caller.cr
require "dependency"

class Foo
  def self.bar
    1
  end
end
Here caller.cr defines a class Foo with a class method bar, and requires dependency, which in turn defines something that uses that Foo. And this is perfectly fine, even though there's no explicit mention in something of where Foo comes from.
The conclusion so far is that it's almost always impossible to look at a single file and type it without knowing the full program.
Why does Crystal allow writing programs like that?
At this point one might wonder: is it a good idea that a file can work without specifying where its types and methods come from? I'm not going to try to answer that question here, since it's mainly subjective; instead I'll focus on how we could improve Crystal as it is right now. Of course, Ruby could be "improved" (made faster) in many ways if we changed it (disallowing some of its dynamism, forcing some type declarations, etc.), but Ruby is being improved without changing what it is. Ruby has a vision, and it keeps getting better while staying true to that vision. I think that with Crystal we should strive to do the same.
Incremental compilation
So let's discard for now the idea of being able to compile files individually. What else can we do?
One idea is to compile the entire program first, and then try to reuse that knowledge for subsequent compilations. It would work like this:
- We compile the entire program and build a dependency graph: which files depend on which other files?
- We also remember the types used in methods. For example, if add(x, y) was called with two ints, we remember that it returns an int.
- Next time we compile the program, if we find a call to add(x, y) with two ints, we check whether the file where add was defined, or any of its dependencies (recursively), changed. If not, we can avoid typing that method again, because we already know it returns an int (see the sketch below).
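As a rough sketch, assuming a simple file-level cache: the record fields and the reusable? helper below are made up for illustration, not actual compiler internals.

# Hypothetical sketch of a cached typing fact.
record CachedInstantiation,
  method_name : String,       # "add"
  defined_in : String,        # "dependency.cr"
  arg_types : Array(String),  # ["Int32", "Int32"]
  return_type : String        # "Int32"

# The cached fact can only be reused if neither the defining file nor any of
# its (recursive) dependencies changed since the previous compilation.
def reusable?(entry : CachedInstantiation,
              changed_files : Array(String),
              deps : Hash(String, Array(String))) : Bool
  stack = [entry.defined_in]
  seen = [] of String
  while file = stack.pop?
    next if seen.includes?(file)
    seen << file
    return false if changed_files.includes?(file)
    stack.concat(deps.fetch(file, [] of String))
  end
  true
end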
Would that really work?
Looking at file dependencies in real programs
I created a branch that modifies the compiler to output a file dependency graph for a given program. You can try it by checking out that branch, building a compiler from it, and then compiling any program. This will generate a dependencies.dot file, which you can then turn into a PDF with this command:
$ dot -Tpdf dependencies.dot > dependencies.pdf
(you'll need to install graphviz first)
Running it against this program:
# foo.cr
puts "Hello world!"
will produce this graph:
What a lovely tree-like graph! It's very easy to see how no cycles exist here.
They say "beauty lies in the eyes of the beholder" but I hope we can all agree that the graph above is a mess.
Still, within all that mess, we can find this:
kernel.cr defines puts, so it's natural that foo.cr depends on it. But what's interesting is that nobody depends on foo.cr (naturally). And that also means that if next time we only change foo.cr, then we can reuse everything else from the previous compilation.
Trouble ahead
This looks promising. But running the tool on some bigger programs, I noticed something. Let's take a look at this program:
# foo.cr
class Foo
  def to_s(io : IO)
    io << "foo"
  end
end

puts Foo.new
The graph near foo.cr now looks like this:

Now foo.cr has more dependencies, but something curious is that someone is now depending on foo.cr. Who?
We could play the "trace the line" game to find it, but looking at the dot file there's this:
"src/io.cr" -> "foo.cr"
The above means that "src/io.cr" depends on "foo.cr". WAT? How can the file that defines IO depend on "foo.cr"? It should be the other way around!
Debugging this a bit, we can see that the call the program makes is this one:
puts Foo.new
Next, the definition of puts is:

def puts(*objects) : Nil
  STDOUT.puts *objects
end
So that's calling puts on an IO (STDOUT). Let's take a look at that:
class IO
  def puts(obj : _) : Nil
    self << obj
    puts
  end
end
This is calling IO#<<(obj), where obj is an instance of Foo. Let's take a look at that << method:
class IO
  def <<(obj) : self
    obj.to_s self
    self
  end
end
So this is calling obj.to_s(self), where obj is an instance of Foo. And where is that method defined? In foo.cr! Bingo!
That's why io.cr suddenly depends on foo.cr.
One way to understand this is to know that in Crystal all methods are monomorphized. Monomorwhat? It just means that if you call IO#puts with a type X, a method will be generated for that particular X. If you call it with a Y, a method will be generated for that particular Y (for that particular type). In the example above, an IO#<<(obj) method was generated where obj is of type Foo.
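For instance (Bar is a made-up second class, added here just to illustrate): calling << with two different argument types produces two separate instantiations of the same method.

class Foo
  def to_s(io : IO)
    io << "foo"
  end
end

class Bar
  def to_s(io : IO)
    io << "bar"
  end
end

STDOUT << Foo.new  # generates IO#<<(obj) specialized for Foo
STDOUT << Bar.new  # generates IO#<<(obj) specialized for Bar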
This is one reason that, I think, explains how messy the graph above is.
In other compiled, statically typed programming languages, this doesn't happen. Usually obj responds to some explicit interface, and then the method lies behind a virtual table and a virtual dispatch. None of this happens in Crystal.
Does it matter?
Does it matter that io.cr depends on foo.cr? Maybe it's not a big deal. Well, it probably is. It means that changes to a file are likely to affect seemingly unrelated files. And so, if we try to use the "reuse previous compilation" strategy, not a lot could be reused in the end.
The end?
This is not the end of the story. It's just the end of this blog post. I didn't continue thinking about how to approach this problem, but hopefully these posts spark some interest and some thoughts from others about how to solve this interesting problem.
Top comments (7)
Does this mean typing parameters to an interface (a module) still generates overloads of the method for each concrete class that uses it?
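For context, I mean a setup roughly like this, where write lives in a Writer module that Foo and Bar include, and write_to_diary restricts its parameter to Writer (the diary_entry helper and the method bodies are just my made-up illustration):

module Writer
  # write is defined in the module itself; each including class
  # provides its own diary_entry.
  def write(io : IO)
    io << diary_entry
  end

  abstract def diary_entry : String
end

class Foo
  include Writer

  def diary_entry : String
    "foo's entry"
  end
end

class Bar
  include Writer

  def diary_entry : String
    "bar's entry"
  end
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end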
Does that produce overloads of write_to_diary when they use the method?

I mainly ask because it seems like IO#<< could type its parameter as Object, if that was allowed, and avoid the cyclical dependency; but if that kind of thing still generates the overloads, I see why that's not worthwhile to allow.

Also, how does the cyclical dependency affect compilation? Does the whole of IO have to be recompiled now? How does that affect its other dependencies? I'm interested in learning about the ripples of what the recompile looks like.
That's a great question!
So, if you call the method like this:
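Assuming direct calls, one per concrete type (my reconstruction of the call sites):

write_to_diary(Foo.new)
write_to_diary(Bar.new)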
then compiling with --emit llvm-ir we can see these function definitions:

write_to_diary<Foo>:IO::FileDescriptor
write_to_diary<Bar>:IO::FileDescriptor
Foo@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
Bar@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
So one method per class was generated.
However, if we have a variable whose type is Writer:
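One hypothetical way to get a value typed as Writer is an explicit cast (the exact snippet is an assumption):

writer = Foo.new.as(Writer)  # writer's compile-time type is the module Writer
write_to_diary(writer)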
(the above can also happen if you have an instance variable of type Writer and you assign a Foo to it, and then read it back), then looking at the generated LLVM IR we find:
write_to_diary<Writer>:IO::FileDescriptor
Foo@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
Bar@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
So the first call didn't produce a multidispatch. The second one did, but it could have been avoided. I think we did it like that because modules in Ruby are mainly used to mix-in code into other types, not like interfaces. Eventually we added abstract methods and started using modules as interfaces. And we even thought about distinguishing between modules as mix-ins and modules as interfaces... but we never got to do anything about that. One idea was to automatically detect whether a module was used as an interface or as a mixin. But maybe something more explicit could work. But... for interfaces we'd probably require explicit type annotations. Anyway, these were all just thoughts.
Now, if instead of a module you use a class as a base type, and do the same as above:
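A hypothetical sketch of that variant, with Writer as an abstract base class and the rest unchanged (again, the bodies are illustrative assumptions):

abstract class Writer
  def write(io : IO)
    io << diary_entry
  end

  abstract def diary_entry : String
end

class Foo < Writer
  def diary_entry : String
    "foo's entry"
  end
end

class Bar < Writer
  def diary_entry : String
    "bar's entry"
  end
end

def write_to_diary(writer : Writer)
  writer.write(STDOUT)
end

write_to_diary(Foo.new.as(Writer))  # the argument's type is the virtual type Writer+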
we can find these functions:
write_to_diary<Writer+>:IO::FileDescriptor
Writer+@Writer#write<IO::FileDescriptor>:IO::FileDescriptor
That is, the write call didn't produce a multidispatch.

Why does it work fine for classes? I think now I remember a reason. Let's say you have this code:
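Something along these lines (a hypothetical reconstruction; the concrete field types are assumptions):

abstract class Writer
  # write reads @a, but @a is only declared in the subclasses,
  # with a different type in each one.
  def write(io : IO)
    io << @a
  end
end

class Foo < Writer
  def initialize
    @a = 1
  end
end

class Bar < Writer
  def initialize
    @a = "hello"
  end
end

# As explained below, this does not compile as written.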
So we output the value of @a inside Writer, but it can be an int or a string in each subclass. So if the write method is only produced once, for Writer, what happens when trying to access @a? Well, the answer is that the code above doesn't compile: it will complain that you didn't define @a in Writer. It works fine if you hide @a behind a method, because that will produce a multi-dispatch.

For modules you aren't required to define their instance variables, because their methods don't go through a base type (like the example above).
Anyway, this reply is becoming very long, but now that I think about it, we later introduced multi-dispatch for instance var reads, so maybe we could avoid doing a multidispatch on a module as much as possible. Not sure. I'm also not sure how much it will help with compilation times.
Excellent writeup! During the Advent of Code 2022 puzzle season I solved some of the problems in different languages, but mostly Crystal and Nim. I did some very informal benchmarking and found that my programs in both languages ran quickly and had low memory consumption, and the Crystal ones usually had slightly more concise source code, but took much longer to compile. Like 10 times longer. I think part of this is due to Nim requiring all functions to have type annotations, while Crystal is a bit more relaxed.
It would be great to improve things though, because I imagine larger programs might take a very long time to build in release mode.
Thank you Ary, this is a great article! It goes into the details of compiler operations and explains a complex topic. But it's easy to follow along even without much compiler experience.
I'm eager to discuss more options for compiler optimization.
One aspect I'm wondering about: You demonstrate an example where incremental compilation doesn't work / is very complicated.
But I suppose it could work well for other code?
Maybe it would still be useful to have incremental compilation for pieces of code that can be pretty much isolated? Even if we can only cache a portion of the program's code, that should be something. I don't know how much that would help, or what would be necessary to even detect such cases.
That's a great point! And in fact I was thinking about writing about that in my next article. I don't know the answer yet, so we'll find out soon :-)
Have you looked at using Red-Green trees for the AST? This comes from the Roslyn compiler, I think.
Thanks. I just looked into it. It seems those are used to reuse syntax trees. But here we need to reuse semantic information.