Programming Crystal Book Club

mafinar · 26 April 2021 14:12

Crystal recently reached version 1. I had been following it for a while but never got around to really learning it. Most languages I picked up as a hobby over the past few years were functional or functional-first, so I thought this time around let’s get into a pleasant OOP language. This one seems pleasant enough (fast enough too).

I love reading books and learning from them, especially about programming languages. And there is a book about Crystal, so I’ll work my way through #book-programming-crystal and document my progress and general feelings about it in this thread. @AstonJ will be joining me in this too - and we’d love for more of you to join in as well!

Mostly I’ll be sharing my thoughts, discoveries and feelings as I work from chapter to chapter. I will cross-reference the documentation and the results of exploratory programming, and mention any updates or changes the language received at version 1 (which came out last month, so after the book was published), so that anyone joining the club afterwards knows the upgrade story.

Let’s have fun learning and talking Crystal!

AstonJ · 27 April 2021 21:44

Nice one Mafinar!! I’ve been meaning to learn Crystal for a while since it is so close to Ruby (so I think it should be fairly easy to pick up), and now is a great time given it hit 1.0 last month.

I’m not sure when I will be able to jump into the book club myself as I want to finish Programming Erlang first, but will definitely be keeping an eye on this thread for you and anyone else who gets started first. If I get a minute I will post on the Crystal forums to ask if anyone else would like to join us too

For those of you interested in Crystal, it would be great for you to join us - the more the merrier as they say!

AstonJ · 30 April 2021 00:59

Just a quick note to say I posted on their forum and one of the core team members mentioned that with the book being two years old Crystal has changed a bit since, and there are some more recent tutorials here. So wondering whether it might be a good idea to go through those first or after?

mafinar · 30 April 2021 01:04

Yes, I was aware of it, and I did a super-fast skim of their guides. So what I will do is, during my chapter updates, if I find something that is dated, or something that is better handled by a 1.0 feature, I’ll note those things. My main focus is to learn the language, and I will probably do some Advent of Code stunts and attach them here too. So really, I want to treat this as my language learning journal where the book is a waypoint marker and a soft reference.

AstonJ · 30 April 2021 01:16

Ah that’s awesome! I think you doing that will help a lot of people who read the book too - thank you!

mafinar · 1 May 2021 17:32

Day 1, Chapter 1

Okay, here I am with the first experience.

So just read the first chapter today- the sales pitch chapter, which is the first of two introductory chapters in most programming language books (the other being syntax pitch).

This was a fairly pleasant read: it introduced the language, its strengths, and some nice benchmarks. What was missing was any “weakness” of Crystal, something you do see in books on other languages, like the Elixir ones, where you hear about its number crunching (not anymore, hopefully). It would’ve been good to have a small section on what Crystal is not good at (for now anyway).

So I did some experiments from the book, namely a chunk of code where Crystal and Ruby are syntactically equivalent, and then ran ~~lies, damn lies, and~~ benchmarks on both. And why pick on Ruby, a dynamically typed language, and not Go? So I went ahead and did the same thing with Go. Here are the results (the numbers vary from the book’s, but the relative differences are real).

 [0] % time ./crystal_fib
 701408732
 ./fibonacci  2.87s user 0.01s system 99% cpu 2.878 total

 [1] % time ./go_fib
 701408732
 ./fib  3.58s user 0.01s system 99% cpu 3.591 total

 [2] % time ruby fibonacci.cr
 701408732
 57.51s user 0.11s system 99% cpu 57.747 total

Oh and here’s the code snippet (Crystal and Ruby, same syntax for this one):

def fib(n)
    return n if n <= 1
    fib(n - 1) + fib(n - 2)
end

sum = 0

(1..42).each do |i|
    sum += fib(i)
end

puts sum

Benchmarks are never limited to just one simple story, but still, that was fun, especially the part where I wrote Go after like a 100 years. Also, Crystal’s pretty fast.

Then there were some web framework benchmarks that I skimmed and kept for a later trial, some nice charts where Crystal’s large where “large is good”, and some relative algorithm execution speeds. Crystal’s selling points, in my opinion, are speed and beauty. This chapter focused more on speed, since the beauty of the syntax is in the examples themselves (and will be demonstrated throughout the other chapters anyway).

After the benchmark, I enjoyed the discussions on web, database, typing, and nil safety (I will read more on this and treat it with respect in the review that follows), and at the end came what I’d say is one of the best things about this book: A Company’s Story - a chat excerpt with the CEO and CTO of Red Panthers. I hope every chapter ends with this type of note. It reminded me a little of Adopting Elixir and why I felt motivated after reading that book.

I had mentioned earlier that this book was written some time ago and Crystal has been updated since then, so I will be leaving some notes on those updates that might aid other readers (and myself) when returning to the chapters. However, this being the first chapter, nothing here felt outdated to me (I mean, the fibonacci code ran on v1), but I will jot down the differences and open a repository of the experiments that I do with this book.

Well that’s my view on the first chapter. Also the end of the first part. It was a good read, and I am looking forward to getting to do some real code now!

AstonJ · 1 May 2021 19:35

Great update Mafinar!!

Nice to see some figures compared to Go - and those results are pretty good!!! Would be cool to see more like that if you come across them in the book (maybe for fun, add a few other languages too?)

I think that will be super useful for others - thank you!

I look forward to seeing your future updates

mafinar · 2 May 2021 00:46

A quick update

I did a quick skim of Chapter 2. The book mentions that if you know Ruby, you can get away with skimming this chapter. I never really used Ruby, but I have used 10 other languages that look like it, so I figured, let’s skim it.

Chapter two was the second of the two introductory pitches most programming language books give: the whirlwind syntax tour. I will get to the details in a later update, but I ended up liking the experience. I already solved an Advent of Code puzzle last night while watching Castlevania (again) and will probably aim to solve an AoC or two whenever I need to sharpen my procedural programming claws with something other than Python. Here is the link. I am very particular about tooling and code organization, so this repository will go through some metamorphosis until I can eli5 the organization.

I also bought this book. I love the experience of learning a programming language the University way (i.e. learn the language, then data structures, then solve problems that are the superhuman version of the problems you mostly do at work) without any ~~enterprise~~ earthly product or job description attached to it. So I will solve the exercises of that book with Crystal too. (My next goal is Mazes for Programmers, but I don’t want to get ahead of myself here; nothing’s ever guaranteed and the pandemic is doing a good job of reminding us of that.)

After sharing my Chapter 2 experience (part two of this one), I will train my muscles to play with Crystal syntax nicely and this thread will turn into a demonstration of Crystal code, or not.

So this was a short update, lock-down has given me loads of time. Have a great weekend folks!

OvermindDL1 · 2 May 2021 07:11

mafinar:

 [0] % time ./crystal_fib
 701408732
 ./fibonacci  2.87s user 0.01s system 99% cpu 2.878 total

 [1] % time ./go_fib
 701408732
 ./fib  3.58s user 0.01s system 99% cpu 3.591 total

 [2] % time ruby fibonacci.cr
 701408732
 57.51s user 0.11s system 99% cpu 57.747 total

Oooo I love benchmarks! You didn’t post your go code and I don’t want to install ruby right now, but I have crystal 1.0.0 installed currently so let’s try it with your code above and this rust code:

fn fib(n: usize) -> usize {
    match n {
        0 | 1 => 1,
        n => fib(n - 1) + fib(n - 2),
    }
}

fn main() {
    let sum: usize = (0..42usize).map(fib).sum();
    println!("{}", sum);
}

And I compiled crystal like:

❯ shards build --release
Dependencies are satisfied
Building: crystal_fib
Error target crystal_fib failed to compile:
/usr/bin/ld: cannot find -levent (this usually means you need to install the development package for libevent)
collect2: error: ld returned 1 exit status

Uh wut? I’m not using any deps, this is as basic as it gets, crystal doesn’t come with everything it already needs?!? Well let’s get libevent I guess…

I love how it said Dependencies are satisfied in green first, lol, quit lying…

So trying again (it took longer to build than the rust one, about 3 times longer, how?!?):

❯ shards build --release     
Dependencies are satisfied
Building: crystal_fib

And ditto with the rust one:

❯ cargo run --release
   Compiling testing v0.1.0 (/home/overminddl1/rust/tmp/rust_fib)
    Finished release [optimized] target(s) in 0.38s
     Running `target/release/rust_fib`
701408732

And I got these times very consistently across a lot of manual runs but I ran them both with hyperfine anyway (a great tool to benchmark multiple cli tools statistically) for the pretty output as well:

❯ time ./crystal_fib/bin/crystal_fib
701408732
./crystal_fib/bin/crystal_fib  4.79s user 0.00s system 100% cpu 4.793 total

❯ time ./rust_fib/target/release/rust_fib
701408732
./rust_fib/target/release/rust_fib  1.79s user 0.00s system 99% cpu 1.792 total

❯ hyperfine --warmup 3 './crystal_fib/bin/crystal_fib' './rust_fib/target/release/testing' 
Benchmark #1: ./crystal_fib/bin/crystal_fib
  Time (mean ± σ):      4.797 s ±  0.028 s    [User: 4.791 s, System: 0.004 s]
  Range (min … max):    4.770 s …  4.868 s    10 runs
 
Benchmark #2: ./rust_fib/target/release/rust_fib
  Time (mean ± σ):      1.808 s ±  0.019 s    [User: 1.804 s, System: 0.002 s]
  Range (min … max):    1.785 s …  1.847 s    10 runs
 
Summary
  './rust_fib/target/release/rust_fib' ran
    2.65 ± 0.03 times faster than './crystal_fib/bin/crystal_fib'
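
For readers without hyperfine, the "mean ± σ" figure it prints can be approximated in-process. This is only a minimal sketch: the helper names `mean_std` and `time_runs` are made up for illustration, using the sample standard deviation is an assumption (hyperfine's exact estimator may differ), and timing inside the process ignores the startup cost that hyperfine does measure.

```rust
use std::hint::black_box;
use std::time::Instant;

// Naive recursive Fibonacci with the same base cases as the Rust code in this post.
fn fib(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fib(n - 1) + fib(n - 2),
    }
}

// Mean and sample standard deviation (n - 1 in the denominator).
// Hypothetical helper; it needs at least two samples.
fn mean_std(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

// Runs `work` the given number of times and returns each wall-clock duration in seconds.
fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

fn main() {
    // A small input keeps the sketch quick; the workload in this thread goes up to 41.
    let samples = time_runs(10, || {
        black_box((0..25u64).map(|n| fib(black_box(n))).sum::<u64>());
    });
    let (mean, sd) = mean_std(&samples);
    println!("mean {:.6} s ± {:.6} s over {} runs", mean, sd, samples.len());
}
```

The `black_box` calls are there so the optimizer cannot discard the computation whose time is being measured.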

And here are the versions of each I’m using:

❯ crystal --version
Crystal 1.0.0 [dd40a2442] (2021-03-22)

LLVM: 10.0.0
Default target: x86_64-unknown-linux-gnu

❯ rustc --version
rustc 1.51.0 (2fd73fabe 2021-03-23)

So they both should be the most recent, checking, yeah they are both the most recent.
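
One caveat when comparing the timings above, offered as a side note: the Crystal and Rust programs print the same sum but do not make the same number of recursive calls. The Crystal version returns n for n <= 1 and sums fib(1) through fib(42), while the Rust version returns 1 for 0 and 1 and sums fib(0) through fib(41). The sketch below counts logical calls in the source (not machine-level calls after optimization); `call_counts` is a hypothetical helper name.

```rust
// calls[n] = number of logical calls made by a naive doubly recursive fib(n)
// whose base cases are n = 0 and n = 1 (true for both versions in this thread).
fn call_counts(max_n: usize) -> Vec<u64> {
    let mut calls = vec![1u64; max_n + 1];
    for n in 2..=max_n {
        // One call for fib(n) itself plus the calls made by its two children.
        calls[n] = 1 + calls[n - 1] + calls[n - 2];
    }
    calls
}

fn main() {
    let calls = call_counts(42);
    // Crystal-style loop: fib(1) through fib(42).
    let crystal_style: u64 = calls[1..=42].iter().sum();
    // Rust-style loop: fib(0) through fib(41).
    let rust_style: u64 = calls[0..=41].iter().sum();
    println!("calls for n = 1..=42: {}", crystal_style);
    println!("calls for n = 0..=41: {}", rust_style);
    println!("ratio: {:.3}", crystal_style as f64 / rust_style as f64);
}
```

Under that count the Crystal-style loop makes roughly 1.6 times as many calls, so part of the measured gap may come from the workload rather than from code generation.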

I wonder why the rust code is so much faster than the crystal code… Let’s dump the rust assembly:

❯ RUSTFLAGS="--emit asm -C llvm-args=-x86-asm-syntax=intel" cargo build --release
   Compiling testing v0.1.0 (/home/overminddl1/rust/tmp/rust_fib)
    Finished release [optimized] target(s) in 0.30s

❯ bat ./target/release/deps/rust_fib-fe8442559c3857a8.s
.../*snip to the main function*/
 144   │     mov edi, 2
 145   │     call    _ZN8rust_fib3fib17h2021b2594a90a8b6E
 146   │     mov r14, rax
 147   │     mov edi, 3
 148   │     call    _ZN8rust_fib3fib17h2021b2594a90a8b6E
 149   │     mov rbx, rax
 150   │     add rbx, r14
 151   │     mov edi, 4
 152   │     call    _ZN8rust_fib3fib17h2021b2594a90a8b6E
 153   │     mov r14, rax
 154   │     add r14, rbx
 155   │     mov edi, 5
 156   │     call    _ZN8rust_fib3fib17h2021b2594a90a8b6E
 157   │     mov rbx, rax
 158   │     add rbx, r14
.../*snip this is repetitive, the whole loop was unrolled, let's look at the `fib` function then:*/
 91   │ _ZN8rust_fib3fib17h2021b2594a90a8b6E:
  92   │     .cfi_startproc
  93   │     push    r14
  94   │     .cfi_def_cfa_offset 16
  95   │     push    rbx
  96   │     .cfi_def_cfa_offset 24
  97   │     push    rax
  98   │     .cfi_def_cfa_offset 32
  99   │     .cfi_offset rbx, -24
 100   │     .cfi_offset r14, -16
 101   │     mov r14d, 1
 102   │     cmp rdi, 2
 103   │     jb  .LBB5_4
 104   │     mov rbx, rdi
 105   │     xor r14d, r14d
 106   │     .p2align    4, 0x90
 107   │ .LBB5_2:
 108   │     lea rdi, [rbx - 1]
 109   │     call    _ZN8rust_fib3fib17h2021b2594a90a8b6E
 110   │     add rbx, -2
 111   │     add r14, rax
 112   │     cmp rbx, 1
 113   │     ja  .LBB5_2
 114   │     add r14, 1
 115   │ .LBB5_4:
 116   │     mov rax, r14
 117   │     add rsp, 8
 118   │     .cfi_def_cfa_offset 24
 119   │     pop rbx
 120   │     .cfi_def_cfa_offset 16
 121   │     pop r14
 122   │     .cfi_def_cfa_offset 8
 123   │     ret
 124   │ .Lfunc_end5:
 125   │     .size   _ZN8rust_fib3fib17h2021b2594a90a8b6E, .Lfunc_end5-_ZN8rust_fib3fib17h2021b2594a90a8b6E
 126   │     .cfi_endproc

So that looks decently optimized, though I’m fairly sure I could do it shorter, but this might be faster, eh… Let’s see… does the crystal compiler have a way to output the assembly? Hmm… Crystal uses LLVM as its optimizer just like Rust does, so I’d be surprised if their output weren’t identical, but the significant time difference says otherwise… Ah yes! It has the same --emit that rust does, gotta love clang standards:

❯ crystal build --release --emit asm crystal_fib.cr

❯ bat ./crystal_fib.s
/*snip to the main*/
/*Wow there is a *LOT* of extra code here... the crystal executable is 3.2megabytes
  unstripped and rust is 785kb unstripped, holy that's a huge difference... rust is
  287kb stripped and crystal's is 396kb stripped, still an oddly big difference... */
/* Oh, I see GC stuff in here... right... Finally found main:*/
 999   │ .Ltmp147:
1000   │     .loc    7 9 12
1001   │     movl    $1, %edi
1002   │     callq   "*fib<Int32>:Int32"
1003   │     movl    %eax, %ebp
1004   │     movl    $2, %edi
1005   │     callq   "*fib<Int32>:Int32"
1006   │     movl    %eax, %ebx
1007   │     addl    %ebp, %ebx
1008   │     jo  .LBB0_139
1009   │     movl    $3, %edi
1010   │     callq   "*fib<Int32>:Int32"
1011   │     addl    %eax, %ebx
1012   │     jo  .LBB0_139
/*snip more, same kind of stuff although I wonder what's up with the `jo` instruction,
  it overall looks more inefficient, how weird... let's look at `fib` since it will
  dominate the time anyway, found it:*/
76386   │     .type   "*fib<Int32>:Int32",@function
76387   │ "*fib<Int32>:Int32":
76388   │ .Lfunc_begin239:
76389   │     .loc    7 1 0
76390   │     .cfi_startproc
76391   │     pushq   %rbp
76392   │     .cfi_def_cfa_offset 16
76393   │     pushq   %rbx
76394   │     .cfi_def_cfa_offset 24
76395   │     pushq   %rax
76396   │     .cfi_def_cfa_offset 32
76397   │     .cfi_offset %rbx, -24
76398   │     .cfi_offset %rbp, -16
76399   │     movl    %edi, %ebx
76400   │ .Ltmp10165:
76401   │     .loc    7 2 5 prologue_end
76402   │     cmpl    $1, %edi
76403   │     jg  .LBB239_2
76404   │     movl    %ebx, %eax
76405   │     jmp .LBB239_4
76406   │ .LBB239_2:
76407   │     .loc    7 0 5 is_stmt 0
76408   │     movb    $1, %al
76409   │     .loc    7 2 5
76410   │     testb   %al, %al
76411   │     je  .LBB239_5
76412   │     leal    -1(%rbx), %edi
76413   │     .loc    7 3 5 is_stmt 1
76414   │     callq   "*fib<Int32>:Int32"
76415   │     movl    %eax, %ebp
76416   │     addl    $-2, %ebx
76417   │     .loc    7 3 18 is_stmt 0
76418   │     movl    %ebx, %edi
76419   │     callq   "*fib<Int32>:Int32"
76420   │     addl    %ebp, %eax
76421   │     jo  .LBB239_5
76422   │ .LBB239_4:
76423   │     .loc    7 0 0
76424   │     addq    $8, %rsp
76425   │     .cfi_def_cfa_offset 24
76426   │     popq    %rbx
76427   │     .cfi_def_cfa_offset 16
76428   │     popq    %rbp
76429   │     .cfi_def_cfa_offset 8
76430   │     retq
76431   │ .LBB239_5:
76432   │     .cfi_def_cfa_offset 32
76433   │     callq   __crystal_raise_overflow@PLT
76434   │ .Ltmp10166:
76435   │ .Lfunc_end239:
76436   │     .size   "*fib<Int32>:Int32", .Lfunc_end239-"*fib<Int32>:Int32"
76437   │     .cfi_endproc

Hmm, so yeah crystal isn’t optimizing it anywhere near as well as the rust code, there’s a lot more instructions to do the same work, that would easily add up to the snowball effect of slowdown that it gets over rust…

I wonder… It looks like the crystal code is using 32-bit calls in the machine code, that’s exceptionally weird… The integer size wasn’t defined in the source (why?!?) so let’s define it, changing the crystal source to be this, actually why doesn’t it have an integer size of just “using the platform word size”, oh well, using 64 bit then to match the rust code:

def fib(n : UInt64)
    return n if n <= 1_u64
    fib(n - 1_u64) + fib(n - 2_64)
end

sum = 0_u64

(1_u64..42_u64).each do |i|
    sum += fib(i)
end

puts sum

I’m still fascinated by how long crystal takes to compile, it’s so much longer than rust from a clean compile, very odd… But running it:

❯ shards build --release
Dependencies are satisfied
Building: crystal_fib

❯ time ./bin/crystal_fib
Unhandled exception: Arithmetic overflow (OverflowError)
  from src/crystal_fib.cr:3:20 in 'fib'
  from src/crystal_fib.cr:9:12 in '__crystal_main'
  from /home/overminddl1/.asdf/installs/crystal/1.0.0/share/crystal/src/crystal/main.cr:110:5 in 'main'
  from __libc_start_main
  from _start
  from ???
./bin/crystal_fib  0.01s user 0.01s system 125% cpu 0.014 total

Well, this is interesting, lol. I’m very sure I put u64 on everything, and u64 definitely holds the value… It happens pretty immediately too, even more weird… Wait, it says it’s crashing on the + between the two fib calls, wha? How? Wha? Is the plus trying a 32 bit add on 64 bit values?!?

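Wait, I put 2_64, and that compiled and ran and all… (without a type suffix the underscore is just a digit separator, so 2_64 is the plain integer literal 264, which would explain the overflow). Fixing that to 2_u64, so the code is now this: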

def fib(n : UInt64) : UInt64
    return n if n <= 1_u64
    fib(n - 1_u64) + fib(n - 2_u64)
end

sum = 0_u64

(1_u64..42_u64).each do |i|
    sum += fib(i)
end

puts sum
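
The 2_64 slip is easy to miss because an underscore inside a numeric literal is only a digit separator. Rust happens to follow the same literal rule, so the same typo can be sketched there. This is an illustration of the literal rule, not of Crystal's internals; `fib_arg` is a hypothetical helper name.

```rust
// Computes the argument for a recursive call, reporting overflow instead of wrapping.
fn fib_arg(n: u64, offset: u64) -> Option<u64> {
    n.checked_sub(offset)
}

fn main() {
    // Without a type suffix, the underscore is only a digit separator.
    let typo = 2_64; // the integer 264
    let intended = 2_u64; // the integer 2, typed as u64

    assert_eq!(typo, 264);
    assert_eq!(intended, 2);

    // 2 - 264 cannot be represented in an unsigned type, so the checked
    // subtraction reports the overflow; 2 - 2 is fine.
    println!("{:?}", fib_arg(2, typo));
    println!("{:?}", fib_arg(2, intended));
}
```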

Now it runs:

❯ shards build --release
Dependencies are satisfied
Building: crystal_fib

❯ time ./bin/crystal_fib
701408732
./bin/crystal_fib  4.53s user 0.01s system 100% cpu 4.541 total

Hmm, still not any faster… To the assembly!

❯ crystal build --release --emit asm crystal_fib.cr

❯ bat ./crystal_fib.s
/* jumping to the fib */
76364   │     .type   "*fib<UInt64>:UInt64",@function
76365   │ "*fib<UInt64>:UInt64":
76366   │ .Lfunc_begin239:
76367   │     .loc    7 1 0
76368   │     .cfi_startproc
76369   │     pushq   %r14
76370   │     .cfi_def_cfa_offset 16
76371   │     pushq   %rbx
76372   │     .cfi_def_cfa_offset 24
76373   │     pushq   %rax
76374   │     .cfi_def_cfa_offset 32
76375   │     .cfi_offset %rbx, -24
76376   │     .cfi_offset %r14, -16
76377   │     movq    %rdi, %rbx
76378   │ .Ltmp10164:
76379   │     .loc    7 2 5 prologue_end
76380   │     cmpq    $1, %rdi
76381   │     ja  .LBB239_2
76382   │     movq    %rbx, %rax
76383   │     jmp .LBB239_4
76384   │ .LBB239_2:
76385   │     .loc    7 0 5 is_stmt 0
76386   │     movb    $1, %al
76387   │     .loc    7 2 5
76388   │     testb   %al, %al
76389   │     je  .LBB239_5
76390   │     leaq    -1(%rbx), %rdi
76391   │     .loc    7 3 5 is_stmt 1
76392   │     callq   "*fib<UInt64>:UInt64"
76393   │     movq    %rax, %r14
76394   │     addq    $-2, %rbx
76395   │     .loc    7 3 22 is_stmt 0
76396   │     movq    %rbx, %rdi
76397   │     callq   "*fib<UInt64>:UInt64"
76398   │     addq    %r14, %rax
76399   │     jb  .LBB239_5
76400   │ .LBB239_4:
76401   │     .loc    7 0 0
76402   │     addq    $8, %rsp
76403   │     .cfi_def_cfa_offset 24
76404   │     popq    %rbx
76405   │     .cfi_def_cfa_offset 16
76406   │     popq    %r14
76407   │     .cfi_def_cfa_offset 8
76408   │     retq
76409   │ .LBB239_5:
76410   │     .cfi_def_cfa_offset 32
76411   │     callq   __crystal_raise_overflow@PLT
76412   │ .Ltmp10165:
76413   │ .Lfunc_end239:
76414   │     .size   "*fib<UInt64>:UInt64", .Lfunc_end239-"*fib<UInt64>:UInt64"
76415   │     .cfi_endproc

Well it’s definitely 64-bit now, but it’s still using extra instructions and extra branches…

It looks like it’s mainly slower because it’s testing for overflow and jumping to an exception throw if it happens… That’s incredibly poor for such numerically sensitive code. I wonder if there’s a way to turn that off (rust handles this a lot better…), doesn’t seem so from some googling… Let’s change the rust integer operations from wrapping to checked to see if it slows down by the same amount, so the code is now:

fn fib(n: usize) -> usize {
    match n {
        0 | 1 => 1,
        n => fib(n.checked_sub(1).unwrap()).checked_add(fib(n.checked_sub(2).unwrap())).unwrap(),
    }
}

fn main() {
    let sum: usize = (0..42usize).map(fib).sum();
    println!("{}", sum);
}

So that should be doing the same thing as crystal is now. First let’s compile (I used nightly here, though checked_add and checked_sub are also available on stable Rust), then run and time it:

❯ cargo +nightly run --release
   Compiling rust_fib v0.1.0 (/home/overminddl1/rust/tmp/rust_fib)
    Finished release [optimized] target(s) in 0.30s
     Running `target/release/rust_fib`
701408732

❯ time ./target/release/testing
701408732
./target/release/testing  1.80s user 0.00s system 99% cpu 1.797 total

No time change, let’s see if the assembly changed:


/*jumping to fib*/
  88   │     .section    .text._ZN8rust_fib3fib17ha1136bd30e432779E,"ax",@progbits
  89   │     .p2align    4, 0x90
  90   │     .type   _ZN8rust_fib3fib17ha1136bd30e432779E,@function
  91   │ _ZN8rust_fib3fib17ha1136bd30e432779E:
  92   │     .cfi_startproc
  93   │     push    r14
  94   │     .cfi_def_cfa_offset 16
  95   │     push    rbx
  96   │     .cfi_def_cfa_offset 24
  97   │     sub rsp, 8
  98   │     .cfi_def_cfa_offset 32
  99   │     .cfi_offset rbx, -24
 100   │     .cfi_offset r14, -16
 101   │     mov eax, 1
 102   │     cmp rdi, 2
 103   │     jb  .LBB5_3
 104   │     mov rbx, rdi
 105   │     add rdi, -1
 106   │     call    _ZN8rust_fib3fib17ha1136bd30e432779E
 107   │     mov r14, rax
 108   │     add rbx, -2
 109   │     mov rdi, rbx
 110   │     call    _ZN8rust_fib3fib17ha1136bd30e432779E
 111   │     add rax, r14
 112   │     jb  .LBB5_2
 113   │ .LBB5_3:
 114   │     add rsp, 8
 115   │     .cfi_def_cfa_offset 24
 116   │     pop rbx
 117   │     .cfi_def_cfa_offset 16
 118   │     pop r14
 119   │     .cfi_def_cfa_offset 8
 120   │     ret
 121   │ .LBB5_2:
 122   │     .cfi_def_cfa_offset 32
 123   │     lea rdi, [rip + .L__unnamed_2]
 124   │     lea rdx, [rip + .L__unnamed_3]
 125   │     mov esi, 43
 126   │     call    qword ptr [rip + _ZN4core9panicking5panic17h3de4db67bd397eb3E@GOTPCREL]
 127   │     ud2
 128   │ .Lfunc_end5:
 129   │     .size   _ZN8rust_fib3fib17ha1136bd30e432779E, .Lfunc_end5-_ZN8rust_fib3fib17ha1136bd30e432779E
 130   │     .cfi_endproc

So the exception does get thrown as a panic at label .LBB5_2, but there was only one place where rust wasn’t able to prove it can’t happen: the checked_add. For the two checked_subs it was able to prove overflow can never happen, because the 0 | 1 case above rules it out, so it only tests the addition. Even then the code is still overall more efficient than the crystal code, very interesting…
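
For anyone unfamiliar with the terms, here is a minimal sketch of the overflow behaviours being compared. Rust's plain `+` panics on overflow in debug builds and wraps in release builds by default, which is why the first version of the benchmark had no overflow branch. `add_or_panic` is a hypothetical helper that mirrors the checked version above.

```rust
// Adds two values and panics on overflow, like the checked fib above.
fn add_or_panic(a: u64, b: u64) -> u64 {
    a.checked_add(b).expect("arithmetic overflow")
}

fn main() {
    let max = u64::MAX;

    // wrapping_add silently wraps around, with no branch for the overflow case.
    assert_eq!(max.wrapping_add(1), 0);

    // checked_add returns None on overflow, so the caller has to branch on it.
    assert_eq!(max.checked_add(1), None);
    assert_eq!(40u64.checked_add(2), Some(42));

    // overflowing_add returns the wrapped value together with an overflow flag.
    assert_eq!(max.overflowing_add(1), (0, true));

    println!("{}", add_or_panic(40, 2));
}
```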

So for numerical work crystal isn’t there quite yet compared to other languages. ^.^;

EDIT 1:

And I’m curious about those compiling times, people say rust compiles slow but crystal seems a lot slower, let’s actually test!

❯ # First let's wipe out all of rusts artifacts:
❯ rm -r target

❯ # Then a fresh compile
❯ time cargo build --release
   Compiling rust_fib v0.1.0 (/home/overminddl1/rust/tmp/rust_fib)
    Finished release [optimized] target(s) in 0.68s
cargo build --release  0.78s user 0.11s system 113% cpu 0.780 total

❯ # Then an incremental no-change compile
❯ time cargo build --release
    Finished release [optimized] target(s) in 0.00s
cargo build --release  0.09s user 0.01s system 99% cpu 0.097 total

And now crystal’s compiles:

❯ # First let's wipe out all of crystals artifacts:
❯ rm -r bin

❯ # Then a fresh compile
❯ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  1.57s user 0.21s system 156% cpu 1.140 total

❯ # Then an incremental no-change compile
❯ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  1.58s user 0.22s system 153% cpu 1.169 total

That ‘felt’ like a much faster compile than before… does crystal store its artifacts outside of the project directory somewhere? Still much slower than Rust’s compiles though, again how very odd, especially since they both use LLVM as an optimizer. I guess the quality of the code that is sent to LLVM is very different…

EDIT 2:

I changed the crystal code to remove the types, so it is what was in the original post about this, and now the compile times have changed to what I remember it being:

❯ rm -r bin

❯ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  13.10s user 0.20s system 104% cpu 12.773 total

❯ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  1.50s user 0.22s system 147% cpu 1.168 total

So it looks like its type inference is VERY slow or something, very weird…

EDIT 3:

Did it with elixir as well, the code:

defmodule Fib do
    def fib(0), do: 1
    def fib(1), do: 1
    def fib(n) do
        fib(n-1) + fib(n-2)
    end
    
    def run(-1, sum), do: sum
    def run(n, sum), do: run(n-1, sum + fib(n))
    def run() do
        sum = run(41, 0)
        IO.puts(sum)
    end
end

And running it within the vm:

iex(11)> :timer.tc(Fib, :run, [])  
701408732
{7480096, :ok}

So it took about 7 and a half seconds to run, not at all bad for an interpreted language, a LOT better than ruby, lol.

Now even though the BEAM VM is not designed to be shut down, let’s try it by running it straight and shutting it down again as fast as possible, I.E. I made an escript! Running it:

❯ time ./elixir_fib
701408732
./elixir_fib  7.85s user 0.12s system 100% cpu 7.931 total

Not much slower at all, starting up and shutting down the VM (that’s designed not to ever be started up and shut down often, rather it should run all the time) only added about 0.35 seconds to it. ^.^

DevotionGeo · 2 May 2021 07:27

The first book you mentioned uses Ruby, Python and JavaScript for its examples. The second one uses only Ruby. These books look like two great resources for someone learning Ruby. I’ll recommend these books to my mentee, who is learning Ruby and after that Crystal.

DevotionGeo · 2 May 2021 07:39

Great insight!

If you ever write a programming book or record a course, I’ll buy it in a heartbeat.

OvermindDL1 · 2 May 2021 07:44

Eh, that all comes down to time, and the baby takes all of that up. It’s almost 2am here as it is, lol.

I added more to that post above, an elixir comparison as well. It’s interesting how much faster elixir is compared to ruby even though elixir is an entirely bytecode interpreted language.

Also, the elixir_fib escript came out weighing 1.1 megabytes, lol. ^.^;

An escript for those that don’t know the erlang VM ecosystem is just a zip file with, in this case, 66 bytes at the front to launch the VM with the code package, so that launch time is including unzipping and such the files.

EDIT:

Oh, and for completeness the elixir version was:

❯ iex --version
Erlang/OTP 24 [RELEASE CANDIDATE 3] [erts-12.0] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]

IEx 1.12.0-rc.1 (ce321f8) (compiled with Erlang/OTP 24)

Wait, does the VM include a JIT now?!? When was that added?! You used to have to enable it manually, which I didn’t do here! o.O

DevotionGeo · 2 May 2021 08:01

Wow, Elixir’s performance is impressive with JIT. And the JIT is enabled by default in Erlang/OTP 24.

mafinar · 2 May 2021 15:43

That… was… awesome. I’ll investigate and break down my benchmarks following your pattern from now on. Thanks for the guidelines.

Yes, the crystal compile time is slow. It felt even slower when compared to Go, which has exceptionally fast compile times.

Here’s the Go and Crystal side by side (I originally wanted to put that but later didn’t)

Edit: oops that is the wrong screenshot, I remember later changing the Go code to sum it by iteration.

mafinar · 2 May 2021 20:47

Today’s Update (Chapter 2, Part 2)

Fired up crystal play and played with the small snippets and “Your Turn”-s. Mostly a select syntax introduction. Here are the things that stood out:

[Observation] Seems like TIMTOWTDI isn’t as strong in this one. The ways to do things are usually less than or equal to two, more commonly one.
[Muscle Memory Challenge] It’s size, not length, certainly not both
[Muscle Memory Challenge] The %w() style array declaration syntax is there; however, %w//, which is an Elixir-worn habit of mine, won’t work. It’s mostly [], (), {}, <> (I could be wrong on the last one, I really needed two variants for this syntax to be complete, and sadly // isn’t one)
[Observation] Most things are similar to Ruby, with stricter implications - it IS a statically typed language. But what do I know? I don’t know Ruby.
[Update] There is an example about the case...when expression (yes, they’re all expressions and can happily sit on the RHS of the =). There is also a case...in form, with in taking the place of when, which does exhaustive matching. So if your case is matching on a union type or an Enum, use the one with in.
You cannot just get away with an empty array [] or hash {}; there needs to be an of <Type> at the end to unconfuse the compiler.
There are classes and there are modules. It looks like modules here serve a dual purpose: 1. as a namespace, and 2. as a chunk of code ready to be included in your other chunk of code.
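
For readers coming from other languages, the exhaustive matching noted in the list above has a rough analogue in Rust's `match`, which refuses to compile if a variant is left unhandled. This sketch illustrates the idea only and is not Crystal syntax; the `Color` enum and `describe` function are made up for the example.

```rust
// A small enum standing in for the kind of Enum an exhaustive case would match on.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

// Every variant must be handled: deleting any arm below turns into a compile error,
// which is the guarantee an exhaustive case expression is meant to give.
fn describe(c: Color) -> &'static str {
    match c {
        Color::Red => "warm",
        Color::Green => "natural",
        Color::Blue => "cool",
    }
}

fn main() {
    for c in [Color::Red, Color::Green, Color::Blue] {
        println!("{:?} is {}", c, describe(c));
    }
}
```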

There’s one more thing. There’s a line - “But there’s currently no way to access those instance variables from outside of the object.” That led me to believe that, for example, Point.@x would not be accessible and I would need a property (getter + setter), getter or setter declared. What it actually meant was that without such a declaration I won’t get accessor methods for them (but I can totally access Point.@x regardless of any property declaration). I thought this was an update, but I tried with older versions of Crystal and the behaviour was consistent. That motivated me to post a question at the Crystal Forum, and the amazing folks there clarified it.

And Yes! There is that “Company Story” at the end. This time it was Dev Demand. Praising the binary and typing, good stuff. Though I disagree with a statement made that Crystal has better tooling than Go. I don’t think the tooling is better than Go, it’s good, and (hopefully) improving, and it doesn’t bother me (yet, hopefully won’t) but not Go good (yet, hopefully).

Lastly, there was a discussion on fibers that I didn’t read closely; concurrency is more fun to read about in a chapter dedicated to it. I tried pasting some code into the playground and it seemed to have angered it. Maybe it will work the good old-fashioned way, maybe the code’s outdated. I’ll find out later tonight or tomorrow.

That’s all for my Chapter 2 update. Have a great week ahead folks!

mafinar · 2 May 2021 20:50

As would I.

AstonJ · 2 May 2021 22:21

I want to join in

What computer have you got Mafinar? Yours is much quicker than mine

$ time crystal build fib-crystal.cr
crystal build fib-crystal.cr  0.93s user 0.34s system 135% cpu 0.940 total
$ time ./fib-crystal
701408732
./fib-crystal  4.66s user 0.04s system 96% cpu 4.840 total

$ time rustc fib-rust.rs           
rustc fib-rust.rs  0.24s user 0.07s system 121% cpu 0.252 total
$ time ./fib-rust
701408732                  
./fib-rust  3.66s user 0.02s system 95% cpu 3.879 total

time ruby fib-crystal.cr
701408732
ruby fib-crystal.cr  55.80s user 0.48s system 99% cpu 56.348 total

Not sure whether it makes much difference building with rustc and crystal build : /

OvermindDL1 · 3 May 2021 15:37

Very much so, I wasn’t expecting that, lol. I need to catch back up on its development. ^.^;

Heh, it was a lot of work, I just finally had a few minutes free from the baby late at night and needed to delve into something technical, it was very appropriately timed. ^.^

Should look at OCaml! It’s well known as having one of the fastest (and most pluggable) optimizing compilers of any language. ^.^

Hmm… Ooo 4.12 is out, updating!

And now with this OCaml code:

let rec fib = function
| 0 -> 1
| 1 -> 1
| n -> (fib (n - 1)) + (fib (n - 2))

let () =
  let sum = ref 0 in
  let () = for n = 0 to 41 do
    sum := !sum + fib n
  done in
  print_endline (string_of_int !sum)

Let’s compile it in optimized mode from scratch (with timing to show how fast it compiles, really obvious with lots of files as even my fairly decently sized projects often don’t even take a second):

❯ time ocamlopt ./fib.ml -o fib
ocamlopt ./fib.ml -o fib  0.18s user 0.09s system 119% cpu 0.226 total

❯ time ./fib                   
701408732
./fib  2.55s user 0.00s system 99% cpu 2.556 total

So basically what I expected. It compiled super fast, ran almost as fast as the C code, not quite as fast as the Rust, and faster than Crystal by a good margin. And this is all while remembering that even optimized OCaml code still uses tagged types, so even with that overhead it’s still that fast. For note, the blazing fast OCaml compiler is also written in OCaml. ^.^

And now let’s do the Go code, so with the code in that screenshot then compile and run it (Go version 1.16.3):

❯ time go build .
go build .  0.35s user 0.29s system 157% cpu 0.408 total

❯ time ./go_fib 
701408732
./go_fib  2.98s user 0.01s system 100% cpu 2.985 total

Interesting, Go both compiles slower than OCaml and runs slower than OCaml. I figured it would have beaten OCaml, interesting…

/* relatable */

Lol.

The one I’m using seems quicker than yours as well; it’s a few-years-old Ryzen 7 that I’m running these on.

Oh wait, you didn’t compile the Rust and the Crystal code in release mode, you built them in debug mode. Both will be slower that way! ^.^;

Look at the commands I ran in my post to build them each in release mode.

If you call the base compilers directly, it probably won’t be as simple as adding --release like you can with a build system, since passing --release to a build system makes it hand a whole host of optimization arguments to the compiler. You really should use their respective build systems; it’s a lot easier.

OvermindDL1 · 3 May 2021 15:56

Ah hah! I found out Crystal has operators that allow wrapping with no exception throwing, you prepend & to things like + and so forth, so I updated the source to:

def fib(n)
    return n if n <= 1
    fib(n &- 1) &+ fib(n &- 2)
end

sum = 0

(1..42).each do |i|
    sum &+= fib(i)
end

puts sum

And compile and run!

❯ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  13.34s user 0.29s system 98% cpu 13.797 total

❯ time ./bin/crystal_fib 
701408732
./bin/crystal_fib  4.10s user 0.01s system 99% cpu 4.126 total

Hmm, I ran it half a dozen times and got the same timing every time, to within 0.01s. So it did help, but not as much as I’d hoped…
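For anyone following along, here is a small sketch of what the `&` prefix changes. It is only an illustration of the operator semantics, not part of the benchmark, and the variable name `max` is just mine.

```crystal
max = Int32::MAX

# `&+` is the wrapping variant: on overflow it wraps around instead of raising.
wrapped = max &+ 1
puts wrapped == Int32::MIN # wraps to Int32::MIN

# Plain `+` is checked: on overflow it raises OverflowError at runtime.
raised = false
begin
  max + 1
rescue OverflowError
  raised = true
end
puts raised
```

The checked `+` is what makes the compiler emit an overflow test and a jump to the exception path after each arithmetic operation, which is the part the `&` operators get rid of.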

Taking a look at the assembly:

    .type   "*fib<Int32>:Int32",@function
"*fib<Int32>:Int32":
.Lfunc_begin239:
    .loc    7 1 0
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    pushq   %rbx
    .cfi_def_cfa_offset 24
    pushq   %rax
    .cfi_def_cfa_offset 32
    .cfi_offset %rbx, -24
    .cfi_offset %rbp, -16
    movl    %edi, %ebx
.Ltmp10165:
    .loc    7 2 5 prologue_end
    cmpl    $1, %edi
    jg  .LBB239_3
    movl    %ebx, %eax
    .loc    7 0 0 is_stmt 0
    addq    $8, %rsp
    .cfi_def_cfa_offset 24
    popq    %rbx
    .cfi_def_cfa_offset 16
    popq    %rbp
    .cfi_def_cfa_offset 8
    retq
.LBB239_3:
    .cfi_def_cfa_offset 32
    .loc    7 2 5
    leal    -1(%rbx), %edi
    .loc    7 3 5 is_stmt 1
    callq   "*fib<Int32>:Int32"
    movl    %eax, %ebp
    addl    $-2, %ebx
    .loc    7 3 20 is_stmt 0
    movl    %ebx, %edi
    callq   "*fib<Int32>:Int32"
    addl    %ebp, %eax
    .loc    7 0 0
    addq    $8, %rsp
    .cfi_def_cfa_offset 24
    popq    %rbx
    .cfi_def_cfa_offset 16
    popq    %rbp
    .cfi_def_cfa_offset 8
    retq
.Ltmp10166:
.Lfunc_end239:
    .size   "*fib<Int32>:Int32", .Lfunc_end239-"*fib<Int32>:Int32"
    .cfi_endproc

Well, the overflow test, the jumps and the exception are gone now, but it’s still generating very poor code. Very weird…

Well, at least it’s faster than before, even if not as fast as any other natively compiled language I’ve tested yet (even OCaml, oddly)… ^.^;

AstonJ · 3 May 2021 17:03

Thanks ODL - that’s quite a bit faster!

$ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  0.83s user 0.29s system 119% cpu 0.936 total

$ time ./bin/crystal_fib 
701408732
./bin/crystal_fib  2.62s user 0.01s system 92% cpu 2.840 total

Your code from here is actually a bit slower

$ time shards build --release
Dependencies are satisfied
Building: crystal_fib
shards build --release  9.01s user 0.39s system 102% cpu 9.199 total

$ time ./bin/crystal_fib     
701408732
./bin/crystal_fib  2.68s user 0.02s system 93% cpu 2.901 total