erlang:master ← garazdawi:beamasm
opened 12:02PM - 11 Sep 20 UTC
This PR introduces BeamAsm, a JIT compiler for the Erlang VM.
## Implementation
BeamAsm provides load-time conversion of Erlang BEAM instructions into native code on x86-64. This allows the loader to eliminate any instruction dispatching overhead and also to specialize each instruction based on its argument types.
BeamAsm does not do any cross-instruction optimizations, and the x and y register arrays work the same as when interpreting BEAM instructions. This allows the Erlang run-time system to remain largely unchanged, except in places that need to work with loaded BEAM instructions, such as code loading, tracing, and a few others.
BeamAsm uses [asmjit](https://github.com/asmjit/asmjit) to generate native code at run-time. Only small parts of the [Assembler API](https://asmjit.com/doc/group__asmjit__assembler.html) of [asmjit](https://github.com/asmjit/asmjit) are used. At the moment [asmjit](https://github.com/asmjit/asmjit) only supports x86 32/64-bit assembly, but work is ongoing to also support ARM 64-bit.
For a more lengthy description of how the implementation works, you can view the [internal documentation of BeamAsm](https://erlang.org/doc/apps/erts/BeamAsm.html).
## Performance
How much faster is BeamAsm than the interpreter? That will depend a lot on what your application is doing.
For example, the number of Estones as computed by the [estone benchmark suite](https://github.com/erlang/otp/blob/master/erts/emulator/test/estone_SUITE.erl) becomes about 50% larger, meaning about 50% more work can be done during the same time period. Individual benchmarks within the estone suite vary from a 170% increase ([pattern matching](https://github.com/erlang/otp/blob/master/erts/emulator/test/estone_SUITE.erl#L493-L596)) to no change at all ([huge messages](https://github.com/erlang/otp/blob/master/erts/emulator/test/estone_SUITE.erl#L481-L490)). So, not surprisingly, computation-heavy workloads can show quite a large gain, while communication-heavy workloads remain about the same.
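As a rough, hypothetical illustration of that distinction (the `jit_bench` module below is made up for this post, not part of the estone suite), a tight pattern-matching-and-arithmetic function is exactly the kind of code where per-instruction specialization pays off; timing it with `timer:tc/1` under both emulators shows the gap:

```erlang
%% jit_bench: a made-up, computation-heavy micro-benchmark.
%% Recursive fib spends all its time in pattern matching and
%% arithmetic, the kind of work where BeamAsm's specialized
%% instructions dominate over interpreter dispatch.
-module(jit_bench).
-export([run/0]).

fib(0) -> 0;
fib(1) -> 1;
fib(N) when N > 1 -> fib(N - 1) + fib(N - 2).

run() ->
    %% timer:tc/1 returns {MicroSeconds, Result}.
    {Micros, _} = timer:tc(fun() -> fib(30) end),
    io:format("fib(30) took ~p microseconds~n", [Micros]).
```

A message-passing-heavy version of the same experiment would show far smaller differences, since most of its time is spent outside the generated code.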
If we run the JSON benchmarks found in [Poison](https://github.com/devinus/poison/tree/master/bench) or [Jason](https://github.com/michalmuskala/jason/tree/master/bench), BeamAsm achieves anything from a 30% to a 130% increase (averaging about 70%) in the number of iterations per second across all the Erlang/Elixir implementations. For some benchmarks, BeamAsm is even faster than the pure C implementation [jiffy](https://github.com/davisp/jiffy).
More complex applications tend to see a more moderate performance increase; for instance, RabbitMQ is able to handle 30% to 50% more messages per second, depending on the scenario.
## Profiling/Debugging
One of the great things about executing native code is that some of the utilities used to profile C/C++/Rust/Go can also be used to profile Erlang code, for instance `perf` on Linux.
There are more details in the [internal documentation of BeamAsm](https://erlang.org/doc/apps/erts/BeamAsm.html) on how to achieve this.
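As a sketch of what that looks like in practice (check the linked docs for the exact flags in your build; `+JPperf true` is the emulator option described there for having the JIT emit the symbol information `perf` needs, and `jit_bench` is the hypothetical module from above):

```
$ perf record -- erl +JPperf true -noshell -eval 'jit_bench:run().' -s init stop
$ perf report
```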
## Drawbacks
Loading native code uses more memory. We expect the loaded code to be about 10% larger when using BeamAsm than when using the interpreter.
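If you want to check the overhead on your own system, the standard `erlang:memory(code)` call reports the number of bytes used for loaded code, so you can run the same application under the interpreter and under BeamAsm and compare (the figure below is just example output):

```
1> erlang:memory(code).
13615568
```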
This PR includes a major rewrite of how the Erlang code loader works. The new loader does not include HiPE support, which means that it will not be possible to run HiPE-compiled code in OTP-24.
We are still looking for anyone who wants to maintain HiPE so that it can continue to push the boundary on what high-performance Erlang looks like.
## Try it out!
We are looking for any feedback you can provide about the functionality and performance of BeamAsm. To compile it you need a relatively modern C++ compiler and an operating system that allows memory to be executable and writable at the same time (true of most OSs, but not OpenBSD).
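Once built, you can confirm which emulator you are actually running: in recent OTP builds, `erlang:system_info(emu_flavor)` returns the flavor, `jit` for a BeamAsm build and `emu` for the interpreter:

```
1> erlang:system_info(emu_flavor).
jit
```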
If you are on Windows you can download installers here:
* [OTP BeamAsm win32](https://github.com/erlang/otp/releases/download/OTP-24.0.2/otp_win32_24.0.2.exe)
* [OTP BeamAsm win64](https://github.com/erlang/otp/releases/download/OTP-24.0.2/otp_win64_24.0.2.exe)
Note that these are built using our internal nightly tests, so they contain more changes than this PR includes.
It's amazing! But it says that Erlang will use more memory after this, and it already uses a lot of memory compared to Go or Node, or even Ruby.
(Of course, the BEAM does a lot more work, which accounts for the higher memory use.)
AstonJ
12 September 2020 02:56
When would that be an issue, DG?
It may become an issue when the amount of available memory is small, as on Heroku, Gigalixir, or some AWS/GCP instances.
AstonJ
12 September 2020 16:15
I thought that's what you might have meant.
Personally, I don't think that is a worry for the target users of Elixir/Erlang: those who need massive scale for very busy/large projects. RAM gets cheaper over time too, so I'm sure more RAM will be reflected in cloud services at the same price point sooner or later.