I know that these benchmarks might not paint an exact picture of real-world scenarios, but I would still expect a Rust web framework to perform a lot better, even in these benchmarks.
What is the reason behind Rust’s web framework, Rocket, not performing as well as expected in the Techempower benchmarks?
Part of the answer might be this GitHub comment by Rocket’s primary maintainer, especially the second item:
There are three reasons: [why Rocket was not part of the Techempower benchmarks at the time]
- I personally believe that developers tend to misplace trust in benchmarks. As such, it might be a good thing that Rocket isn’t on there. Perhaps then the decision of which framework to choose can be based on what’s really important: ergonomics, productivity, security, and correctness.
- Rocket doesn’t and won’t ever cater to benchmarks. That is, we won’t make design decisions or changes to Rocket just to be faster at a particular benchmark without further justification.
- No one’s written/submitted to TechEmpower.
Of course, it would still be interesting to see what makes Rocket less efficient in these particular microbenchmarks relative to other frameworks, especially the others also written in Rust.
I think we need better benchmarks (something we have actually thought about and may do as a sister project ;-))
This might turn into a huge endeavour though. There’s a fine line between “a fair benchmark where the people who know the framework well can make the app work really fast” and “people submitting hand-optimised assembly just to win the benchmark”. That would require careful manual curation. Not sure you’d want to swim in that particular swamp.
You know us… we like to do things differently
The focus would be on real-world meaningful benchmarks, so tricks to cook the results would be limited
First of all, I don’t trust TechEmpower: far too many of the libraries in it do questionable things that would never be done in production.
Second, last I checked, the rocket.rs code it uses forces each request to arrive on a new TCP connection, which makes it ‘look’ significantly slower than most other libraries (the fact that it’s still as fast as it is, is quite impressive, though). That said, to simulate more ‘actual’ web usage, all testing should be done with a new TCP connection each time.
Third, the rocket.rs version they are using still uses the old, pre-async version of hyper, which, though still fast, is nowhere near as fast as modern hyper (which is used by the nightly version of Rocket — the one most people use anyway until the next big release).
That’s what so many of the libraries in the benchmarks do, though: they aren’t even remotely oriented toward anything resembling real-world use, only optimised for benchmarks.
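The keep-alive point above (reusing a connection versus opening a new TCP connection per request) is easy to demonstrate outside any web framework. Below is a toy Python sketch — not the TechEmpower harness, just an illustration — that times N requests against a throwaway local server, first over one reused keep-alive connection and then with a fresh connection each time. Exact numbers depend entirely on the machine; the per-request handshake overhead is the effect being described.

```python
import http.client
import http.server
import threading
import time

class QuietHandler(http.server.SimpleHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive on the server side

    def log_message(self, *args):  # silence per-request logging
        pass

# Tiny throwaway server on a random free port
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

N = 100

# One reused keep-alive connection: a single TCP handshake for all N requests
conn = http.client.HTTPConnection("127.0.0.1", port)
t0 = time.perf_counter()
for _ in range(N):
    conn.request("GET", "/")
    conn.getresponse().read()  # must drain the body before reusing the socket
reused = time.perf_counter() - t0
conn.close()

# A fresh connection per request: N TCP handshakes
t0 = time.perf_counter()
for _ in range(N):
    c = http.client.HTTPConnection("127.0.0.1", port)
    c.request("GET", "/")
    c.getresponse().read()
    c.close()
fresh = time.perf_counter() - t0

server.shutdown()
print(f"keep-alive: {reused:.4f}s, new connection per request: {fresh:.4f}s")
```

A benchmark where some frameworks are hit over reused connections and others over fresh ones is measuring this overhead as much as the frameworks themselves, which is the unfairness being complained about.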
In that case, as @AstonJ said,
Also, the Rocket team should now jump in and fix the issues you mentioned, because the TechEmpower benchmarks are trusted by many even if they have flaws.
Yep… the horrible thing is that they get away with it. But how can we improve that situation?
It’s a social problem, not a technical one, so I’m not sure a random bunch of enthusiasts can help much. Maybe if we roll our own benchmarking harness – but that would entail a huge amount of voluntary technical work, and we would need people who mercilessly curate the submitted code so that it doesn’t just game the benchmark metrics.
It will require a lot of work, but the benefits will be endless: benchmarks where you can be sure the results align with real-world scenarios.
@AstonJ, one more thing. If there were some kind of measurement of the time spent and work that went into making each version of the (toy) app for those benchmarks, that would be a plus. A distributed, fault-tolerant system/app on Drogon (C++) or Actix (Rust) might be the fastest, but doing it right (or even just doing it) will require a lot of work and time compared to doing it in Phoenix.
Great points everyone! (I wonder if we should split these into a dedicated thread about a new benchmarking system?)
OK, so the benchmarks I have in mind would be based on real-world apps, split into tasks such as a user registration system, i.e. something from the real world. Each submission would then be measured and rated.
There would be two types of submission: the first would need to follow a very strict spec (e.g. must use Postgres); the second could use any technology, so long as it could serve the intended purpose in a live environment. (So the number of errors/failures would factor into scoring, and too many would render the submission ‘unfit for purpose’ and thus disqualified.)
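As a concrete (entirely hypothetical) sketch of that “unfit for purpose” rule: a scorer could disqualify any run whose failure rate exceeds a threshold and otherwise weight throughput down by the observed failures. The threshold, the weighting, and the function itself are illustrative, not anything actually proposed in the thread.

```python
from typing import Optional

# Illustrative threshold: more than 1% failed requests = unfit for purpose.
MAX_ERROR_RATE = 0.01

def score(requests_per_sec: float, total: int, failures: int) -> Optional[float]:
    """Return a score for a benchmark run, or None if it is disqualified."""
    if total == 0 or failures / total > MAX_ERROR_RATE:
        return None  # disqualified: too many errors for live use
    # Weight throughput down by the observed failure rate.
    return requests_per_sec * (1 - failures / total)

print(score(50_000, 1_000_000, 500))     # 0.05% failures → scored
print(score(80_000, 1_000_000, 50_000))  # 5% failures → None (disqualified)
```

The point of a rule like this is that a fast-but-flaky submission cannot win: raw throughput only counts once the submission behaves the way a production deployment would have to.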
The benchmarks would live at benchmarks.devtalk.com and, of course, in true DT fashion, there would be some twists and other cool things to make them even better.
Does that sound interesting to you all?
We probably should. Excuse my cynicism and scepticism: I am simply looking at it from the angle of my partially burned-out and eternally tired self, and I know I wouldn’t start such an effort. But if several people start it off, build a reliable benchmark harness (with proper isolation, so likely through Docker), and add several language/framework combos, then I am sure many others will join in eventually.