Ruby Structs

CommunityNews · 15 January 2022 03:55

Ruby’s Struct is one of several powerful core classes which is often overlooked and under utilized compared to the more popular Hash class. This is a shame and I’m often surprised when working with others who don’t know about structs or, worse, abuse them entirely. I’d like to set the record straight by sharing the joy of structs with you and how you can leverage their power to improve your Ruby code further.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

bot · 15 January 2022 03:55

Corresponding tweet for this thread:

Share link for this tweet.

OvermindDL1 · 18 January 2022 17:17

So why do OpenStructs exist then? What’s their purpose that can’t be done better using one of the other ways that aren’t susceptible to huge security issues and performance loss?

AstonJ · 18 January 2022 17:52

One is an Object and one is a Class - so I am guessing convenience over performance (I’ve never actually needed to use a Ruby Struct myself).

Struct vs OpenStruct

The difference between Struct & OpenStruct:

Struct creates a new class with predefined attributes, equality method (==) & enumerable

OpenStruct creates a new object with the given attributes

An OpenStruct is a fancy Hash object, while a Struct is like creating a new class from a template.

Video from that page:

OvermindDL1 · 18 January 2022 17:53

From what I read the OpenStruct dynamically generates functions for accessed fields, but that just sounds like __getattr__ method in python, which doesn’t sound anywhere near that slow in comparison?

But still, why does OpenStruct exist if a class can do what it does anyway, but faster?

AstonJ · 18 January 2022 18:06

It can’t… OpenStruct is ‘open’ - meaning you can add attributes on the fly, whereas with Structs they have to be pre-defined in the class…

OvermindDL1 · 18 January 2022 18:07

Except can’t you do that with classes too with the method_missing and such methods to do whatever anyway?

AstonJ · 18 January 2022 18:16

Not if the method already exists

OvermindDL1 · 18 January 2022 18:17

Which would gain speed back, which is then even better, yes? ^.^

But still, what purpose does OpenStruct that isn’t done better by something else? Like why would someone most definitely use it over another way, or even just a hash?

AstonJ · 18 January 2022 18:22

From SO:

OpenStruct objects are useful when you need something to fit a certain method call interface (i.e. send in a duck-typed object responding to #name and #value ), or when you want to encapsulate the implementation details, but also want to avoid over-engineering the solution. They also make an awesome stub object, and I often use them in place of framework stubs when I don’t need the overhead of a stub/mock.

You’ll just have to get over your dislike of Ruby and write a few apps in it ODL - just so you can see why so many people love it

OvermindDL1 · 18 January 2022 18:58

It’s not ruby, it’s buggy dynamically typed languages without any prechecks so a whole whole ton of classes of bugs can and do sneak in to code that you don’t find out until you are actually running the code one way or the other. ^.^;

Python very much included here.

OCaml less so, lol.

OvermindDL1 · 18 January 2022 20:14

You know, I’m curious how it compares to python now that I thought of it, let’s see:

Ruby 3.1.0:

Calculating -------------------------------------
               Array     11.877  (± 0.0%) i/s -     60.000  in   5.052928s
                Hash      5.975  (± 0.0%) i/s -     30.000  in   5.028904s
              Struct      2.949  (± 0.0%) i/s -     15.000  in   5.087194s
          OpenStruct      0.052  (± 0.0%) i/s -      1.000  in  19.335931s
               Class      2.068  (± 0.0%) i/s -     11.000  in   5.320189s

Comparison:
               Array:       11.9 i/s
                Hash:        6.0 i/s - 1.99x  (± 0.00) slower
              Struct:        2.9 i/s - 4.03x  (± 0.00) slower
               Class:        2.1 i/s - 5.74x  (± 0.00) slower
          OpenStruct:        0.1 i/s - 229.66x  (± 0.00) slower

mruby 3.0.0

❯ ruby ./snippet.rb
trace (most recent call last):
./snippet.rb:6: undefined method 'require' (NoMethodError)

Uhhh, mruby does not appear to be a ruby drop-in replacement, that’s… disconcerting… o.O
And I still can’t get any of the other 4 ruby JIT’s/runtimes that I have here even running at all… Everything is so easy in python land in comparison…

And this code for the python benchmark:

#! /usr/bin/env python

import timeit
from dataclasses import dataclass

MAX = 1_000_000
WARMUP_TIMES = 2
REPEAT_TIMES = 5

class ExampleClass:
    def __init__(self, to: str, from_: str):
        self.to = to
        self.from_ = from_

@dataclass
class ExampleDataClass:
    to: str
    from_: str

def bench_array():
    global MAX
    for _ in range(MAX): ["Mork", "Mindy"]

def bench_hash():
    global MAX
    for _ in range(MAX): {"to": "Mork", "from": "Mindy"}

def bench_class():
    global MAX
    for _ in range(MAX): ExampleClass("Mork", "Mindy") # or ExampleClass(to: "Mork", from_: "Mindy")

def bench_data_class():
    global MAX
    for _ in range(MAX): ExampleDataClass("Mork", "Mindy") # or ExampleDataClass(to: "Mork", from_: "Mindy")

def bench(func):
    global MAX, WARMUP_TIMES, REPEAT_TIMES
    print("\nWarming up: " + func.__name__)
    for i in range(WARMUP_TIMES):
        func()
    print("Timing: " + func.__name__)
    result = timeit.Timer(func, globals={MAX: MAX}).repeat(
        repeat=5,
        number=REPEAT_TIMES)
    print(f"Result for {func.__name__}: {repr(result)}")
    return result

def main():
    global MAX
    benches = [bench_array, bench_hash, bench_class, bench_data_class]
    results = [(func.__name__, bench(func)) for func in benches]
    fastest = [(name, min(result)) for (name, result) in results]
    fastest = sorted(fastest, key=lambda t: t[1])
    t = fastest[0][1]
    print("\nResults:")
    for (name, result) in fastest:
        print(f"{name} {(1 / result):.3f} i/s in {(result / t):.3f} of fastest {result}s")

if __name__ == "__main__":
    main()

Python 3.10.0

Results:
bench_array 2.702 i/s in 1.000 of fastest 0.3699605700094253s
bench_hash 1.522 i/s in 1.775 of fastest 0.6566669140011072s
bench_class 0.393 i/s in 6.863 of fastest 2.53908338106703s
bench_data_class 0.392 i/s in 6.887 of fastest 2.547773462953046s

Slower than ruby on average, used to be faster, wonder if ruby runs the JIT by default now…

Oh yeah, ruby has a JIT now doesn’t it? Needs some argument passed in as it’s not automatic as of 2.6 (but is that changed for version 3?)… found it:

Ruby 3.1.0 JIT

Calculating -------------------------------------
               Array     12.203  (± 0.0%) i/s -     61.000  in   5.001542s
                Hash      5.736  (± 0.0%) i/s -     29.000  in   5.077485s
              Struct      2.305  (± 0.0%) i/s -     12.000  in   5.208924s
          OpenStruct      0.054  (± 0.0%) i/s -      1.000  in  18.596932s
               Class      1.691  (± 0.0%) i/s -      9.000  in   5.324369s

Comparison:
               Array:       12.2 i/s
                Hash:        5.7 i/s - 2.13x  (± 0.00) slower
              Struct:        2.3 i/s - 5.29x  (± 0.00) slower
               Class:        1.7 i/s - 7.22x  (± 0.00) slower
          OpenStruct:        0.1 i/s - 226.94x  (± 0.00) slower

That’s… not faster?! So I’m guessing the JIT was already on, that explains why it was faster than python.

Ruby 3.1.0 YJIT

What about the new Yjit that just came out for ruby:

Calculating -------------------------------------
               Array     11.862  (± 0.0%) i/s -     60.000  in   5.061451s
                Hash      5.906  (± 0.0%) i/s -     30.000  in   5.089482s
              Struct      2.899  (± 0.0%) i/s -     15.000  in   5.173745s
          OpenStruct      0.052  (± 0.0%) i/s -      1.000  in  19.108493s
               Class      2.012  (± 0.0%) i/s -     11.000  in   5.468258s

Comparison:
               Array:       11.9 i/s
                Hash:        5.9 i/s - 2.01x  (± 0.00) slower
              Struct:        2.9 i/s - 4.09x  (± 0.00) slower
               Class:        2.0 i/s - 5.90x  (± 0.00) slower
          OpenStruct:        0.1 i/s - 226.66x  (± 0.00) slower

Hmm, barely faster in some ways, barely slower in others…

Still can’t get mruby and the others working either… >.>

Hmm, what about python’s JIT’s:

GraalPython 21.3.0

Results:
bench_class 8735.840 i/s in 1.000 of fastest 0.00011447095312178135s
bench_data_class 7911.958 i/s in 1.104 of fastest 0.00012639095075428486s
bench_array 3452.217 i/s in 2.531 of fastest 0.0002896689111366868s
bench_hash 1.802 i/s in 4847.337 of fastest 0.5548792550107464s

That… is not only a whole lot faster, that made classes and dataclasses even faster than the primitive stuff! o.O

I’m pretty sure that the array, class, and data class ones got optimized out though, because those times are almost nop's… Hard thing about benchmarking after all! And very hard to fix running on graal since it’s so aggressive on that, so what about PyPy, the more traditional python JIT:

PyPy 3.8

Results:
bench_class 319.380 i/s in 1.000 of fastest 0.0031310629565268755s
bench_data_class 318.421 i/s in 1.003 of fastest 0.0031404918991029263s
bench_hash 317.940 i/s in 1.005 of fastest 0.003145241062156856s
bench_array 212.141 i/s in 1.506 of fastest 0.004713836940936744s

Still so much faster. Wonder why Ruby’s JIT’s work so very poorly… o.O

I wonder just what JIT work is happening to get the python hash even faster than python arrays?!

Hmm, should probably have benched Python’s tuples, I’d imagine them to be about the same as an array, maybe a tiny touch faster but meh…

What about OCaml since I often hop to it instead of python or other dynamic languages, using this code (I wrote this code significantly faster than the python one… and it worked first try), using the standad ocaml statistical benchmarker so output is a bit different, I’ll try to convert:

(* open Core *)
open Core_bench

let max_count = 1_000_000

module TheMap = Map.Make(String)

type the_struct = {to_: string; from: string}

class the_object to_ from = object (self)
  val to_ = to_
  val from = from
end

let () =
  Core.Command.run (Bench.make_command [

    Bench.Test.create ~name:"tuple" (fun () -> "Mork", "Mindy");
    Bench.Test.create ~name:"array" (fun () -> ["Mork"; "Mindy"]);
    Bench.Test.create ~name:"immutable map" (fun () ->
        TheMap.empty
        |> TheMap.add "to" "Mork"
        |> TheMap.add "from" "Mindy"
    );
    Bench.Test.create ~name:"mutable map" (fun () ->
        let hash = Hashtbl.create 2 in
        let () = Hashtbl.add hash"to" "Mark" in
        let () = Hashtbl.add hash "from" "Mindy" in
        hash
    );
    Bench.Test.create ~name:"record" (fun () -> {to_="Mork"; from="Mindy"});
    Bench.Test.create ~name:"object" (fun () -> new the_object "Mork" "Mindy");

  ])

OCaml 4.13.1 REPL - Fully Interpreted (like ruby or python without JIT’s)

Estimated testing time 12s (6 benchmarks x 2s). Change using '-quota'.
┌───────────────┬──────────┬─────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │ Percentage │
├───────────────┼──────────┼─────────┼────────────┤
│ tuple         │  16.55ns │         │      4.89% │
│ array         │  16.52ns │         │      4.88% │
│ immutable map │ 216.65ns │  18.00w │     63.95% │
│ mutable map   │ 338.79ns │  30.00w │    100.00% │
│ record        │  16.56ns │         │      4.89% │
│ object        │  80.56ns │   5.00w │     23.78% │
└───────────────┴──────────┴─────────┴────────────┘

Or in instruction per second and ordered from fastest to slowest and divided by 1_000_000 to match the python and ruby that would be:

array → 60.532 i/s
tuple → 60.422 i/s
record → 60.386 i/s
object → 12.413 i/s
immutable map → 4.615 i/s
mutable map → 2.951 i/s

Even completely not JIT’d, just interpreted, OCaml is soooo much faster than ruby (even JIT’d) or normal python, this is the boon of just knowing what the types are (which you might have noticed that I barely if ever stated at all in the ocaml code, it’s able to infer it).

OCaml 4.13.1 compiled (compiles are basically instant, fast enough it’s hard to time from the shell, far less than 1 second)

┌───────────────┬──────────┬─────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │ Percentage │
├───────────────┼──────────┼─────────┼────────────┤
│ tuple         │   2.61ns │         │      4.08% │
│ array         │   2.57ns │         │      4.03% │
│ immutable map │  21.34ns │  18.00w │     33.39% │
│ mutable map   │  63.92ns │  30.00w │    100.00% │
│ record        │   2.58ns │         │      4.03% │
│ object        │  19.51ns │   5.00w │     30.53% │
└───────────────┴──────────┴─────────┴────────────┘

Or in instruction per second and ordered from fastest to slowest and divided by 1_000_000 to match the python and ruby that would be:

array → 389.105 i/s
record → 387.596 i/s
tuple → 383.141 i/s
object → 51.255 i/s
immutable map → 46.860 i/s
mutable map → 15.644 i/s

So yeah, that’s basically as fast as such code could run before it starts getting optimized out, maybe could get a little faster by referencing string constants and such without an indirection, but that’s very much micro-optimizing at that point.

EDIT: For note, in OCaml, a record is like a dataclass in python, has a predefined set of fields, where object is more ‘fluffy’ (but still entirely type safe, technically objects are row-typed records, kind of a compile-time mutable but runtime immutably keyed hashmap)

Maartz · 19 January 2022 16:28

Same. I’ve never used Ruby Structs… I learn about them like a week ago in this post from Alchemist up above.

Also never encountered it in any codebase so far…

OvermindDL1 · 19 January 2022 16:31

I can definitely see use of ruby structs, but they do seem very oddly ‘heavy’, because they are basically just OCaml records, just with named indices instead of indexes into a tuple, and in ocaml even in fully interpreted mode records are still the same speed as arrays and tuples, so it seems weird that they are so costly unless the ruby compilation-to-bytecode step just isn’t able to figure such things out?

What is utterly flabbergasting to me is how ineffectual the ruby JIT is on such code, you’d think this would be precisely the kind of thing that could be extremely easily monomorphized into trivial machine instructions.

Maartz · 19 January 2022 16:44

True!

Finally, we have OpenStruct which is part of Ruby Core as well. At this point, you might be thinking: “Hey, looks like an OpenStruct is better in terms of melding Struct and Classsyntax and functionality.” Well, you’d be very wrong in terms of performance but, as mentioned with Class usage, I promise to expand upon this more later.

Warming up --------------------------------------

               Array     2.000  i/100ms
                Hash     1.000  i/100ms
              Struct     1.000  i/100ms
          OpenStruct     1.000  i/100ms
               Class     1.000  I/100ms

Calculating -------------------------------------

               Array     23.854  (± 0.0%) i/s -    120.000  in   5.030763s
                Hash      8.908  (±11.2%) i/s -     45.000  in   5.072652s
              Struct      4.798  (± 0.0%) i/s -     24.000  in   5.005263s
          OpenStruct      0.178  (± 0.0%) i/s -      1.000  in   5.618325s
               Class      4.230  (± 0.0%) i/s -     22.000  in   5.203019s

Comparison:

               Array:       23.9 i/s
                Hash:        8.9 i/s - 2.68x  (± 0.00) slower
              Struct:        4.8 i/s - 4.97x  (± 0.00) slower
               Class:        4.2 i/s - 5.64x  (± 0.00) slower
          OpenStruct:        0.2 i/s - 134.02x  (± 0.00) slower

Still it seems that Structs are faster, but if it was, why is not idiomatic to use them ?
And use a class for “singleton instance” and struct for the rest like in Swift.