Please introduce yourself.
Hi, my name is Sean Moriarity. I am the author of Genetic Algorithms in Elixir. I’m originally from Philadelphia, Pennsylvania. I’m a huge Philly sports fan—fan of sports in general. And I’m a programmer. I’m a big fan of Elixir and functional programming, machine learning, a lot of other things.
What is the journey that took you into the world of Elixir and the larger world of Nx and other Elixir-related libraries?
Actually, I tell this story a lot now. Originally when I was in school, we were learning Scala, which is a half-functional language. I think it’s called object-functional. I got pretty interested in functional programming and how to do things functionally over object-oriented approaches. I was looking for something new to learn because I didn’t necessarily love Scala. I thought it was decent, but I didn’t love it. I was on Quora one day, and I saw someone ask the question, “What is the best language for building websites or doing backend development?” Someone said Elixir, and they did a really good job of basically selling me on the language. I got really into Elixir from that. That was probably around five or six years ago when that happened.
Then the journey into Nx started when I was mostly interested in doing numerical computing, mathematics, and machine learning, because those were just the things I was interested in doing programming-wise. I decided that I was going to start to write those libraries in Elixir, despite all of the warnings about the BEAM (Bogdan’s Erlang Abstract Machine, the virtual machine underlying Elixir) not being good for numerical computing and heavy computations. I thought the language was really cool. I figured it would be cool to write some fun libraries and see where they could go.
I ended up writing this library called Genex, which is basically genetic algorithms in Elixir. It was really just a toy project that did some pretty cool things: you could solve some interesting problems with it. It was a fun project for me, and I wondered if other people in the community might be interested in it.
So I decided to fire off an application to publish a book about genetic algorithms in Elixir to The Pragmatic Bookshelf. I connected with Dave Rankin, who’s now the CEO of The Pragmatic Bookshelf. He helped me walk through and massage my application for the publishing committee. They eventually accepted the book that would become Genetic Algorithms in Elixir.
From there, some people in the community were interested in seeing whether or not I would be interested in writing some machine learning libraries. Brian Cardarella, the CEO of DockYard, reached out to me. He thought I might be a good candidate to help Elixir Creator José Valim out with the machine learning initiatives he wanted to eventually do in the future. We connected, and eventually that became the initial efforts for what is now known as the Nx Project. That was probably about two years ago—the Nx Project started, and we’ve been working on them basically ever since.
When I hear Elixir, I usually think “web server.” How do you get from there to math?
That’s a good question. It’s actually a question we face a lot. I like to say that data scientists and machine learning folks are the most stubborn programmers on the planet. They can be reluctant to try anything new. The criticism we hear a lot of the time is, “Hey, Elixir is good for this. What’s the point of venturing into something else?” I don’t necessarily know if I have an amazing answer for the question.
To me, Elixir is aesthetically just a beautiful language. From the functional perspective, it jibes very well with the computations you’re doing mathematically. It’s great for expressing a lot of what you’ll read in machine learning papers. The algorithms are expressed really well functionally, and maybe not so well in other languages that have mutability and some other things.
I think there are a lot of things that Elixir does right in terms of being able to write idiomatic programs for machine learning and mathematics. Its strength as a language for web development and deployments is also a reason for why people should use it for machine learning and mathematics.
While it’s cool that you can do some neat machine learning theory work with whatever language you want, at the end of the day, your objective is to deploy that somewhere to solve a real problem. The deployment story in other ecosystems is not necessarily as strong as the proven deployment story in the Elixir ecosystem.
When you really need to deploy a mission-critical, fault-tolerant application, Elixir is the way to go. An ecosystem that lets you do both machine learning and web or backend development is really a compelling thing to have.
What qualities make a language beautiful? How does a beautiful language affect the joy and clarity of the code that you produce?
That’s a really good question. I think about this a lot because I see programming being as much an art as it is a science. I’m not much of an algorithms and data structures savant. But I do really, really like writing code. I can appreciate cleverness and beautiful code. I think describing what makes a language beautiful is a lot like asking an artist what makes a painting beautiful to them. I think some people might laugh at that. But that’s how I see it. I write code a lot more like an artist would paint. I don’t write code just because I have to; I genuinely enjoy doing it. But I also have periods where I just don’t enjoy writing code.
My life cycles; my creativity ebbs and flows; my desire to write code ebbs and flows. It’s hard to give a concrete answer to what makes a language beautiful, because it’s one of those things—you just know it when you see it or when you feel it.
Elixir is very much a language where you have immutability and a functional approach. How does that raise you up and support you as a programmer?
I think there are a lot of ways. Having worked on some larger applications in languages that have mutability, I know how annoying it can get to track global state and follow the execution trace of a program. When you’re debugging large applications that have global state and mutability, the smallest thing can invite disaster. With something like Elixir that offers immutability, I personally think functional programming is just easier to reason about.
If you write a good composable functional program, it should be relatively easy for you to reason about what the application is doing at certain times and to reason about the state that the application is in. Then in turn, that just helps you write more maintainable code.
Functional programming has gotten a lot of attention in the software engineering community over the last, I don’t know, five, ten years.
A lot of people start off in functional or semifunctional languages like Scala, like you did, or Haskell. In terms of high-performance computing, how does functional programming help create code that is correct and performant?
I think the biggest thing here is that functional programming helps you write efficient parallel or concurrent code in a correct way. That’s obviously the big thing with Elixir. As I mentioned before, reasoning about state, especially when a system is very parallelized, can be difficult. That’s why immutability and some of the constructs you see in languages like Haskell, Elixir, and Erlang are important.
I think a good example of this is Apache Spark with Scala. There are also some array programming languages out there that adopt this functional paradigm. I think Futhark is technically a functional language; I wouldn’t marry myself to that statement, but I believe it somewhat is.
I think a lot of languages now are adopting these functional constructs because they’ve been proven to help people write maintainable code. On the opposite end of the spectrum, one of the issues that arises with immutability is performance. A lot of these languages are reference counted or garbage collected, and not having access to explicit memory management has an impact on performance. But we’re getting better at writing compilers and optimizing things in such a way that it doesn’t really matter that much anymore. Having the ability to write programs that are easy to understand, easy to maintain, and also highly parallelizable is a significant win for everyone.
What is the mission statement for Nx?
I don’t know if we necessarily have a concrete mission statement. I don’t think we’ve gone out and formalized anything. Personally I think the mission statement is to provide an experience in numerical computing and machine learning that is on par or better than the Python ecosystem and some of the other ecosystems out there.
One of the things that I’ve found is that, like I said, data scientists and machine learning people are stubborn. To convince people to make the switch over to Nx for their machine learning and data science projects, you have to convince them that your language, or your libraries, are at least two or three times better at something that is crucial to them.
I think a lot of people like myself value aesthetics and their own creative expression over just choosing the programming language that has the most utility. If we were all deciding what language to use based purely on rationality, we would probably still be using assembly. It got the job done well enough. Obviously programmers decide to do things based on more than just rationality and logic.
I think one of our jobs is really just to convince people that you can do the exact same things you can do in the Python ecosystem and other ecosystems in Elixir, and you can do them a lot better.
Can you name any of the features that Nx excels at and would be something that would entice those Python programmers?
I think writing production-grade servings (in the machine learning world, a serving is just a wrapper around doing inference for a model) is something you can do much quicker and with much better performance than in the Python ecosystem.
In the Python ecosystem, there are a ton of frameworks for doing model serving, but they all have their own spin on what model serving is and what it should encapsulate. In our ecosystem, it’s built natively into the library: you can create a serving and distribute it across multiple nodes, and if you have multiple GPUs on your machine, you can partition workloads between them. A serving also implements dynamic batching, which is an algorithm for handling concurrent, overlapping requests in a performant way.
You can do it in the Python ecosystem, but I don’t necessarily know if it’s as good as what we have right now in the Elixir ecosystem. Nx marries really well with Phoenix and some of the other frameworks in the Elixir ecosystem. It makes it really easy to train a model and then deploy a model to an application at scale very quickly.
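The idea behind dynamic batching can be sketched in a few lines of plain Elixir: collect overlapping requests, then run them through the model as one batch and reply to each caller. This is a simplified illustration, not Nx.Serving’s actual implementation (which also flushes partial batches on a timeout); `BatchServer` and its API are hypothetical names.

```elixir
defmodule BatchServer do
  use GenServer

  # Collect concurrent requests; flush them as one batch when full.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Callers block until their batch is executed and they get a reply.
  def predict(pid, input), do: GenServer.call(pid, {:predict, input})

  @impl true
  def init(opts) do
    {:ok,
     %{
       pending: [],
       batch_size: Keyword.fetch!(opts, :batch_size),
       # `model` is any function from a list of inputs to a list of outputs.
       model: Keyword.fetch!(opts, :model)
     }}
  end

  @impl true
  def handle_call({:predict, input}, from, state) do
    pending = [{from, input} | state.pending]

    if length(pending) >= state.batch_size do
      # Run the whole batch in a single inference call...
      ordered = Enum.reverse(pending)
      outputs = state.model.(Enum.map(ordered, &elem(&1, 1)))

      # ...then reply to each waiting caller with its own result.
      ordered
      |> Enum.zip(outputs)
      |> Enum.each(fn {{caller, _input}, out} -> GenServer.reply(caller, out) end)

      {:noreply, %{state | pending: []}}
    else
      {:noreply, %{state | pending: pending}}
    end
  end
end
```

Two concurrent `predict/2` calls from separate processes would be served by a single invocation of the model function.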
What is machine learning? And how do neural networks fit into this?
Hmmm. You have to answer the question of what a machine is. That gets a little philosophical. I wouldn’t go too in depth there.
There’s a definition from a computer scientist, a professor—I think he’s at Carnegie Mellon— named Tom Mitchell. His definition of learning is more or less when a system improves on some task through some experiences according to some metric. It might sound a lot like optimization, because it really is. Optimization and machine learning are essentially the same thing.
You choose a performance metric. In the case of image classification, your performance metric might be how accurately I can predict that an apple is a Granny Smith or a Red Delicious (I think those are types of apples). Obviously that’s easy for a human to do, because we can just look at the color; we can look at the features of the apple. But a computer can’t just do that. It’s not something you could just program into it, like, if the picture looks like this, then Granny Smith; if the picture looks like this, then Red Delicious. So we choose these performance metrics, and then we feed the system experiences. Those experiences could be images. In the case of something like text classification, they would be example texts. Then our system improves over time at that task.
Essentially, machine learning, to me, would be just a computer program that meets that definition of improving on some task through some experiences on some performance metric.
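That definition can be made concrete with a tiny example: the task is fruit classification, the experience is a list of labeled examples, and the performance metric is plain accuracy. `Metric` is a made-up module for illustration, not from any real library.

```elixir
defmodule Metric do
  # Accuracy: the fraction of predictions that match the true labels.
  # This is the "performance measure" in Mitchell's definition; a system
  # "learns" if this number improves as it sees more experiences.
  def accuracy(predictions, labels) do
    correct =
      Enum.zip(predictions, labels)
      |> Enum.count(fn {predicted, actual} -> predicted == actual end)

    correct / length(labels)
  end
end
```

For example, a model that labels two of three apples correctly scores `Metric.accuracy([:granny_smith, :red_delicious, :granny_smith], [:granny_smith, :granny_smith, :granny_smith])`, or about 0.67.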
My entire understanding of neural networks is something akin to, “It is magic.” There’s something about stimulus and response and self-organization.
The funny thing about neural networks is that if you asked even the most seasoned deep learning researcher what’s going on inside the training of a neural network, they probably couldn’t give you a good answer. A lot of times it just boils down to, “We’re not really sure why this worked as well as it did or why this happened.”
When you’re dealing with large models, you’re dealing with really high-dimensional spaces, a lot of complexity. Trying to boil down the scale of something like a large language model and what it’s doing and how it learns is a really, really difficult task.
The definition I like the most comes from François Chollet, who is the author of Deep Learning with Python. He’s a researcher at Google and the creator of the Keras library. He describes neural networks as learning hierarchical representations. A lot of the confusion that arises with neural networks comes precisely from the fact that people call them neural networks: a lot of people just assume that they work like the brain, and the brain is complex.
The “neural network” name comes from the fact that you have a bunch of matrices that, if you were to draw them out as a network, end up looking like a giant graph, a giant collection of neurons. Realistically, all it is is a bunch of linear algebra. You send an input through a bunch of transformations. These transformations are just layers, and they extract successive or hierarchical representations from the input.
I would input an image of, let’s say, my dog. One layer might do the work of extracting textures or edges. Another layer might do the work of extracting colors, the depth of the color, the brightness, and so forth, until I have multiple representations of this input, multiple extracted features, essentially. Then I can synthesize those to get an output and correctly classify that this is, in fact, a dog.
Now, the tricky thing is that that’s not necessarily exactly what’s happening every time you train a model in practice. It could identify some patterns or features in the data you have that don’t map to anything that you or I would understand, because that’s just the nature of working in such high dimensions.
But we like to use these sorts of convenient descriptions of what a neural network is doing to help people understand what’s going on. It really all just boils down to some very simple linear algebra and calculus.
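To make “just linear algebra” concrete, here is a single layer written by hand with plain Elixir lists: a weighted sum, a bias, and a ReLU nonlinearity. `TinyLayer` is an illustrative sketch; a real network would stack many such layers and use Nx tensors rather than lists.

```elixir
defmodule TinyLayer do
  # ReLU activation: max(0, x), applied elementwise.
  def relu(xs), do: Enum.map(xs, &max(&1, 0.0))

  # One dense layer: each row of `weights` is dotted with the input
  # vector, a bias is added, and the nonlinearity is applied. This is
  # the whole "neuron" story—a matrix-vector product plus a function.
  def dense(input, weights, bias) do
    weights
    |> Enum.map(fn row ->
      Enum.zip(row, input)
      |> Enum.map(fn {w, x} -> w * x end)
      |> Enum.sum()
    end)
    |> Enum.zip(bias)
    |> Enum.map(fn {z, b} -> z + b end)
    |> relu()
  end
end
```

With input `[1.0, 2.0]`, weights `[[0.5, -1.0], [1.0, 1.0]]`, and bias `[0.0, -0.5]`, the layer produces `[0.0, 2.5]`: the first unit’s weighted sum is negative, so ReLU clips it to zero.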
It’s difficult, again, to say exactly what is going on when you train a neural network, or what a neural network specifically learns, because even to this day, interpreting what these networks learn is an open and difficult research problem.
What math do you need to know to understand machine learning?
The beauty in all of this is that nowadays with the frameworks that we have, you don’t really need to understand any of the math behind the models that you’re implementing, because essentially all of it is just abstracted away. The days of someone needing to hand program a support vector machine in C or C++ are over.
You can just use one of the libraries out there that implements the algorithm for you. It takes care of the training for you. It takes care of the inference for you. Understanding the math is kind of just a nice boost. You might have some idea of what might be going on.
I think understanding the math is important because it will help you in some ways correctly apply different algorithms and understand why certain algorithms might fail in certain cases. But to get started, you can get started without having any understanding of linear algebra or calculus or anything that’s going on. A lot of the libraries operate at such a high level that the details are more or less abstracted away from you.
At one point in your book, you talk about the language of data. Could you explain that?
Linear algebra is often nicknamed “the language of data.” That’s why it’s important. Linear algebra gives us rules to understand, manipulate, and represent real-world data. It is a really fundamental part of machine learning. But once again, it’s one of those things that’s abstracted away a lot of the time for the tasks you would end up doing in machine learning anyway.
What is Axon? Also, what is deep learning?
Axon is a library for creating and training neural networks, and doing deep learning generally, in Elixir. Deep learning is a class of machine learning that deals particularly with neural networks. The deep aspect of a deep learning model refers to the successive layers in a neural network. This was one of the things that confused me when I was first learning: How could a two-layer neural network be considered a deep learning model? It’s not necessarily deep, right? But it turns out there’s really no concrete definition of what makes a model deep. Any neural network that has at least one hidden layer is considered a deep learning model.
The wording for “deep” actually has an interesting story behind it. Back in the ’80s or ’90s, before they gained popularity, the deep learning folks called themselves connectionists, because their models had a bunch of connections. If you were to draw them out, they look densely connected.
When connectionists would submit papers to academic conferences, they would get denied because these machine learning conferences didn’t necessarily respect their techniques. They thought the techniques weren’t grounded in mathematical principles, and they would really only allow one or two deep learning–based papers to be published at these conferences.
At one of the conferences, I’m not exactly sure which one, maybe sometime in the ’90s or early 2000s, they decided that they were going to rebrand themselves as “deep learning” to build a stronger reputation around the techniques they were using. The rest is pretty much history in terms of how people use that term.
What are the strengths of using connectionism or deep learning in terms of development?
Deep learning excels at problems that are difficult to express with traditional logic. That boils down to dealing with unstructured data. If you have some nondeterministic, unstructured input that needs to be processed in some way, then a deep learning model can do that really well for you. The quintessential example is image classification. I like to use this one a lot.
For example, if I showed you a picture of a dog and asked you, “What is this picture of?” you would be able to say it’s a dog. Then if I asked, “Why is it a dog?” you would say, well, it has a tail. It’s got fur. It’s got four legs. You could describe it. But if I then asked, “OK, now can you write a program that tells me whether or not a picture is of a dog?” that’s something that’s really, really difficult to express with the instructions you have in any programming language. In that case, you need to dig for something a little bit deeper, something that trades off a little bit of determinism for the ability to tell you that the picture is a dog, or whatever else it is. Deep learning excels in these cases where it’s difficult for somebody to express what a computation should do with formal logic.
Where do the features come from?
A lot of times, it happens implicitly in the model. One of the strengths of deep learning is that it’s able to extract or do the feature engineering for you. It can extract these representations automatically. Depending on the type of model you use, you might have to do what’s called preprocessing or transformations on the data before you feed it into your model.
Axon deals specifically with the process of creating and training a neural network. It doesn’t necessarily deal with any of the transformations that happen on data; that would happen with Nx. We have some Nx image processing libraries with algorithms that would do some of this feature extraction for you.
What exactly is a training process?
The training process is essentially just the process where you feed a bunch of data into a model. The model adjusts its parameters or whatever it is that it’s learning. It adjusts its representations over time. For a neural network, what that looks like is gradient descent. Gradient descent is really just a process of finding a minimum.
I like to use the analogy of being dropped in the middle of a lake or the ocean and told, “I need you to find the deepest point in the lake without a map.” One of the ways you could do that is to take samples with, say, a depth finder, even though you don’t have access to a map of the actual depths in the lake. You take a sample of the depth at multiple different points, and you move toward the point where the depth continues to go down. You repeat this process a bunch of times until eventually you get to a point that may or may not be the deepest point in the lake, but it’s probably deeper than the point you originally started at. That is essentially what gradient descent is doing.
It’s difficult to visualize what gradient descent is doing. It’s impossible for us to plot the location of a model’s parameters in space because the dimensionality is just too high. But it’s more or less this depth-finding experiment, where you’re trying to find the deepest point in the ocean. In this case, the ocean is your loss function.
The loss function is just a measure of how well your model is doing at a particular task. Or in this case, it’s how bad your model is doing at a particular task. The deeper you get in the ocean, the better your model is performing. You just repeat this process with new samples of data, in this case, new samples of the depth of the ocean, until you reach a point where you’re happy.
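The depth-finding loop can be written out for a one-dimensional “ocean” where we know the answer. Assume the loss is (w - 3)², whose gradient is 2(w - 3); repeatedly stepping downhill should land near w = 3, the bottom. `Descent` is an illustrative name, not a real library.

```elixir
defmodule Descent do
  # Gradient of the loss (w - 3.0)^2 with respect to w.
  # In a real framework this is computed automatically (autodiff).
  def loss_gradient(w), do: 2.0 * (w - 3.0)

  # Gradient descent: from starting point `w`, take `steps` downhill
  # steps of size `lr` (the learning rate) along the negative gradient.
  def minimize(w, _lr, 0), do: w

  def minimize(w, lr, steps) do
    minimize(w - lr * loss_gradient(w), lr, steps - 1)
  end
end
```

Starting from `w = 0.0` with a learning rate of `0.1`, `Descent.minimize(0.0, 0.1, 100)` converges to roughly 3.0: each step shrinks the distance to the minimum by a constant factor, just as each sounding in the lake analogy moves you to deeper water.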
How do you define “happiness”?
That’s another one that’s a little difficult because the goodness of a model is not necessarily just how well it does on your training set. Machine learning is unique in that you don’t care about how well you do on the training set or even a validation set, which is a set that you hold out from the training process to evaluate how well your model does. You really just care about how your model does in whatever deployment scenario you have. You care about how your model does on data that it doesn’t see during training. We call that generalization.
That really is one of the key differentiators between machine learning and just pure optimization. In pure optimization, you are trying to minimize some function on data that you have, whereas in machine learning, you’re trying to minimize a function on data that you don’t have. It could also mean, depending on what your business application is, the best model might not necessarily be the model that is the most accurate.
You could have a model that is a fraction of a percent less accurate or has less error than another, but it could be three times as fast. You could experience just a significant latency boost, or it could be three times easier to deploy, three times smaller. It really just depends on what the deployment value or what the value you’re seeking from the model is.
How do you build up a body of training data? And how does that model continue to learn during and throughout deployment?
There’s really no threshold of, “Hey, this is enough data for this task.” It really just depends on what your application is and the thing you’re trying to accomplish. Compared to the past, maybe even just a few years ago, the meaning of “a lot of data” has changed significantly.
For an image classification model, you might be able to get away with 10,000 examples for whatever your particular task is. These examples could just be image-and-classification pairs. Nowadays, with large language models, you might need text from the entire internet before you have enough data for your task.
The data and where the data comes from is tied a lot to whatever your business application is. You might go and pull data from some repository on the internet. Maybe you’re pulling code from open source software projects. Or it could be something where the data is tied specifically to whatever your application does. It could be data collected from your users. It could be data collected from analytics. It really is just tied to whatever your business use case is. How you work with that data, how you store that data, how you end up using that data is all really just dependent on what your business needs.
Continuous learning is kind of an open problem. You have a model that fails in some capacity. This happens often: models encounter failure scenarios in production. Taking that failure scenario and improving the model on it is really just a process of either continuous training or experimenting with new models and identifying new features. It’s more of an art than a science.
There’s no concrete, “Hey, this is the guide for when your machine learning model classifies this picture of a dog as a cat. This is how you handle it.” It’s not like a security incident, for example, where there are well-defined steps you should take after the incident happens. Model failure is very open-ended in terms of the study you should do and the steps you should take after a model fails in production.
It’s really just a process of collecting more data and then retraining models on more data, higher quality data, different data, diverse data.
How do you put machine learning into practice?
I like to say that machine learning models are not meant to live in a notebook, because a lot of machine learning tutorials end with a trained model in a notebook. Realistically, that’s just not what people need. That’s not the reason you trained the model in the first place. You train a model because you want to get useful predictions from it and integrate it into some application.
In the Elixir ecosystem, we use Nx.Serving, which is a model inference abstraction. We take these trained models and wrap them in servings. Then we start those servings as part of our application’s supervision tree, and we can use these models to get inferences with high performance, essentially continuously. It really depends, again, on what your business use case is.
But for us, a lot of times that looks like just wrapping your model in a serving, putting it in a supervision tree, and deploying it as part of something like a Phoenix app or maybe even a Nerves [embedded Elixir] project.
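As a rough sketch of what that wiring might look like, assuming the Nx package and an already-trained model: the serving goes straight into the application’s supervision tree. `MyApp`, `MyApp.Serving`, `MyApp.Model.run_model/1`, and `MyAppWeb.Endpoint` are all hypothetical names, and the exact `Nx.Serving` options can differ between versions, so treat this as a shape, not a recipe.

```elixir
# Fragment of a hypothetical lib/my_app/application.ex.
def start(_type, _args) do
  # Wrap the trained model's batch-inference function in a serving.
  serving = Nx.Serving.new(fn _opts -> &MyApp.Model.run_model/1 end)

  children = [
    # The serving is supervised like any other process; it batches
    # up to 8 overlapping requests into a single inference call.
    {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 8},
    # The web endpoint lives in the same tree as the model.
    MyAppWeb.Endpoint
  ]

  Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
```

Callers elsewhere in the app would then get predictions with `Nx.Serving.batched_run(MyApp.Serving, batch)`, and the serving handles batching, GPU placement, and replies.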
Who is the ideal reader for your book?
I would say there are two types of people that are the ideal reader for my book. The first is the Elixir programmer who is interested in machine learning, interested in using Nx, but has no idea what machine learning is or even where to start. I think this is the perfect book for them to start with. The second type of reader is someone who is a machine learning expert or has worked in machine learning for quite a bit but is interested in learning how we do it in the Elixir ecosystem.
I think there’s enough overlap between what you would find in the Python ecosystem and our ecosystem that understanding the Elixir examples presented in the book won’t be very difficult. It’s actually probably a really good way for somebody who wants to learn the language to jump in with something they already understand.
How can people follow what you’re up to and keep track of what your latest events are?
I’m most active on X [formerly Twitter] right now. My handle is sean_moriarity. I also have a website that I don’t really write on as much, seanmoriarity.com. Those are really where I’m most active. Otherwise, you can just see what I’m up to on GitHub. I’m sometimes in the Elixir Forum, occasionally in the Elixir Slack.
Then I’m also active in the Erlang Ecosystem Foundation Slack and specifically the Machine Learning Working Group. If you join the Erlang Ecosystem Foundation and then join the Slack, you’ll find me in the Machine Learning Working Group there.