Solution(s)
From what I gathered, there are two decent possible solutions to this conundrum.
row_number
+ over
One of them is using row_number()
with over()
from Ecto:
https://hexdocs.pm/ecto/Ecto.Query.WindowAPI.html#row_number/0
Assuming I join both file_info
with all_counts
in a single table, I can then perform the query as mentioned in the previous SO post I mentioned in my question:
file_info_with_counts
|> select([fi], %{
rn: over(row_number(), partition_by: [fi.home_id, fi.path], order_by: [asc: fi.item_id]),
item_id: fi.item_id,
# you get the idea ...
})
|> subquery()
IO.inspect(file_info_with_counts |> where([c], c.rn <= 3) |> Repo.all()
Which prints what I wanted.
Source:
Lateral inner joins
However, as mentioned by some people in the community, this solution is rather old, and these days lateral joins seem to also cover this use case.
tops =
from top in "file_info_with_counts",
where: top.home_id == parent_as(:parent).home_id,
where: top.path == parent_as(:parent).path,
order_by: [asc: top.item_id],
limit: 3,
select: %{id: top.id}
from parent in "file_info_with_counts",
as: :parent,
group_by: [parent.home_id, parent.path],
lateral_join: top in subquery(tops),
on: true,
select: %{home_id: parent.home_id, path: parent.path, item_id: top.id}
Source:
This solution is not without merit, however, given my familiarity with row_number
I opted for that solution instead.
Unless there is a considerable performance difference between the two in favor of lateral joins, I will keep the previous solution.
For more info, here is the original source where I got these answers: