# K Closest star selection

May 9, 2018
`programming`

`gamedev`

In my post about generation galaxy warp lanes GalaxyWarpLaneGeneration I talked about the a naive method that would select the N closest stars to a given star. Even though the name states the method as naive it isn’t such a naive tasks when you consider large amounts of stars. For a game the number of stars could be in the hundreds or maybe thousand, but what if for a minute we actually consider the real world? How could you select N closest start from 10M star? or 100M maybe even a 1B?

My initial though was the simple approach: calculate the distance between the source and each of the other stars, and sort the other star list on this distance. This results in an O(nlogn) algorithm which is the best a sort on this type of randomized data. But there is a better way: The key is we don’t need to sort all the list, we are just looking to get the closest N elements. The algorithm is as follows:

- create a MAX heap of size N
- add the first N stars from the list to the heap, based on a distance comparator from the source star.
- for the remaining stars in the list:
- if the distance to that star is less than the top of the heap, remove the top and add this star.

- get all the element in the heap for the result.

```
public List<Star> closest(List<Star> stars, int n) {
PriorityQueue<Pair<Star,Double>> heap = new PriorityQueue<>(n,
(x1, x2) -> x2.getSecond().compareTo(x1.getSecond()));
for (int i =0;i<n;i++) {
heap.add(new Pair<>(stars.get(i), distance(stars.get(i))));
}
for(int i=n;i<stars.size();i++) {
if(stars.get(i) == this) continue;;
double distance = distance(stars.get(i));
if (distance < heap.peek().getSecond()) {
heap.poll();
heap.add(new Pair<>(stars.get(i), distance));
}
}
return heap.stream().map(Pair::getFirst).collect(toList());
}
```

we are adding (N-k) element to a heap of k element. Each insertion will cost log(k). So the total will be o(N * logK) I am usually connecting 2 or 3 stars which result in 1000 * log2 which is around 300 operations vs. 1000 * log(1000) which is also 300 operations. It doesn’t make a difference for the basic case. But if I had 10M start then 10M * log2 = 3M vs 10M * log10M = 70M it does make a difference.