If you are not using any preselection option, every image is compared against every other image during point matching, and a key point limit of 40,000 is overkill. For 5,000 images that means roughly 5,000 x 5,000 x 40,000 points to compare, which is a huge number.
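To put a number on that, here is a rough back-of-the-envelope sketch. It is only a crude scaling model, not how Metashape actually counts its work: the function name is made up for illustration, and the "unique pairs" figure assumes each pair of images only needs to be compared once.

```python
def match_workload(num_images, key_points_per_image):
    """Crude estimate of points to compare when no preselection is used."""
    # Rough figure used in the post: every image against every image.
    all_pairs = num_images * num_images
    # Assumption: each unordered pair of images is only compared once.
    unique_pairs = num_images * (num_images - 1) // 2
    return all_pairs * key_points_per_image, unique_pairs * key_points_per_image


for limit in (40_000, 10_000):
    rough, unique = match_workload(5_000, limit)
    print(f"key point limit {limit:>6,}: rough {rough:,} / unique pairs {unique:,}")
```

Either way you count it, dropping the key point limit from 40,000 to 10,000 cuts the estimate to a quarter.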
Try setting the key point limit to 10,000 and the tie point limit to 4,000 or even less (3,000 or 2,000). If some images do not align, try a key point limit of 15,000.
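If you prefer to script this instead of clicking through the dialog, something like the sketch below could work. It assumes that `chunk` behaves like a `Metashape.Chunk` from the Metashape Python API, that `matchPhotos` accepts `keypoint_limit`, `tiepoint_limit` and `reset_matches`, that `alignCameras` accepts `reset_alignment`, and that an unaligned camera has `transform` set to `None`. Please check these against the API reference for your version. The helper name and the list of attempts are my own, and preselection is left at whatever your defaults are.

```python
# Settings to try in order; the values are the ones suggested above.
ATTEMPTS = [
    {"keypoint_limit": 10_000, "tiepoint_limit": 4_000},
    {"keypoint_limit": 15_000, "tiepoint_limit": 4_000},
]


def align_with_fallback(chunk, attempts=ATTEMPTS):
    """Match and align, raising the key point limit if cameras stay unaligned.

    `chunk` is assumed to look like a Metashape.Chunk (hypothetical helper,
    not part of Metashape itself).
    """
    settings, unaligned = None, []
    for settings in attempts:
        # Assumed API: discard old matches so the new limits take effect.
        chunk.matchPhotos(reset_matches=True, **settings)
        chunk.alignCameras(reset_alignment=True)
        # Assumed convention: cameras without a transform were not aligned.
        unaligned = [cam for cam in chunk.cameras if cam.transform is None]
        if not unaligned:
            break
    return settings, unaligned


# Inside Metashape's Python console this might be called as:
#   settings, unaligned = align_with_fallback(Metashape.app.document.chunk)
```

The function returns the settings of the last attempt and the list of cameras that are still unaligned, so you can see whether the 15,000 fallback was needed.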
From a GPU utilization perspective, it is good to have more key points to match, because the GPU then spends more time computing and less time transferring data to and from the GPU. In that case a more powerful GPU (e.g. RTX 3080) makes sense. If a key point limit of 10,000 works for you, there is no need for a high-performance GPU (an RTX 3060 Ti would be enough).
You can check my test in this post
https://www.agisoft.com/forum/index.php?topic=14622.msg64275#msg64275 to see how GPU utilization differs when different numbers of key points need to be matched.
The good news is that estimating locations does not take long.
Your point detection takes 10 min for 864 photos. You can speed up this phase with a CPU that has a high single-core frequency, because it is a single-threaded task.
I am using an RTX 2060 Super, an Intel 11700F @ 4.4 GHz and 18 Mpix JPEG files of about 10 MB each. My CPU can feed the GPU at about 3-4 JPEGs/s. The time the GPU needs to detect points on one photo does not matter much, because that is a quick process; the bottleneck is the CPU's single-core boost frequency.
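For comparison, here is the arithmetic behind those two figures as a small sketch. The function names are made up for illustration, and the comparison is only indicative, since my rate was measured with my own files and your photos may be a different size.

```python
def photos_per_second(num_photos, seconds):
    """Average point-detection throughput."""
    return num_photos / seconds


def seconds_needed(num_photos, rate):
    """Time to process num_photos at a given photos-per-second rate."""
    return num_photos / rate


# Reported case: 864 photos in 10 minutes.
reported = photos_per_second(864, 10 * 60)
print(f"reported: {reported:.2f} photos/s")

# Same photo count at the 3-4 JPEGs/s measured on the 11700F setup.
for rate in (3, 4):
    print(f"at {rate} JPEGs/s: {seconds_needed(864, rate) / 60:.1f} min")
```

So 864 photos in 10 minutes is about 1.44 photos/s, while 3-4 JPEGs/s would bring the same set down to somewhere between 3.6 and 4.8 minutes.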
Try changing the alignment settings first, and if that does not help much, then we will look at changing hardware.