Hello,
I had data sets where the alignment with 40k points was not successful (cameras were not aligned), while with 20k the camera orientations were solved. In both situations the maximum number of points was found: with 40k, 40,000 points were detected in each photo, and with 20k it was 20,000.
As I already said, the reliability of matches between photos may be higher in the 20k case, so a threshold between valid and invalid matches could lead to a more robust solution with a lower number of points. In data sets with poor lighting or quality (noise), many mismatches are always recognized as valid, so I can imagine that with 20k points per photo, more matches with better reliability are preferred and used. With 40k there could be the same matches, but additionally some of lower quality that could falsify the results.
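To illustrate the idea (this is only a toy model of my guess, not Metashape's actual matching code): if keypoints are ranked by detector response and the weaker ones tend to produce less reliable matches, then raising the per-photo limit mostly adds lower-quality candidates. The score model and threshold below are hypothetical numbers chosen just to show the effect.

```python
# Toy sketch of the argument above -- NOT the real algorithm.
# Keypoints are ranked by detector response; in this hypothetical
# model, match reliability decays linearly with keypoint rank.

def simulated_match_scores(n_points):
    # hypothetical reliability per keypoint rank (1.0 = best)
    return [1.0 - rank / 50000 for rank in range(n_points)]

def valid_matches(scores, threshold=0.5):
    # keep only matches above the validity threshold
    return [s for s in scores if s >= threshold]

good_20k = valid_matches(simulated_match_scores(20000))
good_40k = valid_matches(simulated_match_scores(40000))

mean_20k = sum(good_20k) / len(good_20k)
mean_40k = sum(good_40k) / len(good_40k)

# The 40k run accepts more matches, but the extra ones sit closer
# to the threshold, so the average reliability of what survives
# the validity check is lower than in the 20k run.
print(len(good_40k) > len(good_20k))   # more accepted matches
print(mean_20k > mean_40k)             # but lower average quality
```

In this toy model the 40k run keeps everything the 20k run keeps plus extra borderline matches, which is exactly the "same matches, but additionally some of lower quality" situation I described.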
That's only my opinion; I found it strange too when I saw better results with fewer points. But after thinking about it, this is the explanation that makes sense to me.
Unfortunately I am on vacation right now, so I can't search for an example data set. But when I come back, I will try to prepare one for you.