My suggestion is not about finding camera poses, but about establishing the internal camera parameters that determine the mapping between incoming rays and image positions: specifically the principal point, the focal length, and the distortion parameters. These are critical to accurate modeling, and best practice is to estimate them by a calibration procedure that is independent of the model data set.
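To make the role of these parameters concrete, here is a minimal sketch of how intrinsics map a ray to a pixel, assuming a pinhole model with Brown-Conrady distortion (the general family Metashape and OpenCV use; the specific numbers below are invented for illustration):

```python
def project(ray, f, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Map an incoming ray (direction in camera coordinates, z forward)
    to a pixel position, using a Brown-Conrady distortion model.
    f, cx, cy are in pixels; k1, k2 are radial and p1, p2 tangential
    distortion coefficients."""
    X, Y, Z = ray
    x, y = X / Z, Y / Z                   # ideal (undistorted) normalized coords
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2   # radial distortion factor
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return f * xd + cx, f * yd + cy       # pixel coordinates

# A ray straight down the optical axis always lands on the principal point,
# regardless of the distortion coefficients:
print(project((0.0, 0.0, 1.0), f=3000.0, cx=2000.0, cy=1500.0, k1=-0.1))
# -> (2000.0, 1500.0)
```

Note that an error in the assumed principal point shifts the center of the distortion field, which is why it matters so much for distortion correction.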
The traditional calibration method relies on photographs of charts with known structure, usually a planar chessboard pattern. This is the method supported by MetaShape. It works well for rectilinear lenses of long focal length, but poorly for wide-angle and fisheye lenses, which are in practice the more useful ones for large-scale modeling. The main problem is that a chart of practical size, presented at a realistic working distance, covers only a small part of the field of view of such a lens. Placing the chart close to the lens is not a good solution, because at that distance the distortion function can be quite different from the one that applies to a subject at normal working distance.
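The coverage problem is easy to quantify. The arithmetic below uses invented but plausible numbers (a 1 m chart at a 3 m working distance, a 180-degree fisheye) purely to illustrate the scale of the issue:

```python
import math

def chart_coverage_deg(chart_width_m, distance_m):
    """Full angle subtended by a flat chart centered on the optical axis."""
    return 2 * math.degrees(math.atan(chart_width_m / (2 * distance_m)))

# A 1 m chart at a 3 m working distance:
cov = chart_coverage_deg(1.0, 3.0)
print(round(cov, 1))          # ~18.9 degrees
# As a fraction of a 180-degree fisheye field of view:
print(round(cov / 180.0, 2))  # ~0.11
```

So even a fairly large chart constrains only about a tenth of the field of view, leaving the distortion model unconstrained toward the edges, where fisheye distortion is strongest.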
The robot panohead method uses photos taken at normal working distance, and substitutes known camera pointing angles for known subject structure. I know from experience that it finds very accurate and repeatable lens parameters. That includes the principal point, which is fundamental to accurate distortion correction but is so difficult to estimate from model data that the default in MetaShape is to not even try. This is unfortunate, because the principal point depends on mechanical factors that can change from one shoot to the next, namely play in the lens mount and temperature. Best practice with a removable-lens camera is therefore to shoot a calibration series at the same time as the model images, with the lens held firmly in position, for example by an elastic cord.
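To show how known pointing angles can stand in for known subject structure, here is a deliberately simplified 1-D sketch: a distant feature is observed at several known panohead yaw angles, and its image x-positions follow x = cx + f * tan(yaw) for a distortion-free rectilinear lens, so f and cx fall out of a linear least-squares fit. This is my own toy illustration of the principle, not the actual panohead calibration algorithm, and the numbers are synthetic:

```python
import math

def fit_f_cx(yaw_deg, x_pix):
    """Least-squares fit of x = cx + f * tan(yaw) for a distant feature
    seen at known yaw angles (toy 1-D, distortion-free model)."""
    t = [math.tan(math.radians(a)) for a in yaw_deg]
    n = len(t)
    st, sx = sum(t), sum(x_pix)
    stt = sum(ti * ti for ti in t)
    stx = sum(ti * xi for ti, xi in zip(t, x_pix))
    f = (n * stx - st * sx) / (n * stt - st * st)   # slope
    cx = (sx - f * st) / n                          # intercept = principal point
    return f, cx

# Synthetic observations generated from f = 3000 px, cx = 2012 px:
angles = [-20, -10, 0, 10, 20]
obs = [2012 + 3000 * math.tan(math.radians(a)) for a in angles]
f, cx = fit_f_cx(angles, obs)
print(round(f), round(cx))    # -> 3000 2012
```

Note that the fit recovers the principal point directly, with no assumptions about what the camera is looking at; the real method generalizes this idea to two axes and a full distortion model.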