The optimisation process that runs during image alignment has no information about the relative or absolute positions of the points in the real world, only their positions in the images. Agisoft Lens uses a grid of squares as a calibration target, so assuming there is some pattern recognition going on to detect the squares, it knows that the spacing between adjacent corners is uniform and that rows and columns of corners share the same real-world X/Y coordinates. That constraint helps rule out lens parameters that would produce the "bowl" effect. GCPs perform the same function, although that doesn't help users of the standard version.
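For comparison, OpenCV's calibration pipeline makes the same prior knowledge explicit. Agisoft Lens is closed source, so this is an analogous sketch rather than its actual code, and the board dimensions, square size and image folder below are all assumptions:

```python
import cv2
import glob
import numpy as np

PATTERN = (9, 6)      # inner corners per row/column (assumed board layout)
SQUARE_MM = 25.0      # assumed square size; any consistent unit works

# The real-world model of the target: a flat grid (Z = 0) with uniform
# spacing between neighbouring corners. This is exactly the constraint
# described above -- it is what lets the solver separate genuine lens
# distortion from a "bowled" interpretation of the scene.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib/*.jpg"):   # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]   # (width, height)

# Jointly solves for focal length, principal point, radial/tangential
# distortion coefficients, and the board pose in every image.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error (px):", rms)
print("Distortion coefficients:", dist.ravel())
```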
The bowl effect, in turn, is caused by the optimisation process drifting towards incorrect parameter values that still yield a mathematically consistent solution. This is usually due to inadequate overlap between images (not just the amount of overlap, but how it is spread across the frame). In these cases the field of view, distortion parameters and camera orientations all compensate for each other's deviations from the true values, so the reprojection error stays low even though the reconstructed surface curves. The same thing happens with automated panorama stitching, although it's less critical in that application.
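Here is a small numerical sketch of that compensation, using an assumed simple pinhole model with a single radial distortion term and purely illustrative values. Fitting only the focal length against a deliberately wrong distortion coefficient produces near-zero residuals when the tie points cluster near the centre of the frame, but large residuals when they span it, which is why the spread of overlap matters:

```python
import numpy as np
from scipy.optimize import least_squares

f_true, k1_true = 1000.0, -0.20   # "true" camera parameters (assumed)

def project(xn, yn, f, k1):
    """Normalised coordinates -> pixels, with one radial distortion term."""
    r2 = xn**2 + yn**2
    s = 1.0 + k1 * r2
    return f * xn * s, f * yn * s

def best_f_rms(xn, yn, k1_wrong):
    """Fix k1 at a wrong value, let f alone compensate, and return the
    RMS pixel residual of the best-fitting f."""
    u, v = project(xn, yn, f_true, k1_true)          # the "observations"
    def resid(p):
        uu, vv = project(xn, yn, p[0], k1_wrong)
        return np.concatenate([uu - u, vv - v])
    sol = least_squares(resid, x0=[f_true])
    return np.sqrt(np.mean(sol.fun**2))

rng = np.random.default_rng(0)
k1_wrong = -0.05                                     # badly wrong distortion

# Tie points clustered near the image centre: the wrong k1 hides
# almost perfectly behind a slightly adjusted focal length.
xc, yc = rng.uniform(-0.1, 0.1, 500), rng.uniform(-0.1, 0.1, 500)
# Tie points spread across the frame: the mismatch cannot be hidden.
xs, ys = rng.uniform(-0.5, 0.5, 500), rng.uniform(-0.5, 0.5, 500)

print("centre-only RMS (px):", best_f_rms(xc, yc, k1_wrong))
print("full-frame RMS (px): ", best_f_rms(xs, ys, k1_wrong))
```

The centre-only fit reports a tiny residual despite the wrong distortion model; over a whole survey those "invisible" errors accumulate into the dome or bowl shape.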