I have a Metashape photogrammetry output and want to load the cameras from the XML file.
The format seems pretty straightforward:
<cameras next_id="#CAMS" next_group_id="0">
<camera id="0" sensor_id="0" label="NAME">
<transform>4x4 matrix I call transformation_cam_to_rig</transform>
</camera>
[...]
</camera>
</cameras>
<transform>
<rotation locked="false">3x3 matrix I call rotation_rig_to_world</rotation>
<translation locked="false">3 vector I call translation_rig_to_world</translation>
<scale locked="true">float I call cam_scale</scale>
</transform>
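A minimal sketch of reading these values with the standard library and NumPy. It assumes the matrices are stored as whitespace-separated, row-major floats and that `<cameras>` and the chunk-level `<transform>` share a parent element. The `<chunk>` wrapper, the sample numbers, and the helper names `floats` and `load_chunk` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical sample mirroring the layout above; all numbers are made up.
SAMPLE = """
<chunk>
  <cameras next_id="1" next_group_id="0">
    <camera id="0" sensor_id="0" label="cam0">
      <transform>1 0 0 0.5 0 1 0 0 0 0 1 2 0 0 0 1</transform>
    </camera>
  </cameras>
  <transform>
    <rotation locked="false">0 -1 0 1 0 0 0 0 1</rotation>
    <translation locked="false">10 20 30</translation>
    <scale locked="true">2.0</scale>
  </transform>
</chunk>
"""


def floats(text, shape):
    """Parse whitespace-separated floats (assumed row-major) into an array."""
    return np.array(text.split(), dtype=float).reshape(shape)


def load_chunk(xml_text):
    root = ET.fromstring(xml_text)
    cams = {}
    for cam in root.find("cameras").iter("camera"):
        node = cam.find("transform")
        if node is None:
            # This sketch simply skips cameras that carry no transform.
            continue
        cams[cam.get("label")] = floats(node.text, (4, 4))
    # find() only looks at direct children, so this is the chunk-level
    # transform, not a per-camera one.
    tf = root.find("transform")
    rotation = floats(tf.find("rotation").text, (3, 3))
    translation = floats(tf.find("translation").text, (3,))
    scale = float(tf.find("scale").text)
    return cams, rotation, translation, scale


cams, rotation_rig_to_world, translation_rig_to_world, cam_scale = load_chunk(SAMPLE)
transformation_cam_to_rig = cams["cam0"]
```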
Playing around with the data, I think I got the coordinate system relations right, which means I can concatenate the two transformations and invert the result to get the world-to-cam camera extrinsics:
rotation_cam_to_rig = transformation_cam_to_rig[:3, :3]
translation_cam_to_rig = transformation_cam_to_rig[:3, 3]
# Cam2World = Rig2World x Cam2Rig
rotation_cam_to_world = rotation_rig_to_world @ rotation_cam_to_rig
translation_cam_to_world = (rotation_rig_to_world @ translation_cam_to_rig) + translation_rig_to_world
# World2Cam = inverse(Cam2World)
rotation_world_to_cam = rotation_cam_to_world.T
translation_world_to_cam = -rotation_world_to_cam @ translation_cam_to_world
extrinsics = Extrinsics(R=rotation_world_to_cam, t=translation_world_to_cam * cam_scale)
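As a sanity check of the composition and inversion (leaving `cam_scale` aside), the same steps can be written with 4x4 homogeneous matrices and compared against `np.linalg.inv`. This is only a sketch with made-up input values. The helpers `to_homogeneous` and `rot_z` are hypothetical, and the transpose shortcut is valid only if the 3x3 blocks are pure rotations.

```python
import numpy as np


def to_homogeneous(R, t):
    """Pack a 3x3 rotation and a 3-vector into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(angle):
    """Rotation about the z-axis, used only to build test data."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Made-up inputs standing in for the values read from the XML.
rotation_rig_to_world = rot_z(0.3)
translation_rig_to_world = np.array([10.0, 20.0, 30.0])
transformation_cam_to_rig = to_homogeneous(rot_z(-1.1), [0.5, 0.0, 2.0])

# Step-by-step version, same operations as in the snippet above.
rotation_cam_to_rig = transformation_cam_to_rig[:3, :3]
translation_cam_to_rig = transformation_cam_to_rig[:3, 3]
rotation_cam_to_world = rotation_rig_to_world @ rotation_cam_to_rig
translation_cam_to_world = (
    rotation_rig_to_world @ translation_cam_to_rig + translation_rig_to_world
)
rotation_world_to_cam = rotation_cam_to_world.T
translation_world_to_cam = -rotation_world_to_cam @ translation_cam_to_world

# Same thing as a single matrix product followed by a general inverse.
cam_to_world = (
    to_homogeneous(rotation_rig_to_world, translation_rig_to_world)
    @ transformation_cam_to_rig
)
world_to_cam = np.linalg.inv(cam_to_world)

matches = np.allclose(
    world_to_cam, to_homogeneous(rotation_world_to_cam, translation_world_to_cam)
)
print(matches)
```

If the two versions agree on the real data as well, the unscaled rotation and translation algebra is consistent, which leaves the handling of the scale and the input conventions as the remaining places to look.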
The mesh from the photogrammetry export is not scaled or transformed at all.
If I do all of the above, I get the cameras of the rig in the right orientation and in the correct relationship to each other, but all the cameras are off by the same translation vector, and I don't know where it comes from. The cameras in the FBX file are fine, but I can't manage to load/create the system directly from the XML.
The relationship between the cameras, and their orientation relative to the mesh, matches what I see when I load the FBX and the OBJ from the Metashape output, except for that offset.
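One way to narrow down an offset that is identical for every camera is to compare where the scale enters the chain. The sketch below uses made-up numbers and two hypothetical placements. Variant A reproduces the snippet above: scaling the world-to-cam translation puts the camera centre at `s * (R @ c + T)`. Variant B scales only the rig-frame coordinates before the rig translation is added. It does not establish which placement, if either, matches Metashape's convention. It only shows that the two differ by a constant vector that is independent of the camera.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up rig-to-world transform and scale.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = np.array([10.0, 20.0, 30.0])
s = 2.0

# Made-up camera centres in the rig frame (the translation column of each
# cam-to-rig matrix), one row per camera.
centres_in_rig = rng.normal(size=(5, 3))

# Variant A: extrinsics t = s * t_world_to_cam, so the camera centre
# -R_wc.T @ t equals s * (R @ c + T).
centres_a = s * (centres_in_rig @ R.T + T)

# Variant B (hypothetical alternative): scale applied before adding T,
# so the camera centre equals s * R @ c + T.
centres_b = s * (centres_in_rig @ R.T) + T

# The difference is the same for every camera: (s - 1) * T.
offsets = centres_a - centres_b
print(offsets)
```

Comparing the observed offset against `(cam_scale - 1) * translation_rig_to_world` from the real file would show whether this explains the discrepancy.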
Is there anything I missed?
For easier debugging, I attached the XML with just a single camera of the rig.
Doing the math I explained, the camera gets these extrinsics:
R: [[ 9.99969410e-01 7.63561715e-03 -1.69572932e-03], [ 7.63491560e-03 -9.99970765e-01 -4.19803546e-04], [-1.69888521e-03 4.06843954e-04 -9.99998474e-01]]
t: [-756.47805702 -82.62256735 1256.77879887]
and therefore a position of -R.T @ t = [[ 759.22085595 -77.35528795 1255.45941392]].
This camera is too far off in the x-direction.
(unexpected offset is ~[ 800 80 -130])
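The camera position quoted above can be reproduced from the printed R and t. The values are copied verbatim from the post; the variable name `position` is just for this sketch.

```python
import numpy as np

# Extrinsics as printed above (world-to-cam rotation and translation).
R = np.array([
    [9.99969410e-01, 7.63561715e-03, -1.69572932e-03],
    [7.63491560e-03, -9.99970765e-01, -4.19803546e-04],
    [-1.69888521e-03, 4.06843954e-04, -9.99998474e-01],
])
t = np.array([-756.47805702, -82.62256735, 1256.77879887])

# Camera centre in world coordinates: the point that maps to the camera origin.
position = -R.T @ t
print(position)
```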