Agisoft Metashape
Agisoft Metashape => Python and Java API => Topic started by: Phogi on June 10, 2021, 03:17:52 PM

Hi everyone,
I searched the forum and found there are two ways to reproject 3D points into 2D image pixels: one via sensor.calibration, the other via camera.project. What is the difference between them? Furthermore, how are the pixel coordinates counted: from the top-left corner, or shifted from the center by cx/cy?
Could anyone explain how camera.project works? I want to check why some markers have higher pixel errors even though they align fine, and whether anything is abnormal in the camera matrix or projection matrix, but how can I retrieve this information?
Thanks in advance!

Hello Phogi,
I think the 2 methods are equivalent. Given a marker (<Marker 'PC01'>) with internal 3D coordinates (marker.position), its 2D coordinates on a given camera (<Camera 'IMG_5813.JPG'>) can be determined either by:
1. combining camera.sensor.calibration.project with camera.transform.inv():
camera.sensor.calibration.project(camera.transform.inv().mulp(marker.position))
Out[12]: Vector([1998.4666958189314, 2522.822857879895])
2. just using camera.project:
camera.project(marker.position)
Out[13]: Vector([1998.4666958189316, 2522.822857879895])
As you can see, the results are the same (up to floating-point precision). I would use camera.project, as it is the more elegant and more recent method. In particular, it handles rolling-shutter sensor projection correctly, while the old method (sensor.calibration.project + camera.transform.inv()) does not. See the following and the attached screen capture:
marker
Out[23]: <Marker '5'>
camera
Out[24]: <Camera 'DJI_0011_S'>
camera.sensor.rolling_shutter
Out[25]: True
camera.sensor.calibration.project(camera.transform.inv().mulp(marker.position))
Out[26]: Vector([2618.710760365361, 1049.2876409972341])
camera.project(marker.position)
Out[27]: Vector([2618.1837353917476, 1046.1846513344458])
PS: the 2D pixel image coordinates given by these formulas are relative to the top-left corner, as specified in Appendix C of the User Manual:
The image coordinate system has origin in the middle of the top-left pixel (with coordinates (0.5, 0.5)). The X axis in the image coordinate system points to the right, the Y axis points down. Image coordinates are measured in pixels.
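The two conventions the original question mentions (top-left origin vs. an offset from "center + cx/cy") differ only by a constant shift. A minimal sketch, with made-up image size and principal point values:

```python
# Convert a Metashape top-left-origin pixel coordinate (u, v) into an
# offset from the principal point, i.e. the "center + cx/cy" convention.
# All values below are made up for illustration.
w, h = 4000, 3000        # image width and height in pixels
cx, cy = 5.0, -3.0       # principal point offsets from the image center
u, v = 2005.0, 1497.0    # a pixel coordinate, origin at the top-left corner

du = u - (w / 2 + cx)    # horizontal offset from the principal point
dv = v - (h / 2 + cy)    # vertical offset from the principal point
print(du, dv)            # here both are 0.0: (u, v) is the principal point
```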

Hi Paul,
Thank you! That explains the difference very clearly. But the image pixel origin does not look like the top-left corner (see attached image); it looks more like the bottom-left?
Could you explain a bit more: for camera.transform.inv().mulp(marker.position), which coordinate systems are marker.position and the transformed result in? As I understand it, chunk.transform.matrix.inv().mulp(chunk.crs.unproject(marker.reference.location)) transforms the marker's world coordinates into the camera coordinate system, but when I tried to use x/z to calculate the pixels the result was different, so I'm not sure what procedure camera.project follows...
Thank you!
Phogi

Hello Phogi,
For the example posted, are you sure the image shown corresponds to camera.project in the code? Has its orientation been modified from 0? You can check that by opening the Photos pane and looking at the Orientation column in Details view.
camera.transform.inv().mulp(marker.position) goes from the internal 3D coordinate system (Xint, Yint, Zint) to the camera 3D coordinate system (Xcam, Ycam, Zcam), where (from User Manual Appendix C):
The local camera coordinate system has origin at the camera projection center. The Z axis points towards the viewing direction, the X axis points to the right, and the Y axis points down.
The 3rd attachment shows the coordinate axes for the internal and camera coordinate systems and the respective coordinates of Point 1 in each CS.
chunk.transform.matrix.mulp(marker.position) goes from the internal 3D coordinate system (Xint, Yint, Zint) to the geocentric world coordinate system (Xgeoc, Ygeoc, Zgeoc); see the 2nd attachment.
And chunk.crs.project(Xgeoc, Ygeoc, Zgeoc) goes from the geocentric world system (Xgeoc, Ygeoc, Zgeoc) to the coordinate system defined by the chunk CRS (Xcrs, Ycrs, H), for example UTM coordinates.
So chunk.transform.matrix.inv().mulp(chunk.crs.unproject(marker.reference.location)) goes from the chunk CRS coordinate system back to the internal 3D coordinate system; the result should be very close to marker.position...
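The mulp() calls in this chain just apply a 4x4 homogeneous transform to a 3D point, and the .inv() variants run the same step backwards. A minimal numpy sketch of such a round trip, with a made-up similarity transform standing in for chunk.transform.matrix:

```python
import numpy as np

def mulp(M, p):
    """Apply a 4x4 homogeneous transform to a 3D point,
    like Metashape's Matrix.mulp()."""
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Made-up 4x4 similarity transform standing in for chunk.transform.matrix
# (rotation about Z, uniform scale 2, plus a translation).
angle = 0.3
c, s = np.cos(angle), np.sin(angle)
T = np.array([
    [2 * c, -2 * s, 0.0, 10.0],
    [2 * s,  2 * c, 0.0, 20.0],
    [0.0,    0.0,   2.0, 30.0],
    [0.0,    0.0,   0.0,  1.0],
])

p_internal = np.array([1.0, 2.0, 3.0])     # a point in internal coordinates
p_geoc = mulp(T, p_internal)               # internal -> "geocentric"
p_back = mulp(np.linalg.inv(T), p_geoc)    # inverse transform: round trip

print(np.allclose(p_back, p_internal))     # True: the round trip recovers the point
```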
Hope this is clear,

Hi Paul,
Thank you so much! I actually found that the marker position I used was wrong. In the GUI, GCP002 is the first marker, but in chunk.markers it is the third, so I was using another marker's position; that's why it was off by a lot!
May I ask how we can get the projection matrix? Since camera.project() projects world coordinates into image coordinates, I assumed marker.position holds the (u, v, w) homogeneous coordinates, but dividing by w does not give the pixel coordinates directly, so I'm confused about which internal coordinate system marker.position actually refers to?
Thanks a lot!
Phogi

Phogi,
I think that camera.project(), which transforms from your project's internal coordinates (Xint, Yint, Zint) to camera pixel coordinates (u, v), can be represented by a combination of 2 matrices in the case where there is no radial or decentering distortion and the camera is a frame type with no rolling shutter. These are:
- the camera.transform.inv() matrix (CTinv), which transforms from internal project coordinates to camera local coordinates (Xcam, Ycam, Zcam);
- the calibration matrix K = Metashape.Matrix([[f+b1, b2, w/2+cx], [0, f, h/2+cy], [0, 0, 1]]) (where f = focal length, cx, cy = principal point offsets, b1 = affinity, b2 = skew, w = image width and h = image height in pixels), which transforms from camera homogeneous coordinates (Xcam/Zcam, Ycam/Zcam, 1) to pixel coordinates (u, v).
In the case where there is radial and decentering distortion, the projection cannot be represented by combining only these 2 matrices, as the distortion (radial + decentering) corrections have to be applied between the applications of the 2 matrices (CTinv and K)...
See the following attachment, which explains this in more detail...
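The distortion-free case above (CTinv, then the perspective divide, then K) can be sketched in plain numpy. All numbers below are made up, and the camera transform CT is a pure translation for simplicity (a real camera transform also has a rotation part):

```python
import numpy as np

# Made-up calibration parameters, following the K matrix layout in the post.
f, cx, cy, b1, b2 = 3000.0, 5.0, -3.0, 0.1, 0.05
w, h = 4000, 3000

K = np.array([
    [f + b1, b2,  w / 2 + cx],
    [0.0,    f,   h / 2 + cy],
    [0.0,    0.0, 1.0],
])

# Made-up camera transform CT (camera local -> internal), a pure translation
# here for simplicity; CTinv goes internal -> camera local.
CT = np.eye(4)
CT[:3, 3] = [10.0, 20.0, 5.0]
CTinv = np.linalg.inv(CT)

def project(p_internal):
    """Project an internal 3D point to pixel coordinates (no distortion)."""
    p_cam = (CTinv @ np.append(p_internal, 1.0))[:3]  # internal -> camera local
    x, y = p_cam[0] / p_cam[2], p_cam[1] / p_cam[2]   # perspective divide
    u, v, _ = K @ np.array([x, y, 1.0])               # apply calibration matrix K
    return u, v

u, v = project(np.array([11.0, 21.0, 15.0]))
# p_cam = (1, 1, 10), so x = y = 0.1 before applying K
print(u, v)
```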

Hi Paul,
Wow, those are fantastic examples!! Thank you, this is the most detailed explanation I have ever seen! Huge thanks!
:) ;) :D
Best,
Phogi

Hi Paul,
Thank you for the explanations. I have another question regarding the camera transform matrix: if I know the photo's precise geolocation, that should be the translation of the camera matrix relative to the world coordinate system. How is that represented in Metashape's coordinate system? It seems CTinv * Tinv should go from world to internal coordinates, but it does not match the photo's geolocation. Should I consider camera.center as the internal coordinates corresponding to the geocentric position of the image GPS?
Thank you!
Best,
Phogi

Phogi,
The translation component of the camera transform matrix CT contains the camera.center coordinates in the internal CS, see:
chunk = Metashape.app.document.chunk
c = chunk.cameras[0] # 1st camera in chunk
CT = c.transform # camera transform matrix, from local camera CS to internal CS
CT
Out[10]: Matrix([[0.9877551448514599, 0.06472273578200843, 0.1419533067327537, 6.441632407819176],
       [0.04974436100893698, 0.993052621548698, 0.1066395301146463, 8.4591383771754],
       [0.1478691055199951, 0.09827236797875841, 0.9841124271771823, 0.16349698202983323],
       [0.0, 0.0, 0.0, 1.0]])
c.center
Out[11]: Vector([6.441632407819176, 8.4591383771754, 0.16349698202983323])
CT.translation()
Out[12]: Vector([6.441632407819176, 8.4591383771754, 0.16349698202983323])
And to get the camera center in geocentric coordinates, just do T.mulp(c.center), where T = chunk.transform.matrix, see:
c = chunk.cameras[0] # 1st camera in chunk
CT = c.transform # camera transform matrix, from local camera CS to internal CS
T = chunk.transform.matrix
T.mulp(c.center) # geocentric coordinates of camera center
Out[4]: Vector([924778.0098904001, 5818984.275948655, 2435469.3308245763])
T*CT
Out[5]: Matrix([[25.156228593938813, 56.75552071028278, 22.513266829782758, 924778.0098904002],
       [27.333547068227375, 11.304934505212975, 59.04191572109327, 5818984.275948654],
       [54.59774280236616, 31.81008450558328, 19.185337009006652, 2435469.3308245763],
       [0.0, 0.0, 0.0, 1.0]])
(T*CT).translation() # translation component of T*CT = geocentric coordinates of camera center
Out[6]: Vector([924778.0098904002, 5818984.275948654, 2435469.3308245763])
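The identity shown in this console output, that T.mulp(c.center) equals the translation component of T*CT, holds for any 4x4 rigid or similarity transforms. A small numpy sketch, with made-up matrices standing in for chunk.transform.matrix and camera.transform:

```python
import numpy as np

def mulp(M, p):
    """Apply a 4x4 transform to a 3D point, like Metashape's Matrix.mulp()."""
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Made-up stand-ins: T for chunk.transform.matrix, CT for camera.transform.
T = np.array([
    [2.0, 0.0,  0.0, 100.0],
    [0.0, 0.0, -2.0, 200.0],
    [0.0, 2.0,  0.0, 300.0],
    [0.0, 0.0,  0.0, 1.0],
])
CT = np.eye(4)
CT[:3, 3] = [1.0, 2.0, 3.0]           # camera center in internal coordinates

center = CT[:3, 3]                     # translation component = camera.center
geoc_a = mulp(T, center)               # T.mulp(c.center)
geoc_b = (T @ CT)[:3, 3]               # translation component of T*CT

print(np.allclose(geoc_a, geoc_b))     # True: both give the same geocentric center
```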

Hi Paul,
Thank you, that makes sense. Does this then correspond to the projection matrix? From there it could be used to draw the view matches, as it relates to the essential matrix. I am also looking at a weird case where the GCP marking RMSE is small enough but the DEM is lower than the GCP; that is probably a separate question, though.
Much appreciated for your kind help, Paul, legend!
Best,
Phogi