Metashape 1.6.5 network processing on HPC.
I'm trying to understand the worker/processing node logfile in the network monitor during the MatchPhotos.prematch part of alignment.
It looks like ~1/2 of the time is being taken initializing the GPUs in each iteration of the task, but I don't know if there is more going on behind the scenes.
In the logfile excerpt below, it looks like each task (iteration of MatchPhotos.prematch) takes ~140-160s, and for each GPU (4 total) "free memory" takes ~18s - I'm not sure if that's querying free memory, or freeing the memory from the last run, or something else, but by my math, each iteration spends about 72s on the "free memory:" portion of the task. Is that normal?
Log excerpt below going through a few iterations starting from the end of point detection:
2020-11-29 23:47:18 [GPU] photo 82059: 70000 points
2020-11-29 23:47:18 points detected in 150.208 sec
2020-11-29 23:47:23 processing finished in 162.678 sec
2020-11-29 23:50:02 MatchPhotos.prematch (4/100): accuracy = High, preselection = generic, reference, keypoint limit = 70000, tiepoint limit = 0, apply masks = 0, filter tie points = 0
2020-11-29 23:50:10 loaded matching partition in 0.025084 sec
2020-11-29 23:50:10 loaded keypoint partition in 0.018421 sec
2020-11-29 23:50:54 loaded keypoints in 37.2838 sec
2020-11-29 23:50:54 loaded matching data in 0.000232 sec
2020-11-29 23:50:54 Selecting pairs...
2020-11-29 23:50:54 Found 4 GPUs in 0.261444 sec (CUDA: 0.217975 sec, OpenCL: 0.043425 sec)
2020-11-29 23:51:12 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:51:12 driver/runtime CUDA: 10010/6050
2020-11-29 23:51:12 max work group size 1024
2020-11-29 23:51:12 max work item sizes [1024, 1024, 64]
2020-11-29 23:51:12 got device properties in 0.001026 sec, free memory in 18.1027 sec
2020-11-29 23:51:30 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:51:30 driver/runtime CUDA: 10010/6050
2020-11-29 23:51:30 max work group size 1024
2020-11-29 23:51:30 max work item sizes [1024, 1024, 64]
2020-11-29 23:51:30 got device properties in 0.000813 sec, free memory in 18.3504 sec
2020-11-29 23:51:49 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:51:49 driver/runtime CUDA: 10010/6050
2020-11-29 23:51:49 max work group size 1024
2020-11-29 23:51:49 max work item sizes [1024, 1024, 64]
2020-11-29 23:51:49 got device properties in 0.000794 sec, free memory in 18.2349 sec
2020-11-29 23:52:06 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:52:06 driver/runtime CUDA: 10010/6050
2020-11-29 23:52:06 max work group size 1024
2020-11-29 23:52:06 max work item sizes [1024, 1024, 64]
2020-11-29 23:52:06 got device properties in 0.002548 sec, free memory in 17.4495 sec
2020-11-29 23:53:18 977304 matches found in 144.231 sec
2020-11-29 23:53:18 matches combined in 0.133228 sec
2020-11-29 23:53:19 filtered 84711 out of 579510 matches (14.6177%) in 0.116332 sec
2020-11-29 23:53:19 saved matches in 0.006697 sec
2020-11-29 23:53:19 processing finished in 197.588 sec
2020-11-29 23:53:22 MatchPhotos.prematch (11/100): accuracy = High, preselection = generic, reference, keypoint limit = 70000, tiepoint limit = 0, apply masks = 0, filter tie points = 0
2020-11-29 23:53:29 loaded matching partition in 0.010066 sec
2020-11-29 23:53:29 loaded keypoint partition in 0.004592 sec
2020-11-29 23:53:38 loaded keypoints in 7.61736 sec
2020-11-29 23:53:38 loaded matching data in 0.000374 sec
2020-11-29 23:53:38 Selecting pairs...
2020-11-29 23:53:38 Found 4 GPUs in 0.23591 sec (CUDA: 0.196422 sec, OpenCL: 0.039459 sec)
2020-11-29 23:53:56 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:53:56 driver/runtime CUDA: 10010/6050
2020-11-29 23:53:56 max work group size 1024
2020-11-29 23:53:56 max work item sizes [1024, 1024, 64]
2020-11-29 23:53:56 got device properties in 0.000824 sec, free memory in 18.0572 sec
2020-11-29 23:54:14 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:54:14 driver/runtime CUDA: 10010/6050
2020-11-29 23:54:14 max work group size 1024
2020-11-29 23:54:14 max work item sizes [1024, 1024, 64]
2020-11-29 23:54:14 got device properties in 0.000802 sec, free memory in 17.8316 sec
2020-11-29 23:54:32 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:54:32 driver/runtime CUDA: 10010/6050
2020-11-29 23:54:32 max work group size 1024
2020-11-29 23:54:32 max work item sizes [1024, 1024, 64]
2020-11-29 23:54:32 got device properties in 0.000784 sec, free memory in 17.5461 sec
2020-11-29 23:54:49 Using device: Tesla V100-SXM2-16GB, 80 compute units, free memory: 15806/16130 MB, compute capability 7.0
2020-11-29 23:54:49 driver/runtime CUDA: 10010/6050
2020-11-29 23:54:49 max work group size 1024
2020-11-29 23:54:49 max work item sizes [1024, 1024, 64]
2020-11-29 23:54:49 got device properties in 0.000805 sec, free memory in 17.486 sec
2020-11-29 23:56:01 269548 matches found in 142.999 sec
2020-11-29 23:56:01 matches combined in 0.062111 sec
2020-11-29 23:56:02 filtered 30416 out of 151205 matches (20.1157%) in 0.045621 sec
2020-11-29 23:56:02 saved matches in 0.006546 sec
2020-11-29 23:56:02 processing finished in 160.668 sec
Thanks for any insight. Happy to read more about network processing if I'm missing an informative reference/resource.
[edit] - a quick additional question/comment - I notice that in the code snippet below during the step between "setting point indices..." and "generated XXX tiepoints", there is a long time (almost an hour) with almost no CPU activity (I couldn't tell if the GPUs were active on this node). I was reviewing my workstation log and this long period occurs in non-network processing too - is this a GPU or CPU step? I guess it slows down with many tiepoints?
2020-11-30 01:46:24 MatchPhotos.finalize (1/1): accuracy = High, preselection = generic, reference, keypoint limit = 70000, tiepoint limit = 0, apply masks = 0, filter tie points = 0
2020-11-30 01:46:30 loaded matching data in 0.016731 sec
2020-11-30 01:46:31 loaded matching partition in 0.113757 sec
2020-11-30 01:46:31 loaded keypoint partition in 0.000596 sec
2020-11-30 01:46:44 loaded matches in 8.68071 sec
2020-11-30 01:46:44 setting point indices... 63857782 done in 25.3933 sec
2020-11-30 02:37:58 generated 63857782 tie points, 3.26963 average projections
2020-11-30 02:38:12 removed 344871 multiple indices
2020-11-30 02:38:13 removed 21032 track