Topic: Network vs Non-network alignment performance

andyroo

  • Sr. Member
  • ****
  • Posts: 438
Network vs Non-network alignment performance
« on: December 02, 2020, 10:28:00 AM »
I was comparing alignment time on a relatively large project with network vs. non-network processing, and was surprised that the non-network machine seems to be going much faster (~4x), especially in alignment finalization. One thing I noticed is that the node only processes 7 images at a time, while the workstation processes ~20. The workstation (Threadripper 3960X/256GB RAM/2x RTX 2080 Super) takes 5-6 minutes to adjust points from each 20-image batch, while the network node (2x 18-core 2.3GHz Skylake CPU/384GB RAM/4x NVIDIA V100) takes 7-8 minutes to adjust points from each 7-image batch. I understand that the 3960X runs at a higher clock frequency, but not why the machine with more RAM and more cores takes fewer images per batch. The project is the same, just copied to the network with the image paths changed. The network nodes have faster disk/network access than my workstation.

Log excerpts below (workstation first):

Code:
...
2020-12-01 22:52:00 adding camera 75068 (77523 of 80065), 1128 of 1132 used
2020-12-01 22:52:00 adding camera 76188 (77524 of 80065), 1056 of 1056 used
2020-12-01 22:52:43 adding 116073 points, 45 far (12.272 threshold), 311 inaccurate, 346 invisible, 129 weak
2020-12-01 22:53:48 adjusting: xxxxxxxxxx 0.268141 -> 0.267689
2020-12-01 22:59:16 adding 964 points, 413 far (12.272 threshold), 322 inaccurate, 352 invisible, 131 weak
2020-12-01 22:59:16 optimized in 393.425 seconds
2020-12-01 22:59:36 adding camera 77283 (77525 of 80065), 7480 of 7500 used
2020-12-01 22:59:36 adding camera 78352 (77526 of 80065), 5854 of 5858 used
2020-12-01 22:59:36 adding camera 76647 (77527 of 80065), 4961 of 4964 used
2020-12-01 22:59:36 adding camera 77607 (77528 of 80065), 4405 of 4415 used
2020-12-01 22:59:36 adding camera 77284 (77529 of 80065), 3959 of 3986 used
2020-12-01 22:59:36 adding camera 77323 (77530 of 80065), 3047 of 3059 used
2020-12-01 22:59:36 adding camera 76345 (77531 of 80065), 2832 of 2832 used
2020-12-01 22:59:36 adding camera 77427 (77532 of 80065), 2829 of 2833 used
2020-12-01 22:59:36 adding camera 76648 (77533 of 80065), 2721 of 2727 used
2020-12-01 22:59:36 adding camera 78353 (77534 of 80065), 2613 of 2620 used
2020-12-01 22:59:36 adding camera 78543 (77535 of 80065), 2494 of 2504 used
2020-12-01 22:59:36 adding camera 76189 (77536 of 80065), 2260 of 2260 used
2020-12-01 22:59:36 adding camera 77608 (77537 of 80065), 2020 of 2030 used
2020-12-01 22:59:36 adding camera 77285 (77538 of 80065), 1845 of 1859 used
2020-12-01 22:59:36 adding camera 76344 (77539 of 80065), 1408 of 1408 used
2020-12-01 22:59:36 adding camera 76649 (77540 of 80065), 1401 of 1413 used
2020-12-01 22:59:36 adding camera 75067 (77541 of 80065), 1349 of 1349 used
2020-12-01 22:59:36 adding camera 77322 (77542 of 80065), 1299 of 1310 used
2020-12-01 22:59:36 adding camera 77426 (77543 of 80065), 1145 of 1145 used
2020-12-01 22:59:36 adding camera 76190 (77544 of 80065), 1016 of 1016 used
2020-12-01 23:00:18 adding 101126 points, 36 far (12.272 threshold), 314 inaccurate, 352 invisible, 133 weak
2020-12-01 23:01:24 adjusting: xxxxxxxxxx 0.26763 -> 0.267341
2020-12-01 23:06:53 adding 916 points, 422 far (12.272 threshold), 324 inaccurate, 351 invisible, 134 weak
2020-12-01 23:06:53 optimized in 395.194 seconds
2020-12-01 23:07:13 adding camera 77286 (77545 of 80065), 7435 of 7447 used
2020-12-01 23:07:13 adding camera 77609 (77546 of 80065), 6284 of 6292 used
2020-12-01 23:07:13 adding camera 77321 (77547 of 80065), 6004 of 6017 used
...

and the network version:

Code:
...
2020-12-02 01:09:59 adding camera 76868 (78524 of 80065), 2417 of 2423 used
2020-12-02 01:09:59 adding camera 76869 (78525 of 80065), 1439 of 1449 used
2020-12-02 01:10:19 adding 32960 points, 8 far (12.272 threshold), 314 inaccurate, 364 invisible, 153 weak
2020-12-02 01:11:11 adjusting: xxxxxxxxxx 0.254763 -> 0.25469
2020-12-02 01:18:11 adding 832 points, 303 far (12.272 threshold), 323 inaccurate, 364 invisible, 153 weak
2020-12-02 01:18:11 optimized in 472.695 seconds
2020-12-02 01:18:32 adding camera 76870 (78526 of 80065), 4766 of 4783 used
2020-12-02 01:18:32 adding camera 77833 (78527 of 80065), 4613 of 4626 used
2020-12-02 01:18:32 adding camera 76871 (78528 of 80065), 2639 of 2651 used
2020-12-02 01:18:32 adding camera 77834 (78529 of 80065), 2339 of 2347 used
2020-12-02 01:18:32 adding camera 76872 (78530 of 80065), 1394 of 1406 used
2020-12-02 01:18:52 adding 32871 points, 18 far (12.272 threshold), 314 inaccurate, 364 invisible, 152 weak
2020-12-02 01:19:44 adjusting: xxxxxxxxxx 0.254728 -> 0.254651
2020-12-02 01:26:29 adding 830 points, 304 far (12.272 threshold), 323 inaccurate, 364 invisible, 152 weak
2020-12-02 01:26:29 optimized in 457.407 seconds
2020-12-02 01:26:50 adding camera 77835 (78531 of 80065), 8946 of 8949 used
2020-12-02 01:26:50 adding camera 76873 (78532 of 80065), 4739 of 4748 used
...
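
For anyone who wants to compare the two logs on a per-camera basis rather than per batch, here is a minimal sketch that counts the "adding camera" lines between consecutive "optimized in" lines. The function names and the `SAMPLE` string are made up for illustration; the sample is one batch copied from the network log above, and the regular expressions assume the log format shown in these excerpts. For the batches excerpted above, the 20-camera workstation batch at 395.194 s works out to roughly 20 s per camera, and the 5-camera network batch at 457.407 s to roughly 91 s per camera.

```python
import re

# Patterns assume the log line format shown in the excerpts above.
CAMERA_RE = re.compile(r"adding camera \d+ \(\d+ of \d+\)")
OPTIMIZED_RE = re.compile(r"optimized in ([\d.]+) seconds")

# One complete batch copied from the network log excerpt.
SAMPLE = """\
2020-12-02 01:18:32 adding camera 76870 (78526 of 80065), 4766 of 4783 used
2020-12-02 01:18:32 adding camera 77833 (78527 of 80065), 4613 of 4626 used
2020-12-02 01:18:32 adding camera 76871 (78528 of 80065), 2639 of 2651 used
2020-12-02 01:18:32 adding camera 77834 (78529 of 80065), 2339 of 2347 used
2020-12-02 01:18:32 adding camera 76872 (78530 of 80065), 1394 of 1406 used
2020-12-02 01:18:52 adding 32871 points, 18 far (12.272 threshold), 314 inaccurate, 364 invisible, 152 weak
2020-12-02 01:19:44 adjusting: xxxxxxxxxx 0.254728 -> 0.254651
2020-12-02 01:26:29 adding 830 points, 304 far (12.272 threshold), 323 inaccurate, 364 invisible, 152 weak
2020-12-02 01:26:29 optimized in 457.407 seconds
"""


def summarize_batches(log_text):
    """Return a list of (cameras_in_batch, optimize_seconds) tuples.

    A batch is every 'adding camera' line seen since the previous
    'optimized in' line. If the log starts mid-batch, the first entry
    undercounts its cameras.
    """
    batches = []
    cameras = 0
    for line in log_text.splitlines():
        if CAMERA_RE.search(line):
            cameras += 1
            continue
        match = OPTIMIZED_RE.search(line)
        if match:
            batches.append((cameras, float(match.group(1))))
            cameras = 0
    return batches


def seconds_per_camera(batches):
    """Total optimization seconds divided by total cameras added."""
    total_cameras = sum(n for n, _ in batches)
    total_seconds = sum(s for _, s in batches)
    return total_seconds / total_cameras if total_cameras else float("nan")


if __name__ == "__main__":
    batches = summarize_batches(SAMPLE)
    print(batches)
    print(round(seconds_per_camera(batches), 1), "seconds per camera")
```

Running the same function over the full workstation and network logs (read with `open(path).read()`) would give a like-for-like rate across the whole run, which is more telling than eyeballing a few batches.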

[edit] 15 hours in, the workstation is within 5% of the network run, which has been going for about 60h: the workstation is at 83% complete and the network is at 88%. I see that the workstation also takes smaller bites sometimes (like now, where it's at about the same stage as the network alignment), but finishes those much more quickly, in around 1 minute instead of 7:

Code:
2020-12-02 09:05:59 adding camera 76833 (78428 of 80065), 1185 of 1188 used
2020-12-02 09:05:59 adding camera 76211 (78429 of 80065), 1010 of 1012 used
2020-12-02 09:06:42 adding 39366 points, 18 far (12.272 threshold), 319 inaccurate, 362 invisible, 134 weak
2020-12-02 09:07:49 adjusting: xxxxxxxxxx 0.252005 -> 0.251927
2020-12-02 09:13:54 adding 815 points, 317 far (12.272 threshold), 324 inaccurate, 362 invisible, 134 weak
2020-12-02 09:13:54 optimized in 432.302 seconds
2020-12-02 09:14:13 adding camera 77799 (78430 of 80065), 10228 of 10235 used
2020-12-02 09:14:13 adding camera 77800 (78431 of 80065), 4966 of 4976 used
2020-12-02 09:14:13 adding camera 76834 (78432 of 80065), 3857 of 3859 used
2020-12-02 09:14:13 adding camera 76210 (78433 of 80065), 2683 of 2686 used
2020-12-02 09:14:13 adding camera 77801 (78434 of 80065), 2278 of 2290 used
2020-12-02 09:14:13 adding camera 76835 (78435 of 80065), 2180 of 2182 used
2020-12-02 09:14:13 adding camera 76209 (78436 of 80065), 1538 of 1542 used
2020-12-02 09:14:13 adding camera 74924 (78437 of 80065), 1521 of 1521 used
2020-12-02 09:14:13 adding camera 76836 (78438 of 80065), 1321 of 1325 used
2020-12-02 09:14:55 adding 49143 points, 9 far (12.272 threshold), 319 inaccurate, 362 invisible, 132 weak
2020-12-02 09:16:03 adjusting: xxxxxxxxxx 0.252002 -> 0.251906
2020-12-02 09:22:48 adding 813 points, 317 far (12.272 threshold), 323 inaccurate, 362 invisible, 132 weak
2020-12-02 09:22:48 optimized in 472.756 seconds
2020-12-02 09:23:07 adding camera 77802 (78439 of 80065), 10655 of 10665 used
2020-12-02 09:23:07 adding camera 77803 (78440 of 80065), 5295 of 5308 used
« Last Edit: December 02, 2020, 08:30:59 PM by andyroo »

andyroo

Re: Network vs Non-network alignment performance
« Reply #1 on: December 03, 2020, 06:58:37 AM »
Another 10h in and they're both going at about the same pace: the workstation is 4.9% behind the network project (87% vs. 90.9%). I notice on my workstation that Metashape shows 404GB of commit memory and 57GB of working set/private memory, with 107GB of RAM available (256GB installed).

Is this potentially a factor for the dramatic slowdown in the later part of this alignment?

The number of cameras added for each adjustment/optimization is pretty variable on both the workstation and the network, but the workstation is regularly about a minute faster on each. The network machine shows 16GB used out of 384GB.
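
A rough way to read the workstation memory figures, as a sketch only: it assumes the usual Windows rule of thumb that the system commit limit is approximately physical RAM plus total pagefile size, and the helper name `min_pagefile_gb` is made up for illustration. Under that assumption, a 404GB commit size on a 256GB machine implies a pagefile of at least about 148GB backing the allocation. That does not by itself show heavy paging, since only the 57GB working set is resident in RAM.

```python
def min_pagefile_gb(commit_gb, installed_ram_gb):
    """Lower bound on pagefile size implied by a commit charge.

    Assumption: commit limit ~= physical RAM + total pagefile size,
    so any commit above installed RAM must be backed by the pagefile.
    """
    return max(0.0, float(commit_gb - installed_ram_gb))


# Figures reported for the workstation in the post above.
commit_gb = 404
working_set_gb = 57
installed_gb = 256

if __name__ == "__main__":
    print(min_pagefile_gb(commit_gb, installed_gb), "GB of commit beyond installed RAM")
    print(round(100 * working_set_gb / commit_gb, 1), "% of commit is resident")
```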

andyroo

Re: Network vs Non-network alignment performance
« Reply #2 on: December 03, 2020, 11:21:00 AM »
I was exploring system resource use and noticed that Metashape is reading from and writing to pagefile.sys and System Volume Information during the adding/adjusting steps (screenshots attached). It's not a huge amount of activity, but it makes me wonder whether it's using the pagefile, and why, since I have about 100GB of free RAM.

MaxRSS on the SLURM node where the job was processing was ~71GB with 384GB RAM.
« Last Edit: December 04, 2020, 12:40:30 AM by andyroo »

andyroo

Re: Network vs Non-network alignment performance
« Reply #3 on: December 06, 2020, 11:14:47 PM »
I'm wondering if there are some tweaks to project setup or processing that might speed up this last step, and why large projects get to ~80% so quickly but then go so slowly for the last 20%, seemingly regardless of the amount of available RAM.