Author Topic: Poor network processing performance  (Read 6013 times)

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Poor network processing performance
« on: November 07, 2024, 02:25:16 PM »
I am preparing for some fairly large-scale underwater photogrammetry: multiple models of around 10,000 to 15,000 12 Mpix photos each, conceivably substantially more depending on upcoming trials, ideally processed as one model. I decided to look at the network processing option to see whether it could provide a convenient, scalable approach using networked laptops, since processing may actually happen on board a vessel out at sea. I set up 2 laptops: an XMG running a Ryzen 9 5900HX plus Nvidia 3080, and a Razer running an i9-14900HX plus Nvidia 4090, both with 64 GB of RAM and SSD storage. Both are quite powerful machines as laptops go.
It took a while to get Metashape network processing to work. Most of the effort went into getting the machines to see each other across the LAN (the Metashape part was quite straightforward). I used the more powerful Razer machine as both the Server and a Worker, and the XMG as just a Worker.
As test data I used a dataset of around 5000 photos which was previously used to create a model of an underwater shipwreck. I ran the alignment with identical processing parameters on the Razer standalone, on the XMG standalone, and on both in network mode, and compared the alignment timings. I got the following:
XMG - Matching time 1hr45m, Alignment time 51m
Razer - Matching time 1hr8m, Alignment time 42m
Networked - Matching time 2hr27m, Alignment time 48m
The only setting I changed was to put the Razer (the more powerful computer) to High priority on the management console, everything else I left as default, reasoning that since it is a more powerful machine, it should be given more work to do(?).  (I could not find much in the way of instructions)
So anyway, I was quite surprised that running BOTH machines in parallel resulted in a substantially LONGER matching time than either of the standalone runs, in fact more than twice as long as the Razer alone. The Alignment time was roughly equivalent in all three tests. I am sure there are overheads involved, but I doubt they should affect the timings to that extent.
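To put numbers on that comparison, here is a quick sketch in Python using only the matching times reported above. The "ideal" figure is an assumption for illustration: it supposes the matching work splits perfectly in proportion to each machine's standalone speed, with zero network or storage overhead, which is an idealisation rather than anything Metashape promises.

```python
# Matching times reported above, converted to minutes.
xmg = 1 * 60 + 45        # 105 min, XMG standalone
razer = 1 * 60 + 8       # 68 min, Razer standalone
networked = 2 * 60 + 27  # 147 min, both machines in network mode

# Idealised combined time: processing rates add up when the work divides
# perfectly and there is no overhead (an assumption, not a measurement).
ideal = 1 / (1 / xmg + 1 / razer)

slowdown_vs_razer = networked / razer
slowdown_vs_ideal = networked / ideal

print(f"ideal combined matching time: {ideal:.1f} min")
print(f"networked vs Razer alone:     {slowdown_vs_razer:.2f}x")
print(f"networked vs ideal:           {slowdown_vs_ideal:.2f}x")
```

On these figures the idealised two-machine time is about 41 minutes, so the networked run was roughly 2.2 times slower than the Razer alone and about 3.6 times slower than the idealised split.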
I am of course very interested to understand why this is, could be I am doing something wrong.  It would be great to hear from anyone on this forum who may have experience in such setups.
Thanks in advance
John

Bzuco

  • Full Member
  • ***
  • Posts: 244
Re: Poor network processing performance
« Reply #1 on: November 07, 2024, 08:29:29 PM »
Matching points is already fully GPU accelerated, so distributing this task between two computers will probably slow it down. Try increasing the number of workers on each machine to see if it helps.

It is better to check the times of each subtask individually:
detecting points
selecting pairs
matching points
estimating camera locations

...and also monitor CPU and GPU usage (how much time they spend computing and how much waiting), plus how much data is transferred between the computers across the LAN.


Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #2 on: November 08, 2024, 01:22:11 AM »
Hello John,

How was the shared storage organized, where were the project files and source photos stored, and what was the connection speed of both laptops to that storage? Was the configuration the same as for the local processing tests?
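One rough way to answer the connection-speed question is to time a sequential read of the same large file from the local disk and from the shared folder. The sketch below is a generic Python helper, not a Metashape tool; the function name and the script usage are hypothetical, and the result can be inflated by the operating system's file cache if the file was read recently.

```python
import time
from pathlib import Path


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially and return the observed rate in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total_bytes / 1e6 / max(elapsed, 1e-9)


if __name__ == "__main__":
    import sys

    # Pass one or more file paths, e.g. the same file on the local SSD
    # and through the shared folder, to compare the two rates.
    for candidate in sys.argv[1:]:
        size_mb = Path(candidate).stat().st_size / 1e6
        rate = read_throughput_mb_s(candidate)
        print(f"{candidate}: {size_mb:.0f} MB read at {rate:.0f} MB/s")
```

A large gap between the two rates on the same file would point at the share or the network rather than at Metashape itself.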

It would also be interesting to see the processing time for a similar configuration, but with only one node (in network mode) working at a time.

Additionally, if you have processing logs saved for all the tests, you can send them to support@agisoft.com so that we can check what took so long during the image matching.
Best regards,
Alexey Pasumansky,
Agisoft LLC

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Re: Poor network processing performance
« Reply #3 on: November 08, 2024, 11:52:48 AM »
Hi Alexey,

To answer your questions:

The shared storage is on a 4 TB SSD on the Razer, which is the machine running both the server and a worker. I then mapped a drive letter (G:) on both machines (Razer and XMG) to this shared folder. The project files and the photos are both on G:

For networking I am connecting both machines wired to a wireless access point's 2 Gb Ethernet LAN ports.

The local processing test on the Razer was different only in that it used the proper SSD drive letter (D:) for the project files and the photos.

I did not have logging on but I will switch on and rerun tests.

You say "Would be also interesting to see the processing time for the similar configuration, but if you leave only one node (in network mode) working at a time."  By this do I understand that you want a test on the Razer alone but running in network mode, and with just one Worker on the Razer?

In the meantime, following Bzuco's suggestion, I ran a test with 3 workers on each machine, then again with 6 on each. The test with 3 seemed to produce more CPU and GPU activity on both machines than the test with one worker on each. Unfortunately the timing reported in the Chunk Info box was totally wrong: it reported some 5 hours for the process when it was definitely a lot less, so there is some error in compiling that info for network mode processing. (Which may call into question the timings I reported above, I guess! Something to look at please.)

The test with 6 and 6 workers failed. What seems to have happened is that at the end of the Matching task all the workers except 1 went into some waiting mode, waiting for just one worker on the XMG which, while showing some CPU activity (3 or 4%), never progressed to completion after 1 hour, so I aborted. Something else to kindly look at.

Finally, another thing to mention is that no single test resulted in the identical number of photos being aligned, even though I was running totally identical alignment parameters.  Small differences (say 10 photos which aligned in one test did not align in another), but still I thought I would mention.

Thanks

John

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #4 on: November 08, 2024, 02:51:47 PM »
Hello John,

Yes, I have meant the following:
Test 1 (as you have already completed): Network mode, shared storage, two nodes connected.
Test 2: all the same, but the first node is on pause, so only the second is working.
Test 3: same as above, but only the first node is working while the second one is on pause.
Best regards,
Alexey Pasumansky,
Agisoft LLC

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Re: Poor network processing performance
« Reply #5 on: November 08, 2024, 07:00:05 PM »
Hi Alexey,

Where is the log for network processing stored please?  Because my first test for network processing came back with an empty log.

Thanks

John

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #6 on: November 11, 2024, 12:53:40 AM »
Hello John,

You can access the log via Agisoft Network Monitor: right-click on the Batch name, then choose the Details option from the context menu. If the batch is over, you may need to enable the "show completed batches" option in the View menu.
Best regards,
Alexey Pasumansky,
Agisoft LLC

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Re: Poor network processing performance
« Reply #7 on: November 12, 2024, 12:20:57 PM »
Hi Alexey,

I have just sent you a detailed email with test results and related logs.

Please let me know your thoughts and any suggestions - I am happy to test further....

Regards

John

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #8 on: November 12, 2024, 01:52:37 PM »
Hello John,

Thank you for sending the test logs, we will analyze them and get back to you shortly.

Meanwhile I have a question about the local processing tests (test 4 and test 5): is it correct that for those tests the project was not saved as PSX after loading images to the workspace? If so, please check whether there is a considerable difference in processing time for the same local processing tests if you save the project in PSX format on the shared storage (on the Razer laptop, as is done for network processing) before starting the Align Photos operation in local mode.

When the project is saved in PSX, the intermediate processing results will be saved to the project.files folder; if you do not save the project in this format, all the intermediate results will be kept in memory.
Best regards,
Alexey Pasumansky,
Agisoft LLC

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Re: Poor network processing performance
« Reply #9 on: November 12, 2024, 03:29:29 PM »
Alexey,

Thanks for the info, I was not aware (could be a useful tip to speed up initial processing!)

To be certain of the results I have rerun the standalone tests for both scenarios (project in memory, and project pre-saved on local SSD, so NO network processing at all). There seems to be about a 15% increase in time when the project runs pre-saved on SSD vs. purely in memory; if this extrapolates to larger projects it can be considered significant in my opinion. Compared to this, running the projects from the shared storage (i.e. both project file and photos on the Razer shared storage) is MUCH slower. This happens even on the Razer, which is itself providing the shared storage (a reminder that the shared drive is on an internal Razer SSD).

I am wondering whether sharing the files from a NAS, replacing the Windows sharing, may help network processing. Windows disk sharing is apparently quite notorious for its poor execution. Could this be the bottleneck we are experiencing?

I'm not sure this would explain all the results I shared with you though.....

Thanks

John

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #10 on: November 12, 2024, 03:34:45 PM »
Hello John,

NAS storage may be preferable. I would also suggest using UNC paths instead of a mapped drive letter on Windows workers.
So in your tests the access to the shared folder on one of the machines could be a reason for the considerable slowdown, but I will be able to say for sure after analyzing the logs.
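For anyone rewriting paths by hand, the UNC suggestion can be sketched as a small Python helper. The host and share name (`\\RAZER\metashape_share`) and the sample paths are hypothetical placeholders; only the G: drive letter comes from this thread. This is a generic path rewrite and does not touch Metashape itself.

```python
from pathlib import PureWindowsPath

# Hypothetical mapping: drive letter -> UNC share root. Replace with the
# real host and share names used on your LAN.
DRIVE_TO_UNC = {"G:": r"\\RAZER\metashape_share"}


def to_unc(path, mapping=DRIVE_TO_UNC):
    """Rewrite a mapped-drive path as a UNC path; leave other paths alone."""
    p = PureWindowsPath(path)
    root = mapping.get(p.drive.upper())
    if root is None:
        return str(p)
    # p.parts[0] is the drive anchor (e.g. "G:\\"); the rest is relative.
    return str(PureWindowsPath(root, *p.parts[1:]))


print(to_unc(r"G:\wreck\project.psx"))
print(to_unc(r"D:\local\photo_0001.jpg"))
```

`PureWindowsPath` is used so the sketch behaves the same on any operating system; paths on drives that are not in the mapping are returned unchanged.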
Best regards,
Alexey Pasumansky,
Agisoft LLC

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15228
Re: Poor network processing performance
« Reply #11 on: November 12, 2024, 06:42:30 PM »
As an additional idea (mostly related to the alignment stage), we could also suggest checking whether the i9 CPU is moving the Metashape job to the "efficient" cores when Metashape runs as a worker, whereas in GUI mode it runs on the "performance" cores.

There is some discussion about forcing the use of performance cores regardless of the window status (such as a minimized window):
https://www.reddit.com/r/XMG_gg/comments/vlqn6d/psa_rendering_tasks_are_moved_to_ecores_when/
Maybe you can also try that.
Best regards,
Alexey Pasumansky,
Agisoft LLC

jinjamu

  • Jr. Member
  • **
  • Posts: 57
Re: Poor network processing performance
« Reply #12 on: November 12, 2024, 08:17:44 PM »
Alexey,

I tried changing the file paths all to UNC, it does not look like a material difference in the timings unfortunately, but I need to recheck.

I also copied the photos to an identical path on both machines and ran the networking test, so this time no photos were being read across the network, only the project files themselves.  There was an improvement in the matching phase, but not in the alignment phase.  I guess this would be consistent with what we were saying regarding the probable bottleneck on the shared folder (photos are read once at the beginning of the alignment?).

I will send you the log, and you can compare it directly with Test 1 log which you already have

Regards

John


andyroo

  • Sr. Member
  • ****
  • Posts: 458
Re: Poor network processing performance
« Reply #13 on: November 15, 2024, 12:15:48 AM »
Something else worth considering: it sounds like you are networking with gigabit Ethernet. That's significantly slower than even an internal spinny disk (SATA3 = 6 Gb/s). Not sure what the network adapter limits are on your laptops, but if they support 10GbE or even 2.5GbE then it might be worth investing in a switch that supports their max network speed. I have our network on 10GbE and am thinking about upgrading to 25G (SFP28) because that looks like the bottleneck.
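As a rough illustration of those link speeds, the Python sketch below converts nominal line rates to MB/s and estimates how long one full pass over a photo set would take. The 5 MB per photo figure is purely an assumption for illustration (the thread only gives the photo count and resolution), and real throughput will be lower than the nominal rate once protocol and file-sharing overhead are included.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a nominal line rate in Gb/s to MB/s (8 bits per byte)."""
    return gbit_per_s * 1000 / 8


# Nominal line rates mentioned above; protocol overhead is ignored.
links = {
    "1 GbE": 1,
    "2.5 GbE": 2.5,
    "SATA3": 6,
    "10 GbE": 10,
    "25 GbE (SFP28)": 25,
}

# Assumed dataset: 5000 photos at a hypothetical 5 MB each.
dataset_mb = 5000 * 5

for name, gbit in links.items():
    rate = gbit_to_mbyte_per_s(gbit)
    minutes = dataset_mb / rate / 60
    print(f"{name:>15}: {rate:7.1f} MB/s, one full pass in {minutes:5.1f} min")
```

Gigabit Ethernet works out to a nominal 125 MB/s, so every worker reading photos and intermediate project files through one shared link competes for that budget.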

Also note that the network host/server doesn't require a Metashape license, so you can have even a mini-PC with SSD storage and fast networking as your Metashape server, and share the drive with a UNC path.

Bzuco

  • Full Member
  • ***
  • Posts: 244
Re: Poor network processing performance
« Reply #14 on: November 15, 2024, 12:01:50 PM »
Until you resolve network processing performance:
As you are running both laptops with high-performance components, you should really consider undervolting the CPU and GPU, otherwise you waste energy on heat instead of performance. Both CPU and GPU chips suffer from overvolting (the default factory setting).
You can undervolt the CPU in the Razer utility, or using the ThrottleStop utility, or Intel Extreme Tuning Utility, ... and the GPU using MSI Afterburner.
This will significantly speed up your local processing when all CPU cores are utilized (estimating camera locations, generating dense point cloud (filtering depth maps)) and in GPU tasks (matching points, depth maps calculation).
With undervolted components you can reach higher and more stable frequencies without hitting the high temperatures at which chips start to throttle.
Here is a video showing undervolting results: https://www.youtube.com/watch?v=azGt-rH_8qc