
Author Topic: Much faster processing using multiple local nodes/workers

CheeseAndJamSandwich

Much faster processing using multiple local nodes/workers
« on: December 19, 2024, 01:46:08 AM »
It turns out, we have some very sizeable speed increases available to us today, just using a neat trick!

Recently, @Bzuco promoted the method of running Metashape with network processing, but doing it locally on your one workstation and spawning multiple worker nodes!
https://www.agisoft.com/forum/index.php?topic=16384.msg70457#msg70457
It's based on a trick for processing 'HUGE' projects, shown in Geospatial Tips' YouTube video:
https://www.youtube.com/watch?v=BYRIC-qkZJ8

This seems to alleviate much of the issue we all recognise, where Metashape dwells for ages using only 10-20% of the processor or GPU.  It's simply not pegging the CPU/GPU at 100% all the time, like we'd hope it would.

Obviously many stages of Metashape's processing are limited to a single thread, but even the stages that aren't still don't seem to push the hardware to 100%.  It looks like running multiple nodes on the one workstation does let us push it to 100% more often.

Doing some trials with Metashape Pro and the networking feature, I've also run tests increasing the node count from 1 to 2, to 3, to 4...  noting the processing times.
Just use metashape-server.exe with the right arguments to start up one server and then 6 nodes, all using your workstation's IP address.  Then use the Network Monitor to connect to it and pause/unpause nodes when needed.
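
If you'd rather script the whole thing than type it out, here's a minimal Python sketch of the idea using subprocess.  The executable path, IP address, --server/--node/--dispatch/--root switches and node count are assumptions from my setup, so check the Network Processing section of the manual for your version's exact arguments:

Code: [Select]
import subprocess

# Assumptions: adjust the path, IP, root folder and switches to your install/version.
SERVER_EXE = r"C:\Program Files\Agisoft\Metashape Pro\metashape-server.exe"
HOST = "192.168.1.50"        # your workstation's own IP address
ROOT = r"D:\metashape_jobs"  # shared root folder that the project lives under
NODES = 6                    # how many workers to spawn

# One dispatch server...
procs = [subprocess.Popen([SERVER_EXE, "--server", "--dispatch", HOST, "--root", ROOT])]

# ...and several worker nodes, all pointing back at the same machine.
for _ in range(NODES):
    procs.append(subprocess.Popen([SERVER_EXE, "--node", "--dispatch", HOST, "--root", ROOT]))

input("Server + %d nodes running - press Enter to shut them all down..." % NODES)
for p in procs:
    p.terminate()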

For aligning, with my crappy old Thinkpad P51 with a 5700 XT eGPU, I hit a hard wall at 6 nodes.
For building the model, it can only handle 2 nodes, as it fills up the VRAM and massively slows down if it has to swap VRAM to RAM.

So for the benchmark test I just ran, I got the following for the Match & Align Photos job (times in h:mm):
Normal:  2:12 + 0:39 = 171 mins
8 Nodes:  1:20 + 0:38 = 119 mins
43% faster!!!


So, could Metashape adopt multiple local workers/nodes?
For each sub-task, the optimal number of nodes is spawned to complete the task, maximising the utilisation of the hardware.
Either MS dynamically adds another node if utilisation isn't at 100% or there's still VRAM available, and kills one if it's swapping... or perhaps MS runs some tests to 'characterise' the hardware in use and sets the number of nodes it'll use for each sub-task?
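
Very roughly, I'm imagining a little watchdog along these lines: keep adding a worker while there's headroom, back off when RAM gets tight.  This is only a hand-wavy sketch - psutil is just what I'd reach for, the threshold is made up, VRAM monitoring (NVML or similar) is left out, and the metashape-server.exe arguments carry the same caveats as above:

Code: [Select]
import subprocess
import time

import psutil  # third-party: pip install psutil

MAX_NODES = 8       # hard cap, made up for illustration
RAM_CEILING = 85.0  # % of physical RAM - made-up threshold, not a Metashape setting

def start_node():
    # Assumed switches - same caveats as the launcher sketch above.
    return subprocess.Popen(["metashape-server.exe", "--node", "--dispatch", "192.168.1.50"])

nodes = [start_node()]
while len(nodes) < MAX_NODES:
    time.sleep(30)                               # let the newest node settle in
    if psutil.virtual_memory().percent > RAM_CEILING:
        nodes.pop().terminate()                  # back off before we start swapping
        break
    nodes.append(start_node())                   # still headroom, try one more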

This would benefit both Metashape Std and Pro, with Pro remaining able to connect to the different IP addresses on the network...  and each client running its own multiple local nodes.

And using multiple 'workers' is a very standard trick... FileZilla runs multiple fzsftp.exe workers to do up to 10 concurrent downloads, massively speeding up the transfer of smaller files, or transfers where the server pings are crap...  And HandBrake spawns multiple HandBrake.worker.exe processes to transcode several videos at once on CPUs with large core counts, since transcoding doesn't scale linearly with cores.  This allows the batch of files to be transcoded faster.
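
The pattern itself is dead simple: a fixed-size pool of workers chewing through a queue of jobs.  A toy Python version, just to show the shape of it - transcode_one() and the file names are stand-ins I've invented, not anything from FileZilla or HandBrake:

Code: [Select]
from concurrent.futures import ProcessPoolExecutor
import time

def transcode_one(job):
    time.sleep(1)                 # stand-in for the real heavy lifting
    return "done: %s" % job

if __name__ == "__main__":
    jobs = ["clip_%02d.mp4" % i for i in range(20)]

    # A fixed number of concurrent workers, like FileZilla's 10 connections
    # or HandBrake's worker count - no adaptive scaling, just a pool.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(transcode_one, jobs):
            print(result)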

I'd guess that every Metashape user/customer, and Agisoft themselves would simply love it to just process the jobs as fast as possible!
My 'little' scan of our dive site, 'Manta Point'.  Mantas & divers photoshopped in for scale!
https://postimg.cc/K1sXypzs
Sketchfab Models:
https://sketchfab.com/cheeseandjamsandwich/models

PolarNick

Re: Much faster processing using multiple local nodes/workers
« Reply #1 on: December 23, 2024, 12:19:48 PM »
I agree that it would be nice to have such a speedup out of the box, but I see a lot of problems and reliability risks that make this feature very hard (or even impossible) to implement.

Quote
For each sub-task, the optimal number of nodes is spawned to complete the task, maximising the utilisation of the hardware.
Either MS dynamically adds another node if utilisation isn't at 100% or there's still VRAM available, and kills one if it's swapping... or perhaps MS runs some tests to 'characterise' the hardware in use and sets the number of nodes it'll use for each sub-task?

This is hard even if we are talking about fixed hardware (e.g. in your case you have to manually tune the number of nodes on a per-subtask basis); for generic hardware, IMHO, this is nearly impossible to implement reliably. Even using the GPU in parallel triggers bugs in GPU drivers (e.g. race conditions), but even if that weren't the case, the problem of parallel VRAM/RAM usage is nearly impossible to solve in the generic case. What if a sub-task hits its peak RAM/VRAM usage at the very end of processing? It would be killed, but only after wasting a lot of computation resources (and slowing down the other nodes).

Quote
or kills one if it's swapping...

The same reliability problem exists on any hardware/OS: what does "if it's swapping" mean? Allocating at least one byte in swap? That can happen even when RAM usage is very low.

Quote
And using multiple 'workers' is a very standard trick... FileZilla runs multiple fzsftp.exe workers to do up to 10 concurrent downloads, massively speeding up the transfer of smaller files, or transfers where the server pings are crap...  And HandBrake spawns multiple HandBrake.worker.exe processes to transcode several videos at once on CPUs with large core counts, since transcoding doesn't scale linearly with cores.  This allows the batch of files to be transcoded faster.

They use a fixed number of parallel workers (without any kind of adaptive node count or killing on RAM/VRAM pressure) - that is the common approach in most software, including Metashape. And Metashape has one more feature for high-level parallelism: launching nodes in a local cluster (like you do).

CheeseAndJamSandwich

Re: Much faster processing using multiple local nodes/workers
« Reply #2 on: December 23, 2024, 03:07:38 PM »
Quote
I agree that it would be nice to have such a speedup out of the box, but I see a lot of problems and reliability risks that make this feature very hard (or even impossible) to implement.
Yeah, it might be a bit difficult to do everything needed to get as near to 100% utilisation as possible... Or there might be some easy wins that give us some more speed gains.
Though they know which subtasks could give them problems... And of course, A.I. will fix everything else!  ::)

When we do the testing, we see speeds increase with increasing numbers of local nodes, up to a threshold...  So if nothing else, this fudge exposes the fact that gains are possible, either by running more worker threads, or by just doing it in this faux-networked way.

To get exotic, perhaps MS could look at the whole batch of jobs it has been given, and prepare or process stuff it needs for a later job, running it at a low priority that doesn't disrupt the current job (or even if it does, still gives a net gain).  So it could do some Depth Maps work while it's still Aligning... and when Depth Maps comes up later, it just completes what's left.  Any task that can be sliced into packages without outstanding dependencies could run this way: multithreaded work while the main task is heavily single-threaded, or GPU work while the main task is a CPU task, or vice versa.  All things that happen in the real world, making real things.  Opportunistic Processing  8)
ooh, just googled it, it's called Out-of-Order Execution!
https://en.wikipedia.org/wiki/Out-of-order_execution
Just add another tickbox to the Batch dialog box for OOOE!!!
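
Just to make the idea concrete, here's a toy sketch of that kind of opportunistic scheduling: a ready-list that lets jobs from different stages run side by side as soon as their dependencies are met.  The job names, the dependency graph and the two-lane pool are all invented for the example - nothing here is Metashape's actual batch format:

Code: [Select]
from concurrent.futures import ThreadPoolExecutor

# Invented example batch: job -> the jobs it depends on.
BATCH = {
    "align":      set(),
    "depth_prep": set(),                     # CPU-side prep that could overlap the GPU-heavy alignment
    "depth_maps": {"align", "depth_prep"},
    "mesh":       {"depth_maps"},
}

def run(job):
    print("running", job)                    # stand-in for the real processing
    return job

done = set()
with ThreadPoolExecutor(max_workers=2) as pool:   # say, one CPU lane and one GPU lane
    while len(done) < len(BATCH):
        ready = [j for j, deps in BATCH.items() if j not in done and deps <= done]
        for finished in pool.map(run, ready):     # run everything that's unblocked, in parallel
            done.add(finished)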

At the end of the day, there's a depressing amount of time spent at near-idle utilisation, in between it kicking into gear and melting the CPU or GPU.  Why can't we melt both at the same time???

We'd just like the thousands of dollars of hardware we own to be utilised as much as possible.
And Agisoft certainly wants to be able to sell an even faster product.

Perhaps this networking 'trick' is touching on stuff that Agisoft already has in development?
« Last Edit: December 23, 2024, 03:31:58 PM by CheeseAndJamSandwich »