Feature Requests / Re: Much faster processing using multiple local nodes/workers
« on: December 23, 2024, 12:19:48 PM »
I agree that it would be nice to have such a speedup out-of-the-box, but I see a lot of problems and reliability risks that make this feature very hard (or even impossible) to implement.
Quote
For each sub-task, the optimal number of nodes is spawned to complete the task, maximising the utilisation of the hardware.
Either MS just dynamically adds another node if it's not at 100% utilisation, or there's still VRAM available, or kills one if it's swapping... or perhaps MS runs some tests to 'characterise' the hardware in use, and sets the number of nodes it'll use for each sub-task?
This is hard even if we are talking about fixed hardware (e.g. in your case you have to manually tune the number of nodes on a per-subtask basis); for generic hardware, IMHO, this is nearly impossible to implement reliably. Even using the GPU in parallel can trigger bugs in GPU drivers (e.g. race conditions), and even if that weren't the case, the problem of parallel VRAM/RAM usage is nearly impossible to solve in the general case. What if a sub-task hits its peak RAM/VRAM usage at the very end of processing? It will be killed at that point, after having already wasted a lot of computation resources (and slowed down the other nodes).
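Just to illustrate that last point, here is a minimal watchdog sketch (not Metashape code - the worker command, the 2 GB threshold and the use of the third-party psutil package are all my assumptions) for the kind of kill-on-memory-pressure policy being discussed; the comments mark where the wasted-work problem appears:

Code:
# Hypothetical watchdog: spawn N workers, kill the largest one when free RAM
# drops below a fixed threshold. All names and values here are illustrative.
import subprocess
import time

import psutil  # third-party package, assumed installed

WORKER_CMD = ["python", "worker.py"]  # placeholder sub-task command
MIN_FREE_BYTES = 2 * 1024**3          # arbitrary 2 GB threshold

workers = [subprocess.Popen(WORKER_CMD) for _ in range(4)]

while any(w.poll() is None for w in workers):
    alive = [w for w in workers if w.poll() is None]
    if alive and psutil.virtual_memory().available < MIN_FREE_BYTES:
        # Pick the live worker with the largest resident set size...
        victim = max(alive, key=lambda w: psutil.Process(w.pid).memory_info().rss)
        # ...and kill it. If its memory peak came at the very end of its
        # sub-task, all the work it already did is thrown away, and the
        # other nodes were slowed down by it for nothing.
        victim.kill()
    time.sleep(1)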
Quote
or kills one if it's swapping...
The same reliability problem exists on any hardware/OS: what does "if it's swapping" actually mean? At least one byte allocated in swap? That can happen even when RAM usage is very low.
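For the record, a tiny sketch of why the check itself is fuzzy (psutil is a third-party package; the two "definitions" and their interpretation are mine, not anything Metashape does):

Code:
# "Is it swapping?" has no single obvious definition.
import psutil

swap = psutil.swap_memory()
ram = psutil.virtual_memory()

# Definition A: some swap space is in use. This is frequently true even when
# plenty of RAM is free - the OS may have pushed out cold pages long ago.
swapping_a = swap.used > 0

# Definition B: pages have been swapped out. sout is a cumulative counter
# since boot, so a single reading says nothing about what is happening right
# now; you would have to sample it twice and compare the deltas.
swapping_b = swap.sout > 0

print(f"swap used: {swap.used}, swapped out: {swap.sout}, RAM used: {ram.percent}%")
print("definition A says swapping:", swapping_a, "- definition B says swapping:", swapping_b)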
Quote
And using multiple 'workers' is very standard trick... FileZilla runs multiple fzsftp.exe workers to do up to 10 concurrent downloads. Massively speeding up the transfer of smaller files, or if the server pings are crap... And Handbrake spawns multiple HandBrake.worker.exe workers to concurrently transcode multiple videos at once on CPUs with large core counts, as transcoding doesn't scale linearly with cores. This allows for faster transcoding of the batch of files.
They use a fixed number of parallel workers (without any kind of adaptive node count or killing on RAM/VRAM pressure) - that is the common approach in most software, including Metashape. And Metashape has one more feature for high-level parallelism: launching several nodes in a local cluster (like you do).
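If it helps, this is the whole pattern in a few lines - a fixed pool size chosen by the user up front, no adaptive scaling (the command and file list are placeholders, not anything FileZilla/HandBrake/Metashape actually run):

Code:
# Fixed-size worker pool: the user picks N up front; the pool never grows or
# shrinks based on RAM/VRAM pressure. Placeholder command and inputs.
from concurrent.futures import ThreadPoolExecutor
import subprocess

FILES = ["clip1.mp4", "clip2.mp4", "clip3.mp4"]  # placeholder batch
N_WORKERS = 2                                    # fixed, chosen by the user

def process_one(path: str) -> int:
    # Placeholder external command; in practice this would be the transcoder,
    # transfer client, or processing node being parallelised.
    return subprocess.run(["echo", "processing", path]).returncode

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    codes = list(pool.map(process_one, FILES))

print(codes)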