
Author Topic: Inquiry Regarding GPU Underutilization and Processing Time During Dense Cloud an  (Read 3345 times)

tanaka takaya

  • Newbie
  • Posts: 2
We are currently generating a 3D model using Agisoft Metashape Professional with approximately 1,800 drone images taken at a city hall construction site. However, we have observed that the GPU (NVIDIA GeForce RTX 5090) does not appear to be fully utilized during each processing stage.
Below is a summary of the behavior observed during each stage:
   • Sparse Point Cloud / Alignment
Processing Time: Approx. 40 minutes
GPU Memory Usage: Around 12 GB at the start, then stabilizing around 8 GB
CPU Usage: Less than 10% at the start, intermittently reaching 100%
   • Dense Cloud Generation
Processing Time: Approx. 45 minutes
GPU Memory Usage: Steady around 9 GB
CPU Usage: Nearly constant at 100%
   • Mesh Reconstruction
Processing Time: Approx. 30 minutes
GPU Memory Usage: Steady around 9 GB
CPU Usage: Around 40%, intermittently reaching 100%
As shown above, the CPU load is consistently high, while the GPU appears underutilized, especially during the dense cloud generation stage.
Our goal with this setup is to efficiently generate 3D models from large-scale image datasets, and we would like to maximize GPU usage to reduce processing time.
We would greatly appreciate your guidance on the following points:
   1. What are the main reasons why the GPU may not be fully utilized during each stage (e.g., settings, architecture, limitations)?
   2. What are the recommended settings or optimization procedures to fully leverage the GPU (RTX 5090, 32GB VRAM)?
   3. What are effective methods to identify whether the bottleneck lies in the CPU, memory, or disk I/O?

System Specifications:
   • Motherboard: ASUS PRO WS W790-ACE
   • CPU: Intel Xeon W5-2565X (18 cores / 36 threads / 3.20GHz, up to 4.8GHz with Turbo Boost)
   • Memory: 256GB DDR5-5600 REG ECC (64GB × 4)
   • GPU: NVIDIA GeForce RTX 5090 (VRAM 32GB)
   • Storage: 2TB SSD (M.2 NVMe Gen4) + 4TB HDD (S-ATA)
   • OS: Microsoft Windows 11 Professional 64bit
   • Input Data: 1,800 images (16GB) of structures at a city hall construction site
   • Software Version: Agisoft Metashape Professional Version 2.2.2 build 21069 (64 bit)
Processing Settings:
   • Sparse Point Cloud / Alignment: Medium Quality
   • Dense Cloud Generation: Medium Quality
   • Mesh Reconstruction: Default parameters for "Build Texture"

CheeseAndJamSandwich

  • Full Member
  • Posts: 216
A couple of things to try:

Bzuco's Local Networked trick.  Great for massive projects, too!
Basically, running multiple instances of Metashape on one machine.
This has given us quite a decent boost.  And you do see the HW utilisation go up a lot!
As with a lot of software, there's often only so much a single task can be parallelised, which is why many tasks make little use of those Threadrippers!  To beat that, chop the job into smaller pieces and run them concurrently, so each worker can use the optimum number of cores.
https://www.agisoft.com/forum/index.php?topic=16384.msg70462#msg70462
When you get it all running, you have to adjust how many nodes/workers are running for each stage by pausing and unpausing them... The right number seems to depend on your hardware, image sizes, etc.  Test to find what works.

And there's also a similar trick for the GPU during depth map generation, if you use depth maps, which already does the above trick naturally... That's why you've seen GPU1 and GPU2 in the console: by default it already splits the work in two.  But we can tell it to split it further.
Bzuco's (again) BuildDepthMaps/max_gpu_multiplier tweak.
Default is 2, but try 3, 4, 5, 6, whatever!
Again, it will depend on your hardware, image sizes, etc.  Find which gives you the best result.  On my bandwidth-limited eGPU setup it gave wonderful speed increases!
https://www.agisoft.com/forum/index.php?topic=17015.msg72825#msg72825

As Moore's Law hits its limits, our CPUs and GPUs are just getting 'wider' and 'wider', with more and more cores, yet many tasks just don't parallelise that far.  We're hoping that Metashape will one day do the above tricks internally, adjusting on the fly to do its best, without us having to faff around tending to our jobs.

Other examples of this trick I've seen: HandBrake... If you have 23 episodes to transcode on a nice CPU with too many cores, you can tell it to spawn more than one worker instance, and you see faster transcodes of the batch.
Same with good ol' FileZilla.  Transferring lots of little files by FTP is dog slow, so you run up to 10 concurrent worker threads, downloading 10 files at once.  Which is also how browsers download web pages and all their images.
For both of these you'll see the multiple worker threads under the main application in Task Manager.

Again, Alexey, is this in the works?
My 'little' scan of our dive site, 'Manta Point'.  Mantas & divers photoshopped in for scale!
https://postimg.cc/K1sXypzs
Sketchfab Models:
https://sketchfab.com/cheeseandjamsandwich/models

tanaka takaya

  • Newbie
  • Posts: 2
I have three questions I would like to ask:
1.   What exactly does "create several local instances of running Metashape" mean in the context of Metashape, and how can it be set up and utilized in practice?

2.   Regarding the following settings and optimizations (①–③), could you please clarify which processing stages in Metashape ([Alignment], [Dense Cloud], [Mesh Reconstruction]) each contributes to most significantly?
① Running several local instances in parallel (executing several Metashape processes on a single PC)
② Increasing the value of BuildDepthMaps/max_gpu_multiplier
③ Setting main/gpu_enable_opencl to true and main/gpu_enable_cuda to false

3.   I came across "Increasing the value of main/refine_max_gpu_multiplier" through my own research. If you are familiar with this parameter, could you please explain which processing phase it affects and the scope of its impact, like question 2?

For reference, I am including the following forum threads:
https://www.agisoft.com/forum/index.php?topic=17015.0 ,
https://www.agisoft.com/forum/index.php?topic=12458.0

I would greatly appreciate your detailed answers to each item.
Thank you very much for your assistance.

CheeseAndJamSandwich

  • Full Member
  • Posts: 216
Watch the video that Bzuco linked to!!!  It describes the setup.
This one: https://www.youtube.com/watch?v=BYRIC-qkZJ8
Though the syntax has changed for the latest versions.
The trick is that the IP address is all the same: the address of your own machine.
Just as you can map network shared folders that exist on your own machine, you can run network apps on your own machine too.

Basically, you just run:
Code:
"c:/program files/agisoft/metashape pro/metashape.exe" --worker --host 192.168.0.2 --root //mymachine/scans
Run that a few times.  Then you'll have the several instances of the worker node running.  You can see them in Task Manager.
Then run:
Code:
"c:/program files/agisoft/metashape pro/metashape-server.exe" --server --host 192.168.0.2
Which you can then connect to with the Agisoft Network Monitor application...  Unpause however many workers you need for each stage.
Then, when running Metashape Pro itself, set it to use Network Processing.  When you start a workflow item or batch, it'll pump the job out to the server, which divides the work up into smaller pieces and issues them to the nodes.
I just used two batch files that I ran.  I haven't used them, or Pro, in a while though, as I was only trialling it... Need to clean-install Windows to get another 30 days!  8)
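For reference, the two batch files might have looked something like this. This is only a sketch: the install path, IP address, and share path are the example values from the commands above, and the worker count of 4 is an arbitrary placeholder you'd tune for your own hardware.

```shell
:: start-server.bat -- start the Metashape network server on this machine
:: (IP address and paths are the example values from the post; adjust to your setup)
"c:\program files\agisoft\metashape pro\metashape-server.exe" --server --host 192.168.0.2

:: start-workers.bat -- spawn several local worker nodes pointing at the same server
:: (run from a separate window; the loop count of 4 is a placeholder to tune)
for /L %%i in (1,1,4) do (
  start "" "c:\program files\agisoft\metashape pro\metashape.exe" --worker --host 192.168.0.2 --root //mymachine/scans
)
```

Each `start` launches a detached worker process, so Task Manager shows one metashape.exe per worker; you then pause/unpause them per stage from Agisoft Network Monitor as described above.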

Again, the issue is that some tasks don't scale well with multithreading... Diminishing returns... Say, they only get faster up to 4 threads... So, instead, you run multiple instances, e.g. 4x, so they can use all 16 threads of your CPU.

I'd love to see the results from someone with a Threadripper setup!  As they have a stupid number of threads, and their owners have often complained that Metashape just doesn't work that well on them...

RE: BuildDepthMaps/max_gpu_multiplier
I think this is only for the building of depth maps... as the Tweak's name suggests!
But again, Metashape by default already spawns 2x GPU threads...  Again, in the console/logs, you'll see GPU1, GPU2...  We're just telling it to spawn more!

RE: main/refine_max_gpu_multiplier
Haven't played with this one. 
Try some numbers, 4, 6, 8, and let us know!

Still silence from Alexey on this whole subject.  Sadly.

sklein

  • Newbie
  • Posts: 6
Again, the issue is that some tasks don't scale well with multithreading... Diminishing returns... Say, they only get faster up to 4 threads... So, instead, you run multiple instances, e.g. 4x, so they can use all 16 threads of your CPU.

Which tasks only get faster up to 4 threads?

CheeseAndJamSandwich

  • Full Member
  • Posts: 216
Again, the issue is that some tasks don't scale well with multithreading... Diminishing returns... Say, they only get faster up to 4 threads... So, instead, you run multiple instances, e.g. 4x, so they can use all 16 threads of your CPU.

Which tasks only get faster up to 4 threads?

I was just using 4 as an example!  :P