Author Topic: Error: std::bad_alloc building mesh from dense cloud - 1.7.2 node w/384GB RAM  (Read 348 times)

andyroo

  • Sr. Member
  • ****
  • Posts: 379
    • View Profile
Getting std::bad_alloc when trying to build an interpolated (not extrapolated) mesh on high from the dense cloud on some big chunks. Dense cloud is GCS (NAD83(2011)). I have successfully built interpolated and uninterpolated DEMs, and orthoimages for these chunks.

We first built an uninterpolated DEM from the dense cloud for the elevation model, then built an interpolated DEM and orthophoto (using the interpolated DEM).

I am now trying to build a mesh from the dense cloud to use for a comparison orthoimage (because in smaller experiments the mesh was much faster and smaller than the interpolated DEM).

The mesh was generated after rotating the bounding box to the DEM projected coordinate system (PCS = NAD83 UTM). Rotation was performed to minimize the height/width of the nodata collars on the DEM generated from the dense cloud, since if it stays rotated, the DEM bounds go all the way to the corners of the along-track-oriented (not PCS-oriented) bounding box. I wonder if the mesh is failing because it's doing grid interpolation over the whole empty area of the rotated bounding box. In that case, I need to switch the order or re-rotate the region to be oriented with the data, but it will probably still fail on another section that is L-shaped with a bunch of empty space.
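The collar blow-up from a rotated strip is easy to quantify. Below is a minimal sketch; the 10 km x 0.5 km strip and the 45° angle are made-up illustration values, not numbers from this project:

```python
import math

# Sketch with hypothetical numbers: an axis-aligned (PCS-oriented) grid over a
# strip that stays rotated must cover the strip's bounding-box corners as nodata.
def aabb_dims(length, width, theta_deg):
    """Axis-aligned bounding-box dimensions of a length x width strip rotated theta_deg."""
    t = math.radians(theta_deg)
    return (length * abs(math.cos(t)) + width * abs(math.sin(t)),
            length * abs(math.sin(t)) + width * abs(math.cos(t)))

L, W = 10_000.0, 500.0      # hypothetical 10 km x 0.5 km along-track strip
bw, bh = aabb_dims(L, W, 45)
print(bw * bh / (L * W))    # ~11x the data footprint ends up in the grid
```

At 45° the axis-aligned grid holds roughly eleven times the strip's actual footprint, which is why re-rotating the region to the data orientation (or tiling an L-shaped region) shrinks the grid so dramatically.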

These are the details from the node - I included a previous successful (smaller) mesh generation too:

Code: [Select]
2021-05-07 17:45:55 BuildModel: source data = Dense cloud, surface type = Height field, face count = High, interpolation = Enabled, vertex colors = 0
2021-05-07 17:45:56 Generating mesh...
2021-05-07 17:46:20 generating 213317x132869 grid (0.00214379 resolution)
2021-05-07 17:46:20 rasterizing dem... done in 81.9141 sec
2021-05-07 17:47:42 filtering dem... done in 375.867 sec
2021-05-07 17:55:06 constructed triangulation from 21327465 vertices, 42654924 faces
2021-05-07 17:57:38 grid interpolated in 220.33 sec
2021-05-07 18:13:56 triangulating... 106374525 points 212748181 faces done in 4727.18 sec
2021-05-07 19:32:45 Peak memory used: 181.40 GB at 2021-05-07 19:32:43
2021-05-07 19:33:00 processing finished in 6425.13 sec
2021-05-07 19:33:00 BuildModel: source data = Dense cloud, surface type = Height field, face count = High, interpolation = Enabled, vertex colors = 0
2021-05-07 19:33:01 Generating mesh...
2021-05-07 19:33:37 generating 262471x233536 grid (0.00219694 resolution)
2021-05-07 19:33:37 rasterizing dem... done in 209.04 sec
2021-05-07 19:37:06 filtering dem... done in 847.863 sec
2021-05-07 19:53:17 constructed triangulation from 23493503 vertices, 46987000 faces
2021-05-07 19:57:34 grid interpolated in 380.113 sec
2021-05-07 20:20:53 Error: std::bad_alloc
2021-05-07 20:20:53 processing failed in 2872.89 sec

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 12857
    • View Profile
Hello Andy,

How many points are there in the dense point cloud? Does it help to use Medium face count preset for Build Model operation?

If possible, please also provide the screenshot of the source dense cloud with the bounding box.
Best regards,
Alexey Pasumansky,
Agisoft LLC

andyroo

Hi Alexey,

Thanks for the quick reply!
Quote
How many points are there in the dense point cloud? Does it help to use Medium face count preset for Build Model operation?

~3.8 billion points (3,783,821,907)

Quote
If possible, please also provide the screenshot of the source dense cloud with the bounding box.

Screenshot attached - <sigh> apparently I didn't save this particular psx after running the bounding box script (or I missed this one), so it didn't have the smaller, PCS-oriented bounding boxes. Too many irons in the fire... I also attached a screenshot of the "corrected" bounding box.

I downloaded the project from the HPC last night to try it on my local workstation in non-network mode. If it works here, I'll try the modified extent on the HPC. If that works, I'll also probably try my local workstation in network mode (with the single machine acting as host, monitor, node, and GUI).

Alexey Pasumansky

Hello Andy,

Dense cloud based mesh generation doesn't support fine-level task subdivision, so it will be performed on a single node anyway.

But the problem could be related to the very dense source point cloud.
Best regards,
Alexey Pasumansky,
Agisoft LLC

andyroo

Just finished reconstruction on my local workstation using the same parameters as on the node, and got two bad allocation errors (the successful (2nd) chunk is a fraction of the size of the 1st and 3rd chunks, which threw the bad allocation errors).

Interestingly I was at my workstation when the second one happened, and RAM usage was about 10% of my total RAM at that point. Is it reporting a bad allocation from an earlier step? Here's the log for the whole batch:

Code: [Select]
2021-05-08 08:55:22 saved project in 0.036 sec
2021-05-08 08:55:22 BuildModel: quality = High, depth filtering = Mild, PM version, reuse depth maps, source data = Dense cloud, surface type = Height field, face count = High, interpolation = Enabled, vertex colors = 0
2021-05-08 08:55:22 Generating mesh...
2021-05-08 08:58:13 generating 244691x628622 grid (0.00219517 resolution)
2021-05-08 08:58:13 rasterizing dem...2021-05-08 08:58:14 Error: bad allocation
Saving project...
2021-05-08 08:58:14 saved project in 0.035 sec
2021-05-08 08:58:14 BuildModel: quality = High, depth filtering = Mild, PM version, reuse depth maps, source data = Dense cloud, surface type = Height field, face count = High, interpolation = Enabled, vertex colors = 0
2021-05-08 08:58:14 Generating mesh...
2021-05-08 08:59:00 generating 192407x45168 grid (0.00214067 resolution)
2021-05-08 08:59:00 rasterizing dem... done in 50.546 sec
2021-05-08 08:59:51 filtering dem... done in 84.444 sec
2021-05-08 09:01:32 constructed triangulation from 4820125 vertices, 9640244 faces
2021-05-08 09:02:22 grid interpolated in 67.125 sec
2021-05-08 09:07:34 triangulating... 91578521 points 183151732 faces done in 1646.28 sec
2021-05-08 09:35:05 Peak memory used: 68.87 GB at 2021-05-08 09:35:00
2021-05-08 09:35:05 Finished processing in 2210.72 sec
2021-05-08 09:35:05 Saving project...
2021-05-08 09:35:57 saved project in 52.595 sec
2021-05-08 09:35:57 BuildModel: quality = High, depth filtering = Mild, PM version, reuse depth maps, source data = Dense cloud, surface type = Height field, face count = High, interpolation = Enabled, vertex colors = 0
2021-05-08 09:35:57 Generating mesh...
2021-05-08 09:37:52 generating 248536x639273 grid (0.00215781 resolution)
2021-05-08 09:37:52 rasterizing dem...2021-05-08 09:37:52 Error: bad allocation
Saving project...
2021-05-08 09:37:52 saved project in 0.048 sec
2021-05-08 09:37:52 Finished batch processing in 2550.7 sec (exit code 1)

I'll reprocess a single chunk and see what my max ram usage gets to. Is it possible to approximately calculate the maximum number of vertices I can have in a mesh with a given amount of RAM? Is there any other troubleshooting/diagnosis I can do to pin down the cause?
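One rough way to bound it, assuming (this is a guess, not a documented figure) that the height-field grid dominates memory at about 4 bytes per cell, is to invert that cost against the RAM budget:

```python
# Sketch: largest square height-field grid that fits a RAM budget, assuming
# ~4 bytes per grid cell (an assumption, not a documented Metashape figure).
def max_square_grid_side(ram_gib, bytes_per_cell=4):
    """Side length (in cells) of the largest square grid fitting in ram_gib."""
    cells = ram_gib * 1024**3 / bytes_per_cell
    return int(cells ** 0.5)

print(max_square_grid_side(256))  # 262144 cells on a side for a 256 GiB budget
```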

Mesh is being built with the parameters shown in the attachment. The only things I changed from the defaults were Surface type (default Arbitrary, changed to Height field), Custom face count (default 200,000, changed to 0 - but that shouldn't make a difference since I chose 'High' and not 'Custom'), and Calculate vertex colors (default yes, changed to no because I figured it would save time and I don't need them).

-EDIT- I ran the batch with just the first chunk selected. It failed in 100s and RAM usage never got above ~22MB
« Last Edit: May 08, 2021, 07:56:47 PM by andyroo »

Alexey Pasumansky

Quote
It failed in 100s and RAM usage never got above ~22MB

Hello Andy,

It seems Metashape was not able to allocate sufficient RAM for 248536x639273 grid.
Best regards,
Alexey Pasumansky,
Agisoft LLC

andyroo

Quote
It seems Metashape was not able to allocate sufficient RAM for 248536x639273 grid.

Hi Alexey,

I have 256GB of RAM, and it looked like it never tried to allocate more than about 10% of that. Does the allocation error occur before the memory is assigned?

The DEMs (interpolated and uninterpolated) I already built for this chunk are much larger than what the mesh is attempting - 450228x942522 (DEMs built directly from the dense cloud).

I guess there is some operation for the meshing that requires more RAM than the DEM from dense cloud does?

Andy

Alexey Pasumansky

Hello Andy,

The DEM is generated in tiles as an LOD pyramid, whereas a mesh from the point cloud source is built in a single block.

At the beginning of height-field mesh generation, Metashape first tries to allocate the RAM required for the complete grid.
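The memory consequence of that difference can be sketched as follows (illustrative only, not Metashape's actual internals; the 4096-cell tile edge and 4 bytes per cell are assumptions):

```python
# Illustrative sketch - not Metashape's implementation. A tiled DEM keeps only
# one tile of the grid in RAM; a monolithic height field needs the whole raster.
TILE = 4096                 # hypothetical tile edge, in cells
BYTES_PER_CELL = 4          # assumed float32 per cell

def peak_grid_gib(width, height, tiled):
    """Peak grid memory resident at once, in GiB."""
    cells = TILE * TILE if tiled else width * height
    return cells * BYTES_PER_CELL / 1024**3

w, h = 248536, 639273       # the grid that threw bad_alloc in the logs
print(peak_grid_gib(w, h, tiled=True))    # 0.0625 GiB per tile
print(peak_grid_gib(w, h, tiled=False))   # ~592 GiB for the full grid at once
```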
Best regards,
Alexey Pasumansky,
Agisoft LLC

andyroo

Hi Alexey,

Can I calculate the amount of RAM that will be used in the complete grid from the dimensions reported from the grid? I'm wondering if I can use the custom mesh size to optimize the grid without reducing resolution more than I have to.

-EDIT- It looks like the grid size when generating a mesh from a dense cloud is a function of the depth map resolution x the region size. I'm wondering how much memory each grid cell needs - this would allow me to more accurately calculate the maximum region I can reconstruct at once, and to tile the dense cloud appropriately.

For example, I see that when I shrink the region/grid from 244691x628622 to 91307x600192 then instead of getting bad_alloc error I appear to be using about 200GB, which is fairly close to 32-bit float x raster size if I'm doing my math right...
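The float32-per-cell arithmetic can be checked against the grids in the logs (assuming, as the numbers suggest, 4 bytes per grid cell for the full allocation):

```python
# Sketch: full height-field grid allocation, assuming 4 bytes (float32) per cell.
def grid_ram_gib(width, height, bytes_per_cell=4):
    """Estimated RAM for the complete grid, in GiB."""
    return width * height * bytes_per_cell / 1024**3

print(grid_ram_gib(248536, 639273))  # ~592 GiB -> bad_alloc on a 256 GB machine
print(grid_ram_gib(91307, 600192))   # ~204 GiB -> matches the observed ~200 GB
```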
« Last Edit: May 10, 2021, 09:20:05 PM by andyroo »

andyroo

It looks like in batch-mode network processing, when erroring out with bad_alloc, Metashape treats errors differently than on my workstation, and recycles/restarts the batch job after cycling through all chunks. Is this expected/optional behavior, and is there an option to change it so the job completes rather than endlessly cycling? Screenshot attached.

andyroo

Just started looking at "completed" meshes, and the first one I checked has some serious problems (see screenshots). Only partial reconstruction, and some artifacts that extend far outside the region. Screenshots of the model view window attached showing close and far views. Far view is sparse cloud vs mesh. Close view is dense cloud vs mesh. Mesh was reconstructed from batch dialog.

It *looks* like the top right of the mesh *could* be about where the upper right boundary of the original region was, but I'm not certain [edit - the other messed up mesh shot off to the upper left, so not at all consistent with my previous (in strikeout) hypothesis]. The project was aligned over a large area, then divided into smaller regions saved to new PSX files for region-based dense cloud processing.

-EDIT- Looked at the remaining completed meshes, and 2 of 6 were bad. Both peaked at ~190 GB of RAM usage (the two largest meshes, using about half of the 384 GB available on each node). I compared mesh and DEM construction time and size from the dense cloud, and there didn't appear to be a consistent advantage to the mesh - mesh times were generally about the same as the DEM except where they broke, then ~2x longer. Best mesh performance was about 1/2 the time of the DEM, but the average was approximately equal. Mesh size across these 6 chunks was consistently smaller than the corresponding DEM, but only by about 17%, which wasn't enough to warrant pursuing this further. For our purposes, mesh-based ortho reconstruction is not as performant as DEM-based because of the requirement for single-node/full-allocation grid generation.
« Last Edit: May 11, 2021, 04:59:35 AM by andyroo »