
Author Topic: 2.0.2 crashing during depth maps, no garbage collection? (win11 2x RX7900XTX)

andyroo

I'm finding more than 100 GB of files that look like they're "orphaned" after crashes in 2.0.2 while building dense clouds on an AMD Ryzen 9 7950X with dual RX 7900 XTX GPUs running Win11.

I am filling out the crash reporter and reporting the crashes with AMD's bug report tool, but I just want to make sure these are files I should delete, and to ask: what's the best way to delete them? Should I leave an empty dir or replace the dir? Is there any way I can make Metashape reuse these files (e.g. by running in network mode with host/client/monitor all on this machine)?

I see that in projects where I later successfully generated the dense cloud, the "leftover" files are in /depth_maps and the completed files are in /depth_maps.1 - can I just delete /depth_maps if it has the *unfiltered* and *inliers* files?

Below is an excerpt of a dir listing for my latest crash:

Code: [Select]
07/03/2023  10:26 AM    <DIR>          .
07/03/2023  08:30 AM    <DIR>          ..
07/03/2023  10:23 AM       288,485,718 data0.zip
07/03/2023  10:24 AM       564,512,732 data1.zip
07/03/2023  10:25 AM       501,141,795 data2.zip
07/03/2023  10:26 AM       220,401,350 data3.zip
07/03/2023  08:32 AM       314,578,131 data_unfiltered0.zip
07/03/2023  08:34 AM       581,660,929 data_unfiltered1.zip
...
07/03/2023  10:22 AM       223,559,401 data_unfiltered65.zip
07/03/2023  08:44 AM       307,905,838 data_unfiltered7.zip
07/03/2023  08:46 AM       461,265,241 data_unfiltered8.zip
07/03/2023  08:47 AM       606,145,517 data_unfiltered9.zip
07/03/2023  08:32 AM       180,486,450 inliers0.zip
07/03/2023  08:34 AM       379,902,421 inliers1.zip
...
07/03/2023  10:22 AM        80,761,792 inliers65.zip
07/03/2023  08:44 AM       102,940,373 inliers7.zip
07/03/2023  08:46 AM       262,037,387 inliers8.zip
07/03/2023  08:47 AM       387,124,437 inliers9.zip
07/03/2023  08:31 AM       150,178,344 pm_cameras_info.data
07/03/2023  08:31 AM            26,992 pm_cameras_partitioning.grp
             138 File(s) 60,588,842,406 bytes
               2 Dir(s)  5,667,775,660,032 bytes free
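
To see where the space is actually going, here's a minimal sketch (standard-library Python only; the folder path is a placeholder I made up, not a real location) that totals each group of leftover files in one depth_maps folder:

Code: [Select]
# Sketch: total the size of each leftover file group in one depth_maps folder.
# The path below is a placeholder - point it at your own
# <project>.files/<chunk>/<frame>/depth_maps directory.
from pathlib import Path

folder = Path(r"D:\project.files\0\0\depth_maps")  # hypothetical location

groups = {
    "filtered (data*.zip)": lambda f: f.name.startswith("data") and "unfiltered" not in f.name,
    "unfiltered (data_unfiltered*.zip)": lambda f: f.name.startswith("data_unfiltered"),
    "inliers (inliers*.zip)": lambda f: f.name.startswith("inliers"),
}

for label, match in groups.items():
    files = [f for f in folder.glob("*.zip") if match(f)]
    total_gb = sum(f.stat().st_size for f in files) / 1e9
    print(f"{label}: {len(files)} files, {total_gb:.1f} GB")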

Alexey Pasumansky

Agisoft Technical Support
Hello andyroo,

Unfortunately, it is not possible to re-use the intermediate files created during local processing in order to resume the task.
It could work, however, if you configure network processing (even with a single node): re-connecting the crashed node will allow the processing to continue, and only the failed sub-task will be re-calculated, not the entire process.
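
For reference, a minimal sketch of what submitting a depth-maps task to a single local node can look like through the Python API; the paths, host address and task parameter are placeholders, and the method names follow the network-processing example from the Python API reference, so please verify them against the API version you are running:

Code: [Select]
import Metashape

# Assumes the server and one processing node are already running on this
# machine and share the same root folder (placeholder paths below).
root = "D:/metashape_root"
path = "D:/metashape_root/project.psx"

doc = Metashape.Document()
doc.open(path)
chunk = doc.chunk

task = Metashape.Tasks.BuildDepthMaps()
task.downscale = 2  # example parameter

client = Metashape.NetworkClient()
client.connect("127.0.0.1")  # server running locally
batch_id = client.createBatch(path[len(root):], [task.toNetworkTask(chunk)])
client.resumeBatch(batch_id)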
You can delete the *_unfiltered* and inliers* files manually or use the Clean-up Project tool (from the Metashape Advanced preferences tab); when using the latter option, however, the project shouldn't be opened on any computer.
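
If you go the manual route, a rough sketch (standard-library Python, dry-run by default; the project path is a placeholder) of removing those files from a closed project's .files tree:

Code: [Select]
# Sketch: remove leftover unfiltered/inlier depth-map files from a CLOSED
# project. DRY_RUN is on by default; the path below is a placeholder.
from pathlib import Path

PROJECT_FILES = Path(r"D:\project.files")  # hypothetical <project>.files folder
DRY_RUN = True

for pattern in ("data_unfiltered*.zip", "inliers*.zip"):
    for f in PROJECT_FILES.rglob(pattern):
        print(("would delete " if DRY_RUN else "deleting ") + str(f))
        if not DRY_RUN:
            f.unlink()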
Best regards,
Alexey Pasumansky,
Agisoft LLC

ManyPixels

100% related to this thread: https://www.agisoft.com/forum/index.php?topic=15033.msg65907#msg65907

The instability of depth-map calculation on AMD GPUs is horrible, and it's the only problematic step with these GPUs, which clearly means the problem comes from Agisoft. The only time I got anything relevant was a message saying "Assertion "23915205205203748 (value=7.11311e+31/61.3534 encountered, computation device is unstable)" failed at line 3771!"

While we understand that all computational devices can inherently exhibit some level of instability, it is crucial to remember that robustness and fault tolerance should be essential considerations in professional-grade software, especially one as vital to our work as Metashape. If you're not considering that, you can remove a zero from the price of the software.

In this context, an error-handling mechanism designed to catch failures during computation could significantly improve the software's reliability. If Metashape handled such computational errors, failed computations could be retried from their last successful state, effectively making the main thread 'incorruptible'.

This would necessitate the creation of 'checkpoints' at various stages of computation, providing a reliable state to revert to when errors occur. While such an architectural change could present its own challenges and some performance overhead from maintaining the checkpoints, the gain in robustness could justify the cost.
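
To make the idea concrete, here is a toy sketch of the checkpoint-and-retry pattern I have in mind (this is not Agisoft code, just an illustration): each sub-task's completion is persisted, so a crash or a retry only costs the sub-task in flight.

Code: [Select]
# Toy illustration of checkpointed, retryable sub-task processing.
# The checkpoint file name is hypothetical.
import json
from pathlib import Path

CHECKPOINT = Path("depth_maps_progress.json")

def load_done() -> set:
    # Resume from the last successful state if a checkpoint exists.
    return set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()

def mark_done(done: set, item: int) -> None:
    done.add(item)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def process_all(items, compute, max_retries: int = 3) -> None:
    done = load_done()
    for item in items:
        if item in done:
            continue  # already computed before the crash
        for attempt in range(max_retries):
            try:
                compute(item)          # e.g. one depth-map block on the GPU
                mark_done(done, item)
                break
            except RuntimeError as err:
                print(f"item {item} failed (attempt {attempt + 1}): {err}")
        else:
            raise RuntimeError(f"item {item} failed {max_retries} times")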

As customers investing in a premium software suite like Metashape, we look for a certain level of reliability and resilience to hardware-related issues. By implementing these measures, I believe Agisoft could further strengthen its reputation and provide users with a more consistent and reliable tool for our professional needs.

I hope these thoughts can be taken into consideration for future development and updates to the software. Thank you for your time and for your ongoing work on this essential tool.