
Topic: Running Metashape within Docker using the Python wrapper finishes with a segfault

FrancoCorleone

  • Newbie
  • Posts: 13
Hello all,
My first question is about a possible reason for Metashape finishing with a segmentation fault when processing through the Python wrapper, but it happens ONLY in Docker. I tried on a Mac, I tried on an EC2 instance running natively, and then I tried in Docker (GPU passed in, Vulkan available, Ubuntu 18-based image). The whole processing completes correctly, and then it just exits with a segmentation fault instead of exit code 0.

Any tips appreciated; this one is hard to track down.

Greetings,
Jed
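A first step is making sure the wrapper's exit status is read correctly. Below is a sketch, assuming the job is started from a Python driver script with the standard subprocess module (the helper names are hypothetical). On POSIX, subprocess reports a child killed by signal N as return code -N, while shells such as bash report it as 128 + N, so a segfault (SIGSEGV, signal 11) shows up as -11 or 139.

```python
import signal
import subprocess
from typing import Sequence


def classify_exit(returncode: int) -> str:
    """Describe a POSIX exit status returned by subprocess or a shell."""
    if returncode < 0:
        # subprocess: a child killed by signal N has returncode -N
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # shells (e.g. bash) report the same event as 128 + N; a program can
        # also legitimately exit with such a code, so this is only a guess
        try:
            return f"probably killed by {signal.Signals(returncode - 128).name}"
        except ValueError:
            pass  # not a known signal number: treat as an ordinary exit code
    return f"exited with code {returncode}"


def run_and_classify(cmd: Sequence[str]) -> str:
    """Run cmd directly (no shell) and describe how it terminated."""
    return classify_exit(subprocess.run(list(cmd)).returncode)
```

Launching the processing script through run_and_classify would then report "killed by SIGSEGV" in the situation described above, which distinguishes a crash from a script that merely returned a non-zero code.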

arafat

  • Newbie
  • Posts: 28
I had a similar problem in the past, and it was because there was not enough RAM.
Hope it's helpful.

FrancoCorleone

  • Newbie
  • Posts: 13
@arafat But I was running that in Docker on quite powerful machines. Do you think that was still too little?

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • Posts: 14888
Hello Franco,

How much RAM is available, and at which operation does the segfault happen? Do you have the log of the processing operation up to the unexpected termination?
Best regards,
Alexey Pasumansky,
Agisoft LLC
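Since the crash happens after processing has finished, the processing log itself may end cleanly. One way to capture more, sketched here assuming the job is a plain Python script using the Metashape module (the log file name and helper name are hypothetical), is Python's standard faulthandler module: enabled at the top of the script, it writes the Python traceback of every thread to a file when the process receives a fatal signal such as SIGSEGV, which shows whether the crash occurs during interpreter shutdown. It only shows Python frames, not the native (e.g. Vulkan) frames where the fault actually happens. Setting PYTHONFAULTHANDLER=1 in the environment enables the same handler, writing to stderr instead.

```python
import faulthandler
from typing import TextIO


def enable_crash_log(path: str = "faulthandler.log") -> TextIO:
    """Dump Python tracebacks of all threads to `path` on SIGSEGV, SIGFPE,
    SIGABRT, SIGBUS or SIGILL. The returned file must stay open for the whole
    run, because faulthandler writes to its file descriptor at crash time."""
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

In the processing script, call crash_log = enable_crash_log() as the first statement and keep the reference alive until the end, so the file is not closed before interpreter shutdown.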

FrancoCorleone

  • Newbie
  • Posts: 13
You see, the unusual part is that it processes the whole thing. Then on exit, instead of returning 0, it segfaults. That's why I'm so confused.

FrancoCorleone

  • Newbie
  • Posts: 13
I should probably add one more thing. I also tried running the same script with metashape -r in Docker, and it didn't end with a segfault. That's why I doubt it's a RAM-related issue. I think it's more likely that the Python wrapper doesn't release all allocated memory. What do you think?

SFL_Beda

  • Newbie
  • Posts: 24
I am currently struggling with the same issue.

In my case I have identified the following triggers:
  • The problem only occurs if the GPU is used
  • The problem only occurs if a texture is built (as I understand it, that is the only part that uses Vulkan)
  • The problem only occurs in headless mode (for me, over SSH to a remote server). Exporting the DISPLAY environment variable fixes the issue, but I can only do that because the server is not actually headless.

PolarNick

  • Jr. Member
  • Posts: 97
> The problem only occurs in Headless mode (for me over SSH to a remote server). Exporting the DISPLAY environment variable fixes the issue but I can only do that because the server is not actually headless.

What if you run vulkaninfo? Does it behave the same way, i.e. does it segfault on exit in headless mode without DISPLAY?
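A minimal way to run that check from Python, assuming vulkaninfo (from the vulkan-tools package) is on PATH; the helper names are hypothetical. DISPLAY is removed only from the child's environment, so the current session is not affected, and a segfault shows up as return code -11 (that is, -signal.SIGSEGV).

```python
import os
import signal
import subprocess
from typing import Sequence


def run_without_display(cmd: Sequence[str]) -> int:
    """Run cmd with DISPLAY removed from its environment, mimicking a headless
    SSH session, and return its exit status. Stdout is discarded, stderr kept."""
    env = dict(os.environ)
    env.pop("DISPLAY", None)
    result = subprocess.run(list(cmd), env=env, stdout=subprocess.DEVNULL)
    return result.returncode


def segfaulted(returncode: int) -> bool:
    # subprocess reports "killed by signal N" as -N
    return returncode == -signal.SIGSEGV
```

For example, segfaulted(run_without_display(["vulkaninfo"])) would be True if vulkaninfo crashes on exit the same way Metashape does.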

SFL_Beda

  • Newbie
  • Posts: 24
I assume you mean vulkaninfo from the package vulkan-tools?

No, it doesn't produce the segfault, but it detects that DISPLAY is not set and skips part of the output:
"'DISPLAY' environment variable not set... skipping surface info"
I attached the output I get.

I also have a support ticket open concerning the issue (#191429), but wanted to post here as well for other people who are running into the same issue.

PolarNick

  • Jr. Member
  • Posts: 97
Hm, I'm not sure, but maybe disabling all Vulkan drivers except NVIDIA's (i.e. Intel and Mesa) could help. In this topic, the segfault on exit was caused by the Mesa driver. On the other hand, they got a segfault on exit from vulkaninfo, which exits fine for you.
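Restricting the drivers does not necessarily mean uninstalling anything. According to the Vulkan loader documentation, the loader only loads the driver manifests listed in VK_ICD_FILENAMES, or in VK_DRIVER_FILES for loaders built against Vulkan headers 1.3.207 or later, so the restriction can be applied to a single process. The sketch below assumes the NVIDIA manifest is at /usr/share/vulkan/icd.d/nvidia_icd.json; the actual location depends on how the driver was installed (it is sometimes under /etc/vulkan/icd.d/), and the helper names are hypothetical.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Assumed location of the NVIDIA ICD manifest; adjust for your installation.
NVIDIA_ICD = Path("/usr/share/vulkan/icd.d/nvidia_icd.json")


def nvidia_only_env(icd: Path = NVIDIA_ICD,
                    base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment that makes the Vulkan loader see only `icd`."""
    env = dict(os.environ if base is None else base)
    env["VK_ICD_FILENAMES"] = str(icd)  # older loaders (deprecated name)
    env["VK_DRIVER_FILES"] = str(icd)   # loaders built with headers 1.3.207+
    return env


def run_nvidia_only(cmd: Sequence[str], icd: Path = NVIDIA_ICD) -> int:
    """Run cmd with only the NVIDIA Vulkan driver visible; return its exit status."""
    if not icd.is_file():
        raise FileNotFoundError(f"Vulkan ICD manifest not found: {icd}")
    return subprocess.run(list(cmd), env=nvidia_only_env(icd)).returncode
```

Because only the child process's environment changes, this is reversible and does not touch the system configuration.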

SFL_Beda

  • Newbie
  • Posts: 24
Sadly, I can't really make potentially system-breaking changes on our computation server.

However, I get the same issue on my personal laptop, which runs an older Ubuntu version (20.04 LTS vs. 22.04 LTS on the server), if I connect over SSH.
I also tried an older NVIDIA driver (525 and 535 on the laptop, 535 on the server), but it didn't change anything.

As it does not seem to be device-specific, the bug should hopefully be reproducible.
« Last Edit: October 23, 2023, 12:22:16 PM by SFL_Beda »

e.spiridonova

  • Agisoft Technical Support
  • Newbie
  • Posts: 37
Dear Beda,

Thank you for the information.

We were able to reproduce the problem on a headless server and are investigating it.

For now, though, we can only offer the following workarounds:

1. Use:
metashape-pro/metashape -platform offscreen -r general_workflow.py data output
 
2. Or check whether the script succeeded through a self-written success marker instead of the process exit code. For example, at the end of the script save a text file "finished.txt" containing the text "ok", and at the beginning of the script delete that file if it exists.
Best regards,
Elizaveta Spiridonova,
Agisoft LLC
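Both workarounds can be combined in a small Python launcher. This is a sketch under assumptions: the Metashape path and script arguments are copied from the command in workaround 1, the marker location (output/finished.txt) is assumed, and all helper names are hypothetical. The processing script calls clear_marker() first and write_marker() last; the launcher then judges success by the marker rather than by the exit code.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

MARKER_TEXT = "ok"


def clear_marker(marker: Path) -> None:
    """First step of the processing script: remove a stale marker, if any."""
    marker.unlink(missing_ok=True)


def write_marker(marker: Path) -> None:
    """Last step of the processing script: record that everything finished."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(MARKER_TEXT)


def metashape_command(executable: str, script: str, *args: str,
                      offscreen: bool = True) -> List[str]:
    """Build the command line from workaround 1."""
    cmd = [executable]
    if offscreen:
        cmd += ["-platform", "offscreen"]
    return cmd + ["-r", script, *args]


def run_and_check(cmd: Sequence[str], marker: Path) -> bool:
    """Run cmd and report success based on the marker, not the exit code."""
    clear_marker(marker)  # also cleared here in case the script never starts
    returncode = subprocess.run(list(cmd)).returncode
    if returncode != 0:
        print(f"note: process exited with status {returncode}", file=sys.stderr)
    return marker.is_file() and marker.read_text().strip() == MARKER_TEXT


# Example (paths from workaround 1; marker location assumed):
# cmd = metashape_command("metashape-pro/metashape", "general_workflow.py", "data", "output")
# ok = run_and_check(cmd, Path("output/finished.txt"))
```

A segfault after write_marker() has run is then only reported as a note, while a crash before the end of processing still counts as a failure.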