Topic: Python script and network processing - script loops at end and can't cancel.

andyroo

  • Sr. Member
  • ****
  • Posts: 440
I am trying to adapt a modified version of William George's Puget Systems Metashape Benchmark script (modified only because we've each been adding version updates independently) to work with network processing, because otherwise it will only run on one node. I don't understand how the dialog boxes are treated in network processing, why the script crashes at the end, or why I can't cancel it. Cancelling from the GUI/client, selecting stop/quit from the network monitor, and running client.abortBatch or client.abortNode from the client once it's connected all fail to stop it.

When the script completes, it throws an error in the monitor and starts again from the beginning:

2020-11-09 12:56:43 [127.0.0.1:54773] failed #0 RunScript: Unexpected process termination

In the node details it looks like the script displays the results (as it would at the end), errors out, then restarts. So what is the proper way to have the script terminate? I have attached the version of the script that I'm running in both the network and standalone versions of 1.6.5. I've tried it both on an HPC and on a local workstation set up to be the server/monitor/node/client all at once, and I get the same behavior.

I *think* that the example below (for align photos) is how I would modify the script to run distributed processing, but I'm not sure how to end it properly instead of having it loop, which I am guessing has to do with the way network processing treats the Metashape.app.quit() command and/or the Metashape.app.messageBox(<message>) command.

Code:
# ALIGN PHOTOS
# open the first version of the project we want to test, in its unprocessed state
doc.open(folderpath + '/'+projectname+'/'+projectname+' Align Photos.psx' , False , True )
chunk = doc.chunk

# get a beginning time stamp
timer1a = time.time()

# match photos
# THIS HAS TO BE HANDLED DIFFERENTLY FOR 1.6.1 and later vs EARLIER VERSIONS
if Metashape.app.version in ( '1.6.1', '1.6.2', '1.6.3', '1.6.4', '1.6.5' ):

    # for versions 1.6.1 and later, matching accuracy was changed to scaling. Values taken from Alexey's (dev) post at metashape forum. Metashape.HighAccuracy changes to downscale=1

    # check if network processing is enabled
    if Metashape.app.settings.network_enable:
        network_tasks = list()
        task = Metashape.Tasks.MatchPhotos()
        task.downscale = 1
        task.keypoint_limit = 40000
        task.tiepoint_limit = 0
        task.generic_preselection = True
        task.reference_preselection = False
        task.subdivide_task = True

        n_task = Metashape.NetworkTask()
        n_task.name = task.name
        n_task.params = task.encode()
        n_task.frames.append((chunk.key, 0))
        network_tasks.append(n_task)
    else:
        chunk.matchPhotos(downscale=1, generic_preselection=True, reference_preselection=False, subdivide_task=True)

else:

    # for version 1.5.1 or earlier this is unchanged from original script.
    chunk.matchPhotos(accuracy=Metashape.HighAccuracy, generic_preselection=True, reference_preselection=False)

# align cameras
# THIS HAS TO BE HANDLED DIFFERENTLY FOR 1.6.1 and later vs EARLIER VERSIONS
if Metashape.app.version in ( '1.6.1', '1.6.2', '1.6.3', '1.6.4', '1.6.5' ):

    # note v1.6.1 added subdivide_task=True to see how good it gets
    chunk.alignCameras(subdivide_task=True)

else:

    # for version 1.5.1 or earlier this is unchanged from original script.
    chunk.alignCameras()

# get an ending time stamp
timer1b = time.time()
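
The version test in the code above lists each 1.6.x release by name, so any release not in the tuple (1.6.6, for example) would fall through to the old accuracy= branch. Below is a minimal sketch of a comparison-based check instead. It assumes Metashape.app.version is a dotted string such as '1.6.5' and takes the 1.6.1 threshold from the comments in the script. parse_version and uses_downscale are hypothetical helper names, not part of the Metashape API.

```python
import re


def parse_version(version):
    """Turn a dotted version string such as '1.6.5' into a tuple of three ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    # A missing patch number (e.g. '1.5') is treated as 0.
    return tuple(int(group or 0) for group in match.groups())


def uses_downscale(version):
    """True for 1.6.1 and later (threshold assumed from the script comments)."""
    return parse_version(version) >= (1, 6, 1)
```

In the script this would replace the tuple membership test, e.g. "if uses_downscale(Metashape.app.version):". Comparing integer tuples rather than strings also keeps '1.6.10' sorting after '1.6.5'.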

Also, it seems that to make this run properly on the network, I probably need to get rid of the GUI message box and somehow ensure that one task is completed before any others begin (for benchmark consistency, assuming it is mimicking a workflow that requires sequential ordering). I'm also a little unclear on how to properly control logging for each step, or how I can launch a script from the GUI (client) that spins tasks off to however many nodes want them, to be done in order, and reported back to the client so it can record the elapsed time for the benchmark.
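
One way to drop the message box is to collect the per-step timings in a plain dictionary and write them out as text at the end. This is only a sketch of that idea in plain Python, and it makes no Metashape calls. timed_step and format_results are hypothetical names, and the step body is a placeholder for the real matchPhotos/alignCameras calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, results, clock=time.time):
    """Record the elapsed wall-clock seconds of the enclosed block in results[name]."""
    start = clock()
    try:
        yield
    finally:
        # Store the timing even if the step raised, so partial runs are still logged.
        results[name] = clock() - start


def format_results(results):
    """Render the timings as one 'step: seconds' line per step, in run order."""
    return "\n".join(
        "{}: {:.1f} s".format(name, seconds) for name, seconds in results.items()
    )


results = {}
with timed_step("Align Photos", results):
    pass  # placeholder: the matchPhotos / alignCameras calls would go here
print(format_results(results))
```

Because each step runs inside its own with-block, the next step cannot start until the previous one has returned, and the text from format_results can be written to a log file rather than shown in a dialog.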
« Last Edit: November 10, 2020, 04:37:50 AM by andyroo »

andyroo

After some testing, I see that I can't just replace code with a network variant; I need to change the way the script works. I'm not sure how I can send a batch off to the network and wait for it to finish. When I just add network batch submission code in each step, I get duplicate batch submission warnings and all sorts of useless output in the network version of the benchmark. I couldn't find any example code where a task was submitted to the server and the script then waited for the task to finish (necessary to benchmark a step comparably to a single machine). It looks like maybe the easiest way to do that is with task.runscript, but I couldn't find a good example. I hope I don't have to design a new benchmark that submits all of the steps as a network batch list from a single project, because that wouldn't really be comparable, which makes me sad. Here's the previous script after I put in all the network tasks and batch submittals. It didn't work at all, though, and threw a "duplicate batch" error until I killed it (at least it dies gracefully now).
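
The "submit, then wait" pattern I'm after would look roughly like the polling loop below. It is a sketch under assumptions, not tested against a Metashape server. get_status stands for whatever call the network client offers for querying a batch (check the API reference for your version). The status strings "completed", "failed" and "aborted" are placeholders and may not match what the server actually reports.

```python
import time


def wait_for_batch(get_status, poll_seconds=5.0, timeout_seconds=None,
                   done_states=("completed",), failed_states=("failed", "aborted"),
                   sleep=time.sleep, clock=time.time):
    """Poll get_status() until the batch finishes; return the elapsed seconds.

    get_status is any zero-argument callable returning a status string.
    The default state names are assumptions, not confirmed server values.
    """
    start = clock()
    while True:
        status = get_status()
        if status in done_states:
            return clock() - start
        if status in failed_states:
            raise RuntimeError("batch ended with status {!r}".format(status))
        if timeout_seconds is not None and clock() - start > timeout_seconds:
            raise TimeoutError("batch still {!r} after timeout".format(status))
        sleep(poll_seconds)
```

Each benchmark step would submit its batch once, call wait_for_batch, and only then move on. That keeps the steps sequential and should avoid resubmitting a batch that is already queued.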