Author Topic: Network error: "Can't read project: ..."  (Read 11246 times)

acind

  • Newbie
  • *
  • Posts: 9
Network error: "Can't read project: ..."
« on: June 30, 2017, 01:21:19 PM »
Hi,

We have an Agisoft network cluster set up with 10 nodes, and it has been running fine for some time.

However, after upgrading to version 1.3.2.4205, we started getting these read/write errors:

2017-06-30 12:17:00 Error: Can't read project: //NETWORKDRIVE/Photoscan/PROJECTPATH/0/0/point_cloud.1/point_cloud.zip

This also causes the project to fail when opened in the GUI.

Looking at the failing .zip, it has been written as a "point_cloud.zip.tmp" file. Removing the .tmp extension allows us to open the project in the GUI, but the alignment processing seems to have failed.
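A minimal Python sketch for locating such leftover files in a project's .files folder. It assumes, as the file name suggests but nothing here confirms, that a ".tmp" file sitting where the final file should be marks an interrupted write. The function names are hypothetical, and the script only reports; it does not rename anything.

```python
from pathlib import Path


def find_leftover_tmp(project_files_dir):
    """Return a sorted list of *.tmp files found under the given folder."""
    root = Path(project_files_dir)
    return sorted(p for p in root.rglob("*.tmp") if p.is_file())


def expected_final_path(tmp_path):
    """Strip only the trailing '.tmp' (point_cloud.zip.tmp -> point_cloud.zip)."""
    return Path(tmp_path).with_suffix("")


def report(project_files_dir):
    """Print each leftover .tmp file and whether its final counterpart exists."""
    for tmp in find_leftover_tmp(project_files_dir):
        final = expected_final_path(tmp)
        status = "final file present" if final.exists() else "final file missing"
        print(f"{tmp} ({status})")
```

Calling report() on the project's .files directory lists every candidate, so you can see how many files were left half-written before deciding what to do with them.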

What to do?

Thanks,
Joacim

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15160
Re: Network error: "Can't read project: ..."
« Reply #1 on: July 05, 2017, 12:17:39 PM »
Hello Joacim,

Were these problems related to the network processing?

Can you please also confirm that both server and nodes are upgraded to 4205 build?
Best regards,
Alexey Pasumansky,
Agisoft LLC

willfig

  • Newbie
  • *
  • Posts: 35
Re: Network error: "Can't read project: ..."
« Reply #2 on: September 14, 2017, 04:36:32 PM »
We're having similar issues. This has been an intermittent problem in the past, but I've just started loading some projects onto our network of 6 computers and am getting these errors every time. I noticed there was a new software version, so I updated every node and the server to 1.3.3 build 4827. We haven't used the network much for the previous 3-4 weeks, so I'm not sure exactly when the issues started. I believe all the computers have had OS updates (Win7 and Win10, variously) during this period, but I'm not sure whether this is contributing. Note that the root drive is mapped on all computers and I can open it. The errors I get are:

2017-09-14 23:10:18 [10.165.18.2:63723] finished #0 MatchPhotos
2017-09-14 23:10:18 [10.165.18.2:63723] finished #0 MatchPhotos.initialize (1/1)
2017-09-14 23:10:19 [10.165.18.61:61839] failed #0 MatchPhotos.detect (5/7): Can't open file: Invalid argument (22): //NETWORKDRIVE/ROJECTPATH/LH-CG_tag05.psx


2017-09-14 23:11:18 [10.165.18.2:63723] finished #0 AlignCameras.update (1/1)
2017-09-14 23:11:20 [10.165.18.2:63723] finished #0 AlignCameras.merge (1/1)
2017-09-14 23:11:20 [10.165.18.2:63723] finished #0 AlignCameras.update (1/1)
2017-09-14 23:11:24 [10.165.18.2:63723] failed #0 AlignCameras.finalize (1/1): Can't replace file or directory: The parameter is incorrect (87): ////NETWORKDRIVE/ROJECTPATH/LH-CG_tag05.files/0/0/point_cloud.1/point_cloud.zip
2017-09-14 23:11:24 [10.165.18.2:63723] failed #0 AlignCameras.finalize (1/1): Can't open file: The system cannot find the file specified (2): ////NETWORKDRIVE/ROJECTPATH/LH-CG_tag05.files/0/0/point_cloud.1/point_cloud.zip


Note that once this fails and I have to cancel the job, the project file is corrupt, and even results from previous steps (marker detection, for instance) are lost.

I've also tried rerunning, from the start, projects that had previously been built successfully, and I get the same issue. I'd be quite grateful for some help here, as I'm not at all sure what's happening.

willfig

  • Newbie
  • *
  • Posts: 35
Re: Network error: "Can't read project: ..."
« Reply #3 on: September 15, 2017, 01:16:26 AM »
Just wanted to add that I've had no problems running these same projects locally, so this doesn't seem to have anything to do with direct access to the network share or with permissions.

Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15160
Re: Network error: "Can't read project: ..."
« Reply #4 on: September 19, 2017, 08:42:19 PM »
Hello willfig,

Is it possible that there are issues with the access rights for the current process? Can you also provide the error messages from the node that fails to finish the tasks? You can access the log from the Network Monitor window using the Details option after right-clicking on the node label.
Best regards,
Alexey Pasumansky,
Agisoft LLC

willfig

  • Newbie
  • *
  • Posts: 35
Re: Network error: "Can't read project: ..."
« Reply #5 on: September 20, 2017, 01:40:46 AM »
Unfortunately I can't, as we've had to reset the server since then. I'll try to capture this if it happens again.

I don't think it has to do with access rights, as this seems to be a rather spontaneous occurrence. In our setup, every computer runs a background service that automatically logs it into the same admin account and then launches the PSP node. Users can then log on (with their own credentials) to either these node computers or other computers running PSP and put jobs onto the network. They can also choose to run jobs locally (on a node computer or on their own computer). If they run locally while logged onto a node computer, there would be instances of PSP running in the background under the admin account as well as under their own account. In practice, our policy is that if you want to run something locally you should pause the node on the network first, just to avoid putting too much on one computer; we have had models come out bad in the past when we did this.

We only seem to have this issue with jobs run through the network, not locally, even though they all access the network drive space to get the images and write the PSP file. Note that we can also log into the admin account directly and run locally without this issue. So I don't think it's an access rights issue.

One thing I do worry about is network connectivity, specifically its continuity. We log all metadata for our models in a custom MS Access database, which is stored in the same network area as all our photos and PSP files. We have consistent problems with the database losing its connection: we lose any changes made to the currently active record and have to restart it. Based on some searches online, it looks like MS Access is somewhat sensitive to network dropouts, even very short ones. I've asked our IT guys at the Uni and they tell me there is nothing unusual about our network continuity. Not sure I believe that.

Given that this only happens with jobs put over the network, not run locally, I do wonder if there is something about network processing that makes it very sensitive to network dropouts. My best guess is that a dropout happens at a critical time for one of the processes and this essentially corrupts the project file in some way. Then every attempt to write to that file generates an error. If I pause the node where the errors are piling up, you can see them start to pile up on the node that the job gets pushed to. The only way to stop it is to pause and abort the actual job. This mainly seems to happen at the align photos and build dense cloud stages. Whenever it happens, the project file is then corrupt and can't be opened, so even if it got through alignment and saved, all of that is lost and we have to start from the beginning.
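One way to test the dropout theory independently of PSP is a small probe that repeatedly writes a temp file on the share and renames it over a final name, logging any failure with a timestamp. This is only a sketch in Python: the write-then-replace pattern is an assumption inferred from the "point_cloud.zip.tmp" file and the "Can't replace file or directory" message, not a confirmed description of how PSP saves, and the function names and probe file name are made up for illustration.

```python
import os
import socket
import time
import uuid
from datetime import datetime


def probe_once(directory):
    """Write a temp file, rename it over the final name, read it back.

    Returns None on success, or the OSError that was raised.
    """
    # Include the host name so several machines can probe the same folder.
    final = os.path.join(directory, f"probe_{socket.gethostname()}.bin")
    tmp = final + ".tmp"
    try:
        with open(tmp, "wb") as fh:
            fh.write(uuid.uuid4().bytes)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, final)  # the step assumed to mirror the failing one
        with open(final, "rb") as fh:
            fh.read()
        return None
    except OSError as exc:
        return exc


def run_probe(directory, attempts=10, interval=1.0):
    """Run the probe repeatedly and return a list of (timestamp, error)."""
    failures = []
    for _ in range(attempts):
        err = probe_once(directory)
        if err is not None:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            failures.append((stamp, err))
            print(f"{stamp} probe failed: {err!r}")
        time.sleep(interval)
    return failures
```

Leaving run_probe() running against a scratch folder on the share from one of the node machines (under the same account the node uses) would give timestamps to compare against the PSP error log and to show to IT.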

willfig

  • Newbie
  • *
  • Posts: 35
Re: Network error: "Can't read project: ..."
« Reply #6 on: October 05, 2017, 12:57:23 AM »
Quote from: Alexey Pasumansky, Reply #4:
"Is it possible that there are issues with the access rights for the current process? Can you also provide the error messages from the node that fails to finish the tasks (you can access the log from the Network Monitor window using Details option after right-clicking on the node label)."

Wondering if you have any input here, as we are still having issues. I've been doing some troubleshooting. At least in this instance, the first error happens at the alignment phase. Here's the output from one of the nodes when it first failed. Note that I get the same error if I pause this node and rerun the project from scratch (to force it to use another node):

2017-10-05 08:45:59 adding 854 points, 874 far (0.516083 threshold), 217 inaccurate, 0 invisible, 0 weak
2017-10-05 08:45:59 adjusting: xxxxxxx 0.166182 -> 0.166019
2017-10-05 08:46:00 point variance: 0.165237 threshold: 0.495711
2017-10-05 08:46:00 adding 1007 points, 909 far (0.495711 threshold), 222 inaccurate, 1 invisible, 0 weak
2017-10-05 08:46:00 adjusting: xxxxxxxx 0.163735 -> 0.163594
2017-10-05 08:46:00 point variance: 0.161966 threshold: 0.485899
2017-10-05 08:46:00 adding 1096 points, 932 far (0.485899 threshold), 224 inaccurate, 1 invisible, 0 weak
2017-10-05 08:46:00 optimized in 1.703 seconds
2017-10-05 08:46:00 coordinates applied in 0 sec
2017-10-05 08:46:00 Error: Can't replace file or directory: The parameter is incorrect (87): //research-data.shared.sydney.edu.au/FSC/PRJ-HabitatStr/Processing/Lizard Island/Table coral colonies/LIRS Tables-Precision/Table 8/LIRS-Table08_2014_sub06_test1.files/0/0/point_cloud.2/point_cloud.zip
2017-10-05 08:46:00 processing failed in 1.954 sec
2017-10-05 08:46:00 AlignCameras.finalize (1/1): adaptive fitting = 0
2017-10-05 08:46:00 loaded camera partition in 0 sec
2017-10-05 08:46:00 4 blocks: 61 5 3 2
2017-10-05 08:46:00 Error: Can't open file: The system cannot find the file specified (2): //research-data.shared.sydney.edu.au/FSC/PRJ-HabitatStr/Processing/Lizard Island/Table coral colonies/LIRS Tables-Precision/Table 8/LIRS-Table08_2014_sub06_test1.files/0/0/point_cloud.2/point_cloud.zip
2017-10-05 08:46:00 processing failed in 0.031 sec
2017-10-05 08:46:00 AlignCameras.finalize (1/1): adaptive fitting = 0
2017-10-05 08:46:00 loaded camera partition in 0 sec
2017-10-05 08:46:00 4 blocks: 61 5 3 2
2017-10-05 08:46:00 Error: Can't open file: The system cannot find the file specified (2): //research-data.shared.sydney.edu.au/FSC/PRJ-HabitatStr/Processing/Lizard Island/Table coral colonies/LIRS Tables-Precision/Table 8/LIRS-Table08_2014_sub06_test1.files/0/0/point_cloud.2/point_cloud.zip
2017-10-05 08:46:00 processing failed in 0.032 sec
2017-10-05 08:46:00 AlignCameras.finalize (1/1): adaptive fitting = 0
2017-10-05 08:46:00 loaded camera partition in 0 sec
2017-10-05 08:46:00 4 blocks: 61 5 3 2
2017-10-05 08:46:00 Error: Can't open file: The system cannot find the file specified (2): //research-data.shared.sydney.edu.au/FSC/PRJ-HabitatStr/Processing/Lizard Island/Table coral colonies/LIRS Tables-Precision/Table 8/LIRS-Table08_2014_sub06_test1.files/0/0/point_cloud.2/point_cloud.zip
2017-10-05 08:46:00 processing failed in 0.031 sec
2017-10-05 08:46:00 AlignCameras.finalize (1/1): adaptive fitting = 0
2017-10-05 08:46:00 loaded camera partition in 0 sec
2017-10-05 08:46:00 4 blocks: 61 5 3 2
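Because the same error repeats many times, a short Python sketch like the following can condense a node log into one line per distinct error, with its first timestamp. The regular expression is an assumption based only on the lines pasted above ("<timestamp> Error: <message> (<code>): <path>"); lines without a numeric code in parentheses are ignored, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   <date> <time> Error: <message> (<numeric code>): <path>
ERROR_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Error: "
    r"(?P<message>.+?) \((?P<code>\d+)\): (?P<path>.+)$"
)


def summarize_errors(lines):
    """Count errors per (code, message) and record when each first appeared."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match is None:
            continue  # not an error line in the assumed format
        key = (int(match["code"]), match["message"])
        counts[key] += 1
        first_seen.setdefault(key, match["time"])
    return counts, first_seen
```

Feeding it the lines of a saved node log shows which error came first (here the code 87 "Can't replace file or directory") and how many follow-on "Can't open file" errors it triggered.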


Alexey Pasumansky

  • Agisoft Technical Support
  • Hero Member
  • *****
  • Posts: 15160
Re: Network error: "Can't read project: ..."
« Reply #7 on: October 10, 2017, 10:01:52 PM »
Hello willfig,

I still assume that there might be some network connection issues or access rights problems.

Local processing saves the data under the current user, whereas the network processing is controlled by the server instance, so the data may be marked by the server.
Best regards,
Alexey Pasumansky,
Agisoft LLC

willfig

  • Newbie
  • *
  • Posts: 35
Re: Network error: "Can't read project: ..."
« Reply #8 on: October 11, 2017, 01:26:38 AM »
Hi Alexey,

Thanks for the reply. I’ve created an account with admin rights, and that’s what we use to log into each of the nodes as well as the server to launch them. We then just switch users (Windows) and leave this account logged in with the command window open, running the node/server. We then log onto node machines with different accounts to fire jobs off. We don’t use the server for any other task. All files are on a network server that we all have access to. We map it as a drive letter, though in the DOS command that launches the server and nodes we put in the full path for --root. Is this the recommended way to set this up?

I went to fire off some jobs yesterday. According to the Network Monitor, all nodes were up and running, and I had run jobs with no problems a few days prior, though none in the interim. The models immediately began failing. I noticed that if I restarted a node, it would then work fine, so I did this for all computers and everything then worked fine for the 10 models I processed. Note that I did not relaunch the server at any point. Also note that the .bat file I’ve created to launch each node has a step that disconnects and remaps the network drive.

I will try once again to discuss this with our network guys, but I’m just wondering if you see any red flags in our setup that could be causing this. Do you have experience with a Windows network setup like this? Is the PSP server particularly sensitive to any inconsistencies in its connection with the file server on a Windows network?