
Topic: Problem with Cluster, compute nodes are killed "Out of memory"

nicolo.dellunto

General Problem description:
When PhotoScan is executed on a single system (Linux) it works as intended without any errors. When it is configured for network processing using 3 compute nodes, with server and client running on a 4th node, the processes on the compute nodes are killed by the kernel due to "Out of memory". RAM utilization in "single system mode" (i.e. no network processing) stays below 2 GB during the complete process. In network processing mode the same data set allocates 64 GB RAM (verified using htop) on each compute node after approx. 30 sec of run time, even though each node only reads 1/3 of the images.

One observation is that the progress dialog shows different processing steps depending on whether the program is run in "single system mode" or "network processing mode"; please see below.



No cluster, Photoscan executed on single node

HW/OS info:
- 64GB RAM
- Dual AMD Opteron 6220 8 core, 3.0 GHz
- CentOS 7.2
- kernel: 2.6.32-573.18.1.el6.x86_64
- Photoscan version: 1.2.4, 64-bit

Test data set:
- 115 JPG images, 2592 x 1720, RGB. Each image approx. 800 kB

RAM usage:
Doesn't exceed 2 GB during execution
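
For anyone who wants to log this figure instead of reading it off htop, a minimal sketch follows. It assumes a Linux /proc filesystem; the function name rss_kb is made up for this example and is not part of PhotoScan.

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) of one process, read from /proc.
# rss_kb is a hypothetical helper name used only for illustration.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Example: resident memory of the current shell, in kB.
rss_kb "$$"
```

Calling rss_kb in a loop with the PID of the PhotoScan process (for example every few seconds) would give a rough memory-over-time trace for comparing single-system and network runs.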

Progress Dialog:
1. Detecting points
2. Matching points
3. ...

--> Successful program execution, no errors!

---------------------------------------------------------------------------------------------

Cluster (network processing)

HW/OS info
- 64GB RAM
- Dual AMD Opteron 6220 8 core, 3.0 GHz
- CentOS 7.2
- kernel: 2.6.32-573.18.1.el6.x86_64


Test data set:
- 115 JPG images, 2592 x 1720, RGB. Each image approx. 800 kB

Startup:
start server (on 10.14.0.249): photoscan.sh --server --control 10.14.0.249 --dispatch 10.14.0.249
start nodes (on each node): photoscan.sh --node --dispatch 10.14.0.249
start client (on 10.14.0.249): photoscan.sh

Progress Dialog:
1.  Match Photos (3/3 nodes active)
2.  .... (no further update after approx 30 sec)

After approx. 30 seconds the RAM utilization on each node reaches the maximum (64 GB) and the process is killed on all three nodes by the kernel due to "Out of memory" (logged in /var/log/messages).
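
A quick way to confirm the kills on each node is to count the matching kernel entries in the log. This is only a sketch: count_oom_kills is a hypothetical helper name, the default path is the /var/log/messages location mentioned above, and the case-insensitive match on "out of memory" is an assumption about how the kernel words these entries.

```shell
#!/bin/sh
# Count lines that mention an out-of-memory kill in a syslog-style file.
# count_oom_kills is a hypothetical helper; with no argument it reads
# /var/log/messages, which usually requires root.
count_oom_kills() {
    grep -c -i 'out of memory' "${1:-/var/log/messages}"
}

# Usage (as root, on each compute node):
# count_oom_kills
# count_oom_kills /path/to/copied/messages
```

Replacing grep -c with plain grep prints the matching lines themselves, which typically include the PID and name of the killed process.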

Attached files: logs from stdout for server and all three nodes

Alexey Pasumansky

  • Agisoft Technical Support
« Reply #1 on: June 16, 2016, 07:40:00 PM »
Hello Nicolo,

Unfortunately, I cannot see any attached logs. If possible, can you start the network processing with a single node only (one that fails in the multi-node setup) and send us the output from the server and the node? You can also send this information to support@agisoft.com.
Best regards,
Alexey Pasumansky,
Agisoft LLC
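
One way to collect the requested output is to run each process through a small wrapper that copies stdout and stderr to a file. This is a sketch only; run_logged and the log file names are hypothetical, and the photoscan.sh commands in the comments are the ones quoted in the first post.

```shell
#!/bin/sh
# Run a command and keep a copy of its stdout and stderr in a log file,
# while still showing the output on the terminal.
# run_logged is a hypothetical helper name.
run_logged() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Usage with the commands from the first post:
# run_logged server.log photoscan.sh --server --control 10.14.0.249 --dispatch 10.14.0.249
# run_logged node.log photoscan.sh --node --dispatch 10.14.0.249
```

The resulting server.log and node.log files can then be attached or sent by email.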

nicolo.dellunto

« Reply #2 on: June 17, 2016, 09:11:48 AM »
Thank you, Alexey, for your feedback. I'll send you the logs by email and run the test.

Best,

/n

Dmitry Semyonov

  • Agisoft Technical Support
« Reply #3 on: June 20, 2016, 11:47:43 AM »
Hello Nicolo,

Thank you for reporting the problem.

It was fixed in PhotoScan 1.2.5. Please check the PhotoScan downloads page for the updated version:
http://www.agisoft.com/downloads/installer/
With best regards,
Dmitry Semyonov
Agisoft

nicolo.dellunto

« Reply #4 on: June 20, 2016, 11:58:18 AM »
Thank you so much for your reply!
We will run some tests with the new version :-)

Best,

/n