Hello Ingsayyad,
After the alignment stage you get a sparse point cloud consisting of the valid tie points. For each point in the sparse cloud there is a set of projections onto the individual images. You also get the extrinsic and intrinsic orientation parameters for the aligned cameras.
Could you please specify whether this data is sufficient for your needs, or provide some details regarding the fifth step of the workflow you've mentioned?
Also note that in the next version of PhotoScan it will be possible to generate a DEM from the dense cloud and create an orthophoto based on that DEM, so the mesh generation stage can be skipped.
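
In case it helps to see how the alignment outputs relate to each other, here is a minimal sketch of projecting one tie point into an image using a camera's extrinsic and intrinsic parameters. It assumes a plain pinhole model without lens distortion and a world-to-camera rotation and translation. The function name, parameter names, and numeric values are hypothetical and are not taken from the PhotoScan API.

```python
def project_point(point, rotation, translation, focal_px, cx, cy):
    """Project a 3D point to pixel coordinates with a distortion-free pinhole model.

    All names here are illustrative assumptions, not PhotoScan API calls.
    """
    # Extrinsic step: world coordinates -> camera coordinates (x_cam = R * X + t).
    x_cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsic step: perspective division, then focal length and principal point.
    u = focal_px * x_cam[0] / x_cam[2] + cx
    v = focal_px * x_cam[1] / x_cam[2] + cy
    return (u, v)


# Hypothetical camera at the origin looking along +Z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tie_point = (1.0, 0.5, 10.0)
u, v = project_point(tie_point, identity, (0.0, 0.0, 0.0),
                     focal_px=1000.0, cx=2000.0, cy=1500.0)
print(u, v)
```

Comparing a projection computed this way with the stored projection of the same tie point gives a rough reprojection error, keeping in mind that a real calibration also includes distortion terms that this sketch ignores.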