Hello Jay,
There are several ways to generate the mesh model after completing the camera alignment:
1. based on tie points - completes very quickly, but the model will be quite rough,
2. based on the dense point cloud (note that building the dense point cloud includes the depth maps generation step) - usually used for aerial surveys,
3. based on depth maps (they are generated automatically if you select the Depth Maps source option in the Build Model dialog) - suggested for close-range projects.
Without depth maps generation you cannot complete approaches 2 and 3 above, since both of them rely on depth maps.
If the depth maps generation takes a long time, you can lower the depth maps quality setting. Also check that you have a reasonably powerful GPU and that it is enabled in the Metashape Preferences window.
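
If you prefer to run approach 3 from a script, a minimal sketch could look like the one below. It assumes the Python API of Metashape 1.6 or later, where the depth maps quality is passed as a downscale factor (Ultra = 1, High = 2, Medium = 4, Low = 8, Lowest = 16). The helper names (QUALITY_TO_DOWNSCALE, lower_quality, build_mesh_from_depth_maps) are hypothetical and used only for illustration, so please check the calls against the API reference for your version.

```python
# Quality names as shown in the GUI, mapped to the API downscale factor
# (assumption: Metashape 1.6+ Python API, ordered from highest to lowest quality).
QUALITY_TO_DOWNSCALE = {"ultra": 1, "high": 2, "medium": 4, "low": 8, "lowest": 16}


def lower_quality(quality):
    """Return the next lower quality name, or the same name if already lowest."""
    order = list(QUALITY_TO_DOWNSCALE)
    index = order.index(quality)
    return order[min(index + 1, len(order) - 1)]


def build_mesh_from_depth_maps(chunk, quality="medium"):
    """Hypothetical helper: depth maps first, then the mesh from the depth maps source."""
    # Imported here because the module is only available where Metashape's
    # Python module is installed (or inside the Metashape console).
    import Metashape

    chunk.buildDepthMaps(
        downscale=QUALITY_TO_DOWNSCALE[quality],
        filter_mode=Metashape.MildFiltering,
    )
    chunk.buildModel(
        source_data=Metashape.DepthMapsData,
        surface_type=Metashape.Arbitrary,
    )
```

For example, if the High quality run takes too long, lower_quality("high") gives "medium", and build_mesh_from_depth_maps(chunk, "medium") would then rebuild the depth maps with downscale 4 on an already aligned chunk.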