Yes, JamesPatrick is right. A pixel-wise annotation is a second image with the same dimensions as the original. Each pixel value encodes information about the corresponding pixel, e.g. which type of land it shows: building, water, road, and so on.
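A minimal sketch of what such an annotation looks like in memory, assuming NumPy arrays and a made-up mapping of integer codes to classes (the codes 0 to 3 and the `CLASSES`, `label_at` and `class_pixel_counts` names are hypothetical; every dataset defines its own):

```python
import numpy as np

# Hypothetical class codes; real datasets define their own mapping.
CLASSES = {0: "background", 1: "building", 2: "water", 3: "road"}

# A 4x4 RGB image (pixel values are placeholders) and its pixel-wise annotation.
image = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.array([
    [0, 0, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 3, 3],
    [0, 0, 3, 3],
], dtype=np.uint8)

# The annotation covers exactly the same pixel grid as the image.
assert mask.shape == image.shape[:2]

def label_at(mask, row, col):
    """Return the class name annotated for pixel (row, col)."""
    return CLASSES[int(mask[row, col])]

def class_pixel_counts(mask):
    """Count how many pixels carry each class code."""
    codes, counts = np.unique(mask, return_counts=True)
    return {CLASSES[int(c)]: int(n) for c, n in zip(codes, counts)}

print(label_at(mask, 1, 0))  # building
print(class_pixel_counts(mask))
```

Storing one integer code per pixel (rather than a colour) keeps the mask compact and makes per-class statistics a single `np.unique` call.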
I'm working on the classification of areas in grasslands.
One use case I can think of is validating annotations on an orthomosaic, specifically the shape of the annotated areas. I could annotate images taken on the ground, align them, and project them onto the orthomosaic's plane. Then I can overlay the ground-image annotations on the orthomosaic annotations to see where errors were made.
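The comparison step could look roughly like this. It is a sketch that assumes the alignment and projection are already done, so both annotations are integer masks on the same pixel grid, and that pixels not covered by any ground image are marked with a hypothetical nodata code of 255. The function name `compare_annotations` is illustrative, and per-class intersection-over-union (IoU) is just one common way to score shape agreement:

```python
import numpy as np

NODATA = 255  # hypothetical code for pixels no ground image covered

def compare_annotations(ortho_mask, ground_mask, nodata=NODATA):
    """Compare an orthomosaic annotation with ground annotations that were
    already projected onto the same pixel grid.

    Returns a boolean error map (True where the ground annotation has a label
    and the orthomosaic label differs) and per-class IoU on covered pixels.
    """
    if ortho_mask.shape != ground_mask.shape:
        raise ValueError("masks must share the same pixel grid")

    covered = ground_mask != nodata
    errors = covered & (ortho_mask != ground_mask)

    # Score every class that appears in either mask within the covered area.
    classes = np.union1d(np.unique(ortho_mask[covered]),
                         np.unique(ground_mask[covered]))
    iou = {}
    for cls in classes:
        a = (ortho_mask == cls) & covered
        b = (ground_mask == cls) & covered
        union = np.logical_or(a, b).sum()  # > 0 because cls occurs somewhere
        iou[int(cls)] = float(np.logical_and(a, b).sum() / union)
    return errors, iou

# Example: the ground images only cover the top two rows.
ortho = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [0, 0, 0]], dtype=np.uint8)
ground = np.array([[1, 1, 1],
                   [1, 2, 2],
                   [255, 255, 255]], dtype=np.uint8)

errors, iou = compare_annotations(ortho, ground)
print(errors.astype(int))
print(iou)
```

The error map shows where the orthomosaic annotation disagrees, while a low IoU for a class points to areas whose outline was drawn too large or too small.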