It doesn't read text from an image the way text-extraction (OCR) software would.
It predicts what the image is going to look like, pixel by pixel; it doesn't see text and images differently.
What it wrote is visually similar to the "Good morning" in Good Morning images, and that's all it is trying to achieve.
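
To make the "visually similar" point concrete, here is a toy sketch. It is not how any real image generator is implemented; the 5x5 bitmaps and the `pixel_similarity` helper are made up for illustration. It only shows that a score based on pixels can rate a slightly wrong glyph as nearly perfect, because nothing in it knows what a letter is.

```python
# Toy 5x5 bitmaps (hypothetical data): "1" = ink, "0" = background.
TARGET_O = [
    "01110",
    "10001",
    "10001",
    "10001",
    "01110",
]

# The same shape with one stray ink pixel: no longer a clean "O".
GARBLED_O = [
    "01110",
    "10001",
    "10011",
    "10001",
    "01110",
]


def to_pixels(rows):
    """Flatten a bitmap into a list of 0/1 integers."""
    return [int(ch) for row in rows for ch in row]


def pixel_similarity(a, b):
    """Fraction of pixels that match; every pixel is treated the same way."""
    pa, pb = to_pixels(a), to_pixels(b)
    matches = sum(1 for x, y in zip(pa, pb) if x == y)
    return matches / len(pa)


if __name__ == "__main__":
    # 24 of the 25 pixels match, so the garbled glyph still scores very high.
    print(pixel_similarity(TARGET_O, GARBLED_O))
```

A score like this rewards "looks about right" and has no notion of "is spelled right", which is roughly why the output can resemble "Good morning" without actually being it.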