These captions are generated by a deep artificial neural network. Nothing about the text generation is hardcoded, except that the maximum text length is limited for sanity. The model uses character-level prediction, so you can specify prefix text of one or more characters to influence the text generated. Using someone's name or other short text as a prefix works best.
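As a rough illustration of how a prefix steers character-level generation, here is a minimal sketch. It is not the network behind this page: a toy bigram counter stands in for the deep neural network, and the names `train_bigram`, `generate`, `END`, and the `max_len` default are all hypothetical. The shape of the loop is the point: start from the prefix, predict one character at a time, and stop at an end marker or the length cap.

```python
from collections import Counter, defaultdict

END = "|"  # hypothetical end-of-text marker, assumed absent from the captions


def train_bigram(texts):
    """Count which character follows which (a toy stand-in for the real network)."""
    counts = defaultdict(Counter)
    for text in texts:
        padded = text + END
        for cur, nxt in zip(padded, padded[1:]):
            counts[cur][nxt] += 1
    return counts


def generate(prefix, counts, max_len=40):
    """Extend `prefix` one character at a time until END or `max_len` characters."""
    text = prefix
    while len(text) < max_len:
        # Look up the characters seen after the last character of the text so far.
        options = counts.get(text[-1]) if text else None
        if not options:
            break  # nothing to condition on, or a character never seen in training
        next_char = options.most_common(1)[0][0]  # greedy: take the likeliest character
        if next_char == END:
            break
        text += next_char
    return text


if __name__ == "__main__":
    model = train_bigram(["one does not simply", "not sure if"])
    print(generate("n", model))
    print(generate("s", model, max_len=10))
```

A real model would condition on the whole text so far rather than only the last character, and would typically sample from the predicted distribution instead of always taking the likeliest character. The length cap plays the same role as the maximum text length mentioned above: without it, a greedy loop like this one can cycle forever.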
The network was trained on public images that users of the Imgflip Meme Generator created with the 48 most popular Meme Templates. Beware: no profanity filtering was done on the training data, so you may encounter vulgarity.
Curious about the technical details of building the network? Check out the accompanying article, "Meme Text Generation with a Convolutional Network in Keras & Tensorflow".