Daniel Wigton - Nostr Hypermedia

I have never been happy with generative AI. Over the past few years it has become quite impressive, and there is no doubt that it can generate "good" images. What I am unhappy with is that image generators can't follow even moderately simple prompts that contain relative positions, e.g. "A boy and his father walking on either side of a horse." The model will only get the relative positions right by accident, and that is if it doesn't start adding other random people as well. The image it creates will look nice, but it isn't what you asked for. I think the reason is the way generative AI works. It basically spews out random noise, then asks a vision model which result looks more like the prompt. It is like Michelangelo trying to paint the Sistine Chapel by returning every morning only to say "yes" or "no" when an assistant asks, "Is this what you wanted?" I think the only way to get real prompt following is to train language models to use image creation tools. That way changes can be made with something like intent.
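To make the tool-use idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `Placement` stands in for a call to some hypothetical drawing tool, the x coordinates are invented, and `on_either_side` is a made-up check, not part of any real image model or library. The point is only that when a model places subjects through explicit commands, a relation like "on either side of" becomes something that can be stated and verified rather than hoped for.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One hypothetical drawing-tool call: put a named subject at a horizontal position."""
    subject: str
    x: float  # 0.0 = left edge of the canvas, 1.0 = right edge


def on_either_side(scene, a, b, middle):
    """Return True if `a` and `b` sit on opposite sides of `middle`."""
    xs = {p.subject: p.x for p in scene}
    # The offsets from the middle subject have opposite signs exactly when
    # the two subjects are on opposite sides of it.
    return (xs[a] - xs[middle]) * (xs[b] - xs[middle]) < 0


# Commands a tool-using model might emit for the example prompt (invented values).
scene = [
    Placement("boy", 0.2),
    Placement("horse", 0.5),
    Placement("father", 0.8),
]

print(on_either_side(scene, "boy", "father", "horse"))  # True

scene[0].x = 0.9  # move the boy over to the father's side
print(on_either_side(scene, "boy", "father", "horse"))  # False
```

With a representation like this, fixing a wrong layout is a single deliberate edit to one placement instead of a fresh roll of the dice.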