A 100KB text-to-image AI model?

Researchers at NVIDIA are showing off Perfusion, a text-to-image model they say is 100KB in size and takes four minutes to train. What could possibly go wrong? (ᔥ Hackaday)

