Text-to-Image Prompt Engineering


Presentation: https://docs.google.com/presentation/d/1GneFREiaI4xyiDbwDNxqPIekVNjk_zOmg9qzC-HMGwk/edit?usp=sharing
Paper: https://arxiv.org/pdf/2403.19103.pdf

PRISM Algorithm

Pseudonym

What significant improvement does DALL-E 3 introduce to enhance its prompt-following abilities?

Integration of a complex recurrent neural network.

Implementation of a new type of GAN specially optimized for text-to-image tasks.

Training on highly descriptive generated image captions to improve data quality.

What is the main advantage of the PRISM algorithm introduced in this paper?

It requires detailed manual input to generate prompts.

It automates prompt generation for personalized text-to-image (T2I) generation with minimal human input and generalizes across different models.

It necessitates white-box access to text-to-image (T2I) models.

PRISM leverages the in-context learning abilities of large language models (LLMs) to refine prompts. How does the system update the candidate prompt distribution based on the generated images and their evaluation scores?