2-1 Text to Image
Learn what text to image is and how to use the AI text-to-image generator, with easy step-by-step instructions.
Have you run into problems like these while using SeaArt: the results are hard to control, or the generated images don't reflect the prompts you added? This article explains how to use text to image and how to write effective prompts.
SeaArt AI offers three drawing modes: Text to Image, Image to Image, and ControlNet.
The basic steps for drawing are: select a model → enter prompts → set parameters → generate.
The model determines the style, prompts define the content of the image, and parameters refine the preset characteristics of the image.
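To make the division of roles concrete, the four steps can be pictured as one bundle of inputs. The sketch below is purely illustrative: `TextToImageRequest`, its field names, the model name, and the parameter keys are hypothetical and are not SeaArt's actual API. It also assumes, for illustration only, that generation needs a model and a non-empty prompt.

```python
from dataclasses import dataclass, field


@dataclass
class TextToImageRequest:
    """Hypothetical bundle of the inputs to one text-to-image job.

    The field names are illustrative assumptions, not SeaArt's actual API.
    """
    model: str                 # step 1: the model determines the style
    prompt: str                # step 2: the positive prompt defines the content
    negative_prompt: str = ""  # step 2: content to keep out of the image
    params: dict = field(default_factory=dict)  # step 3: hypothetical keys such as size

    def ready_to_generate(self) -> bool:
        """Step 4: assumes a model and a non-empty prompt have been chosen."""
        return bool(self.model.strip()) and bool(self.prompt.strip())


request = TextToImageRequest(
    model="example-anime-model",           # hypothetical model name
    prompt="a girl, black hair, dancing",
    negative_prompt="bad hands, bad anatomy",
    params={"width": 512, "height": 768},  # hypothetical parameter keys
)
print(request.ready_to_generate())  # True
```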
To guide the AI effectively, we give it positive and negative instructions that constrain what the model generates. This guiding information is called a prompt, and it serves as the bridge between humans and AI.
A prompt includes positive prompts (describing the desired image content) and negative prompts (content not desired in the image).
If you're unfamiliar with writing prompts, click Tools > Prompt Studio to quickly build effective prompt combinations.
Negative prompts are especially useful when a model handles certain details poorly (e.g., hand structures): listing those flaws helps the model avoid them and improves image quality.
For example: (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities: 1.3)
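The negative prompt above is a list of terms wrapped in one pair of parentheses with a single weight at the end. A minimal sketch of assembling such a group, assuming the weight applies to every term inside the parentheses (as in AUTOMATIC1111-style prompt syntax); `weighted_group` is a hypothetical helper name:

```python
def weighted_group(terms: list[str], weight: float) -> str:
    """Join terms into one parenthesized group that shares a single weight."""
    joined = ", ".join(t.strip() for t in terms if t.strip())  # drop blank entries
    return f"({joined}: {weight})"


negative_terms = ["bad hands", "bad anatomy", "bad body", "bad face",
                  "bad teeth", "bad arms", "bad legs", "deformities"]
negative_prompt = weighted_group(negative_terms, 1.3)
print(negative_prompt)
# (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities: 1.3)
```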
Prompts can be entered in natural language or in phrase form:
Natural language: A girl with black hair dancing
Phrase form: A girl, black hair, dancing
The role of prompts is to guide and assist the model in the drawing process, rather than being a strict requirement. Even if your input is just a casual sentence, the model can still create an image for you, and the result might even be quite good.
*Richer prompts give better control over the final output. During later fine-tuning, you can quickly modify specific keywords and check how each one affects the result.
An effective prompt is like assigning a task to the AI art generator. If the instruction is vague, such as merely saying "design a picture" without specifying elements and purpose, the result is often unpredictable. Therefore, detailed and specific instructions can greatly improve the quality and relevance of the outcome.
For example, if the prompt is simply "a girl," it says nothing about her clothing, the scene, the camera angle, and so on, so the AI can only fall back on what the model learned during training. Thanks to the model's capabilities, the results are still quite good. However, if you have specific requirements for the image content, this approach is very inefficient.
Adding more words that describe the content makes the results much more stable.
An ideal prompt formula includes elements such as the main content, environmental background, composition, image settings, and reference style, each affecting the drawing outcome to different extents.
*This formula is a reference, not a strict rule for every prompt creation. First, determine the main content's impact, then optimize details according to personal needs.
Main content: Describes the primary subject, such as people or animals, their clothing, expressions, fur, and actions, or the material of objects. Generating multiple subjects in one image can cause problems; it is advisable to create each subject separately and then combine them using ControlNet.
Environmental background: Environmental background sets the scene and auxiliary elements like sky color, surroundings, lighting, and color tone, enhancing the image's atmosphere and highlighting its theme.
Composition of shots: Composition adjusts the camera angle and perspective, such as depth of field emphasis or object layout, significantly boosting the visual impact.
Image settings: Image settings include terms that enhance visual expressiveness, such as detail richness, photography quality, and cinematic effect. Image resolution and level of detail are determined mainly by the image size, and post-processing such as Upscale can enhance details further.
Reference style: Describes the desired artistic style and mood, such as mentioning an artist's name, art techniques, era, or colors. However, the image style is largely determined by the model; if the model hasn't been trained on specific artistic style keywords, it might not understand them. For specific style requirements, using a model trained in that style might yield better results than merely using prompts.
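The prompt formula can be read as a fixed order of optional components joined in phrase form. A minimal sketch, assuming comma-separated phrases and the order listed above; `build_prompt` and the example wording are hypothetical, and empty components are simply skipped because the formula is a guide rather than a rule:

```python
def build_prompt(main_content: str,
                 environment: str = "",
                 composition: str = "",
                 image_settings: str = "",
                 style: str = "") -> str:
    """Assemble a phrase-form prompt in the order of the prompt formula."""
    parts = [main_content, environment, composition, image_settings, style]
    # Skip components left empty; only main content is expected every time.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    main_content="a girl, black hair, orange coat, smiling",
    environment="city street at dusk, warm lighting",
    composition="full body, low angle",
    image_settings="highly detailed, cinematic",
    style="watercolor",
)
print(prompt)
```

Because each component stays a separate argument, you can change one of them (say, the composition) and regenerate to see its effect in isolation.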
Remix: If you find writing prompts too complex, look for inspiration among the AI-generated images on the homepage and reuse their parameters and prompts with one click to simplify the creation process.
Emphasizing prompts relies on parentheses and numerical values to control the weight of specific prompts. The higher the weight value, the more the model prioritizes that prompt, focusing on rendering that part during the process. As a result, the final image will reflect more of the corresponding information. Conversely, less emphasis will result in less representation of that content in the image.
One method is to wrap the prompt in brackets, and the other is to enter a numerical value directly; the latter is more commonly used.
The following types of brackets control the weight of prompt words:
Round parentheses ( ): Each layer multiplies the weight by 1.1.
Square brackets [ ]: Each layer multiplies the weight by 0.9.
Both types can be nested, and each additional layer multiplies the weight by the same factor again. For example, ((word)) has a weight of 1.1 × 1.1 = 1.21.
For example, by default the girl's clothes are a mix of yellow and orange. When "(((orange coat)))" is used, the parentheses increase the emphasis, so the model renders the orange coat more strongly and more orange appears in the coat in the final image.
Conversely, when "[[orange coat]]" is used, the square brackets decrease the emphasis and the orange elements are diminished. The model then prioritizes the remaining keyword "((yellow coat))", and the coat appears more yellow in the final image.
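The arithmetic behind these examples can be sketched as peeling bracket layers off a token and multiplying the weight once per layer. This sketch uses exactly the factors given above (1.1 per round-bracket layer, 0.9 per square-bracket layer); `bracket_weight` is a hypothetical helper, it assumes the brackets wrap the whole token, and SeaArt's real prompt parser may differ in details:

```python
ROUND_FACTOR = 1.1   # multiplier for each ( ) layer
SQUARE_FACTOR = 0.9  # multiplier for each [ ] layer


def bracket_weight(token: str) -> tuple[str, float]:
    """Return (text, effective weight) for a token wrapped in nested brackets.

    Assumes the brackets surround the whole token, e.g. '(((orange coat)))';
    tokens like '(a) and (b)' are outside the scope of this sketch.
    """
    weight = 1.0
    while len(token) >= 2:
        if token[0] == "(" and token[-1] == ")":
            weight *= ROUND_FACTOR
        elif token[0] == "[" and token[-1] == "]":
            weight *= SQUARE_FACTOR
        else:
            break
        token = token[1:-1]  # peel off one layer
    return token, round(weight, 4)  # round away floating-point noise


print(bracket_weight("(((orange coat)))"))  # ('orange coat', 1.331)
print(bracket_weight("[[orange coat]]"))    # ('orange coat', 0.81)
print(bracket_weight("((yellow coat))"))    # ('yellow coat', 1.21)
```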
You can also enter a numerical value directly to control the weight.
For example, by default the hair is rendered in green and red. If we write "(green hair:0.9)", the weight of the green hair is reduced to 0.9 times its original value. Similarly, to increase the weight of the green hair, we can write "(green hair:1.1)".
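A numeric weight is written inside the parentheses after a colon. A minimal sketch for writing and reading that form, assuming the common `(text:weight)` syntax, which matches the `deformities: 1.3` example earlier; `with_weight` and `parse_weight` are hypothetical helpers, and plain text is treated as weight 1.0:

```python
import re

# Matches '(text:weight)', allowing optional spaces around the colon.
WEIGHT_PATTERN = re.compile(r"^\((?P<text>[^():]+?)\s*:\s*(?P<weight>\d+(?:\.\d+)?)\)$")


def with_weight(text: str, weight: float) -> str:
    """Attach an explicit weight to a prompt phrase."""
    return f"({text}:{weight})"


def parse_weight(token: str) -> tuple[str, float]:
    """Split '(text:weight)' into its parts; plain text gets the default 1.0."""
    match = WEIGHT_PATTERN.match(token.strip())
    if match is None:
        return token.strip(), 1.0
    return match.group("text").strip(), float(match.group("weight"))


print(with_weight("green hair", 0.9))    # (green hair:0.9)
print(parse_weight("(green hair:1.1)"))  # ('green hair', 1.1)
```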
*Although keyword weights can range from 0.1 to 100, extreme values can distort the result, so keeping weights between 0.5 and 1.5 is recommended for the best images.
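One way to follow this recommendation when generating prompts programmatically is to clamp every weight into the 0.5 to 1.5 range before formatting it. This is a sketch of one design choice, not a SeaArt feature: `clamp_weight` and `weighted` are hypothetical helpers, and silently clamping could be replaced by a warning if you want to keep deliberate extreme values.

```python
RECOMMENDED_MIN = 0.5
RECOMMENDED_MAX = 1.5


def clamp_weight(weight: float) -> float:
    """Keep a weight inside the recommended 0.5 to 1.5 range."""
    return max(RECOMMENDED_MIN, min(RECOMMENDED_MAX, weight))


def weighted(text: str, weight: float) -> str:
    """Format '(text:weight)' after clamping the weight."""
    return f"({text}:{clamp_weight(weight)})"


print(weighted("green hair", 3.0))  # (green hair:1.5)
print(weighted("green hair", 0.2))  # (green hair:0.5)
print(weighted("green hair", 1.2))  # (green hair:1.2)
```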
For specific parameter settings, see ✨4-Parameters.