2-3 ControlNet

Master AI image generation with ControlNet. Learn about its preprocessors, how it works, and how to use it to create stunning AI art.

What is ControlNet?

ControlNet is a plugin for controlling AI image generation. It is a neural network structure that attaches to a diffusion model (such as Stable Diffusion) and adds extra conditions to steer generation. Unlike generating from a text prompt alone, ControlNet allows users to finely control the generated images, such as uploading line drawings for the AI to colorize, controlling the posture of characters, or generating line drawings from images.

Different from traditional drawing models, a complete ControlNet consists of two parts: a Preprocessing Model and a ControlNet Model.

  • Preprocessing Model: Responsible for extracting the spatial semantic information from the original image and converting it into a visual preview image, such as line drawings, depth maps, etc.

  • ControlNet Model: Processes the more fundamental structured information, such as lines and depth, and uses it to constrain generation (see the sketch below).
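
For readers working outside the platform UI, the same two-part flow can be scripted. Below is a minimal sketch using the Hugging Face diffusers library (the model IDs and file names are illustrative, assuming Stable Diffusion 1.5 checkpoints):

```python
# Minimal two-part ControlNet flow with the diffusers library.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# The ControlNet model: consumes structured conditioning (here, Canny edges).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# The base diffusion model that the ControlNet steers.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A preprocessed control image (e.g. a Canny edge map) plus a text prompt.
edge_map = load_image("canny_edges.png")
image = pipe("a castle at sunset", image=edge_map, num_inference_steps=20).images[0]
image.save("result.png")
```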

Canny

Basic Information

The Canny model primarily identifies edge information in input images, capable of extracting precise line drawings from uploaded pictures. It then generates new scenes consistent with the original image's composition based on specified prompts.

Preprocessor:

  • canny: Hard edge detection.

  • invert: Reverses the colors of the line drawing, producing black lines on a white background.

invert is not unique to Canny and can be used in conjunction with most line-drawing models. When ControlNet types such as Line Art or MLSD are selected, invert is also available.
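
Both steps are easy to reproduce locally. A hedged sketch with OpenCV (the thresholds 100 and 200 are common defaults, not values mandated by the plugin):

```python
# What the canny + invert preprocessors roughly do, sketched with OpenCV.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)  # hard edges: white lines on a black background
inverted = 255 - edges            # invert: black lines on a white background
cv2.imwrite("canny.png", edges)
cv2.imwrite("canny_inverted.png", inverted)
```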

Method of Operation

Operational Sequence:

Upload Image - Select Model - Select ControlNet Type - Enter Prompts - Generate

Intelligent Analysis:

Intelligent analysis reverse-infers prompts and a model from the uploaded image. If a style different from the original image is desired, it is recommended to turn intelligent analysis off.

Parameter Settings:

Preprocessing Resolution

The preprocessing resolution affects the output resolution of the preview image. Since the aspect ratio of the image is fixed, and the default output is a 1x image, the resolution setting essentially determines the horizontal size of the preview image. For example, if the original image size and the target image size are both 512x768, when we set the preprocessing resolution to 128, 256, 512, 1024, the preprocessed image size will change to 128x192 (0.25x original), 256x384 (0.5x original), 512x768 (original), and 1024x1536 (2x original), respectively.
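
To make the arithmetic concrete, here is a small Python sketch (the function name is hypothetical, and it assumes the setting maps to the image width as described above):

```python
# Preview size as a function of preprocessing resolution, at a fixed aspect ratio.
def preview_size(orig_w: int, orig_h: int, preprocess_res: int) -> tuple[int, int]:
    scale = preprocess_res / orig_w  # the setting fixes the horizontal size
    return preprocess_res, round(orig_h * scale)

for res in (128, 256, 512, 1024):
    print(res, preview_size(512, 768, res))
# 128 (128, 192)   256 (256, 384)   512 (512, 768)   1024 (1024, 1536)
```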

In general, the higher the resolution setting, the richer the details of the generated image.

*Sometimes, when the preprocessed detection image and the final image differ in size, the result can be damaged, with clear pixelation at the edges of figures in the final drawing.

Control Weight

Determines the strength of the ControlNet. The higher the weight, the more pronounced the control over the image, and the closer the generated image is to the original.

Control Mode

Used to switch the balance of weight between the ControlNet and the prompt words. The default setting is Balanced.

Prioritize Prompts: The effect of the control image will be weakened.

Prioritize Pre-processing Image: The effect of the control image will be enhanced.
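
In scripting terms, the closest diffusers analogue of Control Weight is the controlnet_conditioning_scale argument, shown here as a continuation of the sketch from the start of this page (Control Mode has no single one-to-one parameter; raising guidance_scale is a rough way to favor the prompt instead):

```python
# Continuing the earlier diffusers sketch (pipe and edge_map already defined).
image = pipe(
    "a castle at sunset",
    image=edge_map,
    controlnet_conditioning_scale=1.2,  # Control Weight: higher hugs the control image
    guidance_scale=7.5,                 # raising this instead favors the prompt
).images[0]
```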

Generation Results

From the generated results, it can be seen that the basic composition is exactly the same as the original image, but the details are completely different. If you need images with other changes, such as hair color, facial details, clothing, etc., you can adjust the keywords and parameters to achieve the desired effects.

OpenPose Full

Basic Information

OpenPose Full can achieve precise control over human body movements and facial expression features. It's capable not only of generating poses for a single person but also for multiple people.

OpenPose Full can identify key structural points of the human body such as the head, shoulders, elbows, knees, etc., while ignoring details of clothing, hairstyles, and backgrounds, ensuring the true reproduction of poses and expressions.

Preprocessor

Human Pose Recognition

The default processors are from the openpose series: openpose, openpose_face, openpose_faceonly, openpose_full, and openpose_hand. These five preprocessors detect the face, limbs, hands, and other human body structures in different combinations.
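
Equivalent pose maps can be extracted with the controlnet_aux package, which bundles these detectors; a hedged sketch of the openpose_full behavior (body plus hands plus face):

```python
# Pose extraction roughly matching openpose_full, via controlnet_aux.
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
photo = load_image("person.png")
pose_map = openpose(photo, include_hands=True, include_face=True)
pose_map.save("pose.png")  # stick-figure skeleton; clothing and background are ignored
```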

Animal Pose Recognition

It is recommended to use the animal_openpose processor, which can be used in conjunction with specialized preprocessing models, such as control_sd15_animal_openpose_fp16.

In general, using the default openpose_full preprocessor is sufficient.

Line Art/Line Art Anime

Basic Information

Line Art also extracts edge line art from images, but its use cases are more specialized, covering two directions: realistic and anime.

Preprocessor

Line Art

More suitable for realistic images; the extracted line art is more faithful to the original, retaining more edge details during detection, so the control effect is more significant.

Line Art Anime

Relatively more random in its line extraction; better suited to anime-style images.

Difference between Line Art and Canny

Canny: Hard straight lines, uniform thickness.

Line Art: Obvious brushstroke traces, similar to real hand-drawn drafts, allowing clear observation of thickness transition under different edges.

Line Art retains more details, resulting in a relatively softer image, and is more suitable for line art coloring functions.

Canny is more precise and simplifies the content of the image.

*Line Art can be used for coloring draft images, fully following the draft.
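
Both flavors are also available as detectors in the controlnet_aux package; a hedged sketch comparing them on the same input:

```python
# Realistic vs. anime line-art extraction, sketched with controlnet_aux.
from controlnet_aux import LineartDetector, LineartAnimeDetector
from diffusers.utils import load_image

photo = load_image("photo.png")
lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
anime = LineartAnimeDetector.from_pretrained("lllyasviel/Annotators")
lineart(photo).save("lineart.png")      # faithful, detailed strokes
anime(photo).save("lineart_anime.png")  # looser, more stylized strokes
```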

Depth

Basic Information

Depth maps, also known as distance images, intuitively reflect the three-dimensional depth of objects in a scene. Depth is displayed in black and white: the closer an object is to the camera, the lighter (whiter) its color; conversely, the farther away it is, the darker (blacker) its color.

Depth can extract the foreground and background relationship of objects from an image, create a depth map, and apply it to image drawing. Therefore, when it is necessary to clarify the hierarchical relationship of objects in a scene, depth detection can serve as a powerful auxiliary tool.

It is recommended to use the depth_midas preprocessor to achieve better image output results.
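
A hedged sketch of the depth_midas step with controlnet_aux:

```python
# MiDaS depth estimation, as used by the depth_midas preprocessor.
from controlnet_aux import MidasDetector
from diffusers.utils import load_image

midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_map = midas(load_image("room.png"))  # near objects light, far objects dark
depth_map.save("depth.png")
```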

Normal Bae

Basic Information

Normal Bae generates a normal map from the light and shadow information in the scene, simulating the detail of object surface textures and accurately restoring the scene's layout. It is therefore often used to reproduce more realistic light and shadow detail on object surfaces. In the example below, the lighting and shadow effects of the scene improve significantly after drawing with normal detection.

When using, it is recommended to select the normal_bae preprocessor for a more noticeable improvement in lighting and shadow effects.

Segmentation

Basic Information

Segmentation can divide the scene into different blocks while detecting content outlines, and assign semantic annotations to these blocks, thereby achieving more precise control over the image.

Observing the image below, we can see that the image after semantic segmentation detection includes different colored blocks. Different contents in the scene are assigned different colors, such as characters marked in red, the ground in brown, signboards in pink, etc. When generating images, the model will produce specific objects within the corresponding color block range, thus achieving more accurate content restoration.

When using, it is recommended to select the default seg_ufade20k preprocessor. Users can also modify the image content by filling in color blocks in the preprocessing image.

Tile Resample

Tile Resample can convert low-resolution images into higher-resolution versions while minimizing quality loss.

Three preprocessors: tile_resample, tile_colorfix, and tile_colorfixsharp.

*In comparison, the default tile_resample offers more flexibility in drawing, and the content will not differ significantly from the original image.
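
For scripted upscaling, a hedged sketch with diffusers and the v1.1 tile ControlNet (model IDs are illustrative; strength and sizes are tunable):

```python
# Tile-guided upscaling: img2img adds detail while the tile ControlNet
# keeps the result faithful to the low-resolution original.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

low_res = load_image("small.png").resize((1024, 1024))  # naive resize first
hires = pipe("best quality", image=low_res, control_image=low_res,
             strength=0.75).images[0]
hires.save("upscaled.png")
```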


MLSD

Basic Information

MLSD recognition extracts straight edge lines from the scene, making it particularly useful for delineating the linear geometric boundaries of objects. The most typical applications are in the fields of geometric architecture, interior design, and similar areas.

Scribble HED

Basic Information

Scribble HED resembles crayon scribble line drawings, offering more freedom in controlling the image effect.

Preprocessors: HED, PiDiNet, XDoG, and t2ia_sketch_pidi.

As can be seen from the images below, the first two preprocessors produce thicker outlines that are more in line with the hand-drawn effect of doodles, while the latter two produce finer lines, suitable for realistic styles.

*Can be used for coloring draft images, with a certain degree of randomness.

HED

Basic Information

HED creates clear and precise boundaries around objects, with an output similar to Canny. Its effectiveness lies in the ability to capture complex details and contours while retaining detailed features (facial expressions, hair, fingers, etc.). The HED preprocessor can be used to modify the style and color of an image.

Compared to Canny, HED produces softer lines and retains more details. Users can choose the appropriate preprocessor based on their actual needs.

color_grid

Basic Information

With the preprocessor, the image is reduced to coarse color blocks, and the generated image is redrawn based on those original colors.

shuffle

Basic Information

Shuffle randomly scrambles the visual features of the reference image and recombines them. The generated image may differ from the original in structure, content, and so on, but a hint of stylistic correlation can still be observed.

The use of content recombination is not widespread due to its relatively poor control stability. However, using it to gain inspiration could be a good choice.

Reference Generation

Basic Information

To generate a new image based on the reference original, it is recommended to use the default "reference_only" preprocessor.

Control Weight: The higher the value, the more stable the image, and the more clearly traces of the original image's style are preserved.

recolor

recolor fills in colors for images, which makes it well suited to restoring old black-and-white photos. However, it cannot guarantee that colors appear accurately in specific positions, and color contamination may occur.

Preprocessors: "intensity" and "luminance", with "luminance" is recommended.

ip_adapter

Basic Information

ip_adapter turns an uploaded image into an image prompt: it recognizes the artistic style and content of the reference image and then generates similar works. It can also be used in conjunction with other ControlNet types.

Method of Operation

  1. Upload the original image A to be used for generation, and select a ControlNet type such as Canny, OpenPose, or Depth.

  2. Add a new ControlNet, ip_adapter, upload the style image B you want to inherit, and finally click Generate.

Result: Image A with the style of B.
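
The same two-step recipe can be scripted with diffusers, which supports loading an IP-Adapter on top of a ControlNet pipeline (model IDs and file names are illustrative):

```python
# Structure from image A (via a Canny ControlNet) + style from image B (via IP-Adapter).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

structure_a = load_image("canny_of_A.png")  # edge map extracted from image A
style_b = load_image("B.png")               # style reference image B
result = pipe("a portrait", image=structure_a, ip_adapter_image=style_b).images[0]
result.save("A_in_style_of_B.png")          # composition of A, style of B
```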
