2-3 ControlNet

Master AI image generation with ControlNet. Learn about its preprocessors, how it works, and how to use it to create stunning AI art.

What is ControlNet?

ControlNet is a plugin for controlling AI image generation. It is a neural network structure that attaches to a diffusion model (such as Stable Diffusion) and adds extra conditions to steer generation. Unlike generating from a text prompt alone, ControlNet allows users to finely control the generated images, such as uploading line drawings for the AI to colorize, controlling the posture of characters, or generating line drawings from images.

Different from traditional drawing models, a complete ControlNet consists of two parts: a Preprocessing Model and a ControlNet Model.

  • Preprocessing Model: Responsible for extracting the spatial semantic information from the original image and converting it into a visual preview image, such as line drawings, depth maps, etc.

  • ControlNet Model: Processes the more fundamental structured information, such as lines and depth, and uses it to constrain generation (see the sketch below).
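
For readers working outside the platform UI, the same two-part flow can be scripted. Below is a minimal sketch using the Hugging Face diffusers library (the model IDs and file names are illustrative, assuming Stable Diffusion 1.5 checkpoints):

```python
# Minimal two-part ControlNet flow with the diffusers library.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# The ControlNet model: consumes structured conditioning (here, Canny edges).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# The base diffusion model that the ControlNet steers.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A preprocessed control image (e.g. a Canny edge map) plus a text prompt.
edge_map = load_image("canny_edges.png")
image = pipe("a castle at sunset", image=edge_map, num_inference_steps=20).images[0]
image.save("result.png")
```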

Canny

Basic Information

The Canny model primarily identifies edge information in input images, capable of extracting precise line drawings from uploaded pictures. It then generates new scenes consistent with the original image's composition based on specified prompts.

Preprocessor:

  • canny: Hard edge detection.

  • invert: Reverses the colors of the line drawing, producing black lines on a white background.

invert is not unique to Canny and can be used in conjunction with most line-drawing models. When ControlNet types such as Line Art or MLSD are selected, invert is also available.
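
Both steps are easy to reproduce locally. A hedged sketch with OpenCV (the thresholds 100 and 200 are common defaults, not values mandated by the plugin):

```python
# What the canny + invert preprocessors roughly do, sketched with OpenCV.
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)  # hard edges: white lines on a black background
inverted = 255 - edges            # invert: black lines on a white background
cv2.imwrite("canny.png", edges)
cv2.imwrite("canny_inverted.png", inverted)
```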

Method of Operation

Operational Sequence:

Upload Image - Select Model - Select ControlNet Type - Enter Prompts - Generate

Intelligent Analysis:

Intelligent analysis reverse-infers prompts and a model from the uploaded image. If a style different from the original image is desired, it is recommended to turn intelligent analysis off.

Parameter Settings:

Preprocessing Resolution

The preprocessing resolution affects the output resolution of the preview image. Since the aspect ratio of the image is fixed, and the default output is a 1x image, the resolution setting essentially determines the horizontal size of the preview image. For example, if the original image size and the target image size are both 512x768, when we set the preprocessing resolution to 128, 256, 512, 1024, the preprocessed image size will change to 128x192 (0.25x original), 256x384 (0.5x original), 512x768 (original), and 1024x1536 (2x original), respectively.
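
To make the arithmetic concrete, here is a small Python sketch (the function name is hypothetical, and it assumes the setting maps to the image width as described above):

```python
# Preview size as a function of preprocessing resolution, at a fixed aspect ratio.
def preview_size(orig_w: int, orig_h: int, preprocess_res: int) -> tuple[int, int]:
    scale = preprocess_res / orig_w  # the setting fixes the horizontal size
    return preprocess_res, round(orig_h * scale)

for res in (128, 256, 512, 1024):
    print(res, preview_size(512, 768, res))
# 128 (128, 192)   256 (256, 384)   512 (512, 768)   1024 (1024, 1536)
```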

In general, the higher the resolution setting, the richer the details of the generated image.

*Sometimes, when the preprocessed detection image and the final image differ in size, the result can be damaged, with clear pixelation at the edges of figures in the final drawing.

Control Weight

Determines the strength of the ControlNet. The higher the weight, the more pronounced the control over the image, and the closer the generated image is to the original.

Control Mode

Used to switch the balance of weight between the ControlNet and the prompt words. The default setting is Balanced.

Prioritize Prompts: The effect of the control image will be weakened.

Prioritize Pre-processing Image: The effect of the control image will be enhanced.
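
In scripting terms, the closest diffusers analogue of Control Weight is the controlnet_conditioning_scale argument, shown here as a continuation of the sketch from the start of this page (Control Mode has no single one-to-one parameter; raising guidance_scale is a rough way to favor the prompt instead):

```python
# Continuing the earlier diffusers sketch (pipe and edge_map already defined).
image = pipe(
    "a castle at sunset",
    image=edge_map,
    controlnet_conditioning_scale=1.2,  # Control Weight: higher hugs the control image
    guidance_scale=7.5,                 # raising this instead favors the prompt
).images[0]
```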

Generation Results

From the generated results, it can be seen that the basic composition is exactly the same as the original image, but the details are completely different. If you need images with other changes, such as hair color, facial details, clothing, etc., you can adjust the keywords and parameters to achieve the desired effects.

OpenPose Full

Basic Information

OpenPose Full can achieve precise control over human body movements and facial expression features. It's capable not only of generating poses for a single person but also for multiple people.

OpenPose Full can identify key structural points of the human body such as the head, shoulders, elbows, knees, etc., while ignoring details of clothing, hairstyles, and backgrounds, ensuring the true reproduction of poses and expressions.

Preprocessor

Human Pose Recognition

The default processors are from the openpose series: openpose, openpose_face, openpose_faceonly, openpose_full, and openpose_hand. These five preprocessors detect the face, limbs, hands, and other human body structures in different combinations.
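
Equivalent pose maps can be extracted with the controlnet_aux package, which bundles these detectors; a hedged sketch of the openpose_full behavior (body plus hands plus face):

```python
# Pose extraction roughly matching openpose_full, via controlnet_aux.
from controlnet_aux import OpenposeDetector
from diffusers.utils import load_image

openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
photo = load_image("person.png")
pose_map = openpose(photo, include_hands=True, include_face=True)
pose_map.save("pose.png")  # stick-figure skeleton; clothing and background are ignored
```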

Animal Pose Recognition

It is recommended to use the animal_openpose processor, which can be used in conjunction with specialized preprocessing models, such as control_sd15_animal_openpose_fp16.

In general, using the default openpose_full preprocessor is sufficient.

Line Art/Line Art Anime

Basic Information

Line Art also extracts edge line art from images, but its use cases are more specialized, covering two directions: realistic and anime.

Preprocessor

Line Art

More suitable for realistic images; the extracted line art is more faithful to the original, retaining more edge details during detection, so the control effect is more significant.

Line Art Anime

Relatively more random in its line extraction; better suited to anime-style images.

Difference between Line Art and Canny

Canny: Hard straight lines, uniform thickness.

Line Art: Obvious brushstroke traces, similar to real hand-drawn drafts, allowing clear observation of thickness transition under different edges.

Line Art retains more details, resulting in a relatively softer image, and is more suitable for line art coloring functions.

Canny is more precise and simplifies the content of the image.

*Line Art can be used for coloring draft images, fully following the draft.
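
Both flavors are also available as detectors in the controlnet_aux package; a hedged sketch comparing them on the same input:

```python
# Realistic vs. anime line-art extraction, sketched with controlnet_aux.
from controlnet_aux import LineartDetector, LineartAnimeDetector
from diffusers.utils import load_image

photo = load_image("photo.png")
lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
anime = LineartAnimeDetector.from_pretrained("lllyasviel/Annotators")
lineart(photo).save("lineart.png")      # faithful, detailed strokes
anime(photo).save("lineart_anime.png")  # looser, more stylized strokes
```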

Depth

Basic Information

Depth maps, also known as distance images, intuitively reflect the three-dimensional depth of objects in a scene. Depth is displayed in black and white: the closer an object is to the camera, the lighter (whiter) its color; conversely, the farther away it is, the darker (blacker) its color.

Depth can extract the foreground and background relationship of objects from an image, create a depth map, and apply it to image drawing. Therefore, when it is necessary to clarify the hierarchical relationship of objects in a scene, depth detection can serve as a powerful auxiliary tool.

It is recommended to use the depth_midas preprocessor to achieve better image output results.
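
A hedged sketch of the depth_midas step with controlnet_aux:

```python
# MiDaS depth estimation, as used by the depth_midas preprocessor.
from controlnet_aux import MidasDetector
from diffusers.utils import load_image

midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_map = midas(load_image("room.png"))  # near objects light, far objects dark
depth_map.save("depth.png")
```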

Normal Bae

Basic Information

Normal Bae generates a normal map from the light and shadow information in the scene, simulating the detail of object surface textures and accurately restoring the scene's layout. It is therefore often used to reproduce more realistic light and shadow detail on object surfaces. In the example below, the lighting and shadow effects of the scene improve significantly after drawing with normal detection.

When using, it is recommended to select the normal_bae preprocessor for a more noticeable improvement in lighting and shadow effects.

Segmentation

Basic Information

Segmentation can divide the scene into different blocks while detecting content outlines, and assign semantic annotations to these blocks, thereby achieving more precise control over the image.

Observing the image below, we can see that the image after semantic segmentation detection includes different colored blocks. Different contents in the scene are assigned different colors, such as characters marked in red, the ground in brown, signboards in pink, etc. When generating images, the model will produce specific objects within the corresponding color block range, thus achieving more accurate content restoration.

When using, it is recommended to select the default seg_ufade20k preprocessor. Users can also modify the image content by filling in color blocks in the preprocessing image.

Tile Resample

Tile Resample can convert low-resolution images into higher-resolution versions while minimizing quality loss.

Three preprocessors: tile_resample, tile_colorfix, and tile_colorfixsharp.

*In comparison, the default tile_resample offers more flexibility in drawing, and the content will not differ significantly from the original image.
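
For scripted upscaling, a hedged sketch with diffusers and the v1.1 tile ControlNet (model IDs are illustrative; strength and sizes are tunable):

```python
# Tile-guided upscaling: img2img adds detail while the tile ControlNet
# keeps the result faithful to the low-resolution original.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

low_res = load_image("small.png").resize((1024, 1024))  # naive resize first
hires = pipe("best quality", image=low_res, control_image=low_res,
             strength=0.75).images[0]
hires.save("upscaled.png")
```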


MLSD

Basic Information

MLSD recognition extracts straight edge lines from the scene, making it particularly useful for delineating the linear geometric boundaries of objects. The most typical applications are in the fields of geometric architecture, interior design, and similar areas.

Scribble HED

Basic Information

Scribble HED resembles crayon scribble line drawings, offering more freedom in controlling the image effect.

Preprocessors: HED, PiDiNet, XDoG, and t2ia_sketch_pidi.

As can be seen from the images below, the first two preprocessors produce thicker outlines that are more in line with the hand-drawn effect of doodles, while the latter two produce finer lines, suitable for realistic styles.

*Can be used for coloring draft images, with a certain degree of randomness.

HED

Basic Information

HED creates clear and precise boundaries around objects, with an output similar to Canny. Its effectiveness lies in the ability to capture complex details and contours while retaining detailed features (facial expressions, hair, fingers, etc.). The HED preprocessor can be used to modify the style and color of an image.

Compared to Canny, HED produces softer lines and retains more details. Users can choose the appropriate preprocessor based on their actual needs.

color_grid

Basic Information

With the preprocessor, the image is reduced to coarse color blocks, and the generated image is redrawn based on those original colors.

shuffle

Basic Information

Shuffle randomly scrambles the visual features of the reference image and recombines them. The generated image may differ from the original in structure, content, and so on, but a hint of stylistic correlation can still be observed.

The use of content recombination is not widespread due to its relatively poor control stability. However, using it to gain inspiration could be a good choice.

Reference Generation

Basic Information

To generate a new image based on the reference original, it is recommended to use the default "reference_only" preprocessor.

Control Weight: The higher the value, the more stable the image, and the more clearly traces of the original image's style are preserved.

recolor

recolor fills in colors for images, which makes it well suited to restoring old black-and-white photos. However, it cannot guarantee that colors appear accurately in specific positions, and color contamination may occur.

Preprocessors: "intensity" and "luminance", with "luminance" is recommended.

ip_adapter

Basic Information

ip_adapter turns an uploaded image into an image prompt: it recognizes the artistic style and content of the reference image and then generates similar works. It can also be used in conjunction with other ControlNet types.

Method of Operation

  1. Upload the original image A to be used for generation, and select a ControlNet type such as Canny, OpenPose, or Depth.

  2. Add a new ControlNet, ip_adapter, upload the style image B you want to inherit, and finally click Generate.

Result: Image A with the style of B.
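
The same two-step recipe can be scripted with diffusers, which supports loading an IP-Adapter on top of a ControlNet pipeline (model IDs and file names are illustrative):

```python
# Structure from image A (via a Canny ControlNet) + style from image B (via IP-Adapter).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

structure_a = load_image("canny_of_A.png")  # edge map extracted from image A
style_b = load_image("B.png")               # style reference image B
result = pipe("a portrait", image=structure_a, ip_adapter_image=style_b).images[0]
result.save("A_in_style_of_B.png")          # composition of A, style of B
```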
