# 2-3 ControlNet

## What is ControlNet?

ControlNet is a plugin used for controlling AI image generation. It works by attaching an auxiliary neural network to a diffusion-based image generation model so that extra conditions, such as edges, poses, or depth, can guide the result. Unlike plain text-to-image generation, ControlNet allows users to finely control the generated images: uploading line drawings for the AI to colorize, controlling the posture of characters, generating line drawings from images, and so on.

Unlike a standard drawing model, a complete ControlNet setup consists of two parts: a <mark style="background-color:yellow;">Preprocessing Model</mark> and a <mark style="background-color:yellow;">ControlNet.</mark>

* Preprocessing Model: Responsible for extracting the spatial semantic information from the original image and converting it into a visual preview image, such as line drawings, depth maps, etc.
* ControlNet: Consumes that structural information, such as lines and depth, to constrain the final image generation.

### **Canny**

**Basic Information**

The Canny model primarily identifies <mark style="background-color:yellow;">edge information</mark> in input images, capable of extracting precise line drawings from uploaded pictures. It then generates new scenes consistent with the original image's composition based on specified prompts.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FWID88kYz6RagZgC0uovG%2F26df399a-12dd-4eeb-a352-ffdbcd5a3b84.png?alt=media&#x26;token=5fea1d87-b808-4b35-b62b-a7cb40b1ed9e" alt="Before and after comparison of using the Canny model to process a 3d cartoon image" width="563"><figcaption><p>Original / Preprocessing</p></figcaption></figure>

Preprocessor:

* **canny:** Hard edge detection.
* **invert:** Reverses the colors of the line drawing, producing black lines on a white background.
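The invert step itself is simple per-pixel arithmetic: each 8-bit grayscale value is replaced by 255 minus itself, turning white lines on black into black lines on white. A minimal sketch in Python (the edge map here is a tiny hand-made example, not real Canny output):

```python
def invert(edge_map):
    """Invert an 8-bit grayscale edge map: white-on-black becomes black-on-white."""
    return [[255 - px for px in row] for row in edge_map]

# Tiny hand-made "edge map": 255 = detected edge (white), 0 = background (black)
edges = [[0, 255, 0],
         [255, 0, 255]]
print(invert(edges))  # -> [[255, 0, 255], [0, 255, 0]]
```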

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FohDKUdJBxtvylw8iRzJS%2F54f8a769-fde5-4b77-9400-9020d4163d0f.png?alt=media&#x26;token=470bd547-92ca-4657-8e81-f2ce55016cb9" alt="Before and after comparison of using the Canny model to process a sketch image" width="563"><figcaption><p>Original / invert</p></figcaption></figure>

invert is not unique to Canny and can be used in conjunction with most line-drawing models. When ControlNet types such as Line Art or MLSD are selected, the invert option is also available.

**Method of Operation**

Operational Sequence:

<mark style="background-color:yellow;">Upload Image - Select Model - Select ControlNet Type - Enter Prompts - Generate</mark>

Intelligent Analysis:

Reverse-infers prompts and a model from the uploaded image. If you want a style different from the original image, it is recommended to turn off intelligent analysis.

Parameter Settings:

<mark style="background-color:yellow;">Preprocessing Resolution</mark>

The preprocessing resolution affects the output resolution of the preview image. Since the aspect ratio of the image is fixed, and the default output is a 1x image, the resolution setting essentially determines the horizontal size of the preview image. For example, if the original image size and the target image size are both 512x768, when we set the preprocessing resolution to 128, 256, 512, 1024, the preprocessed image size will change to 128x192 (0.25x original), 256x384 (0.5x original), 512x768 (original), and 1024x1536 (2x original), respectively.
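The scaling rule described above can be sketched in a few lines of Python (assuming, as the example implies, that the resolution sets the preview's horizontal size and the height follows from the fixed aspect ratio):

```python
def preview_size(orig_w, orig_h, preprocess_res):
    """Preview dimensions for a given preprocessing resolution: the width is set
    to the resolution, and the height follows from the fixed aspect ratio."""
    return preprocess_res, round(orig_h * preprocess_res / orig_w)

# Matches the 512x768 example from the text
for res in (128, 256, 512, 1024):
    print(res, preview_size(512, 768, res))  # 128x192, 256x384, 512x768, 1024x1536
```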

<mark style="color:red;">In general, the higher the resolution setting, the richer the details of the generated image.</mark>&#x20;

<mark style="color:red;">\*</mark>When the preprocessed detection image and the final <mark style="background-color:yellow;">image differ in size</mark>, the result can be damaged, with visible pixelation along the edges of figures in the final drawing.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FPFiQaO1cw3W3XwEsGSNs%2FRainbow%20%C3%97%20SeaArt%20AI%20(1).png?alt=media&#x26;token=ab94f4cd-5b96-45a7-a3a4-851f4b92069c" alt="Examples of images processed by the Canny model with different resolution"><figcaption></figcaption></figure>

<mark style="background-color:yellow;">Control Weight</mark>

Determines the strength of the ControlNet. The higher the weight, the more pronounced its control over the image, and the closer the generated image stays to the original.
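Conceptually, the weight scales how strongly the ControlNet's guidance is blended into the model's internal features at each denoising step. A toy sketch (the numbers are made up, not real model activations):

```python
def apply_control(base_feature, control_residual, weight):
    """Blend ControlNet guidance into a feature vector: a higher weight means
    stronger adherence to the control image."""
    return [b + weight * r for b, r in zip(base_feature, control_residual)]

base = [0.2, 0.5, 0.25]
residual = [1.0, -0.5, 0.5]
print(apply_control(base, residual, 0.0))  # weight 0: control has no effect
print(apply_control(base, residual, 1.0))  # weight 1: full control guidance
```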

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FMVmdtsa6TiDkeLQ9oO8D%2F%25E6%258E%25A7%25E5%2588%25B6%25E6%259D%2583%25E9%2587%258D.png?alt=media&#x26;token=722ec59b-6582-4558-909a-23f77d951ae0" alt="Examples of superman images processed by the Canny model with different weight" width="506"><figcaption></figcaption></figure>

<mark style="background-color:yellow;">Control Mode</mark>

Switches the balance of influence between the ControlNet and the prompts. The default setting is Balanced.

Prioritize Prompts: The influence of the control image is weakened.

Prioritize Pre-processing Image: The influence of the control image is enhanced.

<mark style="background-color:yellow;">Generation Results</mark>

From the generated results, it can be seen that the basic composition is exactly the same as the original image, but the details are completely different. If you need images with other changes, such as hair color, facial details, clothing, etc., you can adjust the keywords and parameters to achieve the desired effects.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2F1a0EtnSZvROp78CH67y7%2Fca%25E7%25BB%2593%25E6%259E%259C.png?alt=media&#x26;token=d5daf961-fbb1-4267-a615-9385e16297a2" alt="Four examples of AI-generated superman images with different models"><figcaption></figcaption></figure>

### **OpenPose Full**

**Basic Information**

OpenPose Full can achieve precise control over <mark style="background-color:yellow;">human body movements and facial expression features.</mark> It's capable not only of generating poses for a single person but also for multiple people.

OpenPose Full can <mark style="background-color:yellow;">identify key structural points of the human body</mark> such as the head, shoulders, elbows, knees, etc., while ignoring details of clothing, hairstyles, and backgrounds, ensuring the true reproduction of poses and expressions.

**Preprocessor**

Human Pose Recognition

The default processors are from the openpose series, including <mark style="background-color:yellow;">openpose, face, faceonly, full, hand.</mark> These five preprocessors are used to detect facial features, limbs, hands, and other human body structures respectively.

Animal Pose Recognition

It is recommended to use the animal\_openpose processor, which can be used in conjunction with specialized preprocessing models, such as control\_sd15\_animal\_openpose\_fp16.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2F0LNuqNhn93UWXeMwdBuz%2Ffba2887e-6a5a-46e4-a6cb-832db9277bde.png?alt=media&#x26;token=148a5ff9-9a71-4b20-8330-1370a216263a" alt="Animal pose recognition" width="563"><figcaption></figcaption></figure>

In general, using the default openpose\_full preprocessor is sufficient.

### **Line Art**

**Basic Information**

Line Art likewise extracts edge line art from images, but its use cases are more specific, covering two directions: realistic and anime.

**Preprocessor**

Line Art

Better suited to realistic images: the extracted line art is more faithful to the original, retaining more edge detail during detection, so its control over the result is stronger.

Difference between Line Art and Canny

Canny: Hard straight lines, uniform thickness.

Line Art: Obvious brushstroke traces, similar to real hand-drawn drafts, allowing clear observation of thickness transition under different edges.

Line Art retains <mark style="background-color:yellow;">more details,</mark> resulting in a <mark style="background-color:yellow;">relatively softer</mark> image, and is more suitable for line art coloring functions.

Canny is <mark style="background-color:yellow;">more precise and simplifies</mark> the content of the image.

<mark style="color:red;">\*Line Art can be used for coloring draft images, fully following the draft.</mark>

### **Depth**

**Basic Information**

A depth map, also known as a distance image, intuitively reflects the three-dimensional depth information of objects in a scene. Depth is displayed in black and white; <mark style="background-color:yellow;">the closer an object is to the camera, the lighter (whiter) its color; conversely, the farther away it is, the darker (blacker) its color.</mark>
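This near-is-white convention amounts to a simple normalization: the nearest point maps to 255 (white) and the farthest to 0 (black). A toy sketch with made-up distances (real preprocessors such as depth\_midas estimate depth with a neural network):

```python
def depth_to_gray(distances):
    """Map distances to 8-bit grayscale: nearest -> 255 (white), farthest -> 0 (black)."""
    near, far = min(distances), max(distances)
    span = far - near or 1  # avoid division by zero on a flat scene
    return [round(255 * (far - d) / span) for d in distances]

print(depth_to_gray([1.0, 2.0, 5.0]))  # -> [255, 191, 0]: nearest is brightest
```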

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FAy3APC4KTHkMlMQPHSsF%2F25cfc1a4-9c25-4290-8596-d11eb4657188.png?alt=media&#x26;token=8b06a194-46bf-4682-b54a-5a035b19cc75" alt="Comparison of the original image and the image processed with Depth"><figcaption><p>Original / Depth</p></figcaption></figure>

Depth can extract the <mark style="background-color:yellow;">foreground and background</mark> relationship of objects from an image, create a depth map, and apply it to image drawing. Therefore, when it is necessary to <mark style="background-color:yellow;">clarify the hierarchical relationship of objects in a scene,</mark> depth detection can serve as a powerful auxiliary tool.

It is recommended to use the <mark style="background-color:red;">depth\_midas</mark> preprocessor to achieve better image output results.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2F0nxK0jkU9jIfQulyPiXS%2Fe673e27a-5c5a-4dbe-8bd3-12dc046d21bb.png?alt=media&#x26;token=4e06806a-eec3-41e1-818b-4a52ee009cb8" alt="Comparison of the original image, depth image and result image"><figcaption><p>Original / Depth / Result</p></figcaption></figure>

### **Normal Bae**

**Basic Information**

Normal Bae generates a normal map from the light and shadow information in the scene, <mark style="background-color:yellow;">thereby simulating the details of the object's surface</mark> texture and accurately restoring the layout of the scene's content. It is therefore often used to render more <mark style="background-color:yellow;">realistic light</mark> and shadow details on object surfaces. In the example below, you can see a significant improvement in the scene's lighting and shadow effects after drawing with it.

When using, it is recommended to select the <mark style="background-color:red;">normal\_bae</mark> preprocessor for a more noticeable improvement in lighting and shadow effects.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2Fz79H8k0P1R4mlXaBSVwf%2F8d40f66d-90e1-4aee-9f10-d8802578e682.png?alt=media&#x26;token=e0cc57f9-b69e-4918-9351-9d0fa8be74d8" alt="Comparison of the original image, Normal Bae image and result image"><figcaption><p>Original / Normal Bae / Result</p></figcaption></figure>

### **Segmentation**

**Basic Information**

Segmentation can divide the scene into different blocks while detecting content outlines, and assign semantic annotations to these blocks, thereby achieving <mark style="background-color:yellow;">more precise control over the image.</mark>

Observing the image below, we can see that the image after semantic segmentation detection includes different colored blocks. Different contents in the scene are assigned different colors, such as characters marked in red, the ground in brown, signboards in pink, etc. When generating images, the model will produce specific objects within the corresponding color block range, thus achieving more accurate content restoration.
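The color-block idea can be sketched as a lookup from semantic labels to colors. The palette below is hypothetical, merely echoing the text's example (real preprocessors such as seg\_ufade20k use the fixed ADE20K palette):

```python
# Hypothetical label-to-color palette, following the text's example:
# characters in red, the ground in brown, signboards in pink.
PALETTE = {
    "person": (255, 0, 0),
    "ground": (139, 69, 19),
    "signboard": (255, 192, 203),
}

def colorize(label_grid):
    """Replace each semantic label with its block color (an RGB triple)."""
    return [[PALETTE[label] for label in row] for row in label_grid]

labels = [["ground", "person"],
          ["ground", "signboard"]]
print(colorize(labels))
```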

When using, it is recommended to select the default <mark style="background-color:red;">seg\_ufade20k</mark> preprocessor. Users can also modify the image content by filling in color blocks in the preprocessing image.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FNHFnAXdy7lUWwxSfPBkN%2F17ee3ab3-bd91-47f4-8818-5b10e3dd9bbf.png?alt=media&#x26;token=e0eed696-fa77-42d7-9850-dc93533bb6f6" alt="Comparison of the original image, Segmentation image and result image"><figcaption><p>Original / Segmentation / Result</p></figcaption></figure>

### **Tile Resample**

Tile Resample can convert low-resolution images into higher-resolution versions while minimizing quality loss.

Three preprocessors: tile\_resample, tile\_colorfix, and tile\_colorfixsharp.

\*In comparison, the default resample offers more flexibility in drawing, and the content will not differ significantly from the original image.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FWPbq0nEQduFJjQ9F5ePa%2F%25E8%25B6%2585%25E6%25B8%2585%25E9%2587%258D%25E7%25BB%2598.png?alt=media&#x26;token=35d1aa4e-5c84-4753-8284-fc1201b1568f" alt="Examples of AI cartoon wolf images processed with different preprocessors"><figcaption></figcaption></figure>
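Tile Resample's diffusion-based redraw cannot be reproduced in a few lines, but the underlying idea of resampling to a higher resolution can be sketched with simple nearest-neighbor scaling (a toy stand-in, not the actual algorithm):

```python
def upscale_nearest(pixels, factor):
    """Nearest-neighbor upscale: each pixel becomes a factor x factor block.
    (Tile Resample instead re-draws each region with the diffusion model.)"""
    out = []
    for row in pixels:
        wide = [px for px in row for _ in range(factor)]
        out.extend([wide] * factor)
    return out

print(upscale_nearest([[1, 2], [3, 4]], 2))
# -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```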


### **MLSD**

**Basic Information**

MLSD recognition extracts <mark style="background-color:yellow;">straight edge</mark> lines from the scene, making it particularly useful for delineating the linear geometric boundaries of objects. The most typical applications are in the <mark style="background-color:yellow;">fields of geometric architecture, interior design, and similar areas.</mark>

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FOLVtg04MMLzX6UfuXxkm%2F26c049a8-4c7f-4658-8444-4ba7f0d66c57.png?alt=media&#x26;token=1905f58c-959c-45d6-9eaa-cf05c1216a65" alt="Comparison of the original image, MLSD recognition image, and result image"><figcaption></figcaption></figure>

### **Scribble HED**

**Basic Information**

Scribble HED resembles crayon scribble line drawings, offering more freedom in controlling the image effect.

Preprocessors: HED, PiDiNet, XDoG, and t2ia\_sketch\_pidi.

As can be seen from the images below, the first two preprocessors produce thicker outlines that are more in line with the hand-drawn effect of doodles, while the latter two produce finer lines, suitable for realistic styles.

<mark style="color:red;">\*Can be used for coloring draft images, with a certain degree of randomness.</mark>

**HED**

**Basic Information**

HED creates <mark style="background-color:yellow;">clear and precise boundaries around objects,</mark> with an output similar to Canny. Its effectiveness lies in the ability to <mark style="background-color:yellow;">capture complex details and contours</mark> while retaining detailed features (facial expressions, hair, fingers, etc.). The HED preprocessor can be used to modify the style and color of an image.

Compared to Canny, HED produces <mark style="background-color:yellow;">softer lines and retains more details.</mark> Users can choose the appropriate preprocessor based on their actual needs.

### **color\_grid**

**Basic Information**

The preprocessor reduces the reference image to a grid of coarse color blocks; the generated image is then redrawn following the original's colors.

### **shuffle**

**Basic Information**

By randomly shuffling the reference image's features and then recombining them, the generated image may differ from the original in structure, content, and so on, but a hint of stylistic correlation can still be observed.

Shuffle is not widely used because its control is relatively unstable. However, it can be a good way to find inspiration.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2F4EKfSttSg1X6U4Odkw1R%2F1c1f5598-fb07-41ff-833b-b4c0cee7ea5e.png?alt=media&#x26;token=24917d02-60ae-4408-9a37-441cf22f8fee" alt="Comparison of original vs shuffle vs result" width="563"><figcaption></figcaption></figure>

### **Reference Generation**

**Basic Information**

To generate a new image based on the reference original, it is recommended to use the default <mark style="background-color:red;">"only"</mark> preprocessor.

**Control Weight:** The higher the value, the stronger the stability of the image, and the more obvious the traces of the original image's style will be preserved.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FdZ8dsZj9eMX3nU94IoXF%2F%25E5%258F%2582%25E8%2580%2583%25E7%2594%259F%25E6%2588%2590.png?alt=media&#x26;token=e6c93c70-a374-4955-841a-d21fac4fcf0a" alt="Different reference images generated based on the cartoon fox image"><figcaption></figcaption></figure>

### **recolor**

Recolor fills in colors for images, making it well suited to restoring old black-and-white photos. However, it cannot guarantee that colors land accurately in specific positions, and color contamination may occur.

**Preprocessors:** "intensity" and "luminance", with <mark style="background-color:red;">"luminance"</mark> recommended.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FUlQKi0YmVJK8osCl7nsO%2Fd857536d-c629-4fc7-b1c8-b8158b54ac9d.png?alt=media&#x26;token=6a514cea-1527-4f8d-80b0-2c3f71edfade" alt="Comparison - original vs recolor_luminance vs recolor_intensity"><figcaption></figcaption></figure>

### **ip\_adapter**

**Basic Information**

ip\_adapter turns an uploaded image into an image prompt: it recognizes the artistic style and content of the reference image and then generates similar works. It can also be used in conjunction with other ControlNet types.

<figure><img src="https://2219884424-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FigAtVLBrlaI8jVruJfC8%2Fuploads%2FzrySeb4o3yRECgplxNRs%2F3f9ee50e-8c2b-45ff-80cc-f64f96375759.png?alt=media&#x26;token=fbe2d1c8-6642-48b9-aea0-45e06352cb47" alt="apply ip_adapter to an image to generate a new one"><figcaption></figcaption></figure>

**Method of Operation**

1. Upload the original image A to be used for generation, and select a ControlNet option such as Canny, openpose, or Depth.
2. Add a new ControlNet, ip\_adapter, upload the style image B you want to inherit, and finally click Generate.

<mark style="background-color:red;">Result: Image A with the style of B.</mark>
