# OpenCV 5

> OpenCV 5.x documentation and tutorials from the opencv/opencv 5.x branch (doc/).

## [OpenCV 5](https://docharvest.github.io/docs/opencv5/)


-   [Faq](/docs/opencv5/faq/)
-   [Opencv Logo](/docs/opencv5/opencv-logo/)
-   [Js Tutorials](/docs/opencv5/js_tutorials/js_tutorials/)
-   [Py Tutorials](/docs/opencv5/py_tutorials/py_tutorials/)
-   [Tutorials](/docs/opencv5/tutorials/tutorials/)
-   [Js Table Of Contents Core](/docs/opencv5/js_tutorials/js_core/js_table_of_contents_core/)
-   [Js Table Of Contents Dnn](/docs/opencv5/js_tutorials/js_dnn/js_table_of_contents_dnn/)
-   [Js Table Of Contents Gui](/docs/opencv5/js_tutorials/js_gui/js_table_of_contents_gui/)
-   [Js Table Of Contents Imgproc](/docs/opencv5/js_tutorials/js_imgproc/js_table_of_contents_imgproc/)
-   [Js Table Of Contents Setup](/docs/opencv5/js_tutorials/js_setup/js_table_of_contents_setup/)
-   [Js Table Of Contents Video](/docs/opencv5/js_tutorials/js_video/js_table_of_contents_video/)
-   [Py Table Of Contents Bindings](/docs/opencv5/py_tutorials/py_bindings/py_table_of_contents_bindings/)
-   [Py Table Of Contents Calib3d](/docs/opencv5/py_tutorials/py_calib3d/py_table_of_contents_calib3d/)
-   [Py Table Of Contents Core](/docs/opencv5/py_tutorials/py_core/py_table_of_contents_core/)
-   [Py Table Of Contents Features](/docs/opencv5/py_tutorials/py_features/py_table_of_contents_features/)
-   [Py Table Of Contents Gui](/docs/opencv5/py_tutorials/py_gui/py_table_of_contents_gui/)
-   [Py Table Of Contents Imgproc](/docs/opencv5/py_tutorials/py_imgproc/py_table_of_contents_imgproc/)
-   [Py Table Of Contents Ml](/docs/opencv5/py_tutorials/py_ml/py_table_of_contents_ml/)
-   [Py Table Of Contents Objdetect](/docs/opencv5/py_tutorials/py_objdetect/py_table_of_contents_objdetect/)
-   [Py Table Of Contents Photo](/docs/opencv5/py_tutorials/py_photo/py_table_of_contents_photo/)
-   [Py Table Of Contents Setup](/docs/opencv5/py_tutorials/py_setup/py_table_of_contents_setup/)
-   [Py Table Of Contents Video](/docs/opencv5/py_tutorials/py_video/py_table_of_contents_video/)
-   [Animations](/docs/opencv5/tutorials/app/animations/)
-   [Highgui Wayland Ubuntu](/docs/opencv5/tutorials/app/highgui_wayland_ubuntu/)
-   [Intelperc](/docs/opencv5/tutorials/app/intelperc/)
-   [Kinect Openni](/docs/opencv5/tutorials/app/kinect_openni/)
-   [Orbbec Astra Openni](/docs/opencv5/tutorials/app/orbbec_astra_openni/)
-   [Orbbec Uvc](/docs/opencv5/tutorials/app/orbbec_uvc/)
-   [Raster Io Gdal](/docs/opencv5/tutorials/app/raster_io_gdal/)
-   [Table Of Content App](/docs/opencv5/tutorials/app/table_of_content_app/)
-   [Trackbar](/docs/opencv5/tutorials/app/trackbar/)
-   [Video Input Psnr Ssim](/docs/opencv5/tutorials/app/video_input_psnr_ssim/)
-   [Video Write](/docs/opencv5/tutorials/app/video_write/)
-   [Table Of Content Calib3d](/docs/opencv5/tutorials/calib3d/table_of_content_calib3d/)
-   [Usac](/docs/opencv5/tutorials/calib3d/usac/)
-   [Mat Operations](/docs/opencv5/tutorials/core/mat_operations/)
-   [Table Of Content Core](/docs/opencv5/tutorials/core/table_of_content_core/)
-   [Table Of Content Dnn](/docs/opencv5/tutorials/dnn/table_of_content_dnn/)
-   [Table Of Content Features](/docs/opencv5/tutorials/features/table_of_content_features/)
-   [Table Of Content Geometry](/docs/opencv5/tutorials/geometry/table_of_content_geometry/)


## [Faq](https://docharvest.github.io/docs/opencv5/faq/)


# Frequently Asked Questions {#faq}

Compatibility page. FAQ migrated to the project [wiki](https://github.com/opencv/opencv/wiki/FAQ).

## [Js Basic Ops](https://docharvest.github.io/docs/opencv5/js_tutorials/js_core/js_basic_ops/js_basic_ops/)


# Basic Operations on Images {#tutorial\_js\_basic\_ops}

## Goal

-   Learn how to access image properties
-   Learn how to construct Mat
-   Learn how to copy Mat
-   Learn how to convert the type of Mat
-   Learn how to use MatVector
-   Learn how to access pixel values and modify them
-   Learn how to set Region of Interest (ROI)
-   Learn how to split and merge images

## Accessing Image Properties

Image properties include the number of rows and columns, the size, the depth, the number of channels, and the type of the image data.

@code{.js}
let src = cv.imread("canvasInput");
console.log('image width: ' + src.cols + '\n' +
            'image height: ' + src.rows + '\n' +
            'image size: ' + src.size().width + '*' + src.size().height + '\n' +
            'image depth: ' + src.depth() + '\n' +
            'image channels ' + src.channels() + '\n' +
            'image type: ' + src.type() + '\n');
@endcode

@note src.type() is very important while debugging because a large number of errors in OpenCV.js code are caused by invalid data type.

## How to construct Mat

There are 4 basic constructors:

@code{.js}
// 1. default constructor
let mat = new cv.Mat();
// 2. two-dimensional arrays by size and type
let mat = new cv.Mat(size, type);
// 3. two-dimensional arrays by rows, cols, and type
let mat = new cv.Mat(rows, cols, type);
// 4. two-dimensional arrays by rows, cols, and type with initialization value
let mat = new cv.Mat(rows, cols, type, new cv.Scalar());
@endcode

There are 3 static functions:

@code{.js}
// 1. Create a Mat which is full of zeros
let mat = cv.Mat.zeros(rows, cols, type);
// 2. Create a Mat which is full of ones
let mat = cv.Mat.ones(rows, cols, type);
// 3. Create a Mat which is an identity matrix
let mat = cv.Mat.eye(rows, cols, type);
@endcode

There are 2 factory functions:

@code{.js}
// 1. Use JS array to construct a mat.
// For example: let mat = cv.matFromArray(2, 2, cv.CV_8UC1, [1, 2, 3, 4]);
let mat = cv.matFromArray(rows, cols, type, array);
// 2. Use imgData to construct a mat
let ctx = canvas.getContext("2d");
let imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
let mat = cv.matFromImageData(imgData);
@endcode

@note Don't forget to delete cv.Mat when you don't want to use it any more.

## How to copy Mat

There are 2 ways to copy a Mat:

@code{.js}
// 1. Clone
let dst = src.clone();
// 2. CopyTo (only entries indicated in the mask are copied)
src.copyTo(dst, mask);
@endcode
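To make the effect of the mask concrete, here is a minimal sketch in plain JavaScript that does not use OpenCV.js. `copyWithMask` is a hypothetical helper, and the flat single-channel arrays are an assumption made for brevity; it only illustrates the rule that entries are copied where the mask is non-zero.

```javascript
// Hypothetical helper (not part of OpenCV.js): copy src into dst only where
// mask is non-zero. All three are flat single-channel arrays of equal length.
function copyWithMask(src, dst, mask) {
  for (let i = 0; i < src.length; i++) {
    if (mask[i] !== 0) {
      dst[i] = src[i];
    }
  }
  return dst;
}

const copied = copyWithMask([10, 20, 30, 40], [0, 0, 0, 0], [255, 0, 255, 0]);
console.log(copied); // [ 10, 0, 30, 0 ]
```

Entries of dst that the mask does not select keep their previous values.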

## How to convert the type of Mat

We use the function: **convertTo(m, rtype, alpha = 1, beta = 0)**
@param m output matrix; if it does not have a proper size or type before the operation, it is reallocated.
@param rtype desired output matrix type or, rather, the depth, since the number of channels is the same as in the input; if rtype is negative, the output matrix will have the same type as the input.
@param alpha optional scale factor.
@param beta optional delta added to the scaled values.

@code{.js}
src.convertTo(dst, rtype);
@endcode
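The roles of alpha and beta can be sketched without OpenCV.js. The helper below is hypothetical and assumes an 8-bit unsigned output; it applies `value * alpha + beta` and then clamps to [0, 255]. How the library rounds exact .5 ties may differ from `Math.round`.

```javascript
// Hypothetical sketch of the scale-and-shift step for an 8-bit unsigned output.
// Assumption: results are rounded to the nearest integer and clamped to [0, 255].
function convertToU8(values, alpha = 1, beta = 0) {
  const out = new Uint8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const scaled = values[i] * alpha + beta;
    out[i] = Math.min(255, Math.max(0, Math.round(scaled)));
  }
  return out;
}

// Map floating-point values from [0, 1] to [0, 255]; 1.5 is out of range and saturates.
console.log(Array.from(convertToU8([0, 0.2, 1, 1.5], 255, 0))); // [ 0, 51, 255, 255 ]
```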

## How to use MatVector

@code{.js}
let mat = new cv.Mat();
// Initialise a MatVector
let matVec = new cv.MatVector();
// Push a Mat back into MatVector
matVec.push_back(mat);
// Get a Mat from MatVector
let cnt = matVec.get(0);
mat.delete(); matVec.delete(); cnt.delete();
@endcode

@note Don't forget to delete cv.Mat, cv.MatVector and cnt(the Mat you get from MatVector) when you don't want to use them any more.

## Accessing and Modifying pixel values

Firstly, you should know the following type relationship:

| Data Properties | C++ Type | JavaScript Typed Array | Mat Type |
| --- | --- | --- | --- |
| data | uchar | Uint8Array | CV\_8U |
| data8S | char | Int8Array | CV\_8S |
| data16U | ushort | Uint16Array | CV\_16U |
| data16S | short | Int16Array | CV\_16S |
| data32S | int | Int32Array | CV\_32S |
| data32F | float | Float32Array | CV\_32F |
| data64F | double | Float64Array | CV\_64F |

**1\. data**

@code{.js}
let row = 3, col = 4;
let src = cv.imread("canvasInput");
if (src.isContinuous()) {
    let R = src.data[row * src.cols * src.channels() + col * src.channels()];
    let G = src.data[row * src.cols * src.channels() + col * src.channels() + 1];
    let B = src.data[row * src.cols * src.channels() + col * src.channels() + 2];
    let A = src.data[row * src.cols * src.channels() + col * src.channels() + 3];
}
@endcode

@note Data manipulation is only valid for continuous Mat. You should use isContinuous() to check first.
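The index arithmetic used above can be checked on a plain typed array. This is a minimal sketch that stands in for a continuous RGBA Mat; `pixelOffset` is a hypothetical helper name, not an OpenCV.js function.

```javascript
// Offset of the first channel of pixel (row, col) in a continuous, interleaved buffer.
// Equivalent to row * cols * channels + col * channels.
function pixelOffset(row, col, cols, channels) {
  return (row * cols + col) * channels;
}

// A 2x3 RGBA image stored as one flat Uint8Array (stand-in for src.data).
const cols = 3;
const channels = 4;
const data = new Uint8Array(2 * cols * channels);

const offset = pixelOffset(1, 2, cols, channels);
data[offset] = 255;     // R
data[offset + 3] = 255; // A
console.log(offset);    // 20
```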

**2\. at**

| Mat Type | At Manipulation |
| --- | --- |
| CV\_8U | ucharAt |
| CV\_8S | charAt |
| CV\_16U | ushortAt |
| CV\_16S | shortAt |
| CV\_32S | intAt |
| CV\_32F | floatAt |
| CV\_64F | doubleAt |

@code{.js}
let row = 3, col = 4;
let src = cv.imread("canvasInput");
let R = src.ucharAt(row, col * src.channels());
let G = src.ucharAt(row, col * src.channels() + 1);
let B = src.ucharAt(row, col * src.channels() + 2);
let A = src.ucharAt(row, col * src.channels() + 3);
@endcode

@note At manipulation is only for single channel access and the value can't be modified.

**3\. ptr**

| Mat Type | Ptr Manipulation | JavaScript Typed Array |
| --- | --- | --- |
| CV\_8U | ucharPtr | Uint8Array |
| CV\_8S | charPtr | Int8Array |
| CV\_16U | ushortPtr | Uint16Array |
| CV\_16S | shortPtr | Int16Array |
| CV\_32S | intPtr | Int32Array |
| CV\_32F | floatPtr | Float32Array |
| CV\_64F | doublePtr | Float64Array |

@code{.js}
let row = 3, col = 4;
let src = cv.imread("canvasInput");
let pixel = src.ucharPtr(row, col);
let R = pixel[0];
let G = pixel[1];
let B = pixel[2];
let A = pixel[3];
@endcode

mat.ucharPtr(k) gets the k-th row of the mat. mat.ucharPtr(i, j) gets the element at the i-th row and the j-th column of the mat.
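Because the ptr functions return JavaScript typed arrays, one way to picture them is as a view into the pixel buffer. The sketch below is an assumption about that behavior, modeled with `subarray` on a plain `Uint8Array`; `pixelView` is a hypothetical helper and not the OpenCV.js implementation.

```javascript
// Hypothetical model of a per-pixel view: a typed-array window over a shared buffer.
function pixelView(data, row, col, cols, channels) {
  const start = (row * cols + col) * channels;
  return data.subarray(start, start + channels); // shares memory with data
}

const buffer = new Uint8Array(2 * 2 * 4); // 2x2 RGBA image, all zeros
const pixel = pixelView(buffer, 1, 0, 2, 4);
pixel[0] = 200;          // write the R value through the view
console.log(buffer[8]);  // 200
```

Under this model, writing to the view changes the image itself, which is what distinguishes it from the read-only at functions.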

## Image ROI

Sometimes you will need to work with a certain region of an image. For eye detection, for example, face detection is first run over the whole image; once a face is found, we select the face region alone and search for eyes inside it instead of searching the whole image. This improves accuracy (because eyes are always on faces) and performance (because we search a smaller area).

We use the function: **roi (rect)**
@param rect rectangle Region of Interest.


## Splitting and Merging Image Channels

Sometimes you will need to work separately on the R, G and B channels of an image. In that case you split the RGB image into single planes. At other times you may need to join these individual channels back into an RGB image.

@code{.js}
let src = cv.imread("canvasInput");
let rgbaPlanes = new cv.MatVector();
// Split the Mat
cv.split(src, rgbaPlanes);
// Get R channel
let R = rgbaPlanes.get(0);
// Merge all channels
cv.merge(rgbaPlanes, src);
src.delete(); rgbaPlanes.delete(); R.delete();
@endcode

@note Don't forget to delete cv.Mat, cv.MatVector and R(the Mat you get from MatVector) when you don't want to use them any more.
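What splitting and merging do to the underlying data can be sketched on a flat interleaved buffer. Both helpers below are hypothetical and work on plain typed arrays rather than cv.Mat; they assume 8-bit channels.

```javascript
// De-interleave an RGBA-style buffer into one plane per channel.
function splitChannels(data, channels) {
  const planes = [];
  for (let c = 0; c < channels; c++) {
    const plane = new Uint8Array(data.length / channels);
    for (let i = 0; i < plane.length; i++) {
      plane[i] = data[i * channels + c];
    }
    planes.push(plane);
  }
  return planes;
}

// Interleave equally sized planes back into a single buffer.
function mergeChannels(planes) {
  const channels = planes.length;
  const out = new Uint8Array(planes[0].length * channels);
  for (let c = 0; c < channels; c++) {
    for (let i = 0; i < planes[c].length; i++) {
      out[i * channels + c] = planes[c][i];
    }
  }
  return out;
}

const rgba = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8]); // two RGBA pixels
const planes = splitChannels(rgba, 4);
console.log(Array.from(planes[0]));             // [ 1, 5 ]  (R plane)
console.log(Array.from(mergeChannels(planes))); // [ 1, 2, 3, 4, 5, 6, 7, 8 ]
```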

## Making Borders for Images (Padding)

If you want to create a border around the image, something like a photo frame, you can use the **cv.copyMakeBorder()** function. It is also useful for convolution operations, zero padding, etc. This function takes the following arguments:

-   **src** - input image

-   **top**, **bottom**, **left**, **right** - border width in number of pixels in the corresponding directions

-   **borderType** - Flag defining what kind of border is added. It can be one of the following types:

    -   **cv.BORDER\_CONSTANT** - Adds a constant colored border. The value should be given as the next argument.
    -   **cv.BORDER\_REFLECT** - The border is a mirror reflection of the border elements, like this: _fedcba|abcdefgh|hgfedcb_
    -   **cv.BORDER\_REFLECT\_101** or **cv.BORDER\_DEFAULT** - Same as above, but the edge element itself is not repeated, like this: _gfedcb|abcdefgh|gfedcba_
    -   **cv.BORDER\_REPLICATE** - The last element is replicated throughout, like this: _aaaaaa|abcdefgh|hhhhhhh_
    -   **cv.BORDER\_WRAP** - The image is tiled, so the border continues from the opposite edge, like this: _cdefgh|abcdefgh|abcdefg_

-   **value** - Color of the border if the border type is cv.BORDER\_CONSTANT
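The letter patterns in the list above can be reproduced with a small index-mapping function. This is a sketch in plain JavaScript, not the OpenCV implementation; `borderIndex`, `padRow` and the string type names are hypothetical, and the constant border is left out because it needs no index mapping.

```javascript
// Map an out-of-range index i onto a valid index in [0, n) for a given border type.
function borderIndex(i, n, type) {
  if (i >= 0 && i < n) return i;
  switch (type) {
    case 'replicate': // aaaaaa|abcdefgh|hhhhhhh
      return i < 0 ? 0 : n - 1;
    case 'reflect': { // fedcba|abcdefgh|hgfedcb
      const period = 2 * n;
      const m = ((i % period) + period) % period;
      return m < n ? m : period - 1 - m;
    }
    case 'reflect101': { // gfedcb|abcdefgh|gfedcba
      if (n === 1) return 0;
      const period = 2 * n - 2;
      const m = ((i % period) + period) % period;
      return m < n ? m : period - m;
    }
    case 'wrap': // cdefgh|abcdefgh|abcdefg
      return ((i % n) + n) % n;
    default:
      throw new Error('unknown border type: ' + type);
  }
}

// Pad one row of "pixels" (characters) on the left and right.
function padRow(text, left, right, type) {
  let out = '';
  for (let i = -left; i < text.length + right; i++) {
    out += text[borderIndex(i, text.length, type)];
  }
  return out;
}

for (const type of ['reflect', 'reflect101', 'replicate', 'wrap']) {
  console.log(type, padRow('abcdefgh', 6, 7, type));
}
```

The same mapping applies independently to rows and columns of a real image.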
    


## [Js Image Arithmetics](https://docharvest.github.io/docs/opencv5/js_tutorials/js_core/js_image_arithmetics/js_image_arithmetics/)


# Arithmetic Operations on Images {#tutorial\_js\_image\_arithmetics}

## Goal

-   Learn several arithmetic operations on images like addition, subtraction, bitwise operations, etc.
-   You will learn these functions : **cv.add()**, **cv.subtract()**, etc.

## Image Addition

You can add two images with the OpenCV function cv.add(): res = img1 + img2. Both images should have the same depth and type.

For example, consider the sample below:
@code{.js}
let src1 = cv.imread("canvasInput1");
let src2 = cv.imread("canvasInput2");
let dst = new cv.Mat();
let mask = new cv.Mat();
let dtype = -1;
cv.add(src1, src2, dst, mask, dtype);
src1.delete(); src2.delete(); dst.delete(); mask.delete();
@endcode
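For 8-bit images the sum is saturated rather than wrapped around. The sketch below illustrates that rule on plain arrays; `addSaturated` is a hypothetical helper and assumes 8-bit unsigned data with no mask.

```javascript
// Element-wise addition with saturation at 255 (8-bit unsigned assumption).
function addSaturated(a, b) {
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.min(255, a[i] + b[i]);
  }
  return out;
}

console.log(Array.from(addSaturated([250, 100], [10, 100]))); // [ 255, 200 ]
```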

## Image Subtraction

You can subtract two images with the OpenCV function cv.subtract(): res = img1 - img2. Both images should have the same depth and type. Note that when used with RGBA images, the alpha channel is also subtracted.

For example, consider the sample below:
@code{.js}
let src1 = cv.imread("canvasInput1");
let src2 = cv.imread("canvasInput2");
let dst = new cv.Mat();
let mask = new cv.Mat();
let dtype = -1;
cv.subtract(src1, src2, dst, mask, dtype);
src1.delete(); src2.delete(); dst.delete(); mask.delete();
@endcode
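The remark about the alpha channel is easy to see on a single pixel. In this sketch, which uses a hypothetical `subtractSaturated` helper on plain arrays and assumes 8-bit unsigned data, two fully opaque pixels produce a result whose alpha is 0.

```javascript
// Element-wise subtraction with saturation at 0 (8-bit unsigned assumption).
function subtractSaturated(a, b) {
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.max(0, a[i] - b[i]);
  }
  return out;
}

// One RGBA pixel from each image, both fully opaque (alpha = 255).
const diff = subtractSaturated([200, 120, 50, 255], [50, 150, 50, 255]);
console.log(Array.from(diff)); // [ 150, 0, 0, 0 ]
```

An alpha of 0 means the result is fully transparent when drawn on a canvas, so you may want to restore the alpha channel before displaying it.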

## Bitwise Operations

These include the bitwise AND, OR, NOT and XOR operations. They are highly useful when extracting a part of an image, or when defining and working with a non-rectangular ROI. Below is an example of how to change a particular region of an image.

Suppose I want to put the OpenCV logo above an image. If I add the two images, the colors will change. If I blend them, I get a transparent effect, but I want the logo to be opaque. If it were a rectangular region, I could use an ROI as we did in the last chapter. But the OpenCV logo is not rectangular, so we do it with bitwise operations.
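The idea can be sketched on plain arrays: keep the background where the mask is 0, keep the logo where the mask is 255, and combine the two. `overlayWithMask` is a hypothetical helper; it assumes a binary 8-bit mask (0 or 255) with the same layout as the images.

```javascript
// Composite logo over background using a binary mask (255 = logo pixel, 0 = background).
function overlayWithMask(background, logo, mask) {
  const out = new Uint8Array(background.length);
  for (let i = 0; i < background.length; i++) {
    const keepBackground = background[i] & (~mask[i] & 0xff); // AND with inverted mask
    const keepLogo = logo[i] & mask[i];                       // AND with mask
    out[i] = keepBackground | keepLogo;                       // the two parts never overlap
  }
  return out;
}

console.log(Array.from(overlayWithMask([10, 20, 30], [200, 210, 220], [255, 0, 255])));
// [ 200, 20, 220 ]
```

Because every output value comes entirely from one of the two images, the logo stays opaque and the background colors are unchanged.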


## [Js Some Data Structures](https://docharvest.github.io/docs/opencv5/js_tutorials/js_core/js_some_data_structures/js_image_arithmetics/)


# Some Data Structures {#tutorial\_js\_some\_data\_structures}

## Goal

-   You will learn some data structures : **Point**, **Scalar**, **Size**, **Circle**, **Rect**, **RotatedRect** etc.

Scalar is an array type in JavaScript. Point, Size, Circle, Rect and RotatedRect are object types in JavaScript.

## Point

There are 2 ways to construct a Point and they are the same:
@code{.js}
// The first way
let point = new cv.Point(x, y);
// The second way
let point = {x: x, y: y};
@endcode

@param x x coordinate of the point (the origin is the top left corner of the image).
@param y y coordinate of the point.

## Scalar

There are 2 ways to construct a Scalar and they are the same:
@code{.js}
// The first way
let scalar = new cv.Scalar(R, G, B, Alpha);
// The second way
let scalar = [R, G, B, Alpha];
@endcode

@param R pixel value of red channel.
@param G pixel value of green channel.
@param B pixel value of blue channel.
@param Alpha pixel value of alpha channel.

## Size

There are 2 ways to construct a Size and they are the same:
@code{.js}
// The first way
let size = new cv.Size(width, height);
// The second way
let size = {width : width, height : height};
@endcode

@param width the width of the size.
@param height the height of the size.

## Circle

There are 2 ways to construct a Circle and they are the same:
@code{.js}
// The first way
let circle = new cv.Circle(center, radius);
// The second way
let circle = {center : center, radius : radius};
@endcode

@param center the center of the circle.
@param radius the radius of the circle.

## Rect

There are 2 ways to construct a Rect and they are the same:
@code{.js}
// The first way
let rect = new cv.Rect(x, y, width, height);
// The second way
let rect = {x : x, y : y, width : width, height : height};
@endcode

@param x x coordinate of the vertex which is the top left corner of the rectangle.
@param y y coordinate of the vertex which is the top left corner of the rectangle.
@param width the width of the rectangle.
@param height the height of the rectangle.

## RotatedRect

There are 2 ways to construct a RotatedRect and they are the same:
@code{.js}
// The first way
let rotatedRect = new cv.RotatedRect(center, size, angle);
// The second way
let rotatedRect = {center : center, size : size, angle : angle};
@endcode

@param center the rectangle mass center.
@param size width and height of the rectangle.
@param angle the rotation angle in a clockwise direction. When the angle is 0, 90, 180, 270 etc., the rectangle becomes an up-right rectangle.

Learn how to get the vertices from rotatedRect:

We use the function: **cv.RotatedRect.points(rotatedRect)**
@param rotatedRect rotated rectangle

@code{.js}
let vertices = cv.RotatedRect.points(rotatedRect);
let point1 = vertices[0];
let point2 = vertices[1];
let point3 = vertices[2];
let point4 = vertices[3];
@endcode

Learn how to get the bounding rectangle from rotatedRect:

We use the function: **cv.RotatedRect.boundingRect(rotatedRect)**
@param rotatedRect rotated rectangle

@code{.js}
let boundingRect = cv.RotatedRect.boundingRect(rotatedRect);
@endcode
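The geometry behind these two functions can be sketched in plain JavaScript using the object form of the structures. `rotatedRectPoints` and `boundingRectOf` are hypothetical helpers; the vertex order and the integer rounding of the bounding rectangle are assumptions and may differ from what OpenCV.js returns.

```javascript
// Four corners of a rotated rectangle given as {center, size, angle (degrees)}.
function rotatedRectPoints({center, size, angle}) {
  const rad = angle * Math.PI / 180;
  const b = Math.cos(rad) * 0.5;
  const a = Math.sin(rad) * 0.5;
  const p0 = {
    x: center.x - a * size.height - b * size.width,
    y: center.y + b * size.height - a * size.width,
  };
  const p1 = {
    x: center.x + a * size.height - b * size.width,
    y: center.y - b * size.height - a * size.width,
  };
  // The remaining corners are the reflections of p0 and p1 through the center.
  const p2 = {x: 2 * center.x - p0.x, y: 2 * center.y - p0.y};
  const p3 = {x: 2 * center.x - p1.x, y: 2 * center.y - p1.y};
  return [p0, p1, p2, p3];
}

// Axis-aligned rectangle that encloses all points (floor of min, ceil of max).
function boundingRectOf(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const x = Math.floor(Math.min(...xs));
  const y = Math.floor(Math.min(...ys));
  return {
    x: x,
    y: y,
    width: Math.ceil(Math.max(...xs)) - x,
    height: Math.ceil(Math.max(...ys)) - y,
  };
}

const rr = {center: {x: 50, y: 50}, size: {width: 20, height: 10}, angle: 0};
console.log(rotatedRectPoints(rr));
console.log(boundingRectOf(rotatedRectPoints(rr))); // { x: 40, y: 45, width: 20, height: 10 }
```

With an angle of 0 the rotated rectangle is already up-right, so the bounding rectangle has the same width and height as the rectangle itself.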

## [Js Table Of Contents Core](https://docharvest.github.io/docs/opencv5/js_tutorials/js_core/js_table_of_contents_core/)


# Core Operations {#tutorial\_js\_table\_of\_contents\_core}

-   @subpage tutorial\_js\_basic\_ops
    
    Learn to read and edit pixel values, working with image ROI and other basic operations.
    
-   @subpage tutorial\_js\_image\_arithmetics
    
    Perform arithmetic operations on images
    
-   @subpage tutorial\_js\_some\_data\_structures
    
    Learn some data structures

## [Js Image Classification](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_image_classification/js_image_classification/)


# Image Classification Example {#tutorial\_js\_image\_classification}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for image classification.


## [Js Image Classification With Camera](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_image_classification/js_image_classification_with_camera/)


# Image Classification Example with Camera {#tutorial\_js\_image\_classification\_with\_camera}

## Goal

-   In this tutorial you will learn how to use the OpenCV.js dnn module for image classification with a camera.

@note If you don't know how to capture video from camera, please review @ref tutorial\_js\_video\_display.


## [Js Object Detection](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_object_detection/js_object_detection/)


# Object Detection Example {#tutorial\_js\_object\_detection}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for object detection.


## [Js Object Detection With Camera](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_object_detection/js_object_detection_with_camera/)


# Object Detection Example with Camera {#tutorial\_js\_object\_detection\_with\_camera}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for object detection with camera.


## [Js Pose Estimation](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_pose_estimation/js_pose_estimation/)


# Pose Estimation Example {#tutorial\_js\_pose\_estimation}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for pose estimation.


## [Js Semantic Segmentation](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_semantic_segmentation/js_semantic_segmentation/)


# Semantic Segmentation Example {#tutorial\_js\_semantic\_segmentation}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for semantic segmentation.


## [Js Style Transfer](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_style_transfer/js_style_transfer/)


# Style Transfer Example {#tutorial\_js\_style\_transfer}

## Goal

-   In this tutorial you will learn how to use OpenCV.js dnn module for style transfer.


## [Js Table Of Contents Dnn](https://docharvest.github.io/docs/opencv5/js_tutorials/js_dnn/js_table_of_contents_dnn/)


# Deep Neural Networks (dnn module) {#tutorial\_js\_table\_of\_contents\_dnn}

-   @subpage tutorial\_js\_image\_classification
    
    Image classification example
    
-   @subpage tutorial\_js\_image\_classification\_with\_camera
    
    Image classification example with camera
    
-   @subpage tutorial\_js\_object\_detection
    
    Object detection example
    
-   @subpage tutorial\_js\_object\_detection\_with\_camera
    
    Object detection example with camera
    
-   @subpage tutorial\_js\_semantic\_segmentation
    
    Semantic segmentation example
    
-   @subpage tutorial\_js\_style\_transfer
    
    Style transfer example
    
-   @subpage tutorial\_js\_pose\_estimation
    
    Pose estimation example

## [Js Image Display](https://docharvest.github.io/docs/opencv5/js_tutorials/js_gui/js_image_display/js_image_display/)


# Getting Started with Images {#tutorial\_js\_image\_display}

## Goals

-   Learn how to read an image and how to display it on a web page.

## Read an image

OpenCV.js stores images as the cv.Mat type. We use the HTML canvas element to transfer a cv.Mat to the web page and back. The ImageData interface can represent or set the underlying pixel data of an area of a canvas element.

@note Please refer to canvas docs for more details.

First, create an ImageData object from the canvas:
@code{.js}
let canvas = document.getElementById(canvasInputId);
let ctx = canvas.getContext('2d');
let imgData = ctx.getImageData(0, 0, canvas.width, canvas.height);
@endcode

Then, use cv.matFromImageData to construct a cv.Mat:
@code{.js}
let src = cv.matFromImageData(imgData);
@endcode

@note Because canvas only supports 8-bit RGBA images with continuous storage, the cv.Mat type is cv.CV\_8UC4. This differs from native OpenCV, where images returned and shown by the native **imread** and **imshow** have the channels stored in BGR order.

## Display an image

First, convert the type of src to cv.CV\_8UC4:
@code{.js}
let dst = new cv.Mat();
// scale and shift are used to map the data to [0, 255].
src.convertTo(dst, cv.CV_8U, scale, shift);
// *** is GRAY, RGB, or RGBA, depending on whether src.channels() is 1, 3 or 4.
cv.cvtColor(dst, dst, cv.COLOR_***2RGBA);
@endcode

Then, create a new ImageData object from dst:
@code{.js}
let imgData = new ImageData(new Uint8ClampedArray(dst.data), dst.cols, dst.rows);
@endcode

Finally, display it:
@code{.js}
let canvas = document.getElementById(canvasOutputId);
let ctx = canvas.getContext('2d');
ctx.clearRect(0, 0, canvas.width, canvas.height);
canvas.width = imgData.width;
canvas.height = imgData.height;
ctx.putImageData(imgData, 0, 0);
@endcode
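The color conversion in the first step exists because ImageData always expects four bytes per pixel. As a rough illustration of the single-channel case, the sketch below expands a gray buffer to RGBA in plain JavaScript; `grayToRgba` is a hypothetical helper, and setting alpha to 255 (fully opaque) is an assumption.

```javascript
// Expand a single-channel 8-bit buffer to interleaved RGBA.
function grayToRgba(gray) {
  const rgba = new Uint8ClampedArray(gray.length * 4);
  for (let i = 0; i < gray.length; i++) {
    rgba[4 * i] = gray[i];     // R
    rgba[4 * i + 1] = gray[i]; // G
    rgba[4 * i + 2] = gray[i]; // B
    rgba[4 * i + 3] = 255;     // A, assumed fully opaque
  }
  return rgba;
}

console.log(Array.from(grayToRgba([0, 128]))); // [ 0, 0, 0, 255, 128, 128, 128, 255 ]
```

The resulting Uint8ClampedArray has the layout that the ImageData constructor accepts, with a length of width * height * 4.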

## In OpenCV.js

OpenCV.js implements image reading and showing using the above method.

We use **cv.imread (imageSource)** to read an image from an html canvas or img element.
@param imageSource canvas element or id, or img element or id.
@return mat with channels stored in RGBA order.

We use **cv.imshow (canvasSource, mat)** to display it. The function may scale the mat, depending on its depth:

-   If the mat is 8-bit unsigned, it is displayed as is.
-   If the mat is 16-bit unsigned or 32-bit integer, the pixels are divided by 256. That is, the value range \[0,255\*256\] is mapped to \[0,255\].
-   If the mat is 32-bit floating-point, the pixel values are multiplied by 255. That is, the value range \[0,1\] is mapped to \[0,255\].

@param canvasSource canvas element or id. @param mat mat to be shown.
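The three scaling rules can be written down as a small function. This is a sketch of the rules as stated here, not the cv.imshow source; `toDisplayValue` and the string depth labels are hypothetical, and the rounding and clamping choices are assumptions.

```javascript
// Map one pixel value to the displayed 8-bit range according to the Mat depth.
function toDisplayValue(value, depth) {
  let scaled;
  switch (depth) {
    case '8U': // displayed as is
      scaled = value;
      break;
    case '16U':
    case '32S': // divided by 256
      scaled = Math.floor(value / 256);
      break;
    case '32F': // multiplied by 255
      scaled = Math.round(value * 255);
      break;
    default:
      throw new Error('depth not covered by the rules above: ' + depth);
  }
  return Math.min(255, Math.max(0, scaled)); // assumed clamp to [0, 255]
}

console.log(toDisplayValue(200, '8U'));    // 200
console.log(toDisplayValue(65280, '16U')); // 255
console.log(toDisplayValue(1.0, '32F'));   // 255
```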

The above code for reading and showing an image can be simplified as below.
@code{.js}
let img = cv.imread(imageSource);
cv.imshow(canvasOutput, img);
img.delete();
@endcode


## [Js Table Of Contents Gui](https://docharvest.github.io/docs/opencv5/js_tutorials/js_gui/js_table_of_contents_gui/)


# GUI Features {#tutorial\_js\_table\_of\_contents\_gui}

-   @subpage tutorial\_js\_image\_display
    
    Learn to load an image and display it on a web page
    
-   @subpage tutorial\_js\_video\_display
    
    Learn to capture video from Camera and play it
    
-   @subpage tutorial\_js\_trackbar
    
    Create trackbar to control certain parameters

## [Js Trackbar](https://docharvest.github.io/docs/opencv5/js_tutorials/js_gui/js_trackbar/js_trackbar/)


# Add a Trackbar to Your Application {#tutorial\_js\_trackbar}

## Goal

-   Use HTML DOM Input Range Object to add a trackbar to your application.

## Code Demo

Here, we will create a simple application that blends two images. We will let the user enter the weight by using the trackbar.

First, we need to create three canvas elements: two for input and one for output. Please refer to the tutorial @ref tutorial\_js\_image\_display.
@code{.js}
let src1 = cv.imread('canvasInput1');
let src2 = cv.imread('canvasInput2');
@endcode

Then, we use HTML DOM Input Range Object to implement the trackbar, which is shown as below.

@note <input> elements with type="range" are not supported in Internet Explorer 9 and earlier versions.

You can create an <input> element with type="range" with the document.createElement() method:
@code{.js}
let x = document.createElement('INPUT');
x.setAttribute('type', 'range');
@endcode

You can access an <input> element with type="range" with getElementById():
@code{.js}
let x = document.getElementById('myRange');
@endcode

As a trackbar, the range element needs a trackbar name, a default value, a minimum value, a maximum value, a step, and a callback function that is executed every time the trackbar value changes. The callback function always has a default argument, which is the trackbar position. A text element that displays the trackbar value is also useful. In our case, we can create the trackbar as below:
@code{.html}
Weight:
@endcode

Finally, we can use the trackbar value in the callback function, blend the two images, and display the result.
@code{.js}
let weightValue = document.getElementById('weightValue');
let trackbar = document.getElementById('trackbar');
weightValue.setAttribute('value', trackbar.value);
let alpha = trackbar.value/trackbar.max;
let beta = ( 1.0 - alpha );
let src1 = cv.imread('canvasInput1');
let src2 = cv.imread('canvasInput2');
let dst = new cv.Mat();
cv.addWeighted( src1, alpha, src2, beta, 0.0, dst, -1);
cv.imshow('canvasOutput', dst);
dst.delete();
src1.delete();
src2.delete();
@endcode

@sa cv.addWeighted
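The blend itself is a weighted sum per element: dst = src1 * alpha + src2 * beta + gamma. The sketch below reproduces that arithmetic on plain arrays so the effect of the trackbar position is easy to check; `addWeightedU8` is a hypothetical helper that assumes 8-bit data, and the trackbar numbers are made-up sample values.

```javascript
// Weighted sum of two 8-bit buffers, rounded and clamped to [0, 255].
function addWeightedU8(src1, alpha, src2, beta, gamma) {
  const out = new Uint8Array(src1.length);
  for (let i = 0; i < src1.length; i++) {
    const value = src1[i] * alpha + src2[i] * beta + gamma;
    out[i] = Math.min(255, Math.max(0, Math.round(value)));
  }
  return out;
}

// Sample trackbar state (assumed values): position 25 on a 0..100 range.
const trackbarValue = 25;
const trackbarMax = 100;
const alpha = trackbarValue / trackbarMax; // 0.25
const beta = 1.0 - alpha;                  // 0.75

console.log(Array.from(addWeightedU8([200, 0], alpha, [100, 100], beta, 0))); // [ 125, 75 ]
```

Since alpha and beta always sum to 1, the result stays within the range of the two inputs.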


## [Js Video Display](https://docharvest.github.io/docs/opencv5/js_tutorials/js_gui/js_video_display/js_video_display/)


## [Js Canny](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_canny/js_canny/)


# Canny Edge Detection {#tutorial\_js\_canny}

## Goal

-   Concept of Canny edge detection
-   OpenCV functions for that : **cv.Canny()**

## Theory

Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in 1986. It is a multi-stage algorithm, and we will go through each stage.

\-# **Noise Reduction**

```
Since edge detection is susceptible to noise in the image, first step is to remove the noise in the
image with a 5x5 Gaussian filter. We have already seen this in previous chapters.
```

\-# **Finding Intensity Gradient of the Image**

```
Smoothened image is then filtered with a Sobel kernel in both horizontal and vertical direction to
get first derivative in horizontal direction (\f$G_x\f$) and vertical direction (\f$G_y\f$). From these two
images, we can find edge gradient and direction for each pixel as follows:

\f[
Edge\_Gradient \; (G) = \sqrt{G_x^2 + G_y^2} \\
Angle \; (\theta) = \tan^{-1} \bigg(\frac{G_y}{G_x}\bigg)
\f]

Gradient direction is always perpendicular to edges. It is rounded to one of four angles
representing vertical, horizontal and two diagonal directions.
```

\-# **Non-maximum Suppression**

```
After getting gradient magnitude and direction, a full scan of image is done to remove any unwanted
pixels which may not constitute the edge. For this, at every pixel, pixel is checked if it is a
local maximum in its neighborhood in the direction of gradient. Check the image below:

![image](images/nms.jpg)

Point A is on the edge ( in vertical direction). Gradient direction is normal to the edge. Point B
and C are in gradient directions. So point A is checked with point B and C to see if it forms a
local maximum. If so, it is considered for next stage, otherwise, it is suppressed ( put to zero).

In short, the result you get is a binary image with "thin edges".
```

\-# **Hysteresis Thresholding**

```
This stage decides which edges are really edges and which are not. For this, we need two
threshold values, minVal and maxVal. Any edges with an intensity gradient above maxVal are sure to
be edges, and those below minVal are sure to be non-edges, so they are discarded. Those that lie
between these two thresholds are classified as edges or non-edges based on their connectivity. If
they are connected to "sure-edge" pixels, they are considered part of an edge. Otherwise, they are
also discarded. See the image below:

![image](images/hysteresis.jpg)

Edge A is above maxVal, so it is considered a "sure-edge". Although edge C is below maxVal, it is
connected to edge A, so it is also considered a valid edge and we get the full curve. But edge B,
although it is above minVal and is in the same region as edge C, is not connected to any
"sure-edge", so it is discarded. It is therefore very important to select minVal and maxVal
appropriately to get the correct result.

This stage also removes small pixel noise, on the assumption that edges are long lines.
```
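The hysteresis rule can be sketched as a flood fill that starts from the "sure-edge" pixels and spreads through connected in-between pixels. This is a simplified illustration, not OpenCV's implementation: `hysteresis` is a hypothetical helper, 8-connectivity is assumed, and values exactly equal to minVal are treated as in-between pixels.

```javascript
// mag: 2D array of gradient magnitudes. Returns a 2D array with 255 on edges, 0 elsewhere.
function hysteresis(mag, minVal, maxVal) {
  const h = mag.length, w = mag[0].length;
  const edges = mag.map(row => row.map(() => 0));
  const stack = [];
  // Seed with pixels above maxVal: these are sure edges.
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (mag[y][x] > maxVal) { edges[y][x] = 255; stack.push([x, y]); }
    }
  }
  // Grow into neighbors that are at least minVal and touch an accepted edge pixel.
  while (stack.length > 0) {
    const [x, y] = stack.pop();
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const nx = x + dx, ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
        if (edges[ny][nx] === 0 && mag[ny][nx] >= minVal) {
          edges[ny][nx] = 255;
          stack.push([nx, ny]);
        }
      }
    }
  }
  return edges;
}
```

In the one-row example `[0, 60, 120, 60, 0, 60, 0]` with minVal 50 and maxVal 100, the two 60s next to 120 survive (like edge C), while the isolated 60 is dropped (like edge B).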

So what we finally get is strong edges in the image.

## Canny Edge Detection in OpenCV

We use the function: **cv.Canny(image, edges, threshold1, threshold2, apertureSize = 3, L2gradient = false)**

-   @param image 8-bit input image.
-   @param edges output edge map; a single-channel 8-bit image with the same size as image.
-   @param threshold1 first threshold for the hysteresis procedure.
-   @param threshold2 second threshold for the hysteresis procedure.
-   @param apertureSize aperture size for the Sobel operator.
-   @param L2gradient specifies the equation for finding the gradient magnitude. If it is true, it uses the equation mentioned above, which is more accurate; otherwise it uses \\f$Edge\_Gradient \\; (G) = |G\_x| + |G\_y|\\f$.

## [Js Colorspaces](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_colorspaces/js_colorspaces/)

# Changing Colorspaces {#tutorial\_js\_colorspaces}

## Goal

-   In this tutorial, you will learn how to convert images from one color space to another, such as RGB \\f$\\leftrightarrow\\f$ Gray, RGB \\f$\\leftrightarrow\\f$ HSV, etc.
-   You will learn the following functions: **cv.cvtColor()**, **cv.inRange()**, etc.

## cvtColor

There are more than 150 color-space conversion methods available in OpenCV. But we will look into the most widely used one: RGB \\f$\\leftrightarrow\\f$ Gray.

We use the function: **cv.cvtColor (src, dst, code, dstCn = 0)**

-   @param src input image.
-   @param dst output image of the same size and depth as src.
-   @param code color space conversion code (see **cv.ColorConversionCodes**).
-   @param dstCn number of channels in the destination image; if the parameter is 0, the number of channels is derived automatically from src and code.

For RGB \\f$\\rightarrow\\f$ Gray conversion we use the code cv.COLOR\_RGBA2GRAY.
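For intuition, the RGB to Gray conversion is a weighted sum of the color channels. The sketch below assumes the commonly documented weights Y = 0.299 R + 0.587 G + 0.114 B and works on a flat RGBA byte array such as the one a canvas `ImageData` holds. `rgbaToGray` is a hypothetical helper, and its rounding may differ slightly from what cv.cvtColor produces.

```javascript
// rgba: flat array [r, g, b, a, r, g, b, a, ...]; returns one gray byte per pixel.
function rgbaToGray(rgba) {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    // The alpha channel (index 4 * i + 3) is ignored.
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}
```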

## inRange

Checks if array elements lie between the elements of two other arrays.

We use the function: **cv.inRange (src, lowerb, upperb, dst)**

-   @param src first input image.
-   @param lowerb inclusive lower boundary Mat of the same size as src.
-   @param upperb inclusive upper boundary Mat of the same size as src.
-   @param dst output image of the same size as src and cv.CV\_8U type.
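The per-element test can be sketched for a single-channel image stored as a flat array. `inRangeSketch` is a hypothetical stand-in used only to show the inclusive comparison and the 0/255 output; for multi-channel images every channel would have to pass the test.

```javascript
// Returns 255 where lowerb[i] <= src[i] <= upperb[i], otherwise 0.
function inRangeSketch(src, lowerb, upperb) {
  const dst = new Uint8Array(src.length);
  for (let i = 0; i < src.length; i++) {
    dst[i] = (src[i] >= lowerb[i] && src[i] <= upperb[i]) ? 255 : 0;
  }
  return dst;
}
```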

## [Js Contour Features](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_contour_features/js_contour_features/)

# Contour Features {#tutorial\_js\_contour\_features}

@prev\_tutorial{tutorial\_js\_contours\_begin} @next\_tutorial{tutorial\_js\_contour\_properties}

## Goal

-   To find the different features of contours, like area, perimeter, centroid, bounding box etc
-   You will learn plenty of functions related to contours.

## 1\. Moments

Image moments help you to calculate features such as the center of mass of the object and the area of the object. Check out the Wikipedia page on [Image Moments](http://en.wikipedia.org/wiki/Image_moment).

We use the function: **cv.moments (array, binaryImage = false)**

-   @param array raster image (single-channel, 8-bit or floating-point 2D array) or an array (1×N or N×1) of 2D points.
-   @param binaryImage if it is true, all non-zero image pixels are treated as 1's. The parameter is used for images only.

From these moments, you can extract useful data such as area and centroid. The centroid is given by the relations \\f$C\_x = \\frac{M\_{10}}{M\_{00}}\\f$ and \\f$C\_y = \\frac{M\_{01}}{M\_{00}}\\f$. This can be done as follows:

@code{.js}
let cx = M.m10 / M.m00;
let cy = M.m01 / M.m00;
@endcode
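To see where these numbers come from, the raw moments of a small binary image can be accumulated by hand. `rawMoments` is a hypothetical helper that treats every non-zero pixel as 1 (the binaryImage = true behaviour) and computes only the three moments needed for the centroid.

```javascript
// img: 2D array of pixel values. Returns m00 (area), m10 and m01.
function rawMoments(img) {
  let m00 = 0, m10 = 0, m01 = 0;
  for (let y = 0; y < img.length; y++) {
    for (let x = 0; x < img[y].length; x++) {
      const v = img[y][x] !== 0 ? 1 : 0;
      m00 += v;      // sum of pixel weights
      m10 += x * v;  // weighted sum of x coordinates
      m01 += y * v;  // weighted sum of y coordinates
    }
  }
  return { m00, m10, m01 };
}
```

For a 2x2 block of ones covering x = 1..2 and y = 2..3, this gives m00 = 4, m10 = 6 and m01 = 10, so the centroid is (1.5, 2.5).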

## 2\. Contour Area

Contour area is given by the function **cv.contourArea()** or from moments, **M.m00**.

We use the function: **cv.contourArea (contour, oriented = false)**

-   @param contour input vector of 2D points (contour vertices).
-   @param oriented oriented area flag. If it is true, the function returns a signed area value, depending on the contour orientation (clockwise or counter-clockwise). Using this feature you can determine the orientation of a contour by taking the sign of the area. By default, the parameter is false, which means that the absolute value is returned.
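The idea of a signed area can be illustrated with the shoelace formula on a plain array of `{x, y}` points. `signedArea` is a hypothetical helper; the sketch only shows that reversing the vertex order flips the sign, and makes no claim about which sign OpenCV assigns to clockwise contours.

```javascript
// Shoelace formula: half the sum of cross products of consecutive vertices.
function signedArea(points) {
  let sum = 0;
  for (let i = 0; i < points.length; i++) {
    const p = points[i];
    const q = points[(i + 1) % points.length]; // wrap around to close the polygon
    sum += p.x * q.y - q.x * p.y;
  }
  return sum / 2;
}
```

Taking `Math.abs(signedArea(points))` corresponds to the default behaviour with oriented = false.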

## 3\. Contour Perimeter

It is also called arc length. It can be found using the **cv.arcLength()** function.

We use the function: **cv.arcLength (curve, closed)**

-   @param curve input vector of 2D points.
-   @param closed flag indicating whether the curve is closed or not.
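The effect of the closed flag is easy to see in a plain JavaScript sketch that sums the segment lengths of a point list. `arcLengthSketch` is a hypothetical helper operating on `{x, y}` objects rather than a cv.Mat.

```javascript
// Sum of segment lengths; when closed, the last point is joined back to the first.
function arcLengthSketch(points, closed) {
  const n = points.length;
  const segments = closed ? n : n - 1;
  let length = 0;
  for (let i = 0; i < segments; i++) {
    const p = points[i], q = points[(i + 1) % n];
    length += Math.hypot(q.x - p.x, q.y - p.y);
  }
  return length;
}
```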

## 4\. Contour Approximation

It approximates a contour shape to another shape with fewer vertices, depending upon the precision we specify. It is an implementation of the [Douglas-Peucker algorithm](http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm). Check the Wikipedia page for the algorithm and a demonstration.

We use the function: **cv.approxPolyDP (curve, approxCurve, epsilon, closed)**

-   @param curve input vector of 2D points stored in cv.Mat.
-   @param approxCurve result of the approximation. The type should match the type of the input curve.
-   @param epsilon parameter specifying the approximation accuracy. This is the maximum distance between the original curve and its approximation.
-   @param closed If true, the approximated curve is closed (its first and last vertices are connected). Otherwise, it is not closed.
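To make the role of epsilon concrete, here is a textbook recursive version of the Douglas-Peucker idea for an open curve given as `{x, y}` points. It is a sketch for illustration only: `douglasPeucker` and `perpendicularDistance` are hypothetical helpers, and OpenCV's own implementation may differ in its details (for example in how closed curves are handled).

```javascript
// Distance from point p to the infinite line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x, dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

function douglasPeucker(points, epsilon) {
  if (points.length < 3) return points.slice();
  const first = points[0], last = points[points.length - 1];
  // Find the point farthest from the chord joining the end points.
  let maxDist = 0, index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  // Everything is within epsilon of the chord: keep only the end points.
  if (maxDist <= epsilon) return [first, last];
  // Otherwise keep the farthest point and simplify both halves.
  const left = douglasPeucker(points.slice(0, index + 1), epsilon);
  const right = douglasPeucker(points.slice(index), epsilon);
  return left.slice(0, -1).concat(right);
}
```

A larger epsilon discards more vertices; a small epsilon keeps the result close to the original curve.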

## 5\. Convex Hull

Convex hull will look similar to contour approximation, but it is not the same (both may provide the same results in some cases). Here, the **cv.convexHull()** function checks a curve for convexity defects and corrects them. Generally speaking, convex curves are curves which always bulge out, or are at least flat. Where a curve bulges inward, it is called a convexity defect. For example, check the below image of a hand. The red line shows the convex hull of the hand. The double-sided arrow marks show the convexity defects, which are the local maximum deviations of the hull from the contour.

We use the function: **cv.convexHull (points, hull, clockwise = false, returnPoints = true)**

-   @param points input 2D point set.
-   @param hull output convex hull.
-   @param clockwise orientation flag. If it is true, the output convex hull is oriented clockwise. Otherwise, it is oriented counter-clockwise. The assumed coordinate system has its X axis pointing to the right, and its Y axis pointing upwards.
-   @param returnPoints operation flag. In case of a matrix, when the flag is true, the function returns convex hull points. Otherwise, it returns indices of the convex hull points.
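As an illustration of what a convex hull is, the sketch below uses Andrew's monotone chain algorithm on `{x, y}` points. This is a swapped-in textbook technique chosen for brevity, not necessarily the algorithm OpenCV uses, and `convexHullSketch` and `cross` are hypothetical names.

```javascript
// Cross product of vectors OA and OB; positive for a counter-clockwise turn
// in a coordinate system with the Y axis pointing upwards.
function cross(o, a, b) {
  return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

function convexHullSketch(points) {
  const pts = points.slice().sort((p, q) => p.x - q.x || p.y - q.y);
  if (pts.length < 3) return pts;
  // Build one half of the hull, dropping points that bulge inward.
  const buildHalf = (ordered) => {
    const half = [];
    for (const p of ordered) {
      while (half.length >= 2 &&
             cross(half[half.length - 2], half[half.length - 1], p) <= 0) {
        half.pop();
      }
      half.push(p);
    }
    half.pop(); // the last point is the first point of the other half
    return half;
  };
  const lower = buildHalf(pts);
  const upper = buildHalf(pts.slice().reverse());
  return lower.concat(upper);
}
```

Points strictly inside the shape, such as the center of a square, never appear in the result.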

## 6\. Checking Convexity

There is a function to check if a curve is convex or not, **cv.isContourConvex()**. It simply returns true or false.

@code{.js}
cv.isContourConvex(cnt);
@endcode
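One common way to test convexity, sketched here under the assumption that the polygon is simple (it does not intersect itself), is to check that every corner turns in the same direction. `isConvexSketch` is a hypothetical helper working on `{x, y}` points.

```javascript
// A simple polygon is convex if all non-zero turns have the same sign.
function isConvexSketch(points) {
  const n = points.length;
  if (n < 3) return false;
  let sign = 0;
  for (let i = 0; i < n; i++) {
    const a = points[i], b = points[(i + 1) % n], c = points[(i + 2) % n];
    // z component of the cross product of edges a->b and b->c.
    const z = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
    if (z !== 0) {
      if (sign !== 0 && Math.sign(z) !== sign) return false;
      sign = Math.sign(z);
    }
  }
  return true;
}
```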

## 7\. Bounding Rectangle

There are two types of bounding rectangles.

### 7.a. Straight Bounding Rectangle

It is a straight (axis-aligned) rectangle; it doesn't consider the rotation of the object. So the area of the bounding rectangle won't be minimal.

We use the function: **cv.boundingRect (points)**

-   @param points input 2D point set.

### 7.b. Rotated Rectangle

Here, the bounding rectangle is drawn with minimum area, so it also considers the rotation.

We use the function: **cv.minAreaRect (points)**

-   @param points input 2D point set.

## 8\. Minimum Enclosing Circle

Next we find the circumcircle of an object using the function **cv.minEnclosingCircle()**. It is the circle of minimum area which completely covers the object.

We use the functions: **cv.minEnclosingCircle (points)**

-   @param points input 2D point set.

**cv.circle (img, center, radius, color, thickness = 1, lineType = cv.LINE\_8, shift = 0)**

-   @param img image where the circle is drawn.
-   @param center center of the circle.
-   @param radius radius of the circle.
-   @param color circle color.
-   @param thickness thickness of the circle outline, if positive. Negative thickness means that a filled circle is to be drawn.
-   @param lineType type of the circle boundary.
-   @param shift number of fractional bits in the coordinates of the center and in the radius value.

## 9\. Fitting an Ellipse

The next step is to fit an ellipse to an object. It returns the rotated rectangle in which the ellipse is inscribed.

We use the functions: **cv.fitEllipse (points)**

-   @param points input 2D point set.

**cv.ellipse1 (img, box, color, thickness = 1, lineType = cv.LINE\_8)**

-   @param img image.
-   @param box alternative ellipse representation via RotatedRect. This means that the function draws an ellipse inscribed in the rotated rectangle.
-   @param color ellipse color.
-   @param thickness thickness of the ellipse arc outline, if positive. Otherwise, this indicates that a filled ellipse sector is to be drawn.
-   @param lineType type of the ellipse boundary.

## 10\. Fitting a Line

Similarly, we can fit a line to a set of points, i.e. approximate the set with a straight line.

We use the functions: **cv.fitLine (points, line, distType, param, reps, aeps)**

-   @param points input 2D point set.
-   @param line output line parameters. It should be a Mat of 4 elements \[vx, vy, x0, y0\], where \[vx, vy\] is a normalized vector collinear to the line and \[x0, y0\] is a point on the line.
-   @param distType distance used by the M-estimator (see cv.DistanceTypes).
-   @param param numerical parameter (C) for some types of distances. If it is 0, an optimal value is chosen.
-   @param reps sufficient accuracy for the radius (distance between the coordinate origin and the line).
-   @param aeps sufficient accuracy for the angle. 0.01 would be a good default value for reps and aeps.

**cv.line (img, pt1, pt2, color, thickness = 1, lineType = cv.LINE\_8, shift = 0)**

-   @param img image.
-   @param pt1 first point of the line segment.
-   @param pt2 second point of the line segment.
-   @param color line color.
-   @param thickness line thickness.
-   @param lineType type of the line.
-   @param shift number of fractional bits in the point coordinates.

## [Js Contour Properties](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_contour_properties/js_contour_properties/)

# Contour Properties {#tutorial\_js\_contour\_properties}

@prev\_tutorial{tutorial\_js\_contour\_features} @next\_tutorial{tutorial\_js\_contours\_more\_functions}

## Goal

-   Here we will learn to extract some frequently used properties of objects like Solidity, Equivalent Diameter, Mask image, Mean Intensity etc.

## 1\. Aspect Ratio

It is the ratio of the width to the height of the bounding rectangle of the object.

\\f\[Aspect \\; Ratio = \\frac{Width}{Height}\\f\]

@code{.js}
let rect = cv.boundingRect(cnt);
let aspectRatio = rect.width / rect.height;
@endcode

## 2\. Extent

Extent is the ratio of contour area to bounding rectangle area.

\\f\[Extent = \\frac{Object \\; Area}{Bounding \\; Rectangle \\; Area}\\f\]

@code{.js}
let area = cv.contourArea(cnt, false);
let rect = cv.boundingRect(cnt);
let rectArea = rect.width \* rect.height;
let extent = area / rectArea;
@endcode

## 3\. Solidity

Solidity is the ratio of contour area to its convex hull area.

\\f\[Solidity = \\frac{Contour \\; Area}{Convex \\; Hull \\; Area}\\f\]

@code{.js}
let hull = new cv.Mat(); // output Mat that receives the hull points
let area = cv.contourArea(cnt, false);
cv.convexHull(cnt, hull, false, true);
let hullArea = cv.contourArea(hull, false);
let solidity = area / hullArea;
hull.delete(); // release the Mat when done
@endcode

## 4\. Equivalent Diameter

Equivalent Diameter is the diameter of the circle whose area is the same as the contour area.

\\f\[Equivalent \\; Diameter = \\sqrt{\\frac{4 \\times Contour \\; Area}{\\pi}}\\f\]

@code{.js}
let area = cv.contourArea(cnt, false);
let equiDiameter = Math.sqrt(4 \* area / Math.PI);
@endcode

## 5\. Orientation

Orientation is the angle at which the object is directed. The following method also gives the major axis and minor axis lengths.

@code{.js}
let rotatedRect = cv.fitEllipse(cnt);
let angle = rotatedRect.angle;
@endcode

## 6\. Mask and Pixel Points

In some cases, we may need all the points which make up that object.

We use the function: **cv.transpose (src, dst)**

-   @param src input array.
-   @param dst output array of the same type as src.

## 7\. Maximum Value, Minimum Value and their locations

We use the function: **cv.minMaxLoc(src, mask)**

-   @param src input single-channel array.
-   @param mask optional mask used to select a sub-array.

@code{.js}
let result = cv.minMaxLoc(src, mask);
let minVal = result.minVal;
let maxVal = result.maxVal;
let minLoc = result.minLoc;
let maxLoc = result.maxLoc;
@endcode

## 8\. Mean Color or Mean Intensity

Here, we can find the average color of an object, or its average intensity in grayscale mode. We again use the same mask to do it.

We use the function: **cv.mean (src, mask)**

-   @param src input array that should have from 1 to 4 channels so that the result can be stored in Scalar.
-   @param mask optional operation mask.

@code{.js}
let average = cv.mean(src, mask);
@endcode

## [Js Contours Begin](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_contours_begin/js_contours_begin/)

# Contours : Getting Started {#tutorial\_js\_contours\_begin}

@next\_tutorial{tutorial\_js\_contour\_features}

## Goal

-   Understand what contours are.
-   Learn to find contours, draw contours etc
-   You will learn these functions : **cv.findContours()**, **cv.drawContours()**

## What are contours?

Contours can be explained simply as a curve joining all the continuous points (along the boundary) having the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.

-   For better accuracy, use binary images. So before finding contours, apply thresholding or Canny edge detection.
-   Since OpenCV 3.2, **cv.findContours()** does not modify the source image.
-   In OpenCV, finding contours is like finding a white object on a black background. So remember, the object to be found should be white and the background should be black.

## How to draw the contours?

To draw the contours, the cv.drawContours function is used. It can also be used to draw any shape provided you have its boundary points.

We use the functions: **cv.findContours (image, contours, hierarchy, mode, method, offset = new cv.Point(0, 0))**

-   @param image source, an 8-bit single-channel image. Non-zero pixels are treated as 1's. Zero pixels remain 0's, so the image is treated as binary.
-   @param contours detected contours.
-   @param hierarchy containing information about the image topology. It has as many elements as the number of contours.
-   @param mode contour retrieval mode (see cv.RetrievalModes).
-   @param method contour approximation method (see cv.ContourApproximationModes).
-   @param offset optional offset by which every contour point is shifted. This is useful if the contours are extracted from the image ROI and then they should be analyzed in the whole image context.

**cv.drawContours (image, contours, contourIdx, color, thickness = 1, lineType = cv.LINE\_8, hierarchy = new cv.Mat(), maxLevel = INT\_MAX, offset = new cv.Point(0, 0))**

-   @param image destination image.
-   @param contours all the input contours.
-   @param contourIdx parameter indicating a contour to draw. If it is negative, all the contours are drawn.
-   @param color color of the contours.
-   @param thickness thickness of lines the contours are drawn with. If it is negative, the contour interiors are drawn.
-   @param lineType line connectivity (see cv.LineTypes).
-   @param hierarchy optional information about hierarchy. It is only needed if you want to draw only some of the contours (see maxLevel).
-   @param maxLevel maximal level for drawn contours. If it is 0, only the specified contour is drawn. If it is 1, the function draws the contour(s) and all the nested contours. If it is 2, the function draws the contours, all the nested contours, all the nested-to-nested contours, and so on. This parameter is only taken into account when there is hierarchy available.
-   @param offset optional contour shift parameter.

# Contour Approximation Method

This is the fifth argument of the cv.findContours function. What does it actually denote?

Above, we said that contours are the boundaries of a shape with the same intensity. A contour stores the (x,y) coordinates of the boundary of a shape. But does it store all the coordinates? That is specified by this contour approximation method.

If you pass cv.CHAIN\_APPROX\_NONE, all the boundary points are stored. But do we actually need all the points? For example, suppose you found the contour of a straight line. Do you need all the points on the line to represent that line? No, we need just the two end points of that line. This is what cv.CHAIN\_APPROX\_SIMPLE does. It removes all redundant points and compresses the contour, thereby saving memory.
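The compression idea can be sketched on an open chain of `{x, y}` boundary points: a point is redundant when the step into it equals the step out of it, which is the case in the middle of a horizontal, vertical or diagonal run. `compressRuns` is a hypothetical helper written for illustration; it is not OpenCV's code.

```javascript
// Keep only the points where the step direction changes, plus both end points.
function compressRuns(points) {
  if (points.length < 3) return points.slice();
  const out = [points[0]];
  for (let i = 1; i < points.length - 1; i++) {
    const prev = points[i - 1], cur = points[i], next = points[i + 1];
    const sameStep =
      cur.x - prev.x === next.x - cur.x &&
      cur.y - prev.y === next.y - cur.y;
    if (!sameStep) out.push(cur); // a corner: the direction changes here
  }
  out.push(points[points.length - 1]);
  return out;
}
```

A straight run of five points collapses to its two end points, and an L-shaped run keeps only its two ends and the corner.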

## [Js Contours Hierarchy](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_contours_hierarchy/js_contours_hierarchy/)

# Contours Hierarchy {#tutorial\_js\_contours\_hierarchy}

@prev\_tutorial{tutorial\_js\_contours\_more\_functions}

## Goal

-   This time, we learn about the hierarchy of contours, i.e. the parent-child relationship in Contours.

## Theory

In the last few articles on contours, we have worked with several functions related to contours provided by OpenCV. But when we found the contours in an image using the **cv.findContours()** function, we passed an argument, the **Contour Retrieval Mode**. We usually passed **cv.RETR\_LIST** or **cv.RETR\_TREE** and it worked nicely. But what does it actually mean?

Also, the function filled in two outputs: the first is our contours, and the second is one we named **hierarchy** (please check the code in the previous articles). But we never used this hierarchy anywhere. So what is this hierarchy and what is it for? What is its relationship with the previously mentioned function argument?

That is what we are going to deal with in this article.

### What is Hierarchy?

Normally we use the **cv.findContours()** function to detect objects in an image, right? Sometimes objects are in different locations. But in some cases, some shapes are inside other shapes, just like nested figures. In this case, we call the outer one the **parent** and the inner one the **child**. This way, the contours in an image have some relationship to each other, and we can specify how one contour is connected to another: whether it is the child of some other contour, or a parent, etc. The representation of this relationship is called the **Hierarchy**.

Consider an example image below :

In this image, there are a few shapes which I have numbered from **0-5**. _2 and 2a_ denote the external and internal contours of the outermost box.

Here, contours 0,1,2 are **external or outermost**. We can say, they are in **hierarchy-0** or simply they are in **same hierarchy level**.

Next comes **contour-2a**. It can be considered as a **child of contour-2** (or in opposite way, contour-2 is parent of contour-2a). So let it be in **hierarchy-1**. Similarly contour-3 is child of contour-2a and it comes in next hierarchy. Finally contours 4,5 are the children of contour-3a, and they come in the last hierarchy level. From the way I numbered the boxes, I would say contour-4 is the first child of contour-3a (It can be contour-5 also).

I mentioned these things to understand terms like **same hierarchy level**, **external contour**, **child contour**, **parent contour**, **first child** etc. Now let's get into OpenCV.

### Hierarchy Representation in OpenCV

So each contour has its own information regarding what hierarchy it is, who is its child, who is its parent etc. OpenCV represents it as an array of four values : **\[Next, Previous, First\_Child, Parent\]**

\*"Next denotes next contour at the same hierarchical level."\*

For example, take contour-0 in our picture. Which is the next contour at its level? It is contour-1. So simply put Next = 1. Similarly for contour-1, the next is contour-2. So Next = 2.

What about contour-2? There is no next contour in the same level. So simply, put Next = -1. What about contour-4? It is in same level with contour-5. So its next contour is contour-5, so Next = 5.

\*"Previous denotes previous contour at the same hierarchical level."\*

It is same as above. Previous contour of contour-1 is contour-0 in the same level. Similarly for contour-2, it is contour-1. And for contour-0, there is no previous, so put it as -1.

\*"First\_Child denotes its first child contour."\*

There is no need of any explanation. For contour-2, child is contour-2a. So it gets the corresponding index value of contour-2a. What about contour-3a? It has two children. But we take only first child. And it is contour-4. So First\_Child = 4 for contour-3a.

\*"Parent denotes index of its parent contour."\*

It is just opposite of **First\_Child**. Both for contour-4 and contour-5, parent contour is contour-3a. For contour-3a, it is contour-3 and so on.

@note If there is no child or parent, that field is taken as -1

Now that we know about the hierarchy style used in OpenCV, we can look into the Contour Retrieval Modes in OpenCV with the help of the same image given above, i.e. what do flags like cv.RETR\_LIST, cv.RETR\_TREE, cv.RETR\_CCOMP, cv.RETR\_EXTERNAL etc. mean?

## Contour Retrieval Mode

### 1\. RETR\_LIST

This is the simplest of the four flags (from an explanation point of view). It simply retrieves all the contours, but doesn't create any parent-child relationship. **Parents and kids are equal under this rule, and they are just contours**, i.e. they all belong to the same hierarchy level.

So here, the 3rd and 4th terms in the hierarchy array are always -1. But obviously, the Next and Previous terms will have their corresponding values.

### 2\. RETR\_EXTERNAL

If you use this flag, it returns only the extreme outer contours. All child contours are left behind. **We can say, under this law, only the eldest in every family is taken care of. It doesn't care about other members of the family**.

### 3\. RETR\_CCOMP

This flag retrieves all the contours and arranges them into a 2-level hierarchy, i.e. the external contours of an object (its boundary) are placed in hierarchy-1, and the contours of holes inside the object (if any) are placed in hierarchy-2. If there is any object inside a hole, its contour is placed in hierarchy-1 again, and its holes in hierarchy-2, and so on.

Just consider the image of a "big white zero" on a black background. The outer circle of the zero belongs to the first hierarchy, and the inner circle of the zero belongs to the second hierarchy.

We can explain it with a simple image. Here I have labelled the order of the contours in red and the hierarchy they belong to in green (either 1 or 2). The order is the same as the order in which OpenCV detects contours.

So consider the first contour, i.e. contour-0. It is in hierarchy-1. It has two holes, contours 1 and 2, and they belong to hierarchy-2. So for contour-0, the next contour in the same hierarchy level is contour-3, and there is no previous one. Its first child is contour-1 in hierarchy-2. It has no parent, because it is in hierarchy-1. So its hierarchy array is \[3,-1,1,-1\].

Now take contour-1. It is in hierarchy-2. The next one in the same hierarchy (under the parenthood of contour-0) is contour-2. No previous one. No child, but the parent is contour-0. So the array is \[2,-1,-1,0\].

Similarly contour-2: It is in hierarchy-2. There is no next contour in the same hierarchy under contour-0. So no Next. Previous is contour-1. No child, and the parent is contour-0. So the array is \[-1,1,-1,0\].

Contour-3: Next in hierarchy-1 is contour-5. Previous is contour-0. Child is contour-4 and there is no parent. So the array is \[5,0,4,-1\].

Contour-4: It is in hierarchy-2 under contour-3 and it has no sibling. So no next, no previous, no child, and the parent is contour-3. So the array is \[-1,-1,-1,3\].
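The five arrays worked out above are enough to show how the \[Next, Previous, First\_Child, Parent\] links are followed in practice. In this sketch the hierarchy is written as a plain nested array for readability, which is an assumed illustration format rather than how the hierarchy Mat is laid out; the remaining contours of the picture are omitted, and `childrenOf` and `depthOf` are hypothetical helpers.

```javascript
const NEXT = 0, FIRST_CHILD = 2, PARENT = 3; // index 1 (Previous) is not needed here

// Rows for contour-0 .. contour-4, copied from the RETR_CCOMP walkthrough.
const hierarchy = [
  [ 3, -1,  1, -1], // contour-0
  [ 2, -1, -1,  0], // contour-1
  [-1,  1, -1,  0], // contour-2
  [ 5,  0,  4, -1], // contour-3
  [-1, -1, -1,  3], // contour-4
];

// Start at First_Child, then follow Next until it becomes -1.
function childrenOf(h, index) {
  const kids = [];
  for (let c = h[index][FIRST_CHILD]; c !== -1; c = h[c][NEXT]) kids.push(c);
  return kids;
}

// Count how many Parent links lead up to an outermost contour.
function depthOf(h, index) {
  let depth = 0;
  for (let p = h[index][PARENT]; p !== -1; p = h[p][PARENT]) depth++;
  return depth;
}
```

Here `childrenOf(hierarchy, 0)` visits contour-1 and then contour-2, and `depthOf(hierarchy, 4)` follows a single Parent link to contour-3.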

### 4\. RETR\_TREE

And this is the final guy, Mr.Perfect. It retrieves all the contours and creates a full family hierarchy list. **It even tells who is the grandpa, father, son, grandson and even beyond... :)**.

For example, I took the above image, rewrote the code for cv.RETR\_TREE, reordered the contours as per the result given by OpenCV and analyzed it. Again, red letters give the contour number and green letters give the hierarchy order.

Take contour-0: It is in hierarchy-0. The next contour in the same hierarchy is contour-7. No previous contours. Child is contour-1. And no parent. So the array is \[7,-1,1,-1\].

Take contour-1: It is in hierarchy-1. No contour in the same level. No previous one. Child is contour-2. Parent is contour-0. So the array is \[-1,-1,2,0\].

## [Js Contours More Functions](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_contours_more_functions/js_contours_more_functions/)

# Contours : More Functions {#tutorial\_js\_contours\_more\_functions}

@prev\_tutorial{tutorial\_js\_contour\_properties} @next\_tutorial{tutorial\_js\_contours\_hierarchy}

## Goal

-   Convexity defects and how to find them.
-   Finding shortest distance from a point to a polygon
-   Matching different shapes

## Theory and Code

### 1\. Convexity Defects

We saw what a convex hull is in the second chapter about contours. Any deviation of the object from this hull can be considered a convexity defect. We can visualize it using an image. We draw a line joining the start point and end point, then draw a circle at the farthest point.

@note Remember we have to pass returnPoints = false while finding the convex hull, in order to find convexity defects.

We use the function: **cv.convexityDefects (contour, convexhull, convexityDefect)**

-   @param contour input contour.
-   @param convexhull convex hull obtained using convexHull that should contain indices of the contour points that make the hull.
-   @param convexityDefect the output vector of convexity defects. Each convexity defect is represented as a 4-element vector (start\_index, end\_index, farthest\_pt\_index, fixpt\_depth), where the indices are 0-based indices in the original contour of the convexity defect beginning, end and the farthest point, and fixpt\_depth is a fixed-point approximation (with 8 fractional bits) of the distance between the farthest contour point and the hull. That is, the floating-point value of the depth is fixpt\_depth/256.0.

### 2\. Point Polygon Test

This function finds the shortest distance between a point in the image and a contour. It returns a distance which is negative when the point is outside the contour, positive when the point is inside, and zero if the point is on the contour.

We use the function: **cv.pointPolygonTest (contour, pt, measureDist)**

-   @param contour input contour.
-   @param pt point tested against the contour.
-   @param measureDist if true, the function estimates the signed distance from the point to the nearest contour edge. Otherwise, the function only checks if the point is inside a contour or not.

@code{.js}
let dist = cv.pointPolygonTest(cnt, new cv.Point(50, 50), true);
@endcode
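The sign convention can be reproduced with a small plain JavaScript sketch that combines a ray-casting inside test with the distance to the nearest edge. `signedDistance` and `distToSegment` are hypothetical helpers on `{x, y}` points, intended only to illustrate the meaning of the returned value when measureDist is true.

```javascript
// Distance from p to the closest point of segment a-b.
function distToSegment(p, a, b) {
  const dx = b.x - a.x, dy = b.y - a.y;
  const lenSq = dx * dx + dy * dy;
  let t = lenSq === 0 ? 0 : ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq;
  t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
  return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Positive inside, negative outside, zero on the polygon boundary.
function signedDistance(polygon, p) {
  let inside = false, minDist = Infinity;
  const n = polygon.length;
  for (let i = 0, j = n - 1; i < n; j = i++) {
    const a = polygon[j], b = polygon[i];
    minDist = Math.min(minDist, distToSegment(p, a, b));
    // Ray casting: toggle on every edge crossed by a ray going right from p.
    const crosses = (a.y > p.y) !== (b.y > p.y) &&
      p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x;
    if (crosses) inside = !inside;
  }
  if (minDist === 0) return 0;
  return inside ? minDist : -minDist;
}
```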

### 3\. Match Shapes

OpenCV comes with a function **cv.matchShapes()** which enables us to compare two shapes, or two contours, and returns a metric showing the similarity. The lower the result, the better the match. It is calculated based on the Hu moment values. Different measurement methods are explained in the docs.

We use the function: **cv.matchShapes (contour1, contour2, method, parameter)**

-   @param contour1 first contour or grayscale image.
-   @param contour2 second contour or grayscale image.
-   @param method comparison method, see cv::ShapeMatchModes.
-   @param parameter method-specific parameter (not supported now).

## [Js Table Of Contents Contours](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_contours/js_table_of_contents_contours/)

# Contours in OpenCV.js {#tutorial\_js\_table\_of\_contents\_contours}

-   @subpage tutorial\_js\_contours\_begin
    
    Learn to find and draw Contours.
    
-   @subpage tutorial\_js\_contour\_features
    
    Learn to find different features of contours like area, perimeter, bounding rectangle etc.
    
-   @subpage tutorial\_js\_contour\_properties
    
    Learn to find different properties of contours like Solidity, Mean Intensity etc.
    
-   @subpage tutorial\_js\_contours\_more\_functions
    
    Learn to find convexity defects, pointPolygonTest, match different shapes etc.
    
-   @subpage tutorial\_js\_contours\_hierarchy
    
    Learn about Contour Hierarchy

## [Js Filtering](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_filtering/js_filtering/)

# Smoothing Images {#tutorial\_js\_filtering}

## Goals

-   Blur the images with various low pass filters
-   Apply custom-made filters to images (2D convolution)

## 2D Convolution ( Image Filtering )

As with one-dimensional signals, images can also be filtered with various low-pass filters (LPF), high-pass filters (HPF), etc. An LPF helps in removing noise, blurring the image, etc. An HPF helps in finding edges in the image.

OpenCV provides a function **cv.filter2D()** to convolve a kernel with an image. As an example, we will try an averaging filter on an image. A 5x5 averaging filter kernel will look like below:

\\f\[K = \\frac{1}{25} \\begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\\\ 1 & 1 & 1 & 1 & 1 \\\\ 1 & 1 & 1 & 1 & 1 \\\\ 1 & 1 & 1 & 1 & 1 \\\\ 1 & 1 & 1 & 1 & 1 \\end{bmatrix}\\f\]

We use the function: **cv.filter2D (src, dst, ddepth, kernel, anchor = new cv.Point(-1, -1), delta = 0, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image of the same size and the same number of channels as src.
-   @param ddepth desired depth of the destination image.
-   @param kernel convolution kernel (or rather a correlation kernel), a single-channel floating point matrix; if you want to apply different kernels to different channels, split the image into separate color planes using split and process them individually.
-   @param anchor anchor of the kernel that indicates the relative position of a filtered point within the kernel; the anchor should lie within the kernel; the default value new cv.Point(-1, -1) means that the anchor is at the kernel center.
-   @param delta optional value added to the filtered pixels before storing them in dst.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
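What the filter computes at each pixel can be written out directly for a single-channel image stored as a 2D array. This is a sketch under simplifying assumptions: `filter2DSketch` is a hypothetical helper, the anchor is fixed at the kernel center, delta is 0, and pixels outside the image are taken from the nearest edge pixel, which is a simpler rule than the reflected border OpenCV uses by default.

```javascript
// Correlate src with kernel: each output pixel is the weighted sum of its neighborhood.
function filter2DSketch(src, kernel) {
  const h = src.length, w = src[0].length;
  const kh = kernel.length, kw = kernel[0].length;
  const ay = Math.floor(kh / 2), ax = Math.floor(kw / 2); // anchor at the center
  const clamp = (v, lo, hi) => Math.max(lo, Math.min(hi, v));
  const dst = [];
  for (let y = 0; y < h; y++) {
    const row = [];
    for (let x = 0; x < w; x++) {
      let sum = 0;
      for (let ky = 0; ky < kh; ky++) {
        for (let kx = 0; kx < kw; kx++) {
          const sy = clamp(y + ky - ay, 0, h - 1); // replicate edge pixels
          const sx = clamp(x + kx - ax, 0, w - 1);
          sum += kernel[ky][kx] * src[sy][sx];
        }
      }
      row.push(sum);
    }
    dst.push(row);
  }
  return dst;
}

// The 5x5 averaging kernel K from the text: every entry is 1/25.
const averagingKernel = Array.from({ length: 5 }, () => new Array(5).fill(1 / 25));
```

Because the entries of the averaging kernel sum to 1, a region of constant intensity is left unchanged, while isolated bright pixels are spread over their neighborhood.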

## Image Blurring (Image Smoothing)

Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing noise. It actually removes high-frequency content (e.g. noise, edges) from the image, so edges are blurred a little in this operation. (There are also blurring techniques which don't blur the edges.) OpenCV provides mainly four types of blurring techniques.

### 1\. Averaging

This is done by convolving the image with a normalized box filter. It simply takes the average of all the pixels under the kernel area and replaces the central element with this average. This is done by the function **cv.blur()** or **cv.boxFilter()**. Check the docs for more details about the kernel. We should specify the width and height of the kernel. A 3x3 normalized box filter would look like below:

\\f\[K = \\frac{1}{9} \\begin{bmatrix} 1 & 1 & 1 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix}\\f\]

We use the functions: **cv.blur (src, dst, ksize, anchor = new cv.Point(-1, -1), borderType = cv.BORDER\_DEFAULT)**

-   @param src input image; it can have any number of channels, which are processed independently, but the depth should be CV\_8U, CV\_16U, CV\_16S, CV\_32F or CV\_64F.
-   @param dst output image of the same size and type as src.
-   @param ksize blurring kernel size.
-   @param anchor anchor point; anchor = new cv.Point(-1, -1) means that the anchor is at the kernel center.
-   @param borderType border mode used to extrapolate pixels outside of the image (see cv.BorderTypes).

**cv.boxFilter (src, dst, ddepth, ksize, anchor = new cv.Point(-1, -1), normalize = true, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image of the same size and type as src.
-   @param ddepth the output image depth (-1 to use src.depth()).
-   @param ksize blurring kernel size.
-   @param anchor anchor point; anchor = new cv.Point(-1, -1) means that the anchor is at the kernel center.
-   @param normalize flag, specifying whether the kernel is normalized by its area or not.
-   @param borderType border mode used to extrapolate pixels outside of the image (see cv.BorderTypes).

@note If you don't want to use normalized box filter, use **cv.boxFilter()**. Pass an argument normalize = false to the function.

### 2\. Gaussian Blurring

Here, a Gaussian kernel is used instead of a box filter.

We use the function: **cv.GaussianBlur (src, dst, ksize, sigmaX, sigmaY = 0, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image; the image can have any number of channels, which are processed independently, but the depth should be CV\_8U, CV\_16U, CV\_16S, CV\_32F or CV\_64F.
-   @param dst output image of the same size and type as src.
-   @param ksize blurring kernel size.
-   @param sigmaX Gaussian kernel standard deviation in X direction.
-   @param sigmaY Gaussian kernel standard deviation in Y direction; if sigmaY is zero, it is set to be equal to sigmaX; if both sigmas are zero, they are computed from ksize.width and ksize.height. To fully control the result regardless of possible future modifications of these semantics, it is recommended to specify all of ksize, sigmaX, and sigmaY.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
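
To make the role of ksize and sigma more concrete, the following plain-JavaScript sketch builds normalized one-dimensional Gaussian weights. `gaussianKernel1D` is a hypothetical helper written for this example, not an OpenCV.js call. The fallback used when sigma is not positive is the formula given in the OpenCV `getGaussianKernel` documentation; the library may use precomputed coefficients for small kernels, so its exact values can differ from this sketch.

```javascript
// Sketch only: `gaussianKernel1D` is a hypothetical helper, not an OpenCV.js function.
function gaussianKernel1D(ksize, sigma) {
  if (!Number.isInteger(ksize) || ksize <= 0 || ksize % 2 === 0) {
    throw new Error("ksize must be a positive odd integer");
  }
  // Fallback taken from the getGaussianKernel documentation:
  // sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
  if (sigma <= 0) {
    sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8;
  }
  const center = (ksize - 1) / 2;
  const weights = [];
  let sum = 0;
  for (let i = 0; i < ksize; i++) {
    const d = i - center;
    const w = Math.exp(-(d * d) / (2 * sigma * sigma));
    weights.push(w);
    sum += w;
  }
  // Normalize so the weights add up to 1 and overall brightness is preserved.
  return weights.map((w) => w / sum);
}
```

Because the Gaussian is separable, a 2D kernel can be obtained as the outer product of a vertical kernel (built from ksize.height and sigmaY) and a horizontal kernel (built from ksize.width and sigmaX).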

### 3\. Median Blurring

Here, the function **cv.medianBlur()** takes the median of all the pixels under the kernel area, and the central element is replaced with this median value. This is highly effective against salt-and-pepper noise in images. Interestingly, in the filters above, the central element is a newly calculated value which may or may not be a pixel value that exists in the image. In median blurring, however, the central element is always replaced by some pixel value from the image. It reduces the noise effectively. Its kernel size should be a positive odd integer.

We use the function: **cv.medianBlur (src, dst, ksize)**

-   @param src input 1, 3, or 4 channel image; when ksize is 3 or 5, the image depth should be cv.CV\_8U, cv.CV\_16U, or cv.CV\_32F; for larger aperture sizes, it can only be cv.CV\_8U.
-   @param dst destination array of the same size and type as src.
-   @param ksize aperture linear size; it must be odd and greater than 1, for example: 3, 5, 7 ...

@note The median filter uses cv.BORDER\_REPLICATE internally to cope with border pixels.
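
The idea can be sketched in plain JavaScript as follows. `medianBlurSketch` and `clampIndex` are hypothetical helpers for a single-channel image stored as an array of rows; border pixels are replicated, in line with the note above. This is only an illustration of the principle, not how OpenCV implements the filter.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.

// Replicate the border: indices outside the image are clamped to the edge.
function clampIndex(i, n) {
  return Math.min(Math.max(i, 0), n - 1);
}

function medianBlurSketch(image, ksize) {
  if (!Number.isInteger(ksize) || ksize < 3 || ksize % 2 === 0) {
    throw new Error("ksize must be odd and greater than 1");
  }
  const rows = image.length;
  const cols = image[0].length;
  const r = (ksize - 1) / 2;
  const out = [];
  for (let y = 0; y < rows; y++) {
    const outRow = [];
    for (let x = 0; x < cols; x++) {
      const window = [];
      for (let dy = -r; dy <= r; dy++) {
        for (let dx = -r; dx <= r; dx++) {
          window.push(image[clampIndex(y + dy, rows)][clampIndex(x + dx, cols)]);
        }
      }
      window.sort((a, b) => a - b);
      // ksize * ksize is odd, so the middle element is the median,
      // and it is always a value that already exists in the window.
      outRow.push(window[(window.length - 1) / 2]);
    }
    out.push(outRow);
  }
  return out;
}
```

A single "salt" pixel surrounded by uniform values is removed completely, because it can never be the middle element of the sorted window.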

### 4\. Bilateral Filtering

**cv.bilateralFilter()** is highly effective at noise removal while keeping edges sharp, but it is slower than the other filters. We already saw that a Gaussian filter takes a neighbourhood around the pixel and finds its Gaussian weighted average. This Gaussian filter is a function of space alone, that is, only nearby pixels are considered while filtering. It does not consider whether the pixels have almost the same intensity, or whether a pixel is an edge pixel or not. So it blurs the edges too, which we don't want.

The bilateral filter also uses a Gaussian filter in space, plus one more Gaussian filter which is a function of pixel difference. The Gaussian function of space makes sure that only nearby pixels are considered for blurring, while the Gaussian function of intensity difference makes sure that only those pixels with an intensity similar to the central pixel are considered for blurring. So it preserves the edges, since pixels at edges have a large intensity variation.

We use the function: **cv.bilateralFilter (src, dst, d, sigmaColor, sigmaSpace, borderType = cv.BORDER\_DEFAULT)**

-   @param src source 8-bit or floating-point, 1-channel or 3-channel image.
-   @param dst output image of the same size and type as src.
-   @param d diameter of each pixel neighborhood that is used during filtering. If it is non-positive, it is computed from sigmaSpace.
-   @param sigmaColor filter sigma in the color space. A larger value of the parameter means that farther colors within the pixel neighborhood will be mixed together, resulting in larger areas of semi-equal color.
-   @param sigmaSpace filter sigma in the coordinate space. A larger value of the parameter means that farther pixels will influence each other as long as their colors are close enough. When d>0, it specifies the neighborhood size regardless of sigmaSpace. Otherwise, d is proportional to sigmaSpace.
-   @param borderType border mode used to extrapolate pixels outside of the image (see cv.BorderTypes).

@note For simplicity, you can set the 2 sigma values to be the same. If they are small (< 10), the filter will not have much effect, whereas if they are large (> 150), they will have a very strong effect, making the image look "cartoonish". Large filters (d > 5) are very slow, so it is recommended to use d=5 for real-time applications, and perhaps d=9 for offline applications that need heavy noise filtering.
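
The two Gaussian weights described above can be illustrated on a one-dimensional signal. The sketch below is plain JavaScript with a hypothetical helper `bilateral1D`; it truncates the window at the signal ends instead of extrapolating a border, which is a simplification made for this example and not what OpenCV does.

```javascript
// Sketch only: `bilateral1D` is a hypothetical helper, not an OpenCV.js function.
function bilateral1D(signal, radius, sigmaColor, sigmaSpace) {
  const out = [];
  for (let i = 0; i < signal.length; i++) {
    let weightedSum = 0;
    let weightTotal = 0;
    const from = Math.max(0, i - radius);
    const to = Math.min(signal.length - 1, i + radius);
    for (let j = from; j <= to; j++) {
      const ds = j - i;                 // distance in space
      const dc = signal[j] - signal[i]; // difference in intensity
      const spatial = Math.exp(-(ds * ds) / (2 * sigmaSpace * sigmaSpace));
      const range = Math.exp(-(dc * dc) / (2 * sigmaColor * sigmaColor));
      const w = spatial * range;
      weightedSum += w * signal[j];
      weightTotal += w;
    }
    out.push(weightedSum / weightTotal);
  }
  return out;
}

// A noisy step edge: small fluctuations on each side of a large jump.
const noisyStep = [10, 12, 10, 11, 200, 202, 200, 201];
const smoothedStep = bilateral1D(noisyStep, 2, 10, 2);
```

Samples on the far side of the jump receive a range weight close to zero, so each side is smoothed separately and the step itself stays sharp.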

## [Js Geometric Transformations](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_geometric_transformations/js_geometric_transformations/)

# Geometric Transformations of Images {#tutorial\_js\_geometric\_transformations}

## Goals

-   Learn how to apply different geometric transformations to images, like translation, rotation, affine transformation, etc.
-   You will learn these functions: **cv.resize**, **cv.warpAffine**, **cv.getAffineTransform** and **cv.warpPerspective**

## Transformations

### Scaling

Scaling is just resizing of the image. OpenCV comes with a function **cv.resize()** for this purpose. The size of the image can be specified manually, or you can specify the scaling factor. Different interpolation methods are used. Preferable interpolation methods are **cv.INTER\_AREA** for shrinking and **cv.INTER\_CUBIC** (slow) & **cv.INTER\_LINEAR** for zooming.

We use the function: **cv.resize (src, dst, dsize, fx = 0, fy = 0, interpolation = cv.INTER\_LINEAR)**

-   @param src input image.
-   @param dst output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
-   @param dsize output image size; if it equals zero, it is computed as: \\f\[\\texttt{dsize} = \\texttt{Size}(\\texttt{round}(\\texttt{fx} \\cdot \\texttt{src.cols}), \\texttt{round}(\\texttt{fy} \\cdot \\texttt{src.rows}))\\f\] Either dsize or both fx and fy must be non-zero.
-   @param fx scale factor along the horizontal axis; when it equals 0, it is computed as \\f\[\\texttt{(double)dsize.width} / \\texttt{src.cols}\\f\]
-   @param fy scale factor along the vertical axis; when it equals 0, it is computed as \\f\[\\texttt{(double)dsize.height} / \\texttt{src.rows}\\f\]
-   @param interpolation interpolation method (see **cv.InterpolationFlags**).
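
The relationship between dsize, fx and fy can be written out as a small plain-JavaScript sketch. `resolveResize` is a hypothetical helper for this example; it assumes that a non-zero dsize takes precedence, and it uses `Math.round`, which may treat exact halves differently from the rounding used inside OpenCV.

```javascript
// Sketch only: `resolveResize` is a hypothetical helper, not an OpenCV.js function.
// dsize is an object { width, height }; zero means "compute from fx and fy".
function resolveResize(srcCols, srcRows, dsize, fx = 0, fy = 0) {
  if (dsize.width > 0 && dsize.height > 0) {
    // Output size given explicitly: the scale factors follow from it.
    return {
      width: dsize.width,
      height: dsize.height,
      fx: dsize.width / srcCols,
      fy: dsize.height / srcRows,
    };
  }
  if (fx > 0 && fy > 0) {
    // Scale factors given: the output size follows from them.
    return {
      width: Math.round(fx * srcCols),
      height: Math.round(fy * srcRows),
      fx,
      fy,
    };
  }
  throw new Error("Either dsize or both fx and fy must be non-zero");
}
```

For example, `resolveResize(640, 480, { width: 0, height: 0 }, 0.5, 0.5)` describes halving a 640x480 image, while `resolveResize(640, 480, { width: 1280, height: 240 })` derives the per-axis scale factors from an explicit size.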

### Translation

Translation is the shifting of an object's location. If you know the shift in the (x,y) direction, let it be \\f$(t\_x,t\_y)\\f$, you can create the transformation matrix \\f$\\textbf{M}\\f$ as follows:

\\f\[M = \\begin{bmatrix} 1 & 0 & t\_x \\\\ 0 & 1 & t\_y \\end{bmatrix}\\f\]

We use the function: **cv.warpAffine (src, dst, M, dsize, flags = cv.INTER\_LINEAR, borderMode = cv.BORDER\_CONSTANT, borderValue = new cv.Scalar())**

-   @param src input image.
-   @param dst output image that has the size dsize and the same type as src.
-   @param M 2 × 3 transformation matrix (cv.CV\_64FC1 type).
-   @param dsize size of the output image.
-   @param flags combination of interpolation methods (see cv.InterpolationFlags) and the optional flag WARP\_INVERSE\_MAP, which means that M is the inverse transformation (dst → src).
-   @param borderMode pixel extrapolation method (see cv.BorderTypes); when borderMode = BORDER\_TRANSPARENT, it means that the pixels in the destination image corresponding to the "outliers" in the source image are not modified by the function.
-   @param borderValue value used in case of a constant border; by default, it is 0.

### Rotation

Rotation of an image by an angle \\f$\\theta\\f$ is achieved by a transformation matrix of the form

\\f\[M = \\begin{bmatrix} \\cos\\theta & -\\sin\\theta \\\\ \\sin\\theta & \\cos\\theta \\end{bmatrix}\\f\]

But OpenCV provides scaled rotation with an adjustable center of rotation, so that you can rotate about any location you prefer. The modified transformation matrix is given by

\\f\[\\begin{bmatrix} \\alpha & \\beta & (1- \\alpha ) \\cdot center.x - \\beta \\cdot center.y \\\\ - \\beta & \\alpha & \\beta \\cdot center.x + (1- \\alpha ) \\cdot center.y \\end{bmatrix}\\f\]

where:

\\f\[\\begin{array}{l} \\alpha = scale \\cdot \\cos \\theta , \\\\ \\beta = scale \\cdot \\sin \\theta \\end{array}\\f\]

We use the function: **cv.getRotationMatrix2D (center, angle, scale)**

-   @param center center of the rotation in the source image.
-   @param angle rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
-   @param scale isotropic scale factor.
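
The modified matrix above can be evaluated directly. The following plain-JavaScript sketch uses two hypothetical helpers, `rotationMatrix2D` and `applyAffine`, which only reproduce the formula from this section on ordinary arrays; they are not the OpenCV.js implementation.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.

// Build the 2x3 scaled-rotation matrix from the formula in the text.
function rotationMatrix2D(center, angleDeg, scale) {
  const theta = (angleDeg * Math.PI) / 180;
  const alpha = scale * Math.cos(theta);
  const beta = scale * Math.sin(theta);
  return [
    [alpha, beta, (1 - alpha) * center.x - beta * center.y],
    [-beta, alpha, beta * center.x + (1 - alpha) * center.y],
  ];
}

// Map a point through a 2x3 affine matrix.
function applyAffine(M, p) {
  return {
    x: M[0][0] * p.x + M[0][1] * p.y + M[0][2],
    y: M[1][0] * p.x + M[1][1] * p.y + M[1][2],
  };
}
```

With `center = { x: 50, y: 50 }`, an angle of 90 degrees and a scale of 1, the center maps to itself, and the point (60, 50) to the right of the center moves to approximately (50, 40), that is, above it. This matches a counter-clockwise rotation in image coordinates, where y grows downwards.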

### Affine Transformation

In affine transformation, all parallel lines in the original image will still be parallel in the output image. To find the transformation matrix, we need three points from input image and their corresponding locations in output image. Then **cv.getAffineTransform** will create a 2x3 matrix which is to be passed to **cv.warpAffine**.

We use the function: **cv.getAffineTransform (src, dst)**

-   @param src three points (\[3, 1\] size and cv.CV\_32FC2 type) from the input image.
-   @param dst three corresponding points (\[3, 1\] size and cv.CV\_32FC2 type) in the output image.
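
To see why three point pairs are enough, note that each row of the 2x3 matrix has three unknowns, and the three correspondences give three linear equations per row. The plain-JavaScript sketch below solves them with Cramer's rule. `det3`, `solveRow` and `affineFromThreePoints` are hypothetical helpers for illustration, not the method OpenCV uses internally.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.

// Determinant of a 3x3 matrix given as an array of rows.
function det3(m) {
  return (
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
  );
}

// Solve A * [a, b, c]^T = rhs with Cramer's rule.
function solveRow(A, rhs) {
  const d = det3(A);
  if (Math.abs(d) < 1e-12) {
    throw new Error("source points are collinear");
  }
  const coeffs = [];
  for (let col = 0; col < 3; col++) {
    const Ac = A.map((row, i) => row.map((v, j) => (j === col ? rhs[i] : v)));
    coeffs.push(det3(Ac) / d);
  }
  return coeffs;
}

// src and dst are arrays of three points { x, y }.
// Returns the 2x3 matrix M such that M * [x, y, 1]^T gives the dst point.
function affineFromThreePoints(src, dst) {
  const A = src.map((p) => [p.x, p.y, 1]);
  return [
    solveRow(A, dst.map((p) => p.x)),
    solveRow(A, dst.map((p) => p.y)),
  ];
}
```

If the three source points are shifted by (10, 20) to obtain the destination points, the result is the translation matrix with \\f$t\_x = 10\\f$ and \\f$t\_y = 20\\f$.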

### Perspective Transformation

For perspective transformation, you need a 3x3 transformation matrix. Straight lines remain straight after the transformation. To find this transformation matrix, you need 4 points on the input image and the corresponding points on the output image. Among these 4 points, no 3 should be collinear. The transformation matrix can then be found with the function **cv.getPerspectiveTransform**. Then apply **cv.warpPerspective** with this 3x3 transformation matrix.

We use the functions: **cv.warpPerspective (src, dst, M, dsize, flags = cv.INTER\_LINEAR, borderMode = cv.BORDER\_CONSTANT, borderValue = new cv.Scalar())**

-   @param src input image.
-   @param dst output image that has the size dsize and the same type as src.
-   @param M 3 × 3 transformation matrix (cv.CV\_64FC1 type).
-   @param dsize size of the output image.
-   @param flags combination of interpolation methods (cv.INTER\_LINEAR or cv.INTER\_NEAREST) and the optional flag WARP\_INVERSE\_MAP, which sets M as the inverse transformation (dst → src).
-   @param borderMode pixel extrapolation method (cv.BORDER\_CONSTANT or cv.BORDER\_REPLICATE).
-   @param borderValue value used in case of a constant border; by default, it is 0.

**cv.getPerspectiveTransform (src, dst)**

-   @param src coordinates of quadrangle vertices in the source image.
-   @param dst coordinates of the corresponding quadrangle vertices in the destination image.

## [Js Grabcut](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_grabcut/js_grabcut/)

# Foreground Extraction using GrabCut Algorithm {#tutorial\_js\_grabcut}

## Goal

-   We will learn GrabCut algorithm to extract foreground in images

## Theory

The GrabCut algorithm was designed by Carsten Rother, Vladimir Kolmogorov & Andrew Blake from Microsoft Research Cambridge, UK, in their paper ["GrabCut": interactive foreground extraction using iterated graph cuts](http://dl.acm.org/citation.cfm?id=1015720). An algorithm was needed for foreground extraction with minimal user interaction, and the result was GrabCut.

How does it work from the user's point of view? Initially the user draws a rectangle around the foreground region (the foreground region should be completely inside the rectangle). Then the algorithm segments it iteratively to get the best result. Done. But in some cases the segmentation won't be good; for example, it may have marked some foreground region as background and vice versa. In that case, the user needs to do fine touch-ups: just draw some strokes on the image where the results are faulty. The strokes basically say _"Hey, this region should be foreground, you marked it background, correct it in the next iteration"_, or the opposite for background. Then in the next iteration, you get better results.

What happens in the background?

-   The user inputs the rectangle. Everything outside this rectangle is taken as sure background (this is why the rectangle should include the whole object). Everything inside the rectangle is unknown. Similarly, any user input specifying foreground or background is treated as hard-labelling, which means those labels won't change during the process.
-   The computer does an initial labelling depending on the data we gave. It labels the foreground and background pixels (or it hard-labels them).
-   Now a Gaussian Mixture Model (GMM) is used to model the foreground and background.
-   Depending on the data we gave, the GMM learns and creates a new pixel distribution. That is, the unknown pixels are labelled either probable foreground or probable background depending on their relation to the other hard-labelled pixels in terms of color statistics (it is just like clustering).
-   A graph is built from this pixel distribution. Nodes in the graph are pixels. Two additional nodes are added, the **Source node** and the **Sink node**. Every foreground pixel is connected to the Source node and every background pixel is connected to the Sink node.
-   The weights of the edges connecting pixels to the Source node/Sink node are defined by the probability of a pixel being foreground/background. The weights between the pixels are defined by the edge information or pixel similarity. If there is a large difference in pixel color, the edge between them gets a low weight.
-   Then a mincut algorithm is used to segment the graph. It cuts the graph in two, separating the Source node and the Sink node, with a minimum cost function. The cost function is the sum of the weights of all the edges that are cut. After the cut, all the pixels connected to the Source node become foreground and those connected to the Sink node become background.
-   The process is continued until the classification converges.

The process is illustrated in the image below (Image Courtesy: [http://www.cs.ru.ac.za/research/g02m1682/](http://www.cs.ru.ac.za/research/g02m1682/))

## Demo

We use the function: **cv.grabCut (image, mask, rect, bgdModel, fgdModel, iterCount, mode = cv.GC\_EVAL)**

-   @param image input 8-bit 3-channel image.
-   @param mask input/output 8-bit single-channel mask. The mask is initialized by the function when mode is set to GC\_INIT\_WITH\_RECT. Its elements may have one of the cv.grabCutClasses.
-   @param rect ROI containing a segmented object. The pixels outside of the ROI are marked as "obvious background". The parameter is only used when mode==GC\_INIT\_WITH\_RECT.
-   @param bgdModel temporary array for the background model. Do not modify it while you are processing the same image.
-   @param fgdModel temporary array for the foreground model. Do not modify it while you are processing the same image.
-   @param iterCount number of iterations the algorithm should make before returning the result. Note that the result can be refined with further calls with mode==GC\_INIT\_WITH\_MASK or mode==GC\_EVAL.
-   @param mode operation mode that could be one of the cv.GrabCutModes.
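
The first step of the procedure, the initial labelling from the rectangle, can be sketched in plain JavaScript. The constants use the numeric values of OpenCV's GrabCut classes, while `initMaskWithRect` and `foregroundFromMask` are hypothetical helpers that only mimic what happens to the mask; the GMM and graph-cut stages are not shown.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.
// Label values follow OpenCV's GrabCut classes.
const GC_BGD = 0;    // sure background
const GC_FGD = 1;    // sure foreground
const GC_PR_BGD = 2; // probable background
const GC_PR_FGD = 3; // probable foreground

// Everything outside rect is sure background; everything inside is unknown
// and starts as probable foreground.
function initMaskWithRect(rows, cols, rect) {
  const mask = [];
  for (let y = 0; y < rows; y++) {
    const row = new Array(cols).fill(GC_BGD);
    for (let x = 0; x < cols; x++) {
      const inside =
        x >= rect.x && x < rect.x + rect.width &&
        y >= rect.y && y < rect.y + rect.height;
      if (inside) row[x] = GC_PR_FGD;
    }
    mask.push(row);
  }
  return mask;
}

// After segmentation, sure and probable foreground together form the result.
function foregroundFromMask(mask) {
  return mask.map((row) =>
    row.map((label) => (label === GC_FGD || label === GC_PR_FGD ? 1 : 0))
  );
}
```

User strokes correspond to overwriting individual mask entries with `GC_FGD` or `GC_BGD` before running further iterations.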

## [Js Gradients](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_gradients/js_gradients/)

# Image Gradients {#tutorial\_js\_gradients}

## Goal

-   Find Image gradients, edges etc
-   We will learn following functions : **cv.Sobel()**, **cv.Scharr()**, **cv.Laplacian()** etc

## Theory

OpenCV provides three types of gradient filters or High-pass filters, Sobel, Scharr and Laplacian. We will see each one of them.

### 1\. Sobel and Scharr Derivatives

The Sobel operator is a joint Gaussian smoothing plus differentiation operation, so it is more resistant to noise. You can specify the direction of the derivative to be taken, vertical or horizontal (by the arguments dy and dx respectively). You can also specify the size of the kernel by the argument ksize. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel filter. Please see the docs for the kernels used.

We use the functions: **cv.Sobel (src, dst, ddepth, dx, dy, ksize = 3, scale = 1, delta = 0, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image of the same size and the same number of channels as src.
-   @param ddepth output image depth (see cv.combinations); in the case of 8-bit input images it will result in truncated derivatives.
-   @param dx order of the derivative x.
-   @param dy order of the derivative y.
-   @param ksize size of the extended Sobel kernel; it must be 1, 3, 5, or 7.
-   @param scale optional scale factor for the computed derivative values.
-   @param delta optional delta value that is added to the results prior to storing them in dst.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).

**cv.Scharr (src, dst, ddepth, dx, dy, scale = 1, delta = 0, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image of the same size and the same number of channels as src.
-   @param ddepth output image depth (see cv.combinations).
-   @param dx order of the derivative x.
-   @param dy order of the derivative y.
-   @param scale optional scale factor for the computed derivative values.
-   @param delta optional delta value that is added to the results prior to storing them in dst.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).

### 2\. Laplacian Derivatives

It calculates the Laplacian of the image given by the relation, \\f$\\Delta src = \\frac{\\partial ^2{src}}{\\partial x^2} + \\frac{\\partial ^2{src}}{\\partial y^2}\\f$ where each derivative is found using Sobel derivatives. If ksize = 1, then following kernel is used for filtering:

\\f\[kernel = \\begin{bmatrix} 0 & 1 & 0 \\\\ 1 & -4 & 1 \\\\ 0 & 1 & 0 \\end{bmatrix}\\f\]

We use the function: **cv.Laplacian (src, dst, ddepth, ksize = 1, scale = 1, delta = 0, borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image of the same size and the same number of channels as src.
-   @param ddepth output image depth.
-   @param ksize aperture size used to compute the second-derivative filters.
-   @param scale optional scale factor for the computed Laplacian values.
-   @param delta optional delta value that is added to the results prior to storing them in dst.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
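
As a small illustration of the ksize = 1 kernel, the plain-JavaScript sketch below applies it to a single-channel image stored as an array of rows. `laplacian4` is a hypothetical helper; it replicates border pixels for simplicity (an assumption of this sketch) and keeps the signed result, similar to using a floating-point ddepth.

```javascript
// Sketch only: `laplacian4` is a hypothetical helper, not an OpenCV.js function.
const LAPLACIAN_KERNEL = [
  [0, 1, 0],
  [1, -4, 1],
  [0, 1, 0],
];

function laplacian4(image) {
  const rows = image.length;
  const cols = image[0].length;
  // Replicate the border by clamping indices to the image.
  const clamp = (i, n) => Math.min(Math.max(i, 0), n - 1);
  const out = [];
  for (let y = 0; y < rows; y++) {
    const outRow = [];
    for (let x = 0; x < cols; x++) {
      let sum = 0;
      for (let ky = 0; ky < 3; ky++) {
        for (let kx = 0; kx < 3; kx++) {
          const v = image[clamp(y + ky - 1, rows)][clamp(x + kx - 1, cols)];
          sum += LAPLACIAN_KERNEL[ky][kx] * v;
        }
      }
      outRow.push(sum); // signed value, not clipped to 0..255
    }
    out.push(outRow);
  }
  return out;
}
```

A flat region gives 0 everywhere, while an isolated bright pixel gives a negative value at its own position and positive values at its four direct neighbours.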

## One Important Matter!

In our last example, the output datatype is cv.CV\_8U. But there is a slight problem with that. A black-to-white transition is taken as a positive slope (it has a positive value), while a white-to-black transition is taken as a negative slope (it has a negative value). So when you convert the data to cv.CV\_8U, all negative slopes are made zero. In simple words, you miss that edge.

If you want to detect both edges, a better option is to keep the output in a signed or wider datatype, like cv.CV\_16S or cv.CV\_64F, take its absolute value, and then convert back to cv.CV\_8U. For a horizontal Sobel filter, this recovers the white-to-black edges that a direct cv.CV\_8U output misses.
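
The effect is easy to reproduce on a single image row. The plain-JavaScript sketch below uses a simple central difference in place of the full Sobel kernel (a simplification for this example), and the hypothetical helpers `horizontalDerivative` and `saturate8U` stand in for the filtering and the conversion to cv.CV\_8U.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.

// Central difference with kernel [-1, 0, 1]; border samples are replicated.
function horizontalDerivative(row) {
  return row.map((_, x) => {
    const right = row[Math.min(x + 1, row.length - 1)];
    const left = row[Math.max(x - 1, 0)];
    return right - left;
  });
}

// Clip a value to the 8-bit range, as a conversion to an unsigned 8-bit depth would.
function saturate8U(v) {
  return Math.min(255, Math.max(0, Math.round(v)));
}

// A white bar on a black background: one rising and one falling edge.
const exampleRow = [0, 0, 255, 255, 0, 0];

const signedSlope = horizontalDerivative(exampleRow);
// signedSlope is [0, 255, 255, -255, -255, 0]

const direct8U = signedSlope.map(saturate8U);
// direct8U is [0, 255, 255, 0, 0, 0]; the falling edge is lost

const abs8U = signedSlope.map((v) => saturate8U(Math.abs(v)));
// abs8U is [0, 255, 255, 255, 255, 0]; both edges are kept
```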

## [Js Histogram Backprojection](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_histograms/js_histogram_backprojection/js_histogram_backprojection/)

# Histogram - 3 : Histogram Backprojection {#tutorial\_js\_histogram\_backprojection}

## Goal

-   We will learn about histogram backprojection.

## Theory

It was proposed by **Michael J. Swain** and **Dana H. Ballard** in their paper **Indexing via color histograms**.

**What is it actually, in simple words?** It is used for image segmentation or for finding objects of interest in an image. In simple words, it creates an image of the same size (but single channel) as our input image, where each pixel corresponds to the probability of that pixel belonging to our object. In even simpler words, the output image shows our object of interest in brighter white than the rest of the image. That is the intuitive explanation. Histogram backprojection is used with the camshift algorithm, among others.

**How do we do it?** We create a histogram of an image containing our object of interest (in our case, the ground, leaving out the player and other things). The object should fill the image as far as possible for better results. A color histogram is preferred over a grayscale histogram, because the color of the object is a better way to define the object than its grayscale intensity. We then "back-project" this histogram over our test image where we need to find the object; in other words, we calculate the probability of every pixel belonging to the ground and show it. The resulting output, after proper thresholding, gives us the ground alone.
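
The procedure can be sketched on hue values alone. In the plain-JavaScript example below, `hueHistogram` and `backProject` are hypothetical helpers, hue is assumed to lie in the range 0 to 180 as for 8-bit HSV images in OpenCV, and the histogram is scaled so that its largest bin maps to 255. It is a one-dimensional simplification, not the behaviour of cv.calcBackProject in full.

```javascript
// Sketch only: hypothetical helpers, not OpenCV.js functions.

// Map a hue value to its histogram bin.
function hueBin(h, bins, maxHue) {
  return Math.min(bins - 1, Math.floor((h * bins) / maxHue));
}

// Histogram of the hue values taken from the object of interest.
function hueHistogram(hues, bins, maxHue = 180) {
  const hist = new Array(bins).fill(0);
  for (const h of hues) {
    hist[hueBin(h, bins, maxHue)]++;
  }
  return hist;
}

// Replace every test pixel by the (scaled) histogram value of its bin.
function backProject(hues, hist, maxHue = 180) {
  const peak = Math.max(...hist);
  return hues.map((h) => {
    if (peak === 0) return 0;
    return Math.round((255 * hist[hueBin(h, hist.length, maxHue)]) / peak);
  });
}
```

Pixels whose hue is common in the object come out bright, and pixels whose hue never occurs in the object come out black; thresholding the result then separates the object from the rest.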

## Backprojection in OpenCV

We use the functions: **cv.calcBackProject (images, channels, hist, dst, ranges, scale)**

-   @param images source arrays. They all should have the same depth, cv.CV\_8U, cv.CV\_16U or cv.CV\_32F, and the same size. Each of them can have an arbitrary number of channels.
-   @param channels the list of channels used to compute the back projection. The number of channels must match the histogram dimensionality.
-   @param hist input histogram that can be dense or sparse.
-   @param dst destination back projection array that is a single-channel array of the same size and depth as images\[0\].
-   @param ranges array of arrays of the histogram bin boundaries in each dimension (see cv.calcHist).
-   @param scale optional scale factor for the output back projection.

**cv.normalize (src, dst, alpha = 1, beta = 0, norm\_type = cv.NORM\_L2, dtype = -1, mask = new cv.Mat())**

-   @param src input array.
-   @param dst output array of the same size as src.
-   @param alpha norm value to normalize to or the lower range boundary in case of the range normalization.
-   @param beta upper range boundary in case of the range normalization; it is not used for the norm normalization.
-   @param norm\_type normalization type (see cv.NormTypes).
-   @param dtype when negative, the output array has the same type as src; otherwise, it has the same number of channels as src and the depth = CV\_MAT\_DEPTH(dtype).
-   @param mask optional operation mask.

## [Js Histogram Begins](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_histograms/js_histogram_begins/js_histogram_begins/)

# Histograms - 1 : Find, Plot, Analyze !!! {#tutorial\_js\_histogram\_begins}

## Goal

-   Find histograms
-   Plot histograms
-   You will learn the function: **cv.calcHist()**.

## Theory

So what is a histogram? You can consider a histogram as a graph or plot which gives you an overall idea of the intensity distribution of an image. It is a plot with pixel values (typically ranging from 0 to 255, though not always) on the X-axis and the corresponding number of pixels in the image on the Y-axis.

It is just another way of understanding the image. By looking at the histogram of an image, you get an intuition about the contrast, brightness, intensity distribution, etc. of that image. Almost all image processing tools today provide histogram features. Below is an image from the [Cambridge in Colour website](http://www.cambridgeincolour.com/tutorials/histograms1.htm), and I recommend visiting the site for more details.

You can see the image and its histogram. (Remember, this histogram is drawn for a grayscale image, not a color image.) The left region of the histogram shows the amount of darker pixels in the image and the right region shows the amount of brighter pixels. From the histogram, you can see that the dark region is larger than the bright region, and that there are very few midtones (pixel values in the mid-range, say around 127).

## Find Histogram

We use the function: **cv.calcHist (image, channels, mask, hist, histSize, ranges, accumulate = false)**

-   @param image source arrays. They all should have the same depth, cv.CV\_8U, cv.CV\_16U or cv.CV\_32F, and the same size. Each of them can have an arbitrary number of channels.
-   @param channels list of the dims channels used to compute the histogram.
-   @param mask optional mask. If the matrix is not empty, it must be an 8-bit array of the same size as images\[i\]. The non-zero mask elements mark the array elements counted in the histogram.
-   @param hist output histogram (cv.CV\_32F type), which is a dense or sparse dims-dimensional array.
-   @param histSize array of histogram sizes in each dimension.
-   @param ranges array of the dims arrays of the histogram bin boundaries in each dimension.
-   @param accumulate accumulation flag. If it is set, the histogram is not cleared in the beginning when it is allocated. This feature enables you to compute a single histogram from several sets of arrays, or to update the histogram in time.
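
For a single grayscale channel, the meaning of histSize, ranges and mask can be shown with a few lines of plain JavaScript. `grayHistogram` is a hypothetical helper that works on a flat array of pixel values; like cv.calcHist with uniform bins, it treats the upper range boundary as exclusive.

```javascript
// Sketch only: `grayHistogram` is a hypothetical helper, not an OpenCV.js function.
// pixels: flat array of intensities; mask: optional flat array, non-zero = count it.
function grayHistogram(pixels, histSize = 256, range = [0, 256], mask = null) {
  const [lo, hi] = range;
  const hist = new Array(histSize).fill(0);
  pixels.forEach((v, i) => {
    if (mask && !mask[i]) return;   // skip pixels excluded by the mask
    if (v < lo || v >= hi) return;  // the upper boundary is exclusive
    const bin = Math.floor(((v - lo) * histSize) / (hi - lo));
    hist[bin]++;
  });
  return hist;
}
```

With `histSize = 256` and `range = [0, 256]` every intensity gets its own bin; with `histSize = 4` the same range is split into four bins of 64 intensities each.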

## [Js Histogram Equalization](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_histograms/js_histogram_equalization/js_histogram_equalization/)

# Histograms - 2: Histogram Equalization {#tutorial\_js\_histogram\_equalization}

## Goal

-   We will learn the concepts of histogram equalization and use it to improve the contrast of our images.

## Theory

Consider an image whose pixel values are confined to some specific range of values only. For example, a brighter image will have all its pixels confined to high values. But a good image will have pixels from all regions of the intensity range. So you need to stretch this histogram towards both ends (as shown in the image below, from Wikipedia), and that is what Histogram Equalization does (in simple words). This normally improves the contrast of the image.

I would recommend reading the Wikipedia page on [Histogram Equalization](http://en.wikipedia.org/wiki/Histogram_equalization) for more details. It has a very good explanation with worked-out examples, so you should understand almost everything after reading it.

## Histogram Equalization in OpenCV

We use the function: **cv.equalizeHist (src, dst)**

@param src source 8-bit single channel image. @param dst destination image of the same size and type as src.
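
The stretching can be sketched with the textbook mapping through the cumulative distribution function (CDF). The plain-JavaScript helper `equalize` below is hypothetical and works on a flat array of 8-bit values; its rounding is not guaranteed to match cv.equalizeHist exactly.

```javascript
// Sketch only: `equalize` is a hypothetical helper, not an OpenCV.js function.
function equalize(pixels) {
  const total = pixels.length;
  if (total === 0) return [];

  // 1. Histogram of the 256 possible intensities.
  const hist = new Array(256).fill(0);
  for (const v of pixels) hist[v]++;

  // 2. Cumulative distribution function.
  const cdf = [];
  let running = 0;
  for (let i = 0; i < 256; i++) {
    running += hist[i];
    cdf.push(running);
  }

  // 3. Stretch the CDF so that the darkest used level maps to 0
  //    and the brightest maps to 255.
  const cdfMin = cdf.find((c) => c > 0);
  if (cdfMin === total) return pixels.slice(); // only one gray level present
  const lut = cdf.map((c) =>
    Math.max(0, Math.round(((c - cdfMin) * 255) / (total - cdfMin)))
  );

  // 4. Apply the lookup table.
  return pixels.map((v) => lut[v]);
}
```

For example, the narrow range `[100, 120, 140, 160]` is spread over the whole 0 to 255 interval.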

## CLAHE (Contrast Limited Adaptive Histogram Equalization)

In **adaptive histogram equalization**, the image is divided into small blocks called "tiles" (tileSize is 8x8 by default in OpenCV). Each of these blocks is then histogram equalized as usual. So in a small area, the histogram is confined to a small region (unless there is noise). If noise is present, it will be amplified. To avoid this, **contrast limiting** is applied. If any histogram bin is above the specified contrast limit (by default 40 in OpenCV), those pixels are clipped and distributed uniformly to the other bins before applying histogram equalization. After equalization, bilinear interpolation is applied to remove artifacts at the tile borders.

We use the class: **cv.CLAHE (clipLimit = 40, tileGridSize = new cv.Size(8, 8))**

@param clipLimit threshold for contrast limiting. @param tileGridSize size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. tileGridSize defines the number of tiles in row and column.

@note Don't forget to delete CLAHE!
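
The contrast-limiting step for one tile can be sketched as follows. `clipHistogram` is a hypothetical plain-JavaScript helper; it treats clipLimit as an absolute per-bin count and redistributes the excess in a single pass, both of which are simplifying assumptions of this example and may differ from how cv.CLAHE interprets and applies its clipLimit.

```javascript
// Sketch only: `clipHistogram` is a hypothetical helper, not an OpenCV.js function.
function clipHistogram(hist, clipLimit) {
  // Cut every bin at the limit and collect what was cut off.
  let excess = 0;
  const clipped = hist.map((count) => {
    if (count > clipLimit) {
      excess += count - clipLimit;
      return clipLimit;
    }
    return count;
  });
  // Spread the excess uniformly over all bins, so the total count is unchanged.
  const share = excess / hist.length;
  return clipped.map((count) => count + share);
}
```

The clipped histogram is then equalized as usual; because no bin dominates any more, noise in nearly uniform tiles is amplified far less.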

## [Js Table Of Contents Histograms](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_histograms/js_table_of_contents_histograms/)

# Histograms in OpenCV.js {#tutorial\_js\_table\_of\_contents\_histograms}

-   @subpage tutorial\_js\_histogram\_begins
    
    Learn the basics of histograms
    
-   @subpage tutorial\_js\_histogram\_equalization
    
    Learn to Equalize Histograms to get better contrast for images
    
-   @subpage tutorial\_js\_histogram\_backprojection
    
    Learn histogram backprojection to segment colored objects

## [Js Houghcircles](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_houghcircles/js_houghcircles/)

# Hough Circle Transform {#tutorial\_js\_houghcircles}

## Goal

-   We will learn to use Hough Transform to find circles in an image.
-   We will learn these functions: **cv.HoughCircles()**

## Theory

A circle is represented mathematically as \\f$(x-x\_{center})^2 + (y - y\_{center})^2 = r^2\\f$ where \\f$(x\_{center},y\_{center})\\f$ is the center of the circle, and \\f$r\\f$ is the radius of the circle. From the equation, we can see that we have 3 parameters, so we need a 3D accumulator for the Hough transform, which would be highly inefficient. So OpenCV uses a trickier method, the **Hough Gradient Method**, which uses the gradient information of edges.

We use the function: **cv.HoughCircles (image, circles, method, dp, minDist, param1 = 100, param2 = 100, minRadius = 0, maxRadius = 0)**

-   @param image 8-bit, single-channel, grayscale input image.
-   @param circles output vector of found circles (cv.CV\_32FC3 type). Each circle is encoded as a 3-element floating-point vector (x,y,radius).
-   @param method detection method (see cv.HoughModes). Currently, the only implemented method is HOUGH\_GRADIENT.
-   @param dp inverse ratio of the accumulator resolution to the image resolution. For example, if dp = 1, the accumulator has the same resolution as the input image. If dp = 2, the accumulator has half the width and height.
-   @param minDist minimum distance between the centers of the detected circles. If the parameter is too small, multiple neighbor circles may be falsely detected in addition to a true one. If it is too large, some circles may be missed.
-   @param param1 first method-specific parameter. In case of HOUGH\_GRADIENT, it is the higher threshold of the two passed to the Canny edge detector (the lower one is half of it).
-   @param param2 second method-specific parameter. In case of HOUGH\_GRADIENT, it is the accumulator threshold for the circle centers at the detection stage. The smaller it is, the more false circles may be detected. Circles corresponding to the larger accumulator values will be returned first.
-   @param minRadius minimum circle radius.
-   @param maxRadius maximum circle radius.
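
Since each found circle is stored as three consecutive floats, the output is easier to work with after unpacking. The plain-JavaScript helper `unpackCircles` below is hypothetical; it takes a flat `Float32Array` of (x, y, radius) triplets, which in OpenCV.js would typically be the `data32F` view of the `circles` Mat.

```javascript
// Sketch only: `unpackCircles` is a hypothetical helper, not an OpenCV.js function.
// data32F: flat Float32Array laid out as x0, y0, r0, x1, y1, r1, ...
function unpackCircles(data32F) {
  const circles = [];
  for (let i = 0; i + 2 < data32F.length; i += 3) {
    circles.push({
      x: data32F[i],
      y: data32F[i + 1],
      radius: data32F[i + 2],
    });
  }
  return circles;
}

// Check how well a point satisfies the circle equation
// (x - x_center)^2 + (y - y_center)^2 = r^2; 0 means exactly on the circle.
function circleResidual(circle, px, py) {
  const dx = px - circle.x;
  const dy = py - circle.y;
  return dx * dx + dy * dy - circle.radius * circle.radius;
}
```

Each unpacked object can then be used for drawing, for example as the center and radius passed to a circle-drawing routine.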


## [Js Houghlines](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_houghlines/js_houghlines/)


# Hough Line Transform {#tutorial\_js\_houghlines}

## Goal

-   We will understand the concept of the Hough Transform.
-   We will learn how to use it to detect lines in an image.
-   We will learn the following functions: **cv.HoughLines()**, **cv.HoughLinesP()**

## Theory

The Hough Transform is a popular technique to detect any shape, if you can represent that shape in a mathematical form. It can detect the shape even if it is broken or distorted a little bit. We will see how it works for a line.

A line can be represented as \\f$y = mx+c\\f$ or in a parametric form, as \\f$\\rho = x \\cos \\theta + y \\sin \\theta\\f$ where \\f$\\rho\\f$ is the perpendicular distance from the origin to the line, and \\f$\\theta\\f$ is the angle formed by this perpendicular line and the horizontal axis, measured counter-clockwise (the direction depends on how you represent the coordinate system; this representation is the one used in OpenCV). Check the image below:

So if the line passes below the origin, it will have a positive rho and an angle less than 180 degrees. If it passes above the origin, instead of taking an angle greater than 180 degrees, the angle is taken to be less than 180 degrees and rho is taken to be negative. Any vertical line will have an angle of 0 degrees, and horizontal lines will have 90 degrees.

Now let's see how the Hough Transform works for lines. Any line can be represented in these two terms, \\f$(\\rho, \\theta)\\f$. So first it creates a 2D array or accumulator (to hold the values of the two parameters) and it is set to 0 initially. Let rows denote the \\f$\\rho\\f$ and columns denote the \\f$\\theta\\f$. Size of array depends on the accuracy you need. Suppose you want the accuracy of angles to be 1 degree, you will need 180 columns. For \\f$\\rho\\f$, the maximum distance possible is the diagonal length of the image. So taking one pixel accuracy, the number of rows can be the diagonal length of the image.

Consider a 100x100 image with a horizontal line at the middle. Take the first point of the line. You know its (x,y) values. Now in the line equation, put the values \\f$\\theta = 0,1,2,....,180\\f$ and check the \\f$\\rho\\f$ you get. For every \\f$(\\rho, \\theta)\\f$ pair, you increment the value of the corresponding \\f$(\\rho, \\theta)\\f$ cell in the accumulator by one. So now in the accumulator, the cell (50,90) = 1, along with some other cells.

Now take the second point on the line. Do the same as above. Increment the values in the cells corresponding to the \\f$(\\rho, \\theta)\\f$ you got. This time, the cell (50,90) = 2. What you are actually doing is voting for the \\f$(\\rho, \\theta)\\f$ values. You continue this process for every point on the line. At each point, the cell (50,90) will be incremented or voted up, while other cells may or may not be voted up. This way, at the end, the cell (50,90) will have the maximum number of votes. So if you search the accumulator for the maximum, you get the value (50,90), which says there is a line in this image at a distance of 50 from the origin and at an angle of 90 degrees. It is well shown in the animation below (Image Courtesy: [Amos Storkey](http://homepages.inf.ed.ac.uk/amos/hough.html))

This is how the Hough transform works for lines. It is simple. Below is an image which shows the accumulator. Bright spots at some locations denote the parameters of possible lines in the image. (Image courtesy: [Wikipedia](http://en.wikipedia.org/wiki/Hough_transform))
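The voting procedure described above can be sketched in plain JavaScript, without OpenCV. This is a brute-force illustration with 1 pixel and 1 degree resolution; `houghVote` is a hypothetical helper, and theta runs over 0 to 179 degrees so that each line has one representation.

```javascript
// Brute-force (rho, theta) voting: rows are rho, columns are theta in degrees.
function houghVote(points, width, height) {
  const diag = Math.ceil(Math.hypot(width, height));
  const nTheta = 180;
  const nRho = 2 * diag + 1; // rho ranges over [-diag, diag]
  const acc = new Uint32Array(nRho * nTheta); // initialized to 0
  for (const [x, y] of points) {
    for (let t = 0; t < nTheta; t++) {
      const theta = (t * Math.PI) / 180;
      const rho = Math.round(x * Math.cos(theta) + y * Math.sin(theta));
      acc[(rho + diag) * nTheta + t]++; // one vote for this (rho, theta) cell
    }
  }
  // Search the accumulator for the cell with the most votes.
  let best = { rho: 0, thetaDeg: 0, votes: 0 };
  for (let i = 0; i < acc.length; i++) {
    if (acc[i] > best.votes) {
      best = { rho: Math.floor(i / nTheta) - diag, thetaDeg: i % nTheta, votes: acc[i] };
    }
  }
  return best;
}

// Horizontal line at y = 50 in a 100x100 image.
const linePoints = [];
for (let x = 0; x < 100; x++) linePoints.push([x, 50]);
const strongest = houghVote(linePoints, 100, 100);
```

For this input every point on the line votes for the cell (50, 90), so that cell collects all 100 votes.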

## Hough Transform in OpenCV

Everything explained above is encapsulated in the OpenCV function **cv.HoughLines()**. It simply returns an array of \\f$(\\rho, \\theta)\\f$ values. \\f$\\rho\\f$ is measured in pixels and \\f$\\theta\\f$ is measured in radians. The first parameter, the input image, should be a binary image, so apply a threshold or use Canny edge detection before applying the Hough transform.

We use the function: **cv.HoughLines (image, lines, rho, theta, threshold, srn = 0, stn = 0, min\_theta = 0, max\_theta = Math.PI)**

-   @param image 8-bit, single-channel binary source image. The image may be modified by the function.
-   @param lines output vector of lines (cv.CV\_32FC2 type). Each line is represented by a two-element vector (ρ,θ). ρ is the distance from the coordinate origin (0,0). θ is the line rotation angle in radians.
-   @param rho distance resolution of the accumulator in pixels.
-   @param theta angle resolution of the accumulator in radians.
-   @param threshold accumulator threshold parameter. Only those lines are returned that get enough votes.
-   @param srn for the multi-scale Hough transform, it is a divisor for the distance resolution rho. The coarse accumulator distance resolution is rho and the accurate accumulator resolution is rho/srn. If both srn=0 and stn=0, the classical Hough transform is used. Otherwise, both these parameters should be positive.
-   @param stn for the multi-scale Hough transform, it is a divisor for the angle resolution theta.
-   @param min\_theta for standard and multi-scale Hough transform, minimum angle to check for lines. Must fall between 0 and max\_theta.
-   @param max\_theta for standard and multi-scale Hough transform, maximum angle to check for lines. Must fall between min\_theta and CV\_PI.
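Because each result is a (ρ,θ) pair rather than a segment, drawing a line usually means converting the pair into two far-apart points. A minimal sketch in plain JavaScript follows; `lineEndpoints` and its `halfLength` parameter are hypothetical names, not part of OpenCV.

```javascript
// Convert a (rho, theta) line into two points lying on it.
function lineEndpoints(rho, theta, halfLength = 1000) {
  const a = Math.cos(theta);
  const b = Math.sin(theta);
  const x0 = a * rho; // foot of the perpendicular from the origin
  const y0 = b * rho;
  // Walk along the line direction (-b, a) in both directions.
  return {
    p1: { x: Math.round(x0 - halfLength * b), y: Math.round(y0 + halfLength * a) },
    p2: { x: Math.round(x0 + halfLength * b), y: Math.round(y0 - halfLength * a) },
  };
}

// rho = 50, theta = 90 degrees: the horizontal line y = 50.
const segment = lineEndpoints(50, Math.PI / 2);
```

The two points can then be passed to a drawing routine of your choice.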


## Probabilistic Hough Transform

In the Hough transform, you can see that even for a line with two parameters, it takes a lot of computation. The Probabilistic Hough Transform is an optimization of the Hough Transform we saw. It doesn't take all the points into consideration. Instead, it takes only a random subset of points, which is sufficient for line detection. We just have to decrease the threshold. See the image below, which compares the Hough Transform and the Probabilistic Hough Transform in Hough space. (Image Courtesy: [Franck Bettinger's home page](http://phdfb1.free.fr/robot/mscthesis/node14.html))

OpenCV implementation is based on Robust Detection of Lines Using the Progressive Probabilistic Hough Transform by Matas, J. and Galambos, C. and Kittler, J.V. @cite Matas00.

We use the function: **cv.HoughLinesP (image, lines, rho, theta, threshold, minLineLength = 0, maxLineGap = 0)**

-   @param image 8-bit, single-channel binary source image. The image may be modified by the function.
-   @param lines output vector of lines (cv.CV\_32SC4 type). Each line is represented by a 4-element vector (x1,y1,x2,y2), where (x1,y1) and (x2,y2) are the end points of each detected line segment.
-   @param rho distance resolution of the accumulator in pixels.
-   @param theta angle resolution of the accumulator in radians.
-   @param threshold accumulator threshold parameter. Only those lines are returned that get enough votes.
-   @param minLineLength minimum line length. Line segments shorter than that are rejected.
-   @param maxLineGap maximum allowed gap between points on the same line to link them.


## [Js Imgproc Camera](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_imgproc_camera/js_imgproc_camera/)


# Image Processing for Video Capture {#tutorial\_js\_imgproc\_camera}

## Goal

-   Learn image processing for video capture.


## [Js Intelligent Scissors](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_intelligent_scissors/js_intelligent_scissors/)


# Intelligent Scissors Demo {#tutorial\_js\_intelligent\_scissors}

## Goal

-   Here you can check how to use the IntelligentScissors tool for an image segmentation task.
-   Available methods and parameters: @ref cv::segmentation::IntelligentScissorsMB

@note The feature is integrated into [CVAT](https://github.com/openvinotoolkit/cvat) annotation tool and you can try it online on [https://cvat.org](https://cvat.org)


## [Js Morphological Ops](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_morphological_ops/js_morphological_ops/)


# Morphological Transformations {#tutorial\_js\_morphological\_ops}

## Goal

-   We will learn different morphological operations like Erosion, Dilation, Opening, Closing etc.
-   We will learn different functions like : **cv.erode()**, **cv.dilate()**, **cv.morphologyEx()** etc.

## Theory

Morphological transformations are simple operations based on the image shape. They are normally performed on binary images. They need two inputs: the original image and a **structuring element** or **kernel**, which decides the nature of the operation. The two basic morphological operators are Erosion and Dilation. Variant forms like Opening, Closing, and Gradient also come into play. We will see them one by one with the help of the following image:

### 1\. Erosion

The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object (always try to keep the foreground in white). So what does it do? The kernel slides through the image (as in 2D convolution). A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero).

So what happens is that all the pixels near the boundary will be discarded, depending on the size of the kernel. So the thickness or size of the foreground object decreases, or simply the white region in the image decreases. It is useful for removing small white noise (as we have seen in the colorspace chapter), detaching two connected objects, etc.
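The rule "a pixel stays 1 only if every pixel under the kernel is 1" can be sketched on a small binary array in plain JavaScript. `erodeBinary` is a hypothetical helper using a square kernel of ones; as an assumption of this sketch, pixels outside the image are ignored so the image border does not erode on its own.

```javascript
// Binary erosion with a ksize x ksize kernel of ones, anchored at its center.
function erodeBinary(img, ksize) {
  const h = img.length, w = img[0].length, r = Math.floor(ksize / 2);
  return img.map((row, y) => row.map((_, x) => {
    for (let dy = -r; dy <= r; dy++) {
      for (let dx = -r; dx <= r; dx++) {
        const yy = y + dy, xx = x + dx;
        if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue; // ignore outside pixels
        if (img[yy][xx] === 0) return 0; // any background pixel under the kernel erodes
      }
    }
    return 1;
  }));
}

const square = [
  [0, 0, 0, 0, 0],
  [0, 1, 1, 1, 0],
  [0, 1, 1, 1, 0],
  [0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0],
];
const eroded = erodeBinary(square, 3);
```

With a 3x3 kernel, the 3x3 white square shrinks to its single center pixel.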

We use the function: **cv.erode (src, dst, kernel, anchor = new cv.Point(-1, -1), iterations = 1, borderType = cv.BORDER\_CONSTANT, borderValue = cv.morphologyDefaultBorderValue())**

-   @param src input image; the number of channels can be arbitrary, but the depth should be one of cv.CV\_8U, cv.CV\_16U, cv.CV\_16S, cv.CV\_32F or cv.CV\_64F.
-   @param dst output image of the same size and type as src.
-   @param kernel structuring element used for erosion.
-   @param anchor position of the anchor within the element; default value new cv.Point(-1, -1) means that the anchor is at the element center.
-   @param iterations number of times erosion is applied.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
-   @param borderValue border value in case of a constant border.


### 2\. Dilation

It is just the opposite of erosion. Here, a pixel element is '1' if at least one pixel under the kernel is '1'. So it increases the white region in the image, that is, the size of the foreground object increases. Normally, in cases like noise removal, erosion is followed by dilation, because erosion removes white noise but also shrinks our object. So we dilate it. Since the noise is gone, it won't come back, but our object area increases. It is also useful for joining broken parts of an object.

We use the function: **cv.dilate (src, dst, kernel, anchor = new cv.Point(-1, -1), iterations = 1, borderType = cv.BORDER\_CONSTANT, borderValue = cv.morphologyDefaultBorderValue())**

-   @param src input image; the number of channels can be arbitrary, but the depth should be one of cv.CV\_8U, cv.CV\_16U, cv.CV\_16S, cv.CV\_32F or cv.CV\_64F.
-   @param dst output image of the same size and type as src.
-   @param kernel structuring element used for dilation.
-   @param anchor position of the anchor within the element; default value new cv.Point(-1, -1) means that the anchor is at the element center.
-   @param iterations number of times dilation is applied.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
-   @param borderValue border value in case of a constant border.


### 3\. Opening

Opening is just another name for **erosion followed by dilation**. It is useful for removing noise.

We use the function: **cv.morphologyEx (src, dst, op, kernel, anchor = new cv.Point(-1, -1), iterations = 1, borderType = cv.BORDER\_CONSTANT, borderValue = cv.morphologyDefaultBorderValue())**

-   @param src source image. The number of channels can be arbitrary. The depth should be one of cv.CV\_8U, cv.CV\_16U, cv.CV\_16S, cv.CV\_32F or cv.CV\_64F.
-   @param dst destination image of the same size and type as the source image.
-   @param op type of morphological operation (see cv.MorphTypes).
-   @param kernel structuring element. It can be created using cv.getStructuringElement.
-   @param anchor anchor position within the kernel. Negative values mean that the anchor is at the kernel center.
-   @param iterations number of times erosion and dilation are applied.
-   @param borderType pixel extrapolation method (see cv.BorderTypes).
-   @param borderValue border value in case of a constant border. The default value has a special meaning.


### 4\. Closing

Closing is the reverse of Opening: **dilation followed by erosion**. It is useful for closing small holes inside the foreground objects, or small black points on the object.


### 5\. Morphological Gradient

It is the difference between dilation and erosion of an image.

The result will look like the outline of the object.


### 6\. Top Hat

It is the difference between input image and Opening of the image.


### 7\. Black Hat

It is the difference between the closing of the input image and input image.
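Opening, closing, gradient, top hat and black hat are all compositions of erosion and dilation, which the following plain JavaScript sketch makes explicit. All helper names are hypothetical; the kernel is assumed to be a square of ones, and pixels outside the image are ignored.

```javascript
// Generic neighborhood reduction: Math.min gives erosion, Math.max gives dilation.
function morph(img, ksize, pick) {
  const h = img.length, w = img[0].length, r = Math.floor(ksize / 2);
  return img.map((row, y) => row.map((value, x) => {
    let acc = value;
    for (let dy = -r; dy <= r; dy++) {
      for (let dx = -r; dx <= r; dx++) {
        const yy = y + dy, xx = x + dx;
        if (yy >= 0 && yy < h && xx >= 0 && xx < w) acc = pick(acc, img[yy][xx]);
      }
    }
    return acc;
  }));
}
const erode = (img, k) => morph(img, k, Math.min);
const dilate = (img, k) => morph(img, k, Math.max);
// Saturating per-pixel difference a - b.
const subtract = (a, b) => a.map((row, y) => row.map((v, x) => Math.max(0, v - b[y][x])));

const opening = (img, k) => dilate(erode(img, k), k);
const closing = (img, k) => erode(dilate(img, k), k);
const gradient = (img, k) => subtract(dilate(img, k), erode(img, k));
const topHat = (img, k) => subtract(img, opening(img, k));
const blackHat = (img, k) => subtract(closing(img, k), img);

// A 3x3 object plus one isolated noise pixel in the top-left corner.
const noisy = [
  [1, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0, 0],
];
const opened = opening(noisy, 3);
const noiseOnly = topHat(noisy, 3);
const outline = gradient(noisy, 3);
```

Here opening removes the noise pixel while keeping the object, top hat keeps only the removed noise, and the gradient is zero at the object's center and non-zero on its boundary.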


## Structuring Element

We manually created structuring elements in the previous examples with the help of cv.Mat.ones. They are rectangular. But in some cases, you may need elliptical/circular or diamond-shaped kernels. For this purpose, OpenCV has a function, **cv.getStructuringElement()**. You just pass the shape and size of the kernel, and you get the desired kernel.

We use the function: **cv.getStructuringElement (shape, ksize, anchor = new cv.Point(-1, -1))**

-   @param shape element shape that could be one of cv.MorphShapes.
-   @param ksize size of the structuring element.
-   @param anchor anchor position within the element. The default value \[−1,−1\] means that the anchor is at the center. Note that only the shape of a cross-shaped element depends on the anchor position. In other cases the anchor just regulates how much the result of the morphological operation is shifted.
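To see why only the cross shape depends on the anchor, here is a plain JavaScript sketch that builds rectangular and cross-shaped kernels as nested arrays. `makeKernel` and its string shape names are hypothetical and only illustrate the idea; elliptical kernels are left out.

```javascript
// Build a rows x cols kernel; a negative anchor means "use the center".
function makeKernel(shape, cols, rows, anchor = { x: -1, y: -1 }) {
  const ax = anchor.x < 0 ? Math.floor(cols / 2) : anchor.x;
  const ay = anchor.y < 0 ? Math.floor(rows / 2) : anchor.y;
  const kernel = [];
  for (let y = 0; y < rows; y++) {
    const row = [];
    for (let x = 0; x < cols; x++) {
      // "rect": all ones; "cross": ones on the anchor row and anchor column only.
      row.push(shape === "rect" || x === ax || y === ay ? 1 : 0);
    }
    kernel.push(row);
  }
  return kernel;
}

const cross = makeKernel("cross", 3, 3);
const shiftedCross = makeKernel("cross", 3, 3, { x: 0, y: 0 });
const rect = makeKernel("rect", 3, 2);
```

Moving the anchor to (0, 0) moves the arms of the cross to the first row and first column, while the rectangular kernel is unchanged by the anchor.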


## [Js Pyramids](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_pyramids/js_pyramids/)


# Image Pyramids {#tutorial\_js\_pyramids}

## Goal

-   We will learn about Image Pyramids
-   We will learn these functions: **cv.pyrUp()**, **cv.pyrDown()**

## Theory

Normally, we work with an image of constant size. But on some occasions, we need to work with the same image at different resolutions. For example, while searching for something in an image, like a face, we are not sure at what size the object will be present in the image. In that case, we need to create a set of copies of the same image at different resolutions and search for the object in all of them. This set of images with different resolutions is called an **Image Pyramid** (because when they are kept in a stack with the highest resolution image at the bottom and the lowest resolution image at the top, it looks like a pyramid).

There are two kinds of Image Pyramids. 1) **Gaussian Pyramid** and 2) **Laplacian Pyramids**

A higher level (lower resolution) in a Gaussian Pyramid is formed by removing every other row and column of the lower level (higher resolution) image. Each pixel in the higher level is formed by the contribution from 5 pixels in the underlying level with Gaussian weights. By doing so, an \\f$M \\times N\\f$ image becomes an \\f$M/2 \\times N/2\\f$ image, so the area reduces to one-fourth of the original area. It is called an Octave. The same pattern continues as we go up the pyramid (i.e., the resolution decreases). Similarly, while expanding, the area becomes 4 times larger at each level. We can build Gaussian pyramids using the **cv.pyrDown()** and **cv.pyrUp()** functions.

Laplacian Pyramids are formed from the Gaussian Pyramids. There is no dedicated function for that. Laplacian pyramid images look like edge images; most of their elements are zeros. They are used in image compression. A level in the Laplacian Pyramid is formed by the difference between that level in the Gaussian Pyramid and the expanded version of its upper level in the Gaussian Pyramid.
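The halving of width and height at each level can be tabulated with a few lines of plain JavaScript. `pyramidSizes` is a hypothetical helper; it assumes the usual rounding for odd dimensions, (size + 1) / 2 with integer division, which is how the default output size of cv.pyrDown is commonly documented.

```javascript
// List the image size at each Gaussian pyramid level, starting with the original.
function pyramidSizes(width, height, levels) {
  const sizes = [{ width, height }];
  for (let i = 0; i < levels; i++) {
    const prev = sizes[sizes.length - 1];
    sizes.push({
      width: Math.floor((prev.width + 1) / 2),
      height: Math.floor((prev.height + 1) / 2),
    });
  }
  return sizes;
}

const levels = pyramidSizes(640, 480, 3);
const oddLevels = pyramidSizes(101, 51, 1);
```

For a 640x480 image, three downsampling steps give 320x240, 160x120 and 80x60, so each level has one-fourth of the area of the level below it.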

## Downsample

We use the function: **cv.pyrDown (src, dst, dstsize = new cv.Size(0, 0), borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image; it has the specified size and the same type as src.
-   @param dstsize size of the output image.
-   @param borderType pixel extrapolation method (see cv.BorderTypes, cv.BORDER\_CONSTANT isn't supported).


## Upsample

We use the function: **cv.pyrUp (src, dst, dstsize = new cv.Size(0, 0), borderType = cv.BORDER\_DEFAULT)**

-   @param src input image.
-   @param dst output image; it has the specified size and the same type as src.
-   @param dstsize size of the output image.
-   @param borderType pixel extrapolation method (see cv.BorderTypes, only cv.BORDER\_DEFAULT is supported).


## [Js Table Of Contents Imgproc](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_table_of_contents_imgproc/)


# Image Processing {#tutorial\_js\_table\_of\_contents\_imgproc}

-   @subpage tutorial\_js\_colorspaces
    
    Learn how to change images between different color spaces.
    
-   @subpage tutorial\_js\_geometric\_transformations
    
    Learn how to apply different geometric transformations to images like rotation, translation etc.
    
-   @subpage tutorial\_js\_thresholding
    
    Learn how to convert images to binary images using global thresholding, Adaptive thresholding, Otsu's binarization etc.
    
-   @subpage tutorial\_js\_filtering
    
    Learn how to blur the images, filter the images with custom kernels etc.
    
-   @subpage tutorial\_js\_morphological\_ops
    
    Learn about morphological transformations like Erosion, Dilation, Opening, Closing etc.
    
-   @subpage tutorial\_js\_gradients
    
    Learn how to find image gradients, edges etc.
    
-   @subpage tutorial\_js\_canny
    
    Learn how to find edges with Canny Edge Detection.
    
-   @subpage tutorial\_js\_pyramids
    
    Learn about image pyramids and how to use them for image blending.
    
-   @subpage tutorial\_js\_table\_of\_contents\_contours
    
    Learn about Contours in OpenCV.js.
    
-   @subpage tutorial\_js\_table\_of\_contents\_histograms
    
    Learn about histograms in OpenCV.js.
    
-   @subpage tutorial\_js\_table\_of\_contents\_transforms
    
    Learn different Image Transforms in OpenCV.js like Fourier Transform, Cosine Transform etc.
    
-   @subpage tutorial\_js\_template\_matching
    
    Learn how to search for an object in an image using Template Matching.
    
-   @subpage tutorial\_js\_houghlines
    
    Learn how to detect lines in an image.
    
-   @subpage tutorial\_js\_houghcircles
    
    Learn how to detect circles in an image.
    
-   @subpage tutorial\_js\_watershed
    
    Learn how to segment images with watershed segmentation.
    
-   @subpage tutorial\_js\_grabcut
    
    Learn how to extract foreground with GrabCut algorithm.
    
-   @subpage tutorial\_js\_imgproc\_camera
    
    Learn image processing for video capture.
    
-   @subpage tutorial\_js\_intelligent\_scissors
    
    Learn how to use IntelligentScissors tool for image segmentation task.

## [Js Template Matching](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_template_matching/js_template_matching/)


# Template Matching {#tutorial\_js\_template\_matching}

## Goals

-   To find objects in an image using Template Matching
-   You will learn these functions : **cv.matchTemplate()**, **cv.minMaxLoc()**

## Theory

Template Matching is a method for searching and finding the location of a template image in a larger image. OpenCV comes with a function **cv.matchTemplate()** for this purpose. It simply slides the template image over the input image (as in 2D convolution) and compares the template with the patch of the input image under it. Several comparison methods are implemented in OpenCV (you can check the docs for more details). It returns a grayscale image, where each pixel denotes how well the neighbourhood of that pixel matches the template.

If the input image is of size (WxH) and the template image is of size (wxh), the output image will have a size of (W-w+1, H-h+1). Once you have the result, you can use the **cv.minMaxLoc()** function to find where the maximum/minimum value is. Take it as the top-left corner of the rectangle and take (w,h) as the width and height of the rectangle. That rectangle is the region matching your template.

@note If you are using cv.TM\_SQDIFF as comparison method, minimum value gives the best match.
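The sliding comparison and the (W-w+1, H-h+1) result size can be illustrated in plain JavaScript on tiny arrays. This sketch uses a sum of squared differences, so the best match is the minimum; `matchTemplateSqDiff` and `minLoc` are hypothetical helpers, not OpenCV functions.

```javascript
// Slide templ over image and record the sum of squared differences at each offset.
function matchTemplateSqDiff(image, templ) {
  const H = image.length, W = image[0].length;
  const h = templ.length, w = templ[0].length;
  const result = [];
  for (let y = 0; y <= H - h; y++) {
    const row = [];
    for (let x = 0; x <= W - w; x++) {
      let sum = 0;
      for (let j = 0; j < h; j++) {
        for (let i = 0; i < w; i++) {
          const d = image[y + j][x + i] - templ[j][i];
          sum += d * d;
        }
      }
      row.push(sum);
    }
    result.push(row);
  }
  return result;
}

// Location of the smallest value in the result map (top-left corner of the match).
function minLoc(result) {
  let best = { x: 0, y: 0, value: Infinity };
  result.forEach((row, y) => row.forEach((v, x) => {
    if (v < best.value) best = { x, y, value: v };
  }));
  return best;
}

const image = [
  [0, 0, 0, 0],
  [0, 9, 8, 0],
  [0, 7, 6, 0],
];
const templ = [
  [9, 8],
  [7, 6],
];
const scores = matchTemplateSqDiff(image, templ);
const match = minLoc(scores);
```

The 4x3 image and 2x2 template produce a 3x2 result map, and the minimum sits at the offset where the template lies exactly over its copy in the image.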

## Template Matching in OpenCV

We use the function: **cv.matchTemplate (image, templ, result, method, mask = new cv.Mat())**

-   @param image image where the search is running. It must be 8-bit or 32-bit floating-point.
-   @param templ searched template. It must not be greater than the source image and must have the same data type.
-   @param result map of comparison results. It must be single-channel 32-bit floating-point.
-   @param method parameter specifying the comparison method (see cv.TemplateMatchModes).
-   @param mask mask of the searched template. It must have the same data type and size as templ. It is not set by default.


## [Js Thresholding](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_thresholding/js_thresholding/)


# Image Thresholding {#tutorial\_js\_thresholding}

## Goal

-   In this tutorial, you will learn Simple thresholding, Adaptive thresholding, Otsu's thresholding etc.
-   You will learn these functions : **cv.threshold**, **cv.adaptiveThreshold** etc.

## Simple Thresholding

Here, the matter is straightforward. If the pixel value is greater than a threshold value, it is assigned one value (maybe white), else it is assigned another value (maybe black).

We use the function: **cv.threshold (src, dst, thresh, maxval, type)**

-   @param src input array.
-   @param dst output array of the same size and type and the same number of channels as src.
-   @param thresh threshold value.
-   @param maxval maximum value to use with the cv.THRESH\_BINARY and cv.THRESH\_BINARY\_INV thresholding types.
-   @param type thresholding type (see cv.ThresholdTypes).

**thresholding type** - OpenCV provides different styles of thresholding, selected by the type parameter (the fifth argument of the function). The different types are:

-   cv.THRESH\_BINARY
-   cv.THRESH\_BINARY\_INV
-   cv.THRESH\_TRUNC
-   cv.THRESH\_TOZERO
-   cv.THRESH\_OTSU
-   cv.THRESH\_TRIANGLE

@note Input image should be single channel only in case of cv.THRESH\_OTSU or cv.THRESH\_TRIANGLE flags
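The per-pixel rule behind the four basic types can be written out in plain JavaScript. `thresholdPixel` is a hypothetical helper that takes the type as a string; cv.THRESH\_OTSU and cv.THRESH\_TRIANGLE are left out because they choose the threshold value automatically rather than defining a different per-pixel rule.

```javascript
// Apply one of the basic threshold rules to a single pixel value.
function thresholdPixel(value, thresh, maxval, type) {
  switch (type) {
    case "THRESH_BINARY": return value > thresh ? maxval : 0;
    case "THRESH_BINARY_INV": return value > thresh ? 0 : maxval;
    case "THRESH_TRUNC": return value > thresh ? thresh : value;
    case "THRESH_TOZERO": return value > thresh ? value : 0;
    default: throw new Error("unsupported type: " + type);
  }
}

const pixels = [50, 127, 128, 200];
const apply = (type) => pixels.map((v) => thresholdPixel(v, 127, 255, type));
const binary = apply("THRESH_BINARY");
const binaryInv = apply("THRESH_BINARY_INV");
const trunc = apply("THRESH_TRUNC");
const toZero = apply("THRESH_TOZERO");
```

Note that the comparison is strict: a pixel equal to the threshold (127 here) is treated as not exceeding it.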


## Adaptive Thresholding

In the previous section, we used a global value as the threshold value. But it may not be good in all conditions, for example where the image has different lighting conditions in different areas. In that case, we go for adaptive thresholding. Here, the algorithm calculates the threshold for small regions of the image. So we get different thresholds for different regions of the same image, which gives us better results for images with varying illumination.

We use the function: **cv.adaptiveThreshold (src, dst, maxValue, adaptiveMethod, thresholdType, blockSize, C)**

-   @param src source 8-bit single-channel image.
-   @param dst destination image of the same size and the same type as src.
-   @param maxValue non-zero value assigned to the pixels for which the condition is satisfied.
-   @param adaptiveMethod adaptive thresholding algorithm to use.
-   @param thresholdType thresholding type that must be either cv.THRESH\_BINARY or cv.THRESH\_BINARY\_INV.
-   @param blockSize size of a pixel neighborhood that is used to calculate a threshold value for the pixel: 3, 5, 7, and so on.
-   @param C constant subtracted from the mean or weighted mean (see the details below). Normally, it is positive but may be zero or negative as well.

**adaptiveMethod** - It decides how the thresholding value is calculated:

-   cv.ADAPTIVE\_THRESH\_MEAN\_C
-   cv.ADAPTIVE\_THRESH\_GAUSSIAN\_C
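For the mean-based method, the threshold at each pixel is the mean of its blockSize x blockSize neighborhood minus C. A plain JavaScript sketch of that rule with binary output follows; `adaptiveMeanThreshold` is a hypothetical helper, and replicating edge pixels at the border is an assumption of this sketch.

```javascript
// Per-pixel threshold T = mean(neighborhood) - C, then a binary decision.
function adaptiveMeanThreshold(img, maxValue, blockSize, C) {
  const h = img.length, w = img[0].length, r = Math.floor(blockSize / 2);
  const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));
  return img.map((row, y) => row.map((value, x) => {
    let sum = 0;
    for (let dy = -r; dy <= r; dy++) {
      for (let dx = -r; dx <= r; dx++) {
        // Replicate the nearest edge pixel when the window leaves the image.
        sum += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)];
      }
    }
    const T = sum / (blockSize * blockSize) - C;
    return value > T ? maxValue : 0;
  }));
}

const patch = [
  [10, 10, 10],
  [10, 200, 10],
  [10, 10, 10],
];
const adaptive = adaptiveMeanThreshold(patch, 255, 3, 5);
```

Only the bright center pixel exceeds its local threshold; the dark pixels fall below theirs because the bright neighbor raises the local mean.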


## [Js Fourier Transform](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_transforms/js_fourier_transform/js_fourier_transform/)


# Fourier Transform {#tutorial\_js\_fourier\_transform}

## Goal

-   To find the Fourier Transform of images using OpenCV
-   Some applications of Fourier Transform
-   We will learn following functions : **cv.dft()** etc

## Theory

Fourier Transform is used to analyze the frequency characteristics of various filters. For images, **2D Discrete Fourier Transform (DFT)** is used to find the frequency domain. A fast algorithm called **Fast Fourier Transform (FFT)** is used for calculation of DFT. Details about these can be found in any image processing or signal processing textbooks.

For a sinusoidal signal, \\f$x(t) = A \\sin(2 \\pi ft)\\f$, we can say \\f$f\\f$ is the frequency of the signal, and if its frequency domain is taken, we can see a spike at \\f$f\\f$. If the signal is sampled to form a discrete signal, we get the same frequency domain, but it is periodic in the range \\f$\[- \\pi, \\pi\]\\f$ or \\f$\[0,2\\pi\]\\f$ (or \\f$\[0,N\]\\f$ for an N-point DFT). You can consider an image as a signal which is sampled in two directions. So taking the Fourier transform in both the X and Y directions gives you the frequency representation of the image.

More intuitively, for the sinusoidal signal, if the amplitude varies quickly in a short time, you can say it is a high frequency signal. If it varies slowly, it is a low frequency signal. You can extend the same idea to images. Where does the amplitude vary drastically in images? At the edge points, or in noise. So we can say that edges and noise are high frequency content in an image. If there are not many changes in amplitude, it is a low frequency component.

Performance of the DFT calculation is better for some array sizes. It is fastest when the array size is a power of two. Arrays whose size is a product of 2’s, 3’s, and 5’s are also processed quite efficiently. So if you are worried about the performance of your code, you can modify the size of the array to an optimal size (by padding zeros) before finding the DFT. OpenCV provides a function, **cv.getOptimalDFTSize()**, for this.
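The spike at the signal frequency can be checked with a naive 1D DFT in plain JavaScript. This sketch is an O(N²) illustration, not an FFT; `dftMagnitude` is a hypothetical helper, and the signal is a sampled sine with exactly 3 cycles in 16 samples.

```javascript
// Naive DFT: magnitude of each frequency bin k for a real input signal.
function dftMagnitude(signal) {
  const N = signal.length;
  const mags = [];
  for (let k = 0; k < N; k++) {
    let re = 0, im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re += signal[n] * Math.cos(angle);
      im += signal[n] * Math.sin(angle);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

const N = 16, f = 3;
const samples = Array.from({ length: N }, (_, n) => Math.sin((2 * Math.PI * f * n) / N));
const mags = dftMagnitude(samples);
const firstHalf = mags.slice(0, N / 2);
const peakBin = firstHalf.indexOf(Math.max(...firstHalf));
```

The magnitude is concentrated in bin 3 and its mirror bin 13, each about N/2, while the other bins are numerically close to zero.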
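The notion of an optimal size can be illustrated by searching for the smallest number not less than the input whose only prime factors are 2, 3 and 5. This plain JavaScript sketch mirrors that definition with a simple linear search; `optimalDftSize` is a hypothetical helper and is not how the library computes it internally.

```javascript
// Smallest n >= vecsize of the form 2^a * 3^b * 5^c.
function optimalDftSize(vecsize) {
  const isSmooth = (n) => {
    for (const p of [2, 3, 5]) {
      while (n % p === 0) n /= p; // strip every factor of 2, 3 and 5
    }
    return n === 1; // nothing else may remain
  };
  let n = Math.max(1, vecsize);
  while (!isSmooth(n)) n++;
  return n;
}

const paddedRows = optimalDftSize(101);
const paddedCols = optimalDftSize(100);
```

An array with 101 rows would be zero-padded to 108 rows (2² · 3³), while 100 (2² · 5²) is already a suitable size.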

Now we will see how to find the Fourier Transform.

## Fourier Transform in OpenCV

We use the functions: **cv.dft (src, dst, flags = 0, nonzeroRows = 0)**

-   @param src input array that could be real or complex.
-   @param dst output array whose size and type depends on the flags.
-   @param flags transformation flags, representing a combination of the cv.DftFlags.
-   @param nonzeroRows when the parameter is not zero, the function assumes that only the first nonzeroRows rows of the input array (DFT\_INVERSE is not set) or only the first nonzeroRows of the output array (DFT\_INVERSE is set) contain non-zeros; thus, the function can handle the rest of the rows more efficiently and save some time. This technique is very useful for calculating array cross-correlation or convolution using DFT.

**cv.getOptimalDFTSize (vecsize)**

@param vecsize vector size.

**cv.copyMakeBorder (src, dst, top, bottom, left, right, borderType, value = new cv.Scalar())**

-   @param src source image.
-   @param dst destination image of the same type as src and the size Size(src.cols+left+right, src.rows+top+bottom).
-   @param top number of pixels to extrapolate above the source image rectangle.
-   @param bottom number of pixels to extrapolate below the source image rectangle.
-   @param left number of pixels to extrapolate to the left of the source image rectangle.
-   @param right number of pixels to extrapolate to the right of the source image rectangle.
-   @param borderType border type.
-   @param value border value if borderType == cv.BORDER\_CONSTANT.

**cv.magnitude (x, y, magnitude)**

-   @param x floating-point array of x-coordinates of the vectors.
-   @param y floating-point array of y-coordinates of the vectors; it must have the same size as x.
-   @param magnitude output array of the same size and type as x.

**cv.split (m, mv)**

-   @param m input multi-channel array.
-   @param mv output vector of arrays; the arrays themselves are reallocated, if needed.

**cv.merge (mv, dst)**

-   @param mv input vector of matrices to be merged; all the matrices in mv must have the same size and the same depth.
-   @param dst output array of the same size and the same depth as mv\[0\]; the number of channels will be the total number of channels in the matrix array.


## [Js Table Of Contents Transforms](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_transforms/js_table_of_contents_transforms/)


# Image Transforms in OpenCV.js {#tutorial\_js\_table\_of\_contents\_transforms}

-   @subpage tutorial\_js\_fourier\_transform Learn to find the Fourier Transform of images

## [Js Watershed](https://docharvest.github.io/docs/opencv5/js_tutorials/js_imgproc/js_watershed/js_watershed/)


# Image Segmentation with Watershed Algorithm {#tutorial\_js\_watershed}

## Goal

-   We will learn how to use marker-based image segmentation using watershed algorithm
-   We will learn: **cv.watershed()**

## Theory

Any grayscale image can be viewed as a topographic surface where high intensity denotes peaks and hills while low intensity denotes valleys. You start filling every isolated valley (local minimum) with differently colored water (labels). As the water rises, depending on the peaks (gradients) nearby, water from different valleys, with different colors, will start to merge. To avoid that, you build barriers in the locations where the water merges. You continue filling water and building barriers until all the peaks are under water. Then the barriers you created give you the segmentation result. This is the "philosophy" behind the watershed. You can visit the [CMM webpage on watershed](https://people.cmm.minesparis.psl.eu/users/beucher/wtshed.html) to understand it with the help of some animations.

But this approach gives you an oversegmented result due to noise or other irregularities in the image. So OpenCV implements a marker-based watershed algorithm where you specify which valley points are to be merged and which are not. It is an interactive image segmentation. What we do is give different labels to the objects we know. Label the region which we are sure is the foreground or object with one color (or intensity), label the region which we are sure is background or non-object with another color, and finally label the region which we are not sure about with 0. That is our marker. Then apply the watershed algorithm. Our marker will then be updated with the labels we gave, and the boundaries of objects will have a value of -1.

## Code

Below we will see an example of how to use the distance transform along with watershed to segment mutually touching objects.

Consider the coins image below, in which the coins are touching each other. Even after thresholding, they still touch each other.

We start by finding an approximate estimate of the coins. For that, we can use Otsu's binarization.
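To make the thresholding step more concrete, here is a minimal plain-JavaScript sketch of the idea behind Otsu's method: it scans all 256 candidate thresholds and keeps the one that maximizes the between-class variance of the histogram. It is an illustration only, not OpenCV's implementation; the function name `otsuThreshold` and the sample pixel values are assumptions made for this example. In OpenCV.js this step would normally be a single `cv.threshold()` call with the `cv.THRESH_OTSU` flag.

```javascript
// Illustrative only: a plain-JavaScript version of Otsu's threshold selection.
// `pixels` is an array of 8-bit grayscale values (0..255).
function otsuThreshold(pixels) {
  const hist = new Array(256).fill(0);
  for (const p of pixels) hist[p]++;

  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];

  let weightBg = 0;      // number of pixels at or below the candidate threshold
  let sumBg = 0;         // sum of their intensities
  let bestVariance = -1;
  let bestThreshold = 0;

  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;            // lower class is still empty
    const weightFg = pixels.length - weightBg;
    if (weightFg === 0) break;               // upper class is empty from here on
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    // Between-class variance (up to a constant factor).
    const variance = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (variance > bestVariance) {
      bestVariance = variance;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

// Hypothetical sample: a dark group and a bright group of pixels.
const samplePixels = [10, 10, 10, 12, 200, 200, 205, 210];
const threshold = otsuThreshold(samplePixels);
// Pixels above the threshold become 255, the rest become 0.
const binary = samplePixels.map((p) => (p > threshold ? 255 : 0));
console.log(threshold, binary);
```

The chosen threshold falls between the dark and the bright group, so the binarized result separates the two groups cleanly.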

Now we need to remove any small white noise in the image. For that we can use morphological opening. To remove any small holes in the objects, we can use morphological closing. So now we know for sure that the regions near the centers of the objects are foreground and the regions far away from the objects are background. The only region we are not sure about is the boundary region of the coins.
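As a rough sketch of what opening does, the plain-JavaScript example below erodes and then dilates a small binary grid with a 3x3 neighbourhood. The helper names (`morph`, `erode`, `dilate`, `openMask`, `closeMask`) and the sample grid are made up for this illustration; in OpenCV.js the equivalent operation is available through `cv.morphologyEx()` with `cv.MORPH_OPEN` or `cv.MORPH_CLOSE`.

```javascript
// Illustrative only: 3x3 binary morphology on a small grid of 0/1 values.
// Pixels outside the grid are simply ignored.
function morph(grid, pick) {
  const rows = grid.length;
  const cols = grid[0].length;
  return grid.map((row, y) => row.map((_, x) => {
    const neighbours = [];
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const ny = y + dy;
        const nx = x + dx;
        if (ny >= 0 && ny < rows && nx >= 0 && nx < cols) {
          neighbours.push(grid[ny][nx]);
        }
      }
    }
    return pick(...neighbours);
  }));
}

const erode = (grid) => morph(grid, Math.min);     // shrinks white regions
const dilate = (grid) => morph(grid, Math.max);    // grows white regions
const openMask = (grid) => dilate(erode(grid));    // removes small white noise
const closeMask = (grid) => erode(dilate(grid));   // fills small holes

// Hypothetical input: a 3x3 object plus one isolated noise pixel at the top left.
const noisy = [
  [1, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0, 0],
];
const opened = openMask(noisy);
console.log(opened.map((row) => row.join(' ')).join('\n'));
```

Erosion wipes out the isolated pixel and shrinks the object to its centre; the following dilation grows the object back to its original extent, so only the noise is lost.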

So we need to extract the area which we are sure is coins. Erosion removes the boundary pixels, so whatever remains, we can be sure it is coin. That would work if the objects were not touching each other. But since they are touching each other, a better option is to find the distance transform and apply a proper threshold. Next we need to find the area which we are sure is not coins. For that, we dilate the result. Dilation grows the object boundary into the background. This way, we can make sure that whatever region is background in the result really is background, since the boundary region is removed. See the image below.

The remaining regions are those where we have no idea whether they are coins or background. The watershed algorithm should find that out. These areas are normally around the boundaries of coins where foreground and background meet (or even where two different coins meet). We call it the border. It can be obtained by subtracting the sure\_fg area from the sure\_bg area.
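The subtraction can be pictured with a small plain-JavaScript sketch on two 8-bit masks (255 = set, 0 = not set). The function name `subtractMasks` and the sample masks are hypothetical; in OpenCV.js the same result would typically come from `cv.subtract()` applied to the two Mats.

```javascript
// Illustrative only: saturating per-pixel subtraction of two 8-bit masks.
function subtractMasks(sureBg, sureFg) {
  const unknown = new Uint8Array(sureBg.length);
  for (let i = 0; i < sureBg.length; i++) {
    // Never go below 0, mirroring saturated 8-bit arithmetic.
    unknown[i] = Math.max(0, sureBg[i] - sureFg[i]);
  }
  return unknown;
}

// Hypothetical one-row masks: the dilated object and its sure core.
const sureBg = Uint8Array.from([0, 255, 255, 255, 255, 0]);
const sureFg = Uint8Array.from([0, 0, 255, 255, 0, 0]);
const unknown = subtractMasks(sureBg, sureFg);
console.log(Array.from(unknown)); // the ring between core and dilated object
```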

We use the function: **cv.distanceTransform (src, dst, distanceType, maskSize, labelType = cv.CV\_32F)**

-   @param src 8-bit, single-channel (binary) source image.
-   @param dst output image with calculated distances. It is an 8-bit or 32-bit floating-point, single-channel image of the same size as src.
-   @param distanceType type of distance (see cv.DistanceTypes).
-   @param maskSize size of the distance transform mask (see cv.DistanceTransformMasks).
-   @param labelType type of output image. It can be cv.CV\_8U or cv.CV\_32F. Type cv.CV\_8U can be used only for the first variant of the function and distanceType == DIST\_L1.

In the thresholded image, we get some regions which we are sure are coins, and they are now detached. (In some cases, you may be interested only in foreground segmentation, not in separating the mutually touching objects. In that case, you need not use the distance transform; erosion is sufficient. Erosion is just another method to extract the sure foreground area.)
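To see why thresholding the distance transform detaches touching objects, consider the following plain-JavaScript sketch. It computes a city-block (L1) distance transform with the classic two-pass scan and keeps only pixels whose distance exceeds a fraction of the maximum. The function names, the 0.7 fraction, and the sample grid are assumptions for illustration; this is not how `cv.distanceTransform()` is implemented.

```javascript
// Illustrative only: two-pass L1 distance transform on a grid of 0/1 values.
// Each non-zero pixel gets its city-block distance to the nearest zero pixel.
function distanceTransformL1(grid) {
  const rows = grid.length;
  const cols = grid[0].length;
  const INF = rows + cols; // larger than any possible L1 distance in the grid
  const dist = grid.map((row) => row.map((v) => (v ? INF : 0)));

  // Forward pass: propagate distances from the top and left neighbours.
  for (let y = 0; y < rows; y++) {
    for (let x = 0; x < cols; x++) {
      if (y > 0) dist[y][x] = Math.min(dist[y][x], dist[y - 1][x] + 1);
      if (x > 0) dist[y][x] = Math.min(dist[y][x], dist[y][x - 1] + 1);
    }
  }
  // Backward pass: propagate distances from the bottom and right neighbours.
  for (let y = rows - 1; y >= 0; y--) {
    for (let x = cols - 1; x >= 0; x--) {
      if (y < rows - 1) dist[y][x] = Math.min(dist[y][x], dist[y + 1][x] + 1);
      if (x < cols - 1) dist[y][x] = Math.min(dist[y][x], dist[y][x + 1] + 1);
    }
  }
  return dist;
}

// Keep pixels whose distance is above `fraction` of the maximum distance.
function sureForeground(dist, fraction) {
  const max = Math.max(...dist.map((row) => Math.max(...row)));
  return dist.map((row) => row.map((d) => (d > fraction * max ? 255 : 0)));
}

// Hypothetical input: two 3x3 "coins" joined by a one-pixel neck in the middle row.
const touching = [
  [0, 0, 0, 0, 0, 0, 0, 0, 0],
  [0, 1, 1, 1, 0, 1, 1, 1, 0],
  [0, 1, 1, 1, 1, 1, 1, 1, 0],
  [0, 1, 1, 1, 0, 1, 1, 1, 0],
  [0, 0, 0, 0, 0, 0, 0, 0, 0],
];
const dist = distanceTransformL1(touching);
const sureFg = sureForeground(dist, 0.7);
console.log(sureFg.map((row) => row.join(' ')).join('\n'));
```

The neck pixel is close to the background, so its distance is small and it drops out after thresholding, leaving one separate core per coin.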

Now we know for sure which regions are coins, which are background, and so on. So we create a marker (an array of the same size as the original image, but with int32 datatype) and label the regions inside it. The regions we know for sure (whether foreground or background) are labelled with positive integers, a different one for each region, and the area we don't know for sure is left as zero. For this we use **cv.connectedComponents()**. It labels the background of the image with 0, then the other objects are labelled with integers starting from 1.

But we know that if the background is marked with 0, watershed will consider it an unknown area. So we want to mark it with a different integer. Instead, we will mark the unknown region, defined by unknown, with 0.
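The relabelling described above can be sketched in plain JavaScript as follows. The function name `buildMarkers` and the sample arrays are hypothetical; with OpenCV.js Mats the same loop would run over the marker data (for example through an `Int32Array` view of the pixels).

```javascript
// Illustrative only: turn connected-component labels into watershed markers.
// labels:  Int32Array, 0 = background, 1..N = sure-foreground components
// unknown: Uint8Array, 255 where the pixel belongs to the unknown region
function buildMarkers(labels, unknown) {
  const markers = new Int32Array(labels.length);
  for (let i = 0; i < labels.length; i++) {
    // Shift every label by one so the sure background becomes 1,
    // then reserve 0 for the pixels watershed has to decide.
    markers[i] = unknown[i] === 255 ? 0 : labels[i] + 1;
  }
  return markers;
}

// Hypothetical one-row example with two objects.
const labels = Int32Array.from([0, 0, 1, 1, 0, 2, 2, 0]);
const unknown = Uint8Array.from([0, 255, 0, 0, 255, 0, 0, 255]);
const markers = buildMarkers(labels, unknown);
console.log(Array.from(markers));
```

After this step the sure background carries label 1, the objects carry labels 2 and 3, and only the unknown pixels are 0.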

Now our marker is ready. It is time for the final step: apply watershed. The marker image will then be modified. The boundary region will be marked with -1.

We use the function: **cv.connectedComponents (image, labels, connectivity = 8, ltype = cv.CV\_32S)**

-   @param image the 8-bit single-channel image to be labeled.
-   @param labels destination labeled image (cv.CV\_32SC1 type).
-   @param connectivity 8 or 4 for 8-way or 4-way connectivity respectively.
-   @param ltype output image label type. Currently cv.CV\_32S and cv.CV\_16U are supported.

We use the function: **cv.watershed (image, markers)**

-   @param image input 8-bit 3-channel image.
-   @param markers input/output 32-bit single-channel image (map) of markers. It should have the same size as image.

## [Js Intro](https://docharvest.github.io/docs/opencv5/js_tutorials/js_setup/js_intro/js_intro/)

# Introduction to OpenCV.js and Tutorials {#tutorial\_js\_intro}

## OpenCV

OpenCV was created at Intel in 1999 by **Gary Bradski**. The first release came out in 2000. **Vadim Pisarevsky** joined Gary Bradski to manage Intel's Russian software OpenCV team. In 2005, OpenCV was used on Stanley, the vehicle that won the 2005 DARPA Grand Challenge. Later, its active development continued under the support of Willow Garage, with Gary Bradski and Vadim Pisarevsky leading the project. OpenCV now supports a multitude of algorithms related to Computer Vision and Machine Learning and is expanding day by day.

OpenCV supports a wide variety of programming languages such as C++, Python, and Java, and is available on different platforms including Windows, Linux, OS X, Android, and iOS. Interfaces for high-speed GPU operations based on CUDA and OpenCL are also under active development. OpenCV.js brings OpenCV to the open web platform and makes it available to the JavaScript programmer.

## OpenCV.js: OpenCV for the JavaScript programmer

The web is the most ubiquitous open computing platform. With HTML5 standards implemented in every browser, web applications are able to render online video with HTML5 video tags, capture webcam video via the WebRTC API, and access each pixel of a video frame via the canvas API. With the abundance of available multimedia content, web developers are in need of a wide array of image and vision processing algorithms in JavaScript to build innovative applications. This requirement is even more essential for emerging applications on the web, such as Web Virtual Reality (WebVR) and Augmented Reality (WebAR). All of these use cases demand efficient implementations of computation-intensive vision kernels on the web.

[Emscripten](https://emscripten.org/) is an LLVM-to-JavaScript compiler. It takes LLVM bitcode, which can be generated from C/C++ using clang, and compiles it into asm.js or WebAssembly that can execute directly inside web browsers. Asm.js is a highly optimizable, low-level subset of JavaScript. It enables ahead-of-time compilation and optimization in the JavaScript engine, which provides near-native execution speed. WebAssembly is a new portable, size- and load-time-efficient binary format suitable for compilation to the web. WebAssembly aims to execute at native speed. WebAssembly is currently being designed as an open standard by W3C.

OpenCV.js is a JavaScript binding for a selected subset of OpenCV functions for the web platform. It allows emerging web applications with multimedia processing to benefit from the wide variety of vision functions available in OpenCV. OpenCV.js leverages Emscripten to compile OpenCV functions into asm.js or WebAssembly targets, and provides JavaScript APIs for web applications to access them. Future versions of the library will take advantage of acceleration APIs that are available on the web, such as SIMD and multi-threaded execution.

OpenCV.js was initially created in the Parallel Architectures and Systems Group at the University of California Irvine (UCI) as a research project funded by Intel Corporation. OpenCV.js was further improved and integrated into the OpenCV project as part of the Google Summer of Code 2017 program.

## OpenCV.js Tutorials

OpenCV introduces a new set of tutorials that will guide you through various functions available in OpenCV.js. **This guide is mainly focused on OpenCV 3.x version**.

The purpose of OpenCV.js tutorials is to:

\-# Help with adaptability of OpenCV in web development

\-# Help the web community, developers and computer vision researchers to interactively access a variety of web-based OpenCV examples to help them understand specific vision algorithms.

Because OpenCV.js is able to run directly inside the browser, the OpenCV.js tutorial web pages are intuitive and interactive. For example, using the WebRTC API and evaluating JavaScript code would allow developers to change the parameters of CV functions and do live CV coding on web pages to see the results in real time.

Prior knowledge of JavaScript and web application development is recommended to understand this guide.

## Contributors

Below is the list of contributors of OpenCV.js bindings and tutorials.

-   Sajjad Taheri (Architect of the initial version and GSoC mentor, University of California, Irvine)
-   Congxiang Pan (GSoC student, Shanghai Jiao Tong University)
-   Gang Song (GSoC student, Shanghai Jiao Tong University)
-   Wenyao Gan (Student intern, Shanghai Jiao Tong University)
-   Mohammad Reza Haghighat (Project initiator & sponsor, Intel Corporation)
-   Ningxin Hu (Students' supervisor, Intel Corporation)

## [Js Nodejs](https://docharvest.github.io/docs/opencv5/js_tutorials/js_setup/js_nodejs/js_nodejs/)

# Using OpenCV.js In Node.js {#tutorial\_js\_nodejs}

## Goals

In this tutorial, you will learn:

-   How to use OpenCV.js in a [Node.js](https://nodejs.org) application.
-   How to load images with [jimp](https://www.npmjs.com/package/jimp) in order to use them with OpenCV.js.
-   How to use [jsdom](https://www.npmjs.com/package/jsdom) and [node-canvas](https://www.npmjs.com/package/canvas) to support `cv.imread()` and `cv.imshow()`.
-   The basics of [emscripten](https://emscripten.org/) APIs, like [Module](https://emscripten.org/docs/api_reference/module.html) and [File System](https://emscripten.org/docs/api_reference/Filesystem-API.html), on which OpenCV.js is based.
-   Node.js basics. Although this tutorial assumes the user knows JavaScript, experience with Node.js is not required.

@note Besides giving instructions to run OpenCV.js in Node.js, another objective of this tutorial is to introduce users to the basics of [emscripten](https://emscripten.org/) APIs, like [Module](https://emscripten.org/docs/api_reference/module.html) and [File System](https://emscripten.org/docs/api_reference/Filesystem-API.html) and also Node.js.

## Minimal example

Create a file `example1.js` with the following content:

@code{.js}
// Define a global variable 'Module' with a method 'onRuntimeInitialized':
Module = {
  onRuntimeInitialized() {
    // this is our application:
    console.log(cv.getBuildInformation())
  }
}
// Load 'opencv.js' assigning the value to the global variable 'cv'
cv = require('./opencv.js')
@endcode

### Execute it

-   Save the file as `example1.js`.
-   Make sure the file `opencv.js` is in the same folder.
-   Make sure [Node.js](https://nodejs.org) is installed on your system.

The following command should print OpenCV build information:

@code{.bash}
node example1.js
@endcode

### What just happened?

-   **In the first statement**, we define a global variable named 'Module'. Emscripten will call `Module.onRuntimeInitialized()` when the library is ready to use. Our program is in that method and uses the global variable `cv` just like in the browser.
-   The statement **"cv = require('./opencv.js')"** requires the file `opencv.js` and assigns the return value to the global variable `cv`. `require()`, which is a Node.js API, is used to load modules and files. In this case we load the file `opencv.js` from the current folder and, as said previously, emscripten will call `Module.onRuntimeInitialized()` when it's ready.
-   See [emscripten Module API](https://emscripten.org/docs/api_reference/module.html) for more details.

## Working with images

OpenCV.js doesn't support image formats, so we can't load png or jpeg images directly. In the browser it uses the HTML DOM (like HTMLCanvasElement and HTMLImageElement) to decode and encode images. In Node.js we will need to use a library for this.

In this example we use [jimp](https://www.npmjs.com/package/jimp), which supports common image formats and is pretty easy to use.
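The decoded image that is handed to OpenCV.js is assumed here to be an ImageData-like object, that is `{ width, height, data }` with four bytes (RGBA) per pixel. The plain-JavaScript sketch below builds such an object from grayscale values to show the expected memory layout; the helper name `grayToImageData` and the sample values are made up for this illustration.

```javascript
// Illustrative only: build an ImageData-like object (RGBA, 4 bytes per pixel)
// from a flat array of grayscale values.
function grayToImageData(gray, width, height) {
  if (gray.length !== width * height) {
    throw new RangeError('gray.length must equal width * height');
  }
  const data = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < gray.length; i++) {
    data[4 * i] = gray[i];      // R
    data[4 * i + 1] = gray[i];  // G
    data[4 * i + 2] = gray[i];  // B
    data[4 * i + 3] = 255;      // A (fully opaque)
  }
  return { width, height, data };
}

// Hypothetical 2x1 image: one black and one white pixel.
const imageData = grayToImageData([0, 255], 2, 1);
console.log(imageData.width, imageData.height, Array.from(imageData.data));
```

An object with this shape is what a decoder such as jimp exposes as its bitmap, which is why it can be passed on without further conversion.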

### Example setup

Execute the following commands to create a new node.js package and install [jimp](https://www.npmjs.com/package/jimp) dependency:

@code{.bash}
mkdir project1
cd project1
npm init -y
npm install jimp
@endcode

### The example

@code{.js}
const Jimp = require('jimp');

async function onRuntimeInitialized(){

  // load local image file with jimp. It supports jpg, png, bmp, tiff and gif:
  var jimpSrc = await Jimp.read('./lena.jpg');

  // `jimpSrc.bitmap` property has the decoded ImageData that we can use to create a cv:Mat
  var src = cv.matFromImageData(jimpSrc.bitmap);

  // following lines is copy&paste of opencv.js dilate tutorial:
  let dst = new cv.Mat();
  let M = cv.Mat.ones(5, 5, cv.CV_8U);
  let anchor = new cv.Point(-1, -1);
  cv.dilate(src, dst, M, anchor, 1, cv.BORDER_CONSTANT, cv.morphologyDefaultBorderValue());

  // Now that we are finished, we want to write `dst` to file `output.png`. For this we create a `Jimp`
  // image which accepts the image data as a [`Buffer`](https://nodejs.org/docs/latest-v10.x/api/buffer.html).
  // `write('output.png')` will write it to disk and Jimp infers the output format from given file name:
  new Jimp({
    width: dst.cols,
    height: dst.rows,
    data: Buffer.from(dst.data)
  })
  .write('output.png');

  src.delete();
  dst.delete();
}

// Finally, load opencv.js as before. The function `onRuntimeInitialized` contains our program.
Module = {
  onRuntimeInitialized
};
cv = require('./opencv.js');
@endcode

### Execute it

-   Save the file as `exampleNodeJimp.js`.
-   Make sure a sample image `lena.jpg` exists in the current directory.

The following command should generate the file `output.png`:

@code{.bash}
node exampleNodeJimp.js
@endcode

## Emulating HTML DOM and canvas

As you may have already seen, the rest of the examples use functions like `cv.imread()` and `cv.imshow()` to read and write images. Unfortunately, as mentioned, they won't work on Node.js since there is no HTML DOM.

In this section, you will learn how to use [jsdom](https://www.npmjs.com/package/jsdom) and [node-canvas](https://www.npmjs.com/package/canvas) to emulate the HTML DOM on Node.js so those functions work.

### Example setup

As before, we create a Node.js project and install the dependencies we need:

@code{.bash}
mkdir project2
cd project2
npm init -y
npm install canvas jsdom
@endcode

### The example

@code{.js}
const { Canvas, createCanvas, Image, ImageData, loadImage } = require('canvas');
const { JSDOM } = require('jsdom');
const { writeFileSync, existsSync, mkdirSync } = require("fs");

// This is our program. This time we use JavaScript async / await and promises to handle asynchronicity.
(async () => {

  // before loading opencv.js we emulate a minimal HTML DOM. See the function declaration below.
  installDOM();

  await loadOpenCV();

  // using node-canvas, we load an image file into an object compatible with HTML DOM Image and therefore with cv.imread()
  const image = await loadImage('./lena.jpg');

  const src = cv.imread(image);
  const dst = new cv.Mat();
  const M = cv.Mat.ones(5, 5, cv.CV_8U);
  const anchor = new cv.Point(-1, -1);
  cv.dilate(src, dst, M, anchor, 1, cv.BORDER_CONSTANT, cv.morphologyDefaultBorderValue());

  // we create an object compatible with HTMLCanvasElement
  const canvas = createCanvas(300, 300);
  cv.imshow(canvas, dst);
  writeFileSync('output.jpg', canvas.toBuffer('image/jpeg'));
  src.delete();
  dst.delete();
})();

// Load opencv.js just like before but using Promise instead of callbacks:
function loadOpenCV() {
  return new Promise(resolve => {
    global.Module = {
      onRuntimeInitialized: resolve
    };
    global.cv = require('./opencv.js');
  });
}

// Using jsdom and node-canvas we define some global variables to emulate HTML DOM.
// Although a complete emulation can be achieved, here we only define those globals used
// by cv.imread() and cv.imshow().
function installDOM() {
  const dom = new JSDOM();
  global.document = dom.window.document;

  // The rest enables DOM image and canvas and is provided by node-canvas
  global.Image = Image;
  global.HTMLCanvasElement = Canvas;
  global.ImageData = ImageData;
  global.HTMLImageElement = Image;
}
@endcode

### Execute it

-   Save the file as `exampleNodeCanvas.js`.
-   Make sure a sample image `lena.jpg` exists in the current directory.

The following command should generate the file `output.jpg`:

@code{.bash}
node exampleNodeCanvas.js
@endcode

## Dealing with files

In this section you will learn how to configure emscripten so that it uses the local filesystem for file operations instead of memory. It also describes how [files are supported by emscripten applications](https://emscripten.org/docs/api_reference/Filesystem-API.html).

Accessing the emscripten filesystem is often needed in OpenCV applications, for example to load machine learning models such as the ones used in @ref tutorial\_dnn\_googlenet and @ref tutorial\_dnn\_javascript.

### Example setup

Before the example, it is worth considering how files are handled in emscripten applications such as OpenCV.js. Remember that the OpenCV library is written in C++ and the file opencv.js is just that C++ code translated to JavaScript or WebAssembly by the emscripten C++ compiler.

These C++ sources use standard APIs to access the filesystem, and the implementation often ends up in system calls that read a file on the hard drive. Since JavaScript applications in the browser don't have access to the local filesystem, [emscripten emulates a standard filesystem](https://emscripten.org/docs/api_reference/Filesystem-API.html) so compiled C++ code works out of the box.

In the browser, this filesystem is emulated in memory, while in Node.js there's also the possibility of using the local filesystem directly. This is often preferable since there's no need to copy the file's content into memory. This section explains how to do just that, that is, how to configure emscripten so files are accessed directly from our local filesystem and relative paths match files relative to the current local directory, as expected.
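The effect of mounting a local folder can be pictured with a small plain-JavaScript sketch that maps a path inside the virtual filesystem to the local path it would end up at. This is only a mental model, not emscripten's implementation; the function name `toLocalPath` and the directory `/home/user/project` are hypothetical, while `/work` matches the mount point used in the example that follows.

```javascript
// Illustrative only: where does a virtual path land once `localRootDir`
// is mounted at `rootDir` inside the emulated filesystem?
function toLocalPath(virtualPath, rootDir = '/work', localRootDir = '/home/user/project') {
  const insideMount =
    virtualPath === rootDir || virtualPath.startsWith(rootDir + '/');
  if (!insideMount) {
    return null; // served by the in-memory filesystem, not by the local disk
  }
  return localRootDir + virtualPath.slice(rootDir.length);
}

console.log(toLocalPath('/work/foo.txt'));
console.log(toLocalPath('/work/models/net.onnx'));
console.log(toLocalPath('/tmp/foo.txt'));
```

Paths below the mount point resolve to files under the local folder, while everything else stays in memory. Changing the working directory of the virtual filesystem to the mount point is what makes relative paths behave as they would in an ordinary Node.js program.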

### The example

@code{.js}
const { Canvas, createCanvas, Image, ImageData, loadImage } = require('canvas');
const { JSDOM } = require('jsdom');
const { writeFileSync, existsSync, mkdirSync } = require('fs');
const https = require('https');

(async () => {
  const createFileFromUrl = function (path, url, maxRedirects = 10) {
    console.log('Downloading ' + url + '...');
    return new Promise((resolve, reject) => {
      const download = (url, redirectCount) => {
        if (redirectCount > maxRedirects) {
          reject(new Error('Too many redirects'));
        } else {
          let connection = https.get(url, (response) => {
            if (response.statusCode === 200) {
              let data = [];
              response.on('data', (chunk) => {
                data.push(chunk);
              });

              response.on('end', () => {
                try {
                  writeFileSync(path, Buffer.concat(data));
                  resolve();
                } catch (err) {
                  reject(new Error('Failed to write file ' + path));
                }
              });
            } else if (response.statusCode === 302 || response.statusCode === 301) {
              connection.abort();
              download(response.headers.location, redirectCount + 1);
            } else {
              reject(new Error('Failed to load ' + url + ' status: ' + response.statusCode));
            }
          }).on('error', (err) => {
            reject(new Error('Network Error: ' + err.message));
          });
        }
      };
      download(url, 0);
    });
  };

  if (!existsSync('./face_detection_yunet_2023mar.onnx')) {
    await createFileFromUrl('./face_detection_yunet_2023mar.onnx', 'https://media.githubusercontent.com/media/opencv/opencv_zoo/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx')
  }

  if (!existsSync('./opencv.js')) {
    await createFileFromUrl('./opencv.js', 'https://docs.opencv.org/5.x/opencv.js')
  }

  if (!existsSync('./lena.jpg')) {
    await createFileFromUrl('./lena.jpg', 'https://docs.opencv.org/5.x/lena.jpg')
  }

  await loadOpenCV();

  const image = await loadImage('./lena.jpg');
  const src = cv.imread(image);
  let srcBGR = new cv.Mat();
  cv.cvtColor(src, srcBGR, cv.COLOR_RGBA2BGR);

  // Load the deep learning model file. Notice how we reference local files using relative paths just
  // like we normally would do
  let netDet = new cv.FaceDetectorYN("./face_detection_yunet_2023mar.onnx", "", new cv.Size(320, 320), 0.9, 0.3, 5000);
  netDet.setInputSize(new cv.Size(src.cols, src.rows));
  let out = new cv.Mat();
  netDet.detect(srcBGR, out);

  // Each detection occupies 15 floats: box (x, y, w, h), five landmarks (x, y) and a confidence.
  let faces = [];
  for (let i = 0, n = out.data32F.length; i < n; i += 15) {
    let left = out.data32F[i];
    let top = out.data32F[i + 1];
    let right = (out.data32F[i] + out.data32F[i + 2]);
    let bottom = (out.data32F[i + 1] + out.data32F[i + 3]);
    left = Math.min(Math.max(0, left), src.cols - 1);
    top = Math.min(Math.max(0, top), src.rows - 1);
    right = Math.min(Math.max(0, right), src.cols - 1);
    bottom = Math.min(Math.max(0, bottom), src.rows - 1);

    if (left < right && top < bottom) {
      faces.push({
        x: left,
        y: top,
        width: right - left,
        height: bottom - top,
        x1: out.data32F[i + 4] < 0 || out.data32F[i + 4] > src.cols - 1 ? -1 : out.data32F[i + 4],
        y1: out.data32F[i + 5] < 0 || out.data32F[i + 5] > src.rows - 1 ? -1 : out.data32F[i + 5],
        x2: out.data32F[i + 6] < 0 || out.data32F[i + 6] > src.cols - 1 ? -1 : out.data32F[i + 6],
        y2: out.data32F[i + 7] < 0 || out.data32F[i + 7] > src.rows - 1 ? -1 : out.data32F[i + 7],
        x3: out.data32F[i + 8] < 0 || out.data32F[i + 8] > src.cols - 1 ? -1 : out.data32F[i + 8],
        y3: out.data32F[i + 9] < 0 || out.data32F[i + 9] > src.rows - 1 ? -1 : out.data32F[i + 9],
        x4: out.data32F[i + 10] < 0 || out.data32F[i + 10] > src.cols - 1 ? -1 : out.data32F[i + 10],
        y4: out.data32F[i + 11] < 0 || out.data32F[i + 11] > src.rows - 1 ? -1 : out.data32F[i + 11],
        x5: out.data32F[i + 12] < 0 || out.data32F[i + 12] > src.cols - 1 ? -1 : out.data32F[i + 12],
        y5: out.data32F[i + 13] < 0 || out.data32F[i + 13] > src.rows - 1 ? -1 : out.data32F[i + 13],
        confidence: out.data32F[i + 14]
      })
    }
  }
  out.delete();

  faces.forEach(function(rect) {
    cv.rectangle(src, {x: rect.x, y: rect.y}, {x: rect.x + rect.width, y: rect.y + rect.height}, [0, 255, 0, 255]);
    if(rect.x1>0 && rect.y1>0)
      cv.circle(src, {x: rect.x1, y: rect.y1}, 2, [255, 0, 0, 255], 2)
    if(rect.x2>0 && rect.y2>0)
      cv.circle(src, {x: rect.x2, y: rect.y2}, 2, [0, 0, 255, 255], 2)
    if(rect.x3>0 && rect.y3>0)
      cv.circle(src, {x: rect.x3, y: rect.y3}, 2, [0, 255, 0, 255], 2)
    if(rect.x4>0 && rect.y4>0)
      cv.circle(src, {x: rect.x4, y: rect.y4}, 2, [255, 0, 255, 255], 2)
    if(rect.x5>0 && rect.y5>0)
      cv.circle(src, {x: rect.x5, y: rect.y5}, 2, [0, 255, 255, 255], 2)
  });

  const canvas = createCanvas(image.width, image.height);
  cv.imshow(canvas, src);
  writeFileSync('output3.jpg', canvas.toBuffer('image/jpeg'));
  console.log('The result is saved.')
  src.delete();
  srcBGR.delete();
})();

/**
 * Loads opencv.js.
 *
 * Installs HTML Canvas emulation to support `cv.imread()` and `cv.imshow`
 *
 * Mounts given local folder `localRootDir` in emscripten filesystem folder `rootDir`.
 * By default it will mount the local current directory in emscripten `/work` directory.
 * This means that `/work/foo.txt` will be resolved to the local file `./foo.txt`
 * @param {string} rootDir The directory in emscripten filesystem in which the local filesystem will be mounted.
 * @param {string} localRootDir The local directory to mount in emscripten filesystem.
 * @returns {Promise} resolved when the library is ready to use.
 */
function loadOpenCV(rootDir = '/work', localRootDir = process.cwd()) {
  if(global.Module && global.Module.onRuntimeInitialized && global.cv && global.cv.imread) {
    // The library is already loaded, so there is nothing left to do
    return Promise.resolve()
  }
  return new Promise(resolve => {
    installDOM()
    global.Module = {
      onRuntimeInitialized() {
        // We change emscripten current work directory to 'rootDir' so relative paths are resolved
        // relative to the current local folder, as expected
        cv.FS.chdir(rootDir)
        resolve()
      },
      preRun() {
        // preRun() is another callback like onRuntimeInitialized() but is called just before the
        // library code runs. Here we mount a local folder in emscripten filesystem and we want to
        // do this before the library is executed so the filesystem is accessible from the start
        const FS = global.Module.FS
        // create rootDir if it doesn't exist
        if(!FS.analyzePath(rootDir).exists) {
          FS.mkdir(rootDir);
        }
        // create localRootDir if it doesn't exist
        if(!existsSync(localRootDir)) {
          mkdirSync(localRootDir, { recursive: true});
        }
        // FS.mount() is similar to Linux/POSIX mount operation. It basically mounts an external
        // filesystem with given format, in given current filesystem directory.
        FS.mount(FS.filesystems.NODEFS, { root: localRootDir}, rootDir);
      }
    };
    global.cv = require('./opencv.js')
  });
}

function installDOM(){
  const dom = new JSDOM();
  global.document = dom.window.document;
  global.Image = Image;
  global.HTMLCanvasElement = Canvas;
  global.ImageData = ImageData;
  global.HTMLImageElement = Image;
}
@endcode
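The detection loop in the example reads 15 consecutive floats per face. As a compact way to see that layout, the following plain-JavaScript sketch parses the same kind of flat array into objects, using a loop for the five landmarks. The function name `parseFaces`, the `landmarks` array shape, and the sample numbers are assumptions for this illustration; only the 15-value layout and the clamping rules are taken from the example above.

```javascript
// Illustrative only: parse a flat array with 15 floats per detection:
// [x, y, w, h, x1, y1, ..., x5, y5, confidence]
function parseFaces(data, cols, rows) {
  const STRIDE = 15;
  const clamp = (v, max) => Math.min(Math.max(0, v), max);
  const faces = [];
  for (let i = 0; i + STRIDE <= data.length; i += STRIDE) {
    const left = clamp(data[i], cols - 1);
    const top = clamp(data[i + 1], rows - 1);
    const right = clamp(data[i] + data[i + 2], cols - 1);
    const bottom = clamp(data[i + 1] + data[i + 3], rows - 1);
    if (left >= right || top >= bottom) continue; // degenerate box, skip it

    const landmarks = [];
    for (let k = 0; k < 5; k++) {
      const x = data[i + 4 + 2 * k];
      const y = data[i + 5 + 2 * k];
      // Landmarks outside the image are flagged with -1, as in the example.
      landmarks.push({
        x: x < 0 || x > cols - 1 ? -1 : x,
        y: y < 0 || y > rows - 1 ? -1 : y,
      });
    }
    faces.push({
      x: left,
      y: top,
      width: right - left,
      height: bottom - top,
      landmarks,
      confidence: data[i + 14],
    });
  }
  return faces;
}

// Hypothetical detector output: one valid face and one zero-width box.
const sample = Float32Array.from([
  10, 20, 30, 40, 15, 25, 35, 25, 25, 35, 18, 50, 120, 50, 0.5,
  50, 50, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25,
]);
const faces = parseFaces(sample, 100, 100);
console.log(JSON.stringify(faces, null, 2));
```

With the detector from the example, the input would be `out.data32F` together with `src.cols` and `src.rows`.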

### Execute it

-   Save the file as `exampleNodeCanvasData.js`.
-   The files `face_detection_yunet_2023mar.onnx`, `lena.jpg` and `opencv.js` will be downloaded if they are not present in the project's directory.

The following command should generate the file `output3.jpg`, which should look like the image below:

@code{.bash}
node exampleNodeCanvasData.js
@endcode

## [Js Setup](https://docharvest.github.io/docs/opencv5/js_tutorials/js_setup/js_setup/js_setup/)

# Build OpenCV.js {#tutorial\_js\_setup}

@note You don't have to build your own copy if you simply want to start using it. Refer to the Using OpenCV.js tutorial for steps on getting a prebuilt copy from our releases or online documentation.

## Installing Emscripten

[Emscripten](https://github.com/emscripten-core/emscripten) is an LLVM-to-JavaScript compiler. We will use Emscripten to build OpenCV.js.

@note While this describes installation of required tools from scratch, there's a section below also describing an alternative procedure to perform the same build using docker containers which is often easier.

To install Emscripten, follow the instructions of the [Emscripten SDK](https://emscripten.org/docs/getting_started/downloads.html).

For example:

@code{.bash}
./emsdk update
./emsdk install latest
./emsdk activate latest
@endcode

After installation, ensure the `EMSDK` environment variable is set up correctly.

For example:

@code{.bash}
source ./emsdk_env.sh
echo ${EMSDK}
@endcode

Modern versions of Emscripten require the use of the `emcmake` / `emmake` launchers:

@code{.bash}
emcmake sh -c 'echo ${EMSCRIPTEN}'
@endcode

Version 2.0.10 of emscripten is verified for the latest WebAssembly. Please check the version of Emscripten to use the newest features of WebAssembly.

For example:

@code{.bash}
./emsdk update
./emsdk install 2.0.10
./emsdk activate 2.0.10
@endcode

## Obtaining OpenCV Source Code

You can use the latest stable OpenCV version or you can grab the latest snapshot from our [Git repository](https://github.com/opencv/opencv.git).

### Obtaining the Latest Stable OpenCV Version

-   Go to our [releases page](https://opencv.org/releases).
-   Download the source archive and unpack it.

### Obtaining the Cutting-edge OpenCV from the Git Repository

Launch Git client and clone [OpenCV repository](http://github.com/opencv/opencv).

For example:

@code{.bash}
git clone https://github.com/opencv/opencv.git
@endcode

@note It requires `git` installed in your development environment.

## Building OpenCV.js from Source

\-# To build `opencv.js`, execute the python script `<opencv_src_dir>/platforms/js/build_js.py <build_dir>`. The build script builds the WebAssembly version by default (the `--build_wasm` switch is kept for backward compatibility). By default everything is bundled into one JavaScript file by `base64` encoding the WebAssembly code. For production builds you can add `--disable_single_file`, which will reduce the total size by writing the WebAssembly code to a dedicated `.wasm` file which the generated JavaScript file will automatically load.

```
For example, to build in `build_js` directory:
@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js
@endcode

@note
- It requires `python` and `cmake` installed in your development environment.
- To build with Emscripten 4.0.20 or later, append --cmake_option="-DCMAKE_CXX_STANDARD=17" .
  Embind requires C++17 or later since Emscripten 4.0.20.
  @code{.bash}
  emcmake python ./opencv/platforms/js/build_js.py build_js --cmake_option="-DCMAKE_CXX_STANDARD=17"
  @endcode
```

\-# \[Optional\] To build the OpenCV.js loader, append `--build_loader`.

```
For example:
@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_loader
@endcode

@note
The loader is implemented as a js file in the path `<opencv_js_dir>/bin/loader.js`. The loader utilizes the [WebAssembly Feature Detection](https://github.com/GoogleChromeLabs/wasm-feature-detect) to detect the features of the browser and load corresponding OpenCV.js automatically. To use it, you need to use the UMD version of [WebAssembly Feature Detection](https://github.com/GoogleChromeLabs/wasm-feature-detect) and introduce the `loader.js` in your Web application.

Example Code:
@code{.javascript}
// Set paths configuration
let pathsConfig = {
    wasm: "../../build_wasm/opencv.js",
    threads: "../../build_mt/opencv.js",
    simd: "../../build_simd/opencv.js",
    threadsSimd: "../../build_mtSIMD/opencv.js",
}

// Load OpenCV.js and use the pathsConfiguration and main function as the params.
loadOpenCV(pathsConfig, main);
@endcode
```

\-# \[optional\] To build documents, append `--build_doc` option.

```
For example:
@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_doc
@endcode

@note
It requires `doxygen` installed in your development environment.
```

\-# \[optional\] To build tests, append `--build_test` option.

```
For example:
@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_test
@endcode
```

\-# \[optional\] To enable OpenCV contrib modules append `--cmake_option="-DOPENCV_EXTRA_MODULES_PATH=/path/to/opencv_contrib/modules/"`

```
For example:
@code{.bash}
emcmake python ./platforms/js/build_js.py build_js --cmake_option="-DOPENCV_EXTRA_MODULES_PATH=opencv_contrib/modules"
@endcode
```

\-# \[optional\] To enable WebNN backend, append `--webnn` option.

```
For example:
@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --webnn
@endcode
```

## Running OpenCV.js Tests

Remember to launch the build command passing `--build_test` as mentioned previously. This will generate test source code ready to run together with the `opencv.js` file in `build_js/bin`.

### Manually in your browser

To run tests, launch a local web server in the `<build_dir>/bin` folder. For example, node http-server, which serves on `localhost:8080`.

Navigate the web browser to `http://localhost:8080/tests.html`, which runs the unit tests automatically. Command example:

@code{.sh}
npx http-server build_js/bin
firefox http://localhost:8080/tests.html
@endcode

@note This snippet and the following require [Node.js](https://nodejs.org) to be installed.

### Headless with Puppeteer

Alternatively tests can run with [GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer#readme) which is a version of Google Chrome that runs in the terminal (useful for Continuous integration like travis CI, etc)

@code{.sh} cd build\_js/bin npm install npm install --no-save puppeteer # automatically downloads Chromium package node run\_puppeteer.js @endcode

@note Check out `node run_puppeteer --help` for more debugging and reporting options.

@note The command `npm install` only needs to be executed once, since it installs the tool dependencies; after that they are ready to use.

@note Use `PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 npm install --no-save puppeteer` to skip the automatic download of Chromium. You may specify your own Chromium/Chrome binary through the `PUPPETEER_EXECUTABLE_PATH=$(which google-chrome)` environment variable. **BEWARE**: Puppeteer is only guaranteed to work with the bundled Chromium; use another binary at your own risk.

### Using Node.js

For example:

@code{.sh}
cd build_js/bin
npm install
node tests.js
@endcode

@note If all tests fail, consider using Node.js version 8.x (`lts/carbon` from `nvm`).

\-# \[optional\] To build `opencv.js` with threads optimization, append `--threads` option.

For example:

@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_wasm --threads
@endcode

The default number of threads is the number of logical cores of your device. You can use `cv.parallel_pthreads_set_threads_num(number)` to set the number of threads yourself and `cv.parallel_pthreads_get_threads_num()` to get the current number of threads.

@note You must build the wasm version of `opencv.js` to enable this optimization. The threads optimization only works in the browser, not in Node.js. You first need to enable the `WebAssembly threads support` feature in your browser. For example, if you use Chrome, enable this flag in chrome://flags.

\-# \[optional\] To build `opencv.js` with wasm simd optimization, append `--simd` option.

For example:

@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_wasm --simd
@endcode

The simd optimization is experimental as wasm simd is still in development.

@note Currently only the emscripten LLVM upstream backend supports wasm simd; see https://emscripten.org/docs/porting/simd.html. You therefore need to set up the upstream backend environment first with the following commands:

@code{.bash}
./emsdk update
./emsdk install latest-upstream
./emsdk activate latest-upstream
source ./emsdk_env.sh
@endcode

@note You must build the wasm version of `opencv.js` to enable this optimization. In the browser, you first need to enable the `WebAssembly SIMD support` feature. For example, if you use Chrome, enable this flag in chrome://flags. In Node.js, you need to run the script with the flag `--experimental-wasm-simd`.

@note The simd version of `opencv.js` built by the latest LLVM upstream may not work with a stable browser or an old version of Node.js. Use the latest unstable version of a browser (such as `Chrome Dev`) or of Node.js to get new features.

\-# \[optional\] To build wasm intrinsics tests, append `--build_wasm_intrin_test` option.

For example:

@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_wasm --simd --build_wasm_intrin_test
@endcode

For wasm intrinsics tests, you can use the following function to test all the cases:

@code{.js}
cv.test_hal_intrin_all()
@endcode

The failed cases will be logged in the JavaScript debug console.

If you only want to test a single data type of wasm intrinsics, you can use the following functions:

@code{.js}
cv.test_hal_intrin_uint8()
cv.test_hal_intrin_int8()
cv.test_hal_intrin_uint16()
cv.test_hal_intrin_int16()
cv.test_hal_intrin_uint32()
cv.test_hal_intrin_int32()
cv.test_hal_intrin_uint64()
cv.test_hal_intrin_int64()
cv.test_hal_intrin_float32()
cv.test_hal_intrin_float64()
@endcode

\-# \[optional\] To build performance tests, append `--build_perf` option.

For example:

@code{.bash}
emcmake python ./opencv/platforms/js/build_js.py build_js --build_perf
@endcode

To run performance tests, launch a local web server in the `<build_dir>/bin` folder, for example the Node.js http-server, which serves on `localhost:8080`.

The performance tests currently cover kernels such as `cvtColor`, `resize` and `threshold`. For example, to test `threshold`, navigate the web browser to `http://localhost:8080/perf/perf_imgproc/perf_threshold.html`. Enter a test parameter such as `(1920x1080, CV_8UC1, THRESH_BINARY)`, then click the `Run` button to run the case. If you do not enter a parameter, all the cases of this kernel are run.

You can also run tests using Node.js.

For example, run `threshold` with parameter `(1920x1080, CV_8UC1, THRESH_BINARY)`:

@code{.sh}
cd bin/perf
npm install
node perf_threshold.js --test_param_filter="(1920x1080, CV_8UC1, THRESH_BINARY)"
@endcode

## Building OpenCV.js with Docker

Alternatively, the same build can be accomplished using [docker](https://www.docker.com/) containers, which is often easier and more reliable, particularly on non-Linux systems. You only need to install [docker](https://www.docker.com/) on your system and use a popular container that provides a clean, well-tested environment for emscripten builds and already has the latest versions of all the necessary tools installed.

So, make sure [docker](https://www.docker.com/) is installed on your system and running. The following shell script should work on Linux and macOS:

@code{.bash}
git clone https://github.com/opencv/opencv.git
cd opencv
docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) emscripten/emsdk emcmake python3 ./platforms/js/build_js.py build_js
@endcode

In Windows use the following PowerShell command:

@code{.bash}
docker run --rm --workdir /src -v "$(get-location):/src" "emscripten/emsdk" emcmake python3 ./platforms/js/build_js.py build_js
@endcode

@warning The example uses the latest version of emscripten. If the build fails, try a version that is known to work, such as `2.0.10`, using the following command:

@code{.bash}
docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) emscripten/emsdk:2.0.10 emcmake python3 ./platforms/js/build_js.py build_js
@endcode

In Windows use the following PowerShell command:

@code{.bash}
docker run --rm --workdir /src -v "$(get-location):/src" "emscripten/emsdk:2.0.10" emcmake python3 ./platforms/js/build_js.py build_js
@endcode

### Building the documentation with Docker

To build the documentation `doxygen` needs to be installed. Create a file named `Dockerfile` with the following content:

```
FROM emscripten/emsdk:2.0.10

RUN apt-get update \
  && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends doxygen \
  && rm -rf /var/lib/apt/lists/*
```

Then we build the docker image and name it `opencv-js-doc` with the following command (that needs to be run only once):

@code{.bash}
docker build . -t opencv-js-doc
@endcode

Now run the build command again, this time using the new image and passing `--build_doc`:

@code{.bash}
docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) "opencv-js-doc" emcmake python3 ./platforms/js/build_js.py build_js --build_doc
@endcode

## [Js Table Of Contents Setup](https://docharvest.github.io/docs/opencv5/js_tutorials/js_setup/js_table_of_contents_setup/)

# Introduction to OpenCV.js {#tutorial\_js\_table\_of\_contents\_setup}

-   @subpage tutorial\_js\_intro
    
    Introduction of OpenCV.js and Tutorials
    
-   @subpage tutorial\_js\_usage
    
    Get started with OpenCV.js
    
-   @subpage tutorial\_js\_setup
    
    Build OpenCV.js from source
    
-   @subpage tutorial\_js\_nodejs
    
    Using OpenCV.js In Node.js

## [Js Usage](https://docharvest.github.io/docs/opencv5/js_tutorials/js_setup/js_usage/js_usage/)

# Using OpenCV.js {#tutorial\_js\_usage}

## Steps

In this tutorial, you will learn how to include and start to use `opencv.js` inside a web page. You can get a copy of `opencv.js` from `opencv-{VERSION_NUMBER}-docs.zip` in each [release](https://github.com/opencv/opencv/releases), or simply download the prebuilt script from the online documentation at "[https://docs.opencv.org/{VERSION\_NUMBER}/opencv.js](https://docs.opencv.org/%7BVERSION_NUMBER%7D/opencv.js)" (for example, [https://docs.opencv.org/5.0.0/opencv.js](https://docs.opencv.org/5.0.0/opencv.js); use `5.x` if you want the latest build). You can also build your own copy by following the tutorial @ref tutorial\_js\_setup.

### Create a web page

First, let's create a simple web page that is able to upload an image.

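@code{.js}
<!DOCTYPE html>
<html>
<head><title>Hello OpenCV.js</title></head>
<body>
<h2>Hello OpenCV.js</h2>
<img id="imageSrc" alt="No Image" />
<div class="caption">imageSrc <input type="file" id="fileInput" name="file" /></div>
<script type="text/javascript">
let imgElement = document.getElementById("imageSrc");
let inputElement = document.getElementById("fileInput");
inputElement.addEventListener("change", (e) => {
  imgElement.src = URL.createObjectURL(e.target.files[0]);
}, false);
</script>
</body>
</html>
@endcode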

To run this web page, copy the content above, save it to a local index.html file, and open that file in your web browser.

@note It is a better practice to use a local web server to host the index.html.

### Include OpenCV.js

Set the `src` attribute of the `<script>` tag to the URL of `opencv.js`.

@note For this tutorial, we host `opencv.js` in the same folder as index.html. You can also choose to use the URL of the prebuilt `opencv.js` in our online documentation.

Example for synchronous loading:

@code{.js}
<script src="opencv.js" type="text/javascript"></script>
@endcode

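You may want to load `opencv.js` asynchronously by using the `async` attribute of the `<script>` tag. To be notified when `opencv.js` is ready, you can register a callback in the `onload` attribute (here a function you define yourself, named `onOpenCvReady` for illustration).

Example for asynchronous loading:

@code{.js}
<script async src="opencv.js" onload="onOpenCvReady();" type="text/javascript"></script>
@endcode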

### Use OpenCV.js

Once `opencv.js` is ready, you can access OpenCV objects and functions through the `cv` object. The promise-typed `cv` object should be unwrapped with the `await` operator. See [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await).

For example, you can create a cv.Mat from an image by cv.imread.

@note Because image loading is asynchronous, you need to put cv.Mat creation inside the `onload` callback.

@code{.js}
imgElement.onload = async function() {
  cv = (cv instanceof Promise) ? await cv : cv;
  let mat = cv.imread(imgElement);
}
@endcode
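The `instanceof Promise` check can be factored into a small helper. The sketch below is not part of the OpenCV.js API: `resolveCv` is a hypothetical name, and `fakeCv` only stands in for the real module so that the snippet runs without `opencv.js`.

```javascript
// Hypothetical helper (not an OpenCV.js function): returns the ready module
// whether `maybeCv` is the module itself or a Promise that resolves to it.
async function resolveCv(maybeCv) {
  return (maybeCv instanceof Promise) ? await maybeCv : maybeCv;
}

// Stand-in for the real module, used only to make this sketch runnable.
const fakeCv = { imread: (element) => ({ source: element }) };

resolveCv(Promise.resolve(fakeCv)).then((cv) => {
  // The resolved object exposes the module's functions directly.
  console.log(typeof cv.imread);
});
```

In a page, the same helper would be called as `cv = await resolveCv(cv);` inside an `async` callback before the first use of `cv`.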

Many OpenCV functions can be used to process cv.Mat. You can refer to other tutorials, such as @ref tutorial\_js\_table\_of\_contents\_imgproc, for details.

In this tutorial, we just show a cv.Mat on screen. To show a cv.Mat, you need a canvas element.

@code{.js}
<canvas id="outputCanvas"></canvas>
@endcode

You can use cv.imshow to show cv.Mat on the canvas.

@code{.js}
cv.imshow("outputCanvas", mat);
@endcode

Putting all of the steps together, the final index.html is shown below.

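@code{.js}
<!DOCTYPE html>
<html>
<head><title>Hello OpenCV.js</title></head>
<body>
<h2>Hello OpenCV.js</h2>
<p id="status">OpenCV.js is loading...</p>
<img id="imageSrc" alt="No Image" />
<div class="caption">imageSrc <input type="file" id="fileInput" name="file" /></div>
<canvas id="canvasOutput"></canvas>
<div class="caption">canvasOutput</div>
<script type="text/javascript">
let imgElement = document.getElementById("imageSrc");
document.getElementById("fileInput").addEventListener("change", (e) => {
  imgElement.src = URL.createObjectURL(e.target.files[0]);
}, false);
imgElement.onload = async function() {
  cv = (cv instanceof Promise) ? await cv : cv;
  let mat = cv.imread(imgElement);
  cv.imshow("canvasOutput", mat);
  mat.delete();
};
</script>
<script async src="opencv.js" onload="document.getElementById('status').innerHTML = 'OpenCV.js is ready.';" type="text/javascript"></script>
</body>
</html>
@endcode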

@note You have to call the `delete` method of cv.Mat to free the memory allocated in Emscripten's heap. Please refer to [Memory management of Emscripten](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#memory-management) for details.
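One way to make sure `delete` is always called is to wrap the processing in `try`/`finally`. The sketch below is an illustration under assumptions: `withCleanup` is a hypothetical helper, not an OpenCV.js function, and `FakeMat` stands in for cv.Mat so that the snippet runs without `opencv.js`.

```javascript
// Hypothetical helper: runs `fn` on the given objects and then calls delete()
// on each of them, even if `fn` throws.
function withCleanup(resources, fn) {
  try {
    return fn(...resources);
  } finally {
    for (const resource of resources) {
      resource.delete();
    }
  }
}

// Stand-in for cv.Mat so the sketch runs without opencv.js.
class FakeMat {
  constructor() { this.deleted = false; }
  delete() { this.deleted = true; }
}

const src = new FakeMat();
const dst = new FakeMat();
withCleanup([src, dst], (a, b) => {
  // ... process a and b here ...
});
console.log(src.deleted, dst.deleted); // true true
```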

## [Js Tutorials](https://docharvest.github.io/docs/opencv5/js_tutorials/js_tutorials/)

# OpenCV.js Tutorials {#tutorial\_js\_root}

-   @subpage tutorial\_js\_table\_of\_contents\_setup
    
    Learn how to use OpenCV.js inside your web pages!
    
-   @subpage tutorial\_js\_table\_of\_contents\_gui
    
    Here you will learn how to read and display images and videos, and create trackbars.
    
-   @subpage tutorial\_js\_table\_of\_contents\_core
    
    In this section you will learn some basic operations on image, some mathematical tools and some data structures etc.
    
-   @subpage tutorial\_js\_table\_of\_contents\_imgproc
    
    In this section you will learn different image processing functions inside OpenCV.
    
-   @subpage tutorial\_js\_table\_of\_contents\_video
    
    In this section you will learn different techniques to work with videos like object tracking etc.
    
-   @subpage tutorial\_js\_table\_of\_contents\_dnn
    
    These tutorials show how to use the dnn module in JavaScript.

## [Js Bg Subtraction](https://docharvest.github.io/docs/opencv5/js_tutorials/js_video/js_bg_subtraction/js_bg_subtraction/)

# Background Subtraction {#tutorial\_js\_bg\_subtraction}

## Goal

-   We will become familiar with the background subtraction methods available in OpenCV.js.

## Basics

Background subtraction is a major preprocessing step in many vision-based applications. For example, consider a visitor counter where a static camera counts the visitors entering or leaving a room, or a traffic camera extracting information about vehicles. In all these cases, you first need to extract the persons or vehicles alone. Technically, you need to extract the moving foreground from the static background.

If you have an image of the background alone, like an image of the room without visitors or of the road without vehicles, it is an easy job. Just subtract the new image from the background, and you get the foreground objects alone. But in most cases you do not have such an image, so the background must be extracted from whatever images are available. It becomes more complicated when vehicles cast shadows. Since a shadow also moves, simple subtraction marks it as foreground as well.
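The simple subtraction described above can be sketched in plain JavaScript on flat arrays of gray levels. This is only an illustration: `subtractBackground` and the threshold value are assumptions made for this example, not OpenCV.js API.

```javascript
// Marks a pixel as foreground (255) when it differs from the background
// by more than `threshold`; both images are flat arrays of gray levels.
function subtractBackground(frame, background, threshold) {
  const mask = new Uint8Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    mask[i] = Math.abs(frame[i] - background[i]) > threshold ? 255 : 0;
  }
  return mask;
}

const background = [100, 100, 100, 100];
// Second pixel: a bright object. Last pixel: a darker, shadow-like region.
const frame = [102, 220, 99, 60];
console.log(Array.from(subtractBackground(frame, background, 30)));
// -> [ 0, 255, 0, 255 ]
```

The last pixel shows the shadow problem: it is only darker than the background, yet it ends up in the foreground mask.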

OpenCV.js has implemented one algorithm for this purpose, which is very easy to use.

## BackgroundSubtractorMOG2

It is a Gaussian Mixture-based Background/Foreground Segmentation Algorithm. It is based on two papers by Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction" in 2004 and "Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction" in 2006. One important feature of this algorithm is that it selects the appropriate number of Gaussian distributions for each pixel. It provides better adaptability to scenes that vary due to illumination changes etc.

While coding, we use the constructor: **cv.BackgroundSubtractorMOG2 (history = 500, varThreshold = 16, detectShadows = true)**

@param history Length of the history.

@param varThreshold Threshold on the squared distance between the pixel and the sample to decide whether a pixel is close to that sample. This parameter does not affect the background update.

@param detectShadows If true, the algorithm will detect shadows and mark them. It decreases the speed a bit, so if you do not need this feature, set the parameter to false.

@return instance of cv.BackgroundSubtractorMOG2

Use the **apply (image, fgmask, learningRate = -1)** method to get the foreground mask.

@param image Next video frame. A floating point frame will be used without scaling and should be in range \[0,255\].

@param fgmask The output foreground mask as an 8-bit binary image.

@param learningRate The value between 0 and 1 that indicates how fast the background model is learnt. A negative value makes the algorithm use an automatically chosen learning rate. 0 means that the background model is not updated at all, 1 means that the background model is completely reinitialized from the last frame.

@note The instance of cv.BackgroundSubtractorMOG2 should be deleted manually.
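To get a feeling for the two endpoints of `learningRate`, the sketch below uses a plain running average per pixel. This is deliberately not the MOG2 algorithm (which maintains a mixture of Gaussians per pixel); `updateBackground` is a hypothetical function that only illustrates how a rate of 0 freezes the model and a rate of 1 replaces it with the last frame.

```javascript
// Blends the current frame into a per-pixel background model.
// learningRate = 0 keeps the model unchanged; 1 replaces it with the frame.
function updateBackground(model, frame, learningRate) {
  return model.map((b, i) => (1 - learningRate) * b + learningRate * frame[i]);
}

const model = [100, 100];
const frame = [200, 50];
console.log(updateBackground(model, frame, 0));   // [ 100, 100 ]
console.log(updateBackground(model, frame, 1));   // [ 200, 50 ]
console.log(updateBackground(model, frame, 0.5)); // [ 150, 75 ]
```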

## [Js Lucas Kanade](https://docharvest.github.io/docs/opencv5/js_tutorials/js_video/js_lucas_kanade/js_lucas_kanade/)

# Optical Flow {#tutorial\_js\_lucas\_kanade}

## Goal

-   We will understand the concepts of optical flow and its estimation using the Lucas-Kanade method.
-   We will use functions like **cv.calcOpticalFlowPyrLK()** to track feature points in a video.

## Optical Flow

Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or the camera. It is a 2D vector field where each vector is a displacement vector showing the movement of points from the first frame to the second. Consider the image below (Image Courtesy: [Wikipedia article on Optical Flow](http://en.wikipedia.org/wiki/Optical_flow)).

It shows a ball moving in 5 consecutive frames. The arrow shows its displacement vector. Optical flow has many applications in areas like:

-   Structure from Motion
-   Video Compression
-   Video Stabilization ...

Optical flow works on several assumptions:

\-# The pixel intensities of an object do not change between consecutive frames.
\-# Neighbouring pixels have similar motion.

Consider a pixel \\f$I(x,y,t)\\f$ in the first frame (note that a new dimension, time, is added here; earlier we were working with single images, so time was not needed). It moves by distance \\f$(dx,dy)\\f$ in the next frame, taken after \\f$dt\\f$ time. Since those pixels are the same and the intensity does not change, we can say,

\\f\[I(x,y,t) = I(x+dx, y+dy, t+dt)\\f\]

Then take the Taylor series approximation of the right-hand side, remove common terms and divide by \\f$dt\\f$ to get the following equation:

\\f\[f\_x u + f\_y v + f\_t = 0\\f\]

where:

\\f\[f\_x = \\frac{\\partial f}{\\partial x} \\; ; \\; f\_y = \\frac{\\partial f}{\\partial y}\\f\]

\\f\[u = \\frac{dx}{dt} \\; ; \\; v = \\frac{dy}{dt}\\f\]

The above equation is called the Optical Flow equation. In it, \\f$f\_x\\f$ and \\f$f\_y\\f$ are the image gradients, which we can compute. Similarly, \\f$f\_t\\f$ is the gradient along time. But \\f$(u,v)\\f$ is unknown, and we cannot solve one equation with two unknown variables. Several methods have been proposed to solve this problem, and one of them is Lucas-Kanade.

### Lucas-Kanade method

We have seen an assumption before, that all the neighbouring pixels will have similar motion. The Lucas-Kanade method takes a 3x3 patch around the point, so all 9 points are assumed to have the same motion. We can find \\f$(f\_x, f\_y, f\_t)\\f$ for these 9 points. Our problem then becomes solving 9 equations with two unknown variables, which is over-determined. A better solution is obtained with the least squares fit method. Below is the final solution, a two-equation, two-unknown problem that can be solved directly.

\\f\[\\begin{bmatrix} u \\\\ v \\end{bmatrix} = \\begin{bmatrix} \\sum\_{i}{f\_{x\_i}}^2 & \\sum\_{i}{f\_{x\_i} f\_{y\_i} } \\\\ \\sum\_{i}{f\_{x\_i} f\_{y\_i}} & \\sum\_{i}{f\_{y\_i}}^2 \\end{bmatrix}^{-1} \\begin{bmatrix} - \\sum\_{i}{f\_{x\_i} f\_{t\_i}} \\\\ - \\sum\_{i}{f\_{y\_i} f\_{t\_i}} \\end{bmatrix}\\f\]

(Note the similarity of the inverse matrix to the Harris corner detector. It indicates that corners are better points to track.)
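The least squares solution for one patch can be written out in a few lines of plain JavaScript. This is a minimal sketch on hand-made gradient values, not the implementation used by OpenCV.js; `lucasKanadeStep` and the sample data are assumptions made for illustration.

```javascript
// Solves the 2x2 Lucas-Kanade normal equations for one patch.
// fx, fy, ft hold the spatial and temporal gradients, one entry per pixel.
function lucasKanadeStep(fx, fy, ft) {
  let sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
  for (let i = 0; i < fx.length; i++) {
    sxx += fx[i] * fx[i];
    sxy += fx[i] * fy[i];
    syy += fy[i] * fy[i];
    sxt += fx[i] * ft[i];
    syt += fy[i] * ft[i];
  }
  const det = sxx * syy - sxy * sxy;
  if (Math.abs(det) < 1e-12) {
    return null; // singular matrix: the flow cannot be determined here
  }
  // [u, v] = inverse([[sxx, sxy], [sxy, syy]]) * [-sxt, -syt]
  const u = (-syy * sxt + sxy * syt) / det;
  const v = (sxy * sxt - sxx * syt) / det;
  return { u, v };
}

// Synthetic 3x3 patch whose gradients satisfy fx*u + fy*v + ft = 0
// for a known motion (u, v) = (1, -2).
const trueU = 1, trueV = -2;
const fx = [1, 0, 2, 1, 0, 1, 2, 0, 1];
const fy = [0, 1, 1, 2, 2, 0, 1, 1, 1];
const ft = fx.map((gx, i) => -(gx * trueU + fy[i] * trueV));
const flow = lucasKanadeStep(fx, fy, ft);
console.log(flow.u.toFixed(3), flow.v.toFixed(3)); // 1.000 -2.000
```

The `null` case corresponds to a patch without enough gradient variation, which is why textured corner points are preferred for tracking.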

From the user's point of view, the idea is simple: we give some points to track, and we receive the optical flow vectors of those points. But again there are some problems. Until now, we were dealing with small motions, so the method fails when there is large motion. To handle this, we use pyramids. When we go up in the pyramid, small motions are removed and large motions become small motions. So by applying Lucas-Kanade there, we get optical flow along with the scale.
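The effect of the pyramid on motion size can be illustrated with simple arithmetic. The sketch assumes that each pyramid level halves the image size, so a displacement measured in pixels is halved as well; `displacementPerLevel` is a hypothetical helper.

```javascript
// Displacement in pixels seen at each pyramid level, assuming every level
// halves the image size (an assumption of this sketch).
function displacementPerLevel(displacement, maxLevel) {
  const result = [];
  for (let level = 0; level <= maxLevel; level++) {
    result.push(displacement / Math.pow(2, level));
  }
  return result;
}

// A 16-pixel motion in the original image becomes a 2-pixel motion at level 3.
console.log(displacementPerLevel(16, 3)); // [ 16, 8, 4, 2 ]
```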

## Lucas-Kanade Optical Flow in OpenCV.js

We use the function: **cv.calcOpticalFlowPyrLK (prevImg, nextImg, prevPts, nextPts, status, err, winSize = new cv.Size(21, 21), maxLevel = 3, criteria = new cv.TermCriteria(cv.TermCriteria\_COUNT+ cv.TermCriteria\_EPS, 30, 0.01), flags = 0, minEigThreshold = 1e-4)**.

@param prevImg first 8-bit input image or pyramid constructed by buildOpticalFlowPyramid.

@param nextImg second input image or pyramid of the same size and the same type as prevImg.

@param prevPts vector of 2D points for which the flow needs to be found; point coordinates must be single-precision floating-point numbers.

@param nextPts output vector of 2D points (with single-precision floating-point coordinates) containing the calculated new positions of input features in the second image; when the cv.OPTFLOW\_USE\_INITIAL\_FLOW flag is passed, the vector must have the same size as in the input.

@param status output status vector (of unsigned chars); each element of the vector is set to 1 if the flow for the corresponding feature has been found, otherwise, it is set to 0.

@param err output vector of errors; each element of the vector is set to an error for the corresponding feature, type of the error measure can be set in flags parameter; if the flow wasn't found then the error is not defined (use the status parameter to find such cases).

@param winSize size of the search window at each pyramid level.

@param maxLevel 0-based maximal pyramid level number; if set to 0, pyramids are not used (single level), if set to 1, two levels are used, and so on; if pyramids are passed to input then the algorithm will use as many levels as the pyramids have but no more than maxLevel.

@param criteria parameter, specifying the termination criteria of the iterative search algorithm (after the specified maximum number of iterations criteria.maxCount or when the search window moves by less than criteria.epsilon).

@param flags operation flags:

-   cv.OPTFLOW\_USE\_INITIAL\_FLOW uses initial estimations, stored in nextPts; if the flag is not set, then prevPts is copied to nextPts and is considered the initial estimate.
-   cv.OPTFLOW\_LK\_GET\_MIN\_EIGENVALS uses minimum eigenvalues as an error measure (see minEigThreshold description); if the flag is not set, then the L1 distance between patches around the original and a moved point, divided by the number of pixels in a window, is used as an error measure.

@param minEigThreshold the algorithm calculates the minimum eigenvalue of a 2x2 normal matrix of optical flow equations, divided by the number of pixels in a window; if this value is less than minEigThreshold, then the corresponding feature is filtered out and its flow is not processed, which removes bad points and gives a performance boost.

@note Tracking with **cv.calcOpticalFlowPyrLK()** alone does not check how correct the next keypoints are. Even if a feature point disappears from the image, the optical flow may find a next point that looks close to it. For robust tracking, corner points should therefore be detected again at regular intervals.

## Dense Optical Flow in OpenCV.js

Lucas-Kanade method computes optical flow for a sparse feature set (in our example, corners detected using Shi-Tomasi algorithm). OpenCV.js provides another algorithm to find the dense optical flow. It computes the optical flow for all the points in the frame. It is based on Gunnar Farneback's algorithm which is explained in "Two-Frame Motion Estimation Based on Polynomial Expansion" by Gunnar Farneback in 2003.

We use the function: **cv.calcOpticalFlowFarneback (prev, next, flow, pyrScale, levels, winsize, iterations, polyN, polySigma, flags)**

@param prev first 8-bit single-channel input image.

@param next second input image of the same size and the same type as prev.

@param flow computed flow image that has the same size as prev and type CV\_32FC2.

@param pyrScale parameter, specifying the image scale (<1) to build pyramids for each image; pyrScale=0.5 means a classical pyramid, where each next layer is half the size of the previous one.

@param levels number of pyramid layers including the initial image; levels=1 means that no extra layers are created and only the original images are used.

@param winsize averaging window size; larger values increase the algorithm robustness to image noise and give more chances for fast motion detection, but yield a more blurred motion field.

@param iterations number of iterations the algorithm does at each pyramid level.

@param polyN size of the pixel neighborhood used to find the polynomial expansion in each pixel; larger values mean that the image will be approximated with smoother surfaces, yielding a more robust algorithm and a more blurred motion field, typically polyN=5 or 7.

@param polySigma standard deviation of the Gaussian that is used to smooth derivatives used as a basis for the polynomial expansion; for polyN=5, you can set polySigma=1.1, for polyN=7, a good value would be polySigma=1.5.

@param flags operation flags that can be a combination of the following:

-   cv.OPTFLOW\_USE\_INITIAL\_FLOW uses the input flow as an initial flow approximation.
-   cv.OPTFLOW\_FARNEBACK\_GAUSSIAN uses the Gaussian winsize×winsize filter instead of a box filter of the same size for optical flow estimation; usually, this option gives a more accurate flow than a box filter, at the cost of lower speed; normally, winsize for a Gaussian window should be set to a larger value to achieve the same level of robustness.

## [Js Meanshift](https://docharvest.github.io/docs/opencv5/js_tutorials/js_video/js_meanshift/js_meanshift/)

# Meanshift and Camshift {#tutorial\_js\_meanshift}

## Goal

-   We will learn about Meanshift and Camshift algorithms to find and track objects in videos.

## Meanshift

The intuition behind meanshift is simple. Consider a set of points (it can be a pixel distribution like a histogram backprojection). You are given a small window (for example a circle) and you have to move that window to the area of maximum pixel density (or maximum number of points). It is illustrated in the simple image given below:

The initial window is shown as a blue circle with the name "C1". Its original center is marked with a blue rectangle, named "C1\_o". But if you find the centroid of the points inside that window, you will get the point "C1\_r" (marked with a small blue circle), which is the real centroid of the window. They do not match. So move your window such that the center of the new window coincides with the previous centroid. Again find the new centroid. Most probably, it will not match either. So move it again, and continue the iterations until the center of the window and its centroid fall on the same location (or within a small desired error). What you finally obtain is a window with maximum pixel distribution. It is marked with a green circle, named "C2". As you can see in the image, it has the maximum number of points. The whole process is demonstrated on a static image below:

So we normally pass the histogram backprojected image and initial target location. When the object moves, obviously the movement is reflected in histogram backprojected image. As a result, meanshift algorithm moves our window to the new location with maximum density.
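The iteration described above can be sketched on a small set of 2D points in plain JavaScript. This is an illustration of the idea only: the `meanShift` function below is hypothetical, works on a list of points with a circular window, and is not **cv.meanShift**, which operates on a back projection image with a rectangular window.

```javascript
// Moves a circular window of the given radius toward the densest region of
// `points` by repeatedly recentering it on the centroid of the points inside.
function meanShift(points, start, radius, maxIter = 50, eps = 1e-3) {
  let center = { x: start.x, y: start.y };
  for (let iter = 0; iter < maxIter; iter++) {
    let sumX = 0, sumY = 0, count = 0;
    for (const p of points) {
      const dx = p.x - center.x, dy = p.y - center.y;
      if (dx * dx + dy * dy <= radius * radius) {
        sumX += p.x;
        sumY += p.y;
        count++;
      }
    }
    if (count === 0) break; // empty window: nothing to follow
    const next = { x: sumX / count, y: sumY / count };
    const shift = Math.hypot(next.x - center.x, next.y - center.y);
    center = next;
    if (shift < eps) break; // window center and centroid now coincide
  }
  return center;
}

const points = [
  { x: 0, y: 0 }, { x: 2, y: 0 },                                 // sparse points
  { x: 4, y: 4 }, { x: 5, y: 4 }, { x: 4, y: 5 }, { x: 5, y: 5 }, // dense cluster
];
const result = meanShift(points, { x: 3, y: 3 }, 3);
console.log(result); // { x: 4.5, y: 4.5 }
```

Note that the result depends on the starting window: mean shift climbs to a nearby density peak, which is why the initial target location has to be provided.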

### Meanshift in OpenCV.js

To use meanshift in OpenCV.js, we first need to set up the target and find its histogram so that we can backproject the target on each frame for the calculation of meanshift. We also need to provide the initial location of the window. For the histogram, only Hue is considered here. Also, to avoid false values due to low light, low light values are discarded using the **cv.inRange()** function.

We use the function: **cv.meanShift (probImage, window, criteria)**

@param probImage Back projection of the object histogram. See cv.calcBackProject for details.

@param window Initial search window.

@param criteria Stop criteria for the iterative search algorithm.

@return number of iterations meanShift took to converge and the new location

## Camshift

There is a problem with the meanshift result. The window always has the same size, whether the object is far away or very close to the camera. That is not good. We need to adapt the window size to the size and rotation of the target. Once again, the solution came from "OpenCV Labs" and it is called CAMshift (Continuously Adaptive Meanshift), published by Gary Bradski in his paper "Computer Vision Face Tracking for Use in a Perceptual User Interface" in 1998.

It applies meanshift first. Once meanshift converges, it updates the size of the window as, \\f$s = 2 \\times \\sqrt{\\frac{M\_{00}}{256}}\\f$. It also calculates the orientation of best fitting ellipse to it. Again it applies the meanshift with new scaled search window and previous window location. The process is continued until required accuracy is met.
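The window size update is a one-line computation. In the sketch below, \\f$M\_{00}\\f$ is taken to be the zeroth moment of the back projection inside the current window, that is, the sum of its pixel values (an assumption of this example, as the text does not define it); `camshiftWindowSize` is a hypothetical helper.

```javascript
// Window size following s = 2 * sqrt(M00 / 256), where m00 is assumed to be
// the sum of the back projection values inside the current window.
function camshiftWindowSize(m00) {
  return 2 * Math.sqrt(m00 / 256);
}

console.log(camshiftWindowSize(256 * 400)); // 40
```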

### Camshift in OpenCV.js

It is almost the same as meanshift, but it returns a rotated rectangle (that is our result) and box parameters (to be passed as the search window in the next iteration).

We use the function: **cv.CamShift (probImage, window, criteria)**

@param probImage Back projection of the object histogram. See cv.calcBackProject for details.

@param window Initial search window.

@param criteria Stop criteria for the iterative search algorithm.

@return Rotated rectangle and the new search window

## Additional Resources

\-# French Wikipedia page on [Camshift](http://fr.wikipedia.org/wiki/Camshift). (The two animations are taken from here)
\-# Bradski, G.R., "Real time face and object tracking as a component of a perceptual user interface," Applications of Computer Vision, 1998. WACV '98. Proceedings., Fourth IEEE Workshop on, pp. 214-219, 19-21 Oct 1998

## [Js Table Of Contents Video](https://docharvest.github.io/docs/opencv5/js_tutorials/js_video/js_table_of_contents_video/)

# Video Analysis {#tutorial\_js\_table\_of\_contents\_video}

-   @subpage tutorial\_js\_meanshift
    
    Here, we will learn about tracking algorithms such as "Meanshift", and its upgraded version, "Camshift" to find and track objects in videos.
    
-   @subpage tutorial\_js\_lucas\_kanade
    
    Now let's discuss an important concept, "Optical Flow", which is related to videos and has many applications.
    
-   @subpage tutorial\_js\_bg\_subtraction
    
    In several applications, we need to extract foreground for further operations like object tracking. Background Subtraction is a well-known method in those cases.

## [Opencv Logo](https://docharvest.github.io/docs/opencv5/opencv-logo/)

The OpenCV logo was originally designed and contributed to OpenCV by Adi Shavit in 2006. The graphical part consists of three stylized letters O, C, V, colored in the primary R, G, B color components, used by humans and computers to perceive the world. It is shaped in a way to mimic the famous [Kanizsa's triangle](https://en.wikipedia.org/wiki/Illusory_contours) to emphasize that prior knowledge and internal processing are at least as important as the actually acquired "raw" data.

The restyled version of the logo was designed and contributed by [xperience.ai](https://xperience.ai/) in July 2020 for the [20th anniversary](https://opencv.org/anniversary/) of OpenCV.

The logo uses the [Exo 2](https://fonts.google.com/specimen/Exo+2#about) font by Natanael Gama, distributed under the OFL license.

A higher-resolution version of the logo, as well as an SVG version, can be obtained from the OpenCV [Media Kit](https://opencv.org/resources/media-kit/).

## [Py Bindings Basics](https://docharvest.github.io/docs/opencv5/py_tutorials/py_bindings/py_bindings_basics/py_bindings_basics/)


## [Py Table Of Contents Bindings](https://docharvest.github.io/docs/opencv5/py_tutorials/py_bindings/py_table_of_contents_bindings/)

# OpenCV-Python Bindings {#tutorial\_py\_table\_of\_contents\_bindings}

Here, you will learn how OpenCV-Python bindings are generated.

-   @subpage tutorial\_py\_bindings\_basics
    
    Learn how OpenCV-Python bindings are generated.

## [Py Calibration](https://docharvest.github.io/docs/opencv5/py_tutorials/py_calib3d/py_calibration/py_calibration/)

# Camera Calibration {#tutorial\_py\_calibration}

## Goal

In this section, we will learn about

-   types of distortion caused by cameras
-   how to find the intrinsic and extrinsic properties of a camera
-   how to undistort images based off these properties

## Basics

Some pinhole cameras introduce significant distortion to images. Two major kinds of distortion are radial distortion and tangential distortion.

Radial distortion causes straight lines to appear curved. Radial distortion becomes larger the farther points are from the center of the image. For example, one image is shown below in which two edges of a chess board are marked with red lines. But, you can see that the border of the chess board is not a straight line and doesn't match with the red line. All the expected straight lines are bulged out. Visit [Distortion (optics)](http://en.wikipedia.org/wiki/Distortion_%28optics%29) for more details.

In the following sections several new parameters are introduced. Visit [Camera Calibration and 3D Reconstruction](#tutorial_table_of_content_calib3d) for more details.

Radial distortion can be represented as follows:

\\f\[x\_{distorted} = x( 1 + k\_1 r^2 + k\_2 r^4 + k\_3 r^6) \\\\ y\_{distorted} = y( 1 + k\_1 r^2 + k\_2 r^4 + k\_3 r^6)\\f\]

Similarly, tangential distortion occurs because the image-taking lens is not aligned perfectly parallel to the imaging plane. As a result, some areas in the image may look nearer than expected. The amount of tangential distortion can be represented as below:

\\f\[x\_{distorted} = x + \[ 2p\_1xy + p\_2(r^2+2x^2)\] \\\\ y\_{distorted} = y + \[ p\_1(r^2+ 2y^2)+ 2p\_2xy\]\\f\]

In short, we need to find five parameters, known as distortion coefficients given by:

\\f\[Distortion \\; coefficients=(k\_1 \\hspace{10pt} k\_2 \\hspace{10pt} p\_1 \\hspace{10pt} p\_2 \\hspace{10pt} k\_3)\\f\]
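The two distortion formulas above can be sketched in NumPy as follows. This is only an illustration: the helper name `distort_points` is hypothetical, the inputs are assumed to be normalized image coordinates, and the radial and tangential terms are assumed to be summed into a single model, as OpenCV's combined distortion model does.

```python
import numpy as np

def distort_points(xy, k1, k2, p1, p2, k3):
    """Apply radial and tangential distortion to normalized image points.

    xy is an (N, 2) array of undistorted normalized coordinates (x, y).
    """
    xy = np.asarray(xy, dtype=np.float64)
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y                                   # r^2
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3     # 1 + k1 r^2 + k2 r^4 + k3 r^6
    x_tan = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)   # tangential shift in x
    y_tan = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y   # tangential shift in y
    return np.stack([x * radial + x_tan, y * radial + y_tan], axis=1)

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
# Illustrative coefficients only: a negative k1 pulls points toward the center.
barrel = distort_points(points, k1=-0.2, k2=0.0, p1=0.0, p2=0.0, k3=0.0)
```

With all five coefficients set to zero the points are returned unchanged, which is a quick sanity check for the model.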

In addition to this, we need some other information, like the intrinsic and extrinsic parameters of the camera. Intrinsic parameters are specific to a camera. They include information like focal length (\\f$f\_x,f\_y\\f$) and optical centers (\\f$c\_x, c\_y\\f$). The focal length and optical centers can be used to create a camera matrix, which can be used to remove distortion due to the lenses of a specific camera. The camera matrix is unique to a specific camera, so once calculated, it can be reused on other images taken by the same camera. It is expressed as a 3x3 matrix:

\\f\[camera \\; matrix = \\left \[ \\begin{matrix} f\_x & 0 & c\_x \\\\ 0 & f\_y & c\_y \\\\ 0 & 0 & 1 \\end{matrix} \\right \]\\f\]
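As a rough illustration of what the camera matrix does, the sketch below projects a 3D point given in the camera coordinate frame to pixel coordinates. The focal lengths, optical center, and the helper name `project` are made-up values for the example, and lens distortion is ignored.

```python
import numpy as np

# Hypothetical intrinsics, chosen only for illustration.
fx, fy = 500.0, 500.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def project(K, point_cam):
    """Project a 3D point (camera coordinates) to pixel coordinates."""
    uvw = K @ np.asarray(point_cam, dtype=np.float64)
    return uvw[:2] / uvw[2]          # divide by depth to get (u, v)

center = project(K, [0.0, 0.0, 1.0])   # a point on the optical axis
pixel = project(K, [0.1, -0.2, 2.0])   # an off-axis point 2 units away
```

A point on the optical axis lands on the optical center (c_x, c_y), which is one way to read the last column of the matrix.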

Extrinsic parameters correspond to the rotation and translation vectors which transform the coordinates of a 3D point from the world coordinate system into the camera coordinate system.

For stereo applications, these distortions need to be corrected first. To find these parameters, we must provide some sample images of a well defined pattern (e.g. a chess board). We find some specific points of which we already know the relative positions (e.g. square corners in the chess board). We know the coordinates of these points in real world space and we know the coordinates in the image, so we can solve for the distortion coefficients. For better results, we need at least 10 test patterns.

## Code

As mentioned above, we need at least 10 test patterns for camera calibration. OpenCV comes with some images of a chess board (see samples/data/left01.jpg -- left14.jpg), so we will utilize these. Consider an image of a chess board. The important input data needed for calibration of the camera is the set of 3D real world points and the corresponding 2D coordinates of these points in the image. The 2D image points are easy to find from the image itself. (These image points are the locations where two black squares touch each other on the chess board.)

What about the 3D points from real world space? Those images were taken from a static camera, with the chess board placed at different locations and orientations, so we would need to know the \\f$(X,Y,Z)\\f$ values. But for simplicity, we can say the chess board was kept stationary on the XY plane (so Z=0 always) and the camera was moved accordingly. This consideration leaves only the X,Y values to find. For the X,Y values, we can simply pass the points as (0,0), (1,0), (2,0), ..., which denote the locations of the points. In this case, the results we get will be in units of the chess board square size. But if we know the square size (say 30 mm), we can pass the values as (0,0), (30,0), (60,0), ... and get the results in mm. (In this case, we don't know the square size since we didn't take those images, so we pass the points in terms of square size.)

3D points are called **object points** and 2D image points are called **image points.**
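A minimal sketch of building such object points for a grid of inner corners, optionally scaled by a known square size, could look like this. The helper name `make_object_points` and the 30 mm square size are assumptions for illustration only.

```python
import numpy as np

def make_object_points(cols, rows, square_size=1.0):
    """Return (cols*rows, 3) object points on the Z=0 plane.

    cols, rows: number of inner corners per row and per column.
    square_size: side length of one square (1.0 means "units of squares").
    """
    objp = np.zeros((rows * cols, 3), np.float32)
    # x runs fastest, then y: (0,0), (1,0), ..., (cols-1,0), (0,1), ...
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    return objp * square_size

objp_mm = make_object_points(7, 6, square_size=30.0)
```

With `square_size=1.0` this gives the points in units of squares, and passing the real square size switches the results to physical units.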

### Setup

To find the pattern in the chess board, we can use the function **cv.findChessboardCorners()**. We also need to pass what kind of pattern we are looking for, like an 8x8 grid, 5x5 grid etc. In this example, we use a 7x6 grid. (Normally a chess board has 8x8 squares and 7x7 internal corners.) It returns the corner points and retval, which will be True if the pattern is found. These corners will be placed in order (from left-to-right, top-to-bottom).

@note This function may not be able to find the required pattern in all the images. So, one good option is to write the code such that it starts the camera and checks each frame for the required pattern. Once the pattern is obtained, find the corners and store them in a list. Also, provide some interval before reading the next frame so that we can adjust our chess board in a different direction. Continue this process until the required number of good patterns is obtained. Even in the example provided here, we are not sure how many of the 14 given images are good. Thus, we must read all the images and take only the good ones.

@note Instead of chess board, we can alternatively use a circular grid. In this case, we must use the function **cv.findCirclesGrid()** to find the pattern. Fewer images are sufficient to perform camera calibration using a circular grid.

Once we find the corners, we can increase their accuracy using **cv.cornerSubPix()**. We can also draw the pattern using **cv.drawChessboardCorners()**. All these steps are included in the code below:

```python
import numpy as np
import cv2 as cv
import glob

# termination criteria
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.

images = glob.glob('*.jpg')

for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv.findChessboardCorners(gray, (7,6), None)

    # If found, add object points, image points (after refining them)
    if ret == True:
        objpoints.append(objp)

        corners2 = cv.cornerSubPix(gray,corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners2)

        # Draw and display the corners
        cv.drawChessboardCorners(img, (7,6), corners2, ret)
        cv.imshow('img', img)
        cv.waitKey(500)

cv.destroyAllWindows()
```

One image with pattern drawn on it is shown below:

### Calibration

Now that we have our object points and image points, we are ready to go for calibration. We can use the function, **cv.calibrateCamera()** which returns the camera matrix, distortion coefficients, rotation and translation vectors etc.

```python
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
```

### Undistortion

Now, we can take an image and undistort it. OpenCV comes with two methods for doing this. First, however, we can refine the camera matrix based on a free scaling parameter using **cv.getOptimalNewCameraMatrix()**. If the scaling parameter alpha=0, it returns an undistorted image with minimum unwanted pixels, so it may even remove some pixels at image corners. If alpha=1, all pixels are retained, along with some extra black pixels. This function also returns an image ROI which can be used to crop the result.

So, we take a new image (left12.jpg in this case, which is the first image in this chapter):

```python
img = cv.imread('left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
```

#### 1\. Using **cv.undistort()**

This is the easiest way. Just call the function and use ROI obtained above to crop the result.

```python
# undistort
dst = cv.undistort(img, mtx, dist, None, newcameramtx)

# crop the image
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
```

#### 2\. Using **remapping**

This way is a little bit more difficult. First, find a mapping function from the distorted image to the undistorted image. Then use the remap function.

```python
# undistort
mapx, mapy = cv.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
dst = cv.remap(img, mapx, mapy, cv.INTER_LINEAR)

# crop the image
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
```

Still, both the methods give the same result. See the result below:

You can see in the result that all the edges are straight.

Now you can store the camera matrix and distortion coefficients using write functions in NumPy (np.savez, np.savetxt etc) for future uses.
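A minimal sketch of storing and reloading the results with `np.savez` is shown below. The matrix and coefficient values are placeholders rather than real calibration output, and the file name `calibration.npz` and the temporary directory are arbitrary choices for the example.

```python
import os
import tempfile
import numpy as np

# Placeholder values standing in for the output of cv.calibrateCamera().
mtx = np.array([[500.0, 0.0, 320.0],
                [0.0, 500.0, 240.0],
                [0.0, 0.0, 1.0]])
dist = np.array([[-0.2, 0.05, 0.0, 0.0, 0.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'calibration.npz')
    # Each keyword becomes a named array inside the .npz archive.
    np.savez(path, mtx=mtx, dist=dist)
    # Reload by name; the context manager closes the file afterwards.
    with np.load(path) as data:
        mtx_loaded = data['mtx']
        dist_loaded = data['dist']
```

Saving the arrays under explicit names keeps the file self-describing when it is loaded again in a later session.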

## Re-projection Error

Re-projection error gives a good estimation of just how exact the found parameters are. The closer the re-projection error is to zero, the more accurate the parameters we found are. Given the intrinsic, distortion, rotation and translation matrices, we must first transform the object point to image point using **cv.projectPoints()**. Then, we can calculate the norm between what we got with our transformation and the corner finding algorithm. To find the RMSE (root mean squared error), we average the squared errors over all points and images, then take the square root.

```python
mean_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv.norm(imgpoints[i], imgpoints2, cv.NORM_L2SQR) / len(imgpoints2)
    mean_error += error

print( "total error: {}".format(np.sqrt(mean_error/len(objpoints))) )
```
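The same RMSE arithmetic can be checked on synthetic numbers without OpenCV. The helper name `reprojection_rmse` and the sample points below are made up for illustration; the function simply averages squared point distances per image, averages over images, and takes the square root.

```python
import numpy as np

def reprojection_rmse(detected, projected):
    """detected / projected: lists of per-image point arrays with matching shapes."""
    per_image = []
    for det, proj in zip(detected, projected):
        diff = np.asarray(det, dtype=np.float64) - np.asarray(proj, dtype=np.float64)
        # Sum of squared coordinate differences divided by the number of points.
        per_image.append(np.sum(diff**2) / len(diff))
    return float(np.sqrt(np.mean(per_image)))

# Synthetic example: every projected point is off by (3, 4), i.e. 5 pixels.
detected = [np.array([[0.0, 0.0], [10.0, 10.0]])]
projected = [np.array([[3.0, 4.0], [13.0, 14.0]])]
rmse = reprojection_rmse(detected, projected)
```

Because every point in the synthetic example is displaced by the same 5-pixel offset, the RMSE comes out as 5, which makes the units of the error easy to see.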

## Exercises

\-# Try camera calibration with circular grid.

## [Py Depthmap](https://docharvest.github.io/docs/opencv5/py_tutorials/py_calib3d/py_depthmap/py_depthmap/)

# Depth Map from Stereo Images {#tutorial\_py\_depthmap}

## Goal

In this session,

-   We will learn to create a depth map from stereo images.

## Basics

In the last session, we saw basic concepts like epipolar constraints and other related terms. We also saw that if we have two images of same scene, we can get depth information from that in an intuitive way. Below is an image and some simple mathematical formulas which prove that intuition. (Image Courtesy :

The above diagram contains similar triangles. Writing their corresponding equations yields the following result:

\\f\[disparity = x - x' = \\frac{Bf}{Z}\\f\]

\\f$x\\f$ and \\f$x'\\f$ are the distances between the points in the image plane corresponding to the 3D scene point and their camera centers. \\f$B\\f$ is the distance between the two cameras (which we know) and \\f$f\\f$ is the focal length of the camera (already known). So in short, the above equation says that the depth of a point in a scene is inversely proportional to the difference in distance of corresponding image points and their camera centers. So with this information, we can derive the depth of all pixels in an image.

Stereo matching therefore finds corresponding points between the two images. We have already seen how the epiline constraint makes this operation faster and more accurate. Once the matches are found, the disparity follows. Let's see how we can do it with OpenCV.
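Rearranging the relation above gives Z = Bf / disparity. A small NumPy sketch of that conversion follows; the helper name `depth_from_disparity`, the baseline of 0.1 m, and the focal length of 700 pixels are assumed values for illustration, and pixels with zero disparity are assigned infinite depth.

```python
import numpy as np

def depth_from_disparity(disparity, baseline, focal_length):
    """Convert a disparity map (pixels) to depth using Z = B * f / disparity."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)   # default for zero or negative disparity
    np.divide(baseline * focal_length, disparity, out=depth, where=disparity > 0)
    return depth

disparity_px = np.array([[35.0, 70.0], [0.0, 7.0]])
depth_m = depth_from_disparity(disparity_px, baseline=0.1, focal_length=700.0)
```

Doubling the disparity halves the depth, which is the inverse proportionality described in the text.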

## Code

Below code snippet shows a simple procedure to create a disparity map.

```python
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

imgL = cv.imread('tsukuba_l.png', cv.IMREAD_GRAYSCALE)
imgR = cv.imread('tsukuba_r.png', cv.IMREAD_GRAYSCALE)

stereo = cv.StereoBM.create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL,imgR)
plt.imshow(disparity,'gray')
plt.show()
```

Below image contains the original image (left) and its disparity map (right). As you can see, the result is contaminated with high degree of noise. By adjusting the values of numDisparities and blockSize, you can get a better result.
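One practical detail when using the computed map numerically: block matchers such as StereoBM return a 16-bit fixed-point disparity map with 4 fractional bits, so the raw values are 16 times the disparity in pixels. The sketch below shows the conversion on a small synthetic array; the helper name `disparity_to_float` and the sample values are hypothetical.

```python
import numpy as np

def disparity_to_float(raw_disparity):
    """Convert a 16-bit fixed-point disparity map (4 fractional bits) to pixels."""
    return raw_disparity.astype(np.float32) / 16.0

# Synthetic stand-in for the int16 array returned by stereo.compute().
raw = np.array([[160, 328], [-16, 0]], dtype=np.int16)
disp = disparity_to_float(raw)
```

Negative values in the converted map would typically correspond to pixels without a valid match, so they are best masked out before computing depth.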

Once you are familiar with StereoBM, you may need to fine-tune some of its parameters to get better and smoother results. Parameters:

-   texture\_threshold: filters out areas that don't have enough texture for reliable matching
-   Speckle range and size: Block-based matchers often produce "speckles" near the boundaries of objects, where the matching window catches the foreground on one side and the background on the other. In this scene it appears that the matcher is also finding small spurious matches in the projected texture on the table. To get rid of these artifacts we post-process the disparity image with a speckle filter controlled by the speckle\_size and speckle\_range parameters. speckle\_size is the number of pixels below which a disparity blob is dismissed as "speckle." speckle\_range controls how close in value disparities must be to be considered part of the same blob.
-   Number of disparities: How many pixels to slide the window over. The larger it is, the larger the range of visible depths, but more computation is required.
-   min\_disparity: the offset from the x-position of the left pixel at which to begin searching.
-   uniqueness\_ratio: Another post-filtering step. If the best matching disparity is not sufficiently better than every other disparity in the search range, the pixel is filtered out. You can try tweaking this if texture\_threshold and the speckle filtering are still letting through spurious matches.
-   prefilter\_size and prefilter\_cap: The pre-filtering phase, which normalizes image brightness and enhances texture in preparation for block matching. Normally you should not need to adjust these.

These parameters are set with dedicated setters and getters after the algorithm initialization, such as `setTextureThreshold`, `setSpeckleRange`, `setUniquenessRatio`, and more. See cv::StereoBM documentation for details.

## Additional Resources

-   [Ros stereo img processing wiki page](http://wiki.ros.org/stereo_image_proc/Tutorials/ChoosingGoodStereoParameters)

## Exercises

\-# OpenCV samples contain an example of generating disparity map and its 3D reconstruction. Check stereo\_match.py in OpenCV-Python samples.

## [Py Epipolar Geometry](https://docharvest.github.io/docs/opencv5/py_tutorials/py_calib3d/py_epipolar_geometry/py_epipolar_geometry/)

# Epipolar Geometry {#tutorial\_py\_epipolar\_geometry}

## Goal

In this section,

-   We will learn about the basics of multiview geometry
-   We will see what epipoles, epipolar lines, the epipolar constraint etc. are.

## Basic Concepts

When we take an image using a pin-hole camera, we lose an important piece of information: the depth of the image, that is, how far each point in the image is from the camera, because it is a 3D-to-2D conversion. So it is an important question whether we can find the depth information using these cameras. The answer is to use more than one camera. Our eyes work in a similar way, where we use two cameras (two eyes); this is called stereo vision. So let's see what OpenCV provides in this field.

(_Learning OpenCV_ by Gary Bradski has a lot of information in this field.)

Before going to depth images, let's first understand some basic concepts in multiview geometry. In this section we will deal with epipolar geometry. See the image below which shows a basic setup with two cameras taking the image of same scene.

If we are using only the left camera, we can't find the 3D point corresponding to the point \\f$x\\f$ in the image because every point on the line \\f$OX\\f$ projects to the same point on the image plane. But consider the right image also. Now different points on the line \\f$OX\\f$ project to different points (\\f$x'\\f$) in the right plane. So with these two images, we can triangulate the correct 3D point. This is the whole idea.

The projections of the different points on \\f$OX\\f$ form a line on the right plane (line \\f$l'\\f$). We call it the **epiline** corresponding to the point \\f$x\\f$. It means that, to find the point \\f$x\\f$ on the right image, we search along this epiline. It should be somewhere on this line. (Think of it this way: to find the matching point in the other image, you need not search the whole image, just search along the epiline. So it provides better performance and accuracy.) This is called the **Epipolar Constraint**. Similarly, all points have their corresponding epilines in the other image. The plane \\f$XOO'\\f$ is called the **Epipolar Plane**.

\\f$O\\f$ and \\f$O'\\f$ are the camera centers. From the setup given above, you can see that projection of right camera \\f$O'\\f$ is seen on the left image at the point, \\f$e\\f$. It is called the **epipole**. Epipole is the point of intersection of line through camera centers and the image planes. Similarly \\f$e'\\f$ is the epipole of the left camera. In some cases, you won't be able to locate the epipole in the image, they may be outside the image (which means, one camera doesn't see the other).

All the epilines in an image pass through its epipole. So to find the location of the epipole, we can find many epilines and find their intersection point.
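Finding the intersection of two epilines can be sketched with homogeneous coordinates, where a line is a triple (a, b, c) with ax + by + c = 0 and the intersection of two lines is their cross product. The epipole position and the helper name `intersect_lines` below are made up for illustration; with noisy real epilines one would instead fit the point to many lines.

```python
import numpy as np

def intersect_lines(l1, l2):
    """Intersect two lines given as (a, b, c) with a*x + b*y + c = 0."""
    p = np.cross(l1, l2)       # homogeneous intersection point
    return p[:2] / p[2]        # p[2] == 0 would mean the lines are parallel

# Hypothetical epipole outside a 640-pixel-wide image.
epipole_true = np.array([700.0, 250.0, 1.0])

# Two epilines, each built as the line through an image point and the epipole.
line_a = np.cross(np.array([0.0, 100.0, 1.0]), epipole_true)
line_b = np.cross(np.array([0.0, 400.0, 1.0]), epipole_true)

epipole = intersect_lines(line_a, line_b)
```

The recovered point coincides with the epipole the lines were built from, even though it lies outside the image bounds.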

So in this session, we focus on finding epipolar lines and epipoles. But to find them, we need two more ingredients, **Fundamental Matrix (F)** and **Essential Matrix (E)**. Essential Matrix contains the information about translation and rotation, which describe the location of the second camera relative to the first in global coordinates. See the image below (Image courtesy: Learning OpenCV by Gary Bradski):

But we prefer measurements to be done in pixel coordinates, right? Fundamental Matrix contains the same information as Essential Matrix in addition to the information about the intrinsics of both cameras so that we can relate the two cameras in pixel coordinates. (If we are using rectified images and normalize the point by dividing by the focal lengths, \\f$F=E\\f$). In simple words, Fundamental Matrix F maps a point in one image to a line (epiline) in the other image. This is calculated from matching points from both the images. A minimum of 8 such points are required to find the fundamental matrix (while using the 8-point algorithm). More points are preferred, and RANSAC can be used to get a more robust result.
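The point-to-line mapping can be illustrated numerically. The sketch below uses a hypothetical fundamental matrix for an ideal rectified pair (pure horizontal translation, identity intrinsics), for which the epiline of a left-image point is simply the same row in the right image. The matrix and the sample points are assumptions for illustration, not values estimated from real images.

```python
import numpy as np

# Hypothetical F for an ideal rectified pair: F = [t]_x with t = (1, 0, 0).
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

x_left = np.array([120.0, 80.0, 1.0])    # homogeneous pixel in the left image
epiline_right = F @ x_left                # (a, b, c) with a*x + b*y + c = 0

# A candidate on the same row satisfies the epipolar constraint x'^T F x = 0.
x_right_match = np.array([95.0, 80.0, 1.0])
x_right_wrong = np.array([95.0, 90.0, 1.0])
residual_match = float(x_right_match @ epiline_right)
residual_wrong = float(x_right_wrong @ epiline_right)
```

A residual of zero means the candidate lies on the epiline, while a non-zero residual flags a point that cannot be the correct match under this F.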

## Code

So first we need to find as many matches as possible between the two images to find the fundamental matrix. For this, we use SIFT descriptors with a FLANN based matcher and ratio test.

```python
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img1 = cv.imread('myleft.jpg', cv.IMREAD_GRAYSCALE)  #queryimage # left image
img2 = cv.imread('myright.jpg', cv.IMREAD_GRAYSCALE) #trainimage # right image

sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)

flann = cv.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)

pts1 = []
pts2 = []

# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.8*n.distance:
        pts2.append(kp2[m.trainIdx].pt)
        pts1.append(kp1[m.queryIdx].pt)
```

Now we have the list of best matches from both the images. Let's find the Fundamental Matrix.

```python
pts1 = np.int32(pts1)
pts2 = np.int32(pts2)
F, mask = cv.findFundamentalMat(pts1,pts2,cv.FM_LMEDS)

# We select only inlier points
pts1 = pts1[mask.ravel()==1]
pts2 = pts2[mask.ravel()==1]
```

Next we find the epilines. Epilines corresponding to the points in the first image are drawn on the second image, so passing the images in the correct order is important here. We get an array of lines, so we define a new function to draw these lines on the images.

```python
def drawlines(img1,img2,lines,pts1,pts2):
    ''' img1 - image on which we draw the epilines for the points in img2
        lines - corresponding epilines '''
    r,c = img1.shape
    img1 = cv.cvtColor(img1,cv.COLOR_GRAY2BGR)
    img2 = cv.cvtColor(img2,cv.COLOR_GRAY2BGR)
    for r,pt1,pt2 in zip(lines,pts1,pts2):
        color = tuple(np.random.randint(0,255,3).tolist())
        x0,y0 = map(int, [0, -r[2]/r[1] ])
        x1,y1 = map(int, [c, -(r[2]+r[0]*c)/r[1] ])
        img1 = cv.line(img1, (x0,y0), (x1,y1), color,1)
        img1 = cv.circle(img1,tuple(pt1),5,color,-1)
        img2 = cv.circle(img2,tuple(pt2),5,color,-1)
    return img1,img2
```

Now we find the epilines in both the images and draw them.

```python
# Find epilines corresponding to points in right image (second image) and
# drawing its lines on left image
lines1 = cv.computeCorrespondEpilines(pts2.reshape(-1,1,2), 2,F)
lines1 = lines1.reshape(-1,3)
img5,img6 = drawlines(img1,img2,lines1,pts1,pts2)

# Find epilines corresponding to points in left image (first image) and
# drawing its lines on right image
lines2 = cv.computeCorrespondEpilines(pts1.reshape(-1,1,2), 1,F)
lines2 = lines2.reshape(-1,3)
img3,img4 = drawlines(img2,img1,lines2,pts2,pts1)

plt.subplot(121),plt.imshow(img5)
plt.subplot(122),plt.imshow(img3)
plt.show()
```

Below is the result we get:

You can see in the left image that all epilines are converging at a point outside the image at right side. That meeting point is the epipole.

For better results, images with good resolution and many non-planar points should be used.

## Exercises

1.  One important topic is the forward movement of camera. Then epipoles will be seen at the same locations in both with epilines emerging from a fixed point. [See this discussion](http://answers.opencv.org/question/17912/location-of-epipole/).
2.  Fundamental Matrix estimation is sensitive to quality of matches, outliers etc. It becomes worse when all selected matches lie on the same plane. [Check this discussion](http://answers.opencv.org/question/18125/epilines-not-correct/).

## [Py Pose](https://docharvest.github.io/docs/opencv5/py_tutorials/py_calib3d/py_pose/py_pose/)

# Pose Estimation {#tutorial\_py\_pose}

## Goal

In this section,

-   We will learn to exploit 3d module to create some 3D effects in images.

## Basics

This is going to be a small section. During the last session on camera calibration, you have found the camera matrix, distortion coefficients etc. Given a pattern image, we can utilize the above information to calculate its pose, or how the object is situated in space, like how it is rotated, how it is displaced etc. For a planar object, we can assume Z=0, such that, the problem now becomes how camera is placed in space to see our pattern image. So, if we know how the object lies in the space, we can draw some 2D diagrams in it to simulate the 3D effect. Let's see how to do it.

Our problem is, we want to draw our 3D coordinate axis (X, Y, Z axes) on our chessboard's first corner. X axis in blue color, Y axis in green color and Z axis in red color. So in-effect, Z axis should feel like it is perpendicular to our chessboard plane.

First, let's load the camera matrix and distortion coefficients from the previous calibration result.

```python
import numpy as np
import cv2 as cv
import glob

# Load previously saved data
with np.load('B.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
```

Now let's create a function, draw(), which takes the corners in the chessboard (obtained using **cv.findChessboardCorners()**) and **axis points** to draw a 3D axis.

```python
def draw(img, corners, imgpts):
    corner = tuple(corners[0].ravel().astype("int32"))
    imgpts = imgpts.astype("int32")
    img = cv.line(img, corner, tuple(imgpts[0].ravel()), (255,0,0), 5)
    img = cv.line(img, corner, tuple(imgpts[1].ravel()), (0,255,0), 5)
    img = cv.line(img, corner, tuple(imgpts[2].ravel()), (0,0,255), 5)
    return img
```

Then, as in the previous case, we create termination criteria, object points (3D points of corners in chessboard) and axis points. Axis points are points in 3D space for drawing the axis. We draw axes of length 3 (units will be in terms of chess square size since we calibrated based on that size). So our X axis is drawn from (0,0,0) to (3,0,0), and the Y axis similarly from (0,0,0) to (0,3,0). The Z axis is drawn from (0,0,0) to (0,0,-3). Negative denotes it is drawn towards the camera.

```python
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)

axis = np.float32([[3,0,0], [0,3,0], [0,0,-3]]).reshape(-1,3)
```

Now, as usual, we load each image and search for the 7x6 grid. If found, we refine the corners to sub-pixel accuracy. Then, to calculate the rotation and translation, we use the function **cv.solvePnP()**. Once we have those transformation vectors, we use them to project our **axis points** to the image plane. In simple words, we find the points on the image plane corresponding to each of (3,0,0), (0,3,0), (0,0,-3) in 3D space. Once we get them, we draw lines from the first corner to each of these points using our draw() function. Done !!!

```python
for fname in glob.glob('left*.jpg'):
    img = cv.imread(fname)
    gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
    ret, corners = cv.findChessboardCorners(gray, (7,6),None)

    if ret == True:
        corners2 = cv.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)

        # Find the rotation and translation vectors.
        ret,rvecs, tvecs = cv.solvePnP(objp, corners2, mtx, dist)

        # project 3D points to image plane
        imgpts, jac = cv.projectPoints(axis, rvecs, tvecs, mtx, dist)

        img = draw(img,corners2,imgpts)
        cv.imshow('img',img)
        k = cv.waitKey(0) & 0xFF
        if k == ord('s'):
            cv.imwrite(fname[:6]+'.png', img)

cv.destroyAllWindows()
```

See some results below. Notice that each axis is 3 squares long:
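To make the projection step more concrete, here is a simplified NumPy sketch of what projecting the axis points amounts to when lens distortion is ignored and the rotation is given as a 3x3 matrix. The helper name `project_points_pinhole`, the intrinsics, and the pose are all hypothetical; **cv.projectPoints()** additionally applies the distortion model and takes the rotation as a rotation vector.

```python
import numpy as np

def project_points_pinhole(points_3d, R, t, K):
    """Project (N, 3) object points with x = K (R X + t), ignoring lens distortion."""
    cam = np.asarray(points_3d, dtype=np.float64) @ R.T + t   # object -> camera frame
    uvw = cam @ K.T                                            # camera frame -> image
    return uvw[:, :2] / uvw[:, 2:3]                            # perspective divide

# Hypothetical intrinsics and pose, chosen only for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                     # board facing the camera, axes aligned
t = np.array([0.0, 0.0, 10.0])    # board origin 10 squares in front of the camera
axis = np.float32([[3, 0, 0], [0, 3, 0], [0, 0, -3]])

imgpts = project_points_pinhole(axis, R, t, K)
```

In this aligned pose the tip of the Z axis projects onto the principal point, since it points straight at the camera; tilting the board is what makes that axis visible in the image.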

### Render a Cube

If you want to draw a cube, modify the draw() function and axis points as follows.

Modified draw() function:

```python
def draw(img, corners, imgpts):
    imgpts = np.int32(imgpts).reshape(-1,2)

    # draw ground floor in green
    img = cv.drawContours(img, [imgpts[:4]],-1,(0,255,0),-3)

    # draw pillars in blue color
    for i,j in zip(range(4),range(4,8)):
        img = cv.line(img, tuple(imgpts[i]), tuple(imgpts[j]),(255,0,0),3)

    # draw top layer in red color
    img = cv.drawContours(img, [imgpts[4:]],-1,(0,0,255),3)

    return img
```

Modified axis points. They are the 8 corners of a cube in 3D space:

```python
axis = np.float32([[0,0,0], [0,3,0], [3,3,0], [3,0,0],
                   [0,0,-3],[0,3,-3],[3,3,-3],[3,0,-3] ])
```

And look at the result below:

If you are interested in graphics, augmented reality etc, you can use OpenGL to render more complicated figures.

## [Py Table Of Contents Calib3d](https://docharvest.github.io/docs/opencv5/py_tutorials/py_calib3d/py_table_of_contents_calib3d/)

# Camera Calibration and 3D Reconstruction {#tutorial\_py\_table\_of\_contents\_calib3d}

-   @subpage tutorial\_py\_calibration
    
    Let's find how good is our camera. Is there any distortion in images taken with it? If so how to correct it?
    
-   @subpage tutorial\_py\_pose
    
    This is a small section which will help you to create some cool 3D effects with calib module.
    
-   @subpage tutorial\_py\_epipolar\_geometry
    
    Let's understand epipolar geometry and epipolar constraint.
    
-   @subpage tutorial\_py\_depthmap
    
    Extract depth information from 2D images.

## [Py Basic Ops](https://docharvest.github.io/docs/opencv5/py_tutorials/py_core/py_basic_ops/py_basic_ops/)

# Basic Operations on Images {#tutorial\_py\_basic\_ops}

## Goal

Learn to:

-   Access pixel values and modify them
-   Access image properties
-   Set a Region of Interest (ROI)
-   Split and merge images

Almost all the operations in this section are mainly related to Numpy rather than OpenCV. A good knowledge of Numpy is required to write better optimized code with OpenCV.

_( Examples will be shown in a Python terminal, since most of them are just single lines of code )_

## Accessing and Modifying pixel values

Let's load a color image first:

```python
>>> import numpy as np
>>> import cv2 as cv

>>> img = cv.imread('messi5.jpg')
>>> assert img is not None, "file could not be read, check with os.path.exists()"
```

You can access a pixel value by its row and column coordinates. For BGR image, it returns an array of Blue, Green, Red values. For grayscale image, just corresponding intensity is returned.

```python
>>> px = img[100,100]
>>> print( px )
[157 166 200]

# accessing only blue pixel
>>> blue = img[100,100,0]
>>> print( blue )
157
```

You can modify the pixel values the same way.

```python
>>> img[100,100] = [255,255,255]
>>> print( img[100,100] )
[255 255 255]
```

**Warning**

Numpy is an optimized library for fast array calculations. So simply accessing each and every pixel value and modifying it will be very slow and it is discouraged.
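The difference can be illustrated on a small synthetic array, used here as a stand-in for a loaded image. Both variants below set the first channel of every pixel to 255; the loop visits each pixel from Python, while the slice assignment lets Numpy do the work in one step.

```python
import numpy as np

# Small synthetic "image" (4 rows, 5 columns, 3 channels) for illustration.
img = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# Slow: visit every pixel from Python.
looped = img.copy()
for row in range(looped.shape[0]):
    for col in range(looped.shape[1]):
        looped[row, col, 0] = 255

# Fast: one vectorized assignment over the whole channel.
vectorized = img.copy()
vectorized[:, :, 0] = 255
```

Both approaches produce the same array, but on a full-size image the vectorized form avoids running the Python interpreter once per pixel.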

## Accessing Image Properties

Image properties include number of rows, columns, and channels; type of image data; number of pixels; etc.

The shape of an image is accessed by img.shape. It returns a tuple of the number of rows, columns, and channels (if the image is color):

```python
>>> print( img.shape )
(342, 548, 3)
```

@note If an image is grayscale, the tuple returned contains only the number of rows and columns, so it is a good method to check whether the loaded image is grayscale or color.
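A small sketch of that check follows. The helper name `is_grayscale` is hypothetical, and the arrays are zero-filled stand-ins for loaded images. Note that an array of shape (rows, columns, 1) would count as non-grayscale under this simple test.

```python
import numpy as np

def is_grayscale(img):
    """Treat an image as grayscale when its shape has only rows and columns."""
    return img.ndim == 2

color = np.zeros((342, 548, 3), np.uint8)   # stand-in for a BGR image
gray = np.zeros((342, 548), np.uint8)       # stand-in for a grayscale image
```

Checking `img.ndim` (or `len(img.shape)`) up front is a cheap guard before code that unpacks three values from `img.shape`.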

Total number of elements (rows × columns × channels) is accessed by `img.size`:

```python
>>> print( img.size )
562248
```

Image datatype is obtained by `img.dtype`:

```python
>>> print( img.dtype )
uint8
```

@note img.dtype is very important while debugging because a large number of errors in OpenCV-Python code are caused by invalid datatype.

## Image ROI

Sometimes, you will have to play with certain regions of images. For eye detection in images, first face detection is done over the entire image. When a face is obtained, we select the face region alone and search for eyes inside it instead of searching the whole image. It improves accuracy (because eyes are always on faces :D ) and performance (because we search in a small area).

ROI is again obtained using Numpy indexing. Here I am selecting the ball and copying it to another region in the image:

```python
>>> ball = img[280:340, 330:390]
>>> img[273:333, 100:160] = ball
```

Check the results below:
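One point worth keeping in mind with Numpy-based ROIs is that basic slicing returns a view that shares memory with the original image, not a copy. The sketch below demonstrates this on a small synthetic array; the variable names are illustrative only.

```python
import numpy as np

img = np.zeros((6, 6), np.uint8)   # small synthetic single-channel image

roi_view = img[1:3, 1:3]           # a view: shares memory with img
roi_copy = img[1:3, 1:3].copy()    # an independent copy

roi_view[:] = 9                    # also changes the pixels inside img
roi_copy[:] = 5                    # leaves img untouched
```

Calling `.copy()` on the slice is the usual way to get an ROI that can be modified without touching the source image.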

## Splitting and Merging Image Channels

Sometimes you will need to work separately on the B,G,R channels of an image. In this case, you need to split the BGR image into single channels. In other cases, you may need to join these individual channels to create a BGR image. You can do this simply by:

```python
>>> b,g,r = cv.split(img)
>>> img = cv.merge((b,g,r))
```

Or

```python
>>> b = img[:,:,0]
```

Suppose you want to set all the red pixels to zero - you do not need to split the channels first. Numpy indexing is faster:

```python
>>> img[:,:,2] = 0
```

**Warning**

cv.split() is a costly operation (in terms of time). So use it only if necessary. Otherwise go for Numpy indexing.
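As a rough Numpy-only analogue of splitting and merging, channel slices can be taken without copying data and stacked back together with `np.dstack`. The random array below is a stand-in for a BGR image, and the variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for a small BGR image.
img = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Channel slices are views into img, so no pixel data is copied here.
b, g, r = img[:, :, 0], img[:, :, 1], img[:, :, 2]

# Stacking the channels along the last axis rebuilds the image.
merged = np.dstack((b, g, r))

# Zero the red channel directly, without splitting first.
no_red = img.copy()
no_red[:, :, 2] = 0
```

Because the slices are views, reading a single channel this way costs almost nothing, which is why indexing is preferred when a full split is not needed.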

## Making Borders for Images (Padding)

If you want to create a border around an image, something like a photo frame, you can use **cv.copyMakeBorder()**. But it has more applications for convolution operation, zero padding etc. This function takes following arguments:

-   **src** - input image
    
-   **top**, **bottom**, **left**, **right** - border width in number of pixels in corresponding directions
    
-   **borderType** - Flag defining what kind of border to be added. It can be following types:
    
    -   **cv.BORDER\_CONSTANT** - Adds a constant colored border. The value should be given as next argument.
    -   **cv.BORDER\_REFLECT** - Border will be mirror reflection of the border elements, like this : _fedcba|abcdefgh|hgfedcb_
    -   **cv.BORDER\_REFLECT\_101** or **cv.BORDER\_DEFAULT** - Same as above, but with a slight change, like this : _gfedcb|abcdefgh|gfedcba_
    -   **cv.BORDER\_REPLICATE** - Last element is replicated throughout, like this: _aaaaaa|abcdefgh|hhhhhhh_
    -   **cv.BORDER\_WRAP** - The image wraps around, so each border is filled from the opposite edge, like this : _cdefgh|abcdefgh|abcdefg_
-   **value** - Color of border if border type is cv.BORDER\_CONSTANT
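The letter diagrams above can be reproduced on a single row of pixels. The sketch below is not part of the original sample: it uses `np.pad` as a stand-in for **cv.copyMakeBorder()**, and the mapping from border flags to `np.pad` modes in `modes` is an assumption made for illustration.

```python
import numpy as np

# Encode "abcdefgh" as bytes so that each letter is one "pixel".
row = np.frombuffer(b"abcdefgh", dtype=np.uint8)

# Assumed correspondence between OpenCV border flags and np.pad modes.
modes = {
    "BORDER_REPLICATE": "edge",
    "BORDER_REFLECT": "symmetric",
    "BORDER_REFLECT_101": "reflect",
    "BORDER_WRAP": "wrap",
}

def pad_letters(mode, width=3):
    """Pad the row on both sides and return the result as text."""
    padded = np.pad(row, width, mode=mode)
    return padded.tobytes().decode("ascii")

borders = {name: pad_letters(mode) for name, mode in modes.items()}
for name, text in borders.items():
    print(f"{name:20s} {text}")
```

With a width of 3, each printed string is the original row with three extra letters on each side, which makes the difference between REFLECT (edge pixel repeated) and REFLECT\_101 (edge pixel not repeated) easy to see.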
    

Below is a sample code demonstrating all these border types for better understanding:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

BLUE = [255,0,0]

img1 = cv.imread('opencv-logo.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"

replicate = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REPLICATE)
reflect = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT)
reflect101 = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT_101)
wrap = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_WRAP)
constant = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_CONSTANT,value=BLUE)

plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')

plt.show()
@endcode
See the result below. (The image is displayed with matplotlib, so the RED and BLUE channels are interchanged):

## [Py Image Arithmetics](https://docharvest.github.io/docs/opencv5/py_tutorials/py_core/py_image_arithmetics/py_image_arithmetics/)


# Arithmetic Operations on Images {#tutorial\_py\_image\_arithmetics}

## Goal

-   Learn several arithmetic operations on images, like addition, subtraction, bitwise operations, etc.
-   Learn these functions: **cv.add()**, **cv.addWeighted()**, etc.

## Image Addition

You can add two images with the OpenCV function, cv.add(), or simply by the numpy operation res = img1 + img2. Both images should be of same depth and type, or the second image can just be a scalar value.

@note There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a saturated operation while Numpy addition is a modulo operation.

For example, consider the below sample:
@code{.py}
x = np.uint8([250])
y = np.uint8([10])

print( cv.add(x,y) ) # 250+10 = 260 => 255, prints [[255]]

print( x+y )         # 250+10 = 260 % 256 = 4, prints [4]
@endcode
This will be more visible when you add two images. Stick with OpenCV functions, because they will provide a better result.
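To make the difference concrete without OpenCV installed, here is a small Numpy-only sketch. The saturated result is emulated by widening to `int16` and clipping, which is my own stand-in for what cv.add() does, not its actual implementation.

```python
import numpy as np

x = np.uint8([250])
y = np.uint8([10])

# Plain uint8 arithmetic wraps around: 260 % 256 = 4.
wrapped = x + y

# Emulated saturation: widen, add, clip to [0, 255], then narrow back.
widened = x.astype(np.int16) + y.astype(np.int16)
saturated = np.clip(widened, 0, 255).astype(np.uint8)

print(wrapped, saturated)
```

On a real image, the wrapped version turns bright regions dark, which is why the saturated behaviour usually looks better.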

## Image Blending

This is also image addition, but different weights are given to images in order to give a feeling of blending or transparency. Images are added as per the equation below:

\\f\[g(x) = (1 - \\alpha)f\_{0}(x) + \\alpha f\_{1}(x)\\f\]

By varying \\f$\\alpha\\f$ from \\f$0 \\rightarrow 1\\f$, you can perform a cool transition from one image to another.

Here I took two images to blend together. The first image is given a weight of 0.7 and the second image is given 0.3. cv.addWeighted() applies the following equation to the image:

\\f\[dst = \\alpha \\cdot img1 + \\beta \\cdot img2 + \\gamma\\f\]

Here \\f$\\gamma\\f$ is taken as zero.
@code{.py}
import cv2 as cv

# both images must have the same size and number of channels
img1 = cv.imread('ml.png')
img2 = cv.imread('opencv-logo.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

dst = cv.addWeighted(img1,0.7,img2,0.3,0)

cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
Check the result below:
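The weighted-sum equation can also be written out by hand. The sketch below is a Numpy approximation of it, with rounding and clipping chosen by me; the helper name `blend` and the two constant test images are hypothetical and only serve to show the transition from one image to the other.

```python
import numpy as np

def blend(img1, alpha, img2, beta, gamma=0.0):
    """dst = alpha*img1 + beta*img2 + gamma, rounded and clipped to uint8."""
    mixed = alpha * img1.astype(np.float64) + beta * img2.astype(np.float64) + gamma
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# Two tiny constant "images" standing in for real pictures.
a = np.full((2, 2, 3), 200, dtype=np.uint8)
b = np.full((2, 2, 3), 100, dtype=np.uint8)

# Vary the weight of the second image from 0 to 1 in five steps.
steps = [blend(a, 1.0 - t, b, t) for t in np.linspace(0.0, 1.0, 5)]
weighted = blend(a, 0.7, b, 0.3)

print([int(s[0, 0, 0]) for s in steps], int(weighted[0, 0, 0]))
```

The first frame equals the first image, the last frame equals the second, and the frames in between fade linearly.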

## Bitwise Operations

This includes the bitwise AND, OR, NOT, and XOR operations. They will be highly useful while extracting any part of the image (as we will see in coming chapters), defining and working with non-rectangular ROIs, etc. Below we will see an example of how to change a particular region of an image.

I want to put the OpenCV logo above an image. If I add two images, it will change the color. If I blend them, I get a transparent effect. But I want it to be opaque. If it was a rectangular region, I could use ROI as we did in the last chapter. But the OpenCV logo is not a rectangular shape. So you can do it with bitwise operations as shown below:
@code{.py}
import cv2 as cv

# Load two images
img1 = cv.imread('messi5.jpg')
img2 = cv.imread('opencv-logo-white.png')
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

# I want to put logo on top-left corner, So I create a ROI
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]

# Now create a mask of logo and create its inverse mask also
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)

# Now black-out the area of logo in ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)

# Take only region of logo from logo image.
img2_fg = cv.bitwise_and(img2,img2,mask = mask)

# Put logo in ROI and modify the main image
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst

cv.imshow('res',img1)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
See the result below. Left image shows the mask we created. Right image shows the final result. For more understanding, display all the intermediate images in the above code, especially img1\_bg and img2\_fg.
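The masking idea itself does not depend on OpenCV. As a rough illustration on tiny arrays, the sketch below builds a boolean mask and picks each pixel from either the logo or the background. The arrays, the crude grayscale step and the use of `np.where` are my own substitutions for the cv calls above.

```python
import numpy as np

# Stand-ins for the ROI of the photo and for the logo image (BGR order).
roi = np.full((4, 4, 3), 50, dtype=np.uint8)
logo = np.zeros((4, 4, 3), dtype=np.uint8)
logo[1:3, 1:3] = (0, 0, 255)          # a 2x2 red "logo" on a black background

# Crude grayscale (channel maximum), not the weighting cv.cvtColor uses.
gray = logo.max(axis=2)
mask = gray > 10                      # True where the logo has content

# Logo pixels where the mask is set, background pixels everywhere else.
composite = np.where(mask[..., None], logo, roi)

print(composite[:, :, 2])
```

The result is opaque: inside the mask only the logo contributes, outside it only the background does.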

## Exercises

1.  Create a slide show of images in a folder with a smooth transition between images using the cv.addWeighted function.

## [Py Optimization](https://docharvest.github.io/docs/opencv5/py_tutorials/py_core/py_optimization/py_optimization/)


# Performance Measurement and Improvement Techniques {#tutorial\_py\_optimization}

## Goal

In image processing, since you are dealing with a large number of operations per second, it is mandatory that your code is not only providing the correct solution, but that it is also providing it in the fastest manner. So in this chapter, you will learn:

-   To measure the performance of your code.
-   Some tips to improve the performance of your code.
-   You will see these functions: **cv.getTickCount**, **cv.getTickFrequency**, etc.

Apart from OpenCV, Python also provides a module **time** which is helpful in measuring the time of execution. Another module **profile** helps to get a detailed report on the code, like how much time each function in the code took, how many times the function was called, etc. But, if you are using IPython, all these features are integrated in a user-friendly manner. We will see some important ones, and for more details, check links in the **Additional Resources** section.

## Measuring Performance with OpenCV

The **cv.getTickCount** function returns the number of clock-cycles after a reference event (like the moment the machine was switched ON) to the moment this function is called. So if you call it before and after the function execution, you get the number of clock-cycles used to execute a function.

The **cv.getTickFrequency** function returns the frequency of clock-cycles, or the number of clock-cycles per second. So to find the time of execution in seconds, you can do the following:
@code{.py}
e1 = cv.getTickCount()
# your code execution
e2 = cv.getTickCount()
time = (e2 - e1)/ cv.getTickFrequency()
@endcode
We will demonstrate with the following example. It applies median filtering with kernels of odd sizes ranging from 5 to 47 (range(5,49,2) stops before 49). (Don't worry about what the result will look like - that is not our goal):
@code{.py}
img1 = cv.imread('messi5.jpg')
assert img1 is not None, "file could not be read, check with os.path.exists()"

e1 = cv.getTickCount()
for i in range(5,49,2):
    img1 = cv.medianBlur(img1,i)
e2 = cv.getTickCount()
t = (e2 - e1)/cv.getTickFrequency()
print( t )

# Result I got is 0.521107655 seconds
@endcode
@note You can do the same thing with the time module. Instead of cv.getTickCount, use the time.time() function. Then take the difference of the two times.
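A minimal sketch of the time-module approach follows. It uses `time.perf_counter()` rather than `time.time()`, which is my choice because it is a high-resolution clock intended for measuring intervals. The helper name `timed` is hypothetical.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Time a built-in as a stand-in for an image-processing call.
total, seconds = timed(sum, range(1_000_000))
print(f"sum took {seconds:.6f} s")
```

The same wrapper can be put around a call such as cv.medianBlur to compare it with the tick-count measurement.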

## Default Optimization in OpenCV

Many of the OpenCV functions are optimized using SSE2, AVX, etc. The library also contains the unoptimized code. So if our system supports these features, we should exploit them (almost all modern processors support them). Optimization is enabled by default while compiling. So OpenCV runs the optimized code if it is enabled, otherwise it runs the unoptimized code. You can use **cv.useOptimized()** to check if it is enabled/disabled and **cv.setUseOptimized()** to enable/disable it. Let's see a simple example.
@code{.py}
# check if optimization is enabled
In [5]: cv.useOptimized()
Out[5]: True

In [6]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop

# Disable it
In [7]: cv.setUseOptimized(False)

In [8]: cv.useOptimized()
Out[8]: False

In [9]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
@endcode
As you can see, optimized median filtering is ~2x faster than the unoptimized version. If you check its source, you can see that median filtering is SIMD optimized. So you can call cv.setUseOptimized(True) at the top of your code to make sure optimization is on (remember it is enabled by default).

## Measuring Performance in IPython

Sometimes you may need to compare the performance of two similar operations. IPython gives you a magic command %timeit to perform this. It runs the code several times to get more accurate results. It is best suited to measuring single lines of code.

For example, do you know which of the following operations is better, x = 5; y = x\*\*2, x = 5; y = x\*x, x = np.uint8(\[5\]); y = x\*x, or y = np.square(x)? We will find out with %timeit in the IPython shell.
@code{.py}
In [10]: x = 5

In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop

In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop

In [15]: z = np.uint8([5])

In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop

In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
@endcode
You can see that x = 5 ; y = x\*x is fastest, around 20x faster than Numpy. If you consider the array creation also, it may reach up to 100x faster. Cool, right? _(Numpy devs are working on this issue)_

@note Python scalar operations are faster than Numpy scalar operations. So for operations including one or two elements, Python scalar is better than Numpy arrays. Numpy has the advantage when the size of the array is a little bit bigger.
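Outside IPython, the standard library's `timeit` module gives a similar measurement. The sketch below repeats the two scalar cases. The loop counts are arbitrary choices of mine, and the timings will differ from machine to machine, so no particular numbers should be expected.

```python
import timeit

# Each call runs the statement 100000 times, three times over;
# taking the minimum reduces noise from other processes.
power = min(timeit.repeat("y = x**2", setup="x = 5", repeat=3, number=100_000))
product = min(timeit.repeat("y = x*x", setup="x = 5", repeat=3, number=100_000))

print(f"x**2: {power / 100_000 * 1e9:.1f} ns per loop")
print(f"x*x : {product / 100_000 * 1e9:.1f} ns per loop")
```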

We will try one more example. This time, we will compare the performance of **cv.countNonZero()** and **np.count\_nonzero()** for the same image.

@code{.py}
In [35]: %timeit z = cv.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop

In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop
@endcode
See, the OpenCV function is nearly 25x faster than the Numpy function.

@note Normally, OpenCV functions are faster than Numpy functions. So for same operation, OpenCV functions are preferred. But, there can be exceptions, especially when Numpy works with views instead of copies.

## More IPython magic commands

There are several other magic commands for measuring performance, profiling, line profiling, memory measurement, etc. They are all well documented, so only links to those docs are provided in the **Additional Resources** section. Interested readers are encouraged to try them out.

## Performance Optimization Techniques

There are several techniques and coding methods to exploit maximum performance of Python and Numpy. Only relevant ones are noted here and links are given to important sources. The main thing to be noted here is, first try to implement the algorithm in a simple manner. Once it is working, profile it, find the bottlenecks, and optimize them.

1.  Avoid using loops in Python as much as possible, especially double/triple loops etc. They are inherently slow.
2.  Vectorize the algorithm/code to the maximum extent possible, because Numpy and OpenCV are optimized for vector operations.
3.  Exploit the cache coherence.
4.  Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a costly operation.

If your code is still slow after doing all of these operations, or if the use of large loops is inevitable, use additional libraries like Cython to make it faster.
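Two of the tips above, vectorizing and preferring views over copies, can be illustrated on a tiny array. This is only a sketch with made-up data; the inversion operation is an arbitrary example.

```python
import numpy as np

img = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Loop version: inverts one pixel at a time in Python.
looped = np.empty_like(img)
for r in range(img.shape[0]):
    for c in range(img.shape[1]):
        looped[r, c] = 255 - img[r, c]

# Vectorized version: one expression, the loop runs inside Numpy.
vectorized = 255 - img

# Basic slicing returns a view that shares memory with the original;
# .copy() allocates and fills a new buffer.
view = img[:, 1:3]
copy = img[:, 1:3].copy()

print(np.array_equal(looped, vectorized),
      np.shares_memory(img, view),
      np.shares_memory(img, copy))
```

Both versions give the same pixels; on real images the vectorized form is the one to time with the tools shown earlier. Keep in mind that writing to `view` also changes `img`.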

## Additional Resources

1.  [Python Optimization Techniques](http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
2.  Scipy Lecture Notes - [Advanced Numpy](http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy)
3.  [Timing and Profiling in IPython](http://pynash.org/2013/03/06/timing-and-profiling/)

## [Py Table Of Contents Core](https://docharvest.github.io/docs/opencv5/py_tutorials/py_core/py_table_of_contents_core/)


# Core Operations {#tutorial\_py\_table\_of\_contents\_core}

-   @subpage tutorial\_py\_basic\_ops
    
    Learn to read and edit pixel values, working with image ROI and other basic operations.
    
-   @subpage tutorial\_py\_image\_arithmetics
    
    Perform arithmetic operations on images
    
-   @subpage tutorial\_py\_optimization
    
    Getting a solution is important. But getting it in the fastest way is more important. Learn to check the speed of your code, optimize the code etc.

## [Py Fast](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_fast/py_fast/)


# FAST Algorithm for Corner Detection {#tutorial\_py\_fast}

## Goal

In this chapter,

-   We will understand the basics of the FAST algorithm.
-   We will find corners using OpenCV functionalities for the FAST algorithm.

## Theory

We saw several feature detectors and many of them are really good. But from a real-time application point of view, they are not fast enough. A good example is a SLAM (Simultaneous Localization and Mapping) mobile robot, which has limited computational resources.

As a solution to this, the FAST (Features from Accelerated Segment Test) algorithm was proposed by Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in 2006 (later revised in 2010). A basic summary of the algorithm is presented below. Refer to the original paper for more details (all the images are taken from the original paper).

### Feature Detection using FAST

1.  Select a pixel \\f$p\\f$ in the image which is to be identified as an interest point or not. Let its intensity be \\f$I\_p\\f$.
2.  Select an appropriate threshold value \\f$t\\f$.
3.  Consider a circle of 16 pixels around the pixel under test. (See the image below)

    ![image](images/fast_speedtest.jpg)

4.  Now the pixel \\f$p\\f$ is a corner if there exists a set of \\f$n\\f$ contiguous pixels in the circle (of 16 pixels) which are all brighter than \\f$I\_p + t\\f$, or all darker than \\f$I\_p - t\\f$. (Shown as white dashed lines in the above image.) \\f$n\\f$ was chosen to be 12.
5.  A **high-speed test** was proposed to exclude a large number of non-corners. This test examines only the four pixels at 1, 9, 5 and 13. (First, 1 and 9 are tested to see whether they are sufficiently brighter or darker. If so, 5 and 13 are checked.) If \\f$p\\f$ is a corner, then at least three of these must all be brighter than \\f$I\_p + t\\f$ or darker than \\f$I\_p - t\\f$. If neither of these is the case, then \\f$p\\f$ cannot be a corner. The full segment test criterion can then be applied to the passed candidates by examining all pixels in the circle. This detector in itself exhibits high performance, but there are several weaknesses:

    -   It does not reject as many candidates for n < 12.
    -   The choice of pixels is not optimal because its efficiency depends on ordering of the questions and distribution of corner appearances.
    -   Results of high-speed tests are thrown away.
    -   Multiple features are detected adjacent to one another.

First 3 points are addressed with a machine learning approach. Last one is addressed using non-maximal suppression.
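The segment test and the high-speed test described above can be sketched in plain Python. This is an illustrative reading of the steps, not OpenCV's implementation. It assumes `ring` holds the 16 circle intensities in circular order starting at pixel 1, so pixels 1, 5, 9 and 13 sit at indices 0, 4, 8 and 12. The function names are hypothetical.

```python
def has_run(flags, n):
    """True if flags contains n consecutive True values, allowing wrap-around.

    Doubling the list lets a run continue from position 16 back to position 1.
    Valid for n <= len(flags).
    """
    run = 0
    for flag in flags + flags:
        run = run + 1 if flag else 0
        if run >= n:
            return True
    return False

def is_fast_corner(ring, center, t, n=12):
    """Full segment test: n contiguous pixels all brighter or all darker."""
    brighter = [v > center + t for v in ring]
    darker = [v < center - t for v in ring]
    return has_run(brighter, n) or has_run(darker, n)

def passes_high_speed_test(ring, center, t):
    """Quick rejection using only pixels 1, 5, 9 and 13."""
    compass = [ring[0], ring[4], ring[8], ring[12]]
    brighter = sum(v > center + t for v in compass)
    darker = sum(v < center - t for v in compass)
    return brighter >= 3 or darker >= 3

corner_ring = [200] * 12 + [100] * 4           # 12 contiguous bright pixels
wrapped_ring = [200] * 6 + [100] * 4 + [200] * 6  # the run wraps past pixel 16
edge_ring = [200] * 8 + [100] * 8              # only 8 contiguous bright pixels
flat_ring = [100] * 16

for name, ring in [("corner", corner_ring), ("wrapped", wrapped_ring),
                   ("edge", edge_ring), ("flat", flat_ring)]:
    print(name, passes_high_speed_test(ring, 100, 20), is_fast_corner(ring, 100, 20))
```

In practice the cheap test would run first and the full test only on the candidates that pass it.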

### Machine Learning a Corner Detector

1.  Select a set of images for training (preferably from the target application domain).
2.  Run the FAST algorithm on every image to find feature points.
3.  For every feature point, store the 16 pixels around it as a vector. Do it for all the images to get the feature vector \\f$P\\f$.
4.  Each pixel (say \\f$x\\f$) in these 16 pixels can have one of the following three states:

    ![image](images/fast_eqns.jpg)

5.  Depending on these states, the feature vector \\f$P\\f$ is subdivided into 3 subsets, \\f$P\_d\\f$, \\f$P\_s\\f$, \\f$P\_b\\f$.
6.  Define a new boolean variable, \\f$K\_p\\f$, which is true if \\f$p\\f$ is a corner and false otherwise.
7.  Use the ID3 algorithm (decision tree classifier) to query each subset using the variable \\f$K\_p\\f$ for the knowledge about the true class. It selects the \\f$x\\f$ which yields the most information about whether the candidate pixel is a corner, measured by the entropy of \\f$K\_p\\f$.
8.  This is recursively applied to all the subsets until its entropy is zero.
9.  The decision tree so created is used for fast detection in other images.
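The three states and the split into \\f$P\_d\\f$, \\f$P\_s\\f$, \\f$P\_b\\f$ can be sketched as follows. The state equations live in the image referenced above and are not reproduced in this text, so the inequalities below (darker, similar, brighter, with the boundaries as I recall them from the paper) are an assumption. The function names and the sample tuples are hypothetical.

```python
def pixel_state(ring_value, center, t):
    """Return 'd' (darker), 's' (similar) or 'b' (brighter) relative to the center."""
    if ring_value <= center - t:
        return "d"
    if ring_value >= center + t:
        return "b"
    return "s"

def partition(samples, position, t):
    """Split samples by the state of the ring pixel at the given position.

    samples is a list of (center, ring, is_corner) tuples, where ring is the
    16-pixel circle around the candidate and is_corner plays the role of K_p.
    """
    subsets = {"d": [], "s": [], "b": []}
    for sample in samples:
        center, ring, _is_corner = sample
        subsets[pixel_state(ring[position], center, t)].append(sample)
    return subsets

samples = [
    (100, [200] * 12 + [100] * 4, True),
    (100, [100] * 16, False),
    (100, [20] * 12 + [100] * 4, True),
]
subsets = partition(samples, position=0, t=20)
print({state: len(items) for state, items in subsets.items()})
```

A decision-tree learner such as ID3 would evaluate a split like this for every ring position and keep the position whose subsets say the most about `is_corner`.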

### Non-maximal Suppression

Detecting multiple interest points in adjacent locations is another problem. It is solved by using Non-maximum Suppression.

1.  Compute a score function, \\f$V\\f$, for all the detected feature points. \\f$V\\f$ is the sum of absolute differences between \\f$p\\f$ and the 16 surrounding pixel values.
2.  Consider two adjacent keypoints and compute their \\f$V\\f$ values.
3.  Discard the one with the lower \\f$V\\f$ value.
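The three steps can be sketched in a few lines of plain Python. This is a simplified illustration, not what OpenCV does internally: "adjacent" is taken here to mean the 8-neighbourhood, and keypoints with equal scores are both kept. The function names and sample keypoints are hypothetical.

```python
def corner_score(ring, center):
    """V: sum of absolute differences between the center and the 16 ring pixels."""
    return sum(abs(v - center) for v in ring)

def suppress_adjacent(keypoints):
    """Keep each (x, y, score) keypoint unless an adjacent one scores higher."""
    kept = []
    for x, y, score in keypoints:
        beaten = any(
            (ox, oy) != (x, y)
            and abs(ox - x) <= 1 and abs(oy - y) <= 1
            and oscore > score
            for ox, oy, oscore in keypoints
        )
        if not beaten:
            kept.append((x, y, score))
    return kept

keypoints = [(10, 10, 300), (11, 10, 250), (30, 30, 100)]
kept = suppress_adjacent(keypoints)
print(corner_score([110] * 16, 100), kept)
```

The keypoint at (11, 10) is dropped because its neighbour at (10, 10) has a higher score, while the isolated keypoint at (30, 30) survives even though its score is the lowest.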

### Summary

It is several times faster than other existing corner detectors.

But it is not robust to high levels of noise. It is dependent on a threshold.

## FAST Feature Detector in OpenCV

It is called like any other feature detector in OpenCV. If you want, you can specify the threshold, whether non-maximum suppression is to be applied or not, the neighborhood to be used, etc.

For the neighborhood, three flags are defined, cv.FAST\_FEATURE\_DETECTOR\_TYPE\_5\_8, cv.FAST\_FEATURE\_DETECTOR\_TYPE\_7\_12 and cv.FAST\_FEATURE\_DETECTOR\_TYPE\_9\_16. Below is a simple code on how to detect and draw the FAST feature points.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('blox.jpg', cv.IMREAD_GRAYSCALE) # `<opencv_root>/samples/data/blox.jpg`

# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create()

# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))

# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )

cv.imwrite('fast_true.png', img2)

# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img, None)

print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )

img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))

cv.imwrite('fast_false.png', img3)
@endcode
See the results. First image shows FAST with nonmaxSuppression and second one without nonmaxSuppression:

## Additional Resources

1.  Edward Rosten and Tom Drummond, "Machine learning for high speed corner detection" in 9th European Conference on Computer Vision, vol. 1, 2006, pp. 430–443.
2.  Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol. 32, pp. 105–119.

## [Py Feature Homography](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_feature_homography/py_feature_homography/)


# Feature Matching + Homography to find Objects {#tutorial\_py\_feature\_homography}

## Goal

In this chapter,

-   We will combine feature matching and findHomography from the calib3d module to find known objects in a complex image.

## Basics

So what did we do in the last session? We used a queryImage, found some feature points in it, we took another trainImage, found the features in that image too and we found the best matches among them. In short, we found locations of some parts of an object in another cluttered image. This information is sufficient to find the object exactly on the trainImage.

For that, we can use a function from the calib3d module, i.e. **cv.findHomography()**. If we pass the set of points from both the images, it will find the perspective transformation of that object. Then we can use **cv.perspectiveTransform()** to find the object. It needs at least four correct points to find the transformation.

We have seen that there can be some possible errors while matching which may affect the result. To solve this problem, the algorithm uses RANSAC or LEAST\_MEDIAN (selected with the method flag, i.e. cv.RANSAC or cv.LMEDS). Good matches which provide a correct estimation are called inliers and the remaining ones are called outliers. **cv.findHomography()** returns a mask which specifies the inlier and outlier points.
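To get a feel for what the 3x3 perspective transformation does to a point, here is a small Numpy sketch that applies a matrix by hand: append 1 to each point, multiply, then divide by the third coordinate. The function name `apply_homography` and both matrices are made up for illustration. In the tutorial code this step is done by **cv.perspectiveTransform()**.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) list of points through the 3x3 matrix H."""
    pts = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                   # divide by w

# A made-up transform: scale by 2, then shift by (10, 20).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])
corners = [[0, 0], [0, 4], [4, 4], [4, 0]]
mapped = apply_homography(H, corners)

# A made-up projective transform: the bottom row makes w depend on x.
H_p = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.1, 0.0, 1.0]])
projected = apply_homography(H_p, [[10, 5]])

print(mapped)
print(projected)
```

The division by the third coordinate is what distinguishes a perspective transformation from an affine one; for the first matrix it is a division by 1 and changes nothing.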

So let's do it !!!

## Code

First, as usual, let's find SIFT features in images and apply the ratio test to find the best matches.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

MIN_MATCH_COUNT = 10

img1 = cv.imread('box.png', cv.IMREAD_GRAYSCALE)          # queryImage
img2 = cv.imread('box_in_scene.png', cv.IMREAD_GRAYSCALE) # trainImage

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)

flann = cv.FlannBasedMatcher(index_params, search_params)

matches = flann.knnMatch(des1,des2,k=2)

# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
    if m.distance < 0.7*n.distance:
        good.append(m)
@endcode
Now we set a condition that more than 10 matches (defined by MIN\_MATCH\_COUNT) are to be there to find the object. Otherwise simply show a message saying not enough matches are present.

If enough matches are found, we extract the locations of matched keypoints in both the images. They are passed to find the perspective transformation. Once we get this 3x3 transformation matrix, we use it to transform the corners of queryImage to corresponding points in trainImage. Then we draw it.
@code{.py}
if len(good)>MIN_MATCH_COUNT:
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
    matchesMask = mask.ravel().tolist()

    h,w = img1.shape
    pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
    dst = cv.perspectiveTransform(pts,M)

    img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)

else:
    print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
    matchesMask = None
@endcode
Finally we draw our inliers (if the object was found successfully) or matching keypoints (if it failed).
@code{.py}
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
                   singlePointColor = None,
                   matchesMask = matchesMask, # draw only inliers
                   flags = 2)

img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)

plt.imshow(img3, 'gray'),plt.show()
@endcode
See the result below. The object is marked in white in the cluttered image:

## [Py Features Harris](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_features_harris/py_features_harris/)


# Harris Corner Detection {#tutorial\_py\_features\_harris}

## Goal

In this chapter,

-   We will understand the concepts behind Harris Corner Detection.
-   We will see the following functions: **cv.cornerHarris()**, **cv.cornerSubPix()**

## Theory

In the last chapter, we saw that corners are regions in the image with large variation in intensity in all the directions. One early attempt to find these corners was done by **Chris Harris & Mike Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so now it is called the Harris Corner Detector. They took this simple idea to a mathematical form. It basically finds the difference in intensity for a displacement of \\f$(u,v)\\f$ in all directions. This is expressed as below:

\\f\[E(u,v) = \\sum\_{x,y} \\underbrace{w(x,y)}\_\\text{window function} \\, \[\\underbrace{I(x+u,y+v)}\_\\text{shifted intensity}-\\underbrace{I(x,y)}\_\\text{intensity}\]^2\\f\]

The window function is either a rectangular window or a Gaussian window which gives weights to pixels underneath.

We have to maximize this function \\f$E(u,v)\\f$ for corner detection. That means we have to maximize the second term. Applying Taylor Expansion to the above equation and using some mathematical steps (please refer to any standard text books you like for full derivation), we get the final equation as:

\\f\[E(u,v) \\approx \\begin{bmatrix} u & v \\end{bmatrix} M \\begin{bmatrix} u \\\\ v \\end{bmatrix}\\f\]

where

\\f\[M = \\sum\_{x,y} w(x,y) \\begin{bmatrix}I\_x I\_x & I\_x I\_y \\\\ I\_x I\_y & I\_y I\_y \\end{bmatrix}\\f\]

Here, \\f$I\_x\\f$ and \\f$I\_y\\f$ are image derivatives in x and y directions respectively. (These can be easily found using **cv.Sobel()**).

Then comes the main part. After this, they created a score, basically an equation, which determines if a window can contain a corner or not.

\\f\[R = \\det(M) - k(\\operatorname{trace}(M))^2\\f\]

where

-   \\f$\\det(M) = \\lambda\_1 \\lambda\_2\\f$
-   \\f$\\operatorname{trace}(M) = \\lambda\_1 + \\lambda\_2\\f$
-   \\f$\\lambda\_1\\f$ and \\f$\\lambda\_2\\f$ are the eigenvalues of \\f$M\\f$

So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.

-   When \\f$|R|\\f$ is small, which happens when \\f$\\lambda\_1\\f$ and \\f$\\lambda\_2\\f$ are small, the region is flat.
-   When \\f$R<0\\f$, which happens when \\f$\\lambda\_1 \\gg \\lambda\_2\\f$ or vice versa, the region is an edge.
-   When \\f$R\\f$ is large, which happens when \\f$\\lambda\_1\\f$ and \\f$\\lambda\_2\\f$ are large and \\f$\\lambda\_1 \\sim \\lambda\_2\\f$, the region is a corner.

It can be represented in a nice picture as follows:

So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a suitable score gives you the corners in the image. We will do it with a simple image.
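The score \\f$R\\f$ is easy to try out on hand-picked eigenvalues. The sketch below builds a diagonal \\f$M\\f$ from two eigenvalues and labels the region. The value k = 0.04 matches the example later in this chapter, while `flat_threshold` is a hypothetical cut-off I chose only so that the three cases separate.

```python
import numpy as np

def harris_response(M, k=0.04):
    """R = det(M) - k * trace(M)^2 for a 2x2 matrix M."""
    return np.linalg.det(M) - k * np.trace(M) ** 2

def classify(lambda1, lambda2, k=0.04, flat_threshold=1.0):
    """Label a region from the eigenvalues of M; the threshold is illustrative."""
    M = np.diag([lambda1, lambda2])
    R = harris_response(M, k)
    if abs(R) < flat_threshold:
        return "flat", R
    return ("edge", R) if R < 0 else ("corner", R)

flat = classify(0.01, 0.01)      # both eigenvalues small
edge = classify(100.0, 0.1)      # one eigenvalue dominates
corner = classify(100.0, 100.0)  # both large and similar

print(flat, edge, corner)
```

With one dominant eigenvalue the trace term outweighs the determinant and \\f$R\\f$ goes negative, which is the edge case in the list above.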

## Harris Corner Detector in OpenCV

OpenCV has the function **cv.cornerHarris()** for this purpose. Its arguments are:

-   **img** - Input image. It should be grayscale and float32 type.
-   **blockSize** - It is the size of neighbourhood considered for corner detection
-   **ksize** - Aperture parameter of the Sobel derivative used.
-   **k** - Harris detector free parameter in the equation.

See the example below:
@code{.py}
import numpy as np
import cv2 as cv

filename = 'chessboard.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)

# result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)

# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]

cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
    cv.destroyAllWindows()
@endcode
Below are the three results:

## Corner with SubPixel Accuracy

Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function **cv.cornerSubPix()** which further refines the corners detected with sub-pixel accuracy. Below is an example. As usual, we need to find the Harris corners first. Then we pass the centroids of these corners (there may be a bunch of pixels at a corner, so we take their centroid) to refine them. Harris corners are marked in red pixels and refined corners are marked in green pixels. For this function, we have to define the criteria for when to stop the iteration. We stop it after a specified number of iterations or when a certain accuracy is achieved, whichever occurs first. We also need to define the size of the neighbourhood it searches for corners.
@code{.py}
import numpy as np
import cv2 as cv

filename = 'chessboard2.jpg'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

# find Harris corners
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
dst = cv.dilate(dst,None)
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)

# find centroids
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)

# define the criteria to stop and refine the corners
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)

# Now draw them
res = np.hstack((centroids,corners))
res = np.intp(res)   # np.int0 was an alias of np.intp and is gone in NumPy 2.0
img[res[:,1],res[:,0]]=[0,0,255]
img[res[:,3],res[:,2]] = [0,255,0]

cv.imwrite('subpixel5.png',img)
@endcode
Below is the result, where some important locations are shown in the zoomed window to visualize:

## Additional Resources

## Exercises

## [Py Features Meaning](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_features_meaning/py_features_meaning/)


# Understanding Features {#tutorial\_py\_features\_meaning}

## Goal

In this chapter, we will just try to understand what features are, why they are important, why corners are important etc.

## Explanation

Most of you will have played jigsaw puzzle games. You get a lot of small pieces of an image, which you need to assemble correctly to form one big image. **The question is, how do you do it?** What about projecting the same theory onto a computer program so that a computer can play jigsaw puzzles? If the computer can play jigsaw puzzles, why can't we give it a lot of real-life images of natural scenery and tell it to stitch all those images into a single big image? If the computer can stitch several natural images into one, what about giving it a lot of pictures of a building or any structure and telling it to create a 3D model out of them?

Well, the questions and imaginations continue. But it all depends on the most basic question: How do you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into a big single image? How can you stitch a lot of natural images to a single image?

The answer is, we are looking for specific patterns or specific features which are unique, can be easily tracked and can be easily compared. If we go for a definition of such a feature, we may find it difficult to express it in words, but we know what they are. If someone asks you to point out one good feature which can be compared across several images, you can point out one. That is why even small children can simply play these games. We search for these features in an image, find them, look for the same features in other images and align them. That's it. (In jigsaw puzzle, we look more into continuity of different images). All these abilities are present in us inherently.

So our one basic question expands into several, but they become more specific. **What are these features?** (The answer should be understandable also to a computer.)

It is difficult to say how humans find these features. This is already programmed in our brain. But if we look deep into some pictures and search for different patterns, we will find something interesting. For example, take the image below:

The image is very simple. At the top of the image, six small image patches are given. The question for you is to find the exact location of these patches in the original image. How many correct results can you find?

A and B are flat surfaces and they are spread over a lot of area. It is difficult to find the exact location of these patches.

C and D are much simpler. They are edges of the building. You can find an approximate location, but the exact location is still difficult. This is because the pattern is the same everywhere along the edge. Across the edge, however, it is different. An edge is therefore a better feature than a flat area, but not good enough (it is good in a jigsaw puzzle for comparing the continuity of edges).

Finally, E and F are some corners of the building, and they can be easily found, because at the corners, wherever you move this patch, it will look different. So they can be considered good features. Now we move to a simpler (and widely used) image for better understanding.

Just like above, the blue patch is a flat area and is difficult to find and track. Wherever you move the blue patch, it looks the same. The black patch has an edge. If you move it in the vertical direction (i.e. along the gradient), it changes. Moved along the edge (parallel to the edge), it looks the same. The red patch is a corner. Wherever you move the patch, it looks different, which means it is unique. So basically, corners are considered to be good features in an image. (Not just corners; in some cases blobs are considered good features.)

So now we have answered our question, "what are these features?". But the next question arises: how do we find them? Or how do we find the corners? We answered that in an intuitive way: look for the regions in images which have maximum variation when moved (by a small amount) in all directions around them. This will be projected into computer language in the coming chapters. Finding these image features is called **Feature Detection**.
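The intuition above can be checked numerically. The following is a minimal sketch, not OpenCV's implementation: it builds a tiny synthetic image (the `make_image` helper, the patch positions and the sizes are all made up for illustration) and measures, as a sum of squared differences, how little a patch can change when it is shifted by one pixel in any of four directions.

```python
# Illustrative sketch only: a 16x16 dark image whose lower-right
# quadrant (rows and columns 8..15) is a bright square.

def make_image(size=16, start=8):
    return [[255 if (r >= start and c >= start) else 0 for c in range(size)]
            for r in range(size)]

def patch(img, top, left, k=4):
    """Cut a k x k patch whose top-left pixel is (top, left)."""
    return [row[left:left + k] for row in img[top:top + k]]

def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def min_change(img, top, left, k=4):
    """Smallest SSD between a patch and its copies shifted by one pixel."""
    base = patch(img, top, left, k)
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return min(ssd(base, patch(img, top + dr, left + dc, k)) for dr, dc in shifts)

img = make_image()
flat = min_change(img, 2, 2)     # entirely inside the dark area
edge = min_change(img, 6, 10)    # straddles the top edge of the square
corner = min_change(img, 6, 6)   # straddles the square's top-left corner
print(flat, edge, corner)
```

Only the corner patch changes under every shift; the flat patch never changes and the edge patch is unchanged when it slides along the edge.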

We found the features in the images. Once you have found them, you should be able to find the same features in the other images. How is this done? We take a region around the feature and explain it in our own words, like "the upper part is blue sky, the lower part is a region from a building, on that building there is glass" etc., and then search for the same area in the other images. Basically, you are describing the feature. Similarly, a computer should also describe the region around the feature so that it can find it in other images. This description is called **Feature Description**. Once you have the features and their descriptions, you can find the same features in all images and align them, stitch them together or do whatever you want.

So in this module, we look at different algorithms in OpenCV to find features, describe them, match them etc.

## [Py Matcher](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_matcher/py_matcher/)

# Feature Matching {#tutorial\_py\_matcher}

## Goal

In this chapter

-   We will see how to match features in one image with others.
-   We will use the Brute-Force matcher and FLANN Matcher in OpenCV

## Basics of Brute-Force Matcher

The Brute-Force matcher is simple. It takes the descriptor of one feature in the first set and matches it against all features in the second set using some distance calculation. The closest one is returned.

For the BF matcher, first we have to create the BFMatcher object using **cv.BFMatcher()**. It takes two optional params. The first one is normType. It specifies the distance measurement to be used. By default, it is cv.NORM\_L2. It is good for SIFT, SURF etc. (cv.NORM\_L1 is also there). For binary string based descriptors like ORB, BRIEF, BRISK etc., cv.NORM\_HAMMING should be used, which uses Hamming distance as the measurement. If ORB is using WTA\_K == 3 or 4, cv.NORM\_HAMMING2 should be used.
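To make the Hamming measurement concrete, here is a small sketch in plain Python. It is only an illustration of the distance itself, not how OpenCV computes it internally; the 4-byte descriptors are made-up values (a default ORB descriptor is 32 bytes long).

```python
# Sketch: Hamming distance between two binary descriptors stored as bytes.

def hamming(d1: bytes, d2: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

a = bytes([0b10110010, 0b00001111, 0b11111111, 0b00000000])
b = bytes([0b10110011, 0b00001111, 0b00001111, 0b00000001])
print(hamming(a, b))  # 1 + 0 + 4 + 1 differing bits
```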

Second param is boolean variable, crossCheck which is false by default. If it is true, Matcher returns only those matches with value (i,j) such that i-th descriptor in set A has j-th descriptor in set B as the best match and vice-versa. That is, the two features in both sets should match each other. It provides consistent result, and is a good alternative to ratio test proposed by D.Lowe in SIFT paper.
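The brute-force search and the cross-check can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the BFMatcher implementation: the function names and the 2-D descriptors are hypothetical, and L2 distance is used.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query, train_set):
    """Index of the closest descriptor in train_set, by exhaustive comparison."""
    return min(range(len(train_set)), key=lambda j: l2(query, train_set[j]))

def cross_checked_matches(set_a, set_b):
    """Keep (i, j) only when i's best match is j and j's best match is i."""
    matches = []
    for i, desc in enumerate(set_a):
        j = best_match(desc, set_b)
        if best_match(set_b[j], set_a) == i:
            matches.append((i, j, l2(desc, set_b[j])))
    return matches

# Made-up descriptors, chosen so that one of them has no mutual partner.
set_a = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.0)]
set_b = [(0.1, 0.0), (5.0, 5.2)]
for i, j, dist in cross_checked_matches(set_a, set_b):
    print(i, j, round(dist, 3))
```

Descriptor 2 of `set_a` is closest to descriptor 0 of `set_b`, but that one prefers descriptor 0 of `set_a`, so the pair is dropped.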

Once it is created, two important methods are _BFMatcher.match()_ and _BFMatcher.knnMatch()_. First one returns the best match. Second method returns k best matches where k is specified by the user. It may be useful when we need to do additional work on that.

Like we used cv.drawKeypoints() to draw keypoints, **cv.drawMatches()** helps us to draw the matches. It stacks two images horizontally and draws lines from the first image to the second image showing the best matches. There is also **cv.drawMatchesKnn**, which draws all the k best matches. If k=2, it will draw two match-lines for each keypoint. So we have to pass a mask if we want to selectively draw it.

Let's see one example for each of SIFT and ORB (Both use different distance measurements).

### Brute-Force Matching with ORB Descriptors

Here, we will see a simple example of how to match features between two images. In this case, I have a queryImage and a trainImage. We will try to find the queryImage in the trainImage using feature matching. (The images are /samples/data/box.png and /samples/data/box\_in\_scene.png.)

We are using ORB descriptors to match features. So let's start with loading images, finding descriptors etc.

@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE)          # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Initiate ORB detector
orb = cv.ORB_create()

# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
@endcode

Next we create a BFMatcher object with distance measurement cv.NORM\_HAMMING (since we are using ORB), and crossCheck is switched on for better results. Then we use the Matcher.match() method to get the best matches between the two images. We sort them in ascending order of their distances so that the best matches (with low distance) come to the front. Then we draw only the first 10 matches (just for the sake of visibility; you can increase it as you like).

@code{.py}
# create BFMatcher object
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)

# Match descriptors.
matches = bf.match(des1,des2)

# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)

# Draw first 10 matches.
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:10],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

plt.imshow(img3),plt.show()
@endcode

Below is the result I got:

### What is this Matcher Object?

The result of matches = bf.match(des1,des2) line is a list of DMatch objects. This DMatch object has following attributes:

-   DMatch.distance - Distance between descriptors. The lower, the better it is.
-   DMatch.trainIdx - Index of the descriptor in train descriptors
-   DMatch.queryIdx - Index of the descriptor in query descriptors
-   DMatch.imgIdx - Index of the train image.

### Brute-Force Matching with SIFT Descriptors and Ratio Test

This time, we will use BFMatcher.knnMatch() to get k best matches. In this example, we will take k=2 so that we can apply ratio test explained by D.Lowe in his paper.

@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE)          # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1,des2,k=2)

# Apply ratio test
good = []
for m,n in matches:
    if m.distance < 0.75*n.distance:
        good.append([m])

# cv.drawMatchesKnn expects list of lists as matches.
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

plt.imshow(img3),plt.show()
@endcode

See the result below:

## FLANN based Matcher

FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional features. It works faster than BFMatcher for large datasets. We will see the second example with FLANN based matcher.

For the FLANN based matcher, we need to pass two dictionaries which specify the algorithm to be used, its related parameters etc. The first one is IndexParams. For various algorithms, the information to be passed is explained in the FLANN docs. As a summary, for algorithms like SIFT, SURF etc. you can pass the following:

@code{.py}
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
@endcode

While using ORB, you can pass the following. The commented values are recommended as per the docs, but they didn't provide the required results in some cases. Other values worked fine:

@code{.py}
FLANN_INDEX_LSH = 6
index_params= dict(algorithm = FLANN_INDEX_LSH,
                   table_number = 6, # 12
                   key_size = 12,     # 20
                   multi_probe_level = 1) #2
@endcode

The second dictionary is the SearchParams. It specifies the number of times the trees in the index should be recursively traversed. Higher values give better precision, but also take more time. If you want to change the value, pass search\_params = dict(checks=100).
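One way to keep the two parameter sets straight is a small helper that picks between them. This is only a convenience sketch: the helper name `flann_params` is made up for illustration, while the constants and values are the ones quoted above.

```python
# Constants and values as quoted in the text; the helper itself is hypothetical.
FLANN_INDEX_KDTREE = 1
FLANN_INDEX_LSH = 6

def flann_params(binary_descriptors, checks=50):
    """Return (index_params, search_params) for float or binary descriptors."""
    if binary_descriptors:
        # ORB and other binary descriptors: locality-sensitive hashing
        index_params = dict(algorithm=FLANN_INDEX_LSH,
                            table_number=6, key_size=12, multi_probe_level=1)
    else:
        # SIFT, SURF and other float descriptors: k-d trees
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    return index_params, dict(checks=checks)

index_params, search_params = flann_params(binary_descriptors=False)
print(index_params, search_params)
```

The two dictionaries would then be passed to cv.FlannBasedMatcher(index\_params, search\_params).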

With this information, we are good to go.

@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE)          # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)

# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)   # or pass empty dictionary

flann = cv.FlannBasedMatcher(index_params,search_params)

matches = flann.knnMatch(des1,des2,k=2)

# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]

# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.7*n.distance:
        matchesMask[i]=[1,0]

draw_params = dict(matchColor = (0,255,0),
                   singlePointColor = (255,0,0),
                   matchesMask = matchesMask,
                   flags = cv.DrawMatchesFlags_DEFAULT)

img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)

plt.imshow(img3),plt.show()
@endcode

See the result below:

## [Py Orb](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_orb/py_orb/)

# ORB (Oriented FAST and Rotated BRIEF) {#tutorial\_py\_orb}

## Goal

In this chapter,

-   We will see the basics of ORB

## Theory

As an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs". This algorithm was brought up by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a good alternative to SIFT and SURF in computation cost, matching performance and mainly the patents. When ORB was published, both SIFT and SURF were patented and you were supposed to pay for their use (the SIFT patent has since expired, in 2020). ORB never was!

ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications to enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure to find the top N points among them. It also uses a pyramid to produce multiscale features. But one problem is that FAST doesn't compute the orientation. So what about rotation invariance? The authors came up with the following modification.

It computes the intensity weighted centroid of the patch with located corner at center. The direction of the vector from this corner point to centroid gives the orientation. To improve the rotation invariance, moments are computed with x and y which should be in a circular region of radius \\f$r\\f$, where \\f$r\\f$ is the size of the patch.
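The intensity-centroid idea can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not ORB's actual code: the function name and the 5x5 patch are made up, the moments are taken about the patch centre, and the image y axis points downwards.

```python
import math

def patch_orientation(patch):
    """Angle (radians) from the patch centre to its intensity-weighted centroid."""
    size = len(patch)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    m00 = m10 = m01 = 0.0
    for row, line in enumerate(patch):
        for col, value in enumerate(line):
            x, y = col - centre, row - centre
            if x * x + y * y > radius * radius:
                continue  # keep only pixels inside the circular region
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        return 0.0  # no intensity at all: return 0 by convention in this sketch
    return math.atan2(m01, m10)

# Made-up 5x5 patch that gets brighter towards its right-hand side.
patch = [[0, 0, 10, 50, 90]] * 5
print(round(math.degrees(patch_orientation(patch)), 1))
```

Because the patch is symmetric from top to bottom and brighter on the right, the centroid lies to the right of the centre and the orientation comes out along the positive x axis.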

Now for descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly with rotation. So what ORB does is to "steer" BRIEF according to the orientation of keypoints. For any feature set of \\f$n\\f$ binary tests at location \\f$(x\_i, y\_i)\\f$, define a \\f$2 \\times n\\f$ matrix \\f$S\\f$ which contains the coordinates of these pixels. Then, using the orientation of the patch, \\f$\\theta\\f$, its rotation matrix is found and used to rotate \\f$S\\f$ to get the steered (rotated) version \\f$S\_\\theta\\f$.

ORB discretizes the angle to increments of \\f$2 \\pi /30\\f$ (12 degrees), and constructs a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation \\f$\\theta\\f$ is consistent across views, the correct set of points \\f$S\_\\theta\\f$ will be used to compute its descriptor.

BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But once it is oriented along the keypoint direction, it loses this property and becomes more distributed. High variance makes a feature more discriminative, since it responds differentially to inputs. Another desirable property is to have the tests uncorrelated, since then each test will contribute to the result. To resolve all these, ORB runs a greedy search among all possible binary tests to find the ones that have both high variance and means close to 0.5, as well as being uncorrelated. The result is called **rBRIEF**.
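The steering and the 12-degree discretization can be sketched as below. This is a simplified illustration, assuming a standard 2-D rotation and computing the rotated points on the fly instead of reading them from a precomputed lookup table; the function names and test locations are made up.

```python
import math

def discretize_angle(theta, bins=30):
    """Snap an angle in radians to the nearest multiple of 2*pi/bins."""
    step = 2 * math.pi / bins
    return (round(theta / step) % bins) * step

def steer(points, theta):
    """Rotate the test-point coordinates (the columns of S) by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Made-up test locations relative to the keypoint; a real pattern holds many more.
S = [(4.0, 0.0), (0.0, 3.0), (-2.0, -2.0)]
theta = discretize_angle(math.radians(95))   # snaps to 96 degrees
S_theta = steer(S, theta)
print(round(math.degrees(theta)), [(round(x, 2), round(y, 2)) for x, y in S_theta])
```

With 30 bins, only 30 rotated copies of the pattern ever occur, which is what makes a lookup table practical.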

For descriptor matching, multi-probe LSH which improves on the traditional LSH, is used. The paper says ORB is much faster than SURF and SIFT and ORB descriptor works better than SURF. ORB is a good choice in low-power devices for panorama stitching etc.

## ORB in OpenCV

As usual, we have to create an ORB object with the function **cv.ORB\_create()** or using the features common interface. It has a number of optional parameters. The most useful ones are nfeatures, which denotes the maximum number of features to be retained (by default 500), and scoreType, which denotes whether the Harris score or the FAST score is used to rank the features (by default, the Harris score). Another parameter, WTA\_K, decides the number of points that produce each element of the oriented BRIEF descriptor. By default it is two, i.e. it selects two points at a time. In that case, for matching, NORM\_HAMMING distance is used. If WTA\_K is 3 or 4, which takes 3 or 4 points to produce the BRIEF descriptor, then the matching distance is defined by NORM\_HAMMING2.

Below is a simple piece of code which shows the use of ORB.

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('simple.jpg', cv.IMREAD_GRAYSCALE)

# Initiate ORB detector
orb = cv.ORB_create()

# find the keypoints with ORB
kp = orb.detect(img,None)

# compute the descriptors with ORB
kp, des = orb.compute(img, kp)

# draw only keypoints location,not size and orientation
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
plt.imshow(img2), plt.show()
@endcode

See the result below:

We will do ORB feature matching in another chapter.

## Additional Resources

\-# Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to SIFT or SURF. ICCV 2011: 2564-2571.

## [Py Shi Tomasi](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_shi_tomasi/py_shi_tomasi/)

# Shi-Tomasi Corner Detector & Good Features to Track {#tutorial\_py\_shi\_tomasi}

## Goal

In this chapter,

-   We will learn about another corner detector: Shi-Tomasi Corner Detector
-   We will see the function: **cv.goodFeaturesToTrack()**

## Theory

In the last chapter, we saw the Harris Corner Detector. Later, in 1994, J. Shi and C. Tomasi made a small modification to it in their paper **Good Features to Track**, which shows better results compared to the Harris Corner Detector. The scoring function in the Harris Corner Detector was given by:

\\f\[R = \\lambda\_1 \\lambda\_2 - k(\\lambda\_1+\\lambda\_2)^2\\f\]

Instead of this, Shi-Tomasi proposed:

\\f\[R = \\min(\\lambda\_1, \\lambda\_2)\\f\]

If it is greater than a threshold value, it is considered a corner. If we plot it in \\f$\\lambda\_1 - \\lambda\_2\\f$ space as we did for the Harris Corner Detector, we get an image as below:

From the figure, you can see that only when \\f$\\lambda\_1\\f$ and \\f$\\lambda\_2\\f$ are above a minimum value, \\f$\\lambda\_{\\min}\\f$, it is considered as a corner(green region).
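The two scoring functions are easy to compare side by side. The sketch below simply evaluates both formulas for a few made-up eigenvalue pairs; the value k = 0.04 is an assumed, commonly used choice and is not taken from this tutorial.

```python
def harris_score(l1, l2, k=0.04):
    """R = l1*l2 - k*(l1 + l2)^2, the Harris response written with eigenvalues."""
    return l1 * l2 - k * (l1 + l2) ** 2

def shi_tomasi_score(l1, l2):
    """R = min(l1, l2), the Shi-Tomasi response."""
    return min(l1, l2)

# Made-up eigenvalue pairs for a flat region, an edge and a corner.
samples = {"flat": (0.1, 0.2), "edge": (50.0, 0.3), "corner": (40.0, 35.0)}
for name, (l1, l2) in samples.items():
    print(name, round(harris_score(l1, l2), 2), shi_tomasi_score(l1, l2))
```

For the edge, the Shi-Tomasi score stays small because one eigenvalue is small, while for the corner both eigenvalues, and hence the minimum, are large.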

## Code

OpenCV has a function, **cv.goodFeaturesToTrack()**. It finds the N strongest corners in the image by the Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, the image should be a grayscale image. Then you specify the number of corners you want to find. Then you specify the quality level, which is a value between 0 and 1 that denotes the minimum quality of corner below which every corner is rejected. Then we provide the minimum Euclidean distance between detected corners.

With all this information, the function finds corners in the image. All corners below the quality level are rejected. Then it sorts the remaining corners based on quality in descending order. Then the function takes the first strongest corner, throws away all the nearby corners within the minimum distance, and returns the N strongest corners.
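The selection procedure just described can be sketched in plain Python. This is an illustration of the steps, not the OpenCV source: the function name and candidate list are hypothetical, and the quality level is applied relative to the strongest corner's score, which is how the cv.goodFeaturesToTrack() documentation describes it.

```python
import math

def select_corners(candidates, max_corners, quality_level, min_distance):
    """candidates: list of (x, y, score). Returns up to max_corners (x, y) points."""
    if not candidates:
        return []
    best = max(score for _, _, score in candidates)
    # Assumption: the threshold is quality_level times the best score.
    strong = [c for c in candidates if c[2] >= quality_level * best]
    strong.sort(key=lambda c: c[2], reverse=True)

    kept = []
    for x, y, _ in strong:
        # Accept a corner only if it is far enough from every stronger one kept so far.
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky in kept):
            kept.append((x, y))
        if len(kept) == max_corners:
            break
    return kept

# Made-up candidate corners: (x, y, score)
candidates = [(10, 10, 0.90), (12, 11, 0.80), (40, 40, 0.50),
              (80, 15, 0.005), (60, 60, 0.30)]
print(select_corners(candidates, max_corners=3, quality_level=0.01, min_distance=10))
```

Here (80, 15) fails the quality threshold and (12, 11) is suppressed because it lies too close to the stronger corner at (10, 10).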

In the example below, we will try to find the 25 best corners:

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('blox.jpg')
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

corners = cv.goodFeaturesToTrack(gray,25,0.01,10)
# np.int0 was removed in NumPy 2.0; np.intp is the same integer type
corners = corners.astype(np.intp)

for i in corners:
    x,y = i.ravel()
    cv.circle(img,(int(x),int(y)),3,255,-1)

plt.imshow(img),plt.show()
@endcode

See the result below:

This function is more appropriate for tracking. We will see that when its time comes.

## [Py Sift Intro](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_sift_intro/py_sift_intro/)

# Introduction to SIFT (Scale-Invariant Feature Transform) {#tutorial\_py\_sift\_intro}

## Goal

In this chapter,

-   We will learn about the concepts of SIFT algorithm
-   We will learn to find SIFT Keypoints and Descriptors.

## Theory

In the last couple of chapters, we saw some corner detectors like Harris etc. They are rotation-invariant, which means that even if the image is rotated, we can find the same corners. This is obvious because corners remain corners in a rotated image too. But what about scaling? A corner may not be a corner if the image is scaled. For example, check the simple image below. A corner in a small image within a small window is flat when it is zoomed in the same window. So the Harris corner detector is not scale invariant.

In 2004, **D.Lowe**, University of British Columbia, came up with a new algorithm, Scale Invariant Feature Transform (SIFT), in his paper **Distinctive Image Features from Scale-Invariant Keypoints**, which extracts keypoints and computes their descriptors. _(This paper is easy to understand and considered to be the best material available on SIFT. This explanation is just a short summary of this paper)_.

There are mainly four steps involved in the SIFT algorithm, followed by keypoint matching. We will see them one-by-one.

### 1\. Scale-space Extrema Detection

From the image above, it is obvious that we can't use the same window to detect keypoints at different scales. It is OK for a small corner, but to detect larger corners we need larger windows. For this, scale-space filtering is used. In it, the Laplacian of Gaussian (LoG) is found for the image with various \\f$\\sigma\\f$ values. LoG acts as a blob detector which detects blobs of various sizes due to the change in \\f$\\sigma\\f$. In short, \\f$\\sigma\\f$ acts as a scaling parameter. For example, in the above image, a Gaussian kernel with low \\f$\\sigma\\f$ gives a high value for the small corner, while a Gaussian kernel with high \\f$\\sigma\\f$ fits well for the larger corner. So, we can find the local maxima across scale and space, which gives us a list of \\f$(x,y,\\sigma)\\f$ values, meaning there is a potential keypoint at (x,y) at scale \\f$\\sigma\\f$.

But this LoG is a little costly, so SIFT algorithm uses Difference of Gaussians which is an approximation of LoG. Difference of Gaussian is obtained as the difference of Gaussian blurring of an image with two different \\f$\\sigma\\f$, let it be \\f$\\sigma\\f$ and \\f$k\\sigma\\f$. This process is done for different octaves of the image in Gaussian Pyramid. It is represented in below image:

Once these DoG images are found, they are searched for local extrema over scale and space. For example, one pixel in an image is compared with its 8 neighbours as well as 9 pixels in the next scale and 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the keypoint is best represented in that scale. It is shown in the image below:
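Alongside the figure, the 26-neighbour comparison can be written out directly. The following is a minimal sketch on a made-up stack of three 3x3 DoG images; it uses strict comparison and ignores image borders, which a real implementation would have to handle.

```python
def is_extremum(dog_stack, s, r, c):
    """True if dog_stack[s][r][c] is strictly above or below all 26 neighbours."""
    value = dog_stack[s][r][c]
    neighbours = [
        dog_stack[s + ds][r + dr][c + dc]
        for ds in (-1, 0, 1)      # previous, current and next scale
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (ds, dr, dc) != (0, 0, 0)
    ]
    return value > max(neighbours) or value < min(neighbours)

# Made-up DoG values: previous, current and next scale.
previous_scale = [[1, 2, 1], [2, 3, 2], [1, 2, 1]]
current_scale = [[2, 3, 2], [3, 9, 3], [2, 3, 2]]
next_scale = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
stack = [previous_scale, current_scale, next_scale]
print(is_extremum(stack, 1, 1, 1))
```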

Regarding different parameters, the paper gives some empirical data which can be summarized as, number of octaves = 4, number of scale levels = 5, initial \\f$\\sigma=1.6\\f$, \\f$k=\\sqrt{2}\\f$ etc as optimal values.
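With those values, the blur levels inside one octave and a difference of two Gaussians can be computed as follows. This is a 1-D toy sketch using the parameters quoted above; the function names are made up, and a real implementation blurs 2-D images and subtracts them.

```python
import math

def octave_sigmas(sigma0=1.6, k=math.sqrt(2), levels=5):
    """Blur levels sigma0, k*sigma0, k^2*sigma0, ... used within one octave."""
    return [sigma0 * k ** i for i in range(levels)]

def gaussian(x, sigma):
    """Value of a normalized 1-D Gaussian at x."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def dog(x, sigma, k=math.sqrt(2)):
    """Difference of two 1-D Gaussians with standard deviations k*sigma and sigma."""
    return gaussian(x, k * sigma) - gaussian(x, sigma)

sigmas = octave_sigmas()
print([round(s, 2) for s in sigmas])
print(round(dog(0.0, sigmas[0]), 4))
```

Five blur levels give four DoG images per octave, one for each pair of adjacent levels.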

### 2\. Keypoint Localization

Once potential keypoint locations are found, they have to be refined to get more accurate results. They used a Taylor series expansion of scale space to get a more accurate location of extrema, and if the intensity at an extremum is less than a threshold value (0.03 as per the paper), it is rejected. This threshold is called **contrastThreshold** in OpenCV.

DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the principal curvatures. We know from the Harris corner detector that for edges, one eigenvalue is much larger than the other. So here they used a simple function of the ratio between the two eigenvalues.

If this ratio is greater than a threshold, called **edgeThreshold** in OpenCV, that keypoint is discarded. It is given as 10 in the paper.
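The edge check can be sketched without computing the eigenvalues explicitly. The version below follows the form given in Lowe's paper, where a keypoint is kept when Tr(H)^2 / Det(H) < (r + 1)^2 / r for threshold r; the function name and the second-derivative values are made up for illustration.

```python
def passes_edge_test(dxx, dyy, dxy, edge_threshold=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below edge_threshold."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: discard the point
    r = edge_threshold
    return trace * trace / det < (r + 1) ** 2 / r

# Made-up second derivatives of the DoG image at two keypoints.
print(passes_edge_test(dxx=4.0, dyy=3.0, dxy=0.5))    # blob-like, similar curvatures
print(passes_edge_test(dxx=20.0, dyy=0.5, dxy=0.1))   # edge-like, one curvature dominates
```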

So it eliminates any low-contrast keypoints and edge keypoints and what remains is strong interest points.

### 3\. Orientation Assignment

Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created (it is weighted by gradient magnitude and a Gaussian-weighted circular window with \\f$\\sigma\\f$ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken, and any peak above 80% of it is also considered to calculate the orientation. It creates keypoints with the same location and scale, but different directions. It contributes to the stability of matching.
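A stripped-down version of the histogram step might look like this. It is a sketch under simplifying assumptions: samples are weighted by gradient magnitude only (the Gaussian window is left out), every bin within 80% of the top bin is reported without checking that it is a local peak, and no peak interpolation is done. The function name and sample values are made up.

```python
def dominant_orientations(angles_deg, magnitudes, bins=36, peak_ratio=0.8):
    """Centre angle (degrees) of every histogram bin within peak_ratio of the top bin."""
    width = 360.0 / bins
    hist = [0.0] * bins
    for angle, mag in zip(angles_deg, magnitudes):
        hist[int((angle % 360.0) // width) % bins] += mag
    top = max(hist)
    if top == 0:
        return []
    return [(i + 0.5) * width for i, h in enumerate(hist) if h >= peak_ratio * top]

# Made-up gradient samples around a keypoint: direction in degrees and magnitude.
angles = [12, 15, 17, 95, 96, 200]
mags = [1.0, 2.0, 1.5, 2.0, 1.9, 0.5]
print(dominant_orientations(angles, mags))
```

Two bins pass the 80% rule here, so this location would produce two keypoints with different orientations.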

### 4\. Keypoint Descriptor

Now keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size. For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation etc.

### 5\. Keypoint Matching

Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second-closest match may be very near to the first. It may happen due to noise or some other reasons. In that case, the ratio of closest distance to second-closest distance is taken. If it is greater than 0.8, the match is rejected. It eliminates around 90% of false matches while discarding only 5% of correct matches, as per the paper.
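The ratio test itself is a one-line comparison. Here is a small sketch on made-up distances; the function name is hypothetical and the 0.8 threshold is the value quoted above.

```python
def ratio_test(knn_distances, ratio=0.8):
    """knn_distances: (closest, second_closest) distance pairs, one per keypoint.
    Returns the indices of the keypoints whose match survives the test."""
    kept = []
    for index, (first, second) in enumerate(knn_distances):
        if second > 0 and first / second <= ratio:
            kept.append(index)
    return kept

# Keypoint 1 is ambiguous: both of its neighbours are almost equally close.
distances = [(0.20, 0.90), (0.50, 0.55), (0.30, 0.45)]
print(ratio_test(distances))
```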

This is a summary of SIFT algorithm. For more details and understanding, reading the original paper is highly recommended.

## SIFT in OpenCV

Now let's see SIFT functionalities available in OpenCV. Note that these were previously only available in [the opencv contrib repo](https://github.com/opencv/opencv_contrib), but the patent expired in the year 2020. So they are now included in the main repo. Let's start with keypoint detection and draw them. First we have to construct a SIFT object. We can pass different parameters to it which are optional and they are well explained in docs.

@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)

sift = cv.SIFT_create()
kp = sift.detect(gray,None)

img=cv.drawKeypoints(gray,kp,img)

cv.imwrite('sift_keypoints.jpg',img)
@endcode

**sift.detect()** function finds the keypoint in the images. You can pass a mask if you want to search only a part of image. Each keypoint is a special structure which has many attributes like its (x,y) coordinates, size of the meaningful neighbourhood, angle which specifies its orientation, response that specifies strength of keypoints etc.

OpenCV also provides the **cv.drawKeypoints()** function, which draws small circles on the locations of keypoints. If you pass the flag **cv.DRAW\_MATCHES\_FLAGS\_DRAW\_RICH\_KEYPOINTS** to it, it will draw a circle with the size of the keypoint and it will even show its orientation. See the example below.

@code{.py}
img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv.imwrite('sift_keypoints.jpg',img)
@endcode

See the two results below:

Now to calculate the descriptor, OpenCV provides two methods.

1.  Since you already found keypoints, you can call **sift.compute()** which computes the descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
2.  If you didn't find keypoints, directly find keypoints and descriptors in a single step with the function, **sift.detectAndCompute()**.

We will see the second method:

@code{.py}
sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)
@endcode

Here kp will be a list of keypoints and des is a numpy array of shape \\f$\\text{(Number of Keypoints)} \\times 128\\f$.

So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images. That we will learn in coming chapters.

## [Py Table Of Contents Features](https://docharvest.github.io/docs/opencv5/py_tutorials/py_features/py_table_of_contents_features/)

# Feature Detection and Description {#tutorial\_py\_table\_of\_contents\_features}

-   @subpage tutorial\_py\_features\_meaning
    
    What are the main features in an image? How can finding those features be useful to us?
    
-   @subpage tutorial\_py\_features\_harris
    
    Okay, Corners are good features? But how do we find them?
    
-   @subpage tutorial\_py\_shi\_tomasi
    
    We will look into Shi-Tomasi corner detection
    
-   @subpage tutorial\_py\_sift\_intro
    
    Harris corner detector is not good enough when scale of image changes. Lowe developed a breakthrough method to find scale-invariant features and it is called SIFT
    
-   @subpage tutorial\_py\_fast
    
    All the above feature detection methods are good in some way. But they are not fast enough to work in real-time applications like SLAM. There comes the FAST algorithm, which is really "FAST".
    
-   @subpage tutorial\_py\_orb
    
    SURF is good in what it does, but what if you have to pay a few dollars every year to use it in your applications? Yeah, it is patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
    
-   @subpage tutorial\_py\_matcher
    
    We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
    
-   @subpage tutorial\_py\_feature\_homography
    
    Now we know about feature matching. Let's mix it up with 3d module to find objects in a complex image.

## [Py Drawing Functions](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_drawing_functions/py_drawing_functions/)

# Drawing Functions in OpenCV {#tutorial\_py\_drawing\_functions}

## Goal

-   Learn to draw different geometric shapes with OpenCV
-   You will learn these functions : **cv.line()**, **cv.circle()** , **cv.rectangle()**, **cv.ellipse()**, **cv.putText()** etc.

## Code

In all the above functions, you will see some common arguments as given below:

-   img : The image where you want to draw the shapes
-   color : Color of the shape. for BGR, pass it as a tuple, eg: (255,0,0) for blue. For grayscale, just pass the scalar value.
-   thickness : Thickness of the line or circle etc. If **\-1** is passed for closed figures like circles, it will fill the shape. _default thickness = 1_
-   lineType : Type of line, whether 8-connected, anti-aliased line etc. _By default, it is 8-connected._ cv.LINE\_AA gives anti-aliased line which looks great for curves.

### Drawing Line

To draw a line, you need to pass starting and ending coordinates of line. We will create a black image and draw a blue line on it from top-left to bottom-right corners.

@code{.py}
import numpy as np
import cv2 as cv

# Create a black image
img = np.zeros((512,512,3), np.uint8)

# Draw a diagonal blue line with thickness of 5 px
cv.line(img,(0,0),(511,511),(255,0,0),5)
@endcode

### Drawing Rectangle

To draw a rectangle, you need top-left corner and bottom-right corner of rectangle. This time we will draw a green rectangle at the top-right corner of image.

@code{.py}
cv.rectangle(img,(384,0),(510,128),(0,255,0),3)
@endcode

### Drawing Circle

To draw a circle, you need its center coordinates and radius. We will draw a circle inside the rectangle drawn above.

@code{.py}
cv.circle(img,(447,63), 63, (0,0,255), -1)
@endcode

### Drawing Ellipse

To draw the ellipse, we need to pass several arguments. One argument is the center location (x,y). The next argument is the axes lengths (semi-major axis length, semi-minor axis length). angle is the angle of rotation of the ellipse in the anti-clockwise direction. startAngle and endAngle denote the start and end of the ellipse arc, measured in the clockwise direction from the major axis, i.e. giving values 0 and 360 gives the full ellipse. For more details, check the documentation of **cv.ellipse()**. The example below draws a half ellipse at the center of the image.

@code{.py}
cv.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
@endcode

### Drawing Polygon

To draw a polygon, first you need the coordinates of the vertices. Make those points into an array of shape ROWSx1x2, where ROWS is the number of vertices, and it should be of type int32. Here we draw a small polygon with four vertices in yellow color.

@code{.py}
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1,1,2))
cv.polylines(img,[pts],True,(0,255,255))
@endcode

@note If third argument is False, you will get a polylines joining all the points, not a closed shape.

@note cv.polylines() can be used to draw multiple lines. Just create a list of all the lines you want to draw and pass it to the function. All lines will be drawn individually. It is a much better and faster way to draw a group of lines than calling cv.line() for each line.

### Adding Text to Images:

To put texts in images, you need specify following things. - Text data that you want to write - Position coordinates of where you want put it (i.e. bottom-left corner where data starts). - Font type (Check **cv.putText()** docs for supported fonts) - Font Scale (specifies the size of font) - regular things like color, thickness, lineType etc. For better look, lineType = cv.LINE\_AA is recommended.

We will write **OpenCV** on our image in white color.
@code{.py}
font = cv.FONT_HERSHEY_SIMPLEX
cv.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv.LINE_AA)
@endcode

### Result

So it is time to see the final result of our drawing. As you studied in previous articles, display the image to see it.

## Additional Resources

\-# The angles used in the ellipse function are not our circular angles. For more details, visit [this discussion](http://answers.opencv.org/question/14541/angles-in-ellipse-function/).

## Exercises

\-# Try to create the logo of OpenCV using drawing functions available in OpenCV.

## [Py Image Display](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_image_display/py_image_display/)

# Getting Started with Images {#tutorial\_py\_image\_display}

Tutorial content has been moved: @ref tutorial\_display\_image

## [Py Mouse Handling](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_mouse_handling/py_mouse_handling/)

# Mouse as a Paint-Brush {#tutorial\_py\_mouse\_handling}

## Goal

-   Learn to handle mouse events in OpenCV
-   You will learn these functions : **cv.setMouseCallback()**

## Simple Demo

Here, we create a simple application which draws a circle on an image wherever we double-click on it.

First we create a mouse callback function which is executed when a mouse event takes place. A mouse event can be anything related to the mouse, like left-button down, left-button up, left-button double-click etc. It gives us the coordinates (x,y) for every mouse event. With this event and location, we can do whatever we like. To list all available events, run the following code in a Python terminal:
@code{.py}
import cv2 as cv
events = [i for i in dir(cv) if 'EVENT' in i]
print( events )
@endcode
A mouse callback function has a specific format which is the same everywhere. It differs only in what the function does. Our mouse callback function does one thing: it draws a circle where we double-click. See the code below; it is self-explanatory from the comments:
@code{.py}
import numpy as np
import cv2 as cv

# mouse callback function
def draw_circle(event,x,y,flags,param):
    if event == cv.EVENT_LBUTTONDBLCLK:
        cv.circle(img,(x,y),100,(255,0,0),-1)

# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while(1):
    cv.imshow('image',img)
    if cv.waitKey(20) & 0xFF == 27:
        break
cv.destroyAllWindows()
@endcode

## More Advanced Demo

Now we go for a much better application. In this one, we draw either rectangles or circles (depending on the mode we select) by dragging the mouse, as we do in a Paint application. So our mouse callback function has two parts, one to draw rectangles and the other to draw circles. This specific example will be really helpful in creating and understanding some interactive applications like object tracking, image segmentation etc.
@code{.py}
import numpy as np
import cv2 as cv

drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to curve
ix,iy = -1,-1

# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode

    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing == True:
            if mode == True:
                cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
            else:
                cv.circle(img,(x,y),5,(0,0,255),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False
        if mode == True:
            cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
        else:
            cv.circle(img,(x,y),5,(0,0,255),-1)
@endcode
Next we have to bind this mouse callback function to the OpenCV window. In the main loop, we should set a keyboard binding for the key 'm' to toggle between rectangle and circle.
@code{.py}
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break

cv.destroyAllWindows()
@endcode

## Exercises

\-# In our last example, we drew a filled rectangle. Modify the code to draw an unfilled rectangle.

## [Py Table Of Contents Gui](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_table_of_contents_gui/)

# Gui Features in OpenCV {#tutorial\_py\_table\_of\_contents\_gui}

-   @ref tutorial\_display\_image
    
    Learn to load an image, display it, and save it back
    
-   @subpage tutorial\_py\_video\_display
    
    Learn to play videos, capture videos from a camera, and write videos
    
-   @subpage tutorial\_py\_drawing\_functions
    
    Learn to draw lines, rectangles, ellipses, circles, etc with OpenCV
    
-   @subpage tutorial\_py\_mouse\_handling
    
    Draw stuff with your mouse
    
-   @subpage tutorial\_py\_trackbar
    
    Create trackbar to control certain parameters

## [Py Trackbar](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_trackbar/py_trackbar/)

# Trackbar as the Color Palette {#tutorial\_py\_trackbar}

## Goal

-   Learn to bind trackbar to OpenCV windows
-   You will learn these functions : **cv.getTrackbarPos()**, **cv.createTrackbar()** etc.

## Code Demo

Here we will create a simple application which shows the color you specify. You have a window which shows the color and three trackbars to specify each of B,G,R colors. You slide the trackbar and correspondingly window color changes. By default, initial color will be set to Black.

For cv.createTrackbar() function, first argument is the trackbar name, second one is the window name to which it is attached, third argument is the default value, fourth one is the maximum value and fifth one is the callback function which is executed every time trackbar value changes. The callback function always has a default argument which is the trackbar position. In our case, function does nothing, so we simply pass.

Another important application of a trackbar is to use it as a button or switch. OpenCV, by default, doesn't have button functionality, so you can use a trackbar to get it. In our application, we have created one switch: the application works only if the switch is ON, otherwise the screen is always black.
@code{.py}
import numpy as np
import cv2 as cv

def nothing(x):
    pass

# Create a black image, a window
img = np.zeros((300,512,3), np.uint8)
cv.namedWindow('image')

# create trackbars for color change
cv.createTrackbar('R','image',0,255,nothing)
cv.createTrackbar('G','image',0,255,nothing)
cv.createTrackbar('B','image',0,255,nothing)

# create switch for ON/OFF functionality
switch = '0 : OFF \n1 : ON'
cv.createTrackbar(switch, 'image',0,1,nothing)

while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == 27:
        break

    # get current positions of four trackbars
    r = cv.getTrackbarPos('R','image')
    g = cv.getTrackbarPos('G','image')
    b = cv.getTrackbarPos('B','image')
    s = cv.getTrackbarPos(switch,'image')

    if s == 0:
        img[:] = 0
    else:
        img[:] = [b,g,r]

cv.destroyAllWindows()
@endcode
The screenshot of the application looks like below :

## Exercises

\-# Create a Paint application with adjustable colors and brush radius using trackbars. For drawing, refer to the previous tutorial on mouse handling.

## [Py Video Display](https://docharvest.github.io/docs/opencv5/py_tutorials/py_gui/py_video_display/py_video_display/)

# Getting Started with Videos {#tutorial\_py\_video\_display}

## Goal

-   Learn to read video, display video, and save video.
-   Learn to capture video from a camera and display it.
-   You will learn these functions : **cv.VideoCapture()**, **cv.VideoWriter()**

## Capture Video from Camera

Often, we have to capture live stream with a camera. OpenCV provides a very simple interface to do this. Let's capture a video from the camera (I am using the built-in webcam on my laptop), convert it into grayscale video and display it. Just a simple task to get started.

To capture a video, you need to create a **VideoCapture** object. Its argument can be either the device index or the name of a video file. A device index is just the number that specifies which camera to use. Normally one camera will be connected (as in my case), so I simply pass 0 (or -1). You can select the second camera by passing 1 and so on. After that, you can capture frame-by-frame. But at the end, don't forget to release the capture.
@code{.py}
import numpy as np
import cv2 as cv

cap = cv.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()
while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    # if frame is read correctly ret is True
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    # Our operations on the frame come here
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    # Display the resulting frame
    cv.imshow('frame', gray)
    if cv.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv.destroyAllWindows()
@endcode
`cap.read()` returns a bool (`True`/`False`) together with the frame. If the frame is read correctly, the bool is `True`. So you can check for the end of the video by checking this returned value.

Sometimes, cap may not have initialized the capture. In that case, this code shows an error. You can check whether it is initialized or not by the method **cap.isOpened()**. If it is `True`, OK. Otherwise open it using **cap.open()**.

You can also access some of the features of this video using the **cap.get(propId)** method, where propId is a property identifier (one of the `cv.CAP_PROP_*` constants). Each identifier denotes a property of the video (if it is applicable to that video). Full details can be seen here: cv::VideoCapture::get(). Some of these values can be modified using **cap.set(propId, value)**, where value is the new value you want.

For example, I can check the frame width and height by `cap.get(cv.CAP_PROP_FRAME_WIDTH)` and `cap.get(cv.CAP_PROP_FRAME_HEIGHT)`. It gives me 640x480 by default. But I want to modify it to 320x240. Just use `ret = cap.set(cv.CAP_PROP_FRAME_WIDTH,320)` and `ret = cap.set(cv.CAP_PROP_FRAME_HEIGHT,240)`.

@note If you are getting an error, make sure your camera is working fine using any other camera application (like Cheese in Linux).

## Playing Video from file

Playing video from a file is the same as capturing it from a camera; just replace the camera index with a video file name. Also, while displaying the frame, use an appropriate time for `cv.waitKey()`. If it is too small, the video will play very fast, and if it is too high, the video will be slow (well, that is how you can display videos in slow motion). 25 milliseconds will be OK in normal cases.
@code{.py}
import numpy as np
import cv2 as cv

cap = cv.VideoCapture('vtest.avi')

while cap.isOpened():
    ret, frame = cap.read()

    # if frame is read correctly ret is True
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)

    cv.imshow('frame', gray)
    if cv.waitKey(1) == ord('q'):
        break

cap.release()
cv.destroyAllWindows()
@endcode

@note Make sure a proper version of ffmpeg or gstreamer is installed. Sometimes it is a headache to work with video capture, mostly due to wrong installation of ffmpeg/gstreamer.

## Saving a Video

So we capture a video and process it frame-by-frame, and we want to save that video. For images, it is very simple: just use `cv.imwrite()`. Here, a little more work is required.

This time we create a **VideoWriter** object. We should specify the output file name (eg: output.avi). Then we should specify the **FourCC** code (details in the next paragraph). Then the number of frames per second (fps) and the frame size should be passed. The last one is the **isColor** flag. If it is `True`, the encoder expects color frames; otherwise it works with grayscale frames.

[FourCC](http://en.wikipedia.org/wiki/FourCC) is a 4-byte code used to specify the video codec. The list of available codes can be found in [fourcc.org](https://fourcc.org/codecs.php). It is platform dependent. The following codecs work fine for me.

-   In Fedora: DIVX, XVID, MJPG, X264, WMV1, WMV2. (XVID is preferable. MJPG results in large video files. X264 gives very small video files.)
-   In Windows: DIVX (More to be tested and added)
-   In OSX: MJPG (.mp4), DIVX (.avi), X264 (.mkv).

FourCC code is passed as `cv.VideoWriter_fourcc('M','J','P','G')` or `cv.VideoWriter_fourcc(*'MJPG')` for MJPG.
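To make the "4-byte code" concrete, here is a plain-Python sketch of how four characters are packed into a single integer, first character in the lowest byte. The helper name `fourcc` is hypothetical; it only mirrors what `cv.VideoWriter_fourcc` is understood to compute and does not call OpenCV.

```python
def fourcc(c1, c2, c3, c4):
    """Pack four characters into one 32-bit integer, first character lowest."""
    return ((ord(c1) & 255)
            | ((ord(c2) & 255) << 8)
            | ((ord(c3) & 255) << 16)
            | ((ord(c4) & 255) << 24))

code = fourcc(*'MJPG')

# Unpack again to recover the four characters
chars = ''.join(chr((code >> (8 * i)) & 255) for i in range(4))
```

The same unpacking step is handy for decoding the value returned by `cap.get(cv.CAP_PROP_FOURCC)` after converting it to `int`.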

The below code captures from a camera, flips every frame in the vertical direction, and saves the video.
@code{.py}
import numpy as np
import cv2 as cv

cap = cv.VideoCapture(0)

# Define the codec and create VideoWriter object
fourcc = cv.VideoWriter_fourcc(*'XVID')
out = cv.VideoWriter('output.avi', fourcc, 20.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    frame = cv.flip(frame, 0)

    # write the flipped frame
    out.write(frame)

    cv.imshow('frame', frame)
    if cv.waitKey(1) == ord('q'):
        break

# Release everything if job is finished
cap.release()
out.release()
cv.destroyAllWindows()
@endcode

## [Py Canny](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_canny/py_canny/)

# Canny Edge Detection {#tutorial\_py\_canny}

## Goal

In this chapter, we will learn about

-   Concept of Canny edge detection
-   OpenCV functions for that : **cv.Canny()**

## Theory

Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in 1986. It is a multi-stage algorithm and we will go through each stage.

\-# **Noise Reduction**

Since edge detection is susceptible to noise in the image, the first step is to remove the noise in the image with a 5x5 Gaussian filter. We have already seen this in previous chapters.

\-# **Finding Intensity Gradient of the Image**

The smoothed image is then filtered with a Sobel kernel in both the horizontal and vertical directions to get the first derivative in the horizontal direction (\f$G_x\f$) and the vertical direction (\f$G_y\f$). From these two images, we can find the edge gradient and direction for each pixel as follows:

\f[
Edge\_Gradient \; (G) = \sqrt{G_x^2 + G_y^2} \\
Angle \; (\theta) = \tan^{-1} \bigg(\frac{G_y}{G_x}\bigg)
\f]

Gradient direction is always perpendicular to edges. It is rounded to one of four angles representing vertical, horizontal and two diagonal directions.

\-# **Non-maximum Suppression**

After getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted pixels which may not constitute the edge. For this, every pixel is checked to see whether it is a local maximum in its neighborhood in the direction of the gradient. Check the image below:

![image](images/nms.jpg)

Point A is on the edge (in the vertical direction). The gradient direction is normal to the edge. Points B and C lie along the gradient direction. So point A is checked against points B and C to see if it forms a local maximum. If so, it is considered for the next stage; otherwise, it is suppressed (set to zero).

In short, the result you get is a binary image with "thin edges".

\-# **Hysteresis Thresholding**

This stage decides which edges are really edges and which are not. For this, we need two threshold values, minVal and maxVal. Any edges with an intensity gradient above maxVal are sure to be edges, and those below minVal are sure to be non-edges, so they are discarded. Those that lie between these two thresholds are classified as edges or non-edges based on their connectivity. If they are connected to "sure-edge" pixels, they are considered to be part of edges. Otherwise, they are also discarded. See the image below:

![image](images/hysteresis.jpg)

Edge A is above maxVal, so it is considered a "sure-edge". Although edge C is below maxVal, it is connected to edge A, so it is also considered a valid edge and we get the full curve. But edge B, although it is above minVal and is in the same region as edge C, is not connected to any "sure-edge", so it is discarded. So it is very important to select minVal and maxVal appropriately to get the correct result.

This stage also removes small pixel noise on the assumption that edges are long lines.

So what we finally get is strong edges in the image.
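Two of the stages above are easy to sketch in isolation. The code below is an illustration only, not OpenCV's implementation: `gradient` and `hysteresis` are hypothetical helper names, the Sobel responses and the one-row magnitude image are made-up values, and 8-connectivity is an assumption.

```python
import math
from collections import deque
import numpy as np

def gradient(gx, gy):
    """Edge gradient magnitude, and direction rounded to 0, 45, 90 or 135 degrees."""
    magnitude = math.hypot(gx, gy)
    angle = math.degrees(math.atan2(gy, gx)) % 180
    return magnitude, (round(angle / 45) % 4) * 45

def hysteresis(mag, min_val, max_val):
    """Keep strong pixels plus weak pixels 8-connected to a strong one."""
    strong = mag > max_val
    weak = (mag >= min_val) & ~strong
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    rows, cols = mag.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and weak[rr, cc] and not keep[rr, cc]:
                    keep[rr, cc] = True      # weak pixel reached from a sure edge
                    queue.append((rr, cc))
    return keep

magnitude, direction = gradient(3.0, 4.0)

# One row of gradient magnitudes: a strong pixel, two attached weak ones,
# and an isolated weak pixel that should be discarded
row = np.array([[0, 250, 150, 150, 0, 0, 150, 0]], dtype=float)
edges = hysteresis(row, 100, 200)
```

The isolated value of 150 plays the role of edge B in the description: it is above minVal but never touches a sure edge, so it is dropped.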

## Canny Edge Detection in OpenCV

OpenCV puts all the above in a single function, **cv.Canny()**. We will see how to use it. The first argument is our input image. The second and third arguments are our minVal and maxVal respectively. The fourth argument is aperture\_size. It is the size of the Sobel kernel used to find image gradients. By default it is 3. The last argument is L2gradient, which specifies the equation for finding the gradient magnitude. If it is True, it uses the equation mentioned above, which is more accurate; otherwise it uses this function: \\f$Edge\_Gradient \\; (G) = |G\_x| + |G\_y|\\f$. By default, it is False.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
edges = cv.Canny(img,100,200)

plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])

plt.show()
@endcode
See the result below:

## Additional Resources

\-# Canny edge detector at [Wikipedia](http://en.wikipedia.org/wiki/Canny_edge_detector)

\-# [Canny Edge Detection Tutorial](http://dasl.unlv.edu/daslDrexel/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html) by Bill Green, 2002.

## Exercises

\-# Write a small application that runs Canny edge detection with threshold values that can be varied using two trackbars. This way, you can understand the effect of the threshold values.

## [Py Colorspaces](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces/)

# Changing Colorspaces {#tutorial\_py\_colorspaces}

## Goal

-   In this tutorial, you will learn how to convert images from one color-space to another, like BGR \\f$\\leftrightarrow\\f$ Gray, BGR \\f$\\leftrightarrow\\f$ HSV, etc.
-   In addition to that, we will create an application to extract a colored object in a video
-   You will learn the following functions: **cv.cvtColor()**, **cv.inRange()**, etc.

## Changing Color-space

There are more than 150 color-space conversion methods available in OpenCV. But we will look into only two, which are most widely used ones: BGR \\f$\\leftrightarrow\\f$ Gray and BGR \\f$\\leftrightarrow\\f$ HSV.

For color conversion, we use the function cv.cvtColor(input\_image, flag) where flag determines the type of conversion.

For BGR \\f$\\rightarrow\\f$ Gray conversion, we use the flag cv.COLOR\_BGR2GRAY. Similarly for BGR \\f$\\rightarrow\\f$ HSV, we use the flag cv.COLOR\_BGR2HSV. To get other flags, just run the following commands in your Python terminal:
@code{.py}
>>> import cv2 as cv
>>> flags = [i for i in dir(cv) if i.startswith('COLOR_')]
>>> print( flags )
@endcode

@note For HSV, the hue range is \[0,179\], the saturation range is \[0,255\], and the value range is \[0,255\]. Different software uses different scales. So if you are comparing OpenCV values with them, you need to normalize these ranges.
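The range normalization mentioned in the note can be sketched with the standard library alone. The helper `to_opencv_hsv` is a hypothetical name; it takes RGB (not BGR) input, and the rounding is an assumption, so results may differ from `cv.cvtColor()` by a unit for some colors.

```python
import colorsys

def to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to HSV on the 8-bit scale: H in 0..179, S and V in 0..255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Hue in degrees (0..360) is halved so that it fits into one byte
    return round(h * 180) % 180, round(s * 255), round(v * 255)

green = to_opencv_hsv(0, 255, 0)
blue = to_opencv_hsv(0, 0, 255)
```

A tool that reports hue in degrees would show 120 for pure green, which corresponds to 60 on this scale.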

## Object Tracking

Now that we know how to convert a BGR image to HSV, we can use this to extract a colored object. In HSV, it is easier to represent a color than in BGR color-space. In our application, we will try to extract a blue colored object. So here is the method:

-   Take each frame of the video
-   Convert from BGR to HSV color-space
-   We threshold the HSV image for a range of blue color
-   Now extract the blue object alone, we can do whatever we want on that image.

Below is the code which is commented in detail:
@code{.py}
import cv2 as cv
import numpy as np

cap = cv.VideoCapture(0)

while(1):

    # Take each frame
    _, frame = cap.read()

    # Convert BGR to HSV
    hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)

    # define range of blue color in HSV
    lower_blue = np.array([110,50,50])
    upper_blue = np.array([130,255,255])

    # Threshold the HSV image to get only blue colors
    mask = cv.inRange(hsv, lower_blue, upper_blue)

    # Bitwise-AND mask and original image
    res = cv.bitwise_and(frame,frame, mask= mask)

    cv.imshow('frame',frame)
    cv.imshow('mask',mask)
    cv.imshow('res',res)
    k = cv.waitKey(5) & 0xFF
    if k == 27:
        break

cv.destroyAllWindows()
@endcode
Below image shows tracking of the blue object:

@note There is some noise in the image. We will see how to remove it in later chapters.

@note This is the simplest method in object tracking. Once you learn functions of contours, you can do plenty of things like find the centroid of an object and use it to track the object, draw diagrams just by moving your hand in front of a camera, and other fun stuff.

## How to find HSV values to track?

This is a common question on [stackoverflow.com](http://www.stackoverflow.com). It is very simple and you can use the same function, cv.cvtColor(). Instead of passing an image, you just pass the BGR values you want. For example, to find the HSV value of green, try the following commands in a Python terminal:
@code{.py}
>>> green = np.uint8([[[0,255,0 ]]])
>>> hsv_green = cv.cvtColor(green,cv.COLOR_BGR2HSV)
>>> print( hsv_green )
[[[ 60 255 255]]]
@endcode
Now you take \[H-10, 100,100\] and \[H+10, 255, 255\] as the lower bound and upper bound respectively. Apart from this method, you can use any image editing tool like GIMP or any online converter to find these values, but don't forget to adjust the HSV ranges.
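The bound rule above is simple enough to wrap in a helper. This is a minimal sketch: `hue_bounds` and its default parameters are hypothetical, and clamping to 0..179 is an assumption about how to handle hues near the ends of the range.

```python
def hue_bounds(h, delta=10, s_min=100, v_min=100):
    """Return (lower, upper) HSV bounds around hue h, clamped to the 0..179 hue range."""
    lower = (max(h - delta, 0), s_min, v_min)
    upper = (min(h + delta, 179), 255, 255)
    return lower, upper

lower_green, upper_green = hue_bounds(60)
```

The tuples can be wrapped in `np.array(...)` before being passed to `cv.inRange()`. Colors whose hue sits near 0, such as red, wrap around the scale, so they would likely need two ranges rather than one clamped range.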

## Exercises

\-# Try to find a way to extract more than one colored object, for example, extract red, blue, and green objects simultaneously.

## [Py Contour Features](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_contour_features/py_contour_features/)

# Contour Features {#tutorial\_py\_contour\_features}

@prev\_tutorial{tutorial\_py\_contours\_begin} @next\_tutorial{tutorial\_py\_contour\_properties}

## Goal

In this article, we will learn

-   To find the different features of contours, like area, perimeter, centroid, bounding box etc
-   You will see plenty of functions related to contours.

## 1. Moments

Image moments help you to calculate some features like the center of mass of the object, the area of the object etc. Check out the Wikipedia page on [Image Moments](http://en.wikipedia.org/wiki/Image_moment).

The function **cv.moments()** gives a dictionary of all moment values calculated. See below:
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('star.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
ret,thresh = cv.threshold(img,127,255,cv.THRESH_BINARY)
contours,hierarchy = cv.findContours(thresh, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)

cnt = contours[0]
M = cv.moments(cnt)
print( M )
@endcode
From these moments, you can extract useful data like area, centroid etc. The centroid is given by the relations \\f$C\_x = \\frac{M\_{10}}{M\_{00}}\\f$ and \\f$C\_y = \\frac{M\_{01}}{M\_{00}}\\f$. This can be done as follows:
@code{.py}
cx = int(M['m10']/M['m00'])
cy = int(M['m01']/M['m00'])
@endcode
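To see where the centroid relations come from, the sketch below computes the three moments involved for a polygon directly from its vertices. `polygon_moments` is a hypothetical helper written for this illustration, and the rectangle is made-up data; it does not call OpenCV.

```python
def polygon_moments(points):
    """Return (m00, m10, m01) of a simple polygon given as (x, y) vertices."""
    m00 = m10 = m01 = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0          # twice the signed triangle area
        m00 += cross
        m10 += (x0 + x1) * cross
        m01 += (y0 + y1) * cross
    return m00 / 2.0, m10 / 6.0, m01 / 6.0

rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
m00, m10, m01 = polygon_moments(rectangle)
cx, cy = m10 / m00, m01 / m00
```

A degenerate contour (a single point or a straight line) has zero area, so `m00` is 0 and the division fails; real code would probably want to guard against that before computing the centroid.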

## 2. Contour Area

Contour area is given by the function **cv.contourArea()** or from moments, **M\['m00'\]**.
@code{.py}
area = cv.contourArea(cnt)
@endcode

## 3. Contour Perimeter

It is also called arc length. It can be found using the **cv.arcLength()** function. The second argument specifies whether the shape is a closed contour (if passed True), or just a curve.
@code{.py}
perimeter = cv.arcLength(cnt,True)
@endcode

## 4. Contour Approximation

It approximates a contour shape to another shape with fewer vertices, depending upon the precision we specify. It is an implementation of the [Douglas-Peucker algorithm](http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm). Check the Wikipedia page for the algorithm and a demonstration.

To understand this, suppose you are trying to find a square in an image, but due to some problems in the image, you didn't get a perfect square, but a "bad shape" (as shown in the first image below). Now you can use this function to approximate the shape. The second argument is called epsilon, which is the maximum distance from the contour to the approximated contour. It is an accuracy parameter. A wise selection of epsilon is needed to get the correct output.
@code{.py}
epsilon = 0.1*cv.arcLength(cnt,True)
approx = cv.approxPolyDP(cnt,epsilon,True)
@endcode
Below, in the second image, the green line shows the approximated curve for epsilon = 10% of the arc length. The third image shows the same for epsilon = 1% of the arc length. The third argument specifies whether the curve is closed or not.
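The role of epsilon is easier to see in a small pure-Python sketch of the Ramer-Douglas-Peucker idea. This is an illustration for open polylines only, with hypothetical helper names and made-up points; it is not OpenCV's implementation, which also handles closed curves.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def rdp(points, epsilon):
    """Simplify an open polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    distances = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[index - 1] > epsilon:
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right           # drop the duplicated split point
    return [points[0], points[-1]]

# An L-shaped path with a small wobble on both arms
wobbly = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0),
          (4.1, 1), (4, 2), (4.1, 3), (4, 4)]
coarse = rdp(wobbly, 0.5)
fine = rdp(wobbly, 0.05)
```

With the larger epsilon the wobble is absorbed and only the three corner points remain; the smaller epsilon keeps more of the original vertices.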

## 5. Convex Hull

Convex Hull will look similar to contour approximation, but it is not (both may provide the same results in some cases). Here, the **cv.convexHull()** function checks a curve for convexity defects and corrects it. Generally speaking, convex curves are curves which are always bulged out, or at least flat. If a curve is bulged inside, that is called a convexity defect. For example, check the below image of a hand. The red line shows the convex hull of the hand. The double-sided arrow marks show the convexity defects, which are the local maximum deviations of the hull from the contour.

There are a few things to discuss about its syntax:
@code{.py}
hull = cv.convexHull(points[, hull[, clockwise[, returnPoints]]])
@endcode
Argument details:

-   **points** are the contours we pass into.
-   **hull** is the output, normally we avoid it.
-   **clockwise** : Orientation flag. If it is True, the output convex hull is oriented clockwise. Otherwise, it is oriented counter-clockwise.
-   **returnPoints** : By default, True. Then it returns the coordinates of the hull points. If False, it returns the indices of contour points corresponding to the hull points.

So to get a convex hull as in the above image, the following is sufficient:
@code{.py}
hull = cv.convexHull(cnt)
@endcode
But if you want to find convexity defects, you need to pass returnPoints = False. To understand it, we will take the rectangle image above. First I found its contour as cnt. Then I found its convex hull with returnPoints = True and got the following values: \[\[\[234 202\]\], \[\[ 51 202\]\], \[\[ 51 79\]\], \[\[234 79\]\]\], which are the four corner points of the rectangle. If I do the same with returnPoints = False, I get the following result: \[\[129\],\[ 67\],\[ 0\],\[142\]\]. These are the indices of the corresponding points in the contour. For example, check the first value: cnt\[129\] = \[\[234, 202\]\], which is the same as the first result (and so on for the others).

You will see it again when we discuss convexity defects.

## 6. Checking Convexity

There is a function to check if a curve is convex or not, **cv.isContourConvex()**. It just returns True or False.
@code{.py}
k = cv.isContourConvex(cnt)
@endcode

## 7. Bounding Rectangle

There are two types of bounding rectangles.

### 7.a. Straight Bounding Rectangle

It is a straight rectangle; it doesn't consider the rotation of the object. So the area of the bounding rectangle won't be minimum. It is found by the function **cv.boundingRect()**.

Let (x,y) be the top-left coordinate of the rectangle and (w,h) be its width and height.
@code{.py}
x,y,w,h = cv.boundingRect(cnt)
cv.rectangle(img,(x,y),(x+w,y+h),(0,255,0),2)
@endcode
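The straight bounding rectangle is just the minimum and maximum of the contour coordinates. The NumPy sketch below reproduces that idea on a made-up contour; the `+ 1` reflects the assumption that, for integer pixel coordinates, both extreme pixels count as part of the box.

```python
import numpy as np

# Hypothetical contour in the (N, 1, 2) layout of (x, y) points
cnt = np.array([[[2, 3]], [[8, 3]], [[8, 7]], [[5, 9]], [[2, 7]]], np.int32)

xs, ys = cnt[:, 0, 0], cnt[:, 0, 1]
x, y = int(xs.min()), int(ys.min())
# +1 because both extreme pixels lie inside the box
w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
```

Here the box starts at (2, 3) and spans 7 pixels in each direction.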

### 7.b. Rotated Rectangle

Here, the bounding rectangle is drawn with minimum area, so it considers the rotation also. The function used is **cv.minAreaRect()**. It returns a Box2D structure which contains the following details - ( center (x,y), (width, height), angle of rotation ). But to draw this rectangle, we need the 4 corners of the rectangle. They are obtained by the function **cv.boxPoints()**.
@code{.py}
rect = cv.minAreaRect(cnt)
box = cv.boxPoints(rect)
box = np.intp(box)  # np.int0 was removed in NumPy 2.0
cv.drawContours(img,[box],0,(0,0,255),2)
@endcode
Both the rectangles are shown in a single image. The green rectangle shows the normal bounding rect. The red rectangle is the rotated rect.

## 8. Minimum Enclosing Circle

Next we find the circumcircle of an object using the function **cv.minEnclosingCircle()**. It is a circle which completely covers the object with minimum area.
@code{.py}
(x,y),radius = cv.minEnclosingCircle(cnt)
center = (int(x),int(y))
radius = int(radius)
cv.circle(img,center,radius,(0,255,0),2)
@endcode

## 9. Fitting an Ellipse

Next one is to fit an ellipse to an object. It returns the rotated rectangle in which the ellipse is inscribed.
@code{.py}
ellipse = cv.fitEllipse(cnt)
cv.ellipse(img,ellipse,(0,255,0),2)
@endcode

## 10. Fitting a Line

Similarly we can fit a line to a set of points. The image below contains a set of white points. We can approximate a straight line to it.
@code{.py}
rows,cols = img.shape[:2]
# flatten() turns the 4x1 result into four scalars
vx,vy,x,y = cv.fitLine(cnt, cv.DIST_L2,0,0.01,0.01).flatten()
lefty = int((-x*vy/vx) + y)
righty = int(((cols-x)*vy/vx)+y)
cv.line(img,(cols-1,righty),(0,lefty),(0,255,0),2)
@endcode
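The arithmetic that turns the fitted direction vector and point into two drawable endpoints can be checked without OpenCV. The helper below is a hypothetical sketch with made-up input values; unlike the snippet above, it intersects with the column `cols - 1` exactly and refuses nearly vertical lines instead of dividing by a tiny `vx`.

```python
def line_border_points(vx, vy, x0, y0, cols):
    """Return the points where the fitted line crosses x = 0 and x = cols - 1."""
    if abs(vx) < 1e-9:
        raise ValueError("line is (nearly) vertical; intersect with the y borders instead")
    slope = vy / vx
    lefty = int(round(y0 - x0 * slope))
    righty = int(round(y0 + (cols - 1 - x0) * slope))
    return (0, lefty), (cols - 1, righty)

left, right = line_border_points(vx=2.0, vy=1.0, x0=10.0, y0=20.0, cols=101)
```

Only the ratio `vy / vx` matters here, so it makes no difference whether the direction vector is normalized.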

## [Py Contour Properties](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_contour_properties/py_contour_properties/)

# Contour Properties {#tutorial\_py\_contour\_properties}

@prev\_tutorial{tutorial\_py\_contour\_features} @next\_tutorial{tutorial\_py\_contours\_more\_functions}

Here we will learn to extract some frequently used properties of objects like Solidity, Equivalent Diameter, Mask image, Mean Intensity etc. More features can be found at [Matlab regionprops documentation](http://www.mathworks.in/help/images/ref/regionprops.html).

_(NB : Centroid, Area, Perimeter etc also belong to this category, but we have seen it in last chapter)_

## 1. Aspect Ratio

It is the ratio of width to height of the bounding rect of the object.

\\f\[Aspect \\; Ratio = \\frac{Width}{Height}\\f\]
@code{.py}
x,y,w,h = cv.boundingRect(cnt)
aspect_ratio = float(w)/h
@endcode

## 2. Extent

Extent is the ratio of contour area to bounding rectangle area.

\\f\[Extent = \\frac{Object \\; Area}{Bounding \\; Rectangle \\; Area}\\f\]
@code{.py}
area = cv.contourArea(cnt)
x,y,w,h = cv.boundingRect(cnt)
rect_area = w*h
extent = float(area)/rect_area
@endcode

## 3. Solidity

Solidity is the ratio of contour area to its convex hull area.

\\f\[Solidity = \\frac{Contour \\; Area}{Convex \\; Hull \\; Area}\\f\]
@code{.py}
area = cv.contourArea(cnt)
hull = cv.convexHull(cnt)
hull_area = cv.contourArea(hull)
solidity = float(area)/hull_area
@endcode

## 4. Equivalent Diameter

Equivalent Diameter is the diameter of the circle whose area is the same as the contour area.

\\f\[Equivalent \\; Diameter = \\sqrt{\\frac{4 \\times Contour \\; Area}{\\pi}}\\f\]
@code{.py}
area = cv.contourArea(cnt)
equi_diameter = np.sqrt(4*area/np.pi)
@endcode
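The four ratios defined so far can be checked by hand on a shape with known measurements. The numbers below are idealized, continuous values for a right triangle with legs 6 and 8, chosen for this sketch; they are not outputs of OpenCV functions.

```python
import math

# Hypothetical measurements for a right triangle with legs 6 and 8
contour_area = 24.0          # 0.5 * 6 * 8
w, h = 6, 8                  # bounding rectangle size
hull_area = 24.0             # a triangle is convex, so hull area == contour area

aspect_ratio = w / h
extent = contour_area / (w * h)
solidity = contour_area / hull_area
equi_diameter = math.sqrt(4 * contour_area / math.pi)
```

A solidity of 1.0 signals a convex shape, and an extent of 0.5 says the triangle fills half of its bounding rectangle.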

## 5. Orientation

Orientation is the angle at which the object is directed. The following method also gives the Major Axis and Minor Axis lengths.
@code{.py}
(x,y),(MA,ma),angle = cv.fitEllipse(cnt)
@endcode

## 6. Mask and Pixel Points

In some cases, we may need all the points which comprise that object. It can be done as follows:
@code{.py}
mask = np.zeros(imgray.shape,np.uint8)
cv.drawContours(mask,[cnt],0,255,-1)
pixelpoints = np.transpose(np.nonzero(mask))
#pixelpoints = cv.findNonZero(mask)
@endcode
Here, two methods are given to do the same thing: one using Numpy functions, the other using an OpenCV function (the last, commented line). The results are the same, but with a slight difference. Numpy gives coordinates in **(row, column)** format, while OpenCV gives coordinates in **(x,y)** format. So basically the answers will be interchanged. Note that **row = y** and **column = x**.
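The coordinate-order difference is easy to demonstrate on a tiny mask. The sketch below uses NumPy only, with two hand-set pixels; reversing the columns of the NumPy result yields the (x, y) order.

```python
import numpy as np

mask = np.zeros((4, 5), np.uint8)
mask[1, 3] = 255   # row 1, column 3  ->  x = 3, y = 1
mask[2, 0] = 255   # row 2, column 0  ->  x = 0, y = 2

rc_points = np.transpose(np.nonzero(mask))   # (row, column) pairs
xy_points = rc_points[:, ::-1]               # swap columns to get (x, y)
```

Keep the order in mind when feeding these points back into drawing functions, which expect (x, y).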

## 7\. Maximum Value, Minimum Value and their locations

We can find these parameters using a mask image.

@code{.py}
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(imgray,mask = mask)
@endcode

## 8\. Mean Color or Mean Intensity

Here, we can find the average color of an object, or the average intensity of the object in grayscale mode. We again use the same mask to do it.

@code{.py}
mean_val = cv.mean(im,mask = mask)
@endcode

## 9\. Extreme Points

Extreme points are the topmost, bottommost, rightmost and leftmost points of the object.

@code{.py}
leftmost = tuple(cnt[cnt[:,:,0].argmin()][0])
rightmost = tuple(cnt[cnt[:,:,0].argmax()][0])
topmost = tuple(cnt[cnt[:,:,1].argmin()][0])
bottommost = tuple(cnt[cnt[:,:,1].argmax()][0])
@endcode

For example, if I apply it to a map of India, I get the following result:
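The indexing used for the extreme points boils down to picking the point with the smallest or largest x or y. A minimal sketch of that logic in plain Python, assuming a hypothetical contour given as a list of (x, y) tuples (a real contour from cv.findContours has shape (N, 1, 2) but holds the same coordinates):

```python
# Hypothetical contour points as (x, y) tuples.
points = [(30, 10), (55, 25), (40, 60), (12, 35), (20, 15)]

leftmost = min(points, key=lambda p: p[0])    # smallest x
rightmost = max(points, key=lambda p: p[0])   # largest x
topmost = min(points, key=lambda p: p[1])     # smallest y (image y grows downward)
bottommost = max(points, key=lambda p: p[1])  # largest y

print(leftmost, rightmost, topmost, bottommost)
```

Because the image origin is at the top-left corner, "topmost" corresponds to the minimum y value, not the maximum.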

## Exercises

\-# There are still some features left in the MATLAB regionprops documentation. Try to implement them.

## [Py Contours Begin](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin/)

# Contours : Getting Started {#tutorial\_py\_contours\_begin}

@next\_tutorial{tutorial\_py\_contour\_features}

## Goal

-   Understand what contours are.
-   Learn to find contours, draw contours etc
-   You will see these functions : **cv.findContours()**, **cv.drawContours()**

## What are contours?

A contour can be explained simply as a curve joining all the continuous points (along a boundary) that have the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.

-   For better accuracy, use binary images. So before finding contours, apply thresholding or Canny edge detection.
-   Since OpenCV 3.2, findContours() no longer modifies the source image.
-   In OpenCV, finding contours is like finding a white object on a black background. So remember, the object to be found should be white and the background should be black.

Let's see how to find the contours of a binary image:

@code{.py}
import numpy as np
import cv2 as cv

im = cv.imread('test.jpg')
assert im is not None, "file could not be read, check with os.path.exists()"
imgray = cv.cvtColor(im, cv.COLOR_BGR2GRAY)
ret, thresh = cv.threshold(imgray, 127, 255, 0)
contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
@endcode

The **cv.findContours()** function takes three arguments: the first is the source image, the second is the contour retrieval mode, and the third is the contour approximation method. It outputs the contours and the hierarchy. Contours is a Python list of all the contours in the image. Each individual contour is a Numpy array of (x,y) coordinates of boundary points of the object.

@note We will discuss the second and third arguments and the hierarchy in detail later. Until then, the values given to them in the code sample will work fine for all images.

## How to draw the contours?

To draw the contours, the cv.drawContours function is used. It can also be used to draw any shape, provided you have its boundary points. Its first argument is the source image, the second argument is the contours, which should be passed as a Python list, the third argument is the index of the contour to draw (useful when drawing an individual contour; to draw all contours, pass -1), and the remaining arguments are color, thickness etc.

-   To draw all the contours in an image:
    @code{.py}
    cv.drawContours(img, contours, -1, (0,255,0), 3)
    @endcode
-   To draw an individual contour, say the 4th contour:
    @code{.py}
    cv.drawContours(img, contours, 3, (0,255,0), 3)
    @endcode
-   But most of the time, the method below will be useful:
    @code{.py}
    cnt = contours[3]
    cv.drawContours(img, [cnt], 0, (0,255,0), 3)
    @endcode

@note The last two methods are the same, but as you go forward, you will see that the last one is more useful.

# Contour Approximation Method

This is the third argument in the cv.findContours function. What does it actually denote?

Above, we said that contours are the boundaries of a shape with the same intensity. A contour stores the (x,y) coordinates of the boundary of a shape. But does it store all the coordinates? That is specified by this contour approximation method.

If you pass cv.CHAIN\_APPROX\_NONE, all the boundary points are stored. But do we actually need all the points? For example, suppose you found the contour of a straight line. Do you need all the points on the line to represent that line? No, we need just the two end points of that line. This is what cv.CHAIN\_APPROX\_SIMPLE does. It removes all redundant points and compresses the contour, thereby saving memory.

The image of a rectangle below demonstrates this technique. A circle is drawn on every coordinate in the contour array (drawn in blue color). The first image shows the points I got with cv.CHAIN\_APPROX\_NONE (734 points) and the second image shows the ones with cv.CHAIN\_APPROX\_SIMPLE (only 4 points). See how much memory it saves!
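The idea behind the compression can be sketched in plain Python. This is not OpenCV's implementation, only an illustration under simple assumptions: the boundary is an ordered list of pixels with unit steps, and a point is dropped whenever the direction of travel does not change at it. The helper names `rectangle_boundary` and `drop_collinear` are hypothetical.

```python
def rectangle_boundary(x0, y0, x1, y1):
    """All boundary pixels of an axis-aligned rectangle, in traversal order."""
    top = [(x, y0) for x in range(x0, x1)]
    right = [(x1, y) for y in range(y0, y1)]
    bottom = [(x, y1) for x in range(x1, x0, -1)]
    left = [(x0, y) for y in range(y1, y0, -1)]
    return top + right + bottom + left

def drop_collinear(points):
    """Keep only the points where the direction of travel changes."""
    kept = []
    n = len(points)
    for i, (x, y) in enumerate(points):
        px, py = points[i - 1]           # previous point (wraps around)
        nx, ny = points[(i + 1) % n]     # next point (wraps around)
        if (x - px, y - py) != (nx - x, ny - y):
            kept.append((x, y))
    return kept

full = rectangle_boundary(0, 0, 9, 4)    # every boundary pixel
simple = drop_collinear(full)            # only the corners survive
print(len(full), len(simple))
print(simple)
```

For this 10x5 pixel rectangle the full boundary has 26 points while the compressed version keeps only the 4 corners, which mirrors the 734 versus 4 points mentioned above.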

## [Py Contours Hierarchy](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_contours_hierarchy/py_contours_hierarchy/)

# Contours Hierarchy {#tutorial\_py\_contours\_hierarchy}

@prev\_tutorial{tutorial\_py\_contours\_more\_functions}

## Goal

This time, we learn about the hierarchy of contours, i.e. the parent-child relationship in Contours.

## Theory

In the last few articles on contours, we have worked with several contour-related functions provided by OpenCV. But when we found the contours in an image using the **cv.findContours()** function, we passed an argument, the **Contour Retrieval Mode**. We usually passed **cv.RETR\_LIST** or **cv.RETR\_TREE** and it worked well. But what does it actually mean?

Also, in the output, we got two values: the first is our contours, and the second is an output which we named **hierarchy** (please check the code in the previous articles). But we never used this hierarchy anywhere. So what is this hierarchy and what is it for? What is its relationship with the function argument mentioned above?

That is what we are going to deal with in this article.

### What is Hierarchy?

Normally we use the **cv.findContours()** function to detect objects in an image, right? Sometimes objects are in different locations. But in some cases, some shapes are inside other shapes, just like nested figures. In this case, we call the outer one the **parent** and the inner one the **child**. This way, the contours in an image have some relationship to each other. And we can specify how one contour is connected to another, for example whether it is a child of some other contour, or a parent, etc. The representation of this relationship is called the **Hierarchy**.

Consider an example image below :

In this image, there are a few shapes which I have numbered from **0-5**. _2 and 2a_ denotes the external and internal contours of the outermost box.

Here, contours 0,1,2 are **external or outermost**. We can say they are in **hierarchy-0**, or simply that they are in the **same hierarchy level**.

Next comes **contour-2a**. It can be considered a **child of contour-2** (or, the other way round, contour-2 is the parent of contour-2a). So let it be in **hierarchy-1**. Similarly contour-3 is a child of contour-2 and it comes in the next hierarchy. Finally contours 4,5 are the children of contour-3a, and they come in the last hierarchy level. From the way I numbered the boxes, I would say contour-4 is the first child of contour-3a (it could also be contour-5).

I mentioned these things to explain terms like **same hierarchy level**, **external contour**, **child contour**, **parent contour**, **first child** etc. Now let's get into OpenCV.

### Hierarchy Representation in OpenCV

So each contour has its own information about which hierarchy level it is in, which contour is its child, which is its parent, etc. OpenCV represents it as an array of four values: **\[Next, Previous, First\_Child, Parent\]**

_"Next denotes the next contour at the same hierarchical level."_

For example, take contour-0 in our picture. Which is the next contour at its level? It is contour-1. So simply put Next = 1. Similarly for contour-1, next is contour-2. So Next = 2.

What about contour-2? There is no next contour at the same level. So simply put Next = -1. What about contour-4? It is at the same level as contour-5. So its next contour is contour-5, so Next = 5.

_"Previous denotes the previous contour at the same hierarchical level."_

It is the same as above. The previous contour of contour-1 is contour-0 at the same level. Similarly for contour-2, it is contour-1. And for contour-0, there is no previous contour, so put it as -1.

_"First\_Child denotes its first child contour."_

This needs little explanation. For contour-2, the child is contour-2a. So it gets the corresponding index value of contour-2a. What about contour-3a? It has two children. But we take only the first child, which is contour-4. So First\_Child = 4 for contour-3a.

_"Parent denotes the index of its parent contour."_

It is just the opposite of **First\_Child**. For both contour-4 and contour-5, the parent contour is contour-3a. For contour-3a, it is contour-3, and so on.

@note If there is no child or parent, that field is taken as -1.

Now that we know about the hierarchy style used in OpenCV, we can look into the Contour Retrieval Modes in OpenCV with the help of the same image given above, i.e. what do flags like cv.RETR\_LIST, cv.RETR\_TREE, cv.RETR\_CCOMP, cv.RETR\_EXTERNAL etc. mean?

## Contour Retrieval Mode

### 1\. RETR\_LIST

This is the simplest of the four flags (from an explanation point of view). It simply retrieves all the contours, but doesn't create any parent-child relationship. **Parents and kids are equal under this rule, and they are just contours**, i.e. they all belong to the same hierarchy level.

So here, the 3rd and 4th terms in the hierarchy array are always -1. But obviously, the Next and Previous terms will have their corresponding values. Just check it yourself and verify it.

Below is the result I got, and each row is the hierarchy details of the corresponding contour. For example, the first row corresponds to contour 0. The next contour is contour 1, so Next = 1. There is no previous contour, so Previous = -1. And the remaining two, as said before, are -1.

@code{.py}
>>> hierarchy
array([[[ 1, -1, -1, -1],
        [ 2,  0, -1, -1],
        [ 3,  1, -1, -1],
        [ 4,  2, -1, -1],
        [ 5,  3, -1, -1],
        [ 6,  4, -1, -1],
        [ 7,  5, -1, -1],
        [-1,  6, -1, -1]]])
@endcode

This is a good choice to use in your code if you are not using any hierarchy features.

### 2\. RETR\_EXTERNAL

If you use this flag, it returns only the extreme outer contours. All child contours are left behind. **We can say, under this law, only the eldest in every family is taken care of. It doesn't care about other members of the family :)**.

So, in our image, how many extreme outer contours are there, i.e. at the hierarchy-0 level? Only 3, i.e. contours 0,1,2, right? Now try to find the contours using this flag. Here also, the values given to each element are the same as above. Compare it with the above result. Below is what I got:

@code{.py}
>>> hierarchy
array([[[ 1, -1, -1, -1],
        [ 2,  0, -1, -1],
        [-1,  1, -1, -1]]])
@endcode

You can use this flag if you want to extract only the outer contours. It might be useful in some cases.

### 3\. RETR\_CCOMP

This flag retrieves all the contours and arranges them into a 2-level hierarchy, i.e. the external contours of an object (its boundary) are placed in hierarchy-1, and the contours of holes inside the object (if any) are placed in hierarchy-2. If there is any object inside a hole, its contour is placed in hierarchy-1 again, and its hole in hierarchy-2, and so on.

Just consider the image of a "big white zero" on a black background. The outer circle of the zero belongs to the first hierarchy, and the inner circle of the zero belongs to the second hierarchy.

We can explain it with a simple image. Here I have labelled the order of contours in red and the hierarchy they belong to in green (either 1 or 2). The order is the same as the order in which OpenCV detects contours.

So consider the first contour, i.e. contour-0. It is in hierarchy-1. It has two holes, contours 1 and 2, and they belong to hierarchy-2. So for contour-0, the next contour in the same hierarchy level is contour-3, and there is no previous one. Its first child is contour-1 in hierarchy-2. It has no parent, because it is in hierarchy-1. So its hierarchy array is \[3,-1,1,-1\].

Now take contour-1. It is in hierarchy-2. The next one in the same hierarchy (under the parenthood of contour-0) is contour-2. No previous one. No child, but the parent is contour-0. So array is \[2,-1,-1,0\].

Similarly contour-2: it is in hierarchy-2. There is no next contour in the same hierarchy under contour-0. So no Next. Previous is contour-1. No child, parent is contour-0. So array is \[-1,1,-1,0\].

Contour-3: Next in hierarchy-1 is contour-5. Previous is contour-0. Child is contour-4 and no parent. So array is \[5,0,4,-1\].

Contour-4: it is in hierarchy-2 under contour-3 and it has no sibling. So no next, no previous, no child, parent is contour-3. So array is \[-1,-1,-1,3\].
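The rows worked out so far are enough to show how the fields are used in code. The sketch below is only an illustration: it stores the five rows derived above in a dictionary and lists the direct children of a contour by starting at First\_Child and then following Next. The helper name `children_of` is hypothetical; with real output from cv.findContours the rows would be read from `hierarchy[0]`.

```python
# Rows derived above, keyed by contour index:
# [Next, Previous, First_Child, Parent]
hierarchy = {
    0: [3, -1, 1, -1],
    1: [2, -1, -1, 0],
    2: [-1, 1, -1, 0],
    3: [5, 0, 4, -1],
    4: [-1, -1, -1, 3],
}

def children_of(index, hierarchy):
    """Collect direct children: start at First_Child, then follow Next."""
    children = []
    child = hierarchy[index][2]
    while child != -1:
        children.append(child)
        child = hierarchy[child][0]
    return children

print(children_of(0, hierarchy))  # holes of contour-0
print(children_of(3, hierarchy))  # holes of contour-3
print(children_of(1, hierarchy))  # a hole has no children in these rows
```

With cv.RETR\_CCOMP this gives the holes of each outer boundary, which is often all that is needed to tell solid shapes from ring-like ones.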

You can fill in the remaining ones. This is the final answer I got:

@code{.py}
>>> hierarchy
array([[[ 3, -1,  1, -1],
        [ 2, -1, -1,  0],
        [-1,  1, -1,  0],
        [ 5,  0,  4, -1],
        [-1, -1, -1,  3],
        [ 7,  3,  6, -1],
        [-1, -1, -1,  5],
        [ 8,  5, -1, -1],
        [-1,  7, -1, -1]]])
@endcode

### 4\. RETR\_TREE

And this is the final guy, Mr.Perfect. It retrieves all the contours and creates a full family hierarchy list. **It even tells, who is the grandpa, father, son, grandson and even beyond... :)**.

For example, I took the above image, rewrote the code for cv.RETR\_TREE, reordered the contours as per the result given by OpenCV and analyzed it. Again, red letters give the contour number and green letters give the hierarchy order.

Take contour-0 : It is in hierarchy-0. Next contour in same hierarchy is contour-7. No previous contours. Child is contour-1. And no parent. So array is \[7,-1,1,-1\].

Take contour-2 : It is in hierarchy-1. No contour in same level. No previous one. Child is contour-3. Parent is contour-1. So array is \[-1,-1,3,1\].
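The "family tree" can also be read off programmatically by following the Parent field until it becomes -1. The sketch below is an illustration that uses the cv.RETR\_TREE result listed at the end of this section, copied into a plain list; the helper name `ancestors_of` is hypothetical.

```python
# cv.RETR_TREE result for the example image, one row per contour:
# [Next, Previous, First_Child, Parent]
hierarchy = [
    [7, -1, 1, -1], [-1, -1, 2, 0], [-1, -1, 3, 1],
    [-1, -1, 4, 2], [-1, -1, 5, 3], [6, -1, -1, 4],
    [-1, 5, -1, 4], [8, 0, -1, -1], [-1, 7, -1, -1],
]

def ancestors_of(index, hierarchy):
    """Follow Parent links upward: parent, grandparent, and so on."""
    chain = []
    parent = hierarchy[index][3]
    while parent != -1:
        chain.append(parent)
        parent = hierarchy[parent][3]
    return chain

print(ancestors_of(2, hierarchy))  # parent first, then grandparent
print(ancestors_of(6, hierarchy))
print(ancestors_of(7, hierarchy))  # an outermost contour has no ancestors
```

The length of the returned chain is the number of contours that enclose the given one, which can be used to separate outer boundaries from deeply nested ones.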

Try the remaining ones yourself. Below is the full answer:

@code{.py}
>>> hierarchy
array([[[ 7, -1,  1, -1],
        [-1, -1,  2,  0],
        [-1, -1,  3,  1],
        [-1, -1,  4,  2],
        [-1, -1,  5,  3],
        [ 6, -1, -1,  4],
        [-1,  5, -1,  4],
        [ 8,  0, -1, -1],
        [-1,  7, -1, -1]]])
@endcode

## [Py Contours More Functions](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_contours_more_functions/py_contours_more_functions/)

# Contours : More Functions {#tutorial\_py\_contours\_more\_functions}

@prev\_tutorial{tutorial\_py\_contour\_properties} @next\_tutorial{tutorial\_py\_contours\_hierarchy}

## Goal

In this chapter, we will learn about:

-   Convexity defects and how to find them.
-   Finding the shortest distance from a point to a polygon.
-   Matching different shapes.

## Theory and Code

### 1\. Convexity Defects

We saw what a convex hull is in the second chapter about contours. Any deviation of the object from this hull can be considered a convexity defect.

OpenCV comes with a ready-made function to find this, **cv.convexityDefects()**. A basic function call would look like below:

@code{.py}
hull = cv.convexHull(cnt,returnPoints = False)
defects = cv.convexityDefects(cnt,hull)
@endcode

@note Remember we have to pass returnPoints = False while finding convex hull, in order to find convexity defects.

It returns an array where each row contains these values - **\[ start point, end point, farthest point, approximate distance to farthest point \]**. We can visualize it using an image. We draw a line joining the start point and the end point, then draw a circle at the farthest point. Remember that the first three values returned are indices into cnt, so we have to look up the actual points in cnt.
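The fourth value deserves a short note. According to the OpenCV reference for convexityDefects, it is a fixed-point number with 8 fractional bits, so dividing it by 256.0 gives the depth in pixels. The sketch below illustrates that conversion on hypothetical rows (the numbers are made up, not taken from a real contour) and keeps only the defects deeper than an arbitrary threshold.

```python
# Hypothetical rows in the layout returned by cv.convexityDefects:
# [start_index, end_index, farthest_index, fixpt_depth]
defects = [
    [10, 42, 27, 5120],
    [42, 80, 61, 384],
    [80, 120, 99, 12800],
]

def depth_in_pixels(fixpt_depth):
    """Convert the fixed-point depth (8 fractional bits) to pixels."""
    return fixpt_depth / 256.0

# Keep only defects deeper than a chosen threshold (10 px here, arbitrary).
deep = [row for row in defects if depth_in_pixels(row[3]) > 10.0]

for s, e, f, d in defects:
    print(s, e, f, depth_in_pixels(d))
```

Filtering by depth like this is a common way to ignore tiny defects caused by noise along the contour.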

@code{.py}
import cv2 as cv
import numpy as np

img = cv.imread('star.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
img_gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
ret,thresh = cv.threshold(img_gray, 127, 255,0)
contours,hierarchy = cv.findContours(thresh,2,1)  # 2 = cv.RETR_CCOMP, 1 = cv.CHAIN_APPROX_NONE
cnt = contours[0]

hull = cv.convexHull(cnt,returnPoints = False)
defects = cv.convexityDefects(cnt,hull)

for i in range(defects.shape[0]):
    s,e,f,d = defects[i,0]
    start = tuple(cnt[s][0])
    end = tuple(cnt[e][0])
    far = tuple(cnt[f][0])
    cv.line(img,start,end,[0,255,0],2)
    cv.circle(img,far,5,[0,0,255],-1)

cv.imshow('img',img)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode

And see the result:

### 2\. Point Polygon Test

This function finds the shortest distance between a point in the image and a contour. The returned distance is negative when the point is outside the contour, positive when the point is inside, and zero if the point is on the contour.

For example, we can check the point (50,50) as follows:

@code{.py}
dist = cv.pointPolygonTest(cnt,(50,50),True)
@endcode

In the function, the third argument is measureDist. If it is True, it finds the signed distance. If False, it finds whether the point is inside, outside, or on the contour (it returns +1, -1, 0 respectively).

@note If you don't want to find the distance, make sure third argument is False, because, it is a time consuming process. So, making it False gives about 2-3X speedup.
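To make the sign convention concrete, here is a minimal pure-Python sketch of a signed point-to-polygon distance. It is not OpenCV's implementation; it combines a standard point-to-segment distance with an even-odd ray casting test, and the helper names are hypothetical. The polygon is a 100x100 square chosen so the results are easy to check by hand.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def is_inside(p, polygon):
    """Even-odd ray casting; boundary points are handled by the caller."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (ax, ay), (bx, by) = polygon[i], polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

def signed_distance(p, polygon):
    """Positive inside, negative outside, zero on the boundary."""
    n = len(polygon)
    dist = min(point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
               for i in range(n))
    if dist == 0:
        return 0.0
    return dist if is_inside(p, polygon) else -dist

square = [(0, 0), (100, 0), (100, 100), (0, 100)]
for point in [(50, 50), (150, 50), (100, 50), (20, 50)]:
    print(point, signed_distance(point, square))
```

The inside/outside test alone needs no distance computation, which is consistent with the speedup mentioned in the note above when measureDist is False.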

### 3\. Match Shapes

OpenCV comes with a function **cv.matchShapes()** which enables us to compare two shapes, or two contours, and returns a metric showing the similarity. The lower the result, the better the match. It is calculated based on the Hu-moment values. Different measurement methods are explained in the docs.

@code{.py}
import cv2 as cv
import numpy as np

img1 = cv.imread('star.jpg', cv.IMREAD_GRAYSCALE)
img2 = cv.imread('star2.jpg', cv.IMREAD_GRAYSCALE)
assert img1 is not None, "file could not be read, check with os.path.exists()"
assert img2 is not None, "file could not be read, check with os.path.exists()"

ret, thresh = cv.threshold(img1, 127, 255,0)
ret, thresh2 = cv.threshold(img2, 127, 255,0)
contours,hierarchy = cv.findContours(thresh,2,1)
cnt1 = contours[0]
contours,hierarchy = cv.findContours(thresh2,2,1)
cnt2 = contours[0]

ret = cv.matchShapes(cnt1,cnt2,1,0.0)  # 1 = cv.CONTOURS_MATCH_I1
print( ret )
@endcode

I tried matching shapes with the different shapes given below:

I got following results:

-   Matching Image A with itself = 0.0
-   Matching Image A with Image B = 0.001946
-   Matching Image A with Image C = 0.326911

See, even image rotation doesn't affect this comparison much.

@note [Hu-Moments](http://en.wikipedia.org/wiki/Image_moment#Rotation_invariant_moments) are seven moments invariant to translation, rotation and scale. Seventh one is skew-invariant. Those values can be found using **cv.HuMoments()** function.
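The method value 1 used above corresponds to cv.CONTOURS\_MATCH\_I1, which the OpenCV reference defines as the sum over the seven Hu moments of |1/m_i^A - 1/m_i^B|, where m_i = sign(h_i) * log(h_i). The sketch below illustrates that formula in plain Python. It assumes a base-10 logarithm and skips zero-valued terms, and the Hu-moment values are hypothetical, so the numbers will not reproduce the results listed above.

```python
import math

def log_scaled(hu):
    """m_i = sign(h_i) * log10(|h_i|); zero moments are mapped to 0."""
    return [math.copysign(1.0, h) * math.log10(abs(h)) if h != 0 else 0.0
            for h in hu]

def match_i1(hu_a, hu_b):
    """Sum over i of |1/m_i^A - 1/m_i^B|, skipping terms where m_i is 0."""
    total = 0.0
    for ma, mb in zip(log_scaled(hu_a), log_scaled(hu_b)):
        if ma != 0 and mb != 0:
            total += abs(1.0 / ma - 1.0 / mb)
    return total

# Hypothetical Hu moments of two similar shapes (illustration only).
hu_a = [2.0e-3, 3.0e-7, 4.0e-10, 5.0e-11, -6.0e-21, 7.0e-14, 8.0e-22]
hu_b = [2.1e-3, 2.5e-7, 4.4e-10, 4.0e-11, -5.0e-21, 6.0e-14, 9.0e-22]

print(match_i1(hu_a, hu_a))  # a shape compared with itself
print(match_i1(hu_a, hu_b))  # a small positive value for similar shapes
```

The logarithm is there because Hu moments span many orders of magnitude; without it the first moment would dominate the comparison.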

## Exercises

\-# Check the documentation for **cv.pointPolygonTest()**; you can find a nice image in red and blue. It represents the distance from all pixels to the white curve on it. All pixels inside the curve are blue, with the shade depending on the distance. Similarly, outside points are red. Contour edges are marked in white. So the problem is simple: write code to create such a representation of distance.
\-# Compare images of digits or letters using **cv.matchShapes()**. (That would be a simple step towards OCR.)

## [Py Table Of Contents Contours](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_contours/py_table_of_contents_contours/)

# Contours in OpenCV {#tutorial\_py\_table\_of\_contents\_contours}

-   @subpage tutorial\_py\_contours\_begin
    
    Learn to find and draw Contours
    
-   @subpage tutorial\_py\_contour\_features
    
    Learn to find different features of contours like area, perimeter, bounding rectangle etc.
    
-   @subpage tutorial\_py\_contour\_properties
    
    Learn to find different properties of contours like Solidity, Mean Intensity etc.
    
-   @subpage tutorial\_py\_contours\_more\_functions
    
    Learn to find convexity defects, pointPolygonTest, match different shapes etc.
    
-   @subpage tutorial\_py\_contours\_hierarchy
    
    Learn about Contour Hierarchy

## [Py Filtering](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_filtering/py_filtering/)

# Smoothing Images {#tutorial\_py\_filtering}

## Goals

Learn to:

-   Blur images with various low pass filters
-   Apply custom-made filters to images (2D convolution)

## 2D Convolution ( Image Filtering )

As with one-dimensional signals, images can also be filtered with various low-pass filters (LPF), high-pass filters (HPF), etc. LPFs help in removing noise, blurring images, etc. HPFs help in finding edges in images.

OpenCV provides a function **cv.filter2D()** to convolve a kernel with an image. As an example, we will try an averaging filter on an image. A 5x5 averaging filter kernel will look like the below:

\f[K = \frac{1}{25} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}\f]

The operation works like this: keep this kernel above a pixel, add all the 25 pixels below this kernel, take the average, and replace the central pixel with the new average value. This operation is continued for all the pixels in the image. Try this code and check the result:

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('opencv_logo.png')
assert img is not None, "file could not be read, check with os.path.exists()"

kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)

plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.show()
@endcode

Result:

## Image Blurring (Image Smoothing)

Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing noise. It actually removes high frequency content (eg: noise, edges) from the image. So edges are blurred a little bit in this operation (there are also blurring techniques which don't blur the edges). OpenCV provides four main types of blurring techniques.

### 1\. Averaging

This is done by convolving an image with a normalized box filter. It simply takes the average of all the pixels under the kernel area and replaces the central element. This is done by the function **cv.blur()** or **cv.boxFilter()**. Check the docs for more details about the kernel. We should specify the width and height of the kernel. A 3x3 normalized box filter would look like the below:

\f[K = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\f]

@note If you don't want to use a normalized box filter, use **cv.boxFilter()**. Pass an argument normalize=False to the function.
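The difference between the normalized and the unnormalized box filter is easy to see on a single 3x3 neighbourhood. The sketch below is plain Python on hypothetical pixel values, and the helper name `box_filter_pixel` is made up for illustration; it computes the value that would replace the central pixel in both cases.

```python
def box_filter_pixel(img, r, c, k=3, normalize=True):
    """Sum (or average) the k x k neighbourhood centred on (r, c)."""
    half = k // 2
    total = sum(img[i][j]
                for i in range(r - half, r + half + 1)
                for j in range(c - half, c + half + 1))
    return total / (k * k) if normalize else total

# Hypothetical pixel values around the central pixel (value 50).
img = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

print(box_filter_pixel(img, 1, 1))                   # average of the 9 pixels
print(box_filter_pixel(img, 1, 1, normalize=False))  # plain sum of the 9 pixels
```

Note that the unnormalized sum easily exceeds the 8-bit range, so with an 8-bit output depth the real function would be expected to saturate such values; a larger output depth is usually chosen in that case.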

Check a sample demo below with a kernel of 5x5 size:

@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('opencv-logo-white.png')
assert img is not None, "file could not be read, check with os.path.exists()"

blur = cv.blur(img,(5,5))

plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(blur),plt.title('Blurred')
plt.xticks([]), plt.yticks([])
plt.show()
@endcode

Result:

### 2\. Gaussian Blurring

In this method, instead of a box filter, a Gaussian kernel is used. It is done with the function, **cv.GaussianBlur()**. We should specify the width and height of the kernel which should be positive and odd. We also should specify the standard deviation in the X and Y directions, sigmaX and sigmaY respectively. If only sigmaX is specified, sigmaY is taken as the same as sigmaX. If both are given as zeros, they are calculated from the kernel size. Gaussian blurring is highly effective in removing Gaussian noise from an image.

If you want, you can create a Gaussian kernel with the function, **cv.getGaussianKernel()**.
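For intuition, the coefficients of a 1D Gaussian kernel can be computed by hand. The sketch below is a plain-Python illustration, not OpenCV's code: it samples the Gaussian at integer offsets from the centre and normalizes the weights to sum to 1. When sigma is not positive it falls back to the formula given in the cv.getGaussianKernel reference, sigma = 0.3\*((ksize-1)\*0.5 - 1) + 0.8. OpenCV may use precomputed coefficients for small kernels, so its values need not match these exactly.

```python
import math

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Normalized 1D Gaussian weights for an odd kernel size."""
    if sigma <= 0:
        # Documented fallback when sigma is not given explicitly.
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    center = (ksize - 1) / 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(ksize)]
    total = sum(weights)
    return [w / total for w in weights]

kernel = gaussian_kernel_1d(5, sigma=1.0)
print([round(w, 4) for w in kernel])

# A 2D kernel is the outer product of two 1D kernels.
kernel_2d = [[a * b for b in kernel] for a in kernel]
print(round(sum(sum(row) for row in kernel_2d), 6))
```

Unlike the box filter, the weights fall off with distance from the centre, so nearby pixels contribute more to the blurred value than distant ones.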

The above code can be modified for Gaussian blurring:

@code{.py}
blur = cv.GaussianBlur(img,(5,5),0)
@endcode

Result:

### 3\. Median Blurring

Here, the function **cv.medianBlur()** takes the median of all the pixels under the kernel area and the central element is replaced with this median value. This is highly effective against salt-and-pepper noise in an image. Interestingly, in the above filters, the central element is a newly calculated value which may be a pixel value in the image or a new value. But in median blurring, the central element is always replaced by some pixel value in the image. It reduces the noise effectively. Its kernel size should be a positive odd integer.
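The contrast with averaging is easy to see on a single 3x3 neighbourhood that contains one "salt" pixel. The sketch below uses hypothetical pixel values and only the Python standard library.

```python
from statistics import median

# Hypothetical 3x3 neighbourhood; 255 is a "salt" outlier at the centre.
window = [
    [12, 14, 13],
    [15, 255, 12],
    [13, 14, 16],
]
values = [v for row in window for v in row]

mean_value = sum(values) / len(values)   # pulled up strongly by the outlier
median_value = median(values)            # an actual pixel value from the window

print(round(mean_value, 2), median_value)
```

The mean is dragged far above its neighbours by the single outlier and is not one of the original pixel values, while the median is a value that already exists in the window and ignores the outlier entirely.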

In this demo, I added 50% noise to our original image and applied median blurring. Check the result:

@code{.py}
median = cv.medianBlur(img,5)
@endcode

Result:

### 4\. Bilateral Filtering

**cv.bilateralFilter()** is highly effective in noise removal while keeping edges sharp. But the operation is slower compared to other filters. We already saw that a Gaussian filter takes the neighbourhood around the pixel and finds its Gaussian weighted average. This Gaussian filter is a function of space alone, that is, nearby pixels are considered while filtering. It doesn't consider whether pixels have almost the same intensity. It doesn't consider whether a pixel is an edge pixel or not. So it blurs the edges also, which we don't want to do.

Bilateral filtering also takes a Gaussian filter in space, but one more Gaussian filter which is a function of pixel difference. The Gaussian function of space makes sure that only nearby pixels are considered for blurring, while the Gaussian function of intensity difference makes sure that only those pixels with similar intensities to the central pixel are considered for blurring. So it preserves the edges since pixels at edges will have large intensity variation.
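The two Gaussian weights can be seen at work on a one-dimensional signal with a sharp step. The sketch below is a plain-Python illustration of the idea, not OpenCV's implementation: each output sample is a weighted average where the weight is the product of a spatial Gaussian and an intensity-difference Gaussian. The signal values and the function name `bilateral_1d` are hypothetical.

```python
import math

def bilateral_1d(signal, radius, sigma_space, sigma_color):
    """Edge-preserving smoothing of a 1D signal."""
    out = []
    for i, center in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Weight from spatial distance between samples.
            w_space = math.exp(-((j - i) ** 2) / (2 * sigma_space ** 2))
            # Weight from intensity difference to the central sample.
            w_color = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_color ** 2))
            weighted_sum += w_space * w_color * signal[j]
            weight_total += w_space * w_color
        out.append(weighted_sum / weight_total)
    return out

# Noisy step edge: low values near 10, high values near 200.
step = [10, 12, 9, 11, 10, 200, 202, 199, 201, 200]
smoothed = bilateral_1d(step, radius=2, sigma_space=2.0, sigma_color=10.0)
print([round(v, 1) for v in smoothed])
```

Samples on the other side of the step get a near-zero intensity weight, so the small fluctuations on each side are averaged out while the jump between the two sides stays sharp.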

The below sample shows use of a bilateral filter (for details on arguments, visit the docs).

@code{.py}
blur = cv.bilateralFilter(img,9,75,75)
@endcode

Result:

See, the texture on the surface is gone, but the edges are still preserved.

## Additional Resources

\-# Details about the [bilateral filtering](http://people.csail.mit.edu/sparis/bf_course/)

## [Py Geometric Transformations](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_geometric_transformations/py_geometric_transformations/)

# Geometric Transformations of Images {#tutorial\_py\_geometric\_transformations}

## Goals

-   Learn to apply different geometric transformations to images, like translation, rotation, affine transformation etc.
-   You will see these functions: **cv.getPerspectiveTransform**

## Transformations

OpenCV provides two transformation functions, **cv.warpAffine** and **cv.warpPerspective**, with which you can perform all kinds of transformations. **cv.warpAffine** takes a 2x3 transformation matrix while **cv.warpPerspective** takes a 3x3 transformation matrix as input.

### Scaling

Scaling is just resizing of the image. OpenCV comes with a function **cv.resize()** for this purpose. The size of the image can be specified manually, or you can specify the scaling factor. Different interpolation methods are used. Preferable interpolation methods are **cv.INTER\_AREA** for shrinking and **cv.INTER\_CUBIC** (slow) & **cv.INTER\_LINEAR** for zooming. By default, the interpolation method **cv.INTER\_LINEAR** is used for all resizing purposes. You can resize an input image with either of the following methods:

@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('messi5.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"

res = cv.resize(img,None,fx=2, fy=2, interpolation = cv.INTER_CUBIC)

#OR

height, width = img.shape[:2]
res = cv.resize(img,(2*width, 2*height), interpolation = cv.INTER_CUBIC)
@endcode

### Translation

Translation is the shifting of an object's location. If you know the shift in the (x,y) direction and let it be \f$(t_x,t_y)\f$, you can create the transformation matrix \f$\textbf{M}\f$ as follows:

\f[M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}\f]

You can make it into a Numpy array of type np.float32 and pass it into the **cv.warpAffine()** function. See the below example for a shift of (100,50):

@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols = img.shape

M = np.float32([[1,0,100],[0,1,50]])
dst = cv.warpAffine(img,M,(cols,rows))

cv.imshow('img',dst)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode

**warning**

The third argument of the **cv.warpAffine()** function is the size of the output image, which should be in the form of **(width, height)**. Remember width = number of columns, and height = number of rows.

See the result below:

### Rotation

Rotation of an image for an angle \f$\theta\f$ is achieved by the transformation matrix of the form

\f[M = \begin{bmatrix} cos\theta & -sin\theta \\ sin\theta & cos\theta \end{bmatrix}\f]

But OpenCV provides scaled rotation with adjustable center of rotation so that you can rotate at any location you prefer. The modified transformation matrix is given by

\f[\begin{bmatrix} \alpha & \beta & (1- \alpha ) \cdot center.x - \beta \cdot center.y \\ - \beta & \alpha & \beta \cdot center.x + (1- \alpha ) \cdot center.y \end{bmatrix}\f]

where:

\f[\begin{array}{l} \alpha = scale \cdot \cos \theta , \\ \beta = scale \cdot \sin \theta \end{array}\f]
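The modified matrix can be built directly from these formulas. The sketch below does so in plain Python and applies it to two points; the helper names `rotation_matrix_2d` and `apply_affine` and the chosen center are hypothetical. It is meant to show that the center of rotation maps to itself, not to replace the library function.

```python
import math

def rotation_matrix_2d(center, angle_deg, scale):
    """2x3 matrix built from the alpha/beta formulas above."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]

def apply_affine(M, point):
    """Map a point (x, y) through a 2x3 affine matrix."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

M = rotation_matrix_2d(center=(100.0, 50.0), angle_deg=90, scale=1.0)

print(apply_affine(M, (100.0, 50.0)))  # the center stays where it is
print(apply_affine(M, (110.0, 50.0)))  # a point right of the center moves above it
```

Since the image y axis points downward, a point to the right of the center ending up above it means a positive angle rotates counter-clockwise on screen.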

To find this transformation matrix, OpenCV provides a function, **cv.getRotationMatrix2D**. Check out the below example, which rotates the image by 90 degrees with respect to the center without any scaling.

@code{.py}
img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols = img.shape

# cols-1 and rows-1 are the coordinate limits.
M = cv.getRotationMatrix2D(((cols-1)/2.0,(rows-1)/2.0),90,1)
dst = cv.warpAffine(img,M,(cols,rows))
@endcode

See the result:

### Affine Transformation

In affine transformation, all parallel lines in the original image will still be parallel in the output image. To find the transformation matrix, we need three points from the input image and their corresponding locations in the output image. Then **cv.getAffineTransform** will create a 2x3 matrix which is to be passed to **cv.warpAffine**.
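Three point pairs are enough because a 2x3 affine matrix has six unknowns and each pair contributes two equations. The sketch below solves those equations in plain Python with Cramer's rule; it is an illustration of the underlying algebra, not OpenCV's implementation, and the helper names are hypothetical. The point pairs are the ones used in the example that follows.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve A x = b for a 3x3 system with Cramer's rule."""
    d = det3(A)
    if d == 0:
        raise ValueError("the three source points are collinear")
    solution = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = b[r]
        solution.append(det3(Ai) / d)
    return solution

def affine_from_points(src, dst):
    """2x3 matrix M with M * [x, y, 1]^T = [x', y']^T for all three pairs."""
    A = [[x, y, 1.0] for x, y in src]
    row_x = solve3(A, [p[0] for p in dst])
    row_y = solve3(A, [p[1] for p in dst])
    return [row_x, row_y]

def apply_affine(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

src = [(50, 50), (200, 50), (50, 200)]
dst = [(10, 100), (200, 50), (100, 250)]
M = affine_from_points(src, dst)

for row in M:
    print([round(v, 4) for v in row])
```

The determinant check also explains why the three input points must not lie on one line: in that case the system has no unique solution.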

Check the below example, and also look at the points I selected (which are marked in green color):

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('drawing.png')
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols,ch = img.shape

pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])

M = cv.getAffineTransform(pts1,pts2)

dst = cv.warpAffine(img,M,(cols,rows))

plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
@endcode

See the result:

### Perspective Transformation

For perspective transformation, you need a 3x3 transformation matrix. Straight lines will remain straight even after the transformation. To find this transformation matrix, you need 4 points on the input image and the corresponding points on the output image. Among these 4 points, no 3 should be collinear. Then the transformation matrix can be found by the function **cv.getPerspectiveTransform**. Then apply **cv.warpPerspective** with this 3x3 transformation matrix.
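What distinguishes a 3x3 perspective matrix from a 2x3 affine one is the division by the third homogeneous coordinate. The sketch below applies a 3x3 matrix to single points in plain Python. The matrix values are hypothetical, chosen only so that the arithmetic is easy to follow, and the helper name `apply_homography` is made up for illustration.

```python
def apply_homography(H, point):
    """Map (x, y) through a 3x3 matrix, dividing by the third coordinate."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

# Hypothetical matrix: identity except for one perspective term.
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.001, 0.0, 1.0],
]

print(apply_homography(H, (0.0, 50.0)))    # w = 1.0, the point is unchanged
print(apply_homography(H, (100.0, 50.0)))  # w = 1.1, the point is pulled toward the origin
```

Because the divisor depends on the position of the point, parallel lines are generally no longer parallel after the mapping, although straight lines stay straight.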

See the code below:
@code{.py}
img = cv.imread('sudoku.png')
assert img is not None, "file could not be read, check with os.path.exists()"
rows,cols,ch = img.shape

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])

M = cv.getPerspectiveTransform(pts1,pts2)

dst = cv.warpPerspective(img,M,(300,300))

plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
@endcode
Result:
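The need for four point pairs can be illustrated the same way. A 3x3 perspective matrix has eight free parameters once its bottom-right entry is fixed to 1, and each point pair gives two equations. The following is a minimal NumPy sketch under that normalization assumption; `perspective_from_points` is a hypothetical helper, not an OpenCV function.

```python
import numpy as np

def perspective_from_points(src, dst):
    """Hypothetical helper: solve the 8x8 system for a 3x3 matrix with H[2, 2] = 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

src = [[56, 65], [368, 52], [28, 387], [389, 390]]
dst = [[0, 0], [300, 0], [0, 300], [300, 300]]
H = perspective_from_points(src, dst)

# Map the source points and divide by the third (homogeneous) coordinate.
pts = np.hstack([np.array(src, dtype=float), np.ones((4, 1))]) @ H.T
mapped = pts[:, :2] / pts[:, 2:3]
```

The final division by the homogeneous coordinate is what distinguishes a perspective warp from an affine one.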

## Additional Resources

1. "Computer Vision: Algorithms and Applications", Richard Szeliski

## [Py Grabcut](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_grabcut/py_grabcut/)

# Interactive Foreground Extraction using GrabCut Algorithm {#tutorial\_py\_grabcut}

## Goal

In this chapter

-   We will see the GrabCut algorithm to extract the foreground in images
-   We will create an interactive application for this.

## Theory

The GrabCut algorithm was designed by Carsten Rother, Vladimir Kolmogorov & Andrew Blake from Microsoft Research Cambridge, UK, in their paper, ["GrabCut": interactive foreground extraction using iterated graph cuts](http://dl.acm.org/citation.cfm?id=1015720). An algorithm was needed for foreground extraction with minimal user interaction, and the result was GrabCut.

How does it work from the user's point of view? Initially the user draws a rectangle around the foreground region (the foreground region should be completely inside the rectangle). Then the algorithm segments it iteratively to get the best result. Done. But in some cases the segmentation won't be fine; for example, it may have marked some foreground region as background and vice versa. In that case, the user needs to do fine touch-ups: just draw some strokes on the image where the results are faulty. The strokes basically say _"Hey, this region should be foreground, you marked it background, correct it in the next iteration"_ or the opposite for background. Then in the next iteration, you get better results.

See the image below. First the player and football are enclosed in a blue rectangle. Then some final touch-ups with white strokes (denoting foreground) and black strokes (denoting background) are made. And we get a nice result.

So what happens in the background?

-   User inputs the rectangle. Everything outside this rectangle will be taken as sure background (that is the reason it was mentioned before that your rectangle should include all the objects). Everything inside the rectangle is unknown. Similarly, any user input specifying foreground and background is considered hard-labelling, which means it won't change in the process.
-   Computer does an initial labelling depending on the data we gave. It labels the foreground and background pixels (or it hard-labels).
-   Now a Gaussian Mixture Model (GMM) is used to model the foreground and background.
-   Depending on the data we gave, the GMM learns and creates a new pixel distribution. That is, the unknown pixels are labelled either probable foreground or probable background depending on their relation with the other hard-labelled pixels in terms of color statistics (it is just like clustering).
-   A graph is built from this pixel distribution. Nodes in the graph are pixels. Two additional nodes are added, the **Source node** and the **Sink node**. Every foreground pixel is connected to the Source node and every background pixel is connected to the Sink node.
-   The weights of the edges connecting pixels to the Source node/Sink node are defined by the probability of a pixel being foreground/background. The weights between the pixels are defined by the edge information or pixel similarity. If there is a large difference in pixel color, the edge between them will get a low weight.
-   Then a mincut algorithm is used to segment the graph. It cuts the graph into two parts, separating the Source node and the Sink node, with minimum cost. The cost function is the sum of the weights of all edges that are cut. After the cut, all the pixels connected to the Source node become foreground and those connected to the Sink node become background.
-   The process is continued until the classification converges.

It is illustrated in the image below (Image Courtesy: [http://www.cs.ru.ac.za/research/g02m1682/](http://www.cs.ru.ac.za/research/g02m1682/))

## Demo

Now we go for grabcut algorithm with OpenCV. OpenCV has the function, **cv.grabCut()** for this. We will see its arguments first:

-   _img_ - Input image
-   _mask_ - It is a mask image where we specify which areas are background, foreground or probable background/foreground etc. It is done with the following flags, **cv.GC\_BGD, cv.GC\_FGD, cv.GC\_PR\_BGD, cv.GC\_PR\_FGD**, or simply by writing 0,1,2,3 into the mask.
-   _rect_ - It is the coordinates of a rectangle which includes the foreground object in the format (x,y,w,h)
-   _bgdModel_, _fgdModel_ - These are arrays used by the algorithm internally. You just create two np.float64 type zero arrays of size (1,65).
-   _iterCount_ - Number of iterations the algorithm should run.
-   _mode_ - It should be **cv.GC\_INIT\_WITH\_RECT** or **cv.GC\_INIT\_WITH\_MASK** or combined, which decides whether we are drawing a rectangle or final touch-up strokes.

First let's look at rectangular mode. We load the image and create a mask image of the same size. We create _fgdModel_ and _bgdModel_ and give the rectangle parameters. Let the algorithm run for 5 iterations. The mode should be _cv.GC\_INIT\_WITH\_RECT_ since we are using a rectangle. Then run grabCut, which modifies the mask image. In the new mask image, pixels are marked with the four flags denoting background/foreground as specified above. So we modify the mask such that all 0-pixels and 2-pixels are set to 0 (i.e. background) and all 1-pixels and 3-pixels are set to 1 (i.e. foreground). Now our final mask is ready. Just multiply it with the input image to get the segmented image.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
mask = np.zeros(img.shape[:2],np.uint8)

bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)

rect = (50,50,450,290)
cv.grabCut(img,mask,rect,bgdModel,fgdModel,5,cv.GC_INIT_WITH_RECT)

mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]

plt.imshow(img),plt.colorbar(),plt.show()
@endcode
See the results below:
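The mask post-processing step can be tried in isolation. This is a minimal sketch on a made-up 2x4 mask and a uniform gray image (both are assumptions for illustration), using the numeric flag values 0 to 3 listed above rather than a real grabCut result.

```python
import numpy as np

# Toy mask holding all four labels: 0/2 = (probable) background, 1/3 = (probable) foreground.
mask = np.array([[0, 1, 2, 3],
                 [3, 2, 1, 0]], np.uint8)
img = np.full((2, 4, 3), 200, np.uint8)  # uniform 3-channel image

# Collapse the four labels into a binary mask.
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

# Broadcast the mask over the color channels; background pixels become 0.
segmented = img * mask2[:, :, np.newaxis]
```

The `np.newaxis` adds a trailing axis of length 1 so the single-channel mask broadcasts across all three color channels.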

Oops, Messi's hair is gone. _Who likes Messi without his hair?_ We need to bring it back. So we will give it a fine touch-up with 1-pixels (sure foreground). At the same time, some part of the ground has come into the picture which we don't want, and also some logo. We need to remove them. There we give some 0-pixel touch-up (sure background). So we modify the mask resulting from the previous case as just described.

_What I actually did is that I opened the input image in a paint application and added another layer to the image. Using the brush tool, I marked missed foreground (hair, shoes, ball etc) with white and unwanted background (like logo, ground etc) with black on this new layer. Then I filled the remaining background with gray. Then I loaded that mask image in OpenCV and edited the original mask image we got with the corresponding values in the newly added mask image. Check the code below:_
@code{.py}
# newmask is the mask image I manually labelled
newmask = cv.imread('newmask.png', cv.IMREAD_GRAYSCALE)
assert newmask is not None, "file could not be read, check with os.path.exists()"

# wherever it is marked white (sure foreground), change mask=1
# wherever it is marked black (sure background), change mask=0
mask[newmask == 0] = 0
mask[newmask == 255] = 1

mask, bgdModel, fgdModel = cv.grabCut(img,mask,None,bgdModel,fgdModel,5,cv.GC_INIT_WITH_MASK)

mask = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask[:,:,np.newaxis]
plt.imshow(img),plt.colorbar(),plt.show()
@endcode
See the result below:

So that's it. Here, instead of initializing in rect mode, you can directly go into mask mode. Just mark the rectangle area in the mask image with 2-pixels or 3-pixels (probable background/foreground). Then mark the sure foreground with 1-pixels as we did in the second example. Then directly apply the grabCut function in mask mode.

## Exercises

1. OpenCV samples contain a sample grabcut.py which is an interactive tool using grabcut. Check it. Also watch this [youtube video](http://www.youtube.com/watch?v=kAwxLTDDAwU) on how to use it.
2. Here, you can make this into an interactive sample by drawing the rectangle and strokes with the mouse, creating a trackbar to adjust stroke width etc.

## [Py Gradients](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_gradients/py_gradients/)

# Image Gradients {#tutorial\_py\_gradients}

## Goal

In this chapter, we will learn to:

-   Find Image gradients, edges etc
-   We will see following functions : **cv.Sobel()**, **cv.Scharr()**, **cv.Laplacian()** etc

## Theory

OpenCV provides three types of gradient filters or High-pass filters, Sobel, Scharr and Laplacian. We will see each one of them.

### 1\. Sobel and Scharr Derivatives

The Sobel operator is a joint Gaussian smoothing plus differentiation operation, so it is more resistant to noise. You can specify the direction of the derivatives to be taken, vertical or horizontal (by the arguments yorder and xorder respectively). You can also specify the size of the kernel with the argument ksize. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel filter. Please see the docs for the kernels used.

### 2\. Laplacian Derivatives

It calculates the Laplacian of the image given by the relation, \\f$\\Delta src = \\frac{\\partial ^2{src}}{\\partial x^2} + \\frac{\\partial ^2{src}}{\\partial y^2}\\f$ where each derivative is found using Sobel derivatives. If ksize = 1, then following kernel is used for filtering:

\\f\[kernel = \\begin{bmatrix} 0 & 1 & 0 \\\\ 1 & -4 & 1 \\\\ 0 & 1 & 0 \\end{bmatrix}\\f\]
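As a quick sanity check of this kernel, here is a minimal NumPy sketch that applies it by hand to a synthetic image whose intensity is x squared (an assumed test pattern, not tutorial data). The second derivative of x squared is 2 and the image is constant along y, so every interior response should be 2.

```python
import numpy as np

kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=float)

# Synthetic 5x5 image with intensity x**2.
ys, xs = np.mgrid[0:5, 0:5]
img = (xs ** 2).astype(float)

# Slide the 3x3 kernel over the interior pixels (no border handling).
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)
```

Border pixels are skipped here for brevity; cv.Laplacian handles them with a border extrapolation mode instead.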

## Code

The code below shows all operators in a single diagram. The Sobel kernels are of 5x5 size, while cv.Laplacian is called with its default kernel size (ksize = 1). The output depth is passed as cv.CV\_64F so that negative gradient values are kept.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('dave.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

laplacian = cv.Laplacian(img,cv.CV_64F)
sobelx = cv.Sobel(img,cv.CV_64F,1,0,ksize=5)
sobely = cv.Sobel(img,cv.CV_64F,0,1,ksize=5)

plt.subplot(2,2,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(laplacian,cmap = 'gray')
plt.title('Laplacian'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,3),plt.imshow(sobelx,cmap = 'gray')
plt.title('Sobel X'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely,cmap = 'gray')
plt.title('Sobel Y'), plt.xticks([]), plt.yticks([])

plt.show()
@endcode
Result:

## One Important Matter!

Suppose the output datatype in the last example had been cv.CV\_8U or np.uint8. There is a slight problem with that. A black-to-white transition is taken as a positive slope (it has a positive value) while a white-to-black transition is taken as a negative slope (it has a negative value). So when you convert the data to np.uint8, all negative slopes are made zero. In simple words, you miss that edge.

If you want to detect both edges, the better option is to keep the output datatype in some higher form, like cv.CV\_16S, cv.CV\_64F etc, take its absolute value and then convert back to cv.CV\_8U. The code below demonstrates this procedure for a horizontal Sobel filter and the difference in results.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('box.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# Output dtype = cv.CV_8U
sobelx8u = cv.Sobel(img,cv.CV_8U,1,0,ksize=5)

# Output dtype = cv.CV_64F. Then take its absolute and convert to cv.CV_8U
sobelx64f = cv.Sobel(img,cv.CV_64F,1,0,ksize=5)
abs_sobel64f = np.absolute(sobelx64f)
sobel_8u = np.uint8(abs_sobel64f)

plt.subplot(1,3,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,2),plt.imshow(sobelx8u,cmap = 'gray')
plt.title('Sobel CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,3),plt.imshow(sobel_8u,cmap = 'gray')
plt.title('Sobel abs(CV_64F)'), plt.xticks([]), plt.yticks([])

plt.show()
@endcode
Check the result below:
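The lost edge can also be reproduced without an image file. This is a minimal NumPy sketch, assuming one synthetic row of pixels and a plain central difference standing in for the Sobel kernel; np.clip imitates the saturation that an 8-bit output applies to negative values.

```python
import numpy as np

# One row: black, then white, then black again.
row = np.array([0, 0, 255, 255, 0, 0], dtype=np.float64)

# Signed central difference (a stand-in for a horizontal derivative filter).
grad = row[2:] - row[:-2]

# 8-bit output: negative slopes saturate to 0, so the white-to-black edge vanishes.
grad_8u = np.clip(grad, 0, 255).astype(np.uint8)

# Keep the sign first, take the absolute value, then convert: both edges survive.
abs_8u = np.clip(np.abs(grad), 0, 255).astype(np.uint8)
```

Here `grad_8u` keeps only the rising edge, while `abs_8u` responds to both transitions.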

## [Py 2d Histogram](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_histograms/py_2d_histogram/py_2d_histogram/)

# Histograms - 3 : 2D Histograms {#tutorial\_py\_2d\_histogram}

## Goal

In this chapter, we will learn to find and plot 2D histograms. It will be helpful in coming chapters.

## Introduction

In the first article, we calculated and plotted one-dimensional histogram. It is called one-dimensional because we are taking only one feature into our consideration, ie grayscale intensity value of the pixel. But in two-dimensional histograms, you consider two features. Normally it is used for finding color histograms where two features are Hue & Saturation values of every pixel.

There is a python sample (samples/python/color\_histogram.py) already for finding color histograms. We will try to understand how to create such a color histogram, and it will be useful in understanding further topics like Histogram Back-Projection.

## 2D Histogram in OpenCV

It is quite simple and calculated using the same function, **cv.calcHist()**. For color histograms, we need to convert the image from BGR to HSV. (Remember, for 1D histogram, we converted from BGR to Grayscale). For 2D histograms, its parameters will be modified as follows:

-   **channels = \[0,1\]** _because we need to process both H and S plane._
-   **bins = \[180,256\]** _180 for H plane and 256 for S plane._
-   **range = \[0,180,0,256\]** _Hue value lies between 0 and 180 & Saturation lies between 0 and 256._

Now check the code below:
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(img,cv.COLOR_BGR2HSV)

hist = cv.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])
@endcode
That's it.

## 2D Histogram in Numpy

Numpy also provides a specific function for this : **np.histogram2d()**. (Remember, for 1D histogram we used **np.histogram()** ).
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('home.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(img,cv.COLOR_BGR2HSV)

# take the H and S planes of the HSV image
h, s = hsv[:,:,0], hsv[:,:,1]
hist, xbins, ybins = np.histogram2d(h.ravel(),s.ravel(),[180,256],[[0,180],[0,256]])
@endcode
The first argument is the H plane, the second one is the S plane, the third is the number of bins for each and the fourth is their range.
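To check what np.histogram2d() returns without needing an image, here is a small sketch on random hue and saturation planes (synthetic stand-ins for the H and S channels, generated with a fixed seed).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 4x5 "H" and "S" planes in the value ranges OpenCV uses for 8-bit HSV.
h = rng.integers(0, 180, size=(4, 5))
s = rng.integers(0, 256, size=(4, 5))

hist, xbins, ybins = np.histogram2d(h.ravel(), s.ravel(),
                                    [180, 256], [[0, 180], [0, 256]])

# With one bin per integer value, pixel (h, s) is counted at hist[h, s].
count_first_pixel = hist[h[0, 0], s[0, 0]]
```

The rows of `hist` index hue and the columns index saturation, and the bin edge arrays have one more element than the number of bins.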

Now we can check how to plot this color histogram.

## Plotting 2D Histograms

### Method - 1 : Using cv.imshow()

The result we get is a two dimensional array of size 180x256. So we can show them as we do normally, using cv.imshow() function. It will be a grayscale image and it won't give much idea what colors are there, unless you know the Hue values of different colors.

### Method - 2 : Using Matplotlib

We can use the **matplotlib.pyplot.imshow()** function to plot the 2D histogram with different color maps. It gives us a much better idea of the pixel density. But this also doesn't tell us at first look what color is there, unless you know the Hue values of different colors. Still I prefer this method. It is simple and better.

@note While using this function, remember, interpolation flag should be nearest for better results.

Consider the code:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('home.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(img,cv.COLOR_BGR2HSV)
hist = cv.calcHist( [hsv], [0, 1], None, [180, 256], [0, 180, 0, 256] )

plt.imshow(hist,interpolation = 'nearest')
plt.show()
@endcode
Below is the input image and its color histogram plot. X axis shows S values and Y axis shows Hue.

In histogram, you can see some high values near H = 100 and S = 200. It corresponds to blue of sky. Similarly another peak can be seen near H = 25 and S = 100. It corresponds to yellow of the palace. You can verify it with any image editing tools like GIMP.

### Method 3 : OpenCV sample style !!

There is a sample code for color-histogram in OpenCV-Python samples (samples/python/color\_histogram.py). If you run the code, you can see the histogram shows the corresponding color also. Or simply it outputs a color coded histogram. Its result is very good (although you need to add extra bunch of lines).

In that code, the author created a color map in HSV. Then converted it into BGR. The resulting histogram image is multiplied with this color map. He also uses some preprocessing steps to remove small isolated pixels, resulting in a good histogram.

I leave it to the readers to run the code, analyze it and have your own hack arounds. Below is the output of that code for the same image as above:

You can clearly see in the histogram what colors are present, blue is there, yellow is there, and some white due to chessboard is there. Nice !!!

## [Py Histogram Backprojection](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_histograms/py_histogram_backprojection/py_histogram_backprojection/)

# Histogram - 4 : Histogram Backprojection {#tutorial\_py\_histogram\_backprojection}

## Goal

In this chapter, we will learn about histogram backprojection.

## Theory

It was proposed by **Michael J. Swain , Dana H. Ballard** in their paper **Indexing via color histograms**.

**What is it actually in simple words?** It is used for image segmentation or finding objects of interest in an image. In simple words, it creates an image of the same size (but single channel) as that of our input image, where each pixel corresponds to the probability of that pixel belonging to our object. In even simpler words, the output image will have our object of interest whiter than the remaining part. Well, that is an intuitive explanation. (I can't make it any simpler.) Histogram Backprojection is used with the camshift algorithm etc.

**How do we do it ?** We create a histogram of an image containing our object of interest (in our case, the ground, leaving player and other things). The object should fill the image as far as possible for better results. And a color histogram is preferred over grayscale histogram, because color of the object is a better way to define the object than its grayscale intensity. We then "back-project" this histogram over our test image where we need to find the object, ie in other words, we calculate the probability of every pixel belonging to the ground and show it. The resulting output on proper thresholding gives us the ground alone.

## Algorithm in Numpy

1. First we need to calculate the color histogram of both the object we need to find (let it be 'M') and the image where we are going to search (let it be 'I').
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# roi is the object or region of object we need to find
roi = cv.imread('rose_red.png')
assert roi is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(roi,cv.COLOR_BGR2HSV)

# target is the image we search in
target = cv.imread('rose.png')
assert target is not None, "file could not be read, check with os.path.exists()"
hsvt = cv.cvtColor(target,cv.COLOR_BGR2HSV)

# Find the histograms using calcHist. Can be done with np.histogram2d also
M = cv.calcHist([hsv],[0, 1], None, [180, 256], [0, 180, 0, 256] )
I = cv.calcHist([hsvt],[0, 1], None, [180, 256], [0, 180, 0, 256] )
@endcode
2. Find the ratio \\f$R = \\frac{M}{I}\\f$. Then backproject R, i.e. use R as a palette and create a new image with every pixel as its corresponding probability of being target, i.e. B(x,y) = R\[h(x,y),s(x,y)\] where h is hue and s is saturation of the pixel at (x,y). After that apply the condition \\f$B(x,y) = min\[B(x,y), 1\]\\f$.
@code{.py}
# ratio histogram; empty bins of I are never looked up below,
# so they are clamped to 1 only to avoid division by zero
R = M / np.maximum(I, 1)
h,s,v = cv.split(hsvt)
B = R[h.ravel(),s.ravel()]
B = np.minimum(B,1)
B = B.reshape(hsvt.shape[:2])
@endcode
3. Now apply a convolution with a circular disc, \\f$B = D \\ast B\\f$, where D is the disc kernel.
@code{.py}
disc = cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
cv.filter2D(B,-1,disc,B)
B = np.uint8(B)
cv.normalize(B,B,0,255,cv.NORM_MINMAX)
@endcode
4. Now the location of maximum intensity gives us the location of the object. If we are expecting a region in the image, thresholding at a suitable value gives a nice result.
@code{.py}
ret,thresh = cv.threshold(B,50,255,0)
@endcode
That's it !!

## Backprojection in OpenCV

OpenCV provides an inbuilt function **cv.calcBackProject()**. Its parameters are almost the same as those of the **cv.calcHist()** function. One of its parameters is the histogram of the object, which we have to compute first. Also, the object histogram should be normalized before passing it on to the backproject function. It returns the probability image. Then we convolve the image with a disc kernel and apply a threshold. Below is my code and output :
@code{.py}
import numpy as np
import cv2 as cv

roi = cv.imread('rose_red.png')
assert roi is not None, "file could not be read, check with os.path.exists()"
hsv = cv.cvtColor(roi,cv.COLOR_BGR2HSV)

target = cv.imread('rose.png')
assert target is not None, "file could not be read, check with os.path.exists()"
hsvt = cv.cvtColor(target,cv.COLOR_BGR2HSV)

# calculating object histogram
roihist = cv.calcHist([hsv],[0, 1], None, [180, 256], [0, 180, 0, 256] )

# normalize histogram and apply backprojection
cv.normalize(roihist,roihist,0,255,cv.NORM_MINMAX)
dst = cv.calcBackProject([hsvt],[0,1],roihist,[0,180,0,256],1)

# Now convolve with circular disc
disc = cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
cv.filter2D(dst,-1,disc,dst)

# threshold and binary AND
ret,thresh = cv.threshold(dst,50,255,0)
thresh = cv.merge((thresh,thresh,thresh))
res = cv.bitwise_and(target,thresh)

res = np.vstack((target,thresh,res))
cv.imwrite('res.jpg',res)
@endcode
Below is one example I worked with. I used the region inside the blue rectangle as the sample object and I wanted to extract the full ground.

## Additional Resources

1. "Indexing via color histograms", Swain, Michael J., Third International Conference on Computer Vision, 1990.

## [Py Histogram Begins](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_histograms/py_histogram_begins/py_histogram_begins/)

# Histograms - 1 : Find, Plot, Analyze !!! {#tutorial\_py\_histogram\_begins}

## Goal

Learn to

-   Find histograms, using both OpenCV and Numpy functions
-   Plot histograms, using OpenCV and Matplotlib functions
-   You will see these functions : **cv.calcHist()**, **np.histogram()** etc.

## Theory

So what is histogram ? You can consider histogram as a graph or plot, which gives you an overall idea about the intensity distribution of an image. It is a plot with pixel values (ranging from 0 to 255, not always) in X-axis and corresponding number of pixels in the image on Y-axis.

It is just another way of understanding the image. By looking at the histogram of an image, you get intuition about contrast, brightness, intensity distribution etc of that image. Almost all image processing tools today, provides features on histogram. Below is an image from [Cambridge in Color website](http://www.cambridgeincolour.com/tutorials/histograms1.htm), and I recommend you to visit the site for more details.

You can see the image and its histogram. (Remember, this histogram is drawn for grayscale image, not color image). Left region of histogram shows the amount of darker pixels in image and right region shows the amount of brighter pixels. From the histogram, you can see dark region is more than brighter region, and amount of midtones (pixel values in mid-range, say around 127) are very less.

## Find Histogram

Now we have an idea on what is histogram, we can look into how to find this. Both OpenCV and Numpy come with in-built function for this. Before using those functions, we need to understand some terminologies related with histograms.

**BINS** : The above histogram shows the number of pixels for every pixel value, i.e. from 0 to 255, so you need 256 values to show it. But what if you do not need the number of pixels for each pixel value separately, but the number of pixels in an interval of pixel values? Say, for example, you need to find the number of pixels lying between 0 and 15, then 16 to 31, ..., 240 to 255. You will need only 16 values to represent the histogram. And that is what is shown in the example given in @ref tutorial\_histogram\_calculation "OpenCV Tutorials on histograms".

So what you do is simply split the whole histogram into 16 sub-parts, and the value of each sub-part is the sum of all pixel counts in it. Each sub-part is called a "BIN". In the first case, the number of bins was 256 (one for each pixel value) while in the second case, it is only 16. BINS is represented by the term **histSize** in OpenCV docs.

**DIMS** : It is the number of parameters for which we collect the data. In this case, we collect data regarding only one thing, intensity value. So here it is 1.

**RANGE** : It is the range of intensity values you want to measure. Normally, it is \[0,256\], ie all intensity values.
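The relation between the 256-bin and the 16-bin histogram described under BINS can be checked numerically. This is a minimal sketch on random pixel values (synthetic data with a fixed seed, assumed only for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=1000)  # synthetic 8-bit intensities

# BINS = 256, RANGE = [0, 256]: one bin per intensity value.
hist256, _ = np.histogram(pixels, 256, [0, 256])

# BINS = 16 over the same RANGE: each bin covers 16 consecutive values.
hist16, edges16 = np.histogram(pixels, 16, [0, 256])

# Summing every group of 16 fine bins reproduces the coarse histogram.
grouped = hist256.reshape(16, 16).sum(axis=1)
```

Both histograms account for every pixel; only the resolution along the intensity axis differs.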

### 1\. Histogram Calculation in OpenCV

So now we use **cv.calcHist()** function to find the histogram. Let's familiarize with the function and its parameters :

_cv.calcHist(images, channels, mask, histSize, ranges\[, hist\[, accumulate\]\])_

1. images : it is the source image of type uint8 or float32. It should be given in square brackets, i.e. "\[img\]".
2. channels : it is also given in square brackets. It is the index of the channel for which we calculate the histogram. For example, if the input is a grayscale image, its value is \[0\]. For a color image, you can pass \[0\], \[1\] or \[2\] to calculate the histogram of the blue, green or red channel respectively.
3. mask : mask image. To find the histogram of the full image, it is given as "None". But if you want to find the histogram of a particular region of the image, you have to create a mask image for that and give it as the mask. (I will show an example later.)
4. histSize : this represents our BIN count. It needs to be given in square brackets. For full scale, we pass \[256\].
5. ranges : this is our RANGE. Normally, it is \[0,256\].

So let's start with a sample image. Simply load an image in grayscale mode and find its full histogram.
@code{.py}
img = cv.imread('home.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
hist = cv.calcHist([img],[0],None,[256],[0,256])
@endcode
hist is a 256x1 array, where each value corresponds to the number of pixels in the image with that pixel value.

### 2\. Histogram Calculation in Numpy

Numpy also provides you a function, **np.histogram()**. So instead of the calcHist() function, you can try the line below :
@code{.py}
hist,bins = np.histogram(img.ravel(),256,[0,256])
@endcode
hist is the same as we calculated before. But bins will have 257 elements, because Numpy calculates bins as 0-0.99, 1-1.99, 2-2.99 etc. So the final range would be 255-255.99. To represent that, they also add 256 at the end of bins. But we don't need that 256. Up to 255 is sufficient.

@note Numpy has another function, **np.bincount()** which is much faster than (around 10X) np.histogram(). So for one-dimensional histograms, you can better try that. Don't forget to set minlength = 256 in np.bincount. For example, hist = np.bincount(img.ravel(),minlength=256)

@note The OpenCV function is faster (around 40X) than np.histogram(). So stick with the OpenCV function.

Now we should plot histograms, but how?

## Plotting Histograms

There are two ways for this:

1. Short Way : use Matplotlib plotting functions
2. Long Way : use OpenCV drawing functions

### 1\. Using Matplotlib

Matplotlib comes with a histogram plotting function : matplotlib.pyplot.hist()

It directly finds the histogram and plots it. You need not use the calcHist() or np.histogram() function to find the histogram. See the code below:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('home.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
# range is passed by keyword so the call does not rely on positional order
plt.hist(img.ravel(),256,range=[0,256])
plt.show()
@endcode
You will get a plot as below :

Or you can use the normal plot of matplotlib, which would be good for a BGR plot. For that, you need to find the histogram data first. Try the code below:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('home.jpg')
assert img is not None, "file could not be read, check with os.path.exists()"
color = ('b','g','r')
for i,col in enumerate(color):
    histr = cv.calcHist([img],[i],None,[256],[0,256])
    plt.plot(histr,color = col)
    plt.xlim([0,256])
plt.show()
@endcode
Result:

You can deduce from the above graph that blue has some high value areas in the image (obviously it should be due to the sky).

### 2\. Using OpenCV

Well, here you adjust the values of the histogram along with its bin values to look like x,y coordinates so that you can draw it using the cv.line() or cv.polylines() function to generate the same image as above. This is already available in the official OpenCV-Python samples. Check the code at samples/python/hist.py.

## Application of Mask

We used cv.calcHist() to find the histogram of the full image. What if you want to find histograms of some regions of an image? Just create a mask image with white color on the region you want to find the histogram of and black otherwise. Then pass this as the mask.
@code{.py}
img = cv.imread('home.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# create a mask
mask = np.zeros(img.shape[:2], np.uint8)
mask[100:300, 100:400] = 255
masked_img = cv.bitwise_and(img,img,mask = mask)

# Calculate histogram with mask and without mask
# Check third argument for mask
hist_full = cv.calcHist([img],[0],None,[256],[0,256])
hist_mask = cv.calcHist([img],[0],mask,[256],[0,256])

plt.subplot(221), plt.imshow(img, 'gray')
plt.subplot(222), plt.imshow(mask,'gray')
plt.subplot(223), plt.imshow(masked_img, 'gray')
plt.subplot(224), plt.plot(hist_full), plt.plot(hist_mask)
plt.xlim([0,256])

plt.show()
@endcode
See the result. In the histogram plot, the blue line shows the histogram of the full image while the green line shows the histogram of the masked region.

## Additional Resources

1. [Cambridge in Color website](http://www.cambridgeincolour.com/tutorials/histograms1.htm)

## [Py Histogram Equalization](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization/)

# Histograms - 2: Histogram Equalization {#tutorial\_py\_histogram\_equalization}

## Goal

In this section,

-   We will learn the concepts of histogram equalization and use it to improve the contrast of our images.

## Theory

Consider an image whose pixel values are confined to some specific range of values only. For example, a brighter image will have all pixels confined to high values. But a good image will have pixel values from all regions of the intensity range. So you need to stretch this histogram to either end (as given in the image below, from wikipedia) and that is what Histogram Equalization does (in simple words). This normally improves the contrast of the image.

I recommend reading the Wikipedia page on [Histogram Equalization](http://en.wikipedia.org/wiki/Histogram_equalization) for more details. It has a very good explanation with worked-out examples, so you should understand almost everything after reading it. Here, instead, we will look at its Numpy implementation. After that, we will see the OpenCV function.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('wiki.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

hist,bins = np.histogram(img.flatten(),256,[0,256])

cdf = hist.cumsum()
cdf_normalized = cdf * float(hist.max()) / cdf.max()

plt.plot(cdf_normalized, color = 'b')
plt.hist(img.flatten(),256,range=[0,256], color = 'r')
plt.xlim([0,256])
plt.legend(('cdf','histogram'), loc = 'upper left')
plt.show()
@endcode

You can see that the histogram lies in the brighter region. We need the full spectrum. For that, we need a transformation function which maps the input pixels in the brighter region to output pixels spanning the full range. That is what histogram equalization does.

Now we find the minimum value of the cdf (excluding 0) and apply the histogram equalization equation as given on the wiki page. Here I have used the masked array concept from Numpy. For a masked array, all operations are performed on the non-masked elements. You can read more about it in the Numpy docs on masked arrays.
@code{.py}
cdf_m = np.ma.masked_equal(cdf,0)
cdf_m = (cdf_m - cdf_m.min())*255/(cdf_m.max()-cdf_m.min())
cdf = np.ma.filled(cdf_m,0).astype('uint8')
@endcode
Now we have the look-up table that gives us the output pixel value for every input pixel value. So we just apply the transform.
@code{.py}
img2 = cdf[img]
@endcode
Now we calculate its histogram and cdf as before (you do it), and the result looks like below:
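The steps above can be tried end to end without `wiki.jpg`. This is a minimal sketch on a synthetic "bright" image (the random image and its value range are assumptions for illustration); it uses only Numpy and follows the same masked-array recipe.

```python
import numpy as np

# Synthetic bright image (illustrative): all values confined to 150..249.
rng = np.random.default_rng(0)
img = rng.integers(150, 250, size=(64, 64), dtype=np.uint8)

hist, bins = np.histogram(img.flatten(), 256, [0, 256])
cdf = hist.cumsum()

# Mask the zero entries so min()/max() ignore them, then rescale the cdf to 0..255.
cdf_m = np.ma.masked_equal(cdf, 0)
cdf_m = (cdf_m - cdf_m.min()) * 255 / (cdf_m.max() - cdf_m.min())
lut = np.ma.filled(cdf_m, 0).astype('uint8')

# Apply the look-up table to every pixel.
img2 = lut[img]

print(img.min(), img.max(), '->', img2.min(), img2.max())
```

The look-up table is named `lut` here instead of reusing `cdf`, purely to keep the original cumulative sum available for inspection.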

Another important feature is that even if the image were darker (instead of the brighter one we used), after equalization we would get almost the same image. As a result, histogram equalization is used as a "reference tool" to bring all images to the same lighting conditions. This is useful in many cases. For example, in face recognition, before training on the face data, the images of faces are histogram equalized so that they all have the same lighting conditions.

## Histogram Equalization in OpenCV

OpenCV has a function to do this, **cv.equalizeHist()**. Its input is just a grayscale image and its output is the histogram-equalized image.

Below is a simple code snippet showing its usage for the same image we used:
@code{.py}
img = cv.imread('wiki.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
equ = cv.equalizeHist(img)
res = np.hstack((img,equ)) #stacking images side-by-side
cv.imwrite('res.png',res)
@endcode

So now you can take different images with different lighting conditions, equalize them and check the results.

Histogram equalization works well when the histogram of the image is confined to a particular region. It won't work well where there are large intensity variations and the histogram covers a large region, i.e. both bright and dark pixels are present. Please check the Stack Overflow links in Additional Resources.

## CLAHE (Contrast Limited Adaptive Histogram Equalization)

The histogram equalization we just saw considers the global contrast of the image. In many cases, this is not a good idea. For example, the image below shows an input image and its result after global histogram equalization.

It is true that the background contrast has improved after histogram equalization. But compare the face of the statue in both images. We lost most of the information there due to over-brightness. This is because its histogram is not confined to a particular region as we saw in previous cases (try plotting the histogram of the input image; you will get more intuition).

To solve this problem, **adaptive histogram equalization** is used. Here, the image is divided into a grid of small blocks called "tiles" (tileGridSize is 8x8 by default in OpenCV, i.e. an 8x8 grid of tiles). Each of these blocks is then histogram equalized as usual. So in a small area, the histogram would be confined to a small region (unless there is noise). If noise is present, it will be amplified. To avoid this, **contrast limiting** is applied. If any histogram bin is above the specified contrast limit (by default 40 in OpenCV), those pixels are clipped and distributed uniformly to other bins before applying histogram equalization. After equalization, bilinear interpolation is applied to remove artifacts at tile borders.

The code snippet below shows how to apply CLAHE in OpenCV:
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('tsukuba_l.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# create a CLAHE object (Arguments are optional).
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)

cv.imwrite('clahe_2.jpg',cl1)
@endcode
See the result below and compare it with results above, especially the statue region:

## Additional Resources

1.  Wikipedia page on [Histogram Equalization](http://en.wikipedia.org/wiki/Histogram_equalization)
2.  [Masked Arrays in Numpy](http://docs.scipy.org/doc/numpy/reference/maskedarray.html)

Also check these Stack Overflow questions regarding contrast adjustment:

3.  [How can I adjust contrast in OpenCV in C?](http://stackoverflow.com/questions/10549245/how-can-i-adjust-contrast-in-opencv-in-c)
4.  [How do I equalize contrast & brightness of images using opencv?](http://stackoverflow.com/questions/10561222/how-do-i-equalize-contrast-brightness-of-images-using-opencv)

## [Py Table Of Contents Histograms](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_histograms/py_table_of_contents_histograms/)


# Histograms in OpenCV {#tutorial\_py\_table\_of\_contents\_histograms}

-   @subpage tutorial\_py\_histogram\_begins
    
    Learn the basics of histograms
    
-   @subpage tutorial\_py\_histogram\_equalization
    
    Learn to Equalize Histograms to get better contrast for images
    
-   @subpage tutorial\_py\_2d\_histogram
    
    Learn to find and plot 2D Histograms
    
-   @subpage tutorial\_py\_histogram\_backprojection
    
    Learn histogram backprojection to segment colored objects

## [Py Houghcircles](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_houghcircles/py_houghcircles/)


# Hough Circle Transform {#tutorial\_py\_houghcircles}

## Goal

In this chapter,

-   We will learn to use the Hough Transform to find circles in an image.
-   We will see these functions: **cv.HoughCircles()**

## Theory

A circle is represented mathematically as \\f$(x-x\_{center})^2 + (y - y\_{center})^2 = r^2\\f$ where \\f$(x\_{center},y\_{center})\\f$ is the center of the circle, and \\f$r\\f$ is the radius of the circle. From the equation, we can see that we have 3 parameters, so we need a 3D accumulator for the Hough transform, which would be highly inefficient. So OpenCV uses a trickier method, the **Hough Gradient Method**, which uses the gradient information of edges.

The function we use here is **cv.HoughCircles()**. It has plenty of arguments which are well explained in the documentation. So we go directly to the code.
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('opencv-logo-white.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)
cimg = cv.cvtColor(img,cv.COLOR_GRAY2BGR)

circles = cv.HoughCircles(img,cv.HOUGH_GRADIENT,1,20,
                            param1=50,param2=30,minRadius=0,maxRadius=0)

circles = np.uint16(np.around(circles))
for i in circles[0,:]:
    # draw the outer circle
    cv.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
    # draw the center of the circle
    cv.circle(cimg,(i[0],i[1]),2,(0,0,255),3)

cv.imshow('detected circles',cimg)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
Result is shown below:

## [Py Houghlines](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_houghlines/py_houghlines/)


# Hough Line Transform {#tutorial\_py\_houghlines}

## Goal

In this chapter,

-   We will understand the concept of the Hough Transform.
-   We will see how to use it to detect lines in an image.
-   We will see the following functions: **cv.HoughLines()**, **cv.HoughLinesP()**

## Theory

The Hough Transform is a popular technique to detect any shape, if you can represent that shape in a mathematical form. It can detect the shape even if it is broken or distorted a little bit. We will see how it works for a line.

A line can be represented as \\f$y = mx+c\\f$ or in a parametric form, as \\f$\\rho = x \\cos \\theta + y \\sin \\theta\\f$ where \\f$\\rho\\f$ is the perpendicular distance from the origin to the line, and \\f$\\theta\\f$ is the angle formed by this perpendicular line and the horizontal axis, measured counter-clockwise (that direction depends on how you represent the coordinate system; this representation is used in OpenCV). Check the image below:

So if the line passes below the origin, it will have a positive rho and an angle less than 180 degrees. If it passes above the origin, instead of taking an angle greater than 180 degrees, the angle is taken to be less than 180 degrees and rho is taken to be negative. Any vertical line will have an angle of 0 degrees and horizontal lines will have an angle of 90 degrees.

Now let's see how the Hough Transform works for lines. Any line can be represented in these two terms, \\f$(\\rho, \\theta)\\f$. So first it creates a 2D array or accumulator (to hold the values of the two parameters) and it is set to 0 initially. Let rows denote the \\f$\\rho\\f$ and columns denote the \\f$\\theta\\f$. Size of array depends on the accuracy you need. Suppose you want the accuracy of angles to be 1 degree, you will need 180 columns. For \\f$\\rho\\f$, the maximum distance possible is the diagonal length of the image. So taking one pixel accuracy, the number of rows can be the diagonal length of the image.

Consider a 100x100 image with a horizontal line at the middle. Take the first point of the line. You know its (x,y) values. Now in the line equation, put the values \\f$\\theta = 0,1,2,....,180\\f$ and check the \\f$\\rho\\f$ you get. For every \\f$(\\rho, \\theta)\\f$ pair, you increment the value of the corresponding \\f$(\\rho, \\theta)\\f$ cell in the accumulator by one. So now in the accumulator, the cell (50,90) = 1 along with some other cells.

Now take the second point on the line. Do the same as above. Increment the values in the cells corresponding to the `(rho, theta)` pairs you got. This time, the cell (50,90) = 2. What you are actually doing is voting for the \\f$(\\rho, \\theta)\\f$ values. You continue this process for every point on the line. At each point, the cell (50,90) will be incremented or voted up, while other cells may or may not be voted up. This way, at the end, the cell (50,90) will have the maximum number of votes. So if you search the accumulator for the maximum, you get the value (50,90), which says there is a line in this image at a distance of 50 from the origin and at an angle of 90 degrees. It is well shown in the animation below (Image Courtesy: [Amos Storkey](http://homepages.inf.ed.ac.uk/amos/hough.html) )

This is how the Hough transform works for lines. It is simple, and maybe you can implement it using Numpy on your own. Below is an image which shows the accumulator. Bright spots at some locations denote the parameters of possible lines in the image. (Image courtesy: [Wikipedia](http://en.wikipedia.org/wiki/Hough_transform) )
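As a minimal Numpy sketch of the voting procedure described above (this is an illustration, not OpenCV's implementation): it uses the 100x100 image with a horizontal line at y = 50, 1 degree angle steps over 0..179, and 1 pixel \\f$\\rho\\f$ steps. Shifting the row index by the image diagonal so that negative \\f$\\rho\\f$ values fit in the array is a choice made here.

```python
import numpy as np

h, w = 100, 100
xs = np.arange(w)          # every x along the line
ys = np.full(w, 50)        # horizontal line at y = 50

thetas = np.deg2rad(np.arange(180))        # 0..179 degrees
diag = int(np.ceil(np.hypot(h, w)))        # upper bound on |rho|
# Rows index rho (shifted by diag to allow negative rho), columns index theta.
acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)

for x, y in zip(xs, ys):
    # One rho per theta for this point; each (rho, theta) cell gets one vote.
    rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    acc[rhos + diag, np.arange(len(thetas))] += 1

rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
best_rho, best_theta = int(rho_idx) - diag, int(theta_idx)
print(best_rho, best_theta, int(acc.max()))
```

The cell with the most votes corresponds to the (50, 90) cell discussed in the text, with one vote from each of the 100 points on the line.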

## Hough Transform in OpenCV

Everything explained above is encapsulated in the OpenCV function, **cv.HoughLines()**. It simply returns an array of \\f$(\\rho, \\theta)\\f$ values. \\f$\\rho\\f$ is measured in pixels and \\f$\\theta\\f$ is measured in radians. The first parameter, the input image, should be a binary image, so apply thresholding or Canny edge detection before applying the Hough transform. The second and third parameters are the \\f$\\rho\\f$ and \\f$\\theta\\f$ accuracies respectively. The fourth argument is the threshold, i.e. the minimum number of votes a candidate must get to be considered a line. Remember, the number of votes depends upon the number of points on the line. So it represents the minimum length of line that should be detected.

@include hough\_line\_transform.py

Check the results below:

## Probabilistic Hough Transform

In the Hough transform, you can see that even for a line with two parameters, it takes a lot of computation. The Probabilistic Hough Transform is an optimization of the Hough Transform we saw. It doesn't take all the points into consideration. Instead, it takes only a random subset of points which is sufficient for line detection. We just have to decrease the threshold. See the image below, which compares the Hough Transform and the Probabilistic Hough Transform in Hough space. (Image Courtesy : [Franck Bettinger's home page](http://phdfb1.free.fr/robot/mscthesis/node14.html) )

The OpenCV implementation is based on "Robust Detection of Lines Using the Progressive Probabilistic Hough Transform" by Matas, J., Galambos, C. and Kittler, J.V. @cite Matas00. The function used is **cv.HoughLinesP()**. It has two new arguments.

-   **minLineLength** - Minimum length of line. Line segments shorter than this are rejected.
-   **maxLineGap** - Maximum allowed gap between line segments to treat them as a single line.

The best thing is that it directly returns the two endpoints of each line. In the previous case, you got only the parameters of the lines, and you had to find all the points. Here, everything is direct and simple.

@include probabilistic\_hough\_line\_transform.py

See the results below:

## Additional Resources

1.  [Hough Transform on Wikipedia](http://en.wikipedia.org/wiki/Hough_transform)

## [Py Morphological Ops](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops/)


# Morphological Transformations {#tutorial\_py\_morphological\_ops}

## Goal

In this chapter,

-   We will learn different morphological operations like Erosion, Dilation, Opening, Closing etc.
-   We will see different functions like: **cv.erode()**, **cv.dilate()**, **cv.morphologyEx()** etc.

## Theory

Morphological transformations are simple operations based on the image shape. They are normally performed on binary images. They need two inputs: one is our original image, the second is called the **structuring element** or **kernel**, which decides the nature of the operation. The two basic morphological operators are Erosion and Dilation. Variants such as Opening, Closing, Gradient etc. also come into play. We will see them one by one with the help of the following image:

### 1\. Erosion

The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object (always try to keep the foreground in white). So what does it do? The kernel slides through the image (as in 2D convolution). A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero).

So what happens is that all the pixels near the boundary will be discarded, depending upon the size of the kernel. So the thickness or size of the foreground object decreases, or simply the white region decreases in the image. It is useful for removing small white noise (as we have seen in the colorspace chapter), detaching two connected objects etc.

Here, as an example, I use a 5x5 kernel full of ones. Let's see how it works:
@code{.py}
import cv2 as cv
import numpy as np

img = cv.imread('j.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
kernel = np.ones((5,5),np.uint8)
erosion = cv.erode(img,kernel,iterations = 1)
@endcode
Result:

### 2\. Dilation

It is just the opposite of erosion. Here, a pixel element is '1' if at least one pixel under the kernel is '1'. So it increases the white region in the image, or the size of the foreground object increases. Normally, in cases like noise removal, erosion is followed by dilation, because erosion removes white noise but also shrinks our object. So we dilate it. Since the noise is gone, it won't come back, but our object area increases. It is also useful in joining broken parts of an object.
@code{.py}
dilation = cv.dilate(img,kernel,iterations = 1)
@endcode
Result:

### 3\. Opening

Opening is just another name for **erosion followed by dilation**. It is useful in removing noise, as we explained above. Here we use the function, **cv.morphologyEx()**
@code{.py}
opening = cv.morphologyEx(img, cv.MORPH_OPEN, kernel)
@endcode
Result:

### 4\. Closing

Closing is the reverse of Opening, **Dilation followed by Erosion**. It is useful in closing small holes inside the foreground objects, or small black points on the object.
@code{.py}
closing = cv.morphologyEx(img, cv.MORPH_CLOSE, kernel)
@endcode
Result:

### 5\. Morphological Gradient

It is the difference between dilation and erosion of an image.

The result will look like the outline of the object.
@code{.py}
gradient = cv.morphologyEx(img, cv.MORPH_GRADIENT, kernel)
@endcode
Result:

### 6\. Top Hat

It is the difference between the input image and the Opening of the image. The example below is done for a 9x9 kernel.
@code{.py}
tophat = cv.morphologyEx(img, cv.MORPH_TOPHAT, kernel)
@endcode
Result:

### 7\. Black Hat

It is the difference between the closing of the input image and the input image.
@code{.py}
blackhat = cv.morphologyEx(img, cv.MORPH_BLACKHAT, kernel)
@endcode
Result:

## Structuring Element

We manually created the structuring elements in the previous examples with the help of Numpy. They are rectangular. But in some cases, you may need elliptical or circular kernels. For this purpose, OpenCV has a function, **cv.getStructuringElement()**. You just pass the shape and size of the kernel, and you get the desired kernel.
@code{.py}
# Rectangular Kernel
>>> cv.getStructuringElement(cv.MORPH_RECT,(5,5))
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=uint8)

# Elliptical Kernel
>>> cv.getStructuringElement(cv.MORPH_ELLIPSE,(5,5))
array([[0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0]], dtype=uint8)

# Cross-shaped Kernel
>>> cv.getStructuringElement(cv.MORPH_CROSS,(5,5))
array([[0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]], dtype=uint8)

# Diamond-shaped Kernel
>>> cv.getStructuringElement(cv.MORPH_DIAMOND,(5,5))
array([[0, 0, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 0, 0]], dtype=uint8)
@endcode

## Additional Resources

1.  [Morphological Operations](http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm) at HIPR2

## [Py Pyramids](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_pyramids/py_pyramids/)


# Image Pyramids {#tutorial\_py\_pyramids}

## Goal

In this chapter,

-   We will learn about Image Pyramids
-   We will use Image pyramids to create a new fruit, "Orapple"
-   We will see these functions: **cv.pyrUp()**, **cv.pyrDown()**

## Theory

Normally, we work with an image of constant size. But on some occasions, we need to work with the same image at different resolutions. For example, while searching for something in an image, like a face, we are not sure at what size the object will be present in the image. In that case, we need to create a set of copies of the same image at different resolutions and search for the object in all of them. Such a set of images at different resolutions is called an **Image Pyramid** (because when the images are kept in a stack with the highest resolution image at the bottom and the lowest resolution image at the top, it looks like a pyramid).

There are two kinds of Image Pyramids. 1) **Gaussian Pyramid** and 2) **Laplacian Pyramids**

A higher level (lower resolution) in a Gaussian Pyramid is formed by removing consecutive rows and columns in the lower level (higher resolution) image. Each pixel in the higher level is then formed by the contribution from 5 pixels in the underlying level with gaussian weights. By doing so, an \\f$M \\times N\\f$ image becomes an \\f$M/2 \\times N/2\\f$ image. So the area reduces to one-fourth of the original area. It is called an Octave. The same pattern continues as we go higher in the pyramid (i.e. resolution decreases). Similarly, while expanding, the area becomes 4 times larger at each level. We can find Gaussian pyramids using the **cv.pyrDown()** and **cv.pyrUp()** functions.
@code{.py}
higher_reso = cv.imread('messi5.jpg')
assert higher_reso is not None, "file could not be read, check with os.path.exists()"
lower_reso = cv.pyrDown(higher_reso)
@endcode
Below are 4 levels in an image pyramid.

Now you can go down the image pyramid with the **cv.pyrUp()** function.
@code{.py}
higher_reso2 = cv.pyrUp(lower_reso)
@endcode
Remember, higher\_reso2 is not equal to higher\_reso, because once you decrease the resolution, you lose information. The image below is 3 levels down the pyramid, created from the smallest image in the previous case. Compare it with the original image:

Laplacian Pyramids are formed from the Gaussian Pyramids. There is no exclusive function for that. Laplacian pyramid images look like edge images. Most of their elements are zeros. They are used in image compression. A level in a Laplacian Pyramid is formed by the difference between that level in the Gaussian Pyramid and the expanded version of its upper level in the Gaussian Pyramid. Three levels of a Laplacian Pyramid look like below (contrast is adjusted to enhance the contents):

## Image Blending using Pyramids

One application of Pyramids is Image Blending. For example, in image stitching, you will need to stack two images together, but it may not look good due to discontinuities between the images. In that case, image blending with Pyramids gives you seamless blending without losing much data in the images. One classical example of this is the blending of two fruits, Orange and Apple. See the result first to understand what I am saying:

Please check the first reference in Additional Resources; it has full diagrammatic details on image blending, Laplacian Pyramids etc. In short, it is done as follows:

1.  Load the two images of apple and orange
2.  Find the Gaussian Pyramids for apple and orange (in this particular example, the number of levels is 6)
3.  From the Gaussian Pyramids, find their Laplacian Pyramids
4.  Now join the left half of apple and the right half of orange in each level of the Laplacian Pyramids
5.  Finally, from this joint image pyramid, reconstruct the original image.

Below is the full code. (For the sake of simplicity, each step is done separately, which may take more memory. You can optimize it if you want.)
@code{.py}
import cv2 as cv
import numpy as np

A = cv.imread('apple.jpg')
B = cv.imread('orange.jpg')
assert A is not None, "file could not be read, check with os.path.exists()"
assert B is not None, "file could not be read, check with os.path.exists()"

# generate Gaussian pyramid for A
G = A.copy()
gpA = [G]
for i in range(6):
    G = cv.pyrDown(G)
    gpA.append(G)

# generate Gaussian pyramid for B
G = B.copy()
gpB = [G]
for i in range(6):
    G = cv.pyrDown(G)
    gpB.append(G)

# generate Laplacian Pyramid for A
lpA = [gpA[5]]
for i in range(5,0,-1):
    GE = cv.pyrUp(gpA[i])
    L = cv.subtract(gpA[i-1],GE)
    lpA.append(L)

# generate Laplacian Pyramid for B
lpB = [gpB[5]]
for i in range(5,0,-1):
    GE = cv.pyrUp(gpB[i])
    L = cv.subtract(gpB[i-1],GE)
    lpB.append(L)

# Now add left and right halves of images in each level
LS = []
for la,lb in zip(lpA,lpB):
    rows,cols,dpt = la.shape
    ls = np.hstack((la[:,0:cols//2], lb[:,cols//2:]))
    LS.append(ls)

# now reconstruct
ls_ = LS[0]
for i in range(1,6):
    ls_ = cv.pyrUp(ls_)
    ls_ = cv.add(ls_, LS[i])

# image with direct connecting each half
real = np.hstack((A[:,:cols//2],B[:,cols//2:]))

cv.imwrite('Pyramid_blending2.jpg',ls_)
cv.imwrite('Direct_blending.jpg',real)
@endcode

## Additional Resources

1.  [Image Blending](http://pages.cs.wisc.edu/~csverma/CS766_09/ImageMosaic/imagemosaic.html)

## [Py Table Of Contents Imgproc](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_table_of_contents_imgproc/)


# Image Processing in OpenCV {#tutorial\_py\_table\_of\_contents\_imgproc}

-   @subpage tutorial\_py\_colorspaces
    
    Learn to change images between different color spaces. Plus learn to track a colored object in a video.
    
-   @subpage tutorial\_py\_geometric\_transformations
    
    Learn to apply different geometric transformations to images like rotation, translation etc.
    
-   @subpage tutorial\_py\_thresholding
    
    Learn to convert images to binary images using global thresholding, Adaptive thresholding, Otsu's binarization etc
    
-   @subpage tutorial\_py\_filtering
    
    Learn to blur the images, filter the images with custom kernels etc.
    
-   @subpage tutorial\_py\_morphological\_ops
    
    Learn about morphological transformations like Erosion, Dilation, Opening, Closing etc
    
-   @subpage tutorial\_py\_gradients
    
    Learn to find image gradients, edges etc.
    
-   @subpage tutorial\_py\_canny
    
    Learn to find edges with Canny Edge Detection
    
-   @subpage tutorial\_py\_pyramids
    
    Learn about image pyramids and how to use them for image blending
    
-   @subpage tutorial\_py\_table\_of\_contents\_contours
    
    All about Contours in OpenCV
    
-   @subpage tutorial\_py\_table\_of\_contents\_histograms
    
    All about histograms in OpenCV
    
-   @subpage tutorial\_py\_table\_of\_contents\_transforms
    
    Meet different Image Transforms in OpenCV like Fourier Transform, Cosine Transform etc.
    
-   @subpage tutorial\_py\_template\_matching
    
    Learn to search for an object in an image using Template Matching
    
-   @subpage tutorial\_py\_houghlines
    
    Learn to detect lines in an image
    
-   @subpage tutorial\_py\_houghcircles
    
    Learn to detect circles in an image
    
-   @subpage tutorial\_py\_watershed
    
    Learn to segment images with watershed segmentation
    
-   @subpage tutorial\_py\_grabcut
    
    Learn to extract foreground with GrabCut algorithm

## [Py Template Matching](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_template_matching/py_template_matching/)


# Template Matching {#tutorial\_py\_template\_matching}

## Goals

In this chapter, you will learn

-   To find objects in an image using Template Matching
-   You will see these functions: **cv.matchTemplate()**, **cv.minMaxLoc()**

## Theory

Template Matching is a method for searching and finding the location of a template image in a larger image. OpenCV comes with a function **cv.matchTemplate()** for this purpose. It simply slides the template image over the input image (as in 2D convolution) and compares the template with the patch of the input image under it. Several comparison methods are implemented in OpenCV. (You can check the docs for more details.) It returns a grayscale image, where each pixel denotes how well the neighbourhood of that pixel matches the template.

If the input image is of size (WxH) and the template image is of size (wxh), the output image will have a size of (W-w+1, H-h+1). Once you have the result, you can use the **cv.minMaxLoc()** function to find where the maximum/minimum value is. Take it as the top-left corner of a rectangle and take (w,h) as the width and height of the rectangle. That rectangle is the region matching your template.

@note If you are using cv.TM\_SQDIFF or cv.TM\_SQDIFF\_NORMED as the comparison method, the minimum value gives the best match.

## Template Matching in OpenCV

Here, as an example, we will search for Messi's face in his photo. So I created a template as below:

We will try all the comparison methods so that we can see what their results look like:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img2 = img.copy()
template = cv.imread('template.jpg', cv.IMREAD_GRAYSCALE)
assert template is not None, "file could not be read, check with os.path.exists()"
w, h = template.shape[::-1]

# All the 6 methods for comparison in a list
methods = ['TM_CCOEFF', 'TM_CCOEFF_NORMED', 'TM_CCORR',
            'TM_CCORR_NORMED', 'TM_SQDIFF', 'TM_SQDIFF_NORMED']

for meth in methods:
    img = img2.copy()
    method = getattr(cv, meth)

    # Apply template Matching
    res = cv.matchTemplate(img,template,method)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)

    # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
    if method in [cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED]:
        top_left = min_loc
    else:
        top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)

    cv.rectangle(img,top_left, bottom_right, 255, 2)

    plt.subplot(121),plt.imshow(res,cmap = 'gray')
    plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
    plt.subplot(122),plt.imshow(img,cmap = 'gray')
    plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
    plt.suptitle(meth)

    plt.show()
@endcode
See the results below:

-   cv.TM\_CCOEFF

-   cv.TM\_CCOEFF\_NORMED

-   cv.TM\_CCORR

-   cv.TM\_CCORR\_NORMED

-   cv.TM\_SQDIFF

-   cv.TM\_SQDIFF\_NORMED

You can see that the result using **cv.TM\_CCORR** is not as good as we expected.

## Template Matching with Multiple Objects

In the previous section, we searched the image for Messi's face, which occurs only once in the image. Suppose you are searching for an object which has multiple occurrences; **cv.minMaxLoc()** won't give you all the locations. In that case, we will use thresholding. So in this example, we will use a screenshot of the famous game **Mario** and we will find the coins in it.
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img_rgb = cv.imread('mario.png')
assert img_rgb is not None, "file could not be read, check with os.path.exists()"
img_gray = cv.cvtColor(img_rgb, cv.COLOR_BGR2GRAY)
template = cv.imread('mario_coin.png', cv.IMREAD_GRAYSCALE)
assert template is not None, "file could not be read, check with os.path.exists()"
w, h = template.shape[::-1]

res = cv.matchTemplate(img_gray,template,cv.TM_CCOEFF_NORMED)
threshold = 0.8
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
    cv.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2)

cv.imwrite('res.png',img_rgb)
@endcode
Result:

## [Py Thresholding](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_thresholding/py_thresholding/)


# Image Thresholding {#tutorial\_py\_thresholding}

## Goal

-   In this tutorial, you will learn simple thresholding, adaptive thresholding and Otsu's thresholding.
-   You will learn the functions **cv.threshold** and **cv.adaptiveThreshold**.

## Simple Thresholding

Here, the matter is straightforward. For every pixel, the same threshold value is applied. If the pixel value is smaller than or equal to the threshold, it is set to 0, otherwise it is set to a maximum value. The function **cv.threshold** is used to apply the thresholding. The first argument is the source image, which **should be a grayscale image**. The second argument is the threshold value which is used to classify the pixel values. The third argument is the maximum value which is assigned to pixel values exceeding the threshold. OpenCV provides different types of thresholding, selected by the fourth parameter of the function. Basic thresholding as described above is done by using the type cv.THRESH\_BINARY. All simple thresholding types are:

-   cv.THRESH\_BINARY
-   cv.THRESH\_BINARY\_INV
-   cv.THRESH\_TRUNC
-   cv.THRESH\_TOZERO
-   cv.THRESH\_TOZERO\_INV

See the documentation of the types for the differences.

The method returns two outputs. The first is the threshold that was used and the second output is the **thresholded image**.
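To make the differences between the five types concrete, here is a NumPy sketch of the rule each one applies per pixel. The helper `simple_threshold` and its string `kind` argument are hypothetical names used only for this illustration; it is not how **cv.threshold** is implemented, only a restatement of the documented rules.

```python
import numpy as np

def simple_threshold(src, thresh, maxval, kind):
    """NumPy sketch of the five simple thresholding rules (illustrative only)."""
    above = src > thresh
    if kind == "BINARY":        # maxval where src > thresh, else 0
        out = np.where(above, maxval, 0)
    elif kind == "BINARY_INV":  # 0 where src > thresh, else maxval
        out = np.where(above, 0, maxval)
    elif kind == "TRUNC":       # clip values above thresh down to thresh
        out = np.where(above, thresh, src)
    elif kind == "TOZERO":      # keep values above thresh, zero the rest
        out = np.where(above, src, 0)
    elif kind == "TOZERO_INV":  # zero values above thresh, keep the rest
        out = np.where(above, 0, src)
    else:
        raise ValueError(kind)
    return out.astype(src.dtype)

row = np.array([0, 100, 127, 128, 255], dtype=np.uint8)
for kind in ("BINARY", "BINARY_INV", "TRUNC", "TOZERO", "TOZERO_INV"):
    print(kind, simple_threshold(row, 127, 255, kind))
```

Note that the pixel equal to the threshold (127) is treated as "not above" in every mode, matching the "smaller than or equal" wording above.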

This code compares the different simple thresholding types:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('gradient.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
ret,thresh1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
ret,thresh2 = cv.threshold(img,127,255,cv.THRESH_BINARY_INV)
ret,thresh3 = cv.threshold(img,127,255,cv.THRESH_TRUNC)
ret,thresh4 = cv.threshold(img,127,255,cv.THRESH_TOZERO)
ret,thresh5 = cv.threshold(img,127,255,cv.THRESH_TOZERO_INV)

titles = ['Original Image','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]

for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(images[i],'gray',vmin=0,vmax=255)
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])

plt.show()
@endcode
@note To plot multiple images, we have used the plt.subplot() function. Please check out the matplotlib docs for more details.

The code yields this result:

## Adaptive Thresholding

In the previous section, we used one global value as a threshold. But this might not be good in all cases, e.g. if an image has different lighting conditions in different areas. In that case, adaptive thresholding can help. Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image which gives better results for images with varying illumination.

In addition to the parameters described above, the method cv.adaptiveThreshold takes three input parameters:

The **adaptiveMethod** decides how the threshold value is calculated:

-   cv.ADAPTIVE\_THRESH\_MEAN\_C: The threshold value is the mean of the neighbourhood area minus the constant **C**.
-   cv.ADAPTIVE\_THRESH\_GAUSSIAN\_C: The threshold value is a gaussian-weighted sum of the neighbourhood values minus the constant **C**.

The **blockSize** determines the size of the neighbourhood area and **C** is a constant that is subtracted from the mean or weighted sum of the neighbourhood pixels.
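The mean variant can be sketched in a few lines of NumPy. The function `adaptive_mean_threshold` below is a hypothetical helper written for this explanation; it uses plain loops and edge replication at the border (an assumption of the sketch), so it is not expected to match **cv.adaptiveThreshold** bit for bit.

```python
import numpy as np

def adaptive_mean_threshold(src, maxval, block_size, C):
    """Sketch of mean adaptive thresholding: T(x, y) = local mean - C."""
    assert block_size % 2 == 1, "block_size must be odd"
    r = block_size // 2
    # Edge replication at the border is an assumption made for this sketch.
    padded = np.pad(src.astype(np.float64), r, mode="edge")
    out = np.zeros_like(src)
    rows, cols = src.shape
    for y in range(rows):
        for x in range(cols):
            local_mean = padded[y:y + block_size, x:x + block_size].mean()
            if src[y, x] > local_mean - C:
                out[y, x] = maxval
    return out

# A horizontal brightness ramp (uneven lighting) with one darker stroke in the bright half.
img = np.tile(np.linspace(40, 220, 12), (12, 1)).astype(np.uint8)
img[:, 9] = 120
result = adaptive_mean_threshold(img, 255, 5, 2)
print(result[0])
```

In this toy image the stroke in column 9 is darker than its surroundings and is set to 0, while ramp pixels in the interior are kept at 255 even where their absolute value is below 127, which is where a single global threshold would fail.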

The code below compares global thresholding and adaptive thresholding for an image with varying illumination:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('sudoku.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
img = cv.medianBlur(img,5)

ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,
            cv.THRESH_BINARY,11,2)
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv.THRESH_BINARY,11,2)

titles = ['Original Image', 'Global Thresholding (v = 127)',
            'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]

for i in range(4):
    plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
    plt.title(titles[i])
    plt.xticks([]),plt.yticks([])
plt.show()
@endcode
Result:

## Otsu's Binarization

In global thresholding, we used an arbitrarily chosen value as a threshold. In contrast, Otsu's method avoids having to choose a value and determines it automatically.

Consider an image with only two distinct image values (_bimodal image_), where the histogram would only consist of two peaks. A good threshold would be in the middle of those two values. Similarly, Otsu's method determines an optimal global threshold value from the image histogram.

In order to do so, the cv.threshold() function is used, where cv.THRESH\_OTSU is passed as an extra flag. The threshold value can be chosen arbitrarily. The algorithm then finds the optimal threshold value, which is returned as the first output.

Check out the example below. The input image is a noisy image. In the first case, global thresholding with a value of 127 is applied. In the second case, Otsu's thresholding is applied directly. In the third case, the image is first filtered with a 5x5 gaussian kernel to remove the noise, then Otsu thresholding is applied. See how noise filtering improves the result.
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('noisy2.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

# global thresholding
ret1,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)

# Otsu's thresholding
ret2,th2 = cv.threshold(img,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)

# Otsu's thresholding after Gaussian filtering
blur = cv.GaussianBlur(img,(5,5),0)
ret3,th3 = cv.threshold(blur,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)

# plot all the images and their histograms
images = [img, 0, th1,
          img, 0, th2,
          blur, 0, th3]
titles = ['Original Noisy Image','Histogram','Global Thresholding (v=127)',
          'Original Noisy Image','Histogram',"Otsu's Thresholding",
          'Gaussian filtered Image','Histogram',"Otsu's Thresholding"]

for i in range(3):
    plt.subplot(3,3,i*3+1),plt.imshow(images[i*3],'gray')
    plt.title(titles[i*3]), plt.xticks([]), plt.yticks([])
    plt.subplot(3,3,i*3+2),plt.hist(images[i*3].ravel(),256)
    plt.title(titles[i*3+1]), plt.xticks([]), plt.yticks([])
    plt.subplot(3,3,i*3+3),plt.imshow(images[i*3+2],'gray')
    plt.title(titles[i*3+2]), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
Result:

### How does Otsu's Binarization work?

This section demonstrates a Python implementation of Otsu's binarization to show how it actually works. If you are not interested, you can skip this.

Since we are working with bimodal images, Otsu's algorithm tries to find a threshold value (t) which minimizes the **weighted within-class variance** given by the relation:

\f[\sigma_w^2(t) = q_1(t)\sigma_1^2(t)+q_2(t)\sigma_2^2(t)\f]

where

\f[q_1(t) = \sum_{i=1}^{t} P(i) \quad \& \quad q_2(t) = \sum_{i=t+1}^{I} P(i)\f]

\f[\mu_1(t) = \sum_{i=1}^{t} \frac{iP(i)}{q_1(t)} \quad \& \quad \mu_2(t) = \sum_{i=t+1}^{I} \frac{iP(i)}{q_2(t)}\f]

\f[\sigma_1^2(t) = \sum_{i=1}^{t} [i-\mu_1(t)]^2 \frac{P(i)}{q_1(t)} \quad \& \quad \sigma_2^2(t) = \sum_{i=t+1}^{I} [i-\mu_2(t)]^2 \frac{P(i)}{q_2(t)}\f]

It actually finds a value of t which lies between the two peaks such that the variances of both classes are minimal. It can be implemented in Python as follows:
@code{.py}
import cv2 as cv
import numpy as np

img = cv.imread('noisy2.png', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
blur = cv.GaussianBlur(img,(5,5),0)

# find normalized_histogram, and its cumulative distribution function
hist = cv.calcHist([blur],[0],None,[256],[0,256])
hist_norm = hist.ravel()/hist.sum()
Q = hist_norm.cumsum()

bins = np.arange(256)

fn_min = np.inf
thresh = -1

for i in range(1,256):
    p1,p2 = np.hsplit(hist_norm,[i]) # probabilities
    q1,q2 = Q[i],Q[255]-Q[i] # cum sum of classes
    if q1 < 1.e-6 or q2 < 1.e-6:
        continue
    b1,b2 = np.hsplit(bins,[i]) # weights

    # finding means and variances
    m1,m2 = np.sum(p1*b1)/q1, np.sum(p2*b2)/q2
    v1,v2 = np.sum(((b1-m1)**2)*p1)/q1,np.sum(((b2-m2)**2)*p2)/q2

    # calculates the minimization function
    fn = v1*q1 + v2*q2
    if fn < fn_min:
        fn_min = fn
        thresh = i

# find otsu's threshold value with OpenCV function
ret, otsu = cv.threshold(blur,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
print( "{} {}".format(thresh,ret) )
@endcode
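Because the total variance of the image is fixed and equals the sum of the within-class and between-class variances, minimizing the weighted within-class variance is equivalent to maximizing the between-class variance \f$q_1(t)q_2(t)[\mu_1(t)-\mu_2(t)]^2\f$. The sketch below uses that identity to evaluate all candidate thresholds at once with cumulative sums. The function name `otsu_threshold` and the synthetic histogram are assumptions made for illustration; this is not the code OpenCV runs internally.

```python
import numpy as np

def otsu_threshold(hist):
    """Return the bin t that maximizes the between-class variance
    for the classes [0..t] and [t+1..last bin]."""
    p = hist.astype(np.float64) / hist.sum()
    bins = np.arange(p.size)
    q1 = np.cumsum(p)            # weight of class 1 for every candidate t
    q2 = 1.0 - q1                # weight of class 2
    m = np.cumsum(p * bins)      # cumulative first moment
    mu_total = m[-1]
    valid = (q1 > 1e-6) & (q2 > 1e-6)
    sigma_b = np.zeros_like(q1)
    # q1*q2*(mu1-mu2)^2 rewritten as (mu_total*q1 - m)^2 / (q1*q2)
    sigma_b[valid] = (mu_total * q1[valid] - m[valid]) ** 2 / (q1[valid] * q2[valid])
    return int(np.argmax(sigma_b))

# Synthetic bimodal histogram: one dark mode and one bright mode.
hist = np.zeros(256)
hist[40:61] = 10
hist[190:211] = 10
t = otsu_threshold(hist)
print(t)
```

The loop over 255 candidates is replaced by two cumulative sums, which is one of the usual ways to speed up the method.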

## Additional Resources

\-# Digital Image Processing, Rafael C. Gonzalez

## Exercises

\-# There are some optimizations available for Otsu's binarization. You can search for them and implement them.

## [Py Fourier Transform](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_transforms/py_fourier_transform/py_fourier_transform/)


# Fourier Transform {#tutorial\_py\_fourier\_transform}

## Goal

In this section, we will learn

-   To find the Fourier Transform of images using OpenCV
-   To utilize the FFT functions available in Numpy
-   Some applications of Fourier Transform
-   We will see the following functions: **cv.dft()**, **cv.idft()** etc

## Theory

Fourier Transform is used to analyze the frequency characteristics of various filters. For images, the **2D Discrete Fourier Transform (DFT)** is used to find the frequency domain. A fast algorithm called the **Fast Fourier Transform (FFT)** is used to calculate the DFT. Details about these can be found in any image processing or signal processing textbook. Please see the Additional Resources section.

For a sinusoidal signal, \f$x(t) = A \sin(2 \pi ft)\f$, we can say \f$f\f$ is the frequency of the signal, and if its frequency domain is taken, we can see a spike at \f$f\f$. If the signal is sampled to form a discrete signal, we get the same frequency domain, but it is periodic in the range \f$[- \pi, \pi]\f$ or \f$[0,2\pi]\f$ (or \f$[0,N]\f$ for an N-point DFT). You can consider an image as a signal which is sampled in two directions. So taking the Fourier transform in both the X and Y directions gives you the frequency representation of the image.
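The spike at \f$f\f$ is easy to reproduce in one dimension. The sketch below samples a 5 Hz sinusoid and locates the peak of its magnitude spectrum; the sampling rate `fs`, the length `N` and the amplitude are values chosen only for this illustration.

```python
import numpy as np

fs = 100   # sampling rate in Hz (assumed for this sketch)
f = 5      # frequency of the sinusoid in Hz
N = 200    # number of samples, so the frequency resolution is fs/N = 0.5 Hz
t = np.arange(N) / fs
x = 2.0 * np.sin(2 * np.pi * f * t)

# Magnitude spectrum of the real signal and the frequency of each bin.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(N, d=1 / fs)

peak_freq = freqs[np.argmax(spectrum)]
print(peak_freq)
```

Because the window holds a whole number of periods, all the energy falls into a single bin; for other lengths the spike spreads over neighbouring bins.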

More intuitively, for the sinusoidal signal, if the amplitude varies quickly over a short time, you can say it is a high frequency signal. If it varies slowly, it is a low frequency signal. You can extend the same idea to images. Where does the amplitude vary drastically in images? At edge points and in noise. So we can say that edges and noise are the high frequency content of an image. If there are not many changes in amplitude, it is a low frequency component. (Some links are added to the Additional Resources section which explain the frequency transform intuitively with examples.)

Now we will see how to find the Fourier Transform.

## Fourier Transform in Numpy

First we will see how to find the Fourier Transform using Numpy. Numpy has an FFT package to do this. **np.fft.fft2()** provides us the frequency transform, which will be a complex array. Its first argument is the input image, which is grayscale. The second argument is optional and decides the size of the output array. If it is greater than the size of the input image, the input image is padded with zeros before the FFT is calculated. If it is less than the size of the input image, the input image will be cropped. If no size is passed, the output array will have the same size as the input.
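The effect of the optional size argument (named `s` in NumPy) can be checked on a tiny array. The 3x4 array below is a made-up stand-in for a grayscale image.

```python
import numpy as np

img = np.arange(12, dtype=np.float64).reshape(3, 4)  # stand-in for a grayscale image

same = np.fft.fft2(img)                # output has the input's shape
padded = np.fft.fft2(img, s=(8, 8))    # input is zero-padded to 8x8 first
cropped = np.fft.fft2(img, s=(2, 2))   # only the top-left 2x2 block is used

print(same.shape, padded.shape, cropped.shape)

# The DC term at index [0, 0] is the sum of all pixels that entered the transform.
print(same[0, 0].real, img.sum())
```

The DC term sitting at index `[0, 0]` is the reason the result is usually shifted before it is displayed.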

Once you have the result, the zero frequency component (DC component) will be at the top-left corner. If you want to bring it to the center, you need to shift the result by \f$\frac{N}{2}\f$ in both directions. This is done by the function **np.fft.fftshift()**. (The result is easier to analyze.) Once you have the frequency transform, you can find the magnitude spectrum.
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20*np.log(np.abs(fshift))

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
The result looks like this:

You can see a brighter region at the center, showing that low frequency content dominates.

So you found the frequency transform. Now you can do some operations in the frequency domain, like high pass filtering, and reconstruct the image, i.e. find the inverse DFT. For that you simply remove the low frequencies by masking with a rectangular window of size 60x60. Then apply the inverse shift using **np.fft.ifftshift()** so that the DC component again comes to the top-left corner. Then find the inverse FFT using the **np.fft.ifft2()** function. The result, again, will be a complex number. You can take its absolute value or, as in the code below, its real part.
@code{.py}
rows, cols = img.shape
crow, ccol = rows//2, cols//2
fshift[crow-30:crow+31, ccol-30:ccol+31] = 0
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)

plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Image after HPF'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(img_back)
plt.title('Result in JET'), plt.xticks([]), plt.yticks([])

plt.show()
@endcode
The result looks like this:

The result shows that High Pass Filtering is an edge detection operation. This is what we have seen in the Image Gradients chapter. It also shows that most of the image data is present in the low frequency region of the spectrum. Anyway, we have seen how to find the DFT, IDFT etc. in Numpy. Now let's see how to do it in OpenCV.

If you look closely at the result, especially the last image in JET color, you can see some artifacts (one instance is marked with a red arrow). It shows some ripple-like structures, which are called **ringing effects**. They are caused by the rectangular window we used for masking. In the spatial domain, a rectangular mask corresponds to a sinc-shaped kernel, which causes this problem. So rectangular windows are not used for filtering. A better option is a Gaussian window.
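One way to try the Gaussian alternative is to replace the hard rectangular mask by a smooth one. The sketch below builds a Gaussian high-pass mask in NumPy and applies it to a random array standing in for a grayscale image. The helper `gaussian_highpass_mask` and the value `sigma=10` are assumptions made for this illustration, not part of the tutorial's code.

```python
import numpy as np

def gaussian_highpass_mask(rows, cols, sigma):
    """Centered mask that is 0 at the DC term and approaches 1 far from it."""
    crow, ccol = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    d2 = (y - crow) ** 2 + (x - ccol) ** 2   # squared distance from the center
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                   # stand-in for a grayscale image

fshift = np.fft.fftshift(np.fft.fft2(img))
mask = gaussian_highpass_mask(*img.shape, sigma=10)
img_back = np.real(np.fft.ifft2(np.fft.ifftshift(fshift * mask)))

print(mask[32, 32], abs(img_back.mean()) < 1e-9)
```

Since the mask falls off smoothly instead of cutting abruptly, its spatial-domain counterpart has no oscillating side lobes. With a real image, `img_back` can be displayed exactly as in the code above to compare the two results.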

## Fourier Transform in OpenCV

OpenCV provides the functions **cv.dft()** and **cv.idft()** for this. They return the same result as before, but with two channels. The first channel holds the real part of the result and the second channel holds the imaginary part. The input image should be converted to np.float32 first. We will see how to do it.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "file could not be read, check with os.path.exists()"

dft = cv.dft(np.float32(img),flags = cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)

magnitude_spectrum = 20*np.log(cv.magnitude(dft_shift[:,:,0],dft_shift[:,:,1]))

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode

@note You can also use **cv.cartToPolar()** which returns both magnitude and phase in a single shot

So, now we have to do the inverse DFT. In the previous section, we created a HPF; this time we will see how to remove the high frequency content of the image, i.e. we apply a LPF to the image. It actually blurs the image. For this, we first create a mask with a high value (1) at low frequencies, i.e. we pass the LF content, and 0 in the HF region.

@code{.py}
rows, cols = img.shape
crow, ccol = rows//2, cols//2

# create a mask first, center square is 1, remaining all zeros
mask = np.zeros((rows,cols,2),np.uint8)
mask[crow-30:crow+30, ccol-30:ccol+30] = 1

# apply mask and inverse DFT
fshift = dft_shift*mask
f_ishift = np.fft.ifftshift(fshift)
img_back = cv.idft(f_ishift)
img_back = cv.magnitude(img_back[:,:,0],img_back[:,:,1])

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(img_back, cmap = 'gray')
plt.title('Image after LPF'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
See the result:

@note As usual, the OpenCV functions **cv.dft()** and **cv.idft()** are faster than their Numpy counterparts. But the Numpy functions are more user-friendly. For more details about performance issues, see the section below.

## Performance Optimization of DFT

The performance of the DFT calculation is better for some array sizes. It is fastest when the array size is a power of two. Arrays whose size is a product of 2's, 3's, and 5's are also processed quite efficiently. So if you are worried about the performance of your code, you can change the size of the array to an optimal size (by padding with zeros) before finding the DFT. For OpenCV, you have to pad the zeros manually. For Numpy, you specify the new size of the FFT calculation and it pads the zeros for you automatically.
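To see what "a product of 2's, 3's, and 5's" means for a concrete size, the sketch below searches upward for the next such number. The helpers `is_5_smooth` and `next_5_smooth` are hypothetical names for this illustration; the brute-force search only demonstrates the idea and is not how OpenCV computes its optimal size.

```python
def is_5_smooth(n):
    """True if n has no prime factors other than 2, 3 and 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def next_5_smooth(n):
    """Smallest integer >= n of the form 2^a * 3^b * 5^c (brute-force search)."""
    assert n >= 1, "size must be positive"
    while not is_5_smooth(n):
        n += 1
    return n

# Sizes of the 342x548 image used in this section.
print(next_5_smooth(342), next_5_smooth(548))
```

For the 342x548 image used in this section, this gives 360 (2^3 * 3^2 * 5) and 576 (2^6 * 3^2).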

So how do we find this optimal size? OpenCV provides a function, **cv.getOptimalDFTSize()**, for this. It is applicable to both **cv.dft()** and **np.fft.fft2()**. Let's check their performance using the IPython magic command %timeit.
@code{.py}
In [15]: img = cv.imread('messi5.jpg', cv.IMREAD_GRAYSCALE)
In [16]: assert img is not None, "file could not be read, check with os.path.exists()"
In [17]: rows,cols = img.shape
In [18]: print("{} {}".format(rows,cols))
342 548

In [19]: nrows = cv.getOptimalDFTSize(rows)
In [20]: ncols = cv.getOptimalDFTSize(cols)
In [21]: print("{} {}".format(nrows,ncols))
360 576
@endcode
See, the size (342,548) is modified to (360, 576). Now let's pad it with zeros (for OpenCV) and measure the DFT calculation performance. You can do it by creating a new, bigger zero array and copying the data to it, or by using **cv.copyMakeBorder()**.
@code{.py}
nimg = np.zeros((nrows,ncols))
nimg[:rows,:cols] = img
@endcode
OR:
@code{.py}
right = ncols - cols
bottom = nrows - rows
bordertype = cv.BORDER_CONSTANT #just to avoid line breakup in PDF file
nimg = cv.copyMakeBorder(img,0,bottom,0,right,bordertype, value = 0)
@endcode
Now we compare the DFT performance of the Numpy function:
@code{.py}
In [22]: %timeit fft1 = np.fft.fft2(img)
10 loops, best of 3: 40.9 ms per loop
In [23]: %timeit fft2 = np.fft.fft2(img,[nrows,ncols])
100 loops, best of 3: 10.4 ms per loop
@endcode
It shows a 4x speedup. Now we will try the same with OpenCV functions.
@code{.py}
In [24]: %timeit dft1= cv.dft(np.float32(img),flags=cv.DFT_COMPLEX_OUTPUT)
100 loops, best of 3: 13.5 ms per loop
In [27]: %timeit dft2= cv.dft(np.float32(nimg),flags=cv.DFT_COMPLEX_OUTPUT)
100 loops, best of 3: 3.11 ms per loop
@endcode
It also shows a 4x speedup. You can also see that the OpenCV functions are around 3x faster than the Numpy functions. This can be tested for the inverse FFT also, and that is left as an exercise for you.

## Why Laplacian is a High Pass Filter?

A similar question was asked in a forum: why is the Laplacian a high pass filter? Why is Sobel a HPF? And so on. The first answer given was in terms of the Fourier Transform. Just take the Fourier transform of the Laplacian for some larger FFT size and analyze it:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

# simple averaging filter without scaling parameter
mean_filter = np.ones((3,3))

# creating a gaussian filter
x = cv.getGaussianKernel(5,10)
gaussian = x*x.T

# different edge detecting filters
# scharr in x-direction
scharr = np.array([[-3, 0, 3],
                   [-10,0,10],
                   [-3, 0, 3]])
# sobel in x direction
sobel_x= np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
# sobel in y direction
sobel_y= np.array([[-1,-2,-1],
                   [0, 0, 0],
                   [1, 2, 1]])
# laplacian
laplacian=np.array([[0, 1, 0],
                    [1,-4, 1],
                    [0, 1, 0]])

filters = [mean_filter, gaussian, laplacian, sobel_x, sobel_y, scharr]
filter_name = ['mean_filter', 'gaussian','laplacian', 'sobel_x',
                'sobel_y', 'scharr_x']
fft_filters = [np.fft.fft2(x) for x in filters]
fft_shift = [np.fft.fftshift(y) for y in fft_filters]
mag_spectrum = [np.log(np.abs(z)+1) for z in fft_shift]

for i in range(6):
    plt.subplot(2,3,i+1),plt.imshow(mag_spectrum[i],cmap = 'gray')
    plt.title(filter_name[i]), plt.xticks([]), plt.yticks([])

plt.show()
@endcode
See the result:

From the image, you can see which frequency region each kernel blocks and which region it passes. From that information, we can say why each kernel is a HPF or a LPF.
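A quick numerical check of the same idea is to look at the response at zero frequency, which equals the sum of the kernel coefficients. The sketch below does this for three of the kernels; zero-padding to 32x32 is a choice made here only to sample the response more finely.

```python
import numpy as np

kernels = {
    "mean_filter": np.ones((3, 3)),
    "laplacian": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),
    "sobel_x": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
}

dc_response = {}
peak_index = {}
for name, k in kernels.items():
    # Zero-pad to 32x32 so the frequency response is sampled more finely.
    F = np.abs(np.fft.fft2(k, s=(32, 32)))
    dc_response[name] = float(F[0, 0])   # magnitude at zero frequency
    peak_index[name] = tuple(int(i) for i in np.unravel_index(np.argmax(F), F.shape))
    print(name, round(dc_response[name], 6), peak_index[name])
```

The averaging kernel has its largest response at index `(0, 0)`, so it passes the DC term and behaves as a LPF. The Laplacian and Sobel kernels sum to zero, so they block the DC term completely and respond most strongly away from it, which is the behaviour of a HPF.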

## Additional Resources

\-# [An Intuitive Explanation of Fourier Theory](http://cns-alumni.bu.edu/~slehar/fourier/fourier.html) by Steven Lehar
\-# [Fourier Transform](http://homepages.inf.ed.ac.uk/rbf/HIPR2/fourier.htm) at HIPR
\-# [What does frequency domain denote in case of images?](http://dsp.stackexchange.com/q/1637/818)

## [Py Table Of Contents Transforms](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_transforms/py_table_of_contents_transforms/)


# Image Transforms in OpenCV {#tutorial\_py\_table\_of\_contents\_transforms}

-   @subpage tutorial\_py\_fourier\_transform Learn to find the Fourier Transform of images

## [Py Watershed](https://docharvest.github.io/docs/opencv5/py_tutorials/py_imgproc/py_watershed/py_watershed/)


# Image Segmentation with Watershed Algorithm {#tutorial\_py\_watershed}

## Goal

In this chapter,

-   We will learn to use marker-based image segmentation using the watershed algorithm
-   We will see: **cv.watershed()**

## Theory

Any grayscale image can be viewed as a topographic surface where high intensity denotes peaks and hills while low intensity denotes valleys. You start filling every isolated valley (local minimum) with differently colored water (labels). As the water rises, depending on the peaks (gradients) nearby, water from different valleys, with different colors, will start to merge. To avoid that, you build barriers at the locations where the water merges. You continue filling water and building barriers until all the peaks are under water. The barriers you created then give you the segmentation result. This is the "philosophy" behind the watershed. You can visit the [CMM webpage on watershed](https://people.cmm.minesparis.psl.eu/users/beucher/wtshed.html) to understand it with the help of some animations.

But this approach gives you an oversegmented result due to noise or other irregularities in the image. So OpenCV implements a marker-based watershed algorithm where you specify which valley points are to be merged and which are not. It is an interactive image segmentation. What we do is give different labels to the objects we know. Label the region which we are sure is foreground or object with one color (or intensity), label the region which we are sure is background or non-object with another color, and finally label the region which we are not sure about with 0. That is our marker. Then apply the watershed algorithm. Our marker will then be updated with the labels we gave, and the boundaries of objects will have a value of -1.

## Code

Below we will see an example on how to use the Distance Transform along with watershed to segment mutually touching objects.

Consider the coins image below, in which the coins are touching each other. Even if you threshold it, they will still be touching each other.

We start by finding an approximate estimate of the coins. For that, we can use Otsu's binarization.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('coins.png')
assert img is not None, "file could not be read, check with os.path.exists()"
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
ret, thresh = cv.threshold(gray,0,255,cv.THRESH_BINARY_INV+cv.THRESH_OTSU)
@endcode
Result:

Now we need to remove any small white noise in the image. For that we can use morphological opening. To remove any small holes in the object, we can use morphological closing. So, now we know for sure that the region near the center of the objects is foreground and the region far away from the objects is background. The only region we are not sure about is the boundary region of the coins.

So we need to extract the area which we are sure is coins. Erosion removes the boundary pixels, so whatever remains, we can be sure it is a coin. That would work if the objects were not touching each other. But since they are touching each other, a better option is to find the distance transform and apply a proper threshold. Next we need to find the area which we are sure is not coins. For that, we dilate the result. Dilation extends the object boundary into the background. This way, we can make sure that whatever region is background in the result really is background, since the boundary region is removed. See the image below.

The remaining regions are those for which we don't know whether they are coins or background. The watershed algorithm should find that out. These areas are normally around the boundaries of coins, where foreground and background meet (or even where two different coins meet). We call it the border. It can be obtained by subtracting the sure\_fg area from the sure\_bg area.
@code{.py}
# noise removal
kernel = np.ones((3,3),np.uint8)
opening = cv.morphologyEx(thresh,cv.MORPH_OPEN,kernel, iterations = 2)

# sure background area
sure_bg = cv.dilate(opening,kernel,iterations=3)

# Finding sure foreground area
dist_transform = cv.distanceTransform(opening,cv.DIST_L2,5)
ret, sure_fg = cv.threshold(dist_transform,0.7*dist_transform.max(),255,0)

# Finding unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv.subtract(sure_bg,sure_fg)
@endcode
See the result. In the thresholded image, we get some regions which we are sure are coins, and they are detached now. (In some cases, you may be interested only in foreground segmentation, not in separating mutually touching objects. In that case, you need not use the distance transform; erosion is sufficient. Erosion is just another method to extract the sure foreground area, that's all.)

Now we know for sure which regions are coins and which are background. So we create a marker (an array of the same size as the original image, but with int32 datatype) and label the regions inside it. The regions we know for sure (whether foreground or background) are labelled with positive integers, a different one for each region, and the area we don't know for sure is left as zero. For this we use **cv.connectedComponents()**. It labels the background of the image with 0, and the other objects are labelled with integers starting from 1.

But we know that if the background is marked with 0, watershed will consider it as an unknown area. So we want to mark it with a different integer. Instead, we will mark the unknown region, defined by unknown, with 0.
@code{.py}
# Marker labelling
ret, markers = cv.connectedComponents(sure_fg)

# Add one to all labels so that sure background is not 0, but 1
markers = markers+1

# Now, mark the region of unknown with zero
markers[unknown==255] = 0
@endcode
See the result shown in the JET colormap. The dark blue region shows the unknown region. Sure coins are colored with different values. The remaining area, which is sure background, is shown in a lighter blue than the unknown region.
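The relabelling is easier to follow on a tiny array. The sketch below replays the two steps on a made-up 1x7 strip; `labels` stands in for the output of **cv.connectedComponents()** and `unknown` for the border band, both invented for this illustration.

```python
import numpy as np

# Toy strip: two "coins" labelled 1 and 2 by a connected-components pass,
# 0 everywhere else, and an uncertain band next to each coin.
labels = np.array([[0, 0, 1, 0, 2, 0, 0]], dtype=np.int32)
unknown = np.array([[0, 255, 0, 255, 0, 255, 0]], dtype=np.uint8)

markers = labels + 1          # background 0 -> 1, coins 1, 2 -> 2, 3
markers[unknown == 255] = 0   # uncertain pixels become 0 for watershed to resolve
print(markers)
```

After these two steps, 1 means sure background, values of 2 and above are individual coins, and 0 is the only value left for watershed to decide.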

Now our marker is ready. It is time for the final step: apply watershed. The marker image will then be modified. The boundary region will be marked with -1.
@code{.py}
markers = cv.watershed(img,markers)
img[markers == -1] = [255,0,0]
@endcode
See the result below. For some coins, the region where they touch is segmented properly, and for some it is not.

## Additional Resources

\-# CMM page on [Watershed Transformation](https://people.cmm.minesparis.psl.eu/users/beucher/wtshed.html)

## Exercises

\-# OpenCV samples has an interactive sample on watershed segmentation, watershed.py. Run it, Enjoy it, then learn it.

## [Py Kmeans Index](https://docharvest.github.io/docs/opencv5/py_tutorials/py_ml/py_kmeans/py_kmeans_index/)


# K-Means Clustering {#tutorial\_py\_kmeans\_index}

-   @subpage tutorial\_py\_kmeans\_understanding
    
    Read to get an intuitive understanding of K-Means Clustering
    
-   @subpage tutorial\_py\_kmeans\_opencv
    
    Now let's try K-Means functions in OpenCV

## [Py Kmeans Opencv](https://docharvest.github.io/docs/opencv5/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/py_kmeans_opencv/)


# K-Means Clustering in OpenCV {#tutorial\_py\_kmeans\_opencv}

## Goal

-   Learn to use **cv.kmeans()** function in OpenCV for data clustering

## Understanding Parameters

### Input parameters

\-# **samples** : It should be of **np.float32** data type, and each feature should be put in a single column.
\-# **nclusters(K)** : Number of clusters required at end
\-# **criteria** : It is the iteration termination criteria. When this criteria is satisfied, algorithm iteration stops. Actually, it should be a tuple of 3 parameters. They are \`( type, max\_iter, epsilon )\`:
    \-# type of termination criteria. It has 3 flags as below:
        - **cv.TERM\_CRITERIA\_EPS** - stop the algorithm iteration if specified accuracy, _epsilon_, is reached.
        - **cv.TERM\_CRITERIA\_MAX\_ITER** - stop the algorithm after the specified number of iterations, _max\_iter_.
        - **cv.TERM\_CRITERIA\_EPS + cv.TERM\_CRITERIA\_MAX\_ITER** - stop the iteration when any of the above condition is met.
    \-# max\_iter - An integer specifying maximum number of iterations.
    \-# epsilon - Required accuracy
\-# **attempts** : Flag to specify the number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness. This compactness is returned as output.
\-# **flags** : This flag is used to specify how initial centers are taken. Normally two flags are used for this : **cv.KMEANS\_PP\_CENTERS** and **cv.KMEANS\_RANDOM\_CENTERS**.

### Output parameters

\-# **compactness** : It is the sum of squared distance from each point to their corresponding centers.
\-# **labels** : This is the label array (same as 'code' in previous article) where each element marked '0', '1'.....
\-# **centers** : This is array of centers of clusters.
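To connect these parameters and outputs, here is a minimal NumPy sketch of the basic k-means iteration. The function `kmeans_sketch` is a hypothetical helper, not OpenCV's implementation: it runs a single attempt with random initial centers, and it interprets `epsilon` as the largest movement of any center between two iterations, which is an assumption of this sketch.

```python
import numpy as np

def kmeans_sketch(samples, k, max_iter=10, epsilon=1.0, seed=0):
    """Single-attempt k-means returning (compactness, labels, centers)."""
    rng = np.random.default_rng(seed)
    # Random initial centers: k distinct samples.
    centers = samples[rng.choice(len(samples), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Squared distance of every sample to every center, shape (n, k).
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = samples[labels == j].mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= epsilon:      # accuracy criterion reached
            break
    d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    compactness = float(d2[np.arange(len(samples)), labels].sum())
    return compactness, labels.reshape(-1, 1), centers

# One feature per sample, stored as a float32 column vector.
samples = np.float32([[30], [40], [50], [200], [210], [220]])
compactness, labels, centers = kmeans_sketch(samples, 2)
print(compactness, labels.ravel(), centers.ravel())
```

The three return values have the same meaning as the outputs listed above: the sum of squared distances, one label per sample as a column vector, and one row per cluster center.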

Now we will see how to apply K-Means algorithm with three examples.

## 1. Data with Only One Feature

Suppose you have a set of data with only one feature, i.e. one-dimensional data. For example, we can take our t-shirt problem, where you use only the height of people to decide the size of the t-shirt.

So we start by creating the data and plotting it in Matplotlib:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

x = np.random.randint(25,100,25)
y = np.random.randint(175,255,25)
z = np.hstack((x,y))
z = z.reshape((50,1))
z = np.float32(z)
plt.hist(z,256,[0,256]),plt.show()
@endcode
So we have 'z', which is an array of size 50 with values ranging from 0 to 255. I have reshaped 'z' to a column vector. This is more useful when more than one feature is present. Then I converted the data to np.float32 type.

We get following image :

Now we apply the KMeans function. Before that we need to specify the criteria. My criteria is such that, whenever 10 iterations of the algorithm have run, or an accuracy of epsilon = 1.0 is reached, the algorithm stops and returns the answer.
@code{.py}
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv.KMEANS_RANDOM_CENTERS

# Apply KMeans
compactness,labels,centers = cv.kmeans(z,2,None,criteria,10,flags)
@endcode
This gives us the compactness, labels and centers. In this case, I got centers of 60 and 207. Labels will have the same size as the test data, where each data point is labelled '0', '1', '2' etc. depending on its centroid. Now we split the data into different clusters depending on their labels.
@code{.py}
A = z[labels==0]
B = z[labels==1]
@endcode
Now we plot A in red, B in blue, and their centroids in yellow.
@code{.py}
# Now plot 'A' in red, 'B' in blue, 'centers' in yellow
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'y')
plt.show()
@endcode
Below is the output we got:

2.  Data with Multiple Features

* * *

In the previous example, we took only height for the t-shirt problem. Here, we will take both height and weight, i.e. two features.

Remember, in the previous case, we arranged our data as a single column vector. Each feature is arranged in a column, while each row corresponds to an input test sample.

For example, in this case, we set up test data of size 50x2, which are the heights and weights of 50 people. The first column corresponds to the heights of all 50 people and the second column to their weights. The first row contains two elements: the height of the first person and that person's weight. Similarly, the remaining rows correspond to the heights and weights of the other people. Check the image below:

Now I am directly moving to the code:

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center=cv.kmeans(Z,2,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now separate the data, note the ravel()
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]

# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()
@endcode

Below is the output we get:

3.  Color Quantization

* * *

Color Quantization is the process of reducing the number of colors in an image. One reason to do so is to reduce memory usage. Some devices are also limited to producing only a small number of colors, and color quantization is performed in those cases too. Here we use k-means clustering for color quantization.

There is nothing new to be explained here. There are 3 features, say, R,G,B. So we need to reshape the image to an array of size Mx3 (M is the number of pixels in the image). After the clustering, we apply the centroid values (which are also R,G,B) to all pixels, so that the resulting image has the specified number of colors. Finally, we reshape it back to the shape of the original image. Below is the code:

@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
Z = img.reshape((-1,3))

# convert to np.float32
Z = np.float32(Z)

# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv.kmeans(Z,K,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))

cv.imshow('res2',res2)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode

See the result below for K=8:
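The reshape and palette-lookup steps of color quantization can be checked on a tiny synthetic image without running k-means at all. In the sketch below the pixel values, the `label` array and the `center` array are made-up stand-ins for what `cv.kmeans()` would return with K=2; only the array handling is the point.

```python
import numpy as np

# Tiny synthetic 2x2 image with 3 channels (values are made up for illustration)
img = np.array([[[10, 20, 30], [12, 22, 32]],
                [[200, 210, 220], [198, 208, 218]]], dtype=np.uint8)

# Flatten to an M x 3 array: one row per pixel, one column per channel
Z = img.reshape((-1, 3)).astype(np.float32)

# Hypothetical k-means output for K=2: one label per pixel, one centre per cluster
label = np.array([[0], [0], [1], [1]], dtype=np.int32)
center = np.array([[11, 21, 31], [199, 209, 219]], dtype=np.float32)

# Replace every pixel by its cluster centre, then restore the image shape
palette = np.uint8(center)
quantized = palette[label.flatten()].reshape(img.shape)

print(Z.shape, quantized.shape)
```

Indexing `palette` with the flattened labels is what turns a list of K colors into a full image again, so the result can contain at most K distinct colors.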

## [Py Kmeans Understanding](https://docharvest.github.io/docs/opencv5/py_tutorials/py_ml/py_kmeans/py_kmeans_understanding/py_kmeans_understanding/)

# Understanding K-Means Clustering {#tutorial\_py\_kmeans\_understanding}

## Goal

In this chapter, we will understand the concepts of K-Means Clustering and how it works.

## Theory

We will explain this with a commonly used example.

### T-shirt size problem

Consider a company that is going to release a new model of T-shirt to the market. Obviously they will have to manufacture it in different sizes to satisfy people of all sizes. So the company collects data on people's height and weight, and plots them on a graph, as below:

The company can't create t-shirts in every possible size. Instead, they divide people into Small, Medium and Large, and manufacture only these 3 models, which should fit everyone. This grouping of people into three groups can be done by k-means clustering, and the algorithm provides us with the best 3 sizes to satisfy all the people. If it doesn't, the company can divide people into more groups, maybe five, and so on. Check the image below:

### How does it work ?

This algorithm is an iterative process. We will explain it step by step with the help of images.

Consider a set of data as below (you can think of it as the t-shirt problem). We need to cluster this data into two groups.

**Step : 1** - The algorithm randomly chooses two centroids, \\f$C1\\f$ and \\f$C2\\f$ (sometimes, any two data points are taken as the centroids).

**Step : 2** - It calculates the distance from each point to both centroids. If a test sample is closer to \\f$C1\\f$, it is labelled '0'. If it is closer to \\f$C2\\f$, it is labelled '1' (if there are more centroids, the labels continue with '2', '3' etc.).

In our case, we will color all points labelled '0' red and all points labelled '1' blue. So we get the following image after the above operations.

**Step : 3** - Next we calculate the average of all blue points and of all red points separately, and these averages become our new centroids. That is, \\f$C1\\f$ and \\f$C2\\f$ shift to the newly calculated centroids. (Remember, the images shown do not use true values and are not to scale; they are for demonstration only.)

Then step 2 is performed again with the new centroids, labelling the data '0' and '1'.

So we get the result below:

Now **Step - 2** and **Step - 3** are iterated until both centroids converge to fixed points. _(Or the iteration may be stopped depending on the criteria we provide, such as a maximum number of iterations or a specific accuracy being reached.)_ **These points are such that the sum of distances between the test data and their corresponding centroids is minimum**. Or simply, the sum of distances between \\f$C1 \\leftrightarrow Red\_Points\\f$ and \\f$C2 \\leftrightarrow Blue\_Points\\f$ is minimum.

\\f\[minimize ;\\bigg\[J = \\sum\_{All: Red\_Points}distance(C1,Red\_Point) + \\sum\_{All: Blue\_Points}distance(C2,Blue\_Point)\\bigg\]\\f\]

The final result looks almost like the one below:

So this is just an intuitive understanding of K-Means Clustering. It only covers the top layer of the method. For more details and a mathematical explanation, please read any standard machine learning textbook or check the links in the additional resources. There are many modifications to this algorithm, such as how to choose the initial centroids and how to speed up the iteration process.
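The three steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not OpenCV's implementation: `kmeans_sketch` is a hypothetical helper, the initial centroids are simply random data points, and the stopping rule (centroid shift below `eps`, or `max_iter` passes) only mimics the kind of criteria mentioned in the text.

```python
import numpy as np

def kmeans_sketch(points, k=2, max_iter=10, eps=1.0, seed=0):
    """Plain k-means loop: assign points to the nearest centroid, then re-average."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points as the initial centroids
    idx = rng.choice(len(points), size=k, replace=False)
    centroids = points[idx].astype(np.float64)
    for _ in range(max_iter):
        # Step 2: label each point with the index of its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of the points assigned to it
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift < eps:
            break
    # Final labelling so that labels agree with the returned centroids
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Made-up height/weight-like data: two loose groups of 25 samples each
rng = np.random.default_rng(1)
data = np.vstack((rng.integers(25, 50, (25, 2)),
                  rng.integers(60, 85, (25, 2)))).astype(np.float32)
labels, centroids = kmeans_sketch(data)
print(centroids)
```

The quantity being reduced is the sum of distances between each point and its own centroid, which is the \\f$J\\f$ of the formula above.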

## Additional Resources

1.  [Machine Learning Course](https://www.coursera.org/course/ml), video lectures by Prof. Andrew Ng (some of the images are taken from this course)

## [Py Table Of Contents Ml](https://docharvest.github.io/docs/opencv5/py_tutorials/py_ml/py_table_of_contents_ml/)

# Machine Learning {#tutorial\_py\_table\_of\_contents\_ml}

-   @subpage tutorial\_py\_kmeans\_index
    
    Learn to use K-Means Clustering to group data into a number of clusters, and learn to do color quantization with it.

## [Py Table Of Contents Objdetect](https://docharvest.github.io/docs/opencv5/py_tutorials/py_objdetect/py_table_of_contents_objdetect/)

# Object Detection {#tutorial\_py\_table\_of\_contents\_objdetect}

Content has been moved: @ref tutorial\_table\_of\_content\_objdetect

## [Py Chromatic Aberration](https://docharvest.github.io/docs/opencv5/py_tutorials/py_photo/py_chromatic_aberration/py_chromatic_aberration/)

# Chromatic Aberration Correction {#tutorial\_py\_chromatic\_aberration}

## Goal

In this chapter, we will learn how to

-   Calibrate your camera and get the coefficients to correct lateral chromatic aberration.
    
-   Export these coefficients that model the red/blue channel misalignments.
    
-   Correct images using functions in OpenCV.
    

## Basics

Lateral chromatic aberration occurs when different wavelengths focus at slightly different image positions. This results in red/blue fringes at the high-contrast edges, and is particularly common in old or lower-quality cameras and lenses. It is a property of the lens and appears consistently in every image taken with that camera and lens.

Image credit: PawełS, CC BY-SA 3.0 [http://creativecommons.org/licenses/by-sa/3.0/](http://creativecommons.org/licenses/by-sa/3.0/), via Wikimedia Commons

We treat lateral chromatic aberration as a geometric distortion of red and blue channels relative to the reference green, and aim to estimate a mapping that aligns the red and blue channels to green.

The correction follows the paper of Rudakova et al. on the lateral chromatic aberration. The misalignment in each channel is modeled as a polynomial of some degree. The distance between the precise locations of centers in red/blue and green channels is minimized with a warp of these centers.

The paper also proposes using a calibration pattern of black discs, with many more discs than the polynomial model has coefficients, so that the fit is well constrained. Degree 11 is often used, but smaller degrees can achieve a similar level of accuracy with much better performance.

## Calibration

To create a model of the misalignments of the channels, we use the following calibration procedure:

1.  Print out the calibration photo available in [opencv\_extra/testdata/cv/cameracalibration/chromatic\_aberration/chromatic\_aberration\_pattern\_a3.png](https://github.com/opencv/opencv_extra/tree/5.x/testdata/cv/cameracalibration/chromatic_aberration/chromatic_aberration_pattern_a3.png). The photo is a grid of black discs on a white background, and as the chromatic aberration fringes appear on the edges of objects in the photo, we will be able to see many different misalignments and model them precisely.
    
2.  Take one or more images of the printed calibration grid using your camera. Make sure that all of the discs are in the photo, and that the grid fills as much of the frame as possible, as the chromatic aberration is strongest at the edges and corners of the photo. You should be able to see color fringes by eye.
    
3.  Run calibration, see [chromatic\_calibration.py](../../../../apps/chromatic-aberration-calibration/chromatic_calibration.py). The app can be used as follows:
    

```
chromatic_calibration.py calibrate [-h] [--degree DEGREE] --coeffs_file YAML image
chromatic_calibration.py correct   [-h] --coeffs_file YAML [-o OUTPUT] image
chromatic_calibration.py full      [-h] [--degree DEGREE] --coeffs_file YAML [-o OUTPUT] image
chromatic_calibration.py scan      [-h] --degree_range k0 k1 image
```

Calibrate estimates polynomial coefficients and outputs them to a YAML file to be used with correction functions.

-   Splits BGR, finds disk centers per channel at sub-pixel precision.
-   Pairs centers to green via KD-tree.
-   Builds monomial terms up to `--degree` and solves least squares, then refines with another optimization algorithm.
-   Saves a YAML with:
    -   `image_width`, `image_height`
    -   `red_channel/blue_channel`: `coeffs_x`, `coeffs_y` (length $M=(d+1)(d+2)/2$), and `rms` residuals.
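The monomial least-squares step can be illustrated with NumPy on synthetic data. This is only a sketch under assumptions: `monomial_terms` is a hypothetical helper, the ordering of the monomials and the normalisation of coordinates to [-1, 1] are choices made here and are not taken from the calibration app, and the "disc centres" are random points rather than detections.

```python
import numpy as np

def monomial_terms(x, y, degree):
    """Design matrix with one column per monomial x^i * y^j, i + j <= degree."""
    cols = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

degree = 3
rng = np.random.default_rng(0)

# Hypothetical disc centres in the green channel, normalised to [-1, 1]
gx = rng.uniform(-1, 1, 200)
gy = rng.uniform(-1, 1, 200)
A = monomial_terms(gx, gy, degree)        # shape (200, M) with M = (d+1)(d+2)/2

# Synthetic x-offsets of the red channel, generated from known coefficients
true_coeffs = rng.normal(0.0, 0.5, A.shape[1])
dx = A @ true_coeffs

# Least-squares fit of the polynomial coefficients and the RMS residual
coeffs_x, *_ = np.linalg.lstsq(A, dx, rcond=None)
rms = np.sqrt(np.mean((A @ coeffs_x - dx) ** 2))
print(A.shape[1], rms)
```

With far more points than coefficients the system is overdetermined, which is why the pattern uses many more discs than the model has terms.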

Scan sweeps polynomial degree range and compares quality. Although higher degrees should almost always model the aberration better, lower degrees can be much faster.

-   Runs calibration for each degree in k0,..,k1 inclusive to fit models for each degree.
-   Extracts full disk contours per channel.
-   Warps R/B contours toward G using each degree’s polynomials and measures nearest-neighbor distances.
-   Prints a table of max / mean / std distances (in pixels) for red and blue.
-   The user can then choose what degree works best and calibrate the camera with that specific degree.

## Code

Minimal Python example for chromatic aberration correction:

```
import cv2 as cv

INPUT      = "path/to/input.jpg"
CALIB_YAML = "path/to/ca_photo_calib.yaml"
OUTPUT     = "corrected.png"
BAYER      = -1
SHOW       = True

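# Load the input image
img = cv.imread(INPUT)
if img is None:
    raise FileNotFoundError(INPUT)

# Read the calibration coefficients written by the calibration app
fs = cv.FileStorage(CALIB_YAML, cv.FILE_STORAGE_READ)
coeffMat, calib_size, degree = cv.loadChromaticAberrationParams(fs.root())
fs.release()
corrected = cv.correctChromaticAberration(img, coeffMat, calib_size, degree, BAYER)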

if SHOW:
    cv.namedWindow("Original",  cv.WINDOW_AUTOSIZE)
    cv.namedWindow("Corrected", cv.WINDOW_AUTOSIZE)
    cv.imshow("Original",  img)
    cv.imshow("Corrected", corrected)
    print("Press any key to close...")
    cv.waitKey(0)
    cv.destroyAllWindows()

cv.imwrite(OUTPUT, corrected)
```

## Additional Resources

@cite rudakova2013precise

## [Py Hdr](https://docharvest.github.io/docs/opencv5/py_tutorials/py_photo/py_hdr/py_hdr/)

# High Dynamic Range (HDR) {#tutorial\_py\_hdr}

## Goal

In this chapter, we will

-   Learn how to generate and display HDR image from an exposure sequence.
-   Use exposure fusion to merge an exposure sequence.

## Theory

High-dynamic-range imaging (HDRI or HDR) is a technique used in imaging and photography to reproduce a greater dynamic range of luminosity than is possible with standard digital imaging or photographic techniques. While the human eye can adjust to a wide range of light conditions, most imaging devices use 8-bits per channel, so we are limited to only 256 levels. When we take photographs of a real world scene, bright regions may be overexposed, while the dark ones may be underexposed, so we can’t capture all details using a single exposure. HDR imaging works with images that use more than 8 bits per channel (usually 32-bit float values), allowing much wider dynamic range.

There are different ways to obtain HDR images, but the most common one is to use photographs of the scene taken with different exposure values. To combine these exposures it is useful to know your camera’s response function and there are algorithms to estimate it. After the HDR image has been merged, it has to be converted back to 8-bit to view it on usual displays. This process is called tonemapping. Additional complexities arise when objects of the scene or camera move between shots, since images with different exposures should be registered and aligned.

In this tutorial we show 2 algorithms (Debevec, Robertson) to generate and display HDR image from an exposure sequence, and demonstrate an alternative approach called exposure fusion (Mertens), that produces low dynamic range image and does not need the exposure times data. Furthermore, we estimate the camera response function (CRF) which is of great value for many computer vision algorithms. Each step of HDR pipeline can be implemented using different algorithms and parameters, so take a look at the reference manual to see them all.

## Exposure sequence HDR

In this tutorial we will look at the following scene, where we have 4 exposure images, with exposure times of 15, 2.5, 1/4 and 1/30 seconds. (You can download the images from [Wikipedia](https://en.wikipedia.org/wiki/High-dynamic-range_imaging))

### 1\. Loading exposure images into a list

The first stage is simply loading all images into a list. In addition, we will need the exposure times for the regular HDR algorithms. Pay attention to the data types, as the images should be 1-channel or 3-channel 8-bit (np.uint8) and the exposure times need to be float32 and in seconds.

@code{.py}
import cv2 as cv
import numpy as np

# Loading exposure images into a list
img_fn = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg"]
img_list = [cv.imread(fn) for fn in img_fn]
exposure_times = np.array([15.0, 2.5, 0.25, 0.0333], dtype=np.float32)
@endcode

### 2\. Merge exposures into HDR image

In this stage we merge the exposure sequence into one HDR image, showing 2 possibilities which we have in OpenCV. The first method is Debevec and the second one is Robertson. Notice that the HDR image is of type float32, and not uint8, as it contains the full dynamic range of all exposure images.

@code{.py}
# Merge exposures to HDR image
merge_debevec = cv.createMergeDebevec()
hdr_debevec = merge_debevec.process(img_list, times=exposure_times.copy())
merge_robertson = cv.createMergeRobertson()
hdr_robertson = merge_robertson.process(img_list, times=exposure_times.copy())
@endcode

### 3\. Tonemap HDR image

We map the 32-bit float HDR data into the range \[0..1\]. In some cases the values can be larger than 1 or lower than 0, so note that we will later have to clip the data in order to avoid overflow.

@note: The function `cv.createTonemap()` uses a default gamma value of 1.0. Set it explicitly to 2.2 to match standard display brightness and ensure consistent tone mapping results.

@code{.py}
# Tonemap HDR images using gamma correction (set gamma=2.2 for standard display brightness)
tonemap1 = cv.createTonemap(gamma=2.2)
res_debevec = tonemap1.process(hdr_debevec.copy())
res_robertson = tonemap1.process(hdr_robertson.copy())
@endcode
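To make the effect of the gamma value concrete, the following NumPy sketch applies a plain gamma tonemap to a tiny made-up HDR array. It is an illustration under assumptions (min-max normalisation followed by a 1/gamma power) and is not claimed to reproduce `cv.createTonemap()` exactly; `gamma_tonemap` is a hypothetical helper.

```python
import numpy as np

def gamma_tonemap(hdr, gamma=2.2):
    """Rescale float HDR data to [0, 1] and gamma-encode it."""
    hdr = np.asarray(hdr, dtype=np.float32)
    lo, hi = float(hdr.min()), float(hdr.max())
    # Linear rescale to [0, 1]; a constant image is left as it is
    ldr = (hdr - lo) / (hi - lo) if hi > lo else hdr.copy()
    ldr = np.clip(ldr, 0.0, 1.0)
    # Gamma encoding: values in (0, 1) become brighter for gamma > 1
    return np.power(ldr, 1.0 / gamma)

# Made-up radiance values spanning a wide range
hdr = np.array([[0.0, 0.5], [2.0, 8.0]], dtype=np.float32)
ldr = gamma_tonemap(hdr)
ldr_8bit = np.clip(ldr * 255, 0, 255).astype(np.uint8)
print(ldr_8bit)
```

With gamma = 1.0 the mapping stays linear, so dark regions remain dark; a value of 2.2 lifts them, which is why the note above suggests setting it explicitly.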

### 4\. Merge exposures using Mertens fusion

Here we show an alternative algorithm to merge the exposure images, where we do not need the exposure times. We also do not need to use any tonemap algorithm because the Mertens algorithm already gives us the result in the range of \[0..1\].

@code{.py}
# Exposure fusion using Mertens
merge_mertens = cv.createMergeMertens()
res_mertens = merge_mertens.process(img_list)
@endcode

### 5\. Convert to 8-bit and save

In order to save or display the results, we need to convert the data into 8-bit integers in the range of \[0..255\].

@code{.py}
# Convert datatype to 8-bit and save
res_debevec_8bit = np.clip(res_debevec*255, 0, 255).astype('uint8')
res_robertson_8bit = np.clip(res_robertson*255, 0, 255).astype('uint8')
res_mertens_8bit = np.clip(res_mertens*255, 0, 255).astype('uint8')

cv.imwrite("ldr_debevec.jpg", res_debevec_8bit)
cv.imwrite("ldr_robertson.jpg", res_robertson_8bit)
cv.imwrite("fusion_mertens.jpg", res_mertens_8bit)
@endcode

## Results

You can see the different results, but consider that each algorithm has additional parameters that you should tune to get your desired outcome. Best practice is to try the different methods and see which one performs best for your scene.

The results below were generated with a gamma value of 2.2 during tonemapping.

### Debevec:

### Robertson:

### Mertens Fusion:

## Estimating Camera Response Function

The camera response function (CRF) gives us the connection between the scene radiance and the measured intensity values. The CRF is of great importance in some computer vision algorithms, including HDR algorithms. Here we estimate the inverse camera response function and use it for the HDR merge.

@code{.py}
# Estimate camera response function (CRF)
cal_debevec = cv.createCalibrateDebevec()
crf_debevec = cal_debevec.process(img_list, times=exposure_times)
hdr_debevec = merge_debevec.process(img_list, times=exposure_times.copy(), response=crf_debevec.copy())
cal_robertson = cv.createCalibrateRobertson()
crf_robertson = cal_robertson.process(img_list, times=exposure_times)
hdr_robertson = merge_robertson.process(img_list, times=exposure_times.copy(), response=crf_robertson.copy())
@endcode

The camera response function is represented by a 256-length vector for each color channel. For this sequence we got the following estimation:

## Additional Resources

1.  Paul E Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH 2008 classes, page 31. ACM, 2008. @cite DM97
2.  Mark A Robertson, Sean Borman, and Robert L Stevenson. Dynamic range improvement through multiple exposures. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, volume 3, pages 159–163. IEEE, 1999. @cite RB99
3.  Tom Mertens, Jan Kautz, and Frank Van Reeth. Exposure fusion. In Computer Graphics and Applications, 2007. PG'07. 15th Pacific Conference on, pages 382–390. IEEE, 2007. @cite MK07
4.  Images from [Wikipedia-HDR](https://en.wikipedia.org/wiki/High-dynamic-range_imaging)

## Exercises

1.  Try all tonemap algorithms: cv::TonemapDrago, cv::TonemapMantiuk and cv::TonemapReinhard
2.  Try changing the parameters in the HDR calibration and tonemap methods.

## [Py Inpainting](https://docharvest.github.io/docs/opencv5/py_tutorials/py_photo/py_inpainting/py_inpainting/)

# Image Inpainting {#tutorial\_py\_inpainting}

## Goal

In this chapter,

-   We will learn how to remove small noises, strokes etc. in old photographs by a method called inpainting
-   We will see inpainting functionalities in OpenCV.

## Basics

Most of you will have some old degraded photos at home with black spots, strokes etc. on them. Have you ever thought of restoring them? We can't simply erase the marks in a paint tool because that will simply replace black structures with white structures, which is of no use. In these cases, a technique called image inpainting is used. The basic idea is simple: replace those bad marks with their neighbouring pixels so that they look like the neighbourhood. Consider the image shown below (taken from [Wikipedia](http://en.wikipedia.org/wiki/Inpainting)):

Several algorithms were designed for this purpose and OpenCV provides two of them. Both can be accessed by the same function, **cv.inpaint()**

The first algorithm is based on the paper **"An Image Inpainting Technique Based on the Fast Marching Method"** by Alexandru Telea in 2004. It is based on the Fast Marching Method (FMM). Consider a region in the image to be inpainted. The algorithm starts from the boundary of this region and moves inward, gradually filling everything on the boundary first. It takes a small neighbourhood around the pixel to be inpainted and replaces that pixel with a normalized weighted sum of all the known pixels in the neighbourhood. Selection of the weights is an important matter. More weight is given to pixels lying near the point, near the normal of the boundary, and on the boundary contours. Once a pixel is inpainted, the algorithm moves to the next nearest pixel using the Fast Marching Method. FMM ensures that pixels near the known pixels are inpainted first, so that it works just like a manual heuristic operation. This algorithm is enabled by using the flag cv.INPAINT\_TELEA.
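The idea of filling a region from its boundary inward can be illustrated with a much simpler scheme than the one in the paper. The sketch below is not Telea's method: it uses uniform weights over the known 8-neighbours instead of the weighting described above, processes whole layers instead of following the Fast Marching order, and handles grayscale only. `fill_from_boundary` is a hypothetical helper and the test image is made up.

```python
import numpy as np

def fill_from_boundary(img, mask):
    """Fill masked pixels layer by layer with the mean of their known 8-neighbours."""
    out = img.astype(np.float64).copy()
    known = mask == 0
    h, w = known.shape
    while not known.all():
        progressed = False
        new_known = known.copy()
        for y, x in zip(*np.nonzero(~known)):
            y0, y1 = max(y - 1, 0), min(y + 2, h)
            x0, x1 = max(x - 1, 0), min(x + 2, w)
            nb_known = known[y0:y1, x0:x1]
            if nb_known.any():
                # Only pixels that were known before this pass contribute
                out[y, x] = out[y0:y1, x0:x1][nb_known].mean()
                new_known[y, x] = True
                progressed = True
        if not progressed:
            break  # nothing known anywhere, so nothing can be filled
        known = new_known
    return out

# Flat gray image damaged by a short vertical black stroke
img = np.full((7, 7), 100, dtype=np.uint8)
mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 3] = 255          # non-zero marks the area to be inpainted
img[mask > 0] = 0
restored = fill_from_boundary(img, mask)
print(restored[3, 3])
```

On real photographs uniform averaging blurs edges, which is the reason the weights matter so much in the actual algorithm.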

The second algorithm is based on the paper **"Navier-Stokes, Fluid Dynamics, and Image and Video Inpainting"** by Marcelo Bertalmio, Andrea L. Bertozzi, and Guillermo Sapiro in 2001. This algorithm is based on fluid dynamics and uses partial differential equations. The basic principle is heuristic. It first travels along the edges from known regions to unknown regions (because edges are meant to be continuous). It continues isophotes (lines joining points with the same intensity, just like contours join points with the same elevation) while matching gradient vectors at the boundary of the inpainting region. For this, some methods from fluid dynamics are used. Once they are obtained, color is filled in so as to minimize the variance in that area. This algorithm is enabled by using the flag cv.INPAINT\_NS.

## Code

We need to create a mask of the same size as the input image, where non-zero pixels correspond to the area to be inpainted. Everything else is simple. My image is degraded with some black strokes (which I added manually). I created a mask of the corresponding strokes with a paint tool.

@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('messi_2.jpg')
mask = cv.imread('mask2.png', cv.IMREAD_GRAYSCALE)

dst = cv.inpaint(img,mask,3,cv.INPAINT_TELEA)

cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode

See the result below. The first image shows the degraded input. The second image is the mask. The third image is the result of the first algorithm and the last image is the result of the second algorithm.

## Additional Resources

1.  Bertalmio, Marcelo, Andrea L. Bertozzi, and Guillermo Sapiro. "Navier-stokes, fluid dynamics, and image and video inpainting." In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, pp. I-355. IEEE, 2001.
2.  Telea, Alexandru. "An image inpainting technique based on the fast marching method." Journal of graphics tools 9.1 (2004): 23-34.

## Exercises

1.  OpenCV comes with an interactive sample on inpainting, samples/python/inpaint.py. Try it.
2.  A few months ago, I watched a video on [Content-Aware Fill](http://www.youtube.com/watch?v=ZtoUiplKa2A), an advanced inpainting technique used in Adobe Photoshop. On further search, I found that the same technique is already available in GIMP under a different name, "Resynthesizer" (you need to install a separate plugin). I am sure you will enjoy the technique.

## [Py Non Local Means](https://docharvest.github.io/docs/opencv5/py_tutorials/py_photo/py_non_local_means/py_non_local_means/)

# Image Denoising {#tutorial\_py\_non\_local\_means}

## Goal

In this chapter,

-   You will learn about Non-local Means Denoising algorithm to remove noise in the image.
-   You will see different functions like **cv.fastNlMeansDenoising()**, **cv.fastNlMeansDenoisingColored()** etc.

## Theory

In earlier chapters, we have seen many image smoothing techniques like Gaussian Blurring, Median Blurring etc and they were good to some extent in removing small quantities of noise. In those techniques, we took a small neighbourhood around a pixel and did some operations like gaussian weighted average, median of the values etc to replace the central element. In short, noise removal at a pixel was local to its neighbourhood.

There is a property of noise. Noise is generally considered to be a random variable with zero mean. Consider a noisy pixel, \\f$p = p\_0 + n\\f$ where \\f$p\_0\\f$ is the true value of the pixel and \\f$n\\f$ is the noise in that pixel. You can take a large number of observations of the same pixel (say \\f$N\\f$) from different images and compute their average. Ideally, you should get \\f$p = p\_0\\f$ since the mean of the noise is zero.

You can verify it yourself with a simple setup. Hold a static camera at a certain location for a couple of seconds. This will give you plenty of frames, or a lot of images of the same scene. Then write a piece of code to find the average of all the frames in the video (this should be simple for you by now). Compare the final result and the first frame. You can see a reduction in noise. Unfortunately this simple method is not robust to camera and scene motion. Also, often there is only one noisy image available.
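The averaging argument can be checked numerically without a camera. The sketch below uses synthetic frames: a constant "scene" plus zero-mean Gaussian noise, with made-up values for the scene intensity, noise level and number of frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# A static synthetic scene and N noisy observations of it
true_frame = np.full((64, 64), 120.0)
sigma, n_frames = 10.0, 25
frames = true_frame + rng.normal(0.0, sigma, (n_frames, 64, 64))

# Average all frames pixel by pixel
averaged = frames.mean(axis=0)

# Remaining noise level in a single frame and in the average
noise_single = (frames[0] - true_frame).std()
noise_avg = (averaged - true_frame).std()
print(noise_single, noise_avg)
```

For independent zero-mean noise the standard deviation of the average falls roughly as \\f$\\sigma / \\sqrt{N}\\f$, so 25 frames reduce the noise by about a factor of 5.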

So the idea is simple: we need a set of similar images to average out the noise. Consider a small window (say a 5x5 window) in the image. Chances are high that the same patch appears somewhere else in the image, sometimes in a small neighbourhood around it. What about using these similar patches together and finding their average? For that particular window, that works well. See an example image below:

The blue patches in the image look similar to each other, and so do the green patches. So we take a pixel, take a small window around it, search for similar windows in the image, average all the windows and replace the pixel with the result. This method is Non-Local Means Denoising. It takes more time than the blurring techniques we saw earlier, but its result is very good. More details and an online demo can be found at the first link in the additional resources.
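The patch-search idea can be written down for a single pixel in NumPy. This is a simplified sketch, not the algorithm behind the `cv.fastNlMeans...` functions: `nlm_pixel` is a hypothetical helper, the weight formula `exp(-d2 / h^2)` with `d2` the mean squared patch difference is one common choice assumed here, and the default window sizes are kept tiny so the loops stay readable.

```python
import numpy as np

def nlm_pixel(img, y, x, patch=3, search=7, h=10.0):
    """Denoise one pixel by a similarity-weighted average over a search window."""
    pr, sr = patch // 2, search // 2
    # Pad so that every patch inside the search window is fully defined
    padded = np.pad(img.astype(np.float64), pr + sr, mode="reflect")
    cy, cx = y + pr + sr, x + pr + sr
    ref = padded[cy - pr:cy + pr + 1, cx - pr:cx + pr + 1]
    weights, values = [], []
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            qy, qx = cy + dy, cx + dx
            cand = padded[qy - pr:qy + pr + 1, qx - pr:qx + pr + 1]
            # Similar patches (small mean squared difference) get large weights
            d2 = np.mean((ref - cand) ** 2)
            weights.append(np.exp(-d2 / (h * h)))
            values.append(padded[qy, qx])
    weights = np.array(weights)
    return float(np.dot(weights, values) / weights.sum())

# Flat synthetic image with made-up Gaussian noise
rng = np.random.default_rng(0)
flat = np.full((16, 16), 50.0)
noisy = flat + rng.normal(0.0, 5.0, flat.shape)
denoised_value = nlm_pixel(noisy, 8, 8)
print(noisy[8, 8], denoised_value)
```

Here `h` plays the same role as the filter-strength parameter of the OpenCV functions: larger values let less similar patches contribute, which removes more noise but also more detail.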

For color images, the image is converted to the CIELAB colorspace and then the L and AB components are denoised separately.

## Image Denoising in OpenCV

OpenCV provides four variations of this technique.

1.  **cv.fastNlMeansDenoising()** - works with a single grayscale image.
2.  **cv.fastNlMeansDenoisingColored()** - works with a color image.
3.  **cv.fastNlMeansDenoisingMulti()** - works with an image sequence captured in a short period of time (grayscale images).
4.  **cv.fastNlMeansDenoisingColoredMulti()** - same as above, but for color images.

Common arguments are:

-   h : parameter deciding filter strength. A higher h value removes noise better, but also removes image details. (10 is ok)
-   hForColorComponents : same as h, but for color images only. (normally same as h)
-   templateWindowSize : should be odd. (recommended 7)
-   searchWindowSize : should be odd. (recommended 21)

Please visit first link in additional resources for more details on these parameters.

We will demonstrate 2 and 3 here. Rest is left for you.

### 1\. cv.fastNlMeansDenoisingColored()

As mentioned above, it is used to remove noise from color images. (Noise is expected to be Gaussian.) See the example below:

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

img = cv.imread('die.png')

dst = cv.fastNlMeansDenoisingColored(img,None,10,10,7,21)

# OpenCV loads images as BGR, so convert to RGB for Matplotlib
plt.subplot(121),plt.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))
plt.subplot(122),plt.imshow(cv.cvtColor(dst, cv.COLOR_BGR2RGB))
plt.show()
@endcode

Below is a zoomed version of the result. My input image has Gaussian noise of \\f$\\sigma = 25\\f$. See the result:

### 2\. cv.fastNlMeansDenoisingMulti()

Now we will apply the same method to a video. The first argument is the list of noisy frames. The second argument, imgToDenoiseIndex, specifies which frame we need to denoise; for that we pass the index of the frame in our input list. The third is temporalWindowSize, which specifies the number of nearby frames to be used for denoising. It should be odd. In that case, a total of temporalWindowSize frames are used, where the central frame is the frame to be denoised. For example, suppose you passed a list of 5 frames as input. Let imgToDenoiseIndex = 2 and temporalWindowSize = 3. Then frame-1, frame-2 and frame-3 are used to denoise frame-2. Let's see an example.

@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

cap = cv.VideoCapture('vtest.avi')

# create a list of first 5 frames
img = [cap.read()[1] for i in range(5)]

# convert all to grayscale
gray = [cv.cvtColor(i, cv.COLOR_BGR2GRAY) for i in img]

# convert all to float64
gray = [np.float64(i) for i in gray]

# create Gaussian noise with standard deviation 10 (variance 100)
noise = np.random.randn(*gray[1].shape)*10

# Add this noise to images
noisy = [i+noise for i in gray]

# Convert back to uint8
noisy = [np.uint8(np.clip(i,0,255)) for i in noisy]

# Denoise 3rd frame considering all the 5 frames
dst = cv.fastNlMeansDenoisingMulti(noisy, 2, 5, None, 4, 7, 35)

plt.subplot(131),plt.imshow(gray[2],'gray')
plt.subplot(132),plt.imshow(noisy[2],'gray')
plt.subplot(133),plt.imshow(dst,'gray')
plt.show()
@endcode

The image below shows a zoomed version of the result we got:

It takes a considerable amount of time to compute. In the result, the first image is the original frame, the second is the noisy one, and the third is the denoised image.

## Additional Resources

1.  [http://www.ipol.im/pub/art/2011/bcm\_nlm/](http://www.ipol.im/pub/art/2011/bcm_nlm/) (It has the details, online demo etc. Highly recommended to visit. Our test image is generated from this link)
2.  [Online course at coursera](https://www.coursera.org/course/images) (First image taken from here)

## [Py Table Of Contents Photo](https://docharvest.github.io/docs/opencv5/py_tutorials/py_photo/py_table_of_contents_photo/)

# Computational Photography {#tutorial\_py\_table\_of\_contents\_photo}

Here you will learn different OpenCV functionalities related to Computational Photography like image denoising etc.

-   @subpage tutorial\_py\_non\_local\_means
    
    See a good technique for removing noise from images, called Non-Local Means Denoising
    
-   @subpage tutorial\_py\_inpainting
    
    Do you have an old degraded photo with many black spots and strokes on it? Take it. Let's try to restore it with a technique called image inpainting.
    
-   @subpage tutorial\_py\_hdr
    
    Learn how to merge exposure sequence and process high dynamic range images.
    
-   @subpage tutorial\_py\_chromatic\_aberration
    
    Correct chromatic aberration in your camera's photos by calibrating the camera

## [Py Intro](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_intro/py_intro/)

# Introduction to OpenCV-Python Tutorials {#tutorial\_py\_intro}

## OpenCV

OpenCV was started at Intel in 1999 by **Gary Bradski**, and the first release came out in 2000. **Vadim Pisarevsky** joined Gary Bradski to manage Intel's Russian software OpenCV team. In 2005, OpenCV was used on Stanley, the vehicle that won the 2005 DARPA Grand Challenge. Later, its active development continued under the support of Willow Garage with Gary Bradski and Vadim Pisarevsky leading the project. OpenCV now supports a multitude of algorithms related to Computer Vision and Machine Learning and is expanding day by day.

OpenCV supports a wide variety of programming languages such as C++, Python, Java, etc., and is available on different platforms including Windows, Linux, OS X, Android, and iOS. Interfaces for high-speed GPU operations based on CUDA and OpenCL are also under active development.

OpenCV-Python is the Python API for OpenCV, combining the best qualities of the OpenCV C++ API and the Python language.

## OpenCV-Python

OpenCV-Python is a library of Python bindings designed to solve computer vision problems.

Python is a general purpose programming language started by **Guido van Rossum** that became very popular very quickly, mainly because of its simplicity and code readability. It enables the programmer to express ideas in fewer lines of code without reducing readability.

Compared to languages like C/C++, Python is slower. That said, Python can be easily extended with C/C++, which allows us to write computationally intensive code in C/C++ and create Python wrappers that can be used as Python modules. This gives us two advantages: first, the code is as fast as the original C/C++ code (since it is the actual C++ code working in background) and second, it is easier to code in Python than C/C++. OpenCV-Python is a Python wrapper for the original OpenCV C++ implementation.

OpenCV-Python makes use of **Numpy**, which is a highly optimized library for numerical operations with a MATLAB-style syntax. All the OpenCV array structures are converted to and from Numpy arrays. This also makes it easier to integrate with other libraries that use Numpy such as SciPy and Matplotlib.

## OpenCV-Python Tutorials

OpenCV introduces a new set of tutorials which will guide you through various functions available in OpenCV-Python. **This guide is mainly focused on OpenCV 3.x version** (although most of the tutorials will also work with OpenCV 2.x).

Prior knowledge of Python and Numpy is recommended as they won't be covered in this guide. **Proficiency with Numpy is a must in order to write optimized code using OpenCV-Python.**

This tutorial was originally started by _Abid Rahman K._ as part of the Google Summer of Code 2013 program under the guidance of _Alexander Mordvintsev_.

## OpenCV Needs You !!!

Since OpenCV is an open source initiative, all are welcome to make contributions to the library, documentation, and tutorials. If you find any mistake in this tutorial (from a small spelling mistake to an egregious error in code or concept), feel free to correct it by cloning OpenCV in [GitHub](https://github.com/opencv/opencv) and submitting a pull request. OpenCV developers will check your pull request, give you important feedback and (once it passes the approval of the reviewer) it will be merged into OpenCV. You will then become an open source contributor :-)

As new modules are added to OpenCV-Python, this tutorial will have to be expanded. If you are familiar with a particular algorithm and can write up a tutorial including basic theory of the algorithm and code showing example usage, please do so.

Remember, **together** we can make this project a great success !!!

## Contributors

Below is the list of contributors who submitted tutorials to OpenCV-Python.

1.  Alexander Mordvintsev (GSoC-2013 mentor)
2.  Abid Rahman K. (GSoC-2013 intern)

## Additional Resources

1.  A quick guide to Python: [A Byte of Python](https://python.swaroopch.com/)
2.  [A Quick guide to Python](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
3.  [NumPy Quickstart tutorial](https://numpy.org/doc/stable/user/quickstart.html)
4.  [NumPy Reference](https://numpy.org/doc/stable/reference/index.html)
5.  [OpenCV Documentation](https://docs.opencv.org/)
6.  [OpenCV Forum](https://forum.opencv.org/)

## [Install OpenCV for Python with pip {#tutorial_py_pip_install}](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_pip_install/py_pip_install/)

This quick-start shows the **recommended** way for most users to get OpenCV in Python: install from **PyPI** with `pip`. It also explains virtual environments, platform notes, and common troubleshooting. If you need OS‑specific alternatives (system packages or source builds), see the OS pages linked below, but those are **not required** for typical Python use.

@note: The OpenCV team maintains the **PyPI** packages only. Conda distributions and platform-specific builds are provided by the community or by hardware vendors and may differ from the official packages.

## Quick start

```
# 1) Create and activate a virtual environment (recommended)
python -m venv .venv
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

# 2) Upgrade pip tooling
python -m pip install --upgrade pip setuptools wheel

# 3) Install OpenCV from PyPI (choose ONE)
pip install opencv-python          # main package (most users)
# or
pip install opencv-contrib-python  # + extra modules (contrib)
# or
pip install opencv-python-headless # no GUI/backends (servers/CI)
# or
pip install opencv-contrib-python-headless # no GUI/backends with extra modules (servers/CI)
```

### Tiny hello‑world

```
import cv2 as cv
import numpy as np

print("OpenCV:", cv.__version__)
img = np.zeros((120, 400, 3), dtype=np.uint8)
cv.putText(img, "OpenCV OK", (10, 80), cv.FONT_HERSHEY_SIMPLEX, 2, (255,255,255), 3)
# If you installed a non-headless build, you can display a window:
# cv.imshow("hello", img); cv.waitKey(0)
# Always safe (headless or not): save to file
cv.imwrite("hello.png", img)
```

## Virtual environments and IDEs

Using a virtual environment keeps project dependencies isolated. Tools that create or activate envs include:

-   `venv` (built-in) and `virtualenv`
-   Conda environments
-   IDEs (VS Code, PyCharm) that may **auto-create and auto-activate** an env per workspace

If imports fail inside an IDE, verify the interpreter selected by the IDE matches the environment where you installed OpenCV.

## OS notes

-   **Linux:** Your default Python may be `python3`. Use `python3 -m venv .venv` and `python3 -m pip ...`. If you cannot use a virtual env, `pip --user` installs to your home directory: `python3 -m pip install --user opencv-python`.
-   **Windows:** Install Python from [python.org](https://www.python.org/downloads/) or via `winget install Python.Python.3`. Make sure **“Add python to PATH”** is enabled, or use **“Open in terminal”** from your IDE, which selects the right interpreter automatically.
-   **macOS:** Use the system `python3` or a managed one (Homebrew or Python.org). Always prefer a virtual environment.
-   **Raspberry Pi / ARM boards:** Prebuilt wheels may not exist for some Pi OS / Python combinations. See **Troubleshooting** below.

## Choosing a PyPI variant

-   `opencv-python`: core OpenCV modules with GUI/backends
-   `opencv-contrib-python`: includes **contrib** modules in addition to the core
-   `opencv-python-headless`: no GUI/backends (ideal for servers/containers/CI)
-   `opencv-contrib-python-headless`: contrib + headless

Install exactly **one** of these per environment.
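To check that an environment holds only one variant, the installed distributions can be listed with the standard library. This is a minimal sketch, assuming the four package names listed above; the helper names `find_variants` and `installed_opencv_variants` are hypothetical.

```python
from importlib import metadata

# The four PyPI variants named in this section.
OPENCV_VARIANTS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}


def find_variants(installed_names):
    """Return the OpenCV variants present in a list of distribution names."""
    # Normalize case and underscores so "OpenCV_Python" matches "opencv-python".
    normalized = {name.lower().replace("_", "-") for name in installed_names if name}
    return sorted(normalized & OPENCV_VARIANTS)


def installed_opencv_variants():
    """Inspect the current environment for installed OpenCV variants."""
    names = [dist.metadata["Name"] for dist in metadata.distributions()]
    return find_variants(names)


if __name__ == "__main__":
    found = installed_opencv_variants()
    if len(found) > 1:
        print("More than one variant installed:", ", ".join(found))
    elif found:
        print("Installed variant:", found[0])
    else:
        print("No OpenCV PyPI package found in this environment.")
```

If more than one variant is reported, uninstalling all of them and reinstalling a single one is a reasonable way to get back to a clean state.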

## Troubleshooting

Please start with opencv-python project [README](https://github.com/opencv/opencv-python/blob/4.x/README.md)

**Pip is trying to build from source**

Symptoms: very long build step, CMake errors, compiler errors.

Fixes:

-   Upgrade build tooling: `python -m pip install --upgrade pip setuptools wheel`
-   Ensure your Python version is supported by the chosen package.
-   If you are on an uncommon platform or Python build, switch to a supported Python or try a different variant (headless vs non‑headless).

**“No matching distribution found” or “Unsupported wheel”**

-   Confirm your Python version (e.g., `python -V`). Choose a wheel that supports that version (manylinux/macOS/Windows wheels on PyPI target specific Python versions).
-   Create a fresh virtual environment with a mainstream Python (e.g., 3.10–3.12 for now) and reinstall.
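When a wheel cannot be found, it helps to know exactly what pip sees. The sketch below collects the interpreter details that wheel selection depends on, using only the standard library. The helper names are hypothetical, and the default 3.10 to 3.12 range in `in_range` is simply the example range mentioned above, not an authoritative list of supported versions.

```python
import platform
import sys
import sysconfig


def interpreter_summary():
    """Collect the interpreter details that matter for wheel selection."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "implementation": platform.python_implementation(),
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
        "pointer_bits": 64 if sys.maxsize > 2**32 else 32,
    }


def in_range(version, lowest=(3, 10), highest=(3, 12)):
    """Check whether (major, minor) lies inside an inclusive range."""
    return lowest <= tuple(version[:2]) <= highest


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(key + ":", value)
    if not in_range(sys.version_info):
        print("This Python is outside the example range; wheels may be missing.")
```

Including this output when asking for help makes it easier for others to see why no wheel matched.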

**Raspberry Pi / ARM**

-   Wheels may lag behind new Python/Pi OS releases. Try `opencv-python-headless` first. If unavailable, consider system packages for camera/GUI pieces, or build from source following the OS page linked below.

**Import works in terminal but fails in IDE**

-   The IDE is using a different interpreter. Select the **same** environment inside your IDE’s interpreter settings.
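A quick way to compare the terminal and the IDE is to run the same short script in both and compare the output. This is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical. It does not import `cv2`, so it also works when the import itself is what fails.

```python
import importlib.util
import sys


def describe_environment(module_name="cv2"):
    """Report which interpreter is running and where a module would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix differs from the base installation prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # None means the module is not visible to this interpreter.
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key + ":", value)
```

If the `interpreter` paths differ between the two runs, the IDE is not using the environment where OpenCV was installed.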

## What about system packages or building from source?

For beginners using Python, **PyPI is recommended**. Native distribution packages and full source builds are better suited to advanced users with platform‑specific needs. You can still find them on the OS‑specific pages, moved under “Alternatives.”

## See also

-   @ref tutorial\_py\_root
-   OS pages: @ref tutorial\_py\_setup\_in\_windows, @ref tutorial\_py\_setup\_in\_ubuntu, @ref tutorial\_py\_setup\_in\_fedora

## [Py Setup In Fedora](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_setup_in_fedora/py_setup_in_fedora/)


## [Py Setup In Ubuntu](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_setup_in_ubuntu/py_setup_in_ubuntu/)

# Install OpenCV-Python in Ubuntu {#tutorial\_py\_setup\_in\_ubuntu}

@note: Please prefer binaries distributed with PyPI, if possible. See @ref tutorial\_py\_pip\_install for details.

## Goals

In this tutorial, we will learn to set up OpenCV-Python on an Ubuntu system. The steps below were tested on Ubuntu 16.04 and 18.04 (both 64-bit).

OpenCV-Python can be installed in Ubuntu in two ways:

-   Install from pre-built binaries available in Ubuntu repositories
-   Compile from the source

In this section, we will see both.

Another important thing is the additional libraries required. OpenCV-Python requires only **Numpy** (in addition to other dependencies, which we will see later). But in these tutorials, we also use **Matplotlib** for some easy and nice plotting (which I find much better than plotting with OpenCV). Matplotlib is optional, but highly recommended. Similarly, we will also see **IPython**, an interactive Python terminal, which is also highly recommended.

## Installing OpenCV-Python from Pre-built Binaries

This method works best when you only want to program and develop OpenCV applications.

Install the [python3-opencv](https://packages.ubuntu.com/focal/python3-opencv) package with the following command in a terminal (it requires root privileges, hence `sudo`).

```
$ sudo apt-get install python3-opencv
```

Open Python IDLE (or IPython) and type the following code in the Python terminal.

```
import cv2 as cv
print(cv.__version__)
```

If the results are printed out without any errors, congratulations !!! You have installed OpenCV-Python successfully.

It is quite easy. But there is a problem with this: apt repositories may not always contain the latest version of OpenCV. For example, at the time of writing this tutorial, the apt repository contained 2.4.8 while the latest OpenCV version was 3.x. With respect to the Python API, the latest version will always contain much better support and the latest bug fixes.

So to get the latest source code, the preferred method is the next one, i.e. compiling from source. Also, if at some point you want to contribute to OpenCV, you will need this.

## Building OpenCV from source

Compiling from source may seem a little complicated at first, but once you have done it successfully, there is nothing complicated about it.

First we will install some dependencies. Some are required, some are optional. You can skip the optional dependencies if you don't want them.

### Required build dependencies

We need **CMake** to configure the installation, **GCC** for compilation, **Python-devel** and **Numpy** for building Python bindings etc.

```
sudo apt-get install cmake
sudo apt-get install gcc g++
```

to support python3:

```
sudo apt-get install python3-dev python3-numpy
```

Next we need **GTK** support for GUI features, Camera support (v4l), Media Support (ffmpeg, gstreamer) etc.

```
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev
```

to support gtk2:

```
sudo apt-get install libgtk2.0-dev
```

to support gtk3:

```
sudo apt-get install libgtk-3-dev
```

### Optional Dependencies

The dependencies above are sufficient to install OpenCV on your Ubuntu machine. But depending on your requirements, you may need some extra dependencies. A list of such optional dependencies is given below. You can either skip them or install them, your call :)

OpenCV comes with supporting files for image formats like PNG, JPEG, JPEG2000, TIFF, WebP etc. But they may be a little old. If you want to use the latest libraries, you can install the development files for the system libraries of these formats.

```
sudo apt-get install libpng-dev
sudo apt-get install libjpeg-dev
sudo apt-get install libopenexr-dev
sudo apt-get install libtiff-dev
sudo apt-get install libwebp-dev
```

@note If you are using Ubuntu 16.04 you can also install `libjasper-dev` to add a system level support for the JPEG2000 format.

### Downloading OpenCV

Download the latest source from OpenCV's [GitHub Repository](https://github.com/opencv/opencv). (If you want to contribute to OpenCV, choose this. For that, you need to install **Git** first.)

```
$ sudo apt-get install git
$ git clone https://github.com/opencv/opencv.git
```

It will create a folder "opencv" in current directory. The cloning may take some time depending upon your internet connection.

Now open a terminal window and navigate to the downloaded "opencv" folder. Create a new "build" folder and navigate to it.

```
$ cd opencv
$ mkdir build
$ cd build
```

### Configuring and Installing

Now that we have all the required dependencies, let's install OpenCV. The installation has to be configured with CMake. It specifies which modules are to be installed, the installation path, which additional libraries are to be used, whether documentation and examples are to be compiled, etc. Most of this work is done automatically with well-configured default parameters.

The command below is normally used to configure the OpenCV library build (executed from the build folder):

```
$ cmake ../
```

OpenCV defaults assume the "Release" build type and the installation path "/usr/local". For additional information about CMake options, refer to the OpenCV @ref tutorial\_linux\_install "C++ compilation guide".

You should see these lines in your CMake output (they mean that Python is properly found):

```
--   Python 3:
--     Interpreter:                 /usr/bin/python3.4 (ver 3.4.3)
--     Libraries:                   /usr/lib/x86_64-linux-gnu/libpython3.4m.so (ver 3.4.3)
--     numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.8.2)
--     packages path:               lib/python3.4/dist-packages
```

Now build the files using the "make" command and install them using the "make install" command.

```
$ make
$ sudo make install
```

Installation is over. All files are installed in the "/usr/local/" folder. Open a Python terminal and try to import "cv2".

```
import cv2 as cv
print(cv.__version__)
```

## [Py Setup In Windows](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_setup_in_windows/py_setup_in_windows/)

# Install OpenCV-Python in Windows {#tutorial\_py\_setup\_in\_windows}

@warning The instruction is deprecated. Please use OpenCV-Python package instead. See [https://github.com/opencv/opencv-python](https://github.com/opencv/opencv-python) for more details

## Goals

In this tutorial, we will learn to set up OpenCV-Python on your Windows system.

The steps below were tested on a Windows 7 64-bit machine with Visual Studio 2010 and Visual Studio 2012. The screenshots show VS2012.

## Installing OpenCV from prebuilt binaries

1.  Download the following Python packages and install them to their default locations:

    1.  Python 3.x (3.4+) from [here](https://www.python.org/downloads/).
    2.  Numpy package (for example, using the `pip install numpy` command).
    3.  Matplotlib (`pip install matplotlib`) (*Matplotlib is optional, but recommended since we use it a lot in our tutorials*).

2.  Install all packages into their default locations. Python will be installed to `C:/Python34/` in case of Python 3.4.
3.  After installation, open Python IDLE. Enter **import numpy** and make sure Numpy is working fine.
4.  Download the latest OpenCV release from [GitHub](https://github.com/opencv/opencv/releases) or the [SourceForge site](https://sourceforge.net/projects/opencvlibrary/files/) and double-click to extract it.
5.  Go to the **opencv/build/python/3.4** folder.
6.  Copy **cv2.pyd** to **C:/Python34/lib/site-packages**.
7.  Copy the **opencv\_world.dll** file to **C:/Python34/lib/site-packages**.
8.  Open Python IDLE and type the following code in the Python terminal.

    ```
    >>> import cv2 as cv
    >>> print( cv.__version__ )
    ```

If the results are printed out without any errors, congratulations !!! You have installed OpenCV-Python successfully.

## Building OpenCV from source

1.  Download and install Visual Studio and CMake.

    1.  [Visual Studio 2012](http://go.microsoft.com/?linkid=9816768)
    2.  [CMake](https://cmake.org/download/)

2.  Download and install necessary Python packages to their default locations.

    1.  Python
    2.  Numpy

    @note In this case, we are using 32-bit binaries of Python packages. But if you want to use OpenCV for x64, 64-bit binaries of Python packages are to be installed. The problem is that there are no official 64-bit binaries of Numpy. You have to build it on your own. For that, you have to use the same compiler used to build Python. When you start Python IDLE, it shows the compiler details. You can get more [information here](http://stackoverflow.com/q/2676763/1134940). So your system must have the same Visual Studio version and build Numpy from source.

    @note Another method to have 64-bit Python packages is to use ready-made Python distributions from third parties like [Anaconda](http://www.continuum.io/downloads), [Enthought](https://www.enthought.com/downloads/) etc. It will be bigger in size, but will have everything you need. Everything in a single shell. You can also download 32-bit versions.

3.  Make sure Python and Numpy are working fine.
4.  Download OpenCV source. It can be from [Sourceforge](http://sourceforge.net/projects/opencvlibrary/) (for official release version) or from [Github](https://github.com/opencv/opencv) (for latest source).
5.  Extract it to a folder, opencv, and create a new folder, build, in it.
6.  Open CMake-gui (_Start > All Programs > CMake-gui_).
7.  Fill the fields as follows (see the image below):

    1.  Click on **Browse Source...** and locate the opencv folder.
    2.  Click on **Browse Build...** and locate the build folder we created.
    3.  Click on **Configure**.
    4.  It will open a new window to select the compiler. Choose appropriate compiler (here, Visual Studio 11) and click **Finish**.
    5.  Wait until analysis is finished.

8.  You will see all the fields are marked in red. Click on the **WITH** field to expand it. It decides what extra features you need. So mark appropriate fields. See the below image:
9.  Now click on **BUILD** field to expand it. First few fields configure the build method. See the below image:
10. Remaining fields specify what modules are to be built. Since GPU modules are not yet supported by OpenCV-Python, you can completely avoid it to save time (But if you work with them, keep it there). See the image below:
11. Now click on **ENABLE** field to expand it. Make sure **ENABLE\_SOLUTION\_FOLDERS** is unchecked (Solution folders are not supported by Visual Studio Express edition). See the image below:
12. Also make sure that in the **PYTHON** field, everything is filled. (Ignore PYTHON\_DEBUG\_LIBRARY). See image below:
13. Finally click the **Generate** button.
14. Now go to our **opencv/build** folder. There you will find **OpenCV.sln** file. Open it with Visual Studio.
15. Check build mode as **Release** instead of **Debug**.
16. In the solution explorer, right-click on the **Solution** (or **ALL\_BUILD**) and build it. It will take some time to finish.
17. Again, right-click on **INSTALL** and build it. Now OpenCV-Python will be installed.
18. Open Python IDLE and enter 'import cv2 as cv'. If no error, it is installed correctly.

@note We have installed OpenCV without additional support such as TBB, Eigen, Qt, Documentation etc. It would be difficult to explain it all here. A more detailed video will be added soon, or you can just hack around.

## Exercises

If you have a Windows machine, compile OpenCV from source. Do all kinds of hacks. If you run into any problem, visit the OpenCV forum and explain your problem.

## [Py Table Of Contents Setup](https://docharvest.github.io/docs/opencv5/py_tutorials/py_setup/py_table_of_contents_setup/)

# Introduction to OpenCV {#tutorial\_py\_table\_of\_contents\_setup}

-   @subpage tutorial\_py\_intro
    
    Getting Started with OpenCV-Python
    
-   @subpage tutorial\_py\_pip\_install
    
    Install OpenCV for Python with pip
    
-   @subpage tutorial\_py\_setup\_in\_windows
    
    Set Up OpenCV-Python in Windows
    
-   @subpage tutorial\_py\_setup\_in\_fedora
    
    Set Up OpenCV-Python in Fedora
    
-   @subpage tutorial\_py\_setup\_in\_ubuntu
    
    Set Up OpenCV-Python in Ubuntu

## [Py Tutorials](https://docharvest.github.io/docs/opencv5/py_tutorials/py_tutorials/)

# OpenCV-Python Tutorials {#tutorial\_py\_root}

-   @subpage tutorial\_py\_table\_of\_contents\_setup
    
    Learn how to set up OpenCV-Python on your computer!
    
-   @subpage tutorial\_py\_table\_of\_contents\_gui
    
    Here you will learn how to display and save images and videos, handle mouse events, and create trackbars.
    
-   @subpage tutorial\_py\_table\_of\_contents\_core
    
    In this section you will learn basic operations on images, such as pixel editing, geometric transformations, code optimization, and some mathematical tools.
    
-   @subpage tutorial\_py\_table\_of\_contents\_imgproc
    
    In this section you will learn different image processing functions inside OpenCV.
    
-   @subpage tutorial\_py\_table\_of\_contents\_features
    
    In this section you will learn about feature detectors and descriptors
    
-   @ref tutorial\_table\_of\_content\_video
    
    In this section you will learn different techniques to work with videos like object tracking etc.
    
-   @subpage tutorial\_py\_table\_of\_contents\_calib3d
    
    In this section we will learn about camera calibration, stereo imaging etc.
    
-   @subpage tutorial\_py\_table\_of\_contents\_ml
    
    In this section you will learn different machine learning functions inside OpenCV.
    
-   @subpage tutorial\_py\_table\_of\_contents\_photo
    
    In this section you will learn different computational photography techniques like image denoising etc.
    
-   @ref tutorial\_table\_of\_content\_objdetect
    
    In this section you will learn object detection techniques.
    
-   @subpage tutorial\_py\_table\_of\_contents\_bindings
    
    In this section, we will see how OpenCV-Python bindings are generated

## [Py Bg Subtraction](https://docharvest.github.io/docs/opencv5/py_tutorials/py_video/py_bg_subtraction/py_bg_subtraction/)

# Background Subtraction {#tutorial\_py\_bg\_subtraction}

Tutorial content has been moved: @ref tutorial\_background\_subtraction

## [Py Lucas Kanade](https://docharvest.github.io/docs/opencv5/py_tutorials/py_video/py_lucas_kanade/py_lucas_kanade/)

# Optical Flow {#tutorial\_py\_lucas\_kanade}

Tutorial content has been moved: @ref tutorial\_optical\_flow

## [Py Meanshift](https://docharvest.github.io/docs/opencv5/py_tutorials/py_video/py_meanshift/py_meanshift/)

# Meanshift and Camshift {#tutorial\_py\_meanshift}

Tutorial content has been moved: @ref tutorial\_meanshift

## [Py Table Of Contents Video](https://docharvest.github.io/docs/opencv5/py_tutorials/py_video/py_table_of_contents_video/)

# Video Analysis {#tutorial\_py\_table\_of\_contents\_video}

Content has been moved: @ref tutorial\_table\_of\_content\_video

## [Table Of Content Highgui](https://docharvest.github.io/docs/opencv5/tutorials/app/_old/table_of_content_highgui/)

# High Level GUI and Media (highgui module) {#tutorial\_table\_of\_content\_highgui}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_app

## [Table Of Content Imgcodecs](https://docharvest.github.io/docs/opencv5/tutorials/app/_old/table_of_content_imgcodecs/)

# Image Input and Output (imgcodecs module) {#tutorial\_table\_of\_content\_imgcodecs}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_app

## [Table Of Content Videoio](https://docharvest.github.io/docs/opencv5/tutorials/app/_old/table_of_content_videoio/)

# Video Input and Output (videoio module) {#tutorial\_table\_of\_content\_videoio}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_app

## [Animations](https://docharvest.github.io/docs/opencv5/tutorials/app/animations/)

# Handling Animated Image Files {#tutorial\_animations}

@tableofcontents

|  |  |
| -: | :- |
| Original author | Suleyman Turkmen (with help of ChatGPT) |
| Compatibility | OpenCV >= 4.11 |

## Goal

In this tutorial, you will learn how to:

-   Use `cv::imreadanimation` to load frames from animated image files.
-   Understand the structure and parameters of the `cv::Animation` structure.
-   Display individual frames from an animation.
-   Use `cv::imwriteanimation` to write `cv::Animation` to a file.

## Source Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/4.x/samples/cpp/tutorial_code/imgcodecs/animations.cpp)
    
-   **Code at a glance:** @include samples/cpp/tutorial\_code/imgcodecs/animations.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/4.x/samples/python/tutorial_code/imgcodecs/animations.py)
    
-   **Code at a glance:** @include samples/python/tutorial\_code/imgcodecs/animations.py @end\_toggle
    

## Explanation

## Initializing the Animation Structure

Initialize a `cv::Animation` structure to hold the frames from the animated image file.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/imgcodecs/animations.cpp init\_animation @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgcodecs/animations.py init\_animation @end\_toggle

## Loading Frames

Use `cv::imreadanimation` to load frames from the specified file. Here, we load all frames from an animated WebP image.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/imgcodecs/animations.cpp read\_animation @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgcodecs/animations.py read\_animation @end\_toggle

## Displaying Frames

Each frame in the `animation.frames` vector can be displayed as a standalone image. This loop iterates through each frame, displaying it in a window with a short delay to simulate the animation.

> **Note:** Frame durations in `cv::Animation` are expressed in milliseconds. When displaying frames manually using `cv::waitKey`, make sure to use the corresponding duration value to preserve the original animation timing.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/imgcodecs/animations.cpp show\_animation @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgcodecs/animations.py show\_animation @end\_toggle
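The timing note above can be sketched in Python as follows. This is a hedged illustration, not the tutorial's sample code: the helper names `playback_delays` and `play` and the file path are hypothetical, and it assumes the Python binding `cv.imreadanimation` returns a success flag together with an animation object exposing `frames` and `durations`. Durations are clamped to at least 1 ms because `cv.waitKey(0)` waits indefinitely for a key press.

```python
def playback_delays(durations_ms, minimum_ms=1):
    """Clamp per-frame durations so each can be passed to cv.waitKey()."""
    # A delay of 0 would make cv.waitKey() block until a key is pressed.
    return [max(int(duration), minimum_ms) for duration in durations_ms]


def play(path, window="Animation"):
    """Show every frame of an animated image file using its own duration."""
    import cv2 as cv  # imported lazily; needs a non-headless OpenCV >= 4.11

    success, animation = cv.imreadanimation(path)
    if not success:
        raise IOError("could not read animation: " + path)

    delays = playback_delays(animation.durations)
    for frame, delay in zip(animation.frames, delays):
        cv.imshow(window, frame)
        # Stop early when the user presses 'q'.
        if cv.waitKey(delay) & 0xFF == ord("q"):
            break
    cv.destroyAllWindows()


if __name__ == "__main__":
    play("animated_image.webp")  # hypothetical file name
```

Keeping the clamping in a separate function makes the timing logic easy to check without opening a window.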

## Saving Animation

@add\_toggle\_cpp @snippet cpp/tutorial\_code/imgcodecs/animations.cpp write\_animation @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgcodecs/animations.py write\_animation @end\_toggle

## Summary

The `cv::imreadanimation` and `cv::imwriteanimation` functions make it easy to work with animated image files by loading frames into a `cv::Animation` structure, allowing frame-by-frame processing. With these functions, you can load, process, and save frames from animated image files like GIF, AVIF, APNG, and WebP.

## [Highgui Wayland Ubuntu](https://docharvest.github.io/docs/opencv5/tutorials/app/highgui_wayland_ubuntu/)

# Using Wayland highgui-backend in Ubuntu {#tutorial\_wayland\_ubuntu}

@tableofcontents

@prev\_tutorial{tutorial\_intelperc}

|  |  |
| -: | :- |
| Original author | Kumataro |
| Compatibility | OpenCV >= 4.10 |
|  | Ubuntu 24.04 |

## Goal

This tutorial shows how to use the Wayland highgui-backend in Ubuntu 24.04.

The Wayland highgui-backend is an experimental implementation.

## Setup

-   Setup Ubuntu 24.04.
-   `sudo apt install build-essential git cmake` to build OpenCV.
-   `sudo apt install libwayland-dev wayland-protocols libxkbcommon-dev` to enable Wayland highgui-backend.
-   (Option) `sudo apt install ninja-build` (or remove `-GNinja` option for cmake command).
-   (Option) `sudo apt install libwayland-egl1` to enable Wayland EGL library.

## Get OpenCV from GitHub

```
mkdir work
cd work
git clone --depth=1 https://github.com/opencv/opencv.git
```

@note The `--depth=1` option limits how many commits are downloaded. If you want to see more of the commit history, remove this option.

## Build/Install OpenCV with Wayland highgui-backend

Run `cmake` with `-DWITH_WAYLAND=ON` option to configure OpenCV.

```
cmake -S opencv -B build4-main -DWITH_WAYLAND=ON -GNinja
```

If configuration succeeds, the Wayland Client/Cursor/Protocols and Xkbcommon versions are shown. Wayland EGL is optional.

```
--
--   GUI:                           Wayland
--     Wayland:                     (Experimental) YES
--       Wayland Client:            YES (ver 1.22.0)
--       Wayland Cursor:            YES (ver 1.22.0)
--       Wayland Protocols:         YES (ver 1.34)
--       Xkbcommon:                 YES (ver 1.6.0)
--       Wayland EGL(Option):       YES (ver 18.1.0)
--     GTK+:                        NO
--     VTK support:                 NO
```

Run `cmake --build` to build, and `sudo cmake --install` to install into your system.

```
cmake --build build4-main
sudo cmake --install build4-main
sudo ldconfig
```

## Simple Application to try Wayland highgui-backend

Try this code to see the name returned by `cv::currentUIFramework()` and a window showing the OpenCV logo with the Wayland highgui-backend.

```
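// g++ main.cpp -o a.out -I /usr/local/include/opencv4 -lopencv_core -lopencv_highgui -lopencv_imgcodecs
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <string>

int main(void)
{
  // Print the name of the highgui backend that is currently in use.
  std::cout << "cv::currentUIFramework() returns " << cv::currentUIFramework() << std::endl;

  // Load the image to display; stop early if the file cannot be read.
  cv::Mat src = cv::imread("opencv-logo.png");
  if (src.empty())
  {
      std::cerr << "Could not read opencv-logo.png" << std::endl;
      return 1;
  }

  cv::namedWindow("src");

  // Keep redrawing the window until the 'q' key is pressed.
  int key = 0;
  do
  {
      cv::imshow("src", src);
      key = cv::waitKey(50);
  } while (key != 'q');
  return 0;
}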
```

## Limitation/Known problem

-   cv::moveWindow() is not implemented. (See [https://github.com/opencv/opencv/issues/25478](https://github.com/opencv/opencv/issues/25478).)

## [Intelperc](https://docharvest.github.io/docs/opencv5/tutorials/app/intelperc/)

# Using Creative Senz3D and other Intel RealSense SDK compatible depth sensors {#tutorial\_intelperc}

@tableofcontents

@prev\_tutorial{tutorial\_orbbec\_uvc} @next\_tutorial{tutorial\_wayland\_ubuntu}

Original author

Alessandro de Oliveira Faria

Compatibility

OpenCV >= 4.5.5

**Note**: This tutorial is partially obsolete since PerC SDK has been replaced with RealSense SDK

Depth sensors compatible with the Intel® RealSense SDK are supported through the VideoCapture class. The depth map, the RGB image, and some other output formats can be retrieved through the familiar VideoCapture interface.

To use a depth sensor with OpenCV, complete the following preliminary steps:

\-# Install Intel RealSense SDK 2.0 (from [https://github.com/IntelRealSense/librealsense](https://github.com/IntelRealSense/librealsense)).

\-# Configure OpenCV with Intel RealSense SDK support by setting the WITH\_LIBREALSENSE flag in CMake. If the Intel RealSense SDK is found in the install folders, OpenCV will be built with the Intel RealSense SDK library (see the LIBREALSENSE status in the CMake log).

\-# Build OpenCV.

VideoCapture can retrieve the following data:

\-# data given from depth generator:

    -   CAP\_INTELPERC\_DEPTH\_MAP - each pixel is a 16-bit integer. The value indicates the distance from an object to the camera's XY plane or the Cartesian depth. (CV\_16UC1)
    -   CAP\_INTELPERC\_UVDEPTH\_MAP - each pixel contains two 32-bit floating point values in the range of 0-1, representing the mapping of depth coordinates to the color coordinates. (CV\_32FC2)
    -   CAP\_INTELPERC\_IR\_MAP - each pixel is a 16-bit integer. The value indicates the intensity of the reflected laser beam. (CV\_16UC1)

\-# data given from RGB image generator:

    -   CAP\_INTELPERC\_IMAGE - color image. (CV\_8UC3)
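As a rough illustration of how one `CAP_INTELPERC_UVDEPTH_MAP` entry might be used, the sketch below converts a normalized (u, v) pair into integer pixel coordinates in the color image. It is plain C++ and does not call OpenCV. The helper `uvToColorPixel`, the `ColorPixel` struct, and the convention that 0 maps to the first pixel and 1 to the last one are assumptions made for this example; the SDK may use a different convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Pixel position in the color image (hypothetical helper type).
struct ColorPixel { int x; int y; };

// Convert one normalized (u, v) entry, each expected in [0, 1], to integer
// pixel coordinates in a color image of the given size.
// Assumption: 0 maps to the first column/row and 1 to the last one.
inline ColorPixel uvToColorPixel(float u, float v, int colorWidth, int colorHeight)
{
    assert(colorWidth > 0 && colorHeight > 0);
    // Clamp out-of-range values so the result always lies inside the image.
    u = std::min(std::max(u, 0.0f), 1.0f);
    v = std::min(std::max(v, 0.0f), 1.0f);
    return { static_cast<int>(std::lround(u * (colorWidth - 1))),
             static_cast<int>(std::lround(v * (colorHeight - 1))) };
}
```

With a 640x480 color image, the pair (1.0, 1.0) lands on pixel (639, 479) under this convention.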

To get the depth map from the depth sensor, use VideoCapture::operator >>, e.g.:

```
VideoCapture capture( CAP_REALSENSE );
for(;;)
{
    Mat depthMap;
    capture >> depthMap;

    if( waitKey( 30 ) >= 0 )
        break;
}
```

To get several data maps, use VideoCapture::grab and VideoCapture::retrieve, e.g.:

```
VideoCapture capture(CAP_REALSENSE);
for(;;)
{
    Mat depthMap;
    Mat image;
    Mat irImage;

    capture.grab();

    capture.retrieve( depthMap, CAP_INTELPERC_DEPTH_MAP );
    capture.retrieve(    image, CAP_INTELPERC_IMAGE );
    capture.retrieve(  irImage, CAP_INTELPERC_IR_MAP);

    if( waitKey( 30 ) >= 0 )
        break;
}
```

To set and get properties of the sensor's data generators, use the VideoCapture::set and VideoCapture::get methods respectively, e.g.:

```
VideoCapture capture(CAP_REALSENSE);
capture.set( CAP_INTELPERC_DEPTH_GENERATOR | CAP_PROP_INTELPERC_PROFILE_IDX, 0 );
cout << "FPS " << capture.get( CAP_INTELPERC_DEPTH_GENERATOR+CAP_PROP_FPS ) << endl;
```

Since two types of sensor data generators are supported (image generator and depth generator), there are two flags that should be used to set/get a property of the needed generator:

-   CAP\_INTELPERC\_IMAGE\_GENERATOR -- a flag for access to the image generator properties.
-   CAP\_INTELPERC\_DEPTH\_GENERATOR -- a flag for access to the depth generator properties. This flag value is assumed by default if neither of the two possible values of the property is set.

For more information, please refer to the usage example [videocapture\_depth.cpp](https://github.com/opencv/opencv/tree/5.x/samples/cpp/videocapture_depth.cpp) in the opencv/samples/cpp folder.

## [Kinect Openni](https://docharvest.github.io/docs/opencv5/tutorials/app/kinect_openni/)

# Using Kinect and other OpenNI compatible depth sensors {#tutorial\_kinect\_openni}

@tableofcontents

@prev\_tutorial{tutorial\_video\_write} @next\_tutorial{tutorial\_orbbec\_astra\_openni}

Depth sensors compatible with OpenNI (Kinect, XtionPRO, ...) are supported through the VideoCapture class. The depth map, the BGR image, and some other output formats can be retrieved through the familiar VideoCapture interface.

To use a depth sensor with OpenCV, complete the following preliminary steps:

\-# Install the OpenNI library (from [http://www.openni.org/downloadfiles](http://www.openni.org/downloadfiles)) and the PrimeSensor Module for OpenNI (from [https://github.com/avin2/SensorKinect](https://github.com/avin2/SensorKinect)). Install both into the default folders listed in the instructions of these products, e.g.:

```
OpenNI:
    Linux & MacOSX:
        Libs into: /usr/lib
        Includes into: /usr/include/ni
    Windows:
        Libs into: c:/Program Files/OpenNI/Lib
        Includes into: c:/Program Files/OpenNI/Include
PrimeSensor Module:
    Linux & MacOSX:
        Bins into: /usr/bin
    Windows:
        Bins into: c:/Program Files/Prime Sense/Sensor/Bin
```

If one or both products were installed into other folders, change the corresponding CMake variables OPENNI\_LIB\_DIR, OPENNI\_INCLUDE\_DIR and/or OPENNI\_PRIME\_SENSOR\_MODULE\_BIN\_DIR.

\-# Configure OpenCV with OpenNI support by setting the WITH\_OPENNI flag in CMake. If OpenNI is found in the install folders, OpenCV will be built with the OpenNI library (see the OpenNI status in the CMake log) even if the PrimeSensor Module is not found (see the OpenNI PrimeSensor Modules status in the CMake log). Without the PrimeSensor Module, OpenCV compiles successfully with the OpenNI library, but a VideoCapture object will not grab data from the Kinect sensor.

\-# Build OpenCV.

VideoCapture can retrieve the following data:

\-# data given from depth generator:

    -   CAP\_OPENNI\_DEPTH\_MAP - depth values in mm (CV\_16UC1)
    -   CAP\_OPENNI\_POINT\_CLOUD\_MAP - XYZ in meters (CV\_32FC3)
    -   CAP\_OPENNI\_DISPARITY\_MAP - disparity in pixels (CV\_8UC1)
    -   CAP\_OPENNI\_DISPARITY\_MAP\_32F - disparity in pixels (CV\_32FC1)
    -   CAP\_OPENNI\_VALID\_DEPTH\_MASK - mask of valid pixels (not occluded, not shaded etc.) (CV\_8UC1)

\-# data given from BGR image generator:

    -   CAP\_OPENNI\_BGR\_IMAGE - color image (CV\_8UC3)
    -   CAP\_OPENNI\_GRAY\_IMAGE - gray image (CV\_8UC1)
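The depth map (in mm) and the disparity map (in pixels) are two views of the same measurement, linked by the sensor's baseline and focal length. The sketch below assumes the usual stereo relation disparity = baseline * focal length / depth and treats a depth of 0 as "no measurement". Both are assumptions made for illustration, the helper names are hypothetical, and only the C++ standard library is used.

```cpp
#include <cassert>
#include <cstdint>

// Convert one depth sample in millimetres to metres.
inline float depthMmToMeters(std::uint16_t depthMm)
{
    return static_cast<float>(depthMm) / 1000.0f;
}

// Convert one depth sample (mm) to a disparity (pixels) using
// disparity = baseline * focalLength / depth.
// Assumption: a depth of 0 marks an invalid pixel and yields disparity 0.
inline float depthToDisparity(std::uint16_t depthMm, float baselineMm, float focalLengthPx)
{
    assert(baselineMm > 0.0f && focalLengthPx > 0.0f);
    if (depthMm == 0)
        return 0.0f;
    return baselineMm * focalLengthPx / static_cast<float>(depthMm);
}
```

For example, with an illustrative baseline of 75 mm and a focal length of 575 pixels (placeholder numbers, not values read from a device), a point 1000 mm away gives a disparity of 43.125 pixels.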

To get the depth map from the depth sensor, use VideoCapture::operator >>, e.g.:

```
VideoCapture capture( CAP_OPENNI2 );
for(;;)
{
    Mat depthMap;
    capture >> depthMap;

    if( waitKey( 30 ) >= 0 )
        break;
}
```

To get several data maps, use VideoCapture::grab and VideoCapture::retrieve, e.g.:

```
VideoCapture capture(0); // or CAP_OPENNI2
for(;;)
{
    Mat depthMap;
    Mat bgrImage;

    capture.grab();

    capture.retrieve( depthMap, CAP_OPENNI_DEPTH_MAP );
    capture.retrieve( bgrImage, CAP_OPENNI_BGR_IMAGE );

    if( waitKey( 30 ) >= 0 )
        break;
}
```

To set and get properties of the sensor's data generators, use the VideoCapture::set and VideoCapture::get methods respectively, e.g.:

```
VideoCapture capture( CAP_OPENNI2 );
capture.set( CAP_OPENNI_IMAGE_GENERATOR_OUTPUT_MODE, CAP_OPENNI_VGA_30HZ );
cout << "FPS " << capture.get( CAP_OPENNI_IMAGE_GENERATOR+CAP_PROP_FPS ) << endl;
```

Since two types of sensor data generators are supported (image generator and depth generator), there are two flags that should be used to set/get a property of the needed generator:

-   CAP\_OPENNI\_IMAGE\_GENERATOR -- A flag for access to the image generator properties.
-   CAP\_OPENNI\_DEPTH\_GENERATOR -- A flag for access to the depth generator properties. This flag value is assumed by default if neither of the two possible values of the property is set.

Some depth sensors (for example XtionPRO) do not have an image generator. To check for one, read the CAP\_OPENNI\_IMAGE\_GENERATOR\_PRESENT property:

```
bool isImageGeneratorPresent = capture.get( CAP_OPENNI_IMAGE_GENERATOR_PRESENT ) != 0; // or == 1
```

Flags specifying the needed generator type must be used in combination with a particular generator property. The following properties of cameras available through OpenNI interfaces are supported:

-   For image generator:
    
    -   CAP\_PROP\_OPENNI\_OUTPUT\_MODE -- Three output modes are supported: CAP\_OPENNI\_VGA\_30HZ used by default (image generator returns images in VGA resolution with 30 FPS), CAP\_OPENNI\_SXGA\_15HZ (image generator returns images in SXGA resolution with 15 FPS) and CAP\_OPENNI\_SXGA\_30HZ (image generator returns images in SXGA resolution with 30 FPS, the mode is supported by XtionPRO Live); depth generator's maps are always in VGA resolution.
-   For depth generator:
    
    -   CAP\_PROP\_OPENNI\_REGISTRATION -- Flag that registers the remapping of the depth map to the image map by changing the depth generator's view point (if the flag is "on") or sets this view point to its normal one (if the flag is "off"). The images resulting from the registration process are pixel-aligned, which means that every pixel in the image is aligned to a pixel in the depth image.
        
        The following properties are read-only:
        
    -   CAP\_PROP\_OPENNI\_FRAME\_MAX\_DEPTH -- A maximum supported depth of Kinect in mm.
        
    -   CAP\_PROP\_OPENNI\_BASELINE -- Baseline value in mm.
        
    -   CAP\_PROP\_OPENNI\_FOCAL\_LENGTH -- A focal length in pixels.
        
    -   CAP\_PROP\_FRAME\_WIDTH -- Frame width in pixels.
        
    -   CAP\_PROP\_FRAME\_HEIGHT -- Frame height in pixels.
        
    -   CAP\_PROP\_FPS -- Frame rate in FPS.
        
-   Some typical "generator type + property" flag combinations are defined as single flags:
    
    -   CAP\_OPENNI\_IMAGE\_GENERATOR\_OUTPUT\_MODE = CAP\_OPENNI\_IMAGE\_GENERATOR + CAP\_PROP\_OPENNI\_OUTPUT\_MODE
    -   CAP\_OPENNI\_DEPTH\_GENERATOR\_BASELINE = CAP\_OPENNI\_DEPTH\_GENERATOR + CAP\_PROP\_OPENNI\_BASELINE
    -   CAP\_OPENNI\_DEPTH\_GENERATOR\_FOCAL\_LENGTH = CAP\_OPENNI\_DEPTH\_GENERATOR + CAP\_PROP\_OPENNI\_FOCAL\_LENGTH
    -   CAP\_OPENNI\_DEPTH\_GENERATOR\_REGISTRATION = CAP\_OPENNI\_DEPTH\_GENERATOR + CAP\_PROP\_OPENNI\_REGISTRATION
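The "generator type + property" scheme above packs two pieces of information into one integer: a generator flag in the high bits and a property identifier in the low bits. The sketch below shows that idea in isolation. All numeric values and helper names are placeholders invented for this example; they are not copied from the OpenCV headers, so use the real `CAP_OPENNI_*` constants in actual code.

```cpp
#include <cassert>

// Placeholder values for illustration only (not the OpenCV constants).
constexpr int kImageGenerator = 1 << 30;  // hypothetical image generator flag
constexpr int kDepthGenerator = 1 << 29;  // hypothetical depth generator flag
constexpr int kGeneratorsMask = kImageGenerator | kDepthGenerator;
constexpr int kPropBaseline   = 102;      // hypothetical property identifier

// Combine a generator flag with a property identifier.
inline int makePropertyId(int generatorFlag, int property)
{
    // The property must not use any of the bits reserved for generator flags.
    assert((property & kGeneratorsMask) == 0);
    return generatorFlag + property;
}

// Extract the property identifier from a combined value.
inline int propertyOf(int id) { return id & ~kGeneratorsMask; }

// Extract the generator flag; fall back to the depth generator when none is
// set, mirroring the default described in the text.
inline int effectiveGenerator(int id)
{
    const int generator = id & kGeneratorsMask;
    return generator == 0 ? kDepthGenerator : generator;
}
```

Because the flag and the property occupy disjoint bits, adding them and OR-ing them give the same result, which is why the examples in this tutorial use both `+` and `|`.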

For more information please refer to the example of usage [videocapture\_depth.cpp](https://github.com/opencv/opencv/tree/5.x/samples/cpp/videocapture_depth.cpp) in opencv/samples/cpp folder.

## [Orbbec Astra Openni](https://docharvest.github.io/docs/opencv5/tutorials/app/orbbec_astra_openni/)

# Using Orbbec Astra 3D cameras {#tutorial\_orbbec\_astra\_openni}

@tableofcontents

@prev\_tutorial{tutorial\_kinect\_openni} @next\_tutorial{tutorial\_orbbec\_uvc}

### Introduction

This tutorial is devoted to the Astra Series of Orbbec 3D cameras ([https://www.orbbec.com/products/structured-light-camera/astra-series/](https://www.orbbec.com/products/structured-light-camera/astra-series/)). These cameras have a depth sensor in addition to a common color sensor. The depth sensor can be read using the open source OpenNI API with the @ref cv::VideoCapture class. The video stream is provided through the regular camera interface.

### Installation Instructions

In order to use the Astra camera's depth sensor with OpenCV you should do the following steps:

\-# Download the latest version of Orbbec OpenNI SDK (from here [https://www.orbbec.com/developers/openni-sdk/](https://www.orbbec.com/developers/openni-sdk/)). Unzip the archive, choose the build according to your operating system and follow installation steps provided in the Readme file.

\-# For instance, if you use 64-bit GNU/Linux, run:

```
$ cd Linux/OpenNI-Linux-x64-2.3.0.63/
$ sudo ./install.sh
```

When you are done with the installation, make sure to replug your device for the udev rules to take effect. The camera should now work as a general camera device. Note that your current user should belong to the `video` group to have access to the camera. Also, make sure to source the `OpenNIDevEnvironment` file:

```
$ source OpenNIDevEnvironment
```

To verify that the source command worked and that the OpenNI library and header files can be found, run the following commands; you should see something similar in your terminal:

```
$ echo $OPENNI2_INCLUDE
/home/user/OpenNI_2.3.0.63/Linux/OpenNI-Linux-x64-2.3.0.63/Include
$ echo $OPENNI2_REDIST
/home/user/OpenNI_2.3.0.63/Linux/OpenNI-Linux-x64-2.3.0.63/Redist
```

If the above two variables are empty, then you need to source `OpenNIDevEnvironment` again.

@note Orbbec OpenNI SDK version 2.3.0.86 and newer does not provide `install.sh` any more. You can use the following script to initialize the environment:

```
# Check if user is root/running with sudo
if [ `whoami` != root ]; then
    echo Please run this script with sudo
    exit
fi

ORIG_PATH=`pwd`
cd `dirname $0`
SCRIPT_PATH=`pwd`
cd $ORIG_PATH

if [ "`uname -s`" != "Darwin" ]; then
    # Install UDEV rules for USB device
    cp ${SCRIPT_PATH}/orbbec-usb.rules /etc/udev/rules.d/558-orbbec-usb.rules
    echo "usb rules file install at /etc/udev/rules.d/558-orbbec-usb.rules"
fi

OUT_FILE="$SCRIPT_PATH/OpenNIDevEnvironment"
echo "export OPENNI2_INCLUDE=$SCRIPT_PATH/../sdk/Include" > $OUT_FILE
echo "export OPENNI2_REDIST=$SCRIPT_PATH/../sdk/libs" >> $OUT_FILE
chmod a+r $OUT_FILE
echo "exit"
```

@note The last tried version `2.3.0.86_202210111154_4c8f5aa4_beta6` does not work correctly with modern Linux, even after the libusb rebuild recommended by the instructions. The last known good configuration is version 2.3.0.63 (tested with Ubuntu 18.04 amd64). It is not provided officially on the download page, but was published by Orbbec technical support on the Orbbec community forum [here](https://3dclub.orbbec3d.com/t/universal-download-thread-for-astra-series-cameras/622).

\-# Now you can configure OpenCV with OpenNI support enabled by setting the `WITH_OPENNI2` flag in CMake. You may also like to enable the `BUILD_EXAMPLES` flag to get a code sample working with your Astra camera. Run the following commands in the directory containing the OpenCV source code to enable OpenNI support:

```
$ mkdir build
$ cd build
$ cmake -DWITH_OPENNI2=ON ..
```

If the OpenNI library is found, OpenCV will be built with OpenNI2 support. You can see the status of OpenNI2 support in the CMake log:

```
--   Video I/O:
--     DC1394:                      YES (2.2.6)
--     FFMPEG:                      YES
--       avcodec:                   YES (58.91.100)
--       avformat:                  YES (58.45.100)
--       avutil:                    YES (56.51.100)
--       swscale:                   YES (5.7.100)
--       avresample:                NO
--     GStreamer:                   YES (1.18.1)
--     OpenNI2:                     YES (2.3.0)
--     v4l/v4l2:                    YES (linux/videodev2.h)
```

\-# Build OpenCV:

```
$ make
```

### Code

The Astra Pro camera has two sensors -- a depth sensor and a color sensor. The depth sensor can be read using the OpenNI interface with @ref cv::VideoCapture class. The video stream is not available through OpenNI API and is only provided via the regular camera interface. So, to get both depth and color frames, two @ref cv::VideoCapture objects should be created:

@snippetlineno samples/cpp/tutorial\_code/videoio/openni\_orbbec\_astra/openni\_orbbec\_astra.cpp Open streams

The first object will use the OpenNI2 API to retrieve depth data. The second one uses the Video4Linux2 interface to access the color sensor. Note that the example above assumes that the Astra camera is the first camera in the system. If you have more than one camera connected, you may need to explicitly set the proper camera number.

Before using the created VideoCapture objects you may want to set up stream parameters by setting objects' properties. The most important parameters are frame width, frame height and fps. For this example, we’ll configure width and height of both streams to VGA resolution, which is the maximum resolution available for both sensors, and we’d like both stream parameters to be the same for easier color-to-depth data registration:

@snippetlineno samples/cpp/tutorial\_code/videoio/openni\_orbbec\_astra/openni\_orbbec\_astra.cpp Setup streams

For setting and retrieving some property of sensor data generators use @ref cv::VideoCapture::set and @ref cv::VideoCapture::get methods respectively, e.g. :

@snippetlineno samples/cpp/tutorial\_code/videoio/openni\_orbbec\_astra/openni\_orbbec\_astra.cpp Get properties

The following properties of cameras available through OpenNI interface are supported for the depth generator:

-   @ref cv::CAP\_PROP\_FRAME\_WIDTH -- Frame width in pixels.
    
-   @ref cv::CAP\_PROP\_FRAME\_HEIGHT -- Frame height in pixels.
    
-   @ref cv::CAP\_PROP\_FPS -- Frame rate in FPS.
    
-   @ref cv::CAP\_PROP\_OPENNI\_REGISTRATION -- Flag that registers the remapping depth map to image map by changing the depth generator's viewpoint (if the flag is "on") or sets this view point to its normal one (if the flag is "off"). The registration process’ resulting images are pixel-aligned, which means that every pixel in the image is aligned to a pixel in the depth image.
    
-   @ref cv::CAP\_PROP\_OPENNI2\_MIRROR -- Flag to enable or disable mirroring for this stream. Set to 0 to disable mirroring.
    
    The following properties are read-only:
    
-   @ref cv::CAP\_PROP\_OPENNI\_FRAME\_MAX\_DEPTH -- A maximum supported depth of the camera in mm.
    
-   @ref cv::CAP\_PROP\_OPENNI\_BASELINE -- Baseline value in mm.
    

After the VideoCapture objects have been set up, you can start reading frames from them.

@note OpenCV's VideoCapture provides synchronous API, so you have to grab frames in a new thread to avoid one stream blocking while another stream is being read. VideoCapture is not a thread-safe class, so you need to be careful to avoid any possible deadlocks or data races.

As there are two video sources that should be read simultaneously, it’s necessary to create two threads to avoid blocking. The example implementation below gets frames from each sensor in a new thread and stores them in a list along with their timestamps:

@snippetlineno samples/cpp/tutorial\_code/videoio/openni\_orbbec\_astra/openni\_orbbec\_astra.cpp Read streams

VideoCapture can retrieve the following data:

\-# data given from the depth generator:

    -   @ref cv::CAP\_OPENNI\_DEPTH\_MAP - depth values in mm (CV\_16UC1)
    -   @ref cv::CAP\_OPENNI\_POINT\_CLOUD\_MAP - XYZ in meters (CV\_32FC3)
    -   @ref cv::CAP\_OPENNI\_DISPARITY\_MAP - disparity in pixels (CV\_8UC1)
    -   @ref cv::CAP\_OPENNI\_DISPARITY\_MAP\_32F - disparity in pixels (CV\_32FC1)
    -   @ref cv::CAP\_OPENNI\_VALID\_DEPTH\_MASK - mask of valid pixels (not occluded, not shaded, etc.) (CV\_8UC1)

\-# data given from the color sensor is a regular BGR image (CV\_8UC3).

When new data are available, each reading thread notifies the main thread using a condition variable. A frame is stored in the ordered list -- the first frame in the list is the earliest captured, the last frame is the latest captured. As depth and color frames are read from independent sources two video streams may become out of sync even when both streams are set up for the same frame rate. A post-synchronization procedure can be applied to the streams to combine depth and color frames into pairs. The sample code below demonstrates this procedure:

@snippetlineno samples/cpp/tutorial\_code/videoio/openni\_orbbec\_astra/openni\_orbbec\_astra.cpp Pair frames

In the code snippet above, execution is blocked until there are frames in both frame lists. When new frames arrive, their timestamps are compared: if they differ by more than half of the frame period, one of the frames is dropped. If the timestamps are close enough, the two frames are paired. We then have two frames, one containing color information and the other depth information. In the example above the retrieved frames are simply shown with the cv::imshow function, but you can insert any other processing code here.
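The pairing rule described above can be sketched independently of OpenCV. The example below is a simplified, single-threaded illustration using only the C++ standard library: the `Frame` struct, the `pairFrames` function, and the choice to drop the older of two mismatched frames are assumptions made for this sketch and may differ from the sample's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <utility>
#include <vector>

// Minimal stand-in for a captured frame: a timestamp and an identifier.
struct Frame { std::int64_t timestampUs; int id; };

// Pair depth and color frames whose timestamps differ by at most half a
// frame period. When the difference is larger, the older frame is dropped
// (an assumption of this sketch). Both lists are ordered oldest-first.
inline std::vector<std::pair<Frame, Frame>>
pairFrames(std::list<Frame>& depthFrames, std::list<Frame>& colorFrames,
           std::int64_t framePeriodUs)
{
    assert(framePeriodUs > 0);
    const std::int64_t maxDiffUs = framePeriodUs / 2;
    std::vector<std::pair<Frame, Frame>> pairs;

    while (!depthFrames.empty() && !colorFrames.empty())
    {
        const Frame depth = depthFrames.front();
        const Frame color = colorFrames.front();
        const std::int64_t diffUs = depth.timestampUs - color.timestampUs;

        if (diffUs > maxDiffUs)
            colorFrames.pop_front();       // color frame is too old: drop it
        else if (diffUs < -maxDiffUs)
            depthFrames.pop_front();       // depth frame is too old: drop it
        else
        {
            pairs.emplace_back(depth, color);
            depthFrames.pop_front();
            colorFrames.pop_front();
        }
    }
    return pairs;
}
```

In a threaded program the two lists would be filled by the reader threads and protected by a mutex while this function runs; that synchronization is left out here to keep the pairing logic visible.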

In the sample images below you can see the color frame and the depth frame representing the same scene. Looking at the color frame it's hard to distinguish plant leaves from leaves painted on a wall, but the depth data makes it easy.

The complete implementation can be found in [openni\_orbbec\_astra.cpp](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/videoio/openni_orbbec_astra/openni_orbbec_astra.cpp) in `samples/cpp/tutorial_code/videoio` directory.

## [Orbbec Uvc](https://docharvest.github.io/docs/opencv5/tutorials/app/orbbec_uvc/)

# Using Orbbec 3D cameras (UVC) {#tutorial\_orbbec\_uvc}

@tableofcontents

@prev\_tutorial{tutorial\_orbbec\_astra\_openni} @next\_tutorial{tutorial\_intelperc}

Original author

Jinyue Chen

Compatibility

OpenCV >= 4.10

### Introduction

This tutorial is devoted to the Orbbec 3D cameras based on the UVC protocol. For the older Orbbec 3D cameras that depend on OpenNI, please refer to the [previous tutorial](https://github.com/opencv/opencv/blob/5.x/doc/tutorials/app/orbbec_astra_openni.markdown).

Unlike the OpenNI-based Astra 3D cameras, which require OpenCV to be built with the OpenNI2 SDK, Orbbec UVC 3D cameras can be accessed via OpenCV without installing the Orbbec SDK. By using the `cv::VideoCapture` class, users get the stream data from 3D cameras, similar to working with USB cameras. The calibration and alignment of the depth map and color image are done internally.

### Instructions

To use the 3D cameras with OpenCV, refer to [Get Started](https://opencv.org/get-started/) to install OpenCV.

Note that from 4.11 on, macOS users need to compile OpenCV from source with the flag `-DOBSENSOR_USE_ORBBEC_SDK=ON` in order to use the cameras:

```
cmake -DOBSENSOR_USE_ORBBEC_SDK=ON ..
make
sudo make install
```

By default, when `-DOBSENSOR_USE_ORBBEC_SDK=ON` is enabled, OrbbecSDK v2 is used (i.e., `ORBBEC_SDK_VERSION` defaults to `2`); it supports the entire Orbbec Gemini 330 series.

If you need legacy cameras such as Orbbec Femto, Gemini2XL, or Astra+, switch to OrbbecSDK v1 with the flag `-DORBBEC_SDK_VERSION=1`:

```
  cmake -DOBSENSOR_USE_ORBBEC_SDK=ON -DORBBEC_SDK_VERSION=1 ..
  make -j
  sudo make install
```

## Code

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/blob/5.x/samples/python/tutorial_code/videoio/videocapture_obsensor.py). @include samples/python/tutorial\_code/videoio/videocapture\_obsensor.py @end\_toggle

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/videoio/uvc_orbbec/videocapture_obsensor.cpp). @include samples/cpp/tutorial\_code/videoio/uvc\_orbbec/videocapture\_obsensor.cpp @end\_toggle

### Code Explanation

#### Python

-   **Open Orbbec Depth Sensor**: Using `cv.VideoCapture(0, cv.CAP_OBSENSOR)` to attempt to open the first Orbbec depth sensor device. If the camera fails to open, the program will exit and display an error message.
    
-   **Loop to Grab and Process Data**: In an infinite loop, the code continuously grabs data from the camera. The `orbbec_cap.grab()` method attempts to grab a frame.
    
-   **Process BGR Image**: Using `orbbec_cap.retrieve(None, cv.CAP_OBSENSOR_BGR_IMAGE)` to retrieve the BGR image data. If successfully retrieved, the BGR image is displayed in a window using `cv.imshow("BGR", bgr_image)`.
    
-   **Process Depth Image**: Using `orbbec_cap.retrieve(None, cv.CAP_OBSENSOR_DEPTH_MAP)` to retrieve the depth image data. If successfully retrieved, the depth image is first normalized to a range of 0 to 255, then a false color image is applied, and the result is displayed in a window using `cv.imshow("DEPTH", color_depth_map)`.
    
-   **Keyboard Interrupt**: Using `cv.pollKey()` to detect keyboard events. If a key is pressed, the loop breaks and the program ends.
    
-   **Release Resources**: After exiting the loop, the camera resources are released using `orbbec_cap.release()`.
    

#### C++

-   **Open Orbbec Depth Sensor**: Using `VideoCapture obsensorCapture(0, CAP_OBSENSOR)` to attempt to open the first Orbbec depth sensor device. If the camera fails to open, an error message is displayed, and the program exits.
    
-   **Retrieve Camera Intrinsic Parameters**: Using `obsensorCapture.get()` to retrieve the intrinsic parameters of the camera, including focal lengths (`fx`, `fy`) and principal points (`cx`, `cy`).
    
-   **Loop to Grab and Process Data**: In an infinite loop, the code continuously grabs data from the camera. The `obsensorCapture.grab()` method attempts to grab a frame.
    
-   **Process BGR Image**: Using `obsensorCapture.retrieve(image, CAP_OBSENSOR_BGR_IMAGE)` to retrieve the BGR image data. If successfully retrieved, the BGR image is displayed in a window using `imshow("BGR", image)`.
    
-   **Process Depth Image**: Using `obsensorCapture.retrieve(depthMap, CAP_OBSENSOR_DEPTH_MAP)` to retrieve the depth image data. If successfully retrieved, the depth image is normalized and a false color image is applied, then the result is displayed in a window using `imshow("DEPTH", adjDepthMap)`. The retrieved depth values are in millimeters and are truncated to a range between 300 and 5000 (millimeter). This fixed range can be interpreted as a truncation based on the depth camera's depth range, removing invalid pixels on the depth map.
    
-   **Overlay Depth Map on BGR Image**: Convert the depth map to an 8-bit image, resize it to match the BGR image size, and overlay it on the BGR image with a specified transparency (`alpha`). The overlaid image is displayed in a window using `imshow("DepthToColor", image)`.
    
-   **Keyboard Interrupt**: Using `pollKey()` to detect keyboard events. If a key is pressed, the loop breaks and the program ends.
    
-   **Release Resources**: After exiting the loop, the camera resources are released.
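The depth handling described for the C++ sample (truncate to a fixed range, then rescale for display) can be pictured with a small standalone helper. The sketch below is one plausible way to do it in plain C++: the function name `depthToGray` and the integer rounding are assumptions of this example, and the sample itself works on whole images with OpenCV functions and may compute the values differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp one depth sample (mm) to [minMm, maxMm] and rescale it linearly to
// the 0..255 range used for an 8-bit display image.
// The default limits follow the 300-5000 mm range mentioned in the text.
inline std::uint8_t depthToGray(std::uint16_t depthMm,
                                std::uint16_t minMm = 300,
                                std::uint16_t maxMm = 5000)
{
    assert(minMm < maxMm);
    // Values outside the range are truncated to the nearest limit.
    const int clamped = std::min<int>(std::max<int>(depthMm, minMm), maxMm);
    // Integer arithmetic; the result is rounded down.
    return static_cast<std::uint8_t>((clamped - minMm) * 255 / (maxMm - minMm));
}
```

With the default limits, anything at or below 300 mm maps to 0 and anything at or above 5000 mm maps to 255, so out-of-range readings no longer stretch the color scale.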
    

### Results

#### Python

#### C++

### Note

-   Mac users need `sudo` privileges to execute the code.
-   **Firmware**: If you’re using an Orbbec UVC 3D camera, please ensure your camera’s firmware is updated to the latest version to avoid potential compatibility issues. For more details, see [Orbbec’s Release Notes](https://github.com/orbbec/OrbbecSDK_v2/releases).

## [Raster Io Gdal](https://docharvest.github.io/docs/opencv5/tutorials/app/raster_io_gdal/)

# Reading Geospatial Raster files with GDAL {#tutorial\_raster\_io\_gdal}

@tableofcontents

@prev\_tutorial{tutorial\_trackbar} @next\_tutorial{tutorial\_video\_input\_psnr\_ssim}

Original author

Marvin Smith

Compatibility

OpenCV >= 3.0

Geospatial raster data is a heavily used product in Geographic Information Systems and Photogrammetry. Raster data typically can represent imagery and Digital Elevation Models (DEM). The standard library for loading GIS imagery is the Geographic Data Abstraction Library [(GDAL)](http://www.gdal.org). In this example, we will show techniques for loading GIS raster formats using native OpenCV functions. In addition, we will show an example of how OpenCV can use this data for novel and interesting purposes.

## Goals

The primary objectives for this tutorial:

-   How to use OpenCV \[imread\](@ref imread) to load satellite imagery.
-   How to use OpenCV \[imread\](@ref imread) to load SRTM Digital Elevation Models.
-   Given the corner coordinates of both the image and DEM, correlate the elevation data to the image to find elevations for each pixel.
-   Show a basic, easy-to-implement example of a terrain heat map.
-   Show a basic use of DEM data coupled with ortho-rectified imagery.

To implement these goals, the following code takes a Digital Elevation Model as well as a GeoTiff image of San Francisco as input. The image and DEM data are processed to generate a terrain heat map of the image and to label the areas of the city that would be affected should the water level of the bay rise 10, 50, and 100 meters.
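The flood labeling mentioned above amounts to a threshold test on each elevation sample. The sketch below shows that idea on a single value using only the C++ standard library. The `FloodBand` enum, the function name, and the choice of strict `<` comparisons are assumptions of this example rather than details taken from the tutorial code.

```cpp
#include <cassert>
#include <cstdint>

// Which water-level rise (if any) would reach a given elevation.
enum class FloodBand { Below10m, Below50m, Below100m, Unaffected };

// Classify one elevation sample, given in meters as a signed 16-bit value
// (the sample type the tutorial reports for SRTM and DTED data).
inline FloodBand classifyElevation(std::int16_t elevationM)
{
    if (elevationM < 10)  return FloodBand::Below10m;
    if (elevationM < 50)  return FloodBand::Below50m;
    if (elevationM < 100) return FloodBand::Below100m;
    return FloodBand::Unaffected;
}
```

Applying this test to every DEM sample that corresponds to an image pixel yields the mask used to tint the affected areas.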

## Code

@include cpp/tutorial\_code/imgcodecs/GDAL\_IO/gdal-image.cpp

## How to Read Raster Data using GDAL

This demonstration uses the default OpenCV imread function. The primary difference is that in order to force GDAL to load the image, you must use the appropriate flag.

@snippet cpp/tutorial\_code/imgcodecs/GDAL\_IO/gdal-image.cpp load1

When loading digital elevation models, the actual numeric value of each pixel is essential and cannot be scaled or truncated. For example, with image data a pixel represented as a double with a value of 1 has an equal appearance to a pixel which is represented as an unsigned character with a value of 255. With terrain data, the pixel value represents the elevation in meters. In order to ensure that OpenCV preserves the native value, use the GDAL flag in imread with the ANYDEPTH flag.

@snippet cpp/tutorial\_code/imgcodecs/GDAL\_IO/gdal-image.cpp load2

If you know beforehand the type of DEM you are loading, then it may be a safe bet to test the Mat::type() or Mat::depth() using an assert or other mechanism. NASA or DOD specification documents can provide the input types for various elevation models. The major types, SRTM and DTED, are both signed shorts.

## Notes

### Lat/Lon (Geographic) Coordinates should normally be avoided

The Geographic Coordinate System is a spherical coordinate system, meaning that using them with Cartesian mathematics is technically incorrect. This demo uses them to increase the readability and is accurate enough to make the point. A better coordinate system would be Universal Transverse Mercator.
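To make the "Cartesian mathematics on lat/lon" shortcut concrete, the sketch below linearly interpolates a pixel position between the upper-left and lower-right corner coordinates of a north-up raster. It is a standalone illustration in plain C++: `GeoPoint` and `pixelToWorld` are hypothetical names, and the formula assumes that the first and last pixels sit exactly on the corners, which is a simplification.

```cpp
#include <cassert>

// Longitude/latitude pair in decimal degrees.
struct GeoPoint { double lon; double lat; };

// Map pixel (x, y) of a width x height raster to geographic coordinates by
// linear interpolation between the upper-left and lower-right corners.
// Assumption: the raster is north-up, so longitude depends only on x and
// latitude only on y.
inline GeoPoint pixelToWorld(int x, int y, int width, int height,
                             GeoPoint upperLeft, GeoPoint lowerRight)
{
    assert(width > 1 && height > 1);
    const double rx = static_cast<double>(x) / (width - 1);   // 0 at left, 1 at right
    const double ry = static_cast<double>(y) / (height - 1);  // 0 at top, 1 at bottom
    return { upperLeft.lon + rx * (lowerRight.lon - upperLeft.lon),
             upperLeft.lat + ry * (lowerRight.lat - upperLeft.lat) };
}
```

For a 3601x3601 tile spanning (-123, 38) to (-122, 37), the center pixel (1800, 1800) maps to (-122.5, 37.5). The error of this shortcut grows with the size of the area covered, which is why a projected system such as UTM is preferable for real measurements.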

### Finding the corner coordinates

One easy method to find the corner coordinates of an image is to use the command-line tool gdalinfo. For imagery which is ortho-rectified and contains the projection information, you can use the [USGS EarthExplorer](http://earthexplorer.usgs.gov).

```
$> gdalinfo N37W123.hgt

   Driver: SRTMHGT/SRTMHGT File Format
   Files: N37W123.hgt
   Size is 3601, 3601
   Coordinate System is:
   GEOGCS["WGS 84",
   DATUM["WGS_1984",

   ... more output ...

   Corner Coordinates:
   Upper Left  (-123.0001389,  38.0001389) (123d 0' 0.50"W, 38d 0' 0.50"N)
   Lower Left  (-123.0001389,  36.9998611) (123d 0' 0.50"W, 36d59'59.50"N)
   Upper Right (-121.9998611,  38.0001389) (121d59'59.50"W, 38d 0' 0.50"N)
   Lower Right (-121.9998611,  36.9998611) (121d59'59.50"W, 36d59'59.50"N)
   Center      (-122.5000000,  37.5000000) (122d30' 0.00"W, 37d30' 0.00"N)

   ... more output ...
```

## Results

Below is the output of the program. Use the first image as the input. For the DEM, download the SRTM file from the USGS: [http://dds.cr.usgs.gov/srtm/version2\_1/SRTM1/Region\_04/N37W123.hgt.zip](http://dds.cr.usgs.gov/srtm/version2_1/SRTM1/Region_04/N37W123.hgt.zip)

## [Table Of Content App](https://docharvest.github.io/docs/opencv5/tutorials/app/table_of_content_app/)

# Application utils (highgui, imgcodecs, videoio modules) {#tutorial\_table\_of\_content\_app}

-   @subpage tutorial\_trackbar
-   @subpage tutorial\_raster\_io\_gdal
-   @subpage tutorial\_video\_input\_psnr\_ssim
-   @subpage tutorial\_video\_write
-   @subpage tutorial\_kinect\_openni
-   @subpage tutorial\_orbbec\_astra\_openni
-   @subpage tutorial\_orbbec\_uvc
-   @subpage tutorial\_intelperc
-   @subpage tutorial\_wayland\_ubuntu
-   @subpage tutorial\_animations

## [Trackbar](https://docharvest.github.io/docs/opencv5/tutorials/app/trackbar/)

# Adding a Trackbar to our applications! {#tutorial\_trackbar}

@tableofcontents

@next\_tutorial{tutorial\_raster\_io\_gdal}

|                 |               |
| --------------: | :------------ |
| Original author | Ana Huamán    |
| Compatibility   | OpenCV >= 3.0 |

-   In the previous tutorials (about @ref tutorial\_adding\_images and the @ref tutorial\_basic\_linear\_transform) you might have noted that we needed to give some **input** to our programs, such as \\f$\\alpha\\f$ and \\f$\\beta\\f$. We accomplished that by entering this data in the terminal.
    
-   Well, it is time to use some fancy GUI tools. OpenCV provides some GUI utilities (**highgui** module) for you. An example of this is a **Trackbar**.
    
-   In this tutorial we will just modify our two previous programs so that they get the input information from the trackbar.
    

## Goals

In this tutorial you will learn how to:

-   Add a Trackbar in an OpenCV window by using @ref cv::createTrackbar

## Code

Let's modify the program made in the tutorial @ref tutorial\_adding\_images. We will let the user enter the \\f$\\alpha\\f$ value by using the Trackbar.

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/HighGUI/AddingImagesTrackbar.cpp) @include cpp/tutorial\_code/HighGUI/AddingImagesTrackbar.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/highgui/trackbar/AddingImagesTrackbar.java) @include java/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/highgui/trackbar/AddingImagesTrackbar.py) @include python/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.py @end\_toggle

## Explanation

We only analyze the code that is related to Trackbar:

-   First, we load two images, which are going to be blended.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/HighGUI/AddingImagesTrackbar.cpp load @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.java load @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.py load @end\_toggle

-   To create a trackbar, first we have to create the window in which it is going to be located. So:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/HighGUI/AddingImagesTrackbar.cpp window @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.java window @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.py window @end\_toggle

-   Now we can create the Trackbar:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/HighGUI/AddingImagesTrackbar.cpp create\_trackbar @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.java create\_trackbar @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.py create\_trackbar @end\_toggle

Note the following (C++ code):

-   Our Trackbar has a label **TrackbarName**
-   The Trackbar is located in the window named **Linear Blend**
-   The Trackbar values will be in the range from \\f$0\\f$ to **alpha\_slider\_max** (the minimum limit is always **zero**).
-   The numerical value of the Trackbar is stored in **alpha\_slider**
-   Whenever the user moves the Trackbar, the callback function **on\_trackbar** is called

Finally, we have to define the callback: the function **on\_trackbar** in the C++ and Python code, and an anonymous inner class listener in Java.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/HighGUI/AddingImagesTrackbar.cpp on\_trackbar @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.java on\_trackbar @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/highgui/trackbar/AddingImagesTrackbar.py on\_trackbar @end\_toggle

Note that (C++ code):

-   We use the value of **alpha\_slider** (integer) to get a double value for **alpha**.
-   **alpha\_slider** is updated each time the user moves the trackbar.
-   We define _src1_, _src2_, _dst_, _alpha_, _alpha\_slider_ and _beta_ as global variables, so they can be used everywhere.
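The mapping from the integer slider position to the two blending weights can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the tutorial sample itself: the helper names `slider_to_weights` and `blend_pixel` are hypothetical, the default maximum of 100 is an assumption, and a single pixel intensity stands in for a whole image.

```python
def slider_to_weights(alpha_slider, alpha_slider_max=100):
    """Map an integer trackbar position to the blend weights (alpha, beta)."""
    # True division turns the integer position into a float in [0, 1].
    alpha = alpha_slider / alpha_slider_max
    beta = 1.0 - alpha
    return alpha, beta


def blend_pixel(p1, p2, alpha_slider, alpha_slider_max=100):
    """Weighted sum of two pixel intensities: alpha * p1 + beta * p2."""
    alpha, beta = slider_to_weights(alpha_slider, alpha_slider_max)
    return alpha * p1 + beta * p2


print(slider_to_weights(25))      # (0.25, 0.75)
print(blend_pixel(200, 100, 50))  # 150.0
```

A real callback would apply the same weights to the two loaded images and redraw the window each time the slider moves.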

## Result

-   Our program produces the following output:
    
-   As practice, you can also add two trackbars to the program made in @ref tutorial\_basic\_linear\_transform: one trackbar to set \\f$\\alpha\\f$ and another to set \\f$\\beta\\f$. The output might look like:

## [Video Input Psnr Ssim](https://docharvest.github.io/docs/opencv5/tutorials/app/video_input_psnr_ssim/)

# Video Input with OpenCV and similarity measurement {#tutorial\_video\_input\_psnr\_ssim}

@tableofcontents

@prev\_tutorial{tutorial\_raster\_io\_gdal} @next\_tutorial{tutorial\_video\_write}

|                 |               |
| --------------: | :------------ |
| Original author | Bernát Gábor  |
| Compatibility   | OpenCV >= 3.0 |

## Goal

Today it is common to have a digital video recording system at your disposal. Sooner or later you will therefore find yourself processing video streams rather than a batch of images. These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files stored on a hard disk drive. Luckily OpenCV treats these two in the same manner, with the same C++ class. So here's what you'll learn in this tutorial:

-   How to open and read video streams
-   Two ways for checking image similarity: PSNR and SSIM

## The source code

As a test case to show these off using OpenCV, I've created a small program that reads in two video files and performs a similarity check between them. This is something you could use to check just how well a new video compression algorithm works. Let there be a reference (original) video like [this small Megamind clip](https://github.com/opencv/opencv/tree/5.x/samples/data/Megamind.avi) and [a compressed version of it](https://github.com/opencv/opencv/tree/5.x/samples/data/Megamind_bugy.avi). You may also find the source code and these video files in the `samples/data` folder of the OpenCV source library.

@add\_toggle\_cpp @include cpp/tutorial\_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp @end\_toggle

@add\_toggle\_python @include samples/python/tutorial\_code/videoio/video-input-psnr-ssim.py @end\_toggle

## How to read a video stream (online-camera or offline-file)?

Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture C++ class. This in turn builds on the FFmpeg open source library. This is a basic dependency of OpenCV so you shouldn't need to worry about it. A video is composed of a succession of images, which the literature refers to as frames. In the case of a video file there is a _frame rate_ specifying how much time passes between two frames. Video cameras usually have a limit on how many frames they can digitize per second, but this property is less important there because at any time the camera sees the current snapshot of the world.

The first task you need to do is to assign a source to a @ref cv::VideoCapture object. You can do this either via the @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open function. If the argument is an integer then you will bind the object to a camera, a device. The number passed here is the ID of the device, assigned by the operating system. If you have a single camera attached to your system its ID will probably be zero, with further ones increasing from there. If the parameter passed is a string it will refer to a video file, and the string gives the location and name of the file. For example, a valid command line for the source code above is:

```
video/Megamind.avi video/Megamind_bug.avi 35 10
```

We do a similarity check. This requires a reference and a test case video file. The first two arguments refer to these. Here we use a relative path. This means that the application will look into its current working directory, open the video folder and try to find inside it the _Megamind.avi_ and the _Megamind\_bug.avi_ files.

```
const string sourceReference = argv[1], sourceCompareWith = argv[2];

VideoCapture captRefrnc(sourceReference);
// or
VideoCapture captUndTst;
captUndTst.open(sourceCompareWith);
```

To check whether the binding of the object to a video source was successful, use the @ref cv::VideoCapture::isOpened function:

```
if ( !captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
```

Closing the video is automatic when the object's destructor is called. However, if you want to close it before this you need to call its @ref cv::VideoCapture::release function. The frames of the video are just simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture object and put them inside a _Mat_ one. The video streams are sequential. You may get the frames one after another with @ref cv::VideoCapture::read or the overloaded >> operator:

```
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;
captUndTst.read(frameUnderTest);
```

The read operations above will leave the _Mat_ objects empty if no frame could be acquired (either because the video stream was closed or because you reached the end of the video file). We can check this with a simple if:

```
if( frameReference.empty() || frameUnderTest.empty())
{
    // exit the program
}
```

A read consists of a frame grab followed by a decoding step. You may call these two explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.

Videos have a lot of information attached to them besides the content of the frames. These are usually numbers, however in some cases they may be short character sequences (4 bytes or less). To acquire this information there is a general function named @ref cv::VideoCapture::get that returns double values containing these properties. Its single argument is the ID of the queried property. Use bitwise operations to decode the characters from a double type, and conversions where valid values are only integers. For example, here we get the size of the frames in the reference and test case video files, plus the number of frames inside the reference:

```
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << " Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
```

When you are working with videos you may often want to control these values yourself. To do this there is a @ref cv::VideoCapture::set function. Its first argument remains the name of the property you want to change, and there is a second one of double type containing the value to be set. It will return true if it succeeds and false otherwise. Good examples of this are seeking in a video file to a given time or frame:

```
captRefrnc.set(CAP_PROP_POS_MSEC, 1200);  // go to 1.2 seconds into the video (the value is in milliseconds)
captRefrnc.set(CAP_PROP_POS_FRAMES, 10);  // go to the 10th frame of the video
// now a read operation would read the frame at the set position
```

For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and @ref cv::VideoCapture::set functions.
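The bitwise decoding of a short character sequence from a double can be sketched as follows. This is a minimal illustration in plain Python rather than part of the tutorial sample: it assumes a four-character code packed into an integer with the first character in the lowest byte, which is how a codec identifier such as the one behind `CAP_PROP_FOURCC` is laid out. The helper names `encode_fourcc` and `decode_fourcc` are hypothetical.

```python
def encode_fourcc(text):
    """Pack four ASCII characters into one integer, first character in the lowest byte."""
    assert len(text) == 4
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(text))


def decode_fourcc(value):
    """Recover the four characters from a property value returned as a float."""
    code = int(value)  # the getter returns a double, so convert back to an integer first
    return "".join(chr((code >> (8 * i)) & 0xFF) for i in range(4))


# Simulate a property value that arrives as a floating-point number.
as_double = float(encode_fourcc("XVID"))
print(decode_fourcc(as_double))  # XVID
```

The conversion to `int` before shifting matters: bitwise operators are not defined on floating-point values.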

### Image similarity - PSNR and SSIM

We want to check just how imperceptible the effect of our video conversion was, therefore we need a way to check the similarity or difference frame by frame. The most common algorithm used for this is the PSNR (aka **Peak signal-to-noise ratio**). The simplest definition of this starts out from the _mean squared error_. Let there be two images, I1 and I2, with a two-dimensional size of i by j, composed of c channels.

\\f\[MSE = \\frac{1}{c \\cdot i \\cdot j} \\sum{(I\_1-I\_2)^2}\\f\]

Then the PSNR is expressed as:

\\f\[PSNR = 10 \\cdot \\log\_{10} \\left( \\frac{MAX\_I^2}{MSE} \\right)\\f\]

Here \\f$MAX\_I\\f$ is the maximum valid value for a pixel. For a simple image with a single byte per pixel per channel this is 255. When two images are the same the MSE is zero, resulting in an invalid divide-by-zero operation in the PSNR formula. In this case the PSNR is undefined, so we need to handle it separately. The transition to a logarithmic scale is made because the pixel values have a very wide dynamic range. All this translated to OpenCV and a function looks like:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp get-psnr @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/videoio/video-input-psnr-ssim.py get-psnr @end\_toggle
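For readers who want to check the formula by hand, here is a minimal sketch of the same computation in plain Python, without OpenCV. It assumes images given as nested lists (rows of pixels, each pixel a tuple of channel values) and returns infinity for identical images. That sentinel is a choice made for this sketch, and the tutorial sample may handle the zero-MSE case differently.

```python
import math


def psnr(img1, img2, max_value=255.0):
    """PSNR in dB between two equally sized images (rows -> pixels -> channels)."""
    squared_error = 0.0
    count = 0  # ends up as c * i * j, the number of compared values
    for row1, row2 in zip(img1, img2):
        for px1, px2 in zip(row1, row2):
            for v1, v2 in zip(px1, px2):
                squared_error += (v1 - v2) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: the formula would divide by zero
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[(10, 20, 30), (40, 50, 60)]]
off_by_one = [[(11, 21, 31), (41, 51, 61)]]  # every value differs by 1, so MSE = 1
print(round(psnr(reference, off_by_one), 2))  # 48.13
print(psnr(reference, reference))             # inf
```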

Typical result values are anywhere between 30 and 50 for video compression, where higher is better. If the images differ significantly you'll get much lower values, like 15 or so. This similarity check is easy and fast to calculate, however in practice it may turn out somewhat inconsistent with human eye perception. The **structural similarity** algorithm aims to correct this.

Describing the method goes well beyond the purpose of this tutorial. For that I invite you to read the article introducing it. Nevertheless, you can get a good idea of it by looking at the OpenCV implementation below.

@note SSIM is described in more depth in the article: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp get-mssim @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/videoio/video-input-psnr-ssim.py get-mssim @end\_toggle

This will return a similarity index for each channel of the image. This value is between zero and one, where one corresponds to a perfect fit. Unfortunately, the many Gaussian blurring operations are quite costly, so while the PSNR may work in a real-time-like environment (24 frames per second), the SSIM takes significantly longer to compute.

Therefore, the source code presented at the start of the tutorial will perform the PSNR measurement for each frame, and the SSIM only for the frames where the PSNR falls below an input value. For visualization purpose we show both images in an OpenCV window and print the PSNR and MSSIM values to the console. Expect to see something like:

You can watch a run of this program [on YouTube](https://www.youtube.com/watch?v=iOcNljutOgg).

@youtube{iOcNljutOgg}

## [Video Write](https://docharvest.github.io/docs/opencv5/tutorials/app/video_write/)


## [Camera Calibration Pattern](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/camera_calibration_pattern/camera_calibration_pattern/)

# Create Calibration Pattern {#tutorial\_camera\_calibration\_pattern}

@tableofcontents

@next\_tutorial{tutorial\_camera\_calibration\_square\_chess}

|               |                                                      |
| ------------: | :--------------------------------------------------- |
| Authors       | Laurent Berger, Alexander Panov, Alexander Smorkalov |
| Compatibility | OpenCV > 4.12                                        |

The tutorial describes all patterns supported by OpenCV for camera calibration and pose estimation, with their strengths, pitfalls and practical recommendations.

## What is a calibration pattern? Why do I need it?

A flat printable pattern may be used:

1.  For camera intrinsics (internal parameters) calibration. See @ref tutorial\_camera\_calibration.
2.  For stereo or multi-camera system extrinsics (external parameters: rotation and translation of each camera) calibration. See cv::stereoCalibrate for details.
3.  For camera pose registration relative to a well-known point in the 3D world. See @ref tutorial\_multiview\_camera\_calibration.

## Pattern Types

**Chessboard**. Classic calibration pattern of black and white squares. All calibration algorithms use the internal chessboard corners as features. See cv::findChessboardCorners and cv::cornerSubPix to detect the board and refine the corner coordinates with sub-pixel accuracy. The board size is defined as the number of internal corners, not the number of black or white squares. Also note that a board with an even size is symmetric. If the board has an even number of corners in one direction, then its pose is defined only up to 180 degrees (2 solutions). If the board is square with size N x N, then its pose is defined only up to 90 degrees (4 solutions). The last two cases are not suitable for calibration. Example code to generate feature coordinates for calibration (object points):

```
    std::vector<cv::Point3f> objectPoints;
    for (int i = 0; i < boardSize.height; ++i) {
        for (int j = 0; j < boardSize.width; ++j) {
            objectPoints.push_back(cv::Point3f(j*squareSize, i*squareSize, 0));
        }
    }
```

Printable chessboard pattern: [https://github.com/opencv/opencv/blob/5.x/doc/pattern.png](https://github.com/opencv/opencv/blob/5.x/doc/pattern.png) (9x6 chessboard, page width: 210 mm, page height: 297 mm (A4))

**Circles Grid**. The circles grid is a symmetric or asymmetric (each even row shifted) grid of black circles on a white background, or vice versa. See the cv::findCirclesGrid function to detect the board with OpenCV. The detector produces sub-pixel coordinates of the circle centers and does not require additional refinement. The board size is defined as the number of circles in the grid along the x and y axes. In the case of an asymmetric grid the shifted rows are taken into account too. The board is suitable for intrinsics calibration. Symmetric grids suffer from the same issue as a chessboard pattern with even size: the pose is defined only up to 180 degrees. Example code to generate feature coordinates for calibration with a symmetric grid (object points):

```
    std::vector<cv::Point3f> objectPoints;
    for (int i = 0; i < boardSize.height; ++i) {
        for (int j = 0; j < boardSize.width; ++j) {
            objectPoints.push_back(cv::Point3f(j*squareSize, i*squareSize, 0));
        }
    }
```

Example code to generate feature coordinates for calibration with an asymmetric grid (object points):

```
    std::vector<cv::Point3f> objectPoints;
    for (int i = 0; i < boardSize.height; i++) {
        for (int j = 0; j < boardSize.width; j++) {
            objectPoints.push_back(cv::Point3f((2 * j + i % 2)*squareSize, i*squareSize, 0));
        }
    }
```
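To see how the two circle-grid layouts differ, the same index arithmetic can be run in plain Python and the resulting coordinates inspected directly. This is a sketch mirroring the loops above. The function names are hypothetical and `spacing` plays the role of `squareSize`.

```python
def symmetric_grid_points(width, height, spacing):
    """Object points of a regular grid, row by row, in the plane z = 0."""
    return [(j * spacing, i * spacing, 0.0)
            for i in range(height) for j in range(width)]


def asymmetric_grid_points(width, height, spacing):
    """Object points of an asymmetric grid: every odd row is shifted by one spacing unit."""
    return [((2 * j + i % 2) * spacing, i * spacing, 0.0)
            for i in range(height) for j in range(width)]


print(symmetric_grid_points(2, 2, 1.0))
# [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(asymmetric_grid_points(2, 2, 1.0))
# [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
```

In the asymmetric layout the x coordinates within a row are two units apart, and the second row starts one unit to the right of the first.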

Printable asymmetric circles grid pattern: [https://github.com/opencv/opencv/blob/5.x/doc/acircles\_pattern.png](https://github.com/opencv/opencv/blob/5.x/doc/acircles_pattern.png) (11x4 asymmetric circles grid, page width: 210 mm, page height: 297 mm (A4))

**ChAruco board**. Chessboard enriched with ArUco markers. Each internal corner of the board is described by 2 neighboring ArUco markers, which makes it unique. The board size is defined in number of units, not internal corners. A ChAruco board of size N x M is equivalent to a chessboard pattern of size N-1 x M-1. OpenCV provides the `cv::aruco::CharucoDetector` class for board detection. The detector algorithm finds the ArUco markers first and then "assembles" the board using knowledge about the ArUco pairs. Unlike the previous patterns, a partially occluded board may be used, as all corners are labeled. The board is rotation invariant, but the set of ArUco markers and their order must be known to the detector a priori. It cannot detect a ChAruco board with a predefined size and a random set of markers. Example code to generate feature coordinates for calibration (object points) for a board size given in units:

```
    std::vector<cv::Point3f> objectPoints;
    for (int i = 0; i < boardSize.height-1; ++i) {
        for (int j = 0; j < boardSize.width-1; ++j) {
            objectPoints.push_back(cv::Point3f(j*squareSize, i*squareSize, 0));
        }
    }
```

Printable ChAruco board pattern: [https://github.com/opencv/opencv/blob/5.x/doc/charuco\_board\_pattern.png](https://github.com/opencv/opencv/blob/5.x/doc/charuco_board_pattern.png) (7X5 ChAruco board, square size: 30 mm, marker size: 15 mm, ArUco dict: DICT\_5X5\_100, page width: 210 mm, page height: 297 mm (A4))

## Create Your Own Pattern

If a ready-made pattern does not satisfy your requirements, you can generate your own. OpenCV provides the generate\_pattern.py tool in `apps/pattern-tools` of the source repository or your binary distribution. The only requirement is Python 3.

Examples:

create a checkerboard pattern in file chessboard.svg with 9 rows, 6 columns and a square size of 20mm:

```
    python generate_pattern.py -o chessboard.svg --rows 9 --columns 6 --type checkerboard --square_size 20
```

create a circle board pattern in file circleboard.svg with 7 rows, 5 columns and a square size of 15 mm:

```
    python generate_pattern.py -o circleboard.svg --rows 7 --columns 5 --type circles --square_size 15
```

create an asymmetric circle board pattern in file acircleboard.svg with 7 rows, 5 columns, a square size of 10 mm and less spacing between circles:

```
    python generate_pattern.py -o acircleboard.svg --rows 7 --columns 5 --type acircles --square_size 10 --radius_rate 2
```

create a radon checkerboard for findChessboardCornersSB() with markers in (7 4), (7 5), (8 5) cells:

```
    python generate_pattern.py -o radon_checkerboard.svg --rows 10 --columns 15 --type radon_checkerboard -s 12.1 -m 7 4 7 5 8 5
```

create a ChAruco board pattern in charuco\_board.svg with 7 rows, 5 columns, square size 30 mm, ArUco marker size 15 mm and using DICT\_5X5\_100 as the dictionary for the ArUco markers (it is contained in the DICT\_ARUCO.json file):

```
    python generate_pattern.py -o charuco_board.svg --rows 7 --columns 5 -T charuco_board --square_size 30 --marker_size 15 -f DICT_5X5_100.json.gz
```

If you want to change the measurement units, use the -u option (e.g. mm, inches, px, m).

If you want to change the page size, use the -w (width) and -h (height) options.

If you want to use your own dictionary for the ChAruco board, specify the name of your dictionary file. For example:

```
    python generate_pattern.py -o charuco_board.svg --rows 7 --columns 5 -T charuco_board -f my_dictionary.json
```

You can generate your own dictionary in the file my\_dictionary.json, with 30 markers and a marker size of 5 bits, using the utility provided in `samples/cpp/aruco_dict_utils.cpp`.

```
    bin/example_cpp_aruco_dict_utils.exe my_dictionary.json -nMarkers=30 -markerSize=5
```

## Pattern Size

A pattern is defined by its physical board size, the physical size of its elements (squares or circles) and the number of elements. Factors that affect calibration quality:

-   **Number of features**. Most OpenCV functions that work with detected patterns use optimization or some random consensus strategy inside. More features on the board mean more points for optimization and better estimation quality. The calibration process requires several images, so in most cases a lower number of pattern features can be compensated by a higher number of frames.
    
-   **Element size**. The required physical size of the elements depends on the distance to the camera and on the resulting size in pixels. Each detector defines some minimal size for reliable detection. For a circles grid it is the circle radius, for a chessboard it is the square size, for a ChAruco board it is the ArUco marker element size. General recommendation: larger elements (in frame pixels) reduce detection uncertainty.
    
-   **Board size**. The board should be fully visible, sharp and reliably detected by OpenCV algorithms. So the board size should satisfy the previous items when it is used at the typical target distance. Usually a larger board is better, but smaller boards allow corners to be calibrated better.
    

## Generic Recommendations

1.  The final pattern should be as flat as possible. This improves calibration accuracy.
2.  A glossy pattern is worse than a matte one. Glare and shadows on a glossy surface degrade board detection significantly.
3.  Most detection algorithms expect a white (black) border around the markers. Please do not cut it off or cover it.

## [Camera Calibration Square Chess](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/camera_calibration_square_chess/camera_calibration_square_chess/)

# Camera calibration with square chessboard {#tutorial\_camera\_calibration\_square\_chess}

@tableofcontents

@prev\_tutorial{tutorial\_camera\_calibration\_pattern} @next\_tutorial{tutorial\_camera\_calibration}

|                 |                 |
| --------------: | :-------------- |
| Original author | Victor Eruhimov |
| Compatibility   | OpenCV >= 4.0   |

The goal of this tutorial is to learn how to calibrate a camera given a set of chessboard images.

_Test data_: use images in your data/chess folder.

-   Compile OpenCV with samples by setting BUILD\_EXAMPLES to ON in cmake configuration.
    
-   Go to bin folder and use imagelist\_creator to create an XML/YAML list of your images.
    
-   Then, run calibration sample to get camera parameters. Use square size equal to 3cm.
    

## Pose estimation

Now, let us write code that detects a chessboard in an image and finds its distance from the camera. You can apply this method to any object with known 3D geometry that you detect in an image.

_Test data_: use chess\_test\*.jpg images from your data folder.

-   Create an empty console project. Load a test image:
    
    ```
    Mat img = imread(argv[1], IMREAD_GRAYSCALE);
    ```
    
-   Detect a chessboard in this image using the findChessboardCorners function:
    
    ```
    bool found = findChessboardCorners( img, boardSize, ptvec, CALIB_CB_ADAPTIVE_THRESH );
    ```
    
-   Now, write a function that generates a vector<Point3f> array of 3D coordinates of a chessboard in any coordinate system. For simplicity, let us choose a system such that one of the chessboard corners is at the origin and the board is in the plane _z = 0_.
    
-   Read camera parameters from an XML/YAML file:
    
    ```
    FileStorage fs( filename, FileStorage::READ );
    Mat intrinsics, distortion;
    fs["camera_matrix"] >> intrinsics;
    fs["distortion_coefficients"] >> distortion;
    ```
    
-   Now we are ready to find a chessboard pose by running `solvePnP`:
    
    ```
    vector<Point3f> boardPoints;
    // fill the array
    ...
    
    // ptvec holds the detected corners; intrinsics and distortion were read from the file above
    solvePnP(Mat(boardPoints), Mat(ptvec), intrinsics,
             distortion, rvec, tvec, false);
    ```
    
-   Calculate reprojection error like it is done in calibration sample (see opencv/samples/cpp/calibration.cpp, function computeReprojectionErrors).
    

Question: how would you calculate the distance from the camera origin to any one of the corners?

Answer: After obtaining the camera pose using solvePnP, the rotation (rvec) and translation (tvec) vectors define the transformation from the world (chessboard) coordinate system to the camera coordinate system. To calculate the distance from the camera’s origin to any chessboard corner, first transform the 3D point from the chessboard coordinate system to the camera coordinate system (if not already done) and then compute its Euclidean distance using the L2 norm, for example:

```
    // assuming 'point' is the 3D position of a chessboard corner in the camera coordinate system
    double distance = norm(point);
```

This is equivalent to applying the L2 norm on the 3D point’s coordinates (x, y, z).
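The transformation step that precedes the norm can be sketched end to end in plain Python. This is an illustration under stated assumptions, not code from the calibration sample: the rotation vector is converted to a matrix with the Rodrigues formula, a board point is mapped as R * X + t, and the helper names `rodrigues` and `corner_distance` are hypothetical.

```python
import math


def rodrigues(rvec):
    """Convert a rotation vector (unit axis times angle in radians) to a 3x3 matrix."""
    theta = math.sqrt(sum(comp * comp for comp in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def corner_distance(board_point, rvec, tvec):
    """Distance from the camera origin to a point given in board coordinates."""
    rot = rodrigues(rvec)
    # Camera-frame coordinates: X_cam = R * X_board + t
    cam = [sum(rot[r][k] * board_point[k] for k in range(3)) + tvec[r]
           for r in range(3)]
    return math.sqrt(sum(comp * comp for comp in cam))


# A 90 degree rotation about z maps (1, 0, 0) to (0, 1, 0); adding t gives (3, 0, 4).
print(corner_distance((1.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2), (3.0, -1.0, 4.0)))  # approximately 5.0
```

For the board origin itself the rotation has no effect, so the distance reduces to the norm of tvec.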

## [Camera Calibration](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/camera_calibration/camera_calibration/)


## [Multiview Calibration](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/camera_multiview_calibration/multiview_calibration/)

# Multi-view Camera Calibration Tutorial {#tutorial\_multiview\_camera\_calibration}

@tableofcontents

@prev\_tutorial{tutorial\_interactive\_calibration} @next\_tutorial{tutorial\_usac}

|                 |                                |
| --------------: | :----------------------------- |
| Original author | Maksym Ivashechkin, Linfei Pan |
| Compatibility   | OpenCV >= 5.0                  |

## Structure

This tutorial consists of the following sections:

-   Introduction
-   Briefly
-   How to run
-   Python example
-   Details Of The Algorithm
-   Method Input
-   Method Output
-   Pseudocode
-   Python sample API
-   C++ sample API
-   Practical Debugging Techniques

## Introduction

Multiview calibration is a very important task in computer vision. It is widely used in 3D reconstruction, structure from motion, autonomous driving, etc. Calibration is often the first step of a vision task, since it provides the intrinsic and extrinsic parameters of the cameras. The accuracy of these parameters directly influences all further computations and results, hence estimating precise intrinsics and extrinsics is crucial.

The calibration algorithms require a set of images for each camera, in which a calibration pattern (e.g., checkerboard, ChArUco, etc.) is visible and detected. Additionally, to get results with a real scale, the 3D distance between two neighboring points of the calibration pattern grid should be measured. For extrinsics calibration, the images must show the same calibration pattern from different views. An example setup can be found in the following figure. Moreover, images that share the pattern grid have to be taken at the same moment; in other words, the cameras must be synchronized. Otherwise, the extrinsics calibration will fail. Note that if each pattern point can be uniquely determined (for example, if a ChArUco target is used, see @ref cv::aruco::CharucoBoard), it is also possible to calibrate based only on partial observations. This is recommended, as the overlapping field of view between camera pairs is usually limited in multiview camera calibration, and it is generally difficult for them to observe the complete pattern at the same time.

The intrinsics calibration incorporates the estimation of focal lengths, skew, and the principal point of the camera; these parameters are combined in the intrinsic upper triangular matrix of size 3x3. Additionally, intrinsic calibration includes finding the distortion parameters of the camera.

The extrinsics parameters represent a relative rotation and translation between two cameras. For each frame, suppose the absolute camera pose for camera \\f$i\\f$ is \\f$R\_i, t\_i\\f$, and the relative camera pose between camera \\f$i\\f$ and camera \\f$j\\f$ is \\f$R\_{ij}, t\_{ij}\\f$. If \\f$R\_1, t\_1\\f$ and \\f$R\_{1i}, t\_{1i}\\f$ are known for some \\f$i\\neq 1\\f$, then the pose of camera \\f$i\\f$ can be calculated by \\f\[ R\_i = R\_{1i} R\_1\\f\] \\f\[ t\_i = R\_{1i} t\_1 + t\_{1i}\\f\]

The relative pose between any two cameras can in turn be calculated from their absolute poses by \\f\[ R\_{ij} = R\_j R\_i^\\top \\f\] \\f\[ t\_{ij} = -R\_{ij} t\_i + t\_j \\f\]

This implies that any other relative pose of the form \\f$R\_{ij}, i\\neq 1\\f$ is redundant. Therefore, for \\f$N\\f$ cameras, a sufficient number of correctly selected pairs of estimated relative rotations and translations is \\f$N-1\\f$, while the extrinsics parameters for all \\f$\\binom{N}{2} = N (N-1) / 2\\f$ possible pairs can be derived from those that are estimated. More details about intrinsics calibration can be found in the tutorial @ref tutorial\_camera\_calibration\_pattern, and in its implementation @ref cv::calibrateCamera.
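The redundancy argument can be checked numerically with a small sketch in plain Python. It uses the conventions R_i = R_1i R_1 and t_i = R_1i t_1 + t_1i for absolute poses, and R_ij = R_j R_i^T and t_ij = t_j - R_ij t_i for relative ones. All function names are hypothetical, and rotations about the z axis stand in for general rotations.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def relative_pose(R_i, t_i, R_j, t_j):
    """Relative pose from camera i to camera j, given both absolute poses."""
    R_ij = mat_mul(R_j, transpose(R_i))
    t_ij = [tj - r for tj, r in zip(t_j, mat_vec(R_ij, t_i))]
    return R_ij, t_ij


def absolute_pose(R_1, t_1, R_1i, t_1i):
    """Absolute pose of camera i from the pose of camera 1 and the relative pose 1 -> i."""
    R_i = mat_mul(R_1i, R_1)
    t_i = [a + b for a, b in zip(mat_vec(R_1i, t_1), t_1i)]
    return R_i, t_i


def num_camera_pairs(n):
    """Number of distinct camera pairs among n cameras."""
    return n * (n - 1) // 2


R_1, t_1 = rot_z(0.3), [1.0, 2.0, 3.0]
R_2, t_2 = rot_z(1.1), [-1.0, 0.5, 2.0]
R_12, t_12 = relative_pose(R_1, t_1, R_2, t_2)
R_2_again, t_2_again = absolute_pose(R_1, t_1, R_12, t_12)
print(max(abs(a - b) for a, b in zip(t_2, t_2_again)) < 1e-9)  # True
print(num_camera_pairs(4))  # 6
```

Recovering the pose of camera 2 from camera 1 and the relative pose 1 -> 2 shows why N-1 well-chosen pairs are enough to derive all the others.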

After intrinsics and extrinsics calibration, the projection matrices of the cameras are found by combining the intrinsic matrices, rotation matrices, and translations. The projection matrices make it possible to do triangulation (3D reconstruction), rectification, epipolar geometry estimation, etc.

The following sections describe the individual algorithmic steps of the overall multi-camera calibration pipeline:

## Briefly

The algorithm consists of three major steps that could be enumerated as follows:

1.  Calibrate intrinsics parameters (intrinsic matrix and distortion coefficients) for each camera independently.
2.  Calibrate pairwise cameras (using camera pair registration) using intrinsics parameters from step 1.
3.  Do global optimization using all cameras simultaneously to refine extrinsic parameters.

# How to run:

Assume we have `N` camera views, for each `i`\-th view there are `M` images containing pattern points (e.g., checkerboard).

## Python example

There are two options to run the sample code in Python (`opencv/apps/multiview-calibration/multiview_calibration.py`) either with raw images or provided points. The first option is to prepare `N` files where each file has the path to an image per line (images of a specific camera of the corresponding file). Leave the line empty, if there is no corresponding image for the camera in a certain frame. For example, a file for camera `i` should look like (`file_i.txt`):

```
/path/to/image_1_of_camera_i

/path/to/image_3_of_camera_i
...
/path/to/image_M_of_camera_i
```

The paths to images should be relative to `file_i.txt`. The sample program can then be run from the command line as follows:

```
$ python3 multiview_calibration.py --pattern_size W,H --pattern_type TYPE --is_fisheye IS_FISHEYE_1,...,IS_FISHEYE_N \
--pattern_distance DIST --filenames /path/to/file_1.txt,...,/path/to/file_N.txt
```

Replace `W` and `H` with the size of the pattern points and `TYPE` with the name of the calibration grid type (supported patterns: `checkerboard`, `circles`, `acircles`). `IS_FISHEYE` corresponds to the camera type (1 - fisheye, 0 - pinhole), and `DIST` is the pattern distance (i.e., the distance between two cells of a checkerboard). The sample script automatically detects image points according to the specified pattern type. By default, detection is done in parallel, but this option can be turned off.
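The per-camera list files described earlier (one image path per line, an empty line where a frame has no image for that camera) could be read with a small helper such as the one below. This is a minimal sketch under those stated assumptions, not the sample script's actual code; `read_image_list` is a hypothetical name.

```python
import os

def read_image_list(list_path):
    """Return one entry per frame: a path, or None where the line is empty.

    Relative paths are resolved against the directory of the list file.
    """
    base_dir = os.path.dirname(os.path.abspath(list_path))
    entries = []
    with open(list_path) as f:
        for line in f:
            name = line.strip()
            # An empty line means this camera has no image for this frame.
            entries.append(os.path.join(base_dir, name) if name else None)
    return entries
```

Calling it once per camera, for example `[read_image_list(p) for p in filenames]`, gives a `NUM_CAMERAS x NUM_FRAMES` structure in which `None` marks a missing observation.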

Additional (optional) flags to the Python sample that could be used are as follows:

-   `--winsize` - pass values `H,W` to define window size for corners detection (default is 5,5).
-   `--debug_corners` - pass `True` or `False`. If `True` program shows several random images with detected corners for a user to manually verify the detection (default is `False`).
-   `--points_json_file` - pass name of JSON file where image and pattern points could be saved after detection. Later this file could be used to run sample code. The default value is '' (nothing is saved).
-   `--find_intrinsics_in_python` - pass `0` or `1`. If `1` then the Python sample automatically calibrates intrinsics parameters and reports reprojection errors. The multiview calibration is done only for extrinsics parameters. This flag aims to separate the calibration process and make it easier to debug what goes wrong.
-   `--path_to_save` - path to save results in a pickle file
-   `--path_to_visualize` - path to results pickle file needed to run visualization
-   `--visualize` - visualization flag (True or False). If True, the script only runs the visualization, and `--path_to_visualize` must be provided
-   `--resize_image_detection` - True / False, if True an image will be resized to speed up corners detection
-   `--gt_file` - path to the file containing the ground truth. An example can be found in `opencv_extra/testdata/python/hololens_multiview_calibration_images/HololensCapture4/gt.txt` (currently in pull request [1089](https://github.com/opencv/opencv_extra/pull/1089)). It is in the format
    
    ```
    K_0 (3 x 3)
    distortion_0 (1 row),
    R_0 (3 x 3)
    t_0 (3 x 1)
    ...
    K_n (3 x 3)
    distortion_n (1 row),
    R_n (3 x 3)
    t_n (3 x 1)
    # (Optional, pose for each frame)
    R_f0 (3 x 3)
    t_f0 (3 x 1)
    ...
    R_fm (3 x 3)
    t_fm (3 x 1)
    ```
    

Alternatively, the Python sample can be run from a JSON file that contains image points, pattern points, and a boolean indicator of whether each camera is fisheye. An example JSON file is in `opencv_extra/testdata/python/multiview_calibration_data.json` (currently in pull request [1001](https://github.com/opencv/opencv_extra/pull/1001)). Its format should be a dictionary with the following items:

-   `object_points` - list of lists of pattern (object) points (size NUM\_POINTS x 3).
-   `image_points` - list of lists of lists of lists of image points (size NUM\_CAMERAS x NUM\_FRAMES x NUM\_POINTS x 2). Note that it is of fixed size. To have incomplete observation, set the corresponding image points to be invalid (for example, (-1, -1))
-   `image_sizes` - list of tuples (width x height) of image size.
-   `is_fisheye` - list of boolean values (true - fisheye camera, false - otherwise).

Optionally:

-   `Ks` and `distortions` - intrinsics parameters. If they are provided in JSON file then the proposed method does not estimate intrinsics parameters. `Ks` (intrinsic matrices) is a list of lists of lists (NUM\_CAMERAS x 3 x 3), `distortions` is a list of lists (NUM\_CAMERAS x NUM\_VALUES) of distortion parameters.
-   `images_names` - list of lists (NUM\_CAMERAS x NUM\_FRAMES x string) of image filenames for visualization of points after calibration.

```
$ python3 multiview_calibration.py --json_file /path/to/json
```
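As an illustration of the required items, the sketch below assembles such a dictionary in plain Python and serializes it with the standard `json` module. The helper name `make_calibration_dict` and all numeric values are made up for the example; only the four key names come from the description above.

```python
import json

INVALID = [-1.0, -1.0]  # marker for a pattern point that was not observed

def make_calibration_dict(object_points, image_points, image_sizes, is_fisheye):
    """Assemble the dictionary and check that the sizes agree."""
    num_cameras = len(image_points)
    num_points = len(object_points)
    if len(image_sizes) != num_cameras or len(is_fisheye) != num_cameras:
        raise ValueError("image_sizes and is_fisheye need one entry per camera")
    num_frames = len(image_points[0])
    for cam in image_points:
        if len(cam) != num_frames:
            raise ValueError("every camera needs the same number of frames")
        for frame in cam:
            if len(frame) != num_points:
                raise ValueError("every frame needs NUM_POINTS image points")
    return {
        "object_points": object_points,
        "image_points": image_points,
        "image_sizes": image_sizes,
        "is_fisheye": is_fisheye,
    }

# Toy data: 2 cameras, 2 frames, 4 pattern points.
object_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
seen = [[100.0, 100.0], [200.0, 100.0], [100.0, 200.0], [200.0, 200.0]]
missing = [INVALID] * 4  # second camera did not see the pattern in frame 1
image_points = [[seen, seen], [seen, missing]]
data = make_calibration_dict(object_points, image_points,
                             [[640, 480], [640, 480]], [False, True])
json_text = json.dumps(data)
```

Because `image_points` has a fixed size, the unobserved frame is filled with invalid points instead of being dropped.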

The description of flags can be found directly by running the sample script with the `--help` option:

```
python3 multiview_calibration.py --help
```

The expected output in the Linux terminal for `multiview_calibration_images` data (from `opencv_extra/testdata/python/` generated in Blender) should be the following:

The expected output for real-life calibration images in `opencv_extra/testdata/python/real_multiview_calibration_images` is the following:

The expected output for real-life calibration images in `opencv_extra/testdata/python/hololens_multiview_calibration_images` is the following. The command used is:

```
python3 multiview_calibration.py --filenames ../../results/hololens/HololensCapture1/output/cam_0.txt,../../results/hololens/HololensCapture1/output/cam_1.txt,../../results/hololens/HololensCapture1/output/cam_2.txt,../../results/hololens/HololensCapture1/output/cam_3.txt --pattern_size 6,10 --pattern_type charuco --fisheye 0,0,0,0 --pattern_distance 0.108 --board_dict_path ../../results/hololens/charuco_dict.json --gt_file ../../results/hololens/HololensCapture1/output/gt.txt
```

## Details Of The Algorithm

1.  **Intrinsics estimation, and rotation and translation initialization**
    1.  If the intrinsics are not provided, the calibration procedure starts intrinsics calibration independently for each camera using the OpenCV function @ref cv::calibrateCamera.
        
        1.  The following flags are used for calibrating the pinhole and fisheye camera models:
        
        -   Pinhole: @ref cv::CALIB\_ZERO\_TANGENT\_DIST - it zeroes out tangential distortion coefficients, and makes it consistent with the fisheye camera model.
        -   Fisheye: @ref cv::CALIB\_RECOMPUTE\_EXTRINSIC, cv::CALIB\_FIX\_SKEW - the intrinsic calibration of the fisheye camera model is not as stable, and these two flags were empirically found to make the result more robust.
        
        2.  To avoid the degenerate configuration in which all image points are collinear, a degeneracy check is performed: images with fewer than 4 observations and frames with less than 0.5% coverage are marked as invalid.
        3.  Output of intrinsic calibration also includes rotation, translation vectors (transform of pattern points to camera frame), and errors per frame. For each frame, the index of the camera with the lowest error among all cameras is saved.
    2.  Otherwise, if intrinsics are known, then the proposed algorithm runs perspective-n-point estimation (@ref cv::solvePnP, @ref cv::fisheye::solvePnP) to estimate rotation and translation vectors, and reprojection error for each frame.
2.  **Initialization of relative camera pose**.
    1.  If the initial relative poses are not assumed known (CALIB\_USE\_EXTRINSIC\_GUESS flag not set), then the relative camera extrinsics are found by traversing a spanning tree and estimating pairwise relative camera pose.
        1.  **Maximum spanning tree establishment**. Assume that cameras can be represented as nodes of a connected graph. An edge between two cameras is created if there is any concurrent observation over all frames. If the graph does not connect all cameras (i.e., there exists a camera that has no overlap with other cameras), then calibration is not possible. Otherwise, the next step consists of finding the [maximum spanning tree](https://en.wikipedia.org/wiki/Minimum_spanning_tree) (MST) of this graph. The MST captures all the best pairwise camera connections. The weight of edges across all frames is a weighted combination of multiple factors:
            -   (Major) The number of pattern points detected in both images (cameras)
            -   Ratio of area of convex hull of projected points in the image to the image resolution.
            -   Angle between cameras' optical axes (found from rotation vectors).
            -   Angle between the camera's optical axis and the pattern's normal vector (found from 3 non-collinear pattern points).
        2.  **Initialization of relative camera pose**. The initial estimate of cameras' extrinsics is found by pairwise camera registration (see @ref cv::registerCameras). Without loss of generality, the 0-th camera’s rotation is fixed to identity and translation to zero vector, and the 0-th node becomes the root of the MST. The order of stereo calibration is selected by traversing MST in a breadth-first search, starting from the root. The total number of pairs (also the number of edges of the tree) is NUM\_CAMERAS - 1, which is a property of a tree graph.
    2.  Else if prior knowledge of camera pose is provided, this step can be skipped
3.  **Global optimization**. Given the initial estimate of extrinsics, the aim is to polish results using global optimization (via the Levenberg-Marquardt method, see @ref cv::LevMarq class).
    -   To reduce the total number of iterations, all rotation and translation vectors estimated in the first step from intrinsic calibration with the lowest error are transformed to be relative with respect to the root camera.
    -   The total number of parameters is (NUM\_CAMERAS - 1) x (3 + 3) + NUM\_FRAMES x (3 + 3), where 3 stands for a rotation vector and 3 for a translation vector. The first part of the parameters is for extrinsics, and the second part is for rotation and translation vectors per frame. This can be seen from the illustrational plot in the introduction. For each frame, with the relative poses between cameras being fixed, only one camera pose is needed to calculate the poses of all cameras.
    -   A _robust function_ is additionally applied to mitigate the impact of outlier points during the optimization. The function has the shape of the derivative of a Gaussian, namely $x \* exp(-x/s)$ (efficiently implemented by approximating `exp`), where `x` is the squared pixel error and `s` is a manually pre-defined scale. This function was chosen because it increases on the interval from `0` to `y` pixel error and decreases thereafter. The idea is that the function slightly decreases errors until they reach `y`, and if an error is too high (more than `y`), its robust value tends toward `0`. The scale factor was found by exhaustive evaluation so that the robust function increases almost linearly until the robust value of an error is 10 px and decreases afterward (see plot of the function below). The value itself is equal to 30 but can be modified in the OpenCV source code.
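The shape of the robust function can be explored with a few lines of Python. This is only a sketch of the formula $x \* exp(-x/s)$ as stated above, using the exact `math.exp` instead of the approximation mentioned for the C++ code; the function name `robust_value` is hypothetical.

```python
import math

def robust_value(sq_err, scale=30.0):
    """Evaluate x * exp(-x / s) for a squared pixel error x and scale s."""
    return sq_err * math.exp(-sq_err / scale)

# The derivative exp(-x/s) * (1 - x/s) vanishes at x = s, so the function
# rises up to x = s and falls afterwards; its peak value is s / e.
peak = robust_value(30.0)
for sq_err in (0.0, 1.0, 10.0, 30.0, 100.0, 400.0):
    print(sq_err, round(robust_value(sq_err), 4))
```

For small errors the value stays close to the squared error itself, while very large squared errors contribute almost nothing, which is how outliers are suppressed.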

## Method Input

The high-level input of the proposed method is as follows:

-   Pattern (object) points: (NUM\_FRAMES x) NUM\_PATTERN\_POINTS x 3. Points may contain a copy of pattern points along frames.
-   Image points: NUM\_CAMERAS x NUM\_FRAMES x NUM\_PATTERN\_POINTS x 2.
-   Image sizes: NUM\_CAMERAS x 2 (width and height).
-   Detection mask: matrix of size NUM\_CAMERAS x NUM\_FRAMES that indicates whether pattern points are detected for specific camera and frame index.
-   Rs, Ts (optional): (relative) rotations and translations with respect to camera 0. The number of vectors is `NUM_CAMERAS-1`; for the first camera, the rotation and translation vectors are zero.
-   Ks (optional): intrinsic matrices per camera.
-   Distortions (optional).
-   use\_intrinsics\_guess: indicates whether intrinsics are provided.
-   Flags\_intrinsics: flag for intrinsics estimation.
-   use\_extrinsic\_guess: indicates whether extrinsics are provided.
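As a rough illustration of the detection mask input, the sketch below derives a `NUM_CAMERAS x NUM_FRAMES` boolean matrix from nested lists of image points. It assumes that an unobserved point is stored as `(-1, -1)` and that a frame without detections may also be an empty list; the name `build_detection_mask` and the `min_points` parameter are hypothetical.

```python
def build_detection_mask(image_points, invalid=(-1.0, -1.0), min_points=1):
    """mask[c][f] is True when camera c has at least min_points valid points in frame f."""
    mask = []
    for cam in image_points:
        row = []
        for frame in cam:
            valid = sum(1 for p in frame if tuple(p) != tuple(invalid))
            row.append(valid >= min_points)
        mask.append(row)
    return mask

# Toy data: 2 cameras x 3 frames, 2 pattern points per frame.
pts = [[10.0, 20.0], [30.0, 40.0]]
none_seen = [[-1.0, -1.0], [-1.0, -1.0]]
image_points = [
    [pts, pts, none_seen],  # camera 0 misses frame 2
    [pts, [], pts],         # camera 1 has no detection in frame 1
]
mask = build_detection_mask(image_points)
```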

## Method Output

The high-level output of the proposed method is the following:

-   Rs, Ts: (relative) rotation and translation vectors of extrinsics parameters with respect to camera 0. The number of vectors is `NUM_CAMERAS-1`; for the first camera, the rotation and translation vectors are zero.
-   Intrinsic matrix for each camera.
-   Distortion coefficients for each camera.
-   Rotation and translation vectors of each frame pattern with respect to camera 0. The combination of rotation and translation is able to transform the pattern points to the camera coordinate space, and hence with intrinsics parameters project 3D points to the image.
-   Matrix of reprojection errors of size NUM\_CAMERAS x NUM\_FRAMES
-   Output pairs used for initial estimation of extrinsics, the number of pairs is `NUM_CAMERAS-1`.

## Pseudocode

The idea of the method can be demonstrated in high-level pseudocode, whereas the whole C++ implementation of the proposed approach is in the `opencv/modules/calib/src/multiview_calibration.cpp` file.

```
def multiviewCalibration(pattern_points, image_points, detection_mask):
  for cam_i = 1,…,NUMBER_CAMERAS:
    if CALIBRATE_INTRINSICS:
      K_i, distortion_i, rvecs_i, tvecs_i = calibrateCamera(pattern_points, image_points[cam_i])
    else:
      rvecs_i, tvecs_i = solvePnP(pattern_points, image_points[cam_i], K_i, distortion_i)
    # Select best rvecs, tvecs based on reprojection errors. Process data:
    if CALIBRATE_EXTRINSICS:
      pattern_img_area[cam_i][frame] = area(convexHull(image_points[cam_i][frame]))
      angle_to_board[cam_i][frame] = arccos(pattern_normal_frame * optical_axis_cam_i)
      angle_cam_to_cam[cam_i][cam_j] = arccos(optical_axis_cam_i * optical_axis_cam_j)
  if CALIBRATE_EXTRINSICS:
    graph = maximumSpanningTree(detection_mask, pattern_img_area, angle_to_board, angle_cam_to_cam)
    camera_pairs = breadth_first_search(graph, root_camera=0)
    for pair in camera_pairs:
      # find relative rotation, translation from camera i to j
      R_ij, t_ij = registerCameras(pattern_points_i, pattern_points_j, image_points[i], image_points[j])
  # otherwise the provided extrinsics guess is used as the starting point
  R*, t* = optimizeLevenbergMarquardt(R, t, pattern_points, image_points, K, distortion)
```
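The spanning-tree and traversal steps of the pseudocode can be sketched in plain Python. This is a simplified illustration, not the C++ implementation: the edge weight here is only the number of frames seen by both cameras, whereas the method described above combines several factors, and all function names are hypothetical.

```python
from collections import deque

def covisibility_weights(detection_mask):
    """weights[i][j] = number of frames in which cameras i and j both see the pattern."""
    n = len(detection_mask)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(1 for a, b in zip(detection_mask[i], detection_mask[j]) if a and b)
            weights[i][j] = weights[j][i] = shared
    return weights

def maximum_spanning_tree(weights, root=0):
    """Prim's algorithm that always adds the heaviest edge leaving the tree."""
    n = len(weights)
    in_tree = {root}
    adjacency = {i: [] for i in range(n)}
    while len(in_tree) < n:
        best = None
        for u in sorted(in_tree):
            for v in range(n):
                if v not in in_tree and weights[u][v] > 0:
                    if best is None or weights[u][v] > best[0]:
                        best = (weights[u][v], u, v)
        if best is None:
            raise ValueError("camera graph is not connected; calibration is not possible")
        _, u, v = best
        adjacency[u].append(v)
        adjacency[v].append(u)
        in_tree.add(v)
    return adjacency

def bfs_pairs(adjacency, root=0):
    """Camera pairs in breadth-first order, starting from the root camera."""
    pairs, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                pairs.append((u, v))
                queue.append(v)
    return pairs

# Toy detection mask: 3 cameras x 5 frames.
mask = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 1],
]
tree = maximum_spanning_tree(covisibility_weights(mask))
pairs = bfs_pairs(tree)
```

In this toy mask, camera 2 shares more frames with camera 1 than with camera 0, so it is registered through camera 1, and the number of pairs is `NUM_CAMERAS - 1`.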

## Python sample API

To run the calibration procedure in Python, follow these steps (see sample code in `apps/multiview-calibration/multiview_calibration.py`):

\-# **Prepare data**:

@snippet apps/multiview-calibration/multiview\_calibration.py calib\_init

The detection mask matrix is later built by checking the size of image points after detection:

\-# **Detect pattern points on images**:

@snippet apps/multiview-calibration/multiview\_calibration.py detect\_pattern

\-# **Build detection mask matrix**:

@snippet apps/multiview-calibration/multiview\_calibration.py detection\_matrix

\-# **Finally, the calibration function is run as follows**:

@snippet apps/multiview-calibration/multiview\_calibration.py multiview\_calib

## C++ sample API

To run the calibration procedure in C++, follow these steps (see sample code in `opencv/samples/cpp/multiview_calibration_sample.cpp`):

\-# Prepare data similarly to the Python sample, i.e., pattern size and scale, fisheye camera mask, and files containing image filenames, and pass them to the function:

@snippet samples/cpp/multiview\_calibration\_sample.cpp detectPointsAndCalibrate\_signature

\-# **Initialize data**:

@snippet samples/cpp/multiview\_calibration\_sample.cpp calib\_init

\-# **Set up ChArUco detector**: optional, only needed if the pattern type is ChArUco

@snippet samples/cpp/multiview\_calibration\_sample.cpp charuco\_detector

\-# **Detect pattern points on images**:

@snippet samples/cpp/multiview\_calibration\_sample.cpp detect\_pattern

\-# **Build detection mask matrix**:

@snippet samples/cpp/multiview\_calibration\_sample.cpp detection\_matrix

\-# **Run calibration**:

@snippet samples/cpp/multiview\_calibration\_sample.cpp multiview\_calib

## Practical Debugging Techniques

\-# **Intrinsics calibration**

\-# Choose the most suitable flags to perform calibration. For example, when the distortion of the pinhole camera model is not evident, it may not be necessary to use the @ref cv::CALIB\_RATIONAL\_MODEL. For the fisheye camera model, it is recommended to use @ref cv::CALIB\_RECOMPUTE\_EXTRINSIC and @ref cv::CALIB\_FIX\_SKEW.

\-# Camera intrinsics can be better estimated when points are more scattered in the image. The following code can be used to plot the heat map of the observed points:

@snippet apps/multiview-calibration/multiview\_calibration.py plot\_detection

![condensely scattered](camera_multiview_calibration/images/count_example.png)

The left example is not well scattered, while the right example shows a better-scattered pattern.

\-# Plot the reprojection error to ensure the result is reasonable.

\-# If ground truth camera intrinsics are available, a visualization of the estimated error on intrinsics is provided.

@snippet apps/multiview-calibration/multiview\_calibration.py vis\_intrinsics\_error

The resulting visualization would look similar to:

![distortion error](camera_multiview_calibration/images/distort_error.jpg)

\-# **Multiview calibration**

\-# Use `plotCamerasPosition` in apps/multiview-calibration/multiview\_calibration.py to plot the graph established for multiview calibration. The plot shows the positions of cameras, the checkerboard (of a random frame), and pairs of cameras connected by black lines, explicitly demonstrating the tuples that were used in the initial stage of stereo calibration. The dashed gray lines show the non-spanning-tree edges that are also used in the optimization. The width of these lines indicates the number of co-visible frames, i.e., the strength of the connection. A graph with dense, thick edges is preferable. For the right tree, the connection for camera four is rather limited and can be strengthened.

\-# A visualization method that shows the reprojection error with arrows (from a given point to the back-projected one) is provided (see `plotProjection` in apps/multiview-calibration/multiview\_calibration.py). The color of the arrows highlights the error values. Additionally, the title reports the mean error on this frame and its accuracy among the other frames used in calibration.

## [Interactive Calibration](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/interactive_calibration/interactive_calibration/)


# Interactive camera calibration application {#tutorial\_interactive\_calibration}

@tableofcontents

@prev\_tutorial{tutorial\_real\_time\_pose} @next\_tutorial{tutorial\_multiview\_camera\_calibration}

|    |    |
| -: | :- |
| Original author | Vladislav Sovrasov |
| Compatibility | OpenCV >= 3.1 |

With the classical calibration technique, the user must collect all data first and then run the @ref cv::calibrateCamera function to obtain camera parameters. If the average re-projection error is large or the estimated parameters seem to be wrong, the process of selecting or collecting data and running @ref cv::calibrateCamera is repeated.

The interactive calibration process assumes that after each new portion of data the user can see the results and error estimates and can delete the last portion of data. Finally, when the calibration dataset is big enough, the process of automatic data selection starts.

## Main application features

The sample application will:

-   Determine the distortion matrix and confidence interval for each element
-   Determine the camera matrix and confidence interval for each element
-   Take input from camera or video file
-   Read configuration from XML file
-   Save the results into XML file
-   Calculate re-projection error
-   Reject pattern views at sharp angles to prevent ill-conditioned Jacobian blocks
-   Auto switch calibration flags (fix aspect ratio and elements of distortion matrix if needed)
-   Auto detect when calibration is done by using several criteria
-   Auto capture of static patterns (the user doesn't need to press any keys to capture a frame; just keep the pattern still for a second)

Supported patterns:

-   Black-white chessboard
-   Asymmetrical circle pattern
-   Dual asymmetrical circle pattern
-   chAruco (chessboard with Aruco markers)
-   Symmetrical circle pattern

## Description of parameters

The application has two groups of parameters: primary (passed through the command line) and advanced (passed through an XML file).

### Primary parameters:

All of these parameters are passed to the application through the command line.

\-\[parameter\]=\[default value\]: description

-   \-v=\[filename\]: get video from filename, default input -- camera with id=0
-   \-ci=\[0\]: get video from camera with specified id
-   \-flip=\[false\]: vertical flip of input frames
-   \-t=\[circles\]: pattern for calibration (circles, chessboard, dualCircles, chAruco, symcircles)
-   \-sz=\[16.3\]: distance between two nearest centers of circles or squares on calibration board
-   \-dst=\[295\]: distance between white and black parts of dualCircles pattern
-   \-w=\[width\]: width of pattern (in corners or circles)
-   \-h=\[height\]: height of pattern (in corners or circles)
-   \-of=\[camParams.xml\]: output file name
-   \-ft=\[true\]: auto tuning of calibration flags
-   \-vis=\[grid\]: captured boards visualization (grid, window)
-   \-d=\[0.8\]: delay between captures in seconds
-   \-pf=\[defaultConfig.xml\]: advanced application parameters file
-   \-force\_reopen=\[false\]: Forcefully reopen camera in case of errors. Can be helpful for ip cameras with unstable connection.
-   \-save\_frames=\[false\]: Save frames that contribute to final calibration
-   \-zoom=\[1\]: Zoom factor applied to the preview image

### Advanced parameters:

By default values of advanced parameters are stored in defaultConfig.xml

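@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<charuco_dict>0</charuco_dict>
<charuco_square_length>200</charuco_square_length>
<charuco_marker_size>100</charuco_marker_size>
<calibration_step>1</calibration_step>
<max_frames_num>30</max_frames_num>
<min_frames_num>10</min_frames_num>
<solver_eps>1e-7</solver_eps>
<solver_max_iters>30</solver_max_iters>
<fast_solver>0</fast_solver>
<frame_filter_conv_param>0.1</frame_filter_conv_param>
<camera_resolution>1280 720</camera_resolution>
</opencv_storage>
@endcode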

-   _charuco\_dict_: name of special dictionary, which has been used for generation of chAruco pattern
-   _charuco\_square\_length_: size of square on chAruco board (in pixels)
-   _charuco\_marker\_size_: size of Aruco markers on chAruco board (in pixels)
-   _calibration\_step_: interval in frames between launches of @ref cv::calibrateCamera
-   _max\_frames\_num_: if the number of frames for calibration is greater than this value, the frame filter starts working. After filtration, the size of the calibration dataset is equal to _max\_frames\_num_
-   _min\_frames\_num_: if the number of frames is greater than this value, auto flags tuning, the undistorted view, and quality evaluation are turned on
-   _solver\_eps_: precision of Levenberg-Marquardt solver in @ref cv::calibrateCamera
-   _solver\_max\_iters_: iterations limit of solver
-   _fast\_solver_: if this value is nonzero and LAPACK is found, QR decomposition is used instead of SVD in the solver. QR is faster than SVD, but potentially less precise
-   _frame\_filter\_conv\_param_: parameter used in the linear convolution of the bicriterial frames filter
-   _camera\_resolution_: resolution of camera which is used for calibration

**Note:** _charuco\_dict_, _charuco\_square\_length_ and _charuco\_marker\_size_ are used for chAruco pattern generation (see Aruco module description for details: [Aruco tutorials](https://github.com/opencv/opencv_contrib/tree/5.x/modules/aruco/tutorials))

Default chAruco pattern:

## Dual circles pattern

To make this pattern, you need a standard OpenCV circles pattern and a binary inverted one. Place the two patterns on one plane so that all horizontal lines of circles in one pattern are continuations of the corresponding lines in the other. Measure the distance between the patterns as shown in the picture below and pass it as the **dst** command line parameter. Also measure the distance between the centers of the nearest circles and pass this value as the **sz** command line parameter.

This pattern is very sensitive to quality of production and measurements.

## Data filtration

When the size of the calibration dataset is greater than _max\_frames\_num_, the data filter starts working. It tries to remove "bad" frames from the dataset. The filter removes the frame on which \\f$loss\_function\\f$ takes its maximum.

\\f\[loss\_function(i)=\\alpha RMS(i)+(1-\\alpha)reducedGridQuality(i)\\f\]

**RMS** is the average re-projection error calculated for frame _i_, and **reducedGridQuality** is the scene coverage quality evaluated without frame _i_. \\f$\\alpha\\f$ is equal to **frame\_filter\_conv\_param**.
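A single filtering step can be sketched as follows. The per-frame RMS and reduced grid quality values are assumed to be already computed by the caller (the numbers below are made up), the default `alpha=0.1` mirrors the default _frame\_filter\_conv\_param_, and the function names are hypothetical rather than taken from the application.

```python
def frame_losses(rms, reduced_grid_quality, alpha):
    """loss(i) = alpha * RMS(i) + (1 - alpha) * reducedGridQuality(i)."""
    if len(rms) != len(reduced_grid_quality):
        raise ValueError("both inputs need one value per frame")
    return [alpha * r + (1.0 - alpha) * q for r, q in zip(rms, reduced_grid_quality)]

def frame_to_remove(rms, reduced_grid_quality, alpha=0.1):
    """Index of the frame on which the loss function takes its maximum."""
    losses = frame_losses(rms, reduced_grid_quality, alpha)
    return max(range(len(losses)), key=lambda i: losses[i])

# Made-up values for three frames.
rms = [0.4, 1.2, 0.5]
reduced_grid_quality = [0.9, 0.2, 0.3]
losses = frame_losses(rms, reduced_grid_quality, alpha=0.1)
worst = frame_to_remove(rms, reduced_grid_quality, alpha=0.1)
```

With a small `alpha`, coverage dominates: frame 0 is removed even though frame 1 has the largest re-projection error, because the scene coverage stays high without frame 0.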

## Calibration process

To start calibration, just run the application. Place the pattern in front of the camera and hold it still in some pose. Then wait for the capture (a message like "Frame #i captured" will be shown). The current focal distance and re-projection error will be shown on the main screen. Move the pattern to the next position and repeat the procedure. Try to cover the image plane uniformly, and don't show the pattern at sharp angles to the image plane.

If calibration seems to be successful (confidence intervals and average re-projection error are small, frame coverage quality and number of pattern views are big enough), the application will show a message like the one on the screen below.

Hot keys:

-   Esc -- exit application
-   s -- save current data to XML file
-   r -- delete last frame
-   d -- delete all frames
-   u -- enable/disable applying of undistortion
-   v -- switch visualization mode

## Results

As a result, you will get camera parameters and confidence intervals for them.

Example of output XML file:

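@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<calibrationDate>"Thu 07 Apr 2016 04:23:03 PM MSK"</calibrationDate>
<framesCount>21</framesCount>
<cameraResolution>
  1280 720</cameraResolution>
<cameraMatrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1.2519588293098975e+03 0. 6.6684948780852471e+02 0.
    1.2519588293098975e+03 3.6298123112613683e+02 0. 0. 1.</data></cameraMatrix>
<cameraMatrix_std_dev type_id="opencv-matrix">
  <rows>4</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 1.2887048808572649e+01 2.8536856683866230e+00
    2.8341737483430314e+00</data></cameraMatrix_std_dev>
<dist_coeffs type_id="opencv-matrix">
  <rows>1</rows>
  <cols>5</cols>
  <dt>d</dt>
  <data>
    1.3569117181595716e-01 -8.2513063822554633e-01 0. 0.
    1.6412101575010554e+00</data></dist_coeffs>
<dist_coeffs_std_dev type_id="opencv-matrix">
  <rows>5</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    1.5570675523402111e-02 8.7229075437543435e-02 0. 0.
    1.8382427901856876e-01</data></dist_coeffs_std_dev>
<avg_reprojection_error>4.2691743074130178e-01</avg_reprojection_error>
</opencv_storage>
@endcode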

## [Real Time Pose](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/real_time_pose/real_time_pose/)


# Real Time pose estimation of a textured object {#tutorial\_real\_time\_pose}

@tableofcontents

@prev\_tutorial{tutorial\_camera\_calibration} @next\_tutorial{tutorial\_interactive\_calibration}

|    |    |
| -: | :- |
| Original author | Edgar Riba |
| Compatibility | OpenCV >= 5.0 |

Nowadays, augmented reality is one of the top research topics in the computer vision and robotics fields. The most elemental problem in augmented reality is estimating the camera pose with respect to an object: in computer vision, to perform subsequent 3D rendering, or in robotics, to obtain an object pose for grasping and manipulation. However, this is not a trivial problem to solve, because the most common issue in image processing is the computational cost of applying many algorithms or mathematical operations to solve a problem that is basic and immediate for humans.

## Goal

This tutorial explains how to build a real-time application to estimate the camera pose in order to track a textured object with six degrees of freedom given a 2D image and its 3D textured model.

The application will have the following parts:

-   Read 3D textured object model and object mesh.
-   Take input from Camera or Video.
-   Extract ORB features and descriptors from the scene.
-   Match scene descriptors with model descriptors using Flann matcher.
-   Pose estimation using PnP + Ransac.
-   Linear Kalman Filter for bad poses rejection.

## Theory

In computer vision, estimating the camera pose from _n_ 3D-to-2D point correspondences is a fundamental and well-understood problem. The most general version of the problem requires estimating the six degrees of freedom of the pose and five calibration parameters: focal length, principal point, aspect ratio and skew. It can be solved with a minimum of 6 correspondences, using the well-known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications of the problem, which lead to an extensive list of different algorithms that improve on the accuracy of the DLT.

The most common simplification is to assume known calibration parameters which is the so-called Perspective-_n_\-Point problem:

**Problem Formulation:** Given a set of correspondences between 3D points \\f$p\_i\\f$ expressed in a world reference frame, and their 2D projections \\f$u\_i\\f$ onto the image, we seek to retrieve the pose (\\f$R\\f$ and \\f$t\\f$) of the camera w.r.t. the world and the focal length \\f$f\\f$.

OpenCV provides four different approaches to solve the Perspective-_n_\-Point problem which return \\f$R\\f$ and \\f$t\\f$. Then, using the following formula it's possible to project 3D points into the image plane:

\\f\[s\\ \\left \[ \\begin{matrix} u \\\\ v \\\\ 1 \\end{matrix} \\right \] = \\left \[ \\begin{matrix} f\_x & 0 & c\_x \\\\ 0 & f\_y & c\_y \\\\ 0 & 0 & 1 \\end{matrix} \\right \] \\left \[ \\begin{matrix} r\_{11} & r\_{12} & r\_{13} & t\_1 \\\\ r\_{21} & r\_{22} & r\_{23} & t\_2 \\\\ r\_{31} & r\_{32} & r\_{33} & t\_3 \\end{matrix} \\right \] \\left \[ \\begin{matrix} X \\\\ Y \\\\ Z \\\\ 1 \\end{matrix} \\right \]\\f\]

The complete documentation on how to work with these equations is in @ref \_3d "3d".
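To make the projection formula concrete, the following minimal sketch applies it to a single 3D point in plain Python. It ignores lens distortion, the function name `project_point` is hypothetical, and the intrinsics and pose are made-up example values; in practice an OpenCV routine would normally be used instead.

```python
def project_point(K, R, t, X):
    """Project a 3D point X with s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T."""
    # Transform the point into the camera frame: X_cam = R X + t.
    cam = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Apply the intrinsic matrix, then divide by the scale s (third coordinate).
    x = [sum(K[i][k] * cam[k] for k in range(3)) for i in range(3)]
    s = x[2]
    return x[0] / s, x[1] / s

K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]                                        # f_x, f_y, c_x, c_y
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # identity rotation
t = [0.0, 0.0, 2.0]                                          # object 2 units in front
u, v = project_point(K, R, t, [0.5, -0.25, 0.0])
```

For this example the scale is `s = 2`, so the point lands at `u = (800 * 0.5 + 320 * 2) / 2` and `v = (800 * (-0.25) + 240 * 2) / 2`.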

## Source code

You can find the source code of this tutorial in the `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.

The tutorial consists of two main programs:

\-# **Model registration**

```
This application is intended for users who do not have a 3D textured model of the object to be detected.
You can use this program to create your own textured 3D model. This program only works for planar
objects, so if you want to model an object with a complex shape you should use more sophisticated
software to create it.

The application needs an input image of the object to be registered and its 3D mesh. We also have
to provide the intrinsic parameters of the camera with which the input image was taken. All the
files need to be specified using the absolute path or the relative one from your application’s
working directory. If no files are specified, the program will try to open the provided default
parameters.

The application starts by extracting the ORB features and descriptors from the input image and
then uses the mesh along with the [Möller–Trumbore intersection
algorithm](http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
are stored in different lists in a YAML file in which each row is a different point. The
technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
tutorial.

![](images/registration.png)
```

\-# **Model detection**

```
The aim of this application is to estimate the object pose in real time given its 3D textured model.

The application starts by loading the 3D textured model in YAML file format with the same
structure explained in the model registration program. From the scene, the ORB features and
descriptors are detected and extracted. Then, @ref cv::FlannBasedMatcher with
@ref cv::flann::GenericIndex is used to match the scene descriptors against the model descriptors.
Using the found matches along with the @ref cv::solvePnPRansac function, the `R` and `t` of
the camera are computed. Finally, a KalmanFilter is applied in order to reject bad poses.

In the case that you compiled OpenCV with the samples, you can find it in `opencv/build/bin/cpp-tutorial-pnp_detection`.
Then you can run the application and change some parameters:
@code{.cpp}
This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
Usage:
  ./cpp-tutorial-pnp_detection -help
Keys:
  'esc' - to quit.
--------------------------------------------------------------------------

Usage: cpp-tutorial-pnp_detection [params]

  -c, --confidence (value:0.95)
      RANSAC confidence
  -e, --error (value:2.0)
      RANSAC reprojection error
  -f, --fast (value:true)
      use of robust fast match
  -h, --help (value:true)
      print this message
  --in, --inliers (value:30)
      minimum inliers for Kalman update
  --it, --iterations (value:500)
      RANSAC maximum iterations count
  -k, --keypoints (value:2000)
      number of keypoints to detect
  --mesh
      path to ply mesh
  --method, --pnp (value:0)
      PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
  --model
      path to yml model
  -r, --ratio (value:0.7)
      threshold for ratio test
  -v, --video
      path to recorded video
@endcode
For example, you can run the application with a different PnP method:
@code{.cpp}
./cpp-tutorial-pnp_detection --method=2
@endcode
```

## Explanation

The code for the real-time application is explained in detail below:

\-# **Read 3D textured object model and object mesh.**

```
In order to load the textured model I implemented the *class* **Model** which has the function
*load()* that opens a YAML file and takes the stored 3D points with their corresponding descriptors.
You can find an example of a 3D textured model in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.

 @snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/Model.cpp model_load

In the main program the model is loaded as follows:
@code{.cpp}
Model model;               // instantiate Model object
model.load(yml_read_path); // load a 3D textured object model
@endcode
In order to read the model mesh I implemented a *class* **Mesh** which has a function *load()*
that opens a \f$*\f$.ply file and stores the 3D points of the object and also the triangles they compose.
You can find an example of a model mesh in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/Mesh.cpp mesh_load

In the main program the mesh is loaded as follows:
@code{.cpp}
Mesh mesh;                // instantiate Mesh object
mesh.load(ply_read_path); // load an object mesh
@endcode
You can also load a different model and mesh:
@code{.cpp}
./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
@endcode
```

\-# **Take input from Camera or Video**

```
Detection requires a video source. Here this is done by loading a recorded video, passing the absolute
path where it is located on your machine. In order to test the application you can find a recorded
video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
@code{.cpp}
cv::VideoCapture cap;                // instantiate VideoCapture
cap.open(video_read_path);           // open a recorded video

if(!cap.isOpened())                  // check if we succeeded
{
   std::cout << "Could not open the camera device" << std::endl;
   return -1;
}
@endcode
Then the algorithm is computed frame by frame:
@code{.cpp}
cv::Mat frame, frame_vis;

while(cap.read(frame) && cv::waitKey(30) != 27)    // capture frame until ESC is pressed
{

    frame_vis = frame.clone();                     // refresh visualisation frame

    // MAIN ALGORITHM

}
@endcode
You can also load a different recorded video:
@code{.cpp}
./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
@endcode
```

\-# **Extract ORB features and descriptors from the scene**

```
The next step is to detect the scene features and extract their descriptors. For this task I
implemented a *class* **RobustMatcher** which has a function for keypoint detection and feature
extraction. You can find it in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobustMatcher.cpp`. In your
*RobustMatcher* object you can use any of the 2D feature detectors of OpenCV. In this case I used
@ref cv::ORB features, because ORB is based on @ref cv::FAST to detect the keypoints and on cv::xfeatures2d::BriefDescriptorExtractor
to extract the descriptors, which means that it is fast and robust to rotations. You can find more
detailed information about *ORB* in the documentation.

The following code shows how to instantiate and set the feature detector and the descriptor
extractor:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp features

The features and descriptors will be computed by the *RobustMatcher* inside the matching function.
```

\-# **Match scene descriptors with model descriptors using Flann matcher**

```
This is the first step in our detection algorithm. The main idea is to match the scene descriptors
with our model descriptors in order to know the 3D coordinates of the features found in the
current scene.

Firstly, we have to set which matcher we want to use. In this case the
@ref cv::FlannBasedMatcher matcher is used, which in terms of computational cost is faster than the
@ref cv::BFMatcher matcher as the trained collection of features grows. For the
FlannBased matcher, the index created is *Multi-Probe LSH: Efficient Indexing for High-Dimensional
Similarity Search*, because *ORB* descriptors are binary.

You can tune the *LSH* and search parameters to improve the matching efficiency:
@code{.cpp}
cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50);       // instantiate flann search parameters

cv::Ptr<cv::DescriptorMatcher> matcher = cv::makePtr<cv::FlannBasedMatcher>(indexParams, searchParams); // instantiate FlannBased matcher
rmatcher.setDescriptorMatcher(matcher);                                                         // set matcher
@endcode
Secondly, we have to call the matcher by using the *robustMatch()* or *fastRobustMatch()* function.
The difference between these two functions is their computational cost. The first method is slower
but more robust at filtering good matches, because it uses two ratio tests and a symmetry test. In
contrast, the second method is faster but less robust, because it only applies a single ratio test to
the matches.

The following code gets the model 3D points and their descriptors and then calls the matcher in
the main program:
@code{.cpp}
// Get the MODEL INFO

std::vector<cv::Point3f> list_points3d_model = model.get_points3d();  // list with model 3D coordinates
cv::Mat descriptors_model = model.get_descriptors();                  // list with descriptors of each 3D coordinate
@endcode

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp robust_match_call

The following code corresponds to the *robustMatch()* function which belongs to the
*RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
descriptors, then matches the extracted descriptors with the given model descriptors, and vice
versa, using the *two Nearest Neighbours*. Then, a ratio test is applied to the matches in both
directions in order to remove those matches whose distance ratio between the first and second best
match is larger than a given threshold. Finally, a symmetry test is applied in order to remove
non-symmetrical matches.

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobustMatcher.cpp robust_match

After filtering the matches, we have to extract the 2D and 3D correspondences from the found scene
keypoints and our 3D model using the obtained *DMatches* vector. For more information about
@ref cv::DMatch check the documentation.

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp correspondences

You can also change the ratio test threshold and the number of keypoints to detect, and choose
whether or not to use the robust matcher:
@code{.cpp}
./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
@endcode
```
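
The two filtering strategies described above (a single ratio test versus ratio tests in both directions followed by a symmetry test) can be sketched without OpenCV. This is an illustrative assumption-based sketch in plain Python, not the sample's `RobustMatcher`: binary descriptors are modelled as integers compared by Hamming distance, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")


def knn2(query: Sequence[int], train: Sequence[int]) -> List[List[Tuple[int, int]]]:
    """For each query descriptor, return its two nearest (distance, train_index) pairs."""
    return [sorted((hamming(q, t), j) for j, t in enumerate(train))[:2] for q in query]


def ratio_test(knn: List[List[Tuple[int, int]]], ratio: float) -> Dict[int, int]:
    """Keep a best match only if best_distance / second_distance <= ratio."""
    kept = {}
    for i, candidates in enumerate(knn):
        if len(candidates) < 2:
            continue                      # the test needs two neighbours
        (d1, j1), (d2, _) = candidates
        if d1 <= ratio * d2:
            kept[i] = j1
    return kept


def fast_robust_match(scene, model, ratio=0.7) -> List[Tuple[int, int]]:
    """Single ratio test, scene -> model only."""
    return sorted(ratio_test(knn2(scene, model), ratio).items())


def robust_match(scene, model, ratio=0.7) -> List[Tuple[int, int]]:
    """Ratio test in both directions, then keep only symmetric matches."""
    forward = ratio_test(knn2(scene, model), ratio)
    backward = ratio_test(knn2(model, scene), ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)
```

With a model descriptor that is equally close to two scene descriptors, the fast variant keeps both scene matches, while the robust variant discards them because the backward ratio test is ambiguous.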

\-# **Pose estimation using PnP + Ransac**

```
Once we have the 2D and 3D correspondences, we have to apply a PnP algorithm in order to estimate the
camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
that, after the matching, not all the found correspondences are correct; quite likely there are
false correspondences, also called *outliers*. [Random Sample
Consensus](http://en.wikipedia.org/wiki/RANSAC), or *Ransac*, is a non-deterministic iterative
method which estimates the parameters of a mathematical model from observed data, producing an
approximate result whose quality improves as the number of iterations increases. After applying
*Ransac*, the *outliers* are eliminated and the camera pose is then estimated, with a certain
probability of obtaining a good solution.

For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
rotation-translation matrix. The intrinsic calibration parameters of the camera which you are
using to estimate the pose are required. To obtain them, you can check the
@ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.

The following code shows how to declare the *PnPProblem class* in the main program:

@code{.cpp}
// Intrinsic camera parameters: UVC WEBCAM

double f = 55;                           // focal length in mm
double sx = 22.3, sy = 14.9;             // sensor size
double width = 640, height = 480;        // image size

double params_WEBCAM[] = { width*f/sx,   // fx
                           height*f/sy,  // fy
                           width/2,      // cx
                           height/2};    // cy

PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
@endcode
The following code shows how the *PnPProblem class* initialises its attributes:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/PnPProblem.cpp pnp_ctor

OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. The most suitable estimation
method depends on the type of application. If we want to make a real-time application,
the most suitable methods are EPNP and P3P, since they are faster than ITERATIVE and DLS at
finding an optimal solution. However, EPNP and P3P are not especially robust with planar
surfaces, and sometimes the pose estimation seems to have a mirror effect. Therefore, this
tutorial uses the ITERATIVE method, because the object to be detected has planar surfaces.

The OpenCV RANSAC implementation expects three parameters: 1) the maximum number of
iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
computed point projections to consider a point an inlier and 3) the confidence to obtain a good result.
You can tune these parameters in order to improve your algorithm's performance. Increasing the
number of iterations gives a more accurate solution, but takes more time. Increasing the
reprojection error reduces the computation time, but the solution will be less accurate.
Decreasing the confidence makes the algorithm faster, but the obtained solution will also be less
accurate.

The following parameters work for this application:
@code{.cpp}
// RANSAC parameters

int iterationsCount = 500;        // number of Ransac iterations.
float reprojectionError = 2.0;    // maximum allowed distance to consider it an inlier.
float confidence = 0.95;          // RANSAC successful confidence.
@endcode
The following code corresponds to the *estimatePoseRANSAC()* function which belongs to the
*PnPProblem class*. This function estimates the rotation and translation matrix given a set of
2D/3D correspondences, the desired PnP method to use, the output inliers container and the Ransac
parameters:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/PnPProblem.cpp pnp_ransac

The following code contains the 3rd and 4th steps of the main algorithm. The 3rd step calls the
above function, and the 4th takes the output inliers vector from RANSAC to get the 2D scene
points for drawing purposes. As seen in the code, we must only apply RANSAC if we have enough
matches; otherwise, the function @ref cv::solvePnPRansac throws an assertion on invalid input
(not enough points).
@code{.cpp}
if(good_matches.size() >= 4) // OpenCV requires solvePnPRANSAC to minimally have 4 set of points
{

    // -- Step 3: Estimate the pose using RANSAC approach
    pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
                                      pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );

    // -- Step 4: Catch the inliers keypoints to draw
    for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
    {
        int n = inliers_idx.at<int>(inliers_index);         // i-inlier
        cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
        list_points2d_inliers.push_back(point2d);           // add i-inlier to list
    }
}
@endcode
Finally, once the camera pose has been estimated we can use the \f$R\f$ and \f$t\f$ in order to compute
the 2D projection onto the image of a given 3D point expressed in a world reference frame using
the formula shown in *Theory*.

The following code corresponds to the *backproject3DPoint()* function which belongs to the
*PnPProblem class*. The function backprojects a given 3D point expressed in a world reference frame
onto a 2D image:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/PnPProblem.cpp pnp_backproj

The above function is used to project all the 3D points of the object *Mesh* onto the image, to
show the pose of the object.

You can also change the RANSAC parameters and the PnP method:
@code{.cpp}
./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
@endcode
```
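
The projection used to draw the estimated pose can be sketched in a few lines. The example below, in plain Python, builds the intrinsic matrix from the same illustrative webcam values used above (focal length, sensor size, image size) and projects a 3D point with a given rotation and translation. The function names are hypothetical and this is not the sample's `backproject3DPoint()`.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def intrinsic_matrix(f_mm: float, sensor_w_mm: float, sensor_h_mm: float,
                     width_px: float, height_px: float) -> Matrix:
    """Build K from a focal length in mm, the sensor size in mm and the image size in pixels."""
    fx = width_px * f_mm / sensor_w_mm
    fy = height_px * f_mm / sensor_h_mm
    cx = width_px / 2.0
    cy = height_px / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project_point(K: Matrix, R: Matrix, t: Sequence[float],
                  point3d: Sequence[float]) -> Tuple[float, float]:
    """Project a world point to pixel coordinates: s * [u, v, 1]^T = K * (R * X + t)."""
    # World frame -> camera frame
    cam = [sum(R[r][c] * point3d[c] for c in range(3)) + t[r] for r in range(3)]
    # Camera frame -> homogeneous image coordinates
    img = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    if img[2] <= 0.0:
        raise ValueError("point is not in front of the camera")
    return (img[0] / img[2], img[1] / img[2])


if __name__ == "__main__":
    K = intrinsic_matrix(55.0, 22.3, 14.9, 640.0, 480.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A point on the optical axis lands on the principal point (cx, cy).
    print(project_point(K, identity, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)))
```

Applying `project_point` to every mesh vertex with the estimated `R` and `t` gives the 2D outline that is drawn over the frame.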

\-# **Linear Kalman Filter for bad poses rejection**

```
It is common in computer vision and robotics that, after applying detection or tracking
techniques, bad results are obtained due to sensor errors. To avoid these bad
detections, this tutorial explains how to implement a Linear Kalman Filter. The Kalman
Filter is applied once a given number of inliers has been detected.

You can find more information about what a [Kalman
Filter](http://en.wikipedia.org/wiki/Kalman_filter) is. This tutorial uses the OpenCV
implementation of the @ref cv::KalmanFilter, based on
[Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter)
to set the dynamics and measurement models.

Firstly, we have to define our state vector, which has 18 states: the positional data (x,y,z)
with its first and second derivatives (velocity and acceleration), then the rotation in the form
of three Euler angles (roll, pitch, yaw) together with their first and second derivatives (angular
velocity and acceleration)

\f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]

Secondly, we have to define the number of measurements which will be 6: from \f$R\f$ and \f$t\f$ we can
extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
actions to apply to the system which in this case will be *zero*. Finally, we have to define the
differential time between measurements which in this case is \f$1/T\f$, where *T* is the frame rate of
the video.

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp Kalman_init_call

The following code corresponds to the *Kalman Filter* initialisation. Firstly, the process
noise, the measurement noise and the error covariance matrix are set. Secondly, the transition
matrix, which is the dynamic model, is set, and finally the measurement matrix, which is the
measurement model.

You can tune the process and measurement noise to improve the *Kalman Filter* performance. As the
measurement noise is reduced, the filter converges faster, which makes the algorithm more sensitive
to bad measurements.

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp Kalman_init

The following code is the 5th step of the main algorithm. When the number of inliers obtained
after *Ransac* is over the threshold, the measurements matrix is filled and then the *Kalman
Filter* is updated:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp step_5

The following code corresponds to the *fillMeasurements()* function which converts the measured
[Rotation Matrix to Euler
angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
and fills the measurements matrix along with the measured translation vector:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp fill_measure

The following code corresponds to the *updateKalmanFilter()* function which updates the Kalman
Filter and sets the estimated Rotation Matrix and translation vector. The estimated Rotation Matrix
comes from the estimated [Euler angles to Rotation
Matrix](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm).

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp Kalman_update

The 6th step is to set the estimated rotation-translation matrix:
@code{.cpp}
// -- Step 6: Set estimated projection matrix
pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
@endcode
The last and optional step is to draw the found pose. To do it I implemented a function to draw all
the mesh 3D points and an extra reference axis:

@snippet samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/main_detection.cpp step_x

You can also modify the minimum number of inliers required to update the Kalman Filter:
@code{.cpp}
./cpp-tutorial-pnp_detection --inliers=20
@endcode
```
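
The dynamic and measurement models described above can be written out explicitly. The sketch below builds, in plain Python, a constant-acceleration transition matrix for the 18-element state and a 6x18 measurement matrix that picks out the position and the Euler angles. The index layout follows the state vector \f$X\f$ given above. The function names are hypothetical and the matrices are an assumption about how such a model is typically set up, not a copy of the sample's initialisation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def make_transition_matrix(dt: float, n_states: int = 18) -> Matrix:
    """Constant-acceleration model for position (0..8) and orientation (9..17).

    Within each 9-element block: value += rate*dt + 0.5*accel*dt^2, rate += accel*dt.
    """
    A = [[1.0 if r == c else 0.0 for c in range(n_states)] for r in range(n_states)]
    half_dt2 = 0.5 * dt * dt
    for base in (0, 9):                            # position block, orientation block
        for i in range(3):
            A[base + i][base + 3 + i] = dt         # value     <- first derivative
            A[base + i][base + 6 + i] = half_dt2   # value     <- second derivative
            A[base + 3 + i][base + 6 + i] = dt     # first der <- second derivative
    return A


def make_measurement_matrix(n_states: int = 18) -> Matrix:
    """Six measurements: (x, y, z) at indices 0..2 and the three angles at 9..11."""
    H = [[0.0] * n_states for _ in range(6)]
    for row, col in enumerate((0, 1, 2, 9, 10, 11)):
        H[row][col] = 1.0
    return H


def mat_vec(M: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply a matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]
```

Multiplying a state by the transition matrix predicts the next state one time step `dt` later. Multiplying the result by the measurement matrix gives the six values that are compared with the pose measured by PnP.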

## Results

The following videos show the results of real-time pose estimation with the explained detection algorithm, using the following parameters:
@code{.cpp}
// Robust Matcher parameters

int numKeyPoints = 2000;        // number of detected keypoints
float ratio = 0.70f;            // ratio test
bool fast_match = true;         // fastRobustMatch() or robustMatch()

// RANSAC parameters

int iterationsCount = 500;      // number of Ransac iterations.
float reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
float confidence = 0.95;        // ransac successful confidence.

// Kalman Filter parameters

int minInliersKalman = 30;      // Kalman threshold updating
@endcode
You can watch the real time pose estimation on the [YouTube here](http://www.youtube.com/user/opencvdev/videos).

@youtube{XNATklaJlSQ} @youtube{YLS9bWek78k}

## [Table Of Content Calib3d](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/table_of_content_calib3d/)

# Camera calibration and 3D reconstruction (calib3d module) {#tutorial\_table\_of\_content\_calib3d}

-   @subpage tutorial\_camera\_calibration\_pattern
-   @subpage tutorial\_camera\_calibration\_square\_chess
-   @subpage tutorial\_camera\_calibration
-   @subpage tutorial\_real\_time\_pose
-   @subpage tutorial\_interactive\_calibration
-   @subpage tutorial\_multiview\_camera\_calibration
-   @subpage tutorial\_usac

## [Usac](https://docharvest.github.io/docs/opencv5/tutorials/calib3d/usac/)

# USAC: Improvement of Random Sample Consensus in OpenCV {#tutorial\_usac}

@tableofcontents

@prev\_tutorial{tutorial\_multiview\_camera\_calibration}

|    |    |
| -: | :- |
| Original author | Maksym Ivashechkin |
| Compatibility | OpenCV >= 4.0 |

This work was integrated as part of the Google Summer of Code (August 2020).

## Contribution

The part integrated into the OpenCV `3d` module is the RANSAC-based universal framework USAC (`namespace usac`), written in C++. The framework includes different state-of-the-art methods for sampling, verification and local optimization. The main advantages of the framework are its independence from any particular estimation problem and its modular structure, so new solvers or methods can be added or removed easily. So far it includes the following components:

1.  Sampling method:
    
    1.  Uniform – standard RANSAC sampling proposed in @cite FischlerRANSAC, which draws each minimal subset independently and uniformly at random. _The default option in the proposed framework_.
        
    2.  PROSAC – method @cite ChumPROSAC that assumes the input data points are sorted by quality, so sampling can start from the most promising points. Correspondences for this method can be sorted, e.g., by the ratio of descriptor distances of the best to the second-best match obtained from the SIFT detector. _This method is recommended because it can find a good model and terminate much earlier_.
        
    3.  NAPSAC – sampling method @cite MyattNAPSAC which takes the initial point uniformly at random and the remaining points of the minimal sample from the neighborhood of the initial point. This method can be useful when models are localized, for example for plane fitting. In practice, however, it suffers from degeneracy issues and from the difficulty of defining an optimal neighborhood size.
        
    4.  Progressive-NAPSAC – sampler @cite barath2019progressive which is similar to NAPSAC, although it starts from local sampling and gradually converges to global sampling. This method can be quite useful if local models are expected but the distribution of the data can be arbitrary. The implemented version assumes the data points are sorted by quality, as in PROSAC.
        
2.  Score method. USAC, like standard RANSAC, finds the model which minimizes the total loss. The loss can be represented by the following functions:
    
    1.  RANSAC – binary 0 / 1 loss. 1 for outlier, 0 for inlier. _Good option if the goal is to find as many inliers as possible._
        
    2.  MSAC – truncated squared error distance of a point to the model. _The default option in the framework_. The model might not have as many inliers as with the RANSAC score, but it will be more accurate.
        
    3.  MAGSAC – threshold-free method @cite BarathMAGSAC to compute the score. It uses a maximum sigma (standard deviation of noise) level to marginalize the residual of a point over sigma. The score of a point represents the likelihood of the point being an inlier. _Recommended option when image noise is unknown, since the method does not require a threshold_. However, it is still recommended to provide at least an approximate threshold, because termination itself is based on the number of points whose error is less than the threshold. With a threshold of 0, the method outputs a model only after the maximum number of iterations is reached.
        
    4.  LMeds – the least median of squared error distances. In the framework, finding the median is efficiently implemented with $O(n)$ complexity using a quick-sort-based selection. Note that LMeds is not guaranteed to work properly when the inlier ratio is below 50%; in other cases this method is robust and does not require a threshold.
        
3.  Error metric, which describes the error distance of a point to the estimated model.
    
    1.  Re-projection distance – used for affine, homography and projection matrices. For homography, the symmetric re-projection distance can also be used.
        
    2.  Sampson distance – used for Fundamental matrix.
        
    3.  Symmetric Geometric distance – used for Essential matrix.
        
4.  Degeneracy:
    
    1.  DEGENSAC – method @cite ChumDominant which, for Fundamental matrix estimation, efficiently verifies and recovers a model that has at least 5 points of the minimal sample lying on the dominant plane.
        
    2.  Collinearity test – for affine and homography matrix estimation, checks that no 3 points lie on a line. For the homography matrix, since the points are planar, an additional test checks whether the points in the minimal sample lie on the same side w.r.t. any line crossing any two points in the sample (does not assume reflection).
        
    3.  Oriented epipolar constraint – method @cite ChumEpipolar for epipolar geometry which verifies that the model (fundamental or essential matrix) has points visible in front of the camera.
        
5.  SPRT verification – method @cite Matas2005RandomizedRW which verifies a model by evaluating it on randomly shuffled points, using statistical properties given by the probability of an inlier, the relative time for estimation, the average number of output models, etc. It significantly speeds up the framework, because a bad model can be rejected very quickly without explicitly computing the error for every point.
    
6.  Local Optimization:
    
    1.  Locally Optimized RANSAC – method @cite ChumLORANSAC that iteratively improves the so-far-the-best model by non-minimal estimation. _The default option in the framework. This procedure is the fastest and no worse than the other local optimization methods._
        
    2.  Graph-Cut RANSAC – method @cite BarathGCRANSAC that refines the so-far-the-best model while also exploiting the spatial coherence of the data points. _This procedure is quite precise but computationally slower._
        
    3.  Sigma Consensus – method @cite BarathMAGSAC which improves the model by applying non-minimal weighted estimation, where the weights are computed with the same logic as in the MAGSAC score. This method is best used together with the MAGSAC score.
        
7.  Termination:
    
    1.  Standard – standard equation for independent and uniform sampling.
        
    2.  PROSAC – termination for PROSAC.
        
    3.  SPRT – termination for SPRT.
        
8.  Solver. The framework has minimal and non-minimal solvers. In a minimal solver, standard estimation methods are applied. In a non-minimal solver, usually the covariance matrix is built and the model is found as the eigenvector corresponding to the highest eigenvalue.
    
    1.  Affine2D matrix
        
    2.  Homography matrix – the minimal solver uses the RHO (Gaussian elimination) algorithm from OpenCV.
        
    3.  Fundamental matrix – for the 7-point algorithm, two null vectors are found using Gaussian elimination (eliminating to an upper triangular matrix and back-substitution) instead of SVD, and then a 3rd-degree polynomial is solved. The 8-point solver uses Gaussian elimination too.
        
    4.  Essential matrix – 4 null vectors are found using Gaussian elimination. Then the solver based on Gröbner basis described in @cite SteweniusRecent is used. The essential matrix can be computed only if LAPACK or Eigen is installed, as it requires an eigendecomposition with complex eigenvalues.
        
    5.  Perspective-n-Point – the minimal solver is the classical 3-point solver with up to 4 solutions. For RANSAC, a small sample size plays a significant role, as it requires fewer iterations; furthermore, on average the P3P solver produces around 1.39 estimated models. Also, in the new version of `solvePnPRansac(...)` with `UsacParams`, there is an option to pass an empty intrinsic matrix `InputOutputArray cameraMatrix`. If the matrix is empty, the framework uses the Direct Linear Transformation algorithm (PnP with 6 points) and outputs not only the rotation and translation vectors but also the calibration matrix.
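
To make the difference between the score methods listed above concrete, here is a small sketch in plain Python that evaluates the RANSAC, MSAC and LMeds losses on a list of point-to-model residuals. The function names are hypothetical and the code only illustrates the definitions given in the list. It does not reproduce the framework's implementation, and MAGSAC is left out.

```python
import statistics
from typing import Sequence


def ransac_score(residuals: Sequence[float], threshold: float) -> int:
    """Binary 0/1 loss: each outlier (residual above the threshold) costs 1."""
    return sum(1 for r in residuals if r > threshold)


def msac_score(residuals: Sequence[float], threshold: float) -> float:
    """Truncated squared error: inliers cost r^2, outliers cost threshold^2."""
    t2 = threshold * threshold
    return sum(min(r * r, t2) for r in residuals)


def lmeds_score(residuals: Sequence[float]) -> float:
    """Median of the squared residuals; no threshold is needed."""
    return statistics.median(r * r for r in residuals)


if __name__ == "__main__":
    threshold = 1.0
    model_a = [0.9, 0.9, 0.9, 5.0]   # three barely-inlying points, one outlier
    model_b = [0.1, 0.1, 2.0, 5.0]   # two very accurate points, two outliers
    # Lower is better for every score.
    print(ransac_score(model_a, threshold), ransac_score(model_b, threshold))
    print(msac_score(model_a, threshold), msac_score(model_b, threshold))
```

In this toy comparison the RANSAC score prefers `model_a`, which has more inliers, while the MSAC score prefers `model_b`, whose inliers fit much more tightly. This matches the trade-off described in the list.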
        

Also, the framework can be run in parallel. The parallelization is done in such a way that multiple RANSACs are created and share two atomic variables, `bool success` and `int num_hypothesis_tested`, which determine when all RANSACs must terminate. If one of the RANSACs terminates successfully, all the others terminate as well. In the end, the best model is synchronized from all threads. If the PROSAC sampler is used, the threads must share the same sampler, since sampling is done sequentially. However, with the default options of the framework, parallel RANSAC is not deterministic, since it depends on how often each thread is running. The easiest way to make it deterministic is to use the PROSAC sampler without SPRT and Local Optimization, and not for the Fundamental matrix, because they internally use random generators.

For NAPSAC, Progressive NAPSAC or Graph-Cut methods, a neighborhood graph has to be built. The framework offers 3 options to do it:

1.  `NEIGH_FLANN_KNN` – estimates the neighborhood graph using OpenCV FLANN K nearest-neighbors. The default value for KNN is 7. The KNN method may work well for sampling but not for GC-RANSAC.
    
2.  `NEIGH_FLANN_RADIUS` – similarly to the previous case, finds neighbor points whose distance is less than 20 pixels.
    
3.  `NEIGH_GRID` – finds the neighborhood of points by tiling them into cells using a hash table. The method is described in @cite barath2019progressive. Less accurate than `NEIGH_FLANN_RADIUS`, although significantly faster.
    

Note that `NEIGH_FLANN_RADIUS` and `NEIGH_GRID` are not applicable to the PnP solver, since there are 3D object points.
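
The idea behind a grid neighborhood (tiling points into cells with a hash table) can be sketched as follows in plain Python. This is a simplified assumption for 2D points with a hypothetical `cell_size` parameter. The framework itself works on correspondences and its exact cell layout is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_grid_neighborhood(points: Sequence[Tuple[float, float]],
                            cell_size: float) -> List[List[int]]:
    """Return, for each point, the indices of the other points in the same grid cell."""
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # The hash-table key is the integer cell coordinate of the point.
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    neighbors: List[List[int]] = [[] for _ in points]
    for members in cells.values():
        for i in members:
            neighbors[i] = [j for j in members if j != i]
    return neighbors
```

Building the graph takes a single pass over the points, which is why it is fast. The price is accuracy: two points that are close together but fall on opposite sides of a cell border are not reported as neighbors, whereas a radius search would find them.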

## New flags:

1.  `USAC_DEFAULT` – has standard LO-RANSAC.
    
2.  `USAC_PARALLEL` – has LO-RANSAC and RANSACs run in parallel.
    
3.  `USAC_ACCURATE` – has GC-RANSAC.
    
4.  `USAC_FAST` – has LO-RANSAC with a smaller number of iterations in the local optimization step. Uses the RANSAC score to maximize the number of inliers and terminate earlier.
    
5.  `USAC_PROSAC` – has PROSAC sampling. Note, points must be sorted.
    
6.  `USAC_FM_8PTS` – has LO-RANSAC. Only valid for Fundamental matrix with 8-points solver.
    
7.  `USAC_MAGSAC` – has MAGSAC++.
    

Every flag uses SPRT verification. In the end, the final so-far-the-best model is polished by non-minimal estimation on all found inliers.

## A few other important parameters:

1.  `randomGeneratorState` – every USAC solver in OpenCV is deterministic (i.e., it returns the same result for the same points and parameters), so providing a new state makes it output a new model.
    
2.  `loIterations` – number of iterations for the Local Optimization method. _The default value is 10_. Increasing `loIterations` can make the output model more accurate, but the computational time may also increase.
    
3.  `loSampleSize` – maximum sample number for Local Optimization. _The default value is 14_. Note that increasing `loSampleSize` can increase both the accuracy of the model and the computational time. However, it is recommended to keep the value below 100, because estimation on a small number of points is faster and more robust.
    

## Samples:

There are three new sample files in the opencv/samples directory.

1.  `epipolar_lines.cpp` – the input arguments of the `main` function are two paths to images. Correspondences are then found using the SIFT detector. The fundamental matrix is found using RANSAC from the tentative correspondences, and the epipolar lines are plotted.
    
2.  `essential_mat_reconstr.cpp` – the input arguments are the path to a data file containing image names and a single intrinsic matrix, and the directory where these images are located. Correspondences are found using SIFT. The essential matrix is estimated using RANSAC and decomposed into rotation and translation. Then, by building two relative poses with projection matrices, image points are triangulated to object points. By running RANSAC with 3D plane fitting, the object points as well as the correspondences are clustered into planes.
    
3.  `essential_mat_reconstr.py` – the same functionality as in the .cpp file; however, instead of clustering points into planes, the 3D map of object points is plotted.

## [Adding Images](https://docharvest.github.io/docs/opencv5/tutorials/core/adding_images/adding_images/)


# Adding (blending) two images using OpenCV {#tutorial\_adding\_images}

@tableofcontents

@prev\_tutorial{tutorial\_mat\_operations} @next\_tutorial{tutorial\_basic\_linear\_transform}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

We will learn how to blend two images!

## Goal

In this tutorial you will learn:

-   what is _linear blending_ and why it is useful;
-   how to add two images using **addWeighted()**

## Theory

@note The explanation below belongs to the book [Computer Vision: Algorithms and Applications](https://szeliski.org/Book/) by Richard Szeliski

From our previous tutorial, we already know a bit of _Pixel operators_. An interesting dyadic (two-input) operator is the _linear blend operator_:

\\f\[g(x) = (1 - \\alpha)f\_{0}(x) + \\alpha f\_{1}(x)\\f\]

By varying \\f$\\alpha\\f$ from \\f$0 \\rightarrow 1\\f$, this operator can be used to perform a temporal _cross-dissolve_ between two images or videos, as seen in slide shows and film productions (cool, eh?)
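
As a minimal sketch of the operator above, the following NumPy snippet evaluates the linear blend for a few values of alpha. The helper name `linear_blend` and the two constant 2x2 "images" are assumptions for illustration only; real frames would be loaded from files.

```python
import numpy as np

def linear_blend(f0, f1, alpha):
    """Return g = (1 - alpha) * f0 + alpha * f1 for arrays of the same shape."""
    return (1.0 - alpha) * f0 + alpha * f1

# Two tiny synthetic "images" standing in for real frames.
f0 = np.full((2, 2), 200.0)
f1 = np.full((2, 2), 40.0)

# Sweeping alpha from 0 to 1 yields the frames of a cross-dissolve.
alphas = np.linspace(0.0, 1.0, 5)
frames = [linear_blend(f0, f1, a) for a in alphas]
for a, frame in zip(alphas, frames):
    print(f"alpha={a:.2f} -> pixel value {frame[0, 0]:.1f}")
```

At alpha = 0 the result is the first image, at alpha = 1 it is the second, and intermediate values mix the two proportionally.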

## Source Code

@add\_toggle\_cpp Download the source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/core/AddingImages/AddingImages.cpp). @include cpp/tutorial\_code/core/AddingImages/AddingImages.cpp @end\_toggle

@add\_toggle\_java Download the source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/core/AddingImages/AddingImages.java). @include java/tutorial\_code/core/AddingImages/AddingImages.java @end\_toggle

@add\_toggle\_python Download the source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/core/AddingImages/adding_images.py). @include python/tutorial\_code/core/AddingImages/adding\_images.py @end\_toggle

## Explanation

Since we are going to perform:

\\f\[g(x) = (1 - \\alpha)f\_{0}(x) + \\alpha f\_{1}(x)\\f\]

We need two source images (\\f$f\_{0}(x)\\f$ and \\f$f\_{1}(x)\\f$). So, we load them in the usual way: @add\_toggle\_cpp @snippet cpp/tutorial\_code/core/AddingImages/AddingImages.cpp load @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/AddingImages/AddingImages.java load @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/AddingImages/adding\_images.py load @end\_toggle

We used the following images: [LinuxLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/LinuxLogo.jpg) and [WindowsLogo.jpg](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/WindowsLogo.jpg)

@warning Since we are _adding_ _src1_ and _src2_, they both have to be of the same size (width and height) and type.

Now we need to generate the `g(x)` image. For this, the function **addWeighted()** comes quite handy:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/AddingImages/AddingImages.cpp blend\_images @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/AddingImages/AddingImages.java blend\_images @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/AddingImages/adding\_images.py blend\_images Numpy version of above line (but cv function is around 2x faster): \\code{.py} dst = np.uint8(alpha\*(img1)+beta\*(img2)) \\endcode @end\_toggle

since **addWeighted()** produces: \\f\[dst = \\alpha \\cdot src1 + \\beta \\cdot src2 + \\gamma\\f\] In this case, `gamma` is the argument \\f$0.0\\f$ in the code above.
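
To make the formula concrete, here is a NumPy-only sketch of the same arithmetic for 8-bit images. It is not the OpenCV implementation: the name `add_weighted_sketch` is hypothetical, and rounding to the nearest integer followed by clamping to [0, 255] is assumed here as the saturation behaviour for 8-bit output.

```python
import numpy as np

def add_weighted_sketch(src1, alpha, src2, beta, gamma):
    """Compute alpha*src1 + beta*src2 + gamma, rounded and clamped to uint8."""
    if src1.shape != src2.shape or src1.dtype != src2.dtype:
        raise ValueError("src1 and src2 must have the same size and type")
    acc = alpha * src1.astype(np.float64) + beta * src2.astype(np.float64) + gamma
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)

src1 = np.array([[0, 100], [200, 255]], dtype=np.uint8)
src2 = np.array([[255, 100], [100, 255]], dtype=np.uint8)

alpha = 0.5
beta = 1.0 - alpha
dst = add_weighted_sketch(src1, alpha, src2, beta, 0.0)

# With weights that sum to more than 1 the result saturates instead of wrapping.
saturated = add_weighted_sketch(src1, 1.0, src2, 1.0, 0.0)
print(dst)
print(saturated)
```

The shape and type check mirrors the warning above: the weighted sum is only defined when both inputs have the same size and type.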

Create windows, show the images and wait for the user to end the program. @add\_toggle\_cpp @snippet cpp/tutorial\_code/core/AddingImages/AddingImages.cpp display @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/AddingImages/AddingImages.java display @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/AddingImages/adding\_images.py display @end\_toggle

## Result

## [Basic Linear Transform](https://docharvest.github.io/docs/opencv5/tutorials/core/basic_linear_transform/basic_linear_transform/)


# Changing the contrast and brightness of an image! {#tutorial\_basic\_linear\_transform}

@tableofcontents

@prev\_tutorial{tutorial\_adding\_images} @next\_tutorial{tutorial\_discrete\_fourier\_transform}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Access pixel values
-   Initialize a matrix with zeros
-   Learn what @ref cv::saturate\_cast does and why it is useful
-   Get some cool info about pixel transformations
-   Improve the brightness of an image on a practical example

## Theory

@note The explanation below belongs to the book [Computer Vision: Algorithms and Applications](https://szeliski.org/Book/) by Richard Szeliski

### Image Processing

-   A general image processing operator is a function that takes one or more input images and produces an output image.
-   Image transforms can be seen as:
    -   Point operators (pixel transforms)
    -   Neighborhood (area-based) operators

### Pixel Transforms

-   In this kind of image processing transform, each output pixel's value depends on only the corresponding input pixel value (plus, potentially, some globally collected information or parameters).
-   Examples of such operators include _brightness and contrast adjustments_ as well as color correction and transformations.

### Brightness and contrast adjustments

-   Two commonly used point processes are _multiplication_ and _addition_ with a constant:
    
    \\f\[g(x) = \\alpha f(x) + \\beta\\f\]
    
-   The parameters \\f$\\alpha > 0\\f$ and \\f$\\beta\\f$ are often called the _gain_ and _bias_ parameters; sometimes these parameters are said to control _contrast_ and _brightness_ respectively.
    
-   You can think of \\f$f(x)\\f$ as the source image pixels and \\f$g(x)\\f$ as the output image pixels. Then, more conveniently we can write the expression as:
    
    \\f\[g(i,j) = \\alpha \\cdot f(i,j) + \\beta\\f\]
    
    where \\f$i\\f$ and \\f$j\\f$ indicate that the pixel is located in the _i-th_ row and _j-th_ column.
    

## Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgProc/BasicLinearTransforms.cpp)
    
-   The following code performs the operation \\f$g(i,j) = \\alpha \\cdot f(i,j) + \\beta\\f$ : @include samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/BasicLinearTransformsDemo.java)
    
-   The following code performs the operation \\f$g(i,j) = \\alpha \\cdot f(i,j) + \\beta\\f$ : @include samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/BasicLinearTransforms.py)
    
-   The following code performs the operation \\f$g(i,j) = \\alpha \\cdot f(i,j) + \\beta\\f$ : @include samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py @end\_toggle
    

## Explanation

-   We load an image using @ref cv::imread and save it in a Mat object:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java basic-linear-transform-load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py basic-linear-transform-load @end\_toggle

-   Now, since we will make some transformations to this image, we need a new Mat object to store it. Also, we want this to have the following features:
    
    -   Initial pixel values equal to zero
    -   Same size and type as the original image

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-output @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java basic-linear-transform-output @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py basic-linear-transform-output @end\_toggle

We observe that @ref cv::Mat::zeros returns a Matlab-style zero initializer based on _image.size()_ and _image.type()_

-   We now ask the user to enter the values of \\f$\\alpha\\f$ and \\f$\\beta\\f$:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-parameters @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java basic-linear-transform-parameters @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py basic-linear-transform-parameters @end\_toggle

-   Now, to perform the operation \\f$g(i,j) = \\alpha \\cdot f(i,j) + \\beta\\f$ we will access each pixel in the image. Since we are operating with BGR images, we will have three values per pixel (B, G and R), so we will also access them separately. Here is the piece of code:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-operation @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java basic-linear-transform-operation @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py basic-linear-transform-operation @end\_toggle

Notice the following (**C++ code only**):

-   To access each pixel in the images we are using this syntax: _image.at<Vec3b>(y,x)\[c\]_ where _y_ is the row, _x_ is the column and _c_ is B, G or R (0, 1 or 2).
    
-   Since the operation \\f$\\alpha \\cdot p(i,j) + \\beta\\f$ can give values out of range or not integers (if \\f$\\alpha\\f$ is float), we use cv::saturate\_cast to make sure the values are valid.
    
-   Finally, we create windows and show the images, the usual way.
    

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/BasicLinearTransforms.cpp basic-linear-transform-display @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/BasicLinearTransformsDemo.java basic-linear-transform-display @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/BasicLinearTransforms.py basic-linear-transform-display @end\_toggle

@note Instead of using the **for** loops to access each pixel, we could have simply used this command:

@add\_toggle\_cpp @code{.cpp} image.convertTo(new\_image, -1, alpha, beta); @endcode @end\_toggle

@add\_toggle\_java @code{.java} image.convertTo(newImage, -1, alpha, beta); @endcode @end\_toggle

@add\_toggle\_python @code{.py} new\_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta) @endcode @end\_toggle

where @ref cv::Mat::convertTo would effectively perform _new\_image = a\*image + beta_. However, we wanted to show you how to access each pixel. In any case, both methods give the same result but convertTo is more optimized and works a lot faster.
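
The equivalence of the two approaches can be checked with a small NumPy sketch. It does not call OpenCV: `saturate_u8`, `transform_loop` and `transform_vectorized` are hypothetical helpers, and the saturation rule (round to nearest, then clamp to [0, 255]) is an assumption about how an 8-bit saturating cast behaves.

```python
import numpy as np

def saturate_u8(value):
    """Round to the nearest integer and clamp to [0, 255]."""
    return int(min(max(round(value), 0), 255))

def transform_loop(image, alpha, beta):
    """Per-pixel, per-channel version: g(i,j) = alpha * f(i,j) + beta."""
    out = np.zeros_like(image)
    rows, cols, channels = image.shape
    for y in range(rows):
        for x in range(cols):
            for c in range(channels):
                out[y, x, c] = saturate_u8(alpha * float(image[y, x, c]) + beta)
    return out

def transform_vectorized(image, alpha, beta):
    """Whole-array version of the same operation."""
    acc = alpha * image.astype(np.float64) + beta
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
alpha, beta = 2.2, 50

looped = transform_loop(image, alpha, beta)
vectorized = transform_vectorized(image, alpha, beta)
print(np.array_equal(looped, vectorized))
```

Without the clamp, values above 255 would not fit in 8 bits, which is why the saturation step matters for gains larger than 1.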

## Result

-   Running our code and using \\f$\\alpha = 2.2\\f$ and \\f$\\beta = 50\\f$:
    @code{.bash}
    $ ./BasicLinearTransforms lena.jpg
    Basic Linear Transforms
    * Enter the alpha value \[1.0-3.0\]: 2.2
    * Enter the beta value \[0-100\]: 50
    @endcode
-   We get this:
    

## Practical example

In this section, we will put into practice what we have learned to correct an underexposed image by adjusting the brightness and the contrast of the image. We will also see another technique to correct the brightness of an image, called gamma correction.

### Brightness and contrast adjustments

Increasing (or decreasing) the \\f$\\beta\\f$ value will add (or subtract) a constant value to every pixel. Pixel values outside of the \[0 ; 255\] range will be saturated (i.e. a pixel value higher than 255 will be clamped to 255 and a value lower than 0 will be clamped to 0).

The histogram represents for each color level the number of pixels with that color level. A dark image will have many pixels with low color value and thus the histogram will present a peak in its left part. When adding a constant bias, the histogram is shifted to the right as we have added a constant bias to all the pixels.

The \\f$\\alpha\\f$ parameter will modify how the levels spread. If \\f$ \\alpha < 1 \\f$, the color levels will be compressed and the result will be an image with less contrast.

Note that these histograms have been obtained using the Brightness-Contrast tool in the GIMP software. The brightness tool should be identical to the \\f$\\beta\\f$ bias parameter, but the contrast tool seems to differ from the \\f$\\alpha\\f$ gain: with GIMP the output range seems to be centered (as you can notice in the previous histogram).

It can occur that playing with the \\f$\\beta\\f$ bias will improve the brightness but at the same time the image will appear with a slight veil, as the contrast will be reduced. The \\f$\\alpha\\f$ gain can be used to diminish this effect but, due to the saturation, we will lose some details in the originally bright regions.
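
The effect of the bias and the gain on the histogram can also be checked numerically. The sketch below uses NumPy only; the helper `adjust` and the synthetic dark image are assumptions for illustration, and the histograms are plain counts per level computed with `np.bincount`.

```python
import numpy as np

def adjust(image, alpha=1.0, beta=0.0):
    """Apply g = alpha * f + beta with rounding and clamping to [0, 255]."""
    acc = alpha * image.astype(np.float64) + beta
    return np.clip(np.rint(acc), 0, 255).astype(np.uint8)

# Synthetic dark image: all pixel values lie in the low range [20, 80].
rng = np.random.default_rng(0)
dark = rng.integers(20, 81, size=(64, 64), dtype=np.uint8)

brighter = adjust(dark, beta=60)    # bias: the histogram shifts to the right
flatter = adjust(dark, alpha=0.5)   # gain < 1: the levels are compressed

hist_dark = np.bincount(dark.ravel(), minlength=256)
hist_brighter = np.bincount(brighter.ravel(), minlength=256)

spread_dark = int(dark.max()) - int(dark.min())
spread_flatter = int(flatter.max()) - int(flatter.min())
print("mean:", dark.mean(), "->", brighter.mean())
print("spread:", spread_dark, "->", spread_flatter)
```

Because no pixel reaches 255 in this example, the biased histogram is an exact copy of the original one moved 60 levels to the right; with a larger bias the top levels would pile up at 255 instead.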

### Gamma correction

[Gamma correction](https://en.wikipedia.org/wiki/Gamma_correction) can be used to correct the brightness of an image by using a non linear transformation between the input values and the mapped output values:

\\f\[O = \\left( \\frac{I}{255} \\right)^{\\gamma} \\times 255\\f\]

As this relation is non linear, the effect will not be the same for all the pixels and will depend on their original value.

When \\f$ \\gamma < 1 \\f$, the original dark regions will be brighter and the histogram will be shifted to the right whereas it will be the opposite with \\f$ \\gamma > 1 \\f$.

### Correct an underexposed image

The following image has been corrected with: \\f$ \\alpha = 1.3 \\f$ and \\f$ \\beta = 40 \\f$.


The overall brightness has been improved but you can notice that the clouds are now greatly saturated due to the numerical saturation of the implementation used ([highlight clipping](https://en.wikipedia.org/wiki/Clipping_\(photography\)) in photography).

The following image has been corrected with: \\f$ \\gamma = 0.4 \\f$.


The gamma correction should tend to add less saturation effect as the mapping is non linear and there is no numerical saturation possible as in the previous method.

The previous figure compares the histograms for the three images (the y-ranges are not the same between the three histograms). You can notice that most of the pixel values are in the lower part of the histogram for the original image. After the \\f$ \\alpha \\f$, \\f$ \\beta \\f$ correction, we can observe a big peak at 255 due to the saturation as well as a shift to the right. After gamma correction, the histogram is shifted to the right but the pixels in the dark regions are shifted more (see the gamma curves [figure](Basic_Linear_Transform_Tutorial_gamma.png)) than those in the bright regions.

In this tutorial, you have seen two simple methods to adjust the contrast and the brightness of an image. **They are basic techniques and are not intended to be used as a replacement of a raster graphics editor!**

### Code

@add\_toggle\_cpp Code for the tutorial is [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/ImgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.cpp). @end\_toggle

@add\_toggle\_java Code for the tutorial is [here](https://github.com/opencv/opencv/blob/5.x/samples/java/tutorial_code/ImgProc/changing_contrast_brightness_image/ChangingContrastBrightnessImageDemo.java). @end\_toggle

@add\_toggle\_python Code for the tutorial is [here](https://github.com/opencv/opencv/blob/5.x/samples/python/tutorial_code/imgProc/changing_contrast_brightness_image/changing_contrast_brightness_image.py). @end\_toggle

Code for the gamma correction:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/changing\_contrast\_brightness\_image.cpp changing-contrast-brightness-gamma-correction @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/changing\_contrast\_brightness\_image/ChangingContrastBrightnessImageDemo.java changing-contrast-brightness-gamma-correction @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/changing\_contrast\_brightness\_image/changing\_contrast\_brightness\_image.py changing-contrast-brightness-gamma-correction @end\_toggle

A look-up table is used to improve the performance of the computation, as only 256 values need to be calculated once.
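
The look-up table idea can be sketched without OpenCV. In the snippet below the table is applied with NumPy indexing rather than with an OpenCV call, and the helper names `gamma_lut` and `apply_gamma` as well as the sample pixel values are assumptions for illustration.

```python
import numpy as np

def gamma_lut(gamma):
    """Build the 256-entry table O = (I / 255) ** gamma * 255."""
    levels = np.arange(256, dtype=np.float64)
    table = np.power(levels / 255.0, gamma) * 255.0
    return np.clip(np.rint(table), 0, 255).astype(np.uint8)

def apply_gamma(image, gamma):
    """Map every pixel through the table; the power is evaluated only 256 times."""
    return gamma_lut(gamma)[image]

image = np.array([[0, 32, 64], [128, 192, 255]], dtype=np.uint8)
brightened = apply_gamma(image, 0.4)   # gamma < 1 brightens, mostly the dark pixels
darkened = apply_gamma(image, 2.0)     # gamma > 1 darkens
print(brightened)
print(darkened)
```

The end points 0 and 255 are fixed by the mapping, so unlike a large bias it cannot push pixel values out of range.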

### Additional resources

-   [Gamma correction in graphics rendering](https://learnopengl.com/#!Advanced-Lighting/Gamma-Correction)
-   [Gamma correction and images displayed on CRT monitors](http://www.graphics.cornell.edu/~westin/gamma/gamma.html)
-   [Digital exposure techniques](http://www.cambridgeincolour.com/tutorials/digital-exposure-techniques.htm)

## [Discrete Fourier Transform](https://docharvest.github.io/docs/opencv5/tutorials/core/discrete_fourier_transform/discrete_fourier_transform/)


# Discrete Fourier Transform {#tutorial\_discrete\_fourier\_transform}

@tableofcontents

@prev\_tutorial{tutorial\_basic\_linear\_transform} @next\_tutorial{tutorial\_file\_input\_output\_with\_xml\_yml}

Original author

Bernát Gábor

Compatibility

OpenCV >= 3.0

## Goal

We'll seek answers for the following questions:

-   What is a Fourier transform and why use it?
-   How to do it in OpenCV?
-   Usage of functions such as: **copyMakeBorder()** , **merge()** , **dft()** , **getOptimalDFTSize()** , **log()** and **normalize()** .

## Source code

@add\_toggle\_cpp You can [download this from here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp) or find it in the `samples/cpp/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.cpp` of the OpenCV source code library. @end\_toggle

@add\_toggle\_java You can [download this from here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java) or find it in the `samples/java/tutorial_code/core/discrete_fourier_transform/DiscreteFourierTransform.java` of the OpenCV source code library. @end\_toggle

@add\_toggle\_python You can [download this from here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py) or find it in the `samples/python/tutorial_code/core/discrete_fourier_transform/discrete_fourier_transform.py` of the OpenCV source code library. @end\_toggle

Here's a sample usage of **dft()** :

@add\_toggle\_cpp @include cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp @end\_toggle

@add\_toggle\_java @include java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java @end\_toggle

@add\_toggle\_python @include python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py @end\_toggle

## Explanation

The Fourier Transform will decompose an image into its sine and cosine components. In other words, it will transform an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly with the sum of infinitely many sine and cosine functions. The Fourier Transform is a way to do this. Mathematically, a two-dimensional image's Fourier transform is:

\\f\[F(k,l) = \\displaystyle\\sum\\limits\_{i=0}^{N-1}\\sum\\limits\_{j=0}^{N-1} f(i,j)e^{-i2\\pi(\\frac{ki}{N}+\\frac{lj}{N})}\\f\]\\f\[e^{ix} = \\cos{x} + i\\sin {x}\\f\]

Here f is the image value in its spatial domain and F is its value in the frequency domain. The result of the transformation is a set of complex numbers. Displaying this is possible either via a _real_ image and an _imaginary_ image or via a _magnitude_ and a _phase_ image. However, throughout the image processing algorithms only the _magnitude_ image is interesting, as it contains all the information we need about the image's geometric structure. Nevertheless, if you intend to modify the image in these forms and then transform it back, you'll need to preserve both of them.
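
As a sanity check of the definition, the sum can be evaluated directly and compared with a library FFT. This is a NumPy sketch, not the OpenCV sample: `dft2_naive` is a hypothetical helper, and it generalizes the formula to a non-square image by using the row count and the column count in the two fractions.

```python
import numpy as np

def dft2_naive(f):
    """Directly evaluate F(k,l) = sum_i sum_j f(i,j) * exp(-2*pi*1j*(k*i/R + l*j/C))."""
    rows, cols = f.shape
    i = np.arange(rows).reshape(rows, 1)   # row index of every pixel
    j = np.arange(cols).reshape(1, cols)   # column index of every pixel
    F = np.zeros((rows, cols), dtype=np.complex128)
    for k in range(rows):
        for l in range(cols):
            F[k, l] = np.sum(f * np.exp(-2j * np.pi * (k * i / rows + l * j / cols)))
    return F

rng = np.random.default_rng(0)
f = rng.random((6, 8))
F = dft2_naive(f)

print(np.allclose(F, np.fft.fft2(f)))   # the direct sum matches the FFT
print(np.isclose(F[0, 0], f.sum()))     # F(0,0) is the sum of all pixel values
```

The direct sum costs one full pass over the image per output coefficient, which is why fast algorithms are used in practice.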

In this sample I'll show how to calculate and show the _magnitude_ image of a Fourier Transform. Digital images are discrete: they can only take values from a given finite domain. For example, in a basic gray scale image the values usually are between zero and 255. Therefore the Fourier Transform too needs to be of a discrete type, resulting in a Discrete Fourier Transform (_DFT_). You'll want to use this whenever you need to determine the structure of an image from a geometrical point of view. Here are the steps to follow (in case of a gray scale input image _I_):

### Expand the image to an optimal size

The performance of a DFT depends on the image size. It tends to be the fastest for image sizes that are products of the numbers two, three and five. Therefore, to achieve maximal performance it is generally a good idea to pad border values to the image to get a size with such traits. The **getOptimalDFTSize()** function returns this optimal size and we can use the **copyMakeBorder()** function to expand the borders of an image (the appended pixels are initialized with zero):

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp expand @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java expand @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py expand @end\_toggle
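
The padding step can be sketched in NumPy as follows. This is not how **getOptimalDFTSize()** is implemented: `next_5_smooth` is a hypothetical brute-force helper that returns the smallest size whose only prime factors are two, three and five, and padding on the bottom and right edges is an assumption made for this sketch.

```python
import numpy as np

def next_5_smooth(n):
    """Smallest integer >= n whose only prime factors are 2, 3 and 5."""
    candidate = max(int(n), 1)
    while True:
        m = candidate
        for p in (2, 3, 5):
            while m % p == 0:
                m //= p
        if m == 1:
            return candidate
        candidate += 1

def pad_to_fast_size(image):
    """Zero-pad on the bottom and right up to 5-smooth dimensions."""
    rows, cols = image.shape
    m, n = next_5_smooth(rows), next_5_smooth(cols)
    return np.pad(image, ((0, m - rows), (0, n - cols)),
                  mode="constant", constant_values=0)

image = np.ones((37, 101), dtype=np.uint8)
padded = pad_to_fast_size(image)
print(image.shape, "->", padded.shape)
```

The original pixels stay in the top-left corner, so the padding can later be removed by cropping back to the original size.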

### Make place for both the complex and the real values

The result of a Fourier Transform is complex. This implies that for each image value the result is two image values (one per component). Moreover, the frequency domain's range is much larger than its spatial counterpart. Therefore, we usually store these at least in a _float_ format. We'll convert our input image to this type and expand it with another channel to hold the complex values:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp complex\_and\_real @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java complex\_and\_real @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py complex\_and\_real @end\_toggle

### Make the Discrete Fourier Transform

An in-place calculation is possible (same input as output):

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp dft @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java dft @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py dft @end\_toggle

### Transform the real and complex values to magnitude

A complex number has a real (_Re_) and an imaginary (_Im_) part. The results of a DFT are complex numbers. The magnitude of a DFT is:

\\f\[M = \\sqrt\[2\]{ {Re(DFT(I))}^2 + {Im(DFT(I))}^2}\\f\]

Translated to OpenCV code:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp magnitude @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java magnitude @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py magnitude @end\_toggle
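
The magnitude formula can be written out directly on the two planes of the result. The sketch below uses `np.fft.fft2` as a stand-in for the DFT call and a random 8x8 array as an assumed input; it only illustrates the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))

spectrum = np.fft.fft2(image)
re, im = spectrum.real, spectrum.imag    # the two planes of the complex result

# M = sqrt(Re(DFT(I))^2 + Im(DFT(I))^2), evaluated element by element
magnitude = np.sqrt(re ** 2 + im ** 2)

print(np.allclose(magnitude, np.abs(spectrum)))
```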

### Switch to a logarithmic scale

It turns out that the dynamic range of the Fourier coefficients is too large to be displayed on the screen. We have some small and some high changing values that we can't observe like this: the high values will all turn out as white points, while the small ones will turn out as black. To use the gray scale values for visualization we can transform our linear scale to a logarithmic one:

\\f\[M\_1 = \\log{(1 + M)}\\f\]

Translated to OpenCV code:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp log @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java log @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py log @end\_toggle
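
A short NumPy sketch shows how much the logarithm compresses the range. The magnitudes below are made-up values chosen to span several orders of magnitude; `np.log1p` computes log(1 + M), so a magnitude of zero maps to zero instead of minus infinity.

```python
import numpy as np

# Assumed example magnitudes spanning several orders of magnitude.
magnitude = np.array([0.0, 1.0, 10.0, 1000.0, 1000000.0])

log_magnitude = np.log1p(magnitude)   # M1 = log(1 + M)

print("range before:", magnitude.max() - magnitude.min())
print("range after: ", log_magnitude.max() - log_magnitude.min())
```

The order of the values is preserved, so bright frequency components stay brighter than weak ones after the change of scale.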

### Crop and rearrange

Remember that in the first step we expanded the image? Well, it's time to throw away the newly introduced values. For visualization purposes we may also rearrange the quadrants of the result, so that the origin (zero, zero) corresponds to the image center.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp crop\_rearrange @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java crop\_rearrange @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py crop\_rearrange @end\_toggle
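
The rearrangement can be sketched in NumPy as a diagonal swap of the four quadrants. The helper `crop_even_and_swap` is hypothetical, and cropping to even dimensions first (so that the quadrants have equal size) is an assumption of this sketch. For even sizes the result matches `np.fft.fftshift`.

```python
import numpy as np

def crop_even_and_swap(mag):
    """Crop to even dimensions, then swap the quadrants diagonally."""
    rows, cols = mag.shape
    mag = mag[: rows & -2, : cols & -2]      # drop the last row/column if odd
    cy, cx = mag.shape[0] // 2, mag.shape[1] // 2
    out = np.empty_like(mag)
    out[:cy, :cx] = mag[cy:, cx:]            # bottom-right -> top-left
    out[cy:, cx:] = mag[:cy, :cx]            # top-left -> bottom-right
    out[:cy, cx:] = mag[cy:, :cx]            # bottom-left -> top-right
    out[cy:, :cx] = mag[:cy, cx:]            # top-right -> bottom-left
    return out

mag = np.arange(5 * 7, dtype=np.float64).reshape(5, 7)
centered = crop_even_and_swap(mag)

print(centered.shape)
print(np.array_equal(centered, np.fft.fftshift(mag[:4, :6])))
```

After the swap, the coefficient that was at index (0, 0) sits at the center of the array.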

### Normalize

This is done again for visualization purposes. We now have the magnitudes, however these are still outside our image display range of zero to one. We normalize our values to this range using the @ref cv::normalize() function.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.cpp normalize @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/core/discrete\_fourier\_transform/DiscreteFourierTransform.java normalize @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/discrete\_fourier\_transform/discrete\_fourier\_transform.py normalize @end\_toggle
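
A min-max rescaling to the range zero to one can be sketched in NumPy as below. The helper `normalize_min_max` is hypothetical and is only assumed to mirror what a min-max normalization does; the input values are made up.

```python
import numpy as np

def normalize_min_max(values, lo=0.0, hi=1.0):
    """Linearly rescale so that the minimum maps to lo and the maximum to hi."""
    values = values.astype(np.float64)
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        return np.full_like(values, lo)   # flat input: avoid division by zero
    return (values - vmin) / (vmax - vmin) * (hi - lo) + lo

log_magnitude = np.array([[0.5, 3.0], [7.25, 13.8]])
display = normalize_min_max(log_magnitude)
print(display)
```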

## Result

An application idea would be to determine the geometrical orientation present in the image. For example, let us find out whether a text is horizontal or not. Looking at some text you'll notice that the text lines form roughly horizontal lines and the letters form roughly vertical lines. These two main components of a text snippet may also be seen in the Fourier transform. Let us use [this horizontal](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/imageTextN.png) and [this rotated](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/imageTextR.png) image of a text.

In case of the horizontal text:

In case of a rotated text:

You can see that the most influential components of the frequency domain (brightest dots on the magnitude image) follow the geometric rotation of objects on the image. From this we may calculate the offset and perform an image rotation to correct any misalignment.

## [File Input Output With Xml Yml](https://docharvest.github.io/docs/opencv5/tutorials/core/file_input_output_with_xml_yml/file_input_output_with_xml_yml/)


# File Input and Output using XML / YAML / JSON files {#tutorial\_file\_input\_output\_with\_xml\_yml}

@tableofcontents

@prev\_tutorial{tutorial\_discrete\_fourier\_transform} @next\_tutorial{tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_} @next\_tutorial{tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new}

Original author

Bernát Gábor

Compatibility

OpenCV >= 3.0

## Goal

You'll find answers to the following questions:

-   How do you print and read text entries to a file in OpenCV using YAML, XML, or JSON files?
-   How can you perform the same operations for OpenCV data structures?
-   How can this be done for your custom data structures?
-   How do you use OpenCV data structures, such as @ref cv::FileStorage , @ref cv::FileNode or @ref cv::FileNodeIterator ?

## Source code

@add\_toggle\_cpp You can [download this from here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp) or find it in the `samples/cpp/tutorial_code/core/file_input_output/file_input_output.cpp` of the OpenCV source code library.

Here's sample code showing how to achieve everything enumerated in the goal list.

@include cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp @end\_toggle

@add\_toggle\_python You can [download this from here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/core/file_input_output/file_input_output.py) or find it in the `samples/python/tutorial_code/core/file_input_output/file_input_output.py` of the OpenCV source code library.

Here's sample code showing how to achieve everything enumerated in the goal list.

@include python/tutorial\_code/core/file\_input\_output/file\_input\_output.py @end\_toggle

## Explanation

Here we talk only about XML, YAML and JSON file inputs. Your output (and its respective input) file may have only one of these extensions and the structure coming from this. There are two kinds of data structures you may serialize: _mappings_ (like the STL map and the Python dictionary) and _element sequences_ (like the STL vector). The difference between these is that in a map every element has a unique name through which you may access it. For sequences you need to go through them to query a specific item.
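
The difference between the two kinds of structures can be seen with any serialization format. The sketch below uses Python's standard `json` module rather than @ref cv::FileStorage , and the key names and values in `document` are assumptions chosen for illustration.

```python
import json

document = {
    "iterationNr": 100,                                       # named scalar
    "strings": ["image1.jpg", "Awesomeness", "baboon.jpg"],   # sequence: ordered, unnamed items
    "Mapping": {"One": 1, "Two": 2},                          # mapping: every item has a unique name
}

text = json.dumps(document, indent=2)
loaded = json.loads(text)

# Mapping: address an element directly through its name.
two = loaded["Mapping"]["Two"]

# Sequence: go through the items to find a specific one.
index = None
for position, item in enumerate(loaded["strings"]):
    if item == "baboon.jpg":
        index = position
        break

print(two, index)
```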

\-# **XML/YAML/JSON File Open and Close.** Before you write any content to such a file you need to open it, and at the end you need to close it. The XML/YAML/JSON data structure in OpenCV is @ref cv::FileStorage . To specify which file on your hard drive this structure binds to, you can use either its constructor or its _open()_ function:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp open @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py open @end\_toggle

Whichever of these you use, the second argument is a constant specifying the type of operations you'll be able to perform on the file: WRITE, READ or APPEND. The extension specified in the file name also determines the output format that will be used. The output may even be compressed if you specify an extension such as _.xml.gz_.

The file automatically closes when the @ref cv::FileStorage object is destroyed. However, you may explicitly request this by calling the _release_ function:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp close @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py close @end\_toggle

\-# **Input and Output of text and numbers.** In C++, the data structure uses the same << output operator as the STL library. In Python, @ref cv::FileStorage.write() is used instead. To output any type of data structure we first need to specify its name. In C++ we do this by simply pushing the name to the stream. In Python, the name is the first parameter of the write function. For basic types you may follow this with the value:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp writeNum @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py writeNum @end\_toggle

Reading in is a simple addressing (via the \[\] operator) and casting operation, or a read via the >> operator. In Python, we address with getNode() and use real():

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp readNum @end\_toggle

@add\_toggle\_python @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp readNum @end\_toggle

\-# **Input/Output of OpenCV Data structures.** These behave exactly like the basic C++ and Python types:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp iomati @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp iomatw @snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp iomat @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py iomati @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py iomatw @snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py iomat @end\_toggle

\-# **Input/Output of vectors (arrays) and associative maps.** As mentioned beforehand, we can output maps and sequences (array, vector) too. Again we first print the name of the variable and then we have to specify whether our output is a sequence or a map.

```
For sequence before the first element print the "[" character and after the last one the "]"
character. With Python, call `FileStorage.startWriteStruct(structure_name, struct_type)`,
where `struct_type` is `cv2.FileNode_MAP` or `cv2.FileNode_SEQ` to start writing the structure.
Call `FileStorage.endWriteStruct()` to finish the structure:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeStr
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeStr
@end_toggle
For maps the drill is the same however now we use the "{" and "}" delimiter characters:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp writeMap
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py writeMap
@end_toggle
To read from these we use the @ref cv::FileNode and the @ref cv::FileNodeIterator data
structures. The [] operator of the @ref cv::FileStorage class (or the getNode() function in Python) returns a @ref cv::FileNode data
type. If the node is sequential we can use the @ref cv::FileNodeIterator to iterate through the
items. In Python, the at() function can be used to address elements of the sequence and the
size() function returns the length of the sequence:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readStr
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readStr
@end_toggle
For maps you can use the [] operator (at() function in Python) again to access the given item (or the >> operator too):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp readMap
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py readMap
@end_toggle
```

\-# **Read and write your own data structures.** Suppose you have a data structure such as:
@add\_toggle\_cpp
@code{.cpp}
class MyData
{
public:
    MyData() : A(0), X(0), id() {}
public:   // Data Members
    int A;
    double X;
    string id;
};
@endcode
@end\_toggle
@add\_toggle\_python
@code{.py}
class MyData:
    def \_\_init\_\_(self):
        self.A = self.X = 0
        self.name = ''
@endcode
@end\_toggle
In C++, it's possible to serialize this through the OpenCV I/O XML/YAML interface (just as in case of the OpenCV data structures) by adding a read and a write function inside and outside of your class. In Python, you can get close to this by implementing a read and write function inside the class. For the inside part:
@add\_toggle\_cpp
@snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp inside
@end\_toggle
@add\_toggle\_python
@snippet python/tutorial\_code/core/file\_input\_output/file\_input\_output.py inside
@end\_toggle
@add\_toggle\_cpp
In C++, you need to add the following function definitions outside the class:
@snippet cpp/tutorial\_code/core/file\_input\_output/file\_input\_output.cpp outside
@end\_toggle
Here you can observe that in the read section we defined what happens if the user tries to read a non-existing node. In this case we just return the default initialization value, however a more verbose solution would be to return for instance a minus one value for an object ID.

```
Once you have added these four functions use the << operator for write and the >> operator for
read (or the defined input/output functions for Python):
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOi
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIOw
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp customIO
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOi
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIOw
@snippet python/tutorial_code/core/file_input_output/file_input_output.py customIO
@end_toggle
Or try reading a non-existing node:
@add_toggle_cpp
@snippet cpp/tutorial_code/core/file_input_output/file_input_output.cpp nonexist
@end_toggle
@add_toggle_python
@snippet python/tutorial_code/core/file_input_output/file_input_output.py nonexist
@end_toggle
```

## Result

Well mostly we just print out the defined numbers. On the screen of your console you could see:
@code{.bash}
Write Done.

Reading:
100image1.jpg
Awesomeness
baboon.jpg
Two 2; One 1

R = \[1, 0, 0;
  0, 1, 0;
  0, 0, 1\]
T = \[0; 0; 0\]

MyData =
{ id = mydata1234, X = 3.14159, A = 97}

Attempt to read NonExisting (should initialize the data structure with its default).
NonExisting =
{ id = , X = 0, A = 0}

Tip: Open up output.xml with a text editor to see the serialized data.
@endcode
Nevertheless, it's much more interesting what you may see in the output xml file:
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<iterationNr>100</iterationNr>
<strings>
  image1.jpg Awesomeness baboon.jpg</strings>
<Mapping>
  <One>1</One>
  <Two>2</Two></Mapping>
<R type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>u</dt>
  <data>
    1 0 0 0 1 0 0 0 1</data></R>
<T type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 0.</data></T>
<MyData>
  <A>97</A>
  <X>3.1415926535897931e+000</X>
  <id>mydata1234</id></MyData>
</opencv_storage>
@endcode
Or the YAML file:
@code{.yaml}
%YAML:1.0
iterationNr: 100
strings:
   - "image1.jpg"
   - Awesomeness
   - "baboon.jpg"
Mapping:
   One: 1
   Two: 2
R: !!opencv-matrix
   rows: 3
   cols: 3
   dt: u
   data: \[ 1, 0, 0, 0, 1, 0, 0, 0, 1 \]
T: !!opencv-matrix
   rows: 3
   cols: 1
   dt: d
   data: \[ 0., 0., 0. \]
MyData:
   A: 97
   X: 3.1415926535897931e+000
   id: mydata1234
@endcode
You may observe a runtime instance of this on [YouTube](https://www.youtube.com/watch?v=A4yqVnByMMM).

@youtube{A4yqVnByMMM}

## [How To Scan Images](https://docharvest.github.io/docs/opencv5/tutorials/core/how_to_scan_images/how_to_scan_images/)


# How to scan images, lookup tables and time measurement with OpenCV {#tutorial\_how\_to\_scan\_images}

@tableofcontents

@prev\_tutorial{tutorial\_mat\_the\_basic\_image\_container} @next\_tutorial{tutorial\_mat\_mask\_operations}

|    |    |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |

## Goal

We'll seek answers for the following questions:

-   How to go through each and every pixel of an image?
-   How are OpenCV matrix values stored?
-   How to measure the performance of our algorithm?
-   What are lookup tables and why use them?

## Our test case

Let us consider a simple color reduction method. By using the unsigned char C and C++ type for matrix item storing, a channel of a pixel may have up to 256 different values. For a three channel image this allows far too many colors (about 16.7 million). Working with so many color shades may give a heavy blow to our algorithm performance. However, sometimes it is enough to work with a lot fewer of them to get the same final result.

In these cases it's common that we make a _color space reduction_. This means that we divide the current color value by a new input value to end up with fewer colors. For instance every value between zero and nine takes the new value zero, every value between ten and nineteen the value ten and so on.

When you divide a _uchar_ (unsigned char, that is, a value between zero and 255) by an _int_ value, the division is an integer division, so any fraction will be rounded down and the result still fits in a _uchar_. Taking advantage of this fact, the operation above may be expressed in the _uchar_ domain as:

\\f\[I\_{new} = (\\frac{I\_{old}}{10}) \* 10\\f\]

A simple color space reduction algorithm would consist of just passing through every pixel of an image matrix and applying this formula. It's worth noting that we do a divide and a multiplication operation. These operations are comparatively expensive for a system. If possible it's worth avoiding them by using cheaper operations such as a few subtractions, additions or in the best case a simple assignment. Furthermore, note that we only have a limited number of input values for the operation above. In case of the _uchar_ type this is 256 to be exact.

Therefore, for larger images it would be wise to calculate all possible values beforehand and, while scanning the image, just make an assignment by using a lookup table. Lookup tables are simple arrays (having one or more dimensions) that hold the final output value for each possible input value. Their strength is that we do not need to make the calculation, we just need to read the result.
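The precomputation described above can be sketched in a few lines of plain C++. This is a minimal illustration rather than the tutorial's sample code: the names `buildReductionTable` and `divideWith` are chosen here for the example, and only the standard library is used.

```cpp
#include <array>
#include <cstdint>

// Build a 256-entry lookup table for color space reduction.
// divideWith is the reduction step (10 in the formula above) and must be > 0.
std::array<std::uint8_t, 256> buildReductionTable(int divideWith)
{
    std::array<std::uint8_t, 256> lut{};
    for (int i = 0; i < 256; ++i)
    {
        // Integer division drops the fraction, so with a step of 10
        // the inputs 0..9 map to 0, 10..19 map to 10, and so on.
        lut[i] = static_cast<std::uint8_t>(divideWith * (i / divideWith));
    }
    return lut;
}
```

After the table is built, reducing a value is a single array read, for example `lut[137]` yields 130 for a step of 10.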

Our test case program (and the code sample below) will do the following: read in an image passed as a command line argument (it may be either color or grayscale) and apply the reduction with the given command line argument integer value. In OpenCV, at the moment there are three major ways of going through an image pixel by pixel. To make things a little more interesting we'll make the scanning of the image using each of these methods, and print out how long it took.

You can download the full source code [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/core/how_to_scan_images/how_to_scan_images.cpp) or look it up in the samples directory of OpenCV at the cpp tutorial code for the core section. Its basic usage is:
@code{.bash}
how\_to\_scan\_images imageName.jpg intValueToReduce \[G\]
@endcode
The final argument is optional. If given, the image will be loaded in grayscale format, otherwise the BGR color space is used. The first thing is to calculate the lookup table.

@snippet how\_to\_scan\_images.cpp dividewith

Here we first use the C++ _stringstream_ class to convert the third command line argument from text to an integer format. Then we use a simple loop and the formula above to calculate the lookup table. No OpenCV specific stuff here.

Another issue is how do we measure time? OpenCV offers two simple functions to achieve this: cv::getTickCount() and cv::getTickFrequency(). The first returns the number of ticks of your system's CPU since a certain event (for example since you booted your system). The second returns how many ticks your CPU emits in a second. So, measuring the amount of time elapsed between two operations is as easy as:
@code{.cpp}
double t = (double)getTickCount();
// do something ...
t = ((double)getTickCount() - t)/getTickFrequency();
cout << "Times passed in seconds: " << t << endl;
@endcode
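For readers who want to try the same measurement pattern without OpenCV at hand, here is a rough equivalent using `std::chrono` from the standard library. This swaps in `std::chrono::steady_clock` for the tick functions, and the helper name `averageMilliseconds` is invented for this sketch. It also averages over several runs, which smooths out noise in a single measurement.

```cpp
#include <chrono>

// Returns the average wall-clock time of one call to work(), in milliseconds.
// runs must be > 0.
template <typename F>
double averageMilliseconds(F work, int runs)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        work();                                   // the code being measured
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

A call such as `averageMilliseconds([&]{ scanImage(); }, 100)` would time a hypothetical `scanImage` function over one hundred runs.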

## @anchor tutorial\_how\_to\_scan\_images\_storing How is the image matrix stored in memory?

As you could already read in my @ref tutorial\_mat\_the\_basic\_image\_container tutorial the size of the matrix depends on the color system used. More accurately, it depends on the number of channels used. In case of a grayscale image we have something like:

For multichannel images the columns contain as many sub columns as the number of channels. For example in case of a BGR color system:

Note that the order of the channels is inverse: BGR instead of RGB. Because in many cases the memory is large enough to store the rows in a successive fashion the rows may follow one after another, creating a single long row. Because everything is in a single place following one after another this may help to speed up the scanning process. We can use the cv::Mat::isContinuous() function to _ask_ the matrix if this is the case. Continue on to the next section to find an example.
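The layout described above boils down to a small piece of index arithmetic. The sketch below assumes an 8-bit image stored row by row; `pixelOffset`, `rowsAreContiguous` and the parameter `rowStep` (bytes from the start of one row to the start of the next) are names made up for this illustration, not OpenCV API.

```cpp
#include <cstddef>

// Byte offset of channel ch of the pixel at (row, col) in a row-major 8-bit image.
// For a BGR image, ch = 0 is blue, 1 is green and 2 is red.
std::size_t pixelOffset(std::size_t row, std::size_t col, std::size_t ch,
                        std::size_t rowStep, std::size_t channels)
{
    return row * rowStep + col * channels + ch;
}

// True when the rows follow each other without gaps, which is the
// situation in which the whole image can be treated as one long row.
bool rowsAreContiguous(std::size_t cols, std::size_t channels, std::size_t rowStep)
{
    return rowStep == cols * channels;
}
```

For a 4-column BGR image without padding (`rowStep` of 12), the green value of the pixel at row 1, column 2 sits at byte offset 19.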

## The efficient way

When it comes to performance you cannot beat the classic C style operator\[\] (pointer) access. Therefore, the most efficient method we can recommend for making the assignment is:

@snippet how\_to\_scan\_images.cpp scan-c

Here we basically just acquire a pointer to the start of each row and go through it until it ends. In the special case that the matrix is stored in a continuous manner we only need to request the pointer a single time and go all the way to the end. We need to look out for color images: we have three channels so we need to pass through three times more items in each row.
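As a rough illustration of the row-pointer scan just described, the following sketch works on a raw byte buffer instead of a `cv::Mat`, so it compiles without OpenCV. The function name `scanAndReduce` and the `rowStep` parameter are assumptions of this example. The logic mirrors the text: fetch a pointer per row, or collapse everything into a single row when there are no gaps.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Apply a 256-entry lookup table in place to an 8-bit image stored row by row.
// rowStep is the distance in bytes between row starts (>= cols * channels).
void scanAndReduce(std::uint8_t* data, std::size_t rows, std::size_t cols,
                   std::size_t channels, std::size_t rowStep,
                   const std::array<std::uint8_t, 256>& lut)
{
    std::size_t nRows = rows;
    std::size_t nCols = cols * channels;   // items per row: three per pixel for BGR

    if (rowStep == nCols)                  // no gaps: treat the image as one long row
    {
        nCols *= nRows;
        nRows = 1;
    }

    for (std::size_t i = 0; i < nRows; ++i)
    {
        std::uint8_t* p = data + i * rowStep;   // pointer to the start of row i
        for (std::size_t j = 0; j < nCols; ++j)
            p[j] = lut[p[j]];                   // padding bytes are never touched
    }
}
```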

There is another way to do this. The _data_ member of a _Mat_ object is a pointer to the first row, first column. If this pointer is null you have no valid input in that object. Checking this is the simplest method to check if your image loading was a success. In case the storage is continuous we can use this to go through the whole data pointer. In case of a grayscale image this would look like:
@code{.cpp}
uchar\* p = I.data;

for( unsigned int i = 0; i < ncol\*nrows; ++i, ++p)
    \*p = table\[\*p\];
@endcode
You would get the same result. However, this code is a lot harder to read later on. It gets even harder if you have some more advanced technique there. Moreover, in practice I've observed you'll get the same performance result (as most of the modern compilers will probably make this small optimization trick automatically for you).

## The iterator (safe) method

With the efficient way, it was your responsibility to make sure that you pass through the right number of _uchar_ fields and skip the gaps that may occur between the rows. The iterator method is considered a safer way as it takes over these tasks from the user. All you need to do is to ask for the begin and the end of the image matrix and then just increase the begin iterator until you reach the end. To acquire the value _pointed_ to by the iterator use the \* operator (place it before the iterator).

@snippet how\_to\_scan\_images.cpp scan-iterator

In case of color images we have three uchar items per column. This may be considered a short vector of uchar items, that has been baptized in OpenCV with the _Vec3b_ name. To access the n-th sub column we use simple operator\[\] access. It's important to remember that OpenCV iterators go through the columns and automatically skip to the next row. Therefore in case of color images if you use a simple _uchar_ iterator you'll be able to access only the blue channel values.

## On-the-fly address calculation with reference returning

The final method isn't recommended for scanning. It was made to acquire or modify randomly chosen elements in the image. Its basic usage is to specify the row and column number of the item you want to access. During our earlier scanning methods you could already notice that it matters through what type we look at the image. It's no different here as you need to manually specify what type to use for the lookup. You can observe this in case of the grayscale images for the following source code (the usage of the cv::Mat::at() function):

@snippet how\_to\_scan\_images.cpp scan-random

The function takes your input type and coordinates and calculates the address of the queried item. Then it returns a reference to that. This may be a constant when you _get_ the value and non-constant when you _set_ the value. As a safety step, in **debug mode only** there is a check performed that your input coordinates are valid and do exist. If this isn't the case you'll get a nice output message of this on the standard error output stream. Compared to the efficient way, in release mode the only difference in using this is that for every element of the image you'll get a new row pointer, on which we use the C operator\[\] to acquire the column element.

If you need to do multiple lookups using this method for an image it may be troublesome and time consuming to enter the type and the at keyword for each of the accesses. To solve this problem OpenCV has a cv::Mat\_ data type. It's the same as Mat, with the extra requirement that at definition you need to specify the data type through which to look at the data matrix; in return you can use the operator() for fast access of items. To make things even better this is easily convertible from and to the usual cv::Mat data type. A sample usage of this can be seen in the color image branch of the function above. Nevertheless, it's important to note that the same operation (with the same runtime speed) could have been done with the cv::Mat::at function. It's just a trick that saves the lazy programmer some typing.

## The Core Function

This is a bonus method of achieving lookup table modification in an image. In image processing it's quite common that you want to modify all of a given image values to some other value. OpenCV provides a function for modifying image values, without the need to write the scanning logic of the image. We use the cv::LUT() function of the core module. First we build a Mat type of the lookup table:

@snippet how\_to\_scan\_images.cpp table-init

Finally call the function (I is our input image and J the output one):

@snippet how\_to\_scan\_images.cpp table-use

## Performance Difference

For the best result compile the program and run it yourself. To make the differences clearer, I've used a quite large (2560 X 1600) image. The performance figures presented here are for color images. For a more accurate value I've averaged the time over a hundred calls of the function.

| Method | Time |
| --- | --- |
| Efficient Way | 79.4717 milliseconds |
| Iterator | 83.7201 milliseconds |
| On-The-Fly RA | 93.7878 milliseconds |
| LUT function | 32.5759 milliseconds |

We can conclude a couple of things. If possible, use the already made functions of OpenCV (instead of reinventing these). The fastest method turns out to be the LUT function. This is because the OpenCV library is multi-thread enabled via Intel Threading Building Blocks. However, if you need to write a simple image scan prefer the pointer method. The iterator is a safer bet, however somewhat slower. Using the on-the-fly reference access method for a full image scan is the most costly in debug mode. In release mode it may or may not beat the iterator approach, but it certainly sacrifices the safety trait of iterators for this.

Finally, you may watch a sample run of the program on the [video posted](https://www.youtube.com/watch?v=fB3AN5fjgwc) on our YouTube channel.

@youtube{fB3AN5fjgwc}

## [How To Use OpenCV Parallel For](https://docharvest.github.io/docs/opencv5/tutorials/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_/)


# How to use the OpenCV parallel\_for\_ function to parallelize your code (Mandelbrot set example) {#tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_}

@tableofcontents

@prev\_tutorial{tutorial\_file\_input\_output\_with\_xml\_yml} @next\_tutorial{tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new} @next\_tutorial{tutorial\_univ\_intrin}

|    |    |
| -: | :- |
| Compatibility | OpenCV >= 3.0 |

@note See this \[tutorial\](@ref tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new) for a `parallel_for_` usage applied to image convolution.

## Goal

The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set exploiting almost all the CPU load available. The full tutorial code is [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended to remain simple.

## Precondition

The first precondition is to have OpenCV built with a parallel framework. In OpenCV 4, the following parallel frameworks are available in that order:

1.  Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
2.  OpenMP (integrated to compiler, should be explicitly enabled)
3.  APPLE GCD (system wide, used automatically (APPLE only))
4.  Windows RT concurrency (system wide, used automatically (Windows RT only))
5.  Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
6.  Pthreads (if available)

As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries are third party libraries and have to be explicitly built and enabled in CMake (e.g. TBB), others are automatically available with the platform (e.g. APPLE GCD), but chances are that you should be able to have access to a parallel framework either directly or by enabling the option in CMake and rebuilding the library.

The second (weak) precondition is more related to the task you want to achieve as not all computations are suitable / can be adapted to be run in a parallel way. To remain simple, tasks that can be split into multiple elementary operations with no memory dependency (no possible race condition) are easily parallelizable. Computer vision processing is often easily parallelizable as most of the time the processing of one pixel does not depend on the state of other pixels.

## Simple example: drawing a Mandelbrot set

We will use the example of drawing a Mandelbrot set to show how from a regular sequential code you can easily adapt the code to parallelize the computation.

## Theory

The Mandelbrot set definition has been named in tribute to the mathematician Benoit Mandelbrot by the mathematician Adrien Douady. It has become famous outside of the mathematics field as the image representation is an example of a class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scales). For a more in-depth introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set). Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).

> The Mandelbrot set is the set of values of \\f$ c \\f$ in the complex plane for which the orbit of 0 under iteration of the quadratic map \\f\[\\begin{cases} z\_0 = 0 \\\\ z\_{n+1} = z\_n^2 + c \\end{cases}\\f\] remains bounded. That is, a complex number \\f$ c \\f$ is part of the Mandelbrot set if, when starting with \\f$ z\_0 = 0 \\f$ and applying the iteration repeatedly, the absolute value of \\f$ z\_n \\f$ remains bounded however large \\f$ n \\f$ gets. This can also be represented as \\f\[\\limsup\_{n\\to\\infty}|z\_{n+1}|\\leqslant2\\f\]

## Pseudocode

A simple algorithm to generate a representation of the Mandelbrot set is called the ["escape time algorithm"](https://en.wikipedia.org/wiki/Mandelbrot_set#Escape_time_algorithm). For each pixel in the rendered image, we test using the recurrence relation if the complex number is bounded or not under a maximum number of iterations. Pixels that do not belong to the Mandelbrot set will escape quickly whereas we assume that the pixel is in the set after a fixed maximum number of iterations. A high value of iterations will produce a more detailed image but the computation time will increase accordingly. We use the number of iterations needed to "escape" to depict the pixel value in the image.

```
For each pixel (Px, Py) on the screen, do:
{
  x0 = scaled x coordinate of pixel (scaled to lie in the Mandelbrot X scale (-2, 1))
  y0 = scaled y coordinate of pixel (scaled to lie in the Mandelbrot Y scale (-1, 1))
  x = 0.0
  y = 0.0
  iteration = 0
  max_iteration = 1000
  while (x*x + y*y < 2*2  AND  iteration < max_iteration) {
    xtemp = x*x - y*y + x0
    y = 2*x*y + y0
    x = xtemp
    iteration = iteration + 1
  }
  color = palette[iteration]
  plot(Px, Py, color)
}
```

To relate between the pseudocode and the theory, we have:

-   \\f$ z = x + iy \\f$
-   \\f$ z^2 = x^2 + i2xy - y^2 \\f$
-   \\f$ c = x\_0 + iy\_0 \\f$

On this figure, we recall that the real part of a complex number is on the x-axis and the imaginary part on the y-axis. You can see that the whole shape can be repeatedly visible if we zoom at particular locations.

## Implementation

## Escape time algorithm implementation

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-escape-time-algorithm

Here, we used the [`std::complex`](https://en.cppreference.com/cpp/numeric/complex) template class to represent a complex number. This function performs the test to check if the pixel is in the set or not and returns the "escaped" iteration.
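For orientation, an escape time test along the lines of the pseudocode can be written as follows. This is a sketch and not necessarily identical to the tutorial's snippet: the function name `escapeTime` and the default of 500 iterations are assumptions made for this example.

```cpp
#include <complex>

// Returns the iteration at which |z| first exceeds 2 under z = z*z + c,
// or maxIter if the orbit stayed bounded for all tested iterations.
int escapeTime(const std::complex<float>& c, int maxIter = 500)
{
    std::complex<float> z(0.0f, 0.0f);
    for (int n = 0; n < maxIter; ++n)
    {
        // std::norm returns the squared magnitude, so compare against 2*2.
        if (std::norm(z) > 4.0f)
            return n;
        z = z * z + c;
    }
    return maxIter;
}
```

Points such as `c = 0` or `c = -1` never escape and return `maxIter`, while a point far outside the set such as `c = 2 + 2i` escapes after the first iteration.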

## Sequential Mandelbrot implementation

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-sequential

In this implementation, we sequentially iterate over the pixels in the rendered image to perform the test to check if the pixel is likely to belong to the Mandelbrot set or not.

Another thing to do is to transform the pixel coordinate into the Mandelbrot set space with:

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-transformation

Finally, to assign the grayscale value to the pixels, we use the following rule:

-   a pixel is black if it reaches the maximum number of iterations (pixel is assumed to be in the Mandelbrot set),
-   otherwise we assign a grayscale value depending on the escaped iteration and scaled to fit the grayscale range.

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-grayscale-value

Using a linear scale transformation is not enough to perceive the grayscale variation. To overcome this, we will boost the perception by using a square root scale transformation (borrowed from Jeremy D. Frens in his [blog post](https://web.archive.org/web/20250419124416/http://www.programming-during-recess.net/2016/06/26/color-schemes-for-mandelbrot-sets/)): \\f$ f \\left( x \\right) = \\sqrt{\\frac{x}{\\text{maxIter}}} \\times 255 \\f$

The green curve corresponds to a simple linear scale transformation, the blue one to a square root scale transformation and you can observe how the lowest values will be boosted when looking at the slope at these positions.
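The two rules and the square root scaling can be combined into one small helper. The sketch below uses only the standard library. The name `grayValue` is invented here, and rounding with `std::lround` is a choice of this example rather than something the text above specifies.

```cpp
#include <cmath>

// Map an escape iteration count to an 8-bit gray level.
// Pixels that reach maxIter are assumed to be in the set and drawn black.
int grayValue(int iteration, int maxIter)
{
    if (iteration >= maxIter)
        return 0;

    // f(x) = sqrt(x / maxIter) * 255 boosts the low iteration counts.
    const double scaled = std::sqrt(static_cast<double>(iteration) / maxIter) * 255.0;
    return static_cast<int>(std::lround(scaled));
}
```

With `maxIter = 256`, an escape at iteration 16 gives sqrt(16/256) * 255 = 63.75, which rounds to 64, whereas a linear scale would only give about 16.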

## Parallel Mandelbrot implementation

When looking at the sequential implementation, we can notice that each pixel is computed independently. To optimize the computation, we can perform multiple pixel calculations in parallel, by exploiting the multi-core architecture of modern processors. To achieve this easily, we will use the OpenCV @ref cv::parallel\_for\_ framework.

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-parallel

The first thing is to declare a custom class that inherits from @ref cv::ParallelLoopBody and to override the `virtual void operator ()(const cv::Range& range) const`.

The range in the `operator ()` represents the subset of pixels that will be treated by an individual thread. This splitting is done automatically to distribute the computation load equally. We have to convert the pixel index coordinate to a 2D `[row, col]` coordinate. Also note that we have to keep a reference to the Mat image to be able to modify the image in-place.
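To make the two ideas in this paragraph concrete, here is a small sketch of how a linear range could be cut into stripes and how a linear pixel index maps back to a row and a column. It does not call OpenCV and does not claim to reproduce how `parallel_for_` actually partitions work, which depends on the backend. `splitRange` and `toRowCol` are hypothetical helpers written for this illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Split [0, total) into at most nstripes contiguous half-open ranges
// of near-equal size. Each pair holds (begin, end).
std::vector<std::pair<int, int>> splitRange(int total, int nstripes)
{
    std::vector<std::pair<int, int>> stripes;
    const int n = std::max(1, std::min(nstripes, total));
    for (int s = 0; s < n; ++s)
    {
        const int begin = static_cast<int>(static_cast<long long>(total) * s / n);
        const int end   = static_cast<int>(static_cast<long long>(total) * (s + 1) / n);
        stripes.emplace_back(begin, end);
    }
    return stripes;
}

// Convert a linear pixel index into (row, col) for an image with cols columns.
std::pair<int, int> toRowCol(int index, int cols)
{
    return { index / cols, index % cols };
}
```

Each stripe would then be handed to one worker, which loops over its indices and converts every index with `toRowCol` before writing the pixel. Because the stripes do not overlap, no two workers write to the same pixel.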

The parallel execution is called with:

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-parallel-call

Here, the range represents the total number of operations to be executed, so the total number of pixels in the image. To set the number of threads, you can use: @ref cv::setNumThreads. You can also specify the number of splits using the nstripes parameter in @ref cv::parallel\_for\_. For instance, if your processor has 4 threads, calling `cv::setNumThreads(2)` or setting `nstripes=2` should have the same effect: by default all available processor threads are used, but with either setting the workload is split across only two threads.

@note The C++11 standard allows simplifying the parallel implementation by getting rid of the `ParallelMandelbrot` class and replacing it with a lambda expression:

@snippet how\_to\_use\_OpenCV\_parallel\_for\_.cpp mandelbrot-parallel-call-cxx11

## Results

You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp). The performance of the parallel implementation depends on the type of CPU you have. For instance, on a 4 cores / 8 threads CPU, you can expect a speed-up of around 6.9X. There are many factors that explain why we do not achieve a speed-up of almost 8X. The main reasons are most likely:

-   the overhead to create and manage the threads,
-   background processes running in parallel,
-   the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.

The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color depending on the escaped iteration and using a color palette to get more aesthetic images):

## [How To Use OpenCV Parallel For New](https://docharvest.github.io/docs/opencv5/tutorials/core/how_to_use_OpenCV_parallel_for_new/how_to_use_OpenCV_parallel_for_new/)


# How to use the OpenCV parallel\_for\_ function to parallelize your code (convolution example) {#tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new}

@tableofcontents

@prev\_tutorial{tutorial\_file\_input\_output\_with\_xml\_yml} @prev\_tutorial{tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_} @next\_tutorial{tutorial\_univ\_intrin}

|    |    |
| -: | :- |
| Compatibility | OpenCV >= 3.0 |

## Goal

The goal of this tutorial is to demonstrate the use of the OpenCV `parallel_for_` framework to easily parallelize your code. To illustrate the concept, we will write a program to perform convolution operation over an image. The full tutorial code is [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_new.cpp).

## Precondition

### Parallel Frameworks

The first precondition is to have OpenCV built with a parallel framework. In OpenCV 4.5, the following parallel frameworks are available in that order:

-   Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
-   OpenMP (integrated to compiler, should be explicitly enabled)
-   APPLE GCD (system wide, used automatically (APPLE only))
-   Windows RT concurrency (system wide, used automatically (Windows RT only))
-   Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
-   Pthreads

As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries are third party libraries and have to be explicitly enabled in CMake before building, while others are automatically available with the platform (e.g. APPLE GCD).

### Race Conditions

Race conditions occur when more than one thread tries to write, _or_ read and write, to a particular memory location simultaneously. Based on that, we can broadly classify algorithms into two categories:

1.  Algorithms in which only a single thread writes data to a particular memory location.
    
    -   In _convolution_, for example, even though multiple threads may read from a pixel at a particular time, only a single thread _writes_ to a particular pixel.
2.  Algorithms in which multiple threads may write to a single memory location.
    
    -   Finding contours, features, etc. Such algorithms may require each thread to add data to a global variable simultaneously. For example, when detecting features, each thread will add features of their respective parts of the image to a common vector, thus creating a race condition.

## Convolution

We will use the example of performing a convolution to demonstrate the use of `parallel_for_` to parallelize the computation. This is an example of an algorithm which does not lead to a race condition.

## Theory

Convolution is a simple mathematical operation widely used in image processing. Here, we slide a smaller matrix, called the _kernel_, over an image and a sum of the product of pixel values and corresponding values in the kernel gives us the value of the particular pixel in the output (called the anchor point of the kernel). Based on the values in the kernel, we get different results. In the example below, we use a 3x3 kernel (anchored at its center) and convolve over a 5x5 matrix to produce a 3x3 matrix. The size of the output can be altered by padding the input with suitable values.

For more information about different kernels and what they do, look [here](https://en.wikipedia.org/wiki/Kernel_\(image_processing\)).

For the purpose of this tutorial, we will implement the simplest form of the function which takes a grayscale image (1 channel) and an odd length square kernel and produces an output image. The operation will not be performed in-place. @note We can store a few of the relevant pixels temporarily to make sure we use the original values during the convolution and then do it in-place. However, the purpose of this tutorial is to introduce parallel\_for\_ function and an inplace implementation may be too complicated.

## Pseudocode

```
InputImage src, OutputImage dst, kernel(size n)
makeborder(src, n/2)
for each pixel (i, j) strictly inside borders, do:
{
    value := 0
    for k := -n/2 to n/2, do:
        for l := -n/2 to n/2, do:
            value += kernel[n/2 + k][n/2 + l]*src[i + k][j + l]

    dst[i][j] := value
}
```

For an _n-sized kernel_, we add a border of size _n/2_ to handle edge cases. We then run two loops over the kernel and accumulate the products into the sum.
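The pseudocode can be translated almost line by line. The sketch below uses plain C++ instead of OpenCV so that it stays self-contained; the `Image` struct, `makeBorder` and `convolve` are hypothetical names, and zero padding is an assumption (the pseudocode does not say which border values to use).

```cpp
#include <cassert>
#include <vector>

// Row-major grayscale image stored as floats; a stand-in for cv::Mat.
struct Image {
    int rows = 0, cols = 0;
    std::vector<float> data;
    Image(int r, int c, float v = 0.f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, v) {}
    float& at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// Pads `src` with `border` zero-valued pixels on every side.
Image makeBorder(const Image& src, int border)
{
    Image padded(src.rows + 2 * border, src.cols + 2 * border, 0.f);
    for (int i = 0; i < src.rows; ++i)
        for (int j = 0; j < src.cols; ++j)
            padded.at(i + border, j + border) = src.at(i, j);
    return padded;
}

// Direct translation of the pseudocode; the kernel is n x n with n odd.
Image convolve(const Image& src, const Image& kernel)
{
    assert(kernel.rows == kernel.cols && kernel.rows % 2 == 1);
    const int half = kernel.rows / 2;
    const Image padded = makeBorder(src, half);
    Image dst(src.rows, src.cols);
    for (int i = 0; i < src.rows; ++i)
        for (int j = 0; j < src.cols; ++j) {
            float value = 0.f;
            for (int k = -half; k <= half; ++k)
                for (int l = -half; l <= half; ++l)
                    value += kernel.at(half + k, half + l)
                           * padded.at(i + half + k, j + half + l);
            dst.at(i, j) = value;  // written once, never read back
        }
    return dst;
}
```

With a 3x3 image of ones and a 3x3 kernel of ones, the centre pixel sums nine products, while a corner pixel only sees four non-zero neighbours because of the zero border.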

## Implementation

### Sequential implementation

@snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-sequential

We first make an output matrix (dst) with the same size as src and add borders to the src image (to handle edge cases). @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-make-borders

We then sequentially iterate over the pixels in the src image and compute the value from the kernel and the neighbouring pixel values. We then write the value to the corresponding pixel in the dst image. @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-kernel-loop

### Parallel implementation

Looking at the sequential implementation, we can see that each pixel depends on multiple neighbouring pixels but only one pixel is written at a time. Thus, to optimize the computation, we can split the image into stripes and perform the convolution on each stripe in parallel, exploiting the multi-core architecture of modern processors. The OpenCV @ref cv::parallel\_for\_ framework automatically decides how to split the computation efficiently and does most of the work for us.

@note Although the value of a pixel in a particular stripe may depend on pixel values outside the stripe, these are read-only operations and hence will not cause undefined behaviour.

We first declare a custom class that inherits from @ref cv::ParallelLoopBody and overrides the `virtual void operator ()(const cv::Range& range) const`. @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-parallel

The range in the `operator ()` represents the subset of values that will be treated by an individual thread. Based on the requirement, there may be different ways of splitting the range which in turn changes the computation.

For example, we can either

1.  Split the entire traversal of the image and obtain the \[row, col\] coordinate in the following way (as shown in the above code):
    
    @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp overload-full
    
    We would then call the parallel\_for\_ function in the following way: @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-parallel-function
    

2.  Split the rows and compute for each row:
    
    @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp overload-row-split
    
    In this case, we call the parallel\_for\_ function with a different range: @snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-parallel-function-row
    

@note In our case, both implementations perform similarly. Some cases may allow better memory access patterns or other performance benefits.
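The difference between the two splitting strategies is only in what the range counts: pixels or rows. The sketch below shows both in plain C++. `parallelForSketch` is a hypothetical, much simplified stand-in for `cv::parallel_for_` (it starts one thread per stripe, which the real framework does not necessarily do), and the value written to each element is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::parallel_for_: splits [begin, end) into
// `nstripes` contiguous sub-ranges and runs `body` on each in its own thread.
void parallelForSketch(int begin, int end, int nstripes,
                       const std::function<void(int, int)>& body)
{
    assert(nstripes > 0 && begin <= end);
    const int chunk = (end - begin + nstripes - 1) / nstripes;
    std::vector<std::thread> workers;
    for (int s = 0; s < nstripes; ++s) {
        const int b = std::min(end, begin + s * chunk);
        const int e = std::min(end, b + chunk);
        if (b < e) workers.emplace_back(body, b, e);
    }
    for (auto& w : workers) w.join();
}

// Strategy 1: the range covers every pixel; recover (row, col) from the flat index.
std::vector<int> fillByPixel(int rows, int cols, int nstripes)
{
    std::vector<int> dst(static_cast<std::size_t>(rows) * cols, 0);
    parallelForSketch(0, rows * cols, nstripes, [&](int b, int e) {
        for (int r = b; r < e; ++r) {
            const int i = r / cols, j = r % cols;
            dst[r] = i * 100 + j;  // each element is written by exactly one thread
        }
    });
    return dst;
}

// Strategy 2: the range covers rows; each thread walks its rows in full.
std::vector<int> fillByRow(int rows, int cols, int nstripes)
{
    std::vector<int> dst(static_cast<std::size_t>(rows) * cols, 0);
    parallelForSketch(0, rows, nstripes, [&](int b, int e) {
        for (int i = b; i < e; ++i)
            for (int j = 0; j < cols; ++j)
                dst[static_cast<std::size_t>(i) * cols + j] = i * 100 + j;
    });
    return dst;
}
```

Both functions produce the same output. Neither needs a lock, because no two stripes ever write to the same element.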

To set the number of threads, you can use @ref cv::setNumThreads. You can also specify the number of splits using the nstripes parameter in @ref cv::parallel\_for\_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)` or setting `nstripes=2` should have the same effect: by default all available processor threads are used, but with either setting the workload is split across only two threads.

@note The C++11 standard allows simplifying the parallel implementation by getting rid of the `parallelConvolution` class and replacing it with a lambda expression:

@snippet how\_to\_use\_OpenCV\_parallel\_for\_new.cpp convolution-parallel-cxx11

## Results

The execution times of the implementations are shown below for a:

-   _512x512 input_ with a _5x5 kernel_:
    
    ```
      This program shows how to use the OpenCV parallel_for_ function and
      compares the performance of the sequential and parallel implementations for a
      convolution operation
      Usage:
      ./a.out [image_path -- default lena.jpg]
    
      Sequential Implementation: 0.0953564s
      Parallel Implementation: 0.0246762s
      Parallel Implementation(Row Split): 0.0248722s
    ```
    

-   _512x512 input with a 3x3 kernel_
    
    ```
      This program shows how to use the OpenCV parallel_for_ function and
      compares the performance of the sequential and parallel implementations for a
      convolution operation
      Usage:
      ./a.out [image_path -- default lena.jpg]
    
      Sequential Implementation: 0.0301325s
      Parallel Implementation: 0.0117053s
      Parallel Implementation(Row Split): 0.0117894s
    ```
    

The performance of the parallel implementation depends on the type of CPU you have. For instance, on a CPU with 4 cores and 8 threads, the runtime may be 6x to 7x faster than the sequential implementation. Several factors explain why we do not achieve a speed-up of 8x:

-   the overhead to create and manage the threads,
-   background processes running in parallel,
-   the difference between 4 hardware cores with 2 logical threads each and 8 hardware cores.

In the tutorial, we used a horizontal gradient filter (as shown in the animation above), which produces an image highlighting the vertical edges.

## [Mat Operations](https://docharvest.github.io/docs/opencv5/tutorials/core/mat_operations/)

# Operations with images {#tutorial\_mat\_operations}

@tableofcontents

@prev\_tutorial{tutorial\_mat\_mask\_operations} @next\_tutorial{tutorial\_adding\_images}

Compatibility

OpenCV >= 3.0

## Input/Output

### Images

Load an image from a file:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Load an image from a file @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Load an image from a file @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Load an image from a file @end\_toggle

If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Load an image from a file in grayscale @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Load an image from a file in grayscale @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Load an image from a file in grayscale @end\_toggle

@note Format of the file is determined by its content (first few bytes).

To save an image to a file:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Save image @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Save image @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Save image @end\_toggle

@note Format of the file is determined by its extension.

@note Use cv::imdecode and cv::imencode to read and write an image from/to memory rather than a file.

## Basic operations with images

### Accessing pixel intensity values

In order to get a pixel intensity value, you have to know the type of the image and the number of channels. Here is an example for a single-channel grayscale image (type 8UC1) and pixel coordinates x and y:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Pixel access 1 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Pixel access 1 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Pixel access 1 @end\_toggle

C++ version only: intensity.val\[0\] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV images are represented by the same structure as matrices, we use the same convention for both cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or x-coordinate) follows it. Alternatively, you can use the following notation (**C++ only**):

@snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Pixel access 2

Now let us consider a 3 channel image with BGR color ordering (the default format returned by imread):

**C++ code** @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Pixel access 3

**Python code** @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Pixel access 3
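The row-first ordering and the interleaved B, G, R layout both follow from how the pixels sit in memory. As an illustration only, here is a plain C++ sketch of that layout; `Image8U` is a hypothetical type that mimics a continuous 8-bit matrix and is not part of OpenCV.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal 8-bit image with interleaved channels: row after row,
// and the B, G, R values of one pixel next to each other.
struct Image8U {
    int rows, cols, channels;
    std::vector<std::uint8_t> data;

    Image8U(int r, int c, int ch)
        : rows(r), cols(c), channels(ch),
          data(static_cast<std::size_t>(r) * c * ch, 0) {}

    // Row index (y) first, column index (x) second, as with Mat::at.
    std::uint8_t& at(int y, int x, int channel = 0)
    {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        assert(channel >= 0 && channel < channels);
        return data[(static_cast<std::size_t>(y) * cols + x) * channels + channel];
    }
};
```

For a 2x3 BGR image, `img.at(1, 2, 0)` addresses the blue value of the pixel in row 1, column 2, which sits at offset `(1 * 3 + 2) * 3 + 0 = 15` in the buffer. Swapping the two indices would address a different pixel or fall outside the image.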

You can use the same method for floating-point images (for example, you can get such an image by running Sobel on a 3 channel image) (**C++ only**):

@snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Pixel access 4

The same method can be used to change pixel intensities:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Pixel access 5 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Pixel access 5 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Pixel access 5 @end\_toggle

There are functions in OpenCV, especially from the calib3d module, such as cv::projectPoints, that take an array of 2D or 3D points in the form of a Mat. The matrix should contain exactly one column, each row corresponds to a point, and the matrix type should be 32FC2 or 32FC3 respectively. Such a matrix can be easily constructed from `std::vector` (**C++ only**):

@snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Mat from points vector

One can access a point in this matrix using the same method `Mat::at` (**C++ only**):

@snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Point access

### Memory management and reference counting

Mat is a structure that keeps matrix/image characteristics (number of rows and columns, data type, etc.) and a pointer to data. So nothing prevents us from having several instances of Mat corresponding to the same data. A Mat keeps a reference count that tells if data has to be deallocated when a particular instance of Mat is destroyed. Here is an example of creating two matrices without copying data (**C++ only**):

@snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Reference counting 1

As a result, we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. `pointsMat` uses data from `points` and will not deallocate the memory when destroyed. In this particular instance, however, the developer has to make sure that the lifetime of `points` is longer than that of `pointsMat`.

If we need to copy the data, this is done using, for example, cv::Mat::copyTo or cv::Mat::clone:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Reference counting 2 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Reference counting 2 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Reference counting 2 @end\_toggle

An empty output Mat can be supplied to each function. Each implementation calls Mat::create for a destination matrix. This method allocates data for a matrix if it is empty. If it is not empty and has the correct size and type, the method does nothing. If, however, the size or type differs from the input arguments, the data is deallocated (and lost) and new data is allocated. For example:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Reference counting 3 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Reference counting 3 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Reference counting 3 @end\_toggle
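The allocate-only-when-needed rule can be pictured with a small plain C++ sketch. `BufferSketch` is a hypothetical type with a single element type (float), so only the size is compared; unlike `Mat::create`, the `std::vector` used here also zero-initializes the new buffer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical buffer that mimics the reuse rule of a destination matrix.
struct BufferSketch {
    int rows = 0, cols = 0;
    std::vector<float> data;

    // Returns true when a new allocation was made, false when the existing
    // buffer already had the requested shape and was left untouched.
    bool create(int r, int c)
    {
        assert(r >= 0 && c >= 0);
        if (r == rows && c == cols) return false;  // same shape: nothing to do
        rows = r;
        cols = c;
        data = std::vector<float>(static_cast<std::size_t>(r) * c);  // old contents are lost
        return true;
    }
};
```

Calling `create(2, 3)` twice in a row allocates only the first time, so values written in between survive the second call. A later `create(3, 3)` discards them.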

### Primitive operations

There are a number of convenient operators defined on a matrix. For example, here is how we can make a black image from an existing greyscale image `img`:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Set image to black @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Set image to black @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Set image to black @end\_toggle

Selecting a region of interest:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Select ROI @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Select ROI @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Select ROI @end\_toggle

Conversion from color to greyscale:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp BGR to Gray @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java BGR to Gray @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py BGR to Gray @end\_toggle

Change image type from 8UC1 to 32FC1:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp Convert to CV\_32F @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java Convert to CV\_32F @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py Convert to CV\_32F @end\_toggle

### Visualizing images

It is very useful to see intermediate results of your algorithm during the development process. OpenCV provides a convenient way of visualizing images. An 8U image can be shown using:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp imshow 1 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java imshow 1 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py imshow 1 @end\_toggle

A call to waitKey() starts a message passing cycle that waits for a keystroke in the "image" window. A 32F image needs to be converted to 8U type. For example:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_operations/mat\_operations.cpp imshow 2 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_operations/MatOperations.java imshow 2 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_operations/mat\_operations.py imshow 2 @end\_toggle
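One common way to bring floating-point values into the displayable 0 to 255 range is min-max scaling. The plain C++ sketch below illustrates that idea; `toDisplayRange` is a hypothetical helper, and its rounding may differ from OpenCV's own conversion in half-way cases.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Linearly maps [min, max] of the input to [0, 255] and converts to 8 bits.
std::vector<std::uint8_t> toDisplayRange(const std::vector<float>& src)
{
    assert(!src.empty());
    const auto mm = std::minmax_element(src.begin(), src.end());
    const float minVal = *mm.first, maxVal = *mm.second;
    // A constant image has no range to stretch; map it to black.
    const float scale = (maxVal > minVal) ? 255.f / (maxVal - minVal) : 0.f;

    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float v = std::round((src[i] - minVal) * scale);
        // Clamp before the narrowing cast so the result always fits in 8 bits.
        dst[i] = static_cast<std::uint8_t>(std::min(255.f, std::max(0.f, v)));
    }
    return dst;
}
```

For the input `{0, 1, 4}` the scale factor is 255 / 4 = 63.75, so the three values map to 0, 64 and 255.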

@note Here cv::namedWindow is not necessary since it is immediately followed by cv::imshow. Nevertheless, it can be used to change the window properties or when using cv::createTrackbar.

## [Mat The Basic Image Container](https://docharvest.github.io/docs/opencv5/tutorials/core/mat_the_basic_image_container/mat_the_basic_image_container/)

# Mat - The Basic Image Container {#tutorial\_mat\_the\_basic\_image\_container}

@tableofcontents

@next\_tutorial{tutorial\_how\_to\_scan\_images}

Original author

Bernát Gábor

Compatibility

OpenCV >= 3.0

## Goal

We have multiple ways to acquire digital images from the real world: digital cameras, scanners, computed tomography, and magnetic resonance imaging to name a few. In every case what we (humans) see are images. However, when transforming this to our digital devices what we record are numerical values for each of the points of the image.

For example, in the above image you can see that the mirror of the car is nothing more than a matrix containing all the intensity values of the pixel points. How we get and store the pixel values may vary according to our needs, but in the end all images inside a computer may be reduced to numerical matrices and other information describing the matrix itself. _OpenCV_ is a computer vision library whose main focus is to process and manipulate this information. Therefore, the first thing you need to be familiar with is how OpenCV stores and handles images.

## Mat

OpenCV has been around since 2001. In those days the library was built around a _C_ interface and to store the image in the memory they used a C structure called _IplImage_. This is the one you'll see in most of the older tutorials and educational materials. The problem with this is that it brings to the table all the minuses of the C language. The biggest issue is the manual memory management. It builds on the assumption that the user is responsible for taking care of memory allocation and deallocation. While this is not a problem with smaller programs, once your code base grows it will be more of a struggle to handle all this rather than focusing on solving your development goal.

Luckily C++ came around and introduced the concept of classes, making things easier for the user through automatic memory management (more or less). The good news is that C++ is fully compatible with C so no compatibility issues can arise from making the change. Therefore, OpenCV 2.0 introduced a new C++ interface which offered a new way of doing things which means you do not need to fiddle with memory management, making your code concise (less to write, to achieve more). The main downside of the C++ interface is that many embedded development systems at the moment support only C. Therefore, unless you are targeting embedded platforms, there's no point in using the _old_ methods (unless you're a masochist programmer and you're asking for trouble).

The first thing you need to know about _Mat_ is that you no longer need to manually allocate its memory and release it as soon as you do not need it. While doing this is still a possibility, most of the OpenCV functions will allocate their output data automatically. As a nice bonus, if you pass on an already existing _Mat_ object, which has already allocated the required space for the matrix, this will be reused. In other words, we use at all times only as much memory as we need to perform the task.

_Mat_ is basically a class with two data parts: the matrix header (containing information such as the size of the matrix, the method used for storing, the address at which the matrix is stored, and so on) and a pointer to the matrix containing the pixel values (taking any dimensionality depending on the method chosen for storing). The matrix header size is constant; however, the size of the matrix itself may vary from image to image and is usually larger by orders of magnitude.

OpenCV is an image processing library. It contains a large collection of image processing functions. To solve a computational challenge, most of the time you will end up using multiple functions of the library. Because of this, passing images to functions is a common practice. We should not forget that we are talking about image processing algorithms, which tend to be quite computationally heavy. The last thing we want to do is further decrease the speed of your program by making unnecessary copies of potentially _large_ images.

To tackle this issue OpenCV uses a reference counting system. The idea is that each _Mat_ object has its own header, however a matrix may be shared between two _Mat_ objects by having their matrix pointers point to the same address. Moreover, the copy operators **will only copy the headers** and the pointer to the large matrix, not the data itself.

@code{.cpp}
Mat A, C;                                 // creates just the header parts
A = imread(argv\[1\], IMREAD\_COLOR);       // here we'll know the method used (allocate matrix)

Mat B(A);                                 // Use the copy constructor

C = A;                                    // Assignment operator
@endcode

All the above objects, in the end, point to the same single data matrix and making a modification using any of them will affect all the other ones as well. In practice the different objects just provide different access methods to the same underlying data. Nevertheless, their header parts are different. The real interesting part is that you can create headers which refer to only a subsection of the full data. For example, to create a region of interest (_ROI_) in an image you just create a new header with the new boundaries:

@code{.cpp}
Mat D (A, Rect(10, 10, 100, 100) ); // using a rectangle
Mat E = A(Range::all(), Range(1,3)); // using row and column boundaries
@endcode

Now you may ask -- if the matrix itself may belong to multiple _Mat_ objects, who takes responsibility for cleaning it up when it's no longer needed? The short answer is: the last object that used it. This is handled by using a reference counting mechanism. Whenever somebody copies a header of a _Mat_ object, a counter is increased for the matrix. Whenever a header is cleaned, this counter is decreased. When the counter reaches zero the matrix is freed. Sometimes you will want to copy the matrix itself too, so OpenCV provides @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.

@code{.cpp}
Mat F = A.clone();
Mat G;
A.copyTo(G);
@endcode

Now modifying _F_ or _G_ will not affect the matrix pointed to by the _A_'s header. What you need to remember from all this is that:

-   Output image allocation for OpenCV functions is automatic (unless specified otherwise).
-   You do not need to think about memory management with OpenCV's C++ interface.
-   The assignment operator and the copy constructor only copy the header.
-   The underlying matrix of an image may be copied using the @ref cv::Mat::clone() and @ref cv::Mat::copyTo() functions.
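The header-versus-data split summarized above can be imitated in a few lines of plain C++. In this sketch `MatSketch` is a hypothetical type and `std::shared_ptr` plays the role of the reference counter; it only illustrates the sharing semantics and says nothing about how `cv::Mat` is actually implemented.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical header-plus-shared-buffer type. Copying a MatSketch copies the
// small header (rows, cols, pointer) and shares the pixel buffer.
struct MatSketch {
    int rows = 0, cols = 0;
    std::shared_ptr<std::vector<float>> data;

    MatSketch(int r, int c, float v)
        : rows(r), cols(c),
          data(std::make_shared<std::vector<float>>(static_cast<std::size_t>(r) * c, v)) {}

    // Deep copy: allocates a new buffer, in the spirit of Mat::clone.
    MatSketch clone() const
    {
        MatSketch copy(rows, cols, 0.f);
        *copy.data = *data;
        return copy;
    }

    float& at(int r, int c) { return (*data)[static_cast<std::size_t>(r) * cols + c]; }

    // Number of headers currently sharing the buffer.
    long refCount() const { return data.use_count(); }
};
```

After `MatSketch B(A);` both headers share one buffer, so writing through `B` is visible through `A` and `refCount()` reports 2. After `MatSketch F = A.clone();` the new object has its own buffer with a count of 1, which is released when `F` goes out of scope.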

## Storing methods

This is about how you store the pixel values. You can select the color space and the data type used. The color space refers to how we combine color components in order to code a given color. The simplest one is the grayscale where the colors at our disposal are black and white. The combination of these allows us to create many shades of gray.

For _colorful_ ways we have a lot more methods to choose from. Each of them breaks it down to three or four basic components and we can use the combination of these to create the others. The most popular one is RGB, mainly because this is also how our eye builds up colors. Its base colors are red, green and blue. To code the transparency of a color sometimes a fourth element, alpha (A), is added.

There are, however, many other color systems, each with their own advantages:

-   RGB is the most common as our eyes use something similar; however, keep in mind that the OpenCV standard display system composes colors using the BGR color space (the red and blue channels are swapped).
-   HSV and HLS decompose colors into their hue, saturation and value/luminance components, which is a more natural way for us to describe colors. You might, for example, dismiss the last component, making your algorithm less sensitive to the light conditions of the input image.
-   YCrCb is used by the popular JPEG image format.
-   CIE L\*a\*b\* is a perceptually uniform color space, which comes in handy if you need to measure the _distance_ of a given color to another color.

Each of the building components has its own valid domain. This leads to the data type used. How we store a component defines the control we have over its domain. The smallest data type possible is _char_, which means one byte or 8 bits. This may be unsigned (so can store values from 0 to 255) or signed (values from -128 to +127). Although this width, in the case of three components (like RGB), already gives 16 million possible colors to represent, we may acquire an even finer control by using the float (4 byte = 32 bit) or double (8 byte = 64 bit) data types for each component. Nevertheless, remember that increasing the size of a component also increases the size of the whole picture in memory.
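These ranges can be checked directly in C++. The sketch below is plain C++ with no OpenCV; `representableColors` is a hypothetical helper, and the sizes asserted for float and double hold on mainstream platforms but are not guaranteed by the language standard.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of distinct colors when each of `channels` components has `bits` bits.
std::uint64_t representableColors(int bits, int channels)
{
    assert(bits > 0 && channels > 0 && bits * channels < 64);
    return std::uint64_t{1} << (bits * channels);
}

// Ranges of the 8-bit integer types.
static_assert(std::numeric_limits<std::uint8_t>::min() == 0, "unsigned 8-bit starts at 0");
static_assert(std::numeric_limits<std::uint8_t>::max() == 255, "unsigned 8-bit ends at 255");
static_assert(std::numeric_limits<std::int8_t>::min() == -128, "signed 8-bit starts at -128");
static_assert(std::numeric_limits<std::int8_t>::max() == 127, "signed 8-bit ends at 127");

// Typical, though not guaranteed, sizes of the floating-point types.
static_assert(sizeof(float) == 4 && sizeof(double) == 8, "assumed platform sizes");
```

Three 8-bit components give `representableColors(8, 3)`, that is 2^24 = 16,777,216 colors, which is the "16 million" figure quoted above.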

## Creating a Mat object explicitly

In the @ref tutorial\_display\_image tutorial you have already learned how to write a matrix to an image file by using the @ref cv::imwrite() function. However, for debugging purposes it's much more convenient to see the actual values. You can do this using the << operator of _Mat_. Be aware that this only works for two dimensional matrices.

Although _Mat_ works really well as an image container, it is also a general matrix class. Therefore, it is possible to create and manipulate multidimensional matrices. You can create a Mat object in multiple ways:

-   @ref cv::Mat::Mat Constructor
    
    @snippet mat\_the\_basic\_image\_container.cpp constructor
    
    For two dimensional and multichannel images we first define their size: row and column count wise.
    
    Then we need to specify the data type to use for storing the elements and the number of channels per matrix point. To do this we have multiple definitions constructed according to the following convention:

    @code
    CV\_\[The number of bits per item\]\[Signed or Unsigned\]\[Type Prefix\]C\[The channel number\]
    @endcode

    For instance, _CV\_8UC3_ means we use unsigned char types that are 8 bit long and each pixel has three of these to form the three channels. There are types predefined for up to four channels. The @ref cv::Scalar is a four-element short vector. Specify it and you can initialize all matrix points with a custom value. If you need more channels, you can create the type with the macro above, setting the channel number in parentheses as you can see below.
    
-   Use C/C++ arrays and initialize via constructor
    
    @snippet mat\_the\_basic\_image\_container.cpp init
    
    The example above shows how to create a matrix with more than two dimensions. Specify its dimension, then pass a pointer containing the size for each dimension and the rest remains the same.
    
-   @ref cv::Mat::create function:
    
    @snippet mat\_the\_basic\_image\_container.cpp create
    
    You cannot initialize the matrix values with this construction. It will only reallocate its matrix data memory if the new size will not fit into the old one.
    
-   MATLAB style initializer: @ref cv::Mat::zeros , @ref cv::Mat::ones , @ref cv::Mat::eye . Specify size and data type to use:
    
    @snippet mat\_the\_basic\_image\_container.cpp matlab
    
-   For small matrices you may use initializer lists:
    
    @snippet mat\_the\_basic\_image\_container.cpp list
    
-   Create a new header for an existing _Mat_ object and @ref cv::Mat::clone or @ref cv::Mat::copyTo it.
    
    @snippet mat\_the\_basic\_image\_container.cpp clone
    
    @note You can fill out a matrix with random values using the @ref cv::randu() function. You need to give a lower and upper limit for the random values: @snippet mat\_the\_basic\_image\_container.cpp random
    

## Output formatting

In the above examples you could see the default formatting option. OpenCV, however, allows you to format your matrix output:

-   Default @snippet mat\_the\_basic\_image\_container.cpp out-default
    
-   Python @snippet mat\_the\_basic\_image\_container.cpp out-python
    
-   Comma separated values (CSV) @snippet mat\_the\_basic\_image\_container.cpp out-csv
    
-   Numpy @snippet mat\_the\_basic\_image\_container.cpp out-numpy
    
-   C @snippet mat\_the\_basic\_image\_container.cpp out-c
    

## Output of other common items

OpenCV offers support for output of other common OpenCV data structures too via the << operator:

-   2D Point @snippet mat\_the\_basic\_image\_container.cpp out-point2
    
-   3D Point @snippet mat\_the\_basic\_image\_container.cpp out-point3
    
-   std::vector via cv::Mat @snippet mat\_the\_basic\_image\_container.cpp out-vector
    
-   std::vector of points @snippet mat\_the\_basic\_image\_container.cpp out-vector-points
    

Most of the samples here have been included in a small console application. You can download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/core/mat_the_basic_image_container/mat_the_basic_image_container.cpp) or in the core section of the cpp samples.

You can also find a quick video demonstration of this on [YouTube](https://www.youtube.com/watch?v=1tibU7vGWpk).

@youtube{1tibU7vGWpk}

## [Mat Mask Operations](https://docharvest.github.io/docs/opencv5/tutorials/core/mat-mask-operations/mat_mask_operations/)

# Mask operations on matrices {#tutorial\_mat\_mask\_operations}

@tableofcontents

@prev\_tutorial{tutorial\_how\_to\_scan\_images} @next\_tutorial{tutorial\_mat\_operations}

Original author

Bernát Gábor

Compatibility

OpenCV >= 3.0

Mask operations on matrices are quite simple. The idea is that we recalculate each pixel's value in an image according to a mask matrix (also known as kernel). This mask holds values that will adjust how much influence neighboring pixels (and the current pixel) have on the new pixel value. From a mathematical point of view we make a weighted average, with our specified values.

## Our test case

Let's consider the issue of an image contrast enhancement method. Basically, we want to apply the following formula to every pixel of the image:

\\f\[I(i,j) = 5\*I(i,j) - \[ I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)\]\\f\]\\f\[\\iff I(i,j)\*M, \\text{where } M = \\bordermatrix{ \_i\\backslash ^j & -1 & 0 & +1 \\cr -1 & 0 & -1 & 0 \\cr 0 & -1 & 5 & -1 \\cr +1 & 0 & -1 & 0 \\cr }\\f\]

The first notation uses a formula, while the second is a compacted version of the first that uses a mask. You use the mask by putting the center of the mask matrix (in the case above, denoted by the zero-zero index) on the pixel you want to calculate and summing up the pixel values multiplied by the overlapped matrix values. It's the same thing; however, in the case of large matrices the latter notation is a lot easier to look over.
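As a worked instance of the formula, the plain C++ sketch below applies the mask to a small row-major grayscale buffer. `sharpen` and `clampToByte` are hypothetical names; clamping the result to 0..255 and leaving the border pixels at 0 are assumptions of this sketch, since the formula itself says nothing about either.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Clamps an intermediate result to the 8-bit range.
inline std::uint8_t clampToByte(int v)
{
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Applies I'(i,j) = 5*I(i,j) - [I(i-1,j) + I(i+1,j) + I(i,j-1) + I(i,j+1)]
// to every interior pixel of a row-major grayscale image.
std::vector<std::uint8_t> sharpen(const std::vector<std::uint8_t>& src, int rows, int cols)
{
    assert(static_cast<std::size_t>(rows) * cols == src.size());
    std::vector<std::uint8_t> dst(src.size(), 0);  // border pixels stay 0
    for (int i = 1; i < rows - 1; ++i)
        for (int j = 1; j < cols - 1; ++j) {
            const int idx = i * cols + j;
            // The arithmetic is done in int, so it can leave the 8-bit range.
            const int v = 5 * src[idx]
                        - src[idx - cols] - src[idx + cols]
                        - src[idx - 1] - src[idx + 1];
            dst[idx] = clampToByte(v);
        }
    return dst;
}
```

On a flat region the mask changes nothing: with every pixel at 50, the centre becomes 5 * 50 - 4 * 50 = 50. A bright pixel of 100 surrounded by 10s evaluates to 500 - 40 = 460, which is clamped to 255, so local contrast is amplified.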

## Code

@add\_toggle\_cpp You can download this source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp) or look in the OpenCV source code libraries sample directory at `samples/cpp/tutorial_code/core/mat_mask_operations/mat_mask_operations.cpp`. @include samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp @end\_toggle

@add\_toggle\_java You can download this source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java) or look in the OpenCV source code libraries sample directory at `samples/java/tutorial_code/core/mat_mask_operations/MatMaskOperations.java`. @include samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java @end\_toggle

@add\_toggle\_python You can download this source code from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py) or look in the OpenCV source code libraries sample directory at `samples/python/tutorial_code/core/mat_mask_operations/mat_mask_operations.py`. @include samples/python/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.py @end\_toggle

## The Basic Method

Now let us see how we can make this happen by using the basic pixel access method or by using the **filter2D()** function.

Here's a function that will do this: @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp basic\_method

First we make sure that the input image data is in unsigned char format. For this we use the @ref CV\_Assert macro, which throws an error when the expression inside it is false. @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp 8\_bit @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java basic\_method

First we make sure that the input image data is in unsigned 8-bit format. @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java 8\_bit @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.py basic\_method

First we make sure that the input image data is in unsigned 8-bit format. @code{.py} my\_image = my\_image.astype(np.uint8) @endcode

@end\_toggle

We create an output image with the same size and the same type as our input. As you can see in the @ref tutorial\_how\_to\_scan\_images\_storing "storing" section, depending on the number of channels we may have one or more subcolumns.

@add\_toggle\_cpp We will iterate through them via pointers so the total number of elements depends on this number. @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp create\_channels @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java create\_channels @end\_toggle

@add\_toggle\_python @code{.py} height, width, n\_channels = my\_image.shape result = np.zeros(my\_image.shape, my\_image.dtype) @endcode @end\_toggle

@add\_toggle\_cpp We'll use the plain C \[\] operator to access pixels. Because we need to access multiple rows at the same time we'll acquire the pointers for each of them (a previous, a current and a next line). We need another pointer to where we're going to save the calculation. Then simply access the right items with the \[\] operator. For moving the output pointer ahead we simply increase this (with one byte) after each operation: @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp basic\_method\_loop

On the borders of the image the notation above refers to nonexistent pixel locations (such as (-1,-1)). At these points our formula is undefined. A simple solution is to not apply the kernel at these points and, for example, set the pixels on the borders to zero:

@snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp borders @end\_toggle

@add\_toggle\_java We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j). Then we apply the sum and put the new value in the Result matrix. @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java basic\_method\_loop

On the borders of the image the notation above refers to nonexistent pixel locations (such as (-1,-1)). At these points our formula is undefined. A simple solution is to not apply the kernel at these points and, for example, set the pixels on the borders to zero:

@snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java borders @end\_toggle

@add\_toggle\_python We need to access multiple rows and columns which can be done by adding or subtracting 1 to the current center (i,j). Then we apply the sum and put the new value in the Result matrix. @snippet samples/python/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.py basic\_method\_loop @end\_toggle
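The loop and the border handling described above can be sketched in plain Python on a list-of-rows image. This is not the tutorial's sample code: `saturate_u8`, `sharpen` and the test `image` are hypothetical names, and clamping to 0..255 is an assumption that stands in for an 8-bit saturating store.

```python
def saturate_u8(value):
    """Clamp a value to the 0..255 range of an unsigned 8-bit pixel."""
    return max(0, min(255, value))


def sharpen(img):
    """Apply the 3x3 sharpening mask to a single-channel image (list of rows)."""
    height, width = len(img), len(img[0])
    # Border pixels stay at zero because the mask would reach outside the image.
    result = [[0] * width for _ in range(height)]
    for i in range(1, height - 1):
        # Keep references to the previous, current and next rows.
        prev_row, cur_row, next_row = img[i - 1], img[i], img[i + 1]
        for j in range(1, width - 1):
            value = 5 * cur_row[j] - (
                cur_row[j - 1] + cur_row[j + 1] + prev_row[j] + next_row[j]
            )
            result[i][j] = saturate_u8(value)
    return result


image = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 50, 10],
    [10, 10, 10, 10],
]
sharpened = sharpen(image)
for row in sharpened:
    print(row)
```

The bright pixel is pushed to the maximum value, its darker neighbors are pushed to zero, and the untouched border stays at zero.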

## The filter2D function

Applying such filters is so common in image processing that OpenCV provides a function that takes care of applying the mask (also called a kernel in some places). To use it, you first need to define an object that holds the mask:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp kern @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java kern @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.py kern @end\_toggle

Then call the **filter2D()** function specifying the input, the output image and the kernel to use:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.cpp filter2D @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/core/mat\_mask\_operations/MatMaskOperations.java filter2D @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/core/mat\_mask\_operations/mat\_mask\_operations.py filter2D @end\_toggle

The function also has an optional fifth argument that specifies the center of the kernel, a sixth that adds an optional value to the filtered pixels before they are stored in the output image, and a seventh that determines what to do in the regions where the operation is undefined (borders).
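To illustrate what the kernel center, the added value and the border mode mean, here is a rough pure-Python sketch. It is not OpenCV's implementation: `filter2d_sketch` and `reflect_101` are hypothetical helpers, the anchor is given as (row, column), results are clamped to 0..255 as an assumption about 8-bit output, and out-of-range pixels are mirrored in the manner of `BORDER_REFLECT_101`.

```python
def reflect_101(p, n):
    """Mirror an out-of-range index without repeating the edge pixel.

    Only valid while the index overshoots by less than n.
    """
    if p < 0:
        return -p
    if p >= n:
        return 2 * (n - 1) - p
    return p


def filter2d_sketch(img, kernel, anchor=None, delta=0):
    """Correlate img with kernel, add delta, and clamp to 0..255."""
    height, width = len(img), len(img[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    if anchor is None:
        # Default: the anchor is the kernel center.
        anchor = (k_rows // 2, k_cols // 2)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for ky in range(k_rows):
                for kx in range(k_cols):
                    sy = reflect_101(y + ky - anchor[0], height)
                    sx = reflect_101(x + kx - anchor[1], width)
                    acc += kernel[ky][kx] * img[sy][sx]
            out[y][x] = max(0, min(255, acc + delta))
    return out


kernel = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filtered = filter2d_sketch(image, kernel)
with_delta = filter2d_sketch(image, kernel, delta=10)
print(filtered[1][1], with_delta[1][1])
```

Unlike the hand-coded method, the border pixels are computed from mirrored neighbors instead of being set to zero.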

This function is shorter, less verbose and, because it is optimized, usually faster than the _hand-coded method_. For example, in my test the **filter2D()** call took only 13 milliseconds, while the hand-coded method took around 31 milliseconds. Quite a difference.

For example:

@add\_toggle\_cpp Check out an instance of running the program on our [YouTube channel](http://www.youtube.com/watch?v=7PF1tAU9se4) . @youtube{7PF1tAU9se4} @end\_toggle

## [Table Of Content Core](https://docharvest.github.io/docs/opencv5/tutorials/core/table_of_content_core/)

# The Core Functionality (core module) {#tutorial\_table\_of\_content\_core}

@tableofcontents

##### Basic

-   @subpage tutorial\_mat\_the\_basic\_image\_container
-   @subpage tutorial\_how\_to\_scan\_images
-   @subpage tutorial\_mat\_mask\_operations
-   @subpage tutorial\_mat\_operations
-   @subpage tutorial\_adding\_images
-   @subpage tutorial\_basic\_linear\_transform

##### Advanced

-   @subpage tutorial\_discrete\_fourier\_transform
-   @subpage tutorial\_file\_input\_output\_with\_xml\_yml
-   @subpage tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_
-   @subpage tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new
-   @subpage tutorial\_univ\_intrin

## [Univ Intrin](https://docharvest.github.io/docs/opencv5/tutorials/core/univ_intrin/univ_intrin/)

# Vectorizing your code using Universal Intrinsics {#tutorial\_univ\_intrin}

@tableofcontents

@prev\_tutorial{tutorial\_how\_to\_use\_OpenCV\_parallel\_for\_new}

|    |    |
| -: | :- |
| Compatibility | OpenCV >= 4.11 |

## Goal

The goal of this tutorial is to provide a guide to using the @ref core\_hal\_intrin feature to vectorize your C++ code for a faster runtime. We'll briefly look into _SIMD intrinsics_ and how to work with wide _registers_, followed by a tutorial on the basic operations using wide registers.

## Theory

In this section, we will briefly look into a few concepts to better help understand the functionality.

### Intrinsics

Intrinsics are functions which are separately handled by the compiler. These functions are often optimized to perform in the most efficient ways possible and hence run faster than normal implementations. However, since these functions depend on the compiler, it makes it difficult to write portable applications.

### SIMD

SIMD stands for **Single Instruction, Multiple Data**. SIMD Intrinsics allow the processor to vectorize calculations. The data is stored in what are known as _registers_. A _register_ may be _128-bits_, _256-bits_ or _512-bits_ wide. Each _register_ stores **multiple values** of the **same data type**. The size of the register and the size of each value determines the number of values stored in total.

Depending on which _Instruction Sets_ your CPU supports, you may be able to use different registers. To learn more, see the article on [instruction set architectures](https://en.wikipedia.org/wiki/Instruction_set_architecture).

### VLA

VLA stands for **Vector Length Agnostic**, a mechanism in which the register width is determined by the hardware at runtime rather than being fixed at compile time. This allows a single binary to scale its performance across different CPUs within the same architecture (e.g., RVV or SVE).

## Universal Intrinsics

OpenCV's universal intrinsics provide an abstraction over SIMD and VLA vectorization methods and allow the user to use intrinsics without the need to write system-specific code. Supported SIMD/VLA technologies are detailed in @ref core\_hal\_intrin.

**We will now introduce the available structures and functions:**

-   Register structures
-   Load and store
-   Mathematical Operations
-   Reduce and Mask

### Register Structures

The Universal Intrinsics set implements every register as a structure based on the particular SIMD register. All types contain the `nlanes` enumeration which gives the exact number of values that the type can hold. This eliminates the need to hardcode the number of values during implementations.

@note Each register structure is under the `cv` namespace.

There are **two types** of registers:

-   **Variable sized registers**: These structures do not have a fixed size and their exact bit length is deduced during compilation, based on the available SIMD capabilities. Consequently, the value of the `nlanes` enum is determined at compile time.
    
      
    Each structure follows the following convention:
    
    ```
      v_[type of value][size of each value in bits]
    ```
    
    For instance, **v\_uint8 holds 8-bit unsigned integers** and **v\_float32 holds 32-bit floating point values**. We then declare a register like we would declare any object in C++.
    
    Based on the available SIMD instruction set, a particular register will hold a different number of values. For example, if your computer supports registers of at most 256 bits,
    
    -   _v\_uint8_ will hold 32 8-bit unsigned integers
        
    -   _v\_float64_ will hold 4 64-bit floats (doubles)
        
        ```
          v_uint8 a;                            // a is a register supporting uint8(char) data
          int n = a.nlanes;                     // n holds 32
        ```
        
    
    Available data types and sizes:
    
    | Type  | Size in bits  |
    | ----- | ------------- |
    | uint  | 8, 16, 32, 64 |
    | int   | 8, 16, 32, 64 |
    | float | 32, 64        |
    
-   **Constant sized registers**: These structures have a fixed bit size and hold a constant number of values. We need to know what SIMD instruction set is supported by the system and select compatible registers. Use these only if exact bit length is necessary.
    
      
    Each structure follows the convention:
    
    ```
      v_[type of value][size of each value in bits]x[number of values]
    ```
    
    Suppose we want to store
    
    -   32-bit (_size in bits_) signed integers in a **128-bit register**. Since the register size is already known, we can find out the _number of data points in the register_ (_128/32 = 4_):
        
        ```
          v_int32x4 reg1;                      // holds 4 32-bit signed integers.
        ```
        
    -   64-bit floats in 512 bit register:
        
        ```
          v_float64x8 reg2;                    // reg2.nlanes = 8
        ```
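The naming convention and the lane arithmetic described above can be sketched as a small calculation. The helper names `lanes` and `fixed_register_name` are hypothetical and exist only for illustration; they are not part of OpenCV.

```python
def lanes(register_bits, value_bits):
    """Number of values that fit in a register of the given width."""
    return register_bits // value_bits


def fixed_register_name(kind, value_bits, register_bits):
    """Build a name following v_[type][size of each value in bits]x[number of values]."""
    return f"v_{kind}{value_bits}x{lanes(register_bits, value_bits)}"


names = [
    fixed_register_name("int", 32, 128),    # 128 / 32 = 4 lanes
    fixed_register_name("float", 64, 512),  # 512 / 64 = 8 lanes
    fixed_register_name("uint", 8, 256),    # 256 / 8 = 32 lanes
]
print(names)
```

The same division explains why a variable sized `v_uint8` holds 32 values when the widest available register is 256 bits.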
        

### Load and Store operations

Now that we know how registers work, let us look at the functions used for filling these registers with values.

-   **Load**: Load functions allow you to _load_ values into a register.
    
    -   _Constructors_ - When declaring a register structure, we can either provide a memory address from where the register will pick up contiguous values, or provide the values explicitly as multiple arguments (explicit multiple arguments are available only for Constant Sized Registers):
        
        ```
          float ptr[32] = {1, 2, 3, ..., 32};  // ptr points to a contiguous memory block of 32 floats
        
          // Variable Sized Registers //
          int x = v_float32().nlanes;          // set x as the number of values the register can hold
        
          v_float32 reg1(ptr);                 // reg1 stores first x values according to the maximum register size available.
          v_float32 reg2(ptr + x);             // reg2 stores the next x values
        
          // Constant Sized Registers //
          v_float32x4 reg1(ptr);               // reg1 stores the first 4 floats (1, 2, 3, 4)
          v_float32x4 reg2(ptr + 4);           // reg2 stores the next 4 floats (5, 6, 7, 8)
        
          // Or we can explicitly write down the values.
          v_float32x4(1, 2, 3, 4);
        ```
        
    
    -   _Load Function_ - We can use the load method and provide the memory address of the data:
        
        ```
          float ptr[32] = {1, 2, 3, ..., 32};
          v_float32 reg_var;
          reg_var = vx_load(ptr);              // loads values from ptr[0] up to ptr[reg_var.nlanes - 1]
        
          v_float32x4 reg_128;
          reg_128 = v_load(ptr);               // loads values from ptr[0] up to ptr[3]
        
          v_float32x8 reg_256;
          reg_256 = v256_load(ptr);            // loads values from ptr[0] up to ptr[7]
        
          v_float32x16 reg_512;
          reg_512 = v512_load(ptr);            // loads values from ptr[0] up to ptr[15]
        ```
        
        @note The load function assumes data is unaligned. If your data is aligned, you may use the `vx_load_aligned()` function.
    
      
-   **Store**: Store functions allow you to _store_ the values from a register into a particular memory location.
    
    -   To store values from a register into a memory location, you may use the _v\_store()_ function:
        
        ```
          float ptr[4];
          v_store(ptr, reg); // store the first 128 bits(interpreted as 4x32-bit floats) of reg into ptr.
        ```
        

@note Ensure **ptr** has the same type as the register. You can also cast the register into the proper type before carrying out operations. Simply typecasting the pointer to a particular type will lead to a wrong interpretation of the data.
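The warning about typecasting can be reproduced outside OpenCV. The sketch below uses Python's `struct` module, with a little-endian layout as an assumption, to read the same 128 bits once as four floats and once as four 32-bit integers, and contrasts that with an actual value conversion.

```python
import struct

values = [1.0, 2.0, 3.0, 4.0]
raw = struct.pack("<4f", *values)       # 16 bytes, i.e. 128 bits of float32 data

as_floats = struct.unpack("<4f", raw)   # bytes read back with the matching type
as_int32 = struct.unpack("<4i", raw)    # same bytes reinterpreted as int32
converted = [int(v) for v in as_floats] # value conversion instead of reinterpretation

print(as_floats)
print(as_int32)
print(converted)
```

Reinterpreting the bytes exposes the IEEE 754 bit patterns instead of the numbers 1 to 4, which is the kind of misinterpretation the note refers to.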

### Binary and Unary Operators

The universal intrinsics set provides element wise binary and unary operations.

@note Since OpenCV 4.11, C++ operator overloading (e.g., `+`, `*`) in Universal Intrinsics has been deprecated in favor of explicit wrapper functions (e.g., v\_add, v\_mul) to ensure compatibility with VLA architectures. See also: [https://github.com/opencv/opencv/issues/27267](https://github.com/opencv/opencv/issues/27267)

-   **Arithmetics**: We can add, subtract, multiply and divide two registers element-wise. The registers must be of the same width and hold the same type. To add and multiply two registers, for example:
    
    ```
      v_float32 a, b;                          // {a1, ..., an}, {b1, ..., bn}
      v_float32 c = v_add(a, b);               // {a1 + b1, ..., an + bn}
      v_float32 d = v_mul(a, b);               // {a1 * b1, ..., an * bn}
    ```
    

-   **Bitwise Logic and Shifts**: We can left shift or right shift the bits of each element of the register. We can also apply bitwise and, or, xor and not operators between two registers element-wise:
    
    ```
      v_int32 as;                              // {a1, ..., an}
      v_int32 al = v_shl(as, 2);               // {a1 << 2, ..., an << 2}
      v_int32 bl = v_shr(as, 2);               // {a1 >> 2, ..., an >> 2}
    
      v_int32 a, b;
      v_int32 a_and_b = v_and(a, b);           // {a1 & b1, ..., an & bn}
    ```
    

-   **Comparison Operators**: We can compare values between two registers using the v\_lt(<), v\_gt(>), v\_le(<=) , v\_ge(>=), v\_eq(==) and v\_ne(!=). Since each register contains multiple values, we don't get a single bool for these operations. Instead, for true values, all bits are converted to one (0xff for 8 bits, 0xffff for 16 bits, etc), while false values return bits converted to zero.
    
    ```
      // let us consider the following code is run in a 128-bit register
      v_uint8 a;                               // a = {0, 1, 2, ..., 13, 14, 15}
      v_uint8 b;                               // b = {15, 14, 13, ..., 2, 1, 0}
    
      v_uint8 c = v_lt(a, b);                  // c = {255, 255, 255, ..., 0, 0, 0}
    
      /*
          let us look at the first 4 values in binary
    
          a = |00000000|00000001|00000010|00000011|
          b = |00001111|00001110|00001101|00001100|
          c = |11111111|11111111|11111111|11111111|
    
          If we store the values of c and print them as integers, we will get 255 for true values and 0 for false values.
      */
    
      // In a computer supporting 256-bit registers
      v_int32 a;                               // a = {1, 2, 3, 4, 5, 6, 7, 8}
      v_int32 b;                               // b = {8, 7, 6, 5, 4, 3, 2, 1}
    
      v_int32 c = v_lt(a, b);                  // c = {-1, -1, -1, -1, 0, 0, 0, 0}
    
      /*
          The true values are 0xffffffff, which in signed 32-bit integer representation is equal to -1.
      */
    ```
    

-   **Min/Max operations**: We can use the _v\_min()_ and _v\_max()_ functions to return registers containing element-wise min, or max, of the two registers:
    
    ```
      v_int32 a;                               // {a1, ..., an}
      v_int32 b;                               // {b1, ..., bn}
    
      v_int32 mn = v_min(a, b);                // {min(a1, b1), ..., min(an, bn)}
      v_int32 mx = v_max(a, b);                // {max(a1, b1), ..., max(an, bn)}
    ```
    

@note Comparison and Min/Max operators are not available for 64 bit integers. Bitwise shift and logic operators are available only for integer values. Bitwise shift is available only for 16, 32 and 64 bit registers.
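The mask values produced by comparisons can be emulated lane by lane. This is a plain-Python illustration, not OpenCV code; `lt_mask` and `to_signed` are hypothetical helpers that reproduce the two examples above.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement integer."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def lt_mask(a, b, bits, signed=False):
    """Per-lane a < b: all bits set for true lanes, all bits clear for false lanes."""
    all_ones = (1 << bits) - 1
    lanes = [all_ones if x < y else 0 for x, y in zip(a, b)]
    return [to_signed(v, bits) for v in lanes] if signed else lanes


# 128-bit register of 8-bit unsigned values: 16 lanes.
a8 = list(range(16))
b8 = a8[::-1]
c8 = lt_mask(a8, b8, 8)

# 256-bit register of 32-bit signed values: 8 lanes.
a32 = [1, 2, 3, 4, 5, 6, 7, 8]
b32 = a32[::-1]
c32 = lt_mask(a32, b32, 32, signed=True)

print(c8)
print(c32)
```

The same all-ones bit pattern prints as 255 for unsigned 8-bit lanes and as -1 for signed 32-bit lanes.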

### Reduce and Mask

-   **Reduce Operations**: The _v\_reduce\_min()_, _v\_reduce\_max()_ and _v\_reduce\_sum()_ return a single value denoting the min, max or sum of the entire register:
    
    ```
      v_int32 a;                                //  a = {a1, ..., an}
      int mn = v_reduce_min(a);                 // mn = min(a1, ..., an)
      int sum = v_reduce_sum(a);                // sum = a1 + ... + an
    ```
    

-   **Mask Operations**: Mask operations allow us to replicate conditionals in wide registers. These include:
    -   _v\_check\_all()_ - Returns a bool, which is true if all the values in the register are less than zero.
        
    -   _v\_check\_any()_ - Returns a bool, which is true if any value in the register is less than zero.
        
    -   _v\_select()_ - Returns a register, which blends two registers, based on a mask.
        
        ```
          v_uint8 a;                           // {a1, ..., an}
          v_uint8 b;                           // {b1, ..., bn}
        
          v_uint8 mask;                        // {0xff, 0, 0, 0xff, ..., 0xff, 0}
        
          v_uint8 Res = v_select(mask, a, b);  // {a1, b2, b3, a4, ..., an-1, bn}
        
          /*
              "Res" will contain the value from "a" if mask is true (all bits set to 1),
              and value from "b" if mask is false (all bits set to 0)
        
              We can use comparison operators to generate mask and v_select to obtain results based on conditionals.
              It is common to set all values of b to 0. Thus, v_select will give values of "a" or 0 based on the mask.
          */
        ```
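As a rough emulation of how a mask replaces a conditional, the sketch below blends two lists of 8-bit lanes with bitwise operations and tests the sign bit for the "all" and "any" checks. The functions `select`, `check_all` and `check_any` are hypothetical stand-ins written in plain Python; they only mimic the behavior described above.

```python
BITS = 8
ALL_ONES = (1 << BITS) - 1


def select(mask, a, b):
    """Take the lane from a where mask is all ones, from b where it is zero."""
    return [(m & x) | (~m & ALL_ONES & y) for m, x, y in zip(mask, a, b)]


def sign_bit_set(value):
    """True if the value would be negative when read as a signed lane."""
    return bool((value >> (BITS - 1)) & 1)


def check_all(reg):
    return all(sign_bit_set(v) for v in reg)


def check_any(reg):
    return any(sign_bit_set(v) for v in reg)


# Scalar intent: result = a if a > 100 else 0
a = [10, 200, 150, 90]
zeros = [0, 0, 0, 0]
mask = [ALL_ONES if x > 100 else 0 for x in a]   # what a "greater than" comparison yields
result = select(mask, a, zeros)

print(mask, result, check_any(mask), check_all(mask))
```

Setting the second operand to zeros gives the common "keep the value or zero it" pattern mentioned in the comment above.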
        

## Demonstration

In the following section, we will vectorize a simple convolution function for single channel and compare the results to a scalar implementation.

@note Not all algorithms are improved by manual vectorization. In fact, in certain cases, the compiler may _autovectorize_ the code, thus producing faster results for scalar implementations.

You may learn more about convolution from the previous tutorial. We use the same naive implementation from the previous tutorial and compare it to the vectorized version.

The full tutorial code is [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/core/univ_intrin/univ_intrin.cpp).

### Vectorizing Convolution

We will first implement a 1-D convolution and then vectorize it. The 2-D vectorized convolution will perform 1-D convolution across the rows to produce the correct results.

#### 1-D Convolution: Scalar

@snippet univ\_intrin.cpp convolution-1D-scalar

1.  We first set up variables and make a border on both sides of the src matrix, to take care of edge cases. @snippet univ\_intrin.cpp convolution-1D-border
    
2.  For the main loop, we select an index _i_ and offset it on both sides along with the kernel, using the k variable. We store the value in _value_ and add it to the _dst_ matrix. @snippet univ\_intrin.cpp convolution-1D-scalar-main
    

#### 1-D Convolution: Vector

We will now look at the vectorized version of 1-D convolution. @snippet univ\_intrin.cpp convolution-1D-vector

1.  In our case, the kernel is a float. Since the kernel's datatype is the largest, we convert src to float32, forming _src\_32_. We also make a border like we did for the naive case. @snippet univ\_intrin.cpp convolution-1D-convert
    
2.  Now, for each column in the _kernel_, we calculate the scalar product of the value with all _window_ vectors of length `step`. We add these values to the values already stored in _ans_. @snippet univ\_intrin.cpp convolution-1D-main
    
    -   We declare pointers to src\_32 and the kernel and run a loop for each kernel element. @snippet univ\_intrin.cpp convolution-1D-main-h1
        
    -   We load a register with the current kernel element. A window is shifted from _0_ to _len - step_ and its product with the kernel\_wide array is added to the values stored in _ans_. We store the values back into _ans_. @snippet univ\_intrin.cpp convolution-1D-main-h2
        
    -   Since the length might not be divisible by _step_, we take care of the remaining values directly. The number of _tail_ values will always be less than _step_ and will not affect the performance significantly. We store all the values to _ans_, which is a float pointer. We can also store them directly in a `Mat` object. @snippet univ\_intrin.cpp convolution-1D-main-h3
        
    -   Here is an iterative example:
        
        ```
          For example:
          kernel: {k1, k2, k3}
          src:           ...|a1|a2|a3|a4|...
        
        
          iter1:
          for each idx i in (0, len), 'step' idx at a time
              kernel_wide:          |k1|k1|k1|k1|
              window:               |a0|a1|a2|a3|
              ans:               ...| 0| 0| 0| 0|...
              sum =  ans + window * kernel_wide
                  =  |a0 * k1|a1 * k1|a2 * k1|a3 * k1|
        
          iter2:
              kernel_wide:          |k2|k2|k2|k2|
              window:               |a1|a2|a3|a4|
              ans:               ...|a0 * k1|a1 * k1|a2 * k1|a3 * k1|...
              sum =  ans + window * kernel_wide
                  =  |a0 * k1 + a1 * k2|a1 * k1 + a2 * k2|a2 * k1 + a3 * k2|a3 * k1 + a4 * k2|
        
          iter3:
              kernel_wide:          |k3|k3|k3|k3|
              window:               |a2|a3|a4|a5|
              ans:               ...|a0 * k1 + a1 * k2|a1 * k1 + a2 * k2|a2 * k1 + a3 * k2|a3 * k1 + a4 * k2|...
              sum =  ans + window * kernel_wide
                  =  |a0*k1 + a1*k2 + a2*k3|a1*k1 + a2*k2 + a3*k3|a2*k1 + a3*k2 + a4*k3|a3*k1 + a4*k2 + a5*k3|
        ```
        

@note The function parameters also include _row_, _rowk_ and _len_. These values are used when the function serves as an intermediate step of 2-D convolution.
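The iteration scheme above can be emulated without SIMD to check that it matches the scalar result. In this sketch, `step` stands in for the number of lanes, and the inner lane loop plays the role of one wide multiply-and-add. The function names and the zero-valued border are assumptions made for illustration, not the tutorial's implementation.

```python
def conv1d_scalar(src, kernel):
    """Reference: zero-padded 1-D convolution, one output element at a time."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(src) + [0.0] * half
    return [
        sum(kernel[k] * padded[i + k] for k in range(len(kernel)))
        for i in range(len(src))
    ]


def conv1d_chunked(src, kernel, step=4):
    """Same result, accumulated one kernel element at a time over windows of `step` values."""
    n = len(src)
    half = len(kernel) // 2
    padded = [0.0] * half + list(src) + [0.0] * half
    ans = [0.0] * n
    for k, weight in enumerate(kernel):
        kernel_wide = [weight] * step             # broadcast one kernel element to all lanes
        i = 0
        while i + step <= n:
            window = padded[i + k:i + k + step]   # "load" step contiguous values
            for lane in range(step):              # ans += window * kernel_wide, lane by lane
                ans[i + lane] += window[lane] * kernel_wide[lane]
            i += step
        while i < n:                              # tail: fewer than `step` values remain
            ans[i] += padded[i + k] * weight
            i += 1
    return ans


src = [float(v) for v in range(1, 11)]
kernel = [-1.0, 0.0, 1.0]                         # horizontal gradient
scalar_result = conv1d_scalar(src, kernel)
chunked_result = conv1d_chunked(src, kernel, step=4)
print(chunked_result)
```

With 10 input values and `step=4`, two full windows are processed per kernel element and the last two values go through the tail loop.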

#### 2-D Convolution

Suppose our kernel has _ksize_ rows. To compute the values for a particular row, we compute the 1-D convolution of the previous _ksize/2_ rows, the row itself and the next _ksize/2_ rows, each with the corresponding kernel row. The final value is simply the sum of the individual 1-D convolutions. @snippet univ\_intrin.cpp convolution-2D

1.  We first initialize variables and make a border above and below the _src_ matrix. The left and right sides are handled by the 1-D convolution function. @snippet univ\_intrin.cpp convolution-2D-init
    
2.  For each row, we calculate the 1-D convolution of the rows above and below it. We then add the values to the _dst_ matrix. @snippet univ\_intrin.cpp convolution-2D-main
    
3.  Finally, we convert the _dst_ matrix to an _8-bit_ `unsigned char` matrix. @snippet univ\_intrin.cpp convolution-2D-conv
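The decomposition of a 2-D convolution into row-wise 1-D convolutions can be checked with a small pure-Python sketch. All names here are hypothetical, the border is assumed to be zero-valued, and the kernel is assumed to be square; a direct 2-D loop serves as the reference.

```python
def conv1d(row, kernel_row):
    """Zero-padded 1-D convolution of one image row with one kernel row."""
    half = len(kernel_row) // 2
    padded = [0.0] * half + list(row) + [0.0] * half
    return [
        sum(kernel_row[k] * padded[j + k] for k in range(len(kernel_row)))
        for j in range(len(row))
    ]


def conv2d_rows(src, kernel):
    """Sum of 1-D convolutions of neighboring rows with the matching kernel rows."""
    ksize, half, width = len(kernel), len(kernel) // 2, len(src[0])
    zero_row = [0.0] * width
    # Border above and below; left and right are handled inside conv1d.
    padded = [zero_row] * half + [list(r) for r in src] + [zero_row] * half
    dst = []
    for i in range(len(src)):
        acc = [0.0] * width
        for k in range(ksize):
            partial = conv1d(padded[i + k], kernel[k])
            acc = [a + p for a, p in zip(acc, partial)]
        dst.append(acc)
    return dst


def conv2d_direct(src, kernel):
    """Reference implementation with an explicit 2-D neighborhood."""
    h, w, ksize = len(src), len(src[0]), len(kernel)
    half = ksize // 2

    def at(i, j):
        return src[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    return [
        [
            sum(kernel[a][b] * at(i + a - half, j + b - half)
                for a in range(ksize) for b in range(ksize))
            for j in range(w)
        ]
        for i in range(h)
    ]


src = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
kernel = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
dst = conv2d_rows(src, kernel)
# Final step: convert to the 8-bit unsigned range.
dst_u8 = [[max(0, min(255, int(round(v)))) for v in row] for row in dst]
print(dst_u8)
```

Because each row contribution is an independent 1-D convolution, any faster 1-D routine can be substituted for `conv1d` without changing the outer loop.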
    

## Results

In the tutorial, we used a horizontal gradient kernel. We obtain the same output image for both methods.

Improvement in runtime varies and will depend on the SIMD capabilities available in your CPU.

## [Dnn Android](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_android/dnn_android/)

The page was moved to @ref tutorial\_android\_dnn\_intro

## [Custom deep learning layers support {#tutorial_dnn_custom_layers}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_custom_layers/dnn_custom_layers/)

@tableofcontents

@prev\_tutorial{tutorial\_dnn\_javascript} @next\_tutorial{tutorial\_dnn\_OCR}

|    |    |
| -: | :- |
| Original author | Dmitry Kurtaev |
| Compatibility | OpenCV >= 3.4.1 |

## Introduction

Deep learning is a fast-growing area. New approaches to building neural networks usually introduce new types of layers. These could be modifications of existing ones or implementation of outstanding research ideas.

OpenCV allows importing and running networks from different deep learning frameworks, and the most popular layers are already supported. However, you may find that your network cannot be imported with OpenCV because some of its layers are not implemented in OpenCV's deep learning engine.

The first solution is to create a feature request at [https://github.com/opencv/opencv/issues](https://github.com/opencv/opencv/issues) mentioning details such as a source of a model and a type of new layer. The new layer could be implemented if the OpenCV community shares this need.

The second way is to define a **custom layer** so that OpenCV's deep learning engine knows how to use it. This tutorial shows how to customize the import of a deep learning model.

## Define a custom layer in C++

A deep learning layer is a building block of a network's pipeline. It has connections to **input blobs** and produces results to **output blobs**. There are trained **weights** and **hyper-parameters**. Layers' names, types, weights and hyper-parameters are stored in files generated by native frameworks during training. If OpenCV encounters an unknown layer type, it throws an exception while trying to read the model:

```
Unspecified error: Can't create layer "layer_name" of type "MyType" in function getLayerInstance
```

To import the model correctly you have to derive a class from cv::dnn::Layer with the following methods:

@snippet dnn/custom\_layers.hpp A custom layer interface

And register it before the import:

@snippet dnn/custom\_layers.hpp Register a custom layer

@note `MyType` is a type of unimplemented layer from the thrown exception.

Let's see what all the methods do:

-   Constructor

@snippet dnn/custom\_layers.hpp MyLayer::MyLayer

Retrieves hyper-parameters from cv::dnn::LayerParams. If your layer has trainable weights they will be already stored in the Layer's member cv::dnn::Layer::blobs.

-   A static method `create`

@snippet dnn/custom\_layers.hpp MyLayer::create

This method should create an instance of your layer and return a cv::Ptr to it.

-   Output blobs' shape computation

@snippet dnn/custom\_layers.hpp MyLayer::getMemoryShapes

Returns the layer's output shapes depending on the input shapes. You may request extra memory using `internals`.

-   Run a layer

@snippet dnn/custom\_layers.hpp MyLayer::forward

Implement a layer's logic here. Compute outputs for given inputs.

@note OpenCV manages memory allocated for layers. In most cases the same memory can be reused between layers, so your `forward` implementation should not rely on `outputs` and `internals` holding the same data on a second invocation of `forward`.

-   Optional `finalize` method

@snippet dnn/custom\_layers.hpp MyLayer::finalize

The chain of methods is the following: the OpenCV deep learning engine calls the `create` method once, then it calls `getMemoryShapes` for every created layer, and then you can make preparations that depend on the known input dimensions in cv::dnn::Layer::finalize. After the network is initialized, only the `forward` method is called for every network input.

@note Varying input blob sizes such as height, width or batch size makes OpenCV reallocate all the internal memory, which reduces efficiency. Try to initialize and deploy models using a fixed batch size and fixed image dimensions.

## Example: custom layer from TensorFlow

This is an example of how to import a network with [tf.image.resize\_bilinear](https://www.tensorflow.org/versions/master/api_docs/python/tf/image/resize_bilinear) operation. This is also a resize but with an implementation different from OpenCV's built-in resize.

Let's create a single layer network:

```
inp = tf.placeholder(tf.float32, [2, 3, 4, 5], 'input')
resized = tf.image.resize_bilinear(inp, size=[9, 8], name='resize_bilinear')
```

OpenCV sees this TensorFlow graph as follows:

```
node {
  name: "input"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "resize_bilinear/size"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_INT32
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_INT32
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\t\000\000\000\010\000\000\000"
      }
    }
  }
}
node {
  name: "resize_bilinear"
  op: "ResizeBilinear"
  input: "input:0"
  input: "resize_bilinear/size"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "align_corners"
    value {
      b: false
    }
  }
}
library {
}
```

Custom layer import from TensorFlow is designed to put all of a layer's `attr` entries into cv::dnn::LayerParams and its input `Const` blobs into cv::dnn::Layer::blobs. In our case the resize's output shape will be stored in the layer's `blobs[0]`.
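To see which values end up in `blobs[0]`, the `tensor_content` bytes of the `Const` node above can be decoded by hand. This sketch assumes the two `DT_INT32` values are stored in little-endian order; it uses only Python's `struct` module and is independent of OpenCV.

```python
import struct

# tensor_content of the "resize_bilinear/size" node, written with the same octal escapes.
tensor_content = b"\t\000\000\000\010\000\000\000"

# Two 32-bit signed integers, assumed little-endian.
out_height, out_width = struct.unpack("<2i", tensor_content)
print(out_height, out_width)
```

The decoded pair matches the `size=[9, 8]` argument passed to `tf.image.resize_bilinear` when the network was created.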

@snippet dnn/custom\_layers.hpp ResizeBilinearLayer

Next we register a layer and try to import the model.

@snippet dnn/custom\_layers.hpp Register ResizeBilinearLayer

## Example: custom layer from ONNX

ONNX groups operators into **domains**. The standard operators live in the default domain `ai.onnx`; vendors and exporters often place their own ops in a named domain such as `my.namespace`. When OpenCV imports an ONNX node, it looks the op up in cv::dnn::LayerFactory by:

-   the op\_type alone, for nodes in the default `ai.onnx` domain (or no domain), and
-   `"<domain>.<op_type>"`, for nodes in any non-default domain.

Node attributes are passed through to the layer constructor as cv::dnn::LayerParams entries with the same names. Consider an op `MyCustomOp` with attributes `scale` and `bias` that computes `y = scale * x + bias`. The implementation can look like:

@snippet dnn/custom\_layer\_onnx.cpp CustomScaleBiasLayer

To import a model that uses this op, register the layer **before** calling cv::dnn::readNetFromONNX. Use cv::dnn::LayerFactory::registerLayer for runtime registration (and cv::dnn::LayerFactory::unregisterLayer when done) — pick the right key for the domain of the op as described above:

@snippet dnn/custom\_layer\_onnx.cpp Register CustomScaleBiasLayer

A complete runnable example is available at [samples/dnn/custom\_layer\_onnx.cpp](https://github.com/opencv/opencv/tree/5.x/samples/dnn/custom_layer_onnx.cpp). Tiny ONNX models exercising both the default-domain and custom-domain registration paths can be generated with [generate\_custom\_layer\_models.py](https://github.com/opencv/opencv_extra/tree/5.x/testdata/dnn/onnx/generate_custom_layer_models.py) in the opencv\_extra repository.

## Define a custom layer in Python

The following example shows how to customize OpenCV's layers in Python.

Let's consider the [Holistically-Nested Edge Detection](https://arxiv.org/abs/1504.06375) model. Its `Crop` layers receive two input blobs and crop the first one to match the spatial dimensions of the second. OpenCV's built-in `Crop` layer trims from the top-left corner, whereas this model expects cropping from the center, so using the built-in behaviour directly would produce shifted results with filled borders.

Next we're going to replace OpenCV's `Crop` layer, which crops from the top-left corner, with one that crops from the center.
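To make the difference between the two cropping modes concrete, here is a minimal sketch in plain Python that works on a single 2D plane stored as nested lists. The helper names `center_crop_bounds` and `crop_2d` are hypothetical and are not part of the sample.

```python
def center_crop_bounds(input_hw, target_hw):
    """Return (y_start, y_end, x_start, x_end) of a centered crop window."""
    in_h, in_w = input_hw
    out_h, out_w = target_hw
    y_start = (in_h - out_h) // 2
    x_start = (in_w - out_w) // 2
    return y_start, y_start + out_h, x_start, x_start + out_w


def crop_2d(plane, bounds):
    """Crop a 2D plane (list of rows) to the given bounds."""
    y_start, y_end, x_start, x_end = bounds
    return [row[x_start:x_end] for row in plane[y_start:y_end]]


# A 6x6 plane whose values encode their position: value = 10 * row + column.
plane = [[10 * r + c for c in range(6)] for r in range(6)]

top_left = crop_2d(plane, (0, 4, 0, 4))
centered = crop_2d(plane, center_crop_bounds((6, 6), (4, 4)))

print(top_left[0])  # first row of the top-left 4x4 crop
print(centered[0])  # first row of the centered 4x4 crop
```

The centered crop starts one row and one column further in than the top-left crop, which is the kind of shift described above.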

-   Create a class with `getMemoryShapes` and `forward` methods

@snippet dnn/custom\_layer.py CropLayer

@note Both methods should return lists.

-   Register a new layer.

@snippet dnn/custom\_layer.py Register

That's it! We have replaced OpenCV's built-in layer with a custom one. You can find the full script in the [source code](https://github.com/opencv/opencv/tree/5.x/samples/dnn/edge_detection.py).

!\[\](js\_tutorials/js\_assets/lena.jpg)

!\[\](images/lena\_hed.jpg)

## [DNN-based Face Detection And Recognition {#tutorial_dnn_face}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_face/dnn_face/)


@tableofcontents

@prev\_tutorial{tutorial\_dnn\_text\_spotting} @next\_tutorial{pytorch\_cls\_tutorial\_dnn\_conversion}

Original Author

Chengrui Wang, Yuantao Feng

Compatibility

OpenCV >= 4.5.4

## Introduction

In this section, we introduce cv::FaceDetectorYN class for face detection and cv::FaceRecognizerSF class for face recognition.

## Models

This module requires two pre-trained models (in ONNX format):

-   [Face Detection](https://github.com/opencv/opencv_zoo/tree/master/models/face_detection_yunet):
    
    -   Size: 338KB
    -   Results on WIDER Face Val set: 0.830(easy), 0.824(medium), 0.708(hard)
-   [Face Recognition](https://github.com/opencv/opencv_zoo/tree/master/models/face_recognition_sface)
    
    -   Size: 36.9MB
    -   Results:
    
    | Database | Accuracy | Threshold (normL2) | Threshold (cosine) |
    | --- | --- | --- | --- |
    | LFW | 99.60% | 1.128 | 0.363 |
    | CALFW | 93.95% | 1.149 | 0.340 |
    | CPLFW | 91.05% | 1.204 | 0.275 |
    | AgeDB-30 | 94.90% | 1.202 | 0.277 |
    | CFP-FP | 94.80% | 1.253 | 0.212 |
    

## Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/dnn/face_detect.cpp)
    
-   **Code at glance:** @include samples/dnn/face\_detect.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/dnn/face_detect.py)
    
-   **Code at glance:** @include samples/dnn/face\_detect.py @end\_toggle
    

## Explanation

@add\_toggle\_cpp @snippet dnn/face\_detect.cpp initialize\_FaceDetectorYN @snippet dnn/face\_detect.cpp inference @end\_toggle

@add\_toggle\_python @snippet dnn/face\_detect.py initialize\_FaceDetectorYN @snippet dnn/face\_detect.py inference @end\_toggle

The detection output `faces` is a two-dimensional array of type CV\_32F. Each row is a detected face instance, and the columns hold the location of the face and 5 facial landmarks. The format of each row is as follows:

```
x1, y1, w, h, x_re, y_re, x_le, y_le, x_nt, y_nt, x_rcm, y_rcm, x_lcm, y_lcm
```

Here `x1, y1, w, h` are the top-left coordinates, width and height of the face bounding box, and `{x, y}_{re, le, nt, rcm, lcm}` are the coordinates of the right eye, left eye, nose tip, right mouth corner and left mouth corner, respectively.
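The row layout can be unpacked into named fields as in the following sketch. The helper `parse_face_row` is hypothetical and the sample row holds made-up numbers, not real detector output. Any trailing columns beyond the 14 listed above are ignored by this sketch.

```python
LANDMARKS = ("re", "le", "nt", "rcm", "lcm")

# Column names in the order given above: box first, then x/y per landmark.
FIELDS = ("x1", "y1", "w", "h") + tuple(
    f"{axis}_{name}" for name in LANDMARKS for axis in ("x", "y")
)


def parse_face_row(row):
    """Split one detection row into a bounding box and a landmark dictionary."""
    values = dict(zip(FIELDS, row))
    box = tuple(values[key] for key in ("x1", "y1", "w", "h"))
    landmarks = {name: (values[f"x_{name}"], values[f"y_{name}"]) for name in LANDMARKS}
    return box, landmarks


# Made-up example row for illustration only.
row = [10, 20, 100, 120, 40, 60, 80, 60, 60, 85, 45, 110, 75, 110]
box, landmarks = parse_face_row(row)
print(box)
print(landmarks["nt"])
```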

### Face Recognition

Following Face Detection, run the code below to extract face features from a facial image.

@add\_toggle\_cpp @snippet dnn/face\_detect.cpp initialize\_FaceRecognizerSF @snippet dnn/face\_detect.cpp facerecognizer @end\_toggle

@add\_toggle\_python @snippet dnn/face\_detect.py initialize\_FaceRecognizerSF @snippet dnn/face\_detect.py facerecognizer @end\_toggle

After obtaining the face features _feature1_ and _feature2_ of two facial images, run the code below to calculate the identity discrepancy between the two faces.

@add\_toggle\_cpp @snippet dnn/face\_detect.cpp match @end\_toggle

@add\_toggle\_python @snippet dnn/face\_detect.py match @end\_toggle

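For example, two faces have the same identity if the cosine distance is greater than or equal to 0.363, or if the normL2 distance is less than or equal to 1.128. Note that the two scores point in opposite directions: a higher cosine value means higher similarity, whereas a lower normL2 value means higher similarity.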
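The decision rule can be written down as a short sketch. The thresholds are the LFW values from the table in the Models section. The function names are hypothetical, and the scores passed in are made-up numbers rather than output of the matcher.

```python
# LFW thresholds taken from the results table in the Models section.
COSINE_THRESHOLD = 0.363
NORM_L2_THRESHOLD = 1.128


def same_identity_cosine(score: float) -> bool:
    """Cosine score: values at or above the threshold indicate the same identity."""
    return score >= COSINE_THRESHOLD


def same_identity_norm_l2(distance: float) -> bool:
    """normL2 distance: values at or below the threshold indicate the same identity."""
    return distance <= NORM_L2_THRESHOLD


# Made-up scores for illustration.
print(same_identity_cosine(0.52), same_identity_cosine(0.10))
print(same_identity_norm_l2(0.95), same_identity_norm_l2(1.30))
```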

## Reference:

-   [https://github.com/ShiqiYu/libfacedetection](https://github.com/ShiqiYu/libfacedetection)
-   [https://github.com/ShiqiYu/libfacedetection.train](https://github.com/ShiqiYu/libfacedetection.train)
-   [https://github.com/zhongyy/SFace](https://github.com/zhongyy/SFace)

## Acknowledgement

Thanks to [Professor Shiqi Yu](https://github.com/ShiqiYu/) and [Yuantao Feng](https://github.com/fengyuentau) for training and providing the face detection model.

Thanks to [Professor Deng](http://www.whdeng.cn/), [PhD Candidate Zhong](https://github.com/zhongyy/) and [Master Candidate Wang](https://github.com/crywang/) for training and providing the face recognition model.

## [Dnn Googlenet](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_googlenet/dnn_googlenet/)


# Load ONNX framework models {#tutorial\_dnn\_googlenet}

@tableofcontents

@next\_tutorial{tutorial\_dnn\_openvino}

Original author

Vitaliy Lyudvichenko

Compatibility

OpenCV >= 4.5.4

## Introduction

In this tutorial you will learn how to use the opencv\_dnn module for image classification with a trained GoogLeNet network from the [ONNX model zoo](https://github.com/onnx/models/).

We will demonstrate results of this example on the following picture.

## Source Code

We will be using snippets from the example application, which can be downloaded [here](https://github.com/opencv/opencv/blob/5.x/samples/dnn/classification.cpp).

@include dnn/classification.cpp

## Explanation

\-# Firstly, download GoogLeNet model files: @code python download\_models.py googlenet @endcode

You also need a file with the names of the [ILSVRC2012](http://image-net.org/challenges/LSVRC/2012/browse-synsets) classes: [classification\_classes\_ILSVRC2012.txt](https://github.com/opencv/opencv/blob/5.x/samples/data/dnn/classification_classes_ILSVRC2012.txt).

Put these files into the working directory of this example program.

\-# Read and initialize the network using the path to the .onnx file @snippet dnn/classification.cpp Read and initialize network

\-# Read the input image and convert it to a blob acceptable to GoogLeNet @snippet dnn/classification.cpp Open a video file or an image file or a camera stream

cv::VideoCapture can load both images and videos.

@snippet dnn/classification.cpp Create a 4D blob from a frame

Using the cv::dnn::blobFromImage function, we convert the image to a 4-dimensional blob (a so-called batch) with shape `1x3x224x224`, after applying the necessary pre-processing such as resizing and mean subtraction for the blue, green and red channels respectively.

\-# Pass the blob to the network @snippet dnn/classification.cpp Set input blob

\-# Make a forward pass @snippet dnn/classification.cpp Make forward pass

During the forward pass the output of each network layer is computed, but in this example we only need the output of the last layer.

\-# Determine the best class @snippet dnn/classification.cpp Get a class with a highest score

We put the output of the network, which contains the probabilities for each of the 1000 ILSVRC2012 image classes, into the `prob` blob and find the index of its maximum element. This index corresponds to the class of the image.

\-# Run the example from the command line @code ./example\_dnn\_classification googlenet @endcode For our image we get a prediction of class `space shuttle` with more than 99% confidence.
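The "determine the best class" step above amounts to an argmax over the output scores. A minimal sketch in plain Python follows. The helper `best_class` is hypothetical, and the four scores and class names are made up for illustration (the real output has 1000 entries).

```python
def best_class(prob, class_names):
    """Return (class id, class name, score) of the highest-scoring entry."""
    class_id = max(range(len(prob)), key=lambda i: prob[i])
    return class_id, class_names[class_id], prob[class_id]


# Made-up scores and labels for illustration only.
prob = [0.002, 0.004, 0.991, 0.003]
class_names = ["goldfish", "tabby", "space shuttle", "fox squirrel"]

class_id, label, confidence = best_class(prob, class_names)
print(f"{label}: {confidence:.3f} (class id {class_id})")
```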

## [How to run deep networks in browser {#tutorial_dnn_javascript}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_javascript/dnn_javascript/)


@tableofcontents

@prev\_tutorial{tutorial\_dnn\_yolo} @next\_tutorial{tutorial\_dnn\_custom\_layers}

Original author

Dmitry Kurtaev

Compatibility

OpenCV >= 3.3.1

## Introduction

This tutorial shows how to run deep learning models using OpenCV.js right in a browser. It is based on a sample pipeline of face detection and face recognition models.

## Face detection

The face detection network takes a BGR image as input and produces a set of bounding boxes that might contain faces. All we need to do is select the boxes with a strong confidence.

## Face recognition

The network is called OpenFace (project [https://github.com/cmusatyalab/openface](https://github.com/cmusatyalab/openface)). The face recognition model receives an RGB face image of size `96x96` and returns a `128`\-dimensional unit vector that represents the input face as a point on the unit multidimensional sphere. The difference between two faces is therefore the angle between the two output vectors.
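For unit vectors, the angle follows directly from the dot product, since the dot product equals the cosine of the angle. The sketch below uses 2-dimensional vectors instead of 128-dimensional ones to keep the numbers readable. `angle_between` is a hypothetical helper, not part of the sample.

```python
import math


def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] to guard against rounding slightly outside acos' domain.
    dot = max(-1.0, min(1.0, dot))
    return math.acos(dot)


same = angle_between([1.0, 0.0], [1.0, 0.0])        # identical vectors
orthogonal = angle_between([1.0, 0.0], [0.0, 1.0])  # perpendicular vectors
print(same, orthogonal)
```

A smaller angle (equivalently, a larger dot product) means the two faces are more similar.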

## Sample

The whole sample is an HTML page with JavaScript code that uses OpenCV.js functionality. The page is embedded below. Press the `Start` button to begin the demo. Press `Add a person` to name a person who is recognized as unknown. Next we'll discuss the main parts of the code.

@htmlinclude js\_face\_recognition.html

\-# Run the face detection network to detect faces in the input image. @snippet dnn/js\_face\_recognition.html Run face detection model You may play with input blob sizes to balance detection quality and efficiency. The bigger the input blob, the smaller the faces that may be detected.

\-# Run the face recognition network to obtain a `128`\-dimensional unit feature vector from the input face image. @snippet dnn/js\_face\_recognition.html Get 128 floating points feature vector

\-# Perform recognition. @snippet dnn/js\_face\_recognition.html Recognize Match a new feature vector against the registered ones and return the name of the best matching person.

\-# The main loop. @snippet dnn/js\_face\_recognition.html Define frames processing The main loop of our application receives frames from a camera and runs recognition on every face detected in the frame. We start this function once, when OpenCV.js has been initialized and the deep learning models have been downloaded.
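The recognition step described in the list above can be sketched in Python as a nearest-match search by dot product. This is an illustration of the idea rather than a port of the sample: the function name, the `"unknown"` fallback label and the `0.5` threshold are assumptions, and the 2-dimensional vectors stand in for the real 128-dimensional ones.

```python
def recognize(vec, persons, threshold=0.5):
    """Return the name whose registered vector best matches vec, or "unknown".

    vec and the registered vectors are assumed to be unit vectors, so the
    dot product is the cosine of the angle between them.
    """
    best_name, best_score = "unknown", threshold
    for name, person_vec in persons.items():
        score = sum(a * b for a, b in zip(vec, person_vec))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Made-up registered persons with 2-dimensional unit vectors.
persons = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}

print(recognize([0.8, 0.6], persons))   # closer to alice than to bob
print(recognize([-1.0, 0.0], persons))  # no score exceeds the threshold
```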

## [How to run custom OCR model {#tutorial_dnn_OCR}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_OCR/dnn_OCR/)


@tableofcontents

@prev\_tutorial{tutorial\_dnn\_custom\_layers} @next\_tutorial{tutorial\_dnn\_text\_spotting}

Original author

Zihao Mu

Compatibility

OpenCV >= 4.3

## Introduction

In this tutorial, we first describe how to obtain a custom OCR model, then how to transform your own OCR models so that they can be run correctly by the opencv\_dnn module, and finally we provide some pre-trained models.

## Train your own OCR model

[This repository](https://github.com/zihaomu/deep-text-recognition-benchmark) is a good starting point for training your own OCR model. In the repository, MJSynth+SynthText is set as the training set by default. In addition, you can configure the model structure and dataset you want.

## Transform OCR model to ONNX format and Use it in OpenCV DNN

After completing the model training, use [transform\_to\_onnx.py](https://github.com/zihaomu/deep-text-recognition-benchmark/blob/master/transform_to_onnx.py) to convert the model into ONNX format.

### Execute in webcam

The Python version of the example code can be found [here](https://github.com/opencv/opencv/blob/5.x/samples/dnn/text_detection.py).

Example: @code{.bash} $ text\_detection -m=\[path\_to\_text\_detect\_model\] -ocr=\[path\_to\_text\_recognition\_model\] @endcode

## Pre-trained ONNX models are provided

Some pre-trained models can be found at [https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD\_s8\_hHXWz7lAr?usp=sharing](https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing).

Their performance at different text recognition datasets is shown in the table below:

| Model name | IIIT5k(%) | SVT(%) | ICDAR03(%) | ICDAR13(%) | ICDAR15(%) | SVTP(%) | CUTE80(%) | average acc (%) | parameter( x10^6 ) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
| DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
| VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
| CRNN\_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
| ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |

The performance of the text recognition models was tested with OpenCV DNN and does not include the text detection model.

### Model selection suggestion

The input of the text recognition model is the output of the text detection model, so the performance of text detection greatly affects the performance of text recognition.

DenseNet\_CTC has the fewest parameters and the best FPS, which makes it suitable for edge devices that are very sensitive to computational cost. If you have limited computing resources and want better accuracy, VGG\_CTC is a good choice.

CRNN\_VGG\_BiLSTM\_CTC is suitable for scenarios that require high recognition accuracy.
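One way to apply these suggestions is to pick the most accurate model that fits a parameter budget. The sketch below uses the average accuracy and parameter counts from the table above. The helper `best_model_within` is hypothetical, and the sketch considers only parameter count and average accuracy, not FPS.

```python
# (average accuracy in %, parameters in millions), copied from the table above.
MODELS = {
    "DenseNet-CTC": (63.26, 0.24),
    "DenseNet-BiLSTM-CTC": (67.69, 3.63),
    "VGG-CTC": (69.06, 5.57),
    "CRNN_VGG-BiLSTM-CTC": (78.03, 8.45),
    "ResNet-CTC": (79.93, 44.28),
}


def best_model_within(max_params_millions):
    """Return the most accurate model within the parameter budget, or None."""
    candidates = {
        name: acc
        for name, (acc, params) in MODELS.items()
        if params <= max_params_millions
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


for budget in (1.0, 6.0, 10.0):
    print(budget, best_model_within(budget))
```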

## [Dnn Openvino](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_openvino/dnn_openvino/)


# OpenCV usage with OpenVINO {#tutorial\_dnn\_openvino}

@prev\_tutorial{tutorial\_dnn\_googlenet} @next\_tutorial{tutorial\_dnn\_yolo}

Original author

Aleksandr Voron

Compatibility

OpenCV == 4.x

This tutorial provides guidelines on installing and using OpenCV with OpenVINO.

Since the 2021.1.1 release, OpenVINO does not provide a pre-built OpenCV. The change does not affect you if you use the OpenVINO runtime directly or the OpenVINO samples, as they do not have a strong dependency on OpenCV. However, if you use Open Model Zoo demos or the OpenVINO runtime as an OpenCV DNN backend, you need to get an OpenCV build.

There are two ways to get OpenCV:

-   Install pre-built OpenCV from other sources: system repositories, pip, conda, homebrew. A generic pre-built OpenCV package may have several limitations:
    -   OpenCV version may be out-of-date
    -   OpenCV may not contain G-API module with enabled OpenVINO support (e.g. some OMZ demos use G-API functionality)
    -   OpenCV may not be optimized for modern hardware (default builds need to cover wide range of hardware)
    -   OpenCV may not support Intel TBB, Intel Media SDK
    -   OpenCV DNN module may not use OpenVINO as an inference backend
-   Build OpenCV from source code against specific version of OpenVINO. This approach solves the limitations mentioned above.

Instructions for both approaches are provided in the [OpenCV wiki](https://github.com/opencv/opencv/wiki/BuildOpenCV4OpenVINO).

## Supported targets

OpenVINO backend (DNN\_BACKEND\_INFERENCE\_ENGINE) supports the following [targets](https://docs.opencv.org/4.x/d6/d0f/group__dnn.html#ga709af7692ba29788182cf573531b0ff5):

-   **DNN\_TARGET\_CPU:** Runs on the CPU, no additional dependencies required.
-   **DNN\_TARGET\_OPENCL, DNN\_TARGET\_OPENCL\_FP16:** Runs on the iGPU, requires OpenCL drivers. Install [intel-opencl-icd](https://launchpad.net/ubuntu/jammy/+package/intel-opencl-icd) on Ubuntu.
-   **DNN\_TARGET\_MYRIAD:** Runs on Intel® VPU like the [Neural Compute Stick](https://www.intel.com/content/www/us/en/products/sku/140109/intel-neural-compute-stick-2/specifications.html), for setup instructions [see](https://www.intel.com/content/www/us/en/developer/archive/tools/neural-compute-stick.html).
-   **DNN\_TARGET\_HDDL:** Runs on the Intel® Movidius™ Myriad™ X High Density Deep Learning VPU, for details [see](https://intelsmartedge.github.io/ido-specs/doc/building-blocks/enhanced-platform-awareness/smartedge-open_hddl/).
-   **DNN\_TARGET\_FPGA:** Runs on Intel® Altera® series FPGAs, for details [see](https://www.intel.com/content/www/us/en/docs/programmable/768970/2025-1/getting-started-guide.html).
-   **DNN\_TARGET\_NPU:** Runs on the integrated Intel® AI Boost processor, requires [Linux drivers](https://github.com/intel/linux-npu-driver/releases/tag/v1.17.0) OR [Windows drivers](https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html).

## [Conversion of PyTorch Classification Models and Launch with OpenCV C++ {#pytorch_cls_c_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_classification/pytorch_cls_model_conversion_c_tutorial/)


@prev\_tutorial{pytorch\_cls\_tutorial\_dnn\_conversion}

Original author

Anastasia Murzova

Compatibility

OpenCV >= 4.5

## Goals

In this tutorial you will learn how to:

-   convert PyTorch classification models into ONNX format
-   run converted PyTorch model with OpenCV C/C++ API
-   provide model inference

We will explore the above-listed points by the example of ResNet-50 architecture.

## Introduction

Let's briefly review the key concepts involved in moving PyTorch models to the OpenCV API. The first step in converting a PyTorch model into cv::dnn::Net is transferring the model into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of neural networks between various frameworks. PyTorch has a built-in function for ONNX conversion: [`torch.onnx.export`](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export). The obtained `.onnx` model is then passed into cv::dnn::readNetFromONNX or cv::dnn::readNet.

## Requirements

To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:

```
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```

For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial\_py\_table\_of\_contents\_setup.

Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, `opencv-python`) some dependencies. The below line initiates requirements installation into the previously activated virtual environment:

```
pip install -r requirements.txt
```

## Practice

In this part we are going to cover the following points:

1.  create a classification model conversion pipeline
2.  provide the inference, process prediction results

### Model Conversion Pipeline

The code in this subchapter is located in the `samples/dnn/dnn_model_runner` module and can be executed with the line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50_onnx
```

The following code contains the description of the below-listed steps:

1.  instantiate PyTorch model
2.  convert PyTorch model into `.onnx`

```
from torchvision import models

# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)

# get the path to the PyTorch model converted into ONNX
# (get_pytorch_onnx_model is the sample's helper, described below)
full_model_path = get_pytorch_onnx_model(original_model)
print("PyTorch ResNet-50 model was successfully converted: ", full_model_path)
```

`get_pytorch_onnx_model(original_model)` function is based on `torch.onnx.export(...)` call:

```
import os

import torch
from torch.autograd import Variable

# directory where the converted model will be saved
onnx_model_path = "models"
# file name of the converted model
onnx_model_name = "resnet50.onnx"

# create the output directory if it does not exist
os.makedirs(onnx_model_path, exist_ok=True)

# full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)

# generate model input
generated_input = Variable(
    torch.randn(1, 3, 224, 224)
)

# model export into ONNX format
torch.onnx.export(
    original_model,
    generated_input,
    full_model_path,
    verbose=True,
    input_names=["input"],
    output_names=["output"],
    opset_version=11
)
```

After the successful execution of the above code we will get the following output:

```
PyTorch ResNet-50 model was successfully converted: models/resnet50.onnx
```

The `dnn_model_runner` module provided in `samples/dnn` allows us to reproduce the above conversion steps for the following PyTorch classification models:

-   alexnet
-   vgg11
-   vgg13
-   vgg16
-   vgg19
-   resnet18
-   resnet34
-   resnet50
-   resnet101
-   resnet152
-   squeezenet1\_0
-   squeezenet1\_1
-   resnext50\_32x4d
-   resnext101\_32x8d
-   wide\_resnet50\_2
-   wide\_resnet101\_2

To obtain the converted model, the following line should be executed:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --evaluate False
```

For the ResNet-50 case the below line should be run:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --evaluate False
```

The default root directory for storing the converted model is defined in the `CommonConfig` class:

```
@dataclass
class CommonConfig:
    output_data_root_dir: str = "dnn_model_runner/dnn_conversion"
```

Thus, the converted ResNet-50 will be saved in `dnn_model_runner/dnn_conversion/models`.

### Inference Pipeline

Now we can use `models/resnet50.onnx` for the inference pipeline using OpenCV C/C++ API. The implemented pipeline can be found in [samples/dnn/classification.cpp](https://github.com/opencv/opencv/blob/5.x/samples/dnn/classification.cpp). After the build of samples (`BUILD_EXAMPLES` flag value should be `ON`), the appropriate `example_dnn_classification` executable file will be provided.

To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:

```
fox squirrel, eastern fox squirrel, Sciurus niger
```

For the label decoding of the obtained prediction, we also need `imagenet_classes.txt` file, which contains the full list of the ImageNet classes.

In this tutorial we will run the inference process for the converted PyTorch ResNet-50 model from the build (`samples/build`) directory:

```
./dnn/example_dnn_classification --model=../dnn/models/resnet50.onnx --input=../data/squirrel_cls.jpg --width=224 --height=224 --rgb=true --scale="0.003921569" --mean="123.675 116.28 103.53" --std="0.229 0.224 0.225" --crop=true --initial_width=256 --initial_height=256 --classes=../data/dnn/classification_classes_ILSVRC2012.txt
```

Let's explore `classification.cpp` key points step by step:

-   read the model with cv::dnn::readNet and initialize the network:

```
Net net = readNet(model, config, framework);
```

The `model` parameter value is taken from `--model` key. In our case, it is `resnet50.onnx`.

-   preprocess input image:

```
if (rszWidth != 0 && rszHeight != 0)
{
    resize(frame, frame, Size(rszWidth, rszHeight));
}

// Create a 4D blob from a frame
blobFromImage(frame, blob, scale, Size(inpWidth, inpHeight), mean, swapRB, crop);

// Check std values.
if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
{
    // Divide blob by std.
    divide(blob, std, blob);
}
```

In this step we use the cv::dnn::blobFromImage function to prepare the model input. We set `Size(rszWidth, rszHeight)` with `--initial_width=256 --initial_height=256` for the initial image resize, as described in the [PyTorch ResNet inference pipeline](https://pytorch.org/hub/pytorch_vision_resnet/).

Note that cv::dnn::blobFromImage first subtracts the mean value and only then multiplies the pixel values by the scale factor. Thus, we use `--mean="123.675 116.28 103.53"`, which is `[0.485, 0.456, 0.406]` multiplied by `255.0`, to reproduce the original image preprocessing order for PyTorch classification models:

```
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```

-   make forward pass:

```
net.setInput(blob);
Mat prob = net.forward();
```

-   process the prediction:

```
Point classIdPoint;
double confidence;
minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
int classId = classIdPoint.x;
```

Here we choose the most likely object class. The `classId` result for our case is 335: fox squirrel, eastern fox squirrel, Sciurus niger.
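The preprocessing order discussed in the steps above can be checked numerically. The following sketch compares the PyTorch-style order (divide by 255, subtract mean, divide by std) with the blob-style order (subtract mean scaled by 255, multiply by 1/255, divide by std) for a single RGB pixel. The function names and the pixel values are made up for illustration, and `1 / 255.0` is used in place of the rounded `0.003921569` from the command line.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def pytorch_order(pixel_rgb):
    """img /= 255; img -= mean; img /= std"""
    return [((v / 255.0) - m) / s for v, m, s in zip(pixel_rgb, MEAN, STD)]


def blob_order(pixel_rgb):
    """Subtract mean * 255 first, then multiply by scale, then divide by std."""
    scale = 1 / 255.0
    mean_255 = [m * 255.0 for m in MEAN]  # 123.675, 116.28, 103.53
    return [((v - m) * scale) / s for v, m, s in zip(pixel_rgb, mean_255, STD)]


pixel = [200.0, 100.0, 50.0]  # made-up RGB pixel
reference = pytorch_order(pixel)
via_blob = blob_order(pixel)

max_diff = max(abs(p - q) for p, q in zip(reference, via_blob))
print(max_diff)  # tiny floating-point difference
```

Both orders give the same normalized values up to rounding, because `(v - 255 * mean) / 255` equals `v / 255 - mean`.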

## [Conversion of PyTorch Classification Models and Launch with OpenCV Python {#pytorch_cls_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_classification/pytorch_cls_model_conversion_tutorial/)


@prev\_tutorial{tutorial\_dnn\_OCR} @next\_tutorial{pytorch\_cls\_c\_tutorial\_dnn\_conversion}

Original author

Anastasia Murzova

Compatibility

OpenCV >= 4.5

## Goals

In this tutorial you will learn how to:

-   convert PyTorch classification models into ONNX format
-   run converted PyTorch model with OpenCV Python API
-   obtain an evaluation of the PyTorch and OpenCV DNN models.

We will explore the above-listed points by the example of the ResNet-50 architecture.

## Introduction

Let's briefly review the key concepts involved in moving PyTorch models to the OpenCV API. The first step in converting a PyTorch model into cv.dnn.Net is transferring the model into [ONNX](https://onnx.ai/about.html) format. ONNX aims at the interchangeability of neural networks between various frameworks. PyTorch has a built-in function for ONNX conversion: [`torch.onnx.export`](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export). The obtained `.onnx` model is then passed into cv.dnn.readNetFromONNX.

## Requirements

To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:

```
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```

For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial\_py\_table\_of\_contents\_setup.

Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, `opencv-python`) some dependencies. The below line initiates requirements installation into the previously activated virtual environment:

```
pip install -r requirements.txt
```

## Practice

In this part we are going to cover the following points:

1.  create a classification model conversion pipeline and provide the inference
2.  evaluate and test classification models

If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.

### Model Conversion Pipeline

The code in this subchapter is located in the `dnn_model_runner` module and can be executed with the line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_resnet50
```

The following code contains the description of the below-listed steps:

1.  instantiate PyTorch model
2.  convert PyTorch model into `.onnx`
3.  read the transferred network with OpenCV API
4.  prepare input data
5.  provide inference

```
import cv2
from torchvision import models

# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)

# get the path to the PyTorch model converted into ONNX
full_model_path = get_pytorch_onnx_model(original_model)

# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())

# get preprocessed image
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")

# get ImageNet labels
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")

# obtain OpenCV DNN predictions
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)

# obtain original PyTorch ResNet50 predictions
get_pytorch_dnn_prediction(original_model, input_img, imagenet_labels)
```

To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:

```
fox squirrel, eastern fox squirrel, Sciurus niger
```

For the label decoding of the obtained prediction, we also need `imagenet_classes.txt` file, which contains the full list of the ImageNet classes.

Let's go deeper into each step by the example of pretrained PyTorch ResNet-50:

-   instantiate PyTorch ResNet-50 model:

```
# initialize PyTorch ResNet-50 model
original_model = models.resnet50(pretrained=True)
```

-   convert PyTorch model into ONNX:

```
import os

import torch
from torch.autograd import Variable

# directory where the converted model will be saved
onnx_model_path = "models"
# file name of the converted model
onnx_model_name = "resnet50.onnx"

# create the output directory if it does not exist
os.makedirs(onnx_model_path, exist_ok=True)

# full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)

# generate model input
generated_input = Variable(
    torch.randn(1, 3, 224, 224)
)

# model export into ONNX format
torch.onnx.export(
    original_model,
    generated_input,
    full_model_path,
    verbose=True,
    input_names=["input"],
    output_names=["output"],
    opset_version=11
)
```

After the successful execution of the above code, we will get `models/resnet50.onnx`.

-   read the transferred network with cv.dnn.readNetFromONNX, passing it the ONNX model obtained in the previous step:

```
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
```

-   prepare input data:

```
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)

input_img = cv2.resize(input_img, (256, 256))

# define preprocess parameters
mean = np.array([0.485, 0.456, 0.406]) * 255.0
scale = 1 / 255.0
std = [0.229, 0.224, 0.225]

# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
    image=input_img,
    scalefactor=scale,
    size=(224, 224),  # img target size
    mean=mean,
    swapRB=True,  # BGR -> RGB
    crop=True  # center crop
)
# 3. divide by std
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
```

In this step we read the image and prepare the model input with the cv.dnn.blobFromImage function, which returns a 4-dimensional blob. Note that cv.dnn.blobFromImage first subtracts the mean value and only then multiplies the pixel values by the scale factor. Thus, `mean` is multiplied by `255.0` to reproduce the original image preprocessing order:

```
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```
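The equivalence of the two orderings can be checked numerically. The sketch below uses plain Python arithmetic on a single hypothetical RGB pixel (not taken from the tutorial image); `blob_style` mimics the subtract-then-scale order described above, and `torch_style` follows the scale-then-subtract order. Both function names are illustrative.

```python
import math

# ImageNet statistics used in the tutorial (RGB order)
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def torch_style(pixel):
    """Scale to [0, 1] first, then subtract the mean and divide by std."""
    return [(p / 255.0 - m) / s for p, m, s in zip(pixel, MEAN, STD)]


def blob_style(pixel):
    """Subtract the mean (pre-multiplied by 255), then scale, then divide by std."""
    scaled_mean = [m * 255.0 for m in MEAN]
    return [((p - sm) * (1 / 255.0)) / s for p, sm, s in zip(pixel, scaled_mean, STD)]


# hypothetical RGB pixel, chosen only for illustration
pixel = [200.0, 128.0, 30.0]
a = torch_style(pixel)
b = blob_style(pixel)
same = all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b))
print(same)
```

Both functions agree up to floating-point rounding, which is why pre-multiplying `mean` by `255.0` is enough to compensate for the different order of operations.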

-   OpenCV cv.dnn.Net inference:

```
# set OpenCV DNN input
opencv_net.setInput(preproc_img)

# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN prediction: \n")
print("* shape: ", out.shape)

# get the predicted class ID
imagenet_class_id = np.argmax(out)

# get confidence
confidence = out[0][imagenet_class_id]
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
print("* confidence: {:.4f}".format(confidence))
```

After the above code execution we will get the following output:

```
OpenCV DNN prediction:
* shape:  (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 14.8308
```

-   PyTorch ResNet-50 model inference:

```
original_net.eval()
preproc_img = torch.FloatTensor(preproc_img)

# inference
out = original_net(preproc_img)
print("\nPyTorch model prediction: \n")
print("* shape: ", out.shape)

# get the predicted class ID
imagenet_class_id = torch.argmax(out, axis=1).item()
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))

# get confidence
confidence = out[0][imagenet_class_id]
print("* confidence: {:.4f}".format(confidence.item()))
```

Running the above code produces the following output:

```
PyTorch model prediction:
* shape:  torch.Size([1, 1000])
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 14.8308
```

The inference results of the original ResNet-50 model and cv.dnn.Net are equal. For an extended evaluation of the models we can use `py_to_py_cls` from the `dnn_model_runner` module, which is described in the next subchapter.
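The confidence value printed above (14.8308) is larger than 1, so it is a raw score rather than a probability. If a probability is needed, a softmax can be applied to the output vector. A minimal sketch in plain Python, using made-up scores instead of real network output:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # subtract the maximum for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# made-up scores for three classes (illustration only)
scores = [14.8, 9.1, 2.3]
probs = softmax(scores)

# index of the most probable class
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 4))
```

The predicted class ID does not change, because softmax preserves the ordering of the scores.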

### Evaluation of the Models

The `dnn_model_runner` module in `samples/dnn` allows running the full evaluation pipeline on the ImageNet dataset and test execution for the following PyTorch classification models:

-   alexnet
-   vgg11
-   vgg13
-   vgg16
-   vgg19
-   resnet18
-   resnet34
-   resnet50
-   resnet101
-   resnet152
-   squeezenet1\_0
-   squeezenet1\_1
-   resnext50\_32x4d
-   resnext101\_32x8d
-   wide\_resnet50\_2
-   wide\_resnet101\_2

This list can be extended by adding an appropriate evaluation pipeline configuration.

#### Evaluation Mode

The following line runs the module in evaluation mode:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name>
```

The classification model chosen from the list is read into an OpenCV cv.dnn.Net object. Evaluation results for the PyTorch and OpenCV models (accuracy, inference time, L1) are written to the log file. Inference times are also plotted in a chart to summarize the obtained model information.
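The logged metrics can be illustrated on toy data. The sketch below assumes that "L1" refers to the mean absolute difference between the outputs of the two frameworks and that accuracy means top-1 accuracy. Both helper functions and all values are hypothetical and are not taken from `dnn_model_runner`.

```python
def top1_accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = 0
    for scores, label in zip(outputs, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted == label)
    return hits / len(labels)


def l1_difference(ref_outputs, test_outputs):
    """Mean absolute difference between two sets of output vectors."""
    diffs = [
        abs(r - t)
        for ref, test in zip(ref_outputs, test_outputs)
        for r, t in zip(ref, test)
    ]
    return sum(diffs) / len(diffs)


# toy outputs for 3 images and 4 classes (illustration only)
framework_out = [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.2, 0.2, 0.5, 0.1]]
opencv_out = [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.2, 0.2, 0.4, 0.2]]
labels = [1, 0, 3]

accuracy = top1_accuracy(opencv_out, labels)
l1 = l1_difference(framework_out, opencv_out)
print("accuracy:", accuracy)
print("L1:", l1)
```

An L1 value close to zero indicates that the converted model reproduces the outputs of the original framework.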

The necessary evaluation configuration is defined in [test\_config.py](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified to match the actual data locations:

```
@dataclass
class TestClsConfig:
    batch_size: int = 50
    frame_size: int = 224
    img_root_dir: str = "./ILSVRC2012_img_val"
    # location of image-class matching
    img_cls_file: str = "./val.txt"
    bgr_to_rgb: bool = True
```
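Because `TestClsConfig` is a dataclass, a modified copy can also be created in code with `dataclasses.replace`. The sketch below repeats the class definition so that it is self-contained; the `/data/imagenet/...` paths are placeholders, and whether the runner accepts such an object instead of the edited file is not covered here.

```python
from dataclasses import dataclass, replace


@dataclass
class TestClsConfig:
    batch_size: int = 50
    frame_size: int = 224
    img_root_dir: str = "./ILSVRC2012_img_val"
    # location of image-class matching
    img_cls_file: str = "./val.txt"
    bgr_to_rgb: bool = True


# defaults as listed in the tutorial
default_config = TestClsConfig()

# copy with placeholder paths pointing at a local dataset location
local_config = replace(
    default_config,
    img_root_dir="/data/imagenet/ILSVRC2012_img_val",
    img_cls_file="/data/imagenet/val.txt",
)
print(local_config)
```

`replace` returns a new object, so the default configuration stays unchanged.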

To initiate the evaluation of the PyTorch ResNet-50, run the following line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50
```

After script launch, the log file with evaluation data will be generated in `dnn_model_runner/dnn_conversion/logs`:

```
The model PyTorch resnet50 was successfully obtained and converted to OpenCV DNN resnet50
===== Running evaluation of the model with the following params:
    * val data location: ./ILSVRC2012_img_val
    * log file location: dnn_model_runner/dnn_conversion/logs/PyTorch_resnet50_log.txt
```

#### Test Mode

The following line runs the module in test mode, which goes through the model inference steps:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name <pytorch_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```

Here the `default_img_preprocess` key defines whether the model test process is parametrized with particular values (for example, `scale`, `mean` or `std`) or uses the default values.

Test configuration is represented in [test\_config.py](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) `TestClsModuleConfig` class:

```
@dataclass
class TestClsModuleConfig:
    cls_test_data_dir: str = "../data"
    test_module_name: str = "classification"
    test_module_path: str = "classification.py"
    input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
    model: str = ""

    frame_height: str = str(TestClsConfig.frame_size)
    frame_width: str = str(TestClsConfig.frame_size)
    scale: str = "1.0"
    mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
    std: List[str] = field(default_factory=list)
    crop: str = "False"
    rgb: str = "True"
    rsz_height: str = ""
    rsz_width: str = ""
    classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
```

The default image preprocessing options are defined in [default\_preprocess\_config.py](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/default_preprocess_config.py). For instance:

```
BASE_IMG_SCALE_FACTOR = 1 / 255.0
PYTORCH_RSZ_HEIGHT = 256
PYTORCH_RSZ_WIDTH = 256

pytorch_resize_input_blob = {
    "mean": ["123.675", "116.28", "103.53"],
    "scale": str(BASE_IMG_SCALE_FACTOR),
    "std": ["0.229", "0.224", "0.225"],
    "crop": "True",
    "rgb": "True",
    "rsz_height": str(PYTORCH_RSZ_HEIGHT),
    "rsz_width": str(PYTORCH_RSZ_WIDTH)
}
```
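The configuration values are stored as strings. The sketch below shows one way they could be converted into typed values; `parse_preprocess` is a hypothetical helper, not part of `dnn_model_runner`. It also checks that the `mean` entries equal the ImageNet means multiplied by 255.

```python
import math

BASE_IMG_SCALE_FACTOR = 1 / 255.0

pytorch_resize_input_blob = {
    "mean": ["123.675", "116.28", "103.53"],
    "scale": str(BASE_IMG_SCALE_FACTOR),
    "std": ["0.229", "0.224", "0.225"],
    "crop": "True",
    "rgb": "True",
    "rsz_height": "256",
    "rsz_width": "256",
}


def parse_preprocess(config):
    """Hypothetical helper: turn string settings into typed values."""
    return {
        "mean": [float(v) for v in config["mean"]],
        "scale": float(config["scale"]),
        "std": [float(v) for v in config["std"]],
        "crop": config["crop"] == "True",
        "rgb": config["rgb"] == "True",
        "resize": (int(config["rsz_width"]), int(config["rsz_height"])),
    }


params = parse_preprocess(pytorch_resize_input_blob)

# the configured means are the ImageNet means on the 0..255 scale
imagenet_mean = [0.485, 0.456, 0.406]
means_match = all(
    math.isclose(m, v * 255.0) for m, v in zip(params["mean"], imagenet_mean)
)
print(means_match)
```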

Model testing is based on [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/5.x/samples/dnn/classification.py). `classification.py` can be executed on its own, given the converted model in `--input` and the parameters for cv.dnn.blobFromImage.

To reproduce the OpenCV steps described in "Model Conversion Pipeline" from scratch with `dnn_model_runner`, execute the following line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.classification.py_to_py_cls --model_name resnet50 --test True --default_img_preprocess True --evaluate False
```

The network prediction is depicted in the top left corner of the output window:

## [Conversion of TensorFlow Classification Models and Launch with OpenCV Python {#tf_cls_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_classification/tf_cls_model_conversion_tutorial/)

Original author

Anastasia Murzova

Compatibility

OpenCV >= 4.5

## Goals

In this tutorial you will learn how to:

-   obtain frozen graphs of TensorFlow (TF) classification models
-   run converted TensorFlow model with OpenCV Python API
-   obtain an evaluation of the TensorFlow and OpenCV DNN models

We will explore the above-listed points by the example of MobileNet architecture.

## Introduction

Let's briefly go over the key concepts involved in running TensorFlow models with the OpenCV API. The first step in converting a TensorFlow model into cv.dnn.Net is obtaining the frozen TF model graph. A frozen graph combines the model graph structure with the stored values of the required variables, for example, weights. Usually the frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (`.pb`) files. Once the model `.pb` file has been generated, it can be read with the cv.dnn.readNetFromTensorflow function.

## Requirements

To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:

```
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```

For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial\_py\_table\_of\_contents\_setup.

Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, `opencv-python`) some dependencies. The below line initiates requirements installation into the previously activated virtual environment:

```
pip install -r requirements.txt
```

## Practice

In this part we are going to cover the following points:

1.  create a TF classification model conversion pipeline and provide the inference
2.  evaluate and test TF classification models

If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" tutorial part can be skipped.

### Model Conversion Pipeline

The code in this subchapter is located in the `dnn_model_runner` module and can be executed with the line:

```
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_mobilenet
```

The following code contains the description of the below-listed steps:

1.  instantiate TF model
2.  create TF frozen graph
3.  read TF frozen graph with OpenCV API
4.  prepare input data
5.  provide inference

```
# initialize TF MobileNet model
original_tf_model = MobileNet(
    include_top=True,
    weights="imagenet"
)

# get TF frozen graph path
full_pb_path = get_tf_model_proto(original_tf_model)

# read frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(full_pb_path)
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())

# get preprocessed image
input_img = get_preprocessed_img("../data/squirrel_cls.jpg")

# get ImageNet labels
imagenet_labels = get_imagenet_labels("../data/dnn/classification_classes_ILSVRC2012.txt")

# obtain OpenCV DNN predictions
get_opencv_dnn_prediction(opencv_net, input_img, imagenet_labels)

# obtain TF model predictions
get_tf_dnn_prediction(original_tf_model, input_img, imagenet_labels)
```

To provide model inference we will use the below [squirrel photo](https://www.pexels.com/photo/brown-squirrel-eating-1564292) (under [CC0](https://www.pexels.com/terms-of-service/) license) corresponding to ImageNet class ID 335:

```
fox squirrel, eastern fox squirrel, Sciurus niger
```

To decode the label of the obtained prediction, we also need the `imagenet_classes.txt` file, which contains the full list of ImageNet classes.

Let's look at each step in more detail, using the pretrained TF MobileNet as an example:

-   instantiate TF model:

```
# initialize TF MobileNet model
original_tf_model = MobileNet(
    include_top=True,
    weights="imagenet"
)
```

-   create TF frozen graph:

```
import os

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# the Keras model instantiated in the previous step
tf_model = original_tf_model

# define the directory for .pb model
pb_model_path = "models"

# define the name of .pb model
pb_model_name = "mobilenet.pb"

# create directory for further converted model
os.makedirs(pb_model_path, exist_ok=True)

# get model TF graph
tf_model_graph = tf.function(lambda x: tf_model(x))

# get concrete function
tf_model_graph = tf_model_graph.get_concrete_function(
    tf.TensorSpec(tf_model.inputs[0].shape, tf_model.inputs[0].dtype))

# obtain frozen concrete function
frozen_tf_func = convert_variables_to_constants_v2(tf_model_graph)
# get frozen graph
frozen_tf_func.graph.as_graph_def()

# save the frozen graph as a binary .pb file
tf.io.write_graph(graph_or_graph_def=frozen_tf_func.graph,
                  logdir=pb_model_path,
                  name=pb_model_name,
                  as_text=False)
```

After the successful execution of the above code, we will get a frozen graph in `models/mobilenet.pb`.

-   read the TF frozen graph with cv.dnn.readNetFromTensorflow, passing it the `mobilenet.pb` obtained in the previous step:

```
# get TF frozen graph path
full_pb_path = get_tf_model_proto(original_tf_model)
# read frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(full_pb_path)
```

-   prepare input data with cv2.dnn.blobFromImage function:

```
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)

# define preprocess parameters
mean = np.array([1.0, 1.0, 1.0]) * 127.5
scale = 1 / 127.5

# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from -1 to 1
input_blob = cv2.dnn.blobFromImage(
    image=input_img,
    scalefactor=scale,
    size=(224, 224),  # img target size
    mean=mean,
    swapRB=True,  # BGR -> RGB
    crop=True  # center crop
)
print("Input blob shape: {}\n".format(input_blob.shape))
```

Please pay attention to the preprocessing order in the cv2.dnn.blobFromImage function. First, the mean value is subtracted, and only then are pixel values multiplied by the defined scale. Therefore, to reproduce the image preprocessing pipeline from the TF [`mobilenet.preprocess_input`](https://github.com/tensorflow/tensorflow/blob/02032fb477e9417197132648ec81e75beee9063a/tensorflow/python/keras/applications/mobilenet.py#L443-L445) function, we multiply `mean` by `127.5`.

As a result, a 4-dimensional `input_blob` is obtained:

`Input blob shape: (1, 3, 224, 224)`
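As a quick check of the arithmetic, subtracting 127.5 and then multiplying by 1/127.5 maps the 8-bit range [0, 255] onto [-1, 1], which matches computing `x / 127.5 - 1`. A small sketch in plain Python; the function names are illustrative:

```python
import math

MEAN = 127.5
SCALE = 1 / 127.5


def blob_order(value):
    """Subtract the mean first, then multiply by the scale factor."""
    return (value - MEAN) * SCALE


def tf_order(value):
    """Divide by 127.5 first, then shift by 1."""
    return value / 127.5 - 1.0


# compare both orderings at the ends and the middle of the 8-bit range
for value in (0.0, 127.5, 255.0):
    print(value, blob_order(value), tf_order(value))
```

The two orderings agree up to floating-point rounding for every pixel value.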

-   provide OpenCV cv.dnn.Net inference:

```
# set OpenCV DNN input
opencv_net.setInput(preproc_img)

# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN prediction: \n")
print("* shape: ", out.shape)

# get the predicted class ID
imagenet_class_id = np.argmax(out)

# get confidence
confidence = out[0][imagenet_class_id]
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))
print("* confidence: {:.4f}\n".format(confidence))
```

After the above code execution we will get the following output:

```
OpenCV DNN prediction:
* shape:  (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 0.9525
```

-   provide TF MobileNet inference:

```
# inference
preproc_img = preproc_img.transpose(0, 2, 3, 1)
print("TF input blob shape: {}\n".format(preproc_img.shape))

out = original_net(preproc_img)

print("\nTensorFlow model prediction: \n")
print("* shape: ", out.shape)

# get the predicted class ID
imagenet_class_id = np.argmax(out)
print("* class ID: {}, label: {}".format(imagenet_class_id, imagenet_labels[imagenet_class_id]))

# get confidence
confidence = out[0][imagenet_class_id]
print("* confidence: {:.4f}".format(confidence))
```

To fit TF model input, `input_blob` was transposed:

```
TF input blob shape: (1, 224, 224, 3)
```
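The transposition reorders the blob from the channels-first layout produced by `blobFromImage` (batch, channels, height, width) to the channels-last layout expected by the TF model. A minimal NumPy sketch on a zero-filled placeholder array:

```python
import numpy as np

# placeholder blob in NCHW layout: (batch, channels, height, width)
nchw_blob = np.zeros((1, 3, 224, 224), dtype=np.float32)
# mark one element so its new position can be checked
nchw_blob[0, 2, 10, 20] = 1.0

# move the channel axis to the end: NCHW -> NHWC
nhwc_blob = nchw_blob.transpose(0, 2, 3, 1)

print(nhwc_blob.shape)
print(nhwc_blob[0, 10, 20, 2])
```

The marked element moves from index `[0, 2, 10, 20]` to `[0, 10, 20, 2]`; only the axis order changes, not the values.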

TF inference results are the following:

```
TensorFlow model prediction:
* shape:  (1, 1000)
* class ID: 335, label: fox squirrel, eastern fox squirrel, Sciurus niger
* confidence: 0.9525
```

As the experiments show, the OpenCV and TF inference results are equal.

### Evaluation of the Models

The `dnn_model_runner` module in `samples/dnn` allows running the full evaluation pipeline on the ImageNet dataset and test execution for the following TensorFlow classification models:

-   vgg16
-   vgg19
-   resnet50
-   resnet101
-   resnet152
-   densenet121
-   densenet169
-   densenet201
-   inceptionresnetv2
-   inceptionv3
-   mobilenet
-   mobilenetv2
-   nasnetlarge
-   nasnetmobile
-   xception

This list can be extended by adding an appropriate evaluation pipeline configuration.

#### Evaluation Mode

The following line runs the module in evaluation mode:

```
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name>
```

The classification model chosen from the list is read into an OpenCV `cv.dnn_Net` object. Evaluation results for the TF and OpenCV models (accuracy, inference time, L1) are written to the log file. Inference times are also plotted in a chart to summarize the obtained model information.

The necessary evaluation configuration is defined in [test\_config.py](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) and can be modified to match the actual data locations:

```
@dataclass
class TestClsConfig:
    batch_size: int = 50
    frame_size: int = 224
    img_root_dir: str = "./ILSVRC2012_img_val"
    # location of image-class matching
    img_cls_file: str = "./val.txt"
    bgr_to_rgb: bool = True
```

The values from `TestClsConfig` can be customized in accordance with chosen model.

To initiate the evaluation of the TensorFlow MobileNet, run the following line:

```
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet
```

After script launch, the log file with evaluation data will be generated in `dnn_model_runner/dnn_conversion/logs`:

```
===== Running evaluation of the model with the following params:
    * val data location: ./ILSVRC2012_img_val
    * log file location: dnn_model_runner/dnn_conversion/logs/TF_mobilenet_log.txt
```

#### Test Mode

The following line runs the module in test mode, which goes through the model inference steps:

```
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name <tf_cls_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```

Here the `default_img_preprocess` key defines whether the model test process is parametrized with particular values (for example, `scale`, `mean` or `std`) or uses the default values.

Test configuration is represented in [test\_config.py](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py) `TestClsModuleConfig` class:

```
@dataclass
class TestClsModuleConfig:
    cls_test_data_dir: str = "../data"
    test_module_name: str = "classification"
    test_module_path: str = "classification.py"
    input_img: str = os.path.join(cls_test_data_dir, "squirrel_cls.jpg")
    model: str = ""

    frame_height: str = str(TestClsConfig.frame_size)
    frame_width: str = str(TestClsConfig.frame_size)
    scale: str = "1.0"
    mean: List[str] = field(default_factory=lambda: ["0.0", "0.0", "0.0"])
    std: List[str] = field(default_factory=list)
    crop: str = "False"
    rgb: str = "True"
    rsz_height: str = ""
    rsz_width: str = ""
    classes: str = os.path.join(cls_test_data_dir, "dnn", "classification_classes_ILSVRC2012.txt")
```

The default image preprocessing options are defined in `default_preprocess_config.py`. For instance, for MobileNet:

```
tf_input_blob = {
    "mean": ["127.5", "127.5", "127.5"],
    "scale": str(1 / 127.5),
    "std": [],
    "crop": "True",
    "rgb": "True"
}
```

Model testing is based on [samples/dnn/classification.py](https://github.com/opencv/opencv/blob/5.x/samples/dnn/classification.py). `classification.py` can be executed on its own, given the converted model in `--input` and the parameters for cv.dnn.blobFromImage.

To reproduce the OpenCV steps described in "Model Conversion Pipeline" from scratch with `dnn_model_runner`, execute the following line:

```
python -m dnn_model_runner.dnn_conversion.tf.classification.py_to_py_cls --model_name mobilenet --test True --default_img_preprocess True --evaluate False
```

The network prediction is depicted in the top left corner of the output window:

## [Conversion of TensorFlow Detection Models and Launch with OpenCV Python {#tf_det_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_detection/tf_det_model_conversion_tutorial/)

Original author

Anastasia Murzova

Compatibility

OpenCV >= 4.5

## Goals

In this tutorial you will learn how to:

-   obtain frozen graphs of TensorFlow (TF) detection models
-   run converted TensorFlow model with OpenCV Python API

We will explore the above-listed points by the example of SSD MobileNetV1.

## Introduction

Let's briefly go over the key concepts involved in running TensorFlow models with the OpenCV API. The first step in converting a TensorFlow model into cv.dnn.Net is obtaining the frozen TF model graph. A frozen graph combines the model graph structure with the stored values of the required variables, for example, weights. The frozen graph is saved in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (`.pb`) files. OpenCV provides two functions for reading `.pb` graphs: cv.dnn.readNetFromTensorflow and cv.dnn.readNet.

## Requirements

To be able to experiment with the below code you will need to install a set of libraries. We will use a virtual environment with python3.7+ for this:

```
virtualenv -p /usr/bin/python3.7 <env_dir_path>
source <env_dir_path>/bin/activate
```

For OpenCV-Python building from source, follow the corresponding instructions from the @ref tutorial\_py\_table\_of\_contents\_setup.

Before you start the installation of the libraries, you can customize the [requirements.txt](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/requirements.txt), excluding or including (for example, `opencv-python`) some dependencies. The below line initiates requirements installation into the previously activated virtual environment:

```
pip install -r requirements.txt
```

## Practice

In this part we are going to cover the following points:

1.  create a TF detection model conversion pipeline and provide the inference
2.  provide the inference, process prediction results

### Model Preparation

The code in this subchapter is located in the `samples/dnn/dnn_model_runner` module and can be executed with the below line:

```
python -m dnn_model_runner.dnn_conversion.tf.detection.py_to_py_ssd_mobilenet
```

The following code contains the steps of the TF SSD MobileNetV1 model retrieval:

```
tf_model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
graph_extraction_dir = "./"
frozen_graph_path = extract_tf_frozen_graph(tf_model_name, graph_extraction_dir)
print("Frozen graph path for {}: {}".format(tf_model_name, frozen_graph_path))
```

In the `extract_tf_frozen_graph` function we extract the `frozen_inference_graph.pb` file provided in the model archive for further processing:

```
import os
import tarfile
import urllib.request


def extract_tf_frozen_graph(model_name, extracted_model_path):
    # DETECTION_MODELS_URL is the base download URL defined in the module
    # define model archive name
    tf_model_tar = model_name + '.tar.gz'
    # define link to retrieve model archive
    model_link = DETECTION_MODELS_URL + tf_model_tar

    tf_frozen_graph_name = 'frozen_inference_graph'

    try:
        urllib.request.urlretrieve(model_link, tf_model_tar)
    except Exception:
        print("TF {} was not retrieved: {}".format(model_name, model_link))
        return

    print("TF {} was retrieved.".format(model_name))

    tf_model_tar = tarfile.open(tf_model_tar)
    frozen_graph_path = ""

    # extract only the archive member that holds the frozen graph
    for model_tar_elem in tf_model_tar.getmembers():
        if tf_frozen_graph_name in os.path.basename(model_tar_elem.name):
            tf_model_tar.extract(model_tar_elem, extracted_model_path)
            frozen_graph_path = os.path.join(extracted_model_path, model_tar_elem.name)
            break
    tf_model_tar.close()

    return frozen_graph_path
```

After the successful execution of the above code we will get the following output:

```
TF ssd_mobilenet_v1_coco_2017_11_17 was retrieved.
Frozen graph path for ssd_mobilenet_v1_coco_2017_11_17: ./ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb
```
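The member lookup used for the archive can be tried without downloading a model. The sketch below builds a tiny stand-in archive in a temporary directory and extracts the member whose base name contains `frozen_inference_graph`. The helper `extract_member`, the archive name `toy_model.tar.gz`, and its contents are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def extract_member(archive_path, wanted_name, output_dir):
    """Extract the first member whose base name contains wanted_name."""
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            if wanted_name in os.path.basename(member.name):
                archive.extract(member, output_dir)
                return os.path.join(output_dir, member.name)
    return ""


with tempfile.TemporaryDirectory() as work_dir:
    # build a tiny stand-in archive instead of downloading a real model
    archive_path = os.path.join(work_dir, "toy_model.tar.gz")
    payload = b"not a real graph"
    with tarfile.open(archive_path, "w:gz") as archive:
        info = tarfile.TarInfo("toy_model/frozen_inference_graph.pb")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    extracted = extract_member(archive_path, "frozen_inference_graph", work_dir)
    found = os.path.isfile(extracted)
    print(found)
```

Matching on the base name keeps the lookup independent of the directory layout inside the archive.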

To provide model inference we will use the below [double-decker bus photo](https://www.pexels.com/photo/bus-and-car-on-one-way-street-3626589/) (under [Pexels](https://www.pexels.com/license/) license):

To initiate the test process we need to provide an appropriate model configuration. We will use [`ssd_mobilenet_v1_coco.config`](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_coco.config) from [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection#tensorflow-object-detection-api). TensorFlow Object Detection API framework contains helpful mechanisms for object detection model manipulations.

We will use this configuration to provide a text graph representation. To generate `.pbtxt` we will use the corresponding [`samples/dnn/tf_text_graph_ssd.py`](https://github.com/opencv/opencv/blob/5.x/samples/dnn/tf_text_graph_ssd.py) script:

```
python tf_text_graph_ssd.py --input ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb --config ssd_mobilenet_v1_coco_2017_11_17/ssd_mobilenet_v1_coco.config --output ssd_mobilenet_v1_coco_2017_11_17.pbtxt
```

After successful execution `ssd_mobilenet_v1_coco_2017_11_17.pbtxt` will be created.

Before we run `object_detection.py`, let's have a look at the default values for the SSD MobileNetV1 test process configuration. They are located in [`models.yml`](https://github.com/opencv/opencv/blob/5.x/samples/dnn/models.yml):

```
ssd_tf:
  model: "ssd_mobilenet_v1_coco_2017_11_17.pb"
  config: "ssd_mobilenet_v1_coco_2017_11_17.pbtxt"
  mean: [0, 0, 0]
  scale: 1.0
  width: 300
  height: 300
  rgb: true
  classes: "object_detection_classes_coco.txt"
  sample: "object_detection"
```

To use these values, the frozen graph `ssd_mobilenet_v1_coco_2017_11_17.pb` and the text graph `ssd_mobilenet_v1_coco_2017_11_17.pbtxt` need to be provided:

```
python object_detection.py ssd_tf --input ../data/pexels_double_decker_bus.jpg
```

This line is equivalent to:

```
python object_detection.py --model ssd_mobilenet_v1_coco_2017_11_17.pb --config  ssd_mobilenet_v1_coco_2017_11_17.pbtxt  --input ../data/pexels_double_decker_bus.jpg --width 300 --height 300 --classes ../data/dnn/object_detection_classes_coco.txt
```

The result is:

Several helpful parameters can also be customized to refine the result: the confidence threshold (`--thr`) and the non-maximum suppression threshold (`--nms`).
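To illustrate what these two values control, the sketch below filters hypothetical detections by a confidence threshold and then applies a simple greedy non-maximum suppression. It is a plain Python illustration of the general technique, not the code used by `object_detection.py`; boxes are given as `(x1, y1, x2, y2)` and all numbers are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold, nms_threshold):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    candidates = [d for d in detections if d[0] >= conf_threshold]
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        # keep a box only if it does not overlap too much with a kept one
        if all(iou(box, kept_box) <= nms_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.90, (10, 10, 110, 110)),
    (0.80, (12, 12, 112, 112)),    # overlaps heavily with the first box
    (0.75, (200, 200, 260, 260)),
    (0.30, (300, 300, 340, 340)),  # below the confidence threshold
]
kept = filter_detections(detections, conf_threshold=0.5, nms_threshold=0.4)
print(kept)
```

Raising the confidence threshold removes weak detections, while lowering the NMS threshold removes more of the overlapping boxes.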

## [Conversion of PyTorch Segmentation Models and Launch with OpenCV {#pytorch_segm_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_segmentation/pytorch_sem_segm_model_conversion_tutorial/)

## Goals

In this tutorial you will learn how to:

-   convert PyTorch segmentation models
-   run converted PyTorch model with OpenCV
-   obtain an evaluation of the PyTorch and OpenCV DNN models

We will explore the above-listed points by the example of the FCN ResNet-50 architecture.

## Introduction

The key points of the conversion pipeline for [PyTorch classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are the same. The first step is converting the model into [ONNX](https://onnx.ai/about.html) format with the PyTorch built-in function [`torch.onnx.export`](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export). The obtained `.onnx` model is then passed into cv.dnn.readNetFromONNX, which returns a cv.dnn.Net object ready for DNN manipulations.

## Practice

In this part we are going to cover the following points:

1.  create a segmentation model conversion pipeline and provide the inference
2.  evaluate and test segmentation models

If you'd like merely to run evaluation or test model pipelines, the "Model Conversion Pipeline" part can be skipped.

### Model Conversion Pipeline

The code in this subchapter is located in the `dnn_model_runner` module and can be executed with the line:

`python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_fcnresnet50`

The following code contains the description of the below-listed steps:

1.  instantiate PyTorch model
2.  convert PyTorch model into `.onnx`
3.  read the transferred network with OpenCV API
4.  prepare input data
5.  provide inference
6.  get colored masks from predictions
7.  visualize results

```
# initialize PyTorch FCN ResNet-50 model
original_model = models.segmentation.fcn_resnet50(pretrained=True)

# get the path to the converted into ONNX PyTorch model
full_model_path = get_pytorch_onnx_model(original_model)

# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
print("OpenCV model was successfully read. Layer IDs: \n", opencv_net.getLayerNames())

# get preprocessed image
img, input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")

# obtain OpenCV DNN predictions
opencv_prediction = get_opencv_dnn_prediction(opencv_net, input_img)

# obtain original PyTorch FCN ResNet-50 predictions
pytorch_prediction = get_pytorch_dnn_prediction(original_model, input_img)

pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")

# obtain colored segmentation masks
opencv_colored_mask = get_colored_mask(img.shape, opencv_prediction, pascal_voc_colors)
pytorch_colored_mask = get_colored_mask(img.shape, pytorch_prediction, pascal_voc_colors)

# obtain palette of PASCAL VOC colors
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)

cv2.imshow('PyTorch Colored Mask', pytorch_colored_mask)
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)
cv2.imshow('Color Legend', color_legend)

cv2.waitKey(0)
```

To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:

The target segmented result is:

To decode the PASCAL VOC colors and map them to the predicted masks, we also need the `pascal-classes.txt` file, which contains the full list of PASCAL VOC classes and the corresponding colors.

Let's look at each code step in more detail, using the pretrained PyTorch FCN ResNet-50 as an example:

-   instantiate PyTorch FCN ResNet-50 model:

```
# initialize PyTorch FCN ResNet-50 model
original_model = models.segmentation.fcn_resnet50(pretrained=True)
```

-   convert PyTorch model into ONNX format:

```
# define the directory for further converted model save
onnx_model_path = "models"
# define the name of further converted model
onnx_model_name = "fcnresnet50.onnx"

# create directory for further converted model
os.makedirs(onnx_model_path, exist_ok=True)

# get full path to the converted model
full_model_path = os.path.join(onnx_model_path, onnx_model_name)

# generate model input to build the graph
generated_input = Variable(
    torch.randn(1, 3, 500, 500)
)

# model export into ONNX format
torch.onnx.export(
    original_model,
    generated_input,
    full_model_path,
    verbose=True,
    input_names=["input"],
    output_names=["output"],
    opset_version=11
)
```

The code from this step does not differ from the classification conversion case. Thus, after the successful execution of the above code, we will get `models/fcnresnet50.onnx`.

-   read the converted network with cv.dnn.readNetFromONNX, passing it the ONNX model obtained in the previous step:

```
# read converted .onnx model with OpenCV API
opencv_net = cv2.dnn.readNetFromONNX(full_model_path)
```

-   prepare input data:

```
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)

# target image sizes
img_height = input_img.shape[0]
img_width = input_img.shape[1]

# define preprocess parameters
mean = np.array([0.485, 0.456, 0.406]) * 255.0
scale = 1 / 255.0
std = [0.229, 0.224, 0.225]

# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from 0 to 1
input_blob = cv2.dnn.blobFromImage(
    image=input_img,
    scalefactor=scale,
    size=(img_width, img_height),  # img target size
    mean=mean,
    swapRB=True,  # BGR -> RGB
    crop=False  # center crop
)
# 3. divide by std
input_blob[0] /= np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
```

In this step we read the image and prepare the model input with the `cv2.dnn.blobFromImage` function, which returns a 4-dimensional blob. Note that `cv2.dnn.blobFromImage` subtracts the mean first and only then scales the pixel values. Therefore, `mean` is multiplied by `255.0` to reproduce the original image preprocessing order:

```
img /= 255.0
img -= [0.485, 0.456, 0.406]
img /= [0.229, 0.224, 0.225]
```
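The equivalence of the two orderings can be checked numerically. The following is a minimal sketch in plain Python (no OpenCV required); the helper names `torch_style` and `blob_style` are illustrative and are not part of any library:

```python
import math

# Per-channel statistics used in the tutorial (RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def torch_style(pixel, channel):
    """Original order: scale to [0, 1], subtract mean, divide by std."""
    return (pixel / 255.0 - MEAN[channel]) / STD[channel]


def blob_style(pixel, channel):
    """blobFromImage order: subtract (mean * 255), scale, then divide by std."""
    scaled = (pixel - MEAN[channel] * 255.0) * (1 / 255.0)
    return scaled / STD[channel]


# Compare both orderings for a few 8-bit pixel values in every channel.
for channel in range(3):
    for pixel in (0, 17, 128, 255):
        assert math.isclose(torch_style(pixel, channel),
                            blob_style(pixel, channel), abs_tol=1e-9)
```

Both functions agree up to floating-point rounding, which is why pre-multiplying `mean` by `255.0` is enough to compensate for the different order of operations.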

-   OpenCV `cv.dnn_Net` inference:

```
# set OpenCV DNN input
opencv_net.setInput(preproc_img)

# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN segmentation prediction: \n")
print("* shape: ", out.shape)

# get IDs of predicted classes
out_predictions = np.argmax(out[0], axis=0)
```

Running the code above produces the following output:

```
OpenCV DNN segmentation prediction:
* shape:  (1, 21, 500, 500)
```

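Each of the 21 prediction channels, one per PASCAL VOC class (20 object classes plus background), contains a per-pixel score that indicates how likely the pixel belongs to that class. The scores are unnormalized, so they act as relative confidences rather than probabilities that sum to 1.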
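Taking the arg-max over the class axis gives the same class IDs whether or not the values are first normalized with a softmax. A minimal sketch in plain Python illustrates this; the scores are made up, and `softmax` and `argmax` are small illustrative helpers, not OpenCV functions:

```python
import math

# Made-up scores for 2 pixels and 3 classes, laid out as
# scores[class_id][pixel], similar to out[0] with shape (C, H, W).
scores = [
    [2.0, -1.0],   # class 0
    [0.5, 3.0],    # class 1
    [-1.0, 0.0],   # class 2
]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


# Regroup the data so that each entry holds all class scores of one pixel.
num_pixels = len(scores[0])
per_pixel_scores = [[row[p] for row in scores] for p in range(num_pixels)]

class_ids = [argmax(s) for s in per_pixel_scores]
class_ids_from_probs = [argmax(softmax(s)) for s in per_pixel_scores]
```

Here `class_ids` and `class_ids_from_probs` are identical, so `np.argmax` can be applied directly to the network output.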

-   PyTorch FCN ResNet-50 model inference:

```
original_net.eval()
preproc_img = torch.FloatTensor(preproc_img)

with torch.no_grad():
    # obtaining unnormalized probabilities for each class
    out = original_net(preproc_img)['out']

print("\nPyTorch segmentation model prediction: \n")
print("* shape: ", out.shape)

# get IDs of predicted classes
out_predictions = out[0].argmax(dim=0)
```

Running the code above produces the following output:

```
PyTorch segmentation model prediction:
* shape:  torch.Size([1, 21, 366, 500])
```

The PyTorch prediction likewise contains a score for each class at every pixel.

-   get colored masks from predictions:

```
# convert mask values into PASCAL VOC colors
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])

# reshape mask into 3-channel image
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
    np.uint8)

# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```
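The palette lookup in this snippet can be tried on a toy mask without OpenCV or NumPy. The sketch below uses a made-up three-color palette; the tutorial itself reads the PASCAL VOC colors from `pascal-classes.txt`:

```python
# Hypothetical 3-entry palette standing in for the PASCAL VOC colors.
colors = [
    (0, 0, 0),       # class 0
    (128, 0, 0),     # class 1
    (0, 128, 0),     # class 2
]

# A 2x3 mask of predicted class IDs (rows x columns).
segm_mask = [
    [0, 1, 1],
    [2, 2, 0],
]

# Replace every class ID with its color triple, keeping the 2-D layout.
colored_mask = [[colors[class_id] for class_id in row] for row in segm_mask]
```

The result has the same height and width as the mask, with one color triple per pixel, which corresponds to the 3-channel image produced by the `reshape` call above.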

In this step we map the predicted class IDs in the segmentation mask to the colors of the corresponding classes. Let's have a look at the results:

For an extended evaluation of the models, we can use the `py_to_py_segm` script of the `dnn_model_runner` module, which is described in the next subchapter.

### Evaluation of the Models

The `dnn_model_runner` module, located in `samples/dnn`, makes it possible to run the full evaluation pipeline on the PASCAL VOC dataset and to test execution for the following PyTorch segmentation models:

-   FCN ResNet-50
-   FCN ResNet-101

This list can be extended by adding an appropriate evaluation pipeline configuration.

#### Evaluation Mode

The following line runs the module in evaluation mode:

```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name>
```

The segmentation model chosen from the list will be read into an OpenCV `cv.dnn_Net` object. The evaluation results of the PyTorch and OpenCV models (pixel accuracy, mean IoU, inference time) will be written to a log file. Inference times will also be plotted in a chart to summarize the obtained model information.
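For reference, pixel accuracy and mean IoU can be computed from a confusion matrix. The following sketch uses the standard definitions on toy data; it is not taken from `dnn_model_runner`, whose metric code is not shown in this tutorial, and all function names are illustrative:

```python
def confusion_matrix(gt, pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        matrix[g][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of pixels whose predicted class equals the ground truth."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix):
    """Average per-class intersection-over-union, skipping absent classes."""
    ious = []
    for c in range(len(matrix)):
        intersection = matrix[c][c]
        gt_count = sum(matrix[c])
        pred_count = sum(row[c] for row in matrix)
        union = gt_count + pred_count - intersection
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Flattened ground-truth and predicted masks for a toy 2x3 image.
gt_mask = [0, 0, 1, 1, 2, 2]
pred_mask = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(gt_mask, pred_mask, num_classes=3)
accuracy = pixel_accuracy(cm)
miou = mean_iou(cm)
```

Pixel accuracy rewards large, easy regions such as background, while mean IoU weights every class equally, so the two values are usually reported together.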

The necessary evaluation configuration is defined in [`test_config.py`](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):

```
@dataclass
class TestSegmConfig:
    frame_size: int = 500
    img_root_dir: str = "./VOC2012"
    img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
    img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
    # reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
    segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
    colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
```

These values can be modified to suit the chosen model pipeline.

To initiate the evaluation of the PyTorch FCN ResNet-50, run the following line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50
```

#### Test Mode

The following line runs the module in test mode, which walks through the steps of model inference:

```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name <pytorch_segm_model_name> --test True --default_img_preprocess <True/False> --evaluate False
```

Here the `default_img_preprocess` key defines whether you'd like to parametrize the model test process with particular values or use the defaults, for example for `scale`, `mean` or `std`.

The test configuration is represented by the `TestSegmModuleConfig` class in [`test_config.py`](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):

```
@dataclass
class TestSegmModuleConfig:
    segm_test_data_dir: str = "test_data/sem_segm"
    test_module_name: str = "segmentation"
    test_module_path: str = "segmentation.py"
    input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
    model: str = ""

    frame_height: str = str(TestSegmConfig.frame_size)
    frame_width: str = str(TestSegmConfig.frame_size)
    scale: float = 1.0
    mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    std: List[float] = field(default_factory=list)
    crop: bool = False
    rgb: bool = True
    classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
```

The default image preprocessing options are defined in `default_preprocess_config.py`:

```
pytorch_segm_input_blob = {
    "mean": ["123.675", "116.28", "103.53"],
    "scale": str(1 / 255.0),
    "std": ["0.229", "0.224", "0.225"],
    "crop": "False",
    "rgb": "True"
}
```
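The options above are stored as strings. If you want to reuse them in a direct `cv2.dnn.blobFromImage` call, they have to be converted to numbers and booleans first. A minimal sketch follows; `parse_blob_config` is a hypothetical helper, not part of `dnn_model_runner`:

```python
# Values copied from the default_preprocess_config.py excerpt above.
pytorch_segm_input_blob = {
    "mean": ["123.675", "116.28", "103.53"],
    "scale": str(1 / 255.0),
    "std": ["0.229", "0.224", "0.225"],
    "crop": "False",
    "rgb": "True",
}


def parse_blob_config(config):
    """Hypothetical helper: convert string options to Python types."""
    return {
        "mean": [float(v) for v in config["mean"]],
        "scalefactor": float(config["scale"]),
        "std": [float(v) for v in config["std"]],
        "crop": config["crop"] == "True",
        "swapRB": config["rgb"] == "True",
    }


params = parse_blob_config(pytorch_segm_input_blob)
```

Note that the `mean` values are the per-channel means `0.485`, `0.456`, `0.406` already multiplied by `255`, matching the preprocessing order discussed in "Model Conversion Pipeline". `cv2.dnn.blobFromImage` has no `std` argument, so the division by `std` is applied to the blob afterwards, as in the tutorial code.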

The model testing is based on `samples/dnn/segmentation.py`. `segmentation.py` can also be run on its own, given the converted model in `--input` and the parameters for `cv2.dnn.blobFromImage`.

To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with `dnn_model_runner`, execute the following line:

```
python -m dnn_model_runner.dnn_conversion.pytorch.segmentation.py_to_py_segm --model_name fcnresnet50 --test True --default_img_preprocess True --evaluate False
```

## [Conversion of TensorFlow Segmentation Models and Launch with OpenCV {#tf_segm_tutorial_dnn_conversion}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_pytorch_tf_segmentation/tf_sem_segm_model_conversion_tutorial/)

## Goals

In this tutorial you will learn how to:

-   convert TensorFlow (TF) segmentation models
-   run converted TensorFlow model with OpenCV
-   obtain an evaluation of the TensorFlow and OpenCV DNN models

We will explore the points listed above using the DeepLab architecture as an example.

## Introduction

The key concepts of the conversion pipeline for [TensorFlow classification](https://link_to_cls_tutorial) and segmentation models with the OpenCV API are almost the same, except for the graph optimization phase. The first step in converting a TensorFlow model into a cv.dnn.Net is obtaining the frozen TF model graph. A frozen graph combines the model graph structure with the stored values of the required variables, for example, weights. The frozen graph is usually saved in a [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) (`.pb`) file. To read the generated segmentation model `.pb` file with cv.dnn.readNetFromTensorflow, the graph must first be modified with the TF [graph transform tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms).

## Practice

In this part we are going to cover the following points:

1.  create a TF segmentation model conversion pipeline and provide the inference
2.  evaluate and test TF segmentation models

If you only want to run the evaluation or test pipelines, you can skip the "Model Conversion Pipeline" part.

### Model Conversion Pipeline

The code in this subchapter is located in the `dnn_model_runner` module and can be executed with the line:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_deeplab
```

TensorFlow segmentation models can be found in [TensorFlow Research Models](https://github.com/tensorflow/models/tree/master/research/#tensorflow-research-models) section, which contains the implementations of models on the basis of published research papers. We will retrieve the archive with the pre-trained TF DeepLabV3 from the below link:

```
http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz
```

The full pipeline for obtaining the frozen graph is described in `deeplab_retrievement.py`:

```
def get_deeplab_frozen_graph():
    # define model path to download
    models_url = 'http://download.tensorflow.org/models/'
    mobilenetv2_voctrainval = 'deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz'

    # construct model link to download
    model_link = models_url + mobilenetv2_voctrainval

    try:
        urllib.request.urlretrieve(model_link, mobilenetv2_voctrainval)
    except Exception:
        print("TF DeepLabV3 was not retrieved: {}".format(model_link))
        return

    tf_model_tar = tarfile.open(mobilenetv2_voctrainval)

    # iterate the obtained model archive
    for model_tar_elem in tf_model_tar.getmembers():
        # check whether the model archive contains frozen graph
        if TF_FROZEN_GRAPH_NAME in os.path.basename(model_tar_elem.name):
            # extract frozen graph
            tf_model_tar.extract(model_tar_elem, FROZEN_GRAPH_PATH)

    tf_model_tar.close()
```

After running this script:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.deeplab_retrievement
```

we will get `frozen_inference_graph.pb` in `deeplab/deeplabv3_mnv2_pascal_trainval`.
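The member filtering used in `get_deeplab_frozen_graph` can be tried without downloading anything. The sketch below builds a small in-memory archive and applies the same base-name check. The value of `TF_FROZEN_GRAPH_NAME` and the second member name are assumptions made for illustration:

```python
import io
import os
import tarfile

TF_FROZEN_GRAPH_NAME = "frozen_inference_graph"  # assumed value of the constant


def build_demo_archive():
    """Create a small in-memory .tar.gz that mimics the model archive layout."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in ("deeplabv3_mnv2_pascal_trainval/frozen_inference_graph.pb",
                     "deeplabv3_mnv2_pascal_trainval/other_file.bin"):  # made up
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def find_frozen_graph_members(archive_file):
    """Return names of members whose file name contains the graph name."""
    with tarfile.open(fileobj=archive_file, mode="r:gz") as tar:
        return [member.name for member in tar.getmembers()
                if TF_FROZEN_GRAPH_NAME in os.path.basename(member.name)]


found = find_frozen_graph_members(build_demo_archive())
```

Opening the archive in a `with` block closes it even if an error occurs while iterating over its members.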

Before loading the network with OpenCV, the extracted `frozen_inference_graph.pb` needs to be optimized. To optimize the graph we use TF `TransformGraph` with default parameters:

```
DEFAULT_OPT_GRAPH_NAME = "optimized_frozen_inference_graph.pb"
DEFAULT_INPUTS = "sub_7"
DEFAULT_OUTPUTS = "ResizeBilinear_3"
DEFAULT_TRANSFORMS = "remove_nodes(op=Identity)" \
                     " merge_duplicate_nodes" \
                     " strip_unused_nodes" \
                     " fold_constants(ignore_errors=true)" \
                     " fold_batch_norms" \
                     " fold_old_batch_norms"

def optimize_tf_graph(
        in_graph,
        out_graph=DEFAULT_OPT_GRAPH_NAME,
        inputs=DEFAULT_INPUTS,
        outputs=DEFAULT_OUTPUTS,
        transforms=DEFAULT_TRANSFORMS,
        is_manual=True,
        was_optimized=True
):
    # ...

    tf_opt_graph = TransformGraph(
        tf_graph,
        inputs,
        outputs,
        transforms
    )
```

To run the graph optimization process, execute the line:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.tf_graph_optimizer --in_graph deeplab/deeplabv3_mnv2_pascal_trainval/frozen_inference_graph.pb
```

As a result, the `deeplab/deeplabv3_mnv2_pascal_trainval` directory will contain `optimized_frozen_inference_graph.pb`.

Now that we have obtained the model graphs, let's examine the following steps:
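The default transforms are packed into one space-separated string. To our knowledge, the TF Python wrapper documents its `inputs`, `outputs` and `transforms` arguments as lists of strings, so string defaults like these would typically be split before the call. How the elided part of `optimize_tf_graph` handles this is not shown in this tutorial; the sketch below is only an illustration, and `to_list` is a hypothetical helper:

```python
DEFAULT_INPUTS = "sub_7"
DEFAULT_OUTPUTS = "ResizeBilinear_3"
DEFAULT_TRANSFORMS = "remove_nodes(op=Identity)" \
                     " merge_duplicate_nodes" \
                     " strip_unused_nodes" \
                     " fold_constants(ignore_errors=true)" \
                     " fold_batch_norms" \
                     " fold_old_batch_norms"


def to_list(value, separator=None):
    """Hypothetical helper: split a delimited string into a list of names."""
    return [item for item in value.split(separator) if item]


inputs = to_list(DEFAULT_INPUTS, ",")      # comma-separated node names
outputs = to_list(DEFAULT_OUTPUTS, ",")
transforms = to_list(DEFAULT_TRANSFORMS)   # whitespace-separated transforms
```

Splitting on whitespace works here because none of the transform specifications contains a space inside its parentheses.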

1.  read TF `frozen_inference_graph.pb` graph
2.  read optimized TF frozen graph with OpenCV API
3.  prepare input data
4.  provide inference
5.  get colored masks from predictions
6.  visualize results

```
# get TF model graph from the obtained frozen graph
deeplab_graph = read_deeplab_frozen_graph(deeplab_frozen_graph_path)

# read DeepLab frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
print("OpenCV model was successfully read. Model layers: \n", opencv_net.getLayerNames())

# get processed image
original_img_shape, tf_input_blob, opencv_input_img = get_processed_imgs("test_data/sem_segm/2007_000033.jpg")

# obtain OpenCV DNN predictions
opencv_prediction = get_opencv_dnn_prediction(opencv_net, opencv_input_img)

# obtain TF model predictions
tf_prediction = get_tf_dnn_prediction(deeplab_graph, tf_input_blob)

# get PASCAL VOC classes and colors
pascal_voc_classes, pascal_voc_colors = read_colors_info("test_data/sem_segm/pascal-classes.txt")

# obtain colored segmentation masks
opencv_colored_mask = get_colored_mask(original_img_shape, opencv_prediction, pascal_voc_colors)
tf_colored_mask = get_tf_colored_mask(original_img_shape, tf_prediction, pascal_voc_colors)

# obtain palette of PASCAL VOC colors
color_legend = get_legend(pascal_voc_classes, pascal_voc_colors)

cv2.imshow('TensorFlow Colored Mask', tf_colored_mask)
cv2.imshow('OpenCV DNN Colored Mask', opencv_colored_mask)

cv2.imshow('Color Legend', color_legend)
```

To provide the model inference we will use the below picture from the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) validation dataset:

The target segmented result is:

To decode the PASCAL VOC colors and map them to the predicted masks, we also need the `pascal-classes.txt` file, which lists all PASCAL VOC classes and their corresponding colors.

Let's look at each step in detail, using the pretrained TF DeepLabV3 MobileNetV2 as an example:

-   read TF `frozen_inference_graph.pb` graph:

```
# init deeplab model graph
model_graph = tf.Graph()

# parse the serialized frozen graph while the file is still open
with tf.io.gfile.GFile(frozen_graph_path, 'rb') as graph_file:
    tf_model_graph = GraphDef()
    tf_model_graph.ParseFromString(graph_file.read())

# import the parsed graph definition into the model graph
with model_graph.as_default():
    tf.import_graph_def(tf_model_graph, name='')
```

-   read optimized TF frozen graph with OpenCV API:

```
# read DeepLab frozen graph with OpenCV API
opencv_net = cv2.dnn.readNetFromTensorflow(opt_deeplab_frozen_graph_path)
```

-   prepare input data with cv2.dnn.blobFromImage function:

```
# read the image
input_img = cv2.imread(img_path, cv2.IMREAD_COLOR)
input_img = input_img.astype(np.float32)

# preprocess image for TF model input
tf_preproc_img = cv2.resize(input_img, (513, 513))
tf_preproc_img = cv2.cvtColor(tf_preproc_img, cv2.COLOR_BGR2RGB)

# define preprocess parameters for OpenCV DNN
mean = np.array([1.0, 1.0, 1.0]) * 127.5
scale = 1 / 127.5

# prepare input blob to fit the model input:
# 1. subtract mean
# 2. scale to set pixel values from -1 to 1
input_blob = cv2.dnn.blobFromImage(
    image=input_img,
    scalefactor=scale,
    size=(513, 513),  # img target size
    mean=mean,
    swapRB=True,  # BGR -> RGB
    crop=False  # center crop
)
```

Please pay attention to the preprocessing order in the `cv2.dnn.blobFromImage` function. First the mean value is subtracted, and only then are the pixel values multiplied by the defined scale. Therefore, to reproduce the TF image preprocessing pipeline, we multiply `mean` by `127.5`. Another important point is image preprocessing for TF DeepLab. To pass the image into the TF model we only need to construct an input of the appropriate shape; the rest of the image preprocessing is described in [feature\_extractor.py](https://github.com/tensorflow/models/blob/master/research/deeplab/core/feature_extractor.py) and will be invoked automatically.
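With a mean of `127.5` and a scale of `1 / 127.5`, 8-bit pixel values end up in the range from -1 to 1. The sketch below checks this in plain Python. The formula in `tf_style` is an assumption about the TF-side preprocessing (mapping `[0, 255]` to `[-1, 1]`), and both function names are illustrative:

```python
MEAN = 127.5
SCALE = 1 / 127.5


def blob_style(pixel):
    """blobFromImage order: subtract the mean first, then multiply by the scale."""
    return (pixel - MEAN) * SCALE


def tf_style(pixel):
    """Assumed TF-side formula: 2 * x / 255 - 1."""
    return (2.0 / 255.0) * pixel - 1.0


# Largest difference between the two formulas over all 8-bit values.
max_difference = max(abs(blob_style(p) - tf_style(p)) for p in range(256))
```

`max_difference` is at the level of floating-point rounding, so both formulas describe the same mapping.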

-   provide OpenCV `cv.dnn_Net` inference:

```
# set OpenCV DNN input
opencv_net.setInput(preproc_img)

# OpenCV DNN inference
out = opencv_net.forward()
print("OpenCV DNN segmentation prediction: \n")
print("* shape: ", out.shape)

# get IDs of predicted classes
out_predictions = np.argmax(out[0], axis=0)
```

Running the code above produces the following output:

```
OpenCV DNN segmentation prediction:
* shape:  (1, 21, 513, 513)
```

Each of the 21 prediction channels, one per PASCAL VOC class (20 object classes plus background), contains a per-pixel score that indicates how likely the pixel belongs to that class.

-   provide TF model inference:

```
preproc_img = np.expand_dims(preproc_img, 0)

# init TF session
tf_session = Session(graph=model_graph)

input_tensor_name = "ImageTensor:0"
output_tensor_name = "SemanticPredictions:0"

# run inference
out = tf_session.run(
    output_tensor_name,
    feed_dict={input_tensor_name: [preproc_img]}
)

print("TF segmentation model prediction: \n")
print("* shape: ", out.shape)
```

TF inference results are the following:

```
TF segmentation model prediction:
* shape:  (1, 513, 513)
```

The TensorFlow prediction contains, for each pixel, the index of the predicted PASCAL VOC class.

-   transform OpenCV prediction into colored mask:

```
mask_height = segm_mask.shape[0]
mask_width = segm_mask.shape[1]

img_height = original_img_shape[0]
img_width = original_img_shape[1]

# convert mask values into PASCAL VOC colors
processed_mask = np.stack([colors[color_id] for color_id in segm_mask.flatten()])

# reshape mask into 3-channel image
processed_mask = processed_mask.reshape(mask_height, mask_width, 3)
processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
    np.uint8)

# convert colored mask from BGR to RGB
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```

In this step we map the predicted class IDs in the segmentation mask to the colors of the corresponding classes. Let's have a look at the results:

-   transform TF prediction into colored mask:

```
colors = np.array(colors)
processed_mask = colors[segm_mask[0]]

img_height = original_img_shape[0]
img_width = original_img_shape[1]

processed_mask = cv2.resize(processed_mask, (img_width, img_height), interpolation=cv2.INTER_NEAREST).astype(
    np.uint8)

# convert colored mask from BGR to RGB for compatibility with PASCAL VOC colors
processed_mask = cv2.cvtColor(processed_mask, cv2.COLOR_BGR2RGB)
```

The result is:

As a result, the two segmentation masks are identical.
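How closely two masks match can also be checked numerically by comparing the class IDs pixel by pixel. A minimal sketch on toy data follows; `pixel_agreement` is an illustrative helper, and the masks below are made up rather than real model output:

```python
def pixel_agreement(mask_a, mask_b):
    """Fraction of positions where two equally sized 2-D masks hold the same ID."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b))
    if not same_shape:
        raise ValueError("masks must have the same shape")
    total = sum(len(row) for row in mask_a)
    same = sum(a == b
               for row_a, row_b in zip(mask_a, mask_b)
               for a, b in zip(row_a, row_b))
    return same / total


# Toy class-ID masks standing in for the OpenCV and TF predictions.
opencv_ids = [[0, 15, 15], [0, 0, 15]]
tf_ids = [[0, 15, 15], [0, 15, 15]]

agreement = pixel_agreement(opencv_ids, tf_ids)
```

An agreement of `1.0` means the masks are identical. In practice, small differences near object borders can appear because the two frameworks may not implement every operation in exactly the same way.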

### Evaluation of the Models

The `dnn_model_runner` module, located in `samples/dnn`, makes it possible to run the full evaluation pipeline on the PASCAL VOC dataset and to test execution for the DeepLab MobileNet model.

#### Evaluation Mode

The following line runs the module in evaluation mode:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm
```

The model will be read into an OpenCV `cv.dnn_Net` object. The evaluation results of the TF and OpenCV models (pixel accuracy, mean IoU, inference time) will be written to a log file. Inference times will also be plotted in a chart to summarize the obtained model information.

The necessary evaluation configuration is defined in [`test_config.py`](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):

```
@dataclass
class TestSegmConfig:
    frame_size: int = 500
    img_root_dir: str = "./VOC2012"
    img_dir: str = os.path.join(img_root_dir, "JPEGImages/")
    img_segm_gt_dir: str = os.path.join(img_root_dir, "SegmentationClass/")
    # reduced val: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/pascal/seg11valid.txt
    segm_val_file: str = os.path.join(img_root_dir, "ImageSets/Segmentation/seg11valid.txt")
    colour_file_cls: str = os.path.join(img_root_dir, "ImageSets/Segmentation/pascal-classes.txt")
```

These values can be modified to suit the chosen model pipeline.

#### Test Mode

The following line runs the module in test mode, which walks through the steps of model inference:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess <True/False> --evaluate False
```

Here the `default_img_preprocess` key defines whether you'd like to parametrize the model test process with particular values or use the defaults, for example for `scale`, `mean` or `std`.

The test configuration is represented by the `TestSegmModuleConfig` class in [`test_config.py`](https://github.com/opencv/opencv/tree/5.x/samples/dnn/dnn_model_runner/dnn_conversion/common/test/configs/test_config.py):

```
@dataclass
class TestSegmModuleConfig:
    segm_test_data_dir: str = "test_data/sem_segm"
    test_module_name: str = "segmentation"
    test_module_path: str = "segmentation.py"
    input_img: str = os.path.join(segm_test_data_dir, "2007_000033.jpg")
    model: str = ""

    frame_height: str = str(TestSegmConfig.frame_size)
    frame_width: str = str(TestSegmConfig.frame_size)
    scale: float = 1.0
    mean: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    std: List[float] = field(default_factory=list)
    crop: bool = False
    rgb: bool = True
    classes: str = os.path.join(segm_test_data_dir, "pascal-classes.txt")
```

The default image preprocessing options are defined in `default_preprocess_config.py`:

```
tf_segm_input_blob = {
    "scale": str(1 / 127.5),
    "mean": ["127.5", "127.5", "127.5"],
    "std": [],
    "crop": "False",
    "rgb": "True"
}
```

The model testing is based on `samples/dnn/segmentation.py`. `segmentation.py` can also be run on its own, given the converted model in `--input` and the parameters for `cv2.dnn.blobFromImage`.

To reproduce from scratch the OpenCV steps described in "Model Conversion Pipeline" with `dnn_model_runner`, execute the following line:

```
python -m dnn_model_runner.dnn_conversion.tf.segmentation.py_to_py_segm --test True --default_img_preprocess True --evaluate False
```

## [High Level API: TextDetectionModel and TextRecognitionModel {#tutorial_dnn_text_spotting}](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_text_spotting/dnn_text_spotting/)


## [Dnn Yolo](https://docharvest.github.io/docs/opencv5/tutorials/dnn/dnn_yolo/dnn_yolo/)


## [Table Of Content Dnn](https://docharvest.github.io/docs/opencv5/tutorials/dnn/table_of_content_dnn/)

# Deep Neural Networks (dnn module) {#tutorial\_table\_of\_content\_dnn}

@tableofcontents

-   @subpage tutorial\_dnn\_googlenet
-   @subpage tutorial\_dnn\_openvino
-   @subpage tutorial\_dnn\_yolo
-   @subpage tutorial\_dnn\_javascript
-   @subpage tutorial\_dnn\_custom\_layers
-   @subpage tutorial\_dnn\_OCR
-   @subpage tutorial\_dnn\_text\_spotting
-   @subpage tutorial\_dnn\_face

#### PyTorch models with OpenCV

In this section you will find the guides, which describe how to run classification, segmentation and detection PyTorch DNN models with OpenCV.

-   @subpage pytorch\_cls\_tutorial\_dnn\_conversion
-   @subpage pytorch\_cls\_c\_tutorial\_dnn\_conversion
-   @subpage pytorch\_segm\_tutorial\_dnn\_conversion

#### TensorFlow models with OpenCV

In this section you will find the guides, which describe how to run classification, segmentation and detection TensorFlow DNN models with OpenCV.

-   @subpage tf\_cls\_tutorial\_dnn\_conversion
-   @subpage tf\_det\_tutorial\_dnn\_conversion
-   @subpage tf\_segm\_tutorial\_dnn\_conversion

## [Akaze Matching](https://docharvest.github.io/docs/opencv5/tutorials/features/akaze_matching/akaze_matching/)

# AKAZE local features matching {#tutorial\_akaze\_matching}

@tableofcontents

@prev\_tutorial{tutorial\_detection\_of\_planar\_objects} @next\_tutorial{tutorial\_akaze\_tracking}

Original author

Fedor Morozov

Compatibility

OpenCV >= 3.0

## Introduction

In this tutorial we will learn how to use AKAZE @cite ANB13 local features to detect and match keypoints on two images. We will find keypoints on a pair of images with a given homography matrix, match them and count the number of inliers (i.e. matches that fit the given homography).

You can find an expanded version of this example here: [https://github.com/pablofdezalc/test\_kaze\_akaze\_opencv](https://github.com/pablofdezalc/test_kaze_akaze_opencv)

\\warning You need the [OpenCV contrib module _xfeatures2d_](https://github.com/opencv/opencv_contrib/tree/5.x/modules/xfeatures2d) to be able to use the AKAZE features.

## Data

We are going to use images 1 and 3 from _Graffiti_ sequence of [Oxford dataset](http://www.robots.ox.ac.uk/~vgg/data/data-aff.html).

Homography is given by a 3 by 3 matrix:

@code{.none}
7.6285898e-01 -2.9922929e-01 2.2567123e+02
3.3443473e-01 1.0143901e+00 -7.6999973e+01
3.4663091e-04 -1.4364524e-05 1.0000000e+00
@endcode

You can find the images (_graf1.png_, _graf3.png_) and homography (_H1to3p.xml_) in _opencv/samples/data/_.

### Source Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/AKAZE_match.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/features/AKAZE\_match.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/akaze_matching/AKAZEMatchDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/akaze_matching/AKAZE_match.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py @end\_toggle
    

### Explanation

-   **Load images and homography**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py load @end\_toggle

We are loading grayscale images here. The homography is stored in an XML file created with FileStorage.

-   **Detect keypoints and compute descriptors using AKAZE**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp AKAZE @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java AKAZE @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py AKAZE @end\_toggle

We create an AKAZE object, then detect keypoints and compute their descriptors. Since we don't need the _mask_ parameter, _noArray()_ is used.

-   **Use brute-force matcher to find 2-nn matches**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp 2-nn matching @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java 2-nn matching @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py 2-nn matching @end\_toggle

We use the Hamming distance, because AKAZE uses a binary descriptor by default.
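The Hamming distance between two binary descriptors is the number of bit positions in which they differ. A minimal sketch in plain Python follows; the descriptors are made up and much shorter than real AKAZE descriptors:

```python
def hamming_distance(desc_a, desc_b):
    """Number of differing bits between two equally long binary descriptors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(desc_a, desc_b))


# Two made-up 4-byte descriptors.
d1 = bytes([0b10110000, 0x00, 0xFF, 0x0F])
d2 = bytes([0b10010001, 0x00, 0xFF, 0xF0])

distance = hamming_distance(d1, d2)
```

Because the distance only needs an XOR and a bit count, matching binary descriptors is cheap compared with computing Euclidean distances between floating-point descriptors.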

-   **Use 2-nn matches and ratio criterion to find correct keypoint matches**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp ratio test filtering @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java ratio test filtering @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py ratio test filtering @end\_toggle

If the closest match distance is significantly lower than the second closest one, then the match is correct (match is not ambiguous).
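The ratio criterion can be sketched independently of OpenCV. In the example below the threshold value `0.8` and the distances are assumptions chosen for illustration, and `ratio_test` is not an OpenCV function:

```python
NN_MATCH_RATIO = 0.8  # assumed threshold; tune it for your data


def ratio_test(knn_distances, ratio=NN_MATCH_RATIO):
    """Keep indices whose best distance is clearly below the second-best one."""
    kept = []
    for index, (best, second_best) in enumerate(knn_distances):
        if best < ratio * second_best:
            kept.append(index)
    return kept


# Made-up (best, second-best) distances for four query keypoints.
knn_distances = [(20, 90), (75, 80), (40, 50), (10, 100)]

good = ratio_test(knn_distances)
```

Only the first and last keypoints pass: in the other two cases the two nearest neighbours are almost equally close, so the match is ambiguous.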

-   **Check if our matches fit in the homography model**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp homography check @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java homography check @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py homography check @end\_toggle

If the distance from the projection of the first keypoint to the second keypoint is less than the threshold, then the match fits the homography model.

We create a new set of matches for the inliers, because it is required by the drawing function.
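The homography check can be written out by hand using the matrix from the Data section. The sketch below projects a point with homogeneous coordinates and compares the distance with a threshold; the threshold value `2.5` and the helper names are assumptions for illustration:

```python
import math

# Homography from the Data section.
H = [
    [7.6285898e-01, -2.9922929e-01, 2.2567123e+02],
    [3.3443473e-01, 1.0143901e+00, -7.6999973e+01],
    [3.4663091e-04, -1.4364524e-05, 1.0000000e+00],
]
INLIER_THRESHOLD = 2.5  # assumed distance threshold in pixels


def project(h, point):
    """Apply a 3x3 homography to a 2-D point using homogeneous coordinates."""
    x, y = point
    col = [h[r][0] * x + h[r][1] * y + h[r][2] for r in range(3)]
    return col[0] / col[2], col[1] / col[2]  # divide by the third coordinate


def is_inlier(h, point1, point2, threshold=INLIER_THRESHOLD):
    """True if the projection of point1 lands within threshold of point2."""
    px, py = project(h, point1)
    return math.hypot(px - point2[0], py - point2[1]) < threshold


origin_projected = project(H, (0.0, 0.0))
near = is_inlier(H, (0.0, 0.0), (226.0, -77.0))
far = is_inlier(H, (0.0, 0.0), (230.0, -77.0))
```

The origin of the first image is projected to the translation part of the matrix, so `near` is true and `far` is false for this threshold.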

-   **Output results**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/features/AKAZE\_match.cpp draw final matches @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/akaze\_matching/AKAZEMatchDemo.java draw final matches @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/akaze\_matching/AKAZE\_match.py draw final matches @end\_toggle

Here we save the resulting image and print some statistics.

## Results

### Found matches

Depending on your OpenCV version, you should get results consistent with:

@code{.none}
Keypoints 1: 2943
Keypoints 2: 3511
Matches: 447
Inliers: 308
Inlier Ratio: 0.689038
@endcode

## [Akaze Tracking](https://docharvest.github.io/docs/opencv5/tutorials/features/akaze_tracking/akaze_tracking/)


## [Detection Of Planar Objects](https://docharvest.github.io/docs/opencv5/tutorials/features/detection_of_planar_objects/detection_of_planar_objects/)

# Detection of planar objects {#tutorial\_detection\_of\_planar\_objects}

@tableofcontents

@prev\_tutorial{tutorial\_feature\_homography} @next\_tutorial{tutorial\_akaze\_matching}

Original author

Victor Eruhimov

Compatibility

OpenCV >= 3.0

The goal of this tutorial is to learn how to use _features_ and _calib3d_ modules for detecting known planar objects in scenes.

_Test data_: use images in your data folder, for instance, box.png and box\_in\_scene.png.

-   Create a new console project. Read two input images:
    
    ```
    Mat img1 = imread(argv[1], IMREAD_GRAYSCALE);
    Mat img2 = imread(argv[2], IMREAD_GRAYSCALE);
    ```
    
-   Detect keypoints in both images and compute descriptors for each of the keypoints:
    
    ```
    // detecting keypoints
    Ptr<Feature2D> surf = SURF::create();
    vector<KeyPoint> keypoints1;
    Mat descriptors1;
    surf->detectAndCompute(img1, Mat(), keypoints1, descriptors1);
    
    ... // do the same for the second image
    ```
    
-   Now, find the closest matches between the descriptors of the first image and those of the second:
    
    ```
    // matching descriptors
    BFMatcher matcher(NORM_L2);
    vector<DMatch> matches;
    matcher.match(descriptors1, descriptors2, matches);
    ```
    
-   Visualize the results:
    
    ```
    // drawing the results
    namedWindow("matches", 1);
    Mat img_matches;
    drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
    imshow("matches", img_matches);
    waitKey(0);
    ```
    
-   Find the homography transformation between two sets of points:
    
    ```
    vector<Point2f> points1, points2;
    // fill the arrays with the points
    ....
    Mat H = findHomography(Mat(points1), Mat(points2), RANSAC, ransacReprojThreshold);
    ```
    
-   Create a set of inlier matches and draw them. Use the perspectiveTransform function to map points with the homography:
    
    `Mat points1Projected; perspectiveTransform(Mat(points1), points1Projected, H);`
    
-   Use drawMatches for drawing inliers.

## [Feature Description](https://docharvest.github.io/docs/opencv5/tutorials/features/feature_description/feature_description/)

# Feature Description {#tutorial\_feature\_description}

@tableofcontents

@prev\_tutorial{tutorial\_feature\_detection} @next\_tutorial{tutorial\_feature\_flann\_matcher}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the @ref cv::DescriptorExtractor interface in order to find the feature vector corresponding to the keypoints. Specifically:
    -   Use cv::xfeatures2d::SURF and its function cv::xfeatures2d::SURF::compute to perform the required calculations.
    -   Use a @ref cv::DescriptorMatcher to match the feature vectors.
    -   Use the function @ref cv::drawMatches to draw the detected matches.

\\warning You need the [OpenCV contrib modules](https://github.com/opencv/opencv_contrib) to be able to use the SURF features (alternatives are ORB, KAZE, ... features).

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/feature_description/SURF_matching_Demo.cpp) @include samples/cpp/tutorial\_code/features/feature\_description/SURF\_matching\_Demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/feature_description/SURFMatchingDemo.java) @include samples/java/tutorial\_code/features/feature\_description/SURFMatchingDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/feature_description/SURF_matching_Demo.py) @include samples/python/tutorial\_code/features/feature\_description/SURF\_matching\_Demo.py @end\_toggle

## Explanation

## Result

Here is the result after applying the BruteForce matcher between the two original images:

## [Feature Detection](https://docharvest.github.io/docs/opencv5/tutorials/features/feature_detection/feature_detection/)

# Feature Detection {#tutorial\_feature\_detection}

@tableofcontents

@prev\_tutorial{tutorial\_corner\_subpixels} @next\_tutorial{tutorial\_feature\_description}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the @ref cv::FeatureDetector interface in order to find interest points. Specifically:
    -   Use the cv::xfeatures2d::SURF and its function cv::xfeatures2d::SURF::detect to perform the detection process
    -   Use the function @ref cv::drawKeypoints to draw the detected keypoints

\\warning You need the [OpenCV contrib modules](https://github.com/opencv/opencv_contrib) to be able to use the SURF features (alternatives are ORB, KAZE, ... features).

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/feature_detection/SURF_detection_Demo.cpp) @include samples/cpp/tutorial\_code/features/feature\_detection/SURF\_detection\_Demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/feature_detection/SURFDetectionDemo.java) @include samples/java/tutorial\_code/features/feature\_detection/SURFDetectionDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/feature_detection/SURF_detection_Demo.py) @include samples/python/tutorial\_code/features/feature\_detection/SURF\_detection\_Demo.py @end\_toggle

## Explanation

## Result

\-# Here is the result of the feature detection applied to the `box.png` image:

![](images/Feature_Detection_Result_a.jpg)

\-# And here is the result for the `box_in_scene.png` image:

![](images/Feature_Detection_Result_b.jpg)

## [Feature Flann Matcher](https://docharvest.github.io/docs/opencv5/tutorials/features/feature_flann_matcher/feature_flann_matcher/)

# Feature Matching with FLANN {#tutorial\_feature\_flann\_matcher}

@tableofcontents

@prev\_tutorial{tutorial\_feature\_description} @next\_tutorial{tutorial\_feature\_homography}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the @ref cv::FlannBasedMatcher interface to perform quick and efficient matching with the @ref flann module

\\warning You need the [OpenCV contrib modules](https://github.com/opencv/opencv_contrib) to be able to use the SURF features (alternatives are ORB, KAZE, ... features).

## Theory

Classical feature descriptors (SIFT, SURF, ...) are usually compared and matched using the Euclidean distance (or L2-norm). Since SIFT and SURF descriptors represent the histogram of oriented gradients (of the Haar wavelet responses for SURF) in a neighborhood, alternatives to the Euclidean distance are histogram-based metrics (\\f$ \\chi^{2} \\f$, Earth Mover’s Distance (EMD), ...).

Arandjelovic et al. proposed in @cite Arandjelovic:2012:TTE:2354409.2355123 to extend SIFT to the RootSIFT descriptor:

> a square root (Hellinger) kernel instead of the standard Euclidean distance to measure the similarity between SIFT descriptors leads to a dramatic performance boost in all stages of the pipeline.
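As a rough illustration of the idea, RootSIFT is commonly described as L1-normalizing a SIFT descriptor and then taking the element-wise square root, so that the Euclidean distance between the transformed vectors corresponds to the Hellinger kernel on the originals. The sketch below assumes that formulation; `rootSift` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts one SIFT descriptor (non-negative histogram values) to RootSIFT:
// L1-normalize, then take the element-wise square root.
std::vector<float> rootSift(const std::vector<float>& descriptor)
{
    float l1 = 0.0f;
    for (float v : descriptor)
    {
        assert(v >= 0.0f);  // SIFT histogram entries are non-negative
        l1 += v;
    }
    std::vector<float> result(descriptor.size(), 0.0f);
    if (l1 <= 0.0f)
        return result;  // all-zero descriptor: nothing to normalize
    for (std::size_t i = 0; i < descriptor.size(); ++i)
        result[i] = std::sqrt(descriptor[i] / l1);
    return result;
}
```

A side effect of this construction is that the output has unit L2 norm, because the squared entries sum to the L1-normalized total of one.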

Binary descriptors (ORB, BRISK, ...) are matched using the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance). This distance is equivalent to counting the number of differing elements in binary strings (population count after applying an XOR operation): \\f\[ d\_{hamming} \\left ( a,b \\right ) = \\sum\_{i=0}^{n-1} \\left ( a\_i \\oplus b\_i \\right ) \\f\]
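The XOR-then-popcount formulation can be written in a few lines. This is a minimal sketch, assuming descriptors are stored as packed bytes; `hammingDistance` is a hypothetical helper name.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two binary descriptors stored as packed bytes.
// Both descriptors must have the same length.
std::size_t hammingDistance(const std::vector<std::uint8_t>& a,
                            const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // XOR leaves a 1 wherever the two bytes differ; count those bits.
        distance += std::bitset<8>(static_cast<unsigned>(a[i] ^ b[i])).count();
    }
    return distance;
}
```

In OpenCV, the same quantity is available through `cv::norm` with `cv::NORM_HAMMING`, and matchers constructed with that norm type use it internally.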

To filter the matches, Lowe proposed in @cite Lowe04 a distance ratio test to try to eliminate false matches. The distance ratio between the two nearest matches of a considered keypoint is computed, and the match is considered good when this value is below a threshold. Indeed, this ratio helps to discriminate between ambiguous matches (distance ratio between the two nearest neighbors close to one) and well-discriminated matches. The figure below from the SIFT paper illustrates the probability that a match is correct based on the nearest-neighbor distance ratio test.
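The ratio test itself is short. The sketch below assumes that the two nearest neighbours of each query descriptor have already been found and sorted by distance; `Match` is a hypothetical stand-in whose fields mirror those of `cv::DMatch`, and `ratioTest` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a match record: indices plus descriptor distance.
struct Match
{
    int queryIdx;
    int trainIdx;
    float distance;
};

// knnMatches[i] holds the nearest neighbours of query descriptor i, sorted by
// increasing distance. The best match is kept only when it is clearly closer
// than the second best.
std::vector<Match> ratioTest(const std::vector<std::vector<Match>>& knnMatches,
                             float ratioThreshold)
{
    assert(ratioThreshold > 0.0f);
    std::vector<Match> good;
    for (const auto& candidates : knnMatches)
    {
        if (candidates.size() < 2)
            continue;  // the ratio is undefined without a second neighbour
        if (candidates[0].distance < ratioThreshold * candidates[1].distance)
            good.push_back(candidates[0]);
    }
    return good;
}
```

The threshold is a tuning parameter: lower values reject more ambiguous matches at the cost of discarding some correct ones.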

Alternative or additional filtering tests are:

-   cross check test (good match \\f$ \\left( f\_a, f\_b \\right) \\f$ if feature \\f$ f\_b \\f$ is the best match for \\f$ f\_a \\f$ in \\f$ I\_b \\f$ and feature \\f$ f\_a \\f$ is the best match for \\f$ f\_b \\f$ in \\f$ I\_a \\f$)
-   geometric test (eliminate matches that do not fit a geometric model, e.g. RANSAC or robust homography for planar objects)
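The cross check test can be sketched as follows, assuming the best match of every feature has already been computed in both directions. `crossCheck` and the index-array representation are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// bestAtoB[i] is the index in image B of the best match for feature i of
// image A (or -1 when there is none); bestBtoA is the reverse mapping.
// Returns the (a, b) pairs that are each other's best match.
std::vector<std::pair<int, int>> crossCheck(const std::vector<int>& bestAtoB,
                                            const std::vector<int>& bestBtoA)
{
    std::vector<std::pair<int, int>> mutual;
    for (std::size_t a = 0; a < bestAtoB.size(); ++a)
    {
        const int b = bestAtoB[a];
        if (b < 0)
            continue;  // feature a has no match in B
        assert(static_cast<std::size_t>(b) < bestBtoA.size());
        if (bestBtoA[static_cast<std::size_t>(b)] == static_cast<int>(a))
            mutual.emplace_back(static_cast<int>(a), b);
    }
    return mutual;
}
```

In OpenCV, `cv::BFMatcher` exposes this behaviour through the `crossCheck` argument of its constructor.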

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/feature_flann_matcher/SURF_FLANN_matching_Demo.cpp) @include samples/cpp/tutorial\_code/features/feature\_flann\_matcher/SURF\_FLANN\_matching\_Demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/feature_flann_matcher/SURFFLANNMatchingDemo.java) @include samples/java/tutorial\_code/features/feature\_flann\_matcher/SURFFLANNMatchingDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/feature_flann_matcher/SURF_FLANN_matching_Demo.py) @include samples/python/tutorial\_code/features/feature\_flann\_matcher/SURF\_FLANN\_matching\_Demo.py @end\_toggle

## Explanation

## Result

-   Here is the result of the SURF feature matching using the distance ratio test:

## [Feature Homography](https://docharvest.github.io/docs/opencv5/tutorials/features/feature_homography/feature_homography/)

# Features + Homography to find a known object {#tutorial\_feature\_homography}

@tableofcontents

@prev\_tutorial{tutorial\_feature\_flann\_matcher} @next\_tutorial{tutorial\_detection\_of\_planar\_objects}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the function @ref cv::findHomography to find the transform between matched keypoints.
-   Use the function @ref cv::perspectiveTransform to map the points.

\\warning You need the [OpenCV contrib modules](https://github.com/opencv/opencv_contrib) to be able to use the SURF features (alternatives are ORB, KAZE, ... features).

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/feature_homography/SURF_FLANN_matching_homography_Demo.cpp) @include samples/cpp/tutorial\_code/features/feature\_homography/SURF\_FLANN\_matching\_homography\_Demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/feature_homography/SURFFLANNMatchingHomographyDemo.java) @include samples/java/tutorial\_code/features/feature\_homography/SURFFLANNMatchingHomographyDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/feature_homography/SURF_FLANN_matching_homography_Demo.py) @include samples/python/tutorial\_code/features/feature\_homography/SURF\_FLANN\_matching\_homography\_Demo.py @end\_toggle

## Explanation

## Result

-   And here is the result for the detected object (highlighted in green). Note that since the homography is estimated with a RANSAC approach, false matches that are rejected as outliers do not affect the homography calculation.

## [Homography](https://docharvest.github.io/docs/opencv5/tutorials/features/homography/homography/)

# Basic concepts of the homography explained with code {#tutorial\_homography}

@tableofcontents

@prev\_tutorial{tutorial\_akaze\_tracking}

| | |
| -: | :- |
| Compatibility | OpenCV >= 3.0 |

# Introduction {#tutorial\_homography\_Introduction}

This tutorial demonstrates the basic concepts of the homography with some code. For detailed explanations of the theory, please refer to a computer vision course or a computer vision book, e.g.:

-   Multiple View Geometry in Computer Vision, Richard Hartley and Andrew Zisserman, @cite HartleyZ00 (some sample chapters are available [here](https://www.robots.ox.ac.uk/~vgg/hzbook/), CVPR Tutorials are available [here](https://www.robots.ox.ac.uk/~az/tutorials/))
-   An Invitation to 3-D Vision: From Images to Geometric Models, Yi Ma, Stefano Soatto, Jana Kosecka, and S. Shankar Sastry, @cite Ma:2003:IVI (a computer vision book handout is available [here](https://cs.gmu.edu/%7Ekosecka/cs685/VisionBookHandout.pdf))
-   Computer Vision: Algorithms and Applications, Richard Szeliski, @cite RS10 (an electronic version is available [here](https://szeliski.org/Book/))
-   Deeper understanding of the homography decomposition for vision-based control, Ezio Malis, Manuel Vargas, @cite Malis2007 (open access [here](https://hal.inria.fr/inria-00174036))
-   Pose Estimation for Augmented Reality: A Hands-On Survey, Eric Marchand, Hideaki Uchiyama, Fabien Spindler, @cite Marchand16 (open access [here](https://hal.inria.fr/hal-01246370))

The tutorial code can be found here [C++](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/features/Homography), [Python](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/features/Homography), [Java](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/features/Homography). The images used in this tutorial can be found [here](https://github.com/opencv/opencv/tree/5.x/samples/data) (`left*.jpg`).

## Basic theory {#tutorial\_homography\_Basic\_theory}

### What is the homography matrix? {#tutorial\_homography\_What\_is\_the\_homography\_matrix}

Briefly, the planar homography relates the transformation between two planes (up to a scale factor):

\\f\[ s \\begin{bmatrix} x^{'} \\\\ y^{'} \\\\ 1 \\end{bmatrix} = \\mathbf{H} \\begin{bmatrix} x \\\\ y \\\\ 1 \\end{bmatrix} = \\begin{bmatrix} h\_{11} & h\_{12} & h\_{13} \\\\ h\_{21} & h\_{22} & h\_{23} \\\\ h\_{31} & h\_{32} & h\_{33} \\end{bmatrix} \\begin{bmatrix} x \\\\ y \\\\ 1 \\end{bmatrix} \\f\]

The homography matrix is a `3x3` matrix but with 8 DoF (degrees of freedom) as it is estimated up to a scale. It is generally normalized (see also \\ref lecture\_16 "1") with \\f$ h\_{33} = 1 \\f$ or \\f$ h\_{11}^2 + h\_{12}^2 + h\_{13}^2 + h\_{21}^2 + h\_{22}^2 + h\_{23}^2 + h\_{31}^2 + h\_{32}^2 + h\_{33}^2 = 1 \\f$.
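To make the role of the scale factor concrete, the following sketch applies a homography to a point and shows the \\f$ h\_{33} = 1 \\f$ normalization. `Mat3`, `Point2`, `applyHomography` and `normalizeByH33` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Point2 { double x; double y; };

// Computes s * (x', y', 1)^T = H * (x, y, 1)^T and divides by s.
Point2 applyHomography(const Mat3& H, const Point2& p)
{
    const double sx = H[0][0] * p.x + H[0][1] * p.y + H[0][2];
    const double sy = H[1][0] * p.x + H[1][1] * p.y + H[1][2];
    const double s  = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    assert(std::abs(s) > 1e-12);  // the point would map to infinity otherwise
    return { sx / s, sy / s };
}

// Rescales H so that h33 = 1; the point mapping it represents is unchanged.
Mat3 normalizeByH33(const Mat3& H)
{
    assert(std::abs(H[2][2]) > 1e-12);
    Mat3 out = H;
    for (auto& row : out)
        for (double& v : row)
            v /= H[2][2];
    return out;
}
```

Because any non-zero multiple of `H` maps points identically, `applyHomography` returns the same result before and after `normalizeByH33`, which is why only 8 of the 9 entries are free.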

The following examples show different kinds of transformation but all relate a transformation between two planes.

-   a planar surface and the image plane (image taken from \\ref projective\_transformations "2")

-   a planar surface viewed by two camera positions (images taken from \\ref szeliski "3" and \\ref projective\_transformations "2")

-   a camera rotating around its axis of projection, which is equivalent to considering that the points lie on a plane at infinity (image taken from \\ref projective\_transformations "2")

### How can the homography transformation be useful? {#tutorial\_homography\_How\_the\_homography\_transformation\_can\_be\_useful}

-   Camera pose estimation from coplanar points, for instance for augmented reality with a marker (see the first example above)

-   Perspective removal / correction (see the second example above)

-   Panorama stitching (see the second and third examples above)

## Demonstration codes {#tutorial\_homography\_Demonstration\_codes}

### Demo 1: Pose estimation from coplanar points {#tutorial\_homography\_Demo1}

\\note Please note that the code to estimate the camera pose from the homography is only an example; you should instead use @ref cv::solvePnP if you want to estimate the camera pose for a planar or an arbitrary object.

The homography can be estimated using for instance the Direct Linear Transform (DLT) algorithm (see \\ref lecture\_16 "1" for more information). As the object is planar, the transformation between points expressed in the object frame and projected points into the image plane expressed in the normalized camera frame is a homography. Only because the object is planar, the camera pose can be retrieved from the homography, assuming the camera intrinsic parameters are known (see \\ref projective\_transformations "2" or \\ref answer\_dsp "4"). This can be tested easily using a chessboard object and `findChessboardCorners()` to get the corner locations in the image.

The first step is to detect the chessboard corners; the chessboard size (`patternSize`), here `9x6`, is required:

@snippet pose\_from\_homography.cpp find-chessboard-corners

The object points expressed in the object frame can be computed easily knowing the size of a chessboard square:

@snippet pose\_from\_homography.cpp compute-chessboard-object-points

The coordinate `Z=0` must be removed for the homography estimation part:

@snippet pose\_from\_homography.cpp compute-object-points

The image points expressed in the normalized camera frame can be computed from the corner points by applying a reverse perspective transformation using the camera intrinsics and the distortion coefficients:

@snippet pose\_from\_homography.cpp load-intrinsics

@snippet pose\_from\_homography.cpp compute-image-points

The homography can then be estimated with:

@snippet pose\_from\_homography.cpp estimate-homography

A quick solution to retrieve the pose from the homography matrix is (see \\ref pose\_ar "5"):

@snippet pose\_from\_homography.cpp pose-from-homography

\\f\[ \\begin{align\*} \\mathbf{X} &= \\left( X, Y, 0, 1 \\right ) \\\\ \\mathbf{x} &= \\mathbf{P}\\mathbf{X} \\\\ &= \\mathbf{K} \\left\[ \\mathbf{r\_1} \\hspace{0.5em} \\mathbf{r\_2} \\hspace{0.5em} \\mathbf{r\_3} \\hspace{0.5em} \\mathbf{t} \\right \] \\begin{pmatrix} X \\\\ Y \\\\ 0 \\\\ 1 \\end{pmatrix} \\\\ &= \\mathbf{K} \\left\[ \\mathbf{r\_1} \\hspace{0.5em} \\mathbf{r\_2} \\hspace{0.5em} \\mathbf{t} \\right \] \\begin{pmatrix} X \\\\ Y \\\\ 1 \\end{pmatrix} \\\\ &= \\mathbf{H} \\begin{pmatrix} X \\\\ Y \\\\ 1 \\end{pmatrix} \\end{align\*} \\f\]

\\f\[ \\begin{align\*} \\mathbf{H} &= \\lambda \\mathbf{K} \\left\[ \\mathbf{r\_1} \\hspace{0.5em} \\mathbf{r\_2} \\hspace{0.5em} \\mathbf{t} \\right \] \\\\ \\mathbf{K}^{-1} \\mathbf{H} &= \\lambda \\left\[ \\mathbf{r\_1} \\hspace{0.5em} \\mathbf{r\_2} \\hspace{0.5em} \\mathbf{t} \\right \] \\\\ \\mathbf{P} &= \\mathbf{K} \\left\[ \\mathbf{r\_1} \\hspace{0.5em} \\mathbf{r\_2} \\hspace{0.5em} \\left( \\mathbf{r\_1} \\times \\mathbf{r\_2} \\right ) \\hspace{0.5em} \\mathbf{t} \\right \] \\end{align\*} \\f\]

This is a quick solution (see also \\ref projective\_transformations "2") as it does not ensure that the resulting rotation matrix is orthogonal, and the scale is estimated roughly by normalizing the first column to unit norm.

A solution to obtain a proper rotation matrix (with the properties of a rotation matrix) is to apply a polar decomposition, or orthogonalization of the rotation matrix (see \\ref polar\_decomposition "6" or \\ref polar\_decomposition\_svd "7" or \\ref polar\_decomposition\_svd\_2 "8" or \\ref Kabsch\_algorithm "9" for some information):
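The quick solution can be sketched directly from the equations above. This is an illustration under stated assumptions rather than the sample's code: `H` is assumed to already be expressed in normalized camera coordinates (that is, \\f$ \\mathbf{K}^{-1} \\mathbf{H} \\f$), the sign of the scale \\f$ \\lambda \\f$ is not checked, and `Pose` and `poseFromHomography` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Pose { Mat3 R; Vec3 t; };

// Recovers [r1 r2 (r1 x r2) | t] from H = lambda * [r1 r2 t], estimating
// lambda as the norm of the first column of H.
Pose poseFromHomography(const Mat3& H)
{
    const double norm = std::sqrt(H[0][0] * H[0][0] + H[1][0] * H[1][0] + H[2][0] * H[2][0]);
    assert(norm > 0.0);
    Vec3 r1, r2, t;
    for (int i = 0; i < 3; ++i)
    {
        r1[i] = H[i][0] / norm;
        r2[i] = H[i][1] / norm;
        t[i]  = H[i][2] / norm;
    }
    // The third rotation column is the cross product of the first two.
    const Vec3 r3 = { r1[1] * r2[2] - r1[2] * r2[1],
                      r1[2] * r2[0] - r1[0] * r2[2],
                      r1[0] * r2[1] - r1[1] * r2[0] };
    Pose pose;
    for (int i = 0; i < 3; ++i)
        pose.R[i] = { r1[i], r2[i], r3[i] };
    pose.t = t;
    return pose;
}
```

With noisy input, `r1` and `r2` are generally neither exactly orthogonal nor of equal norm, so the matrix returned here is only approximately a rotation.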

@snippet pose\_from\_homography.cpp polar-decomposition-of-the-rotation-matrix

To check the result, the object frame projected into the image with the estimated camera pose is displayed:

### Demo 2: Perspective correction {#tutorial\_homography\_Demo2}

In this example, a source image will be transformed into a desired perspective view by computing the homography that maps the source points into the desired points. The following image shows the source image (left) and the chessboard view that we want to transform into the desired chessboard view (right).

The first step is to detect the chessboard corners in the source and desired images:

@add\_toggle\_cpp @snippet perspective\_correction.cpp find-corners @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/perspective\_correction.py find-corners @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PerspectiveCorrection.java find-corners @end\_toggle

The homography is estimated easily with:

@add\_toggle\_cpp @snippet perspective\_correction.cpp estimate-homography @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/perspective\_correction.py estimate-homography @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PerspectiveCorrection.java estimate-homography @end\_toggle

To warp the source chessboard view into the desired chessboard view, we use @ref cv::warpPerspective:

@add\_toggle\_cpp @snippet perspective\_correction.cpp warp-chessboard @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/perspective\_correction.py warp-chessboard @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PerspectiveCorrection.java warp-chessboard @end\_toggle

The result image is:

To compute the coordinates of the source corners transformed by the homography:

@add\_toggle\_cpp @snippet perspective\_correction.cpp compute-transformed-corners @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/perspective\_correction.py compute-transformed-corners @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PerspectiveCorrection.java compute-transformed-corners @end\_toggle

To check the correctness of the calculation, the matching lines are displayed:

### Demo 3: Homography from the camera displacement {#tutorial\_homography\_Demo3}

The homography relates the transformation between two planes, and it is possible to retrieve the corresponding camera displacement that takes the first plane view to the second (see @cite Malis2007 for more information). Before going into the details of computing the homography from the camera displacement, here is a brief reminder about camera pose and homogeneous transformations.

The function @ref cv::solvePnP computes the camera pose from correspondences between 3D object points (points expressed in the object frame) and the projected 2D image points (object points viewed in the image). The intrinsic parameters and the distortion coefficients are required (see the camera calibration process).

\\f\[ \\begin{align\*} s \\begin{bmatrix} u \\\\ v \\\\ 1 \\end{bmatrix} &= \\begin{bmatrix} f\_x & 0 & c\_x \\\\ 0 & f\_y & c\_y \\\\ 0 & 0 & 1 \\end{bmatrix} \\begin{bmatrix} r\_{11} & r\_{12} & r\_{13} & t\_x \\\\ r\_{21} & r\_{22} & r\_{23} & t\_y \\\\ r\_{31} & r\_{32} & r\_{33} & t\_z \\end{bmatrix} \\begin{bmatrix} X\_o \\\\ Y\_o \\\\ Z\_o \\\\ 1 \\end{bmatrix} \\\\ &= \\mathbf{K} \\hspace{0.2em} ^{c}\\mathbf{M}\_o \\begin{bmatrix} X\_o \\\\ Y\_o \\\\ Z\_o \\\\ 1 \\end{bmatrix} \\end{align\*} \\f\]

\\f$ \\mathbf{K} \\f$ is the intrinsic matrix and \\f$ ^{c}\\mathbf{M}\_o \\f$ is the camera pose. The output of @ref cv::solvePnP is exactly this: `rvec` is the Rodrigues rotation vector and `tvec` the translation vector.

\\f$ ^{c}\\mathbf{M}\_o \\f$ can be represented in a homogeneous form and transforms a point expressed in the object frame into the camera frame:

\\f\[ \\begin{align\*} \\begin{bmatrix} X\_c \\\\ Y\_c \\\\ Z\_c \\\\ 1 \\end{bmatrix} &= \\hspace{0.2em} ^{c}\\mathbf{M}\_o \\begin{bmatrix} X\_o \\\\ Y\_o \\\\ Z\_o \\\\ 1 \\end{bmatrix} \\\\ &= \\begin{bmatrix} ^{c}\\mathbf{R}\_o & ^{c}\\mathbf{t}\_o \\\\ 0\_{1\\times3} & 1 \\end{bmatrix} \\begin{bmatrix} X\_o \\\\ Y\_o \\\\ Z\_o \\\\ 1 \\end{bmatrix} \\\\ &= \\begin{bmatrix} r\_{11} & r\_{12} & r\_{13} & t\_x \\\\ r\_{21} & r\_{22} & r\_{23} & t\_y \\\\ r\_{31} & r\_{32} & r\_{33} & t\_z \\\\ 0 & 0 & 0 & 1 \\end{bmatrix} \\begin{bmatrix} X\_o \\\\ Y\_o \\\\ Z\_o \\\\ 1 \\end{bmatrix} \\end{align\*} \\f\]

Transforming a point expressed in one frame to another frame can be easily done with matrix multiplication:

-   \\f$ ^{c\_1}\\mathbf{M}\_o \\f$ is the camera pose for the camera 1
-   \\f$ ^{c\_2}\\mathbf{M}\_o \\f$ is the camera pose for the camera 2

To transform a 3D point expressed in the camera 1 frame to the camera 2 frame:

\\f\[ ^{c\_2}\\mathbf{M}\_{c\_1} = \\hspace{0.2em} ^{c\_2}\\mathbf{M}\_{o} \\cdot \\hspace{0.1em} ^{o}\\mathbf{M}\_{c\_1} = \\hspace{0.2em} ^{c\_2}\\mathbf{M}\_{o} \\cdot \\hspace{0.1em} \\left( ^{c\_1}\\mathbf{M}\_{o} \\right )^{-1} = \\begin{bmatrix} ^{c\_2}\\mathbf{R}\_{o} & ^{c\_2}\\mathbf{t}\_{o} \\\\ 0\_{1 \\times 3} & 1 \\end{bmatrix} \\cdot \\begin{bmatrix} ^{c\_1}\\mathbf{R}\_{o}^T & - \\hspace{0.2em} ^{c\_1}\\mathbf{R}\_{o}^T \\cdot \\hspace{0.2em} ^{c\_1}\\mathbf{t}\_{o} \\\\ 0\_{1 \\times 3} & 1 \\end{bmatrix} \\f\]

In this example, we will compute the camera displacement between two camera poses with respect to the chessboard object. The first step is to compute the camera poses for the two images:
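Expanding the block-matrix product gives a rotation \\f$ ^{c\_2}\\mathbf{R}\_{o} \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{R}\_{o}^T \\f$ and a translation \\f$ ^{c\_2}\\mathbf{t}\_{o} - \\hspace{0.1em} ^{c\_2}\\mathbf{R}\_{o} \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{R}\_{o}^T \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{t}\_{o} \\f$. The sketch below implements exactly that expansion; `Pose`, `isOrthonormal` and `cameraDisplacement` are hypothetical names, and the orthonormality tolerance is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rigid transform [R | t]: X_dst = R * X_src + t.
struct Pose { Mat3 R; Vec3 t; };

// True when the rows of R are (approximately) orthonormal, which is what
// makes R^T a valid inverse.
bool isOrthonormal(const Mat3& R)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
        {
            const double dot = R[i][0] * R[j][0] + R[i][1] * R[j][1] + R[i][2] * R[j][2];
            if (std::abs(dot - (i == j ? 1.0 : 0.0)) > 1e-6)
                return false;
        }
    return true;
}

// Returns c2Mc1 = c2Mo * (c1Mo)^-1.
Pose cameraDisplacement(const Pose& c1Mo, const Pose& c2Mo)
{
    assert(isOrthonormal(c1Mo.R) && isOrthonormal(c2Mo.R));
    Pose c2Mc1{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c2Mc1.R[i][j] += c2Mo.R[i][k] * c1Mo.R[j][k];  // R2 * R1^T
    for (int i = 0; i < 3; ++i)
    {
        c2Mc1.t[i] = c2Mo.t[i];
        for (int j = 0; j < 3; ++j)
            c2Mc1.t[i] -= c2Mc1.R[i][j] * c1Mo.t[j];  // t2 - (R2 * R1^T) * t1
    }
    return c2Mc1;
}
```

A quick sanity check is to take any object point, express it in both camera frames with the two poses, and confirm that the returned transform maps the first result to the second.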

@snippet homography\_from\_camera\_displacement.cpp compute-poses

The camera displacement can be computed from the camera poses using the formulas above:

@snippet homography\_from\_camera\_displacement.cpp compute-c2Mc1

The homography related to a specific plane computed from the camera displacement is:

In this figure, `n` is the normal vector of the plane and `d` the distance between the camera frame and the plane along the plane normal. The [equation](https://en.wikipedia.org/wiki/Homography_\(computer_vision\)#3D_plane_to_plane_equation) to compute the homography from the camera displacement is:

\\f\[ ^{2}\\mathbf{H}\_{1} = \\hspace{0.2em} ^{2}\\mathbf{R}\_{1} - \\hspace{0.1em} \\frac{^{2}\\mathbf{t}\_{1} \\cdot \\hspace{0.1em} ^{1}\\mathbf{n}^\\top}{^1d} \\f\]

Where \\f$ ^{2}\\mathbf{H}\_{1} \\f$ is the homography matrix that maps the points in the first camera frame to the corresponding points in the second camera frame, \\f$ ^{2}\\mathbf{R}\_{1} = \\hspace{0.2em} ^{c\_2}\\mathbf{R}\_{o} \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{R}\_{o}^{\\top} \\f$ is the rotation matrix that represents the rotation between the two camera frames and \\f$ ^{2}\\mathbf{t}\_{1} = \\hspace{0.2em} ^{c\_2}\\mathbf{R}\_{o} \\cdot \\left( - \\hspace{0.1em} ^{c\_1}\\mathbf{R}\_{o}^{\\top} \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{t}\_{o} \\right ) + \\hspace{0.1em} ^{c\_2}\\mathbf{t}\_{o} \\f$ is the translation vector between the two camera frames.
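The formula translates into one line per matrix entry. The following sketch uses the sign convention of the equation above and assumes `n` is a unit normal expressed in the camera 1 frame; `homographyFromDisplacement` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Euclidean homography H = R - (t * n^T) / d for a plane with unit normal n
// at distance d, both expressed in the camera 1 frame.
Mat3 homographyFromDisplacement(const Mat3& R, const Vec3& t, const Vec3& n, double d)
{
    assert(std::abs(d) > 1e-12);  // the plane must not pass through the camera centre
    Mat3 H{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            H[i][j] = R[i][j] - t[i] * n[j] / d;  // entry (i, j) of the outer product t * n^T
    return H;
}
```

For a pure rotation (`t` equal to zero) the homography reduces to `R`, which matches the rotating-camera case listed in the basic theory.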

Here the normal vector `n` is the plane normal expressed in the camera frame 1 and can be computed as the cross product of 2 vectors (using 3 non-collinear points that lie on the plane) or in our case directly with:

@snippet homography\_from\_camera\_displacement.cpp compute-plane-normal-at-camera-pose-1

The distance `d` can be computed as the dot product between the plane normal and a point on the plane or by computing the [plane equation](http://mathworld.wolfram.com/Plane.html) and using the D coefficient:

@snippet homography\_from\_camera\_displacement.cpp compute-plane-distance-to-the-camera-frame-1
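For the general route mentioned above (three non-collinear points on the plane), a minimal sketch could look like this. `planeNormal` and `planeDistance` are hypothetical helpers, and all points are assumed to be expressed in the camera 1 frame.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Unit normal of the plane through p0, p1 and p2, computed as the normalized
// cross product (p1 - p0) x (p2 - p0).
Vec3 planeNormal(const Vec3& p0, const Vec3& p1, const Vec3& p2)
{
    const Vec3 u = { p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2] };
    const Vec3 v = { p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2] };
    Vec3 n = { u[1] * v[2] - u[2] * v[1],
               u[2] * v[0] - u[0] * v[2],
               u[0] * v[1] - u[1] * v[0] };
    const double length = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    assert(length > 1e-12);  // the three points must not be collinear
    for (double& c : n)
        c /= length;
    return n;
}

// Distance d as the dot product between the normal and a point on the plane.
double planeDistance(const Vec3& n, const Vec3& pointOnPlane)
{
    return n[0] * pointOnPlane[0] + n[1] * pointOnPlane[1] + n[2] * pointOnPlane[2];
}
```

The direction of the normal depends on the order of the three points, so the sign of `d` flips with it; this is the same sign ambiguity that appears when the Z-axis convention of the object changes.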

The projective homography matrix \\f$ \\textbf{G} \\f$ can be computed from the Euclidean homography \\f$ \\textbf{H} \\f$ using the intrinsic matrix \\f$ \\textbf{K} \\f$ (see @cite Malis2007), here assuming the same camera between the two plane views:

\\f\[ \\textbf{G} = \\gamma \\textbf{K} \\textbf{H} \\textbf{K}^{-1} \\f\]

@snippet homography\_from\_camera\_displacement.cpp compute-homography
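As an illustration of the relation between the Euclidean and projective homographies, the sketch below builds \\f$ \\textbf{K} \\f$ and its inverse in closed form and normalizes the result so that its bottom-right entry is one. It assumes a zero-skew pinhole model; `Intrinsics`, `multiply` and `projectiveHomography` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical zero-skew pinhole intrinsics.
struct Intrinsics { double fx; double fy; double cx; double cy; };

Mat3 multiply(const Mat3& A, const Mat3& B)
{
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// G = K * H * K^-1, rescaled so that G[2][2] = 1 (this fixes gamma).
Mat3 projectiveHomography(const Mat3& H, const Intrinsics& k)
{
    assert(k.fx != 0.0 && k.fy != 0.0);
    const Mat3 K    = {{ {{k.fx, 0.0, k.cx}}, {{0.0, k.fy, k.cy}}, {{0.0, 0.0, 1.0}} }};
    const Mat3 Kinv = {{ {{1.0 / k.fx, 0.0, -k.cx / k.fx}},
                         {{0.0, 1.0 / k.fy, -k.cy / k.fy}},
                         {{0.0, 0.0, 1.0}} }};
    Mat3 G = multiply(multiply(K, H), Kinv);
    const double g33 = G[2][2];
    assert(std::abs(g33) > 1e-12);
    for (auto& row : G)
        for (double& v : row)
            v /= g33;
    return G;
}
```

For example, a Euclidean homography that shifts normalized coordinates by one unit along x becomes a shift of `fx` pixels in the image.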

In our case, the Z-axis of the chessboard goes inside the object whereas in the homography figure it goes outside. This is just a matter of sign:

\\f\[ ^{2}\\mathbf{H}\_{1} = \\hspace{0.2em} ^{2}\\mathbf{R}\_{1} + \\hspace{0.1em} \\frac{^{2}\\mathbf{t}\_{1} \\cdot \\hspace{0.1em} ^{1}\\mathbf{n}^\\top}{^1d} \\f\]

@snippet homography\_from\_camera\_displacement.cpp compute-homography-from-camera-displacement

We will now compare the projective homography computed from the camera displacement with the one estimated with @ref cv::findHomography:

```
findHomography H:
[0.32903393332201, -1.244138808862929, 536.4769088231476;
 0.6969763913334046, -0.08935909072571542, -80.34068504082403;
 0.00040511729592961, -0.001079740100565013, 0.9999999999999999]

homography from camera displacement:
[0.4160569997384721, -1.306889006892538, 553.7055461075881;
 0.7917584252773352, -0.06341244158456338, -108.2770029401219;
 0.0005926357240956578, -0.001020651672127799, 1]
```

The homography matrices are similar. If we compare the image 1 warped using both homography matrices:

Visually, it is hard to distinguish a difference between the result image from the homography computed from the camera displacement and the one estimated with @ref cv::findHomography function.

#### Exercise

This demo shows you how to compute the homography transformation from two camera poses. Try to perform the same operations, but this time by computing N intermediate homographies. Instead of computing one homography to directly warp the source image to the desired camera viewpoint, perform N warping operations to see the successive transformations.

You should get something similar to the following:

### Demo 4: Decompose the homography matrix {#tutorial\_homography\_Demo4}

OpenCV 3 contains the function @ref cv::decomposeHomographyMat, which decomposes the homography matrix into a set of rotations, translations and plane normals. First we will decompose the homography matrix computed from the camera displacement:

@snippet decompose\_homography.cpp compute-homography-from-camera-displacement

The results of @ref cv::decomposeHomographyMat are:

@snippet decompose\_homography.cpp decompose-homography-from-camera-displacement

```
Solution 0:
rvec from homography decomposition: [-0.0919829920641369, -0.5372581036567992, 1.310868863540717]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [-0.7747961019053186, -0.02751124463434032, -0.6791980037590677] and scaled by d: [-0.1578091561210742, -0.005603443652993778, -0.1383378976078466]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [-0.1973513139420648, 0.6283451996579074, -0.7524857267431757]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 1:
rvec from homography decomposition: [-0.0919829920641369, -0.5372581036567992, 1.310868863540717]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [0.7747961019053186, 0.02751124463434032, 0.6791980037590677] and scaled by d: [0.1578091561210742, 0.005603443652993778, 0.1383378976078466]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [0.1973513139420648, -0.6283451996579074, 0.7524857267431757]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 2:
rvec from homography decomposition: [0.1053487907109967, -0.1561929144786397, 1.401356552358475]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [-0.4666552552894618, 0.1050032934770042, -0.913007654671646] and scaled by d: [-0.0950475510338766, 0.02138689274867372, -0.1859598508065552]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [-0.3131715472900788, 0.8421206145721947, -0.4390403768225507]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 3:
rvec from homography decomposition: [0.1053487907109967, -0.1561929144786397, 1.401356552358475]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [0.4666552552894618, -0.1050032934770042, 0.913007654671646] and scaled by d: [0.0950475510338766, -0.02138689274867372, 0.1859598508065552]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [0.3131715472900788, -0.8421206145721947, 0.4390403768225507]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]
```

The result of the decomposition of the homography matrix can only be recovered up to a scale factor that corresponds in fact to the distance `d` as the normal is unit length. As you can see, there is one solution that matches almost perfectly with the computed camera displacement. As stated in the documentation:

```
At least two of the solutions may further be invalidated if point correspondences are available by applying positive depth constraint (all points must be in front of the camera).
```

As the result of the decomposition is a camera displacement, if we have the initial camera pose \\f$ ^{c\_1}\\mathbf{M}_{o} \\f$, we can compute the current camera pose \\f$ ^{c\_2}\\mathbf{M}_{o} = \\hspace{0.2em} ^{c\_2}\\mathbf{M}_{c\_1} \\cdot \\hspace{0.1em} ^{c\_1}\\mathbf{M}_{o} \\f$ and test if the 3D object points that belong to the plane are projected in front of the camera or not. Another solution could be to retain the solution with the closest normal if we know the plane normal expressed at the camera 1 pose.

Here is the same output, but with the homography matrix estimated with @ref cv::findHomography:

```
Solution 0:
rvec from homography decomposition: [0.1552207729599141, -0.152132696119647, 1.323678695078694]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [-0.4482361704818117, 0.02485247635491922, -1.034409687207331] and scaled by d: [-0.09129598307571339, 0.005061910238634657, -0.2106868109173855]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [-0.1384902722707529, 0.9063331452766947, -0.3992250922214516]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 1:
rvec from homography decomposition: [0.1552207729599141, -0.152132696119647, 1.323678695078694]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [0.4482361704818117, -0.02485247635491922, 1.034409687207331] and scaled by d: [0.09129598307571339, -0.005061910238634657, 0.2106868109173855]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [0.1384902722707529, -0.9063331452766947, 0.3992250922214516]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 2:
rvec from homography decomposition: [-0.2886605671759886, -0.521049903923871, 1.381242030882511]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [-0.8705961357284295, 0.1353018038908477, -0.7037702049789747] and scaled by d: [-0.177321544550518, 0.02755804196893467, -0.1433427218822783]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [-0.2284582117722427, 0.6009247303964522, -0.7659610393954643]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

Solution 3:
rvec from homography decomposition: [-0.2886605671759886, -0.521049903923871, 1.381242030882511]
rvec from camera displacement: [-0.09198299206413783, -0.5372581036567995, 1.310868863540717]
tvec from homography decomposition: [0.8705961357284295, -0.1353018038908477, 0.7037702049789747] and scaled by d: [0.177321544550518, -0.02755804196893467, 0.1433427218822783]
tvec from camera displacement: [0.1578091561210745, 0.005603443652993617, 0.1383378976078466]
plane normal from homography decomposition: [0.2284582117722427, -0.6009247303964522, 0.7659610393954643]
plane normal at camera 1 pose: [0.1973513139420654, -0.6283451996579068, 0.752485726743176]
```

Again, there is also a solution that matches with the computed camera displacement.
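
The idea of retaining the solution whose normal is closest to the known plane normal can be sketched in a few lines. This is only an illustration in plain Python, not part of the tutorial sample: the helper name `closest_normal_index` is hypothetical, and the numbers are copied from the output printed above for the cv::findHomography case.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def closest_normal_index(candidate_normals, reference_normal):
    """Return the index of the candidate with the smallest angle to the reference.

    Hypothetical helper: the angle is compared through the cosine, so the
    candidate with the largest cosine is the closest one.
    """
    ref_norm = math.sqrt(dot(reference_normal, reference_normal))

    def cosine(n):
        return dot(n, reference_normal) / (math.sqrt(dot(n, n)) * ref_norm)

    return max(range(len(candidate_normals)),
               key=lambda i: cosine(candidate_normals[i]))


# Plane normals of solutions 0..3 from the decomposition output above.
normals = [
    [-0.1384902722707529, 0.9063331452766947, -0.3992250922214516],
    [0.1384902722707529, -0.9063331452766947, 0.3992250922214516],
    [-0.2284582117722427, 0.6009247303964522, -0.7659610393954643],
    [0.2284582117722427, -0.6009247303964522, 0.7659610393954643],
]
# Plane normal expressed at the camera 1 pose (same output).
normal_at_camera1 = [0.1973513139420654, -0.6283451996579068, 0.752485726743176]

best = closest_normal_index(normals, normal_at_camera1)
print("Selected solution:", best)
```

For these values the selected index is 3, which is also the solution whose scaled translation is closest to the translation computed from the camera displacement.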

### Demo 5: Basic panorama stitching from a rotating camera {#tutorial\_homography\_Demo5}

\\note This example is made to illustrate the concept of image stitching based on a pure rotational motion of the camera and should not be used to stitch panorama images. The \[stitching module\](@ref stitching) provides a complete pipeline to stitch images.

The homography transformation applies only to planar structures. But in the case of a rotating camera (pure rotation around the camera axis of projection, no translation), an arbitrary world can be considered (\[see previously\](@ref tutorial\_homography\_What\_is\_the\_homography\_matrix)).

The homography can then be computed using the rotation transformation and the camera intrinsic parameters as (see for instance \\ref homography\_course "10"):

\\f\[ s \\begin{bmatrix} x^{'} \\\\ y^{'} \\\\ 1 \\end{bmatrix} = \\bf{K} \\hspace{0.1em} \\bf{R} \\hspace{0.1em} \\bf{K}^{-1} \\begin{bmatrix} x \\\\ y \\\\ 1 \\end{bmatrix} \\f\]

To illustrate, we used Blender, a free and open-source 3D computer graphics software, to generate two camera views related only by a rotation. More information about how to retrieve the camera intrinsic parameters and the `3x4` extrinsic matrix with respect to the world in Blender can be found in \\ref answer\_blender "11" (an additional transformation is needed to get the transformation between the camera and the object frames).

The figure below shows the two generated views of the Suzanne model, with only a rotation transformation:

With the known associated camera poses and the intrinsic parameters, the relative rotation between the two views can be computed:

@add\_toggle\_cpp @snippet panorama\_stitching\_rotating\_camera.cpp extract-rotation @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/panorama\_stitching\_rotating\_camera.py extract-rotation @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PanoramaStitchingRotatingCamera.java extract-rotation @end\_toggle

@add\_toggle\_cpp @snippet panorama\_stitching\_rotating\_camera.cpp compute-rotation-displacement @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/panorama\_stitching\_rotating\_camera.py compute-rotation-displacement @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PanoramaStitchingRotatingCamera.java compute-rotation-displacement @end\_toggle

Here, the second image will be stitched with respect to the first image. The homography can be calculated using the formula above:

@add\_toggle\_cpp @snippet panorama\_stitching\_rotating\_camera.cpp compute-homography @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/panorama\_stitching\_rotating\_camera.py compute-homography @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PanoramaStitchingRotatingCamera.java compute-homography @end\_toggle
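
Since the snippets are not reproduced here, the following plain-Python sketch shows one way the formula above could be evaluated. It is an illustration under stated assumptions, not the sample code: the helper names are hypothetical, the intrinsic matrix is assumed to have zero skew, the relative rotation is assumed to be \\f$ \\mathbf{R}_1 \\mathbf{R}_2^{T} \\f$ (rotation parts of the two camera poses), and the intrinsic values are made up.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def rotation_homography(K, R1, R2):
    """H = K * (R1 * R2^T) * K^-1, normalized so that H[2][2] == 1.

    Assumes K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (no skew), which
    allows a closed-form inverse.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    K_inv = [[1.0 / fx, 0.0, -cx / fx],
             [0.0, 1.0 / fy, -cy / fy],
             [0.0, 0.0, 1.0]]
    R_2to1 = matmul(R1, transpose(R2))
    H = matmul(matmul(K, R_2to1), K_inv)
    scale = H[2][2]
    return [[v / scale for v in row] for row in H]


def warp_point(H, x, y):
    """Apply H to the pixel (x, y) and divide by the homogeneous coordinate."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


# Hypothetical intrinsics and a 10 degree rotation about the y axis.
K = [[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = math.radians(10.0)
R_y = [[math.cos(theta), 0.0, math.sin(theta)],
       [0.0, 1.0, 0.0],
       [-math.sin(theta), 0.0, math.cos(theta)]]

H = rotation_homography(K, identity, R_y)
print(warp_point(H, 320.0, 240.0))
```

With this convention the principal point of the second view moves horizontally by `fx * tan(theta)` pixels and keeps its vertical coordinate, which is what a pure rotation about the vertical axis should produce.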

The stitching is made simply with:

@add\_toggle\_cpp @snippet panorama\_stitching\_rotating\_camera.cpp stitch @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/features/Homography/panorama\_stitching\_rotating\_camera.py stitch @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/features/Homography/PanoramaStitchingRotatingCamera.java stitch @end\_toggle

The resulting image is:

## Additional references {#tutorial\_homography\_Additional\_references}

-   \\anchor lecture\_16 1. [Lecture 16: Planar Homographies](http://www.cse.psu.edu/~rtc12/CSE486/lecture16.pdf), Robert Collins
-   \\anchor projective\_transformations 2. [2D projective transformations (homographies)](https://web.archive.org/web/20171226115739/https://ags.cs.uni-kl.de/fileadmin/inf_ags/3dcv-ws11-12/3DCV_WS11-12_lec04.pdf), Christiano Gava, Gabriele Bleser
-   \\anchor szeliski 3. [Computer Vision: Algorithms and Applications](https://szeliski.org/Book/), Richard Szeliski
-   \\anchor answer\_dsp 4. [Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers](https://dsp.stackexchange.com/a/2737)
-   \\anchor pose\_ar 5. [Pose from homography estimation](https://visp-doc.inria.fr/doxygen/camera_localization/tutorial-pose-dlt-planar-opencv.html)
-   \\anchor polar\_decomposition 6. [Polar Decomposition (in Continuum Mechanics)](http://www.continuummechanics.org/polardecomposition.html)
-   \\anchor polar\_decomposition\_svd 7. [Chapter 3 - 3.1.2 From matrices to rotations - Theorem 3.1 (Least-squares estimation of a rotation from a matrix K)](https://www-sop.inria.fr/asclepios/cours/MVA/Rotations.pdf)
-   \\anchor polar\_decomposition\_svd\_2 8. [A Personal Interview with the Singular Value Decomposition](https://web.stanford.edu/~gavish/documents/SVD_ans_you.pdf), Matan Gavish
-   \\anchor Kabsch\_algorithm 9. [Kabsch algorithm, Computation of the optimal rotation matrix](https://en.wikipedia.org/wiki/Kabsch_algorithm#Computation_of_the_optimal_rotation_matrix)
-   \\anchor homography\_course 10. [Homography](http://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/homography.pdf), Dr. Gerhard Roth
-   \\anchor answer\_blender 11. [3x4 camera matrix from blender camera](https://blender.stackexchange.com/a/38210)

## [Table Of Content Features](https://docharvest.github.io/docs/opencv5/tutorials/features/table_of_content_features/)

# Features framework (features module) {#tutorial\_table\_of\_content\_features}

-   @subpage tutorial\_harris\_detector
-   @subpage tutorial\_good\_features\_to\_track
-   @subpage tutorial\_generic\_corner\_detector
-   @subpage tutorial\_corner\_subpixels
-   @subpage tutorial\_feature\_detection
-   @subpage tutorial\_feature\_description
-   @subpage tutorial\_feature\_flann\_matcher
-   @subpage tutorial\_feature\_homography
-   @subpage tutorial\_detection\_of\_planar\_objects
-   @subpage tutorial\_akaze\_matching
-   @subpage tutorial\_akaze\_tracking
-   @subpage tutorial\_homography

## [Corner Subpixels](https://docharvest.github.io/docs/opencv5/tutorials/features/trackingmotion/corner_subpixels/corner_subpixels/)

# Detecting corners location in subpixels {#tutorial\_corner\_subpixels}

@tableofcontents

@prev\_tutorial{tutorial\_generic\_corner\_detector} @next\_tutorial{tutorial\_feature\_detection}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::cornerSubPix to find more exact corner positions (more exact than integer pixels).

## Theory

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/TrackingMotion/cornerSubPix_Demo.cpp) @include samples/cpp/tutorial\_code/TrackingMotion/cornerSubPix\_Demo.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/TrackingMotion/corner_subpixels/CornerSubPixDemo.java) @include samples/java/tutorial\_code/TrackingMotion/corner\_subpixels/CornerSubPixDemo.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/TrackingMotion/corner_subpixels/cornerSubPix_Demo.py) @include samples/python/tutorial\_code/TrackingMotion/corner\_subpixels/cornerSubPix\_Demo.py @end\_toggle

## Explanation

## Result

Here is the result:

## [Generic Corner Detector](https://docharvest.github.io/docs/opencv5/tutorials/features/trackingmotion/generic_corner_detector/generic_corner_detector/)

# Creating your own corner detector {#tutorial\_generic\_corner\_detector}

@tableofcontents

@prev\_tutorial{tutorial\_good\_features\_to\_track} @next\_tutorial{tutorial\_corner\_subpixels}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::cornerEigenValsAndVecs to find the eigenvalues and eigenvectors to determine if a pixel is a corner.
-   Use the OpenCV function @ref cv::cornerMinEigenVal to find the minimum eigenvalues for corner detection.
-   Implement our own version of the Harris detector as well as the Shi-Tomasi detector, by using the two functions above.
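
As a rough illustration of what "using the two functions above" amounts to, the sketch below derives both corner scores from the eigenvalues of a 2x2 symmetric matrix. It is plain Python rather than OpenCV code; the function names and the value `k = 0.04` are assumptions made for the example, and the matrix entries are arbitrary.

```python
import math


def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first."""
    half_trace = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + radius, half_trace - radius


def harris_score(l1, l2, k=0.04):
    """Harris response det(M) - k * trace(M)^2 written with eigenvalues."""
    return l1 * l2 - k * (l1 + l2) ** 2


def shi_tomasi_score(l1, l2):
    """Shi-Tomasi response: the smaller of the two eigenvalues."""
    return min(l1, l2)


# Arbitrary example matrix M = [[1.0, 0.25], [0.25, 1.0]].
l1, l2 = eigenvalues_2x2_symmetric(1.0, 0.25, 1.0)
print("eigenvalues:", l1, l2)
print("Harris score:", harris_score(l1, l2))
print("Shi-Tomasi score:", shi_tomasi_score(l1, l2))
```

In a detector, each score would be computed per pixel and compared against a threshold; the choice of threshold is left open here.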

## Theory

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/TrackingMotion/cornerDetector_Demo.cpp)

@include samples/cpp/tutorial\_code/TrackingMotion/cornerDetector\_Demo.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/TrackingMotion/generic_corner_detector/CornerDetectorDemo.java)

@include samples/java/tutorial\_code/TrackingMotion/generic\_corner\_detector/CornerDetectorDemo.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/TrackingMotion/generic_corner_detector/cornerDetector_Demo.py)

@include samples/python/tutorial\_code/TrackingMotion/generic\_corner\_detector/cornerDetector\_Demo.py @end\_toggle

## Explanation

## Result

## [Good Features To Track](https://docharvest.github.io/docs/opencv5/tutorials/features/trackingmotion/good_features_to_track/good_features_to_track/)

# Shi-Tomasi corner detector {#tutorial\_good\_features\_to\_track}

@tableofcontents

@prev\_tutorial{tutorial\_harris\_detector} @next\_tutorial{tutorial\_generic\_corner\_detector}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the function @ref cv::goodFeaturesToTrack to detect corners using the Shi-Tomasi method (@cite Shi94).

## Theory

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/TrackingMotion/goodFeaturesToTrack_Demo.cpp) @include samples/cpp/tutorial\_code/TrackingMotion/goodFeaturesToTrack\_Demo.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/TrackingMotion/good_features_to_track/GoodFeaturesToTrackDemo.java) @include samples/java/tutorial\_code/TrackingMotion/good\_features\_to\_track/GoodFeaturesToTrackDemo.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/TrackingMotion/good_features_to_track/goodFeaturesToTrack_Demo.py) @include samples/python/tutorial\_code/TrackingMotion/good\_features\_to\_track/goodFeaturesToTrack\_Demo.py @end\_toggle

## Explanation

## Result

## [Harris Detector](https://docharvest.github.io/docs/opencv5/tutorials/features/trackingmotion/harris_detector/harris_detector/)

# Harris corner detector {#tutorial\_harris\_detector}

@tableofcontents

@next\_tutorial{tutorial\_good\_features\_to\_track}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn:

-   What features are and why they are important
-   How to use the function @ref cv::cornerHarris to detect corners using the Harris-Stephens method.

## Theory

### What is a feature?

-   In computer vision, we usually need to find matching points between different frames of an environment. Why? If we know how two images relate to each other, we can use _both_ images to extract information from them.
-   When we say **matching points** we are referring, in a general sense, to _characteristics_ in the scene that we can recognize easily. We call these characteristics **features**.
-   **So, what characteristics should a feature have?**
    -   It must be _uniquely recognizable_

### Types of Image Features

To mention a few:

-   Edges
-   **Corners** (also known as interest points)
-   Blobs (also known as regions of interest)

In this tutorial we will study the _corner_ features, specifically.

### Why is a corner so special?

-   Because, since it is the intersection of two edges, it represents a point at which the directions of these two edges _change_. Hence, the gradient of the image (in both directions) has a high variation, which can be used to detect it.

### How does it work?

-   Let's look for corners. Since corners represent a variation in the gradient of the image, we will look for this "variation".
    
-   Consider a grayscale image \\f$I\\f$. We are going to sweep a window \\f$w(x,y)\\f$ (with displacements \\f$u\\f$ in the x direction and \\f$v\\f$ in the y direction) over \\f$I\\f$ and calculate the variation of intensity.
    
    \\f\[E(u,v) = \\sum \_{x,y} w(x,y)\[ I(x+u,y+v) - I(x,y)\]^{2}\\f\]
    
    where:
    
    -   \\f$w(x,y)\\f$ is the window at position \\f$(x,y)\\f$
    -   \\f$I(x,y)\\f$ is the intensity at \\f$(x,y)\\f$
    -   \\f$I(x+u,y+v)\\f$ is the intensity at the moved window \\f$(x+u,y+v)\\f$
-   Since we are looking for windows with corners, we are looking for windows with a large variation in intensity. Hence, we have to maximize the equation above, specifically the term:
    
    \\f\[\\sum \_{x,y}\[ I(x+u,y+v) - I(x,y)\]^{2}\\f\]
    
-   Using _Taylor expansion_:
    
    \\f\[E(u,v) \\approx \\sum \_{x,y} w(x,y) \[ I(x,y) + u I\_{x} + vI\_{y} - I(x,y)\]^{2}\\f\]
    
-   Expanding the equation and cancelling properly:
    
    \\f\[E(u,v) \\approx \\sum \_{x,y} w(x,y) \\left( u^{2}I\_{x}^{2} + 2uvI\_{x}I\_{y} + v^{2}I\_{y}^{2} \\right)\\f\]
    
-   Which can be expressed in a matrix form as:
    
    \\f\[E(u,v) \\approx \\begin{bmatrix} u & v \\end{bmatrix} \\left ( \\displaystyle \\sum\_{x,y} w(x,y) \\begin{bmatrix} I\_x^{2} & I\_{x}I\_{y} \\\\ I\_xI\_{y} & I\_{y}^{2} \\end{bmatrix} \\right ) \\begin{bmatrix} u \\\\ v \\end{bmatrix}\\f\]
    
-   Let's denote:
    
    \\f\[M = \\displaystyle \\sum\_{x,y} w(x,y) \\begin{bmatrix} I\_x^{2} & I\_{x}I\_{y} \\\\ I\_xI\_{y} & I\_{y}^{2} \\end{bmatrix}\\f\]
    
-   So, our equation now is:
    
    \\f\[E(u,v) \\approx \\begin{bmatrix} u & v \\end{bmatrix} M \\begin{bmatrix} u \\\\ v \\end{bmatrix}\\f\]
    
-   A score is calculated for each window, to determine if it can possibly contain a corner:
    
    \\f\[R = det(M) - k(trace(M))^{2}\\f\]
    
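    where:
    
    -   det(M) = \\f$\\lambda\_{1}\\lambda\_{2}\\f$
    -   trace(M) = \\f$\\lambda\_{1}+\\lambda\_{2}\\f$
    -   \\f$\\lambda\_{1}\\f$ and \\f$\\lambda\_{2}\\f$ are the eigenvalues of \\f$M\\f$
    -   \\f$k\\f$ is an empirically chosen constant
    
    A window with a score \\f$R\\f$ greater than a certain threshold is considered a "corner".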
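
The derivation above can be checked on tiny synthetic windows. The sketch below is plain Python and is not the code of @ref cv::cornerHarris: it assumes a uniform window (\\f$w(x,y) = 1\\f$), central-difference gradients on the interior pixels, and `k = 0.04`, all of which are choices made only for this illustration.

```python
def structure_tensor(img):
    """Sum Ix^2, Ix*Iy and Iy^2 over the interior pixels of a small window.

    Gradients are central differences; the window weight is taken as 1.
    """
    h, w = len(img), len(img[0])
    sxx = sxy = syy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx, sxy, syy


def harris_response(img, k=0.04):
    """R = det(M) - k * trace(M)^2 for the window given by img."""
    sxx, sxy, syy = structure_tensor(img)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Three 5x5 windows: constant, vertical edge, and a corner.
flat = [[0.0] * 5 for _ in range(5)]
edge = [[1.0 if x >= 2 else 0.0 for x in range(5)] for y in range(5)]
corner = [[1.0 if x >= 2 and y >= 2 else 0.0 for x in range(5)]
          for y in range(5)]

for name, window in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, harris_response(window))
```

The flat window gives a zero score, the edge gives a negative score (one dominant gradient direction, so the determinant vanishes), and only the corner gives a positive score.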
    

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/TrackingMotion/cornerHarris_Demo.cpp) @include samples/cpp/tutorial\_code/TrackingMotion/cornerHarris\_Demo.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/TrackingMotion/harris_detector/CornerHarrisDemo.java) @include samples/java/tutorial\_code/TrackingMotion/harris\_detector/CornerHarrisDemo.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/TrackingMotion/harris_detector/cornerHarris_Demo.py) @include samples/python/tutorial\_code/TrackingMotion/harris\_detector/cornerHarris\_Demo.py @end\_toggle

## Explanation

## Result

The original image:

The detected corners are surrounded by a small black circle:

## [Table Of Content Geometry](https://docharvest.github.io/docs/opencv5/tutorials/geometry/table_of_content_geometry/)

# Computational geometry module {#tutorial\_table\_of\_content\_geometry}

## [Gpu Basics Similarity](https://docharvest.github.io/docs/opencv5/tutorials/gpu/gpu-basics-similarity/gpu_basics_similarity/)

# @cond CUDA\_MODULES Similarity check (PSNR and SSIM) on the GPU {#tutorial\_gpu\_basics\_similarity}

@tableofcontents

@todo update this tutorial

@next\_tutorial{tutorial\_gpu\_thrust\_interop}

## Goal

In the @ref tutorial\_video\_input\_psnr\_ssim tutorial I already presented the PSNR and SSIM methods for checking the similarity between two images. As you could see, the execution takes quite some time, especially in the case of the SSIM. However, if the performance of the OpenCV CPU implementation does not satisfy you and you happen to have an NVIDIA CUDA GPU device in your system, all is not lost. You may try to port or write your own algorithm for the video card.

This tutorial will give a good grasp on how to approach coding by using the GPU module of OpenCV. As a prerequisite you should already know how to handle the core, highgui and imgproc modules. So, our main goals are:

-   What's different compared to the CPU?
-   Create the GPU code for the PSNR and SSIM
-   Optimize the code for maximal performance

## The source code

You may also find the source code and the video file in the `samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity` directory of the OpenCV source library or download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp). The full source code is quite long (because it handles command line arguments and performance measurement). Therefore, to avoid cluttering this section, only the functions themselves are shown here.

The PSNR is returned as a floating-point number which, if the two inputs are similar, lies between 30 and 50 (higher is better).

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getpsnr

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getpsnrcuda

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp psnr

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getpsnropt
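
Because the snippets are not reproduced here, the following plain-Python sketch recalls what a PSNR computation looks like. It uses the standard definition (mean squared error against a peak value of 255) on flat lists of 8-bit samples; the function name is hypothetical, and returning 0 for identical inputs is an assumed convention, not something confirmed by the text above.

```python
import math


def psnr(a, b, max_value=255.0):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum of squared differences over all samples (all channels flattened).
    sse = sum((x - y) ** 2 for x, y in zip(a, b))
    if sse <= 1e-10:
        # Assumed convention: report 0 instead of dividing by zero.
        return 0.0
    mse = sse / len(a)
    return 10.0 * math.log10(max_value ** 2 / mse)


frame_a = [10, 20, 30, 40]
frame_b = [12, 19, 30, 44]
print("PSNR:", psnr(frame_a, frame_b))
```

The GPU variants compute the same quantity; what changes is where the squared differences are accumulated and how often memory is allocated and transferred.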

The SSIM returns the MSSIM of the images. This is also a floating-point number between zero and one (higher is better); however, there is one value for each channel. Therefore, we return a _Scalar_ OpenCV data structure:

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getssim

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getssimcuda

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp ssim

@snippet samples/cpp/tutorial\_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp getssimopt

## How to do it? - The GPU

As seen above, we have three types of functions for each operation: one for the CPU and two for the GPU. The reason I made two for the GPU is to illustrate that simply porting your CPU code to the GPU will often make it slower. If you want a performance gain you will need to remember a few rules, which I will detail later on.

The GPU module was developed to resemble its CPU counterpart as much as possible. This makes the porting process easier. The first thing you need to do before writing any code is to link the GPU module to your project and include the header file for the module. All the functions and data structures of the GPU are in a _gpu_ sub-namespace of the _cv_ namespace. You may add this to the default one via the _using namespace_ directive, or mark it explicitly everywhere via the cv:: prefix to avoid confusion. I'll do the latter.
@code{.cpp}
#include <opencv2/gpu.hpp>        // GPU structures and methods
@endcode

GPU stands for "graphics processing unit". It was originally built to render graphical scenes. These scenes are built from a lot of data. However, the data items do not all depend on one another in a sequential way, so they can be processed in parallel. For this reason a GPU contains many smaller processing units. These aren't state-of-the-art processors, and in a one-on-one test against a CPU each of them would fall behind. However, the GPU's strength lies in its numbers. In recent years there has been an increasing trend to harness this massive parallel power of the GPU for non-graphical tasks as well. This gave birth to general-purpose computation on graphics processing units (GPGPU).

The GPU has its own memory. When you read data from the hard drive with OpenCV into a _Mat_ object, the data is placed in your system's memory. The CPU works more or less directly on this memory (via its cache); the GPU cannot. It has to transfer the information required for calculations from the system memory to its own. This is done via an upload process and is time-consuming. In the end the result has to be downloaded back to your system memory for your CPU to see and use it. Porting small functions to the GPU is not recommended, as the upload/download time will be larger than the time you gain by parallel execution.

Mat objects are stored only in the system memory (or the CPU cache). To get an OpenCV matrix to the GPU you'll need to use its GPU counterpart @ref cv::cuda::GpuMat. It works similarly to Mat, with a 2D-only limitation and no reference returning for its functions (you cannot mix GPU references with CPU ones). To upload a Mat object to the GPU you need to call the upload function after creating an instance of the class. To download you may use simple assignment to a Mat object or use the download function.
@code{.cpp}
Mat I1;          // Main memory item - read image into with imread for example
gpu::GpuMat gI1; // GPU matrix - for now empty
gI1.upload(I1);  // Upload data from the system memory to the GPU memory

I1 = gI1;        // Download, gI1.download(I1) will work too
@endcode
Once you have your data up in the GPU memory you may call GPU-enabled functions of OpenCV. Most of the functions keep the same name as on the CPU, with the difference that they only accept _GpuMat_ inputs.

Another thing to keep in mind is that efficient GPU algorithms cannot be made for every channel count. Generally, I found that the input images for the GPU need to be either one- or four-channel images with char or float elements. There is no double support on the GPU, sorry. Passing other types of objects to some functions will throw an exception and print an error message on the error output. In most places the documentation details the types accepted for the inputs. If you have three-channel images as input you can do two things: either add a new channel (and use char elements) or split up the image and call the function for each channel. The first option isn't really recommended, as it wastes memory.

For some functions, where the position of the elements (neighbor items) doesn't matter, the quick solution is to reshape the image into a single-channel one. This is the case for the PSNR implementation, where for the _absdiff_ method the value of the neighbors is not important. However, for _GaussianBlur_ this isn't an option, so the split method is needed for the SSIM. With this knowledge you can write viable GPU code (like my GPU version) and run it. You'll be surprised to see that it might turn out slower than your CPU implementation.
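
The reason the reshape trick is valid for element-wise operations can be illustrated without a GPU. In this plain-Python sketch (the pixel values and function names are made up for the example), the sum of squared absolute differences is the same whether the three channels are processed separately or the interleaved data is treated as one long single-channel row.

```python
def sum_sq_diff_per_channel(img1, img2):
    """Process each of the three channels separately and add the results."""
    total = 0
    for c in range(3):
        total += sum(abs(p[c] - q[c]) ** 2 for p, q in zip(img1, img2))
    return total


def sum_sq_diff_reshaped(img1, img2):
    """Flatten the interleaved pixels into one single-channel sequence first."""
    flat1 = [value for pixel in img1 for value in pixel]
    flat2 = [value for pixel in img2 for value in pixel]
    return sum(abs(x - y) ** 2 for x, y in zip(flat1, flat2))


# Two tiny "images": lists of (b, g, r) pixels with hypothetical values.
image1 = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
image2 = [(12, 20, 27), (40, 55, 60), (65, 80, 91)]

print(sum_sq_diff_per_channel(image1, image2))
print(sum_sq_diff_reshaped(image1, image2))
```

A neighborhood operation such as a blur does not have this property, because after flattening the neighbors of a value belong to other channels.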

## Optimization

The reason for this is that you are ignoring the cost of memory allocation and data transfer, and on the GPU this cost is very high. Another possibility for optimization is to introduce asynchronous OpenCV GPU calls with the help of the @ref cv::cuda::Stream.

\-# Memory allocation on the GPU is expensive. Therefore, allocate new memory as few times as possible. If you create a function that you intend to call multiple times, it is a good idea to allocate any local parameters for the function only once, during the first call. To do this, create a data structure containing all the local variables you will use. For instance, in the case of the PSNR these are:
@code{.cpp}
struct BufferPSNR                                     // Optimized GPU versions
{   // Data allocations are very expensive on GPU. Use a buffer to solve: allocate once reuse later.
    gpu::GpuMat gI1, gI2, gs, t1,t2;

    gpu::GpuMat buf;
};
@endcode
Then create an instance of this in the main program:
@code{.cpp}
BufferPSNR bufferPSNR;
@endcode
And finally pass this to the function each time you call it:
@code{.cpp}
double getPSNR_GPU_optimized(const Mat& I1, const Mat& I2, BufferPSNR& b)
@endcode
Now you access these local parameters as _b.gI1_, _b.buf_ and so on. The GpuMat will only reallocate itself on a new call if the new matrix size is different from the previous one.

\-# Avoid unnecessary data transfers. Any small data transfer becomes significant once you go to the GPU. Therefore, if possible, make all calculations in-place (in other words, do not create new memory objects, for the reasons explained in the previous point). For example, although arithmetic operations may be easier to express as one-line formulas, this will be slower. In the case of the SSIM at one point I need to calculate:
@code{.cpp}
b.t1 = 2 * b.mu1_mu2 + C1;
@endcode
Although the call above will succeed, observe that there is a hidden data transfer. Before it makes the addition it needs to store the result of the multiplication somewhere. Therefore, it will create a local matrix in the background, add the _C1_ value to it and finally assign that to _t1_. To avoid this we use the gpu functions instead of the arithmetic operators:
@code{.cpp}
gpu::multiply(b.mu1_mu2, 2, b.t1); //b.t1 = 2 * b.mu1_mu2 + C1;
gpu::add(b.t1, C1, b.t1);
@endcode

\-# Use asynchronous calls (the @ref cv::cuda::Stream ). By default, whenever you call a GPU function it will wait for the call to finish and return with the result afterwards. However, it is possible to make asynchronous calls, meaning the function requests the operation, makes the costly data allocations for the algorithm and returns right away. Now you can call another function, if you wish. For the MSSIM this is a small optimization point. In our default implementation we split up the image into channels and call the GPU functions for each channel. A small degree of parallelization is possible with the stream. By using a stream we can make the data allocation and upload operations while the GPU is already executing a given method. For example, we need to upload two images. We queue these one after another and call the function that processes them. The functions will wait for the upload to finish; however, while this happens the output buffer allocations are made for the function to be executed next.
@code{.cpp}
gpu::Stream stream;

stream.enqueueConvert(b.gI1, b.t1, CV_32F);    // Upload

gpu::split(b.t1, b.vI1, stream);              // Methods (pass the stream as final parameter).
gpu::multiply(b.vI1[i], b.vI1[i], b.I1_2, stream);        // I1^2
@endcode

## Result and conclusion

On an Intel P8700 laptop CPU paired with a low-end NVIDIA GT220M, here are the performance numbers:
@code
Time of PSNR CPU (averaged for 10 runs): 41.4122 milliseconds. With result of: 19.2506
Time of PSNR GPU (averaged for 10 runs): 158.977 milliseconds. With result of: 19.2506
Initial call GPU optimized: 31.3418 milliseconds. With result of: 19.2506
Time of PSNR GPU OPTIMIZED ( / 10 runs): 24.8171 milliseconds. With result of: 19.2506

Time of MSSIM CPU (averaged for 10 runs): 484.343 milliseconds. With result of B0.890964 G0.903845 R0.936934
Time of MSSIM GPU (averaged for 10 runs): 745.105 milliseconds. With result of B0.89922 G0.909051 R0.968223
Time of MSSIM GPU Initial Call 357.746 milliseconds. With result of B0.890964 G0.903845 R0.936934
Time of MSSIM GPU OPTIMIZED ( / 10 runs): 203.091 milliseconds. With result of B0.890964 G0.903845 R0.936934
@endcode
With the optimized versions, the GPU ran about 1.7 times faster than the CPU for the PSNR and about 2.4 times faster for the MSSIM, while the naive GPU ports were slower than the CPU in both cases. It may be just the improvement needed for your application to work. You may observe a runtime instance of this on [YouTube](https://www.youtube.com/watch?v=3_ESXmFlnvY).
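
To make the comparison explicit, the speedups can be computed directly from the timings listed above. The short Python sketch below only restates that arithmetic; the dictionary layout and the `speedup` helper are illustrative choices.

```python
# Average times in milliseconds, copied from the performance numbers above.
timings_ms = {
    "PSNR": {"cpu": 41.4122, "gpu": 158.977, "gpu_optimized": 24.8171},
    "MSSIM": {"cpu": 484.343, "gpu": 745.105, "gpu_optimized": 203.091},
}


def speedup(cpu_ms, other_ms):
    """How many times faster than the CPU a variant is (below 1 means slower)."""
    return cpu_ms / other_ms


for name, t in timings_ms.items():
    naive = speedup(t["cpu"], t["gpu"])
    optimized = speedup(t["cpu"], t["gpu_optimized"])
    print(f"{name}: naive GPU x{naive:.2f}, optimized GPU x{optimized:.2f}")
```

A ratio below 1 for the naive ports reflects the allocation and transfer overhead discussed in the optimization section.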

@youtube{3\_ESXmFlnvY} @endcond

## [Gpu Thrust Interop](https://docharvest.github.io/docs/opencv5/tutorials/gpu/gpu-thrust-interop/gpu_thrust_interop/)

@cond CUDA\_MODULES

# Using a cv::cuda::GpuMat with thrust {#tutorial\_gpu\_thrust\_interop}

@tableofcontents

@prev\_tutorial{tutorial\_gpu\_basics\_similarity}

## Goal

Thrust is an extremely powerful library for various CUDA-accelerated algorithms. However, thrust is designed to work with vectors and not pitched matrices. This tutorial discusses wrapping cv::cuda::GpuMat objects into thrust iterators that can be used with thrust algorithms.

This tutorial should show you how to:

-   Wrap a GpuMat into a thrust iterator
-   Fill a GpuMat with random numbers
-   Sort a column of a GpuMat in place
-   Copy values greater than 0 to a new gpu matrix
-   Use streams with thrust

## Wrapping a GpuMat into a thrust iterator

The following code will produce an iterator for a GpuMat:

@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/Thrust\_interop.hpp begin\_itr
@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/Thrust\_interop.hpp end\_itr

Our goal is to have an iterator that starts at the beginning of the matrix and increments correctly to access consecutive matrix elements. This is trivial for a continuous row, but what about a column of a pitched matrix? To do this we need the iterator to be aware of the matrix dimensions and step. This information is embedded in the step\_functor.

@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/Thrust\_interop.hpp step\_functor

The step functor takes an index value and returns the appropriate offset from the beginning of the matrix. The counting iterator simply increments over the range of pixel elements. Combined into the transform\_iterator, we have an iterator that counts from 0 to M\*N and correctly increments to account for the pitched memory of a GpuMat. Unfortunately this does not include any memory location information; for that we need a thrust::device\_ptr. By combining a device pointer with the transform\_iterator we can point thrust to the first element of our matrix and have it step accordingly.
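
The index-to-offset mapping described above can be sketched on the CPU with NumPy. This is only an illustration of the arithmetic, assuming the functor derives the row from the index and multiplies it by the row pitch measured in elements; the function name `step_offset` and the buffer layout are hypothetical and not taken from the sample.

```python
import numpy as np

def step_offset(index, columns, step, channels=1):
    """Map a linear element index to an offset into a pitched buffer.

    step is the row pitch measured in elements (not bytes).
    """
    row = index // columns
    return row * step + (index % columns) * channels

# A 3x4 single-channel matrix stored with a pitch of 6 elements per row.
rows, cols, step = 3, 4, 6
buffer = np.full(rows * step, -1, dtype=np.int32)   # -1 marks padding
for r in range(rows):
    buffer[r * step : r * step + cols] = np.arange(cols) + 10 * r

# Walk the linear indices 0 .. rows*cols-1 and gather through the offsets.
offsets = [step_offset(i, cols, step) for i in range(rows * cols)]
values = buffer[offsets]
print(values.reshape(rows, cols))
```

None of the padding values (-1) are visited, which is the property the iterator needs so that thrust algorithms only touch real matrix elements.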

## Fill a GpuMat with random numbers

Now that we have some nice functions for making iterators for thrust, let's use them to do some things OpenCV can't do. Unfortunately, at the time of this writing, OpenCV doesn't have any GPU random number generation. Thankfully thrust does, and it's now trivial to interoperate between the two. Example taken from [http://stackoverflow.com/questions/12614164/generating-a-random-number-vector-between-0-and-1-0-using-thrust](http://stackoverflow.com/questions/12614164/generating-a-random-number-vector-between-0-and-1-0-using-thrust)

First we need to write a functor that will produce our random values. @snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu prg

This will take in an integer value and output a value between a and b. Now we will populate our matrix with values between 0 and 10 with a thrust transform. @snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu random

## Sort a column of a GpuMat in place

Let's fill matrix elements with random values and an index. Afterwards we will sort the random numbers and the indices. @snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu sort

## Copy values greater than 0 to a new gpu matrix while using streams

In this example we're going to see how cv::cuda::Streams can be used with thrust. Unfortunately this specific example uses functions that must return results to the CPU so it isn't the optimal use of streams.

@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu copy\_greater

First we will populate a GPU mat with randomly generated data between -1 and 1 on a stream.

@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu random\_gen\_stream

Notice the use of thrust::system::cuda::par.on(...): this creates an execution policy for executing thrust code on a stream. There is a bug in the version of thrust distributed with the CUDA toolkit which, as of version 7.5, has not been fixed. This bug causes code to not execute on streams. It can however be avoided by using the newest version of thrust from the git repository ([http://github.com/thrust/thrust.git](http://github.com/thrust/thrust.git)).

Next we will determine how many values are greater than 0 by using thrust::count\_if with the following predicate:

@snippet samples/cpp/tutorial\_code/gpu/gpu-thrust-interop/main.cu pred\_greater

We will use those results to create an output buffer for storing the copied values; we will then use copy\_if with the same predicate to populate the output buffer. Lastly we will download the values into a CPU mat for viewing.

@endcond

## [Table Of Content Gpu](https://docharvest.github.io/docs/opencv5/tutorials/gpu/table_of_content_gpu/)

@cond CUDA\_MODULES

# GPU-Accelerated Computer Vision (cuda module) {#tutorial\_table\_of\_content\_gpu}

Squeeze every bit of computational power out of your system by using your video card to run the OpenCV algorithms.

-   @subpage tutorial\_gpu\_basics\_similarity
    
    _Languages:_ C++
    
    _Compatibility:_ > OpenCV 2.0
    
    _Author:_ Bernát Gábor
    
    This will give a good grasp on how to approach coding on the GPU module, once you already know how to handle the other modules. As a test case it will port the similarity methods from the tutorial @ref tutorial\_video\_input\_psnr\_ssim to the GPU.
    
-   @subpage tutorial\_gpu\_thrust\_interop
    
    _Languages:_ C++
    
    _Compatibility:_ >= OpenCV 3.0
    
    This tutorial will show you how to wrap a GpuMat into a thrust iterator in order to be able to use the functions in the thrust library.
    

@endcond

## [Anisotropic Image Segmentation](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/anisotropic_image_segmentation/anisotropic_image_segmentation/)

# Anisotropic image segmentation by a gradient structure tensor {#tutorial\_anisotropic\_image\_segmentation\_by\_a\_gst}

@tableofcontents

@prev\_tutorial{tutorial\_motion\_deblur\_filter} @next\_tutorial{tutorial\_periodic\_noise\_removing\_filter}

Original author

Karpushin Vladislav

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn:

-   what the gradient structure tensor is
-   how to estimate orientation and coherency of an anisotropic image by a gradient structure tensor
-   how to segment an anisotropic image with a single local orientation by a gradient structure tensor

## Theory

@note The explanation is based on the books @cite jahne2000computer, @cite bigun2006vision and @cite van1995estimators. A good physical explanation of the gradient structure tensor is given in @cite yang1996structure. You can also refer to the Wikipedia page [Structure tensor](https://en.wikipedia.org/wiki/Structure_tensor).

@note An anisotropic image on this page is a real-world image.

### What is the gradient structure tensor?

In mathematics, the gradient structure tensor (also referred to as the second-moment matrix, the second order moment tensor, the inertia tensor, etc.) is a matrix derived from the gradient of a function. It summarizes the predominant directions of the gradient in a specified neighborhood of a point, and the degree to which those directions are coherent (coherency). The gradient structure tensor is widely used in image processing and computer vision for 2D/3D image segmentation, motion detection, adaptive filtration, local image features detection, etc.

Important features of anisotropic images include the orientation and coherency of the local anisotropy. In this tutorial we will show how to estimate orientation and coherency, and how to segment an anisotropic image with a single local orientation by a gradient structure tensor.

The gradient structure tensor of an image is a 2x2 symmetric matrix. Its eigenvectors indicate the local orientation, whereas its eigenvalues give the coherency (a measure of anisotropy).

The gradient structure tensor \\f$J\\f$ of an image \\f$Z\\f$ can be written as:

\\f\[J = \\begin{bmatrix} J\_{11} & J\_{12} \\\\ J\_{12} & J\_{22} \\end{bmatrix}\\f\]

where \\f$J\_{11} = M\[Z\_{x}^{2}\]\\f$, \\f$J\_{22} = M\[Z\_{y}^{2}\]\\f$ and \\f$J\_{12} = M\[Z\_{x}Z\_{y}\]\\f$ are the components of the tensor, \\f$M\[\]\\f$ denotes mathematical expectation (we can consider this operation as averaging in a window w), and \\f$Z\_{x}\\f$ and \\f$Z\_{y}\\f$ are the partial derivatives of the image \\f$Z\\f$ with respect to \\f$x\\f$ and \\f$y\\f$.

The eigenvalues of the tensor are given by the formula below: \\f\[\\lambda\_{1,2} = \\frac{1}{2} \\left \[ J\_{11} + J\_{22} \\pm \\sqrt{(J\_{11} - J\_{22})^{2} + 4J\_{12}^{2}} \\right \] \\f\] where \\f$\\lambda\_1\\f$ is the largest eigenvalue and \\f$\\lambda\_2\\f$ is the smallest eigenvalue.
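
As a quick sanity check of the closed-form eigenvalue expression, the sketch below evaluates it with NumPy and compares the result with a general symmetric eigen-solver. The tensor components are arbitrary numbers chosen for illustration, and the helper name `gst_eigenvalues` is hypothetical.

```python
import numpy as np

def gst_eigenvalues(j11, j12, j22):
    """Closed-form eigenvalues of the symmetric 2x2 tensor [[j11, j12], [j12, j22]]."""
    trace = j11 + j22
    root = np.sqrt((j11 - j22) ** 2 + 4.0 * j12 ** 2)
    lambda1 = 0.5 * (trace + root)   # largest eigenvalue
    lambda2 = 0.5 * (trace - root)   # smallest eigenvalue
    return lambda1, lambda2

# Arbitrary example components (made up for illustration).
j11, j12, j22 = 4.0, 1.5, 2.0
lambda1, lambda2 = gst_eigenvalues(j11, j12, j22)

# Cross-check against NumPy's solver, which returns eigenvalues in ascending order.
reference = np.linalg.eigvalsh(np.array([[j11, j12], [j12, j22]]))
print(lambda1, lambda2, reference)
```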

### How to estimate orientation and coherency of an anisotropic image by gradient structure tensor?

The orientation of an anisotropic image: \\f\[\\alpha = 0.5 \\arctan\\frac{2J\_{12}}{J\_{22} - J\_{11}}\\f\]

Coherency: \\f\[C = \\frac{\\lambda\_1 - \\lambda\_2}{\\lambda\_1 + \\lambda\_2}\\f\]

The coherency ranges from 0 to 1. For ideal local orientation (\\f$\\lambda\_2\\f$ = 0, \\f$\\lambda\_1\\f$ > 0) it is one, for an isotropic gray value structure (\\f$\\lambda\_1\\f$ = \\f$\\lambda\_2\\f$ > 0) it is zero.
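
The two formulas can be tried on a synthetic image with NumPy alone. This is a minimal sketch under stated assumptions: the averaging \\f$M\[\]\\f$ is taken over one window covering the whole patch, derivatives come from `np.gradient` rather than an OpenCV filter, and the arctangent is evaluated with the quadrant-aware `np.arctan2`. The stripe pattern and all variable names are made up for illustration.

```python
import numpy as np

# Synthetic test image: sinusoidal stripes whose gradient points at 30 degrees.
theta = np.deg2rad(30.0)
y, x = np.mgrid[0:128, 0:128].astype(np.float64)
z = np.sin(0.2 * (x * np.cos(theta) + y * np.sin(theta)))

# Partial derivatives; np.gradient returns d/d(row) first, then d/d(column).
zy, zx = np.gradient(z)
# Drop the border, where np.gradient falls back to one-sided differences.
zx, zy = zx[1:-1, 1:-1], zy[1:-1, 1:-1]

# M[...] taken as a plain average over the whole patch (one big window).
j11 = np.mean(zx * zx)
j22 = np.mean(zy * zy)
j12 = np.mean(zx * zy)

root = np.sqrt((j11 - j22) ** 2 + 4.0 * j12 ** 2)
lambda1 = 0.5 * (j11 + j22 + root)
lambda2 = 0.5 * (j11 + j22 - root)

coherency = (lambda1 - lambda2) / (lambda1 + lambda2)
# arctan2 keeps the correct quadrant when J22 - J11 is negative or zero.
orientation_deg = np.rad2deg(0.5 * np.arctan2(2.0 * j12, j22 - j11))
print(f"coherency = {coherency:.3f}, orientation = {orientation_deg:.1f} deg")
```

Because the stripes have a single direction, the smallest eigenvalue is close to zero and the coherency comes out close to one, which matches the ideal local orientation case described above.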

## Source code

You can find source code in the `samples/cpp/tutorial_code/ImgProc/anisotropic_image_segmentation/anisotropic_image_segmentation.cpp` of the OpenCV source code library.

@add\_toggle\_cpp @include cpp/tutorial\_code/ImgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.cpp @end\_toggle

@add\_toggle\_python @include samples/python/tutorial\_code/imgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.py @end\_toggle

## Explanation

An anisotropic image segmentation algorithm consists of a gradient structure tensor calculation, an orientation calculation, a coherency calculation and an orientation and coherency thresholding:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.cpp main @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.py main @end\_toggle

A function calcGST() calculates orientation and coherency by using a gradient structure tensor. An input parameter w defines a window size:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.cpp calcGST @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.py calcGST @end\_toggle

The code below applies the thresholds LowThr and HighThr to the image orientation and the threshold C\_Thr to the image coherency calculated by the previous function. LowThr and HighThr define the orientation range:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.cpp thresholding @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.py thresholding @end\_toggle

And finally we combine thresholding results:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.cpp combining @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/anisotropic\_image\_segmentation/anisotropic\_image\_segmentation.py combining @end\_toggle
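
The thresholding and combining steps amount to building two boolean masks and intersecting them. The following NumPy sketch shows that logic on a tiny made-up example; the 2x2 maps are hypothetical, and the threshold values are the ones quoted in the Result section of this tutorial. It assumes an inclusive orientation range and a strict coherency threshold.

```python
import numpy as np

# Hypothetical per-pixel maps, as an orientation/coherency step might produce them.
orientation = np.array([[10.0, 40.0], [50.0, 80.0]])   # degrees
coherency = np.array([[0.90, 0.20], [0.60, 0.95]])

low_thr, high_thr, c_thr = 35.0, 57.0, 0.43

# Keep pixels whose orientation lies inside [low_thr, high_thr] ...
orientation_mask = (orientation >= low_thr) & (orientation <= high_thr)
# ... and whose coherency exceeds c_thr.
coherency_mask = coherency > c_thr

# A pixel belongs to the segment only if both conditions hold.
segmented = orientation_mask & coherency_mask
print(segmented)
```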

## Result

Below you can see the real anisotropic image with single direction:

Below you can see the orientation and coherency of the anisotropic image:

Below you can see the segmentation result:

The result has been computed with w = 52, C\_Thr = 0.43, LowThr = 35, HighThr = 57. We can see that the algorithm selected only the areas with one single direction.

## References

-   [Structure tensor](https://en.wikipedia.org/wiki/Structure_tensor) - structure tensor description on the wikipedia

## [Basic Geometric Drawing](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/basic_geometric_drawing/basic_geometric_drawing/)

# Basic Drawing {#tutorial\_basic\_geometric\_drawing}

@tableofcontents

@next\_tutorial{tutorial\_random\_generator\_and\_text}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goals

In this tutorial you will learn how to:

-   Draw a **line** by using the OpenCV function **line()**
-   Draw an **ellipse** by using the OpenCV function **ellipse()**
-   Draw a **rectangle** by using the OpenCV function **rectangle()**
-   Draw a **circle** by using the OpenCV function **circle()**
-   Draw a **filled polygon** by using the OpenCV function **fillPoly()**

## OpenCV Theory

@add\_toggle\_cpp For this tutorial, we will heavily use two structures: @ref cv::Point and @ref cv::Scalar : @end\_toggle @add\_toggle\_java For this tutorial, we will heavily use two structures: @ref cv::Point and @ref cv::Scalar : @end\_toggle @add\_toggle\_python For this tutorial, we will heavily use tuples in Python instead of @ref cv::Point and @ref cv::Scalar : @end\_toggle

### Point

It represents a 2D point, specified by its image coordinates \\f$x\\f$ and \\f$y\\f$. We can define it as: @add\_toggle\_cpp @code{.cpp} Point pt; pt.x = 10; pt.y = 8; @endcode or @code{.cpp} Point pt = Point(10, 8); @endcode @end\_toggle @add\_toggle\_java @code{.java} Point pt = new Point(); pt.x = 10; pt.y = 8; @endcode or @code{.java} Point pt = new Point(10, 8); @endcode @end\_toggle @add\_toggle\_python @code{.python} pt = (10, 0) # x = 10, y = 0 @endcode @end\_toggle

### Scalar

-   Represents a 4-element vector. The type Scalar is widely used in OpenCV for passing pixel values.
-   In this tutorial, we will use it extensively to represent BGR color values (3 parameters). It is not necessary to define the last argument if it is not going to be used.
-   Let's see an example, if we are asked for a color argument and we give: @add\_toggle\_cpp @code{.cpp} Scalar( a, b, c ) @endcode @end\_toggle @add\_toggle\_java @code{.java} Scalar( a, b, c ) @endcode @end\_toggle @add\_toggle\_python @code{.python} ( a, b, c ) @endcode @end\_toggle We would be defining a BGR color such as: _Blue = a_, _Green = b_ and _Red = c_
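
To make the channel order concrete, here is a small NumPy-only sketch (no drawing functions are called, and the array shape is an arbitrary choice) that paints an image with a three-element color and reads the channels back in B, G, R order.

```python
import numpy as np

# A 2x2 image with three 8-bit channels, stored in B, G, R order.
img = np.zeros((2, 2, 3), dtype=np.uint8)

a, b, c = 255, 0, 0          # (a, b, c) read as Blue = a, Green = b, Red = c
img[:] = (a, b, c)           # paint every pixel with that color

blue, green, red = img[0, 0]
print(blue, green, red)
```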

## Code

@add\_toggle\_cpp

-   This code is in your OpenCV sample folder. Otherwise you can grab it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgProc/basic_drawing/Drawing_1.cpp) @include samples/cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp @end\_toggle

@add\_toggle\_java

-   This code is in your OpenCV sample folder. Otherwise you can grab it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java) @include samples/java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java @end\_toggle

@add\_toggle\_python

-   This code is in your OpenCV sample folder. Otherwise you can grab it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/imgProc/BasicGeometricDrawing/basic_geometric_drawing.py) @include samples/python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py @end\_toggle

## Explanation

Since we plan to draw two examples (an atom and a rook), we have to create two images and two windows to display them. @add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp create\_images @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java create\_images @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py create\_images @end\_toggle

We created functions to draw different geometric shapes. For instance, to draw the atom we used **MyEllipse** and **MyFilledCircle**: @add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp draw\_atom @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java draw\_atom @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py draw\_atom @end\_toggle

And to draw the rook we employed **MyLine**, **rectangle** and a **MyPolygon**: @add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp draw\_rook @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java draw\_rook @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py draw\_rook @end\_toggle

Let's check what is inside each of these functions:

#### MyLine

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp my\_line @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java my\_line @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py my\_line @end\_toggle

-   As we can see, **MyLine** just calls the function **line()** , which does the following:
    -   Draw a line from Point **start** to Point **end**
    -   The line is displayed in the image **img**
    -   The line color is defined by **( 0, 0, 0 )**, which is the BGR value corresponding to **Black**
    -   The line thickness is set to **thickness** (in this case 2)
    -   The line is an 8-connected one (**lineType** = 8)

#### MyEllipse

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp my\_ellipse @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java my\_ellipse @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py my\_ellipse @end\_toggle

-   From the code above, we can observe that the function **ellipse()** draws an ellipse such that:
    
    -   The ellipse is displayed in the image **img**
    -   The ellipse center is located at the point **(w/2, w/2)** and its half-axis lengths are **(w/4, w/16)**
    -   The ellipse is rotated **angle** degrees
    -   The ellipse extends an arc between **0** and **360** degrees
    -   The color of the figure will be **( 255, 0, 0 )** which means blue in BGR value.
    -   The ellipse's **thickness** is 2.

#### MyFilledCircle

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp my\_filled\_circle @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java my\_filled\_circle @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py my\_filled\_circle @end\_toggle

-   Similar to the ellipse function, we can observe that _circle_ receives as arguments:
    
    -   The image where the circle will be displayed (**img**)
    -   The center of the circle denoted as the point **center**
    -   The radius of the circle: **w/32**
    -   The color of the circle: **( 0, 0, 255 )** which means _Red_ in BGR
    -   Since **thickness** = -1, the circle will be drawn filled.

#### MyPolygon

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp my\_polygon @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java my\_polygon @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py my\_polygon @end\_toggle

-   To draw a filled polygon we use the function **fillPoly()** . We note that:
    
    -   The polygon will be drawn on **img**
    -   The vertices of the polygon are the set of points in **ppt**
    -   The color of the polygon is defined by **( 255, 255, 255 )**, which is the BGR value for _white_

#### rectangle

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/basic\_drawing/Drawing\_1.cpp rectangle @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/BasicGeometricDrawing/BasicGeometricDrawing.java rectangle @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/BasicGeometricDrawing/basic\_geometric\_drawing.py rectangle @end\_toggle

-   Finally we have the @ref cv::rectangle function (we did not create a special function for it). We note that:
    
    -   The rectangle will be drawn on **rook\_image**
    -   Two opposite vertices of the rectangle are defined by **( 0, 7\*w/8 )** and **( w, w )**
    -   The color of the rectangle is given by **( 0, 255, 255 )** which is the BGR value for _yellow_
    -   Since the thickness value is given by **FILLED (-1)**, the rectangle will be filled.

## Result

Compiling and running your program should give you a result like this:

## [Erosion Dilatation](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/erosion_dilatation/erosion_dilatation/)

# Eroding and Dilating {#tutorial\_erosion\_dilatation}

@tableofcontents

@prev\_tutorial{tutorial\_gausian\_median\_blur\_bilateral\_filter} @next\_tutorial{tutorial\_opening\_closing\_hats}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Apply two very common morphological operators: Erosion and Dilation. For this purpose, you will use the following OpenCV functions:
    -   @ref cv::erode
    -   @ref cv::dilate

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

## Morphological Operations

-   In short: A set of operations that process images based on shapes. Morphological operations apply a _structuring element_ to an input image and generate an output image.
    
-   The most basic morphological operations are Erosion and Dilation. They have a wide array of uses, e.g.:
    
    -   Removing noise
    -   Isolation of individual elements and joining disparate elements in an image.
    -   Finding of intensity bumps or holes in an image
-   We will explain dilation and erosion briefly, using the following image as an example:
    

### Dilation

-   This operation consists of convolving an image \\f$A\\f$ with some kernel (\\f$B\\f$), which can have any shape or size, usually a square or circle.
    
-   The kernel \\f$B\\f$ has a defined _anchor point_, usually being the center of the kernel.
    
-   As the kernel \\f$B\\f$ is scanned over the image, we compute the maximal pixel value overlapped by \\f$B\\f$ and replace the image pixel in the anchor point position with that maximal value. As you can deduce, this maximizing operation causes bright regions within an image to "grow" (therefore the name _dilation_).
    
-   The dilation operation is: \\f$\\texttt{dst} (x,y) = \\max \_{(x',y'): \\, \\texttt{element} (x',y') \\ne0 } \\texttt{src} (x+x',y+y')\\f$
    
-   Take the above image as an example. Applying dilation we can get:
    
-   The bright area of the letter dilates around the black regions of the background.
    

### Erosion

-   This operation is the sister of dilation. It computes a local minimum over the area of given kernel.
    
-   As the kernel \\f$B\\f$ is scanned over the image, we compute the minimal pixel value overlapped by \\f$B\\f$ and replace the image pixel under the anchor point with that minimal value.
    
-   The erosion operation is: \\f$\\texttt{dst} (x,y) = \\min \_{(x',y'): \\, \\texttt{element} (x',y') \\ne0 } \\texttt{src} (x+x',y+y')\\f$
    
-   Analogously to the example for dilation, we can apply the erosion operator to the original image (shown above). You can see in the result below that the bright areas of the image get thinner, whereas the dark zones get bigger.
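
The two formulas, dilation as a local maximum and erosion as a local minimum over the non-zero cells of the kernel, can be sketched directly in NumPy. This is an illustration of the definitions and not how the library implements them: it assumes an integer single-channel image, an anchor at the kernel center, and out-of-image pixels that never win the maximum or minimum. The helper names are hypothetical.

```python
import numpy as np

def _morph(src, element, reduce_fn, pad_value):
    kh, kw = element.shape
    ay, ax = kh // 2, kw // 2                 # anchor assumed at the kernel center
    padded = np.pad(src, ((ay, kh - 1 - ay), (ax, kw - 1 - ax)),
                    constant_values=pad_value)
    dst = np.full_like(src, pad_value)
    rows, cols = src.shape
    for dy in range(kh):
        for dx in range(kw):
            if element[dy, dx] != 0:          # only non-zero kernel cells count
                # padded[y + dy, x + dx] is src[y + dy - ay, x + dx - ax]
                reduce_fn(dst, padded[dy:dy + rows, dx:dx + cols], out=dst)
    return dst

def dilate(src, element):
    """dst(x, y) = max of src over the non-zero cells of element."""
    return _morph(src, element, np.maximum, np.iinfo(src.dtype).min)

def erode(src, element):
    """dst(x, y) = min of src over the non-zero cells of element."""
    return _morph(src, element, np.minimum, np.iinfo(src.dtype).max)

# A bright 3x3 square on a dark 7x7 background.
src = np.zeros((7, 7), dtype=np.uint8)
src[2:5, 2:5] = 255
square = np.ones((3, 3), dtype=np.uint8)

dilated = dilate(src, square)   # bright square grows to 5x5
eroded = erode(src, square)     # bright square shrinks to a single pixel
print(dilated)
print(eroded)
```

With a 3x3 square kernel the bright region grows by one pixel on every side under dilation and shrinks by one pixel on every side under erosion, matching the "grow" and "get thinner" behaviour described above.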
    

## Code

@add\_toggle\_cpp This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgProc/Morphology_1.cpp) @include samples/cpp/tutorial\_code/ImgProc/Morphology\_1.cpp @end\_toggle

@add\_toggle\_java This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/erosion_dilatation/MorphologyDemo1.java) @include samples/java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java @end\_toggle

@add\_toggle\_python This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/erosion_dilatation/morphology_1.py) @include samples/python/tutorial\_code/imgProc/erosion\_dilatation/morphology\_1.py @end\_toggle

## Explanation

@add\_toggle\_cpp Most of the material shown here is trivial (if you have any doubt, please refer to the tutorials in previous sections). Let's check the general structure of the C++ program:

@snippet cpp/tutorial\_code/ImgProc/Morphology\_1.cpp main

-# Load an image (can be BGR or grayscale)
-# Create two windows (one for dilation output, the other for erosion)
-# Create a set of two Trackbars for each operation:
    - The first trackbar "Element" returns either **erosion\_elem** or **dilation\_elem**
    - The second trackbar "Kernel size" returns **erosion\_size** or **dilation\_size** for the corresponding operation.
-# Call erosion and dilation once to show the initial image.

Every time we move any slider, the user's function **Erosion** or **Dilation** will be called and it will update the output image based on the current trackbar values.

Let's analyze these two functions:

### The erosion function (CPP)

@snippet cpp/tutorial\_code/ImgProc/Morphology\_1.cpp erosion

The function that performs the _erosion_ operation is @ref cv::erode . As we can see, it receives three arguments:

-   _src_: The source image
    
-   _erosion\_dst_: The output image
    
-   _element_: This is the kernel we will use to perform the operation. If we do not specify, the default is a simple `3x3` matrix. Otherwise, we can specify its shape. For this, we need to use the function cv::getStructuringElement : @snippet cpp/tutorial\_code/ImgProc/Morphology\_1.cpp kernel
    
    We can choose any of the following shapes for our kernel:
    
    -   Rectangular box: MORPH\_RECT
    -   Cross: MORPH\_CROSS
    -   Ellipse: MORPH\_ELLIPSE
    -   Diamond: MORPH\_DIAMOND
    
    Then, we just have to specify the size of our kernel and the _anchor point_. If not specified, it is assumed to be in the center.
    

That is all. We are ready to perform the erosion of our image.

### The dilation function (CPP)

The code is below. As you can see, it is very similar to the snippet of code for **erosion**. Here we also have the option of defining our kernel, its anchor point and the size of the operator to be used. @snippet cpp/tutorial\_code/ImgProc/Morphology\_1.cpp dilation @end\_toggle

@add\_toggle\_java Most of the material shown here is trivial (if you have any doubt, please refer to the tutorials in previous sections). Let's check however the general structure of the java class. There are 4 main parts in the java class:

-   the class constructor, which sets up the window that will be filled with window components
-   the `addComponentsToPane` method, which fills out the window
-   the `update` method, which determines what happens when the user changes any value
-   the `main` method, which is the entry point of the program

In this tutorial we will focus on the `addComponentsToPane` and `update` methods. However, for completeness the steps followed in the constructor are:

-# Load an image (can be BGR or grayscale)
-# Create a window
-# Add various control components with `addComponentsToPane`
-# Show the window

The components were added by the following method:

@snippet java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java components

In short we

-# create a panel for the sliders
-# create a combo box for the element types
-# create a slider for the kernel size
-# create a combo box for the morphology function to use (erosion or dilation)

The action and state-changed listeners end by calling the `update` method, which updates the image based on the current slider values. So every time we move any slider, the `update` method is triggered.

### Updating the image (Java)

To update the image we used the following implementation:

@snippet java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java update

In other words we

-# get the structuring element the user chose
-# execute the **erosion** or **dilation** function based on `doErosion`
-# reload the image with the morphology applied
-# repaint the frame

Let's analyze the `erode` and `dilate` methods:

### The erosion method (Java)

@snippet java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java erosion

The function that performs the _erosion_ operation is @ref cv::erode . As we can see, it receives three arguments:

-   _src_: The source image
    
-   _erosion\_dst_: The output image
    
-   _element_: This is the kernel we will use to perform the operation. For specifying the shape, we need to use the function cv::getStructuringElement : @snippet java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java kernel
    
    We can choose any of three shapes for our kernel:
    
    -   Rectangular box: Imgproc.SHAPE\_RECT
    -   Cross: Imgproc.SHAPE\_CROSS
    -   Ellipse: Imgproc.SHAPE\_ELLIPSE
    
    Together with the shape we specify the size of our kernel and the _anchor point_. If the anchor point is not specified, it is assumed to be in the center.
    

That is all. We are ready to perform the erosion of our image.

### The dilation function (Java)

The code is below. As you can see, it is very similar to the snippet of code for **erosion**. Here we also have the option of defining our kernel, its anchor point and the size of the operator to be used. @snippet java/tutorial\_code/ImgProc/erosion\_dilatation/MorphologyDemo1.java dilation @end\_toggle

@add\_toggle\_python Most of the material shown here is trivial (if you have any doubt, please refer to the tutorials in previous sections). Let's check the general structure of the python script:

@snippet python/tutorial\_code/imgProc/erosion\_dilatation/morphology\_1.py main

-# Load an image (can be BGR or grayscale)
-# Create two windows (one for erosion output, the other for dilation) with a set of trackbars each
    - The first trackbar "Element" returns the value for the morphological type that will be mapped (1 = rectangle, 2 = cross, 3 = ellipse)
    - The second trackbar "Kernel size" returns the size of the element for the corresponding operation
-# Call erosion and dilation once to show the initial image

Every time we move any slider, the user's function **erosion** or **dilation** will be called and it will update the output image based on the current trackbar values.

Let's analyze these two functions:

### The erosion function (Python)

@snippet python/tutorial\_code/imgProc/erosion\_dilatation/morphology\_1.py erosion

The function that performs the _erosion_ operation is @ref cv::erode . As we can see, it receives two arguments and returns the processed image:

-   _src_: The source image
    
-   _element_: The kernel we will use to perform the operation. We can specify its shape by using the function cv::getStructuringElement : @snippet python/tutorial\_code/imgProc/erosion\_dilatation/morphology\_1.py kernel
    
    We can choose any of the following shapes for our kernel:
    
    -   Rectangular box: MORPH\_RECT
    -   Cross: MORPH\_CROSS
    -   Ellipse: MORPH\_ELLIPSE
    -   Diamond: MORPH\_DIAMOND

Then, we just have to specify the size of our kernel and the _anchor point_. If the anchor point is not specified, it is assumed to be in the center.

That is all. We are ready to perform the erosion of our image.

### The dilation function (Python)

The code is below. As you can see, it is very similar to the snippet of code for **erosion**. Here we also have the option of defining our kernel, its anchor point and the size of the operator to be used.

@snippet python/tutorial\_code/imgProc/erosion\_dilatation/morphology\_1.py dilation @end\_toggle

@note Additionally, there are further parameters that allow you to perform multiple erosions/dilations (iterations) at once and also set the border type and value. However, we haven't used those in this simple tutorial. You can check out the reference for more details.
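
To make the role of the kernel shape and of the iterations parameter more concrete, here is a minimal plain-Python sketch of grayscale erosion. It is not OpenCV's implementation: the helper name `erode`, the list-of-lists image, and the choice to ignore pixels outside the image are assumptions made for illustration only.

```python
def erode(image, element, iterations=1):
    """Grayscale erosion: each pixel becomes the minimum over the element's footprint.

    `element` is a square list of 0/1 rows with its anchor at the center.
    Pixels outside the image are ignored (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    half = len(element) // 2
    for _ in range(iterations):
        out = [row[:] for row in image]
        for i in range(rows):
            for j in range(cols):
                values = []
                for k in range(-half, half + 1):
                    for l in range(-half, half + 1):
                        r, c = i + k, j + l
                        inside = 0 <= r < rows and 0 <= c < cols
                        if element[k + half][l + half] and inside:
                            values.append(image[r][c])
                out[i][j] = min(values)
        image = out
    return image


# A 3x3 cross-shaped element and a 3x3 bright square on a dark background.
cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
square = [[0, 0, 0, 0, 0],
          [0, 255, 255, 255, 0],
          [0, 255, 255, 255, 0],
          [0, 255, 255, 255, 0],
          [0, 0, 0, 0, 0]]

once = erode(square, cross)                  # only the center pixel survives
twice = erode(square, cross, iterations=2)   # the bright square disappears
```

Running the operation twice is the same as applying it to its own output, which is what an iterations count amounts to in this sketch.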

## Results

Compile the code above and execute it (or run the script if using python) with an image as argument. If you do not provide an image as argument the default sample image ([LinuxLogo.jpg](https://github.com/opencv/opencv/tree/5.x/samples/data/LinuxLogo.jpg)) will be used.

For instance, using this image:

We get the results below. Varying the indices in the Trackbars gives different output images, naturally. Try them out! You can even try to add a third Trackbar to control the number of iterations.

(depending on the programming language the output might vary a little or be only 1 window)

## [Gausian Median Blur Bilateral Filter](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/gausian_median_blur_bilateral_filter/gausian_median_blur_bilateral_filter/)

# Smoothing Images {#tutorial\_gausian\_median\_blur\_bilateral\_filter}

@tableofcontents

@prev\_tutorial{tutorial\_random\_generator\_and\_text} @next\_tutorial{tutorial\_erosion\_dilatation}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to apply several filters (linear and non-linear) to smooth images using OpenCV functions such as:

-   **blur()**
-   **GaussianBlur()**
-   **medianBlur()**
-   **bilateralFilter()**

## Theory

@note The explanation below is based on the book [Computer Vision: Algorithms and Applications](https://szeliski.org/Book/) by Richard Szeliski and on _Learning OpenCV_

-   _Smoothing_, also called _blurring_, is a simple and frequently used image processing operation.
    
-   There are many reasons for smoothing. In this tutorial we will focus on smoothing in order to reduce noise (other uses will be seen in the following tutorials).
    
-   To perform a smoothing operation we will apply a _filter_ to our image. The most common type of filter is _linear_, in which an output pixel's value (i.e. \\f$g(i,j)\\f$) is determined as a weighted sum of input pixel values (i.e. \\f$f(i+k,j+l)\\f$):
    
    \\f\[g(i,j) = \\sum\_{k,l} f(i+k, j+l) h(k,l)\\f\]
    
    \\f$h(k,l)\\f$ is called the _kernel_, which is nothing more than the coefficients of the filter.
    
    It helps to visualize a _filter_ as a window of coefficients sliding across the image.
    
-   There are many kinds of filters; here we will mention the most commonly used:
    

### Normalized Box Filter

-   This filter is the simplest of all! Each output pixel is the _mean_ of its kernel neighbors (all of them contribute with equal weights).
    
-   The kernel is below:
    
    \\f\[K = \\dfrac{1}{K\_{width} \\cdot K\_{height}} \\begin{bmatrix} 1 & 1 & 1 & ... & 1 \\\\ 1 & 1 & 1 & ... & 1 \\\\ . & . & . & ... & 1 \\\\ . & . & . & ... & 1 \\\\ 1 & 1 & 1 & ... & 1 \\end{bmatrix}\\f\]
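
As a rough illustration of the kernel above, the following plain-Python sketch applies a normalized box filter to a small list-of-lists image. The helper name `box_filter`, the odd kernel sizes, and the replicated-border handling are assumptions of this sketch and do not describe how **blur()** is implemented.

```python
def box_filter(image, k_width, k_height):
    """Replace each pixel by the mean of its k_width x k_height neighborhood.

    Kernel sizes are assumed to be odd. Border pixels are handled by
    replicating the edge value (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    weight = 1.0 / (k_width * k_height)  # every kernel coefficient is identical
    half_h, half_w = k_height // 2, k_width // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(-half_h, half_h + 1):
                for l in range(-half_w, half_w + 1):
                    r = min(max(i + k, 0), rows - 1)
                    c = min(max(j + l, 0), cols - 1)
                    acc += image[r][c] * weight
            out[i][j] = acc
    return out


# A flat image with one bright outlier in the middle.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
smoothed = box_filter(image, 3, 3)
print(round(smoothed[2][2], 2))  # (8 * 10 + 100) / 9 = 20.0
```

The outlier is spread evenly over its 3x3 neighborhood, which is why this filter reduces noise but also blurs detail.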
    

### Gaussian Filter

-   Probably the most useful filter (although not the fastest). Gaussian filtering is done by convolving each point in the input array with a _Gaussian kernel_ and then summing them all to produce the output array.
    
-   Just to make the picture clearer, remember what a 1D Gaussian kernel looks like?
    
    Assuming that an image is 1D, you can notice that the pixel located in the middle would have the biggest weight. The weight of its neighbors decreases as the spatial distance between them and the center pixel increases.
    
    @note Remember that a 2D Gaussian can be represented as: \\f\[G\_{0}(x, y) = A e^{ \\dfrac{ -(x - \\mu\_{x})^{2} }{ 2\\sigma^{2}\_{x} } + \\dfrac{ -(y - \\mu\_{y})^{2} }{ 2\\sigma^{2}\_{y} } }\\f\] where \\f$\\mu\\f$ is the mean (the peak) and \\f$\\sigma^{2}\\f$ represents the variance (for each of the variables \\f$x\\f$ and \\f$y\\f$)
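
The shape of these weights can be checked numerically. The sketch below samples the Gaussian formula on an integer grid and normalizes the result so the weights sum to 1. The function names are hypothetical, and centering the kernel on its middle sample is an assumption of the sketch.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Return `size` samples of a Gaussian centered on the middle sample, summing to 1."""
    center = (size - 1) / 2.0
    weights = [math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2)) for x in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_kernel_2d(size, sigma_x, sigma_y):
    """Outer product of two 1D kernels, i.e. a separable 2D Gaussian."""
    kx = gaussian_kernel_1d(size, sigma_x)
    ky = gaussian_kernel_1d(size, sigma_y)
    return [[wy * wx for wx in kx] for wy in ky]


kernel = gaussian_kernel_1d(5, 1.0)
kernel_2d = gaussian_kernel_2d(5, 1.0, 1.0)
print([round(w, 4) for w in kernel])
```

The middle sample carries the largest weight and the weights fall off symmetrically, matching the description of the 1D case above.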
    

### Median Filter

The median filter runs through each element of the signal (in this case the image) and replaces each pixel with the **median** of its neighboring pixels (located in a square neighborhood around the evaluated pixel).
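
A small plain-Python sketch shows why this works well against isolated outliers ("salt and pepper" noise). The helper `median_filter` and its replicated-border handling are assumptions for illustration, not a description of **medianBlur()** internals.

```python
from statistics import median


def median_filter(image, ksize):
    """Replace each pixel by the median of its ksize x ksize neighborhood (odd ksize)."""
    rows, cols = len(image), len(image[0])
    half = ksize // 2
    out = [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            window = []
            for k in range(-half, half + 1):
                for l in range(-half, half + 1):
                    r = min(max(i + k, 0), rows - 1)  # replicate border (assumption)
                    c = min(max(j + l, 0), cols - 1)
                    window.append(image[r][c])
            out[i][j] = median(window)
    return out


# A flat image with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 0],
    [10, 10, 10, 10],
]
cleaned = median_filter(noisy, 3)
```

Because an outlier never ends up in the middle of the sorted window, both noisy pixels are replaced by the surrounding value instead of being averaged into it.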

### Bilateral Filter

-   So far, we have explained some filters whose main goal is to _smooth_ an input image. However, sometimes the filters do not only dissolve the noise, but also smooth away the _edges_. To avoid this (to a certain extent at least), we can use a bilateral filter.
-   Analogously to the Gaussian filter, the bilateral filter also considers the neighboring pixels with weights assigned to each of them. These weights have two components, the first of which is the same weighting used by the Gaussian filter. The second component takes into account the difference in intensity between the neighboring pixels and the evaluated one.
-   For a more detailed explanation you can check [this link](http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MANDUCHI1/Bilateral_Filtering.html).
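
The two weight components can be sketched on a 1D signal. The function below is a simplified illustration under stated assumptions (Gaussian weights for both components, replicated borders, a hypothetical name `bilateral_1d`); it is not the algorithm used by **bilateralFilter()**.

```python
import math


def bilateral_1d(signal, radius, sigma_space, sigma_color):
    """Smooth a 1D signal with weights that combine spatial and intensity distance."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # replicate border (assumption)
            # First component: the same spatial Gaussian weighting as before.
            w_space = math.exp(-(k * k) / (2.0 * sigma_space ** 2))
            # Second component: penalize neighbors with a very different intensity.
            diff = signal[j] - signal[i]
            w_color = math.exp(-(diff * diff) / (2.0 * sigma_color ** 2))
            weight = w_space * w_color
            num += weight * signal[j]
            den += weight
        out.append(num / den)
    return out


# A noisy step edge: low values on the left, high values on the right.
step = [10, 12, 9, 11, 10, 200, 202, 199, 201, 200]
smoothed = bilateral_1d(step, radius=2, sigma_space=2.0, sigma_color=10.0)
```

Neighbors on the other side of the step get a weight close to zero, so the noise on each side is averaged while the edge itself stays sharp.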

## Code

-   **What does this program do?**
    -   Loads an image
    -   Applies 4 different kinds of filters (explained in Theory) and shows the filtered images sequentially

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgProc/Smoothing/Smoothing.cpp)
    
-   **Code at a glance:** @include samples/cpp/tutorial\_code/ImgProc/Smoothing/Smoothing.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgProc/Smoothing/Smoothing.java)
    
-   **Code at a glance:** @include samples/java/tutorial\_code/ImgProc/Smoothing/Smoothing.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/imgProc/Smoothing/smoothing.py)
    
-   **Code at a glance:** @include samples/python/tutorial\_code/imgProc/Smoothing/smoothing.py @end\_toggle
    

## Explanation

Let's check the OpenCV functions that involve only the smoothing procedure, since the rest is already known by now.

### Normalized Box Filter:

-   OpenCV offers the function **blur()** to perform smoothing with this filter. We specify 4 arguments (more details, check the Reference):
    -   _src_: Source image
    -   _dst_: Destination image
    -   _Size( w, h )_: Defines the size of the kernel to be used ( of width _w_ pixels and height _h_ pixels)
    -   _Point(-1, -1)_: Indicates where the anchor point (the pixel evaluated) is located with respect to the neighborhood. If there is a negative value, then the center of the kernel is considered the anchor point.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Smoothing/Smoothing.cpp blur @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/Smoothing/Smoothing.java blur @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/Smoothing/smoothing.py blur @end\_toggle

### Gaussian Filter:

-   It is performed by the function **GaussianBlur()**. Here we use 5 arguments (for more details, check the OpenCV reference):
    -   _src_: Source image
    -   _dst_: Destination image
    -   _Size(w, h)_: The size of the kernel to be used (the neighbors to be considered). \\f$w\\f$ and \\f$h\\f$ must be odd and positive; they can also be zero, in which case the size is calculated using the \\f$\\sigma\_{x}\\f$ and \\f$\\sigma\_{y}\\f$ arguments.
    -   \\f$\\sigma\_{x}\\f$: The standard deviation in x. Writing \\f$0\\f$ implies that \\f$\\sigma\_{x}\\f$ is calculated using the kernel size.
    -   \\f$\\sigma\_{y}\\f$: The standard deviation in y. Writing \\f$0\\f$ implies that \\f$\\sigma\_{y}\\f$ is set equal to \\f$\\sigma\_{x}\\f$ (and is therefore calculated using the kernel size when both are \\f$0\\f$).
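
For reference, the OpenCV documentation of `getGaussianKernel` gives the rule used to derive a standard deviation from the kernel size when a non-positive sigma is passed. The sketch below simply evaluates that documented formula in plain Python; the helper name is hypothetical, and whether the 5.x branch keeps exactly this rule should be checked against its reference.

```python
def sigma_from_ksize(ksize):
    """Sigma documented for cv::getGaussianKernel when the given sigma is non-positive."""
    return 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8


for ksize in (3, 5, 7):
    print(ksize, round(sigma_from_ksize(ksize), 2))
# 3 0.8
# 5 1.1
# 7 1.4
```

Larger kernels therefore get a proportionally wider Gaussian when sigma is left at zero.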

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Smoothing/Smoothing.cpp gaussianblur @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/Smoothing/Smoothing.java gaussianblur @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/Smoothing/smoothing.py gaussianblur @end\_toggle

### Median Filter:

-   This filter is provided by the **medianBlur()** function: We use three arguments:
    -   _src_: Source image
    -   _dst_: Destination image, must be the same type as _src_
    -   _i_: Size of the kernel (only one because we use a square window). Must be odd.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Smoothing/Smoothing.cpp medianblur @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/Smoothing/Smoothing.java medianblur @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/Smoothing/smoothing.py medianblur @end\_toggle

### Bilateral Filter

-   Provided by the OpenCV function **bilateralFilter()**. We use 5 arguments:
    -   _src_: Source image
    -   _dst_: Destination image
    -   _d_: The diameter of each pixel neighborhood.
    -   \\f$\\sigma\_{Color}\\f$: Standard deviation in the color space.
    -   \\f$\\sigma\_{Space}\\f$: Standard deviation in the coordinate space (in pixel terms)

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Smoothing/Smoothing.cpp bilateralfilter @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/Smoothing/Smoothing.java bilateralfilter @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/Smoothing/smoothing.py bilateralfilter @end\_toggle

## Results

-   The code opens an image (in this case [lena.jpg](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/lena.jpg)) and displays it under the effects of the 4 filters explained.
    
-   Here is a snapshot of the image smoothed using _medianBlur_:

## [Generalized Hough Ballard Guil](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/generalized_hough_ballard_guil/generalized_hough_ballard_guil/)

# Object detection with Generalized Ballard and Guil Hough Transform {#tutorial\_generalized\_hough\_ballard\_guil}

@tableofcontents

@prev\_tutorial{tutorial\_hough\_circle} @next\_tutorial{tutorial\_remap}

| | |
| -: | :- |
| Original author | Markus Heck |
| Compatibility | OpenCV >= 3.4 |

## Goal

In this tutorial you will learn how to:

-   Use @ref cv::GeneralizedHoughBallard and @ref cv::GeneralizedHoughGuil to detect an object

## Example

### What does this program do?

1.  Load the image and template

2.  Instantiate @ref cv::GeneralizedHoughBallard with the help of `createGeneralizedHoughBallard()`
3.  Instantiate @ref cv::GeneralizedHoughGuil with the help of `createGeneralizedHoughGuil()`
4.  Set the required parameters for both GeneralizedHough variants
5.  Detect and show found results

@note

-   Both variants can't be instantiated directly. Using the create methods is required.
-   Guil Hough is very slow. Calculating the results for the "mini" files used in this tutorial takes only a few seconds. With image and template in a higher resolution, as shown below, my notebook requires about 5 minutes to calculate a result.

### Code

The complete code for this tutorial is shown below. @include samples/cpp/tutorial\_code/ImgTrans/generalizedHoughTransform.cpp

## Explanation

### Load image, template and setup variables

@snippet samples/cpp/tutorial\_code/ImgTrans/generalizedHoughTransform.cpp generalized-hough-transform-load-and-setup

The position vectors will contain the matches the detectors will find. Every entry contains four floating-point values:

-   _\[0\]_: x coordinate of center point
-   _\[1\]_: y coordinate of center point
-   _\[2\]_: scale of detected object compared to template
-   _\[3\]_: rotation of detected object in degrees relative to the template

An example could look as follows: `[200, 100, 0.9, 120]`
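
To show how such an entry can be turned into a drawable outline, the following plain-Python sketch computes the four corners of the template rectangle placed at a detected position. The helper name, the 50x40 template size, and the rotation direction (a standard 2D rotation matrix applied in image coordinates) are assumptions of this sketch, not taken from the sample code.

```python
import math


def match_corners(position, template_width, template_height):
    """Corners of the template rectangle placed at a detected position.

    `position` is (x, y, scale, rotation_in_degrees). The rotation direction
    used here is an assumption of this sketch.
    """
    x, y, scale, angle_deg = position
    half_w = template_width * scale / 2.0
    half_h = template_height * scale / 2.0
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    offsets = ((-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h))
    return [(x + dx * cos_a - dy * sin_a, y + dx * sin_a + dy * cos_a) for dx, dy in offsets]


# The example entry from the text, with a hypothetical 50x40 pixel template.
corners = match_corners((200, 100, 0.9, 120), template_width=50, template_height=40)

# Without rotation or scaling the rectangle is axis-aligned around the center.
upright = match_corners((200, 100, 1.0, 0.0), template_width=50, template_height=40)
```

Whatever the rotation, the average of the four corners is the detected center point.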

### Setup parameters

@snippet samples/cpp/tutorial\_code/ImgTrans/generalizedHoughTransform.cpp generalized-hough-transform-setup-parameters

Finding the optimal values can end up in trial and error and depends on many factors, such as the image resolution.

### Run detection

@snippet samples/cpp/tutorial\_code/ImgTrans/generalizedHoughTransform.cpp generalized-hough-transform-run

As mentioned above, this step will take some time, especially with larger images and when using Guil.

### Draw results and show image

@snippet samples/cpp/tutorial\_code/ImgTrans/generalizedHoughTransform.cpp generalized-hough-transform-draw-results

## Result

The blue rectangle shows the result of @ref cv::GeneralizedHoughBallard and the green rectangles the results of @ref cv::GeneralizedHoughGuil.

Getting perfect results like in this example is unlikely if the parameters are not perfectly adapted to the sample. An example with less perfect parameters is shown below. For the Ballard variant, only the center of the result is marked as a black dot on this image. The rectangle would be the same as on the previous image.

## [Back Projection](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/histograms/back_projection/back_projection/)

# Back Projection {#tutorial\_back\_projection}

@tableofcontents

@prev\_tutorial{tutorial\_histogram\_comparison} @next\_tutorial{tutorial\_template\_matching}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn:

-   What is Back Projection and why it is useful
-   How to use the OpenCV function @ref cv::calcBackProject to calculate Back Projection
-   How to mix different channels of an image by using the OpenCV function @ref cv::mixChannels

## Theory

### What is Back Projection?

-   Back Projection is a way of recording how well the pixels of a given image fit the distribution of pixels in a histogram model.
-   To make it simpler: For Back Projection, you calculate the histogram model of a feature and then use it to find this feature in an image.
-   Application example: If you have a histogram of flesh color (say, a Hue-Saturation histogram ), then you can use it to find flesh color areas in an image:

### How does it work?

-   We explain this by using the skin example:
    
-   Let's say you have gotten a skin histogram (Hue-Saturation) based on the image below. The histogram beside it is going to be our _model histogram_ (which we know represents a sample of skin tonality). You applied some mask to capture only the histogram of the skin area:
    
-   Now, let's imagine that you get another hand image (Test Image) like the one below (with its respective histogram):
    
-   What we want to do is to use our _model histogram_ (that we know represents a skin tonality) to detect skin areas in our Test Image. Here are the steps:
    
    1.  For each pixel of our Test Image (i.e. \\f$p(i,j)\\f$ ), collect the data and find the corresponding bin location for that pixel (i.e. \\f$( h\_{i,j}, s\_{i,j} )\\f$ ).
    2.  Look up the _model histogram_ in the corresponding bin - \\f$( h\_{i,j}, s\_{i,j} )\\f$ - and read the bin value.
    3.  Store this bin value in a new image (_BackProjection_). You may also want to normalize the _model histogram_ first, so that the output for the Test Image is visible.
    4.  Applying the steps above, we get the following BackProjection image for our Test Image:
        
        ![](images/Back_Projection_Theory4.jpg)
        
    5.  In terms of statistics, the values stored in _BackProjection_ represent the _probability_ that a pixel in _Test Image_ belongs to a skin area, based on the _model histogram_ that we use. For instance, in our Test Image the brighter areas are more likely to be skin (as they actually are), whereas the darker areas are less likely (notice that these "dark" areas belong to surfaces that have some shadow on them, which in turn affects the detection).
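
The lookup steps above can be sketched in plain Python for a 1D hue histogram. All names and sample values here are hypothetical; the sketch assumes 8-bit hue values in the range 0 to 179 and scales the model histogram by its peak, which matches a min-max normalization to \[0, 255\] only when the smallest bin is 0.

```python
def hue_histogram(hue_values, bins, hue_range=180):
    """Count how many hue samples fall into each of `bins` equal-width bins."""
    hist = [0] * bins
    for h in hue_values:
        hist[h * bins // hue_range] += 1
    return hist


def normalize_to_255(hist):
    """Scale the histogram so that its largest bin becomes 255."""
    peak = max(hist)
    return [round(255 * v / peak) for v in hist]


def back_project(test_hues, model_hist, hue_range=180):
    """For every pixel, read the model histogram bin that its hue falls into."""
    bins = len(model_hist)
    return [[model_hist[h * bins // hue_range] for h in row] for row in test_hues]


# Hue samples taken from a (hypothetical) masked skin area: mostly around 10.
skin_samples = [10, 12, 11, 9, 13, 10, 11, 100]
model = normalize_to_255(hue_histogram(skin_samples, bins=18))

# A tiny 2x3 test image given directly as hue values.
test_image = [
    [11, 95, 150],
    [10, 12, 60],
]
backproj = back_project(test_image, model)
print(backproj)  # [[255, 0, 0], [255, 255, 0]]
```

Pixels whose hue falls into the dominant bin of the model come out bright, and hues the model never saw come out black.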
    

## Code

-   **What does this program do?**
    -   Loads an image
    -   Convert the original to HSV format and separate only the _Hue_ channel to be used for the histogram (using the OpenCV function @ref cv::mixChannels )
    -   Let the user enter the number of bins to be used in the calculation of the histogram.
    -   Calculate the histogram (and update it if the bins change) and the backprojection of the same image.
    -   Display the backprojection and the histogram in windows.

@add\_toggle\_cpp

-   **Downloadable code**:
    
    -   Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/Histograms_Matching/calcBackProject_Demo1.cpp) for the basic version (explained in this tutorial).
    -   For stuff slightly fancier (using H-S histograms and floodFill to define a mask for the skin area) you can check the [improved demo](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/Histograms_Matching/calcBackProject_Demo2.cpp)
    -   ...or you can always check out the classical [camshiftdemo](https://github.com/opencv/opencv/tree/5.x/samples/cpp/camshiftdemo.cpp) in samples.
-   **Code at a glance:** @include samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**:
    
    -   Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/Histograms_Matching/back_projection/CalcBackProjectDemo1.java) for the basic version (explained in this tutorial).
    -   For stuff slightly fancier (using H-S histograms and floodFill to define a mask for the skin area) you can check the [improved demo](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/Histograms_Matching/back_projection/CalcBackProjectDemo2.java)
    -   ...or you can always check out the classical [camshiftdemo](https://github.com/opencv/opencv/tree/5.x/samples/cpp/camshiftdemo.cpp) in samples.
-   **Code at a glance:** @include samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**:
    
    -   Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/Histograms_Matching/back_projection/calcBackProject_Demo1.py) for the basic version (explained in this tutorial).
    -   For stuff slightly fancier (using H-S histograms and floodFill to define a mask for the skin area) you can check the [improved demo](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/Histograms_Matching/back_projection/calcBackProject_Demo2.py)
    -   ...or you can always check out the classical [camshiftdemo](https://github.com/opencv/opencv/tree/5.x/samples/cpp/camshiftdemo.cpp) in samples.
-   **Code at a glance:** @include samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py @end\_toggle
    

## Explanation

-   Read the input image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Read the image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Read the image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Read the image @end\_toggle
    
-   Transform it to HSV format:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Transform it to HSV @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Transform it to HSV @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Transform it to HSV @end\_toggle
    
-   For this tutorial, we will use only the Hue value for our 1-D histogram (check out the fancier code in the links above if you want to use the more standard H-S histogram, which yields better results):
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Use only the Hue value @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Use only the Hue value @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Use only the Hue value @end\_toggle
    
-   As you see, we use the function @ref cv::mixChannels to get only channel 0 (Hue) from the hsv image. It gets the following parameters:
    
    -   **&hsv:** The source array from which the channels will be copied
    -   **1:** The number of source arrays
    -   **&hue:** The destination array of the copied channels
    -   **1:** The number of destination arrays
    -   **ch\[\] = {0,0}:** The array of index pairs indicating how the channels are copied. In this case, the Hue(0) channel of &hsv is being copied to the 0 channel of &hue (1-channel)
    -   **1:** Number of index pairs
-   Create a Trackbar for the user to enter the bin values. Any change on the Trackbar means a call to the **Hist\_and\_Backproj** callback function.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Create Trackbar to enter the number of bins @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Create Trackbar to enter the number of bins @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Create Trackbar to enter the number of bins @end\_toggle
    
-   Show the image and wait for the user to exit the program:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Show the image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Show the image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Show the image @end\_toggle
    
-   **Hist\_and\_Backproj function:** Initialize the arguments needed for @ref cv::calcHist . The number of bins comes from the Trackbar:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp initialize @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java initialize @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py initialize @end\_toggle
    
-   Calculate the Histogram and normalize it to the range \\f$\[0,255\]\\f$
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Get the Histogram and normalize it @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Get the Histogram and normalize it @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Get the Histogram and normalize it @end\_toggle
    
-   Get the Backprojection of the same image by calling the function @ref cv::calcBackProject
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Get Backprojection @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Get Backprojection @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Get Backprojection @end\_toggle
    
-   All the arguments are already known (they are the same as those used to calculate the histogram); we only add the backproj matrix, which will store the backprojection of the source image (&hue).
    
-   Display backproj:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Draw the backproj @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Draw the backproj @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Draw the backproj @end\_toggle
    
-   Draw the 1-D Hue histogram of the image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/calcBackProject\_Demo1.cpp Draw the histogram @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/back\_projection/CalcBackProjectDemo1.java Draw the histogram @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/back\_projection/calcBackProject\_Demo1.py Draw the histogram @end\_toggle
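
As a recap of the `ch[] = {0,0}` index pairs passed to mixChannels earlier in this section, the plain-Python sketch below mimics how a flat list of (source channel, destination channel) pairs drives the copy. The function `mix_channels` and the nested-list image layout are illustrative assumptions and not the OpenCV API.

```python
def mix_channels(src, dst, from_to):
    """Copy channels between images according to a flat list of index pairs.

    `from_to` is [src0, dst0, src1, dst1, ...], mirroring the ch[] array.
    `src` and `dst` are lists of rows; each pixel is a list of channel values.
    """
    for i, row in enumerate(src):
        for j, pixel in enumerate(row):
            for p in range(0, len(from_to), 2):
                dst[i][j][from_to[p + 1]] = pixel[from_to[p]]
    return dst


hsv = [[[30, 200, 180], [90, 50, 255]]]  # 1x2 image with channels H, S, V
hue = [[[0], [0]]]                        # 1x2 single-channel destination
mix_channels(hsv, hue, [0, 0])            # copy source channel 0 into destination channel 0
print(hue)  # [[[30], [90]]]
```

With the pair `[0, 0]` only the Hue values are copied, which is exactly what the 1-D histogram in this tutorial needs.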
    

## Results

Here is the output for a sample image (guess what? Another hand). You can play with the bin values and observe how they affect the results:

## [Histogram Calculation](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/histograms/histogram_calculation/histogram_calculation/)


## [Histogram Comparison](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison/)

# Histogram Comparison {#tutorial\_histogram\_comparison}

@tableofcontents

@prev\_tutorial{tutorial\_histogram\_calculation} @next\_tutorial{tutorial\_back\_projection}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the function @ref cv::compareHist to get a numerical parameter that expresses how well two histograms match each other.
-   Use different metrics to compare histograms

## Theory

-   To compare two histograms ( \\f$H\_{1}\\f$ and \\f$H\_{2}\\f$ ), first we have to choose a _metric_ (\\f$d(H\_{1}, H\_{2})\\f$) to express how well both histograms match.
-   OpenCV implements the function @ref cv::compareHist to perform a comparison. It also offers 6 different metrics to compute the matching:
    1.  **Correlation ( cv::HISTCMP\_CORREL )** \\f\[d(H\_1,H\_2) = \\frac{\\sum\_I (H\_1(I) - \\bar{H\_1}) (H\_2(I) - \\bar{H\_2})}{\\sqrt{\\sum\_I(H\_1(I) - \\bar{H\_1})^2 \\sum\_I(H\_2(I) - \\bar{H\_2})^2}}\\f\] where \\f\[\\bar{H\_k} = \\frac{1}{N} \\sum \_J H\_k(J)\\f\] and \\f$N\\f$ is the total number of histogram bins.
    2.  **Chi-Square ( cv::HISTCMP\_CHISQR )** \\f\[d(H\_1,H\_2) = \\sum \_I \\frac{\\left(H\_1(I)-H\_2(I)\\right)^2}{H\_1(I)}\\f\]
    3.  **Intersection ( cv::HISTCMP\_INTERSECT )** \\f\[d(H\_1,H\_2) = \\sum \_I \\min (H\_1(I), H\_2(I))\\f\]
    4.  **Bhattacharyya distance ( cv::HISTCMP\_BHATTACHARYYA )** \\f\[d(H\_1,H\_2) = \\sqrt{1 - \\frac{1}{\\sqrt{\\bar{H\_1} \\bar{H\_2} N^2}} \\sum\_I \\sqrt{H\_1(I) \\cdot H\_2(I)}}\\f\]
    5.  **Alternative Chi-Square ( cv::HISTCMP\_CHISQR\_ALT )** \\f\[d(H\_1,H\_2) = 2 \* \\sum \_I \\frac{\\left(H\_1(I)-H\_2(I)\\right)^2}{H\_1(I)+H\_2(I)}\\f\]
    6.  **Kullback-Leibler divergence ( cv::HISTCMP\_KL\_DIV )** \\f\[d(H\_1,H\_2) = \\sum \_I H\_1(I) \\log \\left(\\frac{H\_1(I)}{H\_2(I)}\\right)\\f\]
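
The six formulas can be evaluated directly on small histograms given as Python lists. The sketch below follows the formulas as written above; the function names are hypothetical, and skipping bins that would cause a division by zero or a logarithm of zero is an assumption of this sketch that may differ from what cv::compareHist does.

```python
import math


def correlation(h1, h2):
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    return num / den


def chi_square(h1, h2):
    # Bins where H1 is zero are skipped to avoid dividing by zero (assumption).
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a != 0)


def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def bhattacharyya(h1, h2):
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    coeff = sum(math.sqrt(a * b) for a, b in zip(h1, h2)) / math.sqrt(m1 * m2 * n * n)
    return math.sqrt(max(0.0, 1.0 - coeff))  # clamp tiny negative rounding errors


def chi_square_alt(h1, h2):
    # Bins where both histograms are zero are skipped (assumption).
    return 2.0 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b != 0)


def kl_divergence(h1, h2):
    # Bins where either histogram is zero are skipped (assumption).
    return sum(a * math.log(a / b) for a, b in zip(h1, h2) if a > 0 and b > 0)


base = [0.1, 0.4, 0.3, 0.2]
other = [0.3, 0.1, 0.2, 0.4]
for metric in (correlation, chi_square, intersection, bhattacharyya, chi_square_alt, kl_divergence):
    print(metric.__name__, round(metric(base, base), 4), round(metric(base, other), 4))
```

Note that the metrics do not share a direction: for Correlation and Intersection a higher value means a better match, while for the other four a lower value means a better match.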

## Code

-   **What does this program do?**
    -   Loads a _base image_ and 2 _test images_ to be compared with it.
    -   Generate 1 image that is the lower half of the _base image_
    -   Convert the images to HSV format
    -   Calculate the H-S histogram for all the images and normalize them in order to compare them.
    -   Compare the histogram of the _base image_ with the 2 test histograms, with the histogram of the lower half of the base image, and with the base image histogram itself.
    -   Display the numerical matching parameters obtained.

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/Histograms_Matching/compareHist_Demo.cpp)
    
-   **Code at a glance:** @include samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/Histograms_Matching/histogram_comparison/CompareHistDemo.java)
    
-   **Code at a glance:** @include samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/Histograms_Matching/histogram_comparison/compareHist_Demo.py)
    
-   **Code at a glance:** @include samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py @end\_toggle
    

## Explanation

-   Load the base image (src\_base) and the other two test images:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Load three images with different environment settings @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Load three images with different environment settings @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Load three images with different environment settings @end\_toggle
    
-   Convert them to HSV format:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Convert to HSV @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Convert to HSV @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Convert to HSV @end\_toggle
    
-   Also, create an image of half the base image (in HSV format):
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Convert to HSV half @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Convert to HSV half @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Convert to HSV half @end\_toggle
    
-   Initialize the arguments to calculate the histograms (bins, ranges and channels H and S).
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Using 50 bins for hue and 60 for saturation @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Using 50 bins for hue and 60 for saturation @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Using 50 bins for hue and 60 for saturation @end\_toggle
    
-   Calculate the Histograms for the base image, the 2 test images and the half-down base image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Calculate the histograms for the HSV images @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Calculate the histograms for the HSV images @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Calculate the histograms for the HSV images @end\_toggle
    
-   Apply sequentially the 6 comparison methods between the histogram of the base image (hist\_base) and the other histograms:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/compareHist\_Demo.cpp Apply the histogram comparison methods @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_comparison/CompareHistDemo.java Apply the histogram comparison methods @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_comparison/compareHist\_Demo.py Apply the histogram comparison methods @end\_toggle
    

## Results

\-# We use the following images as input: the first one is the base image (to be compared to the others), and the other 2 are the test images. We will also compare the first image with itself and with half of the base image.

\-# We should expect a perfect match when we compare the base image histogram with itself. Also, compared with the histogram of half the base image, it should present a high match since both are from the same source. For the other two test images, we can observe that they have very different lighting conditions, so the matching should not be very good:

\-# Here are the numeric results we got with OpenCV 4.12.0:

| _Method_ | Base - Base | Base - Half | Base - Test 1 | Base - Test 2 |
| --- | --- | --- | --- | --- |
| _Correlation_ | 1.000000 | 0.880438 | 0.20457 | 0.065752 |
| _Chi-square_ | 0.000000 | 0.328307 | 181.674 | 80.1494 |
| _Intersection_ | 1.000000 | 0.75005 | 0.315061 | 0.0908022 |
| _Bhattacharyya_ | 0.000000 | 0.237866 | 0.679825 | 0.873709 |
| _Chi-Square alt._ | 0.000000 | 0.395046 | 2.31572 | 3.41024 |
| _KL divergence_ | 0.000000 | 0.321064 | 2.6616 | 9.55412 |

For the _Correlation_ and _Intersection_ methods, the higher the metric, the more accurate the match. As we can see, the match _base-base_ is the highest of all, as expected. We can also observe that the match _base-half_ is the second best (as we predicted). For the other four metrics, the lower the result, the better the match.
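
To make the direction of each metric concrete, here is a minimal plain-Python sketch of four of them (correlation, chi-square, intersection and Bhattacharyya) applied to flat lists of bin values. It is written from the usual textbook definitions and is not OpenCV's implementation: the function names are hypothetical, the histograms are assumed to have the same number of bins, and the zero-denominator handling is an assumption of this sketch.

```python
import math

def correlation(h1, h2):
    """Pearson correlation of two histograms (higher means more similar)."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    # Assumption of this sketch: two flat histograms count as a perfect match.
    return num / den if den else 1.0

def chi_square(h1, h2):
    """Chi-square distance (lower means more similar); empty bins of h1 are skipped."""
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a > 0)

def intersection(h1, h2):
    """Sum of bin-wise minima (higher means more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def bhattacharyya(h1, h2):
    """Bhattacharyya distance (lower means more similar)."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(h1, h2)) / math.sqrt(sum(h1) * sum(h2))
    return math.sqrt(max(0.0, 1.0 - coeff))  # clamp tiny negative rounding errors

# Hypothetical 4-bin histograms, each normalized to sum to 1.
base = [0.1, 0.4, 0.3, 0.2]
other = [0.4, 0.1, 0.2, 0.3]
for metric in (correlation, chi_square, intersection, bhattacharyya):
    print(metric.__name__, metric(base, base), metric(base, other))
```

Comparing `base` with itself gives the best possible value for every metric, which mirrors the _Base - Base_ column of the table.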

## [Histogram Equalization](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization/)


# Histogram Equalization {#tutorial\_histogram\_equalization}

@tableofcontents

@prev\_tutorial{tutorial\_warp\_affine} @next\_tutorial{tutorial\_histogram\_calculation}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn:

-   What an image histogram is and why it is useful
-   How to equalize the histogram of an image by using the OpenCV function @ref cv::equalizeHist

## Theory

### What is an Image Histogram?

-   It is a graphical representation of the intensity distribution of an image.
-   It quantifies the number of pixels for each intensity value considered.

### What is Histogram Equalization?

-   It is a method that improves the contrast in an image by stretching out its intensity range (see also the corresponding [Wikipedia entry](https://en.wikipedia.org/wiki/Histogram_equalization)).
-   To make it clearer, from the image above, you can see that the pixels seem clustered around the middle of the available range of intensities. What Histogram Equalization does is to _stretch out_ this range. Take a look at the figure below: the green circles indicate the _underpopulated_ intensities. After applying the equalization, we get a histogram like the figure in the center. The resulting image is shown in the picture at right.

### How does it work?

-   Equalization implies _mapping_ one distribution (the given histogram) to another distribution (a wider and more uniform distribution of intensity values) so the intensity values are spread over the whole range.
    
-   To accomplish the equalization effect, the remapping should be the _cumulative distribution function (cdf)_ (for more details, refer to _Learning OpenCV_). For the histogram \\f$H(i)\\f$, its _cumulative distribution_ \\f$H^{'}(i)\\f$ is:
    
    \\f\[H^{'}(i) = \\sum\_{0 \\le j < i} H(j)\\f\]
    
    To use this as a remapping function, we have to normalize \\f$H^{'}(i)\\f$ such that the maximum value is 255 (or the maximum intensity value of the image). From the example above, the cumulative function is:
    
-   Finally, we use a simple remapping procedure to obtain the intensity values of the equalized image:
    
    \\f\[equalized( x, y ) = H^{'}( src(x,y) )\\f\]
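
The remapping described above can be sketched in a few lines of plain Python. This is an illustration only, not the code behind `cv::equalizeHist`: the function name `equalize` is hypothetical, the image is assumed to be a list of rows of integer intensities in `[0, levels - 1]`, and the sketch uses the common inclusive running sum for the cumulative distribution, which differs by one bin from the strict inequality in the formula above.

```python
def equalize(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels - 1]."""
    # 1. Histogram H(i): number of pixels for each intensity value.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # 2. Cumulative distribution H'(i) (inclusive running sum in this sketch).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # 3. Normalize so that the maximum of the remapping table is levels - 1.
    total = cdf[-1]
    lookup = [round(c * (levels - 1) / total) for c in cdf]

    # 4. Remap every pixel: equalized(x, y) = H'(src(x, y)).
    return [[lookup[value] for value in row] for row in image]

# A low-contrast 3x3 image whose values sit in the narrow range 100..130.
low_contrast = [
    [100, 100, 110],
    [110, 120, 120],
    [130, 130, 130],
]
for row in equalize(low_contrast):
    print(row)
```

After the remapping, the four distinct input intensities are spread over a much wider part of the 0 to 255 range, which is the contrast stretch the theory section describes.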
    

## Code

-   **What does this program do?**
    -   Loads an image
    -   Converts the original image to grayscale
    -   Equalizes the histogram by using the OpenCV function @ref cv::equalizeHist
    -   Displays the source and equalized images in a window.

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/Histograms_Matching/EqualizeHist_Demo.cpp)
    
-   **Code at a glance:** @include samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/Histograms_Matching/histogram_equalization/EqualizeHistDemo.java)
    
-   **Code at a glance:** @include samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/Histograms_Matching/histogram_equalization/EqualizeHist_Demo.py)
    
-   **Code at a glance:** @include samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py @end\_toggle
    

## Explanation

-   Load the source image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp Load image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java Load image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py Load image @end\_toggle
    
-   Convert it to grayscale:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp Convert to grayscale @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java Convert to grayscale @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py Convert to grayscale @end\_toggle
    
-   Apply histogram equalization with the function @ref cv::equalizeHist :
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp Apply Histogram Equalization @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java Apply Histogram Equalization @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py Apply Histogram Equalization @end\_toggle
    
    As can easily be seen, the only arguments are the original image and the output (equalized) image.
    
-   Display both images (original and equalized):
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp Display results @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java Display results @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py Display results @end\_toggle
    
-   Wait until the user exits the program:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/EqualizeHist\_Demo.cpp Wait until user exits the program @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHistDemo.java Wait until user exits the program @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/Histograms\_Matching/histogram\_equalization/EqualizeHist\_Demo.py Wait until user exits the program @end\_toggle
    

## Results

\-# To better appreciate the results of equalization, let's introduce an image with not much contrast, such as:

![](images/Histogram_Equalization_Original_Image.jpg)

which, by the way, has this histogram:

![](images/Histogram_Equalization_Original_Histogram.jpg)

Notice that the pixels are clustered around the center of the histogram.

\-# After applying the equalization with our program, we get this result:

![](images/Histogram_Equalization_Equalized_Image.jpg)

This image certainly has more contrast. Check out its new histogram:

![](images/Histogram_Equalization_Equalized_Histogram.jpg)

Notice how the pixels are now spread more evenly across the intensity range.

@note Are you wondering how we drew the histogram figures shown above? Check out the following tutorial!

## [Template Matching](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/histograms/template_matching/template_matching/)


# Template Matching {#tutorial\_template\_matching}

@tableofcontents

@prev\_tutorial{tutorial\_back\_projection} @next\_tutorial{tutorial\_find\_contours}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **matchTemplate()** to search for matches between an image patch and an input image
-   Use the OpenCV function **minMaxLoc()** to find the maximum and minimum values (as well as their positions) in a given array.

## Theory

### What is template matching?

Template matching is a technique for finding areas of an image that match (are similar to) a template image (patch).

While the patch must be a rectangle, it may be that not all of the rectangle is relevant. In such a case, a mask can be used to isolate the portion of the patch that should be used to find the match.

### How does it work?

-   We need two primary components:
    
    \-# **Source image (I):** The image in which we expect to find a match to the template image
    
    \-# **Template image (T):** The patch image which will be compared to the source image
    
    Our goal is to detect the highest matching area:
    
-   To identify the matching area, we have to _compare_ the template image against the source image by sliding it:
    
-   By **sliding**, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated so it represents how "good" or "bad" the match at that location is (or how similar the patch is to that particular area of the source image).
    
-   For each location of **T** over **I**, you _store_ the metric in the _result matrix_ **R**. Each location \\f$(x,y)\\f$ in **R** contains the match metric:
    
    The image above is the result **R** of sliding the patch with the metric **TM\_CCORR\_NORMED**. The brightest locations indicate the highest matches. As you can see, the location marked by the red circle is probably the one with the highest value, so that location (the rectangle with that point as its corner and with width and height equal to those of the patch image) is considered the match.
    
-   In practice, we locate the highest value (or the lowest, depending on the matching method) in the _R_ matrix, using the function **minMaxLoc()**
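
The sliding procedure and the search for the best location can be sketched in plain Python as follows. This is a slow, illustrative stand-in and not how **matchTemplate()** or **minMaxLoc()** are implemented: the names `match_template_sqdiff` and `min_max_loc` are hypothetical, images are assumed to be lists of rows of numbers, and the metric is the sum of squared differences (so the lowest value in **R** is the best match).

```python
def match_template_sqdiff(image, templ):
    """Slide templ over image; R[y][x] is the sum of squared differences at (x, y)."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(templ), len(templ[0])
    result = []
    # The template must fit entirely inside the image, so R is smaller than I.
    for y in range(img_h - tpl_h + 1):
        row = []
        for x in range(img_w - tpl_w + 1):
            score = 0
            for dy in range(tpl_h):
                for dx in range(tpl_w):
                    diff = templ[dy][dx] - image[y + dy][x + dx]
                    score += diff * diff
            row.append(score)
        result.append(row)
    return result

def min_max_loc(result):
    """Return (min_val, max_val, min_loc, max_loc); locations are (x, y) pairs."""
    cells = [(value, (x, y)) for y, row in enumerate(result) for x, value in enumerate(row)]
    min_val, min_loc = min(cells, key=lambda cell: cell[0])
    max_val, max_loc = max(cells, key=lambda cell: cell[0])
    return min_val, max_val, min_loc, max_loc

image = [
    [1, 1, 1, 1, 1],
    [1, 9, 8, 1, 1],
    [1, 7, 9, 1, 1],
    [1, 1, 1, 1, 1],
]
templ = [
    [9, 8],
    [7, 9],
]
result = match_template_sqdiff(image, templ)
min_val, max_val, min_loc, max_loc = min_max_loc(result)
print("best match (top-left corner):", min_loc, "score:", min_val)
```

Note that **R** has one entry per valid top-left corner of the template, so its size is (W - w + 1) by (H - h + 1) for a W by H image and a w by h template.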
    

### How does the mask work?

-   If masking is needed for the match, three components are required:
    
    \-# **Source image (I):** The image in which we expect to find a match to the template image
    
    \-# **Template image (T):** The patch image which will be compared to the source image
    
    \-# **Mask image (M):** The mask, a grayscale image that masks the template
    
-   Only two matching methods currently accept a mask: TM\_SQDIFF and TM\_CCORR\_NORMED (see below for an explanation of all the matching methods available in OpenCV).
    
-   The mask must have the same dimensions as the template.
    
-   The mask should have a CV\_8U or CV\_32F depth and the same number of channels as the template image. In the CV\_8U case, the mask values are treated as binary, i.e. zero and non-zero. In the CV\_32F case, the values should fall into the \[0..1\] range and the template pixels will be multiplied by the corresponding mask pixel values. Since the input images in the sample have the CV\_8UC3 type, the mask is also read as a color image.
    

### Which are the matching methods available in OpenCV?

Good question. OpenCV implements template matching in the function **matchTemplate()**. There are 6 available methods:

\-# **method=TM\_SQDIFF**

```
\f[R(x,y)= \sum _{x',y'} (T(x',y')-I(x+x',y+y'))^2\f]
```

\-# **method=TM\_SQDIFF\_NORMED**

```
\f[R(x,y)= \frac{\sum_{x',y'} (T(x',y')-I(x+x',y+y'))^2}{\sqrt{\sum_{x',y'}T(x',y')^2 \cdot \sum_{x',y'} I(x+x',y+y')^2}}\f]
```

\-# **method=TM\_CCORR**

```
\f[R(x,y)= \sum _{x',y'} (T(x',y')  \cdot I(x+x',y+y'))\f]
```

\-# **method=TM\_CCORR\_NORMED**

```
\f[R(x,y)= \frac{\sum_{x',y'} (T(x',y') \cdot I(x+x',y+y'))}{\sqrt{\sum_{x',y'}T(x',y')^2 \cdot \sum_{x',y'} I(x+x',y+y')^2}}\f]
```

\-# **method=TM\_CCOEFF**

```
\f[R(x,y)= \sum _{x',y'} (T'(x',y')  \cdot I'(x+x',y+y'))\f]

where

\f[\begin{array}{l} T'(x',y')=T(x',y') - 1/(w  \cdot h)  \cdot \sum _{x'',y''} T(x'',y'') \\ I'(x+x',y+y')=I(x+x',y+y') - 1/(w  \cdot h)  \cdot \sum _{x'',y''} I(x+x'',y+y'') \end{array}\f]
```

\-# **method=TM\_CCOEFF\_NORMED**

```
\f[R(x,y)= \frac{ \sum_{x',y'} (T'(x',y') \cdot I'(x+x',y+y')) }{ \sqrt{\sum_{x',y'}T'(x',y')^2 \cdot \sum_{x',y'} I'(x+x',y+y')^2} }\f]
```
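
As a worked illustration of the last formula, the following plain-Python sketch evaluates the TM\_CCOEFF\_NORMED score for a single window position. It is an assumption-laden sketch rather than OpenCV's code: the function name `ccoeff_normed` is hypothetical, both inputs are single-channel lists of rows of the same size, and returning 0 for a zero denominator is a choice made here.

```python
import math

def ccoeff_normed(window, templ):
    """TM_CCOEFF_NORMED score between a template and an equally sized image window."""
    t_vals = [v for row in templ for v in row]
    i_vals = [v for row in window for v in row]

    # T' and I': subtract the mean of the template and of the window (the 1/(w*h) terms).
    t_mean = sum(t_vals) / len(t_vals)
    i_mean = sum(i_vals) / len(i_vals)
    t_centered = [v - t_mean for v in t_vals]
    i_centered = [v - i_mean for v in i_vals]

    numerator = sum(t * i for t, i in zip(t_centered, i_centered))
    denominator = math.sqrt(
        sum(t * t for t in t_centered) * sum(i * i for i in i_centered)
    )
    # Assumption of this sketch: a constant window or template scores 0.
    return numerator / denominator if denominator else 0.0

templ = [[10, 20], [30, 40]]
brighter = [[60, 70], [80, 90]]   # same pattern, uniformly brighter
inverted = [[40, 30], [20, 10]]   # same values, opposite pattern

print(ccoeff_normed(brighter, templ))
print(ccoeff_normed(inverted, templ))
```

Because the means are subtracted before correlating, a uniformly brighter copy of the template still receives the maximum score, while an inverted pattern receives the minimum. This is one reason the normalized variants tend to be more robust to lighting changes than the plain TM\_CCORR.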

## Code

-   **What does this program do?**
    -   Loads an input image, an image patch (_template_), and optionally a mask
    -   Performs a template matching procedure by using the OpenCV function **matchTemplate()** with any of the 6 matching methods described before. The user can choose the method by entering a selection in the Trackbar. If a mask is supplied, it will only be used for the methods that support masking
    -   Normalizes the output of the matching procedure
    -   Locates the position with the highest matching probability
    -   Draws a rectangle around the area corresponding to the highest match

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/Histograms_Matching/MatchTemplate_Demo.cpp)
-   **Code at a glance:** @include samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp

@end\_toggle

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/tutorial_template_matching/MatchTemplateDemo.java)
-   **Code at a glance:** @include samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java

@end\_toggle

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/match_template/match_template.py)
-   **Code at a glance:** @include samples/python/tutorial\_code/imgProc/match\_template/match\_template.py

@end\_toggle

## Explanation

-   Declare some global variables, such as the image, template and result matrices, as well as the match method and the window names:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp declare @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java declare @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py global\_variables @end\_toggle
    
-   Load the source image, template, and optionally, if supported for the matching method, a mask:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp load\_image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java load\_image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py load\_image @end\_toggle
    
-   Create the Trackbar to enter the kind of matching method to be used. When a change is detected the callback function is called.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp create\_trackbar @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java create\_trackbar @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py create\_trackbar @end\_toggle
    
-   Let's check out the callback function. First, it makes a copy of the source image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp copy\_source @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java copy\_source @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py copy\_source @end\_toggle
    
-   Perform the template matching operation. The arguments are naturally the input image **I**, the template **T**, the result **R** and the match\_method (given by the Trackbar), and optionally the mask image **M**.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp match\_template @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java match\_template @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py match\_template @end\_toggle
    
-   We normalize the results:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp normalize @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java normalize @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py normalize @end\_toggle
    
-   We localize the minimum and maximum values in the result matrix **R** by using **minMaxLoc()**.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp best\_match @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java best\_match @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py best\_match @end\_toggle
    
-   For the first two methods (TM\_SQDIFF and TM\_SQDIFF\_NORMED) the best matches are the lowest values. For all the others, higher values represent better matches. So, we save the corresponding location in the **matchLoc** variable:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp match\_loc @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java match\_loc @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py match\_loc @end\_toggle
    
-   Display the source image and the result matrix. Draw a rectangle around the highest possible matching area:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/Histograms\_Matching/MatchTemplate\_Demo.cpp imshow @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/tutorial\_template\_matching/MatchTemplateDemo.java imshow @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/match\_template/match\_template.py imshow @end\_toggle
    

## Results

\-# Testing our program with an input image such as:

![](images/Template_Matching_Original_Image.jpg)

and a template image:

![](images/Template_Matching_Template_Image.jpg)

\-# The program generates the following result matrices (the first row shows the standard methods SQDIFF, CCORR and CCOEFF; the second row shows the same methods in their normalized versions). In the first column, the darker a location, the better the match; in the other two columns, the brighter a location, the better the match.

\-# The right match is shown below (black rectangle around the face of the person on the right). Notice that CCORR and CCOEFF gave erroneous best matches, whereas their normalized versions got it right. This may be because we are only considering the "highest match" and not the other possible high matches.

![](images/Template_Matching_Image_Result.jpg)

## [HitOrMiss](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/hitOrMiss/hitOrMiss/)


# Hit-or-Miss {#tutorial\_hitOrMiss}

@tableofcontents

@prev\_tutorial{tutorial\_opening\_closing\_hats} @next\_tutorial{tutorial\_morph\_lines\_detection}

Original author

Lorena García

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to find a given configuration or pattern in a binary image by using the Hit-or-Miss transform (also known as Hit-and-Miss transform). This transform is also the basis of more advanced morphological operations such as thinning or pruning.

We will use the OpenCV function **morphologyEx()** .

## Hit-or-Miss theory

Morphological operators process images based on their shape. These operators apply one or more _structuring elements_ to an input image to obtain the output image. The two basic morphological operations are _erosion_ and _dilation_. The combination of these two operations generates advanced morphological transformations such as _opening_, _closing_, or the _top-hat_ transform. To learn more about these and other basic morphological operations, refer to the previous tutorials (@ref tutorial\_erosion\_dilatation "Eroding and Dilating") and (@ref tutorial\_opening\_closing\_hats "More Morphology Transformations").

The Hit-or-Miss transformation is useful to find patterns in binary images. In particular, it finds those pixels whose neighbourhood matches the shape of a first structuring element \\f$B\_1\\f$ while not matching the shape of a second structuring element \\f$B\_2\\f$ at the same time. Mathematically, the operation applied to an image \\f$A\\f$ can be expressed as follows: \\f\[ A\\circledast B = (A\\ominus B\_1) \\cap (A^c\\ominus B\_2) \\f\]

Therefore, the hit-or-miss operation comprises three steps:

1.  Erode image \\f$A\\f$ with structuring element \\f$B\_1\\f$.
2.  Erode the complement of image \\f$A\\f$ (\\f$A^c\\f$) with structuring element \\f$B\_2\\f$.
3.  AND results from step 1 and step 2.

The structuring elements \\f$B\_1\\f$ and \\f$B\_2\\f$ can be combined into a single element \\f$B\\f$. Let's see an example:

In this case, we are looking for a pattern in which the central pixel belongs to the background while the north, south, east, and west pixels belong to the foreground. The rest of the pixels in the neighbourhood can be of any kind; we don't care about them. Now, let's apply this kernel to an input image:

You can see that the pattern is found in just one location within the image.
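
For readers who want to see the mechanics, here is a minimal plain-Python sketch of the transform using a single combined kernel. It is not the code behind **morphologyEx()**: the function name `hit_or_miss` is hypothetical, the kernel encoding (1 for foreground, -1 for background, 0 for "don't care") is the convention assumed here, and pixels outside the image are treated as background, which may differ from OpenCV's border handling.

```python
def hit_or_miss(image, kernel):
    """Mark pixels whose neighbourhood matches the kernel.

    image:  list of rows of 0 (background) / 1 (foreground)
    kernel: list of rows of 1 (must be foreground), -1 (must be background),
            0 (don't care); the anchor is the kernel centre
    """
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    anchor_y, anchor_x = k_h // 2, k_w // 2
    output = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            matches = True
            for ky in range(k_h):
                for kx in range(k_w):
                    expected = kernel[ky][kx]
                    if expected == 0:
                        continue  # this neighbour may be anything
                    yy, xx = y + ky - anchor_y, x + kx - anchor_x
                    inside = 0 <= yy < height and 0 <= xx < width
                    pixel = image[yy][xx] if inside else 0  # border assumption
                    if (expected == 1 and pixel != 1) or (expected == -1 and pixel != 0):
                        matches = False
            output[y][x] = 1 if matches else 0
    return output

# Background centre with foreground north, south, east and west neighbours.
kernel = [
    [0, 1, 0],
    [1, -1, 1],
    [0, 1, 0],
]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in hit_or_miss(image, kernel):
    print(row)
```

Checking the 1 entries against the image is the erosion with \\f$B\_1\\f$, checking the -1 entries is the erosion of the complement with \\f$B\_2\\f$, and requiring both at once is the intersection from the formula above.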

## Code

The code corresponding to the previous example is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgProc/HitMiss/HitMiss.cpp) @include samples/cpp/tutorial\_code/ImgProc/HitMiss/HitMiss.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgProc/HitMiss/HitMiss.java) @include samples/java/tutorial\_code/ImgProc/HitMiss/HitMiss.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/imgProc/HitMiss/hit_miss.py) @include samples/python/tutorial\_code/imgProc/HitMiss/hit\_miss.py @end\_toggle

As you can see, it is as simple as using the function **morphologyEx()** with the operation type **MORPH\_HITMISS** and the chosen kernel.

## Other examples

Here you can find the output results of applying different kernels to the same input image used before:

Now try your own patterns!

## [Canny Detector](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/canny_detector/canny_detector/)


# Canny Edge Detector {#tutorial\_canny\_detector}

@tableofcontents

@prev\_tutorial{tutorial\_laplace\_operator} @next\_tutorial{tutorial\_hough\_lines}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::Canny to implement the Canny Edge Detector.

## Theory

The _Canny Edge detector_ @cite Canny86 was developed by John F. Canny in 1986. Also known to many as the _optimal detector_, the Canny algorithm aims to satisfy three main criteria:

-   **Low error rate:** Meaning a good detection of only existent edges.
-   **Good localization:** The distance between detected edge pixels and real edge pixels has to be minimized.
-   **Minimal response:** Only one detector response per edge.

### Steps

\-# Filter out any noise. The Gaussian filter is used for this purpose. An example of a Gaussian kernel of \\f$size = 5\\f$ that might be used is shown below:

```
\f[K = \dfrac{1}{159}\begin{bmatrix}
          2 & 4 & 5 & 4 & 2 \\
          4 & 9 & 12 & 9 & 4 \\
          5 & 12 & 15 & 12 & 5 \\
          4 & 9 & 12 & 9 & 4 \\
          2 & 4 & 5 & 4 & 2
                  \end{bmatrix}\f]
```

\-# Find the intensity gradient of the image. For this, we follow a procedure analogous to Sobel:

\-# Apply a pair of convolution masks (in \\f$x\\f$ and \\f$y\\f$ directions):

\\f\[G\_{x} = \\begin{bmatrix} -1 & 0 & +1 \\\\ -2 & 0 & +2 \\\\ -1 & 0 & +1 \\end{bmatrix}\\f\]

\\f\[G\_{y} = \\begin{bmatrix} -1 & -2 & -1 \\\\ 0 & 0 & 0 \\\\ +1 & +2 & +1 \\end{bmatrix}\\f\]

```
-#  Find the gradient strength and direction with:
    \f[\begin{array}{l}
    G = \sqrt{ G_{x}^{2} + G_{y}^{2} } \\
    \theta = \arctan(\dfrac{ G_{y} }{ G_{x} })
    \end{array}\f]
    The direction is rounded to one of four possible angles (namely 0, 45, 90 or 135)
```

\-# _Non-maximum_ suppression is applied. This removes pixels that are not considered to be part of an edge. Hence, only thin lines (candidate edges) will remain.

\-# _Hysteresis_: The final step. Canny uses two thresholds (upper and lower):

```
-#  If a pixel gradient is higher than the *upper* threshold, the pixel is accepted as an edge
-#  If a pixel gradient value is below the *lower* threshold, then it is rejected.
-#  If the pixel gradient is between the two thresholds, then it will be accepted only if it is
    connected to a pixel that is above the *upper* threshold.

Canny recommended an *upper*:*lower* ratio between 2:1 and 3:1.
```

\-# For more details, you can always consult your favorite Computer Vision book.
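
The gradient and hysteresis steps above can be sketched in plain Python as follows. This is a rough illustration and not the implementation of @ref cv::Canny: the names `gradient_at` and `hysteresis` are hypothetical, the masks are applied without flipping (as a correlation), `atan2` is used instead of a plain arctangent to avoid dividing by zero, and "connected" is interpreted as reachable through a chain of 8-connected pixels that are at or above the lower threshold.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_at(image, y, x):
    """Gradient strength and quantized direction (0, 45, 90 or 135) at an interior pixel."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            pixel = image[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * pixel
            gy += SOBEL_Y[dy + 1][dx + 1] * pixel
    magnitude = math.hypot(gx, gy)                         # sqrt(gx^2 + gy^2)
    angle = math.degrees(math.atan2(gy, gx)) % 180.0       # fold into [0, 180)
    direction = (round(angle / 45.0) % 4) * 45             # nearest of the four angles
    return magnitude, direction

def hysteresis(magnitude, low, high):
    """Keep strong pixels (> high) and weak pixels (>= low) connected to them."""
    height, width = len(magnitude), len(magnitude[0])
    edges = [[False] * width for _ in range(height)]
    # Seed the search with every pixel above the upper threshold.
    stack = [(y, x) for y in range(height) for x in range(width) if magnitude[y][x] > high]
    for y, x in stack:
        edges[y][x] = True
    # Grow the edges into neighbouring pixels that are at least `low`.
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not edges[ny][nx] and magnitude[ny][nx] >= low):
                    edges[ny][nx] = True
                    stack.append((ny, nx))
    return edges

vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(gradient_at(vertical_edge, 1, 1))
print(hysteresis([[0, 50, 120, 50, 0, 50]], low=40, high=100))
```

In the second call, the two weak pixels next to the strong one are kept, while the isolated weak pixel at the end of the row is rejected. Non-maximum suppression, which would sit between these two functions, is omitted to keep the sketch short.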

## Code

@add\_toggle\_cpp

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgTrans/CannyDetector_Demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp @end\_toggle

@add\_toggle\_java

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgTrans/canny_detector/CannyDetectorDemo.java) @include samples/java/tutorial\_code/ImgTrans/canny\_detector/CannyDetectorDemo.java @end\_toggle

@add\_toggle\_python

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ImgTrans/canny_detector/CannyDetector_Demo.py) @include samples/python/tutorial\_code/ImgTrans/canny\_detector/CannyDetector\_Demo.py @end\_toggle
    
-   **What does this program do?**
    
    -   Asks the user to enter a numerical value to set the lower threshold for our _Canny Edge Detector_ (by means of a Trackbar).
    -   Applies the _Canny Detector_ and generates a **mask** (bright lines representing the edges on a black background).
    -   Applies the obtained mask to the original image and displays it in a window.

## Explanation (C++ code)

\-# Create some needed variables: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp variables

```
Note the following:

-#  We establish an upper:lower threshold ratio of 3:1 (with the variable *ratio*).
-#  We set the kernel size of \f$3\f$ (for the Sobel operations to be performed internally by the
    Canny function).
-#  We set a maximum value for the lower Threshold of \f$100\f$.
```

\-# Load the source image: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp load

\-# Create a matrix of the same type and size as _src_ (to be _dst_): @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp create\_mat

\-# Convert the image to grayscale (using the function @ref cv::cvtColor ): @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp convert\_to\_gray

\-# Create a window to display the results: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp create\_window

\-# Create a Trackbar for the user to enter the lower threshold for our Canny detector: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp create\_trackbar

Observe the following:

```
-#  The variable to be controlled by the Trackbar is *lowThreshold* with a limit of
    *max_lowThreshold* (which we set to 100 previously)
-#  Each time the Trackbar registers an action, the callback function *CannyThreshold* will be
    invoked.
```

\-# Let's check the _CannyThreshold_ function, step by step:

\-# First, we blur the image with a filter of kernel size 3: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp reduce\_noise

\-# Second, we apply the OpenCV function @ref cv::Canny : @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp canny

where the arguments are:

-   _detected\_edges_: Source image, grayscale
-   _detected\_edges_: Output of the detector (can be the same as the input)
-   _lowThreshold_: The value entered by the user moving the Trackbar
-   _highThreshold_: Set in the program as three times the lower threshold (following Canny's recommendation)
-   _kernel\_size_: We defined it to be 3 (the size of the Sobel kernel to be used internally)

\-# We fill a _dst_ image with zeros (meaning the image is completely black). @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp fill

\-# Finally, we use the function @ref cv::Mat::copyTo to map only the areas of the image that are identified as edges (on a black background). @ref cv::Mat::copyTo copies the _src_ image onto _dst_, but only at the locations where the mask has non-zero values. Since the output of the Canny detector is the edge contours on a black background, the resulting _dst_ is black everywhere except at the detected edges. @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp copyto

\-# We display our result: @snippet cpp/tutorial\_code/ImgTrans/CannyDetector\_Demo.cpp display
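The masked copy described above can be sketched in plain Python, without OpenCV. This is only an illustration of the idea: the helper `copy_with_mask` and the tiny sample arrays are hypothetical, and the tutorial itself works on `cv::Mat` images through `copyTo`.

```python
def copy_with_mask(src, mask):
    """Return a black image holding src pixels wherever mask is non-zero."""
    # Start from an all-zero (black) destination of the same size as src.
    dst = [[0 for _ in row] for row in src]
    for y, row in enumerate(src):
        for x, pixel in enumerate(row):
            # Copy only the pixels that the edge mask marks as non-zero.
            if mask[y][x] != 0:
                dst[y][x] = pixel
    return dst


# Hypothetical 3x3 grayscale image and an edge mask shaped like a plus sign.
src = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
edges = [[0, 255, 0],
         [255, 255, 255],
         [0, 255, 0]]

result = copy_with_mask(src, edges)
for row in result:
    print(row)
```

Pixels outside the mask stay at zero, which is why the displayed result shows the original image only along the detected edges.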

## Result

-   After compiling the code above, we can run it giving as argument the path to an image. For example, using as an input the following image:
    
-   Moving the slider to try different thresholds, we obtain the following result:
    
-   Notice how the image is superimposed on the black background in the edge regions.

## [CopyMakeBorder](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/copyMakeBorder/copyMakeBorder/)


# Adding borders to your images {#tutorial\_copyMakeBorder}

@tableofcontents

@prev\_tutorial{tutorial\_filter\_2d} @next\_tutorial{tutorial\_sobel\_derivatives}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **copyMakeBorder()** to add borders (extra padding) to your image.

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

\-# In our previous tutorial we learned to use convolution to operate on images. One problem that naturally arises is how to handle the boundaries. How can we convolve them if the evaluated points are at the edge of the image?

\-# What most OpenCV functions do is copy the given image onto another, slightly larger image and then automatically pad the boundary (by any of the methods explained in the sample code just below). This way, the convolution can be performed over the needed pixels without problems (the extra padding is cut after the operation is done).

\-# In this tutorial, we will briefly explore two ways of defining the extra padding (border) for an image:

-   **BORDER\_CONSTANT**: Pad the image with a constant value (e.g. black, or \\f$0\\f$).
-   **BORDER\_REPLICATE**: The row or column at the very edge of the original is replicated to the extra border.

This will be seen more clearly in the Code section.
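To make the two padding modes concrete, here is a minimal pure-Python sketch. It is not how OpenCV implements **copyMakeBorder()**; the function `make_border` and its string mode names are hypothetical stand-ins for _BORDER\_CONSTANT_ and _BORDER\_REPLICATE_, applied to a small nested list instead of an image.

```python
def make_border(img, top, bottom, left, right, mode="constant", value=0):
    """Pad a 2-D list of pixel values (illustration only)."""
    if mode not in ("constant", "replicate"):
        raise ValueError("mode must be 'constant' or 'replicate'")
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(-top, rows + bottom):
        out_row = []
        for x in range(-left, cols + right):
            if 0 <= y < rows and 0 <= x < cols:
                # Inside the original image: copy the pixel unchanged.
                out_row.append(img[y][x])
            elif mode == "constant":
                # Constant border: every padded pixel takes the same value.
                out_row.append(value)
            else:
                # Replicated border: clamp to the nearest edge pixel.
                cy = min(max(y, 0), rows - 1)
                cx = min(max(x, 0), cols - 1)
                out_row.append(img[cy][cx])
        out.append(out_row)
    return out


img = [[1, 2],
       [3, 4]]
constant = make_border(img, 1, 1, 1, 1, mode="constant", value=0)
replicate = make_border(img, 1, 1, 1, 1, mode="replicate")
print(constant)
print(replicate)
```

With a one-pixel border, the constant mode surrounds the image with zeros, while the replicate mode repeats the outermost rows and columns.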

-   **What does this program do?**
    -   Load an image
        
    -   Let the user choose which kind of padding to apply to the input image. There are two options:
        
        \-# _Constant value border_: Applies a padding of a constant value for the whole border. This value is updated randomly every 0.5 seconds.
        
        \-# _Replicated border_: The border is replicated from the pixel values at the edges of the original image.
        
        The user chooses either option by pressing 'c' (constant) or 'r' (replicate).
        
    -   The program finishes when the user presses 'ESC'
        

## Code

The tutorial code is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/copyMakeBorder_demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/MakeBorder/CopyMakeBorder.java) @include samples/java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/MakeBorder/copy_make_border.py) @include samples/python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py @end\_toggle

## Explanation

### Declare the variables

First we declare the variables we are going to use:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp variables @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java variables @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py variables @end\_toggle

The variable _rng_ deserves special attention: it is a random number generator. We use it to generate the random border color, as we will see soon.

### Load an image

As usual we load our source image _src_:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp load @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java load @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py load @end\_toggle

### Create a window

After giving a short intro of how to use the program, we create a window:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp create\_window @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java create\_window @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py create\_window @end\_toggle

### Initialize arguments

Now we initialize the arguments that define the size of the borders (_top_, _bottom_, _left_ and _right_). We give them a value of 5% of the size of _src_.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp init\_arguments @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java init\_arguments @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py init\_arguments @end\_toggle
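As a quick arithmetic check of the 5% rule, the sketch below computes the four border widths for a hypothetical 480x640 image. It assumes that _top_ and _bottom_ are derived from the number of rows and _left_ and _right_ from the number of columns; the helper `border_sizes` is illustrative and not part of the sample code.

```python
def border_sizes(rows, cols, fraction=0.05):
    """Return (top, bottom, left, right) as a fraction of the image size."""
    # Vertical borders scale with the image height (assumption).
    top = bottom = int(fraction * rows)
    # Horizontal borders scale with the image width (assumption).
    left = right = int(fraction * cols)
    return top, bottom, left, right


sizes = border_sizes(480, 640)
print(sizes)
```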

### Loop

The program loops until the **ESC** key is pressed. If the user presses '**c**' or '**r**', the _borderType_ variable takes the value of _BORDER\_CONSTANT_ or _BORDER\_REPLICATE_ respectively:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp check\_keypress @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java check\_keypress @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py check\_keypress @end\_toggle

### Random color

In each iteration (after 0.5 seconds), the random border color (_value_) is updated...

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp update\_value @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java update\_value @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py update\_value @end\_toggle

This value is a set of three numbers picked randomly in the range \\f$\[0,255\]\\f$.
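A minimal sketch of drawing such a value with Python's standard library is shown here. It mirrors the description in the text (three integers in \\f$\[0,255\]\\f$) rather than the sample's actual generator; the seed and the helper name `random_border_value` are arbitrary choices made for this illustration.

```python
import random


def random_border_value(rng):
    """Return three random channel values, each in the closed range [0, 255]."""
    return tuple(rng.randint(0, 255) for _ in range(3))


# A seeded generator keeps the illustration reproducible.
rng = random.Random(12345)
value = random_border_value(rng)
print(value)
```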

### Form a border around the image

Finally, we call the function **copyMakeBorder()** to apply the respective padding:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp copymakeborder @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java copymakeborder @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py copymakeborder @end\_toggle

-   The arguments are:
    
    -   _src_: Source image
    -   _dst_: Destination image
    -   _top_, _bottom_, _left_, _right_: Length in pixels of the borders at each side of the image. We define them as being 5% of the original size of the image.
    -   _borderType_: Defines what type of border is applied. It can be constant or replicate for this example.
    -   _value_: If _borderType_ is _BORDER\_CONSTANT_, this is the value used to fill the border pixels.
    

### Display the results

We display our output image in the window created previously:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/copyMakeBorder\_demo.cpp display @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/MakeBorder/CopyMakeBorder.java display @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/MakeBorder/copy\_make\_border.py display @end\_toggle

## Results

\-# After compiling the code above, you can execute it giving as argument the path of an image. The result should be:

-   By default, it begins with the border set to _BORDER\_CONSTANT_. Hence, a succession of randomly colored borders will be shown.
-   If you press 'r', the border will become a replica of the edge pixels.
-   If you press 'c', the randomly colored borders will appear again.
-   If you press 'ESC', the program will exit.

Below are some screenshots showing how the border changes color and how the _BORDER\_REPLICATE_ option looks:

![](images/CopyMakeBorder_Tutorial_Results.jpg)

## [Distance Transform](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/distance_transformation/distance_transform/)


# Image Segmentation with Distance Transform and Watershed Algorithm {#tutorial\_distance\_transform}

@tableofcontents

@prev\_tutorial{tutorial\_point\_polygon\_test} @next\_tutorial{tutorial\_out\_of\_focus\_deblur\_filter}

| | |
| -: | :- |
| Original author | Theodore Tsesmelis |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::filter2D to perform Laplacian filtering for image sharpening
-   Use the OpenCV function @ref cv::distanceTransform in order to obtain the derived representation of a binary image, where the value of each pixel is replaced by its distance to the nearest background pixel
-   Use the OpenCV function @ref cv::watershed in order to isolate objects in the image from the background

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgTrans/imageSegmentation.cpp). @include samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgTrans/distance_transformation/ImageSegmentationDemo.java) @include samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ImgTrans/distance_transformation/imageSegmentation.py) @include samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py @end\_toggle

## Explanation / Result

-   Load the source image and check if it is loaded without any problem, then show it:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp load\_image @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java load\_image @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py load\_image @end\_toggle

-   If the image has a white background, it helps to change it to black. This makes the foreground objects easier to discriminate when we apply the Distance Transform:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp black\_bg @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java black\_bg @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py black\_bg @end\_toggle

-   Next, we sharpen the image to accentuate the edges of the foreground objects. We apply a Laplacian filter with a fairly strong kernel (an approximation of the second derivative):

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp sharp @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java sharp @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py sharp @end\_toggle
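The sharpening idea can be sketched in plain Python as "original minus Laplacian". The 3x3 kernel below is an assumption chosen for illustration (the kernel actually used is in the `sharp` snippet), border pixels are simply left unchanged, and the helper `sharpen` is hypothetical.

```python
# 3x3 kernel approximating the second derivative (assumed for illustration).
LAPLACIAN_KERNEL = [
    [1, 1, 1],
    [1, -8, 1],
    [1, 1, 1],
]


def sharpen(image):
    """Subtract the Laplacian response from each interior pixel."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]  # border pixels are left untouched
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            laplacian = sum(
                image[y + i - 1][x + j - 1] * LAPLACIAN_KERNEL[i][j]
                for i in range(3)
                for j in range(3)
            )
            # Clamp to the valid 8-bit range after subtracting the response.
            result[y][x] = min(255, max(0, image[y][x] - laplacian))
    return result


# A flat patch with one slightly brighter pixel in the middle.
image = [[10, 10, 10],
         [10, 20, 10],
         [10, 10, 10]]
sharpened = sharpen(image)
print(sharpened[1][1])
```

The small bump of 10 gray levels at the center is amplified, which is the effect that makes object edges stand out before thresholding.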

-   Now we convert the sharpened image to grayscale and then to a binary image:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp bin @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java bin @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py bin @end\_toggle

-   We are now ready to apply the Distance Transform on the binary image. We also normalize the output image so that we can visualize and threshold the result:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp dist @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java dist @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py dist @end\_toggle
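To show what the Distance Transform computes, here is a brute-force pure-Python sketch followed by a min-max normalization to \\f$\[0,1\]\\f$. It is far slower than @ref cv::distanceTransform and uses exact Euclidean distances, which is an assumption about the metric; the function names are hypothetical.

```python
import math


def distance_transform(binary):
    """For each non-zero pixel, return the distance to the nearest zero pixel."""
    background = [(y, x)
                  for y, row in enumerate(binary)
                  for x, v in enumerate(row) if v == 0]
    if not background:
        raise ValueError("the image needs at least one background pixel")
    dist = []
    for y, row in enumerate(binary):
        dist_row = []
        for x, v in enumerate(row):
            if v == 0:
                dist_row.append(0.0)
            else:
                dist_row.append(min(math.hypot(y - by, x - bx)
                                    for by, bx in background))
        dist.append(dist_row)
    return dist


def normalize(dist):
    """Rescale values linearly so that they span the range [0, 1]."""
    lo = min(min(row) for row in dist)
    hi = max(max(row) for row in dist)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in dist]
    return [[(v - lo) / span for v in row] for row in dist]


# A 3x3 white square on a black 5x5 background.
binary = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
dist = distance_transform(binary)
normalized = normalize(dist)
print(dist[2][2], normalized[1][1])
```

The center of the square is farthest from the background, so it becomes the peak of the normalized map; such peaks are what the next step extracts as seeds.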

-   We threshold the _dist_ image and then apply a morphological operation (dilation) in order to extract the peaks from the above image:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp peaks @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java peaks @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py peaks @end\_toggle

-   From each blob then we create a seed/marker for the watershed algorithm with the help of the @ref cv::findContours function:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp seeds @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java seeds @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py seeds @end\_toggle

-   Finally, we can apply the watershed algorithm, and visualize the result:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/imageSegmentation.cpp watershed @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/distance\_transformation/ImageSegmentationDemo.java watershed @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/distance\_transformation/imageSegmentation.py watershed @end\_toggle

## [Filter 2d](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/filter_2d/filter_2d/)


# Making your own linear filters! {#tutorial\_filter\_2d}

@tableofcontents

@prev\_tutorial{tutorial\_threshold\_inRange} @next\_tutorial{tutorial\_copyMakeBorder}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **filter2D()** to create your own linear filters.

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

### Correlation

In a very general sense, correlation is an operation between every part of an image and an operator (kernel).

### What is a kernel?

A kernel is essentially a fixed size array of numerical coefficients along with an _anchor point_ in that array, which is typically located at the center.

### How does correlation with a kernel work?

Assume you want to know the resulting value of a particular location in the image. The value of the correlation is calculated in the following way:

\-# Place the kernel anchor on top of a given pixel, with the rest of the kernel overlaying the corresponding local pixels in the image.

\-# Multiply the kernel coefficients by the corresponding image pixel values and sum the result.

\-# Place the result at the location of the _anchor_ in the output image.

\-# Repeat the process for all pixels by scanning the kernel over the entire image.

Expressing the procedure above in the form of an equation we would have:

\\f\[H(x,y) = \\sum\_{i=0}^{M\_{i} - 1} \\sum\_{j=0}^{M\_{j}-1} I(x+i - a\_{i}, y + j - a\_{j})K(i,j)\\f\]

Fortunately, OpenCV provides you with the function **filter2D()** so you do not have to code all these operations.
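For illustration only, the correlation procedure can be written out by hand in pure Python. The sketch assumes a centered anchor by default and handles the image boundary by clamping coordinates to the nearest edge pixel, which is a simplification and not necessarily the border mode **filter2D()** uses; the function `correlate` is hypothetical.

```python
def correlate(image, kernel, anchor=None):
    """Correlate a 2-D list with a kernel, clamping coordinates at the border."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    if anchor is None:
        anchor = (k_rows // 2, k_cols // 2)  # kernel center
    a_row, a_col = anchor
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Image pixel under kernel element (i, j), clamped to the image.
                    yy = min(max(y + i - a_row, 0), rows - 1)
                    xx = min(max(x + j - a_col, 0), cols - 1)
                    acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
box = [[1.0 / 9.0] * 3 for _ in range(3)]

unchanged = correlate(image, identity)
blurred = correlate(image, box)
print(unchanged[1], blurred[1][1])
```

The identity kernel returns the image unchanged, and the 3x3 box kernel replaces the center pixel with the mean of its neighborhood.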

### What does this program do?

-   Loads an image
-   Performs a _normalized box filter_. For instance, for a kernel of size \\f$size = 3\\f$, the kernel would be:

\\f\[K = \\dfrac{1}{3 \\cdot 3} \\begin{bmatrix} 1 & 1 & 1 \\\\ 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{bmatrix}\\f\]

The program will perform the filter operation with kernels of sizes 3, 5, 7, 9 and 11.

-   The filter output (with each kernel) will be shown for 500 milliseconds

## Code

The tutorial code is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/filter2D_demo.cpp) @include cpp/tutorial\_code/ImgTrans/filter2D\_demo.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/Filter2D/Filter2D_Demo.java) @include java/tutorial\_code/ImgTrans/Filter2D/Filter2D\_Demo.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/Filter2D/filter2D.py) @include python/tutorial\_code/ImgTrans/Filter2D/filter2D.py @end\_toggle

## Explanation

### Load an image

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/filter2D\_demo.cpp load @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/Filter2D/Filter2D\_Demo.java load @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/Filter2D/filter2D.py load @end\_toggle

### Initialize the arguments

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/filter2D\_demo.cpp init\_arguments @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/Filter2D/Filter2D\_Demo.java init\_arguments @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/Filter2D/filter2D.py init\_arguments @end\_toggle

### Loop

Perform an infinite loop updating the kernel size and applying our linear filter to the input image. Let's analyze that in more detail:

-   First we define the kernel our filter is going to use. Here it is:

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/filter2D\_demo.cpp update\_kernel @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/Filter2D/Filter2D\_Demo.java update\_kernel @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/Filter2D/filter2D.py update\_kernel @end\_toggle

The first line updates the _kernel\_size_ to odd values in the range \\f$\[3,11\]\\f$. The second line actually builds the kernel by setting its value to a matrix filled with ones and normalizing it by dividing it by the number of elements.
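The two steps can be mimicked in pure Python as follows. The formula `3 + 2 * (ind % 5)` is an assumption that reproduces the sizes 3, 5, 7, 9 and 11 mentioned earlier; the loop counter name `ind` and the helper `box_kernel` are illustrative.

```python
def box_kernel(size):
    """Return a size x size kernel of ones divided by the number of elements."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


# Cycle through the odd kernel sizes as the loop counter increases.
kernel_sizes = [3 + 2 * (ind % 5) for ind in range(7)]
kernels = [box_kernel(size) for size in kernel_sizes]
print(kernel_sizes)
```

Because every kernel sums to one, the filter averages the neighborhood without changing the overall brightness of the image.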

-   After setting the kernel, we can generate the filter by using the function **filter2D()** :

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/filter2D\_demo.cpp apply\_filter @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgTrans/Filter2D/Filter2D\_Demo.java apply\_filter @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/ImgTrans/Filter2D/filter2D.py apply\_filter @end\_toggle

-   The arguments denote:
    
    -   _src_: Source image
    -   _dst_: Destination image
    -   _ddepth_: The depth of _dst_. A negative value (such as \\f$-1\\f$) indicates that the depth is the same as the source.
    -   _kernel_: The kernel to be scanned through the image
    -   _anchor_: The position of the anchor relative to its kernel. The location _Point(-1, -1)_ indicates the center by default.
    -   _delta_: A value to be added to each pixel during the correlation. By default it is \\f$0\\f$.
    -   _BORDER\_DEFAULT_: We leave this value at its default (more details in the following tutorial).
    
-   Our program runs a _while_ loop; every 500 ms the kernel size of our filter is updated within the range indicated.
    

## Results

\-# After compiling the code above, you can execute it giving as argument the path of an image. The result should be a window that shows an image blurred by a normalized filter. Every 0.5 seconds the kernel size should change, as can be seen in the series of snapshots below:

## [Hough Circle](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/hough_circle/hough_circle/)


# Hough Circle Transform {#tutorial\_hough\_circle}

@tableofcontents

@prev\_tutorial{tutorial\_hough\_lines} @next\_tutorial{tutorial\_generalized\_hough\_ballard\_guil}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **HoughCircles()** to detect circles in an image.

## Theory

### Hough Circle Transform

-   The Hough Circle Transform works in a _roughly_ analogous way to the Hough Line Transform explained in the previous tutorial.
    
-   In the line detection case, a line was defined by two parameters \\f$(r, \\theta)\\f$. In the circle case, we need three parameters to define a circle:
    
    \\f\[C : ( x\_{center}, y\_{center}, r )\\f\]
    
    where \\f$(x\_{center}, y\_{center})\\f$ define the center position (green point) and \\f$r\\f$ is the radius, which allows us to completely define a circle, as it can be seen below:
    
-   For the sake of efficiency, OpenCV implements a detection method slightly trickier than the standard Hough Transform: _The Hough gradient method_, which is made up of two main stages. The first stage involves edge detection and finding the possible circle centers, and the second stage finds the best radius for each candidate center. For more details, please check the book _Learning OpenCV_ or your favorite Computer Vision bibliography.
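The voting idea behind circle detection can be sketched in pure Python. Note that this is the classical accumulator approach for a single known radius, not the Hough gradient method that OpenCV implements; the function `vote_for_centers` and the hand-picked edge points are hypothetical.

```python
import math
from collections import Counter


def vote_for_centers(edge_points, radius, n_angles=360):
    """Let every edge point vote for all centers lying `radius` away from it."""
    votes = Counter()
    for x, y in edge_points:
        candidates = set()
        for k in range(n_angles):
            angle = 2.0 * math.pi * k / n_angles
            cx = round(x - radius * math.cos(angle))
            cy = round(y - radius * math.sin(angle))
            candidates.add((cx, cy))
        # Each edge point contributes at most one vote per accumulator cell.
        for candidate in candidates:
            votes[candidate] += 1
    return votes


# Twelve integer points lying exactly on a circle of radius 5 around (20, 15).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
edge_points = [(20 + dx, 15 + dy) for dx, dy in offsets]

votes = vote_for_centers(edge_points, radius=5)
best_center, best_votes = votes.most_common(1)[0]
print(best_center, best_votes)
```

Only the true center collects a vote from every edge point, so it stands out as the accumulator maximum. Searching over unknown radii adds a third accumulator dimension, which is the cost the two-stage method avoids.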
    

#### What does this program do?

-   Loads an image and blurs it to reduce the noise.
-   Applies the _Hough Circle Transform_ to the blurred image.
-   Displays the detected circles in a window.

## Code

@add\_toggle\_cpp The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/houghcircles.cpp). A slightly fancier version (which shows trackbars for changing the threshold values) can be found [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/HoughCircle_Demo.cpp). @include samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp @end\_toggle

@add\_toggle\_java The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/HoughCircle/HoughCircles.java). @include samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java @end\_toggle

@add\_toggle\_python The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/HoughCircle/hough_circle.py). @include samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py @end\_toggle

## Explanation

The image we used can be found [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/smarties.png)

### Load an image:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py load @end\_toggle

### Convert it to grayscale:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp convert\_to\_gray @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java convert\_to\_gray @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py convert\_to\_gray @end\_toggle

### Apply a Median blur to reduce noise and avoid false circle detection:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp reduce\_noise @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java reduce\_noise @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py reduce\_noise @end\_toggle

### Proceed to apply Hough Circle Transform:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp houghcircles @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java houghcircles @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py houghcircles @end\_toggle

-   with the arguments:
    
    -   _gray_: Input image (grayscale).
    -   _circles_: A vector that stores sets of 3 values: \\f$x\_{c}, y\_{c}, r\\f$ for each detected circle.
    -   _HOUGH\_GRADIENT_: Defines the detection method. OpenCV also provides _HOUGH\_GRADIENT\_ALT_.
    -   _dp = 1_: The inverse ratio of the accumulator resolution to the image resolution.
    -   _min\_dist = gray.rows/16_: Minimum distance between detected centers.
    -   _param\_1 = 200_: Upper threshold for the internal Canny edge detector.
    -   _param\_2 = 100_: Threshold for center detection.
    -   _min\_radius = 0_: Minimum radius to be detected. If unknown, put zero as default.
    -   _max\_radius = 0_: Maximum radius to be detected. If unknown, put zero as default.

### Draw the detected circles:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp draw @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java draw @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py draw @end\_toggle

You can see that we draw the circle(s) in red and the center(s) as a small green dot.

### Display the detected circle(s) and wait for the user to exit the program:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghcircles.cpp display @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughCircle/HoughCircles.java display @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughCircle/hough\_circle.py display @end\_toggle

## Result

The result of running the code above with a test image is shown below:

## [Hough Lines](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/hough_lines/hough_lines/)


# Hough Line Transform {#tutorial\_hough\_lines}

@tableofcontents

@prev\_tutorial{tutorial\_canny\_detector} @next\_tutorial{tutorial\_hough\_circle}

| | |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV functions **HoughLines()** and **HoughLinesP()** to detect lines in an image.

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

## Hough Line Transform

\-# The Hough Line Transform is a transform used to detect straight lines.

\-# To apply the transform, an edge detection pre-processing step is desirable first.

### How does it work?

\-# As you know, a line in the image space can be expressed with two variables. For example:

-   In the **Cartesian coordinate system:** Parameters: \\f$(m,b)\\f$.
-   In the **Polar coordinate system:** Parameters: \\f$(r,\\theta)\\f$.

![](images/Hough_Lines_Tutorial_Theory_0.jpg)

For Hough Transforms, we will express lines in the _Polar system_. Hence, a line equation can be written as:

\\f\[y = \\left ( -\\dfrac{\\cos \\theta}{\\sin \\theta} \\right ) x + \\left ( \\dfrac{r}{\\sin \\theta} \\right )\\f\]

Rearranging the terms: \\f$r = x \\cos \\theta + y \\sin \\theta\\f$

\-# In general for each point \\f$(x\_{0}, y\_{0})\\f$, we can define the family of lines that goes through that point as:

\\f\[r\_{\\theta} = x\_{0} \\cdot \\cos \\theta  + y\_{0} \\cdot \\sin \\theta\\f\]

Meaning that each pair \\f$(r\_{\\theta},\\theta)\\f$ represents a line that passes through \\f$(x\_{0}, y\_{0})\\f$.

\-# If for a given \\f$(x\_{0}, y\_{0})\\f$ we plot the family of lines that goes through it, we get a sinusoid. For instance, for \\f$x\_{0} = 8\\f$ and \\f$y\_{0} = 6\\f$ we get the following plot (in a plane \\f$\\theta\\f$ - \\f$r\\f$):

![](images/Hough_Lines_Tutorial_Theory_1.jpg)

We consider only points such that \\f$r > 0\\f$ and \\f$0< \\theta < 2 \\pi\\f$.
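The sinusoid for the example point can be evaluated numerically with a few lines of Python. This sketch only samples \\f$r\_{\\theta}\\f$ for \\f$(x\_{0}, y\_{0}) = (8, 6)\\f$; the helper `r_theta` and the 30-degree sampling step are choices made for this illustration.

```python
import math

x0, y0 = 8, 6


def r_theta(theta):
    """Distance parameter of the line through (x0, y0) with normal angle theta."""
    return x0 * math.cos(theta) + y0 * math.sin(theta)


# Sample the curve every 30 degrees over one full turn.
samples = [(deg, r_theta(math.radians(deg))) for deg in range(0, 360, 30)]
for deg, r in samples:
    print(f"theta = {deg:3d} deg -> r = {r:7.3f}")

# The curve is a sinusoid whose amplitude equals the distance of the point
# from the origin, reached when theta points straight at the point.
amplitude = math.hypot(x0, y0)
peak_theta = math.atan2(y0, x0)
```

Negative values of \\f$r\\f$ appear in the samples; they correspond to the part of the curve that the restriction \\f$r > 0\\f$ discards.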

\-# We can do the same operation above for all the points in an image. If the curves of two different points intersect in the plane \\f$\\theta\\f$ - \\f$r\\f$, that means that both points belong to the same line. For instance, following with the example above and drawing the plot for two more points: \\f$x\_{1} = 4\\f$, \\f$y\_{1} = 9\\f$ and \\f$x\_{2} = 12\\f$, \\f$y\_{2} = 3\\f$, we get:

![](images/Hough_Lines_Tutorial_Theory_2.jpg)

The three plots intersect at a single point \\f$(0.925, 9.6)\\f$; these coordinates are the parameters (\\f$\\theta, r\\f$) of the line on which \\f$(x\_{0}, y\_{0})\\f$, \\f$(x\_{1}, y\_{1})\\f$ and \\f$(x\_{2}, y\_{2})\\f$ lie.
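The intersection can be checked numerically. The three points satisfy \\f$3x + 4y = 48\\f$, so the normal of their common line points along \\f$(3, 4)\\f$; the short Python sketch below uses that observation (the variable names are illustrative) to recover \\f$\\theta\\f$ and \\f$r\\f$.

```python
import math

points = [(8, 6), (4, 9), (12, 3)]

# The common line is 3x + 4y = 48, so its normal direction is (3, 4).
theta = math.atan2(4, 3)

# Every point must yield the same r for this theta.
r_values = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]

print(f"theta = {theta:.4f} rad")
print([round(r, 6) for r in r_values])
```

All three points give \\f$r = 9.6\\f$ at \\f$\\theta \\approx 0.927\\f$ rad, which agrees with the value of about 0.925 read off the plot.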

\-# What does all the stuff above mean? It means that, in general, a line can be _detected_ by finding the number of intersections between curves. The more curves that intersect, the more points the line represented by that intersection has. In general, we can define a _threshold_ of the minimum number of intersections needed to _detect_ a line.

\-# This is what the Hough Line Transform does. It keeps track of the intersections between the curves of every point in the image. If the number of intersections is above some _threshold_, then it declares it as a line with the parameters \\f$(\\theta, r\_{\\theta})\\f$ of the intersection point.
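The voting scheme just described can be sketched as a small accumulator in pure Python. This is a simplified illustration and not OpenCV's implementation: the function `hough_lines`, the one-degree angle step and the unit-wide \\f$r\\f$ bins are all assumptions made for the example.

```python
import math
from collections import Counter


def hough_lines(points, threshold, theta_steps=180, r_step=1.0):
    """Return (r, theta, votes) for accumulator cells with enough votes."""
    votes = Counter()
    for x, y in points:
        # Each point votes once per sampled angle, tracing its sinusoid.
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(r / r_step), k)] += 1
    return [(r_bin * r_step, math.pi * k / theta_steps, count)
            for (r_bin, k), count in votes.items()
            if count >= threshold]


# Ten widely spaced points on the horizontal line y = 5.
points = [(x, 5) for x in range(0, 100, 10)]
lines = hough_lines(points, threshold=10)
for r, theta, count in lines:
    print(f"r = {r}, theta = {theta:.4f} rad, votes = {count}")
```

Only the cell with \\f$r = 5\\f$ and \\f$\\theta = \\pi/2\\f$ gathers a vote from every point. Lowering the threshold admits cells supported by fewer points, which is the trade-off the threshold parameter controls.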

### Standard and Probabilistic Hough Line Transform

OpenCV implements three kinds of Hough Line Transforms:

a. **The Standard Hough Transform**

-   It consists of pretty much what we just explained in the previous section. It returns a vector of pairs \\f$(\\theta, r\_{\\theta})\\f$.
-   In OpenCV it is implemented with the function **HoughLines()**

b. **The Probabilistic Hough Line Transform**

-   A more efficient implementation of the Hough Line Transform. Its output is the end points of the detected line segments \\f$(x\_{0}, y\_{0}, x\_{1}, y\_{1})\\f$.
-   In OpenCV it is implemented with the function **HoughLinesP()**

c. **The Weighted Hough Transform**

-   Uses edge intensity instead of the binary 0 or 1 values used in the standard Hough transform.
-   In OpenCV it is implemented with the function **HoughLines()** with use\_edgeval=true.
-   See the example in samples/cpp/tutorial\_code/ImgTrans/HoughLines\_Demo.cpp.

### What does this program do?

-   Loads an image
-   Applies a _Standard Hough Line Transform_ and a _Probabilistic Hough Line Transform_.
-   Displays the original image and the detected lines in three windows.

## Code

@add\_toggle\_cpp The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/houghlines.cpp). A slightly fancier version (which shows both Hough standard and probabilistic with trackbars for changing the threshold values) can be found [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/HoughLines_Demo.cpp). @include samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp @end\_toggle

@add\_toggle\_java The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/HoughLine/HoughLines.java). @include samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java @end\_toggle

@add\_toggle\_python The sample code that we will explain can be downloaded from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/HoughLine/hough_lines.py). @include samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py @end\_toggle

## Explanation

### Load an image:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py load @end\_toggle

### Detect the edges of the image by using a Canny detector:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp edge\_detection @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java edge\_detection @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py edge\_detection @end\_toggle

Now we will apply the Hough Line Transform. We will explain how to use both OpenCV functions available for this purpose.

### Standard Hough Line Transform:

First, you apply the Transform:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp hough\_lines @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java hough\_lines @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py hough\_lines @end\_toggle

with the following arguments:

-   _dst_: Output of the edge detector. It should be a grayscale image (although in fact it is a binary one)
-   _lines_: A vector that will store the parameters \\f$(r,\\theta)\\f$ of the detected lines
-   _rho_: The resolution of the parameter \\f$r\\f$ in pixels. We use **1** pixel.
-   _theta_: The resolution of the parameter \\f$\\theta\\f$ in radians. We use **1 degree** (CV\_PI/180)
-   _threshold_: The minimum number of intersections to "_detect_" a line
-   _srn_ and _stn_: Both default to zero. Check the OpenCV reference for more info.
    

And then you display the result by drawing the lines.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp draw\_lines @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java draw\_lines @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py draw\_lines @end\_toggle
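
Since the Standard Hough Line Transform returns each line as a pair \\f$(r, \\theta)\\f$, drawing it requires turning that pair into two points. The sketch below shows one way to do this in plain Python; the helper name `polar_to_segment` and the half-length of 1000 pixels are assumptions made for the illustration.

```python
import math

def polar_to_segment(rho, theta, half_length=1000):
    """Return two integer endpoints on the line (rho, theta).

    Hypothetical helper: the endpoints are placed far enough apart to
    cross a typical image.
    """
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho              # point of the line closest to the origin
    # (-b, a) is a unit vector along the line; walk both ways from (x0, y0).
    pt1 = (round(x0 - half_length * b), round(y0 + half_length * a))
    pt2 = (round(x0 + half_length * b), round(y0 - half_length * a))
    return pt1, pt2

# The line 3x + 4y = 48 has r = 9.6 and theta = atan2(4, 3).
pt1, pt2 = polar_to_segment(9.6, math.atan2(4, 3))
print(pt1, pt2)  # (-794, 608) (806, -592)
```

Both endpoints satisfy \\f$3x + 4y \\approx 48\\f$ up to integer rounding, so the segment between them covers the detected line.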

### Probabilistic Hough Line Transform

First you apply the transform:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp hough\_lines\_p @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java hough\_lines\_p @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py hough\_lines\_p @end\_toggle

with the arguments:

-   _dst_: Output of the edge detector. It should be a grayscale image (although in fact it is a binary one)
-   _lines_: A vector that will store the parameters \\f$(x\_{start}, y\_{start}, x\_{end}, y\_{end})\\f$ of the detected lines
-   _rho_: The resolution of the parameter \\f$r\\f$ in pixels. We use **1** pixel.
-   _theta_: The resolution of the parameter \\f$\\theta\\f$ in radians. We use **1 degree** (CV\_PI/180)
-   _threshold_: The minimum number of intersections to "_detect_" a line
-   _minLineLength_: The minimum line length. Line segments shorter than this are rejected.
-   _maxLineGap_: The maximum allowed gap between two points for them to be considered part of the same line.
    

And then you display the result by drawing the lines.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp draw\_lines\_p @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java draw\_lines\_p @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py draw\_lines\_p @end\_toggle
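
To make the roles of _minLineLength_ and _maxLineGap_ more concrete, the following sketch groups edge points that lie along one line into segments. It is only a one-dimensional illustration of what the two parameters mean, not the algorithm used by **HoughLinesP()**; the helper name `link_segments` and the sample positions are made up for the example.

```python
def link_segments(positions, min_line_length, max_line_gap):
    """Group sorted 1-D positions of edge points along one line into segments.

    Illustrative only: it mimics the meaning of minLineLength and
    maxLineGap, not the real probabilistic Hough implementation.
    """
    if not positions:
        return []
    segments = []
    start = prev = positions[0]
    for p in positions[1:]:
        if p - prev > max_line_gap:        # gap too large: close the current segment
            segments.append((start, prev))
            start = p
        prev = p
    segments.append((start, prev))
    # Discard segments that are too short.
    return [(s, e) for s, e in segments if e - s >= min_line_length]

positions = [0, 1, 2, 3, 4, 5, 20, 21, 40, 41, 42, 43, 44, 45, 46]
print(link_segments(positions, min_line_length=5, max_line_gap=2))
# [(0, 5), (40, 46)]
print(link_segments(positions, min_line_length=5, max_line_gap=15))
# [(0, 21), (40, 46)]
```

A larger gap tolerance merges nearby pieces into one longer segment, while a larger minimum length removes short, isolated pieces.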

### Display the original image and the detected lines:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp imshow @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java imshow @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py imshow @end\_toggle

### Wait until the user exits the program

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/houghlines.cpp exit @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/HoughLine/HoughLines.java exit @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/HoughLine/hough\_lines.py exit @end\_toggle

## Result

@note The results below were obtained using the slightly fancier version mentioned in the _Code_ section. It implements the same steps as above and only adds a trackbar for the threshold.

Using an input image such as a [sudoku image](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/sudoku.png), we get the following result with the Standard Hough Line Transform:

And with the Probabilistic Hough Line Transform:

You may observe that the number of lines detected varies as you change the _threshold_. The explanation is fairly evident: with a higher threshold, fewer lines are detected (since more points are needed to declare a line detected).

## [Laplace Operator](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/laplace_operator/laplace_operator/)


# Laplace Operator {#tutorial\_laplace\_operator}

@tableofcontents

@prev\_tutorial{tutorial\_sobel\_derivatives} @next\_tutorial{tutorial\_canny\_detector}

|    |    |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **Laplacian()** to implement a discrete analog of the _Laplacian operator_.

## Theory

\-# In the previous tutorial we learned how to use the _Sobel Operator_. It was based on the fact that in the edge area, the pixel intensity shows a "jump" or a high variation of intensity. Taking the first derivative of the intensity, we observed that an edge is characterized by a maximum, as can be seen in the figure:

```
![](images/Laplace_Operator_Tutorial_Theory_Previous.jpg)
```

\-# And what happens if we take the second derivative?

```
![](images/Laplace_Operator_Tutorial_Theory_ddIntensity.jpg)

You can observe that the second derivative is zero at the edge! So, we can also use this criterion
to attempt to detect edges in an image. However, note that zeros will not only appear at edges
(they can also appear in other, meaningless locations); this can be solved by applying
filtering where needed.
```
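
The behaviour of the two derivatives can be checked on a small one-dimensional signal. The sketch below uses central differences on a made-up intensity profile (the values are assumptions chosen so that the edge is centred on one sample).

```python
# A 1-D "image": intensity rises smoothly from dark (0) to bright (160).
f = [0, 0, 10, 40, 80, 120, 150, 160, 160]

# Central differences, evaluated at the interior samples 1 .. len(f) - 2.
d1 = [(f[i + 1] - f[i - 1]) / 2 for i in range(1, len(f) - 1)]
d2 = [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, len(f) - 1)]

print(d1)  # [5.0, 20.0, 35.0, 40.0, 35.0, 20.0, 5.0]
print(d2)  # [10, 20, 10, 0, -10, -20, -10]

peak = d1.index(max(d1))       # position of the first-derivative maximum
print(peak, d2[peak])          # 3 0
```

The first derivative peaks in the middle of the transition, and at that same position the second derivative crosses zero, changing sign from positive to negative.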

### Laplacian Operator

\-# From the explanation above, we deduce that the second derivative can be used to _detect edges_. Since images are "_2D_", we need to take the derivative in both dimensions. Here, the Laplacian operator comes in handy.

\-# The _Laplacian operator_ is defined by:

\\f\[Laplace(f) = \\dfrac{\\partial^{2} f}{\\partial x^{2}} + \\dfrac{\\partial^{2} f}{\\partial y^{2}}\\f\]

\-# The Laplacian operator is implemented in OpenCV by the function **Laplacian()**. In fact, since the Laplacian uses the gradient of images, it internally calls the _Sobel_ operator to perform its computation.
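
As a rough illustration of the definition above, the sketch below evaluates a discrete Laplacian with the common 5-point stencil (second differences in x and y). This is an assumption made for the example: the kernel that **Laplacian()** actually applies depends on its kernel size argument, and the helper name `laplacian` is hypothetical.

```python
def laplacian(img):
    """5-point discrete Laplacian of a 2-D list; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            d2x = img[y][x - 1] - 2 * img[y][x] + img[y][x + 1]   # second difference in x
            d2y = img[y - 1][x] - 2 * img[y][x] + img[y + 1][x]   # second difference in y
            out[y][x] = d2x + d2y
    return out

# f(x, y) = x^2 + y^2 has the constant Laplacian 2 + 2 = 4.
img = [[x * x + y * y for x in range(5)] for y in range(5)]
lap = laplacian(img)
print(lap[2])  # [0, 4, 4, 4, 0]
```

The interior values equal 4 everywhere, which matches the analytic Laplacian of \\f$x^{2} + y^{2}\\f$.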

## Code

\-# **What does this program do?**

-   Loads an image
-   Removes noise by applying a Gaussian blur and then converts the original image to grayscale
-   Applies a Laplacian operator to the grayscale image and stores the output image
-   Displays the result in a window

@add\_toggle\_cpp -# The tutorial code is shown below. You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/Laplace_Demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp @end\_toggle

@add\_toggle\_java -# The tutorial code is shown below. You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/LaPlace/LaplaceDemo.java) @include samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java @end\_toggle

@add\_toggle\_python -# The tutorial code is shown below. You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/LaPlace/laplace_demo.py) @include samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py @end\_toggle

## Explanation

### Declare variables

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp variables @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java variables @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py variables @end\_toggle

### Load source image

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py load @end\_toggle

### Reduce noise

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp reduce\_noise @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java reduce\_noise @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py reduce\_noise @end\_toggle

### Grayscale

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp convert\_to\_gray @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java convert\_to\_gray @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py convert\_to\_gray @end\_toggle

### Laplacian operator

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp laplacian @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java laplacian @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py laplacian @end\_toggle

-   The arguments are:
    -   _src\_gray_: The input image.
    -   _dst_: Destination (output) image
    -   _ddepth_: Depth of the destination image. Since our input is _CV\_8U_ we define _ddepth_ = _CV\_16S_ to avoid overflow
    -   _kernel\_size_: The kernel size of the Sobel operator to be applied internally. We use 3 in this example.
    -   _scale_, _delta_ and _BORDER\_DEFAULT_: We leave them at their default values.

### Convert output to a _CV\_8U_ image

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp convert @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java convert @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py convert @end\_toggle
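
The reason for computing in _CV\_16S_ and converting afterwards can be seen with a few numbers. The sketch below assumes that the conversion step behaves like OpenCV's `convertScaleAbs`, that is, it scales, takes the absolute value, and saturates to the 8-bit range; the helper name `convert_scale_abs` and the sample values are made up for the illustration.

```python
def convert_scale_abs(values, alpha=1.0, beta=0.0):
    """Scale, take the absolute value, and saturate to the 8-bit range [0, 255]."""
    return [min(255, round(abs(alpha * v + beta))) for v in values]

# Signed 16-bit Laplacian responses: negative values and values above 255 are possible.
laplacian_16s = [-300, -40, 0, 40, 300]
print(convert_scale_abs(laplacian_16s))  # [255, 40, 0, 40, 255]
```

Negative responses keep their magnitude instead of being clipped to zero, which is why both sides of an edge show up in the displayed result.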

### Display the result

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgTrans/Laplace\_Demo.cpp display @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/LaPlace/LaplaceDemo.java display @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/LaPlace/laplace\_demo.py display @end\_toggle

## Results

\-# After compiling the code above, we can run it giving as argument the path to an image. For example, using as an input:

```
![](images/Laplace_Operator_Tutorial_Original_Image.jpg)
```

\-# We obtain the following result. Notice how the trees and the silhouette of the cow are reasonably well defined (except in areas where the intensities are very similar, for example around the cow's head). Also, note that the roof of the house behind the trees (right side) is strongly marked. This is because the contrast is higher in that region.

```
![](images/Laplace_Operator_Tutorial_Result.jpg)
```

## [Remap](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/remap/remap/)


# Remapping {#tutorial\_remap}

@tableofcontents

@prev\_tutorial{tutorial\_generalized\_hough\_ballard\_guil} @next\_tutorial{tutorial\_warp\_affine}

|    |    |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::remap to implement simple remapping routines.

## Theory

### What is remapping?

-   It is the process of taking pixels from one place in the image and locating them in another position in a new image.
    
-   To accomplish the mapping process, it might be necessary to do some interpolation for non-integer pixel locations, since there will not always be a one-to-one-pixel correspondence between source and destination images.
    
-   We can express the remap for every pixel location \\f$(x,y)\\f$ as:
    
    \\f\[g(x,y) = f ( h(x,y) )\\f\]
    
    where \\f$g()\\f$ is the remapped image, \\f$f()\\f$ the source image and \\f$h(x,y)\\f$ is the mapping function that operates on \\f$(x,y)\\f$.
    
-   Let's consider a quick example. Imagine that we have an image \\f$I\\f$ and, say, we want to do a remap such that:
    
    \\f\[h(x,y) = (I.cols - x, y )\\f\]
    
    What would happen? It is easily seen that the image would flip in the \\f$x\\f$ direction. For instance, consider the input image:
    
    Observe how the red circle changes position with respect to \\f$x\\f$ (considering \\f$x\\f$ the horizontal direction):
    
-   In OpenCV, the function @ref cv::remap offers a simple remapping implementation.
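
The relation \\f$g(x,y) = f(h(x,y))\\f$ can be sketched without OpenCV on a tiny image stored as nested lists. The helper name `remap_nearest` is hypothetical, and nearest-neighbour lookup is used instead of interpolation to keep the example short. Note that with zero-based pixel indices the horizontal flip is written as `cols - 1 - x`, so that the lookup stays inside the image.

```python
def remap_nearest(src, h):
    """Build g(x, y) = f(h(x, y)) with nearest-neighbour lookup.

    `h` maps a destination pixel (x, y) to a source location; locations
    that fall outside the source image are filled with 0.
    """
    rows, cols = len(src), len(src[0])
    dst = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sx, sy = h(x, y)
            sx, sy = round(sx), round(sy)
            if 0 <= sx < cols and 0 <= sy < rows:
                dst[y][x] = src[sy][sx]
    return dst

src = [[1, 2, 3],
       [4, 5, 6]]
cols = len(src[0])
flipped = remap_nearest(src, lambda x, y: (cols - 1 - x, y))
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```

Each destination pixel looks up its value in the source image, which is why the mapping function is expressed in terms of destination coordinates.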
    

## Code

-   **What does this program do?**
    -   Loads an image
    -   Each second, applies one of four different remapping processes to the image and displays the result in a window, indefinitely.
    -   Wait for the user to exit the program

@add\_toggle\_cpp

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgTrans/Remap_Demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp @end\_toggle

@add\_toggle\_java

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgTrans/remap/RemapDemo.java) @include samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java @end\_toggle

@add\_toggle\_python

-   The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ImgTrans/remap/Remap_Demo.py) @include samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py @end\_toggle

## Explanation

-   Load an image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp Load @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java Load @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py Load @end\_toggle
    
-   Create the destination image and the two mapping matrices (for x and y )
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp Create @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java Create @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py Create @end\_toggle
    
-   Create a window to display results
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp Window @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java Window @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py Window @end\_toggle
    
-   Establish a loop. Every 1000 ms we update our mapping matrices (_map\_x_ and _map\_y_) and apply them to our source image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp Loop @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java Loop @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py Loop @end\_toggle
    
-   The function that applies the remapping is @ref cv::remap . We give the following arguments:
    
    -   **src**: Source image
    -   **dst**: Destination image of same size as _src_
    -   **map\_x**: The mapping function in the x direction. It is equivalent to the first component of \\f$h(i,j)\\f$
    -   **map\_y**: Same as above, but in y direction. Note that _map\_y_ and _map\_x_ are both of the same size as _src_
    -   **INTER\_LINEAR**: The type of interpolation to use for non-integer pixels. This is the default.
    -   **BORDER\_CONSTANT**: The border mode. This is the default.
    
    How do we update our mapping matrices _map\_x_ and _map\_y_? Read on:
    
-   **Updating the mapping matrices:** We are going to perform 4 different mappings:
    
    \-# Reduce the picture to half its size and display it in the middle: \\f\[h(i,j) = ( 2 \\times i - src.cols/2 + 0.5, 2 \\times j - src.rows/2 + 0.5)\\f\] for all pairs \\f$(i,j)\\f$ such that \\f$\\dfrac{src.cols}{4}<i<\\dfrac{3 \\cdot src.cols}{4}\\f$ and \\f$\\dfrac{src.rows}{4}<j<\\dfrac{3 \\cdot src.rows}{4}\\f$
    
    \-# Turn the image upside down: \\f$h( i, j ) = (i, src.rows - j)\\f$
    
    \-# Reflect the image from left to right: \\f$h(i,j) = ( src.cols - i, j )\\f$
    
    \-# Combination of the previous two mappings (upside down and left-to-right reflection): \\f$h(i,j) = ( src.cols - i, src.rows - j )\\f$
    

This is expressed in the following snippet. Here, _map\_x_ represents the first coordinate of _h(i,j)_ and _map\_y_ the second coordinate.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Remap\_Demo.cpp Update @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/remap/RemapDemo.java Update @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/remap/Remap\_Demo.py Update @end\_toggle
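
The four mappings listed above can also be written out in plain Python to see what the two matrices contain. This is a sketch under stated assumptions, not the sample code: the helper name `build_maps`, the `mode` numbering, and the choice to leave pixels outside the central region at (0, 0) in the first mapping are made for the illustration. The formulas are used exactly as given in the text, with `i` as the column index and `j` as the row index.

```python
def build_maps(rows, cols, mode):
    """Return (map_x, map_y) for one of the four mappings described above."""
    map_x = [[0.0] * cols for _ in range(rows)]
    map_y = [[0.0] * cols for _ in range(rows)]
    for j in range(rows):              # j: row index (y)
        for i in range(cols):          # i: column index (x)
            if mode == 0:              # shrink to half size, centred
                inside = (cols * 0.25 < i < cols * 0.75
                          and rows * 0.25 < j < rows * 0.75)
                if inside:
                    map_x[j][i] = 2 * i - cols / 2 + 0.5
                    map_y[j][i] = 2 * j - rows / 2 + 0.5
                # assumption: pixels outside the central region stay at (0, 0)
            elif mode == 1:            # upside down
                map_x[j][i], map_y[j][i] = i, rows - j
            elif mode == 2:            # left-right reflection
                map_x[j][i], map_y[j][i] = cols - i, j
            else:                      # both reflections
                map_x[j][i], map_y[j][i] = cols - i, rows - j
    return map_x, map_y

map_x, map_y = build_maps(rows=4, cols=8, mode=2)
print(map_x[0])  # [8, 7, 6, 5, 4, 3, 2, 1]
print(map_y[3])  # [3, 3, 3, 3, 3, 3, 3, 3]
```

For the left-to-right reflection, every row of _map\_x_ counts down across the columns while _map\_y_ simply repeats the row index, so each destination pixel reads from the mirrored column of the same row.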

## Result

\-# After compiling the code above, you can execute it giving as argument an image path. For instance, by using the following image:

```
![](images/Remap_Tutorial_Original_Image.jpg)
```

\-# This is the result of reducing it to half the size and centering it:

```
![](images/Remap_Tutorial_Result_0.jpg)
```

\-# Turning it upside down:

```
![](images/Remap_Tutorial_Result_1.jpg)
```

\-# Reflecting it in the x direction:

```
![](images/Remap_Tutorial_Result_2.jpg)
```

\-# Reflecting it in both directions:

```
![](images/Remap_Tutorial_Result_3.jpg)
```

## [Sobel Derivatives](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/sobel_derivatives/sobel_derivatives/)


# Sobel Derivatives {#tutorial\_sobel\_derivatives}

@tableofcontents

@prev\_tutorial{tutorial\_copyMakeBorder} @next\_tutorial{tutorial\_laplace\_operator}

|    |    |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function **Sobel()** to calculate the derivatives from an image.
-   Use the OpenCV function **Scharr()** to calculate a more accurate derivative for a kernel of size \\f$3 \\times 3\\f$

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

\-# In the last two tutorials we have seen practical examples of convolutions. One of the most important convolutions is the computation of derivatives in an image (or an approximation to them).

\-# Why might computing derivatives in an image be important? Let's imagine we want to detect the _edges_ present in the image. For instance:

```
![](images/Sobel_Derivatives_Tutorial_Theory_0.jpg)

You can easily notice that at an *edge*, the pixel intensity *changes* noticeably. A
good way to express *changes* is by using *derivatives*. A high change in gradient indicates a
major change in the image.
```

\-# To be more graphical, let's assume we have a 1D-image. An edge is shown by the "jump" in intensity in the plot below:

```
![](images/Sobel_Derivatives_Tutorial_Theory_Intensity_Function.jpg)
```

\-# The edge "jump" can be seen more easily if we take the first derivative (here it appears as a maximum):

```
![](images/Sobel_Derivatives_Tutorial_Theory_dIntensity_Function.jpg)
```

\-# So, from the explanation above, we can deduce that edges in an image can be detected by locating pixels where the gradient is higher than at their neighbors (or, to generalize, higher than a threshold).

\-# For a more detailed explanation, please refer to **Learning OpenCV** by Bradski and Kaehler.

### Sobel Operator

\-# The Sobel Operator is a discrete differentiation operator. It computes an approximation of the gradient of an image intensity function.

\-# The Sobel Operator combines Gaussian smoothing and differentiation.

### Formulation

Assuming that the image to be operated on is \\f$I\\f$:

\-# We calculate two derivatives:

\-# **Horizontal changes**: This is computed by convolving \\f$I\\f$ with a kernel \\f$G\_{x}\\f$ of odd size. For example, for a kernel size of 3, \\f$G\_{x}\\f$ would be computed as:

```
    \f[G_{x} = \begin{bmatrix}
    -1 & 0 & +1  \\
    -2 & 0 & +2  \\
    -1 & 0 & +1
    \end{bmatrix} * I\f]

-#  **Vertical changes**: This is computed by convolving \f$I\f$ with a kernel \f$G_{y}\f$ with odd
    size. For example for a kernel size of 3, \f$G_{y}\f$ would be computed as:

    \f[G_{y} = \begin{bmatrix}
    -1 & -2 & -1  \\
    0 & 0 & 0  \\
    +1 & +2 & +1
    \end{bmatrix} * I\f]
```

\-# At each point of the image we calculate an approximation of the _gradient_ in that point by combining both results above:

```
\f[G = \sqrt{ G_{x}^{2} + G_{y}^{2} }\f]

Although sometimes the following simpler equation is used:

\f[G = |G_{x}| + |G_{y}|\f]
```
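
The formulation above can be tried on a tiny synthetic image. The sketch below applies the two \\f$3 \\times 3\\f$ kernels at a single interior pixel of a vertical step edge and evaluates both gradient formulas; the helper name `apply_kernel` and the image values are assumptions made for the example.

```python
import math

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_kernel(img, kernel, y, x):
    """Correlate a 3x3 kernel with the neighbourhood centred on (y, x)."""
    return sum(kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))

# Vertical step edge: dark on the left, bright on the right.
img = [[0, 0, 10, 10] for _ in range(4)]

y, x = 1, 1                                   # an interior pixel next to the edge
gx = apply_kernel(img, GX, y, x)
gy = apply_kernel(img, GY, y, x)
print(gx, gy)                                 # 40 0
print(math.hypot(gx, gy), abs(gx) + abs(gy))  # 40.0 40
```

For a purely horizontal or vertical edge the two formulas agree. They differ for diagonal edges, where the sum of absolute values overestimates the Euclidean magnitude.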

@note When the size of the kernel is `3`, the Sobel kernel shown above may produce noticeable inaccuracies (after all, Sobel is only an approximation of the derivative). OpenCV addresses this inaccuracy for kernels of size 3 by using the **Scharr()** function. This is as fast as, but more accurate than, the standard Sobel function. It implements the following kernels:
\\f\[G\_{x} = \\begin{bmatrix} -3 & 0 & +3 \\\\ -10 & 0 & +10 \\\\ -3 & 0 & +3 \\end{bmatrix}\\f\]
\\f\[G\_{y} = \\begin{bmatrix} -3 & -10 & -3 \\\\ 0 & 0 & 0 \\\\ +3 & +10 & +3 \\end{bmatrix}\\f\]

@note You can find more information about this function in the OpenCV reference - **Scharr()**. Also, in the sample code below, you will notice that above the code for the **Sobel()** function there is also commented-out code for the **Scharr()** function. Uncommenting it (and commenting out the Sobel code) should give you an idea of how this function works.

## Code

\-# **What does this program do?**

-   Applies the _Sobel Operator_ and generates as output an image with the detected _edges_ bright on a darker background.

\-# The tutorial code is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/Sobel_Demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/SobelDemo/SobelDemo.java) @include samples/java/tutorial\_code/ImgTrans/SobelDemo/SobelDemo.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/SobelDemo/sobel_demo.py) @include samples/python/tutorial\_code/ImgTrans/SobelDemo/sobel\_demo.py @end\_toggle

## Explanation

### Declare variables

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp variables

### Load source image

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp load

### Reduce noise

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp reduce\_noise

### Grayscale

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp convert\_to\_gray

### Sobel Operator

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp sobel

-   We calculate the "derivatives" in the _x_ and _y_ directions. For this, we use the function **Sobel()** as shown above. The function takes the following arguments:
    
    -   _src\_gray_: In our example, the input image. Here it is _CV\_8U_
    -   _grad\_x_ / _grad\_y_ : The output image.
    -   _ddepth_: The depth of the output image. We set it to _CV\_16S_ to avoid overflow.
    -   _x\_order_: The order of the derivative in **x** direction.
    -   _y\_order_: The order of the derivative in **y** direction.
    -   _scale_, _delta_ and _BORDER\_DEFAULT_: We use default values.
    
    Notice that to calculate the gradient in _x_ direction we use: \\f$x\_{order}= 1\\f$ and \\f$y\_{order} = 0\\f$. We do analogously for the _y_ direction.
    

### Convert output to a CV\_8U image

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp convert

### Gradient

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp blend

We try to approximate the _gradient_ by adding both directional gradients (note that this is not an exact calculation, but it is good enough for our purposes).

### Show results

@snippet cpp/tutorial\_code/ImgTrans/Sobel\_Demo.cpp display

## Results

\-# Here is the output of applying our basic detector to _lena.jpg_:

```
![](images/Sobel_Derivatives_Tutorial_Result.jpg)
```

## [Warp Affine](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/imgtrans/warp_affine/warp_affine/)


# Affine Transformations {#tutorial\_warp\_affine}

@tableofcontents

@prev\_tutorial{tutorial\_remap} @next\_tutorial{tutorial\_histogram\_equalization}

|    |    |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::warpAffine to implement simple remapping routines.
-   Use the OpenCV function @ref cv::getRotationMatrix2D to obtain a \\f$2 \\times 3\\f$ rotation matrix

## Theory

### What is an Affine Transformation?

\-# A transformation that can be expressed in the form of a _matrix multiplication_ (linear transformation) followed by a _vector addition_ (translation).

\-# From the above, we can use an Affine Transformation to express:

```
-#  Rotations (linear transformation)
-#  Translations (vector addition)
-#  Scale operations (linear transformation)

You can see that, in essence, an Affine Transformation represents a **relation** between two
images.
```

\-# The usual way to represent an Affine Transformation is by using a \\f$2 \\times 3\\f$ matrix.

```
\f[
A = \begin{bmatrix}
    a_{00} & a_{01} \\
    a_{10} & a_{11}
    \end{bmatrix}_{2 \times 2}
B = \begin{bmatrix}
    b_{00} \\
    b_{10}
    \end{bmatrix}_{2 \times 1}
\f]
\f[
M = \begin{bmatrix}
    A & B
    \end{bmatrix}
=
\begin{bmatrix}
    a_{00} & a_{01} & b_{00} \\
    a_{10} & a_{11} & b_{10}
    \end{bmatrix}_{2 \times 3}
\f]

Considering that we want to transform a 2D vector \f$X = \begin{bmatrix}x \\ y\end{bmatrix}\f$ by
using \f$A\f$ and \f$B\f$, we can do the same with:

\f$T = A \cdot \begin{bmatrix}x \\ y\end{bmatrix} + B\f$ or \f$T = M \cdot  [x, y, 1]^{T}\f$

\f[T =  \begin{bmatrix}
    a_{00}x + a_{01}y + b_{00} \\
    a_{10}x + a_{11}y + b_{10}
    \end{bmatrix}\f]
```
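
As a quick numerical check of \\f$T = M \\cdot [x, y, 1]^{T}\\f$, the sketch below applies a \\f$2 \\times 3\\f$ matrix to one point in plain Python. The helper name `apply_affine` and the matrix values (a scale by 2 followed by a translation) are assumptions chosen only for illustration.

```python
def apply_affine(M, x, y):
    """Return T = M . [x, y, 1]^T for a 2x3 matrix M given as nested lists."""
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

# Scale by 2 and translate by (10, -5).
M = [[2, 0, 10],
     [0, 2, -5]]
print(apply_affine(M, 3, 4))  # (16, 3)
```

The first two columns of `M` play the role of \\f$A\\f$ and the last column plays the role of \\f$B\\f$.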

### How do we get an Affine Transformation?

\-# We mentioned that an Affine Transformation is basically a **relation** between two images. The information about this relation can come, roughly, in two ways:

a. We know both \\f$X\\f$ and \\f$T\\f$ and we also know that they are related. Then our task is to find \\f$M\\f$

b. We know \\f$M\\f$ and \\f$X\\f$. To obtain \\f$T\\f$ we only need to apply \\f$T = M \\cdot X\\f$. Our information for \\f$M\\f$ may be explicit (i.e. given as the 2-by-3 matrix) or it can come as a geometric relation between points.

\-# Let's explain case (b) in more detail. Since \\f$M\\f$ relates two images, we can analyze the simplest case in which it relates three points in both images. Look at the figure below:

```
![](images/Warp_Affine_Tutorial_Theory_0.jpg)

The points 1, 2 and 3 (forming a triangle in image 1) are mapped into image 2, still forming a
triangle, but now they have changed noticeably. If we find the Affine Transformation with these
3 points (you can choose them as you like), then we can apply this found relation to all the
pixels in an image.
```
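
Three point correspondences give six equations, which is exactly enough to determine the six entries of \\f$M\\f$. The sketch below solves for them with Cramer's rule in plain Python. It only illustrates the idea and is not how OpenCV computes the matrix; the helper names `det3` and `affine_from_points` and the sample points are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def affine_from_points(src, dst):
    """Solve for the 2x3 matrix M that maps three src points onto three dst points."""
    A = [[x, y, 1.0] for x, y in src]
    d = det3(A)
    if d == 0:
        raise ValueError("source points are collinear")
    M = []
    for k in range(2):                     # k = 0: row for x', k = 1: row for y'
        rhs = [p[k] for p in dst]
        row = []
        for col in range(3):               # Cramer's rule: replace one column by rhs
            Ak = [r[:] for r in A]
            for i in range(3):
                Ak[i][col] = rhs[i]
            row.append(det3(Ak) / d)
        M.append(row)
    return M

src = [(0, 0), (1, 0), (0, 1)]
dst = [(10, -5), (12, -5), (10, -3)]       # scale by 2, then translate by (10, -5)
M = affine_from_points(src, dst)
print(M)
```

The recovered matrix has 2 on the diagonal of its left \\f$2 \\times 2\\f$ part and (10, -5) in its last column, matching the scale and translation used to generate the destination points. If the three source points are collinear, the system has no unique solution, which the sketch reports as an error.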

## Code

-   **What does this program do?**
    -   Loads an image
    -   Applies an Affine Transform to the image. This transform is obtained from the relation between three points. We use the function @ref cv::warpAffine for that purpose.
    -   Applies a Rotation to the image after being transformed. This rotation is with respect to the image center
    -   Waits until the user exits the program

@add\_toggle\_cpp

-   The tutorial's code is shown below. You can also download it [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgTrans/Geometric_Transforms_Demo.cpp) @include samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp @end\_toggle

@add\_toggle\_java

-   The tutorial's code is shown below. You can also download it [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgTrans/warp_affine/GeometricTransformsDemo.java) @include samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java @end\_toggle

@add\_toggle\_python

-   The tutorial's code is shown below. You can also download it [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/ImgTrans/warp_affine/Geometric_Transforms_Demo.py) @include samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py @end\_toggle

## Explanation

-   Load an image:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Load the image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Load the image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Load the image @end\_toggle
    
-   **Affine Transform:** As explained above, we need two sets of 3 points to derive the affine transform relation. Have a look:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Set your 3 points to calculate the Affine Transform @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Set your 3 points to calculate the Affine Transform @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Set your 3 points to calculate the Affine Transform @end\_toggle
    
    You may want to draw these points to get a better idea of how they change. Their locations are approximately the same as the ones depicted in the example figure (in the Theory section). Note that the size and orientation of the triangle defined by the 3 points change.
    
-   Armed with both sets of points, we calculate the Affine Transform using the OpenCV function @ref cv::getAffineTransform :
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Get the Affine Transform @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Get the Affine Transform @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Get the Affine Transform @end\_toggle
    
    We get a \\f$2 \\times 3\\f$ matrix as an output (in this case **warp\_mat**)
    
-   We then apply the Affine Transform just found to the src image
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Apply the Affine Transform just found to the src image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Apply the Affine Transform just found to the src image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Apply the Affine Transform just found to the src image @end\_toggle
    
    with the following arguments:
    
    -   **src**: Input image
    -   **warp\_dst**: Output image
    -   **warp\_mat**: Affine transform
    -   **warp\_dst.size()**: The desired size of the output image
    
    We just got our first transformed image! We will display it in a moment. Before that, we also want to rotate it...
    
-   **Rotate:** To rotate an image, we need to know two things:
    
    1.  The center with respect to which the image will rotate
    2.  The angle to be rotated. In OpenCV a positive angle is counter-clockwise
    3.  _Optional:_ A scale factor
    
    We define these parameters with the following snippet:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Compute a rotation matrix with respect to the center of the image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Compute a rotation matrix with respect to the center of the image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Compute a rotation matrix with respect to the center of the image @end\_toggle
    
-   We generate the rotation matrix with the OpenCV function @ref cv::getRotationMatrix2D , which returns a \\f$2 \\times 3\\f$ matrix (in this case _rot\_mat_)
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Get the rotation matrix with the specifications above @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Get the rotation matrix with the specifications above @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Get the rotation matrix with the specifications above @end\_toggle
    
-   We now apply the found rotation to the output of our previous Transformation:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Rotate the warped image @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Rotate the warped image @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Rotate the warped image @end\_toggle
    
-   Finally, we display our results in two windows plus the original image for good measure:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Show what you got @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Show what you got @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Show what you got @end\_toggle
    
-   We just have to wait until the user exits the program
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgTrans/Geometric\_Transforms\_Demo.cpp Wait until user exits the program @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgTrans/warp\_affine/GeometricTransformsDemo.java Wait until user exits the program @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/ImgTrans/warp\_affine/Geometric\_Transforms\_Demo.py Wait until user exits the program @end\_toggle
    

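The two matrices discussed above can also be written out by hand. The sketch below is NumPy-only and does not call OpenCV: it solves the three point correspondences for a \\f$2 \\times 3\\f$ matrix and builds a rotation matrix about a center with the usual "positive angle is counter-clockwise" convention. The helper names (`affine_from_three_points`, `rotation_matrix_2d`, `apply_affine`) and all point coordinates are illustrative assumptions, not part of the tutorial's sample code.

```python
import numpy as np


def affine_from_three_points(src_pts, dst_pts):
    """Solve for the 2x3 matrix M such that M @ [x, y, 1]^T = [x', y']^T for 3 point pairs."""
    src = np.asarray(src_pts, dtype=np.float64)   # shape (3, 2)
    dst = np.asarray(dst_pts, dtype=np.float64)   # shape (3, 2)
    # Each row [x, y, 1] multiplied by the unknown 3x2 matrix gives one destination point.
    a = np.hstack([src, np.ones((3, 1))])         # shape (3, 3), invertible if points are not collinear
    return np.linalg.solve(a, dst).T              # shape (2, 3)


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 rotation about `center`; positive angles are counter-clockwise on screen (y axis down)."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    return np.array([
        [alpha, beta, (1.0 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1.0 - alpha) * cy],
    ])


def apply_affine(m, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ m[:, :2].T + m[:, 2]


# Hypothetical triangle correspondences for a 400x300 image (illustrative values only).
src_tri = [(0, 0), (399, 0), (0, 299)]
dst_tri = [(0, 99), (340, 75), (60, 210)]
warp_mat = affine_from_three_points(src_tri, dst_tri)

# Hypothetical rotation: 50 degrees clockwise about the image center, scaled down.
rot_mat = rotation_matrix_2d(center=(200, 150), angle_deg=-50.0, scale=0.6)
```

Applying `warp_mat` to the source triangle reproduces the destination triangle, and `rot_mat` leaves the chosen center fixed, which is a quick way to sanity-check either matrix before warping a whole image.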
## Result

-   After compiling the code above, we can give it the path of an image as argument. For instance, for a picture like:
    
    after applying the first Affine Transform we obtain:
    
    and finally, after applying a negative rotation (remember negative means clockwise) and a scale factor, we get:

## [Morph Lines Detection](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/morph_lines_detection/morph_lines_detection/)

# Extract horizontal and vertical lines by using morphological operations {#tutorial\_morph\_lines\_detection}

@tableofcontents

@prev\_tutorial{tutorial\_hitOrMiss} @next\_tutorial{tutorial\_pyramids}

|                 |                    |
| --------------: | :----------------- |
| Original author | Theodore Tsesmelis |
| Compatibility   | OpenCV >= 3.0      |

## Goal

In this tutorial you will learn how to:

-   Apply two very common morphology operators (i.e. Dilation and Erosion), with the creation of custom kernels, in order to extract straight lines on the horizontal and vertical axes. For this purpose, you will use the following OpenCV functions:
    
    -   **erode()**
    -   **dilate()**
    -   **getStructuringElement()**
    
    in an example where your goal will be to extract the music notes from a music sheet.
    

## Theory

### Morphology Operations

Morphology is a set of image processing operations that process images based on predefined _structuring elements_ known also as kernels. The value of each pixel in the output image is based on a comparison of the corresponding pixel in the input image with its neighbors. By choosing the size and shape of the kernel, you can construct a morphological operation that is sensitive to specific shapes regarding the input image.

Two of the most basic morphological operations are dilation and erosion. Dilation adds pixels to the boundaries of the object in an image, while erosion does exactly the opposite. The number of pixels added or removed depends on the size and shape of the structuring element used to process the image. In general, these two operations follow these rules:

-   **Dilation**: The value of the output pixel is the **_maximum_** value of all the pixels that fall within the structuring element's size and shape. For example, in a binary image, if any of the pixels of the input image falling within the range of the kernel is set to the value 1, the corresponding pixel of the output image will be set to 1 as well. The same maximum rule applies to any type of image (e.g. grayscale, BGR, etc.).
    
-   **Erosion**: The opposite applies to the erosion operation. The value of the output pixel is the **_minimum_** value of all the pixels that fall within the structuring element's size and shape. Look at the example figures below:
    

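The maximum and minimum rules can be tried out without OpenCV. The following is a minimal NumPy sketch under stated assumptions: the kernel origin is its center cell, and pixels outside the image never win the maximum (dilation) or the minimum (erosion). The helper names `dilate`, `erode` and `_neighbourhood_stack` are hypothetical and only mirror the idea behind the OpenCV functions of the same purpose.

```python
import numpy as np


def _neighbourhood_stack(image, kernel, pad_value):
    """Stack one shifted copy of `image` per non-zero kernel cell (origin at the kernel center)."""
    kh, kw = kernel.shape
    ay, ax = kh // 2, kw // 2
    padded = np.pad(image, ((ay, kh - 1 - ay), (ax, kw - 1 - ax)),
                    constant_values=pad_value)
    h, w = image.shape
    return np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(kh) for dx in range(kw) if kernel[dy, dx]
    ])


def dilate(image, kernel):
    """Output pixel = maximum of the input pixels covered by the kernel."""
    return _neighbourhood_stack(image, kernel, image.min()).max(axis=0)


def erode(image, kernel):
    """Output pixel = minimum of the input pixels covered by the kernel."""
    return _neighbourhood_stack(image, kernel, image.max()).min(axis=0)


# A single bright pixel grows into a 3x3 block, then shrinks back.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 1
kernel = np.ones((3, 3), dtype=np.uint8)
grown = dilate(img, kernel)
shrunk = erode(grown, kernel)
```

The same two functions work unchanged on grayscale values, since they only take maxima and minima over the neighborhood.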
### Structuring Elements

As can be seen above, and in any morphological operation in general, the structuring element used to probe the input image is the most important part.

A structuring element is a matrix consisting of only 0's and 1's that can have any arbitrary shape and size. Structuring elements are typically much smaller than the image being processed, and the pixels with values of 1 define the neighborhood. The center pixel of the structuring element, called the origin, identifies the pixel of interest -- the pixel being processed.

For example, the following illustrates a diamond-shaped structuring element of 7x7 size.

A structuring element can have many common shapes and sizes, such as lines, diamonds, disks, periodic lines, and circles. You typically choose a structuring element the same size and shape as the objects you want to process/extract in the input image. For example, to find lines in an image, create a linear structuring element as you will see later.

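As a small illustration of such 0/1 matrices, the sketch below builds a diamond and a line kernel with NumPy. It assumes the diamond is defined by city-block distance from the center, which may differ in detail from the figure the text refers to; `diamond_kernel` and `line_kernel` are hypothetical helper names.

```python
import numpy as np


def diamond_kernel(size):
    """Odd-sized diamond: cells whose city-block distance to the center is <= size // 2."""
    r = size // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    return (np.abs(yy) + np.abs(xx) <= r).astype(np.uint8)


def line_kernel(length, horizontal=True):
    """A one-pixel-thick line of ones, suitable for probing straight lines."""
    shape = (1, length) if horizontal else (length, 1)
    return np.ones(shape, dtype=np.uint8)


diamond = diamond_kernel(7)          # 7x7 matrix of 0's and 1's
horizontal = line_kernel(9)          # 1x9 row of ones
vertical = line_kernel(9, False)     # 9x1 column of ones
print(diamond)
```

The origin of each of these kernels is its center cell, which is the pixel of interest described above.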
## Code

This tutorial's code is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgProc/morph_lines_detection/Morphology_3.cpp). @include samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgProc/morph_lines_detection/Morphology_3.java). @include samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/imgProc/morph_lines_detection/morph_lines_detection.py). @include samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py @end\_toggle

## Explanation / Result

Get image from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/doc/tutorials/imgproc/morph_lines_detection/images/src.png) .

### Load Image

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp load\_image @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java load\_image @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py load\_image @end\_toggle

### Grayscale

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp gray @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java gray @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py gray @end\_toggle

### Grayscale to Binary image

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp bin @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java bin @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py bin @end\_toggle

### Output images

Now we are ready to apply morphological operations in order to extract the horizontal and vertical lines and as a consequence to separate the music notes from the music sheet, but first let's initialize the output images that we will use for that reason:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp init @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java init @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py init @end\_toggle

### Structure elements

As we specified in the theory, in order to extract the object that we desire, we need to create the corresponding structure element. Since we want to extract the horizontal lines, a corresponding structure element for that purpose will have the following shape:

In the source code this is represented by the following code snippet:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp horiz @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java horiz @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py horiz @end\_toggle

The same applies to the vertical lines, with the corresponding structure element:

Again, this is represented as follows:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp vert @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java vert @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py vert @end\_toggle

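To see why a linear structure element isolates lines, here is a toy NumPy sketch that erodes and then dilates a tiny binary "music sheet" with a one-pixel-thick kernel. It is an assumption-laden stand-in, not the tutorial's code: the image, the kernel lengths and the helper names (`_morph`, `extract_lines`) are made up for illustration, and pixels outside the image are treated as background.

```python
import numpy as np


def _morph(image, kernel, reduce_fn, pad_value):
    """Apply min (erosion) or max (dilation) over the cells covered by `kernel`."""
    kh, kw = kernel.shape
    ay, ax = kh // 2, kw // 2
    padded = np.pad(image, ((ay, kh - 1 - ay), (ax, kw - 1 - ax)),
                    constant_values=pad_value)
    h, w = image.shape
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(kh) for dx in range(kw) if kernel[dy, dx]]
    return reduce_fn(np.stack(shifted), axis=0)


def extract_lines(binary, length, horizontal=True):
    """Keep only strokes at least `length` pixels long along one axis (use an odd length)."""
    kernel = np.ones((1, length) if horizontal else (length, 1), dtype=np.uint8)
    eroded = _morph(binary, kernel, np.min, pad_value=0)   # short strokes vanish
    return _morph(eroded, kernel, np.max, pad_value=0)     # surviving strokes regain their length


# Toy sheet: one staff line (row 5) and one 3x3 "note" that does not touch it.
sheet = np.zeros((9, 15), dtype=np.uint8)
sheet[5, :] = 1
sheet[1:4, 6:9] = 1

staff_only = extract_lines(sheet, length=7, horizontal=True)
notes_only = extract_lines(sheet, length=3, horizontal=False)
```

With the horizontal kernel only the staff line survives, while the vertical kernel keeps the note and drops the one-pixel-thick staff line.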
### Refine edges / Result

As you can see we are almost there. However, at this point you will notice that the edges of the notes are a bit rough. For that reason we need to refine the edges in order to obtain a smoother result:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.cpp smooth @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/morph\_lines\_detection/Morphology\_3.java smooth @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/morph\_lines\_detection/morph\_lines\_detection.py smooth @end\_toggle

## [Motion Deblur Filter](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/motion_deblur_filter/motion_deblur_filter/)

# Motion Deblur Filter {#tutorial\_motion\_deblur\_filter}

@tableofcontents

@prev\_tutorial{tutorial\_out\_of\_focus\_deblur\_filter} @next\_tutorial{tutorial\_anisotropic\_image\_segmentation\_by\_a\_gst}

|                 |                     |
| --------------: | :------------------ |
| Original author | Karpushin Vladislav |
| Compatibility   | OpenCV >= 3.0       |

## Goal

In this tutorial you will learn:

-   what the PSF of a motion blur image is
-   how to restore a motion blur image

## Theory

For the degradation image model theory and the Wiener filter theory you can refer to the tutorial @ref tutorial\_out\_of\_focus\_deblur\_filter "Out-of-focus Deblur Filter". On this page only a linear motion blur distortion is considered. The motion blur image on this page is a real world image. The blur was caused by a moving subject.

### What is the PSF of a motion blur image?

The point spread function (PSF) of a linear motion blur distortion is a line segment. Such a PSF is specified by two parameters: \\f$LEN\\f$ is the length of the blur and \\f$THETA\\f$ is the angle of motion.

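A line-segment PSF of this kind can be rasterized in a few lines. The sketch below is NumPy-only and is not the tutorial's `calcPSF()`; it assumes a square, odd-sized PSF window, an angle measured counter-clockwise from the horizontal axis, and normalization to unit sum. The name `motion_psf` and the `size` parameter are hypothetical.

```python
import numpy as np


def motion_psf(size, length, theta_deg):
    """Centered line segment of `length` pixels at angle `theta_deg`, normalized to sum 1."""
    if size % 2 == 0 or length > size:
        raise ValueError("size must be odd and at least as large as length")
    psf = np.zeros((size, size), dtype=np.float64)
    c = size // 2
    theta = np.radians(theta_deg)
    # Sample the segment densely and mark the nearest pixel for every sample.
    t = np.linspace(-(length - 1) / 2.0, (length - 1) / 2.0, 4 * length)
    xs = np.rint(c + t * np.cos(theta)).astype(int)
    ys = np.rint(c - t * np.sin(theta)).astype(int)   # image y axis points down
    psf[ys, xs] = 1.0
    return psf / psf.sum()


horizontal_blur = motion_psf(size=9, length=5, theta_deg=0.0)
vertical_blur = motion_psf(size=9, length=5, theta_deg=90.0)
```

Convolving an image with such a kernel smears every pixel along the segment, which is the linear motion blur model considered on this page.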
### How to restore a blurred image?

On this page the Wiener filter is used as the restoration filter; for details you can refer to the tutorial @ref tutorial\_out\_of\_focus\_deblur\_filter "Out-of-focus Deblur Filter". In order to synthesize the Wiener filter for the motion blur case, you need to specify the signal-to-noise ratio (\\f$SNR\\f$) as well as the \\f$LEN\\f$ and \\f$THETA\\f$ of the PSF.

## Source code

You can find the source code in `samples/cpp/tutorial_code/ImgProc/motion_deblur_filter/motion_deblur_filter.cpp` in the OpenCV source tree.

@include cpp/tutorial\_code/ImgProc/motion\_deblur\_filter/motion\_deblur\_filter.cpp

## Explanation

A motion blur image recovering algorithm consists of PSF generation, Wiener filter generation and filtering a blurred image in a frequency domain: @snippet samples/cpp/tutorial\_code/ImgProc/motion\_deblur\_filter/motion\_deblur\_filter.cpp main

A function calcPSF() forms a PSF according to input parameters \\f$LEN\\f$ and \\f$THETA\\f$ (in degrees): @snippet samples/cpp/tutorial\_code/ImgProc/motion\_deblur\_filter/motion\_deblur\_filter.cpp calcPSF

A function edgetaper() tapers the input image’s edges in order to reduce the ringing effect in a restored image: @snippet samples/cpp/tutorial\_code/ImgProc/motion\_deblur\_filter/motion\_deblur\_filter.cpp edgetaper

The functions calcWnrFilter(), fftshift() and filter2DFreq() realize an image filtration by a specified PSF in the frequency domain. The functions are copied from the tutorial @ref tutorial\_out\_of\_focus\_deblur\_filter "Out-of-focus Deblur Filter".

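The idea behind edge tapering can be illustrated with a separable window that fades the image borders to zero. The sketch below uses a raised-cosine ramp, which is an assumption chosen for simplicity; the tutorial's `edgetaper()` may use a different window shape and parameters. The name `cosine_edge_taper` and the `border` parameter are hypothetical.

```python
import numpy as np


def cosine_edge_taper(image, border):
    """Fade the outer `border` pixels on every side smoothly towards zero."""
    def window(n):
        win = np.ones(n, dtype=np.float64)
        # Raised-cosine ramp rising from ~0 at the edge to ~1 at the inner end.
        ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(border) + 0.5) / border))
        win[:border] = ramp
        win[n - border:] = ramp[::-1]
        return win

    rows, cols = image.shape
    if 2 * border > min(rows, cols):
        raise ValueError("border is too wide for this image")
    # The 2D weight is the outer product of a vertical and a horizontal window.
    return image * np.outer(window(rows), window(cols))


flat = np.ones((32, 48), dtype=np.float64)
tapered = cosine_edge_taper(flat, border=8)
```

Because frequency-domain filtering treats the image as periodic, removing the jump between opposite borders in this way is what reduces ringing in the restored image.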
## Result

Below you can see the real world image with motion blur distortion. The license plates of both cars are unreadable. The red markers show the license plate locations.

Below you can see the restoration result for the black car license plate. The result has been computed with \\f$LEN\\f$ = 125, \\f$THETA\\f$ = 0, \\f$SNR\\f$ = 700.

Below you can see the restoration result for the white car license plate. The result has been computed with \\f$LEN\\f$ = 78, \\f$THETA\\f$ = 15, \\f$SNR\\f$ = 300.

The values of \\f$SNR\\f$, \\f$LEN\\f$ and \\f$THETA\\f$ were selected manually to give the best possible visual result. The \\f$THETA\\f$ parameter coincides with the car’s moving direction, and the \\f$LEN\\f$ parameter depends on the car’s moving speed. The result is not perfect, but at least it gives us a hint of the image’s content. With some effort, the car license plate is now readable.

@note The parameters \\f$LEN\\f$ and \\f$THETA\\f$ are the most important. You should adjust \\f$LEN\\f$ and \\f$THETA\\f$ first, then \\f$SNR\\f$.

You can also find a quick video demonstration of a license plate recovering method on [YouTube](https://youtu.be/xSrE0hdhb4o). @youtube{xSrE0hdhb4o}

## [Opening Closing Hats](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/opening_closing_hats/opening_closing_hats/)

# More Morphology Transformations {#tutorial\_opening\_closing\_hats}

@tableofcontents

@prev\_tutorial{tutorial\_erosion\_dilatation} @next\_tutorial{tutorial\_hitOrMiss}

|                 |               |
| --------------: | :------------ |
| Original author | Ana Huamán    |
| Compatibility   | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::morphologyEx to apply Morphological Transformation such as:
    -   Opening
    -   Closing
    -   Morphological Gradient
    -   Top Hat
    -   Black Hat

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

In the previous tutorial we covered two basic Morphology operations:

-   Erosion
-   Dilation.

Based on these two we can perform more sophisticated transformations on our images. Here we briefly discuss 5 operations offered by OpenCV:

### Opening

-   It is obtained by the erosion of an image followed by a dilation.
    
    \\f\[dst = open( src, element) = dilate( erode( src, element ) )\\f\]
    
-   Useful for removing small objects (it is assumed that the objects are bright on a dark background)
    
-   For instance, check out the example below. The image at the left is the original and the image at the right is the result after applying the opening transformation. We can observe that the small dots have disappeared.
    

### Closing

-   It is obtained by the dilation of an image followed by an erosion.
    
    \\f\[dst = close( src, element ) = erode( dilate( src, element ) )\\f\]
    
-   Useful to remove small holes (dark regions).
    

### Morphological Gradient

-   It is the difference between the dilation and the erosion of an image.
    
    \\f\[dst = morph\_{grad}( src, element ) = dilate( src, element ) - erode( src, element )\\f\]
    
-   It is useful for finding the outline of an object as can be seen below:
    

### Top Hat

-   It is the difference between an input image and its opening.
    
    \\f\[dst = tophat( src, element ) = src - open( src, element )\\f\]
    

### Black Hat

-   It is the difference between the closing and its input image
    
    \\f\[dst = blackhat( src, element ) = close( src, element ) - src\\f\]
    

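The five formulas above compose directly from erosion and dilation. The following NumPy sketch spells them out on tiny binary images; it is an illustration under assumptions (kernel origin at its center, out-of-image pixels ignored), and the function names are hypothetical stand-ins rather than OpenCV calls.

```python
import numpy as np


def _morph(src, element, reduce_fn, pad_value):
    """Apply min (erosion) or max (dilation) over the cells covered by `element`."""
    kh, kw = element.shape
    ay, ax = kh // 2, kw // 2
    padded = np.pad(src, ((ay, kh - 1 - ay), (ax, kw - 1 - ax)),
                    constant_values=pad_value)
    h, w = src.shape
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(kh) for dx in range(kw) if element[dy, dx]]
    return reduce_fn(np.stack(shifted), axis=0)


def erode(src, element):
    return _morph(src, element, np.min, pad_value=src.max())


def dilate(src, element):
    return _morph(src, element, np.max, pad_value=src.min())


def opening(src, element):       # dilate(erode(src))
    return dilate(erode(src, element), element)


def closing(src, element):       # erode(dilate(src))
    return erode(dilate(src, element), element)


def gradient(src, element):      # dilate(src) - erode(src)
    return dilate(src, element) - erode(src, element)


def top_hat(src, element):       # src - open(src)
    return src - opening(src, element)


def black_hat(src, element):     # close(src) - src
    return closing(src, element) - src


element = np.ones((3, 3), dtype=np.uint8)

# A 3x3 bright object plus one isolated bright dot.
dots = np.zeros((9, 9), dtype=np.int32)
dots[3:6, 3:6] = 1
dots[1, 7] = 1

# A 5x5 bright object with a one-pixel dark hole.
holes = np.zeros((9, 9), dtype=np.int32)
holes[2:7, 2:7] = 1
holes[4, 4] = 0
```

On these inputs, `opening(dots, element)` drops the isolated dot and `top_hat` returns exactly that dot, while `closing(holes, element)` fills the hole and `black_hat` returns exactly the hole.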
## Code

@add\_toggle\_cpp This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgProc/Morphology_2.cpp) @include cpp/tutorial\_code/ImgProc/Morphology\_2.cpp @end\_toggle

@add\_toggle\_java This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/opening_closing_hats/MorphologyDemo2.java) @include java/tutorial\_code/ImgProc/opening\_closing\_hats/MorphologyDemo2.java @end\_toggle

@add\_toggle\_python This tutorial's code is shown below. You can also download it [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/opening_closing_hats/morphology_2.py) @include python/tutorial\_code/imgProc/opening\_closing\_hats/morphology\_2.py @end\_toggle

## Explanation

-   Let's check the general structure of the C++ program:
    -   Load an image
    -   Create a window to display results of the Morphological operations
    -   Create three Trackbars for the user to enter parameters:
        -   The first trackbar **Operator** returns the kind of morphology operation to use (**morph\_operator**).

            @snippet cpp/tutorial\_code/ImgProc/Morphology\_2.cpp create\_trackbar1

        -   The second trackbar **Element** returns **morph\_elem**, which indicates what kind of structure our kernel is:

            @snippet cpp/tutorial\_code/ImgProc/Morphology\_2.cpp create\_trackbar2

        -   The final trackbar **Kernel Size** returns the size of the kernel to be used (**morph\_size**)

            @snippet cpp/tutorial\_code/ImgProc/Morphology\_2.cpp create\_trackbar3

    -   Every time we move any slider, the user's function **Morphology\_Operations** will be called to perform a new morphology operation, and it will update the output image based on the current trackbar values.

        @snippet cpp/tutorial\_code/ImgProc/Morphology\_2.cpp morphology\_operations

        We can observe that the key function to perform the morphology transformations is @ref cv::morphologyEx . In this example we use four arguments (leaving the rest as defaults):

        -   **src**: Source (input) image
        -   **dst**: Output image
        -   **operation**: The kind of morphology transformation to be performed. Note that we have 5 alternatives:

            -   _Opening_: MORPH\_OPEN : 2
            -   _Closing_: MORPH\_CLOSE: 3
            -   _Gradient_: MORPH\_GRADIENT: 4
            -   _Top Hat_: MORPH\_TOPHAT: 5
            -   _Black Hat_: MORPH\_BLACKHAT: 6

            As you can see the values range from 2 to 6, which is why we add (+2) to the values entered by the Trackbar:

            @snippet cpp/tutorial\_code/ImgProc/Morphology\_2.cpp operation

        -   **element**: The kernel to be used. We use the function @ref cv::getStructuringElement to define our own structure.

## Results

-   After compiling the code above we can execute it giving an image path as an argument. Results using the image: **baboon.png**:
    
-   And here are two snapshots of the display window. The first picture shows the output after using the operator **Opening** with a cross kernel. The second picture (right side) shows the result of using a **Blackhat** operator with an ellipse kernel.

## [Out Of Focus Deblur Filter](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/out_of_focus_deblur_filter/out_of_focus_deblur_filter/)

# Out-of-focus Deblur Filter {#tutorial\_out\_of\_focus\_deblur\_filter}

@tableofcontents

@prev\_tutorial{tutorial\_distance\_transform} @next\_tutorial{tutorial\_motion\_deblur\_filter}

|                 |                     |
| --------------: | :------------------ |
| Original author | Karpushin Vladislav |
| Compatibility   | OpenCV >= 3.0       |

## Goal

In this tutorial you will learn:

-   what a degradation image model is
-   what the PSF of an out-of-focus image is
-   how to restore a blurred image
-   what a Wiener filter is

## Theory

@note The explanation is based on the books @cite Gonzalez1987 and @cite gruzman. Also, you can refer to Matlab's tutorial [Image Deblurring in Matlab](https://www.mathworks.com/help/images/image-deblurring.html) and the article [SmartDeblur](http://yuzhikov.com/articles/BlurredImagesRestoration1.htm).

@note The out-of-focus image on this page is a real world image. The defocus was achieved manually by camera optics.

### What is a degradation image model?

Here is a mathematical model of the image degradation in frequency domain representation:

\\f\[S = H\\cdot U + N\\f\]

where \\f$S\\f$ is the spectrum of the blurred (degraded) image, \\f$U\\f$ is the spectrum of the original true (undegraded) image, \\f$H\\f$ is the frequency response of the point spread function (PSF), and \\f$N\\f$ is the spectrum of the additive noise.

The circular PSF is a good approximation of out-of-focus distortion. Such a PSF is specified by only one parameter - radius \\f$R\\f$. Circular PSF is used in this work.

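The degradation model can be simulated directly with FFTs. The sketch below is a NumPy-only illustration, assuming a disc-shaped PSF normalized to unit sum, circular (periodic) convolution, and Gaussian additive noise; `circular_psf`, `degrade` and `noise_sigma` are hypothetical names introduced here.

```python
import numpy as np


def circular_psf(shape, radius):
    """Centered disc of the given radius, normalized so that its sum is 1."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    disc = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2).astype(np.float64)
    return disc / disc.sum()


def degrade(image, psf, noise_sigma, rng):
    """Simulate S = H * U + N in the frequency domain and return the degraded image."""
    U = np.fft.fft2(image)
    H = np.fft.fft2(np.fft.ifftshift(psf))      # move the PSF center to index (0, 0)
    N = np.fft.fft2(rng.normal(0.0, noise_sigma, image.shape))
    S = H * U + N
    return np.real(np.fft.ifft2(S))


rng = np.random.default_rng(0)
psf = circular_psf((32, 32), radius=4)

# A single bright point is spread into a copy of the PSF, hence "point spread function".
impulse = np.zeros((32, 32))
impulse[16, 16] = 1.0
spread = degrade(impulse, psf, noise_sigma=0.0, rng=rng)
```

With `noise_sigma=0.0` the output for the impulse is the PSF itself, which makes the role of \\f$H\\f$ in the model easy to verify.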
### How to restore a blurred image?

The objective of restoration (deblurring) is to obtain an estimate of the original image. The restoration formula in frequency domain is:

\\f\[U' = H\_w\\cdot S\\f\]

where \\f$U'\\f$ is the spectrum of estimation of original image \\f$U\\f$, and \\f$H\_w\\f$ is the restoration filter, for example, the Wiener filter.

### What is the Wiener filter?

The Wiener filter is a way to restore a blurred image. Let's suppose that the PSF is a real and symmetric signal, and that the power spectra of the original true image and of the noise are not known. Then a simplified Wiener formula is:

\\f\[H\_w = \\frac{H}{|H|^2+\\frac{1}{SNR}} \\f\]

where \\f$SNR\\f$ is signal-to-noise ratio.

So, in order to recover an out-of-focus image with the Wiener filter, we need to know the \\f$SNR\\f$ and the radius \\f$R\\f$ of the circular PSF.

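The simplified Wiener formula and the restoration step \\f$U' = H\_w\\cdot S\\f$ can be sketched in NumPy as follows. This is an illustration under assumptions, not the tutorial's C++ code: the blur is synthetic, noise-free and circular, the test image is random, and the helper names (`circular_psf`, `wiener_filter`, `restore`, `rms`) are hypothetical.

```python
import numpy as np


def circular_psf(shape, radius):
    """Centered disc of the given radius, normalized so that its sum is 1."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    disc = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2).astype(np.float64)
    return disc / disc.sum()


def wiener_filter(psf, snr):
    """H_w = H / (|H|^2 + 1/SNR); conj(H) equals H for a real, symmetric PSF."""
    H = np.fft.fft2(np.fft.ifftshift(psf))
    return np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)


def restore(blurred, psf, snr):
    """U' = H_w * S, transformed back to the spatial domain."""
    S = np.fft.fft2(blurred)
    return np.real(np.fft.ifft2(wiener_filter(psf, snr) * S))


def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
original = rng.random((64, 64))
psf = circular_psf(original.shape, radius=3)

# Synthetic out-of-focus blur (no noise added, to keep the example deterministic).
H = np.fft.fft2(np.fft.ifftshift(psf))
blurred = np.real(np.fft.ifft2(H * np.fft.fft2(original)))

restored = restore(blurred, psf, snr=5200.0)
print("error before:", rms(blurred, original), "after:", rms(restored, original))
```

The \\f$1/SNR\\f$ term keeps the division well behaved at frequencies where \\f$H\\f$ is close to zero, which is why the restored image is closer to the original but not identical to it.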
## Source code

You can find the source code in `samples/cpp/tutorial_code/ImgProc/out_of_focus_deblur_filter/out_of_focus_deblur_filter.cpp` in the OpenCV source tree.

@include cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp

## Explanation

An out-of-focus image recovering algorithm consists of PSF generation, Wiener filter generation and filtering a blurred image in frequency domain: @snippet samples/cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp main

A function calcPSF() forms a circular PSF according to input parameter radius \\f$R\\f$: @snippet samples/cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp calcPSF

A function calcWnrFilter() synthesizes the simplified Wiener filter \\f$H\_w\\f$ according to the formula described above: @snippet samples/cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp calcWnrFilter

A function fftshift() rearranges the PSF. This code was just copied from the tutorial @ref tutorial\_discrete\_fourier\_transform "Discrete Fourier Transform": @snippet samples/cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp fftshift

A function filter2DFreq() filters the blurred image in the frequency domain: @snippet samples/cpp/tutorial\_code/ImgProc/out\_of\_focus\_deblur\_filter/out\_of\_focus\_deblur\_filter.cpp filter2DFreq

## Result

Below you can see the real out-of-focus image:

And the following result has been computed with \\f$R\\f$ = 53 and \\f$SNR\\f$ = 5200 parameters:

The Wiener filter was used, and values of \\f$R\\f$ and \\f$SNR\\f$ were selected manually to give the best possible visual result. We can see that the result is not perfect, but it gives us a hint to the image's content. With some difficulty, the text is readable.

@note The parameter \\f$R\\f$ is the most important. So you should adjust \\f$R\\f$ first, then \\f$SNR\\f$.

@note Sometimes you can observe the ringing effect in a restored image. This effect can be reduced with several methods. For example, you can taper input image edges.

You can also find a quick video demonstration of this on [YouTube](https://youtu.be/0bEcE4B0XP4). @youtube{0bEcE4B0XP4}

## References

-   [Image Deblurring in Matlab](https://www.mathworks.com/help/images/image-deblurring.html) - Image Deblurring in Matlab
-   [SmartDeblur](http://yuzhikov.com/articles/BlurredImagesRestoration1.htm) - SmartDeblur site

## [Periodic Noise Removing Filter](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/periodic_noise_removing_filter/periodic_noise_removing_filter/)

# Periodic Noise Removing Filter {#tutorial\_periodic\_noise\_removing\_filter}

@tableofcontents

@prev\_tutorial{tutorial\_anisotropic\_image\_segmentation\_by\_a\_gst}

|                 |                     |
| --------------: | :------------------ |
| Original author | Karpushin Vladislav |
| Compatibility   | OpenCV >= 3.0       |

## Goal

In this tutorial you will learn:

-   how to remove periodic noise in the Fourier domain

## Theory

@note The explanation is based on the book @cite Gonzalez1987. The image on this page is a real world image.

Periodic noise produces spikes in the Fourier domain that can often be detected by visual analysis.

### How to remove periodic noise in the Fourier domain?

Periodic noise can be reduced significantly via frequency domain filtering. On this page we use a notch reject filter with an appropriate radius to completely enclose the noise spikes in the Fourier domain. The notch filter rejects frequencies in predefined neighborhoods around a center frequency. The number of notch filters is arbitrary. The shape of the notch areas can also be arbitrary (e.g. rectangular or circular). On this page we use three circular shape notch reject filters. The power spectrum density of an image is used for the visual detection of the noise spikes.

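A circular notch reject filter can be sketched in NumPy as a 0/1 mask applied to the centered spectrum. The example below is an illustration under assumptions: the image is synthetic, the noise is a single sinusoid whose frequency is known in advance, and only one mirrored pair of notches is used rather than the three filters mentioned above. The names `notch_reject_mask` and `apply_notch` are hypothetical.

```python
import numpy as np


def notch_reject_mask(shape, centres, radius):
    """0 inside a disc around each centre and its mirror image, 1 elsewhere.

    Centres are (row, col) offsets from the middle of the fft-shifted spectrum.
    """
    h, w = shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    mask = np.ones(shape, dtype=np.float64)
    for dy, dx in centres:
        # Spikes of a real-valued image come in pairs mirrored through the centre.
        for sy, sx in ((dy, dx), (-dy, -dx)):
            mask[(yy - cy - sy) ** 2 + (xx - cx - sx) ** 2 <= radius ** 2] = 0.0
    return mask


def apply_notch(image, mask):
    """Multiply the centered spectrum by the mask and transform back."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))


def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic test: a smooth blob corrupted by one sinusoidal interference pattern.
y, x = np.mgrid[:64, :64]
clean = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2.0 * 8.0 ** 2))
noisy = clean + 0.5 * np.sin(2.0 * np.pi * (12 * y + 8 * x) / 64.0)

mask = notch_reject_mask(noisy.shape, centres=[(12, 8)], radius=2)
filtered = apply_notch(noisy, mask)
print("error before:", rms(noisy, clean), "after:", rms(filtered, clean))
```

In practice the notch centres are not known beforehand; they are read off the bright spikes in the power spectrum density, as described above.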
## Source code

You can find the source code in `samples/cpp/tutorial_code/ImgProc/periodic_noise_removing_filter/periodic_noise_removing_filter.cpp` in the OpenCV source tree.

@include samples/cpp/tutorial\_code/ImgProc/periodic\_noise\_removing\_filter/periodic\_noise\_removing\_filter.cpp

## Explanation

Periodic noise reduction by frequency domain filtering consists of power spectrum density calculation (for the noise spikes visual detection), notch reject filter synthesis and frequency filtering: @snippet samples/cpp/tutorial\_code/ImgProc/periodic\_noise\_removing\_filter/periodic\_noise\_removing\_filter.cpp main

A function calcPSD() calculates power spectrum density of an image: @snippet samples/cpp/tutorial\_code/ImgProc/periodic\_noise\_removing\_filter/periodic\_noise\_removing\_filter.cpp calcPSD

A function synthesizeFilterH() forms a transfer function of an ideal circular shape notch reject filter according to a center frequency and a radius: @snippet samples/cpp/tutorial\_code/ImgProc/periodic\_noise\_removing\_filter/periodic\_noise\_removing\_filter.cpp synthesizeFilterH

A function filter2DFreq() filters an image in the frequency domain. The functions fftshift() and filter2DFreq() are copied from the tutorial @ref tutorial\_out\_of\_focus\_deblur\_filter "Out-of-focus Deblur Filter".

## Result

The figure below shows an image heavily corrupted by periodic noise of various frequencies.

The noise components are easily seen as bright dots (spikes) in the Power spectrum density shown in the figure below.

The figure below shows a notch reject filter with an appropriate radius to completely enclose the noise spikes.

The result of processing the image with the notch reject filter is shown below.

The improvement is quite evident. This image contains significantly less visible periodic noise than the original image.

You can also find a quick video demonstration of this filtering idea on [YouTube](https://youtu.be/Qne51TcWwAc). @youtube{Qne51TcWwAc}

## [Pyramids](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/pyramids/pyramids/)

# Image Pyramids {#tutorial\_pyramids}

@tableofcontents

@prev\_tutorial{tutorial\_morph\_lines\_detection} @next\_tutorial{tutorial\_threshold}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV functions **pyrUp()** and **pyrDown()** to upsample or downsample a given image.

## Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

-   Usually we need to convert an image to a size different than its original. For this, there are two possible options:
    -   _Upsize_ the image (zoom in), or
    -   _Downsize_ it (zoom out).
-   Although there is a _geometric transformation_ function in OpenCV that literally resizes an image (**resize**, which we will show in a future tutorial), in this section we first analyze the use of **Image Pyramids**, which are widely applied in a huge range of vision applications.

### Image Pyramid

-   An image pyramid is a collection of images - all arising from a single original image - that are successively downsampled until some desired stopping point is reached.
-   There are two common kinds of image pyramids:
    -   **Gaussian pyramid:** Used to downsample images
    -   **Laplacian pyramid:** Used to reconstruct an upsampled image from an image lower in the pyramid (with less resolution)
-   In this tutorial we'll use the _Gaussian pyramid_.

### Gaussian Pyramid

-   Imagine the pyramid as a set of layers in which the higher the layer, the smaller the size.
    
-   Every layer is numbered from bottom to top, so layer \\f$(i+1)\\f$ (denoted as \\f$G\_{i+1}\\f$) is smaller than layer \\f$i\\f$ (\\f$G\_{i}\\f$).
    
-   To produce layer \\f$(i+1)\\f$ in the Gaussian pyramid, we do the following:
    
    -   Convolve \\f$G\_{i}\\f$ with a Gaussian kernel:
        
        \\f\[\\frac{1}{256} \\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \\end{bmatrix}\\f\]
        
    -   Remove every even-numbered row and column.
        
-   You can easily notice that the resulting image will be exactly one-quarter the area of its predecessor. Iterating this process on the input image \\f$G\_{0}\\f$ (original image) produces the entire pyramid.
    
-   The procedure above was useful to downsample an image. What if we want to make it bigger?
    
    -   First, upsize the image to twice the original in each dimension, with the new even rows and columns filled with zeros (\\f$0\\f$).
    -   Perform a convolution with the same kernel shown above (multiplied by 4) to approximate the values of the "missing pixels".
-   These two procedures (downsampling and upsampling as explained above) are implemented by the OpenCV functions **pyrDown()** and **pyrUp()**, respectively, as we will see in an example with the code below:
    

@note When we reduce the size of an image, we are actually _losing_ information of the image.
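The downsampling step described above can be sketched in plain Python, without OpenCV. The sketch builds the 5x5 kernel as the outer product of the weights 1, 4, 6, 4, 1, evaluates the convolution only at the rows and columns that are kept, and divides by 256. The helper names `reflect` and `pyr_down` are hypothetical, and the mirrored border handling is an assumption of this sketch; the real **pyrDown()** may treat borders and rounding differently.

```python
WEIGHTS = [1, 4, 6, 4, 1]
# 5x5 Gaussian kernel as an outer product; its entries sum to 256.
KERNEL = [[a * b for b in WEIGHTS] for a in WEIGHTS]


def reflect(i, n):
    """Mirror an out-of-range index back into [0, n) (border rule assumed here)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def pyr_down(img):
    """Blur with KERNEL / 256 and keep every second row and column."""
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(0, rows, 2):
        out_row = []
        for x in range(0, cols, 2):
            acc = 0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    sample = img[reflect(y + dy, rows)][reflect(x + dx, cols)]
                    acc += KERNEL[dy + 2][dx + 2] * sample
            out_row.append(acc / 256)
        out.append(out_row)
    return out


flat = [[100] * 8 for _ in range(8)]   # constant 8x8 test image
small = pyr_down(flat)
print(len(small), "x", len(small[0]), "first value:", small[0][0])
```

Because the kernel weights sum to 256, a constant image keeps its value after the blur; only the number of rows and columns is halved.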

## Code

The code for this tutorial is shown below.

@add\_toggle\_cpp You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/tutorial_code/ImgProc/Pyramids/Pyramids.cpp) @include samples/cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp @end\_toggle

@add\_toggle\_java You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/java/tutorial_code/ImgProc/Pyramids/Pyramids.java) @include samples/java/tutorial\_code/ImgProc/Pyramids/Pyramids.java @end\_toggle

@add\_toggle\_python You can also download it from [here](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/tutorial_code/imgProc/Pyramids/pyramids.py) @include samples/python/tutorial\_code/imgProc/Pyramids/pyramids.py @end\_toggle

## Explanation

Let's check the general structure of the program:

### Load an image

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp load @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/Pyramids/Pyramids.java load @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/Pyramids/pyramids.py load @end\_toggle

### Create window

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp show\_image @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/Pyramids/Pyramids.java show\_image @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/Pyramids/pyramids.py show\_image @end\_toggle

### Loop

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp loop @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/Pyramids/Pyramids.java loop @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/Pyramids/pyramids.py loop @end\_toggle

Perform an infinite loop waiting for user input. Our program exits if the user presses **ESC**. Besides, it has two options:

-   **Perform upsampling - Zoom 'i'n (after pressing 'i')**
    
    We use the function **pyrUp()** with three arguments:
    
    -   _src_: The current and destination image (to be shown on screen, supposedly double the size of the input image). It is used as both the first and the second argument.
    -   _Size( tmp.cols\*2, tmp.rows\*2 )_: The destination size. Since we are upsampling, **pyrUp()** expects a size double that of the input image (in this case _src_).
    

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp pyrup @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/Pyramids/Pyramids.java pyrup @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/Pyramids/pyramids.py pyrup @end\_toggle

-   **Perform downsampling - Zoom 'o'ut (after pressing 'o')**
    
    We use the function **pyrDown()** with three arguments (similarly to **pyrUp()**):
    
    -   _src_: The current and destination image (to be shown on screen, supposedly half the size of the input image). It is used as both the first and the second argument.
    -   _Size( tmp.cols/2, tmp.rows/2 )_: The destination size. Since we are downsampling, **pyrDown()** expects half the size of the input image (in this case _src_).
    

@add\_toggle\_cpp @snippet cpp/tutorial\_code/ImgProc/Pyramids/Pyramids.cpp pyrdown @end\_toggle

@add\_toggle\_java @snippet java/tutorial\_code/ImgProc/Pyramids/Pyramids.java pyrdown @end\_toggle

@add\_toggle\_python @snippet python/tutorial\_code/imgProc/Pyramids/pyramids.py pyrdown @end\_toggle

Notice that it is important that the input image can be divided by a factor of two (in both dimensions). Otherwise, an error will be shown.

## Results

-   By default the program loads the image [chicky\_512.png](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/data/chicky_512.png) that comes in the `samples/data` folder. Notice that this image is \\f$512 \\times 512\\f$, hence a downsample won't generate any error (\\f$512 = 2^{9}\\f$). The original image is shown below:
    
-   First we apply two successive **pyrDown()** operations by pressing 'o'. Our output is:
    
-   Note that we have lost some resolution because we reduced the size of the image. This is evident after we apply **pyrUp()** twice (by pressing 'i'). Our output is now:
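The image sizes involved in the steps above can be traced with a few lines of plain Python. The sketch assumes the default destination sizes documented for the two functions, that is `(n + 1) / 2` (integer division) per dimension for **pyrDown()** and `2 * n` for **pyrUp()**; the helper names `down_size` and `up_size` are made up for this illustration.

```python
def down_size(n):
    """Default pyrDown output length for an input length n (assumed rule)."""
    return (n + 1) // 2


def up_size(n):
    """Default pyrUp output length for an input length n (assumed rule)."""
    return 2 * n


# Two downsampling steps followed by two upsampling steps on a 512-pixel side.
size = 512
history = [size]
for step in (down_size, down_size, up_size, up_size):
    size = step(size)
    history.append(size)
print(history)

# An odd side length does not survive the round trip.
odd_round_trip = up_size(down_size(5))
print(odd_round_trip)
```

The side length returns to 512 after the round trip, but the pixel data does not: the detail removed by the two blur-and-drop steps cannot be recreated by upsampling, which is the loss of resolution noted above.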

## [Random Generator And Text](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/random_generator_and_text/random_generator_and_text/)

# Random generator and text with OpenCV {#tutorial\_random\_generator\_and\_text}

@tableofcontents

@prev\_tutorial{tutorial\_basic\_geometric\_drawing} @next\_tutorial{tutorial\_gausian\_median\_blur\_bilateral\_filter}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goals

In this tutorial you will learn how to:

-   Use the _Random Number generator class_ (@ref cv::RNG ) to get a random number from a uniform distribution.
-   Display text on an OpenCV window by using the function @ref cv::putText

## Code

-   In the previous tutorial (@ref tutorial\_basic\_geometric\_drawing) we drew diverse geometric figures, giving as input parameters such as coordinates (in the form of @ref cv::Point), color, thickness, etc. You might have noticed that we gave specific values for these arguments.
-   In this tutorial, we intend to use _random_ values for the drawing parameters. Also, we intend to populate our image with a large number of geometric figures. Since we will be initializing them in a random fashion, this process will be automated by using _loops_.
-   This code is in your OpenCV sample folder. Otherwise you can grab it from [here](https://github.com/opencv/opencv/blob/5.x/samples/cpp/tutorial_code/ImgProc/basic_drawing/Drawing_2.cpp)

## Explanation

\-# Let's start by checking out the _main_ function. The first thing we do is create a _Random Number Generator_ object (RNG): `RNG rng( 0xFFFFFFFF );`. RNG implements a random number generator. In this example, _rng_ is an RNG object initialized with the value _0xFFFFFFFF_.

\-# Then we create a matrix initialized to _zeros_ (which means that it will appear as black), specifying its height, width and type:

```cpp
/// Initialize a matrix filled with zeros
Mat image = Mat::zeros( window_height, window_width, CV_8UC3 );

/// Show it in a window during DELAY ms
imshow( window_name, image );
```

\-# Then we proceed to draw crazy stuff. After taking a look at the code, you can see that it is mainly divided into 8 sections, defined as functions:

```cpp
/// Now, let's draw some lines
c = Drawing_Random_Lines(image, window_name, rng);
if( c != 0 ) return 0;

/// Go on drawing, this time nice rectangles
c = Drawing_Random_Rectangles(image, window_name, rng);
if( c != 0 ) return 0;

/// Draw some ellipses
c = Drawing_Random_Ellipses( image, window_name, rng );
if( c != 0 ) return 0;

/// Now some polylines
c = Drawing_Random_Polylines( image, window_name, rng );
if( c != 0 ) return 0;

/// Draw filled polygons
c = Drawing_Random_Filled_Polygons( image, window_name, rng );
if( c != 0 ) return 0;

/// Draw circles
c = Drawing_Random_Circles( image, window_name, rng );
if( c != 0 ) return 0;

/// Display text in random positions
c = Displaying_Random_Text( image, window_name, rng );
if( c != 0 ) return 0;

/// Displaying the big end!
c = Displaying_Big_End( image, window_name, rng );
```

All of these functions follow the same pattern, so we will analyze only a couple of them, since the same explanation applies for all.

\-# Checking out the function **Drawing\_Random\_Lines**:

```cpp
int Drawing_Random_Lines( Mat image, char* window_name, RNG rng )
{
  int lineType = 8;
  Point pt1, pt2;

  for( int i = 0; i < NUMBER; i++ )
  {
   pt1.x = rng.uniform( x_1, x_2 );
   pt1.y = rng.uniform( y_1, y_2 );
   pt2.x = rng.uniform( x_1, x_2 );
   pt2.y = rng.uniform( y_1, y_2 );

   line( image, pt1, pt2, randomColor(rng), rng.uniform(1, 10), 8 );
   imshow( window_name, image );
   if( waitKey( DELAY ) >= 0 )
   { return -1; }
  }
  return 0;
}
```

We can observe the following:

-   The _for_ loop will repeat **NUMBER** times. Since the function @ref cv::line is inside this loop, that means that **NUMBER** lines will be generated.
-   The line extremes are given by _pt1_ and _pt2_. For _pt1_ we can see that:

    ```cpp
    pt1.x = rng.uniform( x_1, x_2 );
    pt1.y = rng.uniform( y_1, y_2 );
    ```

    -   We know that **rng** is a _Random number generator_ object. In the code above we are calling **rng.uniform(a,b)**. This generates a uniformly distributed random value between **a** and **b** (inclusive in **a**, exclusive in **b**).
    -   From the explanation above, we deduce that the extremes _pt1_ and _pt2_ will be random values, so the line positions will be quite unpredictable, giving a nice visual effect (check out the Result section below).
    -   As another observation, we notice that in the @ref cv::line arguments, for the _color_ input we enter:

        ```cpp
        randomColor(rng)
        ```

        Let's check the function implementation:

        ```cpp
        static Scalar randomColor( RNG& rng )
        {
          int icolor = (unsigned) rng;
          return Scalar( icolor&255, (icolor>>8)&255, (icolor>>16)&255 );
        }
        ```

        As we can see, the return value is a _Scalar_ with 3 randomly initialized values, which are used as the _B_, _G_ and _R_ values of the line color (OpenCV stores color images in BGR order). Hence, the color of the lines will be random too!
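The bit manipulation in `randomColor` can be tried out in plain Python. The sketch below is only an analogue: it uses Python's `random.Random` in place of `cv::RNG`, and the names `unpack_color` and `random_color` are hypothetical. It shows how one 32-bit random integer is split into three 8-bit channel values.

```python
import random


def unpack_color(icolor):
    """Split an integer into three 8-bit values (bits 0-7, 8-15, 16-23)."""
    return (icolor & 255, (icolor >> 8) & 255, (icolor >> 16) & 255)


def random_color(rng):
    """Draw one 32-bit integer and reuse its low 24 bits as a color."""
    return unpack_color(rng.getrandbits(32))


rng = random.Random(0xFFFFFFFF)          # stand-in for cv::RNG, not the same generator
fixed = unpack_color(0x00FF8040)         # deterministic example value
color = random_color(rng)
print(fixed)
print(color)
```

Masking with 255 keeps each component inside the valid 8-bit range, so a single random draw is enough to produce a complete color.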

\-# The explanation above applies for the other functions generating circles, ellipses, polygons, etc. The parameters such as _center_ and _vertices_ are also generated randomly.

\-# Before finishing, we should also take a look at the functions _Displaying\_Random\_Text_ and _Displaying\_Big\_End_, since they both have a few interesting features:

\-# **Displaying\_Random\_Text:**

```cpp
int Displaying_Random_Text( Mat image, char* window_name, RNG rng )
{
  int lineType = 8;

  for ( int i = 1; i < NUMBER; i++ )
  {
    Point org;
    org.x = rng.uniform(x_1, x_2);
    org.y = rng.uniform(y_1, y_2);

    putText( image, "Testing text rendering", org, rng.uniform(0,8),
             rng.uniform(0,100)*0.05+0.1, randomColor(rng), rng.uniform(1, 10), lineType);

    imshow( window_name, image );
    if( waitKey(DELAY) >= 0 )
      { return -1; }
  }

  return 0;
}
```

Everything looks familiar but the expression:

```cpp
putText( image, "Testing text rendering", org, rng.uniform(0,8),
         rng.uniform(0,100)*0.05+0.1, randomColor(rng), rng.uniform(1, 10), lineType);
```

So, what does the function @ref cv::putText do? In our example:

-   Draws the text **"Testing text rendering"** in **image**
-   The bottom-left corner of the text will be located in the Point **org**
-   The font type is a random integer value in the range \\f$[0, 8)\\f$.
-   The scale of the font is denoted by the expression **rng.uniform(0, 100)x0.05 + 0.1** (meaning its range is \\f$[0.1, 5.1)\\f$)
-   The text color is random (denoted by **randomColor(rng)**)
-   The text thickness ranges between 1 and 9, as specified by **rng.uniform(1,10)** (the upper bound is exclusive)

As a result, we will get (analogously to the other drawing functions) **NUMBER** - 1 texts over our image (note that this loop starts at `i = 1`), in random locations.
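The value ranges of the random **putText** arguments can be checked by enumerating every integer the half-open interval allows. This plain-Python sketch relies only on the rule stated earlier (lower bound inclusive, upper bound exclusive); the variable names are made up for the illustration and the scale values are rounded to two decimals to hide floating-point noise.

```python
# rng.uniform(a, b) on integers yields a value in [a, b); range(a, b) lists them all.
font_faces = list(range(0, 8))
font_scales = [round(k * 0.05 + 0.1, 2) for k in range(0, 100)]
thicknesses = list(range(1, 10))

print("font face :", font_faces[0], "to", font_faces[-1])
print("font scale:", min(font_scales), "to", max(font_scales))
print("thickness :", thicknesses[0], "to", thicknesses[-1])
```

The largest values that can actually occur are therefore one step below the upper bounds passed to `rng.uniform`.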

\-# **Displaying\_Big\_End**

```cpp
int Displaying_Big_End( Mat image, char* window_name, RNG rng )
{
  Size textsize = getTextSize("OpenCV forever!", FONT_HERSHEY_COMPLEX, 3, 5, 0);
  Point org((window_width - textsize.width)/2, (window_height - textsize.height)/2);
  int lineType = 8;

  Mat image2;

  for( int i = 0; i < 255; i += 2 )
  {
    image2 = image - Scalar::all(i);
    putText( image2, "OpenCV forever!", org, FONT_HERSHEY_COMPLEX, 3,
           Scalar(i, i, 255), 5, lineType );

    imshow( window_name, image2 );
    if( waitKey(DELAY) >= 0 )
      { return -1; }
  }

  return 0;
}
```

Besides the function **getTextSize** (which gets the size of the argument text), the new operation we can observe is inside the _for_ loop:

```cpp
image2 = image - Scalar::all(i)
```

So, **image2** is the subtraction of **image** and **Scalar::all(i)**. In fact, what happens here is that every pixel of **image2** will be the result of subtracting the value of **i** from the corresponding pixel of **image** (remember that for each pixel we are considering three values, one per color channel, so each of them will be affected).

Also remember that the subtraction operation _always_ performs internally a **saturate** operation, which means that the result obtained will always be inside the allowed range (never negative, and between 0 and 255 in our example).
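The effect of the saturated subtraction can be reproduced for a single pixel in plain Python. The helpers `saturate_u8` and `subtract_scalar` are hypothetical names for this sketch and only mimic the clamping to the 8-bit range described above; they are not OpenCV functions.

```python
def saturate_u8(value):
    """Clamp an integer to the 8-bit range [0, 255]."""
    return max(0, min(255, value))


def subtract_scalar(pixel, i):
    """Subtract i from every channel of a pixel, saturating at 0."""
    return tuple(saturate_u8(channel - i) for channel in pixel)


pixel = (200, 40, 0)
darker = subtract_scalar(pixel, 100)
black = subtract_scalar(pixel, 254)
print(darker)
print(black)
```

As **i** grows inside the loop, more and more channels hit zero instead of wrapping around, so the previously drawn figures fade smoothly to black behind the final text.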

## Result

As you just saw in the Code section, the program will sequentially execute diverse drawing functions, which will produce:

\-# First, a random set of _NUMBER_ lines will appear on screen, as can be seen in this screenshot:

![](images/Drawing_2_Tutorial_Result_0.jpg)

\-# Then a new set of figures, this time _rectangles_, will follow.

\-# Now some ellipses will appear, each of them with random position, size, thickness and arc length:

![](images/Drawing_2_Tutorial_Result_2.jpg)

\-# Now, _polylines_ with 3 segments will appear on screen, again in random configurations.

![](images/Drawing_2_Tutorial_Result_3.jpg)

\-# Filled polygons (in this example triangles) will follow.

\-# The last geometric figure to appear: circles!

![](images/Drawing_2_Tutorial_Result_5.jpg)

\-# Near the end, the text _"Testing Text Rendering"_ will appear in a variety of fonts, sizes, colors and positions.

\-# And the big end (which by the way expresses a big truth too):

![](images/Drawing_2_Tutorial_Result_big.jpg)

## [Bounding Rects Circles](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/bounding_rects_circles/bounding_rects_circles/)

# Creating Bounding boxes and circles for contours {#tutorial\_bounding\_rects\_circles}

@tableofcontents

@prev\_tutorial{tutorial\_hull} @next\_tutorial{tutorial\_bounding\_rotated\_ellipses}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::boundingRect
-   Use the OpenCV function @ref cv::minEnclosingCircle

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/generalContours_demo1.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/bounding_rects_circles/GeneralContoursDemo1.java) @include samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/bounding_rects_circles/generalContours_demo1.py) @include samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py @end\_toggle

## Explanation

The main function is rather simple. As the comments indicate, it does the following:

-   Open the image, convert it into grayscale and blur it to get rid of the noise.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp setup @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java setup @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py setup @end\_toggle

-   Create a window with header "Source" and display the source file in it.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp createWindow @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java createWindow @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py createWindow @end\_toggle

-   Create a trackbar on the `source_window` and assign a callback function to it. In general, callback functions are used to react to some kind of signal; in our case it is a change of the trackbar's state. An explicit one-time call of `thresh_callback` is necessary to display the "Contours" window simultaneously with the "Source" window.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp trackbar @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java trackbar @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py trackbar @end\_toggle

The callback function does all the interesting work.

-   Use @ref cv::Canny to detect edges in the images.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp Canny @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java Canny @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py Canny @end\_toggle

-   Find contours and save them to the vectors `contour` and `hierarchy`.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp findContours @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java findContours @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py findContours @end\_toggle

-   For every contour found, we now approximate it by a polygon with an accuracy of ±3, stating that the curve must be closed. After that we find a bounding rect for every polygon and save it to `boundRect`. Finally, we find a minimum enclosing circle for every polygon and save it to the `center` and `radius` vectors.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp allthework @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java allthework @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py allthework @end\_toggle
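To make the idea of an upright bounding rectangle concrete, the plain-Python sketch below computes one for a small list of integer points. The function name `bounding_rect` is hypothetical, and the inclusive pixel convention (width and height count the extreme pixels themselves, hence the `+ 1`) is an assumption of this sketch about how integer point sets are treated; the minimum enclosing circle needs a more involved algorithm and is not reproduced here.

```python
def bounding_rect(points):
    """Return (x, y, width, height) of the upright box around integer points.

    Width and height are inclusive pixel counts (assumed convention).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1)


# Hypothetical polygon, e.g. the result of approximating one contour.
polygon = [(2, 3), (10, 3), (10, 8), (2, 8), (6, 5)]
rect = bounding_rect(polygon)
print(rect)
```

The rectangle depends only on the extreme coordinates, which is why approximating the contour by a coarser polygon first changes the result very little while reducing the number of points to process.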

We have found everything we need; all that remains is to draw.

-   Create a new Mat of unsigned 8-bit chars, filled with zeros. It will contain all the drawings we are going to make (rects and circles).

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp zeroMat @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java zeroMat @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py zeroMat @end\_toggle

-   For every contour: pick a random color, draw the contour, the bounding rectangle and the minimal enclosing circle with it.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp forContour @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java forContour @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py forContour @end\_toggle

-   Display the results: create a new window "Contours" and show everything we added to drawings on it.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo1.cpp showDrawings @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/GeneralContoursDemo1.java showDrawings @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ShapeDescriptors/bounding\_rects\_circles/generalContours\_demo1.py showDrawings @end\_toggle

## Result

Here it is:

## [Bounding Rotated Ellipses](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/bounding_rotated_ellipses/bounding_rotated_ellipses/)

# Creating Bounding rotated boxes and ellipses for contours {#tutorial\_bounding\_rotated\_ellipses}

@tableofcontents

@prev\_tutorial{tutorial\_bounding\_rects\_circles} @next\_tutorial{tutorial\_moments}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::minAreaRect
-   Use the OpenCV function @ref cv::fitEllipse

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/generalContours_demo2.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/generalContours\_demo2.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/bounding_rotated_ellipses/GeneralContoursDemo2.java) @include samples/java/tutorial\_code/ShapeDescriptors/bounding\_rotated\_ellipses/GeneralContoursDemo2.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/bounding_rotated_ellipses/generalContours_demo2.py) @include samples/python/tutorial\_code/ShapeDescriptors/bounding\_rotated\_ellipses/generalContours\_demo2.py @end\_toggle

## Explanation

## Result

Here it is:

## [Find Contours](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/find_contours/find_contours/)

# Finding contours in your image {#tutorial\_find\_contours}

@tableofcontents

@prev\_tutorial{tutorial\_template\_matching} @next\_tutorial{tutorial\_hull}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::findContours
-   Use the OpenCV function @ref cv::drawContours

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/findContours_demo.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/findContours\_demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/find_contours/FindContoursDemo.java) @include samples/java/tutorial\_code/ShapeDescriptors/find\_contours/FindContoursDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/find_contours/findContours_demo.py) @include samples/python/tutorial\_code/ShapeDescriptors/find\_contours/findContours\_demo.py @end\_toggle

## Explanation

## Result

Here it is:

## [Hull](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/hull/hull/)

# Convex Hull {#tutorial\_hull}

@tableofcontents

@prev\_tutorial{tutorial\_find\_contours} @next\_tutorial{tutorial\_bounding\_rects\_circles}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::convexHull

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/hull_demo.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/hull\_demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/hull/HullDemo.java) @include samples/java/tutorial\_code/ShapeDescriptors/hull/HullDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/hull/hull_demo.py) @include samples/python/tutorial\_code/ShapeDescriptors/hull/hull\_demo.py @end\_toggle

## Explanation

## Result

Here it is:

## [Moments](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/moments/moments/)

# Image Moments {#tutorial\_moments}

@tableofcontents

@prev\_tutorial{tutorial\_bounding\_rotated\_ellipses} @next\_tutorial{tutorial\_point\_polygon\_test}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::moments
-   Use the OpenCV function @ref cv::contourArea
-   Use the OpenCV function @ref cv::arcLength

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/moments_demo.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/moments\_demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/moments/MomentsDemo.java) @include samples/java/tutorial\_code/ShapeDescriptors/moments/MomentsDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/moments/moments_demo.py) @include samples/python/tutorial\_code/ShapeDescriptors/moments/moments\_demo.py @end\_toggle

## Explanation

## Result

Here it is:

## [Point Polygon Test](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/shapedescriptors/point_polygon_test/point_polygon_test/)

# Point Polygon Test {#tutorial\_point\_polygon\_test}

@tableofcontents

@prev\_tutorial{tutorial\_moments} @next\_tutorial{tutorial\_distance\_transform}

Original author

Ana Huamán

Compatibility

OpenCV >= 3.0

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV function @ref cv::pointPolygonTest

## Theory

## Code

@add\_toggle\_cpp The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ShapeDescriptors/pointPolygonTest_demo.cpp) @include samples/cpp/tutorial\_code/ShapeDescriptors/pointPolygonTest\_demo.cpp @end\_toggle

@add\_toggle\_java The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ShapeDescriptors/point_polygon_test/PointPolygonTestDemo.java) @include samples/java/tutorial\_code/ShapeDescriptors/point\_polygon\_test/PointPolygonTestDemo.java @end\_toggle

@add\_toggle\_python The code for this tutorial is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ShapeDescriptors/point_polygon_test/pointPolygonTest_demo.py) @include samples/python/tutorial\_code/ShapeDescriptors/point\_polygon\_test/pointPolygonTest\_demo.py @end\_toggle

## Explanation

## Result

Here it is:

## [Table Of Content Imgproc](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/table_of_content_imgproc/)


# Image Processing (imgproc module) {#tutorial\_table\_of\_content\_imgproc}

@tableofcontents

The imgproc module in OpenCV is a collection of per-pixel image operations (color conversions, filters), drawing functions (contours, objects, text), and geometric transformations (warping, resize) useful for computer vision. Here's an overview of the content in the imgproc module, categorized for easier navigation:

##### Basic

These tutorials cover fundamental image processing tasks, such as drawing on images, applying filters, and morphological operations.

-   @subpage tutorial\_basic\_geometric\_drawing
-   @subpage tutorial\_random\_generator\_and\_text
-   @subpage tutorial\_gausian\_median\_blur\_bilateral\_filter
-   @subpage tutorial\_erosion\_dilatation
-   @subpage tutorial\_opening\_closing\_hats
-   @subpage tutorial\_hitOrMiss
-   @subpage tutorial\_morph\_lines\_detection
-   @subpage tutorial\_pyramids
-   @subpage tutorial\_threshold
-   @subpage tutorial\_threshold\_inRange

##### Transformations

These tutorials explore more advanced transformations that modify the image in various ways, such as filtering, warping, and edge detection.

-   @subpage tutorial\_filter\_2d
-   @subpage tutorial\_copyMakeBorder
-   @subpage tutorial\_sobel\_derivatives
-   @subpage tutorial\_laplace\_operator
-   @subpage tutorial\_canny\_detector
-   @subpage tutorial\_hough\_lines
-   @subpage tutorial\_hough\_circle
-   @subpage tutorial\_generalized\_hough\_ballard\_guil
-   @subpage tutorial\_remap
-   @subpage tutorial\_warp\_affine

##### Histograms

Histograms are vital for image analysis, and these tutorials cover operations like equalization, comparison, and back projection.

-   @subpage tutorial\_histogram\_equalization
-   @subpage tutorial\_histogram\_calculation
-   @subpage tutorial\_histogram\_comparison
-   @subpage tutorial\_back\_projection
-   @subpage tutorial\_template\_matching

##### Contours

Contours are curves that represent the boundaries of objects in an image. These tutorials cover techniques to detect and analyze contours.

-   @subpage tutorial\_find\_contours
-   @subpage tutorial\_hull
-   @subpage tutorial\_bounding\_rects\_circles
-   @subpage tutorial\_bounding\_rotated\_ellipses
-   @subpage tutorial\_moments
-   @subpage tutorial\_point\_polygon\_test

##### Others

These tutorials cover specialized image processing techniques for more complex tasks like deblurring, noise removal, and image segmentation.

-   @subpage tutorial\_distance\_transform
-   @subpage tutorial\_out\_of\_focus\_deblur\_filter
-   @subpage tutorial\_motion\_deblur\_filter
-   @subpage tutorial\_anisotropic\_image\_segmentation\_by\_a\_gst
-   @subpage tutorial\_periodic\_noise\_removing\_filter

## [Table Of Contents Contours](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/table_of_contents_contours/)


# Contours in OpenCV {#tutorial\_table\_of\_contents\_contours}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_imgproc

## [Threshold InRange](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/threshold_inRange/threshold_inRange/)


# Thresholding Operations using inRange {#tutorial\_threshold\_inRange}

@tableofcontents

@prev\_tutorial{tutorial\_threshold} @next\_tutorial{tutorial\_filter\_2d}

|    |    |
| -: | :- |
| Original author | Lorena García |
| Compatibility | Rishiraj Surti |

## Goal

In this tutorial you will learn how to:

-   Perform basic thresholding operations using OpenCV @ref cv::inRange function.
-   Detect an object based on the range of pixel values in the HSV colorspace.

## Theory

-   In the previous tutorial, we learnt how to perform thresholding using @ref cv::threshold function.
-   In this tutorial, we will learn how to do it using @ref cv::inRange function.
-   The concept remains the same, but now we add a range of pixel values we need.
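
The range test can be sketched in plain Python. This is only an illustration of the rule that a pixel is kept when every channel lies between a lower and an upper bound (both inclusive); it is not the OpenCV implementation, and the function name `in_range` and the sample pixel values are hypothetical.

```python
def in_range(pixels, lower, upper):
    """Return 255 for pixels whose channels all lie in [lower, upper], else 0."""
    mask = []
    for px in pixels:
        # Every channel must satisfy lower <= value <= upper (inclusive bounds).
        inside = all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper))
        mask.append(255 if inside else 0)
    return mask


# Three hypothetical HSV pixels given as (H, S, V) tuples.
pixels = [(10, 200, 180), (100, 200, 180), (10, 40, 180)]
mask = in_range(pixels, lower=(0, 100, 100), upper=(20, 255, 255))
print(mask)  # [255, 0, 0]
```

Only the first pixel passes: the second has a hue outside the range and the third is not saturated enough.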

## HSV colorspace

[HSV](https://en.wikipedia.org/wiki/HSL_and_HSV) (hue, saturation, value) is a color model that, like the RGB color model, represents a colorspace. Since the hue channel models the color type, it is very useful in image processing tasks that need to segment objects based on their color. Saturation ranges from unsaturated (shades of gray) to fully saturated (no white component). The value channel describes the brightness or the intensity of the color. The next image shows the HSV cylinder.

Since colors in the RGB colorspace are coded using the three channels, it is more difficult to segment an object in the image based on its color.

The formulas used by the @ref cv::cvtColor function to convert from one colorspace to another are described in @ref imgproc\_color\_conversions
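
To see why hue is convenient for color segmentation, the sketch below converts two shades of red with Python's standard `colorsys` module. It is an illustration only, not a call into OpenCV: the helper name `rgb_to_hsv_8bit` is hypothetical, it takes RGB (not BGR) values, and it rescales the result to the 8-bit convention of hue in 0..179 and saturation/value in 0..255, so results may differ slightly from `cv::cvtColor` because of rounding.

```python
import colorsys


def rgb_to_hsv_8bit(r, g, b):
    """Convert 8-bit RGB to HSV scaled as H in 0..179, S and V in 0..255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


bright_red = rgb_to_hsv_8bit(255, 0, 0)
dark_red = rgb_to_hsv_8bit(128, 0, 0)
print(bright_red)  # (0, 255, 255)
print(dark_red)    # (0, 255, 128)
```

Both shades share the same hue and differ only in the value channel, so a single hue range selects them both, whereas in RGB the red channel itself changes with brightness.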

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgProc/Threshold_inRange.cpp) @include samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/threshold_inRange/ThresholdInRange.java) @include samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/threshold_inRange/threshold_inRange.py) @include samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py @end\_toggle

## Explanation

Let's check the general structure of the program:

-   Capture the video stream from default or supplied capturing device.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp cap @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java cap @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py cap @end\_toggle
    
-   Create a window to display the default frame and the threshold frame.
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp window @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java window @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py window @end\_toggle
    
-   Create the trackbars to set the range of HSV values
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp trackbar @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java trackbar @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py trackbar @end\_toggle
    
-   Until the user wants the program to exit, do the following:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp while @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java while @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py while @end\_toggle
    
-   Show the images
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp show @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java show @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py show @end\_toggle
    
-   For a trackbar which controls the lower range, say for example hue value:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp low @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java low @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py low @end\_toggle
    
-   For a trackbar which controls the upper range, say for example hue value:
    
    @add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold\_inRange.cpp high @end\_toggle
    
    @add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold\_inRange/ThresholdInRange.java high @end\_toggle
    
    @add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold\_inRange/threshold\_inRange.py high @end\_toggle
    
-   It is necessary to find the maximum and minimum value to avoid discrepancies such as the high value of threshold becoming less than the low value.
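
One way to keep the two trackbar values consistent is sketched below in plain Python. The callback names `on_low_h` and `on_high_h` are hypothetical, and the sketch assumes integer trackbar positions; it only shows the min/max clamping idea described in the last step.

```python
def on_low_h(low_h, high_h):
    """Clamp the low hue so that it stays strictly below the high hue."""
    return min(high_h - 1, low_h)


def on_high_h(low_h, high_h):
    """Clamp the high hue so that it stays strictly above the low hue."""
    return max(high_h, low_h + 1)


# The user drags the low slider past the high one: the low value is pulled back.
print(on_low_h(150, 100))   # 99
# The user drags the high slider below the low one: the high value is pushed up.
print(on_high_h(50, 30))    # 51
```

After each callback the clamped value would be written back to the trackbar, so the displayed position always matches the range actually used.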
    

## Results

-   After compiling this program, run it. The program will open two windows.
    
-   As you set the range values from the trackbar, the resulting frame will be visible in the other window.

## [Threshold](https://docharvest.github.io/docs/opencv5/tutorials/imgproc/threshold/threshold/)


# Basic Thresholding Operations {#tutorial\_threshold}

@tableofcontents

@prev\_tutorial{tutorial\_pyramids} @next\_tutorial{tutorial\_threshold\_inRange}

|    |    |
| -: | :- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Perform basic thresholding operations using OpenCV function @ref cv::threshold

## Cool Theory

@note The explanation below belongs to the book **Learning OpenCV** by Bradski and Kaehler.

### What is Thresholding?

-   The simplest segmentation method
    
-   Application example: Separate out regions of an image corresponding to objects which we want to analyze. This separation is based on the variation of intensity between the object pixels and the background pixels.
    
-   To differentiate the pixels we are interested in from the rest (which will eventually be rejected), we perform a comparison of each pixel intensity value with respect to a _threshold_ (determined according to the problem to solve).
    
-   Once we have properly separated the important pixels, we can set them to a chosen value to identify them (e.g. we can assign them a value of \\f$0\\f$ (black), \\f$255\\f$ (white), or any value that suits our needs).
    

### Types of Thresholding

-   OpenCV offers the function @ref cv::threshold to perform thresholding operations.
    
-   We can perform \\f$5\\f$ types of thresholding operations with this function. We will explain them in the following subsections.
    
-   To illustrate how these thresholding processes work, let's consider that we have a source image with pixels with intensity values \\f$src(x,y)\\f$. The plot below depicts this. The horizontal blue line represents the threshold \\f$thresh\\f$ (fixed).
    

#### Threshold Binary

-   This thresholding operation can be expressed as:
    
    \\f\[\\texttt{dst} (x,y) = \\fork{\\texttt{maxVal}}{if (\\texttt{src}(x,y) > \\texttt{thresh})}{0}{otherwise}\\f\]
    
-   So, if the intensity of the pixel \\f$src(x,y)\\f$ is higher than \\f$thresh\\f$, then the new pixel intensity is set to \\f$maxVal\\f$. Otherwise, the pixel is set to \\f$0\\f$.
    

#### Threshold Binary, Inverted

-   This thresholding operation can be expressed as:
    
    \\f\[\\texttt{dst} (x,y) = \\fork{0}{if (\\texttt{src}(x,y) > \\texttt{thresh})}{\\texttt{maxVal}}{otherwise}\\f\]
    
-   If the intensity of the pixel \\f$src(x,y)\\f$ is higher than \\f$thresh\\f$, then the new pixel intensity is set to \\f$0\\f$. Otherwise, it is set to \\f$maxVal\\f$.
    

#### Truncate

-   This thresholding operation can be expressed as:
    
    \\f\[\\texttt{dst} (x,y) = \\fork{\\texttt{thresh}}{if (\\texttt{src}(x,y) > \\texttt{thresh})}{\\texttt{src}(x,y)}{otherwise}\\f\]
    
-   The maximum intensity value for the pixels is \\f$thresh\\f$; if \\f$src(x,y)\\f$ is greater, its value is _truncated_ to \\f$thresh\\f$. See the figure below:
    

#### Threshold to Zero

-   This operation can be expressed as:
    
    \\f\[\\texttt{dst} (x,y) = \\fork{\\texttt{src}(x,y)}{if (\\texttt{src}(x,y) > \\texttt{thresh})}{0}{otherwise}\\f\]
    
-   If \\f$src(x,y)\\f$ is not greater than \\f$thresh\\f$, the new pixel value is set to \\f$0\\f$; otherwise the pixel keeps its original value.
    

#### Threshold to Zero, Inverted

-   This operation can be expressed as:
    
    \\f\[\\texttt{dst} (x,y) = \\fork{0}{if (\\texttt{src}(x,y) > \\texttt{thresh})}{\\texttt{src}(x,y)}{otherwise}\\f\]
    
-   If \\f$src(x,y)\\f$ is greater than \\f$thresh\\f$, the new pixel value is set to \\f$0\\f$; otherwise the pixel keeps its original value.
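
The five formulas above can be compared side by side with a small plain-Python sketch applied to one row of pixel intensities. It is not the OpenCV implementation: the function name `apply_threshold` and the string keys are hypothetical stand-ins for the `THRESH_BINARY`, `THRESH_BINARY_INV`, `THRESH_TRUNC`, `THRESH_TOZERO` and `THRESH_TOZERO_INV` types.

```python
def apply_threshold(row, thresh, max_val, kind):
    """Apply one of the five thresholding rules to a list of intensities."""
    rules = {
        "binary":     lambda s: max_val if s > thresh else 0,
        "binary_inv": lambda s: 0 if s > thresh else max_val,
        "trunc":      lambda s: thresh if s > thresh else s,
        "tozero":     lambda s: s if s > thresh else 0,
        "tozero_inv": lambda s: 0 if s > thresh else s,
    }
    rule = rules[kind]
    return [rule(s) for s in row]


row = [50, 127, 128, 200]
results = {kind: apply_threshold(row, 127, 255, kind)
           for kind in ("binary", "binary_inv", "trunc", "tozero", "tozero_inv")}
for kind, values in results.items():
    print(kind, values)
# binary [0, 0, 255, 255]
# binary_inv [255, 255, 0, 0]
# trunc [50, 127, 127, 127]
# tozero [0, 0, 128, 200]
# tozero_inv [50, 127, 0, 0]
```

Note that a pixel exactly equal to the threshold (127 here) falls in the "otherwise" branch, because every rule uses a strict `>` comparison.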
    

## Code

@add\_toggle\_cpp The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ImgProc/Threshold.cpp) @include samples/cpp/tutorial\_code/ImgProc/Threshold.cpp @end\_toggle

@add\_toggle\_java The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ImgProc/threshold/Threshold.java) @include samples/java/tutorial\_code/ImgProc/threshold/Threshold.java @end\_toggle

@add\_toggle\_python The tutorial code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/imgProc/threshold/threshold.py) @include samples/python/tutorial\_code/imgProc/threshold/threshold.py @end\_toggle

## Explanation

Let's check the general structure of the program:

-   Load an image. If it is BGR we convert it to Grayscale. For this, remember that we can use the function @ref cv::cvtColor :

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold.cpp load @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold/Threshold.java load @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold/threshold.py load @end\_toggle

-   Create a window to display the result

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold.cpp window @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold/Threshold.java window @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold/threshold.py window @end\_toggle

-   Create \\f$2\\f$ trackbars for user input:
    
    -   **Type of thresholding**: Binary, To Zero, etc...
    -   **Threshold value**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold.cpp trackbar @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold/Threshold.java trackbar @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold/threshold.py trackbar @end\_toggle

-   Wait until the user sets the threshold value and the type of thresholding (or until the program exits)
-   Whenever the user changes the value of any of the Trackbars, the function _Threshold\_Demo_ (_update_ in Java) is called:

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ImgProc/Threshold.cpp Threshold\_Demo @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ImgProc/threshold/Threshold.java Threshold\_Demo @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/imgProc/threshold/threshold.py Threshold\_Demo @end\_toggle

As you can see, the function @ref cv::threshold is invoked. We give \\f$5\\f$ parameters in C++ code:

-   _src\_gray_: Our input image
-   _dst_: Destination (output) image
-   _threshold\_value_: The \\f$thresh\\f$ value with respect to which the thresholding operation is made
-   _max\_BINARY\_value_: The value used with the Binary thresholding operations (to set the chosen pixels)
-   _threshold\_type_: One of the \\f$5\\f$ thresholding operations. They are listed in the comment section of the function above.

## Results

1.  After compiling this program, run it giving a path to an image as argument. For instance, for an input image as:

    ![](images/Threshold_Tutorial_Original_Image.jpg)

2.  First, we try to threshold our image with a _binary threshold inverted_. We expect that the pixels brighter than \\f$thresh\\f$ will turn dark, which is what actually happens, as we can see in the snapshot below. Notice in the original image that the doggie's tongue and eyes are particularly bright in comparison with the rest of the image; this is reflected in the output image.

    ![](images/Threshold_Tutorial_Result_Binary_Inverted.jpg)

3.  Now we try with the _threshold to zero_. With this, we expect that the darkest pixels (below the threshold) will become completely black, whereas the pixels with values greater than the threshold will keep their original values. This is verified by the following snapshot of the output image:

    ![](images/Threshold_Tutorial_Result_Zero.jpg)

## [Android Dev Intro](https://docharvest.github.io/docs/opencv5/tutorials/introduction/android_binary_package/android_dev_intro/)


# Introduction into Android Development {#tutorial\_android\_dev\_intro}

@prev\_tutorial{tutorial\_clojure\_dev\_intro} @next\_tutorial{tutorial\_dev\_with\_OCV\_on\_Android}

|    |    |
| -: | :- |
| Original author | Rostislav Vasilikhin |
| Compatibility | OpenCV >= 4.0 |

@tableofcontents

This guide was designed to help you in learning Android development basics and setting up your working environment quickly. It was tested with Ubuntu 22.04 and Windows 10.

If you encounter any error after thoroughly following these steps, feel free to contact us via OpenCV [Forum](https://forum.opencv.org). We'll do our best to help you out.

## Preface

Android is a Linux-based, open source mobile operating system developed by the Open Handset Alliance led by Google. See the [Android home site](http://www.android.com/about/) for general details.

Development for Android significantly differs from development for other platforms. So before starting programming for Android we recommend you make sure that you are familiar with the following key topics:

1.  The [Java](http://en.wikipedia.org/wiki/Java_\(programming_language\)) programming language, which is the primary development technology for Android OS. You may also find the [Oracle docs on Java](http://docs.oracle.com/javase/) useful.
2.  [Java Native Interface (JNI)](http://en.wikipedia.org/wiki/Java_Native_Interface), a technology for running native code in the Java virtual machine. You may also find the [Oracle docs on JNI](http://docs.oracle.com/javase/7/docs/technotes/guides/jni/) useful.
3.  [Android Activity](http://developer.android.com/training/basics/activity-lifecycle/starting.html) and its life-cycle, which is an essential Android API class.
4.  OpenCV development will certainly require some knowledge of the [Android Camera](http://developer.android.com/guide/topics/media/camera.html) specifics.

## Manual environment setup for Android development

In this tutorial we are going to use the official Android Studio IDE and a set of other freely available tools.

### Get tools and dependencies

Here's how to get a ready-to-work environment:

1.  Download and install Android Studio:
    
    -   Ubuntu:
        1.  Download Android Studio: [https://developer.android.com/studio](https://developer.android.com/studio)
        2.  Extract the tar.gz archive
        3.  Follow the instructions in `Install-Linux-tar.txt`: open `android-studio/bin` folder in terminal and run `./studio.sh`
        4.  Perform standard installation through GUI
        5.  Optionally you can add a shortcut on a desktop for a quick access by clicking menu _**Tools -> Create desktop entry**_. The menu appears after any project is created or opened.
    -   Windows: Just download Android Studio from the official site and run the installer.
2.  Install fresh Android SDK and NDK:
    
    1.  Open SDK manager in Android Studio (_**Customize -> All Settings -> Languages & Frameworks -> Android SDK**_)
    2.  Enable "Show Package Details" checkbox
    3.  Select the latest versions of the SDK and NDK and press OK
    4.  Make sure that your device supports the chosen SDK versions
3.  Install all the necessary packages for the build:
    
    -   `sudo apt install git cmake ninja-build openjdk-17-jdk openjdk-17-jre`
    -   the remaining required packages are dependencies and should be installed automatically

### Check OpenCV examples

1.  Download the OpenCV for Android SDK from the official [release page on Github](https://github.com/opencv/opencv/releases) or [SourceForge](https://sourceforge.net/projects/opencvlibrary/).
2.  Extract zip archive with your OS tools.
3.  Open the project `<YOUR_OPENCV_BUILD_FOLDER>/OpenCV-android-sdk/samples` in Android Studio.
4.  Connect your device
    -   Debugging should be enabled on the device; instructions for this are easy to find on the web
    -   Alternatively, you can use a virtual device that comes with Android Studio
5.  Choose a sample from the drop-down menu (for example, `15-puzzle`) and run it.

## Setup Device for Testing and Debugging

Usually the recipe above works as expected, but in some cases there are additional actions that must be performed. In this section we'll cover some cases.

### Windows host computer

If you have Windows 10 or higher then you don't have to do additional actions to connect a phone and run samples on it. However, earlier Windows versions require a longer procedure:

1.  Enable USB debugging on the Android device (via the Settings menu).
2.  Attach the Android device to your PC with a USB cable.
3.  Go to the Start Menu and **right-click** on Computer. Select Manage in the context menu. You may be asked for Administrative permissions.
4.  Select Device Manager in the left pane and find an unknown device in the list. You may try unplugging it and plugging it back in to check that it is your device that appears in the list.

    ![](images/usb_device_connect_01.png)

5.  Try your luck installing Google USB drivers without any modifications: **right-click** on the unknown device, select Properties menu item --> Details tab --> Update Driver button.

    ![](images/usb_device_connect_05.png)

6.  Select Browse computer for driver software.

    ![](images/usb_device_connect_06.png)

7.  Specify the path to the `<Android SDK folder>/extras/google/usb_driver/` folder.

    ![](images/usb_device_connect_07.png)

8.  If you are prompted to install unverified drivers and then see a report of success, the USB driver installation is finished.

    ![](images/usb_device_connect_08.png)

    ![](images/usb_device_connect_09.png)

9.  Otherwise (if you get a failure like the one shown below), follow the next steps.

    ![](images/usb_device_connect_12.png)

10. Again **right-click** on the unknown device, select Properties --> Details --> Hardware Ids and copy the line like `USB\VID_XXXX&PID_XXXX&MI_XX`.

    ![](images/usb_device_connect_02.png)

11. Now open the file `<Android SDK folder>/extras/google/usb_driver/android_winusb.inf`. Select either the Google.NTx86 or the Google.NTamd64 section depending on your host system architecture.

    ![](images/usb_device_connect_03.png)

12. There should be a record for your device like the existing ones; you need to add one manually.

    ![](images/usb_device_connect_04.png)

13. Save the `android_winusb.inf` file and try to install the USB driver again.

    ![](images/usb_device_connect_05.png)

    ![](images/usb_device_connect_06.png)

    ![](images/usb_device_connect_07.png)

14. This time the installation should succeed.

    ![](images/usb_device_connect_08.png)

    ![](images/usb_device_connect_09.png)

15. The unknown device is now recognized as an Android phone.

    ![](images/usb_device_connect_10.png)

16. A successful USB connection to the device can be verified in the console via the `adb devices` command.

    ![](images/usb_device_connect_11.png)

17. Now, in Eclipse, go to Run -> Run/Debug to run your application in regular or debugging mode. Device Chooser will let you choose among the devices.

### Linux host computer

While the latest Ubuntu versions work well with connected Android devices, there can be issues on older versions. However, most of them can be fixed easily. You have to create a new **/etc/udev/rules.d/51-android.rules** configuration file that contains information about your Android device. You may find some vendor IDs [here](http://developer.android.com/tools/device.html#VendorIds) or run the `lsusb` command to view the vendor ID of the plugged-in Android device. Here is an example of such a file for an LG device:

@code{.guess}
SUBSYSTEM=="usb", ATTR{idVendor}=="1004", MODE="0666", GROUP="plugdev"
@endcode

Then restart your adb server (or, even better, restart the system), plug in your Android device and run the `adb devices` command. You will see the list of attached devices:

```
savuor@rostislav-laptop:~/Android/Sdk/platform-tools$ ./adb devices
List of devices attached
R58MB40Q3VP     device

savuor@rostislav-laptop:~/Android/Sdk/platform-tools$
```
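
The udev rule described above can also be generated from a line of `lsusb` output. The following is a minimal sketch, assuming the usual `ID vendor:product` field in that output; the helper name `udev_rule_from_lsusb` and the sample line (including its product ID) are hypothetical.

```python
import re


def udev_rule_from_lsusb(line, group="plugdev", mode="0666"):
    """Build a udev rule for the vendor ID found in one line of lsusb output."""
    match = re.search(r"ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})", line)
    if match is None:
        raise ValueError("no USB ID found in: " + line)
    vendor = match.group(1).lower()
    return 'SUBSYSTEM=="usb", ATTR{idVendor}=="%s", MODE="%s", GROUP="%s"' % (
        vendor, mode, group)


# Hypothetical lsusb line for an LG device (vendor ID 1004).
sample = "Bus 001 Device 004: ID 1004:631c LG Electronics, Inc."
rule = udev_rule_from_lsusb(sample)
print(rule)
# SUBSYSTEM=="usb", ATTR{idVendor}=="1004", MODE="0666", GROUP="plugdev"
```

The printed line is what would go into the `51-android.rules` file; writing that file still requires root privileges.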

### Mac OS host computer

No actions are required: just connect your device via USB and run `adb devices` to check the connection.

## What's next

Now that you have your instance of the OpenCV4Android SDK set up and configured, you may want to proceed to using OpenCV in your own application. You can learn how to do that in a separate @ref tutorial\_dev\_with\_OCV\_on\_Android tutorial.

## [How to run deep networks on Android device {#tutorial_android_dnn_intro}](https://docharvest.github.io/docs/opencv5/tutorials/introduction/android_binary_package/android_dnn_intro/)


## [Android Ocl Intro](https://docharvest.github.io/docs/opencv5/tutorials/introduction/android_binary_package/android_ocl_intro/)


# Use OpenCL in Android camera preview based CV application {#tutorial\_android\_ocl\_intro}

@prev\_tutorial{tutorial\_android\_dnn\_intro} @next\_tutorial{tutorial\_macos\_install}

|    |    |
| -: | :- |
| Original author | Andrey Pavlenko, Alexander Panov |
| Compatibility | OpenCV >= 4.9 |

@tableofcontents

This guide was designed to help you use [OpenCL ™](https://www.khronos.org/opencl/) in an Android camera preview based CV application. The tutorial was written for [Android Studio](http://developer.android.com/tools/studio/index.html) 2022.2.1 and tested with Ubuntu 22.04.

This tutorial assumes you have the following installed and configured:

-   Android Studio (2022.2.1+)
-   JDK 17
-   Android SDK
-   Android NDK (25.2.9519653+)
-   OpenCV source code, downloaded from [github](git@github.com:opencv/opencv.git) or from [releases](https://opencv.org/releases/) and built following the [instructions on the wiki](https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build).

It also assumes that you are familiar with Android Java and JNI programming basics. If you need help with any of the above, you may refer to our @ref tutorial\_android\_dev\_intro guide.

This tutorial also assumes you have an Android device with OpenCL enabled.

The related source code is located within the OpenCV samples in the [opencv/samples/android/tutorial-4-opencl](https://github.com/opencv/opencv/tree/5.x/samples/android/tutorial-4-opencl/) directory.

## How to build custom OpenCV Android SDK with OpenCL

1.  **Assemble and configure Android OpenCL SDK.** The JNI part of the sample depends on the standard Khronos OpenCL headers, the C++ wrapper for OpenCL, and libOpenCL.so. The standard OpenCL headers may be copied from the 3rdparty directory in the OpenCV repository or from your Linux distribution package. The C++ wrapper is available in the [official Khronos repository on Github](https://github.com/KhronosGroup/OpenCL-CLHPP). Copy the header files to a dedicated directory in the following way:

    @code{.bash}
    cd your\_path/ && mkdir ANDROID\_OPENCL\_SDK && mkdir ANDROID\_OPENCL\_SDK/include && cd ANDROID\_OPENCL\_SDK/include
    cp -r path\_to\_opencv/opencv/3rdparty/include/opencl/1.2/CL . && cd CL
    wget https://github.com/KhronosGroup/OpenCL-CLHPP/raw/main/include/CL/opencl.hpp
    wget https://github.com/KhronosGroup/OpenCL-CLHPP/raw/main/include/CL/cl2.hpp
    @endcode

    libOpenCL.so may be provided with the BSP or just downloaded from any OpenCL-capable Android device with the relevant architecture.

    @code{.bash}
    cd your\_path/ANDROID\_OPENCL\_SDK && mkdir lib && cd lib
    adb pull /system/vendor/lib64/libOpenCL.so
    @endcode

    The system version of libOpenCL.so may have a lot of platform-specific dependencies. The `-Wl,--allow-shlib-undefined` flag lets the linker ignore third-party symbols that are not used during the build. The following CMake line links the JNI part against the standard OpenCL library but does not include the library in the application package. The system OpenCL API is used at run time.

    @code
    target\_link\_libraries(${target} -lOpenCL)
    @endcode
    
2.  **Build custom OpenCV Android SDK with OpenCL.** OpenCL support (T-API) is disabled in OpenCV builds for Android OS by default, but it is possible to rebuild OpenCV for Android locally with OpenCL/T-API enabled: use the `-DWITH_OPENCL=ON` option for CMake. You also need to specify the path to the Android OpenCL SDK: use the `-DANDROID_OPENCL_SDK=path_to_your_Android_OpenCL_SDK` option for CMake. If you are building OpenCV using `build_sdk.py`, please follow the [instructions on the wiki](https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build). Set these CMake parameters in your `.config.py`, e.g. `ndk-18-api-level-21.config.py`:

    @code{.py}
    ABI("3", "arm64-v8a", None, 21, cmake\_vars=dict(WITH\_OPENCL='ON', ANDROID\_OPENCL\_SDK='path\_to\_your\_Android\_OpenCL\_SDK'))
    @endcode

    If you are building OpenCV using cmake/ninja, use this bash script (set your NDK\_VERSION and your own paths instead of the example paths):

    @code{.bash}
    cd path\_to\_opencv && mkdir build && cd build
    export NDK\_VERSION=25.2.9519653
    export ANDROID\_SDK=/home/user/Android/Sdk/
    export ANDROID\_OPENCL\_SDK=/path\_to\_ANDROID\_OPENCL\_SDK/
    export ANDROID\_HOME=$ANDROID\_SDK
    export ANDROID\_NDK\_HOME=$ANDROID\_SDK/ndk/$NDK\_VERSION/
    cmake -GNinja -DCMAKE\_TOOLCHAIN\_FILE=$ANDROID\_NDK\_HOME/build/cmake/android.toolchain.cmake -DANDROID\_STL=c++\_shared -DANDROID\_NATIVE\_API\_LEVEL=24 -DANDROID\_SDK=$ANDROID\_SDK -DANDROID\_NDK=$ANDROID\_NDK\_HOME -DBUILD\_JAVA=ON -DANDROID\_HOME=$ANDROID\_SDK -DBUILD\_ANDROID\_EXAMPLES=ON -DINSTALL\_ANDROID\_EXAMPLES=ON -DANDROID\_ABI=arm64-v8a -DWITH\_OPENCL=ON -DANDROID\_OPENCL\_SDK=$ANDROID\_OPENCL\_SDK ..
    @endcode
    

## Preface

Using [GPGPU](https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units) via OpenCL to improve application performance is quite a modern trend now. Some CV algorithms (e.g. image filtering) run much faster on a GPU than on a CPU. Recently this has become possible on Android OS.

The most popular CV application scenario for an Android device is starting the camera in preview mode, applying some CV algorithm to every frame and displaying the preview frames modified by that algorithm.

Let's consider how we can use OpenCL in this scenario. In particular, let's try two ways: direct calls to the OpenCL API and the recently introduced OpenCV T-API (aka [Transparent API](https://docs.google.com/presentation/d/1qoa29N_B-s297-fp0-b3rBirvpzJQp8dCtllLQ4DVCY/present)), which provides implicit OpenCL acceleration of some OpenCV algorithms.

## Application structure

Starting with Android API level 11 (Android 3.0), the [Camera API](http://developer.android.com/reference/android/hardware/Camera.html) allows an OpenGL texture to be used as a target for preview frames. Android API level 21 brings a new [Camera2 API](http://developer.android.com/reference/android/hardware/camera2/package-summary.html) that provides much more control over the camera settings and usage modes; it allows several targets for preview frames, an OpenGL texture in particular.

Having a preview frame in an OpenGL texture is convenient for OpenCL because the [OpenGL-OpenCL Interoperability API (cl\_khr\_gl\_sharing)](https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_khr_gl_sharing.html) allows sharing OpenGL texture data with OpenCL functions without copying (with some restrictions, of course).

Let's create a base for our application that just configures the Android camera to send preview frames to an OpenGL texture and shows these frames on the display without any processing.

A minimal `Activity` class for that purpose looks like the following:

```java
public class Tutorial4Activity extends Activity {

    private MyGLSurfaceView mView;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        requestWindowFeature(Window.FEATURE_NO_TITLE);
        getWindow().setFlags(WindowManager.LayoutParams.FLAG_FULLSCREEN,
                WindowManager.LayoutParams.FLAG_FULLSCREEN);
        getWindow().setFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON,
                WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);
        setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_LANDSCAPE);

        mView = new MyGLSurfaceView(this);
        setContentView(mView);
    }

    @Override
    protected void onPause() {
        mView.onPause();
        super.onPause();
    }

    @Override
    protected void onResume() {
        super.onResume();
        mView.onResume();
    }
}
```

And a minimal `View` class respectively:

@snippet samples/android/tutorial-4-opencl/src/org/opencv/samples/tutorial4/MyGLSurfaceView.java minimal\_surface\_view

@note we use two renderer classes: one for legacy [Camera](http://developer.android.com/reference/android/hardware/Camera.html) API and another for modern [Camera2](http://developer.android.com/reference/android/hardware/camera2/package-summary.html).

A minimal `Renderer` class can be implemented in Java (OpenGL ES 2.0 is [available](http://developer.android.com/reference/android/opengl/GLES20.html) in Java), but since we are going to modify the preview texture with OpenCL, let's move the OpenGL stuff to JNI. Here is a simple Java wrapper for our JNI stuff:

@snippet samples/android/tutorial-4-opencl/src/org/opencv/samples/tutorial4/NativePart.java native\_part

Since `Camera` and `Camera2` APIs differ significantly in camera setup and control, let's create a base class for the two corresponding renderers:

```java
public abstract class MyGLRendererBase implements GLSurfaceView.Renderer, SurfaceTexture.OnFrameAvailableListener {
    protected final String LOGTAG = "MyGLRendererBase";

    protected SurfaceTexture mSTex;
    protected MyGLSurfaceView mView;

    protected boolean mGLInit = false;
    protected boolean mTexUpdate = false;

    MyGLRendererBase(MyGLSurfaceView view) {
        mView = view;
    }

    protected abstract void openCamera();
    protected abstract void closeCamera();
    protected abstract void setCameraPreviewSize(int width, int height);

    public void onResume() {
        Log.i(LOGTAG, "onResume");
    }

    public void onPause() {
        Log.i(LOGTAG, "onPause");
        mGLInit = false;
        mTexUpdate = false;
        closeCamera();
        if(mSTex != null) {
            mSTex.release();
            mSTex = null;
            NativeGLRenderer.closeGL();
        }
    }

    @Override
    public synchronized void onFrameAvailable(SurfaceTexture surfaceTexture) {
        //Log.i(LOGTAG, "onFrameAvailable");
        mTexUpdate = true;
        mView.requestRender();
    }

    @Override
    public void onDrawFrame(GL10 gl) {
        //Log.i(LOGTAG, "onDrawFrame");
        if (!mGLInit)
            return;

        synchronized (this) {
            if (mTexUpdate) {
                mSTex.updateTexImage();
                mTexUpdate = false;
            }
        }
        NativeGLRenderer.drawFrame();
    }

    @Override
    public void onSurfaceChanged(GL10 gl, int surfaceWidth, int surfaceHeight) {
        Log.i(LOGTAG, "onSurfaceChanged("+surfaceWidth+"x"+surfaceHeight+")");
        NativeGLRenderer.changeSize(surfaceWidth, surfaceHeight);
        setCameraPreviewSize(surfaceWidth, surfaceHeight);
    }

    @Override
    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        Log.i(LOGTAG, "onSurfaceCreated");
        String strGLVersion = GLES20.glGetString(GLES20.GL_VERSION);
        if (strGLVersion != null)
            Log.i(LOGTAG, "OpenGL ES version: " + strGLVersion);

        int hTex = NativeGLRenderer.initGL();
        mSTex = new SurfaceTexture(hTex);
        mSTex.setOnFrameAvailableListener(this);
        openCamera();
        mGLInit = true;
    }
}
```

As you can see, inheritors for the `Camera` and `Camera2` APIs should implement the following abstract methods:

@code{.java}
protected abstract void openCamera();
protected abstract void closeCamera();
protected abstract void setCameraPreviewSize(int width, int height);
@endcode

The details of their implementation are beyond the scope of this tutorial; please refer to the [source code](https://github.com/opencv/opencv/tree/5.x/samples/android/tutorial-4-opencl/) to see them.
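The `mTexUpdate` flag in `MyGLRendererBase` is a small handshake between the camera callback and the GL thread: `onFrameAvailable` only marks that a frame is waiting, and `onDrawFrame` latches it at most once per draw. The toy model below is a language-neutral sketch of that pattern in Python, not Android code. The class and method names are invented for illustration, and a counter stands in for the call to `updateTexImage()`.

```python
import threading


class FrameFlag:
    """Toy model of the mTexUpdate handshake (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tex_update = False
        self.texture_updates = 0  # counts simulated updateTexImage() calls

    def on_frame_available(self):
        # Camera side: mark that a new frame is waiting.
        with self._lock:
            self._tex_update = True

    def on_draw_frame(self):
        # GL side: latch the newest frame at most once per draw.
        with self._lock:
            if self._tex_update:
                self.texture_updates += 1
                self._tex_update = False


flag = FrameFlag()
flag.on_frame_available()
flag.on_frame_available()  # a second frame arrives before the next draw
flag.on_draw_frame()       # one texture update covers both notifications
flag.on_draw_frame()       # nothing new, so no update
print(flag.texture_updates)
```

In this model, frames that arrive faster than the renderer draws are coalesced, so the draw loop never performs more than one texture update per pass.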

## Preview Frames modification

The details of OpenGL ES 2.0 initialization are also too straightforward and noisy to be quoted here. The important point is that the OpenGL texture that serves as the camera preview target must be of type `GL_TEXTURE_EXTERNAL_OES` (not `GL_TEXTURE_2D`); internally it keeps picture data in _YUV_ format. That makes it impossible to share the texture via CL-GL interop (`cl_khr_gl_sharing`) or to access its pixel data from C/C++ code. To overcome this restriction we have to perform an OpenGL rendering from this texture to another regular `GL_TEXTURE_2D` one using a _FrameBuffer Object_ (aka FBO).

### C/C++ code

After that we can read (_copy_) pixel data from C/C++ via `glReadPixels()` and write them back to texture after modification via `glTexSubImage2D()`.

### Direct OpenCL calls

That `GL_TEXTURE_2D` texture can also be shared with OpenCL without copying, but the OpenCL context has to be created in a special way for that:

@snippet samples/android/tutorial-4-opencl/jni/CLprocessor.cpp init\_opencl

Then the texture can be wrapped by a `cl::ImageGL` object and processed via OpenCL calls:

@snippet samples/android/tutorial-4-opencl/jni/CLprocessor.cpp process\_pure\_opencl

### OpenCV T-API

Instead of writing OpenCL code yourself, you may want to use **OpenCV T-API**, which calls OpenCL implicitly. All you need is to pass the created OpenCL context to OpenCV (via `cv::ocl::attachContext()`) and somehow wrap the OpenGL texture with `cv::UMat`. Unfortunately, `UMat` keeps an OpenCL _buffer_ internally, which can't be wrapped over either an OpenGL _texture_ or an OpenCL _image_, so we have to copy image data here:

@snippet samples/android/tutorial-4-opencl/jni/CLprocessor.cpp process\_tapi

@note We have to make one more image data copy when placing back the modified image to the original OpenGL texture via OpenCL image wrapper.

## Performance notes

To compare the performance we measured FPS of the same preview frames modification (_Laplacian_) done by C/C++ code (call to `cv::Laplacian` with `cv::Mat`), by direct OpenCL calls (using OpenCL _images_ for input and output), and by OpenCV _T-API_ (call to `cv::Laplacian` with `cv::UMat`) on _Sony Xperia Z3_ with 720p camera resolution:

-   **C/C++ version** shows **3-4 fps**
-   **direct OpenCL calls** show **25-27 fps**
-   **OpenCV T-API** shows **11-13 fps** (due to extra copying from `cl_image` to `cl_buffer` and back)
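To put these figures side by side, the short calculation below turns the reported FPS ranges into approximate speedup factors relative to the C/C++ version. Taking the midpoint of each range is a simplification chosen for this sketch; it is not part of the original measurement.

```python
# FPS ranges reported above (Laplacian, 720p, Sony Xperia Z3).
fps = {
    "C/C++": (3, 4),
    "direct OpenCL": (25, 27),
    "OpenCV T-API": (11, 13),
}


def midpoint(fps_range):
    low, high = fps_range
    return (low + high) / 2


baseline = midpoint(fps["C/C++"])
speedup = {name: midpoint(r) / baseline for name, r in fps.items()}

for name, factor in speedup.items():
    print(f"{name}: {midpoint(fps[name]):.1f} fps, {factor:.1f}x vs C/C++")
```

By this rough estimate, direct OpenCL calls are about 7 times faster than the CPU path and T-API about 3 times faster, with the gap between the two attributable to the extra image-to-buffer copies noted above.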

## [Dev With OCV On Android](https://docharvest.github.io/docs/opencv5/tutorials/introduction/android_binary_package/dev_with_OCV_on_Android/)


# Android Development with OpenCV {#tutorial\_dev\_with\_OCV\_on\_Android}

@prev\_tutorial{tutorial\_android\_dev\_intro} @next\_tutorial{tutorial\_android\_dnn\_intro}

Original authors

Alexander Panov, Rostislav Vasilikhin

Compatibility

OpenCV >= 4.9.0

@tableofcontents

This tutorial has been created to help you use the OpenCV library within your Android project.

This guide was checked on Ubuntu but contains no platform-dependent parts, so it should be compatible with any OS supported by Android Studio and the OpenCV4Android SDK.

This tutorial assumes you have the following installed and configured:

-   Android Studio
-   JDK
-   Android SDK and NDK
-   Optional: OpenCV for Android SDK from the official [release page on GitHub](https://github.com/opencv/opencv/releases) or [SourceForge](https://sourceforge.net/projects/opencvlibrary/). Advanced: alternatively, the SDK may be built from source code by following the [instructions on the wiki](https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build).

If you need help with any of the above, you may refer to our @ref tutorial\_android\_dev\_intro guide.

If you encounter any error after thoroughly following these steps, feel free to contact us via the OpenCV [forum](https://forum.opencv.org). We'll do our best to help you out.

## Hello OpenCV sample with SDK

In this section we're gonna create a simple app that does nothing but load OpenCV. In the next section we'll extend it to support the camera.

In addition to these instructions you can use a video guide, for example [this one](https://www.youtube.com/watch?v=bR7lL886-uc&ab_channel=ProgrammingHut).

1.  Open Android Studio and create an empty project by choosing _**Empty Views Activity**_
    
2.  Setup the project:
    
    -   Choose _**Java**_ language
    -   Choose _**Groovy DSL**_ build configuration language
    -   Choose _**Minimum SDK**_ with a version number not lower than the one used for the OpenCV for Android build
        -   If you don't know it, you can find it in the file `OpenCV-android-sdk/sdk/build.gradle` at `android -> defaultConfig -> minSdkVersion`
    
3.  Click _**File -> New -> Import module...**_ and select OpenCV SDK path
    
4.  Set module name as `OpenCV` and press `Finish`
    
5.  OpenCV also provides experimental Kotlin support. Please add the Android Kotlin plugin to the `MyApplication/OpenCV/build.gradle` file:

    @code{.gradle}
    plugins {
        id 'org.jetbrains.kotlin.android' version '1.7.10' // version may differ for your setup
    }
    @endcode

    If you don't do this, you may get an error:

    @code
    Task failed with an exception.
    * Where: Build file '/home/alexander/AndroidStudioProjects/MyApplication/opencv/build.gradle' line: 4
    * What went wrong: A problem occurred evaluating project ':opencv'.
    > Plugin with id 'kotlin-android' not found.
    @endcode

    The fix was found [here](https://stackoverflow.com/questions/73225714/import-opencv-sdk-to-android-studio-chipmunk)

        
6.  The OpenCV project uses the `buildConfig` feature. Please enable it in the `android` block of the `MyApplication/OpenCV/build.gradle` file:

    @code{.gradle}
    buildFeatures {
        buildConfig true
    }
    @endcode

    If you don't do this, you may get an error:

    @code
    JavaCameraView.java:15: error: cannot find symbol
    import org.opencv.BuildConfig;
    ^
    symbol: class BuildConfig
    location: package org.opencv
    @endcode

    The fix was found [here](https://stackoverflow.com/questions/76374886/error-cannot-find-symbol-import-org-opencv-buildconfig-android-studio) and [here](https://forum.opencv.org/t/task-compiledebugjavawithjavac-failed/13667/4)
    
7.  Add the module to the project:
    
    -   Click _**File -> Project structure... -> Dependencies -> All modules -> + (Add Dependency button) -> Module dependency**_
    
    -   Choose `app`
    
    -   Select `OpenCV`
    
8.  Before using any OpenCV function you have to load the library first. If your application includes other OpenCV-dependent native libraries, you should load them _**after**_ OpenCV initialization. Add the following code to load the library at app start:

    @snippet samples/android/tutorial-1-camerapreview/src/org/opencv/samples/tutorial1/Tutorial1Activity.java ocv\_loader\_init
    
9.  Choose a device to check the sample on and run the code by pressing `run` button
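Step 2 points at `minSdkVersion` in the SDK's `build.gradle` as the lower bound for the project's minimum SDK. If you prefer to read that value with a script, a regular expression is usually enough. The sketch below assumes the value is written as a plain integer after `minSdkVersion` (or the shorter `minSdk`). The Gradle snippet embedded in it is a made-up sample, not the contents of the real file.

```python
import re


def read_min_sdk(gradle_text):
    """Return the first minSdkVersion / minSdk integer found, or None."""
    match = re.search(r"\bminSdk(?:Version)?\s*=?\s*(\d+)", gradle_text)
    return int(match.group(1)) if match else None


# Made-up sample; a real project would read
# OpenCV-android-sdk/sdk/build.gradle from disk instead.
sample = """
android {
    defaultConfig {
        minSdkVersion 21
        targetSdkVersion 34
    }
}
"""

print(read_min_sdk(sample))
```

A value computed from a variable or a version catalog would not be matched by this pattern; in that case, look the number up by hand as described in step 2.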
    

## Hello OpenCV sample with Maven Central

Since OpenCV 4.9.0, the OpenCV for Android package is available on Maven Central and may be installed automatically as a Gradle dependency. In this section we're gonna create a simple app that does nothing but load OpenCV from Maven Central.

1.  Open Android Studio and create an empty project by choosing _**Empty Views Activity**_
    
2.  Setup the project:
    
    -   Choose _**Java**_ language
    -   Choose _**Groovy DSL**_ build configuration language
    -   Choose _**Minimum SDK**_ with a version number not lower than the one OpenCV supports. For 4.9.0 the minimum SDK version is 21.
    
3.  Edit `build.gradle` and add the OpenCV library to the dependencies list like this:

    @code{.gradle}
    dependencies {
        implementation 'org.opencv:opencv:4.9.0'
    }
    @endcode

    `4.9.0` may be replaced by any version available as an [official release](https://central.sonatype.com/artifact/org.opencv/opencv).

4.  Before using any OpenCV function you have to load the library first. If your application includes other OpenCV-dependent native libraries, you should load them _**after**_ OpenCV initialization. Add the following code to load the library at app start:

    @snippet samples/android/tutorial-1-camerapreview/src/org/opencv/samples/tutorial1/Tutorial1Activity.java ocv\_loader\_init
    
5.  Choose a device to check the sample on and run the code by pressing `run` button
    

## Camera view sample

In this section we'll extend the empty OpenCV app created in the previous section to support the camera. We'll take camera frames and display them on the screen.

1.  Tell the system that we need camera permissions. Add the following code to the file `MyApplication/app/src/main/AndroidManifest.xml`:

    @snippet samples/android/tutorial-1-camerapreview/gradle/AndroidManifest.xml camera\_permissions

2.  Go to the `activity_main.xml` layout and delete the TextView with the text "Hello World!"
    
    This can also be done in Code or Split mode by removing the `TextView` block from XML file.
    
3.  Add camera view to the layout:
    
    1.  Add the `opencv` XML namespace to the layout description: @code{.xml} xmlns:opencv="http://schemas.android.com/apk/res-auto" @endcode
        
    2.  Replace `TextView` with `org.opencv.android.JavaCameraView` widget: @snippet /samples/android/tutorial-1-camerapreview/res/layout/tutorial1\_surface\_view.xml camera\_view
        
    3.  If you get a layout warning, replace the `fill_parent` values with `match_parent` for the `android:layout_width` and `android:layout_height` properties
        
    
    You'll get code like this:
    
    @include /samples/android/tutorial-1-camerapreview/res/layout/tutorial1\_surface\_view.xml
    
4.  Inherit the main class from `org.opencv.android.CameraActivity`. CameraActivity implements the camera permission request and some other utilities needed for a CV application. The methods we're interested in overriding are `onCreate`, `onDestroy`, `onPause`, `onResume` and `getCameraViewList`
    
5.  Implement the interface `org.opencv.android.CameraBridgeViewBase.CvCameraViewListener2`. Its `onCameraFrame` method should return the `Mat` object with the content to render. The sample just returns the camera frame for preview: `return inputFrame.rgba();`
    
6.  Allocate `org.opencv.android.CameraBridgeViewBase` object:
    
    -   It should be created at app start (`onCreate` method) and this class should be set as a listener
    -   At pause/resume (`onPause`, `onResume` methods) it should be disabled/enabled
    -   Should be disabled at app finish (`onDestroy` method)
    -   Should be returned in `getCameraViewList`
7.  Optionally you can forbid the phone to dim screen or lock:
    
    @snippet samples/android/tutorial-1-camerapreview/src/org/opencv/samples/tutorial1/Tutorial1Activity.java keep\_screen
    

Finally you'll get source code similar to this:

@include samples/android/tutorial-1-camerapreview/src/org/opencv/samples/tutorial1/Tutorial1Activity.java

This is it! Now you can run the code on your device to check it.

## Let's discuss some most important steps

Every Android application with a UI must implement an Activity and a View. In the first steps we create a blank activity and a default view layout. The simplest OpenCV-centric application must perform OpenCV initialization, create a view to show the camera preview, and implement the `CvCameraViewListener2` interface to get frames from the camera and process them.

First of all we create our application view using an XML layout. Our layout consists of a single full-screen component of class `org.opencv.android.JavaCameraView`. This OpenCV class is inherited from `CameraBridgeViewBase`, which extends `SurfaceView` and uses the standard Android camera API under the hood.

The `CvCameraViewListener2` interface lets you add some processing steps after the frame is grabbed from the camera and before it's rendered on the screen. The most important method is `onCameraFrame`. This is a callback function that is called when a frame is retrieved from the camera. The view expects `onCameraFrame` to return an RGBA frame, which will be drawn on the screen.

The callback passes a frame from the camera to our class as an object of the `CvCameraViewFrame` class. This object has `rgba()` and `gray()` methods that let the user get a color or a one-channel grayscale frame as a `Mat` object.

@note Do not save or use `CvCameraViewFrame` object out of `onCameraFrame` callback. This object does not have its own state and its behavior outside the callback is unpredictable!

## [Building Fastcv](https://docharvest.github.io/docs/opencv5/tutorials/introduction/building_fastcv/building_fastcv/)


# Building OpenCV with FastCV {#tutorial\_building\_fastcv}

Compatibility

OpenCV >= 4.11.0

## Enable OpenCV with FastCV for Qualcomm Chipsets

The scope of this document is to guide developers in enabling OpenCV acceleration with FastCV on Qualcomm chipsets with ARM64 architecture. Enablement of OpenCV with the FastCV back-end on non-Qualcomm chipsets or on Linux platforms other than [Qualcomm Linux](https://www.qualcomm.com/developer/software/qualcomm-linux) is currently out of scope.

## About FastCV

FastCV provides two main features to computer vision application developers:

-   A library of frequently used computer vision (CV) functions, optimized to run efficiently on a wide variety of Qualcomm’s Snapdragon devices.
-   A clean processor-agnostic hardware acceleration API, under which chipset vendors can hardware accelerate FastCV functions on Qualcomm’s Snapdragon hardware.

FastCV is released as a unified binary, a single binary containing two implementations of the algorithms:

-   Generic implementation runs on Arm® architecture and is referred to as FastCV for Arm architecture.
-   Implementation runs only on Qualcomm® Snapdragon™ chipsets and is called FastCV for Snapdragon.

FastCV library is Qualcomm proprietary and provides faster implementation of CV algorithms on various hardware as compared to other CV libraries.

## OpenCV Acceleration with FastCV HAL and Extensions

OpenCV and FastCV integration is implemented in two ways:

1.  FastCV-based HAL for basic computer vision and arithmetic algorithms acceleration.
2.  FastCV module in opencv\_contrib with custom algorithms and FastCV function wrappers that do not fit generic OpenCV interface or behaviour.

## Supported Platforms

1.  Android: Qualcomm chipsets running Android, from Snapdragon 8 Gen 1 onwards ([https://www.qualcomm.com/products/mobile/snapdragon/smartphones#product-list](https://www.qualcomm.com/products/mobile/snapdragon/smartphones#product-list))
2.  Linux : Qualcomm Linux Program related boards mentioned in [Hardware](https://www.qualcomm.com/developer/software/qualcomm-linux/hardware)

## Compiling OpenCV with FastCV for Android

1.  **Follow the wiki page for OpenCV compilation**: [https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build](https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build)

Once the OpenCV repository is cloned into the workspace, add the `-DWITH_FASTCV=ON` flag to the CMake vars of the arm64 entry in `opencv/platforms/android/default.config.py` as shown below, or create a new config with this option, to enable compilation of the FastCV HAL and/or extensions:

```
 ABI("3", "arm64-v8a", None, 24, cmake_vars=dict(WITH_FASTCV='ON')),
```

2.  Follow the remaining steps as described on [the wiki page](https://github.com/opencv/opencv/wiki/Custom-OpenCV-Android-SDK-and-AAR-package-build)
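If you choose to create a new config file rather than edit `default.config.py`, it can be as small as a single `ABIs` list. The sketch below shows such a file's body. The `ABI` class here is a simplified stand-in written only so that the example runs on its own; in a real config the name is expected to come from `build_sdk.py`. The file name `fastcv.config.py` is likewise hypothetical.

```python
class ABI:
    """Simplified stand-in for the ABI helper used in build_sdk.py configs."""

    def __init__(self, platform_id, name, toolchain, ndk_api_level=None, cmake_vars=None):
        self.platform_id = platform_id
        self.name = name
        self.toolchain = toolchain
        self.ndk_api_level = ndk_api_level
        self.cmake_vars = dict(cmake_vars or {})


# Body of a hypothetical fastcv.config.py: one arm64 entry with FastCV enabled.
ABIs = [
    ABI("3", "arm64-v8a", None, 24, cmake_vars=dict(WITH_FASTCV='ON')),
]

for abi in ABIs:
    print(abi.name, abi.ndk_api_level, abi.cmake_vars)
```

Only the `ABIs` list would go into the actual config file. Keeping a single arm64 entry matches the scope of this guide, which targets ARM64 Qualcomm chipsets.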

## Compiling OpenCV with FastCV for Qualcomm Linux

@note: Only Ubuntu 22.04 is supported as host platform for eSDK deployment.

1.  Install eSDK by following [Qualcomm® Linux Documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-70017-51/install-sdk.html)
    
2.  After installing the eSDK, set the ESDK\_ROOT:
    

```
export ESDK_ROOT=<eSDK install location>
```

3.  Add SDK tools and libraries to your environment:

```
source environment-setup-armv8-2a-qcom-linux
```

If you encounter the following message:

```
Your environment is misconfigured, you probably need to 'unset LD_LIBRARY_PATH'
but please check why this was set in the first place and that it's safe to unset.
The SDK will not operate correctly in most cases when LD_LIBRARY_PATH is set.
```

just unset your host `LD_LIBRARY_PATH` environment variable: `unset LD_LIBRARY_PATH`.

4.  Clone OpenCV Repositories:

Clone the OpenCV main and optionally opencv\_contrib repositories into any directory (it does not need to be inside the SDK directory).

```
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
```

5.  Build OpenCV

Create a build directory, navigate into it and build the project with CMake there:

```
mkdir build
cd build
cmake -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DWITH_FASTCV=ON -DBUILD_SHARED_LIBS=ON -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules/fastcv/ ../opencv
make -j$(nproc)
```

If the FastCV library is updated, please replace the old FastCV libraries located at:

```
<ESDK_PATH>/qcom-wayland_sdk/tmp/sysroots/qcs6490-rb3gen2-vision-kit/usr/lib
```

with the latest FastCV libraries downloaded to:

```
build/3rdparty/fastcv/libs
```

6.  Validate

Push the OpenCV libraries, test binaries and test data to the target, then execute the OpenCV conformance or performance tests. If libwebp.so.7 is reported missing at run time, find the library at the path below and push it to the target:

```
<ESDK_PATH>/qcom-wayland_sdk/tmp/sysroots/qcs6490-rb3gen2-vision-kit/usr/lib/libwebp.so.7
```

## HAL and Extension list of APIs

**FastCV based OpenCV HAL APIs list :**

| OpenCV module | OpenCV API | Underlying FastCV API for OpenCV acceleration |
| --- | --- | --- |
| IMGPROC | medianBlur | fcvFilterMedian3x3u8\_v3 |
| IMGPROC | sobel | fcvFilterSobel3x3u8s16, fcvFilterSobel5x5u8s16, fcvFilterSobel7x7u8s16 |
| IMGPROC | boxFilter | fcvBoxFilter3x3u8\_v3, fcvBoxFilter5x5u8\_v2, fcvBoxFilterNxNf32 |
| IMGPROC | adaptiveThreshold | fcvAdaptiveThresholdGaussian3x3u8\_v2, fcvAdaptiveThresholdGaussian5x5u8\_v2, fcvAdaptiveThresholdMean3x3u8\_v2, fcvAdaptiveThresholdMean5x5u8\_v2 |
| IMGPROC | pyrDown | fcvPyramidCreateu8\_v4 |
| IMGPROC | cvtColor | fcvColorRGB888toYCrCbu8\_v3, fcvColorRGB888ToHSV888u8 |
| IMGPROC | gaussianBlur | fcvFilterGaussian5x5u8\_v3, fcvFilterGaussian3x3u8\_v4 |
| IMGPROC | warpPerspective | fcvWarpPerspectiveu8\_v5 |
| IMGPROC | Canny | fcvFilterCannyu8 |
| CORE | lut | fcvTableLookupu8 |
| CORE | norm | fcvHammingDistanceu8 |
| CORE | multiply | fcvElementMultiplyu8u16\_v2 |
| CORE | transpose | fcvTransposeu8\_v2, fcvTransposeu16\_v2, fcvTransposef32\_v2 |
| CORE | meanStdDev | fcvImageIntensityStats\_v2 |
| CORE | flip | fcvFlipu8, fcvFlipu16, fcvFlipRGB888u8 |
| CORE | rotate | fcvRotateImageu8, fcvRotateImageInterleavedu8 |
| CORE | multiply | fcvElementMultiplyu8, fcvElementMultiplys16, fcvElementMultiplyf32 |
| CORE | addWeighted | fcvAddWeightedu8\_v2 |
| CORE | subtract | fcvImageDiffu8f32\_v2 |
| CORE | SVD & solve | fcvSVDf32\_v2 |
| CORE | gemm | fcvMatrixMultiplyf32\_v2, fcvMultiplyScalarf32, fcvAddf32\_v2 |

**FastCV based OpenCV Extensions APIs list :**

These OpenCV extension APIs are implemented under the **cv::fastcv** namespace.

| OpenCV Extension APIs | Underlying FastCV API for OpenCV acceleration |
| --- | --- |
| matmuls8s32 | fcvMatrixMultiplys8s32 |
| clusterEuclidean | fcvClusterEuclideanu8 |
| FAST10 | fcvCornerFast10InMaskScoreu8, fcvCornerFast10InMasku8, fcvCornerFast10Scoreu8, fcvCornerFast10u8 |
| FFT | fcvFFTu8 |
| IFFT | fcvIFFTf32 |
| fillConvexPoly | fcvFillConvexPolyu8 |
| houghLines | fcvHoughLineu8 |
| moments | fcvImageMomentsu8, fcvImageMomentss32, fcvImageMomentsf32 |
| runMSER | fcvMserInit, fcvMserNN8Init, fcvMserExtu8\_v3, fcvMserExtNN8u8, fcvMserNN8u8, fcvMserRelease |
| remap | fcvRemapu8\_v2 |
| remapRGBA | fcvRemapRGBA8888BLu8, fcvRemapRGBA8888NNu8 |
| resizeDown | fcvScaleDownBy2u8\_v2, fcvScaleDownBy4u8\_v2, fcvScaleDownMNInterleaveu8, fcvScaleDownMNu8 |
| meanShift | fcvMeanShiftu8, fcvMeanShifts32, fcvMeanShiftf32 |
| bilateralRecursive | fcvBilateralFilterRecursiveu8 |
| thresholdRange | fcvFilterThresholdRangeu8\_v2 |
| bilateralFilter | fcvBilateralFilter5x5u8\_v3, fcvBilateralFilter7x7u8\_v3, fcvBilateralFilter9x9u8\_v3 |
| calcHist | fcvImageIntensityHistogram |
| gaussianBlur | fcvFilterGaussian3x3u8\_v4, fcvFilterGaussian5x5u8\_v3, fcvFilterGaussian5x5s16\_v3, fcvFilterGaussian5x5s32\_v3, fcvFilterGaussian11x11u8\_v2 |
| filter2D | fcvFilterCorrNxNu8, fcvFilterCorrNxNu8s16, fcvFilterCorrNxNu8f32 |
| sepFilter2D | fcvFilterCorrSepMxNu8, fcvFilterCorrSep9x9s16\_v2, fcvFilterCorrSep11x11s16\_v2, fcvFilterCorrSep13x13s16\_v2, fcvFilterCorrSep15x15s16\_v2, fcvFilterCorrSep17x17s16\_v2, fcvFilterCorrSepNxNs16 |
| sobel3x3u8 | fcvImageGradientSobelPlanars8\_v2 |
| sobel3x3u8 | fcvImageGradientSobelPlanars16\_v2 |
| sobel3x3u8 | fcvImageGradientSobelPlanars16\_v3 |
| sobel3x3u8 | fcvImageGradientSobelPlanarf32\_v2 |
| sobel3x3u8 | fcvImageGradientSobelPlanarf32\_v3 |
| sobel | fcvFilterSobel3x3u8\_v2, fcvFilterSobel3x3u8s16, fcvFilterSobel5x5u8s16, fcvFilterSobel7x7u8s16 |
| DCT | fcvDCTu8 |
| iDCT | fcvIDCTs16 |
| sobelPyramid | fcvPyramidAllocate, fcvPyramidAllocate\_v2, fcvPyramidAllocate\_v3, fcvPyramidSobelGradientCreatei8, fcvPyramidSobelGradientCreatei16, fcvPyramidSobelGradientCreatef32, fcvPyramidDelete, fcvPyramidDelete\_v2, fcvPyramidCreatef32\_v2, fcvPyramidCreateu8\_v4 |
| trackOpticalFlowLK | fcvTrackLKOpticalFlowu8\_v3, fcvTrackLKOpticalFlowu8 |
| warpPerspective2Plane | fcv2PlaneWarpPerspectiveu8 |
| warpPerspective | fcvWarpPerspectiveu8\_v5 |
| arithmetic\_op | fcvAddu8, fcvAdds16\_v2, fcvAddf32, fcvSubtractu8, fcvSubtracts16 |
| integrateYUV | fcvIntegrateImageYCbCr420PseudoPlanaru8 |
| normalizeLocalBox | fcvNormalizeLocalBoxu8, fcvNormalizeLocalBoxf32 |
| merge | fcvChannelCombine2Planesu8, fcvChannelCombine3Planesu8, fcvChannelCombine4Planesu8 |
| split | fcvDeinterleaveu8, fcvChannelExtractu8 |
| warpAffine | fcvTransformAffineu8\_v2, fcvTransformAffineClippedu8\_v3, fcv3ChannelTransformAffineClippedBCu8 |

**FastCV QDSP based OpenCV Extension APIs list :**

These OpenCV extension APIs are implemented under the **cv::fastcv::dsp** namespace. This namespace provides optimized implementations that leverage QDSP (**Qualcomm's Digital Signal Processor**) acceleration using FastCV's Q-suffixed APIs. These functions require DSP initialization (fcvQ6Init).

| OpenCV Extension APIs | Underlying FastCV API for OpenCV acceleration |
| --- | --- |
| filter2D | fcvFilterCorr3x3s8\_v2Q, fcvFilterCorrNxNu8Q, fcvFilterCorrNxNu8s16Q, fcvFilterCorrNxNu8f32Q |
| FFT | fcvFFTu8Q |
| IFFT | fcvIFFTf32Q |
| fcvdspinit | fcvQ6Init |
| fcvdspdeinit | fcvQ6DeInit |
| Canny | fcvFilterCannyu8Q |
| sumOfAbsoluteDiffs | fcvSumOfAbsoluteDiffs8x8u8\_v2Q |
| thresholdOtsu | fcvFilterThresholdOtsuu8Q |

**How to Use FastCV QDSP based OpenCV Extension APIs**

This section outlines the essential steps required to use OpenCV Extension APIs that are accelerated using FastCV on QDSP (**Qualcomm's Digital Signal Processor**).

1.  Initialize QDSP:
    
    -   Call **cv::fastcv::dsp::fcvdspinit()** to initialize the QDSP.
2.  Allocate memory using **Qualcomm's memory allocator** for all buffers that are passed to the OpenCV extension API:
    
    -   Use **cv::fastcv::getQcAllocator()** to assign the allocator to the buffers.
    -   Example: `cv::Mat src; src.allocator = cv::fastcv::getQcAllocator();` sets Qualcomm's memory allocator.  
        After that, any buffer created using methods like `src.create(...)`, `cv::imread(...)`, etc. will have its memory allocated by Qualcomm's memory allocator.
3.  Call the OpenCV extension API from `cv::fastcv::dsp`:
    
    -   Example: **cv::fastcv::dsp::thresholdOtsu(src, dst, binaryType);** where `src` and `dst` are `cv::Mat` objects that use Qualcomm's memory allocator, and `binaryType` is a boolean indicating the thresholding mode.
4.  Deinitialize QDSP:
    
    -   Call **cv::fastcv::dsp::fcvdspdeinit()** to deinitialize the QDSP.

**Reference Example**: For a working test case that uses the OpenCV Extension APIs, see [opencv\_contrib/modules/fastcv/test/test\_thresh\_dsp.cpp](https://github.com/opencv/opencv_contrib/blob/4.x/modules/fastcv/test/test_thresh_dsp.cpp) in the opencv\_contrib repository.

## [Building Tegra Cuda](https://docharvest.github.io/docs/opencv5/tutorials/introduction/building_tegra_cuda/building_tegra_cuda/)


# Building OpenCV for Tegra with CUDA {#tutorial\_building\_tegra\_cuda}

@prev\_tutorial{tutorial\_arm\_crosscompile\_with\_cmake} @next\_tutorial{tutorial\_display\_image}

Original author

Randy J. Ray

Compatibility

OpenCV >= 3.1.0

@warning This tutorial is deprecated.

@tableofcontents

## OpenCV with CUDA for Tegra

This document is a basic guide to building the OpenCV libraries with CUDA support for use in the Tegra environment. It covers the basic elements of building the version 3.1.0 libraries from source code for three (3) different types of platforms:

-   NVIDIA DRIVE™ PX 2 (V4L)
-   NVIDIA® Tegra® Linux Driver Package (L4T)
-   Desktop Linux (Ubuntu 14.04 LTS and 16.04 LTS)

This document is not an exhaustive guide to all of the options available when building OpenCV. Specifically, it covers the basic options used when building each platform but does not cover any options that are not needed (or are unchanged from their default values). Additionally, the installation of the CUDA toolkit is not covered here.

This document is focused on building the 3.1.0 version of OpenCV, but the guidelines here may also work for building from the master branch of the git repository. There are differences in some of the CMake options for builds of the 2.4.13 version of OpenCV, which are summarized below in the @ref tutorial\_building\_tegra\_cuda\_opencv\_24X section.

Most of the configuration commands are based on the system having CUDA 8.0 installed. In the case of the Jetson TK1, an older CUDA is used because 8.0 is not supported for that platform. These instructions may also work with older versions of CUDA, but are only tested with 8.0.

### A Note on Native Compilation vs. Cross-Compilation

The OpenCV build system supports native compilation for all the supported platforms, as well as cross-compilation for platforms such as ARM and others. The native compilation process is simpler, whereas the cross-compilation is generally faster.

At the present time, this document focuses only on native compilation.

## Getting the Source Code {#tutorial\_building\_tegra\_cuda\_getting\_the\_code}

There are two (2) ways to get the OpenCV source code:

-   Direct download from the [OpenCV downloads](https://opencv.org/releases) page
-   Cloning the git repositories hosted on [GitHub](https://github.com/opencv)

For this guide, the focus is on using the git repositories. This is because the 3.1.0 version of OpenCV will not build with CUDA 8.0 without applying a few small upstream changes from the git repository.

### OpenCV

Start with the `opencv` repository:

```
# Clone the opencv repository locally:
$ git clone https://github.com/opencv/opencv.git
```

To build the 3.1.0 version (as opposed to building the most-recent source), you must check out a branch based on the `3.1.0` tag:

```
$ cd opencv
$ git checkout -b v3.1.0 3.1.0
```

**Note:** This operation creates a new local branch in your clone's repository.

There are some upstream changes that must be applied via the `git cherry-pick` command. The first of these is to apply a fix for building specifically with the 8.0 version of CUDA that was not part of the 3.1.0 release:

```
# While still in the opencv directory:
$ git cherry-pick 10896
```

You will see the following output from the command:

```
[v3.1.0 d6d69a7] GraphCut deprecated in CUDA 7.5 and removed in 8.0
 Author: Vladislav Vinogradov <vlad.vinogradov@itseez.com>
 1 file changed, 2 insertions(+), 1 deletion(-)
```

Secondly, there is a fix for a CMake macro call that is problematic on some systems:

```
$ git cherry-pick cdb9c
```

You should see output similar to:

```
[v3.1.0-28613 e5ac2e4] gpu samples: fix REMOVE_ITEM error
 Author: Alexander Alekhin <alexander.alekhin@itseez.com>
 1 file changed, 1 insertion(+), 1 deletion(-)
```

The last upstream fix that is needed deals with the `pkg-config` configuration file that is bundled with the developer package (`libopencv-dev`):

```
$ git cherry-pick 24dbb
```

You should see output similar to:

```
[v3.1.0 3a6d7ab] pkg-config: modules list contains only OpenCV modules (fixes #5852)
 Author: Alexander Alekhin <alexander.alekhin@itseez.com>
 1 file changed, 7 insertions(+), 4 deletions(-)
```

At this point, the `opencv` repository is ready for building.

### OpenCV Extra

The `opencv_extra` repository contains extra data for the OpenCV library, including the data files used by the tests and demos. It must be cloned separately:

```
# In the same base directory from which you cloned OpenCV:
$ git clone https://github.com/opencv/opencv_extra.git
```

As with the OpenCV source, you must use the same method as above to set the source tree to the 3.1.0 version. When you are building from a specific tag, both repositories must be checked out at that tag.

```
$ cd opencv_extra
$ git checkout -b v3.1.0 3.1.0
```

You may opt to not fetch this repository if you do not plan on running the tests or installing the test-data along with the samples and example programs. If it is not referenced in the invocation of CMake, it will not be used.

**Note:** If you plan to run the tests, some tests expect the data to be present and will fail without it.

## Preparation and Prerequisites {#tutorial\_building\_tegra\_cuda\_preparation}

To build OpenCV, you need a directory to create the configuration and build the libraries. You also need a number of 3rd-party libraries upon which OpenCV depends.

### Prerequisites for Ubuntu Linux

These are the basic requirements for building OpenCV for Tegra on Linux:

-   CMake 2.8.10 or newer
-   CUDA toolkit 8.0 (7.0 or 7.5 may also be used)
-   Build tools (make, gcc, g++)
-   Python 2.6 or greater

These are the same regardless of the platform (DRIVE PX 2, Desktop, etc.).

A number of development packages are required for building on Linux:

-   libglew-dev
-   libtiff5-dev
-   zlib1g-dev
-   libjpeg-dev
-   libpng12-dev
-   libjasper-dev
-   libavcodec-dev
-   libavformat-dev
-   libavutil-dev
-   libpostproc-dev
-   libswscale-dev
-   libeigen3-dev
-   libtbb-dev
-   libgtk2.0-dev
-   pkg-config

Some of the packages above are in the `universe` repository for Ubuntu Linux systems. If you have not already enabled that repository, you need to do the following before trying to install all of the packages listed above:

```
$ sudo apt-add-repository universe
$ sudo apt-get update
```

The following command can be pasted into a shell in order to install the required packages:

```
$ sudo apt-get install \
    libglew-dev \
    libtiff5-dev \
    zlib1g-dev \
    libjpeg-dev \
    libpng12-dev \
    libjasper-dev \
    libavcodec-dev \
    libavformat-dev \
    libavutil-dev \
    libpostproc-dev \
    libswscale-dev \
    libeigen3-dev \
    libtbb-dev \
    libgtk2.0-dev \
    pkg-config
```

(Line-breaks and continuation characters are added for readability.)

If you want the Python bindings to be built, you will also need the appropriate packages for either or both of Python 2 and Python 3:

-   python-dev / python3-dev
-   python-numpy / python3-numpy
-   python-py / python3-py
-   python-pytest / python3-pytest

The commands that will do this:

```
$ sudo apt-get install python-dev python-numpy python-py python-pytest
# And, optionally:
$ sudo apt-get install python3-dev python3-numpy python3-py python3-pytest
```

Once all the necessary packages are installed, you can configure the build.
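Before configuring, it can help to confirm that the basic build tools are actually on your `PATH`. The following is a minimal sketch; the function name `missing_tools` is hypothetical and not part of OpenCV, and it only checks that a command resolves, not which version is installed.

```shell
# Print the name of every command in the argument list that is not on PATH.
# `missing_tools` is an illustrative helper name, not an OpenCV tool.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools cmake make gcc g++ pkg-config` prints nothing when all five commands are found, and otherwise prints one missing name per line.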

### Preparing the Build Area

Software projects that use the CMake system for configuring their builds expect the actual builds to be done outside of the source tree itself. For configuring and building OpenCV, create a directory called "build" in the same base directory into which you cloned the git repositories:

```
$ mkdir build
$ cd build
```

You are now ready to configure and build OpenCV.

## Configuring OpenCV for Building {#tutorial\_building\_tegra\_cuda\_configuring}

The CMake configuration options given below for the different platforms are targeted towards the functionality needed for Tegra. They are based on the original configuration options used for building OpenCV 2.4.13.

The build of OpenCV is configured with CMake. If run with no parameters, it detects what it needs to know about your system. However, it may have difficulty finding the CUDA files if they are not in a standard location, and it may try to build some options that you might otherwise not want included, so the following invocations of CMake are recommended.

In each `cmake` command listed in the following sub-sections, line-breaks and indentation are added for readability. Continuation characters are also added in examples for Linux-based platforms, allowing you to copy and paste the examples directly into a shell. When entering these commands by hand, enter the command and options as a single line. For a detailed explanation of the parameters passed to `cmake`, see the "CMake Parameter Reference" section.

For the Linux-based platforms, the shown value for the `CMAKE_INSTALL_PREFIX` parameter is `/usr`. You can set this to whatever you want, based on the layout of your system.

In each of the `cmake` invocations below, the last parameter, `OPENCV_TEST_DATA_PATH`, tells the build system where to find the test-data that is provided by the `opencv_extra` repository. When this is included, a `make install` installs this test-data alongside the libraries and example code, and a `make test` automatically provides this path to the tests that have to load data from it. If you did not clone the `opencv_extra` repository, do not include this parameter.
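Because `OPENCV_TEST_DATA_PATH` should be passed only when the `opencv_extra` test data is present, a small wrapper can decide this automatically. This is a sketch under the assumption that you script your configuration step; the helper name `test_data_arg` is made up for illustration.

```shell
# Emit the OPENCV_TEST_DATA_PATH option only if the directory exists.
# `test_data_arg` is a hypothetical helper name used for illustration.
test_data_arg() {
    if [ -d "$1" ]; then
        printf '%s\n' "-DOPENCV_TEST_DATA_PATH=$1"
    fi
}
```

In a script, the result can be spliced into the command line as `$(test_data_arg ../opencv_extra/testdata)` just before the final `../opencv` argument. Note that an unquoted substitution like this assumes the path contains no spaces.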

### Vibrante V4L Configuration

Supported platform: Drive PX 2

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_python2=ON \
    -DBUILD_opencv_python3=OFF \
    -DENABLE_NEON=ON \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN=6.2 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=OFF \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

The configuration provided above builds the Python bindings for Python 2 (but not Python 3) as part of the build process. If you want the Python 3 bindings (or do not want the Python 2 bindings), change the values of `BUILD_opencv_python2` and/or `BUILD_opencv_python3` as needed. To enable bindings, set the value to `ON`; to disable them, set it to `OFF`. For example, to disable the Python 2 bindings:

```
-DBUILD_opencv_python2=OFF
```

### Jetson L4T Configuration

Supported platforms:

-   Jetson TK1
-   Jetson TX1

Configuration is slightly different for the Jetson TK1 and the Jetson TX1 systems.

#### Jetson TK1

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_CXX_FLAGS=-Wa,-mimplicit-it=thumb \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_python2=ON \
    -DBUILD_opencv_python3=OFF \
    -DENABLE_NEON=ON \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-6.5 \
    -DCUDA_ARCH_BIN=3.2 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=OFF \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

**Note:** This uses CUDA 6.5, not 8.0.

#### Jetson TX1

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_python2=ON \
    -DBUILD_opencv_python3=OFF \
    -DENABLE_PRECOMPILED_HEADERS=OFF \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN=5.3 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=OFF \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

**Note:** This configuration does not set the `ENABLE_NEON` parameter.

### Ubuntu Desktop Linux Configuration

Supported platforms:

-   Ubuntu Desktop Linux 14.04 LTS
-   Ubuntu Desktop Linux 16.04 LTS

The configuration options given to `cmake` below are targeted towards the functionality needed for Tegra. For a desktop system, you may wish to adjust some options to enable (or disable) certain features. The features enabled below are based on the building of OpenCV 2.4.13.

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_python2=ON \
    -DBUILD_opencv_python3=OFF \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN='3.0 3.5 5.0 6.0 6.2' \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=OFF \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

This configuration is nearly identical to that for V4L and L4T, except that the `CUDA_ARCH_BIN` parameter specifies multiple architectures so as to support a variety of GPU boards. For a desktop, you have the option of omitting this parameter, and CMake will instead run a small test program that probes for the supported architectures. However, the libraries produced might not work on Ubuntu systems with different cards.

As with previous examples, the configuration given above builds the Python bindings for Python 2 (but not Python 3) as part of the build process.
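The CUDA-related values are the main platform-specific difference among the configurations above. As a rough sketch, they can be collected in one place as shown here. The platform labels (`px2`, `tk1`, `tx1`, `desktop`) and both function names are assumptions made for this example; the values themselves are the ones used in the invocations above.

```shell
# Map a platform label to the CUDA_ARCH_BIN value used in this guide.
# The labels px2, tk1, tx1 and desktop are invented for this sketch.
cuda_arch_bin() {
    case "$1" in
        px2)     echo "6.2" ;;
        tk1)     echo "3.2" ;;
        tx1)     echo "5.3" ;;
        desktop) echo "3.0 3.5 5.0 6.0 6.2" ;;
        *)       echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Map a platform label to the CUDA toolkit directory used in this guide.
# Only the Jetson TK1 uses CUDA 6.5; the other platforms use 8.0.
cuda_toolkit_dir() {
    case "$1" in
        tk1)             echo "/usr/local/cuda-6.5" ;;
        px2|tx1|desktop) echo "/usr/local/cuda-8.0" ;;
        *)               echo "unknown platform: $1" >&2; return 1 ;;
    esac
}
```

A configuration script could then pass `-DCUDA_ARCH_BIN="$(cuda_arch_bin tx1)"` and `-DCUDA_TOOLKIT_ROOT_DIR="$(cuda_toolkit_dir tx1)"` instead of hard-coding the values.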

## Building OpenCV {#tutorial\_building\_tegra\_cuda\_building}

Once `cmake` finishes configuring OpenCV, building is done using the standard `make` utility.

### Building with `make`

The only parameter that is needed for the invocation of `make` is the `-j` parameter for specifying how many parallel threads to use. This varies depending on the system and how much memory is available, other running processes, etc. The following table offers suggested values for this parameter:

| Platform | Suggested value | Notes |
| --- | --- | --- |
| DRIVE PX 2 | 6 | |
| Jetson TK1 | 3 | If the build fails due to a compiler-related error, try again with a smaller number of threads. Also consider rebooting the system if it has been running for a long time since the last reboot. |
| Jetson TX1 | 4 | |
| Ubuntu Desktop | 7 | The actual value will vary with the number of cores you have and the amount of physical memory. Because of the resource requirements of compiling the CUDA code, it is not recommended to go above 7. |

Based on the value you select, build (assuming you selected 6):

```
$ make -j6
```
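If you prefer to derive the `-j` value rather than pick it by hand, one rough heuristic is to use the number of cores but never exceed 7, the ceiling recommended above for CUDA builds. This is only a sketch: the function name `suggest_jobs` is hypothetical, it relies on the `nproc` utility from GNU coreutils when no count is given, and the per-board suggestions above should take precedence where they are lower.

```shell
# Suggest a value for make's -j option: the core count, capped at 7.
# `suggest_jobs` is an illustrative name; pass a core count as the first
# argument to override detection via nproc.
suggest_jobs() {
    cores="${1:-$(nproc)}"
    if [ "$cores" -gt 7 ]; then
        echo 7
    else
        echo "$cores"
    fi
}
```

With this in place, `make -j"$(suggest_jobs)"` would start the build with the derived value.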

By default, CMake hides the details of the build steps. If you need to see more detail about each compilation unit, etc., you can enable verbose output:

```
$ make -j6 VERBOSE=1
```

## Testing OpenCV {#tutorial\_building\_tegra\_cuda\_testing}

Once the build completes successfully, you have the option of running the extensive set of tests that OpenCV provides. If you did not clone the `opencv_extra` repository and specify the path to `testdata` in the `cmake` invocation, then testing is not recommended.

### Testing under Linux

To run the basic tests under Linux, execute:

```
$ make test
```

This executes `ctest` to carry out the tests, as specified in CTest syntax within the OpenCV repository. The `ctest` harness takes many different parameters (too many to list here; see the CTest manual page for the full set). To pass any of them, specify them in a `make` command-line parameter called `ARGS`:

```
$ make test ARGS="--verbose --parallel 3"
```

In this example, there are two (2) arguments passed to `ctest`: `--verbose` and `--parallel 3`. The first argument causes the output from `ctest` to be more detailed, and the second causes `ctest` to run as many as three (3) tests in parallel. As with choosing a thread count for building, base any choice for testing on the available number of processor cores, physical memory, etc. Some of the tests do attempt to allocate significant amounts of memory.

#### Known Issues with Tests

At present, not all of the tests in the OpenCV test suite pass. There are tests that fail whether or not CUDA is compiled, and there are tests that are only specific to CUDA that also do not currently pass.

**Note:** There are no tests that pass without CUDA but fail only when CUDA is included.

As the full lists of failing tests vary based on platform, it is impractical to list them here.

## Installing OpenCV {#tutorial\_building\_tegra\_cuda\_installing}

Installing OpenCV is very straightforward. For the Linux-based platforms, the command is:

```
$ make install
```

Depending on the chosen installation location, you may need root privilege to install.
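Whether root privilege is needed comes down to whether the installation prefix is writable by the current user. A minimal sketch of that decision follows; the helper name `install_command` is made up for this example, and it assumes the prefix directory already exists (a prefix that does not exist yet is treated as needing `sudo`).

```shell
# Print the install command suited to the given prefix: plain `make install`
# if the current user can write to it, `sudo make install` otherwise.
# `install_command` is a hypothetical helper name.
install_command() {
    if [ -w "$1" ]; then
        echo "make install"
    else
        echo "sudo make install"
    fi
}
```

For the `/usr` prefix used in this guide, an unprivileged user would typically get `sudo make install`.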

## Building OpenCV 2.4.X {#tutorial\_building\_tegra\_cuda\_opencv\_24X}

If you wish to build your own copy of OpenCV 2.4, there are only a few adjustments that must be made. At the time of this writing, the latest version on the 2.4 tree is 2.4.13. These instructions may work for later versions of 2.4, though they have not been tested for any earlier versions.

**Note:** The 2.4.X OpenCV source does not have the extra modules and code for Tegra that were upstreamed into the 3.X versions of OpenCV. This part of the guide is only for cases where you want to build a vanilla version of OpenCV 2.4.

### Selecting the 2.4 Source

First you must select the correct source branch or tag. If you want a specific version such as 2.4.13, you want to make a local branch based on the tag, as was done with the 3.1.0 tag above:

```
# Within the opencv directory:
$ git checkout -b v2.4.13 2.4.13

# Within the opencv_extra directory:
$ git checkout -b v2.4.13 2.4.13
```

If you simply want the newest code from the 2.4 line of OpenCV, there is a `2.4` branch already in the repository. You can check that out instead of a specific tag:

```
$ git checkout 2.4
```

There is no need for the `git cherry-pick` commands used with 3.1.0 when building the 2.4.13 source.

### Configuring

Configuring is done with CMake as before. The primary difference is that OpenCV 2.4 only provides Python bindings for Python 2, and thus does not distinguish between Python 2 and Python 3 in the CMake parameters. There is only one parameter, `BUILD_opencv_python`. In addition, there is a build-related parameter that controls features in 2.4 that are not in 3.1.0. This parameter is `BUILD_opencv_nonfree`.

Configuration still takes place in a separate directory that must be a sibling to the `opencv` and `opencv_extra` directories.
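The difference in Python-related parameters between the two release lines can be sketched as a small helper that selects the options from a version string. The function name `python_binding_flags` is an assumption for this example, and the emitted values simply mirror the defaults used in this guide (Python 2 bindings on, Python 3 bindings off).

```shell
# Print the Python-binding CMake options for an OpenCV version string.
# `python_binding_flags` is a hypothetical helper name.
python_binding_flags() {
    case "$1" in
        # The 2.4 line has a single switch, for Python 2 only.
        2.4*) printf '%s\n' "-DBUILD_opencv_python=ON" ;;
        # 3.x distinguishes between Python 2 and Python 3 bindings.
        *)    printf '%s\n' "-DBUILD_opencv_python2=ON -DBUILD_opencv_python3=OFF" ;;
    esac
}
```

For instance, `python_binding_flags 2.4.13` yields the single `BUILD_opencv_python` option, while `python_binding_flags 3.1.0` yields the pair of options used in the 3.1.0 configurations.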

#### Configuring Vibrante V4L

For DRIVE PX 2:

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_nonfree=OFF \
    -DBUILD_opencv_python=ON \
    -DENABLE_NEON=ON \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN=6.2 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=ON \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

#### Configuring Jetson L4T

For Jetson TK1:

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_nonfree=OFF \
    -DBUILD_opencv_python=ON \
    -DENABLE_NEON=ON \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-6.5 \
    -DCUDA_ARCH_BIN=3.2 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=ON \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

For Jetson TX1:

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_nonfree=OFF \
    -DBUILD_opencv_python=ON \
    -DENABLE_PRECOMPILED_HEADERS=OFF \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN=5.3 \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=ON \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

#### Configuring Desktop Ubuntu Linux

For both 14.04 LTS and 16.04 LTS:

```
$ cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DBUILD_PNG=OFF \
    -DBUILD_TIFF=OFF \
    -DBUILD_TBB=OFF \
    -DBUILD_JPEG=OFF \
    -DBUILD_JASPER=OFF \
    -DBUILD_ZLIB=OFF \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_JAVA=OFF \
    -DBUILD_opencv_nonfree=OFF \
    -DBUILD_opencv_python=ON \
    -DWITH_OPENCL=OFF \
    -DWITH_OPENMP=OFF \
    -DWITH_FFMPEG=ON \
    -DWITH_GSTREAMER=OFF \
    -DWITH_GSTREAMER_0_10=OFF \
    -DWITH_CUDA=ON \
    -DWITH_GTK=ON \
    -DWITH_VTK=OFF \
    -DWITH_TBB=ON \
    -DWITH_1394=OFF \
    -DWITH_OPENEXR=OFF \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
    -DCUDA_ARCH_BIN='3.0 3.5 5.0 6.0 6.2' \
    -DCUDA_ARCH_PTX="" \
    -DINSTALL_C_EXAMPLES=ON \
    -DINSTALL_TESTS=ON \
    -DOPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
    ../opencv
```

### Building, Testing and Installing

Once configured, the steps of building, testing, and installing are the same as above for the 3.1.0 source.

## CMake Parameter Reference {#tutorial\_building\_tegra\_cuda\_parameter\_reference}

The following is a table of all the parameters passed to CMake in the recommended invocations above. Some of these are parameters from CMake itself, while most are specific to OpenCV.

| Parameter | Our Default Value | What It Does | Notes |
| --- | --- | --- | --- |
| BUILD\_EXAMPLES | ON | Governs whether the C/C++ examples are built | |
| BUILD\_JASPER | OFF | Governs whether the Jasper library (`libjasper`) is built from source in the `3rdparty` directory | |
| BUILD\_JPEG | OFF | As above, for `libjpeg` | |
| BUILD\_PNG | OFF | As above, for `libpng` | |
| BUILD\_TBB | OFF | As above, for `tbb` | |
| BUILD\_TIFF | OFF | As above, for `libtiff` | |
| BUILD\_ZLIB | OFF | As above, for `zlib` | |
| BUILD\_JAVA | OFF | Controls the building of the Java bindings for OpenCV | Building the Java bindings requires OpenCV libraries be built for static linking only |
| BUILD\_opencv\_nonfree | OFF | Controls the building of non-free (non-open-source) elements | Used only for building 2.4.X |
| BUILD\_opencv\_python | ON | Controls the building of the Python 2 bindings in OpenCV 2.4.X | Used only for building 2.4.X |
| BUILD\_opencv\_python2 | ON | Controls the building of the Python 2 bindings in OpenCV 3.1.0 | Not used in 2.4.X |
| BUILD\_opencv\_python3 | OFF | Controls the building of the Python 3 bindings in OpenCV 3.1.0 | Not used in 2.4.X |
| CMAKE\_BUILD\_TYPE | Release | Selects the type of build (release vs. development) | Is generally either `Release` or `Debug` |
| CMAKE\_INSTALL\_PREFIX | /usr | Sets the root for installation of the libraries and header files | |
| CUDA\_ARCH\_BIN | varies | Sets the CUDA architecture(s) for which code is compiled | Usually only passed for platforms with known specific cards. OpenCV includes a small program that determines the architectures of the system's installed card if you do not pass this parameter. Here, for Ubuntu desktop, the value is a list to maximize card support. |
| CUDA\_ARCH\_PTX | "" | Builds PTX intermediate code for the specified virtual PTX architectures | |
| CUDA\_TOOLKIT\_ROOT\_DIR | /usr/local/cuda-8.0 (for Linux) | Specifies the location of the CUDA include files and libraries | |
| ENABLE\_NEON | ON | Enables the use of NEON SIMD extensions for ARM chips | Only passed for builds on ARM platforms |
| ENABLE\_PRECOMPILED\_HEADERS | OFF | Enables/disables support for pre-compiled headers | Only specified on some of the ARM platforms |
| INSTALL\_C\_EXAMPLES | ON | Enables the installation of the C example files as part of `make install` | |
| INSTALL\_TESTS | ON | Enables the installation of the tests as part of `make install` | |
| OPENCV\_TEST\_DATA\_PATH | ../opencv\_extra/testdata | Path to the `testdata` directory in the `opencv_extra` repository | |
| WITH\_1394 | OFF | Specifies whether to include IEEE-1394 support | |
| WITH\_CUDA | ON | Specifies whether to include CUDA support | |
| WITH\_FFMPEG | ON | Specifies whether to include FFMPEG support | |
| WITH\_GSTREAMER | OFF | Specifies whether to include GStreamer 1.0 support | |
| WITH\_GSTREAMER\_0\_10 | OFF | Specifies whether to include GStreamer 0.10 support | |
| WITH\_GTK | ON | Specifies whether to include GTK 2.0 support | Only given on Linux platforms, not Microsoft Windows |
| WITH\_OPENCL | OFF | Specifies whether to include OpenCL runtime support | |
| WITH\_OPENEXR | OFF | Specifies whether to include ILM support via OpenEXR | |
| WITH\_OPENMP | OFF | Specifies whether to include OpenMP runtime support | |
| WITH\_TBB | ON | Specifies whether to include Intel TBB support | |
| WITH\_VTK | OFF | Specifies whether to include VTK support | |

Copyright © 2016, NVIDIA CORPORATION. All rights reserved.

## [Clojure Dev Intro](https://docharvest.github.io/docs/opencv5/tutorials/introduction/clojure_dev_intro/clojure_dev_intro/)

# Introduction to OpenCV Development with Clojure {#tutorial\_clojure\_dev\_intro}

@prev\_tutorial{tutorial\_java\_eclipse} @next\_tutorial{tutorial\_android\_dev\_intro}

|    |    |
| -: | :- |
| Original author | Mimmo Cosenza |
| Compatibility | OpenCV >= 3.0 |

@tableofcontents

@warning This tutorial can contain obsolete information.

As of OpenCV 2.4.4, OpenCV supports desktop Java development using nearly the same interface as for Android development.

[Clojure](http://clojure.org/) is a contemporary LISP dialect hosted by the Java Virtual Machine, and it offers complete interoperability with the underlying JVM. This means that we should even be able to use the Clojure REPL (Read Eval Print Loop) as an interactive programmable interface to the underlying OpenCV engine.

## What we'll do in this tutorial

This tutorial will help you set up a basic Clojure environment for interactively learning OpenCV within the fully programmable Clojure REPL.

### Tutorial source code

You can find runnable source code for the sample in the `samples/java/clojure/simple-sample` folder of the OpenCV repository. After installing OpenCV and Clojure as explained in this tutorial, issue the following commands to run the sample from the command line.
@code{.bash}
cd path/to/samples/java/clojure/simple-sample
lein run
@endcode

## Preamble

For detailed instruction on installing OpenCV with desktop Java support refer to the @ref tutorial\_java\_dev\_intro "corresponding tutorial".

If you are in a hurry, here is a minimal quick start guide to installing OpenCV on Mac OS X:

@note I'm assuming you have already installed [Xcode](https://developer.apple.com/xcode/), [JDK](http://www.oracle.com/technetwork/java/javase/downloads/index.html) and [CMake](http://www.cmake.org/cmake/resources/software.html).

@code{.bash}
cd ~/
mkdir opt
cd opt
git clone https://github.com/opencv/opencv.git
cd opencv
git checkout 2.4
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=OFF ..
...
...
make -j8
# optional
# make install
@endcode

## Install Leiningen

Once you have installed OpenCV with desktop Java support, the only other requirement is to install [Leiningen](https://github.com/technomancy/leiningen), which lets you manage the entire life cycle of your CLJ projects.

The available [installation guide](https://github.com/technomancy/leiningen#installation) is easy to follow:

-# [Download the script](https://raw.github.com/technomancy/leiningen/stable/bin/lein)
-# Place it on your $PATH (~/bin is a good choice if it is on your path).
-# Set the script to be executable (e.g. chmod 755 ~/bin/lein).

If you work on Windows, follow [these instructions](https://github.com/technomancy/leiningen#windows).

You now have both the OpenCV library and a fully installed basic Clojure environment. What is now needed is to configure the Clojure environment to interact with the OpenCV library.

## Install the localrepo Leiningen plugin

The set of commands (tasks in Leiningen parlance) natively supported by Leiningen can be easily extended by various plugins. One of them is the [lein-localrepo](https://github.com/kumarshantanu/lein-localrepo) plugin, which lets you install any jar lib as an artifact in the local maven repository of your machine (typically in the ~/.m2/repository directory of your username).

We're going to use this lein plugin to add to the local maven repository the opencv components needed by Java and Clojure to use the opencv lib.

Generally speaking, if you want to use a plugin on a per-project basis only, it can be added directly to a CLJ project created by lein.

Instead, when you want a plugin to be available to any CLJ project in your username space, you can add it to the profiles.clj in the ~/.lein/ directory.

The lein-localrepo plugin will be useful to me in other CLJ projects where I need to call native libs wrapped by a Java interface, so I have decided to make it available to any CLJ project:
@code{.bash}
mkdir ~/.lein
@endcode
Create a file named profiles.clj in the ~/.lein directory and copy into it the following content:
@code{.clojure}
{:user {:plugins [[lein-localrepo "0.5.2"]]}}
@endcode
Here we're saying that release "0.5.2" of the lein-localrepo plugin will be available to the :user profile for any CLJ project created by lein.

You do not need to do anything else to install the plugin because it will be automatically downloaded from a remote repository the very first time you issue any lein task.

## Install the java specific libs as local repository

If you followed the standard documentation for installing OpenCV on your computer, you should find the following two libs under the directory where you built OpenCV:

-   the build/bin/opencv-247.jar java lib
-   the build/lib/libopencv\_java247.dylib native lib (or .so if you built OpenCV on a GNU/Linux OS)

They are the only opencv libs needed by the JVM to interact with OpenCV.

### Take apart the needed opencv libs

Create a new directory in which to store the above two libs. Start by copying the opencv-247.jar lib into it.
@code{.bash}
cd ~/opt
mkdir clj-opencv
cd clj-opencv
cp ~/opt/opencv/build/bin/opencv-247.jar .
@endcode
First lib done.

Now, to be able to add the libopencv\_java247.dylib shared native lib to the local maven repository, we first need to package it as a jar file.

The native lib has to be copied into a directory layout that mimics the names of your operating system and architecture. I'm using Mac OS X on a 64-bit x86 architecture, so my layout will be the following:
@code{.bash}
mkdir -p native/macosx/x86_64
@endcode
Copy the libopencv\_java247.dylib lib into the x86\_64 directory.
@code{.bash}
cp ~/opt/opencv/build/lib/libopencv_java247.dylib native/macosx/x86_64/
@endcode
If you're running OpenCV on a different OS/architecture pair, here is a summary of the mappings you can choose from.
@code{.bash}
OS

Mac OS X -> macosx
Windows  -> windows
Linux    -> linux
SunOS    -> solaris

Architectures

amd64    -> x86_64
x86_64   -> x86_64
x86      -> x86
i386     -> x86
arm      -> arm
sparc    -> sparc
@endcode
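The mapping above can also be expressed as a small shell helper that turns an OS name and an architecture name into the matching `native/...` directory. This is only a sketch: the function name `native_dir` is hypothetical, and it accepts exactly the labels listed in the mapping (for example `"Mac OS X"` and `amd64`), not whatever `uname` happens to report on your system.

```shell
# Translate the OS and architecture labels from the mapping above into the
# directory layout expected for the native lib.
# `native_dir` is a hypothetical helper name used for illustration.
native_dir() {
    case "$1" in
        "Mac OS X") os=macosx ;;
        Windows)    os=windows ;;
        Linux)      os=linux ;;
        SunOS)      os=solaris ;;
        *)          echo "unsupported OS: $1" >&2; return 1 ;;
    esac
    case "$2" in
        amd64|x86_64) arch=x86_64 ;;
        x86|i386)     arch=x86 ;;
        arm)          arch=arm ;;
        sparc)        arch=sparc ;;
        *)            echo "unsupported architecture: $2" >&2; return 1 ;;
    esac
    printf 'native/%s/%s\n' "$os" "$arch"
}
```

For example, `mkdir -p "$(native_dir Linux amd64)"` would create `native/linux/x86_64`.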

### Package the native lib as a jar

Next you need to package the native lib in a jar file by using the jar command to create a new jar file from a directory.
@code{.bash}
jar -cMf opencv-native-247.jar native
@endcode
Note that the M option instructs the jar command not to create a MANIFEST file for the artifact.

Your directory layout should look like the following:
@code{.bash}
tree
.
|__ native
|   |__ macosx
|       |__ x86_64
|           |__ libopencv_java247.dylib
|
|__ opencv-247.jar
|__ opencv-native-247.jar

3 directories, 3 files
@endcode

### Locally install the jars

We are now ready to add the two jars as artifacts to the local maven repository with the help of the lein-localrepo plugin.
@code{.bash}
lein localrepo install opencv-247.jar opencv/opencv 2.4.7
@endcode
Here the localrepo install task creates the 2.4.7 release of the opencv/opencv maven artifact from the opencv-247.jar lib and then installs it into the local maven repository. The opencv/opencv artifact will then be available to any maven-compliant project (Leiningen is internally based on maven).

Do the same thing with the native lib previously wrapped in a new jar file. @code{.bash} lein localrepo install opencv-native-247.jar opencv/opencv-native 2.4.7 @endcode Note that the groupId, opencv, of the two artifacts is the same. We are now ready to create a new CLJ project to start interacting with OpenCV.

### Create a project

Create a new CLJ project by using the lein new task from the terminal.
@code{.bash}
# cd in the directory where you work with your development projects (e.g. ~/devel)

lein new simple-sample
Generating a project called simple-sample based on the 'default' template.
To see other templates (app, lein plugin, etc), try `lein help new`.
@endcode
The above task creates the following simple-sample directory layout:
@code{.bash}
tree simple-sample/
simple-sample/
|\_\_ LICENSE
|\_\_ README.md
|\_\_ doc
|   |\_\_ intro.md
|
|\_\_ project.clj
|\_\_ resources
|\_\_ src
|   |\_\_ simple\_sample
|       |\_\_ core.clj
|\_\_ test
    |\_\_ simple\_sample
        |\_\_ core\_test.clj

6 directories, 6 files
@endcode
We need to add the two opencv artifacts as dependencies of the newly created project. Open the project.clj and modify its dependencies section as follows:
@code{.clojure}
(defproject simple-sample "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies \[\[org.clojure/clojure "1.5.1"\]
                 \[opencv/opencv "2.4.7"\] ; added line
                 \[opencv/opencv-native "2.4.7"\]\]) ; added line
@endcode
Note that the Clojure programming language is a jar artifact too. This is why Clojure is called a hosted language.

To verify that everything went right, issue the lein deps task. The very first time you run a lein task it will take some time to download all the required dependencies before executing the task itself.
@code{.bash}
cd simple-sample
lein deps
...
@endcode
The deps task reads and merges all the dependencies of the simple-sample project from the project.clj and ~/.lein/profiles.clj files and verifies whether they have already been cached in the local maven repository. If the task returns without messages about not being able to retrieve the two new artifacts, your installation is correct; otherwise go back and double check that you did everything right.

### REPLing with OpenCV

Now cd into the simple-sample directory and issue the following lein task:
@code{.bash}
cd simple-sample
lein repl
...
...
nREPL server started on port 50907 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars \*1, \*2, \*3, an exception in \*e

user=>
@endcode
You can immediately interact with the REPL by issuing any CLJ expression to be evaluated.
@code{.clojure}
user=> (+ 41 1)
42
user=> (println "Hello, OpenCV!")
Hello, OpenCV!
nil
user=> (defn foo \[\] (str "bar"))
#'user/foo
user=> (foo)
"bar"
@endcode
Even though the lein repl task automatically loads all the project dependencies when run from the home directory of a lein-based project, you still need to load the OpenCV native library to be able to interact with OpenCV.
@code{.clojure}
user=> (clojure.lang.RT/loadLibrary org.opencv.core.Core/NATIVE\_LIBRARY\_NAME)
nil
@endcode
Then you can start interacting with OpenCV by just referencing the fully qualified names of its classes.

@note [Here](https://docs.opencv.org/5.x/javadoc/index.html) you can find the full OpenCV Java API.

@code{.clojure}
user=> (org.opencv.core.Point. 0 0)
#<Point {0.0, 0.0}>
@endcode
Here we created a two-dimensional OpenCV Point instance. Even if all the Java packages included within the Java interface to OpenCV are immediately available from the CLJ REPL, it's very annoying to prefix the Point. instance constructors with the fully qualified package name.

Fortunately CLJ offers a very easy way to overcome this annoyance by directly importing the Point class.
@code{.clojure}
user=> (import 'org.opencv.core.Point)
org.opencv.core.Point
user=> (def p1 (Point. 0 0))
#'user/p1
user=> p1
#<Point {0.0, 0.0}>
user=> (def p2 (Point. 100 100))
#'user/p2
@endcode
We can even inspect the class of an instance and verify whether the value of a symbol is an instance of the Point Java class.
@code{.clojure}
user=> (class p1)
org.opencv.core.Point
user=> (instance? org.opencv.core.Point p1)
true
@endcode
If we now want to use the OpenCV Rect class to create a rectangle, we again have to fully qualify its constructor even though it lives in the same org.opencv.core package as the Point class.
@code{.clojure}
user=> (org.opencv.core.Rect. p1 p2)
#<Rect {0, 0, 100x100}>
@endcode
Again, the CLJ import facility is very handy and lets you map several symbols in one shot.
@code{.clojure}
user=> (import '\[org.opencv.core Point Rect Size\])
org.opencv.core.Size
user=> (def r1 (Rect. p1 p2))
#'user/r1
user=> r1
#<Rect {0, 0, 100x100}>
user=> (class r1)
org.opencv.core.Rect
user=> (instance? org.opencv.core.Rect r1)
true
user=> (Size. 100 100)
#<Size 100x100>
user=> (def sq-100 (Size. 100 100))
#'user/sq-100
user=> (class sq-100)
org.opencv.core.Size
user=> (instance? org.opencv.core.Size sq-100)
true
@endcode
Obviously you can call methods on instances as well.
@code{.clojure}
user=> (.area r1)
10000.0
user=> (.area sq-100)
10000.0
@endcode
Or modify the value of a member field.
@code{.clojure}
user=> (set! (.x p1) 10)
10
user=> p1
#<Point {10.0, 0.0}>
user=> (set! (.width sq-100) 10)
10
user=> (set! (.height sq-100) 10)
10
user=> (.area sq-100)
100.0
@endcode
If you find yourself not remembering an OpenCV class's behavior, the REPL gives you the opportunity to easily search the corresponding javadoc documentation:
@code{.clojure}
user=> (javadoc Rect)
"http://www.google.com/search?btnI=I%27m%20Feeling%20Lucky&q=allinurl:org/opencv/core/Rect.html"
@endcode

### Mimic the OpenCV Java Tutorial Sample in the REPL

Let's now try to port to Clojure the @ref tutorial\_java\_dev\_intro "OpenCV Java tutorial sample". Instead of writing it in a source file we're going to evaluate it at the REPL.

Following is the original Java source code of the cited sample.

@code{.java}
import org.opencv.core.Mat;
import org.opencv.core.CvType;
import org.opencv.core.Scalar;

class SimpleSample {

  static{ System.loadLibrary("opencv\_java244"); }

  public static void main(String\[\] args) {
    Mat m = new Mat(5, 10, CvType.CV\_8UC1, new Scalar(0));
    System.out.println("OpenCV Mat: " + m);
    Mat mr1 = m.row(1);
    mr1.setTo(new Scalar(1));
    Mat mc5 = m.col(5);
    mc5.setTo(new Scalar(5));
    System.out.println("OpenCV Mat data:\\n" + m.dump());
  }

}
@endcode

### Add injections to the project

Before we start coding, we'd like to eliminate the tedious need to interactively load the native OpenCV lib every time we start a new REPL to interact with it.

First, stop the REPL by evaluating the (exit) expression at the REPL prompt.

@code{.clojure}
user=> (exit)
Bye for now!
@endcode

Then open your project.clj file and edit it as follows:

@code{.clojure}
(defproject simple-sample "0.1.0-SNAPSHOT"
  ...
  :injections \[(clojure.lang.RT/loadLibrary org.opencv.core.Core/NATIVE\_LIBRARY\_NAME)\])
@endcode

Here we're telling Leiningen to load the OpenCV native lib every time we run the REPL, so that we no longer have to remember to do it manually.

Rerun the lein repl task:

@code{.bash}
lein repl
nREPL server started on port 51645 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars \*1, \*2, \*3, an exception in \*e

user=>
@endcode

Import the OpenCV Java classes we are interested in.

@code{.clojure} user=> (import '\[org.opencv.core Mat CvType Scalar\]) org.opencv.core.Scalar @endcode

We're going to mimic almost verbatim the original OpenCV java tutorial to:

-   create a 5x10 matrix with all its elements initialized to 0
-   change the value of every element of the second row to 1
-   change the value of every element of the 6th column to 5
-   print the content of the obtained matrix

@code{.clojure}
user=> (def m (Mat. 5 10 CvType/CV\_8UC1 (Scalar. 0 0)))
#'user/m
user=> (def mr1 (.row m 1))
#'user/mr1
user=> (.setTo mr1 (Scalar. 1 0))
#<Mat Mat \[ 1\*10\*CV\_8UC1, isCont=true, isSubmat=true, nativeObj=0x7fc9dac49880, dataAddr=0x7fc9d9c98d5a \]>
user=> (def mc5 (.col m 5))
#'user/mc5
user=> (.setTo mc5 (Scalar. 5 0))
#<Mat Mat \[ 5\*1\*CV\_8UC1, isCont=false, isSubmat=true, nativeObj=0x7fc9d9c995a0, dataAddr=0x7fc9d9c98d55 \]>
user=> (println (.dump m))
\[0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  1, 1, 1, 1, 1, 5, 1, 1, 1, 1;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0;
  0, 0, 0, 0, 0, 5, 0, 0, 0, 0\]
nil
@endcode

If you are accustomed to a functional language, all those abused and mutating nouns are going to irritate your preference for verbs. Even if the CLJ interop syntax is very handy and complete, there is still an impedance mismatch between any OOP language and any FP language (Scala being a mixed-paradigm programming language).

To exit the REPL type (exit), Ctrl-D or (quit) at the REPL prompt.
@code{.clojure}
user=> (exit)
Bye for now!
@endcode

### Interactively load and blur an image

In the next sample you will learn how to interactively load and blur an image from the REPL by using the following OpenCV methods:

-   the imread static method from the Highgui class to read an image from a file
-   the imwrite static method from the Highgui class to write an image to a file
-   the GaussianBlur static method from the Imgproc class to blur the original image

We're also going to use the Mat class which is returned from the imread method and accepted as the main argument to both the GaussianBlur and the imwrite methods.

### Add an image to the project

First we want to add an image file to a newly created directory for storing static resources of the project.

@code{.bash}
mkdir -p resources/images
cp ~/opt/opencv/doc/tutorials/introduction/desktop\_java/images/lena.png resources/images/
@endcode

### Read the image

Now launch the REPL as usual and start by importing all the OpenCV classes we're going to use:
@code{.clojure}
lein repl
nREPL server started on port 50624 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars \*1, \*2, \*3, an exception in \*e

user=> (import '\[org.opencv.core Mat Size CvType\]
               '\[org.opencv.highgui Highgui\]
               '\[org.opencv.imgproc Imgproc\])
org.opencv.imgproc.Imgproc
@endcode
The Highgui class belongs to the 2.4.x jar installed above; in OpenCV 3 and later, imread and imwrite are provided by org.opencv.imgcodecs.Imgcodecs instead.

Now read the image from the resources/images/lena.png file.
@code{.clojure}
user=> (def lena (Highgui/imread "resources/images/lena.png"))
#'user/lena
user=> lena
#<Mat Mat \[ 512\*512\*CV\_8UC3, isCont=true, isSubmat=false, nativeObj=0x7f9ab3054c40, dataAddr=0x19fea9010 \]>
@endcode
As you see, by simply evaluating the lena symbol we know that lena.png is a 512x512 matrix of the CV\_8UC3 element type. Let's create a new Mat instance of the same dimensions and element type.
@code{.clojure}
user=> (def blurred (Mat. 512 512 CvType/CV\_8UC3))
#'user/blurred
user=>
@endcode
Now apply a GaussianBlur filter using lena as the source matrix and blurred as the destination matrix.
@code{.clojure}
user=> (Imgproc/GaussianBlur lena blurred (Size. 5 5) 3 3)
nil
@endcode
As a last step just save the blurred matrix in a new image file.
@code{.clojure}
user=> (Highgui/imwrite "resources/images/blurred.png" blurred)
true
user=> (exit)
Bye for now!
@endcode
Following is the new blurred image of Lena.

## Next Steps

This tutorial only introduces the very basic environment setup needed to interact with OpenCV in a CLJ REPL.

I recommend that any Clojure newcomer read the [Clojure Java Interop chapter](http://clojure.org/java_interop) to learn everything needed to interoperate with any plain Java lib that has not been wrapped in Clojure to make it usable in a more idiomatic and functional way.

The OpenCV Java API does not wrap the highgui module functionalities depending on Qt (e.g. namedWindow and imshow). If you want to create windows and show images in them while interacting with OpenCV from the REPL, at the moment you're on your own. You could use Java Swing to fill the gap.

### License

Copyright © 2013 Giacomo (Mimmo) Cosenza aka Magomimmo

Distributed under the BSD 3-clause License.

## [Config Reference](https://docharvest.github.io/docs/opencv5/tutorials/introduction/config_reference/config_reference/)

# OpenCV configuration options reference {#tutorial\_config\_reference}

@prev\_tutorial{tutorial\_general\_install} @next\_tutorial{tutorial\_env\_reference}

@tableofcontents

# Introduction {#tutorial\_config\_reference\_intro}

@note We assume you have read @ref tutorial\_general\_install tutorial or have experience with CMake.

Configuration options can be set in several different ways:

-   Command line: `cmake -Doption=value ...`
-   Initial cache files: `cmake -C my_options.txt ...`
-   Interactive via GUI

In this reference we will use regular command line.
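For the initial cache file route, here is a minimal sketch of what such a file could contain. An initial cache file is a CMake script made of `set(... CACHE ...)` commands; the three options chosen here and the temporary file location are only illustrative assumptions.

```bash
# Write an initial-cache file; each line is a CMake set(... CACHE ...) command.
# The file location and the selected options are illustrative.
options_file="${TMPDIR:-/tmp}/my_options.txt"
cat > "$options_file" <<'EOF'
set(CMAKE_BUILD_TYPE Release CACHE STRING "")
set(BUILD_SHARED_LIBS OFF CACHE BOOL "")
set(BUILD_EXAMPLES ON CACHE BOOL "")
EOF

echo "wrote $options_file"

# Then configure with it (not run here):
#   cmake -C "$options_file" ../opencv
```

Keeping the options in a file makes a configuration easy to repeat across build directories.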

Most of the options can be found in the root cmake script of OpenCV: `opencv/CMakeLists.txt`. Some options can be defined in specific modules.

It is possible to use the CMake tool to print all available options:

```
# initial configuration
cmake ../opencv

# print all options
cmake -L

# print all options with help message
cmake -LH

# print all options including advanced
cmake -LA
```

Most popular and useful are options starting with `WITH_`, `ENABLE_`, `BUILD_`, `OPENCV_`.
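To focus on those prefixes, the listing can be piped through a filter. The sketch below assumes the usual `NAME:TYPE=VALUE` line format printed by `cmake -L`; the function name `filter_opencv_options` is hypothetical.

```bash
# Keep only cache entries whose names start with the common OpenCV prefixes.
filter_opencv_options() {
    grep -E '^(WITH|ENABLE|BUILD|OPENCV)_'
}

# Typical use (requires a configured build directory, not run here):
#   cmake -LA . | filter_opencv_options

# Demonstration on sample lines in the NAME:TYPE=VALUE format:
printf '%s\n' \
    'WITH_OPENCL:BOOL=ON' \
    'CMAKE_BUILD_TYPE:STRING=Release' \
    'BUILD_TESTS:BOOL=ON' | filter_opencv_options
```

The demonstration drops the `CMAKE_BUILD_TYPE` line and keeps the other two.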

Default values vary depending on platform and other options values.

# General options {#tutorial\_config\_reference\_general}

## Build with extra modules {#tutorial\_config\_reference\_general\_contrib}

`OPENCV_EXTRA_MODULES_PATH` option contains a semicolon-separated list of directories containing extra modules which will be added to the build. Each module directory must have a compatible layout and a CMakeLists.txt; a brief description can be found in the [Coding Style Guide](https://github.com/opencv/opencv/wiki/Coding_Style_Guide).

Examples:

```
# build with all modules in opencv_contrib
cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules ../opencv

# build with one of opencv_contrib modules
cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules/bgsegm ../opencv

# build with two custom modules (semicolon must be escaped in bash)
cmake -DOPENCV_EXTRA_MODULES_PATH=../my_mod1\;../my_mod2 ../opencv
```

@note Only 0- and 1-level deep module locations are supported; the following command will raise an error:

```
cmake -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib ../opencv
```
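When the list of module directories is assembled in a script, quoting the whole argument is an alternative to escaping each semicolon. A minimal sketch, assuming a hypothetical helper named `join_with_semicolons` and the two example directories used above:

```bash
# Join all arguments into one semicolon-separated string.
join_with_semicolons() {
    result=""
    for dir in "$@"; do
        result="${result:+$result;}$dir"
    done
    printf '%s\n' "$result"
}

modules=$(join_with_semicolons ../my_mod1 ../my_mod2)

# Print the command instead of running it; the double quotes around the
# -D argument keep the shell from treating ';' as a command separator.
echo "cmake \"-DOPENCV_EXTRA_MODULES_PATH=$modules\" ../opencv"
```

This prints `cmake "-DOPENCV_EXTRA_MODULES_PATH=../my_mod1;../my_mod2" ../opencv`.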

## Build with C++ Standard setting {#tutorial\_config\_reference\_general\_cxx\_standard}

`CMAKE_CXX_STANDARD` option can be used to set C++ standard settings for OpenCV building.

```
cmake -DCMAKE_CXX_STANDARD=17 ../opencv
cmake --build .
```

-   C++11 is default/required/recommended for OpenCV 4.x. C++17 is default/required/recommended for OpenCV 5.x.
-   If your compiler does not support the required C++ Standard features, OpenCV configuration will fail.
-   If you set an older C++ Standard than required, OpenCV configuration will fail. As a workaround, the `OPENCV_SKIP_CMAKE_CXX_STANDARD` option can be used to skip the `CMAKE_CXX_STANDARD` version check.
-   If you set a newer C++ Standard than recommended, numerous warnings may appear or the OpenCV build may fail.

## Debug build {#tutorial\_config\_reference\_general\_debug}

`CMAKE_BUILD_TYPE` option can be used to enable debug build; resulting binaries will contain debug symbols and most of compiler optimizations will be turned off. To enable debug symbols in Release build turn the `BUILD_WITH_DEBUG_INFO` option on.

On some platforms (e.g. Linux) build type must be set at configuration stage:

```
cmake -DCMAKE_BUILD_TYPE=Debug ../opencv
cmake --build .
```

On other platforms different types of build can be produced in the same build directory (e.g. Visual Studio, XCode):

```
cmake <options> ../opencv
cmake --build . --config Debug
```

If you use GNU libstdc++ (default for GCC) you can turn on the `ENABLE_GNU_STL_DEBUG` option, then C++ library will be used in Debug mode, e.g. indexes will be bound-checked during vector element access.

Many kinds of optimizations can be disabled with `CV_DISABLE_OPTIMIZATION` option:

-   Some third-party libraries (e.g. IPP, Lapack, Eigen)
-   Explicit vectorized implementation (universal intrinsics, raw intrinsics, etc.)
-   Dispatched optimizations
-   Explicit loop unrolling

@see [https://cmake.org/cmake/help/latest/variable/CMAKE\_BUILD\_TYPE.html](https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html)

@see [https://gcc.gnu.org/onlinedocs/libstdc++/manual/using\_macros.html](https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_macros.html)

@see [https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options](https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options)

## Static build {#tutorial\_config\_reference\_general\_static}

`BUILD_SHARED_LIBS` option controls whether to produce dynamic (.dll, .so, .dylib) or static (.a, .lib) libraries. Default value depends on target platform, in most cases it is `ON`.

Example:

```
cmake -DBUILD_SHARED_LIBS=OFF ../opencv
```

@see [https://en.wikipedia.org/wiki/Static\_library](https://en.wikipedia.org/wiki/Static_library)

`ENABLE_PIC` sets the [CMAKE\_POSITION\_INDEPENDENT\_CODE](https://cmake.org/cmake/help/latest/variable/CMAKE_POSITION_INDEPENDENT_CODE.html) option. It enables or disables generation of "position-independent code". This option must be enabled when building dynamic libraries or static libraries intended to be linked into dynamic libraries. Default value is `ON`.

@see [https://en.wikipedia.org/wiki/Position-independent\_code](https://en.wikipedia.org/wiki/Position-independent_code)

## Generate pkg-config info

`OPENCV_GENERATE_PKGCONFIG` option enables `.pc` file generation along with standard CMake package. This file can be useful for projects which do not use CMake for build.

Example:

```
cmake -DOPENCV_GENERATE_PKGCONFIG=ON ../opencv
```

@note Due to the complexity of the configuration process, the resulting `.pc` file can contain an incomplete list of third-party dependencies and may not work in some configurations, especially for static builds. This feature is not officially supported since 4.x version and is disabled by default.

## Build tests, samples and applications {#tutorial\_config\_reference\_general\_tests}

There are two kinds of tests: accuracy (`opencv_test_*`) and performance (`opencv_perf_*`). Tests and applications are enabled by default. Examples are not built by default and must be enabled explicitly.

Corresponding _cmake_ options:

```
cmake \
  -DBUILD_TESTS=ON \
  -DBUILD_PERF_TESTS=ON \
  -DBUILD_EXAMPLES=ON \
  -DBUILD_opencv_apps=ON \
  ../opencv
```

## Build limited set of modules {#tutorial\_config\_reference\_general\_modules}

Each module is a subdirectory of the `modules` directory. It is possible to disable one module:

```
cmake -DBUILD_opencv_geometry=OFF ../opencv
```

The opposite option is to build only specified modules and all modules they depend on:

```
cmake -DBUILD_LIST=geometry,videoio,ts ../opencv
```

In this example we requested 3 modules and configuration script has determined all dependencies automatically:

```
--   OpenCV modules:
--     To be built:                 core features flann geometry highgui imgcodecs imgproc ts videoio
```
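To work with that list in a script, the "To be built" line can be split into one module per line. This is a sketch: the helper name `list_built_modules` is hypothetical, and it assumes the summary line has exactly the shape shown above.

```bash
# Extract the module names from the "To be built:" summary line on stdin.
list_built_modules() {
    sed -n 's/^-- *To be built: *//p' | tr -s ' ' '\n'
}

# Typical use (captures the configuration log, not run here):
#   cmake -DBUILD_LIST=geometry,videoio,ts ../opencv | list_built_modules

# Demonstration on the line shown above:
printf '%s\n' '--     To be built:                 core features flann geometry highgui imgcodecs imgproc ts videoio' \
    | list_built_modules
```

The demonstration prints the nine module names, one per line, starting with `core` and ending with `videoio`.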

## Downloaded dependencies {#tutorial\_config\_reference\_general\_download}

The configuration script can try to download additional libraries and files from the internet; if a download fails, the corresponding features will be turned off. In some cases a configuration error can occur. By default all files are first downloaded to the `<source>/.cache` directory and then unpacked or copied to the build directory. It is possible to change the download cache location by setting an environment variable or a configuration option:

```
export OPENCV_DOWNLOAD_PATH=/tmp/opencv-cache
cmake ../opencv
# or
cmake -DOPENCV_DOWNLOAD_PATH=/tmp/opencv-cache ../opencv
```

In case of access via proxy, corresponding environment variables should be set before running cmake:

```
export http_proxy=<proxy-host>:<port>
export https_proxy=<proxy-host>:<port>
```

Full log of download process can be found in build directory - `CMakeDownloadLog.txt`. In addition, for each failed download a command will be added to helper scripts in the build directory, e.g. `download_with_wget.sh`. Users can run these scripts as is or modify according to their needs.

## CPU optimization level {#tutorial\_config\_reference\_general\_cpu}

On x86\_64 machines the library will be compiled for SSE3 instruction set level by default. This level can be changed by configuration option:

```
cmake -DCPU_BASELINE=AVX2 ../opencv
```

@note Other platforms have their own instruction set levels: `VFPV3` and `NEON` on ARM, `VSX` on PowerPC.

Some functions support a dispatch mechanism that allows them to be compiled for several instruction sets, with one chosen at runtime. The list of enabled instruction sets can be changed during configuration:

```
cmake -DCPU_DISPATCH=AVX,AVX2 ../opencv
```

To disable dispatch mechanism this option should be set to an empty value:

```
cmake -DCPU_DISPATCH= ../opencv
```

It is possible to disable optimized parts of code for troubleshooting and debugging:

```
# disable universal intrinsics
cmake -DCV_ENABLE_INTRINSICS=OFF ../opencv
# disable all possible built-in optimizations
cmake -DCV_DISABLE_OPTIMIZATION=ON ../opencv
```

@note More details on CPU optimization options can be found in wiki: [https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options](https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options)

## Profiling, coverage, sanitize, hardening, size optimization {#profiling\_coverage\_sanitize\_hardening\_size\_optimization}

Following options can be used to produce special builds with instrumentation or improved security. All options are disabled by default.

| Option | Compiler | Description |
| ------ | -------- | ----------- |
| `ENABLE_PROFILING` | GCC or Clang | Enable profiling compiler and linker options. |
| `ENABLE_COVERAGE` | GCC or Clang | Enable code coverage support. |
| `OPENCV_ENABLE_MEMORY_SANITIZER` | N/A | Enable several quirks in code to assist memory sanitizer. |
| `ENABLE_BUILD_HARDENING` | GCC, Clang, MSVC | Enable compiler options which reduce possibility of code exploitation. |
| `ENABLE_LTO` | GCC, Clang, MSVC | Enable Link Time Optimization (LTO). |
| `ENABLE_THIN_LTO` | Clang | Enable thin LTO which incorporates intermediate bitcode to binaries allowing consumers optimize their applications later. |
| `OPENCV_ALGO_HINT_DEFAULT` | Any | Set default OpenCV implementation hint value: `ALGO_HINT_ACCURATE` or `ALGO_HINT_APPROX`. Dangerous! The option changes behaviour globally and may affect accuracy of many algorithms. |

@see [GCC instrumentation](https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html)

@see [Build hardening](https://en.wikipedia.org/wiki/Hardening_\(computing\))

@see [Interprocedural optimization](https://en.wikipedia.org/wiki/Interprocedural_optimization)

@see [Link time optimization](https://gcc.gnu.org/wiki/LinkTimeOptimization)

@see [ThinLTO](https://clang.llvm.org/docs/ThinLTO.html)

## Enable IPP optimization

The following options enable IPP optimizations for individual functions at the cost of increasing the size of the OpenCV library. All options are disabled by default.

| Option | Functions | Approximate size increase |
| ------ | --------- | ------------------------- |
| `OPENCV_IPP_GAUSSIAN_BLUR` | GaussianBlur() | +8Mb |
| `OPENCV_IPP_MEAN` | mean() / meanStdDev() | +0.2Mb |
| `OPENCV_IPP_MINMAX` | minMaxLoc() / minMaxIdx() | +0.2Mb |
| `OPENCV_IPP_SUM` | sum() | +0.1Mb |

# Functional features and dependencies {#tutorial\_config\_reference\_func}

There are many optional dependencies and features that can be turned on or off. _cmake_ has a special option that prints all available configuration parameters:

```
cmake -LH ../opencv
```

## Options naming conventions

There are three kinds of options used to control dependencies of the library, they have different prefixes:

-   Options starting with `WITH_` enable or disable a dependency
-   Options starting with `BUILD_` enable or disable building and using 3rdparty library bundled with OpenCV
-   Options starting with `HAVE_` indicate that a dependency has been enabled; they can be used to manually enable a dependency if automatic detection cannot be used.

When `WITH_` option is enabled:

-   If `BUILD_` option is enabled, 3rdparty library will be built and enabled => `HAVE_` set to `ON`
-   If `BUILD_` option is disabled, 3rdparty library will be detected and enabled if found => `HAVE_` set to `ON` if dependency is found
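The interplay of the three prefixes can be modelled as a small decision function. This is only an illustration of the rules listed above, not the logic of the actual CMake scripts; the function name `resolve_have` and its three ON/OFF arguments are hypothetical, and it ignores the case where `HAVE_` is set manually.

```bash
# resolve_have WITH BUILD FOUND -> prints the resulting HAVE_ value (ON or OFF).
#   WITH  : value of the WITH_ option
#   BUILD : value of the BUILD_ option
#   FOUND : ON if a system-wide copy of the library was detected
resolve_have() {
    with=$1
    build=$2
    found=$3
    if [ "$with" != ON ]; then
        echo OFF            # dependency disabled entirely
    elif [ "$build" = ON ]; then
        echo ON             # bundled copy is built and used
    else
        echo "$found"       # depends on detecting an installed copy
    fi
}

resolve_have ON ON OFF    # prints ON
resolve_have ON OFF OFF   # prints OFF
```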

## Heterogeneous computation {#tutorial\_config\_reference\_func\_hetero}

### CUDA support

`WITH_CUDA` (default: _OFF_)

Many algorithms have been implemented using CUDA acceleration; these functions are located in separate modules. The CUDA toolkit must be installed from the official NVIDIA site as a prerequisite. For cmake versions older than 3.9 OpenCV uses its own `cmake/FindCUDA.cmake` script, for newer versions the one packaged with CMake. Additional options can be used to control the build process, e.g. `CUDA_GENERATION` or `CUDA_ARCH_BIN`. These parameters are not documented yet; please consult the `cmake/OpenCVDetectCUDA.cmake` script for details.

@note Since OpenCV version 4.0 all CUDA-accelerated algorithm implementations have been moved to the _opencv\_contrib_ repository. To build _opencv_ and _opencv\_contrib_ together check @ref tutorial\_config\_reference\_general\_contrib.

@cond CUDA\_MODULES @note Some tutorials can be found in the corresponding section: @ref tutorial\_table\_of\_content\_gpu @see @ref cuda @endcond

@see [https://en.wikipedia.org/wiki/CUDA](https://en.wikipedia.org/wiki/CUDA)

TODO: other options: `WITH_CUFFT`, `WITH_CUBLAS`, `WITH_NVCUVID`?

### OpenCL support

`WITH_OPENCL` (default: _ON_)

Multiple OpenCL-accelerated algorithms are available via the so-called "Transparent API (T-API)". This integration uses the same functions at the user level as regular CPU implementations. Execution switches to the OpenCL branch if input and output image arguments are passed as opaque cv::UMat objects. More information can be found in [the brief introduction](https://opencv.org/opencl/) and @ref core\_opencl

At build time this feature does not have any prerequisites. At runtime a working OpenCL runtime is required; to check it, run the `clinfo` and/or `opencv_version --opencl` command. Some parameters of the OpenCL integration can be modified using environment variables, e.g. `OPENCV_OPENCL_DEVICE`. However, there is no thorough documentation for this feature yet, so please check the source code in the `modules/core/src/ocl.cpp` file for details.
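Before running the two checks it can help to confirm that the tools are installed at all. The following sketch only tests for their presence on `PATH`; the helper name `check_tool` is hypothetical, and the `OPENCV_OPENCL_DEVICE=disabled` value in the trailing comment is an assumption to verify against `modules/core/src/ocl.cpp`.

```bash
# Report whether a diagnostic tool is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

check_tool clinfo
check_tool opencv_version

# When both are available, the runtime checks are:
#   clinfo
#   opencv_version --opencl
#
# Assumption: "disabled" is an accepted OPENCV_OPENCL_DEVICE value that turns
# the OpenCL branch off for a single run (./my_app is a hypothetical program):
#   OPENCV_OPENCL_DEVICE=disabled ./my_app
```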

@see [https://en.wikipedia.org/wiki/OpenCL](https://en.wikipedia.org/wiki/OpenCL)

TODO: other options: `WITH_OPENCL_SVM`, `WITH_OPENCLAMDFFT`, `WITH_OPENCLAMDBLAS`, `WITH_OPENCL_D3D11_NV`, `WITH_VA_INTEL`

## Image reading and writing (imgcodecs module) {#tutorial\_config\_reference\_func\_imgcodecs}

### Built-in formats

Following formats can be read by OpenCV without help of any third-party library:

| Formats | Option | Default |
| ------- | ------ | ------- |
| [BMP](https://en.wikipedia.org/wiki/BMP_file_format) | (Always) | _ON_ |
| [HDR](https://en.wikipedia.org/wiki/RGBE_image_format) | `WITH_IMGCODEC_HDR` | _ON_ |
| [Sun Raster](https://en.wikipedia.org/wiki/Sun_Raster) | `WITH_IMGCODEC_SUNRASTER` | _ON_ |
| [PPM, PGM, PBM, PAM](https://en.wikipedia.org/wiki/Netpbm#File_formats) | `WITH_IMGCODEC_PXM` | _ON_ |
| [PFM](https://en.wikipedia.org/wiki/Netpbm#File_formats) | `WITH_IMGCODEC_PFM` | _ON_ |
| [GIF](https://en.wikipedia.org/wiki/GIF) | `WITH_IMGCODEC_GIF` | _ON_ |

### PNG, JPEG, TIFF, WEBP, JPEG 2000, EXR, JPEG XL, AVIF support

| Formats | Library | Option | Default | Force build own |
| ------- | ------- | ------ | ------- | --------------- |
| [PNG](https://en.wikipedia.org/wiki/Portable_Network_Graphics) | [libpng](https://en.wikipedia.org/wiki/Libpng) | `WITH_PNG` | _ON_ | `BUILD_PNG` |
| ^ | [libspng(simple png)](https://libspng.org/) | `WITH_SPNG` | _OFF_ | `BUILD_SPNG` |
| [JPEG](https://en.wikipedia.org/wiki/JPEG) | [libjpeg-turbo](https://en.wikipedia.org/wiki/Libjpeg) | `WITH_JPEG` | _ON_ | `BUILD_JPEG` |
| ^ | [libjpeg](https://en.wikipedia.org/wiki/Libjpeg) | `WITH_JPEG` | _OFF_ | Not supported. (see note) |
| [TIFF](https://en.wikipedia.org/wiki/TIFF) | [LibTIFF](https://en.wikipedia.org/wiki/LibTIFF) | `WITH_TIFF` | _ON_ | `BUILD_TIFF` |
| [WebP](https://en.wikipedia.org/wiki/WebP) | | `WITH_WEBP` | _ON_ | `BUILD_WEBP` |
| [JPEG 2000](https://en.wikipedia.org/wiki/JPEG_2000) | [OpenJPEG](https://en.wikipedia.org/wiki/OpenJPEG) | `WITH_OPENJPEG` | _ON_ | `BUILD_OPENJPEG` |
| ^ | [JasPer](https://en.wikipedia.org/wiki/JasPer) | `WITH_JASPER` | _ON_ (see note) | `BUILD_JASPER` |
| [OpenEXR](https://en.wikipedia.org/wiki/OpenEXR) | | `WITH_OPENEXR` | _ON_ | `BUILD_OPENEXR` |
| [JPEG XL](https://en.wikipedia.org/wiki/JPEG_XL) | | `WITH_JPEGXL` | _ON_ | Not supported. (see note) |
| [AVIF](https://en.wikipedia.org/wiki/AVIF) | | `WITH_AVIF` | _ON_ | Not supported. (see note) |

Most library source codes required to read/write images in these formats are bundled into OpenCV and will be built automatically if not found at the configuration stage (except for some codecs that require external libraries, e.g. JPEG XL and AVIF). Corresponding BUILD\_\* options will force building and using the bundled libraries; they are enabled by default on some platforms, e.g. Windows.

@note (All) Only one library for each image format can be enabled (e.g. in order to use JasPer for the JPEG 2000 format, OpenJPEG must be disabled).

@note (JPEG 2000) OpenJPEG has higher priority than JasPer, which is deprecated.

@note (JPEG) OpenCV 5 doesn't contain libjpeg source code, so `BUILD_JPEG_TURBO_DISABLE` is not supported. Users can use a system-wide installed libjpeg instead of libjpeg-turbo.

@note (EXR) OpenCV 5 doesn't contain OpenEXR source code, so `BUILD_OPENEXR` is not supported. Users must provide a system-wide installation of libopenexr.

@note (JPEG XL) OpenCV doesn't contain libjxl source code, so `BUILD_JPEGXL` is not supported. Users must provide a system-wide installation of libjxl.

@note (AVIF) OpenCV doesn't contain libavif source code, so `BUILD_AVIF` is not supported. Users must provide a system-wide installation of libavif.

@warning OpenEXR version 2.2 or earlier cannot be used in combination with C++17 or later. In this case, updating to OpenEXR 2.3.0 or later is required.

### GDAL integration

`WITH_GDAL` (default: _OFF_)

[GDAL](https://en.wikipedia.org/wiki/GDAL) is a higher level library which supports reading multiple file formats including PNG, JPEG and TIFF. It will have higher priority when opening files and can override other backends. This library is located using the cmake package mechanism; make sure it is installed correctly, or manually set the `GDAL_DIR` environment or cmake variable.

### GDCM integration

`WITH_GDCM` (default: _OFF_)

Enables [DICOM](https://en.wikipedia.org/wiki/DICOM) medical image format support through [GDCM library](https://en.wikipedia.org/wiki/GDCM). This library is located using the cmake package mechanism; make sure it is installed correctly, or manually set the `GDCM_DIR` environment or cmake variable.

## Video reading and writing (videoio module) {#tutorial\_config\_reference\_func\_videoio}

TODO: how videoio works, registry, priorities

### Video4Linux

`WITH_V4L` (Linux; default: _ON_ )

Capture images from camera using [Video4Linux](https://en.wikipedia.org/wiki/Video4Linux) API. Linux kernel headers must be installed.

### FFmpeg

`WITH_FFMPEG` (default: _ON_)

Integration with [FFmpeg](https://en.wikipedia.org/wiki/FFmpeg) library for decoding and encoding video files and network streams. This library can read and write many popular video formats. It consists of several components which must be installed as prerequisites for the build:

-   _avcodec_
-   _avformat_
-   _avutil_
-   _swscale_
-   _avresample_ (optional)

The exception is the Windows platform, where a prebuilt [plugin library containing FFmpeg](https://github.com/opencv/opencv_3rdparty/tree/ffmpeg/5.x) is downloaded during the configuration stage and copied to the `bin` folder together with all produced libraries.

@note [Libav](https://en.wikipedia.org/wiki/Libav) library can be used instead of FFmpeg, but this combination is not actively supported.

### GStreamer

`WITH_GSTREAMER` (default: _ON_)

Enable integration with [GStreamer](https://en.wikipedia.org/wiki/GStreamer) library for decoding and encoding video files, capturing frames from cameras and network streams. Numerous plugins can be installed to extend supported formats list. OpenCV allows running arbitrary GStreamer pipelines passed as strings to @ref cv::VideoCapture and @ref cv::VideoWriter objects.

Various GStreamer plugins offer HW-accelerated video processing on different platforms.
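As a rough illustration of passing a pipeline string to `cv::VideoCapture`, the sketch below composes a pipeline description in Python and opens it with the GStreamer backend. The helper names, the `v4l2src` source element and the default device, size and frame rate are assumptions chosen for the example; the pipeline has to end in `appsink` so that OpenCV can pull frames from it, and the elements used must be available in the local GStreamer installation.

```python
def build_gst_capture_pipeline(device="/dev/video0", width=640, height=480, fps=30):
    """Compose a GStreamer pipeline description that ends in appsink.

    The element names and defaults are illustrative assumptions; adjust them
    to the plugins installed on the target system.
    """
    elements = [
        f"v4l2src device={device}",
        f"video/x-raw,width={width},height={height},framerate={fps}/1",
        "videoconvert",
        "appsink",
    ]
    return " ! ".join(elements)


def open_capture(pipeline):
    """Open the pipeline with the GStreamer backend (requires WITH_GSTREAMER=ON)."""
    import cv2 as cv  # imported lazily so the helper above works without OpenCV

    cap = cv.VideoCapture(pipeline, cv.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("GStreamer pipeline could not be opened: " + pipeline)
    return cap
```

Calling `open_capture(build_gst_capture_pipeline())` would then return a capture object that can be read with `cap.read()`, provided the build includes GStreamer support and a camera is present.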

### Microsoft Media Foundation

`WITH_MSMF` (Windows; default: _ON_)

Enables the MSMF backend, which uses Windows' built-in [Media Foundation framework](https://en.wikipedia.org/wiki/Media_Foundation). It can be used to capture frames from a camera and to decode and encode video files. This backend has HW-accelerated processing support (`WITH_MSMF_DXVA` option, default is _ON_).

@note Older versions of Windows (prior to 10) can have incompatible versions of Media Foundation and are known to have problems when used from OpenCV.

### DirectShow

`WITH_DSHOW` (Windows; default: _ON_)

This backend uses the older [DirectShow](https://en.wikipedia.org/wiki/DirectShow) framework. It can be used only to capture frames from a camera. It is now deprecated in favor of the MSMF backend, although both can be enabled in the same build.

### AVFoundation

`WITH_AVFOUNDATION` (Apple; default: _ON_)

[AVFoundation](https://en.wikipedia.org/wiki/AVFoundation) framework is part of Apple platforms and can be used to capture frames from camera, encode and decode video files.

### Other backends

There are multiple less popular frameworks which can be used to read and write videos. Each requires the corresponding library or SDK to be installed.

| Option | Default | Description |
| --- | --- | --- |
| `WITH_1394` | _OFF_ | [IIDC IEEE1394](https://en.wikipedia.org/wiki/IEEE_1394#IIDC) support using DC1394 library |
| `WITH_OPENNI` | _OFF_ | [OpenNI](https://en.wikipedia.org/wiki/OpenNI) can be used to capture data from depth-sensing cameras. Deprecated. |
| `WITH_OPENNI2` | _OFF_ | [OpenNI2](https://structure.io/openni) can be used to capture data from depth-sensing cameras. |
| `WITH_PVAPI` | _OFF_ | [PVAPI](https://www.alliedvision.com/en/support/software-downloads.html) is a legacy SDK for Prosilica GigE cameras. Deprecated. |
| `WITH_ARAVIS` | _OFF_ | [Aravis](https://github.com/AravisProject/aravis) library is used for video acquisition using Genicam cameras. |
| `WITH_XIMEA` | _OFF_ | [XIMEA](https://www.ximea.com/) cameras support. |
| `WITH_XINE` | _OFF_ | [XINE](https://en.wikipedia.org/wiki/Xine) library support. |
| `WITH_LIBREALSENSE` | _OFF_ | [RealSense](https://en.wikipedia.org/wiki/Intel_RealSense) cameras support. |
| `WITH_MFX` | _OFF_ | [MediaSDK](http://mediasdk.intel.com/) library can be used for HW-accelerated decoding and encoding of raw video streams. |
| `WITH_GPHOTO2` | _OFF_ | [GPhoto](https://en.wikipedia.org/wiki/GPhoto) library can be used to capture frames from cameras. |
| `WITH_ANDROID_MEDIANDK` | _ON_ | [MediaNDK](https://developer.android.com/ndk/guides/stable_apis#libmediandk) library is available on Android since API level 21. |

### videoio plugins

Since version 4.1.0 some _videoio_ backends can be built as plugins thus breaking strict dependency on third-party libraries and making them optional at runtime. Following options can be used to control this mechanism:

| Option | Default | Description |
| --- | --- | --- |
| `VIDEOIO_ENABLE_PLUGINS` | _ON_ | Enable or disable plugins completely. |
| `VIDEOIO_PLUGIN_LIST` | _empty_ | Comma- or semicolon-separated list of backend names to be compiled as plugins. Supported names are _ffmpeg_, _gstreamer_, _msmf_, _mfx_ and _all_. |

Check @ref tutorial\_general\_install for standalone plugins build instructions.
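The two plugin options above can be assembled into CMake arguments by a small build script. The following is a minimal sketch, assuming a hypothetical helper `videoio_plugin_flags`; only the option names and the supported backend names are taken from the table, and the comma-separated form is one of the two documented separators.

```python
# Backend names listed as supported by VIDEOIO_PLUGIN_LIST.
SUPPORTED_VIDEOIO_PLUGINS = {"ffmpeg", "gstreamer", "msmf", "mfx", "all"}


def videoio_plugin_flags(backends):
    """Return CMake -D arguments that build the given videoio backends as plugins."""
    unknown = sorted(set(backends) - SUPPORTED_VIDEOIO_PLUGINS)
    if unknown:
        raise ValueError("unsupported videoio plugin name(s): " + ", ".join(unknown))
    return [
        "-DVIDEOIO_ENABLE_PLUGINS=ON",
        "-DVIDEOIO_PLUGIN_LIST=" + ",".join(backends),
    ]


# Example: arguments for a configure step that builds FFmpeg and GStreamer as plugins.
flags = videoio_plugin_flags(["ffmpeg", "gstreamer"])
```

The resulting list can be appended to a `cmake` command line; rejecting unknown names early keeps a typo from silently producing a build without the intended plugin.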

## Parallel processing {#tutorial\_config\_reference\_func\_core}

Some OpenCV algorithms can use multithreading to accelerate processing. OpenCV can be built with one of the following threading backends.

| Backend | Option | Default | Platform | Description |
| --- | --- | --- | --- | --- |
| pthreads | `WITH_PTHREADS_PF` | _ON_ | Unix-like | Default backend based on [pthreads](https://en.wikipedia.org/wiki/POSIX_Threads) library is available on Linux, Android and other Unix-like platforms. Thread pool is implemented in OpenCV and can be controlled with environment variables `OPENCV_THREAD_POOL_*`. Please check sources in _modules/core/src/parallel\_impl.cpp_ file for details. |
| Concurrency | N/A | _ON_ | Windows | [Concurrency runtime](https://docs.microsoft.com/en-us/cpp/parallel/concrt/concurrency-runtime) is available on Windows and will be turned _ON_ on supported platforms unless other backend is enabled. |
| GCD | N/A | _ON_ | Apple | [Grand Central Dispatch](https://en.wikipedia.org/wiki/Grand_Central_Dispatch) is available on Apple platforms and will be turned _ON_ automatically unless other backend is enabled. Uses global system thread pool. |
| TBB | `WITH_TBB` | _OFF_ | Multiple | [Threading Building Blocks](https://en.wikipedia.org/wiki/Threading_Building_Blocks) is a cross-platform library for parallel programming. |
| OpenMP | `WITH_OPENMP` | _OFF_ | Multiple | [OpenMP](https://en.wikipedia.org/wiki/OpenMP) API relies on compiler support. |
| HPX | `WITH_HPX` | _OFF_ | Multiple | [High Performance ParallelX](https://en.wikipedia.org/wiki/HPX) is an experimental backend which is more suitable for multiprocessor environments. |

@note OpenCV can download and build TBB library from GitHub, this functionality can be enabled with the `BUILD_TBB` option.

### Threading plugins

Since version 4.5.2 OpenCV supports dynamically loaded threading backends. At this moment only separate compilation process is supported: first you have to build OpenCV with some _default_ parallel backend (e.g. pthreads), then build each plugin and copy resulting binaries to the _lib_ or _bin_ folder.

| Option | Default | Description |
| --- | --- | --- |
| `PARALLEL_ENABLE_PLUGINS` | _ON_ | Enable plugin support. If this option is disabled, OpenCV will not try to load anything. |

Check @ref tutorial\_general\_install for standalone plugins build instructions.

## GUI backends (highgui module) {#tutorial\_config\_reference\_highgui}

OpenCV relies on various GUI libraries for window drawing.

| Option | Default | Platform | Description |
| --- | --- | --- | --- |
| `WITH_GTK` | _ON_ | Linux | [GTK](https://en.wikipedia.org/wiki/GTK) is a common toolkit in Linux and Unix-like OS-es. By default version 3 will be used if found, version 2 can be forced with the `WITH_GTK_2_X` option. |
| `WITH_WIN32UI` | _ON_ | Windows | [WinAPI](https://en.wikipedia.org/wiki/Windows_API) is a standard GUI API in Windows. |
| N/A | _ON_ | macOS | [Cocoa](https://en.wikipedia.org/wiki/Cocoa_\(API\)) is a framework used in macOS. |
| `WITH_QT` | _OFF_ | Cross-platform | [Qt](https://en.wikipedia.org/wiki/Qt_\(software\)) is a cross-platform GUI framework. |
| `WITH_FRAMEBUFFER` | _OFF_ | Linux | Experimental backend using [Linux framebuffer](https://en.wikipedia.org/wiki/Linux_framebuffer). Has limited functionality but does not require dependencies. |
| `WITH_FRAMEBUFFER_XVFB` | _OFF_ | Linux | Enables special output mode of the FRAMEBUFFER backend compatible with [xvfb](https://en.wikipedia.org/wiki/Xvfb) tool. Requires some X11 headers. |

@note OpenCV compiled with Qt support enables advanced _highgui_ interface, see @ref highgui\_qt for details.

### OpenGL

`WITH_OPENGL` (default: _OFF_)

OpenGL integration can be used to draw HW-accelerated windows with the following backends: GTK, WIN32 and Qt. It also enables basic interoperability with OpenGL, see @ref core\_opengl and @ref highgui\_opengl for details.

### highgui plugins

Since OpenCV 4.5.3 the GTK backend can be built as a dynamically loaded plugin. The following options can be used to control this mechanism:

| Option | Default | Description |
| --- | --- | --- |
| `HIGHGUI_ENABLE_PLUGINS` | _ON_ | Enable or disable plugins completely. |
| `HIGHGUI_PLUGIN_LIST` | _empty_ | Comma- or semicolon-separated list of backend names to be compiled as plugins. Supported names are _gtk_, _gtk2_, _gtk3_, and _all_. |

Check @ref tutorial\_general\_install for standalone plugins build instructions.

## Deep learning neural networks inference backends and options (dnn module) {#tutorial\_config\_reference\_dnn}

OpenCV has its own DNN inference module with a built-in engine, but it can also use other libraries for optimized processing. Multiple backends can be enabled in a single build. Selection happens at runtime, either automatically or manually.

| Option | Default | Description |
| --- | --- | --- |
| `WITH_PROTOBUF` | _ON_ | Enables [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) library search. OpenCV can either build its own copy of the library or use an external one. This dependency is required by the _dnn_ module; if it can't be found, the module will be disabled. |
| `BUILD_PROTOBUF` | _ON_ | Build own copy of _protobuf_. Must be disabled if you want to use an external library. |
| `PROTOBUF_UPDATE_FILES` | _OFF_ | Re-generate all .proto files. _protoc_ compiler compatible with used version of _protobuf_ must be installed. |
| `OPENCV_DNN_OPENCL` | _ON_ | Enable built-in OpenCL inference backend. |
| `WITH_INF_ENGINE` | _OFF_ | **Deprecated since OpenVINO 2022.1** Enables [Intel Inference Engine (IE)](https://github.com/openvinotoolkit/openvino) backend. Allows executing networks in IE format (.xml + .bin). Inference Engine must be installed either as part of [OpenVINO toolkit](https://en.wikipedia.org/wiki/OpenVINO) or as a standalone library built from sources. |
| `INF_ENGINE_RELEASE` | _2020040000_ | **Deprecated since OpenVINO 2022.1** Defines version of Inference Engine library which is tied to OpenVINO toolkit version. Must be a 10-digit string, e.g. _2020040000_ for OpenVINO 2020.4. |
| `WITH_NGRAPH` | _OFF_ | **Deprecated since OpenVINO 2022.1** Enables Intel NGraph library support. This library is part of Inference Engine backend which allows executing arbitrary networks read from files in multiple formats supported by OpenCV: ONNX, TensorFlow, PyTorch, etc. NGraph library must be installed; it is included in Inference Engine. |
| `WITH_OPENVINO` | _OFF_ | Enable Intel OpenVINO Toolkit support. Should be used for OpenVINO>=2022.1 instead of `WITH_INF_ENGINE` and `WITH_NGRAPH`. |
| `WITH_ONNXRUNTIME` | _OFF_ | Enable Microsoft ONNX Runtime backend support for OpenCV DNN. |
| `DOWNLOAD_ONNXRUNTIME` | _OFF_ | Download official ONNX Runtime prebuilt binaries when enabled (or when ONNX Runtime is not available in system paths). |
| `DOWNLOAD_ONNXRUNTIME_GPU` | _OFF_ | Download GPU-enabled ONNX Runtime prebuilt binaries when available (Windows x64 and Linux x64 only). Requires `WITH_ONNXRUNTIME=ON`. |
| `ONNXRUNTIME_PREFER_STATIC` | _ON_ | Prefer static `libonnxruntime.a` when both static and shared ONNX Runtime libraries are available. |
| `ONNXRUNTIME_VERSION` | _1.25.1_ | ONNX Runtime version to download for prebuilt packages. |
| `OPENCV_DNN_CUDA` | _OFF_ | Enable CUDA backend. [CUDA](https://en.wikipedia.org/wiki/CUDA), CUBLAS and [CUDNN](https://developer.nvidia.com/cudnn) must be installed. |
| `WITH_VULKAN` | _OFF_ | Enable experimental [Vulkan](https://en.wikipedia.org/wiki/Vulkan_\(API\)) backend. Does not require additional dependencies, but can use external Vulkan headers (`VULKAN_INCLUDE_DIRS`). |

# Installation layout {#tutorial\_config\_reference\_install}

## Installation root {#tutorial\_config\_reference\_install\_root}

To install the produced binaries, the root location should be configured. The default value depends on the distribution; in Ubuntu it is usually set to `/usr/local`. It can be changed during configuration:

```
cmake -DCMAKE_INSTALL_PREFIX=/opt/opencv ../opencv
```

This path can be relative to current working directory, in the following example it will be set to `<absolute-path-to-build>/install`:

```
cmake -DCMAKE_INSTALL_PREFIX=install ../opencv
```

After building the library, all files can be copied to the configured install location using the following command:

```
cmake --build . --target install
```

To install binaries to the system location (e.g. `/usr/local`) as a regular user it is necessary to run the previous command with elevated privileges:

```
sudo cmake --build . --target install
```

@note On some platforms (Linux) it is possible to remove symbol information during install. Binaries will become 10-15% smaller but debugging will be limited:

```
cmake --build . --target install/strip
```
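For scripted builds, the configure, build and install commands shown above can be collected in one place. The sketch below is a dry-run helper under stated assumptions: the function names `resolve_install_prefix` and `install_commands` are hypothetical, the relative-prefix behavior follows the description above, and the commands are only returned as argument lists, not executed.

```python
from pathlib import PurePosixPath


def resolve_install_prefix(build_dir, prefix):
    """Return where files end up: a relative prefix is taken relative to the build directory."""
    prefix = PurePosixPath(prefix)
    if prefix.is_absolute():
        return prefix
    return PurePosixPath(build_dir) / prefix


def install_commands(source_dir, prefix, strip=False, use_sudo=False):
    """Return the configure, build and install commands as argument lists."""
    target = "install/strip" if strip else "install"
    install = ["cmake", "--build", ".", "--target", target]
    if use_sudo:
        # Needed for system locations such as /usr/local when run as a regular user.
        install.insert(0, "sudo")
    return [
        ["cmake", f"-DCMAKE_INSTALL_PREFIX={prefix}", str(source_dir)],
        ["cmake", "--build", "."],
        install,
    ]
```

Each returned list can be passed to `subprocess.run(..., cwd=build_dir, check=True)` from the build directory; keeping the commands as data makes it easy to log them before running anything with elevated privileges.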

## Components and locations {#tutorial\_config\_reference\_install\_comp}

The following options control whether a part of the library will be installed:

| Option | Default | Description |
| --- | --- | --- |
| `INSTALL_C_EXAMPLES` | _OFF_ | Install C++ sample sources from the _samples/cpp_ directory. |
| `INSTALL_PYTHON_EXAMPLES` | _OFF_ | Install Python sample sources from the _samples/python_ directory. |
| `INSTALL_ANDROID_EXAMPLES` | _OFF_ | Install Android sample sources from the _samples/android_ directory. |
| `INSTALL_BIN_EXAMPLES` | _OFF_ | Install prebuilt sample applications (`BUILD_EXAMPLES` must be enabled). |
| `INSTALL_TESTS` | _OFF_ | Install tests (`BUILD_TESTS` must be enabled). |
| `OPENCV_INSTALL_APPS_LIST` | _all_ | Comma- or semicolon-separated list of prebuilt applications to install (from _apps_ directory) |

The following options modify the installation locations of components relative to the install prefix. Default values of these options depend on the platform and other options; please check the _cmake/OpenCVInstallLayout.cmake_ file for details.

| Option | Components |
| --- | --- |
| `OPENCV_BIN_INSTALL_PATH` | applications, dynamic libraries (_win_) |
| `OPENCV_TEST_INSTALL_PATH` | test applications |
| `OPENCV_SAMPLES_BIN_INSTALL_PATH` | sample applications |
| `OPENCV_LIB_INSTALL_PATH` | dynamic libraries, import libraries (_win_) |
| `OPENCV_LIB_ARCHIVE_INSTALL_PATH` | static libraries |
| `OPENCV_3P_LIB_INSTALL_PATH` | 3rdparty libraries |
| `OPENCV_CONFIG_INSTALL_PATH` | cmake config package |
| `OPENCV_INCLUDE_INSTALL_PATH` | header files |
| `OPENCV_OTHER_INSTALL_PATH` | extra data files |
| `OPENCV_SAMPLES_SRC_INSTALL_PATH` | sample sources |
| `OPENCV_LICENSES_INSTALL_PATH` | licenses for included 3rdparty components |
| `OPENCV_TEST_DATA_INSTALL_PATH` | test data |
| `OPENCV_DOC_INSTALL_PATH` | documentation |
| `OPENCV_JAR_INSTALL_PATH` | JAR file with Java bindings |
| `OPENCV_JNI_INSTALL_PATH` | JNI part of Java bindings |
| `OPENCV_JNI_BIN_INSTALL_PATH` | Dynamic libraries from the JNI part of Java bindings |

Following options can be used to change installation layout for common scenarios:

| Option | Default | Description |
| --- | --- | --- |
| `INSTALL_CREATE_DISTRIB` | _OFF_ | Tune multiple things to produce Windows and Android distributions. |
| `INSTALL_TO_MANGLED_PATHS` | _OFF_ | Adds one level to several installation locations to allow side-by-side installations. For example, headers will be installed to _/usr/include/opencv-5.x.y_ instead of _/usr/include/opencv5_ with this option enabled. |

# Miscellaneous features {#tutorial\_config\_reference\_misc}

| Option | Default | Description |
| --- | --- | --- |
| `OPENCV_ENABLE_NONFREE` | _OFF_ | Some algorithms included in the library are known to be protected by patents and are disabled by default. |
| `OPENCV_FORCE_3RDPARTY_BUILD` | _OFF_ | Enable all `BUILD_` options at once. |
| `OPENCV_IPP_ENABLE_ALL` | _OFF_ | Enable all `OPENCV_IPP_` options at once. |
| `ENABLE_CCACHE` | _ON_ (on Unix-like platforms) | Enable [ccache](https://en.wikipedia.org/wiki/Ccache) auto-detection. This tool wraps compiler calls and caches results, can significantly improve re-compilation time. |
| `ENABLE_PRECOMPILED_HEADERS` | _ON_ (for MSVC) | Enable precompiled headers support. Improves build time. |
| `BUILD_DOCS` | _OFF_ | Enable documentation build (_doxygen_, _doxygen\_cpp_, _doxygen\_python_, _doxygen\_javadoc_ targets). [Doxygen](http://www.doxygen.org/index.html) must be installed for C++ documentation build. Python and [BeautifulSoup4](https://en.wikipedia.org/wiki/Beautiful_Soup_\(HTML_parser\)) must be installed for Python documentation build. Javadoc and Ant must be installed for Java documentation build (part of Java SDK). |
| `ENABLE_PYLINT` | _ON_ (when docs or examples are enabled) | Enable python scripts check with [Pylint](https://en.wikipedia.org/wiki/Pylint) (_check\_pylint_ target). Pylint must be installed. |
| `ENABLE_FLAKE8` | _ON_ (when docs or examples are enabled) | Enable python scripts check with [Flake8](https://flake8.pycqa.org/) (_check\_flake8_ target). Flake8 must be installed. |
| `BUILD_JAVA` | _ON_ | Enable Java wrappers build. Java SDK and Ant must be installed. |
| `BUILD_FAT_JAVA_LIB` | _ON_ (for static Android builds) | Build single _opencv\_java_ dynamic library containing all library functionality bundled with Java bindings. |
| `BUILD_opencv_python3` | _ON_ | Build python3 bindings. Python with development files and numpy must be installed. |

TODO: need separate tutorials covering bindings builds

## Automated builds

Some features have been added specifically for automated build environments, like continuous integration and packaging systems.

| Option | Default | Description |
| --- | --- | --- |
| `ENABLE_NOISY_WARNINGS` | _OFF_ | Enables several compiler warnings considered _noisy_, i.e. having less importance than others. These warnings are usually ignored but in some cases can be worth checking. |
| `OPENCV_WARNINGS_ARE_ERRORS` | _OFF_ | Treat compiler warnings as errors. Build will be halted. |
| `ENABLE_CONFIG_VERIFICATION` | _OFF_ | For each enabled dependency (`WITH_` option) verify that it has been found and enabled (`HAVE_` variable). By default a feature is silently turned off if its dependency was not found, but with this option enabled cmake configuration will fail. Convenient for packaging systems which require stable library configuration not depending on environment fluctuations. |
| `OPENCV_CMAKE_HOOKS_DIR` | _empty_ | OpenCV allows customizing the configuration process by adding custom hook scripts at each stage and substage. cmake scripts with predefined names located in the directory set by this variable will be included before and after various configuration stages. Examples of file names: _CMAKE\_INIT.cmake_, _PRE\_CMAKE\_BOOTSTRAP.cmake_, _POST\_CMAKE\_BOOTSTRAP.cmake_, etc. Other names are not documented and can be found in the project cmake files by searching for the _ocv\_cmake\_hook_ macro calls. |
| `OPENCV_DUMP_HOOKS_FLOW` | _OFF_ | Enables a debug message print on each cmake hook script call. |
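A CI job typically fixes its configuration in one place and turns it into `-D` arguments. The following minimal sketch shows one way to do that; the helper `cmake_defines`, the `CI_OPTIONS` dictionary and the `../opencv` source path are assumptions made for the example, while the option names come from this reference.

```python
def cmake_defines(options):
    """Turn {"NAME": value} into -DNAME=value arguments; booleans become ON/OFF."""
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args


# Illustrative CI configuration: fail the configure step if FFmpeg is requested
# but not found, and stop the build on any compiler warning.
CI_OPTIONS = {
    "WITH_FFMPEG": True,
    "ENABLE_CONFIG_VERIFICATION": True,
    "OPENCV_WARNINGS_ARE_ERRORS": True,
    "OPENCV_DUMP_HOOKS_FLOW": False,
}

configure_cmd = ["cmake", *cmake_defines(CI_OPTIONS), "../opencv"]
```

With `ENABLE_CONFIG_VERIFICATION` enabled, a missing dependency should surface as a configuration error in the CI log instead of a silently reduced feature set.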

## Contrib Modules

The following build options are used by `opencv_contrib` modules. As stated [previously](#tutorial_config_reference_general_contrib), these extra modules can be added to your final build by setting the `OPENCV_EXTRA_MODULES_PATH` option.

| Option | Default | Description |
| --- | --- | --- |
| `WITH_CLP` | _OFF_ | Will add [coinor](https://projects.coin-or.org/Clp) linear programming library build support which is required in `videostab` module. Make sure to install the development libraries of coinor-clp. |

# Other non-documented options

`BUILD_ANDROID_PROJECTS` `BUILD_ANDROID_EXAMPLES` `ANDROID_HOME` `ANDROID_SDK` `ANDROID_NDK` `ANDROID_SDK_ROOT`

`CMAKE_TOOLCHAIN_FILE`

`WITH_CAROTENE` `WITH_KLEIDICV` `WITH_CPUFEATURES` `WITH_EIGEN` `WITH_DIRECTX` `WITH_VA` `WITH_LAPACK` `BUILD_ZLIB` `BUILD_ITT` `WITH_IPP` `BUILD_IPP_IW`

## [Tutorial Cross Referencing](https://docharvest.github.io/docs/opencv5/tutorials/introduction/cross_referencing/tutorial_cross_referencing/)

# Cross referencing OpenCV from other Doxygen projects {#tutorial\_cross\_referencing}

@prev\_tutorial{tutorial\_transition\_guide}

| | |
| --- | --- |
| Original author | Sebastian Höffner |
| Compatibility | OpenCV >= 3.3.0 |

@warning This tutorial can contain obsolete information.

## Cross referencing OpenCV

[Doxygen](http://www.doxygen.nl) is a tool for generating documentation such as the OpenCV documentation you are reading right now. It is used by a variety of software projects; if you use it to generate your own documentation and your project uses OpenCV, this short tutorial is for you.

Imagine this warning inside your documentation code:

@code
/**
 * @warning This function returns a cv::Mat.
 */
@endcode

Inside your generated documentation this warning will look roughly like this:

@warning This function returns a %cv::Mat.

While inside the OpenCV documentation the `%cv::Mat` is rendered as a link:

@warning This function returns a cv::Mat.

To generate links to the OpenCV documentation inside your project, you only have to perform two small steps. First download the file [opencv.tag](opencv.tag) (right-click and choose "save as...") and place it somewhere in your project directory, for example as `docs/doxygen-tags/opencv.tag`.

Open your Doxyfile using your favorite text editor and search for the key `TAGFILES`. Change it as follows:

@code
TAGFILES = ./docs/doxygen-tags/opencv.tag=http://docs.opencv.org/5.0.0
@endcode

If you had other definitions already, you can append the line using a `\`:

@code
TAGFILES = ./docs/doxygen-tags/libstdc++.tag=https://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen \
           ./docs/doxygen-tags/opencv.tag=http://docs.opencv.org/5.0.0
@endcode
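If the Doxyfile is generated by a script, the `TAGFILES` value can be produced from a mapping of tag files to documentation URLs. This is a small sketch with a hypothetical helper `format_tagfiles`; it only formats the setting using the `\` line continuation described above and does not touch any file.

```python
def format_tagfiles(tagfiles):
    """Format a {tag_file: url} mapping as a Doxyfile TAGFILES setting.

    Multiple entries are joined with a backslash line continuation and
    aligned under the first entry.
    """
    key = "TAGFILES = "
    entries = [f"{tag}={url}" for tag, url in tagfiles.items()]
    return key + (" \\\n" + " " * len(key)).join(entries)


setting = format_tagfiles({
    "./docs/doxygen-tags/libstdc++.tag": "https://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen",
    "./docs/doxygen-tags/opencv.tag": "http://docs.opencv.org/5.0.0",
})
```

Printing `setting` gives two lines that can be pasted into the Doxyfile in place of the existing `TAGFILES` entry.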

Doxygen can now use the information from the tag file to link to the OpenCV documentation. Rebuild your documentation right now!

@note To allow others to also use a \*.tag file to link to your documentation, set `GENERATE_TAGFILE = html/your_project.tag`. Your documentation will now contain a `your_project.tag` file in its root directory.

## References

-   [Doxygen: Linking to external documentation](http://www.doxygen.nl/manual/external.html)
-   [opencv.tag](opencv.tag)

## [Arm Crosscompile With Cmake](https://docharvest.github.io/docs/opencv5/tutorials/introduction/crosscompilation/arm_crosscompile_with_cmake/)


## [MultiArch cross-compilation with Ubuntu/Debian{#tutorial_crosscompile_with_multiarch}](https://docharvest.github.io/docs/opencv5/tutorials/introduction/crosscompilation/crosscompile_with_multiarch/)


## [Java Dev Intro](https://docharvest.github.io/docs/opencv5/tutorials/introduction/desktop_java/java_dev_intro/)


## [Display Image](https://docharvest.github.io/docs/opencv5/tutorials/introduction/display_image/display_image/)

# Getting Started with Images {#tutorial\_display\_image}

@prev\_tutorial{tutorial\_building\_tegra\_cuda} @next\_tutorial{tutorial\_documentation}

| | |
| --- | --- |
| Original author | Ana Huamán |
| Compatibility | OpenCV >= 3.4.4 |

@tableofcontents

@warning This tutorial can contain obsolete information.

## Goal

In this tutorial you will learn how to:

-   Read an image from file (using @ref cv::imread)
-   Display an image in an OpenCV window (using @ref cv::imshow)
-   Write an image to a file (using @ref cv::imwrite)

## Source Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/introduction/display_image/display_image.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/introduction/display\_image/display\_image.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/introduction/display_image/display_image.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/introduction/display\_image/display\_image.py @end\_toggle
    

## Explanation

@add\_toggle\_cpp OpenCV is organized into multiple modules. Each one takes care of a different area or approach towards image processing. You may already have noticed this in the structure of these tutorials. Before you use any module, you first need to include the header files where its content is declared.

You'll almost always end up using:

-   the @ref core "core" section, which defines the basic building blocks of the library
-   the @ref imgcodecs "imgcodecs" module, which provides functions for reading and writing images
-   the @ref highgui "highgui" module, which contains the functions to show an image in a window

We also include the _iostream_ to facilitate console line output and input.

By declaring `using namespace cv;`, in the following, the library functions can be accessed without explicitly stating the namespace.

@snippet cpp/tutorial\_code/introduction/display\_image/display\_image.cpp includes @end\_toggle

@add\_toggle\_python As a first step, the OpenCV python library is imported. The proper way to do this is to additionally assign it the name _cv_, which is used in the following to reference the library.

@snippet samples/python/tutorial\_code/introduction/display\_image/display\_image.py imports @end\_toggle

Now, let's analyze the main code. As a first step, we read the image "starry\_night.jpg" from the OpenCV samples. In order to do so, a call to the @ref cv::imread function loads the image using the file path specified by the first argument. The second argument is optional and specifies the format in which we want the image. This may be:

-   IMREAD\_COLOR loads the image in the BGR 8-bit format. This is the **default** that is used here.
-   IMREAD\_UNCHANGED loads the image as is (including the alpha channel if present)
-   IMREAD\_GRAYSCALE loads the image as an intensity one

After reading, the image data is stored in a @ref cv::Mat object.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/introduction/display\_image/display\_image.cpp imread @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/introduction/display\_image/display\_image.py imread @end\_toggle

@note OpenCV offers support for the image formats Windows bitmap (bmp), portable image formats (pbm, pgm, ppm) and Sun raster (sr, ras). With the help of plugins (which you need to enable explicitly if you build the library yourself, but which are present by default in the packages we ship) you may also load image formats like JPEG (jpeg, jpg, jpe), JPEG 2000 (jp2, codenamed Jasper in CMake), TIFF files (tiff, tif) and portable network graphics (png). Furthermore, OpenEXR is also a possibility.

Afterwards, a check verifies that the image was loaded correctly.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/introduction/display\_image/display\_image.cpp empty @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/introduction/display\_image/display\_image.py empty @end\_toggle

Then, the image is shown using a call to the @ref cv::imshow function. The first argument is the title of the window and the second argument is the @ref cv::Mat object that will be shown.

Because we want our window to be displayed until the user presses a key (otherwise the program would end far too quickly), we use the @ref cv::waitKey function, whose only parameter is how long it should wait for user input (measured in milliseconds). Zero means to wait forever. The return value is the key that was pressed.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/introduction/display\_image/display\_image.cpp imshow @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/introduction/display\_image/display\_image.py imshow @end\_toggle

In the end, the image is written to a file if the pressed key was the "s" key. For this, the cv::imwrite function is called with the file path and the cv::Mat object as arguments.

@add\_toggle\_cpp @snippet cpp/tutorial\_code/introduction/display\_image/display\_image.cpp imsave @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/introduction/display\_image/display\_image.py imsave @end\_toggle
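Putting the steps above together, a compact Python version of the read, show and save flow might look like the sketch below. It is not the tutorial's sample file: the function names, the window title and the output file name `starry_night.png` are assumptions made for illustration, and `main()` needs an OpenCV build with GUI support to run.

```python
import sys


def should_save(key):
    """waitKey returns an integer key code; compare its low byte with 's'."""
    return (key & 0xFF) == ord("s")


def main(path="starry_night.jpg", out_path="starry_night.png"):
    import cv2 as cv  # imported here so should_save() can be used without OpenCV

    # findFile searches the OpenCV samples data paths; IMREAD_COLOR is the default flag.
    img = cv.imread(cv.samples.findFile(path))
    if img is None:
        sys.exit("Could not read the image.")

    cv.imshow("Display window", img)
    key = cv.waitKey(0)  # 0 waits until a key is pressed

    if should_save(key):
        cv.imwrite(out_path, img)
```

Calling `main()` opens the window; pressing "s" writes the image to `out_path`, and any other key simply ends the program.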

## [Documentation Tutorial](https://docharvest.github.io/docs/opencv5/tutorials/introduction/documenting_opencv/documentation_tutorial/)


## [Env Reference](https://docharvest.github.io/docs/opencv5/tutorials/introduction/env_reference/env_reference/)

# OpenCV environment variables reference {#tutorial\_env\_reference}

@prev\_tutorial{tutorial\_config\_reference} @next\_tutorial{tutorial\_linux\_install}

@tableofcontents

## Introduction

OpenCV can change its behavior depending on the runtime environment:

-   enable extra debugging output or performance tracing
-   modify default locations and search paths
-   tune some algorithms or general behavior
-   enable or disable workarounds, safety features and optimizations

**Notes:**

-   ⭐ marks most popular variables
-   variables with names like `VAR_${NAME}` describe a family of variables, where `${NAME}` should be replaced with one of the predefined values, e.g. `VAR_TBB`, `VAR_OPENMP`, ...

### Setting environment variable in Windows

In terminal or cmd-file (bat-file):

```
set MY_ENV_VARIABLE=true
C:\my_app.exe
```

In GUI:

-   Go to "Settings -> System -> About"
-   Click on "Advanced system settings" in the right part
-   In new window click on the "Environment variables" button
-   Add an entry to the "User variables" list

### Setting environment variable in Linux

In terminal or shell script:

```
export MY_ENV_VARIABLE=true
./my_app
```

or as a single command:

```
MY_ENV_VARIABLE=true ./my_app
```

### Setting environment variable in Python

```
import os
os.environ["MY_ENV_VARIABLE"] = "True" # value must be a string
import cv2 # variables set after this may not have effect
```

@note This method may not work on all operating systems and/or Python distributions. For example, it works on Ubuntu Linux with system Python interpreter, but doesn't work on Windows 10 with the official Python package. It depends on the ability of a process to change its own environment (OpenCV uses `getenv` from C++ runtime to read variables).

@note See also:

-   [https://docs.python.org/3.12/library/os.html#os.environ](https://docs.python.org/3.12/library/os.html#os.environ)
-   [https://stackoverflow.com/questions/69199708/setenvironmentvariable-does-not-seem-to-set-values-that-can-be-retrieved-by-ge](https://stackoverflow.com/questions/69199708/setenvironmentvariable-does-not-seem-to-set-values-that-can-be-retrieved-by-ge)
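When changing `os.environ` inside the process is not reliable, one alternative is to start the OpenCV program as a child process with the variable already present in its environment. The sketch below assumes a hypothetical helper `run_with_env`; the child command here only echoes the variable and stands in for a script that imports `cv2`.

```python
import os
import subprocess
import sys


def run_with_env(command, **variables):
    """Run a command with extra environment variables set before it starts."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env.update(variables)   # values must be strings
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in for a script that imports cv2: it prints the value it sees.
child = [sys.executable, "-c", "import os; print(os.environ['OPENCV_LOG_LEVEL'])"]
result = run_with_env(child, OPENCV_LOG_LEVEL="DEBUG")
```

Because the variable is part of the environment the child starts with, this is equivalent to the `MY_ENV_VARIABLE=true ./my_app` form shown for Linux.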

## Types

-   _bool_ - `1`, `True`, `true`, `TRUE` / `0`, `False`, `false`, `FALSE`
-   _number_/_size_ - unsigned number, suffixes `MB`, `Mb`, `mb`, `KB`, `Kb`, `kb`
-   _string_ - plain string or can have a structure
-   _path_ - to file, to directory
-   _paths_ - `;`\-separated on Windows, `:`\-separated on others
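To make the value formats concrete, here is a rough Python sketch of parsers for the _bool_, _size_ and _paths_ types. It mirrors the accepted spellings listed above but is not OpenCV's implementation: the function names are hypothetical, and treating `KB` as 1024 bytes and `MB` as 1024 * 1024 bytes is an assumption.

```python
import os

_TRUE = {"1", "True", "true", "TRUE"}
_FALSE = {"0", "False", "false", "FALSE"}
# Assumed multipliers: binary kilobytes and megabytes.
_SUFFIXES = {
    "KB": 1024, "Kb": 1024, "kb": 1024,
    "MB": 1024 * 1024, "Mb": 1024 * 1024, "mb": 1024 * 1024,
}


def parse_bool(value):
    """Accept only the spellings listed for the bool type."""
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean value: {value!r}")


def parse_size(value):
    """Parse an unsigned number with an optional KB/MB suffix into bytes."""
    factor = 1
    for suffix, multiplier in _SUFFIXES.items():
        if value.endswith(suffix):
            value, factor = value[: -len(suffix)], multiplier
            break
    number = int(value)
    if number < 0:
        raise ValueError("size must be unsigned")
    return number * factor


def parse_paths(value):
    """Split a paths value on ';' (Windows) or ':' (other platforms)."""
    return [part for part in value.split(os.pathsep) if part]
```

For example, `parse_size("64MB")` gives the byte count for 64 megabytes under the assumed multiplier, and `parse_paths` relies on `os.pathsep`, which matches the platform-specific separator described for _paths_.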

## General, core

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_SKIP\_CPU\_BASELINE\_CHECK | bool | false | do not check that current CPU supports all features used by the build (baseline) |
| OPENCV\_CPU\_DISABLE | `,` or `;`\-separated | | disable code branches which use CPU features (dispatched code) |
| OPENCV\_SETUP\_TERMINATE\_HANDLER | bool | true (Windows) | use std::set\_terminate to install own termination handler |
| OPENCV\_LIBVA\_RUNTIME | file path | | libva for VA interoperability utils |
| OPENCV\_ENABLE\_MEMALIGN | bool | true (except static analysis, memory sanitizer, fuzzing, \_WIN32?) | enable aligned memory allocations |
| OPENCV\_BUFFER\_AREA\_ALWAYS\_SAFE | bool | false | enable safe mode for multi-buffer allocations (each buffer separately) |
| OPENCV\_KMEANS\_PARALLEL\_GRANULARITY | num | 1000 | tune algorithm parallel work distribution parameter `parallel_for_(..., ..., ..., granularity)` |
| OPENCV\_DUMP\_ERRORS | bool | true (Debug or Android), false (others) | print extra information on exception (log to Android) |
| OPENCV\_DUMP\_CONFIG | bool | false | print build configuration to stderr (`getBuildInformation`) |
| OPENCV\_PYTHON\_DEBUG | bool | false | enable extra warnings in Python bindings |
| OPENCV\_TEMP\_PATH | path | `/tmp/` (Linux), `/data/local/tmp/` (Android), `GetTempPathA` (Windows) | directory for temporary files |
| OPENCV\_DATA\_PATH\_HINT | paths | | paths for findDataFile |
| OPENCV\_DATA\_PATH | paths | | paths for findDataFile |
| OPENCV\_SAMPLES\_DATA\_PATH\_HINT | paths | | paths for findDataFile |
| OPENCV\_SAMPLES\_DATA\_PATH | paths | | paths for findDataFile |

Links:

-   [https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options](https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options)

## Logging

| name | type | default | description |
| --- | --- | --- | --- |
| ⭐ OPENCV\_LOG\_LEVEL | string | | logging level (see accepted values below) |
| OPENCV\_LOG\_TIMESTAMP | bool | true | logging with timestamps |
| OPENCV\_LOG\_TIMESTAMP\_NS | bool | false | add nsec to logging timestamps |

### Levels

-   `0`, `O`, `OFF`, `S`, `SILENT`, `DISABLE`, `DISABLED`
-   `F`, `FATAL`
-   `E`, `ERROR`
-   `W`, `WARNING`, `WARN`, `WARNINGS`
-   `I`, `INFO`
-   `D`, `DEBUG`
-   `V`, `VERBOSE`
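
The alias groups above can be expressed as a lookup table. The following is a minimal sketch: the canonical names used as dictionary keys and the exact-case matching are assumptions made for illustration, not a description of how OpenCV parses the value.

```python
# Each key is a label chosen for this sketch; each tuple holds the accepted
# spellings listed above for that level.
LOG_LEVEL_ALIASES = {
    "SILENT": ("0", "O", "OFF", "S", "SILENT", "DISABLE", "DISABLED"),
    "FATAL": ("F", "FATAL"),
    "ERROR": ("E", "ERROR"),
    "WARNING": ("W", "WARNING", "WARN", "WARNINGS"),
    "INFO": ("I", "INFO"),
    "DEBUG": ("D", "DEBUG"),
    "VERBOSE": ("V", "VERBOSE"),
}

_LOOKUP = {
    alias: level
    for level, aliases in LOG_LEVEL_ALIASES.items()
    for alias in aliases
}


def canonical_log_level(value):
    """Map an accepted OPENCV_LOG_LEVEL spelling to its level label."""
    try:
        return _LOOKUP[value]
    except KeyError:
        raise ValueError(f"unknown log level: {value!r}") from None
```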

## core/parallel\_for

| name | type | default | description |
| --- | --- | --- | --- |
| ⭐ OPENCV\_FOR\_THREADS\_NUM | num | 0 | set number of threads |
| OPENCV\_THREAD\_POOL\_ACTIVE\_WAIT\_PAUSE\_LIMIT | num | 16 | tune pthreads parallel\_for backend |
| OPENCV\_THREAD\_POOL\_ACTIVE\_WAIT\_WORKER | num | 2000 | tune pthreads parallel\_for backend |
| OPENCV\_THREAD\_POOL\_ACTIVE\_WAIT\_MAIN | num | 10000 | tune pthreads parallel\_for backend |
| OPENCV\_THREAD\_POOL\_ACTIVE\_WAIT\_THREADS\_LIMIT | num | 0 | tune pthreads parallel\_for backend |
| OPENCV\_FOR\_OPENMP\_DYNAMIC\_DISABLE | bool | false | Removed in 4.13.0. Use standard [OMP\_DYNAMIC](https://www.openmp.org/spec-html/5.0/openmpsu116.html) instead |

## backends

Some modules have multiple available backends. The following variables allow choosing a specific backend or changing the default priorities in which backends will be probed (e.g. when opening a video file).

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_PARALLEL\_BACKEND | string | | choose specific parallel\_for backend (one of `TBB`, `ONETBB`, `OPENMP`) |
| OPENCV\_PARALLEL\_PRIORITY\_${NAME} | num | | set backend priority, default is 1000 |
| OPENCV\_PARALLEL\_PRIORITY\_LIST | string, `,`\-separated | | list of backends in priority order |
| OPENCV\_UI\_BACKEND | string | | choose highgui backend for window rendering (one of `GTK`, `GTK3`, `GTK2`, `QT`, `WIN32`) |
| OPENCV\_UI\_PRIORITY\_${NAME} | num | | set highgui backend priority, default is 1000 |
| OPENCV\_UI\_PRIORITY\_LIST | string, `,`\-separated | | list of highgui backends in priority order |
| OPENCV\_VIDEOIO\_PRIORITY\_${NAME} | num | | set videoio backend priority, default is 1000 |
| OPENCV\_VIDEOIO\_PRIORITY\_LIST | string, `,`\-separated | | list of videoio backends in priority order |

## plugins

Some external dependencies can be detached into a dynamic library, which will be loaded at runtime (plugin). The following variables allow changing the default search locations and naming pattern for these plugins.

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_CORE\_PLUGIN\_PATH | paths | | directories to search for _core_ plugins |
| OPENCV\_CORE\_PARALLEL\_PLUGIN\_${NAME} | string, glob | | parallel\_for plugin library name (glob), e.g. default for TBB is "opencv\_core\_parallel\_tbb\*.so" |
| OPENCV\_DNN\_PLUGIN\_PATH | paths | | directories to search for _dnn_ plugins |
| OPENCV\_DNN\_PLUGIN\_${NAME} | string, glob | | _dnn_ plugin library name (glob) |
| OPENCV\_CORE\_PLUGIN\_PATH | paths | | directories to search for _highgui_ plugins (YES it is CORE) |
| OPENCV\_UI\_PLUGIN\_${NAME} | string, glob | | _highgui_ plugin library name (glob) |
| OPENCV\_VIDEOIO\_PLUGIN\_PATH | paths | | directories to search for _videoio_ plugins |
| OPENCV\_VIDEOIO\_PLUGIN\_${NAME} | string, glob | | _videoio_ plugin library name (glob) |

## OpenCL

**Note:** OpenCL device specification format is `<Platform>:<CPU|GPU|ACCELERATOR|nothing=GPU/CPU>:<deviceName>`, e.g. `AMD:GPU:`
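
The three-field format in the note can be split and rebuilt as in the sketch below. `OpenCLDeviceSpec`, `parse_device_spec` and `format_device_spec` are hypothetical names introduced here; the special value `disabled` is not handled, and matching of platform or device names is left out because the note does not describe it.

```python
from typing import NamedTuple


class OpenCLDeviceSpec(NamedTuple):
    platform: str
    device_type: str   # "CPU", "GPU", "ACCELERATOR" or "" (nothing = GPU/CPU)
    device_name: str


_DEVICE_TYPES = {"", "CPU", "GPU", "ACCELERATOR"}


def parse_device_spec(spec):
    """Split '<Platform>:<type>:<deviceName>' into its three fields."""
    parts = spec.split(":", 2)
    parts += [""] * (3 - len(parts))  # missing trailing fields become empty
    platform, device_type, device_name = parts
    if device_type not in _DEVICE_TYPES:
        raise ValueError(f"unexpected device type: {device_type!r}")
    return OpenCLDeviceSpec(platform, device_type, device_name)


def format_device_spec(platform="", device_type="", device_name=""):
    """Build a value suitable for OPENCV_OPENCL_DEVICE from its fields."""
    return f"{platform}:{device_type}:{device_name}"
```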

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_OPENCL\_RUNTIME | filepath or `disabled` | | path to OpenCL runtime library (e.g. `OpenCL.dll`, `libOpenCL.so`) |
| ⭐ OPENCV\_OPENCL\_DEVICE | string or `disabled` | | choose specific OpenCL device. See specification format in the note above. See more details in the Links section. |
| OPENCV\_OPENCL\_RAISE\_ERROR | bool | false | raise exception if something fails during OpenCL kernel preparation and execution (Release builds only) |
| OPENCV\_OPENCL\_ABORT\_ON\_BUILD\_ERROR | bool | false | abort if OpenCL kernel compilation failed |
| OPENCV\_OPENCL\_CACHE\_ENABLE | bool | true | enable OpenCL kernel cache |
| OPENCV\_OPENCL\_CACHE\_WRITE | bool | true | allow writing to the cache, otherwise cache will be read-only |
| OPENCV\_OPENCL\_CACHE\_LOCK\_ENABLE | bool | true | use .lock files to synchronize between multiple applications using the same OpenCL cache (may not work on network drives) |
| OPENCV\_OPENCL\_CACHE\_CLEANUP | bool | true | automatically remove old entries from cache (leftovers from older OpenCL runtimes) |
| OPENCV\_OPENCL\_VALIDATE\_BINARY\_PROGRAMS | bool | false | validate loaded binary OpenCL kernels |
| OPENCV\_OPENCL\_DISABLE\_BUFFER\_RECT\_OPERATIONS | bool | true (Apple), false (others) | enable workaround for non-continuous data downloads |
| OPENCV\_OPENCL\_BUILD\_EXTRA\_OPTIONS | string | | pass extra options to OpenCL kernel compilation |
| OPENCV\_OPENCL\_ENABLE\_MEM\_USE\_HOST\_PTR | bool | true | workaround/optimization for buffer allocation |
| OPENCV\_OPENCL\_ALIGNMENT\_MEM\_USE\_HOST\_PTR | num | 4 | parameter for OPENCV\_OPENCL\_ENABLE\_MEM\_USE\_HOST\_PTR |
| OPENCV\_OPENCL\_DEVICE\_MAX\_WORK\_GROUP\_SIZE | num | 0 | allows decreasing maxWorkGroupSize |
| OPENCV\_OPENCL\_PROGRAM\_CACHE | num | 0 | limit number of programs in OpenCL kernel cache |
| OPENCV\_OPENCL\_RAISE\_ERROR\_REUSE\_ASYNC\_KERNEL | bool | false | raise exception if async kernel failed |
| OPENCV\_OPENCL\_BUFFERPOOL\_LIMIT | num | 1 << 27 (Intel device), 0 (others) | limit memory used by buffer pool |
| OPENCV\_OPENCL\_HOST\_PTR\_BUFFERPOOL\_LIMIT | num | | same as OPENCV\_OPENCL\_BUFFERPOOL\_LIMIT, but for HOST\_PTR buffers |
| OPENCV\_OPENCL\_BUFFER\_FORCE\_MAPPING | bool | false | force clEnqueueMapBuffer |
| OPENCV\_OPENCL\_BUFFER\_FORCE\_COPYING | bool | false | force clEnqueueReadBuffer/clEnqueueWriteBuffer |
| OPENCV\_OPENCL\_FORCE | bool | false | force running OpenCL kernel even if usual conditions are not met (e.g. dst.isUMat) |
| OPENCV\_OPENCL\_PERF\_CHECK\_BYPASS | bool | false | force running OpenCL kernel even if usual performance-related conditions are not met (e.g. image is very small) |

### SVM (Shared Virtual Memory) - disabled by default

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_OPENCL\_SVM\_DISABLE | bool | false | disable SVM |
| OPENCV\_OPENCL\_SVM\_FORCE\_UMAT\_USAGE | bool | false | |
| OPENCV\_OPENCL\_SVM\_DISABLE\_UMAT\_USAGE | bool | false | |
| OPENCV\_OPENCL\_SVM\_CAPABILITIES\_MASK | num | | |
| OPENCV\_OPENCL\_SVM\_BUFFERPOOL\_LIMIT | num | | same as OPENCV\_OPENCL\_BUFFERPOOL\_LIMIT, but for SVM buffers |

### Links:

-   [https://github.com/opencv/opencv/wiki/OpenCL-optimizations](https://github.com/opencv/opencv/wiki/OpenCL-optimizations)

## Tracing/Profiling

| name | type | default | description |
| --- | --- | --- | --- |
| ⭐ OPENCV\_TRACE | bool | false | enable trace |
| OPENCV\_TRACE\_LOCATION | string | `OpenCVTrace` | trace file name ("${name}-%03d.txt") |
| OPENCV\_TRACE\_DEPTH\_OPENCV | num | 1 | |
| OPENCV\_TRACE\_MAX\_CHILDREN\_OPENCV | num | 1000 | |
| OPENCV\_TRACE\_MAX\_CHILDREN | num | 1000 | |
| OPENCV\_TRACE\_SYNC\_OPENCL | bool | false | wait for OpenCL kernels to finish |
| OPENCV\_TRACE\_ITT\_ENABLE | bool | true | |
| OPENCV\_TRACE\_ITT\_PARENT | bool | false | set parentID for ITT task |
| OPENCV\_TRACE\_ITT\_SET\_THREAD\_NAME | bool | false | set name for OpenCV's threads "OpenCVThread-%03d" |

### Links:

-   [https://github.com/opencv/opencv/wiki/Profiling-OpenCV-Applications](https://github.com/opencv/opencv/wiki/Profiling-OpenCV-Applications)

## Cache

**Note:** Default tmp location is `%TMPDIR%` (Windows); `$XDG_CACHE_HOME`, `$HOME/.cache`, `/var/tmp`, `/tmp` (others)
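
For non-Windows systems, the fallback chain in the note can be sketched as follows. This assumes the candidates are tried in the listed order and that the first existing directory wins; both points, and the helper name `default_cache_root`, are assumptions for illustration rather than a description of OpenCV's code.

```python
import os
import posixpath


def default_cache_root(env, is_dir=os.path.isdir):
    """Return the first usable cache root, or None if no candidate exists."""
    candidates = []
    if env.get("XDG_CACHE_HOME"):
        candidates.append(env["XDG_CACHE_HOME"])
    if env.get("HOME"):
        candidates.append(posixpath.join(env["HOME"], ".cache"))
    candidates += ["/var/tmp", "/tmp"]
    for candidate in candidates:
        if is_dir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(default_cache_root(os.environ))
```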

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_CACHE\_SHOW\_CLEANUP\_MESSAGE | bool | true | show cache cleanup message |
| OPENCV\_DOWNLOAD\_CACHE\_DIR | path | default tmp location | cache directory for downloaded files (subdirectory `downloads`) |
| OPENCV\_DNN\_IE\_GPU\_CACHE\_DIR | path | default tmp location | cache directory for OpenVINO OpenCL kernels (subdirectory `dnn_ie_cache_${device}`) |
| OPENCV\_OPENCL\_CACHE\_DIR | path | default tmp location | cache directory for OpenCL kernels cache (subdirectory `opencl_cache`) |

## dnn

**Note:** In the table below `dump_base_name` equals `ocv_dnn_net_%05d_%02d`, where the first argument is the internal network ID and the second is the dump level.
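
As a small illustration of the pattern in the note, the printf-style format can be expanded directly in Python. The helper names are hypothetical; the file suffixes are the defaults given for `OPENCV_DNN_NETWORK_DUMP` and `OPENCV_DNN_IE_SERIALIZE`.

```python
DUMP_BASE_NAME = "ocv_dnn_net_%05d_%02d"


def dump_base_name(network_id, dump_level):
    """Expand the base name for a given internal network ID and dump level."""
    return DUMP_BASE_NAME % (network_id, dump_level)


def dump_file_names(network_id, dump_level):
    """Default dump file names derived from the base name."""
    base = dump_base_name(network_id, dump_level)
    return {
        "dot": base + ".dot",
        "ngraph_xml": base + "_ngraph.xml",
        "ngraph_bin": base + "_ngraph.bin",
    }
```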

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_DNN\_BACKEND\_DEFAULT | num | 3 (OpenCV) | set default DNN backend, see dnn.hpp for backends enumeration |
| OPENCV\_DNN\_NETWORK\_DUMP | num | 0 | level of information dumps, 0 - no dumps (default file name `${dump_base_name}.dot`) |
| OPENCV\_DNN\_DISABLE\_MEMORY\_OPTIMIZATIONS | bool | false | |
| OPENCV\_DNN\_CHECK\_NAN\_INF | bool | false | check for NaNs in layer outputs |
| OPENCV\_DNN\_CHECK\_NAN\_INF\_DUMP | bool | false | print layer data when NaN check has failed |
| OPENCV\_DNN\_CHECK\_NAN\_INF\_RAISE\_ERROR | bool | false | also raise exception when NaN check has failed |
| OPENCV\_DNN\_ONNX\_USE\_LEGACY\_NAMES | bool | false | use ONNX node names as-is instead of "onnx\_node!${node\_name}" |
| OPENCV\_DNN\_CUSTOM\_ONNX\_TYPE\_INCLUDE\_DOMAIN\_NAME | bool | true | prepend layer domain to layer types ("domain.type") |
| OPENCV\_VULKAN\_RUNTIME | file path | | set location of Vulkan runtime library for DNN Vulkan backend |
| OPENCV\_DNN\_IE\_SERIALIZE | bool | false | dump intermediate OpenVINO graph (default file names `${dump_base_name}_ngraph.xml`, `${dump_base_name}_ngraph.bin`) |
| OPENCV\_DNN\_IE\_EXTRA\_PLUGIN\_PATH | path | | path to extra OpenVINO plugins |
| OPENCV\_DNN\_IE\_VPU\_TYPE | string | | Force using specific OpenVINO VPU device type ("Myriad2" or "MyriadX") |
| OPENCV\_TEST\_DNN\_IE\_VPU\_TYPE | string | | same as OPENCV\_DNN\_IE\_VPU\_TYPE, but for tests |
| OPENCV\_DNN\_INFERENCE\_ENGINE\_HOLD\_PLUGINS | bool | true | always hold one existing OpenVINO instance to avoid crashes on unloading |
| OPENCV\_DNN\_INFERENCE\_ENGINE\_CORE\_LIFETIME\_WORKAROUND | bool | true (Windows), false (other) | another OpenVINO lifetime workaround |
| OPENCV\_DNN\_OPENCL\_ALLOW\_ALL\_DEVICES | bool | false | allow running on CPU devices, allow FP16 on non-Intel device |
| OPENCV\_OCL4DNN\_CONVOLUTION\_IGNORE\_INPUT\_DIMS\_4\_CHECK | bool | false | workaround for OpenCL backend, see [https://github.com/opencv/opencv/issues/20833](https://github.com/opencv/opencv/issues/20833) |
| OPENCV\_OCL4DNN\_WORKAROUND\_IDLF | bool | true | another workaround for OpenCL backend |
| OPENCV\_OCL4DNN\_CONFIG\_PATH | path | | path to kernel configuration cache for auto-tuning (must be existing directory), set this variable to enable auto-tuning |
| OPENCV\_OCL4DNN\_DISABLE\_AUTO\_TUNING | bool | false | disable auto-tuning |
| OPENCV\_OCL4DNN\_FORCE\_AUTO\_TUNING | bool | false | force auto-tuning |
| OPENCV\_OCL4DNN\_TEST\_ALL\_KERNELS | num | 0 | test convolution kernels, number of iterations (auto-tuning) |
| OPENCV\_OCL4DNN\_DUMP\_FAILED\_RESULT | bool | false | dump extra information on errors (auto-tuning) |
| OPENCV\_OCL4DNN\_TUNING\_RAISE\_CHECK\_ERROR | bool | false | raise exception on errors (auto-tuning) |

## Tests

| name | type | default | description |
| --- | --- | --- | --- |
| ⭐ OPENCV\_TEST\_DATA\_PATH | dir path | | set test data search location (e.g. `/home/user/opencv_extra/testdata`) |
| ⭐ OPENCV\_DNN\_TEST\_DATA\_PATH | dir path | `$OPENCV_TEST_DATA_PATH/dnn` | set DNN model search location for tests (used by _dnn_, _gapi_, _objdetect_, _video_ modules) |
| OPENCV\_OPEN\_MODEL\_ZOO\_DATA\_PATH | dir path | `$OPENCV_DNN_TEST_DATA_PATH/omz_intel_models` | set OpenVINO models search location for tests (used by _dnn_, _gapi_ modules) |
| INTEL\_CVSDK\_DIR | | | some _dnn_ tests can search OpenVINO models here too |
| OPENCV\_TEST\_DEBUG | num | 0 | debug level for tests, same as `--test_debug` (0 - no debug (default), 1 - basic test debug information, >1 - extra debug information) |
| OPENCV\_TEST\_REQUIRE\_DATA | bool | false | same as `--test_require_data` option (fail on missing non-required test data instead of skip) |
| OPENCV\_TEST\_CHECK\_OPTIONAL\_DATA | bool | false | assert when optional data is not found |
| OPENCV\_IPP\_CHECK | bool | false | default value for `--test_ipp_check` and `--perf_ipp_check` |
| OPENCV\_PERF\_VALIDATION\_DIR | dir path | | location of files read/written by `--perf_read_validation_results`/`--perf_write_validation_results` |
| ⭐ OPENCV\_PYTEST\_FILTER | string (glob) | | test filter for Python tests |

### Links:

-   [https://github.com/opencv/opencv/wiki/QA\_in\_OpenCV](https://github.com/opencv/opencv/wiki/QA_in_OpenCV)

## videoio

**Note:** extra FFmpeg options should be passed in the form `key;value|key;value|key;value`, for example `hwaccel;cuvid|video_codec;h264_cuvid|vsync;0` or `vcodec;x264|vprofile;high|vlevel;4.0`
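
The option string from the note can be assembled from a mapping, as in the sketch below. `ffmpeg_options` is a hypothetical helper; rejecting keys or values that contain `;` or `|` is a conservative assumption, since the note does not describe any escaping.

```python
def ffmpeg_options(options):
    """Join key/value pairs into the 'key;value|key;value' form."""
    parts = []
    for key, value in options.items():
        key, value = str(key), str(value)
        if any(ch in key + value for ch in ";|"):
            raise ValueError(f"separator character in option {key!r}")
        parts.append(f"{key};{value}")
    return "|".join(parts)


if __name__ == "__main__":
    # The result would be assigned to OPENCV_FFMPEG_CAPTURE_OPTIONS.
    print(ffmpeg_options({"hwaccel": "cuvid",
                          "video_codec": "h264_cuvid",
                          "vsync": 0}))
```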

| name | type | default | description |
| --- | --- | --- | --- |
| ⭐ OPENCV\_FFMPEG\_CAPTURE\_OPTIONS | string (see note) | | extra options for VideoCapture FFmpeg backend |
| ⭐ OPENCV\_FFMPEG\_WRITER\_OPTIONS | string (see note) | | extra options for VideoWriter FFmpeg backend |
| OPENCV\_FFMPEG\_THREADS | num | | set FFmpeg thread count |
| OPENCV\_FFMPEG\_DEBUG | bool | false | enable logging messages from FFmpeg |
| OPENCV\_FFMPEG\_LOGLEVEL | num | | set FFmpeg logging level |
| OPENCV\_FFMPEG\_SKIP\_LOG\_CALLBACK | bool | false | do not install OpenCV's FFmpeg log callback (preserve default/user callback) |
| OPENCV\_FFMPEG\_DLL\_DIR | dir path | | directory with FFmpeg plugin (legacy) |
| OPENCV\_FFMPEG\_IS\_THREAD\_SAFE | bool | false | enabling this option will turn off thread safety locks in the FFmpeg backend (use only if you are sure FFmpeg is built with threading support, tested on Linux) |
| OPENCV\_FFMPEG\_READ\_ATTEMPTS | num | 4096 | number of failed `av_read_frame` attempts before failing read procedure |
| OPENCV\_FFMPEG\_DECODE\_ATTEMPTS | num | 64 | number of failed `avcodec_receive_frame` attempts before failing decoding procedure |
| OPENCV\_VIDEOIO\_GSTREAMER\_CALL\_DEINIT | bool | false | close GStreamer instance on end |
| OPENCV\_VIDEOIO\_GSTREAMER\_START\_MAINLOOP | bool | false | start GStreamer loop in separate thread |
| OPENCV\_VIDEOIO\_MFX\_IMPL | num | | set specific MFX implementation (see MFX docs for enumeration) |
| OPENCV\_VIDEOIO\_MFX\_EXTRA\_SURFACE\_NUM | num | 1 | add extra surfaces to the surface pool |
| OPENCV\_VIDEOIO\_MFX\_POOL\_TIMEOUT | num | 1 | timeout for waiting for free surface from the pool (in seconds) |
| OPENCV\_VIDEOIO\_MFX\_BITRATE\_DIVISOR | num | 300 | this option allows tuning the encoding bitrate (video quality/size) |
| OPENCV\_VIDEOIO\_MFX\_WRITER\_TIMEOUT | num | 1 | timeout for encoding operation (in seconds) |
| OPENCV\_VIDEOIO\_MSMF\_ENABLE\_HW\_TRANSFORMS | bool | true | allow HW-accelerated transformations (DXVA) in MediaFoundation processing graph (may slow down camera probing process) |
| OPENCV\_DSHOW\_DEBUG | bool | false | enable verbose logging in the DShow backend |
| OPENCV\_DSHOW\_SAVEGRAPH\_FILENAME | file path | | enable processing graph dump in the DShow backend |
| OPENCV\_VIDEOIO\_V4L\_RANGE\_NORMALIZED | bool | false | use (0, 1) range for properties (V4L) |
| OPENCV\_VIDEOIO\_V4L\_SELECT\_TIMEOUT | num | 10 | timeout for select call (in seconds) (V4L) |
| OPENCV\_VIDEOCAPTURE\_DEBUG | bool | false | enable debug messages for VideoCapture |
| OPENCV\_VIDEOWRITER\_DEBUG | bool | false | enable debug messages for VideoWriter |
| ⭐ OPENCV\_VIDEOIO\_DEBUG | bool | false | debug messages for both VideoCapture and VideoWriter |

### videoio tests

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_TEST\_VIDEOIO\_BACKEND\_REQUIRE\_FFMPEG | bool | false | test app will exit if no FFmpeg backend is available |
| OPENCV\_TEST\_V4L2\_VIVID\_DEVICE | file path | | path to VIVID virtual camera device for V4L2 test (e.g. `/dev/video5`) |
| OPENCV\_TEST\_PERF\_CAMERA\_LIST | paths | | cameras to use in performance test (waitAny\_V4L test) |
| OPENCV\_TEST\_CAMERA\_%d\_FPS | num | | fps to set for N-th camera (0-based index) (waitAny\_V4L test) |

## highgui

| name | type | default | description |
| --- | --- | --- | --- |
| $XDG\_RUNTIME\_DIR | | | Wayland backend specific - create shared memory-mapped file for interprocess communication (named `opencv-shared-??????`) |
| OPENCV\_HIGHGUI\_FB\_MODE | string | `FB` | Selects output mode for the framebuffer backend (`FB` - regular framebuffer, `EMU` - emulation, performs internal checks but does nothing, `XVFB` - compatible with _xvfb_ virtual framebuffer) |
| OPENCV\_HIGHGUI\_FB\_DEVICE | file path | | Path to framebuffer device to use (will be checked first) |
| FRAMEBUFFER | file path | `/dev/fb0` | Same as OPENCV\_HIGHGUI\_FB\_DEVICE, commonly used variable for the same purpose (will be checked second) |

## imgproc

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_OPENCL\_IMGPROC\_MORPH\_SPECIAL\_KERNEL | bool | true (Apple), false (others) | use special OpenCL kernel for small morph kernel (Intel devices) |
| OPENCV\_GAUSSIANBLUR\_CHECK\_BITEXACT\_KERNELS | bool | false | validate Gaussian kernels before running (src is CV\_16U, bit-exact version) |

## imgcodecs

| name | type | default | description |
| --- | --- | --- | --- |
| OPENCV\_IMGCODECS\_AVIF\_MAX\_FILE\_SIZE | num | 64MB | limit input AVIF size |
| OPENCV\_IMGCODECS\_WEBP\_MAX\_FILE\_SIZE | num | 64MB | limit input WebP size |
| OPENCV\_IO\_MAX\_IMAGE\_PARAMS | num | 50 | limit maximum allowed number of parameters in imwrite and imencode |
| OPENCV\_IO\_MAX\_IMAGE\_WIDTH | num | 1 << 20 | limit input image size to avoid large memory allocations |
| OPENCV\_IO\_MAX\_IMAGE\_HEIGHT | num | 1 << 20 | |
| OPENCV\_IO\_MAX\_IMAGE\_PIXELS | num | 1 << 30 | |
| OPENCV\_IO\_ENABLE\_JASPER | bool | true (set build option OPENCV\_IO\_FORCE\_JASPER), false (otherwise) | enable Jasper backend |

@note OPENCV\_IO\_ENABLE\_OPENEXR is deprecated because the bundled OpenEXR library has been removed.

## [General Install](https://docharvest.github.io/docs/opencv5/tutorials/introduction/general_install/general_install/)

# OpenCV installation overview {#tutorial\_general\_install}

@next\_tutorial{tutorial\_config\_reference}

@tableofcontents

There are two ways of installing OpenCV on your machine: download a prebuilt version for your platform or compile from sources.

# Prebuilt version {#tutorial\_general\_install\_prebuilt}

In many cases you can find a prebuilt version of OpenCV that will meet your needs.

## Packages by OpenCV core team {#tutorial\_general\_install\_prebuilt\_core}

Packages for Android, iOS and Windows, built with default parameters and recent compilers, are published for each release. They do not contain _opencv\_contrib_ modules.

-   GitHub releases: [https://github.com/opencv/opencv/releases](https://github.com/opencv/opencv/releases)
-   SourceForge.net: [https://sourceforge.net/projects/opencvlibrary/files/](https://sourceforge.net/projects/opencvlibrary/files/)

## Third-party packages {#tutorial\_general\_install\_prebuilt\_thirdparty}

Other organizations and people maintain their own binary distributions of OpenCV. For example:

-   System packages in popular Linux distributions ([https://pkgs.org/search/?q=opencv](https://pkgs.org/search/?q=opencv))
-   PyPI ([https://pypi.org/search/?q=opencv](https://pypi.org/search/?q=opencv))
-   Conda ([https://anaconda.org/search?q=opencv](https://anaconda.org/search?q=opencv))
-   Conan ([https://conan.io/center/recipes/opencv](https://conan.io/center/recipes/opencv))
-   vcpkg ([https://github.com/microsoft/vcpkg/tree/master/ports/opencv](https://github.com/microsoft/vcpkg/tree/master/ports/opencv))
-   NuGet ([https://www.nuget.org/packages?q=opencv](https://www.nuget.org/packages?q=opencv))
-   Brew ([https://formulae.brew.sh/formula/opencv](https://formulae.brew.sh/formula/opencv))
-   Maven ([https://search.maven.org/search?q=opencv](https://search.maven.org/search?q=opencv))

# Build from sources {#tutorial\_general\_install\_sources}

If existing binary packages do not fit your use case, you will have to build a custom version of OpenCV yourself. This section gives a high-level overview of the build process; check the tutorial for your specific platform for actual build instructions.

OpenCV uses [CMake](https://cmake.org/) build management system for configuration and build, so this section mostly describes generalized process of building software with CMake.

## Step 0: Prerequisites {#tutorial\_general\_install\_sources\_0}

Install a C++ compiler and build tools. On \*NIX platforms this is usually the GCC/G++ or Clang compiler and the Make or Ninja build tool. On Windows it can be the Visual Studio IDE or the MinGW-w64 compiler. Native toolchains for Android are provided in the Android NDK. The Xcode IDE is used to build software for macOS and iOS platforms.

Install CMake from the official site or some other source.

Get other third-party dependencies: libraries with extra functionality like decoding videos or showing GUI elements; libraries providing optimized implementations of selected algorithms; tools used for documentation generation and other extras. Check @ref tutorial\_config\_reference for available options and corresponding dependencies.

## Step 1: Get software sources {#tutorial\_general\_install\_sources\_1}

A typical software project consists of one or several code repositories. OpenCV has two repositories with code: _opencv_, the main repository with stable and actively supported algorithms, and _opencv\_contrib_, which contains experimental and non-free (patented) algorithms; and one repository with test data: _opencv\_extra_.

You can download a snapshot of a repository in the form of an archive or clone the repository with full history.

To download snapshot archives:

-   Go to [https://github.com/opencv/opencv/releases](https://github.com/opencv/opencv/releases) and download "Source code" archive from any release.
-   (optionally) Go to [https://github.com/opencv/opencv\_contrib/releases](https://github.com/opencv/opencv_contrib/releases) and download "Source code" archive for the same release as _opencv_
-   (optionally) Go to [https://github.com/opencv/opencv\_extra/releases](https://github.com/opencv/opencv_extra/releases) and download "Source code" archive for the same release as _opencv_
-   Unpack all archives to some location

To clone repositories run the following commands in console (_git_ [must be installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)):

```
git clone https://github.com/opencv/opencv
git -C opencv checkout <some-tag>

# optionally
git clone https://github.com/opencv/opencv_contrib
git -C opencv_contrib checkout <same-tag-as-opencv>

# optionally
git clone https://github.com/opencv/opencv_extra
git -C opencv_extra checkout <same-tag-as-opencv>
```

@note If you want to build software using more than one repository, make sure all components are compatible with each other. For OpenCV it means that _opencv_ and _opencv\_contrib_ repositories must be checked out at the same tag or that all snapshot archives are downloaded from the same release.
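
To keep all repositories on the same tag, the clone commands above can be generated from a single tag value. This is a minimal sketch: `clone_commands` is a hypothetical helper and the tag `4.10.0` is only an example value.

```python
def clone_commands(tag, with_contrib=False, with_extra=False):
    """Return git argument lists that clone each repository at one tag."""
    repos = ["opencv"]
    if with_contrib:
        repos.append("opencv_contrib")
    if with_extra:
        repos.append("opencv_extra")
    commands = []
    for repo in repos:
        commands.append(["git", "clone", f"https://github.com/opencv/{repo}"])
        commands.append(["git", "-C", repo, "checkout", tag])
    return commands


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(..., check=True)
    # to actually execute them.
    for command in clone_commands("4.10.0", with_contrib=True):
        print(" ".join(command))
```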

@note When choosing which version to download, take into account your target platform and development tool versions: the latest versions of OpenCV can have build problems with very old compilers and vice versa. We recommend using the latest release and a fresh OS/compiler combination.

## Step 2: Configure {#tutorial\_general\_install\_sources\_2}

At this step CMake will verify that all necessary tools and dependencies are available and compatible with the library and will generate intermediate files for the chosen build system. It could be Makefiles, IDE projects and solutions, etc. Usually this step is performed in newly created build directory:

```
cmake -G<generator> <configuration-options> <source-directory>
```

@note The `cmake-gui` application lets you view and modify available options using a graphical user interface. See [https://cmake.org/runningcmake/](https://cmake.org/runningcmake/) for details.

## Step 3: Build {#tutorial\_general\_install\_sources\_3}

During build process source files are compiled into object files which are linked together or otherwise combined into libraries and applications. This step can be run using universal command:

```
cmake --build <build-directory> <build-options>
```

... or underlying build system can be called directly:

```
make
```
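
The configure and build commands can also be assembled programmatically, for example in a build script. The sketch below only builds the argument lists; the helper names are hypothetical, and it assumes a CMake version recent enough to support the `-S`/`-B` and `--parallel` options.

```python
def cmake_configure(source_dir, build_dir, generator=None, options=None):
    """Argument list for the configure step."""
    command = ["cmake", "-S", source_dir, "-B", build_dir]
    if generator:
        command += ["-G", generator]
    for name, value in (options or {}).items():
        command.append(f"-D{name}={value}")
    return command


def cmake_build(build_dir, config=None, jobs=None):
    """Argument list for the build step."""
    command = ["cmake", "--build", build_dir]
    if config:
        command += ["--config", config]
    if jobs:
        command += ["--parallel", str(jobs)]
    return command


if __name__ == "__main__":
    print(cmake_configure("opencv", "build", "Ninja",
                          {"CMAKE_BUILD_TYPE": "Release"}))
    print(cmake_build("build", jobs=4))
```

Each list can be passed to `subprocess.run(..., check=True)` so that a failed step stops the script.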

## (optional) Step 4: Install {#tutorial\_general\_install\_sources\_4}

During the installation procedure, build results and other files from the build directory are copied to the install location. The default installation location is `/usr/local` on UNIX and `C:/Program Files` on Windows. This location can be changed at the configuration step by setting the `CMAKE_INSTALL_PREFIX` option. To perform the installation run the following command:

```
cmake --build <build-directory> --target install <other-options>
```

@note This step is optional, OpenCV can be used directly from the build directory.

@note If the installation root location is a protected system directory, the installation process must be run with superuser or administrator privileges (e.g. `sudo cmake ...`).

## (optional) Step 5: Build plugins {#tutorial\_general\_install\_plugins\_4}

It is possible to decouple some of OpenCV's dependencies and make them optional by extracting parts of the code into dynamically loaded plugins. This helps to produce adaptive binary distributions which can work on systems with fewer dependencies and extend functionality just by installing missing libraries. For now the modules _core_, _videoio_ and _highgui_ support this mechanism for some of their dependencies. In some cases it is possible to build plugins together with OpenCV by setting options like `VIDEOIO_PLUGIN_LIST` or `HIGHGUI_PLUGIN_LIST`; more options related to this scenario can be found in the @ref tutorial\_config\_reference. In other cases plugins should be built separately in their own build procedure, and this section describes such a standalone build process.

@note It is recommended to use a compiler, configuration and build options compatible with those used for the OpenCV build; otherwise the resulting library can refuse to load or cause other runtime problems. Note that some functionality can be limited or work slower when backends are loaded dynamically, due to the extra barrier between OpenCV and the corresponding third-party library.

The build procedure is similar to the main OpenCV build, but you have to use special CMake projects located in the corresponding subdirectories; these folders can also contain reference scripts and Docker images. It is important to use the `opencv_<module>_<backend>` name prefix for plugins so that the loader is able to find them. Each supported prefix can be used to load only one library; however, multiple candidates can be probed for a single prefix. For example, you can have _libopencv\_videoio\_ffmpeg\_3.so_ and _libopencv\_videoio\_ffmpeg\_4.so_ plugins, and the first one which can be loaded successfully will occupy the internal slot and stop the probing process. Possible prefixes and project locations are presented in the table below:

| module | backends | location |
| --- | --- | --- |
| core | parallel\_tbb, parallel\_onetbb, parallel\_openmp | _opencv/modules/core/misc/plugins_ |
| highgui | gtk, gtk2, gtk3 | _opencv/modules/highgui/misc/plugins_ |
| videoio | ffmpeg, gstreamer, intel\_mfx, msmf | _opencv/modules/videoio/misc_ |

Example:

```
# set-up environment for TBB detection, for example:
#   export TBB_DIR=<dir-with-tbb-cmake-config>
cmake -G<generator> \
    -DOPENCV_PLUGIN_NAME=opencv_core_parallel_tbb_<suffix> \
    -DOPENCV_PLUGIN_DESTINATION=<dest-folder> \
    -DCMAKE_BUILD_TYPE=<config> \
    <opencv>/modules/core/misc/plugins/parallel_tbb
cmake --build . --config <config>
```

@note On Windows plugins must be linked with existing OpenCV build. Set `OpenCV_DIR` environment or CMake variable to the directory with _OpenCVConfig.cmake_ file, it can be OpenCV build directory or some path in the location where you performed installation.
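
The probing behaviour described in this section (several candidates per prefix, first successful load wins) can be modelled as below. This is an illustrative sketch only: the helper names, the leading `*` that tolerates a `lib` file-name prefix, and the candidate ordering are assumptions, and `can_load` stands in for the real dynamic-library loading step.

```python
import fnmatch


def plugin_glob(module, backend):
    """Glob built from the opencv_<module>_<backend> name prefix."""
    return f"*opencv_{module}_{backend}*"


def pick_plugin(candidates, module, backend, can_load):
    """Return the first matching candidate that loads, or None."""
    pattern = plugin_glob(module, backend)
    for name in candidates:
        if fnmatch.fnmatchcase(name, pattern) and can_load(name):
            return name  # this library occupies the slot; probing stops
    return None


if __name__ == "__main__":
    libs = ["libopencv_videoio_ffmpeg_3.so", "libopencv_videoio_ffmpeg_4.so"]
    # Pretend only the second library can be loaded on this system.
    print(pick_plugin(libs, "videoio", "ffmpeg",
                      can_load=lambda name: name.endswith("_4.so")))
```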

## [Java Eclipse](https://docharvest.github.io/docs/opencv5/tutorials/introduction/java_eclipse/java_eclipse/)

# Using OpenCV Java with Eclipse {#tutorial\_java\_eclipse}

@prev\_tutorial{tutorial\_java\_dev\_intro} @next\_tutorial{tutorial\_clojure\_dev\_intro}

|    |    |
| -: | :- |
| Original author | Barış Evrim Demiröz |
| Compatibility | OpenCV >= 3.0 |

@tableofcontents

@warning This tutorial can contain obsolete information.

Since version 2.4.4 [OpenCV supports Java](http://opencv.org/opencv-java-api.html). In this tutorial I will explain how to set up a development environment for using OpenCV Java with Eclipse on **Windows**, so you can enjoy the benefits of a garbage-collected, very refactorable (rename variable, extract method and whatnot) modern language that enables you to write code with less effort and make fewer mistakes. Here we go.

## Configuring Eclipse

First, obtain a fresh release of OpenCV [from download page](https://opencv.org/releases) and extract it under a simple location like `C:\OpenCV-2.4.6\`. I am using version 2.4.6, but the steps are more or less the same for other versions.

Now, we will define OpenCV as a user library in Eclipse, so we can reuse the configuration for any project. Launch Eclipse and select Window --> Preferences from the menu.

Navigate under Java --> Build Path --> User Libraries and click New....

Enter a name, e.g. OpenCV-2.4.6, for your new library.

Now select your new user library and click Add External JARs....

Browse to `C:\OpenCV-2.4.6\build\java\` and select opencv-246.jar. After adding the jar, expand the opencv-246.jar entry, select Native library location and press Edit....

Select External Folder... and browse to select the folder `C:\OpenCV-2.4.6\build\java\x64`. If you have a 32-bit system you need to select the x86 folder instead of x64.

Your user library configuration should look like this:

## Testing the configuration on a new Java project

Now start creating a new Java project.

On the Java Settings step, under Libraries tab, select Add Library... and select OpenCV-2.4.6, then click Finish.

Libraries should look like this:

Now that you have created and configured a new Java project, it is time to test it. Create a new Java file. Here is some starter code for your convenience:

@code{.java}
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class Hello
{
   public static void main( String[] args )
   {
      System.loadLibrary( Core.NATIVE_LIBRARY_NAME );
      Mat mat = Mat.eye( 3, 3, CvType.CV_8UC1 );
      System.out.println( "mat = " + mat.dump() );
   }
}
@endcode

When you run the code you should see a 3x3 identity matrix as output.

That is it, whenever you start a new project just add the OpenCV user library that you have defined to your project and you are good to go. Enjoy your powerful, less painful development environment :)

## Running Java code with OpenCV and MKL dependency

You may get the following error (e.g. on Ubuntu) if OpenCV was built with the Intel MKL library and your Java code calls OpenCV functions that use it:

> Intel MKL FATAL ERROR: Cannot load libmkl\_avx2.so or libmkl\_def.so.

One way to solve this on Linux is to preload the Intel MKL libraries (either run the command in a terminal or add it to your `.bashrc` file). Your command line should be similar to this (add `$LD_PRELOAD:` before the paths if you have already set the `LD_PRELOAD` variable):

> export LD\_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl\_core.so:/opt/intel/mkl/lib/intel64/libmkl\_sequential.so

Then run the Eclipse IDE from a terminal that has this environment variable set (check with `echo $LD_PRELOAD`) and the error should disappear.
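
If the IDE is started from a launcher script, the `LD_PRELOAD` value can be composed in code. The sketch below keeps any existing value in front, as suggested above; `ld_preload_value` is a hypothetical helper and the library paths are the example paths from this section.

```python
import os


def ld_preload_value(libraries, existing=""):
    """Join library paths with ':', keeping an existing value first."""
    entries = [existing] if existing else []
    entries.extend(libraries)
    return ":".join(entries)


if __name__ == "__main__":
    mkl_libs = [
        "/opt/intel/mkl/lib/intel64/libmkl_core.so",
        "/opt/intel/mkl/lib/intel64/libmkl_sequential.so",
    ]
    # Pass the result as LD_PRELOAD in the environment of the launched IDE.
    print(ld_preload_value(mkl_libs, os.environ.get("LD_PRELOAD", "")))
```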

## [Linux Eclipse](https://docharvest.github.io/docs/opencv5/tutorials/introduction/linux_eclipse/linux_eclipse/)


## [Linux Gcc Cmake](https://docharvest.github.io/docs/opencv5/tutorials/introduction/linux_gcc_cmake/linux_gcc_cmake/)


## [Linux Gdb Pretty Printer](https://docharvest.github.io/docs/opencv5/tutorials/introduction/linux_gdb_pretty_printer/linux_gdb_pretty_printer/)

# Using OpenCV with gdb-powered IDEs {#tutorial\_linux\_gdb\_pretty\_printer}

@prev\_tutorial{tutorial\_oneapi\_install} @next\_tutorial{tutorial\_linux\_gcc\_cmake}

Original author

Egor Smirnov

Compatibility

OpenCV >= 4.0

@tableofcontents

# Capabilities {#tutorial\_linux\_gdb\_pretty\_printer\_capabilities}

This pretty-printer can show the element type, the `is_continuous` and `is_submatrix` flags, and the (possibly truncated) matrix contents. It is known to work in CLion, VS Code and gdb.

# Installation {#tutorial\_linux\_gdb\_pretty\_printer\_installation}

Move into `opencv/samples/gdb/`. Place `mat_pretty_printer.py` in a convenient place, rename `gdbinit` to `.gdbinit` and move it into your home folder. Change the `source` line of `.gdbinit` so that it points to your `mat_pretty_printer.py` path.

To check the version of Python bundled with your gdb, use the following commands from the gdb shell:

```
python
import sys
print(sys.version_info)
end
```

If the version of Python 3 installed on your system doesn't match the version in gdb, create a new virtual environment with the exact same version, install `numpy` in it, and change the path to python3 in `.gdbinit` accordingly.
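As a rough aid, the comparison can be done with a few lines of Python run in the interpreter you intend to reference from `.gdbinit`. The helper name `same_python` is hypothetical, and comparing only the major and minor numbers is an assumption; compare the full tuple if you want a stricter check.

```python
import sys


def same_python(gdb_version, candidate=None):
    """Return True when two version tuples agree on (major, minor).

    `gdb_version` is the tuple printed by `print(sys.version_info)` inside gdb;
    `candidate` defaults to the interpreter running this script.
    """
    if candidate is None:
        candidate = sys.version_info
    return tuple(gdb_version[:2]) == tuple(candidate[:2])


if __name__ == "__main__":
    # Replace this tuple with the values reported inside gdb.
    reported_by_gdb = tuple(sys.version_info[:3])
    print("match" if same_python(reported_by_gdb) else "mismatch")
```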

# Usage {#tutorial\_linux\_gdb\_pretty\_printer\_usage}

The fields prefixed with `view_` in the debugger are pseudo-fields added for convenience; the rest are left as is. If you feel that the number of elements in the truncated view is too low, you can edit `mat_pretty_printer.py`: `np.set_printoptions` controls everything related to matrix display.
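To get a feel for how NumPy's print options affect truncation, the sketch below renders the same array with two settings. The values are illustrative only and are not the ones used in `mat_pretty_printer.py`; `render` is a hypothetical helper that uses the `np.printoptions` context manager so the global settings stay untouched.

```python
import numpy as np


def render(mat, threshold, edgeitems):
    """Format `mat` with temporary print options and return the text."""
    # threshold: arrays with more elements than this are summarized with "..."
    # edgeitems: how many items are shown at each end of a summarized axis
    with np.printoptions(threshold=threshold, edgeitems=edgeitems):
        return str(mat)


if __name__ == "__main__":
    data = np.arange(100)
    print(render(data, threshold=10, edgeitems=2))    # summarized view
    print(render(data, threshold=1000, edgeitems=2))  # full view
```

Raising `threshold` (or `edgeitems`) in the `np.set_printoptions` call of the pretty-printer should widen the truncated view in the same way.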

## [Linux Install](https://docharvest.github.io/docs/opencv5/tutorials/introduction/linux_install/linux_install/)


## [Load Save Image](https://docharvest.github.io/docs/opencv5/tutorials/introduction/load_save_image/load_save_image/)

# Load, Modify, and Save an Image {#tutorial\_load\_save\_image}

Tutorial content has been moved: @ref tutorial\_display\_image

## [Macos Install](https://docharvest.github.io/docs/opencv5/tutorials/introduction/macos_install/macos_install/)

# Installation in MacOS {#tutorial\_macos\_install}

@prev\_tutorial{tutorial\_android\_ocl\_intro} @next\_tutorial{tutorial\_arm\_crosscompile\_with\_cmake}

Original author

`@sajarindider`

Compatibility

OpenCV >= 3.4

The following steps have been tested for macOS (Mavericks) but should work with other versions as well.

## Required Packages

-   CMake 3.9 or higher
-   Git
-   Python 3.x and NumPy 1.5 or later

This tutorial will assume you have [Python](https://docs.python.org/3/using/mac.html), [NumPy](https://numpy.org/install/) and [Git](https://git-scm.com/downloads/mac) installed on your machine.

@note

-   macOS up to 12.2 (Monterey): Comes with Python 2.7 pre-installed.
-   macOS 12.3 and later: Python 2.7 has been removed, and no version of Python is included by default.

It is recommended to install the latest version of Python 3.x (at least Python 3.8) for compatibility with the latest OpenCV Python bindings.

@note If you have Xcode and Xcode Command Line Tools installed, Git is already available on your machine.

## Installing CMake

\-# Find the version for your system and download CMake from the [CMake download page](https://cmake.org/download/)

\-# Install the `.dmg` package and launch it from Applications. That will give you the UI app of CMake

\-# From the CMake app window, choose menu Tools --> How to Install For Command Line Use. Then, follow the instructions from the pop-up there.

\-# The install folder will be `/usr/local/bin/` by default. Complete the installation by choosing Install command line links.

\-# Test that CMake is installed correctly by running:

```
@code{.bash}
cmake --version
@endcode
```

@note You can use [Homebrew](https://brew.sh/) to install CMake with:

```
@code{.bash}
brew install cmake
@endcode
```

## Getting OpenCV Source Code

You can use the latest stable OpenCV version or you can grab the latest snapshot from our [Git repository](https://github.com/opencv/opencv.git).

### Getting the Latest Stable OpenCV Version

-   Go to our [OpenCV releases page](https://opencv.org/releases).
-   Download the source archive of the latest version (e.g., OpenCV 4.x) and unpack it.

### Getting the Cutting-edge OpenCV from the Git Repository

Launch Git client and clone [OpenCV repository](https://github.com/opencv/opencv). If you need modules from [OpenCV contrib repository](https://github.com/opencv/opencv_contrib) then clone it as well.

For example:

```
@code{.bash}
cd ~/<your_working_directory>
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
@endcode
```

## Building OpenCV from Source Using CMake

\-# Create a temporary directory, which we denote as `build_opencv`, where you want to put the generated Makefiles and project files as well as the object files and output binaries, and enter it.

```
For example:

@code{.bash}
mkdir build_opencv
cd build_opencv
@endcode

@note It is good practice to keep your source code directories clean. Create the build directory outside of the source tree.
```

\-# Configuring. Run `cmake [<some optional parameters>] <path to the OpenCV source directory>`

```
For example:

@code{.bash}
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXAMPLES=ON ../opencv
@endcode

Alternatively, you can use the CMake GUI (`cmake-gui`):

-   set the OpenCV source code path to, e.g. `/Users/your_username/opencv`
-   set the binary build path to your CMake build directory, e.g. `/Users/your_username/build_opencv`
-   set optional parameters
-   run: "Configure"
-   run: "Generate"
```

\-# Description of some parameters:

-   build type: `-DCMAKE_BUILD_TYPE=Release` (or `Debug`).
-   include extra modules: if you cloned the `opencv_contrib` repository and want to include its modules, set:

```
@code{.bash}
-DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules
@endcode
```

-   set `-DBUILD_DOCS=ON` for building documents (doxygen is required)
-   set `-DBUILD_EXAMPLES=ON` to build all examples

\-# \[optional\] Building python. Set the following python parameters:

-   `-DPYTHON3_EXECUTABLE=$(which python3)`
-   `-DPYTHON3_INCLUDE_DIR=$(python3 -c "from sysconfig import get_paths as gp; print(gp()['include'])")`
-   `-DPYTHON3_NUMPY_INCLUDE_DIRS=$(python3 -c "import numpy; print(numpy.get_include())")`
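The three Python parameters can also be collected by one small script run with the interpreter you want OpenCV to build against. This is a sketch, not part of the OpenCV build system: `python_cmake_flags` is a hypothetical helper, and it uses `sys.executable` where the shell version uses `which python3`.

```python
import sys
import sysconfig


def python_cmake_flags():
    """Build the -DPYTHON3_* options for the interpreter running this script."""
    flags = [
        f"-DPYTHON3_EXECUTABLE={sys.executable}",
        f"-DPYTHON3_INCLUDE_DIR={sysconfig.get_paths()['include']}",
    ]
    try:
        import numpy  # only needed for the NumPy include directory
    except ImportError:
        return flags  # NumPy missing: leave that option out
    flags.append(f"-DPYTHON3_NUMPY_INCLUDE_DIRS={numpy.get_include()}")
    return flags


if __name__ == "__main__":
    # Paths are printed unquoted; quote them yourself if they contain spaces.
    print(" ".join(python_cmake_flags()))
```

The printed options can be appended to the `cmake` command line from the configuration step.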

\-# Build. From the build directory, run _make_; running it with several parallel jobs is recommended

```
For example:

@code{.bash}
make -j$(sysctl -n hw.ncpu) # runs the build using all available CPU cores
@endcode
```

\-# After building, you can install OpenCV system-wide using:

```
@code{.bash}
sudo make install
@endcode
```

\-# To use OpenCV in your CMake-based projects through `find_package(OpenCV)`, specify the `OpenCV_DIR` variable pointing to the build or install directory.

```
For example:

@code{.bash}
cmake -DOpenCV_DIR="$HOME/build_opencv" ..  # the shell does not expand ~ inside a -D argument
@endcode
```

### Verifying the OpenCV Installation

After building (and optionally installing) OpenCV, you can verify the installation by checking the version using Python:

```
@code{.bash}
python3 -c "import cv2; print(cv2.__version__)"
@endcode
```

This command should output the version of OpenCV you have installed.
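The same check can be written as a short script that reports a missing module instead of raising a traceback. The helper name `opencv_version` is hypothetical.

```python
def opencv_version():
    """Return the cv2 version string, or None if cv2 cannot be imported."""
    try:
        import cv2
    except ImportError:
        return None
    return cv2.__version__


if __name__ == "__main__":
    version = opencv_version()
    if version is None:
        print("cv2 is not importable from this Python interpreter")
    else:
        print(f"OpenCV {version}")
```

If the module is not found after `make install`, the interpreter you run is probably not the one passed as `PYTHON3_EXECUTABLE` during configuration.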

@note You can also use a package manager like [Homebrew](https://brew.sh/) or [pip](https://pip.pypa.io/en/stable/) to install releases of OpenCV only (Not the cutting edge).

-   Installing via Homebrew:
    
    For example:
    
    @code{.bash} brew install opencv @endcode
    
-   Installing via pip:
    
    For example:
    
    @code{.bash} pip install opencv-python @endcode
    
    @note To access the extra modules from `opencv_contrib`, install the `opencv-contrib-python` package using `pip install opencv-contrib-python`.

## [Oneapi Install](https://docharvest.github.io/docs/opencv5/tutorials/introduction/oneapi_install/oneapi_install/)

# Building OpenCV with oneAPI {#tutorial\_oneapi\_install}

@prev\_tutorial{tutorial\_linux\_install} @next\_tutorial{tutorial\_linux\_gcc\_cmake}

Original author

Alessandro de Oliveira Faria

Compatibility

OpenCV >= 4.11.0

@tableofcontents

# Quick start {#tutorial\_oneapi\_install\_quick\_start}

**oneAPI** is Intel's open initiative (now also maintained by the UXL Foundation) that combines a specification and a set of toolkits for programming CPUs, GPUs, FPGAs and NPUs with a single code base. The core is the SYCL standard (single-source C++ for parallelism), complemented by high-performance libraries — oneTBB (parallelism), oneMKL (linear algebra), oneDNN (neural networks), oneVPL (video), etc. Thus, when you compile with oneAPI's DPC++ (icpx) compiler, the binary gains optimized execution paths that choose, at runtime, the best vector instructions or the available device, without changing the source code.

## Why compile OpenCV with the oneAPI ecosystem when targeting the CPU:

-   It is simple: by enabling the CMake options `-DWITH_SYCL=ON -DWITH_TBB=ON -DWITH_ONEDNN=ON -DWITH_IPP=ON` and using the icpx compiler, the OpenCV core starts to invoke oneAPI libraries directly.
-   oneDNN replaces the generic kernels of the cv::dnn layer with implementations that exploit AVX2, AVX-512, AMX and VNNI, accelerating convolutions, matmul and network post-processing by up to 3-5× on modern CPUs.
-   oneTBB takes over the thread pool, scheduling filters like cv::resize, cv::GaussianBlur or the G-API pipeline across all cores without busy-wait.
-   IPP (now distributed via oneAPI Base Toolkit) provides optimized intrinsic routines for elementary operations (SAD, DFT, median blur), which OpenCV calls when it encounters the HAVE\_IPP macro.
-   All this happens transparently: the source code that uses cv::Mat remains the same, but the linked symbols point to vectorized versions, and the internal dispatcher selects the appropriate vector width at runtime.

## CPU Processor Requirements

Systems based on Intel® 64 architectures below are supported both as host and target platforms.

-   Intel® Core™ processor family or higher
-   Intel® Xeon® processor family
-   Intel® Xeon® Scalable processor family

### Requirements for Accelerators

-   Integrated GEN9 (and higher) GPUs. See source in Intel® Graphics Compiler for OpenCL™
-   FPGA Card: see Intel(R) DPC++ Compiler System Requirements.

### Disk Space Requirements

-   3.3 GB of disk space (minimum) on a standard installation.

@note: During the installation process, the installer may need up to 6 GB of additional temporary disk storage to manage the download and intermediate installation files.

### Memory Requirements

-   8 GB RAM recommended

## How To install oneAPI

To quickly set up the oneAPI ecosystem on openSUSE, follow the official guide [https://www.intel.com/content/www/us/en/developer/articles/guide/installation-guide-for-oneapi-toolkits.html](https://www.intel.com/content/www/us/en/developer/articles/guide/installation-guide-for-oneapi-toolkits.html), which shows how to enable the distribution's dedicated repository (zypper ar … oneAPI) and install the metapackages, for example intel-basekit (DPC++ compiler, TBB, oneDNN, IPP) and, optionally, intel-hpckit or intel-renderkit if you need HPC or graphics tools. The guide also explains post-installation steps, such as loading the environment with `source /opt/intel/oneapi/setvars.sh`, so that the binaries (icpx, dpcpp) and libraries are immediately available in your shell for compiling and running accelerated applications.

## Download, Github Instruction, Build and Install

1.  Use the command below to download the latest version (the latest release as of the publication of this text):

```
git clone https://github.com/opencv/opencv.git
```

2.  Make sure you are on the `4.x` branch:

```
git status
On branch 4.x
```

3.  Navigate to OpenCV repository and prepare the build folder:

```
cd opencv
mkdir build
cd build
```

4.  Set up Intel oneAPI environment variables. For default installation:

```
source /opt/intel/oneapi/setvars.sh
```

5.  Run CMake with the Intel® oneAPI DPC++/C++ Compiler to configure the project, then build it:

```
 cmake -DCMAKE_C_COMPILER=icx \
       -DCMAKE_CXX_COMPILER=icpx \
       -DCMAKE_CXX_FLAGS="-march=native -mavx -mfma -msse -msse2" ..
 cmake --build .
```

6.  Now make sure OpenCV is compiled with the Intel® oneAPI DPC++/C++ Compiler, then install:

```
readelf -p .comment bin/opencv_annotation
String dump of section '.comment':
  [     0]  GCC: (SUSE Linux) 13.3.1 20250313 [revision 4ef1d8c84faeebffeb0cc01ee22e891b41e5c4e0]
  [    56]  GCC: (SUSE Linux) 12.3.0
  [    6f]  Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)
make install
```
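The manual inspection of the `.comment` section can be automated. The sketch below parses a `readelf -p .comment` dump and looks for the compiler string shown above; the function names are hypothetical, and the sample text is copied from the output in step 6.

```python
import re

# Sample dump copied from the readelf output shown above.
SAMPLE = """String dump of section '.comment':
  [     0]  GCC: (SUSE Linux) 13.3.1 20250313 [revision 4ef1d8c84faeebffeb0cc01ee22e891b41e5c4e0]
  [    56]  GCC: (SUSE Linux) 12.3.0
  [    6f]  Intel(R) oneAPI DPC++/C++ Compiler 2025.1.1 (2025.1.1.20250418)
"""

# An entry line looks like "  [  <hex offset>]  <string>".
ENTRY = re.compile(r"^\s*\[\s*[0-9a-fA-F]+\]\s+(.*\S)\s*$")


def comment_strings(dump):
    """Extract the strings listed in a `readelf -p .comment` dump."""
    found = []
    for line in dump.splitlines():
        match = ENTRY.match(line)
        if match:
            found.append(match.group(1))
    return found


def built_with_oneapi(dump):
    """Return True if any entry names the oneAPI DPC++/C++ compiler."""
    return any("oneAPI DPC++/C++ Compiler" in s for s in comment_strings(dump))


if __name__ == "__main__":
    for entry in comment_strings(SAMPLE):
        print(entry)
    print("oneAPI compiler found:", built_with_oneapi(SAMPLE))
```

To check a real binary, capture the dump with `subprocess.run(["readelf", "-p", ".comment", path], capture_output=True, text=True).stdout` and pass it to `built_with_oneapi`.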

Have fun...

## [Table Of Content Introduction](https://docharvest.github.io/docs/opencv5/tutorials/introduction/table_of_content_introduction/)

# Introduction to OpenCV {#tutorial\_table\_of\_content\_introduction}

@tableofcontents

-   @subpage tutorial\_general\_install
-   @subpage tutorial\_config\_reference
-   @subpage tutorial\_env\_reference

##### Linux

-   @subpage tutorial\_linux\_install
-   @subpage tutorial\_oneapi\_install
-   @subpage tutorial\_linux\_gdb\_pretty\_printer
-   @subpage tutorial\_linux\_gcc\_cmake
-   @subpage tutorial\_linux\_eclipse

##### Windows

-   @subpage tutorial\_windows\_install
-   @subpage tutorial\_windows\_visual\_studio\_opencv
-   @subpage tutorial\_windows\_visual\_studio\_image\_watch
-   @subpage tutorial\_windows\_msys2\_vscode
-   @subpage tutorial\_windows\_armpl

##### Java & Android

-   @subpage tutorial\_java\_dev\_intro
-   @subpage tutorial\_java\_eclipse
-   @subpage tutorial\_clojure\_dev\_intro
-   @subpage tutorial\_android\_dev\_intro
-   @subpage tutorial\_dev\_with\_OCV\_on\_Android
-   @subpage tutorial\_android\_dnn\_intro
-   @subpage tutorial\_android\_ocl\_intro

##### Other platforms

-   @subpage tutorial\_macos\_install
-   @subpage tutorial\_arm\_crosscompile\_with\_cmake
-   @subpage tutorial\_crosscompile\_with\_multiarch
-   @subpage tutorial\_building\_tegra\_cuda
-   @subpage tutorial\_building\_fastcv
-   @ref tutorial\_ios\_install

##### Usage basics

-   @subpage tutorial\_display\_image - We will learn how to load an image from file and display it using OpenCV

##### Miscellaneous

-   @subpage tutorial\_documentation - This tutorial describes new documenting process and some useful Doxygen features.
-   @subpage tutorial\_transition\_guide - This document describes some aspects of 2.4 -> 3.0 transition process.
-   @subpage tutorial\_cross\_referencing - This document outlines how to create cross references to the OpenCV documentation from other Doxygen projects.

## [Transition Guide](https://docharvest.github.io/docs/opencv5/tutorials/introduction/transition_guide/transition_guide/)

# Transition guide {#tutorial\_transition\_guide}

@prev\_tutorial{tutorial\_documentation} @next\_tutorial{tutorial\_cross\_referencing}

Compatibility

OpenCV >= 5.0

@tableofcontents

# Changes overview {#tutorial\_transition\_overview}

This document is intended for software developers who want to migrate their code to OpenCV 5.0.

**TODO**

## [Windows Armpl Opencv](https://docharvest.github.io/docs/opencv5/tutorials/introduction/windows_armpl/windows_armpl_opencv/)

# Building OpenCV with ARM Performance Libraries (ARMPL) on Windows {#tutorial\_windows\_armpl}

@prev\_tutorial{tutorial\_windows\_install} @next\_tutorial{tutorial\_linux\_install}

@tableofcontents

# Introduction {#tutorial\_windows\_armpl\_intro}

This tutorial explains how to build OpenCV on Windows (AArch64) with [ARM Performance Libraries (ARMPL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a math backend. ARMPL provides optimized BLAS and LAPACK routines for Arm-based hardware and can significantly accelerate OpenCV operations such as DFT and DCT.

# Step 1: Download and Install ARM Performance Libraries {#tutorial\_windows\_armpl\_download}

1.  Open a browser and go to the [ARM Performance Libraries Downloads page](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries#Downloads).
    
2.  Under **Windows / AArch64**, download the installer for your preferred toolchain:
    
    | File | Architecture | Size |
    | --- | --- | --- |
    | `arm-performance-libraries_26.01_Windows.msi` | AArch64 | ~240 MiB |
    
3.  Run the downloaded `.msi` installer and follow the on-screen instructions. The default installation directory is:
    
    ```
    C:\Program Files\Arm Performance Libraries\armpl_26.01
    ```
    

# Step 2: Configure System Environment Variables {#tutorial\_windows\_armpl\_env}

OpenCV's CMake scripts (and the ARMPL runtime itself) need to find the library files at both build time and run time. Add the following entries to the **System** `PATH` variable:

1.  Open **System Properties**, click **Advanced**, then **Environment Variables**.
    
2.  Under **System variables**, select `Path` and click **Edit**.
    
3.  Add the two paths below (adjust the version number if yours differs):
    
    ```
    C:\Program Files\Arm Performance Libraries\armpl_26.01\lib
    C:\Program Files\Arm Performance Libraries\armpl_26.01\bin
    ```
    
4.  Click **OK** on every dialog to save.
    

# Step 3: Clone OpenCV {#tutorial\_windows\_armpl\_clone}

```
git clone https://github.com/opencv/opencv.git
cd opencv
```

If you also need the extra modules:

```
git clone https://github.com/opencv/opencv_contrib.git
```

# Step 4: Configure with CMake {#tutorial\_windows\_armpl\_cmake}

Create a build directory and run CMake with ARMPL support enabled.

**Without OpenMP (single-threaded ARMPL):**

```
mkdir build && cd build

cmake -G "Visual Studio 17 2022" -A ARM64 ^
      -DWITH_ARMPL=ON ^
      -DARMPL_ROOT_DIR="C:\Program Files\Arm Performance Libraries\armpl_26.01" ^
      -DWITH_OPENMP=OFF ^
      ..
```

**With OpenMP (multi-threaded ARMPL):**

ARMPL ships both serial and OpenMP-enabled library variants. To use the multi-threaded variant, enable OpenMP in CMake:

```
mkdir build && cd build

cmake -G "Visual Studio 17 2022" -A ARM64 ^
      -DWITH_ARMPL=ON ^
      -DARMPL_ROOT_DIR="C:\Program Files\Arm Performance Libraries\armpl_26.01" ^
      -DWITH_OPENMP=ON ^
      ..
```

@note Enabling `WITH_OPENMP=ON` causes CMake to link against the `armpl_lp64_mp` (multi-threaded) variant of ARMPL. Disabling it links against the serial `armpl_lp64` variant. Only one variant should be enabled at a time to avoid symbol conflicts.

# Step 5: Build and Install {#tutorial\_windows\_armpl\_build}

Open the generated `.sln` file in Visual Studio and build the **Release** configuration, or build from the command line:

```
cmake --build . --config Release --parallel
cmake --install . --config Release
```

# Step 6: Verify the Build {#tutorial\_windows\_armpl\_verify}

After a successful build, confirm that OpenCV detects ARMPL by running:

```
opencv_version --verbose 2>&1 | findstr /i armpl
```

You should see a line similar to:

```
  ARMPL:                       YES (armpl_26.01)
```

Alternatively, check the CMake configuration log for the line:

```
--   ARMPL support:             YES
```
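If the Python bindings were built, the build summary is also available through `cv2.getBuildInformation()`, so the check can be scripted. This is a sketch: `config_lines` is a hypothetical helper, and the exact wording of the ARMPL line is assumed to match the sample output above.

```python
def config_lines(build_info, keyword):
    """Return the stripped lines of a build summary that mention `keyword`."""
    needle = keyword.lower()
    return [ln.strip() for ln in build_info.splitlines() if needle in ln.lower()]


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable; use opencv_version or the CMake log instead")
    else:
        matches = config_lines(cv2.getBuildInformation(), "armpl")
        print("\n".join(matches) if matches else "no ARMPL entry in the build summary")
```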

# Troubleshooting {#tutorial\_windows\_armpl\_troubleshoot}

**CMake cannot find ARMPL:**

Make sure `ARMPL_ROOT_DIR` points to the folder that contains both `include\` and `lib\` sub-directories:

```
C:\Program Files\Arm Performance Libraries\armpl_26.01
    bin\
    include\
    lib\
```
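A quick way to confirm the layout before re-running CMake is to test for the expected sub-directories. The helper `missing_subdirs` is hypothetical, and the root path is the default install location from Step 1.

```python
from pathlib import Path

# Sub-directories that ARMPL_ROOT_DIR is expected to contain (from the text above).
REQUIRED = ("include", "lib")


def missing_subdirs(root, required=REQUIRED):
    """Return the names in `required` that are not directories under `root`."""
    base = Path(root)
    return [name for name in required if not (base / name).is_dir()]


if __name__ == "__main__":
    root = r"C:\Program Files\Arm Performance Libraries\armpl_26.01"
    missing = missing_subdirs(root)
    print("layout looks fine" if not missing else f"missing under {root}: {missing}")
```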

**Runtime error: DLL not found:**

Ensure that both the `lib\` and `bin\` directories are on the system `PATH` and that you opened a new Command Prompt after adding them (changes are not picked up by already-open sessions).
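The `PATH` check can be scripted from a freshly opened Command Prompt. In this sketch, `missing_from_path` is a hypothetical helper; it compares entries case-insensitively and ignores trailing slashes, which is an assumption that suits Windows paths.

```python
import os


def _norm(path):
    """Normalize a PATH entry: drop quotes and trailing slashes, ignore case."""
    return path.strip().strip('"').rstrip("\\/").lower()


def missing_from_path(required, path_value=None, sep=os.pathsep):
    """Return the directories in `required` that do not appear in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    present = {_norm(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in required if _norm(d) not in present]


if __name__ == "__main__":
    root = r"C:\Program Files\Arm Performance Libraries\armpl_26.01"
    needed = [root + r"\lib", root + r"\bin"]
    print("not on PATH:", missing_from_path(needed))
```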

**Linker errors with OpenMP:**

If you see duplicate symbol errors when `WITH_OPENMP=ON`, make sure you are not also linking against the serial ARMPL library. Pass `-DWITH_OPENMP=ON` consistently and clean the build directory before re-running CMake.

# See also {#tutorial\_windows\_armpl\_seealso}

-   @ref tutorial\_windows\_install - Generic Windows build guide
-   [ARM Performance Libraries documentation](https://developer.arm.com/documentation/101004/)
-   @ref tutorial\_general\_install - General installation guide

## [Windows Install](https://docharvest.github.io/docs/opencv5/tutorials/introduction/windows_install/windows_install/)

# Installation in Windows {#tutorial\_windows\_install}

@prev\_tutorial{tutorial\_linux\_eclipse} @next\_tutorial{tutorial\_windows\_visual\_studio\_opencv}

Original author

Bernát Gábor

Compatibility

OpenCV >= 3.0

@tableofcontents

@warning This tutorial can contain obsolete information.

The description here was tested on Windows 7 SP1. Nevertheless, it should also work on any other relatively modern version of Windows OS. If you encounter errors after following the steps described below, feel free to contact us via our [OpenCV Q&A forum](https://forum.opencv.org). We'll do our best to help you out.

@note To use the OpenCV library you have two options: @ref tutorial\_windows\_install\_prebuilt or @ref tutorial\_windows\_install\_build. While the first one is easier to complete, it only works if you are coding with the latest Microsoft Visual Studio IDE and do not take advantage of the most advanced technologies we integrate into our library.

## Installation by Using the Pre-built Libraries {#tutorial\_windows\_install\_prebuilt}

\-# Launch a web browser of choice and go to our [page on Sourceforge](http://sourceforge.net/projects/opencvlibrary/files/).

\-# Choose a build you want to use and download it.

\-# Make sure you have admin rights. Unpack the self-extracting archive.

\-# You can check the installation at the chosen path as you can see below.

![](images/OpenCV_Install_Directory.png)

\-# To finalize the installation go to the @ref tutorial\_windows\_install\_path section.

## Installation by Using git-bash (version>=2.14.1) and cmake (version >=3.9.1) {#tutorial\_windows\_gitbash\_build}

\-# You must download [cmake (version >=3.9.1)](https://cmake.org) and install it. You must add cmake to PATH variable during installation

\-# You must install [git-bash (version>=2.14.1)](https://git-for-windows.github.io/). Don't add git to PATH variable during installation

\-# Run git-bash. You observe a command line window. Suppose you want to build opencv and opencv\_contrib in c:/lib

\-# In the git command line, enter the following commands (if the folder does not exist):

@code{.bash}
mkdir /c/lib
cd /c/lib
@endcode

\-# Save this script with the name installOCV.sh in c:/lib:

@code{.bash}
#!/bin/bash -e
myRepo=$(pwd)
CMAKE_GENERATOR_OPTIONS=-G"Visual Studio 16 2019"
#CMAKE_GENERATOR_OPTIONS=-G"Visual Studio 15 2017 Win64"
#CMAKE_GENERATOR_OPTIONS=(-G"Visual Studio 16 2019" -A x64)  # CMake 3.14+ is required
if [ ! -d "$myRepo/opencv" ]; then
    echo "cloning opencv"
    git clone https://github.com/opencv/opencv.git
else
    cd opencv
    git pull --rebase
    cd ..
fi
if [ ! -d "$myRepo/opencv_contrib" ]; then
    echo "cloning opencv_contrib"
    git clone https://github.com/opencv/opencv_contrib.git
else
    cd opencv_contrib
    git pull --rebase
    cd ..
fi
RepoSource=opencv
mkdir -p build_opencv
pushd build_opencv
CMAKE_OPTIONS=(-DBUILD_PERF_TESTS:BOOL=OFF -DBUILD_TESTS:BOOL=OFF -DBUILD_DOCS:BOOL=OFF -DWITH_CUDA:BOOL=OFF -DBUILD_EXAMPLES:BOOL=OFF -DINSTALL_CREATE_DISTRIB=ON)
set -x
cmake "${CMAKE_GENERATOR_OPTIONS[@]}" "${CMAKE_OPTIONS[@]}" -DOPENCV_EXTRA_MODULES_PATH="$myRepo"/opencv_contrib/modules -DCMAKE_INSTALL_PREFIX="$myRepo/install/$RepoSource" "$myRepo/$RepoSource"
echo "************************* $Source_DIR -->debug"
cmake --build . --config debug
echo "************************* $Source_DIR -->release"
cmake --build . --config release
cmake --build . --target install --config release
cmake --build . --target install --config debug
popd
@endcode

The script selects the Visual Studio generator through `CMAKE_GENERATOR_OPTIONS` (Visual Studio 16 2019 as written above). To use VS 2015 in 64 bits, for example, set

@code{.bash}
CMAKE_GENERATOR_OPTIONS=-G"Visual Studio 14 2015 Win64"
@endcode

opencv will be installed in c:/lib/install/opencv

@code{.bash}
-DCMAKE_INSTALL_PREFIX="$myRepo/install/$RepoSource"
@endcode

with no Perf tests, no tests, no doc, no CUDA and no example

@code{.bash}
CMAKE_OPTIONS=(-DBUILD_PERF_TESTS:BOOL=OFF -DBUILD_TESTS:BOOL=OFF -DBUILD_DOCS:BOOL=OFF -DWITH_CUDA:BOOL=OFF -DBUILD_EXAMPLES:BOOL=OFF)
@endcode

\-# In the git command line, enter the following command:

@code{.bash}
./installOCV.sh
@endcode

\-# Drink a coffee or two... opencv is ready: that's all!

\-# Next time you run this script, opencv and opencv\_contrib will be updated and rebuilt.

## Installation by Making Your Own Libraries from the Source Files {#tutorial\_windows\_install\_build}

You may find the content of this tutorial also inside the following videos: [Part 1](https://www.youtube.com/watch?v=NnovZ1cTlMs) and [Part 2](https://www.youtube.com/watch?v=qGNWMcfWwPU), hosted on YouTube.

@youtube{NnovZ1cTlMs} @youtube{qGNWMcfWwPU}

**warning**

These videos above are long-obsolete and contain inaccurate information. Be careful, since solutions described in those videos are no longer supported and may even break your install.

If you are building your own libraries you can take the source files from our [Git repository](https://github.com/opencv/opencv.git).

Building the OpenCV library from scratch requires a couple of tools installed beforehand:

-   An IDE of choice (preferably), or just a C/C++ compiler that will actually make the binary files. Here we will use [Microsoft Visual Studio](https://www.microsoft.com/visualstudio/en-us). However, you can use any other IDE that has a valid C/C++ compiler.
-   [CMake](http://www.cmake.org/cmake/resources/software.html), which is a neat tool to make the project files (for your chosen IDE) from the OpenCV source files. It will also allow an easy configuration of the OpenCV build files, in order to make binary files that fit your needs exactly.
-   Git to acquire the OpenCV source files. A good tool for this is [TortoiseGit](http://code.google.com/p/tortoisegit/wiki/Download). Alternatively, you can just download an archived version of the source files from our [page on Sourceforge](http://sourceforge.net/projects/opencvlibrary/files/opencv-win/)

OpenCV may come in multiple flavors. There is a "core" section that will work on its own. Nevertheless, there are a couple of tools and libraries made by 3rd parties that OpenCV may take advantage of. These will improve its capabilities in many ways. In order to use any of them, you need to download and install them on your system.

-   The [Python libraries](http://www.python.org/downloads/) are required to build the _Python interface_ of OpenCV. For now use the version `2.7.{x}`. This is also a must if you want to build the _OpenCV documentation_.
-   [Numpy](http://numpy.scipy.org/) is a scientific computing package for Python. Required for the _Python interface_.
-   [Intel Threading Building Blocks (_TBB_)](http://threadingbuildingblocks.org/file.php?fid=77) is used inside OpenCV for parallel code snippets. Using this will make sure that the OpenCV library will take advantage of all the cores you have in your system's CPU.
-   [Intel Integrated Performance Primitives (_IPP_)](http://software.intel.com/en-us/articles/intel-ipp/) may be used to improve the performance of color conversion, Haar training and DFT functions of the OpenCV library. Watch out, since this is not a free service.
-   OpenCV offers a somewhat fancier and more useful graphical user interface than the default one by using the [Qt framework](http://qt.nokia.com/downloads). For a quick overview of what this has to offer, look into the documentation's _highgui_ module, under the _Qt New Functions_ section. Version 4.6 or later of the framework is required.
-   [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page#Download) is a C++ template library for linear algebra.
-   The latest [CUDA Toolkit](http://developer.nvidia.com/cuda-downloads) will allow you to use the power lying inside your GPU. This will drastically improve performance for some algorithms (e.g. the HOG descriptor). Getting more and more of our algorithms to work on the GPUs is a constant effort of the OpenCV team.
-   [OpenEXR](http://www.openexr.com/downloads.html) source files are required for the library to work with this high dynamic range (HDR) image file format.
-   The OpenNI Framework contains a set of open source APIs that provide support for natural interaction with devices via methods such as voice command recognition, hand gestures, and body motion tracking. Prebuilt binaries can be found [here](http://structure.io/openni). The source code of [OpenNI](https://github.com/OpenNI/OpenNI) and [OpenNI2](https://github.com/OpenNI/OpenNI2) are also available on Github.
-   [Doxygen](http://www.doxygen.nl) is a documentation generator and is the tool that will actually create the _OpenCV documentation_.

Now we will describe the steps to follow for a full build (using all the above frameworks, tools and libraries). If you do not need the support for some of these, you can just freely skip this section.

### Building the library

\-# Make sure you have a working IDE with a valid compiler. In case of the Microsoft Visual Studio just install it and make sure it starts up.

\-# Install [CMake](http://www.cmake.org/cmake/resources/software.html). Simply follow the wizard, no need to add it to the path. The default install options are OK.

\-# Download and install an up-to-date version of msysgit from its [official site](http://code.google.com/p/msysgit/downloads/list). There is also the portable version, which you need only to unpack to get access to the console version of Git. For some of us that may be quite enough.

\-# Install [TortoiseGit](http://code.google.com/p/tortoisegit/wiki/Download). Choose the 32 or 64 bit version according to the type of OS you work in. While installing, locate your msysgit (if it does not do that automatically). Follow the wizard -- the default options are OK for the most part.

\-# Choose a directory in your file system where you will download the OpenCV libraries to. I recommend creating a new one that has a short path and no special characters in it, for example `D:/OpenCV`. This tutorial assumes you do so. If you use your own path and know what you are doing -- it is OK.

\-# Clone the repository to the selected directory. After clicking the _Clone_ button, a window will appear where you can select from what repository you want to download source files ([https://github.com/opencv/opencv.git](https://github.com/opencv/opencv.git)) and to what directory (`D:/OpenCV`).

\-# Push the OK button and be patient as the repository is quite a heavy download. It will take some time depending on your Internet connection.

\-# In this section, I will cover installing the 3rd party libraries.

\-# Download the [Python libraries](http://www.python.org/downloads/) and install them with the default options. You will need a couple of other Python extensions. Luckily, installing all these may be automated by a nice tool called [Setuptools](http://pypi.python.org/pypi/setuptools#downloads). Download and install it as well.

```
-#  The easiest way to install Numpy is to just download its binaries from the [sourceforge page](http://sourceforge.net/projects/numpy/files/NumPy/).
    Make sure you download and install
    exactly the binary for your Python version (so for version `2.7`).

-#  For the [Intel Threading Building Blocks (*TBB*)](http://threadingbuildingblocks.org/file.php?fid=77)
    download the source files and extract
    them inside a directory on your system, for example `D:/OpenCV/dep`. For installing
    the [Intel Integrated Performance Primitives (*IPP*)](http://software.intel.com/en-us/articles/intel-ipp/)
    the story is the same. For
    extracting the archives, I recommend using the [7-Zip](http://www.7-zip.org/) application.

    ![](images/IntelTBB.png)

-#  In case of the [Eigen](http://eigen.tuxfamily.org/index.php?title=Main_Page#Download) library it is again a case of download and extract to the
    `D:/OpenCV/dep` directory.
-#  Same as above with [OpenEXR](http://www.openexr.com/downloads.html).
-#  For the [OpenNI Framework](http://www.openni.org/) you need to install both the [development
    build](http://www.openni.org/downloadfiles/opennimodules/openni-binaries/21-stable) and the
    [PrimeSensor
    Module](http://www.openni.org/downloadfiles/opennimodules/openni-compliant-hardware-binaries/32-stable).
-#  For the CUDA you need again two modules: the latest [CUDA Toolkit](http://developer.nvidia.com/cuda-downloads) and the *CUDA Tools SDK*.
    Download and install both of them with a *complete* option by using the 32 or 64 bit setups
    according to your OS.
-#  In case of the Qt framework you need to build the binary files yourself (unless you use the
    Microsoft Visual Studio 2008 with 32 bit compiler). To do this go to the [Qt
    Downloads](http://qt.nokia.com/downloads) page. Download the source files (not the
    installers!!!):

    ![](images/qtDownloadThisPackage.png)

    Extract it into a nice and short named directory like `D:/OpenCV/dep/qt/` . Then you need to
    build it. Start up a *Visual* *Studio* *Command* *Prompt* (*2010*) by using the start menu
    search (or navigate through the start menu
    All Programs --\> Microsoft Visual Studio 2010 --\> Visual Studio Tools --\> Visual Studio Command Prompt (2010)).

    ![](images/visualstudiocommandprompt.jpg)

    Now navigate to the extracted folder and enter inside it by using this console window. You
    should have a folder containing files like *Install*, *Make* and so on. Use the *dir* command
    to list files inside your current directory. Once arrived at this directory enter the
    following command:
    @code{.bash}
    configure.exe -release -no-webkit -no-phonon -no-phonon-backend -no-script -no-scripttools
                  -no-qt3support -no-multimedia -no-ltcg
    @endcode
    Completing this will take around 10-20 minutes. Then enter the next command that will take a
    lot longer (can easily take even more than a full hour):
    @code{.bash}
    nmake
    @endcode
    After this set the Qt environment variables using the following command on Windows 7:
    @code{.bash}
    setx -m QTDIR D:/OpenCV/dep/qt/qt-everywhere-opensource-src-4.7.3
    @endcode
    Also, add the built binary files path to the system path by using the [PathEditor](http://www.redfernplace.com/software-projects/patheditor/). In our
    case this is `D:/OpenCV/dep/qt/qt-everywhere-opensource-src-4.7.3/bin`.

    @note
    If you plan on doing Qt application development you can also install at this point the *Qt
    Visual Studio Add-in*. After this you can make and build Qt applications without using the *Qt
    Creator*. Everything is nicely integrated into Visual Studio.
```

\-# Now start the _CMake (cmake-gui)_. You may again enter it in the start menu search or get it from the All Programs --> CMake 2.8 --> CMake (cmake-gui). First, select the directory for the source files of the OpenCV library (1). Then, specify a directory where you will build the binary files for OpenCV (2).

```
![](images/CMakeSelectBin.jpg)

Press the Configure button to specify the compiler (and *IDE*) you want to use. Note that
you may be able to choose between different compilers for making either 64 bit or 32 bit libraries.
Select the one you use in your application development.

![](images/CMake_Configure_Windows.jpg)

CMake will start out and based on your system variables will try to automatically locate as many
packages as possible. You can modify the packages to use for the build in the WITH --\> WITH_X
menu points (where *X* is the package abbreviation). Here is a list of current packages you can
turn on or off:

![](images/CMakeBuildWithWindowsGUI.jpg)

Select all the packages you want to use and press again the *Configure* button. For an easier
overview of the build options make sure the *Grouped* option under the binary directory
selection is turned on. For some of the packages CMake may not find all of the required files or
directories. In case of these, CMake will throw an error in its output window (located at the
bottom of the GUI) and set its field values to not found constants. For example:

![](images/CMakePackageNotFoundWindows.jpg)

![](images/CMakeOutputPackageNotFound.jpg)

For these you need to manually set the queried directories or files path. After this press again
the *Configure* button to see if the value entered by you was accepted or not. Do this until all
entries are good and you cannot see errors in the field/value or the output part of the GUI. Now
I want to emphasize an option that you will definitely love:
ENABLE --\> ENABLE_SOLUTION_FOLDERS. OpenCV will create a great many projects, and turning this
option on will make sure that they are categorized inside directories in the *Solution Explorer*.
It is a must have feature, if you ask me.

![](images/CMakeBuildOptionsOpenCV.jpg)

Furthermore, you need to select what part of OpenCV you want to build.

-   *BUILD_DOCS* -\> It creates two projects for building the documentation of OpenCV (there
    will be a separate project for building the HTML and the PDF files). Note that these are not
    built together with the solution. You need to make an explicit build project command on
    these to do so.
-   *BUILD_EXAMPLES* -\> OpenCV comes with many example applications from which you may learn
    most of the library's capabilities. This also comes in handy to easily check whether OpenCV is
    fully functional on your computer.
-   *BUILD_PACKAGE* -\> Prior to version 2.3 with this you could build a project that will
    build an OpenCV installer. With this, you can easily install your OpenCV flavor on other
    systems. For the latest source files of OpenCV, it generates a new project that simply
    creates a zip archive with OpenCV sources.
-   *BUILD_SHARED_LIBS* -\> This controls whether DLL files (when turned on) or
    static library files (\*.lib) are built.
-   *BUILD_TESTS* -\> Each module of OpenCV has a test project assigned to it. Building these
    test projects is also a good way to check that the modules work just as expected on your
    system too.
-   *BUILD_PERF_TESTS* -\> There are also performance tests for many OpenCV functions. If
    you are concerned about performance, build them and run.
-   *BUILD_opencv_python* -\> Self-explanatory. Create the binaries to use OpenCV from the
    Python language.
-   *BUILD_opencv_world* -\> Generate a single "opencv_world" binary (a shared or static library, depending on *BUILD_SHARED_LIBS*) including all the modules instead of a collection of separate binaries, one binary per module.

Press the *Configure* button again and ensure no errors are reported. If this is the case, you
can tell CMake to create the project files by pushing the *Generate* button. Go to the build
directory and open the created **OpenCV** solution. Depending on how many of the above
options you have selected, the solution may contain quite a lot of projects, so be patient with the
IDE at startup. Now you need to build both the *Release* and the *Debug* binaries. Use the
drop-down menu in your IDE to switch to the other configuration after building one of them.

![](images/ChangeBuildVisualStudio.jpg)

In the end, you can observe the built binary files inside the bin directory:

![](images/OpenCVBuildResultWindows.jpg)

For the documentation, you need to explicitly issue the build commands on the *doxygen* project for
the HTML documentation. It will call *Doxygen* to do
all the hard work. You can find the generated documentation inside the `build/doc/doxygen/html`.

To collect the header and the binary files, that you will use during your own projects, into a
separate directory (similarly to how the pre-built binaries ship) you need to explicitly build
the *Install* project.

![](images/WindowsBuildInstall.png)

This will create an *Install* directory inside the *Build* one collecting all the built binaries
into a single place. Use this only after you built both the *Release* and *Debug* versions.

To test your build just go into the `Build/bin/Debug` or `Build/bin/Release` directory and start
a couple of applications like the *contours.exe*. If they run, you are done. Otherwise,
something definitely went awfully wrong. In this case you should contact us at our [Q&A forum](https://forum.opencv.org/).
If everything is okay, the *contours.exe* output should resemble the following image (if
built with Qt support):

![](images/WindowsQtContoursOutput.png)
```

@note If you use the GPU module (CUDA libraries), make sure you also upgrade to the latest drivers for your GPU. Error messages about invalid entries in (or a missing) nvcuda.dll are mostly caused by old video card drivers. To test the GPU module (if built), run the _performance\_gpu.exe_ sample application.

## Set the OpenCV environment variable and add it to the system path {#tutorial\_windows\_install\_path}

First, we set an environment variable to make our work easier. This will hold the build directory of our OpenCV library that we use in our projects. Start up a command window and enter:

```
setx OpenCV_DIR D:\OpenCV\build\x64\vc14     (suggested for Visual Studio 2015 - 64 bit Windows)
setx OpenCV_DIR D:\OpenCV\build\x86\vc14     (suggested for Visual Studio 2015 - 32 bit Windows)

setx OpenCV_DIR D:\OpenCV\build\x64\vc15     (suggested for Visual Studio 2017 - 64 bit Windows)
setx OpenCV_DIR D:\OpenCV\build\x86\vc15     (suggested for Visual Studio 2017 - 32 bit Windows)

setx OpenCV_DIR D:\OpenCV\build\x64\vc16     (suggested for Visual Studio 2019 - 64 bit Windows)
setx OpenCV_DIR D:\OpenCV\build\x86\vc16     (suggested for Visual Studio 2019 - 32 bit Windows)

setx OpenCV_DIR D:\OpenCV\build\x64\vc17     (suggested for Visual Studio 2022 - 64 bit Windows)
setx OpenCV_DIR D:\OpenCV\build\x86\vc17     (suggested for Visual Studio 2022 - 32 bit Windows)

setx OpenCV_DIR D:\OpenCV\build\x64\vc18     (suggested for Visual Studio 2026 - 64 bit Windows)
setx OpenCV_DIR D:\OpenCV\build\x86\vc18     (suggested for Visual Studio 2026 - 32 bit Windows)
```

Here the directory is where you have your OpenCV binaries (_extracted_ or _built_). You may have a different platform (e.g. x64 instead of x86) or compiler type, so substitute the appropriate value. Inside this directory, you should have two folders called _lib_ and _bin_.

If you built static libraries then you are done. Otherwise, you need to add the path of the _bin_ folder to the system path. This is because you will use the OpenCV library in the form of _"Dynamic-link libraries"_ (also known as **DLL**). Inside these are stored all the algorithms and information the OpenCV library contains. The operating system will load them only on demand, during runtime. However, to do this the operating system needs to know where they are. The system **PATH** contains a list of folders where DLLs can be found. Add the OpenCV library path to this and the OS will know where to look if it ever needs the OpenCV binaries. Otherwise, you will need to copy the used DLLs right beside the application's executable file (_exe_) for the OS to find them, which is highly unpleasant if you work on many projects. To add the entry, start up the [PathEditor](http://www.redfernplace.com/software-projects/patheditor/) again and add the following new entry (right click in the application to bring up the menu):
@code
%OPENCV_DIR%\bin
@endcode

Save it to the registry and you are done. If you ever change the location of your build directories or want to try out your application with a different build, all you will need to do is to update the OPENCV\_DIR variable via the _setx_ command inside a command window.
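The PATH lookup described above can be illustrated with a small, hedged sketch. It assumes a Windows-style, `;`-separated list and compares entries case-insensitively; the helper names `splitPathList`, `normalizeEntry` and `isOnPath` are hypothetical and belong neither to OpenCV nor to the Windows API.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Split a Windows-style PATH value on ';' and drop empty entries.
std::vector<std::string> splitPathList(const std::string& pathValue) {
    std::vector<std::string> entries;
    std::string current;
    for (char c : pathValue) {
        if (c == ';') {
            if (!current.empty()) entries.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) entries.push_back(current);
    return entries;
}

// Normalize an entry for comparison: lower-case, '/' -> '\', no trailing separator.
std::string normalizeEntry(std::string entry) {
    std::transform(entry.begin(), entry.end(), entry.begin(), [](unsigned char c) {
        return static_cast<char>(c == '/' ? '\\' : std::tolower(c));
    });
    while (entry.size() > 1 && entry.back() == '\\') entry.pop_back();
    return entry;
}

// True when `dir` appears as one of the entries of `pathValue`.
bool isOnPath(const std::string& pathValue, const std::string& dir) {
    const std::string wanted = normalizeEntry(dir);
    for (const std::string& entry : splitPathList(pathValue)) {
        if (normalizeEntry(entry) == wanted) return true;
    }
    return false;
}
```

In a real program you could pass the value returned by `std::getenv("PATH")` as `pathValue` to check whether your OpenCV _bin_ folder is visible to the current process.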

Now you can continue reading the tutorials with the @ref tutorial\_windows\_visual\_studio\_opencv section. There you will find out how to use the OpenCV library in your own projects with the help of the Microsoft Visual Studio IDE.

## [Window Install Opencv](https://docharvest.github.io/docs/opencv5/tutorials/introduction/windows_msys2_vscode/window_install_opencv/)


## [Windows Visual Studio Image Watch](https://docharvest.github.io/docs/opencv5/tutorials/introduction/windows_visual_studio_image_watch/windows_visual_studio_image_watch/)


## [Windows Visual Studio Opencv](https://docharvest.github.io/docs/opencv5/tutorials/introduction/windows_visual_studio_opencv/windows_visual_studio_opencv/)

# How to build applications with OpenCV inside the "Microsoft Visual Studio" {#tutorial\_windows\_visual\_studio\_opencv}

@prev\_tutorial{tutorial\_windows\_install} @next\_tutorial{tutorial\_windows\_visual\_studio\_image\_watch}

|    |    |
| -: | :- |
| Original author | Bernát Gábor |
| Compatibility | OpenCV >= 3.0 |

@tableofcontents

@warning This tutorial can contain obsolete information.

Everything I describe here applies to the `C/C++` interface of OpenCV. I assume that you have read and successfully completed the @ref tutorial\_windows\_install tutorial. Therefore, before you go any further, make sure you have an OpenCV directory that contains the OpenCV header files plus binaries, and that you have set the environment variables as described in @ref tutorial\_windows\_install\_path.

The OpenCV libraries we distribute for the Microsoft Windows operating system come as Dynamic Link Libraries (_DLL_). These have the advantage that the content of the library is loaded only at runtime, on demand, and that any number of programs may use the same library file. This means that if you have ten applications using the OpenCV library, there is no need to keep a separate copy for each one of them. Of course, you need to have the OpenCV _DLL_ files on all systems where you want to run your application.

Another approach is to use static libraries that have the _lib_ extension. You may build these from our source files as described in the @ref tutorial\_windows\_install tutorial. When you use these, the library is built into your _exe_ file, so there is no chance that the user deletes it for some reason. As a drawback, your application will be larger and, as a result, will take more time to load at startup.

To build an application with OpenCV you need to do two things:

-   _Tell_ the compiler how the OpenCV library _looks_. You do this by _showing_ it the header files.
    
-   _Tell_ the linker where to get the functions or data structures of OpenCV from, when they are needed.
    
    If you use the _lib_ system you must set the path where the library files are and specify which of them to look in. During the build the linker will look into these libraries and add the definitions and implementation of all _used_ functions and data structures to the executable file.
    
    If you use the _DLL_ system you must again specify all this, however now for a different reason. This is specific to Microsoft operating systems. The linker needs to know where in the DLL to search for the data structure or function at runtime. This information is stored inside _lib_ files. Nevertheless, they aren't static libraries. They are so-called import libraries. This is why, when you make some _DLLs_ in Windows, you will also end up with some _lib_ extension libraries. The good part is that at runtime only the _DLL_ is required.
    

To pass on all this information to the Visual Studio IDE you can either do it globally (so all your future projects will get this information) or locally (so only for your current project). The advantage of the global one is that you only need to do it once; however, it may be undesirable to clutter all your projects with this information all the time. In case of the global one, how you do it depends on the Microsoft Visual Studio version you use. There is a **2008 and previous versions** and a **2010 way** of doing it. Inside the global section of this tutorial I'll show what the main differences are.

The base item of a project in Visual Studio is a solution. A solution may contain multiple projects. Projects are the building blocks of an application. Every project will realize something and you will have a main project in which you can put together this project puzzle. For many simple applications (as in many of the tutorials) you do not need to break down the application into modules. In these cases, your main project will be the only existing one. Now go create a new solution inside Visual Studio by going through the File --> New --> Project menu selection. Choose _Win32 Console Application_ as type. Enter its name and select the path where to create it. Then in the upcoming dialog make sure you create an empty project.

## The local method

Every project is built separately from the others. Because of this, every project has its own rule package, which stores all the information the _IDE_ needs to build your project. For any application there are at least two build modes: a _Release_ and a _Debug_ one. The _Debug_ mode has many features that help you find and resolve bugs in your application more easily. In contrast, the _Release_ mode is an optimized version, where the goal is to make the application run as fast as possible or be as small as possible. As you may expect, these modes also require different rules during the build. Therefore, there are different rule packages for each of your build modes. Inside the IDE these rule packages are called _project properties_ and you can view and modify them by using the _Property Manager_. You can bring this up with View --> Property Pages (For Visual Studio 2013 onwards, go to View --> Other Windows --> Property Manager). Expand it and you can see the existing rule packages (called _Property Sheets_).

The really useful thing about these is that you may create a rule package _once_ and later just add it to your new projects. Create it once and reuse it later. We want to create a new _Property Sheet_ that will contain all the rules that the compiler and linker need to know. Of course we will need a separate one for the Debug and the Release builds. Start with the Debug one as shown in the image below:

Use for example the _OpenCV\_Debug_ name. Then select the sheet and choose Right Click --> Properties. In the following I will show how to set the OpenCV rules locally, as I find it unnecessary to pollute projects with custom rules that I do not use. Go to the C++ group's General entry and under _"Additional Include Directories"_ add the path to your OpenCV include directory. If you don't have the _"C/C++"_ group, you should add any .c/.cpp file to the project.
@code{.bash}
$(OPENCV_DIR)\..\..\include
@endcode

When adding third party library settings it is generally a good idea to use the power of environment variables. The full location of the OpenCV library may differ on each system. Moreover, you may even end up moving the install directory yourself for some reason. If you give explicit paths inside your property sheet, your project will stop working when you pass it on to someone else who has a different OpenCV install path. Moreover, fixing this would require manually modifying every explicit path. A more elegant solution is to use environment variables. Anything that you put inside parentheses preceded by a dollar sign will be replaced with the current value of that environment variable when the project is built. This is where the environment variable we set in our previous tutorial, @ref tutorial\_windows\_install\_path, comes into play.
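To make the substitution idea concrete, here is a minimal sketch of how a `$(NAME)` placeholder could be expanded from a table of variables. It is not how Visual Studio implements it; the function name `expandMacros` is hypothetical, names are matched case-sensitively for simplicity, and unknown names are simply left untouched.

```cpp
#include <map>
#include <string>

// Replace every $(NAME) in `text` with variables[NAME]; unknown names are left as they are.
std::string expandMacros(const std::string& text,
                         const std::map<std::string, std::string>& variables) {
    std::string result;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t open = text.find("$(", pos);
        if (open == std::string::npos) {
            result.append(text, pos, std::string::npos);  // no more placeholders
            break;
        }
        const std::size_t close = text.find(')', open + 2);
        if (close == std::string::npos) {
            result.append(text, pos, std::string::npos);  // unterminated "$(": copy verbatim
            break;
        }
        result.append(text, pos, open - pos);  // text before the placeholder
        const std::string name = text.substr(open + 2, close - open - 2);
        const auto it = variables.find(name);
        if (it != variables.end()) {
            result += it->second;
        } else {
            result.append(text, open, close - open + 1);  // keep "$(NAME)" unchanged
        }
        pos = close + 1;
    }
    return result;
}
```

With `OPENCV_DIR` mapped to `D:\OpenCV\build\x64\vc17`, the string `$(OPENCV_DIR)\lib` expands to `D:\OpenCV\build\x64\vc17\lib`, which is why the same property sheet keeps working on a machine where OpenCV lives elsewhere.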

Next go to Linker --> General and under _"Additional Library Directories"_ add the libs directory:
@code{.bash}
$(OPENCV_DIR)\lib
@endcode

Then you need to specify the libraries the linker should look into. To do this go to Linker --> Input and under the _"Additional Dependencies"_ entry add the names of all modules which you want to use:

The names of the libraries are as follows:
@code{.bash}
opencv_(The Name of the module)(The version Number of the library you use)d.lib
@endcode
A full list, for version 3.0.0, would contain:
@code{.bash}
opencv_calib3d300d.lib
opencv_core300d.lib
opencv_features2d300d.lib
opencv_flann300d.lib
opencv_highgui300d.lib
opencv_imgcodecs300d.lib
opencv_imgproc300d.lib
opencv_ml300d.lib
opencv_objdetect300d.lib
opencv_photo300d.lib
opencv_shape300d.lib
opencv_stitching300d.lib
opencv_superres300d.lib
opencv_ts300d.lib
opencv_video300d.lib
opencv_videoio300d.lib
opencv_videostab300d.lib
@endcode

Alternatively, your OpenCV download may have been built into one large .lib file. Check by looking in OpenCV\\build\\architecture\\vc14\\lib. In this case all you would add is, for version 3.3.0:
@code{.bash}
opencv_world330.lib
@endcode
In the per-module names listed earlier, the letter _d_ before the extension indicates that these are the libraries required for the debug build. Now click OK to save and do the same with a new property sheet inside the Release rule section. Make sure to omit the _d_ letters from the library names and to save the property sheets with the save icon above them.
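The naming scheme is mechanical enough to generate. The following is a small sketch, assuming the `opencv_<module><version>[d].lib` pattern described above; the helper names `importLibName` and `importLibNames` are made up for this illustration.

```cpp
#include <string>
#include <vector>

// Build "opencv_<module><version>d.lib" for Debug or "opencv_<module><version>.lib" for Release.
std::string importLibName(const std::string& module, const std::string& version, bool debug) {
    return "opencv_" + module + version + (debug ? "d" : "") + ".lib";
}

// Build the names for a whole list of modules, e.g. to paste into "Additional Dependencies".
std::vector<std::string> importLibNames(const std::vector<std::string>& modules,
                                        const std::string& version, bool debug) {
    std::vector<std::string> names;
    names.reserve(modules.size());
    for (const std::string& module : modules) {
        names.push_back(importLibName(module, version, debug));
    }
    return names;
}
```

For example, `importLibName("core", "300", true)` yields `opencv_core300d.lib` for the Debug sheet, while `importLibName("world", "330", false)` yields `opencv_world330.lib` for a Release sheet that links the single large library.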

You can find your property sheets inside your project's directory. At this point, it is a wise decision to back them up into some special directory, to always have them at hand in the future, whenever you create an OpenCV project. Note that for Visual Studio 2010 the file extension is _props_, while for 2008 this is _vsprops_.

Next time when you make a new OpenCV project just use the "Add Existing Property Sheet..." menu entry inside the Property Manager to easily add the OpenCV build rules.

## The global method

In case you find it too troublesome to add the property pages to each and every one of your projects, you can also add these rules to a _"global property page"_. However, this applies only to the additional include and library directories. The names of the libraries to use still need to be specified manually, for instance by using a property page.

In Visual Studio 2008 you can find this under Tools --> Options --> Projects and Solutions --> VC++ Directories.

In Visual Studio 2010 this has been moved to a global property sheet which is automatically added to every project you create:

The process is the same as described in case of the local approach. Just add the include directories by using the environment variable _OPENCV\_DIR_.

## Test it!

Now to try this out download our little test [source code](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/introduction/windows_visual_studio_opencv/introduction_windows_vs.cpp) or get it from the sample code folder of the OpenCV sources. Add this to your project and build it. Here's its content:

@include cpp/tutorial\_code/introduction/windows\_visual\_studio\_opencv/introduction\_windows\_vs.cpp

You can start a Visual Studio build from two places: either from inside the _IDE_ (keyboard combination: Control-F5) or by navigating to your build directory and starting the application with a double click. The catch is that these two **aren't** the same. When you start it from the _IDE_ its current working directory is the project's directory, while otherwise it is the folder where the application file currently is (so usually your build directory). Moreover, when started from the _IDE_ the console window will not close once the application has finished. It will wait for a keystroke.

This is important to remember when your code opens and saves files. Your resources will be saved (and looked up when opening!) relative to your working directory, unless you give a full, explicit path as a parameter to the I/O functions. In the code above we open [this OpenCV logo](https://github.com/opencv/opencv/tree/5.x/samples/data/opencv-logo.png). Before starting up the application, make sure you place the image file in your current working directory. Modify the image file name inside the code to try it out on other images too. Run it and voilà!
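The working-directory rule can be sketched in a few lines. This is only an illustration of the idea, not what the operating system or OpenCV does internally: `isAbsolutePath` and `resolveResource` are hypothetical helpers, and the absolute-path check is a deliberate simplification of the real Windows rules.

```cpp
#include <cctype>
#include <string>

// True for "D:\x", "D:/x", "\x" and "/x" (a simplification of the real Windows rules).
bool isAbsolutePath(const std::string& p) {
    if (!p.empty() && (p[0] == '/' || p[0] == '\\')) return true;
    return p.size() >= 3 && std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':' &&
           (p[2] == '/' || p[2] == '\\');
}

// Absolute names are used as they are; relative names are looked up under the working directory.
std::string resolveResource(const std::string& workingDir, const std::string& resource) {
    if (isAbsolutePath(resource) || workingDir.empty()) return resource;
    const char last = workingDir.back();
    const bool hasSeparator = (last == '/' || last == '\\');
    return workingDir + (hasSeparator ? "" : "\\") + resource;
}
```

Started from the IDE with the project directory `D:\OpenCV\MySolutionName` as working directory, `opencv-logo.png` would be looked up as `D:\OpenCV\MySolutionName\opencv-logo.png`; started by double click from the `Release` folder, the same name points into that folder instead.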

## Command line arguments with Visual Studio

Throughout some of our future tutorials, you'll see that the program's main input method will be a runtime argument. To do this you can just start up a command window (cmd + Enter in the start menu), navigate to your executable file and start it with an argument. For example, for my project above this would look like:
@code{.bash}
D:
CD OpenCV\MySolutionName\Release
MySolutionName.exe exampleImage.jpg
@endcode
Here I first changed my drive (needed if your project isn't on the OS local drive), navigated to my project and started it with an example image argument. While on Linux it is common to fiddle around with the console window, on Microsoft Windows many people almost never use it. Besides, adding the same argument again and again while you are testing your application is a somewhat cumbersome task. Luckily, in Visual Studio there is a menu to automate all this:

Specify the names of the inputs here, and whenever you start your application from the Visual Studio environment the arguments are passed automatically. In the next introductory tutorial you'll see an in-depth explanation of the source code above: @ref tutorial\_display\_image.
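On the program side, picking up such an argument is short. Below is a minimal sketch, assuming the image name is the first argument and that a default is used when none is given; `imageNameFromArgs` is a hypothetical helper and not part of the tutorial's sample code.

```cpp
#include <string>

// Return the first command-line argument, or `fallback` when no argument was given.
std::string imageNameFromArgs(int argc, const char* const argv[], const std::string& fallback) {
    return (argc > 1 && argv[1] != nullptr) ? std::string(argv[1]) : fallback;
}
```

Inside `main(int argc, char** argv)` you could call `imageNameFromArgs(argc, argv, "opencv-logo.png")`, so the program behaves the same whether the argument comes from the command window or from the Visual Studio debugging settings.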

## [Hello](https://docharvest.github.io/docs/opencv5/tutorials/ios/hello/hello/)

# OpenCV iOS Hello {#tutorial\_hello}

@tableofcontents

@prev\_tutorial{tutorial\_ios\_install} @next\_tutorial{tutorial\_image\_manipulation}

|    |    |
| -: | :- |
| Original author | Charu Hans |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial we will learn how to:

-   Link the OpenCV framework with Xcode.
-   Write a simple Hello World application using OpenCV and Xcode.

## Linking OpenCV iOS

Follow this step by step guide to link OpenCV to iOS.

-# Create a new Xcode project.
-# Now we need to link _opencv2.framework_ with Xcode. Select the Project Navigator in the left-hand panel and click on the project name.
-# Under TARGETS, click on Build Phases. Expand the Link Binary With Libraries option.
-# Click on Add others, go to the directory where _opencv2.framework_ is located and click Open.
-# Now you can start writing your application.

## Hello OpenCV iOS Application

Now we will learn how to write a simple Hello World Application in Xcode using OpenCV.

-   Link your project with OpenCV as shown in previous section.
    
-   Open the file named _NameOfProject-Prefix.pch_ (replace NameOfProject with the name of your project) and add the following lines of code.
    @code{.m}
    #ifdef __cplusplus
    #import <opencv2/opencv.hpp>
    #endif
    @endcode
    
-   Add the following lines of code to the viewDidLoad method in ViewController.m.
    @code{.m}
    UIAlertView * alert = [[UIAlertView alloc] initWithTitle:@"Hello!" message:@"Welcome to OpenCV" delegate:self cancelButtonTitle:@"Continue" otherButtonTitles:nil];
    [alert show];
    @endcode
    
-   You are good to run the project.
    

## Output

## Changes for XCode5+ and iOS8+

With the newer Xcode and iOS versions you need to watch out for some specific details:

-   The \*.m file in your project should be renamed to \*.mm.
-   You have to manually include AssetsLibrary.framework into your project, which is not done anymore by default.

## [Image Manipulation](https://docharvest.github.io/docs/opencv5/tutorials/ios/image_manipulation/image_manipulation/)

# OpenCV iOS - Image Processing {#tutorial\_image\_manipulation}

@tableofcontents

@prev\_tutorial{tutorial\_hello} @next\_tutorial{tutorial\_video\_processing}

|    |    |
| -: | :- |
| Original author | Charu Hans |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial we will learn how to do basic image processing using OpenCV in iOS.

## Introduction

In _OpenCV_ all the image processing operations are usually carried out on the _Mat_ structure. In iOS however, to render an image on screen it has to be an instance of the _UIImage_ class. To convert an _OpenCV Mat_ to a _UIImage_ we use the _Core Graphics_ framework available in iOS. Below is the code needed to convert back and forth between Mats and UIImages.
@code{.m}
- (cv::Mat)cvMatFromUIImage:(UIImage *)image
{
  CGColorSpaceRef colorSpace = CGImageGetColorSpace(image.CGImage);
  CGFloat cols = image.size.width;
  CGFloat rows = image.size.height;

  cv::Mat cvMat(rows, cols, CV_8UC4); // 8 bits per component, 4 channels (color channels + alpha)

  CGContextRef contextRef = CGBitmapContextCreate(cvMat.data,                 // Pointer to data
                                                  cols,                       // Width of bitmap
                                                  rows,                       // Height of bitmap
                                                  8,                          // Bits per component
                                                  cvMat.step[0],              // Bytes per row
                                                  colorSpace,                 // Colorspace
                                                  kCGImageAlphaNoneSkipLast |
                                                  kCGBitmapByteOrderDefault); // Bitmap info flags

  CGContextDrawImage(contextRef, CGRectMake(0, 0, cols, rows), image.CGImage);
  CGContextRelease(contextRef);

  return cvMat;
}
@endcode
@code{.m}
- (cv::Mat)cvMatGrayFromUIImage:(UIImage *)image
{
  CGColorSpaceRef colorSpace = CGImageGetColorSpace(image.CGImage);
  CGFloat cols = image.size.width;
  CGFloat rows = image.size.height;

  cv::Mat cvMat(rows, cols, CV_8UC1); // 8 bits per component, 1 channel

  CGContextRef contextRef = CGBitmapContextCreate(cvMat.data,                 // Pointer to data
                                                  cols,                       // Width of bitmap
                                                  rows,                       // Height of bitmap
                                                  8,                          // Bits per component
                                                  cvMat.step[0],              // Bytes per row
                                                  colorSpace,                 // Colorspace
                                                  kCGImageAlphaNoneSkipLast |
                                                  kCGBitmapByteOrderDefault); // Bitmap info flags

  CGContextDrawImage(contextRef, CGRectMake(0, 0, cols, rows), image.CGImage);
  CGContextRelease(contextRef);

  return cvMat;
}
@endcode
Once the image is a _Mat_, it can be processed with OpenCV, for example converted to gray-scale:
@code{.m}
cv::Mat greyMat;
cv::cvtColor(inputMat, greyMat, cv::COLOR_BGR2GRAY);
@endcode
After the processing we need to convert it back to UIImage. The code below can handle both gray-scale and color image conversions (determined by the number of channels in the _if_ statement).
@code{.m}
-(UIImage *)UIImageFromCVMat:(cv::Mat)cvMat
{
  NSData *data = [NSData dataWithBytes:cvMat.data length:cvMat.elemSize()*cvMat.total()];
  CGColorSpaceRef colorSpace;

  if (cvMat.elemSize() == 1) {
      colorSpace = CGColorSpaceCreateDeviceGray();
  } else {
      colorSpace = CGColorSpaceCreateDeviceRGB();
  }

  CGDataProviderRef provider = CGDataProviderCreateWithCFData((__bridge CFDataRef)data);

  // Creating CGImage from cv::Mat
  CGImageRef imageRef = CGImageCreate(cvMat.cols,                                 // width
                                      cvMat.rows,                                 // height
                                      8,                                          // bits per component
                                      8 * cvMat.elemSize(),                       // bits per pixel
                                      cvMat.step[0],                              // bytesPerRow
                                      colorSpace,                                 // colorspace
                                      kCGImageAlphaNone|kCGBitmapByteOrderDefault,// bitmap info
                                      provider,                                   // CGDataProviderRef
                                      NULL,                                       // decode
                                      false,                                      // should interpolate
                                      kCGRenderingIntentDefault                   // intent
                                      );

  // Getting UIImage from CGImage
  UIImage *finalImage = [UIImage imageWithCGImage:imageRef];
  CGImageRelease(imageRef);
  CGDataProviderRelease(provider);
  CGColorSpaceRelease(colorSpace);

  return finalImage;
}
@endcode
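The numbers handed to Core Graphics in the conversions above follow directly from the pixel format. As a rough sketch in plain C++ (no OpenCV or iOS headers), assuming an 8-bit image stored without row padding, the relationship looks like this; `BitmapLayout` and `layoutFor8BitImage` are names invented for this illustration.

```cpp
#include <cstddef>

// The layout values passed to CGBitmapContextCreate / CGImageCreate in the code above.
struct BitmapLayout {
    std::size_t bitsPerComponent;
    std::size_t bitsPerPixel;
    std::size_t bytesPerRow;
    bool isGray;
};

// Layout of a continuous (unpadded) 8-bit image with `channels` interleaved channels.
// For such an image the element size in bytes equals the channel count.
BitmapLayout layoutFor8BitImage(std::size_t cols, std::size_t channels) {
    const std::size_t bytesPerPixel = channels;  // one byte per 8-bit component
    return BitmapLayout{8, 8 * bytesPerPixel, cols * bytesPerPixel, bytesPerPixel == 1};
}
```

For a 640-pixel-wide `CV_8UC4` image this gives 32 bits per pixel and 2560 bytes per row, while a `CV_8UC1` image of the same width gives 8 bits per pixel, 640 bytes per row and the gray color space branch.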

## Output

Check out an instance of the running code with more image effects on [YouTube](http://www.youtube.com/watch?v=Ko3K_xdhJ1I).

@youtube{Ko3K\_xdhJ1I}

## [Ios Install](https://docharvest.github.io/docs/opencv5/tutorials/ios/ios_install/ios_install/)


## [Table Of Content Ios](https://docharvest.github.io/docs/opencv5/tutorials/ios/table_of_content_ios/)


# OpenCV iOS {#tutorial\_table\_of\_content\_ios}

-   @subpage tutorial\_ios\_install
-   @subpage tutorial\_hello
-   @subpage tutorial\_image\_manipulation
-   @subpage tutorial\_video\_processing

## [Video Processing](https://docharvest.github.io/docs/opencv5/tutorials/ios/video_processing/video_processing/)


## [Aruco Board Detection](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/aruco_board_detection/aruco_board_detection/)


# Detection of ArUco boards {#tutorial\_aruco\_board\_detection}

@prev\_tutorial{tutorial\_aruco\_detection} @next\_tutorial{tutorial\_charuco\_detection}

|    |    |
| -: | :- |
| Original authors | Sergio Garrido, Alexander Panov |
| Compatibility | OpenCV >= 4.7.0 |

An ArUco board is a set of markers that acts like a single marker in the sense that it provides a single pose for the camera.

The most popular board is the one with all the markers in the same plane, since it can be easily printed:

However, boards are not limited to this arrangement and can represent any 2d or 3d layout.

The difference between a board and a set of independent markers is that the relative position between the markers in the board is known a priori. This allows the corners of all the markers to be used for estimating the pose of the camera with respect to the whole board.

When you use a set of independent markers, you can only estimate the pose of each marker individually, since you don't know the relative position of the markers in the environment.

The main benefits of using boards are:

-   The pose estimation is much more versatile. Only some markers are necessary to perform pose estimation. Thus, the pose can be calculated even in the presence of occlusions or partial views.
-   The obtained pose is usually more accurate, since a larger number of point correspondences (marker corners) is employed.

## Board Detection

Board detection is similar to standard marker detection. The only difference is in the pose estimation step. In fact, to use marker boards, standard marker detection must be performed before estimating the board pose.

To perform pose estimation for boards, use the `solvePnP()` function, as shown below in `samples/cpp/tutorial_code/objectDetection/detect_board.cpp`.

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board.cpp aruco\_detect\_board\_full\_sample

The parameters are:

-   `objPoints`, `imgPoints`: object and image points, matched with `cv::aruco::GridBoard::matchImagePoints()`, which in turn takes as input the `markerCorners` and `markerIds` structures of the markers detected by the `cv::aruco::ArucoDetector::detectMarkers()` function.
-   `board`: the `cv::aruco::Board` object that defines the board layout and its ids.
-   `cameraMatrix` and `distCoeffs`: camera calibration parameters necessary for pose estimation.
-   `rvec` and `tvec`: estimated pose of the board. If not empty, they are treated as an initial guess.
-   The total number of markers employed for estimating the board pose equals the number of matched object points divided by four, since each marker contributes four corners. `solvePnP()` itself only reports whether a solution was found.
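The matching step that produces `objPoints` and `imgPoints` can be pictured as a lookup of each detected id in the board layout. The following is a minimal, OpenCV-free sketch of that idea. The types `Pt2`, `Pt3`, `BoardMarker`, `MatchedPoints` and the functions `matchPoints()` and `markersUsed()` are hypothetical names introduced for illustration, and this is not the library's implementation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Pt2 { float x, y; };
struct Pt3 { float x, y, z; };

// One board entry: a marker id and its four corners in board coordinates.
struct BoardMarker {
    int id;
    std::array<Pt3, 4> corners;
};

struct MatchedPoints {
    std::vector<Pt3> objPoints;  // 3D corners in board coordinates
    std::vector<Pt2> imgPoints;  // corresponding 2D corners in the image
};

// Pair the corners of every detected marker that belongs to the board with
// that marker's corners in the board layout. Detected markers whose id is
// not part of the board are ignored.
MatchedPoints matchPoints(const std::vector<BoardMarker>& board,
                          const std::vector<std::array<Pt2, 4>>& detectedCorners,
                          const std::vector<int>& detectedIds)
{
    MatchedPoints out;
    for (std::size_t i = 0; i < detectedIds.size() && i < detectedCorners.size(); ++i) {
        for (const BoardMarker& m : board) {
            if (m.id != detectedIds[i]) continue;
            for (int c = 0; c < 4; ++c) {
                out.objPoints.push_back(m.corners[c]);
                out.imgPoints.push_back(detectedCorners[i][c]);
            }
            break;
        }
    }
    return out;
}

// Each matched marker contributes four correspondences.
std::size_t markersUsed(const MatchedPoints& m) { return m.objPoints.size() / 4; }
```

With a board that contains ids 0 and 1 and detections for ids 1 and 7, only the marker with id 1 is matched, so four correspondences are produced and `markersUsed()` reports one marker.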

The drawFrameAxes() function can be used to check the obtained pose. For instance:

And this is another example with the board partially occluded:

As can be observed, although some markers have not been detected, the board pose can still be estimated from the remaining markers.

Sample video:

@youtube{Q1HlJEjW\_j0}

A full working example is included in `detect_board.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like this:

@code{.cpp}
-w=5 -h=7 -l=100 -s=10 -v=/path\_to\_opencv/opencv/doc/tutorials/objdetect/aruco\_board\_detection/gboriginal.jpg -c=/path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_camera\_params.yml -cd=/path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_dict.yml
@endcode

Parameters for `detect_board.cpp`:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board.cpp aruco\_detect\_board\_keys

## Grid Board

Creating the `cv::aruco::Board` object requires specifying the corner positions for each marker in the environment. However, in many cases, the board will be just a set of markers in the same plane and in a grid layout, so it can be easily printed and used.

Fortunately, the aruco module provides the basic functionality to create and print these types of markers easily.

The `cv::aruco::GridBoard` class is a specialized class that inherits from the `cv::aruco::Board` class and which represents a Board with all the markers in the same plane and in a grid layout, as in the following image:

Concretely, the coordinate system in a grid board is positioned in the board plane, centered in the bottom left corner of the board and with the Z pointing out, like in the following image (X:red, Y:green, Z:blue):

A `cv::aruco::GridBoard` object can be defined using the following parameters:

-   Number of markers in the X direction.
-   Number of markers in the Y direction.
-   Length of the marker side.
-   Length of the marker separation.
-   The dictionary of the markers.
-   Ids of all the markers (X\*Y markers).

This object can be easily created from these parameters using the `cv::aruco::GridBoard` constructor:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board.cpp aruco\_create\_board

-   The first and second parameters are the number of markers in the X and Y directions, respectively.
-   The third and fourth parameters are the marker length and the marker separation, respectively. They can be provided in any unit, keeping in mind that the estimated pose for this board will be measured in the same unit (in general, meters are used).
-   Finally, the dictionary of the markers is provided.

So, this board will be composed of 5x7=35 markers. The ids of the markers are assigned, by default, in ascending order starting at 0, so they will be 0, 1, 2, ..., 34.
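To make the geometry of such a board concrete, the sketch below computes the overall board size and the position of each marker from the four numeric parameters. It is an illustration only: `MarkerSlot`, `GridLayout` and `makeGridLayout()` are hypothetical names, and the row-by-row id order is an assumption of this sketch rather than something stated above.

```cpp
#include <vector>

struct MarkerSlot {
    int id;      // marker id, assigned in ascending order starting at 0
    float x, y;  // position of the marker's first corner in the board plane
};

struct GridLayout {
    float width, height;            // overall board size, same unit as markerLength
    std::vector<MarkerSlot> slots;  // one entry per marker
};

// Assumption: ids are assigned row by row (row-major order).
GridLayout makeGridLayout(int markersX, int markersY,
                          float markerLength, float markerSeparation)
{
    GridLayout g;
    g.width  = markersX * markerLength + (markersX - 1) * markerSeparation;
    g.height = markersY * markerLength + (markersY - 1) * markerSeparation;

    // Distance between the first corners of two neighbouring markers.
    const float pitch = markerLength + markerSeparation;
    for (int row = 0; row < markersY; ++row)
        for (int col = 0; col < markersX; ++col)
            g.slots.push_back({row * markersX + col, col * pitch, row * pitch});
    return g;
}
```

For the example values `-w=5 -h=7 -l=100 -s=10`, this gives 35 markers on a board of 540 x 760 units, with a pitch of 110 units between neighbouring markers.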

After creating a grid board, we probably want to print it and use it. There are two ways to do this:

1.  By using the script `apps/pattern-tools/generate_pattern.py`, see @subpage tutorial\_camera\_calibration\_pattern.
2.  By using the function `cv::aruco::GridBoard::generateImage()`.

The function `cv::aruco::GridBoard::generateImage()` can be called using the following code:

@snippet samples/cpp/tutorial\_code/objectDetection/create\_board.cpp aruco\_generate\_board\_image

-   The first parameter is the size of the output image in pixels. In this case 600x500 pixels. If this is not proportional to the board dimensions, the board will be centered on the image.
-   `boardImage`: the output image with the board.
-   The third parameter is the (optional) margin in pixels, so none of the markers are touching the image border. In this case the margin is 10.
-   Finally, the size of the marker border, similarly to `generateImageMarker()` function. The default value is 1.

A full working example of board creation is included in `samples/cpp/tutorial_code/objectDetection/create_board.cpp`.

The output image will be something like this:

The sample takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like this:

@code{.cpp}
"_output\_path_/aboard.png" -w=5 -h=7 -l=100 -s=10 -d=10
@endcode

## Refine marker detection

ArUco boards can also be used to improve the detection of markers. If we have detected a subset of the markers that belongs to the board, we can use these markers and the board layout information to try to find the markers that have not been previously detected.

This can be done using the `cv::aruco::ArucoDetector::refineDetectedMarkers()` function, which should be called after calling `cv::aruco::ArucoDetector::detectMarkers()`.

The main parameters of this function are the original image where markers were detected, the board object, the detected marker corners, the detected marker ids and the rejected marker corners.

The rejected corners can be obtained from the `cv::aruco::ArucoDetector::detectMarkers()` function and are also known as marker candidates. These candidates are square shapes that have been found in the original image but have failed to pass the identification step (i.e. their inner codification presents too many errors), and thus they have not been recognized as markers.

However, these candidates are sometimes actual markers that have not been correctly identified due to high noise in the image, very low resolution or other related problems that affect the binary code extraction. The `cv::aruco::ArucoDetector::refineDetectedMarkers()` function finds correspondences between these candidates and the missing markers of the board. This search is based on two parameters:

-   Distance between the candidate and the projection of the missing marker. To obtain these projections, it is necessary to have detected at least one marker of the board. The projections are obtained using the camera parameters (camera matrix and distortion coefficients) if they are provided. If not, the projections are obtained from a local homography and only planar boards are allowed (i.e. the Z coordinate of all the marker corners should be the same). The `minRepDistance` parameter in `refineDetectedMarkers()` sets the Euclidean distance threshold between the candidate corners and the projected marker corners for a candidate to be considered a correspondence (default value 10).
    
-   Binary codification. If a candidate passes the distance condition, its internal bits are analyzed again to determine whether it is actually the projected marker. However, in this case, the condition is less strict and the number of allowed erroneous bits can be higher. This is controlled by the `errorCorrectionRate` parameter (default value 3.0). If a negative value is provided, the internal bits are not analyzed at all and only the corner distances are evaluated.
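The two criteria above can be summarized in a small decision function. This is a simplified sketch under stated assumptions, not the library's code: it assumes the candidate and projected corners are listed in the same order, it combines the four corner distances by taking the largest one, and it assumes the allowed number of erroneous bits is the dictionary's correction capability (`maxCorrectionBits`) multiplied by `errorCorrectionRate`. The names `Pt`, `maxCornerDistance()` and `acceptCandidate()` are hypothetical.

```cpp
#include <array>
#include <cmath>

struct Pt { float x, y; };

// Largest corner-to-corner Euclidean distance between a candidate and a
// projected marker, assuming both list their corners in the same order.
float maxCornerDistance(const std::array<Pt, 4>& candidate,
                        const std::array<Pt, 4>& projected)
{
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        const float d = std::hypot(candidate[i].x - projected[i].x,
                                   candidate[i].y - projected[i].y);
        if (d > worst) worst = d;
    }
    return worst;
}

// Decide whether a rejected candidate is accepted as the missing marker.
bool acceptCandidate(const std::array<Pt, 4>& candidate,
                     const std::array<Pt, 4>& projected,
                     int erroneousBits, int maxCorrectionBits,
                     float minRepDistance = 10.0f,
                     float errorCorrectionRate = 3.0f)
{
    // Criterion 1: the candidate must lie close to the projection.
    if (maxCornerDistance(candidate, projected) > minRepDistance) return false;
    // A negative rate disables the bit check entirely.
    if (errorCorrectionRate < 0.0f) return true;
    // Criterion 2 (assumed formula): relaxed limit on erroneous bits.
    const int allowed = static_cast<int>(maxCorrectionBits * errorCorrectionRate);
    return erroneousBits <= allowed;
}
```

With the default values, a candidate whose corners are all 5 pixels away from the projection is accepted as long as its number of erroneous bits does not exceed three times the dictionary's normal correction capability.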
    

This is an example of using the `cv::aruco::ArucoDetector::refineDetectedMarkers()` function:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board.cpp aruco\_detect\_and\_refine

Note also that if too few markers are detected initially (for instance only 1 or 2 markers), the projections of the missing markers can be of poor quality and produce erroneous correspondences.

See module samples for a more detailed implementation.

## [Aruco Calibration](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/aruco_calibration/aruco_calibration/)


# Calibration with ArUco and ChArUco {#tutorial\_aruco\_calibration}

@prev\_tutorial{tutorial\_charuco\_diamond\_detection} @next\_tutorial{tutorial\_aruco\_faq}

The ArUco module can also be used to calibrate a camera. Camera calibration consists of obtaining the camera intrinsic parameters and distortion coefficients. These parameters remain fixed unless the camera optics are modified, so camera calibration only needs to be done once.

Camera calibration is usually performed using the OpenCV `cv::calibrateCamera()` function. This function requires some correspondences between environment points and their projection in the camera image from different viewpoints. In general, these correspondences are obtained from the corners of chessboard patterns. See `cv::calibrateCamera()` function documentation or the OpenCV calibration tutorial for more detailed information.

Using the ArUco module, calibration can be performed based on ArUco markers corners or ChArUco corners. Calibrating using ArUco is much more versatile than using traditional chessboard patterns, since it allows occlusions or partial views.

As stated above, calibration can be done using either marker corners or ChArUco corners. However, using the ChArUco corners is highly recommended, since they are much more accurate than the marker corners. Calibration using a standard board should only be employed in scenarios where ChArUco boards cannot be used because of some restriction.

## Calibration with ChArUco Boards

To calibrate using a ChArUco board, it is necessary to detect the board from different viewpoints, in the same way that the standard calibration does with the traditional chessboard pattern. However, due to the benefits of using ChArUco, occlusions and partial views are allowed, and not all the corners need to be visible in all the viewpoints.

An example of using `cv::calibrateCamera()` with `cv::aruco::CharucoBoard`:

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera\_charuco.cpp CalibrationWithCharucoBoard1

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera\_charuco.cpp CalibrationWithCharucoBoard2

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera\_charuco.cpp CalibrationWithCharucoBoard3

The ChArUco corners and ChArUco identifiers captured on each viewpoint are stored in the vectors `allCharucoCorners` and `allCharucoIds`, one element per viewpoint.

The `calibrateCamera()` function will fill the `cameraMatrix` and `distCoeffs` arrays with the camera calibration parameters. It will return the reprojection error obtained from the calibration. The elements in `rvecs` and `tvecs` will be filled with the estimated pose of the camera (with respect to the ChArUco board) in each of the viewpoints.
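The returned reprojection error is commonly interpreted as a root-mean-square (RMS) distance, in pixels, between the observed corners and the corners reprojected with the estimated parameters. The sketch below shows that computation for two matching point lists. The normalization by the number of points is an assumption of this sketch, and `Pt` and `rmsReprojectionError()` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// RMS distance between observed image points and their reprojections.
// Points are paired by index; extra points in the longer list are ignored.
double rmsReprojectionError(const std::vector<Pt>& observed,
                            const std::vector<Pt>& reprojected)
{
    const std::size_t n = std::min(observed.size(), reprojected.size());
    if (n == 0) return 0.0;

    double sumSq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = observed[i].x - reprojected[i].x;
        const double dy = observed[i].y - reprojected[i].y;
        sumSq += dx * dx + dy * dy;
    }
    return std::sqrt(sumSq / static_cast<double>(n));
}
```

A low value indicates that the estimated parameters explain the observed corners well. Unusually high values often point to badly detected corners or to too few distinct viewpoints.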

Finally, the `calibrationFlags` parameter determines some of the options for the calibration.

A full working example is included in `calibrate_camera_charuco.cpp` inside the `samples/cpp/tutorial_code/objectDetection` folder.

The sample takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like this:

@code{.cpp}
"camera\_calib.txt" -w=5 -h=7 -sl=0.04 -ml=0.02 -d=10 -v=path/img\_%02d.jpg
@endcode

The camera calibration parameters in `opencv/samples/cpp/tutorial_code/objectDetection/tutorial_camera_charuco.yml` were obtained from the images `img_00.jpg` to `img_03.jpg` located in this [folder](https://github.com/opencv/opencv_contrib/tree/4.6.0/modules/aruco/tutorials/aruco_calibration/images).

## Calibration with ArUco Boards

As stated above, using ChArUco boards instead of ArUco boards is recommended for camera calibration, since ChArUco corners are more accurate than marker corners. However, in some special cases calibration based on ArUco boards may be required. As in the previous case, it requires detecting an ArUco board from different viewpoints.

An example of using `cv::calibrateCamera()` with `cv::aruco::GridBoard`:

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera.cpp CalibrationWithArucoBoard1

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera.cpp CalibrationWithArucoBoard2

@snippet samples/cpp/tutorial\_code/objectDetection/calibrate\_camera.cpp CalibrationWithArucoBoard3

A full working example is included in `calibrate_camera.cpp` inside the `samples/cpp/tutorial_code/objectDetection` folder.

The sample takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like this:

@code{.cpp}
"camera\_calib.txt" -w=5 -h=7 -l=100 -s=10 -d=10 -v=path/aruco\_videos\_or\_images
@endcode

## [Aruco Detection](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/aruco_detection/aruco_detection/)


## [Aruco Faq](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/aruco_faq/aruco_faq/)


# Aruco module FAQ {#tutorial\_aruco\_faq}

@prev\_tutorial{tutorial\_aruco\_calibration} @next\_tutorial{tutorial\_barcode\_detect\_and\_decode}

This is a compilation of questions that can be useful for those who want to use the aruco module.

-   I only want to label some objects, what should I use?

In this case, you only need single ArUco markers. You can place one or several markers with different ids on each of the objects you want to identify.

-   Which algorithm is used for marker detection?

The aruco module is based on the original ArUco library. A full description of the detection process can be found in:

> S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and M. J. Marín-Jiménez. 2014. "Automatic generation and detection of highly reliable fiducial markers under occlusion". Pattern Recogn. 47, 6 (June 2014), 2280-2292. DOI=10.1016/j.patcog.2014.01.005

-   My markers are not being detected correctly, what can I do?

Many factors can prevent the correct detection of markers. You probably need to adjust some of the parameters in the `cv::aruco::DetectorParameters` object. The first thing you can do is check whether your markers are returned as rejected candidates by the `cv::aruco::ArucoDetector::detectMarkers()` function. Depending on this, you should try to modify different parameters.

If you are using an ArUco board, you can also try the `cv::aruco::ArucoDetector::refineDetectedMarkers()` function. If you are [using big markers](https://github.com/opencv/opencv_contrib/issues/2811) (400x400 pixels and more), try increasing the `cv::aruco::DetectorParameters::adaptiveThreshWinSizeMax` value. Also avoid [narrow borders around the ArUco marker](https://github.com/opencv/opencv_contrib/issues/2492) (5% or less of the marker perimeter, adjusted by `cv::aruco::DetectorParameters::minMarkerDistanceRate`).

-   What are the benefits of ArUco boards? What are the drawbacks?

Using a board of markers, you can obtain the camera pose from a set of markers instead of a single one. This way, the detection can handle occlusions or partial views of the board, since only one marker is necessary to obtain the pose.

Furthermore, as in most cases you are using more corners for pose estimation, it will be more accurate than using a single marker.

The main drawback is that a Board is not as versatile as a single marker.

-   What are the benefits of ChArUco boards over ArUco boards? And the drawbacks?

ChArUco boards combine chessboards with ArUco boards. Thanks to this, the corners provided by ChArUco boards are more accurate than those provided by ArUco boards (or single markers).

The main drawback is that ChArUco boards are not as versatile as ArUco boards. For instance, a ChArUco board is a planar board with a specific marker layout, while ArUco boards can have any layout, even in 3d. Furthermore, the markers in a ChArUco board are usually smaller and more difficult to detect.

-   I do not need pose estimation, should I use ChArUco boards?

No. The main goal of ChArUco boards is to provide highly accurate corners for pose estimation or camera calibration.

-   Should all the markers in an ArUco board be placed in the same plane?

No, the marker corners in an ArUco board can be placed anywhere in its 3d coordinate system.

-   Should all the markers in a ChArUco board be placed in the same plane?

Yes, all the markers in a ChArUco board need to be in the same plane and their layout is fixed by the chessboard shape.

-   What is the difference between a `cv::aruco::Board` object and a `cv::aruco::GridBoard` object?

The `cv::aruco::GridBoard` class is a specific type of board that inherits from `cv::aruco::Board` class. A `cv::aruco::GridBoard` object is a board whose markers are placed in the same plane and in a grid layout.

-   What are Diamond markers?

Diamond markers are very similar to a ChArUco board of 3x3 squares. However, contrary to ChArUco boards, the detection of diamonds is based on the relative position of the markers. They are useful when you want to provide a conceptual meaning to any (or all) of the markers in the diamond. An example is using one of the markers to provide the diamond scale.

-   Do I need to detect markers before board detection, ChArUco board detection or Diamond detection?

Yes, the detection of single markers is a basic tool in the aruco module. It is done using the `cv::aruco::ArucoDetector::detectMarkers()` function. The rest of the functionality receives a list of detected markers from this function.

-   I want to calibrate my camera, can I use this module?

Yes, the aruco module provides functionality to calibrate the camera using both ArUco boards and ChArUco boards.

-   Should I calibrate using a ChArUco board or an ArUco board?

Calibration using a ChArUco board is highly recommended due to its higher accuracy.

-   Should I use a predefined dictionary or generate my own dictionary?

In general, it is easier to use one of the predefined dictionaries. However, if you need a bigger dictionary (in terms of number of markers or number of bits) you should generate your own dictionary. Dictionary generation is also useful if you want to maximize the inter-marker distance to achieve a better error correction during the identification step.
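The inter-marker distance mentioned above is usually measured as a Hamming distance between the bit grids of two markers, taking into account that a marker can be seen in any of four rotations. The sketch below illustrates this measure together with the standard coding-theory rule that a minimum distance of d allows floor((d - 1) / 2) bit errors to be corrected. The names `Bits`, `rotate90()`, `hamming()`, `markerDistance()` and `correctableBits()` are hypothetical, and the code assumes square grids of equal size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Bits = std::vector<std::vector<int>>;  // square grid of 0/1 values

// Rotate a square bit grid by 90 degrees clockwise.
Bits rotate90(const Bits& m)
{
    const std::size_t n = m.size();
    Bits r(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            r[j][n - 1 - i] = m[i][j];
    return r;
}

// Number of differing bits between two grids of the same size.
int hamming(const Bits& a, const Bits& b)
{
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            if (a[i][j] != b[i][j]) ++diff;
    return diff;
}

// Smallest Hamming distance between a and any of the four rotations of b.
int markerDistance(const Bits& a, Bits b)
{
    int best = hamming(a, b);
    for (int k = 0; k < 3; ++k) {
        b = rotate90(b);
        best = std::min(best, hamming(a, b));
    }
    return best;
}

// Number of bit errors that can be corrected for a given minimum distance.
int correctableBits(int minDistance) { return (minDistance - 1) / 2; }
```

Two markers that are rotated copies of each other have distance 0 under this measure, which is why such pairs must not appear in the same dictionary.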

-   I am generating my own dictionary but it takes too long

Dictionary generation should only be done once at the beginning of your application and it should take a few seconds. If you are generating the dictionary on each iteration of your detection loop, you are doing it wrong.

Furthermore, it is advisable to save the dictionary to a file with `cv::aruco::Dictionary::writeDictionary()` and read it with `cv::aruco::Dictionary::readDictionary()` on every execution, so you don't need to generate it again.

-   I would like to use some markers of the original ArUco library that I have already printed, can I use them?

Yes, one of the predefined dictionaries is `cv::aruco::DICT_ARUCO_ORIGINAL`, which detects the markers of the original ArUco library with the same identifiers.

-   Can I use the Board configuration file of the original ArUco library in this module?

Not directly, you will need to adapt the information of the ArUco file to the aruco module Board format.

-   Can I use this module to detect the markers of other libraries based on binary fiducial markers?

Probably yes, however you will need to port the dictionary of the original library to the aruco module format.

-   Do I need to store the Dictionary information in a file so I can use it in different executions?

If you are using one of the predefined dictionaries, it is not necessary. Otherwise, it is advisable to save it to a file.

-   Do I need to store the Board information in a file so I can use it in different executions?

If you are using a `cv::aruco::GridBoard` or a `cv::aruco::CharucoBoard`, you only need to store the board measurements that are provided to the `cv::aruco::GridBoard::GridBoard()` constructor or to the `cv::aruco::CharucoBoard` constructor. If you manually modify the marker ids of the boards, or if you use a different type of board, you should save your board object to a file.

-   Does the aruco module provide functions to save the Dictionary or Board to file?

You can use `cv::aruco::Dictionary::writeDictionary()` and `cv::aruco::Dictionary::readDictionary()` for `cv::aruco::Dictionary`. The data members of the board classes are public and can be easily stored.

-   Alright, but how can I render a 3d model to create an augmented reality application?

To do so, you will need to use an external rendering engine library, such as OpenGL. The aruco module only provides the functionality to obtain the camera pose, i.e. the rotation and translation vectors, which is necessary to create the augmented reality effect. However, you will need to adapt the rotation and translation vectors from the OpenCV format to the format accepted by your 3d rendering library. The original ArUco library contains examples of how to do it for OpenGL and Ogre3D.
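A typical first step of that adaptation is turning the rotation vector (axis-angle form) into a 3x3 rotation matrix. OpenCV provides `cv::Rodrigues()` for this. The sketch below only spells out the underlying Rodrigues formula in plain C++ so the conversion is easy to follow. `Mat3` and `rodrigues()` are hypothetical names, and any axis-convention change required by a particular rendering library is not covered.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Convert a rotation vector (axis * angle, in radians) to a rotation matrix
// using the Rodrigues formula.
Mat3 rodrigues(double rx, double ry, double rz)
{
    Mat3 R{};  // zero-initialized
    R[0][0] = R[1][1] = R[2][2] = 1.0;

    const double theta = std::sqrt(rx * rx + ry * ry + rz * rz);
    if (theta < 1e-12) return R;  // no rotation: identity

    // Unit rotation axis.
    const double kx = rx / theta, ky = ry / theta, kz = rz / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;

    R[0][0] = c + kx * kx * v;
    R[0][1] = kx * ky * v - kz * s;
    R[0][2] = kx * kz * v + ky * s;
    R[1][0] = ky * kx * v + kz * s;
    R[1][1] = c + ky * ky * v;
    R[1][2] = ky * kz * v - kx * s;
    R[2][0] = kz * kx * v - ky * s;
    R[2][1] = kz * ky * v + kx * s;
    R[2][2] = c + kz * kz * v;
    return R;
}
```

The resulting matrix, together with the translation vector, can then be assembled into whatever 4x4 transform layout the rendering engine expects.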

-   I have used this module in my research work, how can I cite it?

You can cite the original ArUco library:

> S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and M. J. Marín-Jiménez. 2014. "Automatic generation and detection of highly reliable fiducial markers under occlusion". Pattern Recogn. 47, 6 (June 2014), 2280-2292. DOI=10.1016/j.patcog.2014.01.005

-   Pose estimation markers are not being detected correctly, what can I do?

It is important to note that estimating the pose using only 4 coplanar points is subject to ambiguity. In general, the ambiguity can be resolved if the camera is near the marker. However, as the marker becomes small, the errors in the corner estimation grow and the ambiguity becomes a problem. Try increasing the size of the marker you're using, and you can also try non-symmetrical markers (see aruco\_dict\_utils.cpp) to avoid collisions. Use multiple markers (ArUco/ChArUco/Diamonds boards) and pose estimation with solvePnP() with the `cv::SOLVEPNP_IPPE_SQUARE` option. More in [this issue](https://github.com/opencv/opencv/issues/8813).

## [Barcode Detect And Decode](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/barcode_detect_and_decode/barcode_detect_and_decode/)


# Barcode Recognition {#tutorial\_barcode\_detect\_and\_decode}

@tableofcontents

@prev\_tutorial{tutorial\_aruco\_faq}

|    |    |
| -: | :- |
| Compatibility | OpenCV >= 4.8 |

## Goal

In this chapter we will familiarize ourselves with the barcode detection and decoding methods available in OpenCV.

## Basics

Barcodes are a major technique for identifying commodities in real life. A common barcode is a pattern of parallel black bars and white bars with vastly different reflectivity. Barcode recognition scans the barcode in the horizontal direction to obtain a string of binary codes composed of bars of different widths and colors, that is, the code information of the barcode. The content of the barcode can then be decoded by matching it against various barcode encoding methods. Currently, the EAN-8, EAN-13, UPC-A and UPC-E standards are supported.

See [https://en.wikipedia.org/wiki/Universal\_Product\_Code](https://en.wikipedia.org/wiki/Universal_Product_Code) and [https://en.wikipedia.org/wiki/International\_Article\_Number](https://en.wikipedia.org/wiki/International_Article_Number)
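As background on these standards, an EAN-13 code carries a check digit as its last digit, which lets a reader reject most misread codes. The sketch below implements the standard check-digit rule (weights 1 and 3 alternating over the first 12 digits). It is an illustration of the standard, not a description of how the OpenCV decoder is implemented, and `ean13CheckDigit()` and `isValidEan13()` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Returns the check digit (0-9) for the first 12 digits of an EAN-13 code,
// or -1 if the input is not exactly 12 decimal digits.
int ean13CheckDigit(const std::string& first12)
{
    if (first12.size() != 12) return -1;
    int sum = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        const char ch = first12[i];
        if (ch < '0' || ch > '9') return -1;
        // Digits in odd positions (1st, 3rd, ...) weigh 1, the others weigh 3.
        const int weight = (i % 2 == 0) ? 1 : 3;
        sum += (ch - '0') * weight;
    }
    return (10 - sum % 10) % 10;
}

// True if `code` consists of 13 digits and its last digit matches the
// check digit computed from the first 12.
bool isValidEan13(const std::string& code)
{
    if (code.size() != 13) return false;
    const int check = ean13CheckDigit(code.substr(0, 12));
    return check >= 0 && check == code[12] - '0';
}
```

For the digits `400638133393`, the weighted sum is 89, so the check digit is 1 and the full code is `4006381333931`.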

Related papers: @cite Xiangmin2015research , @cite kass1987analyzing , @cite bazen2002systematic

## Code example

### Main class

Several algorithms were introduced for barcode recognition.

When coding, we first need to create a cv::barcode::BarcodeDetector object. It has three main member functions, which are introduced below.

#### Initialization

Optionally, the user can construct the barcode detector with a super resolution model, passed as a single-file ONNX network (`sr.onnx`). A converted model can be downloaded from [https://github.com/WeChatCV/opencv\_3rdparty/tree/wechat\_qrcode](https://github.com/WeChatCV/opencv_3rdparty/tree/wechat_qrcode). Caffe models (`sr.prototxt` / `sr.caffemodel`) are no longer supported.

@snippet cpp/barcode.cpp initialize

We need to create variables to store the outputs.

@snippet cpp/barcode.cpp output

#### Detecting

The cv::barcode::BarcodeDetector::detect method uses an algorithm based on directional coherence. First, we compute the average squared gradients of every pixel, @cite bazen2002systematic . Then we divide the image into square patches and compute the **gradient orientation coherence** and **mean gradient direction** of each patch. Then, we connect all patches that have **high gradient orientation coherence** and **similar gradient direction**. At this stage we use multiscale patches to capture the gradient distribution of multi-size barcodes, and apply non-maximum suppression to filter duplicate proposals. Finally, we use cv::minAreaRect to bound the ROI, and output the corners of the rectangles.
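The per-patch quantities can be illustrated with the squared-gradient formulation from the cited reference: the sums of gx², gy² and gx·gy over a patch yield both a coherence value between 0 and 1 and a mean orientation. The sketch below is a minimal version of that computation. `Gradient`, `PatchStats` and `patchCoherence()` are hypothetical names, and the thresholds and patch handling used by the detector itself are not shown.

```cpp
#include <cmath>
#include <vector>

struct Gradient { double gx, gy; };

struct PatchStats {
    double coherence;    // 0 = no dominant direction, 1 = perfectly aligned
    double orientation;  // mean gradient direction in radians, in (-pi/2, pi/2]
};

// Coherence and mean orientation of the gradients inside one patch,
// computed from squared-gradient sums.
PatchStats patchCoherence(const std::vector<Gradient>& patch)
{
    double gxx = 0.0, gyy = 0.0, gxy = 0.0;
    for (const Gradient& g : patch) {
        gxx += g.gx * g.gx;
        gyy += g.gy * g.gy;
        gxy += g.gx * g.gy;
    }

    PatchStats s{0.0, 0.0};
    const double energy = gxx + gyy;
    if (energy > 0.0) {
        const double d = gxx - gyy;
        s.coherence = std::sqrt(d * d + 4.0 * gxy * gxy) / energy;
        s.orientation = 0.5 * std::atan2(2.0 * gxy, d);
    }
    return s;
}
```

Inside a barcode, almost all gradients point across the bars, in either direction, so the coherence is close to 1. Squaring the gradients is what makes opposite directions reinforce rather than cancel each other.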

Detect codes in the input image, and output the corners of detected rectangles:

@snippet cpp/barcode.cpp detect

#### Decoding

The cv::barcode::BarcodeDetector::decode method first super-scales the image (_optionally_) if it is smaller than a threshold, sharpens the image, and then binarizes it using Otsu's method or local binarization. It then reads the contents of the barcode by matching the similarity of the specified barcode pattern.
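For readers unfamiliar with Otsu's method, the sketch below shows the idea on a plain list of 8-bit pixel values: it picks the threshold that maximizes the between-class variance of the two resulting groups. In OpenCV this is available through `cv::threshold()` with the `cv::THRESH_OTSU` flag. The function `otsuThreshold()` here is a hypothetical stand-alone illustration, not the decoder's code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Returns a threshold t such that pixels <= t form one class and pixels > t
// the other, chosen to maximize the between-class variance.
int otsuThreshold(const std::vector<std::uint8_t>& pixels)
{
    std::array<double, 256> hist{};  // zero-initialized histogram
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double weightLow = 0.0, sumLow = 0.0, bestVariance = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += hist[t];
        sumLow += t * hist[t];
        if (weightLow == 0.0) continue;  // no pixels at or below t yet
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;    // all pixels are at or below t
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            bestT = t;
        }
    }
    return bestT;
}
```

For an image that contains only dark bars and bright gaps, any threshold between the two intensity levels separates them, and the function returns the lowest such value.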

#### Detecting and decoding

cv::barcode::BarcodeDetector::detectAndDecode combines `detect` and `decode` in a single call. A simple example below shows how to use this function:

@snippet cpp/barcode.cpp detectAndDecode

Visualize the results:

@snippet cpp/barcode.cpp visualize

## Results

Original image:

After detection:

## [Charuco Detection](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/charuco_detection/charuco_detection/)


# Detection of ChArUco Boards {#tutorial\_charuco\_detection}

@prev\_tutorial{tutorial\_aruco\_board\_detection} @next\_tutorial{tutorial\_charuco\_diamond\_detection}

ArUco markers and boards are very useful due to their fast detection and their versatility. However, one of the problems of ArUco markers is that the accuracy of their corner positions is not too high, even after applying subpixel refinement.

On the contrary, the corners of chessboard patterns can be refined more accurately since each corner is surrounded by two black squares. However, finding a chessboard pattern is not as versatile as finding an ArUco board: it has to be completely visible and occlusions are not permitted.

A ChArUco board tries to combine the benefits of these two approaches:

The ArUco part is used to interpolate the position of the chessboard corners, so that it has the versatility of marker boards, since it allows occlusions or partial views. Moreover, since the interpolated corners belong to a chessboard, they are very accurate in terms of subpixel accuracy.

When high precision is necessary, such as in camera calibration, ChArUco boards are a better option than standard ArUco boards.

## Goal

In this tutorial you will learn:

-   How to create a ChArUco board
-   How to detect the ChArUco corners without performing camera calibration
-   How to detect the ChArUco corners with camera calibration and pose estimation

## Source code

You can find this code in `samples/cpp/tutorial_code/objectDetection/detect_board_charuco.cpp`

Here is sample code that covers all the items in the goal list.

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp charuco\_detect\_board\_full\_sample

## ChArUco Board Creation

The aruco module provides the `cv::aruco::CharucoBoard` class that represents a Charuco Board and which inherits from the `cv::aruco::Board` class.

This class, like the rest of the ChArUco functionality, is defined in:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp charucohdr

To define a `cv::aruco::CharucoBoard`, the following are needed:

-   Number of chessboard squares in X and Y directions.
-   Length of square side.
-   Length of marker side.
-   The dictionary of the markers.
-   Ids of all the markers.

As with `cv::aruco::GridBoard` objects, the aruco module makes it easy to create a `cv::aruco::CharucoBoard`. The object can be created from these parameters using the `cv::aruco::CharucoBoard` constructor:

@snippet samples/cpp/tutorial\_code/objectDetection/create\_board\_charuco.cpp create\_charucoBoard

-   The first parameter is the number of squares in the X and Y directions, respectively.
-   The second and third parameters are the length of the squares and the markers, respectively. They can be provided in any unit, keeping in mind that the estimated pose for this board will be measured in the same unit (usually meters are used).
-   Finally, the dictionary of the markers is provided.

The ids of the markers are assigned by default in ascending order starting at 0, as in the `cv::aruco::GridBoard` constructor. This can be customized by accessing the ids vector through `board.ids`, as in the `cv::aruco::Board` parent class.
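Besides the markers, the board geometry also determines the interior chessboard corners, which are the points that ChArUco detection ultimately reports. A board with squaresX x squaresY squares has (squaresX - 1) x (squaresY - 1) such corners. The sketch below enumerates them. `CharucoCorner` and `charucoCorners()` are hypothetical names, and the row-by-row id order and the origin at a board corner are assumptions of this illustration.

```cpp
#include <vector>

struct CharucoCorner {
    int id;      // corner id, starting at 0
    float x, y;  // position in the board plane, same unit as squareLength
};

// Enumerate the interior chessboard corners of a ChArUco board.
// Assumption: ids are assigned row by row, and the origin is a board corner.
std::vector<CharucoCorner> charucoCorners(int squaresX, int squaresY,
                                          float squareLength)
{
    std::vector<CharucoCorner> out;
    for (int row = 0; row < squaresY - 1; ++row)
        for (int col = 0; col < squaresX - 1; ++col)
            out.push_back({row * (squaresX - 1) + col,
                           (col + 1) * squareLength,
                           (row + 1) * squareLength});
    return out;
}
```

For a board of 5x7 squares with a square length of 100, this yields 24 corners with ids 0 to 23, the first at (100, 100) and the last at (400, 600).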

Once we have our `cv::aruco::CharucoBoard` object, we can create an image to print it. There are two ways to do this:

1.  By using the script `apps/pattern-tools/generate_pattern.py`, see @subpage tutorial\_camera\_calibration\_pattern.
2.  By using the function `cv::aruco::CharucoBoard::generateImage()`.

The function `cv::aruco::CharucoBoard::generateImage()` is a member of the `cv::aruco::CharucoBoard` class and can be called as follows:

@snippet samples/cpp/tutorial\_code/objectDetection/create\_board\_charuco.cpp generate\_charucoBoard

-   The first parameter is the size of the output image in pixels. If this is not proportional to the board dimensions, the board will be centered in the image.
-   The second parameter is the output image with the charuco board.
-   The third parameter is the (optional) margin in pixels, so that none of the markers touch the image border.
-   Finally, the last parameter is the size of the marker border, as in the `cv::aruco::generateImageMarker()` function. The default value is 1.

The output image will be something like this:

A full working example is included in `create_board_charuco.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `create_board_charuco.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} "_output\_path_/chboard.png" -w=5 -h=7 -sl=100 -ml=60 -d=10 @endcode

## ChArUco Board Detection

When you detect a ChArUco board, what you are actually detecting is each of the chessboard corners of the board.

Each corner on a ChArUco board has a unique identifier (id) assigned. These ids go from 0 to the total number of corners on the board minus one. ChArUco board detection can be broken down into the following steps:

-   **Taking input Image**

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp inputImg

The original image in which the markers are to be detected. The image is needed to perform subpixel refinement of the ChArUco corners.

-   **Reading the camera calibration parameters (only for detection with camera calibration)**

@snippet samples/cpp/tutorial\_code/objectDetection/aruco\_samples\_utility.hpp camDistCoeffs

The parameters of `readCameraParameters` are:

-   The first parameter is the path to the camera intrinsic matrix and distortion coefficients.
-   The second and third parameters are cameraMatrix and distCoeffs.

This function takes these parameters as input and returns a boolean value of whether the camera calibration parameters are valid or not. For detection of charuco corners without calibration, this step is not required.

-   **Detecting the markers and interpolation of charuco corners from markers**

The detection of the ChArUco corners is based on the previously detected markers: the markers are detected first, and the ChArUco corners are then interpolated from them. The method that detects the ChArUco corners is `cv::aruco::CharucoDetector::detectBoard()`.

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp interpolateCornersCharuco

The parameters of `detectBoard` are:

-   `image` - Input image.
-   `charucoCorners` - output list of image positions of the detected corners.
-   `charucoIds` - output ids for each of the detected corners in `charucoCorners`.
-   `markerCorners` - input/output vector of detected marker corners.
-   `markerIds` - input/output vector of identifiers of the detected markers.

If `markerCorners` and `markerIds` are empty, the function detects the ArUco markers and their ids itself.

If calibration parameters are provided, the ChArUco corners are interpolated by, first, estimating a rough pose from the ArUco markers and, then, reprojecting the ChArUco corners back to the image.

On the other hand, if calibration parameters are not provided, the ChArUco corners are interpolated by calculating the corresponding homography between the ChArUco plane and the ChArUco image projection.

The main problem with using a homography is that the interpolation is more sensitive to image distortion. For this reason, the homography is computed using only the markers closest to each ChArUco corner, which reduces the effect of distortion.
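To make the homography case more concrete, the following sketch maps a corner position on the board plane to image coordinates with a 3x3 homography. It is a plain-Python illustration, not the OpenCV implementation; the matrix `H` and the corner position are made-up values, and the helper `apply_homography` is hypothetical. A homography is a purely projective mapping, so it cannot represent lens distortion, which is why restricting it to nearby markers helps.

```python
def apply_homography(H, x, y):
    """Map the plane point (x, y) to image coordinates with the 3x3 matrix H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to get pixel coordinates.
    return u / w, v / w


# Hypothetical homography: 1000 pixels per board unit plus a (50, 20) pixel offset.
H = [[1000.0, 0.0, 50.0],
     [0.0, 1000.0, 20.0],
     [0.0, 0.0, 1.0]]

corner_on_board = (0.04, 0.04)  # corner position in board units (e.g. meters)
print(apply_homography(H, *corner_on_board))
```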

When detecting markers for ChArUco boards, and especially when using a homography, it is recommended to disable the corner refinement of markers. The reason is that, due to the proximity of the chessboard squares, the subpixel process can produce significant deviations in the corner positions, and these deviations propagate to the ChArUco corner interpolation, producing poor results.

@note To avoid deviations, the margin between chessboard square and aruco marker should be greater than 70% of one marker module.

Furthermore, only those corners whose two surrounding markers have been found are returned. If either of the two surrounding markers has not been detected, this usually means that there is some occlusion or that the image quality is poor in that zone. In any case, it is preferable not to consider that corner, since the goal is to be sure that the interpolated ChArUco corners are very accurate.
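The rule above can be sketched in a few lines of plain Python. This is a simplified model rather than the OpenCV implementation: it assumes markers sit in the white squares (top-left square black) and are numbered row by row, and that chessboard corners are numbered row by row as well. The function name `interpolable_corners` is hypothetical.

```python
def interpolable_corners(squares_x, squares_y, detected_marker_ids):
    """Return ids of chessboard corners whose two adjacent markers were detected."""
    # Assumed layout: markers in squares with (x + y) odd, numbered row by row.
    marker_id_at = {}
    next_id = 0
    for y in range(squares_y):
        for x in range(squares_x):
            if (x + y) % 2 == 1:
                marker_id_at[(x, y)] = next_id
                next_id += 1

    detected = set(detected_marker_ids)
    usable = []
    for cy in range(squares_y - 1):
        for cx in range(squares_x - 1):
            # The four squares that meet at this inner corner; two hold markers.
            around = [(cx, cy), (cx + 1, cy), (cx, cy + 1), (cx + 1, cy + 1)]
            needed = {marker_id_at[s] for s in around if s in marker_id_at}
            if needed <= detected:
                usable.append(cy * (squares_x - 1) + cx)
    return usable


# 3x3 board with markers 0, 1 and 2 detected and marker 3 occluded.
print(interpolable_corners(3, 3, [0, 1, 2]))
```

With marker 3 missing, only the two corners that do not touch it remain usable in this model.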

After the ChArUco corners have been interpolated, a subpixel refinement is performed.

Once we have interpolated the ChArUco corners, we would probably want to draw them to see if their detections are correct. This can be easily done using the `cv::aruco::drawDetectedCornersCharuco()` function:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp drawDetectedCornersCharuco

-   `imageCopy` is the image where the corners will be drawn (it will normally be the same image where the corners were detected).
-   The `outputImage` will be a clone of `inputImage` with the corners drawn.
-   `charucoCorners` and `charucoIds` are the detected Charuco corners from the `cv::aruco::CharucoDetector::detectBoard()` function.
-   Finally, the last parameter is the (optional) color we want to draw the corners with, of type `cv::Scalar`.

For this image:

The result will be:

In the presence of occlusion, as in the following image, some corners are clearly visible but not all of their surrounding markers have been detected due to the occlusion, and thus they are not interpolated:

Sample video:

@youtube{Nj44m\_N\_9FY}

A full working example is included in `detect_board_charuco.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `detect_board_charuco.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} -w=5 -h=7 -sl=0.04 -ml=0.02 -d=10 -v=/path\_to\_opencv/opencv/doc/tutorials/objdetect/charuco\_detection/images/choriginal.jpg @endcode

## ChArUco Pose Estimation

The final goal of the ChArUco boards is finding corners very accurately for a high precision calibration or pose estimation.

The aruco module provides a function to perform ChArUco pose estimation easily. As in the `cv::aruco::GridBoard`, the coordinate system of the `cv::aruco::CharucoBoard` is placed in the board plane with the Z axis pointing in, and centered in the bottom left corner of the board.

@note After OpenCV 4.6.0, there was an incompatible change in the coordinate systems of the boards: the coordinate systems are now placed in the board plane with the Z axis pointing into the plane (previously the axis pointed out of the plane). `objPoints` in CW order correspond to the Z axis pointing into the plane. `objPoints` in CCW order correspond to the Z axis pointing out of the plane. See PR [https://github.com/opencv/opencv\_contrib/pull/3174](https://github.com/opencv/opencv_contrib/pull/3174)

To perform pose estimation for charuco boards, you should use `cv::aruco::CharucoBoard::matchImagePoints()` and `cv::solvePnP()`:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_board\_charuco.cpp poseCharuco

-   The `charucoCorners` and `charucoIds` parameters are the detected charuco corners from the `cv::aruco::CharucoDetector::detectBoard()` function.
-   The `cameraMatrix` and `distCoeffs` are the camera calibration parameters which are necessary for pose estimation.
-   Finally, the `rvec` and `tvec` parameters are the output pose of the Charuco Board.
-   `cv::solvePnP()` returns true if the pose was correctly estimated and false otherwise. The most common causes of failure are that there are too few corners for pose estimation or that the corners all lie on the same line.
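The two failure causes named in the last item can be checked before attempting pose estimation. The sketch below is a plain-Python pre-check and not part of OpenCV; the function name `enough_for_pose` is hypothetical, and the minimum of 4 points is an assumption chosen for illustration. It takes the 2D board coordinates of the matched corners.

```python
def enough_for_pose(points, min_points=4, tol=1e-9):
    """Return True if there are at least min_points points, not all collinear."""
    if len(points) < min_points:
        return False
    x0, y0 = points[0]
    # Find a second point distinct from the first to define a reference line.
    for x1, y1 in points[1:]:
        if (x1 - x0) ** 2 + (y1 - y0) ** 2 > tol:
            break
    else:
        return False  # all points coincide
    # Any point off the reference line proves the set is not collinear.
    for x, y in points:
        cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
        if abs(cross) > tol:
            return True
    return False


row_of_corners = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
block_of_corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(enough_for_pose(row_of_corners), enough_for_pose(block_of_corners))
```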

The axis can be drawn using `cv::drawFrameAxes()` to check the pose is correctly estimated. The result would be: (X:red, Y:green, Z:blue)

A full working example is included in `detect_board_charuco.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `detect_board_charuco.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} -w=5 -h=7 -sl=0.04 -ml=0.02 -d=10 -v=/path\_to\_opencv/opencv/doc/tutorials/objdetect/charuco\_detection/images/choriginal.jpg -c=/path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_camera\_charuco.yml @endcode

## [Charuco Diamond Detection](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/charuco_diamond_detection/charuco_diamond_detection/)

# Detection of Diamond Markers {#tutorial\_charuco\_diamond\_detection}

@prev\_tutorial{tutorial\_charuco\_detection} @next\_tutorial{tutorial\_aruco\_calibration}

A ChArUco diamond marker (or simply diamond marker) is a chessboard composed of 3x3 squares with 4 ArUco markers inside the white squares. It is similar in appearance to a ChArUco board, but the two are conceptually different.

For both ChArUco boards and diamond markers, detection is based on previously detected ArUco markers. In the ChArUco case, the markers to use are selected directly by their identifiers. This means that if a marker that belongs to the board is found in an image, it is automatically assumed to belong to the board. Furthermore, if a marker of the board is found more than once in the image, this produces an ambiguity, since the system won't be able to know which one should be used for the board.

On the other hand, the detection of diamond markers is not based on the identifiers. Instead, it is based on the relative position of the markers. As a consequence, marker identifiers can be repeated within the same diamond or among different diamonds, and the diamonds can still be detected simultaneously without ambiguity. However, due to the complexity of finding markers based on their relative position, diamond markers are limited to a size of 3x3 squares and 4 markers.

As with a single ArUco marker, each diamond marker is composed of 4 corners and an identifier. The four corners correspond to the 4 chessboard corners in the marker, and the identifier is actually an array of 4 numbers, which are the identifiers of the four ArUco markers inside the diamond.

Diamond markers are useful in those scenarios where repeated markers should be allowed. For instance:

-   To increase the number of identifiers of single markers by using diamond markers for labeling. They allow up to N^4 different ids, where N is the number of markers in the dictionary used.
    
-   Give to each of the four markers a conceptual meaning. For instance, one of the four marker ids could be used to indicate the scale of the marker (i.e. the size of the square), so that the same diamond can be found in the environment with different sizes just by changing one of the four markers and the user does not need to manually indicate the scale of each of them. This case is included in the `detect_diamonds.cpp` file inside the samples folder of the module.
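The N^4 figure from the first use case can be illustrated with a short sketch. The code below is plain Python with hypothetical helper names; packing the four marker ids into one integer label is just one possible convention, not something the aruco module does for you.

```python
def diamond_id_count(dictionary_size):
    """Number of distinct diamonds when each of the 4 markers can be any id."""
    return dictionary_size ** 4


def pack_diamond_id(ids, dictionary_size):
    """Combine four marker ids into a single integer label (base-N digits)."""
    label = 0
    for marker_id in ids:
        if not 0 <= marker_id < dictionary_size:
            raise ValueError("marker id outside the dictionary")
        label = label * dictionary_size + marker_id
    return label


def unpack_diamond_id(label, dictionary_size):
    """Recover the four marker ids from a packed label."""
    ids = []
    for _ in range(4):
        ids.append(label % dictionary_size)
        label //= dictionary_size
    return tuple(reversed(ids))


n = 50  # e.g. a dictionary with 50 markers
print(diamond_id_count(n))
label = pack_diamond_id((0, 1, 2, 3), n)
print(label, unpack_diamond_id(label, n))
```

In the second use case, one of the four positions would instead be reserved for a meaning such as scale, leaving N^3 combinations for identification.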
    

Furthermore, as its corners are chessboard corners, they can be used for accurate pose estimation.

The diamond functionality is included in `<opencv2/objdetect/charuco_detector.hpp>`.

## ChArUco Diamond Creation

The image of a diamond marker can be easily created using the `cv::aruco::CharucoBoard::generateImage()` function. For instance:

@snippet samples/cpp/tutorial\_code/objectDetection/create\_diamond.cpp generate\_diamond

This will create a diamond marker image with a square size of 200 pixels and a marker size of 120 pixels. The marker ids are given in the second parameter as a `cv::Vec4i` object. The order of the marker ids in the diamond layout is the same as in a standard ChArUco board, i.e. top, left, right and bottom.

The image produced will be:

A full working example is included in `create_diamond.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `create_diamond.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} "_path_/mydiamond.png" -sl=200 -ml=120 -d=10 -ids=0,1,2,3 @endcode

## ChArUco Diamond Detection

As in most cases, the detection of diamond markers requires a previous detection of ArUco markers. After detecting the markers, diamonds are detected using the `cv::aruco::CharucoDetector::detectDiamonds()` function:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_diamonds.cpp detect\_diamonds

The `cv::aruco::CharucoDetector::detectDiamonds()` function receives the original image and the previously detected marker corners and ids. If `markerCorners` and `markerIds` are empty, the function detects the ArUco markers and their ids itself. The input image is needed to perform subpixel refinement of the ChArUco corners. The function also needs the ratio between the square size and the marker size, which is required both for detecting the diamond from the relative positions of the markers and for interpolating the ChArUco corners.

The function returns the detected diamonds in two parameters. The first parameter, `diamondCorners`, is an array containing the four corners of each detected diamond. Its format is similar to that of the corners detected by the `cv::aruco::ArucoDetector::detectMarkers()` function and, for each diamond, the corners are given in the same order as for ArUco markers, i.e. clockwise starting with the top-left corner. The second returned parameter, `diamondIds`, contains the ids of the diamonds returned in `diamondCorners`. Each id is actually an array of 4 integers that can be represented with `cv::Vec4i`.

The detected diamond can be visualized using the function `cv::aruco::drawDetectedDiamonds()` which simply receives the image and the diamond corners and ids:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_diamonds.cpp draw\_diamonds

The result is the same as the one produced by `cv::aruco::drawDetectedMarkers()`, but with the four ids of the diamond printed:

A full working example is included in `detect_diamonds.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `detect_diamonds.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} -dp=path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/detector\_params.yml -sl=0.4 -ml=0.25 -refine=3 -v=path\_to\_opencv/opencv/doc/tutorials/objdetect/charuco\_diamond\_detection/images/diamondmarkers.jpg -cd=path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_dict.yml @endcode

## ChArUco Diamond Pose Estimation

Since a ChArUco diamond is represented by its four corners, its pose can be estimated in the same way as for a single ArUco marker, i.e. using the `cv::solvePnP()` function. For instance:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_diamonds.cpp diamond\_pose\_estimation

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_diamonds.cpp draw\_diamond\_pose\_estimation

The function obtains the rotation and translation vectors for each diamond marker and stores them in `rvecs` and `tvecs`. Note that the diamond corners are the corners of a chessboard square, so the square length, not the marker length, has to be provided for pose estimation. Camera calibration parameters are also required.
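To show why the square length matters, the sketch below builds the four 3D object points of a diamond that would be paired with its detected image corners for `cv::solvePnP()`. It is plain Python with a hypothetical helper name, and it assumes the origin at the center of the diamond, the corners on the Z = 0 plane, and the clockwise, top-left-first corner order.

```python
def diamond_object_points(square_length):
    """Return the 4 corners of the central square, clockwise from top-left.

    Assumed convention: origin at the center, X to the right, Y up, Z = 0.
    """
    half = square_length / 2.0
    return [
        (-half, half, 0.0),   # top-left
        (half, half, 0.0),    # top-right
        (half, -half, 0.0),   # bottom-right
        (-half, -half, 0.0),  # bottom-left
    ]


# Square length in the same unit that the translation vector should use.
for point in diamond_object_points(0.4):
    print(point)
```

Passing the marker length here instead of the square length would scale the points, and therefore the estimated translation, by the wrong factor.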

Finally, an axis can be drawn to check the estimated pose is correct using `drawFrameAxes()`:

The coordinate system of the diamond pose will be in the center of the marker with the Z axis pointing out, as in a simple ArUco marker pose estimation.

Sample video:

@youtube{OqKpBnglH7k}

The pose of a ChArUco diamond can also be estimated in the same way as for a ChArUco board:

@snippet samples/cpp/tutorial\_code/objectDetection/detect\_diamonds.cpp diamond\_pose\_estimation\_as\_charuco

A full working example is included in `detect_diamonds.cpp` inside `samples/cpp/tutorial_code/objectDetection/`.

The sample `detect_diamonds.cpp` takes its input from the command line via `cv::CommandLineParser`. For this file, example parameters look like:

@code{.cpp} -dp=path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/detector\_params.yml -sl=0.4 -ml=0.25 -refine=3 -v=path\_to\_opencv/opencv/doc/tutorials/objdetect/charuco\_diamond\_detection/images/diamondmarkers.jpg -cd=path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_dict.yml -c=path\_to\_opencv/opencv/samples/cpp/tutorial\_code/objectDetection/tutorial\_camera\_params.yml @endcode

## [Debugging The System](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/macbeth_chart_detection/debugging_the_system/)

# Customising and Debugging the detection system {#tutorial\_mcc\_debugging\_the\_system}

Many hyperparameters are involved in the detection of a chart. The default values are chosen to maximize detections in the average case, but they might not be the best for your use case. These values can be configured to improve the accuracy for a particular use case. To do this, you need to create an instance of `DetectorParameters`.

```
    Ptr<mcc::DetectorParameters> params = mcc::DetectorParameters::create();
```

-   `mcc::` is important.

It contains many values; the complete list can be found in the documentation for `DetectorParameters`. In this tutorial we will play with the value of `maxError`. The other values can be configured similarly.

`maxError` controls how much error is allowed in a detection. For example, if some chart cell is occluded, the error increases. The default value allows some tolerance to occlusions; increasing (or decreasing) `maxError` increases (or decreases) this tolerance.

You can change its value like this:

```
    params->maxError = 0.5;
```

To use this in the detection system, you would need to pass it to the process function.

```
    Ptr<CCheckerDetector> detector = CCheckerDetector::create();
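    // C++ has no keyword arguments, so params is passed positionally after nc and useNet
    // (argument order as in the opencv_contrib mcc module; check the signature in your version).
    detector->process(image, chartType, 1, false, params);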
```

That's how easy it is to play with the values. But there is a catch: the detection pipeline has many parts, and if you simply run it like this you will not be able to see the effect of the change in isolation. It is possible that the preceding parts detected no colorchecker candidates at all, in which case changing the value of `maxError` has no effect. Luckily, OpenCV provides a solution for this: you can make the code output multiple images, each one showing the effect of one part of the pipeline. This is disabled by default.

-   This can only be used if you are compiling from sources. If you can't build from sources and still need this feature, try raising an issue in the OpenCV repo.

To do this, open the file `opencv/modules/objdetect/include/opencv2/objdetect/mcc_checker_detector.hpp`. Near the top there is this line:

```
// #define MCC_DEBUG
```

Uncomment this line and rebuild OpenCV. After this, whenever you run the detector, it will show you multiple images, each corresponding to a part of the pipeline. You might also see some repetitions: for example, you will first see `Thresholding Output`, then some more images, and then `Thresholding Output` again for the same image but slightly different from the previous one. This is because internally the image is thresholded multiple times with different parameters, to adjust for the different possible sizes of the colorchecker.

## [Macbeth Chart Detection](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/macbeth_chart_detection/macbeth_chart_detection/)


## [Table Of Content Objdetect](https://docharvest.github.io/docs/opencv5/tutorials/objdetect/table_of_content_objdetect/)

# Object Detection (objdetect module) {#tutorial\_table\_of\_content\_objdetect}

-   @subpage tutorial\_aruco\_detection
-   @subpage tutorial\_aruco\_board\_detection
-   @subpage tutorial\_charuco\_detection
-   @subpage tutorial\_charuco\_diamond\_detection
-   @subpage tutorial\_aruco\_calibration
-   @subpage tutorial\_aruco\_faq
-   @subpage tutorial\_barcode\_detect\_and\_decode
-   @subpage tutorial\_macbeth\_chart\_detection
-   @subpage tutorial\_mcc\_debugging\_the\_system

## [Table Of Content Stitching](https://docharvest.github.io/docs/opencv5/tutorials/others/_old/table_of_content_stitching/)

# Images stitching (stitching module) {#tutorial\_table\_of\_content\_stitching}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_other

## [Table Of Content Video](https://docharvest.github.io/docs/opencv5/tutorials/others/_old/table_of_content_video/)

# Video analysis (video module) {#tutorial\_table\_of\_content\_video}

Content has been moved to this page: @ref tutorial\_table\_of\_content\_other

## [Background Subtraction](https://docharvest.github.io/docs/opencv5/tutorials/others/background_subtraction/)

# How to Use Background Subtraction Methods {#tutorial\_background\_subtraction}

@tableofcontents

@prev\_tutorial{tutorial\_stitcher} @next\_tutorial{tutorial\_meanshift}

|    |    |
| -: | :- |
| Original author | Domenico Daniele Bloisi |
| Compatibility | OpenCV >= 3.0 |

-   Background subtraction (BS) is a common and widely used technique for generating a foreground mask (namely, a binary image containing the pixels belonging to moving objects in the scene) by using static cameras.
    
-   As the name suggests, BS calculates the foreground mask performing a subtraction between the current frame and a background model, containing the static part of the scene or, more in general, everything that can be considered as background given the characteristics of the observed scene.
    
-   Background modeling consists of two main steps:
    
    1.  Background Initialization;
    2.  Background Update.
    
    In the first step, an initial model of the background is computed, while in the second step that model is updated in order to adapt to possible changes in the scene.
    
-   In this tutorial we will learn how to perform BS by using OpenCV.
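The subtraction and the two modeling steps described above can be sketched with a deliberately simple model. The code below is plain Python on nested lists of gray values, using a per-pixel threshold and a running average; it is not the MOG2 or KNN algorithm used later in this tutorial, and the function names, the threshold of 30 and the learning rate of 0.05 are illustrative assumptions.

```python
def foreground_mask(frame, background, threshold=30):
    """Return a binary mask: 255 where the frame differs from the background."""
    return [
        [255 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def update_background(background, frame, learning_rate=0.05):
    """Blend the current frame into the background model (running average)."""
    return [
        [(1.0 - learning_rate) * b + learning_rate * p
         for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Background initialization: here simply a first, empty frame.
background = [[10.0, 10.0, 10.0],
              [10.0, 10.0, 10.0]]
# A later frame in which one pixel is covered by a bright moving object.
frame = [[10, 200, 10],
         [12, 10, 10]]

mask = foreground_mask(frame, background)
background = update_background(background, frame)  # background update
print(mask)
```

Only the pixel that changed by more than the threshold is marked as foreground; the small change from 10 to 12 is absorbed into the background model.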
    

## Goals

In this tutorial you will learn how to:

1.  Read data from videos or image sequences by using @ref cv::VideoCapture;
2.  Create and update the background model by using the @ref cv::BackgroundSubtractor class;
3.  Get and show the foreground mask by using @ref cv::imshow.

### Code

In the following you can find the source code. We will let the user choose to process either a video file or a sequence of images.

We will use @ref cv::BackgroundSubtractorMOG2 in this sample, to generate the foreground mask.

The results as well as the input data are shown on the screen.

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/video/bg_sub.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/video/bg\_sub.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/video/background_subtraction/BackgroundSubtractionDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/video/background_subtraction/bg_sub.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py @end\_toggle
    

## Explanation

We discuss the main parts of the code above:

-   A @ref cv::BackgroundSubtractor object will be used to generate the foreground mask. In this example, default parameters are used, but it is also possible to declare specific parameters in the create function.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/video/bg\_sub.cpp create @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java create @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py create @end\_toggle

-   A @ref cv::VideoCapture object is used to read the input video or input image sequence.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/video/bg\_sub.cpp capture @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java capture @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py capture @end\_toggle

-   Every frame is used both for calculating the foreground mask and for updating the background. If you want to change the learning rate used for updating the background model, it is possible to set a specific learning rate by passing a parameter to the `apply` method.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/video/bg\_sub.cpp apply @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java apply @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py apply @end\_toggle
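To give a feel for what the learning rate does, the sketch below follows the background estimate of a single pixel under an exponential running average. This is a simplified stand-in chosen for illustration, not the update rule of MOG2; the function name `track_pixel` is hypothetical. In this model a rate of 0 never updates the background, while a rate of 1 replaces it with the latest frame.

```python
def track_pixel(samples, learning_rate, initial):
    """Return the background estimate of one pixel after each sample."""
    estimate = float(initial)
    history = []
    for value in samples:
        # Move the estimate towards the new sample by the learning rate.
        estimate = (1.0 - learning_rate) * estimate + learning_rate * value
        history.append(estimate)
    return history


# The pixel was 0 in the background and then stays at 100.
samples = [100, 100, 100, 100]
for rate in (0.0, 0.5, 1.0):
    print(rate, track_pixel(samples, rate, initial=0))
```

A higher rate absorbs a newly static object into the background sooner, at the cost of also absorbing slow-moving foreground objects.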

-   The current frame number can be extracted from the @ref cv::VideoCapture object and stamped in the top left corner of the current frame. A white rectangle is used to highlight the black colored frame number.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/video/bg\_sub.cpp display\_frame\_number @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java display\_frame\_number @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py display\_frame\_number @end\_toggle

-   We are ready to show the current input frame and the results.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/video/bg\_sub.cpp show @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/video/background\_subtraction/BackgroundSubtractionDemo.java show @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/video/background\_subtraction/bg\_sub.py show @end\_toggle

## Results

-   With the `vtest.avi` video, for the following frame:
    
    The output of the program will look like the following for the MOG2 method (gray areas are detected shadows):
    
    The output of the program will look like the following for the KNN method (gray areas are detected shadows):
    

## References

-   [Background Models Challenge (BMC) website](https://web.archive.org/web/20140418093037/http://bmc.univ-bpclermont.fr/)
-   A Benchmark Dataset for Foreground/Background Extraction @cite vacavant2013benchmark

## [Introduction To Pca](https://docharvest.github.io/docs/opencv5/tutorials/others/introduction_to_pca/)

# Introduction to Principal Component Analysis (PCA) {#tutorial\_introduction\_to\_pca}

@tableofcontents

@prev\_tutorial{tutorial\_optical\_flow}

|    |    |
| -: | :- |
| Original author | Theodore Tsesmelis |
| Compatibility | OpenCV >= 3.0 |

## Goal

In this tutorial you will learn how to:

-   Use the OpenCV class @ref cv::PCA to calculate the orientation of an object.

## What is PCA?

Principal Component Analysis (PCA) is a statistical procedure that extracts the most important features of a dataset.

Suppose you have a set of 2D points, as shown in the figure above. Each dimension corresponds to a feature you are interested in. At first sight the points may seem randomly scattered, but a closer look reveals a linear pattern (indicated by the blue line) that is hard to dismiss. A key aspect of PCA is dimensionality reduction, the process of reducing the number of dimensions of a dataset. For example, in the case above the set of points can be approximated by a single line, which reduces the dimensionality of the points from 2D to 1D.

Moreover, you could also see that the points vary the most along the blue line, more than they vary along the Feature 1 or Feature 2 axes. This means that if you know the position of a point along the blue line you have more information about the point than if you only knew where it was on Feature 1 axis or Feature 2 axis.

Hence, PCA allows us to find the direction along which our data varies the most. In fact, the result of running PCA on the set of points in the diagram consists of 2 vectors called _eigenvectors_, which are the _principal components_ of the data set.

The size of each eigenvector is encoded in the corresponding eigenvalue and indicates how much the data vary along the principal component. The eigenvectors start at the center of all points in the data set. Applying PCA to an N-dimensional data set yields N N-dimensional eigenvectors, N eigenvalues and one N-dimensional center point. Enough theory, let’s see how we can put these ideas into code.

## How are the eigenvectors and eigenvalues computed?

The goal is to transform a given data set **X** of dimension _p_ to an alternative data set **Y** of smaller dimension _L_. Equivalently, we are seeking to find the matrix **Y**, where **Y** is the _Karhunen–Loève transform_ (KLT) of matrix **X**:

\\f\[ \\mathbf{Y} = \\mathbb{K} \\mathbb{L} \\mathbb{T} {\\mathbf{X}} \\f\]

**Organize the data set**

Suppose you have data comprising a set of observations of _p_ variables, and you want to reduce the data so that each observation can be described with only _L_ variables, _L_ < _p_. Suppose further, that the data are arranged as a set of _n_ data vectors \\f$ x\_1...x\_n \\f$ with each \\f$ x\_i \\f$ representing a single grouped observation of the _p_ variables.

-   Write \\f$ x\_1...x\_n \\f$ as row vectors, each of which has _p_ columns.
-   Place the row vectors into a single matrix **X** of dimensions \\f$ n\\times p \\f$.

**Calculate the empirical mean**

-   Find the empirical mean along each dimension \\f$ j = 1, ..., p \\f$.
    
-   Place the calculated mean values into an empirical mean vector **u** of dimensions \\f$ p\\times 1 \\f$.
    
    \\f\[ \\mathbf{u\[j\]} = \\frac{1}{n}\\sum\_{i=1}^{n}\\mathbf{X\[i,j\]} \\f\]
    

**Calculate the deviations from the mean**

Mean subtraction is an integral part of the solution towards finding a principal component basis that minimizes the mean square error of approximating the data. Hence, we proceed by centering the data as follows:

-   Subtract the empirical mean vector **u** from each row of the data matrix **X**.
    
-   Store mean-subtracted data in the \\f$ n\\times p \\f$ matrix **B**.
    
    \\f\[ \\mathbf{B} = \\mathbf{X} - \\mathbf{h}\\mathbf{u^{T}} \\f\]
    
    where **h** is an \\f$ n\\times 1 \\f$ column vector of all 1s:
    
    \\f\[ h\[i\] = 1, i = 1, ..., n \\f\]
    

**Find the covariance matrix**

-   Find the \\f$ p\\times p \\f$ empirical covariance matrix **C** from the outer product of matrix **B** with itself:
    
    \\f\[ \\mathbf{C} = \\frac{1}{n-1} \\mathbf{B^{\*}} \\cdot \\mathbf{B} \\f\]
    
    where \* is the conjugate transpose operator. Note that if B consists entirely of real numbers, which is the case in many applications, the "conjugate transpose" is the same as the regular transpose.
    

**Find the eigenvectors and eigenvalues of the covariance matrix**

-   Compute the matrix **V** of eigenvectors which diagonalizes the covariance matrix **C**:
    
    \\f\[ \\mathbf{V^{-1}} \\mathbf{C} \\mathbf{V} = \\mathbf{D} \\f\]
    
    where **D** is the diagonal matrix of eigenvalues of **C**.
    
-   Matrix **D** will take the form of a \\f$ p \\times p \\f$ diagonal matrix:
    
    \\f\[ D\[k,l\] = \\left\\{\\begin{matrix} \\lambda\_k, & k = l \\\\ 0, & k \\neq l \\end{matrix}\\right. \\f\]
    
    here, \\f$ \\lambda\_k \\f$ is the _k_\-th eigenvalue of the covariance matrix **C**.
    
-   Matrix **V**, also of dimension _p_ x _p_, contains _p_ column vectors, each of length _p_, which represent the _p_ eigenvectors of the covariance matrix **C**.
    
-   The eigenvalues and eigenvectors are ordered and paired. The _j_ th eigenvalue corresponds to the _j_ th eigenvector.
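
The steps above can be sketched in a few lines of NumPy. This is only an illustration of the procedure, not the code used by the tutorial samples; the function name `pca_eig` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X):
    """PCA of an n x p data matrix via the covariance eigen-decomposition."""
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    u = X.mean(axis=0)                     # empirical mean, one value per column
    B = X - u                              # deviations from the mean (n x p)
    C = (B.T @ B) / (n - 1)                # p x p empirical covariance matrix
    eigvals, V = np.linalg.eigh(C)         # eigh: C is symmetric, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]      # reorder so the largest eigenvalue comes first
    return u, eigvals[order], V[:, order]  # column j of V pairs with eigvals[j]

# Synthetic 2D data stretched along one direction (hypothetical example data)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([3.0 * t, t + 0.1 * rng.normal(size=200)]) + np.array([5.0, -2.0])

u, eigvals, V = pca_eig(X)
print("center:", u)
print("eigenvalues:", eigvals)
print("first principal direction:", V[:, 0])
```

`np.linalg.eigh` is used here because the covariance matrix is symmetric; the eigenvectors it returns are defined only up to sign.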
    

@note sources [\[1\]](https://robospace.wordpress.com/2013/10/09/object-orientation-principal-component-analysis-opencv/), [\[2\]](http://en.wikipedia.org/wiki/Principal_component_analysis) and special thanks to Svetlin Penkov for the original tutorial.

## Source Code

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/ml/introduction_to_pca/introduction_to_pca.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/ml/introduction_to_pca/IntroductionToPCADemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/ml/introduction_to_pca/introduction_to_pca.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py @end\_toggle
    

@note Another example using PCA for dimensionality reduction while maintaining an amount of variance can be found at [opencv\_source\_code/samples/cpp/pca.cpp](https://github.com/opencv/opencv/tree/5.x/samples/cpp/pca.cpp)

## Explanation

-   **Read image and convert it to binary**

Here we apply the necessary pre-processing procedures in order to be able to detect the objects of interest.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp pre-process @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java pre-process @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py pre-process @end\_toggle

-   **Extract objects of interest**

Then find and filter contours by size and obtain the orientation of the remaining ones.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp contours @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java contours @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py contours @end\_toggle

-   **Extract orientation**

The orientation is extracted by calling the getOrientation() function, which performs the whole PCA procedure.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp pca @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java pca @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py pca @end\_toggle

First the data need to be arranged in a matrix of size n x 2, where n is the number of data points. Then we can perform the PCA. The calculated mean (i.e. the center of mass) is stored in the _cntr_ variable, and the eigenvectors and eigenvalues are stored in the corresponding std::vector’s.
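
As a rough, NumPy-only illustration of this step (the tutorial samples use OpenCV's PCA routines instead), the sketch below arranges contour points into an n x 2 matrix and derives the orientation angle from the dominant eigenvector. The helper name `get_orientation` mirrors the sample, but its body and the test points are assumptions.

```python
import numpy as np

def get_orientation(contour_pts):
    """Return (center, angle in radians) of a 2D point set using PCA."""
    # OpenCV contours have shape (n, 1, 2); flatten them to an n x 2 matrix
    data = np.asarray(contour_pts, dtype=np.float64).reshape(-1, 2)
    center = data.mean(axis=0)                   # center of mass
    cov = np.cov(data - center, rowvar=False)    # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]       # direction of largest variance
    angle = np.arctan2(major[1], major[0])       # defined only up to a 180 degree flip
    return center, angle

# Hypothetical points along a line tilted by 30 degrees through (100, 50)
t = np.linspace(-40.0, 40.0, 81)
theta = np.deg2rad(30.0)
pts = np.column_stack([100 + t * np.cos(theta), 50 + t * np.sin(theta)])

center, angle = get_orientation(pts)
print(center, np.rad2deg(angle % np.pi))
```

Because an eigenvector and its negation are equally valid, the angle is compared modulo 180 degrees.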

-   **Visualize result**

The final result is visualized through the drawAxis() function, where the principal components are drawn in lines, and each eigenvector is multiplied by its eigenvalue and translated to the mean position.

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp visualization @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java visualization @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py visualization @end\_toggle

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.cpp visualization1 @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/ml/introduction\_to\_pca/IntroductionToPCADemo.java visualization1 @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/ml/introduction\_to\_pca/introduction\_to\_pca.py visualization1 @end\_toggle

## Results

The code opens an image, finds the orientation of the detected objects of interest and then visualizes the result by drawing the contours of the detected objects of interest, the center point, and the x-axis, y-axis regarding the extracted orientation.

## [Meanshift](https://docharvest.github.io/docs/opencv5/tutorials/others/meanshift/)

# Meanshift and Camshift {#tutorial\_meanshift}

@tableofcontents

@prev\_tutorial{tutorial\_background\_subtraction} @next\_tutorial{tutorial\_optical\_flow}

## Goal

In this chapter,

-   We will learn about the Meanshift and Camshift algorithms to track objects in videos.

## Meanshift

The intuition behind the meanshift is simple. Consider you have a set of points. (It can be a pixel distribution like histogram backprojection). You are given a small window (may be a circle) and you have to move that window to the area of maximum pixel density (or maximum number of points). It is illustrated in the simple image given below:

The initial window is shown in blue circle with the name "C1". Its original center is marked in blue rectangle, named "C1\_o". But if you find the centroid of the points inside that window, you will get the point "C1\_r" (marked in small blue circle) which is the real centroid of the window. Surely they don't match. So move your window such that the circle of the new window matches with the previous centroid. Again find the new centroid. Most probably, it won't match. So move it again, and continue the iterations such that the center of window and its centroid falls on the same location (or within a small desired error). So finally what you obtain is a window with maximum pixel distribution. It is marked with a green circle, named "C2". As you can see in the image, it has maximum number of points. The whole process is demonstrated on a static image below:

So we normally pass the histogram backprojected image and initial target location. When the object moves, obviously the movement is reflected in the histogram backprojected image. As a result, the meanshift algorithm moves our window to the new location with maximum density.
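
The iteration described above can be sketched on a plain 2D point set. This is a simplified illustration with a circular window, not OpenCV's implementation (which works on a back-projection image with a rectangular window); the function name, the stopping threshold and the synthetic points are assumptions.

```python
import numpy as np

def mean_shift_window(points, center, radius, max_iter=100, eps=1e-3):
    """Move a circular window until its center coincides with its centroid."""
    center = np.asarray(center, dtype=np.float64)
    for _ in range(max_iter):
        # Points that currently fall inside the window
        inside = points[np.linalg.norm(points - center, axis=1) <= radius]
        if len(inside) == 0:
            break                          # empty window: nothing to follow
        centroid = inside.mean(axis=0)
        shift = np.linalg.norm(centroid - center)
        center = centroid                  # re-center the window on the centroid
        if shift < eps:
            break                          # center and centroid (almost) coincide
    return center

# Hypothetical data: a dense cluster around (5, 5) plus uniform background points
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal([5.0, 5.0], 0.5, size=(300, 2)),
    rng.uniform(0.0, 10.0, size=(100, 2)),
])

final = mean_shift_window(points, center=[3.5, 3.5], radius=2.0)
print("window converged near:", final)
```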

### Meanshift in OpenCV

To use meanshift in OpenCV, first we need to set up the target and find its histogram, so that we can backproject the target on each frame for the meanshift calculation. We also need to provide an initial location of the window. For the histogram, only Hue is considered here. Also, to avoid false values due to low light, low light values are discarded using the **cv.inRange()** function.

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/video/meanshift/meanshift.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/video/meanshift/meanshift.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/video/meanshift/meanshift.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/video/meanshift/meanshift.py @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/video/meanshift/MeanshiftDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/video/meanshift/MeanshiftDemo.java @end\_toggle
    

Three frames from the video I used are given below:

## Camshift

Did you closely watch the last result? There is a problem. Our window always has the same size whether the car is very far from or very close to the camera. That is not good. We need to adapt the window size to the size and rotation of the target. Once again, the solution came from "OpenCV Labs" and it is called CAMshift (Continuously Adaptive Meanshift), published by Gary Bradski in his paper "Computer Vision Face Tracking for Use in a Perceptual User Interface" in 1998 @cite Bradski98 .

It applies meanshift first. Once meanshift converges, it updates the size of the window as, \\f$s = 2 \\times \\sqrt{\\frac{M\_{00}}{256}}\\f$. It also calculates the orientation of the best fitting ellipse to it. Again it applies the meanshift with new scaled search window and previous window location. The process continues until the required accuracy is met.
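
To get a feeling for the window-size update, the formula can be evaluated on a small back-projection patch. This is a minimal sketch: the helper name and the uniform 20 x 20 patch are made up for illustration, and \\f$M\_{00}\\f$ is taken to be the sum of the back-projection values inside the window.

```python
import math
import numpy as np

def camshift_window_size(m00):
    """Window size s = 2 * sqrt(M00 / 256) from the zeroth moment M00."""
    return 2.0 * math.sqrt(m00 / 256.0)

# Hypothetical 20 x 20 search window with a uniform back-projection value of 64
backproj_window = np.full((20, 20), 64, dtype=np.uint8)
m00 = float(backproj_window.sum(dtype=np.float64))  # zeroth moment of the window

s = camshift_window_size(m00)
print(m00, s)
```

A brighter back-projection (a larger \\f$M\_{00}\\f$) yields a larger window, which is how the window grows as the target fills more of the frame.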

### Camshift in OpenCV

It is similar to meanshift, but returns a rotated rectangle (that is our result) and box parameters (used to be passed as search window in next iteration). See the code below:

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/video/meanshift/camshift.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/video/meanshift/camshift.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/video/meanshift/camshift.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/video/meanshift/camshift.py @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/video/meanshift/CamshiftDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/video/meanshift/CamshiftDemo.java @end\_toggle
    

Three frames of the result are shown below:

## Additional Resources

1.  French Wikipedia page on [Camshift](http://fr.wikipedia.org/wiki/Camshift). (The two animations are taken from there.)
2.  Bradski, G.R., "Real time face and object tracking as a component of a perceptual user interface," Applications of Computer Vision, 1998. WACV '98. Proceedings., Fourth IEEE Workshop on, pp. 214-219, 19-21 Oct 1998.

## Exercises

1.  OpenCV comes with a Python [sample](https://github.com/opencv/opencv/blob/5.x/samples/python/snippets/camshift.py) for an interactive demo of camshift. Use it, hack it, understand it.

## [Optical Flow](https://docharvest.github.io/docs/opencv5/tutorials/others/optical_flow/)

# Optical Flow {#tutorial\_optical\_flow}

@tableofcontents

@prev\_tutorial{tutorial\_meanshift} @next\_tutorial{tutorial\_introduction\_to\_pca}

## Goal

In this chapter,

-   We will understand the concepts of optical flow and its estimation using the Lucas-Kanade method.
-   We will use functions like **cv.calcOpticalFlowPyrLK()** to track feature points in a video.
-   We will create a dense optical flow field using the **cv.calcOpticalFlowFarneback()** method.

## Optical Flow

Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or camera. It is a 2D vector field where each vector is a displacement vector showing the movement of points from the first frame to the second. Consider the image below (Image Courtesy: [Wikipedia article on Optical Flow](http://en.wikipedia.org/wiki/Optical_flow)).

It shows a ball moving in 5 consecutive frames. The arrow shows its displacement vector. Optical flow has many applications in areas like :

-   Structure from Motion
-   Video Compression
-   Video Stabilization ...

Optical flow works on several assumptions:

1.  The pixel intensities of an object do not change between consecutive frames.
2.  Neighbouring pixels have similar motion.

Consider a pixel \\f$I(x,y,t)\\f$ in the first frame (note that a new dimension, time, is added here; earlier we were working with single images only, so there was no need for time). It moves by distance \\f$(dx,dy)\\f$ in the next frame, taken after time \\f$dt\\f$. Since those pixels are the same and the intensity does not change, we can say,

\\f\[I(x,y,t) = I(x+dx, y+dy, t+dt)\\f\]

Then take the Taylor series approximation of the right-hand side, remove common terms and divide by \\f$dt\\f$ to get the following equation:

\\f\[f\_x u + f\_y v + f\_t = 0 \\;\\f\]

where:

\\f\[f\_x = \\frac{\\partial f}{\\partial x} \\; ; \\; f\_y = \\frac{\\partial f}{\\partial y}\\f\]

\\f\[u = \\frac{dx}{dt} \\; ; \\; v = \\frac{dy}{dt}\\f\]

Above equation is called Optical Flow equation. In it, we can find \\f$f\_x\\f$ and \\f$f\_y\\f$, they are image gradients. Similarly \\f$f\_t\\f$ is the gradient along time. But \\f$(u,v)\\f$ is unknown. We cannot solve this one equation with two unknown variables. So several methods are provided to solve this problem and one of them is Lucas-Kanade.
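
For readers who want the intermediate steps, the derivation can be written out as follows. This is a sketch that keeps only the first-order Taylor terms and writes \\f$f\\f$ for the image intensity \\f$I\\f$, as the gradient definitions do. Expanding the right-hand side of the brightness constancy equation gives

\\f\[ I(x+dx, y+dy, t+dt) \\approx I(x,y,t) + \\frac{\\partial I}{\\partial x}dx + \\frac{\\partial I}{\\partial y}dy + \\frac{\\partial I}{\\partial t}dt \\f\]

Substituting this into \\f$I(x,y,t) = I(x+dx, y+dy, t+dt)\\f$ and cancelling the common term \\f$I(x,y,t)\\f$ leaves

\\f\[ \\frac{\\partial I}{\\partial x}dx + \\frac{\\partial I}{\\partial y}dy + \\frac{\\partial I}{\\partial t}dt = 0 \\f\]

Dividing by \\f$dt\\f$ then yields

\\f\[ \\frac{\\partial I}{\\partial x}\\frac{dx}{dt} + \\frac{\\partial I}{\\partial y}\\frac{dy}{dt} + \\frac{\\partial I}{\\partial t} = f\_x u + f\_y v + f\_t = 0 \\f\]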

### Lucas-Kanade method

We have seen an assumption before, that all the neighbouring pixels will have similar motion. The Lucas-Kanade method takes a 3x3 patch around the point, so all 9 points are assumed to have the same motion. We can find \\f$(f\_x, f\_y, f\_t)\\f$ for these 9 points. So now our problem becomes solving 9 equations with two unknown variables, which is over-determined. A better solution is obtained with the least squares fit method. Below is the final solution, a problem of two equations in two unknowns that can be solved directly:

\\f\[\\begin{bmatrix} u \\\\ v \\end{bmatrix} = \\begin{bmatrix} \\sum\_{i}{f\_{x\_i}}^2 & \\sum\_{i}{f\_{x\_i} f\_{y\_i} } \\\\ \\sum\_{i}{f\_{x\_i} f\_{y\_i}} & \\sum\_{i}{f\_{y\_i}}^2 \\end{bmatrix}^{-1} \\begin{bmatrix} - \\sum\_{i}{f\_{x\_i} f\_{t\_i}} \\\\ - \\sum\_{i}{f\_{y\_i} f\_{t\_i}} \\end{bmatrix}\\f\]

( Check similarity of inverse matrix with Harris corner detector. It denotes that corners are better points to be tracked.)
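
The least squares solution for a single patch can be sketched with NumPy. This is an illustration of the formula only, not of **cv.calcOpticalFlowPyrLK()**: there are no pyramids and no iterative refinement, and the function name and synthetic gradients are assumptions.

```python
import numpy as np

def lucas_kanade_patch(fx, fy, ft):
    """Solve the 2 x 2 Lucas-Kanade system for one patch of gradients."""
    fx, fy, ft = (np.asarray(a, dtype=np.float64).ravel() for a in (fx, fy, ft))
    # Matrix of summed gradient products (the one that is inverted in the formula)
    A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    b = -np.array([np.sum(fx * ft), np.sum(fy * ft)])
    return np.linalg.solve(A, b)  # (u, v); fails if A is singular (flat or edge-only patch)

# Synthetic 3x3 gradients that satisfy f_x*u + f_y*v + f_t = 0 for a known motion
rng = np.random.default_rng(0)
fx = rng.normal(size=(3, 3))
fy = rng.normal(size=(3, 3))
u_true, v_true = 0.7, -0.3
ft = -(fx * u_true + fy * v_true)

u, v = lucas_kanade_patch(fx, fy, ft)
print(u, v)
```

When the matrix is close to singular the estimate becomes unreliable, which matches the remark that corners are better points to track.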

So from the user point of view, the idea is simple, we give some points to track, we receive the optical flow vectors of those points. But again there are some problems. Until now, we were dealing with small motions, so it fails when there is a large motion. To deal with this we use pyramids. When we go up in the pyramid, small motions are removed and large motions become small motions. So by applying Lucas-Kanade there, we get optical flow along with the scale.

## Lucas-Kanade Optical Flow in OpenCV

OpenCV provides all these in a single function, **cv.calcOpticalFlowPyrLK()**. Here, we create a simple application which tracks some points in a video. To decide the points, we use **cv.goodFeaturesToTrack()**. We take the first frame, detect some Shi-Tomasi corner points in it, then we iteratively track those points using Lucas-Kanade optical flow. For the function **cv.calcOpticalFlowPyrLK()** we pass the previous frame, previous points and next frame. It returns next points along with some status numbers which has a value of 1 if next point is found, else zero. We iteratively pass these next points as previous points in next step. See the code below:

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/video/optical_flow/optical_flow.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/video/optical\_flow/optical\_flow.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/video/optical_flow/optical_flow.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/video/optical\_flow/optical\_flow.py @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/video/optical_flow/OpticalFlowDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/video/optical\_flow/OpticalFlowDemo.java @end\_toggle
    

(This code doesn't check how correct the next keypoints are. So even if a feature point disappears from the image, there is a chance that optical flow finds a next point which merely looks close to it. For robust tracking, corner points should therefore be re-detected at regular intervals. The OpenCV samples include such a sample, which finds the feature points every 5 frames. It also runs a backward check of the obtained optical flow points to select only the good ones. Check samples/python/lk\_track.py.)

See the results we got:

## Dense Optical Flow in OpenCV

Lucas-Kanade method computes optical flow for a sparse feature set (in our example, corners detected using Shi-Tomasi algorithm). OpenCV provides another algorithm to find the dense optical flow. It computes the optical flow for all the points in the frame. It is based on Gunnar Farneback's algorithm which is explained in "Two-Frame Motion Estimation Based on Polynomial Expansion" by Gunnar Farneback in 2003.

Below sample shows how to find the dense optical flow using above algorithm. We get a 2-channel array with optical flow vectors, \\f$(u,v)\\f$. We find their magnitude and direction. We color code the result for better visualization. Direction corresponds to Hue value of the image. Magnitude corresponds to Value plane. See the code below:
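
The color coding can be illustrated without OpenCV. The sketch below is a NumPy-only stand-in for the polar conversion and normalization done in the samples; the function name `flow_to_hsv` is an assumption, and it follows the 8-bit HSV convention in which hue is stored as half the angle in degrees (0..179).

```python
import numpy as np

def flow_to_hsv(flow):
    """Map an H x W x 2 flow field (u, v) to an 8-bit HSV image."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)                      # flow magnitude
    ang = np.arctan2(v, u) % (2 * np.pi)      # flow direction in [0, 2*pi)
    hsv = np.zeros(flow.shape[:2] + (3,), dtype=np.uint8)
    # Hue: half the angle in degrees, wrapped into 0..179
    hsv[..., 0] = (np.rint(np.degrees(ang) / 2.0) % 180).astype(np.uint8)
    hsv[..., 1] = 255                         # full saturation
    span = mag.max() - mag.min()
    if span > 0:                              # min-max normalize magnitude to 0..255
        hsv[..., 2] = ((mag - mag.min()) / span * 255).astype(np.uint8)
    return hsv

# Tiny hypothetical flow field: no motion, right, down (image y axis), left
flow = np.array([[[0.0, 0.0], [1.0, 0.0]],
                 [[0.0, 2.0], [-1.0, 0.0]]])
hsv = flow_to_hsv(flow)
print(hsv[..., 0])  # hue plane
print(hsv[..., 2])  # value plane
```

The resulting array can then be converted from HSV to BGR for display.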

@add\_toggle\_cpp

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/video/optical_flow/optical_flow_dense.cpp)
    
-   **Code at glance:** @include samples/cpp/tutorial\_code/video/optical\_flow/optical\_flow\_dense.cpp @end\_toggle
    

@add\_toggle\_python

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/video/optical_flow/optical_flow_dense.py)
    
-   **Code at glance:** @include samples/python/tutorial\_code/video/optical\_flow/optical\_flow\_dense.py @end\_toggle
    

@add\_toggle\_java

-   **Downloadable code**: Click [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/video/optical_flow/OpticalFlowDenseDemo.java)
    
-   **Code at glance:** @include samples/java/tutorial\_code/video/optical\_flow/OpticalFlowDenseDemo.java @end\_toggle
    

See the result below:

## [Stitcher](https://docharvest.github.io/docs/opencv5/tutorials/others/stitcher/)

# High level stitching API (Stitcher class) {#tutorial\_stitcher}

@tableofcontents

@prev\_tutorial{tutorial\_hdr\_imaging} @next\_tutorial{tutorial\_background\_subtraction}

Original author

Jiri Horner

Compatibility

OpenCV >= 3.2

## Goal

In this tutorial you will learn how to:

-   use the high-level stitching API for stitching provided by
    -   @ref cv::Stitcher
-   learn how to use preconfigured Stitcher configurations to stitch images using different camera models.

## Code

@add\_toggle\_cpp This tutorial's code is shown in the lines below. You can download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/stitching.cpp).

Note: The C++ version includes additional options such as image division (--d3) and more detailed error handling, which are not present in the Python example.

@include samples/cpp/snippets/stitching.cpp

@end\_toggle

@add\_toggle\_python This tutorial's code is shown in the lines below. You can download it from [here](https://github.com/opencv/opencv/blob/5.x/samples/python/stitching.py).

Note: The C++ version includes additional options such as image division (--d3) and more detailed error handling, which are not present in the Python example.

@include samples/python/snippets/stitching.py

@end\_toggle

## Explanation

The most important code part is:

@add\_toggle\_cpp @snippet cpp/snippets/stitching.cpp stitching @end\_toggle

@add\_toggle\_python @snippet python/snippets/stitching.py stitching @end\_toggle

A new instance of stitcher is created and the @ref cv::Stitcher::stitch will do all the hard work.

@ref cv::Stitcher::create can create a stitcher in one of the predefined configurations (argument `mode`). See @ref cv::Stitcher::Mode for details. These configurations set up multiple stitcher properties to operate in one of the predefined scenarios. After you create a stitcher in one of the predefined configurations, you can adjust stitching by setting any of the stitcher properties.

If you have a CUDA device, @ref cv::Stitcher can be configured to offload certain operations to the GPU. If you prefer this configuration, set `try_use_gpu` to true. OpenCL acceleration will be used transparently based on global OpenCV settings regardless of this flag.

Stitching might fail for several reasons, so you should always check that everything went well and that the resulting panorama is stored in `pano`. See the @ref cv::Stitcher::Status documentation for possible error codes.

## Camera models

There are currently 2 camera models implemented in stitching pipeline.

-   _Homography model_ expecting perspective transformations between images implemented in @ref cv::detail::BestOf2NearestMatcher cv::detail::HomographyBasedEstimator cv::detail::BundleAdjusterReproj cv::detail::BundleAdjusterRay
-   _Affine model_ expecting affine transformation with 6 DOF or 4 DOF implemented in @ref cv::detail::AffineBestOf2NearestMatcher cv::detail::AffineBasedEstimator cv::detail::BundleAdjusterAffine cv::detail::BundleAdjusterAffinePartial cv::AffineWarper

The homography model is useful for creating photo panoramas captured by a camera, while the affine-based model can be used to stitch scans and objects captured by specialized devices.

@note Certain detailed settings of @ref cv::Stitcher might not make sense. Especially you should not mix classes implementing affine model and classes implementing Homography model, as they work with different transformations.

## Try it out

If you enabled building samples you can find the binary under `build/bin/cpp-example-stitching`. This example is a console application; run it without arguments to see help. `opencv_extra` provides some sample data for testing all available configurations.

To try panorama mode run:

```
./cpp-example-stitching --mode panorama <path to opencv_extra>/testdata/stitching/boat*
```

to try scans mode run (dataset from home-grade scanner):

```
./cpp-example-stitching --mode scans <path to opencv_extra>/testdata/stitching/newspaper*
```

or (dataset from professional book scanner):

```
./cpp-example-stitching --mode scans <path to opencv_extra>/testdata/stitching/budapest*
```

@note The examples above expect a POSIX platform. On Windows you have to provide all file names explicitly (e.g. `boat1.jpg` `boat2.jpg`...), as the Windows command line does not support `*` expansion.

## Stitching detailed (python opencv >4.0.1)

If you want to study the internals of the stitching pipeline or experiment with a detailed configuration, you can use the stitching\_detailed source code, available in C++ or Python.

#### stitching\_detailed

@add\_toggle\_cpp [stitching\_detailed.cpp](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/cpp/stitching_detailed.cpp) @end\_toggle

@add\_toggle\_python [stitching\_detailed.py](https://raw.githubusercontent.com/opencv/opencv/5.x/samples/python/stitching_detailed.py) @end\_toggle

The stitching\_detailed program takes its stitching parameters from the command line. Many parameters exist. The example below shows some of the possible command line parameters:

boat5.jpg boat2.jpg boat3.jpg boat4.jpg boat1.jpg boat6.jpg --work\_megapix 0.6 --features orb --matcher homography --estimator homography --match\_conf 0.3 --conf\_thresh 0.3 --ba ray --ba\_refine\_mask xxxxx --save\_graph test.txt --wave\_correct no --warp fisheye --blend multiband --expos\_comp no --seam gc\_colorgrad

Pairwise images are matched using a homography (--matcher homography), and a homography-based estimator is used for transformation estimation too (--estimator homography).

The confidence for the feature matching step is 0.3 (--match\_conf 0.3). You can decrease this value if you have difficulties matching images.

The confidence threshold for deciding that two images are from the same panorama is 0.3 (--conf\_thresh 0.3). You can decrease this value if you have difficulties matching images.

The bundle adjustment cost function is ray (--ba ray).

The refinement mask for bundle adjustment is xxxxx (--ba\_refine\_mask xxxxx), where 'x' means refine the respective parameter and '\_' means don't refine it. The mask has the following format: fx,skew,ppx,aspect,ppy.

The matches graph, represented in DOT language, is saved to test.txt (--save\_graph test.txt). Labels description: Nm is the number of matches, Ni is the number of inliers, C is the confidence.

Wave effect correction is disabled (--wave\_correct no).

The warp surface type is fisheye (--warp fisheye).

The blending method is multiband (--blend multiband).

Exposure compensation is not used (--expos\_comp no).

The seam estimation method is the minimum graph cut-based seam (--seam gc\_colorgrad).

You can use these arguments on the command line too:

boat5.jpg boat2.jpg boat3.jpg boat4.jpg boat1.jpg boat6.jpg --work\_megapix 0.6 --features orb --matcher homography --estimator homography --match\_conf 0.3 --conf\_thresh 0.3 --ba ray --ba\_refine\_mask xxxxx --wave\_correct horiz --warp compressedPlaneA2B1 --blend multiband --expos\_comp channels\_blocks --seam gc\_colorgrad

You will get :

For images captured using a scanner or a drone ( affine motion) you can use those arguments on command line :

newspaper1.jpg newspaper2.jpg --work\_megapix 0.6 --features surf --matcher affine --estimator affine --match\_conf 0.3 --conf\_thresh 0.3 --ba affine --ba\_refine\_mask xxxxx --wave\_correct no --warp affine

You can find all images in [https://github.com/opencv/opencv\_extra/tree/5.x/testdata/stitching](https://github.com/opencv/opencv_extra/tree/5.x/testdata/stitching)

## [Table Of Content Other](https://docharvest.github.io/docs/opencv5/tutorials/others/table_of_content_other/)

# Other tutorials (stitching, video) {#tutorial\_table\_of\_content\_other}

-   stitching. @subpage tutorial\_stitcher
-   video. @subpage tutorial\_background\_subtraction
-   video. @subpage tutorial\_meanshift
-   video. @subpage tutorial\_optical\_flow
-   ml. @subpage tutorial\_introduction\_to\_pca

## [Color Correction Model](https://docharvest.github.io/docs/opencv5/tutorials/photo/ccm/color_correction_model/)

# Color Correction Model {#tutorial\_ccm\_color\_correction\_model}

## Introduction

The purpose of color correction is to adjust the color response of input and output devices to a known state. The device being calibrated is sometimes called the calibration source; the color space used as the standard is sometimes called the calibration target. Color calibration has been used in many industries, such as television production, games, photography, engineering, chemistry and medicine. Due to the manufacturing process of the input and output equipment, the channel response has nonlinear distortion. In order to correct the picture output of the equipment, it is necessary to calibrate the captured color against the actual color.

In this tutorial you will learn how to use the 'Color Correction Model' to do color correction in an image.

The color correction functionalities are included in:

```
#include <opencv2/photo/ccm.hpp>
```

## Reference

See details of ColorCorrection Algorithm at [https://github.com/riskiest/color\_calibration/tree/v4/doc/pdf/English/Algorithm](https://github.com/riskiest/color_calibration/tree/v4/doc/pdf/English/Algorithm)

## Source Code of the sample

The sample has two parts of code, the first is the color checker detector model, see details at tutorial\_macbeth\_chart\_detection, the second part is to make color calibration.

```
Here are the parameters for ColorCorrectionModel
    src :
            detected colors of ColorChecker patches;
            NOTICE: the color type is RGB not BGR, and the color values are in [0, 1];
    constcolor :
            the Built-in color card;
            Supported list:
                Macbeth: Macbeth ColorChecker ;
                Vinyl: DKK ColorChecker ;
                DigitalSG: DigitalSG ColorChecker with 140 squares;
    Mat colors :
           the reference color values
           and corresponding color space
           NOTICE: the color values are in [0, 1]
    refColorSpace :
           the corresponding color space
                  If the color type is some RGB, the format is RGB not BGR;
    Supported Color Space:
            Must be one of the members of the ColorSpace enum.
            @snippet modules/photo/include/opencv2/photo/ccm.hpp ColorSpace
            For the full, up-to-date list see cv::ccm::ColorSpace in ccm.hpp.
```

## Code

@snippet samples/cpp/color\_correction\_model.cpp tutorial

## [Linearization Transformation](https://docharvest.github.io/docs/opencv5/tutorials/photo/ccm/linearization_transformation/)

# Linearization Transformation For Color Correction {#tutorial\_ccm\_linearization\_transformation}

## Overview

The first step in color correction is to linearize the detected colors. Since the input color space may not be calibrated, empirical methods are used for linearization. The most common methods include:

1.  Identical Transformation
2.  Gamma Correction
3.  Polynomial Fitting

Linearization is typically an element-wise function. The following symbols are used:

-   \\f$C\\f$: Any color channel (\\f$R, G\\f$, or \\f$B\\f$)
-   \\f$R, G, B\\f$: Respective color channels
-   \\f$G\\f$: Grayscale
-   \\f$s, sl\\f$: The detected data and its linearized value; the former is the input and the latter is the output
-   \\f$d, dl\\f$: Reference data and its linearized value

* * *

## Identical Transformation

No change is made during the identical transformation, usually because the tristimulus values of the input RGB image are already proportional to the luminance.  
For example, if the input measurement data is in RAW format, the measurement data is already linear, so no linearization is required.

**Formula:** \\f\[ C\_{sl}=C\_s \\f\]

* * *

## Gamma Correction

Gamma correction is a means of applying a nonlinearity in RGB space; see the Color Space documentation for details.  
In the linearization part, the value of \\f$\\gamma\\f$ is usually set to 2.2. You can also customize the value.

**Formulas:** \\f\[ \\begin{aligned} C\_{sl}=C\_s^{\\gamma},\\qquad C\_s\\ge0\\\\ C\_{sl}=-(-C\_s)^{\\gamma},\\qquad C\_s<0\\\\ \\end{aligned} \\f\]
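
The two cases of the formula can be written as a single sign-preserving expression. The following NumPy sketch only illustrates the formula and is not the implementation in the ccm module; the function name `gamma_linearize` is an assumption.

```python
import numpy as np

def gamma_linearize(c, gamma=2.2):
    """Element-wise gamma linearization that keeps the sign of negative inputs."""
    c = np.asarray(c, dtype=np.float64)
    # C^gamma for C >= 0 and -(-C)^gamma for C < 0, written as one expression
    return np.sign(c) * np.abs(c) ** gamma

values = np.array([0.0, 0.5, 1.0, -0.5])
linear = gamma_linearize(values)
print(linear)
```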

* * *

## Polynomial Fitting

Linearization using polynomial fitting.

**Polynomial form:** \\f\[ f(x)=a\_nx^n+a\_{n-1}x^{n-1}+... +a\_0 \\f\] Then: \\f\[ C\_{sl}=f(C\_s) \\f\]

_Usually n ≤ 3 to avoid overfitting._  
The polynomial parameters are usually calculated from the linearized reference colors and the corresponding detected colors.  
However, not all colors can participate in the calculation: saturated detected colors need to be removed. See the algorithm introduction document for details.

### Fitting Channels Respectively

Use three polynomials, \\f$r(x), g(x), b(x)\\f$, to linearize each channel of the RGB color space \[1-3\]: \\f\[ \\begin{aligned} R\_{sl}&=r(R\_s)\\\\ G\_{sl}&=g(G\_s)\\\\ B\_{sl}&=b(B\_s) \\end{aligned} \\f\] Each polynomial is generated by minimizing the residual sum of squares between the polynomial applied to the detected data and the linearized reference data.  
Take the R channel as an example:

\\f\[ r=\\arg\\min\_{f}\\left(\\sum(R\_{dl}-f(R\_s))^2\\right) \\f\]

This is equivalent to finding the least-squares solution of the following equations: \\f\[ \\begin{aligned} f(R\_{s1})&=R\_{dl1}\\\\ f(R\_{s2})&=R\_{dl2}\\\\ ... \\end{aligned} \\f\]

With a polynomial, the equations become: \\f\[ \\begin{bmatrix} R\_{s1}^{n} & R\_{s1}^{n-1} & ... & 1\\\\ R\_{s2}^{n} & R\_{s2}^{n-1} & ... & 1\\\\ ... & ... & ... & ... \\end{bmatrix} \\begin{bmatrix} a\_{n}\\\\ a\_{n-1}\\\\ ... \\\\ a\_0 \\end{bmatrix} = \\begin{bmatrix} R\_{dl1}\\\\ R\_{dl2}\\\\ ... \\end{bmatrix} \\f\]

This can be expressed in matrix form as: \\f\[ AX=B \\f\] **Coefficient calculation:** \\f\[ X=(A^TA)^{-1}A^TB \\f\] Once the polynomial coefficients are known, the polynomial \\f$r\\f$ is determined.  
This method of finding polynomial coefficients is implemented by numpy.polyfit in NumPy, expressed here as: \\f\[ r=polyfit(R\_s, R\_{dl}) \\f\]

Ideally, the polynomial should increase monotonically on the interval \[0,1\], but guaranteeing this requires a nonlinear method to generate the polynomial (see \[4\] for details).  
This would greatly increase the complexity of the program.  
Since monotonicity does not affect the correct operation of the color correction program, polyfit is still used in the implementation.

Parameters for other channels can also be derived in a similar way.
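
The least-squares fit for one channel can be sketched with NumPy as shown below. This is an illustration under stated assumptions, not the OpenCV implementation: `fit_channel` is a hypothetical helper, and the sample data simply imitates a gamma-like response.

```python
import numpy as np


def fit_channel(detected, reference_linear, degree=3):
    """Least-squares polynomial from detected values to linearized reference values.

    Returns coefficients ordered from the highest power down to a_0.
    """
    s = np.asarray(detected, dtype=np.float64)
    d = np.asarray(reference_linear, dtype=np.float64)
    # Rows of A are [s^n, s^(n-1), ..., 1], one row per color patch.
    A = np.vander(s, degree + 1)
    # Solve A X = B in the least-squares sense; lstsq avoids forming
    # (A^T A)^{-1} A^T B explicitly, which is less stable numerically.
    coeffs, *_ = np.linalg.lstsq(A, d, rcond=None)
    return coeffs


# Hypothetical data: detected values that follow a gamma-like response.
detected = np.linspace(0.05, 1.0, 12)
reference_linear = detected ** 2.2

r = fit_channel(detected, reference_linear, degree=3)
linearized = np.polyval(r, detected)  # R_sl = r(R_s)
```

For the same inputs, `np.polyfit(detected, reference_linear, 3)` should return matching coefficients up to floating-point error. The same call would be repeated for the G and B channels.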

### Grayscale Polynomial Fitting

In this method \[2\], a single polynomial is used for all channels. The polynomial is still a polyfit result from the detected colors to the linear reference colors. However, only the gray reference colors can participate in the calculation.

Since the detected colors corresponding to the gray reference colors are not necessarily gray, they need to be converted to grayscale. Grayscale refers to the Y channel of the XYZ color space. The color space of the detected data is not determined, so the data cannot be converted into the XYZ space. Therefore, the sRGB formula is used as an approximation \[5\], where \\f$G\_{s}\\f$ on the left is the grayscale value and \\f$G\_{s}\\f$ on the right is the green channel: \\f\[ G\_{s}=0.2126R\_{s}+0.7152G\_{s}+0.0722B\_{s} \\f\] The polynomial parameters can then be obtained using polyfit: \\f\[ f=polyfit(G\_{s}, G\_{dl}) \\f\] After \\f$f\\f$ is obtained, linearization can be performed.
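
A minimal NumPy sketch of the grayscale fit follows. The helper name `fit_gray` and the patch values are assumptions made for this example; only the luminance weights come from the text.

```python
import numpy as np

# sRGB luminance weights used to approximate the Y channel.
LUMA = np.array([0.2126, 0.7152, 0.0722])


def fit_gray(detected_rgb, reference_gray_linear, degree=3):
    """Fit one polynomial for all channels, using gray patches only."""
    # Convert each detected RGB patch to a single grayscale value.
    gray = np.asarray(detected_rgb, dtype=np.float64) @ LUMA
    return np.polyfit(gray, reference_gray_linear, degree)


# Hypothetical detected colors of five gray patches (rows are R, G, B).
detected_rgb = np.array([
    [0.10, 0.11, 0.09],
    [0.30, 0.31, 0.29],
    [0.50, 0.52, 0.49],
    [0.70, 0.69, 0.71],
    [0.90, 0.91, 0.88],
])
# Hypothetical linearized gray values of the matching reference patches.
reference_gray_linear = np.array([0.01, 0.07, 0.21, 0.45, 0.79])

f = fit_gray(detected_rgb, reference_gray_linear, degree=2)
# The same polynomial is applied element-wise to every channel.
linearized = np.polyval(f, detected_rgb)
```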

### Logarithmic Polynomial Fitting

Taking the logarithm of the gamma correction formula gives: \\f\[ ln(C\_{sl})={\\gamma}ln(C\_s),\\qquad C\_s>0 \\f\] There is a linear relationship between \\f$ln(C\_s)\\f$ and \\f$ln(C\_{sl})\\f$. The formula can be regarded as an approximation of a polynomial relationship, that is, there exists a polynomial \\f$f\\f$ such that \[2\]: \\f\[ \\begin{aligned} ln(C\_{sl})&=f(ln(C\_s)), \\qquad C\_s>0\\\\ C\_{sl}&=0, \\qquad C\_s=0 \\end{aligned} \\f\]

Because \\f$ln(0)\\f$ is undefined (\\f$ln(x)\\to-\\infty\\f$ as \\f$x\\to0\\f$), a channel component that is zero is mapped directly to zero in this formula.

**Fitted using polyfit on logarithmic values:** \\f\[ \\begin{aligned} r&=polyfit(ln(R\_s),ln(R\_{dl}))\\\\ g&=polyfit(ln(G\_s),ln(G\_{dl}))\\\\ b&=polyfit(ln(B\_s),ln(B\_{dl})) \\end{aligned} \\f\]

Note: The argument of \\f$ln(\*) \\f$ cannot be zero. Therefore, all channel values that are 0 must be removed from \\f$R\_s \\f$ and \\f$R\_{dl} \\f$, \\f$G\_s\\f$ and \\f$G\_{dl}\\f$, \\f$B\_s\\f$ and \\f$B\_{dl}\\f$.

The final fitting equations become: \\f\[ \\begin{aligned} \\ln(R\_{sl}) &= r(\\ln(R\_s)), \\qquad R\_s > 0 \\\\ R\_{sl} &= 0, \\qquad R\_s = 0 \\\\ \\ln(G\_{sl}) &= g(\\ln(G\_s)), \\qquad G\_s > 0 \\\\ G\_{sl} &= 0, \\qquad G\_s = 0 \\\\ \\ln(B\_{sl}) &= b(\\ln(B\_s)), \\qquad B\_s > 0 \\\\ B\_{sl} &= 0, \\qquad B\_s = 0 \\end{aligned} \\f\]
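
The per-channel logarithmic fit, including the special handling of zeros, can be sketched as follows. Both helper names are hypothetical, and the sample data assumes a pure gamma response so that a degree-1 fit recovers the exponent.

```python
import numpy as np


def fit_log_channel(detected, reference_linear, degree=2):
    """Fit ln(reference) as a polynomial in ln(detected) for one channel."""
    s = np.asarray(detected, dtype=np.float64)
    d = np.asarray(reference_linear, dtype=np.float64)
    # ln is undefined at zero, so keep only pairs where both values are positive.
    keep = (s > 0) & (d > 0)
    return np.polyfit(np.log(s[keep]), np.log(d[keep]), degree)


def apply_log_poly(coeffs, values):
    """Evaluate C_sl = exp(f(ln(C_s))) for C_s > 0 and C_sl = 0 otherwise."""
    v = np.asarray(values, dtype=np.float64)
    out = np.zeros_like(v)
    positive = v > 0
    out[positive] = np.exp(np.polyval(coeffs, np.log(v[positive])))
    return out


# Hypothetical channel data following an exact gamma-2.2 response.
detected = np.array([0.0, 0.1, 0.2, 0.4, 0.8, 1.0])
reference_linear = detected ** 2.2

r = fit_log_channel(detected, reference_linear, degree=1)
linearized = apply_log_poly(r, detected)
```

With this synthetic data the fitted slope is approximately the gamma value, which illustrates why the logarithmic form is a natural generalization of gamma correction.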

For grayscale polynomials, there are also: \\f\[ f=polyfit(ln(G\_{s}),ln(G\_{dl})) \\f\] and: \\f\[ \\begin{aligned} ln(C\_{sl})&=f(ln(C\_s)), \\qquad C\_s>0\\\\ C\_{sl}&=0, \\qquad C\_s=0 \\end{aligned} \\f\]

* * *

The functionality is included in:

@code{.cpp}
#include <opencv2/photo/ccm.hpp>
@endcode

## Enum Definition

```
enum LINEAR_TYPE
{
    LINEARIZATION_IDENTITY,            // No change
    LINEARIZATION_GAMMA,               // Gamma correction; requires gamma value
    LINEARIZATION_COLORPOLYFIT,        // Polynomial fitting for each channel; requires degree
    LINEARIZATION_COLORLOGPOLYFIT,     // Logarithmic polynomial fitting; requires degree
    LINEARIZATION_GRAYPOLYFIT,         // Grayscale polynomial fitting; requires degree and dst_whites
    LINEARIZATION_GRAYLOGPOLYFIT       // Grayscale logarithmic polynomial fitting; requires degree and dst_whites
};
```

* * *

## References

-   \[1-3\] Refer to polynomial fitting methods and empirical studies.
-   \[4\] Describes nonlinear polynomial generation methods.
-   \[5\] sRGB approximation for grayscale calculation.

This documentation is part of the OpenCV photo module.

## [Hdr Imaging](https://docharvest.github.io/docs/opencv5/tutorials/photo/hdr_imaging/)


# High Dynamic Range Imaging {#tutorial\_hdr\_imaging}

@tableofcontents

@next\_tutorial{tutorial\_stitcher}

| | |
| -: | :- |
| Original author | Fedor Morozov |
| Compatibility | OpenCV >= 3.0 |

## Introduction

Today most digital images and imaging devices use 8 bits per channel, which limits the dynamic range of the device to two orders of magnitude (actually 256 levels), while the human eye can adapt to lighting conditions varying by ten orders of magnitude. When we photograph a real-world scene, bright regions may be overexposed while dark ones may be underexposed, so we can't capture all details using a single exposure. HDR imaging works with images that use more than 8 bits per channel (usually 32-bit float values), allowing a much wider dynamic range.

There are different ways to obtain HDR images, but the most common one is to use photographs of the scene taken with different exposure values. To combine these exposures it is useful to know your camera's response function, and there are algorithms to estimate it. After the HDR image has been blended, it has to be converted back to 8-bit to view it on ordinary displays. This process is called tonemapping. Additional complexities arise when objects in the scene or the camera move between shots, since images with different exposures should be registered and aligned.

In this tutorial we show how to generate and display an HDR image from an exposure sequence. In our case the images are already aligned and there are no moving objects. We also demonstrate an alternative approach called exposure fusion that produces a low dynamic range image. Each step of the HDR pipeline can be implemented using different algorithms, so take a look at the reference manual to see them all.

## Exposure sequence

## Source Code

@add\_toggle\_cpp This tutorial's code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/cpp/tutorial_code/photo/hdr_imaging/hdr_imaging.cpp) @include samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp @end\_toggle

@add\_toggle\_java This tutorial's code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/java/tutorial_code/photo/hdr_imaging/HDRImagingDemo.java) @include samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java @end\_toggle

@add\_toggle\_python This tutorial's code is shown below. You can also download it from [here](https://github.com/opencv/opencv/tree/5.x/samples/python/tutorial_code/photo/hdr_imaging/hdr_imaging.py) @include samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py @end\_toggle

## Sample images

The data directory that contains the images, exposure times and the `list.txt` file can be downloaded from [here](https://github.com/opencv/opencv_extra/tree/5.x/testdata/cv/hdr/exposures).

## Explanation

-   **Load images and exposure times**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Load images and exposure times @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Load images and exposure times @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Load images and exposure times @end\_toggle

First we load the input images and exposure times from a user-defined folder. The folder should contain the images and _list.txt_, a file that contains the file names and inverse exposure times.

For our image sequence the list is the following:

@code{.none}
memorial00.png 0.03125
memorial01.png 0.0625
...
memorial15.png 1024
@endcode
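
A minimal sketch of parsing such a list, assuming whitespace-separated fields. `parse_exposure_list` is a hypothetical helper written for this illustration, not part of OpenCV. Because the file stores inverse exposure times, each value is inverted to obtain the exposure time.

```python
def parse_exposure_list(lines):
    """Return (file_names, exposure_times) from the lines of list.txt."""
    names, times = [], []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        names.append(tokens[0])
        # The list stores inverse exposure times, so invert each value.
        times.append(1.0 / float(tokens[1]))
    return names, times


# Example using two of the entries shown in the list.
names, times = parse_exposure_list([
    "memorial00.png 0.03125",
    "memorial15.png 1024",
])
```

To read a real file, the lines could be obtained with `open(path).read().splitlines()`; the image files named in the first column are then loaded from the same folder.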

-   **Estimate camera response**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Estimate camera response @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Estimate camera response @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Estimate camera response @end\_toggle

Many HDR construction algorithms need to know the camera response function (CRF). We use one of the calibration algorithms to estimate the inverse CRF for all 256 pixel values.

-   **Make HDR image**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Make HDR image @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Make HDR image @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Make HDR image @end\_toggle

We use Debevec's weighting scheme to construct the HDR image, using the response calculated in the previous step.
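
As a rough illustration of the weighting idea (not the code OpenCV runs), the sketch below merges exposures with triangle weights in the style of Debevec and Malik. The function name `merge_weighted`, the linear `log_response` and the synthetic scene are assumptions made for this example; in the tutorial the response comes from the calibration step.

```python
import numpy as np


def merge_weighted(images, times, log_response):
    """Merge 8-bit exposures into a radiance map using triangle weights.

    images: list of uint8 arrays with identical shape
    times: exposure time for each image
    log_response: 256 values, the log of the inverse CRF per pixel value
    """
    levels = np.arange(256)
    # Triangle weights: favor mid-range pixels, give 0 and 255 no weight.
    weights = np.minimum(levels, 255 - levels).astype(np.float64)

    numerator = np.zeros(images[0].shape, dtype=np.float64)
    denominator = np.zeros(images[0].shape, dtype=np.float64)
    for image, t in zip(images, times):
        w = weights[image]
        # Each exposure gives an estimate ln(E) = g(z) - ln(t).
        numerator += w * (log_response[image] - np.log(t))
        denominator += w

    # Pixels clipped in every exposure have zero total weight; the floor
    # only avoids division by zero, and their value (1.0) is arbitrary.
    return np.exp(numerator / np.maximum(denominator, 1e-12))


# Hypothetical scene and a linear sensor response, for illustration only.
radiance = np.array([0.2, 0.5, 1.0, 2.0])
times = [0.25, 0.5, 1.0]
images = [np.round(np.clip(radiance * t, 0, 1) * 255).astype(np.uint8)
          for t in times]
log_response = np.log(np.maximum(np.arange(256), 1) / 255.0)

hdr = merge_weighted(images, times, log_response)
```

In this synthetic case the recovered values stay close to `radiance` even for the brightest sample, which is saturated in two of the three exposures and therefore relies on the shortest one.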

-   **Tonemap HDR image**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Tonemap HDR image @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Tonemap HDR image @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Tonemap HDR image @end\_toggle

Since we want to see our results on a common LDR display, we have to map our HDR image to the 8-bit range while preserving most details. This is the main goal of tonemapping methods. We use a tonemapper with bilateral filtering and set 2.2 as the value for gamma correction.

-   **Perform exposure fusion**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Perform exposure fusion @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Perform exposure fusion @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Perform exposure fusion @end\_toggle

There is an alternative way to merge our exposures when we don't need an HDR image. This process is called exposure fusion and produces an LDR image that doesn't require gamma correction. It also doesn't use the exposure values of the photographs.

-   **Write results**

@add\_toggle\_cpp @snippet samples/cpp/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.cpp Write results @end\_toggle

@add\_toggle\_java @snippet samples/java/tutorial\_code/photo/hdr\_imaging/HDRImagingDemo.java Write results @end\_toggle

@add\_toggle\_python @snippet samples/python/tutorial\_code/photo/hdr\_imaging/hdr\_imaging.py Write results @end\_toggle

Now it's time to look at the results. Note that an HDR image can't be stored in one of the common image formats, so we save it as a Radiance image (.hdr). Also, all HDR imaging functions return results in the \[0, 1\] range, so we should multiply the result by 255.
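
The scaling step can be sketched with NumPy as below. `to_8bit` is a hypothetical helper; the clipping is an added safeguard in case some values fall outside the \[0, 1\] range.

```python
import numpy as np


def to_8bit(image):
    """Scale a float image from the [0, 1] range to uint8 in [0, 255]."""
    scaled = np.asarray(image, dtype=np.float32) * 255.0
    # Clip before casting: an out-of-range value cast to uint8
    # would otherwise wrap around instead of saturating.
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Example with two out-of-range values.
ldr_8bit = to_8bit(np.array([0.0, 0.5, 1.0, 1.2, -0.1]))
```

The resulting array can then be written to an ordinary 8-bit image format.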

You can try other tonemap algorithms: cv::TonemapDrago, cv::TonemapMantiuk and cv::TonemapReinhard. You can also adjust the parameters in the HDR calibration and tonemap methods for your own photos.

## Results

### Tonemapped image

### Exposure fusion

## Additional Resources

1.  Paul E Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH 2008 classes, page 31. ACM, 2008. @cite DM97
2.  Mark A Robertson, Sean Borman, and Robert L Stevenson. Dynamic range improvement through multiple exposures. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, volume 3, pages 159–163. IEEE, 1999. @cite RB99
3.  Tom Mertens, Jan Kautz, and Frank Van Reeth. Exposure fusion. In Computer Graphics and Applications, 2007. PG'07. 15th Pacific Conference on, pages 382–390. IEEE, 2007. @cite MK07
4.  [Wikipedia-HDR](https://en.wikipedia.org/wiki/High-dynamic-range_imaging)
5.  [Recovering High Dynamic Range Radiance Maps from Photographs (webpage)](http://www.pauldebevec.com/Research/HDR/)

## [Table Of Content Photo](https://docharvest.github.io/docs/opencv5/tutorials/photo/table_of_content_photo/)


# Photo (photo module) {#tutorial\_table\_of\_content\_photo}

-   @subpage tutorial\_hdr\_imaging
-   @subpage tutorial\_ccm\_color\_correction\_model
-   @subpage tutorial\_ccm\_linearization\_transformation

## [Point Cloud](https://docharvest.github.io/docs/opencv5/tutorials/ptcloud/point_cloud/point_cloud/)


# Point cloud visualisation {#tutorial\_point\_cloud}

| | |
| -: | :- |
| Original author | Dmitrii Klepikov |
| Compatibility | OpenCV >= 5.0 |

## Goal

In this tutorial you will:

-   Load and save point cloud data
-   Visualise your data

## Requirements

For visualisation you need to compile the OpenCV library with OpenGL support. To do this, set the WITH\_OPENGL flag to ON in CMake when building OpenCV from source.

## Practice

Point clouds can be loaded and saved using `cv::loadPointCloud` and `cv::savePointCloud` respectively.

Currently supported formats are:

-   [.OBJ](https://en.wikipedia.org/wiki/Wavefront_.obj_file) (supported keys are `v` (point position), `vn` (normal coordinates) and `f` (faces of a mesh); other keys are ignored)
-   [.PLY](https://en.wikipedia.org/wiki/PLY_\(file_format\)) (all encoding types (ASCII and binary) are supported, with the limitation that data must be of float type)

@code{.py}
import cv2

vertices, normals = cv2.loadPointCloud("teapot.obj")
@endcode

The function `cv::loadPointCloud` returns a vector of float points (`cv::Point3f`) and a vector of their normals (if specified in the source file). To visualize the data you can use functions from the viz3d module, but the data first needs to be converted into another format:

@code{.py}
import numpy as np

vertices = np.squeeze(vertices, axis=1)

color = \[1.0, 1.0, 0.0\]
colors = np.tile(color, (vertices.shape\[0\], 1))
obj\_pts = np.concatenate((vertices, colors), axis=1).astype(np.float32)

cv2.viz3d.showPoints("Window", "Points", obj\_pts)

cv2.waitKey(0)
@endcode

In the code sample above we add a colour attribute to every point. The result will be:

For additional context, a grid can be added:

@code{.py}
vertices, normals = cv2.loadPointCloud("teapot.obj")
@endcode

Another way to draw 3D objects is as a mesh. For that we use dedicated functions to load the mesh data and display it. For now only .OBJ files are supported, and they should be triangulated before processing (triangulation is the process of breaking faces into triangles).

@code{.py}
vertices, indices = cv2.loadMesh("../data/teapot.obj")
vertices = np.squeeze(vertices, axis=1)

cv2.viz3d.showMesh("window", "mesh", vertices, indices)
@endcode

## [Table Of Content Geometry](https://docharvest.github.io/docs/opencv5/tutorials/ptcloud/table_of_content_geometry/)


# Point cloud module {#tutorial\_table\_of\_content\_ptcloud}

-   @subpage tutorial\_point\_cloud

## [Tutorials](https://docharvest.github.io/docs/opencv5/tutorials/tutorials/)


# OpenCV Tutorials {#tutorial\_root}

-   @subpage tutorial\_table\_of\_content\_introduction - build and install OpenCV on your computer
-   @subpage tutorial\_table\_of\_content\_core - basic building blocks of the library
-   @subpage tutorial\_table\_of\_content\_imgproc - image processing functions
-   @subpage tutorial\_table\_of\_content\_app - application utils (GUI, image/video input/output)
-   @subpage tutorial\_table\_of\_content\_calib3d - extract 3D world information from 2D images
-   @subpage tutorial\_table\_of\_content\_objdetect - detect ArUco markers and other calibration boards
-   @subpage tutorial\_table\_of\_content\_features - feature detectors, descriptors and matching framework
-   @subpage tutorial\_table\_of\_content\_dnn - infer neural networks using built-in _dnn_ module
-   @subpage tutorial\_table\_of\_content\_other - other modules (stitching, video)
-   @subpage tutorial\_table\_of\_content\_ios - running OpenCV on an iDevice
-   @subpage tutorial\_table\_of\_content\_geometry - 2d and 3d geometry primitives and point clouds processing
-   @subpage tutorial\_table\_of\_content\_ptcloud - point cloud processing
-   @subpage tutorial\_table\_of\_content\_photo - photo module functions (hdr\_image, ccm) @cond CUDA\_MODULES
-   @subpage tutorial\_table\_of\_content\_gpu - utilizing power of video card to run CV algorithms @endcond