Video Summary and Transcription
The video covers various aspects of computer vision using OpenCV, including image processing techniques and their applications. It explains how histograms are used for segmenting objects and enhancing image contrast through histogram equalization. Methods like Otsu's thresholding and hysteresis thresholding are discussed for object detection and line extraction. The talk highlights the difference between image processing and computer vision, emphasizing that image processing transforms images, while computer vision extracts values or features from them. Practical applications in healthcare, such as cancer detection and stage determination, are also mentioned. The speaker briefly touches on the hardware requirements for AI and computer vision algorithms and clarifies that OpenCV is not suitable for audio processing. The importance of understanding the underlying algorithms for robust and effective debugging is also stressed.
1. Introduction and Image Processing
Today, we will explore image processing, computer vision, and the combination of computer vision and machine learning. I will provide examples and a code snippet for computer vision algorithms. I have over 15 years of experience in computer vision, machine learning, and artificial intelligence. Let's start by understanding what an image is.
Hi, everyone. Thank you very much for registering for this machine learning conference. Whether you are located in Europe or anywhere else on earth, thanks a lot for being here.
I'm Beril. Today, we are going to see how we can make software, machine learning software, see things like humans do. We will try to introduce some image processing and computer vision topics in this very limited time. Because of the time limitation, I cannot show you many programming examples. But within my slides, you may notice that I've put some algorithms within boxes, so I hope you can go back and look at them later; that could be useful. And we can discuss more examples, maybe after the presentation in the Q&A session.
So this is my agenda for today. After a small introduction, I will first introduce what image processing is and some image processing algorithms, and then we will look at computer vision and see some classical computer vision algorithms. Finally, I want to address how we can combine computer vision and machine learning, and how that differs from regular AI solving all problems by itself. Is computer vision irrelevant now because AI is doing everything by itself? We will discuss these topics. Lastly, I will give you a piece of code that you can implement yourself to jumpstart doing some computer vision, hopefully.
A little bit of introduction about myself. I'm currently an assistant professor at a Swedish university called Jönköping University, in an AI lab focusing on developing AI and explainable AI algorithms. Besides that, I've got my own company, which has done a lot of computer vision applications in the past in the Netherlands. I'm located in the Netherlands also. I've got over 15 years of experience in computer vision and machine learning, and artificial intelligence is the area where I'm putting a lot of my effort these days. If I introduce myself, I always need to add that I'm an environmentalist.
What is an image? Let's start with what is an image before processing these images. If it's a digital image, we are talking about digital images, not old analog style images. If it's a digital image, we are talking about a matrix. The source can be anything. The source can be your smartphone, a regular camera.
2. Image Processing and Computer Vision
If we are talking about digital image, we are talking about a matrix. If we see a picture with a T within it, we can represent this T as an image by putting 1 when it's bright and putting 0 when it's dark. Assume that we talk about the brightness or grayscale image, which is a combination of red, green, blue bands. Image processing and computer vision are different. Image processing software takes an image as input and outputs a processed image. Computer vision takes an image as input and outputs a value, such as the number of people in the scene or a GPS location found in a satellite image.
It can be a satellite sensor. It can be a heat sensor. It can be dermatological or microscopic image, telescope image, whatever. If we are talking about digital image, we are talking about a matrix. Alright, if you've got a matrix, you know how to do matrix operations and welcome to image processing. You know how to do image processing now.
If we see a picture with a T within it, let's say, we can represent this T as an image by putting 1 where it's bright and 0 where it's dark. Now we've got T represented as a digital image. When the numbers are only 0 and 1, of course this matrix is a binary matrix; it's called a binary image. But normally, when we take a picture with our smartphones, we've got RGB (red, green, blue) color images. That means we don't have one image as one matrix; for each image, we've got three matrices. But in order to keep the operations simple, we will look at one matrix at a time.
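As a minimal sketch of the "image is a matrix" idea, the letter T can be written out as a small binary matrix with NumPy. The 5x5 size and the variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# A 5x5 binary image of the letter T: 1 = bright pixel, 0 = dark pixel.
# The size and layout are made up for illustration.
t_image = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=np.uint8)

# A colour image is three such matrices stacked: height x width x 3 (R, G, B).
# Here the same T is copied into all three bands, scaled to the 8-bit range.
rgb_image = np.stack([t_image * 255] * 3, axis=-1)

print(t_image.shape)    # (5, 5)
print(rgb_image.shape)  # (5, 5, 3)
```

Because the image is an ordinary array, any matrix operation (slicing, adding, multiplying) is already an image operation.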
Assume that we talk about the brightness or grayscale image, which is a combination of these red, green, and blue bands all together. Let's assume that a weighted sum is obtained. So, before going further: I said I will talk about image processing and I will talk about computer vision. Are they the same thing or not? I say it's the same. Is it true or false? Give an answer in your head now. It's false. Even though people use these terms interchangeably (some people say image processing for computer vision, some say computer vision for an image processing application), they are in fact different things. When the input is an image and the output is a processed image, we've got image processing software. When we are talking about computer vision, our input is again an image, but the output is a value. It can be the number of people in the scene, a vector, a position, a GPS location that we found in a satellite image, a boundary, a shape, or a class, for instance. So if we have a value at the end, then we say we've done computer vision. And most of the time, these are done together, because most of the time the input image is not suitable to be processed by computer vision immediately.
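The distinction can be sketched with two small functions: one returns an image (image processing), the other returns a value (computer vision in the sense used here). The talk only says "a weighted sum"; the specific weights below (the common ITU-R BT.601 luma weights) and the function names are assumptions for illustration.

```python
import numpy as np

# Assumed weights for the R, G, B bands (ITU-R BT.601 luma weights).
# The talk does not say which weights are used.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Image processing: image in, image out (H x W x 3 -> H x W)."""
    weighted = rgb.astype(float) @ WEIGHTS
    return np.clip(np.rint(weighted), 0, 255).astype(np.uint8)

def count_bright_pixels(gray, threshold=128):
    """Computer vision in the talk's sense: image in, a single value out."""
    return int((gray >= threshold).sum())

# Tiny 2x2 test image: one white pixel, one pure red pixel, two black pixels.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (255, 255, 255)
rgb[0, 1] = (255, 0, 0)

gray = to_grayscale(rgb)
print(gray)
print(count_bright_pixels(gray))
```

In practice the two steps are chained exactly as described: the processed image from the first function becomes the input of the second.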
3. Image Processing and Histogram
In image processing, we can remove noise, adjust colors, augment images, detect edges and corners, and make old photos new. Histogram is a crucial element in image processing, representing the distribution of brightness values within an image.
So it's a good idea to do first image processing to process the image, maybe removing some noise, and make it more suitable to be processed by computer vision. And afterwards, computer vision extracts some values, shapes, vectors. Later on, you can pick them up and put them into a machine learning algorithm, learn something, classify something, recognize something. So we will see more examples.
In image processing, we say our input is an image and our output is a processed image. What could this be? Just some examples of image processing algorithms. It could be a denoised result: maybe the input was a noisy image and the output is a denoised, smoother image. It can be a color adjustment. You can do image augmentation, which is very interesting for generating test datasets for AI, for instance. Or you can remove the background, like Zoom meetings are doing now when changing the background; this is also an image processing algorithm. You can do edge detection or corner detection with image processing algorithms, assuming that you get an image of the edges at the end. You can also make old photos new by guessing what colors they should get. And there's an example much like the colorized old videos you may have seen; this is the result of an image processing algorithm applied to each frame of the video.
So I will show you some image processing algorithms, but maybe the most important thing to explain, when we are talking about how to do image processing, is the histogram. A histogram is the distribution of brightness values within the image, and it's the most interesting thing to look at when you get an image. For instance, here is a scene. When we plot the histogram, we are looking at the image values. If we are talking about an 8-bit image, the values go from 0 to 255, because we should have 256 values for an 8-bit image. Each of these values tells us the brightness level of a pixel. If it's very bright, if it's white, the value is 255. If it's totally dark, the value is zero, and the other gray values are in between. If you look at the distribution, at how many bright pixels there are in this image, if you look at the statistics, you see that some accumulation is happening close to zero, and some accumulation is happening close to 255, at the higher values. We can expect that this mountain is causing the accumulation of darker values here. It gives us an idea about the values within the image and whether they are distributed nicely or not.
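A brightness histogram for an 8-bit image can be sketched in a few lines of NumPy. The toy "scene" below (a dark half and a bright half) is made up to mimic the two accumulations described above; it is not the image from the slides.

```python
import numpy as np

def brightness_histogram(gray):
    """Count how many pixels take each of the 256 values of an 8-bit image."""
    return np.bincount(gray.ravel(), minlength=256)

# Toy scene (made up): dark "mountain" on the left half, bright "sky" on the right.
gray = np.zeros((4, 8), dtype=np.uint8)
gray[:, :4] = 20
gray[:, 4:] = 230

hist = brightness_histogram(gray)
print(hist[20], hist[230])  # pixel counts at the dark and bright peaks
```

The histogram always has 256 bins for an 8-bit image, and its bins sum to the number of pixels.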
4. Histogram Equalization and Segmentation
We can do more processes to the image by looking at the histogram. Histogram equalization stretches the histogram to ensure the distribution is between 0 and 255. This is done by using the cumulative values of the histogram. Histograms are used to make images more visible and can also be used for segmentation.
And we can also do more processing on the image by just looking at this histogram. The first important thing to know is how to do histogram equalization. Most of the time, when we've got input images, for instance from a drone, it's looking at a limited area where it's also projecting its own shadow, so we will have very dark values.
First of all, we have to make sure that we have stretched these values well enough that some texture, some properties within the image become visible. Otherwise, we cannot apply further computer vision to extract features and recognize things. Histogram equalization tries to stretch the histogram so that the distribution is spread between 0 and 255 and we can see some texture within the image.
Histogram equalization is easily done by looking at the cumulative values of the histogram. That means if we've got the distribution of brightness values, we can count how many pixels have a brightness up to each value to build the cumulative histogram. Then we can use this cumulative distribution function as a mapping function: each original brightness value is projected to a new, stretched value. I hope you can find more in the references. We use histograms for making the image, and the features within it, more visible, but we can also use them for segmentation.
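The cumulative-histogram mapping described above can be sketched as follows. This is a plain NumPy illustration under the assumption of an 8-bit grayscale input; the function name is hypothetical. OpenCV exposes the same kind of operation as `cv2.equalizeHist`, which would normally be used instead.

```python
import numpy as np

def equalize_histogram(gray):
    """Stretch an 8-bit image using its cumulative histogram as the mapping."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()                 # pixels with brightness <= each value
    cdf_min = cdf[cdf > 0][0]           # cumulative count at the darkest used value
    total = cdf[-1]                     # number of pixels
    if total == cdf_min:                # flat image: nothing to stretch
        return gray.copy()
    # Map each brightness value to its rescaled cumulative count in 0..255.
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Low-contrast example: all values squeezed into 100..103.
dark = np.array([[100, 101], [102, 103]], dtype=np.uint8)
print(equalize_histogram(dark))
```

After the mapping, the four nearly identical values are spread across the whole 0 to 255 range, which is the "stretching" effect the talk refers to.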
5. Image Processing Techniques
We can use the histogram for segmenting objects by choosing threshold values. OpenCV provides Otsu's thresholding method to automatically find the threshold value. Another approach is to use hysteresis thresholding to extract lines by selecting two thresholds and considering connected pixels. By applying matrix operations and convolution windows, we can perform various image processing algorithms.
For instance, if we look at the histogram and put a threshold value on it (remember the mountain and the sea background), we can threshold the histogram in order to remove the background or to keep only the foreground. So we can use the histogram for segmenting objects as well. We can do this segmentation by looking at the histogram of the overall image, or we can look at small windows and choose different thresholds if the brightness fluctuates a lot within the image.
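The difference between one threshold for the whole image and a separate threshold per small window can be sketched like this. The tile size, the choice of the tile mean as the local threshold, and the function names are all assumptions for illustration; OpenCV's `cv2.adaptiveThreshold` is the library routine for locally varying thresholds.

```python
import numpy as np

def global_threshold(gray, t):
    """One threshold for the whole image: 1 = foreground, 0 = background."""
    return (gray > t).astype(np.uint8)

def windowed_threshold(gray, block=4):
    """Threshold each block x block tile at its own mean brightness."""
    out = np.zeros_like(gray, dtype=np.uint8)
    h, w = gray.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = gray[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = tile > tile.mean()
    return out

# Made-up image: the right half is much brighter overall than the left half,
# and each half contains one small object slightly brighter than its surroundings.
gray = np.full((4, 8), 10, dtype=np.uint8)
gray[:, 4:] = 100
gray[1, 1] = 60    # object on the dark side
gray[2, 6] = 150   # object on the bright side

print(global_threshold(gray, 80).sum())  # whole bright half is marked
print(windowed_threshold(gray).sum())    # only the two objects are marked
```

With fluctuating brightness, the single global threshold confuses the bright background with foreground, while the per-window thresholds isolate both objects.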
And the question here is how to choose the threshold value automatically in order to do the segmentation. OpenCV comes to help, because there is a method called Otsu's thresholding and OpenCV offers it to you in just one line. What it does, I drew by hand here. If you've got these brightness values, assume that they are distributed like this. I have to go a little bit quicker, if I can. If you look at the histogram, what we need to do is to fit two Gaussians to it. I can fit two Gaussians to the two peaks that I found in this histogram, like this, and the place where they intersect gives me the threshold value to segment this image automatically. I have to go a little bit faster; we can discuss in the Q&A.
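For reference, here is a sketch of automatic threshold selection. Note that it does not fit two Gaussians as in the hand-drawn picture: Otsu's method as usually formulated scans every candidate threshold and keeps the one that maximizes the variance between the two resulting classes, and that is what the code below does. The function name is hypothetical; in OpenCV the one-line form is `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)             # weight of the dark class for each t
    w1 = 1.0 - w0                    # weight of the bright class
    mu_cum = np.cumsum(prob * levels)
    mu_total = mu_cum[-1]

    denom = w0 * w1
    valid = denom > 1e-12            # skip thresholds that leave a class empty
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Made-up bimodal image: half the pixels dark (20), half bright (230).
gray = np.zeros((4, 8), dtype=np.uint8)
gray[:, :4] = 20
gray[:, 4:] = 230

t = otsu_threshold(gray)
mask = gray > t                      # foreground = bright class
print(t, int(mask.sum()))
```

For a clearly bimodal histogram, the selected threshold falls between the two peaks, which matches the intuition of the intersection point in the drawing.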
We can look at more algorithms in the area of image processing. We can use the brightness values in order to extract lines. How can we do that? We can use hysteresis thresholding, as in the Canny method, for instance. Instead of selecting one threshold like Otsu did, here we select two thresholds: a high threshold and a low threshold. Then we look at connected pixels. If all of the connected pixels are lower than the high threshold, we remove them; they do not represent anything significant. However, if some part of the connected pixels has a brightness value higher than the high threshold, then we treat the whole connected segment as one edge, even though some places along it were darker. That gives us very robust algorithms, because one threshold may not detect all the connected lines, but two thresholds with hysteresis help us develop a more robust approach. So, I have given you some algorithms to process images further, but I have to go quicker. Please look at the slides later.
In order to do image processing, we can do matrix operations, because an image is a matrix. We can build a lot of applications by making a convolution window and scanning the image with it. For instance, if the convolution window contains only ones, we put the average of the values within the window into each pixel as we travel across the image, and at the end we've got averaged pixels, which gives us a smoothed image. If, instead of ones, we put a Gaussian shape into the convolution window, the middle pixel has the most influence on what is now a weighted average, and far-away pixels contribute less.
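The two-threshold rule with connected pixels, described earlier in this passage, can be sketched as a small flood fill. This is an illustrative NumPy version with an assumed 8-neighbour connectivity and a hypothetical function name; in OpenCV the same idea is built into `cv2.Canny(image, threshold1, threshold2)`.

```python
import numpy as np
from collections import deque

def hysteresis_threshold(strength, low, high):
    """Keep connected groups of pixels >= low that contain a pixel >= high."""
    candidate = strength >= low      # weak-or-strong pixels
    keep = strength >= high          # start from the strong pixels only
    queue = deque(zip(*np.nonzero(keep)))
    h, w = strength.shape
    while queue:
        y, x = queue.popleft()
        # Grow into the 8 neighbours (assumed connectivity) that are candidates.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and candidate[ny, nx] and not keep[ny, nx]):
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep

# Made-up edge strengths: one line with a strong start and weak continuation,
# and one isolated weak line that never reaches the high threshold.
strength = np.array([
    [0,  0,  0,  0,  0,  0],
    [0, 90, 40, 40,  0,  0],
    [0,  0,  0,  0,  0,  0],
    [0,  0,  0,  0, 40, 40],
])
edges = hysteresis_threshold(strength, low=30, high=80)
print(edges.astype(int))
```

The weak pixels attached to the strong one survive as part of the same edge, while the isolated weak segment is removed, which a single threshold at either 30 or 80 could not achieve.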
6. Image Processing and Computer Vision Techniques
Image processing algorithms can be performed in the frequency domain by converting the image into a frequency spectrum and applying filters. Computer vision extracts important features, such as corners and edges, to recognize objects. Descriptor vectors are used for machine learning algorithms. Computer vision allows for hand-tailoring, while AI finds features automatically. AI requires large datasets, while computer vision can use a single image template. Computer vision methods do not always require a GPU, unlike AI methods.
So that helps us to achieve results which are smoothed, but where the edges are not smoothed much; edges are preserved at the end. I advise you to look at my slides later, which I'm going to share. It's also possible to run these image processing algorithms in the frequency domain, by converting the image into a frequency spectrum, processing it there by applying a low-pass or high-pass filter, and then converting it back to the image domain.
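Both routes to smoothing mentioned here can be sketched in NumPy: scanning a window of ones or a Gaussian-shaped window over the image, and low-pass filtering in the frequency domain. Kernel sizes, the sigma value, the edge padding, the circular cut-off mask, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def filter_window(gray, kernel):
    """Slide an odd-sized window over the image; write the weighted sum per pixel.

    For the symmetric kernels used here this equals convolution.
    """
    kh, kw = kernel.shape
    padded = np.pad(gray.astype(float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def box_kernel(size=3):
    """Window of equal weights: plain averaging."""
    return np.ones((size, size)) / size**2

def gaussian_kernel(size=5, sigma=1.0):
    """Gaussian-shaped window: the centre pixel gets the largest weight."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax**2) / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def fft_lowpass(gray, radius):
    """Keep only frequencies within `radius` of the spectrum centre."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(float)))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

# One bright "noise" pixel on a dark background.
img = np.zeros((5, 5))
img[2, 2] = 9.0
print(filter_window(img, box_kernel(3))[2, 2])       # averaged down
print(filter_window(img, gaussian_kernel(5))[2, 2])  # centre weighted more
```

A high-pass filter would use the inverted mask instead, keeping the fine detail and removing the slowly varying brightness.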
Anything that we've done here in grayscale can also be done in the red, green, and blue bands of the image directly, maybe combining the results later. But the most interesting thing for people in AI is the augmentation methods that image processing offers for generating a lot of data to use. After processing the image, computer vision comes in. Computer vision tries to extract important features, which we call local features, to look at the image and recognize things. To do so, we develop methods which can identify corners, sharp edges, or specific texture properties. Harris, with his algorithm, compared each small window with its neighbor to see whether moving to the neighbor made a significant change or not. At the top right is the map Harris obtained: wherever there is a significant change compared with the neighbor, it's highlighted. Then, when he thresholded it, he got the corners at the end. This is Harris's algorithm. Later on, other people proposed heuristic approaches or band-pass filtering approaches which are more robust to noise, because Harris's algorithm was not: if there's a noise pixel, it's going to be highlighted at the end.
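A corner response map of the kind described can be sketched as follows. This uses the usual gradient-based formulation of the Harris response (products of image gradients summed over a small window) rather than literally shifting windows; the window radius, the constant `k = 0.04`, and the function names are assumptions. OpenCV provides this as `cv2.cornerHarris`.

```python
import numpy as np

def box_sum(a, radius=1):
    """Sum of values in a (2*radius+1)^2 window around each pixel."""
    padded = np.pad(a, radius, mode="constant")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(gray, k=0.04, radius=1):
    """Large positive at corners, negative along edges, near zero in flat areas."""
    gy, gx = np.gradient(gray.astype(float))   # brightness change per direction
    sxx = box_sum(gx * gx, radius)
    syy = box_sum(gy * gy, radius)
    sxy = box_sum(gx * gy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Made-up test image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0

response = harris_response(img)
print(response[5, 5] > 0)    # corner of the square
print(response[5, 10] < 0)   # middle of the top edge
```

Thresholding the response map then leaves only the corner locations, which is the final step mentioned in the talk.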
Are features enough for recognition? No, I will answer for you. We need to describe each feature with a feature vector, which we call a descriptor vector, in order to be able to use machine learning algorithms to learn or classify things. Please look at the descriptor vector extraction that I give you within the source code box, which OpenCV offers directly. There are different methods of using these descriptor vectors to recognize or match things: we can match directly against a template that we already hold in memory, we can try to find the most similar descriptors within the image, or we can look at the spatial distribution of these descriptor vectors with graph theory. I gave you more examples, but before we end, I want to discuss how this is different from the AI algorithms that we know of. Does it mean that if we develop AI with more sophisticated algorithms, we achieve more? Not always. Sometimes computer vision plus machine learning comes with more advantages because we hand-tailor everything, and AI comes with other advantages. So I listed some comparisons here. Generally, computer vision allows us to hand-tailor what we are looking at, whereas AI finds the good features by itself. That may make things more robust or generalized, but it might also make them more vulnerable to adversarial attacks, since we cannot hand-tailor what to look at. AI always needs a large dataset to learn from. However, as you see, if we are using computer vision, we can use only one image template to find similar things. With computer vision methods, it might be challenging to find the optimum model because we show only one template, whereas AI learns from a large dataset and may generalize better. And finally, for computer vision methods, you don't always need a GPU. But with AI methods, to develop and find an optimal solution, you definitely need a GPU.
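Matching descriptor vectors against a single stored template can be sketched as a nearest-neighbour search. The descriptors below are made-up 2-D vectors; real descriptors (for example from `cv2.ORB_create()`, typically matched with `cv2.BFMatcher`) are much longer. The function name and the `max_distance` cut-off are assumptions for illustration.

```python
import numpy as np

def match_descriptors(template_desc, image_desc, max_distance=0.5):
    """For each template descriptor, find the closest image descriptor.

    Returns (template_index, image_index, distance) for pairs that are
    closer than max_distance; weaker matches are dropped.
    """
    # Pairwise Euclidean distances, shape (n_template, n_image).
    diff = template_desc[:, None, :] - image_desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)
    matches = []
    for i, j in enumerate(nearest):
        if dist[i, j] <= max_distance:
            matches.append((i, int(j), float(dist[i, j])))
    return matches

# Hypothetical descriptors: three from the template, three from a new image.
template = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
scene = np.array([[1.1, 1.0], [0.0, 0.1], [9.0, 9.0]])

for t_idx, s_idx, d in match_descriptors(template, scene):
    print(t_idx, "->", s_idx, round(d, 3))
```

Only one template is needed here, which is the point made in the comparison with AI: no training set, but also no guarantee that the single template generalizes.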
So please go ahead and write the code that will open up your camera and grab an image, so that you can start using OpenCV algorithms.
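The code from the slides is not reproduced in this transcript, so the following is only a minimal sketch of what such a starter could look like, assuming the `opencv-python` package is installed and a camera is available at index 0. The function name `grab_frame` is hypothetical.

```python
def grab_frame(camera_index=0):
    """Open a camera, read one frame, and return it as a NumPy array (BGR order)."""
    # Imported inside the function so this file still loads where OpenCV is absent.
    import cv2

    capture = cv2.VideoCapture(camera_index)
    try:
        if not capture.isOpened():
            raise RuntimeError(f"could not open camera {camera_index}")
        ok, frame = capture.read()
        if not ok:
            raise RuntimeError("camera opened but returned no frame")
        return frame
    finally:
        # Always release the device, even when reading fails.
        capture.release()
```

Calling `grab_frame()` returns the image as a matrix; from there, a call such as `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)` gives the single grayscale matrix that the earlier examples in the talk operate on.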
7. Closing Remarks and Audio Processing
I have to close it up now, but I'm happy to answer your questions. You can find more about me on my website and Twitter. Subscribe to my YouTube channel and newsletter for free content. Regarding audio processing, OpenCV is meant for computer vision and image processing, not audio. Audio is one-dimensional and time-based, while images are two-dimensional. TensorFlow libraries may be more suitable for audio processing.
And this is a very short time. Unfortunately, I have to close it up now, and I'm very happy to go through your questions and open up the topic more. Thank you very much for inviting me. You can find more about how to access me at my website, my firstname, surname.com, and you can find me on Twitter. Please subscribe to my YouTube channel where I teach a lot and also my newsletter to get all the free content that I'm delivering every month. Thanks a lot.
Hello, Beril. Hi. How are you doing today?
I'm fine. Thank you. Very nice to be here. Very nice conference.
Awesome. Well, we're all glad you're here. We've already started the conference, but we want to continue and get the show on the road with some amazing questions from our audience, because they have so many questions for you. OK, so here's a good question. What are some examples of processing that can be done in the domain of, like, frequency and audio? Can you use OpenCV for that, for example?
Oh, for audio processing. OpenCV stands for Open Source Computer Vision; it is meant for computer vision and image processing. So it's not meant for audio processing directly, but I assume you might find one or two functions that end up being able to process a 1D, one-dimensional audio signal too. As we've seen in the slides, all the images that we are looking at are two-dimensional matrices. So they are a little bit different from audio, which is one-dimensional and comes in time. So OpenCV is not meant for sound processing. You're right. I'm sure you can benefit from some TensorFlow libraries, or other libraries. Maybe it's good to look it up further. It's not really my domain.
Yeah, that's actually true, because I know that the operation of convolution, although you do tend to apply it to images and it's the fundamental operation for computer vision, also exists in one dimension: we do have one-dimensional convolutions for signal processing as well. I also haven't heard too much about it from OpenCV, but it's definitely relevant in terms of fundamental operations.
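The one-dimensional counterpart of the averaging window can be sketched with NumPy alone. The signal values and the function name are made up for illustration; this is not an OpenCV call.

```python
import numpy as np

def moving_average(signal, window=3):
    """Smooth a 1-D signal by convolving it with a window of equal weights."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# A single spike in an otherwise silent signal.
samples = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
smoothed = moving_average(samples)
print(smoothed)
```

The spike is spread over its neighbours exactly as the bright pixel was in the 2-D case, only along the time axis.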
8. Mathematics and Practical Approach
Would you recommend that people start directly with OpenCV and do the math later? It's important to understand the algorithms behind the code to ensure robustness and effective debugging. Self-education through online resources can be beneficial. A good balance between practical and theoretical aspects is crucial. In healthcare, computer vision can be used for cancer recognition and determining the stage of cancer through feature extraction.
Yeah. And okay, so here's another question. I know that you've introduced some math on your slides while also explaining OpenCV, and I know that for many people mathematics can be very daunting. Because of this, there are people who look at the math and say, okay, I don't want to get into this. Would you recommend people just start directly with OpenCV and then do the math later, or is there some other approach to computer vision that you would take?
Yeah, that's a very good question. First of all, I want to mention that I shared my slides. I think it will be announced on a website or Twitter where the slides of this talk are. You can use them for finding the references; I always put a link to read further. So maybe you can click on the link and check the code, check the source, to learn more. In this very short talk, unfortunately, I cannot explain everything in detail. But when you're programming, when you start doing things, it's really good to check the math. Otherwise you may end up with a good result from your program, but is it really a robust algorithm which you can implement and run in real life? And whenever there is an error, a false recognition, where does it come from? It wouldn't be easy to debug if you are not aware of what kind of algorithms are behind it. Or you need to know the parameters of a function to tune them, but how do you choose the parameters if you don't know the math behind it? I'm not telling you to go and take a Master's or Ph.D. degree; not everybody needs to do that. I know very professional developers who do really robust, good work and are just self-educated, and nowadays everything is online. So I would recommend: maybe you're writing code and it works well, but go and check the math behind it online, reading some blogs and forums and asking other people questions there. It would be beneficial.
Yeah, I completely agree that a good, healthy balance between like the actual practical aspect with the theoretical aspect can do wonders in helping you understand exactly like what you're getting into. Yeah. Thanks for that answer. And we have some more questions here, too. And let me just pull some of them up. OK, so do you have any ideas or, you know, techniques of computer vision that you would use in health care or farming or any of the other social causes that you care about?
Yes, a lot in the health care field. I used it for recognition of cancer, skin cancer, and identifying whether the cancer area is malignant or benign, so what stage of cancer it could be. And for this purpose, I found feature extraction methods inspired by doctors. They explained to me how they look at skin.
9. Cancer Classification and Agriculture
Doctors can visually classify the stage of cancer based on their experience. Mathematical algorithms can be used for feature extraction and classification. Artificial intelligence can support doctors in decision-making, providing a second opinion and normalizing answers. In agriculture, computer vision and OpenCV are used to monitor plant health and detect issues before they spread.
And how they classify the stage of the cancer by just looking with their eyes. Of course, there must be a biopsy, but doctors have seen so many examples that they can just look and tell. And I was really curious: how do you look and tell? Explain it to me. When doctors explained it to me, I found a way to describe mathematically what they explained, and it turned into a feature extraction algorithm which could be used for classification of the cancer stage. It always inspires me to think about how humans look at this and how humans recognize things, and whether I can find a mathematical method to do something similar.
Yup. I think that type of meta-understanding, being in the mind of how a human would do this and trying to instruct a machine to do something very similar, is very important. And sometimes it's also very hard in healthcare, because, for example, in cancer detection the mistakes that you make can be fatal; you're playing with human life. So there is that back and forth of, okay, is it okay to use artificial intelligence?
Not to be the decision maker, but to be a decision supporter to the doctor, always. In those kinds of critical areas, we don't want to kick out the human being; we just want to be a supporter. And second, if you ask 10 doctors, they will not end up with the same answer. If you ask what the cancer stage is, they've also got some fluctuation in their answers. So maybe a machine learning system could be a way to normalize their answers, and just to get a second opinion from the machine.
Got it. Yep. Thanks for that answer. And I think we have one final question that we could get to. Right now, in your current work, probably in your startup, Create4D, how are you currently using computer vision, and even OpenCV for that matter?
Yes, it's very interesting to look at what's going on on the earth, especially in agriculture, since it's my passion to look at food security and how plants are doing. Will we have enough food in the supermarket soon or not? The methods that I described in this talk, like simple convolution, help a lot to extract features which might indicate a sudden change in a local area, maybe something wrong in this area, so I can go there, visit, and rescue the plants before they die or spread the disease to other plants.
Nice, awesome. Well, thank you so much, Beril. I think we're out of time right now for questions, but I appreciate you coming here, answering our questions, and giving us good insight into computer vision. That's amazing. So everyone, by the way, if you want to go and talk to Beril right now, she will be available in her speaker room to answer your questions in case you couldn't get them in during this Q&A session. So thanks a lot for joining us, Beril. It was great having you here, and we will see you later in your speaker room. Thank you.