Enhancing Distance Prediction through Monocular Depth Estimation based on Graph Convolutional Networks
Armin Masoumian
Candidate: Armin Masoumian
PhD Advisors: Dr. Domènec Puig, Dr. Ibrahim Abdellatif, Dr. Julián Efrén Cristiano and Dr. Hatem Mahmoud
Date of defense: 2024-02-07
File: Thesis download
Abstract: As the field of robotics and autonomous vehicles advances, the demand for precise depth measurements becomes increasingly pronounced. Depth estimation (DE), a fundamental task in computer vision, plays a pivotal role in achieving this accuracy, with deep learning (DL) techniques offering a viable solution. Particularly, self-supervised monocular depth estimation (MDE) represents cutting-edge technology, allowing the estimation of object depth in a scene from a single image, eliminating the need for expensive stereoscopic or 3D cameras. Graph convolutional networks (GCNs) have further improved the accuracy of DE models by accommodating non-Euclidean data, while combining multiple loss functions has enhanced the reliability of depth predictions.
This study explores the extensive applications of self-supervised MDE and provides a comprehensive review of recent advancements in the field using DL techniques. It delves into key aspects like input data shapes, training methods, and evaluation criteria while also addressing the limitations of DL-based MDE models, including challenges related to accuracy, computational efficiency, real-time feasibility, domain adaptation, and generalization. Furthermore, the research introduces an innovative MDE approach leveraging GCNs for estimating depth maps from monocular videos, outperforming existing state-of-the-art methods. Additionally, a novel deep learning framework is presented, seamlessly integrating DE and object detection within a single image, achieving impressive accuracy, particularly in outdoor scenarios. In summary, this study underscores the efficiency of the self-supervised MDE approach based on graph convolutional networks, providing both quantitative and qualitative comparisons with state-of-the-art methods, emphasizing the considerable advantages of the proposed depth prediction technique.
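The graph convolutional networks mentioned above propagate features over a graph rather than a regular pixel grid. As an illustrative sketch only (the standard Kipf–Welling propagation rule, not the actual architecture of this thesis), a single graph-convolution layer can be written in a few lines of NumPy:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution propagation step:
    H' = relu( D^{-1/2} (A + I) D^{-1/2} H W )."""
    A_hat = A + np.eye(A.shape[0])          # adjacency with self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 3 nodes in a chain, 2 features per node, identity weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)
out = gcn_layer(A, H, W)
```

Each node's features are mixed with its neighbours' (with symmetric degree normalization) before the linear map `W`, which is what lets a depth network exploit non-Euclidean neighbourhood structure.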
Supervised Monocular Depth Estimation Based on Machine and Deep Learning Models.
Saddam Abdulwahab
Candidate: Saddam Abdulwahab
PhD Advisors: Dr. Domènec Puig, Dr. Ibrahim Abdellatif and Dr. Hatem Mahmoud
Date of defense: 2023-04-27
File: Thesis download
Abstract: Depth estimation refers to measuring the distance of each pixel relative to the camera. It is crucial for many applications, such as scene understanding and reconstruction, robot vision, and self-driving cars. Depth maps can be estimated using stereo or monocular images. Depth estimation is typically performed through stereo vision, following several time-consuming stages such as epipolar geometry, rectification, and matching. However, predicting depth maps from single RGB images is still challenging, as object shapes must be inferred from intensity images strongly affected by viewpoint changes, texture content, and lighting conditions. Additionally, the camera only captures a 2D projection of the 3D world, while the apparent size and position of objects in the image can change significantly depending on their distance from the camera.
Stereo cameras have been deployed in systems to obtain depth map information. Although they show good performance, their main drawback is the complex and expensive hardware setup they require and their time complexity, which limits their use. In turn, monocular cameras are simpler and cheaper; however, single images inherently lack important depth information. Many approaches for predicting depth maps from monocular images have recently been proposed, thanks to the revolution in deep learning models. However, most of these solutions produce blurry approximations of low-resolution depth maps. In general, depth estimation requires knowing the appropriate representation methods to extract the features shared between a single RGB image and the corresponding depth map.
Consequently, this thesis contributes to two research lines in estimating depth maps (also known as depth images). The first line estimates depth based on the objects present in a scene, to reduce the complexity of handling the complete scene; for this task we developed new techniques and concepts based on both traditional and deep learning methods. The second research line estimates depth for a complete scene from a monocular camera; here we developed more comprehensive techniques with high precision and acceptable computation time to obtain more precise depth maps.
Analyzing the breast tissue in mammograms using deep learning.
Nasibeh Saffari Tabalvandani
Candidate: Nasibeh Saffari Tabalvandani
PhD Advisors: Dr. Blas Herrera and Dr. Domènec Puig
Date of defense: 2022-03-24
File: Thesis download
Abstract: Mammographic breast density (MBD) reflects the amount of fibroglandular breast tissue that appears white and bright on mammograms, commonly quantified as percent breast density (PD%). MBD is a risk factor for breast cancer and for masking tumors. However, accurate estimation of MBD by visual assessment remains a challenge due to poor contrast and significant variations in background adipose tissue in mammograms. In addition, the correct interpretation of mammography images requires highly trained medical experts: it is difficult, laborious, expensive and error-prone. Moreover, dense breast tissue can make breast cancer more difficult to identify and is associated with a higher risk of breast cancer. For example, women with high breast density have been reported to have a four to six times greater risk of developing the disease than women with low breast density. The key to breast density computation and classification is to correctly detect dense tissues in mammographic images.
Many methods have been proposed to estimate breast density; however, most are not automated. In addition, they are severely affected by low signal-to-noise ratios and by the variability of density in appearance and texture. It would be more helpful to have a computer-aided diagnosis (CAD) system to help the doctor analyze and diagnose it automatically. The current development of deep learning methods motivates us to improve existing breast density analysis systems. The main focus of this thesis is to develop a system that automates breast density analysis (namely Breast Density Segmentation (BDS), Breast Density Percentage (BDP) and Breast Density Classification (BDC)) using deep learning techniques, and to apply it to temporal mammograms acquired after treatment in order to analyze breast density changes and identify patients at risk.
Segmentation and Classification of Multimodal Medical Images based on Generative Adversarial Learning and Convolutional Neural Networks.
Vivek Kumar Singh
Candidate: Vivek Kumar Singh
PhD Advisors: Dr. Domènec Puig and Dr. Santiago Romaní
Date of defense: 2019-11-22
File: Thesis download
Abstract: Medical imaging is an important means for early illness detection in the majority of medical fields, providing a better prognosis for patients. However, properly interpreting medical images requires highly trained medical experts: it is difficult, time-consuming, expensive, and error-prone. It would be more beneficial to have a computer-aided diagnosis (CAD) system that can automatically outline the possibly ill tissues and suggest a diagnosis to the doctor. The current development of deep learning methods motivates us to improve current medical image analysis systems. In this thesis, we have considered three different medical diagnosis tasks: breast cancer in mammograms and ultrasound images, skin lesions in dermoscopic images, and retinal diseases in fundus images. These tasks are very challenging due to the several sources of variability in the image capturing processes.
Firstly, we propose a method to analyze breast cancer in mammograms. In the first stage, we utilize the Single Shot Detector (SSD) method to locate possibly abnormal regions, called regions of interest (ROIs). In the second stage, we apply a conditional generative adversarial network (cGAN) to segment possible masses within the ROIs; this network works efficiently with a reduced number of training images. In the third stage, a convolutional neural network (CNN) is introduced to classify the shape of the masses (round, oval, lobular and irregular). Besides, we also try to classify those masses into four distinct breast cancer molecular subtypes (Luminal-A, Luminal-B, Her-2, and Basal-like), based on their shape and on the micro-texture rendered in the image pixels. Moreover, for ultrasound image processing, we extend the proposed cGAN model by introducing a novel channel attention and weighting (CAW) block, which improves the robustness of segmentation by fostering the most relevant features of the masses. Statistical analyses corroborate the accuracy of the segmented masks. Finally, we also perform a classification between benign and malignant tumors based on the shape of the segmented masks.
Second, skin lesion segmentation in dermoscopic images is still challenging due to the low contrast and fuzzy boundaries of lesions. Besides, lesions are highly similar to healthy regions. To overcome these problems, we introduce a novel layer inside the encoder of the cGAN, called the factorized channel attention (FCA) block. It integrates a channel attention mechanism and a residual 1-D kernel factorized convolution. The channel attention mechanism increases the discriminability between lesion and non-lesion features by taking into account feature channel interdependencies. The 1-D factorized kernels provide extra convolutional layers with a minimal set of parameters, and a residual connection minimizes the impact of image artifacts and irrelevant objects.
Third, segmentation of the retinal optic disc in fundus photographs plays a critical role in the diagnosis, screening and treatment of many ophthalmologic diseases. Therefore, we have applied our cGAN method to the task of optic disc segmentation, obtaining promising results with a very small number of training samples (fewer than twenty). Experiments with these three kinds of medical image diagnosis have been performed for quantitative and qualitative comparisons with other state-of-the-art methods, to show the advantages of the proposed detection, segmentation and classification techniques.
Keywords: Medical image analysis, deep learning, conditional generative adversarial network, segmentation.
Efficient Deep Learning Models and Their Applications to Health Informatics.
Mostafa Kamal Sarker
Candidate: Mostafa Kamal Sarker
PhD Advisors: Dr. Domènec Puig and Dr. Petia Radeva
Date of defense: 2019-11-12
File: Thesis download
Abstract: This thesis designed and implemented efficient deep learning methods to solve classification and segmentation problems in two major health informatics domains, namely pervasive sensing and medical imaging. In the area of pervasive sensing, the thesis focuses on food and related scene classification for health and nutrition analysis. It used deep learning models to answer two important questions, "where do we eat?" and "what do we eat?", for properly monitoring our health and nutrition. This is a new research domain, so the thesis presents entire scenarios from scratch (e.g. dataset creation, model selection, parameter optimization, etc.). To answer the first question, it introduced two new datasets, "FoodPlaces" and "EgoFoodPlaces", and two models, "MACNet" and "MACNet+SA", based on multi-scale atrous convolutional networks with a self-attention mechanism. To answer the second question, it presented a new dataset, "Yummly48K", and a model, "CuisineNet", designed by aggregating convolution layers with various kernel sizes followed by residual and pyramid pooling modules with two fully connected pathways. The proposed models achieved state-of-the-art classification accuracy on their respective datasets. In the field of medical imaging, the thesis targets the skin lesion segmentation problem in dermoscopic images. It introduced two novel deep learning models to accurately segment skin lesions, "SLSDeep" and "MobileGAN", based on a dilated residual network with pyramid pooling and on conditional Generative Adversarial Networks (cGANs). Both models show excellent performance on public benchmark datasets.
Keywords: Deep Learning, Wearable Device, Food Places Classification, Convolutional Neural Network, Recurrent Neural Network, Skin Lesion Segmentation, Dilated Convolutional Neural Network, Generative Adversarial Network.
Empowering Cognitive Stimulation Therapy (CST) with Socially Assistive Robotics (SAR) and Emotion Recognition.
Jainendra Shukla
Candidate: Jainendra Shukla
PhD Advisor: Dr. Domènec Puig
Date of defense: 2018-05-24
File: Thesis download
Abstract: Robot-assisted systems for cognitive rehabilitation can extend the potential benefits of evidence-based psychological or psychosocial interventions to individuals with a wide range of mental health concerns. Existing research on socially assistive robots (SAR) lacks clinical validation, and hence medical practitioners have little motivation to use them in clinical practice. Besides, existing human-robot interactions are inattentive to the user's current emotional state and engagement. Cognitive rehabilitation interventions for individuals with mental health concerns demand complex human-robot interaction, and the ubiquity of wearable devices motivates robot interaction systems that can autonomously acquire information about the user's emotional state, intentions and surrounding context, so that the robot can adapt its interactions accordingly. In this thesis, I describe the design and implementation of robot-assisted cognitive rehabilitation activities and real-time emotion recognition from electrodermal activity (EDA) signals. The design of the robot-assisted interventions presents a coherent framework to produce positive effects on both the users and the caregivers. The implementation of the system confirms increased engagement among users and a significant reduction in caregivers' burden. The development of the emotion recognition algorithms has shown that it is possible to process EDA signals in real time with minimal lag to infer the emotional state of individuals with intellectual disability (ID).
Keywords: Socially Assistive Robotics; Emotion Recognition; Stimulation Therapy.
Understanding Road Scenes using Deep Neural Networks.
Hamed Habibi
Candidate: Hamed Habibi
PhD Advisor: Dr. Domènec Puig
Date of defense: 2017-07-06
File: Thesis download
Abstract: Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians have to be segmented accurately, it might not be necessary to detect and locate them individually. However, detecting and classifying objects such as traffic signs is essential for conforming to road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural networks for this purpose and develop a new method for creating an ensemble of models. Next, we study the sensitivity of neural networks to adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real time and show how to implement the scanning-window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions for segmenting road scenes into semantically meaningful regions, and show that it is more accurate and computationally more efficient than similar networks.
Keywords: Deep Neural Networks; Understanding Road Scenes; Semantic Segmentation.
Active contours for intensity inhomogeneous image segmentation.
Farhan Akram
Candidate: Farhan Akram
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2017-07-06
File: Thesis download
Abstract: Intensity inhomogeneity is a well-known problem in image segmentation, which affects the accuracy of intensity-based segmentation methods. In this thesis, edge-based and region-based active contour methods are proposed to segment intensity inhomogeneous images. Firstly, we have proposed an edge-based active contour method based on the Difference of Gaussians (DoG), which helps to segment the global structure of the image. Secondly, we have proposed a region-based active contour method to both correct and segment intensity inhomogeneous images. A phase stretch transform (PST) kernel is used to compute new intensity means and the bias field, which are employed to define a bias-fitted image. Thirdly, another region-based active contour method has been proposed using an energy functional based on local and global fitted images. The bias field is approximated with a Gaussian distribution, and the bias of intensity inhomogeneous regions is corrected by dividing the original image by the approximated bias field. Finally, a hybrid region-based multiphase (four-phase) active contour method has been proposed to partition a brain MR image into three distinct regions: white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). In this work, a post-processing (pixel correction) method has also been devised to improve the accuracy of the segmented WM, GM and CSF regions. Experimental results with both synthetic and real brain MR images have been used for a quantitative and qualitative comparison with state-of-the-art active contour methods to show the advantages of the proposed segmentation techniques.
Keywords: Image segmentation; Active contours; Intensity inhomogeneous.
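The first, edge-based method above uses a Difference of Gaussians to capture global image structure. As a rough sketch only (a generic edge-stopping function of a common form, not this thesis's exact formulation), a DoG-based edge indicator of the kind used to slow an evolving contour near boundaries can be computed as:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur via 1-D convolutions along rows, then columns."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma) + 1)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)

def dog_edge_indicator(img, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians, mapped through g = 1 / (1 + DoG^2),
    so that g is close to 1 in flat regions and dips near edges."""
    dog = blur(img, sigma1) - blur(img, sigma2)
    return 1.0 / (1.0 + dog**2)

# Synthetic test image: dark left half, bright right half (one vertical edge).
img = np.zeros((32, 32))
img[:, 16:] = 1.0
g = dog_edge_indicator(img)
```

In an edge-based active contour, an indicator like `g` typically multiplies the speed term, so the contour stops where `g` is small, i.e. on the image edges.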
Human-robot interaction and computer-vision-based services for autonomous robots.
Jordi Bautista Ballester
Candidate: Jordi Bautista Ballester
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Jaume Vergés
Date of defense: 2016-07-14
File: Doctoral thesis download
Abstract: Imitation Learning (IL), or robot Programming by Demonstration (PbD), covers methods by which a robot learns new skills through human guidance and imitation. PbD takes its inspiration from the way humans learn new skills by imitation in order to develop methods by which new tasks can be transmitted to robots. This thesis is motivated by the generic question of "what to imitate?", which concerns the problem of how to extract the essential features of a task. To this end, we adopt an Action Recognition (AR) perspective in order to allow the robot to decide what has to be imitated or inferred when interacting with a human. The proposed approach is based on a well-known method from natural language processing, namely Bag of Words (BoW), which is applied to large databases in order to obtain a trained model. Although BoW is a machine learning technique used in various fields of research, in action classification for robot learning it is far from accurate, and it has mostly been applied to the classification of objects and gestures rather than actions. Thus, in this thesis we show that the method is suitable in action classification scenarios for merging information from different sources or different trials. This thesis makes three contributions: (1) it proposes a general method for dealing with action recognition and thus contributes to imitation learning; (2) the methodology can be applied to large databases which include different modes of action capture; and (3) the method is applied in a real international innovation project called Vinbot.
Keywords: Imitation Learning, Sensor Fusion, Robotics, Action Recognition, Human Robot Interaction, Computer Vision, Bag of Words, Multikernel SVM.
Development of advanced computer methods for breast cancer image interpretation through texture and temporal evolution analysis
Mohamed Abdel-Nasser
Candidate: Mohamed Abdel-Nasser
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Antonio Moreno
Date of defense: 2016-07-08
File: Download
Abstract: Breast cancer is one of the most dangerous diseases affecting women. Computer-aided diagnosis systems may help to detect breast cancer early and reduce mortality. This thesis proposes several methods for analyzing breast cancer images. We analyze breast cancer in mammograms, ultrasound images and thermograms. Our analysis includes mass/normal breast tissue classification, benign/malignant tumor classification in mammograms and ultrasound images, nipple detection in thermograms, mammogram registration, and analysis of the evolution of breast tumors.
We considered well-known texture analysis methods and proposed two new texture descriptors. We also studied the effect of pixel resolution, integration scale, preprocessing and feature normalization on the performance of these texture analysis methods for tumor classification. Finally, we used super-resolution approaches to improve the performance of texture analysis methods when classifying breast tumors in ultrasound images.
For the analysis of breast cancer in thermograms, we propose an automatic method for detecting nipples that is accurate and simple. To analyze the evolution of breast cancer, we propose a temporal mammogram registration method based on curvilinear coordinates. We also propose a method for quantifying and visualizing the evolution of breast tumors in patients undergoing medical treatment. Overall, the methods proposed in this thesis improve the performance of the state-of-the-art approaches and may help to improve the diagnosis of breast cancer.
Swarm robotic systems: Y-Pod formation with the analysis on scalability and stability
Purushotham Muniganti
Candidate: Purushotham Muniganti
PhD Advisor: Dr. Albert Oller Pujol
PhD Advisor: Dr. Domènec Puig
Date of defense: 2016-02-08
File: Download
Abstract: This work addresses an active area of the research community: swarm formation. Swarm systems have striking examples in nature: social insect colonies are able to build sophisticated structures and regulate the activities of millions of individuals by endowing each individual with simple rules. Applying rules extracted from natural systems to artificial problems essentially requires different control parameters in order to achieve the desired system performance in terms of scalability, flexibility and robustness.
This thesis contributes to the investigation of swarm formation shapes and controllers, an important topic in swarm robotics, since the coordinated behaviour of a group of robots forms a pattern when viewed globally. In this regard, global shape formation is one of the ongoing problems in artificial swarm intelligence. In nature, it is performed for various purposes, such as responding to natural disasters, or flocks of large birds flying together in formation in order to reduce air resistance. Various shape formations exist in the literature, but this thesis proposes a new strategy, the Y-Pod, which has broader applications than other formation techniques. The Y-Pod is a node connected to three segments, and it appears different in 2D and 3D environments with respect to angles and shapes.
The main objective of the proposed approach is to form a Y-Pod shape using a linear controller that largely defines the resulting behavior. We propose a settling-time and pole-placement approach, with respect to an equilibrium strategy, to control the swarm system. The proposed linear controller guarantees system stability and scalability based on steering analysis and pattern-index matching techniques. In addition, with the help of the pattern-index matching technique, we address the absolute-minima and system-synchronization problems in order to overcome redundancy issues in communication networks. In this process, parameters are chosen based on the desired formation as well as user-defined constraints. Compared to other approaches, this one is simple and computationally efficient, and it scales well to different swarm sizes and to both centralized and decentralized swarm models.
Generation and control of locomotion for biped robots based on biologically inspired approaches
Julián Efrén Cristiano Rodríguez
Candidate: Julián Efrén Cristiano Rodríguez
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2016-01-15
File: Download
Abstract: This thesis proposes the use of biologically inspired control approaches to generate and control the omnidirectional gait of humanoid robots, adapting their movement to various types of flat terrain using multi-sensory feedback. The proposed locomotion control systems were implemented using Central Pattern Generator (CPG) networks based on Matsuoka’s neuron model. CPGs are biological neural networks located in the central nervous system of vertebrates or in the main ganglia of invertebrates, which can control coordinated movements, such as those involved in locomotion, respiration, chewing or swallowing.
The fact that, in nature, human and animal locomotion is controlled by CPG networks has inspired the theory on which the present thesis is based. In particular, two closed-loop control architectures based on CPG-joint-space control methods have been proposed and tested by using both a simulated and a real NAO humanoid robot. The first control architecture identified some important features that a CPG-joint-space control scheme must have if a useful locomotion pattern is to be described. On the basis of this analysis, the second control architecture was proposed to describe well-characterized locomotion patterns. The new system, characterized by optimized parameters obtained with a genetic algorithm (GA), effectively generated and controlled locomotion patterns for biped robots on flat and sloped terrain.
To improve how the system behaves in closed-loop, a phase resetting mechanism for CPG networks based on Matsuoka’s neuron model has been proposed. It makes it possible to design and study feedback controllers that can quickly modify the locomotion pattern generated.
The results obtained show that the proposed control schemes can yield well-characterized locomotion patterns with a fast response suitable for humanoid robots with a reduced processing capability. These experiments also indicate that the proposed system enables the robot to respond quickly and robustly, and to cope with complex situations.
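A two-neuron Matsuoka oscillator of the kind underlying these CPG networks can be simulated with a few lines of Euler integration. The parameter values below are illustrative textbook choices, not the ones optimized by the genetic algorithm in this thesis:

```python
def matsuoka_step(state, dt, tau_u=0.25, tau_v=0.5, beta=2.5, w=2.5, s=1.0):
    """One Euler step of a two-neuron Matsuoka oscillator:
    mutually inhibiting neurons (weight w) with self-adaptation (beta)."""
    u1, v1, u2, v2 = state
    y1, y2 = max(u1, 0.0), max(u2, 0.0)     # rectified firing rates
    du1 = (-u1 - beta * v1 - w * y2 + s) / tau_u
    dv1 = (-v1 + y1) / tau_v
    du2 = (-u2 - beta * v2 - w * y1 + s) / tau_u
    dv2 = (-v2 + y2) / tau_v
    return (u1 + dt * du1, v1 + dt * dv1, u2 + dt * du2, v2 + dt * dv2)

state = (0.1, 0.0, 0.0, 0.0)                # small asymmetry starts the rhythm
dt, out = 0.005, []
for _ in range(4000):                       # simulate 20 s
    state = matsuoka_step(state, dt)
    u1, _, u2, _ = state
    out.append(max(u1, 0.0) - max(u2, 0.0)) # antagonistic joint drive
```

With mutual inhibition strong enough relative to the adaptation terms, the symmetric equilibrium is unstable and the output y1 - y2 settles into a rhythmic antiphase pattern, which is what makes such networks suitable for driving antagonistic joints.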
Robust analysis and protection of dynamic scenes for privacy-aware video surveillance
Hatem Abd Ellatif FatahAllah Ibrahim Mahmoud Rashwan
Candidate: Hatem Abd Ellatif FatahAllah Ibrahim Mahmoud Rashwan
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Antoni Martínez Ballesté
Date of defense: 2014-05-26
File: Download
Abstract: Recent advances in pervasive video surveillance systems pave the way for a comprehensive surveillance of every aspect of our lives. Computerized and interconnected camera systems can be used to profile, track and monitor individuals for the sake of security. Notwithstanding, these systems clearly interfere with the fundamental right of individuals to privacy. To alleviate this privacy problem and avert the so-called Big Brother effect, the use of privacy-enhancing technologies is mandatory.
Privacy-aware video surveillance systems are based on a Detection Submodule that detects the so-called regions of interest (i.e. areas to protect to achieve privacy) from the captured video and on a Protection Submodule that protects the detected areas (aiming at preventing identity disclosure). Only a trusted manager might be able to access the protected video and unprotect it, for instance in case of criminal investigations and, in general, under permission of a law enforcer (judge, police, etc.). Most literature on privacy in video surveillance systems concentrates on the goal of detecting faces and other regions of interest, and in proposing different methods to protect them. However, the trustworthiness of those systems and, by extension the privacy they provide, is neglected.
In this thesis, the topic of privacy-aware video surveillance is tackled from a holistic point of view. Firstly, an introductory chapter defines the properties of a trustworthy privacy-aware video surveillance system, and reviews the techniques that can be used in the Detection Submodule and in the Protection Submodule.
The remainder of the thesis is divided into two parts. In the first one, contributions aimed at improving the detection of regions of interest are developed; specifically, it addresses our contributions to optical flow techniques. It has been found that, despite its usefulness, the widely known variational optical flow model has several limitations and shortcomings when providing accurate flow fields for motion estimation problems in computer vision. In order to overcome these limitations, new models are introduced as alternatives to classic concepts. Two models are proposed in this dissertation to improve the robustness of the variational optical flow model through tensor voting, making it more robust against noise and better at preserving discontinuities. In addition, the data term of the optical flow model, based on the brightness constancy assumption, is replaced by a rich descriptor in order to obtain an illumination-robust optical flow model.
In the second part, the protection of regions of interest is addressed. A method based on coefficient alteration in the compressed domain of the video is presented and tested in terms of robustness and efficiency. The processes related to the information security of the data involved in the protection and unprotection processes are also comprehensively taken into account.
The thesis includes tests and implementations for all the theoretical proposals, aiming at demonstrating their validity in a real video surveillance scenario. Finally, a chapter with a summary of the advances presented and further work concludes the thesis.
Modeling and applications of the focus cue in conventional digital cameras
Said David Pertuz Arroyo
Candidate: Said David Pertuz Arroyo
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2013-07-17
File: Download
Abstract: The focus of digital cameras plays a fundamental role in both the quality of the acquired images and the perception of the imaged scene. This thesis studies the focus cue in conventional cameras with focus control, such as cellphone cameras, photography cameras, webcams and the like. A deep review of the theoretical concepts behind focus in conventional cameras reveals that, despite its usefulness, the widely known thin lens model has several limitations for solving different focus-related problems in computer vision. In order to overcome these limitations, the focus profile model is introduced as an alternative to classic concepts, such as the near and far limits of the depth-of-field. The new concepts introduced in this dissertation are exploited for solving diverse focus-related problems, such as efficient image capture, depth estimation, visual cue integration and image fusion. The results obtained through an exhaustive experimental validation demonstrate the applicability of the proposed models.
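Focus-related problems such as efficient capture and depth-from-focus typically rest on a per-pixel focus measure evaluated across the focus range. As a minimal illustration (one common measure from the literature, not the focus profile model itself), the variance of the Laplacian ranks a sharp patch above a defocused one:

```python
import numpy as np

def variance_of_laplacian(img):
    """A common focus measure: variance of the 5-point Laplacian response.
    Sharper (in-focus) images retain more high frequencies, giving higher values."""
    lap = (-4.0 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return lap.var()

def box_blur(img):
    """3x3 mean filter, standing in for optical defocus."""
    acc = sum(np.roll(np.roll(img, i, 0), j, 1)
              for i in (-1, 0, 1) for j in (-1, 0, 1))
    return acc / 9.0

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))           # textured "in-focus" patch
blurred = box_blur(box_blur(sharp))    # heavily defocused version
```

Sweeping the lens and picking the position that maximizes such a measure is the basic mechanism behind autofocus and shape-from-focus; the focus profile studied in the thesis describes how this measure varies along the focus range.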
Robust perceptual organization techniques for analysis of color images
Rodrigo Moreno Serrano
Candidate: Rodrigo Moreno Serrano
PhD Advisors: Dr. Domènec Puig and Dr. Miguel Ángel García
Date of defense: 2013-07-17
File: Download
Abstract: This thesis focuses on the development of new robust image analysis techniques more closely related to the way the human visual system behaves. One of the pillars of the thesis is the so-called tensor voting technique. This is a robust perceptual organization technique that propagates and aggregates information encoded by means of tensors through a convolution-like process. Its robustness and adaptability have been key reasons for using tensor voting in this thesis. These two properties are verified in the thesis by applying tensor voting to three applications where it had not been applied so far: image structure estimation, edge detection and segmentation of images acquired through stereo vision.
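The core encoding behind tensor voting can be sketched in two dimensions: each token is a 2x2 symmetric positive semi-definite tensor, and the eigendecomposition of an aggregated tensor separates edge-like (stick) from junction-like (ball) evidence. The function names below are illustrative assumptions; the full voting fields of the technique are omitted:

```python
import numpy as np

def stick_tensor(theta):
    """Unit stick tensor e e^T encoding the orientation angle theta."""
    e = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(e, e)

def saliencies(T):
    """Return (stick, ball) saliencies of a 2x2 symmetric tensor:
    for eigenvalues l1 >= l2, stick = l1 - l2 and ball = l2."""
    l1, l2 = sorted(np.linalg.eigvalsh(T), reverse=True)
    return l1 - l2, l2

# Aggregating votes: the sum of aligned stick tensors stays stick-like
# (a strong edge), while the sum of orthogonal sticks becomes isotropic,
# i.e. ball-like, signalling a junction rather than a single edge.
aligned = stick_tensor(0.3) + stick_tensor(0.3)
crossed = stick_tensor(0.0) + stick_tensor(np.pi / 2)
```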
The most important drawback of tensor voting is that its usual implementations are highly time-consuming. Along this line, this thesis proposes two new efficient implementations of tensor voting, both derived from an in-depth analysis of this technique.
Despite its adaptability, this thesis shows that the original formulation of tensor voting (hereafter, classical tensor voting) is not adequate for some applications, since the hypotheses on which it is based do not hold in all of them. This is particularly true for color image denoising. Thus, this thesis shows that, more than a method, tensor voting can be thought of as a methodology in which the encoding and voting process can be tailored to every specific application, while maintaining the tensor voting spirit.
By following this reasoning, this thesis proposes a unified framework for both image denoising and robust edge detection.
This framework is an extension of classical tensor voting in which both color and edginess (the likelihood of finding an edge at every pixel of the image) are encoded through tensors, and where the voting process takes into account a set of plausible perceptual criteria related to the way the human visual system processes visual information. Recent advances in the perception of color have been essential for designing such a voting process.
This new approach has been found effective, since it yields excellent results for both applications. In particular, the new method applied to image denoising outperforms other state-of-the-art methods on real noise. This makes it more adequate for real applications, in which an image denoiser is indeed required. In addition, the method applied to edge detection yields more robust results than state-of-the-art techniques and has a competitive performance in recall, discriminability, precision, and false alarm rejection.
Moreover, this thesis shows how the results of this new framework can be combined with other techniques to tackle the problem of robust color image segmentation. The tensors obtained by applying the new framework are utilized to classify pixels as likely homogeneous or likely inhomogeneous. These pixels are then sequentially segmented through a variation of an efficient graph-based image segmentation algorithm. Experiments show that the proposed segmentation algorithm yields better scores in three of the five applied evaluation metrics when compared to state-of-the-art techniques, with a competitive computational cost.
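The core merging step of efficient graph-based segmentation can be sketched with a union-find structure over pixel-adjacency edges; the fixed threshold rule and the one-dimensional toy signal below are illustrative assumptions, simpler than the adaptive criterion such algorithms actually use:

```python
def segment(edges, n, tau):
    """Greedy graph-based merging: visit edges in order of increasing
    weight and union the endpoints whenever the weight is below tau.
    edges: list of (weight, i, j) tuples; n: number of pixels/nodes."""
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj and w < tau:
            parent[ri] = rj            # merge the two regions
    return [find(i) for i in range(n)]

# 1-D toy "image" with two flat regions separated by a strong edge:
vals = [0.0, 0.05, 0.1, 0.9, 0.95, 1.0]
edges = [(abs(vals[i] - vals[i + 1]), i, i + 1) for i in range(5)]
labels = segment(edges, 6, tau=0.5)
```

With the threshold at 0.5 the weak edges inside each flat region are merged while the strong jump between 0.1 and 0.9 survives, leaving two segments.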
This thesis also proposes new evaluation techniques in the scope of image processing. First, two new metrics are proposed in the field of image denoising: one to measure how well an algorithm preserves edges, and another to measure how well a method avoids introducing undesirable artifacts. Second, a new methodology for assessing edge detectors that avoids possible bias introduced by post-processing is proposed. It consists of five new metrics for assessing recall, discriminability, precision, false alarm rejection and robustness. Finally, two new non-parametric metrics are proposed for estimating the degree of over- and under-segmentation yielded by image segmentation algorithms.
Supervised and unsupervised segmentation of textured images by efficient multi-level pattern classification
Jaime Christian Meléndez Rodríguez
Candidate: Jaime Christian Meléndez Rodríguez
PhD Advisors: Dr. Domènec Puig and Dr. Miguel Ángel García
Date of defense: 2010-10-08
File: Download
Abstract: This thesis proposes new, efficient methodologies for supervised and unsupervised image segmentation based on texture information. For the supervised case, a technique for pixel classification based on a multi-level strategy that iteratively refines the resulting segmentation is proposed. This strategy utilizes pattern recognition methods based on prototypes (determined by clustering algorithms) and support vector machines. In order to obtain the best performance, an algorithm for automatic parameter selection and methods to reduce the computational cost associated with the segmentation process are also included. For the unsupervised case, the previous methodology is adapted by means of an initial pattern discovery stage, which allows transforming the original unsupervised problem into a supervised one. Several sets of experiments considering a wide variety of images are carried out in order to validate the developed techniques.
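The prototype-based classification stage can be sketched as follows: per-class prototypes are obtained by clustering training feature vectors, and each pixel is then assigned the class of its nearest prototype. All function names and the two synthetic "texture" classes are illustrative assumptions; the thesis iterates such a stage over multiple levels and also combines it with support vector machines:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns k cluster centers used as prototypes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):    # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def classify(pixels, protos_per_class):
    """Assign each feature vector the class of its nearest prototype."""
    dists = [np.min(((pixels[:, None] - P) ** 2).sum(-1), axis=1)
             for P in protos_per_class]
    return np.argmin(np.stack(dists), axis=0)

# Two well-separated synthetic "texture" classes in a 2-D feature space.
rng = np.random.default_rng(1)
c0 = rng.normal(0.0, 0.3, (200, 2))
c1 = rng.normal(3.0, 0.3, (200, 2))
protos = [kmeans(c0, 3), kmeans(c1, 3)]
pred = classify(np.vstack([c0, c1]), protos)
```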