ArchivIA - Institutional Archive of the University of Catania: Deeply Incorporating Human Capabilities into Machine Learning Models for Fine-Grained Visual Categorization

Use this identifier to cite or link to this document: http://hdl.handle.net/10761/4144

Date:	7 January 2019
Author:	Murabito, Francesca
Title:	Deeply Incorporating Human Capabilities into Machine Learning Models for Fine-Grained Visual Categorization
Abstract:	Artificial intelligence and machine learning have long attempted to emulate the human visual system. With recent advances in deep neural networks, which take inspiration from the architecture of the primate visual hierarchy, human-level visual abilities are now coming within reach of artificial systems. However, existing computational models are designed with engineering goals in mind and only loosely emulate the computations and connections of biological neurons, especially in terms of intermediate visual representations. In this thesis we investigate how human skills can be integrated into computational models to perform fine-grained image categorization, a task that requires specific perceptual and cognitive abilities to be solved. In particular, our goal is to develop systems that, either implicitly or explicitly, combine human reasoning processes with deep classification models. Our claim is that emulating the process humans carry out while performing a recognition task can yield improved classification performance. To this end, we first attempt to replicate human visual attention by modeling a saliency detection system that emulates the integration of top-down (task-controlled, classification-driven) and bottom-up (sensory information) processes. The generated saliency maps thus implicitly represent how humans perceive and focus their attention while performing recognition, and therefore provide useful supervision for the automatic classification system. We then investigate whether, and to what extent, the learned saliency maps can support visual classification in nontrivial cases. To achieve this, we propose SalClassNet, a CNN framework consisting of two jointly trained networks: a) the first computes top-down saliency maps from input images, and b) the second exploits the computed saliency maps for visual classification.
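As an illustration only, the wiring of two jointly trained networks like those in SalClassNet can be sketched in plain Python under stated assumptions. Here the predicted saliency map is appended to the RGB image as a fourth input channel for the classifier, and joint training minimizes the classification cross-entropy plus a saliency error weighted by a hypothetical factor `lam`. The channel-stacking choice, the mean-squared saliency loss and the helper names (`stack_saliency`, `saliency_mse`, `joint_loss`) are assumptions, not details taken from the thesis, and the CNNs themselves are omitted.

```python
import math
from typing import List

Image = List[List[List[float]]]  # H x W x C, values in [0, 1]
SalMap = List[List[float]]       # H x W, values in [0, 1]


def stack_saliency(image: Image, saliency: SalMap) -> Image:
    """Append each pixel's saliency value as an extra channel (C -> C + 1).

    Assumption: the classifier consumes image and saliency as one stacked input.
    """
    if len(image) != len(saliency) or any(
        len(row) != len(srow) for row, srow in zip(image, saliency)
    ):
        raise ValueError("image and saliency map must share the same height and width")
    return [[pixel + [s] for pixel, s in zip(row, srow)] for row, srow in zip(image, saliency)]


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def saliency_mse(pred: SalMap, gt: SalMap) -> float:
    """Mean squared error between predicted and reference saliency maps (assumed loss)."""
    diffs = [(p - g) ** 2 for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    return sum(diffs) / len(diffs)


def joint_loss(logits: List[float], target: int, pred_sal: SalMap,
               gt_sal: SalMap, lam: float = 1.0) -> float:
    """Hypothetical joint objective: classification loss + lam * saliency loss."""
    return cross_entropy(logits, target) + lam * saliency_mse(pred_sal, gt_sal)
```

In this sketch, `lam` trades off how strongly the saliency branch is pulled toward reference maps against being shaped only by the classification signal. With `lam=0`, the saliency branch would receive supervision only through the classifier's loss.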
Task-dependent gaze shifts are not the only process humans rely on when performing classification in specific domains: they also leverage a priori specialized knowledge. For example, distinguishing between different dog breeds or fruit varieties requires skills that only domain experts possess. Of course, one may argue that the typical learning-by-example approach can be applied by asking domain experts to collect enough annotations from which machine learning methods can derive the features necessary for classification. However, this process is very costly and often infeasible. Thus, the second part of this thesis aims at explicitly modeling and exploiting domain-specific knowledge to perform recognition. To this end, we show that computational ontologies can explicitly encode human knowledge and that this knowledge can support multiple tasks, from data annotation to classification. In particular, we propose an ontology-based annotation tool that significantly reduces the effort needed to collect highly specialized labels. We demonstrate its effectiveness by building the VegImage dataset, a collection of about 4,000 images belonging to 24 fruit varieties, annotated with over 65,000 bounding boxes and enriched with a large knowledge base of more than 1,000,000 OWL triples. We then exploit this ontology-structured knowledge by combining a semantic classifier, which performs inference based on the information encoded in the domain ontology, with a visual convolutional neural network. We show that integrating semantics into automatic classification models can be the key to solving complex tasks such as the fine-grained recognition of fruit varieties, which cannot be fully solved without the contribution of domain experts.
Performance evaluation of the proposed approaches provides a basis to assess the validity of our claim, along with the scientific soundness of the developed models.
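The following is a minimal sketch of how a semantic classifier and a visual CNN could be combined. It assumes a toy knowledge base of (variety, property, value) triples in the spirit of OWL triples, placeholder variety names, and attribute values that are simply given. The two score distributions are merged by late fusion using a weighted geometric mean. None of the names, properties or the fusion rule come from the thesis. The sketch only shows how ontology-derived evidence can overturn a visually ambiguous prediction.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge base: (variety, property, value) triples.
KB: List[Triple] = [
    ("variety_a", "hasSkinColor", "red"),
    ("variety_a", "hasShape", "round"),
    ("variety_b", "hasSkinColor", "red"),
    ("variety_b", "hasShape", "elongated"),
    ("variety_c", "hasSkinColor", "yellow"),
    ("variety_c", "hasShape", "round"),
]


def semantic_scores(kb: List[Triple], observed: Dict[str, str]) -> Dict[str, float]:
    """Score each variety by the fraction of observed attributes its triples assert."""
    facts = set(kb)
    varieties = sorted({s for s, _, _ in kb})
    scores = {}
    for v in varieties:
        hits = sum((v, prop, value) in facts for prop, value in observed.items())
        scores[v] = hits / len(observed) if observed else 1.0
    return scores


def fuse(visual: Dict[str, float], semantic: Dict[str, float],
         alpha: float = 0.5, eps: float = 1e-6) -> Dict[str, float]:
    """Weighted geometric mean of visual and semantic scores, renormalized to sum to 1.

    alpha weights the semantic evidence; eps keeps a zero score from vetoing a class outright.
    """
    combined = {
        c: (visual[c] + eps) ** (1 - alpha) * (semantic.get(c, 0.0) + eps) ** alpha
        for c in visual
    }
    z = sum(combined.values())
    return {c: s / z for c, s in combined.items()}
```

Take the observed attributes {"hasSkinColor": "red", "hasShape": "round"} with visual probabilities {variety_a: 0.35, variety_b: 0.55, variety_c: 0.10}. The CNN alone prefers variety_b. The fused scores favor variety_a, the only variety consistent with both attributes.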
In	Area 09 - Industrial and Information Engineering

Full text:

File	Description	Size	Format	Access
MRBFNC87S47C351F-thesis.pdf	thesis	14.36 MB	Adobe PDF	View/Open
