#10 Fighting The Semantic Gap On CBIR Systems

     CBIR, which stands for Content-Based Image Retrieval, is a technique in which you query with an image and retrieve the images most similar to it. It typically involves comparing the low-level features of images, including color, texture, and shape. However, what if you want to compare high-level features? High-level features are the semantic meaning behind an image, which humans can discern easily. For example, say you want a tall, happy person or a small, happy dog. A person could easily pick out the images matching these criteria from a set; a system, not so much. This mismatch between low-level features and human interpretation is known as the semantic gap. What exactly can be done to bridge it?
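    To make the low-level comparison concrete, here is a minimal sketch of retrieval by feature similarity. The file names and the three-value feature vectors are made up for illustration (think of a very coarse color histogram), and Euclidean distance is just one common choice of similarity function, not something the paper prescribes.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, k=3):
    """Return the names of the k images whose features are closest to the query."""
    ranked = sorted(database, key=lambda name: euclidean_distance(query, database[name]))
    return ranked[:k]


# Hypothetical feature vectors, e.g. the share of red, green and blue pixels.
database = {
    "sunset.jpg": [0.80, 0.15, 0.05],
    "forest.jpg": [0.10, 0.80, 0.10],
    "beach.jpg": [0.60, 0.20, 0.20],
}
query = [0.75, 0.15, 0.10]

top = rank_by_similarity(query, database, k=2)
print(top)
```

    Note that nothing here knows what a "happy dog" is. The ranking only reflects how close the numbers are, which is exactly where the semantic gap comes from.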

    The academic article “Fighting the Semantic Gap on CBIR Systems through New Relevance Feedback Techniques” proposes relevance feedback techniques to combat this. The paper describes the process as follows: “Therefore, by gathering the users´ indications, algorithms can be developed to change the placement of the query, or to change the similarity function employed in order to better comply to the users’ expectations. The approach that asks to the user to set the relevance of the images to a given query and to reprocess it based on the users´ feedback is called relevance feedback (RF) and is been proven to be quite effective in bridging the semantic gap.”[1] The paper introduces two new relevance feedback techniques: Relevance Feedback Projection and Multiple Point Projection.
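    The idea of changing "the placement of the query" can be sketched with the classic Rocchio-style query point movement. This is a well-known general relevance feedback approach and not one of the two techniques proposed in the paper. The weights `alpha`, `beta` and `gamma` and the sample vectors are assumptions chosen for illustration.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def move_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Shift the query toward relevant examples and away from irrelevant ones."""
    zero = [0.0] * len(query)
    rel_mean = mean_vector(relevant) if relevant else zero
    irr_mean = mean_vector(irrelevant) if irrelevant else zero
    return [
        alpha * q + beta * r - gamma * i
        for q, r, i in zip(query, rel_mean, irr_mean)
    ]


# Hypothetical two-value feature vectors marked by the user after one round.
query = [0.5, 0.5]
relevant = [[0.8, 0.2], [0.6, 0.4]]
irrelevant = [[0.1, 0.9]]

new_query = move_query(query, relevant, irrelevant)
print(new_query)
```

    The updated query is then run again, and the user can repeat the marking step until the results match what they had in mind.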

    The paper describes Relevance Feedback Projection as follows: “The proposed Relevance Feedback Projection technique analyzes each feature (attribute) from the feature vector separately, classifying them into a “relevance rule”, indicating the relevant and irrelevant objects. The values of the attribute being processed (relevant images) are placed in the relevant interval”.[1] In other words, the features of an image are analyzed separately instead of together, which increases accuracy compared to regular relevance feedback. Irrelevant images are pushed out to make more room for relevant images.
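    The per-feature idea can be loosely illustrated as follows. This is only a sketch of looking at each attribute on its own, not the paper's actual algorithm: it assumes the "relevant interval" of a feature is simply the range between the smallest and largest value seen among the images the user marked as relevant, and the function names and sample values are hypothetical.

```python
def relevant_intervals(relevant_vectors):
    """For each feature, the (min, max) range observed among relevant images."""
    return [(min(column), max(column)) for column in zip(*relevant_vectors)]


def features_in_interval(candidate, intervals):
    """Count how many of the candidate's features fall inside the relevant range."""
    return sum(low <= value <= high for value, (low, high) in zip(candidate, intervals))


# Hypothetical feature vectors of images the user marked as relevant.
relevant = [
    [0.2, 0.5, 0.9],
    [0.3, 0.6, 0.7],
]
intervals = relevant_intervals(relevant)

score_a = features_in_interval([0.25, 0.55, 0.80], intervals)  # inside every range
score_b = features_in_interval([0.25, 0.10, 0.95], intervals)  # inside only the first
print(intervals, score_a, score_b)
```

    Because each feature is judged separately, a candidate can be rewarded for the attributes that agree with the user's choices even when others do not.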

    To increase accuracy even further, you have to improve precision, which is the fraction of retrieved images that are relevant. To increase precision, it is beneficial to combine the previous technique with Multiple Point Projection. The paper includes a figure comparing Multiple Point Projection with other techniques and explains that “Our proposed techniques achieve a precision of 98% for 0.1 of recall and 88% when the recall is 20% when applying the RF Projection technique, and 100% and 89% respectively for the Multiple Point Projection.”[1] Each technique achieves high precision on its own, but combining them can be beneficial.
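    Since the results are reported as precision at a given recall, it helps to see how the two numbers are computed. The sketch below uses the standard definitions; the image identifiers are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant images retrieved / all images retrieved
    recall    = relevant images retrieved / all relevant images
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical result list and ground truth for a single query.
retrieved = ["a", "b", "c", "d"]
relevant = ["a", "b", "e", "f", "g"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

    Here two of the four retrieved images are relevant (precision 0.5), and two of the five relevant images were found (recall 0.4).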

     The proposed techniques seem promising for bridging the semantic gap. A good way to capture high-level features is to bring a human into the loop, but how you go about it is important. Relevance Feedback Projection seems like a good technique for modern systems. However, the best way to use these techniques seems to be combining Relevance Feedback Projection with Multiple Point Projection.

References

[1] A. J. M. Traina, J. Marques and C. Traina, "Fighting the Semantic Gap on CBIR Systems through New Relevance Feedback Techniques," 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06), 2006, pp. 881-886, doi: 10.1109/CBMS.2006.88.
