SAPIR kickoff meeting, Padova, January 2007 Partner MU-Brno WP5 – complex queries Pavel Zezula Faculty of Informatics Masaryk University, Brno SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 Complex Queries •Several different attributes (features) of every object –similarity measure for every attribute is different •Complex queries –multiple simple similarity queries on attributes –ranks combined by an aggregation function • •Find the best matches of circular shape objects with red color –the best match for circular shape or red color needs not be the best match combined!!! SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 Complex Queries – Definition •Assume that object has m attributes –every attribute is comparable by a distance function di –the value of an aggregation function t – – – represents the “score” of object o with respect to query object q –function t must be monotonous –normalized similarity grades can be used instead of distances • • •represents how similar is the object o to query q in respective attribute SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 The A0 Algorithm •Retrieve k top objects with respect to q =(q1, q2, … , qm) •For each attribute i –objects delivered in decreasing similarity to qi –incrementally build sets Xi with best matches till – – •For all –compute function t(o) – needs more dist. comp. –sort results according to values of t(o) –return k first objects SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 Threshold Algorithm TA •Retrieve k top objects with respect to q =(q1, q2, … , qm) •Incrementally retrieve objects in every attribute i –objects in decreasing similarity stored in lists Xi •Let mi be the maximal grade (distance) seen in list Xi •The threshold value is defined as t(m1, m2,…, mm) •For every object o retrieved in any list Xi –compute the score t(o) –if the score belongs to the best k scores seen so far •remember o and t(o) - only first k objects are stored •Stop if at least k objects with scores up to the threshold are found SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 Example X1 list (color) X2 list (shape) q a b c d d a b c b c a d t(o) = avg(d1,d2): avg(3,2) = 2.5 avg(1,3) = 2 avg(2,4) = 3 avg(4,1) = 2.5 t(m) = 1 t(m) = 2 t(m) = 3 t(m) = 4 SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 TA Properties •Sorted access to objects for every attribute –index structure for every attribute •e.g. an index for color histograms and another one for shapes –incremental nearest neighbor search is needed •Random access to objects –ability to compute score of object o in other attributes •e.g. for a particular object o = (color, shape), find its color similarity d(q1,o1) and shape similarity d(q2,o2) •Special variants of threshold algorithm –restricting random accesses –restricting sorted accesses SAPIR kickoff meeting, Padova, January 2007 SAPIR kickoff meeting, Padova, January 2007 Challenge for SAPIR •Multilayer architecture of MESSI •Expensive sorted access – incremental nearest neighbor is needed •Efficient random access •Minimization of the network communication costs is needed •Inter- and intra-query parallelism tradeoff SAPIR kickoff meeting, Padova, January 2007 Partner MU-Brno WP7 – social networks Pavel Zezula Faculty of Informatics Masaryk University, Brno