about sql query in refer module

Dear Paul,
I have run into a strange problem while using SQL queries in the refer module. For example, in a query of the form
select bib.* from bib where (bib.Folder_ID=85) and A and (B or C)
A, B, and C are conditions. I used to get the right answer for
select bib.* from bib where (bib.Folder_ID=85) and A and not (B or C)
but I cannot get it any more. If I change the SQL to
select bib.* from bib where (bib.Folder_ID=85) and A and not B and not C
it comes up with the right answer.
Would you give me some suggestions?
Thanks, Paul.

You said you could get the right answer before. Did you upgrade Biblioscape, and could that have caused the query to stop working? Without your database and the queries, it is hard for me to diagnose the problem. Thanks, Paul

Dear Paul,
Thanks for your reply.
My Biblioscape version is 7, and I had downloaded the patch from the official site before the problem appeared. The hardware/software is a Lenovo W500 running Windows 7 (SP1). It is a total surprise to me, since it worked well in the past.
I have tried reinstalling a fresh copy of Biblioscape and creating a new database, but the problem persists. The data is bibs from EI, and the SQL is like:
select bib.* from bib where (bib.Folder_ID=101) and (lower(title) like '%stream%' or lower(abstract) like '%stream%') => right answer (7 bibs)
select bib.* from bib where (bib.Folder_ID=101) and not (lower(title) like '%stream%' or lower(abstract) like '%stream%') => wrong answer (40 bibs)
select bib.* from bib where (bib.Folder_ID=101) and not (lower(title) like '%stream%') and not (lower(abstract) like '%stream%') => right answer (33 bibs)
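For what it's worth, the second and third WHERE clauses should be logically equivalent by De Morgan's laws, and the equivalence holds even under SQL's three-valued NULL logic. Here is a minimal sketch against SQLite (a stand-in engine, not Biblioscape's actual backend; the table name, columns, and rows are invented for illustration) showing that a standard engine returns the same count for both forms:

```python
import sqlite3

# Stand-in "bib" table mirroring the columns used in the queries above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bib (Folder_ID INTEGER, title TEXT, abstract TEXT)")
con.executemany(
    "INSERT INTO bib VALUES (?, ?, ?)",
    [
        (101, "XML stream joins", "about streams"),    # matches 'stream'
        (101, "Twig queries", "holistic processing"),  # no match
        (101, "Value joins", None),                    # NULL abstract
    ],
)

def count(where):
    """Count rows matching the given WHERE fragment within folder 101."""
    return con.execute(
        f"SELECT COUNT(*) FROM bib WHERE (bib.Folder_ID=101) AND {where}"
    ).fetchone()[0]

q1 = count("(lower(title) LIKE '%stream%' OR lower(abstract) LIKE '%stream%')")
q2 = count("NOT (lower(title) LIKE '%stream%' OR lower(abstract) LIKE '%stream%')")
q3 = count("NOT (lower(title) LIKE '%stream%') AND NOT (lower(abstract) LIKE '%stream%')")

# De Morgan holds even for the NULL-abstract row (NULL is excluded either way),
# so q2 and q3 always agree in a standard engine.
print(q1, q2, q3)  # → 1 1 1
```

If q2 and q3 ever disagreed here, the SQL semantics would be at fault; since they cannot, the differing counts in Biblioscape point to how the refer module rewrites or parses the NOT before the query reaches the database. Note also that in your data 7 + 33 = 40, so NULL titles/abstracts are not the culprit; the NOT in the second query simply seems to be dropped.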
The following is the data (40 bibs total):
//bib begin
@inproceedings{20111913965149 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Value joins are expensive over (probabilistic) XML},
journal = {ACM International Conference Proceeding Series},
author = {Kharlamov, Evgeny and Nutt, Werner and Senellart, Pierre},
year = {2011},
pages = {49 - 56},
address = {Uppsala, Sweden},
abstract = {We address the cost of adding value joins to tree-pattern queries and monadic second-order queries over trees in terms of the tractability of query evaluation over two data models: XML and probabilistic XML. Our results show that the data complexity rises from linear, for join-free queries, to intractable, for queries with value joins, while combined complexity remains essentially the same. For treepattern queries with joins (TPJ) the complexity jump is only on probabilistic XML, while for monadic second-order logic over trees with joins (TMSOJ) it already appears for deterministic XML documents. Moreover, for TPJ queries that have a single join, we show a dichotomy: every query is either essentially join-free, and in this case it is tractable over probabilistic XML, or it is intractable. In this light we study the problem of deciding whether a query with joins is essentially join-free. For TMSOJ we prove that this problem is undecidable and for TPJ it is Π2 P-complete. Finally, for TPJ we provide a conceptually simple criterion to check whether a given query is essentially join free.},
key = {XML},
keywords = {Formal logic;Trees (mathematics);},
note = {Combined complexity;Data complexity;Data models;Monadic second-order logic;Pattern query;Probabilistic XML;Query evaluation;Second orders;},
URL = {http://dx.doi.org/10.1145/1966357.1966366},
}

@article{20111313874548 ,
language = {Chinese},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Efficient processing of complex XML twig pattern queries based on path-joins},
journal = {Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science)},
author = {Jiang, Jin-Hua and Wu, Yu and Hu, Tian-Lei and Chen, Gang},
volume = {45},
number = {1},
year = {2011},
pages = {1 - 8},
issn = {1008973X},
address = {20 Yugu Road, Hangzhou, 310027, China},
abstract = {A novel path-joins based method was proposed to support efficient processing of complex twig pattern queries with OR-predicates of extensible markup language (XML) queries. The method processed the complex twig pattern matching in a holistic way based on the concept AND/OR branch extension (AOBE) and path-joins by dividing the twig pattern into individual paths. Then an index-based algorithm was proposed to efficiently skip useless elements and avoid unnecessary computations. The path-joins based method simplified the complex twig pattern queries processing compared with the existing algorithms. The method only accessed the labels of leaf query nodes, thus the I/O and CPU costs were greatly reduced. Experimental results demonstrate that the method is more efficient than previous approaches.},
key = {XML},
keywords = {Algorithms;Hypertext systems;Markup languages;Pattern matching;},
note = {Extensible markup language;Index;OR-predicates;Path-joins;Twig pattern;},
URL = {http://dx.doi.org/10.3785/j.issn.1008-973X.2011.01.001},
}

@inproceedings{20105013482221 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {pq-Hash: An Efficient method for approximate XML joins},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Li, Fei and Wang, Hongzhi and Hao, Liang and Li, Jianzhong and Gao, Hong},
volume = {6185 LNCS},
year = {2010},
pages = {125 - 134},
issn = {03029743},
address = {Jiuzhaigou, China},
abstract = {Approximate matching between large tree sets is broadly used in many applications such as data integration and XML de-duplication. However, most existing methods suffer for low efficiency, thus do not scale to large tree sets. pq-gram is a widely-used method with high quality of matches. In this paper, we propose pq-hash as an improvement to pq-gram. As the base of pq-hash, a randomized data structure, pq-array, is developed. With pq-array, large trees are represented as small fixed sized arrays. Sort-merge and hash join technique is applied based on these pq-arrays to avoid nested-loop join. From theoretical analysis and experimental results, retaining high join quality, pq-hash gains much higher efficiency than pq-gram. © 2010 Springer-Verlag.},
key = {Trees (mathematics)},
keywords = {Data structures;Information management;XML;},
note = {Approximate matching;Data integration;Efficient method;Existing method;Hash join;High quality;Higher efficiency;},
URL = {http://dx.doi.org/10.1007/978-3-642-16720-1_13},
}

@article{20111813959001 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Similarity join of XML documents stored in file systems},
journal = {IEEE Latin America Transactions},
author = {Bocassanta, Fabio and Dorneles, Carina F.},
volume = {8},
number = {6},
year = {2010},
pages = {722 - 727},
issn = {15480992},
address = {445 Hoes Lane - P.O.Box 1331, Piscataway, NJ 08855-1331, United States},
abstract = {Joining XML documents, in a data integration environment, is not a trivial task because besides data are stored in several representations (abbreviated, incomplete, or misspelled), XML data are usually organized as collection of values, which requires a different implementation of the join operation. In this paper, we present two similarity join operators, which are used over XML documents stored in file system. The operators have been implemented in a tool, called SimJoiX, which assists in the task of joining data stored in XML files. © 2005 IEEE.},
key = {XML},
keywords = {Joining;},
note = {Data integration;File systems;Join operation;Similarity functions;similarity join;XML data;XML files;},
URL = {http://dx.doi.org/10.1109/TLA.2010.5688101},
}

@inproceedings{20104613381661 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Approximate joins for XML using g-string},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Li, Fei and Wang, Hongzhi and Zhang, Cheng and Hao, Liang and Li, Jianzhong and Gao, Hong},
volume = {6309 LNCS},
year = {2010},
pages = {3 - 17},
issn = {03029743},
address = {Singapore, Singapore},
abstract = {When integrating XML documents from autonomous databases, exact joins often fail for the data items representing the same real world object may not be exactly the same. Thus the join must be approximate. Tree-edit-distance-based join methods have high join quality but low efficiency. Comparatively, other methods with higher efficiency cannot perform the join as effectively as tree edit distance does. To keep the balance between efficiency and effectiveness, in this paper, we propose a novel method to approximately join XML documents. In our method, trees are transformed to g-strings with each entry a tiny subtree. Then the distance between two trees is evaluated as the g-string distance between their corresponding g-strings. To make the g-string based join method scale to large XML databases, we propose the g-bag distance as the lower bound of the g-string distance. With g-bag distance, only a very small part of g-string distance need to be computed directly. Thus the whole join process can be done very efficiently. We theoretically analyze the properties of the g-string distance. Experiments with synthetic and various real world data confirm the effectiveness and efficiency of our method and suggest that our technique is both scalable and useful. © 2010 Springer-Verlag.},
key = {XML},
keywords = {Database systems;},
note = {Approximate joins;Data items;Edit distance;Higher efficiency;Join method;Lower bounds;Novel methods;Real world data;Real-world objects;Subtrees;Tree edit distance;XML database;},
URL = {http://dx.doi.org/10.1007/978-3-642-15684-7_2},
}

@inproceedings{20104513357490 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {A load shedding framework for XML stream joins},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Dash, Ranjan and Fegaras, Leonidas},
volume = {6261 LNCS},
number = {PART 1},
year = {2010},
pages = {269 - 280},
issn = {03029743},
address = {Bilbao, Spain},
abstract = {Joining data streams using various types of windows is an established method of stream processing. The limitation of window size due to memory constraint takes a heavy toll on the accuracy of the query result. Through this paper, we propose a unique windowing technique based on innovative cost functions for join query processing under memory constraints. The logical window construction is controlled through unique data structure and maintained using load shedding technique with least overhead. We applied our technique on XML streams domain and proved the effectiveness of our strategy through measuring the accuracy of the result from joining two XML streams using standard XQuery. With assumption of acceptability of an approximate solution with acceptable error bound in the face of unbounded, complex XML stream, we have tried to come up with a low overhead architecture for load shedding and tested its usefulness through a set of cost functions. © 2010 Springer-Verlag.},
key = {Data processing},
keywords = {Cost functions;Data communication systems;Data mining;Data structures;Expert systems;Hydraulics;Joining;Problem solving;Quality control;Quality of service;Query processing;XML;},
note = {Approximate query processing;Data stream;Load Shedding;Stream Joining;Synopsis;XML Streams;},
URL = {http://dx.doi.org/10.1007/978-3-642-15364-8_21},
}

@inproceedings{20103213135579 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Cross engine database joining},
journal = {8th ACIS International Conference on Software Engineering Research, Management and Applications, SERA 2010},
author = {Leonard, Wesley and Albee, Paul},
year = {2010},
pages = {19 - 26},
address = {Montreal, QC, Canada},
abstract = {A standards-based, open-source middleware system was designed and implemented to facilitate the analysis of large and disparate datasets. This system makes it possible to access several different types of database servers simultaneously, browse remote data, combine datasets, and join tables from remote databases independent of vendor. The system uses an algorithm known as Dynamic Merge Cache to handle data caching, query generation, transformations, and joining with minimal operational interference to source databases. The system is able to combine any subset of configured databases and convert the information into XML. The resulting XML is made available to analysis tools through a web service. After the system connects to a remote database, a metadata catalog is created from the source database. The user is able to configure which tables and fields to export from the remote dataset. The user is also able to filter, transform, and combine data. The system was tested with a large fish contaminant database and a second database populated with simulated scientific data. © 2010 IEEE.},
key = {Metadata},
keywords = {Engineering research;Joining;Markup languages;Middleware;Software engineering;Wavelet transforms;Web services;XML;},
note = {Analysis tools;Data caching;Data sets;Data-base servers;Database connectivity;Distributed database;Middleware system;Middleware/business logic;Open-source;Query generation;Remote data;Remote database;Scientific data;System use;},
URL = {http://dx.doi.org/10.1109/SERA.2010.13},
}

@inproceedings{20102012934717 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Let SQL drive the XQuery workhorse (XQuery join graph isolation)},
journal = {Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings},
author = {Grust, Torsten and Mayr, Manuel and Rittinger, Jan},
year = {2010},
pages = {147 - 158},
address = {Lausanne, Switzerland},
abstract = {A purely relational account of the true XQuery semantics can turn any relational database system into an XQuery processor. Compiling nested expressions of the fully compositional XQuery language, however, yields odd algebraic plan shapes featuring scattered distributions of join operators that currently overwhelm commercial SQL query optimizers. This work rewrites such plans before submission to the relational database back-end. Once cast into the shape of join graphs, we have found off-the-shelf relational query optimizers - the B-tree indexing subsystem and join tree planner, in particular - to cope and even be autonomously capable of "reinventing" advanced processing strategies that have originally been devised specifically for the XQuery domain, e.g., XPath step reordering, axis reversal, and path stitching. Performance assessments provide evidence that relational query engines are among the most versatile and efficient XQuery processors readily available today. Copyright 2010 ACM.},
key = {Trees (mathematics)},
keywords = {Indexing (materials working);Relational database systems;Technology;},
note = {Axis reversal;Join operators;Nested expressions;Optimizers;Performance assessment;Relational Database;Relational queries;SQL query;XPath step;XQuery language;},
URL = {http://dx.doi.org/10.1145/1739041.1739062},
}

@article{20101412831870 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Processing strategy for global XQuery queries based on XQuery join cost},
journal = {Journal of Information Science and Engineering},
author = {Park, Jong-Hyun and Kang, J.I.-Hoon},
volume = {26},
number = {2},
year = {2010},
pages = {659 - 672},
issn = {10162364},
address = {128 Academia Road, Section 2, Nankang, Taipei, 115, Taiwan},
abstract = {XML is a standard for exchanging and formatting data over the Internet and XQuery is a standard query language for searching and integrating XML data. Therefore, it is a natural choice for interoperability to use XQuery over the Internet. Global XQuery queries search and integrate heterogeneous data, being distributed in the local systems. In order to process efficiently global XQuery queries, their processing strategy is important because an improper processing strategy could produce an enormous number of intermediate results or execute redundant expressions. In distributed relational databases, there are some techniques for processing global SQL queries. Unfortunately, however, the structure of the data handled by the XQuery language is quite different from the one by the SQL. The XQuery language deals with semi-structural data, i.e. treestructured data, while SQL deals with well-structured data, i.e., the table-shaped data. These structural differences make it difficult to apply the techniques for global SQL queries into for global XQuery queries. Especially this paper considers the join cost for devising a query processing strategy. Therefore, we define some problems for estimating the join cost in XQuery queries and propose ECNJ algorithm for solving these problems. Also this paper proposes the query processing strategy and evaluates the strategy by implementing a prototype system.},
key = {Costs},
keywords = {Estimation;Internet;Linguistics;Markup languages;Query languages;Query processing;XML;},
note = {Global xquery processing;Heterogeneous data;Intermediate results;Local system;Prototype system;Relational Database;SQL query;Standard query languages;Structural data;Structural differences;Structured data;Tree-structured data;XML data;Xquery;XQuery language;XQuery processing;},
}

@inproceedings{20094612444400 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {A cluster-based approach to XML similarity joins},
journal = {ACM International Conference Proceeding Series},
author = {Ribeiro, Leonardo A. and Harder, Theo and Pimenta, Fernanda S.},
year = {2009},
pages = {182 - 193},
address = {Cetraro - Calabria, Italy},
abstract = {A natural consequence of the widespread adoption of XML as standard for information representation and exchange is the redundant storage of large amounts of persistent XML documents. Compared to relational data tables, data represented in XML format can potentially be even more sensitive to data quality issues because structure, besides textual information, may cause variations in XML documents representing the same information entity. Therefore, correlating XML documents, which are similar in content an structure, is a fundamental operation. In this paper, we present an effective, flexible, and high-performance XML-based similarity join framework. We exploit structural summaries and clustering concepts to produce compact and high-quality XML document representations: our approach outperforms previous work both in terms of performance and accuracy. In this context, we explore different ways to weigh and combine evidence from textual and structural XML representations. Furthermore, we address user interaction, when the similarity framework is configured for a specific domain, and updatability of clustering information, when new documents enter datasets under consideration. We present a thorough experimental evaluation to validate our techniques in the context of a native XML DBMS. Copyright ©2009 ACM.},
key = {Markup languages},
keywords = {Database systems;Security of data;XML;},
note = {Clustering;Entity resolution;Similarity joins;Similarity measures;xml databases;},
URL = {http://dx.doi.org/10.1145/1620432.1620451},
}

@inproceedings{20092912197673 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {XML data integration using fragment join},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Gong, Jian and Cheung, David W. and Mamoulis, Nikos and Kao, Ben},
volume = {5463},
year = {2009},
pages = {334 - 338},
issn = {03029743},
address = {Brisbane, QLD, Australia},
abstract = {We study the problem of answering XML queries over multiple data sources under a schema-independent scenario where XML schemas and schema mappings are unavailable. We develop the fragment join operator-a general operator that merges two XML fragments based on their overlapping components.We formally define the operator and propose an efficient algorithm for implementing it. We define schema-independent query processing over multiple data sources and propose a novel framework to solve this problem. We provide theoretical analysis and experimental results that show that our approaches are both effective and efficient.},
key = {Database systems},
keywords = {Algorithms;Data handling;Markup languages;XML;},
note = {Efficient algorithm;Join operators;Multiple data sources;Schema mappings;XML data;XML queries;XML schemas;},
URL = {http://dx.doi.org/10.1007/978-3-642-00887-0_30},
}

@article{20090911932632 ,
language = {Chinese},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Estimate XML containment join size using weighted Haar wavelet},
journal = {Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science)},
author = {Shao, Feng and Chen, Gang and Chen, Ke and Bei, Yi-Jun and Dong, Jin-Xiang},
volume = {43},
number = {1},
year = {2009},
pages = {28 - 35},
issn = {1008973X},
address = {20 Yugu Road, Hangzhou, 310027, China},
abstract = {A novel weighted Haar wavelet method was proposed to estimate the size of extensible markup language (XML) containment join that is the basic operation in XML structural query processing. The method efficiently compressed the statistic of XML containment join size by the Haar wavelet. The statistic was maintained in the wavelet synopsis. XML containment join size was computed by the wavelet coefficient reconstruction during XML estimation. A novel weight model was presented based on the query frequency of XML tag name to reduce estimation error. The weight model was integrated into the Haar wavelet method. The experimental results show that the method outperforms previous join estimation methods, e.g., histogram-based means, sample-based means. The method has smaller mean relative error than previous methods under the same space budget.},
key = {Markup languages},
keywords = {Estimation;Frequency estimation;Hypertext systems;Linguistics;Query languages;Query processing;Wavelet transforms;XML;},
note = {Basic operations;Containment join;Estimation errors;Estimation methods;Extensible markup language (XML);Haar wavelet;Mean relative errors;Selectivity estimation;Structural queries;Wavelet coefficients;},
URL = {http://dx.doi.org/10.3785/j.issn.1008-973X.2009.01.006},
}

@inproceedings{20092812178949 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {XQuery join graph isolation},
journal = {Proceedings - International Conference on Data Engineering},
author = {Grust, Torsten and Mayr, Manuel and Rittinger, Jan},
year = {2009},
pages = {1167 - 1170},
issn = {10844627},
address = {Shanghai, China},
abstract = {A purely relational account of the true XQuery semantics can turn any relational database system into an XQuery processor. Compiling nested expressions of the fully compositional XQuery language, however, yields odd algebraic plan shapes featuring scattered distributions of join operators that currently overwhelm commercial SQL query optimizers. This work rewrites such plans before submission to the relational database back-end. Once cast into the shape of join graphs, we have found off-the-shelf relational query optimizers-the Btree indexing subsystem and join tree planner, in particular-to cope and even be autonomously capable of "reinventing" advanced processing strategies that have originally been devised specifically for the XQuery domain, e.g., XPath step reordering, axis reversal, and path stitching. Performance assessments provide evidence that relational query engines are among the most versatile and efficient XQuery processors readily available today. © 2009 IEEE.},
key = {Graph theory},
keywords = {Indexing (materials working);Relational database systems;},
note = {Axis reversal;Join operators;Nested expressions;Optimizers;Performance assessment;Relational Database;Relational queries;SQL query;XPath step;XQuery language;},
URL = {http://dx.doi.org/10.1109/ICDE.2009.192},
}

@article{20091412017520 ,
language = {Chinese},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Integration and implementation of in-process data management during aircraft final join assembly},
journal = {Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science)},
author = {Yu, Feng-Jie and Wang, Qing and Li, Jiang-Xiong and Dong, Hui-Yue and Ke, Ying-Lin and Yang, Wei-Dong and Qin, Long-Gang},
volume = {43},
number = {2},
year = {2009},
pages = {207 - 212},
issn = {1008973X},
address = {20 Yugu Road, Hangzhou, 310027, China},
abstract = {To implement numeralization, automation and flexibility during the aircraft join assembly, a data management system for a certain type airplane was proposed, which was required in the processes of large-scale postural adjustment, fuselage join and finish machining. Based on a workflow-driven task manager, the system utilized an extendable architecture of multi-ties C/S. The block reading/writing, the parameterized strategy, and the dual-mechanism by online Oracle database and offline XML files were introduced for efficient data storage and retrieval. Simultaneously, several key technologies were particularly illustrated, including status data handling, equipment maintenance, machining BOM disposal, visualized 3D-simulation, data mining and statistical analysis, etc. Application of this system beneficially provides history information and technical support to optimize assembly workflow, improve assembly techniques, redistribute error tolerance, ensure accurate assembly and speed production.},
key = {Data handling},
keywords = {Aircraft;Fuselages;Information management;Machining;Markup languages;},
note = {3-D simulations;Assembly techniques;Data management systems;Data storages;Dual mechanisms;Equipment maintenances;Error tolerances;Finish machining;Flexibility;In-process data;Key technologies;Offline;Oracle database;Parameterized;Statistical analysis;Task managers;Technical supports;XML files;},
URL = {http://dx.doi.org/10.3785/j.issn.1008-973X.2009.02.003},
}

@article{20081211161509 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Holistic algorithm for efficiently evaluating xtwig joins},
journal = {Journal of Computational Information Systems},
author = {Ning, Bo and Wang, Guoren and Zhao, Yanyan},
volume = {4},
number = {1},
year = {2008},
pages = {401 - 406},
issn = {15539105},
address = {P.O. Box 162, Bethel, CT 06801-0162, United States},
abstract = {In order to provide a more powerful XML query ability, an xtwig query, represented as an unrooted labeled tree is proposed. It contains reverse axes in predicates, and specifies the pattern of selection predicates on multiple elements from both descendants and ancestors. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms, however, only deal with twig queries without reverse axes. A straightforward approach that first decomposes an xtwig query into multiple twig queries and then merges their results is obviously not optimal in most cases. In this paper, we study novel holistic-processing algorithm for xtwig queries with both forward and reverse axes without decomposition, and exploit a data structure XHyperCube to conduct the relations of predicates with reverse axes, which is the core problem in xtwig pattern. The experiments show that holistic processing is much more efficient than the decomposition approach. It avoids the useless intermediate results and is linear in the sum of sizes of the input lists and the results, but independent of the number of the reverse axes predicates.},
key = {XML},
keywords = {Algorithms;Data structures;Database systems;Pattern matching;Query languages;Query processing;Trees (mathematics);World Wide Web;},
note = {Holistic processing;Predicates;Reverse axes;Twig queries;Unrooted labeled trees;XPath;Xtwig pattern;},
}

@inproceedings{20083011391743 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Twig'n join: Progressive query processing of multiple XML streams},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Tok, Wee Hyong and Bressan, Stephane and Lee, Mong-Li},
volume = {4947 LNCS},
year = {2008},
pages = {546 - 553},
issn = {03029743},
address = {New Delhi, India},
abstract = {We propose a practical approach to the progressive processing of (FWR) XQuery queries on multiple XML streams, called Twig'n Join (or TnJ). The query is decomposed into a query plan combining several twig queries on the individual streams, followed by a multi-way join and a final twig query. The processing is itself accordingly decomposed into three pipelined stages progressively producing streams of XML fragments. Twig'n Join combines the advantages of the recently proposed TwigM algorithm and our previous work on relational result-rate based progressive joins. In addition, we introduce a novel dynamic probing technique, called Result-Oriented Probing (ROP), which determines an optimal probing sequence for the multi-way join. This significantly reduces the amount of redundant probing for results. We comparatively evaluate the performance of Twig'n Join using both synthetic and real-life data from standard XML query processing benchmarks. We show that Twig'n Join is indeed effective and efficient for processing multiple XML streams. © 2008 Springer-Verlag Berlin Heidelberg.},
key = {Pipeline processing systems},
keywords = {Benchmarking;Data processing;Database systems;Information management;Markup languages;Query processing;Rivers;Standards;XML;},
note = {Advanced applications;Heidelberg (CO);Individual (PSS 544-7);International conferences;Multi-way join;Probing sequence;Progressive processing;Progressive query (PQ);Real-life data;Twig queries;XML query processing;XML streaming;},
URL = {http://dx.doi.org/10.1007/978-3-540-78568-2_45},
}

@inproceedings{20091812056893 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {A cost-based join selection for XML twig content-based queries},
journal = {3rd International Workshop on Database Technologies for Handling XML Information on the Web, DataX'08 - Held at EDBT 2008: 11th International Conference on Extending Database Technology},
author = {Baca, Radim and Kratky, Michal},
year = {2008},
pages = {13 - 20},
address = {Nantes, France},
abstract = {XML (Extensible Mark-up Language) has been embraced as a new approach to data modeling. Nowadays, more and more information is formated as semi-structured data, e.g., articles in a digital library, documents on the web, and so on. Implementation of an efficient system enabling storage and querying of XML documents requires development of new techniques. Many different techniques of XML indexing have been proposed during recent years. If we consider some classes of indexing methods, we distinguish two kinds of joins for processing twig queries. The first join merges two sets retrieved from an inverted list. The second join applies the first query result in building the second query. Although authors propose improvements of their joins, there has not yet been a discussion about the advantages of applying various join operations. In this article, we propose a join selection based on the cost of a join. By choosing a more appropriate join operation, twig query processing efficiency is significantly improved. Copyright 2008 ACM.},
key = {Markup languages},
keywords = {Content based retrieval;Costs;Digital libraries;Indexing (materials working);Indexing (of information);Query processing;Technology;XML;},
note = {Content-based queries;Data modeling;Efficient systems;In buildings;Indexing methods;Inverted lists;Join operations;New approaches;Query results;Selection based;Semi-structured datum;Xml indexing;},
URL = {http://dx.doi.org/10.1145/1416691.1416696},
}

@inproceedings{20083011391745 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {An approach for XML similarity join using tree serialization},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Wen, Lianzi and Amagasa, Toshiyuki and Kitagawa, Hiroyuki},
volume = {4947 LNCS},
year = {2008},
pages = {562 - 570},
issn = {03029743},
address = {New Delhi, India},
abstract = {This paper proposes a scheme for similarity join over XML data based on XML data serialization and subsequent similarity matching over XML node subsequences. With the recent explosive diffusion of XML, great volumes of electronic data are now marked up with XML. As a consequence, a growing amount of XML data represents similar contents, but with dissimilar structures. To extract as much information as possible from this heterogeneous information, similarity join has been used. Our proposed similarity join for XML data can be summarized as follows: 1) we serialize XML data as XML node sequences; 2) we extract semantically/structurally coherent subsequences; 3) we filter out dissimilar subsequences using textual information; and 4) we extract pairs of subsequences as the final result by checking structural similarity. The above process is costly to execute. To make it scalable against large document sets, we use a Bloom filter to speed up text similarity computation. We show the feasibility of the proposed scheme by experiments. © 2008 Springer-Verlag Berlin Heidelberg.},
key = {XML},
keywords = {Arsenic compounds;Crack propagation;Data structures;Database systems;Information management;Markup languages;Wave filters;},
note = {Advanced applications;Bloom filtering;Document sets;Electronic data;Heidelberg (CO);Heterogeneous information;International conferences;Similarity matching;Speed ups;Structural similarity (SSIM);Text similarity;Textual information;XML data;},
URL = {http://dx.doi.org/10.1007/978-3-540-78568-2_47},
}

@inproceedings{20084011619261 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Approximate joins for data-centric XML},
journal = {Proceedings - International Conference on Data Engineering},
author = {Augsten, Nikolaus and Bohlen, Michael and Dyreson, Curtis and Gamper, Johann},
year = {2008},
pages = {814 - 823},
issn = {10844627},
address = {Cancun, Mexico},
abstract = {In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly different in each source, so an approximate join must be used. For XML data, most approximate join strategies are based on some ordered tree matching technique. But in data-centric XML the order is irrelevant: two elements should match even if their subelement order varies. In this paper we give a solution for the approximate join of unordered trees. Our solution is based on windowed pq-grams. We develop an efficient technique to systematically generate windowed pq-grams in a three-step process: sorting the unordered tree, extending the sorted tree with dummy nodes, and computing the windowed pq-grams on the extended tree. The windowed pq-gram distance between two sorted trees approximates the tree edit distance between the respective unordered trees. The approximate join algorithm based on windowed pq-grams is implemented as an equality join on strings which avoids the costly computation of the distance between every pair of input trees. Our experiments with synthetic and real world data confirm the analytic results and suggest that our technique is both useful and scalable. © 2008 IEEE.},
key = {Trees (mathematics)},
keywords = {Information management;Integration;Markup languages;Technology;XML;},
note = {Data engineering;Data integration applications;Data sourcing;Data-centric;Dummy nodes;International conferences;Join algorithms;Real-world data;Three-step process;Tree edit distances;Tree matching;Unordered trees;XML data;},
URL = {http://dx.doi.org/10.1109/ICDE.2008.4497490},
}

@inproceedings{20083011391746 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {A holistic algorithm for efficiently evaluating xtwig joins},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Ning, Bo and Wang, Guoren and Yu, Jeffrey Xu},
volume = {4947 LNCS},
year = {2008},
pages = {571 - 579},
issn = {03029743},
address = {New Delhi, India},
abstract = {More and more XML data have been generated and used in the data exchange. XML employs a tree-structure data model, but lots of queries submitted by users are not like the tree-structure. Those queries contain ancestor axis in predicates, and specify the pattern of selection predicates on multiple elements from descendants to ancestors. Efficiently finding all occurrences of such an xtwig pattern in an XML database is crucial for XML query processing. A straightforward method is to rewrite an xtwig pattern to an equivalent reverse-axis-free one. However, this method needs to scan the element streams several times and is rather expensive to evaluate. In this paper, we study the xtwig pattern, and propose two basic decomposing methods, VertiDec and HoriDec, and a holistic processing method, XtwigStack, for processing xtwig queries. The experiments show that the holistic algorithm is much more efficient than the rewriting and decomposition approaches. © 2008 Springer-Verlag Berlin Heidelberg.},
key = {Trees (mathematics)},
keywords = {Boolean functions;Database systems;Decomposition;Information management;Markup languages;Query processing;XML;},
note = {Advanced applications;Data exchange (DX);Heidelberg (CO);International conferences;Multiple elements;Processing methods;Tree structures;XML data;XML databases;XML query processing;},
URL = {http://dx.doi.org/10.1007/978-3-540-78568-2_48},
}

@inproceedings{20084411664863 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Evaluating performance and quality of XML-based similarity joins},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Ribeiro, Leonardo and Harder, Theo},
volume = {5207 LNCS},
year = {2008},
pages = {246 - 261},
issn = {03029743},
address = {Pori, Finland},
abstract = {A similarity join correlating fragments in XML documents, which are similar in structure and content, can be used as the core algorithm to support data cleaning and data integration tasks. For this reason, built-in support for such an operator in an XML database management system (XDBMS) is very attractive. However, similarity assessment is especially difficult on XML datasets, because structure, besides textual information, may embody variations in XML documents representing the same real-world entity. Moreover, the similarity computation is considerably more expensive for tree-structured objects and should, therefore, be a prime optimization candidate. In this paper, we explore and optimize tree-based similarity joins and analyze their performance and accuracy when embedded in native XDBMSs. © 2008 Springer-Verlag Berlin Heidelberg.},
key = {Database systems},
keywords = {Administrative data processing;Information systems;Integration;Management information systems;Markup languages;XML;},
note = {Core algorithms;Data cleanings;Data integrations;Datasets;Evaluating;Similarity assessments;Similarity computations;Similarity joins;Textual informations;Xml databases;Xml documents;},
URL = {http://dx.doi.org/10.1007/978-3-540-85713-6_18},
}

@inproceedings{20080311037159 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Cost-based query optimization for multi reachability joins},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Cheng, Jiefeng and Yu, Jeffrey Xu and Ding, Bolin},
volume = {4443 LNCS},
year = {2007},
pages = {18 - 30},
issn = {03029743},
address = {Bangkok, Thailand},
abstract = {There is a need to efficiently identify reachabilities between different types of objects over a large data graph. A reachability join (R-join) serves as a primitive operator for such a purpose. Given two types, A and D, R-join finds all pairs of A and D such that the D-typed objects are reachable from some A-typed objects. In this paper, we focus on processing multi reachability joins (R-joins). In the literature, the up-to-date approach extends the well-known twig-stack join algorithm to be applicable to directed acyclic graphs (DAGs). The efficiency of such an approach is affected by the density of large DAGs. In this paper, we present algorithms to optimize R-joins using dynamic programming based on the estimated costs associated with R-joins. Our algorithm is not affected by the density of graphs. We conducted extensive performance studies and report our findings. © Springer-Verlag Berlin Heidelberg 2007.},
key = {Query languages},
keywords = {Algorithms;Graph theory;Optimization;Parameter estimation;},
note = {Multi reachability;R-joins;},
}

@inproceedings{20074410896441 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Massively multi-query join processing in publish/subscribe systems},
journal = {Proceedings of the ACM SIGMOD International Conference on Management of Data},
author = {Hong, Mingsheng and Demers, Alan J. and Gehrke, Johannes E. and Koch, Christoph and Riedewald, Mirek and White, Walker M.},
year = {2007},
pages = {761 - 772},
issn = {07308078},
address = {Beijing, China},
abstract = {There has been much recent interest in XML publish/subscribe systems. Some systems scale to thousands of concurrent queries, but support a limited query language (usually a fragment of XPath 1.0). Other systems support more expressive languages, but do not scale well with the number of concurrent queries. In this paper, we propose a set of novel query processing techniques, referred to as Massively Multi-Query Join Processing techniques, for processing a large number of XML stream queries involving value joins over multiple XML streams and documents. These techniques enable the sharing of representations of inputs to multiple joins, and the sharing of join computation. Our techniques are also applicable to relational event processing systems and publish/subscribe systems that support join queries. We present experimental results to demonstrate the effectiveness of our techniques. We are able to process thousands of XML messages with hundreds of thousands of join queries on real RSS feed streams. Our techniques gain more than two orders of magnitude speedup compared to the naive approach of evaluating such join queries. Copyright 2007 ACM.},
key = {Query processing},
keywords = {Electronic publishing;Information retrieval;Query languages;Relational database systems;XML;},
note = {Multi-query optimization;Publish/subscribe;Stream query processing;XML join;},
URL = {http://dx.doi.org/10.1145/1247480.1247564},
}

@article{20072710695057 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Holistic Join for Generalized Tree Patterns},
journal = {Information Systems},
author = {Ramanan, Prakash},
volume = {32},
number = {7},
year = {2007},
pages = {1018 - 1036},
issn = {03064379},
address = {Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom},
abstract = {We consider the problem of evaluating an XQuery query Q (involving only child and descendant axes) on an XML document D. D is stored on a disk and is read from there, in document order. Chen et al. [From Tree Patterns to Generalized Tree Patterns: on efficient evaluation of XQuery, Proceedings of International Conference on Very Large Data Bases (VLDB), 2003, pp. 237-248] presented an algorithm to convert Q (from a large fragment of XQuery) into a Generalized Tree Pattern GTP(Q), and a set J(Q) of value join conditions on its vertices. Evaluating Q on D reduces to finding the matches for GTP(Q) in D. We present an efficient algorithm for finding these matches. Excluding the computation of the value joins J(Q), our algorithm performs two linear passes over the data, and runs in O(d|Q|) memory space, where d denotes the depth of D; runtime and disk I/O are O(|Q||D|). If separate input streams of document nodes for the individual vertices in GTP(Q) are available, our runtime and disk I/O are linear in the input size; this runtime and disk I/O are trivially optimal. © 2006 Elsevier B.V. All rights reserved.},
key = {Decision tables},
keywords = {Computation theory;Learning algorithms;Linear programming;Problem solving;Query languages;XML;},
note = {Generalized Tree Patterns;Query evaluation;XPath;XQuery;},
URL = {http://dx.doi.org/10.1016/j.is.2006.10.008},
}

@inproceedings{20073210756574 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {An index scheme for XML documents based on relationship joins},
journal = {Proceedings - 2006 10th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2006},
author = {Wu, Chengwen and Dong, Jinxiang and Chen, Gang and Yu, Lihua},
year = {2006},
pages = {483 - 487},
address = {Nanjing, China},
abstract = {XML is rapidly emerging as a standard for information storage, representation and exchange on the web. How to rapidly search and query XML documents efficiently has received much attention in recent research. However, current querying schemes of XML documents typically involve both node content and tree structural information, which may limit efficiency when facing applications in which the tree structural information is more complicated than the tree node itself. In this paper, we propose node relationship join algorithms that utilize available indexes mainly on tree structural information. The relationship join algorithms work especially well for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation highlight the correctness and efficiency of our solution. © 2006 IEEE.},
key = {Information management},
keywords = {Indexing (of information);Query processing;Relational database systems;World Wide Web;XML;},
note = {Element index;Numbering node;Relationship index;Relationship join;},
URL = {http://dx.doi.org/10.1109/CSCWD.2006.253228},
}

@inproceedings{20064310192505 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Kappa-join: Efficient execution of existential quantification in XML query languages},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Brantner, Matthias and Helmer, Sven and Kanne, Carl-Christian and Moerkotte, Guido},
volume = {4156 LNCS},
year = {2006},
pages = {1 - 15},
issn = {03029743},
address = {Seoul, Korea, Republic of},
abstract = {XML query languages feature powerful primitives for formulating queries, involving comparison expressions which are existentially quantified. If such comparisons involve several scopes, they are correlated and, thus, become difficult to evaluate efficiently. In this paper, we develop a new ternary operator, called Kappa-Join, for efficiently evaluating queries with existential quantification. In XML queries, a correlation predicate can occur conjunctively and disjunctively. Our decorrelation approach not only improves performance in the conjunctive case, but also allows decorrelation of the disjunctive case. The latter is not possible with any known technique. In an experimental evaluation, we compare the query execution times of the Kappa-Join with existing XPath evaluation techniques to demonstrate the effectiveness of our new operator. © Springer-Verlag Berlin Heidelberg 2006.},
key = {Query languages},
keywords = {Artificial intelligence;Database systems;Evaluation;},
note = {Conjunctive case;Existential quantification;Experimental evaluation;Ternary operator;},
}

@article{2006269959166 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Integrating XML data sources using approximate joins},
journal = {ACM Transactions on Database Systems},
author = {Guha, Sudipto and Jagadish, H.V. and Koudas, Nick and Srivastava, Divesh and Yu, Ting},
volume = {31},
number = {1},
year = {2006},
pages = {161 - 207},
issn = {03625915},
abstract = {XML is widely recognized as the data interchange standard of tomorrow because of its ability to represent data from a variety of sources. Hence, XML is likely to be the format through which data from multiple sources is integrated. In this article, we study the problem of integrating XML data sources through correlations realized as join operations. A challenging aspect of this operation is the XML document structure. Two documents might convey approximately or exactly the same information but may be quite different in structure. Consequently, an approximate match in structure, in addition to content, has to be folded into the join operation. We quantify an approximate match in structure and content for pairs of XML documents using well defined notions of distance. We show how notions of distance that have metric properties can be incorporated in a framework for joins between XML data sources and introduce the idea of reference sets to facilitate this operation. Intuitively, a reference set consists of data elements used to project the data space. We characterize what constitutes a good choice of a reference set, and we propose sampling-based algorithms to identify them. We then instantiate our join framework using the tree edit distance between a pair of trees. We next turn our attention to utilizing well known index structures to improve the performance of approximate XML join operations. We present a methodology enabling adaptation of index structures for this problem, and we instantiate it in terms of the R-tree. We demonstrate the practical utility of our solutions using large collections of real and synthetic XML data sets, varying parameters of interest, and highlighting the performance benefits of our approach. © 2006 ACM.},
key = {XML},
keywords = {Algorithms;Correlation methods;Data structures;Database systems;Information dissemination;Trees (mathematics);},
note = {Approximate joins;Data integration;Joins;Tree edit distance;},
URL = {http://dx.doi.org/10.1145/1132863.1132868},
}

@article{2006129765186 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Join minimization in XML-to-SQL translation: An algebraic approach},
journal = {SIGMOD Record},
author = {Mani, Murali and Wang, Song and Dougherty, Dan and Rundensteiner, Elke A.},
volume = {35},
number = {1},
year = {2006},
pages = {20 - 25},
issn = {01635808},
abstract = {Consider an XML view defined over a relational database, and a user query specified over this view. This user XML query is typically processed using the following steps: (a) our translator maps the XML query to one or more SQL queries, (b) the relational engine translates an SQL query to a relational algebra plan, (c) the relational engine executes the algebra plan and returns SQL results, and (d) our translator translates the SQL results back to XML. However, a straightforward approach produces a relational algebra plan after step (b) that is inefficient and has redundant joins. In this paper, we report on our preliminary observations with respect to how joins in such a relational algebra plan can be minimized. Our approach works on the relational algebra plan and optimizes it using novel rewrite rules that consider pairs of joins in the plan and determine whether one of them is redundant and hence can be removed. Our study shows that algebraic techniques achieve effective join minimization, and such techniques are useful and can be integrated into mainstream SQL engines.},
key = {XML},
keywords = {Linear algebra;Query languages;Relational database systems;},
note = {Algebraic techniques;Relational algebra;SQL query;},
}

@inproceedings{20082511327695 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Functional dependency maintenance and lossless join decomposition in XML model decomposition},
journal = {2006 2nd International Conference on Semantics Knowledge and Grid, SKG},
author = {Li, Xia and Ye, Fei-Yue and Yuan, Hong-Juan and Peng, Wen-Tao},
year = {2006},
address = {Guilin Guangxi, China},
abstract = {The paper studies "equality" in XML model decomposition. XML IFD, TFD, AFD and MVD are proposed. According to complexity of these FDs, four XML normal forms are presented. FD maintenance and lossless join decomposition of DTD are defined to analyze "equality" of decomposition. Four lossless algorithms are proposed to decompose DTD into some XML normal form and validity of these algorithms is analyzed. © 2006 IEEE.},
key = {Information management},
keywords = {Algorithms;Evolutionary algorithms;Information theory;Maintenance;Markup languages;Mathematical models;Neodymium;Semantics;XML;},
note = {(algorithmic) complexity;Functional dependency (FD);international conferences;Lossless;Lossless algorithms;Normal form (NF);},
URL = {http://dx.doi.org/10.1109/SKG.2006.51},
}

@inproceedings{20062810002179 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Applying cosine series to join size estimation},
journal = {International Conference on Information and Knowledge Management, Proceedings},
author = {Luo, Cheng and Jiang, Zhewei and Hou, Wen-Chi},
year = {2005},
pages = {227 - 228},
address = {Bremen, Germany},
abstract = {This paper provides a general overview of two innovative applications of Cosine series in XML joins and data stream joins.},
key = {XML},
keywords = {Computer applications;Database systems;Information analysis;Optimization;Size determination;},
note = {Cosine series;Data stream;Query optimization;Structural join;},
}

@inproceedings{2006219886428 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Subgraph join: Efficient processing subgraph queries on graph-structured XML document},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Wang, Hongzhi and Wang, Wei and Lin, Xuemin and Li, Jianzhong},
volume = {3739 LNCS},
year = {2005},
pages = {68 - 80},
issn = {03029743},
address = {Hangzhou, China},
abstract = {The information in many applications can be naturally represented as graph-structured XML document. Structural query on graph-structured XML document matches the subgraph of graph-structured XML document on some given schema. The query processing of graph-structured XML document brings new challenges. In this paper, for the processing of subgraph query, we design a subgraph join algorithm based on reachability coding. Using efficient data structure, subgraph join algorithm can process subgraph query with various structures efficiently. © Springer-Verlag Berlin Heidelberg 2005.},
key = {Graph theory},
keywords = {Algorithms;Codes (symbols);Computer graphics;Data structures;Query languages;XML;},
note = {Graph-structured XML document;Subgraph join algorithm;Subgraph queries;},
}

@inproceedings{2006229908924 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Efficient join algorithms for integrating XML data in grid environment},
journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
author = {Wang, Hongzhi and Li, Jianzhong and Xiong, Shuguang},
volume = {3795 LNCS},
year = {2005},
pages = {547 - 553},
issn = {03029743},
address = {Beijing, China},
abstract = {For its self-description feature, XML can be used to represent information in grid environment. Querying XML data distributed in grid environment brings new challenges. In this paper, we focus on join algorithms in the result merge step of query processing. In order to transmit results efficiently, we present strategies of data compacting, as well as four join operator models. Based on the compacted data structure, we design efficient algorithms for these join operators. Extensive experimental results show that our data compacting strategy is effective; our join algorithms outperform XJoin significantly and have good scalability. © Springer-Verlag Berlin Heidelberg 2005.},
key = {Algorithms},
keywords = {Data processing;Data structures;Distributed computer systems;Query languages;XML;},
note = {Data compacting;Grid environment;Operator models;},
}

@inproceedings{2005429418092 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {LAX: An efficient approximate XML join based on clustered leaf nodes for XML data integration},
journal = {Lecture Notes in Computer Science},
author = {Liang, Wenxin and Yokota, Haruo},
volume = {3567},
year = {2005},
pages = {82 - 97},
issn = {03029743},
address = {Sunderland, United kingdom},
abstract = {Recently, more and more data are published and exchanged by XML on the Internet. However, different XML data sources might contain the same data but have different structures. Therefore, an efficient method is required to integrate such XML data sources so that more complete and useful information can be conveniently accessed and acquired by users. The tree edit distance is regarded as an effective metric for evaluating the structural similarity in XML documents. However, its computational cost is extremely expensive and the traditional wisdom in join algorithms cannot be applied easily. In this paper, we propose LAX (Leaf-clustering based Approximate XML join algorithm), in which the two XML document trees are clustered into subtrees representing independent items and the similarity between them is determined by calculating the similarity degree based on the leaf nodes of each pair of subtrees. We also propose an effective algorithm for clustering the XML document for LAX. We show that it is easy to apply the traditional wisdom in join algorithms to LAX and the join result contains complete information of the two documents. We then do experiments to compare LAX with the tree edit distance and evaluate its performance using both synthetic and real data sets. Our experimental results show that LAX is more efficient in performance and more effective for measuring the approximate similarity between XML documents than the tree edit distance. © Springer-Verlag Berlin Heidelberg 2005.},
key = {XML},
keywords = {Algorithms;Computational methods;Information analysis;Integration;Metric system;},
note = {Data sets;Data sources;LAX;},
}

@inproceedings{2005389367032 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {PathStack: A holistic path join algorithm for path query with not-predicates on XML data},
journal = {Lecture Notes in Computer Science},
author = {Jiao, Enhua and Ling, Tok Wang and Chan, Chee-Yong},
volume = {3453},
year = {2005},
pages = {113 - 124},
issn = {03029743},
address = {Beijing, China},
abstract = {The evaluation of path queries forms the basis of complex XML query processing which has attracted a lot of research attention. However, none of these works have examined the processing of more complex queries that contain not-predicates. In this paper, we present the first study on evaluating path queries with not-predicates. We propose an efficient holistic path join algorithm, PathStack, which has the following advantages: (1) it requires only one scan of the relevant data to evaluate path queries with not-predicates; (2) it does not generate any intermediate results; and (3) its memory space requirement is bounded by the longest path in the input XML document. We also present an improved variant of PathStack that further minimizes unnecessary computations. © Springer-Verlag Berlin Heidelberg 2005.},
key = {XML},
keywords = {Algorithms;Computation theory;Data reduction;Data storage equipment;Query languages;Research;},
note = {Complex query;Path join algorithms;Path queries;Query processing;},
}

@inproceedings{2005149020802 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Normalization design of XML database schema for eliminating redundant schemas and satisfying lossless join},
journal = {Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004},
author = {Wu, Yonghui},
year = {2004},
pages = {660 - 663},
address = {Beijing, China},
abstract = {Normalization design of XML database schema is to produce a set of XML schemas or DTDs that can well represent data dependencies and eliminate redundancies. In current research on normalization design of XML database schema, redundancies in XML database schema have not been specially studied and classified, and normalization design algorithms only convert an initial schema into one of the normal forms proposed in this research. The paper defines hierarchical schema representing XML database schema and corresponding normal forms - first normal form (1NF) and second normal form (2NF) for XML database schema, and presents the algorithm eliminating redundant schemas and the normalization design algorithm for 2NF. In XML database schema in 1NF, the set of full and embedded MVDs are implied by the given set of MVDs. XML database schema in 2NF satisfies the properties for 1NF, eliminates redundant schemas, and satisfies the lossless join property. © 2004 IEEE.},
key = {XML},
keywords = {Algorithms;Data reduction;Hierarchical systems;Redundancy;Relational database systems;Semantics;World Wide Web;},
note = {Data dependencies;Database schema;First normal form (1NF);Hierarchical schema;},
}

@inproceedings{2003497764120 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Index-based approximate XML joins},
journal = {Proceedings - International Conference on Data Engineering},
author = {Guha, Sudipto and Koudas, Nick and Srivastava, Divesh and Yu, Ting},
year = {2003},
pages = {708 - 710},
address = {Bangalore, India},
abstract = {XML data integration tools are facing a variety of challenges for their efficient and effective operation. Among these is the requirement to handle a variety of inconsistencies or mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well known and widely deployed index structure, namely the R-tree, can be adopted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adopted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited for this application.},
key = {XML},
keywords = {Algorithms;Data structures;Database systems;Indexing (of information);Optimization;},
note = {Data integration tools;Data sets;},
}

@article{2003157435694 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Efficient processing of regular path joins using PID},
journal = {Information and Software Technology},
author = {Kim, Jongik and Kim, Hyoung-Joo},
volume = {45},
number = {5},
year = {2003},
pages = {241 - 251},
issn = {09505849},
abstract = {XML is data that has no fixed structure. So it is hard to design a schema for storing and querying an XML data. Instead of a fixed schema, graph-based data models are widely adopted for querying XML. Queries on XML are based on paths in a data graph. A meaningful query usually has several paths in it, but much of recent research is more concerned with optimizing a single path in a query. In this paper, we present an efficient technique for processing multiple path expressions in a query. We implemented our technique and present preliminary performance results. © 2003 Elsevier Science B.V. All rights reserved.},
key = {Information technology},
keywords = {Data structures;Database systems;HTML;XML;},
note = {Graph-based data models;},
URL = {http://dx.doi.org/10.1016/S0950-5849(02)00208-2},
}

@inproceedings{2004098040741 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Containment Join Size Estimation: Models and Methods},
journal = {Proceedings of the ACM SIGMOD International Conference on Management of Data},
author = {Wang, Wei and Jiang, Haifeng and Lu, Hongjun and Yu, Jeffrey Xu},
year = {2003},
pages = {145 - 156},
issn = {07308078},
address = {San Diego, CA, United states},
abstract = {Recent years witnessed an increasing interest in researches in XML, partly due to the fact that XML has now become the de facto standard for data interchange over the internet. A large amount of work has been reported on XML storage models and query processing techniques. However, few works have addressed issues of XML query optimization. In this paper, we report our study on one of the challenges in XML query optimization: containment join size estimation. Containment join is well accepted as an important operation in XML query processing. Estimating the size of its results is no doubt essential to generate efficient XML query processing plans. We propose two models, the interval model and the position model, and a set of estimation methods based on these two models. Comprehensive performance studies were conducted. The results not only demonstrate the advantages of our new algorithms over existing algorithms, but also provide valuable insights into the tradeoff among various parameters.},
URL = {http://dx.doi.org/10.1145/872757.872777},
}

@article{2005319276911 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Invensys joins the fray on open automation architecture},
journal = {Chemical Week},
author = {Mullin, Rick},
volume = {164},
number = {37},
year = {2002},
pages = {33 - },
issn = {0009272X},
abstract = {A plant automation and control system architecture, the Archestra platform introduced by Invensys that allows users to mix and match IT and controls components from all major vendors without special integration programming is discussed. The platform uses Microsoft.net, an XML-based communication platform for Internet and Intranet business system communications, as an information conduit between control systems and a link between automation and EWP systems. Users are not only looking to open architecture as a means for mixing systems at low cost, but are also looking for ways to manipulate data from real time control systems. Users want to develop systems that effectively leverage manufacturing expertise across worldwide plant operations.},
key = {Automation},
keywords = {Chemical industry;Computer architecture;Computer systems programming;Control system analysis;Information analysis;Information technology;Internet;Knowledge based systems;Marketing;Open systems;Product development;Real time systems;XML;},
note = {Communication standards;Invensys (CO);Open automation architecture;Real time control systems;},
}

@inproceedings{2002387094049 ,
language = {English},
copyright = {Compilation and indexing terms, Copyright 2011 Elsevier Inc.},
copyright = {Compendex},
title = {Approximate XML joins},
journal = {Proceedings of the ACM SIGMOD International Conference on Management of Data},
author = {Guha, Sudipto and Jagadish, H.V. and Koudas, Nick and Srivastava, Divesh and Yu, Ting},
year = {2002},
pages = {287 - 298},
issn = {07308078},
address = {Madison, WI, United states},
abstract = {XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which data from multiple sources is integrated. In this paper we study the problem of integrating XML data sources through correlations realized as join operations. A challenging aspect of this operation is the XML document structure. Two documents might convey approximately or exactly the same information but may be quite different in structure. Consequently approximate match in structure, in addition to, content has to be folded in the join operation. We quantify approximate match in structure and content using well defined notions of distance. For structure, we propose computationally inexpensive lower and upper bounds for the tree edit distance metric between two trees. We then show how the tree edit distance, and other metrics that quantify distance between trees, can be incorporated in a join framework. We introduce the notion of reference sets to facilitate this operation. Intuitively, a reference set consists of data elements used to project the data space. We characterize what constitutes a good choice of a reference set and we propose sampling based algorithms to identify them. This gives rise to a variety of algorithmic approaches for the problem, which we formulate and analyze. We demonstrate the practical utility of our solutions using large collections of real and synthetic XML data sets.},
}
//bib end

thanks.

I imported your record set

I imported your record set and tested the queries using the dbsys.exe tool under the "Tools" sub-folder. Query number 2 returned 33 hits, which is correct. How is your query entered in Biblioscape? I don't think the problem is caused by the database engine. You can zip all the files under your database folder and email me the zip; I may be able to find the problem with the database. Thanks, Paul

Dear Paul I entered

Dear Paul
I entered the query in "quick search-old-smart search". By the way, I can also get the right answers using dbsys.exe.
I have emailed the zipped db to support@biblioscape.com.
Thanks.

Thank you for sending me the

Thank you for sending me the file. I can reproduce the problem in Biblioscape. Please use query number 3 as a workaround. Thanks, Paul
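
For anyone hitting the same issue: query 3 is safe to use as a workaround because, by De Morgan's law, NOT (A OR B) is logically equivalent to NOT A AND NOT B, so a compliant SQL engine must return the same rows for both forms. The sketch below demonstrates this with Python's built-in sqlite3 module; the table layout and sample rows are illustrative only, not the actual Biblioscape schema.

```python
import sqlite3

# By De Morgan's law, query 2 (NOT (A OR B)) and query 3
# (NOT A AND NOT B) should return identical row sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bib (folder_id INTEGER, title TEXT, abstract TEXT)")
conn.executemany(
    "INSERT INTO bib VALUES (?, ?, ?)",
    [
        (101, "Approximate XML joins", "Integrating XML data sources ..."),
        (101, "Mining data streams", "We study stream processing ..."),
        (101, "Containment join size estimation", "Models and methods ..."),
    ],
)

# Query 2 style: negation applied to the whole OR expression.
q2 = """SELECT count(*) FROM bib WHERE folder_id = 101
        AND NOT (lower(title) LIKE '%stream%'
                 OR lower(abstract) LIKE '%stream%')"""

# Query 3 style: the De Morgan expansion of query 2.
q3 = """SELECT count(*) FROM bib WHERE folder_id = 101
        AND NOT (lower(title) LIKE '%stream%')
        AND NOT (lower(abstract) LIKE '%stream%')"""

n2 = conn.execute(q2).fetchone()[0]
n3 = conn.execute(q3).fetchone()[0]
print(n2, n3)  # both 2: only the "stream" record is excluded
```

Since dbsys.exe returned the correct 33 hits for both forms, the discrepancy must have been introduced by the smart-search layer in Biblioscape rewriting the query, not by the database engine itself.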