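Abstract (translated from the Chinese)
With powerful database-style query languages such as XPath and XQuery becoming W3C standards, querying XML data has become a problem well worth investigating.
XML queries typically specify query pattern trees composed of selection predicates over multiple elements. The basic tree-structured relationships are parent-child and ancestor-descendant, so matching these basic binary structural relationships in an XML database is a key operation in XML query processing. In this paper, we build an XML document tree object model with an optimal index and, guided by the tree pattern query (TPQ) obtained by converting an XPath query, use the Stack-Tree-Desc algorithm to perform structural joins that exactly match the query pattern tree against the XML document tree. We then implement the approach in a query system for XML data.
Because XML data is heterogeneous, and following ideas from information retrieval (IR), it is often more appropriate to match queries approximately and return answers ranked by degree of approximation than to return only exact results. Moreover, as the number of XML repositories grows, the ability to compute the k best query matches is receiving increasing attention. Pattern trees are the basis for querying tree-structured data such as XML. This paper therefore also studies approximate XML query matching based on the relaxation of query pattern trees, and presents efficient algorithms with embedded ranking schemes for processing potential query results.
Keywords: XML, structural join, approximate, relaxation, top-k, query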
ABSTRACT
Querying XML data is a well-explored topic, with powerful database-style query languages such as XPath and XQuery set to become W3C standards.
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. The primitive tree-structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation in XML query processing. In this paper, we build an XML DOM tree with an optimal index and, guided by the tree pattern query (TPQ) derived from an XPath query, use the Stack-Tree-Desc algorithm to perform structural joins that exactly match the TPQ against the DOM tree. The approach is implemented in a query system for XML data.
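The structural join named above can be illustrated with a minimal Python sketch of a stack-based ancestor-descendant join in the style of Stack-Tree-Desc. It assumes that every element carries a region encoding (start, end, level) assigned in document order and that both input lists are sorted by start position. The `Node` type, the function name `stack_tree_desc`, and the sample positions are illustrative assumptions, not the implementation described in this thesis.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    tag: str
    start: int   # position of the opening tag in document order
    end: int     # position of the closing tag
    level: int   # depth in the document tree (root = 1)


def stack_tree_desc(alist: List[Node], dlist: List[Node],
                    parent_child: bool = False) -> List[Tuple[Node, Node]]:
    """Join candidate ancestors with candidate descendants.

    Both lists must be sorted by start position. Output pairs are
    produced in descendant order. With parent_child=True only pairs
    whose levels differ by exactly one are kept.
    """
    out: List[Tuple[Node, Node]] = []
    stack: List[Node] = []   # chain of nested ancestors, outermost first
    i = j = 0
    while j < len(dlist):    # no output is possible once dlist is exhausted
        d = dlist[j]
        a = alist[i] if i < len(alist) else None
        next_start = d.start if a is None else min(a.start, d.start)
        # Drop ancestors that close before the next input node opens.
        while stack and stack[-1].end < next_start:
            stack.pop()
        if a is not None and a.start < d.start:
            stack.append(a)  # a is nested inside the current stack top
            i += 1
        else:
            # Every node left on the stack contains d.
            for anc in stack:
                if not parent_child or anc.level + 1 == d.level:
                    out.append((anc, d))
            j += 1
    return out


# Hypothetical document: <a><b><c/></b><a><c/></a></a>
a1, a2 = Node("a", 1, 10, 1), Node("a", 6, 9, 2)
c1, c2 = Node("c", 3, 4, 3), Node("c", 7, 8, 3)
ad_pairs = stack_tree_desc([a1, a2], [c1, c2])
pc_pairs = stack_tree_desc([a1, a2], [c1, c2], parent_child=True)
```

On the sample lists, the ancestor-descendant join yields the start-position pairs (1, 3), (1, 7), (6, 7), in descendant order, and the parent-child variant keeps only (6, 7). Each input list is scanned once, and the stack never holds more nodes than the depth of the document.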
Because XML data is heterogeneous, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of information retrieval (IR), than to return only exact matches. The ability to compute top-k matches to XML queries is also gaining importance as the number of large XML repositories grows. Tree patterns are fundamental to querying tree-structured data such as XML. In this paper, we therefore also study approximate XML query matching based on tree pattern relaxations, and introduce ranking schemes embedded in efficient algorithms for processing potential answers.
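To make the idea of ranked tree pattern relaxation concrete, the following Python sketch enumerates relaxed versions of a pattern in order of increasing penalty. It is only an illustration under stated assumptions: a pattern is a set of (parent, child, axis) edges with unique labels, only two relaxations are modelled (generalizing a parent-child edge to ancestor-descendant, and deleting a leaf), and the penalty constants are hypothetical. The ranking scheme actually used in this thesis may differ.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterator, List, Tuple

Edge = Tuple[str, str, str]      # (parent, child, axis); axis is "pc" or "ad"
Pattern = FrozenSet[Edge]

GENERALIZE_COST = 1.0            # hypothetical penalty: pc edge becomes ad
DELETE_LEAF_COST = 2.0           # hypothetical penalty: leaf node is dropped


def relax_once(p: Pattern) -> Iterator[Tuple[Pattern, float]]:
    """Yield every pattern reachable by one relaxation, with its penalty."""
    parents = {parent for parent, _, _ in p}
    for edge in p:
        parent, child, axis = edge
        if axis == "pc":
            yield (p - {edge}) | {(parent, child, "ad")}, GENERALIZE_COST
        if child not in parents:             # child is a leaf of the pattern
            yield p - {edge}, DELETE_LEAF_COST


def cheapest_relaxations(query: Pattern, k: int) -> List[Tuple[float, Pattern]]:
    """Return the k least-penalized patterns, starting with the query itself."""
    best: Dict[Pattern, float] = {query: 0.0}
    tie = count()                            # tie-breaker so patterns are never compared
    heap = [(0.0, next(tie), query)]
    ranked: List[Tuple[float, Pattern]] = []
    while heap and len(ranked) < k:
        cost, _, p = heapq.heappop(heap)
        if cost > best[p]:
            continue                         # stale entry, a cheaper path was found
        ranked.append((cost, p))
        for q, step in relax_once(p):
            if cost + step < best.get(q, float("inf")):
                best[q] = cost + step
                heapq.heappush(heap, (cost + step, next(tie), q))
    return ranked


# Hypothetical query: book/title and book/author
query = frozenset({("book", "title", "pc"), ("book", "author", "pc")})
top4 = cheapest_relaxations(query, 4)
```

Answers matched by a pattern earlier in this list would rank above answers that only match a more heavily relaxed pattern. A top-k evaluation could stop as soon as k answers have been found, without evaluating the remaining relaxations.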
KEY WORDS XML, structural join, approximate, relaxation, top-k, query
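Contents
Abstract (Chinese)
Abstract (English)
Preface
Chapter 1 Introduction
1.1 Background
1.2 Research Approach and Main Contents
1.3 Purpose and Significance of the Research
Chapter 2 Background Knowledge
2.1 XML Document Tree Data Model
2.2 XML Document Tree Index Structures and Their Characteristics
2.3 The XPath Query Language
2.4 Query Pattern Trees
Chapter 3 Research and Implementation of Exact XML Query Methods
3.1 Overview of Exact Query Methods
3.2 Building the XML Document Tree
3.3 Structural Join Algorithms
3.3.1 The Stack-Tree-Desc Algorithm
3.3.2 Analysis of the Stack-Tree-Desc Algorithm
3.3.3 Implementation of the Stack-Tree-Desc Algorithm
3.3.4 Example Run of the Program
3.3.5 Further Improvements to the Algorithm
Chapter 4 Research on Approximate XML Query Methods
4.1 Overview of Approximate Queries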