From 21e8df5c109e4dd4f50bcebdebf8e4c4ce297560 Mon Sep 17 00:00:00 2001
From: ser
Date: Thu, 19 May 2005 02:58:11 +0000
Subject: Merged in development from the main REXML repository.

* Fixed bug #34, typo in xpath_parser.
* Previous fix, (include? -> includes?) was incorrect.
* Added another test for encoding
* Started AnyName support in RelaxNG
* Added Element#Attributes#to_a, so that it does something intelligent.
  This was needed by XPath, for '@*'
* Fixed XPath so that @* works.
* Added xmlgrep to the bin/ directory. A little tool allowing you to grep
  for XPaths in an XML document.
* Fixed a CDATA pretty-printing bug. (#39)
* Fixed a buffering bug in Source.rb that affected the SAX parser.
  This bug was related to how REXML determines the encoding of a file, and
  evinced itself by hanging on input when using the SAX parser.
* The unit test for the previous patch. Forgot to commit it.
* Minor pretty printing fix.
* Applied Curt Sampson's optimization improvements
* Issue #9; 3.1.3: The SAX parser was not denormalizing entity references
  in incoming text. All declared internal entities, as well as numeric
  entities, should now be denormalized. There was a related bug in that the
  SAX parser was actually double-encoding entities; this is also fixed.
* bin/* programs should now be executable. Setting bin apps to executable
* Issue 14; 3.1.3: DTD events are now all being passed by StreamParser.
  Some of the DTD events were not being passed through by the stream parser.
* #26: Element#add_element(nil) now raises an error.
  Changed XPath searches so that if a non-Hash is passed, an error is raised.
  Fixed a spurious undefined method error in encoding.
  #29: XPath ordering bug fixed by Mark Williams. Incidentally, Mark supplied
  a superlative bug report, including a full unit test. Then he went ahead
  and fixed the bug. It doesn't get any better than this, folks.
* Fixed a broken link. Thanks to Dick Davies for pointing it out.
  Added functions courtesy of Michael Neumann. Example code to follow.
* Added Michael's sample code.
  Merged the changes in from branches/xpath_V
* Fixed preceding:: and following:: axis.
  Fixed the ordering bug that Martin Fowler reported.
* Uncommented some code commented for testing.
  Applied Nobu's changes to the Encoding infrastructure, which should fix
  potential threading issues.
* Added more tests, and the missing syncenumerator class.
  Fixed the inheritance bug in the pull parser that James Britt found.
  Indentation changes, and changed some exceptions to runtime exceptions.
* Changes by Matz, mostly of indent -> indent_level, to avoid
  function/variable naming conflicts
* Tabs -> spaces (whitespace)

Note the addition of syncenumerator.rb. This is a stopgap, until I can work
on the class enough to get it accepted as a replacement for the
SyncEnumerator that comes with the Generator class. My version is orders of
magnitude faster than the Generator SyncEnumerator, but is currently missing
a couple of features of the original. Eventually, I expect this class to
migrate to another part of the source tree.
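For reference, a minimal sketch of three behaviours touched by this merge: '@*' matching every attribute, add_element(nil) raising, and the single-root check in Document. The sample document and the variable names are made up for illustration, and the sketch assumes a REXML that already contains these changes; it is not taken from the test suite.

```ruby
require 'rexml/document'

# Hypothetical sample document, used only for this illustration.
doc = REXML::Document.new('<list><item id="1" name="x"/></list>')

# '@*' selects every attribute of the matched element.
attr_values = REXML::XPath.match(doc, '//item/@*').map { |a| a.value }

# Passing nil to add_element is rejected with an error.
nil_error = begin
  doc.root.add_element(nil)
  nil
rescue RuntimeError => e
  e.message
end

# A Document accepts only one root element.
second_root_error = begin
  doc.add_element('another-root')
  nil
rescue RuntimeError => e
  e.message
end

puts attr_values.inspect
puts nil_error
puts second_root_error
```

Both error cases are rescued as RuntimeError because the code in this patch uses a bare raise with a message string.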
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@8483 b2dd03c8-39d4-4d8f-98ff-823fe69b080e --- lib/rexml/document.rb | 294 ++++++------- lib/rexml/element.rb | 39 +- lib/rexml/functions.rb | 61 ++- lib/rexml/instruction.rb | 4 + lib/rexml/node.rb | 26 ++ lib/rexml/parsers/pullparser.rb | 59 ++- lib/rexml/parsers/sax2parser.rb | 18 +- lib/rexml/parsers/streamparser.rb | 5 +- lib/rexml/parsers/xpathparser.rb | 9 +- lib/rexml/rexml.rb | 11 +- lib/rexml/syncenumerator.rb | 33 ++ lib/rexml/text.rb | 542 ++++++++++++------------ lib/rexml/xmldecl.rb | 4 + lib/rexml/xpath.rb | 123 +++--- lib/rexml/xpath_parser.rb | 848 +++++++++++++++++++++++--------------- 15 files changed, 1210 insertions(+), 866 deletions(-) create mode 100644 lib/rexml/syncenumerator.rb (limited to 'lib') diff --git a/lib/rexml/document.rb b/lib/rexml/document.rb index 39360d4f4a..8755e04de1 100644 --- a/lib/rexml/document.rb +++ b/lib/rexml/document.rb @@ -16,164 +16,166 @@ module REXML # Document has a single child that can be accessed by root(). # Note that if you want to have an XML declaration written for a document # you create, you must add one; REXML documents do not write a default - # declaration for you. See |DECLARATION| and |write|. - class Document < Element - # A convenient default XML declaration. If you want an XML declaration, - # the easiest way to add one is mydoc << Document::DECLARATION + # declaration for you. See |DECLARATION| and |write|. + class Document < Element + # A convenient default XML declaration. If you want an XML declaration, + # the easiest way to add one is mydoc << Document::DECLARATION # +DEPRECATED+ # Use: mydoc << XMLDecl.default - DECLARATION = XMLDecl.default - - # Constructor - # @param source if supplied, must be a Document, String, or IO. - # Documents have their context and Element attributes cloned. - # Strings are expected to be valid XML documents. IOs are expected - # to be sources of valid XML documents. 
- # @param context if supplied, contains the context of the document; - # this should be a Hash. - # NOTE that I'm not sure what the context is for; I cloned it out of - # the Electric XML API (in which it also seems to do nothing), and it - # is now legacy. It may do something, someday... it may disappear. - def initialize( source = nil, context = {} ) - super() - @context = context - return if source.nil? - if source.kind_of? Document - @context = source.context - super source - else - build( source ) - end - end + DECLARATION = XMLDecl.default + + # Constructor + # @param source if supplied, must be a Document, String, or IO. + # Documents have their context and Element attributes cloned. + # Strings are expected to be valid XML documents. IOs are expected + # to be sources of valid XML documents. + # @param context if supplied, contains the context of the document; + # this should be a Hash. + # NOTE that I'm not sure what the context is for; I cloned it out of + # the Electric XML API (in which it also seems to do nothing), and it + # is now legacy. It may do something, someday... it may disappear. + def initialize( source = nil, context = {} ) + super() + @context = context + return if source.nil? + if source.kind_of? Document + @context = source.context + super source + else + build( source ) + end + end def node_type :document end - # Should be obvious - def clone - Document.new self - end - - # According to the XML spec, a root node has no expanded name - def expanded_name - '' - #d = doc_type - #d ? d.name : "UNDEFINED" - end - - alias :name :expanded_name - - # We override this, because XMLDecls and DocTypes must go at the start - # of the document - def add( child ) - if child.kind_of? XMLDecl - @children.unshift child - elsif child.kind_of? DocType - if @children[0].kind_of? 
XMLDecl - @children[1,0] = child - else - @children.unshift child - end - child.parent = self - else - rv = super - raise "attempted adding second root element to document" if @elements.size > 1 - rv - end - end - alias :<< :add - - def add_element(arg=nil, arg2=nil) - rv = super - raise "attempted adding second root element to document" if @elements.size > 1 - rv - end - - # @return the root Element of the document, or nil if this document - # has no children. - def root - @children.find { |item| item.kind_of? Element } - end - - # @return the DocType child of the document, if one exists, - # and nil otherwise. - def doctype - @children.find { |item| item.kind_of? DocType } - end - - # @return the XMLDecl of this document; if no XMLDecl has been - # set, the default declaration is returned. - def xml_decl - rv = @children[0] + # Should be obvious + def clone + Document.new self + end + + # According to the XML spec, a root node has no expanded name + def expanded_name + '' + #d = doc_type + #d ? d.name : "UNDEFINED" + end + + alias :name :expanded_name + + # We override this, because XMLDecls and DocTypes must go at the start + # of the document + def add( child ) + if child.kind_of? XMLDecl + @children.unshift child + elsif child.kind_of? DocType + if @children[0].kind_of? XMLDecl + @children[1,0] = child + else + @children.unshift child + end + child.parent = self + else + rv = super + raise "attempted adding second root element to document" if @elements.size > 1 + rv + end + end + alias :<< :add + + def add_element(arg=nil, arg2=nil) + rv = super + raise "attempted adding second root element to document" if @elements.size > 1 + rv + end + + # @return the root Element of the document, or nil if this document + # has no children. + def root + elements[1] + #self + #@children.find { |item| item.kind_of? Element } + end + + # @return the DocType child of the document, if one exists, + # and nil otherwise. + def doctype + @children.find { |item| item.kind_of? 
DocType } + end + + # @return the XMLDecl of this document; if no XMLDecl has been + # set, the default declaration is returned. + def xml_decl + rv = @children[0] return rv if rv.kind_of? XMLDecl rv = @children.unshift(XMLDecl.default)[0] - end - - # @return the XMLDecl version of this document as a String. - # If no XMLDecl has been set, returns the default version. - def version - xml_decl().version - end - - # @return the XMLDecl encoding of this document as a String. - # If no XMLDecl has been set, returns the default encoding. - def encoding - xml_decl().encoding - end - - # @return the XMLDecl standalone value of this document as a String. - # If no XMLDecl has been set, returns the default setting. - def stand_alone? - xml_decl().stand_alone? - end - - # Write the XML tree out, optionally with indent. This writes out the - # entire XML document, including XML declarations, doctype declarations, - # and processing instructions (if any are given). - # A controversial point is whether Document should always write the XML - # declaration () whether or not one is given by the - # user (or source document). REXML does not write one if one was not - # specified, because it adds unneccessary bandwidth to applications such - # as XML-RPC. - # - # - # output:: - # output an object which supports '<< string'; this is where the - # document will be written. - # indent:: - # An integer. If -1, no indenting will be used; otherwise, the - # indentation will be this number of spaces, and children will be - # indented an additional amount. Defaults to -1 - # transitive:: - # If transitive is true and indent is >= 0, then the output will be - # pretty-printed in such a way that the added whitespace does not affect - # the absolute *value* of the document -- that is, it leaves the value - # and number of Text nodes in the document unchanged. 
- # ie_hack:: - # Internet Explorer is the worst piece of crap to have ever been - # written, with the possible exception of Windows itself. Since IE is - # unable to parse proper XML, we have to provide a hack to generate XML - # that IE's limited abilities can handle. This hack inserts a space - # before the /> on empty tags. Defaults to false - def write( output=$stdout, indent_level=-1, transitive=false, ie_hack=false ) - output = Output.new( output, xml_decl.encoding ) if xml_decl.encoding != "UTF-8" && !output.kind_of?(Output) - @children.each { |node| - indent( output, indent_level ) if node.node_type == :element - if node.write( output, indent_level, transitive, ie_hack ) + end + + # @return the XMLDecl version of this document as a String. + # If no XMLDecl has been set, returns the default version. + def version + xml_decl().version + end + + # @return the XMLDecl encoding of this document as a String. + # If no XMLDecl has been set, returns the default encoding. + def encoding + xml_decl().encoding + end + + # @return the XMLDecl standalone value of this document as a String. + # If no XMLDecl has been set, returns the default setting. + def stand_alone? + xml_decl().stand_alone? + end + + # Write the XML tree out, optionally with indent. This writes out the + # entire XML document, including XML declarations, doctype declarations, + # and processing instructions (if any are given). + # A controversial point is whether Document should always write the XML + # declaration () whether or not one is given by the + # user (or source document). REXML does not write one if one was not + # specified, because it adds unneccessary bandwidth to applications such + # as XML-RPC. + # + # + # output:: + # output an object which supports '<< string'; this is where the + # document will be written. + # indent:: + # An integer. 
If -1, no indenting will be used; otherwise, the + # indentation will be this number of spaces, and children will be + # indented an additional amount. Defaults to -1 + # transitive:: + # If transitive is true and indent is >= 0, then the output will be + # pretty-printed in such a way that the added whitespace does not affect + # the absolute *value* of the document -- that is, it leaves the value + # and number of Text nodes in the document unchanged. + # ie_hack:: + # Internet Explorer is the worst piece of crap to have ever been + # written, with the possible exception of Windows itself. Since IE is + # unable to parse proper XML, we have to provide a hack to generate XML + # that IE's limited abilities can handle. This hack inserts a space + # before the /> on empty tags. Defaults to false + def write( output=$stdout, indent_level=-1, transitive=false, ie_hack=false ) + output = Output.new( output, xml_decl.encoding ) if xml_decl.encoding != "UTF-8" && !output.kind_of?(Output) + @children.each { |node| + indent( output, indent_level ) if node.node_type == :element + if node.write( output, indent_level, transitive, ie_hack ) output << "\n" unless indent_level<0 or node == @children[-1] end - } - end + } + end - - def Document::parse_stream( source, listener ) - Parsers::StreamParser.new( source, listener ).parse - end + + def Document::parse_stream( source, listener ) + Parsers::StreamParser.new( source, listener ).parse + end - private - def build( source ) + private + def build( source ) Parsers::TreeParser.new( source, self ).parse - end - end + end + end end diff --git a/lib/rexml/element.rb b/lib/rexml/element.rb index e18f0b28c7..7f578ecb3d 100644 --- a/lib/rexml/element.rb +++ b/lib/rexml/element.rb @@ -6,6 +6,14 @@ require "rexml/xpath" require "rexml/parseexception" module REXML + # An implementation note about namespaces: + # As we parse, when we find namespaces we put them in a hash and assign + # them a unique ID. 
We then convert the namespace prefix for the node + # to the unique ID. This makes namespace lookup much faster for the + # cost of extra memory use. We save the namespace prefix for the + # context node and convert it back when we write it. + @@namespaces = {} + # Represents a tagged XML element. Elements are characterized by # having children, attributes, and names, and can themselves be # children. @@ -91,19 +99,35 @@ module REXML Element.new self end - # Evaluates to the root element of the document that this element + # Evaluates to the root node of the document that this element # belongs to. If this element doesn't belong to a document, but does # belong to another Element, the parent's root will be returned, until the # earliest ancestor is found. + # + # Note that this is not the same as the document element. + # In the following example, <a> is the document element, and the root + # node is the parent node of the document element. You may ask yourself + # why the root node is useful: consider the doctype and XML declaration, + # and any processing instructions before the document element... they + # are children of the root node, or siblings of the document element. + # The only time this isn't true is when an Element is created that is + # not part of any Document. In this case, the ancestor that has no + # parent acts as the root node. # d = Document.new '<a><b><c/></b></a>' # a = d[1] ; c = a[1][1] - # d.root # These all evaluate to the same Element, - # a.root # namely, <a> - # c.root # - def root - parent.nil? ? self : parent.root + # d.root_node == d # TRUE + # a.root_node # namely, d + # c.root_node # again, d + def root_node + parent.nil? ? self : parent.root_node end + def root + return elements[1] if self.kind_of? Document + return self if parent.kind_of? Document or parent.nil? + return parent.root + end + # Evaluates to the document to which this element belongs, or nil if this # element doesn't belong to a document. 
def document @@ -270,7 +294,8 @@ module REXML # el = doc.add_element 'my-tag', {'attr1'=>'val1', 'attr2'=>'val2'} # el = Element.new 'my-tag' # doc.add_element el - def add_element element=nil, attrs=nil + def add_element element, attrs=nil + raise "First argument must be either an element name, or an Element object" if element.nil? el = @elements.add(element) if attrs.kind_of? Hash attrs.each do |key, value| diff --git a/lib/rexml/functions.rb b/lib/rexml/functions.rb index 9cbff99537..7a2fb996a0 100644 --- a/lib/rexml/functions.rb +++ b/lib/rexml/functions.rb @@ -7,41 +7,33 @@ module REXML # Therefore, in XML, "local-name()" is identical (and actually becomes) # "local_name()" module Functions - @@node = nil - @@index = nil - @@size = nil - @@variables = {} + @@context = nil @@namespace_context = {} + @@variables = {} - def Functions::node=(value); @@node = value; end - def Functions::index=(value); @@index = value; end - def Functions::size=(value); @@size = value; end - def Functions::variables=(value); @@variables = value; end - def Functions::namespace_context=(value) - @@namespace_context = value - end - def Functions::node; @@node; end - def Functions::index; @@index; end - def Functions::size; @@size; end - def Functions::variables; @@variables; end - def Functions::namespace_context; @@namespace_context; end + def Functions::namespace_context=(x) ; @@namespace_context=x ; end + def Functions::variables=(x) ; @@variables=x ; end + def Functions::namespace_context ; @@namespace_context ; end + def Functions::variables ; @@variables ; end + + def Functions::context=(value); @@context = value; end def Functions::text( ) - if @@node.node_type == :element - return @@node.text - elsif @@node.node_type == :text - return @@node.value + if @@context[:node].node_type == :element + return @@context[:node].find_all{|n| n.node_type == :text}.collect{|n| n.value} + elsif @@context[:node].node_type == :text + return @@context[:node].value else return false end end def 
Functions::last( ) - @@size + @@context[:size] end def Functions::position( ) - @@index + @@context[:index] end def Functions::count( node_set ) @@ -73,7 +65,7 @@ module REXML # Helper method. def Functions::get_namespace( node_set = nil ) if node_set == nil - yield @@node if defined? @@node.namespace + yield @@context[:node] if defined? @@context[:node].namespace else if node_set.namespace yield node_set @@ -214,7 +206,7 @@ module REXML # UNTESTED def Functions::normalize_space( string=nil ) - string = string(@@node) if string.nil? + string = string(@@context[:node]) if string.nil? if string.kind_of? Array string.collect{|x| string.to_s.strip.gsub(/\s+/um, ' ') if string} else @@ -291,7 +283,7 @@ module REXML # UNTESTED def Functions::lang( language ) lang = false - node = @@node + node = @@context[:node] attr = nil until node.nil? if node.node_type == :element @@ -325,15 +317,16 @@ module REXML # an object of a type other than the four basic types is converted to a # number in a way that is dependent on that type def Functions::number( object=nil ) - object = @@node unless object - if object == true + object = @@context[:node] unless object + case object + when true Float(1) - elsif object == false + when false Float(0) - elsif object.kind_of? Array + when Array number(string( object )) - elsif object.kind_of? 
Float - object + when Numeric + object.to_f else str = string( object ) #puts "STRING OF #{object.inspect} = #{str}" @@ -364,9 +357,13 @@ module REXML end end + def Functions::processing_instruction( node ) + node.node_type == :processing_instruction + end + def Functions::method_missing( id ) puts "METHOD MISSING #{id.id2name}" - XPath.match( @@node, id.id2name ) + XPath.match( @@context[:node], id.id2name ) end end end diff --git a/lib/rexml/instruction.rb b/lib/rexml/instruction.rb index ebd868c95c..ed4f604c74 100644 --- a/lib/rexml/instruction.rb +++ b/lib/rexml/instruction.rb @@ -58,5 +58,9 @@ module REXML def node_type :processing_instruction end + + def inspect + "" + end end end diff --git a/lib/rexml/node.rb b/lib/rexml/node.rb index 5f414c03ef..e5dec72a9d 100644 --- a/lib/rexml/node.rb +++ b/lib/rexml/node.rb @@ -36,5 +36,31 @@ module REXML def parent? false; end + + + # Visit all subnodes of +self+ recursively + def each_recursive(&block) # :yields: node + self.elements.each {|node| + block.call(node) + node.each_recursive(&block) + } + end + + # Find (and return) first subnode (recursively) for which the block + # evaluates to true. Returns +nil+ if none was found. + def find_first_recursive(&block) # :yields: node + each_recursive {|node| + return node if block.call(node) + } + return nil + end + + # Returns the index that +self+ has in its parent's elements array, so that + # the following equation holds true: + # + # node == node.parent.elements[node.index_in_parent] + def index_in_parent + parent.index(self)+1 + end end end diff --git a/lib/rexml/parsers/pullparser.rb b/lib/rexml/parsers/pullparser.rb index fe4d41c959..0a328ea8fc 100644 --- a/lib/rexml/parsers/pullparser.rb +++ b/lib/rexml/parsers/pullparser.rb @@ -23,13 +23,13 @@ module REXML # end # # Nat Price gave me some good ideas for the API. 
- class PullParser < BaseParser + class PullParser include XMLTokens def initialize stream - super @entities = {} @listeners = nil + @parser = BaseParser.new( stream ) end def add_listener( listener ) @@ -44,21 +44,38 @@ module REXML end def peek depth=0 - PullEvent.new(super) + PullEvent.new(@parser.peek(depth)) end + def has_next? + @parser.has_next? + end + def pull - event = super + event = @parser.pull case event[0] when :entitydecl @entities[ event[1] ] = event[2] unless event[2] =~ /PUBLIC|SYSTEM/ when :text - unnormalized = unnormalize( event[1], @entities ) + unnormalized = @parser.unnormalize( event[1], @entities ) event << unnormalized end PullEvent.new( event ) end + + def unshift token + @parser.unshift token + end + + def entity reference + @parser.entity( reference ) + end + + def empty? + @parser.empty? + end + end # A parsing event. The contents of the event are accessed as an +Array?, @@ -73,44 +90,65 @@ module REXML def initialize(arg) @contents = arg end - def []( index ) - @contents[index+1] + + def []( start, endd=nil) + if start.kind_of? Range + @contents.slice( start.begin+1 .. start.end ) + elsif start.kind_of? Numeric + if endd.nil? + @contents.slice( start+1 ) + else + @contents.slice( start+1, endd ) + end + else + raise "Illegal argument #{start.inspect} (#{start.class})" + end end + def event_type @contents[0] end + # Content: [ String tag_name, Hash attributes ] def start_element? @contents[0] == :start_element end + # Content: [ String tag_name ] def end_element? @contents[0] == :end_element end + # Content: [ String raw_text, String unnormalized_text ] def text? @contents[0] == :text end + # Content: [ String text ] def instruction? @contents[0] == :processing_instruction end + # Content: [ String text ] def comment? @contents[0] == :comment end + # Content: [ String name, String pub_sys, String long_name, String uri ] def doctype? @contents[0] == :start_doctype end + # Content: [ String text ] def attlistdecl? 
@contents[0] == :attlistdecl end + # Content: [ String text ] def elementdecl? @contents[0] == :elementdecl end + # Due to the wonders of DTDs, an entity declaration can be just about # anything. There's no way to normalize it; you'll have to interpret the # content yourself. However, the following is true: @@ -121,28 +159,33 @@ module REXML def entitydecl? @contents[0] == :entitydecl end + # Content: [ String text ] def notationdecl? @contents[0] == :notationdecl end + # Content: [ String text ] def entity? @contents[0] == :entity end + # Content: [ String text ] def cdata? @contents[0] == :cdata end + # Content: [ String version, String encoding, String standalone ] def xmldecl? @contents[0] == :xmldecl end + def error? @contents[0] == :error end def inspect - @contents[0].to_s + ": " + @contents[1..-1].inspect + @contents[0].to_s + ": " + @contents[1..-1].inspect end end end diff --git a/lib/rexml/parsers/sax2parser.rb b/lib/rexml/parsers/sax2parser.rb index 96440d17bf..d5ee1bcfcd 100644 --- a/lib/rexml/parsers/sax2parser.rb +++ b/lib/rexml/parsers/sax2parser.rb @@ -12,6 +12,7 @@ module REXML @namespace_stack = [] @has_listeners = false @tag_stack = [] + @entities = {} end def add_listener( listener ) @@ -143,10 +144,21 @@ module REXML end end when :text - normalized = @parser.normalize( event[1] ) - handle( :characters, normalized ) + #normalized = @parser.normalize( event[1] ) + #handle( :characters, normalized ) + copy = event[1].clone + @entities.each { |key, value| copy = copy.gsub("&#{key};", value) } + copy.gsub!( Text::NUMERICENTITY ) {|m| + m=$1 + m = "0#{m}" if m[0] == ?x + [Integer(m)].pack('U*') + } + handle( :characters, copy ) + when :entitydecl + @entities[ event[1] ] = event[2] if event.size == 3 + handle( *event ) when :processing_instruction, :comment, :doctype, :attlistdecl, - :elementdecl, :entitydecl, :cdata, :notationdecl, :xmldecl + :elementdecl, :cdata, :notationdecl, :xmldecl handle( *event ) end end diff --git 
a/lib/rexml/parsers/streamparser.rb b/lib/rexml/parsers/streamparser.rb index 357cc186e6..996d613e15 100644 --- a/lib/rexml/parsers/streamparser.rb +++ b/lib/rexml/parsers/streamparser.rb @@ -31,9 +31,8 @@ module REXML @listener.instruction( *event[1,2] ) when :start_doctype @listener.doctype( *event[1..-1] ) - when :notationdecl, :entitydecl, :elementdecl - @listener.notationdecl( event[1..-1] ) - when :comment, :attlistdecl, :elementdecl, :cdata, :xmldecl + when :comment, :attlistdecl, :notationdecl, :elementdecl, + :entitydecl, :cdata, :xmldecl, :attlistdecl @listener.send( event[0].to_s, *event[1..-1] ) end end diff --git a/lib/rexml/parsers/xpathparser.rb b/lib/rexml/parsers/xpathparser.rb index 41b2b8a5c1..6bac852d6b 100644 --- a/lib/rexml/parsers/xpathparser.rb +++ b/lib/rexml/parsers/xpathparser.rb @@ -20,7 +20,7 @@ module REXML path.gsub!(/([\(\[])\s+/, '\1') # Strip ignorable spaces path.gsub!( /\s+([\]\)])/, '\1' ) parsed = [] - path = LocationPath(path, parsed) + path = OrExpr(path, parsed) parsed end @@ -302,7 +302,7 @@ module REXML path = path[1..-1] end parsed << :processing_instruction - parsed << literal + parsed << (literal || '') when NCNAMETEST #puts "NCNAMETEST" prefix = $1 @@ -589,9 +589,10 @@ module REXML when /^(\w[-\w]*)(?:\()/ #puts "PrimaryExpr :: Function >>> #$1 -- '#$''" fname = $1 - path = $' + tmp = $' #puts "#{fname} =~ #{NT.inspect}" - #return nil if fname =~ NT + return path if fname =~ NT + path = tmp parsed << :function parsed << fname path = FunctionCall(path, parsed) diff --git a/lib/rexml/rexml.rb b/lib/rexml/rexml.rb index bf905b20e2..00fd50ad02 100644 --- a/lib/rexml/rexml.rb +++ b/lib/rexml/rexml.rb @@ -10,8 +10,8 @@ # # Main page:: http://www.germane-software.com/software/rexml # Author:: Sean Russell -# Version:: 3.1.1 -# Date:: +2004/162 +# Version:: 3.1.3 +# Date:: +2005/139 # # This API documentation can be downloaded from the REXML home page, or can # be accessed 
online[http://www.germane-software.com/software/rexml_doc] @@ -20,8 +20,7 @@ # or can be accessed # online[http://www.germane-software.com/software/rexml/docs/tutorial.html] module REXML - Copyright = "Copyright © 2001, 2002, 2003, 2004 Sean Russell " - Date = "+2004/186" - Version = "3.1.2" - + Copyright = "Copyright © 2001-2005 Sean Russell " + Date = "+2005/139" + Version = "3.1.3" end diff --git a/lib/rexml/syncenumerator.rb b/lib/rexml/syncenumerator.rb new file mode 100644 index 0000000000..955e006cb2 --- /dev/null +++ b/lib/rexml/syncenumerator.rb @@ -0,0 +1,33 @@ +module REXML + class SyncEnumerator + include Enumerable + + # Creates a new SyncEnumerator which enumerates rows of given + # Enumerable objects. + def initialize(*enums) + @gens = enums + @biggest = @gens[0] + @gens.each {|x| @biggest = x if x.size > @biggest.size } + end + + # Returns the number of enumerated Enumerable objects, i.e. the size + # of each row. + def size + @gens.size + end + + # Returns the number of enumerated Enumerable objects, i.e. the size + # of each row. + def length + @gens.length + end + + # Enumerates rows of the Enumerable objects. 
+ def each + @biggest.zip( *@gens ) {|a| + yield(*a[1..-1]) + } + self + end + end +end diff --git a/lib/rexml/text.rb b/lib/rexml/text.rb index 3e5fcc23b6..9a83121af8 100644 --- a/lib/rexml/text.rb +++ b/lib/rexml/text.rb @@ -5,180 +5,182 @@ require 'rexml/doctype' require 'rexml/parseexception' module REXML - # Represents text nodes in an XML document - class Text < Child - include Comparable - # The order in which the substitutions occur - SPECIALS = [ /&(?!#?[\w-]+;)/u, /</u, />/u, /"/u, /'/u, /\r/u ] - SUBSTITUTES = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;', '&#13;'] - # Characters which are substituted in written strings - SLAICEPS = [ '<', '>', '"', "'", '&' ] - SETUTITSBUS = [ /&lt;/u, /&gt;/u, /&quot;/u, /&apos;/u, /&amp;/u ] + # Represents text nodes in an XML document + class Text < Child + include Comparable + # The order in which the substitutions occur + SPECIALS = [ /&(?!#?[\w-]+;)/u, /</u, />/u, /"/u, /'/u, /\r/u ] + SUBSTITUTES = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;', '&#13;'] + # Characters which are substituted in written strings + SLAICEPS = [ '<', '>', '"', "'", '&' ] + SETUTITSBUS = [ /&lt;/u, /&gt;/u, /&quot;/u, /&apos;/u, /&amp;/u ] - # If +raw+ is true, then REXML leaves the value alone - attr_accessor :raw + # If +raw+ is true, then REXML leaves the value alone + attr_accessor :raw - ILLEGAL = /(<|&(?!(#{Entity::NAME})|(#0*((?:\d+)|(?:x[a-fA-F0-9]+)));))/um - NUMERICENTITY = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ + ILLEGAL = /(<|&(?!(#{Entity::NAME})|(#0*((?:\d+)|(?:x[a-fA-F0-9]+)));))/um + NUMERICENTITY = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/ - # Constructor - # +arg+ if a String, the content is set to the String. If a Text, - # the object is shallowly cloned. - # - # +respect_whitespace+ (boolean, false) if true, whitespace is - # respected - # - # +parent+ (nil) if this is a Parent object, the parent - # will be set to this. - # - # +raw+ (nil) This argument can be given three values. - # If true, then the value of used to construct this object is expected to - # contain no unescaped XML markup, and REXML will not change the text. 
If - # this value is false, the string may contain any characters, and REXML will - # escape any and all defined entities whose values are contained in the - # text. If this value is nil (the default), then the raw value of the - # parent will be used as the raw value for this node. If there is no raw - # value for the parent, and no value is supplied, the default is false. - # Text.new( "<&", false, nil, false ) #-> "&lt;&amp;" - # Text.new( "<&", false, nil, true ) #-> IllegalArgumentException - # Text.new( "&lt;&amp;", false, nil, true ) #-> "&lt;&amp;" - # # Assume that the entity "s" is defined to be "sean" - # # and that the entity "r" is defined to be "russell" - # Text.new( "sean russell" ) #-> "&s; &r;" - # Text.new( "sean russell", false, nil, true ) #-> "sean russell" - # - # +entity_filter+ (nil) This can be an array of entities to match in the - # supplied text. This argument is only useful if +raw+ is set to false. - # Text.new( "sean russell", false, nil, false, ["s"] ) #-> "&s; russell" - # Text.new( "sean russell", false, nil, true, ["s"] ) #-> "sean russell" - # In the last example, the +entity_filter+ argument is ignored. - # - # +pattern+ INTERNAL USE ONLY - def initialize(arg, respect_whitespace=false, parent=nil, raw=nil, - entity_filter=nil, illegal=ILLEGAL ) + # Constructor + # +arg+ if a String, the content is set to the String. If a Text, + # the object is shallowly cloned. + # + # +respect_whitespace+ (boolean, false) if true, whitespace is + # respected + # + # +parent+ (nil) if this is a Parent object, the parent + # will be set to this. + # + # +raw+ (nil) This argument can be given three values. + # If true, then the value of used to construct this object is expected to + # contain no unescaped XML markup, and REXML will not change the text. If + # this value is false, the string may contain any characters, and REXML will + # escape any and all defined entities whose values are contained in the + # text. 
If this value is nil (the default), then the raw value of the + # parent will be used as the raw value for this node. If there is no raw + # value for the parent, and no value is supplied, the default is false. + # Text.new( "<&", false, nil, false ) #-> "&lt;&amp;" + # Text.new( "<&", false, nil, true ) #-> IllegalArgumentException + # Text.new( "&lt;&amp;", false, nil, true ) #-> "&lt;&amp;" + # # Assume that the entity "s" is defined to be "sean" + # # and that the entity "r" is defined to be "russell" + # Text.new( "sean russell" ) #-> "&s; &r;" + # Text.new( "sean russell", false, nil, true ) #-> "sean russell" + # + # +entity_filter+ (nil) This can be an array of entities to match in the + # supplied text. This argument is only useful if +raw+ is set to false. + # Text.new( "sean russell", false, nil, false, ["s"] ) #-> "&s; russell" + # Text.new( "sean russell", false, nil, true, ["s"] ) #-> "sean russell" + # In the last example, the +entity_filter+ argument is ignored. + # + # +pattern+ INTERNAL USE ONLY + def initialize(arg, respect_whitespace=false, parent=nil, raw=nil, + entity_filter=nil, illegal=ILLEGAL ) - @raw = false + @raw = false - if parent - super( parent ) - @raw = parent.raw - else - @parent = nil - end + if parent + super( parent ) + @raw = parent.raw + else + @parent = nil + end - @raw = raw unless raw.nil? - @entity_filter = entity_filter - @normalized = @unnormalized = nil + @raw = raw unless raw.nil? + @entity_filter = entity_filter + @normalized = @unnormalized = nil - if arg.kind_of? String - @string = arg.clone - @string.squeeze!(" \n\t") unless respect_whitespace - elsif arg.kind_of? Text - @string = arg.to_s - @raw = arg.raw - elsif - raise Exception.new( "Illegal argument of type #{arg.type} for Text constructor (#{arg})" ) - end + if arg.kind_of? String + @string = arg.clone + @string.squeeze!(" \n\t") unless respect_whitespace + elsif arg.kind_of?
Text + @string = arg.to_s + @raw = arg.raw + elsif + raise "Illegal argument of type #{arg.type} for Text constructor (#{arg})" + end - @string.gsub!( /\r\n?/, "\n" ) + @string.gsub!( /\r\n?/, "\n" ) - # check for illegal characters - if @raw - if @string =~ illegal - raise Exception.new( - "Illegal character '#{$1}' in raw string \"#{@string}\"" - ) - end - end - end + # check for illegal characters + if @raw + if @string =~ illegal + raise "Illegal character '#{$1}' in raw string \"#{@string}\"" + end + end + end - def node_type - :text - end + def node_type + :text + end - def empty? - @string.size==0 - end + def empty? + @string.size==0 + end - def clone - return Text.new(self) - end + def clone + return Text.new(self) + end - # Appends text to this text node. The text is appended in the +raw+ mode - # of this text node. - def <<( to_append ) - @string << to_append.gsub( /\r\n?/, "\n" ) - end + # Appends text to this text node. The text is appended in the +raw+ mode + # of this text node. + def <<( to_append ) + @string << to_append.gsub( /\r\n?/, "\n" ) + end - # +other+ a String or a Text - # +returns+ the result of (to_s <=> arg.to_s) - def <=>( other ) - to_s() <=> other.to_s - end + # +other+ a String or a Text + # +returns+ the result of (to_s <=> arg.to_s) + def <=>( other ) + to_s() <=> other.to_s + end - REFERENCE = /#{Entity::REFERENCE}/ - # Returns the string value of this text node. This string is always - # escaped, meaning that it is a valid XML text node string, and all - # entities that can be escaped, have been inserted. This method respects - # the entity filter set in the constructor. 
- # - # # Assume that the entity "s" is defined to be "sean", and that the - # # entity "r" is defined to be "russell" - # t = Text.new( "< & sean russell", false, nil, false, ['s'] ) - # t.to_s #-> "< & &s; russell" - # t = Text.new( "< & &s; russell", false, nil, false ) - # t.to_s #-> "< & &s; russell" - # u = Text.new( "sean russell", false, nil, true ) - # u.to_s #-> "sean russell" - def to_s - return @string if @raw - return @normalized if @normalized + REFERENCE = /#{Entity::REFERENCE}/ + # Returns the string value of this text node. This string is always + # escaped, meaning that it is a valid XML text node string, and all + # entities that can be escaped, have been inserted. This method respects + # the entity filter set in the constructor. + # + # # Assume that the entity "s" is defined to be "sean", and that the + # # entity "r" is defined to be "russell" + # t = Text.new( "< & sean russell", false, nil, false, ['s'] ) + # t.to_s #-> "< & &s; russell" + # t = Text.new( "< & &s; russell", false, nil, false ) + # t.to_s #-> "< & &s; russell" + # u = Text.new( "sean russell", false, nil, true ) + # u.to_s #-> "sean russell" + def to_s + return @string if @raw + return @normalized if @normalized - doctype = nil - if @parent - doc = @parent.document - doctype = doc.doctype if doc - end + doctype = nil + if @parent + doc = @parent.document + doctype = doc.doctype if doc + end - @normalized = Text::normalize( @string, doctype, @entity_filter ) - end + @normalized = Text::normalize( @string, doctype, @entity_filter ) + end - # Returns the string value of this text. This is the text without - # entities, as it might be used programmatically, or printed to the - # console. This ignores the 'raw' attribute setting, and any - # entity_filter. 
- # - # # Assume that the entity "s" is defined to be "sean", and that the - # # entity "r" is defined to be "russell" - # t = Text.new( "< & sean russell", false, nil, false, ['s'] ) - # t.string #-> "< & sean russell" - # t = Text.new( "< & &s; russell", false, nil, false ) - # t.string #-> "< & sean russell" - # u = Text.new( "sean russell", false, nil, true ) - # u.string #-> "sean russell" - def value - @unnormalized if @unnormalized - doctype = nil - if @parent - doc = @parent.document - doctype = doc.doctype if doc - end - @unnormalized = Text::unnormalize( @string, doctype ) - end - - def wrap(string, width, addnewline=false) - # Recursivly wrap string at width. - return string if string.length <= width - place = string.rindex(' ', width) # Position in string with last ' ' before cutoff - if addnewline then - return "\n" + string[0,place] + "\n" + wrap(string[place+1..-1], width) - else - return string[0,place] + "\n" + wrap(string[place+1..-1], width) - end - end + def inspect + @string.inspect + end + + # Returns the string value of this text. This is the text without + # entities, as it might be used programmatically, or printed to the + # console. This ignores the 'raw' attribute setting, and any + # entity_filter. + # + # # Assume that the entity "s" is defined to be "sean", and that the + # # entity "r" is defined to be "russell" + # t = Text.new( "< & sean russell", false, nil, false, ['s'] ) + # t.string #-> "< & sean russell" + # t = Text.new( "< & &s; russell", false, nil, false ) + # t.string #-> "< & sean russell" + # u = Text.new( "sean russell", false, nil, true ) + # u.string #-> "sean russell" + def value + @unnormalized if @unnormalized + doctype = nil + if @parent + doc = @parent.document + doctype = doc.doctype if doc + end + @unnormalized = Text::unnormalize( @string, doctype ) + end + + def wrap(string, width, addnewline=false) + # Recursivly wrap string at width. 
+ return string if string.length <= width + place = string.rindex(' ', width) # Position in string with last ' ' before cutoff + if addnewline then + return "\n" + string[0,place] + "\n" + wrap(string[place+1..-1], width) + else + return string[0,place] + "\n" + wrap(string[place+1..-1], width) + end + end # Sets the contents of this text node. This expects the text to be # unnormalized. It returns self. @@ -188,26 +190,26 @@ module REXML # e[0].value = "bar" # bar # e[0].value = "" # <a> def value=( val ) - @string = val.gsub( /\r\n?/, "\n" ) + @string = val.gsub( /\r\n?/, "\n" ) @unnormalized = nil @normalized = nil @raw = false end - def indent_text(string, level=1, style="\t", indentfirstline=true) + def indent_text(string, level=1, style="\t", indentfirstline=true) return string if level < 0 - new_string = '' - string.each { |line| - indent_string = style * level - new_line = (indent_string + line).sub(/[\s]+$/,'') - new_string << new_line - } - new_string.strip! unless indentfirstline - return new_string - end + new_string = '' + string.each { |line| + indent_string = style * level + new_line = (indent_string + line).sub(/[\s]+$/,'') + new_string << new_line + } + new_string.strip! unless indentfirstline + return new_string + end - def write( writer, indent=-1, transitive=false, ie_hack=false ) - s = to_s() + def write( writer, indent=-1, transitive=false, ie_hack=false ) + s = to_s() if not (@parent and @parent.whitespace) then s = wrap(s, 60, false) if @parent and @parent.context[:wordwrap] == :all if @parent and not @parent.context[:indentstyle].nil? and indent > 0 and s.count("\n") > 0 @@ -216,7 +218,7 @@ module REXML s.squeeze!(" \n\t") if @parent and !@parent.whitespace end writer << s - end + end # FIXME # This probably won't work properly @@ -226,111 +228,111 @@ module REXML return path end - # Writes out text, substituting special characters beforehand. 
- # +out+ A String, IO, or any other object supporting <<( String ) - # +input+ the text to substitute and the write out - # - # z=utf8.unpack("U*") - # ascOut="" - # z.each{|r| - # if r < 0x100 - # ascOut.concat(r.chr) - # else - # ascOut.concat(sprintf("&#x%x;", r)) - # end - # } - # puts ascOut - def write_with_substitution out, input - copy = input.clone - # Doing it like this rather than in a loop improves the speed - copy.gsub!( SPECIALS[0], SUBSTITUTES[0] ) - copy.gsub!( SPECIALS[1], SUBSTITUTES[1] ) - copy.gsub!( SPECIALS[2], SUBSTITUTES[2] ) - copy.gsub!( SPECIALS[3], SUBSTITUTES[3] ) - copy.gsub!( SPECIALS[4], SUBSTITUTES[4] ) - copy.gsub!( SPECIALS[5], SUBSTITUTES[5] ) - out << copy - end + # Writes out text, substituting special characters beforehand. + # +out+ A String, IO, or any other object supporting <<( String ) + # +input+ the text to substitute and the write out + # + # z=utf8.unpack("U*") + # ascOut="" + # z.each{|r| + # if r < 0x100 + # ascOut.concat(r.chr) + # else + # ascOut.concat(sprintf("&#x%x;", r)) + # end + # } + # puts ascOut + def write_with_substitution out, input + copy = input.clone + # Doing it like this rather than in a loop improves the speed + copy.gsub!( SPECIALS[0], SUBSTITUTES[0] ) + copy.gsub!( SPECIALS[1], SUBSTITUTES[1] ) + copy.gsub!( SPECIALS[2], SUBSTITUTES[2] ) + copy.gsub!( SPECIALS[3], SUBSTITUTES[3] ) + copy.gsub!( SPECIALS[4], SUBSTITUTES[4] ) + copy.gsub!( SPECIALS[5], SUBSTITUTES[5] ) + out << copy + end - # Reads text, substituting entities - def Text::read_with_substitution( input, illegal=nil ) - copy = input.clone + # Reads text, substituting entities + def Text::read_with_substitution( input, illegal=nil ) + copy = input.clone - if copy =~ illegal - raise ParseException.new( "malformed text: Illegal character #$& in \"#{copy}\"" ) - end if illegal - - copy.gsub!( /\r\n?/, "\n" ) - if copy.include? 
?& - copy.gsub!( SETUTITSBUS[0], SLAICEPS[0] ) - copy.gsub!( SETUTITSBUS[1], SLAICEPS[1] ) - copy.gsub!( SETUTITSBUS[2], SLAICEPS[2] ) - copy.gsub!( SETUTITSBUS[3], SLAICEPS[3] ) - copy.gsub!( SETUTITSBUS[4], SLAICEPS[4] ) - copy.gsub!( /&#0*((?:\d+)|(?:x[a-f0-9]+));/ ) {|m| - m=$1 - #m='0' if m=='' - m = "0#{m}" if m[0] == ?x - [Integer(m)].pack('U*') - } - end - copy - end + if copy =~ illegal + raise ParseException.new( "malformed text: Illegal character #$& in \"#{copy}\"" ) + end if illegal + + copy.gsub!( /\r\n?/, "\n" ) + if copy.include? ?& + copy.gsub!( SETUTITSBUS[0], SLAICEPS[0] ) + copy.gsub!( SETUTITSBUS[1], SLAICEPS[1] ) + copy.gsub!( SETUTITSBUS[2], SLAICEPS[2] ) + copy.gsub!( SETUTITSBUS[3], SLAICEPS[3] ) + copy.gsub!( SETUTITSBUS[4], SLAICEPS[4] ) + copy.gsub!( /&#0*((?:\d+)|(?:x[a-f0-9]+));/ ) {|m| + m=$1 + #m='0' if m=='' + m = "0#{m}" if m[0] == ?x + [Integer(m)].pack('U*') + } + end + copy + end - EREFERENCE = /&(?!#{Entity::NAME};)/ - # Escapes all possible entities - def Text::normalize( input, doctype=nil, entity_filter=nil ) - copy = input.clone - # Doing it like this rather than in a loop improves the speed - if doctype - copy = copy.gsub( EREFERENCE, '&amp;' ) - doctype.entities.each_value do |entity| - copy = copy.gsub( entity.value, - "&#{entity.name};" ) if entity.value and - not( entity_filter and entity_filter.include?(entity) ) - end - else - copy = copy.gsub( EREFERENCE, '&amp;' ) - DocType::DEFAULT_ENTITIES.each_value do |entity| - copy = copy.gsub(entity.value, "&#{entity.name};" ) - end - end - copy - end + EREFERENCE = /&(?!#{Entity::NAME};)/ + # Escapes all possible entities + def Text::normalize( input, doctype=nil, entity_filter=nil ) + copy = input.clone + # Doing it like this rather than in a loop improves the speed + if doctype + copy = copy.gsub( EREFERENCE, '&amp;' ) + doctype.entities.each_value do |entity| + copy = copy.gsub( entity.value, + "&#{entity.name};" ) if entity.value and + not( entity_filter and
entity_filter.include?(entity) ) + end + else + copy = copy.gsub( EREFERENCE, '&amp;' ) + DocType::DEFAULT_ENTITIES.each_value do |entity| + copy = copy.gsub(entity.value, "&#{entity.name};" ) + end + end + copy + end - # Unescapes all possible entities - def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil ) - rv = string.clone - rv.gsub!( /\r\n?/, "\n" ) - matches = rv.scan( REFERENCE ) - return rv if matches.size == 0 - rv.gsub!( NUMERICENTITY ) {|m| - m=$1 - m = "0#{m}" if m[0] == ?x - [Integer(m)].pack('U*') - } - matches.collect!{|x|x[0]}.compact! - if matches.size > 0 - if doctype - matches.each do |entity_reference| - unless filter and filter.include?(entity_reference) - entity_value = doctype.entity( entity_reference ) - re = /&#{entity_reference};/ - rv.gsub!( re, entity_value ) if entity_value - end - end - else - matches.each do |entity_reference| - unless filter and filter.include?(entity_reference) - entity_value = DocType::DEFAULT_ENTITIES[ entity_reference ] - re = /&#{entity_reference};/ - rv.gsub!( re, entity_value.value ) if entity_value - end - end - end - rv.gsub!( /&amp;/, '&' ) - end - rv - end - end + # Unescapes all possible entities + def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil ) + rv = string.clone + rv.gsub!( /\r\n?/, "\n" ) + matches = rv.scan( REFERENCE ) + return rv if matches.size == 0 + rv.gsub!( NUMERICENTITY ) {|m| + m=$1 + m = "0#{m}" if m[0] == ?x + [Integer(m)].pack('U*') + } + matches.collect!{|x|x[0]}.compact!
+ if matches.size > 0 + if doctype + matches.each do |entity_reference| + unless filter and filter.include?(entity_reference) + entity_value = doctype.entity( entity_reference ) + re = /&#{entity_reference};/ + rv.gsub!( re, entity_value ) if entity_value + end + end + else + matches.each do |entity_reference| + unless filter and filter.include?(entity_reference) + entity_value = DocType::DEFAULT_ENTITIES[ entity_reference ] + re = /&#{entity_reference};/ + rv.gsub!( re, entity_value.value ) if entity_value + end + end + end + rv.gsub!( /&amp;/, '&' ) + end + rv + end + end end diff --git a/lib/rexml/xmldecl.rb b/lib/rexml/xmldecl.rb index df2cbf0060..47131ac816 100644 --- a/lib/rexml/xmldecl.rb +++ b/lib/rexml/xmldecl.rb @@ -94,6 +94,10 @@ module REXML @writethis = true end + def inspect + START.sub(/\\/u, '') + " ... " + STOP.sub(/\\/u, '') + end + private def content(enc) rv = "version='#@version'" diff --git a/lib/rexml/xpath.rb b/lib/rexml/xpath.rb index c9c216fe27..6875f038e0 100644 --- a/lib/rexml/xpath.rb +++ b/lib/rexml/xpath.rb @@ -2,61 +2,76 @@ require 'rexml/functions' require 'rexml/xpath_parser' module REXML - # Wrapper class. Use this class to access the XPath functions. - class XPath - include Functions - EMPTY_HASH = {} + # Wrapper class. Use this class to access the XPath functions. + class XPath + include Functions + EMPTY_HASH = {} - # Finds and returns the first node that matches the supplied xpath. - # element:: - # The context element - # path:: - # The xpath to search for. If not supplied or nil, returns the first - # node matching '*'. - # namespaces:: - # If supplied, a Hash which defines a namespace mapping.
- # - # XPath.first( node ) - # XPath.first( doc, "//b"} ) - # XPath.first( node, "a/x:b", { "x"=>"http://doofus" } ) - def XPath::first element, path=nil, namespaces={}, variables={} - parser = XPathParser.new - parser.namespaces = namespaces - parser.variables = variables - path = "*" unless path - element = [element] unless element.kind_of? Array - parser.parse(path, element)[0] - end + # Finds and returns the first node that matches the supplied xpath. + # element:: + # The context element + # path:: + # The xpath to search for. If not supplied or nil, returns the first + # node matching '*'. + # namespaces:: + # If supplied, a Hash which defines a namespace mapping. + # + # XPath.first( node ) + # XPath.first( doc, "//b"} ) + # XPath.first( node, "a/x:b", { "x"=>"http://doofus" } ) + def XPath::first element, path=nil, namespaces={}, variables={} +=begin + raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash + raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash + parser = XPathParser.new + parser.namespaces = namespaces + parser.variables = variables + path = "*" unless path + parser.first( path, element ); +=end +#=begin + raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash + raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash + parser = XPathParser.new + parser.namespaces = namespaces + parser.variables = variables + path = "*" unless path + element = [element] unless element.kind_of? Array + parser.parse(path, element).flatten[0] +#=end + end - # Itterates over nodes that match the given path, calling the supplied - # block with the match. - # element:: - # The context element - # path:: - # The xpath to search for. If not supplied or nil, defaults to '*' - # namespaces:: - # If supplied, a Hash which defines a namespace mapping - # - # XPath.each( node ) { |el| ... 
} - # XPath.each( node, '/*[@attr='v']' ) { |el| ... } - # XPath.each( node, 'ancestor::x' ) { |el| ... } - def XPath::each element, path=nil, namespaces={}, variables={}, &block - parser = XPathParser.new - parser.namespaces = namespaces - parser.variables = variables - path = "*" unless path - element = [element] unless element.kind_of? Array - parser.parse(path, element).each( &block ) - end + # Itterates over nodes that match the given path, calling the supplied + # block with the match. + # element:: + # The context element + # path:: + # The xpath to search for. If not supplied or nil, defaults to '*' + # namespaces:: + # If supplied, a Hash which defines a namespace mapping + # + # XPath.each( node ) { |el| ... } + # XPath.each( node, '/*[@attr='v']' ) { |el| ... } + # XPath.each( node, 'ancestor::x' ) { |el| ... } + def XPath::each element, path=nil, namespaces={}, variables={}, &block + raise "The namespaces argument, if supplied, must be a hash object." unless namespaces.kind_of? Hash + raise "The variables argument, if supplied, must be a hash object." unless variables.kind_of? Hash + parser = XPathParser.new + parser.namespaces = namespaces + parser.variables = variables + path = "*" unless path + element = [element] unless element.kind_of? Array + parser.parse(path, element).each( &block ) + end - # Returns an array of nodes matching a given XPath. - def XPath::match element, path=nil, namespaces={}, variables={} - parser = XPathParser.new - parser.namespaces = namespaces - parser.variables = variables - path = "*" unless path - element = [element] unless element.kind_of? Array - parser.parse(path,element) - end - end + # Returns an array of nodes matching a given XPath. + def XPath::match element, path=nil, namespaces={}, variables={} + parser = XPathParser.new + parser.namespaces = namespaces + parser.variables = variables + path = "*" unless path + element = [element] unless element.kind_of? 
Array + parser.parse(path,element) + end + end end diff --git a/lib/rexml/xpath_parser.rb b/lib/rexml/xpath_parser.rb index 5a976d5e82..91b8ad48c8 100644 --- a/lib/rexml/xpath_parser.rb +++ b/lib/rexml/xpath_parser.rb @@ -1,7 +1,28 @@ require 'rexml/namespace' require 'rexml/xmltokens' +require 'rexml/attribute' +require 'rexml/syncenumerator' require 'rexml/parsers/xpathparser' +class Object + def dclone + clone + end +end +class Symbol + def dclone + self + end +end +class Array + def dclone + klone = self.clone + klone.clear + self.each{|v| klone << v.dclone} + klone + end +end + module REXML # You don't want to use this class. Really. Use XPath, which is a wrapper # for this class. Believe me. You don't want to poke around in here. @@ -28,259 +49,419 @@ module REXML end def parse path, nodeset - path_stack = @parser.parse( path ) - #puts "PARSE: #{path} => #{path_stack.inspect}" - #puts "PARSE: nodeset = #{nodeset.collect{|x|x.to_s}.inspect}" - match( path_stack, nodeset ) + #puts "#"*40 + path_stack = @parser.parse( path ) + #puts "PARSE: #{path} => #{path_stack.inspect}" + #puts "PARSE: nodeset = #{nodeset.inspect}" + match( path_stack, nodeset ) + end + + def get_first path, nodeset + #puts "#"*40 + path_stack = @parser.parse( path ) + #puts "PARSE: #{path} => #{path_stack.inspect}" + #puts "PARSE: nodeset = #{nodeset.inspect}" + first( path_stack, nodeset ) end def predicate path, nodeset - path_stack = @parser.predicate( path ) - return Predicate( path_stack, nodeset ) + path_stack = @parser.parse( path ) + expr( path_stack, nodeset ) end def []=( variable_name, value ) @variables[ variable_name ] = value end - def match( path_stack, nodeset ) - while ( path_stack.size > 0 and nodeset.size > 0 ) - #puts "PARSE: #{path_stack.inspect} '#{nodeset.collect{|n|n.class}.inspect}'" - nodeset = internal_parse( path_stack, nodeset ) - #puts "NODESET: #{nodeset}" - #puts "PATH_STACK: #{path_stack.inspect}" - end - nodeset - end - private + # Performs a depth-first 
(document order) XPath search, and returns the + # first match. This is the fastest, lightest way to return a single result. + def first( path_stack, node ) + #puts "#{depth}) Entering match( #{path.inspect}, #{tree.inspect} )" + return nil if path.size == 0 - def internal_parse path_stack, nodeset - #puts "INTERNAL_PARSE RETURNING WITH NO RESULTS" if nodeset.size == 0 or path_stack.size == 0 - return nodeset if nodeset.size == 0 or path_stack.size == 0 - #puts "INTERNAL_PARSE: #{path_stack.inspect}, #{nodeset.collect{|n| n.class}.inspect}" - case path_stack.shift + case path[0] when :document - return [ nodeset[0].root.parent ] - - when :qname - prefix = path_stack.shift - name = path_stack.shift - #puts "QNAME #{prefix}#{prefix.size>0?':':''}#{name}" - n = nodeset.clone - ns = @namespaces[prefix] - ns = ns ? ns : '' - n.delete_if do |node| - # FIXME: This DOUBLES the time XPath searches take - ns = node.namespace( prefix ) if node.node_type == :element and ns == '' - #puts "NODE: '#{node.to_s}'; node.has_name?( #{name.inspect}, #{ns.inspect} ): #{ node.has_name?( name, ns )}; node.namespace() = #{node.namespace().inspect}; node.prefix = #{node.prefix().inspect}" if node.node_type == :element - !(node.node_type == :element and node.name == name and node.namespace == ns ) + # do nothing + return first( path[1..-1], node ) + when :child + for c in node.children + #puts "#{depth}) CHILD checking #{name(c)}" + r = first( path[1..-1], c ) + #puts "#{depth}) RETURNING #{r.inspect}" if r + return r if r end - return n - - when :any - n = nodeset.clone - n.delete_if { |node| node.node_type != :element } - return n - - when :self - # THIS SPACE LEFT INTENTIONALLY BLANK - - when :processing_instruction - target = path_stack.shift - n = nodeset.clone - n.delete_if do |node| - (node.node_type != :processing_instruction) or - ( !target.nil? 
and ( node.target != target ) ) + when :qname + name = path[2] + #puts "#{depth}) QNAME #{name(tree)} == #{name} (path => #{path.size})" + if node.name == name + #puts "#{depth}) RETURNING #{tree.inspect}" if path.size == 3 + return node if path.size == 3 + return first( path[3..-1], node ) + else + return nil end - return n - - when :text - #puts ":TEXT" - n = nodeset.clone - n.delete_if do |node| - #puts "#{node} :: #{node.node_type}" - node.node_type != :text + when :descendant_or_self + r = first( path[1..-1], node ) + return r if r + for c in node.children + r = first( path, c ) + return r if r end - return n + when :node + return first( path[1..-1], node ) + when :any + return first( path[1..-1], node ) + end + return nil + end - when :comment - n = nodeset.clone - n.delete_if do |node| - node.node_type != :comment - end - return n - when :node - return nodeset + def match( path_stack, nodeset ) + #puts "MATCH: path_stack = #{path_stack.inspect}" + #puts "MATCH: nodeset = #{nodeset.inspect}" + r = expr( path_stack, nodeset ) + #puts "MAIN EXPR => #{r.inspect}" + r - # FIXME: I suspect the following XPath will fail: - # /a/*/*[1] - when :child - #puts "CHILD" - new_nodeset = [] - nt = nil - for node in nodeset - nt = node.node_type - new_nodeset += node.children if nt == :element or nt == :document - end - #path_stack[0,(path_stack.size-ps_clone.size)] = [] - return new_nodeset + #while ( path_stack.size > 0 and nodeset.size > 0 ) + # #puts "MATCH: #{path_stack.inspect} '#{nodeset.collect{|n|n.class}.inspect}'" + # nodeset = expr( path_stack, nodeset ) + # #puts "NODESET: #{nodeset.inspect}" + # #puts "PATH_STACK: #{path_stack.inspect}" + #end + #nodeset + end + + private + + + # Expr takes a stack of path elements and a set of nodes (either a Parent + # or an Array and returns an Array of matching nodes + ALL = [ :attribute, :element, :text, :processing_instruction, :comment ] + ELEMENTS = [ :element ] + def expr( path_stack, nodeset, context=nil ) + #puts 
"#"*15 + #puts "In expr with #{path_stack.inspect}" + #puts "Returning" if path_stack.length == 0 || nodeset.length == 0 + node_types = ELEMENTS + return nodeset if path_stack.length == 0 || nodeset.length == 0 + while path_stack.length > 0 + #puts "Path stack = #{path_stack.inspect}" + #puts "Nodeset is #{nodeset.inspect}" + case (op = path_stack.shift) + when :document + nodeset = [ nodeset[0].root_node ] + #puts ":document, nodeset = #{nodeset.inspect}" - when :literal - literal = path_stack.shift - if literal =~ /^\d+(\.\d+)?$/ - return ($1 ? literal.to_f : literal.to_i) - end - #puts "RETURNING '#{literal}'" - return literal - - when :attribute - new_nodeset = [] - case path_stack.shift when :qname + #puts "IN QNAME" prefix = path_stack.shift name = path_stack.shift - for element in nodeset - if element.node_type == :element - #puts element.name - #puts "looking for attribute #{name} in '#{@namespaces[prefix]}'" - attr = element.attribute( name, @namespaces[prefix] ) - #puts ":ATTRIBUTE: attr => #{attr}" - new_nodeset << attr if attr + ns = @namespaces[prefix] + ns = ns ? 
ns : '' + nodeset.delete_if do |node| + # FIXME: This DOUBLES the time XPath searches take + ns = node.namespace( prefix ) if node.node_type == :element and ns == '' + #puts "NS = #{ns.inspect}" + #puts "node.node_type == :element => #{node.node_type == :element}" + if node.node_type == :element + #puts "node.name == #{name} => #{node.name == name}" + if node.name == name + #puts "node.namespace == #{ns.inspect} => #{node.namespace == ns}" + end end + !(node.node_type == :element and + node.name == name and + node.namespace == ns ) end + node_types = ELEMENTS + when :any - #puts "ANY" - for element in nodeset - if element.node_type == :element - new_nodeset += element.attributes.to_a - end + #puts "ANY 1: nodeset = #{nodeset.inspect}" + #puts "ANY 1: node_types = #{node_types.inspect}" + nodeset.delete_if { |node| !node_types.include?(node.node_type) } + #puts "ANY 2: nodeset = #{nodeset.inspect}" + + when :self + # This space left intentionally blank + + when :processing_instruction + target = path_stack.shift + nodeset.delete_if do |node| + (node.node_type != :processing_instruction) or + ( target!='' and ( node.target != target ) ) end - end - #puts "RETURNING #{new_nodeset.collect{|n|n.to_s}.inspect}" - return new_nodeset - - when :parent - return internal_parse( path_stack, nodeset.collect{|n| n.parent}.compact ) - - when :ancestor - #puts "ANCESTOR" - new_nodeset = [] - for node in nodeset - while node.parent - node = node.parent - new_nodeset << node unless new_nodeset.include? 
node + + when :text + nodeset.delete_if { |node| node.node_type != :text } + + when :comment + nodeset.delete_if { |node| node.node_type != :comment } + + when :node + # This space left intentionally blank + node_types = ALL + + when :child + new_nodeset = [] + nt = nil + for node in nodeset + nt = node.node_type + new_nodeset += node.children if nt == :element or nt == :document end - end - #nodeset = new_nodeset.uniq - return new_nodeset - - when :ancestor_or_self - new_nodeset = [] - for node in nodeset - if node.node_type == :element - new_nodeset << node - while ( node.parent ) + nodeset = new_nodeset + node_types = ELEMENTS + + when :literal + literal = path_stack.shift + if literal =~ /^\d+(\.\d+)?$/ + return ($1 ? literal.to_f : literal.to_i) + end + return literal + + when :attribute + new_nodeset = [] + case path_stack.shift + when :qname + prefix = path_stack.shift + name = path_stack.shift + for element in nodeset + if element.node_type == :element + #puts element.name + attr = element.attribute( name, @namespaces[prefix] ) + new_nodeset << attr if attr + end + end + when :any + #puts "ANY" + for element in nodeset + if element.node_type == :element + new_nodeset += element.attributes.to_a + end + end + end + nodeset = new_nodeset + + when :parent + #puts "PARENT 1: nodeset = #{nodeset}" + nodeset = nodeset.collect{|n| n.parent}.compact + #nodeset = expr(path_stack.dclone, nodeset.collect{|n| n.parent}.compact) + #puts "PARENT 2: nodeset = #{nodeset.inspect}" + node_types = ELEMENTS + + when :ancestor + new_nodeset = [] + for node in nodeset + while node.parent node = node.parent new_nodeset << node unless new_nodeset.include? 
node end end - end - #nodeset = new_nodeset.uniq - return new_nodeset - - when :predicate - #puts "@"*80 - #puts "NODESET = #{nodeset.collect{|n|n.to_s}.inspect}" - predicate = path_stack.shift - new_nodeset = [] - Functions::size = nodeset.size - nodeset.size.times do |index| - node = nodeset[index] - Functions::node = node - Functions::index = index+1 - #puts "Node #{node} and index=#{index+1}" - result = Predicate( predicate, node ) - #puts "Predicate returned #{result} (#{result.class}) for #{node.class}" - if result.kind_of? Numeric - #puts "#{result} == #{index} => #{result == index}" - new_nodeset << node if result == (index+1) - elsif result.instance_of? Array - new_nodeset << node if result.size > 0 + nodeset = new_nodeset + node_types = ELEMENTS + + when :ancestor_or_self + new_nodeset = [] + for node in nodeset + if node.node_type == :element + new_nodeset << node + while ( node.parent ) + node = node.parent + new_nodeset << node unless new_nodeset.include? node + end + end + end + nodeset = new_nodeset + node_types = ELEMENTS + + when :predicate + new_nodeset = [] + subcontext = { :size => nodeset.size } + pred = path_stack.shift + nodeset.each_with_index { |node, index| + subcontext[ :node ] = node + #puts "PREDICATE SETTING CONTEXT INDEX TO #{index+1}" + subcontext[ :index ] = index+1 + pc = pred.dclone + #puts "#{node.hash}) Recursing with #{pred.inspect} and [#{node.inspect}]" + result = expr( pc, [node], subcontext ) + result = result[0] if result.kind_of? Array and result.length == 1 + #puts "#{node.hash}) Result = #{result.inspect} (#{result.class.name})" + if result.kind_of? Numeric + #puts "Adding node #{node.inspect}" if result == (index+1) + new_nodeset << node if result == (index+1) + elsif result.instance_of? 
Array + #puts "Adding node #{node.inspect}" if result.size > 0 + new_nodeset << node if result.size > 0 + else + #puts "Adding node #{node.inspect}" if result + new_nodeset << node if result + end + } + #puts "New nodeset = #{new_nodeset.inspect}" + #puts "Path_stack = #{path_stack.inspect}" + nodeset = new_nodeset +=begin + predicate = path_stack.shift + ns = nodeset.clone + result = expr( predicate, ns ) + #puts "Result = #{result.inspect} (#{result.class.name})" + #puts "nodeset = #{nodeset.inspect}" + if result.kind_of? Array + nodeset = result.zip(ns).collect{|m,n| n if m}.compact else - new_nodeset << node if result + nodeset = result ? nodeset : [] end - end - #puts "Nodeset after predicate #{predicate.inspect} has #{new_nodeset.size} nodes" - #puts "NODESET: #{new_nodeset.collect{|n|n.to_s}.inspect}" - return new_nodeset + #puts "Outgoing NS = #{nodeset.inspect}" +=end + + when :descendant_or_self + rv = descendant_or_self( path_stack, nodeset ) + path_stack.clear + nodeset = rv + node_types = ELEMENTS + + when :descendant + results = [] + nt = nil + for node in nodeset + nt = node.node_type + results += expr( path_stack.dclone.unshift( :descendant_or_self ), + node.children ) if nt == :element or nt == :document + end + nodeset = results + node_types = ELEMENTS + + when :following_sibling + #puts "FOLLOWING_SIBLING 1: nodeset = #{nodeset}" + results = [] + for node in nodeset + all_siblings = node.parent.children + current_index = all_siblings.index( node ) + following_siblings = all_siblings[ current_index+1 .. -1 ] + results += expr( path_stack.dclone, following_siblings ) + end + #puts "FOLLOWING_SIBLING 2: nodeset = #{nodeset}" + nodeset = results + + when :preceding_sibling + results = [] + for node in nodeset + all_siblings = node.parent.children + current_index = all_siblings.index( node ) + preceding_siblings = all_siblings[ 0 .. 
current_index-1 ].reverse + #results += expr( path_stack.dclone, preceding_siblings ) + end + nodeset = preceding_siblings + node_types = ELEMENTS - when :descendant_or_self - rv = descendant_or_self( path_stack, nodeset ) - path_stack.clear - return rv - - when :descendant - #puts ":DESCENDANT" - results = [] - nt = nil - for node in nodeset - nt = node.node_type - results += internal_parse( path_stack.clone.unshift( :descendant_or_self ), - node.children ) if nt == :element or nt == :document - end - return results - - when :following_sibling - results = [] - for node in nodeset - all_siblings = node.parent.children - current_index = all_siblings.index( node ) - following_siblings = all_siblings[ current_index+1 .. -1 ] - results += internal_parse( path_stack.clone, following_siblings ) - end - return results - - when :preceding_sibling - results = [] - for node in nodeset - all_siblings = node.parent.children - current_index = all_siblings.index( node ) - preceding_siblings = all_siblings[ 0 .. 
current_index-1 ] - results += internal_parse( path_stack.clone, preceding_siblings ) - end - return results + when :preceding + new_nodeset = [] + for node in nodeset + new_nodeset += preceding( node ) + end + #puts "NEW NODESET => #{new_nodeset.inspect}" + nodeset = new_nodeset + node_types = ELEMENTS + + when :following + new_nodeset = [] + for node in nodeset + new_nodeset += following( node ) + end + nodeset = new_nodeset + node_types = ELEMENTS - when :preceding - new_nodeset = [] - for node in nodeset - new_nodeset += preceding( node ) - end - return new_nodeset + when :namespace + new_set = [] + for node in nodeset + new_nodeset << node.namespace if node.node_type == :element or node.node_type == :attribute + end + nodeset = new_nodeset + + when :variable + var_name = path_stack.shift + return @variables[ var_name ] + + # :and, :or, :eq, :neq, :lt, :lteq, :gt, :gteq + when :eq, :neq, :lt, :lteq, :gt, :gteq, :and, :or + left = expr( path_stack.shift, nodeset, context ) + #puts "LEFT => #{left.inspect} (#{left.class.name})" + right = expr( path_stack.shift, nodeset, context ) + #puts "RIGHT => #{right.inspect} (#{right.class.name})" + res = equality_relational_compare( left, op, right ) + #puts "RES => #{res.inspect}" + return res - when :following - new_nodeset = [] - for node in nodeset - new_nodeset += following( node ) - end - return new_nodeset + when :div + left = Functions::number(expr(path_stack.shift, nodeset, context)).to_f + right = Functions::number(expr(path_stack.shift, nodeset, context)).to_f + return (left / right) - when :namespace - new_set = [] - for node in nodeset - new_nodeset << node.namespace if node.node_type == :element or node.node_type == :attribute - end - return new_nodeset + when :mod + left = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + right = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + return (left % right) - when :variable - var_name = path_stack.shift - return @variables[ 
var_name ] + when :mult + left = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + right = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + return (left * right) - end - nodeset + when :plus + left = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + right = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + return (left + right) + + when :minus + left = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + right = Functions::number(expr(path_stack.shift, nodeset, context )).to_f + return (left - right) + + when :union + left = expr( path_stack.shift, nodeset, context ) + right = expr( path_stack.shift, nodeset, context ) + return (left | right) + + when :neg + res = expr( path_stack, nodeset, context ) + return -(res.to_f) + + when :not + when :function + func_name = path_stack.shift.tr('-','_') + arguments = path_stack.shift + #puts "FUNCTION 0: #{func_name}(#{arguments.collect{|a|a.inspect}.join(', ')})" + subcontext = context ? nil : { :size => nodeset.size } + + res = [] + cont = context + nodeset.each_with_index { |n, i| + if subcontext + subcontext[:node] = n + subcontext[:index] = i + cont = subcontext + end + arg_clone = arguments.dclone + args = arg_clone.collect { |arg| + #puts "FUNCTION 1: Calling expr( #{arg.inspect}, [#{n.inspect}] )" + expr( arg, [n], cont ) + } + #puts "FUNCTION 2: #{func_name}(#{args.collect{|a|a.inspect}.join(', ')})" + Functions.context = cont + res << Functions.send( func_name, *args ) + #puts "FUNCTION 3: #{res[-1].inspect}" + } + return res + + end + end # while + #puts "EXPR returning #{nodeset.inspect}" + return nodeset end + ########################################################## # FIXME # The next two methods are BAD MOJO! 
@@ -294,13 +475,16 @@ module REXML d_o_s( path_stack, nodeset, rs ) #puts "RS = #{rs.collect{|n|n.to_s}.inspect}" document_order(rs.flatten.compact) + #rs.flatten.compact end def d_o_s( p, ns, r ) + #puts "IN DOS with #{ns.inspect}; ALREADY HAVE #{r.inspect}" nt = nil ns.each_index do |i| n = ns[i] - x = match( p.clone, [ n ] ) + #puts "P => #{p.inspect}" + x = expr( p.dclone, [ n ] ) nt = n.node_type d_o_s( p, n.children, x ) if nt == :element or nt == :document and n.children.size > 0 r.concat(x) if x.size > 0 @@ -310,6 +494,12 @@ module REXML # Reorders an array of nodes so that they are in document order # It tries to do this efficiently. + # + # FIXME: I need to get rid of this, but the issue is that most of the XPath + # interpreter functions as a filter, which means that we lose context going + # in and out of function calls. If I knew what the index of the nodes was, + # I wouldn't have to do this. Maybe add a document IDX for each node? + # Problems with mutable documents. Or, rewrite everything. def document_order( array_of_nodes ) new_arry = [] array_of_nodes.each { |node| @@ -319,8 +509,9 @@ module REXML node_idx << np.parent.index( np ) np = np.parent end - new_arry << [ node_idx.reverse.join, node ] + new_arry << [ node_idx.reverse, node ] } + #puts "new_arry = #{new_arry.inspect}" new_arry.sort{ |s1, s2| s1[0] <=> s2[0] }.collect{ |s| s[1] } end @@ -333,124 +524,127 @@ module REXML end - # Given a predicate, a node, and a context, evaluates to true or false. 
- def Predicate( predicate, node ) - predicate = predicate.clone - #puts "#"*20 - #puts "Predicate( #{predicate.inspect}, #{node.class} )" - results = [] - case (predicate[0]) - when :and, :or, :eq, :neq, :lt, :lteq, :gt, :gteq - eq = predicate.shift - left = Predicate( predicate.shift, node ) - right = Predicate( predicate.shift, node ) - #puts "LEFT = #{left.inspect}" - #puts "RIGHT = #{right.inspect}" - return equality_relational_compare( left, eq, right ) - - when :div, :mod, :mult, :plus, :minus - op = predicate.shift - left = Predicate( predicate.shift, node ) - right = Predicate( predicate.shift, node ) - #puts "LEFT = #{left.inspect}" - #puts "RIGHT = #{right.inspect}" - left = Functions::number( left ) - right = Functions::number( right ) - #puts "LEFT = #{left.inspect}" - #puts "RIGHT = #{right.inspect}" - case op - when :div - return left.to_f / right.to_f - when :mod - return left % right - when :mult - return left * right - when :plus - return left + right - when :minus - return left - right - end - when :union - predicate.shift - left = Predicate( predicate.shift, node ) - right = Predicate( predicate.shift, node ) - return (left | right) - - when :neg - predicate.shift - operand = Functions::number(Predicate( predicate, node )) - return -operand - - when :not - predicate.shift - return !Predicate( predicate.shift, node ) - - when :function - predicate.shift - func_name = predicate.shift.tr('-', '_') - arguments = predicate.shift - #puts "\nFUNCTION: #{func_name}" - #puts "ARGUMENTS: #{arguments.inspect} #{node.to_s}" - args = arguments.collect { |arg| Predicate( arg, node ) } - #puts "FUNCTION: #{func_name}( #{args.collect{|n|n.to_s}.inspect} )" - result = Functions.send( func_name, *args ) - #puts "RESULTS: #{result.inspect}" - return result + # Builds a nodeset of all of the preceding nodes of the supplied node, + # in reverse document order + # preceding:: includes every element in the document that precedes this node, + # except for ancestors + 
def preceding( node ) + #puts "IN PRECEDING" + ancestors = [] + p = node.parent + while p + ancestors << p + p = p.parent + end - else - return match( predicate, [ node ] ) + acc = [] + p = preceding_node_of( node ) + #puts "P = #{p.inspect}" + while p + if ancestors.include? p + ancestors.delete(p) + else + acc << p + end + p = preceding_node_of( p ) + #puts "P = #{p.inspect}" + end + acc + end + def preceding_node_of( node ) + #puts "NODE: #{node.inspect}" + #puts "PREVIOUS NODE: #{node.previous_sibling_node.inspect}" + #puts "PARENT NODE: #{node.parent}" + psn = node.previous_sibling_node + if psn.nil? + if node.parent.nil? or node.parent.class == Document + return nil + end + return node.parent + #psn = preceding_node_of( node.parent ) + end + while psn and psn.kind_of? Element and psn.children.size > 0 + psn = psn.children[-1] end + psn end - # Builds a nodeset of all of the following nodes of the supplied node, - # in document order def following( node ) - all_siblings = node.parent.children - current_index = all_siblings.index( node ) - following_siblings = all_siblings[ current_index+1 .. -1 ] - following = [] - recurse( following_siblings ) { |node| following << node } - following.shift - #puts "following is returning #{puta following}" - following + #puts "IN PRECEDING" + acc = [] + p = next_sibling_node( node ) + #puts "P = #{p.inspect}" + while p + acc << p + p = following_node_of( p ) + #puts "P = #{p.inspect}" + end + acc end - # Builds a nodeset of all of the preceding nodes of the supplied node, - # in reverse document order - def preceding( node ) - all_siblings = node.parent.children - current_index = all_siblings.index( node ) - preceding_siblings = all_siblings[ 0 .. current_index-1 ] + def following_node_of( node ) + #puts "NODE: #{node.inspect}" + #puts "PREVIOUS NODE: #{node.previous_sibling_node.inspect}" + #puts "PARENT NODE: #{node.parent}" + if node.kind_of? 
Element and node.children.size > 0 + return node.children[0] + end + return next_sibling_node(node) + end + + def next_sibling_node(node) + psn = node.next_sibling_node + while psn.nil? + if node.parent.nil? or node.parent.class == Document + return nil + end + node = node.parent + psn = node.next_sibling_node + #puts "psn = #{psn.inspect}" + end + return psn + end - preceding = [] - recurse( preceding_siblings ) { |node| preceding.unshift( node ) } - preceding + def norm b + case b + when true, false + return b + when 'true', 'false' + return Functions::boolean( b ) + when /^\d+(\.\d+)?$/ + return Functions::number( b ) + else + return Functions::string( b ) + end end def equality_relational_compare( set1, op, set2 ) - #puts "#"*80 + #puts "EQ_REL_COMP(#{set1.inspect} #{op.inspect} #{set2.inspect})" if set1.kind_of? Array and set2.kind_of? Array - #puts "#{set1.size} & #{set2.size}" + #puts "#{set1.size} & #{set2.size}" if set1.size == 1 and set2.size == 1 set1 = set1[0] set2 = set2[0] elsif set1.size == 0 or set2.size == 0 nd = set1.size==0 ? set2 : set1 - nd.each { |il| return true if compare( il, op, nil ) } + rv = nd.collect { |il| compare( il, op, nil ) } + #puts "RV = #{rv.inspect}" + return rv else - set1.each do |i1| - i1 = i1.to_s - set2.each do |i2| - i2 = i2.to_s - return true if compare( i1, op, i2 ) - end - end - return false + res = [] + enum = SyncEnumerator.new( set1, set2 ).each { |i1, i2| + #puts "i1 = #{i1.inspect} (#{i1.class.name})" + #puts "i2 = #{i2.inspect} (#{i2.class.name})" + i1 = norm( i1 ) + i2 = norm( i2 ) + res << compare( i1, op, i2 ) + } + return res end end - #puts "EQ_REL_COMP: #{set1.class.name} #{set1.inspect}, #{op}, #{set2.class.name} #{set2.inspect}" + #puts "EQ_REL_COMP: #{set1.inspect} (#{set1.class.name}), #{op}, #{set2.inspect} (#{set2.class.name})" #puts "COMPARING VALUES" # If one is nodeset and other is number, compare number to each item # in nodeset s.t. 
number op number(string(item)) @@ -459,40 +653,28 @@ module REXML # If one is nodeset and other is boolean, compare boolean to each item # in nodeset s.t. boolean op boolean(item) if set1.kind_of? Array or set2.kind_of? Array - #puts "ISA ARRAY" + #puts "ISA ARRAY" if set1.kind_of? Array a = set1 - b = set2.to_s + b = set2 else a = set2 - b = set1.to_s + b = set1 end case b - when 'true', 'false' - b = Functions::boolean( b ) - for v in a - v = Functions::boolean(v) - return true if compare( v, op, b ) - end + when true, false + return a.collect {|v| compare( Functions::boolean(v), op, b ) } + when Numeric + return a.collect {|v| compare( Functions::number(v), op, b )} when /^\d+(\.\d+)?$/ b = Functions::number( b ) #puts "B = #{b.inspect}" - for v in a - #puts "v = #{v.inspect}" - v = Functions::number(v) - #puts "v = #{v.inspect}" - #puts compare(v,op,b) - return true if compare( v, op, b ) - end + return a.collect {|v| compare( Functions::number(v), op, b )} else - #puts "Functions::string( #{b}(#{b.class.name}) ) = #{Functions::string(b)}" + #puts "Functions::string( #{b}(#{b.class.name}) ) = #{Functions::string(b)}" b = Functions::string( b ) - for v in a - #puts "v = #{v.class.name} #{v.inspect}" - v = Functions::string(v) - return true if compare( v, op, b ) - end + return a.collect { |v| compare( Functions::string(v), op, b ) } end else # If neither is nodeset, @@ -532,7 +714,7 @@ module REXML end def compare a, op, b - #puts "COMPARE #{a.to_s}(#{a.class.name}) #{op} #{b.to_s}(#{a.class.name})" + #puts "COMPARE #{a.inspect}(#{a.class.name}) #{op} #{b.inspect}(#{b.class.name})" case op when :eq a == b -- cgit v1.2.3
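Note on the document_order hunk above: the sort key changes from `node_idx.reverse.join` to `node_idx.reverse`. The sketch below is an illustration only, not code from the patch. The index paths are made-up sample data, and `by_joined_string` and `by_index_array` are hypothetical names. It shows why a joined string is likely an unreliable key once any child index reaches two digits, while comparing the arrays element by element keeps the levels separate.

```ruby
# Hypothetical index paths: each array lists a node's child index at every
# level, from the document root down to the node itself.
paths = [[1, 10], [1, 2], [2, 1]]

# Old key: the indices joined into one string. Level boundaries are lost and
# the comparison is character by character ("110" sorts before "12").
by_joined_string = paths.sort_by { |path| path.join }

# New key: the array itself. Array#<=> compares element by element, so the
# second child of node 1 sorts before its tenth child.
by_index_array = paths.sort { |a, b| a <=> b }

puts by_joined_string.inspect
puts by_index_array.inspect
```

The joined key is also ambiguous: `[1, 12]` and `[11, 2]` both join to `"112"`, so two different positions can compare as equal.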