Short summary:

This is a version bump to REXML 3.1.4. It includes numerous bug fixes and is a pretty big patch, but is nonetheless a minor revision bump, since the API hasn't changed.

For more information, see:

  http://www.germane-software.com/projects/rexml/milestone/3.1.4

For all tickets, see:

  http://www.germane-software.com/projects/rexml/ticket/#

Where '#' is replaced with the ticket number.

Changelog:

* Fixed the documentation WRT the raw mode of text nodes (ticket #4)
* Fixes roundup ticket #43: substring-after bug.
* Fixed ticket #44, Element#xpath
* Patch submitted by an anonymous donor to allow parsing of Tempfiles. I was hoping that, by now, that whole Source thing would have been changed to use duck typing and avoid this sort of ticket... but in the meantime, the patch has been applied.
* Fixes ticket:30, XPath default namespace bug. The fix was provided by Lucas Nussbaum.
* Aliases #size to #length, as per zdennis's request.
* Fixes typo from previous commit
* Fixes ticket #32, preceding-sibling fails attempting delete_if on nil nodeset
* Merges a user-contributed patch for ticket #40
* Adds a forgotten-to-commit unit test for ticket #32
* Changes Date, Version, and Copyright to upper case, to avoid conflicts with the Date class. All of the other changes in the altered files are because Subversion doesn't allow block-level commits, like it should. English cased Version and Copyright are aliased to the upper case versions, for partial backward compatibility.
* Minor, yet incomplete, documentation changes. Again, these are in this patch because of Subversion's glaring lack of block-level commits.
* Resolves ticket #34, SAX parser change makes it impossible to parse IO feeds.
* Moves parser.source.position() to parser.position()
* Fixes ticket:48, repeated writes munging text content
* Fixes ticket:46, adding methods for accessing notation DTD information.
* Encodes some characters and removes a broken link in the documentation
* Deals with carriage returns after XML declarations
* Improved doctype handling
* Whitespace handling changes
* Applies a patch by David Tardon, which (incidentally) fixes ticket:50
* Closes #26, allowing anything that walks like an IO to be a source.
* Ticket #31 - One unescape too many
  This wasn't really a bug, per se... "value" always returns a normalized string, and "value" is the method used to get the text() of an element. However, entities have no meaning in CDATA sections, so there's no justification for value to be normalizing the content of CData objects. This behavior has therefore been changed.
* Ticket #45 -- Now parses notation declarations in DTDs properly.
* Resolves ticket #49, Document.parse_stream returns ArgumentError
* Adds documentation to clarify how XMLDecl works, to avoid invalid bug reports.
* Addresses ticket #10, fixing the StreamParser API for DTDs.
* Fixes ticket #42, XPath node-set function 'name' fails with relative node set parameter
* Good patch by Aaron to fix ticket #53: REXML ignoring unbalanced tags at the end of a document.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_8@10090 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
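
The notation accessors mentioned under ticket:46 (DocType#public, DocType#system, DocType#notations, DocType#notation, and the NotationDecl readers) can be exercised roughly as follows. This is a minimal sketch, assuming the rexml library can be loaded with require 'rexml/document'. The sample document, the SAMPLE_XML constant, and the doctype_summary helper are made up for illustration and are not part of this commit; exact return values may differ between REXML versions.

```ruby
require 'rexml/document'

# Illustrative input only: the DOCTYPE, the notation name, and every
# identifier below are invented for this sketch.
SAMPLE_XML = <<~XML
  <!DOCTYPE r SYSTEM "urn:x-example:r.dtd" [
    <!NOTATION n1 PUBLIC "-//EXAMPLE//NOTATION ONE//EN" "urn:x-example:n1">
  ]>
  <r/>
XML

# Hypothetical helper: collects what the DocType and NotationDecl
# accessors report for a parsed document.
def doctype_summary(xml)
  doctype = REXML::Document.new(xml).doctype
  first   = doctype.notation('n1')   # look up one notation by name
  {
    public:    doctype.public,       # nil when the external ID is SYSTEM
    system:    doctype.system,
    names:     doctype.notations.map { |decl| decl.name },
    n1_public: first && first.public,
    n1_system: first && first.system
  }
end

p doctype_summary(SAMPLE_XML)
```

Only notations declared in the internal DTD subset are reported, as the comments on the new methods state.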
author: ser <ser@b2dd03c8-39d4-4d8f-98ff-823fe69b080e> 2006-04-14 02:56:44 +0000
committer: ser <ser@b2dd03c8-39d4-4d8f-98ff-823fe69b080e> 2006-04-14 02:56:44 +0000
commit: 5f4bf329291f885d23f4d6277b4a22862a291687 (patch)
tree: 0ce7091ae84f46a62452f6671c28ad8aac834d68 /lib/rexml/doctype.rb
parent: bec759abcc335aabde7c0dcd8c85c18223446644 (diff)
1 files changed, 249 insertions, 184 deletions
diff --git a/lib/rexml/doctype.rb b/lib/rexml/doctype.rb
index 652a04fce2..4a1ffb4336 100644
--- a/lib/rexml/doctype.rb
+++ b/lib/rexml/doctype.rb
@@ -6,55 +6,55 @@ require 'rexml/attlistdecl'
 require 'rexml/xmltokens'
 
 module REXML
-	# Represents an XML DOCTYPE declaration; that is, the contents of <!DOCTYPE
-	# ... >.  DOCTYPES can be used to declare the DTD of a document, as well as
-	# being used to declare entities used in the document.
-	class DocType < Parent
-		include XMLTokens
-		START = "<!DOCTYPE"
-		STOP = ">"
-		SYSTEM = "SYSTEM"
-		PUBLIC = "PUBLIC"
-		DEFAULT_ENTITIES = { 
-			'gt'=>EntityConst::GT, 
-			'lt'=>EntityConst::LT, 
-			'quot'=>EntityConst::QUOT, 
-			"apos"=>EntityConst::APOS 
-		}
-
-		# name is the name of the doctype
-		# external_id is the referenced DTD, if given
-		attr_reader :name, :external_id, :entities, :namespaces
-
-		# Constructor
-		#
-		#	 dt = DocType.new( 'foo', '-//I/Hate/External/IDs' )
-		#	 # <!DOCTYPE foo '-//I/Hate/External/IDs'>
-		#	 dt = DocType.new( doctype_to_clone )
-		#	 # Incomplete.  Shallow clone of doctype
+  # Represents an XML DOCTYPE declaration; that is, the contents of <!DOCTYPE
+  # ... >.  DOCTYPES can be used to declare the DTD of a document, as well as
+  # being used to declare entities used in the document.
+  class DocType < Parent
+    include XMLTokens
+    START = "<!DOCTYPE"
+    STOP = ">"
+    SYSTEM = "SYSTEM"
+    PUBLIC = "PUBLIC"
+    DEFAULT_ENTITIES = { 
+      'gt'=>EntityConst::GT, 
+      'lt'=>EntityConst::LT, 
+      'quot'=>EntityConst::QUOT, 
+      "apos"=>EntityConst::APOS 
+    }
+
+    # name is the name of the doctype
+    # external_id is the referenced DTD, if given
+    attr_reader :name, :external_id, :entities, :namespaces
+
+    # Constructor
+    #
+    #   dt = DocType.new( 'foo', '-//I/Hate/External/IDs' )
+    #   # <!DOCTYPE foo '-//I/Hate/External/IDs'>
+    #   dt = DocType.new( doctype_to_clone )
+    #   # Incomplete.  Shallow clone of doctype
     #
     # +Note+ that the constructor: 
     #
     #  Doctype.new( Source.new( "<!DOCTYPE foo 'bar'>" ) )
     #
     # is _deprecated_.  Do not use it.  It will probably disappear.
-		def initialize( first, parent=nil )
-			@entities = DEFAULT_ENTITIES
-			@long_name = @uri = nil
-			if first.kind_of? String
-				super()
-				@name = first
-				@external_id = parent
-			elsif first.kind_of? DocType
-				super( parent )
-				@name = first.name
-				@external_id = first.external_id
-			elsif first.kind_of? Array
-				super( parent )
-				@name = first[0]
-				@external_id = first[1]
-				@long_name = first[2]
-				@uri = first[3]
+    def initialize( first, parent=nil )
+      @entities = DEFAULT_ENTITIES
+      @long_name = @uri = nil
+      if first.kind_of? String
+        super()
+        @name = first
+        @external_id = parent
+      elsif first.kind_of? DocType
+        super( parent )
+        @name = first.name
+        @external_id = first.external_id
+      elsif first.kind_of? Array
+        super( parent )
+        @name = first[0]
+        @external_id = first[1]
+        @long_name = first[2]
+        @uri = first[3]
       elsif first.kind_of? Source
         super( parent )
         parser = Parsers::BaseParser.new( first )
@@ -64,150 +64,215 @@ module REXML
         end
       else
         super()
-			end
-		end
-
-		def node_type
-			:doctype
-		end
-
-		def attributes_of element
-			rv = []
-			each do |child|
-				child.each do |key,val|
-					rv << Attribute.new(key,val)
-				end if child.kind_of? AttlistDecl and child.element_name == element
-			end
-			rv
-		end
-
-		def attribute_of element, attribute
-			att_decl = find do |child|
-				child.kind_of? AttlistDecl and
-				child.element_name == element and
-				child.include? attribute
-			end
-			return nil unless att_decl
-			att_decl[attribute]
-		end
-
-		def clone
-			DocType.new self
-		end
-
-		# output::
-		#   Where to write the string
-		# indent::
-		#   An integer.  If -1, no indenting will be used; otherwise, the
-		#   indentation will be this number of spaces, and children will be
-		#   indented an additional amount.
-		# transitive::
-		#   If transitive is true and indent is >= 0, then the output will be
-		#   pretty-printed in such a way that the added whitespace does not affect
-		#   the absolute *value* of the document -- that is, it leaves the value
-		#   and number of Text nodes in the document unchanged.
-		# ie_hack::
-		#   Internet Explorer is the worst piece of crap to have ever been
-		#   written, with the possible exception of Windows itself.  Since IE is
-		#   unable to parse proper XML, we have to provide a hack to generate XML
-		#   that IE's limited abilities can handle.  This hack inserts a space 
-		#   before the /> on empty tags.
-		#
-		def write( output, indent=0, transitive=false, ie_hack=false )
-			indent( output, indent )
-			output << START
-			output << ' '
-			output << @name
-			output << " #@external_id" if @external_id
-			output << " #@long_name" if @long_name
-			output << " #@uri" if @uri
-			unless @children.empty?
-				next_indent = indent + 1
-				output << ' ['
-				child = nil		# speed
-				@children.each { |child|
-					output << "\n"
-					child.write( output, next_indent )
-				}
-				output << "\n"
-				#output << '   '*next_indent
-				output << "]"
-			end
-			output << STOP
-		end
+      end
+    end
+
+    def node_type
+      :doctype
+    end
+
+    def attributes_of element
+      rv = []
+      each do |child|
+        child.each do |key,val|
+          rv << Attribute.new(key,val)
+        end if child.kind_of? AttlistDecl and child.element_name == element
+      end
+      rv
+    end
+
+    def attribute_of element, attribute
+      att_decl = find do |child|
+        child.kind_of? AttlistDecl and
+        child.element_name == element and
+        child.include? attribute
+      end
+      return nil unless att_decl
+      att_decl[attribute]
+    end
+
+    def clone
+      DocType.new self
+    end
+
+    # output::
+    #   Where to write the string
+    # indent::
+    #   An integer.  If -1, no indenting will be used; otherwise, the
+    #   indentation will be this number of spaces, and children will be
+    #   indented an additional amount.
+    # transitive::
+    #   If transitive is true and indent is >= 0, then the output will be
+    #   pretty-printed in such a way that the added whitespace does not affect
+    #   the absolute *value* of the document -- that is, it leaves the value
+    #   and number of Text nodes in the document unchanged.
+    # ie_hack::
+    #   Internet Explorer is the worst piece of crap to have ever been
+    #   written, with the possible exception of Windows itself.  Since IE is
+    #   unable to parse proper XML, we have to provide a hack to generate XML
+    #   that IE's limited abilities can handle.  This hack inserts a space 
+    #   before the /> on empty tags.
+    #
+    def write( output, indent=0, transitive=false, ie_hack=false )
+      indent( output, indent )
+      output << START
+      output << ' '
+      output << @name
+      output << " #@external_id" if @external_id
+      output << " #@long_name" if @long_name
+      output << " #@uri" if @uri
+      unless @children.empty?
+        next_indent = indent + 1
+        output << ' ['
+        child = nil    # speed
+        @children.each { |child|
+          output << "\n"
+          child.write( output, next_indent )
+        }
+        #output << '   '*next_indent
+        output << "\n]"
+      end
+      output << STOP
+    end
 
     def context
       @parent.context
     end
 
-		def entity( name )
-			@entities[name].unnormalized if @entities[name]
-		end
-
-		def add child
-			super(child)
-			@entities = DEFAULT_ENTITIES.clone if @entities == DEFAULT_ENTITIES
-			@entities[ child.name ] = child if child.kind_of? Entity
-		end
-	end
-
-	# We don't really handle any of these since we're not a validating
-	# parser, so we can be pretty dumb about them.  All we need to be able
-	# to do is spew them back out on a write()
-
-	# This is an abstract class.  You never use this directly; it serves as a
-	# parent class for the specific declarations.
-	class Declaration < Child
-		def initialize src
-			super()
-			@string = src
-		end
-
-		def to_s
-			@string+'>'
-		end
-
-		def write( output, indent )
-			output << ('   '*indent) if indent > 0
-			output << to_s
-		end
-	end
-	
-	public
-	class ElementDecl < Declaration
-		def initialize( src )
-			super
-		end
-	end
-
-	class ExternalEntity < Child
-		def initialize( src )
-			super()
-			@entity = src
-		end
-		def to_s
-			@entity
-		end
-		def write( output, indent )
-			output << @entity
-			output << "\n"
-		end
-	end
-
-	class NotationDecl < Child
-		def initialize name, middle, rest
-			@name = name
-			@middle = middle
-			@rest = rest
-		end
-
-		def to_s
-			"<!NOTATION #@name '#@middle #@rest'>"
-		end
-
-		def write( output, indent=-1 )
-			output << ('   '*indent) if indent > 0
-			output << to_s
-		end
-	end
+    def entity( name )
+      @entities[name].unnormalized if @entities[name]
+    end
+
+    def add child
+      super(child)
+      @entities = DEFAULT_ENTITIES.clone if @entities == DEFAULT_ENTITIES
+      @entities[ child.name ] = child if child.kind_of? Entity
+    end
+    
+    # This method retrieves the public identifier identifying the document's 
+    # DTD.
+    #
+    # Method contributed by Henrik Martensson
+    def public
+      case @external_id
+      when "SYSTEM"
+        nil
+      when "PUBLIC"
+        strip_quotes(@long_name)
+      end
+    end
+    
+    # This method retrieves the system identifier identifying the document's DTD
+    #
+    # Method contributed by Henrik Martensson
+    def system
+      case @external_id
+      when "SYSTEM"
+        strip_quotes(@long_name)
+      when "PUBLIC"
+        @uri.kind_of?(String) ? strip_quotes(@uri) : nil
+      end
+    end
+    
+    # This method returns a list of notations that have been declared in the
+    # _internal_ DTD subset. Notations in the external DTD subset are not 
+    # listed.
+    #
+    # Method contributed by Henrik Martensson
+    def notations
+      children().select {|node| node.kind_of?(REXML::NotationDecl)}
+    end
+    
+    # Retrieves a named notation. Only notations declared in the internal
+    # DTD subset can be retrieved.
+    #
+    # Method contributed by Henrik Martensson
+    def notation(name)
+      notations.find { |notation_decl|
+        notation_decl.name == name
+      }
+    end
+    
+    private
+    
+    # Method contributed by Henrik Martensson
+    def strip_quotes(quoted_string)
+      quoted_string =~ /^[\'\"].*[\´\"]$/ ?
+        quoted_string[1, quoted_string.length-2] :
+        quoted_string
+    end
+  end
+
+  # We don't really handle any of these since we're not a validating
+  # parser, so we can be pretty dumb about them.  All we need to be able
+  # to do is spew them back out on a write()
+
+  # This is an abstract class.  You never use this directly; it serves as a
+  # parent class for the specific declarations.
+  class Declaration < Child
+    def initialize src
+      super()
+      @string = src
+    end
+
+    def to_s
+      @string+'>'
+    end
+
+    def write( output, indent )
+      output << ('   '*indent) if indent > 0
+      output << to_s
+    end
+  end
+  
+  public
+  class ElementDecl < Declaration
+    def initialize( src )
+      super
+    end
+  end
+
+  class ExternalEntity < Child
+    def initialize( src )
+      super()
+      @entity = src
+    end
+    def to_s
+      @entity
+    end
+    def write( output, indent )
+      output << @entity
+    end
+  end
+
+  class NotationDecl < Child
+    attr_accessor :public, :system
+    def initialize name, middle, pub, sys
+      super(nil)
+      @name = name
+      @middle = middle
+      @public = pub
+      @system = sys
+    end
+
+    def to_s
+      "<!NOTATION #@name #@middle#{
+        @public ? ' ' + public.inspect : '' 
+      }#{
+        @system ? ' ' +@system.inspect : ''
+      }>"
+    end
+
+    def write( output, indent=-1 )
+      output << ('   '*indent) if indent > 0
+      output << to_s
+    end
+    
+    # This method retrieves the name of the notation.
+    #
+    # Method contributed by Henrik Martensson
+    def name
+      @name
+    end
+  end
 end
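
A note on the private strip_quotes helper added above: as shown in this view, the closing character class in its regular expression contains an acute accent where the opening class has an apostrophe, so a value delimited by single quotes would not match and would be returned with its quotes intact. The following is a standalone sketch of one way such a helper could treat both delimiters alike. It is not the code from this commit: the method name strip_matching_quotes is hypothetical, and unlike the original pattern it requires the opening and closing quote to be the same character.

```ruby
# Hypothetical standalone helper, not part of REXML.
# Removes one pair of surrounding quotes when the string starts and
# ends with the same quote character (' or "); otherwise returns the
# string unchanged.
def strip_matching_quotes(quoted_string)
  # \1 refers back to whichever quote character opened the string.
  if quoted_string =~ /\A(['"]).*\1\z/m
    quoted_string[1, quoted_string.length - 2]
  else
    quoted_string
  end
end

puts strip_matching_quotes(%q("urn:x-example:dtd"))
puts strip_matching_quotes(%q('urn:x-example:dtd'))
puts strip_matching_quotes(%q(urn:x-example:dtd))
```

Anchoring with \A and \z rather than ^ and $ is a design choice in this sketch: it keeps a multi-line value from matching on its first line alone.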