author     ser <ser@b2dd03c8-39d4-4d8f-98ff-823fe69b080e> 2007-01-20 03:56:02 +0000
committer  ser <ser@b2dd03c8-39d4-4d8f-98ff-823fe69b080e> 2007-01-20 03:56:02 +0000
commit     fa4bfa6af585589e4465831f1489fee83ce26f09
tree       fabaa77b102a6a2b93bdb79b70c2fe5dfe21763e /lib/rexml/source.rb
parent     f700c1354f19ca5ad73f4e119dcbff493a3e6e00
Merged from REXML main repository:

Fixes ticket:68. NOTE that this involves an API change! Entity declarations in the doctype now generate events that carry two, not one, arguments.

Implements ticket:15, using gwrite's suggestion. This allows Element to be subclassed.

Two unrelated changes, because subversion is retarded and doesn't do block-level commits:
1) Fixed a typo bug in previous change for ticket:15
2) Fixed namespaces handling in XPath and element.

***** Note that this is an API change!!! *****
Element.namespaces() now returns a hash of namespace mappings which are relevant for that node.

Fixes a bug in multiple decodings.

The changeset 1230:1231 was bad. The default behavior is *not* to use the native REXML encodings by default, but rather to use ICONV by default. I know that this will piss some people off, but defaulting to the pure Ruby version isn't the correct solution, and it breaks other encodings, so I've reverted it.

* Fixes ticket:61 (xpath_parser)
* Fixes ticket:63 (UTF-16; UNILE decoding was bad)
* Cleans up some tests, removing opportunities for test corruption
* Improves parsing error messages a little
* Adds the ability to override the encoding detection in Source construction
* Fixes an edge case in Functions::string, where document nodes weren't correctly converted
* Fixes Functions::string() for Element and Document nodes
* Fixes some problems in entity handling

Addresses ticket:66
Fixes ticket:71
Addresses ticket:78

NOTE: that this also fixes what is technically another bug in REXML. REXML's XPath parser used to allow exponential notation in numbers. The XPath spec is specific about what a number is, and scientific notation is not included. Therefore, this has been fixed.

Cross-ported a fix for ticket:88 from CVS.
Fixes ticket:80
Documentation cleanup. Ticket:84
Applied Kou's fix for an un-trac'ed bug.
------------------------------------------------------------------------
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@11548 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
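The commit message notes that the XPath parser no longer accepts exponential notation in numbers. For illustration only, here is a minimal Ruby check against the XPath 1.0 Number production (spec rules [30] and [31]). The constant XPATH_NUMBER and the method xpath_number? are hypothetical names, not REXML's actual parser code.

```ruby
# Hypothetical sketch, not REXML's parser. XPath 1.0 defines:
#   [30] Number ::= Digits ('.' Digits?)? | '.' Digits
#   [31] Digits ::= [0-9]+
# There is no exponent part, so "1e3" is not an XPath Number.
XPATH_NUMBER = /\A(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/

# Returns true only if the whole token matches the Number production.
def xpath_number?(token)
  XPATH_NUMBER.match?(token)
end

%w[42 3.14 5. .5 1e3 2.5E-1 .].each do |tok|
  puts "#{tok.ljust(7)} #{xpath_number?(tok)}"
end
```

Note that a leading minus sign is not part of the production either: in XPath, `-1` is the unary minus operator applied to the Number `1`.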
Diffstat (limited to 'lib/rexml/source.rb')
-rw-r--r--  lib/rexml/source.rb  | 23
1 files changed, 17 insertions, 6 deletions
diff --git a/lib/rexml/source.rb b/lib/rexml/source.rb
index 3b6e813baf..2fee99c0e9 100644
--- a/lib/rexml/source.rb
+++ b/lib/rexml/source.rb
@@ -6,7 +6,7 @@ module REXML
# Generates a Source object
# @param arg Either a String, or an IO
# @return a Source, or nil if a bad argument was given
- def SourceFactory::create_from arg#, slurp=true
+ def SourceFactory::create_from(arg)
if arg.kind_of? String
Source.new(arg)
elsif arg.respond_to? :read and
@@ -35,12 +35,19 @@ module REXML
# Constructor
# @param arg must be a String, and should be a valid XML document
- def initialize(arg)
+ # @param encoding if non-null, sets the encoding of the source to this
+ # value, overriding all encoding detection
+ def initialize(arg, encoding=nil)
@orig = @buffer = arg
- self.encoding = check_encoding( @buffer )
+ if encoding
+ self.encoding = encoding
+ else
+ self.encoding = check_encoding( @buffer )
+ end
@line = 0
end
+
# Inherited from Encoding
# Overridden to support optimized en/decoding
def encoding=(enc)
@@ -124,7 +131,7 @@ module REXML
#attr_reader :block_size
# block_size has been deprecated
- def initialize(arg, block_size=500)
+ def initialize(arg, block_size=500, encoding=nil)
@er_source = @source = arg
@to_utf = false
# Determining the encoding is a deceptively difficult issue to resolve.
@@ -134,10 +141,12 @@ module REXML
# if there is one. If there isn't one, the file MUST be UTF-8, as per
# the XML spec. If there is one, we can determine the encoding from
# it.
+ @buffer = ""
str = @source.read( 2 )
- if /\A(?:\xfe\xff|\xff\xfe)/n =~ str
+ if encoding
+ self.encoding = encoding
+ elsif /\A(?:\xfe\xff|\xff\xfe)/n =~ str
self.encoding = check_encoding( str )
- @line_break = encode( '>' )
else
@line_break = '>'
end
@@ -159,6 +168,8 @@ module REXML
str = @source.readline(@line_break)
str = decode(str) if @to_utf and str
@buffer << str
+ rescue Iconv::IllegalSequence
+ raise
rescue
@source = nil
end
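The diff adds an optional `encoding` argument that, when given, skips detection entirely. It also re-raises `Iconv::IllegalSequence` ahead of the catch-all `rescue`, so a decoding failure no longer falls into the branch that just sets `@source = nil`. The sketch below illustrates both ideas in present-day Ruby under stated assumptions: `SniffingSource`, `read_line` and `decode` are hypothetical names, not REXML's API; Iconv was removed from Ruby's standard library in 2.0, so the standard `Encoding` errors stand in for `Iconv::IllegalSequence`; only the two-byte UTF-16 BOM is checked, and any XML declaration is ignored.

```ruby
require "stringio"

# Hypothetical sketch, not REXML's API. Encoding::* errors stand in for
# Iconv::IllegalSequence; only the UTF-16 BOM is sniffed.
class SniffingSource
  BOM_BE = "\xFE\xFF".b
  BOM_LE = "\xFF\xFE".b

  attr_reader :encoding, :buffer

  # io       - an IO-like object responding to #read and #readline
  # encoding - if non-nil, used as-is and BOM detection is skipped
  def initialize(io, encoding = nil)
    @io = io
    head = (io.read(2) || "").b     # peek at the first two bytes
    @encoding =
      if encoding
        encoding                    # explicit override wins
      elsif head == BOM_BE
        head = "".b                 # drop the BOM itself
        "UTF-16BE"
      elsif head == BOM_LE
        head = "".b
        "UTF-16LE"
      else
        "UTF-8"                     # no BOM; <?xml encoding=...?> is ignored here
      end
    @buffer = decode(head)
  end

  # Appends the next line, transcoded to UTF-8, to the buffer.
  # Same rescue ordering as the diff: decoding errors propagate, while any
  # other StandardError (e.g. EOFError at end of input) marks the source
  # as exhausted. Splitting on "\n" assumes an ASCII-compatible encoding.
  def read_line
    return if @io.nil?
    @buffer << decode(@io.readline.b)
  rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
    raise
  rescue StandardError
    @io = nil
  end

  private

  # Reinterprets raw bytes in the source encoding and transcodes to UTF-8.
  def decode(bytes)
    bytes.force_encoding(@encoding).encode(Encoding::UTF_8)
  end
end
```

Rescue clauses are tried in order, so the specific clause has to come before the catch-all: with the order reversed, the bare `rescue` would match first and the decoding error would be swallowed again.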