From 91ed484f92040e5c2006a3a00ec77a54d552cf37 Mon Sep 17 00:00:00 2001 From: kou Date: Fri, 17 Sep 2010 13:14:14 +0000 Subject: * test/rexml/: import REXML tests from http://www.germane-software.com/repos/rexml/trunk/test/. Many tests are failed temporary. I'll fix them quickly. Sorry. git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29282 b2dd03c8-39d4-4d8f-98ff-823fe69b080e --- test/rexml/data/documentation.xml | 542 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 542 insertions(+) create mode 100644 test/rexml/data/documentation.xml (limited to 'test/rexml/data/documentation.xml') diff --git a/test/rexml/data/documentation.xml b/test/rexml/data/documentation.xml new file mode 100644 index 0000000000..a1ad6e878b --- /dev/null +++ b/test/rexml/data/documentation.xml @@ -0,0 +1,542 @@ + + + + + + + + REXML + + + + @ANT_VERSION@ + + @ANT_DATE@ + + http://www.germane-software.com/software/rexml + + rexml + + ruby + + Sean + Russell + + + + +

REXML is a conformant XML processor for the Ruby programming + language. REXML passes 100% of the Oasis non-validating tests and + includes full XPath support. It is reasonably fast, and is implemented + in pure Ruby. Best of all, it has a clean, intuitive API. REXML is + included in the standard library of Ruby.

+ +

This software is distributed under the Ruby + license.

+
+ + +

REXML arose out of a desire for a straightforward XML API, and is an + attempt at an API that doesn't require constant referencing of + documentation to do common tasks. "Keep the common case simple, and the + uncommon, possible."

+ +

REXML avoids the DOM API, which violates the maxim of simplicity. It + does provide a DOM model, but one that is Ruby-ized. It is an + XML API oriented for Ruby programmers, not for XML programmers coming + from Java.

+ +

One common difference is that the Ruby API relies on block + enumerations rather than iterators. For example, the Java code:

+ + for (Enumeration e=parent.getChildren(); e.hasMoreElements(); ) { + Element child = (Element)e.nextElement(); // Do something with child +} + +

in Ruby becomes:

+ + parent.each_child { |child| + # Do something with child +} + +

Can't you feel the peace and contentment in this block of code? Ruby + is the language Buddha would have programmed in.
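A runnable version of the block enumeration above might look like the following. This is a minimal sketch: the sample document and the `child_names` variable are made up for illustration, and the XML contains no whitespace between tags so that every child is an element.

```ruby
require "rexml/document"

# Hypothetical sample document; no whitespace between tags, so every
# child of <parent> is an Element rather than a text node.
doc = REXML::Document.new("<parent><a/><b/><c/></parent>")
parent = doc.root

child_names = []
parent.each_child { |child| child_names << child.name }

puts child_names.join(", ")
```

Note that `each_child` visits every child node, so a document with indentation would also yield text nodes; `each_element` restricts the iteration to elements.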

+ +

One last thing. If you use and like this software, and you're in a + position of power in a company in Western Europe and are looking for a + software architect or developer, drop me a line. I took a lot of French + classes in college (all of which I've forgotten), and I lived in Munich + long enough that I was pretty fluent by the time I left, and I'd love to + get back over there.

+
+ Four intuitive parsing APIs:
+   Intuitive, powerful, and reasonably fast tree parsing API (a la DOM).
+   Fast stream parsing API (a la SAX). This is not a SAX API.
+   SAX2-based API, in addition to the native REXML streaming API. This is slower than the native REXML API, but does a lot more work for you.
+   Pull parsing API.
+ Small
+ Reasonably fast (for interpreted code)
+ Native Ruby
+ Full XPath support. Currently only available for the tree API.
+ XML 1.0 conformant. REXML passes all of the non-validating OASIS tests. There are probably places where REXML isn't conformant, but I try to fix them as they're reported.
+ ISO-8859-1, UNILE, UTF-16 and UTF-8 input and output; also, support for any encoding that iconv supports.
+ Documentation
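To make the difference between the parsing APIs listed above concrete, here is a hedged sketch that extracts the same data through the tree, stream, and pull APIs. The sample XML, the `ItemListener` class, and the variable names are illustrative assumptions; the SAX2 API is left out to keep the example short.

```ruby
require "rexml/document"
require "rexml/streamlistener"
require "rexml/parsers/pullparser"

# Hypothetical sample input used by all three approaches.
xml = "<inventory><item>apple</item><item>pear</item></inventory>"

# Tree API: parse everything, then query the resulting tree.
tree_items = REXML::Document.new(xml).root.elements.to_a("item").map(&:text)

# Stream API: REXML calls back into a listener as it parses.
class ItemListener
  include REXML::StreamListener
  attr_reader :items

  def initialize
    @items = []
    @in_item = false
  end

  def tag_start(name, _attrs)
    @in_item = (name == "item")
  end

  def text(data)
    @items << data if @in_item
  end

  def tag_end(_name)
    @in_item = false
  end
end

listener = ItemListener.new
REXML::Document.parse_stream(xml, listener)
stream_items = listener.items

# Pull API: the caller asks for the next event when it is ready.
pull_items = []
in_item = false
parser = REXML::Parsers::PullParser.new(xml)
while parser.has_next?
  event = parser.pull
  if event.start_element?
    in_item = (event[0] == "item")
  elsif event.end_element?
    in_item = false
  elsif event.text? && in_item
    pull_items << event[0]
  end
end
```

The tree API is the most convenient of the three; the stream and pull APIs avoid building the whole tree, at the cost of tracking state by hand.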
+ + + +

You don't have to install anything; if you're running a + version of Ruby greater than 1.8, REXML is included. However, if you + choose to upgrade from the REXML distribution, run the command: + ruby bin/install.rb. By the way, you really should look at + these sorts of files before you run them as root. They could contain + anything, and since (in Ruby, at least) they tend to be mercifully + short, it doesn't hurt to glance over them. If you want to uninstall + REXML, run ruby bin/install.rb -u.

+
+ + +

If you have Test::Unit installed, you can run the unit test cases. + Run the command: ruby bin/suite.rb; it runs against the + distribution, not against the installed version.

+
+ + +

There is a benchmark suite in benchmarks/. To run the + benchmarks, change into that directory and run ruby + comparison.rb. If you have nothing else installed, only the + benchmarks for REXML will be run. However, if you have any of the + following installed, benchmarks for those tools will also be run:

+ + + NQXML + + XMLParser + + Electric XML (you must copy EXML.jar into the + benchmarks directory and compile + flatbench.java before running the test) + + +

The results will be written to index.html.

+
+ + +

Please see the Tutorial.

+ +

The API documentation is available on-line, + or it can be downloaded as an archive in + tgz format (~70Kb) or (if you're a masochist) in + zip format (~280Kb). The best solution is to download and install + Dave Thomas' most excellent rdoc and generate the API docs + yourself; then you'll be sure to have the latest API docs and won't have + to keep downloading the doc archive.

+ +

The unit tests in test/ and the benchmarking code in + benchmarks/ provide additional examples of using REXML. The + Tutorial provides examples with commentary. The documentation unpacks + into rexml/doc.

+ +

Kouhei Sutou maintains a Japanese + version of the REXML API docs. Kou's + documentation page contains links to binary archives for various + versions of the documentation.

+
+
+ + + +

Unfortunately, NQXML is the only package REXML can be compared + against; XMLParser uses expat, which is a native library, and really is + a different beast altogether. So in comparing NQXML and REXML you can + look at four things: speed, size, completeness, and API.

+ +

Benchmarks

+ +

REXML is faster than NQXML in some things, and slower than NQXML in a + couple of things. You can see this for yourself by running the supplied + benchmarks. Most of the places where REXML is slower are due to the + convenience methods (for example, + element.elements[index] isn't really an array operation; + index can be an Integer or an XPath, and this feature is relatively + expensive in time). On the positive side, most of the convenience + methods can be bypassed if you know what you are doing. Check the benchmark comparison page for a + general comparison. You can look at the benchmark code yourself + to decide how much salt to take the results with.
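The dual nature of `element.elements[index]` mentioned above can be sketched as follows. The sample document and variable names are assumptions chosen for illustration.

```ruby
require "rexml/document"

# Hypothetical sample document.
doc = REXML::Document.new("<list><item id='a'/><item id='b'/><note/></list>")
root = doc.root

# Integer index: the second child element (indexing starts at 1).
by_index = root.elements[2]

# String index: evaluated as an XPath relative to the element.
by_xpath = root.elements["item[@id='b']"]

puts by_index.attributes["id"]
puts by_index.equal?(by_xpath)
```

Both lookups return the same node here; the XPath form is the more flexible one, and it is this flexibility that makes the call more expensive than a plain array access.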

+ +

The sizes of the XML parsers are close (as measured with + ruby -nle 'print unless /^\s*(#.*|)$/' *.rb | wc -l + ). NQXML 1.1.3 has 1580 non-blank, non-comment lines of code; + REXML 2.0 has 2340. (REXML started out with about 1200, but that + number has been steadily increasing as features are added. XPath + accounts for 541 lines of that code, so the core REXML has about 1800 + LOC.)
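The one-liner above counts lines that are neither blank nor comment-only. The same measurement can be sketched as a small Ruby method; `count_code_lines` and the sample source are hypothetical names introduced only for this illustration.

```ruby
# Same pattern as in the one-liner: a line that is empty, whitespace only,
# or consists solely of a comment.
BLANK_OR_COMMENT = /^\s*(#.*|)$/

# Counts the lines of +source+ that contain actual code.
def count_code_lines(source)
  source.each_line.count { |line| line.chomp !~ BLANK_OR_COMMENT }
end

sample = <<~RUBY
  # a comment

  def hello
    puts "hi"  # a trailing comment still counts as code
  end
RUBY

loc = count_code_lines(sample)
puts loc
```

To measure a real source tree under the same assumptions, the method could be applied to the contents of each `*.rb` file and the results summed.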

+ +

REXML is a conformant XML 1.0 parser. It supports multiple language + encodings, and internal processing uses the required UTF-8 and UTF-16 + encodings. It passes 100% of the Oasis non-validating tests. + Furthermore, it provides a full implementation of XPath, a SAX2 and a + PullParser API.

+
+ + +

As of release 2.0, XPath 1.0 is fully implemented.

+ +

I fully expect bugs to crop up from time to time, so if you see any + bogus XPath results, please let me know. That said, since I'm now + following the XPath grammar and spec fairly closely, I suspect that you + won't be surprised by REXML's XPath very often, and it should become + rock solid fairly quickly.

+ +

Check the "bugs" section for known problems; there are little bits of + XPath here and there that are not yet implemented, but I'll get to them + soon.

+ +

Namespace support is rather odd, but it isn't my fault. I can only do + so much and still conform to the specs. In particular, XPath attempts to + help as much as possible. Therefore, in the trivial cases, you can pass + namespace prefixes to Element.elements[...] and so on -- in these cases, + XPath will use the namespace environment of the base element you're + starting your XPath search from. However, if you want to do something + more complex, like pass in your own namespace environment, you have to + use the XPath first(), each(), and match() methods. Also, default + namespaces force you to use the XPath methods, rather than the + convenience methods, because there is no way for XPath to know what the + mappings for the default namespaces should be. This is exactly why I + loathe namespaces -- a pox on the person(s) who thought them up!
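The two cases described above, prefixes resolved from the base element versus a namespace environment passed in by the caller, might look like this. It is a sketch under stated assumptions: the document, the namespace URIs, and the prefixes `d` and `e` are invented for the example.

```ruby
require "rexml/document"

# Hypothetical document with a default namespace and a prefixed one.
doc = REXML::Document.new(<<~XML)
  <catalog xmlns="urn:example:default" xmlns:x="urn:example:extra">
    <book>Default namespace</book>
    <x:book>Prefixed namespace</x:book>
  </catalog>
XML

# Trivial case: the prefix is declared on the base element, so the
# convenience method can resolve it from that element's environment.
prefixed = doc.root.elements["x:book"]

# Complex case: supply our own prefix-to-URI mapping. The prefixes used
# in the XPath are ours and need not match those in the document.
ns = { "d" => "urn:example:default", "e" => "urn:example:extra" }
default_book = REXML::XPath.first(doc, "//d:book", ns)
extra_book   = REXML::XPath.first(doc, "//e:book", ns)

puts prefixed.text
puts default_book.text
puts extra_book.text
```

Binding a prefix of your own to the default namespace URI, as done with `d` here, is the usual way to address elements in a default namespace through the XPath methods.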

+
+ + +

Namespace support is now fairly stable. One thing to be aware of is + that REXML is not (yet) a validating parser. This means that some + invalid namespace declarations are not caught.

+
+ + +

There is a low-volume mailing list dedicated to REXML. To subscribe, + send an empty email to ser-rexml-subscribe@germane-software.com. + This list is more or less spam proof. To unsubscribe, similarly send a + message to ser-rexml-unsubscribe@germane-software.com.

+
+ + +

An RSS + file for REXML is now being generated from the change log. This + allows you to be alerted of bug fixes and feature additions via "pull". + Another + RSS is available which contains a single item: the release notice + for the most recent release. This is an abuse of the RSS + mechanism, which was intended to be a distribution system for headlines + linked back to full articles, but it works. The headline for REXML is + the version number, and the description is the change log. The links all + link back to the REXML home page. The URL for the RSS itself is + http://www.germane-software.com/software/rexml/rss.xml.

+ +

The changelog itself is here.

+ +

For those who are interested, there's a SLOCCount (by David A. Wheeler) file + with stats on the REXML sourcecode. Note that the SLOCCount output + includes the files in the test/, benchmarks/, and bin/ directories, as + well as the main sourcecode for REXML itself.

+
+ + + + Raggle is a + console-based RSS aggregator. + + getrss + is an RSS aggregator + + Ned Konz's ruby-htmltools + uses REXML + + Hiroshi NAKAMURA's SOAP4R + package can use REXML as the XML processor. + + Chris Morris' XML + Serializer. XML Serializer provides a serialization mechanism + for Ruby that provides a bidirectional mapping between Ruby classes + and XML documents. + + Much of the RubyXML + site is generated with scripts that use REXML. RubyXML is a great + place to find information about the intersection between Ruby and + XML. + + + + +

You can submit bug reports and feature requests, and view the list of + known bugs, at the REXML bug report + page. Please do submit bug reports. If you really want your bug + fixed fast, include an runit or Test::Unit method (or methods) that + illustrates the problem. At the very least, send me some XML that REXML + doesn't process properly.

+ +

You don't have to send an entire test suite -- just the unit test + methods. If you don't send me a unit test, I'll have to write one + myself, which will mean that your bug will take longer to fix.

+ +

When submitting bug reports, please include the version of Ruby and + of REXML that you're using, and the operating system you're running on. + Just run: ruby -vrrexml/rexml -e 'p + REXML::VERSION,PLATFORM' and paste the results in your bug + report. Include your email if you want a response about the bug.
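The same information can be gathered from a short script. This sketch assumes a Ruby where the platform constant is spelled `RUBY_PLATFORM`; the `report` variable is only illustrative.

```ruby
require "rexml/rexml"

# Collect the details a bug report should mention: REXML version,
# Ruby version, and the platform Ruby was built for.
report = "REXML #{REXML::VERSION} on Ruby #{RUBY_VERSION} (#{RUBY_PLATFORM})"
puts report
```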

+ + Attributes are not handled internally as nodes, so you can't + perform node functions on them. This will have to change. It'll also + probably mean that, rather than returning attribute values, XPath will + return the Attribute nodes. + + Some of the XPath functions are untested. (Mike + Stok has been testing, debugging, and implementing some of these + Functions, and he's been doing a good job, so there's steady improvement + in this area.) Any XPath functions that don't work are also + bugs; please report them. If you send a unit test that illustrates the + problem, I'll try to fix the problem within a couple of days (if I can) + and send you a patch, personally. + + Accessing prefixes for which there is no defined namespace in an + XPath should throw an exception. It currently doesn't -- it just fails + to match. +
+ + + Reparsing a tree with a pull/SAX parser + + Better namespace support in SAX + + Lazy tree parsing + + Segregate parsers, for optimized minimal distributions + + XML <-> Ruby + + Validation support + + True XML character support + + Add XPath support for streaming APIs + + XQuery support + + XUpdate support + + Make sure namespaces are supported in pull parser + + Add document start and entity replacement events + in pull parser + + Better stream parsing exception handling + + I'd like to hack XMLRPC4R to use REXML, for my own + purposes. + +
+ REXML is hanging while parsing one of my XML files.
+   Your XML is probably malformed. Some malformed XML, especially XML that contains literal '<' embedded in the document, causes REXML to hang. REXML should be throwing an exception, but it doesn't; this is a bug. I'm aware that it is an extremely annoying bug, and it is one I'm trying to solve in a way that doesn't significantly reduce REXML's parsing speed.
+
+ I'm using the XPath '//foo' on an XML branch node X, and keep getting all of the 'foo' elements in the entire document. Why? Shouldn't it return only the 'foo' element descendants of X?
+   No. XPath specifies that '/' returns the document root, regardless of the context node. '//' also starts at the document root. If you want to limit your search to a branch, you need to use the self:: axis, e.g. 'self::node()//foo', or the shorthand './/foo'.
+
+ I want to parse a document both as a tree, and as a stream. Can I do this?
+   Yes, and no. There is no mechanism that directly supports this in REXML. However, aside from writing your own traversal layer, there is a way of doing this. To turn a tree into a stream, just turn the branch you want to process as a stream back into a string, and re-parse it with your preferred API, e.g. pp = PullParser.new( some_element.to_s ). The other direction is more difficult; you basically have to build a tree from the events. REXML will have one of these builders, eventually, but it doesn't currently exist.
+
+ Why is Element.elements indexed off of '1' instead of '0'?
+   Because of XPath. The XPath specification states that the index of the first child node is '1'. Although it may be counter-intuitive to base elements on 1, it is more undesirable to have element.elements[0] == element.elements[ 'node()[1]' ]. Since I can't change the XPath specification, the result is that Element.elements[1] is the first child element.
+
+ Why isn't REXML a validating parser?
+   Because validating parsers must include code that parses and interprets DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and even that isn't complete. There is DTD parsing code in the works, but I only work on it when I'm really, really bored. Rumor has it that a contributor is working on a DTD parser for REXML; rest assured that any such contribution will be included with REXML as soon as it is available.
+
+ I'm trying to create an ISO-8859-1 document, but when I add text to the document it isn't being properly encoded.
+   Regardless of what the encoding of your document is, when you add text programmatically to a REXML document you must ensure that you are only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1 encoded text that contains characters above 0x80 to REXML trees -- you must convert it to UTF-8 before doing so. Luckily, this is easy: text.unpack('C*').pack('U*') will do the trick. 7-bit ASCII is identical to UTF-8, so if your text is plain ASCII you won't need to worry about this.
+
+ How do I get the tag name of an Element?
+   You take a look at the APIs, and notice that Element includes Namespace. Then you click on the Namespace link and look at the methods that Element includes from Namespace. One of these is name(). Another is expanded_name(). Yet another is prefix(). Then, you email the author of rdoc and ask him to extend rdoc so that it lists methods in the API that are included from other files, so that you don't have to do all of that looking around for your method.
+
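Several of the answers above can be checked with a short script. This is a sketch, not an authoritative test of REXML's behavior: the sample documents and variable names are invented, and the results depend on the REXML version in use.

```ruby
require "rexml/document"
require "rexml/parsers/pullparser"

# Hypothetical sample document with 'foo' elements in two branches.
doc = REXML::Document.new(
  "<root><a><foo id='1'/></a><b><foo id='2'/><c><foo id='3'/></c></b></root>"
)
branch = doc.root.elements["b"]

# '//foo' starts at the document root even when the context is a branch;
# './/foo' limits the search to descendants of the branch.
whole_document = REXML::XPath.match(branch, "//foo").map { |e| e.attributes["id"] }
branch_only    = REXML::XPath.match(branch, ".//foo").map { |e| e.attributes["id"] }

# Element.elements is indexed from 1, following XPath.
first_child = doc.root.elements[1]

# Tree to stream: serialize the branch and re-parse it with the pull parser.
started = []
parser = REXML::Parsers::PullParser.new(branch.to_s)
while parser.has_next?
  event = parser.pull
  started << event[0] if event.start_element?
end

# Tag name, prefix, and expanded name come from the Namespace module.
tagged = REXML::Document.new("<p:item xmlns:p='urn:example'/>").root
names = [tagged.name, tagged.prefix, tagged.expanded_name]

# ISO-8859-1 bytes to UTF-8: each byte value is reinterpreted as a code point.
latin1 = "caf\xE9".b
utf8 = latin1.unpack("C*").pack("U*")

puts branch_only.inspect
puts first_child.name
puts names.inspect
puts utf8
```

The `unpack`/`pack` conversion works because ISO-8859-1 byte values coincide with the first 256 Unicode code points; on current Rubies, `String#encode` is an alternative for the same job.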

I've had help from a number of resources; if I haven't listed you here, + it means that I just haven't gotten around to adding you, or that I'm a + dork and have forgotten. In either case, feel free to write me and + complain.

+ + + Mike Stok has been very active, sending not only fixes for bugs + (especially in Functions), but also by providing unit tests and making + sure REXML runs under Ruby 1.7. He also sent the most awesome hand + knitted tea cozy, with "REXML" and the Ruby knitted into it. + + Kouhei Sutou translated the REXML API documentation to Japanese! + Links are in the API docs section of the main documentation. He has also + contributed a large number of bug reports and patches to fix bugs in + REXML. + + Erik Terpstra heard my pleas and submitted several logos for + REXML. After sagely procrastinating for several weeks, I finally forced + my poor slave of a wife to pick one (this is what we call "delegation"). + She did, with caveats; Erik quickly made the changes, and the result is + what you now see at the top of this page. He also supplied a smaller version that you can include + with your projects that use REXML, if you'd like. + + Ernest Ellingson contributed the sourcecode for turning UTF16 and + UNILE encodings into UTF8, which allowed REXML to get the 100% OASIS + valid tests rating. + + Ian Macdonald provided me with a comprehensive, well written RPM + spec file. + + Oliver M. Bolzer is maintaining a Debian package distribution of + REXML. He also has provided good feedback and bug reports about + namespace support. + + Michael Granger supplied a patch for REXML that makes the unit + tests pass under Ruby 1.7. + + James Britt contributed code that makes + Document.parse_stream easier to use by allowing it to be passed either a + Source, File, or String. + + Tobias Reif: Numerous bug reports, and suggestions for + improvement. + + Stefan Scholl, who provided a lot of feedback and bug reports + while I was trying to get ISO-8859-1 support working. + + Steven E Lumos for volunteering information about XPath + particulars. + + Fumitoshi UKAI provided some bug fixes for CData metacharacter + quoting. 
+ + TAKAHASHI Masayoshi, for information on UTF + + Robert Feldt: Bug reports and suggestions/recommendations about + improving REXML. Testing is one of the most important aspects of + software development. + + Electric + XML: This was, after all, the inspiration for REXML. Originally, + I was just going to do a straight port, and although REXML doesn't in + any way, shape or form resemble Electric XML, still the basic framework + and philosophy was inspired by E-XML. And I still use E-XML in my Java + projects. + + NQXML: + While I may complain about the NQXML API, I wrote a few applications + using it that wouldn't have been written otherwise, and it was very + useful to me. It also encouraged me to write REXML. Never complain about + free software *slap*. + + See my technologies + page for a more comprehensive list of computer technologies that + I depend on for my day-to-day work. + + rdoc, an excellent JavaDoc analog. (When I was first + working on REXML, rdoc wasn't, IMO, very good, so I wrote API2XML. + API2XML was good enough for a while, and then there was a flurry of work + on rdoc, and it quickly surpassed API2XML in features. Since I was never + really interested in maintaining a JavaDoc analog, I stopped support of + API2XML, and am now recommending that people use + rdoc.) + + Many, many other people who've submitted bug reports, suggestions, + and positive feedback. You're all co-developers! + +
+
-- cgit v1.2.3