path: root/lib/csv.rb
author    nobu <nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>  2009-03-06 03:56:38 +0000
committer nobu <nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>  2009-03-06 03:56:38 +0000
commit    287a34ae0dfc23e4158f67cb7783d239f202c368 (patch)
tree      5e35d5b41aae961b37cf6632f60c42f51c7aa775 /lib/csv.rb
parent    9b52ae2e6491bb5d6c59e1799449f6268baf6f89 (diff)
* {ext,lib,test}/**/*.rb: removed trailing spaces.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@22784 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Diffstat (limited to 'lib/csv.rb')
-rw-r--r--  lib/csv.rb  970
1 files changed, 485 insertions, 485 deletions
diff --git a/lib/csv.rb b/lib/csv.rb
index 8adc2973b9..a5f670c059 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -5,38 +5,38 @@
# Created by James Edward Gray II on 2005-10-31.
# Copyright 2005 James Edward Gray II. You can redistribute or modify this code
# under the terms of Ruby's license.
-#
+#
# See CSV for documentation.
-#
+#
# == Description
-#
+#
# Welcome to the new and improved CSV.
-#
+#
# This version of the CSV library began its life as FasterCSV. FasterCSV was
# intended as a replacement to Ruby's then standard CSV library. It was
# designed to address concerns users of that library had and it had three
# primary goals:
-#
+#
# 1. Be significantly faster than CSV while remaining a pure Ruby library.
# 2. Use a smaller and easier to maintain code base. (FasterCSV eventually
# grew larger, but was also considerably richer in features. The parsing
# core remains quite small.)
# 3. Improve on the CSV interface.
-#
+#
# Obviously, the last one is subjective. I did try to defer to the original
# interface whenever I didn't have a compelling reason to change it though, so
# hopefully this won't be too radically different.
-#
+#
# We must have met our goals because FasterCSV was renamed to CSV and replaced
# the original library.
-#
+#
# == What's Different From the Old CSV?
-#
+#
# I'm sure I'll miss something, but I'll try to mention most of the major
# differences I am aware of, to help others quickly get up to speed:
-#
+#
# === CSV Parsing
-#
+#
# * This parser is m17n aware. See CSV for full details.
# * This library has a stricter parser and will throw MalformedCSVErrors on
# problematic data.
@@ -46,9 +46,9 @@
# * The old library returned empty lines as <tt>[nil]</tt>. This library calls
# them <tt>[]</tt>.
# * This library has a much faster parser.
-#
+#
# === Interface
-#
+#
# * CSV now uses Hash-style parameters to set options.
# * CSV no longer has generate_row() or parse_row().
# * The old CSV's Reader and Writer classes have been dropped.
@@ -60,33 +60,33 @@
# * CSV no longer supports partial reads. It works line-by-line.
# * CSV no longer allows the instance methods to override the separators for
# performance reasons. They must be set in the constructor.
-#
+#
# If you use this library and find yourself missing any functionality I have
# trimmed, please {let me know}[mailto:james@grayproductions.net].
-#
+#
# == Documentation
-#
+#
# See CSV for documentation.
-#
+#
# == What is CSV, really?
-#
+#
# CSV maintains a pretty strict definition of CSV taken directly from
# {the RFC}[http://www.ietf.org/rfc/rfc4180.txt]. I relax the rules in only one
# place and that is to make using this library easier. CSV will parse all valid
# CSV.
-#
+#
# What you don't want to do is feed CSV invalid data. Because of the way the
# CSV format works, it's common for a parser to need to read until the end of
# the file to be sure a field is invalid. This eats a lot of time and memory.
-#
+#
# Luckily, when working with invalid CSV, Ruby's built-in methods will almost
# always be superior in every way. For example, parsing non-quoted fields is as
# easy as:
-#
+#
# data.split(",")
-#
+#
# == Questions and/or Comments
-#
+#
# Feel free to email {James Edward Gray II}[mailto:james@grayproductions.net]
# with any questions.
@@ -95,139 +95,139 @@ require "English"
require "date"
require "stringio"
-#
+#
# This class provides a complete interface to CSV files and data. It offers
# tools to enable you to read and write to and from Strings or IO objects, as
# needed.
-#
+#
# == Reading
-#
+#
# === From a File
-#
+#
# ==== A Line at a Time
-#
+#
# CSV.foreach("path/to/file.csv") do |row|
# # use row here...
# end
-#
+#
# ==== All at Once
-#
+#
# arr_of_arrs = CSV.read("path/to/file.csv")
-#
+#
# === From a String
-#
+#
# ==== A Line at a Time
-#
+#
# CSV.parse("CSV,data,String") do |row|
# # use row here...
# end
-#
+#
# ==== All at Once
-#
+#
# arr_of_arrs = CSV.parse("CSV,data,String")
-#
+#
# == Writing
-#
+#
# === To a File
-#
+#
# CSV.open("path/to/file.csv", "wb") do |csv|
# csv << ["row", "of", "CSV", "data"]
# csv << ["another", "row"]
# # ...
# end
-#
+#
# === To a String
-#
+#
# csv_string = CSV.generate do |csv|
# csv << ["row", "of", "CSV", "data"]
# csv << ["another", "row"]
# # ...
# end
-#
+#
# == Convert a Single Line
-#
+#
# csv_string = ["CSV", "data"].to_csv # to CSV
# csv_array = "CSV,String".parse_csv # from CSV
-#
+#
# == Shortcut Interface
-#
+#
# CSV { |csv_out| csv_out << %w{my data here} } # to $stdout
# CSV(csv = "") { |csv_str| csv_str << %w{my data here} } # to a String
# CSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr
-#
+#
# == CSV and Character Encodings (M17n or Multilingualization)
-#
+#
# This new CSV parser is m17n savvy. The parser works in the Encoding of the IO
# or String object being read from or written to. Your data is never transcoded
# (unless you ask Ruby to transcode it for you) and will literally be parsed in
# the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the
# Encoding of your data. This is accomplished by transcoding the parser itself
# into your Encoding.
-#
+#
# Some transcoding must take place, of course, to accomplish this multiencoding
# support. For example, <tt>:col_sep</tt>, <tt>:row_sep</tt>, and
# <tt>:quote_char</tt> must be transcoded to match your data. Hopefully this
# makes the entire process feel transparent, since CSV's defaults should just
# magically work for your data. However, you can set these values manually in
# the target Encoding to avoid the translation.
-#
+#
# It's also important to note that while all of CSV's core parser is now
# Encoding agnostic, some features are not. For example, the built-in
# converters will try to transcode data to UTF-8 before making conversions.
# Again, you can provide custom converters that are aware of your Encodings to
# avoid this translation. It's just too hard for me to support native
# conversions in all of Ruby's Encodings.
-#
+#
# Anyway, the practical side of this is simple: make sure IO and String objects
# passed into CSV have the proper Encoding set and everything should just work.
# CSV methods that allow you to open IO objects (CSV::foreach(), CSV::open(),
# CSV::read(), and CSV::readlines()) do allow you to specify the Encoding.
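#
# For illustration, here is a minimal sketch of passing that Encoding hint when
# reading (the "data.csv" path and its UTF-32BE encoding are hypothetical):
#
#   rows = CSV.read("data.csv", encoding: "UTF-32BE:UTF-8")
#   rows.first.map { |field| field.encoding }   # => all UTF-8 after transcoding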
-#
+#
# One minor exception comes when generating CSV into a String with an Encoding
# that is not ASCII compatible. There's no existing data for CSV to use to
# prepare itself and thus you will probably need to manually specify the desired
# Encoding for most of those cases. It will try to guess using the fields in a
# row of output though, when using CSV::generate_line() or Array#to_csv().
-#
+#
# I try to point out any other Encoding issues in the documentation of methods
# as they come up.
-#
+#
# This has been tested to the best of my ability with all non-"dummy" Encodings
# Ruby ships with. However, it is brave new code and may have some bugs.
# Please feel free to {report}[mailto:james@grayproductions.net] any issues you
# find with it.
-#
+#
class CSV
# The version of the installed library.
VERSION = "2.4.5".freeze
-
- #
+
+ #
# A CSV::Row is part Array and part Hash. It retains an order for the fields
# and allows duplicates just as an Array would, but also allows you to access
# fields by name just as you could if they were in a Hash.
- #
+ #
# All rows returned by CSV will be constructed from this class, if header row
# processing is activated.
- #
+ #
class Row
- #
+ #
# Construct a new CSV::Row from +headers+ and +fields+, which are expected
# to be Arrays. If one Array is shorter than the other, it will be padded
# with +nil+ objects.
- #
+ #
# The optional +header_row+ parameter can be set to +true+ to indicate, via
# CSV::Row.header_row?() and CSV::Row.field_row?(), that this is a header
# row. Otherwise, the row is assumed to be a field row.
- #
+ #
# A CSV::Row object supports the following Array methods through delegation:
- #
+ #
# * empty?()
# * length()
# * size()
- #
+ #
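# A minimal construction sketch (the header and field values are hypothetical):
#
#   row = CSV::Row.new(%w{Name Age}, ["Anna", "30"])
#   row["Name"]   # => "Anna"
#   row.fields    # => ["Anna", "30"]
#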
def initialize(headers, fields, header_row = false)
@header_row = header_row
-
+
# handle extra headers or fields
@row = if headers.size > fields.size
headers.zip(fields)
@@ -235,7 +235,7 @@ class CSV
fields.zip(headers).map { |pair| pair.reverse }
end
end
-
+
# Internal data format used to compare equality.
attr_reader :row
protected :row
@@ -244,35 +244,35 @@ class CSV
extend Forwardable
def_delegators :@row, :empty?, :length, :size
-
+
# Returns +true+ if this is a header row.
def header_row?
@header_row
end
-
+
# Returns +true+ if this is a field row.
def field_row?
not header_row?
end
-
+
# Returns the headers of this row.
def headers
@row.map { |pair| pair.first }
end
-
- #
+
+ #
# :call-seq:
# field( header )
# field( header, offset )
# field( index )
- #
+ #
# This method will fetch the field value by +header+ or +index+. If a field
# is not found, +nil+ is returned.
- #
+ #
# When provided, +offset+ ensures that a header match occurs on or later
- # than the +offset+ index. You can use this to find duplicate headers,
+ # than the +offset+ index. You can use this to find duplicate headers,
# without resorting to hard-coding exact indices.
- #
+ #
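# For example, a sketch of using +offset+ with duplicate headers (hypothetical
# data):
#
#   row = CSV::Row.new(%w{A A B}, [1, 2, 3])
#   row.field("A")      # => 1
#   row.field("A", 1)   # => 2
#   row.field(2)        # => 3
#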
def field(header_or_index, minimum_index = 0)
# locate the pair
finder = header_or_index.is_a?(Integer) ? :[] : :assoc
@@ -282,23 +282,23 @@ class CSV
pair.nil? ? nil : pair.last
end
alias_method :[], :field
-
- #
+
+ #
# :call-seq:
# []=( header, value )
# []=( header, offset, value )
# []=( index, value )
- #
+ #
# Looks up the field by the semantics described in CSV::Row.field() and
# assigns the +value+.
- #
+ #
# Assigning past the end of the row with an index will set all pairs between
# to <tt>[nil, nil]</tt>. Assigning to an unused header appends the new
# pair.
- #
+ #
def []=(*args)
value = args.pop
-
+
if args.first.is_a? Integer
if @row[args.first].nil? # extending past the end with index
@row[args.first] = [nil, value]
@@ -315,20 +315,20 @@ class CSV
end
end
end
-
- #
+
+ #
# :call-seq:
# <<( field )
# <<( header_and_field_array )
# <<( header_and_field_hash )
- #
+ #
# If a two-element Array is provided, it is assumed to be a header and field
# and the pair is appended. A Hash works the same way with the key being
# the header and the value being the field. Anything else is assumed to be
# a lone field which is appended with a +nil+ header.
- #
+ #
# This method returns the row for chaining.
- #
+ #
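# A small appending sketch (the values are hypothetical):
#
#   row << "bare field"           # appended with a +nil+ header
#   row << ["Header", "value"]    # appended as a header/field pair
#   row << { "Key" => "value" }   # same, via a Hash
#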
def <<(arg)
if arg.is_a?(Array) and arg.size == 2 # appending a header and name
@row << arg
@@ -337,33 +337,33 @@ class CSV
else # append field value
@row << [nil, arg]
end
-
+
self # for chaining
end
-
- #
+
+ #
# A shortcut for appending multiple fields. Equivalent to:
- #
+ #
# args.each { |arg| csv_row << arg }
- #
+ #
# This method returns the row for chaining.
- #
+ #
def push(*args)
args.each { |arg| self << arg }
-
+
self # for chaining
end
-
- #
+
+ #
# :call-seq:
# delete( header )
# delete( header, offset )
# delete( index )
- #
+ #
# Used to remove a pair from the row by +header+ or +index+. The pair is
# located as described in CSV::Row.field(). The deleted pair is returned,
# or +nil+ if a pair could not be found.
- #
+ #
def delete(header_or_index, minimum_index = 0)
if header_or_index.is_a? Integer # by index
@row.delete_at(header_or_index)
@@ -371,28 +371,28 @@ class CSV
@row.delete_at(index(header_or_index, minimum_index))
end
end
-
- #
+
+ #
# The provided +block+ is passed a header and field for each pair in the row
# and expected to return +true+ or +false+, depending on whether the pair
# should be deleted.
- #
+ #
# This method returns the row for chaining.
- #
+ #
def delete_if(&block)
@row.delete_if(&block)
-
+
self # for chaining
end
-
- #
+
+ #
# This method accepts any number of arguments which can be headers, indices,
- # Ranges of either, or two-element Arrays containing a header and offset.
+ # Ranges of either, or two-element Arrays containing a header and offset.
# Each argument will be replaced with a field lookup as described in
# CSV::Row.field().
- #
+ #
# If called with no arguments, all fields are returned.
- #
+ #
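# A lookup sketch (hypothetical row):
#
#   row = CSV::Row.new(%w{A B C}, [1, 2, 3])
#   row.fields               # => [1, 2, 3]
#   row.fields("A", 2)       # => [1, 3]
#   row.fields("A", 1..2)    # => [1, 2, 3]
#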
def fields(*headers_and_or_indices)
if headers_and_or_indices.empty? # return all fields--no arguments
@row.map { |pair| pair.last }
@@ -413,80 +413,80 @@ class CSV
end
end
alias_method :values_at, :fields
-
- #
+
+ #
# :call-seq:
# index( header )
# index( header, offset )
- #
+ #
# This method will return the index of a field with the provided +header+.
# The +offset+ can be used to locate duplicate header names, as described in
# CSV::Row.field().
- #
+ #
def index(header, minimum_index = 0)
# find the pair
index = headers[minimum_index..-1].index(header)
# return the index at the right offset, if we found one
index.nil? ? nil : index + minimum_index
end
-
+
# Returns +true+ if +name+ is a header for this row, and +false+ otherwise.
def header?(name)
headers.include? name
end
alias_method :include?, :header?
-
- #
+
+ #
# Returns +true+ if +data+ matches a field in this row, and +false+
# otherwise.
- #
+ #
def field?(data)
fields.include? data
end
include Enumerable
-
- #
+
+ #
# Yields each pair of the row as header and field tuples (much like
# iterating over a Hash).
- #
+ #
# Support for Enumerable.
- #
+ #
# This method returns the row for chaining.
- #
+ #
def each(&block)
@row.each(&block)
-
+
self # for chaining
end
-
- #
- # Returns +true+ if this row contains the same headers and fields in the
+
+ #
+ # Returns +true+ if this row contains the same headers and fields in the
# same order as +other+.
- #
+ #
def ==(other)
@row == other.row
end
-
- #
+
+ #
# Collapses the row into a simple Hash. Be warned that this discards field
# order and clobbers duplicate fields.
- #
+ #
def to_hash
# flatten just one level of the internal Array
Hash[*@row.inject(Array.new) { |ary, pair| ary.push(*pair) }]
end
-
- #
+
+ #
# Returns the row as a CSV String. Headers are not used. Equivalent to:
- #
+ #
# csv_row.fields.to_csv( options )
- #
+ #
def to_csv(options = Hash.new)
fields.to_csv(options)
end
alias_method :to_s, :to_csv
-
+
# A summary of fields, by header, in an ASCII compatible String.
def inspect
str = ["#<", self.class.to_s]
@@ -505,35 +505,35 @@ class CSV
end
end
end
-
- #
+
+ #
# A CSV::Table is a two-dimensional data structure for representing CSV
- # documents. Tables allow you to work with the data by row or column,
+ # documents. Tables allow you to work with the data by row or column,
# manipulate the data, and even convert the results back to CSV, if needed.
- #
+ #
# All tables returned by CSV will be constructed from this class, if header
# row processing is activated.
- #
+ #
class Table
- #
+ #
# Construct a new CSV::Table from +array_of_rows+, which are expected
# to be CSV::Row objects. All rows are assumed to have the same headers.
- #
+ #
# A CSV::Table object supports the following Array methods through
# delegation:
- #
+ #
# * empty?()
# * length()
# * size()
- #
+ #
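# A small construction sketch (the rows are hypothetical):
#
#   rows  = [ CSV::Row.new(%w{A B}, [1, 2]),
#             CSV::Row.new(%w{A B}, [3, 4]) ]
#   table = CSV::Table.new(rows)
#   table["A"]   # => [1, 3]  (column access by header in the default mixed mode)
#   table[0]     # => the first CSV::Row
#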
def initialize(array_of_rows)
@table = array_of_rows
@mode = :col_or_row
end
-
+
# The current access mode for indexing and iteration.
attr_reader :mode
-
+
# Internal data format used to compare equality.
attr_reader :table
protected :table
@@ -542,88 +542,88 @@ class CSV
extend Forwardable
def_delegators :@table, :empty?, :length, :size
-
- #
- # Returns a duplicate table object, in column mode. This is handy for
- # chaining in a single call without changing the table mode, but be aware
+
+ #
+ # Returns a duplicate table object, in column mode. This is handy for
+ # chaining in a single call without changing the table mode, but be aware
# that this method can consume a fair amount of memory for bigger data sets.
- #
+ #
# This method returns the duplicate table for chaining. Don't chain
# destructive methods (like []=()) this way though, since you are working
# with a duplicate.
- #
+ #
def by_col
self.class.new(@table.dup).by_col!
end
-
- #
+
+ #
# Switches the mode of this table to column mode. All calls to indexing and
# iteration methods will work with columns until the mode is changed again.
- #
+ #
# This method returns the table and is safe to chain.
- #
+ #
def by_col!
@mode = :col
-
+
self
end
-
- #
- # Returns a duplicate table object, in mixed mode. This is handy for
- # chaining in a single call without changing the table mode, but be aware
+
+ #
+ # Returns a duplicate table object, in mixed mode. This is handy for
+ # chaining in a single call without changing the table mode, but be aware
# that this method can consume a fair amount of memory for bigger data sets.
- #
+ #
# This method returns the duplicate table for chaining. Don't chain
# destructive methods (like []=()) this way though, since you are working
# with a duplicate.
- #
+ #
def by_col_or_row
self.class.new(@table.dup).by_col_or_row!
end
-
- #
+
+ #
# Switches the mode of this table to mixed mode. All calls to indexing and
# iteration methods will use the default intelligent indexing system until
# the mode is changed again. In mixed mode an index is assumed to be a row
# reference while anything else is assumed to be column access by headers.
- #
+ #
# This method returns the table and is safe to chain.
- #
+ #
def by_col_or_row!
@mode = :col_or_row
-
+
self
end
-
- #
+
+ #
# Returns a duplicate table object, in row mode. This is handy for chaining
# in a single call without changing the table mode, but be aware that this
# method can consume a fair amount of memory for bigger data sets.
- #
+ #
# This method returns the duplicate table for chaining. Don't chain
# destructive methods (like []=()) this way though, since you are working
# with a duplicate.
- #
+ #
def by_row
self.class.new(@table.dup).by_row!
end
-
- #
+
+ #
# Switches the mode of this table to row mode. All calls to indexing and
# iteration methods will work with rows until the mode is changed again.
- #
+ #
# This method returns the table and is safe to chain.
- #
+ #
def by_row!
@mode = :row
-
+
self
end
-
- #
+
+ #
# Returns the headers for the first row of this table (assumed to match all
# other rows). An empty Array is returned for empty tables.
- #
+ #
def headers
if @table.empty?
Array.new
@@ -631,15 +631,15 @@ class CSV
@table.first.headers
end
end
-
- #
+
+ #
# In the default mixed mode, this method returns rows for index access and
# columns for header access. You can force the index association by first
# calling by_col!() or by_row!().
- #
+ #
# Columns are returned as an Array of values. Altering that Array has no
# effect on the table.
- #
+ #
def [](index_or_header)
if @mode == :row or # by index
(@mode == :col_or_row and index_or_header.is_a? Integer)
@@ -648,23 +648,23 @@ class CSV
@table.map { |row| row[index_or_header] }
end
end
-
- #
+
+ #
# In the default mixed mode, this method assigns rows for index access and
# columns for header access. You can force the index association by first
# calling by_col!() or by_row!().
- #
+ #
# Rows may be set to an Array of values (which will inherit the table's
# headers()) or a CSV::Row.
- #
- # Columns may be set to a single value, which is copied to each row of the
+ #
+ # Columns may be set to a single value, which is copied to each row of the
# column, or an Array of values. Arrays of values are assigned to rows top
# to bottom in row major order. Excess values are ignored and if the Array
# does not have a value for each row the extra rows will receive a +nil+.
- #
+ #
# Assigning to an existing column or row clobbers the data. Assigning to
# new columns creates them at the right end of the table.
- #
+ #
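# An assignment sketch, continuing the hypothetical table above:
#
#   table["C"] = [5, 6]       # adds a new column at the right end
#   table["B"] = 0            # a single value is copied into every row
#   table[0]   = [7, 8, 9]    # an Array replaces the first row (headers inherited)
#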
def []=(index_or_header, value)
if @mode == :row or # by index
(@mode == :col_or_row and index_or_header.is_a? Integer)
@@ -693,16 +693,16 @@ class CSV
end
end
end
-
- #
+
+ #
# The mixed mode default is to treat a list of indices as row access,
# returning the rows indicated. Anything else is considered columnar
# access. For columnar access, the return set has an Array for each row
# with the values indicated by the headers in each Array. You can force
# column or row mode using by_col!() or by_row!().
- #
+ #
# You cannot mix column and row access.
- #
+ #
def values_at(*indices_or_headers)
if @mode == :row or # by indices
( @mode == :col_or_row and indices_or_headers.all? do |index|
@@ -717,41 +717,41 @@ class CSV
end
end
- #
+ #
# Adds a new row to the bottom end of this table. You can provide an Array,
# which will be converted to a CSV::Row (inheriting the table's headers()),
# or a CSV::Row.
- #
+ #
# This method returns the table for chaining.
- #
+ #
def <<(row_or_array)
if row_or_array.is_a? Array # append Array
@table << Row.new(headers, row_or_array)
else # append Row
@table << row_or_array
end
-
+
self # for chaining
end
-
- #
+
+ #
# A shortcut for appending multiple rows. Equivalent to:
- #
+ #
# rows.each { |row| self << row }
- #
+ #
# This method returns the table for chaining.
- #
+ #
def push(*rows)
rows.each { |row| self << row }
-
+
self # for chaining
end
- #
+ #
# Removes and returns the indicated column or row. In the default mixed
# mode indices refer to rows and everything else is assumed to be a column
# header. Use by_col!() or by_row!() to force the lookup.
- #
+ #
def delete(index_or_header)
if @mode == :row or # by index
(@mode == :col_or_row and index_or_header.is_a? Integer)
@@ -760,15 +760,15 @@ class CSV
@table.map { |row| row.delete(index_or_header).last }
end
end
-
- #
+
+ #
# Removes any column or row for which the block returns +true+. In the
# default mixed mode or row mode, iteration is the standard row major
# walking of rows. In column mode, iteration will +yield+ two element
# tuples containing the column name and an Array of values for that column.
- #
+ #
# This method returns the table for chaining.
- #
+ #
def delete_if(&block)
if @mode == :row or @mode == :col_or_row # by index
@table.delete_if(&block)
@@ -779,38 +779,38 @@ class CSV
end
to_delete.map { |header| delete(header) }
end
-
+
self # for chaining
end
-
+
include Enumerable
-
- #
+
+ #
# In the default mixed mode or row mode, iteration is the standard row major
# walking of rows. In column mode, iteration will +yield+ two element
# tuples containing the column name and an Array of values for that column.
- #
+ #
# This method returns the table for chaining.
- #
+ #
def each(&block)
if @mode == :col
headers.each { |header| block[[header, self[header]]] }
else
@table.each(&block)
end
-
+
self # for chaining
end
-
+
# Returns +true+ if all rows of this table ==() +other+'s rows.
def ==(other)
@table == other.table
end
-
- #
+
+ #
# Returns the table as an Array of Arrays. Headers will be the first row,
# then all of the field rows will follow.
- #
+ #
def to_a
@table.inject([headers]) do |array, row|
if row.header_row?
@@ -820,11 +820,11 @@ class CSV
end
end
end
-
- #
+
+ #
# Returns the table as a complete CSV String. Headers will be listed first,
# then all of the field rows.
- #
+ #
def to_csv(options = Hash.new)
@table.inject([headers.to_csv(options)]) do |rows, row|
if row.header_row?
@@ -835,7 +835,7 @@ class CSV
end.join
end
alias_method :to_s, :to_csv
-
+
# Shows the mode and size of this table in a US-ASCII String.
def inspect
"#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
@@ -844,19 +844,19 @@ class CSV
# The error thrown when the parser encounters illegal CSV formatting.
class MalformedCSVError < RuntimeError; end
-
- #
+
+ #
# A FieldInfo Struct contains details about a field's position in the data
# source it was read from. CSV will pass this Struct to some blocks that make
# decisions based on field structure. See CSV.convert_fields() for an
# example.
- #
+ #
# <b><tt>index</tt></b>:: The zero-based index of the field in its row.
# <b><tt>line</tt></b>:: The line of the data source this row is from.
# <b><tt>header</tt></b>:: The header for the column, when available.
- #
+ #
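# For example, a sketch of a two-argument converter receiving a FieldInfo
# (the conversion logic is hypothetical):
#
#   CSV.parse("1,2\n3,4\n", converters: lambda { |field, info|
#     info.index.zero? ? Integer(field) : field
#   })
#   # => [[1, "2"], [3, "4"]]
#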
FieldInfo = Struct.new(:index, :line, :header)
-
+
# A Regexp used to find and convert some common Date formats.
DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
\d{4}-\d{2}-\d{2} )\z /x
@@ -864,34 +864,34 @@ class CSV
DateTimeMatcher =
/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
-
+
# The encoding used by all converters.
ConverterEncoding = Encoding.find("UTF-8")
-
- #
+
+ #
# This Hash holds the built-in converters of CSV that can be accessed by name.
# You can select Converters with CSV.convert() or through the +options+ Hash
# passed to CSV::new().
- #
+ #
# <b><tt>:integer</tt></b>:: Converts any field Integer() accepts.
# <b><tt>:float</tt></b>:: Converts any field Float() accepts.
- # <b><tt>:numeric</tt></b>:: A combination of <tt>:integer</tt>
+ # <b><tt>:numeric</tt></b>:: A combination of <tt>:integer</tt>
# and <tt>:float</tt>.
# <b><tt>:date</tt></b>:: Converts any field Date::parse() accepts.
# <b><tt>:date_time</tt></b>:: Converts any field DateTime::parse() accepts.
- # <b><tt>:all</tt></b>:: All built-in converters. A combination of
+ # <b><tt>:all</tt></b>:: All built-in converters. A combination of
# <tt>:date_time</tt> and <tt>:numeric</tt>.
- #
+ #
# All built-in converters transcode field data to UTF-8 before attempting a
# conversion. If your data cannot be transcoded to UTF-8 the conversion will
# fail and the field will remain unchanged.
- #
+ #
# This Hash is intentionally left unfrozen and users should feel free to add
# values to it that can be accessed by all CSV objects.
- #
+ #
# To add a combo field, the value should be an Array of names. Combo fields
# can be nested with other combo fields.
- #
+ #
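# For example, a user-added entry might look like this (the :upcase name is
# purely illustrative):
#
#   CSV::Converters[:upcase] = lambda { |f| f.upcase rescue f }
#   CSV.parse("a,1\n", converters: [:upcase, :integer])   # => [["A", 1]]
#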
Converters = { integer: lambda { |f|
Integer(f.encode(ConverterEncoding)) rescue f
},
@@ -917,26 +917,26 @@ class CSV
},
all: [:date_time, :numeric] }
- #
+ #
# This Hash holds the built-in header converters of CSV that can be accessed
# by name. You can select HeaderConverters with CSV.header_convert() or
# through the +options+ Hash passed to CSV::new().
- #
+ #
# <b><tt>:downcase</tt></b>:: Calls downcase() on the header String.
# <b><tt>:symbol</tt></b>:: The header String is downcased, spaces are
# replaced with underscores, non-word characters
# are dropped, and finally to_sym() is called.
- #
+ #
# All built-in header converters transcode header data to UTF-8 before
# attempting a conversion. If your data cannot be transcoded to UTF-8 the
# conversion will fail and the header will remain unchanged.
- #
+ #
# This Hash is intentionally left unfrozen and users should feel free to add
# values to it that can be accessed by all CSV objects.
- #
+ #
# To add a combo field, the value should be an Array of names. Combo fields
# can be nested with other combo fields.
- #
+ #
HeaderConverters = {
downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
symbol: lambda { |h|
@@ -944,10 +944,10 @@ class CSV
gsub(/\W+/, "").to_sym
}
}
-
- #
+
+ #
# The options used when no overrides are given by calling code. They are:
- #
+ #
# <b><tt>:col_sep</tt></b>:: <tt>","</tt>
# <b><tt>:row_sep</tt></b>:: <tt>:auto</tt>
# <b><tt>:quote_char</tt></b>:: <tt>'"'</tt>
@@ -959,10 +959,10 @@ class CSV
# <b><tt>:header_converters</tt></b>:: +nil+
# <b><tt>:skip_blanks</tt></b>:: +false+
# <b><tt>:force_quotes</tt></b>:: +false+
- #
+ #
DEFAULT_OPTIONS = { col_sep: ",",
row_sep: :auto,
- quote_char: '"',
+ quote_char: '"',
field_size_limit: nil,
converters: nil,
unconverted_fields: nil,
@@ -971,21 +971,21 @@ class CSV
header_converters: nil,
skip_blanks: false,
force_quotes: false }.freeze
-
- #
+
+ #
# This method will return a CSV instance, just like CSV::new(), but the
# instance will be cached and returned for all future calls to this method for
# the same +data+ object (tested by Object#object_id()) with the same
# +options+.
- #
+ #
# If a block is given, the instance is passed to the block and the return
# value becomes the return value of the block.
- #
+ #
def self.instance(data = $stdout, options = Hash.new)
# create a _signature_ for this method call, data object and options
sig = [data.object_id] +
options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
-
+
# fetch or create the instance for this signature
@@instances ||= Hash.new
instance = (@@instances[sig] ||= new(data, options))
@@ -997,25 +997,25 @@ class CSV
end
end
- #
+ #
# This method allows you to serialize an Array of Ruby objects to a String or
# File of CSV data. This is not as powerful as Marshal or YAML, but perhaps
# useful for spreadsheet and database interaction.
- #
+ #
# Out of the box, this method is intended to work with simple data objects or
# Structs. It will serialize a list of instance variables and/or
# Struct.members().
- #
+ #
# If you need more complicated serialization, you can control the process
# by adding methods to the class to be serialized.
- #
+ #
# A class method csv_meta() is responsible for returning the first row of the
# document (as an Array). This row is considered to be a Hash of the form
# key_1,value_1,key_2,value_2,... CSV::load() expects to find a class key
# with a value of the stringified class name and CSV::dump() will create this,
# if you do not define this method. This method is only called on the first
# object of the Array.
- #
+ #
# The next method you can provide is an instance method called csv_headers().
# This method is expected to return the second line of the document (again as
# an Array), which is to be used to give each column a header. By default,
@@ -1023,20 +1023,20 @@ class CSV
# @ character or call send() passing the header as the method name and
# the field value as an argument. This method is only called on the first
# object of the Array.
- #
+ #
# Finally, you can provide an instance method called csv_dump(), which will
# be passed the headers. This should return an Array of fields that can be
# serialized for this object. This method is called once for every object in
# the Array.
- #
+ #
# The +io+ parameter can be used to serialize to a File, and +options+ can be
# anything CSV::new() accepts.
- #
+ #
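# A round-trip sketch with a simple Struct (the names are hypothetical; remember
# that fields come back as Strings unless converters are used):
#
#   Person  = Struct.new(:name, :age)
#   csv_str = CSV.dump([Person.new("Anna", 30), Person.new("Bob", 25)])
#   people  = CSV.load(csv_str)   # => two Person objects with String fields
#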
def self.dump(ary_of_objs, io = "", options = Hash.new)
obj_template = ary_of_objs.first
-
+
csv = new(io, options)
-
+
# write meta information
begin
csv << obj_template.class.csv_meta
@@ -1054,7 +1054,7 @@ class CSV
end
end
csv << headers
-
+
# serialize each object
ary_of_objs.each do |obj|
begin
@@ -1069,39 +1069,39 @@ class CSV
end
end
end
-
+
if io.is_a? String
csv.string
else
csv.close
end
end
-
- #
+
+ #
# This method is the reading counterpart to CSV::dump(). See that method for
# a detailed description of the process.
- #
- # You can customize loading by adding a class method called csv_load() which
+ #
+ # You can customize loading by adding a class method called csv_load() which
# will be passed a Hash of meta information, an Array of headers, and an Array
# of fields for the object the method is expected to return.
- #
+ #
# Remember that all fields will be Strings after this load. If you need
# something else, use +options+ to setup converters or provide a custom
# csv_load() implementation.
- #
+ #
def self.load(io_or_str, options = Hash.new)
csv = new(io_or_str, options)
-
+
# load meta information
meta = Hash[*csv.shift]
cls = meta["class".encode(csv.encoding)].split("::".encode(csv.encoding)).
inject(Object) do |c, const|
c.const_get(const)
end
-
+
# load headers
headers = csv.shift
-
+
# unserialize each object stored in the file
results = csv.inject(Array.new) do |all, row|
begin
@@ -1118,36 +1118,36 @@ class CSV
end
all << obj
end
-
+
csv.close unless io_or_str.is_a? String
-
+
results
end
-
- #
+
+ #
# :call-seq:
# filter( options = Hash.new ) { |row| ... }
# filter( input, options = Hash.new ) { |row| ... }
# filter( input, output, options = Hash.new ) { |row| ... }
- #
+ #
# This method is a convenience for building Unix-like filters for CSV data.
- # Each row is yielded to the provided block which can alter it as needed.
+ # Each row is yielded to the provided block which can alter it as needed.
# After the block returns, the row is appended to +output+ altered or not.
- #
+ #
# The +input+ and +output+ arguments can be anything CSV::new() accepts
- # (generally String or IO objects). If not given, they default to
+ # (generally String or IO objects). If not given, they default to
# <tt>ARGF</tt> and <tt>$stdout</tt>.
- #
+ #
# The +options+ parameter is also filtered down to CSV::new() after some
- # clever key parsing. Any key beginning with <tt>:in_</tt> or
+ # clever key parsing. Any key beginning with <tt>:in_</tt> or
# <tt>:input_</tt> will have that leading identifier stripped and will only
# be used in the +options+ Hash for the +input+ object. Keys starting with
- # <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
+ # <tt>:out_</tt> or <tt>:output_</tt> affect only +output+. All other keys
# are assigned to both objects.
- #
+ #
# The <tt>:output_row_sep</tt> +option+ defaults to
# <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
- #
+ #
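# A sketch of a semicolon-to-comma filter (reads +ARGF+, writes <tt>$stdout</tt>;
# the appended field-count column is purely illustrative):
#
#   CSV.filter(in_col_sep: ";", out_col_sep: ",") do |row|
#     row << row.size
#   end
#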
def self.filter(*args)
# parse options for input, output, or both
in_options, out_options = Hash.new, {row_sep: $INPUT_RECORD_SEPARATOR}
@@ -1167,19 +1167,19 @@ class CSV
# build input and output wrappers
input = new(args.shift || ARGF, in_options)
output = new(args.shift || $stdout, out_options)
-
+
# read, yield, write
input.each do |row|
yield row
output << row
end
end
-
- #
+
+ #
# This method is intended as the primary interface for reading CSV files. You
# pass a +path+ and any +options+ you wish to set for the read. Each row of
# file will be passed to the provided +block+ in turn.
- #
+ #
# The +options+ parameter can be anything CSV::new() understands. This method
# also understands an additional <tt>:encoding</tt> parameter that you can use
# to specify the Encoding of the data in the file to be read. You must provide
@@ -1188,7 +1188,7 @@ class CSV
# have the data transcoded as it is read. For example,
# <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
# but transcode it to UTF-8 before CSV parses it.
- #
+ #
def self.foreach(path, options = Hash.new, &block)
encoding = options.delete(:encoding)
mode = "rb"
@@ -1198,24 +1198,24 @@ class CSV
end
end
- #
+ #
# :call-seq:
# generate( str, options = Hash.new ) { |csv| ... }
# generate( options = Hash.new ) { |csv| ... }
- #
- # This method wraps a String you provide, or an empty default String, in a
+ #
+ # This method wraps a String you provide, or an empty default String, in a
# CSV object which is passed to the provided block. You can use the block to
# append CSV rows to the String and when the block exits, the final String
# will be returned.
- #
+ #
# Note that a passed String *is* modified by this method. Call dup() before
# passing if you need a new String.
- #
+ #
# The +options+ parameter can be anything CSV::new() understands. This method
# understands an additional <tt>:encoding</tt> parameter when not passed a
# String to set the base Encoding for the output. CSV needs this hint if you
# plan to output non-ASCII compatible data.
- #
+ #
def self.generate(*args)
# add a default empty String, if none was given
if args.first.is_a? String
@@ -1233,19 +1233,19 @@ class CSV
csv.string # return final String
end
- #
- # This method is a shortcut for converting a single row (Array) into a CSV
+ #
+ # This method is a shortcut for converting a single row (Array) into a CSV
# String.
- #
+ #
# The +options+ parameter can be anything CSV::new() understands. This method
- # understands an additional <tt>:encoding</tt> parameter to set the base
+ # understands an additional <tt>:encoding</tt> parameter to set the base
# Encoding for the output. This method will try to guess your Encoding from
# the first non-+nil+ field in +row+, if possible, but you may need to use
# this parameter as a backup plan.
- #
+ #
# The <tt>:row_sep</tt> +option+ defaults to <tt>$INPUT_RECORD_SEPARATOR</tt>
# (<tt>$/</tt>) when calling this method.
- #
+ #
def self.generate_line(row, options = Hash.new)
options = {row_sep: $INPUT_RECORD_SEPARATOR}.merge(options)
encoding = options.delete(:encoding)
@@ -1257,27 +1257,27 @@ class CSV
end
(new(str, options) << row).string
end
-
- #
+
+ #
# :call-seq:
# open( filename, mode = "rb", options = Hash.new ) { |faster_csv| ... }
# open( filename, options = Hash.new ) { |faster_csv| ... }
# open( filename, mode = "rb", options = Hash.new )
# open( filename, options = Hash.new )
- #
+ #
# This method opens an IO object, and wraps that with CSV. This is intended
# as the primary interface for writing a CSV file.
- #
+ #
# You must pass a +filename+ and may optionally add a +mode+ for Ruby's
# open(). You may also pass an optional Hash containing any +options+
# CSV::new() understands as the final argument.
- #
+ #
# This method works like Ruby's open() call, in that it will pass a CSV object
# to a provided block and close it when the block terminates, or it will
# return the CSV object when no block is provided. (*Note*: This is different
# from the Ruby 1.8 CSV library which passed rows to the block. Use
# CSV::foreach() for that behavior.)
- #
+ #
# You must provide a +mode+ with an embedded Encoding designator unless your
# data is in Encoding::default_external(). CSV will check the Encoding of the
# underlying IO object (set by the +mode+ you pass) to determine how to parse
@@ -1285,10 +1285,10 @@ class CSV
# it is read just as you can with a normal call to IO::open(). For example,
# <tt>"rb:UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file but
# transcode it to UTF-8 before CSV parses it.
- #
+ #
# An opened CSV object will delegate to many IO methods for convenience. You
# may call:
- #
+ #
# * binmode()
# * binmode?()
# * close()
@@ -1320,7 +1320,7 @@ class CSV
# * to_io()
# * truncate()
# * tty?()
- #
+ #
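# A reading sketch with an explicit Encoding in the +mode+ (the file path and
# its UTF-32BE encoding are hypothetical):
#
#   CSV.open("path/to/file.csv", "rb:UTF-32BE:UTF-8") do |csv|
#     csv.each { |row| p row }
#   end
#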
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end
@@ -1328,7 +1328,7 @@ class CSV
args << "rb" if args.size == 1
# wrap a File opened with the remaining +args+
csv = new(File.open(*args), options)
-
+
# handle blocks like Ruby's open(), not like the CSV library
if block_given?
begin
@@ -1340,19 +1340,19 @@ class CSV
csv
end
end
-
- #
+
+ #
# :call-seq:
# parse( str, options = Hash.new ) { |row| ... }
# parse( str, options = Hash.new )
- #
+ #
# This method can be used to easily parse CSV out of a String. You may either
# provide a +block+ which will be called with each row of the String in turn,
# or just use the returned Array of Arrays (when no +block+ is given).
- #
+ #
# You pass your +str+ to read from, and an optional +options+ Hash containing
# anything CSV::new() understands.
- #
+ #
def self.parse(*args, &block)
csv = new(*args)
if block.nil? # slurp contents, if no block is given
@@ -1365,20 +1365,20 @@ class CSV
csv.each(&block)
end
end
-
- #
- # This method is a shortcut for converting a single line of a CSV String into
- # an Array. Note that if +line+ contains multiple rows, anything
+
+ #
+ # This method is a shortcut for converting a single line of a CSV String into
+ # an Array. Note that if +line+ contains multiple rows, anything
# beyond the first row is ignored.
- #
+ #
# The +options+ parameter can be anything CSV::new() understands.
- #
+ #
def self.parse_line(line, options = Hash.new)
new(line, options).shift
end
-
- #
- # Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
+
+ #
+ # Use to slurp a CSV file into an Array of Arrays. Pass the +path+ to the
# file and any +options+ CSV::new() understands. This method also understands
# an additional <tt>:encoding</tt> parameter that you can use to specify the
# Encoding of the data in the file to be read. You must provide this unless
@@ -1387,46 +1387,46 @@ class CSV
# transcoded as it is read. For example,
# <tt>encoding: "UTF-32BE:UTF-8"</tt> would read UTF-32BE data from the file
# but transcode it to UTF-8 before CSV parses it.
- #
+ #
def self.read(path, options = Hash.new)
encoding = options.delete(:encoding)
mode = "rb"
mode << ":#{encoding}" if encoding
open(path, mode, options) { |csv| csv.read }
end
-
+
# Alias for CSV::read().
def self.readlines(*args)
read(*args)
end
-
- #
+
+ #
# A shortcut for:
- #
+ #
# CSV.read( path, { headers: true,
# converters: :numeric,
# header_converters: :symbol }.merge(options) )
- #
+ #
def self.table(path, options = Hash.new)
read( path, { headers: true,
converters: :numeric,
header_converters: :symbol }.merge(options) )
end
-
- #
+
+ #
# This constructor will wrap either a String or IO object passed in +data+ for
# reading and/or writing. In addition to the CSV instance methods, several IO
# methods are delegated. (See CSV::open() for a complete list.) If you pass
# a String for +data+, you can later retrieve it (after writing to it, for
# example) with CSV.string().
- #
- # Note that a wrapped String will be positioned at the beginning (for
+ #
+ # Note that a wrapped String will be positioned at the beginning (for
# reading). If you want it at the end (for writing), use CSV::generate().
# If you want any other positioning, pass a preset StringIO object instead.
- #
- # You may set any reading and/or writing preferences in the +options+ Hash.
+ #
+ # You may set any reading and/or writing preferences in the +options+ Hash.
# Available options are:
- #
+ #
# <b><tt>:col_sep</tt></b>:: The String placed between each field.
# This String will be transcoded into
# the data's Encoding before parsing.
@@ -1501,7 +1501,7 @@ class CSV
# Array or String were not fields of the
# document and thus will have an empty
# Array attached.
- # <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
+ # <b><tt>:headers</tt></b>:: If set to <tt>:first_row</tt> or
# +true+, the initial row of the CSV
# file will be treated as a row of
# headers. If set to an Array, the
@@ -1538,16 +1538,16 @@ class CSV
# skip over any rows with no content.
# <b><tt>:force_quotes</tt></b>:: When set to a +true+ value, CSV will
# quote all CSV fields it creates.
- #
+ #
# See CSV::DEFAULT_OPTIONS for the default settings.
- #
+ #
# Options cannot be overridden in the instance methods for performance reasons,
# so be sure to set what you want here.
- #
+ #
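# A small construction sketch (the data String is hypothetical):
#
#   csv = CSV.new("name,score\nAnna,10\n", headers: true, converters: :numeric)
#   row = csv.shift
#   row["name"]    # => "Anna"
#   row["score"]   # => 10
#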
def initialize(data, options = Hash.new)
# build the options for this read/write
options = DEFAULT_OPTIONS.merge(options)
-
+
# create the IO object we will read from
@io = if data.is_a? String then StringIO.new(data) else data end
# honor the IO encoding if we can, otherwise default to ASCII-8BIT
@@ -1557,110 +1557,110 @@ class CSV
@io.string.encoding
end
@encoding ||= Encoding.default_internal || Encoding.default_external
- #
+ #
# prepare for building safe regular expressions in the target encoding,
# if we can transcode the needed characters
- #
+ #
@re_esc = "\\".encode(@encoding) rescue ""
@re_chars = %w[ \\ . [ ] - ^ $ ?
* + { } ( ) | #
\ \r \n \t \f \v ].
map { |s| s.encode(@encoding) rescue nil }.compact
-
+
init_separators(options)
init_parsers(options)
init_converters(options)
init_headers(options)
-
+
unless options.empty?
raise ArgumentError, "Unknown options: #{options.keys.join(', ')}."
end
-
+
# track our own lineno since IO gets confused about line-ends in CSV fields
@lineno = 0
end
-
- #
+
+ #
# The encoded <tt>:col_sep</tt> used in parsing and writing. See CSV::new
# for details.
- #
+ #
attr_reader :col_sep
- #
+ #
# The encoded <tt>:row_sep</tt> used in parsing and writing. See CSV::new
# for details.
- #
+ #
attr_reader :row_sep
- #
+ #
# The encoded <tt>:quote_char</tt> used in parsing and writing. See CSV::new
# for details.
- #
+ #
attr_reader :quote_char
# The limit for field size, if any. See CSV::new for details.
attr_reader :field_size_limit
- #
+ #
# Returns the current list of converters in effect. See CSV::new for details.
# Built-in converters will be returned by name, while others will be returned
# as is.
- #
+ #
def converters
@converters.map do |converter|
name = Converters.rassoc(converter)
name ? name.first : converter
end
end
- #
+ #
# Returns +true+ if unconverted_fields() will be added to parsed results. See
# for details.
- #
+ #
def unconverted_fields?() @unconverted_fields end
- #
+ #
# Returns +nil+ if headers will not be used, +true+ if they will but have not
# yet been read, or the actual headers after they have been read. See
# CSV::new for details.
- #
+ #
def headers
@headers || true if @use_headers
end
- #
+ #
# Returns +true+ if headers will be returned as a row of results.
# See CSV::new for details.
- #
+ #
def return_headers?() @return_headers end
# Returns +true+ if headers are written in output. See CSV::new for details.
def write_headers?() @write_headers end
- #
+ #
# Returns the current list of converters in effect for headers. See CSV::new
# for details. Built-in converters will be returned by name, while others
# will be returned as is.
- #
+ #
def header_converters
@header_converters.map do |converter|
name = HeaderConverters.rassoc(converter)
name ? name.first : converter
end
end
- #
+ #
# Returns +true+ if blank lines are skipped by the parser. See CSV::new
# for details.
- #
+ #
def skip_blanks?() @skip_blanks end
# Returns +true+ if all output fields are quoted. See CSV::new for details.
def force_quotes?() @force_quotes end
-
- #
+
+ #
# The Encoding CSV is parsing or writing in. This will be the Encoding you
# receive parsed data in and/or the Encoding data will be written in.
- #
+ #
attr_reader :encoding
-
- #
- # The line number of the last row read from this file. Fields with nested
+
+ #
+ # The line number of the last row read from this file. Fields with nested
# line-end characters will not affect this count.
- #
+ #
attr_reader :lineno
-
+
### IO and StringIO Delegation ###
-
+
extend Forwardable
def_delegators :@io, :binmode, :binmode?, :close, :close_read, :close_write,
:closed?, :eof, :eof?, :external_encoding, :fcntl,
@@ -1668,31 +1668,31 @@ class CSV
:ioctl, :isatty, :path, :pid, :pos, :pos=, :reopen,
:seek, :stat, :string, :sync, :sync=, :tell, :to_i,
:to_io, :truncate, :tty?
-
+
# Rewinds the underlying IO object and resets CSV's lineno() counter.
def rewind
@headers = nil
@lineno = 0
-
+
@io.rewind
end
### End Delegation ###
-
- #
+
+ #
# The primary write method for wrapped Strings and IOs, +row+ (an Array or
# CSV::Row) is converted to CSV and appended to the data source. When a
# CSV::Row is passed, only the row's fields() are appended to the output.
- #
+ #
# The data source must be open for writing.
- #
+ #
def <<(row)
# make sure headers have been assigned
if header_row? and [Array, String].include? @use_headers.class
parse_headers # won't read data for Array or String
self << @headers if @write_headers
end
-
+
# handle CSV::Row objects and Hashes
row = case row
when self.class::Row then row.fields
@@ -1704,69 +1704,69 @@ class CSV
@lineno += 1
@io << row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
-
+
self # for chaining
end
alias_method :add_row, :<<
alias_method :puts, :<<
-
- #
+
+ #
# :call-seq:
# convert( name )
# convert { |field| ... }
# convert { |field, field_info| ... }
- #
+ #
# You can use this method to install a CSV::Converters built-in, or provide a
# block that handles a custom conversion.
- #
+ #
# If you provide a block that takes one argument, it will be passed the field
# and is expected to return the converted value or the field itself. If your
- # block takes two arguments, it will also be passed a CSV::FieldInfo Struct,
- # containing details about the field. Again, the block should return a
+ # block takes two arguments, it will also be passed a CSV::FieldInfo Struct,
+ # containing details about the field. Again, the block should return a
# converted field or the field itself.
- #
+ #
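# A usage sketch (the data is hypothetical):
#
#   csv = CSV.new("1,2.5,three\n")
#   csv.convert(:numeric)
#   csv.shift   # => [1, 2.5, "three"]
#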
def convert(name = nil, &converter)
add_converter(:converters, self.class::Converters, name, &converter)
end
- #
+ #
# :call-seq:
# header_convert( name )
# header_convert { |field| ... }
# header_convert { |field, field_info| ... }
- #
+ #
# Identical to CSV#convert(), but for header rows.
- #
+ #
# Note that this method must be called before header rows are read to have any
# effect.
- #
+ #
def header_convert(name = nil, &converter)
add_converter( :header_converters,
self.class::HeaderConverters,
name,
&converter )
end
-
+
include Enumerable
-
- #
+
+ #
# Yields each row of the data source in turn.
- #
+ #
# Support for Enumerable.
- #
+ #
# The data source must be open for reading.
- #
+ #
def each
while row = shift
yield row
end
end
-
- #
+
+ #
# Slurps the remaining rows and returns an Array of Arrays.
- #
+ #
# The data source must be open for reading.
- #
+ #
def read
rows = to_a
if @use_headers
@@ -1776,25 +1776,25 @@ class CSV
end
end
alias_method :readlines, :read
-
+
# Returns +true+ if the next row read will be a header row.
def header_row?
@use_headers and @headers.nil?
end
-
- #
+
+ #
# The primary read method for wrapped Strings and IOs, a single row is pulled
# from the data source, parsed and returned as an Array of fields (if header
# rows are not used) or a CSV::Row (when header rows are used).
- #
+ #
# The data source must be open for reading.
- #
+ #
def shift
#########################################################################
### This method is purposefully kept a bit long as simple conditional ###
### checks are faster than numerous (expensive) method calls. ###
#########################################################################
-
+
# handle headers not based on document content
if header_row? and @return_headers and
[Array, String].include? @use_headers.class
@@ -1804,25 +1804,25 @@ class CSV
return parse_headers
end
end
-
+
# begin with a blank line, so we can always add to it
line = ""
- #
+ #
# it can take multiple calls to <tt>@io.gets()</tt> to get a full line,
# because of \r and/or \n characters embedded in quoted fields
- #
+ #
loop do
# add another read to the line
(line += @io.gets(@row_sep)) rescue return nil
# copy the line so we can chop it up in parsing
parse = line.dup
parse.sub!(@parsers[:line_end], "")
-
- #
+
+ #
# I believe a blank line should be an <tt>Array.new</tt>, not Ruby 1.8
# CSV's <tt>[nil]</tt>
- #
+ #
if parse.empty?
@lineno += 1
if @skip_blanks
@@ -1837,19 +1837,19 @@ class CSV
end
end
- #
- # shave leading empty fields if needed, because the main parser chokes
+ #
+ # shave leading empty fields if needed, because the main parser chokes
# on these
- #
+ #
csv = if parse.sub!(@parsers[:leading_fields], "")
[nil] * ($&.length / @col_sep.length)
else
Array.new
end
- #
- # then parse the main fields with a hyper-tuned Regexp from
+ #
+ # then parse the main fields with a hyper-tuned Regexp from
# Mastering Regular Expressions, Second Edition
- #
+ #
parse.gsub!(@parsers[:csv_row]) do
csv << if $1.nil? # we found an unquoted field
if $2.empty? # switch empty unquoted fields to +nil+...
@@ -1903,11 +1903,11 @@ class CSV
end
alias_method :gets, :shift
alias_method :readline, :shift
-
- #
+
+ #
# Returns a simplified description of the key CSV attributes in an
# ASCII compatible String.
- #
+ #
def inspect
str = ["<#", self.class.to_s, " io_type:"]
# show type of wrapped IO
@@ -1942,19 +1942,19 @@ class CSV
end.join
end
end
-
+
private
-
- #
+
+ #
# Stores the indicated separators for later use.
- #
+ #
# If auto-discovery was requested for <tt>@row_sep</tt>, this method will read
# ahead in the <tt>@io</tt> and try to find one. +ARGF+, +STDIN+, +STDOUT+,
# +STDERR+ and any stream open for output only will use a default
# <tt>@row_sep</tt> of <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>).
- #
+ #
# This method also establishes the quoting rules used for CSV output.
- #
+ #
def init_separators(options)
# store the selected separators
@col_sep = options.delete(:col_sep).to_s.encode(@encoding)
@@ -1964,11 +1964,11 @@ class CSV
if @quote_char.length != 1
raise ArgumentError, ":quote_char has to be a single character String"
end
-
- #
+
+ #
# automatically discover row separator when requested
# (not fully encoding safe)
- #
+ #
if @row_sep == :auto
if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
(defined?(Zlib) and @io.class == Zlib::GzipWriter)
@@ -1977,20 +1977,20 @@ class CSV
begin
saved_pos = @io.pos # remember where we were
while @row_sep == :auto
- #
- # if we run out of data, it's probably a single line
+ #
+ # if we run out of data, it's probably a single line
# (use a sensible default)
- #
+ #
if @io.eof?
@row_sep = $INPUT_RECORD_SEPARATOR
break
end
-
+
# read ahead a bit
sample = read_to_char(1024)
sample += read_to_char(1) if sample[-1..-1] == encode_str("\r") and
not @io.eof?
-
+
# try to find a standard separator
if sample =~ encode_re("\r\n?|\n")
@row_sep = $&
@@ -2011,7 +2011,7 @@ class CSV
end
end
@row_sep = @row_sep.to_s.encode(@encoding)
-
+
# establish quoting rules
@force_quotes = options.delete(:force_quotes)
do_quote = lambda do |field|
@@ -2039,13 +2039,13 @@ class CSV
end
end
end
-
+
# Pre-compiles parsers and stores them by name for access during reads.
def init_parsers(options)
# store the parser behaviors
@skip_blanks = options.delete(:skip_blanks)
@field_size_limit = options.delete(:field_size_limit)
-
+
# prebuild Regexps for faster parsing
esc_col_sep = escape_re(@col_sep)
esc_row_sep = escape_re(@row_sep)
@@ -2084,27 +2084,27 @@ class CSV
return_newline: encode_str("\r\n")
}
end
-
- #
+
+ #
# Loads any converters requested during construction.
- #
+ #
# If +field_name+ is set <tt>:converters</tt> (the default) field converters
# are set. When +field_name+ is <tt>:header_converters</tt> header converters
# are added instead.
- #
- # The <tt>:unconverted_fields</tt> option is also activated for
+ #
+ # The <tt>:unconverted_fields</tt> option is also activated for
# <tt>:converters</tt> calls, if requested.
- #
+ #
def init_converters(options, field_name = :converters)
if field_name == :converters
@unconverted_fields = options.delete(:unconverted_fields)
end
instance_variable_set("@#{field_name}", Array.new)
-
+
# find the correct method to add the converters
convert = method(field_name.to_s.sub(/ers\Z/, ""))
-
+
# load converters
unless options[field_name].nil?
# allow a single converter not wrapped in an Array
@@ -2120,10 +2120,10 @@ class CSV
end
end
end
-
+
options.delete(field_name)
end
-
+
# Stores header row settings and loads header converters, if needed.
def init_headers(options)
@use_headers = options.delete(:headers)
@@ -2132,18 +2132,18 @@ class CSV
# headers must be delayed until shift(), in case they need a row of content
@headers = nil
-
+
init_converters(options, :header_converters)
end
-
- #
+
+ #
# The actual work method for adding converters, used by both CSV.convert() and
# CSV.header_convert().
- #
+ #
# This method requires the +var_name+ of the instance variable to place the
# converters in, the +const+ Hash to lookup named converters in, and the
# normal parameters of the CSV.convert() and CSV.header_convert() methods.
- #
+ #
def add_converter(var_name, const, name = nil, &converter)
if name.nil? # custom converter
instance_variable_get("@#{var_name}") << converter
@@ -2159,18 +2159,18 @@ class CSV
end
end
end
-
- #
+
+ #
# Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
# if +headers+ is passed as +true+, returning the converted field set. Any
# converter that changes the field into something other than a String halts
# the pipeline of conversion for that field. This is primarily an efficiency
# shortcut.
- #
+ #
def convert_fields(fields, headers = false)
# see if we are converting headers or fields
converters = headers ? @header_converters : @converters
-
+
fields.map.with_index do |field, index|
converters.each do |converter|
field = if converter.arity == 1 # straight field converter
@@ -2184,17 +2184,17 @@ class CSV
field # final state of each field, converted or original
end
end
-
- #
+
+ #
# This method is used to turn a finished +row+ into a CSV::Row. Header rows
# are also dealt with here, either by returning a CSV::Row with identical
# headers and fields (save that the fields do not go through the converters)
# or by reading past them to return a field row. Headers are also saved in
# <tt>@headers</tt> for use in future rows.
- #
+ #
# When +nil+, +row+ is assumed to be a header row not based on an actual row
# of the stream.
- #
+ #
def parse_headers(row = nil)
if @headers.nil? # header row
@headers = case @use_headers # save headers
@@ -2209,11 +2209,11 @@ class CSV
# first row is headers
else row
end
-
+
# prepare converted and unconverted copies
row = @headers if row.nil?
@headers = convert_fields(@headers, true)
-
+
if @return_headers # return headers
return self.class::Row.new(@headers, row, true)
elsif not [Array, String].include? @use_headers.class # skip to field row
@@ -2223,12 +2223,12 @@ class CSV
self.class::Row.new(@headers, convert_fields(row)) # field row
end
-
- #
+
+ #
# This method injects an instance variable <tt>unconverted_fields</tt> into
# +row+ and an accessor method for it called unconverted_fields(). The
# variable is set to the contents of +fields+.
- #
+ #
def add_unconverted_fields(row, fields)
class << row
attr_reader :unconverted_fields
@@ -2236,41 +2236,41 @@ class CSV
row.instance_eval { @unconverted_fields = fields }
row
end
-
- #
+
+ #
# This method is an encoding safe version of Regexp::escape(). It will escape
# any characters that would change the meaning of a regular expression in the
# encoding of +str+. Regular expression characters that cannot be transcoded
# to the target encoding will be skipped and no escaping will be performed if
# a backslash cannot be transcoded.
- #
+ #
def escape_re(str)
str.chars.map { |c| @re_chars.include?(c) ? @re_esc + c : c }.join
end
-
- #
+
+ #
# Builds a regular expression in <tt>@encoding</tt>. All +chunks+ will be
# transcoded to that encoding.
- #
+ #
def encode_re(*chunks)
Regexp.new(encode_str(*chunks))
end
-
- #
+
+ #
# Builds a String in <tt>@encoding</tt>. All +chunks+ will be transcoded to
# that encoding.
- #
+ #
def encode_str(*chunks)
chunks.map { |chunk| chunk.encode(@encoding.name) }.join
end
- #
+ #
# Reads at least +bytes+ from <tt>@io</tt>, but will read up to 10 bytes ahead if
# needed to ensure the data read is valid in the encoding of that data. This
# should ensure that it is safe to use regular expressions on the read data,
# unless it is actually a broken encoding. The read data will be returned in
# <tt>@encoding</tt>.
- #
+ #
def read_to_char(bytes)
return "" if @io.eof?
data = @io.read(bytes)