Roo implements read access for all common spreadsheet types. It can handle:
- Excel 2007 - 2013 formats (xlsx, xlsm)
- LibreOffice / OpenOffice.org formats (ods)
- CSV
- Excel 97, Excel 2002 XML, and Excel 2003 XML formats when using the roo-xls gem (xls, xml)
- Google spreadsheets with read/write access when using roo-google
Install as a gem
$ gem install roo
Or add it to your Gemfile
gem "roo", "~> 2.10.0"
You can use the Roo::Spreadsheet
class so roo
automatically detects which parser class to use for you.
require 'roo'
file_name = './new_prices.xlsx'
xlsx = Roo::Spreadsheet.open(file_name)
xlsx.info
# => Returns basic info about the spreadsheet file
Roo::Spreadsheet.open
can accept both string paths and File
instances. Also, you can provide the extension of the file as an option:
require 'roo'
file_name = './rails_temp_upload'
xlsx = Roo::Spreadsheet.open(file_name, extension: :xlsx)
xlsx.info
# => Returns basic info about the spreadsheet file
On the other hand, if you know what the file extension is, you can use the specific parser class instead:
require 'roo'
xlsx = Roo::Excelx.new("./new_prices.xlsx")
xlsx.info
# => Returns basic info about the spreadsheet file
ods.sheets
# => ['Info', 'Sheet 2', 'Sheet 3'] # an Array of sheet names in the workbook
ods.sheet('Info').row(1)
ods.sheet(0).row(1)
# Set the last sheet as the default sheet.
ods.default_sheet = ods.sheets.last
ods.default_sheet = ods.sheets[2]
ods.default_sheet = 'Sheet 3'
# Iterate through each sheet
ods.each_with_pagename do |name, sheet|
p sheet.row(1)
end
Roo uses Excel's numbering for rows, columns and cells, so 1
is the first index, not 0
as it is in an Array
sheet.row(1)
# returns the first row of the spreadsheet.
sheet.column(1)
# returns the first column of the spreadsheet.
Almost all methods have an optional argument sheet
. If this parameter is omitted, the default_sheet will be used.
sheet.first_row(sheet.sheets[0])
# => 1 # the number of the first row
sheet.last_row
# => 42 # the number of the last row
sheet.first_column
# => 1 # the number of the first column
sheet.last_column
# => 10 # the number of the last column
You can access the top-left cell in the following ways
sheet.cell(1,1)
sheet.cell('A',1)
sheet.cell(1,'A')
sheet.a1
# Access the second sheet's top-left cell.
sheet.cell(1,'A',sheet.sheets[1])
Use each
to iterate over each row.
If each is given a hash with the names of some columns, then each will generate a hash with the columns supplied for each row.
sheet.each(id: 'ID', name: 'FULL_NAME') do |hash|
puts hash.inspect
# => { id: 1, name: 'John Smith' }
end
Use sheet.parse
to return an array of rows. Column names can be a String
or a Regexp
.
sheet.parse(id: /UPC|SKU/, qty: /ATS*\sATP\s*QTY\z/)
# => [{:id => 727880013358, :qty => 12}, ...]
Use the :headers
option to include the header row in the parsed content.
sheet.parse(headers: true)
Use the :header_search
option to locate the header row and assign the header names.
sheet.parse(header_search: [/UPC*SKU/,/ATS*\sATP\s*QTY\z/])
Use the :clean
option to strip out control characters and surrounding white space.
sheet.parse(clean: true)
When opening the file you can add a hash of options.
If you open a document with merged cells and do not want to end up with nil values for the rows after the first one.
xlsx = Roo::Excelx.new('./roo_error.xlsx', {:expand_merged_ranges => true})
Roo has the ability to export sheets using the following formats. It
will only export the default_sheet
.
sheet.to_csv
sheet.to_matrix
sheet.to_xml
sheet.to_yaml
Specify the file as default argument for #to_csv
:
sheet.to_csv(File.new("/dev/null"))
specify the custom separator:
sheet.to_csv(separator: ":") # "," using by default
Stream rows from an Excelx spreadsheet.
xlsx = Roo::Excelx.new("./test_data/test_small.xlsx")
xlsx.each_row_streaming do |row|
puts row.inspect # Array of Excelx::Cell objects
end
By default blank cells will be excluded from the array. To keep them, use the option pad_cells = true. (They will be set to nil in the array)
xlsx.each_row_streaming(pad_cells: true) do |row|
puts row.inspect # Array of Excelx::Cell objects
end
To stream only some of the rows, you can use the max_rows
and offset
options.
xlsx.each_row_streaming(offset: 1) do |row| # Will exclude first (inevitably header) row
puts row.inspect # Array of Excelx::Cell objects
end
xlsx.each_row_streaming(max_rows: 3) do |row| # Will yield 4 rows (it's automatically incremented by 1) after the supplied offset.
puts row.inspect # Array of Excelx::Cell objects
end
Iterate over each row
xlsx.each_row do |row|
...
end
Roo::Excelx
also provides these helpful methods.
xlsx.excelx_type(3, 'C')
# => :numeric_or_formula
xlsx.cell(3, 'C')
# => 600000383.0
xlsx.excelx_value(row,col)
# => '600000383'
xlsx.formatted_value(row,col)
# => '0600000383'
Roo::Excelx
can access celltype, comments, font information, formulas, hyperlinks and labels.
xlsx.comment(1,1, ods.sheets[-1])
xlsx.font(1,1).bold?
xlsx.formula('A', 2)
Roo::OpenOffice has support for encrypted OpenOffice spreadsheets.
# Load an encrypted OpenOffice Spreadsheet
ods = Roo::OpenOffice.new("myspreadsheet.ods", password: "password")
Roo::OpenOffice
can access celltype, comments, font information, formulas and labels.
ods.celltype
# => :percentage
ods.comment(1,1, ods.sheets[-1])
ods.font(1,1).italic?
# => false
ods.formula('A', 2)
# Load a CSV file
csv = Roo::CSV.new("mycsv.csv")
Because Roo uses the standard CSV library, you can use options available to that library to parse csv files. You can pass options using the csv_options
key.
For instance, you can load tab-delimited files (.tsv
), and you can use a particular encoding when opening the file.
# Load a tab-delimited csv
csv = Roo::CSV.new("mytsv.tsv", csv_options: {col_sep: "\t"})
# Load a csv with an explicit encoding
csv = Roo::CSV.new("mycsv.csv", csv_options: {encoding: Encoding::ISO_8859_1})
You can also open csv files through the Roo::Spreadsheet class (useful if you accept both CSV and Excel types from a user file upload, for example).
# Load a spreadsheet from a file path
# Roo figures out the right parser based on file extension
spreadsheet = Roo::Spreadsheet.open(csv_or_xlsx_file)
# Load a csv and auto-strip the BOM (byte order mark)
# csv files saved from MS Excel typically have the BOM marker at the beginning of the file
spreadsheet = Roo::Spreadsheet.open("mycsv.csv", { csv_options: { encoding: 'bom|utf-8' } })
If you use .xls
or Google spreadsheets, you will need to install roo-xls
or roo-google
to continue using that functionality.
Roo's public methods have stayed relatively consistent between 1.13.x and 2.0.0, but please check the Changelog to better understand the changes made since 1.13.x.
- Fork it ( /~https://github.com/roo-rb/roo/fork )
- Install it (
bundle install --with local_development
) - Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'My new feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request
Roo uses Minitest and RSpec. The best of both worlds! Run bundle exec rake
to
run the tests/examples.
You can run the tests/examples with Rspec like reporters by running
USE_REPORTERS=true bundle exec rake
Roo also has a few tests that take a long time (5+ seconds). To run these, use
LONG_RUN=true bundle exec rake
If you find an issue, please create a gist and refer to it in an issue (sample gist). Here are some instructions for creating such a gist.
- Create a gist with code that creates the error.
- Clone the gist repo locally, add a stripped down version of the offending spreadsheet to the gist repo, and push the gist's changes master.
- Paste the gist url here.