Write an Excel document using the Spark2 datasource API

This is a Spark2 datasource application demonstrating some of the capabilities of the hadoopoffice library. The example writes an Excel file containing formulas and comments. It has been tested successfully with the HDP Sandbox VM 2.5, but other Hadoop distributions should work equally well, provided they support Spark 2.
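For orientation, the following is a minimal sketch of what a write through the Spark2 datasource looks like. The format name org.zuinnote.spark.office.excel is the HadoopOffice Excel datasource; the data, the locale option and the output path are illustrative assumptions and not the exact code of the example application.

```scala
import org.apache.spark.sql.SparkSession

object ExcelWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excel-write-sketch").getOrCreate()
    import spark.implicits._

    // Illustrative data: each row of the DataFrame becomes a row in the sheet,
    // each column a cell (assumed behaviour for simple data types).
    val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")

    df.repartition(1) // a single partition yields a single Excel output file
      .write
      .format("org.zuinnote.spark.office.excel") // HadoopOffice Excel datasource
      .option("write.locale.bcp47", "us")        // assumed option key; see the HadoopOffice configuration page
      .save("/user/spark/output")

    spark.stop()
  }
}
```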

Building the example

Note that the datasource is also available on Maven Central and on Spark-packages.
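If you want to use the datasource in your own Spark2 project rather than build this example, a build.sbt entry along the following lines should be sufficient. The coordinates and version are assumptions, so verify them on Maven Central.

```scala
// build.sbt -- assumed coordinates and version, verify on Maven Central
libraryDependencies += "com.github.zuinnote" %% "spark-hadoopoffice-ds" % "1.2.0"
```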

Execute

git clone /~https://github.com/ZuInnoTe/hadoopoffice.git hadoopoffice

You can build the application by changing to the directory hadoopoffice/examples/scala-spark2-excel-out-ds and using the following command:

sbt clean +it:test +assembly

Running the example

Before you execute the example, make sure that the output directory does not exist:

hadoop fs -rm -R /user/spark/output

Execute the following command (make sure that you use the spark-submit of Spark2):

spark-submit --class org.zuinnote.spark.office.example.excel.SparkScalaExcelOutDataSource ./example-ho-spark-scala-ds-excelout.jar /user/spark/output/

After the Spark2 job has completed, you will find the Excel file on HDFS. You can copy it to your local filesystem with the following command and then open it in Excel or LibreOffice Calc:

hadoop fs -copyToLocal /user/spark/output/part-m-00000.xlsx

The Excel file contains cells with numeric values, a formula adding up some of the cells, and a comment.
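To give an idea of how such cells can be expressed for the datasource, here is a hedged sketch that represents each cell as a struct mirroring HadoopOffice's SpreadSheetCellDAO (formattedValue, comment, formula, address, sheetName). The concrete values are assumptions for illustration; the authoritative version is the example's source code in the repository.

```scala
import org.apache.spark.sql.SparkSession

object ExcelCellsSketch {
  // Assumed cell layout mirroring SpreadSheetCellDAO:
  // formattedValue, comment, formula, cell address, sheet name
  case class ExcelCell(formattedValue: String, comment: String, formula: String,
                       address: String, sheetName: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excel-cells-sketch").getOrCreate()
    import spark.implicits._

    // One struct column per row; the datasource is assumed to map such structs to cells.
    val df = Seq(
      Tuple1(ExcelCell("1", "", "", "A1", "Sheet1")),                  // numeric value
      Tuple1(ExcelCell("2", "This is a comment", "", "A2", "Sheet1")), // value with a comment
      Tuple1(ExcelCell("", "", "A1+A2", "A3", "Sheet1"))               // formula adding up cells
    ).toDF("cell")

    df.repartition(1)
      .write
      .format("org.zuinnote.spark.office.excel")
      .save("/user/spark/output")

    spark.stop()
  }
}
```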

Other features

You can find further configuration options of the HadoopOffice library here, such as encryption and decryption, locale, metadata filters, linked workbooks, and filtering by sheets.