Write an Excel document using the Spark2 datasource API

This is a Spark2 datasource application demonstrating some of the capabilities of the hadoopoffice library. The example writes an Excel file containing formulas and comments. It has been tested successfully with the HDP Sandbox VM 2.5, but other Hadoop distributions should work equally well, provided they support Spark 2.
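For orientation, the following is a minimal sketch of what a write through the Spark2 datasource looks like. The format name org.zuinnote.spark.office.excel is the HadoopOffice Excel datasource; the data, the locale option and the output path are illustrative assumptions and not the exact code of the example application.

```scala
import org.apache.spark.sql.SparkSession

object ExcelWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excel-write-sketch").getOrCreate()
    import spark.implicits._

    // Illustrative data: each row of the DataFrame becomes a row in the sheet,
    // each column a cell (assumed behaviour for simple data types).
    val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")

    df.repartition(1) // a single partition yields a single Excel output file
      .write
      .format("org.zuinnote.spark.office.excel") // HadoopOffice Excel datasource
      .option("write.locale.bcp47", "us")        // assumed option key; see the HadoopOffice configuration page
      .save("/user/spark/output")

    spark.stop()
  }
}
```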

Building the example

Note that the datasource is also available on Maven Central and on Spark-packages.
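If you want to use the datasource in your own Spark2 project rather than build this example, a build.sbt entry along the following lines should be sufficient. The coordinates and version are assumptions, so verify them on Maven Central.

```scala
// build.sbt -- assumed coordinates and version, verify on Maven Central
libraryDependencies += "com.github.zuinnote" %% "spark-hadoopoffice-ds" % "1.2.0"
```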

Execute

git clone /~https://github.com/ZuInnoTe/hadoopoffice.git hadoopoffice

You can build the application by changing to the directory hadoopoffice/examples/scala-spark2-excel-out-ds and using the following command:

sbt clean +it:test +assembly

Running the example

Before you execute the example, make sure that the output directory does not exist:

hadoop fs -rm -R /user/spark/output

Execute the following command (make sure that you use the spark-submit of Spark2):

spark-submit --class org.zuinnote.spark.office.example.excel.SparkScalaExcelOutDataSource ./example-ho-spark-scala-ds-excelout.jar /user/spark/output/

After the Spark2 job has completed, you will find the Excel file on HDFS. You can copy it to your local filesystem with the following command and then open it in Excel or LibreOffice Calc:

hadoop fs -copyToLocal /user/spark/output/part-m-00000.xlsx

The Excel file contains cells with numeric values, a formula adding up some of the cells, and a comment.
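To give an idea of how such cells can be expressed for the datasource, here is a hedged sketch that represents each cell as a struct mirroring HadoopOffice's SpreadSheetCellDAO (formattedValue, comment, formula, address, sheetName). The concrete values are assumptions for illustration; the authoritative version is the example's source code in the repository.

```scala
import org.apache.spark.sql.SparkSession

object ExcelCellsSketch {
  // Assumed cell layout mirroring SpreadSheetCellDAO:
  // formattedValue, comment, formula, cell address, sheet name
  case class ExcelCell(formattedValue: String, comment: String, formula: String,
                       address: String, sheetName: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excel-cells-sketch").getOrCreate()
    import spark.implicits._

    // One struct column per row; the datasource is assumed to map such structs to cells.
    val df = Seq(
      Tuple1(ExcelCell("1", "", "", "A1", "Sheet1")),                  // numeric value
      Tuple1(ExcelCell("2", "This is a comment", "", "A2", "Sheet1")), // value with a comment
      Tuple1(ExcelCell("", "", "A1+A2", "A3", "Sheet1"))               // formula adding up cells
    ).toDF("cell")

    df.repartition(1)
      .write
      .format("org.zuinnote.spark.office.excel")
      .save("/user/spark/output")

    spark.stop()
  }
}
```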

Other features

You can find further configuration options of the HadoopOffice library here, such as encryption and decryption, locale, metadata filters, linked workbooks, and filtering by sheets.