Write an Excel document using the Spark2 datasource API
This is a Spark2 datasource application demonstrating some of the capabilities of the HadoopOffice library. This example features writing Excel files with formulas and comments. It has been tested successfully with the HDP Sandbox VM 2.5, but other Hadoop distributions should work equally well if they support Spark 2.
Note that the datasource is available on Maven Central and Spark-packages.
Clone the repository by executing the following command:
git clone https://github.com/ZuInnoTe/hadoopoffice.git hadoopoffice
You can build the application by changing to the directory hadoopoffice/examples/scala-spark2-excel-out-ds and using the following command:
sbt clean +it:test +assembly
Before you execute the example, make sure that the output directory does not exist:
hadoop fs -rm -R /user/spark/output
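The removal step can be made safe to re-run by checking for the directory first. The following is a minimal sketch, assuming the `hadoop` client is on the path; the function name `clean_output_dir` is hypothetical and not part of the example project.

```shell
# Hypothetical helper: remove an HDFS directory only if it exists.
# `hadoop fs -test -d PATH` exits with 0 when PATH exists and is a directory.
clean_output_dir() {
  if hadoop fs -test -d "$1"; then
    hadoop fs -rm -R "$1"
  else
    echo "Nothing to remove: $1 does not exist"
  fi
}

# Usage (run before submitting the job):
#   clean_output_dir /user/spark/output
```

With this guard, a first run on a fresh cluster prints a short message instead of relying on the `-rm` command to handle a missing path.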
Execute the following command, making sure that you use the spark-submit of Spark2:
spark-submit --class org.zuinnote.spark.office.example.excel.SparkScalaExcelOutDataSource ./example-ho-spark-scala-ds-excelout.jar /user/spark/output/
After the Spark2 job has completed, you will find the Excel file on HDFS. You can copy it to your local filesystem using the following command and then open it in Excel or LibreOffice Calc:
hadoop fs -copyToLocal /user/spark/output/part-m-00000.xlsx
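Before opening the copied file, a quick sanity check can confirm that the download is a plausible .xlsx file. An unencrypted .xlsx file is a ZIP container, so it starts with the bytes `PK`. The sketch below assumes the file was written without encryption; the function name `looks_like_xlsx` is hypothetical.

```shell
# Hypothetical helper: succeeds if the file starts with the ZIP magic bytes "PK",
# which is the case for an unencrypted .xlsx file.
looks_like_xlsx() {
  [ "$(head -c 2 "$1")" = "PK" ]
}

# Usage, after copying the file to the current directory:
#   hadoop fs -ls /user/spark/output     # lists the output files if the name differs
#   looks_like_xlsx part-m-00000.xlsx && echo "looks like a valid .xlsx container"
```

This only inspects the file header; it does not validate the workbook contents.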
The Excel file contains cells with numeric values, a formula adding up some cells, and a comment.
The HadoopOffice documentation describes further configuration options of the library, such as encryption, decryption, locale, metadata filter, linked workbooks, and filtering by sheets.