This repository provides a scaffold for Apache Spark applications written in Scala. It includes a structured project setup with essential configurations, making it easier to kickstart Spark-based data processing projects.
This setup is built with:

- Java 11
- Scala 2.13.16
- sbt 1.9.8
- Spark 3.5.4

Out of the box it provides:

- A pre-configured sbt build
- A sample Spark job that converts Parquet to CSV (see the sketch below)
- An organized project structure
- Logging via SLF4J
- An easy-to-run example built around SparkSession
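To give a sense of what the sample job does, here is a minimal sketch of a Parquet-to-CSV conversion with the DataFrame API. The object name and paths are illustrative placeholders, not the scaffold's actual code:

```scala
import org.apache.spark.sql.SparkSession

object ParquetToCsvSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-to-csv")
      .master("local[*]")
      .getOrCreate()

    // Read the Parquet input and write it back out as CSV with a header row.
    spark.read.parquet("data/input.parquet")
      .write
      .option("header", "true")
      .mode("overwrite")
      .csv("output/csv")

    spark.stop()
  }
}
```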
Ensure you have the following installed:

- Java 11 (the version this scaffold targets; Spark 3.5 also supports Java 8 and 17)
- Scala 2.13 (a system-wide install is optional; sbt fetches the compiler version declared in build.sbt)
- Apache Spark 3.x (only needed for spark-submit; sbt pulls Spark in as a library dependency)
- sbt (Scala Build Tool)
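A quick way to sanity-check the toolchain before building:

```bash
$ java -version
$ sbt --version
$ spark-submit --version
```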
Clone the repository and navigate to the project directory:

```bash
$ git clone /~https://github.com/Nasruddin/Spark-scala-scaffold.git
$ cd Spark-scala-scaffold
```
Compile and package the application using sbt:

```bash
$ sbt clean compile
$ sbt package
```
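`sbt package` writes the application JAR to `target/scala-2.13/` (the Scala binary version in the path follows the `scalaVersion` declared in `build.sbt`).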
Run the application locally with sbt. Note the quotes: they make sbt pass the flags to the application's main method instead of interpreting them as sbt commands:

```bash
$ sbt "run --input-path data --output-path output"
```
Or submit the packaged JAR to a Spark cluster:

```bash
$ spark-submit --class com.example.sparktutorial.SparkExampleMain \
    --master local[*] \
    target/scala-2.13/spark-scala-scaffold_2.13-0.1.jar \
    --input-path data --output-path output
```
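The `--input-path` and `--output-path` flags end up in the `args` array of the entry point. A minimal sketch of parsing simple `--key value` pairs (illustrative only; the scaffold's actual handling in SparkExampleMain may differ):

```scala
object ArgsSketch {
  // Walk the args two at a time and keep the "--key value" pairs;
  // anything that does not fit the pattern is silently dropped.
  def parseArgs(args: Array[String]): Map[String, String] =
    args.sliding(2, 2).collect {
      case Array(key, value) if key.startsWith("--") =>
        key.stripPrefix("--") -> value
    }.toMap

  def main(args: Array[String]): Unit = {
    val opts       = parseArgs(args)
    val inputPath  = opts.getOrElse("input-path", "data")
    val outputPath = opts.getOrElse("output-path", "output")
    println(s"input=$inputPath output=$outputPath")
  }
}
```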
```text
Spark-scala-scaffold/
├── src/
│   └── main/
│       ├── scala/com/example/sparktutorial/
│       │   ├── Analysis.scala
│       │   ├── Configuration.scala
│       │   ├── package.scala
│       │   └── SparkExampleMain.scala
│       └── resources/
│           └── application.conf
├── build.sbt
├── README.md
├── project/
└── target/
```
- `Analysis.scala`: Contains the main transformation logic for the Spark application.
- `Configuration.scala`: Handles the application configuration using Typesafe Config.
- `package.scala`: Contains utility functions and implicit values for the Spark session.
- `SparkExampleMain.scala`: Entry point for running the Spark examples.
- `build.sbt`: Project dependencies and build configuration.
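As an illustration of the "implicit values for the Spark session" idea, a package object along these lines would give every file in the package a shared session. This is a sketch of the pattern, not the scaffold's actual code:

```scala
package com.example

package object sparktutorial {
  import org.apache.spark.sql.SparkSession

  // Lazily created, shared session; marking it implicit lets downstream
  // code accept it as an implicit parameter instead of threading it around.
  implicit lazy val spark: SparkSession = SparkSession.builder()
    .appName("Spark Scala Basic Setup App")
    .master("local[*]")
    .getOrCreate()
}
```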
The configuration is managed with Typesafe Config. The configuration file `application.conf` lives in the `src/main/resources` directory:

```hocon
default {
  appName = "Spark Scala Basic Setup App"

  spark {
    settings {
      spark.master = "local[*]"
      spark.app.name = ${default.appName}
    }
  }
}
```
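Loading this file is straightforward with Typesafe Config. A sketch of what `Configuration.scala` might do (names are illustrative; only `ConfigFactory.load()` and the config paths above come from this README):

```scala
import com.typesafe.config.{Config, ConfigFactory}
import scala.jdk.CollectionConverters._

object ConfigSketch {
  // ConfigFactory.load() reads application.conf from the classpath
  // (src/main/resources) and resolves substitutions like ${default.appName}.
  val config: Config = ConfigFactory.load()

  val appName: String = config.getString("default.appName")

  // Flatten default.spark.settings into key/value pairs that can be
  // applied one by one via SparkSession.Builder.config(key, value).
  val sparkSettings: Map[String, String] =
    config.getConfig("default.spark.settings").entrySet().asScala.map { e =>
      e.getKey -> e.getValue.unwrapped().toString
    }.toMap
}
```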
Add the following dependencies to your `build.sbt` file:

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.4",
  "org.apache.spark" %% "spark-sql"  % "3.5.4",
  "com.typesafe"      % "config"     % "1.4.1"
)
```
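If you deploy with spark-submit to a cluster that already ships Spark, a common convention (not something this scaffold requires) is to mark the Spark artifacts as provided so they are not bundled into your JAR:

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.4" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.5.4" % "provided",
  "com.typesafe"      % "config"     % "1.4.1"
)
```

Be aware that with provided scope, plain `sbt run` no longer has Spark on the runtime classpath, so keep the default scope while developing locally.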
Contributions are welcome! Feel free to submit issues or pull requests.