Commit 61f6ba2d authored by Martin Perdacher

Update README.md

parent 4c49f877
# Helloworld example using Apache spark
### Without warranty
Spark and its dependencies change quickly. If you encounter problems or errors in this image, you are welcome to resolve them and contribute to this Docker image.
### Install prerequisites
- git
......@@ -22,7 +26,7 @@ docker run -t -i docker-spark:1.0
- mount a [volume](https://docs.docker.com/storage/), created using `docker volume create`
### Running the example
### Running the example (Python)
Run Spark locally with as many cores as are available
```{bash}
......@@ -33,3 +37,20 @@ Run Spark locally with 2 cores
```{bash}
spark-submit --master "local[2]" helloworld.py
```
### Running the example (Java or Scala)
Best practice for Java or Scala is to use a build tool such as [Maven](https://maven.apache.org/) or [SBT](https://www.scala-sbt.org/).
Here we are using SBT, which is already installed on your Docker image.
A build tool automatically manages your dependencies. The most important dependencies for our lecture are:
- Scala (Java is already installed and `JAVA_HOME` is set)
- Apache Spark (currently 2.4.5)
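As a rough sketch of how SBT declares these dependencies, a minimal `build.sbt` might look like the following. The project name, project version, and the Scala version `2.12.10` are assumptions for illustration and are not taken from this repository; only the Spark version 2.4.5 comes from the list above.

```scala
// build.sbt (hypothetical minimal example)
name := "helloworld"          // assumed project name
version := "1.0"              // assumed project version
scalaVersion := "2.12.10"     // assumption: a Scala release supported by Spark 2.4.5

// %% appends the Scala binary version (here _2.12) to the artifact name.
// "provided" keeps Spark out of a packaged jar because the cluster supplies it.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5" % "provided"
```

Note that with the `provided` scope Spark is not on the classpath for `sbt run`; remove the scope if you want to start the program directly from SBT.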
To run on Google Cloud, you need to build a jar or fat jar that contains all dependencies of your project. With SBT, this can be done using [sbt-assembly](https://github.com/sbt/sbt-assembly), which should already be installed.
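If the plugin turns out not to be available, it is usually enabled per project in `project/plugins.sbt`. The sketch below assumes sbt-assembly version 0.14.10, which is not confirmed by this repository; check the sbt-assembly README for the version that matches your SBT release.

```scala
// project/plugins.sbt (hypothetical; only needed if sbt-assembly is not already set up)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

With the plugin enabled, running `sbt assembly` in the project root builds the fat jar, which by default is written under `target/scala-<version>/`.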