Run from the /mnt/c path

mkdir temp

cd temp

virtualenv venv

source venv/bin/activate

pip install pyspark

Quick Start - Spark 3.2.1 Documentation (apache.org)

This program just counts the number of lines containing 'a' and the number containing 'b' in a text file. Note that you'll need to replace YOUR_SPARK_HOME with the location where Spark is installed. As with the Scala and Java examples, we use a SparkSession to create Datasets. For applications that use custom classes or third-party libraries, we can also add code dependencies to spark-submit through its --py-files argument by packaging them into a .zip file (see spark-submit --help for details). SimpleApp is simple enough that we do not need to specify any code dependencies.
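For reference, here is a minimal sketch of what such a SimpleApp.py could look like. The Spark calls (SparkSession.builder, spark.read.text, filter, count) follow the pattern in the Quick Start guide. The function names count_lines_spark and count_lines_plain are hypothetical and were chosen for this note, and the plain-Python version is only an assumed cross-check that should agree with Spark for ordinary text files.

```python
"""SimpleApp.py: a sketch of the line-counting program described above."""


def count_lines_plain(path):
    """Plain-Python cross-check: count lines containing 'a' and lines containing 'b'."""
    num_as = num_bs = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "a" in line:
                num_as += 1
            if "b" in line:
                num_bs += 1
    return num_as, num_bs


def count_lines_spark(path):
    """Same count using a SparkSession; requires `pip install pyspark`."""
    # Imported inside the function so this module still loads where pyspark is absent.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
    try:
        # spark.read.text returns a DataFrame with a single string column named "value".
        log_data = spark.read.text(path).cache()
        num_as = log_data.filter(log_data.value.contains("a")).count()
        num_bs = log_data.filter(log_data.value.contains("b")).count()
    finally:
        # Always release the session, even if reading or counting fails.
        spark.stop()
    return num_as, num_bs
```

Calling count_lines_spark("YOUR_SPARK_HOME/README.md") (with YOUR_SPARK_HOME replaced as noted above) returns the two counts as a tuple, which can then be printed or compared against count_lines_plain on the same file.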

We can run this application using the bin/spark-submit script: