Spark Streaming
What is Spark Streaming?
It is Spark's extension for processing live data streams as a continuous series of small batches. Like many technologies, it looks like magic until you learn it; once you do, it becomes a practical tool.
Below is a small code snippet to give you an idea of what it is, but you will need to explore further to discover its full range of uses.
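The original snippet appears as an image in the source; here is a minimal sketch of what such a Scala object might look like. The object name (AnotherApp), package (main.scala), and the address/port are taken from later steps in this article; the app name and the 5-second batch interval are assumptions you can change.

package main.scala

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Extending App means we don't have to define main() ourselves
object AnotherApp extends App {

  // local[2]: one thread for the receiver, one for processing
  val conf = new SparkConf().setMaster("local[2]").setAppName("SparkStreamingDemo")

  // Process the stream in 5-second micro-batches (assumed interval)
  val ssc = new StreamingContext(conf, Seconds(5))

  // Read lines from the NetCat socket we will start below
  val lines = ssc.socketTextStream("127.0.0.1", 2222)

  // A simple word count over each batch, printed to the console
  val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  counts.print()

  ssc.start()
  ssc.awaitTermination()
}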
Create a Scala object file like the one shown above in a Scala project.
Note the package name; and since we extend App, we are not required to define main() ourselves.
In socketTextStream(), we have provided the address and port to read the data from.
If you are going to test it locally, the free utility 'NetCat' can send data to that address and port.
Download it from https://joncraton.org/blog/46/netcat-for-windows/ (the password for the zip file is 'nc'), or you can take it from here with no password required. Once you download and unzip it on your machine, you will have the contents below –
Now open a command prompt at that location and run the command below to start NetCat listening on the required address and port –
nc.exe -s 127.0.0.1 -lvp 2222
Now execute the Scala code from your IDE and type a message in the command prompt window; you will see the processed message in the application's console output.
We can also package our code using any build tool, such as Maven; here I am using SBT. Create a build.sbt file in your Scala project with the following contents –
name := "anotherApp"
version := "1.0"
scalaVersion := "2.11.8"

val sparkVersion = "2.4.6"

// Dependencies are left in the default (compile) scope so that `sbt run`
// can resolve Spark classes locally; mark them "provided" instead if you
// only submit the packaged jar to a cluster that already ships Spark.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
Now go to the project's location in a command prompt and run the commands below.
If you want to compile and run your code directly via SBT –
sbt run
Otherwise, to only build, run the following commands –
sbt compile
sbt package
After this you will find the jar file, along with the compiled class files, in the target folder of your project.
Now we can execute this jar in Spark as shown below (note that SBT names the jar from the project name and version defined in build.sbt) –
spark-submit --class main.scala.AnotherApp anotherapp_2.11-1.0.jar
Note :- Don't forget to start NetCat before running the above command.
Now you can view the Spark UI at the address below; the host name can be found in the logs on the console. Here the host name is sparkapp –
http://sparkapp:4040