Rodrigo Hernández Mota
@rhdzmota
Data and ML Engineer
(let's be honest)
Consider the typical "hello world" application.
package com.intersysconsulting.nanodegree.scalaessentials.examples
object Hello {
def main(args: Array[String]): Unit =
println("Welcome to the Scala Essentials Nanodegree!")
}
Any observations?
Now consider a random "one-liner".
1 to 10 filter { _ % 2 == 1 } flatMap {x => x to x + 1}
Comments?
No, but simple concepts can have a huge impact.
A combination of object-oriented and functional programming in a high-level, statically typed language.
For most people this implies a paradigm shift!
Pure object-oriented language: every value is an object and every operation is a method call! The types and behaviour of objects are described by classes and traits. Class composition is done via a mixin-based mechanism that replaces multiple inheritance.
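A minimal sketch of both claims, assuming hypothetical names (MixinSketch, Greeter, Shouter, Person) that are not part of the original slides: an operator is an ordinary method call, and behaviour can be composed by mixing in a trait.

```scala
object MixinSketch {
  // Operators are method calls: both expressions invoke Int's `+` method.
  val infixSum: Int = 1 + 2
  val methodSum: Int = (1).+(2)

  // Hypothetical traits, used only for illustration.
  trait Greeter {
    def name: String
    def greet: String = s"Hello, $name!"
  }

  trait Shouter extends Greeter {
    // Refines whatever `greet` the class already provides.
    override def greet: String = super.greet.toUpperCase
  }

  class Person(val name: String) extends Greeter

  // Mixin composition: Shouter is added when the object is created.
  val quiet = new Person("Ada")
  val loud = new Person("Ada") with Shouter
}
```

Here quiet.greet keeps the default greeting, while loud.greet goes through Shouter first and then delegates to Greeter via super.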
In Scala every function is a value. It supports higher-order functions, nesting, recursion, and currying.
def reverse[A](list: List[A]): List[A] = {
def tailRecReverse(result: List[A], prependOps: (List[A], A) => List[A])(current: List[A]): List[A] = current match {
case Nil => result
case head :: tail => tailRecReverse(prependOps(result, head), prependOps)(tail)
}
val reverseOp = tailRecReverse(List.empty[A], (xs: List[A], x: A) => x +: xs) _
reverseOp(list)
}
(unnecessarily complex example)
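The same ideas can be shown in a smaller sketch. The names below (FunctionSketch, double, applyTwice, add, addTen, sumTo) are hypothetical and chosen only for illustration.

```scala
import scala.annotation.tailrec

object FunctionSketch {
  // A function is a value: it can be stored in a val and passed around.
  val double: Int => Int = x => x * 2

  // Higher-order function: takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried method: arguments are supplied one parameter list at a time.
  def add(x: Int)(y: Int): Int = x + y

  // Partial application turns the curried method into a function value.
  val addTen: Int => Int = add(10) _

  // Nested, tail-recursive helper in the style of the reverse example.
  def sumTo(n: Int): Int = {
    @tailrec
    def loop(i: Int, acc: Int): Int =
      if (i > n) acc else loop(i + 1, acc + i)
    loop(1, 0)
  }
}
```

For instance, FunctionSketch.applyTwice(FunctionSketch.double, 3) doubles 3 twice, and FunctionSketch.addTen(5) supplies the second argument of add.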
Scala has a strong, static type system that supports type inference.
val courseName = "Introduction"
def square(x: Int) = x * x
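A short sketch of what the compiler infers in the two definitions above, wrapped in a hypothetical InferenceSketch object so it compiles on its own. The commented-out line is an assumption about how a mismatch would be reported, not output copied from a compiler.

```scala
object InferenceSketch {
  // The compiler infers String here; no annotation is needed.
  val courseName = "Introduction"

  // The return type Int is inferred from the body.
  def square(x: Int) = x * x

  // Inference flows through collections: this is a List[Int].
  val squares = List(1, 2, 3).map(square)

  // Annotations remain available and are checked at compile time.
  val checked: List[Int] = squares

  // val wrong: Int = courseName  // rejected at compile time: type mismatch
}
```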
Scala has seamless interoperability with the Java Ecosystem.
Scala classes are ultimately JVM classes. You can create Java objects, call their methods and inherit from Java classes transparently from Scala.
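As a minimal sketch of that interoperability, the snippet below inherits from java.util.ArrayList and calls its methods from Scala. The names InteropSketch and LoudList are hypothetical and exist only for this illustration.

```scala
import java.util.ArrayList

object InteropSketch {
  // Inherit from a Java class and override one of its methods.
  class LoudList extends ArrayList[String] {
    override def add(element: String): Boolean =
      super.add(element.toUpperCase)
  }

  // Create the object and call the inherited Java methods as usual.
  val languages = new LoudList
  languages.add("scala")
  languages.add("java")
}
```

Methods that LoudList does not override, such as size and get, behave exactly as they do in Java.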
Example: assume we want to know the number of underage persons in a group.
import java.util.ArrayList;
public class Example {
public class Person {
public final String name;
public final int age;
Person(String name, int age) {
this.name = name;
this.age = age;
}
}
public void count() {
ArrayList<Person> peopleList = new ArrayList<Person>();
peopleList.add(new Person("A", 15));
peopleList.add(new Person("B", 20));
peopleList.add(new Person("C", 17));
peopleList.add(new Person("D", 22));
Person[] people = new Person[4];
people = peopleList.toArray(people);
ArrayList<Person> minorList = new ArrayList<Person>();
ArrayList<Person> adultList = new ArrayList<Person>();
for (int i = 0; i < people.length; i++)
(people[i].age < 18 ? minorList : adultList).add(people[i]);
System.out.println(minorList.size());
System.out.println(adultList.size());
}
public static void main(String[] args) {
Example example = new Example();
example.count();
}
}
object Example extends App {
class Person(val name: String, val age: Int)
val people = Array(
new Person("A", 15),
new Person("B", 20),
new Person("C", 17),
new Person("D", 22))
val (minors, adults) = people partition (_.age < 18)
println(minors.length)
println(adults.length)
}
object Example extends App {
class Person(val name: String, val age: Int)
val people = Array(
new Person("A", 15),
new Person("B", 20),
new Person("C", 17),
new Person("D", 22))
val (minors, adults) = people.par partition (_.age < 18)
println(minors.length)
println(adults.length)
}
You can use data-parallel operations on collections, actors for concurrency and distribution, or futures for asynchronous programming. Scala was built with concurrency and parallelism in mind.
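import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
// x and y start running as soon as they are created
val x = Future { someExpensiveComputation() }
val y = Future { someOtherExpensiveComputation() }
val z = for (a <- x; b <- y) yield a * b
for (c <- z) println("Result: " + c)
println("Meanwhile, the main thread goes on!")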
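The snippet above leaves the two computations undefined. A runnable sketch follows, assuming hypothetical stand-ins that sleep briefly and return fixed numbers, and using Await only so that a small demo can wait for the answer.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical stand-ins for the expensive computations.
  def someExpensiveComputation(): Int = { Thread.sleep(100); 6 }
  def someOtherExpensiveComputation(): Int = { Thread.sleep(100); 7 }

  def compute(): Int = {
    // Both futures start running as soon as they are created.
    val x = Future { someExpensiveComputation() }
    val y = Future { someOtherExpensiveComputation() }
    // Combine the results without blocking the calling thread.
    val z = for (a <- x; b <- y) yield a * b
    // Blocking here is for the demo only; real code would keep composing futures.
    Await.result(z, 5.seconds)
  }
}
```

Because x and y are created before the for-comprehension, the two sleeps overlap instead of running one after the other.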
According to Lightbend:
Scala has taken over the world of Fast Data, which is what some are calling the next wave of computation engines [...] (ability to process event streams in real time).
And more!
Scala has a relevant impact on Apache Spark, Apache Kafka, Apache Flink, Akka Streams and more!
According to Dean Wampler, traditional big-data tools are inefficient and hard to implement.
Let's consider a word-count application.
WordCount Example using MapReduce with Java
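import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;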
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
StringTokenizer strTokenizer = new StringTokenizer(value.toString());
while (strTokenizer.hasMoreTokens()) {
word.set(strTokenizer.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
}
WordCount Example using Spark with Scala
import org.apache.spark.sql._
object WordCount extends App with Context {
import spark.implicits._
final case class WordCount(word: String, count: Long)
val readPath = "path/to/input/file"
val writePath = "path/to/output/file"
val data: Dataset[String] = spark.read.textFile(readPath)
val wordCount: Dataset[WordCount] = data
.flatMap(_.split("""\s+""")).map(_.toLowerCase.replaceAll("[^A-Za-z0-9]", "")).filter(_.length > 1)
.groupByKey(identity).count().map({case (w, c) => WordCount(w, c)})
.sort($"count".desc)
wordCount.coalesce(1).write.csv(writePath)
}
See the complete example here.
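To try the same pipeline without a Spark cluster, here is a sketch that swaps the Dataset for plain Scala collections. It is not the Spark API: LocalWordCount and its count method are hypothetical names, and the tie-breaking by word is an added assumption so that the order is deterministic.

```scala
object LocalWordCount {
  final case class WordCount(word: String, count: Long)

  def count(lines: Seq[String]): Seq[WordCount] =
    lines
      .flatMap(_.split("""\s+"""))                    // split lines into tokens
      .map(_.toLowerCase.replaceAll("[^a-z0-9]", "")) // normalise each token
      .filter(_.length > 1)                           // drop empty and one-letter tokens
      .groupBy(identity)                              // group equal words together
      .map { case (word, hits) => WordCount(word, hits.size.toLong) }
      .toSeq
      .sortBy(wc => (-wc.count, wc.word))             // most frequent first, then by word
}
```

Calling LocalWordCount.count(Seq("the cat and the hat", "The end!")) puts "the" first, since it is the only word that appears more than once.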
Moreover, Noel Welsh and Dave Gurnell in their book Scala with Cats explain that:
We can use the standalone Scala compiler!
$ sudo apt install openjdk-8-jdk
$ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.deb
$ sudo dpkg -i scala-2.12.6.deb
Change the .deb for .msi (Windows) or .tgz (Mac), or see the official documentation for more options.
(Students using Windows 10 are encouraged to install WSL)
Create a helloworld.scala file with the following content:
object HelloWorld {
def main(args: Array[String]): Unit =
println("Hello, World!")
}
In the command line:
$ scala helloworld.scala
$ scalac helloworld.scala && scala HelloWorld
You can launch the Scala REPL by simply typing scala in the command line.
:load file.scala
:paste
:type
:q
scala> println("This is the Scala REPL")
This is the Scala REPL
scala> def sum(x: Int, y: Int): Int = x + y
sum: (x: Int, y: Int)Int
scala> :type sum _
(Int, Int) => Int
scala> :load helloworld.scala
Loading helloworld.scala...
defined object HelloWorld
scala> HelloWorld.main(Array.empty)
Hello, World!
Scala scripts are lightweight files containing Scala code that can be directly run from the command line.
Install by running:
$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.2.1/2.12-1.2.1) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Create a file named fibonacci.sc containing:
import scala.annotation.tailrec
def fibonacci(n: Int): List[BigInt] = {
def recFibo(n: Int): BigInt =
if (n < 2) 1 else recFibo(n - 1) + recFibo(n - 2)
if (n <= 0) List[BigInt]() else 0 :: (0 until n).toList.map(recFibo)
}
def tailRecFibonacci(n: Int): List[BigInt] = {
@tailrec
def recFibo(n: Int, a: BigInt = 0, b: BigInt = 1): BigInt = n match {
case 0 => a
case 1 => b
case _ => recFibo(n-1, b, a+b)
}
if (n < 0) List[BigInt]() else (0 to n).toList.map(recFibo(_))
}
@main
def main(i: Int = 5): Unit = {
println(s"Fibonacci sequence for i=$i : ${tailRecFibonacci(i).toString}")
}
Run the script with amm fibonacci.sc 10
You can import other scripts and Ivy dependencies.
Create a file fibosum.sc containing:
import $file.fibonacci
import scala.util.Try
def fibosum(i: Int)( fib: Int => List[BigInt]): BigInt =
fib(i).foldRight(0: BigInt)(_ + _)
@main
def main(i: Int = 5, tailrec: Boolean = true): Unit = {
val fiboSum = fibosum(i) _
val res: Option[BigInt] = Try(
if (tailrec) fiboSum(fibonacci.tailRecFibonacci _)
else fiboSum(fibonacci.fibonacci _)).toOption
println(s"The sum of the fibonacci sequence for i=$i is $res.")
}
Run the script with (you may need to sudo apt install time):
time amm fibosum.sc -i 40 -tailrec false
time amm fibosum.sc -i 40 -tailrec true
Any difference? Implementation matters!
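The gap can also be measured inside a single program. The sketch below is a hypothetical, simplified variant of the two scripts: FiboTiming, naive, linear and timed are illustrative names, and naive uses the usual F(0) = 0, F(1) = 1 base cases rather than the script's.

```scala
import scala.annotation.tailrec

object FiboTiming {
  // Naive recursion: recomputes the same subproblems, so time grows exponentially.
  def naive(n: Int): BigInt =
    if (n < 2) n else naive(n - 1) + naive(n - 2)

  // Tail recursion with accumulators: linear time and constant stack depth.
  @tailrec
  def linear(n: Int, a: BigInt = 0, b: BigInt = 1): BigInt =
    if (n == 0) a else linear(n - 1, b, a + b)

  // Hypothetical helper: returns the result and the elapsed milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000)
  }
}
```

Comparing FiboTiming.timed(FiboTiming.naive(30)) with FiboTiming.timed(FiboTiming.linear(30)) should give the same number with noticeably different elapsed times, although the exact figures depend on the machine and on JVM warm-up.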
SBT is the most popular build tool for Scala projects.
Some relevant features:
You can install SBT by running:
$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
$ sudo apt update && sudo apt install sbt
$ sbt about
See more installation instructions here.
Let's create a minimal project:
$ mkdir hello-world && cd hello-world
$ mkdir -p src/main/scala/example
$ mkdir -p src/test/scala/example
$ touch build.sbt
Create a source file at src/main/scala/example/Hello.scala
package example
object Hello {
val message = "Hello, World!"
def main(args: Array[String]): Unit =
println(message)
}
Now we can:
sbt compile
sbt run
Add dependencies in the build.sbt file:
ThisBuild / scalaVersion := "2.12.7"
ThisBuild / organization := "com.example"
lazy val hello = (project in file("."))
.settings(
name := "Hello",
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % Test,
)
Now we can add tests in src/test/scala/example/HelloSpec.scala:
package example
import org.scalatest.{FlatSpec, Matchers}
class HelloSpec extends FlatSpec with Matchers {
"The Hello object" should "contain a message starting with 'Hello'" in {
assert(Hello.message startsWith "Hello")
}
}
Run tests with sbt test
We can use the Scala REPL with the dependencies and classes defined in our project with sbt console
scala> import example.Hello
import example.Hello
scala> Hello.message
res0: String = Hello, World!
For more information and features see sbt by example.
Recent sbt versions can create a new build definition from a template with the new command. Giter8 hosts the most popular templates.
$ sbt new {template-name}
Templates:
scala/scala-seed.g8 - seed template for Scala
akka/akka-quickstart-scala.g8 - Akka quickstart
holdenk/sparkProjectTemplate.g8 - Spark template
And more.
IntelliJ IDEA is one of the best IDEs with Scala support. We'll be using this IDE for the Nanodegree.
Students are encouraged to download the JetBrains Toolbox App.
We can import SBT projects using IntelliJ:
$ sbt new scala/scala-seed.g8
$ idea . &
Consider looking at the talk Effective Scala development in IntelliJ Idea by Mikhail Mutcianko for more information regarding tips & tricks.
Use the scala/hello-world.g8 giter8 template to create a project.
Run it with the sbt run command from the CLI.
Talks and Conferences
Other online resources