Module 1: Introduction

History Time

The following events led to the eventual creation of the Scala Language:

  • Mid-1980s Functional languages in research (e.g., Scheme, ML, Miranda)
    • Type inference, lazy evaluation, and higher-ranked types.
  • 1987-1999 Development of Haskell by Peyton Jones et al.
    • A pure, lazy, functional language that includes typeclasses and features monads.
    • Odersky works on higher-ranked types (incorporated in GHC).
  • 1996 Sun Microsystems releases Java 1.0, led by James Gosling.
  • 1997 Pizza language by Martin Odersky and Philip Wadler.
    • Superset of Java: translates to Java source as an intermediate representation.
    • Parametric polymorphism (generics), algebraic types (case classes), and higher-order functions.
  • 1998 Bracha, Odersky, Stoutamire, and Wadler publish GJ at OOPSLA, featuring generics.
    • Type erasure as a consequence.
  • 1999-2001 Development of Funnel by Odersky.
  • 2000 Sun Microsystems adopts the GJ compiler as javac in JDK 1.3.
  • 2004 Sun Microsystems releases Java 1.5, incorporating generics from GJ.
  • 2004 Odersky publishes the Scala language, based on the previous work on Funnel and Pizza.
  • 2006 Scala 2.x with many improvements.

For a more detailed explanation, see Dan Rosen's talk Scala > Java.

Why did OOP become popular? According to Martin Odersky in his Scala with Style talk:

Because of the representation of concepts and the things you could do with that.

The first OOP languages (Simula 67 and Smalltalk) provided a way to program against a fixed API with an unknown implementation, which drove the rising popularity of the OO paradigm.

Object oriented design (SOLID):

  1. S - Single Responsibility Principle
  2. O - Open-closed Principle
  3. L - Liskov substitution principle
  4. I - Interface segregation principle
  5. D - Dependency Inversion Principle
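As a quick sketch of the last principle, dependency inversion is usually expressed in Scala with traits; the names below (MessageSender, ConsoleSender, Notifier) are hypothetical, chosen only for illustration:

```scala
// Hypothetical sketch: dependency inversion (the "D" in SOLID).
// High-level code depends on an abstraction (a trait), never on a concrete class.
trait MessageSender {
  def send(message: String): Unit
}

class ConsoleSender extends MessageSender {
  def send(message: String): Unit = println(s"[console] $message")
}

// Notifier knows nothing about how messages are actually delivered.
class Notifier(sender: MessageSender) {
  def notifyUser(event: String): Unit = sender.send(s"event: $event")
}

new Notifier(new ConsoleSender).notifyUser("user-signup")
// prints: [console] event: user-signup
```

Swapping ConsoleSender for any other MessageSender implementation requires no change to Notifier.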

What about FP?

Functional programming has lots of methodological advantages:

  • Fewer errors
  • Better modularity
  • Higher-level abstractions
  • Shorter code
  • Increases developer productivity (easier to reason about)

Nonetheless, a “cultural shock” kept the FP approach out of the mainstream.


This was true until software development reached a certain level of complexity, derived from:

  • Multicore, parallel programming
  • Asynchronous events
  • Cloud computing, distributed environments


Several languages and frameworks rely on locks and threads, which don’t scale well; hence the need for better and safer abstractions.

The essence of functional programming offered a viable solution: immutable transformations of values instead of stepwise modification of mutable state.
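The contrast between the two styles can be sketched in a few lines of Scala:

```scala
// Mutable style: shared state mutated step by step;
// under concurrency this state would need to be guarded.
var total = 0
for (x <- List(1, 2, 3, 4)) total += x * x

// Functional style: a pure pipeline of immutable values,
// with no shared state to protect.
val total2 = List(1, 2, 3, 4).map(x => x * x).sum

assert(total == total2) // both yield 30
```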

Can FP and OOP be combined?

Of course. According to the official website Scala is a:

Combination of object-oriented and functional programming in a high-level, statically typed language.

What else is Scala?

Scala was created by Martin Odersky at EPFL in 2004. The current version (2.12.7) is maintained by the Scala Center and Lightbend.

Scala is object-oriented

Scala is a pure object-oriented language: every value is an object and every operation is a method call. The types and behaviour of objects are described by classes and traits. Class composition is done via a mixin-based mechanism that replaces multiple inheritance.
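A minimal sketch of mixin composition (the trait names here are hypothetical): each trait layers behaviour on top of the one to its left, following Scala's linearization order.

```scala
trait Greeter {
  def greet(name: String): String = s"Hello, $name"
}

trait Shouting extends Greeter {
  override def greet(name: String): String = super.greet(name).toUpperCase
}

trait Exclaiming extends Greeter {
  override def greet(name: String): String = super.greet(name) + "!"
}

// Linearization runs right-to-left: Exclaiming wraps Shouting wraps Greeter.
object PoliteBot extends Greeter with Shouting with Exclaiming

// PoliteBot.greet("scala") == "HELLO, SCALA!"
```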

Scala is functional

In Scala every function is a value. It supports higher-order functions, nesting, recursion, and currying.

def reverse[A](list: List[A]): List[A] = {
  // Curried, tail-recursive helper that takes the prepend operation as a parameter.
  def tailRecReverse(result: List[A], prependOps: (List[A], A) => List[A])(current: List[A]): List[A] = current match {
    case Nil => result
    case head :: tail => tailRecReverse(prependOps(result, head), prependOps)(tail)
  }
  // Partially apply the first parameter list, yielding a List[A] => List[A].
  val reverseOp = tailRecReverse(List.empty[A], (xs: List[A], x: A) => x +: xs) _
  reverseOp(list)
}

(unnecessarily complex example)
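For contrast, the same function in a more direct style (an illustrative rewrite, not from the original material): a tail-recursive inner loop that prepends each head onto an accumulator.

```scala
import scala.annotation.tailrec

def reverseSimple[A](list: List[A]): List[A] = {
  @tailrec
  def loop(acc: List[A], rest: List[A]): List[A] = rest match {
    case Nil          => acc
    case head :: tail => loop(head :: acc, tail)
  }
  loop(Nil, list)
}

// reverseSimple(List(1, 2, 3)) == List(3, 2, 1)
```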

Scala is statically typed… with type inference!

Scala has a strong, static type system that supports type inference.

val courseName = "Introduction" // inferred as String
def square(x: Int) = x * x      // return type inferred as Int

Scala has different runtimes

JVM - JS - LLVM


Scala interops with Java

Scala offers seamless interoperability with the Java ecosystem. According to the official site:

Scala classes are ultimately JVM classes. You can create Java objects, call their methods and inherit from Java classes transparently from Scala.
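A small sketch of that interoperability, using java.util.ArrayList directly from Scala:

```scala
// Java classes are imported and instantiated like Scala ones.
import java.util.{ArrayList => JArrayList}

val names = new JArrayList[String]() // a plain Java object
names.add("Ada")
names.add("Grace")

// Java methods are called like any Scala method.
assert(names.size() == 2)
assert(names.get(0) == "Ada")
```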

Scala has lightweight syntax

Example: assume we want to count the number of underage persons in a group.

Java Code

import java.util.ArrayList;

public class Example {

  public static class Person {
    public final String name;
    public final int age;
    Person(String name, int age) {
      this.name = name;
      this.age  = age;
    }
  }

  public void count() {
    ArrayList<Person> peopleList = new ArrayList<Person>();
    peopleList.add(new Person("A", 15));
    peopleList.add(new Person("B", 20));
    peopleList.add(new Person("C", 17));
    peopleList.add(new Person("D", 22));
    Person[] people = new Person[4];
    people = peopleList.toArray(people);
    ArrayList<Person> minorList = new ArrayList<Person>();
    ArrayList<Person> adultList = new ArrayList<Person>();
    for (int i = 0; i < people.length; i++)
      (people[i].age < 18 ? minorList : adultList).add(people[i]);
    System.out.println(minorList.size());
    System.out.println(adultList.size());
  }

  public static void main(String[] args) {
    Example example = new Example();
    example.count();
  }
}

Scala Code

object Example extends App {
  class Person(val name: String, val age: Int)
  val people = Array(
    new Person("A", 15),
    new Person("B", 20),
    new Person("C", 17),
    new Person("D", 22))
  val (minors, adults) = people partition (_.age < 18)
  println(minors.length)
  println(adults.length)
}

Scala Parallel Code

object Example extends App {
  class Person(val name: String, val age: Int)
  val people = Array(
    new Person("A", 15),
    new Person("B", 20),
    new Person("C", 17),
    new Person("D", 22))
  val (minors, adults) = people.par partition (_.age < 18)
  println(minors.length)
  println(adults.length)
}

Scala empowers concurrency and distribution

You can use data-parallel operations on collections, actors for concurrency and distribution, or futures for asynchronous programming. Scala was built with concurrency and parallelism in mind.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val x = Future { someExpensiveComputation() }
val y = Future { someOtherExpensiveComputation() }
val z = for (a <- x; b <- y) yield a * b
for (c <- z) println("Result: " + c)
println("Meanwhile, the main thread goes on!")

Scala in the industry

According to Lightbend:

Scala has taken over the world of Fast Data, which is what some are calling the next wave of computation engines […] (ability to process event streams in real time).

Case Studies

  • Twitter - New tweets per second record
  • PayPal - Blows past 1 billion transactions per day
  • Walmart - Boosts conversion by 20%
  • Samsung - Real-time data platform for wearables

And more!

Relevant big-data open source projects

Scala has had a major impact on Apache Spark, Apache Kafka, Apache Flink, Akka Streams, and more!

Why is Scala in such a position?

According to Dean Wampler, traditional big-data tools are inefficient and hard to implement.

Consider a typical word-count application.

A solution using MapReduce with Java would look similar to:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
            StringTokenizer strTokenizer = new StringTokenizer(value.toString());
            while (strTokenizer.hasMoreTokens()) {
                word.set(strTokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}

Using Spark with Scala:

import org.apache.spark.sql._

object WordCount extends App with Context {
  import spark.implicits._
  final case class WordCount(word: String, count: Long)
  val readPath = "path/to/input/file"
  val writePath = "path/to/output/file"
  val data: Dataset[String] = spark.read.textFile(readPath)
  val wordCount: Dataset[WordCount] = data
    .flatMap(_.split("""\s+""")).map(_.toLowerCase.replaceAll("[^A-Za-z0-9]", "")).filter(_.length > 1)
    .groupByKey(identity).count().map({case (w, c) => WordCount(w, c)})
    .sort($"count".desc)
  wordCount.coalesce(1).write.csv(writePath)
}


The FP abstractions exposed by the Spark API are clearly easier to use. Moreover, the Spark engine can be up to 100x faster than MapReduce.

Noel Welsh and Dave Gurnell explain in their book Scala with Cats that:

  • Monoids are used in Spark to perform parallel work on each partition and combine the results in a single node.
  • Distributed systems use commutative replicated data types (CRDTs) to guarantee eventual consistency and to support reconciliation.
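The first point can be sketched as a toy model of partition-wise folding (this is illustrative, not Spark's actual internals; Monoid, intAddition, and foldPartitions are made-up names):

```scala
// A monoid: an associative combine operation with an identity element.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

val intAddition: Monoid[Int] = new Monoid[Int] {
  def empty: Int = 0
  def combine(x: Int, y: Int): Int = x + y
}

// Each partition is reduced independently (in parallel, in a real engine),
// then the partial results are combined on a single node.
def foldPartitions[A](partitions: List[List[A]])(m: Monoid[A]): A = {
  val partials = partitions.map(_.foldLeft(m.empty)(m.combine))
  partials.foldLeft(m.empty)(m.combine)
}

// Associativity guarantees the result is independent of how the data is split:
// foldPartitions(List(List(1, 2), List(3, 4)))(intAddition) == 10
```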

How to get started?

Installing Scala

We can use the standalone Scala compiler.

$ sudo apt install openjdk-8-jdk
$ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.deb
$ sudo dpkg -i scala-2.12.6.deb

Swap the .deb for .msi (Windows) or .tgz (macOS), or see the official documentation for more options.

(Students using Windows 10 are encouraged to install WSL)

Hello, World!

Create a helloworld.scala file with the following content:

object HelloWorld {
  def main(args: Array[String]): Unit =
    println("Hello, World!")
}

Then, from the command line, either run it directly or compile it and run the resulting class:

$ scala helloworld.scala
$ scalac helloworld.scala && scala HelloWorld

Using the Scala REPL

You can launch the Scala REPL by simply typing scala in the command line.

  • Scala tool for evaluating expressions (similar to ipython).
  • Has built-in tab completion.
  • Load Scala files using :load file.scala
  • Paste mode using :paste
  • Get expression types with :type
  • Exit with :q

scala> println("This is the Scala REPL")
This is the Scala REPL

scala> def sum(x: Int, y: Int): Int = x + y
sum: (x: Int, y: Int)Int

scala> :type sum _
(Int, Int) => Int

scala> :load helloworld.scala
Loading helloworld.scala...
defined object HelloWorld

scala> HelloWorld.main(Array.empty)
Hello, World!

Scala Scripts with Ammonite

Scala scripts are lightweight files containing Scala code that can be directly run from the command line.

Install by running:

$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.2.1/2.12-1.2.1) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm

Create a file named fibonacci.sc containing:

import scala.annotation.tailrec

def fibonacci(n: Int): List[BigInt] = {
  def recFibo(n: Int): BigInt =
    if (n < 2) 1 else recFibo(n - 1) + recFibo(n - 2)
  if (n <= 0) List[BigInt]() else 0 :: (0 until n).toList.map(recFibo)
}

def tailRecFibonacci(n: Int): List[BigInt] = {
  @tailrec
  def recFibo(n: Int, a: BigInt = 0, b: BigInt = 1): BigInt = n match {
    case 0 => a
    case 1 => b
    case _ => recFibo(n-1, b, a+b)
  }
  if (n < 0) List[BigInt]() else (0 to n).toList.map(recFibo(_))
}

@main
def main(i: Int = 5): Unit = {
  println(s"Fibonacci sequence for i=$i : ${tailRecFibonacci(i).toString}")
}

Run the script with amm fibonacci.sc 10

We can use Ammonite to import other scripts or Ivy Dependencies.

Create a file fibosum.sc containing:

import $file.fibonacci
import scala.util.Try

def fibosum(i: Int)(fib: Int => List[BigInt]): BigInt =
  fib(i).foldRight(0: BigInt)(_ + _)

@main 
def main(i: Int = 5, tailrec: Boolean = true): Unit = {
  val fiboSum = fibosum(i) _
  val res: Option[BigInt] = Try(
    if (tailrec) fiboSum(fibonacci.tailRecFibonacci _)
    else fiboSum(fibonacci.fibonacci _)).toOption 
  println(s"The sum of the fibonacci sequence for i=$i is $res.")
}

Run the script with (you may need to sudo apt install time):

  • time amm fibosum.sc --i 40 --tailrec false
  • time amm fibosum.sc --i 40 --tailrec true

Any difference? Implementation matters!

Scala projects with SBT

SBT is the most popular build tool for Scala projects.

Some relevant features:

  • Scala-based build definition!
  • Continuous compilation and testing.
  • Package and publish.
  • Mixed Scala/Java projects.
  • Scala REPL with project classes and dependencies on classpath.
  • Parallel task and test execution.

You can install SBT by running:

$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
$ sudo apt update && sudo apt install sbt
$ sbt about


Let’s create a minimal project:

$ mkdir hello-world && cd hello-world
$ mkdir -p src/main/scala/example
$ mkdir -p src/test/scala/example
$ touch build.sbt

Create a source file at src/main/scala/example/Hello.scala

package example

object Hello {
  val message = "Hello, World!"
  def main(args: Array[String]): Unit = 
    println(message)
}

Now we can:

  • Compile the project with sbt compile
  • Compile and run with sbt run.

Add dependencies in the build.sbt file:

ThisBuild / scalaVersion := "2.12.7"
ThisBuild / organization := "com.example"

lazy val hello = (project in file("."))
  .settings(
    name := "Hello",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % Test,
  )

Now we can add tests in src/test/scala/example/HelloSpec.scala

package example

import org.scalatest.{FlatSpec, Matchers}

class HelloSpec extends FlatSpec with Matchers {
  "The Hello object" should "contain a message starting with 'Hello'" in {
    assert(Hello.message startsWith "Hello")
  }
}

Run tests with sbt test

We can use the Scala REPL with the dependencies and classes defined in our project with sbt console

scala> import example.Hello
import example.Hello

scala> Hello.message
res0: String = Hello, World!

For more information and features see sbt by example.

Giter8 templates

Current sbt versions can create a new build definition from a template using the new command. Giter8 hosts the most popular templates.

$ sbt new {template-name}

  • scala/scala-seed.g8 - seed template for Scala
  • akka/akka-quickstart-scala.g8 - akka quickstart
  • holdenk/sparkProjectTemplate.g8 - spark template

And more.

IntelliJ IDEA IDE

IntelliJ IDEA is one of the best IDEs with Scala support. We’ll be using this IDE for the Nanodegree.

Students are encouraged to download the JetBrains Toolbox App.

We can import SBT Projects using IntelliJ

$ sbt new scala/scala-seed.g8
$ idea . &

Consider watching the talk Effective Scala development in IntelliJ IDEA by Mikhail Mutcianko for more tips and tricks.

Personal Assignment!

  1. Use the scala/hello-world.g8 giter8 template to create a project.
  2. Run the project using the sbt run command from the CLI.
  3. Use IntelliJ Idea to edit the project to print “Howdy, World!” instead of “Hello, World!”

Study Material

Talks and Conferences

Other online resources