Archive

Posts Tagged ‘scala’

Using Akka Dispatchers In Scala

September 19, 2011 Comments off

Akka dispatchers are extremely important in the Akka framework. They are directly responsible for optimal performance, throughput and scalability.

Akka supports dispatchers for both event-driven lightweight threads and thread-based Actors. For thread-based Actors, each dispatcher is bound to a dedicated OS thread.

The default dispatcher is a single event-based dispatcher shared by all Actors created. The dispatcher used is:

Dispatchers.globalExecutorBasedEventDrivenDispatcher

In many cases it becomes necessary to group Actors together on a dedicated dispatcher; then we can override the default and define our own dispatcher.

Setting the Dispatcher
Normally we set the dispatcher in the Actor itself:

class EchoActor extends Actor {
     self.dispatcher = ..... // set the dispatcher
}

Or we can set it on the ActorRef:

actorRef.dispatcher = dispatcher

There are different kinds of dispatchers:

  • Thread-based
  • Event-based
  • Priority event-based
  • Work-stealing

Thread-based
It binds a dedicated OS thread to each Actor. Messages are posted to a LinkedBlockingQueue, which feeds them to the dispatcher one by one. It has the worst performance and scalability, and it cannot be shared among Actors, although Actors never block waiting for a thread in this case.
Code example

class EchoActor extends Actor {
     self.dispatcher = Dispatchers.newThreadBasedDispatcher(self)
     ....
}

Event-based
The ExecutorBasedEventDrivenDispatcher binds a set of Actors to a thread pool backed by a BlockingQueue. The dispatcher must be shared among Actors. It is highly configurable; we can specify things like the type of queue, the maximum number of items and the rejection policy.
Code example

class EchoActor extends Actor {
     self.dispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher(name)
       .withNewThreadPoolWithLinkedBlockingQueueWithCapacity(100)
       .setCorePoolSize(16)
       .setMaxPoolSize(128)
       .setKeepAliveTimeInMillis(60000)
       .setRejectionPolicy(new CallerRunsPolicy)
       .build
     ....
}
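
Since this dispatcher is meant to be shared, a common approach is to build it once, for example in a standalone object, and assign the same instance to every actor in the group. Here is a minimal sketch mirroring the builder chain above; the names are illustrative:

import akka.actor.Actor
import akka.dispatch.Dispatchers
import java.util.concurrent.ThreadPoolExecutor.CallerRunsPolicy

object SharedDispatcher {
  // built once and assigned to every actor that should run on this pool
  val instance = Dispatchers.newExecutorBasedEventDrivenDispatcher("shared-pool")
    .withNewThreadPoolWithLinkedBlockingQueueWithCapacity(100)
    .setCorePoolSize(16)
    .setMaxPoolSize(128)
    .setKeepAliveTimeInMillis(60000)
    .setRejectionPolicy(new CallerRunsPolicy)
    .build
}

class PingActor extends Actor {
  self.dispatcher = SharedDispatcher.instance
  def receive = { case msg => println("ping: " + msg) }
}

class PongActor extends Actor {
  self.dispatcher = SharedDispatcher.instance
  def receive = { case msg => println("pong: " + msg) }
}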

Priority event-based
It is meant for handling messages according to priorities assigned to them, and is done using the PriorityExecutorBasedEventDrivenDispatcher. It requires a PriorityGenerator as a constructor argument.

Let’s look at an example where we have a PriorityExecutorBasedEventDrivenDispatcher used for a group of messages fired on an actor.

package com.meetu.akka.dispatcher

import akka.actor.Actor.actorOf
import akka.actor.Actor
import akka.dispatch.PriorityExecutorBasedEventDrivenDispatcher
import akka.dispatch.PriorityGenerator

object PriorityDispatcherExample extends App {
  val actor = Actor.actorOf(
    new Actor {
      def receive = {
        case x => println(x)
      }
    })

  val priority = PriorityGenerator {
    case "high priority" => 0
    case "low priority" => 100
    case _ => 50
  }

  actor.dispatcher = new PriorityExecutorBasedEventDrivenDispatcher("foo", priority)

  actor.start
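  // suspend the actor so the messages below queue up in its mailbox;
  // on resume they are processed in priority order ("high priority" first)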
  actor.dispatcher.suspend(actor)

  actor ! "low priority"
  actor ! "others"
  actor ! "low priority"
  actor ! "high priority"

  actor.dispatcher.resume(actor)
  Actor.registry.shutdownAll
}

Read more…

Categories: Scala Tags: ,

Manage Akka Actors With Supervisors

September 15, 2011 Comments off

Read this blog on our specialized Scala consulting organization

Knoldus

Manage Akka Actors With Supervisors

Categories: Scala Tags: ,

Starting Akka Project With SBT 0.10

August 16, 2011 2 comments

I was starting an Akka project with SBT but found that the latest SBT is quite different from before.

I tried to create an Akka project with the latest SBT but got stuck. The old SBT used to offer to create a new project if it did not find one in the directory; with the new SBT that is no longer the case. If you want to know how to create a new Akka project with SBT, read on.

After installing SBT, if we type sbt at the command prompt in an empty directory, this is what we are likely to see.

In order to create the project, execute the following commands in the sbt session.

> set name := "AkkaQuickStart"
> set version := "1.0"
> set scalaVersion := "2.9.0-1"
> session save
> exit

We should get the following output if we type the above-mentioned commands in the sbt session.

SBT creates files and directories when we execute these commands. It creates build.sbt, which contains the same values we typed in the sbt session. Other directories like target and project are of little consequence to us for now.
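
The generated build.sbt should contain something like the following (sbt 0.10 requires a blank line between settings in .sbt files):

name := "AkkaQuickStart"

version := "1.0"

scalaVersion := "2.9.0-1"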

The project directory will become important later when we try to add the sbteclipse plugin. My project directory contains the following subdirectories and files.

Read more…

Categories: Scala Tags: , ,

Working With Scala Collections

June 22, 2011 Comments off

We have a monthly iBAT (Inphina Beer and Technology Sessions). We look forward to this day, and this time it was Scala day. I presented on Scala collections.

Scala collections are elegant and concise. Like Java's collections they are object-oriented, and we can work with generic types. They are optionally persistent, i.e. they can be either mutable or immutable, and they provide higher-order methods like foreach, map and filter.

Scala collections follow the uniform return type principle, which basically means that when you perform an operation on a collection, you get back a collection of the same type.
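
For example, mapping over a List gives back a List, while the same operation on a Set gives back a Set:

scala> val doubledList = List(1, 2, 3) map (_ * 2)
doubledList: List[Int] = List(2, 4, 6)

scala> val doubledSet = Set(1, 2, 3) map (_ * 2)
doubledSet: scala.collection.immutable.Set[Int] = Set(2, 4, 6)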

The root trait in Scala collections is Traversable. It may take some time to get used to, as root traits like Traversable carry a large number of methods.

The for notation is more readable. It is basically a for loop that yields a result. See for yourself how much more readable it is compared to the original code.

This code listing is without the for notation:

scala> val listOfNumbers = List(1, 2, 3)
listOfNumbers: List[Int] = List(1, 2, 3)

scala> val flattenedListOfNumbers = listOfNumbers flatMap (0 to _)
flattenedListOfNumbers: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3)

The same code after using the for notation:

scala> val listOfNumbers = List(1, 2, 3)
listOfNumbers: List[Int] = List(1, 2, 3)

scala> val flattenedList = for(number <- listOfNumbers; range <- 0 to number) yield range
flattenedList: List[Int] = List(0, 1, 0, 1, 2, 0, 1, 2, 3)

One more interesting thing we can do with a Map is reverse keys and values, with the following code:

scala> val aMap = Map("India" -> "Delhi", "France" -> "Paris", "Italy" -> "Rome")
aMap: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(India -> Delhi, France -> Paris, Italy -> Rome)

scala> val reverseMap = aMap map{case(k, v) => (v, k)}
reverseMap: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(Delhi -> India, Paris -> France, Rome -> Italy)

scala> val alsoReverseMap = for((k, v) <- aMap) yield (v -> k) 
alsoReverseMap: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(Delhi -> India, Paris -> France, Rome -> Italy)

References were Martin Odersky's talk on Future Proofing Collections and the Scala Collections API documentation. Enjoy the presentation…

Categories: Scala Tags: ,

Working With Scala Test

May 16, 2011 1 comment

ScalaTest is an open-source test framework for the Java platform. With ScalaTest we can test either Java or Scala code. It integrates with popular existing tools like JUnit and TestNG, and it is designed to support different styles of testing, such as Behavior Driven Development.

Let's have a look at a simple JUnit 4 test written with ScalaTest:

package com.inphina.ibat.junit

import org.scalatest.junit.AssertionsForJUnit
import org.junit.Assert._
import org.junit.Test
import org.junit.Before

class SimpleJunit4Demo extends AssertionsForJUnit {

  var sb: StringBuilder = _

  @Before def initialize() {
    sb = new StringBuilder("Welcome To ")
  }

  @Test
  def verifyEasy() {
    sb.append("ScalaTest!")
    assertEquals("Welcome To ScalaTest!", sb.toString)
  }

}

This test implementation is not very different from the normal JUnit tests we are accustomed to. In order to understand ScalaTest, there are three concepts we need to be aware of:

  • Suite: A collection of tests. A Test is anything that has a name and can succeed or fail
  • Runner: ScalaTest provides a Runner application that can run Suites of Tests
  • Reporter: As tests run, events are fired to a Reporter, which takes care of presenting results back to the user
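
To get a feel for the BDD style mentioned earlier, here is roughly the same check expressed as a ScalaTest FlatSpec. This is a sketch assuming ScalaTest 1.x; the class name is illustrative.

import org.scalatest.FlatSpec

class StringBuilderSpec extends FlatSpec {

  "A StringBuilder" should "append text to its initial value" in {
    val sb = new StringBuilder("Welcome To ")
    sb.append("ScalaTest!")
    assert(sb.toString === "Welcome To ScalaTest!")
  }

}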

Read more…

Categories: Scala Tags: ,

Setting up Scala dev environment on Ubuntu 11.04

April 30, 2011 4 comments

Inphina provides specialized Scala consulting, training and offshoring … Learn more about Scala@Inphina!

I recently moved to Ubuntu 11.04 from Windows 7 and am not regretting it one bit. I used the Ubuntu Software Center and the Synaptic Package Manager to set up much of my development environment without any hassles. However, I couldn't use the Synaptic Package Manager to install Scala, as the repository had an older version published, Scala 2.7.x.

If, like me, you need to install Scala 2.8.1 or Scala 2.9.1 on Ubuntu, follow the steps below:

  1. Download your desired version of Scala
  2. Extract the package in /opt/
  3. Edit .bashrc with the command gedit .bashrc
  4. Export variable SCALA_HOME: export SCALA_HOME=/opt/scala-2.8.1.final
  5. Update PATH to include SCALA binaries: export PATH=$SCALA_HOME/bin:$PATH
  6. Save .bashrc, log out and log in
  7. Verify the Scala installation: scala -version
  8. Install / configure your favorite IDE

You’re ready to go, start hacking..


Srirangan is a programmer / senior consultant with Inphina Technologies
Blog   GitHub   LinkedIn   Twitter

Categories: Scala Tags: ,

How to introduce Scala in the enterprise

April 22, 2011 2 comments

Inphina provides specialized Scala consulting, training and offshoring … Learn more about Scala@Inphina!

Are you a Scala fanboy enthusiast? Have you been following the Scala community and projects intently for the past couple of years? And have you been itching to implement Scala code, libraries and tools in a real-world scenario, only to find your team members / customers / IT leadership not receptive to the idea?

If so, here are some things you may want to try:

Effectively communicate Scala’s benefits

Scala has a lot of advantages … you know that and I know that but do they know that?

Does your IT leader know that Scala plays really nicely with the existing JVM (and CLR) environments? Do your fellow programmers know of the conciseness of Scala code? Is the older, hardened Java developer aware that most times Java Code *is* perfectly compatible and compilable Scala code?

Has your customer been told of the rapid development, scalability and maintainability edge Scala brings to the engagement?

If stakeholders see the benefit they will recognize the opportunity. And it may very well be that we have earned new allies in our crusade for Scala adoption.

Share the case-studies

The Scala programming language has been proven in high-scalability environments. Twitter got over its stability woes with its migration to Scala. LinkedIn picked Scalatra, a Scala web framework, for its newer modules. Foursquare is built entirely on the Scala Lift Web Framework while Guardian.co.uk recently spoke about their adoption of Scala.

These and many more case studies are out there, helping us understand the realistic challenges of Scala adoption, how they were overcome and what benefits have been derived from the overall migration to Scala.

Start with something small

Take the initiative and start with something small and show them how it is done.

For example, you may want to introduce a sub-module in your existing enterprise project for testing with ScalaTest, build your project utilities with Scala, or implement a simple website / web application with a Scala-based web framework instead.

Amaze the business with the rapid development that you've achieved, amaze the geeks with your sexy Scala code. Go out there, implement it anyway and bedazzle them.

If things go wrong, you can always apologize and make use of Git or whatever other version control you have. ;-)

Destroy the “It’s not enterprise ready!” argument

You have great IDE support (IntelliJ IDEA, NetBeans 7, Eclipse with Scala IDE 2.9 beta). Scala integrates very nicely with Maven. You have SBT for projects started from scratch. You have books and now you have training programmes. It works seamlessly in your JVM / CLR based environments.

Given the current economic situation, it is pragmatic for business to consider the availability of Scala shops and especially Scala Offshoring partners. Inphina Technologies has taken the lead and is pioneering the effort to offer Scala Offshoring solutions to its partners.

Ask loudly *why* specifically they think it is not enterprise ready. Chances are they don't really have a good explanation. That's not to say Scala doesn't have its shortcomings; however, what Scala also has is a very enthusiastic and effective community.


Srirangan is a programmer / senior consultant with Inphina Technologies
Blog   GitHub   LinkedIn   Twitter

Categories: Scala Tags:

Simple Database Migrations with Scala and Querulous

April 12, 2011 3 comments

Inphina provides specialized Scala consulting, training and offshoring … Learn more about Scala@Inphina!

(Cross posted on my personal blog)

We hate doing it, but at one time or another each of us ends up writing a quick-and-dirty database migration utility for a project we're working on. I recently had to do the same, and surprisingly the process was really smooth .. largely thanks to Scala goodness and Querulous, a very cool, simple and lightweight database library created by some folks at Twitter.

Let's start by creating a Mavenized Scala project; check out Getting Started – Maven Archetype for Scala.

Does the Scala project build successfully? Good.

The next step is to add the Querulous dependency and repository to your project's pom.xml file:

<dependencies>
    ..
    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>querulous</artifactId>
      <version>2.0.1</version>
    </dependency>
    ..
</dependencies>
..
  <repositories>
    ..
    <repository>
      <id>twitter.com</id>
      <url>http://maven.twttr.com</url>
    </repository>
    ..
  </repositories>

In our example, we will migrate a source database boysDb to a destination database girlsDb. No real migration business logic is being applied; think of it as a one-to-one migration.

We start off by declaring QueryEvaluators for our source and destination database schemas:

  val sourceDb = QueryEvaluator("localhost/boysdb", "root", "")
  val destinationDb = QueryEvaluator("localhost/girlsdb", "root", "")

A QueryEvaluator allows us to execute SQL queries on the referenced database.

Next step, get boys from the source database:

val boys = sourceDb.select("select id, name from boys") {
      row => (row.getInt(1), row.getString(2))
    }

boys is a collection of Tuple2[Int, String] containing the id and name attributes of a boy.

For each boy, insert a girl:

boys.foreach(boy => insertGirls(boy))

Let's now define our insertGirls() method:

  def insertGirls(boy: Tuple2[Int, String]): Unit = {
    destinationDb.execute("insert into girls (id, name) values (?, ?)", boy._1, boy._2)
  }

..and we’re done.
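
For reference, here are the pieces above assembled into one runnable object. This is a sketch: the QueryEvaluator import path is assumed from Querulous's package layout, and the object name is illustrative.

import com.twitter.querulous.evaluator.QueryEvaluator

object BoysToGirlsMigration extends App {
  // evaluators for the source and destination databases
  val sourceDb = QueryEvaluator("localhost/boysdb", "root", "")
  val destinationDb = QueryEvaluator("localhost/girlsdb", "root", "")

  // insert one girl row for each boy tuple (id, name)
  def insertGirls(boy: Tuple2[Int, String]): Unit = {
    destinationDb.execute("insert into girls (id, name) values (?, ?)", boy._1, boy._2)
  }

  // read every boy, then migrate it one-to-one
  val boys = sourceDb.select("select id, name from boys") {
    row => (row.getInt(1), row.getString(2))
  }
  boys.foreach(boy => insertGirls(boy))
}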

Scala + Querulous makes it easy to connect to multiple database instances and makes querying extremely straightforward by exposing SQL directly, without unnecessary ORM'ing or configuration / property file mess.

Full source code has been published on GitHub – https://github.com/Srirangan/scala-querulous-simple-migrations

For serious database migrations, do consider Scala Migrations.


Srirangan is a programmer / senior consultant with Inphina Technologies
Blog   GitHub   LinkedIn   Twitter

Categories: Scala Tags: ,

Getting Started – Scala Persistence with Squeryl

March 18, 2011 2 comments

Inphina provides specialized Scala consulting, training and offshoring … Learn more about Scala@Inphina!

(Cross posted on my personal blog)

Recently I had to implement simple entity persistence in Scala. I used Squeryl which defines itself as “a Scala ORM and DSL for talking with Databases with minimum verbosity and maximum type safety”. I tried it out and it was extremely easy to get started with.

This blog post will cover the basics of Scala persistence using Squeryl and the target database system is MySQL.

You will need a Scala project to work with. If you don’t have one already, you can create a Scala project with the Maven archetype.

Step 1 – Define dependencies

Our first dependency is Squeryl. As I blog this, the latest Squeryl version is "0.9.4-RC6". We need to make this available to our code. Additionally, we need to make the MySQL Connector available, as this example persists data into a MySQL database. The MySQL Connector is a run-time dependency consumed internally by Squeryl.

    <dependency>
      <groupId>org.squeryl</groupId>
      <artifactId>squeryl_2.8.1</artifactId>
      <version>0.9.4-RC6</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.15</version>
    </dependency>

If you are using Apache Maven, add the code shown above to the “dependencies” element of your project “pom.xml” file. If you don’t use Maven, make sure the artifacts / JARs are available in your project.

Step 2 – Define entities

Squeryl entities are basic Scala objects .. should we call them POSOs? Here I've defined a BaseEntity and extended it to define a User entity.

package net.srirangan.opensource.squerylexample.entities

import java.sql.Timestamp
import org.squeryl.KeyedEntity

class BaseEntity extends KeyedEntity[Long] {

  val id:Long = 0
  var lastModified = new Timestamp(System.currentTimeMillis)

}

All entities will extend and build on the BaseEntity. The BaseEntity introduces two fields “id” and “lastModified”.

package net.srirangan.opensource.squerylexample.entities

class User(var email:String, var password:String) extends BaseEntity {

  // Zero argument constructor required
  // Squeryl Roadmap says 0.9.5 will not need them :-) 
  def this() = this("", "")
  
}

We've now defined the User extending the BaseEntity. Besides the fields "id" and "lastModified", a User instance will have "email" and "password" attributes. We have also had to define a zero-argument constructor; however, the Squeryl road-map says that this will not be required from version "0.9.5" onwards.

Step 3 – Define schema

Squeryl requires us to define a Schema object. Our entity classes (i.e. User) need to be mapped as tables inside the Schema. Table column properties (unique, index, auto_increment etc.) are also defined here.


package net.srirangan.opensource.squerylexample.schema

import org.squeryl._
import org.squeryl.PrimitiveTypeMode._
import net.srirangan.opensource.squerylexample.entities.User

object Schema extends Schema {

  val users = table[User]

  on(users)(user => declare(
      user.id is (autoIncremented),
      user.email is (unique)
    ))

}

In the code above, we map the User entity to the User table, declare the email column as unique and the id as auto-incremented. Custom table and column names can be defined through the annotations Squeryl provides; check out the Squeryl docs.
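
For instance, something along these lines gives the table and a column custom names (a sketch; the names are illustrative, see the Squeryl docs for details):

import org.squeryl.annotations.Column

object Schema extends org.squeryl.Schema {
  // map the User entity to a table named "app_users" instead of the default
  val users = table[User]("app_users")
}

class User(
  // persist the "email" field in a column named "email_address"
  @Column("email_address")
  var email: String,
  var password: String) extends BaseEntity {

  def this() = this("", "")
}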

Step 4 – Start database session

Needless to say, we will need an active database session if we are to perform CRUD operations on the database. Here's how I've done it:

  import org.squeryl.Session
  import org.squeryl.SessionFactory
  import org.squeryl.adapters.MySQLAdapter
  
  val databaseUsername = "squeryl-example"
  val databasePassword = "squeryl-example"
  val databaseConnection = "jdbc:mysql://localhost:3306/squeryl-example"
  
  def startDatabaseSession(): Unit = {
    Class.forName("com.mysql.jdbc.Driver")
    SessionFactory.concreteFactory = Some(() => Session.create(
      java.sql.DriverManager.getConnection(databaseConnection, databaseUsername, databasePassword),
      new MySQLAdapter))
  }

Step 5 – Generate schema

Once we’ve initialized the database session, we can generate the database schema as shown below:

  import org.squeryl.PrimitiveTypeMode._
  import net.srirangan.opensource.squerylexample.schema.Schema

  startDatabaseSession()
    
  transaction {
    Schema.create
    println("Created the schema")
  }

Step 6 – Insert and Update

  import net.srirangan.opensource.squerylexample.entities.User

  transaction {
    val user1:User = new User("user1@domain.com", "oldPassword")
    Schema.users.insert(user1)
    println("Inserted user1")
    
    user1.password = "newPassword"
    Schema.users.update(user1)
    println("Updated user1")
  }

Step 7 – Select

    transaction {
      val queriedUser:User = Schema.users.where(user => user.id === 2L).single
      println(queriedUser.id + " -- " + queriedUser.email)
    }

The entire squeryl-example source code is available at github.com/Srirangan/squeryl-example

More examples on Squeryl.org.


Srirangan is a programmer / senior consultant with Inphina Technologies
Blog   GitHub   LinkedIn   Twitter

Categories: Scala Tags: , , ,

Build a simple web crawler with Scala and GridGain

March 10, 2011 4 comments

Inphina provides specialized Scala consulting, training and offshoring … Learn more about Scala@Inphina!

(Cross posted on my personal blog)

Recently, as a proof-of-concept, I had to build a crawler. Of course I cannot share many details about that project, other than to state that it's an absolute privilege to be part of. :-)

I set out to build this crawler.

Prior experience of pairing with Narinder and Vikas had made me aware of distributed computing technologies such as Hadoop and GridGain, so I knew that was my start. Based on past experience, I immediately picked GridGain over Hadoop, for pretty obvious reasons too: more examples, better support etc.

My next choice was a programming language. Java was the obvious choice but I took a risk and chose Scala. GridGain's support for Scala and its abundance of examples made this choice a bit easier. A quick, unofficial definition for those unaware: Scala is an object-functional programming language that is very attractive to programmers and has proved itself in high-scalability situations (Twitter, LinkedIn, Foursquare etc.).

Note – I am new to Scala and my Scala code may look more Java-like than functional. I'm still learning and future examples should be better. "Awesomeness of Scala code" is not a valid parameter by which to judge this blog post!

Professional etiquette (and NDAs + lawyers) will not allow me to share exact details of this crawler. After all, it is not my intellectual property. But for the sake of this example I will consider my target to be a simple web crawler that would be used by search engines to index the content on the internet.

What would our web crawler do?

  1. Start at some base URL
  2. Index content of this URL
  3. Search for more URLs to index
  4. Repeat 2 & 3 for these new URLs

This blog post will not get into the operational logic of loading a URL, extracting keywords, adding to an index, extracting URLs etc. That, I believe, has been done to death. Instead, I will look at how to scale up the crawling process using Scala and GridGain.

If you are already familiar with GridGain, for the sake of this example I would request you to merge the concepts of a GridTask and a GridJob. Here we will create custom GridTasks, each with one corresponding, unique custom GridJob.

Our GridTask-GridJob Pairs will be:

  • LoadUrlDataTask, LoadUrlDataJob
  • IndexKeywordsTask, IndexKeywordsJob

Much of the game is being played in LoadUrlDataJob. Its role is envisioned as follows:

  1. Make HTTP request to URL
  2. Gather response data from URL
  3. Trigger IndexKeywordsTask for URL data
  4. Fetch new URLs from URL data
  5. Trigger LoadUrlDataTask for new URLs

While the rest have simple roles:

  • LoadUrlDataTask = Return one LoadUrlDataJob
  • IndexKeywordsTask = Return one IndexKeywordsJob
  • IndexKeywordsJob = Parse data and index keywords

In other words, an IndexKeywords job indexes keywords and dies. In contrast, a LoadUrlData job triggers exactly one IndexKeywords job and potentially multiple LoadUrlData jobs.

Let’s look at the sources:

package net.srirangan.simplewebcrawler.tasks

import java.lang.String
import java.util.{List,ArrayList}
import org.gridgain.grid._
import net.srirangan.simplewebcrawler.jobs.LoadUrlJob

class LoadUrlTask extends GridTaskNoReduceSplitAdapter[String] {

  def split(gridSize:Int, url:String):List[GridJob] = {
    val jobs:List[GridJob] = new ArrayList[GridJob]()
    val job:GridJob = new LoadUrlJob(url)
    jobs.add(job)
    jobs
  }
  
}

package net.srirangan.simplewebcrawler.jobs

import java.lang.String
import java.util.{List,ArrayList}
import org.gridgain.grid.GridJobAdapterEx
import org.gridgain.scalar.scalar._
import net.srirangan.simplewebcrawler.tasks.{LoadUrlTask,IndexKeywordsTask}

class LoadUrlJob(url:String) extends GridJobAdapterEx {
  def execute():Object = {
    println("load url for - " + url)

    val data:String = "this is data for " + url
    val urls:List[String] = new ArrayList[String]()

    //
    // .. actual parser logic comes here
    // .. data:String will contain the contents of url:String
    // .. urls:List is a list of all new URLs found in data:String
    //
    
    // Start indexing keywords for data:String from url:String
    grid.execute(classOf[IndexKeywordsTask], data).get
    
    // adding dummy url in urls:List
    urls.add(url + ".1")

    // start load url for urls:List
    // (reuse a single iterator; calling urls.iterator on every check would restart the iteration)
    val urlIterator = urls.iterator
    while (urlIterator.hasNext()) {
      val nextUrl: String = urlIterator.next()
      grid.execute(classOf[LoadUrlTask], nextUrl).get
    }

    data
  }
}

package net.srirangan.simplewebcrawler.tasks

import java.lang.String
import java.util.{List,ArrayList}
import org.gridgain.grid.GridJob
import org.gridgain.grid.GridTaskNoReduceSplitAdapter
import net.srirangan.simplewebcrawler.jobs.IndexKeywordsJob

class IndexKeywordsTask extends GridTaskNoReduceSplitAdapter[String] {

  protected def split( gridSize:Int, url:String):List[GridJob] = {
    val jobs:List[GridJob] = new ArrayList[GridJob]()
    val job:GridJob = new IndexKeywordsJob(url)
    jobs.add(job)
    jobs
  }
  
}

package net.srirangan.simplewebcrawler.jobs

import java.lang.String
import org.gridgain.grid.GridJobAdapterEx
import org.gridgain.scalar.scalar._

class IndexKeywordsJob(data:String) extends GridJobAdapterEx {
  def execute():Object = {
    println(data)
    // .. actual indexing logic comes here
    null
  }
}
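
To kick off a crawl (step 1 in the outline above), the first LoadUrlTask has to be submitted from somewhere. A minimal launcher might look like the sketch below; this assumes Scalar's scalar { ... } closure starts and stops a default node, and reuses the grid accessor exactly as LoadUrlJob does above. Names and the base URL are illustrative.

package net.srirangan.simplewebcrawler

import org.gridgain.scalar.scalar
import org.gridgain.scalar.scalar._
import net.srirangan.simplewebcrawler.tasks.LoadUrlTask

object CrawlerLauncher extends App {
  // start a local GridGain node, submit the first crawl task for the base URL, then shut down
  scalar {
    grid.execute(classOf[LoadUrlTask], "http://example.com").get
  }
}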

Complete Mavenized sources for Scala GridGain SimpleWebCrawler can be found on GitHub.com – https://github.com/Srirangan/simplewebcrawler

A quick look at the role of LoadUrlDataJob tells us that this needs to scale, and scale big. Here is a visualization showing three levels of LoadUrlData, wherein each LoadUrlDataJob spawns three other LoadUrlDataJobs and one IndexKeywords job.

GridGain takes care of this seamlessly and divides the tasks among the available nodes without any configuration or instruction. Here are screenshots showing three GridGain nodes, one inside my IDE and the other two on the console.

Is this a perfect web crawler? No. Far from it. For one, you need to control its spawn-rate else your machine will die. :-)

But it is an example that does showcase the power of GridGain and the ease with which Scala / Scalar can leverage it.


Srirangan is a programmer / senior consultant with Inphina Technologies
Blog   GitHub   LinkedIn   Twitter
