Thursday, January 28, 2010

A little more Scala

I can't believe how thrilled I was to get a run-time error today! Because that was the first sign I had gotten past the Scala roadblock I mentioned in my previous post. It would have been nicer for the case to just work, but apparently my SAM file was incomplete or corrupt. But, moments later it ran correctly on a BAM file. For better or worse, I deserve nearly no credit for this step forward -- Mr. Google found me a key code example.

The problem I faced is that I have a Java class (from the Picard library for reading alignment data in SAM/BAM format). To get each record, an iterator is provided. But my first few attempts to guess the syntax just didn't work, so it was off to Google.

My first working version is

package hello
import java.io.File
import org.biojava.bio.Annotation
import org.biojava.bio.seq.Sequence
import org.biojava.bio.seq.impl.SimpleSequence
import org.biojava.bio.symbol.SymbolList
import org.biojava.bio.program.abi.ABITrace
import org.biojava.bio.seq.io.SeqIOTools
import net.sf.samtools.SAMFileReader

object HelloWorld extends Application {

val samFile=new File("C:/workspace/short-reads/aln.se.2.sorted.bam")
val inputSam=new SAMFileReader(samFile)
var counter=0

var recs=inputSam.iterator
while (recs.hasNext)
{
var samRec=recs.next;
counter=counter+1
}

println("records: ",counter);

Ah, sweet success. But, while that's a step forward it doesn't really play with anything novel that Scala lends me. The example I found this in was actually implementing something richer, which I then borrowed (same imports as before)

First, I define a class which wraps an iterator and defines a foreach method:

class IteratorWrapper[A](iter:java.util.Iterator[A])
{
def foreach(f: A => Unit): Unit = {
while(iter.hasNext){
f(iter.next)
}
}
}

Second, is the definition within the body of my object of a rule which allows iterators to be automatically converted to my wrapper object. Now, this sounds powerfully dangerous (and vice versa). A key constraint is Scala won't do this if there is any ambiguity -- if there are multiple legal solutions to what to promote to, it won't work. Finally, I rewrite the loop using the foreach construct.

object HelloWorld extends Application {
implicit def iteratorToWrapper[T](iter:java.util.Iterator[T]):IteratorWrapper[T] = new IteratorWrapper[T](iter)

val samFile=new File("C:/workspace/short-reads/aln.se.2.sorted.bam")
val inputSam=new SAMFileReader(samFile)
var counter=0

for (val samRec<-recs)
{
counter=counter+1
}
println("records: ",counter);

Is this really better? Well, I think so -- for me. The code is terse but still clear. This also saves a lot of looking up some standard wordy idioms -- for some reason I never quite locked in the standard read-lines-one-at-a-time loop in C# -- always had to copy an example.

You can take some of this a bit far in Scala -- the syntax allows a lot of flexibility and some of the examples in the O'Reilly book are almost scary. I probably once would have been inspired to write my own domain specific language within Scala, but for now I'll pass.

Am I taking a performance hit with this? Good question -- I'm sort of trusting that the Scala compiler is smart enough to treat this all as syntactic sugar, but for most of what I do performance is well behind readibility and ease of coding & maintenance. Well, until the code becomes painfully slow.

I don't have them in front of me, but I can think of examples from back at Codon where I wanted to treat something like an iterator -- especially a strongly typed one. C# does let you use for loops using anything which implements the IEnumerable interface, but it can get tedious to wrap everything up when using a library class which I think should implement IEnumerable but the designer didn't.

I still have some playing to do, but maybe soon I'll put something together that I didn't have code to do previously. That would be a serious milestone.

1 comment:

Mohieddine said...

hi DR keith robison

i want know the difference between bio informatic and computational biology . i am student first year doctorat i work with systeme complexe and optimization (genetic algorithm. pso .ant colony. )

i want enter in this domain and i want know some problems

please give me your mail for contact
thank you