Thursday, February 03, 2011

Yearning to Be at AGBT, Day 2

Day 2 of AGBT had a Twitter feed stuffed with complaints about the meeting's Twitter/blogging policy, which is opt-in, not opt-out. While many understood why Rick Wilson's presentation of a colleague's unpublished data shouldn't be tweeted out of respect, there was clear frustration with talks that had no obvious reason for going dark -- especially ones dominated by (or perhaps consisting entirely of?) published data. Plus, a number of those monitoring Twitter mistook the policy for a no-blogging rule. Anthony Fejes has been blogging from the meeting and is keeping a running list of who isn't allowing tweeting/blogging. Tonight's sessions seemed to be particularly untweetable.

Alas, some of the talks I wanted to sit in on weren't tweetable. Amanda is very disappointed that the first dog talk of the conference couldn't be yipped (though the work is supposedly in press), and apparently another talk (or perhaps the same one) full of kitten pictures couldn't be mewed either.

Another big problem at the meeting has been poor WiFi connectivity -- and some trying to use cellular data connections have had problems too. Not an uncommon problem with public WiFi, but a bit embarrassing at a tech-centric meeting.

An odd Twitter comment was one claiming more tweets about the parties than the meeting itself; someone is watching a different Twitter feed than I am! On a positive note, #AGBT seems to have thoroughly squashed #AGBT2011; aside from a few holdout mugwumps, nobody is tagging only with the longer version.

At the other end of things, it was rather odd for one vendor (who shall remain nameless here) to post a bad photo (rotten angle, terrible lighting) of their poster via Twitter. Why not post either a good picture or, better yet, the PDF of the poster?!

On the positive side, Fejes.ca has nice notes on a number of talks. A few extracted highlights: Steven Salzberg announced that the next version of Bowtie will fully support indels. He also presented data suggesting it is still much faster than competing aligners (such as BWA) yet has similar or perhaps even better sensitivity. He apparently gave a very nice backgrounder, too, on the Burrows-Wheeler transform at the core of Bowtie, BWA and several other fast aligners -- I think I understand it, but could certainly use a good tutorial.
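As a digression for my own benefit: the core idea of the BWT fits in a few lines of Perl. Here's a toy sketch of my own -- append a sentinel, sort all rotations of the string, and read off the last column. This is emphatically not how Bowtie or BWA build their indexes (they use much cleverer suffix-array-based constructions that never materialize the rotations); it's just the textbook definition.

#!/usr/bin/perl
# Toy Burrows-Wheeler transform: append a '$' sentinel, sort every
# rotation of the string, and read off the last column.
use strict;
use warnings;

sub bwt {
    my ($s) = @_;
    $s .= '$';   # sentinel; '$' sorts before A/C/G/T in ASCII
    my $n = length($s);
    my @rotations = sort map { substr($s, $_) . substr($s, 0, $_) } 0 .. $n - 1;
    return join('', map { substr($_, -1) } @rotations);
}

print bwt("GATTACA"), "\n";   # prints ACTGA$TA

What the aligners exploit is that this transform is invertible and tends to cluster identical characters together, which is what makes the compact, searchable FM-index possible.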

Lots of good cancer talks it seems; Fejes.ca has notes on each. Really, really wish I were there. Also an interesting stat: Broad apparently devotes 42 CPUs to processing data from each HiSeq. YIKES!

Most of the talks tonight went tweetless, but BioNanomatrix apparently impressed with a movie of 400 Kb DNA moving through a channel (unlike a much-complained-about movie from PacBio). It's not clear how close this is to practical utility.

On the vendor side, the big news is Complete Genomics releasing 40 genomes this month and 20 more next month. These include a 17-member, 3-generation CEPH family and two trios. Multiple ethnic groups are represented (Northern European, Italian, Mexican, Chinese, Japanese, Yoruban and two different tribes from Kenya). Their announcement is the first I've heard of their open-source toolkit CGATools for accessing their data. Also, the data is not only on their FTP site but also available through a cloud environment called Bionimbus -- something I'll need to look at further.

One final Twitter note: I suspect that none of the vendors who hoped to have their products highlighted in academic talks are happy that most of those talks went tweetless. On the other hand, a lot of companies did not allow tweeting of their own talks -- but these were early-stage companies that don't yet need sales.

Oh, one tidbit from yesterday I forgot to mention. One group (I think it was Sanger) was doing something with BACs for physical mapping. The one tweet I saw wasn't detailed (are they ever?): was this with arrayed BACs, or with solution pools that have never been picked? I'd be curious to know for something else I'm writing for this space.

One final note: I tried out my Perl code and discovered the search API has a 100-tweet return limit per request. So I modified the code to step through to older tweets. Here is the new version, which will slurp up to 10K messages (just raise the loop variable $max to go even crazier; #AGBT is only around 500 tweets right now).
#!/usr/bin/perl
# Slurp #AGBT tweets from the Twitter search API, paging back through
# older results via max_id, and print them as tab-delimited lines.
use strict;
use warnings;
use Net::Twitter::Lite;

my $nt = Net::Twitter::Lite->new(
    username => $ENV{'TWITTER_USERNAME'},
    password => $ENV{'TWITTER_PASSWORD'}
);

my $queryTerm  = "#AGBT -RT";
my $searchHash = { 'q' => $queryTerm, 'rpp' => 100 };   # search API returns at most 100 per request
my $max        = 100;   # pages to fetch: 100 pages x 100 tweets = up to 10K

my $outCount  = 0;
my $lineCount = 0;
my @statuses  = ();
for (my $i = 0; $i < $max; $i++) {
    my $r = $nt->search($searchHash);
    last if (scalar(@{ $r->{'results'} }) == 0);   # no older tweets left
    push(@statuses, @{ $r->{'results'} });
    # Progress marker; the '#' prefix makes these lines easy to filter downstream
    print "#\t", $i, "\t", $statuses[$#statuses]->{'id'}, "\n";
    # Next request: only tweets strictly older than the last one seen
    $searchHash->{'max_id'} = $statuses[$#statuses]->{'id'} - 1;
}

foreach my $status (@statuses) {
    my $origText = $status->{'text'};
    $lineCount++;
    next if ($origText =~ /^RT/);   # skip any retweets the -RT query missed
    my $text = $origText;
    my $url  = "";
    if ($text =~ m/(http:[^ ]+)/) {
        $url = $1;
        $text =~ s/ ?\Q$url\E//;    # \Q..\E so URL punctuation isn't read as regex
    }
    my @hashes = ();
    while ($text =~ m/ \#([^ ]+)/g) {
        push(@hashes, $1);
    }
    $text =~ s/ \#([^ ]+)//g;
    $text =~ s/ +$//;
    print join("\t",
        $status->{'created_at'}, $status->{'from_user'}, $url,
        join(',', grep(/[a-z]/i, @hashes)), $text), "\n";
    $outCount++;
}
print STDERR "Done! $outCount/$lineCount output\n";
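
To run it, stick the credentials in the environment (the variable names are whatever the script reads above) and capture the tab-delimited output -- date, user, URL, hashtags and stripped text, one tweet per line. Something like this, with slurp_agbt.pl standing in for whatever you've named the file:

TWITTER_USERNAME=yourname TWITTER_PASSWORD=yourpass ./slurp_agbt.pl > agbt_tweets.txt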


1 comment:

Rick said...

"... Also an interesting stat: Broad apparently devotes 42 CPUs to processing data from each HiSeq. YIKES!..."

Assuming you are talking about CPU/cores instead of actual computers, 42 CPUs doesn't sound like a "YIKES!" moment to me.

We don't have a HiSeq (will have a HiScan soon) but do have a SOLiD. I regularly grab 5 or so of our 16-CPU nodes for secondary processing -- say 80 CPUs. Our newest nodes are 24-CPU boxes with 96 or 192 GB of memory. Those cost $4,500 and $12,300 respectively (including power and maintenance for 5 years). So in terms of both cost and need, 42 CPUs is ho-hum. :-) What is much more costly is fast, large storage. Having 40-80 CPUs reading/writing 10 Gb/s to the same disk at the same time can bring a lot of storage solutions to their knees.

PS: I am enjoying your summary of the tweets.