<!DOCTYPE html>
<!--[if IEMobile 7 ]><html class="no-js iem7"><![endif]-->
<!--[if lt IE 9]><html class="no-js lte-ie8"><![endif]-->
<!--[if (gt IE 8)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js" lang="en"><!--<![endif]-->
<head>
<meta charset="utf-8">
<title>Big Data Genomics</title>
<meta name="author" content="Big Data Genomics">
<!-- http://t.co/dKP3o1e -->
<meta name="HandheldFriendly" content="True">
<meta name="MobileOptimized" content="320">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="canonical" href="http://bigdatagenomics.github.io">
<link href="/favicon.png" type="image/png" rel="icon">
<link href="/atom.xml" rel="alternate" title="Big Data Genomics" type="application/atom+xml">
<link href="/assets/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet" type="text/css">
<link href="/assets/bootstrap/dist/css/bootstrap-theme.min.css" rel="stylesheet" type="text/css">
<link href="/stylesheets/screen.css" media="screen, projection" rel="stylesheet" type="text/css">
<script src="/javascripts/libs/jquery/jquery-2.0.3.min.js"></script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-49914824-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body >
<div id="wrap">
<header role="banner">
<nav class="navbar navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">Big Data Genomics</a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li class="active">
<a href="/">Blog</a>
</li>
<li >
<a href="/blog/archives">Archives</a>
</li>
<li >
<a href="/projects">Projects</a>
</li>
<li >
<a href="/mail">Mailing List</a>
</li>
<li >
<a href="/cla">CLAs</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a class="subscribe-rss" href="/atom.xml" title="subscribe via RSS">
<span class="visible-xs">RSS</span>
<img class="hidden-xs" src="/images/rss.png" alt="RSS">
</a>
</li>
</ul>
<form class="search navbar-form navbar-right" action="http://google.com/search" method="GET">
<input type="hidden" name="q" value="site:bigdatagenomics.github.io">
<div class="form-group">
<input class="form-control" type="text" name="q" placeholder="Search">
</div>
</form>
</div>
</div>
</nav>
</header>
<div id="main" class="container">
<div id="content">
<img src="/images/github.png" /><a href="https://github.com/bigdatagenomics/">https://github.com/bigdatagenomics/</a> <img src="/images/twitter.png"/><a href="https://twitter.com/bigdatagenomics">https://twitter.com/bigdatagenomics/</a><br/>
<div class="row">
<div class="page-content col-md-9">
<div class="blog-index">
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2018-12-01T00:00:00-08:00" pubdate data-updated="true">Dec 1<span>st</span>, 2018</time>
</p>
<h1 class="entry-title"><a href="/blog/2018/12/01/adam-0-dot-25-dot-0-cannoli-0-dot-3-dot-0-releases/">ADAM 0.25.0 and Cannoli 0.3.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM <a href="https://github.com/bigdatagenomics/adam/releases">version 0.25.0</a> and
Cannoli <a href="https://github.com/bigdatagenomics/cannoli/releases">version 0.3.0</a> have been released!</p>
<p>Since the 0.24.0 release of ADAM, more than 40 issues have been closed, including bug fixes around
indexed reads and attributes in VCF. New features include additional filter-by methods and multi-sample
coverage. The ADAM Python APIs now support Python 3.</p>
<p>Based on feedback from the <a href="https://www.open-bio.org/wiki/BOSC_2018">2018 GCCBOSC bioinformatics community conference</a>,
the Cannoli API was refactored at the <a href="https://galaxyproject.org/events/gccbosc2018/collaboration/">2018 GCCBOSC CollaborationFest</a>
to greatly improve interactive use in <code>cannoli-shell</code> (a Scala REPL based on Spark Shell, similar
to <code>adam-shell</code>) and in notebooks such as <a href="https://jupyter.org/">Jupyter</a>, <a href="https://zeppelin.apache.org/">Zeppelin</a>,
and <a href="http://spark-notebook.io/">Spark Notebook</a>.</p>
<p>For example, here is an entire variant calling pipeline based on BWA, ADAM, and FreeBayes:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.cannoli.cli._
</span><span class='line'>import org.bdgenomics.cannoli.cli.Cannoli._
</span><span class='line'>
</span><span class='line'>val sample = "sample"
</span><span class='line'>val reference = "ref.fa"
</span><span class='line'>
</span><span class='line'>val reads = sc.loadPairedFastqAsFragments(sample + "_1.fq", sample + "_2.fq")
</span><span class='line'>
</span><span class='line'>val bwaArgs = new BwaArgs()
</span><span class='line'>bwaArgs.sample = sample
</span><span class='line'>bwaArgs.indexPath = reference
</span><span class='line'>
</span><span class='line'>val alignments = reads.alignWithBwa(bwaArgs)
</span><span class='line'>val sorted = alignments.sortReadsByReferencePositionAndIndex()
</span><span class='line'>val markdup = sorted.markDuplicates()
</span><span class='line'>
</span><span class='line'>val freebayesArgs = new FreebayesArgs()
</span><span class='line'>freebayesArgs.referencePath = reference
</span><span class='line'>
</span><span class='line'>val variantContexts = markdup.callVariantsWithFreebayes(freebayesArgs)
</span><span class='line'>
</span><span class='line'>variantContexts.saveAsVcf(sample + ".freebayes.vcf.bgzf")</span></code></pre></td></tr></table></div></figure>
<h1>Changes since Previous Releases</h1>
<p>The full list of changes to ADAM since version 0.24.0 and to Cannoli since version 0.2.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2018/12/01/adam-0-dot-25-dot-0-cannoli-0-dot-3-dot-0-releases/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2018-03-28T00:00:00-07:00" pubdate data-updated="true">Mar 28<span>th</span>, 2018</time>
</p>
<h1 class="entry-title"><a href="/blog/2018/03/28/adam-0-dot-24-dot-0-cannoli-0-dot-2-dot-0-releases/">ADAM 0.24.0 and Cannoli 0.2.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM <a href="https://github.com/bigdatagenomics/adam/releases">version 0.24.0</a> and
Cannoli <a href="https://github.com/bigdatagenomics/cannoli/releases">version 0.2.0</a> have been released!</p>
<p>As of version 0.24.0, support for Spark version 1.x and Scala 2.10.x has been dropped. ADAM and
Cannoli currently build against Spark version 2.3.0 and Scala version 2.11.12.</p>
<p>Major new features in ADAM version 0.24.0 include Spark SQL support across all genomic data
types and access to the ADAM region join API through Python and R. The ADAM Python and R APIs are
now feature complete relative to ADAM’s Java API. ADAM version 0.24.0 also introduces
Hive-style partitioning by genomic range for Parquet-backed Datasets. This greatly improves
performance for genomic range-based queries.</p>
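<p>To illustrate why partitioning by genomic range speeds up range queries, here is a minimal Python
sketch. It is a toy model rather than ADAM's implementation: the 1,000,000 bp bin size, the
directory naming scheme, and both helper functions are assumptions made for illustration. The point
is that a query for a genomic interval only needs the bins that overlap it, so all other partition
directories can be skipped without being read.</p>

```python
# Toy model of Hive-style partitioning by genomic range.
# BIN_SIZE, the directory naming scheme, and both helper functions are
# illustrative assumptions, not ADAM's actual on-disk layout or API.

BIN_SIZE = 1_000_000  # assumed width of one partition, in base pairs


def partition_dir(contig, start):
    """Return the partition directory that holds a record starting at `start`."""
    return "contigName={}/positionBin={}".format(contig, start // BIN_SIZE)


def dirs_for_query(contig, start, end):
    """List the partition directories overlapping the half-open range [start, end)."""
    first_bin = start // BIN_SIZE
    last_bin = (end - 1) // BIN_SIZE
    return [
        "contigName={}/positionBin={}".format(contig, b)
        for b in range(first_bin, last_bin + 1)
    ]


if __name__ == "__main__":
    # A 2.5 Mbp query touches three bins; no other directory is opened.
    for d in dirs_for_query("1", 26_400_000, 28_900_000):
        print(d)
```

<p>In this sketch, a whole-genome dataset split into thousands of bins would answer the query above by
reading only three of them, which is where the performance gain for range queries comes from.</p>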
<p>With version 0.2.0, Cannoli now provides a functional API for interactive use in
<code>cannoli-shell</code> (a Scala REPL based on Spark Shell, similar to <code>adam-shell</code>) and
notebooks such as <a href="https://jupyter.org/">Jupyter</a>, <a href="https://zeppelin.apache.org/">Zeppelin</a>,
and <a href="http://spark-notebook.io/">Spark Notebook</a>. This API allows multiple
Cannoli-wrapped bioinformatics tools to run as processes in a larger Spark-based workflow
without writing intermediate results to disk.</p>
<h1>Changes since Previous Releases</h1>
<p>The full list of changes to ADAM since version 0.23.0 and to Cannoli since version 0.1.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2018/03/28/adam-0-dot-24-dot-0-cannoli-0-dot-2-dot-0-releases/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2018-01-04T09:47:53-08:00" pubdate data-updated="true">Jan 4<span>th</span>, 2018</time>
</p>
<h1 class="entry-title"><a href="/blog/2018/01/04/adam-0-dot-23-dot-0-released-plus-avocado-cannoli-and-deca-releases/">ADAM 0.23.0 Released (+ Avocado and DECA Releases)</a></h1>
</header>
<div class="entry-content clearfix"><p>We are excited to announce the availability of the ADAM 0.23.0 release, along
with releases of Avocado germline variant caller (release 0.1.0) and the DECA
copy number variant caller (release 0.2.0). These releases contain an extensive
number of feature additions, performance improvements, and bug patches, with
over 375 issues closed and pull requests merged or closed since the last ADAM
release.</p>
<p>Some of the highlights include:</p>
<ul>
<li>A validated, high-performance end-to-end alignment/variant calling pipeline
using ADAM, Cannoli, and Avocado.</li>
<li>Support for manipulating data using Spark SQL.</li>
<li>R and Python APIs for ADAM, including the ability to get a working deployment
of ADAM simply by running <code>pip install bdgenomics.adam</code>.</li>
</ul>
<p>With this release, we have also moved our documentation to Read The Docs:</p>
<ul>
<li><a href="http://adam.readthedocs.io/en/latest/">Read the Docs for ADAM</a></li>
<li><a href="http://bdg-avocado.readthedocs.io/en/latest/">Read the Docs for Avocado</a></li>
<li><a href="http://bdg-deca.readthedocs.io/en/latest/">Read the Docs for DECA</a></li>
</ul>
<p>This documentation describes how to deploy our tools on a variety of platforms,
including local clusters and cloud computing environments, and how to run them through the
<a href="https://github.com/bd2kgenomics/toil">Toil</a> workflow manager. We already have
a <code>pip</code>-installable Toil workflow for calling copy number variants with DECA,
which is packaged as part of the
<a href="http://bdg-workflows.readthedocs.io/en/latest/">bdgenomics.workflows</a> library.</p>
<p>This release is the last release of ADAM that supports Spark 1.x and Scala 2.10.
The upcoming release of ADAM will only support Spark 2.x and Scala 2.11. Avocado
and DECA have already dropped support for Spark 1.x.</p>
<p>Over the next few weeks, we will be working on a release of
<a href="https://github.com/bigdatagenomics/cannoli">Cannoli</a>, as well as Toil workflows
for running the ADAM/Avocado/Cannoli variant calling pipeline, and a preprint
describing the pipeline in more depth. We also are working on a release of the
<a href="https://github.com/bigdatagenomics/mango">Mango</a> visualization tool, which uses
ADAM as a backend for interactively visualizing large genomics datasets. Stay
tuned for more info!</p>
<h1>Variant Calling with Cannoli, ADAM, Avocado, and DECA</h1>
<p>With the collection of tools we have released, you can run fast and
accurate variant calling entirely in Apache Spark. While we have introduced
Avocado and DECA earlier in this post, we haven’t talked about Cannoli yet.
Cannoli (Italian for “a little pipe”) uses ADAM’s <a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/pipes/">pipe API</a>
to parallelize commonly used genomics tools. Currently, Cannoli supports
aligning reads with Bowtie, Bowtie2, and BWA; calling variants with FreeBayes;
and annotating variant effects with SnpEff. We are working on support for many
more tools, as you can see in our <a href="https://github.com/bigdatagenomics/cannoli/issues">issue tracker</a>.
Please let us know if you are interested in any specific tool or, even
better, in helping us add support for a specific tool. ADAM’s pipe API makes
it extremely easy to parallelize an existing single-node genomic analysis tool,
and most tools can be implemented on top of the pipe API in fewer than 10 lines
of code. For example, here’s how you could launch BWA using ADAM’s pipe API in
Python:</p>
<p><img class="center" src="/images/pipe.png" width="750"></p>
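<p>The image above shows ADAM's own API. As a rough, library-independent illustration of the
underlying pattern, the following Python sketch splits records into partitions, streams each
partition through an external command over standard input and output, and concatenates the
results. The helper names <code>pipe_partition</code> and <code>pipe_all</code> and the stand-in
command are hypothetical and are not part of ADAM or Cannoli. In ADAM, Spark distributes the
partitions across a cluster, and the command would be a tool such as BWA.</p>

```python
import subprocess
import sys


def pipe_partition(records, command):
    """Stream one partition's records through `command` via stdin/stdout."""
    result = subprocess.run(
        command,
        input="".join(r + "\n" for r in records),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


def pipe_all(partitions, command):
    """Apply `command` to each partition in turn and concatenate the outputs."""
    output = []
    for records in partitions:
        output.extend(pipe_partition(records, command))
    return output


# Stand-in for a real single-node tool: upper-cases every line it reads.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    partitions = [["acgt", "ttga"], ["ggcc"]]
    print(pipe_all(partitions, UPPERCASE))
```

<p>Because the wrapped tool only sees lines on standard input, it needs no knowledge of Spark. This
sketch runs the partitions one after another; a cluster framework would run them in parallel.</p>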
<p>By using Cannoli, we can accelerate alignment with BWA to take approximately
10 to 15 minutes when running on a 1,024-core cluster.</p>
<p>We can couple this rapid alignment pipeline with the fast preprocessing stages in
ADAM and the variant calling stages in Avocado to call variants on a 60x coverage WGS
dataset in approximately 45 minutes on a 1,024 core cluster. Avocado can be used to
call variants on a single sample, or to jointly call variants using a <a href="http://bdg-avocado.readthedocs.io/en/latest/workflows/joint.html">gVCF-based
workflow</a>. When
running on 1,024 cores, we were able to jointly genotype more than 10TB of gVCFs
within approximately 6 hours. Avocado has >99% accuracy when genotyping SNPs,
and >96% accuracy when genotyping INDELs. Detailed benchmarking results can be
found in <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-204.pdf">Chapter 8 of this thesis</a>.
Avocado is two times faster than the GATK4’s Spark-based implementation of the
HaplotypeCaller, although it is worth pointing out that this is an unfair
comparison, as the HaplotypeCaller performs local reassembly, while Avocado does
not.</p>
<p>One interesting comparison is between the duplicate marking and BQSR tools in
ADAM and in the GATK4. In both cases, ADAM’s implementation is faster than the
GATK4’s equivalent implementation.</p>
<p><img class="center" src="/images/speedup-md.png"></p>
<p><img class="center" src="/images/speedup-bqsr.png"></p>
<p>We have work in progress on a Spark SQL-based implementation of duplicate
marking, which will provide an additional >20% performance improvement. We hope to
introduce this new duplicate marker in the 0.24.0 release of ADAM.</p>
<h1>Manipulating Data using Spark SQL</h1>
<p>Since Apache Spark 1.6, there has been a major push in the Spark project to
rearchitect Spark around the Catalyst query optimizer and the Tungsten code
execution engine. These two engines are hidden behind Spark SQL’s DataFrame
and Dataset APIs, which provide a SQL-like interface for manipulating data
using Spark. Unlike Spark’s Resilient Distributed Dataset (RDD) API, the
DataFrame API allows the Catalyst query optimizer to examine the function that
the user is running. Catalyst can then rewrite the query so that it runs in a
more efficient manner, and can implement the query using the Tungsten engine
with performance that approaches native performance. This can provide
order-of-magnitude performance improvements for some queries, and it also
provides users with uniform query performance across Scala, Java, SQL, Python,
and R.</p>
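<p>To make the difference concrete, here is a toy Python sketch of why an inspectable query can be
optimized while an opaque function cannot. It is not Catalyst, and it does not use Spark: the
<code>Equals</code> and <code>Chunk</code> types and both scan functions are hypothetical names
invented for this illustration. A filter passed as a function must be run on every row, whereas a
filter described as data can be checked against per-chunk statistics so that whole chunks are
skipped.</p>

```python
from collections import namedtuple

# Hypothetical predicate node: "column == value", expressed as plain data.
Equals = namedtuple("Equals", ["column", "value"])
# Hypothetical storage chunk: its rows plus the distinct values seen per column.
Chunk = namedtuple("Chunk", ["distinct", "rows"])


def scan_opaque(chunks, func):
    """Opaque filter: every row of every chunk is read and tested."""
    rows_read = 0
    kept = []
    for chunk in chunks:
        for row in chunk.rows:
            rows_read += 1
            if func(row):
                kept.append(row)
    return kept, rows_read


def scan_declarative(chunks, pred):
    """Declarative filter: chunks whose statistics rule out a match are skipped."""
    rows_read = 0
    kept = []
    for chunk in chunks:
        if pred.value not in chunk.distinct[pred.column]:
            continue  # skipped without reading any rows
        for row in chunk.rows:
            rows_read += 1
            if row[pred.column] == pred.value:
                kept.append(row)
    return kept, rows_read


if __name__ == "__main__":
    chunks = [
        Chunk({"readMapped": {True}}, [{"readMapped": True}, {"readMapped": True}]),
        Chunk({"readMapped": {False}}, [{"readMapped": False}]),
    ]
    # Same rows are kept either way, but the declarative scan reads fewer rows.
    print(scan_opaque(chunks, lambda row: row["readMapped"])[1])
    print(scan_declarative(chunks, Equals("readMapped", True))[1])
```

<p>Both scans return the same rows. Only the declarative one can avoid touching the second chunk,
because the engine can see which column and value the filter depends on.</p>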
<p>Although Spark SQL’s DataFrame API was introduced in 2015, we were not able to take advantage
of Spark SQL in ADAM until recently. While ADAM has always described genomics
data using a set of schemas, the library we used to represent these schemas
(<a href="https://avro.apache.org">Apache Avro</a>) was not compatible with Spark SQL. To
resolve this, we updated our core <a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/genomicRdd/"><code>GenomicRDD</code> interfaces</a>
to transparently convert between Spark’s RDD and DataFrame/Dataset APIs. We
describe the architecture we use for converting between these two representations
<a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/genomicRdd/#transforming-genomicrdds-via-spark-sql">here</a>.
With the Spark SQL query interfaces built into <code>GenomicRDD</code>s, you can begin
running SQL queries on genomic data in fewer than 5 lines of code:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>$ adam-shell
</span><span class='line'>
</span><span class='line'>Welcome to
</span><span class='line'>      ____              __
</span><span class='line'>     / __/__  ___ _____/ /__
</span><span class='line'>    _\ \/ _ \/ _ `/ __/  '_/
</span><span class='line'>   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
</span><span class='line'>      /_/
</span><span class='line'>
</span><span class='line'>Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
</span><span class='line'>
</span><span class='line'>scala> import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>
</span><span class='line'>scala> val reads = sc.loadAlignments("adam-core/src/test/resources/small.sam")
</span><span class='line'>reads: org.bdgenomics.adam.rdd.read.AlignmentRecordRDD = RDDBoundAlignmentRecordRDD with 2 reference sequences, 0 read groups, and 2 processing steps
</span><span class='line'>
</span><span class='line'>scala> reads.transformDataset(_.filter("readMapped=true")).dataset.show
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+
</span><span class='line'>|readInFragment|contigName| start|oldPosition| end|mapq| readName| sequence|qual|cigar|oldCigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual| attributes|recordGroupName|recordGroupSample|mateAlignmentStart|mateContigName|inferredInsertSize|
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+
</span><span class='line'>| 0| 1| 26472783| null| 26472858| 60|simread:1:2647278...|GTATAAGAGCAGCCTTA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|240997787| null|240997862| 60|simread:1:2409977...|CTTTATTTTTATTTTTA...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:39 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|189606653| null|189606728| 60|simread:1:1896066...|TGTATCTTCCTCCCCTG...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|207027738| null|207027813| 60|simread:1:2070277...|TTTAATAAATGTTGATT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 14397233| null| 14397308| 60|simread:1:1439723...|TAAAATGCCCCCATCTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|240344442| null|240344517| 24|simread:1:2403444...|TACAGGCACCCACCATC...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:61 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|153978724| null|153978799| 60|simread:1:1539787...|GCTCACTGCAGCCTCAA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|237728409| null|237728484| 28|simread:1:2377284...|TTTCTTTTTCTTTCTTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:59 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|231911906| null|231911981| 60|simread:1:2319119...|TCATGTAGCATGCATAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 50683371| null| 50683446| 60|simread:1:5068337...|GCTCAGGCCTTGCAAGA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 37577445| null| 37577520| 60|simread:1:3757744...|CCTAGAGAAGCTCCCAC...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|195211965| null|195212040| 60|simread:1:1952119...|AAATAAAGTTTGGCTTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|163841413| null|163841488| 60|simread:1:1638414...|TGTGTAACTAACATAAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|101556378| null|101556453| 60|simread:1:1015563...|TTTATTTTTTGAGCATG...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 20101800| null| 20101875| 35|simread:1:2010180...|CTCAGGTGATCCACCCG...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:55 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|186794283| null|186794358| 60|simread:1:1867942...|GACAAGATAGTACTTGA...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|165341382| null|165341457| 60|simread:1:1653413...|CTACTCTCATTGACTGT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 5469106| null| 5469181| 60|simread:1:5469106...|CTCATTCTCTCTCCTGC...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 89554252| null| 89554327| 60|simread:1:8955425...|AAATTAAACAGCTCGTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|169801933| null|169802008| 40|simread:1:1698019...|AGACTGGGTCTCACTAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:52 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+</span></code></pre></td></tr></table></div></figure>
<p>While Spark SQL has specific optimizations for loading data from Apache Parquet
files, ADAM can be used to run Spark SQL queries against data stored in most
common genomics file formats, including SAM/BAM/CRAM, FASTQ, VCF/BCF, BED,
GTF/GFF3, IntervalList, NarrowPeak, FASTA and more.</p>
<h1>Using ADAM through Python and R</h1>
<p>As mentioned above, one of the major advantages of Spark SQL is that it provides
uniform query performance across Scala, Java, Python, and R. While ADAM is
mostly written in Scala, we have maintained Java APIs for a long time. However,
we have previously been unable to support Python or R APIs. Adding support
for Spark SQL eliminated the major issues that prevented us from adding Python
and R APIs. This release of ADAM introduces the <code>bdgenomics.adam</code> packages for
Python and R. Our Python API can be installed using <code>pip install
bdgenomics.adam</code>, and our R API is available from
<a href="https://github.com/bigdatagenomics/adam/releases/download/adam-parent-spark2_2.11-0.23.0/bdgenomics.adam_0.23.0.tar.gz">GitHub</a>.
We hope to make our R API available through CRAN in the 0.24.0 release of ADAM;
we are blocked on an issue upstream in Apache Spark and are tracking progress on
this issue at <a href="https://github.com/bigdatagenomics/adam/issues/1851">ADAM-1851</a>.</p>
<p>In addition to installing the <code>bdgenomics.adam</code> libraries, running <code>pip install
bdgenomics.adam</code> installs all of the ADAM command line tools:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>$ pip install bdgenomics.adam
</span><span class='line'>...
</span><span class='line'>Successfully installed bdgenomics.adam-0.23.0 py4j-0.10.4 pyspark-2.2.1
</span><span class='line'>
</span><span class='line'>$ adam-submit
</span><span class='line'>
</span><span class='line'> e 888~-_ e e e
</span><span class='line'> d8b 888 \ d8b d8b d8b
</span><span class='line'> /Y88b 888 | /Y88b d888bdY88b
</span><span class='line'> / Y88b 888 | / Y88b / Y88Y Y888b
</span><span class='line'> /____Y88b 888 / /____Y88b / YY Y888b
</span><span class='line'> / Y88b 888_-~ / Y88b / Y888b
</span><span class='line'>
</span><span class='line'>Usage: adam-submit [&lt;spark-args&gt; --] &lt;adam-args&gt;
</span><span class='line'>
</span><span class='line'>Choose one of the following commands:
</span><span class='line'>
</span><span class='line'>ADAM ACTIONS
</span><span class='line'> countKmers : Counts the k-mers/q-mers from a read dataset.
</span><span class='line'> countContigKmers : Counts the k-mers/q-mers from a read dataset.
</span><span class='line'> transformAlignments : Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations
</span><span class='line'> transformFeatures : Convert a file with sequence features into corresponding ADAM format and vice versa
</span><span class='line'> transformGenotypes : Convert a file with genotypes into corresponding ADAM format and vice versa
</span><span class='line'> transformVariants : Convert a file with variants into corresponding ADAM format and vice versa
</span><span class='line'> mergeShards : Merges the shards of a file
</span><span class='line'> reads2coverage : Calculate the coverage from a given ADAM file
</span><span class='line'>
</span><span class='line'>CONVERSION OPERATIONS
</span><span class='line'> fasta2adam : Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences.
</span><span class='line'> adam2fasta : Convert ADAM nucleotide contig fragments to FASTA files
</span><span class='line'> adam2fastq : Convert BAM to FASTQ files
</span><span class='line'> transformFragments : Convert alignment records into fragment records.
</span><span class='line'>
</span><span class='line'>PRINT
</span><span class='line'> print : Print an ADAM formatted file
</span><span class='line'> flagstat : Print statistics on reads in an ADAM file (similar to samtools flagstat)
</span><span class='line'> view : View certain reads from an alignment-record file.
</span><span class='line'>
</span><span class='line'>
</span><span class='line'>$ adam-shell
</span><span class='line'>
</span><span class='line'>Welcome to
</span><span class='line'> ____ __
</span><span class='line'> / __/__ ___ _____/ /__
</span><span class='line'> _\ \/ _ \/ _ `/ __/ '_/
</span><span class='line'> /___/ .__/\_,_/_/ /_/\_\ version 2.2.1
</span><span class='line'> /_/
</span><span class='line'>
</span><span class='line'>Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
</span><span class='line'>
</span><span class='line'>scala> import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>
</span><span class='line'>scala> :quit</span></code></pre></td></tr></table></div></figure>
<p>Most of the major APIs in ADAM can be used through our Python and R bindings,
with the exception of the region join API. We plan to enable the use of the
region join API in Python and R in the 0.24.0 release of ADAM, along with other
API compatibility improvements.</p>
<h1>Changes since Previous Release</h1>
<p>The full list of changes since version 0.22.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2018/01/04/adam-0-dot-23-dot-0-released-plus-avocado-cannoli-and-deca-releases/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2017-04-03T12:00:00-07:00" pubdate data-updated="true">Apr 3<span>rd</span>, 2017</time>
</p>
<h1 class="entry-title"><a href="/blog/2017/04/03/adam-0-dot-22-dot-0-release/">ADAM 0.22.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM version 0.22.0 has been <a href="https://github.com/bigdatagenomics/adam/releases">released</a>!</p>
<p>Due to major changes between Spark versions 1.6 and 2.0, we build for combinations of Apache Spark and Scala versions:
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.10-0.22.0">Spark 1.x and Scala 2.10</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.11-0.22.0">Spark 1.x and Scala 2.11</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.10-0.22.0">Spark 2.x and Scala 2.10</a>, and
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.11-0.22.0">Spark 2.x and Scala 2.11</a>.</p>
<p>The focus of this release was performance, including major improvements to BQSR and INDEL realignment.</p>
<p>More than 80 other issues were closed in this release, including bug fixes around VCF validation and paired-end FASTQ parsing
and new features such as pipe API support for features.</p>
<p>The full list of changes since version 0.21.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2017/04/03/adam-0-dot-22-dot-0-release/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2017-01-06T11:00:00-08:00" pubdate data-updated="true">Jan 6<span>th</span>, 2017</time>
</p>
<h1 class="entry-title"><a href="/blog/2017/01/06/adam-0-dot-21-dot-0-release/">ADAM 0.21.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM version 0.21.0 has been <a href="https://github.com/bigdatagenomics/adam/releases">released</a>!</p>
<p>Due to major changes between Spark versions 1.6 and 2.0, we now build for combinations of Apache Spark and Scala versions:
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.10-0.21.0">Spark 1.x and Scala 2.10</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.11-0.21.0">Spark 1.x and Scala 2.11</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.10-0.21.0">Spark 2.x and Scala 2.10</a>, and
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.11-0.21.0">Spark 2.x and Scala 2.11</a>.
The Spark 2.x build-time dependency will be bumped to version 2.1.0 in the next release of ADAM; see issue <a href="https://github.com/bigdatagenomics/adam/issues/1330">#1330</a>.</p>
<p>One focus of this release was documentation, both at the developer API level, including extensive javadoc and scaladoc
source code comments, and at the user level (e.g. <a href="https://github.com/bigdatagenomics/adam/tree/master/docs/source">https://github.com/bigdatagenomics/adam/tree/master/docs/source</a>). The
user docs can be compiled to PDF or HTML with pandoc, but to be honest they look better rendered as Markdown on GitHub.</p>
<p>Another focus was to more closely follow the VCF specification(s) when reading from and writing to VCF.
For this we made significant changes to our variant and variant annotation schema and added support
for version 1.0 of the <a href="http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf">VCF INFO ‘ANN’ key specification</a>.
This work will continue for our genotype and genotype annotation schema in the next version of ADAM.</p>
<p>The full list of changes since version 0.20.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2017/01/06/adam-0-dot-21-dot-0-release/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2016-10-19T12:00:00-07:00" pubdate data-updated="true">Oct 19<span>th</span>, 2016</time>
</p>
<h1 class="entry-title"><a href="/blog/2016/10/19/adam-0-dot-20-dot-0-release/">ADAM 0.20.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM version 0.20.0 has been <a href="https://github.com/bigdatagenomics/adam/releases">released</a>!</p>
<p>Due to major changes between Spark versions 1.6 and 2.0, we now build for combinations of Apache Spark and Scala versions:
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.10-0.20.0">Spark 1.x and Scala 2.10</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.11-0.20.0">Spark 1.x and Scala 2.11</a>,
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.10-0.20.0">Spark 2.x and Scala 2.10</a>, and
<a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent-spark2_2.11-0.20.0">Spark 2.x and Scala 2.11</a>.</p>
<p>Since the last release, version 0.19.0, we have closed more than 180 issues and merged more than 120 pull requests.</p>
<p>We added a new pipe API, allowing for streaming alignment and variant records out to external applications and streaming back
in the results. Several new region join implementations are now public API, including a broadcast inner join, broadcast right
outer join, sort-merge inner join, sort-merge right outer join, sort-merge left outer join, sort-merge full outer join,
sort-merge inner join followed by a group by, and a sort-merge right outer join followed by a group by.</p>
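<p>To make the join semantics concrete, here is a minimal plain-Python sketch of what an inner and a right outer region join compute. It is not ADAM's implementation or API: the <code>Region</code> tuple layout, the half-open interval convention, and all function names are assumptions made for illustration, and the nested loop only loosely mirrors the broadcast strategy of holding the smaller side in memory while scanning the larger side.</p>

```python
from typing import List, Optional, Tuple

# A genomic region as (contig name, start, end). The tuple layout and the
# half-open [start, end) convention are assumptions made for this sketch.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions sit on the same contig and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def inner_region_join(
    left: List[Region], right: List[Region]
) -> List[Tuple[Region, Region]]:
    """Emit every (left, right) pair whose regions overlap."""
    return [(a, b) for b in right for a in left if overlaps(a, b)]


def right_outer_region_join(
    left: List[Region], right: List[Region]
) -> List[Tuple[Optional[Region], Region]]:
    """Like the inner join, but keep unmatched right regions, paired with None."""
    joined: List[Tuple[Optional[Region], Region]] = []
    for b in right:
        matches = [a for a in left if overlaps(a, b)]
        if matches:
            joined.extend((a, b) for a in matches)
        else:
            joined.append((None, b))
    return joined


# Hypothetical example data: two annotated features and three aligned reads.
features = [("1", 100, 200), ("1", 500, 600)]
reads = [("1", 150, 225), ("1", 300, 375), ("2", 150, 225)]

inner = inner_region_join(features, reads)
outer = right_outer_region_join(features, reads)
```

<p>With this data, only the first read overlaps a feature, so the inner join yields a single pair, while the right outer join also keeps the two unmatched reads paired with <code>None</code>. A sort-merge variant would produce the same pairs but would first sort both sides by position, so it does not need to hold either side entirely in memory.</p>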
<p>Alignment records can now be read from and written to CRAM format. We updated upstream dependencies on Hadoop-BAM and htsjdk to
fix various alignment record header bugs and to add support for gzip and BGZF compressed VCF.</p>
<p>Our sequence feature schema now more closely follows the GFF3 specification, while still supporting BED, GFF2/GTF, IntervalList,
and NarrowPeak formats. We also added a new sample schema for e.g. SRA sample metadata.</p>
<p>With this version the core ADAM APIs are undergoing a major refactoring. We changed many method names on ADAMContext to make
the API more consistent. We also added RDD wrapper classes to increase performance by serializing metadata (such as record groups,
samples, and sequence dictionaries) to disk separately from the primary data in Parquet. API incompatibilities between ADAM releases
will settle down by the 1.0 release, currently targeted for early 2017.</p>
<p>The full list of changes since version 0.19.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2016/10/19/adam-0-dot-20-dot-0-release/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2016-02-25T11:00:00-08:00" pubdate data-updated="true">Feb 25<span>th</span>, 2016</time>
</p>
<h1 class="entry-title"><a href="/blog/2016/02/25/adam-0-dot-19-dot-0-release/">ADAM 0.19.0 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>ADAM version 0.19.0 has been <a href="https://github.com/bigdatagenomics/adam/releases">released</a>, built for both <a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.10-0.19.0">Scala 2.10</a> and <a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.11-0.19.0">Scala 2.11</a>.</p>
<p>The 0.19.0 release contains various concordance fixes and performance improvements for accessing read metadata. Schema changes, including a bump to version 0.7.0 of the Big Data Genomics <a href="https://github.com/bigdatagenomics/bdg-formats/releases/tag/bdg-formats-0.7.0">Avro data formats</a>, were made to support the read metadata performance improvements. Additionally, exporting a single BAM file is now faster, and the export is now guaranteed to be correct for sorted data.</p>
<p>ADAM now targets Apache Spark 1.5.2 and Apache Hadoop 2.6.0 as the default build environment. ADAM and applications built on ADAM should run on a wide range of Apache Spark (1.3.1 up to and including the most recent, 1.6.0) and Apache Hadoop (currently 2.3.0 and 2.6.0) versions. A compatibility matrix of Spark, Hadoop, and Scala version builds in our <a href="https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/">continuous integration system</a> verifies this. Please note, as of this release, support for Apache Spark 1.2.x and Apache Hadoop 1.0.x <a href="https://github.com/bigdatagenomics/adam/issues/958">has been dropped</a>.</p>
<p>The full list of changes since version 0.18.2 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2016/02/25/adam-0-dot-19-dot-0-release/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2015-11-11T11:00:00-08:00" pubdate data-updated="true">Nov 11<span>th</span>, 2015</time>
</p>
<h1 class="entry-title"><a href="/blog/2015/11/11/adam-0-dot-18-dot-2-released/">ADAM 0.18.2 Released</a></h1>
</header>
<div class="entry-content clearfix"><p>A few ADAM releases have been made since the last announcement; we’ll attempt to catch up here.</p>
<p>The most recent is a version <a href="https://github.com/bigdatagenomics/adam/releases">0.18.2 bugfix release</a>, built for both <a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.10-0.18.2">Scala 2.10</a> and <a href="https://github.com/bigdatagenomics/adam/releases/tag/adam-parent_2.11-0.18.2">Scala 2.11</a>. It fixes <a href="https://github.com/bigdatagenomics/adam/pull/873">a minor issue</a> with the binary distribution artifact from version 0.18.1.</p>
<p>Prior to version 0.18.2, we made significant changes to support version 0.6.0 of the Big Data Genomics <a href="https://github.com/bigdatagenomics/bdg-formats/releases/tag/bdg-formats-0.6.0">Avro data formats</a>. We also improved performance on core transforms (markdups, indel realignment, bqsr) by using finer grained projection. Some issues in 2bitfile when dealing with gaps and masked regions were fixed. Round-trip transformations from native formats (e.g., FASTA, FASTQ, SAM, BAM) to ADAM and back have been improved. We made extending ADAM more straightforward.</p>
<p>ADAM now runs on a wide range of Apache Spark (1.2.1 up to and including the most recent, 1.5.1) and Apache Hadoop (currently 1.0.4, 2.3.0 and 2.6.0) versions. This is verified by a compatibility matrix of Spark, Hadoop, and Scala version builds in our <a href="https://amplab.cs.berkeley.edu/jenkins/view/Big%20Data%20Genomics/">continuous integration system</a>.</p>
<p>The full list of changes since version 0.17.0 is below.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2015/11/11/adam-0-dot-18-dot-2-released/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">
<p class="meta text-muted text-uppercase">
<span class="glyphicon glyphicon-calendar"></span> <time datetime="2015-07-10T10:19:45-07:00" pubdate data-updated="true">Jul 10<span>th</span>, 2015</time>
</p>
<h1 class="entry-title"><a href="/blog/2015/07/10/genomic-analysis-using-adam/">Genomic Analysis Using ADAM, Spark and Deep Learning</a></h1>
</header>
<div class="entry-content clearfix"><blockquote><p>Special thanks to <a href="https://github.com/nfergu">Neil Ferguson</a> for this blog post on genomic analysis using ADAM, Spark and Deep Learning</p></blockquote>
<p>Can we use deep learning to predict which population group you belong to, based solely on your genome?</p>
<p>Yes, we can – and in this post, we will show you exactly how to do this in a scalable way, using Apache Spark. We will explain how to apply <a href="https://en.wikipedia.org/wiki/Deep_learning">deep learning</a> using <a href="https://en.wikipedia.org/wiki/Artificial_neural_network">artificial neural networks</a> to predict which population group an individual belongs to – based entirely on his or her genomic data.</p>
<p>This is a follow-up to an earlier post: <a href="http://bdgenomics.org/blog/2015/02/02/scalable-genomes-clustering-with-adam-and-spark/">Scalable Genomes Clustering With ADAM and Spark</a> and attempts to replicate the results of that post. However, we will use a different machine learning technique. Where the original post used <a href="https://en.wikipedia.org/wiki/K-means_clustering">k-means clustering</a>, we will use deep learning.</p>
<p>We will use <a href="https://github.com/bigdatagenomics/adam">ADAM</a> and <a href="https://spark.apache.org/">Apache Spark</a> in combination with <a href="http://0xdata.com/product/">H2O</a>, an open source predictive analytics platform, and <a href="http://0xdata.com/product/sparkling-water/">Sparkling Water</a>, which integrates H2O with Spark.</p>
</div>
<footer>
<a class="btn btn-default" rel="full-article" href="/blog/2015/07/10/genomic-analysis-using-adam/">Read on →</a>
</footer>
</article>
<article class="post">
<header class="page-header">