SampleData[PairedEndSequencesWithQuality]

Santiago Castro Dau edited this page Apr 22, 2024 · 2 revisions

Collections of unjoined paired-end sequences with quality scores associated with specified samples (i.e., demultiplexed sequences).

Artifact Format

class SingleLanePerSamplePairedEndFastqDirFmt(_SingleLanePerSampleFastqDirFmt):
    _REQUIRE_PAIRED = True

class _SingleLanePerSampleFastqDirFmt(CasavaOneEightSingleLanePerSampleDirFmt):
    manifest = model.File('MANIFEST', format=FastqManifestFormat)
    metadata = model.File('metadata.yml', format=YamlFormat)

class CasavaOneEightSingleLanePerSampleDirFmt(model.DirectoryFormat):
    _CHECK_PAIRED = True
    _REQUIRE_PAIRED = False

    sequences = model.FileCollection(
        r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz',
        format=FastqGzFormat)

    @sequences.set_path_maker
    def sequences_path_maker(self, sample_id, barcode_id, lane_number,
                             read_number):
        return '%s_%s_L%03d_R%d_001.fastq.gz' % (sample_id, barcode_id,
                                                 lane_number, read_number)
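
The naming rule in the format definition above can be tried out without QIIME 2 installed. The following is a minimal sketch using only the standard library: it copies the regular expression and the format string from `CasavaOneEightSingleLanePerSampleDirFmt`, but the helper name `make_sequence_filename`, the sample values, and the use of `re.fullmatch` (whole-filename matching) are assumptions made for illustration, not the framework's own code.

```python
import re

# Pattern copied from the `sequences` FileCollection in the format definition.
CASAVA_PATTERN = re.compile(r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz')


def make_sequence_filename(sample_id, barcode_id, lane_number, read_number):
    """Hypothetical stand-alone version of `sequences_path_maker`."""
    return '%s_%s_L%03d_R%d_001.fastq.gz' % (sample_id, barcode_id,
                                             lane_number, read_number)


# Illustrative values only: one sample, lane 1, forward (R1) and reverse (R2).
forward = make_sequence_filename('sampleA', 'ACGT', 1, 1)
reverse = make_sequence_filename('sampleA', 'ACGT', 1, 2)

# Assumption: the pattern is expected to cover the whole filename.
forward_ok = CASAVA_PATTERN.fullmatch(forward) is not None
reverse_ok = CASAVA_PATTERN.fullmatch(reverse) is not None

# A name without the trailing `_001` set number does not satisfy the pattern.
missing_set_ok = CASAVA_PATTERN.fullmatch('sampleA_ACGT_L001_R1.fastq.gz') is not None
```

Note that the lane number is zero-padded to three digits and the read number must be 1 or 2, so a file for read 3 or a lane written as `L1` would not be picked up by the pattern.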

Expected Folder Structure

data
├── metadata.yml
├── MANIFEST
├── <sample_id>_<barcode_id>_L<lane_number>_R<read_number>_001.fastq.gz
⋮
└── <sample_id>_<barcode_id>_L<lane_number>_R<read_number>_001.fastq.gz
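
Because the paired-end format sets `_REQUIRE_PAIRED = True`, each sample in the directory is expected to have both an R1 and an R2 file. The sketch below shows one way to spot samples that are missing a mate before importing. It is an illustration under stated assumptions, not the validation QIIME 2 itself runs: the function name `find_unpaired_samples`, the named groups added to the pattern, and the example listing are all hypothetical, and sample IDs that contain underscores may be split differently by the real framework.

```python
import re
from collections import defaultdict

# Same shape as the pattern in the format definition, with named groups
# added here (an assumption for illustration) to pull out the fields.
FILENAME_RE = re.compile(
    r'(?P<sample_id>.+)_(?P<barcode_id>.+)_L(?P<lane>[0-9]{3})'
    r'_R(?P<read>[12])_001\.fastq\.gz')


def find_unpaired_samples(filenames):
    """Return sorted sample IDs that lack either the R1 or the R2 file."""
    reads_by_sample = defaultdict(set)
    for name in filenames:
        match = FILENAME_RE.fullmatch(name)
        if match is None:
            # Skips MANIFEST, metadata.yml, and anything else off-pattern.
            continue
        reads_by_sample[match['sample_id']].add(int(match['read']))
    return sorted(sample for sample, reads in reads_by_sample.items()
                  if reads != {1, 2})


# Hypothetical directory listing: sampleB has no reverse read.
listing = [
    'metadata.yml',
    'MANIFEST',
    'sampleA_ACGT_L001_R1_001.fastq.gz',
    'sampleA_ACGT_L001_R2_001.fastq.gz',
    'sampleB_TTGA_L001_R1_001.fastq.gz',
]
unpaired = find_unpaired_samples(listing)
```

In practice the listing could come from `os.listdir` on the `data` directory; an empty result suggests every matched sample has both reads.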

Where to find SampleData[PairedEndSequencesWithQuality]

As Input
