@gmod/bam

GMOD · 5.6k · MIT · 6.1.1

Parser for BAM and BAM index (bai) files

bionode, biojs, bam, genomics

readme

Install

$ npm install --save @gmod/bam

Usage

const { BamFile } = require('@gmod/bam')
// or: import { BamFile } from '@gmod/bam'

async function main() {
  const t = new BamFile({
    bamPath: 'test.bam',
  })

  // note: getHeader must be called before any getRecordsForRange call
  const header = await t.getHeader()

  // this gets the same records as samtools view test.bam ctgA:1-50000
  const records = await t.getRecordsForRange('ctgA', 0, 50000)
}

main()

The bamPath argument only works in Node.js. In the browser, pass a bamFilehandle created with generic-filehandle2, e.g. a RemoteFile:

const { RemoteFile } = require('generic-filehandle2')
const bam = new BamFile({
  bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
  baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})

Input coordinates are 0-based and half-open (note: samtools view regions are 1-based and fully closed, so the two are not the same!)
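Because the two conventions differ, a region copied from a samtools command line has to be shifted before it is passed to getRecordsForRange. Below is a minimal sketch using a hypothetical helper, samtoolsRegionToZeroBased, which is not part of @gmod/bam: the 1-based start is decremented by one, while the inclusive 1-based end is numerically equal to the exclusive 0-based end. Stripping thousands separators is an extra convenience assumed here.

```javascript
// Hypothetical helper (not part of @gmod/bam): convert a samtools-style
// region "ref:start-end" (1-based, fully closed) into the 0-based,
// half-open coordinates that getRecordsForRange expects.
function samtoolsRegionToZeroBased(region) {
  // Use the last colon so reference names that contain ':' still work
  const colon = region.lastIndexOf(':')
  if (colon === -1) {
    throw new Error(`expected "ref:start-end", got "${region}"`)
  }
  const refName = region.slice(0, colon)
  const [startStr, endStr] = region
    .slice(colon + 1)
    .replace(/,/g, '') // drop thousands separators such as 1,000
    .split('-')
  const start1 = Number(startStr) // 1-based, inclusive
  const end1 = Number(endStr) // 1-based, inclusive
  if (
    !Number.isInteger(start1) ||
    !Number.isInteger(end1) ||
    start1 < 1 ||
    end1 < start1
  ) {
    throw new Error(`invalid coordinates in "${region}"`)
  }
  // Shift the start down by one; the inclusive end becomes the exclusive end
  return { refName, start: start1 - 1, end: end1 }
}

console.log(samtoolsRegionToZeroBased('ctgA:1-50000'))
// { refName: 'ctgA', start: 0, end: 50000 }
```

The result can then be spread into a call such as getRecordsForRange(refName, start, end).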

Usage with htsget

Since 1.0.41, the htsget protocol is supported.

Here is a small code snippet:

const { HtsgetFile } = require('@gmod/bam')

async function main() {
  const ti = new HtsgetFile({
    baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
    trackId: 'BroadHiSeqX_b37/NA12878',
  })
  await ti.getHeader()
  // refName is passed as a string, like in getRecordsForRange for BamFile
  const records = await ti.getRecordsForRange('1', 2000000, 2000001)
}

main()

Our implementation makes some assumptions about how the protocol is implemented, so please let us know if it does not work for your use case.

Documentation

BAM constructor

The BamFile class constructor accepts the following arguments:

bamPath/bamUrl/bamFilehandle - a local file path, a URL, or a filehandle object with a read method
csiPath/csiUrl/csiFilehandle - a CSI index for the BAM file, required for chromosomes longer than 2^29 bp
baiPath/baiUrl/baiFilehandle - a BAI index for the BAM file
cacheSize - limit on the number of chunks to cache. default: 50
yieldThreadTime - the interval at which the code yields to the main thread when parsing a lot of data. default: 100ms. Set to 0 to disable yielding
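The 2^29 limit comes from the fixed binning scheme of the BAI format described in the SAM specification: six levels of bins, where the smallest bins cover 2^14 bp and the single top-level bin covers 2^29 bp. The sketch below is a JavaScript port of the specification's reg2bin reference function, not code taken from @gmod/bam, with an added range check. Any region reaching past 2^29 cannot be binned, which is why a CSI index (whose min_shift and depth are configurable) is needed for longer references.

```javascript
// JavaScript port of reg2bin from the SAM specification.
// beg/end are 0-based, half-open; BAI bins only cover [0, 2^29).
const BAI_MAX_POS = 2 ** 29

function reg2bin(beg, end) {
  if (beg < 0 || end > BAI_MAX_POS || end <= beg) {
    throw new RangeError(`region [${beg}, ${end}) cannot be binned by BAI`)
  }
  end -= 1 // the spec works with an inclusive end
  // Try the smallest (16 kb) bins first, then progressively larger levels
  if (beg >> 14 === end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14)
  if (beg >> 17 === end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17)
  if (beg >> 20 === end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20)
  if (beg >> 23 === end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23)
  if (beg >> 26 === end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26)
  return 0 // the top-level bin spanning the full 2^29 bp
}

console.log(reg2bin(0, 50000)) // 585
console.log(reg2bin(2 ** 29 - 1, 2 ** 29)) // 37448
```

The largest regular bin number is 37448, so the pseudo-bin 37450 used by lineCount cannot collide with a real bin.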

Note: filehandles implement the Filehandle interface from https://www.npmjs.com/package/generic-filehandle2.

The path and url arguments are conveniences that create a LocalFile or RemoteFile filehandle for you.

async getRecordsForRange(refName, start, end, opts)

Note: you must run getHeader before running getRecordsForRange

refName - a string for the chrom to fetch from
start - a 0-based half open start coordinate
end - a 0-based half open end coordinate
opts.signal - an AbortSignal that can be used to stop processing
opts.viewAsPairs - issues additional requests to find the mates of paired reads. default: false
opts.pairAcrossChr - with viewAsPairs, also look for mates on other chromosomes. default: false
opts.maxInsertSize - with viewAsPairs, the maximum distance within a chromosome to search for mates. default: 200kb

async *streamRecordsForRange(refName, start, end, opts)

This is an async generator function that takes the same arguments as getRecordsForRange, but yields results in chunks that can be processed with

for await (const chunk of file.streamRecordsForRange(
  refName,
  start,
  end,
  opts,
)) {
  // process this chunk of records
}

getRecordsForRange simply wraps this process, concatenating the chunks into a single array.
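A minimal sketch of that wrapping pattern follows. It uses a hypothetical fakeStream generator in place of file.streamRecordsForRange and assumes, as described above, that each yielded chunk is an array of records; collectChunks is likewise a made-up name, not a library function.

```javascript
// Drain an async generator that yields arrays of records and concatenate
// them into one array. `fakeStream` is a hypothetical stand-in for
// file.streamRecordsForRange(refName, start, end, opts).
async function collectChunks(stream) {
  const records = []
  for await (const chunk of stream) {
    // Push one record at a time instead of records.push(...chunk), so a very
    // large chunk is never spread into a single function call
    for (const record of chunk) {
      records.push(record)
    }
  }
  return records
}

async function* fakeStream() {
  yield [{ name: 'read1' }, { name: 'read2' }]
  yield [{ name: 'read3' }]
}

collectChunks(fakeStream()).then(records => {
  console.log(records.map(r => r.name)) // [ 'read1', 'read2', 'read3' ]
})
```

When the records do not all need to be held at once, handling each chunk inside the loop instead of collecting them keeps memory bounded, and a break out of the for await loop closes the generator so no further chunks are pulled from it.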

async getHeader(opts: {....anything to pass to generic-filehandle2 opts})

This obtains the header from an HtsgetFile or BamFile. For a BamFile it retrieves the BAM header (and the BAI/CSI header if applicable); for an HtsgetFile it makes an API request for the refnames.

async indexCov(refName, start, end)

refName - a string for the chrom to fetch from
start - a 0-based half open start coordinate (optional)
end - a 0-based half open end coordinate (optional)

Returns features of the form {start, end, score} containing estimated feature density across 16kb windows in the genome
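indexCov infers approximate coverage from the size of the BAI linear index bins (see the 1.0.16 changelog entry). The sketch below illustrates that idea only; it is not the library's implementation, the function name estimateWindowDensity is hypothetical, and @gmod/bam may scale or otherwise post-process its scores. It assumes the linear index is given as an array of 64-bit BAM virtual offsets (compressed offset in the upper 48 bits), one per 16 kb window, and scores each window by the number of compressed bytes between consecutive entries.

```javascript
// Hypothetical sketch of the indexCov idea (not @gmod/bam code).
// linearIndex: BAM virtual offsets as BigInt, one per 16 kb window, each
// pointing at the first record that overlaps that window.
const WINDOW = 1 << 14 // 16384 bp

function estimateWindowDensity(linearIndex) {
  // The upper 48 bits of a virtual offset are the compressed (on-disk) offset
  const blockPositions = linearIndex.map(v => Number(v >> 16n))
  const windows = []
  for (let i = 0; i + 1 < blockPositions.length; i++) {
    windows.push({
      start: i * WINDOW,
      end: (i + 1) * WINDOW,
      // More compressed bytes between entries suggests more reads here
      score: blockPositions[i + 1] - blockPositions[i],
    })
  }
  return windows
}

const example = [0n, 1000n << 16n, 1000n << 16n, (5000n << 16n) | 42n]
console.log(estimateWindowDensity(example).map(w => w.score)) // [ 1000, 0, 4000 ]
```

Because only the index is read, this kind of estimate is cheap compared with decoding the reads themselves, at the cost of being approximate.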

async lineCount(refName: string)

refName - a string for the chrom to fetch from

Returns the number of features on refName, using the special pseudo-bin from the BAI/CSI index (e.g. bin 37450 in a BAI file, which stores n_mapped as described in the SAM spec), or -1 if refName does not exist in the sample
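A minimal sketch of reading that pseudo-bin from a raw BAI file, following the BAI layout in the SAM specification rather than the @gmod/bam source. The function name mappedCountFromBai is hypothetical, and it takes a numeric reference index (the reference's position in the BAM header) instead of a refName, since mapping names to indices requires the header.

```javascript
// Minimal sketch following the BAI layout in the SAM specification (not the
// @gmod/bam implementation). Returns n_mapped for the reference at position
// `refIndex` in header order, or -1 if that reference or its pseudo-bin is
// missing. `bytes` is a Uint8Array (or Buffer) holding the whole .bai file.
const PSEUDO_BIN = 37450

function mappedCountFromBai(bytes, refIndex) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const magic = String.fromCharCode(...bytes.subarray(0, 4))
  if (magic !== 'BAI\x01') {
    throw new Error('not a BAI file')
  }
  const nRef = view.getInt32(4, true) // all integers are little-endian
  if (refIndex < 0 || refIndex >= nRef) return -1

  let pos = 8
  for (let ref = 0; ref <= refIndex; ref++) {
    const nBin = view.getInt32(pos, true)
    pos += 4
    for (let b = 0; b < nBin; b++) {
      const bin = view.getUint32(pos, true)
      const nChunk = view.getInt32(pos + 4, true)
      pos += 8
      if (ref === refIndex && bin === PSEUDO_BIN && nChunk === 2) {
        // chunk 1 = (ref_beg, ref_end), chunk 2 = (n_mapped, n_unmapped)
        return Number(view.getBigUint64(pos + 16, true))
      }
      pos += nChunk * 16 // each chunk is two uint64 virtual offsets
    }
    const nIntv = view.getInt32(pos, true)
    pos += 4 + nIntv * 8 // skip the linear index (uint64 offsets)
  }
  return -1 // the requested reference has no pseudo-bin
}
```

Because n_mapped is stored directly in the index, this count is available without decompressing any BAM data.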

async hasRefSeq(refName: string)

refName - a string for the chrom to check

Returns whether refName exists in the sample

Returned features

Example

feature.ref_id // numeric reference sequence id, i.e. the reference's position in the SAM header
feature.start // 0-based half open start coordinate
feature.end // 0-based half open end coordinate
feature.name // QNAME
feature.seq // feature sequence
feature.qual // qualities
feature.CIGAR // CIGAR string
feature.tags // tags
feature.flags // flags
feature.template_length // TLEN
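feature.flags holds the SAM FLAG bit field and feature.CIGAR the CIGAR string. The sketch below, using hypothetical helpers describeFlags and cigarReferenceLength that are not part of @gmod/bam, decodes the FLAG bits defined in the SAM specification and computes how many reference bases a CIGAR string spans (operations M, D, N, = and X consume the reference). For a mapped read, that span would typically correspond to feature.end - feature.start.

```javascript
// Hypothetical helpers (not part of @gmod/bam) based on the SAM specification.

// FLAG bits as defined in the SAM spec
const FLAG = {
  PAIRED: 0x1,
  PROPER_PAIR: 0x2,
  UNMAPPED: 0x4,
  MATE_UNMAPPED: 0x8,
  REVERSE: 0x10,
  MATE_REVERSE: 0x20,
  READ1: 0x40,
  READ2: 0x80,
  SECONDARY: 0x100,
  QC_FAIL: 0x200,
  DUPLICATE: 0x400,
  SUPPLEMENTARY: 0x800,
}

// List the names of all bits set in a numeric FLAG value
function describeFlags(flags) {
  return Object.keys(FLAG).filter(name => (flags & FLAG[name]) !== 0)
}

// Number of reference bases covered by a CIGAR string: M, D, N, = and X
// consume the reference; I, S, H and P do not.
function cigarReferenceLength(cigar) {
  let length = 0
  for (const [, count, op] of cigar.matchAll(/(\d+)([MIDNSHP=X])/g)) {
    if ('MDN=X'.includes(op)) length += Number(count)
  }
  return length
}

console.log(describeFlags(99)) // [ 'PAIRED', 'PROPER_PAIR', 'MATE_REVERSE', 'READ1' ]
console.log(cigarReferenceLength('5S20M3D10M2I15M')) // 48
```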

License

MIT © Colin Diesh

changelog

6.1.1 (2025-10-02)

6.1.0 (2025-10-01)

6.0.4 (2025-05-26)

6.0.3 (2025-05-13)

6.0.2 (2025-04-30)

6.0.1 (2025-04-30)

6.0.0 (2025-04-30)

5.0.7 (2025-03-11)

5.0.6 (2025-02-28)

5.0.5 (2024-12-18)

5.0.4 (2024-12-18)

5.0.3 (2024-12-18)

5.0.2 (2024-12-17)

5.0.1 (2024-12-12)

5.0.0 (2024-12-12)

4.0.1 (2024-11-12)

4.0.0 (2024-11-12)

3.0.3 (2024-11-11)

3.0.2 (2024-11-11)

Republish v3.0.1, since it was tagged on a deleted branch

3.0.1 (2024-11-11)

3.0.0 (2024-11-07)

2.0.4 (2024-08-09)

2.0.3 (2024-07-23)

Reverts

Revert "Migrate to eslint9" (65adcbb)
Revert "Run format" (2a02535)

2.0.2 (2024-02-21)

Update typescript-eslint config and related fixes

2.0.1 (2024-02-20)

Update to buffer-crc32 1.0.0
Fix BAM header parsing of refNames containing a :

2.0.0 (2023-06-08)

Features

explicit buffer import (#98) (66de9f4)
Add explicit buffer import
Remove cross-fetch and object.entries polyfills
Improve typescripting
Remove chunkSizeLimit and fetchSizeLimit

1.1.18 (2022-12-17)

Use es2015 for nodejs build

1.1.17 (2022-07-18)

Bump devDeps and generic-filehandle to 3.0.0

1.1.16 (2022-03-30)

Add src directory for better source maps

1.1.15 (2022-03-18)

Fix for htsget failing with message 'input must be buffer, number, or string, received object'
Speed improvement by caching chunks of features

1.1.14 (2022-03-14)

Fix seq function for corner case

1.1.13 (2022-02-25)

Optimize qual and sequence string record functions for less GC pressure

1.1.12 (2022-02-17)

Add blocksForRange method to BamFile class to help stats estimation in JBrowse 2

1.1.11 (2022-01-26)

Cache setup of index file parsing

1.1.10 (2022-01-18)

Make _refID and flags public fields
Small internal changes to the handling of opts

1.1.9 (2021-12-14)

Add ESM module export in package.json (smaller bundle size for consumers)
Cache BAI readFile result for compatibility with node.js native filehandles (which otherwise fail if re-reading the filehandle twice)

1.1.8 (2021-05-21)

Fix types for yieldThreadTime

1.1.7 (2021-05-21)

New param yieldThreadTime to constructor to yield while processing

1.1.6 (2021-02-20)

Add qualRaw function on records for getting raw qual score array instead of string

1.1.5 (2020-12-11)

Allow getHeaderText to accept cancellation options

1.1.4 (2020-12-11)

Add canMergeBlocks to CSI code (already existed in BAI)
Add suggestion from @jrobinso about reg2bins modification for memory saving (Thanks!)
Add getHeaderText() method for getting a text string of the header data

1.1.3 (2020-10-29)

Fix usage of feature.get('seq'), which was using feature.getReadBases before this

1.1.2 (2020-10-02)

Fix signedness in BAM tags (#65)
Remove unused seq_reverse_complemented tag from _tags()

1.1.1 (2020-09-20)

Remove JBrowse specific results from tags

1.1.0 (2020-08-28)

Add support for the CG tag for long CIGAR strings

1.0.42 (2020-08-19)

Small bugfix for Htsget specifically

1.0.41 (2020-08-19)

Add htsget example
Support opts object to getHeader allowing things like auth headers to be passed right off the bat

1.0.40 (2020-07-30)

1.0.39 (2020-07-30)

Don't use origin master in the follow-tags postpublish command for cleaner version publishing

1.0.38 (2020-07-30)

Direct construction of qual/seq toString
Improve performance of the uniqueID calculation for pathological cases where there are tons of bins

1.0.37 (2020-06-06)

Typescript only release: export BamRecord types

1.0.36 (2020-03-05)

Adds a shortcut to stop parsing chunks after a record is detected to be outside the requested range while decoding

1.0.35 (2020-02-04)

Update scheme used to calculate unique fileOffset based IDs using @gmod/bgzf-filehandle updates

1.0.34 (2020-01-24)

Small fix for using id() instead of .get('id') for weird SAM records containing ID field

1.0.33 (2020-01-24)

Perform decoding of entire chunk up front to aid caching, reverts change in 1.0.29

1.0.32 (2019-11-16)

Add a speed improvement for long reads by pre-allocating sequence/quality scores array

1.0.31 (2019-11-07)

Fix example of the "ID" field failing to return the right data

1.0.30 (2019-11-07)

Fix the parser not returning all tags from the _tags API

1.0.29 (2019-10-31)

Decoding of the BAM records at time of use instead of entire chunk decoded up front
Alternate chunk merging strategy inspired by igv.js code

1.0.28 (2019-10-29)

Add CSI index block merging
Change unique ID generator to be smaller numeric IDs

1.0.27 (2019-10-10)

Make feature IDs become generated based relative to the exact bgzip block

1.0.26 (2019-10-01)

Restore issue with getRecordsForRange not returning all features (#44)
Fix compatibility with electron (#43)
Fix usage of feature.get('seq')

1.0.25 (2019-09-29)

Fixed some typescript typings

1.0.24 (2019-09-27)

Added typescript typings

1.0.23 (2019-09-27)

Added typescript typings
Botched release, was removed from npm

1.0.22 (2019-09-03)

Fixed issue with features having different IDs across different chunks (#36)

1.0.21 (2019-08-06)

Add a fix for the small chunk unpacking re-seeking in the same bgzf block repeatedly (#35)

1.0.20 (2019-06-06)

Added a method for smaller chunk unpacking, by modifying the header parsing to return smaller chunks and the bgzf unzipping to respect chunk boundaries (#30)
Use fileOffset as the BAM feature ID (previously a crc32 of the BAM buffer), which speeds up processing and allows exact duplicate features

1.0.19 (2019-05-30)

Added lineCount and hasRefSeq functions to BamFile, each accepting a string seqName
Fixed aborting on index retrieval code

1.0.18 (2019-05-01)

Bump generic-filehandle to 1.0.9 to fix error with using native fetch (global fetch needed to be bound)
Bump abortable-promise-cache to 1.0.1 version to fix error with using native fetch and abort signals

1.0.17 (2019-04-28)

Fix wrong number of arguments being passed to the readRefSeqs file read() invocation resulting in bad range requests

1.0.16 (2019-04-28)

Added indexCov algorithm to retrieve approximate coverage of the BAM inferred from the size of the BAI linear index bins
Fixed abortSignal on read() calls
Updated API to allow bamUrl/baiUrl/csiUrl

1.0.15 (2019-04-04)

Added a check for chromosomes that are too large for the BAI bins
Added aborting support (thanks @rbuels)
Refactored index file class

1.0.14 (2019-01-04)

Add hasRefSeq for CSI indexes

1.0.13 (2018-12-25)

Use ascii decoding for read names
Fix error with large BAM headers with many refseqs

1.0.12 (2018-11-25)

Faster viewAsPairs operation

1.0.11 (2018-11-23)

Fix for ie11

1.0.10 (2018-11-18)

Add a maxInsertSize parameter to getRecordsForRange

1.0.9 (2018-11-16)

Allow bases other than ACGT to be decoded
Make viewAsPairs only resolve pairs on given refSeq unless pairAcrossChr is enabled for query

1.0.8 (2018-10-31)

Add getPairOrientation for reads

1.0.7 (2018-10-19)

Re-release of 1.0.6 due to build machinery error

1.0.6 (2018-10-19)

Add a bugfix for when an invalid request returns 0 bytes, resulting in pako unzip errors

1.0.5 (2018-10-16)

Add a bugfix for pairing reads related to adding duplicate records to results

1.0.4 (2018-10-13)

Support pairing reads
Fix pseudobin parsing containing feature count on certain BAM files

1.0.3 (2018-09-25)

Remove @gmod/tabix dependency

1.0.2 (2018-09-25)

Fix CSI indexing code

1.0.1 (2018-09-24)

Rename hasDataForReferenceSequence to hasRefSeq

1.0.0 (2018-09-24)

Initial implementation of BAM parsing code