SNP calling

At NgBS we are able to identify variation in DNA sequences by mapping short read data to a reference genome. The variant callers we typically use are freebayes and/or samtools. Both programs can call single nucleotide polymorphisms (SNPs) as well as identify indels.

We are also able to functionally annotate variants (i.e. identify SNPs that change amino acids, indels that cause frame-shift errors, etc.) if a set of gene models is available. We can also call SNPs using both callers and provide a file that contains only the 'robust' SNPs called by both programs. We strongly recommend that you visually confirm any SNPs of interest using a viewer like IGV (we can help with this).
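As a rough illustration of what 'called by both programs' can mean, the sketch below keeps only the variants that appear in two VCF files with the same chromosome, position, reference allele and alternate allele. It is a minimal sketch and not necessarily how the robust file is produced: the function names and the example records are made up for illustration, and it assumes both callers report each variant with an identical representation.

```python
import io


def read_variant_keys(handle):
    """Collect (chrom, pos, ref, alt) keys from a VCF text stream."""
    keys = set()
    for line in handle:
        # Skip header lines (starting with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def robust_calls(vcf_a, vcf_b):
    """Return the sorted variants present in both VCF streams."""
    return sorted(read_variant_keys(vcf_a) & read_variant_keys(vcf_b))


# Made-up records standing in for the output of two different callers.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
caller_one = io.StringIO(
    HEADER
    + "chr1\t100\t.\tA\tG\t50\t.\tDP=20\n"
    + "chr1\t250\t.\tC\tT\t12\t.\tDP=4\n"
)
caller_two = io.StringIO(
    HEADER
    + "chr1\t100\t.\tA\tG\t60\t.\tDP=21\n"
    + "chr2\t75\t.\tG\tA\t30\t.\tDP=9\n"
)

shared = robust_calls(caller_one, caller_two)
print(shared)
```

Only the chr1:100 A>G call is reported by both example inputs, so it is the only one kept. Indels in particular can be written differently by different callers, so a real comparison would normally normalise the records first.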

Typical outputs that can be delivered

  • Variant call information in the form of a VCF file
  • SNP statistics
  • SNP annotations (coding mutations), if requested
  • Short read alignment BAM files ready for viewing using visualisation tools (e.g. IGV)

Frequently Asked Questions

VCF files contain large amounts of information describing SNP features such as read depth, genotype, and quality. This information is presented in a dense, tab-delimited text format that is best viewed using spreadsheet software (or any text file reader). These files contain a lot of useful information and, we won't lie, it will probably take you some time to digest it all if you have never worked with these files before. However, this common format is well documented, and once you are familiar with the key data columns you should have no trouble identifying robust SNPs of interest.
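To make the key data columns more concrete, here is a minimal sketch that splits one VCF data line into its fixed columns, its INFO key/value pairs and its per-sample fields. The record, the sample name `sample1` and the function name are made up for illustration; which INFO and FORMAT tags are present (DP, GT and so on) depends on the caller and its settings.

```python
def parse_vcf_record(line, sample_names):
    """Split one tab-delimited VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    # Column 9 (FORMAT) names the ':'-separated values in each sample column.
    samples = {}
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        for name, column in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(format_keys, column.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
        "samples": samples,
    }


# A made-up heterozygous SNP with a read depth of 20.
example_line = "chr1\t100\t.\tA\tG\t50.5\tPASS\tDP=20;AF=0.5\tGT:DP\t0/1:20"
record = parse_vcf_record(example_line, ["sample1"])
print(record["qual"], record["info"]["DP"], record["samples"]["sample1"]["GT"])
```

Values are left as strings apart from the position and quality, since the type of each tag is declared in the VCF header rather than in the record itself.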

Yes, the BAM files supplied can be viewed using free software such as IGV.

Yes.

SNP calling from short read data is based on a statistical model whose parameters are estimated from the reads themselves. Different programs use slightly different models and parameters to estimate the probability that a SNP call is real (at least given the observed data). Calling SNPs with multiple programs lets you be more confident that the calls do not depend on any one model.

Yes. However, unless very specific annotations have been generated for your species of interest (human, mouse, fly, etc.), these regions are likely to be coarsely defined based on (for example) a specific number of base pairs upstream from the gene(s) of interest.
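The coarse definition described above, a fixed number of base pairs upstream of a gene, can be sketched as follows. This is an illustrative assumption rather than a description of our pipeline: the function names, the 1000 bp default and the example coordinates are hypothetical, and coordinates are taken to be 1-based and inclusive.

```python
def upstream_region(gene_start, gene_end, strand, n_bases=1000):
    """Return (first, last) positions of the region upstream of a gene.

    Upstream means before the start on the '+' strand and after the end
    on the '-' strand. The result is not clipped to the chromosome end.
    """
    if strand == "+":
        return max(1, gene_start - n_bases), gene_start - 1
    if strand == "-":
        return gene_end + 1, gene_end + n_bases
    raise ValueError("strand must be '+' or '-'")


def snp_is_upstream(snp_pos, gene_start, gene_end, strand, n_bases=1000):
    """Check whether a SNP position falls inside the upstream region."""
    first, last = upstream_region(gene_start, gene_end, strand, n_bases)
    return first <= snp_pos <= last


# Hypothetical gene spanning positions 5000-6000.
plus_region = upstream_region(5000, 6000, "+")
minus_region = upstream_region(5000, 6000, "-")
print(plus_region, minus_region)
print(snp_is_upstream(4500, 5000, 6000, "+"))
```

Because the window is defined purely by distance, a SNP falling inside it is only a candidate for further inspection; the window itself says nothing about whether the sequence is functional.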