ConGenR: rapid determination of consensus genotypes and estimates of genotyping errors from replicated genetic samples
Abstract
ConGenR is an R-based conservation genetics script that facilitates rapid determination of consensus genotypes from replicated samples, determines overall (successful amplifications / amplifications attempted) and individual sample-level (proportion of samples with successful amplifications at n loci) amplification success rates, and quantifies genotyping error rates. ConGenR is intended for use with codominant, multilocus microsatellite data generated primarily through noninvasive genetic sampling and processed with a multi-tubes approach. ConGenR handles input that can be easily exported from GENEMAPPER, a program commonly used to score allele sizes. Amplification success and genotyping error rates can be evaluated by sample class (i.e., any identifiable and meaningful subdivision of samples; e.g., sex, season, region, or sample condition), offering insights into processes driving amplification success and genotyping error rates. Additionally, amplification success and genotyping error rates are calculated by locus, expediting the identification of problematic loci during pilot studies.
Citation
RC Lonsinger, LP Waits 2015. ConGenR: rapid determination of consensus genotypes and estimates of genotyping errors from replicated genetic samples. Conservation Genetics Resources 7(4): 841–843.
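The amplification success measures named in the abstract can be illustrated with a small sketch. This is not ConGenR code (ConGenR is an R script); it is a language-independent illustration written in Python, and the record layout, function names, and example data are all hypothetical. The sample-level measure additionally assumes that a locus counts as amplified for a sample if at least one replicate succeeded, which the abstract does not specify.

```python
from collections import Counter

# Hypothetical replicate records: (sample, locus, amplified?)
attempts = [
    ("S1", "L1", True),  ("S1", "L1", True),
    ("S1", "L2", False), ("S1", "L2", True),
    ("S2", "L1", False), ("S2", "L1", False),
    ("S2", "L2", True),  ("S2", "L2", True),
]

def overall_success(records):
    """Successful amplifications divided by amplifications attempted."""
    return sum(ok for _, _, ok in records) / len(records)

def success_by_locus(records):
    """The same ratio, computed separately for each locus."""
    tried, succeeded = Counter(), Counter()
    for _, locus, ok in records:
        tried[locus] += 1
        succeeded[locus] += ok
    return {locus: succeeded[locus] / tried[locus] for locus in tried}

def loci_amplified_distribution(records):
    """Proportion of samples that amplified at n loci, for each observed n.

    Assumption: a locus counts for a sample if any replicate amplified.
    """
    loci_ok = {}
    for sample, locus, ok in records:
        loci_ok.setdefault(sample, set())
        if ok:
            loci_ok[sample].add(locus)
    counts = Counter(len(loci) for loci in loci_ok.values())
    return {n: c / len(loci_ok) for n, c in counts.items()}

print(overall_success(attempts))
print(success_by_locus(attempts))
print(loci_amplified_distribution(attempts))
```

Grouping the same records by a sample class (for example season or region) instead of by locus would give the class-level rates described above.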
To download the ConGenR script, user manual, and example input files, please (1) enter your email address below and (2) indicate whether or not you would like to receive notifications.
New Features: ConGenR now supports matching of multilocus consensus genotypes
Following the determination of consensus genotypes, it is often desirable to identify samples with identical (full match) or similar (partial match) multilocus genotypes. The congen.matching() function identifies such samples. Users define the number of matching loci required to report two samples as a match; in practice, this number will often be set to the number of loci required to meet the desired level of probability of identity (Waits et al. 2001). An option to consider or ignore loci with uncertainty (coded as an allele size = 0) provides a flexible framework for identifying matches even when uncertainty exists. When uncertainty is ignored, two samples are considered a match if the number of loci that are full matches and the number of loci with uncertainty sum to a value greater than the number of matches required.
Matching methods employed by ConGenR allow the user to provide sample location data, which can be numeric (i.e., XY data, such as UTM coordinates) or categorical (e.g., region, county, study area). Sample locations are provided in a separate file or data frame. When numeric locations are provided, the results include the distance between each focal sample (the sample to which other samples are being compared) and each sample determined to be a match. Alternatively, if locations are categorical, the location of each sample is added to the result file, facilitating comparisons between the focal sample and matching samples.
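The matching rule described above can be sketched as follows. This is an illustration of the logic only, not the congen.matching() implementation or its interface: it is written in Python rather than R, and every function name, argument, and data value here is hypothetical. Three assumptions are made. First, the threshold is treated as inclusive (at least the required number of loci); the user manual is the authority on the exact comparison. Second, when uncertainty is considered rather than ignored, uncertain loci do not count toward a match. Third, distance between numeric locations is taken to be straight-line (Euclidean) distance.

```python
import math

def locus_status(g1, g2):
    """Classify one locus as 'uncertain', 'match', or 'mismatch'."""
    if 0 in g1 or 0 in g2:  # an allele size of 0 codes uncertainty
        return "uncertain"
    return "match" if sorted(g1) == sorted(g2) else "mismatch"

def is_match(a, b, required, ignore_uncertainty=True):
    """Decide whether two multilocus genotypes match.

    Assumption: at least `required` loci must count (inclusive threshold).
    """
    statuses = [locus_status(a[locus], b[locus]) for locus in a]
    counted = statuses.count("match")
    if ignore_uncertainty:
        # Uncertain loci are given the benefit of the doubt.
        counted += statuses.count("uncertain")
    return counted >= required

def find_matches(genotypes, focal, required, ignore_uncertainty=True, xy=None):
    """Return (sample, distance) pairs for samples matching the focal sample.

    Distance is None when no numeric locations are supplied.
    """
    results = []
    for sample, genotype in genotypes.items():
        if sample == focal:
            continue
        if is_match(genotypes[focal], genotype, required, ignore_uncertainty):
            distance = math.dist(xy[focal], xy[sample]) if xy else None
            results.append((sample, distance))
    return results

# Hypothetical consensus genotypes (allele sizes) and XY locations.
genotypes = {
    "A": {"L1": (150, 154), "L2": (200, 204), "L3": (98, 102)},
    "B": {"L1": (154, 150), "L2": (200, 204), "L3": (0, 0)},
    "C": {"L1": (150, 152), "L2": (200, 204), "L3": (98, 102)},
}
xy = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0)}

print(find_matches(genotypes, "A", required=3, xy=xy))
print(find_matches(genotypes, "A", required=3, ignore_uncertainty=False, xy=xy))
```

With three loci required and uncertainty ignored, sample B is reported as a match to A because its uncertain third locus is counted alongside its two full matches; with uncertainty considered, B no longer qualifies. Lowering the requirement to two loci reports both B and C as partial matches.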