PseudoBase information

Lines available (not all lines available for all queries)

D.pseudoobscura pseudoobscura

31 lines

Mesa Verde, CO 2-25 reference line
collected: 2005
Source : Richards
SRA Experiment : Original

American Fork Canyon, UT 06
collected: 2015
SRA Experiment : SRX7842584

American Fork Canyon, UT 12
collected: 1997
SRA Experiment : SRX091462

American Fork Canyon, UT 12-2
collected: 2015
SRA Experiment : SRX7842593

American Fork Canyon, UT 14
collected: 2015
SRA Experiment : SRX7842580

American Fork Canyon, UT 19
collected: 2015
SRA Experiment : SRX7842589

American Fork Canyon, UT 24
collected: 2015
SRA Experiment : SRX7842600

American Fork Canyon, UT 30
collected: 2015
SRA Experiment : SRX7842597

American Fork Canyon, UT 47
collected: 2015
SRA Experiment : SRX7842579

American Fork Canyon, UT 48
collected: 2015
SRA Experiment : SRX7842594

American Fork Canyon, UT 49
collected: 2015
SRA Experiment : SRX7842595

American Fork Canyon, UT 56
collected: 2015
SRA Experiment : SRX7842581

American Fork Canyon, UT 57
collected: 2015
SRA Experiment : SRX7842596

American Fork Canyon, UT 60
collected: 2015
SRA Experiment : SRX7842587

Dpse SR
SRA Experiment : SRX3430959

Dpse ST
SRA Experiment : SRX3430958

Flagstaff, AZ 18
collected: 1997
SRA Experiment : SRX091310

Madera Canyon, AZ 06
collected: 2015
SRA Experiment : SRX7842583

Madera Canyon, AZ 13
collected: 2015
SRA Experiment : SRX7842582

Madera Canyon, AZ 14
collected: 2015
SRA Experiment : SRX7842586

Madera Canyon, AZ 15
collected: 2015
SRA Experiment : SRX7842599

Madera Canyon, AZ 17
collected: 2015
SRA Experiment : SRX7842588

Madera Canyon, AZ 20
collected: 2015
SRA Experiment : SRX7842598

Madera Canyon, AZ 27
collected: 2015
SRA Experiment : SRX7842591

Mather, CA 32
collected: 1997
SRA Experiment : SRX091461

Mather, CA TL
collected: 1959
SRA Experiment : SRX091324

Mount St. Helena, CA 24
collected: 1997
SRA Experiment : SRX091463

Mount St. Helena, CA 9
collected: 1997
SRA Experiment : SRX091465

S7-Flag14
collected: 1997
SRA Experiment : SRX7842590

San Antonio, NM, Pikes Peak 1134
collected: 2006
SRA Experiment : SRX091323

San Antonio, NM, Pikes Peak 1137
collected: 2006
SRA Experiment : SRX091311

D.pseudoobscura bogotana

5 lines

SCinv
SRA Experiment : SRX7260972

Suta3
SRA Experiment : SRX7260973

Toro1
SRA Experiment : SRX091468

Toro4
SRA Experiment : SRX7260971

WhiteER
SRA Experiment : SRX7260970

D.persimilis

13 lines

111_35
SRA Experiment : Pending release under PRJNA672098

111_50
SRA Experiment : Pending release under PRJNA672098

111_51
SRA Experiment : Pending release under PRJNA672098

Dper SR
SRA Experiment : SRX3430961

Dper ST
SRA Experiment : SRX3430960

MSH3
SRA Experiment : Pending release under PRJNA672098

MSH42
SRA Experiment : Pending release under PRJNA672098

MSH7
SRA Experiment : Pending release under PRJNA672098

Mather40
SRA Experiment : Pending release under PRJNA672098

MatherG
SRA Experiment : Pending release under PRJNA672098

Mount St. Helena, CA 1993
collected: 1993
SRA Experiment : SRX104991, SRX104992

Mount St. Helena, CA 39
collected: 1997
SRA Experiment : SRX063440

Santa Cruz Island
collected: 2004
SRA Experiment : SRX091471

D.miranda

11 lines

MA28
SRA Experiment : SRX950183

MAO101.4
SRA Experiment : SRX950187

MAO3.3
SRA Experiment : SRX950188

MAO3.4
SRA Experiment : SRX950189

MAO3.5
SRA Experiment : SRX950190

MAO3.6
SRA Experiment : SRX950211

ML14
SRA Experiment : SRX965452

ML16
SRA Experiment : SRX965455

ML6f
SRA Experiment : SRX965460

SP138
Source : D Bachtrog
SRA Experiment : SRX965461

SP235
SRA Experiment : SRX965462

D.lowei

1 line

Lab3Lowei
SRA Experiment : SRX091467, SRX091466

About PseudoBase

Drosophila pseudoobscura is a classic model system for the study of evolutionary genetics and genomics, and many genome sequences have accumulated for D. pseudoobscura and closely related species. To facilitate the exploration of genetic variation within species and comparative genomics across species, we present PseudoBase. This database contains genetic variation (SNPs and indels) from D. pseudoobscura and several related species. All genetic data within the database are derived from the same workflow, so variants are easily comparable across data sets. Features include an embedded JBrowse interface, ability to pull out alignments of individual genes/regions, and batch access for gene lists. Anyone can take advantage of this database without the burden of obtaining and downloading raw data, assembling genomes, or calling variants. We hope that this resource will be of use in both research and educational settings.

For further details, please see the documentation here or get in touch using the email address listed below. When citing PseudoBase, please cite our paper and reference the version number indicated in the footer (and consult the Updates tab above for details about version differences):

Korunes, KL, RB Myers, R Hardy, and MAF Noor. 2020. PseudoBase: A genomic visualization and exploration resource for the Drosophila pseudoobscura subgroup. Fly. In press. doi:10.1080/19336934.2020.1864201

Contact

Please contact us at pseudobase.help@gmail.com with any questions or comments regarding PseudoBase.

PseudoBase Data

PseudoBase uses whole-genome paired-end Illumina sequencing from multiple laboratory groups and experiments (associated publications include: Fuller, Leonard, Young, Schaeffer, & Phadnis, 2018; Korunes et al., 2019; McGaugh et al., 2012; McGaugh & Noor, 2012; Samuk, Manzano-Winkler, Ritz, & Noor, 2020). Raw sequencing data and associated details are available from the NCBI Sequence Read Archive (SRA) under the experiment accessions provided in the "Lines available" tab.

A brief note on naming conventions: there are two named subspecies of D. pseudoobscura: D. pseudoobscura pseudoobscura and D. pseudoobscura bogotana. In PseudoBase and in our associated paper, we use D. pseudoobscura to refer to both subspecies. We specify D. pseudoobscura pseudoobscura or D. pseudoobscura bogotana when we are referring to only one of the two subspecies.

The pipeline used for genome alignment and variant calling is available on GitHub (https://github.com/kkorunes/PseudobaseScripts).
In summary, we first used BWA-0.7.17 (Li & Durbin 2009) to align all sequences to the D. pseudoobscura genome assembly obtained from FlyBase (Dpse_3.04: GCA_000001765.2; Thurmond et al. 2019). Please note that while PseudoBase includes only this reference genome, FlyBase provides a coordinate converter that is useful for converting coordinates from the previous version of the D. pseudoobscura reference (Coordinate Converter). We used Picard (http://broadinstitute.github.io/picard/) to mark adapters and duplicates that might introduce bias from data generation steps such as PCR amplification. Variants were then called and filtered using GATK v4.1.1 (McKenna et al. 2010; Van der Auwera et al. 2013). We filtered SNPs and INDELs separately, according to the hard filtering recommendations provided by GATK. Specifically, we excluded SNPs with QualByDepth (QD) < 2.0, FisherStrand (FS) > 60, StrandOddsRatio (SOR) > 3.0, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8. INDELs were filtered to exclude variants with QualByDepth (QD) < 2.0, FisherStrand (FS) > 200, StrandOddsRatio (SOR) > 10.0, or ReadPosRankSum < -20.
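The hard-filter thresholds above can be read as a simple per-variant predicate. The following Python sketch is illustrative only and is not taken from the PseudoBase pipeline: the function and dictionary names are hypothetical, the INFO annotations are assumed to be available as a plain mapping of floats, the INDEL ReadPosRankSum cutoff is taken as -20.0 (the value GATK suggests for hard filtering), and annotations missing from a record are simply skipped rather than treated as failures.

```python
from typing import Callable, Dict, List, Mapping

# Each rule returns True when the annotation value FAILS the filter.
# Thresholds follow the text; the INDEL ReadPosRankSum cutoff of -20.0
# is an assumption based on GATK's suggested hard-filter value.
SNP_RULES: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 3.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

INDEL_RULES: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "SOR": lambda v: v > 10.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def failed_filters(info: Mapping[str, float], is_snp: bool) -> List[str]:
    """Return the names of the annotations that fail the hard filters.

    Annotations absent from `info` are skipped (a simplification).
    """
    rules = SNP_RULES if is_snp else INDEL_RULES
    failed = []
    for name, fails in rules.items():
        value = info.get(name)
        if value is not None and fails(value):
            failed.append(name)
    return failed


if __name__ == "__main__":
    snp = {"QD": 1.5, "FS": 10.0, "SOR": 1.0, "MQ": 60.0}
    print(failed_filters(snp, is_snp=True))
```

A variant would be excluded when the returned list is non-empty, which matches the "any one criterion fails" reading of the thresholds.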

PseudoBase User Interface

PseudoBase home search page

The PseudoBase homepage allows the user to query by gene (or genes if the user uploads a batch query) or by chromosomal region. By selecting one or more species of interest, the user can either generate a FASTA-formatted alignment or navigate to the JBrowse interface (Buels et al., 2016). Supported formats in the “By Gene” search function on the homepage include gene names (e.g., adh), GA IDs (e.g., GA26895), CG IDs (e.g., CG10064), GL IDs (e.g., GL15062), GLEANR IDs (e.g., GLEANR_4729), and FlyBase IDs (e.g., FBgn0248267). D. melanogaster gene IDs are available for search because D. melanogaster orthologs in other sequenced Drosophila genomes are reported by FlyBase (as determined by OrthoDB), and PseudoBase uses this ortholog report to display the relevant orthologous D. pseudoobscura gene when a D. melanogaster gene identifier is entered (Thurmond et al., 2019; FlyBase file "dmel_orthologs_in_drosophila_species_fb_2020_04.tsv.gz"). PseudoBase also uses this ortholog report to look up gene identifiers of D. persimilis, by first determining the D. melanogaster ortholog and then looking up the D. pseudoobscura ortholog. Note that any genomic region, including subfeatures of genes such as introns, can be accessed from this page by entering its genomic coordinates into the “By Chromosome” tab. The JBrowse interface can also be reached directly, using the “Browse” tab.
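When preparing a batch query, it can help to check which identifier format each entry appears to use. The Python sketch below classifies a query string by pattern. The regular expressions are inferred solely from the example IDs given above and are an assumption; they are not PseudoBase's own validation rules, and the function name is hypothetical.

```python
import re

# Patterns inferred from the example identifiers in the text (assumption).
ID_PATTERNS = [
    ("FlyBase ID", re.compile(r"^FBgn\d+$")),
    ("GLEANR ID", re.compile(r"^GLEANR_\d+$")),
    ("GA ID", re.compile(r"^GA\d+$")),
    ("CG ID", re.compile(r"^CG\d+$")),
    ("GL ID", re.compile(r"^GL\d+$")),
]


def classify_gene_query(query: str) -> str:
    """Guess the identifier format of a single gene query string."""
    token = query.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(token):
            return label
    # Anything that matches no ID pattern is treated as a gene name.
    return "gene name"


if __name__ == "__main__":
    for q in ["adh", "GA26895", "CG10064", "GL15062", "GLEANR_4729", "FBgn0248267"]:
        print(q, "->", classify_gene_query(q))
```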

If the user generates a FASTA-formatted alignment, the FASTA headers will contain the following information, depending on whether the search was performed "By Gene" or "By Chromosome":

By gene: 'species' | 'strain name' | 'reference sequence FlyBase release' | 'chrom'_'gene CDS start pos' 'mRNA transcript used to determine CDS' | 'list of gene synonyms/translations for selected gene'

By chromosome: 'species' | 'strain name' | 'chromosome' | 'reference sequence FlyBase release' | 'position range selected'
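For downstream scripts, the "By chromosome" header can be split into named fields. This is a minimal Python sketch under stated assumptions: it assumes the fields are separated by the "|" character as shown above, that no field itself contains "|", and that surrounding whitespace can be trimmed. The field names and the example header are hypothetical and were not copied from real PseudoBase output.

```python
from typing import Dict

# Field order follows the "By chromosome" description; names are hypothetical.
CHROM_FIELDS = ["species", "strain", "chromosome", "release", "position_range"]


def parse_chromosome_header(header: str) -> Dict[str, str]:
    """Split a 'By chromosome' FASTA header line into a field dictionary."""
    # Drop the leading '>' of a FASTA header, then split on the pipe separator.
    parts = [part.strip() for part in header.strip().lstrip(">").split("|")]
    if len(parts) != len(CHROM_FIELDS):
        raise ValueError(
            f"expected {len(CHROM_FIELDS)} fields, found {len(parts)}: {header!r}"
        )
    return dict(zip(CHROM_FIELDS, parts))


if __name__ == "__main__":
    # Made-up header, for illustration only.
    example = ">D. miranda | MA28 | 2 | r3.04 | 550000..564000"
    print(parse_chromosome_header(example))
```

The "By gene" header is not parsed here because its fourth field combines several values and the exact separators are not fully specified in the description.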

PseudoBase JBrowse

The embedded JBrowse interface allows for browsing of specific genes/regions. All strains imported into PseudoBase are automatically made available for browsing within JBrowse, and the user can view or hide strains by checking/unchecking boxes in the “Available Tracks” panel to the left of the JBrowse viewer. JBrowse allows the user to visualize SNPs and indels (tracks with the “I/D” prefix) specific to each selected track. Clicking on any of the displayed variants brings up further details, such as the specific allele and its attributes (e.g., sequencing depth).

The reference sequence and accompanying annotations in the JBrowse interface are provided by FlyBase (Dpse_3.04: GCA_000001765.2; Thurmond et al. 2019). Clicking on either “Ref sequence” or “Ref annotations” brings up details about these data tracks, including the color legend used to identify annotation features (e.g., coding genes, ncRNA, orthologous regions, etc.) at a glance. Clicking on an annotation itself will bring up a variety of details including genomic coordinates, length, alternative names, and the reference sequence of the feature, with an option to save a FASTA-formatted download of this sequence.

The top of the JBrowse interface includes arrows to navigate along the selected chromosome, -/+ options to zoom out or in, a chromosome dropdown menu to jump to a different chromosome, and a chromosome coordinate entry box to jump to a different region of the selected chromosome.

For the genomic region selected in the JBrowse viewer, the reference sequence and accompanying annotations are pulled from FlyBase (ftp://ftp.flybase.net/genomes/Drosophila_pseudoobscura/dpse_r3.04_FB2018_05/gff/dpse-all-3.04.gff.gz). The displayed features include genes, coding sequences (CDSs), exons, introns, untranslated regions (5’ and 3’ UTRs), mRNA, ncRNA, orthologous regions, “orthologous to” annotations, proteins, and syntenic_regions. See the FlyBase documentation for descriptions of these data types. Clicking on any of these features brings up detailed information, including coordinates, the feature length, any aliases, the full nucleotide sequence, and the nucleotide sequence of each subfeature (e.g., introns).

URL Query Parameters

PseudoBase URL query parameters allow the user to link directly to a specific gene or region in PseudoBase via a URL. Available parameters are the region (gene or genomic coordinates), the output format ('fasta' or 'jbrowse', with the default set to 'fasta'), and the species (a comma-separated list, e.g., 'pse,mir,bog,low,per'; the default is 'pse'). For example:

Jump straight to Adh gene in JBrowse:

http://pseudobase.biology.duke.edu?gene=Adh&output=jbrowse

Jump straight to GA10043 gene FASTA results (output mir and low species):

http://pseudobase.biology.duke.edu?gene=GA10043&species=mir,low

Jump straight to Chr 2, coordinates 550000..564000 in JBrowse:

http://pseudobase.biology.duke.edu?chrom=2&pos=550000..564000&output=jbrowse
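Links like the ones above can also be generated programmatically, for example when building a list of URLs for many genes. The Python sketch below assembles a query string using only the parameter names that appear in the examples (gene, chrom, pos, output, species). The helper name is hypothetical, and no validation of parameter combinations is attempted, since the server-side rules are not described here.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "http://pseudobase.biology.duke.edu"


def build_pseudobase_url(
    gene: Optional[str] = None,
    chrom: Optional[str] = None,
    pos: Optional[str] = None,
    output: Optional[str] = None,
    species: Optional[Sequence[str]] = None,
) -> str:
    """Build a PseudoBase link from the documented query parameters."""
    params = {}
    if gene is not None:
        params["gene"] = gene
    if chrom is not None:
        params["chrom"] = chrom
    if pos is not None:
        params["pos"] = pos          # e.g. "550000..564000"
    if output is not None:
        params["output"] = output    # 'fasta' or 'jbrowse'
    if species:
        params["species"] = ",".join(species)
    # safe="," keeps the species list readable instead of encoding commas.
    return BASE_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(build_pseudobase_url(gene="Adh", output="jbrowse"))
    print(build_pseudobase_url(gene="GA10043", species=["mir", "low"]))
    print(build_pseudobase_url(chrom="2", pos="550000..564000", output="jbrowse"))
```

Omitted parameters are left out of the URL so that the documented defaults ('fasta' output, 'pse' species) apply.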

Citations

Buels, R., Yao, E., Diesh, C. M., Hayes, R. D., Munoz-Torres, M., Helt, G., … Holmes, I. H. (2016). JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology, 17(66). doi: 10.1186/s13059-016-0924-1
Fuller, Z. L., Leonard, C. J., Young, R. E., Schaeffer, S. W., & Phadnis, N. (2018). Ancestral polymorphisms explain the role of chromosomal inversions in speciation. PLoS Genetics, 14(7), e1007526. doi: 10.1371/journal.pgen.1007526
Korunes, K. L., Machado, C. A., & Noor, M. A. (2019). Inversions shape the divergence of Drosophila pseudoobscura and D. persimilis on multiple timescales. BioRxiv, 842047. doi: 10.1101/842047
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25, 1754–1760. doi: 10.1093/bioinformatics/btp324
McGaugh, S. E., Heil, C. S. S., Manzano-Winkler, B., Loewe, L., Goldstein, S., Himmel, T. L., & Noor, M. A. F. (2012). Recombination modulates how selection affects linked sites in Drosophila. PLoS Biology, 10(11), e1001422. doi: 10.1371/journal.pbio.1001422
McGaugh, S. E., & Noor, M. A. F. (2012). Genomic impacts of chromosomal inversions in parapatric Drosophila species. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1587), 422–429. doi: 10.1098/rstb.2011.0250
McKenna, A., Hanna, M., Banks, E., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20, 1297–1303. doi: 10.1101/gr.107524.110
Samuk, K., Manzano-Winkler, B., Ritz, K. R., & Noor, M. A. F. (2020). Natural selection shapes variation in genome-wide recombination rate in Drosophila pseudoobscura. Current Biology, 30(8), 1517-1528.E6. doi: 10.1016/j.cub.2020.03.0
Thurmond, J., Goodman, J. L., Strelets, V. B., et al. (2019). FlyBase 2.0: The next generation. Nucleic Acids Research, 47, D759–D765. doi: 10.1093/nar/gk
Van der Auwera, G. A., Carneiro, M. O., Hartl, C., et al. (2013). From FastQ data to high-confidence variant calls: The Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics, 43, 11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43
