PseudoBase Information


Lines available (not all lines available for all queries)

D.pseudoobscura pseudoobscura
31 lines
Mesa Verde, CO 2-25 reference line
collected: 2005
Source : Richards
SRA Experiment : Original
American Fork Canyon, UT 06
collected: 2015
SRA Experiment : SRX7842584
American Fork Canyon, UT 12
collected: 1997
SRA Experiment : SRX091462
American Fork Canyon, UT 12-2
collected: 2015
SRA Experiment : SRX7842593
American Fork Canyon, UT 14
collected: 2015
SRA Experiment : SRX7842580
American Fork Canyon, UT 19
collected: 2015
SRA Experiment : SRX7842589
American Fork Canyon, UT 24
collected: 2015
SRA Experiment : SRX7842600
American Fork Canyon, UT 30
collected: 2015
SRA Experiment : SRX7842597
American Fork Canyon, UT 47
collected: 2015
SRA Experiment : SRX7842579
American Fork Canyon, UT 48
collected: 2015
SRA Experiment : SRX7842594
American Fork Canyon, UT 49
collected: 2015
SRA Experiment : SRX7842595
American Fork Canyon, UT 56
collected: 2015
SRA Experiment : SRX7842581
American Fork Canyon, UT 57
collected: 2015
SRA Experiment : SRX7842596
American Fork Canyon, UT 60
collected: 2015
SRA Experiment : SRX7842587
Dpse SR
SRA Experiment : SRX3430959
Dpse ST
SRA Experiment : SRX3430958
Flagstaff, AZ 18
collected: 1997
SRA Experiment : SRX091310
Madera Canyon, AZ 06
collected: 2015
SRA Experiment : SRX7842583
Madera Canyon, AZ 13
collected: 2015
SRA Experiment : SRX7842582
Madera Canyon, AZ 14
collected: 2015
SRA Experiment : SRX7842586
Madera Canyon, AZ 15
collected: 2015
SRA Experiment : SRX7842599
Madera Canyon, AZ 17
collected: 2015
SRA Experiment : SRX7842588
Madera Canyon, AZ 20
collected: 2015
SRA Experiment : SRX7842598
Madera Canyon, AZ 27
collected: 2015
SRA Experiment : SRX7842591
Mather, CA 32
collected: 1997
SRA Experiment : SRX091461
Mather, CA TL
collected: 1959
SRA Experiment : SRX091324
Mount St. Helena, CA 24
collected: 1997
SRA Experiment : SRX091463
Mount St. Helena, CA 9
collected: 1997
SRA Experiment : SRX091465
S7-Flag14
collected: 1997
SRA Experiment : SRX7842590
San Antonio, NM, Pikes Peak 1134
collected: 2006
SRA Experiment : SRX091323
San Antonio, NM, Pikes Peak 1137
collected: 2006
SRA Experiment : SRX091311
D.pseudoobscura bogotana
5 lines
SCinv
SRA Experiment : SRX7260972
Suta3
SRA Experiment : SRX7260973
Toro1
SRA Experiment : SRX091468
Toro4
SRA Experiment : SRX7260971
WhiteER
SRA Experiment : SRX7260970
D.persimilis
13 lines
111_35
SRA Experiment : Pending release under PRJNA672098
111_50
SRA Experiment : Pending release under PRJNA672098
111_51
SRA Experiment : Pending release under PRJNA672098
Dper SR
SRA Experiment : SRX3430961
Dper ST
SRA Experiment : SRX3430960
MSH3
SRA Experiment : Pending release under PRJNA672098
MSH42
SRA Experiment : Pending release under PRJNA672098
MSH7
SRA Experiment : Pending release under PRJNA672098
Mather40
SRA Experiment : Pending release under PRJNA672098
MatherG
SRA Experiment : Pending release under PRJNA672098
Mount St. Helena, CA 1993
collected: 1993
SRA Experiment : SRX104991, SRX104992
Mount St. Helena, CA 39
collected: 1997
SRA Experiment : SRX063440
Santa Cruz Island
collected: 2004
SRA Experiment : SRX091471
D.miranda
11 lines
MA28
SRA Experiment : SRX950183
MAO101.4
SRA Experiment : SRX950187
MAO3.3
SRA Experiment : SRX950188
MAO3.4
SRA Experiment : SRX950189
MAO3.5
SRA Experiment : SRX950190
MAO3.6
SRA Experiment : SRX950211
ML14
SRA Experiment : SRX965452
ML16
SRA Experiment : SRX965455
ML6f
SRA Experiment : SRX965460
SP138
Source : D Bachtrog
SRA Experiment : SRX965461
SP235
SRA Experiment : SRX965462
D.lowei
1 line
Lab3Lowei
SRA Experiment : SRX091467, SRX091466

About PseudoBase

Drosophila pseudoobscura is a classic model system for the study of evolutionary genetics and genomics, and many genome sequences have accumulated for D. pseudoobscura and closely related species. To facilitate the exploration of genetic variation within species and comparative genomics across species, we present PseudoBase. This database contains genetic variation (SNPs and indels) from D. pseudoobscura and several related species. All genetic data within the database are derived from the same workflow, so variants are easily comparable across data sets. Features include an embedded JBrowse interface, ability to pull out alignments of individual genes/regions, and batch access for gene lists. Anyone can take advantage of this database without the burden of obtaining and downloading raw data, assembling genomes, or calling variants. We hope that this resource will be of use in both research and educational settings.

For further details, please see the documentation here or get in touch using the email address listed below. When citing PseudoBase, please cite our paper and reference the version number indicated in the footer (and consult the Updates tab above for details about version differences):

Korunes, KL, RB Myers, R Hardy, and MAF Noor. 2020. PseudoBase: A genomic visualization and exploration resource for the Drosophila pseudoobscura subgroup. Fly. In press. doi:10.1080/19336934.2020.1864201

Contact

Please contact us at pseudobase.help@gmail.com with any questions or comments regarding PseudoBase.

PseudoBase Data

PseudoBase uses whole-genome paired-end Illumina sequencing from multiple laboratory groups and experiments (associated publications include: Fuller, Leonard, Young, Schaeffer, & Phadnis, 2018; Korunes et al., 2019; McGaugh et al., 2012; McGaugh & Noor, 2012; Samuk, Manzano-Winkler, Ritz, & Noor, 2020). Raw sequencing data and associated details are available on the NCBI Sequence Read Archive (SRA) under the sample accessions provided in the "lines available" tab.

A brief note on naming conventions: there are two named subspecies of D. pseudoobscura: D. pseudoobscura pseudoobscura and D. pseudoobscura bogotana. In PseudoBase and in our associated paper, we use D. pseudoobscura to refer to both subspecies. We specify D. pseudoobscura pseudoobscura or D. pseudoobscura bogotana when we are referring to only one of the two subspecies.

The pipeline used for genome alignment and variant calling is available on GitHub (https://github.com/kkorunes/PseudobaseScripts).
In summary, we first used BWA-0.7.17 (Li & Durbin 2009) to align all sequences to the D. pseudoobscura genome assembly, obtained from FlyBase (Dpse_3.04: GCA_000001765.2; Thurmond et al. 2019). Please note that while PseudoBase only includes this reference genome, FlyBase provides a coordinate converter that is useful for converting coordinates from the previous version of the D. pseudoobscura reference (Coordinate Converter). We used Picard to mark adapters and duplicates that might introduce bias from data generation steps such as PCR amplification (http://broadinstitute.github.io/picard/). Variants were then called and filtered using GATK v4.1.1 (McKenna et al. 2010; Van der Auwera et al. 2013). We filtered SNPs and INDELs separately, according to the hard filtering recommendations provided by GATK. Specifically, we excluded SNPs with QualByDepth (QD) < 2.0, FisherStrand (FS) > 60, StrandOddsRatio (SOR) > 3.0, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8. INDELs were filtered to exclude variants with QualByDepth (QD) < 2.0, FisherStrand (FS) > 200, StrandOddsRatio (SOR) > 10.0, or ReadPosRankSum < -20.
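
The SNP thresholds listed above can be expressed as a small rule table. The following is a minimal Python sketch, not the pipeline's actual code (the pipeline itself uses GATK; see the GitHub repository). It assumes that a variant is excluded when any single criterion is met, and that a missing annotation does not by itself fail a variant. The names SNP_EXCLUDE_RULES and failed_rules are hypothetical.

```python
import operator

# SNP exclusion criteria as listed in the text: (annotation, comparison, threshold).
# Assumption of this sketch: a SNP is excluded if ANY of these tests is true.
SNP_EXCLUDE_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("SOR", operator.gt, 3.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def failed_rules(annotations, rules=SNP_EXCLUDE_RULES):
    """Return the names of the annotations that trigger an exclusion rule."""
    failed = []
    for name, compare, threshold in rules:
        value = annotations.get(name)
        if value is None:
            # Assumption: an absent annotation is skipped rather than failed.
            continue
        if compare(value, threshold):
            failed.append(name)
    return failed


# Hypothetical annotation values, for illustration only.
low_quality = {"QD": 1.5, "FS": 10.0, "SOR": 1.0, "MQ": 60.0,
               "MQRankSum": 0.0, "ReadPosRankSum": 0.0}
print(failed_rules(low_quality))  # the QD rule is the only one triggered
```

A different rule table (for example, one built from the INDEL thresholds) can be passed through the rules argument without changing the function.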

PseudoBase User Interface

PseudoBase Home search page

The PseudoBase homepage allows the user to query by gene (or genes if the user uploads a batch query) or by chromosomal region. By selecting one or more species of interest, the user can either generate a FASTA-formatted alignment or navigate to the JBrowse interface (Buels et al., 2016). Supported formats in the “By Gene” search function on the homepage include gene names (e.g., adh), GA IDs (e.g., GA26895), CG IDs (e.g., CG10064), GL IDs (e.g., GL15062), GLEANR IDs (e.g., GLEANR_4729), and FlyBase IDs (e.g., FBgn0248267). D. melanogaster gene IDs are available for search because D. melanogaster orthologs in other sequenced Drosophila genomes are reported by FlyBase (as determined by OrthoDB), and PseudoBase uses this ortholog report to display the relevant orthologous D. pseudoobscura gene when a D. melanogaster gene identifier is entered (Thurmond et al., 2019; FlyBase file "dmel_orthologs_in_drosophila_species_fb_2020_04.tsv.gz"). PseudoBase also uses this ortholog report to look up gene identifiers of D. persimilis, by first determining the D. melanogaster ortholog, then looking up the D. pseudoobscura ortholog. Note that any genomic region, including subfeatures of genes such as introns, can be accessed from this page by entering its genomic coordinates in the “By Chromosome” tab. The JBrowse interface can also be reached directly, using the “Browse” tab.
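
When preparing a batch gene list, it can help to check which of the identifier formats above each entry resembles. The sketch below is an illustration only: the regular expressions are inferred from the example identifiers in this section (a prefix followed by digits) and are an assumption, not PseudoBase's actual matching logic. The names ID_PATTERNS and classify_identifier are hypothetical.

```python
import re

# Patterns inferred from the examples in the text; the real rules may differ.
ID_PATTERNS = [
    ("GLEANR ID", re.compile(r"GLEANR_\d+")),
    ("FlyBase ID", re.compile(r"FBgn\d+")),
    ("GA ID", re.compile(r"GA\d+")),
    ("CG ID", re.compile(r"CG\d+")),
    ("GL ID", re.compile(r"GL\d+")),
]


def classify_identifier(query):
    """Guess the identifier format of a single query string."""
    query = query.strip()
    for label, pattern in ID_PATTERNS:
        # fullmatch requires the whole string to fit the pattern,
        # so "GLEANR_4729" is never mistaken for a GL ID.
        if pattern.fullmatch(query):
            return label
    # Anything else is treated as a gene name such as "adh".
    return "gene name"


for example in ["adh", "GA26895", "CG10064", "GL15062", "GLEANR_4729", "FBgn0248267"]:
    print(example, "->", classify_identifier(example))
```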

If the user generates a FASTA-formatted alignment, the FASTA headers will contain the following information, depending on whether the search was performed "By Gene" or "By Chromosome":

By gene: 'species' | 'strain name' | 'reference sequence FlyBase release' | 'chrom'_'gene CDS start pos' 'mRNA transcript used to determine CDS' | 'list of gene synonyms/translations for selected gene'

By chromosome: 'species' | 'strain name' | 'chromosome' | 'reference sequence FlyBase release' | 'position range selected'
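
As an illustration of the "By chromosome" layout, the following Python sketch splits such a header into its five fields. It assumes the fields are separated by a "|" character and that no field contains "|" itself; the exact spacing in real PseudoBase output may differ. The names RegionHeader and parse_region_header, and the example header values, are hypothetical.

```python
from typing import NamedTuple


class RegionHeader(NamedTuple):
    species: str
    strain: str
    chromosome: str
    release: str
    position_range: str


def parse_region_header(header):
    """Split a 'By chromosome' FASTA header into its five named fields."""
    # Drop the leading '>' of a FASTA header line, then split on the separator.
    fields = [field.strip() for field in header.strip().lstrip(">").split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, found {len(fields)}: {header!r}")
    return RegionHeader(*fields)


# Made-up header for illustration; real field values may be formatted differently.
example = ">D. pseudoobscura | Mather, CA 32 | 2 | r3.04 | 550000..564000"
parsed = parse_region_header(example)
print(parsed.strain, parsed.position_range)
```

The "By gene" layout has a different field order and would need its own parser.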

PseudoBase JBrowse

The embedded JBrowse interface allows for browsing of specific genes/regions. All strains imported into PseudoBase are automatically made available for browsing within JBrowse, and the user can view or hide strains by checking/unchecking boxes in the “Available Tracks” panel to the left of the JBrowse viewer. JBrowse allows the user to visualize SNPs and indels (tracks with the “I/D” prefix) specific to each selected track. Clicking on any of the displayed variants brings up further details, such as the specific allele and its attributes (e.g., sequencing depth).

The reference sequence and accompanying annotations in the JBrowse interface are provided by FlyBase (Dpse_3.04: GCA_000001765.2; Thurmond et al. 2019). Clicking on either “Ref sequence” or “Ref annotations” brings up details about these data tracks, including the color legend used to identify annotation features (e.g., coding genes, ncRNA, orthologous regions) at a glance. Clicking on an annotation itself will bring up a variety of details including genomic coordinates, length, alternative names, and the reference sequence of the feature, with an option to save a FASTA-formatted download of this sequence.

The top of the JBrowse interface includes arrows to navigate along the selected chromosome, -/+ options to zoom out or in, a chromosome dropdown menu to jump to a different chromosome, and a chromosome coordinate entry box to jump to a different region of the selected chromosome.

For the genomic region selected in the JBrowse viewer, the reference sequence and accompanying annotations are pulled from FlyBase (ftp://ftp.flybase.net/genomes/Drosophila_pseudoobscura/dpse_r3.04_FB2018_05/gff/dpse-all-3.04.gff.gz). The displayed features include genes, coding sequences (CDSs), exons, introns, untranslated regions (5’ and 3’ UTRs), mRNA, ncRNA, orthologous regions, “orthologous to” annotations, proteins, and syntenic_regions. See the FlyBase documentation for descriptions of these data types. Clicking on any of these features brings up detailed information, including coordinates, the feature length, any aliases, the full nucleotide sequence, and the nucleotide sequence of each subfeature (e.g., introns).
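
For readers who download the GFF file themselves, the Python sketch below lists the features overlapping a region. It relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in column 3 and 1-based inclusive start and end in columns 4 and 5). The function name features_in_region and the sample lines are hypothetical and are not taken from the FlyBase file.

```python
def features_in_region(gff_lines, seqid, start, end):
    """Return (type, start, end) for features overlapping seqid:start..end."""
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        feature_seqid, feature_type = columns[0], columns[2]
        feature_start, feature_end = int(columns[3]), int(columns[4])
        # Two closed intervals overlap when each starts before the other ends.
        if feature_seqid == seqid and feature_start <= end and feature_end >= start:
            hits.append((feature_type, feature_start, feature_end))
    return hits


# Made-up GFF3 lines for illustration only.
sample = [
    "##gff-version 3",
    "2\tFlyBase\tgene\t550100\t552000\t.\t+\t.\tID=example_gene",
    "2\tFlyBase\texon\t550100\t550400\t.\t+\t.\tParent=example_mrna",
    "3\tFlyBase\tgene\t10\t500\t.\t-\t.\tID=other_gene",
]
print(features_in_region(sample, "2", 550000, 551000))
```

The same function accepts an open file handle, for example one returned by gzip.open(path, "rt") for the compressed file.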

URL Query Parameters

PseudoBase URL query parameters allow the user to link straight to a specific gene/region in PseudoBase via a URL. Available parameters are the region (gene or genomic coordinates), the output format ('fasta' or 'jbrowse', with the default set to 'fasta'), and the species (comma-separated list, e.g., 'pse,mir,bog,low,per'; the default is 'pse'). For example:

Jump straight to Adh gene in JBrowse:

http://pseudobase.biology.duke.edu?gene=Adh&output=jbrowse

Jump straight to GA10043 gene FASTA results (output mir and low species):

http://pseudobase.biology.duke.edu?gene=GA10043&species=mir,low

Jump straight to Chr 2, coordinates 550000..564000 in JBrowse:

http://pseudobase.biology.duke.edu?chrom=2&pos=550000..564000&output=jbrowse
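
Links of this form can also be generated programmatically. The following Python sketch builds the three example URLs above from the documented parameters (gene, chrom, pos, species, output). The helper name pseudobase_url and its argument conventions are hypothetical, and the sketch only constructs the URL string; it does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://pseudobase.biology.duke.edu"


def pseudobase_url(gene=None, chrom=None, pos=None, species=None, output=None):
    """Build a PseudoBase link for a gene or for a chromosomal region."""
    if (gene is None) == (chrom is None):
        raise ValueError("give either gene or chrom, but not both")
    params = {}
    if gene is not None:
        params["gene"] = gene
    else:
        params["chrom"] = chrom
        if pos is not None:
            region_start, region_end = pos
            params["pos"] = f"{region_start}..{region_end}"
    if species:
        # Species codes are joined into one comma-separated value.
        params["species"] = ",".join(species)
    if output is not None:
        if output not in ("fasta", "jbrowse"):
            raise ValueError("output must be 'fasta' or 'jbrowse'")
        params["output"] = output
    # safe="," keeps commas unescaped, matching the examples in the text.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(pseudobase_url(gene="Adh", output="jbrowse"))
print(pseudobase_url(gene="GA10043", species=["mir", "low"]))
print(pseudobase_url(chrom="2", pos=(550000, 564000), output="jbrowse"))
```

Omitting species or output leaves those parameters out of the URL, so the server-side defaults ('pse' and 'fasta') apply.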

Citations

  • Buels, R., Yao, E., Diesh, C. M., Hayes, R. D., Munoz-Torres, M., Helt, G., … Holmes, I. H. (2016). JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology, 17(66). doi: 10.1186/s13059-016-0924-1

  • Fuller, Z. L., Leonard, C. J., Young, R. E., Schaeffer, S. W., & Phadnis, N. (2018). Ancestral polymorphisms explain the role of chromosomal inversions in speciation. PLoS Genetics, 14(7), e1007526. doi: 10.1371/journal.pgen.1007526

  • Korunes, K. L., Machado, C. A., & Noor, M. A. (2019). Inversions shape the divergence of Drosophila pseudoobscura and D. persimilis on multiple timescales. BioRxiv, 842047. doi: 10.1101/842047

  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25, 1754–60. doi:10.1093/bioinformatics/btp324

  • McGaugh, S. E., Heil, C. S. S., Manzano-Winkler, B., Loewe, L., Goldstein, S., Himmel, T. L., & Noor, M. A. F. (2012). Recombination modulates how selection affects linked sites in Drosophila. PLoS Biology, 10(11), e1001422. doi: 10.1371/journal.pbio.1001422

  • McGaugh, S. E., & Noor, M. A. F. (2012). Genomic impacts of chromosomal inversions in parapatric Drosophila species. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1587), 422–429. doi: 10.1098/rstb.2011.0250

  • McKenna A, Hanna M, Banks E et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20, 1297–303. doi:10.1101/gr.107524.110

  • Samuk, K., Manzano-Winkler, B., Ritz, K. R., & Noor, M. A. F. (2020). Natural selection shapes variation in genome-wide recombination rate in Drosophila pseudoobscura. Current Biology, 30(8), 1517-1528.E6. doi: 10.1016/j.cub.2020.03.0

  • Thurmond J, Goodman JL, Strelets VB et al. (2019) FlyBase 2.0: The next generation. Nucleic Acids Research, 47, D759–D765. doi:10.1093/nar/gk

  • Van der Auwera GA, Carneiro MO, Hartl C et al. (2013) From FastQ data to high-confidence variant calls: The Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics, 43, 11.10.1-11.10.33. doi:10.1002/0471250953.bi1110s43


Brought to you by the Noor laboratory at Duke University
Original research sponsored by the National Institutes of Health
This research sponsored by National Science Foundation grants 1545627, 1754022, and 1754439
Original database and interface designed and constructed by Ryan Hardy
PseudoBase v2.1 redesign and construction by Russell B Myers
Genome alignment and variant calling by Katharine Korunes Ph.D.
Powered by JBrowse and Django
Patch: 2.1.3a