Steps to compare Allen Institute's raw sequencing data for the WTC-11 hiPSC line

  1. Visit our Genomics page and click on ‘WTC whole genome’. This will direct you to our UCSC genome browser tracks for WTC11.

  2. Scroll down to the track section below the browser (the section with blue horizontal bars), click on ‘WTC (parental) genome…’, select “show”, and click submit.

  3. Scroll back up to the top of the page, and you should see two tracks for WTC11: WTC parental line variants and WTC whole genome alignment. The variant track shows all of the WTC variants (SNPs and short indels) relative to the GRCh38 human reference genome.

  4. Observe the reference genome sequence, which now includes WTC-specific variants. Unfortunately, that sequence cannot be downloaded directly; to obtain it, do the following:

  5. Look up the MAPT gene using the search bar, or look up the sequence of your homology arms using the ‘Blat’ function (under Tools). Note: depending on the precise location and size of your homology arms, there may be few or no WTC-specific variants.

  6. You may download the reference sequence by going to View -> DNA -> get DNA.

  7. Using your own DNA editing software (e.g., SnapGene or ApE), you may modify the sequence to include the WTC-specific variants, if there are any. Note: the variant track includes all variants that were initially called, including some that were subsequently filtered out. Variants that pass filtering will say ‘Filter: PASS’ (click on each variant to check); include only those variants.

Additionally, consider whether to include both heterozygous and homozygous variants, or homozygous only (e.g., we include only homozygous variants in our homology arms).
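If you would rather script step 7 than edit the sequence by hand, the filtering and substitution can be sketched roughly as below. This is only an illustration, not the Allen Institute's pipeline: the `Variant` record, its field names, and the toy sequence are hypothetical, and it assumes VCF-style 1-based positions, non-overlapping variants, and that you have copied the PASS status and genotype from each variant's detail page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record holding the fields you would copy from the variant track."""
    pos: int       # 1-based genomic position of the first REF base (VCF convention)
    ref: str       # reference allele
    alt: str       # alternate allele
    filter: str    # "PASS" for variants that passed filtering
    genotype: str  # e.g. "1/1" (homozygous) or "0/1" (heterozygous)


def is_homozygous_alt(genotype: str) -> bool:
    """True for genotypes such as 1/1 or 1|1; False for 0/1, 0/0, or missing calls."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", ".")


def apply_variants(seq: str, seq_start: int, variants, homozygous_only: bool = True) -> str:
    """Return seq with PASS variants substituted in.

    seq_start is the 1-based genomic coordinate of seq[0].
    Variants are applied from right to left so that indels do not
    shift the coordinates of variants that are still to be applied.
    """
    kept = [
        v for v in variants
        if v.filter == "PASS" and (not homozygous_only or is_homozygous_alt(v.genotype))
    ]
    edited = seq.upper()
    for v in sorted(kept, key=lambda v: v.pos, reverse=True):
        offset = v.pos - seq_start
        end = offset + len(v.ref)
        if offset < 0 or end > len(edited):
            continue  # variant lies outside the downloaded window
        if edited[offset:end] != v.ref.upper():
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        edited = edited[:offset] + v.alt.upper() + edited[end:]
    return edited


# Toy example (made-up sequence and variants, for illustration only).
toy_seq = "ACGTACGTAC"  # pretend this covers positions 100-109
toy_variants = [
    Variant(101, "C", "A", "LowQual", "1/1"),  # dropped: did not pass filtering
    Variant(102, "G", "T", "PASS", "1/1"),     # kept
    Variant(105, "CG", "C", "PASS", "1/1"),    # kept: 1 bp deletion
    Variant(108, "A", "C", "PASS", "0/1"),     # dropped when homozygous_only=True
]
print(apply_variants(toy_seq, 100, toy_variants))  # ACTTACTAC
```

Passing `homozygous_only=False` would also apply the heterozygous call, which corresponds to the choice described above.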

Reach out if you need any additional help with configuring/viewing the tracks.

Thank you. But it seems that the above instructions don’t answer the question. The above steps are about viewing the variant calls provided by the Allen Institute and obtaining a genome with these variants. Could you instead advise how to obtain the “raw” whole-genome sequencing reads of WTC11?

From the UCSC genome browser, you can only download the aligned BAM file, not the actual raw data (since it is 270 GB). So, we will work on our end here at the Allen and make this data available with instructions on our website. It may take a few days to push this request, but we will be in touch as soon as it is posted! Thanks for your patience.

@H.Z The raw and processed WTC-11 whole genome sequence data are now available via Quilt here: https://open.quiltdata.com/b/allencell/tree/aics/wtc11_short_read_genome_sequence/

We also have linked read (10X) whole genome data for WTC-11 at https://open.quiltdata.com/b/allencell/tree/aics/wtc11_linkedread_wgs/
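For anyone who prefers to pull the files programmatically rather than through the web catalog, here is a small sketch that turns the two links above into S3-style locations. It rests on an assumption that is not stated in this thread: that catalog URLs of the form `/b/<bucket>/tree/<prefix>` correspond to a key prefix in an S3 bucket of the same name. The helper name `catalog_url_to_s3_uri` is made up for this example.

```python
from urllib.parse import urlparse


def catalog_url_to_s3_uri(url: str) -> str:
    """Convert a catalog URL like .../b/<bucket>/tree/<prefix>/ to s3://<bucket>/<prefix>/.

    Assumption: the catalog's "tree" view mirrors the bucket's key layout.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "b" or parts[2] != "tree":
        raise ValueError(f"unexpected catalog URL layout: {url}")
    bucket = parts[1]
    prefix = "/".join(parts[3:])
    return f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"


# The two links posted in this thread.
short_read_url = "https://open.quiltdata.com/b/allencell/tree/aics/wtc11_short_read_genome_sequence/"
linked_read_url = "https://open.quiltdata.com/b/allencell/tree/aics/wtc11_linkedread_wgs/"

for url in (short_read_url, linked_read_url):
    print(catalog_url_to_s3_uri(url))
```

The resulting locations could then be handed to whichever S3 client you normally use. Given the size of the raw data mentioned above, it is worth listing the contents before downloading anything.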

Let us know if you have any questions.

Hi,

Are the raw CRAM/BAM or fastq files available for the exome and transcriptome data on the cell lines?

Thank you

Hi Tychele, we don’t have raw exome or transcriptome data in a publicly accessible format right now, but we will be creating Quilt packages with all of this data soon. We hope to have it out sometime in May and will post here as soon as it is available. Would it be useful for you if we included processed data in the packages (e.g., alignments, variant calls, transcript counts), even if you’re planning on analyzing the raw data yourself?

Hello @tanyag … thank you for your reply! Yes, it would be useful to have processed data as well. I am planning to run some additional variant callers, beyond GATK, and would be happy to provide that variant data back to this database as well.