This report includes data from the following sequencing runs: iGUIDE-190710-CH4NL.

Summary

The following document summarizes the results of processing the iGUIDE-190710-CH4NL sequencing set through the iGUIDE pipeline. It includes explanations of the data analytics as well as tables and graphics of the data obtained from the sequence analysis. This report covers 10 specimens treated with 2 targeting sequences. A total of 11,091,616 reads are considered in this analysis, representing 1,277,684 observed incorporated double-stranded oligodeoxynucleotides (dsODNs, a unit of measure associated with iGUIDE or GUIDE-seq based analyses).

Table 1 highlights some key information from the data analysis for each specimen, including the total number of alignments (representing the observed incorporated dsODNs), an estimated range of On-target editing efficiency, and the number of predicted off-target sites.

Table 1. Analysis summary.
Specimen Condition Alignments On-target Efficiency Predicted Off-targets
iGSP0227 Mock - Mock - d1 2,094 0 0
iGSP0228 TF.TCv2 - TruCD33v5 - d1 3,479 98.2% 20
iGSP0229 TF.TCv2 - CD33v5 - d1 5,028 96.1% 3
iGSP0230 Ald.SpyFi - TruCD33v5 - d1 1,390 84.1% 13
iGSP0231 Ald.SpyFi - CD33v5 - d1 2,657 98.8% 4
iGSP0232 Mock - Mock - d7 138,875 0 0
iGSP0233 TF.TCv2 - TruCD33v5 - d7 176,480 41.9% 1,904
iGSP0234 TF.TCv2 - CD33v5 - d7 322,576 84.4% 279
iGSP0235 Ald.SpyFi - TruCD33v5 - d7 306,128 8.6% 3,249
iGSP0236 Ald.SpyFi - CD33v5 - d7 318,977 78.5% 327

Specimen overview

Table 2. Specimen summary.
Specimen Nuclease GuideRNA Timepoint Reads UMItags Alignments
iGSP0227 Mock Mock d1 98,614 8,968 2,094
iGSP0228 TF.TCv2 TruCD33v5 d1 151,397 15,213 3,479
iGSP0229 TF.TCv2 CD33v5 d1 218,356 23,342 5,028
iGSP0230 Ald.SpyFi TruCD33v5 d1 56,929 6,222 1,390
iGSP0231 Ald.SpyFi CD33v5 d1 141,385 13,514 2,657
iGSP0232 Mock Mock d7 2,053,569 398,932 138,875
iGSP0233 TF.TCv2 TruCD33v5 d7 2,047,141 413,748 176,480
iGSP0234 TF.TCv2 CD33v5 d7 2,154,405 609,207 322,576
iGSP0235 Ald.SpyFi TruCD33v5 d7 2,082,139 595,423 306,128
iGSP0236 Ald.SpyFi CD33v5 d7 2,087,681 613,396 318,977

Each specimen started in the iGUIDE pipeline as genomic DNA (gDNA). The gDNA was randomly sheared by ultrasonication and ligated with barcoded DNA linkers. Nested PCR was used to amplify from incorporated dsODN sequences to the linker sequences with barcoded and linker-specific primers. This dual barcoding reduces sample-to-sample crossover. Amplicons were sequenced on an Illumina platform and the sequencing data were processed with the iGUIDE software, available on GitHub@cnobles/iGUIDE.

DNA sequence reads were aligned to the hg38 reference genome. The number of reads aligning for each specimen is displayed in Table 2, along with the number of unique “alignments” they represent (the number of observed incorporated dsODNs). Multiple reads may represent a single alignment of genomic DNA, which is inherent to sequence analysis of amplified DNA. These alignments indicate individual events of dsODN incorporation and/or clonal expansion.
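The relationship between reads and alignments can be checked directly against Table 2. The sketch below copies the Reads and Alignments columns from that table, sums them, and derives a reads-per-alignment ratio for each specimen. The ratio is only an illustrative summary added here, not a statistic reported by the pipeline.

```python
# Reads and unique alignments per specimen, copied from Table 2.
table2 = {
    "iGSP0227": (98_614, 2_094),
    "iGSP0228": (151_397, 3_479),
    "iGSP0229": (218_356, 5_028),
    "iGSP0230": (56_929, 1_390),
    "iGSP0231": (141_385, 2_657),
    "iGSP0232": (2_053_569, 138_875),
    "iGSP0233": (2_047_141, 176_480),
    "iGSP0234": (2_154_405, 322_576),
    "iGSP0235": (2_082_139, 306_128),
    "iGSP0236": (2_087_681, 318_977),
}

total_reads = sum(reads for reads, _ in table2.values())
total_alignments = sum(alignments for _, alignments in table2.values())

# Illustrative ratio: how many reads collapse into one unique alignment.
reads_per_alignment = {
    specimen: reads / alignments
    for specimen, (reads, alignments) in table2.items()
}

print(f"{total_reads:,} reads, {total_alignments:,} alignments")
for specimen, ratio in reads_per_alignment.items():
    print(f"{specimen}: {ratio:.1f} reads per alignment")
```

The two totals agree with the read and dsODN counts quoted in the Summary.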

Alternatively, random nucleotide sequences are included in the ligated linker sequences. These Unique Molecular Indices (UMItags) provide another measure of abundance: the number of distinct UMItag and breakpoint position combinations is counted for each incorporation site. This method of quantification has an increased dynamic range, yet can suffer from PCR artifacts that inflate abundances.
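The two abundance measures can be contrasted with a minimal sketch. The read records, the tuple layout, and the function names below are hypothetical and chosen only for illustration; they are not taken from the iGUIDE software.

```python
from collections import defaultdict


def fragment_abundance(reads):
    """Count unique breakpoint positions per incorporation site."""
    breakpoints = defaultdict(set)
    for site, breakpoint, _umitag in reads:
        breakpoints[site].add(breakpoint)
    return {site: len(found) for site, found in breakpoints.items()}


def umitag_abundance(reads):
    """Count unique (breakpoint, UMItag) combinations per incorporation site."""
    combos = defaultdict(set)
    for site, breakpoint, umitag in reads:
        combos[site].add((breakpoint, umitag))
    return {site: len(found) for site, found in combos.items()}


# Hypothetical reads: (incorporation site, breakpoint position, UMItag).
reads = [
    ("chr19:+:51225275", 51225400, "ACGT"),
    ("chr19:+:51225275", 51225400, "ACGT"),  # PCR duplicate, counted once
    ("chr19:+:51225275", 51225400, "TTGA"),  # same breakpoint, new UMItag
    ("chr19:+:51225275", 51225512, "ACGT"),
    ("chr1:-:1000", 900, "GGCA"),
]

by_fragment = fragment_abundance(reads)
by_umitag = umitag_abundance(reads)
print(by_fragment)
print(by_umitag)
```

Two molecules that share a breakpoint but carry different UMItags count once by breakpoint and twice by UMItag, which is where the added dynamic range comes from. A sequencing or PCR error in a UMItag would likewise create a spurious extra count.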

On-target analysis

Incorporation sites, or locations in the genome where the dsODN was detected, are expected to be in the proximity of nuclease targeted locations. The target sequences provided for these analyses and their On-target locations (Edit Locus) are shown in Table 3. The genomic locations are in a format where chromosome, orientation, and nucleotide position are delimited by a colon (“:”).
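As a small illustration of the colon-delimited location format, the following sketch splits a location string into its three fields. The `Locus` type and `parse_locus` function are hypothetical helpers introduced here, not part of the iGUIDE software.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chromosome: str
    strand: str
    position: int


def parse_locus(text):
    """Parse a 'chromosome:orientation:position' string into a Locus."""
    chromosome, strand, position = text.split(":")
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected orientation: {strand!r}")
    return Locus(chromosome, strand, int(position))


first = parse_locus("chr19:+:51211265")
second = parse_locus("chr19:+:51225275")
print(first)
print(second.position - first.position)  # distance between the two Edit Loci
```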

Table 3. Target sequences and associated information.
Nuclease Target Name Sequence PAM Edit Locus
Cas9 TruCD33v5 GTCAGTGACGGTACAGGA NGG chr19:+:51211265, chr19:+:51225275
Cas9 CD33v5 GAGTCAGTGACGGTACAGGA NGG chr19:+:51211265, chr19:+:51225275

Analysis of On-target associated incorporation sites (Table 4) produces several features that are helpful in On- and Off-target site characterization. These include the following:

  • Alignment Pileups: unique alignments that overlap with each other or “pileup”, suggesting a nearby location may be targeted for a double strand break (DSB). For this analysis, any group of 3 or more overlapping unique alignments was considered a pileup cluster.

  • Flanking Paired alignments: alignments can be found on either side of a DSB, and therefore identifying flanking alignments suggests a DSB could be found between the paired alignments. Flanking alignments were searched for in these data up to 200 bp from each other.

  • Target Matched alignments: searching for the target sequences upstream of the incorporation site can be an indicator of targeted nuclease activity. While this indicator may seem to be crucial, guide RNAs have been demonstrated to have a variety of behaviors when annealing to target DNA, not all of which can be easily searched for with a simple sequence alignment. Nucleotide sequences matching the target sequences were searched for up to 100 bp upstream of the incorporation sites and were required to have no more than 6 mismatches in the target sequence and/or PAM sequence.
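The Target Matched criterion can be sketched as a sliding mismatch count. This is a simplified illustration and not the matching routine used by iGUIDE: it scans only the given strand, allows no insertions or deletions, treats `N` in the PAM as a wildcard, and applies a single combined mismatch limit to target plus PAM. The function names are hypothetical.

```python
def count_mismatches(candidate, pattern):
    """Count positions where candidate differs from pattern; 'N' matches anything."""
    if len(candidate) != len(pattern):
        raise ValueError("candidate and pattern must have equal length")
    return sum(
        1
        for base, expected in zip(candidate.upper(), pattern.upper())
        if expected != "N" and base != expected
    )


def best_target_match(upstream, target, pam="NGG", max_mismatches=6):
    """Return (offset, mismatches) of the best window in upstream, or None."""
    pattern = target + pam
    best = None
    for start in range(len(upstream) - len(pattern) + 1):
        window = upstream[start:start + len(pattern)]
        mismatches = count_mismatches(window, pattern)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


cd33v5 = "GAGTCAGTGACGGTACAGGA"  # CD33v5 target sequence from Table 3
upstream = "TTTT" + cd33v5 + "TGG" + "AAAA"  # hypothetical upstream sequence
hit = best_target_match(upstream, cd33v5)
miss = best_target_match("A" * 40, cd33v5)
print(hit, miss)
```

In practice the `upstream` argument would hold at most the 100 bp upstream of an incorporation site, as described above.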

Specimen specific tables with data relating to these criteria are found in Table 4 for percent On-target editing and Table 6 for identified Off-target loci.

On-target editing efficiency

Table 4 displays the percent of observations (efficiency or specificity) that were associated with On-target loci for All alignments. The following columns display the same percentage for the Pileup, Paired, and Matched criteria. For each column, the alignments meeting that criterion serve as the denominator, so the percentage reflects the share of observed nuclease-specific editing that is On-target. This is an estimate, as On-target editing has the potential to saturate the dynamic range of the abundance calculation. Therefore, these percentages should be considered lower bounds for editing efficiency and specificity.
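The percentage in each column of Table 4 is a ratio of On-target observations to all observations meeting the column's criterion. The sketch below uses hypothetical counts, since the report does not list the underlying numerators, and returning 0 for an empty denominator is an assumption made here for convenience.

```python
def percent_on_target(on_target, total):
    """Percent of observations within a criterion that fall at On-target loci."""
    if total == 0:
        return 0.0  # assumption: report 0 when no observations meet the criterion
    return round(100.0 * on_target / total, 2)


# Hypothetical counts per criterion: (on-target alignments, all alignments).
counts = {
    "All": (50, 200),
    "Pileup": (45, 60),
    "Paired": (30, 32),
    "Matched": (48, 50),
}
percents = {name: percent_on_target(*pair) for name, pair in counts.items()}
print(percents)
```

Because stricter criteria shrink the denominator more than the numerator, the Pileup, Paired, and Matched percentages are typically higher than the All percentage, as seen in Table 4.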

Table 4. Percent On-target.
Specimen Condition All (percent) Pileup (percent) Paired (percent) Matched (percent)
iGSP0227 Mock - Mock - d1 0.00 0.00 0.00 0.00
iGSP0228 TF.TCv2 - TruCD33v5 - d1 64.33 98.11 99.51 98.16
iGSP0229 TF.TCv2 - CD33v5 - d1 65.75 94.54 95.00 96.10
iGSP0230 Ald.SpyFi - TruCD33v5 - d1 15.25 85.14 100.00 84.13
iGSP0231 Ald.SpyFi - CD33v5 - d1 25.44 90.98 100.00 98.83
iGSP0232 Mock - Mock - d7 0.00 0.00 0.00 0.00
iGSP0233 TF.TCv2 - TruCD33v5 - d7 1.63 46.22 66.31 41.95
iGSP0234 TF.TCv2 - CD33v5 - d7 1.24 44.66 44.38 84.43
iGSP0235 Ald.SpyFi - TruCD33v5 - d7 0.21 12.17 12.41 8.56
iGSP0236 Ald.SpyFi - CD33v5 - d7 0.79 34.30 34.93 78.54

Off-target analysis

Specimen information

Using the criteria discussed previously, which are based on characteristic features of nuclease-targeted sites, off-target sites can be selected from the data in an unbiased manner. Table 6 shows a summary of the unique off-target locations (loci) observed in the data. For All alignments, the loci are based on overlapping alignments (pileup clustering) without a minimum number of fragments required to be classified as a pileup cluster. Pileup loci are similarly based on overlapping alignments, but require at least 3 alignments to form a cluster. Flanking Paired loci require at least two unique alignments with opposite orientation (strands) within 200 bp of each other. Target Matched loci require a match in the upstream sequence to a treated target (within 6 mismatches of the 18 or 20 nt target sequence and 1 mismatch of the PAM).
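The Pileup and Flanking Paired definitions can be illustrated with a minimal sketch over hypothetical coordinates. The tuple layouts and function names are assumptions made for this example, and the pairing rule only checks for opposite strands within the distance limit; the iGUIDE software may apply additional conditions.

```python
def pileup_clusters(alignments, min_count=3):
    """Group overlapping (chrom, start, end) alignments; keep groups of min_count or more."""
    clusters, current, current_end = [], [], 0
    for alignment in sorted(set(alignments)):  # set() keeps unique alignments only
        chrom, start, end = alignment
        if current and chrom == current[0][0] and start <= current_end:
            current.append(alignment)
            current_end = max(current_end, end)
        else:
            if len(current) >= min_count:
                clusters.append(current)
            current, current_end = [alignment], end
    if len(current) >= min_count:
        clusters.append(current)
    return clusters


def flanking_pairs(sites, max_gap=200):
    """Pair (chrom, strand, position) sites on opposite strands within max_gap bp."""
    plus = [site for site in sites if site[1] == "+"]
    minus = [site for site in sites if site[1] == "-"]
    return [
        (p, m)
        for p in plus
        for m in minus
        if p[0] == m[0] and abs(p[2] - m[2]) <= max_gap
    ]


# Hypothetical data for illustration.
alignments = [
    ("chr19", 100, 250), ("chr19", 120, 300), ("chr19", 290, 400),
    ("chr19", 1000, 1100), ("chr2", 100, 200),
]
sites = [
    ("chr19", "+", 51225275), ("chr19", "-", 51225290),
    ("chr19", "-", 51226000), ("chr1", "-", 51225280),
]
clusters = pileup_clusters(alignments)
pairs = flanking_pairs(sites)
print(clusters)
print(pairs)
```

Calling `pileup_clusters(alignments, min_count=1)` corresponds to the All column, where no minimum cluster size is required.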

Table 6. Off-target Loci.
Specimen Condition All (loci) Pileup (loci) Paired (loci) Matched (loci)
iGSP0227 Mock - Mock - d1 2,000 9 2 0
iGSP0228 TF.TCv2 - TruCD33v5 - d1 1,181 11 1 20
iGSP0229 TF.TCv2 - CD33v5 - d1 1,496 20 9 3
iGSP0230 Ald.SpyFi - TruCD33v5 - d1 1,111 8 0 13
iGSP0231 Ald.SpyFi - CD33v5 - d1 1,863 16 0 4
iGSP0232 Mock - Mock - d7 132,319 778 469 0
iGSP0233 TF.TCv2 - TruCD33v5 - d7 165,976 905 680 1,904
iGSP0234 TF.TCv2 - CD33v5 - d7 303,753 1,344 2,299 279
iGSP0235 Ald.SpyFi - TruCD33v5 - d7 291,578 1,281 2,120 3,249
iGSP0236 Ald.SpyFi - CD33v5 - d7 301,624 1,383 2,266 327

Off-target enrichment in cancer-associated genes

Flanking Paired loci and Target Matched loci are tested for enrichment against specific gene lists in Table 7. The cancer-associated and special gene lists included in this analysis were: http://bushmanlab.org/assets/doc/allOnco_Feb2017.tsv and http://bushmanlab.org/assets/doc/humanLymph.tsv. Enrichment was tested with Fisher’s exact test and p-values were adjusted for multiple comparisons using a Benjamini-Hochberg correction. Omitted specimens or conditions had insufficient data for this analysis (Total Gene Count = 0) or did not have enough data to support a well-powered analysis (Estimated Power less than 80%).
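The two statistical steps named above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pipeline's implementation: the test shown is one-sided (enrichment only), and the layout of the 2x2 table (condition versus the Random reference row of Table 7) is a guess at how the comparison is set up, so the result is not expected to reproduce the p-values in Table 7.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denominator = log_comb(n, row1)
    tail = sum(
        exp(log_comb(col1, x) + log_comb(n - col1, row1 - x) - log_denominator)
        for x in range(a, min(row1, col1) + 1)
    )
    return min(1.0, tail)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Assumed table layout using gene counts from Table 7:
# Flanking Pairs (245 onco of 2,301) versus Random reference (1,183 of 13,677).
p_onco = fisher_exact_greater(245, 2301 - 245, 1183, 13677 - 1183)
print(p_onco)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

The log-gamma form avoids building very large binomial coefficients when gene counts reach the thousands.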

Table 7. Cancer-associated Gene editing enrichment.
Origin Condition Total genes Onco Enrich. genes Onco Enrich. p-value (pwr) Special Enrich. genes Special Enrich. p-value (pwr)
Reference Random 13,677 1,183 - 28 -
Flanking Pairs TF.TCv2 - CD33v5 - d7 2,301 245 0.015 (80%) 8 1.000 (17%)
Target Matched TF.TCv2 - TruCD33v5 - d7 1,906 220 0.001 (96%) 4 1.000 (3%)
Target Matched Ald.SpyFi - TruCD33v5 - d7 3,251 359 0.001 (100%) 11 1.000 (38%)

Genomic distribution of incorporation sites

The figure(s) below display the genomic distribution of identified incorporation sites. The innermost ring plots all alignments identified within the associated data, while subsequent rings plot the alignments associated with the Pileup, Flanking Paired, and Target Matched groups. The height of each bar within its ring corresponds to the base-10 logarithm of the incorporation site abundance within a 10 Mb window.
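The bar heights described above can be approximated by binning sites into 10 Mb windows and taking the base-10 logarithm of each count. The sketch below uses hypothetical positions and counts sites rather than weighted abundances. Whether the plotting code adds a pseudocount is not stated in the report, so none is applied here and a window with a single site has height 0.

```python
from collections import Counter
from math import log10

WINDOW = 10_000_000  # 10 Mb


def window_heights(sites, window=WINDOW):
    """Map (chromosome, window index) to log10 of the number of sites in that window."""
    counts = Counter((chrom, position // window) for chrom, position in sites)
    return {key: log10(count) for key, count in counts.items()}


# Hypothetical incorporation sites: ten on chr19 in one window, one on chr1.
sites = [("chr19", 51_000_000 + offset) for offset in range(10)]
sites.append(("chr1", 5))

heights = window_heights(sites)
print(heights)
```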