This seminal 2014 paper shows the real problem with Cas9: it is meant to be non-specific, or else viruses would evade it in a day.
- “mismatches at dCas9 binding sites can be as high as 10”
- “as many as 9 of the mismatches can be consecutive in the PAM-distal region”
- “a perfect match of 10 bases in the PAM-proximal region of the sgRNA guiding sequence is sufficient to mediate Cas9 binding to DNA.”
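To make the quoted binding rule concrete, here is a minimal sketch that takes it literally: a site counts as bindable if the 10 PAM-proximal bases match perfectly, no matter how many PAM-distal bases differ. The function names, the example sequences, and the convention that the sequence is written 5'→3' with the PAM on the 3' side (as for SpCas9) are my assumptions for illustration, not anything taken from the paper.

```python
def mismatch_positions(guide, site):
    """Return 0-based positions (5'->3') where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    pairs = zip(guide.upper(), site.upper())
    return [i for i, (g, s) in enumerate(pairs) if g != s]


def may_bind(guide, site, seed_len=10):
    """Toy rule (assumption): binding is possible when the PAM-proximal
    seed, i.e. the last `seed_len` bases, matches perfectly."""
    seed_start = len(guide) - seed_len
    return all(pos < seed_start for pos in mismatch_positions(guide, site))


# Hypothetical 20-nt guide and two hypothetical genomic sites.
guide = "GACGTTACCGGATCAAGCTT"
distal_mismatch_site = "CTGCAATGGGGATCAAGCTT"  # 9 consecutive PAM-distal mismatches
seed_mismatch_site = "GACGTTACCGGATCAAGCTA"    # 1 mismatch next to the PAM

for site in (distal_mismatch_site, seed_mismatch_site):
    print(site, len(mismatch_positions(guide, site)), may_bind(guide, site))
```

Under this toy rule the site with nine mismatches is still a binding candidate, while the site with a single mismatch next to the PAM is not, which is the counterintuitive part of the quotes above.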
And once it binds, it will sometimes cut; we just don't have the tech to observe it.
Here is an example with 7 mismatches. “And with 7 mismatches (maybe more) allowed, there are 100k possible off-targets for any gRNA in most any genome.”
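The 100k figure can be sanity-checked with back-of-the-envelope combinatorics. The sketch below counts how many 20-mers lie within 7 mismatches of a guide and multiplies by the number of PAM-adjacent positions in a genome. Everything here is an assumption of mine, not the quoted author's method: a 3 Gb genome, both strands searched, an NGG PAM, and uniformly random, independent bases (real genomes are neither).

```python
from math import comb


def neighborhood_size(length=20, max_mismatches=7):
    """Number of sequences within `max_mismatches` substitutions of a guide:
    choose k positions, each with 3 alternative bases."""
    return sum(comb(length, k) * 3**k for k in range(max_mismatches + 1))


def expected_random_hits(genome_bp=3_000_000_000, length=20,
                         max_mismatches=7, pam_prob=1 / 16):
    """Expected off-target count under a random-sequence model (assumption).
    pam_prob = 1/16 is the chance of GG at two fixed positions (NGG PAM)."""
    candidate_sites = 2 * genome_bp * pam_prob  # both strands
    match_prob = neighborhood_size(length, max_mismatches) / 4**length
    return candidate_sites * match_prob


print(neighborhood_size())            # sequences within 7 mismatches
print(round(expected_random_hits()))  # expected hits in a 3 Gb random genome
```

With these assumptions the estimate comes out at roughly 7 × 10^4 sites, the same order of magnitude as the quoted 100k.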