Note: Descriptions are shown in the official language in which they were submitted.
CA 03027882 2018-12-14
WO 2017/218727
PCT/US2017/037596
METHODS FOR RULE-BASED GENOME DESIGN
RELATED APPLICATION DATA
This application claims priority to U.S. Provisional Application No.
62/350,468 filed
on June 15, 2016, which is hereby incorporated herein by reference in its
entirety for all
purposes.
STATEMENT OF GOVERNMENT INTERESTS
This invention was made with government support under DE-FG02-02ER63445
awarded by Department of Energy and HR0011-13-1-0002 awarded by Department of
Defense. The government has certain rights in the invention.
FIELD
Aspects described herein generally relate to genetic engineering and
genetically modified
cells and/or organisms. In particular, one or more aspects of the disclosure
are directed to
methods and computer software useful for genome design based on a predefined
set of rules
or conditions or parameters or features.
BACKGROUND
Genetically modified organisms (GMOs) are being used increasingly to produce
human
consumables such as fuels, commodity chemicals, and therapeutics. GMOs are
also used in
agriculture (e.g., golden rice, Roundup Ready crops, Frostban),
bioremediation (e.g., oil
spills), and healthcare (e.g., Crohn's disease and oral inflammation).
Modifications in
commercially implemented GMOs may often be limited to heterologous gene
expression and
evolution under optimizing selection. Yet synthetic genomes that differ
radically from any
known organism may expand potential applications.
There has been considerable interest in creating minimal (Gibson et al., 2010)
and recoded
(Lajoie et al., 2013a; Lajoie et al., 2013b) genomes, but genomes are not yet
understood well
enough to design them from scratch. While in vivo genome engineering
strategies may reduce
the risk of creating nonfunctional genomes (Lajoie et al., 2013a; Lajoie et
al., 2013b), rational
design may still be indispensable for restricting the search space to create
viable genomes
with a desired function. Therefore, the field of genome engineering may be in
dire need of
general design rules or conditions or parameters or features, methods of
eliciting these rules
or conditions or parameters or features, and software that may be used to
generate viable and
constructable genomes.
SUMMARY
The following presents a simplified summary of various aspects described
herein. This
summary is not an extensive overview, and is not intended to identify key or
critical elements
or to delineate the scope of the claims. The following summary merely presents
some
concepts in a simplified form as an introductory prelude to the more detailed
description
provided below.
Aspects of the present disclosure provide methods, algorithms, computing
platforms, and
computer software for designing genomes based on satisfying a set of rules or
conditions or
parameters or features while minimizing disturbances to biologically relevant
motifs,
synthesizing the genome designs, and testing and validating the synthesized
genome designs.
A computing platform may generate genome designs and partition the genome
designs into
units that may be synthesized and/or edited, in which the genome designs
satisfy user-
specified constraints and maximize the probability of biological viability and
constructability.
Units or individual components of the redesigned genome may be tested, and
design failures
may be detected based on identifying components that fail testing. Rules or
conditions or
parameters or features for the genome design may be updated accordingly, and
recommendations for subsequent iterations may be provided.
Aspects of this disclosure are directed to a method for designing genomes
implemented by a
computing platform. The method includes receiving, as an input at a computing
platform,
data for a known genome and a list of alleles to be replaced in the known
genome, based on
the list of alleles, identifying, by the computing platform, occurrences of
each allele in the
known genome, removing, by the computing platform, the occurrences of each
allele from
the known genome, determining, by the computing platform, a plurality of
allele choices with
which to replace occurrences of each allele in the known genome, generating,
by the
computing platform, a plurality of alternative gene sequences for a genome
design based on
the known genome, wherein each alternative gene sequence comprises a different
allele
choice from the plurality of allele choices, applying, by the computing
platform, a plurality of
rules or conditions or parameters or features to each alternative gene
sequence by assigning a
score for each rule or condition or parameter or feature in each alternative
gene sequence,
resulting in scores for the plurality of rules or conditions or parameters or
features applied to
each alternative gene sequence, scoring, by the computing platform, each
alternative gene
sequence based on a weighted combination of the scores for the plurality of
rules or
conditions or parameters or features, and selecting, by the computing
platform, at least one
alternative gene sequence as the genome design based on the weighted scoring.
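As a minimal sketch of the generating, scoring, and selecting steps summarized above, the following Python example enumerates alternative sequences for a toy coding sequence, scores each alternative against a small set of rules, and selects the alternative with the lowest weighted total. The rule functions (`gc_deviation`, `homopolymer_penalty`), the weights, the replacement table, and the toy template are hypothetical illustrations and are not taken from the disclosure; sequences are written in the DNA alphabet.

```python
from itertools import product

# Hypothetical replacement choices for two forbidden arginine codons.
SYNONYMS = {
    "AGA": ["CGT", "CGC", "CGA", "CGG"],
    "AGG": ["CGT", "CGC", "CGA", "CGG"],
}

def codons(seq):
    """Split a coding sequence into in-frame codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def alternatives(seq, forbidden):
    """Yield every sequence in which each forbidden codon is replaced by one choice."""
    options = [SYNONYMS[c] if c in forbidden else [c] for c in codons(seq)]
    for combo in product(*options):
        yield "".join(combo)

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_deviation(original, candidate):
    """Hypothetical rule: penalize a change in GC content."""
    return abs(gc_fraction(candidate) - gc_fraction(original))

def longest_run(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

def homopolymer_penalty(original, candidate):
    """Hypothetical rule: penalize homopolymer runs longer than in the original."""
    return max(0, longest_run(candidate) - longest_run(original))

# (rule, weight) pairs; the weights are illustrative assumptions.
RULES = [(gc_deviation, 1.0), (homopolymer_penalty, 0.5)]

def weighted_score(original, candidate):
    """Weighted combination of the per-rule scores (lower is better here)."""
    return sum(weight * rule(original, candidate) for rule, weight in RULES)

def select_design(seq, forbidden):
    """Select the alternative with the lowest weighted score."""
    return min(alternatives(seq, forbidden), key=lambda c: weighted_score(seq, c))

template = "ATGAGAAAAAGGTAA"
design = select_design(template, {"AGA", "AGG"})
```

In this sketch a lower score means a smaller predicted disturbance; a platform that instead maximizes a viability score would use `max` in `select_design`.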
In some embodiments, the disclosed genome design method may be implemented for
any
type of genome, including bacterial genomes, mycoplasma genomes, yeast
genomes, human
genomes, genomes for any naturally-occurring organism, or genomes for any
previously
evolved or engineered organism. In additional embodiments, the disclosed
genome design
method may be implemented for designing any genomic changes, including
removing any
alleles, removing sites for restriction enzymes, replacing repetitive
extragenic palindromic
(REP) sequences with terminators, deleting non-essential genes, inserting
heterologous genes
to expand function, and the like.
According to some aspects, a method for updating rules in genome design is
provided. The
method includes introducing one or more features of a genome design into at
least one cell,
testing the one or more features of the at least one cell by an assay in order
to identify
genome viability and evaluate the phenotype of the one or more features
introduced into the
at least one cell, based on the testing, determining that the one or more
features introduced
into the at least one cell are expected to be viable or expected to fail
according to one or more
predefined rules or conditions or parameters or features for the genome
design, and updating
the predefined rules or conditions or parameters or features for genome design
based on the
determination. In some embodiments, the predefined rules may be updated by
leveraging
statistical techniques or machine learning algorithms.
Aspects of this disclosure provide a computer-implemented method for testing
and modifying
genome designs. The method includes obtaining all or a portion of a known
genome sequence
and a genome design generated by a computing platform, determining that one or
more
features in the genome design fail a set of predefined rules or conditions or
parameters or
features, predicting modifications to the genome design to satisfy a
predetermined design
objective and to increase probability of viability, and testing the predicted
modifications to
generate an improved genome design.
Additional aspects of the disclosure provide methods for identifying sequence
designs when
no computationally designed solution is found to be viable or confer the
desired phenotype.
Degenerate DNA sequences may be tested in combinations. Viable or
phenotypically correct
individual sequences may be identified by screening or selection. Viable DNA
sequences
may be used to update or learn new computational design rules or conditions or
parameters or
features.
The disclosure provides an engineered organism comprising a recoded genome
wherein a
particular sense codon at all instances within a gene or non-coding motif in a
template
genome is changed to alternative codons. According to one aspect, the gene is
an essential
gene or a non-essential gene encoding a protein sequence. According to one
aspect, an
instance of a particular sense codon overlaps with a non-coding motif. According to one
According to one
aspect, the non-coding motif is a ribosome binding site motif, an mRNA
secondary structure,
an internal ribosome pausing site motif or a promoter. According to one
aspect, the protein
sequence is preserved. According to one aspect, the non-coding motif is
preserved.
According to one aspect, the particular sense codon is a member selected from
the group
consisting of AGG, AGA, AGC, AGU, UUG, and UUA. According to one aspect, the
engineered organism is E. coli. According to one aspect, the engineered
organism is virus
resistant or biocontained. According to one aspect, a cognate tRNA to the
particular sense
codon is eliminated from the template genome. According to one aspect, a
cognate tRNA to
the particular sense codon is not present in the recoded genome. According to
one aspect, the
particular sense codon is placed within the engineered organism and is
reassigned to a non-
standard amino acid. According to one aspect, the alternative codon is a
synonymous codon.
According to one aspect, the alternative codon is a non-synonymous codon. The
present
disclosure provides an engineered organism comprising a recoded genome wherein
a
particular sense codon at all instances within genes or non-coding motifs in a
template
genome are changed to alternative codons. The present disclosure provides an
engineered
organism comprising a recoded genome wherein a particular sense codon in a
template
genome is changed genome-wide to alternative codons. The present disclosure
provides an
engineered organism comprising a recoded genome wherein particular sense
codons at all
instances within an essential gene in a template genome are changed to
alternative codons.
The present disclosure provides an engineered organism comprising a recoded
genome
wherein particular sense codons at all instances within essential genes in a
template genome
are changed to alternative codons. The present disclosure provides an
engineered organism
comprising a recoded genome wherein particular sense codons in a template
genome are
changed genome-wide to alternative codons. The present disclosure provides an
engineered
organism comprising a recoded genome designed by the methods described herein.
The
present disclosure provides an engineered organism comprising a recoded genome
wherein
instances of a particular sense codon are changed to alternative codons such
that the cognate
tRNA to the particular sense codon can be eliminated from the engineered
organism. The
present disclosure provides an engineered organism comprising a recoded genome
wherein
instances of a particular sense codon are changed to alternative codons such
that translation
function of the particular sense codon can be changed. The present disclosure
provides an
engineered organism comprising a recoded genome wherein instances of a
particular sense
codon are changed to alternative codons such that translation function of the
particular sense
codon can be eliminated.
Further features and advantages of certain embodiments of the present
disclosure will become
more fully apparent in the following description of embodiments and drawings
thereof, and
from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other features and advantages of the present embodiments
will be more
fully understood from the following detailed description of illustrative
embodiments taken in
conjunction with the accompanying drawings in which:
Figure 1 illustrates a block diagram of an example computing device that may
be utilized to
execute software in accordance with one or more example embodiments.
Figure 2 illustrates an example block diagram of a genome design module in
which various
aspects of the present disclosure may be implemented in accordance with one or
more
example embodiments.
Figure 3 illustrates an example flow diagram of example method steps for
designing genomes
in accordance with one or more example embodiments.
Figure 4 illustrates an example graph of predicted viral resistance of recoded
genomes.
Figures 5A-5C illustrate an example of a 57-codon E. coli genome. FIG. 5A
illustrates the
entire recoded genome divided into 87 segments of ~50 kb. Codons AGA, AGG,
AGC,
AGU, UUA, UUG, and UAG were computationally replaced by synonymous
alternatives
(center). Other codons (e.g. UGC) remain unchanged. Color-coded histograms
represent the
abundance of the seven forbidden codons in each segment. FIG. 5B illustrates
codon
frequencies in non-recoded (wt; E. coli MDS42) versus recoded (rc) genome.
Forbidden
codons are colored. FIG. 5C illustrates the scale of DNA editing in genomes
constructed by
de novo synthesis. Plot area represents DNA editing as the number of modified
bp compared
to the parent genome. Dark gray represents percent of genome (63%) validated
in vivo. Wt,
wild-type.
Figure 6 illustrates a genealogy of recoded E. coli strains, including the
lineage of genome-recoded E. coli strains and their computational and
biological parents. Commonly used
laboratory strains are shown in green. The non-E. coli strain from which the
orthogonal tRNA was
imported is shown in brown. Previously published recoded strains are shown in
blue. Strains
constructed in the current study are shown in black. The final rE.coli-57 and
its bio-contained
counterpart rE.coli-57C are shown in gray. (aaRS = aminoacyl-tRNA synthetase).
Figure 7 illustrates serine, arginine, leucine and stop codon frequencies for
E. coli MDS42
(dark color) and the computationally designed rE.coli-57 genome (light color,
frequency
labeled).
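The codon frequencies referred to in FIGS. 5B and 7 can be tallied with a short routine. The following is a minimal sketch, assuming coding sequences are supplied as in-frame DNA strings; the toy sequence and the function names are hypothetical, and the forbidden set is the seven codons named for FIG. 5A written in the DNA alphabet (T in place of U).

```python
from collections import Counter

# The seven forbidden codons named for FIG. 5A, in the DNA alphabet.
FORBIDDEN = {"AGA", "AGG", "AGC", "AGT", "TTA", "TTG", "TAG"}

def codon_counts(cds):
    """Count in-frame codons of one coding sequence."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def forbidden_counts(cds_list):
    """Total occurrences of each forbidden codon across coding sequences."""
    total = Counter()
    for cds in cds_list:
        counts = codon_counts(cds)
        total.update({c: n for c, n in counts.items() if c in FORBIDDEN})
    return total

toy_cds = "ATGAGATTGAGCTAG"  # hypothetical sequence: ATG AGA TTG AGC TAG
tally = forbidden_counts([toy_cds])
```

Running the same tally on a template and on its recoded counterpart gives the kind of before-and-after comparison described for the figures.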
Figure 8 illustrates an overview of the computational pipeline for recoded
genome design.
The software accepts as input a genome template (GenBank file) and a list of
codons to be
replaced. User-defined rules, both biological and technical (A-G), are then
applied to
generate a new recoded genome (GenBank file). Synthesis-compatible 2-4 kb
sequences are
generated. Rules A-G are schematized in FIGS. 9A-9G and further explained in
Tables 1-2.
Figures 9A-9G illustrate rules or conditions or parameters or features or
guidelines for
computational design.
Figures 10A-10C illustrate an experimental strategy for recoded genome
validation. FIG.
10A illustrates a pipeline schematic comprising 1) computational design of a
57-codon
genome; 2) de novo synthesis of 2 to 4-kb overlapping recoding fragments; 3)
assembly of
50-kb segment in S. cerevisiae (orange) on a low copy plasmid; 4) plasmid
electroporation in
E. coli (wt.seg - non-recoded chromosomal segment); 5) chromosomal sequence
corresponding to recoded segment (e.g., wt.seg) replaced by kanamycin cassette
(Kan), such
that cell viability depends solely on expression of recoded genes; 6) λ-
integrase-mediated
recombination of attP and attB sequences (P - episomal, B - chromosomal); 6a,b)
elimination
of residual vectors (see FIG. 10C); 7) single-copy integrated recoded
segment. attL-attR
sites shown in gray. FIG. 10B illustrates PCR analysis of steps 4-7. (Lanes:
"L" - GeneRuler
1-kb plus ladder; "C" - control Top10; numbers 4-7 correspond to schematics in
FIG. 10A).
Red arrows denote PCR primers. FIG. 10C illustrates Cas9-mediated vector
elimination, in
which residual vector carrying recoded segment is targeted for digestion by
Cas9 using attP-
specific guide RNA (gRNA). In 6a) additional copies of the recoded segment
carry intact
attP sequence; 6b) shows Cas9 targeting of attP sequence to eliminate
additional vector
copies. The integrated segment is not cut since it does not contain an attP
sequence. All steps
were confirmed by PCR analysis. "gRNA" - guide-RNA.
Figure 11 illustrates an example of rE.coli-57 genome construction. The genome
was parsed
into 87 segments, each ¨50 kb in size. All recoded segments were de novo
synthesized
(green). A total of 55 segments were tested in vivo thus far (blue), of which
44 were
successfully validated for all gene functionality on low copy plasmids (red),
and 10 segments
were further successfully reduced to a single copy of all recoded genes (yellow).
Figures 12A-12D illustrate phenotypic analysis of recoded strains. In FIG.
12A, recoded
segments were episomally expressed in the absence of corresponding wild-type
genes.
Doubling time is shown relative to the non-recoded parent strain. FIG. 12B
illustrates
localization of fitness impairment in segment 21. Chromosomal genes (gray)
were deleted to
test for complementation by recoded genes (orange). Decrease in doubling time
was observed
upon deletion of rpmF-accC operon. Essential genes in FIG. 12B are framed. In
FIG. 12C,
fine-tuning of rpmF-accC operon promoter resulted in increased gene expression
and
decrease in doubling time. (Orange: Initial promoter. Green: Improved
promoter). FIG. 12D
illustrates RNA-Seq analysis of 208 recoded genes (blue, segments 21, 38, 44,
46, 70). (Wt
gene expression shown in gray. Differentially expressed recoded genes shown in
red
(absolute log2 fold-change >2, adjusted p-value <0.01). Inset: P-value
distribution of recoded
genes).
Figures 13A-13B illustrate graphs representing fitness of partially recoded
strains. FIG. 13A
illustrates measurements of doubling time before and after removal of the wild-
type
chromosomal sequence in strains carrying a recoded segment on low copy plasmid
(see steps
4 and 5 in FIG. 10A). FIG. 13B illustrates measurements of doubling time
before and after
removal of the wild-type sequence, and after chromosomal integration (see
steps 4, 5, 6 and 7
in FIG. 10A). Relative Doubling time - fold change between modified and
parental strain (i.e.
intact genome and no recoded segments).
Figures 14A-14B illustrate a transcriptional landscape of recoded segment 43,
in which
expression levels of all genes within segment 43 are shown. Genes were
analyzed in non-
recoded strain (TOP10) and after chromosomal deletion. RNA was prepared
independently
for the different strains, and sequenced on an Illumina MiSeq using PE150 V2
kits (Illumina).
For analysis of differential expression, counts were aggregated by gene
using
GenomicFeatures (Bioconductor). Counts obtained per gene were normalized at
the genome-
wide level using DESeq2 package (Bioconductor) (Anders et al., 2010). FIG. 14A
shows
expression levels for recoded (green) and non-recoded (purple) genes. FIG. 14B
shows p-
value and fold changes for all recoded genes. None of the genes in segment 43
was found to
be significantly differentially expressed (i.e., absolute log2 fold-change >2
and adjusted p-
value <0.01).
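The significance criterion stated above (absolute log2 fold-change greater than 2 together with an adjusted p-value below 0.01) can be expressed as a simple filter. The sketch below assumes that per-gene fold changes and adjusted p-values have already been computed elsewhere (for example by DESeq2); the gene names and values are hypothetical.

```python
def is_differentially_expressed(log2_fold_change, adjusted_p,
                                lfc_cutoff=2.0, p_cutoff=0.01):
    """Apply both thresholds: |log2 fold-change| > 2 and adjusted p < 0.01."""
    return abs(log2_fold_change) > lfc_cutoff and adjusted_p < p_cutoff

# Hypothetical results: gene -> (log2 fold-change, adjusted p-value).
results = {
    "geneA": (0.4, 0.60),    # small change, not significant
    "geneB": (-2.7, 0.003),  # large change and significant
    "geneC": (3.1, 0.20),    # large change but not significant
}

flagged = [gene for gene, (lfc, p) in results.items()
           if is_differentially_expressed(lfc, p)]
```

Both conditions must hold, so a gene with a large fold change but a weak adjusted p-value is not flagged.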
Figures 15A-15B illustrate an example of troubleshooting lethal design
exceptions. In FIG.
15A, recoded segment 44 (orange) did not support cell viability upon complete
deletion of
chromosomal sequence (Chr-Δseg44.0). The causative recoded gene (accD) was
identified by
successive chromosomal deletions (Chr-Δseg44.1-4; 'X' - nonviable). Essential
genes are
framed. In FIG. 15B, λ-recombination was used to exchange the lethal accD
sequence (accD.Initial, recoded codons in orange) with an alternative recoded
accD sequence
(accD.Improved, alternative codons in blue). mRNA structure and RBS motif
strength were
calculated for both sequences. Wt shown in gray. 'accD nuc': the first
position in each
recoded codon. The resulting viable sequence (accD.Viable) carried codons
from both
designs. mRNA and RBS scores - ratio between predicted mRNA folding energy
(kcal/mol)
(Markham et al., 2005) or predicted RBS strength (Salis, 2011) of recoded and
non-recoded
codon.
Figure 16 illustrates an example of exploring viable alternatives for accD
recoding. In order
to locate the recalcitrant codon(s) in the recoded gene accD, MAGE
(multiplexed automated
genome engineering as is known in the art) (Wang et al., 2009) was used in a
naive non-recoded strain. The N-terminal end of the gene, which is the most
probable locus for gene
expression disruption, was specifically targeted (Plotkin et al., 2011; Goodman
et al., 2013;
Boel et al., 2016). The first five forbidden codons of gene accD (nucleotide
positions 4, 25,
52, 85, 100) were targeted by two oligonucleotides carrying degenerate bases
at the recoded
positions. (N represents base pairs A, T, C or G). WT represents non-recoded
accD sequence
(black), sequences 1-5 are viable genotypes resulting from MAGE experiment
(forbidden
codons shown in black), accD.Initial represents lethal recoded accD (yellow),
accD.Improved represents an alternative computationally generated accD
sequence. Predicted
mRNA folding energy scores for each sequence are shown on the right.
Predicted RBS
strength scores for each codon are shown below (bars for each position are in
the following
order: WT (black); sequences 1-5 (gray); accD.Initial (yellow); accD.Improved
(blue)). mRNA
score represents the ratio between the predicted mRNA folding energy
(kcal/mol) of the
recoded sequence and the wild-type sequence. RBS score represents the ratio
between the
predicted RBS strength of the recoded sequence and the wild-type sequence for
each codon.
RBS strength is a calculated score used as a proxy for ribosome pausing.
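Two ideas in the description of FIG. 16 lend themselves to a short sketch: expanding an oligonucleotide that carries degenerate bases (N = A, T, C or G) into the concrete sequences it encodes, and expressing a score as the ratio between a recoded value and the wild-type value. The oligo below and the folding energies are hypothetical; the actual folding energies and RBS strengths would come from external prediction tools that are not reproduced here.

```python
from itertools import product

def expand_degenerate(oligo):
    """Enumerate all sequences encoded by an oligo with degenerate N positions."""
    choices = ["ACGT" if base == "N" else base for base in oligo]
    return ["".join(bases) for bases in product(*choices)]

def score_ratio(recoded_value, wild_type_value):
    """Ratio between a predicted value for the recoded and wild-type sequence."""
    return recoded_value / wild_type_value

# Hypothetical oligo with two degenerate positions -> 4 * 4 = 16 variants.
variants = expand_degenerate("ATGNGCNTT")

# Hypothetical folding energies in kcal/mol (recoded, wild type).
mrna_score = score_ratio(-4.0, -8.0)
```

A ratio of 1 indicates no predicted change relative to wild type; the number of variants grows as 4 to the power of the number of degenerate positions, which is why screening or selection is used to find the viable ones.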
Figure 17 illustrates an example of sequence alignment of the different
versions of the gene
accD in segment 44. WT corresponds to non-recoded sequence. accD.Initial
corresponds to
lethal recoded design. accD.Improved corresponds to recoded accD sequence
generated by an
improved algorithm. accD.Viable corresponds to the genotype of the viable
clones obtained
after recombineering of accD.Improved to replace accD.Initial.
Figures 18A-18B illustrate examples showing compatibility of 57-codon adk
gene with
biocontainment. In order to verify rE.coli-57 compatibility with
biocontainment, seven-codon
replacement for the essential gene adk was applied in two different
bio-contained strains
(C321.ΔA.adk_d6 and C321.ΔA.adk_d6.tyrS_d8). FIG. 18A illustrates that
bio-contained strains
modified with 57-codon adk maintained fitness similar to that of their
non-modified parents. Light
gray - non-modified biocontainment strains (Mandell et al., 2015); Dark gray -
biocontained
strains with 57-codon adk. FIG. 18B illustrates escape rate of bio-contained
strains with or
without 57-codon adk. SC media: SDS +
Chloramphenicol + Arabinose.
Figures 19A-19B illustrate an example of construction of strain C123. FIG. 19A
illustrates an
example workflow used to create and analyze strain C123. The design phase
involved
identification of 123 AGR codons in the essential genes of Escherichia coli.
MAGE oligos
were designed to replace all instances of these AGR codons with the synonymous
CGU
codon. The build phase used CoS-MAGE to convert 110 AGR codons to CGU.
Multiplex
allele specific colony PCR (MASC-PCR) was used to screen for desired
recombinants. AGR
conversions that were not observed in 96 clones screened by MASC-PCR were
triaged to
troubleshooting. The in vivo troubleshooting phase resolved the 13 codons that
could not be
readily converted to CGU. In the study phase, sequencing, evolution and
phenotyping were
phenotyping was
performed on strain C123. FIG. 19B illustrates an example schematic of the
C123 genome
relative to MG1655 (Chr. 0 oriented up.). Exterior labels indicate the set
groupings of AGR
codons. Successful AGR to CGU conversions are indicated by radial green lines,
and 13
recalcitrant codons are indicated by radial red lines.
Figures 20A-20B illustrate an example analysis of attempted AGR -> CGU
replacements. FIG.
replacements. FIG.
20A illustrates AGR recombination frequency versus normalized ORF position.
AGR
recombination frequency was determined from 96 clones per cell population using
MASC-PCR.
Normalized ORF position was the residue number of the AGR codon divided by the
total
length of the ORF. Failed AGR to CGU conversions are indicated using vertical
red lines
below the x-axis. FIG. 20B illustrates doubling times of strains in the C123
lineage in LBL
media at 34 °C, determined in triplicate on a 96-well plate reader. Colored
bars indicate
which set of codons was under construction when a doubling time was
determined.
Recalcitrant AGR->CGU conversions that were unsuccessful (i.e., MASC-PCR
frequency <
1/96) were triaged into a troubleshooting pipeline. The optimized replacement
sequences for
these 13 recalcitrant AGR codons were incorporated into the final strain (gray
section at
right, labeled with a '*'), and the resulting doubling times were measured.
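The two quantities defined for FIGS. 20A-20B can be written out directly: the normalized ORF position is the residue number of the AGR codon divided by the total ORF length, and a conversion is treated as recalcitrant when its MASC-PCR frequency is below 1/96. The sketch below is illustrative only; the function names and example numbers are hypothetical.

```python
def normalized_orf_position(codon_residue_number, orf_length_residues):
    """Residue number of the AGR codon divided by the total ORF length."""
    return codon_residue_number / orf_length_residues

def is_recalcitrant(converted_clones, screened_clones=96):
    """True when the observed conversion frequency is below 1/screened_clones."""
    return converted_clones / screened_clones < 1 / screened_clones

# Hypothetical AGR codon at residue 10 of a 200-residue ORF.
position = normalized_orf_position(10, 200)
```

With 96 clones screened, a frequency below 1/96 is equivalent to observing no converted clone at all, which is how the threshold separates the recalcitrant codons from the rest.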
Figures 21A-21D illustrate examples of failure mechanisms for four
recalcitrant AGR
replacements. Wild type AGR codons are indicated in bold black letters, design
flaws are
indicated in red letters, and optimized replacement genotypes are indicated in
green letters.
FIG. 21A illustrates that genes ftsI and murE overlap with each other. An
AGA->CGU mutation
in ftsI would introduce a non-conservative Asp3Val mutation in murE. The amino
acid
sequence of murE was preserved by using an AGA->CGA mutation. FIG. 21B
illustrates that
gene secE overlaps with the RBS for downstream essential gene nusG. An
AGG->CGU
mutation is predicted to diminish the RBS strength by 97% (47). RBS strength
is preserved
by using an AGG->GAG mutation. FIG. 21C illustrates that gene ssb has an
internal RBS-
like motif shortly after its start codon. An AGG->CGU mutation would diminish
the RBS
strength by 94%. RBS strength is preserved by using an AGA->CGA mutation
combined
with additional wobble mutations indicated in green letters. FIG. 21D
illustrates that gene
rnpA has a defined mRNA structure that would be changed by an AGG->CGU
mutation. The
original RNA structure is preserved by using an AGG->CGG mutation. The RBS
(green),
start codon (blue) and AGR codon (red) are annotated with like-colored boxes
on the
predicted RNA secondary structures.
Figure 22 illustrates an example in which RBS strength and mRNA structure predict
synonymous
mutation success. In particular, FIG. 22 illustrates a scatter plot showing
predicted RBS
strength (y-axis, calculated with the Salis ribosome binding site calculator
(47)) versus
deviations in mRNA folding (x-axis, calculated at 37 °C by UNAFold Calculator
(41)). Small
gray dots represent non-essential genes in E. coli MG1655 that have an AGR
codon within
the first 10 or last 10 codons. Large gray dots represent successful AGR->CGU
conversions
in the first 10 or last 10 codons of essential genes. Orange asterisks
represent unsuccessful
AGR->CGU mutations (recalcitrant codons) in essential genes. Green dots
represent
optimized solutions for these recalcitrant codons. The "safe replacement zone"
(blue shaded
region) is an empirically defined range of mRNA folding and RBS strength
deviations, based
on the successful AGR->CGU replacement mutations observed in this study. Most
unsuccessful AGR->CGU mutations (Orange asterisks) cause large deviations in
RBS
strength or mRNA structure that are outside the "safe replacement zone." Genes
holB and ftsI
are two notable exceptions because their initial CGU mutations caused amino
acid changes in
overlapping essential genes. Arrows show that deviations in RBS strength
and/or mRNA
structure are reduced for four examples of optimized replacement of
recalcitrant codons (ftsA,
folC, rnpA, rpsJ).
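One way to picture the "safe replacement zone" of FIG. 22 is as a region fitted around the successful replacements in the plane of mRNA folding deviation (x) and RBS strength deviation (y). The sketch below assumes, purely for illustration, that the zone is an axis-aligned bounding box; the disclosure describes the zone only as an empirically defined range, and all data points here are hypothetical.

```python
def empirical_zone(successful_points):
    """Bounding box (x_min, x_max, y_min, y_max) around successful replacements."""
    xs = [x for x, _ in successful_points]
    ys = [y for _, y in successful_points]
    return min(xs), max(xs), min(ys), max(ys)

def in_zone(point, zone):
    """True when the point lies inside the box, boundaries included."""
    x, y = point
    x_min, x_max, y_min, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max

# Hypothetical (mRNA folding deviation, RBS strength deviation) pairs
# for successful conversions.
successes = [(0.9, 0.8), (1.1, 1.2), (1.0, 1.0), (1.2, 0.7)]
zone = empirical_zone(successes)

candidate_ok = in_zone((1.05, 0.9), zone)    # small deviations on both axes
candidate_bad = in_zone((1.0, 0.03), zone)   # RBS strength largely lost
```

A candidate replacement falling outside the box would be flagged for troubleshooting; cases such as overlapping reading frames, which the figure description notes as exceptions, would need a separate check.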
Figure 23 illustrates an example of codon preference of 14 N-terminal AGR
codons. CRAM
(CRISPR-Assisted MAGE) was used to explore codon preference for several AGR
codons
located within the first 10 codons of their CDS. Briefly, MAGE was used to
diversify a
population by randomizing the AGR of interest, then a CRISPR/Cas9 system as
generally
known in the art using guide RNA and a Cas enzyme was used to deplete the
parental
(unmodified) population, allowing exhaustive exploration of all 64 codons at a
position of
interest. Thereafter codon abundance was monitored over time by serially
passaging the
population of cells and sequencing using an Illumina MiSeq. The left y-axis
(Codon
Frequency) indicates relative abundance of a particular codon (stacked area
plot). The right y-
axis indicates the combined deviations in mRNA folding structure (red line)
and internal RBS
strength (blue line) in arbitrary units (AU) normalized to 0.5 at the initial
timepoint. 0 means
no deviation from wild type. The horizontal axis indicates the experimental
time point in
hours at which a particular reading of the population diversity was obtained.
Genes bcsB and
chpS are non-essential in examples of strains described herein and thus serve
as controls for
AGR codons that are not under essential gene pressure.
Figure 24 illustrates an example in which RBS strength and mRNA structure
predict codon
preference of 14 N-terminal codon substitutions. In particular, FIG. 24 shows
a scatter plot
showing the results of the CRAM experiment (FIG. 23). Each panel represents a
different
gene. The Y-axis represents RBS strength deviation (calculated with the Salis
ribosome
binding site calculator (Salis, 2011)) while the X-axis shows deviations in
mRNA folding
energy (calculated at 37 °C by UNAFold Calculator (Zadeh et al., 2011)).
Codon
abundance at the intermediate time point (t=72hrs, chosen to show maximal
diversity after
selection) is represented by the dot size. Green dots represent the WT codon.
Blue dots
represent synonymous AGR codons. Orange dots represent the remaining 58 non-
synonymous codons, which may introduce non-viable amino acid substitutions.
Black
squares represent unsuccessful AGR->CGU conversions observed in the genome-
wide
recoding effort (Table 3, FIG. 19A-19B). The "safe replacement zone" (blue
shaded region)
is the empirically defined range of mRNA folding and RBS strength deviations,
based on the
successful AGR->CGU replacement mutations observed in this study (FIG. 21A-D).
Genes
bcsB and chpS are non-essential in examples of strains described and thus
serve as controls
for AGR codons that are not under essential gene pressure.
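The codon counts used in the descriptions of FIGS. 23 and 24 follow from simple enumeration: there are 64 possible codons at a position, six of which encode arginine in the standard genetic code (AGA, AGG, CGU, CGC, CGA, CGG), leaving 58 non-synonymous codons. The following minimal sketch reproduces that arithmetic; the variable names are illustrative.

```python
from itertools import product

# All 64 codons over the RNA alphabet.
ALL_CODONS = ["".join(bases) for bases in product("ACGU", repeat=3)]

# Arginine codons in the standard genetic code.
ARGININE_CODONS = {"AGA", "AGG", "CGU", "CGC", "CGA", "CGG"}

# Codons that would change the encoded amino acid (or introduce a stop).
non_synonymous = [c for c in ALL_CODONS if c not in ARGININE_CODONS]
```

The non-synonymous set includes the three stop codons, which is one reason such substitutions may be non-viable.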
Figures 25A-25B illustrate an example in which predicting optimal replacements
for AGR
codons reduces the number of predicted codons that require troubleshooting.
FIG. 25A
illustrates empirical data from the construction of C123. 110 AGR codons were
successfully
recoded to CGU (green), and 13 recalcitrant AGR codons required
troubleshooting (red,
striped). FIG. 25B illustrates predicted recalcitrant codons for replacing all
instances of the
AGR codons genome-wide. The reference genome used for this analysis had
insertion
elements and prophages removed (Umenhoffer et al., 2010) to limit total
nucleotides
synthesized, leaving 3181 AGR codons to be replaced. The analysis predicts
that replacing all
instances of AGR with CGU would have resulted in 246 failed conversions
('Naïve Replacement', red striped). However, implementing the rules from this
work ('Informed
Replacement') to identify the best CGN alternative reduces the predicted
failure rate from
10.5% (13/123) to 2.32% (74/3181), of which only a small subset will have a
direct impact
on fitness due to their location in non-essential genes. Each specific
synonymous CGN is
identified with a unique shade of green and is labeled inside of its
respective section.
Figure 26 illustrates an example strategy for replacing each "set" of AGR
codons in all of the
essential genes of Escherichia coli (EcM2.1). Here the AGR codons are marked
with open
triangles (various colors). To start, a dual-selectable tolC cassette (double
green line) is
recombined into the genome using lambda Red in a multiplexed recombination
along with
several oligos targeting nearby (< 500 kb), downstream AGR loci (various
colored lines).
Upon selection for tolC insertion clones, correctly chosen AGR codons are also
observed
(filled in triangles) at a higher frequency due to strong linkage between
recombination events
at tolC and other nearby (< 500 kb), downstream AGR loci. Next, a second
recombination is
carried out using the same AGR conversion oligo pool, but now paired with
another oligo to
disrupt the tolC ORF with a premature stop, after which the tolC
counter-selection is applied,
again enriching the population for AGR conversions. A third, multiplexed
recombination
then fixes the tolC ORF, again targeting AGR loci. After applying the tolC
selection, clones
are assayed by MASC-PCR. Assuming most conversions in a given set had been
made, the
selectable marker would then be removed using a repair oligo in a singleplexed
or
multiplexed recombination (depending on need). The tolC counter-selection is
then leveraged
to both leave a scarless chromosome and free up the tolC cassette for use
elsewhere in the
genome.
Figures 27A-27C illustrate an example schematic of 3 different failure cases for recalcitrant
AGR->CGU mutations. For each case, the top row is the initial sequence, the middle row is
the AGR->CGU mutation and the third row of primary DNA sequence is the optimized
solution converged on in troubleshooting. Green boxes below the DNA sequence indicate
amino acid sequence in the same order (top is initial, middle results from AGR->CGU,
bottom results from troubleshot solution). FIG. 27A illustrates C-terminal overlap cases of
AGR's at ends of essential genes with downstream ORF's. (i) Genes ftsI and murE overlap
with each other. An AGA->CGU mutation in ftsI would introduce a non-conservative
Asp3Val mutation in murE. The amino acid sequence of murE was preserved by using an
AGA->CGA mutation. (ii) Genes holB and tmk overlap with each other. An AGA->CGU
mutation in holB would introduce a non-conservative Stop214Cys mutation in tmk. The
amino acid sequence of tmk was preserved by using an AGA->CGC mutation and adding 3
nucleotides. FIG. 27B illustrates C-terminal overlap cases of AGR's at ends of essential
genes with the RBS of a downstream gene. (i) Gene secE overlaps with the RBS for
downstream essential gene nusG. An AGG->CGU mutation would diminish the RBS
strength by 97% (Salis et al., 2011). RBS strength is preserved by using an AGG->GAG
mutation. (ii) Gene dnaT overlaps with the RBS for downstream essential gene dnaC. An
AGG->CGU mutation would diminish the RBS strength by 77% (Salis et al., 2011). RBS
strength is preserved by using an AGG->CGA mutation. (iii) Gene folC overlaps with the
RBS for downstream gene dedD, shown to be essential in the strain. An AGGAGA->CGUCGU
mutation would diminish the RBS strength by 99% (Salis et al., 2011). RBS
strength is preserved by using an AGGAGA->CGGCGA mutation. FIG. 27C illustrates N-terminal
RBS motifs causing recalcitrant AGR conversions at the beginning of essential genes. (i)
Gene dnaT has an internal RBS-like motif. An AGG->CGU mutation would increase the
RBS strength 26 times (Salis, 2011). RBS strength is better preserved by using an AGA->CGU
mutation combined with additional wobble mutations. (ii) Gene prfB has an internal
RBS-like motif. This RBS motif is involved in a downstream planned frameshift in prfB
(Curan, 1993). Only by removing the frameshift was the AGG->CGU mutation possible (leaving
a poor RBS-like site). To maintain the frameshift, an AGG->CGG mutation and additional
wobble were required. In that case, local RBS strength was maintained (fourth row). (iii) Gene
ssb has an internal RBS-like motif. An AGG->CGU mutation would diminish the RBS
strength by 94%. RBS strength is preserved by using an AGA->CGA mutation combined
with additional wobble mutations.
Figure 28 illustrates an example of ribosomal pausing data drawn from previous
work (Li et
al., 2012) for genes ssb, dnaT and prfB. Green line represents ribosome
profiling data for
each gene. Orange line is the average for all genes with an AGR codon within
the first 30
nucleotides of the annotated start codon. Region between the two vertical red
lines indicates
zones of interest (centered 12bp after the AGR codon). Interestingly, prfB and
ssb show a
peak after the AGR codon, where no peak is observed for dnaT. Based on
predictions from
the Salis calculator, replacing AGR with CGU in those 3 cases is believed to
disrupt
ribosomal pausing (prfB and ssb) or to introduce ribosomal pausing (dnaT).
Figure 29 illustrates an example of mRNA folding predictions for the 4 recalcitrant AGR->CGU
mutations explained by mRNA folding variations. mRNA folding was predicted for 100
nucleotides upstream and 30 nt downstream of the start codon using UNAfold (Markham et
al., 2008). Both the shape of the mRNA folding and the folding energy value have to be taken
into account to understand failure of the AGR->CGU conversion. 'AGR' depicts the
predicted, wild-type mRNA, 'CGU' is the mRNA folding prediction with an AGR->CGU
mutation (generally not observed) and 'Optimized' corresponds to the mRNA folding
prediction of the AGR replacement solution found after in vivo
troubleshooting. Under each
structure, the predicted free energy of folding of the visualized structure is
listed in kcal/mol.
Figures 30A-30D illustrate an example of mRNA folding predictions for the gene
rnpA. For
folding predictions, 30 nucleotides were used upstream and 100 nucleotides
downstream of
the rnpA start site using UNAfold (Markham et al., 2008). FIG. 30A illustrates
the wild-type
rnpA sequence, with AGG (in blue box). FIG. 30B illustrates the wild-type rnpA
sequence
with AGG->CGU in blue box (not observed). FIG. 30C illustrates the wild-type
rnpA
sequence with AGG->CGG in blue box (observed with no growth rate defect). FIG.
30D
illustrates the wild-type rnpA sequence with AGG->CTG in blue box and one
complementary
mutation CCC->CCA to maintain the mRNA loop (in blue box) (observed, also with
no
growth rate defect).
Figure 31 illustrates an example in which G15A ArgU does not affect expression
and
aminoacylation levels in WT and recoded E. coli strains. Northern blot Acid-Urea PAGE was
performed on WT and G15A argU tRNA in wild-type E. coli (WT-WT and WT-G15A),
and
in the final strains C123a and b (501 and 503) at several growth conditions.
Aminoacylation
levels are comparable to wild-type for all conditions and combinations,
suggesting no effect
on charging levels despite the mutation sweeping into the population.
Figure 32 illustrates an example of a number of reads for each codon and for each gene in the
CRAM experiment at time point 24hrs. CRAM (CRISPR-Assisted MAGE) was used to explore
codon preference for several N-terminal AGR codons. The left y-axis (Number of reads)
indicates abundance of a particular codon. The x-axis indicates the 64 possible codons ranked
from AAA to TTT in alphabetical order. Experimental time point 24hrs is presented.
Diversity was assayed by Illumina sequencing. Genes bcsB and chpS are non-essential and
thus serve as controls for AGR codons that are not under essential gene pressure.
Figure 33 illustrates an example of a number of reads for each codon and for each gene in the
CRAM experiment at time point 144hrs. CRAM (CRISPR-Assisted MAGE) was used to
explore codon preference for several N-terminal AGR codons. The left y-axis (Number of
reads) indicates abundance of a particular codon. The x-axis indicates the 64 possible codons
ranked from AAA to TTT in alphabetical order. Experimental time point 144hrs is presented.
Diversity was assayed by Illumina sequencing. Genes bcsB and chpS are non-essential and
thus serve as controls for AGR codons that are not under essential gene pressure.
Figure 34 illustrates an example of a number of predicted recalcitrant AGR
codons for each
AGR replacement strategy. 4 possible genomes replacing all 3222 AGRs have been
designed
using 4 replacement strategies. First AGRs were changed to CGU genome-wide
(green bars).
Second, AGR synonyms were chosen to minimize local mRNA folding deviation near
the
start of genes (orange bars). Third, AGR synonyms were chosen to reduce RBS
strength
deviation (blue bars). Finally, AGR synonyms were chosen to minimize both
(purple bars).
These genomes were then scored using custom software and compared. Every
deviation
outside of the Safe Replacement Zone is predicted to be a recalcitrant codon.
Figure 35 illustrates an example of a representational graph of the fully
recoded genome
relative to MG1655. The outer ring contains the set grouping that each AGR
codon (vertical
line) is in. Each line contains information on troubleshooting (red if
troubleshot, green if not),
and relative recombination frequency (dot). Each internal ring represents the mutations
accumulated during that set's creation; the active set for each ring is highlighted. The internal
rings represent the troubleshooting steps during strain construction.
Fig. 36A is a schematic depicting various method steps of embodiments of the
present
disclosure.
Fig. 36B is a graph depicting the experimental procedure where alternative
codons are
introduced via MAGE at different positions in the genome. The population is
then maintained
at mid-logarithmic phase growth while sampling at regular intervals. Codon fractions are
plotted versus time, a logarithmic decay function is fitted, and the decay constant indicates
fitness.
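The fitting step described for Fig. 36B can be sketched as follows. This is a minimal illustration only, not the analysis code used in the Examples: it assumes the codon fraction follows f(t) = f0 * exp(-k * t), so that a straight-line fit to ln(f) versus t recovers the decay constant k. The function name fit_decay_constant and the sample data are hypothetical.

```python
import math

def fit_decay_constant(times, fractions):
    """Least-squares fit of ln(fraction) = ln(f0) - k * t; returns k.

    Assumed model (not stated in the source): f(t) = f0 * exp(-k * t).
    Zero or negative fractions are skipped because their log is undefined.
    """
    points = [(t, math.log(f)) for t, f in zip(times, fractions) if f > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive fractions to fit")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return -(sxy / sxx)  # slope of ln(f) vs t is -k

# Hypothetical time course: an alternative codon decaying with k = 0.2 per hour.
hours = [0, 1, 2, 3, 4]
observed = [0.5 * math.exp(-0.2 * t) for t in hours]
k = fit_decay_constant(hours, observed)
```

Under this assumed model, a larger k means the alternative codon is lost from the population faster, which corresponds to a larger fitness cost; k near zero corresponds to wild-type-like fitness.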
Fig. 36C compares the experimentally-measured fitness to the predicted GETK
score. Each
position on the x-axis corresponds to one of 95 sub-experiments testing a
different genomic
position. Position on the y-axis indicates fitness relative to wild-type, with
more negative
value indicating worse fitness and 0 indicating wild-type fitness. Inset shows
fitness of
measured codons grouped by good, average, or bad GETK scores. Examples with
good
predicted score have significantly better fitness.
Fig. 37 shows a summary of results of 62 sub-experiments testing combinations of proximal
codon changes near the 5-prime ends of various genes. A library of oligos was
designed with
degeneracy at codon positions within the 90-mer oligo window. Sub-experiment
results are
presented together, but separated by codon combinations with good fitness (<
7% fitness
defect) or bad fitness (> 13% fitness defect). A pair of good-bad fitness
summaries is plotted
for each of three GETK scoring metrics: change in 5-prime mRNA folding
strength, change
in upstream RBS motif strength, change in internal RBS motif strength. For
each metric, a
lower score indicates less predicted disruption of the respective motif.
Fig. 38 illustrates alternative codon trajectories for controls. Top row shows null-effect
controls, where synonymous codons and early stop codons were introduced into non-essential
genes LacZ and GalK at multiple positions, showing a similar effect between synonymous
codons and internal stops. Bottom rows show strong-effect controls, where synonymous
codons and internal stop codons were introduced into essential genes. These show a marked
difference between internal stop and synonymous codons, with a greater dynamic range of
codon preference at some positions.
Fig. 39 summarizes results from testing non-synonymous and synonymous
mutations
observed in phylogenetically-close neighbors of E. coli in gammaproteobacteria
at specific
positions internal to genes (not limited to 5-prime end). These positions were
prioritized
according to whether internal RBS for some alternatives were predicted by GETK
to be
disruptive. Internal RBS score is shown to be a strong predictor of fitness of
alternative allele
choices.
Fig. 40 shows results from testing a mix of non-synonymous mutations predicted
by
conservation. These positions were prioritized according to peaks of ribosomal
pausing as
reported by (Li et al., 2012). Internal RBS score is shown to be a strong
predictor of fitness of
alternative allele choices.
DETAILED DESCRIPTION
Embodiments of the present disclosure are based on methods, algorithms, and
computer
software for designing genomes based on a set of rules or constraints or
conditions or
parameters or features which may be generally referred to throughout as "constraints", "a
constraint," "rules," or "a rule" or "rule based." The rule-based genome design described
herein includes methods and computer algorithms for implementing genome
modifications
while preserving known biological motifs and features in DNA and satisfying
various
constraints and/or rules or conditions or parameters or features for synthesis
and assembly of
designed genomes. As described herein, rules or conditions or parameters or
features may
refer to biological constraints and synthesis constraints which may be applied
in synthesizing
genome designs by scoring each constraint for a possible genome design.
Biological motifs
may include essential genes, ribosome binding site (RBS) motifs, mRNA
secondary
structures, internal ribosome pausing site motifs, and the like. In some
embodiments, the
disclosed methods for genome design may be directed to designing genetic
elements,
including genes, operons, genomes, and the like.
Aspects of the present disclosure include methods for empirically deriving new
rules or
constraints or conditions or parameters or features based on combinations of
multiplex
automatable genome engineering (MAGE) and targeted sequencing, along with
other
technologies such as CRISPR-assisted MAGE (CRAM), MAGE in combination with
molecular inversion probes (MIPS), and the like. Aspects described herein may
also include
providing information about designed genomes based on a set of constraints
and/or rules and
recommending modifications that may yield phenotypic improvements in future
genome
design. Ultimately, the rule-based genome design methods and integrated
software disclosed
herein may be beneficial in the fields of genome engineering and bioproduction
for
improving efficiency and reducing costs of DNA construct production.
In some cases, several challenges may arise when modifying a genome, such as
when
choosing synonymous alleles for genome-wide allele replacement of certain
alleles (which
may be referred to as "forbidden alleles" or "forbidden codons" as described
herein). First, to
ensure biological viability, it may be important to maintain the fundamental
features of a
parent genome, such as GC content and regulatory elements encoded by the
primary
nucleotide sequence. Additionally, when forbidden alleles fall in overlapping
gene regions, it
may be necessary to carefully split these overlaps in a manner that avoids
introducing non-
synonymous mutations or disrupting regulatory features. Finally, it may be
desirable for a
computational design scheme to be compatible with the experimental tools being
used for
genome construction.
Thus, described herein is a rule-based architecture for genome recoding
software, in which
user-specified rules serve as constraints for finding suitable synonymous
allele replacements.
Tables 1 and 2 provide examples of rules and constraints that may be
implemented for genome design (e.g., for design and synthesis of a radically recoded E. coli
genome). In particular, Table 1 provides examples of biological constraints or conditions or
parameters or features for genome design rules, whereas Table 2 provides
examples of
synthesis constraints or conditions or parameters or features for genome
design rules. The
rule-based architecture described herein may be implemented as a computer
module or
software module and may be extended to general applications, as well as
customized
according to specific needs.
In the following description of the various embodiments, reference is made to
the
accompanying drawings, which form a part hereof, and in which is shown by way
of
illustration, various embodiments of the disclosure that may be practiced. It
is to be
understood that other embodiments may be utilized. A person of ordinary skill
in the art after
reading the following disclosure will appreciate that the various aspects
described herein may
be embodied as a computerized method, system, device, or apparatus utilizing
one or more
computer program products. Accordingly, various aspects of the computerized
methods,
systems, devices, and apparatuses may take the form of an embodiment
consisting entirely of
hardware, an embodiment consisting entirely of software, or an embodiment
combining
software and hardware aspects. Furthermore, various aspects of the
computerized methods,
systems, devices, and apparatuses may take the form of a computer program
product stored
by one or more non-transitory computer-readable storage media having computer-
readable
program code, or instructions, embodied in or on the storage media. Any
suitable computer
readable storage media may be utilized, including hard disks, CD-ROMs, optical
storage
devices, magnetic storage devices, and/or any combination thereof. In addition,
various
signals representing data or events as described herein may be transferred
between a source
and a destination in the form of electromagnetic waves traveling through
signal-conducting
media such as metal wires, optical fibers, and/or wireless transmission media
(e.g., air and/or
space). It is noted that various connections between elements are discussed in
the following
description. It is noted that these connections are general and, unless
specified otherwise,
may be direct or indirect, wired or wireless, and that the specification is
not intended to be
limiting in this respect.
In one or more arrangements, teachings of the present disclosure may be
implemented with a
computing device. FIG. 1 illustrates a block diagram of a computing device 100
that may be
used in accordance with aspects of the present disclosure, such as for
implementing methods
for genome design. The computing device 100 is a specialized computing device
programmed and/or configured to perform and carry out aspects associated with
rule-based
genome design as described herein. The computing device 100 may have a genome
design
module 101 configured to perform methods and execute instructions as described
herein. The
genome design module 101 may be implemented with one or more specially
configured
processors and one or more storage units (e.g., databases, RAM, ROM, and other
computer-readable media), one or more application specific integrated circuits (ASICs),
and/or other hardware components. Throughout this disclosure, the genome design module 101
may refer to the software (e.g., a computer program, application, and/or algorithm)
and/or hardware used to receive one or more genome files or templates (e.g., one or more
annotated GenBank files), receive a list of alleles to be replaced, modify a genome by applying
a set of biological constraints and synthesis constraints to the genome sequence(s), generate a
new genome design based on the modifications, score genome designs, modify and/or create new
rules or constraints or conditions or parameters or features for genome design, and the like.
Specifically, the genome design module 101 may be a part of a rule-based
architecture for
genome recoding software which may be further extended to other applications.
The one or
more specially configured processors of the genome design module 101 may
operate in
addition to or in conjunction with another general processor 103 of the
computing device
100. In some embodiments, the genome design module 101 may be a software
module
executed by one or more general processors 103. Both the genome design module
101 and
the general processor 103 may be capable of controlling operations of the
computing device
100 and its associated components, including RAM 105, ROM 107, an input/output
(I/O)
module 109, a network interface 111, and memory 113.
The I/O module 109 may be configured to be connected to an input device 115,
such as a
microphone, keypad, keyboard, touchscreen, gesture or other sensors, and/or
stylus through
which a user of the computing device 100 may provide input data. The I/O
module 109 may
also be configured to be connected to a display device 117, such as a monitor,
television,
touchscreen, and the like, and may include a graphics card. The display device
117 and input
device 115 are shown as separate elements from the computing device 100; however, they
may be within the same structure. Using the input device 115, system
administrators or users
may add and/or update various aspects of the genome design module, such as
rules or
constraints or conditions or parameters or features, scoring, predefined
thresholds, ranges,
and biological and synthesis constraints related to designing a genome. The
input device 115
may also be operated by users in order to design a genome by inputting a
genome file and a
list of alleles or sequences to be modified in the genome file by the genome
design module
101.
The memory 113 may be any computer readable medium for storing computer
executable
instructions (e.g., software). The instructions stored within memory 113 may
enable the
computing device 100 to perform various functions. For example, memory 113 may
store
software used by the computing device 100, such as an operating system 119 and
application
programs 121, and may include an associated database 123.
The network interface 111 allows the computing device 100 to connect to and
communicate
with a network 130. The network 130 may be any type of network, including a
local area
network (LAN) and/or a wide area network (WAN), such as the Internet. Through
the
network 130, the computing device 100 may communicate with one or more
computing
devices 140, such as laptops, notebooks, smartphones, personal computers,
servers, and the
like. The computing devices 140 may include at least some of the same
components as
computing device 100. In some embodiments the computing device 100 may be
connected to
the computing devices 140 to form a "cloud" computing environment.
The network interface 111 may connect to the network 130 via communication
lines, such as
coaxial cable, fiber optic cable, and the like or wirelessly using a cellular
backhaul or a
wireless standard, such as IEEE 802.11, IEEE 802.15, IEEE 802.16, and the
like. In some
embodiments, the network interface may include a modem. Further, the network
interface
111 may use various protocols, including TCP/IP, Ethernet, File Transfer
Protocol (FTP),
Hypertext Transfer Protocol (HTTP), and the like, to communicate with other
computing
devices 140.
According to certain aspects, the computing device 100 may interface with one
or more
databases 155 to access genome data (e.g., gene sequences). For example, a
database 155
may be an external database that stores a collection of nucleotide sequences
(e.g., DNA,
mRNA, cDNA, and the like) and corresponding protein translations (e.g.,
GenBank). In some
cases, the genome design module 101 may access and/or receive a specific
genome file or
template from the database 155, and the genome design module 101 may utilize
the file for
further genome design based on a set of rules and scoring.
FIG. 1 is an example embodiment of a computing device 100. In other
embodiments, the
computing device 100 may include fewer or more elements. For example, the
computing
device 100 may use the general processor(s) 103 to perform functions of the
genome design
module 101, and thus, might not include a separate processor or hardware for
the genome
design module 101.
Although not required, various aspects described herein may be embodied as a
method, data
processing system, or as computer-readable medium storing computer-executable
instructions. For example, a computer-readable medium storing instructions to
cause a
processor to perform steps of a method in accordance with aspects of the
disclosed
embodiments is contemplated. For example, aspects of the method steps and
algorithms
disclosed herein may be executed on a processor on computing device 100. Such
a processor
may execute computer-executable instructions stored on a computer-readable
medium.
FIG. 2 illustrates an example block diagram of a genome design module in which
various
aspects of the present disclosure may be implemented in accordance with one or
more
example embodiments. In particular, FIG. 2 illustrates a genome design module
201 which
may comprise a software tool that may be utilized for any genome
modifications, such as a
genome-wide allele replacement in a prokaryotic genome. In some embodiments,
the genome
design module 201 may be the same as the genome design module 101.
The genome design module 201 may be utilized for a variety of purposes, including
refactoring
genomes such as by removing all occurrences of a particular allele throughout
the genome
(allowing deletion of translation factors and functional allele reassignment),
rearranging
operons into functionally related units, removing non-essential elements
(e.g., cryptic
prophages, mobile elements, non-essential genes, etc.),
modifying/optimizing/introducing
metabolic pathways, and the like.
As illustrated in the example in FIG. 2, the genome design module 201 may
receive two
inputs: a genome template file 202 and a list of alleles 204. The genome
template 202 may
comprise known genome sequences or a particular genome (e.g., in the form of
an annotated
GenBank file). In some embodiments, the genome template 202 may comprise
sequences for
any type of genome, including bacterial genomes, mycoplasma genomes, yeast
genomes,
human genomes, genomes for any naturally-occurring organism, or genomes of any
previously evolved or engineered organism. As an example, an E. coli MDS42
genome
template (GenBank: AP012306.1) was used as the genome template 202 as
described in the
Examples herein. The list of alleles 204 may comprise a list of alleles to be
synonymously
replaced throughout the genome. The list of alleles 204 may also include
coding sequences
(e.g., codons) and non-coding sequences (e.g., non-coding RNAs including tRNA
and sRNA,
extragenic sequence motifs that may or may not overlap with the coding
sequence, repetitive
extragenic palindromic (REP) sequences, or the like). In some embodiments, the
list of alleles
204 may represent a list of codons, which may be referred to as "forbidden
codons." For
example, the following seven codons were in the list of codons to be replaced
in the E. coil
example described below: AGA, AGG, AGC, AGU, UUG, UUA, and UAG.
The genome design module 201 may receive the genome template 202 and the list
of alleles
204 and automatically replace all instances of alleles from the list in the
genome. For
example, the genome design module 201 may automatically replace, within the
genome, all
instances of forbidden codons from a list of codons. The genome design module
201 may
also utilize a scoring sub-module 208, and the genome design module 201 may be
configured
to select synonymous codons that allow the resulting sequence to best adhere
to biological
constraints 205 and/or synthesis constraints 206. In some embodiments, the
scoring sub-
module 208 may be referred to as a scoring tool.
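The replacement step described above can be sketched as follows. This is a minimal illustration in Python and not the disclosed software: the function name recode_cds, the greedy one-codon-at-a-time search, and the toy gc_shift penalty are assumptions introduced here. The synonym table follows the standard genetic code for the seven forbidden codons listed above, written in the DNA alphabet (T in place of U). Overlapping genes, non-coding alleles, and the full rule set of Tables 1 and 2 are deliberately out of scope.

```python
# Synonymous alternatives for the seven forbidden codons (standard genetic code).
ARG = ["CGT", "CGC", "CGA", "CGG"]
SER = ["TCT", "TCC", "TCA", "TCG"]
LEU = ["CTT", "CTC", "CTA", "CTG"]
SYNONYMS = {
    "AGA": ARG, "AGG": ARG,   # arginine
    "AGC": SER, "AGT": SER,   # serine
    "TTG": LEU, "TTA": LEU,   # leucine
    "TAG": ["TAA", "TGA"],    # stop
}

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_shift(original, design):
    """Toy penalty (assumption): absolute change in GC content."""
    return abs(gc_fraction(design) - gc_fraction(original))

def recode_cds(cds, penalty=gc_shift):
    """Replace each forbidden codon with the synonym of lowest penalty.

    `penalty(original, design)` stands in for a scoring tool; lower is better.
    Codons are handled left to right, so this is a greedy search.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx, codon in enumerate(codons):
        options = SYNONYMS.get(codon)
        if options is None:
            continue  # not a forbidden codon
        def score(candidate, idx=idx):
            trial = codons[:idx] + [candidate] + codons[idx + 1:]
            return penalty(cds, "".join(trial))
        codons[idx] = min(options, key=score)
    return "".join(codons)

recoded = recode_cds("ATGAGATTATAG")
```

In practice the penalty callable would combine several biological and synthesis constraints rather than GC content alone; the single-function interface is only a convenience for this sketch.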
Tables 1 and 2 provide examples of biological constraints 205 and synthesis
constraints 206,
respectively, which may be applied in genome design, along with descriptions
of rules,
constraints or conditions or parameters or features, motivation,
implementation, and
corresponding genome annotations. The synthesis constraints 206 may include
one or more
experimental rules or constraints or conditions or parameters or features that
may be applied
for synthesizing genome designs. In some cases, the synthesis constraints 206
may be vendor
and/or technology-specific rules or constraints or conditions or parameters or
features that are
to be satisfied during genome design. Examples of synthesis constraints 206
may include
(and are not limited to) rules for removing forbidden restriction enzyme
motifs, leveraging
synonymous swaps to normalize high/low GC content within genes in a genome
design,
preserving regulatory motifs if high/low GC content is present in intergenic
regions,
minimizing strong secondary structures, deleting repetitive elements which may
be difficult
to synthesize and replacing them by terminators, leveraging synonymous swaps
to diversify
primary sequence if homopolymer runs are present within genes, preserving
regulatory motifs
if homopolymer runs are present in intergenic regions, partitioning operons to
increase the
likelihood of synthesizing modular genome units that contain the entirety of discrete
discrete
transcriptional units, etc.
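A few of the synthesis constraints listed above lend themselves to simple sequence checks, sketched below. This is an illustration under stated assumptions rather than the disclosed implementation: the function name synthesis_violations and every threshold (the single forbidden motif GGTCTC, a maximum homopolymer run of 8, and a 30% to 70% GC range over 100-nt windows) are hypothetical placeholders, since actual limits would be vendor and technology specific.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def synthesis_violations(seq, forbidden_sites=("GGTCTC",), max_homopolymer=8,
                         gc_window=100, gc_bounds=(0.30, 0.70)):
    """Return a position-sorted list of (kind, start) synthesis flags.

    All default thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    found = []
    # Forbidden restriction motifs, checked on both strands.
    for site in forbidden_sites:
        for motif in {site, reverse_complement(site)}:
            start = seq.find(motif)
            while start != -1:
                found.append(("restriction_site", start))
                start = seq.find(motif, start + 1)
    # Homopolymer runs longer than the allowed maximum.
    for run in re.finditer(r"A+|C+|G+|T+", seq):
        if len(run.group()) > max_homopolymer:
            found.append(("homopolymer", run.start()))
    # GC content in consecutive, non-overlapping full windows.
    low, high = gc_bounds
    for start in range(0, max(len(seq) - gc_window + 1, 0), gc_window):
        window = seq[start:start + gc_window]
        gc = (window.count("G") + window.count("C")) / len(window)
        if not low <= gc <= high:
            found.append(("gc_content", start))
    return sorted(found, key=lambda flag: flag[1])

example = "ACGTACGT" + "GGTCTC" + "ACGT" + "A" * 10 + "CGT"
flags = synthesis_violations(example, gc_window=1000)
```

Flagged positions inside genes could then be candidates for synonymous swaps, while flagged intergenic positions would need to be weighed against preserving regulatory motifs, as the text describes.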
The biological constraints 205 may include one or more rules or constraints or
conditions or
parameters or features that are applied to genome design for preserving
biologically relevant
motifs, in which the biological constraints 205 may be implemented as code in
the genome
design module 201. For example, the biological constraints 205 may include a
rule for
maintaining predicted secondary structure of RNA (e.g., including, but not
limited to,
mRNA). The genome design module 201 may compute a predicted RNA secondary
structure
for both an original sequence and a modified, design sequence, and the scoring
sub-module
208 may provide a quantitative representation of the difference between the
two. In some
embodiments, the genome design module 201 may compute deviation in predicted
mRNA
secondary structure by comparing the predicted free energy (ΔG) of the
original and designed
sequences (e.g., a thermodynamic-based secondary structure prediction) and/or
by calculating
a number of nucleotides that are no longer paired with the same sister
nucleotide in the
designed sequence with respect to the original sequence. In some cases, a rule
may be
modified according to the context of a desired change. For example, for
changes near a 5' end
of a gene, the genome design module 201 may compute an mRNA secondary
structure
spanning nucleotides -30 to +100 of a sequence and relative to the start codon
of the gene.
Additionally, the biological constraints 205 may also include a rule or
constraint or condition
or parameter or feature for preserving ribosome binding site (RBS) motifs. A
ribosome
binding site may comprise a DNA sequence motif (e.g., sequence of nucleotides)
found
approximately ten bases upstream of a gene (e.g., upstream of a start codon).
The genome
design module 201 may score and rank sequence designs according to disruption
to ribosome
binding sites (e.g., by using the scoring sub-module 208). For example, if an RBS motif exists
RBS motif exists
in overlapping genes (e.g., to support expression of a downstream, overlapping
gene), it may
be beneficial to only allow mutations that do not strongly impact RBS
strength. In yet another
example, if output design parameters conflict with preserving said RBS motif
in an
overlapped architecture, then coding regions may be split and an RBS motif of
similar
strength may be inserted to support translation of downstream genes.
In some embodiments, the genome design module 201 may implement RBS motif
strength
predictions by utilizing biophysical models, such as the Salis ribosome
binding site calculator
(Salis, 2011), or by other empirical RBS strength look-up tables. For example,
the scoring
sub-module 208 of the genome design module 201 may calculate a predicted
expression
score for the reference sequence and the designed sequence using a biophysical
model (e.g.,
from Salis, 2011). The ratio (or log-ratio) of these scores may become a
quantified expression
of disruption of this rule or constraint or condition or parameter or feature.
In yet another example, the biological constraints 205 may include a rule or
constraint or
condition or parameter or feature for preserving internal ribosome pausing
site motifs. For
example, the occurrence of ribosome binding site-like motifs (e.g., an anti-
Shine-Dalgarno
sequence) may correspond to translational pausing in E. coli, which may
suggest that these
motifs play a biologically important role (Li et al., 2012). Thus, the
genome design
module 201 may implement a design rule that leverages a biophysical model
(e.g., from Salis,
2011). As described in the Examples herein, to score a proposed design change,
it may be
assumed that a codon might be part of an RBS by inserting a phantom ATG start
codon the
correct number of bases (e.g., approximately 10) downstream of the change.
Based on this
rule, the genome design module 201 may calculate the predicted RBS strength
before and
after a proposed design change, penalizing disruption of existing internal
ribosome pausing
sites, or introduction of strong internal ribosomal pausing sites where one
did not exist
before.
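The phantom start codon idea can be sketched as follows. This is only an illustration under stated assumptions: the `toy_rbs_strength` predictor below is a hypothetical stand-in (a simple match count against an assumed Shine-Dalgarno-like core) and is not the biophysical model cited above; in practice a real RBS strength predictor would be passed in through the `predict` argument. The spacing of 10 bases follows the approximate value given in the text.

```python
import math
from typing import Callable

SD_CORE = "AGGAGG"  # assumed Shine-Dalgarno-like core, used only by the toy predictor


def with_phantom_start(seq: str, change_pos: int, spacing: int = 10) -> str:
    """Truncate seq about `spacing` bases downstream of the change and append ATG."""
    cut = min(len(seq), change_pos + spacing)
    return seq[:cut] + "ATG"


def toy_rbs_strength(seq_with_start: str) -> float:
    """Hypothetical stand-in predictor: best SD_CORE match upstream of the ATG."""
    upstream = seq_with_start[:-3][-20:]
    best = 0
    for i in range(len(upstream) - len(SD_CORE) + 1):
        window = upstream[i:i + len(SD_CORE)]
        best = max(best, sum(a == b for a, b in zip(window, SD_CORE)))
    return float(best + 1)  # offset by 1 so the ratio below is always defined


def pausing_penalty(original: str, designed: str, change_pos: int,
                    predict: Callable[[str], float] = toy_rbs_strength) -> float:
    """Absolute log-ratio of predicted strength after versus before the change."""
    before = predict(with_phantom_start(original, change_pos))
    after = predict(with_phantom_start(designed, change_pos))
    return abs(math.log(after / before))
```

Using the absolute log-ratio penalizes both cases named above: weakening an existing internal pausing site and creating a strong one where none existed.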
Additional examples of biological constraints 205 may include (and are not
limited to) rules
or constraints or conditions or parameters or features for ensuring that a
selection of
alternative alleles or codons is consistent with global distribution of allele
or codon choice
(both for recoding and heterologous expression), preserving known sequence
motifs in a
genome design (e.g., frame-shift, selenocysteine insertion sequence (SECIS)
sites,
recombination sites, etc.), preserving regulatory motifs such as by
preserving/tuning
promoter, enhancer, and/or transcription factor motifs, applying phylogenetic
conservation
for a genome design by choosing sequences which are closest to
phylogenetically-related
neighbors when considering alternatives for a genome design modification,
reducing
homology between redesigned regions through non-disruptive muddling, etc. In
the reducing
homology example, the optimal solution for performing synonymous codon swaps
while
preserving an overlapping regulatory motif may be to split the overlap by
making a copy,
which may result in adjacent regions of high homology. The homology may be
broken by
performing synonymous codon swaps or other changes that do not break any
annotated
regulatory motifs. This may be important to produce stable genomes, such as by
preventing
an undesired recombination that could revert the redesigned sequence.
Furthermore, the genome design module 201 may implement the rules or
constraints or
conditions or parameters or features of the biological constraints 205 by
using the scoring
sub-module 208 to score genetic sequences (e.g., genome designs) with respect
to reference
sequences (e.g., genome templates). In some embodiments, the scoring sub-
module 208 may
assign a quantitative score to every possible change to a gene or genome. This
scoring may
allow ranking and prioritizing designs that achieve a desired genotypic or
phenotypic
outcome. The scoring, ranking, and prioritization features may comprise core
features of the
software for the genome design module 201.
For example, for a design choice with mutually exclusive options (e.g., for
choosing an allele
replacement), the genome design module 201 may allow ranking of design
choices. In some
embodiments, the best single design choice or any number of the best single
design choices
may be chosen for synthesis and testing. In other embodiments, all design
choices that pass a
predefined score threshold may be synthesized and tested.
Additionally, the scoring sub-module 208 of the genome design module 201 may
implement
different types of scoring. For example, a higher score may indicate less
deviation from the
biological constraints 205 (e.g., a set of rules) and may thus be preferred.
For example, less
deviation from the constraints may indicate a higher predicted success in
biological
validation. In another example, a lower score may indicate less deviation from
the biological
constraints 205 (e.g., a set of rules), and may thus be preferred.
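The ranking and selection options described above can be sketched in a few lines. The dictionary of design names and scores is a hypothetical input, and the `higher_is_better` flag is an assumed way to cover both scoring conventions (higher score preferred, or lower score preferred).

```python
def rank_designs(scores: dict[str, float], higher_is_better: bool = True) -> list[str]:
    """Return design names ordered from most to least preferred."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


def select_top(scores: dict[str, float], n: int, higher_is_better: bool = True) -> list[str]:
    """Pick the n best design choices for synthesis and testing."""
    return rank_designs(scores, higher_is_better)[:n]


def select_passing(scores: dict[str, float], threshold: float,
                   higher_is_better: bool = True) -> list[str]:
    """Pick every design choice that passes a predefined score threshold."""
    if higher_is_better:
        return [name for name in rank_designs(scores, True) if scores[name] >= threshold]
    return [name for name in rank_designs(scores, False) if scores[name] <= threshold]
```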
The genome design module 201 may further implement scoring for a genetic
design as a
weighted combination of scores from specific rules or constraints or
conditions or parameters
or features. For example, in the case where a score may be interpreted as a
deviation from a
biological motif value and for the genetic design of swapping alternative
alleles, each choice
of allele may be scored according to a combination of factors.
That is, there may be a plurality of alternative gene sequences in which each
alternative gene
sequence comprises a different allele choice which may be used to replace one
or more
forbidden alleles in a reference genome. Thus, the genome design module 201
may apply
rules or constraints or conditions or parameters or features for the
biological constraints 205
by assigning a score for each rule in each alternative gene sequence. In some
embodiments,
each allele choice may be scored according to a combination of biological
constraints 205,
including fold disruption of predicted mRNA secondary structure folding
energy, fold
disruption of predicted ribosome binding site (RBS) affinity strength, and the
like.
For example, a total score for an alternative gene sequence comprising an
allele choice may
be computed (e.g., by the genome design module 201) using the following
equation:
score = w1 * f(mRNA score) + w2 * g(RBS score)
In the above equation, w1 and w2 represent weights, whereas f and g represent
functions of the
respective quantification of the rules. Furthermore, the weights w1 and w2 may
be determined
empirically and may be updated or modified according to results from synthesizing
and testing
genome designs. In other embodiments, the weights may be adjusted by manual
specification
in which a user may manually specify (e.g., enter in) each weight (e.g., as an
input into the
genome design module 201 and/or the computing device 100). The weights and
scoring may
also be applied globally or may be context-specific. For example, a first set
of weights may
hold true and be applied near a 5' end of a gene, whereas a different set of
weights or a
different combination of rules or constraints or conditions or parameters or
features may be
true and may be applied in a different area of the gene (e.g., in the middle
of the gene). As
described in the Examples herein, it was empirically found that the following
weights for
codon choices in E. coli may predict a successful swap:
score = (0.65/1.5411) * mRNAratio + (0.35/8.4257) * (1 + LOG(RBSratio))
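A minimal sketch of the empirical score follows. It reads the equation as a weighted sum, consistent with the "weighted combination" described above, and it assumes that LOG denotes a base-10 logarithm, which the text does not state. The parameter names are illustrative; the numeric constants are taken from the equation.

```python
import math


def swap_score(mrna_ratio: float, rbs_ratio: float,
               w_mrna: float = 0.65, w_rbs: float = 0.35,
               norm_mrna: float = 1.5411, norm_rbs: float = 8.4257) -> float:
    """Weighted combination of the mRNA folding ratio and the RBS strength ratio.

    Assumption: LOG is base 10. rbs_ratio must be positive for the logarithm.
    """
    if rbs_ratio <= 0:
        raise ValueError("rbs_ratio must be positive")
    mrna_term = (w_mrna / norm_mrna) * mrna_ratio
    rbs_term = (w_rbs / norm_rbs) * (1 + math.log10(rbs_ratio))
    return mrna_term + rbs_term
```

Context-specific weighting (for example, different weights near the 5' end of a gene) could be expressed by passing different `w_mrna` and `w_rbs` values per region.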
In additional embodiments, the genome design module 201 may follow an
automated
computational design pipeline as illustrated in FIG. 8. For example, the
genome design
module 201 may first implement forbidden allele replacement based on the list
of alleles 204
and the genome template 202 in all instances of gene overlaps while accounting
for biological
constraints 205. The genome design module 201 may then apply remaining
forbidden allele
replacement in each gene independently while accounting for biological
constraints 205. For
example, for each allele that is to be replaced, there may be multiple choices
for synonymous
allele substitutions. A design may be minimally disruptive with respect to
design rules or
constraints or conditions or parameters or features that quantify deviation
from the wild-type
sequence (e.g. secondary structure, GC content, RBS motif strength).
However, in some embodiments, an exhaustive comparison of all possible allele
or codon
modifications may be computationally expensive, making iteration slow. For
example, in the
case of recoding E. coli, there are about 17 forbidden codons per gene and 4
possible
synonymous swaps per codon, resulting in 4^17 (about 1.7 x 10^10) possible sequences to evaluate
per gene. Thus,
the genome design module 201 may identify a solution that satisfies each rule
or constraint or
condition or parameter or feature within a threshold, rather than identifying
a global
minimum. To identify a satisfactory solution, the genome design module 201 may
identify
and represent a genome-recoding problem as a graph that is traversed using an
algorithm
based on depth first search. In some embodiments, the algorithm may be
referred to as a
graph search-based codon replacement algorithm.
For example, nodes in the graph may represent a unique alternative gene
sequence. Sibling
nodes in the graph may differ in the value of a specific codon. Children of a
node may
represent all possible changes to the next downstream codon. Each node may be
assigned a
score corresponding to each of the rules, including GC content, secondary
structure, and
codon rarity deviation. Each score may be a quantitative measure of deviation
away from
wild-type sequence in the respective score profile for a base pair window
(e.g., a 40 base pair
window or a window of any other number of base pairs) centered at a specific
codon. A node
may be expanded and pursued as long as all scores are below the thresholds for
their
respective profiles. If all nodes at a level violate the threshold, the
algorithm (e.g.,
implemented by the genome design module 201) may backtrack to an earlier node
and choose
a different branch. If the algorithm is unable to find a solution for a
particular gene, the
threshold constraints may be modified, and a search may be restarted. In some
embodiments,
the graph search-based algorithm may also be applied in allele replacement for
genome
design.
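The graph search described above can be sketched as a depth-first search with backtracking. This is a simplified illustration, not the disclosed implementation: it scores only GC content deviation (secondary structure and codon rarity are omitted), it checks only the window centered on the codon being changed, and the threshold values are arbitrary assumptions. The synonym table lists the standard-code alternatives for the seven forbidden codons, written as DNA.

```python
from typing import Optional

SYNONYMS = {
    "AGA": ["CGT", "CGC", "CGA", "CGG"], "AGG": ["CGT", "CGC", "CGA", "CGG"],
    "AGC": ["TCT", "TCC", "TCA", "TCG"], "AGT": ["TCT", "TCC", "TCA", "TCG"],
    "TTG": ["CTT", "CTC", "CTA", "CTG"], "TTA": ["CTT", "CTC", "CTA", "CTG"],
    "TAG": ["TAA", "TGA"],
}


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_deviation(wild: str, design: str, codon_index: int, width: int = 40) -> float:
    """GC deviation from wild type in a window centered on the given codon."""
    center = codon_index * 3 + 1
    lo, hi = max(0, center - width // 2), min(len(wild), center + width // 2)
    return abs(gc_fraction(design[lo:hi]) - gc_fraction(wild[lo:hi]))


def recode(wild: str, max_gc_dev: float = 0.05, width: int = 40) -> Optional[str]:
    """Depth-first search over synonymous swaps; returns None if no path fits."""
    if len(wild) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [wild[i:i + 3] for i in range(0, len(wild), 3)]
    targets = [i for i, c in enumerate(codons) if c in SYNONYMS]

    def search(k: int, current: list[str]) -> Optional[str]:
        if k == len(targets):
            return "".join(current)
        idx = targets[k]
        for alt in SYNONYMS[codons[idx]]:          # sibling nodes
            candidate = current[:idx] + [alt] + current[idx + 1:]
            if gc_deviation(wild, "".join(candidate), idx, width) <= max_gc_dev:
                found = search(k + 1, candidate)   # expand to the next codon
                if found is not None:
                    return found
        return None                                # all siblings failed: backtrack

    return search(0, codons)


def recode_with_relaxation(wild: str, thresholds=(0.0, 0.05, 0.2)):
    """Restart the search with looser thresholds until a solution is found."""
    for t in thresholds:
        result = recode(wild, max_gc_dev=t)
        if result is not None:
            return result, t
    return None
```

The search stops at the first sequence that satisfies every threshold, which avoids enumerating the full 4^17 space for a typical gene.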
After the graph search-based codon (or allele) selection, the genome design
module 201 may
apply technical rules or constraints or conditions or parameters or features
considering
synthesis and assembly constraints for genome design. For example, the genome
design
module 201 may further modify the genome template 202 using the synthesis
constraints 206,
in order to satisfy DNA vendor constraints, such as by removing specific
restriction enzyme
sites and homopolymer sequences, and balancing GC content. Finally, the genome
design
module 201 may partition the modified genome into segments of a predefined
size (e.g.,
segments of any number of bases). For example, the genome design module 201
may first
partition the modified genome into ~50 kb segments and then partition each
segment into 2-4
kb synthesis units or fragments.
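The two-level partitioning can be sketched as below. The 3 kb fragment size is an assumed value inside the stated 2-4 kb range, and the sketch ignores practical concerns such as fragment overlaps for assembly or avoiding cuts inside features; a trailing fragment may also fall below 2 kb.

```python
def partition(length: int, size: int) -> list[tuple[int, int]]:
    """Split [0, length) into consecutive half-open intervals of at most `size`."""
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def two_level_partition(genome_length: int, segment_size: int = 50_000,
                        fragment_size: int = 3_000) -> dict:
    """Map each ~50 kb segment to the synthesis fragments it contains."""
    plan = {}
    for seg_start, seg_end in partition(genome_length, segment_size):
        plan[(seg_start, seg_end)] = [
            (seg_start + a, seg_start + b)
            for a, b in partition(seg_end - seg_start, fragment_size)
        ]
    return plan
```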
In additional embodiments, the genome design module 201 may also allow users
to provide a
list of manually-specified modifications for a genome. In some embodiments,
these
manually-specified modifications (which may be referred to as miscellaneous
design notes)
may include solutions from empirical validation or special cases for which
generalized rules
or constraints or conditions or parameters or features have not yet been
implemented. For
example, in the case of recoding E. coli, the UUG codon, which encodes leucine
using
tRNALeu, was chosen as one of the seven codons for replacement throughout
protein coding
genes. However, when the same codon (UUG) occurs as a translational start
codon, it is
decoded by tRNAfMet, and does not need to be replaced. Thus, a miscellaneous
design note
was added not to replace these start codons in order to minimize perturbation
of gene
expression level. The miscellaneous design note may be implemented in the
software in order
to facilitate automated allele replacement. In another miscellaneous design
note, manual
substitutions were designated for AGR codons in essential genes based on
previous empirical
testing. In yet another miscellaneous design note, codons overlapping
selenocysteine
insertion sequence (SECIS) sites were manually recoded in the following genes:
fdhF, fdnG,
and fdoG.
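The start-codon design note could be encoded in software roughly as follows. This is a hedged sketch: the function name, the `keep_start_codon` flag, and the `manual_skips` argument (codon indices handled by hand, such as codons overlapping SECIS sites) are hypothetical and are not named in the disclosure. Codons are written as DNA, so UUG appears as TTG.

```python
FORBIDDEN = {"AGA", "AGG", "AGC", "AGT", "TTG", "TTA", "TAG"}


def codons_to_replace(cds: str, keep_start_codon: bool = True,
                      manual_skips: frozenset = frozenset()) -> list[int]:
    """Return codon indices that automated replacement should change.

    The first codon is skipped when keep_start_codon is set, because a TTG
    start codon is decoded by the initiator tRNA rather than a leucine tRNA.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return [
        i for i, codon in enumerate(codons)
        if codon in FORBIDDEN
        and not (keep_start_codon and i == 0)
        and i not in manual_skips
    ]
```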
The genome design module 201 may ultimately generate a plurality of
alternative gene
sequences (each comprising a different codon or allele choice) and select at
least one
alternative gene sequence as the genome design based on weighted scoring. The
genome
design module 201 may output a final genome design 210 which may comprise a
file (e.g., a
GenBank file) of the final genome design. In some cases, the genome design
module 201
may identify synthesizable DNA by dividing the genome design 210 into
contiguous
segments, in which each segment is composed of a predetermined number of
bases. For
example, the genome design module 201 may also generate a list of synthesis-
compatible 2-4
kilobase (kb) fragments, which may be synthesized and tested. Furthermore, one
or more
rules or constraints or conditions or parameters or features for the
biological constraints 205
and synthesis constraints 206 may be updated based on empirical testing resulting from the
final genome
design 210.
In additional embodiments, the final genome design may be based on one of: a
genetic code
with minor modifications from a canonical genetic code, a radically redefined
genetic code,
a novel genetic code, or a genetic code in which codons map to non-standard
amino acids
(nsAAs).
FIG. 3 illustrates a flow diagram of an example method in accordance with
aspects of the
present disclosure. In particular, FIG. 3 illustrates example method steps for
designing
genomes based on applying rules or constraints or conditions or parameters or
features for
biological constraints and synthesis constraints and scoring designs. The
steps of FIG. 3 may
be performed by a computing platform, such as by at least one of a genome
design module
101, genome design module 201, scoring sub-module 208, or the like. As a
result of the
method of FIG. 3, a genome design may be selected and output as a final
design.
The method of FIG. 3 may begin with a step 302 of a computing platform
receiving data for a
known genome and a list of alleles to be replaced in the known genome. For
example, the
genome design module 201 may receive a genome template 202 (e.g., comprising a
known
genome reference sequence) and a list of alleles 204 as inputs. At step 304,
the computing
platform may identify occurrences of each allele in the known genome based on
the list of
alleles. For example, the genome design module 201 may find all the alleles
(e.g., forbidden
codons) that are to be replaced in the genome sequence 202. At step 306, the
computing
platform may remove the occurrences of each allele from the known genome. For
example,
the genome design module 201 may apply allele replacement or removal in all
occurrences in
the known genome 202. In some embodiments, the genome design module 201 may
apply
forbidden codon replacement or removal in the known genome 202.
At step 308, the computing platform may determine a plurality of allele
choices with which to
replace occurrences of each allele in the known genome. For example, the
genome design
module 201 may identify that there are several synonymous alleles that may
be utilized to
replace each occurrence of each allele in the known genome 202. In alternative
arrangements,
steps 306 and 308 of the method may be combined as one step performed by
the
genome design module 201, in which the genome design module 201 may identify
alleles to
remove from the known genome and determine a plurality of allele choices with
which to
replace occurrences of each allele.
At step 310, the computing platform may generate a plurality of alternative
gene sequences
for a genome design based on the known genome. For example, the genome design
module
201 may generate a plurality of alternative gene sequences, in which each
alternative gene
sequence includes a different allele choice from the plurality of synonymous
allele choices.
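Steps 308 and 310 can be illustrated by enumerating every combination of synonymous choices for a short coding sequence. This exhaustive enumeration is only practical for small examples; the synonym table is the standard-code set of alternatives for the seven forbidden codons (as DNA), and the function name is an assumption made for this sketch.

```python
from itertools import product
from typing import Iterator

SYNONYMS = {
    "AGA": ["CGT", "CGC", "CGA", "CGG"], "AGG": ["CGT", "CGC", "CGA", "CGG"],
    "AGC": ["TCT", "TCC", "TCA", "TCG"], "AGT": ["TCT", "TCC", "TCA", "TCG"],
    "TTG": ["CTT", "CTC", "CTA", "CTG"], "TTA": ["CTT", "CTC", "CTA", "CTG"],
    "TAG": ["TAA", "TGA"],
}


def alternative_sequences(cds: str) -> Iterator[str]:
    """Yield every sequence obtained by swapping each forbidden codon
    for one of its synonymous alternatives; other codons are kept."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    options = [SYNONYMS.get(codon, [codon]) for codon in codons]
    for combination in product(*options):
        yield "".join(combination)
```

Each yielded sequence would then be scored per rule in step 312 and combined into a weighted score in step 314.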
At step 312, the computing platform may apply a plurality of rules or
constraints or
conditions or parameters or features to each alternative gene sequence by
assigning a score
for each rule or constraint or condition or parameter or feature in each
alternative gene
sequence, resulting in scores for the plurality of rules or constraints or
conditions or
parameters or features applied to each alternative gene sequence. For example,
the genome
design module 201 or the scoring sub-module 208 may utilize the one or more
rules or
constraints or conditions or parameters or features for the biological
constraints 205 and
synthesis constraints 206 to calculate scores for each rule or constraint or
condition or
parameter or feature with respect to each allele choice. That is, the scoring
sub-module 208
may calculate a score for each rule or constraint or condition or parameter or
feature, including for
preserving coding mRNA secondary structure, preserving ribosome binding site
motifs,
preserving internal ribosome pausing site motifs, and the like. Each
alternative gene sequence
(comprising a different allele choice) may have a score calculated for each of
the rules or
constraints or conditions or parameters or features.
At step 314, the computing platform may score each alternative gene sequence
based on a
weighted combination of the scores for the plurality of rules or constraints
or conditions or
parameters or features. For example, the genome design module 201 may
implement scoring
for each alternative gene sequence as a weighted combination of scores from
the specific
rules or constraints or conditions or parameters or features. At step 316, the
computing
platform may select at least one alternative gene sequence as the genome
design based on the
weighted scoring. For example, the genome design module 201 may select one or
more
alternative gene sequences as the final genome design 210 based on identifying
which
alternative gene sequences comprise a weighted score above a predefined
threshold. In some
cases, after selection, the genome design module 201 may output the final
genome design 210
as a GenBank file which may be utilized for synthesis and testing. In some
embodiments,
after identifying which alternative gene sequences comprise a weighted score
above a
predefined threshold, the identified alternative gene sequences may be
empirically tested
individually or as a library (e.g., a mixture of sequences). In additional
embodiments, the
genome design module 201 may update one or more rules or constraints or
conditions or
parameters or features in the plurality of rules or constraints or conditions
or parameters or
features based on comparing rule predictions to empirically observed
viability. For example,
the final genome design 210 may be synthesized and tested for viability, and
results from
testing the synthesized final genome design 210 (along with results from other
designs) may
be used to update and derive new rules or constraints or conditions or
parameters or features
for future genome design.
In additional embodiments, one or more rules or constraints or conditions or
parameters or
features in genome design may be updated, such as by utilizing a computing
platform (e.g.,
computing device 100 comprising the genome design module 101 or genome design
module
201). First, one or more features of a genome design may be introduced into at
least one cell.
In some embodiments, one or more features of the genome design may be
introduced into the
at least one cell by using DNA cleavage to select against a wild-type genotype
and/or
facilitate homologous recombination. Further examples for introducing features
into a cell
may include using CRISPR/Cas, transcription activator-like effector nucleases
(TALENs),
zinc-finger nucleases (ZFNs), meganucleases, restriction endonucleases, or the
like.
In other embodiments, one or more features of the genome design may be
introduced into the
at least one cell by using recombinases/integrases. Additional examples for
introducing
features into a cell may include using multiplex automated genome engineering
(MAGE),
lambda red-recombineering, site-specific recombinases/integrases (e.g., Cre,
PhiC31, lambda
integrase, Flp, etc.), recombinase-mediated cassette exchange (RMCE), or the
like. In other
embodiments, introducing one or more features of the genome design into the at
least one cell
may further include synthesizing a partial or whole genome based on the genome
design.
Additionally, in some embodiments, the one or more features may be tested by a
growth
assay using a kinetic plate reader. In other embodiments, the one or more
features may be
tested by an assay to test protein production. In yet additional embodiments,
the one or more
features may be tested by sequencing representative portions of the cell
population at
predetermined time points. For example, next-generation sequencing (NGS) may
be used to
monitor which genotypes become enriched or depleted in the population, which
may be
interpreted as relative fitness information.
The one or more features that have been introduced into the at least one cell
may be tested by
an assay in order to identify genome viability and evaluate the phenotype of
the one or more
features introduced into the at least one cell. In some embodiments, the one
or more features
may be tested on a vector (e.g., plasmid, cosmid, phagemid, bacteriophage, or
artificial
chromosome) or integrated into a chromosome. Based on the testing, it may be
determined
that the one or more features introduced into the at least one cell are
expected to be viable or
expected to fail according to one or more predefined rules or constraints or
conditions or
parameters or features for the genome design. The predefined rules or
constraints or
conditions or parameters or features for genome design may ultimately be
updated based on
the determination. In some embodiments, the one or more predefined rules or
constraints or
conditions or parameters or features for genome design may comprise one or
more
phenotypic and genotypic parameters.
In additional embodiments, the computing platform may update the predefined
rules or
constraints or conditions or parameters or features for genome design further
based on
statistical techniques and machine-learning algorithms. For example, the
computing platform
may update and/or automatically infer new rules or constraints or conditions
or parameters or
features using representation learning algorithms including, but not limited
to, deep learning.
Other machine learning techniques may be used for updating and learning new
rules or
constraints or conditions or parameters or features, including supervised or
unsupervised
learning, semi-supervised learning, reinforcement learning, and deep learning.
These may
include specific techniques, such as convolutional neural networks, random
forests, hidden
Markov models, autoencoders, Boltzmann machines, and the like. In another
example, a user
may utilize the computing platform to manually define new rules or constraints
or conditions
or parameters or features based on analysis.
In additional embodiments, genome designs may be generated by a computing
platform (e.g.,
computing device 100 comprising the genome design module 101 or genome design
module
201) and may be tested by the computing platform by determining one or more
features in the
genome design that fail a set of predefined rules or constraints or conditions
or parameters or
features. In some embodiments, the set of predefined rules or constraints or
conditions or
parameters or features may comprise one or more phenotypic and genotypic
parameters. The
computing platform may obtain or access a sample of a known genome sequence
(e.g., a
known genome sequence that the genome design is based on), and the computing
platform may
further analyze the sample of the known genome sequence. In some embodiments,
the
computing platform may determine the one or more features in the genome design that
fail a set of
predefined rules or constraints or conditions or parameters or features by
testing individual
mutations in the genome design in parallel. In other embodiments, the
computing may
determine the one or more features in the genome design that fail a set of
predefined rules or
constraints or conditions or parameters or features by testing individual
mutations in the
genome design in multiplex.
The computing platform may predict modifications to the genome design that may
be
implemented in order to satisfy a predetermined design objective and to
increase probability
of viability. For example, a predetermined design objective may comprise one
or more
features of the natural genome that may need to be changed. A natural genome
sequence may
be viable, whereas a recoded genome sequence or genome design may need to be
tested in
order to determine if the design is still viable. After predicting the
modifications, the
computing platform may test the predicted modifications to generate an
improved genome
design. In some embodiments, the predicted modifications for the genome design
may be
tested as a mixture. In other embodiments, the predicted modifications for the
genome design
may be tested using genetic diversity and selection.
The above disclosure generally describes the present invention. All references
disclosed
herein are expressly incorporated by reference. A more complete understanding
can be
obtained by reference to the following specific examples which are provided
herein for
purposes of illustration only, and are not intended to limit the scope of the
invention.
EXAMPLES
The following examples are given for the purpose of illustrating various
embodiments of the
disclosure and are not meant to limit the present disclosure in any fashion.
The present
examples, along with the methods described herein are presently representative
of preferred
embodiments, are exemplary, and are not intended as limitations on the scope
of the
disclosure. Changes therein and other uses which are encompassed within the
spirit of the
disclosure as defined by the scope of the claims will occur to those skilled
in the art. Other
equivalent embodiments will be apparent in view of the present disclosure,
figures and
accompanying claims.
EXAMPLE I
Design, Synthesis, and Testing of a 57-Codon Genome
According to some aspects, methods are described herein for design and
construction of a
radically recoded Escherichia coli. Recoding, the re-purposing of genetic
codons, is a
powerful approach to enhance genomes with functions not commonly found in
nature. The
degeneracy of the canonical genetic code allows the same amino acid to be
encoded by
multiple synonymous codons. The near universality of a 64-codon code among
natural
organisms (Crick, 1963) makes codon replacement a powerful tool for genetic
isolation of
synthetic organisms. For example, while most organisms follow a common 64-
codon
template for translation of cellular proteins, deviations from this universal
code found in
several prokaryotic and eukaryotic genomes (Ambrogelly et al. 2007, Kano et
al., 1991, Oba
et al., 1991, Macino et al., 1979, Ling et al., 2015) have spurred the
exploration of synthetic
organisms with expanded genetic codes.
Whole-genome synonymous codon replacement provides a mechanism to construct
unique
organisms exhibiting genetic isolation and expanded biological functions. Once
a codon is
synonymously replaced genome-wide and its cognate tRNA is eliminated, the
genomically
recoded organism (GRO) may no longer translate the missing codon (Lajoie et
al., 2013b).
Therefore, genetic isolation is achieved since DNA acquired from natural
viruses, plasmids
and other organisms would be improperly translated, rendering the recoded
strain insensitive
to infection by viruses and horizontal gene transfer (FIG. 4).
For example, FIG. 4 illustrates, for a panel of coliphages, the percent of
bacteriophage genes
that are predicted to be properly translated in a recoded E. coli strain with an
increasing number
of unassigned missing codons (e.g., no cognate translation). In this example,
1 codon = UAG;
3 codons = UAG, AGG, and AGA; and 7 codons = UAG, AGG, AGA, AGC, AGU, UUG,
and UUA.
The gene translation percentage may be computed by the following equation:
Gene translation % = (Total # of genes in given viral genome - # of viral genes containing forbidden codons) / (Total # of genes in given viral genome)
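The gene translation calculation can be sketched as follows, assuming each viral gene is supplied as an in-frame coding sequence and that the fraction is reported as a percentage. The three codon sets match the 1-, 3-, and 7-codon cases listed above; the function name and input format are illustrative assumptions.

```python
FORBIDDEN_SETS = {
    1: {"UAG"},
    3: {"UAG", "AGG", "AGA"},
    7: {"UAG", "AGG", "AGA", "AGC", "AGU", "UUG", "UUA"},
}


def gene_translation_percent(genes: list[str], forbidden: set[str]) -> float:
    """Percent of genes that contain no forbidden codon in frame."""
    if not genes:
        raise ValueError("at least one gene is required")
    # Accept DNA or RNA input by normalizing to RNA letters.
    normalized = [g.upper().replace("T", "U") for g in genes]
    blocked = sum(
        1 for g in normalized
        if any(g[i:i + 3] in forbidden for i in range(0, len(g) - 2, 3))
    )
    return 100.0 * (len(genes) - blocked) / len(genes)
```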
Furthermore, proteins with novel chemical properties may be explored by
reassigning
replaced codons to incorporate non-standard amino acids (nsAAs) functioning as
chemical
handles for bioorthogonal reactivity, photoresponsive elements, or biophysical
probes (Liu et
al., 2010). Codon reassignment has also made it possible to establish
metabolic dependence
on nsAAs that do not naturally exist in the environment, enhancing
biocontainment of GROs
which may be a major consideration in environmental, industrial and medical
applications
(Marliere, 2009, Mandell et al., 2015, Rovner et al., 2015). In some
embodiments, non-
standard amino acids (nsAAs) may comprise any amino acid other than the 20
canonical
protein coding amino acids. In other words, nsAAs may include any amino acid
incorporated
using one or more codons whose assignment differs from those of a given
natural organism.
Described herein are methods for multiple codon replacements genome-wide, with
the aim of
producing a virus-resistant, biocontained organism relevant for industrial
applications. A
computational design is presented, along with experimental testing of 2.5 Mb
(63%) of an E.
coli genome in which all 62,214 instances of seven different codons
(corresponding to 5.4%
of all E. coli codons) have been synonymously replaced (FIG. 5A-5C). The new
recoded
genome may be referred to as rE.coli-57 as described herein and is composed of
57 of
the canonical 64 codons when assembled (FIG. 6). While several synthetic genomes
have been
previously reported (Blight et al., 2000, Cello et al., 2002, Smith et al.,
2003, Chan et al.,
2005, Gibson et al., 2008, Gibson et al., 2010, Annaluru et al., 2014), a
functionally altered
synthetic genome of this scale has not yet been explored (FIG. 5C).
In some cases, alterations of codon usage may affect gene expression and
cellular fitness at
multiple levels from translation initiation to protein folding (Kudla et al.,
2009, Tuller et al.,
2010, Plotkin et al., 2011, Goodman et al., 2013, Zhou et al., 2013, Quax et
al., 2015, Boel et
al., 2016). Yet, parsing the individual impact of codon choices may remain
difficult,
imposing a barrier to designing new genomes. The present disclosure provides
prediction
tools and efficient technologies to rapidly prototype synthetic genomes.
In order to address the unprecedented scale and complexity of genome
engineering goals,
computational tools, a cost-effective de novo synthesis strategy, and a
comprehensive
experimental validation plan are described herein. For example, the number of
modifications
required to replace all instances of seven codons may be far beyond the
current capabilities of
single-codon editing strategies previously used for genome-wide replacement of
the UAG
codon (Lajoie et al., 2013b, Isaacs et al., 2011). Although it may be possible
to
simultaneously edit multiple alleles using MAGE (Wang et al., 2009) or Cas9
(Esvelt et al.,
2013), these strategies may involve extensive screening using numerous oligos
and RNA
guides and may likely introduce off-target mutations (Wang et al., 2009). De
novo synthesis
allows for an almost unlimited number of modifications independent of
biological template.
Moreover, the plummeting costs of DNA synthesis are reducing financial
barriers for
synthesizing entire genomes.
For this example, the following three codons were chosen for replacement: the
UAG stop
codon and the AGA and AGG arginine codons (FIG. 6). These codons were also
among the
rarest codons in the genome, minimizing the number of changes required. The
other codons
were chosen such that their anticodon is not recognized as a tRNA identity
element by
endogenous aminoacyl-tRNA synthetases, so that heterologous tRNAs will not be
mischarged with canonical amino acids upon incorporation of nsAAs. Lastly, to
allow
unambiguous reassignment, codons were chosen whose tRNAs do not overlap with
other
synonymous codons for the same amino acid. Thus, the following seven codons
(termed
'forbidden codons') were targeted for replacement: AGA (Arg), AGG (Arg), AGC
(Ser),
AGU (Ser), UUG (Leu), UUA (Leu) and UAG (Stop) (FIG. 5A-5C, FIG. 6, FIG. 3).
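As a minimal sketch of how the seven forbidden codons listed above might be tallied in a single coding sequence, the following Python example counts in-frame occurrences. The function name and the toy sequence are hypothetical and are not part of the disclosed software; codons are written in DNA form, so U appears as T.

```python
from collections import Counter

# The seven forbidden codons named in the text, written as DNA (U -> T).
FORBIDDEN_CODONS = {"AGA", "AGG", "AGC", "AGT", "TTG", "TTA", "TAG"}


def count_forbidden_codons(cds: str) -> Counter:
    """Count in-frame occurrences of each forbidden codon in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return Counter(c for c in codons if c in FORBIDDEN_CODONS)


# Hypothetical toy gene: ATG AGA TTG CGT AGC TAG
example_counts = count_forbidden_codons("ATGAGATTGCGTAGCTAG")
```

Only codons read in frame are counted, so a forbidden triplet that merely spans two adjacent codons is ignored.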
In order to minimize synthesis costs and improve genome stability, the 57-codon genome
described herein is based on the reduced-genome E. coli strain MDS42 (Posfai
et al., 2006).
The disclosed computational tool automates synonymous replacements for all
occurrences of
the target codons in all protein-coding genes while satisfying biological and
technical
constraints; examples of these constraints are illustrated in FIGS. 8-9 and Tables 1-2.
In particular, amino acid sequences of all coding genes were preserved, and
protein
synthesis levels were maintained by separating overlapping genes carrying
forbidden codons
and by introducing synonymous codons to minimize potential recombination
events (Chan et
al., 2005, Temme et al., 2010). The relative codon usage of the remaining
codons was
conserved to meet translational demand (Yona et al., 2013) and to preserve
characteristics of
the primary nucleotide sequence, including predicted ribosome binding site
(RBS) strength,
mRNA secondary structure folding energy, and GC content (Lajoie et al., 2013b,
Lajoie et
al., 2013a). Finally, adjustments were made to avoid difficult-to-synthesize
sequences from
the final genome design (e.g., removing homopolymers, normalizing regions of
extreme GC
content and reducing repetitive sequences) (FIGS. 9A-9G).
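The synthesis-related adjustments described above (homopolymers and regions of extreme GC content) may be screened for with simple sequence scans. The following Python sketch is illustrative only: the thresholds (homopolymer length of 8, GC fraction outside 0.30 to 0.70, 100-nt windows) and the function names are assumptions, not values taken from the disclosed software.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_homopolymers(seq: str, min_len: int = 8):
    """Return (start, end, base) for each run of a single base >= min_len long."""
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def extreme_gc_windows(seq: str, window: int = 100, step: int = 100,
                       low: float = 0.30, high: float = 0.70):
    """Return (start, gc) for windows whose GC fraction lies outside [low, high]."""
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc < low or gc > high:
            hits.append((start, gc))
    return hits
```

Flagged positions would then be candidates for further synonymous adjustment; how such positions are resolved is not specified by this sketch.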
Overall, forbidden codons were uniformly distributed throughout the genome,
averaging
about 17 codon changes per gene. Essential genes (Yamazaki et al., 2008),
which provide a
stringent test for successful codon replacement, contain about 6.3% of all
forbidden codons
(3,903 of 62,214 codons). Altogether, the recoded genome necessitated a total
of 148,955
changes to remove all instances of forbidden codons and adjust the primary DNA
sequence to
accommodate design constraints.
Once designed, the recoded genome was parsed into 1,256 synthesis-compatible
overlapping
fragments of 2 to 4 kilobases (kb). 87 segments of about 50 kb were
individually assembled
and tested (FIG. 8). Segments of about 50 kb contain a manageable number of
genes,
averaging about 40 total genes and about 3 essential genes per segment.
Additionally, it was
found that 50 kb may be a convenient size for assembly in yeast and shuttling
into E. coli.
Importantly, based on earlier studies (Mandell, D.J. et al., Biocontainment of
genetically
modified organisms by synthetic protein design. Nature. 518, 55-60 (2015).; K.
M. Esvelt et
al., Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat.
Methods. 10,
1116-1121 (2013)) it was estimated that each segment would on average contain
only about
1 potentially lethal recoding exception.
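The parsing of a designed sequence into overlapping, synthesis-compatible fragments may be sketched as follows. This Python example is a simplified illustration that treats the sequence as linear; the fragment length (3,000 nt) and overlap (100 nt) are assumed values, since the overlap length is not stated above, and the function name is hypothetical.

```python
def split_into_fragments(seq_len: int, fragment_len: int = 3000, overlap: int = 100):
    """Return (start, end) coordinates of overlapping fragments covering 0..seq_len.

    Consecutive fragments share `overlap` nucleotides. The final fragment may be
    shorter than `fragment_len`.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be non-negative and smaller than fragment_len")
    fragments = []
    start = 0
    while True:
        end = min(start + fragment_len, seq_len)
        fragments.append((start, end))
        if end == seq_len:
            break
        start = end - overlap  # step back so the next fragment overlaps this one
    return fragments


# Hypothetical 10-kb region split into ~3-kb fragments with 100-nt overlaps.
example_fragments = split_into_fragments(10000)
```

A practical implementation would likely also adjust fragment boundaries to avoid short terminal fragments and difficult sequence at the overlaps.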
FIGS. 10A-10C outline the experimental strategy utilized in this example. In
brief, each
segment was assembled in S. cerevisiae and electroporated directly into E.
coli on a low copy
plasmid. Subsequent deletion of the corresponding chromosomal segment provides
a
stringent test for the function of the recoded genes because errors in
essential genes would be
lethal. Thus far, chromosomal deletions for 2,229 recoded genes across 55
segments have
been performed, accounting for 63% of the entire genome and 53% of essential
genes (FIG.
11). Additionally, all recoded genes in 44 of these 55 segments were found to
complement
wild-type chromosomal genes without requiring any optimization. The growth of
these
strains was assessed, and gene expression was analyzed via RNA-Seq (FIGS. 12A-
12B).
Moreover, the majority of these strains exhibited only marginal fitness
impairment upon
chromosomal deletion (FIG. 12A, FIGS. 13A-13B).
Furthermore, RNA-Seq analysis of 208 recoded genes suggests the majority show
only minor
change in transcription due to codon replacement (FIG. 14A-14B). Only 28 genes
were found
to be significantly differentially expressed (i.e., >2-fold change, p < 0.01)
(27 overexpressed,
1 underexpressed).
Recoded segments that failed to complement the entire wild-type segment (i.e.,
11 of 55
segments) were tested by making small chromosomal deletions of the region
until the causal
gene(s) was localized. Overall, 13 recoded essential genes were found that
failed to support
cell viability due to synonymous codon replacement. In some embodiments, these
may be
referred to as "design exceptions."
Segment 44 was selected as a test case to develop a troubleshooting pipeline
for solving
design exceptions (FIGS. 15A-15B). As shown for gene accD, RBS strength and
mRNA
folding were first analyzed to pinpoint the most probable cause of disruption
in gene
expression (Plotkin et al., 2011, Goodman et al., 2013, Boel et al., 2016).
Then, degenerate
MAGE oligos were used to rapidly prototype viable alternative codons (FIG.
16). For
calculating the mRNA secondary structure score, a sliding window of 40 bp
around the codon
of interest was used. The algorithm was further updated to score mRNA
secondary structure
as a skewed interval that is -30 to +100 nucleotides relative to the codon of
interest. Notably,
for codons in the first 100 nucleotides, the window was centered at the start
of the gene.
Finally, a new recoded sequence was computationally generated using more
stringent mRNA
and RBS scoring parameters (FIGS. 15A-15B, FIG. 17) and was introduced into
the recoded
segment via multiple cycles of lambda Red recombineering. Viable clones were
selected by
the subsequent chromosomal deletion.
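The skewed scoring interval described above may be expressed as a small coordinate calculation. The Python sketch below is one possible reading, assuming that "centered at the start of the gene" means the same -30 to +100 interval is anchored at the gene start for codons within the first 100 nucleotides; the function name and default parameters are hypothetical.

```python
def mrna_window(codon_start: int, gene_start: int = 0,
                upstream: int = 30, downstream: int = 100,
                early_region: int = 100):
    """Return (start, end) of the region scored for mRNA secondary structure.

    Coordinates are in the same system as the inputs. A start value below
    gene_start indicates sequence upstream of the gene.
    """
    offset = codon_start - gene_start
    # Assumption: codons in the first `early_region` nt use a window anchored
    # at the gene start rather than at the codon itself.
    anchor = gene_start if offset < early_region else codon_start
    return (anchor - upstream, anchor + downstream)
```

The returned interval would then be passed to a folding-energy calculation, which is outside the scope of this sketch.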
In some cases, all viable clones carried a specific sequence of accD that had
the N-terminal
end of the improved design and the C-terminal end of the initial (lethal)
design, highlighting
the significance of N-terminal optimization for successful synonymous codon
replacement
(Kudla et al., 2009, Goodman et al., 2013). Furthermore, such recombination
events, which
are expected due to the high degree of homology between the two gene versions,
effectively
shuffle the sequences and increase the search space of viable recoded codons.
To further confirm adequate chromosomal expression, the recoded segment was
integrated
into the chromosome using λ-integrase. attP-specific Cas9-mediated DNA
cleavage was then
used to ablate all non-integrated plasmids, leaving a single integration event
per genome. No
fitness changes were observed upon segment integration (FIG. 13A-13B).
Finally, DNA
sequence analysis of all validated strains may suggest some degree of in vivo
accumulation of
mutations, which may be expected during strain engineering. Yet, to achieve
complete
genome recoding, non-lethal reversions and silent mutations may be corrected
in the final
strain using MAGE.
According to certain aspects, substantial modifications to both codon usage
and tRNA
anticodons may lead to instability of a reduced genetic code without proper
selection to
prevent codon reversion (Osawa et al., 1989); however, establishing functional
dependence
on the recoded state may both stabilize the modified genome and offer a
stringent
biocontainment mechanism (Marliere, 2009). As an example, a biocontained
strain was
developed in which all UAG codons were removed and two essential genes (adk
and tyrS)
were altered so that the strain required nsAAs to remain viable (Mandell et
al., 2015). In
order to determine whether the final rE.coli-57 strain will support a similar
biocontainment
mechanism, the 57-codon versions of both adk and tyrS were confirmed to be
functionally
active in vivo. Moreover, it was found that the recoded and nsAA-dependent adk
gene has the
same fitness and extremely low escape rates reported for the original strain
(FIG. 18A-18B).
Even after all instances of forbidden codons are removed from the genome, the
genetic code
may remain unchanged until the genes for five tRNAs (argU, argW, serV, leuX,
leuZ) and
one release factor (prfA) are removed. Once rE.coli-57 is fully recoded and
these tRNAs are
removed, the strain may be tested for novel properties such as resistance to
viruses and
horizontal gene transfer. Additionally, orthogonal aminoacyl-tRNA
synthetase/tRNA pairs
may be introduced to expand the genetic code by as many as 4 nsAAs.
Ultimately, the hierarchical, in vivo validation approach supported by robust
design software,
as described herein, may be utilized for large-scale synthetic genome
construction and to
radically change the genetic code. Genetically isolated and recoded genomes
may expand
synthetic functionality of living cells, offering a unique chassis for broad
applications in
biotechnology.
DNA synthesis
DNA was synthesized by industrial partners Gen9, SGI-DNA, Twist Biosciences,
Genewiz,
and IDT DNA technologies. The synthesis pipeline was developed primarily with
the aim of
reducing synthesis cost and turnaround time, considering constraints of
synthesis error rate
and QC. Gen9 synthesized the majority of DNA, providing 3,960 kb as fragments
ranging in
size from 1.2 - 4.2 kb. Additional synthesis was provided by Twist Biosciences
(30 kb in
fragments ranging 1.4 - 2.0 kb), IDT (27 kb in fragments ranging 1.0 - 1.7 kb),
and Genewiz
(26 kb in fragments ranging 12.4 - 3.0 kb). An additional 328 kb (SGI-DNA), 36
kb (Twist),
and 6 kb (Gen9) were synthesized, but were not used in the final genome
segment syntheses.
PCR amplification of synthetic DNA
All synthetic DNA was PCR amplified and purified prior to assembly. 30 µL of
PCR reaction
was prepared as follows: 1 µL of diluted template DNA (1 µL synthetic template DNA
(synDNA) ranging 1 to 5 ng/µL, diluted in 9 µL TE buffer), 2 µL of primer mix (10 µM
each
primer, mixed in 50 µL of TE buffer), 15 µL of 2x SeqAmp DNA polymerase (Clontech
Laboratories, Inc.), and 15 µL of PCR grade water. PCR cycles: 95 °C - 1 minute,
98 °C - 10
seconds, 60 °C - 15 seconds, 68 °C - 2 minutes, 35 cycles. 1% agarose gel was
used to analyze
1 µL of PCR product. Optimization of unsuccessful PCR was done using 2x KAPA-HiFi
DNA polymerase (Kapa Biosystems). 30 µL of PCR reaction was as follows: 1 µL of
diluted
template DNA (as above), 2 µL of primer mix (as above), 15 µL of 2x KAPA-HiFi, and
12 µL
of PCR grade water. PCR cycles: 95 °C - 1 minute, 98 °C - 20 seconds, 60 °C - 15
seconds,
72 °C - 2 minutes, for 30 or 35 cycles. PCR products were gel purified using 2%
E-gel Ex
(Thermo Fisher Scientific Inc.).
Segment assembly in S. cerevisiae
For segment assembly, GeneArt High-Order Genetic Assembly System (Life
Technologies)
was used with modifications. The vector pYES1L was modified to include
restriction sites
EcoRI and BamHI used for linearization, and a S. cerevisiae uracil selective
marker was
added to the vector backbone (termed `pYES1L-URA'). Vector digestion was
performed
with both enzymes as follows: 5 hours at 37 °C, followed by 20 minutes enzyme
inactivation
at 65 °C and 30 minute End Repair Module (NEB) treatment at 20 °C. Linear vector
was
purified (Zymo DNA Clean & Concentrator) and size verified on DNA gel prior to
use.
Amplified synthetic fragments (400 ng of each) were mixed and purified for each
assembly
reaction (10-15 fragments used for each assembly), then combined with 100 ng of
purified linear
vector pYES1L-URA. Vector/fragment DNA mix was concentrated using SAVANT DNA
120 SpeedVac concentrator (Thermo Fisher Scientific Inc.) to ~10 µL in volume.
Transformation of MaV203 competent cells was performed according to
manufacturer
instructions. Cells were plated on CM glucose media without tryptophan and
incubated at
30 C for 3 days. Colony PCR was used to screen for segment assembly; yeast
colony was
lysed in 15 [IL of 0.02 M NaOH, boiled for 5 minutes at 95 C and kept on ice
for 5 minutes,
followed by dilution with 40 [IL ddH20. 1.5 [IL of the mix was used as
template for
multiplex PCR using KAPA2G multiplex polymerase (KAPA Biosystems) and the
following
PCR conditions: 98 C - 5 minute, 98 C - 30 seconds, 62 C - 30 seconds, 72 C -
30 seconds,
72 C - 5 minutes (32 cycles). Only colonies showing positive PCR were used.
For E. coil
transformation, cells were lysed in 15 [IL 0.02 M NaOH, vortexed with glass
beads for 5
minutes and placed on ice. 1.5 [IL of the lysis mix was added to
electrocompetent TOP10
cells (Thermo Fisher Scientific), immediately electroporated (1.8 kV, 25
Farads, 200 S2),
and recovered for 1 hour at 37 C before plating on spectinomycin selective
plates.
E. coli Methods - Strains & Culture
TOP10 electrocompetent E. coli (Thermo Fisher Scientific) were used for the
entire process
for all segments except segments 19, 22, 23, 43, 44 and 47, which were performed in
BW38028
(Conway et al., 2014). EcM2.1 naïve strains were used for troubleshooting
(EcM2.1 is a
strain optimized for MAGE: Escherichia coli MG1655 mutS mut dnaG Q576A exoX
mut
xonA mut xseA mut 1255700::tolQRA Δ(ybhB-bioAB)::[λcI857 N(cro-ea59)::tetR-bla])
(Gregg et al., 2014).
Liquid culture medium consisted of the Lennox formulation of Lysogeny broth
(LBL; 1%
w/v bacto tryptone, 0.5% w/v yeast extract, 0.5% w/v sodium chloride) with
appropriate
selective agents: spectinomycin (95 µg/mL), chloramphenicol (50 µg/mL),
kanamycin (30
µg/mL), carbenicillin (50 µg/mL), zeocin (10 µg/mL). Solid culture medium
consisted of
LBL autoclaved with 1.5% w/v Bacto agar (Thermo Fisher Scientific), containing
the same
concentrations of antibiotics as necessary.
Plasmid transformation, Lambda Red Recombinations, MAGE.
TOP10 and BW38028 (Conway et al., 2014) cells transformed with pYES1L-URA
plasmid
were the subject of all pipeline strain engineering. The average copy number
for recoded
segment on vector pYES1L-URA was found to be 1.8 plasmids/genome.
Knockout of the homologous chromosomal non-recoded segment sequence is
achieved by
lambda Red recombineering specifically targeted to the genomic locus. 50 bp
homology arms
of the kanamycin cassette deletion are targeted to both sides of the genomic
segment, which
are different in sequence than the two sides of the plasmid carrying recoded
segment.
Therefore, the cassette specifically replaces the genomic segment.
All cells were transformed with pKD78 plasmid (Datsenko et al., 2000) to
introduce the
lambda Red recombineering machinery. Recombinase expression was induced for
2 hrs in
arabinose (2 µg/mL) followed by DNA transformation, using either double-stranded PCR
products or MAGE oligonucleotides. Notably, all kanamycin cassette deletions
were
performed with 100 ng double-stranded PCR products. Each recombination was
paired with a
negative control (deionized water) to monitor kanamycin selection performance.
Other
recombineering experiments were carried out as described previously (Wang et
al., 2009),
and total oligo pool was adjusted to a maximum of 5 µM. After 3 hrs of recovery
at 34 °C, the
cells were plated in permissive media (for MAGE) or selective media (e.g.
kanamycin) and
incubated overnight at 34 °C. The amount of cells plated was ~10^3 for MAGE
experiments,
~10^7 for plasmid transformations and ~10^8 for kanamycin cassette deletions.
Resulting
strains were then subjected to verification by PCR.
Oligonucleotides, Polymerase Chain Reaction
A complete table of PCR oligonucleotides and primers can be found in Tables 3
and 4. PCR
products used in recombination or for Sanger sequencing were amplified with
Kapa 2G Fast
polymerase according to manufacturer's standard protocols. Multiplex allele-
specific PCR
(mascPCR) was used for multiplexed genotyping using the KAPA2G Fast Multiplex
PCR
Kit, according to previous methods (Isaacs et al., 2011). Primers for mascPCR
were designed
using an automated software specially built for this purpose. Sanger
sequencing reactions
were carried out through a third party (Genewiz). mascPCR screening was
performed after
the pKD78 transformation, kanamycin deletion, attP-zeocin insertion and
λ-integration steps.
Genome integration of recoded segments
λ-integrase was used for integration of recoded segment plasmid into E. coli
genome
(Haldimann et al., 2001). attP site was added to the segment vector by lambda Red
recombineering, along with zeocin resistance marker. Then, λ-integrase was
heat-induced for
6 hours at 42 °C, and cells were plated on spectinomycin and kanamycin plates
for screening.
PCR screening was performed using attP and attB specific primers (attB-seq-f:
CAG GGA
TGC AAA ATA GTG TTG AG; attB-seq-r: GA GAA GTC CGC GTG AGG; attP-f:
GCGCTAATGCTCTGTTACAG; attP-r: GAAATCAAATAATGATTTTATTTTGACTGA)
as well as allele-specific primers (Table 4) to identify clones with correct
plasmid integration.
Cas9-induced vector elimination
Once integrated, a further validation step was taken to ensure no additional
copies of the
recoded segments remain in the cell. Before chromosomal integration, all
recoded segment
plasmids contain an attP site for λ-integration. Since λ-integration
modifies the attP sequence
upon genome integration into attB site, only non-integrated plasmids carry
intact attP
sequence. Residual copies of the plasmid were eliminated using attP-specific
Cas9-targeting
(FIG. 10C) (Esvelt et al., 2013), such that SpCas9 protein induces double
stranded breaks in
all episomal (non-integrated) segment plasmids. Linearized remaining plasmids
are then
digested, and the resulting strains are plasmid-free.
Specifically, a plasmid containing the SpCas9 protein gene was constructed as
well as a
tracrRNA and a guide RNA directed towards the unmodified attP sequence
(Plasmid details
(DS-SPcas, Addgene plasmid 48645): cloDF13 origin, carb, proC promoter,
SPcas9,
tracrRNA (with native promoter and terminator), J23100 promoter, 1 repeat
(added to
facilitate cloning in a spacer onto the same plasmid)). The guide RNA sequence
cloned in the
spacer is: TCAGCTTTTTTATACTAAGT. Plasmid was transformed and cells were plated
3 hrs after transformation for growth at 37 °C under selection for SpCas9
plasmid
(carbenicillin) (~10^7 cells). Resulting cells were PCR-verified for loss of
all attP sequence.
Presence of the integrated vector carrying recoded segment was confirmed by
mascPCR.
Fitness Measurements
Strain doubling time was calculated as previously described (Lajoie et al.,
2013b). Briefly,
cultures were grown in flat-bottom 96-well plates (150 µL LBL, 34 °C, 300
r.p.m.). Kinetic
growth (OD600) was monitored on a Biotek Eon Microplate reader with orbital
shaking at
365 cpm at 34 °C overnight and at 5-min intervals. Doubling times were
calculated by t = Δt
× ln(2)/m, where Δt = 5 min per time point and m is the maximum slope of
ln(OD600)
calculated by linear regression of a sliding window of 5 contiguous time
points (20 min
intervals). Analysis was performed using a Matlab script.
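The doubling-time calculation described above may be sketched in Python as follows. This is not the Matlab script referred to in the text; it is a minimal re-implementation of the stated formula, assuming OD600 readings taken at a fixed interval, and the function names are hypothetical.

```python
import math


def regression_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def doubling_time(od600, dt_min: float = 5.0, window: int = 5) -> float:
    """Doubling time in minutes: dt * ln(2) / m.

    m is the maximum slope of ln(OD600) per time point, found by linear
    regression over a sliding window of `window` contiguous readings.
    """
    if len(od600) < window:
        raise ValueError("need at least `window` readings")
    log_od = [math.log(v) for v in od600]
    xs = list(range(window))  # time-point index, so the slope is per time point
    m = max(regression_slope(xs, log_od[i:i + window])
            for i in range(len(log_od) - window + 1))
    return dt_min * math.log(2) / m


# Hypothetical culture doubling every 30 min, read every 5 min.
example_od = [0.01 * 2 ** (5 * i / 30) for i in range(20)]
example_doubling_time = doubling_time(example_od)
```

Real growth curves would typically require blank subtraction and exclusion of readings near the detection limit before taking logarithms; those steps are omitted here.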
The average decrease in fitness observed for all 44 segments is 15%
relative to the
parental non-recoded strain fitness. 75% of segments (33 segments) were
observed to have <
20% decrease in fitness relative to wild-type, and only 4% of segments (2
segments) were
observed to have more than 50% decrease in fitness (segments 21, 84), which
may be
referred to as "substantial decrease."
Investigation of severe fitness impairment
A fitness impairing recoded gene was defined when deletion of the corresponding
chromosomal gene resulted in an increased
doubling time relative to the parent. This suggests the recoded gene was not
well expressed.
well expressed.
Impaired genes were located by gradually deleting each chromosomal gene using
lambda Red
recombineering and by measuring doubling times after each deletion (FIG. 12A-
12B). Once
located, a fitness impairing recoded gene is addressed using a troubleshooting
pipeline.
First, the gene was Sanger-sequenced with allele-specific primers which prime
only on the
recoded, not the wild-type sequence. Sequencing results were analyzed to
decide on one of
two troubleshooting routes:
1) Sequencing revealed a mutation causing fitness impairment. Specifically,
these refer to
mutations that are not included in the computational genome design. Those
mutations were
fixed using MAGE.
2) No mutations were identified in the sequence compared to computational
design. The
fitness impairment of the recoded gene was assumed to originate in the recoded
codons.
FIG. 12A-12B (segment 21) illustrates the troubleshooting strategy. Potential
deleterious
codons were identified in both the fitness impairing gene (fabH) and in the
promoter of the
entire operon (3 recoded codons located in upstream gene yceD). MAGE was
performed
(Wang et al., 2009) in a naïve strain (EcM2.1 (Gregg et al., 2014)) with
oligos corresponding
to the original recoded scheme to find fitness impairing codons. After 3
cycles of MAGE,
cells were plated on permissive media (~10^3 cells). 96 clones were screened
with mascPCR
primers targeting the wild-type sequence. The doubling time of clones having
incorporated
recoded codons was measured (~20). No significant fitness impairment was
observed for
codons changed in gene fabH. Thus, the original design changes in the promoter
were
identified as the troublesome change. MAGE was performed in a naïve strain
using
degenerate MAGE oligos. After 3 cycles of MAGE, cells were plated on
permissive media
(~10^3 cells). An alternative recoded design without any forbidden codons was
identified.
Biocontainment assay
The most effective biocontainment strategy involving recoded organisms
(Mandell et al.,
2015) uses 3 genes that are redesigned to accommodate a non-standard amino
acid: the
tyrosyl-tRNA synthetase (tyrS), the adenylate kinase (adk) and the
biphenylalanyl-tRNA
synthetase (bipARS). Confirmation that those redesigned genes are compatible
with the
recoding strategy is critical for assaying the biocontainment potential of the
recoded strain.
The bipARS gene does not contain any of the seven forbidden codons and is thus
considered
compatible and can be integrated into the recoded strain. The gene adk, which
contains only 1
forbidden codon and 2 additional adjustment mutations, was recoded and further
validated in
a bio-contained strain. The gene tyrS, which contains multiple forbidden
codons, was recoded
successfully in the current study, but the recoded tyrS was not yet tested in
the
biocontainment strategy.
Strains used in this study have the following background: All strains were
based on EcNR2
(Escherichia coli MG1655 ΔmutS::cat Δ(ybhB-bioAB)::[λcI857 N(cro-ea59)::tetR-bla]).
Strains C321 [strain 48999 (www.addgene.org/48999)] and C321.ΔA [strain 48998
(www.addgene.org/48998)] are available from Addgene. C321.ΔA.adk.d6 and
C321.ΔA.adk.d6_tyrS.d8_bipARS.d7 are based on (Mandell et al., 2015).
Using MAGE, the 3 codon changes in adk were included in the biocontained
strain
C321.ΔA.adk.d6 (escape rate around 10^-6) and adk.d6_tyrS.d8_bipARS.d7 (most
biocontained strain with escape rate <10^-12). Fitness of the resulting strains
(C321.ΔA.adk.d6.rc and C321.ΔA.adk.d6.rc_tyrS.d8_bipARS.d7) was evaluated as
presented
above. Escape frequencies were measured as previously described (Mandell et
al., 2015).
Briefly, all strains were grown in permissive conditions and harvested in late
exponential
phase. Cells were washed twice in LBL and resuspended in LBL. Viable cfu was
calculated
from the mean and standard error of the mean (s.e.m.) of three technical
replicates of tenfold
serial dilutions on permissive media. Three technical replicates were plated
on non-
permissive media and monitored for 7 days (-107 cells). Two different non-
permissive media
conditions were used: SC, LBL with SDS and chloramphenicol; and SCA, LBL with
SDS,
chloramphenicol and 0.2% arabinose.
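The escape-frequency measurement outlined above may be illustrated with the following Python sketch. The exact procedure is deferred in the text to Mandell et al., 2015, so the formula used here (mean escapee colonies on non-permissive media divided by viable cells plated) is an assumption, as are the function names and the example numbers.

```python
import math
import statistics


def viable_cfu_per_ml(colony_counts, dilution_factor: float, plated_volume_ml: float):
    """Mean and s.e.m. of cfu/mL from replicate colony counts at one dilution."""
    titers = [c * dilution_factor / plated_volume_ml for c in colony_counts]
    mean = statistics.mean(titers)
    sem = statistics.stdev(titers) / math.sqrt(len(titers))
    return mean, sem


def escape_frequency(escapee_counts, cells_plated_per_replicate: float) -> float:
    """Assumed definition: mean escapee colonies per viable cell plated."""
    return statistics.mean(escapee_counts) / cells_plated_per_replicate


# Hypothetical counts from three technical replicates.
example_titer, example_sem = viable_cfu_per_ml([100, 110, 90], 1e5, 0.1)
example_escape = escape_frequency([1, 0, 2], 1e7)
```

When no escapees are observed, the result would be reported as below the detection limit (fewer than one escapee per total cells plated) rather than as zero.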
DNA and RNA Sequencing Methods - Genome Sequencing
Bacterial genomic DNA was purified from 1 mL overnight cultures using the
Illustra Bacteria
GenomicPrep Spin Kit (General Electrics), and libraries were constructed using
the Nextera
DNA library Prep (Illumina), or the NebNext library prep (New England
Biolabs). Libraries
were sequenced using a MiSeq instrument (Illumina) with PE250 V2 kits
(Illumina).
SNP Calling
Two different pipelines were used to analyze genomes. Breseq (Deatherage,
2014) which
supports haploid genome analysis, was used for SNP and short indels calling
for strains with
only one version of the segment (i.e. recoded or non-recoded wild-type).
Breseq was used
with default parameters.
RNAseq methods
RNA was prepared from strains carrying an episomal copy of the recoded segment
and
deletion of the chromosomal segment. RNA was stabilized using RNAprotect
(QIAGEN),
and extracted with miRNeasy kit (QIAGEN). rRNA content was reduced using
riboZero
rRNA Removal Kit (Illumina). RNAseq libraries were constructed using the
Truseq Stranded
mRNA Library Kit (Illumina). Libraries were sequenced using a MiSeq instrument
(Illumina)
with PE150 V2 kits (Illumina).
RNAseq analysis:
FASTQ files obtained from RNAseq experiments were mapped using BWA (Li et al.,
2009a)
using default parameters, and processed (indexing, sorting) using SAMTOOLs (Li
et al.,
2009b) to generate a bam file for each sample. Custom R scripting was used to
analyze the
data. The library GenomicFeatures (Bioconductor) was used to associate reads
to genes, and
the Bioconductor library DESeq (Anders et al., 2010) was used to perform
differential
expression analysis. Genes with an absolute log2 fold change higher than 2,
and adjusted p-
value smaller than 0.01 were classified as differentially expressed genes.
Specifically,
partially recoded strains and TOP10 control were individually analyzed by RNA-
Seq. The
expression of each gene was then compared using DESeq2 (Anders et al., 2010)
in each
sample (recoded or non recoded) to the expression of the same gene in every
other sample (5
independent segments) to get a representative range of gene expression across
all samples.
For example, expression level for gene folC in segment 44 was measured in
recoded segment
44 (only recoded copy), in TOP10 (only wild-type copy) and in all other
partially recoded
strains (where segment 44 is not recoded, e.g. only wild-type copy of gene
folC).
EXAMPLE II
Rules for Codon Choice - Editing Rare Arginine Codons in E. coli
According to some aspects, methods are described herein for empirical
validation and
updating of rules or constraints or conditions or parameters or features for
genome design. In
particular, the rare arginine codons AGA and AGG (AGR) present a case study in
codon
choice, with AGRs encoding important transcriptional and translational
properties distinct
from the other synonymous alternatives (CGN). A strain of Escherichia coli has
been created
in which all 123 instances of AGR codons have been removed from all essential
genes. 110
AGR codons were replaced with the synonymous CGU, whereas the remaining 13
AGRs
necessitated diversification to identify viable alternatives. Successful
replacement codons
tended to conserve local ribosomal binding site-like motifs and local mRNA
secondary
structure, sometimes at the expense of amino acid identity. Based on these
observations,
metrics were empirically defined for a multi-dimensional 'safe replacement
zone' (SRZ)
within which alternative codons may be more likely to be viable. To further
evaluate
synonymous and non-synonymous alternatives to essential AGRs, a CRISPR/Cas9-
based
method was implemented to deplete a diversified population of a wild type
allele, in which
the method allowed for a comprehensive evaluation of the fitness impact of all
64 codon
alternatives. Using this method, relevance of the SRZ was confirmed by
tracking codon
fitness over time in 14 different genes. It was found that codons that fall
outside the SRZ may
be rapidly depleted from a growing population.
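Tracking codon fitness over time, as described above, may be approximated by comparing the relative frequency of each codon alternative between two time points. The Python sketch below is a simplified illustration: the use of sequencing read counts as input, the pseudocount, and the function names are assumptions and do not reproduce the analysis actually performed.

```python
import math


def codon_frequencies(counts):
    """Convert {codon: count} into {codon: relative frequency}."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def log2_enrichment(counts_start, counts_end, pseudocount: int = 1):
    """log2 change in relative frequency of each codon between two time points.

    A pseudocount is added to every codon so that fully depleted codons
    yield a finite (strongly negative) value.
    """
    codons = set(counts_start) | set(counts_end)
    start = {c: counts_start.get(c, 0) + pseudocount for c in codons}
    end = {c: counts_end.get(c, 0) + pseudocount for c in codons}
    f_start, f_end = codon_frequencies(start), codon_frequencies(end)
    return {c: math.log2(f_end[c] / f_start[c]) for c in codons}


# Hypothetical read counts for two codon alternatives at one position.
example_enrichment = log2_enrichment({"CGT": 99, "CTA": 99}, {"CGT": 149, "CTA": 49})
```

Under this sketch, a codon falling outside the SRZ would be expected to show a negative value that grows in magnitude at later time points.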
Ultimately, the genetic code possesses inherent redundancy (Crick, 1963), with
up to six
different codons specifying a single amino acid. This implies that synonymous
codons are
equivalent (Kimura, 1977); however, most prokaryotes and many eukaryotes (dos
Reis et al.,
2004; Newton and Wernisch, 2014) display a strong preference for certain
codons over
synonymous alternatives (Hershberg and Petrov, 2008; Plotkin and Kudla, 2011).
While
different species have evolved to prefer different codons, codon bias is
largely consistent
within each species (Hershberg and Petrov, 2008). However, within a given
genome, codon
bias differs among individual genes according to codon position, suggesting
that codon
choice has functional consequences. For example, rare codons are enriched at
the beginning
of essential genes (Chen and Inouye, 1990; Chen and Inouye, 1994), and codon
usage
strongly affects protein levels (Kane, 1995; Sharp and Li, 1987; Sharp et al.,
1993),
especially at the N-terminus (Goodman et al., 2013). This suggests that codon
usage plays a
poorly understood role in regulating protein expression.
Several hypotheses attempt to explain how codon usage mediates this effect,
including but
not limited to: facilitating ribosomal pausing early in translation to
optimize protein folding
(Zhou et al., 2013), adjusting mRNA secondary structure to optimize
translation initiation or
modulate mRNA degradation, preventing ribosome stalling by co-evolving with
tRNA
levels (Plotkin and Kudla, 2011), providing a "translational ramp" for proper
ribosome
spacing and effective translation (Tuller et al., 2010), or providing a layer
of translational
regulation for independent control of each gene in an operon (Li, 2015).
Additionally, codon
usage may impact translational fidelity (Hooper and Berg, 2000), and the
proteome may be
tuned by fine control of the decoding tRNA pools (Gingold et al., 2014).
Although Quax et
al. provide an excellent review of how biology chooses codons, systematic and
exhaustive
studies of codon choice in whole genomes are lacking (Quax et al., 2015).
Studies have only
begun to probe the effects of codon choice in a relatively small number of
genes (Goodman et
al., 2013; Isaacs et al., 2011; Kudla et al., 2009; Lajoie et al., 2013a; Li
et al., 2012).
Furthermore, although the UAG stop codon has been completely removed from
Escherichia
coli (Lajoie et al., 2013a), and the AGG codon has been ambiguously reassigned (Lee et
al., 2015;
Mukai et al., 2015; Zeng et al., 2014), no genome-wide attempt to entirely
replace a sense
codon has been reported. Prior work has established there are unknown
constraints to such
replacement (Isaacs et al., 2011; Lajoie et al., 2013a; Lajoie et al., 2013b).
Attempting to
replace all essential instances of a codon in a single strain would provide
valuable insight into
these constraints. Additionally, while some constraints are known to exist in
certain genes, no
attempt has been made to explore the breakdown of synonymous codons on a
genome-wide scale.
As described in the Example herein, rare arginine codons AGA and AGG
(comprising AGR
according to IUPAC conventions) were chosen for this study because the
literature suggests
that they are among the most difficult codons to replace and that their
similarity to ribosome
binding sequences underlies important non-coding functions (Chen and Inouye, 1990,
Rosenberg et al., 1993, Spanjaard et al., 1988, Spanjaard et al., 1990, Bonekamp et al., 1985).
Furthermore, their sparse usage (123 instances in the essential genes of E. coli MG1655 and
4228 instances in the entire genome; Table 3) made replacing all AGR instances
in essential
genes a tractable goal, with essential genes serving as a stringent test set
for identifying any
fitness impact from codon replacement (Baba, et al., 2006). Additionally,
recent work has
shown the difficulty of directly mutating some AGR codons to other synonymous
codons
(Zeng et al., 2014), although the authors do not explain the mechanism of failure or report
successful implementation of alternative designs. An attempt was made to remove all 123
AGR codons from essential genes by replacing them with the synonymous CGU codon. CGU was
chosen to maximally disrupt the primary nucleic acid sequence (AGR->CGU). It was
hypothesized that this strategy would maximize design flaws, thereby revealing rules for
designing genomes with reassigned genetic codes. Importantly, individual codon targets were
not inspected a priori in order to ensure an unbiased empirical search for design flaws.
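By way of illustration only, the naive AGR->CGU replacement described above may be sketched in Python as follows. The function name replace_agr_with_cgu and the position convention (1-based nucleotide position of the first base of the codon, which appears consistent with labels such as holB_AGA4 and rnpA_AGG22) are assumptions made for this sketch; it is not the oligo design software used in the Example.

```python
# Illustrative sketch only; names are hypothetical and not part of the disclosure.
AGR_CODONS = {"AGA", "AGG"}


def replace_agr_with_cgu(cds):
    """Replace every in-frame AGA/AGG codon with CGU.

    Returns the recoded sequence (RNA alphabet) and the assumed 1-based
    nucleotide position of the first base of each replaced codon.
    """
    rna = cds.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    positions = []
    for index, codon in enumerate(codons):
        if codon in AGR_CODONS:
            codons[index] = "CGU"  # naive synonymous replacement (Arg -> Arg)
            positions.append(3 * index + 1)
    return "".join(codons), positions


if __name__ == "__main__":
    recoded, replaced_at = replace_agr_with_cgu("ATGAGAAAAAGGTAA")
    print(recoded, replaced_at)
```

Only in-frame codons are touched; an AGA or AGG that appears out of frame (for example across a codon boundary) is left unchanged, and overlapping genes are not considered in this sketch.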
To construct this modified genome, co-selection multiplex automatable genome engineering
(CoS-MAGE) was used (Carr et al., 2012, Gregg et al., 2014) to create an E. coli strain
(C123) with all 123 AGR codons removed from its essential genes (FIG. 19A). CoS-MAGE
leverages lambda Red-mediated recombination (Yu et al., 2000, Ellis et al., 2001) and exploits
the linkage between a mutation in a selectable allele (e.g., tolC) and nearby edits of interest
(e.g., AGR conversions), thereby enriching for cells with those edits (Figure S1). To
streamline C123 construction, E. coli strain EcM2.1, which was previously optimized for
efficient lambda Red-mediated genome engineering, was chosen as the starting strain
(Gregg et al., 2014, Lajoie et al., 2012). Using CoS-MAGE on EcM2.1 improves
allele
replacement frequency by 10-fold over MAGE in non-optimized strains but
performs
optimally when all edits are on the same replichore and within 500 kilobases
of the selectable
allele (Gregg et al., 2014). To accommodate this requirement, the genome was
divided into
12 segments containing all 123 AGR codons in essential genes. A tolC cassette
was moved
around the genome to enable CoS-MAGE in each segment, allowing us to rapidly
prototype
each set of AGR->CGU mutations across large cell populations in vivo. Of the
123 AGR
codons in essential genes, 110 could be changed to CGU by this process (Figure
1), revealing
considerable flexibility of codon usage for most essential genes. Allele
replacement (in this
case, AGR->CGU codon substitution) frequency varied widely across these 110
permissive
codons, with no clear correlation between allele replacement frequency and
normalized
position of the AGR codon in a gene (Figure 2A).
The remaining 13 AGR->CGU mutations were not observed, suggesting a codon
substitution
frequency of less than the detection limit of 1% of the bacterial population.
These
'recalcitrant codons' were assumed to be deleterious or non-recombinogenic and
were triaged
into a troubleshooting pipeline for further analysis (FIG. 19A-B).
Interestingly, all except for one of the thirteen recalcitrant codons were co-localized near the
termini of their respective genes, suggesting the importance of codon choice at these
positions: seven were at most 30 nucleotides (nt) downstream of the start codon, while five
were at most 30 nt upstream of the stop codon (FIG. 20A, lower panel). These failed
AGR->CGU mutations were inspected for obvious design errors. For example, ftsI_AGA1759
overlaps the second and third codons of murE, an essential gene, so that the AGA->CGU
replacement introduces a missense mutation (murE D3V) that may impair fitness. Replacing
ftsI_AGA with CGA successfully removed the forbidden AGA codon while conserving the
primary amino acid sequence of MurE with a minimal impact on fitness (FIG. 21A).
Similarly, holB_AGA4 overlaps the upstream essential gene tmk, and replacing AGA with
CGU converts the tmk stop codon to Cys, adding 14 amino acids to the C-terminus of tmk.
While some C-terminal extensions are well-tolerated in E. coli (Ohtake et al., 2012),
extending tmk appears to be deleterious. holB_AGA was successfully replaced with CGC by
inserting three nucleotides comprising a stop codon before the holB start codon. This
reduced the tmk/holB overlap and preserved the coding sequences of both genes (FIG. 27A).
Subtler overlap errors were identified for the four remaining C-terminal failures, where it was
determined that AGR->CGU mutations disrupt RBS motifs belonging to downstream genes
(secE_AGG376 for nusG, dnaT_AGA532 for dnaC, and folC_AGAAGG1249,1252 for
dedD, the latter constituting two codons). Both nusG and dnaC are essential, suggesting that
replacing AGR with CGU in secE and dnaT lethally disrupts translation initiation and thus
expression of the overlapping nusG and dnaC (FIG. 21B and FIG. 27B). Although dedD is
annotated as non-essential (Baba, et al., 2006), it was hypothesized that replacing the AGR
with CGU in folC disrupted a portion of dedD that is essential to the survival of EcM2.1 (E.
coli K-12). In support of this hypothesis, the 29 nucleotides of dedD that were not deleted by
Baba et al. (Baba, et al., 2006), and that do not overlap with folC, could not be deleted,
suggesting that this sequence is essential in the strains described. The unexpected failure of
this conversion highlights the challenge of predicting design flaws even in well-annotated
organisms.
Consistent with the observation that disrupting these RBS motifs underlies the failed
AGR->CGU conversions, all three design flaws were overcome by selecting codons that conserved
RBS strength, including a non-synonymous (Arg->Gly) conversion for secE.
These lessons, together with previous observations that ribosomes pause during
translation
when they encounter ribosome binding site motifs in coding DNA sequences (Li
et al., 2012),
provided key insights into the N-terminal AGR->CGU failures. As described
herein, RBS-
like motifs may refer to both RBS motifs (which may typically occur before a
start codon)
and similar motifs (which may occur in the open reading frame but do not
necessarily cause
translation initiation). Three of the N-terminal failures (ssb_AGA10, dnaT_AGA10 and
prfB_AGG64) had RBS-like motifs either disrupted or created by CGU
replacement. While
prfB_AGG64 is part of the ribosomal binding site motif that triggers an
essential frameshift
mutation in prfB (Lajoie et al., 2013a, Craigen et al., 1985, Curran et al.,
1993), pausing-
motif-mediated regulation of ssb and dnaT expression has not been reported.
Nevertheless,
ribosomal pausing data (Li et al., 2012) showed that ribosomal occupancy peaks
are present
directly downstream of the AGR codons for ssb and absent for dnaT (FIG. 28);
meanwhile,
unsuccessful CGU mutations were predicted to weaken the RBS-like motif for
prfB and ssb
and strengthen the RBS-like motif for dnaT (FIG. 21C and FIG. 27C), suggesting
a
functional relationship between RBS occupancy and cell fitness.
Consistent with this hypothesis, successful codon replacements from the
troubleshooting
pipeline conserve predicted RBS strength compared to the large predicted
deviation caused
by unsuccessful AGR->CGU mutations (FIG. 22, y axis and comparison between
orange
asterisks and green dots). Interestingly, attempts to replace dnaT_AGA10 with either CGN or
NNN failed; only by manipulating the wobble position of surrounding codons and
conserving the arginine amino acid could dnaT_AGA10 be replaced (FIG. 27C). These
wobble variants appear to compensate for the increased RBS strength caused by the
AGA->CGU mutation: RBS motif strength with wobble variants deviated 8-fold from the
unmodified sequence, whereas RBS motif strength for AGA->CGU alone deviated 27-fold.
In order to better understand several remaining N-terminal failure cases that
did not exhibit
considerable RBS strength deviations (rnpA_AGG22, ftsA_AGA19, frr_AGA16, and
rpa_AGA298), other potential nucleic acid determinants of protein expression were
examined. Based on the observation that mRNA secondary structure near the 5' end of open
reading frames (ORFs) strongly impacts protein expression (Goodman et al.,
2013), it was
found that AGR->CGU mutations often changed the predicted folding energy and
structure
of the mRNA near the start codon of target genes (FIG. 21D and FIG. 29).
Successful codon
replacements obtained from degenerate MAGE oligos reduced the disruption of
mRNA
secondary structure compared to CGU (FIG. 22, green dots). For example, rnpA
has a
predicted mRNA loop near its RBS and start codon that relies on base pairing between both
guanines of the AGG codon and nearby cytosines (FIG. 21D, FIG. 30A). Importantly, only
AGG22CGG was observed out of all attempted rnpA_AGG22CGN mutations, and the fact
that only CGG preserves this mRNA structure suggests that it is physiologically important
(FIG. 21D, FIG. 30B-30C). In support of this, an rnpA_AGG22CUG mutation (Arg->Leu)
was successfully introduced only when the complementary nucleotides in the stem were
changed from CC (base pairs with AGG) to CA (base pairs with CUG), thus preserving the
natural RNA structure (FIG. 30D) while changing both RBS motif strength and amino acid
identity.
The analysis of all four optimized gene sequences showed reduced deviation in
predicted mRNA folding energy (computed with UNAFold (Markham et al., 2008))
compared to the unsuccessful CGU mutations (FIG. 22, x-axis, orange asterisks and green
dots). Similarly, predicted mRNA structure (computed with a different mRNA folding
software, NUPACK (Zadeh et al., 2011)) for these genes was strongly changed by
CGU
mutations and corrected in the empirically optimized solutions (FIG. 29).
Troubleshooting these 13 recalcitrant codons revealed that mutations causing
large
deviations from natural mRNA folding energy or RBS strength are associated
with failed
codon substitutions. By calculating these two metrics for all attempted AGR->CGU
mutations, a safe replacement zone (SRZ) was empirically defined inside which most CGU
mutations were tolerated (FIG. 22, shaded area). The SRZ is defined as the largest
multidimensional space that contains none of the recalcitrant AGR->CGU mutations
associated with mRNA folding energy or RBS strength (FIG. 22, red asterisks). It comprises
deviations in mRNA folding energy of less than 10% with respect to the natural codon and
deviations in RBS-like motif scores of less than a half log with respect to the natural codon,
providing a quantitative guideline for codon substitution. Notably, the optimized solutions
used to replace the 13 recalcitrant codons always exhibited a smaller deviation in at least one
of these two parameters than the corresponding mutation to CGU. Furthermore,
solutions to the 13 recalcitrant codons overlapped almost entirely with the empirically
defined SRZ. These results suggest that computational predictions of mRNA folding energy
and RBS strength can be used as a first approximation to predict whether a designed
mutation is likely to be lethal. Developing in silico heuristics to predict problematic
alleles in turn reduces the search space required for in vivo genome engineering, making it
possible to create radically altered genomes that remain viable.
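As a non-limiting illustration, the two SRZ thresholds stated above may be expressed as a simple predicate. The sketch below is written under stated assumptions: the "half log" is interpreted as 0.5 log10 units of the ratio of mutant to natural RBS-like motif score, the folding energy deviation is taken as a relative change, and the function and parameter names are hypothetical. The folding energies and RBS scores themselves would be supplied by external calculators.

```python
import math


def in_safe_replacement_zone(wt_energy, mut_energy, wt_rbs, mut_rbs,
                             max_energy_fraction=0.10, max_rbs_log10=0.5):
    """Return True if a candidate codon falls inside the assumed SRZ.

    wt_energy / mut_energy: predicted mRNA folding energies (same units).
    wt_rbs / mut_rbs: positive RBS-like motif scores on a linear scale.
    Thresholds follow the text: <10% energy deviation, <half-log RBS deviation
    (assumed here to mean 0.5 in log10 units).
    """
    if wt_energy == 0 or wt_rbs <= 0 or mut_rbs <= 0:
        raise ValueError("energies must be non-zero and RBS scores positive")
    energy_deviation = abs(mut_energy - wt_energy) / abs(wt_energy)
    rbs_deviation = abs(math.log10(mut_rbs / wt_rbs))
    return energy_deviation < max_energy_fraction and rbs_deviation < max_rbs_log10


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(in_safe_replacement_zone(-20.0, -21.0, 1000.0, 2000.0))
    print(in_safe_replacement_zone(-20.0, -21.0, 1000.0, 27000.0))
```

Under this interpretation, a candidate is flagged for troubleshooting whenever either metric leaves the zone, which matches the use of the SRZ as a first-approximation filter rather than a guarantee of viability.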
Once viable replacement sequences were identified for all 13 recalcitrant
codons, the
successful 110 CGU conversions were combined with the 13 optimized codon
substitutions
to produce strain C123, which has all 123 AGR codons removed from all of its
annotated
essential genes. C123 was then sequenced to confirm AGR removal and analyzed
using
Millstone, a publicly available genome resequencing analysis pipeline (Goodman
et al.,
2015). Two spontaneous AAG (Lys) to AGG (Arg) mutations were observed in the
essential
genes pssA and cca. While attempts to revert these mutations to AAG were unsuccessful
(perhaps suggesting functional compensation), they were replaced with CCG (Pro) in pssA
and CAG (Gln) in cca using degenerate MAGE oligos. The resulting strain, C123a, is the
first strain completely devoid of AGR codons in its annotated essential genes.
This strain
provides strong evidence that AGR codons can be completely removed from the E.
coli
genome, permitting the unambiguous reassignment of AGR translation function.
Kinetic growth analysis showed that the doubling time increased from 52.4 (+/- 2.6) minutes
in EcM2.1 (0 AGR codons changed) to 67 (+/- 1.5) minutes in C123a (123 AGR codons
changed in essential genes) in lysogeny broth (LB) at 34 °C in a 96-well plate reader.
Notably, fitness varied significantly during C123 strain construction (FIG.
20B). This may
be attributed to codon deoptimization (AGR->CGU) and compensatory spontaneous
mutations to alleviate fitness defects in a mismatch repair deficient (mutS-)
background.
Overall, the reduced fitness of C123a may be caused by on-target (AGR->CGU) or
off-target (spontaneous) mutations that occurred during strain construction. In this way, mutS
inactivation is simultaneously a useful evolutionary tool and a liability.
Final genome
sequence analysis revealed that along with the 123 desired AGR conversions,
C123a had 419
spontaneous non-synonymous mutations not found in the EcM2.1 parental strain
(FIG. 35).
Of particular interest was the mutation argU_G15A, located in the D arm of tRNA-Arg
(argU), which arose during CoS-MAGE with AGR set 4. It was hypothesized that
argU_G15A compensates for increased CGU demand and decreased AGR demand, but no
direct fitness cost associated with reverting this mutation in C123 was observed, and
argU_G15A does not impact aminoacylation efficiency in vitro or aminoacyl-tRNA pools in
vivo (FIG. 31). Consistent with Mukai et al. and Baba et al. (Mukai et al., 2015, Baba, et al.,
2006), argW (tRNA-Arg, anticodon CCU; decodes AGG only) was dispensable in C123a because it can
be complemented by argU (tRNA-Arg, anticodon UCU; decodes both AGG and AGA). However, argU
is the only E. coli tRNA that can decode AGA and remains essential in C123a, probably
because it is required to translate the AGR codons for the rest of the proteome (Lajoie et al.,
2013b).
To evaluate the genetic stability of C123a after removal of all AGR codons
from all the
known essential genes, C123a was passaged for 78 days (640 generations) to
test whether
AGR codons would recur and/or whether spontaneous mutations would improve
fitness.
After 78 days, no additional AGR codons were detectable in a sequenced
population, and
doubling time of isolated clones ranged from 22% faster to 22% slower than
C123a (n=60).
To gain more insight into how local RBS strength and mRNA folding impact codon choice,
an evolution experiment was performed to examine the competitive fitness of all 64 possible
codon substitutions at individual AGR codons. While MAGE is a powerful method to explore
viable genomic modifications in vivo, it was of interest to map the fitness cost associated
with less-optimal codon choices, which requires codon randomization depleted of the parental
genotype, hypothesized to be at or near the global fitness maximum. To do this, a
method called CRAM (CRISPR-Assisted MAGE) was developed. First, oligos were designed
that not only changed the target AGR codon to NNN, but also made several synonymous
changes at least 50 nt downstream that would disrupt a 20 bp CRISPR target locus. MAGE
was used to replace each AGR with NNN in parallel, and CRISPR/Cas9 was used to deplete
the population of cells with the parental genotype. This approach allowed exhaustive
exploration of the codon space, including the original codon, but absent the preponderance of
the parental genotype. Following CRAM, the population was passaged 1:100 every
24 hours
for six days, and sampled prior to each passage using Illumina sequencing
(FIG. 23).
Sequencing 24 hours after CRAM showed that all codons were present (including
stop
codons) (FIG. 32), validating the method as a technique to generate massive
diversity in a
population. All sequences for further analysis were amplified by PCR with
allele-specific
primers containing the changed downstream sequence. Subsequent passaging of
these
populations revealed many gene-specific trends (FIG. 23, FIG. 33). Notably, all
codons that required troubleshooting (dnaT_AGA10, ftsA_AGA19, frr_AGA16,
rnpA_AGG22) converged to their wild-type AGR codon, suggesting that the original codon
was globally optimized. For all cases where an alternate codon replaced the original AGR,
the predicted deviations in mRNA folding energy and local RBS strength (as a proxy for
ribosome pausing) were computed for these alternative codons, and these metrics were compared to
the evolution of codon distribution at this position over time. The fraction of sequences that
fall within the SRZ inferred from FIG. 22 was also computed. CRAM initially
introduced a
large diversity of mRNA folding energies and RBS strengths, but these
genotypes rapidly
converged toward parameters that are similar to the parental AGR values in
many cases
(FIG. 23, overlays). Codons that strongly disrupted predicted mRNA folding and
internal
RBS strength near the start of genes were disfavored after several days of
growth, suggesting
that these metrics can be used to predict optimal codon substitutions in
silico. In contrast,
non-essential control genes bcsB and chpS did not converge toward codons that
conserved
RNA structure or RBS strength, supporting the conclusion that the observed
conservation in
RNA secondary structure and RBS strength is biologically relevant for
essential genes.
Interestingly, tilS_AGA19 was less sensitive to this effect, suggesting that codon choice at
that particular position is not under selection. Additionally, the average internal RBS
strength for the ispG populations converged towards the parental AGR values whereas
mRNA folding energy averages did not, suggesting that this position in the gene may be
more sensitive to RBS disruption than to mRNA folding. Gene lptF followed the
opposite trend.
Interestingly, several genes (lptF, ispG, tilS, gyrA and rimN) preferred
codons that changed
the amino acid identity from Arg to Pro, Lys, or Glu, suggesting that non-
coding functions
trump amino acid identity at these positions. Importantly, all successful
codon substitutions
in essential genes fell within the SRZ (FIG. 24), validating the heuristics
based on an
unbiased test of all 64 codons. Meanwhile non-essential control gene chpS
exhibited less
dependence on the SRZ. Based on these observations, while global codon bias
may be
affected by tRNA availability (Plotkin et al., 2011, Novoa et al., 2012,
Ikemura, 1985),
codon choice at a given position may be defined by at least 3 parameters: (1)
amino acid
sequence, (2) mRNA structure near the start codon and RBS, and (3) RBS-mediated
pausing. In
some cases, a subset of these parameters may not be under selection, resulting
in an evolved
sequence that only converges for a subset of the metrics. In other cases, all
metrics may be
important, but the primary nucleic acid sequence might not have the
flexibility to
accommodate all of them equally, resulting in codon substitutions that impair
cellular fitness.
These rules were used to generate a draft genome in silico with all AGR codons
replaced
genome-wide, reducing by almost fourfold the number of predicted design flaws
(e.g.,
synonymous codons with metrics outside of the SRZ) compared to the naïve
replacement
strategy (FIG. 25A-25B, FIG. 34). Furthermore, predicting recalcitrant codons
provides
hypotheses that can be rapidly tested in vivo using MAGE. Successful
replacement
sequences can then be implemented together in a redesigned genome. These rules
are
expected to increase the tractability of creating a genome completely devoid
of AGR codons,
which could be used for unambiguously reassigning AGR translation function.
Comprehensively removing all instances of AGR codons from E. coli essential
genes
revealed 13 design flaws which could be explained by a disruption in coding DNA
sequence, RBS-mediated translation initiation/pausing, or mRNA structure.
While the
importance of each factor has been reported, methods described herein
systematically explore
to what extent and at what frequency they impact genome function. Furthermore,
methods
described herein establish quantitative guidelines to reduce the chance of
designing non-
viable genomes. Although additional factors undoubtedly impact genome
function, the fact
that these guidelines captured all instances of failed synonymous codon
replacements (FIG.
22) suggests that the disclosed genome design guidelines provide a strong
first
approximation of acceptable modifications to the primary sequence of viable
genomes.
These design rules coupled with inexpensive DNA synthesis will facilitate the
construction
of radically redesigned genomes exhibiting useful properties such as
biocontainment, virus
resistance, and expanded amino acid repertoires (Lajoie et al., 2015).
Materials and Methods:
Strains and Culture Methods Used:
The strains used in this work were derived from EcM2.1 (Escherichia coli MG1655
mutS_mut dnaG_Q576A exoX_mut xonA_mut xseA_mut 1255700::tolQRA Δ(ybhB-bioAB)::[λcI857
N(cro-ea59)::tetR-bla]) (Carr et al., 2012). Liquid culture medium consisted
of the Lennox formulation of lysogeny broth (LBL; 1% w/v Bacto tryptone, 0.5% w/v yeast
extract, 0.5% w/v sodium chloride) (Lennox, 1955) with appropriate selective agents:
carbenicillin (50 µg/mL) and SDS (0.005% w/v). For tolC counter-selections, colicin E1
(colE1) was used at a 1:100 dilution from an in-house purification (Schwartz et al., 1971) that
measured 14.4 µg protein/µL (Isaacs et al., 2011, Lajoie et al., 2013b), and vancomycin was
used at 64 µg/mL. Solid culture medium consisted of LBL autoclaved with 1.5% w/v Bacto
Agar (Fisher), containing the same concentrations of antibiotics as necessary. ColE1 agar
plates were generated as described previously (Gregg et al., 2014). Doubling times were
determined on a Biotek Eon Microplate reader with orbital shaking at 365 cpm at 34 °C
overnight, and analyzed using a MATLAB script.
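The analysis script itself is not reproduced here. As a minimal sketch of one common way to estimate a doubling time from plate reader data, the following Python function fits a least-squares line to the natural logarithm of optical density versus time and reports ln(2) divided by the slope. The function name, the choice of a log-linear fit, and the assumption that the supplied points lie in exponential phase are illustrative assumptions, not a description of the script actually used.

```python
import math


def doubling_time(times_min, optical_densities):
    """Estimate doubling time (minutes) from exponential-phase OD readings.

    Fits ln(OD) = a + slope * t by ordinary least squares and returns
    ln(2) / slope. Assumes all readings are positive and exponential-phase.
    """
    if len(times_min) != len(optical_densities) or len(times_min) < 2:
        raise ValueError("need at least two paired time/OD readings")
    logs = [math.log(od) for od in optical_densities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no growth detected in the supplied readings")
    return math.log(2) / slope


if __name__ == "__main__":
    # Synthetic curve generated with a known doubling time, for illustration.
    times = list(range(0, 110, 10))
    ods = [0.01 * 2 ** (t / 52.4) for t in times]
    print(round(doubling_time(times, ods), 1))
```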
Oligonucleotides, Polymerase Chain Reaction, and Isothermal Assembly
PCR products used in recombination or for Sanger sequencing were amplified
with Kapa 2G
Fast polymerase according to manufacturer's standard protocols. Multiplex
allele-specific
PCR (mascPCR) was used for multiplexed genotyping of AGR replacement events
using the
KAPA2G Fast Multiplex PCR Kit, according to previous methods (Isaacs et al., 2011,
Mosberg et al., 2012). Sanger sequencing reactions were carried out through a third party
(Genewiz). CRAM plasmids were assembled from plasmid backbones linearized using PCR
(Yaung et al., 2014), and CRISPR/PAM sequences obtained as gBlocks from IDT, using
isothermal assembly at 50 °C for 60 minutes (Gibson et al., 2009).
Lambda Red Recombinations, MAGE, & CoS-MAGE
Lambda Red recombineering, MAGE, and CoS-MAGE were carried out as described previously
(Gregg et al., 2014, Wang et al., 2009). In singleplex recombinations, the MAGE oligo was
used at 1 µM, whereas the co-selection oligo was 0.2 µM and the total oligo pool was 5 µM in
multiplex recombinations (7-14 oligos). When double-stranded PCR products were
recombined (e.g., tolC insertion), 100 ng of double-stranded PCR product was used. Since
CoS-MAGE was used with tolC selection to replace target AGR codons, each recombination
was paired with a control recombined with water only to monitor tolC selection performance.
The standard CoS-MAGE protocol for each oligo set was to insert tolC, inactivate tolC,
reactivate tolC, and delete tolC. MascPCR screening was performed at the tolC insertion,
inactivation and deletion steps. All lambda Red recombinations were followed by a recovery in 3
mL LBL followed by an SDS selection (tolC insertion, tolC activation) or ColE1
counter-selection (tolC inactivation, tolC deletion) that was carried out as previously described
(Gregg et al., 2014).
General AGR replacement strategy
AGR codons in essential genes were found by cross-referencing essential gene
annotation
according to two complementary resources (Baba, et al., 2006, Hashimoto et
al., 2005) to find
the shared set (107 coding regions), which contained 123 unique AGR codons (82
AGA, 41
AGG). optMAGE (Ellis et al., 2001, Wang et al., 2009) was used to design 90-mer oligos
(targeting the lagging strand of the replication fork) that convert each AGR to CGU. The total
number of AGR replacement oligos was reduced to 119 by designing oligos to encode
multiple edits where possible, maintaining at least 20 bp of homology on the 5' and 3' ends
of the oligo. The oligos were then pooled based on chromosomal position into twelve MAGE
oligo sets of varying complexity (minimum: 7, maximum: 14) such that a single marker
(tolC) could be inserted at most 564,622 bp upstream relative to replication direction for all
targets within a given set. tolC insertion sites were identified for each of the twelve pools
either in intergenic regions or in non-essential genes that met the distance criteria for a given
pool. See Table 5 for descriptors for each of the 12 oligo pools.
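The merging of nearby edits onto a single oligo can be pictured with a short sketch. With a 90-mer and at least 20 bp of homology on each end, the edited bases must fit within the central 90 - 2 x 20 = 50 nt. The greedy grouping below is an assumption made for illustration (it is not the optMAGE algorithm), it treats the chromosome as linear, and the function and parameter names are hypothetical.

```python
def group_edits_into_oligos(codon_starts, oligo_length=90, min_homology=20,
                            codon_length=3):
    """Greedily group codon start coordinates that can share one oligo.

    Edits share an oligo when the span from the first base of the first codon
    to the last base of the last codon fits between the two homology arms.
    """
    max_span = oligo_length - 2 * min_homology  # 50 nt for a 90-mer
    groups = []
    for start in sorted(codon_starts):
        # Span if this codon were added to the current (last) group.
        if groups and (start + codon_length - groups[-1][0]) <= max_span:
            groups[-1].append(start)
        else:
            groups.append([start])
    return groups


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(group_edits_into_oligos([500, 100, 147, 130]))
```

In this toy input, three codons starting at 100, 130 and 147 span exactly 50 nt and are placed on one oligo, while the codon at 500 receives its own.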
Troubleshooting strategy
A recalcitrant AGR was defined as one that was not converted to CGU in any of
at least 96
clones picked after the third step of the conversion process. The recalcitrant
AGR codon was
then triaged for troubleshooting (FIG. 12A) in the parental strain (EcM2.1).
First, the
sequence context of the codon was examined for design errors or potential
issues, such as
misannotation or a disrupted RBS for an overlapping gene. In most cases,
corrected oligos
could be easily designed and tested. If no such obvious redesign was possible, an attempt
was made to replace AGR with CGN. If attempting to replace AGR with CGN failed
to give recombinants, compensatory synonymous mutations were tested in a 3 amino acid
window around the recalcitrant AGR. If needed, synonymous stringency was relaxed by
recombining with oligos encoding AGR-to-NNN mutations. After each step in the
troubleshooting workflow, 96 clones from 2 successive CoS-MAGE recombinations were
screened using allele-specific PCR with primers that hybridize to the wild-type genotype.
Sequences that failed to yield a wild-type amplicon were Sanger sequenced to confirm
conversion. The doubling time of all clones was measured in LBL to pair sequencing data with
fitness data, and the recombined clone with the shortest doubling time was chosen. Doubling time
was determined by obtaining a growth curve on a Biotek plate reader (either an Eon or H1),
and analyzed using web-based open source genome resequencing software. This genotype
was then implemented in the complete strain at the end of strain construction using MAGE,
and confirmed by mascPCR screening.
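The escalating search described in this troubleshooting strategy (CGN, then compensatory synonymous changes in neighboring codons, then NNN) can be sketched as candidate generators. The sketch below is illustrative only: the standard genetic code table is built from the conventional UCAG ordering, the "3 amino acid window" is assumed here to mean the target codon plus one codon on each side (adjustable through the hypothetical flank parameter), and none of the names correspond to software named in this disclosure.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

CGN_CODONS = [c for c in CODON_TABLE if c.startswith("CG")]  # synonymous Arg set
NNN_CODONS = list(CODON_TABLE)                               # all 64 codons


def synonymous_neighbors(codons, index, replacement, flank=1):
    """Yield variants with codons[index] set to `replacement` and each
    flanking codon (within `flank` codons) swapped among its synonyms."""
    lo = max(0, index - flank)
    hi = min(len(codons), index + flank + 1)
    options = []
    for i in range(lo, hi):
        if i == index:
            options.append([replacement])
        else:
            amino_acid = CODON_TABLE[codons[i]]
            options.append([c for c, aa in CODON_TABLE.items() if aa == amino_acid])
    for combo in product(*options):
        yield codons[:lo] + list(combo) + codons[hi:]


if __name__ == "__main__":
    # Hypothetical five-codon fragment with an AGA at index 2.
    fragment = ["AUG", "AAA", "AGA", "GAU", "UAA"]
    for variant in synonymous_neighbors(fragment, 2, "CGU"):
        print("".join(variant))
```

Each yielded variant encodes the same protein as the input fragment whenever the replacement codon is itself synonymous, so the search widens the nucleotide space before amino acid identity is relaxed with NNN.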
mRNA folding and RBS strength computation
A custom Python pipeline was used to compute mRNA folding and RBS strength values for
each sequence. mRNA folding was based on the UNAFold calculator (Markham et al., 2008)
and RBS strength on the Salis calculator (Salis, 2011). The parameters for mRNA folding are
the temperature (37 °C) and the window used, which was an average between -30:+100 nt and
-15:+100 nt around the start site of the gene and was based on Goodman et al., 2013. The only
parameter for RBS strength is the distance between RBS and promoter, which was averaged
between 9 and 10 nt after the codon of interest based on Li et al., 2012. Data visualization was
performed using custom MATLAB code.
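The pipeline itself is not reproduced here. The window averaging described above may nonetheless be sketched as follows, under stated assumptions: the folding calculator is passed in as a caller-supplied function (fold_energy is a hypothetical stand-in for a wrapper around a folding program, which is not shown), window coordinates are taken relative to the first nucleotide of the start codon, and each window is treated as half-open.

```python
def windowed_folding_energy(sequence, start_index, fold_energy,
                            windows=((-30, 100), (-15, 100))):
    """Average a folding energy over windows around the start site.

    sequence: nucleotide string containing the gene and its upstream region.
    start_index: 0-based index of the first base of the start codon.
    fold_energy: caller-supplied callable mapping a subsequence to an energy.
    windows: (upstream, downstream) offsets; defaults follow the text.
    """
    energies = []
    for upstream, downstream in windows:
        lo = start_index + upstream
        hi = start_index + downstream
        if lo < 0 or hi > len(sequence):
            raise ValueError("window extends beyond the supplied sequence")
        energies.append(fold_energy(sequence[lo:hi]))
    return sum(energies) / len(energies)


def relative_deviation(wild_type_value, mutant_value):
    """Relative change of a metric with respect to the natural sequence."""
    return abs(mutant_value - wild_type_value) / abs(wild_type_value)


if __name__ == "__main__":
    # Toy stand-in for a folding calculator, for illustration only.
    toy_fold = lambda subsequence: -0.1 * subsequence.count("G")
    print(windowed_folding_energy("ACGU" * 75, 100, toy_fold))
```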
Whole genome sequencing of strains lacking AGR codons in their essential genes
Sheared genomic DNA was obtained by shearing 130 µL of purified genomic DNA in a
Covaris E210. Whole genome library prep was carried out as previously described (Rohland
et al., 2012). Briefly, 130 µL of purified genomic DNA was sheared overnight in a Covaris
E210 with the following protocol: duty cycle 10%, intensity 5, cycles/burst 200, time 780
seconds/sample. The samples were assayed for shearing on an agarose gel and, if the
distribution was acceptable (peak distribution ~400 nt), the samples were size-selected by
SPRI/Reverse-SPRI purification as described in Rohland et al., 2012. The fragments were
then blunted and p5/p7 adaptors were ligated, followed by fill-in and gap repair (NEB). Each
sample was then qPCR quantified using SYBR green and Kapa Hifi. This was used
to
determine how many cycles to amplify the resulting library for barcoding using
P5-sol and
P7-sol primers. The resulting individual libraries were quantified by Nanodrop
and pooled.
The resulting library was quantified by qPCR and an Agilent Tapestation, and
run on MiSeq
2x150. Data was analyzed to confirm AGR conversions and to identify off-target
mutations
using Millstone, a web-based open-source genome resequencing tool.
NNN-sequencing and CRISPR
CRISPR/Cas9 was used to deplete the wild-type parental genotype by selectively cutting
chromosomes at unmodified target sites next to the desired AGR codon changes. Candidate
sites close to the AGR codon being targeted were determined using the built-in target site
finder in Geneious. Sites were chosen if they were under 50 bp upstream of the AGR
codon and could be disrupted with synonymous changes. If multiple sites fulfilled these
criteria, the site with the lowest level of sequence similarity to other portions of the genome
was chosen. Oligos of a length of ~130 bp were designed for all 24 genes with an AGR
codon in the first 30 nt after the translation start site. Those oligos incorporated both an NNN
random codon at the AGR position and multiple (up to 6) synonymous changes in a
CRISPR target site at least 50 nt downstream of an AGR codon. This modifies the AGR
locus at the same time as disrupting the CRISPR target site, ensuring randomization of the
locus after the parental genotype is deleted. Recombinations were performed in the parental
strain EcM2.1 carrying the Cas9-expressing plasmid DsCas9. For each of the 24 genes, five
cycles of MAGE were performed with the specific mutagenesis oligo at a concentration of
1 µM. CRISPR repeat-spacer plasmids carrying guides designed to target the chosen sites
were electroporated into each diversified pool after the last recombineering cycle. After 1
hour of recovery, both the DsCas9 and repeat-spacer plasmids were selected for, and the
cells were passaged in three parallel lineages for each of the 24 AGR codons for 144 hrs.
After 2 hours of selection, and at every 24 hour interval, samples were taken and the cells
were diluted 1/100 in selective media.
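As a rough, illustrative calculation only, the number of generations elapsed under such a serial dilution regime can be estimated by assuming that each culture regrows to its pre-dilution density before the next passage, so that a 1/100 dilution corresponds to log2(100) doublings. The function names below are hypothetical, and the count of six passages over 144 hours is an assumption based on the 24 hour interval.

```python
import math


def generations_per_passage(dilution_factor):
    """Doublings needed to regrow after diluting by `dilution_factor`."""
    if dilution_factor <= 1:
        raise ValueError("dilution factor must be greater than 1")
    return math.log2(dilution_factor)


def total_generations(dilution_factor, n_passages):
    """Total doublings over `n_passages`, assuming full regrowth each time."""
    return n_passages * generations_per_passage(dilution_factor)


if __name__ == "__main__":
    # 1/100 dilution every 24 hours over 144 hours (assumed six passages).
    print(round(generations_per_passage(100), 2))
    print(round(total_generations(100, 6), 1))
```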
Each randomized population was amplified using PCR primers allowing for
specific
amplification of strains incorporating the CRISPR-site modifications. The
resulting triplicate
libraries for each AGR codon were then pooled and barcoded with P5-sol and P7-
sol primers,
and run on a MiSeq 1x50. Data was analyzed using custom Matlab code.
For each gene and each data point, reads were aligned to the reference genome
and
frequencies of each codon were computed. In FIG. 23, the mRNA structure
deviation (red
line) and RBS strength deviation (blue line) in arbitrary units were computed
as the
product of the frequencies and the corresponding deviation for each codon.
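As a rough illustration of the computation behind FIG. 23, the sketch below multiplies each codon's observed frequency by a per-codon deviation score and sums the products into one population-level value. The summation step, the function name, and the toy numbers are assumptions made for illustration; the original analysis used custom Matlab code that is not reproduced here.

```python
def weighted_deviation(codon_freqs, codon_deviation):
    """Sum of frequency * deviation over all observed codons.

    codon_freqs: dict mapping codon -> fraction of reads (sums to about 1).
    codon_deviation: dict mapping codon -> predicted deviation (arbitrary units).
    """
    return sum(freq * codon_deviation[codon]
               for codon, freq in codon_freqs.items())


# Hypothetical values for one gene at one time point (not experimental data).
freqs = {"CGT": 0.5, "AGA": 0.25, "CGC": 0.25}
mrna_dev = {"CGT": 0.2, "AGA": 0.0, "CGC": 0.8}
population_mrna_deviation = weighted_deviation(freqs, mrna_dev)
```

The same function could be applied to an RBS strength deviation table to obtain the second curve.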
EXAMPLE III
Genome Engineering Toolkit and Multi-locus Validation Experiment
Methods described herein make use of the Genome Engineering Toolkit (GETK), a
software
library for reassigning codons genome-wide. GETK software supports design and
synthesis
of recoded genes and whole genomes (FIG. 36A). The software takes into account
biophysical constraints to choose the best codon reassignment, minimizing the
risk of
redesigned organisms that are impaired or inviable. Using software encoding
methods
described herein, experiments were carried out that recoded positions throughout
the genome
and demonstrated that the codon choices specified by the methods described
herein reduce
the risk of design exceptions.
To validate the design rules described herein, an experiment was carried out
to test
synonymous codon substitutions throughout the genome. 235 codon competition
experiments
were designed, and prioritized according to the predicted difficulty of codon
replacement.
Positions were selected where at least one of mRNA, RBS, or internal RBS were
predicted by
the design rules to be significantly disrupted for at least one alternative
codon. The 6
forbidden sense codons as in Example I were considered: AGA (Arg), AGG (Arg),
AGC
(Ser), AGU (Ser), UUG (Leu), and UUA (Leu). Positions were prioritized where
the design
rule-predicted score max{mRNA, RBS, internal RBS} exceeded a threshold, or at
least
one bad recoding existed. For each sub-experiment, MAGE oligos were designed
that
introduce synonymous codons at the target. For some sub-experiments, MAGE
oligos were
designed that introduce non-synonymous mutations. Each sub-experiment was
performed in a
separate well and MAGE was used to electroporate the oligo set for that sub-
experiment. The
population was sampled at regular intervals and diluted to maintain
logarithmic-phase
growth. The samples were sequenced and used to quantify codon abundance, which
was then
used to calculate relative fitness (FIG. 36B).
Predicted scores were compared to experimental fitness measurements (FIG.
36C). Our
experiments reveal that alternative codon predictions can minimize design
issues. In the case
of testing single codon changes at the 5-prime ends of essential genes, codons
categorized as
having good scores (minimal predicted disruption of mRNA folding, ribosome
binding site
strength, and internal ribosome pausing sites) result in significantly less
fitness impact (K-S
test). Testing combinations of codon swaps within the same 90-mer oligo window
showed
even stronger correspondence between predicted scores and observed fitness
(FIG. 37).
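For readers unfamiliar with the Kolmogorov-Smirnov comparison mentioned above, the following sketch computes the two-sample K-S statistic, that is, the largest vertical gap between the empirical cumulative distributions of two groups of fitness values. It computes only the statistic, not a p-value, and the fitness values are hypothetical; the source does not state which software performed the test.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_values, x):
        # Fraction of observations less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    # The maximum gap is always attained at one of the observed values.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Hypothetical relative-fitness values for codons with good and bad scores.
good_score_fitness = [0.00, -0.01, -0.02, -0.01]
bad_score_fitness = [-0.20, -0.35, -0.15, -0.02]
d = ks_statistic(good_score_fitness, bad_score_fitness)
```

A statistic near 1 indicates almost no overlap between the two distributions, while a value near 0 indicates that they are nearly indistinguishable.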
As null-effect controls, synonymous codons and early stop codons were
introduced into
non-essential genes LacZ and GalK at multiple positions, showing similar
effects between
synonymous codons and internal stops (FIG. 38, top row). As strong-effect
controls,
synonymous codons and internal stop codons were introduced into essential
genes. These
show a marked difference between internal stop and synonymous codons, with a
greater
dynamic range of codon preference at some positions (FIG. 38, bottom row).
Beyond testing synonymous substitutions, non-synonymous substitutions observed
in
phylogenetic neighbors of E. coli (gammaproteobacteria, e.g. Salmonella
enterica) that score
well according to the rules described herein were tested for ability to
replace codons.
Preventing disruption of internal RBS motifs is an effective rule for
selecting codons internal
to genes, both for loci with potential high RBS disruption (FIG. 39)
(Kolmogorov-Smirnov p
= 3.E-14) and for loci observed to have strong ribosomal pausing peaks (Li et
al., 2012) (FIG.
40) (Kolmogorov-Smirnov p = 7.9E-05).
Choosing Genomic Locus Targets:
Targets for the 235-codon competition experiments were organized into three 96-
well plates:
Plate 1: Single codon changes in 5-prime of essential genes
95 codons were chosen that occur near the 5-prime end of essential genes,
(-30, +100) bases
relative to the start codon. Positions were considered where the worst
possible score exceeds
thresholds for at least one filter (poor RBS or mRNA folding prediction), as
described by the
filter:
single_codon_any_bad_max = single_codon_agg_data_df[
    (single_codon_agg_data_df['max_RBS_log_ratio'] > 3.3) |
    (single_codon_agg_data_df['max_mRNA_positive_ratio'] > 1.1) |
    (single_codon_agg_data_df['max_internal_RBS_score'] > 4.1)]
The threshold values were chosen as follows:
RBS_log_ratio: 3.3 = 1 + log_e(10)
mRNA_positive_ratio: 1.1 = 10% deviation
max_internal_RBS_score: 4.1 = 3.3 plus a small margin, chosen so that the target set fits within one 96-well plate
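The thresholds and the "any filter exceeded" logic can be checked with a small standard-library sketch. It mirrors the pandas filter above using plain dictionaries; the candidate records and their identifiers are hypothetical and are not data from the experiment.

```python
import math

# Thresholds quoted in the text; 1 + ln(10) is approximately 3.30.
RBS_LOG_RATIO_MAX = 3.3
MRNA_POSITIVE_RATIO_MAX = 1.1   # 10% deviation
INTERNAL_RBS_SCORE_MAX = 4.1


def any_bad_max(row):
    """True if the worst-case score of any filter exceeds its threshold."""
    return (row["max_RBS_log_ratio"] > RBS_LOG_RATIO_MAX
            or row["max_mRNA_positive_ratio"] > MRNA_POSITIVE_RATIO_MAX
            or row["max_internal_RBS_score"] > INTERNAL_RBS_SCORE_MAX)


# Hypothetical per-codon aggregate scores.
candidates = [
    {"codon_id": "geneA_5", "max_RBS_log_ratio": 3.9,
     "max_mRNA_positive_ratio": 1.02, "max_internal_RBS_score": 2.0},
    {"codon_id": "geneB_12", "max_RBS_log_ratio": 1.0,
     "max_mRNA_positive_ratio": 1.05, "max_internal_RBS_score": 3.0},
    {"codon_id": "geneC_3", "max_RBS_log_ratio": 0.5,
     "max_mRNA_positive_ratio": 1.30, "max_internal_RBS_score": 1.0},
]
selected = [row["codon_id"] for row in candidates if any_bad_max(row)]
rbs_threshold_check = round(1 + math.log(10), 1)
```

Here geneA_5 is retained because of its RBS score and geneC_3 because of its mRNA folding score, while geneB_12 passes every filter and is excluded.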
The candidate set contains targets with at least one problem in the design
(i.e. the worst
design is bad). At least two of these targets introduce non-synonymous
mutations into
overlapping genes, which allows testing of the aspect of the software that balances
amino acid sense
against preservation of regulatory gene expression signals.
Plate 2: Combos of codon changes and adjacent degenerate tests
From among the single changes, those that occur adjacent to others within a
90-basepair
oligonucleotide size were combined into a new set of sub-experiments that
tested all
combinations of adjacent oligos. There were 62 such targets.
12 sub-experiments were designed with synonymous codon swaps in non-forbidden
codons
adjacent to forbidden codons. Oligos were designed that bring in all
synonymous codon
swaps on either side of some choice forbidden codons, e.g. the region
surrounding an
arginine V-R-G might look like GTN-CGN-GGN in an oligo. For these, recodings
were
targeted which have a score that exceeds threshold values with the best
synonymous codon
swap, where even the best synonymous solution is bad.
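To make the degenerate-oligo notation concrete, the sketch below expands a sequence such as GTN-CGN-GGN into every concrete sequence it encodes. Only the IUPAC code N is handled, and the function name is an illustrative choice rather than part of GETK.

```python
from itertools import product

# IUPAC nucleotide codes used here; N stands for any of the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# The V-R-G example from the text, written without separators.
variants = expand_degenerate("GTNCGNGGN")
```

Because GTN, CGN and GGN each encode a single amino acid (valine, arginine and glycine respectively), all 64 variants are synonymous with one another, so a single degenerate oligo tests every combination of third-position choices at once.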
Plate 3: Testing Phylogenetic Conservation
The final 66 sub-experiments were designed to test phylogenetic conservation
as a source of
permitted non-synonymous substitutions. Seven strains of gammaproteobacteria
were aligned
and codons were identified that have non-synonymous variants relative to E.
coli. Targets
were tested around the 5-prime ends of essential genes as well as targets in
the middle of
essential genes. For conservation 5-prime targets, a subset was chosen of
non-synonymous
changes observed in phylogenetic conservation data for which there is a
possible bad score,
as described by:
conservation_5_prime_non_synonymous_df = conservation_5_prime_df[
    (conservation_5_prime_df['replacement_codon'].apply(
        lambda c: c not in FORBIDDEN_CODONS)) &
    (~conservation_5_prime_df['is_synonymous'])][:]
conservation_5_prime_synonymous_only_bad_df = conservation_5_prime_non_synonymous_df[
    (conservation_5_prime_non_synonymous_df['max_mRNA_positive_ratio'] > 1.1) |
    (conservation_5_prime_non_synonymous_df['max_RBS_log_ratio'] > 3.3) |
    (conservation_5_prime_non_synonymous_df['max_internal_RBS_score'] > 4.1)]
conservation_5_prime_first_30nt_bad_score = conservation_5_prime_non_synonymous_df[
    (conservation_5_prime_non_synonymous_df['codon_start'] < 30) &
    ((conservation_5_prime_non_synonymous_df['mRNA_positive_ratio'] > 1.1) |
     (conservation_5_prime_non_synonymous_df['RBS_log_ratio'] > 3.3) |
     (conservation_5_prime_non_synonymous_df['internal_RBS_score'] > 3.3))]
conservation_5_prime_targets_df = pd.concat([
    conservation_5_prime_synonymous_only_bad_df,
    conservation_5_prime_first_30nt_bad_score])
conservation_5_prime_targets_df.drop_duplicates(inplace=True)
These selections were competed against the corresponding single codon
degenerate oligo
from plate 1.
For conservation in middle of genes, the ~3500 candidate targets in essential
genes were
reduced using two criteria: 1) internal RBS score with a bad potential maximum
with
synonymous changes and 2) locations of peaks from ribosomal pausing data (Li
et al., 2012).
For internal RBS, 12 targets at 9 unique positions were chosen, for a total of
21 oligos. The
filter used is:
conservation_middle_of_genes_df = conservation_essentals_df[
    (conservation_essentals_df['codon_start'] > 30) &
    (conservation_essentals_df['scoring_gene'] ==
        conservation_essentals_df['codon_gene']) &
    (conservation_essentals_df['replacement_codon'].apply(
        lambda c: c not in FORBIDDEN_CODONS)) &
    (~conservation_essentals_df['is_synonymous']) &
    (conservation_essentals_df['max_internal_RBS_score'] > 6.5) &
    (conservation_essentals_df['internal_RBS_score'] <
        conservation_essentals_df['min_internal_RBS_score'])]
For the ribosomal pausing (Weissman) data, 14 targets at 9 unique positions were chosen, for a total of 23 oligos.
Oligonucleotides were designed as described in (Wang et al., 2009). DNA was
synthesized
by an industrial partner, Integrated DNA Technologies (IDT; Coralville, IA).
Strains & Culture
EcM2.1 naïve strains were used for the competition experiment (EcM2.1 is a
strain
optimized for MAGE - Escherichia coli MG1655 mutS_mut dnaG_Q576A exoX_mut
xonA_mut xseA_mut 1255700::tolQRA Δ(ybhB-bioAB)::[λcI857 N(cro-ea59)::tetR-bla]).
Liquid culture medium consisted of the Lennox formulation of Lysogeny broth
(LBL;
1% w/v bacto tryptone, 0.5% w/v yeast extract, 0.5% w/v sodium chloride) with
appropriate
selective agents: carbenicillin (50 µg/mL). Solid culture medium consisted
of LBL autoclaved
with 1.5% w/v Bacto agar (Thermo Fisher Scientific Inc.), containing the same
concentrations of antibiotics as necessary.
Experiment Setup
The recombineering experiments using the EcM2.1 strain were carried out as
described previously, and under the same conditions for all competition
experiments.
Depending on the experiment, the total oligo pool was adjusted to a maximum of
5 µM.
After transformation of the oligos, cells were taken out at 1, 3, 5, 7 and 24
hrs to be
sequenced. Dilutions were performed so as to maintain cells in constant log
phase. At each
timepoint, cells were plated on permissive media so as to count the number of
cells present in
the pools. Based on these numbers, we were able to compute the number of
doublings
between each timepoint.
Timepoint    # of Doublings
1 hr         1
3 hr         3
5 hr         7
7 hr         10
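The number of doublings between two timepoints can be estimated from plate counts once any intervening dilution is accounted for. The sketch below shows one way to do this; the formula assumes exponential growth between samples, and the counts and dilution factor are hypothetical rather than taken from the experiment.

```python
import math


def doublings_between(count_start, count_end, dilution_factor=1.0):
    """Doublings between two timepoints.

    count_start, count_end: viable cell counts (e.g. CFU/mL) at each timepoint.
    dilution_factor: fold dilution applied to the culture between the two
    samples (1.0 means the culture was not diluted).
    """
    return math.log2(count_end * dilution_factor / count_start)


# Hypothetical case: the culture was diluted 4-fold after the first sample and
# had returned to the same density by the second sample, i.e. two doublings.
n = doublings_between(2.0e8, 2.0e8, dilution_factor=4.0)
```

Summing these values over successive intervals gives cumulative doublings of the kind tabulated above.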
Sequencing
Each population was amplified and barcoded with Illumina P5 and P7 primers,
pooled, and sequenced on a MiSeq or NextSeq using a PE-150 kit. Reads were
demultiplexed and aligned to the reference genome, and frequencies of each codon were
computed for
each sub-experiment.
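Once reads are aligned, counting codon frequencies at a target position is straightforward. The following minimal sketch assumes reads have already been aligned so that string index i corresponds to reference position i; the function name and the toy reads are illustrative only.

```python
from collections import Counter


def codon_frequencies(aligned_reads, codon_start):
    """Fraction of reads carrying each codon at a 0-based reference position.

    Reads too short to cover the whole codon are skipped.
    """
    counts = Counter(read[codon_start:codon_start + 3]
                     for read in aligned_reads
                     if len(read) >= codon_start + 3)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical reads covering a start codon followed by the target codon.
reads = ["ATGCGTAAA", "ATGCGTAAA", "ATGAGAAAA", "ATGCGCAAA", "ATGC"]
freqs = codon_frequencies(reads, codon_start=3)
```

In practice a real pipeline would also filter on base quality and handle indels, which this sketch omits.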
Estimating Relative Allele Fitness and Scoring
For each sub-experiment, the relative frequency of each codon was calculated.
Then
the fractions were normalized relative to the fraction at the first timepoint.
Then, for each
codon, the fitness was inferred by fitting a logarithmic function to the codon
fraction across
all time points and taking the decay constant as a measure of fitness. The mRNA
structure
deviation and RBS strength deviation were computed using GETK and scores were
compared
to empirically measured fitness.
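One plausible reading of the fitting step is a log-linear regression: the codon fraction is normalized to its first timepoint, its natural logarithm is regressed against elapsed doublings, and the slope is taken as the rate constant. The sketch below implements that reading with ordinary least squares; the choice of doublings as the x-axis and the example fractions are assumptions, not details given in the text.

```python
import math


def fit_rate_constant(doublings, fractions):
    """Least-squares slope of ln(fraction / fraction[0]) versus doublings.

    All fractions must be greater than zero. A negative slope means the
    codon is being depleted from the population.
    """
    y = [math.log(f / fractions[0]) for f in fractions]
    n = len(doublings)
    mean_x = sum(doublings) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in doublings)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(doublings, y))
    return sxy / sxx


# Hypothetical codon fractions that halve every 5 doublings.
doublings = [0, 5, 10]
fractions = [0.20, 0.10, 0.05]
k = fit_rate_constant(doublings, fractions)
```

A codon whose fraction stays constant yields a slope of zero, so more negative values correspond to larger fitness costs.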
TABLES
Table 1. Genome Design Rules - Biological Constraints

Rule A: Fix gene overlaps. Perform minimal synonymous codon swaps required to
properly recode both overlapping genes.
  Motivation: Forbidden codons may fall in the overlapping region of two genes.
  Sometimes it may be possible to remove forbidden codons through synonymous
  swaps alone. In other cases, in order to avoid introducing nonsynonymous
  mutations or disrupting regulatory motifs such as ribosome binding sites
  (RBS), it is necessary to separate the genes first so that codons in each
  gene can be replaced independently.
  Implementation: Use synonymous codon swaps (Genbank annotation: adj_base_ov)
  to avoid introducing non-synonymous changes in overlapping genes. Use
  computational RBS motif strength prediction to maintain RBS motif. In short
  gene overlaps, attempt to minimize editing, for example reduce 4 nucleotide
  overlap to 1 nucleotide (see FIG. 9A (i)).

Rule A (continued): If necessary, separate by duplicating overlapping regions
[202 instances].
  Implementation: If minimal overlap fix does not preserve RBS motif, separate
  the overlap by copying the overlapping sequence and 15-20 base pairs
  upstream, to preserve native RBS (see FIG. 9A (ii)). Genbank annotation:
  fix_overlap.

Rule A (continued): Reduce homology between duplicated regions through
non-disruptive shuffling of copied region.
  Motivation: To separate overlapping genes, the sequences are duplicated,
  creating two tandem paralogous regions. These two paralogs have the potential
  to recombine spontaneously, which could cause a disruptive change in either
  the upstream or downstream gene. This spontaneous recombination was prevented
  by shuffling the codons of the upstream paralog, thus maintaining the native
  nucleotide sequence of the N-terminus of the downstream gene and 15-20 bases
  upstream. This region has been shown to be important for mRNA folding and
  translation initiation.
  Implementation: Perform synonymous codon swaps in copied regions to reduce
  homology while maintaining regulatory motifs. (Genbank annotation:
  adj_base_ov)

Rule B: Preserve 5-prime mRNA secondary structure of genes.
  Motivation: Gene expression is affected by mRNA secondary structure.
  Implementation: Use thermodynamics-based secondary structure prediction to
  compare mRNA free energy (ΔG) of wild-type and recoded sequence. Minimize ΔG
  change across 40-bp windows centered at modified codons.

Rule B (continued): Preserve GC content.
  Motivation: Related to DNA stability, mRNA secondary structure.
  Implementation: Maintain GC content when choosing among alternative codons.
  Minimize ΔGC across 40 base pair windows centered at modified codons.

Rule B (continued): Rebalance codon usage.
  Motivation: Preserve codon usage bias for remaining 57 codons in order to
  preserve expression dynamics that are dependent on aa-tRNA availability.
  Implementation: Ensure selection of alternate codons is consistent with
  global distribution of codon choice; both for recoding and heterologous
  expression.
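The GC-content rule in Table 1 can be illustrated with a short sketch that scores candidate replacement codons by the change in GC fraction within a 40-base window centered on the modified codon. The windowing convention (centered on the middle base of the codon and clipped at the sequence start), the helper names, and the toy sequence are assumptions for illustration and are not taken from GETK.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def window(seq, codon_start, size=40):
    """Window of up to `size` bases centered on the codon at codon_start."""
    center = codon_start + 1          # middle base of the codon (0-based)
    left = max(0, center - size // 2)
    return seq[left:left + size]


def recode(seq, codon_start, new_codon):
    """Return seq with the codon at codon_start replaced by new_codon."""
    return seq[:codon_start] + new_codon + seq[codon_start + 3:]


def delta_gc(wild_type, recoded, codon_start, size=40):
    """Absolute GC-content change in the window around a modified codon."""
    return abs(gc_fraction(window(recoded, codon_start, size))
               - gc_fraction(window(wild_type, codon_start, size)))


# Hypothetical 42-nt sequence with a single AGA codon at position 21.
wt = "ATG" * 7 + "AGA" + "ATG" * 6
best = min(["CGT", "CGC"],
           key=lambda c: delta_gc(wt, recode(wt, 21, c), 21))
```

In this toy case CGT changes the window GC fraction by 0.025 and CGC by 0.05, so CGT is preferred. A fuller implementation would weigh this score together with the ΔG and RBS predictions.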
Table 2. Genome Design Rules - Synthesis Constraints

Rule C: Remove repetitive (REP) sequences [132 instances].
  Motivation: REP regions were found to be over-enriched in DNA fragments that
  failed the repetitiveness metric for commercial synthesis and/or failed
  during synthesis. Hypothesizing that these REP elements were used as
  transcriptional terminators, it was tested whether they could be replaced
  with synthetic terminator sequences (data not shown). It was found that REP
  sequences could be replaced with synthetic transcriptional terminators with
  no measurable effect.
  Implementation: Replace each REP sequence with unique terminator sequence
  drawn from orthogonal set. Note that not all REPs were deleted as some were
  tolerated for DNA synthesis. Genbank annotation: rep_to_term.

Rule D: Remove restriction sites needed for synthesis [AarI: 972 instances,
BsaI: 182 instances, BsmBI: 954 instances].
  Motivation: DNA synthesis vendor constraint.
  Implementation: Disruption of restriction enzyme motifs using synonymous
  codon swaps. (Genbank annotation: adj_base_RE) Preserve functional RNA (e.g.
  rRNA) secondary structure when necessary. If outside of coding regions,
  change single nucleotides to avoid disrupting annotated regulatory motifs.
  (Genbank annotation: adj_base_RNA)

Rule E: Remove homopolymer runs [158 instances].
  Motivation: DNA synthesis vendor constraint: remove sequence of more than 8
  consecutive A, C, T or more than 5 consecutive G.
  Implementation: In coding sequence, synonymous codon swaps were performed. In
  intergenic sequence, minimal nucleotide changes were performed that avoid
  disrupting annotated regulatory motifs. (Genbank annotation: adj_base_hp)

Rule NA: Rebalance GC content extremes.
  Motivation: DNA synthesis vendor constraint: 0.30 < GC < 0.75.
  Implementation: If coding sequence contains very high/low GC content, use
  synonymous codon swaps to normalize GC content. (Genbank annotation:
  adj_base_GC) If intergenic sequence contains high/low GC content, introduce
  minimal nucleotide changes to avoid disrupting annotated regulatory motifs.
  (Genbank annotation: adj_base_GC)

Rule F: Partition genome into 87 50-kb "segments" at operon boundaries.
  Motivation: Splitting of operons was avoided so that segments remain modular
  and can be redesigned independent of each other.
  Implementation: Allow 5 kb variability in segment size to find partitioning
  that keeps whole operons together. Genbank annotation: segment.

Rule G: Partition each "segment" into ~15 synthesis-compatible fragments of
2-4 kb with 50 bp overlaps between adjacent fragments.
  Motivation: 2-4 kb was used as the primary synthesis unit, as offered by
  vendors. 50 bp overlaps enable homologous-recombination based assembly in S.
  cerevisiae.
  Implementation: Choose partitioning to minimize secondary structure at 50
  base pair overlaps to maximize success rate in yeast assembly. Genbank
  annotation: synthesis_frag.
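Several of the vendor constraints in Table 2 can be checked mechanically. The sketch below flags GC content outside the 0.30 to 0.75 range, homopolymer runs longer than the stated limits, and the recognition sequences of AarI (CACCTGC), BsaI (GGTCTC) and BsmBI (CGTCTC) on either strand. The function names and test sequences are hypothetical, and the check only detects violations; it does not perform the synonymous swaps that would repair them.

```python
import re

# Recognition sequences of the three type IIS enzymes named in Table 2.
RESTRICTION_SITES = {"AarI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# More than 8 consecutive A, C or T, or more than 5 consecutive G.
HOMOPOLYMER = re.compile(r"A{9,}|C{9,}|T{9,}|G{6,}")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def synthesis_violations(seq):
    """List the Table 2 vendor constraints that a sequence violates."""
    problems = []
    gc = sum(base in "GC" for base in seq) / len(seq)
    if not 0.30 < gc < 0.75:
        problems.append("GC content")
    if HOMOPOLYMER.search(seq):
        problems.append("homopolymer run")
    for enzyme, site in RESTRICTION_SITES.items():
        # A site on the opposite strand appears as its reverse complement.
        if site in seq or reverse_complement(site) in seq:
            problems.append(enzyme + " site")
    return problems


# Hypothetical fragments: one clean, one with a BsaI site and a run of ten A.
clean = synthesis_violations("ATGCGTACCGTTAGCGATCCTGAA")
flagged = synthesis_violations("ATGGGTCTCAAAAAAAAAAGC")
```

Running such a check on each 2-4 kb fragment before ordering would indicate which fragments still need synonymous swaps or intergenic edits.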
Table 3. Primers used for PCR of kanamycin cassette for chromosomal deletion.
Cassette    Forward primer    Reverse primer
Km.Deletrosa-seg0 eAA AAA A?...T. AIC . AAA T.L.I., -fa': AT A TAT ICC
CCA .A.A7 oak cAc Ace
AAA Ace OLT TAG IA TA ITT ITC CIG GAT A IC AGG C=CI AIC ICC T.TA T.TA
C.',AA AA:4.CW
ATC CTFCAACTC_AGC AAA AGTTC A.W.!. GAG CAT CAA ATC
KanDetetioi3,Segi CAA IA C( CCA GCC GGA A.A.A. ALA GTC AX ?AT AT
cr.rArrTAC ITT
CC,' ..: LA AAC: CAC CTT TAT ATT CiTG CM. AAC UT A1T GAT GM CAC; 'TrA TTA
GA.A. AAA CIC
ATC CIT CAA CTCAGC.A.AA AGT TC ATC GAG CAT CA.A.ATG
K2nDe ieton,seg2 AAA TAC: two CCACTGTGAAT'TJ. TCA CCG GOC ATT Ciai
T.CG TIT ATGCGC
CCC, = C1GC C-CG TAO AGTACG GOA CTt:2- AGC CSCO TC3C OCT C+AC.' TIT TTA TTA
GAA AAA CT.C.
KM. t7.17T CA.A CTC AGC AAA AGT TC ATCGAe CAT CAA Are
KanDeietiOn-Seg3 TACACC: takc AAA GC:COAT OGG CGT CTG A.4C 'MC MK:
CVO GAA GTA AC4
(ITC ATT TT C CAG ACT OCG CM' TAA CIG KEG crei CAA. Cli-5 OTC: TAG TrA
TIA 'CAA AAA.=
ATC t7.17T CAA CTC AL t AAA MT TC ATC: GAG CAT CAA ATG
KariDel:4iOrt-Seg4 AAA TCA AAA AAT TAC C.TG CIT CAC TC.I TFC Azke GA.Ci
CAA ITU. TAT ATT
TAT TCYGGT GAT AAA ATT CAC GAT CTG OTT ATO TAA CCA ACT OC:T TTA TTA CAA AAA CK
Alt CIT CAA CDC 2ACK:' AAA ACT TC ATC GAG CAT CAA AR's
KariDeietion-eieg5 TOC t 4AT TYA ATC TTC TCC: ATA CCT ACA akr. 'NT DK
Ci-CC ATI' COT AGO'
ATG AO C AAA AIT C1X.7'oor CiTACIC wa GAT A AO GCG TTC ACG ..... TTA GAA AAA
CTC
ATC ell CAA CTC ACC AA_A. AGT TC ATC GACi CAT CAA Ale
KanDelAion-seg6 IOC OAA COT IGTC. MA GIG TAT .C.iCEA . AAA ATC AGA
AAA ACT CACI
CCT ACA 'TOT TCA IGC c6 Are xi C.TC:- CAA ATC. CG NIG ACT TTC rrik rut akA
AAA CTC
ATC CTI CAA CTC AGC A.A1, AGT IV ATC. GAG CAT CAA ATC'
KanDeleterk-seg 7 GAA AGC MG AX TAA CCG CAC: TTG TCA CTC: TAA TGA TAA
TTA TTT GIT
*GA ACT Ca:: OCK: CFC ACO ICC OOK: CTC AAA IAA TEG TrT TAT TIC TrA Tr& CAA AAA
CTC
ATC =CAA CTC= AGC AAA AGT TC AIC GAG CAT CAA AM
K2nDelehortseg8 OGG A GT OCT GAA. GGA CzT.C.TCze AAA CCA. T AC:. CAC
CAA. CAC+ GCG 1T &L
OCCe GGC AAT reta TAT AAC CAA TOT CIO Mk AA AA O OCA CCT UGC, ITA ID A OAR
A.s..A CTC=
ATC CTFCAACTC_AGC AAA AGT IC A.W.1 k0 CAT CAA ATC
KanDelerit3SI-Seg4 TCA TCTCCA CIT TC:CC-CAAAT AK C:CO T.AC CCA ITO TAO'
CCC TGATA-A
TAT CIC C-CC ATT AAC COT 'ITC ACC CIO CAT OW ICA ACC ATC C,-CA TF.A. TTA
CAA. AAA CIC
ATC CET CAA CIC AGO A.AAAGTIC AM GAG CAT CA.A. AM
K2nDe ieton,segl. 0 GCC TAC .A.AC: CGG 'MC CGC ATC CAG CGC CAT OCA AGT
GCTOGA TAG GCT.
CC...5 CAA TTG.GT.0 CAC AATGCC TGA CFG TA.AOCT carrrA AOC TTA TTA GAA AAA CTC
A.TC: <TF CLA CR:. AOC AAA AGT TC ATC' OAG CAT CAA ATC
KanDeietion-seg 1 1 Ana-cc (KC AGA. COC CGC COC CAG ACA cca err TGT AGA
AAT TOT TIT
A.CG IGA. CAC CGT *Ca ACA CII AAT CIO ACA AAA ATG GTO ATG CAA TTA TTA CAA AAA
ere
ATC err CAA CTC AGC AAA AGTIU AT4.3 GAG CAT CAA ATC.
KArtneietit3G-Seg 1 2. A AT C.C.ICi CTF TO& AAA ersi+GGC CiTC A-AC GCC
TFA TCC GC=C CIA CAA AAT
TAT CAT CVC ACC CCOCC-C CGC AGA CFG CGC"FTAT TCA ATA I AT ms. TTA GAA AAA
C:TC:
Alt CTT CAA CDC AGC:' AAA. ACIT TC ATC GAG CAT CAA AR's
KariDeietion-eieg13 TTAC ATO CC4C TAC GCT TA.4. OGT AAC TTTAGT GAC ATr
TAT OTT
TAT cAc.ow..TAX ;YAG GAT &if GCA. CIO T.AA AAIGTG TGA cirri' /LTA TFA TTA
GA.A. AAA cpc
,s.ac ci-T c.s.A CTC ACC AA_A. AGT RS ATC GACi CAT CAA Alia
KanDelAion-segi 4 CGT C.TC. TIT TTA TCT TTA ATT CiCC OTT TAT C7CC GGA
TC-C GOC OTG AAC GCC.
AAC MA AAC TCCCTCTG ATC 1TA TCC GGC. CTA CAA A.CC TrA TTA GAA AAA CM
err CAA. CD:. ACC AAA iiKer. It AICCAO CAT CAA AK*
KanDeleterk-seg 1 5 CiK TTA TCA CiGC CTA CAT ITT GC< TAA ATC AIT CAC AM'
A_TC AAT TX
CT( CC A.siT A. TA IT ?LAT TTO UK CT .3 ATC: c.Tr Acr TM ATT MA ITA TTA ORA
AAA CM
ATC CT. T CAA CTC ACC AAA. AGT TC AT GAG CAT CAA ATO
Kalineletkon-segI6 CCGTAA CG TGTAAT AAC AT CIA. AGC CT T C:GA TCT CAA
AAG CAT TAT
Gle ACOCAG ACiC ACA AAT TAT ATT CM CAG ACT eAr ACC CIA TFA TTA TTA ilAA AAA CM
_AM CIT CAA:. CM Aoc A,s,A AGT IC Ale:GAG- CAT CAA AM
KanDellefion-seg17 ca.% Tee crc TOA AAG COT IC7 A_AA ACOOGT CAG AIC TGC
CACI AGT CAG
ACO ATA ATA ATGATA TCC. TrT CAA c:To COICAC MA CCA CAA TAA TFA TTA GAA. ,,k.A
ca:
AM CTT CAA. CTC. AGC AAA AGT TC AM GAG CAT CAA ATG
KalinektiOn-Segig OCt.s... CM ATA TM CCG CM CIG- AC7 CGC C7O AGA AAA.
CAG OGG 7. AA ATT
GCG CGT A.k4. C-CG A.ATAGT AAA TAA CTG COO' CGA ATO GCC Gai CT:VITA TTA &AA
AAA=
AM CT T CAA.CTC. AGC AAA AGT7.0 ATC GAG CAT CAA MG
KanDeletion-seRl 9 AAG ATA ACT AAA. CCA. CFG OCT AGA AAA. ATAACC MA TAA
MG- TAO ATC
TOA TAA ATA LkC:C oAA ToG coo' CAA CTO. TOC: OTC Trr ATC:CTO AAA MA TTA GAA
AAA CR:
.ATC (Tr CAA CTC AOC AAA ACT TC ATC CAC AT CAA ATG
KanDeietion-See0
,. cAc+ ICI TAT GAA TAT CGC AAT TrT M.'. AGT AAA AAA T-Ea
TCC ACC. GAG
CG COA ATA CCT CIG GM GTA GAG cm G7G MC Aak AAA A ACAAG TTATF:k GAA .AAA
&TO OTT CAA OTC ACK. AAA AGT7C CM ATC rme CAT:AA AEG
KanDelAip.ti-Segn. ATAT.:LA :W. ATA TIT cat:: MT A.ct-k TCO TFF:rizie
TOC CGT ATA TAT MC:
At.7Ã CCT TTTT Ct7C AM T.A.A. AAO &TT CIS C:AT TAT TCC CAT ITC TGC TTA
TTA GAA AAA Cle
ATC =CAA CTC AOC A..kA MT IC AM GAO CAT CAA ATG
Ka npeietiOn-Seg22 MT CAT G7A AAC CAA ACA GAG ACG MA ICT GIT COG TCG-
C:TA ABC: CAT
A.AT GIC iTi TCA GM CAT TCG CM/CM Tar' GCG CTC CTG CGG GAG TM TTA AA :AAA
CTC:
A.TC CITCAA CM: AGC AAA AGT TC AFC oetio CAT CAA Alta
KmErAelkon-seg23 CTO ATT TAC TGA GGG ICA AA. T TAC AGT GAC TIC TA AAA A
TT AM AGA.
AAA TAT .AGC: GGC: AGG AAA AAA GCG CM 777 TTC ACC; GTG CFG IAA TTA 'TTA G-sA
AAA CIC
ATC CIT CAA. CTC AGC AAA AGT TC AM:GAG-CAT CAA AM
KanDele,6031-Seg24 ATI TGC CGT OTGOI7 AGTOCC 'ITT TIT OCO CCC ACA TCA
TAA OGG ITO
1-7-T, ACA TCG. GTA AGO CIA C:-GGATT CM TCO CAA ATA TIC 'Mk AAT TTA TM CAA AAA
Cre
ATC OTT CAA. OTC AGC AAA AGT TC ATC GAG CAT CAA ATG
K2tiDelletion-seg25 err We TAC TAG TTA ACT AGT GCT GAA CM 7.71k ATA CAA
= GCGTGC:
TCG ATG Arr AAT TOT CAA. CAO CM:: CTG CAA =TIT ATC TIT TECk TTA TIA. GAA AAA
010
ATK: crT CAA= ACC AAA AGTTC ATCGAG CAT CAA ATO
KanDeletion-seg26 ATC CTG C-CA TGT MC MT TGA AAT CGC TGA CAC AAA CCG ATA
TTQACA
TTC TIC AAT CAC: ATC. ITT ATA AAT 010 TOO TCC AGO CCC MA AC& TIA TTA GAA
A/,:k CTC
ATC CTT CA.A.CIC AGO AAA ACTT IC ATC GAG CAT CAA AM
Kft ODeleti Of -.Se7 CCC AAA TGA ITC AW CCU CAO ACC: ATr GCC: TC-C GCA
ATG CTIG TTT TIG
OCT GAACGT AGO AG& GAT CC:A COI CTG TIT TM. TCT CICT ITA TAC n-A TIA GM AAA
crc:
Am CTT CAA CTC A.Gt. AAA AGT TC ATC GAG CAT CAA .ATG
Ka riDeletion-seg28 CiGG tier TIT ATC GTO ITT 'Gel TTA. TCC ACC AAA AAT
TCT TOC CGA ICC TCA
CCG CCA cioci col me. CCT CAA. CTO.s..TO TTA CCA. Ger GAC GIG ATA TIA TrA SAA
AAA. CTC
CIT CAA CM AGO AAA AGT TC AM GAO CAT CAA ATG
K2 naeletion-seg29 TOO CAT 'ITC: COO GIC TOT TM .AAT OTT A.A.G TAG MA TM
GIG CCC 000
TTG TM CCC GGC GTA MG AGT AAA CM' CG A TGT 010 Orr TrA ca: TFA TTA GAA AAA
:OTC
ATC =CAA CTC: AGO AAA AGTTC ATV eAo CAT CAA ATO
Klii.preieteri-Seg30 CAC err AGA .=,...CG COG GAT AAA CTG GOC COT WC GGI
OAA OGC TAT GCC
CAC MA TA.A TIG KT TOG ACE 010010 TOT oGr CIA All MG MA TTA ITA CiA.A AAA ere
ATC C:IT CAA OTC AGO AAA AGTTC MO GAG-CAT CAA MG
KanDellefien-segal TOG CAA CTT GAG CA-A GOA CCA AAC A.AC TCA COC AAC Mai
CAA ACO ATT
COG CAA GGT ACG CIG GOZ TCTTAA OM TAO TOG TCC4TAT TTC:AAG TIA TTA GM AAA 010
ATC OTT CAA.C.TC. AGC ALA AGTT.0 ATC GAG CAT CAA MG
ATA GTA AGT GAC TGG GG7 GAA TGC OTT TGA.C.GA7CT ATT SOT
ATAAAT
Ceti AM TAG CCC: CAG CAC ATG CAA CFG MC TGA TOT Trr TTC ITT TTA TTA CAA AAA
CTO
ATK: crT CAA= ACC AAA AGTTC ATCGAG CAT CAA ATG
Kne1etiOU-See:3 ATC ATG ATTAGC AAA ACT TAA IGA ACT MAGIC TCAGAC CM
11100-C
CCA ITT TM AAT AAA TM. ACA Air= COG MA TO."..s- CTC TOG AAT TTA =TTAGAA AAA CM
. .6, TC OTT CAA CTC AGO AAA AGT TC MC GAQ CAT CAA ATG
KarDeletion-see...1 MG:3TC TC4T TAC AG& TTG AIG CTT Ta} taws. TTC A. CT
TCT CT.T TAC4GGT
GAA. @GC WO CCG CAA AAA akectiA AAT TAA TAG CCOTTA ACT TT.% TTA GAA
AAA CTO
CTG ATC CTT CAA CYO. Ac4c. AAA Afir TC: ATC GAG CAT CAA ATG
Ka rDeletion-seg35 AM.CAA TGA ATA. AAA AGT TAT car ACA GCG CGC: TA CCA
TAC AAA CM
AIC ACT =MT CAT AAA ACA arc Car, OCT TTA AAA TOG CCG Azre TV,. TM eAA AAA
C:TC:
Air tn. CAA CIC ACC AAA. ACT IC AT:: BAG CAT CA. A AM
KanDedetion-seg36 <ice, ATC ITC MT t I I CTG AAT TIG ACC AAT GGC
GTO ACC AC:A OG1 ATC TIT
.:CA CCT ATC AIA CAC AGG TGC CIG ATC CTC TOT IGG Ci.'33 TAT TGT YEA TTA GAA
AAA CTC
(..= CAA OTT: AGO AAA AGT TO ATC GAG CAT CAA AT&
KanDei etion-se IAA TAA Gcne,Ac C CO CAT TGA ATA ACC 'MA CAT TAT (:TA
AT.T AAA
'' 07 OTT AA.C: CAA. IAA COG ATT CCA TAC CIO. AGT WI .A AT AAT AAA.
ACA TTA TTA GA A AAA CTC
ATC (TT CAA CIO AGO AAA AGT IC ATC GAG OAT CAA AM
E¶."122Deleticli-Seg38 ____________ ACA.ATATIT AAI ATA GIG a CI .GIG AAA
AGO OW TAG ATA grA CCA AAT
CCA CAT GCG ATA UT CTT AAA TAA. CTG 000 AAA MG TTA. AGT AAG TTA TTA CAA AAA
010
ATC CTTCA.A CIO AGC AAA AGT TC Aix!: GAG CAT CAA ATG
KanDeletion-se g 3 9 GAT AAA CC A ICA GC:T GAT A.GT .AAT tacm-roc CGA
GM' AAC AGE: Cite
T. TA CCTGAA. GAA TAT AGA CAA GTACT0 ATA ACA ACA ATT AAA GCC T. TA TTA aka.
AAA. CTC
.:VIC CTT CAA cat .Aoc AAA AGT TC ATC GAG CAT CAA ATG
KanDeleion-seg40 CTI TIT. AAA .ATT 001 101 ICC ATG OGG TAT WA OCT ATG
GOT Alt ITC TOT
Cre OGT AAC OCT cc:A akA.A_AC CIG ATC ACC CAA MC ITT TAA CAG TTA TTA GA-A AAA
ere
CTT CAA C:TC AGC AAA AGT Tc .ATOG,AKE CAT CAA ATG
KparDelehon-seg4 I AGA. ACC AGA TM AM CAT TGA TOT CCC 110 ITT GAA 110
AAA AGT CrA
GOT TIC ATC.OTA TGA. LAT TAA TTG CTG 000100 AAA 'GM Iliti CiCT ITA TTA akA
:,s.AA Mr
ATC CTT CAA OTC AGC AAA AGT TC ATC GAG CAT CAA ATG
KanDeletion-seg42 ITT PIA COG GCA. cAe CCA AAC GAGGTA ATT CAG WO TAA.
ICA ACA ACC:
rrr Acc-: GM CCC TAATAC GAC AAA CM CIT GTC TAT AGT TAG MA TM TTA GAA AAA. CTC
.A.TO CTT CAA C:TC AGO AAA AGT TC MC GAG CAT CAA MG
f,:ar,LIjeletion-seg43 ACC AAA. CIG ATI ACTA CAT TOT
TIC AAC. CGC' : TAT ACC:MC TAT CTT
t--ITE TOT CCA ITT 000. IAA AAC CIG 010 OTT CAG GAO. AAT .AAT GCA. TTA TTA.
GAA AAA CM
:MC (TI CAA CIC AGO AAA AGT IC =GAG CAT (A.A. AM
KanDeletIon-seg44 TGA Kat Ck.k CAG TAA CAT TCA AAA MC AGO GAT MT ACC OAT
OAT PTA
ACG TrA AAT ATV TrA MA AGA COT (TO TAG TIT CAA OTT GOOACT TrA .TTA CAA AAA CTC
ATC CU CAA C.IO AGO AAA AGT TC ATC GAG CAT CAA MG
KanDeletin,;-se,-,d'S TM CAA TAO AAT TCT TM.: ock:
ca.7 'tX:G COG 010 000 GAA GCA TAA AAA
IGT AGG ATT AGT. AAG AW . ACT TAI CIG A_AI i3C1C GCC: GAT WC: CGC TEA TIA GA A
AAA CTC
ATC CII CAA CIC ACC AAA ACI IC ATCGAG CAT CAA AM
KanDeletiori-seg46 cce aiT _AT.0 CGC MC AAG AAG .CTI GAC TM (TI CAC TOT
AOC: WC AAC.i
MA 1-3L. TM CC:G CAG TOA AAA .A TO (TO GTA Oak OCC A-AT CGT C4CiA ITA "ETA GAA
AAA. CTC
ATC C. TTCAA CITL' AGO AAA AGT TO ATc.c.,Ae CAT CAA fil'e
KanDektion-seR47 TTC AGT ATA AAA GS& (MOAT .AGT COA. TAG TAA CCC WO CTT
COG 00k
-
&AT TEA CAT TAA 010 CU ITT ITC CT0 TAG CAA GCA Try 11100k ITA TEA GAA
AAA CIO
ATC CU CAA C.IO AGO AAA AGT TC ATC GAG CAT CAA MG
KanDeletim-seg48 ea': anG CCA T.TA TAC: AGA WG .CTA TTA ACT OTA ATA ITT
ake (0-0 CAC
TM COG Arr CiCA TOT ACC CCT TTT cre OCCi CTG COO CIO ATC ACA TTA YEA GAA AAA
CM
ATC CTDCAA Cit AGC AP,A AftTTC ATC.G.'AG CAT CAA ATG
KarDelettm-seg49 OCT CCTGTA 000 11111k Ilk GOT GCATCC AGA AAO TAA CAA
TAG COA
ACA ACG OCT TAT TOT AM TAT Trr CM ACA GAO AAA AAG.AAT .ACG TTA PTA GAA AAA
ATC (TT CAA CTC AOC AAA AGT TC: CIO ATC GAG CAT CA.A ATG.
L'3.1.-Deletkon-seg50 CAA CC:C: WI OCT CiTA. Coo GOT
CAA. ATC GCC: CGA An Itc C:GT GAT MA
TN i ITT TIT GEA WC CAC= TIG CPO AGO GOT GAG AGC A.AA TCOITA TrA GA A AAA CTC
:VIC (Tr CAA cm' AGO AAA. AGT TC ATC GAG CAT CAA &TO
Kainelehon.-seg5 1 TIC AGO COT ITT TIC GCT ATC: ITT GCG GTO AATAAT GM
OAT akr OTC GAA
GAO AAA AAA TAT CAA en-.1.01010.ATC .AIG ACA 00000k CAC CGC TTA TTA.Cks-k AAA
010
CTT CAA CTC AGO: AAA AGI TO: .ATC GAG CAT CAA MG
K-inDeleticn-eg.52 ITT ATT. GIT A.TT .AAA CiAG ATT = ACC OTT CTG CiCC
"'Roc GGA C-TT GIA CAW
AAG crA AAG ATG AAT T.TC GTC CTGATG eTGATA ACA CGC OTC AA0-17.4µ.. T. TA' CiA-
A. AAA csrc
cm, CAA ore ACC AAA AGT TC: AD:: CAG CAT CAA ATG
Kar.Dele.h00-Seg53 TIC TAG GCC GCA COI:. CAC AM CAA CIAA GAA :AAA TTC:
CCC TPT COT TAT
COA CAT -ICA CM CCT GAT CCG AGG CIO CAA CAA TAA TITACGTAC rm. '''''' GA A AAA.
CEG
AT:: CTT CAA CIG AOC AAA AOT TC ATC OAG CAT eflut AM.
KaaDelthon-seg54 AAT COW OCIC GAA AAT CAC CAT TAT. CTA CCC CIC TAT MG
TXXI OTT ACT
AAA ACGGCT C.,T CAI GGT CC-I-ACC CTO CGT TOC.A.A.A. cer T,.....c COG ITATIA.
CAA AAA CPC
ATC: CIT CAA C.TC AGC A AA AGT TC: ATC OAG CAT C:A A ATG
KaiaDektion-seg55 ATC ACA LAG GAA ATA mc CTG Ca': GAT MT OCT GM OCT =
GC& OCT
ACC AOC: AGT CAG AGA CAT AAC TOG CT ACT CITA ICA GAA TM. CU: ITA TEA GAR AAA
CM
.AIT: CFI CAA CM:: AOC AAA. AGT TC ATC CiAG CAI CA.A ATG
KanDeletion-seg56 TAT OCT CAC TCA ITT GAT CCA CCA. TAG 3.i..b:FIA ACT
OTT 11A. GAC ITA
TTA TGC CTTCHO CCG TGA CTAcre ATA AAA -r.4_Lt. rrr GAO err rrA TIA
Calt AAA CTC
ATC CM CAA CTC ACC AAA AGT TC: AT.0 GAG CAT CAA ATG
KanDe14ion-seg57 ca..: OCT CAC CAA GOC .AAA CCC AGA. OCT ICC GOC:
RTC:CA 'Ka TGA TOT
A-TA GACATO TGG TCA. GAC ATA GCCi Cle CCT TAT ATT MG CAT TCCTTA ITA. GM. AAA
crit
Alt 3= CAA inr. AGC AAA ACT TC ATC CAG CAT CAA ATO
K2iDe1tion,seg5.6 ATI TAT TCG CCT :XX GTC CCC ITA.C.M CAA TIC CTO CTGµ
CTT TOT AAA
0..-4:: GTT OTT ACT CT T CCT MT TCA (TO GCA C.-CKi C..C. CCT ITT TIC TTA
VIA GAIA AAA CTC:
ATC CET CAA SITC:.AUC .3AA ACT TC ATC: CAC CAT CAA AT&
KaaDeletion-seg59 iriGA.c*AA AQ c cm UTG TAT ACT GAT TAT fiGe GAG CAA
OUCCAC.i AIA AAC
cur cAc: CCT TAT AAA ACLI: CCC =CM CCC ACGTIT -MG CGA TCG TIA TIA CAA AAA CT.0
ATC CM CA.A CM AGC A AA AGT TC: ATC OAG CAT CAA ATG
KmDeletion-seg60 AAC AAC: CCO TAG GCC: CGA CAA T.:U. A.GA AAC: CA G GGT
GM ATC OTC: IOC
CAT GC:G CCA CCA TCG CAT CCG OCA CTO GM GCA TOT IAA COT (AG ITA rrA .G.A.A. A
AA CTC:
ATC: CFI CAA ca: AGC AAA ACT TC AM GAG CAT CAA ADO
K21iDeietio13seg6 1 AM: CC G ATO ACT eTT Icc: Krr AAJA CAA-MC:CM TrA
Acorn .ICTDIA.
ecT urr. CEC: TIT TAT ACT CiTO- OCC: GIG OCT TCTTC1GAA Ai3T Cii,-A TIA TTA
GAA AAA CIC
ATC OTT CAA CTC ACC AAA AGT TC AT.0 GAG CAT CAA ATG
KanDel4ion-seg62 CIO AAA -ICC TM: TCA. ATC AAC: TOC TGA TOG CCA AAGTCC
CTC ACC: AGT
GTC ATT TOT ACA. ITT TOT 000 CTT (TO rfc CAC MC. AAT A A_A. cs-3-T TTA
TrA &IA .LAA CIC
ATC CTI CAA CM. AGC AAA AC-lit ATC GAG CAT CAA AM*
itialipeia100,.Se.g63 TGA CGA CGCOGA CAA. Car' GAA COG TIO.A.GC TOG CD.
GAT TAG CCA CCC
ocT AAA TAC .AGA CAA. CiTC ATA GA.A. GIG AAT. m TIG -TAT OTC TOT- TTA TIA CAA
AAA C:TC:
AFC. Cal- CAA. CTC: AOC AAA AGT TC Art:: CAC CAT CAA ATI:*
KanDeletion-seg64 TIC- CAT OCT akA AAG CCT: GT: _____ ACT GAA COG TCC
t..1..: i- CAM 0:X: TIT eco
TIC AGO ATA CTC AAA TOG AAA CC-C CTG GAO AGO OTT AGO UM AGO TTA. TEA GAA. AAA
ATC CTT CAA CTC ACC AAA AGT TC CTC: AM OAG CAT-CA-AM*
KanDeletion-seg65 CAT CCG WC ATO GTG GCC COI AR: TAA AAA GAT. GAT= AAT
AAA TCT
TGA Arr TEA CAT CCC: GTA COT TGC CIO All AAG .A AT GAG- .ATG GAG 'TA ITA
GAA AAA. CM
ATC CTI CAA CIC AEC A AA AGT TC: Ate. GAG CAT CAA MG
KanDeletion-seg66 IAA GTA AAC CiAC RCP. AAC ACE. OCT AI.A AAG CAA CCC
OCT TIC TCA OCT
TIC ATA.AOT AAA ATA ICC ACT GIG CTC TTGIAG &Xi AAC AAT AAG TTA TTA GAA AAA C1C
. Arc crrr CAA CTC: AOC AA AAGT IC Arc GAG CAT CAA ATG
KanDeletion-seg67 TIC AAC TOGTOO (CT OW AOA CAC IT.GGCA TOT =TTO GOA ITA
TIA ACiC
AGA AAT rrA AAG TTA AAA. AAT .AAC CIO TGis.L. CAA TIC ATA CGATTA 'rrA TIA GAA
AAA CTC:
AM CIT CAA C.TC AGC A AA AGT TC: ATC OAG CAT CAA ATG
KanDeletion-seg68 CAT ICU TCA _ICA An TGA ACA OTA _ACC* CTA AACI TCT
C1.1 ITC AAA OTT
ACA. CAA. TAG TC/A. CCC: ACA TTC CCO CT GCA Tn.' TIC TAA &TT TGT TrA. TIAGAA.
AAA. Cif:
.Alt CIT CAA. CFC: AGO AAA ACT IC ATC GAG CAT CAA ATO
KanDeletion-seg69 ACC ACA CCA AA0 CGA AAA ACT TIT TEC AAC: TAT Cit. TOT
AAC: CC1 TOG
OM COG AAA GAG TOT ea. TGA AGC Cm CM TAIL ATT COT GAT AOC: TTA TA GAA AAA CTC
.ATC C1-*Int CAA Ca-. AOC AAA ACT TC ATG. GAG CAT CAA AM*
KanDeletion-seg70 ACCi TOA CIGGCG A_AA TCTTCG ITT ATT GTCGGC AGT WC
A.Ci..s.. AGT LAT
, CCA car GGT .AAC AGC ITT ACE ACP,. CYG TCA TGC GCG CCO GAT
GGC TTA TTA GAA AAA CTC
ATC CT CAA CTC AGC.A.AA AGI IC .ATC'. GAG CAT CAA AM
KanDeletion-seg71 AGG CGC TGA MG CGA AGT TAG AC ATC GTT OTC CCA TGG-
AGC TGA.TGA
CGT AC<:: OTT TAT GCC CGA laza TAT CM CCA. ICC TGC GOT CAC GIG TEA TIA G..s,A
AAA OTC
ATC CIT CAA. CTC AGC AAA AGTTC AM'. ciAci CAT CAA ATG.
KanDeletion-seg72 CCG CT0 GC& ACG- COG AT0 TCC &AG CAC CTT ANT TAT CGT
CC4:: All CAG
CAT CAG ()Cie CAG CCCGTTTAA GCO CT G AAC AGICTC GAT GCG AIG TTA TEA CA-k ALA.
CTC
ATC CIT CAA ca: Aoc AAA AGT TC ATC GAG CAT CAA ATG
KanDeletion-seg73 CGT CT.A. AAC ATA ATA TCC CET ATICTTMA caa AGC TAG-
ITA.Tai ,XiC
TAT CiGT GCA AAG AAA GAA TTA ACG CRT' GGA GTA TIA GTT ACG CTT TrA .TTA CAA AAA
CTC:
ATC CIT CAA CM AOC AAA AGT TC ATC GAG CAT C:s,A ATG
KanDeletion-seg74 AAT TATJTG TC:CT MA TGA ITT Gars TGA AAC AGT CAG
77.__CCG CEA AA
AAA IGT 7Tr OTT TTA CAC TCT OTC: CIO TIC CAT GEC GGA TAA GCC .TTA
TrA.:GAA AAA CTC
ATC en' CAA CIK: AOC: AAA. AGT TC ATC GAG CAT CAA ATCi
KanDeletion-seg75 CGC: TAT TAC: A.W AAT ATI' Tyr TAc: ATT TCA TAG TGA
TGC TCC TIA C:TC
CT AT GAA CGT GCCGGIA AAG CGA CT. G TTG AGA CAG ACA COT TAG TIA TTA CAA AAA
CTC
ATC CIT C.'...,,k CD:: A03:: W. AOT TC ATC ako CAT CAA ATCi
KanDeletion-seg76 ATC AGA TIC ACC GAI ATC 0CC cfro AAC ATA ATA .4-kT QA
A-A2k AGA AA-xl
7 III ATT GIG GGA. TTG. M.X. CM CIO CGC CC AC ACO CAT ITT. ITA TTA
GAA..AAA C.TC
ATC CTT CAA CTC A.GC AAA AGT IC ATC GAG CAT C.A.A AIG
KanDeletion-seg77 GCA GGA crr ATI" CAT"ITC MO AAA TCA GOG AAG ATCS AAA
AAA err CAO
AAT TTT ATT ATT TTA TIT ATA AAC Cal GAT. GGT A AG AAA AAG A_A.A TTA ?TA.
GAA AAA
ATC CTT CAA Clt AGC AAA ACT lc OTC ATC GAG CAT CAA ATE
KanDeletion-seg78 ATO OTT AOT TTA TAT ITG CAC COT AT T AOC TIT TCG (CAT
TAT ACC ClX
-. TCC COT T31.14-CTT TGC ATA CM GAT CIG TCA ACA GAG CCTOIC
TCA TTA TTA_ GAA AAA CTC
Ale T CAA crc AGC AAA. ACT' IC ATC GAG CAT CAA ATO.
KanDeletion-seg79 ACC AGA .ACC MGM': ATc Aur CAC n-r TAT TAA CTC AC-C
ATT ATI: ITT
GAT TIT crr TOT CAT LAT CAT TGC CTG AAA CAT CAA ACC ACT TAA TTA TTA Oks.
AAA CTC
ATC CET CAA cm: ACC AAA AGITU AT GAG CAT CA.A. AT&
KanDeletion-seg80 CCO TAA ..6,AG ITT CC:-G MG ..AAT AGA AAC ACA GIT AAA
AAT TGC AAA A_GA
GAG ATC -LTG cak TTT MT TAA IAA CIG TTT TIT AGA ccrCK.A OAA =TTAITA
eA.A.AAA c.:Tc
ATC CIT CAA CIC AGO AAA A:01'TC ATV tam:: CAT cAA ATe
KanDeletion-seg81 AA.T AAA TOC GTO AAA. AAC TIT CAC CCT AAC CET- CIC
CCC ma ',lice caco
ACI=TGC AM Aak ACT ICA TACTIC: CTe....= AGGGGA CCG ATT GM CTC TTA ITA.GAA AAA
CTC
ATC CTT CAA CTC AGC AAA AGTTC. AT GAG CAT CAA ATG
KanDeletion-seg82 CAC CCC AAT OGG GAG AOC: GAG CAT TOT AAA C:AT TesA
.ATG TrT. AR=
.AAA ACC, AG.::' CC A ATA TT3: A.4.T ATC CR= TKA Tak TAT f:AA c7r ei".:GTTA
.1TA GAA AAA Ca'
ATC. CTT CAA CTC AGC AAA AG= ATC GAG CAT CAA ATG
KanDeletion-seg83 'ITT CM TAAG TO .ei.GA ACT TGA AAT CAC CGT TTG CTT AAA
AAT GGA TTC
ecar ITT ITA TTA ACA CAT C:Mi GAT CTO TAG CAT. C.00 ITT TTC MIA ITATTA GAA
A-kel.C:TC
ATC: CTT CAA CTC AOC AAA ACT IC ATC OAG CAT CAA ATG
KanDeletion-seg84 AA.:-.: AGA CIGATC GAG GIS:: All AAT .LAG TIC ITC TOG
COT AAI .4.AC CCT
TIT GAG- TGC AAA. A.A6 TC. TOT AAC: CTO GAA ale' C.:CO .GCT TCG OTT TTA TTA
ciAA AAA CTC
ATC CIT CAA CTC AOC AAA AGT TC ATC GAG CAT CAA ATG
KanDeletion-seg85 CAA TAA. GOT GIG TIT ATT TAT ITT nT. TAT TTC TAC TGA
TAA GAA TTA
CGC ea:G. CAT AAA AAA ACC (TT ACT CTG CAA WC ACA TCA CGT TAT 'MITA avi AAA CTC
ATC CTT CAA CIC AOC AAA AGT IC ATC OAG CAT CA_k ATG
KanDeletion-seg86 GTO ATE. AAG ATC AMICA GAA ATC CAC ACA GAG AC A TAT.
T.GC CCO. rirta
AA.T TGITAC ATT .ACT ATG TTA CGC.CIG CAL TCA OA WA AA_A GCT TTA TTA CAA AAA
CTC=
ATC CTT CAA CTC: AOC AAA ACT IC ATC GAG CAT CAA MG
Table 4. MASC primers used for analysis of recoded segments.
Primer Sequence
mAsPCR-seg00.1..Recoded cA.:Asic.T.kGA.C:f_AAGGCATGT-C.A
121.PCR-seggI.1.. Reverse CGAIATTTTCCCGTGGITCTGAC
mAsPCR-seg00. I Wad-Type CAAGTTAGACGAAGGCATCT'AG.f
raAsP=CR-ses,.,00:2...R.ertrAed (...G.A(...:C:ATGGCGATCTIC.AGC:
imksPCR-seg0Ø2.. ResTrse TTCCAGGTATTACGC.ACIAAATTGTTC
mAsPCR7seROO:2..Wild-Type CGACCATGGCGATTTACAGT
tnAsPCR-seg.00.3..Recaded CTTACCGCGCAAPsATITCATCTC:..A
tnAsPCR-segg.I.3.. .Reveae TrITTACGCAGCACTACITGTATATGG
mAsPCR-segO. 3.. Wiki-Type ITAACCGC:GCAA-AATTTCATCAGC
inAsPCR-seg00.4.. Receded CCIGTI7TCACACTACC.: 'GTTCA
mAsPCR-seg00.4..Reve-rse TTAATTIGCAT.AGACCGTITTCAGAGT.
rfLk,sPCR-00.4. .Wild-Type (.:CIGTTTAGC:C.ACTACCGTAKK.:
imksPCR-seg0Ø.5..Ree...aded ,CGGGAAGTGATGTTTTATCTGAACC
mAsPCR7seRC.4.5...Revesse ACTTTOGCAIDTGGCTTC,TG
mAsPCR-seg00.5..Wild-Type CGGGAAGTGATGTTTTATCTCAACT
mAsPCR-seg00.6..Recoded TGC.Cc3TCAGGGAGATAATTTTAG
mAsPCR-segO. 6...Reverse CCCTGACCAACGCC:AAAG
mAsPCR-seg00.6..Wild-Type TGCCGTCAGGGAGATAATTTTC2'C
mAsPCR-seg00.7..Recoded CA.C.:CGATGAAACAGCCCAAG
mAsPCR-seg00.7..Reverse C&TTTT&TAGCCCGCTCT
mAsPCR-seg00.7..Wild-Type CAC:CGATGAAAAACAGCCCCAA.
niAsPCR7se.R00 .8...Re:coded CCCGAGTGTGTATTCA.GGTTCAAT
131AsPCR-seg00.8..Reverse CCTGGACTTCGGTTTCACG
tuAsPCR-seg00.:E.. _Wild-Type CCCGAGTGIGTATTCA.GGTTCAAA
13 Lk&PCR-segOl. 1..Re.cz CGTCTiGGAAGAGCACAAAGACT
triAsPCM7se.R01.1...Revesse AAAAAGTTCAAAAATTC:GC:TGTGGAG
mAsPCR-segat I .. Wild-Type CGTC.TC' '-GAAGAGCACAAAGACA
rfLksPCR-seg01.2...R.ecedtd TGGA1CT. CAGATACAGAATC.AGAAC
tn.A.saTN.7..F..-segD1.2..Revene .AGCCACTGAT:GCTGAAC-GG
mAsPCR-seg01.2..Wild-Type TGGATCAGCGATACAGAAAGCGAAT
tuAsPCR-segOI .3..Reco3ed GGTGCAAGCGTAACCTGTAG
ra-UPCR.-sege1..3..itever&e GACTATTTCTACGGCACCATTCCC
niA.sINCR-segel.3..Wite-Typc GGTiLAAGCGTAACCIGCAA
mAsPCR-seg01.4..Recoded CGACCGGGGGAAAGATAATGT
sn.AsPCR-segDI.A.:Revem .GGCTGGGTTGGCCTTTTAAA
raAs.PCR-segel A.:Wild-Type CGACCGCGGGACAAATAATGA
mAsPCR-seg01.5..Reeceed GTGGTTGCGGGTTTC4GTTAG
mAsPCR-mtel...S. _Reverse GCTGGTCCGAAGCCTACG
inA8PCR-seel.5...Wile-Type .GTGGTTGCGGGTITGGTCAA.
suA&PCR-seel .6..Recoded CC.,..kACCICACGIGACAGA_AATAG
atAsPCR-segril .6..Revcrge .GGAIGACCGCAATICTGAA2s.2C-',
IDAOCR-se.g:01..d..Wild-Type CCAACCTCACGACTCAG_AAATAA
mAsPCIL-seg01.7...Recoded GCCCGCCAGGTTAAAAACT
mAsPCR-seg01.7..Reverse CAAGAA.AATTCAACATCATCGGTGTAAT
113.A5 PC11-se 41. 7. Wile-Type G'CCi..GL.LAGTT..AAA.A.ACA
mA8PC11-aegDI.S.Recoded. TAGTAGTGGGATTGTAAGAACGCATC
ittAs.PCR-seg01 .g.Reverse IGGITAAGCAAACGGAAGACATTC
ra-UPCR.-sege1..3.Wil1-Type TAGTAGTGGGATTGTAAGCGCATA
ra-UPCR-sege2.1..itemeed GGAAGAACATGCCAAC .T.TIATCTCA
mAsPCIL-sega.2. _Reverse CCACCGCGTTGTTCAGTTC.`
sn.AsPCR-se,02.1..Wile-Type GGAes,GA¨kCATGCCAA.C.IT.L.s.TCAGT
sn.AsKa-8e,02.2.:Recoded GCAGATCTGATTGTCGCCTC:A
raAaPCR-seg02..2..Reverse TGTAGTTATC-CTCrCCCGGAAA
mAsPCR-mte2.2..Wi1d-Twe GCAGATCTGATTGTCGCCAGT
mAgPCR-seg02..3..Reeoeed 'TGA.A.GAAGTACTTATTGAAAAATGGCTATCG
inksPcR-see2.3_Revetse CA.GCCTGACACTAGCACTGT
raAsPCR-segii2.3..Wild-Type TGAAGAAGT4TTGATTGAA2s.AATGGC.TAAGT
raAsPCR-segii2.4..Recoded TTTTATTCACGCGTTTATACATTTCCGAT
mAsIPCR-sege24..iteverse TGCGTACXGGTGAAGGAAAA
.1-11A1PCR.-sega.1.4...Wild-Type TTT.TATTCACGCGT TTATACATTTCCGAG
mAsPCR-sega.1.5....Revaded GCAATGT.ATCTGCCAATTTTCCATC
mAs PC11-ae 42.5 . :ReN'EP3E CATGTCATCCGAGTCTGCGA
inAsPCP-seeg2.5...-Type GCAATGTATCTGCCAATTTTCCA TT
ittAs.PCR-seg.G2.6..Recceed GGTGAGGGCAATAATCTTTACACG
ra-UPCR-sege2.6..iteverse TCTIC.iCGGGIGIGGTATATGC
ni,UPCR-seg02.6..Wild-Twe GGTGAGGGCAATAATCTTIACA.CC
raAsPCIL-sega.2. 7...Recoiled CA.GCAC:GAAGA_TGGICACTCA
tuksPCR-seg02.1.Reverse GATACCTTCCTCAGCACCTTCC
mAsita-Fieg02.7. Wild-Type CAGCACGAAGATGGTCACAGC
raAs2CR-seg02.3.Recoded GCACATGGGGTTTA_AACGGTAG
tnAsPCR-seg02_8 . Reverse. AAACTTCGTTAATTC:GCATGGTGATAA
tuksPCR-seg02.2.:Wiiti-Type GCACATGGGGTTIAAACGGCAA
mAsita-Fieg03.1. Re,akied AGA 'G'CCGAAAAGCACTIGTICG
nuk,sPC:R -se.031 _Reverse GTTTTGGCAGCATTAGTTTCAGGA
nrAsPCR-8eg0 I . Wild-Type GCTCiC=CGA.A.TAACACTGTTICT
tuk5PCR-seg03.2. Receded GGTGGTGCCTTTGTCGTTA
in_als-DCR-seg03.2..Rew-rse GGGACGATTTAAACCACAGATAAAGT
nuk,sPC:R -se.03. 2 ..i-Type GGTGGTGCCTTTGTCGTTT
mAsPCR-8eg033..Rezzoleti CAAAATC_AJ,,_ACAGAATATTGTGCTCTGA
rrsAsPCR -5egta 3 _Reverse CTGGC:CTATATCTCTGCACTGG
imals-DCR-seg03 3. Mild-Type C:AAAATCAAACAG;kATATTGTGCTCACT
mAsPCR-seg03.4..Recoded TAGCATGCGAGAGTCTGAGTAAAGT
.Inks.PCR-seg(13.4..Recerse ATIATCCCTCAGGCTTCTGTTCG
IrsA5PCR-5e43.4..WM-Type CAAC ATGCGGC:TGTCACTGT ATAAA
tuksPCIt-seg03.5 . Receded GGTTACGCAGTTCGAGTGA
mAsita-Fieg03.5. _Reverse: GCCRA __ 1 I TCCCCCGAAC
raAs.PCR-segl3 .5. _Witd-Type GGTTAC-OCAGTTCGAGGCT
tnAsPCR-seggid. Rezzded CGACTT ATC7GACGGCCCTATC
tuksPCR-seg03.45. :Revers& CGGATGTAGCTGATC=CGGTA
mAsPCR-seg03.6..Wild-Type CGAC:TTATCTGACGGCCTTAAG
inksPC:R-se.03. 7 _Receded GTGGAGGATAGTCGGTATGATG
mAsPCR-8eg033..R.Rverse GCC:GCTAAALAGTCCTCACT
tuk5PCR-seg03.1..Wild-Type GIGG'ri.s..CGATAGTC:OGAATACXTGC
in.ks-DCR-seg03.8.Reeetied ACGGTCATTAAAGTTCA.ACTGTCA
nuk,sPC:R -.seg03. F. Reverse TTAC:CAATCGCT.A.CGGTGTAkT.CA.
mAsPCR-seg03.3. Wild-Type Ai.=ISGTCATTAAAGTTCAACTGAGC
IrsAsPCR-5eg4. 1 ..Reenoded TTTGMCGTCGTGAACTGAAAG
imals-DCR-sege4.1..Rew-rse C:CGTCAACTGAGCTGATTTTCATC
mAsPCR-seg04.1..Wild-Type TTTGTGCGTCGTGAACACTTAA
.111As2CR-se04.2..Recoded CGTACTICAGCATCTTTACGGATATCT
rrsAsPCR -5e44. 2 _Reverse TCTTTACCACCGACX:A.(X:AG-
tuksPCR-sege4.2..Wild-Type CGTACTICAGCATC:ITii CTGATATCG
mAsita-Fieg04.3. Re,akied ACA TCGACTCTACCCAAG 1 ICA
znAsPCR-seg04.3..Revene TCAACCTGGTCCGGTGAAC
gssA5PCR-3e04.3..Wiki-T:tve ACATCGACTCTACCCAGGTCAGT
tuAsPCR-seg04.4.1tecode1 GAAGAG.ATCAAAGAGAAAGCGCTATC
iliA&PCK-Ezg-04.4.1teverse AAGTCCCAGTGCGCGTTT
IlaAsPCR,seg0-4 A:Mt:I-Type GAAGAGATCAAAGAGGCGTTGAG
raA.sPCK-Neg04.5...Recoded CGGCACCGCATATCAAAAATCT
issAsPCF...-seg04.5..Reveme ACTGGCACTACATGGTTCATCAT
zu.A&PCR-seg04.5. Wild-Type CGGCACCGCATATCAGC
snAsPCR,seg04.6..Rec.,oded GGCATTTACTTTATCACCGGGTTAG
ntAsPC:R-segt14.6..Reveae CAGCTATCATCTGTGGGCGAA
ret.4.sPCK-mg04.6..Wiki-Type GGC:ATTTACITTATCAOCGC.-GTC:AA
121A2PCR-seg04.7..Recaded GTAGTACTTTGGGATTTGAGGCAAG
1.11A&PCR-8eg04.7..Reverse TAACCT:TCTCTIII:GCGTAC
ntAsPCR,seg04.7..Wiki-Thle GCAATACTTTGGGATTGCTGCrTAA
paAsPC:R-seg04.8.REcoded CACCTCATGAAGTTGTCCATCTGA
IMEAsPeR-RiegN.E.Rev &:;e GCCCGTC.CGCTTTTTAACTC
IllA5PCR-seg04.8.Wild-Type CACCTCATGTATTGTCCATCGCT
lar.A&PCK-eu-0.5..1._Reelxied is-AAGATCGTGCGGAAGAATGGA
suAsPCR,se05.1..Revelse C.:TTAAGCAGATGAA.AACCATAC ATTE' TAGIG
mA&PCR-seR-05..1. Wild-Type ACAAATCGTGCGGAAGAATACT
sssA.s12CR-1:e05.2..Recoded AAGACCTATAAAGCGATGGIA.A.2s,AGATCTA
mAsPCR-seg05.2...Reverse GCCATATTAT I TTCC:CTGCATTCAA
issAsPCR.-3e0.5.2..Wiki-Tn-Ne AAGACCTATAAAGCGATGGTAAAAGATTTG
traAsIK:R,seg05.3..Recadtd GAGTTCCA.GTTCGCTCAAATIX;A
raAaPCK-Ezg-05.3._Revea-se CCCAATGGCTGCTAACGC
mAsPCR-seg05.3..Wild-Type GAGTTCCAGTTCTCTCAAATCGT
mAsPCR-seg05.4.Recoded GCTCTGACTGAACCTTCACAG
rnAsPCK-mgC .5.4.Reverse CC-',TAGTGGGGATGCCAGATC
mAsPCR-seg05.4..Wild-Type GCTCTGACTGAACCTICACGC
ra.4sPCK-Kg05.5..Recoded CGGAAGAGGACTCACGCCTT
tnAsPCR-seg05.5..Recesse CATACAGCCAGACAATCGAAAAAGAA
mAsPCR-segOS.S. Wild-Type CGGAAGAGGACTCA.CGCTTA
mAsPCR,sega5.6..Receded TCTACATGTAATACC-GTTGAAACGCTA
mi..laPCR-seg05.6....Reverse GAGTCTTGTGTGCCGTGTTC
ffset8PCR-mg05.6..Wiki-Type AGC:ACATGTAATACGGTTGAAACGTTG
IllAsPCR-seg05.7.REcoded ATGCTCTATCGTCTACAGC:.AAGTT
2171A&FC:R-se,g05.7.Revene GGTGGGTAGATGCTGAGTGATAAA
mk73PCR-seg05.7.Wild-Type ATGC7CTATCGTITACAGCAGGTC
sy:A&PCR-NeR05.S.Recaded GGT AATTTCAGAATATIGGACKAAAC:
frAsPCR-see$5.8.Reverse ArIcTe __ ILGGTAA.WaTGAGTTCATTA_AA
trAsiz-CR-segOS.B. Wild-Type GGTAATTTCACIAATATGGTGGACAAAAAT
ni2UPC:R_-geg06.1.Recoded AGCTGATTGTTTTTAACCGTATTAAGTATAG
..7.22_,AsPCR-segt16.1.Reveae CTGGGGbt¨TCGATGAAGTT
niAsPCR-Eeed. 1 .Wild-Type A.ACTGATTGTTTTTAACCGTATTAAGTATGC
InAsPCit-seg06.2..Recoded GATTGCAGTGAGTGGCTGA
sylAaT,CR-seR-06.2._Reverse TTAC:CGATCTAGCAGAAGAAGCC
inJUPC:R-segt.16.2. Wild-Type G.ATTGCAGTGAGTGGCGCT
1p-AOCK-8eg06.3...Rec2ded CGGAAAGGGGTACTAGCACTT
InA4P(.7K-R06.3..Reverse GGAACGACCGCTITTAGTGC
mAsPCR-8eg06.3. Wild-Type CGGAAAGGGGTAITGGCATTG
211AsPCK-NegNi.4..itecoded CCGTCAAAAGCTGCGATTG
InAsPCit-segOtiA..Revene TGAGCCTGGCGATCTGTTC
InA.sPCR-1:e06.4..Wild-Type CCGTCAAAAGCTGCGATGC
inJUPC:R-segt.16.5..Reeaded C:GGC:GGGATATAACATGACGA
mA.00K-Neg5. _Reverse GCACTAGGTC:AC:CAGCAAATC
mAsPGR-segtld.S..A.Vild-Type CGGC:GGGATATAACATGAGCT
mA5PCR-8eged.6....Recoded CC.ATTGGACGTTTCACCTCA
luMPCR-seg06.6..Reverse GCGTCCCTGCTCCAGAAG
211.4aPCK-seu-06.6..Wild-Type CCATTC-GACGTTTCACCAGC
ir,A.sPCR-1:e06.7..Recoded GGGGTCATTAATTTCATCCAGTGA
illAsPCR-8eg05.7..REVErSE CTGC=GGICAGTCGGTGATC
mAsPCR-seg06.7..Wild-Type GGCC:,TCATT4_4TTTCATCCAGGCT
mAsPC:3--seg06.8.Recoded CTCGICTIATC:ACKTAGTGTCA
211AsPCK-NegNi.g. Reverse GTGACTGCGOC-CTIATCGA
311.. ksPC:3L-seg05.8.Wild-Type GCGTGGTTATCLAGTTGGTGAGC.
2171A&KM-seu-07.1._Recoded TGA.GGCTCAGITAGTGTCGTC
nAgPCR-seg,01.1..Revevie TCGATGTECC:IGTCCTGCTG
arAsPi7R-8eg07.1..Wild -Type TGAGGCTCAGTCAATGTCGTT
TakiKR-segr7.2..Remied GCTGGCGCTTTCGGATCTA
mAsPCR-8eg07.2. _Reverse GCAAAGCGCCACCAGAAAT
mAsPCK-Neg07.2. Mild-Type GCTGGCGCTTTCGGATC:TG
311...4.sPCR-seg07.3..Recoded GCCCAGGAC:GGTAGGATATCA
mAEPCR-segCl7.3...Reverse GTCTGGC-CIGGCCTGATG
ITIAEPCR-8eg07.3..Wild-Type GCCCAGGACGGTAAGATATCG
mAsPCR-seg01.4..Recoded GCGTGACTCCTGCTACGATC
ITIAEPCR-8eg07.4..Revene. CCCIr..7GCAAGTCGAAAAGC
mAsPCR-segi31.4..1krild-Tnle GCGTGACTCCTGCTACGATT
mA.sPCR-segg.7.5...Recade41 TCAGGAAATCTGTGCAGAATCAAC
mAsPCR-segi31.5..Rever8e ITTCGTTICACAGITCTATCATTTACGTAA
mAsPCR-segg.75. Wild-Type TCAGGAAATCAATGTGCAGAATCAAT
tnAsPCR-seg07.6..Recoded CGCATCAGC4-A,AC:GGCAGA
mA.sPCR-segg.7.6..Reverse CGGGTGACTGGATCTATGTGAC
tnAsPCR-seg07.6.11ri1ii-Type CGCATC-AGAAAACGGCAGC
taAF,PCR-segis1.7..Recodett ATAATTTCTTG'CGGATGATGACG_AAG
tnAsPCR-seg07.7..Rever9e CATTATTCATGTGGCAAACGs.TATCA
inAsPCR-seg7..WiLd-Type ATAATTTCTTGCGGATGATGACGTAA
tnAsPCR-seg07.8.Recoded TGIAATGIVICATICTACCGATCACTC
mAsPCR-segis7.3.Revene AGAACCTGTACCACTGCCATTG
rrski-PCR-segOl.S.Wild-Type T#.3TA.ATCAIGTCATTCTACCGATC.ACAC;
inAsPCR-segis.8.1.Recoded CATGTTGTCCATCAGTTCTTTCTTTTTT
TrsAs.-PCR-seg08.1.Puever9e GACCGCGTAA.CCATCGACT
mAsPCR-sega3.1.Wi1d-Type CATGTTGTCCATCAGTTCTTTC*TTTTTG
TrsAs.-PCR-seg08.2..Repadeti GTCCCTTGATTTTCTTGACACGT
mAsPCR-se0,32..Reverie AAGCTGAACAA-LAAAATCCCACCA
mAsPCR-seg08.2..Wild-Type GTCCCTTCATTTTC=GACACGG
n-lAsPCR-sega3.3..Recoded AGCATTAGAAGTCGCTGGTGAAG
inA5PCR-,segin.3..Reverse GITTITGCTCAGAACGCCATGT
mAIRCR-segetS.3. Wild-Type AGCATCAATAATCGCTGGTCiTAA
tnA5PCR-segin.4...Recoaed TCATTAGTGACGCGC-GAAATG
mAIT3CR-segetS.4..Reverse GATGCATG..' 4AAATCGCG.AGGAG
ra_ks-DCR-seg,084..-Wrikl-Type TCATTAGTGA.CGCGC=GAAATC
mAIT3CR-segetS.S...Recoded CCTGAGCAATTTCATCGGATGA
in_ks,PCR-seg,08.5..Reverae CGGGTATCTTACTCATATCGCTATATTCA
mAIRCR-segetS.S. Wild-Type CCTGAGCAATTTCATCGC:TGCT
m_ksPCII-seg,08.5.Recodeil CAGACACAGGAACACGACAATTAG
InAEPCR-seg08.6.Reves:se CiGI:GTTCTCCICITCTCGT
ra_ks-DCR-seg,08.6.WM-Type CAGACACAGGAACACGAC,:LkTCAA
InAEPCR-seg08.1...Reo3ded ATACAGACGCAGCTCATCATCTAG
olAsPelk-sce8 .7 Revtrse GTTTGTTACCGAGCGICTGATC
InAs,1-.CR-fiegOS. WAd2rype AIACAGACGCAGCTCATGATCCAA
mAsPC.R.,seg0,3.8 Recoded TCCGCGAMTCACCTCAC:
InAsPCR-segetS.S R.Evrof CAACGCCCAGACCC:AGAG
o:A.z.F.,C1,1-seg08 .8 Mild-Type TCCGCGATGTCACCAGCT
.1n.A1PCR-seg09. LTkrzoded GATAAGACAC ACGGTTAGCATA = ;A. ACAA
ErsA5PCR-sze.}9. ..Reverse GCTATCTCACCAGGCC,ACAT
mAsPCR-8ega9.1...Wild-Typc GATAGCRACACGGITAGCATATITACAC
mAsPCR-seg09.2..Recoded TATGAATATCTGGAACCGCTCGATCTA
mAs.PCR-seg09 .2 ..Resiuse G.AAGGAATAAGTACATCATTGCGGAT
ErsAiPCR-stg09.2..Wild-T-ype TATGAATATCTGGAACCGCTCOATTIG
ILA sPCR-sega9.3..Recoded CCAGACACC:GGCAAT2s,ATC AGA
trAIPCR-fie09.3.._Reverse CATC.ATGAACACGGAAGGTI-LkTAACG
mAsPCR-seg09.3..Wild-Type CCAGACACCGGCAAIAATCAGC
111A sPCR-seg09.4....lecoded CGCATT.AAAGCAGATAAAAAGCACCATA
sa4_,iPCR.-seg09 .4 _Reverse ATG.AAATAACCTCAGC GCTGGA
mAsPCR-seg09.4..Wild-Type CGCATTAAAGCAGATAAATAACACCATC
irsA5PCR-stsc)9.5...Recodect TGTI-TITCC'GTACC_4ACTCGC.T
mA sPCR-sega9.5....Re caw CGCCTCAGTTCCCGTGAC
mAsPCR-seg09.5..Wild-Type ________ TGT1 I CCGTACGACTCGCA
alAsPCR-segt.19 .6 ..Recoded C:GTTTCTCTGC:TAATC:=GATGCTT
.111Aii3CR-ste9.6..._Reverse CTGCTACGCCATCCCGAAA
tr.AsPi.M.-sega9.6.1,Vild-Type CGTITCTCTGC:TAATTTATCGAIGTTA
InAIPCR-fie09. 7. _Recoded TGTC;TTTCGA:T.A.TAACCGTGGGA
suAsPC1.-seÃ09 õReverse GGCCC;AAGACTCACAAATCTITC
inAill'CR-mgc.)9. 7. .Wild2rype TGTGTTTC.GATATAACCGTGGCT
4;sPCR-sfe$9 .8 _Receded CTCTCAGCAGACGAGAAATCA
.111..A&PCR-ita09.8....Revene. AGGCAAACCAGACATTCTCGT
mAsPCR-seg09.8..Wild-Type CTCAGTGCAGACGAGAAAAGC
tr-AsPC11-seg1 ..Recoded GCCAAG TACAGCGGAAAGTTTT
E.a.4:sPCR-1:cg 1 . ..Reverse CAACTTATGGCGTGCTGICG
InAsPCR-seg, 0 A.:Wild-Type GCCCAATACAGCGGAA.A.GMA
RIA&PCK-seg 10.2. Recoded TGTAATGATGAATGACTTTTCTTTTAC:ACC:A
InAsPC1t-segIi3.2..Re.sielse AATACATCCGCAATTCTCAAACCTG
mAsPCR-seg10.2..Wild-Type TGIAATGATGAATGAC __ I TCTITTAC:ACCG
m_1.-,`_sPCR-s.cg (1.3..Remied CiTCAGI:f ________ 1.ATCCACGCCTGA
mAftPCR-fieg la 3. .Reverse AC:GICTACA_AGGCTICGATACC
InAsPC11.-seg1 0. 3 _Wild-Type GTCACTTTA.TCCACGCCGC:T
tne-UPCR.-1:eg10.4..Rergaded TGAIGCTGAACCGCATTGTAAAG
mAsPC1t-seg 10 4..Reve.Ese. TG-AAGAACAACTC:GATACAGCACT
tilA5PCIZ-seg NA...Wild-Type TGATGCTGAACCGCATTGTACAA
mAsPCR-segIO. 5. Recoded GAAGGTGAAAAGGTGGTTTCCTC
17.3A5PCR-seg i O. 5 _Reverse GGTTAGCGGATAAGTCACCTGAT
mAsPCR-segIO. 5. . Wild-Type GAA.GGIGAAAAGGIGGTTICCAG
snAATCR-seg 10 .6 ..Recoded CACC:TGATTTACCGCT. TTTGGAAT.T
mAsPCR-seg10.6..Reverse CGAGTTC TGGTTTGC:CCTTATTAA.
Ink.:PCR-seg d _WM-Type CACCTGATTTACC.GCTTTTGGAATG
niAaPCR-seg ID. 7.RE coded CGACCATTACCCCTTICGGA
titkiPCR-ueg ICC 7 _Reverse TGAAAATGATGCTGGAACiATGCG
mAaPCR-s==10. 7. Wild-Type CGACCATTACCCC.TTTCGGC¨
inA5PCR -Egg 1 a .11ecedeAi ATAGAAGCTCCAGTAGATCAATf_:TGAIGAC7'.
ta-AsPCR-segIO.g.ReVeVie CACGC,GAATAA.CTC:'.AT.CTGGCA
trAsPCR-seg 10 .8 .Wil d-1Te TTAACAACTCCAGCAj-V,,,IC ATCTGA TGAC
knAsPCR-segt 1.1..Recoded GGCTCATAACTACGCCATG T CA
tnAsPCR-seg 11 .I..Reverae GCCCATCAGCTCATC.TTCCA
m_AaPCR -seg 11.1. Wild-Type GGCTCA.T.A.,kCTACGCCATGAGT
suANPCR.- seg 11 ..Recoded GCGTGTAT1r1GCCATGAAC.TCA
usAiPCR-seg I1.2..Keverr4e TGCGGTCAGGGTACi.i..AATC AG
=isP.C.1-seg 1.2 ..TariisM'ype GCGTGIATITTGCCATGAACAGC
tysA8PCR-seg11.3...Recoded CATATTTGATITTAGCGATGGTTICAGAT
raAs...001-seg 1 .3 ..Revesse GCAACACCICA.GCCTGCA
47.14.5PCK-stg i I. 3 .= Wild-Type C:ATATTTGATTTT. AGCGATGGTTTCAGAG
srAsPC11.-sep, 11 4..Recoded CAATAATTGACTGTGCCGGATCT
17.3A PCR -3ieg 11.4...Reverse CGCTGCGCTCAATAAAAA.ACAG
rc-AsPCR-segI 1.4.. Wild-Type CAATAATTGACTGTGCCGGATCG
mA5PCR-seg11.5...Recaded CCTCGAAGACTCCGTAGCAC:
InAsPC:11-segt 1.. ..ReveEse ATT. TCCACTGCGCGGGTA.k
mA&PCR-fieg11.5. Wild-Type C:CTCGAAGACTCCGTAGC.AT
InAsPC11.-segt 1.. ..Recoded TGACAGCTCCACTTACCCTAC TA
titkiPCR-1:eg 1 I . _Reverse CAGACACCGTTTCCATATCCCiA
mAsPCR-seg11.6..Wild-Type TG.ACAGCTCCATTAACCCTA.TTG
tilA5PCR-seg 1 1 ..Recsded GCTCCACGACTACTGGAAAATATTC
As..CR-seg11.7..Reverse .. TTTTAATC,,LkTGC4-G..C=TGAGT-TA
vaA,sPCR,se.:311.7..1kFiti-Type GCTCCACGTTLA.C-TGG:k.k.k1-.4.,ITE
ksPCR-se.R1I.S.Recod,D.i COAAGACAT...kAACCC.iTATC,AAT.-A..kci
raAOCR-stglI.S.Revene TA CTGAC __ I ATe.i' P.MCGGTACTG
2.1APC.R-seU.8.Wid-Te CGAAGACATAAACGAAT.kAT.kTCA.GCALTT.,
ImAiTCR-s8g12. 1 ..Rec.odmi CGTAACGTT.CAACCATGACTT.GT
InA5PCR-seg 12_ 1 _ReNXEW GCCATCGCCGATAkACTGAC
mAsPCR-seg12.1..Wild-Type CGTAACGTTCkACCATCACCMC
alet rsPOZ-seF 12 .2..Recodtd. GGGT.kGGC3TAATACGCATCATCC
-sPCR-I2.2Reversc TTTGCACTIT.CCACTCCGATS
ns.A.sP:C:R ,stg 12. 2 . _W-Type AGGTAGGGTAATACGCATCATCA.
rLA.s.CR-seg 12 .3..Recade,1 CALkACCTATCACCAGCACCGTA.
InAsPR-stg12.3..Reverse TATITCGCGCTAC:TASTC3'ATGC-TT
mAsPCR-seg12.3..Wild-Type CCAGCAC:CCi'IT.
CR-see12.4..aeozde,=1 CT. -11...AGeGGGCCA. Ti:AATCT13A
LAEP:CR-seg12.4Rei8& GCTGCi='CCTTCTCTCCTTACG
mAsPCR-seg12.4..Wkid-Type C.:T. T171',.kCGGC.'-(:CATCAATC,TGG
inAiPCIt -seg 12 5..Recoded _ATAATCAGGTC:TGGATTCTTCTCTTTGAG
n.AsPC.R-seg 12 .5..Rtwerw GAT..AACGCTCA.TACTGGICACAAC
tm.A.00II-seg 12. 5..Wiki-Type ATAAltAGGTereGAT.Tt:=T`ICTC,TT.'1.T.
Inek5KR-seg12.6..Recod.td (.5-AC,TGGTCCGGTAT=.ATC,CCT
ra:AsPC.K-se,g12.6..Revexse CCCTGTAGGICGTCGAGAAAT
11.121._ .iPat-seg 12 .6..Wikl-Type. GACTC=GICCGGTATTTATGCCA
alsPCR-sel2.7Rt1 .. GCGATCAATC:CAAATCTCAC:C:T
LkOC:R ,se.g 12. 7 ..Rewisie TGAC:CAAGCAGGACAACAC
mAsPCR-see12.7..W11d-Type GCGATC,TCC,4.AATCTCACW
ta,kiPCR-5eg12.S.Rmoded CGTTIGTAT.A.GAR=CGCCGAT
mAsPCR-seg12.8..Reverse GAGCAAAITC:1"GTCACTT.CT.TCTAA.TGAA
inAiPCIt -seg 12. S .154qId-ri ype CGTTIGIATAAATA=CGCACT.G
im.A5PCR-se.g 13 1 _Rt+cockd GCTTC.-ITGCGGATTCATC.GAT
mA&PCR-seg13.1..Reverse CTCCACCTC:ACCGT.T+.7.:TATOC
ImAiTCR-stg13. 1 ..Wiid-Type G(:TTC:TTGCGGATICATGCTC.
-$,PCR-s,n 13 .2..Rzu.N=slfti AAA.AACCiTC:GGC7CAATTC:ICT
t1,1.4.0CR13. 2 ..Rew.rae GCTACCCGCGCCTGATA.:-.µC
InksPOZ-seF 13 .2..Wild-TyTst AAAA.AAACGTCGGC:CAA =CA
mAsKa-5eg13.3..Rec.miegi GGIGTGTGAACµJ'ATITC;ATGACTC:T
rnAsPea-seg13.3..Rtn-tt TGTTTACAAAGCGAGGGGTGATA
srcAs.PC.R,seg13 .3.:WW-Type GGTGTGTGAA GGATTTGA TGACA.GC
r.isiAJ,CR-Neg13.4..Reoxied TGGAATACGTGGTCTCI,=T:TT
snAsPCK-seg13 A.Revene GGCGTC:ATTACCCACCAGT
mAsPCR-seg13.4..Wild-Type TGGAATACGTGGTCTC:-GTTTTTA
naAsPCR-seg13 . 5_ Recoded GGCATTCAC-GTTAGTAGAGGAC
IrLA.i,PCR-eg13.5..Itevene TTAAC'TGGCAAAAGGGTGACA
mAsPCR-seg13.5..Wild-Type GGCATTCAGGTTAGTGCTGCTCT
11-324_,-.,PCR-Etg13.(5..RecN.-kded GC:AGGACTCCTC:GTATGCTATC
InAsPeR-seg116..Reve18e CGTAGTOGGTTA.GAAC:ITGCC:A
123A5PCR-mg.13.b..11611d-Type GCAGGAGTCCICGTATGCTAAG
nAsPCR-seg13.7.Recoded TGCC.GTTGTTGACOGTTCA
EnA PCR-Neg13.7.Reresse CCATGAAGATTTTGGTGAACTGCT
sPC11-seg13.7.Wild-Tyz.3e TGCCGTTGTTGACCGTAGT
r3AE:PCK-Neg13.S..Remied GATCCATTGAATTTTGATGAi:'sAGACGT
mAsPCB.-seg13 .8_.Reversie GCCTATACCGCC:TAT TCPCIGG
mAsPCR-seg13.8..Wild-Type GAATCCATTGAATTTACTGCTAAC-iACGC
caAsPCR-seg14 . I Recoded CTGATGTCTAAGATTATCGCGACTCTA
raksPCR-1:f g 14 . 1 ..Revene TTGCGTGe=¨eCAAGAGAGGTG
mAsPCR-8eg14.1..Wi1d-Twe. CTGATGAG'TAAGATT ATCGCGACTTTG.
122A5PCR-E.eg14.2..Recsded CAGAC:GGTAAATTTATGGTAATGGTTTC
naAsPCR-seg14.2..Revme GTGACTTTGTAAGACGGGTTAGAAC
mAsPCR-seg14.2..Wild-Type GCGACGGT.W..TTTATGGTAATGGTCAG
mAsPCR-seg14.3..Recoeltd. GTCGAACTTATTGATCATCTTGATTCCC
r:LAPC:IR-geg14.3..Reverie GCTCTCGCAGTCGTTCAT
2,13ARPCR-seg14.3.,Wild-T-y,-pe GTCGAACTTATTGATCATCTTGATAGTT
snA,iPCR-Ftg14 .4.Recoded CATCTGGGATATCAPIAAAGCATATCGGTTAT
au.A&PCR-seg14.4.Reverst CAAGACGATGGGTAATACAGGCA
mAsPCR,seg 14 A.WiId-Tmae CATCTGGGATATCAAAAAGC:ATATCGGTTAC
mAaPCR-eg14.5..Recoeitel TACCAATGGCTCGTAAATGGCTA
mAsPCR-seg14.5..Revene TC:-CC:GAGCAGTGTCTGAC:.
rnAsPCR-seg14.5.,Wilti-Type TACCAATGGCTCGTAAATCGTTG
saAsPC.R.-seg 4 h..Recoded AAATGTTCTTCGGC.AATTATTTCGTTATTC
ris.A.iiPCR-Neg14.6..Reverse TGGIA-A% CATGf,7TGTAAATATTCTCGTC
inAs.PCK-seg 4 b.:WM-Type AAATGITCTTCGGCAATIATTTCGTT"ATTA..
ns_Ai:PCR-Neg14. 7..R.e.wded TCGCAGTAATCGAGGCTGA
mAsPCR-seg14.7..Reverse GGTTTGGCTCTGGTCTGGTAG
mAsPCR-seg 14.7.. Wild-Type TeCiCAGTAAGAGGCGCT
131A2PCR-seg./ 4 .E.Recoded AGAGATCGAGGGCCGTFACTI
mAsPCR-seg 14.8. Kn-ent CAGK:CGCACACTATGAGC
mAsPCR-seg14.8..Wild-Type AGA.GATCTAAGGCCGICACC
mAsPCR-seg15.1..Recoded CGGIGTCGAAAMGAAGCACTC
f$12PCR-stgli..1...R.everse CGATGCGCAGAGGTGACA
mAt:PCR-seg 15 .1.. Wild -Type. CGGIGFCGAAATGGAAGCATTA
inAsPC:R-seg /5 2.:Recodecl TGTT.TAGCCTCTGGACCGFAAG
in.ksPCR-seg 15 .2..Revme cACTGGATGAGATT __ i ACCC
mAsPCR-seg15.2..Wild-Type TGITTAGCCTCTGGACCGTAGC
mAs,PCR-seg 15 .3....Pszoodr4i CGAAi:'sACGTCCGTGATTACTC:A
issA5PCR-seg15.3....Revesse GATGCCATCTITATTGA'G'CTGITCA
m_k&PCR-seg 15 3.. Wild-Type CGAAAACGTCCGTGATTACAGC
ritAsPCK-leg15.4..Recaded taAACCTGACGCOGC.TACTT
znAsPCa-seIi.4..Revese GATTAGCATAC'ACTTCACCTTCAGTAC
mAsPCR-seg15.4..Wild-Type CAACCTGACGCCGTTGTTG
311,AsPCR-seg 15.5_ Recoded CCGTCTGAACCTTTATGCATGGA
mAsPCR-seg15.5..Reverse CTGTTCCGCACTGATATCGAA.,,s.kTG
in.ksPCR-seg 15 .5.. Wilti-Type CCGTCTGA_AC:CTTTATGCATACT
selAsPCR,NeR=15.6..R.eoaded CCATCAC.AAGCAGGCCAGA
LuAsPE:R.-seg15.6..Revevie CGCGGATAAAA_A.ACTTGTTGTCG
mAsPCR-seg15.6..,Wild-Type. CCAMACTAACAGGCCGC T
silAsPC:R-seg153..Recode.d. CAGCAAATATAAGACCGTIAACTGAT
ffiAsPCK-aegi.5.7..Reverse CG'TITTGCTAAGGATGTCATCGTC:
znAsPf.7.R.-seg 15.7.. Wila-Type CAGCAAATAT CAAACCGTT AAC.GCTG
nlAs:PCK,Ne 15 .S.Rezaded C:GAACTGCATGGTGACSTT..kG
tnAsPC:R-seg 15.B_Reversie A TTC:CAGCTCAC2kGTG_A.TCAGA
nsAsPCR-:;eg15..E..Wild-Type. CGAAC:TGCATGGTGACGTTAC
mAsPCR-seg./6 __Recoded CGGTCACAGTCTGAATGCCT
saAsPCK-NeR=16.1..R.e.verse GTGCGTCATACAGCAGATCCT
TnAsPCR-seg _WM-Type CGGTCACAGTC.:TGAATGC:CG
mAsPCR-seg16.2..Recoded GGTCCGCAATCTCTCTTTTTC.:A.
silAsPCR-seg .?...Reversie CMC..'CACCACGCCCATAT
mAt:PCR-seg 16.2.. Wild -Type. GGTCCGCAATCTCTC If I I AGT
mAsPCR-seg /6 3.:Recodecl GCAATAAT'CAC:GTTAGCTGCCT
1.-.^AsPf-a-seg 6.3..Reverse GTACAAGTAAGGATGOGACTATTTAACTG
naA5PCR- seg 16.3 _Wild-Type CCAATAATCACGTTAGC.A.AIGCCG
mA8PCR-seg16.4..Recoded TCCGCTGGTGTACGGACAAG
r3AsPCR-seg 16.4 .R.everse ACTTTACTTCACCATCGGAGTCC
mAsPCR-seg16.4..Wild-Type TCC,G3.1.,TGGTGTTC:TGACTAA
ra.A.rsPCR-1.zeg .5..Recoded CTGGGAGGGGATGITTGITCTA
luAs...3CR-seg Id .5..Reverse CGCAAGC.AGAAGGTTACCC:
mAsPCR-seg16.5..Wild-Type CTGGGAGGGGATGTTTGTTT. IG
mAsPCR-seg16.6..Recoded GITCGAGATGCT(KiGGICA
.r.s_AsPCR-seg 16.6. Reverse OGGAAAGCGTC:AAICA:CTGA
mAsPeR-segI6.6..Wild-Type GITCGAGATGCIGGGGAGC:
st3A5PCR- seg 16.7 _Kee...coed CTGC:CATTTCTGATTGT. CTTTAAJL4..TAT
irLA.sPOR-seg, Id .7 _Reverse GCCGATCAGTAGACAGCAAAATG
mAsPCR-seg16.7..Wild-Type CTGC:C.ATTICTGAITGTC-TTTA.A.A.sik.,TAAGC
mAsPCR-seg16.8..Recoded CAGGGACGSGATCAGTGA
2114.s PCa-seg 16. g. _R.everse TCTGCOGC.:A.G.AGAAAATC.:AATIT
mAsPCR-seg16.8..Wild-Type CAGGGACGC4-C.,--ATCAGGCT
mAEPCR-8egI7.1.Recoded TGAGAGATOGACTITATGGCATGAC:
snAs.POR-seg .1..Reverse. AATACCTGAAGCATGGGA.AT.T.TAC
.r.s_AsPCR-seg17..1.1Vilzi-Type GOTGAGATCGAC:TrTATGGC.AACTG
snA,.:PCR.-seg 112. Receded GACAAACTCCITACGCTGAAAG
rri.AsPCR-seat 7.2..Re.vent GGTGATGAI. __ CTCTGCGGTTATC
irLA.sPCR-se? .2 _WM-Type GAC:_A.CTCCTTACGCGCTC44
n-Lks.PCR-segI 7.3...Recodtd AGAATTACCTGACCACC=ATT
ir,A.sPOR-seg .3..Reverse CAAACCAGGAGCTGCACAATG
InAsPCR-seg r7.3. _Wild-Type AGAATTACCMACCACCGITCATC
raMPCR-seg 17 .4..Recoded TATICCACGCATICCAGAGAAGTC
xlIAEPCR-seg1-7.4....Re.verst. GGGTGC' __ #
saAsPCR-segt?.4..Wild-Type T ATTGCACGCATECCAGAGA_A GAG
na.AsPCR-8eg17.5..Recoded CAT. CTGCGCA.UTACACCTICT
snA,.:PCP, -seg 11.5. Reverse GTCCGCGA.A.GATGAGICAGAT
raAsPOR-segi .5..Wild-Type CAT CIGCGCATTIACACCTTCA
raksPCR-1.zeg 17 .6..Recoded AT ACAGAGA GACAAT.kATAAT. GATTCT
naAsPCR-segI 7.6..Revcse GOGCCACGATTCAGAGTAATC.
113A5P.CR-Ezg17_6..Wild-Type ATACAGAGAGACAATAATAATC-G.TAGATAGC
ksPCR-segi 73..Recoded CCGATCGC.TGT.C:QTITTTACT
knA&PCR-g17.7. _Rrverse TTCGAGTGAWICTACCTATCTCTTT
LaA,i=PCR.-se.g1.7.7..Wit&Type CCGATCGCTGTCGTTTTTACC
raAsPCR-seg 17. Ilecocied CIGGCGGATCGT. GC.= A
illAAPCR-seg 1 LS. Reverse GCCATCCCCAC:GGTCATAT
raA sPCR-gegl TS. Type: CTGGC.GGATCGTGC-TTTTG
mAsPCR-seg18.1..Recoded TCGTACCCTGG- r"TTACCAAAAACT
rocA sPC?-seg 18 .1..Re-ex-rse CCAGGTCAACA-GCCAGGT
mAsPCR-seg18.1..Wild-Type TCGTACC:CTGGTTACCAAAAAC:A
riLk.sPCR-seg18.2...Recoded CCC-CAAAAAAGTAGTTGGTTGATA.GT
mA OCR-segl 3.2. _Revexse CCATCGGCCATCATCJAACG
mAsPCR-seg18.2..Wild-Type CCGCAAAAGTAGTTGGTTGAaa.GA
InAiPCR -seg IS. 3 ..Recoded CTTAAIGCCIAT.AA.GLAGC.AACACTATCT
itIA5FCR-Seg1g.3 _Reverse TGGGITGAGATGCCAWITT
mAsPCR-seg18.3..Wild-Type TTAAATGCCTATAAAGCAGCAACATTAAGC
ITIAOCR -firg I S. 4..Recsaied GCTGAA TCTTA TCCGCTGCTTCTA
raAsPCR-seg, 18 .4..Rew-rse GTTCAACTCTGAGCAACGTCAC
nsAl-PCR -mg .1.4..Wiiii-Type GCTG A A TCTTATCC:GCTGTTA TIG
mAsPCR-seg18.5..Recoded GTTTCATAGCCAACACGATCTGA
mAsPCR-5eg1 S.5 ,Reverse G'GTGTCTACAGCGGAAGTAGG
siLksPCR-ses1.8 .5.. WiltiType GITTCATAGCCAACACGATCGCT
raA -segi 3.6 ..Rff.coded CTGACGACCACACATCATATTAAGT
inA5PCit-seg. I 8.6..Reveae GCCGCCTTTTCTTTTTCCGA
mAsPCR-seg18.6..Wild-Type CTGACGACCACACATCATATTAAGC
n3AOCR -seg I S. 7 ..Recaded CTTGACTTCGATGCACTGATTAACT
mAsPCR-seg, 18 .7..Revtrse GTC:CTTCAGCATCTTCTTCCAGA
irsA.00R -sEg I S. 7 _WM-Type CTTGA=GATGCACTGATTAACA
InksPCIt-segl.S.3.Recocier1 OGA __ AGCTCC:CTGATGATATTACGA
mAsPCR-5eg1S.S.Revecse GTAAAACCCCTGTATTGTC:ATTAAL-CT
LrAsi-,CR-ses1.8.3.Wild-Type OGATT.AGCTCCCTGATCiATAIT A ACT
raAsPCB.' -segi 9_ I ..Reccoded GAT TIT GCCAGCACCATACCAATTGA
P(R-seg19.1..Reveae .AATTGGTTATAAGGAGAGAGTATGCGT
mAsPCR-seg19.1..Wild-Type CTTTTTGCCAGCACCATACCAATACT
mAsPCR -seg 19. 2 ..RecockA CCGITCGTITTATCTATCAGGTTCA
AsPCR-seg19.2..Revme TATATCCI3CGCCAGTCAGITTT
knA&PCP. -seg19.2. _Wild:rype CGGTTC:CTTTTATTTAACTGGTAGC
-17.e.g1.9.3..Recoik-c1 CGGATCTGCTATCGTGCCTT
nrAsPCR-segI9.3..Re verse AAC:AGACC:AGTATCGAG.ATAATCCG
tnAsPCK- N.eg 19.3 ..Wig-Type CGGATIITGCTAAGCTGCTTG
isrANPC:R-segIS.4..Remied GCGACICAGAACGTATGCATCTT-
illAsPCR-segI9.4...Revemt GCCACCTTCAe-ITTCCTTCCG
raAciPCR-3egI9.4..17Ald-Type. GCGACAGTGAAAGAATGCATTTG
snAsPCR-5eg19.5...Recode1 TGAACAAGAAACACTTCCGC __ I
laAsi/CR-fie19.S. Revet-se TTCACCATCGCC2TATGCAC
IrrAsPCR,-seg 19 .5:Wild-Type 'TGAACAAGAAACACTTCOGCTTA
tnAs.PCR-8eg19.6, Recodeti OGATCACTTTTTGGCTCTTACTCT
ziAsPCK-ieg 19.6. Reverse GGGTATTGCC-CGTAGATTTCTC
atA;.:PCR-seg 19.6 ..Wild-T3,-pe CGATCATIGTITGGCAGITACAGC
InAsPCK-Ezg 19. 7. Receded GCAAAAAGATGGCCTCGACT
inA.4PCR-wg19.1..Reverse GTCAGCTCCATTCCTFCITTITTACG
1.T.IA,sPCR-segI9.7 .. Wild-Type C-.A.AAAGATGGCCTCGACA
ntAs.PCR-8e1. Recodeti ATGATTTCGGCCAAGAGGAGAGT
mAsPCR,-seg 19 .8...Reverse CGCCAATATCATCCGCAACATT
121.ksPCR-8eg19.8õWig-Type ATGATTlIGGCCAAGAGCAGAGA
InAsPCK-Ezg20. 1. RecoeW GGTAACTCTGCTCTTTTTTATGCATTAA
nrAsPC,R,segM.1..Reverse CTTAAACGTGAGAAACAC-GA C:(I=
mAsPCR-5e.R20. Mig-Type GGTAACTGIGC:ICTI-TITTTATGCATTAC
mAsPCK-Ete20.2...Recoded CGCTTTATTTTCTCTGAATCCTC-GGA
mAsPCR-see0.2... Re verse GGACGTTGGATCTTGTTTTTGTCTAC
naA,4PCK-5.eg2Ø2..Wi1d-Type CGC:TTTATTTTCTCGCTATCC:TGACT
niAsPCR,-seg2O.3.Recoded CCAGCTACCGCTATGTCTTCA
mAs,FCR-seg20.3..Revesse GCCGATCCAACCGTTAGC
TrsAsPCR-wg20.3.-Wiid-Type CCAGTTACCGGATATGAGTAGC
inAsPCR,seg,204..Recatied ATTTTCTTGTTGTTCT1TCAGATTCA.
mAsPCR-seg20.4..Reverse CTATATACATC'1.TL.AAACAGGCA.AGGTT
mAsPCR-seg20.4..Wild-Type GAATTTTCTTGTTGTT'CTTTCAGATAGC
inAsPC:R-see0.5.Recoded TCCLT:GGAGTGTTTCATCTGAT
tnAsPCK- Ntg2Ø5 . Reverse GCAAATCATCTGCGCCTCTG
mAciPCR-Feg20.5.-Wiisi-Type TCCCGTAACGTCTCATCGCTG
tuAsPCR-seg20.6.Recoded GACGGCGCTTTACCCAGT
rsiAsPCR-3eg20.6.Reverse GGCAAACCCGGAAAACCG
mAsPCR-seg20.6..Wild-Type TT ALCCAGC
iltA&FCR- fitm20.. 7. Recoded C.:CTICCIGACAGTACW-LACGACTA
raAsPCR-seg2(7...Reverse CCTACCAAACCOGCACTGATT
itIAFTE.R-seg:20.7..Wild-Type GC:TTECTGACAGTACIAAAAAAGC.ICTE
mAsPCK-8eg20.8.Recoded CCTGA_AG.'AGAAGATTTAGTGATGAGTAGA
raAsPCR-srg2B.S.Rever8e CCATT:T.A.GGGCTGATTTATTACTACACAC
111ACR-seg20.8.Witd-Twe CCTGCAA_AGA.kGATTTAGTGATCAACAAT
raAsPCR-seg21_1 = ..Recoded GrrA .. iC-CCGCGATCGTGAAG
tnAsPCR-stg2:1.1...Rever5e ATATCACCGACTTTTCCCGTCTTAA
mAsPCR-seg21.1...I.Vitcl-Type GTTATGCCGCGATCGTGTAA
rnAsP-CR-stg21.2.Recocled CIGGCACAATATCIGGCAGITTC:
InAsPCR-seg2:1.2.Rever8e AA 'GACATTGGGATTAGCAGCAGTA
niAOCR-seg2.1.2.:Wild-Type ',GILAL-AAAATAICTGGCAGTITT
triAsPCR-sg21.3..Recoded GTCAAACCAGCCAAAAACCGA
niA.00R-scal .3..ReNx-ac TCTGATC-CIGAACCCACTAAACTTAT
DIA.vs,7,..CR-8eg21.3...W11d-Type GTC:A2.-ACCAGCCAAAAACGCT
-taAsPC11-mg21.4...Recocied GTC:GAGGAC.TACCATGAACAAGTTTC
it3A.,sPCR-seg214.:Rewrse GTTTGCATCACCGTTTC.'-CATTTT
mAs_PCR-seg21.4,..Witli-T}Te GTCGAGGACTACCATGAACAAGTTTT
raAsPCK-seg21_5.Recoded CAGTGTTTCAGACGGA.ATGAGAG
imAsPCR-seg21..5.Reverse. A.k.E.TACCICTGC.ICATGGTC:GIC
raAs:::,CK-seg21.5,Wiid-Type CAGTG.TrICAGACGGAAGCTTAA
113AsPCR-wg21.6..Recoded GTAATGCCAAATCCITCAGACT iAAATGA
ilLMPCR-seg21.6...Rievene GGIATGTGTTCTTGATGGCGAAAT
raAOCK-seg21_6..Witd-Type GTAATGCCAAATCCTTCACTCTTAAAGCT
raAsPCR-stg21 .7..Recocled TACAAATA.ACCATCICATCTGCCTGA
raAsPCR-seg21.7....Revermt TTGACTCAGAAGGGTGGGTTAC
raAsPCR-seg21.7..Wild-Type TACAAATAACCATCTCATCTGCCTGC
niAsPCR-scg21 .8.Recodeli GCGATCGTAGGAGTTTGAIGA
niAsPCR-seg21. &Reverse GACCGCTACAACICAGAAAAGAC
raAsPCR-stg21 .8.Wild-Type GCGAICGTAACTGTIGCTGCT
tt3A.sPCR-sen22.1..Recatini C:AATAATCGTAAAGGGGCAGTTTC:
raAsPCK-seg22.1. Reverse GCTGTAGATGCGGGGAGATATT
tuAsi--=CR-ieg22_ = .17Vi1i-Type. CAA:I A_ATCGTAGGGCCGTC.:AG
113,42i,PCR-seg22.2.:Recoded CTTTCATCCATGTCATTTGCC'IrA
r:LA C e. g 21 2. Reverse GOT ATCCiTCTGGCTGTA TTCGT
naAsPCK-5eg22.2..Wild-Type TTAAGCTCCATGTCATTTGCCAGC
111,42;PCR-seg22.3..Recoded TGTCTTTCAC,CGCCATCACA
raAsPCR-seg22 .3 _Reverse GCACTTCCCTC..GTTIGTCCA
tuAsPCR-seg22.3.. Wild.-TypT TGTC _____________ I 1 1ACCGCCATCACT
raAsPCR-5eg22 .4 _Receded GCTTCTGATAATACTCITCATAa,AITGAGGA
triAsPCR-seg22 4..Reverm GCAGCCTITAACTCCGATAACC
mAsPCR-seR22..4... Wild-Type GCTTC:TGATAATACTCTTCATAAATGCTGCT
nrAsPCR-.seg22 .5 ..-Recodtd. GGGCTTATCAATGTGACCCTATCA.
suAsPCR-8eg22.5...REVErSE CGGTCATGATTTCTGCAATACCTG
rrsAsPCR-8eg22 .5 ..\Vild-Type GGGCTTATCkTGTCA.CCTTAAGT
suAsPCR-seg22.6...Rfooded CAGTTTGATCACTTC.:GTCATTATAGAGAG
riAsBCR-8eR22.6...Reverse CGGTCTGTCACTC.IATTCGC
nrAsPCR-.seg.22:6..Wild-Type CA(47= GATCACTTC GTCATTAATA.GATAA
mAsPCR-8e222..7..Re3ded GA.A.C:CACAGAGAGAGTC;AATGATGA
rrsAsPCR-seg22 .7 .Reverse TGATTGACAAGGGT kl'777'7"7,,`AGCTATGAA
satAsPCR-seg22."7. Wild-Type GAACCACAGATAAAGTGAAGCTACT
InAsBCR-8e02.S.Remied GGC:GCTCGATCTGACACTT
nrAsPCR-.seg22 .B _Reverse TACGGAC:AGTGACAGCGTTG
mAsPCR-seg22.8..Wild-Type CGATCTGACATTG
113.4.0CR-seg23 I _Receded GGAACGTTITATGCTGOAGIT __ CTC
mAsPCR-segr23.1..Rfvem TCTGCCGGGTGATCTTGC
mAsPCR-seg23.1..Wild-Type GGAACGITITAIGCTOGAGITITTG
rPC:F...-.seg.23 2 ..Recodtd CGGTGATGACGC:IATCTTCA
DaAsPCK-aeg23 .2...Reverse CCATCAAGGGTAAAGCGTGATTTATC:
isrAsPC:R-seg23 .2 ..Wi.la-Type OGGTGATGACGCTAAGCAGT
InAsPCR-seg23.3..REcoded CIs¨a...AGA.:kAGATACAGGCTGGAAT,.AAG
rIT,AsPCR-seg23 .3 _Reverse GTATCCCACT. CAGCCCTAATCG
I.D.As2CR-seg23.3.. Wild-Type AACACAAGATACAGGCTG<iATTAA
mAsPCR-seg23.4..Recoded TAGATGACGGTTAGTTTCAGCGAGA
rIABPC:it-seg23 .4..REVErSE TGGAAGATGCCTGGGAATAIATGG
tuAsPCK-seg23 _WM-Type T.Ads,AIGACGGTTA,',TITCAGC:GAGC
rIT,AsPCR-seg23 .5..Recoded GAGAATGGCACCGACGAAAATT
in.A.s.PCR-seg23.5...Rf verse GTCAAGGTGTTCAGGCGTFTATTT
mAsPCR-seg23.5..Wild-Type GAGAATGGCACCGACGAAAATA
tuAsPC:R-seg23 .d..Recodtd. TGCC:GCAGT TTYCATTAGGAC.,,
tuAsPCR-5eg23..6..Reverse CATCAAGCTCAAAATGGATAACTGG
mAsPCR-seg23.6..Wild-Type TGCCGCAGTTTTCATCAACAA
tm.A.s.PCR-4eg23.7.Rec3ded CGGACA-ACTGAAAAGGCTGATG
ritAsPCR-leg23_7. Reverse ATITTITACATTITCGATAAATICATCTGCA
znAsPCR-seg23.7. Wild-Type CGGACAACACT AAAGGCGCTAC
islAPC.1Z-,ufg23,8..R.ec=oded CTCTAC:GTGCTGATTAACCTGTTGT
311,ksPCR-seg23.6..Reveis& GCATGGCTCCCGAAAA-TCAT
alis,J,'CIZ-5eg213..Mid-T31.3e CTCTACGTGCTC-ATTAACCTGTTGA
in.AsPCR-seg24.1..Recoded. TGTGAGGAGTGGTTATAC.AATAAGAAGTT
InAs.Kit4e224.1..Revesse .',:.,,AAAACTGTCGCCTTTAATACCAATG
InAsP13.-seg24.1..Wati-Type TGGCTGGASTGGTTATAGAAATA,A.G.,iiAGTG
mAsPCR-seg24.2..Recoded GATGCCATCGATGTGACCTC
mAsPCR-seg242..Kevel-sie TTCTTCCCAGACAC-CATCCAG
filAsPCK-8eg24.2.. Wild-T:1.13e GATGCCATCGATGTGACCAG
znAsPCR-seg24.3..Recoekd CGTTCCTGGTAATTGTATGA-AGA I GT
inAs,PCR,Ne24.3...R..evesse AGCCCTATTTACACCGATGATTTC
tnAsPCR-seg24.3..Wild-Type CGTTC:CTGGTAATTGTATGAAGATTGC
irtA&PCR-8eg2.4.4...Rzoode4i ACTGCTATCTTCAAATCGCTGATC.T
itR-.seg244.:Reverse AACAGAGTCAACAACAACAACAGAC
sm4sPat,NeR24.4..Wild-Type ACTCiCTATCTTCAAATCGCTGATCA.
inAsPCR-seg24.5..Recoded GCGCCAGTTGTTTCAGGTATG
mAsPCR-ieR24.5...Reverse C:CTATACC:CGGAATATGTAC:ATTGTGA.
mAsPCR,seg24.5..Witi-Type GCGCCAGTTGTTTCAGGTAGC
mAt:PCR-4eg2.4.6..Recoded TCCTGTTCTGGAGGGGTCA
.1nAsPCR-see24.6..Revme GGCAGGAACATGTTGALT I CGATC
raA.aPCR,Ne24.6..Wiki-Type ICCTGITCTGGAGGGG-kGT
isiAsPCR-stg24 .7.Recoded CACGTTCAGTCATTAAAGATTCCATGT
mAs,PCR-seg24.7.1tvelse CCATTTGCTTTTCCTCATTTAGAATCG
IliAsPCR-seg.24.7.WM-Type CACGTTCAGTCATTAAAGATTCCATGA
mAsPCR-seg24.8..Recoded GGCACAACGTGACGGT.kATCT
niA.PCR-seg2413..Reyes-se GCCACATACTTTATTCTCACCCAGA
niA&PCR-ieR24.8.. Wild-Type GGCAC:AACGTGAC:GGTAATCA
mAs'KR-8fg2.5_. 1. Reakted CGGGGCCAATACCTCACTAC
mAsPCR-seg2.5.1..RecErse CGGC¨ATATTCACGTTCAACTTCA
mAsPCR-seg25.1..Wild-Type CGGGGCCAATACCAGTTTGT
tuAsPCR-seg25.2..Recotted. TCAACACCTCAGATGAAGTTATTCTTTCT
rs.LA.PCR-st,g25 2..Revuse TCTATTGCCAGATTGACGAAAGC
InA2PCR-seg25.2..Wati-Type TCAACACC:AGTGATGAAGTTATTC:TTAGC
niAsECR-seg253..Remimi TTAC=TTT AG CATA TT ACGAATGACATAXEGT
InAs PC.R-sez25.3 Reverse GCACCTTCGCCAAT.ATTCGC
mAsPCR-seg25.3..Wild-Type TTACTTCAACATATTACGAATGACATAATGC
ksPC.R-geg25.4..Rec.Dded GCGGC.4AAGAAGATGAAGCAGTA
anARPCR-frAg25.4.Rever8e TTACCACCTAAATGAAGCGG-AAGA
trA8PCR-seg25 .4. :Virikt-Typ, e GCGGGA.A.GAAGATGAAGCAGTI
anAaPCR-stg25.5 .Recoded ATTTCACTTTCCCTTCTC.'.G.AAAAGC
inA5PCR-seg:25.5 _Reverse TCTWfTGATGATTTTCGTGTI
InAsPCR-seg25.5 _Wild -Type =ACTTTCC=C:TCGAA_A.AGT
inAl.cPCR-seg25.45..Recoded. TGAAAGC:ATTTGAAGGTCATGCG.A
sr,As.PCR-seg25.6..REverse CCGT.GCCATTGAACTGCTG
IrsAaPCR-seg25.6..Witd-Type GCTAAGCATTTGT AAGTCATGGCT
mAsPCR-seg25.7..Recoded CGCTACGACCA_AAG
InAsPCR-seg25.7 Reverse GAAGAAGCAGGTC:TGGGTCAG
inA!.cPCR-seg25.7..W-Type CGCTACGACCGGGAAC.A.A
trAs.PCR-seg25.8..Revatied ATTCACTGAACTGAACCATCTGGATATC:
sts.AEPCR-seg2S.S _Reverse GGAGAGCCCGGTAT.AGCC
mAsPCR-seg25.8..Wild-Type ATTCACTGAACTGAAAACCATCTGGATAAG-
InAsPCR-seg26. ..Ree.,.-Aied CCTTCTCCCTGAAT,CC,'GAAATAC TT.
mAsPCR-seg26.1..Reverse ACATTCGTITTATTLICTICTITACAGCCT
mAsPCR-seg26.1..Wiki-Type CTTGTTGCC:TGAAAGCGATATTA
3ysAaPCR-svg26. 2 ..Rec.a&I GAGTATGAAGATCGGGCGATICIT
suAsPCR-seg26 . 2 õRe-ex-sse CAGCGTTT'TG.AICTCTITACCTAC ATC
3lAsPCR-seg26. 2 ..Wi-Type GAGTATGAAGATCGGGCGATITTA
mAsPCR-seg26.3..Recoded TCCGATAAATTCCATTATGCCGGAGTA
alAsPCR-8eg26.3. Reverse AGTGCGTG ATGAA TGGA TTG
frA.sPC'R-seg7..6. . _Wild-Type AGTGATAAATTCCATTATGCAGGTGTC
suAsPC.F..-seg26.4..Revaded CGTATTTCGGCCATCAGTGATG
alAsPC.R-svg26.4.R.elmise GTGGATTGACGATGACAAACC
Ir,AsPCR-seg26 . 4. :Wik1-T7y-pe CGTATTTCGGCCATCAGACTGC
mAsPCR-8eg26.5. Reed GCTGACC.AAATGACC.AGATATGAAG
filA,cPCR-seg7..6.5..Reveose GCGCCAAACTATGCCGAAG
inAsPCR-seg26.5...:Wild-Tn.e GCTGAC:CAACTCCAGATATT:
frAA5PCR-seg26.6.Recocteci GAAGAGA.ITTATCGTGGC:ACCTC
Ir,AsPCR-seg, 26 .Z.Reverse CGGCGGTGATICTCAGAAATTIT
In.AsPCR-seg26.6. Wild, Type GAAGAGAT __ I ATCC.T'TG'GCACCAG
alA,sPCR-seg7..6 ..RecoJed. C: __ I rfiCAAATACAACCATGCTG6A
InAsPCR-8eR26.7..1.everse AAGTCGGGGAACTCTTCT=GA
tnAsPCR-s267..Wild-Type TTGTTCAAATACAACGATGCAGGT
mAgPCF..-8eg26.8.Reo-Aed CTATCTC:TTC.;AACCGGTGATCCTA
niAsPCR-seg26 .S.Reverse GCAGCAGMCATAACCGAA_AAG
mAsPCR-8eg26.8.Wiid-Type CTAAGC.'CTTGAACCGGTGATCTTG
231AsPCRg27 .1..Recoded TTTATCCGCAAACGCATCTGTC
InAsPCR-seg27.1..Reveae AAAGGTGGCAGGATGTTTACGA
r...1AsPCK-seg2..7_1..Wild-Type TTTATCCGCAAACGCATCTGAG
1ak8PCIZ-seg27.2..Reccided AGAACTCAC.CATCTTTTATCGCAATT
ra_ks-DCR-sn27.2..Reverse CAACTCACCGAAGAACAGTACCA
mAsPCR-seg27.2..Wild-Type AGAACTCACCATCTTTTATCGCAATA
mA&PCR-sieg2.7.3..R.e.catie4 CCGGATCGTCTACCTCTGCTA.
mAsPCR-seg27..Reverse GCCAATGGAAAGCTGATG iI CA
mAsPCR-seg27.3..Wild-Type CCGGATCGTTTACCTCTGTTG
mAaPCR-8eg2.7.4..R.e.codeti GTTCA_CTTCTTGTTGTTTCATCATTCTCA
InA.00R-seg21 4..Reve.Ese. CTTTACCAATAC:CTGAGATGTAAACGG
mAs.PCR-8eg27.4..Wild-Twe GTTCA_TTGCTTGTTGTTTCAIKATTCAGT
niAsPCR-5eg27_.5..Ret....otted C-',ATTATCTACCGCTGTAICTGGAGTATC
ralks.PCR-seg27.5..Revesw GATATTGATTAAGCGGCGAAGAGTC
mAsPCR-...41.07 5 ..Wiki-Type GATTATCTACCGCTGTATCTGGAGTATT
mAsPCR-seg27.6..Recoded TCAATCAGATGACCAGAGTA(:TTTGA
nlAsPCK-seg2..7_6..Reverse CGCGGGATGATCAATATGCTG
tnA5PCR-seg27.6..Wi1d-Type TCAATCAGATGACCGCTGTACTTACT
inAsPCR-seg27.7..Recoded. AAACAACA-ACGACGCAACCCTT
rriAsPCR-..seg27 .7..Revene TTCGAAAGCAAAATCATCACGCA
InAsPCR-seg'27.7..Wild-Type AA.A.CAACAACGACGCAACCI7C.4
mAgPC13..-sieg2..7_g.Reo-Aed AAAGITCAAAAGAGA1 IATATCCCTTCTTCT
PCIZ-seg27.8.R.eves8e CACGCCATCCTGATCCATATGTATA
rnAs:PCR-stg2.Wild.-Type AAAGTTCAAAA.GAGATT.kTATCCCTTCTTCA
mAsPCR-seg28.1..Recoded GGCGGTAC-GGAGTTACGAAG
illAs2CR-sleg28.1..Reverse TTTCATTTGCTTATGTGC:T.GGTCA.k
mAs:PCR-stg23_1..Wild-Type GGCGGTAGGGAGTTACGTAA
raA,c.PCIZ-seg.28.2..Recoded CTTGTTACAAAGTAACTGGGAGTTTATGA
mAsPCR-seg2g2..Reverse CGGC;TTCACGGCTA_4.' ATGATAAC
mAsPCR-seg28.2..Wild-Type C:TTGTTACAAAGTAAGAATGGGAGTT. TAA_CT
mAsPCR-8eg28.3..Rec3deti TTAAAATGGATAAGAAGCAAGTAACGGATC:
ILAsPca-seV8.3..Revense CCAGTAGC.GC;GCGAATTTATG
rEsA8PCK-Ritg2&3..Wiki-Type ITAAAATCGATAAGA_AGCA,AGTAACGGATT
231AsPC.R-see8.4..Rran1ed TGAAATTITCATCCGTCAGTITGAAT
irt24.&PCR-8eg28.4.._Reverse CATA ATGTGGT.GCGGTACAC
TriA,sPCR stg.28.4.12Viki-Twe TG.A.;=kATTTTCATCCGTCAGTTTGA
inAsPCR-seg28.S.._Recoded ATCTGGCTGGCACAATATTACTC:TT
rei..4.8PCR-5.tg2Z..5..Rever8e CGACGTTAT. T(L4CCAGGTGTAGA
C.R-see8.5..1,17i1d -Type zsLTC:TGGCTGGCACAATATTACTTTG
niA8PCR.-Ritg2S..6..Recoded GC:TTTEACTITCGCTGCCACTA
rriAsPCR stg7.8.6..Reverse CTTTATAAGCC:GTGAGTACTICTICAA
a-8eg28.6...W114-Type =GT. C.AkTTTC.GCTGCCATTG
TriAsPCR-stg28.7..Recoded GGCTT.TGCAATGGTTACTT.C:TGA
inAsPCR-segS.7..Recerse GTCTTTAATCATACCAATAACTCAGATGCC
ra..A5PCR-mg28..7..17iiki-Type GGGTTTGC.AATGGTTACTTC ACT
niA.PCR,stg28.8.Rec.okied CGTTCATGCTTACTACGATATTCTATCA.
m..A&PCR-8eg28. &Reverse GCTGCTG.TICT.GACTCGGT
rriAsPCR stg2S.8.141.7i1c1-Type CGTTCATGCTTACTACGATATTTTGAGC
rnAsPCR-seg9.1..Reeatiet1 TGGCCATCGC:TGTCTGGT
ra..4.&PCR-Kg29.. . Reverse GGCAATAAC:CGACACAATAAGCG
7CAsPCR-se29.1..-Wild-Type TC.-GCCATCGCTGTC:IGGA
InAsPCR-seg29.2..Recedcd G-TTC:TAAAGGATITTATTGAIGCACLuG
raAsPCR,stg2.2..Rever8e GAATGGCGGTGATZAAGGTTA.GGA
mAsPCR-seg29.2..Wild-Type FiCIAAAGGATITTATTGATGCACTTALa
1.1.1Asl?..-Ntg29.3...Recocied CTACATCCACTAAATCATTACAACTCCTGA
g29.3..R CGCTACTC.:GGAC:GCTATGAA
ra..AaKR-Kg29..3..Wild-Type TTA.C.ATCCATTAAATCATTACAA.CAGCTAG
rak,PCR- tg2.9 .4..Recoded GIGITGCTGI(:GA:ICCGGTA_
mAsPCR-seg29.4...keverse CAAG(:G-GTGTCTGTG,s..GTTATTAATC
mAsPCR-seg29.4..Wild-Type GTG,TIGCTGICGATCCGGIG
r1.-31A.,3PC.R,se.5..Recatied TC:CIGTGAGCGCATACAGTC
1.1.1A&PCK-Ntg29.5...Reverse AGAAGGGTATGAGTAATAAGGIGGGA
rrs.A.AX`R-8eg29.5..Wild-Type TCCTGTGAGCGCATACAGAG
mAsPCR-5eg29.6..Recoded TCACTGGAGTTGTACC.,TTGTAGAGAAG
ruA PCR-stg2.9.6..Reverse CTTOCCGCCTCCTGITTIG
snIPC:R -seg=t.6..1-Vitd-Type TCACTGAGAGTTGT..kCGTTGTAGAGTAA
raAsiiKR-Ee.g29.7. Receded CATAATTAG.AATGCCGTGCCATG.
.mAsPCR-seg29.7....Revelse GCCTATCCTTCCGGTGCTTT
mAsPCR.-seg29.7..1,Viid-Type CATAATCAAAATGCCGTGC:CACTC
.inAiPCR-seg29. S.. _Reciaded GCGGAACCCAGATAAGCAAG
inA5PCR-segZ9,8 _Revers& CGTTTTGCCGCCGAGATC
mAsPCR-8eg29.8..Wild-T5Te. GCGGAACCCAGAT AAGCTAA
ii:PCR.-seg30 .1 ..Recoded CAAAATAGGGAATAATCGACCACATTGA
mAsPCR-seg36 .1..Re verse CTTTGGTCAGTGTGGCTTGC
tylAsPCR-segn_ 1 ..Wild-Type CAAAATAGGGAATAATCGACCACATACT
mAsPCR-seg30.2...Rtczoleti CAAGGGCCGCAGCT
.inAiPCR-seg30.2.. _Revel-se GGTACTGGACTAAATACCC ATCCG
-1...:AsPCR-seg30.2.:Witti-Tn3e CAAGGGCCGCAGCTITTAA
InA8PCR-sEg30.3...Reooded GCGATATATCCCGAAAGCCCTAG
clAsPCIt-seg3ti.3..Reverse TGCAAACCCTGAAACGGAATC
.rnAsPCR-seg30.3..:WikI-Type. GCGATATATCCGCTTAACCCCAA
.173A0CR-seg30.4...Recocied CCRXAATCCTI::GAAGCACTC
rnAsPCR-seg30.4..Rt vets& CCAAATACGCC:GIGCATCAG
trIA5PCR-stg,30.4..Wi1ti-Type CCTGCAATCCTCGAAGCATTA
mAsPCR-seg30.5..Recadeti TTCGAGTGATGAGATTTTGCGAAATTTA
.mAiPCR-st,g30. S.. Reverse AAG-TAAGCTC.:TGC:ACIT GTGGA
n-As.lu'CR-seg36 -Type TTCGAGTGAACTGATTTTGCGAAATTTT
naAaPCR-seg30.6.ReaKIM ATCGC:CTCGGTCGTTTCT
tuAN,PCR,seg30.6.Reverse CA __ it, # .GCACCGTCAAACAGTG
.ITIAOCR-se.g30.6..Wati-Type ATCGCCTCGGTGGTC:AGC
5PCR - stg 0, 7. Reooded GGC:TTGATCCGAAGAAAACCT
mAsPCR-seg30.7. velm: GCCC-CCTGTAGACCTTCTT
alAsPCIt-seg3ti.7..Wild-Type GGTTGGATCCGAAGAAAACCA
mAsPCR-seg30.8..Recoded TCACCTGGGAGCCATTC-G
.tilAOCR-seg30.E....Reversie GTAGCTGGTCA ¨GGGCGTAC
mAsPCR-sez30.S.. Wad-Type -.F.CAC::CTGACTC4CCATTGC.
.ITLAOCR-se.g31. _Recoded CTATAC:CGATTACC:CGAC:GCTA
inAsPC11.-w,g31.1..Revore CGCATCGGTTTTGGCGTT
mAsPCR-seg3I. -Type CTATACCGATTACCCGACGTTG
nIAOCIt-scg,31.2..Recodeel GTCGCGGAATTTATGTACCAGTCA
mAaPCR-seg31.2. _Egveaw C;ACGAAATACTTCATCAGAC.kCCCA
ITI4APCR-sEg31.2. Wild-Type GTCGCGGAATTTATGTACC:AGAGC
trAsPCR-sez3I .3 .. Rtzzoied GCL-C.-CATC:TTITGGCTCA
na.A sPCIL-seg3I.3..}b1 verse GGGACTGGCACTTCTTCT. GG
Ei:A5PCR-srg3.1_3..Wild-Type GCCGCATCTTT.TGGCAGC
arlAsPCR-seg31..4.1es:vded ACCAGATTGCCCTGAAC=CA
mAiPC:R-stR31. Reverse CCCATAGGTTCAACGACCAGAT
mAsPCR-seg31.4..Wild-Type ACCA.GATTGCCCTGAACTT7.AGT
.ff:AsPeR-sag3.1.5..Recode,d GAAA GGCTG-GTC.GIGCATA
33.ANPCR-,seg3.1...5..Reveysie TCTATICGTCGCCIACTIGCC
tLA sPCR-seg.31.5..Wild-Type GAAAGGCTGGTCYGTGCATC
EiLA::iPCR-seg31..6..Recoded CGGTTGTCATTG TTG..kACTCAAGT
.illAsPCR.-seg3I..6..Reve-sw GATGATCLAAATGZATCCGTGCA
.in_kill,C.R-sag31. 45_ .Wild-Type CGGITGTCATIGTIGAACTCGAG A
mAsPCR,seg3i..7.Recoded GCTGGAACA.CAATAAAGGTTTTTGT AACT
mAsPCR-seg31.7.Rewme CGCCGTGTGAGCATTTCA
tiLAAPCR-seg3 I.. 7.Wi1d-Type GCTGGAAC.ACAATAAAGGTTTTT GTAACA
.1rAsPCR-ses3i. Z.Recodeti GCAATTAGCGTCCGTAGTGAA
.171AEPC:R-sEg31. &Reverse TGTCCGTCGATGAAGATCACC
sr.PCR,seg3i. 2.11Tik1-Type GCAATT!-UCGTCCGCAAACTG
nsAsPCR-seg32.1 . _Rgq.5.xted GCTCATCTGTCCC:AAC:GATCA
trJAPCR-seg32.1...Revef se CACAC.TGCCAGACCGTAG
mAsPCR-seg32.1..Wild-Type GCTCATCTGTCCCAAAGAAGT
.171A5PCR-mg32.2. Ree.ixied TITGC:CGTCGGT.TTICTGITT.T.A
snAsPCR,seÃ32.2 _Reverse GTATTGTGATGATGCAAGTC_:rAGAAA
in.A.:KPCR-sEg32..2_ WM-Type TTTGC:CGTCGGT..TTTCTGTTTTT
ni.4,;PCR-seg32.3..Recoeled AA CTTAACTCTGTC:TGGGTC TTTTCA
mAsPCR-seg32.3. _Reveaw CGCGACAGACATTTCATGACG
B.1.44PCR-seg32.3..Wild-Type AA T A_A_ACAGCGTCTGGGTMTAGC
ir..AsPCM,seg32.4..ReeadEd CCACCACCAGATGTTCAGGA
PCR-sEg32. 4_ Reverse GCGCAAACTACITCTTC:AGGTAAA.
mAsPCR-seg32.4..Wild-Type CCACCACCAGATG-firAGGT
mAsPCR-seg32.5._RevxteA AAGGACTGGCGATTGTGATGT
PCR-seg32.5..Reverse AG TGCTGTCaATGA GAATAA GGCA
ILAsPC:R,see32.5..Wil:1-Type AAGGAC:TGGCGATTGTGATGA
,tp_4513CR-stg3.2.6_Recocied CAGC:TGGACTTC:TCTCTTCC:T
ir..AsPCR-seg3.1.6..Rt verse AAICTTCTCATTACGTAG'GTCTGCTT
.mAsPCR-seg32.6....Wik.1-Type CAGCTGGACTTCTCITTGCCG
PCR-seg,32.7.Receded CGACCGTCGGACC:CTT
mAsPR-geg32.7.Reverse CACAAGAGATATGCAGGACACT
InAsPCR,seg12.7.Wild-Type CGACCGTCGGACAACCIT A
mAiTsCR-5eg32.8__Recoded T ATAAAAATCACCCAACCTACTACG
inAsPCR-seg32.3..Reverse CTTA TGATTAAGCGCCTATCATATCGC
mAsPCR-geg32.g. Mild-Type GGTATATCACCC.CCCAA,=kATCCT
slAsPCR,seg3.3.1...Recotied GCATC.CCTATGGCGAGTGAT
mAsPCR-seg33 .. Re wise AAATGGGCGAATACTAC.AAAGGC
inAsPCR-seg3.3.1..Wild-Type GCATC..CCTATGGCCAGACTC
mAsPCR-geg33.2. _Recoded CGACCC:CTCCCC:AAATGA
AOCR-seg3.1.2__Reverse GGCTKkkCAGATAATLGTQaATGA
raAsPCR-seg332...Wiki -Type CGAC:CCCTCCCCAAAGCT
grA,i_KR.-8eg33.3..Recoded GCTGGAATCAAATAAAGCCGAAC
IrAs..0CR-seg,33.3..Reverse TTATTACCGCCCATs. CTC:AAGGG
niAIPCR-seg3.1.3. .70A-Type GCTGGAAAGCAATAAAGCCGAAT
alAsPCR,seg3.3.4..Recotied GCATCGACTATGAAATCCGCTCA.
mAiPCR-se,g33. 4_ _Reverse GGTGGCAATGATGA.!&AAGCAGAATATA
riksPCR-seg,33 .4. :Wild-Type GCATCGACTATCTCCGCAGC
mAsPCR-geg33.5._Reezded CCATCAAGCAGACCGTTTAGT
InAsPCR,seg3.3.5..Reverse AALTGATGGCGGCAACAACTIC
mAiPCR-seg33. S. .Wiiii-Type CCATCAAGCAG' CCTGTTC A AC
inAsPCR-seg3.3.6..Recoded CTGATAGCG..A.CACTGC I I 1 CTG
mAsPCR-geg33.d. _Reverse TICGGCGATGACOGGGAT
slAsPCR,seg3.3.6..Wild-Type CTGATAGCGACACTCCTTTTC:GC
mA5PCR-seg333. Recoded AG TACCCTTGATTAC TITA.AC:CT =GA
inAsPCR-seg3.3.7..Reverse 4,_+ i IT...TGCMGGTGCTATTGG
mAsPR-geg33.7..Wild-Type AGTACCCTTGATTACTTTAACCTTGCT
AIPCR-segii.S__Recoded CyTTTCATTACCGACATGCCCAAG
raAsPCR-seg331. Re wise TGGTCGGTCAATGGAGATTATTC:AT
inkiPCR-seg3.3.6..Wild-Type GTTI'CATTACCGACATGCCCTAA
r_AsPCR-seg,34.1..Rer_aded TAATCAGTAIT AAGTCGCXGAAGTGA
niAIPCR-seg34.1.__Reverse ATGGCCTGGCTATATCGTTACAC
alAs.,9CR,seg34 50Vild-Type 'TAATCAGTAITGAGACGGCGTACT
mAiPCR-seg34.2__Recoded AGAATCTAGCCATCA=AAACTC
mAsPCR-seg,34.2..Reverse AAGTTQTCGAAAGTAGATTGCAGATG
mAsPCR-geg34.2..W1ld-Ty2e AGAATTTGGCCATCATCAGCAACAG
snAsPCR,seg34.3..Recotied CAATAACGGCAACCACGAAA.GA
mAsPC.:R.-seg34.3..Roberse TGACCGTC:ACCAATAACTCGAAT
Ea:UPC:R.- mg34 .3 ..Wild-Type ,CAATAACGGCAACC.ACGAAGCT
231A sPC:R-seg34:4 lecoded TGTTTGATAATAATAGGCCCATTCAGC:T
in4sPICR-8ee4.4.1eve1se AATGCCACCACGCCACAG
mA=iPC:R.-Feg34 .4 ..Wiki. -Type TUT iTC-',ATAATAATAGGCCCATTIALA
imA&PCR-seg34.5....R..ecoded Gi-'s.ACCGGATAGACCCAL=LGA
111A&PCR-Neg34 .5.. Revers& CGATC:ACCGCCAAGCTTAIG
snAsPC.R,seg34 .5..Wikl -Type CTACCGGAT,CCCAGGC:T
mAsPCR-Neg34.6.Recocled TCTTGAACAGGGIGCAATTCTCTC
fnAcPCR- seg34 .6...Reverse TICCACCACGAACAGCTCT
InAsPCFL-seg34.6....1.11i1d-Type TCTIGAACAGGGTGCAATTTTAAG
risAsPCR-1:rg34 .7 .Recoded GAT AAAA.GATCTCAAICAGTACTGGTITICT
rsiAsPCR,scg34.7.Reverse ACITATCAATTTMAGCACGTCA.GG
.tuAsPCK-8e34. 7..W ad-Type GATA_A_AAGATCTCAATCAGTACTGGITTAGC
rEEAsiPCR-mg341.g.Recoded CAGTGCTCTAC:ATCCAACTTICA
111AsPCM-seg34.8.REIMIDE GAAGACGCC:ACGAATATCTGATIG
ra..4.,sPCR-Nzg34.g. cAGTGcrTAcArccAAciTAGc
Ftg3 5.1 ..Recocled AGATATCAATATTATCTGGCCGATG.ATCCIT
imAsPCR-seg35./.. Reveise CTTGCCGC:GC:iGITTIATGG
mAsPC:R.- F.t.g3 5. 1 ..Wild-Type GGATATCAATATTATCTGGCCGATGATCTTA
snAsPCR,seg35.1.1teeacial AGAAACGCGATTA=C"TTTTakGG
ra4&PCR-Neg3S.2. .Reverse _L=s.AACAGAATTTTACGCGGATC:TAAATC
InAsPCR,seg3 5.2 ..Wilti-Tyw AGAAACGCGATTACTICTTT ACTGC
InAsPCit-seg353..Rtcoded GAAAGATGCTCGGCGGTTGA
131.4512CR-mg35.3...Reversie CCGSCACCITTAACCAGITTAIC
InA sPCR-seg353...Wild -Type CTTA.,A-4.7,7GCTOGGCG.GTAC T.
in4sPCK-8e35.4...E.evaded CGAGGTCGTTTTATGCAGAGAA.
niAsPCR,seg3 5.4 ..ReNitrse TATGAACCAGGCTGTGAATATGCTAT
lasIss.a7R-Reg.35.4..W.M-Type 4:::G-.AGGTCGTTTTATGCAGGCTG
raAsPCR-Neg35.5...Recoded TGCTGGGTATGGACTACGGA
rmAsP. CR,seg35.5.. itt wise GC.TACAAAAAIGCCCG-ATCCTC
mA5PCR-8eg35.5 TGCTGGGIATGGACTACGGI
nsAsPCR.-Feg35.6:..Recoded GGA __ n* ATCAAACTCAGG,IATC-TATTCTGA
mAsPCR-8eg3535....E.tvelw C:AAAACTGCCGCGTACCG
im4siPCR-Neg3S.6,. Mild-Type GGATTTATCAAACTCA GGAAIGIATICGCT
InAsPC.R.rseg,35.7..Recatial GGITTCGAT TA TAIGGACCGCAAAC
nIAN,PCR-seg35.1..Rellet-se GCGTTATGCCAA AGTGATTCCA
113.AiPCR-seg3S.7..Wild-Type GGTITCGATTATATGGACCGCAAAT
mAsPCR-seg35.8..Recoded GCGCTCACTAAGT CCTGGT
mAsPCR-seg35.8....Reverse TTIAGTGAAGATITTACCGCGCTIAG
raAsPCR-seg35.8. Wild-Type GCGC:TC:ACTAASTCCTGGA
nsAi-PCR-sEg36. /....Recoded CTGAATACCCT. TAA A A TTGC:CTGGT
PCR-seg36.1..Reverse CGCCCACCAGATCATTTTC-',ATATTC
CR-stg36. .wild-Type CTG-AATACC:T:TAAAAATTGCCTGGA
n-AsPCR-sep.,36.2..Recoded .AITTGCGGTAATCACAATCACT. CA
mAsPCR-seg35.2...Reverse CAGGATATTCGTCATCAGCTCGA
DIAN,PCR-seg36.2.:Wild-pe AlriGCGGTAATCACAATCACAGT
113.AiPCR-seg36.3...Recoded CCAAACATGCCTTTC.ATTAGTTC:MA
InksPCR-seg36.3..Reveme ACAACTIAAACATCTIGGIATGGATATTG.AC
r3AIPCR-8eg36.3. Wild-Type CCAAACATGCCTTTCATTAATTCGCT
illA5PCR-8rg36.4..ltec3dea CGGAAT4.TGGCACTGATATGAA
triA sPCR-seg35:4...Re woe GCCCCCCTATTTCTGACACC
tr,A,.:PCK-seg36.4..Wild-Type CGGAATGATGGCACTGATATGAC
raAsPCR-seg36.5..Reeadtd TAGTGATGACGC:CAGAGATGAATTTCT
atA5PCR-,seg36.5..Reverse AGGCTG[AGTATTITCCAAAACG
mAsPCR-8eg35.5.. Wild-Type TAGTGATGACGCCAGAGATC-ATITCA
itIA.PC7.-seg36.6..ltecoded CCCGTCCGCTCGCTAAAC
r..-..AsPCR-seg36.6...RE caw CATCTCTTITTCATTG __ 11 CAGICGAAT
InksPCR-seg36.6. Wild-Type CCCGTCCGCTCGCTAAAT
raAsPCR-seg36.7.Recoded TTCAGAATATTCkyk_:T.TTCTCAATATACCTCA
inA5PCR-seg36.7.Re,,7erse Azs:TTCGisaAACCTGCAGCATGG
triA sPCR-seg35.7.. Wild-Type. TT CAGAATATTCGCITAGCCAATATACCAGT
irsA8PCR-sEg36.S.Rec3de4J ,AACGTATTATCCATATCAGCTTTCC=
raAsPCR-seg36.8.R.evene AGTGATGAGCGTGTCTGTAGC
inkiPC'R-seg36.8.Wild-Tyge AACGTATTATCCATATCAGTTGAGTAGC
m4sPC11-seg37.1..Recaded 'TATCTAAAACTTTCCTCTAACGGCTATCTC:
itlAiPC7.-seg37.1..lteverse GACATCTTC:G.GC:GGTGACT
raAsPCR-seg37.1. Wild-Type TATCTAAAA.TTAAGCAGTAACGGCTATTTG
n3AOC.R-seg37.2..1ecoded AACCTCCGTCACGCTATCAT
raAsPCR-seg37.2..Reveue TACGCACTITTCCGCCAGA
tnAiP. CR-seg37.2. Wild-Type AACCTOCGMACGCTAAGCA
silAsPCR-seg.37.3..Recoded GCGCATTCCTTTCCTGTTTTCA
,tp_4.0`C.R-ieg37.3..Keverse C:CAAACATITCGGT:AAA.CATCG.:1-
mAsPCR-seg37.3..Wild-Type GCGCATTCCTTTCCTGTTTAGC
1/AsPC:R-m,g37..4..Ret..,:ksiett T,A_TTAC,C.kACGCTCTTAA_AAC:ATCTGAC.G
in..k.sPCR,seg3 7 4..Revene. GCTGTAC:GCGATITAT.ATTOGC
f1148PCK-ITg37.4..W11):1-Tn3e T,4-TTAC.C.k.kC:GCTC:TTAAAAC4TC:TCTC.T
.trAsPCR-seg:37.5..Recoeled I6-A_AACACCCGCCCiAAA_,A1::
,tp_4.0`elt-Neg37.5..Keverse -C4CCCTGAGATS.LkrlAGTG
...,.8?.1::R.-seg37 .5_ .Wiiti-T Te TGAAACACCCGCCC;A.A.AT
1/AsPC:R-m,g37..6..Ret..,:ksiett GAA..CAT,CTCTATTGCT.G.ACT-ACTTTTAATC
in..k.sPCR,seg3 7 .6..Revene. GATICCTAGCL-C,LkACATOCG
nAsPeR-ITg37...W11):1-Tn3e GAACATAACTCETATTGCGAC:TTTTAATT
.trAsPCR-seg:37.7..Recoeled AGAGGGTIG-FTTA=GATCACGA
m4.0`elt-Neg37.7..Keverse C:AGGt: :GC:TeTCTer.::ACAC;
PC.-seg37 = 7 ..Wihi-Type AGAGGGTTGTTTATTCETGATCACG-fi
im_kaPCR-sr.R37.3.Rczded CGATGeTTCCTA:TTCGT.CGTGATT
in..ksPCR,seg3 7 .8.Revers& ACCACCCMCCCITTTICIT
f1148PCK-ITg37.8.W11(1-Type C:GATGTTAC:CTATTCGTC:GIGATA
.13.1APC:.R,seg3S _Recodect COAGC:ri.1TAGTAACCTGA
m4.0`elt-Neg33.1..Keverse Ge1TGATG-C7GCCGic:"ITTC
mAsPC:.-seg3&1.Wi1d-Type CGAGCTGC:A.kTIGwATAACCGCT
im_kaPCR-sr.R38.2..Re.c,adr4i CTATCAACTCTGaikCC:-GCTCA
in..ksPCR,seg3S 2..Revene. CGCCC:GTICTG,TOTGC
nAsPeR-ITg3S.2..W11):1-Tn3e TTAAGT.ACTCTSCAiMGCAGC
.13.1APC:.R,seg3S 3..Recodect GCGGCTATCGATTATTGGC:T
m4.0`elt-Neg33.3..Keverse iSTCATTTIC:G,C:CATTACCGCTI-
PNCR-seg3S .3_ .Wihi-Type GC!GGCTATCTGGATTA.TTGGCA
im_kaPCR-sr.R38.4..Re.c,adr4i GG.ATACCATTCGCCTG,A.CCTC
irAsPCR,seg3S 4..Revene. CGCAATCACATCCAGTTCGG
illAsPCR-82g33.4..W11):1-Tn3e GGATAC:CAT, TI:GECTGACt:-kG
.13.1APC:R,seg3S.5.Recoded CGGCTCAAAAGC,Ti,,IC:AGGAC:TT
m4.&PC:R-Neg38.:-.7t.Revesse GATICACC.AC:C:TGTACCAC:,421,TTC:
iliMPCR,-seg3E CGGCAGT=VifiGTACAGGT:ITA
im_kaPCR-sr.R36..Rec,adr4 TCGC-G=CTGAGGTAAGT.T.TI.
inAsPCR,seg3S .45..Revene. CACOTCGCCAGATTEIAAATT
ayLA&PC:R-8eg3.8.6..%Vi1d-T,Te TCGGGT.TTICGC:TGGIC7.2LIT'T
.silA0C.M,seg3S .7 _Recodect TCATCCCCICAGCCAItCTT
m4sPCR-seg38.7..Reverse GCCACGGTTCTGCTGATTG
a-LA sPCR-seg35.7 ..1,Vild -Type TCATCCCCAGC:GCCATCTTA
FILUPCR-seg38.e. _Receded TCAATAGTTACCAGCGCGTTTGA
TELAsECR-seg3 8.8...Reverse GCTTCGCGTGGGTGATATGTA
tr.A.CR-seg3S. S. Wad-Type TCAATAGITACCAGCGCGTTACT
-sr =.,AsPeR-seg,39.1..Recoded GAGTCTITCTTCCAGTATTCATCGAAAG
IDAsPCR-seg39.1.Reverse CACGAGGTCAACTTCATCTGC
InAsPCR-seg39.1.Wild-Type GAGTC:TTTC I IT._..CAGTATTCATCGAAGC
m.AsPCR-seg39.2...Receded AGCCTGCCCGTTA1 __ CTCA
mA&PCR-seg39.2. Reverse GTATGTTCCGGCCATTGTAGAATC
mAsP.Z:a-seg39.2..Wild-Type AGCCTGCCCGTTATTTCAGC
alAsPCR-seg39.3 . _Receded CGTTTTTATTCC:CGCTCC,TCA
inA4K:R-seg39.3..Reverse CAATGC:CAGAGCCAACGAC
mAIPCR-seg39.3õWiki-Type CGTT. TITATTCCCGCAGCAGT
mAsPCR-seg39.4. _Receded CAAACTATATGAAGCCAAAAACCGTCTT.
mAsPCR-seg39:4..Recene CAGGGTAAACG'CGGGT
raAsPCR-8eg39.4. Wild-Type CAAATTGTATGAAGCCAAA.A.kCCGTTTA
.silAsKR-seg39.5..Receded AAGATGTGAGTATGGGTCGTTAAA,LAG
2r..A.R-seg39.S. Reverse CAGCCACCTC:CGATTCC:T
.inAsPt R-seg,39 .5:Wild-Type CAAATGGCTGTATGGGTC:GTT.AAACAA.
inAsPCR-seg39.6...Receded 'GCATCAGGGCCAGTGAAAAAAC
TaAs:PCR- seg39.6:. Reverse TGCTCGCCCTAACCGTTATAC
arlAs?CR. -seg-39A ,Wild-Type GCATCAGGGCCAGGCTAAATAA
nIA&PCR-seg39.7.Receded CGGTCGTATTTTCTCTGGCTCT
mAsPC1t-seg39.7.Re-verie TC.GGTCGATTGAGTGACAGC
mAsPCR-seg39.7. Wild-Type CGGTCGTATTTTCAGTGGCAGC
mAsEK:R-seg,39.8..Recoded GTGAGAATATTAGATAGGTTGAGC:AGAGAA
mAsPCR-seg39. &Reverse CGTCTTGCATCACTTCACCTTTAAG
.silAsPCR,seg39 GTGAGAATATTACTTAAGTTCAACAGACT.'T
ir-.AsPCIt-seg40.1 õReceded CCACiGGCCGCTTCTTTTGA
fr....UPCR-seg40.1 . Reverse CCACCCATTGAGTGACCTGAA
mAsPCR-seg,40.1..1417-ild-Type. CCAGGGCCC:-CTTCTTTACT
ITL.A.PCR-Ezg4C1.2. Receded CCYGTGTACGGAATAATCAGTGA.
331.AsPeR,seg40.2...Reverse GOTTTACTTCCTGATGACCTCACT
inAsPCR-seg40.2... Wild-Type CGC27GTACGGAATAJVICAGGCT
ILIA.KR-seg40.3..Receded ..kAACTCTOCGTCACCCTITCC
inAsPell-se&403...REvene CGCATTITCGGCTA.1 ICGC
mAiPCR-seg40 3.:WiId-Type AAACTCTGCGTCACC:TTAAGT
srAsPCR-se00.4.REeakied GTTCACAGTGTCCTTGCATTATCTTTGATT
sPCE.-8eg40.4.S..everse TGCGGACGATCC.TGTAATACC
InAEPCR-.seg40 4.WiId-Type GTAGTCAGTGTCCTTGCATTATCTTTGATA
mAs2CR-mtg4D.5..Recoded CTCAGGATDl:GCCCATATCTCC
rilA8PCR-seg40.5..Rever8e ATrrCCGGCATCATCAACGC
inAsPOZ-seg:40.5...W-Type CTCAGGATTCG'CCCATATCAGT
trsA8PCR-seg40.6..Rectocie.a CGTAATCTICCTGCCGIGACG
.113Ai-PC.R-..seg423.6..Rever8e ACGITTGTGCTCTGAAATAAAA
mAsPCR-seg40.6..Wild-Type CGTAATCTTCCTGCCGTGAAC
.113AOCR40.7..Recoded GTACAGACA.GA AGAGAATGGACGA
tnA5PCR-seg40.7..Reveple GTTTGTGGIGCTGCGTGIC
raAsPCR-seg4D.7..Wi1d-Type GTACAGACAGAAGAGA.ATGGAGCT
mAiPCR-..4eg40.E.Recoded GCAGGGTAAGGGTGCTIC
LuAk:.,CR-seg40.8.Revenc GCTTIAAC1. EL&ATTTCTTTACCGTCAAC
.rnALPCR-5eg402.761d-Type GCAGGGIAAGGGIGCGAG
nrsAsPCR-se.g41 1. Ikezockd TGGACACTACTGCTGGCAATC:T
mAsPCR-8eg41.1..Reverse GCACATCACGCTC:AACTGAATAG
nrsAsiPC:R-seg41 1 d-Type TGGACATTACTGCTGGCAATCA
inA5POZ-seg:41.2..Rezoded TATCCATAGCAGGITTTGATGGTAAGA
raAsPCR-stg41.2..Reverse GTGCGACCIGICCGGATI
alA5POZ-segM.2...Wild-Type TATCCATAACAGGITTIGATGGTAGCT
inAsPCR-,ses41.3....Rezoicd AATCTAACTTCTCGCTGCAACTCT
ImAi-PCR-seg41 3..Reverse GCTTCAAAACGATCCTCTTCTGAAAG
niAsPCR.-,ses41.3.. Wild-Type AATCTAACTTCTCGCTGCAACTCA
raAsPCK-seg41.4..Recadea TCGTCACC:AGAAGCACAATGATAAG
PCR-see41.4_.Reversf TTITITTIACCCTTCTITACACACTITTC-A
raAsPCK-8eg41.4..Wild-Type TCGICACC:AGTAACACTGATCAA
.11-1AOCR-5eg415..Recotied CGTCTACTGGCAGATCAGCTA
raA5PCIZ-seg41.5..RevEse CGGACACGCTCGGCATAA
mAsPCR-8eg41.S., Wild-Type CGTTTGCTIGGCAGATCAGTTG
.nsAsPCR-seg41 .6..Recotied ACCGCACCATTGAACTCTCA
311AsPeR-sieg41.6..Reverse CGATTTCTTTGAGTACTACGGACAGA TA
rf.,AsPCR-seg41.6..Wild-Type ACCGCACCAITGAACTC.AGT
alkiPat-seg41.7..RecodEd TA=CAGTTTGCCCTTTICAGA
mAsPCR-seg41 .7 _Reverse Cr.r.AATCGGGITCTTCCAGTGC.:
mA8PCR-8eg41. Wiid-Type CAATT.TCAGTITGCCCTTITC.G.=
13-1,-UPCR -.sewn .8 .Recoded ITGATAGATGAGATTTCCGTITTIGAA
PC.R.-seg41.3.Revene AGeTC.TTITCGTC.ACTC.C:TTGA
Inils:PCR-m4 1 .E.Wad-Type TTGATGC TACTGATITCCGTTITC: TT
InA8.PCR-seg42.1..Reroded AGACACT.ICTACGGTGCAACITT
raAsPeR-ieR42./..Reverse CGAAASAAA.CCCTGCCGTC:T
snksPC:R-seg.42.1..Wild-Type AGACACTICTAEGGTGC:AACTTA
ra..A.00R-..Neg42.2.Receded CCATTGCCCATCAGCGATTG
inAsPC.R- g42 2 .Reverse TCTIG_AACC7Gt.'7ATAATAGGT. l'AGATA_A2s.1.7(i
inAsPCR-8eg42 .2. Wild-Type CC ATICICCCATC.A6CGATAC
mAsPCR -seg42 .3 ..Recoded CGCAGGAAGTGGAAGTC:TCA
inAsPCR-seg.:42 .3..Reverse TTGACCT1GAGAAATCAC.G T
niA8PCR-aeg42.3..Wild-Type CGCA.GGAAGTGGAAGTCAGT
raksPCR g42 .4 ..Recoded TGTTC:CGCCAGA. TAG
mAsPCR-seg42.4...Reverse GTGGTTCTGGTAG.A.TGIA T.TICGAG.A.
mAsPCR-seg42.4..Wild-Type GTTCCGCCAGA.TAGAAGAGC
mAsPCR-seg42.5..Reroded GACATC.CAGGAGTCGAGCA1 __ 1AG
yrsAgPCR-seg42. . _Reverse CCTGTATTACTCCGGCTCTGG-
anAs PCR-seg42 . S.. Wita-Type CTCATCCAGCAGTCGAGCATTAA
raAsiKR-,8eR42.6...Recoded TACTATGCAGGGCTCGC..AACTT
riutsPCR,seg42 ..Revet-Fie TeGGAATGAATTGAGATATCGCCTT
reLksPeR,Neg42..6..Wild-Type =TACTATGCAGGGC:TCGCAATTA.
In.A.sPCR -seg42 .7 :Rec...oded GCAATC'EATACCAGCACATA.GG.A
inAsPCR-seg42.7.Reverse GCSCAACTATCCCTGGGT
inksPCR -.seg42 .7 .Wad-Type GC.AA=CATACCAGCACATi-IACT
mAsPCR-seg42.8..Recoded CAATTTAGAGTCACGTTCACCACAA
1.11.45PCR-5eg42.. E _Reverse TTGC:C.TCACTCAATGACGATCA
111.4.SPCR-seg42 .8.. Wild-Typ GAATI IGCTGTCACGITCACCACAT
mAsPeR-ieg43./..Reeoded CiTC.TAC:CACTIATCCAGTCTIVGC
Ini-UPC:R-seg43 .1..Reverie GTTATCC:GGGGCATAGCGT
mA8PCR-5eg43.1.. Wild-Type TTTGCCACTT ATCCAGTCTICGT
ys3.A.PCR-8eg43 .2 ..Recoded GTGAAGCAGTGGTGATAAC:TAGAATAGA
milsPCR-seg43.2..Reverse TTTCAATAT&AAATA&CTTCiAT&GC
1.11.4,51`CR-&eg43.2...W1ld-Type (: TGAAGCACTIGGIGATAALTAAA-4.TACT
InAsPeR-seg43 . 3 ..Re codtd G3ATIGTGACCATUICTi,1:AL.
mAsPCR-seg433..Reverse CC GTCTITGG TITC T. GCTTITT'G-
tuAsPCR-seg43.3... Wild-Type GGATTC-T.6AGCATCTCTGCAT
mAsPCR-seg43 .4..Recoded CGGAAATAT'ITGATGGCAGACTGTAG
Cit-seg.43 4..Reveace CGGIGGTATGCGTGAT>
alAsPCK-._µ,eg43.4. .Wild-Type CGGAAAT,kTTTGATGGCGCTCTGTAA
mAsPCR-seg.43 . 5 .Recodtd CC:GCCAGGCTAATA.,4-4,TTCTGA
inAsPCR-seg43.5..Reverse. GCACGTCAGCATT. CTCATTA I-CFTC
mAsPCR-seg43.5..Wild-Type CCGCCAGGGGTAAT.AAATTCACT
inAsPCFC-seg43.6.1tecuded GATAATTTGATTAATTTCGTTGGCAGAAAG
mAsPCF..-eR43.t..Reverse GCGCTTC',4_TGTTTCCTGGTC
mAsPCR-seg-13 :Wild-Type GATAATTTGATT_A.ATTTGGTTGGC:GCTCAA
tnAsPCR-8eg43.7..Recoaed CGGCTGACCCAG T.kCAAGGAG
rrrAOCR -sere .7..Reverse IGGGAA' CGTATTIATC.CGCITGA
inAsPCR-seg43.7.. Mid-Type CGGCTGACCCAGTACTAACAA
tnAsPCR-seg43.g.Recadeii CAGCACiAGTGAAT_A.AGGATA-kGGT.a A
mAsPCR-seg.43 .S_ReyenE GGAGTGOC.iTTATATTTATGTAGTGATAGAGC
titA8PCK-seg43.8:Mid-Type CAC',CAGAGTGAAT2s,AGGATAAGGACT
rnAsPCR -seg44.1.Recodeti TATITA.TG AAACGACTCATTGTAGGCATCT
mAsPCR-8eg44.1.Rfverse A TA AGAC GTTGCATT ATTGTCCTGAAG
nrAOCR -.seg44 .1 .1.Vi1d2rype TATITA.T.GAAACGACTCATTGTAGCiCATCA
mA8POZ-seg.44.2_.Rec3ded GTGAAATCATT.CTCGCCCAGTAG
titAsPCK-se 444.2. Reverse GCTGCG TGCGTAATGACTAC
inAsIN7R-seg44.2..V.iiid-Typt GTGAAATCATTCTCGCCCAGCAA.
111.A&PCR-ses44.3..Reooded 'TGAGATAACCGTCATAGCACAGT
IrsA.00R -3eg44 3..Reverie CGTTTACT.T117GCTCGTCGGTT
1.-a-As.PCR-sep.44.11..Mid-Type 'IC:AGA-PA ACCGTCATAGCACAGC
mEAsPCK-se444.4..Remied GAATAGCGTTGATGACATTG'CAAG
inAsPCR-seg444...Re.veue G ATCTCATTATCGACGACATCAACG
InAsPCK-se $4 .Wi1d-Type GAATA.ACGTGCTACTC.kTTGCCAA
nr.A.00R -3eg44 .5..Recoded GTATGCTGGTGAAGATGACGTTTC
mAs.PCR-st17.44.S..Reverse GTCA L.k_vL:t::GCCATTTTCTT
mAsPCK-st444.5..WiId-Type GTATCPC IGGICTAAGATGACGITAG
inAsPCR-seg44.6...Recoded GICITC:CD4GTACAAGTTGGAC
raAsPCR-s,t A44..6..Reverse CAGCAGCGCACGACCAAG
nr.A.sPCR -Kg.44 .6...TAM-Type CIT. TGCCTGAA.G-TACAACTIGGAT
InA.sPCR-stt,444. . Recoded ACCTTTATCTTCGCGC:TT.ATGTC A
alkiPCR-seg44.7..Reverse ATCCATTIAACTAAGAGGACAATGCG
triAaPCR-seg44.7..17,7i1d-Type ACCTTTATCTTCGCG-TTAA.TGAGT
srAsPCR-seg44.S.Reeatied TTTCTCCGGAGTTTAAACAGTTCTTTTCA
trtAsPCR-seg441.. Rt vole C.CATGTGAGCGCAGT=G
mAsPCR-seg44.8..Wild-Type TTTCTCCGGAGTTTAAACAGTTCTTTAGC
mAsPCR-seg45.1...Rf coded GCATCAAAATCGA ICGCACTATCA
inA5PCIR-seg4S. _Reverse C.TTTTTCACGTTCGTTAGCCTGT
InAsPCR,seg45.1..Wilti -Type GCAAGCAAATC:G.ikTCGCATTAAGT
la,AiPCR-seg45.1.1tecoded TGACTTCGG=GCATGGTAGG
PCR.-seg45.1.Reverse AA.kATTICGAGOTTATTAATCATGTCAGATC
triAaPCR-seg45.2..Wig-Type TGACTTCGGGCATGIGC A AT
srA3PCR-seg45.3.Reeatied GCTG __ I'fiCGCCATGTCAATTCT
trtAs PCR-seg453 Rt vole CGGATTCAGACGGATTGACGA
mAsPCR-seg45.3..Wild-Type GCTGTTTCGCCATGTCAATAGC
inAsPCR-seg45.4..Recetied TGAAGATCTTACCCCATCACAGTTTC
irsAiPCR -seg45.4. Reverse GGAll'sfAGCCCGACAC:CTT
InAsPCR,seg45.4..Wilti -Type TGAAGATTTAACCCCAAGCCAGTTTT
mAsPCR-seg45.5..Recoded CGTCGGCTGGGTAGACATTAG
sr.A.r''CR-seg-45.5. Reverse TGATGTCAGGGATTTCACGCA
anAaPCR-seg4S.S. .Wig-Type CGTCGGCIGGGIAGACATCA A
inA5PCR-srg45_6...Reeoded CACGACCCCC.ACATAAAATATT G'AAG
trtAs?CR-seg45_6.. Rt vole CCTTAAAGTCGTTG,'TC:'14TCXG
aRAPCR-seg45.6. Wild-Type .CAGCTCCCCCAGATAAAATATTGCAA
inAsPCR-seg.45.7.Reeoded TTA.TCAACGCGGA.= AGAGATTGACT
irsAiPCR-seg4S. ?Reverse ATGACTICAATGCCX_:AGTTCCT
InAsPCR,seg45.7.Wild-Type TTATC:AACGCGGAAGAGATTGACA
mAsPCR-seg45.S.Recadesi GCGCTAkrf.TACAAGAAG.kTGAATCA
sLIAPCR-seg-45.8.Reverse AAGGTGC iTTITTTACGCATTTTTAACA
3nAaPCR-seg45.8.3.141d-Type GCGTTGAAACTACAAGATGAAAGC
inA5PCR-srg46_1...Reeoded GTATTGCCT ATIGTTIGTTCTAGTGTGGA
trtAsPCR-seg461 Rt %Tat TGAAGAACTAAAATTCACCTCCGTT
tpAaPCR-seg4ti. ..Wild-Type GTATTGCCIATIGTTIGTICTAATGIA.CT
inAsPCR-seg46.2..Recatied AACAATCGCCG12:TTICGTAAG
irsAiPCR -seg46.2.. Reverse ACAACGCCTGAAATGATGCATAAA
InAFCR-stg46.2..Wild-Twe .AACAATGGCCGCTITCGTTAA.
mAsPCR-seg46.3...Recoded TACCTCAGCGACAAGAGCG
mAsPCR-seg46.3..Reverse TTCGGC,TTTGAGTGTCCGT
mAsPCR-seg46.3..Wild-Type TACCTCAGCGACCAAAAACAAAC
mAsPCR-seg46.4..Recoded GCAGAAATCAGACCGAGTGA
mAsPCR-seg46.4..Reverse GTTATGGTCGCGTGAAGATTGAAG
mAsPCR-seg46.4..Wild-Type GCAGAAATCAGACCGAGGCT
mAsPCR-seg46.5..Recoded GTTGTTCATATTCAGTACTTTACCGACTG
mAsPCR-seg46.5..Reverse CGCTGGCTGAAATICATC
mAsPCR-seg46.5..Wild-Type GTTGTTCATATTCAGTACTTTACCGACGC
mAsPCR-seg46.6..Recoded CAACCGTAATTAACAACGCCATCT
mAsPCR-seg46.6..Reverse AATCAGACGTTTATTGGTGTGTTTACG
mAsPCR-seg46.6..Wild-Type CAACCGTAATTAACAACGCCATCA
mAsPCR-seg46.7..Recoded CCGAACAAATCCTCGCCCTT
mAsPCR-seg46.7..Reverse GAACAGACGAATGCCTTCAGAC
mAsPCR-seg46.7..Wild-Type CCGAACAAATCCTCGCCTTA
mAsPCR-seg46.8..Recoded CGATGTGCATTGAGTTGTGGTC
mAsPCR-seg46.8..Reverse CTTITITTACATTGTGCTGCTGTCG
mAsPCR-seg46.8..Wild-Type CGATGTGCATTGAGTTGTGGAC
mAsPCR-seg47.1..Recoded TTACACCTCATGG'AAATTGCTGATAT
mAsPCR-seg47.1..Reverse JUCCTCTCITATLATTATGGGTATTCTACGG
mAsPCR-seg47.1..Wild-Type CTACACCTCATGGAAAAATTGCTCrATAA
mAsPCR-seg47.2..Recoded GTCAAAAACCAGTGCCTCAGA
mAsPCR-seg47.2..Reverse CCGCATTTTGTCZAGCATCTC
mAsPCR-seg47.2..Wild-Type GTCAAAAACCAGTGCCTCGCT
mAsPCR-seg47.3..Recoded TATCTTCGGTGCCAGCCATGA
mAsPCR-seg47.3..Reverse CGGTCTGTCACTGCACGA
mAsPCR-seg47.3..Wild-Type TATCTTCGGTGCCAGCCAACT
mAsPCR-seg47.4..Recoded CAGCAGCAGTGTGATCCCTAG
mAsPCR-seg47.4..Reverse CGGTAGCGCTAGGTCATTTTC:T
mAsPCR-seg47.4..Wild-Type CAGCAGCAGTGTGATCCCTAA
mAsPCR-seg47.5..Recoded AGATTGGCGGTAATAAAATGCGAT
mAsPCR-seg47.5..Reverse GGAGTCGC:GGTTC:TACACTG
mAsPCR-seg47.5..Wild-Type AGATTGGCGGTAATAAAATGGCTG
mAsPCR-seg47.6..Recoded CTGACGACGAAACCTTTGCAT
mAsPCR-seg47.6..Reverse GTCGATACAGACCAGCGATAGAT
mAsPCR-seg47.6..Wild-Type CTGACGACGAAACCTTTGCAA
mAsPCR-seg47.7..Recoded C.=CCTGATTAAAACCCGGA.AG
mAsPCR-seg47.7..Reverse ACC AGTATCACATCGACTCAGAAC
mAsPCR-seg47.7..Wild-Type CTGTTC:CTGATTAAAACCCGGCAA
mAsPCR-seg47.8..Recoded GGGTTCTATGGTCAATGTA.,A.CC.CTT
mAsPCR-seg47.8..Reverse CAGGACATTTGGTATTTGGCTGAA
mAsPCR-seg47.8..Wild-Type GGGTTCTATGGTC.,AATGATAAAACCTTA
mikaPCR-seg48.1..Recoded TAATCCAGTGCAGATAAC:CTTCA.GA
inAsPCR-fiega. 1 ...R.everse AGAGCCTGCACT.TCT:TTCTGG
inA5PC.11.-seg48.1..Wii4Type TAATC.CAGTGC' AGATAACCT.T.CACT
ni.A.sPC.R.-sega. 2 ..Recoded CA CTGAIGCTACCGGTAAAACTI
smAsPC3.-seg:48 .2.. Revere CGCACAGTCAACCACCATG
inAsPeK-fieg43.2....WiWT",olte CACTGATGCTACCGGTAAATTG
mAsPC.R.-se.g48.3..itecotitel CGGC.AGATGACTTCGGTTCA
tnAsPCR.-seg43.3..itEVEME T(IGALTAcGTcAiTrCAG
tuA,..sPCTR.-seg48..3...Witd-Type CGGCAGATGACTTCGGTAGC-
mAsPCR-seg48.4..Recoded CGTGGCGATGCGMA.= keTT
mAsPCR-seg48.4..Reverse CATCCAGITC. ATC:GGTCGTTTTTAG
mAsPCR-seg48.4..Wild-Type CGTGGCGATGCGTCAATTA
1/IA0:JCR -sega. 5 ..Recoded CGACCGATGGATTTACGAACAAG
mAs,PC.R-seg48.5..Revrox GTCTGTGGAACGGCATCAAA.
nuksPCIR -sega. 5 ..Wi11-Type CGACCGATGGATITACGAACTAA
mAsPOZ-se.,048 Recoded GAACATGCGTGACGAGCTATC-
nsAs:PCR-5eg4S..6...R.evea.se CGGCACT.AGATACGC:AGAAG
inAsPC.11.-seg48 .6.. WiiciType GAACATGCGTGACGAGTTAAG
tnAs;PCR-8eg4&7. .Rewdest TCAGC:GTTGATCATCACAC.CA
smA8PCR-seg:48 . 7.. Revere GTCGGCCCGTGTGGTATG
ITSAsPeK-fieg43.7...WAI:rytte TCAGCGTTGATCATCACACCG
m.A.sPCR-.se.:.343..S.Rec.ocied GTC7TTGATGATAGATATAGTC-G.A.CATCTG
tnAsPCR.-seg43.8. Reverse ,GTIA_ATGAGGGA TT TATCAAAACGATGC
tuA,..sPCTR.-seg48..8..Wild-Type GT(.2- r'TTGATGATAGATATAGTGCCATCGC
mAsPCR-seg49.1..Recoded CCAAATTCTGAGTGTCCCCATGA
mAsPCR-seg49.1..Reverse GCGGTGTGGCTGGAAAAC
mAsPCR-seg49.1..Wild-Type CCAAATTCACTGTGTCCCCAACT
1/IA0:JCR -seg=19. 2 ..Recoded GGGCGTTCTCTGGGCAATT
mAs,PC.R-seg49.2..Revrac AAGATCATCCGCGTTCCT
rriAsPeK-seg49.2...Wildq-ype GGGCGTTCTC.TGGGCAATA
ni.A=-se.g49 .3.. Receded CGCACCCAGTTCTTCGTTAAG
mAsPCR-seg49.3..Reverse GCCTGTATGAAGCCGTTAAAGC
mAsPCR-seg49.3..Wild-Type CGCACCCAGTTCTTCGTTAAACAA
mAsPCR-seg49.4..Recoded CAGGGGCTTOCCCAGTC.A.
mAsPCR-seg49.4..Reverse GTTTTGCGCCACCAGACC
mAsPCR-seg49.4..Wild-Type CAGGGGCTTOCCCAGAGT
mAsPCR-seg49.5..Recoded CGACAACCGCGACAACTC
mAsPCR-seg49.5..Reverse GGGAC:CAACGCTGTTTCG
mAsPCR-seg49.5..Wild-Type CGACAACCGCGACAACAG
mAsPCR-seg49.6"..Recoded GGTCCOTTAGCTGCTCIGA
mAsPCR-seg,49:6...Reverse GAGGATTAGGIGGTOA.AATAAAAAGGC
inAsPCR-seg49 6..Wikt-Type GGTCCGTTAACTC-CTCOCT
EaAsPCK-seR493..R.ealded GCAGCOGTACACCTTCTTTCA
TriAsPCR-.seg49 7..Reveise ACCGATG.ATAGCGCCTGTG
mAsPCR-seg,49.7..Wild-Type GCAGCGGTACACC=GAGT
mAsPCR-seg49_:E..aeakded TCTGCGGTATTGGAAGTCAGATTC
mAsPCR-seg49.S..REve18e GAGGCACGACGICT=T
TriAsPCR-.seg49 S.:Wad-Type TCTGCGGTATTGGAAGTCAGA=G
mAsPCR-seg50.1.Reeoded GTTTCTAATOTTCTCTGTCTCACTA
mAsPCR-seg50_1.Reverse CAATCGCCGTGCATTCATCAT
inAsPCR-seg50.1.1,Vild-Type GTTTGGATTCIATGTTCTCTGTCAGTTTG
alAsPCK-seg50.2..Rewded OACCATCGCCTCGTCTGA
mak,,i2CR-se.2..ReN:w.se GGAACAACAGGCGC'rTATGAAA
mAsPCR-seg502. Mid-Type: GACCATCCsCCFCGTCGCT
InAsPCR-seg50 3..Recoded CGCTAAC-TATCGACCATTGTC:TACTA
EaAsPGR-8eg503...R.everse =TTGCATTTCCGCTGATTCAAG
mAsPCR-seg50.3..Wild-Type CGTTAACTATC.GACCATTGTTTGTTG
inAsPCR-seg,5314...ReoxitA ACCGATAACTATGGTGC.4ACTCC
inAsPCR-seg50 4..Rewisie 'TTC:CAGACTCACTCTCCGGTA
EaAsPCR-seR5024..Wild-Type Ala:GATAACTATGITTGAAGACAGT
ITIAOCR -.see() 5.Reco4ed CTCAGGCGTTTTCTGTTCTTTTCiATGA
mAsPCR-seg50.S.Rtverst TGCCAGTTTTCACATTCTTCAGTT
mAsPCR-seg503:Wild-Type CTCAGGCGTTTTCTGTTCTTTACTACT
mAsPCR-seg50.6..Recoded CGAACTAATTGGCATGGACTCT
TriAsPCR -.see() 6..Rel,Tei8e TTTCTTGTGAG'TCGGCC'TGAT
mAsPCR-seg50.6"..WW-Type CGAATTGATTGGCATOGACAGC
mAsPCR-seg503..aeakded CCAGCC.TTTAEGCAGCGTCTT
mAsPCR-seg50.7..Reverse CGAL:GGCATCCATTACTICC
mAsPCR-seg50.7..Wild-Type CCAGCCTTTATGCAGCGTT. TA
mAsPCR-seg50.8..Recoded GGAAGTITTACACCTCATATzs...CGCTT
mAsPCR-seg50.8..Reverse AGGAATGTTGGCGTGGCT
mAsPCR-seg50.8..Wild-Type GGAAGTTTTACACCAGC!TATAC:GTTG
rnA5.PCR-seg.5 1 1 _Recoded CCCGGCTT CAGTTCGITAG
11.1AsPCR-seg51.1..RelitT8e CCCATTCATTAAGTAACTCTGCACTTG
rriA.4PCR-gS1. 1. .Wild-Type CCCC:-GCTTCAGITCGTTAC
mAsPCR-seg5 .2...Rtzeded G:TGIAACCGTAGACCTCCT GA
mAOCR-stg51.2.._Reveise GTGGGCGTGTGGTGTCTC
smAsPCR-seg51.2..Wiki-pe =GTGTAACCGTAGACCTCCTGC
rriAOCR-stgS1.3..Recocled AACTGATTC-GTATGGTCGCMAA.
LuAsPC1k-sep 1.3 ..Revers'e GCTGGTAGATCTCTTCACGGT
mAOCR-stg51.3...Wild-Type AACTGATTGGTATGGTCGCTCAC'T
ra.A5.PCR-see 1 Recoded CTGCCCAAC:CTGTTCGGAAAG
raAsPCK-5eg.51.4 vase CAAAAC:TAAGTACTCTATTTCGCAGC TT
InAPC1k-seg,51.4..Wil1-Type CTGCCCAACCTGTTCACTTAA
111AsPCR-seg5 .5..Rtzeded GCATCGCATCCATCACTGA
tnA5.PCR-see 1 5 Reveme GAA.GATAAATCTATCGCGCTGCTG
InAsPCK-seg.51.5..WikI-Tylm GCATCGCATCCATCACGCT
mAsPCR-seg51.6..Recoded AAGCACCATTATCGGCTGTGA
mAsPCR-seg51.6..Reverse GTCGGCGAAGTCAACTCAGA
mAsPCR-seg51.6..Wild-Type AAGCACCATTATCGGCTGACT
mAsPCR-seg51.7..Recoded CGAGGTCAGTTTCAACCGTAAG
mAsPCR-seg51.7..Reverse CGTAAA.A.A.CTC:GC:CGCTG' kAA TA
mAsPCR-seg51.7..Wild-Type CGAGGTCAGTTTCAACCGTTAA
mAOCR-fieg51.S.Recosted CTATT GA.AAAC. AA TGTGCCGGTGAATC
smAsPCR-seg51.8.Reverse CA.7.7CCTC AGGTGATTCiTCATTTTTG.A
mAiTICR-sEgSl.E..WiWType CTATTGAA.AACAATGTGCCGGTGAA TT
/liA.sPC1k-sep2.1..Recoeled ATTACGCTTATCCCGACGCTT
mAOCR-fieg52. _Reveise AGACGTGCCTGATCTTCCTG
tnA5.PCR-see2 1 ..1-Type ATTACGC:TTATCCOGAOGTTG
raAsPCK-seg52.2....P02,:eded CCCGCATCCAGATAGATACAAGA
InAPC1k-se2.2..Rever8e GCAGGCATTTGAGTTCAGGTC
111AsPCR-seg52.2...Wiicl -Type C:CCGCATCCAGATAGATACAACT
rnA5.PCR-segS23 _Recoded GTTTGCAGGATTTCGCGTAG
tr...,,ksPC.3.-seg5.2.3..Recerse CTC:AACATACGCAA.CCTGGTG
irsA5PCR-seg52.3..Wi1d-TYpe GTT.T.GCAGGATTTCGCGCAA
mAsPCR-seg.52.4..Recoded AGAGGAAGTTGTGCAAAACGTG-
tilA5PCR-mg52.4...Reverse AGCAAGCTACAAACGCGAAAC
mAsPt: a-seg5:24..Wild-T'}.w AGAGGAAGITGIGCWACGGC
ELAiPCR-stg52.5..itecoded GCAGACGACCAATCAGAGTTGA
snAs.PCK-seg52.5..Reverse CGG'AIGGTGCG
mACR-seg52.S..WIWType GCAGACGACCAATIAG.AGTACT.
mAsPCR-seg52.6..Recoded CAAGGACTGTATGGTAATCACGAAG
mAsPCR-seg52.6..Reverse CGTGAACATC..TC:GATCTTATCTTATC:C
mAsPCR-seg52.6..Wild-Type CAAGGACTGTATGGTAATCACGCAA
3nAaPCR-seg52.7....Rec3skd ATCGCTTATTTGATACAAGTCCTGAAAG
ErsA5PCR-NegS2.7. Reverse GCGGGGCTTTCTATAAAC:GAT
mAsPCIL-seg52.7...Wad-Type .A.TCGCTIATTTGATACAAGTCCACTCAA
-fieg5.2. &Recoiled CCAGTTGCT=CCGGGTTAAG
ksPeR.-segf.,2.8.Reverse TATCGCTATCCCGTCTTTAATCCAC
mAiPCR-stg.52..S.Wild-T3Te CC-kGTTGCTCCGGGTTC..
sr,A.,.:PCIR-seg53.1..Rec..eded AAAGTGAACAGATATTAATAATTTTGCCTGA
an.A&PCR-8eg53..1....Reverse TTTCAGGTGGATTACTTTTCTCAGGT
mAsPCR-seg.5.3.1..Witiol-Ty-pe ACAATGAACAGATATTAATAATTTTGCCGCT
trAsPfilt-seg.53.2...Recoded CATIATGATCGGCrTGATTCCTCA
1/.1A;sKR-seg:.-.-3.2..Reverse AC;TTAAAGT i TATTATC=CCTGCATC:A
InA&PCR-8eg53.2. Wild-Type GATTAIGATCGGCTTTGATTCCAGC
mAsPCR-seg53.3..Recoded GCGTGGTAGCTAATGATCGTT
mAsPCR-seg53.3..Reverse GCTCTCCCCA.GTCGATATTCTC
mAsPCR-seg53.3..Wild-Type GCGTGGTAGCTAATGATCGTA
itaAsPCR-seg53 4..Recoded GC.All's_TGCACC-CTGGATAITCTITC
nAFil3CR-stg.53.4....Reverse CATGTTGCACCATATCTTCCAGGA
triAsPCR-seg53.4..Wild-Type GC:AATGCACGCTGGATATTTTAAG
tylAaPCR-seg53.5....Recoskd GCAAACAG=GATGCCCIA
suAsPCR,seg.53.5..Reverse AAAACAAGAACAAGAAkGGAAGGGTI
mAsPCR-seg.53.5..Wad-Type GCAAAC.kt.: .111...:GATOCCTI 473
0.1..4,0CR-seg53.6..Rec.aded TAAGTGAAGAGAGAAATTAGTGGACGATC
alAsPCR. -seg53.6....Kevene GTCGTAIAAAAGGD-ITGAATTGTGGGTT
ErsAiPCR-stg53.6..Wild-Type TAAGTGAGAGAGAAATTAGTGGACGATT
tr¨AsPC17.-seg,53.7..Recoded GTTTCCATATGCCAGOC,TATC: AA-J.-
mAsPCR-seR531..Reverse AGTTGCCTTAC:GATTTTTGAGAGC
mA5.PCR-seg53.7..Wikl-Type GTITCCATATGGCAGCCTATCAAA
-mAsPC13..-8eg53_8..Recsda CCATCTCTGCCAGCACTTrTAG
mAs:PCR-wg53 S.Revene ITCG=GTATGGCGTAGG
inAsPCR.-,seg53.8..Witd -Type CCATCTCTGCCAGCACTTTCAA
mAsPCR-seg54.1..Recoded CTTCCGCCAGCGTTGCTAG
mAsPCR-seg54.1..Reverse CGAGAGAA.AGTGGCGCAAC
mAsPCR-seg54.1..Wild-Type CTTCCGCCAGCGTTGCTAA
mAsPCR-seg54.2..Recoded TTAATGATATCGGGCTACTACACTCA
mAsPCR-seg54.2..Reverse GA.A.C.GCGCACCG-TACC
mAsPCR-seg54.2..Wild-Type TTAATGATATCGGGTTGTTGCACAGC
lilAsPCR-8eg54.3..Recaded CGTGATAGC:ATGTCATCAAAACCAAG
311A8FeK-stg543..Reverse GGTCGTCTITGACC.IGGAAAG
inAsPGR-se.g:54.3..Wiiti-Type CGACTTAACATC_iTCATCAAAACCCAA
ritAa,C11-8eg,54.4..Reezadeti GCTATGGCGATCTCATC:TGTAC
inkEPCR-segfot 4.:Revefse C:ATCCTGAEX;TACGACCTGAAA
mAsPCR-8eg54.4..Wdd-Type GCTATGGCGATCAGTAGCGTAT
m.A.sPCR-wg54_5.Recodecl CGCGAA_AGTCCTACTICTTCAAATAG
inAsPCR-seg54.5.ReTerse ATCCACCCCTTCCTCTGTTTATAA
mA8PCR-.wg54 5.Wil1-Type CGGCTTAATCCTACTTCTTCAAACAA.
nikiPCR-seg54.6..Recoded CTTATTATCGCCTCCAAAGTGTCA
mAsPCR-seg54:6..Reverse CGCGTTGGTACTC:TGC:CA
inA3PCR-seg54.6..Wikl-Type rTAATTATCGCCTCCA_AAGTGAGC
inAsPC11.-seg54.7..Recoded GGCGAACCAGACGAATCG
rakiPCR-rieg54 7..Revene 'GGTAACGCACGGTGGTCA
inAsPCR-seg54.1.. Wild-Type GGCGAACCAGACGAAAGC
-mks KR-8eg54.8..RecsaM TGCCTG.AGACATGLAGAKiACTGA
Elks PCTZ-seg54..8..Reveae TCTGCGAAAGATTGATGGTATTCC
mksFt.a-seg54:g..Wild-Type TGCCTGAGACATGAACiA.kTACGCT
mkt-PC:IR-we-6 1.Recodrd GAAT.ATGCGCCTATGACAAATGCT
mAsPCR-8eg55.I..Reverse ATCAC.ACGAGAAGTICAGAAGCAT
rnAsPCR-seg55_1..Wild-Type C.VLATATGCGCCTATGACAAATGCG-
mAsPCR-seg55.2..Recoded TCCAATCGGTATCAATkkTCTATCTCA-kTCA
mAsPCR-seg55.2..Reverse AATCTCGGITCCTATT I AATGITCAGAC
mAsPCR-seg55.2..Wild-Type TCCAATCGGTATCAATAATTTA.TCTCAAAGT
mAsPCR-aeg55.3...Recoded GATAACGGCAATTTCTCGGAAC:TT
rflAsPCK-seg55.3..Rtvelm CCfl.CGCTTCACCTrCCAG
rakiPC'R-seg,55.3..Wi1d-Twe GATAACGGCAAMCAGCGAATTA
riaAsPCR-seg55.4...rucustit TATC:ACCCGCAACGTC:AATCA
rrIAAPCR-seg55..4_ Reverse GTGGCCGATATAACCGA.GAAC
lav-IsPCR-sez55.4..lifik1-Type TATCACCCGCAACGTCAAAGC:
mAsPCR-seg55.5..Recoded GGCTACAACCATCACCTTTCG
mAsPCR-seg55.5..Reverse CACTGAGTGAACTGAGCCTGA
mAsPCR-seg55.5..Wild-Type GGCTACAACCATCACCTTAGC
mAsPCR-seg55.6..Recoded AAAATACTTCCAGCCTCTATTTATGTACTT
mAsPCR-seg55.6..Reverse C:AATAAACCGCAGCGCAGAG
mAsPCR-seg55.6..Wild-Type AAAATATTGCCAGCCTCTATTTATGTATTA
mAsPCR-seg55.7..Recoded CGAAAGGAGAAACACTGAIGTCA
mAsPCR-seg55.7..Reverse AAGAGATCCGACGAAATGAGCAT
mAsPCR-seg55.7..Wild-Type CGAAAGGAGACACTGATGAGC
mAsPCR-seg55.8..Recoded TCCCTGGATCAATTTATCG'AAGCAT
mAsPCR-seg55.8..Reverse GAAATCGITCGGGAA.GGCAATC
mAsPCR-seg55.8..Wild-Type AGCC.TGGATCAATITATCGAAGCAA
mAsPCR-seg56.1..Recoded AACTGTATGAGCGTTATCAGCGA
mAsPCR-seg56.1..Reverse CCTCACGGCTAGGTTC'GC
mAsPCR-seg56.1..Wild-Type AACTGTATGAGCGTTATCAGAGG
rnAsPCR-seg.56.2..RecentEd. GCAGOCATTCGTGTTCYTTTGA
in.A5PCR-seig56..2.Reverse CGATCTGTTTATTGCCACCACTG
rnAsPCR-seg5i5.2. Wild-Typc GCAGCCATTCGTGTTC __ r IGCT
InAOCR-sEg56.3_Rece<lect TCCAGTCCTAGCXAGTGTGA
mAsPCR-seg.56.3..Rewrse GGGAGAAATCACCGCCATG
rriAiPCR-segS6_3..Wili-Type TCCAC.-TCCTAACCAC-TGGC" :T
inAsPCR-seg56.4..Recoderi TGTTTACAGGCA.AATTGAGGTAGTAG
.inAiPCR-seg54.4..._Reve&-se CAGTTTTTGCCCTTGTTCCGT
mAsPCR.-seg55.4..WW-Type TGTTTACAGGC/W.TTGAGGCAATAA
rnAs.PCR-seg56.5. _Rg coded TATTTTTC:CATCAGATAGCGCTTAGGA
clAsPCR-seg,56.5..Reverse GGAAAATTATCGCCACCATC2CTT
mAsPCR-seg.56.5..Wik1 -Type TATTTTTCCATCAGATAGOC-CC:TAACT
.1r3A0CR-seg.5.6..61tecoded. GGTTTCTTCACCGTCACTGA
rnAsPCR-seg56.6.RevzslE. GCATAATTC.CCGTCATCAAAC,TTCTAG
CR-seg5t1.45...W ad-Type GGTTTCTTCACCGTCACGCT
mAsPC.R-seg55.7..Recaded TIGCCGCCA-LAATAITCGTATGA
.113AOCK-seg5.6. 7 _Reverse ACTCGt..+TICGGA A
laks2CR-seg56.7..Wild.-Type TTGCCCreCA-4 A AT A TICGTAGCT
InAl-PeR-sEgS6. 6 .Recoded GC:TTTTCAGGCTTACTCGC:TTTCC:
takiPCR-sep,56.S. Reverse CTGACCGTTGATATrGTTGCCT
raA8PCR-8eg56.8..W.ild-Twe GCTTTTC AGGCTTACAGTTTGAGT
illAv=PCR-seg5.7.1..Recoded AAA TC GATCGA_ACTCGGTGTATC. A
LIAsPCR-sew,57. 1 ..RIEViTSC GTC:TTTALtliCATCAC..-GATCACATC
iriA.00R.-seg.57. 1 _Wild-Type A.AATCGATCGAACTCGGTGTAAGC
mAsPCR-se,g5.7.2...ilemded GGITA:kACTICC:ICCGCTCiTC:A
.ITLAOCR -seg57. 2 ..Revealle CGCG-AACCAAACAGCG"TATT
LuAsPCR-seg57 .2.. Wild-Type GGTTAAATTACCTCCGC.',TGAGT
.113AOCR qieg5.7. 3 ..RecocW CCGCACTGGTTATGGGTTTTT
iiiMPCR-seg57.3...Re3;ef.se GTCACGGCCATCAAGCAC
rnAsPCR-wg573....W11.4Type CCGCACTGC.iTTATGGGTTTT_A_
PCR -segS: 7. 4..Recoded CTAAACAGCAAGI:GAATCAGTCA
mAsPCR-seg57.4..Reverse CAGAGATGTTGAAGAAGTC:G'AATGC
.n2A0CR -segS7. 4. _WM-Type CTAAACAGCAAGCGAATCAGAGC:
snAgPCR-se07.5.Recoded TCCAGACGGAAGATACTGAATACT
.11-1A1PCR-5eg.57.5 _Reverse CA.GAGGATTTTCGGGATGTCG
inAg.PC11-seg5 .5.Wild-Type TC:CAGAC:CG.GATALTGAATAC A
mAsPCR-se.,45 .7.6 ..Rewded TST-TAAGCTGACCA.A.CACCATCT
ANPCR-seg57.45...Rerse GCCACCAGCCiAATAGGTCA
mA &PCR-aeg5 ..Wiki.-Type. TGTTAAGCTGACC&UCACCATCA
.inAiPCR -wg.57. 7..Recoded CGTCGGTACTTATTGGTGCCT
;;.-AsPC.11-seg57..7..Revroe GGGCTATCTTGACCGACTGAC
.mAiPCR -segS. 7. 7 Wild-Type CGTCGiGTATTGATTGG:TGCCA
mA.00R-seg57.8.Recoeled GCGAACTATCTGGATAACTTCTCCCTI
rnAsPCR-seg57.8..Reverse TCGACA=CCAG.ACCAATATGC.
tilAPCR-sey,57.8.Wild-Twe GCGAACTATC.TGGATLkC.T.TCAGTTTA
n.lAsPCR-seg.58.1..itewded CCGGCT. TCATCATCITCGAAAG
.17.3.4.IPCR-segSS. I ..R.e3.,ersie CGAG,..4.A.AGTGAAGGGCGATAAACT
illAu-,C.ft-se.g58.1.. Wild-Type CCGGCT. T.C.ATCAT=CGATAA
.mAiPCR-segSg. 2 ..Reesxted GCATrGACAAGTTTTTTAACCTGTGAT.AG
srAsPCIE.-seg58.2..iteveose TTATCATGTGGCGTAAAGAAACAGG.
raAsPCR-se,g53.2...W11.1.-Type GCATTG.' ACAACTT-7TTTAACCTGACT. CAA.
trIA5PCR-seg56.3 ..Recocied CAACCGCTACTTCTATCETC:TTC:TT
tylA&PCR-seg58.3.acverse CGAAGATCGTATA=AAGCAATGATT
mAsPC:R7segg 3 :WiId-Type CAACCGCTATTGCTAAGTTI-GTTG
ErsA43CR-aeg53.4..Recolied GGTATGCCTGTTCCCGTGA
inAsa.".R.,..seaSS 4..Rewa-se TCATCGTCTATTCAACGGGCAA
Ers4sPeR,NeR53.4.. Wild :Fype GGTATGCCTGTTCCCGGCT
trAsPCR-seg58.5..Recojeci AGATTGACCCTAATAATAACCCCTCA
alAs.?CR-aeR5S.S...R.Everse CTGGTACTGGATTGTATTGATCGCT
mAsPCR-seg58.5..Wild-Type AGATTGACCCTAATAATAACCCCAGC
frA,.:PCIZ,seg5g_6.Rectided CTCTTAAATTCAAACTGGCCE_.1 CTT
irLtsPCR-seg56.6.1everse AGTAAGTGC:CGCCAGTGAG
DIMPCR-seg513_6:WiIci, Type GCCTTAAATTCAAACTGGCCTTGTTG
InAsPCR-seg58.7..Recolied CCGCACCTGATCCCATCA
suA.sPCR4;eg.58_7..Reverse CGTCGAGCATCTCCTGTGG
InAsPCR-8eg58.7..WiLizrype CCGCACCTGATCCCAAGC
suAgPCR,.;eg53_8..R.es:oded CAATCACAACCAAACGACTCATCA
mAsPCR-seg58.S.Reverse GAACCAGTCGCCCCAGGA
sr,APC2.,scglig .8 .WW-Type CAATCACAAC:CAAACGACAGCAGT
InAsPCR-seg59.1..Recotied AGCCAGTTCCGGGTCGATT
srAfiPCR,sttg59 ..Revea-se GTTAACC-GCTGAAGGACATCG
RIA.&PCR-,8m59.1..Wi11-12[Fge AGCCAGTTC:CGGGTCGATG
mAsPCR-seg59.2..Recoded GGTACGAATCGACATATAGCCTGA
mAsPCR-seg59.2..Reverse CATTTGTTGTTATTTTGCAOGGTTTTTG
mAsPCR-seg59.2..Wild-Type GGTACGAATCGACATATAGCCACT
mAsPCR-seg59.3..Recoded ACAACTATAACTTCTGTCTTGATGGTCTT
mAsPCR-seg59.3..Reverse G=GC.'CGGACATITTIGAG:A
mAsPCR-seg59.3..Wild-Type ACAACTATAACTTCTGTCTTGATGG1 I. G
tr-AsPC.:R7segfc9.4..Recoded ikACGAACGTAATACCAAACCCTCT
EzA5PCR-5eg59_4...R.evelm: CGTCCAGTCTGAAC:CTTTGC
arlAsPCR-seg59.4.11ifikiType AACGAACGTAATACCAAACCCAGC
frsASPCR-aeg59_5..Rewded TGAGATGTATGAGICGCCAATAGA
mAsPCR-seg59.5..Revelse CCTGAAGATAAGTAAGATTTGACATAACCG
Ers.,U3CR-aeg59_ 5.. Wild :Fype ACTGATGTATGAGTCGCCAATGCT
trAsPCR-ses59.6"..Reco4ed TATTCAGGCCATTCATAA.GCAGAAATGA
sliAPCR,seg59 6.11evuse TTCGTACACTAATTACCCTTCGCA
mAsPCR-seg59.6..Wilii-Type TATTCAGGCCATTCATAA.GCAGAAAACT
snAliPat,seg59 .7.1lecocied AA GA AGAGCTTTCA.AAGATTCGTTCA
iss.A.sPCR-:ceg59.7.Revesse CGTGATGAGTGICCGCCATA
inAsPCR-seg59.7.Wild-Type AA.GAAGAGITGAGTAAGATICGTAGC
Tri21.Ã13CR-segS9. F._ Receded GCAAAAATSGACTOGTACCIGAAG
.tuAsPCR-seg59.3..Reverse 'IAGATIGTCC',ICAGGATGCCITC
IriAsPCR-segS9.S..W114-Type GCA.e2LAAATC.4.C.=ACTGGTATCTGAAA
mAsPCR-seg60.1..Recoded GTMTACC:TAG.AT.A...tCCTGAAATGACTGA
mAsPCR-seg60.1..Reverse GCACCGCGTGTITC:ACTC
mAsPCR-seg60.1..Wild-Type GITITTACCT.A.TAAC:C.'GCTAAIGAC:GCT
tilAsPCR-seg60.2. _Receded CIDGCC.:GATTCAATACCCGAAAG.
mAsPCR-seg60.2..Revn-se CCTACGCCAACCCG.AACA
InAsPCR.-sega;',2__Wilki-Type GCK;C'CGATTCAATACVACTTAA
alAsPCR-seg60.3.1teeeded CTTCTAAAAAT.,1ACGC:CTSTTCTCATATC:A
niAsPCR-.segti0.1 Reverse CCTCCCGGGTAAAATATTGCTT
ra.AsPCR-8eg60.3..Wild-Twe ITACT4AAAATAACGCC:IGITTTLAAT,42,..GC:
Tri21.Ã13CR-segg.:.4_ Receded T_AACCCATC:GCCGCAGAAAC.;-
rekAsPCR-seg60.4..Ree AICATITC:AGGG:ATTGC_kGTGC
InAs.PCR-seg60.4..Wi11-Type TAACCCATCGAAVICCGCACTTAA
mAsPCR-segal.5..Reeaded. CACGCTATGCCAAATATTGTTCTATCA
mA8PCR-seg..60.5. _Reverse CGTTAATGCGATTC:AC:C:GGAAC
iss.A.sPCR-5egt50.5..Wig-Type CACC1CIATGC:CAAATATICIFITIAAGC
tilAsPCR-se,360.6. _Receded GATGC.:GATITICTGGTTTACTC,TTCTC
in.A.sPCR-seg60.6._Revene CGATGTCACCACGTTAATATGCAC
inAsPCR-segeta.6..Wild-Type GATGCGA I' ITICTGGITIACTITGT-TiG
13AsPCK-8eg60.7..Recinied GTTTACCTCTGCAACGCTATCITC
mAsPCR-se:360.7..Reverse TC-ZIEGTGAATOGGGTGTTAACAGA
mAsPCR-segdfl 7. .Wilki-Type GT. ITACCTC:TGCAACGCTAAGTAG
InAs.PCR-seg60.8.Recoded ACCACI __ 1CGCACATCCTCTCE
TriA.sPCR-segfie.S.Reverse GGTGAAAGCGCGAAGTAA.CAAATA
mAsPCR-ses60.8.Wikl-Type ACCACTTAGCCAGATCTGC
mAsPCR-seg61.1..Recoded CC..A.GGAGCAC;_,%.TCCAGTGA
mAsPCR-seg61.1..Reverse CIG.ATC:TTIACCI-GGITCTC,TATC;CT
mAsPCR-seg61.1..Wild-Type CCAGCA,GCAGATCCAG..ACT
niAs2CR-sestil.2..Recetied CGTTCCATAAGOGTTTGTTCCGA
sriAsPCR-.segfil .2. Reverse GCACITACGCTTC:-CAGGATG
inAsPCR-segt$1.2..Wild-Type Li __ CCATTAACGTTTGITCG<:7
mAsPCR-seg61.3.Receded GCCGCACGTIATGAAGATGAAT
mAsPCR-seed1.3..aeverse OGCAAGC.ACCTACCGGAT
rnAsPCR-seg613_.Wild-Type GCCGCACGTTATGAAGATGAAA
tnAsPCR-seg61.4..iteceded GGCCTTTGTTTTCCAGATTCTCA
ril.A.sPCR-se01.4.11everse CGCCTGCTCACCGGTATT
mA&PCR-seg,61.4..Wiid-Type GGCCTTTGTTTTCCAGATTCTCC
mAsPCR-1.5.1keeoded TGAGGC_ICGACGC:AATCTC
mks.PCR-seg61.5..Revefw CGCAGGATTATAGTTACGCTCAAT
mAsTiCR-seg61.5..Wild-Type TGAGGGCGACGCAATCAG
mAsPCR-seg61.6..Recaded TGGCTGACGTCGGTATGC
mAs:PCK-seg6 I.6...Reverse TCGATGAGGTGAAGCAGGAC
inAOCR-see.1.6..Wild-Type TGGCTGACGTCGGTATGT
mAsPCR-seg61.7..Recoded TCATCATCACCGTAGAATGAAGAAG
mAsPCR-seg61.7..Reverse GTCTGATTGGCGGGCAAAT
mAsPCR-seg61.7..Wild-Type TCATCATCACCGTACTATGCAACAA
mAsPCR-sog61.S.Recoded GAGGCCCGACTGATCA=CA.
niA&PCR-seg,61.8...Revemse TC,,GAATGACATACTCAGGTTCGC
mA.00R-.seg61.8._Wild-Type GAGGCCAGAC:TGATCATTAGC
mAsPak-seg6.2.1..Recoded CATCATCTTCTCAAACACCGCAAC2'
mAsPCR-stg62. 1 ..Revezge AAAAITTTCGCCATGTATTACCAGGT
snAsPCR-seg62.1..Witd-Tyw CATCATCTTCTCAAA.CACCGCTAA
tnAsPCR-seg62.2..iteceded GCTTCGCGTATTCCTGATAGTCT
mAsPCR-seF6.2.2..Re8e CCGGAATATCGCTAAAGATCGC
inAsPeK-:rieg62.2._Witd-TypeGJILGCGTATTCCTGATAGTCG
2A.sPCR,se.:02.3.11ecoded CGATCTAAAAGTGGGCAAATTCTCA
mAsPCK-8eA62.3.Reverse GTGTGA_AGAGTTCCACCATGAG
mAsPCR-seg63_3_Wild-Tn,Ne CGATCTAAAAGTGGGC.nAAATTCAGC
mAs2CR-seg62.4..Rezaded CAGGGTCAGTTTTACCCCTGA
mA.sPC.R.-.seg62.4.1kewsgre CACTCCTGACTC:CTITTGACCA
mikaPCR-seg62.4._Wiid-Type CAGGGTCAGTTTTACCCCACT
inAsPCR-fieg6-2.5.Recoded T=ACC-AGCGCCATGTCAAAC
inAsPC11.-ses62.5.1Zeverse CGAC.AAAGTCCGGCAA_ACC
tnAgPCR-seg62.5.Wilti-Type TTTTACGAGCGCCATGTCAAAT
inAOCR-seg626..Recoded CACAGCAGTAGGGATATGCGA
mAsita-:ceg62.6..Reverse CGCTAAACTTGCGTGACTACA
mAsPC._W1d-Type CACAGCAGTAGGGA:TATGGCT
mAsPCE.-see62.7..1 GAATTC:CGGTAACCAGATTGACA
imAsPCR-stg61.7..Reverse GAAGCCGGTCGAATTTACTACC
mAsPCR-ses62.7..Wild -Type. GAATTCCGGTAACCAGATTGACG
trIAOCR-sep52 .8..Reeoded GGCCTGGTATCACTCTCCT
mA8PCR-seg62.B_ReVEE5g GCCGTTTC.VAGCGCAATATT
tmAsPCR-seg62.. E.. Wad-Type GGCCTGGTAAGCCTCTCCA
inA8PCR-seg53.1..Recoded CACGMT.TCAACCTGTTATTCGTC
mAsPCR-seR63.1..Reverse GTATTCGCAGTACCCAGCJCAA
nrAsPCR-se.g.63 _WM-Type GTC:GTTTACAACCTGTTATTCGTT
mAsPCR-seg63.2..Recoded ATGAATATCTGAAATCTCTAGGDGCaTCA
mAsPCR-seg63.2..Reverse GCTGTTTAGTGGAGTATCAATGCG
mAsPCR-seg63.2..Wild-Type ATGAATATCTGAA AAGTTTAGGTGCTAGC
mAsPCR-seg63.3..Recoded CATAAGCCAGTTTTGAACAATTCCAGA
mAsPCR-seg63.3..Reverse TCTGAAGACCCGGCAAGAAC
mAsPCR-seg63.3..Wild-Type CATTAACCAGTTTTGAACAATTCCGCT
mAsPCR-seg63.4..Recoded C:GCTTC:CA.GGGCAACCTT
mAsPCR-seg63.4..Reverse CGTTGCTOGCATATTCTGTAGG
mAsPCR-seg63.4..Wild-Type CGTTACCAGGGCAACAATTG
InAsPCR-seg63.5..Recoded TGCCGATTGTGCGTATCCTT
mAsPCR -st& .5..Revene GTATTTACCAGCCCAGGA ATTACC
mAsPCR-ses63 -Type. TGCCGATTGTGCGTATCTTA
tuAsPCR-seg63.6..ReonaM GCACCTTTACCACCAGCTGA
mA8PCR-seg63 .6_.Reve8e GTTGTGCCTGGTGAAACGG
risAsPCR-5itgi53_6_,Wi1gi-Type GCACC,TTTACCACCAGCACT
mAsPCR-segt53 .7 __Recoded CCAATACCTICTTCTGCGTAC:ATT
mAsPat-seR63. 7..Reverse TGTCAATCAGAC-:ff- GGGATTTGT
nrAsPCR-se.g.63 .7 _WM-Type CCAATACCTTCTTCTGC:GTACATC
szAsPCR-seg63 .8. Recoded ACGTGAGAATCATCATCCAGTATTAG
mAsPCR-stge &Reverse AC:CCGTAGTAT.CCCCACTTATCT
mAsPCR-ses63 .8. WiitizNse ACGTGAGAATCATCATCCAGTATCAA
tuAsPCR-seg64.1..ReonaM GCAGACGACCGATTGCAGA
sLIA,.;PCR-seg64.1..ReVEae AGCTGTGGGTAAAGCTGTCG
mAs.PCR-seg64.1..Wild-Type GCAGACGACCGATTGCACT
mAsPCR-segt54 2..Recoded GCTCCGCTTCTGGAAAAAAACT
mAsPat-seR64.2..Reverse CGACCTTCACCACCACCAT
raAsPCR-seg642...14614.-Type GCTCCGTTGCTGC-ZAAAAACA
mAsPCR-segt4.11.Recoded TILA.GIGG-GGAAGIT.GCCAGAA.G
tuAaPCR-stge14.3.11.everse CTATCICTACATCCGCCAGITCAA
titA4K:R.-seg64.3.-61:fild-Type TTAATGCGGAAGTTGCCAGTAA
.p.n.k&PCR-sge54. 4.. _Recode.d .AC.ACCGGAGACTCATCAACTAG
sa,A&PCR-stg61..4...Kevenie CGGCTGATG,.TTTGTG
inAsPCR-segti4 -Type TCACCGGAGACTCATCAACCAA
raAOCR-mg64.5...R.ecoctect GC'CGCCATTTTTACCC:TCTCA.
issA4PCR-seg64.5..RtVef5e ATCCGCTTGTAGTCAGTATTATTTTGC
mAsPCR-seg64.5..Wild-T34.1e GCCGCCATTTTTACCCTCAGT
alAsPCR-segt44.6..RecoJed. CTGCTA __ i ACCGACTCCTTCTTCTC
1.31.A.PCR,seg64.35..Rewsse GGAGATAAACCAAGCTGACCGA
111APR-seg64.6..d-Typt. CTGTTATTTACCGACTCCTTCTTCAG
.1.31APCP,se064.7..Rec.)Nieti CATCGC:-GATTATGCCC:.AGIC
laAsPCR-seg643..Porverse CGTGACTGC:C:GTACCGTT.
m4sPCR.-srg64..7..Vi-Type CATCGCG.ATT.ATGCC:CAGAG
niA4iK:R.-seg64.3..Recolitti ATCAAAAACGATCTCAAGCAGCIT
mAsPCR-stge54.8...Reverse TCCAGQTAAATTCCATCAGCGTTA
rsIAPC:R. -segc-14 Miki-Type. ATCAAAAACCTATCTCAAGCAGTIG
tuAsPCP,-seg65.1...Recoded GC AGGG TGTAGTCGATTGATGA
retA.&T-'CR-8Eg65.. ...Revesse GTCTACCTGTGGCGC:ATCA
.1.31AsPC.R,segi5.1...Wik1 -Type GCAGGGTGTAGTCGATACTGCT
In..4.sPCR-seg55.2...itg,cotted CGCATTACACTCTGCAGCTGT
ig.AsPCR-stg65.2..Revese ACCTCGGCGCAATTTGTTTC
zn..4.sPCR-8eg55.2..Wild-T3Te GCCATTACACTCTGCAGCTGA
.111AsPCR-segE.3.._Recsaimi TCATCTGA_AACCTTCCGTGTGAG
nik3PCR-seg65.3..Reverse TACTGATGAACCCGCCAATT_TTTT
inAsPCR-seg6S.3...Witd-Type TCATCGCTAACCTTCCTTGTTAA
fil..4.sPCK-srg65..4..Kecixted TTTCTCGCTGGGATGC.ATCA
tnAsPCR-segt55_4..RE:rene ACATCGITATITTCCAGCACGTIC
saAsPCR-stg65..4..Wi1d-Type TTTCTCGCTGGGATGCAAGT
1.11A%?C:R.-seg65.5..Recotied GTACATGATATCGTTIACAACCCATC:A
InAsPCR-sege55.5. _Reverse CCACAGLAAGCGTCGACAAC
tuA&T-=CR-sEg6S.S...Wiid-Type. GTACATGATATCGTTTACAACCCAAGC
suAsPCR.-seg65.6..Remied GCTTC:TTCTCATC:GTCACCCTT
niA&PCR-sEg6S.5.Rewsge GAATTCATAGTGTTGCGCCCAA
zaAaPCR-stg65..6..Wild-Type GTTATTGCTCATCGTCACCTTG
1.11.A.sPCF.-seg6.5.7...PorciAled CG'TGTCCATGCCGTTTCTC
inAsPCR-seg6-5.7..Reveose _A_ACTGIC'E __ CGCCATITC.AAA,A
13-1.45PCR-..seg65. 7 ..Wad-Type CGTGTCCATGCCGTTTTTG
imAssPCR-8eg65.8..RezatieA CGG.A.= kTTGGCTTATCGATACC:TTTT
rsiAsPCR-stg6.5.2. _Reverse GTGACC.CACGGCTTCCTG
szAsPCR-seg6-5.3..Wiki-Type CGG,,L'sATTGGCTI ATCGATACCITIC
niAsPCR -segi56. 1 ..Reesuded GTTC:ACTCCGGCTTATGTCA
anA8PCR-seg66..1..RexEw GGTCGCCCATCCCTCATG
raAsPCR-seg66.1. _Wild:Type GGAGTCTGCGGTTGATGAGC
tuAsPCR-seg66.2..Recoded CAGCGAGGTAAGTCCATTTACC=
DaAsPCR-8e266.2..Reverse GGTGCGCTGACTATCGGT
inA8PCR-seg66.2. Narga-Type CAGCGAGGTCAAAATCC.ATTTTCT
tmAsPCR-se 06.3...Receded CGGTA.TGCGGTAAGACCTGAT
rs-rAsPCR -seed 3 ..Re3xtise TGGTGGTTATL:AGGTGGGAAATT
mAsPCR-8eg66.3..Wiid-Type CGGTAAATGCGGTTAAACC:ACTG
niAsPCR-5e0-6.4..Recoded CCCTCAGCTTCA.GGA,AATTCA
imAsPCR-seg66.4..Reverse CGTTGOGATGATTGCGTICC
rs-sA.sPCR g66. 4..Wad-Type CCC..kGCGCTTCAGGAAATAGC
salAs.PCR-seg66.5..Recodeel CCTGGCTGGTTACCGGTT
113.42PCR-seg66.5. Reverse AC:CTTAGTACCCC:GCCGTA
satA8PCR-seg66.5..Wiki-Type CCTGGC.TGGTTACCGGTA
mAsPCR-seR6.6.6..Recoded CTCACCTTTAAACATTTTAGAGTACCATGA
anA8PCR-seg66..6..RexEw GAGTATGAT G TCGAA CTGGCC:TTA
risAsPCR-seg66.6.Allild:Type CTCACCTTT AAACAT I .1 GCTGTACCAACT
rrtAsPCR-seg66. 7..Recoded GTCACCATAGGC:CAGGTTTGA
DaAsPCR-8e266.7..Reverse ATGTGCGTCTCiTTCCGTGAA
121AsPCR-seg66..7. Narga-Type GICACCATAGGCC:AGGTTACT
21AsPeR-,seg611.8..Recaded CTGATTATCGCCGGTGCCT
rs-rAsPCR -seed ..Re3xtise CAGTACL.GCGGGCTTGTT
mAsPCR-8eg66.8..Wiid-Type CTGATTATCC-CCGGTGCCA
niAsPCR-5e0-1.1..Recoded 'ITITITTAGTCGCCACGTCAGAAG
inAsPCR-sege .1 _Reverse. GGAACGGCATIGTCACITACG
titAsPCK-siegd 7_ I ..Wiki.-Type TTTTTTTAATCGCCAC.GTCAGTAA.
salAs.PCR-seg67.2..Recodeel TCAC ATTGTCAGCTICAAil.ATCTCTCT
mAsPCR-seg67.2..Rever8e TCTGTTTTGGAGAGTGCTTTAACATC
rsrAsPCR,seg61.2..Wiid-Type AGCCATTGICAGCTTGA,4_AATTTAAGC
tnAsPCR-segd 7_3 _Receded ATATTTTTA.k TCTGGG TAT (.:,.s,AAGAGCTA
IIIMPCR,m07 3..Reverse C:ATCACCC:C:GC:CAAACC A
mAsPC21-80.3..,Wild-Twe CAATATTTTTAATCTGGGTATC:WGAGTTC;
A8PC:R.-se.g67 .4_ .Recodi4 GC:GTGCTCAT,6'LTTC:TACGTCGT4,4:TAAC
mA&PC11.-8eg67.4..Reveoe TCATCTICTATAITA.GTAGCTGTGAAAGGA
niAsPCR,seg67 C=CGT.(ICTC:ATATTCTACCiTAAAT,.L.?1T
m.k&PCR-5rR67.S..Reoxiesi CTICATAC:CGC;GCTIGCTACTR:IT
asAPCR-seg6.7.5...Reverse G4TC7CAsay-TAGACCAA.kC;TACC
znAsPC14,seg67.5..Wi1ti-Type ITGCATACC:GG4L-'iCIGTTAT.TATT.G
mAsPCR,seg67 .6..Recoded
tnAsPC:R-seg67 .ReveEse GC:AGGAAGGGCGA.kGAAG
faAs;PCR-seg67.6..1171U-Type CTATCAAT.A.TrCAACTGGGA-CKiT. TG
InAsPCE,8eg67.7..Recoded T,U_CTTCC:TCACTCAAATAGAACGACTT.G
imAsPeR-8e67.7..13.rverse TTGA TTCGCAAIGCATGACAGA
InAsPCR.-.se.g67 .7..Na'riisi;1`ype l'_TTITCTCACTCA,,I,.;1,.TTG4ACGA'ITAk
InA&PC11.-seg67.8..Recoded CGCC:GCTACCATCAGGATATTAG
niAsPCR,sego7 .F4..Reverae GCCTCTA.TCACTC7GACCITCG
mAsPeR-5eg67.8.. Wild-Type CGCCGCTACCATCACATATTAC
asAs:PCR-seg613.1 J&ecde CGCCEGCICTMATC1G-4.
alAsPCTI-seg68.1..Revcof ACCTGTCAAAAAATAT,2i_ACGCACTAATATCA
TfiAsPCR,stg6g ki-Typt CGCC,CGCTCTTCATCACT
tilAsPC:R.-se.g68.2_.Recodi4 GTGAGGCCCCCTC:AATTC,µA
mAgPCR-seg2. Reverse CATTTCTITGACOGATT5aTTG-TTCAC
tuAsPCH,seg68.2..Wild-Twg GACTGGCCCOCTGAATACT
imAsPeR-se0S.3..RecadM 'IrGCCAC:GAC:AA.T TAGGAGTAG
tultsPCR-.se.g.63..Re.verse. GICTTCCCTC-G-C:TGCGTT
riLksPeR-seR63.3..WArl-Type TCGCCACGACAATCAACAACAA
mAsPCR,sega .4...Recoded ACCGCCGAACAGCTIT.kCTC
inAsPeR-seg&.4..Revffse CCATA:T.TCGGGTC.=-CA:TCAGTIG
-131AsPCR,sega id-Type ACCC4CCGAACAGC:TTI.4LCA.C.'
mAsPCE,sege:8.5..Recoded GATAACGAGTITGAAGATGAATGTGC:T A
snAsPCR-septig5..Reverse TTIC:.TTGCCCCACAGCCA
rELAs:PCK-seg63.6.Reccoied TGATT. C.,,GGGC:CAT TIT. TGTTC:TTC
mAsPCR,seg68.6.Keyes.se TAITCAGCCAGGC GITAA GG, TT
.1-11.k&PCR-srRdS.6.Wild-17),-pe TGAITGC-C-GC:CATTTITC;TTTTAT
nIMPCR-.seg68 .7 __Recode4 GC:TCCGGTTTACTCAATCAGCTT A
mA21-PCR -see& 7. _Reveise CGATITGGGTTTCGTITCGIGT
smAsPCIL-seg68.7..Wiki-pe GCTCCGGTTTACTCAATCAGCTTC
mAsPCR-seg68.8..Recoded CCAGAGTTTTAGCCTGAACCGA
mAsPCR-seg68.8..Reverse GGGCAAAAAACA.LA.AAAGGICAGC2"
mAsPCR-seg68.8..Wild-Type CCAGAGTTTTAGCCTGAACACT
smAsPCK-seg69.1..Recolitd CGGACGTAGATGTGGGAATTTCT
raAsPCK-sege. . Rg vase GTGTAACGCTCTGTGGAAAGTC:
LuAsPER-seg69 .1 ..Wild-Type CGGAC:GTAGATGTGGGAATTTCG
mAiPCR-fiegti9.2.._Recoded CAAAGACCGGTTTAAGATCATC'TGA
tnA5.PCR-se,09.2 Revel-se ACGC_4CACTATC4TTTTTTAACAATGAAAC
mAsPCK-5eg69.2.. Mik1-Type CAAAGACCGGTTTC:ATCATCGCT
LuAlOCR-serS . 3 _Receded TAA.A.AAATCAGACAAAGGCCGATACGT
mAsPCR-seg59.3..Re verse AACCTITACCCGTTGTGCTITC
rnA5.PCR-seg..693 I AAAAAATCAGACATAAGCCG.ATACGC
olAsPCK-seg69.4..Recolitd C:CGA:LA_GTGCCTGAATTGCA
rrIA.4PCR -seg69. 4.. Reverse CGTATAAg::GGTCAGGTACTTTCCA
mAsPCR-seg59.4..-Wiid-Ty-pc CGCTTAATGCCTakATTGCC
mA21-PCR -fiegti9. S.. _Recoded CTTGTTTGGAGGATACCTGTTTATTC:GA
smAsPCK-seg69.5..Relierse TTTAGCGCCAATCTGAATCGTTAAC
rrIA.4PCR -seg69. S.. Mild-Type GTTGTTTACTGCTTACCTGTTTATTACT
tuAsPCR-seg69.6..Recocled TAAGGACCCGATTAAA.GGCTGCTTTA
mAiPCRgti9.6._Reveise TTTTTTTCCCATCACTTCTTTCCC
smAsPCIL-seg69.6..Wiki-pe TTAAGACCCGATTAAAGGCTGCTTTT
mAsPCK-seg69.7. Rgi:eded CCGGACTCGAGATGACCTC
InAsPER-seg69.7..Reverse GACACATCCGCCAGCATT
mA21-PCR-fiegti9. 7. .Wild-type CCGGACTCGAGATGACCAG
mAsPCR-seg69.8..Recoded GGGTTTAC1TTCGCCTGAC4A
mAsPCR-seg69.8..Reverse TGGATCGGCTGATGGC
mAsPCR-seg69.8..Wild-Type GGGITTAC __ I 1 iCGCCTGGCT
111AsPCR-seg70. I ..RE coded CGGACGAC:TATGGCTGGATC
-s7_ 1 = _Revel-se CGCATCGGTTTATTTACACCAGTC
olAsPCK-seete.1..WikizType CGGACGATTGGCTGGATT
rrIA.4PCR -seg 70. 2. Recoded TGCC4CCCGAAT:ACCGTCTA
mAsPCR-seg70.2...Re verse GTCTGGAGTATTATCGTCGGCTTTA
mAiPCR-fieg70.2...Wikl-Type TGCGCCCGAATAACAGATT 'G
mAsPCR-seg7Ce.3..Recotitd AGGOGATATCCGGGTCTTCT
mAsPCR-seg70.3.Reverse LITACTGTCAAACACTCTCTGATCTTC A
mAsPCR-seg70.3.Wild-Type AGGCGATAICCGGGTCTECA
mAsPCR-seg70.4.Recoded GGAACGAC:AC:GCCCTTA.GAT
mAsPCR-seg70.4.Reverse AACAATGTTGGTGAGCTTG.kGA
mAsPCR-seg70.4.Wild-Type GGAAACTCAC.GCCCTTGCTG
mAsPCR-seg70.5.Recoded CCTTGTTCGTGTTAATCCCAA.GA
mAsPCR-seg70.5.Reverse GCC:AGCCITTCGTACCATG
mAsPCR-seg70.5.Wild-Type CCTTGTTCGTGTTTCCCAGCT
mAsPCR-seg70.6.Recoded AAGAACTCAACGCGCTACTrC
mAsPCR-seg70.6.Reverse GCTITTAIG&GGGCCGAC.A.
mAsPCR-seg70.6.Wild-Type AAGAACTCAACGCGC TATIGT
mAsPCR-seg70.7.Recoded ACTGGAGCTTATCAGTGTT AATFCCATAC
mAsPCR-seg70.7.Reverse TTCTGAATGTTTAAATGTTGCCTATGGT
mAsPCR-seg70.7.Wild-Type ACTGGAGCTTATCAGTGTTAATTC:TATAT
mAsPCR-seg70.8.Recoded CCAATAAAAACiCACTGC,ATGATCAATAAG
mAsPCR-seg70.8.Reverse CGAGGCTATCAGGTTGTGCT
mAsPCR-seg70.8.Wild-Type CCAATAAAAAGCACTGCATGATCAATTAA
mAsPCR-seg71.1.Recoded GCTGGGTAAATGGGCTGATCTT
mAsPCR-seg71.1.Reverse GATGGTCTTTTAGTGCGGCAAC
mAsPCR-seg71.1.Wild-Type GCTGGGTAAATGGGCTGATTTA
mAsPCR-seg71.2.Recoded AAATGAGCT,.4AAACATAACAAACAACTT
mAsPCR-seg71.2.Reverse GGGGAGGGGAAATTGATAACTTGTA
mAsPCR-seg71.2.Wild-Type AAATGAGTTCAG AA CATAACAAAC AATTG
mAsPCR-seg71.3.Recoded GCGACCATCT TTC:ICTTCCGTATTA
mAsPCR-seg71.3.Reverse TGCTCAACC.ATGCTCTAGGTG
mAsPCR-seg71.3.Wild-Type GCGACCATCTTICTC rir...-CGTATTC.
mAsPCR-seg71.4.Recoded GCC4TGGTTTATGGGCATC-CTA
mAsPCR-seg71.4.Reverse CCGGT1CT13GAATGTGTTGTAC
mAsPCR-seg71.4.Wild-Type GOGTGGITTATGGGCATC,TTG
mAsPCR-seg71.5.Recoded GACGGA4TTATGGTTGAAATCTGGTC.7
mAsPCR-seg71.5.Reverse CGACGACATCTGGGATTGCT
mAsPCR-seg71.5.Wild-Type GACC.iGAATTATGGT-TGAAATCTGGAG
mAsPCR-seg71.6.Recoded GTCCAAAAGCCTCAATTCITTCA
mAsPCR-seg71.6.Reverse GCAATCTTATCAATCACCCGAAGTC
mAsPCR-seg71.6.Wild-Type GTCCAAGCCAGCATTTTGAGC:
mAsPCR-seg71.7.Recoded GATGATCTGCCTTCTACGCCCTT
mAsPCR-seg71.7.Reverse CGACGGGAAGATAAACATGCC
mAsPCR-seg71.7.Wild-Type GATGATTGCCTTCTACGCCTTA
mAsPCR-seg71.8.Recoded CGGAATCGGCAGAAT.AAAAAG..kATT
mAsPCR-seg71.8.Reverse GCCTG=ACCTCATATAAAACGC
mAsPCR-seg71.8.Wild-Type CGGAATCGGCAGAAT AA A C.AAAATA
mAsPCR-seg72.1.Recoded TACATCGCCGCCCCTITTG
mAsPCR-seg72.1.Reverse CGGTATCTACGCTAACCAGTCC.
mAsPCR-seg72.1.Wild-Type TACATCGCCGCCCCTT, TAC
mAsPCR-seg72.2.Recoded TGAAATCTGCGGAGTT A AGTCGAATA
mAsPCR-seg72.2.Reverse TCACCGCCAGACAAGCAC
mAsPCR-seg72.2.Wild-Type TGAAATCTCgCGGAGTTAAGTCGAATT
mAsPCR-seg72.3.Recoded AA TCCCCTCCAGCGACGA
mAsPCR-seg72.3.Reverse TGAGGTTTATCACGACTCTCTGTG
mAsPCR-seg72.3.Wild-Type AATCCCCTCCAGCGAGCT
mAsPCR-seg72.4.Recoded CTACTCGTTTAAAGG ATGAAGCTA
mAsPCR-seg72.4.Reverse GCCAGTGCCTTTTCTTCTTCG
mAsPCR-seg72.4.Wild-Type CTACTCGTTT AAAGGAT TAATCATGAAGTTG
mAsPCR-seg72.5.Recoded AIT _______________ I CCATCTCCGCACCAGA
mAsPCR-seg72.5.Reverse TGCGCGTACAGATTGGC:T
mAsPCR-seg72.5.Wild-Type ATTTC:CATCTCCGCACC:GC:T
mAsPCR-seg72.6.Recoded AAGCACGTCAGGGTTCACTT
mAsPCR-seg72.6.Reverse GCCTGTTCAATTTCCTGCC A
mAsPCR-seg72.6.Wild-Type AAGCACGTCAGC:-GTAGTTTG
mAsPCR-seg72.7.Recoded __________________ CCGGTCGCGAATC
mAsPCR-seg72.7.Reverse GTCCAGCGCCCAGGTATC
mAsPCR-seg72.7.Wild-Type GGTTTTTCCGGTCGCGAAAG
mAsPCR-seg72.8.Recoded ATTACCGAAGATTACCAGGAAAIGT
mAsPCR-seg72.8.Reverse GCAGITATCGTACCAGGGCTTA
mAsPCR-seg72.8.Wild-Type ATTACCGAAGATTA_C:CAGG.AAATGA
mAsPCR-seg73.1.Recoded ACAATCAGGTAC:TTATCTTATTC:TATTCTCA
mAsPCR-seg73.1.Reverse GCAGGTTGACGCCA .TATACC
mAsPCR-seg73.1.Wild-Type ACAAAGTGGTACITATTTTTTTATTCAGC
mAsPCR-seg73.2.Recoded ATCAGAGAGACAATAATGCCACCTAG
mAsPCR-seg73.2.Reverse CCGGGTGCAATTGGTTATGTT
mAsPCR-seg73.2.Wild-Type ATCAGAGAGACAATAATGCCACCCAA.
mAsPCR-seg73.3.Recoded ATACGTACCTGCGGATGACC
mAsPCR-seg73.3.Reverse CATTGCCATATC"ACCCTCCGA
mAsPCR-seg73.3.Wild-Type ATACGTAC:ETGCGGATGACT
mAsPCR-seg73.4.Recoded CAGCTACTGGTGGTGATAGCAT
mAsPCR-seg73.4.Reverse CGAGAATGTACGCAGGTCCA
mAsPCR-seg73.4.Wild-Type CAGTTACTGG1=GATAGCAA.
mAsPCR-seg73.5.Recoded CATATAGCGCTTCCAGGGATGA
mAsPCR-seg73.5.Reverse GCCCGCGCGTTT<2'AATAT
mAsPCR-seg73.5.Wild-Type CATATAACGCTTCCAGACTGCT
mAsPCR-seg73.6.Recoded ICAAACA.CAAACCGCAGAATCC
mAsPCR-seg73.6.Reverse GCGAGTATA.GATGC:CAGTAAGC
mAsPCR-seg73.6.Wild-Type TCAA ACAACAAACCGCAGA_AAGT
mAsPCR-seg73.7.Recoded ATCTGACCGATGACAATC-CCT
mAsPCR-seg73.7.Reverse CCATC:GGTTGTTTTCAGAAGCAT
mAsPCR-seg73.7.Wild-Type ATCTGACCGATCZACAATCZCA.
mAsPCR-seg73.8.Recoded CAC:GTTAATTTTTAGAAGATCGC:GAATAA.G
mAsPCR-seg73.8.Reverse AGATTGCGATGCTTAATGGTTGC
mAsPCR-seg73.8.Wild-Type CACG:TTAATTTTCA.AGATCGCGAATCAA
mAsPCR-seg74.1.Recoded CTTGGACGAGGAAAGGCTTGA
mAsPCR-seg74.1.Reverse ITCGGCATGIGC-AAAGTCA
mAsPCR-seg74.1.Wild-Type CTIC:4GACGAGGIAGGCTTAG
mAsPCR-seg74.2.Recoded GAC:ATCATC.ACCGTCGATTC:T
mAsPCR-seg74.2.Reverse GGTGCCATGTGAGCGATAGT
mAsPCR-seg74.2.Wild-Type GAC:ATCATC-ACCGTCGATAGC
mAsPCR-seg74.3.Recoded CTAACCCGGACGATGACTCA
mAsPCR-seg74.3.Reverse lkAACTCCAGCCCTTICGAC
mAsPCR-seg74.3.Wild-Type CTAACCCCrGACGATGAC.s.AGC
mAsPCR-seg74.4.Recoded CAGGAGCCAAAGATATAACCCkGT
mAsPCR-seg74.4.Reverse GTCTTCGTGGTTATACTTCTGCTAATAATTT
mAsPCR-seg74.4.Wild-Type CAGGAGCCAAAGATATAACOCAGG
mAsPCR-seg74.5.Recoded CTGAACTACITTTCCT;GATATGTCGCTT
mAsPCR-seg74.5.Reverse ACAAAAACCAGCGCCATC:AG
mAsPCR-seg74.5.Wild-Type TTGAAt_.1 AC1 .1 ICCTGATAIGTC1TTG-
mAsPCR-seg74.6.Recoded C.:GIG-GC:I-GT! III CCTIGTAIC:
mAsPCR-seg74.6.Reverse GGTGTCGC,GAC=TGAGATAGAG
mAsPCR-seg74.6.Wild-Type CGTGGCTGTTTTICtTCGTCAG
mAsPCR-seg74.7.Recoded ACCGTTCTGAATACATCAAGCAAC
mAsPCR-seg74.7.Reverse TTTGGGTAGTTATCGAAGTGCzCA
mAsPCR-seg74.7.Wild-Type ACCGTTCTGAATACATCAAGCAAT
mAsPCR-seg74.8.Recoded GCCAGAGTGCAAGTGGTC
mAsPCR-seg74.8.Reverse ATCC:ACTGCCAGACC:TC ATI-TT
mAsPCR-seg74.8.Wild-Type GCCAGAGTOC...AAGTGGC2C
mAsPCR-seg75.1.Recoded GTCGATIAGTTCCATAAATCGCTGAAG
mAsPCR-seg75.1.Reverse GG'ATACCAACAACATICAGTACGC
mAsPCR-seg75.1.Wild-Type GTCGATTAATTCC ATAAATCGCTGCAA
mAsPCR-seg75.2.Recoded CiTGCAGATGAi4ATATCT4iTCT
mAsPCR-seg75.2.Reverse AAC:AAATGGTTCTATGAGAAAGAGGTAAA
mAsPCR-seg75.2.Wild-Type GTTGGCAG.TGAAATTGAAAAIATCTATAGC
mAsPCR-seg75.3.Recoded TTCC.AGACAGGTAAGGGTAGAGAAT
mAsPCR-seg75.3.Reverse CGCTECITTCTCCCGACC.A
mAsPCR-seg75.3.Wild-Type TTCCAGACAGGTTAAGGTAGAGAAA
mAsPCR-seg75.4.Recoded CAC.TITTGCTACC:AGACC IG A
mAsPCR-seg75.4.Reverse CCGATTCAGGC:AAIGTGATTIGT
mAsPCR-seg75.4.Wild-Type CACTITIGCTACC:AGACC:GC.T.
mAsPCR-seg75.5.Recoded "GGGCAAGTA ICIACAGCACTCA
mAsPCR-seg75.5.Reverse GCAATAATTAGTAGCTGCCAAATGGA
mAsPCR-seg75.5.Wild-Type GGGC:AAGTATTTACAGCACAGT
mAsPCR-seg75.6.Recoded GCCCAGGAACACCTCGAAC
mAsPCR-seg75.6.Reverse GTTGCCGGATCGACAATGTC
mAsPCR-seg75.6.Wild-Type GCCCAGGAACACCTCGAAA.
mAsPCR-seg75.7.Recoded TTTTCACGTGGTTCACTACAAC TPC..
mAsPCR-seg75.7.Reverse ACAAAAAAGGTC:TCGGTAAAAGCG
mAsPCR-seg75.7.Wild-Type TTIAGCCGTGGITCA I-MC.4AT TC-T
mAsPCR-seg75.8.Recoded AGCTTTGAGGTATCCArTCGTGA
mAsPCR-seg75.8.Reverse TATGGAIGTTGATAAGCCAGGCAAI,.
mAsPCR-seg75.8.Wild-Type AGciTrGAGGTArccArrccAcT
mAsPCR-seg76.1.Recoded CCAGTTTIACTTTTAATGGTGATGGTTCA
mAsPCR-seg76.1.Reverse 'TTTCCGCATCCATTCCTTCAGA
mAsPCR-seg76.1.Wild-Type CCAGITTACTTITAATGGIGAIGGTAGT
mAsPCR-seg76.2.Recoded CTTGTCCACGCCTTG .. I ITC:TTTAG-
mAsPCR-seg76.2.Reverse AAATCCGCCTTTTATTATGGTTCAGG
mAsPCR-seg76.2.Wild-Type CTIGICCACGCCTTGITTCTIC AA
mAsPCR-seg76.3.Recoded CAGATCCTCAACTCGCTGATTAACT
mAsPCR-seg76.3.Reverse AG4C.r7GTCG4CCAGATTICG
mAsPCR-seg76.3.Wild-Type CAGATCCTC:AACTCGCTGATTAACA
mAsPCR-seg76.4.Recoded CGAGCAGCATGAAGATC TIA,s,ATCA
mAsPCR-seg76.4.Reverse TGATITPL:IGG-AAGIGGTCITICAG
mAsPCR-seg76.4.Wild-Type CGAGCAGCATGAAGATTTAAAAAGT
mAsPCR-seg76.5.Recoded GATCTICCG1 __ tiTGATGTGGGA
mAsPCR-seg76.5.Reverse CGCACAC:TTACACCCTGAAATATC
mAsPCR-seg76.5.Wild-Type GATCTTCCGTTGTGATGTGACT
mAsPCR-seg76.6.Recoded CCTGGCCAAACAAAGICCTCT
mAsPCR-seg76.6.Reverse .ATTCATTCATTTATTC<=ATCCAC=TCGTT
mAsPCR-seg76.6.Wild-Type CCTGGCCAAACAAAGTCCTCA
mAsPCR-seg76.7.Recoded CGAAATCITTL=C4t:C=ALT.CT
mAsPCR-seg76.7.Reverse GTATGGAGCCAACGAAGAATAAAAATTT
mAsPCR-seg76.7.Wild-Type CGAAATCTTTGGCGACGAGACG
mAsPCR-seg76.8.Recoded GCGACGGCGGAAA_A_TTGA
mAsPCR-seg76.8.Reverse TCGACAGACAACCGATCACTIT
mAsPCR-seg76.8.Wild-Type GCGACGGCGGAAAATAGC
mAsPCR-seg77.1.Recoded GTTA:TCACCAAGAAACAGAC:CTGA
mAsPCR-seg77.1.Reverse CGGAGAAAGTCAACGCGTTT
mAsPCR-seg77.1.Wild-Type GTTATCACCAAGAAACAGAC:CGCT
mAsPCR-seg77.2.Recoded AAAAGCGTCGAAAAGTGGTTGG
mAsPCR-seg77.2.Reverse GCAGCCCTATACCATCACC
mAsPCR-seg77.2.Wild-Type AAAAGCGTCGAAAAGTG-CTT AC
mAsPCR-seg77.3.Recoded CCGAC:AATACTGGAGATGAATATGTCT
mAsPCR-seg77.3.Reverse CCACACATCCAGGCCCATAAT
mAsPCR-seg77.3.Wild-Type CCGAC:AATACTGGAGATGAATATGAGC
mAsPCR-seg77.4.Recoded GGTTCGGCACTATTCCTGTITC TA
mAsPCR-seg77.4.Reverse CGTGAGCGCCTGAAACAC
mAsPCR-seg77.4.Wild-Type GGTTCGGCACTATT=TT7TTG
mAsPCR-seg77.5.Recoded CTTCACATCCTGAGTATC:CT.TACCG
mAsPCR-seg77.5.Reverse GCMTCTCACTGGCGGGT A
mAsPCR-seg77.5.Wild-Type CTTCACATCCTGAGTATCTTTACCA
mAsPCR-seg77.6.Recoded ACCCACACCGAAGAA2,..tATGAGTAG
mAsPCR-seg77.6.Reverse GCGAATGATCTAACAAACATGCATCAT
mAsPCR-seg77.6.Wild-Type .ACCCACACC:GAAGAAAATCAACAA
mAsPCR-seg77.7.Recoded CAAAATCAGCAGGAAAAAACCTTTATCGATC
mAsPCR-seg77.7.Reverse CCCTTGCTCATATAGAT, __ ACTGCATC
mAsPCR-seg77.7.Wild-Type CAAAATCAGCAGG.AA A AAACC1-3. IATC:GATT
mAsPCR-seg77.8.Recoded GTAGAATCACCATCTAATCCACTC:=
mAsPCR-seg77.8.Reverse GACCGTTCAGATATTTCGTGCAT
mAsPCR-seg77.8.Wild-Type CTTAGAAAGCCCAAGTAATC:CATTGTT A
mAsPCR-seg78.1.Recoded CAGTAGGTTCACGAAGAAGTCATTT
mAsPCR-seg78.1.Reverse GTGCCTGGTTCAAACTGACG
mAsPCR-seg78.1.Wild-Type CAGCAAGTICACGAAGAAGTCATTG
mAsPCR-seg78.2.Recoded TCATCGGGATCATGATTTTCAGTGA
mAsPCR-seg78.2.Reverse GCACCACCTCACATACGGT
mAsPCR-seg78.2.Wild-Type TCATC:GGGAT.CATGATTTTCAGGCT
mAsPCR-seg78.3.Recoded CCTGA.GTCGCGTCCATAATTTTG
mAsPCR-seg78.3.Reverse CGCATCTCATGTAACGTTGTGG
mAsPCR-seg78.3.Wild-Type CCTGAGTCGCGICCATAAMTTAA
mAsPCR-seg78.4.Recoded GCTTCCGTATGACGCGTIG
mAsPCR-seg78.4.Reverse CTGCTACTC.TCTCGCTGGAAA
mAsPCR-seg78.4.Wild-Type GCTTCGGTATGACGCGTGC
mAsPCR-seg78.5.Recoded CATGATGATGACGCTGAAAGGAC
mAsPCR-seg78.5.Reverse CACCTGTGAGATTTCTGAAGCTC
mAsPCR-seg78.5.Wild-Type CATGATGATGACGCTGA_A_AGGTT
mAsPCR-seg78.6.Recoded AAGACGTACCACTTTTTCGGCAAG
mAsPCR-seg78.6.Reverse CAATC:ATCGCACCTTTCCTTACC:
mAsPCR-seg78.6.Wild-Type CA-AACGTACCACTITTTOC;GCTAA
mAsPCR-seg78.7.Recoded AGTCAGGAGTATTTAG'CCTTGGAC
mAsPCR-seg78.7.Reverse CGAGATTCCCCCAGTAGC:G
mAsPCR-seg78.7.Wild-Type AGTCZAGGAGTATTTAGCCTTGGAG
mAsPCR-seg78.8.Recoded TAATCCATCCCAGACTGA_AGGACATTTA.G
mAsPCR-seg78.8.Reverse CT "GGTGAAGYTTGTTTC:CG-kitir
mAsPCR-seg78.8.Wild-Type TAATCCATCCCGCTCTGTAAGACATTTAA
mAsPCR-seg79.1.Recoded AGCGAACATGGAGCTGTCA
mAsPCR-seg79.1.Reverse GAGTCGGGTGCACATCCC
mAsPCR-seg79.1.Wild-Type AGCGAACAIGGAGCTGAGC
mAsPCR-seg79.2.Recoded GCCAGAATCCTICAACGTACTTC
mAsPCR-seg79.2.Reverse TCAGGATCTGCTGACC-TTCC
mAsPCR-seg79.2.Wild-Type GCCAGAATCCTTI_AACkiTATTGT
mAsPCR-seg79.3.Recoded GCGCAGATGGITTGCACAAG
mAsPCR-seg79.3.Reverse CCCGTGAATCAGCCC4CTAT
mAsPCR-seg79.3.Wild-Type GCGCAGATGGTTTGCACTAA
mAsPCR-seg79.4.Recoded CATCGCCCATTCGO.TTTTGG
mAsPCR-seg79.4.Reverse TTGACTCCGCAAGTTTGTATTCA,4A
mAsPCR-seg79.4.Wild-Type CATCGCCCATTCGGTTTTG'C
mAsPCR-seg79.5.Recoded TATTTTTA'XGCCGTTGATGCCTCA
mAsPCR-seg79.5.Reverse CCTC:TTTCGCCATAACTTGTGC
mAsPCR-seg79.5.Wild-Type TATITTTATCGCCGTTGATGCCAGT
mAsPCR-seg79.6.Recoded GATACCGGCTTTGTCAGAAACTG
mAsPCR-seg79.6.Reverse GCACAGAGTTATCCACAATCATCAAT
mAsPCR-seg79.6.Wild-Type GATACCGGCITIGTCAGAAACAC
mAsPCR-seg79.7.Recoded CTCATTAACCGCGACCCAAAG
mAsPCR-seg79.7.Reverse TCAAGGA.:kAA.GACTACGILAGAATALUGAA
mAsPCR-seg79.7.Wild-Type CTCATIAACCGCGACCCACA-A
mAsPCR-seg79.8.Recoded TITCCCCGGCACITATGGAACTI
mAsPCR-seg79.8.Reverse TCTTCAATGGCGTCGCGAA
mAsPCR-seg79.8.Wild-Type TTTCCCCGGCATTAATGGTTA
mAsPCR-seg80.1.Recoded CITTATCC:ATCACGCGA.il.ACTICIT
mAsPCR-seg80.1.Reverse GCCGACCACATTCATGCC
mAsPCR-seg80.1.Wild-Type CTTTATCCATCACGCGAAATTGTTG'
mAsPCR-seg80.2.Recoded GAGTTTATTCGCGGCATGTCA
mAsPCR-seg80.2.Reverse GCGTC:ATTTTCCTGGTCAGC
mAsPCR-seg80.2.Wild-Type GAGTTTATTCGCGGCATGAGT
mAsPCR-seg80.3.Recoded TAGCGTTTTGGCCTCGGAA
mAsPCR-seg80.3.Reverse CAAC:AAAAATC-GGTCACTCAGGATC
mAsPCR-seg80.3.Wild-Type TAGCGTTTTGGCCTCACTG
mAsPCR-seg80.4.Recoded ACATCTTT kACC I I CACTCCTCCA
mAsPCR-seg80.4.Reverse CGTAATTTTCGCOTATCTGGGT
mAsPCR-seg80.4.Wild-Type ACATCTTTAACC __ I I I CACACCACCT
mAsPCR-seg80.5.Recoded ACTTGTTAAAGCCCTTCAGGACTGA
mAsPCR-seg80.5.Reverse CTGC-GATATTTCTGGTCCTGGTG
mAsPCR-seg80.5.Wild-Type ACTTGTTAAAGCCCTTCA 'GC:AC:ACT
mAsPCR-seg80.6.Recoded ACATCTCCCGCGACGTAC
mAsPCR-seg80.6.Reverse GACGGGTTGGCGGAAAGTA
mAsPCR-seg80.6.Wild-Type ACATCTCC:CGCGACGTAT
mAsPCR-seg80.7.Recoded TACAGGTATGCGTTTAAACCCAGTTAAAC
mAsPCR-seg80.7.Reverse CTCAAAGTGGGGGTTAAGAATGTC
mAsPCR-seg80.7.Wild-Type TAC.:AGGTATGCGTTTAAACCC:AGTTAAAT
mAsPCR-seg80.8.Recoded AGAAGCAGTACAGGTTTGGTGATA
mAsPCR-seg80.8.Reverse GCCCCTGCCTCAAAAATGG
mAsPCR-seg80.8.Wild-Type AGT AAC AQTACAGGTTTGGTGATT
mAsPCR-seg81.1.Recoded C:ATCTG.,-1.kTAGCGCACTC-GTC
mAsPCR-seg81.1.Reverse CGTGCGACCAGTGCAAAG
mAsPCR-seg81.1.Wild-Type CATCTGAATAAAGCGCACTGGAG
mAsPCR-seg81.2.Recoded TGACCAC:CCACA.AAACCTCA
mAsPCR-seg81.2.Reverse GGAATTATACTCCC:CAACAGATG2=_ATT
mAsPCR-seg81.2.Wild-Type TGACLACCCACAAAACCAGT
mAsPCR-seg81.3.Recoded GTCACATCACCATCACATACGAAG
mAsPCR-seg81.3.Reverse TTTTCCATGATGGCGAAt., ICAAAT
mAsPCR-seg81.3.Wild-Type GTCACAAGTC:CATCAC ATAC.A.1,kAGAAA
mAsPCR-seg81.4.Recoded GATC 'GTGCAAAAGGITCYGTCT
mAsPCR-seg81.4.Reverse GCGACACCAAGCCAGAAC
mAsPCR-seg81.4.Wild-Type GATCGTGCAAAAGGTTCTGAGC
mAsPCR-seg81.5.Recoded TACTATCTGTGGCAAAACGATT.ACTCA
mAsPCR-seg81.5.Reverse TCGCCATATTAATCGACTCAACCA
mAsPCR-seg81.5.Wild-Type TACTATCTGTGGCAAAACGAITACAGC
mAsPCR-seg81.6.Recoded GCGAGAATCTCTOCGTGCAC.
mAsPCR-seg81.6.Reverse GTTTTTTTGAATAGGGTATGCAGATGCA
mAsPCR-seg81.6.Wild-Type GCGAGAATCTCTGC.¨GTGCAT
mAsPCR-seg81.7.Recoded C:AGT_A_AGCGCAATAACAATACGTGAA
mAsPCR-seg81.7.Reverse TGTAATITTCCCTCTTCAGCACGA
mAsPCR-seg81.7.Wild-Type CAGTIAACGCAATAAC.:AATCCTGCTC
mAsPCR-seg81.8.Recoded CACCGAAGCCTTCAAAAAA.GC:AT
mAsPCR-seg81.8.Reverse C:AACAC:CCATTGC:CATCGT
mAsPCR-seg81.8.Wild-Type CACCGAAGCC_TTCAAAAAAGC:AA
mAsPCR-seg82.1.Recoded GGGCG.ATATCTTCATACAGTITTACT
mAsPCR-seg82.1.Reverse CTGCTGITCGC-CATGTCTGA.
mAsPCR-seg82.1.Wild-Type GGGCGATATC.TTCATACAGTTTCACC
mAsPCR-seg82.2.Recoded CTCTTGATAGCGTGTIGGGIATGA
mAsPCR-seg82.2.Reverse CTGGCGGTGGTTCTCTCC
mAsPCR-seg82.2.Wild-Type CTCTTGATAGCGTGTTGGGTAGCT
mAsPCR-seg82.3.Recoded C.,GCGCAGAACACCATCTCA
mAsPCR-seg82.3.Reverse CATTTTGTIGACGCAGAGCCA
mAsPCR-seg82.3.Wild-Type GGCGCAGAACACCATCAGT
mAsPCR-seg82.4.Recoded TGTGTAICTGACTCGGTTTACCAAAIAAT,
mAsPCR-seg82.4.Reverse CGTCATATGA.TAC:GCCTGCATTC:
mAsPCR-seg82.4.Wild-Type TGTGTAAGTGACAGCGTTTAICAAATTA
mAsPCR-seg82.5.Recoded GCTTTTTCC:CGATCGCCTAG
mAsPCR-seg82.5.Reverse Al1KMCATAACCGGGTAAGCAA
mAsPCR-seg82.5.Wild-Type GCTTTTTCCCGATCGCCCAA
mAsPCR-seg82.6.Recoded C:AATACC:CCGTATCCACTCGTC
mAsPCR-seg82.6.Reverse GTTA.CC:ITTCGCCAGCATGATC
mAsPCR-seg82.6.Wild-Type CAATAC:CCGGTATCC:AC.FCG.TT
mAsPCR-seg82.7.Recoded CCGAGAACAGTACCGCAGA
mAsPCR-seg82.7.Reverse CCCCGGAATCTTCATACAGCA
mAsPCR-seg82.7.Wild-Type CCGAGAACAGTACCGCACT
mAsPCR-seg82.8.Recoded CCAGCCATCAGATTCCGTACG
mAsPCR-seg82.8.Reverse GCACAC:CACCACITCTCC,
mAsPCR-seg82.8.Wild-Type CCAGCCATCAGATTCCGTTCT
mAsPCR-seg83.1.Recoded CTCTAAAGAGT.TIGA.GAAATACACUll.k_.T
mAsPCR-seg83.1.Reverse TTGCIACCATtMCCGGATC:
mAsPCR-seg83.1.Wild-Type CTGTAAAGAGTTTGAGA_AATA.CACCTTCA
mAsPCR-seg83.2.Recoded TCAGGAATATCTGAGA=TGTTGTTTGA
mAsPCR-seg83.2.Reverse CGTACCAGTG.'9.CATACCGATAACT
mAsPCR-seg83.2.Wild-Type TCAGGAATATCACTGATTTTGTTGTTGCT
mAsPCR-seg83.3.Recoded CCTC4-;LAAATT.GTTGITTGCCT6A
mAsPCR-seg83.3.Reverse ATGGAACTC:-CGCGACCTG
mAsPCR-seg83.3.Wild-Type CCGCTATTGTTCTTTGCCACT
mAsPCR-seg83.4.Recoded CAGTTACCGCCCAGAGTGA
mAsPCR-seg83.4.Reverse CAGGGCAAAGTAGAATCATCGAAAG
mAsPCR-seg83.4.Wild-Type CAGTTACCGCCCAGAGACT
mAsPCR-seg83.5.Recoded ACGTCAGGATCTC:GACCGT
mAsPCR-seg83.5.Reverse CGCGAGGTGTCATCCATAAC
mAsPCR-seg83.5.Wild-Type ACGTCAGGATC:TCGACACA
mAsPCR-seg83.6.Recoded CGCAATATCGGTTATCGCGTAC
mAsPCR-seg83.6.Reverse CCTGGGGAGTCAATCACATCA
mAsPCR-seg83.6.Wild-Type CGC.AATATCGGTTATCGCGTAT
mAsPCR-seg83.7.Recoded TATTGGCGATCCTGATTATU_A,TI c
mAsPCR-seg83.7.Reverse CAGTGTAATTCGAGCCATTCTGC
mAsPCR-seg83.7.Wild-Type TATTGGCGATCCTGATTATGCGTTTAG
mAsPCR-seg83.8.Recoded GGC.:A:TACGAACTTGC:AGAGA
mAsPCR-seg83.8.Reverse GC1 __ 1111 CAGGCTCTAACGGA
mAsPCR-seg83.8.Wild-Type 'G'GCATACGAACTIGCAGACT
mAsPCR-seg84.1.Recoded GTTGACGGACGC'ACATAGTAT
mAsPCR-seg84.1.Reverse A,ACTGGTCTTCACTCGTCGTC
mAsPCR-seg84.1.Wild-Type GTTGACGGACGCAC ATAGTAG
mAsPCR-seg84.2.Recoded CGTACTTAAAGGTTGTTCAGATTCTTCT
mAsPCR-seg84.2.Reverse CGCAGAGTAAAACGGTAAGCC
mAsPCR-seg84.2.Wild-Type CGTATTGAA.AGGTTGTAGCGATAGTAGC
mAsPCR-seg84.3.Recoded AGTACAACAAATCTC:AGTCf.'ATCACTC
mAsPCR-seg84.3.Reverse ACAACTTTCAGACCGACCTCTAC
mAsPCR-seg84.3.Wild-Type AGTACAACAAAAGTCAGTCCATCACTT
mAsPCR-seg84.4.Recoded GGTGGTGATCAAGCCCTCA
mAsPCR-seg84.4.Reverse CATCTTTCCCCCAGGCGAA
mAsPCR-seg84.4.Wild-Type GGTGGTGATCAAGCC'CAGC
mAsPCR-seg84.5.Recoded CATCCATCCCTCCGTTCTCA
mAsPCR-seg84.5.Reverse CTCTACGGCCI'n.AGTCAGTCTATG
mAsPCR-seg84.5.Wild-Type CATCCATCCX:TCCGTTCAGC
mAsPCR-seg84.6.Recoded GATGCCACACGCCAGITT
mAsPCR-seg84.6.Reverse GATAA.AGATCGGCGGCATTACG
mAsPCR-seg84.6.Wild-Type GATGCCACACGCCAGTtC
mAsPCR-seg84.7.Recoded TGGAGTTCAAATTTACCC:CGTTTAAG
mAsPCR-seg84.7.Reverse ACGAAGAAATACCCATAACAATAAATGAAT
mAsPCR-seg84.7.Wild-Type TGGAGITCAAATTIACCCCCiTTTTAA
mAsPCR-seg84.8.Recoded CTGAATCTGACGGCGGAACTA
mAsPCR-seg84.8.Reverse ACGGGTAAAGATCGGGTTTATC AT
mAsPCR-seg84.8.Wild-Type CTGAATCTGACGGCGGAATTG
mAsPCR-seg85.1.Recoded CTTIC:TCGATCAGGTCTAICGTTTC
mAsPCR-seg85.1.Reverse TCAATCAGGCGGATGATCTCG
mAsPCR-seg85.1.Wild-Type CTM:TCGATCAGGTC:TATCAGGTCAG
mAsPCR-seg85.2.Recoded GAAATGCC.:GGTGGTCTTGG
mAsPCR-seg85.2.Reverse GGCGTCATCACCTTGATCGA
mAsPCR-seg85.2.Wild-Type CTAATGCCC;GIGGTCTTGC
mAsPCR-seg85.3.Recoded CCTCGAAATCCOGTGACAACTC
mAsPCR-seg85.3.Reverse TTTTTTAATGAATITGCTGGTTGAAAAATC
mAsPCR-seg85.3.Wild-Type CCAGTAAATCCCGTGACAACAG
mAsPCR-seg85.4.Recoded CAA TCTCGCCATTGTGA.CGT
mAsPCR-seg85.4.Reverse GAAACAGAAAGTGATCGTCAAAC:ATCT
mAsPCR-seg85.4.Wild-Type CAATCTCGCCATTGTGACGC
mAsPCR-seg85.5.Recoded TGTACTACCATATATTAATGAACAGCGTCTT
mAsPCR-seg85.5.Reverse GCAAGAAAATGGCGGAAGAATT
mAsPCR-seg85.5.Wild-Type TGIATTACCATATATTAATGAACAGCGITTA
mAsPCR-seg85.6.Recoded CTACCTGCCAA TTCATCATCAT C A
mAsPCR-seg85.6.Reverse ATACAGATGAATCGTACGCGTTTAG
mAsPCR-seg85.6.Wild-Type C,.:TACCTGCCAATAGTTCAAGTAGT
mAsPCR-seg85.7.Recoded CCACGACGATGCA.GGZJsAG
mAsPCR-seg85.7.Reverse GCTAAGATAATTATACTCAACGGATMACC
mAsPCR-seg85.7.Wild-Type CCACGACGATGCAGGCAC
mAsPCR-seg85.8.Recoded GCCCGACACCTGAATCTA.CTAG
mAsPCR-seg85.8.Reverse GCTGITTATTGCCATIGTTATTGCG
mAsPCR-seg85.8.Wild-Type GCCCGACACCGCTATC'TACTAA
mAsPCR-seg86.1.Recoded GTATACCCATCATCTGCTGG_A_'4TCT
mAsPCR-seg86.1.Reverse GCCCACTTTATCCCAATCCG
mAsPCR-seg86.1.Wild-Type GTATACCCATCATCTGC:TGGAAAGC
mAsPCR-seg86.2.Recoded GC..ATTGTTCATGTTATCTGC:IGAAAG
mAsPCR-seg86.2.Reverse GGT, .4AATCCGTAC:TTATC:ATCACCGT
mAsPCR-seg86.2.Wild-Type GC:ATIGTTCATGTTATCTGCGCTTAA
mAsPCR-seg86.3.Recoded TCACAAACAGAACGTGGAT=T
mAsPCR-seg86.3.Reverse CGGGAGGGGGCATCATTTAA
mAsPCR-seg86.3.Wild-Type TCACAAACAGAACGTGGATCTTCA
mAsPCR-seg86.4.Recoded CGTCGATTC TCAGGCAC:.AATCA
mAsPCR-seg86.4.Reverse GC:TGGACTGGCTTTGGATAAAATT
mAsPCR-seg86.4.Wild-Type CGTC:GATTCTCAGGCACAI,siAGT
mAsPCR-seg86.5.Recoded TakTGGACGTGAAAGTGGGTTC.:
mAsPCR-seg86.5.Reverse .kGCACCGCC:TGTAGTTTCG
mAsPCR-seg86.5.Wild-Type TGATGGACCTGAGTGGGTAG
mAsPCR-seg86.6.Recoded CTTCAGAGATTCGTTCCTGACCT
mAsPCR-seg86.6.Reverse GGCTGGAAC AAAACCGTCTG
mAsPCR-seg86.6.Wild-Type cavrc AGA.GAITCGITCCTG.ACCG
mAsPCR-seg86.7.Recoded GGATAAAGCGACGCTTATGTCA
mAsPCR-seg86.7.Reverse TSGTAGGCATTCTTAAGCACeSTC
mAsPCR-seg86.7.Wild-Type GGATA:4ACCGACGT-.TGATGAGC
mAsPCR-seg86.8.Recoded CAGAA.AG..TCGCC.GGTACCT
mAsPCR-seg86.8.Reverse CGTGGT.ATTGGTGTGGTC;_4,.AAG
mAsPCR-seg86.8.Wild-Type CAGACTA:TCC.CGGTACCG
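
In the legible rows above, each Recoded primer and its Wild-Type counterpart appear to share a common 5' portion and differ only at the 3' end, which is the usual layout for allele-specific forward primers paired with a shared Reverse primer. The following Python sketch illustrates that comparison for three primer pairs that survived extraction intact. The function name split_shared_prefix and the dictionary PRIMER_PAIRS are hypothetical and are not part of the specification; the sketch is illustrative only and does not describe the software of the disclosure.

```python
from typing import Dict, Tuple

# Three Recoded / Wild-Type primer pairs copied from legible rows of the
# table above (seg74.7, seg80.2, seg83.4). The dictionary name is hypothetical.
PRIMER_PAIRS: Dict[str, Tuple[str, str]] = {
    "seg74.7": ("ACCGTTCTGAATACATCAAGCAAC", "ACCGTTCTGAATACATCAAGCAAT"),
    "seg80.2": ("GAGTTTATTCGCGGCATGTCA", "GAGTTTATTCGCGGCATGAGT"),
    "seg83.4": ("CAGTTACCGCCCAGAGTGA", "CAGTTACCGCCCAGAGACT"),
}


def split_shared_prefix(recoded: str, wild_type: str) -> Tuple[str, str, str]:
    """Return (shared 5' prefix, recoded 3' tail, wild-type 3' tail)."""
    n = 0
    for a, b in zip(recoded, wild_type):
        if a != b:
            break
        n += 1
    return recoded[:n], recoded[n:], wild_type[n:]


if __name__ == "__main__":
    for name, (recoded, wild_type) in PRIMER_PAIRS.items():
        prefix, rec_tail, wt_tail = split_shared_prefix(recoded, wild_type)
        print(f"{name}: shared 5' prefix of {len(prefix)} nt, "
              f"recoded 3' tail {rec_tail}, wild-type 3' tail {wt_tail}")
```

Rows whose sequences contain extraction debris are left out of the sketch because their bases cannot be recovered from this text.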
Table 5. Summary of AGR codons changed by location in the genome, and failure rates by pool.
AGR pool   # AGR codons   # Successful   # Failed   % Success
AGR.1      11             10             1          91
AGR.2      12             10             2          83
AGR.3      10             10             0          100
AGR.4      7              7              0          100
AGR.5      14             13             1          93
AGR.6      8              8              0          100
AGR.7      13             11             2          85
AGR.8      9              8              1          89
AGR.9      10             9              1          90
AGR.10     13             12             1          92
AGR.11     7              6              1          86
AGR.12     9              6              3          67
Total      123            110            13         89
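
The percentages in Table 5 follow from the counts: each % Success value is the number of successful codon changes divided by the number of AGR codons in the pool, rounded to the nearest whole percent. The following Python sketch reproduces that arithmetic from the counts in the table. The names TABLE_5, percent_success and totals are hypothetical and are used for illustration only.

```python
from typing import List, Tuple

# (pool, AGR codons, successful, failed), transcribed from Table 5.
TABLE_5: List[Tuple[str, int, int, int]] = [
    ("AGR.1", 11, 10, 1), ("AGR.2", 12, 10, 2), ("AGR.3", 10, 10, 0),
    ("AGR.4", 7, 7, 0), ("AGR.5", 14, 13, 1), ("AGR.6", 8, 8, 0),
    ("AGR.7", 13, 11, 2), ("AGR.8", 9, 8, 1), ("AGR.9", 10, 9, 1),
    ("AGR.10", 13, 12, 1), ("AGR.11", 7, 6, 1), ("AGR.12", 9, 6, 3),
]


def percent_success(successful: int, codons: int) -> int:
    """Success rate as a whole-number percentage."""
    return round(100 * successful / codons)


def totals(rows: List[Tuple[str, int, int, int]]) -> Tuple[int, int, int, int]:
    """Column sums plus the overall success percentage."""
    codons = sum(r[1] for r in rows)
    successful = sum(r[2] for r in rows)
    failed = sum(r[3] for r in rows)
    return codons, successful, failed, percent_success(successful, codons)


if __name__ == "__main__":
    for pool, codons, successful, failed in TABLE_5:
        # Each row should satisfy codons = successful + failed.
        assert codons == successful + failed
        print(pool, percent_success(successful, codons))
    print("Total", *totals(TABLE_5))
```

Run as a script, the sketch prints one percentage per pool followed by the totals row, which can be compared against the table.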
REFERENCES
The specification identifies the references by author with the complete citations provided below.
The disclosure of each reference cited is hereby incorporated by reference in its entirety.
1. Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., et al. (2010). Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52-56.
2. Lajoie, M.J., Kosuri, S., Mosberg, J.A., Gregg, C.J., Zhang, D., and Church, G.M. (2013a). Probing the limits of genetic recoding in essential genes. Science 342, 361-363.
3. Lajoie, M.J., Rovner, A.J., Goodman, D.B., Aerni, H.R., Haimovich, A.D., Kuznetsov, G., Mercer, J.A., Wang, H.H., Carr, P.A., Mosberg, J.A., et al. (2013b). Genomically recoded organisms expand biological functions. Science 342, 357-360.
4. Crick, F.H. (1963). On the genetic code. Science 139, 461-464.
5. Liu, C.C., Schultz, P.G. Adding new chemistries to the genetic code. Annu. Rev. Biochem. 79, 413-444 (2010).
6. P. Marliere, The farther, the safer: a manifesto for securely navigating synthetic species away from the old living world. Syst. Synth. Biol. 3, 77-84 (2009).
7. Mandell, D.J. et al., Biocontainment of genetically modified organisms by synthetic protein design. Nature. 518, 55-60 (2015).
8. Rovner, A.J. et al., Recoded organisms engineered to depend on synthetic amino acids. Nature. 518, 89-93 (2015).
9. A. Ambrogelly, S. Palioura, D. Söll, Natural expansion of the genetic code. Nat. Chem. Biol. 3, 29-35 (2007).
10. A. Kano, Y. Andachi, T. Ohama, S. Osawa, Novel anticodon composition of transfer RNAs in Micrococcus luteus, a bacterium with a high genomic G + C content. Correlation with codon usage. J. Mol. Biol. 221, 387-401 (1991).
11. T. Oba, Y. Andachi, A. Muto, S. Osawa, CGG: an unassigned or nonsense codon in Mycoplasma capricolum. Proc. Natl. Acad. Sci. U. S. A. 88, 921-925 (1991).
12. G. Macino, G. Coruzzi, F. G. Nobrega, M. Li, A. Tzagoloff, Use of the UGA terminator as a tryptophan codon in yeast mitochondria. Proc. Natl. Acad. Sci. U. S. A. 76, 3784-3785 (1979).
13. J. Ling, P. O'Donoghue, D. Söll, Genetic code flexibility in microorganisms: novel mechanisms and impact on physiology. Nat. Rev. Microbiol. 13, 707-721 (2015).
14. K. J. Blight, A. A. Kolykhalov, C. M. Rice, Efficient initiation of HCV RNA replication in cell culture. Science. 290, 1972-1974 (2000).
15. J. Cello, A. V. Paul, E. Wimmer, Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science. 297, 1016-1018 (2002).
16. H. O. Smith, C. A. Hutchison, C. Pfannkoch, J. C. Venter, Generating a synthetic genome by whole genome assembly: φX174 bacteriophage from synthetic oligonucleotides. Proceedings of the National Academy of Sciences. 100, 15440-15445 (2003).
17. L. Y. Chan, S. Kosuri, D. Endy, Refactoring bacteriophage T7. Mol. Syst. Biol. 1, 2005.0018 (2005).
18. D. G. Gibson et al., Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science. 319, 1215-1220 (2008).
19. N. Annaluru et al., Total synthesis of a functional designer eukaryotic chromosome. Science. 344, 55-58 (2014).
20. G. Kudla, A. W. Murray, D. Tollervey, J. B. Plotkin, Coding-sequence determinants of gene expression in Escherichia coli. Science. 324, 255-258 (2009).
21. T. Tuller, Y. Y. Waldman, M. Kupiec, E. Ruppin, Translation efficiency is determined by both codon bias and folding energy. Proc. Natl. Acad. Sci. U. S. A. 107, 3645-3650 (2010).
22. J. B. Plotkin, G. Kudla, Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32-42 (2011).
23. D. B. Goodman, G. M. Church, S. Kosuri, Causes and effects of N-terminal codon bias in bacterial genes. Science. 342, 475-479 (2013).
24. M. Zhou et al., Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature. 495, 111-115 (2013).
25. T. E. F. Quax, N. J. Claassens, D. Söll, J. van der Oost, Codon Bias as a Means to Fine-Tune Gene Expression. Mol. Cell. 59, 149-161 (2015).
26. G. Boel et al., Codon influence on protein expression in E. coli correlates with mRNA levels. Nature. 529, 358-363 (2016).
27. F. J. Isaacs et al., Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science. 333, 348-353 (2011).
28. H. H. Wang et al., Programming cells by multiplex genome engineering and accelerated evolution. Nature. 460, 894-898 (2009).
29. K. M. Esvelt et al., Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat. Methods. 10, 1116-1121 (2013).
30. G. Posfai et al., Emergent properties of reduced-genome Escherichia coli. Science. 312, 1044-1046 (2006).
31. K. Temme, D. Zhao, C. A. Voigt, Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca. Proc. Natl. Acad. Sci. U. S. A. 109, 7085-7090 (2012).
32. A. H. Yona et al., tRNA genes rapidly change in evolution to meet novel translational demands. eLife. 2, e01339 (2013).
33. Y. Yamazaki, H. Niki, J.-I. Kato, in Microbial Gene Essentiality: Protocols and Bioinformatics, A. L. Osterman, S. Y. Gerdes, Eds. (Humana Press, Totowa, NJ, 2008), vol. 416 of Methods in Molecular Biology, pp. 385-389.
34. S. Anders, W. Huber, Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
35. S. Osawa, T. H. Jukes, Codon reassignment (codon capture) in evolution. J. Mol. Evol. 28, 271-278 (1989).
36. H. M. Salis, The ribosome binding site calculator. Methods Enzymol. 498, 19-42 (2011).
37. T. Conway et al., Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing. mBio. 5, e01442-14 (2014).
38. C. J. Gregg et al., Rational optimization of tolC as a powerful dual selectable marker for genome engineering. Nucleic Acids Res. 42, 4779-4790 (2014).
39. K. A. Datsenko, B. L. Wanner, One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. U. S. A. 97, 6640-6645 (2000).
40. A. Haldimann, B. L. Wanner, Conditional-replication, integration, excision, and retrieval plasmid-host systems for gene structure-function studies of bacteria. J. Bacteriol. 183, 6384-6393 (2001).
41. D. E. Deatherage, J. E. Barrick, Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151, 165-188 (2014).
42. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754-1760 (2009a).
43. H. Li et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078-2079 (2009b).
44. S. Anders, W. Huber, Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
45. Carr PA, et al. (2012) Enhanced multiplex genome engineering through co-operative oligonucleotide co-selection. Nucleic Acids Res 40(17):e132.
46. Lennox ES (1955) Transduction of linked genetic characters of the host by bacteriophage P1. Virology 1(2):190-206.
47. Schwartz SA & Helinski DR (1971) Purification and characterization of colicin E1. The Journal of biological chemistry 246(20):6318-6327.
48. Mosberg JA, Gregg CJ, Lajoie MJ, Wang HH, & Church GM (2012) Improving Lambda Red Genome Engineering in Escherichia coli via Rational Removal of Endogenous Nucleases. PLoS One 7(9):e44638.
49. Yaung SJ, Esvelt KM, & Church GM (2014) CRISPR/Cas9-mediated phage resistance is not impeded by the DNA modifications of phage T4. PLoS One 9(6):e98811.
50. Gibson DG, et al. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343-345.
51. Baba T, et al. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2:2006.0008.
52. Hashimoto M, et al. (2005) Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome. Mol Microbiol 55(1):137-149.
53. Ellis HM, Yu D, DiTizio T, & Court DL (2001) High efficiency mutagenesis, repair, and engineering of chromosomal DNA using single-stranded oligonucleotides. Proc Natl Acad Sci USA 98(12):6742-6746.
54. Markham NR & Zuker M (2008) UNAFold: software for nucleic acid folding and hybridization. Methods in molecular biology 453:3-31.
55. Rohland N & Reich D (2012) Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome research 22(5):939-946.
56. Zadeh JN, et al. (2011) NUPACK: Analysis and design of nucleic acid systems. J Comput Chem 32(1):170-173.
57. Li GW, Oh E, & Weissman JS (2012) The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484(7395):538-541.
58. Chen GF & Inouye M (1990) Suppression of the negative effect of minor arginine codons on gene expression; preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res 18(6):1465-1473.
59. Rosenberg AH, Goldman E, Dunn JJ, Studier FW, & Zubay G (1993) Effects of consecutive AGG codons on translation in Escherichia coli, demonstrated with a versatile codon test system. J Bacteriol 175(3):716-722.
60. Spanjaard RA & van Duin J (1988) Translation of the sequence AGG-AGG yields 50% ribosomal frameshift. Proc Natl Acad Sci USA 85(21):7967-7971.
61. Spanjaard RA, Chen K, Walker JR, & van Duin J (1990) Frameshift suppression at tandem AGA and AGG codons by cloned tRNA genes: assigning a codon to argU tRNA and T4 tRNA(Arg). Nucleic Acids Res 18(17):5031-5036.
62. Bonekamp F, Andersen HD, Christensen T, & Jensen KF (1985) Codon-defined ribosomal pausing in Escherichia coli detected by using the pyrE attenuator to probe the coupling between transcription and translation. Nucleic Acids Res 13(11):4113-4123.
63. Zeng Y, Wang W, & Liu WR (2014) Towards reassigning the rare AGG codon in Escherichia coli. Chembiochem: a European journal of chemical biology 15(12):1750-1754.
64. Yu D, et al. (2000) An efficient recombination system for chromosome engineering in Escherichia coli. Proc Natl Acad Sci USA 97(11):5978-5983.
65. Lajoie MJ, Gregg CJ, Mosberg JA, Washington GC, & Church GM (2012) Manipulating replisome dynamics to enhance lambda Red-mediated multiplex genome engineering. Nucleic Acids Res 40(22):e170.
66. Curran JF (1993) Analysis of effects of tRNA:message stability on frameshift frequency at the Escherichia coli RF2 programmed frameshift site. Nucleic Acids Res 21(8):1837-1843.
67. Ohtake K, et al. (2012) Efficient decoding of the UAG triplet as a full-fledged sense codon enhances the growth of a prfA-deficient strain of Escherichia coli. J Bacteriol 194(10):2606-2613.
68. Craigen WJ, Cook RG, Tate WP, & Caskey CT (1985) Bacterial peptide chain release factors: conserved primary structure and possible frameshift regulation of release factor 2. Proc Natl Acad Sci USA 82(11):3616-3620.
69. Goodman D, Kuznetsov, G., Lajoie, M., Ahern, B., (2015) Millstone, a web based genome engineering and analysis software.
70. Novoa EM & Ribas de Pouplana L (2012) Speeding with control: codon usage, tRNAs, and ribosomes. Trends in genetics: TIG 28(11):574-581.
71. Novoa EM, Pavon-Eternod M, Pan T, & Ribas de Pouplana L (2012) A role for tRNA modifications in genome structure and codon usage. Cell 149(1):202-213.
72. Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2(1):13-34.
73. Lajoie MJ, Söll D, & Church GM (2015) Overcoming challenges in engineering the genetic code. J Mol Biol.
74. N. R. Markham, M. Zuker, DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res. 33, W577-81 (2005).