Using CRISPR-ERA Webserver for sgRNA Design Protocol

Experiment Summary

The CRISPR-Cas9 system is emerging as a powerful technology for gene editing (modifying the genome sequence) and gene regulation (changing expression without modifying the genome sequence). Designing sgRNAs for specific genes or regions of interest is indispensable to CRISPR-based applications. CRISPR-ERA (http://crispr-era.stanford.edu/) is a web server for sgRNA design that supports both gene editing and gene regulation. This protocol describes how to design sgRNA sequences and build a genome-wide sgRNA library using CRISPR-ERA.

Equipment

  1. Personal computer for searching the CRISPR-ERA website
  2. High-performance computing cluster for building the genome-wide sgRNA library. Taking genome version hg19 as an example, at least 500 GB of storage is required

Software

  1. CRISPR-ERA (http://crispr-era.stanford.edu/)
  2. UCSC Genome Browser (Kent et al., 2002; http://genome.ucsc.edu/)
  3. Bowtie2 (Langmead et al., 2012; http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
  4. NCBI (https://www.ncbi.nlm.nih.gov/)
  5. Perl scripts (Programming language, https://www.perl.org/)
  6. Shell scripts (Programming language, Command Line Interface shell, https://www.linux.org/)

Procedure

A. Using CRISPR-ERA webserver for sgRNA searching

1. CRISPR-ERA webserver input (Figure 1)

a) Choose the intended type of gene manipulation: gene editing using nuclease, gene editing using nickase, gene repression, or gene activation.

b) Choose the host organism: Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, Rattus norvegicus, Mus musculus, or Homo sapiens. The organisms offered depend on the manipulation type chosen in step A1a.

c) Choose the input format: official gene name, gene location (target region for gene editing or transcriptional start sites (TSS) location for gene regulation), or gene sequence in FASTA format (using a textbox or uploading files).

Figure 1. CRISPR-ERA input process.

2. CRISPR-ERA webserver output (Figure 2)

Figure 2. CRISPR-ERA output webpage.

The output webpage contains two parts: 'See results in UCSC Genome Browser' and 'Results'.

a) Clicking 'click here to see result in UCSC Genome Browser' displays all sgRNA sequences in the UCSC Genome Browser. Each sgRNA is identified by its 'ID'. The sum of the E-score and S-score is shown as a color shade, keyed to the color bar.

b) The result table lists the sgRNA sequences and their properties, such as target gene, transcript ID, distance to TSS, location, and strand. sgRNAs starting with 'G' can be selected by filtering; these are suitable for CRISPR systems that express the sgRNA from a U6 promoter. When a targetable region belongs to more than one transcript, the result table shows the information for all of the transcripts, as in Figure 2.

c) The E-score and S-score columns summarize the features that affect sgRNA efficiency and specificity. Both scores are computed from criteria drawn from published data. The E-score represents sgRNA efficacy and reflects GC content, poly-T presence, and other sequence features. The S-score represents sgRNA specificity and is based on genome-wide off-target information. All sgRNA sequences can be downloaded.
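Once the results are downloaded, the 'G'-start filter from step A2b can also be applied offline. The sketch below is only illustrative: it assumes the download is a tab-delimited table with a header row, and the column names ("sgRNA sequence", "E-score", "S-score") and the example rows are hypothetical stand-ins, not the actual CRISPR-ERA file layout.

```python
import csv
import io

def filter_g_start(rows, seq_col="sgRNA sequence"):
    """Keep only rows whose sgRNA sequence begins with G."""
    return [r for r in rows if r[seq_col].upper().startswith("G")]

def total_score(row, e_col="E-score", s_col="S-score"):
    """Sum of E-score and S-score for one row."""
    return float(row[e_col]) + float(row[s_col])

# Hypothetical table standing in for a downloaded result file.
example = (
    "sgRNA sequence\tE-score\tS-score\n"
    "GGTGAATGAGGGCTTGCGA\t20\t15\n"
    "ATGCATGCATGCATGCATGC\t25\t20\n"
    "GACCTTGACCTTGACCTTGA\t15\t10\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
selected = filter_g_start(rows)
for r in selected:
    print(r["sgRNA sequence"], total_score(r))
```

For a real file, replace the `io.StringIO(example)` object with an opened file handle and adjust the column names to match the downloaded header.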

B. Genome-wide sgRNA library building pipeline

1. Download genome sequence files in FASTA format and genome annotation files in RefFlat or GFF format from the UCSC Genome Browser or the NCBI website. Taking genome version hg19 as an example, genome sequence and annotation files can be downloaded from http://hgdownload.soe.ucsc.edu/downloads.html.

2. The Perl scripts, which are provided after the material transfer form is submitted, search for 20 bp sgRNAs followed by the default PAM sequence (NGG), i.e., the pattern N20NGG. During the search, the location and strand of every potential sgRNA target site are recorded.

Run the Perl program:

perl find_all_sgRNA_z_f_c_y.pl hg19_dna.fa out_sgRNA.txt out_sgRNA_fasta.txt out_sgRNA_gc_t.txt out_nag_fasta.txt out_no_sgRNA.txt

# out_sgRNA.txt: all potential sgRNA sequences

# out_sgRNA_fasta.txt: all potential sgRNA sequences in FASTA format for the Bowtie step (with PAM sequence NGG)

# out_sgRNA_gc_t.txt: all sgRNA sequences with GC content and poly-T information

# out_nag_fasta.txt: all potential sgRNA sequences in FASTA format for the Bowtie step (unlike out_sgRNA_fasta.txt, the PAM sequence here is NAG)

# out_no_sgRNA.txt: number of sgRNA sequences on each chromosome
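The N20NGG search described in step B2 can be illustrated with a short Python sketch. This is not the authors' Perl script, which is distributed separately; the function names, the 0-based coordinate convention, and the choice to skip sites containing N are assumptions made for illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sgrna_sites(seq, pam_regex="[ACGT]GG", guide_len=20):
    """Yield (start, strand, guide, pam) for each guide followed by a PAM.

    start is the 0-based forward-strand coordinate of the leftmost base
    of the whole site (guide plus 3 bp PAM). Sites containing N are skipped.
    """
    seq = seq.upper()
    site_len = guide_len + 3
    # A lookahead reports overlapping sites, which a plain match would miss.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (guide_len, pam_regex))
    for m in pattern.finditer(seq):
        yield m.start(), "+", m.group(1), m.group(2)
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the reverse-complement offset back to forward coordinates.
        yield len(seq) - (m.start() + site_len), "-", m.group(1), m.group(2)

# Toy sequence: one site on the forward strand.
for site in find_sgrna_sites("A" * 20 + "TGG"):
    print(site)
```

Passing `pam_regex="[ACGT]AG"` gives the NAG variant that corresponds to out_nag_fasta.txt. For a whole genome the same scan would be applied chromosome by chromosome, which is why the protocol calls for a computing cluster.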

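3. Run Bowtie to find all possible off-target sequences for each sgRNA, considering both PAM = NGG and PAM = NAG and allowing up to 3 mismatched bases. Bowtie's -v option sets the maximum number of mismatches and -k the maximum number of alignments reported per sgRNA; the commands below use -v 2, so set -v 3 to allow three mismatches.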

bowtie -v 2 -k 100 ./hg19 -f out_sgRNA_fasta.txt sgRNA_bowtie_fasta.txt

bowtie -v 2 -k 100 ./hg19 -f out_nag_fasta.txt sgRNA_nag_bowtie_fasta.txt
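The Bowtie output can then be summarized per sgRNA. The sketch below assumes Bowtie's default tab-delimited output, in which the first field is the read name and the eighth field lists mismatch descriptors separated by commas (empty for a perfect match). The read names and example lines are hypothetical, since the FASTA headers written by the Perl script are not shown here.

```python
from collections import defaultdict

def count_hits_by_mismatch(lines, max_mismatches=3):
    """Map each read name to a list whose i-th entry counts hits with i mismatches."""
    counts = defaultdict(lambda: [0] * (max_mismatches + 1))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        read_name = fields[0]
        descriptors = fields[7] if len(fields) > 7 else ""
        n_mm = len(descriptors.split(",")) if descriptors else 0
        if n_mm <= max_mismatches:
            counts[read_name][n_mm] += 1
    return dict(counts)

# Hypothetical alignment lines in Bowtie's default output layout.
example_lines = [
    "sg1\t+\tchr1\t1000\tGGTGAATGAGGGCTTGCGAA\tIIIIIIIIIIIIIIIIIIII\t0\t",
    "sg1\t-\tchr5\t2000\tGGTGAATGAGGGCTTGCGAA\tIIIIIIIIIIIIIIIIIIII\t0\t3:A>G,10:C>T",
    "sg2\t+\tchr2\t3000\tGACCTTGACCTTGACCTTGA\tIIIIIIIIIIIIIIIIIIII\t0\t",
]
hits = count_hits_by_mismatch(example_lines)
for name, by_mm in hits.items():
    print(name, by_mm)
```

Because each sgRNA also aligns to its own target site, one of the zero-mismatch hits is the on-target site rather than an off-target, assuming that site is present in the indexed genome.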

4. Compute the E-score and S-score from the sgRNA sequence features. The E-score is computed from GC content and poly-T presence (mammalian genomes only), and the S-score from the off-target information derived in step B3. The criteria can be customized and differ between organisms and types of gene manipulation (Figure 3).

Figure 3. An example of E-score and S-score computation. Sequence: GGTGAATGAGGGCTTGCGA.
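As a rough illustration of step B4, the sketch below computes the two sequence features named above (GC content and poly-T presence) and combines them with off-target counts into penalty-style scores. The scoring form (penalties subtracted from zero) and every threshold and penalty value are placeholders chosen for this example; they are not the criteria used by CRISPR-ERA, which are described on its 'Help' page.

```python
from dataclasses import dataclass

@dataclass
class EfficiencyCriteria:
    # All values are user-supplied placeholders, not CRISPR-ERA settings.
    gc_min: float
    gc_max: float
    gc_penalty: float
    polyt_run: int
    polyt_penalty: float

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def has_poly_t(seq, run):
    """True if the sequence contains at least `run` consecutive T bases."""
    return "T" * run in seq.upper()

def e_score(seq, criteria):
    """Penalty-style efficiency score (hypothetical form)."""
    score = 0.0
    if not (criteria.gc_min <= gc_content(seq) <= criteria.gc_max):
        score -= criteria.gc_penalty
    if has_poly_t(seq, criteria.polyt_run):
        score -= criteria.polyt_penalty
    return score

def s_score(hits_by_mismatch, penalties):
    """Penalty-style specificity score (hypothetical form).

    hits_by_mismatch[i] is the number of off-target hits with i mismatches.
    """
    return -sum(n * p for n, p in zip(hits_by_mismatch, penalties))

placeholder = EfficiencyCriteria(gc_min=0.3, gc_max=0.8, gc_penalty=5,
                                 polyt_run=4, polyt_penalty=10)
seq = "GGTGAATGAGGGCTTGCGA"  # sequence from Figure 3
print(round(gc_content(seq), 3), has_poly_t(seq, 4), e_score(seq, placeholder))
print(s_score([0, 1, 2], penalties=[10, 5, 1]))
```

Keeping the criteria in a separate object mirrors the protocol's point that they can be customized per organism and per type of gene manipulation.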

5. Extract the TSS location and coding region of each gene from the genome annotation files. For gene editing, the sgRNA target region is the coding region. For gene repression, the target region extends from 1.5 kb upstream to 1.5 kb downstream of the TSS; for gene activation, it is the 1.5 kb upstream of the TSS. Hash searching these regions against the genome-wide sgRNA library derived in step B4 retrieves the eligible sgRNAs, with their details, for every gene. Then update the E-score and S-score using the additional target location information. Figure 3 shows the E-score and S-score computation for one sgRNA designed for Pou5f1 repression. Integrating this information yields the sgRNA database for each type of gene manipulation.
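The region lookup in step B5 can be sketched as follows. The window sizes come from the text above; the strand handling (upstream means higher coordinates for minus-strand genes), the bin size, the tuple layout, and the example coordinates are assumptions for illustration, and the dictionary-of-bins index is only one possible way to implement a hash search.

```python
from collections import defaultdict

BIN_SIZE = 1000  # hypothetical bin width for the hash index

def target_window(tss, strand, manipulation, flank=1500):
    """Return the inclusive (start, end) search window around a TSS."""
    if manipulation == "repression":
        return max(0, tss - flank), tss + flank
    if manipulation == "activation":
        if strand == "+":
            return max(0, tss - flank), tss
        return tss, tss + flank  # upstream of a minus-strand gene
    raise ValueError("unsupported manipulation: %s" % manipulation)

def build_index(sgrnas):
    """Hash sgRNAs, given as (chrom, pos, strand, seq), by (chrom, bin)."""
    index = defaultdict(list)
    for sg in sgrnas:
        index[(sg[0], sg[1] // BIN_SIZE)].append(sg)
    return index

def query(index, chrom, start, end):
    """Return sgRNAs on chrom whose position lies within [start, end]."""
    hits = []
    for b in range(start // BIN_SIZE, end // BIN_SIZE + 1):
        for sg in index.get((chrom, b), []):
            if start <= sg[1] <= end:
                hits.append(sg)
    return hits

# Hypothetical library entries and TSS, not real annotation.
library = [("chr1", 9000, "+", "GGTGAATGAGGGCTTGCGAA"),
           ("chr1", 10800, "-", "GACCTTGACCTTGACCTTGA"),
           ("chr1", 20000, "+", "ATGCATGCATGCATGCATGC")]
index = build_index(library)
start, end = target_window(10000, "+", "activation")
print(query(index, "chr1", start, end))
```

Only the bins overlapping the window are inspected, so the lookup cost depends on the window size rather than on the size of the genome-wide library.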

Data Analysis

After finding the candidate sgRNA sequences, the essential step is to evaluate the efficiency and specificity of each one. This protocol provides a general method for computing the E-score and S-score when building genome-wide sgRNA libraries. An sgRNA database for a specific gene manipulation should apply additional criteria beyond those used for the genome-wide library, such as exon location for gene editing and distance to the TSS for gene regulation. For example, in gene regulation, efficiency decreases as the distance from the TSS increases. A more detailed description of the E-score and S-score can be found on the 'Help' webpage of the CRISPR-ERA webserver (http://crispr.stanford.edu/help.jsp).
