RBPmap: mapping binding sites of RNA-binding proteins
Gene expression is often regulated by RNA-binding proteins (RBPs), which bind to mRNAs as well as to non-coding RNAs. RBPs recognize their RNA targets via specific binding sites on the RNA. Predicting the binding sites of RBPs is known to be a major challenge. We present a new webserver, RBPmap, freely accessible through its website, for the accurate prediction and mapping of RBP binding sites. RBPmap has been developed specifically for mapping RBPs in the human, mouse and Drosophila melanogaster genomes, though it supports other organisms too. RBPmap enables users to select motifs from a large database of experimentally defined motifs. In addition, users can provide any motif of interest, given as either a consensus sequence or a PSSM. The algorithm for mapping the motifs is based on a Weighted-Rank (WR) approach, which considers the clustering propensity of the binding sites and the overall tendency of regulatory regions to be conserved. In addition, RBPmap incorporates a position-specific background model, designed uniquely for different genomic regions, such as splice sites, 5’ and 3’ UTRs, non-coding RNA and intergenic regions. RBPmap was tested on high-throughput RNA-binding experiments and proved highly accurate.
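To illustrate the two motif input forms mentioned above, the following minimal sketch expands a consensus sequence written in IUPAC ambiguity codes into a position-specific matrix. It is an illustration only: the function name `consensus_to_matrix` is hypothetical, and the uniform split of probability across the bases allowed at each position is an assumption, not necessarily how RBPmap treats consensus motifs.

```python
# Hypothetical helper: expand an IUPAC consensus motif into a per-position
# probability matrix. Uniform weighting of ambiguous bases is an assumption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
BASES = "ACGU"


def consensus_to_matrix(consensus):
    """Return one dict per motif position, mapping each base to a probability."""
    matrix = []
    for symbol in consensus.upper():
        allowed = IUPAC[symbol]  # raises KeyError on a non-IUPAC symbol
        share = 1.0 / len(allowed)
        matrix.append({b: (share if b in allowed else 0.0) for b in BASES})
    return matrix


if __name__ == "__main__":
    for column in consensus_to_matrix("UGYA"):
        print(column)
```

A matrix in this form can be scored against a sequence window in the same way as a user-supplied PSSM, which is why a single mapping routine can serve both input types.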
A pipeline summarizing the RBPmap algorithm. (A) The mandatory input parameters for an RBPmap run: a query sequence and a motif of interest to be mapped to the sequence. (B) A match score for the motif is calculated for each site in the query sequence, in overlapping windows of the motif size. (C) The match scores are compared to the average match score that is calculated for each motif in a background of randomly chosen regulatory regions. This step uses two different thresholds: a significant threshold for the anchor site (default P-value < 0.005) and a suboptimal threshold for the secondary sites (default P-value < 0.01), used to evaluate the clustering propensity. (D) A WR score is calculated for a window of 50 nts around each significant site. This score reflects the propensity of suboptimal sites to cluster around the significant site, weighted by their match score to the motif of interest. (E) To reduce false-positive predictions, the WR scores are compared to a region-specific background model that is generated independently for each motif in different genomic regions, and non-significant results (P-value ≥ 0.05) are removed. The figure exemplifies the procedure for a query sequence spanning three different genomic regions (mid-intron, intronic region flanking a splice site and an internal exon). (F) Finally, a conservation-based filtering step is applied only to sites mapped to mid-intron/intergenic regions, filtering out sites that fall in non-conserved regions (below the average conservation level calculated for intronic regulatory regions).
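Steps (B) to (D) can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the RBPmap implementation: all function names are hypothetical, the match score is taken to be a product of per-position probabilities, the P-value is an empirical fraction of a user-supplied (non-empty) list of background scores, the 50-nt window is split as 25 positions on either side of the anchor, and the rank-weighted sum stands in for the actual WR formula, which the text above does not specify. Only the default thresholds (0.005 and 0.01) and the window size come from the caption.

```python
from bisect import bisect_left


def match_score(window, pssm):
    """Assumed score: product of per-position base probabilities."""
    score = 1.0
    for base, column in zip(window, pssm):
        score *= column.get(base, 0.0)
    return score


def window_scores(sequence, pssm):
    """Step (B): score every overlapping window of the motif length."""
    k = len(pssm)
    return [match_score(sequence[i:i + k], pssm)
            for i in range(len(sequence) - k + 1)]


def empirical_p_value(score, sorted_background):
    """Fraction of background scores that are >= the observed score."""
    n = len(sorted_background)
    return (n - bisect_left(sorted_background, score)) / n


def clustered_sites(sequence, pssm, background,
                    p_sig=0.005, p_sub=0.01, window=50):
    """Steps (C)-(D): find anchor sites and a rank-weighted cluster score."""
    bg = sorted(background)  # assumed non-empty
    scores = window_scores(sequence, pssm)
    pvals = [empirical_p_value(s, bg) for s in scores]
    half = window // 2
    results = []
    for i, p in enumerate(pvals):
        if p >= p_sig:
            continue  # not a significant anchor site
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        # Suboptimal neighbours near the anchor, best match first.
        neighbours = sorted(
            (scores[j] for j in range(lo, hi) if j != i and pvals[j] < p_sub),
            reverse=True,
        )
        # Illustrative weighting: anchor has rank 1, neighbours rank 2, 3, ...
        wr = scores[i] + sum(s / rank
                             for rank, s in enumerate(neighbours, start=2))
        results.append((i, wr))
    return results
```

In this sketch an isolated anchor site keeps only its own match score, while an anchor flanked by other suboptimal matches receives a higher score, which mirrors the clustering idea described in (D). Steps (E) and (F) would then filter these scores against region-specific background distributions and conservation levels, neither of which is modeled here.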