A combined sequence and structure based approach for discovering enriched motifs in RNA from in vivo binding data
Here we present a novel computational RNA sequence and structural motif discovery method, named SMARTIV (Structural and Sequence Motif Enrichment Analysis for Ranked RNA Data from In-Vivo Experiments). Our tool is freely accessible at
Background: RNA binding proteins (RBPs) play an important role in cell regulation processes. Many RBPs recognize RNA binding sites characterized by specific short sequences to regulate gene expression. Both RNA primary sequence and its secondary structure affect specific RNA recognition by RBPs. In recent years, several experimental approaches, such as CrossLinking and ImmunoPrecipitation (CLIP) based methods, were developed to identify RBP targets. However, these methods do not provide information regarding structural preferences of the protein. While methods to obtain the structure of RNA are available, inferring both the sequence and the structure preferences of RBPs remains a challenge.
Results: SMARTIV is designed to discover combined sequence and structure binding motifs from in-vivo RNA binding data (sequences containing target sites), the ranking of their binding scores, and their predicted secondary structure. The combined motifs are provided in a unified 8-letter color representation that is informative and easy to interpret visually. We tested the method on CLIP data from different platforms, including HITS-CLIP, PAR-CLIP, iCLIP and eCLIP, for a variety of RBPs.
Conclusions: Our results are highly consistent with many known binding motifs inferred from in-vivo and in-vitro data, and offer additional information on their structural preferences. To our knowledge, the SMARTIV method is one of the most efficient of its kind.
Polishchuk M et al., Methods, 2017
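The unified 8-letter representation mentioned in the Results can be illustrated with a small sketch. It merges a nucleotide sequence with a dot-bracket secondary structure string into a single string over eight symbols. The letter convention used here (uppercase for unpaired, lowercase for paired) and the function name `encode_8_letter` are assumptions made for illustration; the abstract does not specify the symbols SMARTIV uses internally.

```python
def encode_8_letter(sequence: str, structure: str) -> str:
    """Merge a sequence and a dot-bracket structure into an 8-letter string.

    Hypothetical convention (not taken from the SMARTIV paper):
    uppercase A/C/G/U = unpaired nucleotide, lowercase a/c/g/u = paired.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # Normalise to an uppercase RNA alphabet (DNA-style T becomes U).
    rna = sequence.upper().replace("T", "U")

    encoded = []
    for base, state in zip(rna, structure):
        if base not in "ACGU":
            raise ValueError(f"unexpected nucleotide: {base!r}")
        if state in "()":
            encoded.append(base.lower())   # base-paired position
        elif state == ".":
            encoded.append(base)           # unpaired position
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(encoded)


# A short hairpin: three paired bases, a three-base loop, three paired bases.
print(encode_8_letter("GGGAAACCC", "(((...)))"))  # gggAAAccc
```

Encoding both layers in one string means that a motif finder built for plain sequences can, in principle, be run unchanged on the larger alphabet.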
A flowchart describing SMARTIV. The SMARTIV method takes as input ranked RNA binding data (e.g. CLIP-seq data) and predicted RNA secondary structure. Its output consists of combined sequence and structure motifs (in an 8-letter alphabet) and sequence motifs (in a 4-letter alphabet), represented by logos and corresponding PWMs ranked by p-value. Combined sequence and structure motif logos in the 8-letter alphabet use specific colors for paired and unpaired nucleotides, as shown in the flower in the center.
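As a rough illustration of the matrices behind such logos, the sketch below computes per-position letter frequencies over an 8-letter alphabet from a set of aligned motif occurrences. It is not SMARTIV's implementation: the alphabet symbols (uppercase for unpaired, lowercase for paired), the function name `position_frequency_matrix` and the `pseudocount` parameter are all assumptions. A PWM in the stricter sense would usually convert these frequencies into log-odds scores against a background model.

```python
ALPHABET = "ACGUacgu"  # hypothetical symbols: uppercase unpaired, lowercase paired


def position_frequency_matrix(sites, pseudocount=1.0):
    """Return one {letter: frequency} dict per motif position.

    `sites` is a list of equal-length strings over ALPHABET, assumed to be
    already aligned occurrences of a single motif.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    matrix = []
    for pos in range(width):
        # Start every letter at the pseudocount so unseen letters are not zero.
        counts = {letter: float(pseudocount) for letter in ALPHABET}
        for site in sites:
            letter = site[pos]
            if letter not in counts:
                raise ValueError(f"unexpected symbol: {letter!r}")
            counts[letter] += 1.0
        total = sum(counts.values())
        matrix.append({letter: count / total for letter, count in counts.items()})
    return matrix


# Three made-up aligned sites: a paired g, an unpaired A or U, a paired c.
pfm = position_frequency_matrix(["gAc", "gAc", "gUc"], pseudocount=0.0)
print(pfm[1]["A"], pfm[1]["U"])
```

Each row of the result sums to 1, so it can be passed to a logo-drawing routine that assigns one color per symbol.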