Tab-delimited text file format and other frequently asked questions:
- Predictions are based on GENCODE transcripts/genes (see info page). For ease of use, we classify these into several categories, such as intergenic lncRNAs.
- Site predictions are done on individual transcripts, and later aggregated into non-redundant sets for individual genes. This allows splice junction-spanning sites to be identified.
- The Region column indicates whether the site is in the 3'UTR, CDS or 5'UTR of a gene. Region assignment can be ambiguous when a site falls in different regions in different transcript isoforms. In this case, several regions are indicated (e.g. '3pUTR,ncRNA'). Sites in non-coding RNAs, or in non-coding spliceforms of coding genes, show 'ncRNA' in this column.
- MicroRNA family names follow the TargetScan nomenclature, as these are widely used.
- Seed match types also follow TargetScan naming: 7-mer-A1 is a 6-mer match flanked by an adenosine in the target sequence, at the position complementary to the 5'-most base of the microRNA.
- 7-mer-m8 is simply a 7-mer complementary seed match, but for consistency with TargetScan we use this name.
- 8-mer is a 7-mer match flanked by an adenosine (see 7-mer-A1 above).
- All site positions refer to the 5' end of the microRNA.
- The repeat column indicates that the seed overlaps with a RepeatMasker-defined repeat region (e.g. Alu).
- Conservation scores indicate conservation (%) in primates (9 species), non-primate mammals (23 species), and more distant vertebrates (13 species).
- In the downloadable files, these columns are preceded by a total conservation column (%) covering all 45 species (excluding human).
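
The seed match types described above can be illustrated with a short Python sketch. It is not the pipeline used to produce the predictions. It assumes the usual TargetScan-style definitions (the 6-mer core pairs with microRNA positions 2-7, and 7-mer-m8 adds a pair at position 8). The function name find_seed_sites, the bare "6-mer" label, the 0-based start index it returns, and the example sequences are all hypothetical and chosen only for illustration; they do not reflect the coordinate convention of the downloadable files.

```python
# Minimal sketch: classify seed matches of one microRNA in one target sequence.
# All names here are illustrative assumptions, not part of the database.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, target):
    """Return (start, type) for every 6-mer core match in `target`.

    `start` is the 0-based index of the core match in the target (an
    assumption of this sketch). Both sequences are read 5' to 3'.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")

    # Target bases that pair with microRNA positions 2-7 (the 6-mer core).
    core = reverse_complement(mirna[1:7])
    # Target base that pairs with microRNA position 8.
    m8_base = COMPLEMENT[mirna[7]]

    sites = []
    start = target.find(core)
    while start != -1:
        end = start + len(core)
        # Position 8 of the microRNA pairs just upstream (5') of the core.
        has_m8 = start > 0 and target[start - 1] == m8_base
        # The flanking adenosine sits just downstream (3') of the core,
        # opposite the 5'-most base of the microRNA.
        has_a1 = end < len(target) and target[end] == "A"

        if has_m8 and has_a1:
            kind = "8-mer"
        elif has_m8:
            kind = "7-mer-m8"
        elif has_a1:
            kind = "7-mer-A1"
        else:
            kind = "6-mer"  # bare core match; label assumed for this sketch
        sites.append((start, kind))
        start = target.find(core, start + 1)
    return sites


# Example microRNA sequence and made-up target fragments.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
for fragment in ("GGCUACCUCAGG", "GGCUACCUCGGG", "GGGUACCUCAGG"):
    print(fragment, find_seed_sites(mirna, fragment))
```

Checking the two flanking positions separately keeps the relationship between the types explicit: an 8-mer is a 7-mer-m8 match that also carries the flanking adenosine required by 7-mer-A1.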