Great work led by Moran Cabili and Margaret Dunagin. A wonderful collaboration between the Rinn, Regev and Raj labs!
The advent of tiling microarrays and then deep sequencing has revealed that the cell contains many long transcripts that have many of the hallmarks of messenger RNA (splicing, polyadenylation, etc.) but very low protein coding potential. These long non-coding RNAs (lncRNAs) have many putative functions in the cell, including control of gene expression. However, the mechanisms underlying their behavior have often proven elusive.
One of the most well-known long non-coding RNAs is Xist, which is involved in X chromosome dosage compensation in eutherians. Part of the reason we know something about how Xist works is direct imaging studies showing that Xist coats one copy of the X chromosome. Debate continues about how many Xist molecules are present in each cell, which has implications for mechanism (at how many sites is Xist active?) and shows the value of absolute quantification and localization. Unfortunately, we do not have any such information for the majority of recently identified lncRNAs, and we have no idea how general the lessons learned from Xist are.
We wanted to develop a more systematic picture of lncRNA localization and abundance at the single cell level. Thus, we used our single molecule RNA FISH method, which enables detection of individual RNA molecules by fluorescence microscopy, to interrogate a panel of around 35 representative lncRNAs for localization and abundance in single cells.
For regular mRNA FISH, we have a very high success rate with our standard probe design software. However, lncRNAs often contain repetitive elements and other sequences that are hard to target uniquely. When an oligonucleotide targets such a region, it can bind off-target, thus creating spurious signals. We controlled for this by using two-color assays: if two differently colored probe sets targeting the same RNA colocalize, then they must be binding specifically. This eliminated a number of probe sets, and appears to be a critical validation step for RNA FISH on lncRNAs.
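As a rough illustration of the two-color check (a minimal sketch, not the analysis code used in the paper), one could take the spot coordinates detected in each color and ask what fraction of spots in one channel have a partner nearby in the other. The function name, the 2-pixel distance cutoff, and the example coordinates are all hypothetical.

```python
import math

def colocalized_fraction(spots_a, spots_b, max_dist=2.0):
    """Fraction of channel-A spots with a channel-B spot within max_dist.

    spots_a, spots_b: lists of (x, y) spot coordinates, in pixels.
    max_dist: hypothetical colocalization cutoff, in pixels.
    Uses a simple nearest-neighbor test, so one B spot can pair with
    several A spots; a one-to-one matching would be stricter.
    """
    if not spots_a or not spots_b:
        return 0.0
    hits = 0
    for a in spots_a:
        nearest = min(math.dist(a, b) for b in spots_b)
        if nearest <= max_dist:
            hits += 1
    return hits / len(spots_a)

# Made-up coordinates: two of the three channel-A spots have a partner.
channel_a = [(10.0, 10.0), (40.0, 22.0), (75.0, 60.0)]
channel_b = [(10.5, 9.8), (40.2, 22.4), (5.0, 90.0)]
print(colocalized_fraction(channel_a, channel_b))
```

A probe set whose spots mostly lack a partner in the other color would be a candidate for off-target binding. Where to set the cutoff fraction is a judgment call that this sketch does not address.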
We performed RNA FISH with probe sets for our panel across three cell types: HeLa, human foreskin fibroblasts and human lung fibroblasts. We found a wide range of expression levels and patterns, from lncRNAs expressed at many hundreds of molecules or more per cell (MALAT1) to a large number expressed at just a few molecules per cell. Of course, the power of our technique is that it also showed us where these lncRNAs were, not just how many there were. We found that overall, they were much more nuclear biased than mRNA.
Overall, we found that the vast majority of our lncRNA localizations could be described as a combination of three underlying patterns: bright foci (most likely at the site of transcription), mono-disperse nuclear RNA, and mono-disperse cytoplasmic RNA. We found examples consisting of each of these independently, and examples in which they all appeared in combination. Our pictures suggest a model in which lncRNA can accumulate at the transcription site, then may diffuse away from there and sometimes make it to the cytoplasm. Note that one can find these patterns in mRNAs as well, but nuclear blobs for mRNAs are typically much less bright, especially when normalized for the total number of mRNAs floating around in the cell, which is typically much larger than for lncRNA.
The generally low abundance of most lncRNAs has led many researchers to wonder how they could exert their function at such low copy numbers. One hypothesis in the field is that while most cells may have few or zero copies of a given lncRNA, occasional rare cells might have very large numbers, thus allowing the lncRNA to play its functional role in those cells. We performed an extensive single cell analysis of lncRNA expression, and while we observed some variability, none of the lncRNAs had variability beyond that of typical mRNAs, and we found no evidence for rare cells with unusually high abundance.
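To make the idea concrete, here is a minimal sketch of how per-cell counts might be summarized to look for such rare cells. It is not the paper's analysis: the coefficient of variation as the variability measure, the 10-fold-over-median outlier rule, and the example counts are all assumptions chosen for illustration.

```python
import statistics

def expression_summary(counts, outlier_fold=10.0):
    """Summarize per-cell RNA counts for one gene.

    counts: list of spot counts, one per cell.
    outlier_fold: hypothetical rule; a cell is flagged if its count exceeds
    outlier_fold times the median (median floored at 1 molecule).
    """
    mean = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    cv = sd / mean if mean > 0 else float("nan")
    threshold = outlier_fold * max(statistics.median(counts), 1.0)
    n_outliers = sum(1 for c in counts if c > threshold)
    return {"mean": mean, "cv": cv, "n_outliers": n_outliers}

# Made-up counts for a low-abundance transcript: no cell stands out.
typical = [2, 3, 1, 4, 2, 0, 3, 2, 5, 3]
print(expression_summary(typical))

# The same population plus one hypothetical "jackpot" cell.
with_rare_cell = typical + [60]
print(expression_summary(with_rare_cell))
```

Comparing the resulting variability measure for a lncRNA against the same measure for mRNAs of similar mean abundance is one way to ask whether the lncRNA is unusually variable.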
Many lncRNAs lie near coding genes but are transcribed in the opposite direction. We wondered whether this physical proximity would lead to any particular transcriptional associations. We found that a couple of the lncRNAs showed a correlation with their neighboring coding gene at the single cell level, but most did not, suggesting (though not proving) that there is no general relationship between the transcription of the lncRNA and the proximal coding gene.
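A single cell correlation of this kind can be sketched as follows, assuming one has paired counts of the lncRNA and the neighboring mRNA in the same cells. This is an illustration rather than the paper's method: Pearson correlation is an assumed choice of statistic, and the counts are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of per-cell counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 cells")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined if either gene is constant
    return sxy / math.sqrt(sxx * syy)

# Made-up counts in the same six cells (hypothetical gene pair).
lncrna_counts = [1, 0, 3, 2, 5, 4]
mrna_counts = [12, 8, 30, 22, 41, 35]
print(pearson_r(lncrna_counts, mrna_counts))
```

With counts of only a few molecules per cell, the correlation estimate is noisy, so a weak or absent correlation in a modest number of cells does not rule out a relationship.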

Localization and abundance analysis of human lncRNAs at single cell and single molecule resolution

By Moran Cabili, Margaret Dunagin, Patrick McClanahan, Andrew Biaesch, Olivia Padovan-Merhar, Aviv Regev, John Rinn, Arjun Raj

Link to Manuscript


Questions:

At this point, there are a number of excellent reviews on lncRNAs. Here are a few:

Mercer et al. 2009

Rinn and Chang 2013

Ulitsky and Bartel 2013

We did not test for stress-induced relocalization, but that would be a very interesting aspect to consider. We have looked at stress-induced changes in mRNA localization and didn't notice any major changes, but lncRNA could certainly behave differently.

We had to choose the lncRNAs in our panel carefully because RNA FISH is relatively low throughput, and so it was not really feasible to do more than around 50 probe sets. We wanted to cover a variety of types of lncRNA and expression levels while still ensuring that we targeted some of the most promising ones (i.e., high abundance). We ultimately chose a panel with a range of tissue specificity and expression levels:

We also included a bunch of "classic" lncRNAs for which there was already a lot of literature support, like Xist, MALAT1, and NEAT1. These tended, perhaps unsurprisingly, to be more highly expressed than the others in the panel. Also, we chose a number of ubiquitous lncRNAs and a number of lncRNAs specific to one or two of the three cell lines tested (HeLa, human foreskin fibroblast, human lung fibroblast):

We found that some were just qualitatively different: blobs showed up in different locations for the two colors. Here's an example where the full probe set gives blobs, but those blobs only show up in the "odds" and not in the "evens":

Sometimes we would get a quantitative inconsistency, in which some spots colocalized, but the full probe set would give higher spot counts:

These could probably be “rescued” by isolating the offending oligonucleotides, but due to the labor involved, we elected not to bother.


Since the spurious signals are most likely due to single oligonucleotides binding off-target, whatever they bind to is probably highly expressed. Most likely, it’s some sort of repetitive element or similar genomic contaminant; these are much easier to find in lncRNA than in mRNA coding sequences, for instance.

Not particularly. For instance, FPKM didn't really correlate, and neither did the number of probes designed to target the gene, although perhaps the really short ones at very low FPKM were more prone to lack of signal:

This suggests that there's no real way to know which lncRNAs are likely to be FISHable ahead of time. It is also unclear whether ones with no signal really just didn't exist (RNA-seq artifact) or whether RNA FISH had a problem (RNA FISH artifact).

For the most part, yes. For the few examples in which we saw a discrepancy, mostly it was just due to a change in the abundance: as the mean increased, you might find, say, more cytoplasmic RNA.

One standard way to show that a spot is at the site of transcription is the use of intron-specific probes that target the pre-spliced RNA. Most splicing happens co-transcriptionally and the introns degrade rapidly post-splicing, so intron-targeting probes will light up RNA at or near the site of transcription itself. We haven't done this for most of the lncRNAs in our panel, but for the few lncRNAs we have tried this with (e.g. linc-HOXA1, lincSFPQ (unpublished)), they have colocalized at the site of transcription.

Yes, there are definitely such examples, including MALAT1 and NEAT1.

In each cell, we also stained for CCNA2 mRNA, which is only abundant during the S and G2 phases of the cell cycle. We saw three lncRNAs that correlated with CCNA2, indicating some potential cell-cycle association.

It is certainly difficult to prove the negative, because it's always possible that we just didn't look at enough cells. For a few genes that had low expression, we did look at larger numbers of cells (500+). We didn't find any evidence for rare cells with high expression.
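
To put a rough number on "enough cells", here is a back-of-the-envelope sketch. It assumes that cells are sampled independently and that a high-expressing cell would always be recognized when imaged, which are simplifications; the function names are ours for illustration.

```python
def prob_miss_rare_cells(fraction, n_cells):
    """Chance of seeing zero rare cells among n_cells, if a given fraction
    of the population are rare high expressers (independent sampling)."""
    return (1.0 - fraction) ** n_cells

def max_fraction_ruled_out(n_cells, confidence=0.95):
    """Smallest rare-cell fraction that would have shown up at least once
    among n_cells with the given confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_cells)

# With 500 cells, how likely is it to miss a subpopulation making up 1% of cells?
print(prob_miss_rare_cells(0.01, 500))

# What rare-cell fraction can 500 cells exclude at 95% confidence?
print(max_fraction_ruled_out(500))
```

Under these assumptions, imaging 500 cells without seeing a single high expresser argues against subpopulations larger than about 0.6% of cells, but says little about ones that are rarer still.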