new dataset for clustering benchmark

Dear orchestrator, during the BC2 workshop I added this new dataset to the clustering benchmark. The dataset comes from this paper, https://pubmed.ncbi.nlm.nih.gov/35354960/, which we published last year. Here is the GitLab repository: https://gitlab.renkulab.io/omb_benchmarks/bc2_2023/dataset-klein-2022-clustering

This is a CITE-seq dataset, and I used as "ground truth" the gating that we performed in silico on the surface protein expression levels. The gating defines 5 subpopulations of bone marrow progenitor cells that are classically studied by immunologists using flow cytometry. We show in the paper that the sorting scheme is not perfect (some populations overlap), but overall it is still interesting to see how clustering algorithms perform on such a task.
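To make the comparison concrete, here is a minimal sketch of how a clustering result could be scored against the gating labels. It uses the adjusted Rand index, which is my assumption for illustration and not necessarily the metric the benchmark applies. The labels `pop1`, `pop2`, `pop3` are hypothetical placeholders, not the population names from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree

    # Pairs of cells placed together in both labelings, in truth only, in pred only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = together_truth * together_pred / total_pairs
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (together_both - expected) / (maximum - expected)


# Hypothetical gating labels and cluster assignments for six cells.
gates = ["pop1", "pop1", "pop2", "pop2", "pop3", "pop3"]
clusters = [0, 0, 1, 1, 2, 2]
score = adjusted_rand_index(gates, clusters)
```

Cluster identifiers do not need to match the gate names: the index only looks at which cells are grouped together. Since some gated populations overlap, a score below 1 is to be expected even for a good clustering.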

Note that, to be able to upload the SingleCellExperiment to the GitLab repo, I retained only one of the 4 replicates (the replicates were very similar). I also filtered out the cells that were left out of the gating scheme: there is a "safety" margin on the expression of the surface markers, and cells falling into these margins are not assigned to any population. These cells are scattered across the whole dataset and should not correspond to a true cluster.
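For reference, the two filtering steps amount to something like the sketch below. This is only an illustration in plain Python, not the code used to build the dataset: the field names `replicate` and `gate`, the replicate identifiers, and the use of `None` for ungated cells are all hypothetical.

```python
def select_benchmark_cells(cells, replicate):
    """Keep cells from one replicate that were assigned to a gate."""
    return [
        cell
        for cell in cells
        if cell["replicate"] == replicate and cell["gate"] is not None
    ]


# Hypothetical per-cell metadata; gate=None marks a cell in the safety margin.
cells = [
    {"id": "c1", "replicate": "rep1", "gate": "pop1"},
    {"id": "c2", "replicate": "rep1", "gate": None},
    {"id": "c3", "replicate": "rep2", "gate": "pop2"},
    {"id": "c4", "replicate": "rep1", "gate": "pop3"},
]
kept = select_benchmark_cells(cells, replicate="rep1")
kept_ids = [cell["id"] for cell in kept]
```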

Let me know if more info is needed!
Best, Julien