opt-SNE

opt-SNE is a modified version of t-SNE that enables high-quality embeddings in the optimal amount of compute time without having to tune algorithm parameters.

About opt-SNE

opt-SNE was developed to solve two fundamental problems with t-SNE. The first is its tendency to fail to produce useful embeddings of large datasets. The second is the need to search the algorithm parameter space empirically to find optimal settings.


The following benefits are commonly observed with opt-SNE compared with conventional strategies for running t-SNE:

  • High-quality embeddings ~2-5x faster
  • Reliable embedding of datasets with numbers of observations that were previously prohibitively large (e.g., 5×10⁶, 20×10⁶)
  • Improved local structure resolution
  • No need for expensive repeat runs to optimize settings when the initial run fails to produce a good embedding

To see examples of the benefits listed above and for thorough background and discussion on this work, please read the paper available in Nature Communications.

Running opt-SNE

Omiq supports two ways to run opt-SNE:


1) Open-source

To build and run opt-SNE on any personal computer, go to the Multicore-opt-SNE GitHub repository and follow the directions there. Omiq is happy to offer free help with this process; just reach out using the form below.


2) On the Web

For ease of use and performance, opt-SNE is also available in the cloud-based OMIQ Data Science Platform. This requires neither software installation nor experience with command-line interfaces. To get access to this service, sign up using the form below.



Data

Some data files referenced within the opt-SNE paper are available for download below. All files are zipped.


Flow18parameter

18-parameter fluorescence flow cytometry file from Belkina AC, Snyder-Cappione JE. OMIP-037: 16-color panel to measure inhibitory receptor signatures from multiple human immune cell subsets. Cytometry A. 2017 Feb;91(2):175-179. doi: 10.1002/cyto.a.22983.

  • flow18_annotated.fcs - an FCS file with all original channels. An extra channel is included that identifies cell types assigned by expert gating. The data are compensated, and all fluorescence parameters are scaled by the function asinh(x/150).
  • flow18_for_optsne.csv - the data from above converted to a CSV file that can be used as input to the open-source Multicore-opt-SNE package. Only channels used for opt-SNE analysis are included in the file. Note that the events are sorted by class ID, so if you subset the file, randomize the row order first.
  • tsne_results_flow18.csv - an example of opt-SNE results for the Flow18parameter dataset.
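The scaling described above is the arcsinh transform with a cofactor: 150 for the Flow18parameter fluorescence channels and 5 for the Mass41parameter mass channels. As a minimal sketch, assuming NumPy and the hypothetical helper names `asinh_scale` and `inverse_asinh_scale` (not part of any Omiq or Multicore-opt-SNE API), the transform and its inverse look like this:

```python
import numpy as np

# Hypothetical helpers for illustration; not part of Omiq or Multicore-opt-SNE.
def asinh_scale(x, cofactor):
    """Apply asinh(x / cofactor) element-wise."""
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)

def inverse_asinh_scale(y, cofactor):
    """Undo asinh_scale: recover x = cofactor * sinh(y)."""
    return cofactor * np.sinh(np.asarray(y, dtype=float))

# Cofactors stated on this page for each dataset.
FLOW18_COFACTOR = 150.0
MASS41_COFACTOR = 5.0

raw = np.array([-300.0, 0.0, 150.0, 15000.0])
scaled = asinh_scale(raw, FLOW18_COFACTOR)
print(scaled)
```

The transform is roughly linear near zero and logarithmic for large values, which is why it is a common choice for compressing cytometry intensities; the cofactor sets where that switch happens.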

Mass41parameter

41-parameter mass cytometry dataset from Bendall SC et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science. 2011 May 6;332(6030):687-96. doi: 10.1126/science.1198704.

  • mass41_annotated.fcs - an FCS file produced by concatenating multiple files from the original dataset. An extra channel is included that identifies cell types assigned by expert gating. A second extra channel identifies the file of origin. All mass parameters are scaled by the function asinh(x/5).
  • mass41_for_optsne.csv - the data from above converted to a CSV file that can be used as input to the open-source Multicore-opt-SNE package. Only channels used for opt-SNE analysis are included in the file. Note that the events are sorted by class ID, so if you subset the file, randomize the row order first.
  • tsne_results_mass41.csv - an example of opt-SNE results for the Mass41parameter dataset.
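Because the *_for_optsne.csv files are sorted by class ID, taking the first N rows would keep only a few cell types. A minimal sketch of shuffling before subsetting, assuming NumPy and that each CSV is purely numeric with one header row (the `skiprows=1` default is an assumption; set it to 0 if the file has no header). The functions `shuffled_subset` and `load_and_subset` are hypothetical names for illustration:

```python
import numpy as np

def shuffled_subset(data, n, seed=0):
    """Return n rows of `data`, drawn without replacement in random order."""
    data = np.asarray(data)
    if n > data.shape[0]:
        raise ValueError("n exceeds the number of rows")
    rng = np.random.default_rng(seed)
    # A random permutation breaks the sort-by-class ordering before truncating.
    idx = rng.permutation(data.shape[0])[:n]
    return data[idx]

def load_and_subset(path, n, seed=0, skiprows=1):
    """Load a numeric CSV (assumed header rows: `skiprows`) and subsample it."""
    data = np.loadtxt(path, delimiter=",", skiprows=skiprows, ndmin=2)
    return shuffled_subset(data, n, seed)
```

For example, `load_and_subset("mass41_for_optsne.csv", 100_000)` would return 100,000 randomly chosen events with every class represented roughly in proportion to its size. Fixing `seed` keeps the subset reproducible across runs.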