
Tahoe Therapeutics Secures $30 Million to Build the World’s Largest Dataset for Training AI-Powered Human Cell Models, Aiming to Transform Precision Medicine
Tahoe Therapeutics, a biotechnology company at the intersection of artificial intelligence and cellular biology, has announced the successful close of a $30 million funding round. The capital infusion will be used to create what the company calls the “definitive foundational dataset” for training next-generation Virtual Cell Models — AI-powered computational representations of human cellular behavior.
The planned scale is large: Tahoe intends to generate one billion single-cell datapoints and map one million drug–patient interactions, a resolution and breadth of coverage that the company says has not been attempted before in biomedical research. By capturing the way thousands of drug molecules interact with a broad spectrum of human biological states, Tahoe aims to significantly accelerate drug discovery, reduce the failure rates of clinical trials, and usher in a new era of precision medicine for cancer and other complex diseases.
Funding Led by High-Profile Investors
The latest financing was spearheaded by Amplify Partners, with participation from an impressive roster of investors that includes Databricks Ventures, Wing Venture Capital, General Catalyst, Civilization Ventures, Conviction, Mubadala Capital Ventures, and AIX Ventures.
These backers bring a mix of expertise in biotechnology, artificial intelligence, data infrastructure, and life sciences commercialization — a strategic advantage for a company operating at the cutting edge of two rapidly evolving fields.
Sunil Dhaliwal, General Partner at Amplify Partners, emphasized the transformative potential of Tahoe’s work:
“While structural models have accelerated molecular design, they rarely translate to clinical success — a problem that remains one of the biggest challenges in drug development. Tahoe Therapeutics is uniquely positioned to move the industry past this bottleneck by generating massive drug–patient datasets and training high-dimensional, cell-based AI models. We’re proud to back this exceptional team as they combine biology and computation to accelerate clinical impact.”
Building on the Success of Tahoe-100M
This new initiative builds upon the momentum of Tahoe-100M, released just months earlier. Tahoe-100M is the world’s first gigascale perturbative single-cell dataset — a resource that has already become foundational for teams developing AI-driven virtual cell models.
Since its open-source release, Tahoe-100M has been downloaded nearly 100,000 times, finding use among major AI research labs, pharmaceutical companies, and academic institutions. Researchers have leveraged the dataset to identify promising new therapeutic candidates for major cancer subtypes and to uncover novel drug targets across multiple modalities.
By dramatically expanding the scale from 100 million to one billion datapoints, Tahoe is setting out to create the biological equivalent of a “GPT moment” — a leap forward in capability that enables predictive models of cellular biology to make clinically actionable forecasts.
The Vision: A “GPT Moment” for Human Cell Models
Tahoe’s approach reflects a deep belief that AI models trained on massive, biologically rich datasets can fundamentally change the way drugs are discovered and developed. Traditional drug development often relies on structural models, which help design molecules in silico but do not always predict how those molecules will behave in the complexity of human biology.
In contrast, Tahoe’s Virtual Cell Models aim to simulate how actual human cells — across different patient types and disease states — respond to therapeutic interventions. This could dramatically improve predictions of efficacy and safety before a drug ever enters clinical trials.

Co-founder and CEO Nima Alidoust underscored the magnitude of the project:
“Building Tahoe-100M required us to invent new ways to generate single-cell data. Now, we’re applying that superpower to go 10x further. This next phase is about using these massive datasets to bring about the GPT moment for AI models of human cells, translating insights to clinical readouts, and developing new medicines with much lower clinical failure rates.”
A Unique Partnership Model
In addition to pushing its own therapeutic programs toward clinical development, Tahoe plans to adopt a selective collaboration strategy. The company will choose a single strategic partner — either a pharmaceutical company or an AI technology firm — to share access to the new billion-cell dataset.
This exclusive arrangement is designed to accelerate the translation of Tahoe’s massive datasets into real-world clinical outcomes. The chosen partner will bring complementary expertise in clinical development, advanced modeling, or both, with the joint aim of producing the first medicines ever developed using virtual cell models at this scale.
Scientific and Technological Foundations
Tahoe’s platform represents a convergence of expertise in single-cell genomics, machine learning, and drug discovery. By performing large-scale, single-cell drug screening across diverse patient-derived samples, the company can observe how drugs perturb cellular states in different genetic and environmental contexts.
The technology emerged from scientific breakthroughs at the University of California, San Francisco (UCSF), where Tahoe’s founding team (Nima Alidoust, Johnny Yu, Hani Goodarzi, and Kevan Shokat) built deep expertise in decoding the molecular mechanisms underlying disease.
What makes Tahoe’s approach particularly disruptive is scalability. Historically, high-resolution single-cell drug screening has been too resource-intensive to perform on a large scale. Tahoe has developed proprietary methods to dramatically reduce the cost and increase the throughput of generating these datasets, making gigascale biological mapping possible for the first time.
Impact on Drug Development and Precision Medicine
The implications of Tahoe’s billion-cell dataset extend across the pharmaceutical value chain.
1. More Accurate Preclinical Predictions:
By training AI models on a dataset that spans a million drug–patient interaction profiles, researchers can better predict how a therapy will behave in specific patient subgroups. This could significantly reduce the risk of costly late-stage trial failures.
2. Faster Identification of New Targets:
The data could reveal novel biomarkers or molecular pathways that are essential for disease progression, unlocking entirely new therapeutic approaches.
3. Broad Therapeutic Applications:
While oncology is a primary focus — given the complexity and heterogeneity of cancer — the same data infrastructure could be applied to autoimmune diseases, neurological disorders, and infectious diseases.
4. Accelerated Precision Medicine:
The combination of AI and single-cell biology could allow for the creation of highly targeted therapies matched to the molecular profile of an individual’s disease, with the goal of improving efficacy while limiting side effects.
The billion-cell dataset project is expected to take shape over the next several years, with milestones including incremental data releases, model training demonstrations, and the initiation of collaborative drug development programs.
Tahoe’s decision to work with only one strategic partner for this dataset reflects its intent to focus resources and maximize the impact of the first large-scale application of Virtual Cell Models in drug development.
If successful, Tahoe’s platform could mark a turning point in the life sciences, shifting the industry’s reliance from trial-and-error experimentation toward data-driven, predictive modeling grounded in real human biology.
About Tahoe Therapeutics
Tahoe Therapeutics is a biotechnology company developing Virtual Cell Models trained on massive-scale single-cell datasets to accelerate the discovery and development of precision medicines. The company’s technology platform combines innovations in genomics, machine learning, and high-throughput screening to map how thousands of drug molecules interact with human cells across diverse disease contexts. Founded by leaders in cell biology, computational science, and drug discovery, Tahoe is building the data infrastructure necessary for the next generation of AI-powered biomedical breakthroughs.