
Kernels & Engine for running AI on phones

App developers can now deploy private, local, offline AI models in their mobile apps, achieving up to 150 tokens/sec and <50ms time to first token. Cactus is used by 3k+ developers and completes 500k+ weekly inference tasks on phones today. It is open-source! Check out the repo: https://github.com/cactus-compute/cactus
Active Founders
Roman Shemet
Founder
Former quant & economist with a background in product and data engineering. Pilot, triathlete, chess enthusiast. Working on mobile inference @ Cactus.
Henry Ndubuaku
Founder
Sidestepped an AI/ML role at Nvidia to work on Cactus.
Company Launches
Cactus 🌵: Deploy AI models locally on smartphones
See original launch post

TL;DR

Deploy AI models locally, privately, and offline in any app using Cactus. Cactus is a blazing-fast inference engine optimized for smartphones and comes with React Native, Flutter, and Kotlin bindings.

https://youtu.be/xwKrmYkJZD8

Our framework:

Cactus is a cross-platform, open-source framework for running inference on smartphones, wearables, and other low-power devices. It directly supports any LLM or VLM available on HuggingFace.

The recently released Google AI Edge and Apple Foundation frameworks are platform-specific and primarily support each company's own models.

To this end, Cactus:

  • Is available in Flutter and React Native for cross-platform developers, since most apps are built with these today.

  • Supports any GGUF model you can find on HuggingFace: Qwen, Gemma, Llama, DeepSeek, Phi, Mistral, SmolLM, SmolVLM, InternVLM, Jan Nano, etc.

  • Accommodates models from FP32 down to 2-bit quantization, for better efficiency and less device strain.

  • Offers MCP tool-calls that make models performant and truly helpful (setting reminders, searching the gallery, replying to messages, and more).

  • Falls back to big cloud models for complex, constrained, or large-context tasks, ensuring robustness and high availability.
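The memory savings from low-bit quantization are easy to estimate: a model's weight footprint is roughly parameter count times bits per weight, divided by 8 to get bytes. A minimal back-of-the-envelope sketch (plain Python, not part of the Cactus API):

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8 bytes per weight."""
    return n_params * bits_per_weight / 8 / 1e9

# A hypothetical 1.5B-parameter model at different precisions:
n = 1.5e9
for bits in (32, 16, 4, 2):
    print(f"{bits:>2}-bit: {weight_footprint_gb(n, bits):.2f} GB")
# 32-bit: 6.00 GB, 16-bit: 3.00 GB, 4-bit: 0.75 GB, 2-bit: 0.38 GB
```

This is why 2-bit quantization matters on phones: the same model that needs 6 GB of RAM in FP32 fits in well under half a gigabyte, leaving headroom for the app itself.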

So far, our customers have built:

  • Personalised and private RAG and prompt-enhancement pipelines for their app users.
  • Offline fallback for the big remote AI models.
  • Phone tool use agents like gallery & calendar management.
  • AI for medical and other privacy-sensitive industries.
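The fallback pattern in the second bullet can be sketched in a few lines: prefer the private on-device model, and route to a remote endpoint only when the task exceeds local limits or local inference fails. The function names below (`run_local`, `run_cloud`) and the character-based context limit are hypothetical placeholders, not Cactus APIs:

```python
from typing import Callable

def generate_with_fallback(
    prompt: str,
    run_local: Callable[[str], str],
    run_cloud: Callable[[str], str],
    max_local_chars: int = 4000,  # assumed context limit for the local model
) -> str:
    """Prefer the private on-device model; fall back to the cloud when
    the prompt is too large or local inference raises."""
    if len(prompt) <= max_local_chars:
        try:
            return run_local(prompt)
        except Exception:
            pass  # e.g. device out of memory, unsupported operation
    return run_cloud(prompt)

def failing_local(prompt: str) -> str:
    raise RuntimeError("simulated on-device failure")

# A failing local model falls through to the cloud stub:
result = generate_with_fallback(
    "hello",
    run_local=failing_local,
    run_cloud=lambda p: "cloud: " + p,
)
print(result)  # cloud: hello
```

In a real app the routing condition would be token count or task complexity rather than raw character length, but the control flow is the same.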

Some demos:

LLMs and embedding models

Real-time vision inference

Tell us how we can make it great!

Repo: https://github.com/cactus-compute/cactus
Discord: https://discord.gg/nPGWGxXSwr

Hear from the founders

How did your company get started? (i.e., How did the founders meet? How did you come up with the idea? How did you decide to be a founder?)

We met through YC co-founder matching in London four years ago!

Jobs at Cactus
  • San Francisco, CA, US / Remote (San Francisco, CA, US): $120K - $180K, 0.10% - 1.00% equity, any experience (new grads ok)
  • San Francisco, CA, US / Remote: $120K - $180K, 0.10% - 1.00% equity, any experience (new grads ok)
Cactus
Founded: 2025
Batch: Summer 2025
Team Size: 3
Status: Active
Location: San Francisco
Primary Partner: Andrew Miklas