Inception, a new Palo Alto-based company started by Stanford computer science professor Stefano Ermon, claims to have developed a novel AI model based on “diffusion” technology. Inception calls it a diffusion-based large language model, or a “DLM” for short.
The generative AI models receiving the most attention now can be broadly divided into two types: large language models (LLMs) and diffusion models. LLMs are used for text generation. Meanwhile, diffusion models, which power AI systems like Midjourney and OpenAI’s Sora, are mainly used to create images, video, and audio.
Inception’s model offers the capabilities of traditional LLMs, including code generation and question-answering, but with significantly faster performance and reduced computing costs, according to the company.
Ermon told TechCrunch that he has been studying how to apply diffusion models to text for a long time in his Stanford lab. His research was based on the idea that traditional LLMs are relatively slow compared to diffusion technology.
With LLMs, “you cannot generate the second word until you’ve generated the first one, and you cannot generate the third one until you generate the first two,” Ermon said.
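To make that sequential constraint concrete, here is a minimal sketch of autoregressive decoding in Python. The predict_next_token function is a hypothetical stand-in for a real model’s forward pass; the point is the loop structure, in which each new token must wait for every token before it.

```python
# Minimal sketch of autoregressive (LLM-style) decoding.
# predict_next_token is a hypothetical placeholder for a model forward pass.

def predict_next_token(context: list[str]) -> str:
    """Placeholder: a real LLM would run a forward pass over `context`."""
    raise NotImplementedError

def autoregressive_generate(prompt: list[str], max_new_tokens: int) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Token N depends on tokens 1..N-1, so this loop is inherently
        # serial: one model call per generated token.
        tokens.append(predict_next_token(tokens))
    return tokens
```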
Ermon was looking for a way to apply a diffusion approach to text because, unlike LLMs, which work sequentially, diffusion models start with a rough estimate of the data they’re generating (e.g., a picture) and then bring it into focus all at once.
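A diffusion-style text generator, by contrast, can be sketched as whole-block refinement. This is a schematic of the general idea described above, not Inception’s published method; denoise_block is a hypothetical stand-in for a learned denoising model.

```python
# Schematic of diffusion-style generation: start from a rough draft of the
# whole output and refine every position in parallel over a few passes.
# denoise_block is a hypothetical stand-in for a learned denoising model.

MASK = "<mask>"

def denoise_block(block: list[str], step: int) -> list[str]:
    """Placeholder: a real model would refine all positions at once."""
    raise NotImplementedError

def diffusion_generate(length: int, refinement_steps: int) -> list[str]:
    block = [MASK] * length  # begin with a fully "noisy" block
    for step in range(refinement_steps):
        # Every position is updated in the same pass, so the total number
        # of model calls is refinement_steps, not one call per token.
        block = denoise_block(block, step)
    return block
```

If the number of refinement passes is much smaller than the output length, this structure is where the claimed speed advantage would come from.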
Ermon hypothesized that generating and modifying large blocks of text in parallel was possible with diffusion models. After years of trying, Ermon and a student of his achieved a major breakthrough, which they detailed in a research paper published last year.
Recognizing the advancement’s potential, Ermon founded Inception last summer, tapping two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company.
While Ermon declined to discuss Inception’s funding, TechCrunch understands that the Mayfield Fund has invested.
Inception has already secured several customers, including unnamed Fortune 100 companies, by addressing their critical need for reduced AI latency and increased speed, Ermon said.
“What we found is that our models can leverage the GPUs much more efficiently,” Ermon said, referring to the computer chips commonly used to run models in production. “I think this is a big deal. This is going to change the way people build language models.”
Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10x faster than traditional LLMs while costing 10x less.
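Inception has not published API details in this article, so the snippet below is purely illustrative: the endpoint URL, payload fields, and model name are all invented, sketching the hosted-completion pattern common among model providers.

```python
# Hypothetical illustration only. Inception's actual API is not documented
# in this article; the URL, fields, and model name below are invented.
import requests

response = requests.post(
    "https://api.example-dlm-host.com/v1/completions",  # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "dlm-small-coder",  # hypothetical model name
        "prompt": "Write a function that reverses a string.",
        "max_tokens": 256,
    },
    timeout=30,
)
print(response.json())
```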
“Our ‘small’ coding model is as good as [OpenAI’s] GPT-4o mini while more than 10 times as fast,” a company spokesperson told TechCrunch. “Our ‘mini’ model outperforms small open-source models like [Meta’s] Llama 3.1 8B and achieves more than 1,000 tokens per second.”
“Tokens” is industry parlance for the small chunks of text, such as words or word fragments, that a model reads and writes. One thousand tokens per second is an impressive speed indeed, assuming Inception’s claims hold up.
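For a rough sense of that speed (using the common rule of thumb of about four characters of English text per token, which is an approximation rather than a property of any particular model), the arithmetic looks like this:

```python
# Back-of-the-envelope throughput estimate. The 4-characters-per-token
# figure is a rough heuristic for English text, not a measured value.
tokens_per_second = 1_000
chars_per_token = 4   # rough heuristic for English
avg_word_length = 5   # average English word length including a space

chars_per_second = tokens_per_second * chars_per_token
words_per_second = chars_per_second / avg_word_length
print(f"~{chars_per_second} characters/s, roughly {words_per_second:.0f} words/s")
# -> ~4000 characters/s, roughly 800 words/s
```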