Artwork

Konten disediakan oleh TWIML and Sam Charrington. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh TWIML and Sam Charrington atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.
Player FM - Aplikasi Podcast
Offline dengan aplikasi Player FM !

Dataflow Computing for AI Inference with Kunle Olukotun - #751

57:37
 
Bagikan
 

Manage episode 513616212 series 2355587
Konten disediakan oleh TWIML and Sam Charrington. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh TWIML and Sam Charrington atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware.

The complete show notes for this episode can be found at https://twimlai.com/go/751.

  continue reading

777 episode

Artwork
iconBagikan
 
Manage episode 513616212 series 2355587
Konten disediakan oleh TWIML and Sam Charrington. Semua konten podcast termasuk episode, grafik, dan deskripsi podcast diunggah dan disediakan langsung oleh TWIML and Sam Charrington atau mitra platform podcast mereka. Jika Anda yakin seseorang menggunakan karya berhak cipta Anda tanpa izin, Anda dapat mengikuti proses yang diuraikan di sini https://id.player.fm/legal.

In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at Sambanova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware.

The complete show notes for this episode can be found at https://twimlai.com/go/751.

  continue reading

777 episode

Tüm bölümler

×
 
Loading …

Selamat datang di Player FM!

Player FM memindai web untuk mencari podcast berkualitas tinggi untuk Anda nikmati saat ini. Ini adalah aplikasi podcast terbaik dan bekerja untuk Android, iPhone, dan web. Daftar untuk menyinkronkan langganan di seluruh perangkat.

 

Panduan Referensi Cepat

Dengarkan acara ini sambil menjelajah
Putar