Content provided by Sanket Gupta. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Sanket Gupta or their podcast platform partner. If you believe someone is using your copyrighted work without permission, you can follow the process outlined here: https://id.player.fm/legal.
26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)

39:30
 

Imagine you are hanging out at a beach, watching the waves come and go and noticing all the shells on the sand. Then you get an idea: how about collecting these shells and making necklaces to sell? How would you go about doing this? Maybe you’d collect a few shells, make a small necklace, and show it to a friend. This is where we begin our journey of learning about data engineering pipelines.

Using the example of running a shell necklace business, we cover the following data engineering concepts:

1. ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), and why data warehouses are great for analytics

2. Spark for large-scale data processing, and options for hosting and running it

3. Data orchestration using Airflow
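
As a rough illustration of the first concept, the ETL vs. ELT difference can be sketched with the shell example. This is not taken from the episode: the records, table names, and function names below are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse such as Redshift or Snowflake. In ETL the cleaning happens in application code before loading; in ELT the raw rows are loaded first and the cleaning is done in SQL inside the database.

```python
import sqlite3

# Hypothetical raw records: one row per shell collected on the beach.
RAW_SHELLS = [
    ("s1", "white", "12.5"),
    ("s2", "pink", "8.0"),
    ("s3", "white", "not measured"),  # a dirty value the pipeline must handle
    ("s4", "pink", "10.0"),
]


def parse_size(text):
    """Return the size as a float, or None when the raw text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute("CREATE TABLE shells_clean (id TEXT, color TEXT, size_mm REAL)")
    cleaned = [(i, c, parse_size(s)) for i, c, s in RAW_SHELLS]
    cleaned = [row for row in cleaned if row[2] is not None]
    conn.executemany("INSERT INTO shells_clean VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL in the database."""
    conn.execute("CREATE TABLE shells_raw (id TEXT, color TEXT, size_text TEXT)")
    conn.executemany("INSERT INTO shells_raw VALUES (?, ?, ?)", RAW_SHELLS)
    conn.execute(
        """
        CREATE TABLE shells_clean AS
        SELECT id, color, CAST(size_text AS REAL) AS size_mm
        FROM shells_raw
        WHERE size_text GLOB '[0-9]*'  -- keep rows whose size starts with a digit
        """
    )


def average_size_by_color(conn):
    """Analytics query that runs the same way after either pipeline."""
    rows = conn.execute(
        "SELECT color, AVG(size_mm) FROM shells_clean GROUP BY color ORDER BY color"
    )
    return dict(rows.fetchall())


def run(pipeline):
    """Run one pipeline against a fresh in-memory database and query the result."""
    conn = sqlite3.connect(":memory:")
    pipeline(conn)
    result = average_size_by_color(conn)
    conn.close()
    return result


if __name__ == "__main__":
    print("ETL:", run(etl))
    print("ELT:", run(elt))
```

Both pipelines end with the same clean table. The practical difference is that the ELT version keeps the raw rows in the database, so the transformation can be changed and re-run later without extracting the data again.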

My blog post on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb

Great book to learn about Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20

Tools covered in the episode:

dbt: https://www.getdbt.com/

Databricks: https://databricks.com/

EMR: https://aws.amazon.com/emr/

AWS Redshift: https://aws.amazon.com/redshift/

Snowflake: https://www.snowflake.com/

Delta Lake: https://databricks.com/product/delta-lake-on-databricks

Send in a voice message: https://podcasters.spotify.com/pod/show/the-data-life-podcast/message

Support this podcast: https://podcasters.spotify.com/pod/show/the-data-life-podcast/support

27 episodes
