<aside> ☝
List of notes for this specialization + Lecture notes & Repository & Quizzes + Home page on Coursera. Read this note alongside the lecture notes—some points aren't mentioned here as they're already covered in the lecture notes.
</aside>
This course focuses on the first 2 stages of the DE life cycle: data generation in source systems, and data ingestion from those source systems.
DE life cycle.
A common misconception is that 80% of the work is modeling and 20% is ingestion. That's not true, it's the opposite!
A lot of the time is actually spent thinking about the data, and this holds for all AI workloads.
In this course, we not only work with structured data, but also with text, image data and so forth. The volume of unstructured data in the world is much greater than the volume of structured data.
The plan:
3 types of data: structured, semi-structured, and unstructured.
3 types of source systems to ingest from (they don't need to correspond 1-to-1 with the 3 types of data): Databases, Files, and Streaming Systems.
In this figure: CRUD, DBMS, relational databases, non-relational (NoSQL) databases.
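To make CRUD concrete, here is a minimal sketch of the four operations against a relational database, using Python's built-in `sqlite3` module as the DBMS. The `customers` table, its columns, and the `crud_demo` function are hypothetical names chosen for illustration; they are not from the course.

```python
import sqlite3


def crud_demo():
    """Run Create, Read, Update, Delete against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )

    # Create: insert two rows (hypothetical sample data)
    conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))
    conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Linus", "Helsinki"))

    # Read: fetch everything that was just inserted
    after_create = conn.execute(
        "SELECT name, city FROM customers ORDER BY id"
    ).fetchall()

    # Update: change one field of one row
    conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Paris", "Ada"))

    # Delete: remove one row
    conn.execute("DELETE FROM customers WHERE name = ?", ("Linus",))

    final = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
    conn.close()
    return after_create, final
```

Calling `crud_demo()` returns the table contents after the inserts and again after the update and delete, which makes it easy to see what each operation changed.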
Files: any type of file. They're one of the most common source systems you'll work with as a DE.
As a data engineer, you'll extract raw data from various sources like databases, files, and streaming systems. This data can be structured, semi-structured, or unstructured.
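As a rough illustration of extracting from the three kinds of source systems, the sketch below pulls rows from a database, records from a file, and events from a stream. Everything here is a stand-in chosen for the example: an in-memory SQLite table plays the database, an in-memory CSV plays the file, and a plain Python iterator of JSON strings plays the streaming system (a real one would be something like a message queue). All function, table, and field names are hypothetical.

```python
import csv
import io
import json
import sqlite3


def extract_from_database(conn):
    """Structured data: query rows and return them as dicts keyed by column name."""
    cur = conn.execute("SELECT id, amount FROM orders ORDER BY id")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def extract_from_file(file_obj):
    """File source: parse a CSV file object into a list of dicts (all values are strings)."""
    return list(csv.DictReader(file_obj))


def extract_from_stream(messages):
    """Streaming source: decode JSON messages one at a time as they arrive."""
    for raw in messages:
        yield json.loads(raw)


def build_demo_sources():
    """Create small stand-ins for the three source systems (assumed sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)", [(1, 9.5), (2, 20.0)]
    )
    csv_file = io.StringIO("sku,name\nA1,widget\nB2,gadget\n")
    stream = iter(
        [
            '{"event": "click", "user": 7}',
            # Semi-structured: this message carries an extra nested field
            '{"event": "view", "user": 8, "meta": {"page": "/home"}}',
        ]
    )
    return conn, csv_file, stream
```

Note that the second stream message has a field the first one lacks. That kind of irregular shape is typical of semi-structured data, and it is one reason the source system type and the data type don't map 1-to-1.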