The data engineering ecosystem includes infrastructure to
- extract data
- architect and manage pipelines and data repositories
- optimize workflows and data flow
- develop the applications needed
Data can be categorized into
- structured (follows a defined format, organized in rows and columns)
- semi-structured (consistent but not rigid)
- unstructured (complex and qualitative)
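The three categories above can be illustrated with a minimal sketch. The sample records and variable names are hypothetical, chosen only for illustration; CSV stands in for structured data, JSON for semi-structured data, and free text for unstructured data.

```python
import csv
import io
import json

# Structured: every record has the same columns (hypothetical sample data).
structured_text = "id,name,age\n1,Ana,34\n2,Ben,29\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: consistent keys, but records may differ in shape.
semi_structured_text = (
    '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
)
records = json.loads(semi_structured_text)

# Unstructured: free text with no predefined fields.
unstructured_text = "Ana called to say the delivery arrived late."

print(rows[0]["name"])                     # field access by column name
print([sorted(r.keys()) for r in records])  # keys vary between records
print(unstructured_text.split()[:3])        # only generic text operations apply
```

Note that the second JSON record has no `tags` key: the format is consistent, but the schema is not rigid.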
Types of data repositories
- transactional (OLTP): store high volumes of data, mostly relational
- analytical (OLAP): relational or non-relational, e.g. data warehouses and data lakes
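The difference between the two repository types shows up mainly in the workload. The sketch below is an illustration only: it uses Python's built-in sqlite3 as a stand-in for both kinds of store, and the `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style workload: many small writes, each touching a single row.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("north", 10.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("south", 25.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("north", 5.0))

# OLAP-style workload: a read-only query that scans and aggregates many rows.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

In practice the two workloads usually run on separate systems, each tuned for its own access pattern.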
Data flow: Collated -> Processed -> Cleansed -> Integrated -> Users
A data pipeline is a set of tools and processes that move data from source to destination (ETL or ELT)
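A minimal ETL sketch, assuming a CSV source and a SQLite destination. The function names, the `orders` table, and the sample rows are all hypothetical; a real pipeline would read from files, APIs, or databases and would typically run under an orchestration tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data with one incomplete row.
SOURCE_CSV = "order_id,amount\n1, 19.90 \n2,\n3,5.00\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write cleaned rows to the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
total_rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(total_rows)
```

An ELT variant would load the raw rows into the destination first and run the transformation there, for example as SQL inside the warehouse.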
BI and reporting tools present integrated data in a visual format, typically through drag-and-drop interfaces