From 36821a0ae43a7f00acd39c8b2d43cdd702e21e2b Mon Sep 17 00:00:00 2001
From: Marcin Rudolf
Date: Sun, 17 Sep 2023 16:11:51 +0200
Subject: [PATCH] post merge fixes

---
 .../website/docs/user-guides/data-beginner.md | 130 ---------------
 .../docs/user-guides/data-scientist.md        | 129 ---------------
 .../docs/user-guides/engineering-manager.md   | 155 ------------------
 3 files changed, 414 deletions(-)
 delete mode 100644 docs/website/docs/user-guides/data-beginner.md
 delete mode 100644 docs/website/docs/user-guides/data-scientist.md
 delete mode 100644 docs/website/docs/user-guides/engineering-manager.md

diff --git a/docs/website/docs/user-guides/data-beginner.md b/docs/website/docs/user-guides/data-beginner.md
deleted file mode 100644
index e6dd8b8d22..0000000000
--- a/docs/website/docs/user-guides/data-beginner.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-title: Data Beginner
-description: A guide to using dlt for aspiring data professionals
-keywords: [beginner, analytics, machine learning]
----
-
-# Data Beginner
-
-If you are an aspiring data professional, here are some ways you can showcase your understanding and value to data teams with the help of `dlt`.
-
-## Analytics: Empowering decision-makers
-
-Operational users at a company need general business analytics capabilities to make decisions, e.g. dashboards, a data warehouse, self-service, etc.
-
-### Show you can deliver results, not numbers
-
-The goal of such a project is to get you into the top 5% of candidates, so you get invited to an interview, and to help you understand pragmatically what is expected of you.
-
-Depending on whether you want to be more in engineering or analytics, you can focus on different parts of this project. If you showcase that you are able to deliver end to end, there remains little reason for a potential employer not to hire you.
-
-Someone hiring folks on this business analytics path will be looking for the following skills:
-
-- Can you load data to a db?
-  - Can you do incremental loading?
-  - Are your pipelines maintainable?
-  - Are your pipelines reusable? Do they take meaningful arguments?
-- Can you transform the data to a standard architecture?
-  - Do you know dimensional modelling architecture?
-  - Does your model make the data accessible to a business user via a user-facing tool?
-  - Can you translate a business requirement into a technical requirement?
-- Can you identify a use case and prepare reporting?
-  - Are you displaying a sensible use case?
-  - Are you taking a pragmatic approach as to what should be displayed and why?
-  - Did you hard-code charts in a notebook that the end user cannot use, or did you use a user-facing dashboard tool?
-  - Is the user able to answer follow-up questions by changing the dimensions in a tool, or did you hard-code queries?
-
-Project idea:
-
-1. Choose an API that produces data. If this data is in some way business-relevant, even better. Many business apps offer free developer accounts that allow you to develop business apps with them.
-1. Choose a use case for this data. Make sure this use case makes some business sense and is not completely theoretical. Business understanding and pragmatism are key for such roles, so do not waste your chance to show them. Keep the use case simple; otherwise it will not be pragmatic right off the bat, and you handicap your chances of a good outcome. A few examples are ranking leads in a sales CRM, clustering users, and something around customer lifetime value predictions.
-1. Build a dlt pipeline that loads data from the API for your use case (a minimal sketch follows this list). Keep the case simple and your code clean. Use explicit variable and method names. Tell a story with your code. For the loading mode, use incremental loading and don’t hardcode parameters that are subject to change.
-1. Build a [dbt package](../dlt-ecosystem/transformations/dbt.md) for this pipeline.
-1. Build a visualization. Focus on usability more than code. Remember, your goal is to empower a business user to self-serve, so hard-coded dashboards are usually seen as liabilities that need to be maintained. On the other hand, dashboard tools can be adjusted by business users too. For example, the free “Looker Studio” from Google is relatable to business users, while notebooks might make them feel insecure. Your evaluator will likely not take time to set up and run your things, so make sure your outcomes are well documented with images. Make sure they are self-explanatory, and explain how you intend the business user to use this visualization to fulfil the use case.
-1. Make it presentable somewhere public, such as GitHub, and add docs. Show it to someone for feedback. You will find like-minded people in [our Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g) that will happily give their opinion.
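-
-To make the pipeline step concrete, here is a minimal sketch of what such a pipeline could look like. The endpoint, table name, and `updated_at` cursor field are illustrative placeholders; the pattern to note is a `dlt.resource` combined with `dlt.sources.incremental`, so that only new or changed records are fetched on each run:
-
-```python
-import dlt
-import requests
-
-@dlt.resource(table_name="issues", write_disposition="append")
-def issues(updated_at=dlt.sources.incremental("updated_at", initial_value="2023-01-01T00:00:00Z")):
-    # hypothetical endpoint; replace with the API you chose for your use case
-    response = requests.get("https://api.example.com/issues", params={"since": updated_at.last_value})
-    response.raise_for_status()
-    yield response.json()
-
-pipeline = dlt.pipeline(pipeline_name="showcase", destination="duckdb", dataset_name="raw")
-load_info = pipeline.run(issues())
-print(load_info)
-```
-
-Run the script repeatedly and only records newer than the last seen `updated_at` value are loaded, which is exactly the incremental behaviour evaluators look for.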
-
-## Machine Learning: Automating decisions
-
-Solving specific business problems with data products that generate further insights and sometimes automate decisions.
-
-### Show you can solve business problems
-
-Here the challenges might seem different from the business analytics path, but they are often quite similar. Many courses focus on statistics and data science, but very few focus on pragmatic approaches to solving business problems in organizations. Most of the time, the largest obstacles to solving a problem with ML are not purely algorithmic but rather about the semantics of the business, the data, and the people who need to use the data products.
-
-Employers look for a project that showcases both technical ability and business pragmatism in a use case. In reality, data does not typically come in files but via APIs serving fresh data, which you usually have to grab and move somewhere before you can use it, so show your ability to deliver end to end.
-
-Project idea:
-
-1. Choose an API that produces data. If this data is in some way business-relevant, even better. Many business apps offer free developer accounts that allow you to develop business apps with them.
-1. Choose a use case for this data. Make sure this use case makes some business sense and is not completely theoretical. Business understanding and pragmatism are key for such roles, so do not waste your chance to show them. Keep the use case simple; otherwise it will not be pragmatic right off the bat, and you handicap your chances of a good outcome. A few examples are ranking leads in a sales CRM, clustering users, and something around customer lifetime value predictions.
-1. Build a dlt pipeline that loads data from the API for your use case. Keep the case simple and your code clean. Use explicit variable and method names. Tell a story with your code. For the loading mode, use incremental loading and don’t hardcode parameters that are subject to change.
-1. Build a data model with SQL (see the sketch after this list). If you are ambitious, you could try running the SQL with a [dbt package](../dlt-ecosystem/transformations).
-1. Showcase your chosen use case that uses ML or statistics to achieve your goal. Don’t forget to mention how you plan to do this “in production”. Choose a case that is simple so you don’t end up overcomplicating your solution. Focus on outcomes and next steps. Describe what the company needs to do to use your results, demonstrating that you understand the costs of your propositions.
-1. Make it presentable somewhere public, such as GitHub, and add docs. Show it to someone for feedback. You will find like-minded people in [our Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1slox199h-HAE7EQoXmstkP_bTqal65g) that will happily give their opinion.
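-
-To illustrate the SQL modelling step, here is a minimal sketch that derives a table on top of the loaded data using the pipeline’s SQL client. The table and column names are made up; `sql_client()` and `execute_sql` are the assumed helpers for issuing SQL against the destination, and a dbt package is the more maintainable option once the model grows:
-
-```python
-import dlt
-
-pipeline = dlt.pipeline(pipeline_name="showcase", destination="duckdb", dataset_name="raw")
-
-# hypothetical model: aggregate the loaded `issues` table per author
-with pipeline.sql_client() as client:
-    client.execute_sql("""
-        CREATE OR REPLACE TABLE issues_per_author AS
-        SELECT author, COUNT(*) AS issue_count
-        FROM issues
-        GROUP BY author
-    """)
-```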
-
-## Further reading
-
-Good docs pages to check out:
-
-- [Getting started.](../getting-started)
-- [Create a pipeline.](../walkthroughs/create-a-pipeline)
-- [Run a pipeline.](../walkthroughs/run-a-pipeline)
-- [Deploy a pipeline with GitHub Actions.](../walkthroughs/deploy-a-pipeline/deploy-with-github-actions)
-- [Understand the loaded data.](../general-usage/destination-tables.md)
-- [Explore the loaded data in Streamlit.](../dlt-ecosystem/visualizations/exploring-the-data.md)
-- [Transform the data with SQL or python.](../dlt-ecosystem/transformations)
-- [Contribute a pipeline.](https://github.com/dlt-hub/verified-sources/blob/master/CONTRIBUTING.md)
-
-Here are some example projects:
-
-- [Is DuckDB a database for ducks? Using DuckDB to explore the DuckDB open source community.](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing)
-- [Using DuckDB to explore the Rasa open source community.](https://colab.research.google.com/drive/1c9HrNwRi8H36ScSn47m3rDqwj5O0obMk?usp=sharing)
-- [MRR and churn calculations on Stripe data.](../dlt-ecosystem/verified-sources/stripe.md)
-
-Please [open a PR](https://github.com/dlt-hub/verified-sources) to add projects that use `dlt` here!
diff --git a/docs/website/docs/user-guides/data-scientist.md b/docs/website/docs/user-guides/data-scientist.md
deleted file mode 100644
index b8415937e4..0000000000
--- a/docs/website/docs/user-guides/data-scientist.md
+++ /dev/null
@@ -1,129 +0,0 @@
----
-title: Data Scientist
-description: A guide to using dlt for Data Scientists
-keywords: [data scientist, data science, machine learning, machine learning engineer]
----
-
-# Data Scientist
-
-Data Load Tool (`dlt`) can be highly useful for Data Scientists in several ways. Here are three potential use cases:
-
-## Use case #1: Efficient Data Ingestion and Optimized Workflow
-
-Data Scientists often deal with large volumes of data from various sources. `dlt` can help streamline the process of data ingestion by providing a robust and scalable tool for loading data into their analytics environment. It can handle diverse data formats, such as CSV, JSON, or database dumps, and efficiently load them into a data lake or a data warehouse.
-
-![dlt-main](images/dlt-main.png)
-
-By using `dlt`, Data Scientists can save time and effort on data extraction and transformation tasks, allowing them to focus more on data analysis and model training. The tool is designed as a library that can be added to their code, making it easy to integrate into existing workflows.
-
-`dlt` can facilitate a seamless transition from data exploration to production deployment. Data Scientists can leverage `dlt` capabilities to load data in the format that matches the production environment while exploring and analyzing the data. This streamlines the process of moving from the exploration phase to the actual implementation of models, saving time and effort. By using `dlt` throughout the workflow, Data Scientists can ensure that the data is properly prepared and aligned with the production environment, leading to smoother integration and deployment of their models.
-
-- [Use existing Verified Sources](../walkthroughs/add-a-verified-source) and pipeline examples or [create your own](../walkthroughs/create-a-pipeline) quickly.
-
-- [Deploy the pipeline](../walkthroughs/deploy-a-pipeline), so that the data is automatically loaded on a schedule.
-
-- Transform the [loaded data](../dlt-ecosystem/transformations) with dbt or in Pandas DataFrames.
-
-- Learn how to [run](../running-in-production/running), [monitor](../running-in-production/monitoring), and [alert](../running-in-production/alerting) when you put your pipeline in production.
-
-- Use `dlt` when doing exploration in a Jupyter Notebook and move more easily to production. Explore our [Colab Demo for Chess.com API](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing) to see how easy it is to create and use `dlt` in your projects:
-
-  ![colab-demo](images/colab-demo.png)
-
-### `dlt` is optimized for local use on laptops
-
-- It offers a seamless [integration with Streamlit](../dlt-ecosystem/visualizations/exploring-the-data.md). This integration enables a smooth and interactive data analysis experience, where Data Scientists can leverage the power of `dlt` alongside Streamlit's intuitive interface and visualization capabilities.
-- In addition to Streamlit, `dlt` natively supports [DuckDB](https://dlthub.com/docs/blog/is-duckdb-a-database-for-ducks), an in-process SQL OLAP database management system. This native support ensures efficient data processing and querying within `dlt`, leveraging the capabilities of DuckDB. By integrating DuckDB, Data Scientists can benefit from fast and scalable data operations, enhancing the overall performance of their analytical workflows.
-- Moreover, `dlt` provides resources that can directly return data in the form of [Pandas DataFrames from an SQL client](../dlt-ecosystem/visualizations/exploring-the-data). This feature simplifies data retrieval and allows Data Scientists to seamlessly work with data in the familiar Pandas DataFrame format. With this capability, Data Scientists can leverage the rich ecosystem of Python libraries and tools that support Pandas (see the sketch after this list).
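-
-For example, here is a minimal sketch of pulling loaded data into a DataFrame. The pipeline, dataset, and table names are illustrative; `execute_query` with the cursor's `df()` helper is the interface assumed from the exploring-the-data docs:
-
-```python
-import dlt
-
-pipeline = dlt.pipeline(pipeline_name="chess_pipeline", destination="duckdb", dataset_name="chess_data")
-
-with pipeline.sql_client() as client:
-    # illustrative query against a previously loaded table
-    with client.execute_query("SELECT * FROM players_games LIMIT 1000") as cursor:
-        df = cursor.df()
-
-print(df.describe())
-```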
-
-With `dlt`, the transition from local storage to remote is quick and easy. For example, read the documentation [Share a dataset: DuckDB -> BigQuery](../walkthroughs/share-a-dataset).
-
-## Use case #2: Structured Data and Enhanced Data Understanding
-
-### Structured data
-
-Data Scientists often prefer structured data lakes over unstructured ones to facilitate efficient data analysis and modeling. `dlt` can help in this regard by offering seamless integration with structured data storage systems, allowing Data Scientists to easily load and organize their data in a structured format. This enables them to access and analyze the data more effectively, improving their understanding of the underlying data structure.
-
-![structured-data](images/structured-data.png)
-
-A `dlt` pipeline is made of a source, which contains resources, and a connection to the destination, which we call the pipeline. So in the simplest use case, you can pass your unstructured data to the `pipeline` and it will automatically be structured at the destination, as sketched below. See how to do that in our [pipeline documentation](../general-usage/pipeline).
-
-Besides sturdiness, this also adds convenience by automatically converting JSON types to database types, such as timestamps, etc.
-
-Read more about schema evolution in our blog: **[The structured data lake: How schema evolution enables the next generation of data platforms](https://dlthub.com/docs/blog/next-generation-data-platform).**
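-
-A minimal sketch of that idea (the records and table name are made up for illustration): passing nested Python dicts to `pipeline.run()` lets `dlt` infer a schema, create typed columns, and unpack nested lists into child tables:
-
-```python
-import dlt
-
-# made-up, semi-structured records as they might come from an API
-users = [
-    {"id": 1, "name": "Ada", "signed_up": "2023-05-01T10:00:00Z", "orders": [{"order_id": "a-1", "amount": 12.5}]},
-    {"id": 2, "name": "Grace", "signed_up": "2023-06-11T09:30:00Z", "orders": []},
-]
-
-pipeline = dlt.pipeline(pipeline_name="structuring_demo", destination="duckdb", dataset_name="crm")
-load_info = pipeline.run(users, table_name="users")
-print(load_info)
-# dlt creates a typed `users` table and a `users__orders` child table for the nested list
-```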
-
-### Data exploration
-
-Data Scientists require a comprehensive understanding of their data to derive meaningful insights and build accurate models. `dlt` can contribute to this by providing intuitive and user-friendly features for data exploration. It allows Data Scientists to quickly gain insights into their data by visualizing data summaries, statistics, and distributions. With `dlt`, data understanding becomes clearer and more accessible, enabling Data Scientists to make informed decisions throughout the analysis process.
-
-In addition, the schema imposed on the data acts as a technical description of the data, accelerating the discovery process.
-
-See [Destination tables](../general-usage/destination-tables.md) and [Exploring the data](../dlt-ecosystem/visualizations/exploring-the-data) in our documentation.
-
-## Use case #3: Data Preprocessing and Transformation
-
-Data preparation is a crucial step in the data science workflow. `dlt` can facilitate data preprocessing and transformation tasks by providing a range of built-in features. It simplifies various tasks like data cleaning, anonymizing, handling missing values, data type conversion, feature scaling, and feature engineering. Data Scientists can leverage these capabilities to clean and transform their datasets efficiently, making them suitable for subsequent analysis and modeling.
-
-Python-first users can heavily customize how `dlt` sources produce data, as `dlt` supports selecting, [filtering](../general-usage/resource#filter-transform-and-pivot-data), [renaming](../general-usage/customising-pipelines/renaming_columns), [anonymizing](../general-usage/customising-pipelines/pseudonymizing_columns), and just about any custom operation.
-
-Compliance is another case where preprocessing is the way to solve the issue: besides being Python-friendly, the ability to apply transformation logic before loading data allows us to separate, filter, or transform sensitive data.
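-
-As a minimal sketch of such preprocessing (the resource, column names, and hashing choice are illustrative; `add_map` and `add_filter` are the assumed hooks for transforming items before they are loaded):
-
-```python
-import dlt
-import hashlib
-
-@dlt.resource(table_name="users")
-def users():
-    # made-up records standing in for an API response
-    yield from [
-        {"id": 1, "email": "ada@example.com", "country": "DE"},
-        {"id": 2, "email": "grace@example.com", "country": "US"},
-    ]
-
-def pseudonymize(item):
-    # replace the sensitive column with a stable hash before loading
-    item["email"] = hashlib.sha256(item["email"].encode()).hexdigest()
-    return item
-
-pipeline = dlt.pipeline(pipeline_name="compliance_demo", destination="duckdb", dataset_name="staging")
-pipeline.run(users().add_map(pseudonymize).add_filter(lambda item: item["country"] == "DE"))
-```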
diff --git a/docs/website/docs/user-guides/engineering-manager.md b/docs/website/docs/user-guides/engineering-manager.md
deleted file mode 100644
index 70e23eb2c1..0000000000
--- a/docs/website/docs/user-guides/engineering-manager.md
+++ /dev/null
@@ -1,155 +0,0 @@
----
-title: Staff Data Engineer
-description: A guide to using dlt for Staff Data Engineers
-keywords: [staff data engineer, senior data engineer, ETL engineer, head of data platform, data platform engineer]
----
-
-# Staff Data Engineer
-
-Staff data engineers create data pipelines, data warehouses, and data lakes in order to democratize access to data in their organizations.
-
-With `dlt` we offer a library and building blocks that data tool builders can use to create modern data infrastructure for their companies. Staff Data Engineer, Senior Data Engineer, ETL Engineer, Head of Data Platform - data tool builders go by a variety of titles in different companies.
-
-## What does this role do in an organisation?
-
-The responsibilities of this senior role vary, but often revolve around building and maintaining a robust data infrastructure:
-
-- Tech: They design and implement scalable data architectures, data pipelines, and data processing frameworks.
-- Governance: They ensure data integrity, reliability, and security across the data stack. They manage data governance, including data quality, data privacy, and regulatory compliance.
-- Strategy: Additionally, they evaluate and adopt new technologies, tools, and methodologies to improve the efficiency, performance, and scalability of data processes.
-- Team skills and staffing: Their responsibilities also involve providing technical leadership, mentoring team members, driving innovation, and aligning the data strategy with the organization's overall goals.
-- Return on investment focus: Ultimately, their focus is on empowering the organization to derive actionable insights, make data-driven decisions, and unlock the full potential of their data assets.
-
-## Choosing a Data Stack
-
-These roles are critical in choosing the right data stack for their organization. When selecting a data stack, they need to consider several factors. These include:
-
-- The organization's data requirements.
-- Scalability, performance, data governance and security needs.
-- Integration capabilities with existing systems and tools.
-- Team skill sets, budget, and long-term strategic goals.
-
-They evaluate the pros and cons of various technologies, frameworks, and platforms, considering factors such as ease of use, community support, vendor reliability, and compatibility with their specific use cases. The goal is to choose a data stack that aligns with the organization's needs, enables efficient data processing and analysis, promotes data governance and security, and empowers teams to deliver valuable insights and solutions.
-
-## What does a senior architect or engineer consider when choosing a tech stack?
-
-- Company Goals and Strategy.
-- Cost and Return on Investment (ROI).
-- Staffing and Skills.
-- Employee Happiness and Productivity.
-- Maintainability and Long-term Support.
-- Integration with Existing Systems.
-- Scalability and Performance.
-- Data Security and Compliance.
-- Vendor Reliability and Ecosystem.
-
-## What makes dlt a must-have for your data stack or platform?
-
-For starters, `dlt` is the first data pipeline solution that is built for your data team's ROI. Our vision is to add value, not gatekeep it.
-
-By being a library built to enable free usage, we are uniquely positioned to run in existing stacks without replacing them. This enables us to disrupt and revolutionise the industry in ways that only open source communities can.
-
-## dlt massively reduces pipeline maintenance, increases efficiency and ROI
-
-- Reduce engineering effort by as much as 5x via a paradigm shift: structure data automatically instead of doing it manually. Read about the [structured data lake](https://dlthub.com/docs/blog/next-generation-data-platform), and [how to do schema evolution](../reference/explainers/schema-evolution.md).
-- Better Collaboration and Communication: Structured data promotes better collaboration and communication among team members. Since everyone operates on a shared understanding of the data structure, it becomes easier to discuss and align on data-related topics. Queries, reports, and analysis can be easily shared and understood by others, enhancing collaboration and teamwork.
-- Faster time to build pipelines: After extracting data, if you pass it to `dlt`, you are done. If not, it needs to be structured. Because structuring is hard, it has to be curated: curation involves at least the producer and the consumer, but often also an analyst and the engineer, and is a long, friction-filled process.
-- Usage focus improves ROI: To use data, we need to understand what it is. Structured data already contains a technical description, accelerating usage.
-- Lower cost: Reading structured data is cheaper and faster because we can specify which parts of a document we want to read.
-- Removing friction: By alerting the producer and stakeholders to schema changes, and by automating structuring, we can keep the data engineer out of curation and remove the bottleneck. [Notify maintenance events](../running-in-production/running#inspect-save-and-alert-on-schema-changes) (see the sketch after this list).
-- Improving quality: No more garbage in, garbage out. Because `dlt` structures data and alerts on schema changes, we can have better governance.
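-
-A minimal sketch of that alerting idea, assuming the `load_packages` / `schema_update` attributes described in the running-in-production guide (exact names may differ between `dlt` versions); the print call stands in for whatever notification channel you use:
-
-```python
-import dlt
-
-pipeline = dlt.pipeline(pipeline_name="schema_alerts_demo", destination="duckdb", dataset_name="raw")
-# illustrative data; in production this would be your source
-load_info = pipeline.run([{"id": 1, "status": "open", "priority": 3}], table_name="tickets")
-
-# collect human-readable messages about new tables and columns, e.g. to post to Slack
-for package in load_info.load_packages:
-    for table_name, table in package.schema_update.items():
-        for column_name, column in table["columns"].items():
-            print(f"table {table_name}: column {column_name} ({column['data_type']}) was added")
-```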
-
-## dlt makes your team happy
-
-- Spend more time using data, less time loading it. When you build a `dlt` pipeline, you only build the extraction part, automating the tedious structuring and loading.
-- Data meshing to reduce friction: By structuring data before loading, the engineer is no longer involved in curation. This makes both the engineer and the others happy.
-- Better governance with end to end pipelining via dbt: [run dbt packages on the fly](../dlt-ecosystem/transformations/dbt.md), [lineage out of the box](../general-usage/destination-tables.md#data-lineage).
-- Zero learning curve: Declarative loading, simple functional programming. By using `dlt`'s declarative, standard approach to loading data, there is no complicated code to maintain, and the analysts can thus maintain the code.
-- Autonomy and Self-service: Customising pipelines is easy, whether you want to plug in an anonymiser, rename things, or curate what you load. [Anonymisers, renamers](../general-usage/customising-pipelines/pseudonymizing_columns.md).
-- Easy discovery and governance: By tracking metadata like data lineage, describing data with schemas, and alerting on changes, we stay on top of the data.
-- Simplified access: Querying structured data can be done by anyone with their tools of choice.
-
-## dlt is a library that you can run in unprecedented places
-
-Before `dlt` existed, all loading tools were built either
-
-- as SaaS (Fivetran, Stitch, etc.);
-- as installed apps with their own orchestrator: Pentaho, Talend, Airbyte;
-- or as abandonware frameworks meant to be unrunnable without help (Singer was released without orchestration and not intended for public use).
-
-`dlt` is the first Python library in this space, which means you can just run it wherever the rest of your Python stack runs, without adding complexity.
-
-- You can run `dlt` in [Airflow](../dlt-ecosystem/deployments/orchestrators/airflow-deployment.md) - this is the first ingestion tool that does this.
-- You can run `dlt` in small spaces like [Cloud Functions](../dlt-ecosystem/deployments/running-in-cloud-functions.md) or [GitHub Actions](../dlt-ecosystem/deployments/orchestrators/github-actions.md) - so you could easily set up webhooks, etc. (see the sketch at the end of this page).
-- You can run `dlt` in your Jupyter Notebook and load data to [DuckDB](../dlt-ecosystem/destinations/duckdb.md).
-- You can run `dlt` on large machines; it will attempt to make the best use of the resources available to it.
-- You can [run `dlt` locally](../walkthroughs/run-a-pipeline.md) just like you run any Python script.
-
-The implications:
-
-- Empowering Data Teams and Collaboration: You can discover or prototype in notebooks, run in cloud functions, and deploy the same scalable, robust code to production. No more friction between roles. [Colab demo.](https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing#scrollTo=A3NRS0y38alk)
-- Rapid Data Exploration and Prototyping: By running in Colab with DuckDB, you can explore semi-structured data much faster by structuring it with `dlt` and analysing it in SQL. [Schema inference](../general-usage/schema#data-normalizer), [exploring the loaded data](../dlt-ecosystem/visualizations/exploring-the-data.md).
-- No vendor limits: `dlt` is forever free, with no vendor strings attached. We do not create value by creating a pain for you and then solving it; we create value by supporting you beyond that.
-- `dlt` removes complexity: You can use `dlt` in your existing stack with no overheads, no race conditions, and full observability. Other tools add complexity.
-- `dlt` can be leveraged by AI: Because it is a low-complexity library, large language models can produce `dlt` code for your pipelines.
-- Ease of adoption: If you are running Python, you can adopt `dlt`. `dlt` is orchestrator- and destination-agnostic.
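-
-As a minimal sketch of the webhook idea mentioned above: a small HTTP-triggered function that loads each incoming payload with `dlt`. The handler signature assumes a GCP-style cloud function receiving a Flask request; the pipeline, destination, and table names are illustrative:
-
-```python
-import dlt
-
-def load_webhook_event(request):
-    """HTTP-triggered entry point (signature assumed for a GCP-style cloud function)."""
-    event = request.get_json()  # the incoming webhook payload
-    pipeline = dlt.pipeline(pipeline_name="webhooks", destination="bigquery", dataset_name="events")
-    load_info = pipeline.run([event], table_name="webhook_events")
-    return str(load_info)
-```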