diff --git a/docs/website/blog/2024-05-28-openapi-pipeline.md b/docs/website/blog/2024-05-28-openapi-pipeline.md
new file mode 100644
index 0000000000..cf0847227a
--- /dev/null
+++ b/docs/website/blog/2024-05-28-openapi-pipeline.md
@@ -0,0 +1,82 @@
+---
+slug: openapi-pipeline
+title: "Instant pipelines with dlt-init-openapi"
+image: https://storage.googleapis.com/dlt-blog-images/openapi.png
+authors:
+  name: Adrian Brudaru
+  title: Open source Data Engineer
+  url: https://github.com/adrianbr
+  image_url: https://avatars.githubusercontent.com/u/5762770?v=4
+tags: [full code etl, yes code etl, etl, python elt]
+---
+
+# Welcome to the Future of Data Pipelines, now.
+
+Dear dltHub Community,
+
+We are thrilled to announce the launch of our groundbreaking pipeline generator tool.
+
+We call it `dlt-init-openapi`, or the OpenAPI Source.
+
+Just point our pipeline generator to an OpenAPI spec, select your endpoints, and you're done!
+
+### What's OpenAPI again?
+
+OpenAPI is the world's most widely used API description standard.
+In 2021 an information-security company named Assetnote scanned the web and unearthed 200,000 public
+OpenAPI files. Modern API frameworks like FastAPI generate such specifications automatically.
+
+## How does it work?
+
+**A pipeline is a series of datapoints or decisions about how to extract and load the data**, expressed as code or config.
+Our tool reads the details it can from the spec and tries to detect the rest, then generates the complete pipeline for you.
+
+The information required for making those decisions comes from:
+- The OpenAPI Spec (endpoints, auth)
+- The dlt REST API Source, which attempts to detect pagination
+- The dlt OpenAPI Source, which attempts to detect incremental logic and dependent requests
+
+### How well does it work?
+
+This is something we are also learning about. We did an internal hackathon where we each built a few pipelines with this generator. In our experiments with APIs for which we had credentials, it worked pretty well.
+
+However, we cannot take a big detour from our work to manually test each possible pipeline, so your feedback will be invaluable.
+So please, if you try it, let us know how well it worked - and ideally, add the spec you used to our [repository](https://github.com/dlt-hub/openapi-specs).
+
+### What to do if it doesn't work?
+
+Once a pipeline is created, it is a **fully configurable instance of the REST API Source**.
+So if anything did not go smoothly, you can make the final tweaks.
+You can learn how to adjust the generated pipeline by reading our [REST API Source documentation](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api).
+
+### Are we using LLMs?
+
+No. Maybe later.
+
+The pipelines are generated algorithmically with deterministic outcomes. This way, we have more control over the quality of the decisions.
+
+If we took an LLM-first approach, the errors would compound and put the burden back on the data person.
+
+We are, however, considering LLM assistance for the things that the algorithmic approach can't detect. Another avenue could be generating the OpenAPI spec from website docs.
+So we are eager to get feedback from you on what works and what needs work, enabling us to improve it.
+
+## Try it out now:
+
+**[Tool and Code repo](https://github.com/dlt-hub/dlt-init-openapi)**
+
+**[Colab demo.](https://colab.research.google.com/drive/1MRZvguOTZj1MlkEGzjiso8lQ_wr1MJRI?usp=sharing)**
+
+**Video Walkthrough:**
+
+
+**[Specs repository you can generate from](https://github.com/dlt-hub/openapi-specs)**
+
+## Next steps
+
+We're excited to see how you will use our new pipeline generator and we are eager for your feedback. **[Join our community!](https://dlthub.com/community)**
+
+Got an OpenAPI spec? **[Add it to our specs repository](https://github.com/dlt-hub/openapi-specs)** so others may use it. If the spec doesn't work, please note that in the PR and we will use it for R&D.
+
+*Thank you for being part of our community and for building the future of ETL together!*
+
+*- dltHub Team*