<!DOCTYPE html> <html lang="en-US"> <head> <meta charset="UTF-8"> <meta http-equiv="X-UA-Compatible" content="IE=Edge"> <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> <link rel="stylesheet" href="/assets/css/just-the-docs-default.css"> <script type="text/javascript" src="/assets/js/vendor/lunr.min.js"></script> <script type="text/javascript" src="/assets/js/just-the-docs.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1"> <!-- Begin Jekyll SEO tag v2.7.1 --> <title>About</title> <meta name="generator" content="Jekyll v4.2.1" /> <meta property="og:title" content="About" /> <meta name="author" content="Ozgun Ataman" /> <meta property="og:locale" content="en_US" /> <meta name="description" content="About Napkin, its features, philosophy, goals, and reasons why it was created." /> <meta property="og:description" content="About Napkin, its features, philosophy, goals, and reasons why it was created." /> <link rel="canonical" href="https://docs.napkin.run/" /> <meta property="og:url" content="https://docs.napkin.run/" /> <meta name="twitter:card" content="summary" /> <meta property="twitter:title" content="About" /> <script type="application/ld+json"> {"@context":"https://schema.org","@type":"WebSite","author":{"@type":"Person","name":"Ozgun Ataman"},"description":"About Napkin, its features, philosophy, goals, and reasons why it was created.","headline":"About","url":"https://docs.napkin.run/"}</script> <!-- End Jekyll SEO tag --> <script src="https://code.jquery.com/jquery-3.6.0.slim.min.js" integrity="sha256-u7e5khyithlIdTpu22PHhENmPcRdFiHRjhAuHcs05RI=" crossorigin="anonymous"></script> <script type="text/javascript" src="/assets/js/resize.js"></script> </head> <body> <svg xmlns="http://www.w3.org/2000/svg" style="display: none;"> <symbol id="svg-link" viewbox="0 0 24 24"> <title>Link</title> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-link"> <path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path> </svg> </symbol> <symbol id="svg-search" viewbox="0 0 24 24"> <title>Search</title> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-search"> <circle cx="11" cy="11" r="8"></circle><line x1="21" y1="21" x2="16.65" y2="16.65"></line> </svg> </symbol> <symbol id="svg-menu" viewbox="0 0 24 24"> <title>Menu</title> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-menu"> <line x1="3" y1="12" x2="21" y2="12"></line><line x1="3" y1="6" x2="21" y2="6"></line><line x1="3" y1="18" x2="21" y2="18"></line> </svg> </symbol> <symbol id="svg-arrow-right" viewbox="0 0 24 24"> <title>Expand</title> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-chevron-right"> <polyline points="9 18 15 12 9 6"></polyline> </svg> </symbol> <symbol id="svg-doc" viewbox="0 0 24 24"> <title>Document</title> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewbox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather feather-file"> <path d="M13 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V9z"></path><polyline points="13 2 13 9 20 9"></polyline> </svg> </symbol> </svg> <div class="side-bar"> <div class="site-header"> <a href="#" id="menu-button" class="site-button"> <svg viewbox="0 0 24 24" class="icon"><use 
xlink:href="#svg-menu"></use></svg> </a> </div> <nav role="navigation" aria-label="Main" id="site-nav" class="site-nav"> <ul class="nav-list">
<li class="nav-list-item active"><a href="https://docs.napkin.run/" class="nav-list-link active">About</a></li>
<li class="nav-list-item"><a href="https://docs.napkin.run/getting-started/" class="nav-list-link">Getting started</a></li>
<li class="nav-list-item">
<a href="#" class="nav-list-expander"><svg viewbox="0 0 24 24"><use xlink:href="#svg-arrow-right"></use></svg></a><a href="https://docs.napkin.run/install/" class="nav-list-link">Installation</a><ul class="nav-list ">
<li class="nav-list-item "><a href="https://docs.napkin.run/install#native" class="nav-list-link">Native</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/install#homebrew" class="nav-list-link">Homebrew</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/install#docker" class="nav-list-link">Docker</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/install#cachix" class="nav-list-link">Cachix</a></li>
</ul>
</li>
<li class="nav-list-item"><a href="https://docs.napkin.run/fundamentals/" class="nav-list-link">Fundamentals</a></li>
<li class="nav-list-item"><a href="https://docs.napkin.run/tips-and-tricks/" class="nav-list-link">Tips and Tricks</a></li>
<li class="nav-list-item">
<a href="#" class="nav-list-expander"><svg viewbox="0 0 24 24"><use xlink:href="#svg-arrow-right"></use></svg></a><a href="https://docs.napkin.run/user-manual/" class="nav-list-link">User Manual</a><ul class="nav-list ">
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/cli-reference/" class="nav-list-link">CLI reference</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/db-connection/" class="nav-list-link">Connecting to the database</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/devcontainer/" class="nav-list-link">Devcontainer</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/docker/" class="nav-list-link">Docker</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/ide/" class="nav-list-link">IDE Support</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/multi-environment/" class="nav-list-link">Multi-environment pipelines in a team setting</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/mustache/" class="nav-list-link">Mustache Interpolation</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/preprocessors/" class="nav-list-link">Preprocessors</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/github-actions/" class="nav-list-link">Running as a scheduled job on GitHub Actions</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/yaml-reference/" class="nav-list-link">Spec YAML reference</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/user-manual/spec-arguments/" class="nav-list-link">Spec arguments</a></li>
</ul>
</li>
<li class="nav-list-item">
<a href="#" class="nav-list-expander"><svg viewbox="0 0 24 24"><use xlink:href="#svg-arrow-right"></use></svg></a><a href="https://docs.napkin.run/metaprogramming/" class="nav-list-link">Metaprogramming</a><ul class="nav-list ">
<li class="nav-list-item "><a href="https://docs.napkin.run/metaprogramming/haskell/" class="nav-list-link">Haskell API</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/metaprogramming/repl/" class="nav-list-link">REPL</a></li>
<li class="nav-list-item "><a href="https://docs.napkin.run/metaprogramming/custom-hooks/" class="nav-list-link">Writing Custom Hooks</a></li>
</ul>
</li>
<li class="nav-list-item"><a href="https://docs.napkin.run/haddock/" class="nav-list-link">API Reference</a></li>
<li class="nav-list-item"><a href="https://docs.napkin.run/changelog/" class="nav-list-link">Changelog</a></li>
</ul> </nav> </div> <div class="site-super-header"> <div class="super-nav-box"> <div class="site-logo"></div> <div class="site-buttons"> <a class="link-button active" href="/">Documentation</a> <a class="link-button" href="https://napkinweb.webflow.io/community" target="_blank" rel="noopener noreferrer">Community</a> <a class="link-button" href="https://napkinweb.webflow.io/features" target="_blank" rel="noopener noreferrer">Features</a> <a class="link-button" href="https://napkinweb.webflow.io/usecases" target="_blank" rel="noopener noreferrer">Use Cases</a> <span class="gradient-border"> <a class="link-button" href="/install/">Install</a> </span> </div> </div> </div> <div class="width-ruler"></div> <div class="main" id="top"> <div id="main-header" class="main-header"> <div class="search"> <div class="search-input-wrap"> <input type="text" id="search-input" class="search-input" tabindex="0" placeholder="Search " aria-label="Search " autocomplete="off"> <label for="search-input" class="search-label"><svg viewbox="0 0 24 24" class="search-icon"><use xlink:href="#svg-search"></use></svg></label> </div> <div id="search-results" class="search-results"></div> </div> </div> <div id="main-content-wrap" class="main-content-wrap"> <div id="main-content" class="main-content" role="main"> <ul id="markdown-toc"> <li><a href="#what-is-napkin" id="markdown-toc-what-is-napkin">What is Napkin</a></li> <li>
<a href="#napkins-philosophy" id="markdown-toc-napkins-philosophy">Napkin’s Philosophy</a> <ul> <li><a href="#base-as-much-data-compute-as-possible-on-sql" id="markdown-toc-base-as-much-data-compute-as-possible-on-sql">Base as much data compute as possible on SQL</a></li> <li><a href="#do-as-much-compute-as-possible-on-modern-analytics-dbs-like-bigqueryredshiftsnowflake" id="markdown-toc-do-as-much-compute-as-possible-on-modern-analytics-dbs-like-bigqueryredshiftsnowflake">Do as much compute as possible on modern analytics DBs like BigQuery/Redshift/Snowflake</a></li> <li><a href="#abstract-and-reuse-complex-transformations-where-possible" id="markdown-toc-abstract-and-reuse-complex-transformations-where-possible">Abstract and reuse complex transformations where possible</a></li> <li><a href="#data-pipelines-should-be-declarative-and-managed-on-git" id="markdown-toc-data-pipelines-should-be-declarative-and-managed-on-git">Data pipelines should be declarative and managed on Git</a></li> <li><a href="#data-pipelines-should-be-regenerative" id="markdown-toc-data-pipelines-should-be-regenerative">Data pipelines should be regenerative</a></li> <li><a href="#data-pipeline-dev-should-be-lightweight-on-bare-laptops" id="markdown-toc-data-pipeline-dev-should-be-lightweight-on-bare-laptops">Data pipeline dev should be lightweight on bare laptops</a></li> <li><a href="#doctrine-of-extreme-convenience" id="markdown-toc-doctrine-of-extreme-convenience">Doctrine of extreme convenience</a></li> </ul> </li> <li><a href="#napkins-benefits" id="markdown-toc-napkins-benefits">Napkin’s Benefits</a></li> <li><a href="#napkins-future" id="markdown-toc-napkins-future">Napkin’s Future</a></li> <li><a href="#next-steps" id="markdown-toc-next-steps">Next Steps</a></li> </ul> <h1 id="what-is-napkin">What is Napkin</h1> <p>Napkin is a command line application that executes data pipelines of all sizes, backed by a feature-rich Haskell library offering programmatic freedom. 
It’s lightweight, offers a quick start for new projects, and yet scales to massive data pipelines with powerful meta-programming possibilities.</p> <p>Napkin’s broad vision is to make life easier for data scientists and engineers by covering a large portion of the data engineering landscape. It therefore bundles several key features together:</p> <ol> <li> <p>A consumer-grade Command Line Interface (CLI) that acts as the single point of entry for all typical workflows of data engineering and pipeline curation. The <code class="language-plaintext highlighter-rouge">napkin</code> app can refresh entire data pipelines, re-create individual tables, validate/typecheck pipelines in seconds, export dependency graphs and more.</p> </li> <li> <p>A multi-backend (e.g. BigQuery and Redshift) database runtime environment that provides all key capabilities for executing a modern data pipeline, including interacting with the database (see what’s there, query tables, create/recreate/update tables, etc.), performing runtime unit-tests/assertions, logging, timing and interacting with the outside world.</p> </li> <li> <p>A built-in DAG orchestrator that can automatically detect all the dependency relationships in a data pipeline (e.g. 30+ tables) and perform the pipeline updates in the correct order. Data pipelines are called “Specs” in napkin and ship with batteries included: the ability to rewrite table destinations into different schemas/datasets for different environments (e.g. devel vs. prod), mass-prefixing/renaming of tables, different “Refresh Strategies” for each table (e.g. update daily vs. only update when missing), and a wide range of data unit-tests (e.g. table must be unique by columns X+Y) that are automatically performed each time the table is updated.</p> </li> <li> <p>For the power user, a SQL wrapper DSL in Haskell that stays as close as possible to SQL, without any intermediary object or relational mappings. 
This DSL looks almost like regular SQL, but allows sophisticated programmatic manipulation and composition of SQL queries and statements. Napkin can parse regular SQL into this internal DSL, perform any desired manipulations and render it back out as regular SQL.</p> </li> <li> <p>A sophisticated SQL meta-programming environment that accelerates modern data engineering efforts. Napkin users can interweave several options for crafting SQL as they see fit, even in the same file. These options include:</p> <ul> <li>Writing plain SQL files without any low-grade templating noise. Napkin will still auto-detect all dependencies and make the pipeline “just work”.</li> <li>Using lightweight variable substitutions in <code class="language-plaintext highlighter-rouge">.sql</code> files via Mustache templates.</li> <li>Using sophisticated <code class="language-plaintext highlighter-rouge">{{#sExp}} ... {{/sExp}}</code> splices directly in <code class="language-plaintext highlighter-rouge">.sql</code> files to write Haskell code that dynamically generates SQL fragments on the fly.</li> <li>Expressing entire queries directly using napkin’s Haskell DSL, often used for dynamic generation of SQL code based on complex logic. For example, prediction trees can be rendered into SQL this way, sometimes generating 100K LOC SQL files from a single model.</li> </ul> </li> </ol> <h1 id="napkins-philosophy">Napkin’s Philosophy</h1> <p>Napkin was created to capitalize on an opportunity we noticed back in 2015 to (massively) accelerate our team’s data engineering capabilities and yet make the resulting code-bases far more sustainable/maintainable. At the time, we were drowning in the complexity of custom Hadoop MapReduce programs, Spark programs and repositories of ad-hoc SQL scripts targeting Redshift, Hive, etc. 
We created napkin because we sorely needed something more practical and reliable for our own work.</p> <p>Over time, the opportunities we saw crystallized into a set of philosophies that articulate what napkin is trying to achieve and help you judge whether it may be the huge catalyst for your team that it has been for us.</p> <h2 id="base-as-much-data-compute-as-possible-on-sql">Base as much data compute as possible on SQL</h2> <p>Despite its age and missed opportunities, SQL code is declarative, functional and highly expressive. It’s easy to construct even for non-engineer data scientists/analysts and tends to offer good “equational reasoning”. It’s constrained just enough that business logic does not go “off the rails” the way it can in typical programming languages like Python, R, Scala, etc. Once written and tested, SQL tends to produce reliable results.</p> <p>Over the years, we have found almost all data engineering efforts outside of SQL to be error-prone, hard to grow and expensive to maintain over time (e.g. they need data engineers).</p> <p>If you can imagine how a table should be structured and express that table as a query in SQL, you can use napkin to engineer a pipeline.</p> <h2 id="do-as-much-compute-as-possible-on-modern-analytics-dbs-like-bigqueryredshiftsnowflake">Do as much compute as possible on modern analytics DBs like BigQuery/Redshift/Snowflake</h2> <p>Napkin aims to be a data engineering superpower even for very small teams. This is accomplished in large part by leaning on the amazing compute capabilities of modern analytics databases like BigQuery. 
Napkin’s creation goes back to our realization that if we could express even a very complex computation in SQL on these databases, no matter how convoluted, they would get the work done in astonishingly little time for minimal cost.</p> <p>In our work, we have produced numerous 200,000+ LOC SQL queries using napkin’s meta-programming capabilities that run within minutes on databases like Amazon Redshift and Google’s BigQuery. Fun fact: BigQuery has a ~1M character limit on queries, which we sometimes bypass by breaking complex queries into parts and joining them up / unioning them later. Even this transformation can be done automatically for you by napkin in certain cases!</p> <h2 id="abstract-and-reuse-complex-transformations-where-possible">Abstract and reuse complex transformations where possible</h2> <h2 id="data-pipelines-should-be-declarative-and-managed-on-git">Data pipelines should be declarative and managed on Git</h2> <h2 id="data-pipelines-should-be-regenerative">Data pipelines should be regenerative</h2> <h2 id="data-pipeline-dev-should-be-lightweight-on-bare-laptops">Data pipeline dev should be lightweight on bare laptops</h2> <h2 id="doctrine-of-extreme-convenience">Doctrine of extreme convenience</h2> <p>With napkin, we aim to make various data engineering and data science workflows so easy to perform that practitioners change their behavior to lean on them more frequently. 
We believe that speed and convenience, without sacrificing correctness and reliability, make a huge difference in sustaining data ecosystem effectiveness.</p> <h1 id="napkins-benefits">Napkin’s Benefits</h1> <p>Here’s our best description of the benefits you can expect once you’ve gotten the hang of napkin:</p> <ol> <li> <p>You’ll be able to see and manage your entire data pipeline in a simple codebase, in declarative fashion and in source control - just like any modern software project.</p> </li> <li> <p>You’ll always be able to “blow away and fully refresh” your entire pipeline from raw data at the push of a button - recovering from mistakes will be a breeze.</p> </li> <li> <p>Your data pipeline will entirely rely on the power of your backend database, whatever it may be: the likes of BigQuery for large datasets, or Postgres (or even SQLite) when you can get away with it on small data. You won’t rely on error-prone Python pandas code, your own custom data processing application or similar constructs that are hard to grow, maintain and keep correct over time.</p> </li> <li> <p>Your data will have actual unit tests that will confirm correctness with each update. (Example: making sure you don’t double count sales.)</p> </li> <li> <p>You’ll benefit from dozens of combinators we ship with napkin, such as incrementally updating large tables, column-to-row transformations, unioning same-structured tables into one, etc. As we improve napkin, you’ll get all that for free.</p> </li> <li> <p>You’ll be able to implement your own clever SQL meta-programming to express logic that’d be too tedious to do in plain SQL. Yet the result will still have all the benefits of declarative SQL running on modern analytics databases, instead of your custom Python/R/Scala scripting machine. 
You’ll be able to create your own mini programs that produce 10-table “purchasing funnel” computations that connect just the right way based on configuration parameters supplied.</p> </li> </ol> <h1 id="napkins-future">Napkin’s Future</h1> <p>Napkin is utilized heavily in commercial projects both at Soostone and at our clients. We improve napkin all the time and have a long backlog of major features we will realize in the future.</p> <p>We would like to be transparent with our roadmap and are looking for ways to best communicate our plans. We’re currently maintaining a Trello board with our roadmap where we would love to hear your reactions and feedback. You can access our roadmap board at <a href="https://trello.com/b/rIbzqkFb/napkin-roadmap" target="_blank" rel="noopener noreferrer">Napkin Roadmap</a></p> <h1 id="next-steps">Next Steps</h1> <p><a href="/fundamentals/">Continue with <strong>Tutorial</strong></a></p> </div> </div> <div class="search-overlay"></div> </div> <div class="site-super-footer"> <div class="super-nav-box"> <div class="site-logo"></div> <div class="site-buttons"> <div class="table-wrapper"><table> <tr> <td> <a class="link-button active" href="/">Documentation</a> </td> <td> <a class="link-button" href="https://napkinweb.webflow.io/usecases" target="_blank" rel="noopener noreferrer">Use Cases</a> </td> <td> <span class="gradient-border"> <a class="link-button" href="/install/">Get Napkin</a> </span> </td> </tr> <tr> <td> <a class="link-button" href="https://napkinweb.webflow.io/community" target="_blank" rel="noopener noreferrer">Community</a> </td> <td> <a class="link-button" href="https://napkinweb.webflow.io/about" target="_blank" rel="noopener noreferrer">About Napkin</a> </td> </tr> <tr> <td> <a class="link-button" href="https://napkinweb.webflow.io/features" target="_blank" rel="noopener noreferrer">Features</a> </td> <td> <a class="link-button" href="https://napkinweb.webflow.io/contact" target="_blank" rel="noopener 
noreferrer">Contact</a> </td> </tr> </table></div> </div> <div class="media-bar"> <div class="media-pad"><a href="#" class="inline-icon facebook"></a></div> <div class="media-pad"><a href="#" class="inline-icon twitter"></a></div> <div class="media-pad"><a href="#" class="inline-icon linked-in"></a></div> </div> <div class="copyright"> <div class="left"> Copyright 2022 © </div> <div class="right"> Made with <span class="heart"> </span> in NYC </div> </div> </div> </div> <script type="text/javascript" src="/assets/js/copy.js"></script> </body> </html>