
Commit

docs-action committed Nov 15, 2023
1 parent 3243e50 commit c807f51
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion assets/js/search-data.json
@@ -2087,7 +2087,7 @@
},"298": {
"doc": "Welcome to lakeFS",
"title": "How Can lakeFS Help Me?",
"content": "lakeFS helps you maintain a tidy data lake in several ways, including: . Isolated Dev/Test Environments with copy-on-write . lakeFS makes creating isolated dev/test environments for ETL testing instantaneous, and through its use of copy-on-write, cheap. This enables you to test and validate code changes on production data without impacting it, as well as run analysis and experiments on production data in an isolated clone. 👉🏻 Read more . Reproducibility: What Did My Data Look Like at a Point In Time? . Being able to look at data as it was at a given point is particularly useful in at least two scenarios: . | Reproducibility of ML experiments . ML experimentation is usually an iterative process, and being able to reproduce a specific iteration is important. With lakeFS you can version all components of an ML experiment including its data, as well as make use of copy-on-write to minimise the footprint of versions of the data . | Troubleshooting production problems . Data engineers are often asked to validate the data. A user might report inconsistencies, question the accuracy, or simply report it to be incorrect. Since the data continuously changes, it is challenging to understand its state at the time of the error. With lakeFS you can create a branch from a commit to debug an issue in isolation. | . 👉🏻 Read More . Rollback of Data Changes and Recovery from Data Errors . Human error, misconfiguration, or wide-ranging systematic effects are unavoidable. When they do happen, erroneous data may make it into production or critical data assets might accidentally by deleted. By their nature, backups are a wrong tool for recovering from such events. Backups are periodic events that are usually not tied to performing erroneous operations. So, they may be out of date, and will require sifting through data at the object level. This process is inefficient and can take hours, days, or in some cases, weeks to complete. By quickly committing entire snapshots of data at well-defined times, recovering data in deletion or corruption events becomes an instant one-line operation with lakeFS: just identify a good historical commit, and then restore to it or copy from it. 👉🏻 Read more . Multi-Table Transactions guarantees . Data engineers typically need to implement custom logic in scripts to guarantee two or more data assets are updated synchronously. This logic often requires extensive rewrites or periods during which data is unavailable. The lakeFS merge operation from one branch into another removes the need to implement this logic yourself. Instead, make updates to the desired data assets on a branch and then utilize a lakeFS merge to atomically expose the data to downstream consumers. To learn more about atomic cross-collection updates, check out this video which describes the concept in more detail, along with this notebook in the lakeFS samples repository. Establishing data quality guarantees - CI/CD for data . The best way to deal with mistakes is to avoid them. A data source that is ingested into the lake introducing low-quality data should be blocked before exposure if possible. With lakeFS, you can achieve this by tying data quality tests to commit and merge operations via lakeFS hooks. 👉🏻 Read more . ",
"content": "lakeFS helps you maintain a tidy data lake in several ways, including: . Isolated Dev/Test Environments with copy-on-write . lakeFS makes creating isolated dev/test environments for ETL testing instantaneous, and through its use of copy-on-write, cheap. This enables you to test and validate code changes on production data without impacting it, as well as run analysis and experiments on production data in an isolated clone. 👉🏻 Read more . Reproducibility: What Did My Data Look Like at a Point In Time? . Being able to look at data as it was at a given point is particularly useful in at least two scenarios: . | Reproducibility of ML experiments . ML experimentation is usually an iterative process, and being able to reproduce a specific iteration is important. With lakeFS you can version all components of an ML experiment including its data, as well as make use of copy-on-write to minimise the footprint of versions of the data . | Troubleshooting production problems . Data engineers are often asked to validate the data. A user might report inconsistencies, question the accuracy, or simply report it to be incorrect. Since the data continuously changes, it is challenging to understand its state at the time of the error. With lakeFS you can create a branch from a commit to debug an issue in isolation. | . 👉🏻 Read More . Rollback of Data Changes and Recovery from Data Errors . Human error, misconfiguration, or wide-ranging systematic effects are unavoidable. When they do happen, erroneous data may make it into production or critical data assets might accidentally be deleted. By their nature, backups are a wrong tool for recovering from such events. Backups are periodic events that are usually not tied to performing erroneous operations. So, they may be out of date, and will require sifting through data at the object level. This process is inefficient and can take hours, days, or in some cases, weeks to complete. By quickly committing entire snapshots of data at well-defined times, recovering data in deletion or corruption events becomes an instant one-line operation with lakeFS: just identify a good historical commit, and then restore to it or copy from it. 👉🏻 Read more . Multi-Table Transactions guarantees . Data engineers typically need to implement custom logic in scripts to guarantee two or more data assets are updated synchronously. This logic often requires extensive rewrites or periods during which data is unavailable. The lakeFS merge operation from one branch into another removes the need to implement this logic yourself. Instead, make updates to the desired data assets on a branch and then utilize a lakeFS merge to atomically expose the data to downstream consumers. To learn more about atomic cross-collection updates, check out this video which describes the concept in more detail, along with this notebook in the lakeFS samples repository. Establishing data quality guarantees - CI/CD for data . The best way to deal with mistakes is to avoid them. A data source that is ingested into the lake introducing low-quality data should be blocked before exposure if possible. With lakeFS, you can achieve this by tying data quality tests to commit and merge operations via lakeFS hooks. 👉🏻 Read more . ",
"url": "/#how-can-lakefs-help-me",

"relUrl": "/#how-can-lakefs-help-me"
2 changes: 1 addition & 1 deletion index.html
@@ -708,7 +708,7 @@ <h3 id="rollback-of-data-changes-and-recovery-from-data-errors">

<p>Human error, misconfiguration, or wide-ranging systematic effects are
unavoidable. When they do happen, erroneous data may make it into
production or critical data assets might accidentally by deleted.</p>
production or critical data assets might accidentally be deleted.</p>

<p>By their nature, backups are a wrong tool for recovering from such events.
Backups are periodic events that are usually not tied to performing
