diff --git a/docs/website/docs/dlt-ecosystem/verified-sources/filesystem.md b/docs/website/docs/dlt-ecosystem/verified-sources/filesystem.md
index f00f3b299c..55774cf063 100644
--- a/docs/website/docs/dlt-ecosystem/verified-sources/filesystem.md
+++ b/docs/website/docs/dlt-ecosystem/verified-sources/filesystem.md
@@ -15,8 +15,9 @@ source. Currently the following reader sources are supported:
- read_csv (with Pandas)
- read_jsonl
-- read_parquet (with pyarrow) Additionally, it can read Excel files with a standalone
- transformer and copy files locally.
+- read_parquet (with pyarrow)
+
+Additionally, it can read Excel files with a standalone transformer and copy files locally.
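+
+For illustration, such a standalone Excel transformer could look like the sketch below. This is a minimal example rather than the shipped implementation: the pandas-based body and the `sheet_name` argument are assumptions, while `FileItemDict` and `dlt.transformer` come from dlt itself.
+
+```python
+from typing import Iterator
+
+import dlt
+from dlt.common.storages.fsspec_filesystem import FileItemDict
+from dlt.common.typing import TDataItems
+
+
+@dlt.transformer(standalone=True)
+def read_excel(items: Iterator[FileItemDict], sheet_name: str) -> Iterator[TDataItems]:
+    # open each matched file and yield the sheet rows as dicts
+    import pandas as pd
+
+    for file_obj in items:
+        with file_obj.open() as file:
+            yield pd.read_excel(file, sheet_name).to_dict(orient="records")
+```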
Sources and resources that can be loaded using this verified source are:
@@ -210,9 +211,9 @@ For more information, read the [Walkthrough: Run a pipeline](../../walkthroughs/
This source provides resources that are chunked file readers. You can optionally customize these
readers. The provided resources are:
-- read_csv(chunksize, \*\*pandas_kwargs)
-- read_jsonl(chunksize)
-- read_parquet(chunksize)
+- read_csv
+- read_jsonl
+- read_parquet
```python
@dlt.source(_impl_cls=ReadersSource, spec=FilesystemConfigurationResource)
@@ -223,9 +224,14 @@ def readers(
) -> Tuple[DltResource, ...]:
```
-`bucket_url`: The url to the bucket. `credentials`: The credentials to the filesystem of fsspec
-`AbstractFilesystem` instance. `file_glob`: Glob filter for files; defaults to non-recursive
-listing in the bucket.
+`bucket_url`: The URL to the bucket.
+
+`credentials`: The credentials to the filesystem of fsspec `AbstractFilesystem` instance.
+
+`file_glob`: Glob filter for files; defaults to non-recursive listing in the bucket.
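+
+As a usage sketch (the bucket URL and glob below are illustrative, and the import assumes the verified source files are available as a local `filesystem` module):
+
+```python
+import dlt
+from filesystem import readers
+
+# pick the read_csv resource from the readers source; reader options such as
+# the chunk size can be passed when calling it
+met_files = readers(
+    bucket_url="s3://my-bucket", file_glob="met_csv/A801/*.csv"
+).read_csv()
+
+pipeline = dlt.pipeline(pipeline_name="standard_filesystem", destination="duckdb")
+pipeline.run(met_files.with_name("met_csv"))
+```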
### Resource `filesystem`
@@ -245,11 +251,17 @@ def filesystem(
) -> Iterator[List[FileItem]]:
```
-`bucket_url`: URL of the bucket. `credentials`: Filesystem credentials of `AbstractFilesystem`
-instance. `file_glob`: File filter in glob format. Defaults to listing all non-recursive files
-in bucket_url. `files_per_page`: Number of files processed at once (default: 100).
+`bucket_url`: URL of the bucket.
+
+`credentials`: Filesystem credentials of `AbstractFilesystem` instance.
+
+`file_glob`: File filter in glob format. Defaults to listing all non-recursive files
+in bucket_url.
+
+`files_per_page`: Number of files processed at once (default: 100).
+
`extract_content`: If true, the content of the file will be read and returned in the resource.
-(default: False).
+(default: False).
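+
+A sketch of standalone use, piping the listed files into a reader transformer (the bucket URL and glob are illustrative; `read_csv` is one of the readers described above):
+
+```python
+import dlt
+from filesystem import filesystem, read_csv
+
+# list matching files in pages of 50, then read each CSV
+met_files = filesystem(
+    bucket_url="s3://my-bucket", file_glob="met_csv/A801/*.csv", files_per_page=50
+) | read_csv()
+
+pipeline = dlt.pipeline(pipeline_name="standard_filesystem", destination="duckdb")
+pipeline.run(met_files.with_name("met_csv"))
+```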
## Filesystem Integration and Data Extraction Guide
@@ -286,11 +298,17 @@ pipeline.run(met_files.with_name("met_csv"))
#### FileItem Fields:
-`file_url` - Complete URL of the file; also the primary key (e.g., file://). `file_name` - Name
-or relative path of the file from the bucket_url. `mime_type` - File's mime type; sourced from
-the bucket provider or inferred from its extension. `modification_date` - File's last
-modification time (format: pendulum.DateTime). `size_in_bytes` - File size. `file_content` -
-Content, provided upon request.
+`file_url` - Complete URL of the file; also the primary key (e.g., file://).
+
+`file_name` - Name or relative path of the file from the bucket_url.
+
+`mime_type` - File's mime type; sourced from the bucket provider or inferred from its extension.
+
+`modification_date` - File's last modification time (format: pendulum.DateTime).
+
+`size_in_bytes` - File size.
+
+`file_content` - Content, provided upon request.
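+
+These fields can be used to filter a listing before any file content is read. For example, a sketch using the resource's `add_filter` (bucket URL, glob, and cutoff date are illustrative):
+
+```python
+import pendulum
+from filesystem import filesystem
+
+met_files = filesystem(bucket_url="s3://my-bucket", file_glob="met_csv/A801/*.csv")
+# keep only files modified on or after the cutoff, using the modification_date field
+met_files.add_filter(
+    lambda item: item["modification_date"] >= pendulum.datetime(2024, 1, 1)
+)
+```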
> 📌 Note: When using a nested or recursive glob pattern, file_name will include the file's path. For
> instance, using the resource:
@@ -339,9 +357,10 @@ verified source.
print(pipeline.last_trace.last_normalize_info)
```
- > The `file_glob` parameter targets all CSVs in the "met_csv/A801" directory.. The
- > `print(pipeline.last_trace.last_normalize_info)` line displays the data normalization details
- > from the pipeline's last trace. 📌 Note: If you have a default bucket URL set in
+ > The `file_glob` parameter targets all CSVs in the "met_csv/A801" directory.
+ > The `print(pipeline.last_trace.last_normalize_info)` line displays the data normalization details
+ > from the pipeline's last trace.
+ > 📌 Note: If you have a default bucket URL set in
> "/.dlt/config.toml", you can omit the bucket_url parameter.
When rerun the next day, this pipeline updates both new and the previous day's records.
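+
+One way to make such daily reruns incremental is to apply an incremental hint on the `modification_date` field, so only files modified since the previous successful run are processed. A sketch (names and URL illustrative):
+
+```python
+import dlt
+from filesystem import filesystem, read_csv
+
+met_files = filesystem(bucket_url="s3://my-bucket", file_glob="met_csv/A801/*.csv")
+# track the file modification date between runs
+met_files.apply_hints(incremental=dlt.sources.incremental("modification_date"))
+
+pipeline = dlt.pipeline(pipeline_name="filesystem_incremental", destination="duckdb")
+pipeline.run((met_files | read_csv()).with_name("met_csv"))
+```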