Source Apify Dataset: add item_collection stream with dynamic schema (#31333)
Co-authored-by: Joe Reuter <[email protected]>
Showing 7 changed files with 72 additions and 7 deletions.
12 changes: 12 additions & 0 deletions
...rations/connectors/source-apify-dataset/source_apify_dataset/schemas/item_collection.json
@@ -0,0 +1,12 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Item collection",
  "type": ["null", "object"],
  "additionalProperties": true,
  "properties": {
    "data": {
      "additionalProperties": true,
      "type": ["null", "object"]
    }
  }
}
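As a rough illustration of what this schema permits, the sketch below hand-checks the only two constraints it states: the record is null or an object, and `data`, if present, is null or an object. `matches_item_collection` is a hypothetical helper written for this note; it is not part of the connector or of any JSON Schema library, and a real validator would cover the rest of draft-07.

```python
from typing import Any


def matches_item_collection(record: Any) -> bool:
    """Hypothetical hand-rolled check mirroring the item_collection schema."""
    # "type": ["null", "object"] at the top level.
    if record is None:
        return True
    if not isinstance(record, dict):
        return False
    # "data" is optional; when present it must be null or an object.
    # "additionalProperties": true means any other keys are accepted.
    data = record.get("data")
    return data is None or isinstance(data, dict)


# Invented sample records of differing shapes.
print(matches_item_collection({"data": {"url": "https://example.com"}}))
print(matches_item_collection({"data": ["not", "an", "object"]}))
```

Because both levels set `additionalProperties` to true, the contents of `data` are unconstrained, which is what lets a single static schema describe datasets of any shape.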
24 changes: 24 additions & 0 deletions
...grations/connectors/source-apify-dataset/source_apify_dataset/wrapping_dpath_extractor.py
@@ -0,0 +1,24 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from dataclasses import dataclass

import requests
from airbyte_cdk.sources.declarative.extractors.dpath_extractor import DpathExtractor
from airbyte_cdk.sources.declarative.types import Record


@dataclass
class WrappingDpathExtractor(DpathExtractor):
    """
    Record extractor that wraps the extracted value into a dict, with the value being set to the key `data`.
    This is done because the actual shape of the data is dynamic, so by wrapping everything into a `data` object
    it can be specified as a generic object in the schema.
    Note that this will cause fields to not be normalized in the destination.
    """

    def extract_records(self, response: requests.Response) -> list[Record]:
        records = super().extract_records(response)
        return [{"data": record} for record in records]
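The effect of the wrapping can be seen without installing the Airbyte CDK. The following is a minimal standalone sketch: `wrap_records` is a hypothetical helper that mirrors only the list comprehension in `extract_records`, and the sample items are invented for illustration.

```python
from typing import Any, Mapping


def wrap_records(records: list[Mapping[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the wrapping step in extract_records."""
    return [{"data": record} for record in records]


# Invented dataset items with unrelated shapes.
items = [
    {"url": "https://example.com", "title": "Example"},
    {"price": 9.99, "tags": ["a", "b"]},
]

wrapped = wrap_records(items)
for row in wrapped:
    # Every row now has the same single top-level key, "data".
    print(row)
```

Whatever the original item looks like, each emitted record has exactly one top-level field, so the destination sees a uniform shape and the item's own fields stay nested under `data`.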
@@ -1,10 +1,14 @@
# Apify Dataset Migration Guide

## Upgrading to 2.1.0

A minor update that adds a new `item_collection` stream for general datasets. No changes to your current connector configuration are required.

## Upgrading to 2.0.0

Major update: the old, broken Item Collection stream has been removed and replaced with a new Item Collection (WCC) stream specific to datasets produced by the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor. Please update your connector configuration. Note: the schema of an Apify Dataset is at least Actor-specific, so there cannot be a general stream with a static schema for reading data from a Dataset.

## Upgrading to 1.0.0

A major update that fixes data ingestion so that data is retrieved from Apify correctly.
Please update your connector configuration.