SC + Opensearch using Aggregation Spout #1091
-
Hello all, I've been working on setting up SC with Opensearch as the backend by referring to the stormcrawler-docker repository. The setup was interesting and I've managed to get all the services up and running without any errors. However, my document in the Here's the document I've added to the status index. Request:
Response:
The worker logs only show the following:
Replies: 3 comments 4 replies
-
Hi @dhaneshsabane Looking at your problem, the key is an unusual one. Did you inject the seed manually? You could change the log level to see which queries the spouts generate, then compare them with the document you posted above to check that it isn't missing anything the spouts require.
-
Hey @jnioche !
Sounds good! I'll try the newer version.
Yes. I've injected that seed manually. My first insert on the index was only with the url but after seeing that it was not working, I tried to add the other fields in an attempt to make it work.
I have two shards and the same number of spouts. I remember reading in some other discussions / answers / documentation that it is important to keep the number of spouts equal to the number of index shards. I'll enable the debug logs while also updating the version to see if it fixes the issue for me. If not, I'll come back with some follow-up questions. Thanks for your help!
-
The actual query is:

    {
      "from": 0,
      "size": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "nextFetchDate": {
                  "from": null,
                  "to": "2023-09-05T15:26:36Z",
                  "include_lower": true,
                  "include_upper": true,
                  "boost": 1
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      },
      "explain": false,
      "track_total_hits": -1,
      "aggregations": {
        "partition": {
          "terms": {
            "field": "key",
            "size": 50,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "top_hit": "asc"
              },
              {
                "_key": "asc"
              }
            ]
          },
          "aggregations": {
            "docs": {
              "top_hits": {
                "from": 0,
                "size": 2,
                "version": false,
                "seq_no_primary_term": false,
                "explain": false,
                "sort": [
                  {
                    "nextFetchDate": {
                      "order": "asc"
                    }
                  },
                  {
                    "url": {
                      "order": "asc"
                    }
                  }
                ]
              }
            },
            "top_hit": {
              "min": {
                "field": "nextFetchDate"
              }
            }
          }
        }
      }
    }

The doc needs a key field, because the query buckets documents with a terms aggregation on it, but the doc you added is missing one. Note that URLs belonging to the same domain or host need to be routed using the key so that they end up in the same shard. I'd recommend that you inject with a local topology (see the latest version of the resources generated by the archetype) rather than manually.
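To illustrate why a manually inserted document can stay invisible to the spout, here is a minimal Python sketch. It is not StormCrawler code: the function names `build_status_doc` and `visible_to_spout` are hypothetical, the field names `url`, `nextFetchDate` and `key` come from the query in this thread, and using the hostname as the key and `DISCOVERED` as the status value are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_status_doc(url, now=None):
    """Return (document, routing) for one seed URL.

    Assumption: the key is the hostname, and the same value is used as the
    routing value so that URLs from one host land in the same shard.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"cannot derive a key from {url!r}")
    doc = {
        "url": url,
        "status": "DISCOVERED",  # assumed status value
        "nextFetchDate": now.strftime(DATE_FORMAT),
        "key": host,
    }
    return doc, host


def visible_to_spout(doc, now):
    """Mimic the two conditions of the query on a single document.

    1. The range filter keeps documents with nextFetchDate <= now.
    2. The terms aggregation on "key" only buckets documents that have
       a value for that field.
    """
    due = doc.get("nextFetchDate")
    if due is None:
        return False
    due_at = datetime.strptime(due, DATE_FORMAT).replace(tzinfo=timezone.utc)
    if due_at > now:
        return False
    return doc.get("key") is not None


if __name__ == "__main__":
    now = datetime(2023, 9, 5, 15, 26, 36, tzinfo=timezone.utc)
    doc, routing = build_status_doc("https://example.com/page", now)
    print(doc, routing, visible_to_spout(doc, now))
```

A document built this way passes both checks, whereas one holding only `url` and `nextFetchDate` passes the range filter but falls out at the aggregation step. Injecting through a local topology, as recommended above, avoids having to reproduce the key and routing logic by hand.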