Push index to AWS OpenSearch through tunneling #117

Open
caoyang1211 opened this issue Jun 29, 2022 · 1 comment

Comments


caoyang1211 commented Jun 29, 2022

I created an OpenSearch domain on AWS inside a VPC. To read from or write to the domain from my laptop, I have to run SecureCRT, create a session to a bastion server that runs on AWS and has access to the VPC, and set up port forwarding so that traffic to https://127.0.0.1:60443 is redirected to the OpenSearch domain https://vpc-*****-us-east-1-1-1-yzdjblkpyhbyaytxcnbzrxavpm.us-east-1.es.amazonaws.com.

I verified that port forwarding was working by running a curl command to index multiple JSON files into that OpenSearch domain, using the --insecure option to skip certificate checks. The command looks like this:
curl -H "Content-Type:application/json" --insecure -XPOST "https://127.0.0.1:60443/_bulk" --data-binary "@TutorialVideoDbRecords.json"
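
For reference, here is a minimal Python sketch of the same bulk request, using only the standard library. The unverified SSL context plays the role of curl's --insecure flag; the endpoint and file name are the ones from this issue, and the payload shown is a hypothetical placeholder for the real NDJSON file:

```python
import ssl
import urllib.request

# Equivalent of curl --insecure: disable hostname and certificate checks.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False
insecure_ctx.verify_mode = ssl.CERT_NONE

def bulk_request(host: str, body: bytes) -> urllib.request.Request:
    """Build (but do not yet send) a _bulk POST like the curl command above."""
    return urllib.request.Request(
        url=f"{host}/_bulk",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder body; in practice this would be the bytes of the NDJSON file.
req = bulk_request("https://127.0.0.1:60443", b'{"index":{}}\n{"field":1}\n')
# To actually send it through the tunnel:
# urllib.request.urlopen(req, context=insecure_ctx)
```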

To use the Norconex crawler to index web pages into the OpenSearch domain, I set https://localhost:60443 in the Norconex config file and ran the crawler. It reported "Failure occured on node: "null"" and "Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)".

So it looks like the problem is caused by a certificate validation failure. Is there an option in the config file that can skip certificate checks, like the --insecure option in the curl command? My configuration is as follows; no user credential is required to access the OpenSearch domain inside the VPC:

        <committer class="ElasticsearchCommitter">
            <nodes>https://localhost:60443</nodes>
            <indexName>tutorials_videos</indexName>
        </committer>
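
One possible workaround (an assumption worth testing, not something confirmed in this thread) is to make the hostname match the certificate instead of disabling verification: map the real domain name to 127.0.0.1 in the local hosts file, then point the committer at the real name so it resolves to the tunnel while the wildcard certificate still matches. The domain below is a hypothetical placeholder, since the real one is partially redacted above:

```
# C:\Windows\System32\drivers\etc\hosts (or /etc/hosts) -- hypothetical entry:
127.0.0.1  vpc-EXAMPLE.us-east-1.es.amazonaws.com
```

The `<nodes>` value would then become `https://vpc-EXAMPLE.us-east-1.es.amazonaws.com:60443`, keeping TLS verification enabled.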

The error message I got after running the crawler is as follows:
10:42:22.020 [tutorial video#1] ERROR ElasticsearchCommitter - Failure occured on node: "null". Check node logs.
10:42:22.022 [tutorial video#1] ERROR COMMITTER_BATCH_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.,restrictions=[],request=]
10:42:22.127 [tutorial video#2] INFO DOCUMENT_COMMITTED_UPSERT - https://kidshealth.org/en/teens/center/concussions-ctr.html - Committers: ElasticsearchCommitter
10:42:22.129 [tutorial video#1] ERROR COMMITTER_UPSERT_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error,restrictions=[],request=UpsertRequest[reference=https://kidshealth.org/en/parents/pilonidal_gips_animation.html]]
10:42:22.129 [tutorial video#1] ERROR CrawlerCommitterService - Could not execute "upsert" on committer: ElasticsearchCommitter[connectionTimeout=1000,credentials=Credentials[username=,password=********,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings={},restrictions=[]]
com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error
at com.norconex.committer.core3.batch.queue.impl.FSQueue.moveUnrecoverableBatchError(FSQueue.java:429) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:364) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeBatchDirectory(FSQueue.java:338) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.queue(FSQueue.java:331) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.doUpsert(AbstractBatchCommitter.java:87) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.AbstractCommitter.upsert(AbstractCommitter.java:215) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.lambda$upsert$1(CrawlerCommitterService.java:84) ~[norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.executeAll(CrawlerCommitterService.java:129) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.upsert(CrawlerCommitterService.java:80) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:30) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:24) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) [norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:388) [norconex-collector-http-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.Crawler.processImportResponse(Crawler.java:681) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextQueuedCrawlData(Crawler.java:614) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextReference(Crawler.java:556) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:923) [norconex-collector-core-2.0.0.jar:2.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not consume batch. Number of attempts: 1
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:407) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.commons.lang.exec.RetriableException: Execution failed, maximum number of retries reached.
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:204) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:543) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: java.io.IOException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:901) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:276) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:537) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy.verifySession(SSLIOSessionStrategy.java:209) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy$1.verify(SSLIOSessionStrategy.java:188) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession.java:360) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:523) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:120) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.12.jar:4.4.12]
... 1 more
10:42:22.149 [tutorial video#1] INFO Crawler - Could not process document: https://kidshealth.org/en/parents/pilonidal_gips_animation.html (Could not execute "upsert" on 1 committer(s): "ElasticsearchCommitter". Check the logs for more details.)

@essiembre
Contributor

Do you still have the issue? I do not think there is a way to ignore certificates for now. I think your best bet would be to give direct access from your crawler server to your OpenSearch instance (shared VPC, IP whitelisting, etc.).
