When indexing a file for which, e.g., Tika extracts overly long metadata entries, the current exception handling for the resulting HTTP 400 error from Solr is not very helpful.
Printing the response text as well makes it much clearer where the issue comes from.
So I propose adding a print statement for the case status_code >= 400:
# if bad status code, raise exception
if r.status_code >= 400:
    print('Solr {} error: {}'.format(r.status_code, r.text))
r.raise_for_status()
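The same check could also be wrapped in a small helper so that it can be reused and tested without a running Solr. The following is only a sketch: the name `raise_for_solr_status` and the `out` parameter are hypothetical and not part of open-semantic-etl, and `r` is assumed to behave like a `requests.Response` (providing `status_code`, `text` and `raise_for_status()`). Unlike the snippet above, it writes to stderr by default, which is a design choice made here, not something the proposal requires.

```python
import sys


def raise_for_solr_status(r, out=sys.stderr):
    """Report Solr's response body before raising on an HTTP error status.

    Hypothetical helper: `r` is assumed to provide `status_code`, `text`
    and `raise_for_status()`, as a requests.Response does.
    """
    if r.status_code >= 400:
        # The body carries Solr's own explanation (the "msg" field).
        print('Solr {} error: {}'.format(r.status_code, r.text), file=out)
    # Does nothing for successful responses, raises for 4xx/5xx.
    r.raise_for_status()
```

Calling `raise_for_solr_status(r)` right after the request would then replace the bare `r.raise_for_status()` call.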
"msg":"Exception writing document id /media/text_document.docx to the index; possible analysis error: Document contains at least one immense term in field="Text_TextEntry_ss" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[107, 101, 121, 119, 111, 114, 100, 61, 88, 77, 76, 58, 99, 111, 109, 46, 97, 100, 111, 98, 101, 46, 120, 109, 112, 44, 32, 118, 97, 108]...', original message: bytes can be at most 32766 in length; got 38935. Perhaps the document has an indexed string field (solr.StrField) which is too large",
"code":400}}
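The Solr message quoted above states the limit in UTF-8 bytes (32766), not in characters, so a value can exceed it with fewer characters than that. As a rough illustration of how such values could be spotted before a document is sent, here is a minimal sketch. The function name `find_immense_values` and the plain-dict document layout are assumptions made for this example, and the check only matters for fields indexed as a single term (such as the `solr.StrField` the message mentions).

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Solr error message


def find_immense_values(doc, max_bytes=MAX_TERM_BYTES):
    """Return {field: largest_byte_length} for string values over the limit.

    Hypothetical helper: `doc` is assumed to be a dict mapping field names
    to a string or to a list/tuple of strings (multi-valued field).
    """
    oversized = {}
    for field, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            if isinstance(v, str):
                # Measure the encoded size, since the limit is in bytes.
                size = len(v.encode('utf-8'))
                if size > max_bytes:
                    oversized[field] = max(size, oversized.get(field, 0))
    return oversized
```

For a document whose `Text_TextEntry_ss` value encodes to 38935 bytes, as in the message above, the result would contain that field name with the value 38935.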
Affected code: open-semantic-etl/src/opensemanticetl/export_solr.py, line 156 in f51efea