Reintroduce excessive body content truncation
We previously removed this (#129) because the main issue we were seeing was
metadata limits being exceeded, but we do still have a very small subset of
documents that exceed even the 1MB body content limit.

This adds truncation back in (at 999KB) to handle that long tail.
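The byte figures here are easy to mix up, so the arithmetic is worked through below as a small Ruby script. It assumes only what this commit states (a decimal 1MB limit and a 999_000-byte cap) plus ActiveSupport's definition of `Integer#kilobytes` as a multiple of 1024 bytes. The local variable names are illustrative and do not appear in the codebase.

```ruby
# Byte arithmetic behind the limits in this commit (all values in bytes).
one_mb        = 1_000_000   # 1MB (decimal): the Discovery Engine limit stated in the code comment
one_mib       = 1024 * 1024 # 1MiB (binary) = 1_048_576: the unit the code comment rules out
max_byte_size = 999_000     # mirrors INDEXABLE_CONTENT_MAX_BYTE_SIZE

# Headroom kept below the decimal 1MB limit.
headroom = one_mb - max_byte_size

# ActiveSupport's Integer#kilobytes multiplies by 1024, so the spec fixture
# "a" * 1200.kilobytes is 1200 * 1024 bytes of single-byte ASCII.
spec_fixture_bytes = 1200 * 1024

puts headroom                     # prints 1000
puts spec_fixture_bytes           # prints 1228800
puts spec_fixture_bytes > one_mib # prints true
```

The fixture is therefore larger than both 1MB and 1MiB, so the spec exercises truncation whichever unit the limit is read in.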
csutter committed Mar 5, 2024
1 parent 805bf5a commit 076ad27
Showing 2 changed files with 20 additions and 0 deletions.
6 changes: 6 additions & 0 deletions app/models/concerns/publishing_api/content.rb
@@ -52,6 +52,11 @@ module Content
].map { JsonPath.new(_1, use_symbols: true) }.freeze
INDEXABLE_CONTENT_SEPARATOR = "\n".freeze

# The Discovery Engine API currently limits content length to 1MB (not MiB). A small
# handful of documents exceed this, so we need to truncate the content to a reasonable size.
# This limit is slightly lower than 1 million bytes to allow for some rounding error.
INDEXABLE_CONTENT_MAX_BYTE_SIZE = 999_000

# Extracts a single string of indexable unstructured content from the document.
def content
values_from_json_paths = INDEXABLE_CONTENT_VALUES_JSON_PATHS.map do |item|
@@ -67,6 +72,7 @@ def content
.flatten
.compact_blank
.join(INDEXABLE_CONTENT_SEPARATOR)
.truncate_bytes(INDEXABLE_CONTENT_MAX_BYTE_SIZE)
end
end
end
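`String#truncate_bytes` is provided by ActiveSupport. As a rough illustration of what it does, here is a plain-Ruby sketch that approximates its documented behaviour: it keeps whole grapheme clusters up to the byte budget and appends an omission marker (ActiveSupport's default is "…"), so the result never exceeds the limit. The helper name `truncate_to_bytes` is hypothetical, and this is an approximation for illustration, not ActiveSupport's source.

```ruby
# Hypothetical helper approximating ActiveSupport's String#truncate_bytes.
# Returns a copy of `text` that is at most `max_bytes` bytes long, cut on a
# grapheme-cluster boundary and ending with `omission` when truncated.
def truncate_to_bytes(text, max_bytes, omission: "…")
  # Content already within the budget is returned unchanged (as a copy).
  return text.dup if text.bytesize <= max_bytes

  # The omission marker itself must fit inside the budget.
  if omission.bytesize > max_bytes
    raise ArgumentError, "omission is #{omission.bytesize} bytes, larger than #{max_bytes}"
  end

  budget = max_bytes - omission.bytesize
  cut = String.new(encoding: text.encoding)

  # Append whole grapheme clusters so multibyte characters and combining
  # sequences are never split; stop at the first one that would overflow.
  text.each_grapheme_cluster do |grapheme|
    break if cut.bytesize + grapheme.bytesize > budget

    cut << grapheme
  end

  cut << omission
end

body = "a" * (1200 * 1024) # same size as the spec's "a" * 1200.kilobytes
truncated = truncate_to_bytes(body, 999_000)

puts truncated.bytesize       # prints 999000
puts truncated.end_with?("…") # prints true
```

Because the omission marker counts against the budget, the sketch returns exactly 999,000 bytes for ASCII input, comfortably under the `< 1_000_000` assertion in the spec.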
14 changes: 14 additions & 0 deletions spec/models/concerns/publishing_api/content_spec.rb
@@ -72,6 +72,20 @@
it { is_expected.to eq("<h1>Foo</h1>\nbar\n<h1>Bar</h1>\n<blink>baz</blink>") }
end

describe "with excessively large content" do
let(:document_hash) do
{
details: {
body: "a" * 1200.kilobytes,
},
}
end

it "truncates the content" do
expect(extracted_content.bytesize).to be < 1_000_000
end
end

describe "without any fields" do
let(:document_hash) do
{
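The spec above only exercises single-byte ASCII content. Byte-based truncation also has to avoid cutting a multibyte UTF-8 character in half, which `truncate_bytes` is documented to handle. The core-Ruby sketch below illustrates the pitfall with a naive `byteslice` and one simple remedy using `String#scrub`. Both helper names are hypothetical, and unlike `truncate_bytes` the remedy neither appends an omission marker nor keeps grapheme clusters intact.

```ruby
# Hypothetical helpers contrasting a naive byte cut with one that keeps the
# result valid UTF-8. Neither appends an omission marker or respects
# grapheme clusters, unlike ActiveSupport's String#truncate_bytes.
def naive_truncate(text, max_bytes)
  # byteslice counts bytes, so it can stop partway through a character.
  text.byteslice(0, max_bytes)
end

def encoding_safe_truncate(text, max_bytes)
  # scrub("") drops invalid byte sequences, such as the incomplete
  # character that byteslice can leave at the end.
  text.byteslice(0, max_bytes).scrub("")
end

body = "é" * 600_000 # "é" is 2 bytes in UTF-8, so 1_200_000 bytes in total
limit = 999_001      # an odd limit falls in the middle of a character

naive = naive_truncate(body, limit)
safe = encoding_safe_truncate(body, limit)

puts naive.valid_encoding? # prints false
puts safe.valid_encoding?  # prints true
puts safe.bytesize         # prints 999000
```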