Updated README.md and comments.
parpalak committed Nov 7, 2023
1 parent e821cf7 commit 67c95db
Showing 4 changed files with 27 additions and 3 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -210,7 +210,8 @@ Then you can run queries with different restrictions:
- `(new Query('content'))->setInstanceId(2)` searches through comments,
- `(new Query('content'))` searches everywhere.

If you omit instance_id or provide `instance_id === null`, a value `0` will be used internally. This content can match only queries without instance_id restriction.
When indexing, if you omit instance_id or provide `instance_id === null`, a value `0` will be used internally.
Such content can only match queries without instance_id restrictions.

### Content format and extraction

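The matching rule described in the README hunk above can be sketched in isolation. This is a minimal illustration, not Rose code: `normalizeInstanceId` and `contentMatchesQuery` are hypothetical names, and the sketch assumes that restricted queries always use a positive instance_id.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of Rose: an omitted or null instance_id
// is stored as 0 at indexing time, as the README states.
function normalizeInstanceId(?int $instanceId): int
{
    return $instanceId ?? 0;
}

// Hypothetical helper, not part of Rose. $queryInstanceId === null models
// a query without an instance_id restriction.
// Assumption: a restricted query always carries a positive id, so content
// stored with 0 can never equal it.
function contentMatchesQuery(?int $contentInstanceId, ?int $queryInstanceId): bool
{
    if ($queryInstanceId === null) {
        // An unrestricted query searches everywhere.
        return true;
    }

    return normalizeInstanceId($contentInstanceId) === $queryInstanceId;
}
```

Under these assumptions, content indexed without an instance_id is found by `(new Query('content'))` but not by a query restricted with `setInstanceId(2)`.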
6 changes: 5 additions & 1 deletion src/S2/Rose/Entity/Metadata/SentenceMap.php
@@ -29,6 +29,10 @@ class SentenceMap
private array $paragraphs = [];
private int $formatId;

/**
* @param int $formatId Id of formatting.
* @see SnippetSource::ALLOWED_FORMATS for formatting
*/
public function __construct(int $formatId)
{
$this->formatId = $formatId;
@@ -37,7 +41,7 @@ public function __construct(int $formatId)
/**
* @param int $paragraphIndex Number of current paragraph. Must be detected outside based on formatting.
* @param string $path Some identifier of a content node. Must be unique for the paragraph given.
* @param string $textContent Raw text content of a node. No formatting is supported now. TODO add simple formatting?
* @param string $textContent Raw text content of a node. Formatting must correspond to formatId constructor parameter.
*/
public function add(int $paragraphIndex, string $path, string $textContent): self
{
19 changes: 19 additions & 0 deletions src/S2/Rose/Entity/Metadata/SnippetSource.php
@@ -7,9 +7,28 @@
namespace S2\Rose\Entity\Metadata;

use S2\Rose\Exception\InvalidArgumentException;
use S2\Rose\Helper\StringHelper;

class SnippetSource
{
/**
* Snippets in search results can store formatting information.
* There are now 2 formatting options available: no formatting and so-called "internal" formatting.
*
* In the first case (FORMAT_PLAIN_TEXT), the text is stored in the snippet as it is.
*
* In the second case (FORMAT_INTERNAL), the backslash character starts to play a special role.
* In internal formatting, backslashes in the source text must be escaped, i.e. \ is changed to \\.
* Then, a single backslash and the following character encode a formatting change.
* For example, in the sentence "This is a \bbold\B example", the word "bold" is bolded.
* Similarly, \i and \I indicate italics, \u and \U indicate superscripts, and \d and \D indicate subscripts.
*
* The formatting is expected to be correct and properly balanced. When converting to HTML,
* formatting sequences are translated into HTML tags by plain substring substitution.
* Incorrect internal formatting will lead to incorrect HTML in the output.
*
* @see StringHelper::convertInternalFormattingToHtml for details of internal formatting processing.
*/
public const FORMAT_PLAIN_TEXT = 0;
public const FORMAT_INTERNAL = 1;
private const ALLOWED_FORMATS = [self::FORMAT_PLAIN_TEXT, self::FORMAT_INTERNAL];
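The internal format described in the doc comment above can be illustrated with a small standalone sketch. The functions `escapeForInternalFormat` and `internalFormatToHtml` are hypothetical and are not the library's `StringHelper::convertInternalFormattingToHtml`; the sketch only assumes the mapping stated in the comment (`\b`/`\B` bold, `\i`/`\I` italics, `\u`/`\U` superscript, `\d`/`\D` subscript, `\\` a literal backslash) and does not HTML-escape the text.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: escape backslashes in plain source text
// before formatting sequences are added to it.
function escapeForInternalFormat(string $plainText): string
{
    return str_replace('\\', '\\\\', $plainText);
}

// Hypothetical converter: replaces each backslash sequence with an HTML tag.
function internalFormatToHtml(string $text): string
{
    $tags = [
        'b'  => '<b>',   'B' => '</b>',
        'i'  => '<i>',   'I' => '</i>',
        'u'  => '<sup>', 'U' => '</sup>',
        'd'  => '<sub>', 'D' => '</sub>',
        '\\' => '\\',    // an escaped backslash becomes a single backslash
    ];

    // Scan left to right so that "\\" is consumed as one unit and the
    // character after it is never mistaken for a formatting code.
    $html = preg_replace_callback(
        '/\\\\(.)/su',
        static fn(array $m): string => $tags[$m[1]] ?? $m[0],
        $text
    );

    // preg_replace_callback returns null on failure (e.g. invalid UTF-8).
    return $html ?? $text;
}
```

As in the doc comment, unbalanced input is passed through blindly: the sketch performs substitution only, so a missing `\B` yields an unclosed `<b>` in the output.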
2 changes: 1 addition & 1 deletion src/S2/Rose/Helper/StringHelper.php
@@ -50,7 +50,7 @@ public static function sentencesFromText(string $text, bool $hasFormatting): arr
// For example, consider the input: 'Sentence <i>1. Sentence 2. Sentence</i> 3.'
// After processing, it becomes ['Sentence <i>1.</i>', 'Sentence 2.', '<i>Sentence</i> 3.'].
//
// This approach is reasonable because sentences are typically divided into snippets,
// This approach is reasonable because individual sentences are typically joined into snippets,
// and preserving formatting across multiple sentences may not be meaningful.
array_walk($substrings, static function (string &$text) {
$text = self::fixUnbalancedInternalFormatting($text);
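The per-sentence balancing mentioned in the comment above can be sketched as follows. This is an assumption-laden illustration, not the body of `fixUnbalancedInternalFormatting`: the function name `balanceInternalFormatting` is hypothetical, and the sketch uses the backslash codes of the internal format rather than the `<i>` tags the comment uses for readability.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: make one sentence self-contained by prepending
// missing openers and appending missing closers.
function balanceInternalFormatting(string $sentence): string
{
    $closerFor = ['b' => 'B', 'i' => 'I', 'u' => 'U', 'd' => 'D'];
    $openerFor = array_flip($closerFor); // e.g. 'B' => 'b'

    // Collect the character after each backslash; "\\" is consumed as a pair,
    // so escaped backslashes never look like formatting codes.
    preg_match_all('/\\\\(.)/su', $sentence, $matches);

    $stack  = []; // openers seen in this sentence that are still open
    $prefix = ''; // openers missing at the start of the sentence
    foreach ($matches[1] ?? [] as $char) {
        if (isset($closerFor[$char])) {
            $stack[] = $char;
        } elseif (isset($openerFor[$char])) {
            $opener = $openerFor[$char];
            $keys   = array_keys($stack, $opener, true);
            if ($keys === []) {
                // Closer without an opener: it was opened in an earlier sentence.
                $prefix = '\\' . $opener . $prefix;
            } else {
                // Close the most recent matching opener.
                array_splice($stack, end($keys), 1);
            }
        }
    }

    // Close whatever is still open, innermost first.
    $suffix = '';
    foreach (array_reverse($stack) as $opener) {
        $suffix .= '\\' . $closerFor[$opener];
    }

    return $prefix . $sentence . $suffix;
}
```

Applied to the three pieces of the example in the comment, `'Sentence \i1.'` gains a trailing `\I`, `'Sentence 2.'` is unchanged, and `'Sentence\I 3.'` gains a leading `\i`, so each sentence can be placed in a snippet on its own.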
