OdfTableCell.getDisplayText() includes comment timestamp and text. #220

Open
ttaomae opened this issue May 22, 2023 · 3 comments

@ttaomae

ttaomae commented May 22, 2023

If an ODS spreadsheet cell contains a comment, the OdfTableCell.getDisplayText() method returns a string which looks something like: "2023-05-22T00:00:00CommentCell" where I assume the timestamp is the time of the comment, "Comment" is the comment text and "Cell" is the cell text.

Is this intentional? And if so, is there any way to obtain just the cell text?

@svanteschubert
Contributor

Hi, the easiest way is to create a simple test document (with a comment and cell text) and alter an existing test to debug what the code does. I believe you are on the right track. I must admit I do not remember the details, and this is exactly how I would proceed. You could also grep the odfdom test folder for "cell" (or "comment") to see if there is already a test that covers this.

Happy hunting! Feel free to add your findings or further questions here (or perhaps others have something to add).

@ttaomae
Author

ttaomae commented May 27, 2023

I experimented with the OdfTableCell API and I wasn't able to find anything that does what I need. I also tried searching and didn't find any tests related to comments or annotations (since comments are represented as an <office:annotation>).

Based on the implementation of OdfTableCell.getDisplayText(), I wrote the following method which does what I need in the cases I've tested so far.

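import org.odftoolkit.odfdom.doc.table.OdfTableCell;
import org.odftoolkit.odfdom.dom.element.office.OfficeAnnotationElement;
import org.odftoolkit.odfdom.incubator.doc.text.OdfTextParagraph;
import org.odftoolkit.odfdom.incubator.doc.text.OdfWhitespaceProcessor;

// Returns the text of the cell's direct children, skipping comments.
static String getCellText(OdfTableCell cell)
{
    var result = new StringBuilder();
    var whitespaceProcessor = new OdfWhitespaceProcessor();
    var nodes = cell.getOdfElement().getChildNodes();

    for (int i = 0; i < nodes.getLength(); i++) {
        var node = nodes.item(i);
        // Ignore comments (stored as <office:annotation> children of the cell).
        if (!(node instanceof OfficeAnnotationElement)) {
            // Add a line break before every paragraph except the first.
            if (result.length() != 0 && node instanceof OdfTextParagraph) {
                result.append("\n");
            }
            result.append(whitespaceProcessor.getText(node));
        }
    }

    return result.toString();
}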

It is still not clear to me whether getDisplayText() is behaving as expected. The docs say that it returns "the text displayed in this cell", which I would consider inaccurate, or at least misleading, for a few reasons.

  • I would argue that the comment is not technically "displayed in [the] cell".
  • I don't think the timestamp is displayed at all, at least in the version of LibreOffice Calc that I am running.
  • Even if there is no comment, the result does not match the cell text when there are multiple paragraphs, since it doesn't include line breaks between paragraphs.
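
To make the three points above concrete, here is a minimal sketch that uses only the JDK's DOM parser, not ODFDOM. The cell markup is hand-written and simplified, on the assumption that a comment is stored as an <office:annotation> child of the cell holding a <dc:date> and a <text:p>. The class and method names are made up for illustration. It shows how plain concatenation of every descendant's text would produce the string from the original report, and how skipping the annotation and separating paragraphs changes the result. It does not claim to reproduce what getDisplayText() does internally.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CellTextSketch {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Hand-written, simplified cell markup (an assumption, not taken from a real file).
    static final String COMMENT =
        "<office:annotation><dc:date>2023-05-22T00:00:00</dc:date>"
        + "<text:p>Comment</text:p></office:annotation>";
    static final String ONE_PARAGRAPH = cell(COMMENT + "<text:p>Cell</text:p>");
    static final String TWO_PARAGRAPHS =
        cell(COMMENT + "<text:p>Cell</text:p><text:p>Second</text:p>");

    // Wraps the given content in a table cell element with the namespaces declared.
    static String cell(String inner) {
        return "<table:table-cell"
            + " xmlns:table='urn:oasis:names:tc:opendocument:xmlns:table:1.0'"
            + " xmlns:office='" + OFFICE_NS + "'"
            + " xmlns:text='" + TEXT_NS + "'"
            + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
            + inner + "</table:table-cell>";
    }

    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Concatenates the text of every descendant, comment included.
    static String naiveText(String xml) {
        return parse(xml).getTextContent();
    }

    // Skips the annotation and puts a line break between paragraphs.
    static String cellText(String xml) {
        StringBuilder result = new StringBuilder();
        NodeList children = parse(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean isAnnotation = OFFICE_NS.equals(child.getNamespaceURI())
                && "annotation".equals(child.getLocalName());
            if (isAnnotation) {
                continue;
            }
            boolean isParagraph = TEXT_NS.equals(child.getNamespaceURI())
                && "p".equals(child.getLocalName());
            if (result.length() != 0 && isParagraph) {
                result.append('\n');
            }
            result.append(child.getTextContent());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(naiveText(ONE_PARAGRAPH)); // 2023-05-22T00:00:00CommentCell
        System.out.println(cellText(ONE_PARAGRAPH));  // Cell
        System.out.println(cellText(TWO_PARAGRAPHS)); // Cell, then Second on the next line
    }
}
```

Under these assumptions the timestamp in the output is just the text of the annotation's date element, which would explain why it appears even though Calc does not show it in the cell.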

@svanteschubert
Contributor

svanteschubert commented May 30, 2023

I wonder why the method is called getDisplayText() and not getTextContent()?

This getDisplayText() method
https://github.com/tdf/odftoolkit/blob/master/odfdom/src/main/java/org/odftoolkit/odfdom/doc/table/OdfTableCell.java#L697
is calling
https://github.com/tdf/odftoolkit/blob/master/odfdom/src/main/java/org/odftoolkit/odfdom/incubator/doc/text/OdfWhitespaceProcessor.java#L49
which incorrectly considers only children and no descendants.
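
As a generic illustration of the difference between visiting only direct children and visiting all descendants, here is a small sketch using the JDK's DOM API. It is not the OdfWhitespaceProcessor code. The element names and the class name are placeholders chosen for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescendantTextSketch {
    // Placeholder markup: a paragraph with one nested span.
    static final String SAMPLE = "<p>Hello <span>nested</span> world</p>";

    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collects text nodes that are direct children only; nested elements are ignored.
    static String childrenOnly(String xml) {
        StringBuilder out = new StringBuilder();
        NodeList children = parse(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            }
        }
        return out.toString();
    }

    // Collects text nodes at any depth, in document order.
    static String allDescendants(String xml) {
        StringBuilder out = new StringBuilder();
        collect(parse(xml), out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        System.out.println(childrenOnly(SAMPLE));   // the span's text is missing
        System.out.println(allDescendants(SAMPLE)); // Hello nested world
    }
}
```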

As stated in #229:
OdfElement has the base functionality to concatenate the text content:
https://github.com/tdf/odftoolkit/blob/master/odfdom/src/main/java/org/odftoolkit/odfdom/pkg/OdfElement.java#L2633
but every element that contains text nodes, such as OdfTextSpan, should override this method and define its specific behavior.
By this method implementation, the specific behavior.
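
A rough sketch of that idea, with hypothetical stand-in classes rather than the real OdfElement hierarchy: the base class concatenates the text of its children, and each element type overrides the method where it needs different behavior. The class names and the textContent() method are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class OverrideSketch {
    // Hypothetical stand-in for a generic element; not the ODFDOM OdfElement class.
    static class BaseElement {
        private final List<BaseElement> children = new ArrayList<>();

        BaseElement add(BaseElement child) {
            children.add(child);
            return this;
        }

        // Default behavior: concatenate the text of all children in document order.
        String textContent() {
            StringBuilder out = new StringBuilder();
            for (BaseElement child : children) {
                out.append(child.textContent());
            }
            return out.toString();
        }
    }

    // Leaf node that carries literal text.
    static class Text extends BaseElement {
        private final String value;

        Text(String value) {
            this.value = value;
        }

        @Override
        String textContent() {
            return value;
        }
    }

    // A comment contributes nothing to the text of its parent.
    static class Annotation extends BaseElement {
        @Override
        String textContent() {
            return "";
        }
    }

    // A line break contributes a newline even though it has no children.
    static class LineBreak extends BaseElement {
        @Override
        String textContent() {
            return "\n";
        }
    }

    // Builds a cell with a comment and a paragraph that contains a line break.
    static String demo() {
        BaseElement paragraph = new BaseElement()
            .add(new Text("Cell"))
            .add(new LineBreak())
            .add(new Text("text"));
        BaseElement cell = new BaseElement()
            .add(new Annotation().add(new Text("Comment")))
            .add(paragraph);
        return cell.textContent();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Cell, then text on the next line
    }
}
```

With this shape the caller never needs instanceof checks, because the decision about what counts as text lives in each element type.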

Finally, there is a third piece of functionality in https://github.com/tdf/odftoolkit/blob/master/odfdom/src/main/java/org/odftoolkit/odfdom/incubator/doc/text/OdfTextExtractor.java

These approaches should (and will) be harmonized to avoid duplicated implementations.

@svanteschubert svanteschubert self-assigned this May 30, 2023
@svanteschubert svanteschubert added this to the 1.0.0 milestone May 30, 2023