Add support for @itemprop="mainEntity" top-level items #1

Open, wants to merge 1 commit into master



@timvdalen commented on Mar 4, 2021

An item with @itemprop="mainEntity" is also a top-level entity: it is the primary entity described by the page, per https://schema.org/mainEntity

If this is not something you want to support in the library, let me know and I'll add a (private) fork to our package repo instead.

@BenMorel (Member)

Hi, thanks for your PR!
I just tried the example HTML code at https://schema.org/mainEntity (example 2), with and without your modifications.

Currently:

https://schema.org/WebPage
  - https://schema.org/breadcrumb: Books > Literature & Fiction > Classics
  - https://schema.org/mainEntity: (https://schema.org/Book)

With your PR:

https://schema.org/WebPage
  - https://schema.org/breadcrumb: Books > Literature & Fiction > Classics
  - https://schema.org/mainEntity: (https://schema.org/Book)
https://schema.org/Book
  - https://schema.org/image: http://www.example.com/catcher-in-the-rye-book-cover.jpg
  - ...

My remarks:

  • Book is now on the same level as WebPage; is that what we want?
  • Book is now present both at root level, and under WebPage as mainEntity; should it be filtered out from there?
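To make the two outputs above concrete, here is a minimal Python sketch using only the standard library; it is not the library's code. It relies on the WHATWG HTML microdata rule that an item is a top-level microdata item only if its element has no `itemprop` attribute, and it approximates this PR's change as "also promote items whose `itemprop` includes `mainEntity`". That approximation is inferred from the PR title and the output above, not from the diff. `ItemCollector`, `current_top_level` and `pr_top_level` are hypothetical names, `itemref` is ignored, and the HTML is an assumed, trimmed-down stand-in for example 2.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Records each itemscope element as a dict: its itemtype, the names in its
    own itemprop attribute, and its nearest enclosing item (or None).
    Simplification: itemref is not supported."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []  # (tag, item or None) for every open non-void element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        item = None
        if "itemscope" in a:
            parent = next((i for _, i in reversed(self._stack) if i is not None), None)
            item = {"type": a.get("itemtype") or "",
                    "props": (a.get("itemprop") or "").split(),
                    "parent": parent}
            self.items.append(item)
        if tag not in VOID:
            self._stack.append((tag, item))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; end tags with no match are ignored.
        for n in range(len(self._stack) - 1, -1, -1):
            if self._stack[n][0] == tag:
                del self._stack[n:]
                break


def current_top_level(items):
    # WHATWG HTML: an item is top-level if its element has no itemprop attribute.
    return [i for i in items if not i["props"]]


def pr_top_level(items):
    # Approximation of this PR: mainEntity items are promoted to the root as well.
    return [i for i in items if not i["props"] or "mainEntity" in i["props"]]


# Assumed, trimmed-down version of example 2 from https://schema.org/mainEntity.
EXAMPLE_2 = """
<div itemscope itemtype="https://schema.org/WebPage">
  <div itemprop="breadcrumb">Books &gt; Literature &amp; Fiction &gt; Classics</div>
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
    <img itemprop="image" src="catcher-in-the-rye-book-cover.jpg" alt="cover art"/>
  </div>
</div>
"""

collector = ItemCollector()
collector.feed(EXAMPLE_2)
collector.close()

current = [i["type"] for i in current_top_level(collector.items)]
with_pr = [i["type"] for i in pr_top_level(collector.items)]
book = collector.items[1]
print(current)                 # ['https://schema.org/WebPage']
print(with_pr)                 # ['https://schema.org/WebPage', 'https://schema.org/Book']
print(book["parent"]["type"])  # https://schema.org/WebPage
```

Under the approximated PR rule, the Book is promoted to the root while WebPage still owns it through `mainEntity`, which is the duplication raised in the second remark.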

@timvdalen (Author)

Thanks for your remarks! It's quite possible I've overlooked something, or that the page I'm testing this against isn't standards-compliant.

Before this change, the mainEntity item wasn't returned at all, since it isn't nested inside any other top-level item.

I will take some time tomorrow to test this against both my example and the example HTML code on schema.org and get back to you on your remarks.

@timvdalen (Author)

Indeed; the difference is that in the example I'm using, the mainEntity isn't a child of any other top-level item.
As far as I can see from the spec, that should be legal.

Here is the example from https://schema.org/mainEntity, edited to reflect the situation that prompted this PR:

<body>
	<div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">

		<img itemprop="image" src="catcher-in-the-rye-book-cover.jpg"
		     alt="cover art: red horse, city in background"/>
		<span itemprop="name">The Catcher in the Rye</span> -
		<link itemprop="bookFormat" href="https://schema.org/Paperback">Mass Market Paperback
		by <a itemprop="author" href="/author/jd_salinger.html">J.D. Salinger</a>

		<div itemprop="aggregateRating" itemscope itemtype="https://schema.org/AggregateRating">
			<span itemprop="ratingValue">4</span> stars -
			<span itemprop="reviewCount">3077</span> reviews
		</div>

		<div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
			Price: $<span itemprop="price">6.99</span>
			<meta itemprop="priceCurrency" content="USD" />
			<link itemprop="availability" href="https://schema.org/InStock">In Stock
		</div>

		Product details
		<span itemprop="numberOfPages">224</span> pages
		Publisher: <span itemprop="publisher">Little, Brown, and Company</span> -
		<meta itemprop="datePublished" content="1991-05-01">May 1, 1991
		Language: <span itemprop="inLanguage">English</span>
		ISBN-10: <span itemprop="isbn">0316769487</span>

		Reviews:

		<div itemprop="review" itemscope itemtype="https://schema.org/Review">
			<span itemprop="reviewRating">5</span> stars -
			<b>"<span itemprop="name">A masterpiece of literature</span>"</b>
			by <span itemprop="author">John Doe</span>,
			Written on <meta itemprop="datePublished" content="2006-05-04">May 4, 2006
			<span itemprop="reviewBody">I really enjoyed this book. It captures the essential
				challenge people face as they try make sense of their lives and grow to adulthood.</span>
		</div>

		<div itemprop="review" itemscope itemtype="https://schema.org/Review">
			<span itemprop="reviewRating">4</span> stars -
			<b>"<span itemprop="name">A good read.</span>" </b>
			by <span itemprop="author">Bob Smith</span>,
			Written on <meta itemprop="datePublished" content="2006-06-15">June 15, 2006
			<span itemprop="reviewBody">Catcher in the Rye is a fun book. It's a good book to read.</span>
		</div>

	</div>
</body>

Currently, this library doesn't detect any Things in the given snippet, while I think it should.
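A similar sketch shows why nothing is detected in this snippet: the Book's element carries `itemprop`, so under the WHATWG rule it is not a top-level item, yet there is no enclosing item to own its `mainEntity` property, so it is simply dropped. The Python below is a hypothetical illustration, not the library's implementation (`ItemCollector` and `orphaned_items` are made-up names, `itemref` is ignored), run against a trimmed copy of the snippet above.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Records each itemscope element with its itemtype, its own itemprop names
    and its nearest enclosing item (or None). itemref is not supported."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []  # (tag, item or None) for every open non-void element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        item = None
        if "itemscope" in a:
            parent = next((i for _, i in reversed(self._stack) if i is not None), None)
            item = {"type": a.get("itemtype") or "",
                    "props": (a.get("itemprop") or "").split(),
                    "parent": parent}
            self.items.append(item)
        if tag not in VOID:
            self._stack.append((tag, item))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; end tags with no match are ignored.
        for n in range(len(self._stack) - 1, -1, -1):
            if self._stack[n][0] == tag:
                del self._stack[n:]
                break


def current_top_level(items):
    # WHATWG HTML: an item is top-level if its element has no itemprop attribute.
    return [i for i in items if not i["props"]]


def orphaned_items(items):
    # Items that are a property value (they have itemprop) of no enclosing item.
    return [i for i in items if i["props"] and i["parent"] is None]


SNIPPET = """
<body>
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
    <span itemprop="name">The Catcher in the Rye</span>
    <link itemprop="bookFormat" href="https://schema.org/Paperback">Mass Market Paperback
    <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
      Price: $<span itemprop="price">6.99</span>
      <meta itemprop="priceCurrency" content="USD" />
    </div>
  </div>
</body>
"""

collector = ItemCollector()
collector.feed(SNIPPET)
collector.close()

top_level = [i["type"] for i in current_top_level(collector.items)]
orphans = [i["type"] for i in orphaned_items(collector.items)]
print(top_level)  # []
print(orphans)    # ['https://schema.org/Book']
```

The Offer is still attached to the Book as its parent, so once the Book is returned, its nested data comes with it.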

With that out of the way, I agree that the output for the example you've posted is also not what we want.
I can see two basic approaches:

  1. Only return the mainEntity as a top-level Thing if it doesn't have a parent item
  2. Keep returning the mainEntity at the root, but filter it out of its parent (Book out of WebPage) so we don't report it twice

I'm happy to update this PR to do either. What do you think is best here?
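For comparison, both options can be sketched against the nested case (example 2) and the orphaned case (the snippet above). This is again hypothetical Python, not the library's implementation: `approach_1`, `approach_2`, `render` and `collect` are made-up names, each result is summarised as (top-level type, types of items nested directly in it), the HTML samples are minimal assumed stand-ins, and `itemref` is ignored.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Records each itemscope element with its itemtype, its own itemprop names
    and its nearest enclosing item (or None). itemref is not supported."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []  # (tag, item or None) for every open non-void element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        item = None
        if "itemscope" in a:
            parent = next((i for _, i in reversed(self._stack) if i is not None), None)
            item = {"type": a.get("itemtype") or "",
                    "props": (a.get("itemprop") or "").split(),
                    "parent": parent}
            self.items.append(item)
        if tag not in VOID:
            self._stack.append((tag, item))

    def handle_endtag(self, tag):
        for n in range(len(self._stack) - 1, -1, -1):
            if self._stack[n][0] == tag:
                del self._stack[n:]
                break


def collect(html):
    parser = ItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def render(top, items, hide_main_entity):
    """Pair each top-level item type with the types of the items nested directly in it."""
    return [(t["type"], [c["type"] for c in items if c["parent"] is t
                         and not (hide_main_entity and "mainEntity" in c["props"])])
            for t in top]


def approach_1(items):
    # Promote a mainEntity item only when it has no enclosing item to belong to.
    top = [i for i in items
           if not i["props"] or ("mainEntity" in i["props"] and i["parent"] is None)]
    return render(top, items, hide_main_entity=False)


def approach_2(items):
    # Always promote mainEntity items, and hide them under their former owner.
    top = [i for i in items if not i["props"] or "mainEntity" in i["props"]]
    return render(top, items, hide_main_entity=True)


NESTED = """
<div itemscope itemtype="https://schema.org/WebPage">
  <div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book"></div>
</div>
"""

ORPHAN = """
<div itemprop="mainEntity" itemscope itemtype="https://schema.org/Book">
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer"></div>
</div>
"""

nested_1 = approach_1(collect(NESTED))
nested_2 = approach_2(collect(NESTED))
orphan_1 = approach_1(collect(ORPHAN))
orphan_2 = approach_2(collect(ORPHAN))
print(nested_1)  # [('https://schema.org/WebPage', ['https://schema.org/Book'])]
print(nested_2)  # [('https://schema.org/WebPage', []), ('https://schema.org/Book', [])]
print(orphan_1)  # [('https://schema.org/Book', ['https://schema.org/Offer'])]
print(orphan_2)  # [('https://schema.org/Book', ['https://schema.org/Offer'])]
```

In this sketch the two options agree on the orphaned case; they differ only for pages like example 2, where approach 1 keeps the current output and approach 2 moves the Book to the root.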
