Support other source data types #11

Open
dcramer opened this issue Sep 25, 2024 · 2 comments

Comments

@dcramer
Owner

dcramer commented Sep 25, 2024

Currently we only support PDF. We could obviously add Markdown support pretty easily, but there are two other things that would be awesome:

  1. Images - if nothing else, to be able to embed them. Technically we could OCR them, but that'll be hit and miss.
  2. Spreadsheets - not really sure what to do here, but for example Terraforming Mars has no information on any of its cards, and the cards are actually key to the game. The community has created a spreadsheet that we could use as a model.
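
For the spreadsheet case, one possible approach is to treat every row as its own small text document, so a card can be retrieved on its own. This is only a rough sketch: `rows_to_documents`, the column names, and the sample row are all made up for illustration and do not come from the community spreadsheet.

```python
import csv
import io


def rows_to_documents(csv_text: str, title_column: str) -> list[dict]:
    """Turn each spreadsheet row into one small text document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        title = (row.get(title_column) or "").strip()
        if not title:
            continue  # skip blank or untitled rows
        # Render "Column: value" lines so each row keeps its header context.
        lines = [
            f"{key}: {value.strip()}"
            for key, value in row.items()
            if key and key != title_column
            and isinstance(value, str) and value.strip()
        ]
        documents.append({"title": title, "text": "\n".join([title, *lines])})
    return documents


# Hypothetical sample data, not real card text.
SAMPLE = (
    "Name,Cost,Tags,Text\n"
    "Example Card,10,Building,Gain 1 example resource.\n"
    ",,,\n"
)
docs = rows_to_documents(SAMPLE, title_column="Name")
```

Keeping the column name next to each value means the indexed text still makes sense once it is separated from the sheet's header row.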
@dcramer
Owner Author

dcramer commented Sep 25, 2024

One thought here, and something we can curate, is having it be aware of BGG's API. I don't know that we wanna crawl the whole site and index it (seems hard), but we could at the very least tell it to index certain threads.

https://boardgamegeek.com/wiki/page/BGG_XML_API2
https://github.com/WanielDeiss/rx-bgg-api
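
To make "index certain threads" a bit more concrete, here is a minimal sketch of building a thread request URL and flattening the response into posts. The element and attribute names (`thread`, `subject`, `article`, `username`, `body`) are my reading of the XML API2 wiki page and should be checked against a live response; `thread_url`, `parse_thread`, and the sample XML are hypothetical. No network call is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BGG_API_ROOT = "https://boardgamegeek.com/xmlapi2"


def thread_url(thread_id: int) -> str:
    """Build the request URL for one curated thread."""
    return f"{BGG_API_ROOT}/thread?{urlencode({'id': thread_id})}"


def parse_thread(xml_text: str) -> dict:
    """Flatten a thread response into a subject plus a list of posts."""
    root = ET.fromstring(xml_text)
    articles = [
        {
            "author": article.get("username", ""),
            "body": (article.findtext("body") or "").strip(),
        }
        for article in root.iter("article")
    ]
    return {
        "id": root.get("id"),
        "subject": (root.findtext("subject") or "").strip(),
        "articles": articles,
    }


# Hand-written sample in the assumed response shape, not real BGG content.
SAMPLE_THREAD = """<thread id="1" numarticles="2">
  <subject>Example rules question</subject>
  <articles>
    <article id="10" username="alice"><subject>Example rules question</subject><body>How does X work?</body></article>
    <article id="11" username="bob"><subject>Re: Example rules question</subject><body>It works like Y.</body></article>
  </articles>
</thread>"""

parsed = parse_thread(SAMPLE_THREAD)
```

A curated list would then just be a set of thread ids per game, each fetched via `thread_url` and fed through `parse_thread` before indexing.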

@dcramer
Owner Author

dcramer commented Sep 25, 2024

Open question: If we automatically indexed BGG, what would the strategy be?

Maybe cut off threads based on number of replies? Only pinned? (pinned seems risky)

https://boardgamegeek.com/boardgame/167791/terraforming-mars/forums/66?pageid=1&sort=hot
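
A reply-count cutoff could look something like the sketch below. The `ThreadSummary` shape, the `min_replies` threshold, and the `limit` are all assumptions picked for illustration, not tuned values; the thread metadata would come from wherever the forum listing is fetched.

```python
from dataclasses import dataclass


@dataclass
class ThreadSummary:
    thread_id: int
    subject: str
    num_articles: int  # posts in the thread, including the opening post


def select_threads(threads, min_replies=5, limit=50):
    """Keep threads with at least `min_replies` replies, busiest first."""
    eligible = [t for t in threads if t.num_articles - 1 >= min_replies]
    eligible.sort(key=lambda t: t.num_articles, reverse=True)
    return eligible[:limit]


# Hypothetical listing, not real forum data.
listing = [
    ThreadSummary(1, "Quiet question", 2),
    ThreadSummary(2, "Busy rules debate", 40),
    ThreadSummary(3, "Moderate discussion", 6),
]
selected = select_threads(listing, min_replies=5, limit=10)
```

Reply count is only a proxy for usefulness, so a long off-topic thread would still pass this filter; it is a cheap first cut rather than a quality signal.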

There's still an issue in some cases where the material we need ends up being an external doc. Also, it's possible the content is not that heavy to index, though we'd have to store it in Postgres and run tsvector + embeddings on the whole thing.
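
As a rough idea of what the Postgres side might involve, here is a sketch of a table with a generated `tsvector` column and an embedding column. The table name, column names, and the 1536 dimension are hypothetical, and the `vector` type assumes the pgvector extension is available; none of this reflects the project's actual schema.

```python
# Assumption: pgvector provides the `vector` type; the dimension depends on
# whichever embedding model ends up being used.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS bgg_article (
        id        bigint PRIMARY KEY,
        thread_id bigint NOT NULL,
        body      text NOT NULL,
        body_tsv  tsvector GENERATED ALWAYS AS
                  (to_tsvector('english', body)) STORED,
        embedding vector(1536)
    )
    """,
    "CREATE INDEX IF NOT EXISTS bgg_article_tsv_idx "
    "ON bgg_article USING GIN (body_tsv)",
]


def apply_schema(cursor) -> int:
    """Run each statement on a DB-API cursor; returns how many were run."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)
    return len(SCHEMA_STATEMENTS)
```

With a generated column, Postgres keeps the `tsvector` in sync on insert and update, so only the embeddings would need a separate processing step.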
