This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

[FEAT] New dataset of parser IF #241

Open
Ordinaryperson2 opened this issue Jan 29, 2020 · 0 comments
Labels
enhancement New feature or request

@Ordinaryperson2

🚀 Feature Request

This webpage hosts gameplay transcripts for a huge variety of parser-based interactive fiction (IF) games.
With some processing, this looks like a good source of training data to me, though some of the edits would probably need to be done by hand.

Processing the transcripts themselves should include replacing terse commands with prose (e.g. turning '> x desk' into 'You look at the desk'), removing commands the parser didn't recognise along with their responses, and removing the out-of-world "about" sections.
There's also the issue that IRC chat is interspersed with the game transcript on the page. I assume this is easy to filter out for someone who knows what they're doing, or perhaps the webmaster has separate logs.
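A rough sketch of the per-transcript cleanup described above, assuming the dumped file has one game line per line and player commands prefixed with "> " (the format the filtering script in this issue aims to produce). The rejection messages it looks for (typical Inform-style library replies such as "That's not a verb I recognise." and "You can't see any such thing.") and the list of out-of-world commands are assumptions that would need adjusting per game; `clean_transcript` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical cleanup pass: reads a transcript on stdin, writes it to stdout.
# Assumes player commands start with "> ". The rejection messages and the
# out-of-world command list are assumptions; adjust them per game.
clean_transcript() {
  awk '
    # Print the buffered command/response block unless it was flagged.
    function flush() {
      if (block != "" && !skip) printf "%s", block
      block = ""; skip = 0
    }
    /^> / {
      flush()
      cmd = tolower(substr($0, 3))
      # Drop out-of-world commands (and their output) entirely.
      if (cmd ~ /^(about|credits|help|info)$/) skip = 1
      # Rewrite "x desk" / "examine the desk" as "You look at the desk."
      if (match(cmd, /^(x|examine) /)) {
        obj = substr(cmd, RLENGTH + 1)
        sub(/^the /, "", obj)
        block = "You look at the " obj ".\n"
      } else {
        block = $0 "\n"
      }
      next
    }
    {
      # If the parser rejected the command, drop the whole block.
      if ($0 ~ /not a verb I recognise|You can.t see any such thing/) skip = 1
      block = block $0 "\n"
    }
    END { flush() }
  '
}
```

Usage might look like `clean_transcript < Dumps/game.txt > game.clean.txt` (file names hypothetical). Buffering each command together with its response means an unrecognised command and the parser's complaint are dropped as one unit rather than leaving orphaned lines.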

With this data, I'd hope the AI could get better at describing objects, people, and places, since IF does a lot of that, whereas the current data seems stronger on conversations and actions.

I started writing a script to download the transcripts and filter out the IRC chat, but I've realised my approach isn't a great one. I'm not very proficient with the command line, so I'll leave this to more capable hands if there are any takers.

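```bash
#!/bin/bash
# Usage: <script> <page-url> <output-name>
set -eu
mkdir -p ./Dumps

# Render the page as plain text; a very wide dump avoids line wrapping.
elinks -dump -dump-width 999 "$1" |
  # Keep Floyd's game output and the commands players sent "(to Floyd)".
  grep -E ' Floyd \||\(to [Ff]loyd\)' |
  # Strip the "Floyd |" prefix, turn "Name says (to Floyd)," into ">",
  # blank bare ">" lines, drop double quotes on command lines, blank lines
  # starting with "*" and ending in "el" plus one character, normalise the
  # prompt, and cut everything from a run of 8 spaces onward.
  sed -e 's/[[:space:]]*Floyd [|][[:space:]]*//g' \
      -e 's/[[:space:]][[:alnum:]]* says (to [Ff]loyd),/>/gi' \
      -e 's/^>$//g' \
      -e '/>/ { s/"//g; }' \
      -e 's/^\*.*el.$//g' \
      -e 's/^ *> /> /g' \
      -e 's/        .*//g' \
  > "./Dumps/$2.txt"
```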
Ordinaryperson2 added the enhancement label on Jan 29, 2020.