
PrincetonCompMemLab/narrative


Notification:

Since the coffee shop world generator is under active development, the usage might change every once in a while. I will try my best to keep the documentation up to date!

Note that projectSEM/narrative is a forked repo. I work with the original repo at PrincetonCompMemLab/narrative directly, and I will push to projectSEM/narrative whenever I make a major update.

Resources

Some related papers and data sets can be found here

Slides from Sep 2017 Princeton meeting:

History, current status, and future of coffee shop world (7:30-8:15pm)

  • Alex - The coffee shop story generator - slides
  • Andre - Behavioral experiments - slides
  • Qihong - Neural networks for schema learning - slides

The coffee shop world "engine"

The "engine" takes a schema and generates a bunch of stories! Concretely, a schema is a graph {V,E} whose vertices are states and whose edges are transitions between them. Each state is a sentence whose role slots can be bound to role fillers (for example, the Subject role bound to Mariko).
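As a rough illustration of the idea (not the engine's actual implementation), the sketch below encodes a toy schema as a dictionary-based graph and walks it to produce a story. `TOY_SCHEMA`, `generate_story`, the `{Subject}`-style slot syntax, and the shortened sentences are all hypothetical; the real schema file format may look quite different.

```python
import random

# Hypothetical toy schema: each state has a sentence template and a list of
# (next_state, probability) transitions. This is NOT the real schema format.
TOY_SCHEMA = {
    "BEGIN": {
        "text": "{Subject} walked into the coffee shop on poetry night.",
        "next": [("Sit_down", 1.0)],
    },
    "Sit_down": {
        "text": "{Subject} found an empty chair next to {Friend}.",
        "next": [("Subject_performs", 0.9), ("Subject_declines", 0.1)],
    },
    "Subject_performs": {
        "text": "{Subject} then took a turn at the microphone.",
        "next": [("END", 1.0)],
    },
    "Subject_declines": {
        "text": "{Subject} decided not to share a poem today.",
        "next": [("END", 1.0)],
    },
    "END": {"text": "{Subject} and {Friend} said their goodbyes.", "next": []},
}


def generate_story(schema, fillers, start="BEGIN", rng=None):
    """Walk the graph from `start`, binding role fillers into each state."""
    rng = rng or random.Random()
    state, sentences = start, []
    while True:
        node = schema[state]
        sentences.append(node["text"].format(**fillers))
        if not node["next"]:  # terminal state: no outgoing edges
            break
        names, probs = zip(*node["next"])
        state = rng.choices(names, weights=probs, k=1)[0]
    return " ".join(sentences)


story = generate_story(TOY_SCHEMA, {"Subject": "Mariko", "Friend": "Sarah"},
                       rng=random.Random(0))
print(story)
```

Swapping the `fillers` dictionary gives a different story over the same graph, which is the sense in which a state is "bound" to role fillers.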

1. Generate stories run_engine.py

Currently, we have two schemas: "poetry reading" and "fight". Here's an example of poetry reading:

Mariko walked into the coffee shop on poetry night. She found an empty chair next to Sarah. "Oh hi there Mariko!" said Sarah. "I am glad you could make it Sarah!" Mariko replied. Olivia, who was the emcee for tonight, walked to the front of the room and introduced the first poet, Julian. Julian stepped up to the microphone and read the poem that he had written: "I wandered lonely as a cloud that floats on high over vales and hills, when all at once I saw a crowd, A host, of golden daffodils." The crowd snapped their fingers politely. Mariko had also written a poem, but decided that she was not in the mood to share it today. After all the poets had performed, Mariko and Sarah said their goodbyes and walked toward the door. Mariko made a mental note to come back again next week.

how to use

In the terminal, cd to src/, and run the command in the following format:

python run_engine.py [schema_file_1] [schema_file_2] ... [schema_file_k] [niter] [nrepeats]

where schema_file_i is a txt schema file, and niter and nrepeats together control how many stories are generated and how they alternate across the k schemas. For example, the following command generates 8 stories (alternating between 2 poetry stories and 2 fight stories, for 2 iterations).

python run_engine.py poetry fight 2 2

After running the command, you will see a file called schemaFiles_niter_nrepeats.txt under the story/schemaFiles_niter_nrepeats/ directory, along with a "question file" that can be used for a two-alternative forced choice (2AFC) next-state prediction task.

For example, one block of the question file looks like this:
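The naming convention can be sketched as a small helper. `expected_story_path` is a hypothetical name, and the sketch assumes that the schema names are joined with underscores, as in the `poetry_fight_2_2` directory name used in the post-processing step. Where the `story/` directory sits relative to `src/` is not stated here, so the path is left relative.

```python
from pathlib import PurePosixPath


def expected_story_path(schema_names, niter, nrepeats, root="story"):
    """Return the story file path implied by schemaFiles_niter_nrepeats."""
    # Assumption: schema names, niter and nrepeats are joined with "_".
    tag = "_".join([*schema_names, str(niter), str(nrepeats)])
    return PurePosixPath(root) / tag / f"{tag}.txt"


print(expected_story_path(["poetry", "fight"], 2, 2))
# story/poetry_fight_2_2/poetry_fight_2_2.txt
```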

Pradeep stepped up to the microphone and read the poem ...... The crowd snapped their fingers politely.
Alter_1_fillers: Sarah->Pradeep (Friend->Poet) 
0, Subject_performs, Will then took at turn at the microphone. "Love at the lips was touch, as sweet as I could bear, and once that seemed too much, I lived on air." When he sat back down, Pradeep said that he loved the poem.
Permute_2_fillers: Sarah->Olivia (Friend->Emcee) Will->Sarah (Subject->Friend) 
0, Subject_performs, Sarah then took at turn at the microphone. "You may write me down in history with your bitter, twisted lies, you may trod me in the very dirt but still, like dust, I will rise." When she sat back down, Olivia said that she loved the poem.
Subject.Mood == "happy"
0.1, Subject_declines, Will had also written a poem, but decided that he was not in the mood to share it today.
Truth
0.9, Subject_performs, Will then took at turn at the microphone. "Love at the lips was touch, as sweet as I could bear, and once that seemed too much, I lived on air." When he sat back down, Sarah said that she loved the poem.

The first line shows the current state. After that, each two-line section represents one alternative next state: a header line followed by the state itself. For example, the header Alter_1_fillers: Sarah->Pradeep (Friend->Poet) means that the state below it was generated by altering one of the fillers (Sarah became Pradeep). Similarly, the state below Permute_2_fillers: Sarah->Olivia (Friend->Emcee) Will->Sarah (Subject->Friend) was generated by permuting several fillers. All states generated by altering or permuting fillers are technically "impossible", so they always have probability zero. In contrast, the header Subject.Mood == "happy" means that the state below it was generated by moving to an alternative future state (which is possible when the current state has more than one successor), and that the next state depends probabilistically on the subject's mood. If the next state does not depend on the properties of any filler, the header is marked as default. Finally, the state under Truth is the actual next state.
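A minimal sketch of reading one such block, assuming it has exactly the layout shown above: one line for the current state, then (header, "probability, state name, sentence") pairs. `parse_question_block` is a hypothetical helper and not part of the repo, how blocks are separated in the real question file is not covered here, and the sample sentences are shortened.

```python
def parse_question_block(lines):
    """Split one question block into the current state and its candidates."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    current, rest = lines[0], lines[1:]
    candidates = []
    # Candidates come in (header, state) pairs.
    for header, state_line in zip(rest[0::2], rest[1::2]):
        # Split on the first two commas only; the sentence may contain commas.
        prob, name, text = state_line.split(",", 2)
        candidates.append({
            "header": header,
            "prob": float(prob),
            "state": name.strip(),
            "text": text.strip(),
            "is_truth": header == "Truth",
        })
    return current, candidates


block = """\
Pradeep stepped up to the microphone and read the poem.
Alter_1_fillers: Sarah->Pradeep (Friend->Poet)
0, Subject_performs, Will then took a turn at the microphone.
Subject.Mood == "happy"
0.1, Subject_declines, Will had also written a poem, but decided not to share it.
Truth
0.9, Subject_performs, Will then took a turn at the microphone.
""".splitlines()

current, candidates = parse_question_block(block)
for cand in candidates:
    print(cand["prob"], cand["state"], cand["is_truth"])
```

For a 2AFC trial, one option would be to pair the `Truth` candidate with any one of the other candidates.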

Optional output: In run_engine.py, several parameters of the main method (listed below) control options for the story output file. You can use any combination of them, except that gen_symbolic_states and attach_role_marker cannot both be enabled. Currently, these features are not supported as command line arguments and need to be set in the file run_engine.py.

mark_end_state = False
attach_questions = False
gen_symbolic_states = False
attach_role_marker = False
attach_role_maker_before = ['Pronoun', 'Name', 'Pronoun_possessive', 'Pronoun_object']

  1. mark_end_state

When this is True, the engine generates stories with "end-of-state markers" and "end-of-story markers".

For example:

Mariko walked into the coffee shop on poetry night. ENDOFSTATE She found an empty chair next to Sarah. "Oh hi there Mariko!" said Sarah. "I am glad you could make it Sarah!" Mariko replied. ENDOFSTATE ......
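Downstream code can use these markers to recover state boundaries. The sketch below splits a marked story on the `ENDOFSTATE` token shown above; `split_states` is a hypothetical helper. The end-of-story token is not shown in this example, so it is not handled here.

```python
def split_states(story, marker="ENDOFSTATE"):
    """Split a marked story into a list of state strings."""
    return [chunk.strip() for chunk in story.split(marker) if chunk.strip()]


story = ('Mariko walked into the coffee shop on poetry night. ENDOFSTATE '
         'She found an empty chair next to Sarah. '
         '"Oh hi there Mariko!" said Sarah. ENDOFSTATE')
states = split_states(story)
print(len(states))  # 2
```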

  1. attach_questions

Note that this has NOTHING to do with the question file. When this is turned on, each sentence of the story file is followed by "question markers".

For example:

Mariko walked into the coffee shop on poetry night. QSubject She found an empty chair next to Sarah. "Oh hi there Mariko!" said Sarah. "I am glad you could make it Sarah!" Mariko replied. QFriend QSubject

  1. attach_role_marker

When this is turned on, the engine inserts the role name before each filler.

For example:

Subject Mariko walked into the coffee shop on poetry night. Subject she found an empty chair next to Friend Sarah. "Oh hi there Subject Mariko!" said Friend Sarah. "I am glad you could make it Friend Sarah!" Subject Mariko replied.
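A rough sketch of how role-marked text could be consumed, assuming each marker is a role name followed by a single-word filler, as in the example above. The role list below is taken from the examples in this README and may be incomplete; `extract_bindings` and `strip_role_markers` are hypothetical helpers.

```python
import re

# Role names seen in this README's examples; the engine's full set of roles
# is an assumption.
ROLES = ("Subject", "Friend", "Emcee", "Poet")


def extract_bindings(text, roles=ROLES):
    """Return (role, filler) pairs for every role marker found in `text`."""
    pattern = re.compile(r"\b(%s) (\w+)" % "|".join(roles))
    return pattern.findall(text)


def strip_role_markers(text, roles=ROLES):
    """Remove role markers, leaving only the fillers."""
    pattern = re.compile(r"\b(?:%s) (?=\w)" % "|".join(roles))
    return pattern.sub("", text)


marked = ("Subject Mariko walked into the coffee shop on poetry night. "
          "Subject she found an empty chair next to Friend Sarah.")
print(extract_bindings(marked))
print(strip_role_markers(marked))
```

Note that this simple pattern would also match an ordinary capitalized word such as "Friend" used in running text, so it is only suitable for engine output where those words appear solely as markers.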

  1. gen_symbolic_states

When this is turned on, the engine generates "symbolic states".

For example, the story will look like:

Mariko BEGIN, Mariko Sit_down Sarah, Olivia Emcee_intro Julian, Julian Poet_performs, Mariko Subject_performs Sarah, Mariko Say_goodbye Sarah, Mariko END.
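The symbolic format is easy to parse. The sketch below reads each comma-separated event as a state name surrounded by fillers; treating the first token as the acting filler and an optional third token as a second filler is an interpretation of the example above, not something the engine documents. `parse_symbolic_story` is a hypothetical helper.

```python
def parse_symbolic_story(line):
    """Parse 'Filler State [Filler], ...' into (first, state, second) tuples."""
    events = []
    for chunk in line.strip().rstrip(".").split(","):
        tokens = chunk.split()
        if not tokens:
            continue
        first, state = tokens[0], tokens[1]
        second = tokens[2] if len(tokens) > 2 else None
        events.append((first, state, second))
    return events


line = ("Mariko BEGIN, Mariko Sit_down Sarah, Olivia Emcee_intro Julian, "
        "Julian Poet_performs, Mariko Subject_performs Sarah, "
        "Mariko Say_goodbye Sarah, Mariko END.")
events = parse_symbolic_story(line)
print([state for _, state, _ in events])
```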

Functionalities:

  • generate stories according to some input schema
  • alternate between k schemas
  • insert state/story boundaries
  • generate two-alternative forced choice (2AFC) questions for the next state
  • generate symbolic states
  • plot the graph of the schema (markov model)
  • add "higher order schema"
  • generate sentences, interleaved across k stories?


2. Post-processing proc_txt.py

how to use

Once you have generated a text file that contains a bunch of stories (from step 1) under the story/schemaFiles_niter_nrepeats/ directory, cd to src/ in the terminal and run the command in the following format:

python proc_txt.py schemaFiles_niter_nrepeats

where schemaFiles_niter_nrepeats is the name of the directory you just generated from step 1. For example, the following cmd is valid:

python proc_txt.py poetry_fight_2_2

This procedure generates 3 directories, shuffle_none/, shuffle_words/, and shuffle_states/, for training data with no shuffling, shuffled words, and shuffled states, respectively. Inside each directory, you will see a bunch of files:

  • chars_we.txt - story with end markers
  • chars_woe.txt - story without end markers
  • words_*.npz - python friendly data file, has subfields train and valid for training and validation
  • word_dict.pickle - the dictionary of the token words
  • metadata.txt - text file version of word_dict.pickle (might add more things in the future)
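A minimal loading sketch for these files, assuming the `.npz` file can be read with NumPy's default settings and exposes the `train` and `valid` fields listed above. The exact `words_*.npz` file name and the array contents are not specified here, so the helper simply picks the first match; `load_processed` is a hypothetical name.

```python
import glob
import os
import pickle

import numpy as np


def load_processed(data_dir):
    """Load the first words_*.npz file and word_dict.pickle from `data_dir`."""
    npz_paths = sorted(glob.glob(os.path.join(data_dir, "words_*.npz")))
    if not npz_paths:
        raise FileNotFoundError(f"no words_*.npz file in {data_dir}")
    # Arrays are read while the archive is open, then the file is closed.
    with np.load(npz_paths[0]) as data:
        train, valid = data["train"], data["valid"]
    # Only unpickle files you generated yourself; pickle is not safe for
    # untrusted input.
    with open(os.path.join(data_dir, "word_dict.pickle"), "rb") as f:
        word_dict = pickle.load(f)
    return train, valid, word_dict
```

Call it with the path to one of the generated directories (for example the `shuffle_none/` directory) to get the training split, the validation split, and the token dictionary.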

Functionalities:

  • separate training vs. test set and save to .npz file
  • remove punctuation marks
  • transform characters to lower case
  • insert state/story boundaries
  • convert character representations to word representations
  • shuffle/reverse the order of words within each state
  • shuffle/reverse the order of sentences within each story
    • shuffle/reverse every k-words segment (Amy)
  • convert to a cleaner key-value binding test
    • replace non-filler words by "blah"
    • replace the k-th non-filler consecutive-words by "blah_k"
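To make a few of these steps concrete, here is a small sketch of punctuation removal, lowercasing, within-state word shuffling, and "blah" replacement. It illustrates the ideas only and is not the code in proc_txt.py; the function names are hypothetical, and fillers are assumed to be single lowercase words after cleaning.

```python
import random
import string


def clean_state(text):
    """Lowercase a state and strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


def shuffle_words(state, rng):
    """Return the state with its words in random order."""
    words = state.split()
    rng.shuffle(words)
    return " ".join(words)


def blah_non_fillers(state, fillers):
    """Replace every word that is not a filler by 'blah'."""
    return " ".join(w if w in fillers else "blah" for w in state.split())


state = clean_state('"Oh hi there Mariko!" said Sarah.')
print(state)                                         # oh hi there mariko said sarah
print(blah_non_fillers(state, {"mariko", "sarah"}))  # blah blah blah mariko blah sarah
print(shuffle_words(state, random.Random(0)))        # same words, random order
```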

Let me know if you have more suggestions - [email protected]
