
[WIP] Changes for LLM evaluation project #363

Closed
wants to merge 81 commits

81 commits
8233a39
add open vr as a submodule
May 1, 2024
6b9f956
modified object states
May 1, 2024
63ae43d
added transition_model
May 1, 2024
e95d6e5
add human annotations of demos
May 1, 2024
4db4a10
100 tasks passed
May 3, 2024
0b1a95b
refined llm eval
May 3, 2024
40e3d38
changed position
May 3, 2024
1521819
changed location of scripts
May 3, 2024
408a3ea
refined import
May 3, 2024
d2f726b
added evoling graph
May 6, 2024
6b6b21d
added results analysis
May 6, 2024
f87d8b4
add behavior pddl domain filw
May 6, 2024
8ab927b
added new scripts for generating LLM prompts
May 8, 2024
4aa6dd3
repositioned files
May 8, 2024
72e8cd1
added important data files
May 8, 2024
b756ae9
changed import
May 8, 2024
8c81499
chaned loaction of gpt_utils.py
May 8, 2024
1d1c019
added transition_model evaluation
May 10, 2024
46d19ac
added transition_modeling
May 10, 2024
5d3b12a
renamed human annotations, tested transition modeling part
May 12, 2024
09cf4a5
renamed human_annotations action_sequence_human_annotations
May 12, 2024
50d834a
refined error analysis
May 12, 2024
f31331a
renamed final rst of action sequence batch evaluation
May 13, 2024
b56d00c
deleted old evaluation scripts
May 13, 2024
4219c6a
modified pddl domain files
May 13, 2024
3d55f85
bddl to tl translation
QinengWang-Aiden May 14, 2024
21facf5
added state dict
May 15, 2024
4a409eb
Merge branch 'master' of https://github.com/JamesKrW/iGibson
May 15, 2024
c6ea6b0
added inhandofrobot to state dict
May 15, 2024
e9bc42c
subgoal_gen
QinengWang-Aiden May 16, 2024
b86a7f5
Merge branch 'master' of github.com:JamesKrW/iGibson
QinengWang-Aiden May 16, 2024
5396eb4
renamed transition_model_graph for evolving_graph
May 16, 2024
243bcf1
Merge branch 'master' of https://github.com/JamesKrW/iGibson
May 16, 2024
07a493f
added goal interpretation code
bryanzhou008 May 16, 2024
2c69038
repositioned scripts, added check success for graphstate
May 16, 2024
dc9216a
changed location of scripts
May 16, 2024
e92a617
fixed bug for slicable objects reformatted state dict
May 16, 2024
9ef42ad
fixed bug in evolving graph,reannotated action sequences
May 17, 2024
f43c43e
added pddl generatiom pipeline
May 20, 2024
df46547
generate problem pddl for behavior100 bddl
May 20, 2024
c96d345
reanamed bahavior bddl info folder
May 20, 2024
3f51822
minor changes
May 21, 2024
439f2df
redefined special names
May 21, 2024
e2b0687
removed navigate to in igibson/evolving_graph/evolving_graph.py
May 22, 2024
7e2a019
removed taskonlyobjets option in evolving_graph
May 22, 2024
ce5a3e6
subgoal evaluation part finished
QinengWang-Aiden May 22, 2024
2b57ce6
update eval subgoal part
QinengWang-Aiden May 23, 2024
23f5fcf
modified action def
May 23, 2024
282da75
refined evluation for action sequence
May 23, 2024
651f1f7
modified evaluation for action sequence
May 25, 2024
77694a9
modified evaluation for action sequence
May 25, 2024
3f8ba83
modified as prompt
May 25, 2024
30021c8
modified check success, human annotation and grasp
May 26, 2024
9dd6450
modified pddl generator
May 26, 2024
617e1f7
modified prompt
May 26, 2024
ebd37e7
modifed evaluation for as
May 27, 2024
b936990
restructured data
May 27, 2024
49632ac
restructured data
May 27, 2024
3fe580d
added resources for pddl
May 27, 2024
1f3c41e
eval all llm
May 27, 2024
b1f6842
update evolving_graph
May 28, 2024
70be386
fix evolving graph bugs
May 29, 2024
4c0dfb0
debugged clean in evolving graph
Jun 2, 2024
9279747
debugged asevaluator
Jun 2, 2024
c0dbcf8
update path in transition modeling
Jun 2, 2024
4244957
modifed naming of action sequence prompt
Jun 11, 2024
187daf7
modified naming for t_m prompts
Jun 11, 2024
38ea108
update raw subgoal decomposition
QinengWang-Aiden Jun 17, 2024
844bf6d
slight modify raw subgoal decomposition
QinengWang-Aiden Jun 17, 2024
2bd4335
update evol_graph
QinengWang-Aiden Jun 18, 2024
b2535e9
align envoling_graph/env with subgoal evaluation
Jun 18, 2024
a518d05
moved human human_annotations
Jun 18, 2024
3c1467b
moved human annotation, added demo names
Jun 18, 2024
698d7f1
Merge branch 'master' of github.com:JamesKrW/iGibson
QinengWang-Aiden Jun 18, 2024
8c5e9d4
merge branches with slight modification
QinengWang-Aiden Jun 18, 2024
f37befc
Merge remote-tracking branch 'origin/subgoal_evaluation'
QinengWang-Aiden Jun 18, 2024
b631767
subgoal input prompt gen part refined
QinengWang-Aiden Jun 19, 2024
ec8d043
update input prompt refine part
QinengWang-Aiden Jun 19, 2024
f60452c
update forml eval subgoal
QinengWang-Aiden Jun 19, 2024
f477c84
merged data and prompt
bryanzhou008 Jun 20, 2024
84eecf5
updated goal interpretation code
bryanzhou008 Jun 20, 2024
2 changes: 1 addition & 1 deletion igibson/evaluation/eval_subgoal_plan/.gitignore
@@ -1,4 +1,4 @@
generate_vocab.py
analyze*.py

*bat
*.bat
173 changes: 151 additions & 22 deletions igibson/evaluation/eval_subgoal_plan/checkers.py
@@ -9,7 +9,7 @@
from igibson.evolving_graph.eval_evolving_graph_env import EvalGraphEnv
from igibson.evolving_graph.evolving_graph import EvolvingGraph, GraphState, ErrorType, ErrorInfo
from igibson.evaluation.eval_subgoal_plan.tl_formula.simple_tl_parser import parse_simple_tl
from igibson.evaluation.eval_subgoal_plan.tl_formula.simple_tl import SimpleTLExpression, Proposition, Action
from igibson.evaluation.eval_subgoal_plan.tl_formula.simple_tl import SimpleTLExpression, Proposition, Action, SimpleTLNot, SimpleTLPrimitive
from igibson.evaluation.eval_subgoal_plan.tl_formula.simple_tl import extract_args, extract_propositions_and_actions, sample_a_determined_path_from_tl_expr
from typing import List, Dict, Any, Optional, Tuple, Union

@@ -119,7 +119,8 @@ def run_checker(self) -> bool:
'''This method runs the checker to check the subgoal plan.

Returns:
bool: whether the subgoal plan is correct'''
bool: whether the subgoal plan is correct
'''
raise NotImplementedError('This method should be implemented in the subclass.')

def update_statistics(self, error_info) -> None:
@@ -157,13 +158,16 @@ def update_statistics(self, error_type, error_info) -> None:
self.error_info = error_info



def run_checker(self) -> bool:
try:
self.parsed_tl_expression = parse_simple_tl(self.tl_formula, self.vocab.predicate_list, self.vocab.action_list)
except Exception as e:
error_type = str(e.__class__.__name__)
error_info = str(e)
if 'Unknown primitive' in error_info:
error_type = 'UnknownPrimitive'
else:
error_type = 'NotParseable'
self.update_statistics(error_type, error_info)
return False
return True
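The hunk above classifies parser exceptions by inspecting the exception message. A minimal standalone sketch of that classification rule (the function name `classify_parse_error` is hypothetical, not part of the PR):

```python
def classify_parse_error(exc: Exception) -> tuple:
    """Map a TL-parser exception to the checker's error-type label.

    Mirrors the rule in run_checker: messages containing
    'Unknown primitive' are labeled UnknownPrimitive, anything
    else falls back to NotParseable.
    """
    error_info = str(exc)
    if 'Unknown primitive' in error_info:
        return 'UnknownPrimitive', error_info
    return 'NotParseable', error_info
```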
@@ -243,9 +247,48 @@ def __init__(self, env:EvalGraphEnv, subgoal_plan: SubgoalPlan, vocab: Vocab, tl
self.feasible_action_seqs = []
self.error_info = []
self.executable = False
self.goal_info = None
self.run_result = self.run_checker() if semantic_rst else False


def get_special_state(self, subgoal: SimpleTLExpression):
if isinstance(subgoal, SimpleTLNot) and isinstance(subgoal.arg, SimpleTLPrimitive):
state_name = subgoal.arg.prop_or_action.name
if 'stained' in state_name:
return 'stained'
if 'dusty' in state_name:
return 'dusty'
return None

@staticmethod
def handle_compound_errors(error_info: ErrorInfo):
error_dict = error_info.report_error()
error_type_list = error_dict['error_type']
error_info_list = error_dict['error_info']
if len(error_type_list) > 1:
# precedence: additional step > wrong order > missing step > affordance
# keep only the highest-precedence error
if str(ErrorType.ADDITIONAL_STEP) in error_type_list:
new_error = ErrorType.ADDITIONAL_STEP
new_error_info_list = [error_info_list[error_type_list.index(str(new_error))]]
elif str(ErrorType.WRONG_TEMPORAL_ORDER) in error_type_list:
new_error = ErrorType.WRONG_TEMPORAL_ORDER
new_error_info_list = [error_info_list[error_type_list.index(str(new_error))]]
elif str(ErrorType.MISSING_STEP) in error_type_list:
new_error = ErrorType.MISSING_STEP
new_error_info_list = [error_info_list[error_type_list.index(str(new_error))]]
elif str(ErrorType.AFFORDANCE_ERROR) in error_type_list:
new_error = ErrorType.AFFORDANCE_ERROR
new_error_info_list = [error_info_list[error_type_list.index(str(new_error))]]
else:
assert False, f'Unknown error type list {error_type_list}'
new_error_info = ErrorInfo()
new_error_info.update_error(new_error, new_error_info_list[0])
return new_error_info
return error_info
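`handle_compound_errors` above collapses a multi-error report to the single highest-precedence error. A self-contained sketch of the same precedence rule, using plain strings instead of the repo's `ErrorType`/`ErrorInfo` classes (names here are hypothetical):

```python
# Precedence order used when several error types are reported at once:
# additional step > wrong temporal order > missing step > affordance.
PRECEDENCE = [
    'ADDITIONAL_STEP',
    'WRONG_TEMPORAL_ORDER',
    'MISSING_STEP',
    'AFFORDANCE_ERROR',
]

def pick_dominant_error(error_types: list, error_infos: list) -> list:
    """Return [(type, info)] for the highest-precedence reported error;
    a report with zero or one error passes through unchanged."""
    if len(error_types) <= 1:
        return list(zip(error_types, error_infos))
    for etype in PRECEDENCE:
        if etype in error_types:
            return [(etype, error_infos[error_types.index(etype)])]
    raise ValueError(f'Unknown error type list {error_types}')
```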

def execute_subgoal_plan(self):
init_action_env_state = copy.deepcopy(self.env.action_env.cur_state)
init_saved_history_state = copy.deepcopy(self.env.action_env.history_states)
prev_action_env_state = copy.deepcopy(self.env.action_env.cur_state)
prev_saved_history_state = copy.deepcopy(self.env.action_env.history_states)
prev_executed_action_list = []
@@ -270,40 +313,60 @@ def execute_subgoal_plan(self):
if len(cur_remained_subgoals) == 0:
executable = True
self.executable = True
success = self.env.action_env.cur_state.check_success(self.env.task)
success_dict = self.env.action_env.cur_state.check_success(self.env.task)
success = success_dict['success']
if success:
feasible_action_seqs.append(cur_executed_action_list)
feasible_action_seqs.append((cur_executed_action_list, cur_error_info_list))
correct_plan = True
else:
# for future error analysis, temporarily assigned as missing step
error_info = ErrorInfo()
error_info.update_error(ErrorType.MISSING_STEP, 'Final goal not satisfied')
cur_error_info_list.append([error_info.report_error(), None, None])
error_type = ['ErrorType.GOAL_FAILED']
error_info = ['Final goal not satisfied']
error_dict = {
'error_type': error_type,
'error_info': error_info
}
cur_error_info_list.append([error_dict, None, None])
failed_action_seqs.append((cur_executed_action_list, cur_error_info_list))
else:
cur_subgoal = cur_remained_subgoals[0]
next_subgoal = cur_remained_subgoals[1] if len(cur_remained_subgoals) > 1 else None
self.env.update_evolving_graph_state(copy.deepcopy(prev_action_env_state), copy.deepcopy(prev_saved_history_state))
is_combined_states, action_candidates = self.state_action_translator.map_subgoal_to_action_sequence_dynamic_version(cur_subgoal, next_subgoal, self.env.action_env)
if len(action_candidates) == 0:
new_action_env_state = copy.deepcopy(self.env.action_env.cur_state)
new_executed_actions = copy.deepcopy(cur_executed_action_list)
new_remained_subgoals = copy.deepcopy(cur_remained_subgoals[1:]) if len(cur_remained_subgoals) > 1 else []
new_saved_history_state = copy.deepcopy(self.env.action_env.history_states)
new_error_info_list = copy.deepcopy(cur_error_info_list)
new_level = cur_level + 1
new_rst = (new_action_env_state, new_executed_actions, new_remained_subgoals, new_saved_history_state, new_error_info_list, new_level)
exec_queue.append(new_rst)
continue
special_state = self.get_special_state(cur_subgoal)
for action_set in action_candidates:
self.env.update_evolving_graph_state(copy.deepcopy(prev_action_env_state), copy.deepcopy(prev_saved_history_state))
cur_error_info_list = copy.deepcopy(prev_error_info_list)
# self.env.action_env = copy.deepcopy(prev_action_env)
success = True
tmp_executed_action_list = []
for action in action_set:
action_name = action['action']
action_args = action['object']
rst, error_info = self.env.eval_subgoal_apply_action(action_name, action_args)
rst, error_info = self.env.eval_subgoal_apply_action(action_name, action_args, special_state)
if not rst:
error_info = self.handle_compound_errors(error_info)
error_dict = error_info.report_error()
error_type = error_dict['error_type']
cur_error_info_list.append([error_info.report_error(), cur_subgoal, action])
if error_type[0] != str(ErrorType.ADDITIONAL_STEP):
if all(t != str(ErrorType.ADDITIONAL_STEP) for t in error_type):
success = False
failed_action_seq = copy.deepcopy(cur_executed_action_list) + action_set
failed_action_seq = copy.deepcopy(cur_executed_action_list) + tmp_executed_action_list
failed_error_info_list = copy.deepcopy(cur_error_info_list)
failed_action_seqs.append((failed_action_seq, failed_error_info_list))
break
else:
tmp_executed_action_list.append(action)
if success:
new_action_env_state = copy.deepcopy(self.env.action_env.cur_state)
new_executed_actions = copy.deepcopy(cur_executed_action_list)
@@ -319,30 +382,96 @@ def execute_subgoal_plan(self):
exec_queue.append(new_rst)
if len(feasible_action_seqs) > 0:
print('==[Has a feasible plan!]==')
return executable, correct_plan, feasible_action_seqs, failed_action_seqs
return executable, correct_plan, feasible_action_seqs, failed_action_seqs, init_action_env_state, init_saved_history_state

def get_activated_failed_action_seqs(self, failed_action_seqs:List[Tuple[List, List]]):
new_failed_action_seqs = []
for failed_info in failed_action_seqs:
failed_error_info_list = failed_info[1]
end_goal = False
for failed_error_info_dict in failed_error_info_list:
failed_error_info = failed_error_info_dict[0]
error_type = failed_error_info['error_type']
if 'GOAL_FAILED' in error_type:
end_goal = True
break
if end_goal:
new_failed_action_seqs.append(failed_info)
return new_failed_action_seqs

def get_action_seq_rst(self, init_action_env_state: GraphState, init_saved_history_state: List[GraphState], action_seq:List[Dict[str, str]]) -> Dict[str, Any]:
self.env.update_evolving_graph_state(init_action_env_state, init_saved_history_state)
for action in action_seq:
action_name = action['action']
action_args = action['object']
rst, info = self.env.eval_subgoal_apply_action(action_name, action_args)
if not rst:
print(f'Error in applying action {action_name} with args {action_args}')
success_dict = self.env.action_env.cur_state.check_success(self.env.task)
return success_dict


def run_checker(self) -> bool:
executable, correct_plan, feasible_action_seqs, failed_action_seqs = self.execute_subgoal_plan()
executable, correct_plan, feasible_action_seqs, failed_action_seqs, init_action_env_state, init_saved_history_state = self.execute_subgoal_plan()
if correct_plan:
# self.feasible_action_seqs = [seq for seq, _ in feasible_action_seqs]
min_errors = float('inf')
min_failed_action_seq = None
min_failed_error_info = None
for success_info in feasible_action_seqs:
success_action_seq = success_info[0]
success_error_info_list = success_info[1]
num_errors = len(success_error_info_list)
if num_errors < min_errors:
min_errors = num_errors
min_failed_action_seq = success_action_seq
min_failed_error_info = success_error_info_list
if min_failed_action_seq is not None:
for success_error_info_dict in min_failed_error_info:
success_subgoal = success_error_info_dict[1]
success_action = success_error_info_dict[2]
tmp_dict = {
'failed_action_sequence': min_failed_action_seq,
'failed_subgoal': str(success_subgoal),
'failed_action': success_action,
'error_info': success_error_info_dict[0]
}
self.update_statistics(tmp_dict)
self.goal_info = self.get_action_seq_rst(init_action_env_state, init_saved_history_state, min_failed_action_seq)
self.feasible_action_seqs = [min_failed_action_seq]
return executable and correct_plan and len(self.feasible_action_seqs) > 0
new_failed_action_seqs = self.get_activated_failed_action_seqs(failed_action_seqs)
failed_action_seqs = new_failed_action_seqs if len(new_failed_action_seqs) > 0 else failed_action_seqs
if len(failed_action_seqs) > 0:
min_errors = float('inf')
min_failed_action_seq = None
min_failed_action_seq_len = 0
min_failed_error_info = None

for fail_info in failed_action_seqs:
failed_action_seq = fail_info[0]
failed_error_info_list = fail_info[1]
for failed_error_info_dict in failed_error_info_list:
failed_action_seq_len = len(failed_action_seq)
num_errors = len(failed_error_info_list)
if num_errors < min_errors or (num_errors == min_errors and failed_action_seq_len > min_failed_action_seq_len): # we prefer longer action sequence
min_errors = num_errors
min_failed_action_seq = failed_action_seq
min_failed_action_seq_len = failed_action_seq_len
min_failed_error_info = failed_error_info_list

if min_failed_action_seq is not None:
for failed_error_info_dict in min_failed_error_info:
failed_subgoal = failed_error_info_dict[1]
failed_action = failed_error_info_dict[2]
tmp_dict = {
'failed_action_sequence': failed_action_seq,
'failed_action_sequence': min_failed_action_seq,
'failed_subgoal': str(failed_subgoal),
'failed_action': failed_action,
'error_info': failed_error_info_dict[0]
}
self.update_statistics(tmp_dict)
# for i, action_seq in enumerate(feasible_action_seqs):
# print(f"Feasible action sequence {i}:")
# for action in action_seq:
# print(f" {action}")
# print(f"------------------")
self.feasible_action_seqs = feasible_action_seqs
self.goal_info = self.get_action_seq_rst(init_action_env_state, init_saved_history_state, min_failed_action_seq)
self.feasible_action_seqs = []
return executable and correct_plan and len(self.feasible_action_seqs) > 0
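The selection loop above picks the failed trace with the fewest errors, preferring a longer action sequence on ties (it got further before failing). A minimal sketch of that selection, decoupled from the checker state (the function name is hypothetical):

```python
def select_representative_failure(failed_seqs: list) -> tuple:
    """Given [(action_seq, error_info_list), ...], return the pair with
    the fewest errors; on ties, prefer the longer action sequence.
    Returns (None, None) when there are no failed sequences."""
    best = None
    for seq, errors in failed_seqs:
        # Smaller key wins: fewer errors first, then longer sequence
        # (negated length so a longer seq compares as smaller).
        key = (len(errors), -len(seq))
        if best is None or key < best[0]:
            best = (key, seq, errors)
    return (best[1], best[2]) if best else (None, None)
```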

def update_statistics(self, error_info) -> None: