
Questions on Training Configuration and Evaluation Errors for MTR Model on nuScenes Dataset #35

Open
Platonight opened this issue Nov 5, 2024 · 2 comments

Comments

@Platonight

Dear @Alan-LanFeng,

Thank you for your excellent work! I have two questions I’d like to discuss with you:

  1. I encountered a similar issue with the MTR model not achieving the expected results on the nuScenes dataset. From previous responses in the issues, I understand that I need to set two configurations:
  • max_data_num to -1
  • --split to one of the options in prediction_split = ["mini_train", "mini_val", "train", "train_val", "val"]

Following this, I generated three sub-datasets using train, train_val, and val. Specifically:

  • For training, I used train and train_val and set the config.yaml file as follows:
# exp setting
exp_name: 'test'
ckpt_path: null
seed: 42 # random seed
debug: True # debug mode, will use cpu only
devices: [1, 2] # gpu ids

# data related
load_num_workers: 0 # number of workers for loading data
train_data_path: ["/data/unScenes/result_train/"] # list of paths to the train data
val_data_path: ["/data/unScenes/result_train_val/"] # list of paths to the train_val data

max_data_num: [-1] # maximum number of data for each training dataset
past_len: 21 # history trajectory length, 2.1s
future_len: 60 # future trajectory length, 6s
object_type: ['VEHICLE'] # object types included in the training set
line_type: ['lane', 'stop_sign', 'road_edge', 'road_line', 'crosswalk', 'speed_bump'] # line types to consider in input
masked_attributes: ['z_axis', 'size'] # attributes to mask in input
trajectory_sample_interval: 1 # trajectory sample interval
only_train_on_ego: False # only train on AV
center_offset_of_map: [30.0, 0.0] # map center offset
use_cache: False # enable data loading cache
overwrite_cache: False # overwrite cache if exists
store_data_in_memory: False # store data in memory

# official evaluation
nuscenes_dataroot: "/data/sets/nuscenes/" 
eval_nuscenes: False # evaluate with nuscenes evaluation tool
eval_waymo: False # evaluate with waymo evaluation tool

defaults:
  - method: MTR
  • For evaluation, I used val with the config.yaml file set as follows:
# exp setting
exp_name: 'test'
ckpt_path: '/my_own_model.ckpt'
seed: 42 # random seed
debug: True # debug mode, will use cpu only
devices: [1, 2] # gpu ids

# data related
load_num_workers: 0 # number of workers for loading data
val_data_path: ["/data/unScenes/result_val/"] # list of paths to val data

max_data_num: [-1] # maximum number of data for each dataset
past_len: 21 # history trajectory length, 2.1s
future_len: 60 # future trajectory length, 6s
object_type: ['VEHICLE'] # object types in training set
line_type: ['lane', 'stop_sign', 'road_edge', 'road_line', 'crosswalk', 'speed_bump'] # line types in input
masked_attributes: ['z_axis', 'size'] # attributes to mask in input
trajectory_sample_interval: 1 # trajectory sample interval
only_train_on_ego: False # only train on AV
center_offset_of_map: [30.0, 0.0] # map center offset
use_cache: False # data loading cache
overwrite_cache: False # overwrite cache if exists
store_data_in_memory: False # store data in memory

# official evaluation
nuscenes_dataroot: "/data/sets/nuscenes/" 
eval_nuscenes: False # evaluate with nuscenes evaluation tool
eval_waymo: False # evaluate with waymo evaluation tool

defaults:
  - method: MTR
  • Our MTR.yaml is:
# common
model_name: MTR

# model
CONTEXT_ENCODER:
  NAME: MTREncoder
  NUM_OF_ATTN_NEIGHBORS: 7
  NUM_INPUT_ATTR_AGENT: 39
  NUM_INPUT_ATTR_MAP: 29
  NUM_CHANNEL_IN_MLP_AGENT: 256
  NUM_CHANNEL_IN_MLP_MAP: 64
  NUM_LAYER_IN_MLP_AGENT: 3
  NUM_LAYER_IN_MLP_MAP: 5
  NUM_LAYER_IN_PRE_MLP_MAP: 3
  D_MODEL: 256
  NUM_ATTN_LAYERS: 6
  NUM_ATTN_HEAD: 8
  DROPOUT_OF_ATTN: 0.1
  USE_LOCAL_ATTN: True

MOTION_DECODER:
  NAME: MTRDecoder
  NUM_MOTION_MODES: 6
  INTENTION_POINTS_FILE: 'models/mtr/cluster_64_center_dict_6s.pkl'
  D_MODEL: 512
  NUM_DECODER_LAYERS: 6
  NUM_ATTN_HEAD: 8
  MAP_D_MODEL: 256
  DROPOUT_OF_ATTN: 0.1
  NUM_BASE_MAP_POLYLINES: 256
  NUM_WAYPOINT_MAP_POLYLINES: 128
  LOSS_WEIGHTS: {
    'cls': 1.0,
    'reg': 1.0,
    'vel': 0.5
  }

  NMS_DIST_THRESH: 2.5

# train
max_epochs: 60
learning_rate: 0.0001
learning_rate_sched: [ 22, 24, 26, 28 ]
optimizer: AdamW
scheduler: lambdaLR
grad_clip_norm: 1000.0
weight_decay: 0.01
lr_decay: 0.5
lr_clip: 0.000001
WEIGHT_DECAY: 0.01
train_batch_size: 48 #32 #128
eval_batch_size: 48 #32 #128

# data related
max_num_agents: 64
map_range: 100
max_num_roads: 768

# will be overwritten if manually_split_lane is True
max_points_per_lane: 20

manually_split_lane: True
point_sampled_interval: 1
num_points_each_polyline: 20
vector_break_dist_thresh: 1.0

However, I’m a bit confused by your previous response mentioning that “we have not touched the ‘train_val’ split.”
Moreover, our MTR model is still not achieving the results reported in the paper. Could you clarify whether there is an issue with my current setup, or whether there are other adjustments I need to make?

  2. When attempting to use the official evaluation, I set eval_nuscenes to True and pointed nuscenes_dataroot to the raw nuScenes data, but encountered an error:

File "/UniTraj/unitraj/models/base_model/base_model.py", line 157, in compute_official_evaluation
'instance': input_dict['scenario_id'][bs_idx].split('_')[1],
IndexError: list index out of range

After inspection, I found the issue is that scenario_id[0] returns only ['scene-0233'], without the additional parts expected for ‘instance’ (scenario_id.split('_')[1]) and ‘sample’ (scenario_id.split('_')[2]). It appears this stems from the unified data format used in the UniTraj batch dictionary. Could you advise if there’s an adjustment I might be missing here?
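The failure can be reproduced in isolation. Below is a minimal sketch; the three-part id format "scene_instance_sample" is an assumption inferred from the indexing in base_model.py, and parse_scenario_id is a hypothetical helper, not part of UniTraj:

```python
# Minimal reproduction of the IndexError in compute_official_evaluation.
# Assumption: the evaluation code expects scenario_id of the form
# "<scene>_<instance>_<sample>", e.g. "scene-0233_inst-1_sample-7".

def parse_scenario_id(scenario_id: str) -> dict:
    """Split a scenario id into the parts the nuScenes evaluation needs.

    Raises a descriptive ValueError instead of a bare IndexError when
    the id does not carry the instance/sample tokens.
    """
    parts = scenario_id.split('_')
    if len(parts) < 3:
        raise ValueError(
            f"scenario_id {scenario_id!r} has {len(parts)} part(s); "
            "expected '<scene>_<instance>_<sample>'. The data was likely "
            "preprocessed without instance/sample tokens in the id."
        )
    return {'scene': parts[0], 'instance': parts[1], 'sample': parts[2]}

# 'scene-0233' (what the unified format stores) has only one part,
# so indexing [1] fails -- exactly the reported error:
try:
    parse_scenario_id('scene-0233')
except ValueError as err:
    print(err)

# A well-formed id parses cleanly:
print(parse_scenario_id('scene-0233_inst-1_sample-7')['instance'])
```

If this diagnosis is right, the fix is on the preprocessing side (emitting ids with instance/sample tokens), not in the evaluation code.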

Thank you very much for your assistance!


@Alan-LanFeng
Collaborator

Hi, your configuration looks good.

nuScenes has 3 splits: train, train_val, and val. We train MTR on train and validate on val. There are 32k samples in train and 9k in val; can you double-check this?
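One quick way to double-check those counts is to count what the preprocessing actually wrote to disk. This sketch assumes one file per sample under the result_* directories from the config above; the directory layout is an assumption, so adjust the paths and counting rule to UniTraj's actual cache format:

```python
# Count preprocessed samples per split by counting files on disk.
# Assumptions (not verified against UniTraj internals): one file per
# sample, and the result_* directory names used in the config above.
from pathlib import Path


def count_samples(split_dir: str) -> int:
    """Recursively count regular files under a split's output directory."""
    root = Path(split_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob('*') if p.is_file())


for split in ('result_train', 'result_train_val', 'result_val'):
    n = count_samples(f'/data/unScenes/{split}')
    print(f'{split}: {n} samples')
```

If train comes out near 1k rather than 32k, the preprocessing step (rather than the training config) is the place to look.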

@ChengkaiYang

ChengkaiYang commented Nov 7, 2024

> Hi, your configuration looks good.
>
> nuScenes has 3 splits: train, train_val, and val. We train MTR on train and validate on val. There are 32k samples in train and 9k in val; can you double-check this?

Sorry, Alan. When I use UniTraj with the default config file to preprocess the nuScenes 'train_val' split, it yields only 1136 scenes, containing only about 1136 trajectories (far from 32k). Could you help me figure out what the problem is?
Does the 32k figure include the surrounding agents of the focal agent in each scenario? As far as I can tell, we only train on the focal agent, so if that is correct I am training on only 1136 trajectories for the nuScenes dataset. Some experimental results seem to confirm this hypothesis: I got almost the same brier-FDE6 of 3.51 as in the paper. Looking forward to your reply!
