About running COFFE with local optimization instead of global #40

yc2367 · 2022-11-12T05:23:39Z

Hi,

I am currently using COFFE to run automatic sizing for a full adder. I only care about the area and delay trade-off for this subcircuit, without considering the global mode, so I specify "-o local" when running COFFE. However, it repeatedly raised an error saying that the area_list in tran_sizing.py, line 1386, was empty.
I looked into this and found that the cause is that the get_eval_area() function in tran_sizing.py, line 237, doesn't return anything when opt_type is "local":

COFFE/coffe/tran_sizing.py

Lines 237 to 243 in 09fb0d4

    
           def get_eval_area(fpga_inst, opt_type, subcircuit, is_ram_component, is_cc_component):
               # Get area based on optimization type (subcircuit if local optimization, tile if global)
               if opt_type == "local":
                   area = fpga_inst.area_dict[subcircuit.name]
               # If the block being sized is part of the memory component, return ram size
               # Otherwise, the block size is returned

The same problem exists in the get_final_area() function in tran_sizing.py, line 254:

COFFE/coffe/tran_sizing.py

Lines 254 to 257 in 09fb0d4

    
           def get_final_area(fpga_inst, opt_type, subcircuit, is_ram_component, is_cc_component):
               # Get area based on optimization type (subcircuit if local optimization, tile if global)
               if opt_type == "local":
                   area = fpga_inst.area_dict[subcircuit.name]

Could you take a look at this?

vaughnbetz · 2022-11-13T15:19:30Z

You're right. I think in both cases the area that was calculated should be returned (it is put in a local variable area, but not returned). If you make a pull request with that change and test it gives reasonable results we can merge it in.
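As an illustration of the fix described above, here is a minimal sketch in which the local branch returns the area it looks up. The `SimpleNamespace` objects and the area values are hypothetical stand-ins for COFFE's real objects, and the global branch is reduced to a plain tile lookup, so this is not the actual tran_sizing.py code (which is written for Python 2).

```python
from types import SimpleNamespace


def get_eval_area(fpga_inst, opt_type, subcircuit, is_ram_component, is_cc_component):
    """Sketch of the proposed fix: return the subcircuit area in local mode."""
    if opt_type == "local":
        # The code at 09fb0d4 stored this value in `area` and never returned it.
        return fpga_inst.area_dict[subcircuit.name]
    # Simplified stand-in for the global branches (RAM and carry-chain cases omitted).
    return fpga_inst.area_dict["tile"]


# Hypothetical stand-ins for the FPGA object and the subcircuit being sized.
fpga = SimpleNamespace(area_dict={"tile": 1000.0, "full_adder": 12.5})
adder = SimpleNamespace(name="full_adder")

local_area = get_eval_area(fpga, "local", adder, 0, 0)
global_area = get_eval_area(fpga, "global", adder, 0, 0)
print(local_area, global_area)
```

The same one-line change (returning instead of only assigning) would apply to get_final_area().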

Thanks for flagging this.

yc2367 · 2022-11-16T20:25:58Z

Hi,

After carefully looking through all of tran_sizing.py, I found additional issues with the cost function used for area-delay trade-offs:

In the sizing code for all components, cost_function() is invoked with a hard-coded "global" and fpga_inst.sb_mux for every circuit block. The following lines are examples, but the issue appears for all blocks:

COFFE/coffe/tran_sizing.py

Line 2330 in 09fb0d4

    
           current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 0, 0), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)

COFFE/coffe/tran_sizing.py

Line 2363 in 09fb0d4

    
           current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 0, 0), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)

COFFE/coffe/tran_sizing.py

Line 2397 in 09fb0d4

    
           current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 0, 0), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)

This won't be a problem for the "global" mode. To see why, look at the get_eval_area() call passed to cost_function() and follow the get_eval_area() definition:

COFFE/coffe/tran_sizing.py

Line 237 in 09fb0d4

    
           def get_eval_area(fpga_inst, opt_type, subcircuit, is_ram_component, is_cc_component):

When the opt_type parameter is hard-coded as "global", and the is_ram_component and is_cc_component parameters are both 0, as is the case for most logic tile components, get_eval_area() returns fpga_inst.area_dict["tile"], which is okay. But I think it would be better practice for the sizing code of each component, as in the three examples above, to pass the component being sized instead of fpga_inst.sb_mux. For example, in this case:

COFFE/coffe/tran_sizing.py

Line 2363 in 09fb0d4

    
           current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 0, 0), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)

Here the code is sizing the connection block mux, so it would be better to pass fpga_inst.cb_mux instead of fpga_inst.sb_mux. This would let users run COFFE in local mode without using the wrong area (which is always the area of fpga_inst.sb_mux in the current COFFE).
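To make the effect concrete, here is a small sketch comparing the hard-coded sb_mux argument with the per-component argument. Everything in it is an assumption for illustration: the area values, the `SimpleNamespace` stand-ins, the three-argument get_eval_area(), and the product-of-powers form of cost_function().

```python
from types import SimpleNamespace


def cost_function(area, delay, area_opt_weight, delay_opt_weight):
    # Assumed form of the cost: area^a * delay^d.
    return (area ** area_opt_weight) * (delay ** delay_opt_weight)


def get_eval_area(fpga_inst, opt_type, subcircuit):
    # Simplified: subcircuit area in local mode, tile area otherwise.
    if opt_type == "local":
        return fpga_inst.area_dict[subcircuit.name]
    return fpga_inst.area_dict["tile"]


# Hypothetical areas; the real values come from COFFE's area model.
fpga = SimpleNamespace(
    area_dict={"tile": 1000.0, "sb_mux": 40.0, "cb_mux": 25.0},
    sb_mux=SimpleNamespace(name="sb_mux"),
    cb_mux=SimpleNamespace(name="cb_mux"),
)
delay = 2.0

# While sizing the connection block mux in local mode:
hard_coded = cost_function(get_eval_area(fpga, "local", fpga.sb_mux), delay, 1, 1)
per_component = cost_function(get_eval_area(fpga, "local", fpga.cb_mux), delay, 1, 1)

# In global mode the subcircuit argument does not change the result.
global_sb = cost_function(get_eval_area(fpga, "global", fpga.sb_mux), delay, 1, 1)
global_cb = cost_function(get_eval_area(fpga, "global", fpga.cb_mux), delay, 1, 1)

print(hard_coded, per_component, global_sb, global_cb)
```

In this sketch the local-mode cost differs depending on which component is passed, while the global-mode cost is identical, which is consistent with the issue only showing up in local mode.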

The second issue is that for some components, the current COFFE assigns current_cost twice. Please see the following examples (there are more in the file):

COFFE/coffe/tran_sizing.py

Lines 2575 to 2586 in 09fb0d4

    
           if quick_mode_dict[name] == 1:
               time_after_sizing = time.time()
               past_cost = current_cost
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 0, 0), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)
               if (past_cost - current_cost)/past_cost < fpga_inst.specs.quick_mode_threshold:
                   quick_mode_dict[name] = 0
               print "Duration: " + str(time_after_sizing - time_before_sizing)
               print "Current Cost: " + str(current_cost)
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 0), get_current_delay(fpga_inst, 1), area_opt_weight, delay_opt_weight)

COFFE/coffe/tran_sizing.py

Lines 2611 to 2622 in 09fb0d4

    
           if quick_mode_dict[name] == 1:
               time_after_sizing = time.time()
               past_cost = current_cost
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)
               if (past_cost - current_cost)/past_cost < fpga_inst.specs.quick_mode_threshold:
                   quick_mode_dict[name] = 0
               print "Duration: " + str(time_after_sizing - time_before_sizing)
               print "Current Cost: " + str(current_cost)
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 1), area_opt_weight, delay_opt_weight)

COFFE/coffe/tran_sizing.py

Lines 2646 to 2657 in 09fb0d4

    
           if quick_mode_dict[name] == 1:
               time_after_sizing = time.time()
               past_cost = current_cost
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)
               if (past_cost - current_cost)/past_cost < fpga_inst.specs.quick_mode_threshold:
                   quick_mode_dict[name] = 0
               print "Duration: " + str(time_after_sizing - time_before_sizing)
               print "Current Cost: " + str(current_cost)
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 1), area_opt_weight, delay_opt_weight)

COFFE/coffe/tran_sizing.py

Lines 2681 to 2692 in 09fb0d4

    
           if quick_mode_dict[name] == 1:
               time_after_sizing = time.time()
               past_cost = current_cost
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight)
               if (past_cost - current_cost)/past_cost < fpga_inst.specs.quick_mode_threshold:
                   quick_mode_dict[name] = 0
               print "Duration: " + str(time_after_sizing - time_before_sizing)
               print "Current Cost: " + str(current_cost)
               current_cost =  cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 1), area_opt_weight, delay_opt_weight)

In the first example, which sizes the general BLE output transistors, the first assignment,
current_cost = cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 0), area_opt_weight, delay_opt_weight) computes current_cost using get_current_delay(fpga_inst, 0).
However, the second assignment,
current_cost = cost_function(get_eval_area(fpga_inst, "global", fpga_inst.sb_mux, 1, 1), get_current_delay(fpga_inst, 1), area_opt_weight, delay_opt_weight) recomputes current_cost using get_current_delay(fpga_inst, 1). This second, incorrect value is then used as past_cost in the next iteration.

If you look at the get_current_delay() function:

COFFE/coffe/tran_sizing.py

Line 406 in 09fb0d4

           def get_current_delay(fpga_inst, is_ram_component):

The second parameter, is_ram_component, specifies whether the component is a RAM component. It should be 0 when sizing the examples above, but COFFE assigns current_cost twice, and the second assignment uses is_ram_component = 1.
According to get_current_delay(), if is_ram_component = 1 the function returns the RAM delay, which gives a wrong cost.

COFFE/coffe/tran_sizing.py

Lines 483 to 486 in 09fb0d4

    
           if is_ram_component == 0:
               return path_delay
           else:
               return ram_delay
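A rough sketch of how the second assignment can distort the quick-mode check follows. All numbers are made up for illustration, get_current_delay() is simplified to take the two delays directly, and the cost is taken as a plain area-delay product, so this only mirrors the structure of the snippets above rather than COFFE's actual behavior.

```python
def get_current_delay(path_delay, ram_delay, is_ram_component):
    # Simplified stand-in: the real function takes fpga_inst and computes both delays.
    return path_delay if is_ram_component == 0 else ram_delay


def relative_improvement(past_cost, current_cost):
    # Same expression as the quick-mode test in the quoted snippets.
    return (past_cost - current_cost) / past_cost


# Hypothetical values for a non-RAM component.
area = 100.0
path_delay, ram_delay = 2.0, 8.0
quick_mode_threshold = 0.03  # stand-in for fpga_inst.specs.quick_mode_threshold

# First assignment: cost based on the critical path delay.
current_cost = area * get_current_delay(path_delay, ram_delay, 0)
# Second assignment: cost based on the RAM delay, which overwrites the first.
overwritten_cost = area * get_current_delay(path_delay, ram_delay, 1)

# Suppose the next sizing round yields a path-delay-based cost of 198.
next_cost = 198.0
actual = relative_improvement(current_cost, next_cost)        # vs. the intended past_cost
reported = relative_improvement(overwritten_cost, next_cost)  # vs. the overwritten past_cost

print(actual < quick_mode_threshold, reported < quick_mode_threshold)
```

With these made-up numbers, the intended comparison would fall below the threshold and switch quick mode off for the component, while the overwritten past_cost makes the improvement look large and leaves it on.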

After modifying the code according to my understanding of the two points above, I reran tests_top_level.py to check whether I had broken anything. The tests passed for most components, since I was running in global mode. For some carry chains and LUT drivers, however, the tests failed with 5% to 15% error in delay. I think this is because the current COFFE uses a wrong current cost for these components, as described in my second point (assigning current_cost twice).

If you want to look at my modifications, you can refer to my branch here (only changed the tran_sizing.py file): https://github.com/yc2367/COFFE/tree/Yuzong_Test

Please let me know if you would prefer that I create a pull request for comparison. Thanks!

vaughnbetz · 2022-11-17T02:22:33Z

Thanks for the detailed analysis! @sadegh68 @aman26kbm : any thoughts? These look like good things to fix, but getting your opinion on the code in question and what tests should be run would be great.

aman26kbm · 2022-11-17T16:12:15Z

Good observations, @yc2367. Thanks for the details.

I agree that these are all bugs we should fix. Most likely they are copy-paste errors that went unnoticed because we usually don't run the local mode.

I went through the changes in your branch. I agree with all changes. There are a couple things I want to mention:

On line 3125 in tran_sizing.py, it looks like you missed making the change. Instead of sb_mux, it should be the configurable decoder.
On line 2250 in tran_sizing.py, I am not sure why we're using sb_mux as the component. It seems to be a call that just initializes the cost, so maybe it's okay.

I do see you've added a new test file as well, which runs the existing 4 tests with the local switch. That's good. Please also add the reference results files for these tests. You can generate them using the generate_reference method in tests.py.

sadegh68 · 2022-11-18T02:17:37Z

I agree with the changes. These bugs exist because the local mode was not tested during parts of COFFE's development. A few tests focusing on the local mode should reduce the likelihood of such bugs in the future.

yc2367 · 2022-11-18T18:03:11Z

Hi @vaughnbetz, @aman26kbm and @sadegh68,

Thanks a lot for your reply.

You are right that I didn't change line 3125 in tran_sizing.py. The decoder has some if-else conditions for different fan-ins (e.g., nand2, nand3), and I held off because I wasn't sure whether my modifications to the other blocks were correct. Based on your reply, my understanding seems to be correct, so I will modify it soon.
For line 2250 in tran_sizing.py, I agree that sb_mux is used to initialize the cost. I copied the examples wrongly.

Another question: I also reran the original COFFE with tests_top_level.py, and it too reported delay errors for components such as lut_driver and carry_chain. Could you help me clarify this point?

I will try to add a reference file for the local mode soon. I think the benefit of local mode is for users who want to develop additional components, such as computing-in-memory blocks (e.g., an adder). They may want to reduce the area overhead by running local mode just for that block, which is much faster because the rest of the sizing code can even be commented out when sizing a single block.

aman26kbm · 2022-11-19T21:58:40Z

Aren't these failures expected? We are changing the code that affects the sizing of these components (lut driver and carry chain), even in the global mode. No?

yc2367 · 2022-11-19T22:02:41Z

Thanks for the reply! I mean that the original COFFE without my changes, i.e., the current COFFE in this repository, also failed some tests against the current test reference file.

aman26kbm · 2022-11-19T22:13:07Z

Ah I see. Could be because of the HSPICE version? The original reference files may have been generated with an older version of HSPICE.

yc2367 · 2022-11-19T22:33:30Z

Might be. I am using the 2022 version of HSPICE. I just want to clarify whether you also get 5% to 15% errors with your HSPICE version.

vaughnbetz · 2022-11-20T15:44:46Z

I'm not sure of the reason. The changes are pretty small, thankfully. Adding @StephenMoreOSU in case he has any ideas.

yc2367 · 2022-11-30T02:44:15Z

Hi,
May I ask whether anyone has been able to verify if they also see small errors when running tests_top_level.py? Thanks a lot!

aman26kbm · 2022-11-30T03:48:01Z

Sorry I won't have the time to help with the verification.
