Application-Domain-Specific Processors

Already covered:
- Architecture model or processor model.
- Retargetable code generation.
- List scheduling.

Today's topics:
- Design-space exploration by example.

Selection/adaptation by: Sabih Gerez, University of Twente, April, 00

[Figure: input x feeds a filter with coefficients c0 ... c63; a control unit adapts the coefficients; the filter output is subtracted to give r.]

Minimizes the difference between x and the reference signal y.
Many applications are possible:
- echo cancelling for TV (the flyback signal is known without echoes)
- automatic equalization of cables in data transmission
- acoustic echo cancelling

[Figure: signal-flow graph of the filter. A delay line of Z^-1 elements provides x[n], x[n-1], ..., x[n-i], ..., x[n-63]. Tap i multiplies x[n-i] by its coefficient c_i to give s_i[n]; the s_i[n] are summed to ê[n], and r[n] = e[n] - ê[n]. The term t[n] is formed from mu and the delayed residual, and c_i[n] is obtained from c_i[n-1] and t[n] * x[n-i].]
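The signal-flow graph above has the shape of a least-mean-squares (LMS) adaptive filter. The following is a minimal Python sketch of that idea, not the slide's own code: it uses the textbook LMS update, and the exact time alignment of the delayed terms on the slide is an assumption. The function names, the step size mu = 0.01 and the two-tap "echo path" in the demo are hypothetical values chosen for illustration.

```python
import random


def lms_adaptive_filter(x, e, mu=0.01, num_taps=64):
    """Textbook LMS filter: returns (residuals r[n], final coefficients c_i)."""
    c = [0.0] * num_taps        # coefficients c_0 .. c_63
    delay = [0.0] * num_taps    # delay line, delay[i] holds x[n-i]
    residuals = []
    for x_n, e_n in zip(x, e):
        delay = [x_n] + delay[:-1]                         # shift in x[n]
        e_hat = sum(ci * xi for ci, xi in zip(c, delay))   # e_hat[n] = sum of s_i[n]
        r = e_n - e_hat                                    # r[n] = e[n] - e_hat[n]
        t = mu * r                                         # scaled residual
        c = [ci + t * xi for ci, xi in zip(c, delay)]      # c_i += t * x[n-i]
        residuals.append(r)
    return residuals, c


def demo(num_samples=4000, seed=0):
    """Identify a hypothetical echo path e[n] = 0.5*x[n] + 0.25*x[n-3]."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    e = [0.5 * x[n] + (0.25 * x[n - 3] if n >= 3 else 0.0)
         for n in range(num_samples)]
    return lms_adaptive_filter(x, e)
```

Calling `demo()` should drive the residual towards zero while c_0 and c_3 approach the two echo-path gains; the 64 multiply-accumulate operations per sample are the inner loop that the processor implementations have to schedule.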
#define mu 0.
#define WORD num<3,

func main(input, e : WORD) r : WORD =
begin
  sum[0] = WORD(0);
  x = input;
  t = WORD(r@1 * WORD(mu));
  (i : 0..63) ::
  begin
    c[i] = c[i]@1 + WORD(t * x@i);
    s[i] = WORD(x@i * c[i]@1);
    sum[i+1] = sum[i] + s[i];
  end;
  ehat = sum[64];
  r = e - ehat;
end;

[Figures: schedules of the filter operations (r, t * r, x@i * c[i], sum[i] + s[i]) on datapath implementations with different numbers of buses and ACUs, built from RAM, ROM, ALU, MULT and ACU units. Annotations, in slide order: 66 clock cycles, . mm; 50 clock cycles, 0.7 mm; 0 clock cycles, .4 mm.]
Outline
- design process
- retargetable code generation
- problem statement
- ADSP/VLIW architectures
- Mistral/A|RT Designer
- instructive demo (Adelante)
- application examples
- low power aspects
- Mistral/A|RT Designer
- discussion
- conclusion

Low power aspects

[Figure: Mistral, architecture, and estimation of area, speed and power.]

EXU        ACTIVITY  AREA   POWER
alu_       0%        6      05
acs_asu_   83%       38     386
or_asu_    0%        6
romctrl_   6%        65
acu_       36%       94     05
ipb_       0%        07     43
opb_       %         63     35
ctrl                 864    3597
total                5747   7944

GSM Viterbi decoder: default solution

3750

EXU        ACTIVITY  AREA   POWER
alu_       96%       3469   4696
romctrl_   48%       39     59
acu_       6%        37     09
ipb_       5%        3      05
opb_       3%        804    580
ctrl                 98     35035
total                559    88605

- controller responsible for 70% of power consumption
- maximum resource-sharing
- heavy decision-making: main loop with 6 metrics-computations per iteration
- EXU numbers include registers for local storage

GSM Viterbi decoder: loop-folding

EXU        ACTIVITY  AREA   POWER
alu_       9%        34     45073
romctrl_   45%       39     55
acu_       5%        94     087
ipb_       5%        07     86
opb_       %         66     5340
ctrl                 499    70087
total                043    98

- area down by 33%
- power down by 35%
- 447
- next step: reduce # of program-steps with second ALU
GSM Viterbi decoder: 2 ALUs

- cycle count down 30%
- area up 4%
- power down by 5%
- next step: introduce ASU to reduce ALU-load
- 9739

EXU        ACTIVITY  AREA   POWER
alu_       69%       797    48
alu_       65%       393    896
romctrl_   67%       39     55
acu_       37%       94     087
ipb_       8%        49     9
opb_       33%       36     687
ctrl                 8957   8735
total                4766   673

GSM Viterbi decoder: x ACS-ASU

func ACS M, M, MS, MS M M- M M- fi; M- M M- M fi; end;

EXU        ACTIVITY  AREA   POWER
alu_       0%        6      05
acs_asu_   83%       38     386
or_asu_    0%        6      930
romctrl_   6%        65
acu_       36%       94     05
ipb_       0%        07     43
opb_       %         63     35
ctrl                 864    3597
total                5747   7944

- cycle count down 5X
- power down 0X!

GSM Viterbi decoder: 4 x ACS-ASU

EXU          ACTIVITY  AREA   POWER
alu_         94%       43     97
acs_asu_1    95%       04     40
acs_asu_2    95%       04     40
acs_asu_3    95%       04     40
acs_asu_4    95%       04     40
split_asu_   47%       90     8
or_asu_      47%       59     8
romctrl_     8%        48     6 45
acu_         98%       85
ipb_         3%        60     6
opb_         50%       369    80
ctrl                   306    555
total                  7084   645

- cycle count down another 5X
- area up 3%
- power down another 3X!

GSM Viterbi example: summary

[Figure: Mistral bar chart of power, area and cycles for the default, loop, 2 ALUs, ACS and 4 ACS solutions; annotations: 7x, 7x!!]
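The ACS-ASU in the Viterbi example is an application-specific unit for the add-compare-select (ACS) operation, which dominates the decoder's main loop. The slide's own `func ACS` listing is not recoverable, so the following is a generic textbook sketch in Python rather than the original function. The names `acs` and `trellis_step`, the data layout, and the convention that the smaller metric survives are all assumptions made for illustration.

```python
def acs(metric_a, metric_b, branch_a, branch_b):
    """Add-compare-select for one trellis state.

    Returns (surviving path metric, decision); decision 0 selects path a.
    Assumes the smaller accumulated metric wins.
    """
    cand_a = metric_a + branch_a      # add
    cand_b = metric_b + branch_b
    if cand_a <= cand_b:              # compare
        return cand_a, 0              # select path a
    return cand_b, 1                  # select path b


def trellis_step(metrics, predecessors, branch_metrics):
    """Apply ACS to every state for one received symbol.

    predecessors[s]   = (state_a, state_b) feeding state s
    branch_metrics[s] = (cost from state_a, cost from state_b)
    """
    new_metrics, decisions = [], []
    for state, (prev_a, prev_b) in enumerate(predecessors):
        cost_a, cost_b = branch_metrics[state]
        metric, decision = acs(metrics[prev_a], metrics[prev_b], cost_a, cost_b)
        new_metrics.append(metric)
        decisions.append(decision)
    return new_metrics, decisions
```

On a general-purpose ALU each call to `acs` costs two additions, a comparison and a conditional move, which is why collapsing it into one ASU operation, and then instantiating several such units, shortens the schedule.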
Discussion: phase 3

[Figure: design flow. Exploration phase: applications -> SW code generation -> OK? -> more appl.? -> HW design, iterating on the processor model. Then: freeze processor model. Application software development: applications -> SW code generation -> OK?, using constraint-driven compilation.]

Discussion: problems with VLIWs
- code size and instruction bandwidth
- code compaction
  - reduce code size after scheduling
  - possible compaction ratio? e.g. p0 = 0.9 and p1 = 0.1: information content (entropy) = -sum_i p_i log2 p_i = 0.47 bit per bit, so the maximum compression factor is 1/0.47, about 2.1
- control parallelism during scheduling
  - switch between different processor models
  - 10% of code, 90% of runtime
- architecture
  - reduce number of control bits for operand addresses, e.g. 8 reg TM 8 bits/issue slot for addresses only
  - use stacks and fifos

Discussion: clustered VLIW architectures

[Figure: two VLIW organizations built from register files RF1 ... RF4 and functional units FU1 ... FU4; the second also shows flags, instruction registers IR1 ... IR4, the instruction memory and the control unit.]
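The entropy figure quoted in the discussion of VLIW code size can be checked numerically. The sketch below is a small Python illustration; the function names are hypothetical, and it assumes each instruction bit is an independent binary symbol with the stated probabilities, which is a simplification of real instruction streams.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def max_compression_factor(probabilities, bits_per_symbol=1.0):
    """Upper bound on lossless compression: stored bits / information bits."""
    return bits_per_symbol / entropy_bits(probabilities)
```

For example, `entropy_bits([0.9, 0.1])` gives roughly 0.47 bit and `max_compression_factor([0.9, 0.1])` roughly 2.1, so under this model no lossless code compaction scheme can shrink the code by much more than a factor of two.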
Conclusions
- ADSPs provide efficient solutions for a well-defined application domain: orders of magnitude higher efficiency.
- The methodology is interesting for IP creation.
- The key problem is retargetable compilation.
- A distributed VLIW model is a good compromise between HW and SW.
- Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons.
- The key is fast and accurate feedback.