1
             Instruction set definition and instruction selection for ASIPs
             - Van Praet, J.; Goossens, G.; Lanneer, D.; De Man, H.
             IMEC, Leuven, Belgium

This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium on

on Pages: 11 - 16

             This Conference was Held : 18-20 May 1994
              1994
                                    ISBN: 0-8186-5785-5

             IEEE Catalog Number: 94TH0641-1
             Total Pages: ix+171
             References Cited: 14
             Accession Number: 4706372

Abstract:

                   Application Specific Instruction set Processors (ASIPs) are field or mask programmable
                   processors of which the architecture and instruction set are optimised to a specific
                   application domain. ASIPs offer a high degree of flexibility and are therefore increasingly
                   being used in competitive markets like telecommunications. However, adequate CAD
                   techniques for the design and programming of ASIPs have so far been lacking. An interactive
                   approach for the definition of optimised microinstruction sets of ASIPs is presented. A
                   second contribution is a method for instruction selection when generating code for a
                   predefined ASIP: a combined instruction set and data-path model is generated, onto which the
                   application is mapped.

2
             A performance maximization algorithm to design ASIPs under the
             constraint of chip area including RAM and ROM sizes
             - Nguyen Ngoc Binh; Imai, M.; Takeuchi, Y.
             Dept. of Inf. & Math. Sci., Osaka Univ., Japan

This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia and South Pacific

on Pages: 367 - 372

             This Conference was Held : 10-13 Feb. 1998
              1998
                                    ISBN: 0-7803-4425-1

             IEEE Catalog Number: 98EX121
             Total Pages: xxxviii+606
             References Cited: 19
             Accession Number: 5984946

Abstract:

                   In designing ASIPs (Application Specific Integrated Processors), previous work has
                   focused almost exclusively on optimizing the CPU core and has paid too little attention
                   to optimizing the RAM and ROM sizes together with it. This paper overcomes this limitation
                   and proposes an optimization algorithm to define the best tradeoff between the CPU core,
                   RAM and ROM of an ASIP chip to achieve the highest performance while satisfying design
                   constraints on the chip area. The partitioning problem is formalized as a combinatorial
                   optimization problem that partitions the operations into hardware and software so that the
                   performance of the designed ASIP is maximized under a given chip area constraint, where the
                   chip area includes the HW cost of the register file for a given application program with the
                   associated input data set. The optimization problem is parameterized so that it can be applied
                   with different technologies to synthesize CPU cores, RAMs or ROMs. The experimental
                   results show that the proposed algorithm is effective and efficient.

3
               PEAS-I: A hardware/software co-design system for ASIPs
               - Alomary, A.; Nakata, T.; Honma, Y.; Sato, J.; Hikichi, N.; Imai, M.
               Toyohashi Univ. of Technol., Japan

               This Paper Appears in :
               Design Automation Conference, 1993, with EURO-VHDL '93. Proceedings EURO-DAC '93.,
               European

on Pages: 2 - 7

               This Conference was Held : 20-24 Sept. 1993
                1993
                                          ISBN: 0-8186-4350-1

               IEEE Catalog Number: 93CH3352-2
               Total Pages: xxi+579
               References Cited: 10
               Accession Number: 5038430

Abstract:

                      The current implementation and experimental results of the PEAS-I (practical environment
                      for application specific integrated processor (ASIP) development - Version I) system are
                      described. The PEAS-I system is a hardware/software co-design system for ASIP
                      development. The input to the system is a set of application programs written in C,
                      an associated data set, and design constraints such as chip area and power consumption.
                      The system generates an optimized CPU core design in the form of an HDL, as well as a set
                      of application program development tools, such as a C compiler, assembler, and simulator. A
                      novel method that formulates the design of an optimal instruction set using an integer
                      programming approach is described. A tool that enables the designer to predict the chip area
                      and performance of the design before the detailed design is completed is discussed.
                      Application program development tools are generated in addition to the ASIP hardware.

4
               An ASIP design methodology for embedded systems
               - Kucukcakar, K.
               Escalade Corp., Santa Clara, CA, USA

               This Paper Appears in :
               Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International
               Workshop on

on Pages: 17 - 21

               This Conference was Held : 3-5 May 1999
                1999
                                          ISBN: 1-58113-132-1

               IEEE Catalog Number: 99TH8450
               Total Pages: vii+216
               References Cited: 8
               Accession Number: 6319827

Abstract:

                      A well-known challenge during processor design is to obtain the best possible results for a
                      typical target application domain that is generally described as a set of benchmarks.
                      Obtaining the best possible result in turn becomes a complex tradeoff between the generality
                      of the processor and its physical characteristics. A custom instruction to perform a task can
                      result in significant improvements for an application, but generally at the expense of some
                      overhead for all other applications. In recent years, Application-Specific Instruction-Set
                      Processors (ASIP) have gained popularity in production chips as well as in the research
                      community. In this paper, we present a unique architecture and methodology to design ASIPs
                      in the embedded controller domain by customizing an existing processor instruction set and
                      architecture rather than creating an entirely new ASIP tuned to a benchmark.

5
               An integrated design environment for application specific integrated
               processor
               - Sato, J.; Imai, M.; Hakata, T.; Alomary, A.Y.; Hikichi, N.
               Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan

               This Paper Appears in :
               Computer Design: VLSI in Computers and Processors, 1991. ICCD '91. Proceedings, 1991 IEEE
               International Conference on

on Pages: 414 - 417

               This Conference was Held : 14-16 Oct. 1991
                1991
                                          ISBN: 0-8186-2270-9

               Total Pages: xvi+654
               References Cited: 10
               Accession Number: 4128007

Abstract:

                      A novel framework for ASIP (application specific integrated processor) development is
                      proposed. The system accepts a set of example programs written in the C language and their
                      expected data as input, and profiles these programs both statically and dynamically. Then
                      using the profiling results, the system decides the instruction set and hardware
                      architecture of the ASIP, and synthesizes the CPU core design of the ASIP, as well as the
                      software development tools for the ASIP such as compiler and simulator.
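
The static and dynamic profiling step can be illustrated with a small sketch. It assumes a hypothetical profile format (a list of basic blocks, each a list of operation names, plus a per-block execution count obtained by running the example data); it is not the authors' implementation. Static frequencies count occurrences in the program text, while dynamic frequencies weight each occurrence by how often its block executed.

```python
from collections import Counter

def profile(blocks, exec_counts):
    """Return (static, dynamic) operation frequencies for a profiled program.

    blocks      -- hypothetical input: list of basic blocks, each a list of op names
    exec_counts -- how many times each block executed on the example data
    """
    static, dynamic = Counter(), Counter()
    for ops, runs in zip(blocks, exec_counts):
        for op in ops:
            static[op] += 1        # one occurrence in the program text
            dynamic[op] += runs    # occurrence weighted by execution count
    return static, dynamic

# Hypothetical profile: a hot loop body, a hot loop test, and a cold block.
blocks = [["load", "mul", "add", "store"], ["add", "cmp", "branch"], ["div"]]
exec_counts = [1000, 1000, 1]
static, dynamic = profile(blocks, exec_counts)
```

In this toy input, `mul` and `div` each appear once in the program text, but the dynamic count shows that `mul` executes 1000 times more often, which is the kind of evidence that would favor a hardware multiplier over a hardware divider.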

6
               PSCP: A scalable parallel ASIP architecture for reactive systems
               - Pyttel, A.; Sedlmeier, A.; Veith, C.
               Corp. Technol., Siemens AG, Munich, Germany

This Paper Appears in :
Design, Automation and Test in Europe, 1998., Proceedings

on Pages: 370 - 376

               This Conference was Held : 23-26 Feb. 1998
                1998
                                          ISBN: 0-8186-8359-7

               IEEE Catalog Number: 98EX123
               Total Pages: xxxiv+993
               References Cited: 18
               Accession Number: 5906829

Abstract:

                      We describe a codesign approach based on a parallel and scalable ASIP architecture, which
                      is suitable for the implementation of reactive systems. The specification language of our
                      approach is extended statecharts. Our ASIP architecture is scalable with respect to the
                      number of processing elements as well as parameters such as bus widths and register file
                      sizes. Instruction sets are generated from a library of components covering a spectrum of
                      space/time trade-off alternatives. Our approach features a heuristic static timing analysis
                      step for statecharts. An industrial example requiring the real-time control of several stepper
                      motors illustrates the benefits of our approach.

7
               Design of an ASIP architecture for low-level visual elaborations
               - Raffo, L.; Sabatini, S.P.; Mantelli, M.; De Gloria, A.; Bisio, G.M.
               Dept. of Electr. & Electron. Eng., Cagliari Univ., Italy

This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

on Pages: 145 - 153

               This Conference was Held : 18-20 Jan. 1995
                March 1997
                                       Vol. 5
                                                           Issue: 1
                                                                                 ISSN: 1063-8210

               References Cited: 10
               CODEN: IEVSE9
               Accession Number: 5525495

Abstract:

                      We consider the design process of VLSI systems dedicated to the real-time implementation
                      of cooperative algorithms whose functionalities can be characterized by multilayer
                      ensembles of simple elements which interact locally. These algorithms relate mainly, though
                      not exclusively, to the implementation of various tasks in low-level machine vision.
                      The starting point in the design process is the formulation of the sequential algorithm that
                      computes the behavior of the system. Algorithmic transformations are performed to expose
                      the parallelism originally present in the task. Given the description in terms of parallel loops,
                      we partition the system and organize it as a set of processing units. The architectural
                      structure of these units properly takes into account the algorithmic constraints on precision
                      both in data representation and computation. The program flow implemented by our
                      programmable architectural solution (ASIP) is an iterative sequence of
                      multiply-and-accumulate operations performed in parallel. The programmability concerns
                      both the structure/coefficients of the algorithm (depending on the specific application) and its
                      computational parameters. The architecture's main blocks are described in VHDL and
                      synthesized as a semi-custom chip, using standard tools. Following this procedure, we
                      designed an ASIP core for performing real-time texture-based image segregation.
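
The "iterative sequence of multiply-and-accumulate operations performed in parallel" can be pictured with the sketch below. It assumes a hypothetical 2-D grid of elements, each updated as a weighted sum of its square neighborhood (odd kernel size, zero padding at the borders, integer arithmetic); the neighborhood shape and update rule are illustrative assumptions, not the paper's texture-segregation algorithm. Because every output cell depends only on the previous grid, all cells of one step could be computed concurrently by separate MAC units.

```python
def mac_step(grid, kernel):
    """Update every cell as a multiply-and-accumulate over its neighborhood.

    kernel must be square with odd side length; cells outside the grid
    are treated as zero (hypothetical boundary rule).
    """
    rows, cols = len(grid), len(grid[0])
    k = len(kernel) // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0  # plays the role of the accumulator register
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[dr + k][dc + k] * grid[rr][cc]
            out[r][c] = acc
    return out

def iterate(grid, kernel, steps):
    """Apply mac_step repeatedly, as in an iterative relaxation."""
    for _ in range(steps):
        grid = mac_step(grid, kernel)
    return grid
```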

8
               Lower bound on latency for VLIW ASIP datapaths
               - Jacome, M.F.; De Veciana, G.
               Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA

               This Paper Appears in :
               Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference
               on

on Pages: 261 - 268

               This Conference was Held : 7-11 Nov. 1999
                1999
                                          ISBN: 0-7803-5832-5

               IEEE Catalog Number: 99CH37051
               Total Pages: xxiv+611
               References Cited: 11
               Accession Number: 6441936

Abstract:

                      Traditional lower bound estimates on latency for dataflow graphs assume no data transfer
                      delays. While such approaches can generate tight lower bounds for datapaths with a
                      centralized register file, the results may be uninformative for datapaths with distributed
                      register file structures that are characteristic of VLIW ASIPs (very long instruction word
                      application-specific instruction set processors). In this paper, we propose a latency bound
                      that accounts for such data transfer delays. The novelty of our approach lies in constructing
                      the "window dependency graph" and bounds associated with the problem which capture
                      delay penalties due to operation serialization and/or data moves among distributed register
                      files. Through a set of benchmark examples, we show that the bound is competitive with
                      state-of-the-art approaches. Moreover, our experiments show that the approach can aid an
                      iterative improvement algorithm in determining good functional unit assignments, a key step
                      in code generation for VLIW ASIPs.
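
For contrast, a "traditional" bound of the kind the abstract refers to can be sketched as the maximum of the critical-path length of the dataflow graph and a simple per-type resource bound. This sketch assumes non-pipelined functional units and, like the traditional estimates, zero data-transfer delay between register files; it is not the window-dependency-graph bound proposed in the paper. The names `latency_lower_bound`, `ops`, `edges` and `units` are hypothetical.

```python
import math

def latency_lower_bound(ops, edges, units):
    """Transfer-free latency lower bound for a dataflow graph (a DAG).

    ops   -- dict: op name -> (functional-unit type, latency in cycles)
    edges -- list of (src, dst) data dependences
    units -- dict: functional-unit type -> number of units of that type
    """
    succs = {o: [] for o in ops}
    indeg = {o: 0 for o in ops}
    for s, d in edges:
        succs[s].append(d)
        indeg[d] += 1
    # Critical path via Kahn's topological order; data moves cost 0 cycles here.
    start = {o: 0 for o in ops}
    finish = {}
    ready = [o for o in ops if indeg[o] == 0]
    while ready:
        o = ready.pop()
        finish[o] = start[o] + ops[o][1]
        for d in succs[o]:
            start[d] = max(start[d], finish[o])
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    if len(finish) != len(ops):
        raise ValueError("dependence graph has a cycle")
    critical_path = max(finish.values(), default=0)
    # Resource bound: ops of one type share that type's (non-pipelined) units.
    work = {}
    for fu, lat in ops.values():
        work[fu] = work.get(fu, 0) + lat
    resource = max((math.ceil(w / units[fu]) for fu, w in work.items()), default=0)
    return max(critical_path, resource)
```

On a clustered VLIW datapath, such a bound can be loose precisely because it charges nothing for moving operands between distributed register files.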

9
               A new HW/SW partitioning algorithm for synthesizing the highest
               performance pipelined ASIPs with multiple identical FUs
               - Binh, N.N.; Imai, M.; Shiomi, A.
               Dept. of Inf. & Comput. Sci., Osaka Univ., Japan

               This Paper Appears in :
               Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition, Proceedings EURO-DAC
               '96, European

on Pages: 126 - 131

               This Conference was Held : 16-20 Sept. 1996
                1996
                                          ISBN: 0-8186-7573-X

               IEEE Catalog Number: 96CB36000
               Total Pages: xxiii+579
               References Cited: 18
               Accession Number: 5412409

Abstract:

                      This paper introduces a new HW/SW partitioning algorithm for automatic synthesis of a
                      pipelined CPU architecture with multiple identical functional units (MIFUs) of each type in
                      designing ASIPs (Application Specific Integrated Processors). The partitioning problem is
                      formalized as a combinatorial optimization problem that partitions the operations into
                      hardware and software so that the performance of the designed ASIP is maximized under
                      given gate count and power consumption constraints, while also optimally selecting the
                      number of FUs of each type. A branch-and-bound algorithm with a proposed lower-bound
                      function is used to solve the formalized problem. Experimental results show that the
                      proposed algorithm is effective and efficient.
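
A branch-and-bound partitioner of this general kind can be sketched as below. The cost model is a simplifying assumption: each candidate operation is an independent yes/no choice with a gate cost, a power cost and a number of cycles saved when it is moved to hardware, and the budgets are what remains after the base core. The paper's formulation additionally selects the number of identical FUs per type and accounts for pipelining, which this sketch does not model; the bound used here (current cycles minus all savings still available) is a generic optimistic bound, not the paper's lower-bound function.

```python
def partition(cands, base_cycles, max_gates, max_power):
    """Choose candidates to move to hardware so total cycles are minimized.

    cands -- hypothetical list of (name, gates, power, cycles_saved),
             with cycles_saved >= 0
    Returns (best_cycles, set of names implemented in hardware).
    """
    # Try high-saving candidates first so good solutions are found early.
    cands = sorted(cands, key=lambda c: c[3], reverse=True)
    # suffix[i] = total savings still obtainable from cands[i:]
    suffix = [0] * (len(cands) + 1)
    for i in range(len(cands) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + cands[i][3]
    best_cycles, best_set = base_cycles, frozenset()

    def search(i, gates, power, cycles, chosen):
        nonlocal best_cycles, best_set
        if cycles < best_cycles:  # every visited node is a feasible partition
            best_cycles, best_set = cycles, frozenset(chosen)
        # Bound: even if all remaining candidates fit, cycles cannot drop
        # below cycles - suffix[i]; prune when that cannot beat the best.
        if i == len(cands) or cycles - suffix[i] >= best_cycles:
            return
        name, g, p, saved = cands[i]
        if gates + g <= max_gates and power + p <= max_power:
            chosen.append(name)  # branch 1: implement in hardware
            search(i + 1, gates + g, power + p, cycles - saved, chosen)
            chosen.pop()
        search(i + 1, gates, power, cycles, chosen)  # branch 2: keep in software

    search(0, 0, 0, base_cycles, [])
    return best_cycles, set(best_set)
```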

10
               System design using ASIPs
               - Carro, L.; Pereira, G.A.; Alba, C.; Suzim, A.
               Univ. Federal do Rio Grande do Sul, Porto Alegre, Brazil

This Paper Appears in :
Engineering of Computer-Based Systems, 1996. Proceedings., IEEE Symposium and Workshop on

on Pages: 80 - 85

               This Conference was Held : 11-15 March 1996
                1996
                                          ISBN: 0-8186-7355-9

               IEEE Catalog Number: 96TB100022
               Total Pages: xi+465
               References Cited: 9
               Accession Number: 5226399

Abstract:

                      This paper describes our current research in the field of systems design, trying to reach an
                      Application Specific System Integration (ASIS). We try to go beyond circuit integration to
                      reach systems integration, using Application Specific Processors (ASIPs) with different
                      architectures. Our target system is based on industry applications. In this paper we show the
                      environment that allows the fine tuning of RISC processors to specific applications, and the
                      migration of a CISC microcontroller to an ASIP architecture. The studied examples show
                      significant reductions in the total area of the processor for each approach. The freed area
                      can be used to integrate other parts of the whole system.

11
               Incorporating compiler feedback into the design of ASIPs
               - Onion, F.; Nicolau, A.; Dutt, N.
               Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA

This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.

on Pages: 508 - 513

               This Conference was Held : 6-9 March 1995
                1995
                                          ISBN: 0-8186-7039-8

               IEEE Catalog Number: 95TH8058
               Total Pages: xxvii+611
               References Cited: 12
               Accession Number: 5057083

Abstract:

                      This paper presents a framework for providing feedback from an optimizing compiler into the
                      design of an ASIP (Application Specific Instruction-set Processor). The optimizing compiler
                      is used to assess the hardware needs of a suite of applications to which the ASIP is to be
                      tuned. By incorporating the compiler into the design process, the design space is increased
                      as more information is provided at an earlier stage during the design process. Our initial
                      study involves detecting potentially chainable operation sequences using scheduling
                      techniques developed for exploiting instruction-level parallelism. Results of this study are
                      included.
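
Detection of potentially chainable operation sequences can be approximated by a simple def-use scan, sketched below. It assumes a hypothetical three-address representation of one basic block in SSA form (every name defined once, and no value live out of the block, since a live-out value would be miscounted as single-use) and counts producer/consumer operation pairs whose intermediate result has exactly one use; the `weight` parameter could carry the block's profiled execution count. This is a simpler stand-in for, not a reproduction of, the scheduling-based analysis described in the paper.

```python
from collections import Counter

def chainable_pairs(instrs, weight=1):
    """Count (producer_op, consumer_op) pairs that could be chained.

    instrs -- list of (dest, op, srcs) in program order for one basic block
              (hypothetical SSA three-address form)
    A pair is counted when the producer's result is read exactly once,
    so it need not be kept in a register between the two operations.
    """
    uses = Counter(s for _, _, srcs in instrs for s in srcs)
    producer = {}  # value name -> operation that defined it
    pairs = Counter()
    for dest, op, srcs in instrs:
        for s in srcs:
            if s in producer and uses[s] == 1:
                pairs[(producer[s], op)] += weight
        producer[dest] = op
    return pairs

block = [
    ("t1", "mul", ["a", "b"]),
    ("t2", "add", ["t1", "c"]),
    ("t3", "mul", ["d", "e"]),
    ("t4", "add", ["t3", "t2"]),
    ("t5", "shl", ["t4", "k"]),
    ("t6", "sub", ["t4", "t5"]),  # t4 has two uses, so it is not chainable
]
pairs = chainable_pairs(block)
```

In this example, the `mul` to `add` pair, a multiply-accumulate pattern, is the most frequent chaining candidate.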

12
               Application-Specific Pipelines for Exploiting Instruction-Level Parallelism
               - Childers, B.R.; Davidson, J.W.
                University of Virginia

Technical Report No. CS-98-14, May 1, 1998

Abstract:

                    Application-specific processor design is a promising approach for meeting the
                    performance and cost goals of a system. Application-specific processors are
                    especially promising for embedded systems (e.g., automobile control systems,
                    avionics, cellular phones, etc.) where a small increase in performance and
                    decrease in cost can have a large impact on a product's viability. Sutherland,
                    Sproull, and Molnar have proposed a new pipeline organization called the
                    Counterflow Pipeline (CFP). This paper shows that the CFP is an ideal architecture
                    for fast, low-cost design of high-performance processors customized for
                    computation-intensive embedded applications. First, we describe why CFPs are
                    particularly well-suited to realizing application-specific processors.
                    Second, we describe how a CFP tailored to an application can be constructed
                    automatically. Third, we present measurements that show CFPs elegantly and simply
                    provide speculative execution, out-of-order execution, and register renaming that is
                    matched to the application. These measurements show that CFP's speculative
                    and out-of-order execution allow it to tolerate frequent control dependences and
                    high-latency operations such as memory accesses. Finally, we show that asynchronous
                    counterflow pipelines may achieve very high performance by reducing the average
                    execution latency of instructions compared with synchronous implementations.
                    Application speedups of up to 7.8 are achieved using custom counterflow pipelines
                    for several well-known kernel loops.

13
               Hierarchical test generation and design for testability methods for ASPPs
               and ASIPs
               - Ghosh, I.; Raghunathan, A.; Jha, N.K.
               Fujitsu Labs. of America, Sunnyvale, CA, USA

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 357 - 370

                March 1999
                                       Vol. 18
                                                           Issue: 3
                                                                                 ISSN: 0278-0070

               References Cited: 38
               CODEN: ITCSDI
               Accession Number: 6196926

Abstract:

                      In this paper, we present design for testability (DFT) and hierarchical test generation
                      techniques for facilitating the testing of application-specific programmable processors
                      (ASPPs) and application-specific instruction processors (ASIPs). The method utilizes the
                      register-transfer level (RTL) circuit description of an ASPP or ASIP to derive a set of
                      test microcode patterns which can be written into the instruction read-only memory (ROM)
                      of the processor. These lines of microcode dictate a new control/data flow in the circuit and
                      can be used to test modules which are not easily testable. The new control/data flow is used
                      to justify precomputed test sets of a module from the system primary inputs to the module
                      inputs and propagate output responses from the module output to the system primary
                      outputs. The testability analysis, which is based on the relevant control/data flow extracted
                      from the RTL circuit, is symbolic. Thus, it is independent of the bit-width of the data path
                      and is extremely fast. The test microcode patterns are a by-product of this analysis. If the
                      derived test microcode cannot test all untested modules in the circuit, then test multiplexers
                      are added (usually to the off-critical paths of the data path) to test these modules. This is
                      done to guarantee the testability of all modules in the circuit. If the control microcode
                      memory of the processor is erasable, then the test microcode lines can be erased once the
                      testing of the chip is over. In that case, the DFT scheme has very little overhead (typically
                      less than 1%). Otherwise, the test microcode lines remain as an overhead in the control
                      memory. The method requires the addition of only one external test pin. Application of this
                      technique to several examples has resulted in a very high fault coverage (above 99.6%) for
                      all of them. The test generation time is about three orders of magnitude smaller compared to
                      an efficient gate-level sequential test generator. The average area overhead (without
                      assuming an erasable ROM) is 3.1% while the delay overheads are negligible. This method
                      does not require any scan in the controller or data path. It is also amenable to at-speed
                      testing.

14
               Functional verification of intellectual properties (IP): a simulation-based
               solution for an application-specific instruction-set processor
               - Stadler, M.; Rower, T.; Kaeslin, H.; Felber, N.; Fichtner, W.; Thalmann, M.
               Integrated Syst. Lab., Swiss Fed. Inst. of Technol., Zurich, Switzerland

This Paper Appears in :
Test Conference, 1999. Proceedings. International

on Pages: 414 - 420

               This Conference was Held : 28-30 Sept. 1999
                1999
                                          ISBN: 0-7803-5753-1

               IEEE Catalog Number: 99CH37034
               Total Pages: xiv+1163
               References Cited: 16
               Accession Number: 6536392

Abstract:

                      Scalability and customization properties of IP modules demand new approaches in
                      functional verification. We present a novel simulation-based solution for an
                      Application-specific Instruction-set Processor (ASIP). Existing assembler code
                      preselected by IP-configurable constraints forms the verification data base (reference
                      stimuli). A behavioral "golden model" of the IP is used to derive expected responses suitable
                      for any possible configuration of the final ASIP (RTL) implementation. Cycle-based
                      verification is performed by stimulating the RTL model with the assembled reference stimuli
                      and by comparing the outputs (actual responses) against the expected responses. Primary
                      input stimulation is accomplished by reading back interface data previously written to a
                      memory (model) under control of the reference stimuli. The synchronization of the
                      configuration-dependent actual responses to the non-cycle-related expected responses is
                      achieved by a mechanism based on "interface-specific activity scheduling", which
                      furthermore efficiently reduces the number of vectors, resulting in a significant
                      simulation speed-up.

15
             Reconfigurable systems: activities in Asia and South Pacific
             - Amano, H.; Shibata, Y.
            Dept. of Comput. Sci., Keio Univ., Yokohama, Japan

This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia and South Pacific

on Pages: 453 - 457

             This Conference was Held : 10-13 Feb. 1998
              1998
                                    ISBN: 0-7803-4425-1

             IEEE Catalog Number: 98EX121
             Total Pages: xxxviii+606
             References Cited: 43
             Accession Number: 5920071

Abstract:

                   Reconfigurable systems and related research in Asia and the South Pacific are surveyed.
                   As in North America and Europe, various platforms, application-specific systems and
                   educational platforms have been proposed and developed.

16
               Exploiting intellectual properties in ASIP designs for embedded DSP
               software
               - Hoon Choi; Ju Hwan Yi; Jong-Yeol Lee; In-Cheol Park; Chong-Min Kyung
               Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Taejon, South Korea

This Paper Appears in :
Design Automation Conference, 1999. Proceedings. 36th

on Pages: 939 - 944

               This Conference was Held : 21-25 June 1999
                1999
                                          ISBN: 1-58113-092-9

               IEEE Catalog Number: 99CH36361
               Total Pages: xxxii+1003
               References Cited: 10
               Accession Number: 6504323

Abstract:

                      The growing need to design high-performance systems correctly in a short time forces
                      designers to use IPs in many designs. In this paper, we propose a new approach to select
                      the optimal set of IPs and interfaces that makes the application program meet the
                      performance constraints in ASIP designs. The proposed approach selects IPs while
                      considering interfaces, and supports concurrent execution of parts of a task in the kernel
                      as software code with other parts in IPs, whereas previous state-of-the-art approaches
                      neither consider IPs and interfaces simultaneously nor support such concurrent execution.
                      The experimental results on
                      real applications show that the proposed approach is effective in making application
                      programs meet the performance constraints using IPs.

17
               Instruction set selection for ASIP design
               - Gschwind, M.
               IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

               This Paper Appears in :
               Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International
               Workshop on

on Pages: 7 - 11

               This Conference was Held : 3-5 May 1999
                1999
                                          ISBN: 1-58113-132-1

               IEEE Catalog Number: 99TH8450
               Total Pages: vii+216
               References Cited: 26
               Accession Number: 6319825

Abstract:

                      We describe an approach for application-specific processor design based on an extendible
                      microprocessor core. Core-based design makes it possible to derive application-specific
                      instruction processors from a common base architecture with low non-recurring engineering
                      cost. The results of this application-specific customization of a common base architecture
                      are families of related and largely compatible processors. These families can share support
                      tools and even binary-compatible code written for the common base architecture.
                      Critical code portions are customized using the application-specific instruction set
                      extensions. We describe a hardware/software co-design methodology which can be used
                      with this design approach. The presented approach uses the processor core to allow early
                      evaluation of ASIP design options using rapid prototyping techniques. We demonstrate this
                      approach with two case studies, based on the implementation and evaluation of
                      application-specific processor extensions for Prolog program execution, and memory
                      prefetching for vector and matrix operations.

18
               Resource constrained dataflow retiming heuristics for VLIW ASIPs
               - Jacome, M.; de Veciana, G.; Akturan, C.
               Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA

               This Paper Appears in :
               Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International
               Workshop on

on Pages: 12 - 16

               This Conference was Held : 3-5 May 1999
                1999
                                          ISBN: 1-58113-132-1

               IEEE Catalog Number: 99TH8450
               Total Pages: vii+216
               References Cited: 14
               Accession Number: 6319826

Abstract:

                      This paper addresses issues in code generation of time critical loops for VLIW ASIPs with
                      heterogeneous distributed register structures. We discuss a code generation phasing whereby
                      one first considers binding options that minimize the significant delays that may be incurred
                      on such processors. Given such a binding we consider retiming, subject to code size
                      constraints, so as to enhance performance. Finally a compatible schedule, minimizing
                      latency, is sought. Our main focus in this paper is on the role retiming plays in this complex
                      code generation problem. We propose heuristic algorithms for exploring code
                      size/performance tradeoffs through retiming. Experimental results are presented indicating
                      that the heuristics perform well on a sample of dataflows.

19
               A hardware/software codesign partitioner for ASIP design
               - Alomary, A.Y.
               Appl. Sci. Univ., Amman, Jordan

               This Paper Appears in :
               Electronics, Circuits, and Systems, 1996. ICECS '96., Proceedings of the Third IEEE International
               Conference on

on Pages: 251 - 254 vol.1

               This Conference was Held : 13-16 Oct. 1996
                1996
                                          Vol. 1
                                                                    ISBN: 0-7803-3650-X

               IEEE Catalog Number: 96TH8229
               Total Pages: 2 vol. xxix+1256
               References Cited: 7
               Accession Number: 5621974

Abstract:

                      This paper introduces a new codesign partitioning method used in automating the design of
                      ASIP (Application Specific Integrated Processor). The codesign partitioning problem is
                      formalized as a combinatorial optimization problem that partitions the operations into
                      hardware and software such that a certain performance goal is met using minimum hardware
                      resources. A branch-and-bound algorithm is used to solve the presented formalization. The
                      proposed method is found to be effective in producing a quality design in reasonable time
                      with a minimum of design interaction.
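
A minimal sketch of branch-and-bound HW/SW partitioning, assuming a deliberately simple cost model that is not taken from the paper: each operation (the `Operation` records and the `OPS` data are hypothetical) has a profiled execution count, a software and a hardware cycle cost, and a hardware area. Total time is the sum of freq * cycles over all operations, and the goal is the minimum hardware area that meets a deadline. Two bounds prune the search: the area of the best solution found so far, and an optimistic time estimate for the operations not yet decided.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Operation:
    name: str
    freq: int       # profiled execution count (hypothetical)
    sw_cycles: int  # cycles per execution when emulated in software
    hw_cycles: int  # cycles per execution with a dedicated hardware unit
    hw_area: int    # area of that hardware unit (arbitrary units)


def partition(ops: List[Operation], deadline: int) -> Optional[Tuple[int, List[str]]]:
    """Return (area, names of HW operations) with minimum area meeting the deadline, or None."""
    n = len(ops)
    # best_rest[i]: fastest possible time for ops[i:], an optimistic (admissible) bound.
    best_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        op = ops[i]
        best_rest[i] = best_rest[i + 1] + op.freq * min(op.sw_cycles, op.hw_cycles)

    best_area = float("inf")
    best_choice: List[str] = []

    def search(i: int, time: int, area: int, chosen: List[str]) -> None:
        nonlocal best_area, best_choice
        if area >= best_area:               # bound: cannot beat the incumbent
            return
        if time + best_rest[i] > deadline:  # bound: deadline is unreachable
            return
        if i == n:                          # complete, feasible, and cheaper
            best_area, best_choice = area, list(chosen)
            return
        op = ops[i]
        # Branch 1: keep the operation in software (no extra area).
        search(i + 1, time + op.freq * op.sw_cycles, area, chosen)
        # Branch 2: implement the operation in hardware.
        chosen.append(op.name)
        search(i + 1, time + op.freq * op.hw_cycles, area + op.hw_area, chosen)
        chosen.pop()

    search(0, 0, 0, [])
    return None if best_area == float("inf") else (int(best_area), best_choice)


OPS = [
    Operation("mul", freq=100, sw_cycles=20, hw_cycles=2, hw_area=50),
    Operation("div", freq=10, sw_cycles=40, hw_cycles=10, hw_area=80),
    Operation("sqrt", freq=5, sw_cycles=100, hw_cycles=20, hw_area=60),
]
print(partition(OPS, deadline=1000))  # (110, ['mul', 'sqrt'])
```

Exploring the software branch first finds cheap solutions early, which tightens the area bound for the rest of the search.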

20
               An ASIP instruction set optimization algorithm with functional module
               sharing constraint
               - Alomary, A.; Nakata, T.; Honma, Y.; Imai, M.; Hikichi, N.
               Toyohashi Univ. of Technol., Japan

               This Paper Appears in :
               Computer-Aided Design, 1993. ICCAD-93. Digest of Technical Papers., 1993 IEEE/ACM
               International Conference on

on Pages: 526 - 532

               This Conference was Held : 7-11 Nov. 1993
                1993
                                          ISBN: 0-8186-4490-7

               IEEE Catalog Number: 93CH3344-9
               Total Pages: xxviii+781
               References Cited: 6
               Accession Number: 4979737

Abstract:

                      This paper describes a formal method that selects the instruction set of an ASIP (application
                      specific integrated processor) that maximizes the chip performance under the constraints of
                      chip area and power consumption. Our contribution includes a new formalization and
                      algorithm that considers the functional module sharing in the problem of instruction set
                      optimization. This problem was not addressed in previous work, and considering it leads
                      to an efficient implementation of the selected instructions. The proposed method also
                      enables designers to predict the performance of their designs before implementing them,
                      which is an important feature for producing a high quality design in reasonable time.
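
The effect of functional module sharing on area can be illustrated with a small exhaustive search. This is a sketch under assumed data: the module areas and candidate instructions (`MODULE_AREA`, `CANDIDATES`) are hypothetical, only an area budget is modeled, and the paper's own formalization (which also constrains power consumption) is not reproduced. A module's area is counted once, however many selected instructions use it.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical area of each functional module (arbitrary units).
MODULE_AREA: Dict[str, int] = {"multiplier": 40, "adder": 10, "shifter": 8}

# Hypothetical candidates: (instruction name, cycles saved, modules it needs).
CANDIDATES: List[Tuple[str, int, FrozenSet[str]]] = [
    ("mac", 500, frozenset({"multiplier", "adder"})),
    ("mul", 300, frozenset({"multiplier"})),
    ("addsh", 120, frozenset({"adder", "shifter"})),
]


def area_of(selection) -> int:
    # Union of required modules: shared modules are paid for only once.
    modules = set().union(*(mods for _, _, mods in selection))
    return sum(MODULE_AREA[m] for m in modules)


def best_instruction_set(budget: int) -> Tuple[int, Tuple[str, ...]]:
    """Return (total cycles saved, instruction names) maximizing savings within budget."""
    best: Tuple[int, Tuple[str, ...]] = (0, ())
    for k in range(len(CANDIDATES) + 1):
        for subset in combinations(CANDIDATES, k):
            if area_of(subset) <= budget:
                gain = sum(saved for _, saved, _ in subset)
                if gain > best[0]:
                    best = (gain, tuple(name for name, _, _ in subset))
    return best


print(best_instruction_set(50))  # (800, ('mac', 'mul'))
```

With sharing, `mac` and `mul` together need only the multiplier and the adder (area 50); counting modules per instruction would have charged 90 and excluded that pair.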

21
               Mapping statechart models onto an FPGA-based ASIP architecture
               - Buchenrieder, K.; Pyttel, A.; Veith, C.
               Corp. Res. & Dev., Siemens AG, Munich, Germany

               This Paper Appears in :
               Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition, Proceedings EURO-DAC
               '96, European

on Pages: 184 - 189

               This Conference was Held : 16-20 Sept. 1996
                1996
                                          ISBN: 0-8186-7573-X

               IEEE Catalog Number: 96CB36000
               Total Pages: xxiii+579
               References Cited: 19
               Accession Number: 5412417

Abstract:

                      In this paper, we describe a system to map hardware-software systems specified with
                      statechart models onto an ASIP architecture based on FPGAs. The architecture consists of a
                      reusable CPU core with enhancements to execute the behavior of statecharts correctly. Our
                      codesign system generates an application-specific hardware control block, an
                      application-specific set of registers, and an instruction stream. The instruction stream
                      consists of a static set of core instructions, and a set of custom instructions for performance
                      enhancements. In contrast to previous approaches, the presented method supports extended
                      statecharts. The system also assists designers during space/time tradeoff optimizations. The
                      benefits of the approach are demonstrated with an industrial control application comparing
                      two different timing schemes.

22
               A hardware/software partitioning algorithm for pipelined instruction set
               processor
               - Binh, N.N.; Imai, M.; Shiomi, A.; Hikichi, N.
               Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan

This Paper Appears in :
Design Automation Conference, 1995, with EURO-VHDL, Proceedings EURO-DAC '95., European

on Pages: 176 - 181

               This Conference was Held : 18-22 Sept. 1995
                1995
                                          ISBN: 0-8186-7156-4

               IEEE Catalog Number: 95CB35850
               Total Pages: xxviii+608
               References Cited: 9
               Accession Number: 5100243

Abstract:

                      This paper proposes a new method to design an optimal instruction set for pipelined ASIP
                      development using a formal HW/SW codesign methodology. The codesign task addressed in
                      this paper is to find a set of HW implemented operations to achieve the highest performance
                      of a pipelined ASIP under given gate count and power consumption constraints. The method
                      makes it possible to estimate the performance and pipeline hazards of the designed ASIP very
                      accurately. The experimental results show that the proposed method is effective and quite
                      efficient.

23
             Architecture synthesis of high-performance application-specific
             processors
             - Breternitz, M., Jr.; Shen, J.P.
             Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

This Paper Appears in :
Design Automation Conference, 1990. Proceedings., 27th ACM/IEEE

on Pages: 542 - 548

             This Conference was Held : 24-28 June 1990
              1990
                                    ISBN: 0-89791-363-9

             Total Pages: xxi+743
             References Cited: 13
             Accession Number: 3976155

Abstract:

                   An automated approach, called architecture synthesis, for designing application-specific
                   processors is presented. The key principles of the application-specific processor design
                   (ASPD) methodology include: a semicustom compilation-driven design/implementation
                   approach, the exploitation of fine-grained parallelism for high performance, and the
                   adaptation of datapath topology to the data transfers required by the application. The
                   powerful microcode compilation techniques of percolation scheduling and pipeline scheduling
                   extract and enhance the parallelism in the application object code to generate an optimized
                   specification of the target processor. Implementation optimization is performed to allocate
                   functional units and register files. Graph-coloring algorithms minimize the amount of
                   hardware needed to exploit available parallelism. Data memory employs an organization with
                   multiple banks. Compilation techniques are used to allocate data over the memory banks to
                   enhance parallel access.
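
Graph coloring for hardware minimization can be sketched for the register case: values whose lifetimes overlap interfere and must occupy different registers, so the number of colors is the number of registers needed. This is a minimal greedy sketch, assuming half-open lifetimes measured in control steps (the values in `LIFETIMES` are hypothetical). For such interval graphs, coloring in order of start time uses the minimum number of colors; for general interference graphs greedy coloring is only a heuristic.

```python
from typing import Dict, List, Set, Tuple


def interference_graph(lifetimes: Dict[str, Tuple[int, int]]) -> Dict[str, Set[str]]:
    """Connect every pair of values whose half-open lifetimes [start, end) overlap."""
    graph: Dict[str, Set[str]] = {v: set() for v in lifetimes}
    names = list(lifetimes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = lifetimes[a], lifetimes[b]
            if s1 < e2 and s2 < e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph: Dict[str, Set[str]], order: List[str]) -> Dict[str, int]:
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color: Dict[str, int] = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


LIFETIMES = {"a": (0, 4), "b": (1, 3), "c": (3, 6), "d": (4, 7), "e": (6, 8)}
graph = interference_graph(LIFETIMES)
order = sorted(LIFETIMES, key=lambda v: LIFETIMES[v][0])  # by start step
colors = greedy_color(graph, order)
print(colors)                                 # {'a': 0, 'b': 1, 'c': 1, 'd': 0, 'e': 1}
print(max(colors.values()) + 1, "registers")  # 2 registers
```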

24
             Architectural considerations for application-specific counterflow
             pipelines
             - Childers, B.R.; Davidson, J.W.
             Editor(s): Wills, D.S., DeWeerth, S.P.
             Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA

This Paper Appears in :
Advanced Research in VLSI, 1999. Proceedings. 20th Anniversary Conference on

on Pages: 3 - 22

             This Conference was Held : 21-24 March 1999
              1999
                                    ISBN: 0-7695-0056-0

             Total Pages: x+380
             References Cited: 29
             Accession Number: 6376051

Abstract:

                   Application-specific processor design is a promising approach for meeting the performance
                   and cost goals of a system. Application-specific processors are especially promising for
                   embedded systems (e.g., digital cameras, cellular phones, etc.) where a small increase in
                   performance and decrease in cost can have a large impact on a product's viability. Sproull,
                   Sutherland and Molnar (see IEEE Design and Test of Computers, vol. 11, no. 3, p. 48-59,
                   1994) have proposed a new pipeline organization called the Counterflow Pipeline (CFP). This
                   paper evaluates CFP design alternatives and shows that the CFP is an ideal architecture for
                   fast, low-cost design of high-performance processors customized for
                   computation-intensive embedded applications. First, we describe why CFPs are particularly
                   well-suited to realizing application-specific processors. Second, we describe how a CFP
                   tailored to an application can be constructed automatically. Third, we present measurements
                   that evaluate CFP design trade-offs and show that CFPs provide speculative and
                   out-of-order execution, and register renaming that is matched to an application. Fourth, we
                   show that asynchronous counterflow pipelines achieve high performance by reducing the
                   average execution latency of instructions over synchronous implementations. Finally, we
                   demonstrate that custom CFPs achieve cycles-per-instruction measurements that are
                   competitive with 4-way superscalar out-of-order processors at a potentially low design
                   complexity.

25
               Instruction-set modelling for ASIP code generation
               - Leupers, R.; Marwedel, P.
               Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
VLSI Design, 1996. Proceedings., Ninth International Conference on

on Pages: 77 - 80

               This Conference was Held : 3-6 Jan. 1996
                1995
                                          ISBN: 0-8186-7228-5

               IEEE Catalog Number: 96TB100010
               Total Pages: xxxiv+439
               References Cited: 13
               Accession Number: 5374969

Abstract:

                      A main objective in code generation for ASIPs is to develop retargetable compilers in order to
                      permit exploration of different architectural alternatives within short turnaround time.
                      Retargetability requires that the compiler be supplied with a formal description of the target
                      processor. This description is usually transformed into an internal instruction set model, on
                      which the actual code generation operates. In this contribution we analyze the demands on
                      instruction set models for retargetable code generation, and we present a formal instruction
                      set model which meets these demands. Compared to previous work, it covers a broad range
                      of instruction formats and includes a detailed view of inter-instruction restrictions.

26
               Instruction-set matching and selection for DSP and ASIP code generation
               - Liem, C.; May, T.; Paulin, P.
               Bell-Northern Res., Ottawa, Ont., Canada

               This Paper Appears in :
               European Design and Test Conference, 1994. EDAC, The European Conference on Design
               Automation. ETC European Test Conference. EUROASIC, The European Event in ASIC Design,
               Proceedings.

on Pages: 31 - 37

               This Conference was Held : 28 Feb.-3 March 1994
                1994
                                          ISBN: 0-8186-5410-4

               IEEE Catalog Number: 94TH0634-6
               Total Pages: xxvii+676
               References Cited: 15
               Accession Number: 4682244

Abstract:

                      The increasing use of digital signal processors (DSPs) and application specific
                      instruction-set processors (ASIPs) has put a strain on the perceived mature state of
                      compiler technology. The presence of custom hardware for application-specific needs has
                      introduced instruction types that lie outside the capabilities of traditional compilers.
                      Thus, these traditional techniques can lead to inefficient and sparsely compacted machine
                      microcode. In this paper, we introduce a novel instruction-set matching and selection
                      methodology, based upon a rich representation useful for DSP and mixed control-oriented
                      applications. This representation shows explicit behaviour that references architecture
                      resource classes. This allows a wide range of instruction types to be captured in a pattern
                      set. The pattern set has been organized in a manner such that matching is extremely efficient
                      and retargeting to architectures with new instruction sets is well defined. The matching and
                      selection algorithms have been implemented in a retargetable code generation system called
                      CodeSyn.
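
Instruction-set matching and selection on an expression tree can be illustrated with classic cost-minimizing tree covering. This is not the CodeSyn algorithm; it is a sketch assuming a toy target with unit-cost `add` and `mul` instructions plus a hypothetical `mac` instruction that covers `add(mul(x, y), z)` in one step, with leaves (strings) already in registers. At each node the cheaper of the plain instruction and any `mac` match rooted there is kept.

```python
from typing import Iterator, List, Tuple, Union

Tree = Union[str, tuple]  # leaf register name, or (op, left, right)


def mac_candidates(tree: tuple) -> Iterator[Tuple[Tree, Tree, Tree]]:
    """Yield (x, y, z) for every way the node matches add(mul(x, y), z)."""
    op, left, right = tree
    if op != "add":
        return
    for m, z in ((left, right), (right, left)):  # add is commutative
        if isinstance(m, tuple) and m[0] == "mul":
            yield m[1], m[2], z


def select(tree: Tree) -> Tuple[int, List[str]]:
    """Return (cost, instruction sequence) of the cheapest cover of tree."""
    if isinstance(tree, str):
        return 0, []  # operand already in a register
    op, left, right = tree
    lc, li = select(left)
    rc, ri = select(right)
    best = (lc + rc + 1, li + ri + [op])  # cover this node with its own instruction
    for x, y, z in mac_candidates(tree):
        cost, code = 1, []
        for sub in (x, y, z):
            c, instrs = select(sub)
            cost += c
            code += instrs
        if cost < best[0]:
            best = (cost, code + ["mac"])
    return best


print(select(("add", ("mul", "a", "b"), ("mul", "c", "d"))))  # (2, ['mul', 'mac'])
```

The recursion recomputes subtrees for brevity; a production selector would memoize per node or label the tree bottom-up once.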

27
               IP-based design of custom field programmable network processors
               - Bombana, M.; Fominykh, N.; Gorla, G.; Kriajev, A.; Krivosheyin, B.; Rytchagov, J.
               Central Res., Italtel Soc. Italiana Telecommun. SpA, Milan, Italy

This Paper Appears in :
Electronics, Circuits and Systems, 1998 IEEE International Conference on

on Pages: 467 - 471 vol.1

               This Conference was Held : 7-10 Sept. 1998
                1998
                                          Vol. 1
                                                                    ISBN: 0-7803-5008-1

               IEEE Catalog Number: 98EX196
               Total Pages: 3 vol. (xxviii+557+557+569)
               References Cited: 9
               Accession Number: 6476137

Abstract:

                      A reuse-based methodology for designing ASIPs (application specific programmable
                      processors) at ASIC cost was tested. Criteria are defined to identify reusable semantics
                      (noninstantiated intellectual properties) within functional specifications written in C. These
                      are isolated as hierarchically nested, object oriented C++ behaviors. A "what-if" exploration
                      flow leads to an optimized hw/sw partitioning of every such IP inside an algorithm running
                      on a programmable architecture. The specific architecture is modeled and taken into account
                      by the sw and hw synthesis tools, not in the IP model. We evaluated the procedure by
                      developing a VLIW custom programmable processor, re-configurable on both hw and sw.
                      This emulator is a prototype for fixed or programmable DSPs, and an archetype of a real-time
                      field retargetable "class" processor, with optimum speed and power performance tuned to
                      every new algorithm/data couple within a certain class of applications. An experiment on
                      processing the real time code for multi-mode communication terminals is reported.

28
               Function unit specialization through code analysis
               - Benyamin, D.; Mangione-Smith, W.H.
               Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA

               This Paper Appears in :
               Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference
               on

on Pages: 257 - 260

               This Conference was Held : 7-11 Nov. 1999
                1999
                                          ISBN: 0-7803-5832-5

               IEEE Catalog Number: 99CH37051
               Total Pages: xxiv+611
               References Cited: 9
               Accession Number: 6441935

Abstract:

                      Many previous attempts at ASIP (application-specific instruction set processor) synthesis
                      have employed template matching techniques to target function units to application code, or
                      directly design new units to extract maximum performance. This paper presents an entirely
                      new approach to specializing hardware for application-specific needs. In our framework of a
                      parameterized VLIW processor, we use a post-modulo scheduling analysis to reduce the
                      allocated hardware resources while increasing the code's performance. Initial results
                      indicate significant savings in area, as well as optimizations to increase FIR filter code
                      performance by 200% to 300%.
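
The abstract does not detail the post-modulo-scheduling analysis. As a hedged illustration of the underlying idea only, the standard resource-constrained lower bound on the initiation interval (ResMII) and the minimum unit count per resource type at a given II can be computed as below. The per-iteration operation counts (`USES`) and the allocation (`UNITS`) are hypothetical, and fully pipelined units (one new operation per unit per cycle) are assumed.

```python
import math
from typing import Dict


def res_mii(uses: Dict[str, int], units: Dict[str, int]) -> int:
    """ResMII = max over resource types of ceil(ops per iteration / units)."""
    return max(math.ceil(uses[r] / units[r]) for r in uses)


def units_needed(uses: Dict[str, int], ii: int) -> Dict[str, int]:
    """Fewest units of each type that still allow the given initiation interval."""
    return {r: math.ceil(n / ii) for r, n in uses.items()}


USES = {"mul": 4, "alu": 6, "mem": 5}   # operations per loop iteration
UNITS = {"mul": 2, "alu": 4, "mem": 1}  # allocated function units
ii = res_mii(USES, UNITS)
print(ii)                      # 5
print(units_needed(USES, ii))  # {'mul': 1, 'alu': 2, 'mem': 1}
```

Here the single memory port bounds II at 5, and at that II one multiplier and two ALUs are enough, so the other units are candidates for removal. The bound alone does not guarantee that a modulo schedule exists at that II; a real flow would confirm it by rescheduling.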

29
            Custom Computing Machines vs. Hardware/Software Codesign : from a globalized point of
            view.
             - Hartenstein, R.W.; Becker, J.; Kress, R.
            University of Kaiserslautern

            Abstract:
                The paper gives a general survey of customized computing, covering research
                activities in the emerging research scenes of Application Specific Instruction
                Set Processors (ASIPs) and Custom Computing Machines (CCMs). Both scenes
                are closely related to Hardware/Software Co-Design. CCMs are mainly based
                on field-programmable add-on hardware that accelerates microprocessors or computers.
                The CCM scene tries to make standard hardware softer, that is, flexibly adaptable to
                a variety of particular application environments. The ASIP scene tries to design an
                instruction set as an interface between hardware and application that closely matches
                their characteristics.

30
             Algorithm and architecture-level design space exploration using
             hierarchical data flows
             - Peixoto, H.P.; Jacome, M.F.
             Editor(s): Thiele, L., Fortes, J., Vissers, K., Taylor, V., Noll, T., Teich, J.
             Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA

             This Paper Appears in :
             Application-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International
             Conference on

on Pages: 272 - 282

             This Conference was Held : 14-16 July 1997
              1997
                                    ISBN: 0-8186-7959-X

             IEEE Catalog Number: 97TB100177
             Total Pages: xii+540
             References Cited: 21
             Accession Number: 5685264

Abstract:

                   Incorporating algorithm and architecture level design space exploration in the early phases of
                   the design process can have a dramatic impact on the area, speed, and power consumption of
                   the resulting systems. This paper proposes a framework for supporting system-level design
                   space exploration and discusses the three fundamental issues involved in effectively
                   supporting such an early design space exploration: definition of an adequate level of
                   abstraction; definition of high-fidelity system-level metrics; and definition of mechanisms for
                   automating the exploration process. The first issue, the definition of an adequate level of
                   abstraction is then addressed in detail. Specifically, an algorithm-level model, an
                   architecture-level model, and a set of operations on these models, are proposed, aiming at
                   efficiently supporting an early, aggressive system-level design space exploration. A
                   discussion on work in progress in the other two topics, metrics and automation, concludes
                   the paper.

31
               Designing with intellectual property
               - Gorla, G.
               Editor(s): Smailagic, A., Brodersen, R., De Man, H.
               Italtel SpA, Milan, Italy

This Paper Appears in :
VLSI '99. Proceedings. IEEE Computer Society Workshop On

on Pages: 125 - 132

               This Conference was Held : 8-9 April 1999
                1999
                                          ISBN: 0-7695-0152-4

               Total Pages: x+133
               References Cited: 9
               Accession Number: 6421923

Abstract:

                      A methodology was developed based on IP reuse, aimed at the design of integrated
                      micro-systems. It was tested on a specific custom ASIP (application specific instruction
                      processor) with good performance. IP occurrences are searched and identified inside the
                      system specification code (C has been used for test), before any architectural or partitioning
                      choice is made. Isolation criteria are their reusability, encapsulation and completeness, while
                      their C++ models are deliberately kept as mutually nestable objects arranged in a number of
                      hierarchical levels. Each such WARELET can be instantiated as a full HW instance (like a
                      black box), as a full software procedure, or as a mix. Every alternative choice gives an IP
                      instance (IPI) whose reuse value is keyed in the IP model and in the parametric synthesis
                      procedures attached to it, not in a single specific implementation. The collection of WARELET instances
                      builds up the specific system instance. The design process is a "what-if": inside the code
                      describing a (sub)system some selected warelets are attributed to a HW implementation.
                      HW synthesis generates blocks that communicate within a pre-defined parametric
                      architectural harness either as coprocessors or as execution units of the instruction set. A
                      parallel stepwise co-synthesis is performed for SW code, re-targeting the microprogram
                      control code and the SW algorithm to every new HW configuration. A profiling process gives
                      performance figures to validate or change the choice. These system-level IPs offer
                      innovative opportunities concerning the management of intellectual value within products and
                      the commercial and industrial infrastructure.

32
               Conception and design of a RISC CPU for the use as embedded controller
               within a parallel multimedia architecture
               - Dogimont, S.; Gumm, M.; Mombers, F.; Mlynek, D.; Torielli, A.
               Editor(s): Thiele, L., Fortes, J., Vissers, K., Taylor, V., Noll, T., Teich, J.
               Ecole Polytech. Federale de Lausanne, Switzerland

               This Paper Appears in :
               Application-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International
               Conference on

on Pages: 412 - 421

               This Conference was Held : 14-16 July 1997
                1997
                                          ISBN: 0-8186-7959-X

               IEEE Catalog Number: 97TB100177
               Total Pages: xii+540
               References Cited: 13
               Accession Number: 5685277

Abstract:

                      In this paper, the problem of defining a high performance control structure for a parallel
                      motion estimation architecture for MPEG2 coding is addressed. Various design and
                      architecture choices are discussed and the final architecture is described. It represents a
                      combined MIMD-SIMD approach which is based on a small but efficient ASIP with subword
                      parallelism.

33
               Software acceleration using coprocessors: is it worth the effort?
               - Edwards, M.
               Comput. Dept., Univ. of Manchester Inst. of Sci. & Technol., UK

               This Paper Appears in :
               Hardware/Software Codesign, 1997. (CODES/CASHE '97)., Proceedings of the Fifth International
               Workshop on

on Pages: 135 - 139

               This Conference was Held : 24-26 March 1997
                1997
                                          ISBN: 0-8186-7895-X

               IEEE Catalog Number: 97TB100115
               Total Pages: ix+179
               References Cited: 13
               Accession Number: 5559220

Abstract:

                      A commonly accepted technique in hardware/software co-design is to implement as many
                      system functions as possible in software and to move performance-critical functions into
                      special-purpose external hardware in order to either satisfy timing constraints or reduce the
                      overall execution time of a program; this is known as "software acceleration". This paper
                      investigates the limits to the performance enhancements obtainable using software
                      acceleration techniques. A practical target architecture, based on the use of programmable
                      logic, is used to illustrate the problems associated with software acceleration. It is shown
                      that, normally, little benefit can be obtained by applying software acceleration methods to
                      general-purpose applications. Whereas software acceleration can profitably be used in a
                      limited number of special-purpose applications, a designer would probably be better off
                      developing ASIP (application-specific instruction-set processor) components, based on
                      heterogeneous multiprocessor architectures.
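
The kind of limit investigated here can be quantified with Amdahl's law, extended with a communication term. This is a generic sketch, not the paper's model: `f` is the fraction of the original run time moved to the coprocessor, `s` the coprocessor's speedup on that fraction, and `comm` an assumed host/coprocessor transfer overhead expressed as a fraction of the original run time.

```python
def accelerated_speedup(f: float, s: float, comm: float = 0.0) -> float:
    """Overall speedup when fraction f of the run time is accelerated by factor s.

    comm is an assumed communication overhead, as a fraction of the original run time.
    """
    if not 0.0 <= f <= 1.0 or s <= 0 or comm < 0:
        raise ValueError("need 0 <= f <= 1, s > 0 and comm >= 0")
    return 1.0 / ((1.0 - f) + f / s + comm)


print(round(accelerated_speedup(0.2, 10), 3))            # 1.22
print(round(accelerated_speedup(0.2, 10, comm=0.1), 3))  # 1.087
```

When only 20% of a general-purpose program is accelerable, even an infinitely fast coprocessor cannot exceed 1 / (1 - 0.2) = 1.25, and a modest transfer overhead erodes most of what remains.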

34
               A constructive method for exploiting code motion
               - dos Santos, L.C.V.; Heijligers, M.J.M.; van Eijk, C.A.J.; van Eijndhoven, J.T.J.; Jess, J.A.G.
               Eindhoven Univ. of Technol., Netherlands

This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on

on Pages: 51 - 56

               This Conference was Held : 6-8 Nov. 1996
                1996
                                          ISBN: 0-8186-7563-2

               IEEE Catalog Number: 96TB100061
               Total Pages: xii+145
               References Cited: 19
               Accession Number: 5450812

Abstract:

                      In this paper we address a resource-constrained optimization problem for behavioral
                      descriptions containing conditionals. In high-level synthesis of ASICs or in code generation
                      for ASIPs, most methods use greedy choices in such a way that the search space is limited
                      by the applied heuristics. For example, they might miss opportunities to optimize across
                      basic block boundaries when treating conditional execution. We propose an approach based
                      on local search and present a constructive method to allow unrestricted types of code
                      motion, while keeping optimal solutions in the search space. A code-motion pruning
                      technique is presented for cost functions optimizing schedule lengths. A technique for
                      treating concurrent flows of execution is also described.

35
               Instruction-set matching and GA-based selection for
               embedded-processor code generation
               - Shu, J.; Wilson, T.C.; Banerji, D.K.
               Dept. of Comput. & Inf. Sci., Guelph Univ., Ont., Canada

This Paper Appears in :
VLSI Design, 1996. Proceedings., Ninth International Conference on

on Pages: 73 - 76

               This Conference was Held : 3-6 Jan. 1996
                1995
                                          ISBN: 0-8186-7228-5

               IEEE Catalog Number: 96TB100010
               Total Pages: xxxiv+439
               References Cited: 9
               Accession Number: 5374968

Abstract:

                      The core tasks of retargetable code generation are instruction-set matching and selection for
                      a given application program and a DSP/ASIP processor. In this paper, we utilize a model of
                      target architecture specification that employs both behavioral and structural information, to
                      facilitate this process. The matching method is based on a pattern tree structure of
                      instructions. This tree structure, generated automatically, is implemented by using a pattern
                      queue and a flag table. The matching process is efficient since it bypasses many patterns in
                      the tree which do not match at certain nodes in the DFG of given application program. Two
                      genetic algorithms are implemented for pattern selection: a pure GA which uses standard GA
                      operators, and a GA with backtracking which employs variable-length chromosomes.
                      Optimal or near-optimal pattern selection is obtained in a reasonable period of time for a
                      wide range of application programs.
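
A genetic algorithm for pattern selection can be sketched as follows. This is a simplified, hypothetical setup rather than the paper's method: the matched patterns (`PATTERNS`) over five DFG nodes are assumed, the chromosome is a fixed-length bit vector (one bit per matched pattern), and the fitness to minimize is total instruction cost plus a penalty for every DFG node that is not covered exactly once. The operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism) are standard choices, not necessarily those of the paper. An exhaustive search is included to check the GA on this tiny instance.

```python
import random
from itertools import product
from typing import FrozenSet, List, Tuple

# Hypothetical matched patterns over DFG nodes 0..4: (covered nodes, cost in cycles).
PATTERNS: List[Tuple[FrozenSet[int], int]] = [
    (frozenset({0}), 1), (frozenset({1}), 1), (frozenset({2}), 1),
    (frozenset({3}), 1), (frozenset({4}), 1),
    (frozenset({0, 1}), 1),     # e.g. a complex instruction covering two nodes
    (frozenset({2, 3, 4}), 2),
    (frozenset({1, 2}), 1),
]
NODES = range(5)
PENALTY = 100


def cost(chrom: List[bool]) -> int:
    """Instruction cost plus a penalty per node not covered exactly once."""
    selected = [p for p, bit in zip(PATTERNS, chrom) if bit]
    total = sum(c for _, c in selected)
    violations = 0
    for node in NODES:
        times_covered = sum(1 for nodes, _ in selected if node in nodes)
        violations += abs(times_covered - 1)
    return total + PENALTY * violations


def ga(pop_size: int = 30, generations: int = 60, seed: int = 1) -> List[bool]:
    rng = random.Random(seed)
    n = len(PATTERNS)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=cost)[:]]  # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 2), key=cost)  # binary tournament
            p2 = min(rng.sample(pop, 2), key=cost)
            cut = rng.randrange(1, n)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [not b if rng.random() < 1 / n else b for b in child]  # mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)


def exhaustive() -> int:
    return min(cost(list(bits)) for bits in product([False, True], repeat=len(PATTERNS)))


best = ga()
print(cost(best), exhaustive())
```

Because of elitism the best cost never gets worse from one generation to the next; on larger DFGs, where exhaustive search is infeasible, the GA trades a guarantee of optimality for bounded run time.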

36
               A hardware/software codesign method for pipelined instruction set
               processor using adaptive database
               - Nguyen Ngoc Binh; Imai, M.; Shiomi, A.; Hikichi, N.
               Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan

               This Paper Appears in :
               Design Automation Conference, 1995. Proceedings of the ASP-DAC '95/CHDL '95/VLSI '95., IFIP
               International Conference on Hardware Description Languages. IFIP International Conference on
               Very Large Scale Integration., Asian and South Pacific

on Pages: 81 - 86

               This Conference was Held : 29 Aug.-1 Sept. 1995
                1995
                                          ISBN: 4-930813-67-0

               IEEE Catalog Number: 95TH8102
               Total Pages: xxxii+860
               References Cited: 11
               Accession Number: 5217819

Abstract:

                      Proposes a new method to design an optimal pipelined instruction set processor using a
                      formal HW/SW codesign methodology. First, a HW/SW partitioning algorithm for selecting
                      an optimal pipelined architecture is introduced briefly. Then, an adaptive database approach
                      is presented that improves the optimality of the design through very accurate
                      estimation of the performance of a pipelined ASIP in HW/SW partitioning. The experimental
                      results show that the proposed methods are effective and efficient.

37
               An integer programming approach to instruction implementation method
               selection problem
               - Imai, M.; Alomary, A.; Sato, J.; Hikichi, N.
               Toyohashi Univ. of Technol., Japan

This Paper Appears in :
Design Automation Conference, 1992., EURO-VHDL '92, EURO-DAC '92. European

on Pages: 106 - 111

               This Conference was Held : 7-10 Sept. 1992
                1992
                                          ISBN: 0-8186-2780-8

               Total Pages: xviii+765
               References Cited: 11
               Accession Number: 4493502

Abstract:

                      A new algorithm for the instruction implementation method selection problem (IMSP) in
                      application specific integrated processors (ASIP) design automation is proposed. This
                      problem must be solved during both instruction set architecture and CPU core architecture design.
                      First, the IMSP is formalized as an integer programming problem, which is to maximize the
                      performance of the CPU under the constraints of chip area and power consumption. Then, a
                      branch-and-bound algorithm to solve IMSP is described. According to the experimental
                      results, the proposed algorithm is quite effective and efficient in solving the IMSP. This
                      algorithm will automate the complex parts of the ASIP chip design.
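The IMSP formulation can be pictured with a small 0-1 model: instruction i, executed f_i times, is given exactly one implementation method j with cycle count c_ij, area a_ij and power p_ij. The goal is to minimize the sum of f_i * c_ij (equivalently, to maximize performance) subject to total area <= A and total power <= P. The Python sketch below solves such a toy model by depth-first branch-and-bound with simple suffix-sum lower bounds. The data layout, the bounding rules and the example numbers are assumptions for illustration and are not the algorithm of Imai et al.

```python
from typing import List, Optional, Sequence, Tuple

# Method = (cycles per execution, area, power); hypothetical units.
Method = Tuple[int, int, int]


def select_methods(freq: Sequence[int], methods: Sequence[Sequence[Method]],
                   area_budget: int, power_budget: int
                   ) -> Tuple[Optional[List[int]], float]:
    """Choose one method index per instruction, minimizing sum(freq * cycles)
    under area and power budgets. Returns (None, inf) if nothing fits."""
    n = len(freq)
    # Suffix sums of per-instruction minima: lower bounds for cost, area, power.
    min_cost = [0] * (n + 1)
    min_area = [0] * (n + 1)
    min_power = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_cost[i] = min_cost[i + 1] + freq[i] * min(m[0] for m in methods[i])
        min_area[i] = min_area[i + 1] + min(m[1] for m in methods[i])
        min_power[i] = min_power[i + 1] + min(m[2] for m in methods[i])

    best_cost = float("inf")
    best_choice: Optional[List[int]] = None
    choice: List[int] = []

    def branch(i: int, cost: int, area: int, power: int) -> None:
        nonlocal best_cost, best_choice
        # Prune if even the cheapest completion breaks a budget ...
        if area + min_area[i] > area_budget or power + min_power[i] > power_budget:
            return
        # ... or cannot beat the best complete assignment found so far.
        if cost + min_cost[i] >= best_cost:
            return
        if i == n:
            best_cost, best_choice = cost, choice[:]
            return
        # Explore faster methods first so a good incumbent is found early.
        for j in sorted(range(len(methods[i])), key=lambda k: methods[i][k][0]):
            cycles, a, p = methods[i][j]
            choice.append(j)
            branch(i + 1, cost + freq[i] * cycles, area + a, power + p)
            choice.pop()

    branch(0, 0, 0, 0)
    return best_choice, best_cost
```

For example, take hypothetical data in which MUL (10 executions) can use a 1-cycle multiplier (area 8, power 5) or a 16-cycle software routine, DIV (2 executions) a 4-cycle divider (area 10, power 6) or a 40-cycle routine, and ADD (50 executions) a single 1-cycle adder (area 1, power 1). Budgets of area 12 and power 10 rule out building both units, so the search keeps the multiplier, emulates division and returns the choice [0, 1, 0] at 140 weighted cycles.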

38
               Performance evaluation for application-specific architectures
               - Jie Gong; Gajski, D.D.; Nicolau, A.
               Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA

This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

on Pages: 483 - 490

                Dec. 1995
                                      Vol. 3
                                                           Issue: 4
                                                                                 ISSN: 1063-8210

               References Cited: 12
               CODEN: IEVSE9
               Accession Number: 5151981

Abstract:

                      Performance evaluation is critical for the minimization of design cost. It consists of two
                      parts: modeling the underlying hardware engine and evaluating the performance of the
                      application code for the model developed in the first part. In this paper, we propose a new
                      parameterized model for application-specific architectures and present a retargetable
                      scheduler for performance evaluation. The model, different from those proposed previously,
                      reflects comprehensive architectural characteristics that affect hardware parallelism. The
                      scheduler, distinguished from previous ones, takes into account not only functional and
                      storage unit resources but also interconnect resources during the performance evaluation.
                      The new architecture model, together with the retargetable scheduler, enables designers to
                      accurately evaluate the performance of a variety of ASIC and ASIP architectures.
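The point about interconnect can be made concrete with a toy resource-constrained list scheduler in which an operation needs a free functional unit of its type and, in the same cycle, a free bus to move its operands. Unit-latency operations, the one-bus-per-operation rule and the alphabetical priority are simplifying assumptions of this Python sketch; it is not the parameterized model or the scheduler of Gong, Gajski and Nicolau.

```python
from typing import Dict, List


def list_schedule(ops: Dict[str, str], preds: Dict[str, List[str]],
                  units: Dict[str, int], buses: int) -> Dict[str, int]:
    """Return a start cycle per operation; every operation takes one cycle.

    ops:   operation name -> functional-unit type it needs
    preds: operation name -> operations whose results it consumes (a DAG)
    units: functional-unit type -> number of instances
    buses: interconnect buses shared by all operations; each scheduled
           operation is assumed to occupy one bus in its cycle
    """
    for o, t in ops.items():
        if units.get(t, 0) < 1:
            raise ValueError(f"no functional unit of type {t!r} for {o!r}")
    if buses < 1:
        raise ValueError("at least one bus is required")

    start: Dict[str, int] = {}
    cycle = 0
    while len(start) < len(ops):
        if cycle > len(ops):
            raise ValueError("preds must form a DAG")
        # Ready: unscheduled ops whose predecessors all started in earlier cycles.
        ready = sorted(o for o in ops if o not in start and
                       all(p in start and start[p] < cycle
                           for p in preds.get(o, [])))
        fu_used = {t: 0 for t in units}
        bus_used = 0
        for o in ready:  # name order stands in for a real priority function
            t = ops[o]
            if fu_used[t] < units[t] and bus_used < buses:
                start[o] = cycle
                fu_used[t] += 1
                bus_used += 1
        cycle += 1
    return start
```

With two multipliers, one adder and two multiplications m1, m2 feeding an addition a1, two buses give a 2-cycle schedule, while a single bus serializes the multiplications and stretches it to 3 cycles, even though the functional units are unchanged.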

39
               TAO-BIST: a framework for testability analysis and optimization of RTL
               circuits for BIST
               - Ravi, S.; Jha, N.K.; Lakshminarayana, G.
               Dept. of Electr. Eng., Princeton Univ., NJ, USA

This Paper Appears in :
VLSI Test Symposium, 1999. Proceedings. 17th IEEE

on Pages: 398 - 406

               This Conference was Held : 25-29 April 1999
                1999
                                          ISBN: 0-7695-0146-X

               IEEE Catalog Number: PR00146
               Total Pages: xxxii+488
               References Cited: 19
               Accession Number: 6450989

Abstract:

                      In this paper, we present TAO-BIST, a framework for testing register-transfer level (RTL)
                      controller-datapath circuits using built-in self-test (BIST). Conventional BIST techniques
                      at the RTL generally introduce more testability hardware than is necessary, thereby causing
                      unnecessary area, delay and power overheads. They have typically been applied to only
                      application-specific integrated circuits (ASICs). TAO-BIST adopts a three-phased
                      approach to provide an efficient BIST framework at the RTL. In the first phase, we identify
                      and add an initial set of test enhancements to the given circuit. In the second phase, we use
                      regular-expression based high-level symbolic testability analysis of a BIST model of the
                      circuit to completely encapsulate justification/propagation information for the modules under
                      test. The regular expressions so obtained are then used to construct a Boolean function in
                      the final phase for determining a test enhancement solution that meets delay constraints with
                      minimal area overheads. Our method is applicable to a wide spectrum of circuits including
                      ASICs, application-specific programmable processors (ASPPs), application-specific
                      instruction processors (ASIPs), digital signal processors (DSPs) and microprocessors.
                      Experimental results on a number of benchmark circuits show that high fault coverage
                      (>99%) can be obtained with our scheme. The average area and delay overheads due to
                      TAO-BIST are only 6.0% and 1.5%, respectively. The test application time to achieve the
                      high fault coverage for the whole controller-datapath circuit is also quite low.

40
               Synthesis of configurable architectures for DSP algorithms
               - Ramanathan, S.; Visvanathan, V.; Nandy, S.K.
               Supercomput. Educ. & Res. Centre, Indian Inst. of Sci., Bangalore, India

This Paper Appears in :
VLSI Design, 1999. Proceedings. Twelfth International Conference On

on Pages: 350 - 357

               This Conference was Held : 7-10 Jan. 1999
                1999
                                          ISBN: 0-7695-0013-7

               IEEE Catalog Number: PR00013
               Total Pages: xxxi+642
               References Cited: 27
               Accession Number: 6324838

Abstract:

                      ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is
                      prohibitive, especially when the volumes involved are low. However, if the architecture
                      synthesis trajectory for such algorithms is such that the target architecture can be identified
                      as an interconnection of elementary parameterized computational structures, then it is
                      possible to attain a close match, both in terms of performance and power with respect to an
                      ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly
                      programmable (configurable) and can be viewed as an application specific instruction-set
                      processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP
                      algorithms.

41
             Memory size estimation for multimedia applications
             - Grun, P.; Balasa, F.; Dutt, N.
             California Univ., Irvine, CA, USA

             This Paper Appears in :
             Hardware/Software Codesign, 1998. (CODES/CASHE '98). Proceedings of the Sixth International
             Workshop on

on Pages: 145 - 149

             This Conference was Held : 15-18 March 1998
              1998
                                    ISBN: 0-8186-8442-9

             IEEE Catalog Number: 98TB100232
             Total Pages: vii+151
             References Cited: 15
             Accession Number: 5894896

Abstract:

                   Memory modules dominate the cost, performance, and power of embedded systems that
                   process multidimensional signals, typically present in image and video processing. Therefore,
                   studying the impact of parallelism on memory size is crucial for trading off system
                   performance against area cost to enable intelligent system partitioning and exploration. We
                   propose a memory size estimation method for algorithmic specifications containing
                   multidimensional arrays and parallel constructs, intended as part of a high-level partitioning
                   and exploration methodology. The system designer can trade off estimation accuracy against
                   run time. We present the results of our estimation approach on a number of image
                   and video processing kernels, and discuss some preliminary results on the influence of
                   parallelism on storage requirement.

42
               Instruction subsetting: Trading power for programmability
               - Dougherty, W.E.; Pursley, D.J.; Thomas, D.E.
               Editor(s): Smailagic, A., Brodersen, R., De Man, H.
               Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

This Paper Appears in :
VLSI '98. System Level Design. Proceedings. IEEE Computer Society Workshop on

on Pages: 42 - 47

               This Conference was Held : 16-17 April 1998
                1998
                                          ISBN: 0-8186-8448-8

               IEEE Catalog Number: 98EX158
               Total Pages: ix+142
               References Cited: 16
               Accession Number: 6046631

Abstract:

                      Power consumption is an increasingly important consideration in the design of mixed
                      hardware/software systems. This work defines the notion of instruction subsetting and
                      explores its use as a means of reducing power consumption from the system level of design.
                      Instruction subsetting is defined as creating an application specific instruction set processor
                      from a more general processor such as a DSP. Although not as effective as an ASIC
                      solution, instruction subsetting provides much of the power savings while maintaining some
                      level of programmability. Instruction set choice strongly affects the savings. We synthesized
                      5 ASIPs through place and route and found that a poorly chosen instruction set may consume
                      more than 4 times the energy of an ASIP with a proper instruction set choice. This finding
                      will allow designers to consider another set of trade-offs in their hardware/software design
                      space exploration.

43
               Embedded software in real-time signal processing systems: application
               and architecture trends
               - Paulin, P.G.; Liem, C.; Cornero, M.; Nacabal, F.; Goossens, G.
               SGS-Thomson Microelectron., Crolles, France

This Paper Appears in :
Proceedings of the IEEE

on Pages: 419 - 435

                March 1997
                                       Vol. 85
                                                           Issue: 3
                                                                                 ISSN: 0018-9219

               References Cited: 60
               CODEN: IEEPAD
               Accession Number: 5550585

Abstract:

                      We present an extensive survey of trends in embedded processor use with an emphasis on
                      emerging applications in wireless communication, multimedia, and general
                      telecommunications. We demonstrate the importance of application-specific instruction-set
                      processors (ASIPs) in high-volume, low cost applications. We also examine some of the
                      underlying trends of the applications in which embedded processors are used. This is
                      followed by a description of embedded software development tool requirements.
                      High-performance software compilation emerges as a key requirement. Finally, specific
                      industrial case studies of products in MPEG, videophone, and low-cost digital signal
                      processor (DSP) applications are used to illustrate the architecture design tradeoffs, and
                      highlight specific tool requirements. A companion paper (Goossens et al., 1997) presents a
                      comprehensive survey of embedded software development tools, focusing mostly on
                      retargetable software compilation.

44
             Embedded software in real-time signal processing systems: design
             technologies
             - Goossens, G.; Van Praet, J.; Lanneer, D.; Geurts, W.; Kifli, A.; Liem, C.; Paulin, P.G.
             Target Compiler Technol., Leuven, Belgium

This Paper Appears in :
Proceedings of the IEEE

on Pages: 436 - 454

              March 1997
                                 Vol. 85
                                                   Issue: 3
                                                                      ISSN: 0018-9219

             References Cited: 97
             CODEN: IEEPAD
             Accession Number: 5550586

Abstract:

                   The increasing use of embedded software, often implemented on a core processor in a
                   single-chip system, is a clear trend in the telecommunications, multimedia, and consumer
                   electronics industries. A companion paper (Paulin et al., 1997) presents a survey of
                   application and architecture trends for embedded systems in these growth markets.
                   However, the lack of suitable design technology remains a significant obstacle in the
                   development of such systems. One of the key requirements is more efficient software
                   compilation technology. Especially in the case of fixed-point digital signal processor (DSP)
                   cores, it is often cited that commercially available compilers are unable to take full advantage
                   of the architectural features of the processor. Moreover, due to the shorter lifetimes and the
                   architectural specialization of many processor cores, processor designers are often
                   compelled to neglect the issue of compiler support. This situation has resulted in an
                   increased research activity in the area of design tool support for embedded processors. This
                   paper discusses design technology issues for embedded systems using processor cores,
                   with a focus on software compilation tools. Architectural characteristics of contemporary
                   processor cores are reviewed and tool requirements are formulated. This is followed by a
                   comprehensive survey of both existing and new software compilation techniques that are
                   considered important in the context of embedded processors.

45
               Designing a Java microcontroller to specific applications
               - Ito, S.A.; Carro, L.; Jacobi, R.P.
               Inst. of Comput. Sci., Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil

This Paper Appears in :
Integrated Circuits and Systems Design, 1999. Proceedings. XII Symposium on

on Pages: 12 - 15

               This Conference was Held : 29 Sept.-2 Oct. 1999
                1999
                                          ISBN: 0-7695-0387-X

               IEEE Catalog Number: PR00387
               Total Pages: xiii+236
               References Cited: 14
               Accession Number: 6520680

Abstract:

                      Stack machines are known to provide code compactness and simple execution
                      engines, both important features when implementing small devices. This paper discusses some
                      benefits, problems and open questions of using a stack-based microcontroller to support
                      native execution of Java bytecode. The discussion is based on our experience in designing a
                      Java ASIP in FPGA, in order to explore software compatibility, reconfiguration capability and
                      the small size of optimized microcontrollers to implement specific applications. The paper
                      also presents the synthesized machine architecture and shows some area and speed results.

46
               A design and tool reuse methodology for rapid prototyping of application
               specific instruction set processors
               - Young Geol Kim; Tag Gon Kim
               Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul, South Korea

This Paper Appears in :
Rapid System Prototyping, 1999. IEEE International Workshop on

on Pages: 46 - 51

               This Conference was Held : 16-18 June 1999
                1999
                                          ISBN: 0-7695-0246-6

               IEEE Catalog Number: PR00246
               Total Pages: x+243
               References Cited: 5
               Accession Number: 6325340

Abstract:

                      This paper proposes a design method and a tool reuse scheme for the rapid prototyping of
                      application-specific instruction-set processors (ASIPs). We propose a three-level
                      hierarchical architecture abstraction method for top-down processor design. We also
                      propose a reusable architecture description language (READ) and a family of retargetable
                      simulators that allow top-down processor description and prototyping from instruction-set
                      design to RTL implementation.

47
               MetaCore: an application specific DSP development system
               - Jin-Hyuk Yang; Byoung-Woon Kim; Sang-Jun Nam; Jang-Ho Cho; Sung-Won Seo; Chang-Ho Ryu;
               Young-Su Kwon; Dae-Hyun Lee; Jong-Yeol Lee; Jong-Sun Kim; Hyun-Dhong Yoon; Jae-Yeol Kim;
               Kun-Moo Lee; Chan-Soo Hwang; In-Hyung Kim; Jun-Sung Kim; Kwang-Il Park; Kyu-Ko Park;
               Yong-Hoon Lee; Seung-Ho Hwang; In-Cheol Park; Chong-Min Kyung
               Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul, South Korea

This Paper Appears in :
Design Automation Conference, 1998. Proceedings

on Pages: 800 - 803

               This Conference was Held : 15-19 June 1998
                1998
                                          ISBN: 0-89791-964-5

               IEEE Catalog Number: 98CH36175
               Total Pages: xxxii+820
               References Cited: 6
               Accession Number: 6084493

Abstract:

                      This paper describes the MetaCore system, an ASIP (Application-Specific
                      Instruction set Processor) development system targeted at DSP applications. The goal of
                      the MetaCore system is to offer an efficient design methodology meeting specifications given as
                      a combination of performance, cost and design turnaround time. The MetaCore system consists
                      of two major design stages: design exploration and design generation. In the design
                      exploration stage, the MetaCore system accepts a set of benchmark programs and a formal
                      specification of ISA (Instruction Set Architecture), and estimates the hardware cost and
                      performance for each hardware configuration being explored. Once a hardware configuration
                      is chosen, the system helps generate a VLSI processor design in the form of HDL along with
                      the application program development tools such as C compiler, assembler and instruction set
                      simulator.

48
               Register Assignment Through Resource Classification For ASIP
               Microcode Generation
               - Liem, C.; May, T.; Paulin, P.

This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on

on Pages: 397 - 402

This Conference was Held : November 6-10, 1994
ISSN: 1063-6757

Abstract:

Not Available

49
               Processor-core based design and test
               - Marwedel, P.
               Dortmund Univ., Germany

This Paper Appears in :
Design Automation Conference, 1997. Proceedings of the ASP-DAC '97 Asia and South Pacific

on Pages: 499 - 502

               This Conference was Held : 28-31 Jan. 1997
                1997
                                          ISBN: 0-7803-3662-3

               IEEE Catalog Number: 97TH8231
               Total Pages: xxxii+691
               References Cited: 52
               Accession Number: 5559031

Abstract:

                      This paper responds to the rapidly increasing use of various cores for implementing
                      systems-on-a-chip. It specifically focuses on processor cores. We give some examples of
                      cores, including DSP cores and application-specific instruction-set processors (ASIPs).
                      We mention market trends for these components, and we touch on design procedures, in
                      particular the use of compilers. Finally, we discuss the problem of testing core-based
                      designs. Existing solutions include boundary scan, embedded in-circuit emulation (ICE), the
                      use of processor resources for stimuli/response compaction and self-test programs.

50
               Hierarchical Test Generation And Design For Testability Of ASPPs and
               ASIPs
               - Ghosh, L.; Raghunathan, A.; Jha, N.K.
               Princeton University, Princeton, NJ 08544

This Paper Appears in :
Design Automation Conference, 1997. Proceedings of the 34th

on Pages: 534 - 539

This Conference was Held : June 9-13, 1997
ISSN: 0738-100X

Abstract:

Not Available

51
               Retargetable generation of code selectors from HDL processor models
               - Leupers, R.; Marwedel, P.
               Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
European Design and Test Conference, 1997. ED&TC 97. Proceedings

on Pages: 140 - 144

               This Conference was Held : 17-20 March 1997
                1997
                                          ISBN: 0-8186-7786-4

               IEEE Catalog Number: 97TB100102
               Total Pages: xxxvi+634
               References Cited: 22
               Accession Number: 5622676

Abstract:

                      Besides high code quality, a primary issue in embedded code generation is retargetability of
                      code generators. This paper presents techniques for automatic generation of code selectors
                      from externally specified processor models. In contrast to previous work, our retargetable
                      compiler RECORD does not require tool-specific modelling formalisms, but starts from
                      general HDL processor models. From an HDL model, all processor aspects needed for code
                      generation are automatically derived. As demonstrated by experimental results, short
                      turnaround times for retargeting are achieved, which permits study of the HW/SW trade-off
                      between processor architectures and program execution speed.
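RECORD derives its code selector automatically from the HDL model; that derivation is not sketched here. What such a generated selector has to do at compile time can be illustrated with generic dynamic-programming tree covering: at every expression-tree node, try each instruction pattern that matches there and add the best covers of the subtrees the pattern leaves open. The tuple-based tree encoding, the four patterns (LOAD, ADD, MUL and a multiply-accumulate MAC) and their costs are hypothetical, and this Python sketch is not RECORD's implementation.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Expression trees are nested tuples (op, child, ...); leaves are ("var", name).
# In a pattern, "reg" marks an open slot whose subtree is covered separately.
# Hypothetical pattern set: name -> (pattern tree, cost).
PATTERNS = {
    "LOAD": (("var",), 1),                           # variable into a register
    "ADD": (("+", "reg", "reg"), 1),
    "MUL": (("*", "reg", "reg"), 2),
    "MAC": (("+", ("*", "reg", "reg"), "reg"), 2),   # multiply-accumulate
}


def match(pat, tree):
    """Return the subtrees bound to 'reg' slots, or None if pat does not match."""
    if pat == "reg":
        return [tree]
    if pat[0] != tree[0]:
        return None
    if pat[0] == "var":
        return []  # variable leaf: the name does not affect matching
    if len(pat) != len(tree):
        return None
    slots = []
    for p_child, t_child in zip(pat[1:], tree[1:]):
        sub = match(p_child, t_child)
        if sub is None:
            return None
        slots.extend(sub)
    return slots


@lru_cache(maxsize=None)
def best_cover(tree) -> Tuple[int, Tuple[str, ...]]:
    """Minimum cost and instruction sequence (operands first) covering tree."""
    best: Optional[Tuple[int, Tuple[str, ...]]] = None
    for name, (pat, cost) in PATTERNS.items():
        slots = match(pat, tree)
        if slots is None:
            continue
        total, code = cost, ()
        for sub in slots:
            sub_cost, sub_code = best_cover(sub)
            total += sub_cost
            code += sub_code
        code += (name,)
        if best is None or total < best[0]:
            best = (total, code)
    if best is None:
        raise ValueError(f"no pattern covers operator {tree[0]!r}")
    return best
```

For a*b + c, encoded as ("+", ("*", ("var", "a"), ("var", "b")), ("var", "c")), the selector prefers the MAC cover (three loads plus MAC, cost 5) over ADD on top of MUL (cost 6), which is the kind of architecture-specific choice a retargetable selector must make per target.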

52
               Methods for retargetable DSP code generation
               - Leupers, R.; Niemann, R.; Marwedel, P.
               Editor(s): Rabaey, J., Chau, P.M., Eldon, J.
               Dept. of Comput. Sci. XII, Dortmund Univ., Germany

This Paper Appears in :
VLSI Signal Processing, VII, 1994., [Workshop on]

on Pages: 127 - 136

               This Conference was Held : 26-28 Oct. 1994
                1994
                                          ISBN: 0-7803-2123-5

               IEEE Catalog Number: 94TH8008
               Total Pages: xii+511
               References Cited: 9
               Accession Number: 5105714

Abstract:

                      Efficient embedded DSP system design requires methods of hardware/software codesign. In
                      this contribution we focus on software synthesis for partitioned system behavioral
                      descriptions. In previous approaches, this task is performed by compiling the behavioral
                      descriptions onto standard processors using target-specific compilers. It is argued that
                      abandoning this restriction allows for higher degrees of freedom in design space exploration.
                      In turn, this demands retargetable code generation tools. We present different schemes
                      for DSP code generation using the MSSQ microcode generator. Experiments with industrial
                      applications revealed that retargetable DSP code generation based on structural hardware
                      descriptions is feasible, but there exists a strong dependency between the behavioral
                      description style and the resulting code quality. As a result, necessary features of
                      high-quality retargetable DSP code generators are identified.

53
               Industrial experience using rule-driven retargetable code generation for
               multimedia applications
               - Liem, C.; Paulin, P.; Cornero, M.; Jerraya, A.
               Inst. Nat. Polytech. de Grenoble, France

This Paper Appears in :
System Synthesis, 1995., Proceedings of the Eighth International Symposium on

on Pages: 60 - 65

               This Conference was Held : 13-15 Sept. 1995
                1995
                                          ISBN: 0-8186-7076-2

               IEEE Catalog Number: 95TH8050
               Total Pages: xiii+175
               References Cited: 13
               Accession Number: 5087877

Abstract:

                      The increasing usage of application-specific instruction set processors (ASIPs) in audio and
                      video telecommunications has made strong demands on the rapid availability of dedicated
                      compilers. A rule-driven approach to code generation may have benefits over model-based
                      approaches as the user is not confined to the capabilities supported by the model. However,
                      the sole use of transformation rules may or may not be sufficient in optimization abilities
                      depending on the target architecture. This paper outlines experiences with a rule-driven code
                      generation approach for two applications in audio and video processing. The first is a
                      controller for the VideoPhone codec at SGS-Thomson Microelectronics. The second is a
                      VLIW (very long instruction word) processor for high-fidelity and MPEG audio at Thomson
                      Consumer Electronic Components. The experience has shown that a rule-driven approach to
                      compilation is applicable to both the controller and VLIW architectures; however, it is limited
                      in optimization abilities for the latter.

54
               Prototyping and reengineering of microcontroller-based systems
               - Carro, L.; Pereira; Suzim, A.
               Dept. de Engenharia Electrica & Pos Graduacao em Ciencia de Computacao, Univ. Federal do Rio Grande do
               Sul, Porto Alegre, Brazil

This Paper Appears in :
Rapid System Prototyping, 1996. Proceedings., Seventh IEEE International Workshop on

on Pages: 178 - 182

               This Conference was Held : 19-21 June 1996
                1996
                                          ISBN: 0-8186-7603-5

               IEEE Catalog Number: 96TB100055
               Total Pages: ix+189
               References Cited: 9
               Accession Number: 5317120

Abstract:

                      This paper describes our current research in the field of systems design, aiming at an
                      Application Specific Integrated System (ASIS). Our target system is based on industry
                      applications. We show a design approach for changing existing boards based on
                      classical microcontrollers, migrating from the CISC architecture to an ASIP architecture. The
                      studied examples show meaningful gains in the total area of the processor.

55
               Embedded architecture co-synthesis and system integration
               - Lin, B.; Vercauteren, S.; De Man, H.
               Editor(s): Thomas, D., Ernst, R.
               IMEC, Leuven, Belgium

               This Paper Appears in :
               Hardware/Software Co-Design, 1996. (Codes/CASHE '96), Proceedings., Fourth International
               Workshop on

on Pages: 2 - 9

               This Conference was Held : 18-20 March 1996
                1996
                                          ISBN: 0-8186-7243-9

               IEEE Catalog Number: 96TB100020
               Total Pages: ix+141
               References Cited: 16
               Accession Number: 5256458

Abstract:

                      Embedded system architectures comprising software programmable components (e.g.
                      DSP, ASIP, and micro-controller cores) and customized hardware co-processors, integrated
                      into a single cost-efficient VLSI chip, are emerging as a key solution to today's
                      microelectronics design problems. This trend is being driven by new emerging applications in
                      the areas of wireless communication, high-speed optical networking, and multimedia
                      computing. A key problem confronted by embedded system designers today is the rapid
                      prototyping of application-specific embedded system architectures where different
                      combinations of programmable processors and hardware components must be integrated
                      together, while ensuring that the hardware and software parts communicate correctly. In this
                      paper, we present a solution to this embedded architecture co-synthesis and system
                      integration problem based on an orchestrated combination of architectural strategies,
                      parameterized libraries, and software CAD tools.

56
               Memory bank and register allocation in software synthesis for ASIPs
               - Sudarsanam, A.; Malik, S.
               Dept. of Electr. Eng., Princeton Univ., NJ, USA

               This Paper Appears in :
               Computer-Aided Design, 1995. ICCAD-95. Digest of Technical Papers., 1995 IEEE/ACM
               International Conference on

on Pages: 388 - 392

               This Conference was Held : 5-9 Nov. 1995
                1995
                                          ISBN: 0-8186-7213-7

               IEEE Catalog Number: 95CB35859
               Total Pages: xxviii+743
               References Cited: 10
               Accession Number: 5145258

Abstract:

                      An architectural feature commonly found in digital signal processors (DSPs) is multiple
                      data-memory banks. This feature increases memory bandwidth by permitting multiple
                      memory accesses to occur in parallel when the referenced variables belong to different
                      memory banks and the registers involved are allocated according to a strict set of conditions.
                      Unfortunately, current compiler technology is unable to take advantage of the potential
                      increase in parallelism offered by such architectures. Consequently, most application
                      software for DSP systems is hand-written, a very time-consuming task. We present an
                      algorithm which attempts to maximize the benefit of this architectural feature. While
                      previous approaches have decoupled the phases of register allocation and memory bank
                      assignment, our algorithm performs these two phases simultaneously. Experimental results
                      demonstrate that our algorithm substantially improves the code quality of many
                      compiler-generated and even hand-written programs.
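As an illustration of the bank-assignment half of this problem only, it can be viewed as a weighted max-cut: variables that could be loaded in the same instruction should land in different banks (X and Y). The sketch below is a plain greedy local search under assumed inputs (the `edges` weights and the helper names are hypothetical); it ignores register allocation, which the paper handles jointly, and it is not the authors' algorithm.

```python
def assign_banks(variables, edges, max_passes=100):
    """Greedy local search for a two-bank (X/Y) data-memory assignment.

    edges maps frozenset({u, v}) -> weight, where the weight counts how often
    u and v could be fetched in the same instruction (hypothetical input).
    A bank flip is accepted only if it increases the weight of pairs placed
    in different banks, so the result is a local optimum.
    """
    bank = {v: "X" for v in variables}

    def gain(v):
        # Change in parallel-access weight if v moved to the other bank.
        g = 0
        for pair, w in edges.items():
            if v in pair:
                (u,) = pair - {v}
                g += w if bank[u] == bank[v] else -w
        return g

    for _ in range(max_passes):
        improved = False
        for v in variables:
            if gain(v) > 0:
                bank[v] = "Y" if bank[v] == "X" else "X"
                improved = True
        if not improved:
            break
    return bank


def parallel_weight(bank, edges):
    """Total weight of variable pairs that sit in different banks."""
    return sum(w for pair, w in edges.items() if len({bank[v] for v in pair}) == 2)


edges = {frozenset("ab"): 3, frozenset("cd"): 2, frozenset("ac"): 1}
bank = assign_banks("abcd", edges)
print(bank, parallel_weight(bank, edges))
```

On this toy input the search stops at weight 5, while placing a and c in different banks as well would reach 6, which shows why such heuristics only guarantee local optima.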

57
Rapid Prototyping of Application-Specific Counterflow pipelines
- Childers B.; Davidson J.

University of Virginia

Technical Report CS-99-01

            Abstract
                Application-specific processor (ASIP) design is a promising approach for meeting
                the performance and cost goals of an embedded system. We have developed a new
                microarchitecture for automatically constructing ASIPs. This new architecture,
                called a wide counterflow pipeline (WCFP), is based on the counterflow pipeline
                organization proposed by Sproull, Sutherland, and Molnar. Our ASIP synthesis
                technique uses software pipelining and design-space exploration to generate
                a custom WCFP and instruction set for an embedded application. This type of
                architecture synthesis requires an infrastructure for rapidly prototyping ASIPs
                to evaluate design trade-offs. This paper presents the requirements and
                implementation of such an environment for automatic design of WCFPs. First,
                we describe a database for specifying design elements and architectural
                constraints. Second, we present an intermediate representation for WCFP synthesis
                and reconfigurable simulation. Finally, we describe a fast and reconfigurable
                simulation methodology for WCFPs.

58
             Processor evaluation in an embedded systems design environment
             - Gupta, T.V.K.; Sharma, P.; Balakrishnan, M.; Malik, S.
             Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Delhi, India

This Paper Appears in :
VLSI Design, 2000. Thirteenth International Conference on

on Pages: 98 - 103

             This Conference was Held : 3-7 Jan. 2000
              2000
                                    ISBN: 0-7695-0487-6

             Total Pages: xxxiv+588
             References Cited: 10
             Accession Number: 6576983

Abstract:

                   In this paper we present a novel methodology for processor evaluation in an embedded
                   systems design environment. This evaluation can help in either selecting a suitable processor
                   core or in evaluating changes to an ASIP. The processor evaluation is carried out in two
                   stages. First, an architecture-independent stage in which processors are rejected based on
                   key application parameters; second, an architecture-dependent stage in which
                   performance is estimated on selected processors. The contribution of our work includes
                   identification of application parameters which can influence processor selection, a
                   mechanism to capture widely varying processor architectures and an instruction constrained
                   scheduler. Initial experimental results suggest the potential of this approach.
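A minimal sketch of the two-stage idea, assuming hypothetical processor attributes and a profiled operation count: stage one rejects cores on application parameters alone, and stage two estimates run time for the survivors. The issue-width bound used in stage two is a deliberate simplification standing in for the instruction-constrained scheduler described in the paper; all names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    word_bits: int      # native data width
    has_mac: bool       # single-cycle multiply-accumulate available
    issue_width: int    # operations issued per cycle
    clock_mhz: float


@dataclass
class AppProfile:
    min_word_bits: int  # narrowest acceptable data width
    needs_mac: bool
    op_count: int       # dynamic operation count from profiling (hypothetical)
    deadline_us: float


def stage1_filter(procs, app):
    """Architecture-independent stage: reject cores on key application parameters."""
    return [p for p in procs
            if p.word_bits >= app.min_word_bits and (p.has_mac or not app.needs_mac)]


def stage2_estimate_us(p, app):
    """Architecture-dependent stage (crude): issue-bound cycles / clock frequency."""
    cycles = -(-app.op_count // p.issue_width)  # ceiling division
    return cycles / p.clock_mhz                 # cycles / MHz = microseconds


def evaluate(procs, app):
    """Return (estimated time, name) for surviving cores that meet the deadline."""
    ranked = []
    for p in stage1_filter(procs, app):
        t = stage2_estimate_us(p, app)
        if t <= app.deadline_us:
            ranked.append((t, p.name))
    return sorted(ranked)


procs = [Processor("A", 16, True, 1, 100.0),
         Processor("B", 32, True, 2, 80.0),
         Processor("C", 32, False, 4, 200.0)]
app = AppProfile(min_word_bits=16, needs_mac=True, op_count=10_000, deadline_us=100.0)
print(evaluate(procs, app))  # [(62.5, 'B'), (100.0, 'A')]
```

Core C is the fastest on paper but is rejected in stage one because it lacks a MAC unit, so it never reaches the more expensive estimation stage.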

59
             System Design Based on Single Language and Single-Chip Java ASIP
             Microcontroller
             - Akira, S.; Carro, I.L.; Jacobi, R.P.
             UFRGS - Brazil

This Paper Appears in :
Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings

on Pages: 703 - 707

             This Conference was Held : March 27-30, 2000
              2000
                                    ISBN: 0-7695-0537-6

Abstract:

Not Available

60
             The construction of a retargetable simulator for an architecture template
             - Kienhuis, B.; Deprettere, E.; Vissers, K.; van der Wolf, P.
             Delft Univ. of Technol., Netherlands

             This Paper Appears in :
             Hardware/Software Codesign, 1998. (CODES/CASHE '98). Proceedings of the Sixth International
             Workshop on

on Pages: 125 - 129

             This Conference was Held : 15-18 March 1998
              1998
                                    ISBN: 0-8186-8442-9

             IEEE Catalog Number: 98TB100232
             Total Pages: vii+151
             References Cited: 13
             Accession Number: 5894893

Abstract:

                   Systems in the domain of high-performance video signal processing are becoming more and
                   more programmable. We suggest an approach to design such systems that involves
                   measuring, via simulation, the performance of various architectures on which a set of
                   applications are mapped. This approach requires a retargetable simulator for an architecture
                   template. We describe the retargetable simulator that we constructed for a stream-oriented
                   application-specific dataflow architecture. For each architecture instance of the architecture
                   template, a specific simulator is derived in three steps: the architecture instance is
                   constructed, an execution model is added, and the executable architecture is instrumented to
                   obtain performance numbers. We used object oriented principles together with a high-level
                   simulation mechanism to ensure retargetability and an efficient simulation speed. Finally we
                   explain how a retargetable simulator can be encapsulated within an environment for
                   automated design space exploration.
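The three derivation steps can be mirrored in a toy sketch. The template here is a hypothetical linear chain of units, each with a fixed per-token latency and unbounded buffers between them; the paper's dataflow template, simulation mechanism and metrics are richer, and the names below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Processing element of a hypothetical stream-oriented architecture template."""
    name: str
    latency: int          # cycles to process one token
    busy_until: int = 0   # execution-model state
    tokens: int = 0       # instrumentation: tokens processed
    busy_cycles: int = 0  # instrumentation: cycles spent working


def build_instance(params):
    """Step 1: construct an architecture instance from (name, latency) parameters."""
    return [Unit(name, latency) for name, latency in params]


def run(instance, n_tokens):
    """Steps 2 and 3: a simple execution model (tokens pass through the units in
    order, with unbounded buffers between them) that also updates the
    instrumentation counters. Returns the cycle at which the last token leaves."""
    finish = 0
    for _ in range(n_tokens):
        t = 0  # every token is available at the stream source at cycle 0
        for u in instance:
            start = max(t, u.busy_until)
            u.busy_until = start + u.latency
            u.tokens += 1
            u.busy_cycles += u.latency
            t = u.busy_until
        finish = t
    return finish


arch = build_instance([("filter", 2), ("scale", 3)])
total = run(arch, 4)
for u in arch:
    print(u.name, u.tokens, f"utilization={u.busy_cycles / total:.2f}")
```

Retargeting in this toy amounts to passing a different parameter list to `build_instance`; the execution model and the instrumentation stay unchanged.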

61
             A framework for retargetable code generation using simulated annealing
             - Visser, B.-S.
             Dept. of Comput. Sci., Twente Univ., Enschede, Netherlands

This Paper Appears in :
EUROMICRO Conference, 1999. Proceedings. 25th

on Pages: 458 - 462 vol.1

             This Conference was Held : 8-10 Sept. 1999
              1999
                                    Vol. 1
                                                           ISBN: 0-7695-0321-7

             Total Pages: 2 vol. (xxviii+530+478)
             References Cited: 11
             Accession Number: 6364161

Abstract:

                   Co-development of hardware and software is a methodology dealing with the increased
                   design complexity of embedded systems. Retargetable code generation is a co-designing
                   method to map a high-level software description onto a variety of hardware architectures
                   without the need to rewrite a compiler. Highly efficient code generation is required to meet,
                   for example, timing, area and low-power constraints. The traditional ordering of code
                   generation phases introduces inefficiencies in the code generation process; phase-coupling
                   deals with these inefficiencies. We introduce a new code generation technique based on
                   simulated annealing. This technique focuses especially on highly irregular DSP architectures
                   and is part of a generic framework for retargetable code generation. This approach is new
                   because it fully tackles the phase-coupling problem. Furthermore, this approach shows that
                   the modeling of the software algorithm and the hardware architecture plays a key role in the
                   efficiency of code generation.
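To make the simulated-annealing idea concrete, here is a scheduling-only sketch, assuming unit-latency operations and a machine that issues at most `width` operations per cycle. Annealing searches over the priority order fed to a list scheduler; it does not couple instruction selection or register allocation as the paper's framework does, and the cooling parameters are arbitrary choices.

```python
import math
import random


def list_schedule(order, deps, width):
    """Issue up to `width` ready operations per cycle, preferring those that
    appear earlier in `order`. deps maps op -> predecessors (unit latency)."""
    cycle_of, cycle, remaining = {}, 0, list(order)
    while remaining:
        issued = []
        for op in remaining:
            if len(issued) == width:
                break
            # cycle_of only holds ops from earlier cycles, so this is readiness.
            if all(p in cycle_of for p in deps.get(op, ())):
                issued.append(op)
        for op in issued:
            cycle_of[op] = cycle
            remaining.remove(op)
        cycle += 1
    return cycle_of


def schedule_length(order, deps, width):
    return max(list_schedule(order, deps, width).values()) + 1


def anneal(ops, deps, width, steps=2000, t0=2.0, alpha=0.995, seed=0):
    """Simulated annealing over the priority order given to the list scheduler."""
    rng = random.Random(seed)
    order = list(ops)
    cost = schedule_length(order, deps, width)
    best, best_cost, temp = order[:], cost, t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        cand = order[:]
        cand[i], cand[j] = cand[j], cand[i]
        c = schedule_length(cand, deps, width)
        # Always accept non-worsening moves; accept worse ones with
        # probability exp(-delta / temperature).
        if c <= cost or rng.random() < math.exp((cost - c) / temp):
            order, cost = cand, c
            if cost < best_cost:
                best, best_cost = order[:], cost
        temp *= alpha
    return best, best_cost


ops = ["x1", "x2", "x3", "a", "b", "c"]
deps = {"b": {"a"}, "c": {"b"}}  # a -> b -> c is the critical path
print(schedule_length(ops, deps, width=2))  # 4 with the initial order
order, length = anneal(ops, deps, width=2)
print(order, length)
```

The initial order spends the first issue slots on independent operations before the critical path a, b, c; annealing finds an order that reaches the three-cycle lower bound set by that path.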

62
             Instruction selection, resource allocation, and scheduling in the AVIV
             retargetable code generator
             - Hanono, S.; Devadas, S.
             Dept. of Electr. Eng. & Comput. Sci., MIT, MA, USA

This Paper Appears in :
Design Automation Conference, 1998. Proceedings

on Pages: 510 - 515

             This Conference was Held : 15-19 June 1998
              1998
                                    ISBN: 0-89791-964-5

             IEEE Catalog Number: 98CH36175
             Total Pages: xxxii+820
             References Cited: 11
             Accession Number: 6084458

Abstract:

                   The AVIV retargetable code generator produces optimized machine code for target
                   processors with different instruction set architectures. AVIV optimizes for minimum code
                   size. Retargetable code generation requires the development of heuristic algorithms for
                   instruction selection, resource allocation, and scheduling. AVIV addresses these code
                   generation subproblems concurrently, whereas most current code generation systems
                   address them sequentially. It accomplishes this by converting the input application to a
                   graphical (Split-Node DAG) representation that specifies all possible ways of implementing
                   the application on the target processor. The information embedded in this representation is
                   then used to set up a heuristic branch-and-bound step that performs functional unit
                   assignment, operation grouping, register bank allocation, and scheduling concurrently. While
                   detailed register allocation is carried out as a second step, estimates of register
                   requirements are generated during the first step to ensure high quality of the final assembly
                   code. We show that near-optimal code can be generated for basic blocks for different
                   architectures within reasonable amounts of CPU time. Our framework thus allows us to
                   accurately evaluate the performance of different architectures on application code.
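The branch-and-bound step can be illustrated on one of its subproblems, functional unit assignment, under strong simplifying assumptions: operations are independent, each has a hypothetical cycle cost per unit, and the busiest unit's load approximates schedule length. AVIV's actual search runs over the Split-Node DAG and also handles operation grouping, register banks and scheduling, none of which is modeled here.

```python
def assign_units(op_costs, n_units):
    """Branch-and-bound assignment of operations to functional units.

    op_costs[i][u]: cycles operation i occupies unit u (use a large number
    when u cannot execute i). Minimizes the load of the busiest unit, used
    here as a simple stand-in for schedule length.
    """
    n_ops = len(op_costs)
    loads = [0] * n_units
    choice = [None] * n_ops
    best = {"cost": float("inf"), "choice": None}

    def lower_bound(i):
        # Valid bound for any completion of the partial assignment:
        # loads never decrease, each remaining op costs at least its minimum,
        # and the total work is spread over n_units.
        mins = [min(op_costs[k]) for k in range(i, n_ops)]
        spread = (sum(loads) + sum(mins)) / n_units
        return max(max(loads), max(mins, default=0), spread)

    def branch(i):
        if i == n_ops:
            best["cost"], best["choice"] = max(loads), choice[:]
            return
        for u in range(n_units):
            loads[u] += op_costs[i][u]
            choice[i] = u
            if lower_bound(i + 1) < best["cost"]:  # prune hopeless branches
                branch(i + 1)
            loads[u] -= op_costs[i][u]

    branch(0)
    return best["cost"], best["choice"]


# Two units; costs are hypothetical cycle counts per operation and unit.
costs = [[2, 3], [2, 3], [4, 1], [1, 5]]
print(assign_units(costs, 2))
```

Because the bound never overestimates the final cost, pruning cannot discard the optimum; on this input no assignment beats a busiest-unit load of 4.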

63
             A BDD-based frontend for retargetable compilers
             - Leupers, R.; Marwedel, P.
             Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.

on Pages: 239 - 243

             This Conference was Held : 6-9 March 1995
              1995
                                    ISBN: 0-8186-7039-8

             IEEE Catalog Number: 95TH8058
             Total Pages: xxvii+611
             References Cited: 12
             Accession Number: 5057047

Abstract:

                   We present a unified frontend for retargetable compilers that performs analysis of the target
                   processor model. Our approach bridges the gap between structural and behavioral processor
                   models for retargetable compilation. This is achieved by means of instruction set extraction.
                   The extraction technique is based on a BDD data structure which significantly improves
                   control signal analysis in the target processor compared to previous approaches.

64
             Power efficient mediaprocessors: design space exploration
             - Kin, J.; Chunho Lee; Mangione-Smith, W.H.; Potkonjak, M.
             Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA

This Paper Appears in :
Design Automation Conference, 1999. Proceedings. 36th

on Pages: 321 - 326

             This Conference was Held : 21-25 June 1999
              1999
                                    ISBN: 1-58113-092-9

             IEEE Catalog Number: 99CH36361
             Total Pages: xxxii+1003
             References Cited: 31
             Accession Number: 6495998

Abstract:

                   We present a framework for rapidly exploring the design space of low power
                   application-specific programmable processors (ASPP), in particular mediaprocessors. We
                   focus on a category of processors that are programmable yet optimized to reduce power
                   consumption for a specific set of applications. The key components of the framework
                   presented in this paper are a retargetable instruction level parallelism (ILP) compiler,
                   processor simulators, a set of complete media applications written in a high level language
                   and an architectural component selection algorithm. The fundamental idea behind the
                   framework is that with the aid of a retargetable ILP compiler and simulators it is possible to
                   arrange architectural parameters (e.g., the issue width, the size of cache memory units, the
                   number of execution units, etc.) to meet low power design goals under area constraints.
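A sketch of the exploration loop, assuming toy analytic models in place of the retargetable compiler and simulators: every combination of issue width, ALU count and cache size is evaluated, and the lowest-energy configuration within an area budget and a cycle budget is kept. All coefficients and units are invented for illustration.

```python
from itertools import product


def estimate(issue_width, alus, cache_kb):
    """Hypothetical analytic models (integer units); a real flow would take
    cycle counts from the retargetable ILP compiler and simulators."""
    cycles = 1_000_000 // min(issue_width, alus) + 1_600_000 // cache_kb
    area = 4 * issue_width + 3 * alus + cache_kb // 4
    power = 2 * issue_width + 2 * alus + cache_kb // 8   # energy per cycle
    return cycles, area, power * cycles


def explore(area_budget, cycle_budget):
    """Exhaustively search the parameter grid for the lowest-energy
    configuration that meets both the area and the throughput constraint."""
    best = None
    for w, a, c in product((1, 2, 4), (1, 2, 4), (8, 16, 32)):
        cycles, area, energy = estimate(w, a, c)
        if area <= area_budget and cycles <= cycle_budget:
            if best is None or energy < best[0]:
                best = (energy, (w, a, c))
    return best


print(explore(area_budget=20, cycle_budget=700_000))  # (6000000, (2, 2, 16))
```

Only two configurations survive the constraints here; the larger cache costs area but saves enough cycles to lower total energy.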

65
             Binding and scheduling algorithms for highly retargetable compilation
             - Yamaguchi, M.; Ishiura, N.; Kambe, T.
             Precision Technol. Center, Sharp Corp., Nara, Japan

This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia and South Pacific

on Pages: 93 - 98

             This Conference was Held : 10-13 Feb. 1998
              1998
                                    ISBN: 0-7803-4425-1

             IEEE Catalog Number: 98EX121
             Total Pages: xxxviii+606
             References Cited: 7
             Accession Number: 5912424

Abstract:

                   This paper presents new binding and scheduling algorithms for a retargetable compiler which
                   can deal with diverse architectures. Application-specific embedded processors often
                   include a "nonorthogonal" datapath in which not all registers are equally accessible from
                   all the functional units. A nonorthogonal datapath makes binding very hard because
                   inadvertent assignment of an operation to a functional unit may rule out all the possible
                   assignments to other operations due to reachability constraints among datapath resources.
                   Scheduling must take register capacity constraints into account in addition to resource
                   constraints. We discuss these problems and propose algorithms to solve them.
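The reachability problem can be sketched as follows, assuming a hypothetical datapath graph whose nodes are functional units and registers. Reachability is precomputed by breadth-first search, and a backtracking binder rejects any unit that a producer's result cannot reach. This is a generic illustration, not the algorithm proposed in the paper, and it does not model register capacity.

```python
from collections import deque


def reachable(graph, src):
    """Datapath nodes reachable from src along directed connections."""
    seen, queue = set(), deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bind(ops, edges, can_run, graph):
    """Backtracking binding of operations (in topological order) to units.

    A candidate unit is rejected if it is not reachable from the unit of any
    already-bound producer, and an early choice that strands a later
    operation is undone instead of making the whole binding fail.
    """
    units = {u for cands in can_run.values() for u in cands}
    reach = {u: reachable(graph, u) for u in units}
    binding = {}

    def ok(op, unit):
        return all(unit in reach[binding[p]] for p, c in edges if c == op)

    def solve(i):
        if i == len(ops):
            return True
        op = ops[i]
        for unit in can_run[op]:
            if ok(op, unit):
                binding[op] = unit
                if solve(i + 1):
                    return True
                del binding[op]
        return False

    return dict(binding) if solve(0) else None


# Hypothetical datapath: U1 writes only to R1, which feeds back only into U1,
# so a result computed on U1 can never reach U3.
graph = {"U1": ["R1"], "R1": ["U1"], "U2": ["R2"], "R2": ["U3"], "U3": ["R3"]}
can_run = {"x": ["U1", "U2"], "y": ["U3"]}
print(bind(["x", "y"], [("x", "y")], can_run, graph))  # {'x': 'U2', 'y': 'U3'}
```

A greedy binder that commits x to its first candidate, U1, would leave y with no legal unit; backtracking recovers by moving x to U2.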

66
             A retargetable optimizing code generator for digital signal processors
             - Kreuzer, W.; Gotschlich, M.; Wess, B.
             Inst. fur Nachrichtentech. & Hochfrequenztech., Tech. Univ. Wien, Austria

This Paper Appears in :
Circuits and Systems, 1996. ISCAS '96., Connecting the World., 1996 IEEE International Symposium on

on Pages: 257 - 260 vol.2

              1996
                                    Vol. 2
                                                           ISBN: 0-7803-3073-0

             IEEE Catalog Number: 96CH35876
             Total Pages: 4 vol.(xlviii+692+801+612+845)
             References Cited: 12
             Accession Number: 5425344

Abstract:

                   Efficient DSP software synthesis for systems with stringent cost and power constraints
                   requires tools which minimize code size as well as tools to evaluate processor architectures
                   for a given application. In this paper, we introduce a user retargetable code generator
                   translating homogeneous atomic data flow graphs into high-quality DSP assembly code. By
                   using a target architecture description file, flexibility in the design process is enhanced
                   without impairing final code quality. Based on a trellis tree straight-line code generation
                   algorithm, we present a method for code compaction and register optimization to exploit
                   instruction level parallelism. The results of our code generator match the quality of assembly
                   programs which were coded by hand and thoroughly optimized.

67
             A graph based processor model for retargetable code generation
             - Van Praet, J.; Lanneer, D.; Goossens, G.; Geurts, W.; De Man, H.
             IMEC, Leuven, Belgium

This Paper Appears in :
European Design and Test Conference, 1996. ED&TC 96. Proceedings

on Pages: 102 - 107

             This Conference was Held : 11-14 March 1996
              1996
                                    ISBN: 0-8186-7423-7

             IEEE Catalog Number: 96TB100027
             Total Pages: xxxi+623
             References Cited: 11
             Accession Number: 5309465

Abstract:

                   Embedded processors in electronic systems typically are tuned to a few applications.
                   Development of processor specific compilers is prohibitively expensive and as a result such
                   compilers, if existing, yield code of an unacceptable quality. To improve this code quality, we
                   developed a retargetable and optimising code generator. It uses a graph based processor
                   model that captures the connectivity, the parallelism, and all architectural peculiarities of an
                   embedded processor. In this paper, the processor model is presented and we formally define
                   the code generation task, including code selection, register allocation and scheduling, in
                   terms of this model.

68
             Efficient retargetable compiler code generation
             - Hatcher, P.J.; Tuller, J.W.
             Dept. of Comput. Sci., New Hampshire Univ., Durham, NH, USA

This Paper Appears in :
Computer Languages, 1988. Proceedings., International Conference on

on Pages: 25 - 30

             This Conference was Held : 9-13 Oct. 1988
              1988
                                    ISBN: 0-8186-0874-9

             Total Pages: xv+446
             References Cited: 19
             Accession Number: 3327897

Abstract:

                   A discussion is presented of the design and implementation of a retargetable code generation
                   system, UNH-CODEGEN, specifically designed for the bottom-up tree pattern matching
                   algorithms. The authors describe experiments in which the system has been used to build
                   compilers. These experiments demonstrate that the system can be used to quickly generate a
                   code generator that will run fast (roughly four times the speed of the Portable C Compiler's
                   code generators), that will be space-efficient, and that will make best use of the underlying
                   machine description.
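The core of bottom-up tree pattern matching can be sketched as dynamic programming over an expression tree: each node is labeled with the cheapest pattern that covers it, given its children's labels. The pattern table and its costs below are hypothetical, and this is neither UNH-CODEGEN's input format nor its implementation.

```python
# Hypothetical target patterns: (rule name, tree pattern, cost in instructions).
# "reg" marks an operand that must already be in a register.
PATTERNS = [
    ("in_reg", ("REG",), 0),
    ("ldi", ("CONST",), 1),
    ("add", ("ADD", "reg", "reg"), 1),
    ("addi", ("ADD", "reg", ("CONST",)), 1),
    ("mul", ("MUL", "reg", "reg"), 1),
    ("mac", ("ADD", ("MUL", "reg", "reg"), "reg"), 1),
]


def match(pat, node, operands):
    """Match pattern against node, collecting subtrees bound to "reg" leaves."""
    if pat == "reg":
        operands.append(node)
        return True
    if pat[0] != node[0]:
        return False
    if node[0] in ("REG", "CONST"):  # leaves carry a payload, not children
        return True
    return all(match(p, n, operands) for p, n in zip(pat[1:], node[1:]))


def label(node, memo):
    """Cheapest (cost, rule, operands) covering node, computed bottom-up
    via memoized recursion over the children's labels."""
    if node not in memo:
        best = None
        for name, pat, cost in PATTERNS:
            operands = []
            if match(pat, node, operands):
                total = cost + sum(label(op, memo)[0] for op in operands)
                if best is None or total < best[0]:
                    best = (total, name, operands)
        memo[node] = best
    return memo[node]


def emit(node, memo, out):
    """Walk the chosen cover and list the selected instructions in order."""
    _, rule, operands = label(node, memo)
    for op in operands:
        emit(op, memo, out)
    if rule != "in_reg":
        out.append(rule)
    return out


expr = ("ADD", ("MUL", ("REG", "a"), ("REG", "b")), ("REG", "c"))
memo = {}
print(label(expr, memo)[:2], emit(expr, memo, []))  # (1, 'mac') ['mac']
```

The multiply-accumulate pattern covers the whole tree with one instruction, where the separate mul and add rules would need two.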

69
             Hypermedia processors: design space exploration
             - Kin, J.; Chunho Lee; Mangione-Smith, W.H.; Potkonjak, M.
             Editor(s): Wong, P.W., Alwan, A., Ortega, A., Kuo, C.-C.J., Nikian, C.L.M.
             Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA

This Paper Appears in :
Multimedia Signal Processing, 1998 IEEE Second Workshop on

on Pages: 323 - 328

             This Conference was Held : 7-9 Dec. 1998
              1998
                                    ISBN: 0-7803-4919-9

             IEEE Catalog Number: 98EX175
             Total Pages: xvii+638
             References Cited: 8
             Accession Number: 6313715

Abstract:

                   We present a framework for area optimal system design space exploration for hypermedia
                   applications. We focus on a category of processors that are programmable yet optimized to a
                   hypermedia application. The key components of the framework presented in this paper are a
                   retargetable instruction-level parallelism compiler, instruction level simulators, a set of
                   complete media applications written in a high level language and a media processor synthesis
                   algorithm. The framework addresses the need for area optimal system design by exploiting
                   the instruction-level parallelism found in media applications by compilers that target
                   multiple-instruction-issue processors. Using the framework we conduct an extensive
                   exploration of area optimal system design space for a hypermedia application. We found that
                   there is enough ILP in the typical media and communication applications to achieve highly
                   concurrent execution when throughput requirements are high. On the other hand, when
                   throughput requirements are low, there is no need to use multiple-instruction-issue
                   processors.

70
             Describing instruction set processors using nML
             - Fauth, A.; Van Praet, J.; Freericks, M.
             Inst. fur Tech. Inf., Tech. Univ. Berlin, Germany

This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.

on Pages: 503 - 507

             This Conference was Held : 6-9 March 1995
              1995
                                    ISBN: 0-8186-7039-8

             IEEE Catalog Number: 95TH8058
             Total Pages: xxvii+611
             References Cited: 24
             Accession Number: 5057082

Abstract:

                   Programmable processors offer a high degree of flexibility and are therefore increasingly
                   being used in embedded systems. We introduce the formalism nML which is especially
                   suited to describe such processors in terms of their instruction set. An nML description is
                   directly related to the standard description as found in the usual programmer's manuals. The
                   nML formalism is based on a mixed structural and behavioural model facilitating exact yet
                   concise descriptions. The philosophy of nML is already applied in two approaches to
                   retargetable code generation and instruction set simulation.

71
             The White Dwarf: a high-performance application-specific processor
             - Wolfe, A.; Breternitz, M., Jr.; Stephens, C.; Ting, A.L.; Kirk, D.B.; Bianchini, R.P., Jr.; Shen, J.P.
             Dept. of Electr. & Comput. Eng., Carnegie-Mellon Univ., Pittsburgh, PA, USA

This Paper Appears in :
Computer Architecture, 1988. Conference Proceedings. 15th Annual International Symposium on

on Pages: 212 - 222

             This Conference was Held : 30 May-2 June 1988
              1988
                                    ISBN: 0-8186-0861-7

             Total Pages: xi+461
             References Cited: 18
             Accession Number: 3228437

Abstract:

                   The design and implementation of a high-performance special-purpose processor, called the
                   White Dwarf, for accelerating finite-element analysis algorithms is presented. The White
                   Dwarf CPU contains two Am2935 32-bit floating-point processors and one Am29332 32-bit
                   arithmetic logic unit (ALU), and uses a wide-instruction-word architecture in which the
                   application algorithm is directly implemented in microcode. The entire system is VME-bus
                   compatible and interfaces with a Sun 3/160 host. The system's potential peak performance is
                   20 MFLOPS (million floating-point operations per second); a sustained computation rate in
                   excess of 15 MFLOPS is expected. A potential speedup of between one and two orders of
                   magnitude is possible. With a fully populated memory subsystem, the White Dwarf can
                   accommodate finite-element problems involving up to half a million nodes. The system is
                   designed using an approach called application-specific processor design (ASPD). A
                   retargetable compiler has been developed which is capable of generating highly parallel and
                   efficient code for the White Dwarf and other processors with similar architecture. System
                   debug/integration is in progress; a highly useful system is expected.

72
             A formal model of computer architectures for digital system design
             environments
             - Wilsey, P.A.; Dasgupta, S.
             Dept. of Electr. & Comput. Eng., Cincinnati Univ., OH, USA

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 473 - 486

              May 1990
                                 Vol. 9
                                                   Issue: 5
                                                                      ISSN: 0278-0070

             References Cited: 35
             CODEN: ITCSDI
             Accession Number: 3721471

Abstract:

                   A new and powerful model of computer architectures for machine description is presented.
                   This model is capable of representing a machine across the abstraction levels ranging from
                   the exo-architecture to the gate level. The goal is to establish a formal framework for the
                   construction of a new hardware description language that will be useful for a large class of
                   retargetable design-automation systems.

73
              Retargetable estimation scheme for DSP architecture selection
              - Ghazal, N.; Newton, R.; Rabaey, J.
              University of California

This Paper Appears in :
Design Automation Conference, 2000. Proceedings of the ASP-DAC 2000. Asia and South Pacific

on Pages: 485 - 489

              This Conference was Held : January 25-28, 2000
               2000
                                     ISBN: 0-7803-5973-9

Abstract:

Not Available

74
             Instruction set design and optimizations for address computation in DSP
             architectures
             - Araujo, G.; Sudarsanam, A.; Malik, S.
             Dept. of Electr. Eng., Princeton Univ., NJ, USA

This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on

on Pages: 102 - 107

             This Conference was Held : 6-8 Nov. 1996
              1996
                                    ISBN: 0-8186-7563-2

             IEEE Catalog Number: 96TB100061
             Total Pages: xii+145
             References Cited: 10
             Accession Number: 5450820

Abstract:

                   In this paper we investigate the problem of code generation for address computation for DSP
                   processors. This work is divided into four parts. First, we propose a branch instruction design
                   which can guarantee minimum overhead for programs that make use of implicit indirect
                   addressing. Second, we give a formulation and propose a solution for the problem of
                   allocating address registers (ARs) for array accesses within loop constructs. Third, we
                   describe retargetable approaches for auto-increment (decrement) optimizations of pointer
                   variables, and loop induction variables. Finally, we use a graph coloring technique to allocate
                   physical ARs to the virtual ARs used in the previous phases. The results show that the
                   combination of the above techniques considerably improves the final code quality for
                   benchmark DSP programs.
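The effect of auto-increment addressing on address-register (AR) allocation can be illustrated with a simplified cost model, assuming post-modify by at most one element is free and any other jump costs one explicit update. The greedy partitioner below is a hypothetical heuristic, not the formulation or solution proposed in the paper.

```python
def ar_cost(seq, max_step=1):
    """Explicit address-update instructions for one address register (AR)
    that visits the array offsets in seq, assuming post-increment/decrement
    by up to max_step is free. The initial AR load is not counted."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(b - a) > max_step)


def greedy_partition(seq, k, max_step=1):
    """Distribute accesses over k ARs: reuse an AR within max_step if one
    exists, else claim an unused AR, else pay for updating the nearest AR."""
    last = [None] * k
    parts = [[] for _ in range(k)]
    for off in seq:
        near = [r for r in range(k)
                if last[r] is not None and abs(off - last[r]) <= max_step]
        if near:
            r = near[0]
        elif None in last:
            r = last.index(None)
        else:
            r = min(range(k), key=lambda x: abs(off - last[x]))
        parts[r].append(off)
        last[r] = off
    return parts, sum(ar_cost(p, max_step) for p in parts)


accesses = [0, 5, 1, 6, 2, 7]            # e.g. a[i] and a[i+5] over three iterations
print(ar_cost(accesses))                  # 5 updates with a single AR
print(greedy_partition(accesses, k=2))    # ([[0, 1, 2], [5, 6, 7]], 0)
```

Giving each of the two interleaved access streams its own AR turns every address change into a free post-increment.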

75
             Time-constrained code compaction for DSPs
             - Leupers, R.; Marwedel, P.
             Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

on Pages: 112 - 122

             This Conference was Held : 18-20 Jan. 1995
              March 1997
                                 Vol. 5
                                                   Issue: 1
                                                                      ISSN: 1063-8210

             References Cited: 33
             CODEN: IEVSE9
             Accession Number: 5525492

Abstract:

                   This paper addresses instruction-level parallelism in code generation for digital signal
                   processors (DSPs). In the presence of potential parallelism, the task of code generation
                   includes code compaction, which parallelizes primitive processor operations under given
                   dependency and resource constraints. Furthermore, DSP algorithms in most cases are
                   required to guarantee real-time response. Since the exact execution speed of a DSP program
                   is only known after compaction, real-time constraints should be taken into account during
                   the compaction phase. While previous DSP code generators rely on rigid heuristics for
                   compaction, we propose a novel approach to exact local code compaction based on an integer
                   programming (IP) model, which handles time constraints. Due to a general problem
                   formulation, the IP model also captures encoding restrictions and handles instructions having
                   alternative encodings and side effects and therefore applies to a large class of instruction
                   formats. Capabilities and limitations of our approach are discussed for different DSPs.
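For orientation, a generic time-indexed 0/1 model of local compaction under a time constraint looks roughly as follows. This is a simplified assumption, not the paper's exact formulation: $O$ is the set of operations, $E$ the dependencies (unit latency assumed), $C$ the pairs of operations that cannot share an instruction word because of resource or encoding conflicts, $T$ the time constraint in control steps, and $x_{i,t}=1$ means operation $i$ is placed in step $t$. Alternative encodings and side effects, which the paper's model also captures, are omitted.

```latex
\begin{aligned}
&\sum_{t=1}^{T} x_{i,t} = 1 && \forall i \in O \\
&\sum_{t=1}^{T} t\,x_{j,t} - \sum_{t=1}^{T} t\,x_{i,t} \ge 1 && \forall (i,j) \in E \\
&x_{i,t} + x_{k,t} \le 1 && \forall t,\ \forall \{i,k\} \in C \\
&x_{i,t} \in \{0,1\} && \forall i \in O,\ t = 1,\dots,T
\end{aligned}
```

Because the horizon is fixed at $T$, any feasible solution is a compacted schedule that meets the deadline, and an infeasible model proves that no such schedule exists.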

76
             A low overhead design for testability and test generation technique for
             core-based systems-on-a-chip
             - Ghosh, I.; Jha, N.K.; Dey, S.
             Fujitsu Labs. of America, Sunnyvale, CA, USA

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 1661 - 1676

              Nov. 1999
                                 Vol. 18
                                                   Issue: 11
                                                                      ISSN: 0278-0070

             References Cited: 30
             CODEN: ITCSDI
             Accession Number: 6453580

Abstract:

                   In a fundamental paradigm shift in system design, entire systems are being built on a single
                   chip, using multiple embedded cores. Though the newest system design methodology has
                   several advantages in terms of time-to-market and system cost, testing such core-based
                   systems is difficult, mainly due to the problem of justifying test sequences at the inputs of a
                   core embedded deep in the circuit and propagating test responses from the core outputs. In
                   this paper, we first present a design for testability technique for testing such core-based
                   systems. In this scheme, untestable cores are first made testable using hierarchical
                   testability analysis techniques. If necessary, additional testability hardware is added to the
                   cores to make them transparent so that they can propagate test data without information
                   loss. This testability and transparency technique is currently applicable to cores of the
                   following types: application-specific integrated circuits, application-specific programmable
                   processors, and application-specific instruction processors. Other core types can be made
                   testable and transparent using traditional techniques. The testable and transparent cores can
                   then be integrated together with some system-level testability hardware to ensure
                   justification of precomputed test sequences of each core from system primary inputs to the
                   core inputs and propagation of test responses from core outputs to system primary outputs.
                   Justification and propagation of test sequences are done at the system level by extending
                   and suitably modifying the symbolic hierarchical testability analysis method that has been
                   successfully applied to register-transfer level circuits. Since the testability analysis method
                   is symbolic, the system test generation method is independent of the bit-width of the cores.
                   The system-level test set is obtained as a byproduct of the testability analysis and insertion
                   method without further search. The test methodology was applied to six example systems.
                   Besides the proposed test method, the two methods that are currently used in the industry
                   were also evaluated: (1) FScan-BScan, where each core is full-scanned, and system test is
                   performed using boundary scan and (2) FScan-TBus, where each core is full-scanned, and
                   system test is performed using a test bus. The experiments show that the proposed scheme
                   has significantly lower area overhead, delay overhead, and test application time compared to
                   FScan-BScan and FScan-TBus, without any compromise in the system fault coverage.

77
             Cooperative register assignment and code compaction for digital signal
             processors with irregular datapaths
             - Kreuzer, W.; Wess, B.
             Inst. fur Nachrichtentech. und Hochfrequenztech., Tech. Univ. Wien, Austria

This Paper Appears in :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

on Pages: 691 - 694 vol.1

             This Conference was Held : 21-24 April 1997
              1997
                                    Vol. 1
                                                           ISBN: 0-8186-7919-0

             IEEE Catalog Number: 97CB36052
             Total Pages: 5 vol. (xxii+xxv+xxiv+xxii+4156)
             References Cited: 12
             Accession Number: 5716155

Abstract:

                   We address the phase ordering problem of code compaction and register assignment in a
                   data flow graph compiler. During register assignment, we take into account the
                   instruction-level parallelism available. Symbolic variables in straight-line code are allocated
                   to register set/memory location pairs which maximally preserve the freedom available for
                   code compaction. Whenever necessary, spill code is inserted during final register assignment
                   and scheduled during code compaction. Register assignment is performed taking into account
                   its impact on code compaction. This strategy results in final code of high quality.

78
             An efficient model for DSP code generation: performance, code size,
             estimated energy
             - Gebotys, C.H.
             Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada

This Paper Appears in :
System Synthesis, 1997. Proceedings., Tenth International Symposium on

on Pages: 41 - 47

             This Conference was Held : 17-19 Sept. 1997
              1997
                                    ISBN: 0-8186-7949-2

             IEEE Catalog Number: 97TB100114
             Total Pages: x+141
             References Cited: 16
             Accession Number: 5717352

Abstract:

                   The paper presents a model for simultaneous instruction selection, compaction, and register
                   allocation. An arc mapping model, along with logical propositions is used to create an
                   optimization model. Code is generated in short CPU times and is optimized for minimum code
                   size, maximum performance or estimated energy dissipation. Code generated for realistic
                   DSP applications provides performance and code size improvements from 1.09 up to 2.18
                   times for the TMS320C2x processor compared to previous research and a commercial
                   compiler. In all examples, up to 106 instructions are generated in under one CPU minute. This
                   research is important for industry since DSP code can be efficiently generated with
                   constraints on code size, performance and energy dissipation.

79
             Optimal register assignment to loops for embedded code generation
             - Kolson, D.J.; Nicolau, A.; Dutt, N.; Kennedy, K.
             Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA

This Paper Appears in :
System Synthesis, 1995., Proceedings of the Eighth International Symposium on

on Pages: 42 - 47

             This Conference was Held : 13-15 Sept. 1995
              1995
                                    ISBN: 0-8186-7076-2

             IEEE Catalog Number: 95TH8050
             Total Pages: xiii+175
             References Cited: 18
             Accession Number: 5087874

Abstract:

                   One of the challenging tasks in code generation for embedded systems is register
                   assignment. When more live variables than registers exist, some variables are necessarily
                   accessed from data memory. Because loops are typically executed many times and are often
                   time-critical, good register assignment in loops is exceedingly important, since accessing
                   data memory can degrade performance. The issue of finding an optimal register assignment
                   to loops, one which minimizes the number of spills between registers and memory, has been
                   open for some time. In this paper, we address this issue and present an optimal, but
                   exponential, algorithm which assigns registers to loop bodies such that the resulting spill
                   code is minimal. We also show that a heuristic modification performs as well as the
                   exponential approach on typical loops from scientific code.

80
             Code generation for a DSP processor
             - Wei-Kai Cheng; Youn-Long Lin
             Dept. of Comput. Sci., Tsinghua Univ., Beijing, China

This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium on

on Pages: 82 - 87

             This Conference was Held : 18-20 May 1994
              1994
                                    ISBN: 0-8186-5785-5

             IEEE Catalog Number: 94TH0641-1
             Total Pages: ix+171
             References Cited: 13
             Accession Number: 4706383

Abstract:

                   Proposes a method for compiling an application program into the microcode of a programmable
                   DSP processor. Since most state-of-the-art DSP processors feature some form of parallel
                   processing architecture, code generation is a non-trivial task. Based on several
                   scheduling and allocation techniques previously developed by the CAD community for
                   high-level synthesis, we propose a DSP code generator. We emphasize reducing the memory
                   access and register usage conflicts, which often lengthen the total execution time. Starting
                   with an as-soon-as-possible schedule that disregards the resource constraints, we
                   transform this illegal schedule step by step into a legal one. Meanwhile, registers are
                   allocated and reallocated for variables, taking into account both memory access and register
                   usage constraints. A software system called THEDA.DSP_CG has been implemented
                   and tested using a set of benchmark programs. Simulation of the generated code, which
                   targets the TI TMS320C40 DSP processor, shows that the proposed approach is
                   indeed very effective.
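
The ASAP-then-repair strategy in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the THEDA.DSP code generator itself. Every operation is assumed to take one control step, and a single resource class is assumed, limited to `max_per_step` operations per step. The deferral rule (postpone the surplus operations that come last in topological order) is a hypothetical choice. The functions `asap` and `legalize` are illustrative names, and the register reallocation that the paper interleaves with the repair is not modelled.

```python
def asap(ops, preds):
    """ASAP control step of each operation (unit latency, no resource limits).

    ops must be listed in topological order; preds maps an op to its predecessors.
    """
    step = {}
    for op in ops:
        step[op] = max((step[p] + 1 for p in preds.get(op, [])), default=0)
    return step


def legalize(ops, preds, step, max_per_step):
    """Repair a resource-illegal schedule one control step at a time.

    Hypothetical rule: if step t holds more than max_per_step ops, the surplus
    ops (the last ones in topological order) are deferred to step t + 1.
    """
    step = dict(step)
    t = 0
    while t <= max(step.values()):
        # Re-establish precedence after earlier deferrals; a single pass is
        # enough because ops are visited in topological order.
        for op in ops:
            for p in preds.get(op, []):
                step[op] = max(step[op], step[p] + 1)
        in_t = [op for op in ops if step[op] == t]
        for op in in_t[max_per_step:]:
            step[op] += 1
        t += 1
    return step


if __name__ == "__main__":
    ops = ["a", "b", "c", "d"]
    preds = {"c": ["a", "b"], "d": ["a"]}
    print(asap(ops, preds))                           # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
    print(legalize(ops, preds, asap(ops, preds), 1))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```

With one operation allowed per step, the illegal two-step ASAP schedule is stretched to four steps, and every operation still follows its predecessors.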

81
             Constraint analysis for DSP code generation
             - Mesman, B.; Timmer, A.H.; Van Meerbergen, J.L.; Jess, J.A.G.
             Philips Res. Lab., Eindhoven, Netherlands

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 44 - 57

              Jan. 1999
                                 Vol. 18
                                                   Issue: 1
                                                                      ISSN: 0278-0070

             References Cited: 25
             CODEN: ITCSDI
             Accession Number: 6148508

Abstract:

                   Code generation methods for digital signal processing (DSP) applications are hampered by
                   the combination of tight timing constraints imposed by the performance requirements of DSP
                   algorithms and resource constraints imposed by a hardware architecture. In this paper, we
                   present a method for register binding and instruction scheduling based on the exploitation and
                   analysis of the combination of resource and timing constraints. The analysis identifies
                   implicit sequencing relations between operations in addition to the precedence constraints.
                   Without the explicit modeling of these sequencing constraints, a scheduler is often not
                   capable of finding a solution that satisfies the timing and resource constraints. The presented
                   approach results in an efficient method to obtain high-quality instruction schedules with low
                   register requirements.

82
             Register files constraint satisfaction during scheduling of DSP code
             - Pinto, C.A.A.; Mesman, B.; Van Eijk, K.
             Design Autom. Sect., Eindhoven Univ. of Technol., Netherlands

This Paper Appears in :
Integrated Circuits and Systems Design, 1999. Proceedings. XII Symposium on

on Pages: 74 - 77

             This Conference was Held : 29 Sept.-2 Oct. 1999
              1999
                                    ISBN: 0-7695-0387-X

             IEEE Catalog Number: PR00387
             Total Pages: xiii+236
             References Cited: 8
             Accession Number: 6520693

Abstract:

                   Algorithms in digital signal processing (DSP) impose tight timing constraints that the
                   compiler has to respect while considering the limited capacity of the available register files in
                   a target DSP processor. Traditional code generation methods that schedule spill code to
                   satisfy storage capacity may take many iterations and are usually not capable of satisfying
                   the timing constraints. In this paper we present a new method to handle register file capacity
                   constraints during scheduling. The method identifies potential bottlenecks for register binding
                   and subsequently serializes the lifetimes of values until it can be guaranteed that all capacity
                   constraints will be satisfied after scheduling. Experiments show that we efficiently obtain
                   high quality instruction schedules for DSP kernels.

83
             Algorithms for address assignment in DSP code generation
             - Leupers, R.; Marwedel, P.
             Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
Computer-Aided Design, 1996. ICCAD-96. Digest of Technical Papers., 1996 IEEE/ACM International Conference on

on Pages: 109 - 112

             This Conference was Held : 10-14 Nov. 1996
              1996
                                    ISBN: 0-8186-7597-7

             IEEE Catalog Number: 96CB35991
             Total Pages: xxv+697
             References Cited: 7
             Accession Number: 5465406

Abstract:

                   This paper presents DSP code optimization techniques that exploit dedicated
                   memory address generation hardware. We define a generic model of DSP address generation
                   units. Based on this model we present efficient heuristics for computing memory layouts for
                   program variables, which optimize utilization of parallel address generation units.
                   Improvements and generalizations of previous work are described, and the efficacy of the
                   proposed algorithms is demonstrated through experimental evaluation.
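
The abstract does not spell out its heuristics. As background, the sketch below shows the single-address-register version of the layout problem, often called simple offset assignment. It uses the common greedy path-cover heuristic under simplifying assumptions: one address register, auto-increment/decrement by exactly 1, and an access sequence given as a list of variable names. The function names are hypothetical, and the paper's generic model with parallel address generation units is not reproduced.

```python
from collections import Counter


def access_graph(seq):
    """Edge weight = how often two distinct variables are accessed back to back."""
    weights = Counter()
    for a, b in zip(seq, seq[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights


def soa_layout(seq):
    """Greedy offset assignment: keep the heaviest edges that still form
    vertex-disjoint paths, then place the paths one after another in memory."""
    weights = access_graph(seq)
    parent = {v: v for v in seq}  # union-find, used to reject cycles

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    degree = Counter()
    adj = {v: [] for v in parent}
    # Heaviest edges first; ties broken by variable names for determinism.
    for edge, _ in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            adj[a].append(b)
            adj[b].append(a)
    # Walk each path from an endpoint (degree <= 1) and concatenate the walks.
    layout, seen = [], set()
    for start in sorted(parent, key=lambda v: (degree[v], v)):
        if start in seen or degree[start] == 2:
            continue
        prev, cur = None, start
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            nxt = [n for n in adj[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
    return layout


def soa_cost(seq, layout):
    """Count consecutive accesses whose variables are not adjacent in memory,
    i.e. that need an explicit address-register update instead of +/-1."""
    pos = {v: i for i, v in enumerate(layout)}
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(pos[a] - pos[b]) > 1)
```

For the access sequence `a b c a b d`, the heuristic places the variables as `c a b d`, which needs one explicit address update. The naive declaration order `a b c d` needs two.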

84
             Instruction selection for embedded DSPs with complex instructions
             - Leupers, R.; Marwedel, P.
             Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition, Proceedings EURO-DAC '96, European

on Pages: 200 - 205

             This Conference was Held : 16-20 Sept. 1996
              1996
                                    ISBN: 0-8186-7573-X

             IEEE Catalog Number: 96CB36000
             Total Pages: xxiii+579
             References Cited: 12
             Accession Number: 5412419

Abstract:

                   We address the problem of instruction selection in code generation for embedded digital
                   signal processors. Recent work has shown that this task can be efficiently solved by tree
                   covering with dynamic programming, even in combination with the task of register allocation.
                   However, performing instruction selection by tree covering alone does not exploit available
                   instruction-level parallelism, for instance in the form of multiply-accumulate instructions or
                   parallel data moves. In this paper we investigate how such complex instructions affect the
                   detection of optimal tree covers, and we present a two-phase scheme for instruction
                   selection which exploits available instruction-level parallelism. At the expense of higher
                   compilation time, this technique may significantly increase the code quality compared to
                   previous work, which is demonstrated for a widely used DSP.
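
For orientation, here is a sketch of the baseline the abstract starts from: tree covering by dynamic programming. Each node's minimum cost is the cheapest matching pattern plus the optimal costs of the subtrees that pattern leaves as operands. The IR (`Node`), the pattern set, and the unit costs are hypothetical. A multiply-accumulate pattern is included only to show how a complex instruction can win inside a single tree. The paper's two-phase scheme for exploiting parallelism beyond tree covering (for example, parallel data moves) is not shown.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    op: str            # "ADD", "MUL" or "VAR" (hypothetical IR)
    kids: tuple = ()
    name: str = ""


# Each hypothetical pattern returns the subtrees it leaves as operands, or None.
def match_add(n):
    return n.kids if n.op == "ADD" else None


def match_mul(n):
    return n.kids if n.op == "MUL" else None


def match_load(n):
    return () if n.op == "VAR" else None


def match_mac(n):
    # ADD(MUL(a, b), c) covered by one multiply-accumulate (left operand only).
    if n.op == "ADD" and n.kids[0].op == "MUL":
        return n.kids[0].kids + (n.kids[1],)
    return None


PATTERNS = [("add", 1, match_add), ("mul", 1, match_mul),
            ("load", 1, match_load), ("mac", 1, match_mac)]


@lru_cache(maxsize=None)
def best_cover(n):
    """(minimum cost, root instruction) over all covers of the tree rooted at n."""
    best = None
    for name, cost, match in PATTERNS:
        operands = match(n)
        if operands is None:
            continue
        total = cost + sum(best_cover(k)[0] for k in operands)
        if best is None or total < best[0]:
            best = (total, name)
    return best
```

For `x*y + z`, covering with separate `mul` and `add` costs 5 in this toy model, while the `mac` cover costs 4. The dynamic program therefore picks `mac` at the root.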

85
             Retargetable assembly code generation by bootstrapping
             - Leupers, R.; Schenk, W.; Marwedel, P.
             Lehrstuhl Inf., Dortmund Univ., Germany

This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium on

on Pages: 88 - 93

             This Conference was Held : 18-20 May 1994
              1994
                                    ISBN: 0-8186-5785-5

             IEEE Catalog Number: 94TH0641-1
             Total Pages: ix+171
             References Cited: 10
             Accession Number: 4706384

Abstract:

                   In a hardware/software codesign environment, compilers are needed that map software
                   components of a partitioned system behavioral description onto a programmable processor.
                   Since the processor structure is not static, but can repeatedly change during the design
                   process, the compiler should be retargetable in order to avoid manual compiler adaptation for
                   each alternative architecture. A restriction of existing retargetable compilers is that they
                   only generate microcode for the target architecture instead of machine-level code. We
                   introduce a bootstrapping technique that permits translating high-level language (HLL)
                   programs into real machine-level code using a retargetable microcode compiler.
                   Retargetability is preserved, which permits comparing different architectural alternatives in a
                   codesign framework within a relatively short time.

86
             A knowledge-based retargetable compiler for application specific signal
             processors
             - Kuroda, I.; Nishitani, T.
             NEC Corp., Kanagawa, Japan

This Paper Appears in :
Circuits and Systems, 1989., IEEE International Symposium on

on Pages: 631 - 634 vol.1

This Conference was Held : 8-11 May 1989
1989

             Total Pages: 3 vol. xl+2246
             References Cited: 6
             Accession Number: 3636453

Abstract:

                   A knowledge-based compiler for application-specific signal processors has been developed.
                   In order to generate optimized microcode for specific architectures, code optimization
                   knowledge used by expert programmers has been implemented in the knowledge base and
                   applied in every phase of the compiler, i.e. program analysis, intermediate code optimization,
                   and code generation. This knowledge-based approach has been evaluated using a signal
                   processor μPD77230. The developed compiler generates optimized microcode whose code
                   size is almost the same as that of code written by expert programmers. This approach also
                   leads to a compiler that can be retargeted by replacing the machine-dependent databases in
                   the compiler.

87
             An evaluation system for application specific architectures
             - De Gloria, A.; Faraboschi, P.
             Dept. of Biophys. & Electron. Eng., Genoa Univ., Italy

This Paper Appears in :
Microprogramming and Microarchitecture. Micro 23. Proceedings of the 23rd Annual Workshop and Symposium., Workshop on

on Pages: 80 - 89

             This Conference was Held : 27-29 Nov. 1990
              1990
                                    ISBN: 0-8186-2124-9

             Total Pages: x+299
             References Cited: 10
             Accession Number: 4038572

Abstract:

                   Application specific architectures (ASAs) are assuming an important role in the design of
                   tailored systems, as they exploit features intrinsic to the application to achieve a better
                   cost/performance ratio than standard components. An ASA design environment has been
                   developed in order to allow the evaluation of different architecture solutions in terms of cost
                   and performance. The system deals with parallel synchronous non-homogeneous
                   architectures and, starting from the high-level description of the application benchmarks,
                   reaches code generation and simulation of architectures whose description can range from
                   simple timing organization to detailed data-path and instruction structures. As an application
                   example, the system is applied to the comparison of pipelined and parallel micro-architecture
                   organizations for floating-point processing.

88
              Code generation for embedded processors with complex instructions
              - Jong-Yeol Lee; Hyun-Dhong Yoon; Jin-Hyuk Yang; In-Cheol Park; Chong-Min Kyung
              Korea Advanced Institute of Science and Technology

This Paper Appears in :
VLSI and CAD, 1999. ICVC '99. 6th International Conference on

on Pages: 525 - 527

              This Conference was Held : October 26-27, 1999
               1999
                                     ISBN: 0-7803-5727-2

Abstract:

Not Available

89
             A hardware/software partitioning algorithm for processor cores of digital
             signal processing
             - Togawa, N.; Sakurai, T.; Yanagisawa, M.; Ohtsuki, T.
             Dept. of Electr., Inf. & Commun. Eng., Waseda Univ., Tokyo, Japan

This Paper Appears in :
Design Automation Conference, 1999. Proceedings of the ASP-DAC '99. Asia and South Pacific

on Pages: 335 - 338 vol.1

             This Conference was Held : 18-21 Jan. 1999
              1999
                                    ISBN: 0-7803-5012-X

             IEEE Catalog Number: 99EX198
             Total Pages: (xxvi+372+suppl.)
             References Cited: 12
             Accession Number: 6358317

Abstract:

                   A hardware/software cosynthesis system for processor cores of digital signal processing
                   has been developed. This paper focuses on a hardware/software partitioning algorithm which
                   is one of the key issues in the system. Given an input assembly code generated by the
                   compiler in the system, the proposed hardware/software partitioning algorithm first
                   determines the types and the numbers of required hardware units, such as multiple functional
                   units, hardware loop units, and particular addressing units, for a processor core (initial
                   resource allocation). Second, the hardware units determined at initial resource allocation are
                   reduced one by one as long as the assembly code still meets a given timing constraint
                   (configuration of a processor core). The execution time of the assembly code becomes longer,
                   but the hardware cost of the processor core that executes it becomes smaller. Finally, it outputs
                   optimized assembly code and a processor configuration. Experimental results demonstrate
                   that the system synthesizes processor cores effectively according to the features of an
                   application program and its data.

90
             Memory organization for improved data cache performance in embedded
             processors
             - Panda, P.; Dutt, N.; Nicolau, A.
             Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA

This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on

on Pages: 90 - 95

             This Conference was Held : 6-8 Nov. 1996
              1996
                                    ISBN: 0-8186-7563-2

             IEEE Catalog Number: 96TB100061
             Total Pages: xii+145
             References Cited: 17
             Accession Number: 5450818

Abstract:

                   Code generation for embedded processors creates opportunities for several performance
                   optimizations not applicable to traditional compilers. We present techniques for improving
                   data cache performance by organizing variables declared in embedded code into memory,
                   using specific parameters of the data cache. Our approach clusters variables to minimize
                   compulsory cache misses, and solves the memory assignment problem to minimize conflict
                   cache misses. Our experiments demonstrate significant improvement in data cache
                   performance (average 46% in hit ratios) by the application of our memory organization
                   technique using code kernels from DSP and other domains on the LSI Logic CW4001
                   embedded processor.

91
             Application-driven design of DSP architectures and compilers
             - Saghir, M.A.R.; Chow, P.; Lee, C.G.
             Dept. of Electr. & Comput. Eng., Toronto Univ., Ont., Canada

This Paper Appears in :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on

on Pages: II/437 - II/440 vol.2

             This Conference was Held : 19-22 April 1994
              1994
                                    Vol. ii
                                                           ISBN: 0-7803-1775-0

             IEEE Catalog Number: 94CH3387-8
             Total Pages: 6 vol. 3382
             References Cited: 9
             Accession Number: 4917147

Abstract:

                   Current DSP architectures are designed to enhance the execution of
                   computationally-intensive, kernel-like loops. Their peculiar architectural features are often
                   difficult for high-level language compilers to exploit. Moreover, their tightly-encoded
                   instruction sets usually restrict the exploitation of instruction-level parallelism beyond a few
                   instances. The quality of compiler-generated code is therefore poor when compared to
                   hand-coded assembly language. We argue for an application-driven approach to designing
                   flexible DSP architectures and effective compilers. We show that the run-time behavior and
                   architectural characteristics of DSP kernels are different from those of DSP applications.
                   We also show that when given a sufficiently flexible target architecture, a compiler is
                   capable of effectively exploiting instances of instruction-level parallelism and DSP-specific
                   architectural features. Finally, we show that a suitable DSP architecture is one that provides
                   the functionality to support digital signal processing requirements, and the flexibility that
                   enables a compiler to generate efficient code.

92
             Exploiting conditional instructions in code generation for embedded
             VLIW processors
             - Leupers, R.
             Editor(s): Borrione, D., Ernst, R.
             Dept. of Comput. Sci., Dortmund Univ., Germany

This Paper Appears in :
Design, Automation and Test in Europe Conference and Exhibition 1999. Proceedings

on Pages: 105 - 109

             This Conference was Held : 9-12 March 1999
              1999
                                    ISBN: 0-7695-0078-1

             IEEE Catalog Number: PR00078
             Total Pages: xxx+798
             References Cited: 10
             Accession Number: 6375753

Abstract:

                   This paper presents a new code optimization technique for a class of embedded processors.
                   Modern embedded processor architectures feature deep instruction pipelines and highly
                   parallel VLIW-like instruction sets. For such architectures, any change in the control flow of
                   a machine program due to a conditional jump may cause a significant code performance
                   penalty. Therefore, the instruction sets of recent VLIW machines offer support for
                   branch-free execution of conditional statements in the form of so-called conditional
                   instructions. Whether an if-then-else statement is implemented by a conditional jump
                   scheme or by conditional instructions has a strong impact on its worst-case execution time.
                   However, the optimal selection is difficult, particularly for nested conditionals. We present a
                   dynamic programming technique for selecting the fastest implementation for nested
                   if-then-else statements based on estimations. The efficacy is demonstrated for a real-life
                   VLIW DSP.
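
The bottom-up selection described above can be sketched with a toy cost model. Every number and rule here is an assumption, not the paper's estimator. A conditional jump is assumed to cost a fixed, hypothetical `JUMP_PENALTY` plus the condition and the slower arm. A predicated if-then-else is assumed to issue both arms, and any if nested inside a predicated region must also be predicated. Each `If` takes the cheaper of the two worst-case estimates, and nested statements are evaluated first. The classes `Block` and `If` and all the function names are illustrative.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple, Union

JUMP_PENALTY = 3  # hypothetical pipeline cost of a conditional jump


@dataclass(frozen=True)
class Block:
    cycles: int                    # straight-line code


@dataclass(frozen=True)
class If:
    cond: int                      # cycles to evaluate the condition
    then: Tuple["Stmt", ...]
    orelse: Tuple["Stmt", ...]


Stmt = Union[Block, If]


def predicated(s: Stmt) -> int:
    """Cycles when s runs entirely as conditional instructions: both arms always
    issue, and ifs nested inside a predicated region are predicated too."""
    if isinstance(s, Block):
        return s.cycles
    return s.cond + sum(map(predicated, s.then)) + sum(map(predicated, s.orelse))


def jump_cost(s: If) -> int:
    """Worst case with a conditional jump: penalty plus the slower arm, where
    each nested statement uses its own best implementation."""
    arm = lambda stmts: sum(map(wcet, stmts))
    return s.cond + JUMP_PENALTY + max(arm(s.then), arm(s.orelse))


@lru_cache(maxsize=None)
def wcet(s: Stmt) -> int:
    """Estimated worst-case cycles, choosing jump vs. predication bottom-up."""
    if isinstance(s, Block):
        return s.cycles
    return min(jump_cost(s), predicated(s))


def scheme(s: If) -> str:
    """Implementation chosen for the if-then-else s itself."""
    return "cond" if predicated(s) <= jump_cost(s) else "jump"
```

In this model a short, balanced if-then-else favours conditional instructions, and one with long arms favours the jump. An outer if can still choose the jump while the inner decision is made independently.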

93
             Synthesis of application specific instruction sets
             - Ing-Jer Huang; Despain, A.M.
             Inst. of Comput. & Inf. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 663 - 675

              June 1995
                                 Vol. 14
                                                   Issue: 6
                                                                      ISSN: 0278-0070

             References Cited: 21
             CODEN: ITCSDI
             Accession Number: 4975295

Abstract:

                   An instruction set serves as the interface between hardware and software in a computer
                   system. In an application specific environment, the system performance can be improved by
                   designing an instruction set that matches the characteristics of the hardware and the application.
                   We present a systematic approach to generate application-specific instruction sets so that
                   software applications can be efficiently mapped to a given pipelined micro-architecture. The
                   approach synthesizes instruction sets from application benchmarks, given a machine model,
                   an objective function, and a set of design constraints. In addition, assembly code is
                   generated to show how the benchmarks can be compiled with the synthesized instruction
                   set. The problem of designing instruction sets is formulated as a modified scheduling
                   problem. A binary tuple is proposed to model the semantics of instructions and integrate the
                   instruction formation process into the scheduling process. A simulated annealing scheme is
                   used to solve for the schedules. Experiments have shown that the approach is capable of
                   synthesizing powerful instructions for modern pipelined microprocessors, and that it runs in
                   reasonable time and with a modest amount of memory for large applications.

94
             A data dependent approach to instruction level power estimation
             - Sarta, D.; Trifone, D.; Ascia, G.
             Editor(s): Piuri, V.
             Catania Univ., Italy

This Paper Appears in :
Low-Power Design, 1999. Proceedings. IEEE Alessandro Volta Memorial Workshop on

on Pages: 182 - 190

             This Conference was Held : 4-5 March 1999
              1999
                                    ISBN: 0-7695-0019-6

             Total Pages: x+203
             References Cited: 7
             Accession Number: 6370330

Abstract:

                   The increasing diffusion of portable systems, like mobile computers and phones, or embedded
                   computing applications has driven the need for power analysis and optimization in digital
                   processors used in these systems. In modern CPUs, power estimation and optimization are
                   two strongly pattern-dependent problems. This means that software has a very strong influence
                   on power consumption, and a power figure for any processor must be related to the software
                   program it runs. Building on recent techniques already described in the literature,
                   we propose a new instruction-level power analysis approach that tries to relate the power
                   dissipation to the executed instructions and their operand values.

95
             Synthesis of application specific instructions for embedded DSP software
             - Hoon Choi; In-Cheol Park; Seung Ho Hwang; Chong-Min Kyung
             Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul, South Korea

             This Paper Appears in :
             Computer-Aided Design, 1998. ICCAD 98. Digest of Technical Papers. 1998 IEEE/ACM International
             Conference on

on Pages: 665 - 671

             This Conference was Held : 8-12 Nov. 1998
              1998
                                    ISBN: 1-58113-008-2

             IEEE Catalog Number: 98CB36287
             Total Pages: xxii+704
             References Cited: 15
             Accession Number: 6127727

Abstract:

                   Application specific instructions play an important role in reducing the required code size and
                   increasing performance. This paper describes a new approach to generating application-specific
                   instructions for DSP applications. The proposed approach is based on a modified subset-sum
                   problem and supports multi-cycle complex instructions as well as single-cycle instructions,
                   whereas previous state-of-the-art approaches can only generate single-cycle instructions or
                   merely select instructions from a fixed superset of possible instructions. The proposed
                   approach is also applicable when the instructions are predefined. Experimental results on
                   real applications show that it is effective in making the instructions meet the given
                   constraints without attaching special hardware accelerators.
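                   The abstract names a modified subset-sum problem but does not say how it is modified. For
                   orientation only, the sketch below solves the classic 0/1 subset-sum problem by dynamic
                   programming over reachable sums. Reading the weights as cycle counts of candidate operation
                   groups and the target as a cycle budget is a hypothetical interpretation, and `subset_sum`
                   is not the authors' algorithm.

```python
from typing import Dict, List, Optional, Tuple

def subset_sum(weights: List[int], target: int) -> Optional[List[int]]:
    """Classic 0/1 subset-sum by dynamic programming.

    Returns the sorted indices of a subset of `weights` that sums exactly
    to `target`, or None if none exists. Weights are non-negative ints.
    """
    # parent[s] = (previous_sum, item_index) that first reached sum s
    parent: Dict[int, Optional[Tuple[int, int]]] = {0: None}
    for i, w in enumerate(weights):
        # iterate over a snapshot of sums reached without item i,
        # so each item is used at most once
        for s in list(parent.keys()):
            t = s + w
            if t <= target and t not in parent:
                parent[t] = (s, i)
    if target not in parent:
        return None
    # walk the parent pointers back to recover the chosen items
    chosen: List[int] = []
    s = target
    while parent[s] is not None:
        prev, i = parent[s]
        chosen.append(i)
        s = prev
    return sorted(chosen)
```

                   For example, `subset_sum([3, 5, 2, 7], 10)` returns `[0, 1, 2]` (3 + 5 + 2 = 10). At most
                   `target + 1` sums are tracked per item, so the running time is O(n x target), which is
                   pseudo-polynomial rather than polynomial in the input size.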

96
Generating Instruction Sets And Microarchitectures From Applications
- Ing-Jer Huang; Despain, A.M.

This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on

on Pages: 391 - 396

This Conference was Held : November 6-10, 1994
ISSN: 1063-6757

Abstract:

Not Available

97
             Hardware-software co-designing benchmark-driven superpipelined
             instruction set processors
             - Ching-Long Su; Despain, A.M.
             Lab. of Adv. Comput. Archit., Univ. of Southern California, Los Angeles, CA, USA

             This Paper Appears in :
             Computer Software and Applications Conference, 1994. COMPSAC 94. Proceedings., Eighteenth
             Annual International

             on Pages: 319
             This Conference was Held : 9-11 Nov. 1994
              1994
                                    ISBN: 0-8186-6705-2

             IEEE Catalog Number: 94CH35721
             Total Pages: xvii+477
             References Cited: 2
             Accession Number: 4829894

Abstract:

                   This paper focuses on the issues of designing an optimal superpipelined ISP (instruction set
                   processor) driven by a set of benchmark programs. Most issues discussed in this paper also
                   apply to VLIW and superscalar processors.

98
             An integrated approach to retargetable code generation
             - Wilson, T.; Grewal, G.; Halley, B.; Banerji, D.
             VLSI-CAD Group, Guelph Univ., Ont., Canada

This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium on

on Pages: 70 - 75

             This Conference was Held : 18-20 May 1994
              1994
                                    ISBN: 0-8186-5785-5

             IEEE Catalog Number: 94TH0641-1
             Total Pages: ix+171
             References Cited: 10
             Accession Number: 4706381

Abstract:

                   Special-purpose instruction set processors (ISPs) challenge compilers because of
                   instruction level parallelism, small numbers of registers, and highly specialized register
                   capabilities. Many traditionally separate subproblems in code generation have been unified
                   and jointly optimized within a single integer linear programming (ILP) model. ILP modeling
                   provides a powerful methodology for generating high-quality code for a variety of ISPs.

99
             Media architecture: general purpose vs. multiple application-specific
             programmable processor
             - Chunho Lee; Kin, J.; Potkonjak, M.; Mangione-Smith, W.H.
             Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA

This Paper Appears in :
Design Automation Conference, 1998. Proceedings

on Pages: 321 - 326

             This Conference was Held : 15-19 June 1998
              1998
                                    ISBN: 0-89791-964-5

             IEEE Catalog Number: 98CH36175
             Total Pages: xxxii+820
             References Cited: 33
             Accession Number: 6084423

Abstract:

                   In this paper we present a framework that makes it possible for a designer to rapidly explore
                   the application-specific programmable processor design space under area constraints. The
                   framework uses a production-quality compiler and simulation tools to synthesize a
                   high-performance machine for an application. Using the framework, we evaluate the validity of
                   the fundamental assumption behind the development of application-specific programmable
                   processors. Application-specific processors are based on the idea that applications differ
                   from each other in key architectural parameters, such as the available instruction-level
                   parallelism, the demand on various hardware components (e.g., cache memory units, register
                   files), and the need for different numbers of functional units. We found that the framework
                   introduced in this paper can be valuable in making early design decisions such as area and
                   architectural trade-offs, cache and instruction-issue-width trade-offs under area constraints,
                   and the choice of the number of branch units and issue width.

100
             Architecture Description Languages for Systems-on-Chip Design
             - Tomiyama H.; Halambi A.; Grun P.; Dutt N.; Nicolau A.
             University of California, Irvine, CA, USA
             This Paper Appears in :
             APCHDL 1999

Abstract:
Not Available

101
             Parameterized System Design
             - Givargis T.D.; Vahid F.
             University of California, Riverside, CA

This Paper Appears in :
CODES 2000

This Conference was Held : May 2000

              Abstract:
                    Continued growth in chip capacity has led to new methodologies stressing reuse,
                    not only of pre-designed processing components, but even of entire pre-designed
                    architectures. To be used across a variety of applications, such architectures must be
                    heavily parameterized, so they can adapt to those applications' differing constraints
                    by trading off power, performance and size. We describe several parameterized
                    system design issues, and provide results showing how a single architecture with
                    easily configurable parameters can support a wide range of tradeoffs.

102
             Power Analysis Of Embedded Software: A First Step Towards Software
             Power Minimization
             - Tiwari, V.; Malik, S.; Wolfe, A.

This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on

on Pages: 384 - 390

This Conference was Held : November 6-10, 1994
ISSN: 1063-6757

Abstract:
Not Available

103

             Power analysis and minimization techniques for embedded DSP software
             - Mike Tien-Chien Lee; Tiwari, V.; Malik, S.; Fujita, M.
             Fujitsu Labs. of America, Santa Clara, CA, USA

This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

on Pages: 123 - 135

             This Conference was Held : 18-20 Jan. 1995
              March 1997
                                 Vol. 5
                                                   Issue: 1
                                                                      ISSN: 1063-8210

             References Cited: 15
             CODEN: IEVSE9
             Accession Number: 5525493

Abstract:

                   Power is becoming a critical constraint for designing embedded applications. Current power
                   analysis techniques based on circuit-level or architectural-level simulation are either
                   impractical or inaccurate for estimating the power cost of a given piece of application
                   software. In this paper, an instruction-level power analysis model is developed for an
                   embedded digital signal processor (DSP) based on physical current measurements.
                   Significant points of difference have been observed between the software power model for
                   this custom DSP processor and the power models that have been developed earlier for some
                   general purpose commercial microprocessors. In particular, the effect of circuit state on the
                   power cost of an instruction stream is more marked in the case of this DSP processor. In
                   addition, the processor has special architectural features that allow dual memory accesses
                   and packing of instructions into pairs. The energy reduction possible through the use of these
                   features is studied. The on-chip Booth multiplier on the processor is a major source of
                   energy consumption for DSP programs. A microarchitectural power model for the multiplier is
                   developed and analyzed for further power minimization. In order to exploit all of the above
                   effects, a scheduling technique based on the new instruction-level power model is proposed.
                   Several example programs are provided to illustrate the effectiveness of this approach.
                   Energy reductions varying from 26% to 73% have been observed. These energy savings are
                   real and have been verified through physical measurement. It should be noted that the energy
                   reduction essentially comes for free. It is obtained through software modification, and thus,
                   entails no hardware overhead. In addition, there is no loss of performance since the running
                   times of the modified programs either improve or remain unchanged.
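                   The abstract does not state the model's equations. A commonly cited form of instruction-level
                   power model from this line of work charges each instruction a base cost and adds an
                   inter-instruction overhead whenever consecutive instructions change the circuit state. The
                   sketch below assumes that structure; the instruction names, energy values, and the functions
                   `trace_energy` and `cheapest_order` are hypothetical, and the exhaustive search merely stands
                   in for the paper's scheduling technique on a handful of mutually independent instructions.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

def trace_energy(trace: Sequence[str],
                 base: Dict[str, float],
                 overhead: Dict[Tuple[str, str], float]) -> float:
    """Base cost per instruction plus a circuit-state overhead per adjacent pair.

    The overhead table is treated as symmetric; pairs not listed cost 0.
    """
    total = sum(base[ins] for ins in trace)
    for prev, cur in zip(trace, trace[1:]):
        # look the pair up in either order, since the table is symmetric
        total += overhead.get((prev, cur), overhead.get((cur, prev), 0.0))
    return total

def cheapest_order(instrs: List[str],
                   base: Dict[str, float],
                   overhead: Dict[Tuple[str, str], float]) -> Tuple[str, ...]:
    """Exhaustively pick the lowest-energy ordering (independent instructions only)."""
    return min(permutations(instrs),
               key=lambda order: trace_energy(order, base, overhead))

# Hypothetical energy values, in arbitrary units
BASE = {"MUL": 4.0, "ADD": 2.0, "LD": 3.0}
OVERHEAD = {("MUL", "ADD"): 1.0, ("ADD", "LD"): 0.5, ("MUL", "LD"): 1.5}
```

                   With these values, `["MUL", "LD", "ADD"]` costs 11.0 units, while the reordering returned by
                   `cheapest_order`, `("MUL", "ADD", "LD")`, costs 10.5: the base costs are unchanged and only
                   the inter-instruction overhead drops, which is the kind of saving a power-aware scheduler
                   exploits without any hardware change.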

104

             Designing for low power in complex embedded DSP systems
             - Gebotys, C.H.; Gebotys, R.J.
             Editor(s): Sprague, R.H., Jr.
             Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada

             This Paper Appears in :
             Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference
             on

             on Pages: 8 pp.
             This Conference was Held : 5-8 Jan. 1999
              1999
                                    ISBN: 0-7695-0001-3

             Total Pages: liii+341
             References Cited: 17
             Accession Number: 6182117

Abstract:

                   This paper presents an empirical methodology for the low-power-driven design of complex
                   embedded DSP systems. Compared with high-performance DSP design, low-power DSP design has
                   received little research attention, yet power dissipation is an increasingly important
                   problem. Highly accurate power prediction models for DSP software are derived. Unlike
                   previous techniques, the methodology derives software power prediction models using
                   statistical optimization, and it is verified with real power measurements. The approach is
                   general enough to be applied to any embedded DSP processor. Results from two different
                   DSP processors and over 180 power measurements of DSP code show that power can be
                   predicted for embedded systems design with less than 4% error. This result is important for
                   developing a general methodology for power characterization of embedded DSP software
                   since low power is critical to complex DSP applications in many cost sensitive markets.

105
Speeding up Power Estimation of Embedded Software
- Sama, A.; Balakrishnan, M.; Theeuwen, J.F.M.

This Paper will Appear in :
ISLPED 2000, to be held in July 2000.

106
             High-level power modeling, estimation, and optimization
             - Macii, E.; Pedram, M.; Somenzi, F.
             Politecnico di Torino, Italy

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 1061 - 1079

              Nov. 1998
                                 Vol. 17
                                                   Issue: 11
                                                                      ISSN: 0278-0070

             References Cited: 111
             CODEN: ITCSDI
             Accession Number: 6120274

Abstract:

                   Silicon area, performance, and testability have been, so far, the major design constraints to
                   be met during the development of digital very-large-scale-integration (VLSI) systems. In
                   recent years, however, things have changed; increasingly, power has been given weight
                   comparable to the other design parameters. This is primarily due to the remarkable success
                   of personal computing devices and wireless communication systems, which demand
                   high-speed computations with low power consumption. In addition, there exists a strong
                   pressure for manufacturers of high-end products to keep power under control, due to the
                   increased costs of packaging and cooling this type of device. Finally, the need to ensure high
                   circuit reliability has become more stringent. Tools for the automatic design of low-power
                   VLSI systems have thus become necessary. More specifically, following a natural trend,
                   researchers' interest has recently shifted to the investigation of power modeling,
                   estimation, synthesis, and optimization techniques that
                   account for power dissipation during the early stages of the design flow. This paper surveys
                   representative contributions to this area that have appeared in the recent literature.

107
             A minimum-cost circulation approach to DSP address-code generation
             - Gebotys, C.H.
             Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 726 - 741

              June 1999
                                 Vol. 18
                                                   Issue: 6
                                                                      ISSN: 0278-0070

             References Cited: 24
             CODEN: ITCSDI
             Accession Number: 6270838

Abstract:

                   This paper presents a new approach to solving the DSP address code generation problem. A
                   minimum cost circulation approach is used to efficiently generate high-performance
                   addressing code in polynomial time. Results show that addressing code size improvements of
                   up to 6× are obtained, accounting for up to a 1.6× improvement in code size
                   and performance of compiler-generated DSP code. This research is important for industry
                   since this value-added technique can improve code size, energy dissipation, and
                   performance, without increasing cost.

108
             Application-driven synthesis of memory-intensive systems-on-chip
             - Kirovski, D.; Chunho Lee; Potkonjak, M.; Mangione-Smith, W.H.
             Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA

This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

on Pages: 1316 - 1326

              Sept. 1999
                                 Vol. 18
                                                   Issue: 9
                                                                      ISSN: 0278-0070

             References Cited: 29
             CODEN: ITCSDI
             Accession Number: 6347848

Abstract:

                   Due to the increasing popularity of multimedia and communications applications,
                   requirements for application-specific systems typically include design flexibility and data
                   management ability. Since the development of such systems is a market-driven task,
                   reducing the time to market and manufacturing cost, while still satisfying application
                   performance requirements, is an important system synthesis requirement. We have
                   developed a new approach for area optimization of core-based systems. The approach uses
                   basic block relocation in order to reduce the number of cache misses and, thus, enable
                   hardware savings during system synthesis. Given a processor model, a cache model, and a
                   set of nonpreemptive tasks with timing constraints, the goal of the synthesis framework is to
                   select a system configuration (processor, I-cache, and D-cache) of minimal area that
                   satisfies the performance constraints. The system synthesis framework has two key
                   components. The first component is a code optimization engine that relocates basic blocks
                   within a given assembly program in order to reduce the number of cache misses. The second
                   component is a search mechanism that leverages the improvements in code performance
                   obtained by the first component to select the most area-efficient system configuration. In
                   order to bridge the gap between the profiling and modeling tools, we have constructed a new
                   performance evaluation platform. It integrates the existing modeling, profiling, and simulation
                   tools with the developed system-level synthesis tools. The effectiveness of the synthesis
                   approach is demonstrated on a variety of modern real-life multimedia and communication
                   applications.

109
             Programmable DSP architectures. I
             - Lee, E.A.
             Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA

This Paper Appears in :
ASSP Magazine, IEEE [see also IEEE Signal Processing Magazine]

on Pages: 4 - 19

              Oct. 1988
                                 Vol. 5
                                                   Issue: 4
                                                                      ISSN: 0740-7467

             References Cited: 22
             CODEN: IAMAEI
             Accession Number: 3354120

Abstract:

                   The architectural features of single-chip programmable digital signal processors (DSPs) are
                   explored. The focus is on the most basic such feature, the integration of a hardware
                   multiplier/accumulator into the data path, and a more subtle feature, the use of several (up to
                   six) independent memory banks. These features are studied in terms of the performance
                   benefit and the impact on the user. Representative DSPs from three manufacturers (AT&T,
                   Motorola, and Texas Instruments) are used to illustrate the ideas and to compare different
                   solutions to the same problems.

110
             Programmable DSP architectures. II
             - Lee, E.A.
             Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA

This Paper Appears in :
ASSP Magazine, IEEE [see also IEEE Signal Processing Magazine]

on Pages: 4 - 14

              Jan. 1989
                                 Vol. 6
                                                   Issue: 1
                                                                      ISSN: 0740-7467

             References Cited: 9
             CODEN: IAMAEI
             Accession Number: 3375276

Abstract:

                   For pt. I see ibid., vol. 5, no. 4, pp. 4-19, Oct. 1988. Three distinct techniques for dealing
                   with pipelining, namely interlocking, time-stationary coding, and data-stationary coding, are
                   examined. These techniques are studied in light of the performance benefit and the impact on
                   the user. Representative DSPs from AT&T, Motorola, and Texas Instruments are used to
                   illustrate the ideas and compare different solutions to the same problems. Trends are
                   discussed, and some predictions for the future are made.

111
             Design Challenges for New Application-Specific Processors
             - Jacome, M.F.; de Veciana, G.

Abstract:

      This article discusses research challenges in developing methodologies and retargetable compilers/CAD tools for the
      synthesis and analysis of a key component in portable digital communications and multimedia consumer electronics
      systems, namely, application-specific processors and associated compilers. Typically, functionality is implemented in
      software; however, the penalty in cost efficiency incurred by using general-purpose processors, or even "off-the-shelf"
      DSP cores, may be unacceptable. Very long instruction word (VLIW) application-specific instruction-set processors
      (ASIPs) realize attractive cost/efficiency trade-offs. Still, difficulties with ASIP design and current compiler technology
      pose significant obstacles to this technology. In this article we discuss these challenges and propose a framework to
      jointly address the synthesis of VLIW ASIPs and the development of high-quality retargetable compilers for such
      specialized processors.