This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium on
on Pages: 11 - 16
This Conference was Held : 18-20 May 1994
1994
ISBN: 0-8186-5785-5
IEEE Catalog Number: 94TH0641-1
Total Pages: ix+171
References Cited: 14
Accession Number: 4706372
Abstract:
Application Specific Instruction set Processors (ASIPs) are field or mask
programmable
processors of which the architecture and instruction set are optimised
to a specific
application domain. ASIPs offer a high degree of flexibility and are therefore
increasingly
being used in competitive markets like telecommunications. However, adequate
CAD
techniques for the design and programming of ASIPs are missing hitherto.
An interactive
approach for the definition of optimised microinstruction sets of ASIPs
is presented. A
second issue is a method for instruction selection when generating code
for a predefined
ASIP. A combined instruction set and data-path model is generated, onto
which the
application is mapped.
2
A performance maximization algorithm to design ASIPs under the
constraint of chip area including RAM and ROM sizes
- Nguyen Ngoc Binh; Imai, M.; Takeuchi, Y.
Dept. of Inf. & Math. Sci., Osaka Univ., Japan
This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia and South Pacific
on Pages: 367 - 372
This Conference was Held : 10-13 Feb. 1998
1998
ISBN: 0-7803-4425-1
IEEE Catalog Number: 98EX121
Total Pages: xxxviii+606
References Cited: 19
Accession Number: 5984946
Abstract:
In designing ASIPs (Application Specific Integrated Processors) the papers
investigated so
far have almost focused on the optimization of the CPU core and did not
pay enough attention
to the optimization of the RAM and ROM size together. This paper overcomes
this limitation
and proposes an optimization algorithm to define the best tradeoff between
the CPU core,
RAM and ROM of an ASIP chip to achieve the highest performance while satisfying
design
constraints on the chip area. The partitioning problem is formalized as
a combinatorial
optimization problem that partitions the operations into hardware and software
so that the
performance of the designed ASIP is maximized under given chip area constraint,
where the
chip area includes the HW cost of the register file for a given application
program with the
associated input data set. The optimization problem is parameterized so
that it can be applied
with different technologies to synthesize CPU cores, RAMs or ROMs. The
experimental
results show that the proposed algorithm is found to be effective and efficient.
3
PEAS-I: A hardware/software co-design system for ASIPs
- Alomary, A.; Nakata, T.; Honma, Y.; Sato, J.; Hikichi, N.; Imai, M.
Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Design Automation Conference, 1993, with EURO-VHDL '93. Proceedings EURO-DAC '93., European
on Pages: 2 - 7
This Conference was Held : 20-24 Sept. 1993
1993
ISBN: 0-8186-4350-1
IEEE Catalog Number: 93CH3352-2
Total Pages: xxi+579
References Cited: 10
Accession Number: 5038430
Abstract:
The current implementation and experimental results of the PEAS-I (practical
environment
for application specific integrated processor (ASIP) development - Version
I) system are
described. The PEAS-I system is a hardware/software co-design system for
ASIP
development. The input to the system is a set of application programs written
in C language,
an associated data set, and design constraints such as chip area and power
consumption.
The system generates an optimized CPU core design in the form of an HDL,
as well as a set
of application program development tools, such as a C compiler, assembler,
and simulator. A
novel method that formulates the design of an optimal instruction set using
an integer
programming approach is described. A tool that enables the designer to
predict the chip area
and performance of the design before the detailed design is completed is
discussed.
Application program development tools are generated in addition to the
ASIP hardware.
4
An ASIP design methodology for embedded systems
- Kucukcakar, K.
Escalade Corp., Santa Clara, CA, USA
This Paper Appears in :
Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International Workshop on
on Pages: 17 - 21
This Conference was Held : 3-5 May 1999
1999
ISBN: 1-58113-132-1
IEEE Catalog Number: 99TH8450
Total Pages: vii+216
References Cited: 8
Accession Number: 6319827
Abstract:
A well-known challenge during processor design is to obtain the best possible
results for a
typical target application domain that is generally described as a set
of benchmarks.
Obtaining the best possible result in turn becomes a complex tradeoff between
the generality
of the processor and the physical characteristics. A custom instruction
to perform a task can
result in significant improvements for an application, but generally, at
the expense of some
overhead for all other applications. In the recent years, Application-Specific
Instruction-Set
Processors (ASIP) have gained popularity in production chips as well as
in the research
community. In this paper, we present a unique architecture and methodology
to design ASIPs
in the embedded controller domain by customizing an existing processor
instruction set and
architecture rather than creating an entirely new ASIP tuned to a benchmark.
5
An integrated design environment for application specific integrated
processor
- Sato, J.; Imai, M.; Hakata, T.; Alomary, A.Y.; Hikichi, N.
Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Computer Design: VLSI in Computers and Processors, 1991. ICCD '91. Proceedings, 1991 IEEE International Conference on
on Pages: 414 - 417
This Conference was Held : 14-16 Oct. 1991
1991
ISBN: 0-8186-2270-9
Total Pages: xvi+654
References Cited: 10
Accession Number: 4128007
Abstract:
A novel framework for ASIP (application specific integrated processor)
development is
proposed. The system accepts a set of example programs written in the C
language and their
expected data as input, and profiles these programs both statically and
dynamically. Then
taking advantage of the profiled results, the system decides the instruction
set and hardware
architectures of ASIP, and synthesizes the CPU core design of the ASIP,
as well as the
software development tools for the ASIP such as compiler and simulator.
6
PSCP: A scalable parallel ASIP architecture for reactive systems
- Pyttel, A.; Sedlmeier, A.; Veith, C.
Corp. Technol., Siemens AG, Munich, Germany
This Paper Appears in :
Design, Automation and Test in Europe, 1998., Proceedings
on Pages: 370 - 376
This Conference was Held : 23-26 Feb. 1998
1998
ISBN: 0-8186-8359-7
IEEE Catalog Number: 98EX123
Total Pages: xxxiv+993
References Cited: 18
Accession Number: 5906829
Abstract:
We describe a codesign approach based on a parallel and scalable ASIP architecture,
which
is suitable for the implementation of reactive systems. The specification
language of our
approach is extended statecharts. Our ASIP architecture is scalable with
respect to the
number of processing elements as well as parameters such as bus widths
and register file
sizes. Instruction sets are generated from a library of components covering
a spectrum of
space/time trade-off alternatives. Our approach features a heuristic static
timing analysis
step for statecharts. An industrial example requiring the real-time control
of several stepper
motors illustrates the benefits of our approach.
7
Design of an ASIP architecture for low-level visual elaborations
- Raffo, L.; Sabatini, S.P.; Mantelli, M.; De Gloria, A.; Bisio, G.M.
Dept. of Electr. & Electron. Eng., Cagliari Univ., Italy
This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
on Pages: 145 - 153
This Conference was Held : 18-20 Jan. 1995
March 1997
Vol. 5
Issue: 1
ISSN: 1063-8210
References Cited: 10
CODEN: IEVSE9
Accession Number: 5525495
Abstract:
We consider the design process of VLSI systems dedicated to the real-time
implementation
of cooperative algorithms whose functionalities can be characterized by
multilayer
ensembles of simple elements which interact locally. These algorithms are
related, even
though not exclusively, to the implementation of various tasks in low-level
machine vision.
The starting point in the design process is the formulation of the sequential
algorithm that
computes the behavior of the system. Algorithmic transformations are performed
to expose
the parallelism originally present in the task. Given the description in
terms of parallel loops,
we partition the system and organize it as a set of processing units. The
architectural
structure of these units takes properly into account the algorithmic constraints
on precision
both in data representation and computation. The program flow implemented
by our
programmable architectural solution (ASIP) is an iterative sequence of
multiply-and-accumulate operations performed in parallel. The programmability
concerns
both the structure/coefficients of the algorithm (depending on the specific
application) and its
computational parameters. The architecture's main blocks are described
in VHDL and
synthesized as a semi-custom chip, using standard tools. Following this
procedure, we
designed an ASIP core for performing real-time texture-based image segregation.
8
Lower bound on latency for VLIW ASIP datapaths
- Jacome, M.F.; De Veciana, G.
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
This Paper Appears in :
Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on
on Pages: 261 - 268
This Conference was Held : 7-11 Nov. 1999
1999
ISBN: 0-7803-5832-5
IEEE Catalog Number: 99CH37051
Total Pages: xxiv+611
References Cited: 11
Accession Number: 6441936
Abstract:
Traditional lower bound estimates on latency for dataflow graphs assume
no data transfer
delays. While such approaches can generate tight lower bounds for datapaths
with a
centralized register file, the results may be uninformative for datapaths
with distributed
register file structures that are characteristic of VLIW ASIPs (very long
instruction word
application-specific instruction set processors). In this paper, we propose
a latency bound
that accounts for such data transfer delays. The novelty of our approach
lies in constructing
the "window dependency graph" and bounds associated with the problem which
capture
delay penalties due to operation serialization and/or data moves among
distributed register
files. Through a set of benchmark examples, we show that the bound is competitive
with
state-of-the-art approaches. Moreover, our experiments show that the approach
can aid an
iterative improvement algorithm in determining good functional unit assignments, a
key step
in code generation for VLIW ASIPs.
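
As an illustration of the "traditional" bound that this abstract contrasts itself with, the sketch below computes the classic latency lower bound for a dataflow graph with no data transfer delays: the larger of the critical path length and the resource bound. It is not the window dependency graph method of the paper. The function name, the unit-latency assumption, the single pool of identical functional units, and the acyclic successor-list input are all assumptions made for this example.

```python
from math import ceil
from typing import Dict, List


def latency_lower_bound(succs: Dict[str, List[str]], num_units: int) -> int:
    """Classic bound, ignoring data transfer delays.

    succs maps each operation to its successors in an acyclic dataflow graph.
    Every operation is assumed to take one cycle on any of num_units units.
    """
    depth: Dict[str, int] = {}

    def longest_from(node: str) -> int:
        # Length (in operations) of the longest dependency chain starting at node.
        if node not in depth:
            depth[node] = 1 + max(
                (longest_from(s) for s in succs.get(node, [])), default=0
            )
        return depth[node]

    # Collect every operation, including those that appear only as successors.
    nodes = set(succs) | {s for ss in succs.values() for s in ss}
    critical_path = max((longest_from(n) for n in nodes), default=0)
    resource_bound = ceil(len(nodes) / num_units)
    return max(critical_path, resource_bound)
```

With distributed register files, extra move operations can push the real latency above this value, which is the gap the abstract says its bound is meant to close.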
9
A new HW/SW partitioning algorithm for synthesizing the highest
performance pipelined ASIPs with multiple identical FUs
- Binh, N.N.; Imai, M.; Shiomi, A.
Dept. of Inf. & Comput. Sci., Osaka Univ., Japan
This Paper Appears in :
Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition, Proceedings EURO-DAC '96, European
on Pages: 126 - 131
This Conference was Held : 16-20 Sept. 1996
1996
ISBN: 0-8186-7573-X
IEEE Catalog Number: 96CB36000
Total Pages: xxiii+579
References Cited: 18
Accession Number: 5412409
Abstract:
This paper introduces a new HW/SW partitioning algorithm for automatic
synthesis of a
pipelined CPU architecture with multiple identical functional units (MIFUs)
of each type in
designing ASIPs (Application Specific Integrated Processors). The partitioning
problem is
formalized as a combinatorial optimization problem that partitions the
operations into
hardware and software so that the performance of the designed ASIP is maximized
under
given gate count and power consumption constraints, regarding the optimal
selection of
needed FUs of each type. A branch-and-bound algorithm with proposed lower
bound
function is used to solve the formalized problem. The experimental results
show that the
proposed algorithm is found to be effective and efficient.
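
The partitioning formulation in this abstract can be illustrated with a much simplified branch-and-bound sketch. It is not the authors' algorithm: pipelining, power constraints, and multiple identical functional units are left out, and the pruning bound used here is a fractional-knapsack estimate chosen for the example rather than the bound function proposed in the paper. The function name, the (name, hardware area, cycles saved) tuples, and the single area budget are hypothetical.

```python
from typing import List, Tuple


def partition(ops: List[Tuple[str, int, int]],
              area_budget: int) -> Tuple[int, List[str]]:
    """Pick operations to implement in hardware.

    ops holds (name, hw_area, cycles_saved) with hw_area > 0.
    Returns (total cycles saved, sorted names of the chosen operations).
    """
    # Sort by saving per unit area so the fractional bound below is valid.
    items = sorted(ops, key=lambda o: o[2] / o[1], reverse=True)
    best = [0, []]  # best saving found so far and the matching choice

    def bound(i: int, area_left: int, saving: float) -> float:
        # Optimistic estimate: fill the remaining area greedily and allow
        # a fraction of the first operation that no longer fits.
        for _, area, gain in items[i:]:
            if area <= area_left:
                area_left -= area
                saving += gain
            else:
                return saving + gain * area_left / area
        return saving

    def search(i: int, area_left: int, saving: int, chosen: List[str]) -> None:
        if saving > best[0]:
            best[0], best[1] = saving, list(chosen)
        # Prune when even the optimistic estimate cannot beat the incumbent.
        if i == len(items) or bound(i, area_left, saving) <= best[0]:
            return
        name, area, gain = items[i]
        if area <= area_left:  # branch 1: implement this operation in hardware
            chosen.append(name)
            search(i + 1, area_left - area, saving + gain, chosen)
            chosen.pop()
        search(i + 1, area_left, saving, chosen)  # branch 2: leave it in software

    search(0, area_budget, 0, [])
    return best[0], sorted(best[1])
```

Operations left out of the returned list stay in software in this toy model.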
10
System design using ASIPs
- Carro, L.; Pereira, G.A.; Alba, C.; Suzim, A.
Univ. Federal do Rio Grande do Sul, Porto Alegre, Brazil
This Paper Appears in :
Engineering of Computer-Based Systems, 1996. Proceedings., IEEE Symposium and Workshop on
on Pages: 80 - 85
This Conference was Held : 11-15 March 1996
1996
ISBN: 0-8186-7355-9
IEEE Catalog Number: 96TB100022
Total Pages: xi+465
References Cited: 9
Accession Number: 5226399
Abstract:
This paper describes our current research in the field of systems design,
trying to reach an
Application Specific System Integration (ASIS). We try to go beyond circuit
integration to
reach systems integration, using Application Specific Processors (ASIPs)
with different
architectures. Our target system is based on industry applications. In
this paper we show the
environment that allows the fine tuning of RISC processors to specific
applications, and the
migration of a CISC microcontroller to an ASIP architecture. The studied
examples show
meaningful gains regarding the total area of the processor for each approach.
This free space
can be used to integrate other parts of the whole system.
11
Incorporating compiler feedback into the design of ASIPs
- Onion, F.; Nicolau, A.; Dutt, N.
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.
on Pages: 508 - 513
This Conference was Held : 6-9 March 1995
1995
ISBN: 0-8186-7039-8
IEEE Catalog Number: 95TH8058
Total Pages: xxvii+611
References Cited: 12
Accession Number: 5057083
Abstract:
This paper presents a framework for providing feedback from an optimizing
compiler into the
design of an ASIP (Application Specific Instruction-set Processor). The
optimizing compiler
is used to assess the hardware needs of a suite of applications to which
the ASIP is to be
tuned. By incorporating the compiler into the design process, the design
space is increased
as more information is provided at an earlier stage during the design process.
Our initial
study involves detecting potentially chainable operation sequences using
scheduling
techniques developed for exploiting instruction-level parallelism. Results
of this study are
included.
12
Application-Specific Pipelines for Exploiting Instruction-Level Parallelism
- Childers, B.R.; Davidson, J.W.
University of Virginia
Technical Report No. CS-98-14, May 1, 1998
Abstract :
Application-specific processor design is a promising approach for meeting
the
performance and cost goals of a system. Application-specific processors
are
especially promising for embedded systems (e.g., automobile control systems,
avionics, cellular phones, etc.) where a small increase in performance
and
decrease in cost can have a large impact on a product's viability. Sutherland,
Sproull, and Molnar have proposed a new pipeline organization called the
Counterflow Pipeline (CFP). This paper shows that the CFP is an ideal architecture
for fast, low-cost design of high-performance processors customized for
computation-intensive embedded applications. First, we describe why CFP's
are
particularly well-suited to realizing application-specific processors.
Second, we describe how a CFP tailored to an application can be constructed
automatically. Third, we present measurements that show CFP's elegantly
and simply
provide speculative execution, out-of-order execution, and register renaming
that is
matched to the application. These measurements show that CFP's speculative
and out-of-order execution allow it to tolerate frequent control dependences
and
high-latency operations such as memory accesses. Finally, we show that
asynchronous counterflow pipelines may achieve very high-performance by reducing
the
average execution latency of instructions over synchronous implementations.
Application speedups of up to 7.8 are achieved using custom counterflow pipelines
for
several well-known kernel loops.
13
Hierarchical test generation and design for testability methods for ASPPs
and ASIPs
- Ghosh, I.; Raghunathan, A.; Jha, N.K.
Fujitsu Labs. of America, Sunnyvale, CA, USA
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on
on Pages: 357 - 370
March 1999
Vol. 18
Issue: 3
ISSN: 0278-0070
References Cited: 38
CODEN: ITCSDI
Accession Number: 6196926
Abstract:
In this paper, we present design for testability (DFT) and hierarchical
test generation
techniques for facilitating the testing of application-specific programmable
processors
(ASPPs) and application-specific instruction processors (ASIPs). The method
utilizes the
register-transfer level (RTL) circuit description of an ASPP or ASIP to
come up with a set of
test microcode patterns which can be written into the instruction read-only
memory (ROM)
of the processor. These lines of microcode dictate a new control/data flow
in the circuit and
can be used to test modules which are not easily testable. The new control/data
flow is used
to justify precomputed test sets of a module from the system primary inputs
to the module
inputs and propagate output responses from the module output to the system
primary
outputs. The testability analysis, which is based on the relevant control/data
flow extracted
from the RTL circuit, is symbolic. Thus, it is independent of the bit-width
of the data path
and is extremely fast. The test microcode patterns are a by-product of
this analysis. If the
derived test microcode cannot test all untested modules in the circuit,
then test multiplexers
are added (usually to the off-critical paths of the data path) to test
these modules. This is
done to guarantee the testability of all modules in the circuit. If the
control microcode
memory of the processor is erasable, then the test microcode lines can
be erased once the
testing of the chip is over. In that case, the DFT scheme has very little
overhead (typically
less than 1%). Otherwise, the test microcode lines remain as an overhead
in the control
memory. The method requires the addition of only one external test pin.
Application of this
technique to several examples has resulted in a very high fault coverage
(above 99.6%) for
all of them. The test generation time is about three orders of magnitude
smaller compared to
an efficient gate-level sequential test generator. The average area overhead
(without
assuming an erasable ROM) is 3.1% while the delay overheads are negligible.
This method
does not require any scan in the controller or data path. It is also amenable
to at-speed
testing.
14
Functional verification of intellectual properties (IP): a simulation-based
solution for an application-specific instruction-set processor
- Stadler, M.; Rower, T.; Kaeslin, H.; Felber, N.; Fichtner, W.; Thalmann,
M.
Integrated Syst. Lab., Swiss Fed. Inst. of Technol., Zurich, Switzerland
This Paper Appears in :
Test Conference, 1999. Proceedings. International
on Pages: 414 - 420
This Conference was Held : 28-30 Sept. 1999
1999
ISBN: 0-7803-5753-1
IEEE Catalog Number: 99CH37034
Total Pages: xiv+1163
References Cited: 16
Accession Number: 6536392
Abstract:
Scalability and customization properties of IP modules demand for new approaches
in
functional verification. We present a novel simulation-based solution for
an
Application-specific Instruction-set Processor (ASIP). Existing assembler
code
preselected by IP-configurable constraints forms the verification data
base (reference
stimuli). A behavioral "golden model" of the IP is used to derive expected
responses suitable
for any possible configuration of the final ASIP (RTL) implementation.
Cycle-based
verification is performed by stimulating the RTL model with the assembled
reference stimuli
and by comparing the outputs (actual responses) against the expected responses.
Primary
input stimulation is accomplished by reading back interface data prior
written to a memory
(model) under control of the reference stimuli. The synchronization of
the
configuration-dependent actual responses to the non-cycle-related expected
responses is
achieved by a mechanism based on "interface-specific activity scheduling",
which furthermore
reduces the number of vectors efficiently, resulting in a significant
simulation
speed-up.
15
Reconfigurable systems: activities in Asia and South Pacific
- Amano, H.; Shibata, Y.
Dept. of Comput. Sci., Keio Univ., Yokohama, Japan
This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia and South Pacific
on Pages: 453 - 457
This Conference was Held : 10-13 Feb. 1998
1998
ISBN: 0-7803-4425-1
IEEE Catalog Number: 98EX121
Total Pages: xxxviii+606
References Cited: 43
Accession Number: 5920071
Abstract:
Systems and researches on reconfigurable systems in Asia and South Pacific
are picked up
and introduced. Like Northern America and European countries, various platforms,
application specific systems and education platforms have been proposed
and developed.
16
Exploiting intellectual properties in ASIP designs for embedded DSP
software
- Hoon Choi; Ju Hwan Yi; Jong-Yeol Lee; In-Cheol Park; Chong-Min Kyung
Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Taejon,
South Korea
This Paper Appears in :
Design Automation Conference, 1999. Proceedings. 36th
on Pages: 939 - 944
This Conference was Held : 21-25 June 1999
1999
ISBN: 1-58113-092-9
IEEE Catalog Number: 99CH36361
Total Pages: xxxii+1003
References Cited: 10
Accession Number: 6504323
Abstract:
The growing requirements on the correct design of a high-performance system
in a short
time force us to use IP's in many designs. In this paper, we propose a
new approach to select
the optimal set of IPs and interfaces to make the application program meet
the performance
constraints in ASIP designs. The proposed approach selects IPs with considering
interfaces
and supports concurrent execution of parts of task in kernel as software
code with others in
IPs, while the previous state-of-the-art approaches do not consider IPs
and interfaces
simultaneously and cannot support the concurrent execution. The experimental
results on
real applications show that the proposed approach is effective in making
application
programs meet the performance constraints using IPs.
17
Instruction set selection for ASIP design
- Gschwind, M.
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
This Paper Appears in :
Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International Workshop on
on Pages: 7 - 11
This Conference was Held : 3-5 May 1999
1999
ISBN: 1-58113-132-1
IEEE Catalog Number: 99TH8450
Total Pages: vii+216
References Cited: 26
Accession Number: 6319825
Abstract:
We describe an approach for application-specific processor design based
on an extendible
microprocessor core. Core-based design allows to derive application-specific
instruction
processors from a common base architecture with low non-recurring engineering
cost. The
results of this application-specific customization of a common base architecture
are families
of related and largely compatible processor families. These families can
share support tools
and even binary compatible code which has been written for the common base
architecture.
Critical code portions are customized using the application-specific instruction
set
extensions. We describe a hardware/software co-design methodology which
can be used
with this design approach. The presented approach uses the processor core
to allow early
evaluation of ASIP design options using rapid prototyping techniques. We
demonstrate this
approach with two case studies, based on the implementation and evaluation
of
application-specific processor extensions for Prolog program execution,
and memory
prefetching for vector and matrix operations.
18
Resource constrained dataflow retiming heuristics for VLIW ASIPs
- Jacome, M.; de Veciana, G.; Akturan, C.
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
This Paper Appears in :
Hardware/Software Codesign, 1999. (CODES '99). Proceedings of the Seventh International Workshop on
on Pages: 12 - 16
This Conference was Held : 3-5 May 1999
1999
ISBN: 1-58113-132-1
IEEE Catalog Number: 99TH8450
Total Pages: vii+216
References Cited: 14
Accession Number: 6319826
Abstract:
This paper addresses issues in code generation of time critical loops for
VLIW ASIPs with
heterogeneous distributed register structures. We discuss a code generation
phasing whereby
one first considers binding options that minimize the significant delays
that may be incurred
on such processors. Given such a binding we consider retiming, subject
to code size
constraints, so as to enhance performance. Finally a compatible schedule,
minimizing
latency, is sought. Our main focus in this paper is on the role retiming
plays in this complex
code generation problem. We propose heuristic algorithms for exploring
code
size/performance tradeoffs through retiming. Experimental results are presented
indicating
that the heuristics perform well on a sample of dataflows.
19
A hardware/software codesign partitioner for ASIP design
- Alomary, A.Y.
Appl. Sci. Univ., Amman, Jordan
This Paper Appears in :
Electronics, Circuits, and Systems, 1996. ICECS '96., Proceedings of the Third IEEE International Conference on
on Pages: 251 - 254 vol.1
This Conference was Held : 13-16 Oct. 1996
1996
Vol. 1
ISBN: 0-7803-3650-X
IEEE Catalog Number: 96TH8229
Total Pages: 2 vol. xxix+1256
References Cited: 7
Accession Number: 5621974
Abstract:
This paper introduces a new codesign partitioning method used in automating
the design of
ASIP (Application Specific Integrated Processor). The codesign partitioning
problem is
formalized as a combinatorial optimization problem that partitions the
operations into
hardware and software such that a certain performance goal is met using
minimum hardware
resources. A branch-and-bound algorithm is used to solve the presented
formalization. The
proposed method is found to be effective in producing a quality design
in reasonable time
with a minimum of design interaction.
20
An ASIP instruction set optimization algorithm with functional module
sharing constraint
- Alomary, A.; Nakata, T.; Honma, Y.; Imai, M.; Hikichi, N.
Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Computer-Aided Design, 1993. ICCAD-93. Digest of Technical Papers., 1993 IEEE/ACM International Conference on
on Pages: 526 - 532
This Conference was Held : 7-11 Nov. 1993
1993
ISBN: 0-8186-4490-7
IEEE Catalog Number: 93CH3344-9
Total Pages: xxviii+781
References Cited: 6
Accession Number: 4979737
Abstract:
This paper describes a formal method that selects the instruction set of
an ASIP (application
specific integrated processor) that maximizes the chip performance under
the constraints of
chip area and power consumption. Our contribution includes a new formalization
and
algorithm that considers the functional module sharing in the problem of
instruction set
optimization. This problem was not addressed in the previous work and considering
it leads
to an efficient implementation of the selected instructions. The proposed
method also
enables designers to predict the performance of their designs before implementing
them,
which is an important feature for producing a high quality design in reasonable
time.
21
Mapping statechart models onto an FPGA-based ASIP architecture
- Buchenrieder, K.; Pyttel, A.; Veith, C.
Corp. Res. & Dev., Siemens AG, Munich, Germany
This Paper Appears in :
Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition, Proceedings EURO-DAC '96, European
on Pages: 184 - 189
This Conference was Held : 16-20 Sept. 1996
1996
ISBN: 0-8186-7573-X
IEEE Catalog Number: 96CB36000
Total Pages: xxiii+579
References Cited: 19
Accession Number: 5412417
Abstract:
In this paper, we describe a system to map hardware-software systems specified
with
statechart models on an ASIP architecture based on FPGAs. The architecture
consists of a
reusable CPU core with enhancements to execute the behavior of statecharts
correctly. Our
codesign system generates an application-specific hardware control block,
an
application-specific set of registers, and an instruction stream. The instruction
stream
consists of a static set of core instructions, and a set of custom instructions
for performance
enhancements. In contrast to previous approaches, the presented method
supports extended
statecharts. The system also assists designers during space/time tradeoff
optimizations. The
benefits of the approach are demonstrated with an industrial control application
comparing
two different timing schemes.
22
A hardware/software partitioning algorithm for pipelined instruction set
processor
- Binh, N.N.; Imai, M.; Shiomi, A.; Hikichi, N.
Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Design Automation Conference, 1995, with EURO-VHDL, Proceedings EURO-DAC '95., European
on Pages: 176 - 181
This Conference was Held : 18-22 Sept. 1995
1995
ISBN: 0-8186-7156-4
IEEE Catalog Number: 95CB35850
Total Pages: xxviii+608
References Cited: 9
Accession Number: 5100243
Abstract:
This paper proposes a new method to design an optimal instruction set for
pipelined ASIP
development using a formal HW/SW codesign methodology. The codesign task
addressed in
this paper is to find a set of HW implemented operations to achieve the
highest performance
of a pipelined ASIP under a given gate count and power consumption constraint.
The method
enables to estimate the performance and pipeline hazards of the designed
ASIP very
accurately. The experimental results show that the proposed method is effective
and quite
efficient.
23
Architecture synthesis of high-performance application-specific
processors
- Breternitz, M., Jr.; Shen, J.P.
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh,
PA, USA
This Paper Appears in :
Design Automation Conference, 1990. Proceedings., 27th ACM/IEEE
on Pages: 542 - 548
This Conference was Held : 24-28 June 1990
1990
ISBN: 0-89791-363-9
Total Pages: xxi+743
References Cited: 13
Accession Number: 3976155
Abstract:
An automated approach, called architecture synthesis, for designing application-specific
processors is presented. The key principles of the application-specific
processor design
(ASPD) methodology include: a semicustom compilation-driven design/implementation
approach, the exploitation of fine-grained parallelism for high performance,
and the
adaptation of datapath topology to the data transfers required by the application.
The
powerful microcode compilation techniques of percolation scheduling and
pipeline scheduling
extract and enhance the parallelism in the application object code to generate
an optimized
specification of the target processor. Implementation optimization is performed
to allocate
functional units and register files. Graph-coloring algorithms minimize
the amount of
hardware needed to exploit available parallelism. Data memory employs an
organization with
multiple banks. Compilation techniques are used to allocate data over the
memory banks to
enhance parallel access.
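
The abstract mentions graph-coloring algorithms for minimizing hardware. As a rough illustration of that general technique only, the sketch below colors a conflict graph greedily, so that values that are live at the same time never share a register. The abstract does not say which coloring algorithm the authors use; the function name, the conflict-graph representation, and the highest-degree-first ordering are assumptions made here, and a greedy pass is not guaranteed to find the minimum number of colors.

```python
from typing import Dict, Set


def greedy_coloring(conflicts: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign a register index (color) to each value.

    conflicts maps each value to the set of values it cannot share a
    register with. The graph is assumed to be symmetric.
    """
    color: Dict[str, int] = {}
    # Visit the most constrained values first (highest degree).
    for node in sorted(conflicts, key=lambda n: len(conflicts[n]), reverse=True):
        used = {color[m] for m in conflicts[node] if m in color}
        c = 0
        while c in used:  # smallest index not taken by a conflicting value
            c += 1
        color[node] = c
    return color
```

The number of distinct colors returned is the number of registers this simple scheme would allocate for the given conflicts.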
24
Architectural considerations for application-specific counterflow
pipelines
- Childers, B.R.; Davidson, J.W.
Editor(s): Wills, D.S., DeWeerth, S.P.
Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA
This Paper Appears in :
Advanced Research in VLSI, 1999. Proceedings. 20th Anniversary Conference on
on Pages: 3 - 22
This Conference was Held : 21-24 March 1999
1999
ISBN: 0-7695-0056-0
Total Pages: x+380
References Cited: 29
Accession Number: 6376051
Abstract:
Application-specific processor design is a promising approach for meeting
the performance
and cost goals of a system. Application-specific processors are especially
promising for
embedded systems (e.g., digital cameras, cellular phones, etc.) where a
small increase in
performance and decrease in cost can have a large impact on a product's
viability. Sproull,
Sutherland and Molnar (see IEEE Design and Test of Computers, vol. 11,
no. 3, p. 48-59,
1994) have proposed a new pipeline organization called the Counterflow
Pipeline (CFP). This
paper evaluates CFP design alternatives and shows that the CFP is an ideal
architecture for
fast, low-cost design of high-performance processors customized for
computation-intensive embedded applications. First, we describe why CFP's
are particularly
well-suited to realizing application-specific processors. Second, we describe
how a CFP
tailored to an application can be constructed automatically. Third, we
present measurements
that evaluate CFP design trade-offs and show that CFP's provide speculative
and
out-of-order execution, and register renaming that is matched to an application.
Fourth, we
show that asynchronous counterflow pipelines achieve high-performance by
reducing the
average execution latency of instructions over synchronous implementations.
Finally, we
demonstrate that custom CFP's achieve cycles per instruction measurements
that are
competitive with 4-way superscalar out-of-order processors at a potentially
low design
complexity.
25
Instruction-set modelling for ASIP code generation
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
VLSI Design, 1996. Proceedings., Ninth International Conference on
on Pages: 77 - 80
This Conference was Held : 3-6 Jan. 1996
1995
ISBN: 0-8186-7228-5
IEEE Catalog Number: 96TB100010
Total Pages: xxxiv+439
References Cited: 13
Accession Number: 5374969
Abstract:
A main objective in code generation for ASIPs is to develop retargetable
compilers in order to
permit exploration of different architectural alternatives within short
turnaround time.
Retargetability requires that the compiler is supplied with a formal description
of the target
processor. This description is usually transformed into an internal instruction
set model, on
which the actual code generation operates. In this contribution we analyze
the demands on
instruction set models for retargetable code generation, and we present
a formal instruction
set model which meets these demands. Compared to previous work, it covers
a broad range
of instruction formats and includes a detailed view of inter-instruction
restrictions.
26
Instruction-set matching and selection for DSP and ASIP code generation
- Liem, C.; May, T.; Paulin, P.
Bell-Northern Res., Ottawa, Ont., Canada
This Paper Appears in :
European Design and Test Conference, 1994. EDAC, The European Conference
on Design
Automation. ETC European Test Conference. EUROASIC, The European Event
in ASIC Design,
Proceedings.
on Pages: 31 - 37
This Conference was Held : 28 Feb.-3 March 1994
1994
ISBN: 0-8186-5410-4
IEEE Catalog Number: 94TH0634-6
Total Pages: xxvii+676
References Cited: 15
Accession Number: 4682244
Abstract:
The increasing use of digital signal processors (DSPs) and application
specific
instruction-set processors (ASIPs) has put a strain on the perceived mature
state of
compiler technology. The presence of custom hardware for application-specific
needs has
introduced instruction types which are unfamiliar to the capabilities of
traditional compilers.
Thus, these traditional techniques can lead to inefficient and sparsely
compacted machine
microcode. In this paper, we introduce a novel instruction-set matching
and selection
methodology, based upon a rich representation useful for DSP and mixed
control-oriented
applications. This representation shows explicit behaviour that references
architecture
resource classes. This allows a wide range of instruction types to be
captured in a pattern
set. The pattern set has been organized in a manner such that matching
is extremely efficient
and retargeting to architectures with new instruction sets is well defined.
The matching and
selection algorithms have been implemented in a retargetable code generation
system called
CodeSyn.
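
As a rough illustration of what matching a pattern set against application code involves, the sketch below covers an expression tree with instruction patterns, trying the largest pattern first. It is a generic greedy tree cover, not the CodeSyn representation or algorithm. The pattern names (MAC, ADD, MUL), the tuple encoding of trees and the "_" wildcard are assumptions made only for this example.

```python
# Expression trees are nested tuples (op, left, right); leaves are strings.
# "_" in a pattern is a wildcard that matches any subtree (an operand).
PATTERNS = [
    ("MAC", ("add", "_", ("mul", "_", "_"))),  # multiply-accumulate
    ("ADD", ("add", "_", "_")),
    ("MUL", ("mul", "_", "_")),
]


def match(pattern, tree, operands):
    """Return True if pattern matches tree; collect wildcard subtrees in operands."""
    if pattern == "_":
        operands.append(tree)
        return True
    if isinstance(tree, str) or tree[0] != pattern[0]:
        return False
    return match(pattern[1], tree[1], operands) and match(pattern[2], tree[2], operands)


def select(tree):
    """Greedy largest-pattern-first cover; returns instruction names in emit order."""
    if isinstance(tree, str):
        return []  # a leaf is assumed to be available already
    for name, pattern in PATTERNS:
        operands = []
        if match(pattern, tree, operands):
            code = []
            for sub in operands:
                code.extend(select(sub))  # operands are computed first
            code.append(name)
            return code
    raise ValueError(f"no pattern covers {tree[0]!r}")


# acc + x * (y + z): the inner sum needs ADD, the rest fits one MAC.
expr = ("add", "acc", ("mul", "x", ("add", "y", "z")))
selected = select(expr)
```

Retargeting in this toy setting amounts to replacing the PATTERNS list; a real system would also model resource classes and instruction compaction, which this sketch ignores.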
27
IP-based design of custom field programmable network processors
- Bombana, M.; Fominykh, N.; Gorla, G.; Kriajev, A.; Krivosheyin, B.; Rytchagov,
J.
Central Res., Italtel Soc. Italiana Telecommun. SpA, Milan, Italy
This Paper Appears in :
Electronics, Circuits and Systems, 1998 IEEE International Conference on
on Pages: 467 - 471 vol.1
This Conference was Held : 7-10 Sept. 1998
1998
Vol. 1
ISBN: 0-7803-5008-1
IEEE Catalog Number: 98EX196
Total Pages: 3 vol. (xxviii+557+557+569)
References Cited: 9
Accession Number: 6476137
Abstract:
A methodology was tested, based on reuse, to design ASIPs (application
specific
programmable processors) at ASIC cost. Criteria are defined to identify
reusable semantics
(noninstantiated intellectual properties) within functional specifications
written in C. These
are isolated as hierarchically nested, object oriented C++ behaviors. A
"what-if" exploration
flow brings to the optimized hw and sw sorting of every such IP inside
an algorithm running
on a programmable architecture. The specific architecture is modeled and
taken into account
by the sw and hw synthesis tools, not in the IP model. We evaluated the
procedure
developing a VLIW custom programmable processor, re-configurable on both
hw and sw.
This emulator is a prototype for fixed or programmable DSPs, and an archetype
of a real-time
field retargetable "class" processor, with optimum speed and power performance
tuned to
every new algorithm/data couple within a certain class of applications.
An experiment on
processing the real time code for multi-mode communication terminals is
reported.
28
Function unit specialization through code analysis
- Benyamin, D.; Mangione-Smith, W.H.
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
This Paper Appears in :
Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM
International Conference
on
on Pages: 257 - 260
This Conference was Held : 7-11 Nov. 1999
1999
ISBN: 0-7803-5832-5
IEEE Catalog Number: 99CH37051
Total Pages: xxiv+611
References Cited: 9
Accession Number: 6441935
Abstract:
Many previous attempts at ASIP (application-specific instruction set processor)
synthesis
have employed template matching techniques to target function units to
application code, or
directly design new units to extract maximum performance. This paper presents
an entirely
new approach to specializing hardware for application-specific needs. In
our framework of a
parameterized VLIW processor, we use a post-modulo scheduling analysis
to reduce the
allocated hardware resources while increasing the code's performance. Initial
results
indicate significant savings in area, as well as optimizations to increase
FIR filter code
performance by 200% to 300%.
29
Custom Computing Machines vs. Hardware/Software Codesign : from a globalized
point of
view.
- Hartenstein, R.W.; Becker, J.; Kress R.
University of Kaiserslautern
Abstract
The paper gives a generalized survey on Customized Computing with research
activities of the emerging new research scenes of Application Specific
Instruction
Set Processors (ASIPs) and Custom Computing Machines (CCMs). Both scenes
have strong relations to Hardware/Software Co-Design. CCMs are mainly based
on field-programmable add-on hardware to accelerate microprocessors or
computers.
The CCM scene tries to make standard hardware more soft for flexible adaptation
to
a variety of particular application environments. The ASIP scene tries
to design an
instruction set as an interface between hardware and application closely
matching
their characteristics.
30
Algorithm and architecture-level design space exploration using
hierarchical data flows
- Peixoto, H.P.; Jacome, M.F.
Editor(s): Thiele, L., Fortes, J., Vissers, K., Taylor, V., Noll, T., Teich,
J.
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
This Paper Appears in :
Application-Specific Systems, Architectures and Processors, 1997. Proceedings.,
IEEE International
Conference on
on Pages: 272 - 282
This Conference was Held : 14-16 July 1997
1997
ISBN: 0-8186-7959-X
IEEE Catalog Number: 97TB100177
Total Pages: xii+540
References Cited: 21
Accession Number: 5685264
Abstract:
Incorporating algorithm and architecture level design space exploration
in the early phases of
the design process can have a dramatic impact on the area, speed, and power
consumption of
the resulting systems. This paper proposes a framework for supporting system-level
design
space exploration and discusses the three fundamental issues involved in
effectively
supporting such an early design space exploration: definition of an adequate
level of
abstraction; definition of good fidelity system-level metrics; and definition
of mechanisms for
automating the exploration process. The first issue, the definition of
an adequate level of
abstraction is then addressed in detail. Specifically, an algorithm-level
model, an
architecture-level model, and a set of operations on these models, are
proposed, aiming at
efficiently supporting an early, aggressive system-level design space exploration.
A
discussion on work in progress in the other two topics, metrics and automation,
concludes
the paper.
31
Designing with intellectual property
- Gorla, G.
Editor(s): Smailagic, A., Brodersen, R., De Man, H.
Italtel SpA, Milan, Italy
This Paper Appears in :
VLSI '99. Proceedings. IEEE Computer Society Workshop On
on Pages: 125 - 132
This Conference was Held : 8-9 April 1999
1999
ISBN: 0-7695-0152-4
Total Pages: x+133
References Cited: 9
Accession Number: 6421923
Abstract:
A methodology was developed based on IP reuse, aimed at the design of integrated
micro-systems. It was tested on a specific custom ASIP (application specific
instruction
processor) with good performance. IP occurrences are searched and identified
inside the
system specification code (C has been used for test), before any architectural
or partitioning
choice is done. Isolation criteria are their reusability, encapsulation
and completeness, while
their C++ models are deliberately kept as mutually nestable objects arranged
in a number of
hierarchical levels. Each such WARELET can be instantiated to full HW instance
(like a
black box), or full software procedure, or a mix. Every alternative choice
gives an IP instance
(IPI) whose reuse value is keyed in the IP model and in the parametric
synthesis procedures
attached to it, not in a single specific implementation. The collection of
WARELET instances
builds up the specific system instance. The design process is a "what-if":
inside the code
describing a (sub)system some selected warelets are attributed to a HW
implementation.
HW synthesis generates blocks that communicate within a pre-defined parametric
architectural harness either as coprocessors or as execution units of the
instruction set. A
parallel stepwise co-synthesis is operated for SW code, re-targeting the
microprogram
control code and the SW algorithm to every new HW configuration. A profiling
process gives
performance figures to validate or change the choice. These system-level
IPs offer
innovative opportunities concerning the management of intellectual value
within products and
the commercial and industrial infrastructure.
32
Conception and design of a RISC CPU for the use as embedded controller
within a parallel multimedia architecture
- Dogimont, S.; Gumm, M.; Mombers, F.; Mlynek, D.; Torielli, A.
Editor(s): Thiele, L., Fortes, J., Vissers, K., Taylor, V., Noll, T., Teich,
J.
Ecole Polytech. Federale de Lausanne, Switzerland
This Paper Appears in :
Application-Specific Systems, Architectures and Processors, 1997. Proceedings.,
IEEE International
Conference on
on Pages: 412 - 421
This Conference was Held : 14-16 July 1997
1997
ISBN: 0-8186-7959-X
IEEE Catalog Number: 97TB100177
Total Pages: xii+540
References Cited: 13
Accession Number: 5685277
Abstract:
In this paper, the problem of defining a high performance control structure
for a parallel
motion estimation architecture for MPEG2 coding is addressed. Various design
and
architecture choices are discussed and the final architecture is described.
It represents a
combined MIMD-SIMD approach which is based on a small but efficient ASIP
with subword
parallelism.
33
Software acceleration using coprocessors: is it worth the effort?
- Edwards, M.
Comput. Dept., Univ. of Manchester Inst. of Sci. & Technol., UK
This Paper Appears in :
Hardware/Software Codesign, 1997. (CODES/CASHE '97)., Proceedings of the
Fifth International
Workshop on
on Pages: 135 - 139
This Conference was Held : 24-26 March 1997
1997
ISBN: 0-8186-7895-X
IEEE Catalog Number: 97TB100115
Total Pages: ix+179
References Cited: 13
Accession Number: 5559220
Abstract:
A commonly accepted technique in hardware/software co-design is to implement
as many
system functions as possible in software and to move performance-critical
functions into
special-purpose external hardware in order to either satisfy timing constraints
or reduce the
overall execution time of a program; this is known as "software acceleration".
This paper
investigates the limits to the performance enhancements obtainable using
software
acceleration techniques. A practical target architecture, based on the
use of programmable
logic, is used to illustrate the problems associated with software acceleration.
It is shown
that, normally, little benefit can be obtained by applying software acceleration
methods to
general-purpose applications. Whereas software acceleration can profitably
be used in a
limited number of special-purpose applications, a designer would probably
be better off
developing ASIP (application-specific instruction-set processor) components,
based on
heterogeneous multiprocessor architectures.
34
A constructive method for exploiting code motion
- dos Santos, L.C.V.; Heijligers, M.J.M.; van Eijk, C.A.J.; van Eijndhoven,
J.T.J.; Jess, J.A.G.
Eindhoven Univ. of Technol., Netherlands
This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on
on Pages: 51 - 56
This Conference was Held : 6-8 Nov. 1996
1996
ISBN: 0-8186-7563-2
IEEE Catalog Number: 96TB100061
Total Pages: xii+145
References Cited: 19
Accession Number: 5450812
Abstract:
In this paper we address a resource-constrained optimization problem for
behavioral
descriptions containing conditionals. In high-level synthesis of ASICs
or in code generation
for ASIPs, most methods use greedy choices in such a way that the search
space is limited
by the applied heuristics. For example, they might miss opportunities to
optimize across
basic block boundaries when treating conditional execution. We propose
an approach based
on local search and present a constructive method to allow unrestricted
types of code
motion, while keeping optimal solutions in the search space. A code-motion
pruning
technique is presented for cost functions optimizing schedule lengths.
A technique for
treating concurrent flows of execution is also described.
35
Instruction-set matching and GA-based selection for
embedded-processor code generation
- Shu, J.; Wilson, T.C.; Banerji, D.K.
Dept. of Comput. & Inf. Sci., Guelph Univ., Ont., Canada
This Paper Appears in :
VLSI Design, 1996. Proceedings., Ninth International Conference on
on Pages: 73 - 76
This Conference was Held : 3-6 Jan. 1996
1995
ISBN: 0-8186-7228-5
IEEE Catalog Number: 96TB100010
Total Pages: xxxiv+439
References Cited: 9
Accession Number: 5374968
Abstract:
The core tasks of retargetable code generation are instruction-set matching
and selection for
a given application program and a DSP/ASIP processor. In this paper, we
utilize a model of
target architecture specification that employs both behavioral and structural
information, to
facilitate this process. The matching method is based on a pattern tree
structure of
instructions. This tree structure, generated automatically, is implemented
by using a pattern
queue and a flag table. The matching process is efficient since it bypasses
many patterns in
the tree which do not match at certain nodes in the DFG of given application
program. Two
genetic algorithms are implemented for pattern selection: a pure GA which
uses standard GA
operators, and a GA with backtracking which employs variable-length chromosomes.
Optimal or near-optimal pattern selection is obtained in a reasonable period
of time for a
wide range of application programs.
36
A hardware/software codesign method for pipelined instruction set
processor using adaptive database
- Nguyen Ngoc Binh; Imai, M.; Shiomi, A.; Hikichi, N.
Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Design Automation Conference, 1995. Proceedings of the ASP-DAC '95/CHDL
'95/VLSI '95., IFIP
International Conference on Hardware Description Languages. IFIP International
Conference on
Very Large Scale Integration., Asian and South Pacific
on Pages: 81 - 86
This Conference was Held : 29 Aug.-1 Sept. 1995
1995
ISBN: 4-930813-67-0
IEEE Catalog Number: 95TH8102
Total Pages: xxxii+860
References Cited: 11
Accession Number: 5217819
Abstract:
Proposes a new method to design an optimal pipelined instruction set processor
using a
formal HW/SW codesign methodology. First, a HW/SW partitioning algorithm
for selecting
an optimal pipelined architecture is introduced briefly. Then, an adaptive
database approach
is presented that enhances the optimality of the design through
very accurate
estimation of the performance of a pipelined ASIP in HW/SW partitioning.
The experimental
results show that the proposed methods are effective and efficient.
37
An integer programming approach to instruction implementation method
selection problem
- Imai, M.; Alomary, A.; Sato, J.; Hikichi, N.
Toyohashi Univ. of Technol., Japan
This Paper Appears in :
Design Automation Conference, 1992., EURO-VHDL '92, EURO-DAC '92. European
on Pages: 106 - 111
This Conference was Held : 7-10 Sept. 1992
1992
ISBN: 0-8186-2780-8
Total Pages: xviii+765
References Cited: 11
Accession Number: 4493502
Abstract:
A new algorithm for instruction implementation method selection problem
(IMSP) in
application specific integrated processors (ASIP) design automation is
proposed. This
problem is to be solved in the instruction set architecture and CPU core
architecture designs.
First, the IMSP is formalized as an integer programming problem, which
is to maximize the
performance of the CPU under the constraints of chip area and power consumption.
Then, a
branch-and-bound algorithm to solve IMSP is described. According to the
experimental
results, the proposed algorithm is quite effective and efficient in solving
the IMSP. This
algorithm will automate the complex parts of the ASIP chip design.
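
The formulation in the abstract above (choose one implementation method per instruction, maximize performance, respect area and power limits) can be sketched as a small branch-and-bound search. This is not the authors' algorithm. It assumes performance is measured as a total cycle count to be minimized, and every option name and cost figure below is hypothetical.

```python
def select_implementations(instructions, area_budget, power_budget):
    """instructions: one list of options per instruction, each option being
    (name, cycles, area, power). Returns (best_cycles, chosen_names),
    or (None, None) if no selection fits both budgets."""
    n = len(instructions)
    # Optimistic bound: fastest option for every instruction not yet decided.
    min_suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_suffix[i] = min_suffix[i + 1] + min(o[1] for o in instructions[i])

    best = {"cycles": None, "choice": None}

    def branch(i, cycles, area, power, chosen):
        if area > area_budget or power > power_budget:
            return  # infeasible: costs only grow further down the tree
        if best["cycles"] is not None and cycles + min_suffix[i] >= best["cycles"]:
            return  # bound: cannot beat the incumbent solution
        if i == n:
            best["cycles"], best["choice"] = cycles, list(chosen)
            return
        for name, c, a, p in instructions[i]:
            chosen.append(name)
            branch(i + 1, cycles + c, area + a, power + p, chosen)
            chosen.pop()

    branch(0, 0, 0, 0, [])
    return best["cycles"], best["choice"]


# Hypothetical costs: a hardware unit is fast but uses area and power;
# a software routine is slow but free.
instructions = [
    [("mul_hw", 1, 50, 8), ("mul_sw", 16, 0, 0)],
    [("div_hw", 4, 80, 10), ("div_sw", 40, 0, 0)],
    [("shift_hw", 1, 20, 3), ("shift_sw", 6, 0, 0)],
]
best_cycles, choice = select_implementations(instructions, area_budget=100, power_budget=15)
```

With these numbers the multiplier and divider cannot both be built in hardware, so the search keeps the divider and shifter in hardware and leaves multiplication in software.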
38
Performance evaluation for application-specific architectures
- Jie Gong; Gajski, D.D.; Nicolau, A.
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
on Pages: 483 - 490
Dec. 1995
Vol. 3
Issue: 4
ISSN: 1063-8210
References Cited: 12
CODEN: IEVSE9
Accession Number: 5151981
Abstract:
Performance evaluation is critical for the minimization of design cost.
It consists of two
parts: modeling the underlying hardware engine and evaluating the performance
of the
application code for the model developed in the first part. In this paper,
we propose a new
parameterized model for application-specific architectures and present
a retargetable
scheduler for performance evaluation. The model, different from those proposed
previously,
reflects comprehensive architectural characteristics that affect hardware
parallelism. The
scheduler, distinguished from previous ones, takes into account not only
functional and
storage unit resources but also interconnect resources during the performance
evaluation.
The new architecture model, together with the retargetable scheduler, enables
designers to
accurately evaluate the performance of a variety of ASIC and ASIP architectures.
39
TAO-BIST: a framework for testability analysis and optimization of RTL
circuits for BIST
- Ravi, S.; Jha, N.K.; Lakshminarayana, G.
Dept. of Electr. Eng., Princeton Univ., NJ, USA
This Paper Appears in :
VLSI Test Symposium, 1999. Proceedings. 17th IEEE
on Pages: 398 - 406
This Conference was Held : 25-29 April 1999
1999
ISBN: 0-7695-0146-X
IEEE Catalog Number: PR00146
Total Pages: xxxii+488
References Cited: 19
Accession Number: 6450989
Abstract:
In this paper, we present TAO-BIST, a framework for testing register-transfer
level (RTL)
controller-datapath circuits using built-in self-test (BIST). Conventional
BIST techniques
at the RTL generally introduce more testability hardware than is necessary,
thereby causing
unnecessary area, delay and power overheads. They have typically been applied
to only
application-specific integrated circuits (ASICs). TAO-BIST adopts a three-phased
approach to provide an efficient BIST framework at the RTL. In the first
phase, we identify
and add an initial set of test enhancements to the given circuit. In the
second phase, we use
regular-expression based high-level symbolic testability analysis of a
BIST model of the
circuit to completely encapsulate justification/propagation information
for the modules under
test. The regular expressions so obtained are then used to construct a
Boolean function in
the final phase for determining a test enhancement solution that meets
delay constraints with
minimal area overheads. Our method is applicable to a wide spectrum of
circuits including
ASICs, application-specific programmable processors (ASPPs), application-specific
instruction processors (ASIPs), digital signal processors (DSPs) and microprocessors.
Experimental results on a number of benchmark circuits show that high fault
coverage
(>99%) can be obtained with our scheme. The average area and delay overheads
due to
TAO-BIST are only 6.0%, and 1.5%, respectively. The test application time
to achieve the
high fault coverage for the whole controller-datapath circuit is also quite
low.
40
Synthesis of configurable architectures for DSP algorithms
- Ramanathan, S.; Visvanathan, V.; Nandy, S.K.
Supercomput. Educ. & Res. Centre, Indian Inst. of Sci., Bangalore,
India
This Paper Appears in :
VLSI Design, 1999. Proceedings. Twelfth International Conference On
on Pages: 350 - 357
This Conference was Held : 7-10 Jan. 1999
1999
ISBN: 0-7695-0013-7
IEEE Catalog Number: PR00013
Total Pages: xxxi+642
References Cited: 27
Accession Number: 6324838
Abstract:
ASICs offer the best realization of DSP algorithms in terms of performance,
but the cost is
prohibitive, especially when the volumes involved are low. However, if
the architecture
synthesis trajectory for such algorithms is such that the target architecture
can be identified
as an interconnection of elementary parameterized computational structures,
then it is
possible to attain a close match, both in terms of performance and power
with respect to an
ASIC, for any algorithmic parameters of the given algorithm. Such an architecture
is weakly
programmable (configurable) and can be viewed as an application specific
instruction-set
processor (ASIP). In this work, we present a methodology to synthesize
ASIPs for DSP
algorithms.
41
Memory size estimation for multimedia applications
- Grun, P.; Balasa, F.; Dutt, N.
California Univ., Irvine, CA, USA
This Paper Appears in :
Hardware/Software Codesign, 1998. (CODES/CASHE '98). Proceedings of the
Sixth International
Workshop on
on Pages: 145 - 149
This Conference was Held : 15-18 March 1998
1998
ISBN: 0-8186-8442-9
IEEE Catalog Number: 98TB100232
Total Pages: vii+151
References Cited: 15
Accession Number: 5894896
Abstract:
Memory modules dominate the cost, performance, and power of embedded systems
that
process multidimensional signals, typically present in image and video
processing. Therefore,
studying the impact of parallelism on memory size is crucial for trading
off system
performance against area cost to enable intelligent system partitioning
and exploration. We
propose a memory size estimation method for algorithmic specifications
containing
multidimensional arrays and parallel constructs, intended as part of a
high-level partitioning
and exploration methodology. The system designer can trade-off estimation
accuracy for
increased run time. We present the results of our estimation approach on
a number of image
and video processing kernels, and discuss some preliminary results on the
influence of
parallelism on storage requirement.
42
Instruction subsetting: Trading power for programmability
- Dougherty, W.E.; Pursley, D.J.; Thomas, D.E.
Editor(s): Smailagic, A., Brodersen, R., De Man, H.
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh,
PA, USA
This Paper Appears in :
VLSI '98. System Level Design. Proceedings. IEEE Computer Society Workshop
on
on Pages: 42 - 47
This Conference was Held : 16-17 April 1998
1998
ISBN: 0-8186-8448-8
IEEE Catalog Number: 98EX158
Total Pages: ix+142
References Cited: 16
Accession Number: 6046631
Abstract:
Power consumption is an increasingly important consideration in the design
of mixed
hardware/software systems. This work defines the notion of instruction
subsetting and
explores its use as a means of reducing power consumption from the system
level of design.
Instruction subsetting is defined as creating an application specific instruction
set processor
from a more general processor such as a DSP. Although not as effective
as an ASIC
solution, instruction subsetting provides much of the power savings while
maintaining some
level of programmability. Instruction set choice strongly affects the savings.
We synthesized
5 ASIPs through place and route and found that a poorly chosen instruction
set may consume
more than 4 times the energy of an ASIP with a proper instruction set choice.
This finding
will allow designers to consider another set of trade-offs in their hardware/software
design
space exploration.
43
Embedded software in real-time signal processing systems: application
and architecture trends
- Paulin, P.G.; Liem, C.; Cornero, M.; Nacabal, F.; Goossens, G.
SGS-Thomson Microelectron., Crolles, France
This Paper Appears in :
Proceedings of the IEEE
on Pages: 419 - 435
March 1997
Vol. 85
Issue: 3
ISSN: 0018-9219
References Cited: 60
CODEN: IEEPAD
Accession Number: 5550585
Abstract:
We present an extensive survey of trends in embedded processor use with
an emphasis on
emerging applications in wireless communication, multimedia, and general
telecommunications. We demonstrate the importance of application-specific
instruction-set
processors (ASIPs) in high-volume, low cost applications. We also examine
some of the
underlying trends of the applications in which embedded processors are
used. This is
followed by a description of embedded software development tool requirements.
High-performance software compilation emerges as a key requirement. Finally,
specific
industrial case studies of products in MPEG, videophone, and low-cost digital
signal
processor (DSP) applications are used to illustrate the architecture design
tradeoffs, and
highlight specific tool requirements. A companion paper (Goossens et al.,
1997) presents a
comprehensive survey of embedded software development tools, focusing mostly
on
retargetable software compilation.
44
Embedded software in real-time signal processing systems: design
technologies
- Goossens, G.; Van Praet, J.; Lanneer, D.; Geurts, W.; Kifli, A.; Liem,
C.; Paulin, P.G.
Target Compiler Technol., Leuven, Belgium
This Paper Appears in :
Proceedings of the IEEE
on Pages: 436 - 454
March 1997
Vol. 85
Issue: 3
ISSN: 0018-9219
References Cited: 97
CODEN: IEEPAD
Accession Number: 5550586
Abstract:
The increasing use of embedded software, often implemented on a core processor
in a
single-chip system, is a clear trend in the telecommunications, multimedia,
and consumer
electronics industries. A companion paper (Paulin et al., 1997) presents
a survey of
application and architecture trends for embedded systems in these growth
markets.
However, the lack of suitable design technology remains a significant obstacle
in the
development of such systems. One of the key requirements is more efficient
software
compilation technology. Especially in the case of fixed-point digital signal
processor (DSP)
cores, it is often cited that commercially available compilers are unable
to take full advantage
of the architectural features of the processor. Moreover, due to the shorter
lifetimes and the
architectural specialization of many processor cores, processor designers
are often
compelled to neglect the issue of compiler support. This situation has
resulted in an
increased research activity in the area of design tool support for embedded
processors. This
paper discusses design technology issues for embedded systems using processor
cores,
with a focus on software compilation tools. Architectural characteristics
of contemporary
processor cores are reviewed and tool requirements are formulated. This
is followed by a
comprehensive survey of both existing and new software compilation techniques
that are
considered important in the context of embedded processors.
45
Designing a Java microcontroller to specific applications
- Ito, S.A.; Carro, L.; Jacobi, R.P.
Inst. of Comput. Sci., Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
This Paper Appears in :
Integrated Circuits and Systems Design, 1999. Proceedings. XII Symposium
on
on Pages: 12 - 15
This Conference was Held : 29 Sept.-2 Oct. 1999
1999
ISBN: 0-7695-0387-X
IEEE Catalog Number: PR00387
Total Pages: xiii+236
References Cited: 14
Accession Number: 6520680
Abstract:
Stack machines are known to provide code compactness and simple execution
engines, important features when implementing small devices. This paper
discusses some
benefits, problems and open questions by using a stack based microcontroller
to support
native execution of Java bytecode. The discussion is based on our experience
in designing a
Java ASIP in FPGA, in order to explore software compatibility, reconfiguration
capability and
the small size of optimized microcontrollers to implement specific applications.
The paper
also presents the synthesized machine architecture and shows some area
and speed results.
46
A design and tool reuse methodology for rapid prototyping of application
specific instruction set processors
- Young Geol Kim; Tag Gon Kim
Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul,
South Korea
This Paper Appears in :
Rapid System Prototyping, 1999. IEEE International Workshop on
on Pages: 46 - 51
This Conference was Held : 16-18 June 1999
1999
ISBN: 0-7695-0246-6
IEEE Catalog Number: PR00246
Total Pages: x+243
References Cited: 5
Accession Number: 6325340
Abstract:
This paper proposes a design method and a tool reuse scheme for the rapid
prototyping of
application-specific instruction-set processors (ASIPs). We propose a three-level
hierarchical architecture abstraction method for top-down processor design.
We also
propose a reusable architecture description language (READ) and a family
of retargetable
simulators that allow top-down processor description and prototyping from
instruction-set
design to RTL implementation.
47
MetaCore: an application specific DSP development system
- Jin-Hyuk Yang; Byoung-Woon Kim; Sang-Jun Nam; Jang-Ho Cho; Sung-Won Seo;
Chang-Ho Ryu;
Young-Su Kwon; Dae-Hyun Lee; Jong-Yeol Lee; Jong-Sun Kim; Hyun-Dhong Yoon;
Jae-Yeol Kim;
Kun-Moo Lee; Chan-Soo Hwang; In-Hyung Kim; Jun-Sung Kim; Kwang-Il Park;
Kyu-Ko Park;
Yong-Hoon Lee; Seung-Ho Hwang; In-Cheol Park; Chong-Min Kyung
Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul,
South Korea
This Paper Appears in :
Design Automation Conference, 1998. Proceedings
on Pages: 800 - 803
This Conference was Held : 15-19 June 1998
1998
ISBN: 0-89791-964-5
IEEE Catalog Number: 98CH36175
Total Pages: xxxii+820
References Cited: 6
Accession Number: 6084493
Abstract:
This paper describes the MetaCore system which is an ASIP (Application-Specific
Instruction set Processor) development system targeted for DSP applications.
The goal of
MetaCore system is to offer an efficient design methodology meeting specifications
given as
a combination of performance, cost and design turnaround time. MetaCore
system consists
of two major design stages: design exploration and design generation. In
the design
exploration stage, MetaCore system accepts a set of benchmark programs
and a formal
specification of ISA (Instruction Set Architecture), and estimates the
hardware cost and
performance for each hardware configuration being explored. Once a hardware
configuration
is chosen, the system helps generate a VLSI processor design in the form
of HDL along with
the application program development tools such as C compiler, assembler
and instruction set
simulator.
48
Register Assignment Through Resource Classification For ASIP
Microcode Generation
- Liem, C.; May, T.; Paulin, P.
This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on
on Pages: 397 - 402
This Conference was Held : November 6-10, 1994
ISSN: 1063-6757
Abstract:
Not Available
49
Processor-core based design and test
- Marwedel, P.
Dortmund Univ., Germany
This Paper Appears in :
Design Automation Conference, 1997. Proceedings of the ASP-DAC '97 Asia
and South Pacific
on Pages: 499 - 502
This Conference was Held : 28-31 Jan. 1997
1997
ISBN: 0-7803-3662-3
IEEE Catalog Number: 97TH8231
Total Pages: xxxii+691
References Cited: 52
Accession Number: 5559031
Abstract:
This paper responds to the rapidly increasing use of various cores for
implementing
systems-on-a-chip. It specifically focusses on processor cores. We give
some examples of
cores, including DSP cores and application-specific instruction-set processors
(ASIPs).
We mention market trends for these components, and we touch on design procedures,
in
particular the use of compilers. Finally, we discuss the problem of testing
core-based
designs. Existing solutions include boundary scan, embedded in-circuit
emulation (ICE), the
use of processor resources for stimuli/response compaction and self-test
programs.
50
Hierarchical Test Generation And Design For Testability Of ASPPs and
ASIPs
- Ghosh, I.; Raghunathan, A.; Jha, N.K.
Princeton University, Princeton, NJ 08544
This Paper Appears in :
Design Automation Conference, 1997. Proceedings of the 34th
on Pages: 534 - 539
This Conference was Held : June 9-13, 1997
ISSN: 0738-100X
Abstract:
Not Available
51
Retargetable generation of code selectors from HDL processor models
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
European Design and Test Conference, 1997. ED&TC 97. Proceedings
on Pages: 140 - 144
This Conference was Held : 17-20 March 1997
1997
ISBN: 0-8186-7786-4
IEEE Catalog Number: 97TB100102
Total Pages: xxxvi+634
References Cited: 22
Accession Number: 5622676
Abstract:
Besides high code quality, a primary issue in embedded code generation
is retargetability of
code generators. This paper presents techniques for automatic generation
of code selectors
from externally specified processor models. In contrast to previous work,
our retargetable
compiler RECORD does not require tool-specific modelling formalisms, but
starts from
general HDL processor models. From an HDL model, all processor aspects
needed for code
generation are automatically derived. As demonstrated by experimental results,
short
turnaround times for retargeting are achieved, which permits study of the
HW/SW trade-off
between processor architectures and program execution speed.
52
Methods for retargetable DSP code generation
- Leupers, R.; Niemann, R.; Marwedel, P.
Editor(s): Rabaey, J., Chau, P.M., Eldon, J.
Dept. of Comput. Sci. XII, Dortmund Univ., Germany
This Paper Appears in :
VLSI Signal Processing, VII, 1994., [Workshop on]
on Pages: 127 - 136
This Conference was Held : 26-28 Oct. 1994
1994
ISBN: 0-7803-2123-5
IEEE Catalog Number: 94TH8008
Total Pages: xii+511
References Cited: 9
Accession Number: 5105714
Abstract:
Efficient embedded DSP system design requires methods of hardware/software
codesign. In
this contribution we focus on software synthesis for partitioned system
behavioral
descriptions. In previous approaches, this task is performed by compiling
the behavioral
descriptions onto standard processors using target-specific compilers.
It is argued that
abandoning this restriction allows for higher degrees of freedom in design
space exploration.
In turn, this calls for retargetable code generation tools. We present
different schemes
for DSP code generation using the MSSQ microcode generator. Experiments
with industrial
applications revealed that retargetable DSP code generation based on structural
hardware
descriptions is feasible, but there exists a strong dependency between
the behavioral
description style and the resulting code quality. As a result, necessary
features of
high-quality retargetable DSP code generators are identified.
53
Industrial experience using rule-driven retargetable code generation for
multimedia applications
- Liem, C.; Paulin, P.; Cornero, M.; Jerraya, A.
Inst. Nat. Polytech. de Grenoble, France
This Paper Appears in :
System Synthesis, 1995., Proceedings of the Eighth International Symposium
on
on Pages: 60 - 65
This Conference was Held : 13-15 Sept. 1995
1995
ISBN: 0-8186-7076-2
IEEE Catalog Number: 95TH8050
Total Pages: xiii+175
References Cited: 13
Accession Number: 5087877
Abstract:
The increasing usage of application-specific instruction set processors
(ASIPs) in audio and
video telecommunications has made strong demands on the rapid availability
of dedicated
compilers. A rule-driven approach to code generation may have benefits
over model-based
approaches as the user is not confined to the capabilities supported by
the model. However,
the sole use of transformation rules may or may not be sufficient in optimization
abilities
depending on the target architecture. This paper outlines experiences with
a rule-driven code
generation approach for two applications in audio and video processing.
The first is a
controller for the VideoPhone codec at SGS-Thomson Microelectronics. The
second is a
VLIW (very large instruction word) processor for high-fidelity and MPEG
audio at Thomson
Consumer Electronic Components. The experience has shown that a rule-driven
approach to
compilation is applicable to both the controller and VLIW architectures;
however, it is limited
in optimization abilities for the latter.
54
Prototyping and reengineering of microcontroller-based systems
- Carro, L.; Pereira; Suzim, A.
Dept. de Engenharia Electrica & Pos Graduacao em Ciencia de Computacao,
Univ. Federal do Rio Grande do
Sul, Porto Alegre, Brazil
This Paper Appears in :
Rapid System Prototyping, 1996. Proceedings., Seventh IEEE International
Workshop on
on Pages: 178 - 182
This Conference was Held : 19-21 June 1996
1996
ISBN: 0-8186-7603-5
IEEE Catalog Number: 96TB100055
Total Pages: ix+189
References Cited: 9
Accession Number: 5317120
Abstract:
This paper describes our current research in the field of systems design,
trying to reach an
Application Specific Integrated System (ASIS). Our target system is based
on industry
applications. We show the design approach to change presently developed
boards using
classical microcontrollers, migrating the CISC architecture to an ASIP
architecture. The
studied examples show meaningful gains regarding the total area of the
processor.
55
Embedded architecture co-synthesis and system integration
- Lin, B.; Vercauteren, S.; De Man, H.
Editor(s): Thomas, D., Ernst, R.
IMEC, Leuven, Belgium
This Paper Appears in :
Hardware/Software Co-Design, 1996. (Codes/CASHE '96), Proceedings., Fourth
International
Workshop on
on Pages: 2 - 9
This Conference was Held : 18-20 March 1996
1996
ISBN: 0-8186-7243-9
IEEE Catalog Number: 96TB100020
Total Pages: ix+141
References Cited: 16
Accession Number: 5256458
Abstract:
Embedded system architectures comprising software programmable components
(e.g.
DSP, ASIP, and micro-controller cores) and customized hardware co-processors,
integrated
into a single cost-efficient VLSI chip, are emerging as a key solution
to today's
microelectronics design problems. This trend is being driven by new emerging
applications in
the areas of wireless communication, high-speed optical networking, and
multimedia
computing. A key problem confronted by embedded system designers today
is the rapid
prototyping of application-specific embedded system architectures where
different
combinations of programmable processors and hardware components must be
integrated
together, while ensuring that the hardware and software parts communicate
correctly. In this
paper, we present a solution to this embedded architecture co-synthesis
and system
integration problem based on an orchestrated combination of architectural
strategies,
parameterized libraries, and software CAD tools.
56
Memory bank and register allocation in software synthesis for ASIPs
- Sudarsanam, A.; Malik, S.
Dept. of Electr. Eng., Princeton Univ., NJ, USA
This Paper Appears in :
Computer-Aided Design, 1995. ICCAD-95. Digest of Technical Papers., 1995
IEEE/ACM
International Conference on
on Pages: 388 - 392
This Conference was Held : 5-9 Nov. 1995
1995
ISBN: 0-8186-7213-7
IEEE Catalog Number: 95CB35859
Total Pages: xxviii+743
References Cited: 10
Accession Number: 5145258
Abstract:
An architectural feature commonly found in digital signal processors (DSPs)
is multiple
data-memory banks. This feature increases memory bandwidth by permitting
multiple
memory accesses to occur in parallel when the referenced variables belong
to different
memory banks and the registers involved are allocated according to a strict
set of conditions.
Unfortunately, current compiler technology is unable to take advantage
of the potential
increase in parallelism offered by such architectures. Consequently, most
application
software for DSP systems is hand-written, a very time-consuming task. We
present an
algorithm which attempts to maximize the benefit of this architectural
feature. While
previous approaches have decoupled the phases of register allocation and
memory bank
assignment, our algorithm performs these two phases simultaneously. Experimental
results
demonstrate that our algorithm substantially improves the code quality
of many
compiler-generated and even hand-written programs.
57
Rapid Prototyping of Application-Specific Counterflow pipelines
- Childers B.; Davidson J.
University of Virginia
Technical Report CS-99-01
Abstract
Application-specific processor (ASIP) design is a promising approach for
meeting
the performance and cost goals of an embedded system. We have developed
a new
microarchitecture for automatically constructing ASIP's. This new architecture,
called a wide counterflow pipeline (WCFP), is based on the counterflow
pipeline
organization proposed by Sproull, Sutherland, and Molnar. Our ASIP synthesis
technique uses software pipelining and design-space exploration to generate
a custom WCFP and instruction set for an embedded application. This type
of
architecture synthesis requires an infrastructure for rapidly prototyping
ASIP's
to evaluate design trade-offs. This paper presents the requirements and
implementation of such an environment for automatic design of WCFP's. First,
we describe a database for specifying design elements and architectural
constraints. Second, we present an intermediate representation for WCFP synthesis
and reconfigurable simulation. Finally, we describe a fast and reconfigurable
simulation methodology for WCFP's.
58
Processor evaluation in an embedded systems design environment
- Gupta, T.V.K.; Sharma, P.; Balakrishnan, M.; Malik, S.
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Delhi, India
This Paper Appears in :
VLSI Design, 2000. Thirteenth International Conference on
on Pages: 98 - 103
This Conference was Held : 3-7 Jan. 2000
2000
ISBN: 0-7695-0487-6
Total Pages: xxxiv+588
References Cited: 10
Accession Number: 6576983
Abstract:
In this paper we present a novel methodology for processor evaluation in
an embedded
systems design environment. This evaluation can help in either selecting
a suitable processor
core or in evaluating changes to an ASIP. The processor evaluation is carried
out in two
stages. First, an architecture independent stage in which processors are
rejected based on
key application parameters and secondly, an architecture dependent stage in
which
performance is estimated on selected processors. The contribution of our
work includes
identification of application parameters which can influence processor
selection, a
mechanism to capture widely varying processor architectures and an instruction
constrained
scheduler. Initial experimental results suggest the potential of this approach.
59
System Design Based on Single Language and Single-Chip Java ASIP
Microcontroller
- Akira, S.; Carro, I.L.; Jacobi, R.P.
UFRGS - Brazil
This Paper Appears in :
Design, Automation and Test in Europe Conference and Exhibition 2000. Proceedings
on Pages: 703 - 707
This Conference was Held : March 27-30, 2000
2000
ISBN: 0-7695-0537-6
Abstract:
Not Available
60
The construction of a retargetable simulator for an architecture template
- Kienhuis, B.; Deprettere, E.; Vissers, K.; van der Wolf, P.
Delft Univ. of Technol., Netherlands
This Paper Appears in :
Hardware/Software Codesign, 1998. (CODES/CASHE '98). Proceedings of the
Sixth International
Workshop on
on Pages: 125 - 129
This Conference was Held : 15-18 March 1998
1998
ISBN: 0-8186-8442-9
IEEE Catalog Number: 98TB100232
Total Pages: vii+151
References Cited: 13
Accession Number: 5894893
Abstract:
Systems in the domain of high-performance video signal processing are becoming
more and
more programmable. We suggest an approach to design such systems that involves
measuring, via simulation, the performance of various architectures on
which a set of
applications are mapped. This approach requires a retargetable simulator
for an architecture
template. We describe the retargetable simulator that we constructed for
a stream-oriented
application-specific dataflow architecture. For each architecture instance
of the architecture
template, a specific simulator is derived in three steps: the architecture
instance is
constructed, an execution model is added, and the executable architecture
is instrumented to
obtain performance numbers. We used object oriented principles together
with a high-level
simulation mechanism to ensure retargetability and an efficient simulation
speed. Finally we
explain how a retargetable simulator can be encapsulated within an environment
for
automated design space exploration.
61
A framework for retargetable code generation using simulated annealing
- Visser, B.-S.
Dept. of Comput. Sci., Twente Univ., Enschede, Netherlands
This Paper Appears in :
EUROMICRO Conference, 1999. Proceedings. 25th
on Pages: 458 - 462 vol.1
This Conference was Held : 8-10 Sept. 1999
1999
Vol. 1
ISBN: 0-7695-0321-7
Total Pages: 2 vol. (xxviii+530+478)
References Cited: 11
Accession Number: 6364161
Abstract:
Co-development of hardware and software is a methodology dealing with the
increased
design complexity of embedded systems. Retargetable code generation is
a co-designing
method to map a high-level software description onto a variety of hardware
architectures
without the need to rewrite a compiler. Highly efficient code generation
is required to meet,
for example, timing, area and low-power constraints. The traditional ordering
of code
generation phases introduces inefficiencies in the code generation process;
phase-coupling
deals with these inefficiencies. We introduce a new code generation technique
based on
simulated annealing. This technique focuses especially on highly irregular
DSP architectures
and is part of a generic framework for retargetable code generation. This
approach is new
because it fully tackles the phase-coupling problem. Furthermore, this
approach shows that
the modeling of the software algorithm and the hardware architecture plays
a key role in the
efficiency of code generation.
62
Instruction selection, resource allocation, and scheduling in the AVIV
retargetable code generator
- Hanono, S.; Devadas, S.
Dept. of Electr. Eng. & Comput. Sci., MIT, MA, USA
This Paper Appears in :
Design Automation Conference, 1998. Proceedings
on Pages: 510 - 515
This Conference was Held : 15-19 June 1998
1998
ISBN: 0-89791-964-5
IEEE Catalog Number: 98CH36175
Total Pages: xxxii+820
References Cited: 11
Accession Number: 6084458
Abstract:
The AVIV retargetable code generator produces optimized machine code for
target
processors with different instruction set architectures. AVIV optimizes
for minimum code
size. Retargetable code generation requires the development of heuristic
algorithms for
instruction selection, resource allocation, and scheduling. AVIV addresses
these code
generation subproblems concurrently, whereas most current code generation
systems
address them sequentially. It accomplishes this by converting the input
application to a
graphical (Split-Node DAG) representation that specifies all possible ways
of implementing
the application on the target processor. The information embedded in this
representation is
then used to set up a heuristic branch-and-bound step that performs functional
unit
assignment, operation grouping, register bank allocation, and scheduling
concurrently. While
detailed register allocation is carried out as a second step, estimates
of register
requirements are generated during the first step to ensure high quality
of the final assembly
code. We show that near-optimal code can be generated for basic blocks
for different
architectures within reasonable amounts of CPU time. Our framework thus
allows us to
accurately evaluate the performance of different architectures on application
code.
63
A BDD-based frontend for retargetable compilers
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.
on Pages: 239 - 243
This Conference was Held : 6-9 March 1995
1995
ISBN: 0-8186-7039-8
IEEE Catalog Number: 95TH8058
Total Pages: xxvii+611
References Cited: 12
Accession Number: 5057047
Abstract:
We present a unified frontend for retargetable compilers that performs
analysis of the target
processor model. Our approach bridges the gap between structural and behavioral
processor
models for retargetable compilation. This is achieved by means of instruction
set extraction.
The extraction technique is based on a BDD data structure which significantly
improves
control signal analysis in the target processor compared to previous approaches.
64
Power efficient mediaprocessors: design space exploration
- Kin, J.; Chunho Lee; Mangione-Smith, W.H.; Potkonjak, M.
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
This Paper Appears in :
Design Automation Conference, 1999. Proceedings. 36th
on Pages: 321 - 326
This Conference was Held : 21-25 June 1999
1999
ISBN: 1-58113-092-9
IEEE Catalog Number: 99CH36361
Total Pages: xxxii+1003
References Cited: 31
Accession Number: 6495998
Abstract:
We present a framework for rapidly exploring the design space of low power
application-specific programmable processors (ASPP), in particular mediaprocessors.
We
focus on a category of processors that are programmable yet optimized to
reduce power
consumption for a specific set of applications. The key components of the
framework
presented in this paper are a retargetable instruction level parallelism
(ILP) compiler,
processor simulators, a set of complete media applications written in a
high level language
and an architectural component selection algorithm. The fundamental idea
behind the
framework is that with the aid of a retargetable ILP compiler and simulators
it is possible to
arrange architectural parameters (e.g., the issue width, the size of cache
memory units, the
number of execution units, etc.) to meet low power design goals under area
constraints.
65
Binding and scheduling algorithms for highly retargetable compilation
- Yamaguchi, M.; Ishiura, N.; Kambe, T.
Precision Technol. Center, Sharp Corp., Nara, Japan
This Paper Appears in :
Design Automation Conference 1998. Proceedings of the ASP-DAC '98. Asia
and South Pacific
on Pages: 93 - 98
This Conference was Held : 10-13 Feb. 1998
1998
ISBN: 0-7803-4425-1
IEEE Catalog Number: 98EX121
Total Pages: xxxviii+606
References Cited: 7
Accession Number: 5912424
Abstract:
This paper presents new binding and scheduling algorithms for a retargetable
compiler which
can deal with diverse architectures. Application specific embedded processors
often
include a "nonorthogonal" datapath where all the registers are not equally
accessible from
all the functional units. A nonorthogonal datapath makes the binding task very
hard because
inadvertent assignment of an operation to a functional unit may rule out
all the possible
assignments to other operations due to reachability constraints among datapath
resources.
Scheduling must take register capacity constraints into account in addition
to resource
constraints. We discuss these problems and propose algorithms to solve
them.
66
A retargetable optimizing code generator for digital signal processors
- Kreuzer, W.; Gotschlich, M.; Wess, B.
Inst. fur Nachrichtentech. & Hochfrequenztech., Tech. Univ. Wien, Austria
This Paper Appears in :
Circuits and Systems, 1996. ISCAS '96., Connecting the World., 1996 IEEE
International Symposium on
on Pages: 257 - 260 vol.2
1996
Vol. 2
ISBN: 0-7803-3073-0
IEEE Catalog Number: 96CH35876
Total Pages: 4 vol.(xlviii+692+801+612+845)
References Cited: 12
Accession Number: 5425344
Abstract:
Efficient DSP software synthesis for systems with stringent cost and power
constraints
requires tools which minimize code size as well as tools to evaluate processor
architectures
for a given application. In this paper, we introduce a user retargetable
code generator
translating homogeneous atomic data flow graphs into high-quality DSP assembly
code. By
using a target architecture description file, flexibility in the design
process is enhanced
without impairing final code quality. Based on a trellis tree straight-line
code generation
algorithm, we present a method for code compaction and register optimization
to exploit
instruction level parallelism. The results of our code generator match
the quality of assembly
programs which were coded by hand and thoroughly optimized.
67
A graph based processor model for retargetable code generation
- Van Praet, J.; Lanneer, D.; Goossens, G.; Geurts, W.; De Man, H.
IMEC, Leuven, Belgium
This Paper Appears in :
European Design and Test Conference, 1996. ED&TC 96. Proceedings
on Pages: 102 - 107
This Conference was Held : 11-14 March 1996
1996
ISBN: 0-8186-7423-7
IEEE Catalog Number: 96TB100027
Total Pages: xxxi+623
References Cited: 11
Accession Number: 5309465
Abstract:
Embedded processors in electronic systems typically are tuned to a few
applications.
Development of processor specific compilers is prohibitively expensive
and as a result such
compilers, if existing, yield code of an unacceptable quality. To improve
this code quality, we
developed a retargetable and optimising code generator. It uses a graph
based processor
model that captures the connectivity, the parallelism and all architectural
peculiarities of an
embedded processor. In this paper, the processor model is presented and
we formally define
the code generation task, including code selection, register allocation
and scheduling, in
terms of this model.
68
Efficient retargetable compiler code generation
- Hatcher, P.J.; Tuller, J.W.
Dept. of Comput. Sci., New Hampshire Univ., Durham, NH, USA
This Paper Appears in :
Computer Languages, 1988. Proceedings., International Conference on
on Pages: 25 - 30
This Conference was Held : 9-13 Oct. 1988
1988
ISBN: 0-8186-0874-9
Total Pages: xv+446
References Cited: 19
Accession Number: 3327897
Abstract:
A discussion is presented of the design and implementation of a retargetable
code generation
system, UNH-CODEGEN, specifically designed for the bottom-up tree pattern
matching
algorithms. The authors describe experiments in which the system has been
used to build
compilers. These experiments demonstrate that the system can be used to
quickly generate a
code generator that will run fast (roughly four times the speed of the
Portable C Compiler's
code generators), that will be space-efficient, and that will make best
use of the underlying
machine description.
69
Hypermedia processors: design space exploration
- Kin, J.; Chunho Lee; Mangione-Smith, W.H.; Potkonjak, M.
Editor(s): Wong, P.W., Alwan, A., Ortega, A., Kuo, C.-C.J., Nikian, C.L.M.
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
This Paper Appears in :
Multimedia Signal Processing, 1998 IEEE Second Workshop on
on Pages: 323 - 328
This Conference was Held : 7-9 Dec. 1998
1998
ISBN: 0-7803-4919-9
IEEE Catalog Number: 98EX175
Total Pages: xvii+638
References Cited: 8
Accession Number: 6313715
Abstract:
We present a framework for area optimal system design space exploration
for hypermedia
applications. We focus on a category of processors that are programmable
yet optimized to a
hypermedia application. The key components of the framework presented in
this paper are a
retargetable instruction-level parallelism compiler, instruction level
simulators, a set of
complete media applications written in a high level language and a media
processor synthesis
algorithm. The framework addresses the need for area optimal system design
by exploiting
the instruction-level parallelism found in media applications by compilers
that target
multiple-instruction-issue processors. Using the framework we conduct an
extensive
exploration of area optimal system design space for a hypermedia application.
We found that
there is enough ILP in the typical media and communication applications
to achieve highly
concurrent execution when throughput requirements are high. On the other
hand, when
throughput requirements are low, there is no need to use multiple-instruction-issue
processors.
70
Describing instruction set processors using nML
- Fauth, A.; Van Praet, J.; Freericks, M.
Inst. fur Tech. Inf., Tech. Univ. Berlin, Germany
This Paper Appears in :
European Design and Test Conference, 1995. ED&TC 1995, Proceedings.
on Pages: 503 - 507
This Conference was Held : 6-9 March 1995
1995
ISBN: 0-8186-7039-8
IEEE Catalog Number: 95TH8058
Total Pages: xxvii+611
References Cited: 24
Accession Number: 5057082
Abstract:
Programmable processors offer a high degree of flexibility and are therefore
increasingly
being used in embedded systems. We introduce the formalism nML which is
especially
suited to describe such processors in terms of their instruction set. An
nML description is
directly related to the standard description as found in the usual programmer's
manuals. The
nML formalism is based on a mixed structural and behavioural model facilitating
exact yet
concise descriptions. The philosophy of nML is already applied in two approaches
to
retargetable code generation and instruction set simulation.
71
The White Dwarf: a high-performance application-specific processor
- Wolfe, A.; Breternitz, M., Jr.; Stephens, C.; Ting, A.L.; Kirk, D.B.;
Bianchini, R.P., Jr.; Shen, J.P.
Dept. of Electr. & Comput. Eng., Carnegie-Mellon Univ., Pittsburgh,
PA, USA
This Paper Appears in :
Computer Architecture, 1988. Conference Proceedings. 15th Annual International
Symposium on
on Pages: 212 - 222
This Conference was Held : 30 May-2 June 1988
1988
ISBN: 0-8186-0861-7
Total Pages: xi+461
References Cited: 18
Accession Number: 3228437
Abstract:
The design and implementation of a high-performance special-purpose processor,
called the
White Dwarf, for accelerating finite-element analysis algorithms is presented.
The White
Dwarf CPU contains two Am29325 32-bit floating-point processors and one
Am29332 32-bit
arithmetic logic unit (ALU), and uses a wide-instruction-word architecture
in which the
application algorithm is directly implemented in microcode. The entire
system is VME-bus
compatible and interfaces with a Sun 3/160 host. The system's potential
peak performance is
20 MFLOPS (million floating-point operations per second); a sustained computation
rate in
excess of 15 MFLOPS is expected. A potential speedup of between one and
two orders of
magnitude is possible. With a fully populated memory subsystem, the White
Dwarf can
accommodate finite-element problems involving up to half a million nodes.
The system is
designed using an approach called application-specific processor design
(ASPD). A
retargetable compiler has been developed which is capable of generating
highly parallel and
efficient code for the White Dwarf and other processors with similar architecture.
System
debug/integration is in progress; a highly useful system is expected.
72
A formal model of computer architectures for digital system design
environments
- Wilsey, P.A.; Dasgupta, S.
Dept. of Electr. & Comput. Eng., Cincinnati Univ., OH, USA
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 473 - 486
May 1990
Vol. 9
Issue: 5
ISSN: 0278-0070
References Cited: 35
CODEN: ITCSDI
Accession Number: 3721471
Abstract:
A new and powerful model of computer architectures for machine description
is presented.
This model is capable of representing a machine across the abstraction
levels ranging from
the exo-architecture to the gate level. The goal is to establish a formal
framework for the
construction of a new hardware description language that will be useful
for a large class of
retargetable design-automation systems.
73
Retargetable estimation scheme for DSP architecture selection
- Ghazal, N.; Newton, R.; Rabaey, J.
University of California
This Paper Appears in :
Design Automation Conference, 2000. Proceedings of the ASP-DAC 2000. Asia
and South Pacific
on Pages: 485 - 489
This Conference was Held : January 25-28, 2000
2000
ISBN: 0-7803-5973-9
Abstract:
Not Available
74
Instruction set design and optimizations for address computation in DSP
architectures
- Araujo, G.; Sudarsanam, A.; Malik, S.
Dept. of Electr. Eng., Princeton Univ., NJ, USA
This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on
on Pages: 102 - 107
This Conference was Held : 6-8 Nov. 1996
1996
ISBN: 0-8186-7563-2
IEEE Catalog Number: 96TB100061
Total Pages: xii+145
References Cited: 10
Accession Number: 5450820
Abstract:
In this paper we investigate the problem of code generation for address
computation for DSP
processors. This work is divided into four parts. First, we propose a branch
instruction design
which can guarantee minimum overhead for programs that make use of implicit
indirect
addressing. Second, we give a formulation and propose a solution for the
problem of
allocating address registers (ARs) for array accesses within loop constructs.
Third, we
describe retargetable approaches for auto-increment (decrement) optimizations
of pointer
variables, and loop induction variables. Finally, we use a graph coloring
technique to allocate
physical ARs to the virtual ARs used in the previous phases. The results
show that the
combination of the above techniques considerably improves the final code
quality for
benchmark DSP programs.
75
Time-constrained code compaction for DSPs
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
on Pages: 112 - 122
This Conference was Held : 18-20 Jan. 1995
March 1997
Vol. 5
Issue: 1
ISSN: 1063-8210
References Cited: 33
CODEN: IEVSE9
Accession Number: 5525492
Abstract:
This paper addresses instruction-level parallelism in code generation for
digital signal
processors (DSPs). In the presence of potential parallelism, the task of
code generation
includes code compaction, which parallelizes primitive processor operations
under given
dependency and resource constraints. Furthermore, DSP algorithms in most
cases are
required to guarantee real-time response. Since the exact execution speed
of a DSP program
is only known after compaction, real-time constraints should be taken into
account during
the compaction phase. While previous DSP code generators rely on rigid
heuristics for
compaction, we propose a novel approach to exact local code compaction
based on an integer
programming (IP) model, which handles time constraints. Due to a general
problem
formulation, the IP model also captures encoding restrictions and handles
instructions having
alternative encodings and side effects and therefore applies to a large
class of instruction
formats. Capabilities and limitations of our approach are discussed for
different DSPs.
76
A low overhead design for testability and test generation technique for
core-based systems-on-a-chip
- Ghosh, I.; Jha, N.K.; Dey, S.
Fujitsu Labs. of America, Sunnyvale, CA, USA
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 1661 - 1676
Nov. 1999
Vol. 18
Issue: 11
ISSN: 0278-0070
References Cited: 30
CODEN: ITCSDI
Accession Number: 6453580
Abstract:
In a fundamental paradigm shift in system design, entire systems are being
built on a single
chip, using multiple embedded cores. Though the newest system design methodology
has
several advantages in terms of time-to-market and system cost, testing
such core-based
systems is difficult, mainly due to the problem of justifying test sequences
at the inputs of a
core embedded deep in the circuit and propagating test responses from the
core outputs. In
this paper, we first present a design for testability technique for testing
such core-based
systems. In this scheme, untestable cores are first made testable using
hierarchical
testability analysis techniques. If necessary, additional testability hardware
is added to the
cores to make them transparent so that they can propagate test data without
information
loss. This testability and transparency technique is currently applicable
to cores of the
following types: application-specific integrated circuits, application-specific
programmable
processors, and application-specific instruction processors. Other core
types can be made
testable and transparent using traditional techniques. The testable and
transparent cores can
then be integrated together with some system-level testability hardware
to ensure
justification of precomputed test sequences of each core from system primary
inputs to the
core inputs and propagation of test responses from core outputs to system
primary outputs.
Justification and propagation of test sequences are done at the system
level by extending
and suitably modifying the symbolic hierarchical testability analysis method
that has been
successfully applied to register-transfer level circuits. Since the testability
analysis method
is symbolic, the system test generation method is independent of the bit-width
of the cores.
The system-level test set is obtained as a byproduct of the testability
analysis and insertion
method without further search. The test methodology was applied to six
example systems.
Besides the proposed test method, the two methods that are currently used
in the industry
were also evaluated: (1) FScan-BScan, where each core is full-scanned,
and system test is
performed using boundary scan and (2) FScan-TBus, where each core is full-scanned,
and
system test is performed using a test bus. The experiments show that the
proposed scheme
has significantly lower area overhead, delay overhead, and test application
time compared to
FScan-BScan and FScan-TBus, without any compromise in the system fault
coverage.
77
Cooperative register assignment and code compaction for digital signal
processors with irregular datapaths
- Kreuzer, W.; Wess, B.
Inst. fur Nachrichtentech. und Hochfrequenztech., Tech. Univ. Wien, Austria
This Paper Appears in :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International
Conference on
on Pages: 691 - 694 vol.1
This Conference was Held : 21-24 April 1997
1997
Vol. 1
ISBN: 0-8186-7919-0
IEEE Catalog Number: 97CB36052
Total Pages: 5 vol. (xxii+xxv+xxiv+xxii+4156)
References Cited: 12
Accession Number: 5716155
Abstract:
We address the phase ordering problem of code compaction and register assignment
in a
data flow graph compiler. During register assignment, we take into account
the
instruction-level parallelism available. Symbolic variables in straight-line
code are allocated
to register set/memory location pairs which maximally preserve the freedom
available for
code compaction. Whenever necessary, spill code is inserted during final
register assignment
and scheduled during code compaction. Register assignment is performed
taking into account
its impact on code compaction. This strategy results in final code of high
quality.
78
An efficient model for DSP code generation: performance, code size,
estimated energy
- Gebotys, C.H.
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
This Paper Appears in :
System Synthesis, 1997. Proceedings., Tenth International Symposium on
on Pages: 41 - 47
This Conference was Held : 17-19 Sept. 1997
1997
ISBN: 0-8186-7949-2
IEEE Catalog Number: 97TB100114
Total Pages: x+141
References Cited: 16
Accession Number: 5717352
Abstract:
The paper presents a model for simultaneous instruction selection, compaction,
and register
allocation. An arc mapping model, along with logical propositions is used
to create an
optimization model. Code is generated in fast cpu times and is optimized
for minimum code
size, maximum performance or estimated energy dissipation. Code generated
for realistic
DSP applications provides performance and code size improvements from 1.09
up to 2.18
times for the TMS320C2x processor compared to previous research and a commercial
compiler. In all examples, up to 106 instructions are generated in under
one cpu minute. This
research is important for industry since DSP code can be efficiently generated
with
constraints on code size, performance and energy dissipation.
79
Optimal register assignment to loops for embedded code generation
- Kolson, D.J.; Nicolau, A.; Dutt, N.; Kennedy, K.
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
This Paper Appears in :
System Synthesis, 1995., Proceedings of the Eighth International Symposium
on
on Pages: 42 - 47
This Conference was Held : 13-15 Sept. 1995
1995
ISBN: 0-8186-7076-2
IEEE Catalog Number: 95TH8050
Total Pages: xiii+175
References Cited: 18
Accession Number: 5087874
Abstract:
One of the challenging tasks in code generation for embedded systems is
register
assignment. When more live variables than registers exist, some variables
are necessarily
accessed from data memory. Because loops are typically executed many times
and are often
time-critical, good register assignment in loops is exceedingly important,
since accessing
data memory can degrade performance. The issue of finding an optimal register
assignment
to loops, one which minimizes the number of spills between registers and
memory, has been
open for some time. In this paper, we address this issue and present an
optimal, but
exponential, algorithm which assigns registers to loop bodies such that
the resulting spill
code is minimal. We also show that a heuristic modification performs as
well as the
exponential approach on typical loops from scientific code.
80
Code generation for a DSP processor
- Wei-Kai Cheng; Youn-Long Lin
Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium
on
on Pages: 82 - 87
This Conference was Held : 18-20 May 1994
1994
ISBN: 0-8186-5785-5
IEEE Catalog Number: 94TH0641-1
Total Pages: ix+171
References Cited: 13
Accession Number: 4706383
Abstract:
Proposes a method for compiling an application program into microcodes
of a programmable
DSP processor. Since most state-of-the-art DSP processors feature some
sort of parallel
processing architectures, the code generation is a non-trivial task. Based
on several
scheduling and allocation techniques previously developed by the CAD community
for
high-level synthesis, we propose a DSP code generator. We emphasize reducing
the memory
access and register usage conflicts, which often lengthen the total execution
time. Starting
with an as-soon-as-possible scheduling, without regard to the resource
constraints, we
transform this illegal scheduling step-by-step into a legal one. In the
meantime, registers are
allocated and reallocated for variables, taking into account both memory
access and register
usage constraints. A software system called THEDA.DSP_CG has been
implemented
and tested using a set of benchmark programs. Simulation of generated codes
which are
targeted towards the TI TMS320C40 DSP processor shows that the proposed
approach is
indeed very effective.
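The starting point described in this abstract, an as-soon-as-possible schedule computed without regard to resource constraints, can be sketched as follows. The unit operation latency, the operation names, and the single memory port are assumptions for illustration only; the step-by-step legalization performed by the actual system is not reproduced here.

```python
from collections import Counter


def asap_schedule(deps):
    """deps: dict op -> list of ops it depends on.

    Returns op -> earliest control step (starting at 0), assuming every
    operation takes one step and ignoring all resource limits.
    """
    step = {}

    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps[op]), default=-1)
        return step[op]

    for op in deps:
        visit(op)
    return step


def memory_conflicts(schedule, memory_ops, ports=1):
    """Control steps in which more memory operations are scheduled than ports exist."""
    per_step = Counter(schedule[op] for op in memory_ops)
    return sorted(s for s, n in per_step.items() if n > ports)


# Hypothetical dataflow for: store(a * b + c)
deps = {
    "load_a": [],
    "load_b": [],
    "load_c": [],
    "mul": ["load_a", "load_b"],
    "add": ["mul", "load_c"],
    "store": ["add"],
}
schedule = asap_schedule(deps)
conflicts = memory_conflicts(schedule, ["load_a", "load_b", "load_c", "store"])
```

Here all three loads land in step 0, so with one memory port the ASAP schedule is illegal in that step; this is the kind of conflict a legalization pass would then have to resolve.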
81
Constraint analysis for DSP code generation
- Mesman, B.; Timmer, A.H.; Van Meerbergen, J.L.; Jess, J.A.G.
Philips Res. Lab., Eindhoven, Netherlands
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 44 - 57
Jan. 1999
Vol. 18
Issue: 1
ISSN: 0278-0070
References Cited: 25
CODEN: ITCSDI
Accession Number: 6148508
Abstract:
Code generation methods for digital signal processing (DSP) applications
are hampered by
the combination of tight timing constraints imposed by the performance
requirements of DSP
algorithms and resource constraints imposed by a hardware architecture.
In this paper, we
present a method for register binding and instruction scheduling based
on the exploitation and
analysis of the combination of resource and timing constraints. The analysis
identifies
implicit sequencing relations between operations in addition to the preceding
constraints.
Without the explicit modeling of these sequencing constraints, a scheduler
is often not
capable of finding a solution that satisfies the timing and resource constraints.
The presented
approach results in an efficient method to obtain high-quality instruction
schedules with low
register requirements.
82
Register files constraint satisfaction during scheduling of DSP code
- Pinto, C.A.A.; Mesman, B.; Van Eijk, K.
Design Autom. Sect., Eindhoven Univ. of Technol., Netherlands
This Paper Appears in :
Integrated Circuits and Systems Design, 1999. Proceedings. XII Symposium
on
on Pages: 74 - 77
This Conference was Held : 29 Sept.-2 Oct. 1999
1999
ISBN: 0-7695-0387-X
IEEE Catalog Number: PR00387
Total Pages: xiii+236
References Cited: 8
Accession Number: 6520693
Abstract:
Algorithms in digital signal processing (DSP) impose tight timing constraints
that the
compiler has to respect while considering the limited capacity of the available
register files in
a target DSP processor. Traditional code generation methods that schedule
spill code to
satisfy storage capacity may take many iterations and are usually not capable
of satisfying
the timing constraints. In this paper we present a new method to handle
register file capacity
constraints during scheduling. The method identifies potential bottlenecks
for register binding
and subsequently serializes the lifetimes of values until it can be guaranteed
that all capacity
constraints will be satisfied after scheduling. Experiments show that we
efficiently obtain
high quality instruction schedules for DSP kernels.
83
Algorithms for address assignment in DSP code generation
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
Computer-Aided Design, 1996. ICCAD-96. Digest of Technical Papers., 1996
IEEE/ACM
International Conference on
on Pages: 109 - 112
This Conference was Held : 10-14 Nov. 1996
1996
ISBN: 0-8186-7597-7
IEEE Catalog Number: 96CB35991
Total Pages: xxv+697
References Cited: 7
Accession Number: 5465406
Abstract:
This paper presents DSP code optimization techniques, which originate from
dedicated
memory address generation hardware. We define a generic model of DSP address
generation
units. Based on this model we present efficient heuristics for computing
memory layouts for
program variables, which optimize utilization of parallel address generation
units.
Improvements and generalizations of previous work are described, and the
efficacy of the
proposed algorithms is demonstrated through experimental evaluation.
84
Instruction selection for embedded DSPs with complex instructions
- Leupers, R.; Marwedel, P.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
Design Automation Conference, 1996, with EURO-VHDL '96 and Exhibition,
Proceedings EURO-DAC
'96, European
on Pages: 200 - 205
This Conference was Held : 16-20 Sept. 1996
1996
ISBN: 0-8186-7573-X
IEEE Catalog Number: 96CB36000
Total Pages: xxiii+579
References Cited: 12
Accession Number: 5412419
Abstract:
We address the problem of instruction selection in code generation for
embedded digital
signal processors. Recent work has shown that this task can be efficiently
solved by tree
covering with dynamic programming, even in combination with the task of
register allocation.
However, performing instruction selection by tree covering only does not
exploit available
instruction level parallelism, for instance in form of multiply-accumulate
instructions or
parallel data moves. In this paper we investigate how such complex instructions
may affect
detection of optimal tree covers, and we present a two-phase scheme for
instruction
selection which exploits available instruction-level parallelism. At the
expense of higher
compilation time, this technique may significantly increase the code quality
compared to
previous work, which is demonstrated for a widespread DSP.
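The baseline this abstract refers to, instruction selection by tree covering with dynamic programming, can be sketched in a few lines. This shows only the tree-covering idea, not the two-phase scheme the paper proposes. The pattern set is hypothetical: every binary operator costs one instruction, and a made-up fused shift-and-add instruction covers add(x, shl(y, k)) in one instruction. Leaves are assumed to be operands already held in registers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_cover_cost(node):
    """Minimum number of instructions needed to cover the expression tree.

    node is either a str (leaf operand, cost 0) or a tuple (op, left, right).
    Memoization ensures each distinct subtree is solved once.
    """
    if isinstance(node, str):
        return 0
    op, left, right = node
    # Pattern 1: a single-operator instruction rooted at this node.
    best = 1 + min_cover_cost(left) + min_cover_cost(right)
    if op == "add":
        # Pattern 2 (hypothetical): fused add(x, shl(y, k)), either operand order.
        for other, shifted in ((left, right), (right, left)):
            if not isinstance(shifted, str) and shifted[0] == "shl":
                cost = (1 + min_cover_cost(other)
                        + min_cover_cost(shifted[1])
                        + min_cover_cost(shifted[2]))
                best = min(best, cost)
    return best


# (x + (y << k1)) + (z << k2): four operators, but two fused instructions suffice.
expr = ("add", ("add", "x", ("shl", "y", "k1")), ("shl", "z", "k2"))
cost = min_cover_cost(expr)
```

The cost of a node is the cheapest pattern that matches at that node plus the optimal costs of the subtrees the pattern leaves uncovered, which is the essence of the dynamic programming formulation.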
85
Retargetable assembly code generation by bootstrapping
- Leupers, R.; Schenk, W.; Marwedel, P.
Lehrstuhl Inf., Dortmund Univ., Germany
This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium
on
on Pages: 88 - 93
This Conference was Held : 18-20 May 1994
1994
ISBN: 0-8186-5785-5
IEEE Catalog Number: 94TH0641-1
Total Pages: ix+171
References Cited: 10
Accession Number: 4706384
Abstract:
In a hardware/software codesign environment compilers are needed that map
software
components of a partitioned system behavioral description onto a programmable
processor.
Since the processor structure is not static, but can repeatedly change
during the design
process, the compiler should be retargetable in order to avoid manual compiler
adaption for
each alternative architecture. A restriction of existing retargetable compilers
is that they
only generate microcode for the target architecture instead of machine-level
code. We
introduce a bootstrapping technique permitting to translate high-level
language (HLL)
programs into real machine-level code using a retargetable microcode compiler.
Retargetability is preserved, permitting to compare different architectural
alternatives in a
codesign framework within relatively short time.
86
A knowledge-based retargetable compiler for application specific signal
processors
- Kuroda, I.; Nishitani, T.
NEC Corp., Kanagawa, Japan
This Paper Appears in :
Circuits and Systems, 1989., IEEE International Symposium on
on Pages: 631 - 634 vol.1
This Conference was Held : 8-11 May 1989
1989
Total Pages: 3 vol. xl+2246
References Cited: 6
Accession Number: 3636453
Abstract:
A knowledge-based compiler for application-specific signal processors has
been developed.
In order to generate optimized microcode for specific architectures, code
optimization
knowledge used by expert programmers has been implemented in the knowledge
base and
applied in every phase in the compiler, i.e. program analysis, intermediate
code optimization,
and code generation. This knowledge-based approach has been evaluated using
a signal
processor µPD77230. The developed compiler generates optimized microcode
whose code
size is almost the same as the code realized by expert programmers. This
approach also
leads to a compiler that can be retargeted by replacing the machine-dependent
databases in
the compiler.
87
An evaluation system for application specific architectures
- De Gloria, A.; Faraboschi, P.
Dept. of Biophys. & Electron. Eng., Genoa Univ., Italy
This Paper Appears in :
Microprogramming and Microarchitecture. Micro 23. Proceedings of the 23rd
Annual Workshop and
Symposium., Workshop on
on Pages: 80 - 89
This Conference was Held : 27-29 Nov. 1990
1990
ISBN: 0-8186-2124-9
Total Pages: x+299
References Cited: 10
Accession Number: 4038572
Abstract:
Application specific architectures are assuming an important role in the
design of tailored
systems as they enable a better cost/performance ratio, by exploiting application
intrinsic
features, with respect to standard components. An ASA design environment
has been
developed in order to allow the evaluation of different architecture solutions
in terms of cost
and performance. The system deals with parallel synchronous non-homogeneous
architectures and, starting from the high-level description of the application
benchmarks,
reaches code generation and simulation of architectures whose description
can range from
simple timing organization to detailed data-path and instruction structures.
As an application
example, the system is applied to the comparison of pipelined and parallel
micro-architecture
organizations for floating-point processing.
88
Code generation for embedded processors with complex instructions
- Jong-Yeol Lee; Hyun-Dhong Yoon; Jin-Hyuk Yang; In-Cheol Park; Chong-Min
Kyung
Korea Advanced Institute of Science and Technology
This Paper Appears in :
VLSI and CAD, 1999. ICVC '99. 6th International Conference on
on Pages: 525 - 527
This Conference was Held : October 26-27, 1999
1999
ISBN: 0-7803-5727-2
Abstract:
Not Available
89
A hardware/software partitioning algorithm for processor cores of digital
signal processing
- Togawa, N.; Sakurai, T.; Yanagisawa, M.; Ohtsuki, T.
Dept. of Electr., Inf. & Commun. Eng., Waseda Univ., Tokyo, Japan
This Paper Appears in :
Design Automation Conference, 1999. Proceedings of the ASP-DAC '99. Asia
and South Pacific
on Pages: 335 - 338 vol.1
This Conference was Held : 18-21 Jan. 1999
1999
ISBN: 0-7803-5012-X
IEEE Catalog Number: 99EX198
Total Pages: (xxvi+372+suppl.)
References Cited: 12
Accession Number: 6358317
Abstract:
A hardware/software cosynthesis system for processor cores of digital signal
processing
has been developed. This paper focuses on a hardware/software partitioning
algorithm which
is one of the key issues in the system. Given an input assembly code generated
by the
compiler in the system, the proposed hardware/software partitioning algorithm
first
determines the types and the numbers of required hardware units, such as
multiple functional
units, hardware loop units, and particular addressing units, for a processor
core (initial
resource allocation). Second, the hardware units determined at initial
resource allocation are
reduced one by one while the assembly code meets a given timing constraint
(configuration
of a processor core). The execution time of the assembly code becomes longer
but the
hardware costs for a processor core to execute it becomes smaller. Finally,
it outputs an
optimized assembly code and a processor configuration. Experimental results
demonstrate
that the system synthesizes processor cores effectively according to the
features of an
application program/data.
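The second step of this abstract, removing hardware units one by one while the code still meets the timing constraint, can be sketched as a greedy loop. The selection rule (drop the feasible unit with the largest area cost), the unit kinds, the cost figures, and the toy execution-time model are all assumptions for illustration and are not taken from the paper.

```python
def reduce_units(initial_units, unit_cost, exec_time, deadline):
    """Drop hardware units one at a time while exec_time(config) <= deadline."""
    config = dict(initial_units)
    if exec_time(config) > deadline:
        raise ValueError("initial allocation misses the deadline")
    while True:
        candidates = []
        for kind, count in config.items():
            if count == 0:
                continue
            trial = dict(config, **{kind: count - 1})
            if exec_time(trial) <= deadline:
                candidates.append((unit_cost[kind], kind, trial))
        if not candidates:
            return config  # no unit can be removed without missing the deadline
        # Assumed rule: remove the unit that saves the most area.
        _, _, config = max(candidates, key=lambda c: (c[0], c[1]))


def toy_exec_time(cfg):
    """Hypothetical timing model: work is split evenly across identical units."""
    if cfg["mul"] == 0 or cfg["alu"] == 0:
        return float("inf")
    loop_overhead = 0 if cfg["hwloop"] else 20
    return 120 / cfg["mul"] + 60 / cfg["alu"] + loop_overhead


initial = {"mul": 2, "alu": 2, "hwloop": 1}
costs = {"mul": 10, "alu": 4, "hwloop": 2}
final = reduce_units(initial, costs, toy_exec_time, deadline=130)
```

In this toy run one ALU is removed (execution time rises from 90 to 120 cycles), after which every further removal would exceed the 130-cycle deadline, so execution time grows while hardware cost shrinks, as the abstract describes.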
90
Memory organization for improved data cache performance in embedded
processors
- Panda, P.; Dutt, N.; Nicolau, A.
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA, USA
This Paper Appears in :
System Synthesis, 1996. Proceedings., 9th International Symposium on
on Pages: 90 - 95
This Conference was Held : 6-8 Nov. 1996
1996
ISBN: 0-8186-7563-2
IEEE Catalog Number: 96TB100061
Total Pages: xii+145
References Cited: 17
Accession Number: 5450818
Abstract:
Code generation for embedded processors creates opportunities for several
performance
optimizations not applicable for traditional compilers. We present techniques
for improving
data cache performance by organizing variables declared in embedded code
into memory,
using specific parameters of the data cache. Our approach clusters variables
to minimize
compulsory cache misses, and solves the memory assignment problem to minimize
conflict
cache misses. Our experiments demonstrate significant improvement in data
cache
performance (average 46% in hit ratios) by the application of our memory
organization
technique using code kernels from DSP and other domains on the LSI Logic
CW4001
embedded processor.
91
Application-driven design of DSP architectures and compilers
- Saghir, M.A.R.; Chow, P.; Lee, C.G.
Dept. of Electr. & Comput. Eng., Toronto Univ., Ont., Canada
This Paper Appears in :
Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International
Conference on
on Pages: II/437 - II/440 vol.2
This Conference was Held : 19-22 April 1994
1994
Vol. ii
ISBN: 0-7803-1775-0
IEEE Catalog Number: 94CH3387-8
Total Pages: 6 vol. 3382
References Cited: 9
Accession Number: 4917147
Abstract:
Current DSP architectures are designed to enhance the execution of
computationally-intensive, kernel-like loops. Their peculiar architectural
features are often
difficult for high-level language compilers to exploit. Moreover, their
tightly-encoded
instruction sets usually restrict the exploitation of instruction-level
parallelism beyond a few
instances. The quality of compiler-generated code is therefore poor when
compared to
hand-coded assembly language. We argue for an application-driven approach
to designing
flexible DSP architectures and effective compilers. We show that the run-time
behavior and
architectural characteristics of DSP kernels are different from those of
DSP applications.
We also show that when given a sufficiently flexible target architecture,
a compiler is
capable of effectively exploiting instances of instruction-level parallelism
and DSP-specific
architectural features. Finally, we show that a suitable DSP architecture
is one that provides
the functionality to support digital signal processing requirements, and
the flexibility that
enables a compiler to generate efficient code.
92
Exploiting conditional instructions in code generation for embedded
VLIW processors
- Leupers, R.
Editor(s): Borrione, D., Ernst, R.
Dept. of Comput. Sci., Dortmund Univ., Germany
This Paper Appears in :
Design, Automation and Test in Europe Conference and Exhibition 1999. Proceedings
on Pages: 105 - 109
This Conference was Held : 9-12 March 1999
1999
ISBN: 0-7695-0078-1
IEEE Catalog Number: PR00078
Total Pages: xxx+798
References Cited: 10
Accession Number: 6375753
Abstract:
This paper presents a new code optimization technique for a class of embedded
processors.
Modern embedded processor architectures show deep instruction pipelines
and highly
parallel VLIW-like instruction sets. For such architectures, any change
in the control flow of
a machine program due to a conditional jump may cause a significant code
performance
penalty. Therefore, the instruction sets of recent VLIW machines offer
support for
branch-free execution of conditional statements in the form of so-called
conditional
instructions. Whether an if-then-else statement is implemented by a conditional
jump
scheme or by conditional instructions has a strong impact on its worst-case
execution time.
However the optimal selection is difficult particularly for nested conditionals.
We present a
dynamic programming technique for selecting the fastest implementation
for nested
if-then-else statements based on estimations. The efficacy is demonstrated
for a real-life
VLIW DSP.
93
Synthesis of application specific instruction sets
- Ing-Jer Huang; Despain, A.M.
Inst. of Comput. & Inf. Eng., Nat. Sun Yat-Sen Univ., Kaohsiung, Taiwan
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 663 - 675
June 1995
Vol. 14
Issue: 6
ISSN: 0278-0070
References Cited: 21
CODEN: ITCSDI
Accession Number: 4975295
Abstract:
An instruction set serves as the interface between hardware and software
in a computer
system. In an application specific environment, the system performance
can be improved by
designing an instruction set that matches the characteristics of hardware
and the application.
We present a systematic approach to generate application-specific instruction
sets so that
software applications can be efficiently mapped to a given pipelined micro-architecture.
The
approach synthesizes instruction sets from application benchmarks, given
a machine model,
an objective function, and a set of design constraints. In addition, assembly
code is
generated to show how the benchmarks can be compiled with the synthesized
instruction
set. The problem of designing instruction sets is formulated as a modified
scheduling
problem. A binary tuple is proposed to model the semantics of instructions
and integrate the
instruction formation process into the scheduling process. A simulated
annealing scheme is
used to solve for the schedules. Experiments have shown that the approach
is capable of
synthesizing powerful instructions for modern pipelined microprocessors,
and running with
reasonable time and a modest amount of memory for large applications.
94
A data dependent approach to instruction level power estimation
- Sarta, D.; Trifone, D.; Ascia, G.
Editor(s): Piuri, V.
Catania Univ., Italy
This Paper Appears in :
Low-Power Design, 1999. Proceedings. IEEE Alessandro Volta Memorial Workshop
on
on Pages: 182 - 190
This Conference was Held : 4-5 March 1999
1999
ISBN: 0-7695-0019-6
Total Pages: x+203
References Cited: 7
Accession Number: 6370330
Abstract:
The increasing diffusion of portable systems, like mobile computers and
phones, or embedded
computing applications has driven the need for power analysis and optimization
in digital
processors used in these systems. In modern CPUs, power estimation and
optimization are
"two strongly pattern dependent" problems. This means that the influence
of the software in
power consumption is very high and a power figure for whatever processor
must be related to
the running software program. Based on the recent techniques already described
in literature,
we propose a new instruction level power analysis approach, that tries
to relate the power
dissipation to the executed instructions and their operand values.
95
Synthesis of application specific instructions for embedded DSP software
- Hoon Choi; In-Cheol Park; Seung Ho Hwang; Chong-Min Kyung
Dept. of Electr. Eng., Korea Adv. Inst. of Sci. & Technol., Seoul,
South Korea
This Paper Appears in :
Computer-Aided Design, 1998. ICCAD 98. Digest of Technical Papers. 1998
IEEE/ACM International
Conference on
on Pages: 665 - 671
This Conference was Held : 8-12 Nov. 1998
1998
ISBN: 1-58113-008-2
IEEE Catalog Number: 98CB36287
Total Pages: xxii+704
References Cited: 15
Accession Number: 6127727
Abstract:
Application specific instructions play an important role in reducing the
required code size and
increasing performance. This paper describes a new approach to generate
application
specific instructions for DSP applications. The proposed approach is based
on a modified
subset-sum problem, and can support multi-cycle complex instructions as
well as single
cycle instructions, while the previous state-of-the-art approaches can
generate only the
single-cycle instructions or can just select instructions from the fixed
super-set of possible
instructions. In addition, the proposed approach can also be applicable
to the case that
instructions are predefined. The experimental results on real applications
show that the
proposed approach is effective in making the instructions meet the given
constraints without
attaching special hardware accelerators.
96
Generating Instruction Sets And Microarchitectures From Applications
- Ing-Jer Huang; Despain, A.M.
This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on
on Pages: 391 - 396
This Conference was Held : November 6-10, 1994
ISSN: 1063-6757
Abstract:
Not Available
97
Hardware-software co-designing benchmark-driven superpipelined
instruction set processors
- Ching-Long Su; Despain, A.M.
Lab. of Adv. Comput. Archit., Univ. of Southern California, Los Angeles,
CA, USA
This Paper Appears in :
Computer Software and Applications Conference, 1994. COMPSAC 94. Proceedings.,
Eighteenth
Annual International
on Pages: 319
This Conference was Held : 9-11 Nov. 1994
1994
ISBN: 0-8186-6705-2
IEEE Catalog Number: 94CH35721
Total Pages: xvii+477
References Cited: 2
Accession Number: 4829894
Abstract:
This paper focuses on the issues of designing an optimal superpipelined
ISP (instruction set
processor) driven by a set of benchmark programs. Most issues discussed
in this paper also
apply to VLIW and superscalar processors.
98
An integrated approach to retargetable code generation
- Wilson, T.; Grewal, G.; Halley, B.; Banerji, D.
VLSI-CAD Group, Guelph Univ., Ont., Canada
This Paper Appears in :
High-Level Synthesis, 1994., Proceedings of the Seventh International Symposium
on
on Pages: 70 - 75
This Conference was Held : 18-20 May 1994
1994
ISBN: 0-8186-5785-5
IEEE Catalog Number: 94TH0641-1
Total Pages: ix+171
References Cited: 10
Accession Number: 4706381
Abstract:
Special-purpose instruction set processors (ISPs) challenge compilers because
of
instruction level parallelism, small numbers of registers, and highly specialized
register
capabilities. Many traditionally separate subproblems in code generation
have been unified
and jointly optimized within a single integer linear programming (ILP)
model. ILP modeling
provides a powerful methodology for generating high-quality code for a
variety of ISPs.
99
Media architecture: general purpose vs. multiple application-specific
programmable processor
- Chunho Lee; Kin, J.; Potkonjak, M.; Mangione-Smith, W.H.
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
This Paper Appears in :
Design Automation Conference, 1998. Proceedings
on Pages: 321 - 326
This Conference was Held : 15-19 June 1998
1998
ISBN: 0-89791-964-5
IEEE Catalog Number: 98CH36175
Total Pages: xxxii+820
References Cited: 33
Accession Number: 6084423
Abstract:
In this paper we report a framework that makes it possible for a designer
to rapidly explore
the application-specific programmable processor design space under area
constraints. The
framework uses a production-quality compiler and simulation tools to synthesize
a high
performance machine for an application. Using the framework we evaluate
the validity of the
fundamental assumption behind the development of application-specific programmable
processors. Application-specific processors are based on the idea that
applications differ
from each other in key architectural parameters, such as the available
instruction-level
parallelism, demand on various hardware components (e.g. cache memory units,
register
files) and the need for different number of functional units. We found
that the framework
introduced in this paper can be valuable in making early design decisions
such as area and
architectural trade-off, cache and instruction issue width trade-off under
area constraint,
and the number of branch units and issue width.
100
Architecture Description Languages for Systems-on-Chip Design
- Tomiyama, H.; Halambi, A.; Grun, P.; Dutt, N.; Nicolau, A.
University of California, Irvine, CA, USA
This Paper Appears in :
APCHDL 1999
Abstract:
Not Available
101
Parameterized System Design
- Givargis, T.D.; Vahid, F.
University of California, Riverside, CA
This Paper Appears in :
CODES 2000
This Conference was Held : May 2000
Abstract:
Continued growth in chip capacity has
led to new methodologies stressing reuse,
not only of pre-designed processing components,
but even of entire pre-designed
architectures. To be used across a variety of applications, such architectures
must be
heavily parameterized, so they can adapt to those applications'
differing constraints
by trading off power, performance and size.
We describe several parameterized
system design issues, and provide results
showing how a single architecture with
easily configurable parameters can support a wide range of tradeoffs.
102
Power Analysis Of Embedded Software: A First Step Towards Software
Power Minimization
- Tiwari, V.; Malik, S.; Wolfe, A.
This Paper Appears in :
Computer-Aided Design, 1994., IEEE/ACM International Conference on
on Pages: 384 - 390
This Conference was Held : November 6-10, 1994
ISSN: 1063-6757
Abstract:
Not Available
103
Power analysis and minimization techniques for embedded DSP software
- Mike Tien-Chien Lee; Tiwari, V.; Malik, S.; Fujita, M.
Fujitsu Labs. of America, Santa Clara, CA, USA
This Paper Appears in :
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
on Pages: 123 - 135
This Conference was Held : 18-20 Jan. 1995
March 1997
Vol. 5
Issue: 1
ISSN: 1063-8210
References Cited: 15
CODEN: IEVSE9
Accession Number: 5525493
Abstract:
Power is becoming a critical constraint for designing embedded applications.
Current power
analysis techniques based on circuit-level or architectural-level simulation
are either
impractical or inaccurate to estimate the power cost for a given piece
of application
software. In this paper, an instruction-level power analysis model is developed
for an
embedded digital signal processor (DSP) based on physical current measurements.
Significant points of difference have been observed between the software
power model for
this custom DSP processor and the power models that have been developed
earlier for some
general purpose commercial microprocessors. In particular, the effect of
circuit state on the
power cost of an instruction stream is more marked in the case of this
DSP processor. In
addition, the processor has special architectural features that allow dual
memory accesses
and packing of instructions into pairs. The energy reduction possible through
the use of these
features is studied. The on-chip Booth multiplier on the processor is a
major source of
energy consumption for DSP programs. A microarchitectural power model for
the multiplier is
developed and analyzed for further power minimization. In order to exploit
all of the above
effects, a scheduling technique based on the new instruction-level power
model is proposed.
Several example programs are provided to illustrate the effectiveness of
this approach.
Energy reductions varying from 26% to 73% have been observed. These energy
savings are
real and have been verified through physical measurement. It should be
noted that the energy
reduction essentially comes for free. It is obtained through software modification,
and thus,
entails no hardware overhead. In addition, there is no loss of performance
since the running
times of the modified programs either improve or remain unchanged.
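The instruction-level model described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used form of such a model: each instruction has a measured base current, and each pair of adjacent, different instructions adds a circuit-state overhead current. The function name, the opcode names, and all numeric values below are hypothetical and are not taken from the paper; charging the overhead for exactly one cycle is also an assumption.

```python
from typing import Dict, List, Tuple


def estimate_energy_pj(
    program: List[str],
    base_current_ma: Dict[str, float],
    cycles: Dict[str, int],
    overhead_ma: Dict[Tuple[str, str], float],
    vdd: float = 3.3,
    clock_period_ns: float = 25.0,
) -> float:
    """Estimate program energy in picojoules (mA * V * ns = pJ).

    Energy = sum of base costs + sum of circuit-state overheads between
    adjacent instructions. Overhead is looked up symmetrically and is
    assumed to be zero when an opcode follows itself.
    """
    energy = 0.0
    prev = None
    for op in program:
        # Base cost: measured average current over the instruction's cycles.
        energy += base_current_ma[op] * cycles[op] * vdd * clock_period_ns
        if prev is not None and prev != op:
            # Circuit-state overhead, charged for one cycle (assumption).
            extra = overhead_ma.get((prev, op), overhead_ma.get((op, prev), 0.0))
            energy += extra * vdd * clock_period_ns
        prev = op
    return energy


# Hypothetical measurements, for illustration only.
BASE = {"LD": 30.0, "MAC": 40.0}
CYCLES = {"LD": 1, "MAC": 1}
OVERHEAD = {("LD", "MAC"): 10.0}

interleaved = ["LD", "MAC", "LD", "MAC"]
grouped = ["LD", "LD", "MAC", "MAC"]
for order in (interleaved, grouped):
    print(order, estimate_energy_pj(order, BASE, CYCLES, OVERHEAD))
```

Under these made-up numbers the grouped order costs less because it has fewer opcode transitions. A power-aware scheduler of the kind the abstract mentions would search for such orderings, subject to data dependencies that this sketch ignores.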
104
Designing for low power in complex embedded DSP systems
- Gebotys, C.H.; Gebotys, R.J.
Editor(s): Sprague, R.H., Jr.
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
This Paper Appears in :
Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii
International Conference
on
on Pages: 8 pp.
This Conference was Held : 5-8 Jan. 1999
1999
ISBN: 0-7695-0001-3
Total Pages: liii+341
References Cited: 17
Accession Number: 6182117
Abstract:
This paper presents an empirical methodology for low power driven complex
DSP embedded
systems design. Unlike DSP design for high performance, research of low
power DSP design
has received little attention, yet power dissipation is an increasingly
important and growing
problem. Highly accurate power prediction models for DSP software are derived.
Unlike
previous techniques, the methodology derives software power prediction
models using
statistical optimization and it is verified with real power measurements.
The approach is
general enough to be applied to any embedded DSP processor. Results from
two different
DSP processors and over 180 power measurements of DSP code show that power
can be
predicted for embedded systems design with less than 4% error. This result
is important for
developing a general methodology for power characterization of embedded
DSP software
since low power is critical to complex DSP applications in many cost sensitive
markets.
105
Speeding up Power Estimation of Embedded Software
- Sama, A.; Balakrishnan, M.; Theeuwen, J.F.M.
This Paper will Appear in :
ISLPED 2000, to be held in July 2000.
106
High-level power modeling, estimation, and optimization
- Macii, E.; Pedram, M.; Somenzi, F.
Politecnico di Torino, Italy
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 1061 - 1079
Nov. 1998
Vol. 17
Issue: 11
ISSN: 0278-0070
References Cited: 111
CODEN: ITCSDI
Accession Number: 6120274
Abstract:
Silicon area, performance, and testability have been, so far, the major
design constraints to
be met during the development of digital very-large-scale-integration (VLSI)
systems. In
recent years, however, things have changed; increasingly, power has been
given weight
comparable to the other design parameters. This is primarily due to the
remarkable success
of personal computing devices and wireless communication systems, which
demand
high-speed computations with low power consumption. In addition, there
exists a strong
pressure for manufacturers of high-end products to keep power under control,
due to the
increased costs of packaging and cooling this type of device. Last, the
need of ensuring high
circuit reliability has turned out to be more stringent. The availability
of tools for the
automatic design of low-power VLSI systems has thus become necessary. More
specifically, following a natural trend, the interests of the researchers
have lately shifted to
the investigation of power modeling, estimation, synthesis, and optimization
techniques that
account for power dissipation during the early stages of the design flow.
This paper surveys
representative contributions to this area that have appeared in the recent
literature.
107
A minimum-cost circulation approach to DSP address-code generation
- Gebotys, C.H.
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 726 - 741
June 1999
Vol. 18
Issue: 6
ISSN: 0278-0070
References Cited: 24
CODEN: ITCSDI
Accession Number: 6270838
Abstract:
This paper presents a new approach to solving the DSP address code generation
problem. A
minimum cost circulation approach is used to efficiently generate high-performance
addressing code in polynomial time. Results show that addressing code size
improvements of
up to 6× are obtained, accounting for up to 1.6× improvement
in code size
and performance of compiler-generated DSP code. This research is important
for industry
since this value-added technique can improve code size, energy dissipation,
and
performance, without increasing cost.
108
Application-driven synthesis of memory-intensive systems-on-chip
- Kirovski, D.; Chunho Lee; Potkonjak, M.; Mangione-Smith, W.H.
Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
This Paper Appears in :
Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions
on
on Pages: 1316 - 1326
Sept. 1999
Vol. 18
Issue: 9
ISSN: 0278-0070
References Cited: 29
CODEN: ITCSDI
Accession Number: 6347848
Abstract:
Due to the increasing popularity of multimedia and communications applications,
requirements for application-specific systems typically include design
flexibility and data
management ability. Since the development of such systems is a market-driven
task,
reducing the time to market and manufacturing cost, while still satisfying
application
performance requirements, is an important system synthesis requirement.
We have
developed a new approach for area optimization of core-based systems. The
approach uses
basic block relocation in order to reduce the number of cache misses and,
thus, enable
hardware savings during system synthesis. Given a processor model, a cache
model, and a
set of nonpreemptive tasks with timing constraints, the goal of the synthesis
framework is to
select a system configuration (processor, I-cache, and D-cache) of minimal
area that
satisfies the performance constraints. The system synthesis framework has
two key
components. The first component is a code optimization engine that relocates
basic blocks
within a given assembly program in order to reduce the number of cache
misses. The second
component is a search mechanism that leverages the improvements in code
performance
obtained by the first component to select the most area-efficient system
configuration. In
order to bridge the gap between the profiling and modeling tools, we have
constructed a new
performance evaluation platform. It integrates the existing modeling, profiling,
and simulation
tools with the developed system-level synthesis tools. The effectiveness
of the synthesis
approach is demonstrated on a variety of modern real-life multimedia and
communication
applications.
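The search component described in this abstract, which selects the smallest-area (processor, I-cache, D-cache) configuration that still meets the timing constraints, can be sketched as follows. This is a plain exhaustive search and not the authors' algorithm: the performance check is treated as a black-box callback, and the component names, areas, and the feasibility rule in the usage example are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Component = Tuple[str, float]  # (name, area in mm^2); values are illustrative


def min_area_config(
    processors: List[Component],
    icaches: List[Component],
    dcaches: List[Component],
    meets_deadlines: Callable[[str, str, str], bool],
) -> Optional[Tuple[float, Tuple[str, str, str]]]:
    """Return (area, (cpu, icache, dcache)) for the smallest feasible system.

    meets_deadlines stands in for the simulation/profiling step that
    decides whether all tasks meet their timing constraints on a given
    configuration. Returns None if no configuration is feasible.
    """
    best = None
    for cpu, ic, dc in product(processors, icaches, dcaches):
        area = cpu[1] + ic[1] + dc[1]
        if best is not None and area >= best[0]:
            continue  # cannot improve on the best area found so far
        if meets_deadlines(cpu[0], ic[0], dc[0]):
            best = (area, (cpu[0], ic[0], dc[0]))
    return best


# Hypothetical component library and feasibility rule.
cpus = [("small", 2.0), ("big", 5.0)]
caches = [("1K", 0.5), ("4K", 1.5)]


def feasible(cpu: str, ic: str, dc: str) -> bool:
    return cpu == "big" or (ic == "4K" and dc == "4K")


print(min_area_config(cpus, caches, caches, feasible))
```

In the setting the abstract describes, code optimization that reduces cache misses would make more of the small-cache configurations pass the feasibility check, which is what lets the search settle on a smaller area.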
109
Programmable DSP architectures. I
- Lee, E.A.
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA,
USA
This Paper Appears in :
ASSP Magazine, IEEE [see also IEEE Signal Processing Magazine]
on Pages: 4 - 19
Oct. 1988
Vol. 5
Issue: 4
ISSN: 0740-7467
References Cited: 22
CODEN: IAMAEI
Accession Number: 3354120
Abstract:
The architectural features of single-chip programmable digital signal processors
(DSPs) are
explored. The focus is on the most basic such feature, the integration
of a hardware
multiplier/accumulator into the data path, and a more subtle feature, the
use of several (up to
six) independent memory banks. These features are studied in terms of the
performance
benefit and the impact on the user. Representative DSPs from three manufacturers
(AT&T, Motorola, and Texas Instruments) are used to illustrate the ideas and to compare
different
solutions to the same problems.
110
Programmable DSP architectures. II
- Lee, E.A.
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA,
USA
This Paper Appears in :
ASSP Magazine, IEEE [see also IEEE Signal Processing Magazine]
on Pages: 4 - 14
Jan. 1989
Vol. 6
Issue: 1
ISSN: 0740-7467
References Cited: 9
CODEN: IAMAEI
Accession Number: 3375276
Abstract:
For pt.I see ibid., vol.5, no.4, p.4-19, Oct. 1988. Three distinct techniques
used for dealing
with pipelining, namely, interlocking, time-stationary coding, and data-stationary
coding, are
examined. These techniques are studied in light of the performance benefit
and the impact on
the user. Representative DSPs from AT&T, Motorola, and Texas Instruments
are used to
illustrate the ideas and compare different solutions to the same problems.
Trends are
discussed, and some predictions for the future are made.
111
Design Challenges for New Application-Specific
Processors
- Margarida F. Jacome and Gustavo de Veciana
This article discusses research challenges
in developing methodologies and retargetable compilers/CAD tools for the
synthesis and analysis of a key component
in portable digital communications and multimedia consumer electronics
systems, namely, application-specific
processors and associated compilers. Typically, functionality is implemented
in
software; however, the penalty in cost
efficiency incurred by using general-purpose processors, or even "off-the-shelf"
DSP cores, may be unacceptable. Very
large instruction word (VLIW) application-specific instruction-set processors
(ASIPs) realize attractive cost/efficiency
trade-offs. Still, difficulties with ASIP design and current compiler technology
pose significant obstacles to this technology.
In this article we discuss these challenges and propose a framework to
jointly address the synthesis of VLIW
ASIPs and the development of high-quality retargetable compilers for such
specialized processors.