4 Choosing the Right DSP Processor

4.1 Digital Signal Processing

Digital signal processing is a method of processing real world signals (represented by a sequence of numbers) using mathematical techniques to perform transformations or extract information.

We don't speak in a digital signal. A digital signal is a language of 1s and 0s that can be processed by mathematics. We speak in real-world, analog signals. Analog signals are real world signals that we experience everyday - sound, light, temperature, and pressure. A digital signal is a numerical representation of the analog signal. It may be easier and more cost effective to process these signals in the digital world. In the real world, we can convert these signals into digital signals through our analog-to-digital conversion process, process the signals, and if needed, bring the signals back out to the analog world through the digital-to-analog converter (Figure 4‑1).

Figure 4‑1: Digital signal processing logic.

The result is crystal clear sound, with no annoying echoes. That's a basic explanation of what a DSP does. It takes a digital signal and processes it to improve the signal. The improvement may be clearer sound, sharper images, or faster data. And that ability to improve signals is making new breakthroughs such as Internet music and broadband to the home possible.

These real-time processors make up the fastest-growing segment of the semiconductor market and are particularly well suited to handle the demands of processing information, whether as the engine of communications applications, by providing the processing platform for the convergence of the internet and wireless applications, or by enabling breakthroughs in medical imaging or performance audio.

4.2 Choosing the Technology

If a universal microprocessor solution existed with which every design could be realized, the electronics industry wouldn't be a very competitive place. However, typically in most electronic designs, more than one processor technology can be used to implement the required functions. The trick is, of course, to choose the one that best delivers the performance, size, power consumption, features, software and tools to get the job done fast - without breaking the budget. After almost two decades of development, digital signal processors continue to take the place of competitive processors. Digital signal processors are, after all, at the center of signal processing.

4.2.1 Digital Signal Processors (DSP)

A digital signal processor (DSP) is a type of microprocessor - one that is incredibly fast and powerful. A DSP is unique because it processes data in real time. This real-time capability makes a DSP perfect for applications that cannot tolerate any delays. For example, did you ever talk on a cell phone where two people couldn't talk at once? You had to wait until the other person finished talking. If you both spoke simultaneously, the signal was cut--you didn't hear the other person. With today's digital cell phones, which use DSP, we can talk normally. The DSP processors inside cell phones process sounds so rapidly we hear them as quickly as we can speak - in real time.

Here are just some of the advantages of designing with DSPs over other microprocessors:

· Single-cycle multiply-accumulate operations

· Real-time performance, simulation and emulation

· Flexibility

· Reliability

· Increased system performance

· Reduced system cost

4.2.2 Field Programmable Gate Arrays (FPGA)

Field-Programmable Gate Arrays have the capability of being reconfigurable within a system, which can be a big advantage in applications that need multiple trial versions within development. They also offer greater raw performance per specific operation because of the resulting dedicated logic circuit. However, FPGAs are significantly more expensive and typically have much higher power dissipation than DSPs with similar functionality. As such, even when FPGAs are the chosen performance technology in designs such as wireless infrastructure, DSPs are typically used in conjunction with FPGAs to provide greater flexibility, better price/performance ratios, and lower system power.

4.2.3 Application Specific IC (ASIC)

Application-specific ICs can be tailored to perform specific functions extremely well, and can be made quite power efficient. However, since ASICS are not field-programmable, their functionality cannot be iteratively changed or updated while in product development. As such, every new version of the product requires a redesign and trips through the foundry, an expensive proposition, and an impediment. Programmable DSPs, on the other hand, can be updated without changing the silicon, merely change the software program, greatly reducing development costs, and availing aftermarket feature enhancements with mere code downloads. Consequently, more often than not, when we see ASICs in real time signal processing applications, they are typically employed as bus interfaces, glue logic, and/or functional accelerators for a programmable DSP-based system.

4.2.4 General Purpose Procesoors (GPP)

In contrast to ASICs that are optimized for specific functions, general-purpose microprocessors (GPPs) are best suited for performing a broad array of tasks. However, for applications in which the end product must process answers in real time, or must do so while powered by consumer batteries, GPPs comparatively poor real time performance and high power consumption all but rules them out. More and more, these processors are being seen as the dinosaurs of the industry, too encumbered with PC compatibility and desktop features to adapt to the changing real time market place. As the world embraces tiny hand-held wireless-enabled products that require power dissipation measured in milliwatts-not the watts that these processors consume - DSPs are the programmable technology of choice. That trend is bound to continue as digital Internet appliances get smaller, faster and more portable.

4.3 Choosing the Processor

The right DSP processor for a job depends heavily on the application. One processor may perform well for some applications, but be a poor choice for others. With this in mind, one can consider a number of features that vary from one DSP to another in selecting a processor. These features are discussed below.

4.3.1 Arithmetic Format

One of the most fundamental characteristics of a programmable digital signal processor is the type of native arithmetic used in the processor. Most DSPs use fixed point arithmetic, while other processors using floating-point arithmetic. Floating-point arithmetic is a more flexible and general mechanism than fixed-point. With floating-point, system designers have access to wider dynamic range (the ratio between the largest and smallest numbers that can be represented). As a result, floating-point DSP processors are generally easier to program than their fixed point cousins, but usually are also more expensive and have higher power consumption. The increased cost and power consumption result from the more complex circuitry required within the floating-point processor, which implies a larger silicon die. The ease-of-use advantage of floating-point processors is due to the fact that in many cases the programmer doesn’t have to be concerned about dynamic range and precision. In contrast, on a fixed-point processor, programmers often must carefully scale signals at various stages of their programs to ensure adequate numeric precision with the limited dynamic range of the fixed-point processor. Most high-volume, embedded applications use fixed point processors because the priority is on low cost and, often, low power. Programmers and algorithm designers determine the dynamic range and precision needs of their application, either analytically or through simulation, and then add scaling operations into the code if necessary.

For applications that have extremely demanding dynamic range and precision requirements, or where ease of development is more important than unit cost, floating-point processors have the advantage. It’s possible to perform general-purpose floating point arithmetic on a fixed-point processor by using software routines that emulate the behavior of a floating point device. However, such software routines are usually very expensive in terms of processor cycles. Consequently, general-purpose floating-point emulation is seldom used. A more efficient technique to boost the numeric range of fixed-point processors is block floating point, wherein a group of numbers with different mantissas but a single, common exponent are processed as a block of data. Block floating-point is usually handled in software, although some processors have hardware features to assist in its implementation.

4.3.2 Data Width

All common floating-point DSPs use a 32-bit data word. For fixed-point DSPs, the most common data word size is 16 bits. Motorola’s DSP563xx family uses a 24-bit data word, however, while Zoran’s ZR3800x family uses a 20-bit data word. The size of the data word has a major impact on cost, because it strongly influences the size of the chip and the number of package pins required, as well as the size of external memory devices connected to the DSP. Therefore, designers try to use the chip with the smallest word size that their application can tolerate.

As with the choice between fixed and floating point chips, there is often a trade-off between word size and development complexity. For example, with a 16-bit fixed-point processor, a programmer can perform double-precision 32-bit arithmetic operations by stringing together an appropriate combination of instructions. (Of course, double-precision arithmetic is much slower than single-precision arithmetic.) If the bulk of an application can be handled with single-precision arithmetic, but the application needs more precision for a small section of the code, the selective use of double-precision arithmetic may make sense. If most of the application requires more precision, a processor with a larger data word size is likely to be a better choice.

Note that while most DSP processors use an instruction word size equal to their data word size, not all does. The Analog Devices ADSP-21xx family, for example, uses a 16-bit data word and a 24-bit instruction word.

4.3.3 Speed

A key measure of the suitability of a processor for a particular application is its execution speed. There are a number of ways to measure a processor’s speed. Perhaps the most fundamental is the processor’s instruction cycle time: the amount of time required to execute the fastest instruction on the processor. The reciprocal of the instruction cycle time divided by one million and multiplied by the number of instructions executed per cycle is the processor’s peak instruction execution rate in millions of instructions per second, or MIPS.

A problem with comparing instruction execution times is that the amount of work accomplished by a single instruction varies widely from one processor to another. Some of the newest DSP processors use VLIW (very long instruction word) architectures, in which multiple instructions are issued and executed per cycle. These processors typically use very simple instructions that perform much less work than the instructions typical of conventional DSP processors. Hence, comparisons of MIPS ratings between VLIW processors and conventional DSP processors can be particularly misleading, because of fundamental differences in their instruction set styles.

Even when comparing conventional DSP processors, however, MIPS ratings can be deceptive. Although the differences in instruction sets are less dramatic than those seen between conventional DSP processors and VLIW processors, they are still sufficient to make MIPS comparisons inaccurate measures of processor performance. For example, some DSPs feature barrel shifters that allow multi-bit data shifting (used to scale data) in just one instruction, while other DSPs require the data to be shifted with repeated one-bit shift instructions. Similarly, some DSPs allow parallel data moves (the simultaneous loading of operands while executing an instruction) that are unrelated to the ALU instruction being executed, but other DSPs only support parallel moves that are related to the operands of an ALU instruction.

Some newer DSPs allow two MACs to be specified in a single instruction, which makes MIPS-based comparisons even more misleading. One solution to these problems is to decide on a basic operation (instead of an instruction) and use it as a yardstick when comparing processors. A common operation is the MAC operation. Unfortunately, MAC execution times provide little information to differentiate between processors: on many DSPs a MAC operation executes in a single instruction cycle, and on these DSPs the MAC time is equal to the processor’s instruction cycle time.

And, as mentioned above, some DSPs may be able to do considerably more in a single MAC instruction than others. Additionally, MAC times don’t reflect performance on other important types of operations, such as looping, that are present in virtually all applications. A more general approach is to define a set of standard benchmarks and compare their execution speeds on different DSPs. These benchmarks may be simple algorithm “kernel” functions (such as FIR or IIR filters), or they might be entire applications or portions of applications (such as speech coders). Implementing these benchmarks in a consistent fashion across various DSPs and analyzing the results can be difficult.

Two final notes of caution on processor speed: First, be careful when comparing processor speeds quoted in terms of “millions of operations per second” (MOPS) or “millions of floating-point operations per second” (MFLOPS) figures, because different processor vendors have different ideas of what constitutes an “operation.” For example, many floating-point processors are claimed to have a MFLOPS rating of twice their MIPS rating, because they are able to execute a floating-point multiply operation in parallel with a floating-point addition operation. Second, use caution when comparing processor clock rates. A DSP’s input clock may be the same frequency as the processor’s instruction rate, or it may be two to four times higher than the instruction rate, depending on the processor. Additionally, many DSP chips now feature clock doublers or phase-locked loops (PLLs) that allow the use of a lower-frequency external clock to generate the needed high-frequency clock onchip.

4.3.4 Memory Organization

The organization of a processor’s memory subsystem can have a large impact on its performance. As mentioned earlier, the MAC and other DSP operations are fundamental to many signal processing algorithms. Fast MAC execution requires fetching an instruction word and two data words from memory at an effective rate of once every instruction cycle. There are a variety of ways to achieve this, including multiported memories (to permit multiple memory accesses per instruction cycle), separate instruction and data memories (the “Harvard” architecture and its derivatives), and instruction caches (to allow instructions to be fetched from cache instead of from memory, thus freeing a memory access to be used to fetch data).

Another concern is the size of the supported memory, both on- and off-chip. Most fixed-point DSPs are aimed at the embedded systems market, where memory needs tend to be small. As a result, these processors typically have small-to-medium on-chip memories (between 4K and 64K words), and small external data buses. In addition, most fixed-point DSPs feature address buses of 16 bits or less, limiting the amount of easily-accessible external memory.

Some floating-point chips provide relatively little (or no) on-chip memory, but feature large external data buses. For example, the Texas Instruments TMS320C30 provides 6K words of on-chip memory, one 24-bit external address bus, and one 13-bit external address bus. In contrast, the Analog Devices ADSP-21060 provides 4 Mbits of memory on-chip that can be divided between program and data memory in a variety of ways. As with most DSP features, the best combination of memory organization, size, and number of external buses is heavily application-dependent.

4.3.5 Ease of Development

The degree to which ease of system development is a concern depends on the application. Engineers performing research or prototyping will probably require tools that make system development as simple as possible. That said, items to consider when choosing a DSP are software tools (assemblers, linkers, simulators, debuggers, compilers, code libraries, and real-time operating systems), hardware tools (development boards and emulators), and higher-level tools (such as block-diagram based code-generation environments). A fundamental question to ask when choosing a DSP is how the chip will be programmed. Typically, developers choose either assembly language, a high-level language—such as C or Ada—or a combination of both. Surprisingly, a large portion of DSP programming is still done in assembly language. Because DSP applications have voracious number-crunching requirements, programmers are often unable to use compilers, which often generate assembly code that executes slowly. Rather, programmers can be forced to hand-optimize assembly code to lower execution time and code size to acceptable levels. Users of high-level language compilers often find that the compilers work better for floating-point DSPs than for fixed-point DSPs, for several reasons. First, most high-level languages do not have native support for fractional arithmetic. Second, floating-point processors tend to feature more regular, less restrictive instruction sets than smaller, fixed-point processors, and are thus better compiler targets. Third, as mentioned, floating point processors typically support larger memory spaces than fixed-point processors, and are thus better able to accommodate compiler-generated code, which tends to be larger than hand crafted assembly code.

VLIW-based DSP processors, which typically use simple, orthogonal RISC-based instruction sets and have large register files, are somewhat better compiler targets than traditional DSP processors. However, even compilers for VLIW processors tend to generate code that is inefficient in comparison to hand-optimized assembly code. Hence, these processors, too, are often programmed in assembly language—at least to some degree. Whether the processor is programmed in a high-level language or in assembly language, debugging and hardware emulation tools deserve close attention since, sadly, a great deal of time may be spent with them. Almost all manufacturers provide instruction set simulators, which can be a tremendous help in debugging programs before hardware is ready. If a high-level language is used, it is important to evaluate the capabilities of the high-level language debugger: will it run with the simulator and/or the hardware emulator? Is it a separate program from the assembly-level debugger that requires the user to learn another user interface? Most DSP vendors provide hardware emulation tools for use with their processors. Modern processors usually feature on-chip debugging/emulation capabilities, often accessed through a serial interface that conforms to the IEEE 1149.1 JTAG standard for test access ports. This serial interface allows scan-based emulation—programmers can load breakpoints through the interface, and then scan the processor’s internal registers to view and change the contents after the processor reaches a breakpoint.

Scan-based emulation is especially useful because debugging may be accomplished without removing the processor from the target system. Other debugging methods, such as pod-based emulation, require replacing the processor with a special processor emulator pod. Off-the-shelf DSP system development boards are available from a variety of manufacturers, and can be an important resource. Development boards can allow software to run in real-time before the final hardware is ready, and can thus provide an important productivity boost. Additionally, some low-production-volume systems may use development boards in the final product.

4.3.6 Multiprocessor Support

Certain computationally intensive applications with high data rates (e.g., radar and sonar) often demand multiple DSP processors. In such cases, ease of processor interconnection (in terms of time to design interprocessor communications circuitry and the cost of linking processors) and interconnection performance (in terms of communications throughput, overhead, and latency) may be important factors. Some DSP families—notably the Analog Devices ADSP-2106x—provide special-purpose hardware to ease multiprocessor system design. ADSP-2106x processors feature bidirectional data and address buses coupled with six bidirectional bus request lines. These allow up to six processors to be connected together via a common external bus with elegant bus arbitration. Moreover, a unique feature of the ADSP-2106x processor connected in this way is that each processor can access the internal memory of any other ADSP-2106x on the shared bus. Six four-bit parallel communication ports round out the ADSP-2106x’s parallel processing features.

4.3.7 Power Consumption and Management

DSPs are increasingly being used in portable applications (such as cellular phones and portable audio players) where power consumption is a major concern. As a result, many processor vendors are reducing processor supply voltages and adding power management features to give programmers greater influence over processor power consumption. Power management features available on some DSPs include:

Reduced voltage operation: Many vendors offer low-voltage (3.3-, 2.5-, or 1.8-volt) versions of their DSP processors. These processors consume far less power than five-volt equivalents at the same clock rate.

Sleep or idle modes: Most DSPs feature modes that turn off the processor’s clock to all but certain sections of the processor, reducing power consumption. In some cases, any unmasked interrupt will bring the processor back from sleep mode, while in other cases, only a few designated external interrupt lines will wake the processor. Some processors provide multiple sleep modes with different power savings and wakeup latencies.

Programmable clock dividers: Some DSPs allow the processor’s clock frequency to be varied under software control to use the minimum clock speed required for a particular task.

Peripheral control: Some DSPs allow the programmer to disable peripherals that are not in use. Regardless of power management features, it is often difficult for design engineers to obtain meaningful power consumption figures for DSPs. This is because a DSP’s power consumption may vary by as much as a factor of three depending on the instructions it executes.

Unfortunately, most vendors publish only “typical” or “maximum” power consumption numbers, usually without specifying what constitutes a “typical” program. One exception is Texas Instruments, which provides application notes that detail power consumption vs. instruction type and processor configuration.

4.3.8 Cost

Obviously, processor cost is a major concern for products that are to be produced in volume. For such applications, designers try to use the lowest cost DSP that meets the requirements of the application, even though such devices may be considerably less flexible and more difficult to program than costlier processors. Among processor families, the least expensive family members tend to have significantly fewer features, less on-chip memory, and lower performance than the more expensive members.

A key factor in processor pricing is the dependence of price on device packaging. For example, plastic thin quad flat pack (PQFP and TQFP) packages can be significantly less expensive than pin grid array (PGA) packages. Finally, when considering prices, it is important to remember two things. First, processor prices are continually falling. Second, prices are strongly dependent on quantity, and prices for, say, a quantity 100,000 order may be significantly lower than for a quantity 1,000 order.

4.4 TMS320 DSP Product Review

Since the launch of Texas Instruments' first single-chip Digital Signal Processor (DSP) in 1982, it has been provided designers an accelerated next-generation, breakthrough systems as well as complementary technology and support. DSPs are unique microprocessors that are programmable and operate in real-time - much faster than general-purpose microprocessors. The ability to crunch vast quantities of numbers is the value digital signal processors bring to the electronics science.

The TMS320 DSP family offers the most extensive selection of DSPs available anywhere, with a balance of general-purpose and application-specific processors to suit application needs. There are three distinct Instruction Set Architectures that are completely code-compatible within platforms:

4.4.1 Power Efficiency: TMS320C5000 DSP Platform

TMS320C5000 DSP Platform is optimized for the consumer digital market - the heart of the mobile Internet - and it's convergence with other consumer electronics. With a roadmap to power consumption as low as 0.33mA/MHz, the TMS320C55x and TMS320C54x DSPs are optimized for personal and portable products like digital music players, GPS receivers, portable medical equipment, 3G cell phones, and digital cameras as well as MIPS-intensive voice and data applications and extremely cost effective single and multi-channel applications.

Based on the C55x DSP core, the OMAP5910 processor integrates a C55x DSP core with a ARM925 on a single chip for the optimal combination of high performance with low power consumption. This unique architecture offers a solution to both DSP and ARM developers, providing the low power real-time signal processing capabilities of a DSP coupled with the command and control functionality of an ARM. Sampling today, the OMAP5910 is optimal for designers working with devices that require embedded applications processing in a connected environment.

4.4.2 Control Optimized: TMS320C2000 DSP Platform

TMS320C2000 DSP Platform provides on-chip integration and computational abilities that produce unparalleled improvements in energy efficiency. The TMS320C28x DSP generation is the highest-performance solution for digital control. The TMS320C24x DSP generation is the foundation for this diverse platform. This generation delivers power and control advantages that allow designers to implement advanced, cost-efficient control systems..

4.4.3 Highest Performance: TMS320C6000 DSP platform

The C6000 DSP platform offers fastest running at clock speeds up to 1 GHz. The platform consists of the TMS320C64x and TMS320C62x fixed-point generations as well as the TMS320C67x floating-point generation. Optimal for broadband infrastructure, performance audio and imaging applications, the C6000 DSP platform's performance ranges from 1200 to 8000 MIPS for fixed-point and 600 to 1350 MFLOPS for floating point.

4.5 TMS320C5000 Platform Overview

To enable the design of power-sensitive systems, and to find the device that best suits the design, the TMS320C5000 DSP platform includes over 20 devices with the optimal combination of performance, peripheral options, small packaging and the power-efficient performance. This combination gives designers an edge in today's portable Internet and wireless communications. With power consumption as low as 0.45 mA/MHz and performance up to 600 MIPS, the C5000 DSP platform is optimized for portable media and communication products like digital music players, GPS receivers, portable medical equipment, feature phones, modems, 3G cell phones, and portable imaging.

4.5.1 Platform highlights

· Performance up to 900 MIPS

· Ultra-low-power down to 0.33mA/MHz - enabling incredible new potential for power-sensitive portable systems

· A wide-range of devices with a rich array of peripherals allows designers to accurately target system needs

· Complete code-compatibility across all devices, allows reuse of existing code to greatly reduce development burden

· A complete development environment and support

4.5.2 Code compatible generations

The TMS320C5000 platform consists of two fully code-compatible device generations:

TMS320C54x: The C54x generation consists of over 17 code compatible devices. With a broad range of performance and peripheral options, low-power operation and innovative architecture and instruction set, the C54x generation gives designers effective ways of achieving high-performance, low-power operation and low system cost.

TMS320C55x: The TMS320C55x generation contains the industrys most power-efficient DSPs and redefines the potential of applications ranging from portable Internet appliances to high-speed wireless communications. The rapidly-growing generation delivers ultra-low power performance through advanced power management techniques that automatically power down inactive peripherals, memory and core functional units increasing battery life for portable applications.

4.5.3 Signal Processing Libraries and Peripheral Drivers

To enable to dramatically reduce the code development time, platform-specific libraries; the Signal Processing and the Chip Support libraries contain a collection of high-level, optimized DSP function modules and help to achieve performance higher than standard ANSI C code.

4.6 TMS320C2000 Platform Overview

The TMS320C2000 family of digital signal controllers with its performance and peripheral integration, offers flash memory, ultra-fast A/D converters, and robust CAN modules and ease of use. The high-precision control DSPs are as follows.

The TMS320C28x generation of digital signal controllers are the industry's first 32-bit DSP-based controllers with on-board Flash memory and performance up to 150 MIPS. They target industrial control, optical networking, and automotive control applications. The C28x core is a high performance control optimized controller and offers up to 150 MIPS of computational bandwidth to handle numerous sophisticated control algorithms in real-time, such as sensorless speed control, random PWM, and power factor correction.

The TMS320C24x generation of digital signal controllers offers 20 to 40 MIPS of DSP performance along with MCU control and with integrated Flash memory and are idea for implementing sophisticated control algorithms in cost sensitive and space constraint applications.

Table 4‑1: C2000 Fixed-point DSPs

DSP Generation	DSP Type	Features	Price*
C24x	16-bit data Fixed-Point	SCI, SPI, CAN, A/D, event manager, watchdog timers, on-chip Flash/ROM memory, 20-40 MIPS	$1.95-$15
C28x	32-bit data Fixed-Point	SCI, SPI, CAN, 12-bit A/D, McBSP, watchdog timers, on-chip Flash memory, up to 150 MIPS	$4.95 - $15

Target applications of TMS320C2000 are as follows:

1. Industrial: automation, drives,

2. Automotive: Electronic power steering, integrated starter alternator, brushless motors and pumps

3. Appliances/White Goods: Drive motors, Water Pumps, HVAC

4. Other : Hand-held power tools, Power Supplies, Optical Networking,

5. Motor Types : Single phase, Three phase, Sensored, Sensorless, AC Induction, Brushless DC, Permanent Magnet Synchronous, Switched Reluctance,

6. Smart Sensing and Measurement: Powerline Communication Systems, Ballast Control Systems, Radar Applications

4.7 TMS320C6000 Platform Overview

Raising the bar in performance and cost efficiency, the TMS320C6000 DSP platform offers fast DSPs running at clock speeds up to 1 GHz. The platform consists of the TMS320C64x and TMS320C62x fixed-point generations as well as the TMS320C67x floating-point generation. Optimal for broadband infrastructure, performance audio and imaging applications, the C6000 DSP platform's performance ranges from 1200 to 8000 MIPS for fixed-point and 600 to 1800 MFLOPS for floating point.

4.7.1 Platform highlights

Optimized for highest performance and ease-of-use in high-level language programming with three device generations. Fixed-point performance ranges from 1200 to 8000 MIPS and floating-point performance from 600 to 1350 MFLOPS. Memory, peripherals and co-processor combinations tailored to meet the needs of targeted broadband infrastructure, performance audio and imaging applications. Complete software compatibility across all C6000 devices, allows reuse of existing object code to greatly reduce development burden

4.7.2 Code compatible generations

The TMS320C6000 platform consists of three fully code-compatible device generations:

TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.

TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipments and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.

TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive. A detailed list of DSPs are given in Table 4‑2

4.7.3 C Compiler

The C6000 DSP platform has a high-performance C engine with a compiler that leverages the architecture to sustain performance while speeding design development time for high-performance applications. C compiler/optimization tools gives the ability to balance code size and performance to meet the needs of the application and is available free of charge.

C6000 Signal Processing Libraries and Peripheral Drivers: To enable to dramatically reduce the code development time, platform-specific libraries; the Signal Processing and the Chip Support libraries contain a collection of high-level, optimized DSP function modules and help to achieve performance higher than standard ANSI C code.

Table 4‑2: C67x Floating Point DSPs.

Family Name : TMS320C67x DSP Generation
Part Number	Data/Program Memory (bits)	Approx. 1KU Price (US$)	Description
TMS320C6713-300	4KBytes L1D Data Cache; 4KBytes L1P Program Cache; 64KBytes L2 Cache; 192 KBytes L2 SRAM	38.75	Floating-Point Digital Signal Processor
TMS320C6713-225	4KBytes L1D Data Cache; 4KBytes L1P Program Cache; 64KBytes L2 Cache; 192 KBytes L2 SRAM	29.14
TMS320C6713-200	4KBytes L1D Data Cache; 4KBytes L1P Program Cache; 64KBytes L2 Cache; 192 KBytes L2 SRAM	22.18
TMS320C6713-167	4KBytes L1D Data Cache; 4KBytes L1P Program Cache; 64KBytes L2 Cache; 192 KBytes L2 SRAM	22.18
TMS320C6712-150	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache
TMS320C6712-100	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache
TMS320C6711-250	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache	20.34	Floating-Point DSP [Revision D Recommended for New Designs]
TMS320C6711-200	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache	20.34
TMS320C6711-167	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache	20.34
TMS320C6701-167	512K/512K		Floating-Point Digital Signal Processor
TMS320C6701-150	512K/512K	82.24
SMJ320C6701	512K/512K	892.48
SM320C6713B-EP	4KBytes L1D Data Cache; 4KBytes L1P Program Cache; 64KBytes L2 Cache; 192 KBytes L2 SRAM	69.94	Military Enhanced Plastic Floating-Point Digital Signal Processor
SM320C6712D-EP	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache	30.52
SM320C6711D-EP	32Kbits L1D Data Cache; 32Kbits L1P Program Cache; 512Kbits L2 Cache	48.82
SM320C6701-EP	512K/512K	180.76
SM320C6701	512K/512K	535.13

4.8 Medical Applications of C67x DSPs

To develop high-precision applications, TMS320C67x DSPs are suitable since they provide the speed, precision, power savings and dynamic range to meet a wide variety of application needs. The structure of C67x is shown in Figure 4‑2. These dynamic DSPs are suitable for demanding applications such as medical imaging.

Key Features

· Up to 1350 MFLOPS at 225 MHz

· 100% code-compatible with 32-bit instructions, single and double precision

· C6000 DSP platform advanced VLIW architecture

· Two inter-integrated circuit (I2C) bus interfaces

· Two multi-channel buffered serial ports (McBSPs)

· Up to 256 Kbytes of on-chip memory

· 16-channel DMA controller

· Up to eight 32-bit instructions executed each cycle

· Eight independent, multipurpose functional units and thirty-two 32-bit registers

· Advanced DSP C compiler and assembly optimizer maximize efficiency and performance

· IEEE floating-point format

· Packaging: 27/35-mm BGA and 28-mm TQFP options

Figure 4‑2: The C67x DSPs’ two-level cache memory structure.

Applications

· Digital imaging

· Medical ultrasound

· Portable ultrasound equipment

· CT scanners

· Magnetic resonance imaging (Figure 4‑3)

Figure 4‑3: Magnetic Resonance Imaging implementation of TMS320C67 DSP.

4.9 Compiler Interface

As with modern DSPs, a C-compiler is provided to program and simulate the TMS32067x processor, so called Code Composer Studio. Using a compiler speeds and enhances the development process, and gives the ability to create and test real-time, embedded signal processing applications. C-compiler extends the capabilities to include full awareness of the DSP target by the host and real-time analysis tools. It’s user interface is shown in Figure 4‑4.

Figure 4‑4: User interface of C-compiler, Code Composer Studio.