Saturday, October 17, 2009

Racetrack memory

IBM Racetrack Memory is an experimental non-volatile memory device under development at IBM's Almaden Research Center by a team led by Stuart Parkin.[1] In early 2008, a 3-bit version was successfully demonstrated.[2] If it is developed successfully, racetrack memory would offer storage density higher than comparable solid-state memory devices such as Flash and similar to conventional disk drives, but with much higher read/write performance. It is one of a number of new technologies vying to become a universal memory in the future.

Description

Racetrack Memory uses spin-coherent electric current to move magnetic domains along a nanoscopic permalloy wire about 200 nm across and 100 nm thick. As current is passed through the wire, the domains pass by magnetic read/write heads positioned near the wire, which alter the domains to record patterns of bits. A Racetrack Memory device is made up of many such wires and read/write elements. In its general operating concept, Racetrack Memory is similar to the earlier twistor memory or bubble memory of the 1960s and 1970s. Both of these used electrical currents to "push" a magnetic pattern through a substrate. Dramatic improvements in magnetic detection capabilities, based on the development of spintronic magnetoresistive sensing materials and devices, allow the use of much smaller magnetic domains to provide far higher areal densities.

In production, it is expected that the wires can be scaled down to around 50 nm. There are two ways to arrange Racetrack Memory. The simplest is a series of flat wires arranged in a grid with read and write heads arranged nearby. A more widely studied arrangement uses U-shaped wires arranged vertically over a grid of read/write heads on an underlying substrate. This allows the wires to be much longer without increasing their 2D area, although the need to move individual domains further along the wires before they reach the read/write heads results in slower random access times. This does not present a real performance bottleneck; both arrangements offer about the same throughput. Thus the primary concern in terms of construction is practical: whether or not the 3D vertical arrangement is feasible to mass produce.

Comparison to other memory devices

Current projections suggest that IBM Racetrack Memory will offer performance on the order of 20 to 32 ns to read or write a random bit. This compares to about 3,000,000 ns for a hard drive, or 6 to 40 ns for conventional DRAM. The authors of the primary work also discuss ways to improve the access times with the use of a "reservoir," improving to about 9.5 ns. Aggregate throughput, with or without the reservoir, is on the order of 250 to 670 Mbit/s for IBM Racetrack Memory, compared to 102,400 Mbit/s for dual-channel DDR2 DRAM, 1,000 Mbit/s for high-performance hard drives, and 30 to 100 Mbit/s for Flash devices. The only current technology that offers a clear performance benefit over IBM Racetrack Memory is SRAM, with access times on the order of 2 ns, but it is much more expensive and has far lower density.[3]

Flash, in particular, is a highly asymmetrical device. Although read performance is fairly fast, especially compared to a hard drive, writing is much slower. Flash works by "trapping" electrons in the chip surface, and requires a burst of high voltage to remove this charge and reset the cell. In order to do this, charge is accumulated in a device known as a charge pump, which takes a relatively long time to charge up. In the case of "NOR" flash, which allows random bit-wise access like IBM Racetrack Memory, read times are on the order of 70 ns, while write times are much slower, about 2,500 ns. To address this concern, "NAND" flash allows reading and writing only in large blocks, but this means that the time to access any random bit is greatly increased, to about 1,000 ns. Additionally, the use of the burst of high voltage physically degrades the cell, so most flash devices allow on the order of 100,000 writes to any particular bit before their operation becomes unpredictable. Wear leveling and other techniques can spread this out, but only if the underlying data can be re-arranged.

The key determinant of the cost of any memory device is the physical size of the storage medium. This follows from the way memory devices are fabricated. In the case of solid-state devices like Flash or DRAM, a large "wafer" of silicon is processed into many individual devices, which are then cut apart and packaged. The cost of packaging is about $1 per device, so as the density increases and the number of bits per device increases with it, the cost per bit falls in proportion. In the case of hard drives, data is stored on a number of rotating platters, and the cost of the device is strongly related to the number of platters. Increasing the density allows the number of platters to be reduced for any given amount of storage.

In most cases memory devices store one bit in any given location, so they are typically compared in terms of "cell size", a cell storing one bit. Cell size itself is given in units of F², where F is the design rule, usually representing the metal line width. Flash and racetrack both store multiple bits per cell, but the comparison can still be made. For instance, modern hard drives appear to be rapidly reaching their current theoretical limits around 650 nm²/bit,[4] which is defined primarily by our capability to read and write to tiny patches of the magnetic surface. DRAM has a cell size of about 6 F², while SRAM is much worse at 120 F². NAND flash is currently the densest form of non-volatile memory in widespread use, with a cell size of about 4.5 F², but storing two bits per cell for an effective size of 2.25 F². NOR is slightly less dense, at an effective 4.75 F², accounting for 2-bit operation on a 9.5 F² cell size.[3]
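
The effective sizes quoted above are simply the cell area divided by the number of bits stored in each cell. The following is a minimal Python sketch of that arithmetic, using only the figures from the paragraph above. The function names are illustrative, and the `feature_nm` parameter is a hypothetical input, since the text does not fix a value for F.

```python
# Cell areas (in F^2) and bits per cell, as quoted in the paragraph above.
CELLS = {
    "DRAM": (6.0, 1),
    "SRAM": (120.0, 1),
    "NAND flash": (4.5, 2),
    "NOR flash": (9.5, 2),
}


def effective_cell_size(cell_area_f2, bits_per_cell):
    """Return the area per stored bit, in units of F^2."""
    if bits_per_cell < 1:
        raise ValueError("a cell must store at least one bit")
    return cell_area_f2 / bits_per_cell


def area_per_bit_nm2(cell_area_f2, bits_per_cell, feature_nm):
    """Convert the per-bit area from F^2 to nm^2 for a design rule F.

    feature_nm is a hypothetical input; the text does not state F.
    """
    return effective_cell_size(cell_area_f2, bits_per_cell) * feature_nm ** 2


if __name__ == "__main__":
    for name, (area, bits) in CELLS.items():
        print(f"{name}: {effective_cell_size(area, bits):.2f} F^2 per bit")
```

For NAND and NOR flash this reproduces the effective sizes of 2.25 F² and 4.75 F² given in the text. Converting to nm² with `area_per_bit_nm2` allows a rough comparison with the hard drive figure, but only once a value of F has been assumed.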

IBM Racetrack Memory appears to scale to much smaller sizes than any current memory device. In the vertical orientation (U-shaped) about 128 bits are stored per cell, which itself can have a physical size of about 20 F² or more. No other near-term solid-state technology appears to be able to scale anywhere near these densities, representing a storage density about 100 times that of Flash.[3] The caveat here is that bits at different positions on the "track" would take different times (from ~10 ns to nearly a microsecond, or 10 ns/bit) to be accessed by the read/write sensor, because the "track" is moved at a fixed rate (~100 m/s) past the read/write sensor.

IBM Racetrack Memory is one of a number of new technologies aiming to replace Flash, and potentially offer a "universal" memory device applicable to a wide variety of roles. Other leading contenders include MRAM, PCRAM and FeRAM. Most of these technologies offer densities similar to Flash, in most cases worse, and their primary advantage is the lack of write endurance limits like those in Flash. Field-MRAM offers excellent performance, with access times as low as 3 ns, but requires a large 25 to 40 F² cell size. It might see use as an SRAM replacement, but not as a mass storage device. The highest density among these devices is offered by PCRAM, which has a cell size of about 5.8 F², similar to Flash, as well as fairly good performance around 50 ns. Nevertheless, none of these can come close to competing with IBM Racetrack Memory in overall terms, especially density. For example, 50 ns allows about 5 bits to be operated in an IBM Racetrack Memory device, resulting in an effective cell size of 20/5 = 4 F², easily exceeding the performance-density product of PCRAM. On the other hand, without sacrificing bit density, the same 20 F² area can also fit 2.5 2-bit 8 F² alternative memory cells (such as RRAM or spin-torque transfer MRAM), each of which could individually be operated much faster (~10 ns).
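
The trade-off between access time and density in the example above can be sketched in a few lines of Python. This is only an illustration under a simple assumed model: a bit sitting k positions from the read/write sensor takes k × 10 ns to reach it, and only the bits reachable within the latency budget count toward the 20 F² cell. The constants come from the text; the function names are hypothetical.

```python
CELL_AREA_F2 = 20.0  # racetrack cell footprint quoted in the text, in F^2
NS_PER_BIT = 10      # shift time per bit position quoted in the text


def shift_latency_ns(bit_offset):
    """Time to shift a bit that is bit_offset positions from the sensor."""
    return bit_offset * NS_PER_BIT


def bits_within_budget(latency_budget_ns):
    """Number of bit positions reachable within the latency budget."""
    return int(latency_budget_ns // NS_PER_BIT)


def effective_cell_size(latency_budget_ns):
    """Area per usable bit (in F^2) when access time is capped."""
    bits = bits_within_budget(latency_budget_ns)
    if bits < 1:
        raise ValueError("budget is shorter than a single shift step")
    return CELL_AREA_F2 / bits


if __name__ == "__main__":
    budget = 50  # ns, the PCRAM access time used for comparison in the text
    print(bits_within_budget(budget), "bits,",
          effective_cell_size(budget), "F^2 per bit")
```

With a 50 ns budget this gives 5 bits and 4 F² per bit, matching the 20/5 = 4 F² figure above. Relaxing the budget lets more of the roughly 128 bits on the track count toward the same footprint, at the cost of longer worst-case access times.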

Development difficulties

One limitation of the early experimental devices was that the magnetic domains could only be pushed slowly through the wires, requiring current pulses on the order of microseconds to move them successfully. This was unexpected, and led to performance roughly equal to hard drives, as much as 1000 times slower than predicted. Recent research at the University of Hamburg has traced this problem to microscopic imperfections in the crystal structure of the wires, which led to the domains becoming "stuck" at these imperfections. Using an x-ray microscope to directly image the boundaries between the domains, the researchers found that domain walls could be moved by pulses as short as a few nanoseconds when these imperfections were absent. This corresponds to a macroscopic speed of about 110 m/s.[5]

The voltage required to drive the domains along the racetrack would be proportional to the length of the wire. The current density must be sufficiently high to push the domain walls (as in electromigration). For example, a permalloy racetrack of resistivity 5×10⁻⁷ ohm-m that is 1 cm long, to cover an entire chip array,[6] and uses a current density of 3×10⁸ A/cm² would require a driving voltage of 15 kV along the racetrack.
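
The 15 kV figure follows from Ohm's law for a uniform wire: with current I = J·A and resistance R = ρ·L/A, the cross-sectional area cancels and V = ρ·J·L. The short Python sketch below checks the arithmetic, assuming a uniform wire and using the values quoted above. The function name is illustrative, and the unit conversions are written out explicitly because the text mixes cm and m.

```python
def drive_voltage(resistivity_ohm_m, current_density_a_per_cm2, length_cm):
    """Voltage along a uniform wire: V = rho * J * L (area cancels out)."""
    current_density_a_per_m2 = current_density_a_per_cm2 * 1e4  # cm^-2 to m^-2
    length_m = length_cm * 1e-2                                 # cm to m
    return resistivity_ohm_m * current_density_a_per_m2 * length_m


if __name__ == "__main__":
    # Values quoted in the text: 5e-7 ohm-m, 3e8 A/cm^2, 1 cm.
    volts = drive_voltage(5e-7, 3e8, 1.0)
    print(f"{volts / 1000:.1f} kV")
```

Because the voltage scales linearly with length, shortening the wire reduces the required voltage in proportion.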









