The demand for memory is exponential



Driven by the fast growth of mobile and IoT devices as well as autonomous vehicles, artificial intelligence, personal assistants and similar applications, data is now being generated at an exponentially growing rate. Currently we store most of this data using flash memory and Dynamic RAM (DRAM). DRAM is used for working (program) memory. It is volatile and fast, but also expensive. Flash, on the other hand, is used for longer-term storage. It is non-volatile, slow and cheap. In a smartphone, the DRAM holds copies of the programs for execution while the phone is turned on, and the NAND stores the programs when the power is off, along with photos, videos, music and other less speed-sensitive data. A server will store programs and data in its DRAM main memory and use flash-based SSDs for its long-term and backup storage. Smaller systems may use NOR flash instead of NAND and Static RAM (SRAM) instead of DRAM, but only if their memory needs are very small, as NOR costs one or two orders of magnitude more per byte than NAND flash and SRAM's cost is a couple of orders of magnitude higher than that of DRAM. DRAM and NAND flash are today's prevailing memory technologies, from smartphones to data processing equipment. However, both DRAM and NAND flash have idiosyncrasies that make them highly energy inefficient. Both technologies are also currently facing difficulties in scaling, and the costs of doing so are rising quickly.

Scaling flash memory

One of the major ways in which cost reduction occurs for memory is through scaling. There are three main techniques that have been used to scale flash memories, and all of them are becoming increasingly difficult:

Process shrinks - Improvements in flash density have largely come from feature size scaling. Feature size refers to the size specifications of a device, and shrinking it allows more memory cells to fit into the same horizontal area. However, it is generally agreed that the limits of this approach will soon be reached or will be too cost-prohibitive to pursue.

Multiple layers - As process shrinks become less viable, flash manufacturers have started using a new scaling method called 3D power scaling. The approach is not too dissimilar from the one taken in Manhattan, Tokyo, Hong Kong and other highly crowded places to deal with space limitations: instead of building horizontally (city sprawl), they build vertically with skyscrapers. This has become the standard approach to maximize packing density, and several flash memories now use 3D or V-NAND, which does the same thing: it stacks multiple layers of memory transistors on top of each other in a single integrated circuit. The main issue with 3D NAND is that it is a difficult technology to work with due to its complexity. It has, however, given NAND a new lease of life in terms of scaling.

Multiple bits per cell - Storing multiple bits per cell improves density and reduces the cost per bit (two, three or four bits per cell require 4, 8 or 16 distinguishable charge levels instead of 2), but it comes with significant impacts on complexity, write latency and the maximum number of possible writes.

What dictates memory cost?

There are three elements to memory cost (a rough worked example follows the list):

Wafer cost - the cost of processing each wafer.

Megabytes per wafer - this is tied to the wafer size and the memory's size; for example, a larger wafer fits more megabytes.

Yield - whether 50% of the chips are usable or 99% of them are. High yield is an extremely important characteristic.

Wafer cost and yield are both a function of the volume pushed through the fabrication plant; that is, they are enabled by economies of scale.
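To make the interplay between these three elements concrete, here is a minimal back-of-the-envelope sketch. All of the numbers (wafer cost, gigabytes per wafer, yield) are assumed purely for illustration and do not come from the article:

```python
# Rough, illustrative cost-per-gigabyte calculation.
# All inputs are hypothetical round numbers, not real industry figures.

def cost_per_gb(wafer_cost_usd, gb_per_wafer, yield_fraction):
    """Cost per usable gigabyte: wafer cost spread over the good chips only."""
    usable_gb = gb_per_wafer * yield_fraction
    return wafer_cost_usd / usable_gb

# Same wafer cost and density, different yields.
low_yield  = cost_per_gb(wafer_cost_usd=5000, gb_per_wafer=10_000, yield_fraction=0.50)
high_yield = cost_per_gb(wafer_cost_usd=5000, gb_per_wafer=10_000, yield_fraction=0.95)

print(f"cost/GB at 50% yield: ${low_yield:.3f}")   # $1.000 per GB
print(f"cost/GB at 95% yield: ${high_yield:.3f}")  # ~$0.526 per GB
```

Doubling the usable megabytes per wafer, or nearly doubling yield, has the same effect on cost per bit, which is why both density scaling and manufacturing volume matter so much.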
The ability to scale down to smaller process geometries is what will allow new memories to become the more cost-effective solution over the currently dominant technologies. The crossover point, when new memories cost the same as flash, has been anticipated for over two decades, but it has been continually postponed. Going in the 3D direction has extended the timeline for flash's scaling limit, which means flash will likely only be overtaken once a new memory technology gains a considerable portion of the market, because economies of scale lead to significant price advantages. Technologies with shrinking markets, e.g. EEPROM, NOR flash and SRAM, will not move to smaller geometries because companies will not spend the money to create the processes that would allow them to do so. This means that NOR flash and System-on-Chip (SoC) applications are ripe areas for new memory technologies and are likely where they will first get a decent foothold in the market. DRAM is also expected to hit a scaling limit in a few years, and once DRAM stops scaling the new memories will overtake it within a few generations.

The memory technology leaders, DRAM and NAND flash, will continue to reduce cost in accordance with Moore's law. New memories, however, will move faster than Moore's law because they will start on processes that are inexpensive to develop on and then eventually move onto more advanced processes. New memories will also be able to progress more quickly because their simpler structures allow them to be developed and manufactured faster.

Limitations of the current von Neumann architecture

Computer architecture has not fundamentally changed since von Neumann introduced his concept of how to perform computing in 1945. The advances that have allowed the computing industry to largely keep pace with Moore's law have been focused on performance and scaling, which means that little attention has been paid to the fact that power consumption has kept increasing with each new technological generation. This has led to today's memory technologies requiring a lot of power. Fundamental thermal limits and limits on the ability to generate power are now placing significant constraints on what can be achieved using current computing architectures. These constraints mean that power consumption has moved from a secondary concern to a primary one.

The von Neumann bottleneck

Artificial intelligence, big data analysis and edge computing are all placing requirements on the current von Neumann architecture that it is not efficient at meeting. This is because in modern von Neumann systems the data is stored in memory but processed in a separate processing unit. Data transfer between these units incurs energy costs and delays several orders of magnitude greater than the energy costs and delays incurred by the computation itself. If an application is data intensive, the data transfer requirement will severely limit both performance and energy efficiency.

Figure 1: In the von Neumann architecture, the memory and CPU are separate.

The requirement for data transfer between the CPU and memory creates a fundamental limitation: overall performance is limited not only by the CPU and memory, but also by the ability to transfer data between them. This limitation is known as the von Neumann bottleneck. When the bus's throughput is lower than the rate at which the CPU can work, the effective processing speed is limited because the CPU is continually forced to wait for data to move to or from memory. Since CPU speed and memory size have increased much faster than the throughput between them, the bottleneck has become more of a problem with every new technological generation. Big data exacerbates the problem further because it often involves workloads in which the CPU performs only a small amount of processing on large amounts of data.
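A rough back-of-the-envelope sketch shows how severe the imbalance can be. The compute rate, memory bandwidth and bytes-moved-per-operation below are assumed round numbers chosen for illustration, not measurements from the article:

```python
# Illustrative von Neumann bottleneck estimate with assumed, round numbers.

cpu_peak_ops_per_s    = 100e9  # hypothetical CPU peak: 100 billion operations/s
memory_bandwidth_b_s  = 20e9   # hypothetical memory bus: 20 GB/s
bytes_per_operation   = 8      # data-intensive workload: one fresh 8-byte operand per op

# If every operation needs fresh data from memory, the bus caps the rate.
bus_limited_ops_per_s = memory_bandwidth_b_s / bytes_per_operation   # 2.5e9 ops/s
utilization = bus_limited_ops_per_s / cpu_peak_ops_per_s             # 0.025

print(f"effective rate: {bus_limited_ops_per_s/1e9:.1f} G ops/s "
      f"({utilization:.1%} of CPU peak)")   # 2.5 G ops/s (2.5% of CPU peak)
```

In this hypothetical data-bound case the processor sits idle more than 97% of the time; extra CPU speed does not help until the data movement itself becomes cheaper or is avoided altogether, which is exactly what the processing-in-memory approaches discussed later aim for.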
Dark silicon and the move to multiple cores

While enabling high calculation speeds and data storage, the von Neumann architecture does not allow for optimum energy efficiency, and a problem associated with energy consumption is the accompanying requirement for heat dissipation. With the von Neumann architecture there are fundamentally only two ways to improve computing performance: increasing the operating frequency or increasing the number of transistors, and both lead to greater amounts of heat that must be dissipated. The physical limits on heat dissipation therefore also place limits on the number of transistors or the frequency at which they can run. To stay within these limits the computing industry has capped operating frequency, which has stalled since the middle of the previous decade. This limitation on the maximum usable frequency has slowed the rate of progress of the computer industry and is known as the 'dark silicon' problem: not all of a chip's cores, nor the full frequency of a single core, can be used at the same time. Some processing power must be kept off, or 'dark', to stay below the heat dissipation limits. To combat this problem the computing industry has been compelled to develop methods such as dynamic voltage and frequency scaling and moving microprocessor architectures towards multiple cores.
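Dynamic voltage and frequency scaling works because the dynamic (switching) power of CMOS logic grows roughly as P ≈ α·C·V²·f, so modest reductions in voltage and frequency give disproportionately large power savings. The capacitance, voltage and frequency values below are assumed purely for illustration:

```python
# Illustrative dynamic-power estimate: P_dyn ≈ alpha * C * V^2 * f.
# All parameter values are hypothetical, chosen only to show the scaling.

def dynamic_power(alpha, capacitance_f, voltage_v, frequency_hz):
    """Approximate CMOS switching power in watts."""
    return alpha * capacitance_f * voltage_v**2 * frequency_hz

nominal = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=1.0, frequency_hz=3e9)
scaled  = dynamic_power(alpha=0.2, capacitance_f=1e-9, voltage_v=0.8, frequency_hz=2.4e9)

print(f"nominal: {nominal:.2f} W, scaled: {scaled:.2f} W, "
      f"saving: {1 - scaled/nominal:.0%}")
# A 20% drop in voltage and frequency cuts dynamic power by roughly 49%.
```

Performance falls only linearly with frequency, while power falls roughly cubically when voltage is scaled down with it, which is why spreading work across more, slower cores is the standard answer to the dark silicon problem.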
Data centers moving to colder climates

The insatiable demand for information has led to the creation of gigantic clusters of servers and memory banks called data centers. These facilities require massive amounts of power both to run and to cool the servers. Due to their power consumption requirements, data centers often come with an associated power plant, and due to the related heat dissipation requirements they need complex cooling systems, which is why they are often built close to large water sources, such as lakes, and in predominantly cold regions.

Growing and Emerging Domains

Internet of Things (IoT)

The successful adoption of the internet and the rapid rise of mobile phones and similar devices have promoted the extensive deployment of fiber optic cables and the proliferation of multiple wireless technologies, ranging from communication satellites to tens of thousands of base stations. This has enabled an unprecedented level of global and mobile connectivity, which has in turn enabled the creation of completely new and unexpected markets, for example social networks. As more and more types of devices, cameras and sensors connect to the Internet and to each other, the landscape of what is possible, and of the data that can be generated, will once again change. We have already seen large-scale ramifications from the ability to store and share data created by people, with sites like Facebook and YouTube. The next era, which has already started, will be driven by data created by things. It will be the era of the Internet of Things (IoT), and it will have vast impacts.

IoT describes a world where just about anything can be connected to the internet. Most of us think about "being connected" in terms of electronic devices such as servers, computers, tablets, telephones and smartphones. In the IoT era, sensors and actuators will be embedded in physical objects, from roadways to pacemakers, which are then linked through wired and wireless networks. There are many possible applications for IoT. One example is self-driving cars, which integrate powerful on-board computers connected via Wi-Fi to navigation information and are capable of simultaneously collecting information from multiple sensors operating from optical to radar frequencies. As IoT continues to evolve and becomes increasingly common, the demand on engineers to design new generations of flexible, low-cost, low-power embedded memories into IoT hardware will become ever greater. Devices in these domains run on batteries or even just ambient energy, e.g. solar power, and therefore place strict demands on the power consumed by their memory systems.

Heterogeneous systems

As we move towards the vision of trillions of sensors in the era of the Internet of Things, heterogeneous systems are going to become increasingly prevalent. A system is termed heterogeneous when it contains multiple processing elements of fundamentally different types. For example, a system that has a CPU and an FPGA is called heterogeneous because a CPU and an FPGA are fundamentally different in architecture. Heterogeneous systems are becoming more prevalent because energy efficiency is becoming increasingly important. To meet the demands for both performance and energy efficiency, devices must use the right architectures, and since these devices must perform multiple, often completely different, tasks well, they need to contain components with different architectures.

Heterogeneous systems can vary greatly between domains, as these domains can have very different device performance requirements. For example, automotive applications require accelerometers with high reliability that can be used in extreme environmental conditions, while accelerometers in mobile devices primarily require low cost and low power consumption. In the latter case, the dominant requirement of low cost may be traded off against other requirements such as long-term reliability or accuracy. The use of heterogeneous systems means that the market is becoming broader and more fragmented. It also means that the technologies most likely to succeed are those that use standard CMOS materials and simple manufacturing processing steps and tools, because they will need to be integrated into many different heterogeneous systems.

PIM

The growth of the Internet of Things (IoT) is one of the leading factors in the emergence of big data. Huge amounts of data are now being transferred and processed, and the bandwidth and latency of these transfers pose difficult challenges for traditional computing systems. This is because the movement of massive amounts of data affects performance, power efficiency and reliability, the three fundamental attributes of a computing system. A new computing paradigm that promises to alleviate some of the burdens of big data is Processing-in-Memory (PIM), which aims to place computation as close as possible to memory. In current computing systems, the memory and CPU are separate devices, with memory serving only as 'dumb' storage from which the CPU reads and writes. PIM is about making memory 'smart', and it is enabled by emerging non-volatile resistive memory technologies (memristors) that make it possible to both store and process data within the same memory cells.
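The article does not describe a specific PIM mechanism, but a frequently cited illustration of how resistive memory can compute where it stores is the analog multiply-accumulate in a memristor crossbar: the cells' conductances hold the stored values, applied voltages encode the inputs, and Ohm's and Kirchhoff's laws produce the weighted sums as output currents. The following sketch only simulates that idea numerically, with made-up values:

```python
# Hypothetical 3x2 memristor crossbar. Each cell's conductance (in siemens)
# is both the stored value and a multiplier; the numbers are illustrative only.
conductances = [
    [1e-6, 5e-6],   # word line 0
    [2e-6, 1e-6],   # word line 1
    [4e-6, 3e-6],   # word line 2
]
input_voltages = [0.2, 0.1, 0.3]  # volts applied to the three word lines

# Each bit line sums the currents I = G * V contributed by every cell on it
# (Kirchhoff's current law), so a matrix-vector product happens inside the array.
bitline_currents = [
    sum(v * row[col] for v, row in zip(input_voltages, conductances))
    for col in range(len(conductances[0]))
]

print(bitline_currents)  # amperes, proportional to the two weighted sums
```

Because the data never leaves the array, the multiply-and-sum costs no bus transfers at all; in a conventional system every stored value would first have to be read across the memory bus into the CPU.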
Neuromorphic computing

AI techniques such as deep neural networks, or deep learning, have found widespread success when applied to problems including image and video interpretation, speech and natural language processing and medical diagnostics. At present, most of this computing is performed on graphics processing units (GPUs), accelerator application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs), and it is mostly done at the data center end of the cloud infrastructure. Scaling up and distributing this computing is limited by constraints associated with the von Neumann architecture.

The human brain, although not entirely understood, is a clear demonstration that there are other possible architectures that are far more energy efficient and compact than the current von Neumann architecture. As shown in Figure 2, the human brain has a "clock speed" (neuron firing rate) measured in tens of hertz and a total power consumption of around 20 watts, whereas a modern silicon chip can consume thousands to millions of times more energy to perform a similar task.

Figure 2: Energy efficiency of the brain vs. a silicon chip.

Neuromorphic computing is a domain concerned with mimicking the natural biological structures of our nervous system. Neuromorphic, or brain-inspired, computing investigates how we can get closer to the energy efficiency and parallelism of the brain using electronics, with the goal of implementing neurons in silicon. Although much remains to be understood about how the brain works, emerging memory architectures that enable PIM appear to offer a way to make significant progress in this domain. In the brain, synapses are used for both computing and storing information; to progress in the neuromorphic domain we need to go beyond von Neumann and have memory play a much more active role in computation.

Edge computing

New memory architectures, neuromorphic computing and processing in memory will be great enablers of what is called edge computing. Edge computing means that data processing is done on or near the device itself rather than in the cloud. Devices like smart watches, phones and sensors all need processing power to help them make decisions, and a lot of that processing is currently done in the cloud. Every time one of these devices, such as a smart watch, collects data on your heart rate, for example, it needs a connection to the cloud to process that data. Edge computing allows these devices to do that processing on the device itself, or at the edge of the network, which removes or reduces the need to communicate with the cloud. Edge computing improves response times, saves bandwidth and reduces the security concerns involved with sending data to the cloud.

Storage Class Memory

Storage Class Memory (SCM) is a class of memory technology that has recently become viable for use. Its name arises from the fact that it offers both storage capability and speeds similar to memory.
That is, SCM exhibits non-volatility of data, similar to secondary storage such as flash, while also having latencies comparable to primary memory such as DRAM. It is also byte-addressable. The current technologies used for primary memory and caches face many problems that SCM could potentially solve. SRAM, the technology typically used in caches, suffers from low density, making it increasingly difficult to pack together to meet growing speed demands. DRAM, used in primary memory, has better density but suffers from slower access times and requires constant power to refresh its contents.

The "holy grail" for the memory industry would be a technology with the speed of DRAM and the capacity, cost and persistence of NAND flash. This, however, is not currently achievable. Because of the shortcomings of currently available SCM-based memory systems, the current focus for SCM is not on replacing existing layers of memory but on bridging the gap between long-latency NAND-based secondary storage and primary memory. In this way, SCM seeks to change the memory hierarchy from a set of discrete layers into a spectrum of different memory possibilities.

Figure 3: SCM will fill the space between flash and DRAM.

Characteristics of emerging memories

Power consumption is an increasing concern with current memories. It is a major concern in the embedded domain and also in the cloud, where power consumption is often one of the highest cost elements in a data center, especially when cooling is included. Performance is also a major driver of the search for new memory technologies. Next-generation mobile architectures will integrate higher computing requirements for artificial intelligence at the edge while also demanding lower energy consumption. All of this must be achieved at low cost, which is a major challenge with existing memory technologies. Emerging memories have many advantages over the existing memory technologies, and most of them share the following attributes:

- They are non-volatile, or persistent - a decided strength against DRAM, with its power-hungry need to be refreshed at regular intervals.
- They do not require the high erase/write voltages that flash needs.
- They are byte-addressable rather than block-addressable.
- They do not use the clumsy block-erase / page-write approach required by flash memory (NAND and NOR). This allows their read and write speeds to be similar to each other, rather than an order of magnitude apart as with flash. Currently, when you want to change a few bits in flash memory you have to read the whole block, change the bits, erase the block and then write it back (a rough sketch of this read-modify-write cycle follows this list). That is a huge operation, which is why it takes so much time and power, and it is also a major reason why flash writes are so much slower than flash reads.
- Some of them allow cost reductions through scaling that surpass those of today's entrenched memory technologies.
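To illustrate the read-modify-write penalty mentioned in the list above, here is a minimal, purely illustrative sketch. The block and page sizes are assumed typical-looking values rather than figures from the article, and the "device" is just an in-memory model:

```python
# Minimal model of why changing a few bytes is expensive on NAND-style flash.
# Sizes are illustrative assumptions (e.g. 128 pages of 4 KiB per erase block).

PAGE_SIZE = 4096
PAGES_PER_BLOCK = 128
BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK   # 512 KiB

def nand_update(block: bytearray, offset: int, new_bytes: bytes) -> int:
    """Change a few bytes the NAND way: read the whole block, modify it,
    erase the block, then program it back. Returns approximate bytes moved."""
    snapshot = bytes(block)                      # read the entire block
    modified = bytearray(snapshot)
    modified[offset:offset + len(new_bytes)] = new_bytes
    block[:] = b"\xff" * BLOCK_SIZE              # block erase (all cells set to 1s)
    block[:] = modified                          # re-program every page
    return 2 * BLOCK_SIZE                        # rough traffic: read + rewrite

def byte_addressable_update(mem: bytearray, offset: int, new_bytes: bytes) -> int:
    """Change the same bytes on a byte-addressable memory: write them in place."""
    mem[offset:offset + len(new_bytes)] = new_bytes
    return len(new_bytes)

flash_block = bytearray(BLOCK_SIZE)
scm_region = bytearray(BLOCK_SIZE)

moved_nand = nand_update(flash_block, offset=100, new_bytes=b"\x42" * 8)
moved_scm = byte_addressable_update(scm_region, offset=100, new_bytes=b"\x42" * 8)

print(f"NAND-style update moved ~{moved_nand} bytes to change 8 bytes")
print(f"byte-addressable update moved {moved_scm} bytes")
```

An eight-byte change drags the entire erase block around, which is where much of flash's write latency, energy cost and wear comes from; a byte-addressable emerging memory simply writes the eight bytes.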
The market for emerging memories is still new and undetermined. There are many potential technologies, including Phase Change Memory (PCM), Ferroelectric RAM (FRAM), Magnetic RAM (MRAM) and Resistive RAM (ReRAM). Cost will be a major determinant of which type of emerging memory gains dominance in the market. This has already been demonstrated by the technologies used in today's smartphones, which use DRAM and NAND flash even though these two technologies are poorly suited to operating on battery power alone when compared to NOR flash and SRAM. NAND flash, as a whole, is as dominant as it is today largely because of its low cost and compatibility with CMOS technology.

A major obstacle to emerging memories taking over the market today is that they use new materials, and anything that requires a new material is difficult to get through manufacturing. The emerging technologies that end up dominating the market will be the ones that find their way into applications by providing a cost advantage. It is most likely that only one technology, or at most two, will reach prominence, since most emerging memories have very similar characteristics: economies of scale will allow one technology to develop a significantly better cost structure than its competitors, and this will cause the market to gravitate towards a single winning technology. Technologies that use standard CMOS materials and simple manufacturing processing steps and tools have the highest chance to succeed in this competitive market. Early adopters and the strength of strategic collaborations between memory IP providers and manufacturing partners will determine which of these technologies wins out to become the non-volatile memory of choice for the industry as a whole.