# CIRCUIT AND LAYOUT TECHNIQUES FOR SOFT-ERROR-RESILIENT DIGITAL CMOS CIRCUITS

# A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Hsiao-Heng Kelin Lee

September 2011

© Copyright by Hsiao-Heng Kelin Lee 2011 All Rights Reserved

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Umran S. Inan) Principal Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Ivan R. Linscott)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

(Christoforos Kozyrakis)

Approved for the University Committee on Graduate Studies

## Abstract

Radiation-induced soft errors are a major concern for modern digital circuits, especially memory elements. Unlike large Random Access Memories (RAM), which can be protected using error-correcting codes and bit interleaving, sequential elements, i.e. latches and flip-flops, are challenging to protect against soft errors. Traditional techniques for designing soft-error-resilient sequential elements generally address single-node errors, or Single Event Upsets (SEUs). However, with technology scaling, the charge deposited by a single particle strike can be simultaneously collected and shared by multiple circuit nodes. The likelihood of a soft error caused by multiple circuit node disruptions, or Single Event Multiple Upset (SEMU), increases exponentially as the separation between individual transistors decreases. Hence, soft error resilience techniques for sequential elements must focus on SEMUs.

In this dissertation, we address these concerns by presenting a design framework for soft-error-resilient sequential cell design with an overview of existing circuit and layout techniques for soft error mitigation. In order to address the growing concern over SEMUs, we introduce a new soft error resilience layout design principle called LEAP, or Layout Design through Error-Aware Transistor Positioning, which targets SEMUs by using circuit interactions and transistor placement to improve the soft error performance of a circuit without significant area cost. As an example of circuit and layout co-design for soft error resilience, we discuss our application of LEAP to the SEU-immune Dual Interlocked Storage Cell (DICE) by implementing a new sequential element layout called LEAP-DICE. LEAP-DICE retains the original DICE circuit topology, but employs a layout that uses transistor interactions to reduce SEMUs. After comparing the soft error performance of SEU-immune flip-flops with that of the LEAP-DICE flip-flop, using a test chip in 180nm CMOS under 200-MeV proton radiation, we conclude the following:

- Our LEAP-DICE flip-flop achieved the best soft error performance among all SEU-immune flip-flop designs we investigated. LEAP-DICE encounters on average 2,000X fewer errors compared to the reference D flip-flop.
- Our LEAP-DICE flip-flop encounters 5X fewer errors compared to the DICE flip-flop, while both designs share identical circuit topology and transistor sizing. LEAP-DICE imposes negligible power and delay costs and 40% area cost compared to the DICE design.
- In the evaluation of our design framework, we also discovered new soft error effects related to operating conditions such as voltage scaling, clock frequency setting and radiation dose.

## Acknowledgement

Throughout my graduate studies at Stanford, I have had the privilege of working with exceptional people, teachers and students alike. Although I am closing this chapter of my life and moving forward, I will always cherish the good times we had together.

First, I would like to thank my advisors, Prof. Umran Inan and Dr. Ivan Linscott, for offering me the unique opportunity of working on an independent research project, for giving me the freedom to explore various research ideas, for providing the support to continue my studies, and for always believing in me. I have not been the easiest student to mentor, but you have always been patient with me. Teşekkür ederim.

I would also like to thank Prof. Christos Kozyrakis for chairing my oral defense committee and serving as the third reader for this dissertation.

I am heavily indebted to Prof. Subhasish Mitra, who acted as a mentor and supporter for my work in soft errors, and who introduced me to the art of writing papers. Prof. Mitra was instrumental in helping me complete this work. Without Prof. Mitra, I would still be languishing in the abyss of graduate school (figuratively and literally speaking, given that I worked in the basement of the electrical engineering building for many unspeakable years).

I would like to thank the various people who have given me technical guidance and support in the field of soft errors, notably Dr. Klas Lilja and Mounaim Bounasser of Robust Chip, and Norbert Seifert and Min Zhang of Intel. This work was heavily sponsored by Robust Chip and inspired by Dr. Lilja's vision of using layout as a design dimension for soft-error-resilient circuit design.

I cannot forget the essential support of the administrative staff: Shaolan Min, Helen Niu, and Uma Mulukutla, who always made sure that students had sufficient funding support, and enough to get by from day to day.

For two years, I had the chance to take part in the WIPER project, where I worked with Dr. Jack Doolittle, Robert Bumala and Clem Tillier from Lockheed Martin, and Dr. Dave Lauben from Stanford. Their collaboration was instrumental to my understanding of how to design satellite instruments.

I would also like to thank Dr. Bennett Wilburn and Prof. Mark Horowitz for the opportunity to be part of the Light Field Camera project during my first two years at Stanford. This experience gave me a peek at doing top-notch independent research at Stanford.

I am blessed to have had the chance to work with exceptional peers at Stanford, namely Dr. Charles Wang and Dr. Benjamin Mossawir, former students of the Stanford VLF Group, who supported me at critical times in test chip tapeout, and offered helpful suggestions for radiation hardening ideas and radiation testing setup. I would also like to thank Prasanthi Relangi, Rakesh Gnana David Jeyasingh and Jumie Yuventi for helping me with my radiation experiments in New Mexico and Indiana. I would also like to thank other members of the VLF Group: Jeff Chang (whose board testing support was critical), Carsten Barth (whose late night company I enjoyed), former occupants of my Packard 037 basement office (including Dr. Robert Moore, Dr. Joe Payne, Dr. Ryan Said and Dr. Ben Cotts), and other fellow current and former members of the Stanford VLF Group and Stanford Robust Systems Group.

Finally, I would like to thank my family, my father Eric, my mother Grace, and my brother Ben. No words can express the love and gratitude I feel for my family. At times I have almost given up, and without their support, I would not have had the courage and determination to find myself again.

HSIAO-HENG "KELIN" LEE

Stanford, California

August 26, 2011

This work was sponsored by the following contracts and agencies:

- NASA New Technology Initiatives Program, Grant No. NAG5-10822.
- The Aerospace Corporation Subcontract No. 46000001624-9.
- Air Force Research Laboratory, Contract No. AFRL FA8718-05-C0027.
- Defense Threat Reduction Agency/Robust Chip Inc., Contract No. HDTRA1-09-P0011.
- National Science Foundation.

This research was also made possible with generous fabrication support from National Semiconductor Corporation, as well as radiation testing support from Los Alamos National Laboratory and Indiana University Cyclotron Facility.

## Contents

- Abstract
- Acknowledgement
- 1 Introduction
  - 1.1 A Brief Overview of Soft Errors
  - 1.2 Purpose
  - 1.3 Contributions
  - 1.4 Organization
- 2 Radiation Effects in Electronics
  - 2.1 Single-Event Effects
    - 2.1.1 Soft Errors
    - 2.1.2 Hard Errors
  - 2.2 Sources of Radiation
    - 2.2.1 Near-Earth Radiation Environment
    - 2.2.2 Terrestrial Radiation Environment
  - 2.3 Total Dose Effects
  - 2.4 Conclusion
- 3 Circuit Soft Error Resilience Techniques
  - 3.1 Soft Error Generation
  - 3.2 RC-Based Soft Error Resilience Techniques
  - 3.3 Triple Modular Redundancy
  - 3.4 Dual Modular Redundancy
    - 3.4.1 C-Element Based Soft-Error-Resilient Latches
    - 3.4.2 Dual Interlocked Storage Cell (DICE)
    - 3.4.3 Differential Cascode Voltage Switch Logic (DCVSL)
  - 3.5 SEMU Vulnerability in SEU-Immune Techniques
  - 3.6 Conclusion
- 4 Layout Soft Error Resilience Techniques
  - 4.1 Charge Collection in CMOS Circuits
  - 4.2 Traditional Layout Techniques for Charge Collection Mitigation
    - 4.2.1 Guard Rings, Guard Contacts and Guard Drains
    - 4.2.2 Node Separation
  - 4.3 Layout Design through Error-Aware Transistor Positioning (LEAP)
  - 4.4 LEAP-DICE: A Case Study
  - 4.5 Conclusion
- 5 Test Chip Implementation
  - 5.1 Process Selection
  - 5.2 General Test Chip Description
  - 5.3 Test Chip Implementation Details
    - 5.3.1 Standard Cell Layout Style
    - 5.3.2 Standard Cell Library
    - 5.3.3 I/O Cells
    - 5.3.4 Flip-Flop Designs under Test
    - 5.3.5 Clock Generation and Distribution
  - 5.4 Conclusion
- 6 Experimental Setup and Results
  - 6.1 Radiation Experimental Setup
  - 6.2 Radiation Experimental Results
    - 6.2.1 Neutron Testing
    - 6.2.2 Proton Testing
  - 6.3 Conclusion
- 7 Conclusion
  - 7.1 Future Research
- Bibliography

## List of Tables

- 3.1 Overview of circuit-level techniques for soft error resilience
- Conditions to produce single-event transients in CMOS circuits
- Two-node SEMU combinations and possible protective node selection
- Normalized performance comparison for various flip-flop designs
- Test chip I/O voltage settings

## List of Figures

- 2.1 Charge generation and collection
- 2.2 Parasitic PNPN structure leading to single event latchup
- 2.3 Theoretical sea-level cosmic rays
- 3.1 Pulse models for single event transients
- 3.2 Comparison of published SEMU probabilities
- 3.3 Combinational and sequential soft errors
- 3.4 Vulnerable timing window for combinational SETs
- 3.5 SEU response model in an SRAM cell
- 3.6 RC-hardened SRAM cells
- 3.7 Triple modular redundancy latches
- 3.8 The C-Element
- 3.9 SET filtering of a latch input using C-Element
- 3.10 Single C-Element Dual Modular Redundancy (SCDMR) flip-flop
- 3.11 Built-In Soft Error Resilience (BISER) flip-flop
- 3.12 Delay filtering in the feedforward path of soft-error-resilient latches
- 3.13 Single Event Resistant Topology (SERT) latch
- 3.14 Dual Interlocked Storage Cell (DICE)
- 3.15 Differential Cascode Voltage Switch Logic (DCVSL)
- 3.16 SEMU in the Dual Interlocked Storage Cell (DICE)
- 4.1 Particle strike on the drain contact node of an "OFF" transistor in an inverter
- 4.2 Particle strike on the drain contact node of an "ON" transistor in an inverter
- 4.3 Static CMOS logic design
- 4.4 Bipolar parasitic conduction in an "OFF" PMOS transistor activated by a radiation strike
- 4.5 Well contacts in CMOS cell layout
- 4.6 Well contact styles in CMOS inverter cell layout
- 4.7 Guard bands and guard rings in CMOS inverter cell layout
- 4.8 Guard drains in CMOS layout
- 4.9 Soft error rate dependence on node separation
- 4.10 LEAP principle for an inverter through transistor alignment
- 4.11 LEAP principle for a cross-coupled inverter pair
- 4.12 Radiation-induced charge collection on the "OFF" transistor tree of a static CMOS gate
- 4.13 Radiation-induced charge collection on the "ON" transistor tree of a static CMOS gate
- 4.14 Single-event transient suppression in a CMOS gate using the LEAP principle
- 4.15 Single-event charge collection in the Dual Interlocked Storage Cell (DICE)
- 4.16 Two DICE layout configurations
- 4.17 LEAP-DICE layout with the SEMU example highlighted
- 4.18 Alternate DICE layout
- 4.19 Simulation "snapshot" for the LEAP-DICE latch structure
- 4.20 Simulated DICE output voltage with different LET levels
- 4.21 Coordinate system for particle strike directions
- 4.22 Layout configuration with projected cross-section regions for DICE
- 4.23 Layout configuration with projected cross-section regions for LEAP-DICE
- 4.24 Error cross-section comparison of DICE and LEAP-DICE as a function of LET
- 4.25 LET upset threshold as a function of tilt angle
- 5.1 Master-slave flip-flop configuration
- 5.2 Simulation test bench for post-layout timing and power measurements
- 5.3 Test chip organization
- 5.4 Die photograph of the 180nm bulk test chip
- 5.5 Test chip clocking scheme
- 5.6 Routing grid example
- 5.7 Standard cell inverter layout example
- 5.8 Complex routing example
- 5.9 Total dose current leakage effect on standard two-edged vs. enclosed-geometry transistors
- 5.10 Enclosed-geometry transistor layout
- 5.11 I/O cell structure
- 5.12 ESD protection circuit
- 5.13 Level-down shifter
- 5.14 Two-stage level-up shifter
- 5.15 Staggered-pitch pad placement
- 5.16 Circuit symbols
- 5.17 Standard "BASIC" D flip-flop
- 5.18 Standard D latch
- 5.19 SCDMR flip-flop
- 5.20 QCDMR flip-flop
- 5.21 QCDMR latch
- 5.22 DCVSL inverter
- 5.23 Clocked DCVSL inverter
- 5.24 DCVSL latch
- 5.25 Three-input DCVSL AND gate
- 5.26 DCVSL "DIFF" flip-flop
- 5.27 Clocked "DICE" latch
- 5.28 "DICE" D flip-flop
- 5.29 "DICE" D flip-flop layout
- 5.30 Close-up of the clocked "DICE" latch layout
- 5.31 "LEAP-DICE" flip-flop layout
- 5.32 Close-up of the clocked "LEAP-DICE" latch layout
- 5.33 Non-overlapping clock generation
- 6.1 Test environment for integrated circuits radiation testing
- 6.2 Test chip socket board
- 6.3 Test bench setup located away from the radiation beam
- 6.4 Close-up of FPGA and CPLD boards
- 6.5 Neutron beam profile at Los Alamos National Laboratory
- 6.6 Clock tree protection against soft errors
- 6.7 Measured soft error performance of flip-flops at 1V
- 6.8 SCDMR layout placement
- 6.9 SCDMR frequency dependence
- 6.10 Soft error reduction in the DIFF design with increasing radiation dose
- 6.11 Design framework of soft-error resilient flip-flops in the "energy – area – soft-error-resilience" space

## Chapter 1

## Introduction

The year was 2000. Mysterious computer server crashes started to plague internet and telecommunications companies across the United States. The server crashes happened on high-end servers made by Sun Microsystems, and the server disruptions were serious enough that they received substantial media coverage [Lyons, 2000]. The root cause: Soft Errors.

What are soft errors? In electronics, a soft error occurs when a signal or datum suddenly becomes corrupted due to charge deposition produced by an energetic particle strike. The data corruption is not a persistent phenomenon, and the system can function normally if it is allowed to reset itself or recover from a previous state. Because the corruption is transient and happens infrequently, a soft error is difficult to reproduce. In the case of Sun Microsystems, infrequent data corruption by cosmic rays (i.e. atmospheric neutrons and protons) in the unprotected cache memory of the affected servers was identified as the root cause of the mysterious server crashes. Sun was not the only company affected by soft errors. Cisco also issued a series of soft error warnings for its networking switches, and now has dedicated customer support related to soft errors in its products [Cisco Systems Inc., 2005].

#### 1.1 A Brief Overview of Soft Errors

Soft errors in electronics have been directly observed since the 1970s. Wallmark and Marcus [1961] first theorized that cosmic rays would eventually limit silicon device dimensions to 10 $\mu$m. Binder et al. [1975] then reported cosmic-ray-induced upsets in space electronics. Soon after, cosmic-neutron-induced upsets at ground level were recorded in the Cray-1 computer at Los Alamos National Laboratory in 1976 [Normand et al., 2010], followed by accounts of alpha-particle-related upsets in dynamic random access memories (DRAM) three years later [May and Woods, 1979]. Dodd and Massengill [2003] gave a detailed history of soft errors in electronics. Benedetto [1998] also described the challenges that soft errors pose for electronics in space applications.

In recent years, as transistor dimensions have scaled down to deep submicron levels, transistors have become more and more sensitive to charge collection due to particle strikes. While major progress in the development of soft error resilience techniques has allowed the soft error rate of individual silicon memory and sequential cells to fall [Borucki et al., 2008], technology scaling also means that more transistors than ever are packed within the same unit area, and the soft error improvement per memory cell has not kept pace with the exponential growth of transistor density. As a result, the overall soft error rate per chip has rapidly increased, and is now a major reliability concern for microprocessors [Karnik and Hazucha, 2004; Baumann, 2005; Meaney et al., 2005; Mitra et al., 2005; Seifert, 2007; Sanda et al., 2008; Gill et al., 2009; Dixit and Heald, 2009; Dixit and Wood, 2011].
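The scaling argument above can be illustrated with a rough first-order sketch. It assumes that the chip-level soft error rate is simply the per-bit rate multiplied by the number of storage bits, and it ignores masking and derating effects. The rate is expressed in FIT (failures per 10^9 device-hours). The starting values and per-generation trends below are hypothetical placeholders chosen only for illustration; they are not measurements from this work, and the function names are our own.

```python
"""Illustrative sketch only: all numbers are hypothetical, not measured data."""


def chip_ser(per_bit_ser_fit, n_bits):
    """First-order chip-level SER (in FIT): per-bit SER times number of bits.

    Ignores derating factors such as logical and timing masking.
    """
    return per_bit_ser_fit * n_bits


def project(generations, per_bit_ser_fit, n_bits, per_bit_scale, density_scale):
    """Return (generation, per-bit SER, bit count, chip SER) for each generation.

    per_bit_scale: assumed per-generation multiplier on per-bit SER
                   (a value below 1 means each bit becomes more robust).
    density_scale: assumed per-generation multiplier on the bit count per chip.
    """
    rows = []
    for gen in range(generations + 1):
        rows.append((gen, per_bit_ser_fit, n_bits, chip_ser(per_bit_ser_fit, n_bits)))
        per_bit_ser_fit *= per_bit_scale
        n_bits = int(round(n_bits * density_scale))
    return rows


if __name__ == "__main__":
    # Hypothetical starting point: 1e-3 FIT/bit and 1 Mbit of storage.
    # Assumed trends: per-bit SER falls 30% per generation, bit count doubles.
    for gen, per_bit, bits, total in project(4, 1e-3, 1_000_000, 0.7, 2.0):
        print(f"gen {gen}: {per_bit:.2e} FIT/bit x {bits:>10d} bits = {total:8.1f} FIT")
```

Under these assumed trends the chip-level rate grows by a factor of 0.7 × 2 = 1.4 per generation, even though every individual bit becomes more robust. Any per-bit improvement slower than the density growth gives the same qualitative result.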

The first soft error hardening efforts were developed in the 1980s, and focused on hardening large random access memory (RAM) arrays to prevent or correct soft errors in stored memory cells, either through static random access memory (SRAM) cell hardening [e.g., Andrews et al., 1982; Weaver et al., 1987] or error correction codes [e.g., Hsiao, 1970; Chen and Hsiao, 1984]. May et al. [1984] discovered that propagation of glitches in combinational logic produced by particle strikes can also lead to soft errors. However, the contribution of soft errors in combinational logic was not substantial enough at the time, and most efforts were concentrated on the design of robust sequential elements. The development of soft-error-resilient SRAM cells [e.g., Diehl et al., 1982; Weaver et al., 1987] later led to soft-error-resilient latch designs such as the Rockett Cell [Rockett, 1988], the Whitaker Cell [Whitaker et al., 1991] and the Dual Interlocked Storage Cell (DICE) [Calin et al., 1996]. Redundancy techniques such as Triple Modular Redundancy and Dual Modular Redundancy soon followed [Mavis and Eaton, 2002; Mitra et al., 2005; Shuler et al., 2005].

#### 1.2 Purpose

The techniques for designing soft-error-resilient sequential elements described in the previous section generally address single errors caused by Single-Event Upsets (SEUs), where charge collection in one circuit node due to a particle strike can flip the value of the stored memory bit. However, with technology scaling, the charge deposited by a particle strike can be simultaneously collected and shared by multiple circuit nodes in the same well [Olson et al., 2005; Seifert, 2007; Amusan et al., 2008]. In addition to charge sharing, the particle strike direction also determines how much charge is deposited, as well as how the charge is distributed among multiple nodes [Amusan et al., 2006; Baze et al., 2008]. When the incoming particle is a proton or neutron, interactions between the incident particle and the silicon nucleus can produce secondary particles that generate multiple ionization tracks [Koga, 1996]. A Single-Event Multiple Upset (SEMU) happens when one energetic particle generates charge collection in multiple circuit nodes, resulting in a soft error [Dodd and Massengill, 2003]. As device dimensions shrink, the probability of SEMUs increases exponentially, and previous techniques focusing on single-node upsets are not as effective against SEMUs. Hence soft error resilience techniques for sequential elements must focus on SEMUs. Since SEMUs have a geometric dependence on circuit layout, the main focus of this work is to bring in integrated circuit layout, specifically the placement of individual transistors, as a new design dimension to mitigate SEMUs in addition to existing circuit and layout techniques for soft error resilience.

#### 1.3 Contributions

The primary contribution of this dissertation is as follows:

- We demonstrated the first silicon implementation of the LEAP layout principle for soft error resilience in digital circuits with the new LEAP-DICE flip-flop. LEAP, or *Layout Design through Error-Aware Transistor Positioning*, is a layout design principle that relies on transistor placement and transistor interactions within the circuit to reduce the overall single event circuit response. LEAP-DICE has the best soft error performance among existing circuit techniques with moderate area and power cost, and achieves a 2,000X improvement in soft error resilience over the standard D flip-flop.

The development of the LEAP-DICE flip-flop also leads to other contributions:

- We developed a framework for soft-error-resilient sequential cell design, by quantifying the performance trade-offs of circuit and layout resilience techniques in the soft-error-resilience, power, delay, and area design space. We experimentally demonstrated the success of this framework with a comprehensive evaluation in silicon of representative circuit and layout techniques using a 180nm CMOS test chip. In this framework, we also introduced a simple, easy-to-use metric called *Soft Error Resilience* to evaluate the robustness of different circuit designs operating under the same conditions.
- We also discovered new soft error effects related to circuit operating conditions such as supply voltage, total radiation dose and clock frequency. From these observations, we concluded that design for soft error resilience must target circuit operating conditions over the lifetime of the application.

#### 1.4 Organization

This chapter provided a brief history of the discovery of soft errors in electronics as a primer for research in soft errors. It then described the earlier development of soft error resilience techniques. With growing concern over SEMUs as a result of technology scaling, we decided to address this problem by developing a new soft-error-resilient cell called LEAP-DICE using a combination of circuit and layout techniques. The new sequential cell design, and the preparation of an effective design framework to evaluate this cell, form the basis of this work.

To familiarize the reader with radiation effects in electronics, we describe various sources of radiation as well as their short-term and long-term effects on electronics in Chapter 2. Chapter 3 provides an overview of existing circuit-level soft-error-resilience techniques. After briefly discussing existing layout techniques, Chapter 4 introduces the new LEAP layout design principle for soft error resilience, then presents the new LEAP-DICE design created by applying this layout principle to the SEU-immune DICE circuit topology. The resulting sequential cell design is both SEU-immune and SEMU-resilient.

In order to evaluate the soft error performance of the new LEAP-DICE flip-flop, we created a test chip containing the new design, plus a baseline design and additional SEU-immune flip-flop designs inspired from existing circuit techniques to form a basis for comparison. Chapter 5 discusses the design and implementation of the 180nm CMOS test chip. Our experimental procedures and results obtained from accelerated radiation testing at Los Alamos National Laboratory and Indiana University Cyclotron Facility are reported in Chapter 6. Finally, Chapter 7 completes this dissertation by reviewing the major contributions of this work and offers new insights into possible future research in soft errors. A bibliography of all references is included at the end to facilitate the reading of this work.

## Chapter 2

## Radiation Effects in Electronics

As transistors become smaller and smaller with technology scaling, they become more sensitive to temporary disruptions caused by energetic particle strikes from the surrounding environment. These disruptions, called Single-Event Effects, are of great concern for the reliable operation of deep submicron electronic circuits. Single-event effects can cause soft errors, where loss or corruption of data occurs but the circuit can operate correctly if allowed to reset, and hard errors, where the circuit can become permanently damaged. This chapter starts with a discussion of how single-event effects can disturb circuit operation, then continues with the radiation sources capable of producing the energetic particles that cause soft errors, along with the different types of radiation particles of concern. Although long-term exposure effects, called Total-Dose Effects, are not of concern in the terrestrial environment, they can impact electronics under accelerated radiation testing (used to estimate the soft error performance within a short period of time), and are briefly discussed in this chapter, followed by conclusions.

#### 2.1 Single-Event Effects

When an energetic particle from the environment hits silicon, it can generate charge through *direct ionization* or *indirect ionization*. Direct ionization happens when the particle bounces off outer electrons of silicon atoms in its traveling path, generating electron-hole pairs in its wake. Indirect ionization happens when a small-size particle interacts with a silicon nucleus to produce various by-products such as heavy ions or charged particles, which can in turn produce charge through direct ionization.

Figure 2.1: Charge generation and collection from *Baumann* [2005, Figure 2].

During direct ionization, the energy lost by the particle is characterized by its Linear Energy Transfer (LET), or kinetic energy loss per unit length. LET is often expressed in units of energy per distance such as MeV/cm, or MeV·cm<sup>2</sup>/mg when normalized by the density of the material [JEDEC Standard, 1996]. In silicon, every 3.6 eV of energy lost produces one electron-hole pair, and charge collection ranging from 1 fC to hundreds of fC often occurs within a few microns of a reverse-biased silicon junction.
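As a rough illustration of the 3.6 eV figure quoted above, the following Python sketch converts deposited energy, or a density-normalized LET, into generated charge. The function names are hypothetical, and the silicon density of 2.33 g/cm<sup>3</sup> is an assumption not stated in the text; the sketch also ignores recombination and collection efficiency, so it gives generated rather than collected charge.

```python
# Rough energy-to-charge conversion for silicon (illustrative sketch only).
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs
EV_PER_PAIR = 3.6                    # energy per electron-hole pair in silicon (from the text)
SI_DENSITY_MG_PER_CM3 = 2330.0       # assumed silicon density (2.33 g/cm^3), not from the text


def generated_charge_fc(energy_mev):
    """Charge in fC generated when energy_mev (MeV) is lost in silicon."""
    pairs = energy_mev * 1e6 / EV_PER_PAIR
    return pairs * ELECTRON_CHARGE_C * 1e15


def charge_per_micron_fc(let_mev_cm2_per_mg):
    """Charge in fC generated per micron of track for a density-normalized LET."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    mev_per_micron = mev_per_cm * 1e-4  # 1 cm = 10^4 microns
    return generated_charge_fc(mev_per_micron)


print(generated_charge_fc(1.0))   # charge generated by 1 MeV of deposited energy
print(charge_per_micron_fc(1.0))  # charge per micron at an LET of 1 MeV*cm^2/mg
```

Under these assumptions, 1 MeV of deposited energy corresponds to roughly 44.5 fC of generated charge, and an LET of 1 MeV·cm<sup>2</sup>/mg to about 10 fC per micron of track.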

As an energetic ion strikes in the vicinity of a reverse-biased silicon junction, electron-hole pairs form around the particle track, where most of the charge generation is concentrated in a cylindrical form. Right after the onset of the event, charge collection rapidly occurs near the depletion region of the silicon junction by drift action within a nanosecond, followed by slower charge collection from diffusion. The overall result is a current pulse shown in Figure 2.1. If the particle strikes in a silicon region where no electric field is present, the electrons and holes generated will not move by drift or diffusion, and will recombine in place, resulting in no charge collection. Baumann [2005] offers a detailed explanation of the charge collection process.

Energetic particle strikes can cause either soft errors (temporary disruption of data) or hard errors (permanent hardware failure). The following subsections discuss their causes and behaviors.

#### 2.1.1 Soft Errors

In the event that a particle strike results in charge collection in a circuit node in a digital circuit, the circuit node voltage can momentarily change. If the magnitude of this voltage change is large enough, a Single-Event Transient (SET, also called a glitch) can happen. The amount of collected charge required is called the Critical Charge, often labeled $Q_{\text{crit}}$ [Dodd and Sexton, 1995]. If the SET is allowed to propagate and persist in a digital circuit, it can create a Soft Error, an erroneous change (or upset) in the state of the circuit. A Single-Event Upset (SEU) is a soft error with only one transistor diffusion node affected by charge collection, whereas a Single-Event Multiple Upset (SEMU) occurs when a particle strike affects multiple transistor diffusion nodes and causes a soft error. The Soft Error Rate (SER) is the rate at which a device or system encounters or is expected to encounter soft errors. It can be expressed as either the number of Failures-In-Time (FIT), or as Mean Time Between Failures (MTBF). FIT is expressed as the number of errors per billion hours (1 FIT = 1 error per $10^9$ hours). MTBF is the average time between individual failures (an MTBF of 1 year corresponds to approximately 114,077 FIT). Please note that a "failure" is not equivalent to a "soft error", since a soft error may not result in an overall system failure, and multiple soft errors may be required to induce a system failure.
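The relationship between FIT and MTBF can be sketched in a few lines of Python. The helper names below are hypothetical, and the year length of 365.25 days is an assumption; with that assumption, an MTBF of one year works out to about 114,077 FIT.

```python
# FIT <-> MTBF conversion (illustrative sketch; helper names are hypothetical).
HOURS_PER_BILLION = 1e9
HOURS_PER_YEAR = 365.25 * 24  # assumed average year of 8766 hours


def fit_to_mtbf_hours(fit):
    """Mean time between failures, in hours, for a given FIT rate."""
    return HOURS_PER_BILLION / fit


def mtbf_years_to_fit(mtbf_years):
    """FIT rate corresponding to an MTBF expressed in years."""
    return HOURS_PER_BILLION / (mtbf_years * HOURS_PER_YEAR)


print(mtbf_years_to_fit(1.0))     # FIT rate for an MTBF of one year
print(fit_to_mtbf_hours(1000.0))  # MTBF in hours for a 1000 FIT component
```

Because FIT rates of independent components add, a system-level estimate can be obtained by summing the per-component FIT values before converting to MTBF.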

#### 2.1.2 Hard Errors

In addition to creating soft errors, single particle strikes can sometimes have long-lasting and potentially damaging effects. If the particle energy is sufficiently large, it can create a *Single-Event Gate Rupture* (SEGR), where the electric field across the transistor gate oxide becomes large enough to exceed the critical breakdown field allowed. The result of the oxide breakdown is a permanent short circuit through the oxide. Interestingly enough, the device scaling of transistor oxide thickness actually increases the electric field tolerance for oxide breakdown [Sexton et al., 1997; Massengill et al., 2001]. SEGR is not a concern for modern CMOS processes at the ground level.

Single-Event Latchup (SEL) is another potentially catastrophic failure mechanism caused by energetic particles [Leavy and Poll, 1969]. When a particle strikes in the vicinity of two neighboring PMOS and NMOS transistors, it can activate the parasitic PNPN structure formed by the NMOS-PMOS pair (see Figure 2.2). The activated parasitic PNPN structure provides a low-impedance path between the power and ground of the circuit and creates a runaway short-circuit current, potentially destroying transistors in the vicinity. Once latchup occurs, the only way to remove this condition is through power cycling (i.e. turning off the power supply then turning it back on). Latchup can be prevented by properly insulating each individual transistor at the process level, or by following proper integrated circuit layout rules, such as placing supply contacts or guard rings (acting as low-resistance paths to supplies) near transistors (see Section 4.2.1). Silicon processes with good isolation, such as Silicon-on-Insulator (SOI), are thus inherently resistant to latchup. Sexton [2003] provides a detailed review of the various destructive single-event effects found in semiconductor devices. Please note that hard errors are not the focus of this work. This work primarily deals with soft errors in the terrestrial environment.

#### 2.2 Sources of Radiation

Soft errors are caused by particles from radiation sources in the environment. The following subsections detail different sources of radiation causing soft errors in electronics.

#### 2.2.1 Near-Earth Radiation Environment

The Near-Earth environment is home to a host of energetic charged particles around Earth held in place by the Earth's magnetic field. Most of these particles originate from either the solar wind or cosmic rays. Particles trapped in this region are mostly electrons and protons as well as heavy ions [Walt, 1994; Hareyama et al., 2007]. In this study, we do not consider electronics operating in space.

Figure 2.2: Parasitic PNPN structure leading to single event latchup from *Sexton* [2003]. (a) Cross-section of a bulk CMOS technology on n-substrate material. (b) Equivalent circuit.

#### 2.2.2 Terrestrial Radiation Environment

Radiation affecting electronics in the terrestrial environment mainly comes from three sources: alpha particles, high-energy cosmic rays and low-energy cosmic rays [Baumann, 2005]. To evaluate the soft error resilience of a system, it is possible to perform accelerated testing using a concentrated radiation source to mimic the actual radiation dose received during its lifetime of operation. Accelerated radiation testing can predict actual field soft error performance lasting several years in a short amount of time (typically minutes to hours) in a laboratory setting. However, devices under irradiation can become damaged by the high intensity of the radiation source, and the soft error rate measured from this process can deviate from the actual field behavior.

The following subsections describe various forms of radiation sources in the terrestrial environment, along with the availability of accelerated testing for these radiation sources.

#### Alpha Particles

Alpha particles, as a major source of soft errors in electronics, were first observed in the 1970s in DRAM memories [May and Woods, 1979]. An alpha particle consists of a nucleus formed by two neutrons and two protons, and is emitted by the nuclear decay of unstable radioactive isotopes such as Uranium-238 ( $^{238}$ U) or Thorium-232 ( $^{232}$ Th) in packaging, and Lead-210 ( $^{210}$ Pb) in solder. Alpha particles are mostly produced at energy levels of less than 10 MeV [Baumann, 2005], and can create charge through direct or indirect ionization. Since an alpha particle is positively charged, it creates an ionizing path when it travels through silicon until it loses all its energy (being "stopped"). In silicon, the traveling range of a 10-MeV alpha particle is less than 100  $\mu$ m. Therefore, only alpha particles produced close to the silicon die (i.e. from packaging) can cause soft errors. Alpha particles used to be of great concern, but can be mitigated by making sure packaging material is not contaminated with radioactive isotopes.

To perform accelerated alpha-particle testing on integrated circuits, one can simply place a thin radioactive foil such as Californium-252 ( $^{252}$ Cf) directly over the silicon die [Koga, 1996].

#### Neutrons

Atmospheric neutrons are also a major source of soft errors in electronics. When cosmic rays reach the Earth's atmosphere, they produce a chain of nuclear interactions with the atmosphere, producing muons, protons, neutrons and pions that reach sea level [Ziegler and Lanford, 1980]. Figure 2.3 shows the distribution of cosmic ray particles at ground level. Among the cosmic ray particles found, neutrons are the main source of soft errors, as electronic chips cannot be well shielded from the cosmic neutron flux using conventional means (metal shielding, better shielding materials etc.): one foot of concrete can barely reduce the neutron flux by a factor of 1.4 [Dirk et al., 2003]. The neutron flux depends on altitude: going from sea level to 10,000 feet, the neutron flux increases by about 10X.

Figure 2.3: Theoretical sea-level cosmic rays from *Ziegler* [1998].
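To give a feel for why shielding is impractical, the sketch below estimates the concrete thickness needed for a given flux reduction from the factor of 1.4 per foot quoted above. It assumes simple exponential attenuation with thickness, which is an assumption made here for illustration and not a model taken from the cited reference; the function name is hypothetical.

```python
import math


def concrete_thickness_ft(target_reduction, reduction_per_ft=1.4):
    """Feet of concrete needed to reduce neutron flux by target_reduction.

    Assumes each additional foot reduces the flux by the same factor
    (exponential attenuation), which is a simplifying assumption.
    """
    return math.log(target_reduction) / math.log(reduction_per_ft)


print(concrete_thickness_ft(10.0))  # thickness for a 10X flux reduction
```

Under this assumption, offsetting the roughly 10X flux increase seen at 10,000 feet would take close to seven feet of concrete.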

Since neutrons have no charge themselves, they can only generate charge through indirect ionization by interacting with silicon nuclei or nuclei of other elements (dopants or metal) present in the chip [Baumann, 2005]. When a neutron collides with the nucleus of an atom, it can break the nucleus into multiple fragments (inelastic collision) or displace it (elastic collision). The fragments or recoil products can become a lighter ion with additional smaller particles (neutrons, protons or alpha particles), and they will move through silicon by bouncing off outer electrons of neighboring atoms in their ionization path before being completely stopped in silicon.

Low-energy neutrons, or thermal neutrons, tend to be absorbed through inelastic collision. Most importantly, very low energy neutrons ($\ll 1$ MeV) can interact with the Boron implant found in semiconductor doping. Boron naturally occurs with two isotopes: Boron-11 ($^{11}$B, 80.1% abundance) and Boron-10 ($^{10}$B, 19.9% abundance). The $^{10}$B isotope is unstable in the presence of neutrons, and has a reaction cross-section three to seven orders of magnitude higher than other isotopes found in semiconductor materials. When a $^{10}$B nucleus absorbs a thermal neutron, it breaks apart into an excited Lithium-7 ($^{7}$Li) recoil nucleus and an alpha particle capable of inducing soft errors. The $^{10}$B isotope can be found as p-type silicon doping implant and as implant in the BPSG dielectric layer, but its occurrence is three orders of magnitude more likely in BPSG than in silicon implant doping. For conventional processes containing BPSG, BPSG is the main source of soft errors due to boron reactions. It is possible to reduce SER due to $^{10}$B activation by eliminating the BPSG layer from the silicon process, or by enriching BPSG with the $^{11}$B isotope [Baumann and Hossein, 1995].

It is possible to perform accelerated neutron testing at one of the following facilities:

- Los Alamos Neutron Science Center (LANSCE) at Los Alamos National Laboratory in Los Alamos, NM.
- Tri-University Meson Facility at the University of British Columbia (TRIUMF) in Vancouver, Canada.
- Atmospheric-like Neutrons from thick Target (ANITA) at Svedberg Laboratory, Uppsala University in Uppsala, Sweden.
- Research Center for Nuclear Physics (RCNP) at Osaka University in Osaka, Japan.
- Vesuvio Beamline (ISIS) at Rutherford Appleton Laboratory, Oxfordshire, United Kingdom.

In this work, we used the ICE House at LANSCE to perform accelerated neutron testing [LANSCE, 2011]. The neutron testing is discussed in Section 6.2.1.

#### Protons

Atmospheric protons can cause soft errors in electronics. Low-energy protons can be easily absorbed by the atmosphere or most shielding materials and are not a concern for soft errors. At high energy levels (> 100 MeV), protons behave similarly to neutrons in that they generate charge mostly through indirect ionization. However, high-energy proton flux at the ground level (as seen from Figure 2.3) is not very significant compared to the neutron flux, and high-energy protons are often lumped together with high-energy neutrons for study purposes. In the near-Earth environment, protons are a significant source of radiation, and can both cause soft errors and induce significant long-term damage to electronics similar to transistor aging.

Accelerated proton beam facilities can be found in both dedicated research beam accelerators and cancer treatment facilities, as proton therapy is commonly used for treatment of certain types of cancer such as prostate cancer, pediatric neoplasms, brain cancer and lung cancer. In this study, we performed accelerated proton beam testing at the Indiana University Cyclotron Facility in Bloomington, Indiana [IUCF, 2011]. The proton testing is discussed in Section 6.2.2.

#### Heavy Ions

Heavy ions are energetic ionized atoms heavier than helium. Heavy ions are mostly found in the space environment, but can also be found as secondary products of neutron or proton nuclear interaction with silicon, metal or silicon dopants such as <sup>10</sup>B (discussed previously in Section 2.2.2) [Baumann, 2005]. When a heavy ion hits silicon, it creates charge only through direct ionization, and becomes completely stopped in silicon. Due to their large size, their energy is entirely transferred to silicon and they can generate substantially more charge than small particles such as neutrons and protons. This work does not consider heavy ions for soft errors, although they also have similar single event effects in electronics.

#### 2.3 Total Dose Effects

When circuits operate for a long time under radiation exposure, they can be subject to slowly varying but lasting effects called Total Dose Effects. Most of the radiation-induced long-term damage to CMOS transistors is located in the silicon oxide layer, where additional trap centers are created. Damage to both the field oxide and the gate oxide can change the behavior of transistors, affecting mostly the threshold voltage, leakage current, transconductance (or current gain) and noise performance of the transistors [Wang, 2009, Section 2.3]. In general, accumulated radiation dose increases the threshold voltage of PMOS transistors and decreases that of NMOS transistors. For digital circuits, these total-dose effects translate into leakage current (and power) increase, rise/fall transition mismatch (falling transitions become faster while rising transitions become slower) and delay distribution expansion (where the minimum delay path may become faster over time and the maximum delay path slower). These effects are very similar to transistor aging effects in deep submicron processes such as Negative Bias Temperature Instability (NBTI) and Positive Bias Temperature Instability (PBTI).

Although this work primarily deals with radiation effects in the terrestrial environment where total-dose effects are minimal, it is important to note that devices can experience these effects under accelerated radiation testing. Total-dose effects affecting the soft error resilience of sequential circuits are reported in Section 6.2.2.

#### 2.4 Conclusion

This chapter presented an introduction to radiation effects in electronics. Single-event effects, the focus of this work, were divided into soft errors and hard errors. Various soft error terminologies integral to this work were first given, followed by brief descriptions of hard errors such as single-event latchup and single-event gate rupture. To better understand where and how single-event effects occur, various radiation environments and types of radiation particles affecting transistors were described. Accelerated radiation testing offers the possibility of quickly estimating the soft error performance of electronics within a short time frame. However, total-dose effects, although absent in the normal terrestrial environment, can pose a challenge to circuit operation and performance during accelerated radiation testing, and were explained later in this chapter. To ensure reliable operation of digital circuits, the aforementioned radiation effects, specifically single-event effects, must be taken into account in the design of soft-error-resilient circuits. The following chapter elaborates on various circuit design techniques capable of addressing single-event upsets, and new layout design techniques are developed in Chapter 4. Experimental results from radiation experiments in Chapter 6 confirm the effectiveness of these design techniques against single-event effects, and reveal new circuit effects caused by the radiation effects discussed in this chapter.

## Chapter 3

# Circuit Soft Error Resilience Techniques

Radiation-induced soft errors have been an important design issue for space-bound system applications. A designer can attempt to reduce the number of soft errors by either making individual transistors or circuits more robust to soft errors, or accept that soft errors may occur and use error detection and correction to recover from data corruption. To prevent soft errors from occurring, *Radiation Hardness-By-Design* (RHBD) techniques can be developed at the circuit abstraction level, where the operation of a circuit inherently prevents temporary disruption of a circuit node due to a radiation strike from propagating and generating a soft error.

This chapter presents an overview of existing circuit techniques for soft error resilience. The chapter begins with an introduction to how digital circuit designers traditionally perceive radiation-induced soft errors. Soft errors are then categorized by the type of digital logic where they originate (combinational logic and sequential logic). Soft errors originating from combinational logic typically have substantially lower error rates than those from sequential logic. Therefore, most of the current RHBD efforts are focused on making sequential elements more robust. To give a general understanding of how circuit design can be used to reduce soft errors in sequential logic, this chapter first describes RC-based circuit techniques used to make each circuit node more resistant to charge collection. RC techniques harden a circuit node by increasing the time constant of the circuit relative to the time constant of single event charge collection. With device scaling, circuits become faster and consume less power, and RC techniques impose a large performance penalty on deep submicron circuits.

Circuit redundancy techniques, described next in this chapter, provide an efficient alternative for reducing soft errors. The main forms of circuit redundancy, such as Triple Modular Redundancy (TMR) and Dual Modular Redundancy (DMR), can offer several orders of magnitude in soft error reduction by replicating identical circuit elements, in the hope that a particle strike will only affect some but not all circuit elements, so that the circuit can recover from partial failure by voting on the results of each replicated element. These circuit redundancy schemes can make sequential circuits immune to Single-Event Upsets, or single errors due to radiation strikes affecting a single circuit node. This chapter places more emphasis on DMR due to its lower associated design costs, by describing an assortment of DMR circuit-level techniques successfully used in the past or currently in use. The chapter then concludes with a discussion of the shortcomings of the circuit-level soft error resilience techniques, specifically circuit vulnerability to Single-Event Multiple Upsets (SEMUs), where a single particle strike can cause charge collection in multiple circuit nodes.

#### 3.1 Soft Error Generation

In the past, circuit techniques for soft-error-resilient sequential cells generally targeted soft errors due to upsets on a single circuit node, or Single Event Upsets (SEUs). For circuit designers, charge collection on a single circuit node due to a particle strike was thought of as a short-duration current pulse producing a voltage glitch, called a Single Event Transient (SET). SETs are often modeled as rectangular or double exponential current pulses (see Figure 3.1) [Messenger, 1982], instead of a more accurate current model involving complex device-level simulations such as the one shown in Figure 2.1. For technology nodes with feature sizes larger than $0.25\,\mu\mathrm{m}$, this assumption was mostly valid, as Single Event Multiple Upsets (SEMUs), or simultaneous upsets on multiple circuit nodes due to a single particle strike, still formed a small portion of the overall Soft Error Rate (SER) (see Figure 3.2) [Seifert et al., 2006].
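The double exponential pulse mentioned above is commonly written as $I(t) = \frac{Q}{\tau_f - \tau_r}\left(e^{-t/\tau_f} - e^{-t/\tau_r}\right)$, where $Q$ is the total collected charge and $\tau_r$, $\tau_f$ are the rise and fall time constants. The Python sketch below evaluates this form and checks numerically that the pulse integrates to $Q$. The function names and the parameter values in the usage lines are hypothetical and chosen only for illustration; they are not taken from this work.

```python
import math


def double_exp_current(t, q_total, tau_rise, tau_fall):
    """Current (A) at time t (s) of a double exponential pulse carrying q_total (C)."""
    if t < 0.0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def integrated_charge(q_total, tau_rise, tau_fall, t_end, steps=100_000):
    """Trapezoidal integral of the pulse from 0 to t_end, in coulombs."""
    dt = t_end / steps
    total = 0.0
    previous = double_exp_current(0.0, q_total, tau_rise, tau_fall)
    for k in range(1, steps + 1):
        current = double_exp_current(k * dt, q_total, tau_rise, tau_fall)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total


# Hypothetical pulse: 100 fC total, 10 ps rise constant, 200 ps fall constant.
q = integrated_charge(100e-15, 10e-12, 200e-12, t_end=5e-9)
print(q)  # should be close to the 100 fC used to scale the pulse
```

The normalization by $\tau_f - \tau_r$ is what makes the area under the pulse equal to the collected charge, so the same function can be reused with different time constants without rescaling.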

The proportion of SEMUs in all soft errors increases exponentially as device dimensions and distances are reduced. Chapter 4 addresses the growing SEMU concern through transistor layout techniques.

Soft errors can be categorized in two groups: soft errors in combinational logic and soft errors in sequential logic. A soft error can be generated when a particle hit produces a SET (glitch) on a combinational logic gate: the SET can propagate down the combinational path and become latched by a sequential cell, producing a soft error (Figure 3.3) [Mavis and Eaton, 2002; Dodd and Massengill, 2003].

In order for the SET in the combinational logic to produce a soft error, several conditions must be satisfied:

1. The produced SET must be a voltage pulse with sufficient amplitude (larger than the input noise margin) and duration to propagate downstream.
2. The logic path leading to the sequential cell must be sensitized to allow the SET to propagate to the input of the sequential cell.
3. The SET must arrive at the data input of the sequential cell inside the cell's vulnerable input latching window (Figure 3.4).
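The three conditions above can be read as a conjunction of electrical, logical and timing checks. The following Python sketch expresses them as a single predicate. All names, the minimum-width parameter and the overlap test used for the latching window are assumptions made here for illustration, not a model taken from the cited references.

```python
from dataclasses import dataclass


@dataclass
class Transient:
    amplitude_v: float  # peak voltage excursion of the SET
    width_s: float      # duration of the SET
    arrival_s: float    # arrival time at the sequential cell's data input


def set_causes_soft_error(pulse, noise_margin_v, min_width_s,
                          path_sensitized, window_start_s, window_end_s):
    """Return True only if all three latching conditions hold."""
    # Condition 1: enough amplitude and duration to propagate downstream.
    strong_enough = (pulse.amplitude_v > noise_margin_v
                     and pulse.width_s >= min_width_s)
    # Condition 2: the logic path to the sequential cell is sensitized.
    # Condition 3: the pulse overlaps the vulnerable latching window
    # (overlap is an assumed interpretation of "inside the window").
    pulse_end_s = pulse.arrival_s + pulse.width_s
    in_window = pulse.arrival_s < window_end_s and pulse_end_s > window_start_s
    return strong_enough and path_sensitized and in_window


# Hypothetical example values, chosen only to exercise the predicate.
glitch = Transient(amplitude_v=1.2, width_s=150e-12, arrival_s=900e-12)
print(set_causes_soft_error(glitch, 0.6, 50e-12, True, 950e-12, 1050e-12))
```

Failing any one of the three checks masks the transient, which is why combinational soft errors are comparatively rare.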

However, combinational soft errors still form a very small portion of the overall soft error rate, and typical soft error resilience strategies target sequential soft errors. When an energetic particle strikes a sequential cell in its retention state, sufficient charge can be collected by the circuit node storing the memory content to alter the memory state. To prevent soft errors in sequential cells, we can harden individual circuit nodes to make them more resilient to charge collection. The likelihood that an individual node becomes upset depends on its node capacitance, the voltage difference needed to flip its logic value as well as the strength (or lack) of the driver maintaining the node voltage. Resistor- and capacitor-based techniques discussed in Section 3.2 can reduce the soft error rate by targeting these conditions to protect against individual node upsets, but they do not provide full immunity against SEUs and can affect circuit performance. It is possible to make sequential cells fully SEU-immune by making use of circuit node redundancy (i.e. storing the same memory bit in multiple locations). Sections 3.3 and 3.4 present efficient SEU-immune redundancy-based circuit techniques where circuit nodes are replicated, and single errors among multiple copies of the same node can be detected and corrected. Table 3.1 summarizes the RC-based and redundancy-based circuit techniques to be discussed in this chapter.

Figure 3.1: Rectangular and double exponential pulse models for single event transients.

Figure 3.2: Comparison of published single-event multiple upset probabilities (adapted from *Seifert et al.* [2006]).

Figure 3.3: Combinational and sequential soft errors from *Dodd and Massengill* [2003].

## 3.2 RC-Based Soft Error Resilience Techniques

To understand how to design radiation-hardened sequential elements, one must first understand how sequential elements in general respond to radiation-induced charge collection. CMOS memory storage elements can have their data content temporarily stored as floating charge (for example, in a capacitor in a Dynamic Random Access Memory (DRAM) cell or in a floating gate in Flash Memory), or maintained by active feedback circuitry (such as a pair of cross-coupled inverters in a static latch or a Static Random Access Memory (SRAM) cell).

Figure 3.4: Vulnerable timing window for combinational SETs latching into sequential cells from *Mavis and Eaton* [2002].

DRAM cells typically have a more compact cell area due to the lack of feedback circuitry, but are prone to bitline current leakage as well as radiation-induced charge collection impacting their stored charge. Therefore, most DRAM memory arrays have frequent charge refresh cycles as well as built-in Error Correcting Code (ECC) circuitry, and do not in general require additional circuit hardening beyond the built-in error correction circuitry. Due to the constant memory refresh of DRAM cells, the soft error rate of DRAM is about two orders of magnitude lower than that of SRAM [Doucin et al., 1997].

CMOS flash memory is more resilient to single-event charge collection compared to SRAM or DRAM, since the floating gate in each cell is not electrically connected to the semiconductor substrate or junctions where most of the radiation-induced charge collection occurs [Doucin et al., 1997; Fogle et al., 2004]. However, the floating gate structures are sensitive to voltage threshold shifts and leakage current due to long-term ionizing dose exposure, and soft error rates increase substantially with increasing total dose exposure [Nguyen et al., 1999; Nguyen and Sheik, 2003; Bagatin et al., 2010]. The discussion on the impact of radiation damage on flash memory is outside the scope of this work.

| Family                          | Technique                                 | Design                                | Description                                                                  |
|---------------------------------|-------------------------------------------|---------------------------------------|------------------------------------------------------------------------------|
| Triple Modular Redundancy (TMR) | Spatial Sampling                          | Spatial Sampling Latch <sup>(1)</sup> | Triplication of storage elements with a majority voter at the output.        |
|                                 | Temporal Sampling                         | Temporal Sampling Latch <sup>(1)</sup> | Three time-delayed storage nodes filtered with 3-input majority voter.       |
| Dual Modular Redundancy (DMR)   | C-Element                                 | Guard Gate <sup>(2)</sup>             | SET filter by comparing a signal and its delayed copy.                       |
|                                 |                                           | BISER <sup>(3)</sup>                  | Duplicated unprotected storage elements and C-Element voter with keeper.     |
|                                 |                                           | 4-TAG <sup>(4)</sup>                  | Modified latch with built-in keeperless C-Elements in duplicated loop paths. |
|                                 | Half-Transition NAND gate                 | SERT <sup>(5)</sup>                   | Latch modified from 4-TAG.                                                   |
|                                 | Differential Cascode Voltage Switch Logic | DCVSL <sup>(6)</sup>                  | Duplicated circuit nodes for every logic gate.                               |
|                                 | Dual-Interlocked Storage Cell             | DICE <sup>(7)</sup>                   | Modified latch with duplicated circuit nodes.                                |

Table 3.1: Overview of circuit-level techniques for soft error resilience<sup>1</sup>.

Most sequential cells maintain their data content by actively driving it in a feedback circuit loop. The simplest form of sequential cell is a pair of cross-coupled inverters storing a single bit in an SRAM cell, shown in Figure 3.5. In this circuit configuration, a memory bit is stored as a voltage and its complement in circuit nodes A and B. When a radiation strike hits an "off" NMOS transistor (called the "struck transistor"), negative charge collects on its drain node, creating a negative voltage transient on the drain node and potentially upsetting the logic-high value of node A.

<sup>1</sup> (1) Mavis and Eaton [2002]. (2) Balasubramanian et al. [2005]. (3) Mitra et al. [2005]. (4) Shuler et al. [2005]. (5) Shuler et al. [2009]. (6) Casey et al. [2005]. (7) Calin et al. [1996].



Figure 3.5: SEU response model in an SRAM cell adapted from *Dodd and Massengill* [2003].

This initial charge collection step is represented by the first exponential of the double exponential model in Figure 3.1. If the collected charge does not reach the critical charge level (the minimum charge required to upset the memory cell, often labeled as $Q_{\rm crit}$ or $Q_{\rm c}$) [Dodd and Sexton, 1995], the "on" PMOS transistor, which is connected to the "off" NMOS transistor by its drain node and acts as a "restoring transistor," will remove the excess charge and restore the voltage of the circuit node connecting the "struck transistor" and the "restoring transistor." The restoring action is represented by the second exponential of the double exponential model in Figure 3.1. However, if the radiation-induced charge collection exceeds the critical charge, the "restoring transistor" may not recover the upset drain node voltage in time, and the upset node voltage can propagate to the complementary circuit node B through the opposite inverter in the feedback loop and flip the overall stored bit (i.e. both circuit nodes A and B are upset).

The likelihood that the SRAM cell is upset depends on the amount of charge deposited, the capacitance of the affected circuit node and the drive strength of the restoring transistor. The goal of hardening the SRAM cell against single-event inducing radiation strikes is to reduce the width and/or amplitude of the transient



Figure 3.6: RC-hardened SRAM cells. (a) Hardened SRAM cell using Metal-Insulator-Metal capacitors. (b) Hardened SRAM cell using highly resistive polysilicon resistors.

voltage pulse so that the SRAM cell becomes less sensitive to charge collection. One way to improve the soft error resilience of the SRAM cell is to substantially increase the storage node capacitance in each cell without any area impact by introducing a Metal-Insulator-Metal (MIM) capacitor between the polysilicon and Metal 1 layers of the cell [Geppart, 2004; Roche et al., 2004; Roche et al., 2005; Lysinger et al., 2008] (Figure 3.6a). It is also possible to harden the SRAM cell by adding intracell decoupling polysilicon resistors in series with the gate terminal of each inverter inside the SRAM cell, as shown in Figure 3.6b [Diehl et al., 1982; Weaver et al., 1987; Rockett, 1992]. The addition of either high-density capacitors or resistors requires silicon process enhancements and is not covered in this work. Moreover, these techniques do not provide full immunity against SEUs, and it is possible to make cells even more robust without any special process enhancement using the redundancy-based techniques discussed in the next two sections.
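As a rough first-order illustration of why added capacitance helps (a sketch, not a result from the cited works), the amplitude of the transient on a storage node can be approximated by the collected charge divided by the node capacitance. The symbols $Q_{\rm coll}$, $C_{\rm node}$ and $C_{\rm MIM}$ are introduced only for this sketch, and the current of the restoring transistor is ignored:

$$
\Delta V \approx \frac{Q_{\rm coll}}{C_{\rm node}}
\quad\longrightarrow\quad
\Delta V_{\rm hardened} \approx \frac{Q_{\rm coll}}{C_{\rm node} + C_{\rm MIM}}
$$

Under this approximation, doubling the total node capacitance halves the transient amplitude for the same collected charge. The series resistors act differently: together with the gate capacitance they slow the feedback path, so a transient must persist longer before it can flip the opposite inverter.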

## 3.3 Triple Modular Redundancy

Most of the effort on circuit-level soft error mitigation techniques in the past two decades has been on providing full immunity to single soft errors through circuit redundancy. The most popular technique, called *Triple Modular Redundancy* (TMR),



Figure 3.7: Triple modular redundancy latches from *Mavis and Eaton* [2002]. (a) Spatial sampling latch. (b) Temporal sampling latch.

involves replicating a storage node three times and adding a three-input majority gate to filter out unwanted SETs or SEUs [Mavis and Eaton, 2002]. When a single error occurs in any of the three storage nodes, a three-input majority gate can recover the correct value through voting. Figure 3.7 shows two implementations of Triple Modular Redundancy. Spatial Sampling (shown in Figure 3.7a) involves the use of three identical latches (or flip-flops) to store one memory bit. Whenever a single error occurs in any of the three elements, the majority gate at the output can recover the correct result.
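The voting step can be sketched in a few lines. The function names below are hypothetical, and the model is purely logical (it ignores timing and any upset of the voter itself).

```python
def majority(a, b, c):
    """Three-input majority function on single bits (0 or 1)."""
    return (a & b) | (b & c) | (a & c)

def tmr_read(stored_bit, flipped_copy=None):
    """Read a bit stored in three redundant elements.

    flipped_copy is the index (0, 1 or 2) of the one element that was
    upset, or None if no upset occurred.
    """
    copies = [stored_bit] * 3
    if flipped_copy is not None:
        copies[flipped_copy] ^= 1  # a single upset inverts one copy
    return majority(*copies)

# Any single upset is outvoted by the two intact copies.
for bit in (0, 1):
    for idx in (None, 0, 1, 2):
        print(bit, idx, tmr_read(bit, idx))
```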

The use of delay elements with delay  $\delta$  and  $2\delta$  to create three delayed versions of the clock for sampling, also called *Temporal Sampling*, can prevent SETs on the clock and data inputs, of widths smaller than  $\delta$ , from simultaneously corrupting the three storage elements, since each latch or flip-flop now samples at clock edges separated by  $\delta$  in time. Temporal sampling can also be incorporated directly into the latch structure by replicating the storage node in time using the same delay elements (Figure 3.7b).
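A small timing sketch illustrates why the sample spacing bounds the SET width that can be tolerated. The function and parameter names are hypothetical, times are in arbitrary integer units, and the SET is modeled as a simple inversion of the data during its duration.

```python
def majority(a, b, c):
    """Three-input majority function on single bits."""
    return (a & b) | (b & c) | (a & c)

def temporal_sample(data_bit, set_start, set_width, t_sample, delta):
    """Sample data_bit at t_sample, t_sample + delta and t_sample + 2*delta,
    then vote. A transient inverts the data during
    [set_start, set_start + set_width)."""
    def observed(t):
        glitch = set_start <= t < set_start + set_width
        return data_bit ^ int(glitch)
    samples = [observed(t_sample + k * delta) for k in range(3)]
    return majority(*samples)

# SET narrower than delta: at most one sample is corrupted.
print(temporal_sample(1, set_start=950, set_width=80, t_sample=1000, delta=100))
# SET wider than delta: two samples can be corrupted and the vote fails.
print(temporal_sample(1, set_start=990, set_width=150, t_sample=1000, delta=100))
```

Because consecutive sampling instants are separated by `delta`, a transient shorter than `delta` can overlap at most one of them, which is the condition the majority gate needs.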

Triple Modular Redundancy, especially using spatial sampling such as Figure 3.7a, is popular among ASIC (Application Specific Integrated Circuit) and FPGA (Field Programmable Gate Array) designers, since it does not introduce any new circuit element to the existing standard cell library. However, TMR requires at least 3X area cost, 3X power cost and moderate delay cost due to the triplication of the sequential cells and the addition of the delay elements and majority gate.

## 3.4 Dual Modular Redundancy

In order to achieve the same soft error resilience as TMR but with less area and power cost, we can incorporate Dual Modular Redundancy, or DMR, in sequential cell design [Calin et al., 1996; Shuler et al., 2005; Mitra et al., 2005; Casey et al., 2005]. The main idea behind DMR is to use duplicated circuit elements as a form of redundancy to obtain soft error resilience, and in general the power and area cost required is slightly higher than double that of a regular design. Though capable of achieving nearly identical soft error resilience to TMR at a much reduced area and power cost, DMR has not been incorporated in the standard ASIC design flow on which the majority of IC design is based, as DMR requires specialized circuit designs outside the standard cell library typically provided by ASIC vendors. Also, for large memory arrays, error correcting codes can replace the need for individually hardened memory cells. Despite its limited use, DMR has a rich assortment of circuit techniques that can address SEUs, and it is worth studying these techniques to understand how interactions between transistors in a redundant circuit can protect against soft errors. Chapter 4 takes inspiration from these circuit interactions to develop new ways of placing individual transistors that exploit these interactions to improve the soft error resilience of the circuit.

#### 3.4.1 C-Element Based Soft-Error-Resilient Latches

When we discussed TMR in the previous section, the first thing that came to mind was the three-input voter capable of filtering out one error stored in three memory



Figure 3.8: The C-Element. (a) Circuit implementation with optional keeper. (b) Truth table. (c) Symbolic representation [Muller and Bartky, 1959].

elements storing the same bit. One wonders whether it is possible to do so with only two memory elements (i.e. DMR). If indeed a single upset happens in one of the memory elements, we can only observe that both elements contain different bits, but we cannot determine which element is erroneous. But what if we *knew* what the correct value should be before the single upset?

Muller and Bartky [1959] developed the C-Element in asynchronous circuit theory as a two-input SET filter capable of correcting single upsets in DMR-based designs (replacement for the three-input majority voter found in TMR). The C-Element passes the value of its inputs to its output when both inputs are equal, or retains its previous output value otherwise (Figure 3.8a). Figure 3.8b shows the logic function of the C-Element, and Figure 3.8c shows the symbols for the C-Element used in this thesis. The keeper-less C-Element in Figure 3.8c is sometimes called a "Guard Gate" or a "transition NAND Gate", and is much more vulnerable to circuit noise due to the lack of the keeper [Balasubramanian et al., 2005; Shuler et al., 2006].
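The behavior just described can be captured by a small state-holding model. This is a behavioral sketch with a hypothetical class name; it models the non-inverting logic function only and says nothing about the transistor-level implementation in Figure 3.8a.

```python
class CElement:
    """Behavioral model of a two-input C-Element with keeper."""

    def __init__(self, initial=0):
        self.out = initial  # value held by the keeper

    def update(self, a, b):
        """Pass the inputs through when they agree, otherwise hold."""
        if a == b:
            self.out = a
        return self.out

c = CElement(initial=0)
print(c.update(1, 1))  # inputs agree: output follows them
print(c.update(0, 1))  # inputs differ: previous output is retained
print(c.update(0, 0))  # inputs agree again: output follows them
```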

The C-Element can be used to filter out unwanted SETs in combinational logic entering a sequential element, as shown in Figure 3.9. In this case, any SET of duration less than  $\delta$  produced in the combinational logic or at the inputs to the



Figure 3.9: SET filtering of a latch input using C-Element from *Balasubramanian et al.* [2005].

C-Element cannot propagate to the latch input. However, this configuration is still vulnerable to SETs with pulse widths larger than  $\delta$ , as well as SETs produced by a particle strike on the output of the C-Element. In some cases, the contribution of SETs from the combinational logic block can be large enough to merit the use of guard gate as an effective SET suppression method.
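The width limit can be reproduced with a discrete-time sketch in which the C-Element compares a signal with a copy of itself delayed by a fixed number of samples. The function name and the sample-based time axis are assumptions made for illustration.

```python
def c_element_filter(signal, delay, initial=0):
    """Filter a sampled signal with a C-Element fed by the signal and
    a copy of it delayed by `delay` samples."""
    out = []
    state = initial
    for t, a in enumerate(signal):
        # Delayed input; before `delay` samples have elapsed, use `initial`.
        b = signal[t - delay] if t >= delay else initial
        if a == b:          # inputs agree: output follows them
            state = a
        out.append(state)   # inputs differ: output holds
    return out

short_set = [0] * 5 + [1] * 2 + [0] * 5   # transient of 2 samples
long_set = [0] * 5 + [1] * 4 + [0] * 5    # transient of 4 samples

print(c_element_filter(short_set, delay=3))  # transient is suppressed
print(c_element_filter(long_set, delay=3))   # transient reaches the output
```

A legitimate data transition still passes through this filter, but only after the delayed copy has caught up, which is the delay cost of the technique.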

By using the C-Element to vote on the outputs of two identical unhardened master-slave flip-flops, we can construct a soft-error-resilient sequential element called Single C-Element Dual Modular Redundancy (SCDMR) flip-flop (Figure 3.10), similar to the BISER flip-flop from *Mitra et al.* [2005] and *Zhang et al.* [2006]. Like the TMR scheme described in Section 3.3, SCDMR uses voting to detect and correct single errors: the two-input C-element acts as the two-input voting gate for two identical flip-flops. The soft error resilience of SCDMR is based on the idea that whenever a mismatch occurs at the outputs of the duplicated storage elements, the C-Element voting circuit blocks the outputs of the identical flip-flops and the keeper retains the previous correct value. When the C-Element is in blocking mode (its inputs are different), the keeper acts as a temporary storage for the previously correct value (before any error occurred on one of the latches), and any additional upset on the keeper can permanently change the output of the sequential cell.
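The blocking behavior and its residual weakness can be sketched behaviorally. The class and method names below are hypothetical, and the model abstracts each flip-flop to a single stored bit.

```python
class SCDMRFlipFlop:
    """Behavioral sketch: two unhardened storage copies voted on by a
    C-Element with keeper."""

    def __init__(self, bit=0):
        self.ff = [bit, bit]   # the two redundant storage copies
        self.keeper = bit      # value held at the C-Element output

    def upset(self, index):
        """Single-event upset on one of the two storage copies."""
        self.ff[index] ^= 1

    def upset_keeper(self):
        """Single-event upset on the keeper node."""
        self.keeper ^= 1

    def output(self):
        if self.ff[0] == self.ff[1]:
            self.keeper = self.ff[0]  # copies agree: output follows them
        return self.keeper            # copies differ: keeper holds

f = SCDMRFlipFlop(bit=1)
f.upset(0)
print(f.output())   # one copy upset: C-Element blocks, output stays correct
f.upset_keeper()
print(f.output())   # keeper upset while blocking: output is now wrong

g = SCDMRFlipFlop(bit=1)
g.upset(0)
g.upset(1)
print(g.output())   # both copies upset: they agree on the wrong value
```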



Figure 3.10: Single C-Element Dual Modular Redundancy (SCDMR) master-slave flip-flop.

The SCDMR flip-flop also has an additional vulnerability to upsets occurring simultaneously or separately on both flip-flops, as the identical unhardened latches cannot recover on their own. When operating at high clock frequencies (> MHz), upsets in the SCDMR flip-flops are dominated by SEMUs. If the SCDMR latch operates at very low frequencies (<< Hz) or remains in a retention (opaque) mode where the clock is disabled for some time (this can be the case if the circuit is in standby mode), separate upsets caused by different particle strikes can accumulate and upset this sequential element.
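The dependence on retention time can be made concrete with a simple probabilistic sketch. It assumes that upsets on the two flip-flops are independent Poisson events with the same rate and that an upset copy is never flipped back; both the model and the numerical values are illustrative assumptions rather than results from this work.

```python
import math

def p_double_upset(rate_per_ff, hold_time):
    """Probability that both storage copies are upset by separate strikes
    within hold_time, for independent Poisson upsets at rate_per_ff."""
    # Probability that one copy sees at least one upset: 1 - exp(-rate * T).
    p_single = -math.expm1(-rate_per_ff * hold_time)
    return p_single ** 2

rate = 1e-9  # upsets per second per flip-flop (hypothetical value)
for hold in (1e-6, 1.0, 3600.0):  # retention time in seconds
    print(f"hold {hold:g} s: {p_double_upset(rate, hold):.3e}")
```

For small values of the product of rate and hold time, the result grows roughly with the square of the hold time, which is why long retention periods matter far more than short clock cycles in this model.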

Despite the more than double area/power penalty and moderate delay penalty imposed on SCDMR due to the presence of two identical flip-flops and the C-Element, the cost impact of SCDMR can be greatly reduced by reusing existing on-chip scan design-for-testability resources within microprocessors. *Mitra et al.* [2005] presented the *Built-In Soft Error Resilience* (BISER) flip-flop, where the existing scan flip-flop is re-equipped with a C-Element voting gate at its output and small changes to its input to provide soft error protection with minor costs (Figure 3.11). *Zhang et al.* [2006] provided a detailed description of different operational modes for this flip-flop.



Figure 3.11: Built-In Soft Error Resilience (BISER) flip-flop from *Mitra et al.* [2005].

Radiation experiments in 45nm bulk CMOS demonstrate that BISER can have soft error resilience "in excess of 100X with respect to non-hardened designs" [Seifert, 2008].

A different approach to improving the soft error resilience of a flip-flop using C-Elements is to incorporate delay filtering with keeper-less C-Elements in the feed-forward path of a latch. The keeper-less C-Elements are also called Guard Gates [e.g., Balasubramanian et al., 2005] or Transition AND Gates (TAG) [e.g., Shuler et al., 2005]. The TMR temporal sampling latch from Figure 3.7b is a typical example of delay filtering in feedforward path of a soft-error-resilient latch, where the original latch signal is replicated 3 times in time to allow removal of any SET with pulse duration less than the delay of the delay element,  $\delta$ , by the 3-input majority voting gate. During the opaque (or retention) mode of this latch, the overall latch can be reduced to the circuit shown in Figure 3.12a. To transform this TMR temporal sampling latch into a DMR latch to reduce design costs, we can replace the 3-input majority voting gate with a 2-input keeper-less C-Element to produce a DMR temporal sampling latch called "1-TAG" shown in Figure 3.12b [Shuler et al., 2005]. The



Figure 3.12: Delay filtering in the feedforward path of soft-error-resilient latches. (a) TMR temporal sampling latch. (b) 1-TAG from *Shuler et al.* [2006]. (c) Enhanced Delay-Filtering (EDF) latch (this work). (d) 3-TAG from *Shuler et al.* [2006]. (e) 4-TAG from *Shuler et al.* [2005].

"1-TAG" latch is significantly less soft-error-resilient compared to the TMR Temporal Sampling Latch, since any SET arriving at the inputs of the C-Element can render the C-Element output temporarily undriven (floating) and prone to any additional single-event charge collection.

To improve the soft error resilience of the "1-TAG" latch, we can create more complex latches using two logic levels of keeperless C-Elements, such as the "EDF" latch (Figure 3.12c, e.g. this work), the "3-TAG" latch (Figure 3.12d) [e.g., Shuler et al., 2006] and the "4-TAG" latch (Figure 3.12e) [e.g., Shuler et al., 2005; Shuler et al., 2006]. Heavy ion experiments in $0.35\,\mu$m bulk CMOS show that the "1-TAG" and "3-TAG" exhibit 3.2X and 6.6X soft error reduction, significantly less than the 100X soft error reduction expected from typical DMR circuit techniques. However, the same study also shows that the "4-TAG" latch performs substantially better, with only a single error observed during the entire course of the experiment. The difference in soft error resilience between the various "TAG" designs can be explained by the following observations:

- 1. Both the "1-TAG" and "3-TAG" designs are not immune to SEUs created from SETs on a single node with duration longer than $\delta$, the delay of the delay element used.
- 2. The "4-TAG" design is instead immune to ALL SEUs regardless of SET duration, because every circuit node in the design is duplicated. The SEU immunity restricts the range of possible strike angles that can upset this design.
- 3. Circuit simulations also show that the "4-TAG" design has an inherently higher critical charge on its sensitive nodes compared to the other designs.

In addition to substantially higher soft error resilience, the "4-TAG" also benefits from more compact area, smaller power consumption and faster speed, since it does not require the use of delay elements (which tend to be power- and delay-hungry), and it has a symmetric circuit topology allowing more compact layout design. The "4-TAG" latch can be further simplified by replacing each 4-transistor TAG gate (or keeper-less C-Element) with a 3-transistor half-transition NAND gate (Figure 3.13a) to create a *Single Event Resistant Topology* (SERT) latch (Figure 3.13b) [Gambles et al., 2003; Shuler et al., 2009]. The SERT latch has similar soft error immunity compared to the 4-TAG latch, but benefits from a reduced transistor count and thus power consumption.

#### 3.4.2 Dual Interlocked Storage Cell (DICE)

The Dual Interlocked Storage Cell, or DICE, is an 8-transistor storage element relying on dual modular redundancy of its internal circuit nodes to achieve immunity to errors affecting one single circuit node [Calin et al., 1996]. This storage element has four internal circuit nodes and stores one memory bit (Figure 3.14a). When a single event temporarily upsets one of the four circuit nodes, only one additional circuit node is affected by the upset through positive feedback, and the other unaffected circuit



Figure 3.13: Single Event Resistant Topology (SERT) latch. (a) Half-Transition NAND gate (adapted from *Gambles et al.* [2003]). (b) SERT latch from *Shuler et al.* [2009].

nodes will correct the values of the affected circuit nodes (Figure 3.14b). However, a single upset on one circuit node can lower the critical charge necessary to upset other unaffected circuit nodes, and the DICE circuit can become vulnerable to single-event multiple upsets through charge sharing. For example, a particle striking both transistors M1 and M8 can cause a SEMU. Blum and Delgado-Frias [2006] implemented a TMR version of DICE called TPDICE.

#### 3.4.3 Differential Cascode Voltage Switch Logic (DCVSL)

The Differential Cascode Voltage Switch Logic, or DCVSL, provides a different DMR approach by replicating each signal with its complement (Figure 3.15a) [Casey et al., 2005]. Because all logic functions in DCVSL are implemented in its "n-tree" structures while the PMOS devices serve as pull-up, DCVSL logic gates are only sensitive to  $0\rightarrow 1$  transitions at their inputs. This special property prevents any single event transient on one circuit node from propagating beyond two logic stages (Figure 3.15b). Using DCVSL logic gates, we can construct a SEU-immune latch such as the one shown in Figure 3.15c. In order to prevent a SET from causing an upset, three stages of differential inverters are used in the memory loop. A three-input DCVSL C-Element then acts as a SET filter at the output. In order to cause an upset in DCVSL logic,



Figure 3.14: Dual Interlocked Storage Cell (DICE), access transistors not shown [Calin et al., 1996]. (a) Stable state "A = 1". (b) Temporary state after upset on transistor M1.

a SEMU must happen at both differential outputs of a logic gate.

## 3.5 SEMU Vulnerability in SEU-Immune Techniques

The circuit techniques presented in the previous section protect against SEUs, i.e. soft errors due to charge collection in a single circuit node. However, they are not sufficient to address soft errors due to SEMUs.

To understand how the DICE cell is sensitive to SEMUs, let us first examine circuit interactions inside DICE to determine the SEMU vulnerability of the DICE circuit.



Figure 3.15: Differential Cascode Voltage Switch (DCVSL) [Casey et al., 2005]. (a) Circuit implementation. (b) Single-event transient filter. (c) DCVSL latch DIFF.

Suppose the DICE circuit is in the initial state shown in Figure 3.16a. When a particle strikes near an "OFF" transistor in the DICE circuit (for example, transistor M1), its drain node (n1) collects an upsetting charge. The circuit node (A) connected to this drain node can temporarily change its own state and trigger the following changes in the rest of the circuit (Figure 3.16b):

- 1. Circuit node B changes its logic value due to positive feedback interaction with circuit node A through transistors M1 and M4.
- 2. The voltage change in node B weakens the drive strength of transistor M6 driving circuit node C.
- 3. Circuit node D is left floating as transistor M7 is turned "OFF" by the voltage change in node A.

If no additional node collects charge, the voltages on circuit nodes C and D should be left undisturbed, and the stable circuit node voltages eventually restore the DICE circuit to its initial state in Figure 3.16a. However, if transistor drain nodes n1 and n8 (of transistors M1 and M8) are both in the path of the particle strike, the charge



Figure 3.16: Single-event multiple upset (SEMU) in the Dual Interlocked Storage Cell (DICE). (a) Initial circuit state. (b) Response to an upset in a single circuit node. (c) Response to upsets in a sensitive circuit node pair.

collected by drain node n8 can easily flip the logical value of circuit node D, which was rendered floating by the node upset on A, producing a SEMU (Figure 3.16c). *Seifert et al.* [2007] observed that a charge deposition on a primary circuit node significantly lowers the critical charge required on a secondary node to cause a SEMU (related to the example of circuit nodes A and D), while heavy ion results from *Baze et al.* [2008] demonstrated that the error cross-section of the DICE circuit shows a strong angular dependence to radiation strikes primarily due to SEMU strikes.
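The difference between a single-node upset and an upset of the sensitive node pair can be reproduced with a toy switch-level model. The model below rests on several assumptions that are not taken from Figure 3.16: nodes A to D are indexed 0 to 3, each node is pulled up by a PMOS gated by the previous node and pulled down by an NMOS gated by the next node, the NMOS wins any contention, and a floating node holds its value. Partial voltage swings, such as the temporary disturbance of node B described above, are not captured.

```python
# Toy switch-level model of a DICE cell (illustrative assumptions throughout).

def settle(state, forced=None, max_iters=16):
    """Apply the node update rule until no node changes.

    forced maps a node index to the value imposed by the particle strike.
    """
    forced = forced or {}
    x = list(state)
    for i, v in forced.items():
        x[i] = v
    for _ in range(max_iters):
        changed = False
        for i in range(4):
            if i in forced:
                continue
            pull_up = x[(i - 1) % 4] == 0    # PMOS conducts when its gate is LOW
            pull_down = x[(i + 1) % 4] == 1  # NMOS conducts when its gate is HIGH
            if pull_down:
                new = 0       # assumption: NMOS wins any contention
            elif pull_up:
                new = 1
            else:
                new = x[i]    # floating node keeps its stored value
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:
            break
    return tuple(x)

def strike(state, struck):
    """Flip the struck nodes, let the cell react, then release the strike."""
    forced = {i: 1 - state[i] for i in struck}
    return settle(settle(state, forced))

initial = (1, 0, 1, 0)                   # A=1, B=0, C=1, D=0
single = [strike(initial, [i]) for i in range(4)]
double_ad = strike(initial, [0, 3])      # nodes A and D struck together
print("single-node strikes:", single)
print("A and D struck together:", double_ad)
```

In this model every single-node strike settles back to the stored state, whereas striking A and D together leaves the cell in the complementary state, in line with the SEMU scenario of Figure 3.16c.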

## 3.6 Conclusion

This chapter presented an overview of common circuit techniques used for soft error resilience. We first introduced various circuit upset conditions for soft errors, then addressed these conditions by presenting RC-based techniques that enhance the resilience of a single circuit node, followed by redundancy-based techniques that provide full single-event upset (SEU) immunity. On one hand, the RC techniques are often unavailable in standard commercial fabrication processes. On the other hand, the redundancy-based designs can be implemented in any process, yet they only protect against single-event upsets, and require at least twice as many transistors, resulting in more than double area and power costs with moderate delay cost.

With technology scaling, individual circuit node capacitances become smaller, and RC-based techniques introduce too much power and delay overhead in order to maintain the same soft error resilience moving from one process node to the next. Moreover, the soft error rate due to single-event multiple upsets (SEMUs) increases exponentially with device scaling, and the contribution of single-event multiple upsets toward the overall soft error rate is already more than 10% for technology nodes smaller than 65nm [e.g., Seifert et al., 2006]. The SEU-immune redundancy circuit techniques presented in this chapter are no longer sufficient to make circuits robust, and new techniques must be developed to address SEMUs. The next chapter explores design techniques at the layout level targeting SEMUs by reducing the overall charge collection through layout geometry and circuit interactions. Combining circuit techniques for SEU immunity with SEMU-resilient layout techniques, it is possible to overcome the charge sharing effect of technology scaling and design highly soft-error-resilient circuits with moderate costs.

## Chapter 4

## Layout Soft Error Resilience Techniques

Radiation-induced soft errors are a major concern for robust systems, especially those targeting enterprise applications [Meaney et al., 2005; Sanda et al., 2008]. Unlike large SRAM arrays that can be protected using error-correcting codes and bit interleaving [e.g. Hsiao, 1970; Chen and Hsiao, 1984], soft error protection of sequential elements, i.e. latches and flip-flops, is challenging. Hardened by Design (HBD) techniques based on functional circuit topology, as described in the previous chapter, do not prevent charge collection in a particular circuit node, but rather rely on the low probability of simultaneous upsets on redundant circuit nodes to obtain soft error resilience. In contrast, layout techniques can directly reduce charge collection and charge sharing among circuit nodes.

With technology scaling, the probability of charge sharing and Single-Event Multiple Upsets (SEMUs) becomes a significant contributor toward the overall soft error rate [e.g. Seifert et al., 2006; Seifert et al., 2010], and it becomes increasingly difficult to guarantee a low probability for SEMUs without substantial area cost. Recent results in 32nm bulk CMOS from Seifert et al. [2010] show that, due to SEMUs, SEU-immune designs exhibit 10X more soft errors than expected from results in older process generations. If this trend continues, we can expect SEMUs to dominate in next generation processes. It is therefore necessary to develop new area-efficient layout techniques to address the mounting SEMU problem.

In this chapter, we first describe how charge collection affects CMOS circuits, and give a brief overview of the conventional techniques used to reduce the effect of charge collection and charge sharing. These techniques require area overhead and transistor spacing to achieve soft error resilience, and become infeasible as device dimensions become smaller and more transistors are packed within a unit area. We present a more area-efficient layout principle called LEAP, or Layout Design through Error Aware Positioning, where transistors in a layout are placed in such a way that vulnerable particle strike directions are protected. In the event of a SEMU particle strike, multiple transistors in a LEAP-protected layout can simultaneously collect charge to cancel (fully or partially) the overall effect of the single event on the circuit. Finally, we end the discussion on LEAP with LEAP-DICE, a layout example where LEAP is applied on the SEU-immune DICE circuit to achieve SEMU resilience.

## 4.1 Charge Collection in CMOS Circuits

To understand how CMOS circuits can be upset by particle strikes, we first examine the effects of energetic particle strikes, or Single Event Transients (SETs), in CMOS technology. When an energetic particle strikes in the vicinity of a MOS transistor, electrons and holes are generated around the particle track in the silicon. The injected charge is transported by drift and diffusion, causing reverse bias currents in all pn-junctions reached by the charge, and is eventually removed by charge collection or recombination. For an NMOS transistor, a net negative charge is collected at the source and drain contacts [Dodd and Massengill, 2003], resulting in a positive current pulse (into the silicon). For a PMOS transistor, net positive charge collected at the source and drain contacts results in a negative current pulse.

For a single inverter, when an energetic particle hits the drain contact node of the "OFF" PMOS transistor, the positive charge collected at this node raises the inverter output voltage. If the inverter output is initially LOW (i.e. logic 0) and enough charge is collected, then the logic value of the inverter output can change (Figure 4.1a). Once the injected excess charge is swept out or recombined, the node



Figure 4.1: Effect of an energetic particle strike on the drain contact node of an "OFF" (a) PMOS transistor and (b) NMOS transistor in an inverter.

voltage is restored by the "ON" NMOS transistor. Likewise, when the "OFF" NMOS transistor of an inverter is hit while the inverter output is HIGH (i.e. logic 1), the output voltage is temporarily lowered (Figure 4.1b).

When the inverter output is "HIGH", a single event particle strike on the drain contact node of the PMOS transistor does not change the logic value of that circuit node, but rather drives the inverter output voltage higher than the supply voltage until the "ON" PMOS transistor removes the excess charge (Figure 4.2a). In the same way, if the inverter output is LOW, a particle strike on the drain contact node of the NMOS transistor pulls the voltage lower than the ground voltage, and reinforces the output state of the inverter (Figure 4.2b).

The different radiation-induced charge collection effects can be generalized from the inverter examples in Figure 4.1 and Figure 4.2 to any CMOS logic gate (Figure 4.3). For simplicity, let us consider only active logic gates. In active static gates, a NMOS transistor network (or tree) passes the "LOW" value to the output, whereas



Figure 4.2: Effect of an energetic particle strike on the drain contact node of an "ON" (a) PMOS transistor and (b) NMOS transistor in an inverter.

a PMOS transistor network passes the "HIGH" value to the output. To avoid output contention, only one of the transistor trees can be active, passing the intended logic value (0 or 1) to the output, or none at all, by relying on the capacitance of the output circuit node to retain its value. We define a transistor tree to be "ON" if it passes a logic value to the output ("HIGH" for PMOS or "LOW" for NMOS), or "OFF" if it doesn't.

When a particle strike hits a CMOS gate, the output value can change if charge collected by the circuit can be propagated to the output. As we recall, a particle hit on a PMOS transistor collects positive charge, while a particle hit on a NMOS transistor collects negative charge. If we define the "HIGH" output value as positive and the "LOW" output value as negative, in order for an upsetting SET to occur at the gate output (such as in Figure 4.1), charge with a polarity different than the output must be collected somewhere in the circuit after a particle strike and propagated to the output, without any "OFF" transistor blocking the path to the output from where the



Figure 4.3: Static CMOS logic design. Note that here the pass transistor logic is considered as being part of a larger complex gate which comprises the active logic driving its data inputs. This definition is slightly different than the regular definition for a static logic gate.

charge is collected. To produce a HIGH to LOW upsetting SET at the gate output, charge must be collected in the NMOS transistor tree (which happens to be in the "OFF" state), while for a LOW to HIGH upsetting SET, charge must be collected in the PMOS transistor tree (also in the "OFF" state). If the charge collected has the same polarity as the output and is propagated to the output, the output logic value is reinforced instead (Figure 4.2). In pass-transistor style logic, the transistor trees can be in either the "ON" or "OFF" state when an upset occurs, as long as the collected charge polarity is opposite to the polarity of the signal being upset, and the charge can propagate to the output. Table 4.1 summarizes how different particle strike scenarios can produce upsets at the outputs or reinforce the outputs instead.

| Original Gate Value | Particle Strike on | Charge Collected | Same Polarity | Charge Propagated to Output | Gate Output Behavior |
|---|---|---|---|---|---|
| HIGH (+) | NMOS | Negative (-) | No | Yes | Upsetting SET ($\downarrow$) |
| HIGH (+) | NMOS | Negative (-) | No | No | No Change |
| HIGH (+) | PMOS | Positive (+) | Yes | Yes | Reinforcing SET |
| HIGH (+) | PMOS | Positive (+) | Yes | No | No Change |
| LOW (-) | NMOS | Negative (-) | Yes | Yes | Reinforcing SET ($\downarrow$) |
| LOW (-) | NMOS | Negative (-) | Yes | No | No Change |
| LOW (-) | PMOS | Positive (+) | No | Yes | Upsetting SET ($\uparrow$) |
| LOW (-) | PMOS | Positive (+) | No | No | No Change |

Table 4.1: Conditions to produce single-event transients in CMOS circuits.
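The conditions summarized in Table 4.1 reduce to two checks: whether the collected charge reaches the output, and whether its polarity matches the output value. A minimal sketch follows; the function and argument names are hypothetical.

```python
def set_behavior(output_high, struck_device, propagated):
    """Classify the gate output behavior after a particle strike.

    output_high   -- True if the original gate output is HIGH
    struck_device -- "NMOS" (collects negative charge) or
                     "PMOS" (collects positive charge)
    propagated    -- True if the collected charge reaches the output
    """
    if struck_device not in ("NMOS", "PMOS"):
        raise ValueError("struck_device must be 'NMOS' or 'PMOS'")
    if not propagated:
        return "No Change"
    charge_positive = struck_device == "PMOS"
    same_polarity = charge_positive == output_high
    return "Reinforcing SET" if same_polarity else "Upsetting SET"

for output_high in (True, False):
    for device in ("NMOS", "PMOS"):
        for propagated in (True, False):
            level = "HIGH" if output_high else "LOW"
            print(level, device, propagated,
                  set_behavior(output_high, device, propagated))
```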

## 4.2 Traditional Layout Techniques for Charge Collection Mitigation

The conventional view on layout mitigation techniques for single-event charge collection has been focused on either reducing the charge collection of critical or sensitive individual circuit nodes, or the probability of simultaneous charge collection between a sensitive circuit node pair for circuits protected by SEU-immune circuit techniques. This section describes the two main layout techniques commonly used to improve soft error resilience in digital circuits.

#### 4.2.1 Guard Rings, Guard Contacts and Guard Drains

With technology scaling, digital circuits carry less and less capacitance per node, leading to a reduction of critical charge per node and a greater susceptibility to SETs. Since most of the single-event charge collection occurs near transistor drain/source junctions, a well placed well contact (ohmic diffusion contact tying a bias voltage to the well) in the vicinity of a transistor can ensure stable well potential and prevent parasitic PNP bipolar conduction of the transistor (Figure 4.4), suppressing the actual charge collection and resulting in smaller widths and amplitudes for SETs and improved immunity against SETs and SEUs [Olson et al., 2007]. The placement of a well contact near a transistor can also protect against Single-Event Latchup (SEL) by preventing the activation of a parasitic PNPN structure in a NMOS-PMOS pair where the transistors are situated next to each other (previously discussed in Section 2.1.2). Figure 4.5 shows how well contacts are placed in a CMOS inverter layout example. The well contacts are usually placed some distance away from active transistors due to N+ to P+ diffusion layer separation rules (see Figure 4.6a), but some fabrication processes may allow both N+ and P+ diffusion layers to join each other under certain geometric conditions (see Figure 4.6b).

Various well contact or well-contact-like geometries have been proposed by *Black et al.* [2005], *Narasimham et al.* [2007], *Olson et al.* [2007], and *Narasimham et al.* [2008]. Here is a short summary of the various forms of well contacts:

- 1. A Guard Band is a rectangular strip of highly doped diffusion ohmic well contact, placed between groups of one or more transistors within the same well to minimize radiation-induced charge sharing among the two transistor groups [Black et al., 2005]. In typical standard library cells, continuous horizontal ohmic diffusion strips are used as baseline well contact structures (see Figure 4.7a). Guard bands can be used to extend the baseline horizontal well contact strips to separate different transistor sections residing within the same well (see Figure 4.7b).
- 2. A *Guard Ring* is a continuous strip of highly doped diffusion ohmic contact, in close proximity and completely surrounding a group of one or more transistors



Figure 4.4: Bipolar parasitic conduction in an "OFF" PMOS transistor activated by a radiation strike, resulting in a current injection into the drain node, adapted from *Olson et al.* [2005]. A more closely situated well contact ("B" terminal) results in a smaller well resistance $R_{\rm well}$, reducing the possibility of activating the parasitic PNP device during a radiation strike in the vicinity of the transistor.

[Clein, 1999]. Guard rings offer better protection than guard bands, and are often necessary to prevent SELs in I/O or high-current-drive circuits. Guard rings can be fully contacted or partially contacted. Fully contacted guard rings (as shown in Figure 4.7c) feature rings of fully contacted ohmic diffusion and Metal 1 layers completely surrounding the target transistor or transistor group, and require the use of Metal 2 routing layer to connect between transistors residing inside different guard rings. Partially contacted guard rings allow the use of Metal 1 layer to connect transistors inside different guard rings without requiring the use of Metal 2 layer within a cell, but are not as effective as fully contacted guard rings in removing excess charge (see Figure 4.7d).



Figure 4.5: Well contacts in CMOS cell layout. (a) Inverter schematic with transistor body terminals shown. (b) Layout of an inverter showing the location of well contacts used for transistor body terminal connections.

- 3. Narasimham et al. [2008] proposed a different form of charge removal technique called *Guard Drains*, which are dummy reverse-biased diodes placed near the drain node of transistors. Figure 4.8 shows how guard drains can be placed in a CMOS inverter layout. Guard drains can be viewed as antenna diodes liberally tied to the well region near the drain terminals of the transistor they protect. Layout-wise, guard drains are very similar to well contacts, except that they use a diffusion type opposite to that of the well they reside in.



Figure 4.6: Well contact styles in CMOS inverter cell layout. (a) Isolated well contacts. (b) Abutted well contacts (only allowed in certain fabrication processes).



Figure 4.7: Guard bands and guard rings in CMOS inverter cell layout. (a) Baseline horizontal well contacts commonly used in standard cell library design. The horizontal well contacts lie underneath the horizontal Metal 1 supply lines. (b) Additional vertical guard bands extend from the baseline well contacts. (c) Fully contacted guard rings. This layout style requires the use of vertical Metal 2 wires within the cell. (d) Partially contacted guard rings.



Figure 4.8: Guard drains in CMOS layout. (a) Example inverter layout showing abutted well contacts and guard drains. (b) Vertical cross-section of the dashed-line NWELL region from the example layout.

## 4.2.2 Node Separation

In SEU-immune sequential cell design, upsets can happen if multiple circuit node voltages are disturbed during a particle strike. Previously, Section 3.5 showed how the Dual Interlocked Storage Cell (DICE) is sensitive to a particle strike affecting a circuit node pair. Seifert et al. [2007] quantified the strong exponential nature of the SER dependence on critical node separation distance in SEUT, a DICE implementation



Figure 4.9: Soft error rate dependence on node separation. The simulated SER (in arbitrary units) is plotted against minimum critical node separation in the 45nm SEUT design, a DICE-based sequential cell (adapted from *Seifert et al.* [2007]).

as shown in Figure 4.9. As a general rule of thumb, doubling the node separation distance results in roughly a 10X reduction in the upset probability of a circuit node pair, given the same SEU-immune circuit design and the same fabrication process. As the critical charge per circuit node continues to drop due to process scaling, it is impossible to maintain the same SER for a given design without increasing the relative distances between individual circuit nodes. Recent results in a 32nm bulk CMOS process from Seifert et al. [2010a] show that SEU-immune designs only achieve on the order of 10X SER reduction compared to unprotected designs, instead of the >100X SER improvement typically enjoyed in 90nm bulk CMOS or older process generations. If this trend continues, within the next few process generations we can expect redundancy-based designs to have soft error rates similar to those of non-hardened designs. Therefore, SEU-immune circuit designs cannot rely on node separation alone to protect sensitive circuit node pairs. Instead, the need to protect specific sensitive circuit node pairs in future silicon generations becomes pressing, and motivates the development of new layout techniques addressing sensitive circuit node pairs, described in the following section.
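The rule of thumb above can be written as a simple scaling relation. The following is a minimal sketch, assuming the 10X-per-doubling trend holds exactly over the range of interest; the function name, the reference separation and the sample distances are illustrative and are not taken from Seifert et al. [2007].

```python
import math


def relative_upset_probability(separation_um, reference_um, factor_per_doubling=10.0):
    """Upset probability of a node pair, relative to its value at reference_um.

    Assumes the rule of thumb applies exactly: every doubling of the node
    separation divides the upset probability by factor_per_doubling.
    """
    if separation_um <= 0 or reference_um <= 0:
        raise ValueError("separations must be positive")
    doublings = math.log2(separation_um / reference_um)
    return factor_per_doubling ** (-doublings)


if __name__ == "__main__":
    # Hypothetical separations, all relative to a 0.5 um reference spacing.
    for d in (0.5, 1.0, 2.0, 4.0):
        print(f"{d:4.1f} um -> {relative_upset_probability(d, 0.5):.3g}")
```

Under this assumption the protection gained from separation grows only as a power law in distance, so each additional decade of SER reduction costs another doubling of the spacing between the critical nodes.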

## 4.3 Layout Design through Error-Aware Transistor Positioning (LEAP)

Aggressive technology scaling is driving transistor dimensions and the distances between them smaller than ever, and with the increasing transistor density comes a heavy price: SEMUs. Typically, a particle strike can affect a silicon area within a  $1\text{--}10~\mu\mathrm{m}$  radius of the site of impact. For technologies below 180nm, inter-transistor distances become much smaller than  $10~\mu\mathrm{m}$ , and if the scaling trend continues, SEMUs will become unavoidable in bulk silicon technology due to charge sharing. Circuit redundancy schemes from Sections 3.3 and 3.4 are no longer sufficient, since the redundant circuit nodes can now reside within the same area of impact. Instead of focusing on separating redundant circuit nodes as discussed in the previous section, we decided to look at existing circuit interactions to better understand the charge collection process and its effect on circuits, and to devise new design strategies to reduce SEMUs. Our goal: given the knowledge of a circuit topology, we want to find the optimal transistor placement to improve the soft error resilience of the circuit with minimal area overhead. With this goal in mind, we developed a new principle of layout topology management we call "LEAP".

**LEAP** (or *Layout Design through Error-Aware Transistor Positioning*), first introduced by *Lilja* [2008], is a new layout principle for soft error resilience of digital circuits [*Lee et al.*, 2010]. According to the LEAP principle, given a circuit topology, without any modification of the circuit, it is possible to produce a soft-error-resilient layout, by performing:

- 1. An analysis of the circuit response to a single event for each individual drain contact node in the layout, and
- 2. A careful placement of each drain contact node in the layout based on the above analysis, such that multiple drain contact nodes act together to cancel (fully or partially) the overall effect of the single event on the circuit.

As an initial illustration of how LEAP utilizes the different effects of charge collection on multiple nodes to reduce single event sensitivity, consider the case where



Figure 4.10: LEAP principle for an inverter through transistor alignment. (a) Reduced charge collection when a particle hits both NMOS and PMOS drain contact nodes of an inverter simultaneously. (b) Horizontal transistor alignment to reduce charge collection.

the drain contact nodes of the PMOS and NMOS transistors in an inverter are simultaneously hit by a particle strike. In the inverter example shown in Figure 4.10, the positive charge collected by the PMOS transistor is offset by the negative charge collected by the NMOS transistor, resulting in lower total charge collection at the output node. The extent of the charge reduction depends on the relative sizes of both drain contact nodes as well as the exact strike direction hitting both nodes.
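To make the charge offset in the inverter example concrete, the following is a first-order sketch, assuming the charges collected by the two drain contact nodes simply add algebraically at the shared output node. The critical charge `q_crit_fc` and all numeric values are hypothetical; the real extent of cancellation depends on drain sizes and strike direction, as noted above.

```python
def net_output_charge(q_pmos_fc, q_nmos_fc):
    """Net charge (fC) seen by the inverter output node.

    Assumption: the PMOS drain contributes positive charge and the NMOS
    drain contributes negative charge, and the two simply add.
    """
    return q_pmos_fc - q_nmos_fc


def output_upset(q_pmos_fc, q_nmos_fc, q_crit_fc, output_high):
    """True if the net collected charge exceeds the (hypothetical) critical charge."""
    net = net_output_charge(q_pmos_fc, q_nmos_fc)
    if output_high:
        # Only net negative charge can pull a HIGH output down.
        return -net > q_crit_fc
    # Only net positive charge can pull a LOW output up.
    return net > q_crit_fc


if __name__ == "__main__":
    # NMOS drain hit alone versus NMOS and PMOS drains hit together.
    print(output_upset(0.0, 20.0, 10.0, output_high=True))
    print(output_upset(15.0, 20.0, 10.0, output_high=True))
```

In this toy model, a strike that also deposits charge on the PMOS drain leaves a net charge below the assumed critical charge, which is the effect that horizontal transistor alignment is meant to exploit.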

The above inverter example shows how single event charge collection in multiple drain contact nodes, sharing the same circuit node, can be used to reduce the effect of the single event transient on the circuit. The LEAP technique also considers interactions between multiple circuit nodes. To illustrate this, consider a pair of cross-coupled inverters (Figure 4.11a). The state of this circuit is the state of the latch formed by the two inverters: "STATE0" (A=0, B=1) and "STATE1" (A=1, B=0). A single event affecting drain contact node n1 can pull circuit node A LOW and turn on transistor M4, driving circuit node B HIGH and pushing the latch state toward "STATE0". Conversely, a single event affecting drain contact node n3 can pull circuit node B LOW and turn on transistor M2, driving circuit node A HIGH and pushing the latch state toward "STATE1". If an energetic particle simultaneously strikes both drain contact nodes n1 and n3, charge collection on drain contact node n3 reduces the effect of the charge collection at node n1 (for any initial state of the latch). The result is a higher LET upset threshold for a single event affecting both n1 and n3, than for a single event affecting only n1. The increase in LET upset threshold can be quantified using Technology CAD (TCAD) simulations capable of accurately modeling the charge collection. Figure 4.11b shows a layout of a cross-coupled inverter which utilizes charge cancellation for SEMU resilience along the horizontal direction.

As the examples in Figure 4.10 and Figure 4.11 demonstrated, in a circuit layout designed using the LEAP principle, different drain contact nodes can interact with each other during radiation-induced single events to reduce the overall charge collection.

In general, when an energetic particle hits a transistor diffusion contact node in an "OFF" transistor tree of a static CMOS gate (the transistor hit does not need to be "OFF"), sufficient charge collection by the node can induce a temporary change in the logic level of the gate output (shown in Figure 4.12a), if the charge collected by the diffusion contact node (-Q) has a clear path to the output. In fact, any charge collection in a logic gate can only induce a change at the gate output if the charge can flow freely toward the output through a clear path, regardless of the charge polarity. If the gate output is driven by an "ON" transistor tree (the gate output is being actively driven and not floating), the "ON" transistor tree will try to remove the collected charge with an active current (I in Figure 4.12a). The current drive of the "ON" transistor tree and the charge deposition profile determine the rate at which the collected charge is removed, the shape of the output glitch (SET), and whether the glitch is large and wide enough to propagate to the subsequent logic stages. Figure 4.12b shows an example of how a single-event



Figure 4.11: LEAP principle for a cross-coupled inverter pair. (a) Circuit schematic. (b) Transistor alignment to reduce charge collection in the horizontal direction.

transient can be produced in a two-input NAND gate.

The charge collection behavior is very similar when an energetic particle hits a transistor diffusion contact node in an "ON" transistor tree, provided again that the diffusion contact node has a clear path to the output (Figure 4.13). In this case however, the charge collection reinforces the output logic value and does not result in any upset.

Using the generalized charge collection mechanisms for static CMOS circuits, we can develop the following LEAP single-event transient suppression techniques to reduce overall charge collection: Direct LEAP SET Suppression, Indirect LEAP Type I SET Suppression and Indirect LEAP Type II SET Suppression. In Direct LEAP SET Suppression (Figure 4.14a), when a particle strike hits both "OFF" and "ON" transistor trees sharing the same gate output, the charge collected by the "ON" transistor tree can oppose the charge collected by the "OFF" transistor tree, effectively reducing the



Figure 4.12: Radiation-induced charge collection on the "OFF" transistor tree of a static CMOS gate. (a) Generalized CMOS gate. (b) Two-input NAND gate example.

effect of the single-event transient at the output. In this case, the collected opposing charges can affect the gate output only if they have clear paths to the gate output. Because an actively driven CMOS gate always has an "ON" transistor tree and an "OFF" transistor tree, it is always possible to find a helper candidate diffusion node in the "ON" transistor tree to collect a reinforcing charge (the collected charge reinforces the gate output) to counter the upsetting charge (the collected charge changes the gate output) collected by a victim candidate diffusion node in the "OFF" transistor tree. In fact, any transistor diffusion node in the "ON" transistor tree attached to the gate output is automatically a helper node. However, because the victim and helper nodes are of different MOSFET types, they must maintain a minimum separation distance due to layout spacing rules.

Indirect LEAP Type I SET Suppression (Figure 4.14b) happens when a particle strike hits both the "OFF" transistor tree of the current gate as well as the "ON" transistor tree of the previous gate, which drives the "ON" transistor tree of the current gate. The hit on the "ON" transistor tree of the previous gate can produce a reinforcing SET at the output of the previous gate driving into the "ON" transistor tree of the current gate. The reinforcing SET increases the gate overdrive of



Figure 4.13: Radiation-induced charge collection on the "ON" transistor tree of a static CMOS gate. (a) Generalized CMOS gate. (b) Two-input NAND gate example.

the active section of the "ON" transistor tree and the recovery current it produces. Consequently, the strengthened recovery current can more efficiently remove the upsetting charge from a simultaneous hit on the "OFF" transistor tree. This type of SET suppression has the advantage that the victim and helper diffusion nodes struck simultaneously by the particle are of the same MOSFET type, and are therefore easier to place close to each other, reducing layout area cost compared to Direct LEAP SET Suppression.

Indirect LEAP Type II SET Suppression (Figure 4.14c) occurs when a particle strike deposits charge on both the "OFF" transistor tree of the current gate as well as the "OFF" transistor tree of the previous gate, which drives the "OFF" transistor tree of the current gate. The hit on the "OFF" transistor tree of the previous gate produces an upsetting SET at the previous gate output. If this SET drives the gate input of a transistor in the path through which the upsetting charge collected by the victim diffusion node in the current "OFF" transistor tree can flow to the current gate output, it can turn off this transistor (changing from "ON" state to "OFF" state) and prevent the upsetting charge from reaching the gate output if no other charge flow path is available from the victim diffusion node to the gate output. Similar to the



Figure 4.14: Single-event transient suppression in a CMOS gate using the LEAP principle. (a) Direct LEAP SET suppression. (b) Indirect LEAP Type I SET suppression. (c) Indirect LEAP Type II SET suppression.

case of Indirect LEAP Type I SET Suppression, both victim and helper nodes also share the same MOSFET type. Because this mechanism relies on a transistor in the upsetting charge flow path switching from the "ON" state to the "OFF" state, Indirect LEAP Type II SET Suppression cannot happen for gates without any transistors in series (no blocking transistor available).
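The three suppression mechanisms differ only in which gate and which transistor tree the helper node belongs to. The following sketch summarizes that classification, assuming the victim is always a diffusion node in the "OFF" transistor tree of the current gate; the function and argument names are illustrative and do not come from any existing tool.

```python
from enum import Enum


class Suppression(Enum):
    DIRECT = "Direct LEAP"
    INDIRECT_I = "Indirect LEAP Type I"
    INDIRECT_II = "Indirect LEAP Type II"
    NONE = "no LEAP suppression"


def classify(helper_gate, helper_tree, blocking_transistor_available=True):
    """Classify the interaction for a victim node in the current gate's OFF tree.

    helper_gate: "current" or "previous" (the gate driving the current gate).
    helper_tree: "ON" or "OFF", the transistor tree containing the helper node.
    blocking_transistor_available: True if a series transistor lies in the
        victim's charge flow path and is driven by the previous gate.
    """
    if helper_gate == "current" and helper_tree == "ON":
        # Opposite-polarity charge collected on the same gate output.
        return Suppression.DIRECT
    if helper_gate == "previous" and helper_tree == "ON":
        # Reinforcing SET strengthens the recovery current of the current gate.
        return Suppression.INDIRECT_I
    if helper_gate == "previous" and helper_tree == "OFF":
        # Upsetting SET must turn off a series transistor in the charge path.
        if blocking_transistor_available:
            return Suppression.INDIRECT_II
    return Suppression.NONE


if __name__ == "__main__":
    print(classify("current", "ON").value)
    print(classify("previous", "OFF", blocking_transistor_available=False).value)
```

The sketch deliberately ignores whether the previous gate actually drives the relevant tree of the current gate; in a real netlist that connectivity check would be needed before accepting either indirect type.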

The LEAP layout principle places the drain contact nodes in the layout to take advantage of the opposing single event effects discussed above. Since LEAP does not depend on physical separation of sensitive circuit nodes, which may require large area overheads, LEAP-based designs can be more compact. By comparison, one study in 130nm bulk CMOS showed that the node separation technique may provide a 10X reduction in charge collection at the expense of increased node separation from  $0.18\mu m$  to  $2\mu m$  [Amusan et al., 2006].

# 4.4 LEAP-DICE: A Case Study

To complement the SEU immunity of the circuit techniques described in Chapter 3 with the SEMU soft error resilience of the LEAP layout principle showcased in the previous section, we choose the DICE circuit topology (from Section 3.4.2) as an example of a DMR-enhanced sequential circuit, then apply the LEAP layout principle to the DICE circuit to create a new sequential element layout called LEAP-DICE [Lee et al., 2010]. The new LEAP-DICE layout is both SEU-immune and SEMU-resilient. In Section 3.4.2, we demonstrated that, given the initial state shown in Figure 4.15a, the DICE circuit is vulnerable to a SEMU strike on both transistors M1 and M8 (see Figure 4.15b). In contrast, if an energetic particle strikes both transistors M1 and M2, according to the LEAP principle, the overall charge collection on circuit node A can actually be reduced, since the drain nodes n1 and n2 (of transistors M1 and M2) collect charges of opposite polarity through direct LEAP single-event transient suppression (see Figure 4.15c).

Since the DICE circuit is only sensitive to SEMU strikes involving at least two "sensitive" transistor drain nodes (all source nodes are attached to power and ground in this example), for each possible SEMU path involving two sensitive transistor drain nodes, we try to place a "protective" transistor drain node between the sensitive node pair. In this way, every time both nodes in the sensitive node pair are struck simultaneously, the inserted protective node will also be struck, and reduce the effective charge collection in one of the sensitive nodes through LEAP charge cancellation. Table 4.2 lists the different two-node SEMU combinations for the DICE circuit, along with possible protective nodes for each sensitive node pair.

Figure 4.16a shows the standard DICE layout. Using only direct LEAP interactions, we can produce a new LEAP-DICE layout (see Figure 4.16b), where all possible two-node SEMU strike directions are protected. Figure 4.17 illustrates the SEMU example involving sensitive nodes n1 and n8, where the placement of node n2 between



Figure 4.15: Single-event charge collection in the Dual Interlocked Storage Cell (DICE). For simplicity, each "OFF" transistor is grayed out. (a) Initial state. (b) SEMU strike on transistors M1 and M8. (c) Reduction in charge collection when transistors M1 and M2 are struck together.

Table 4.2: All two-node SEMU combinations and possible protective node selection. Actual SEMU strikes can involve more than two transistor diffusion nodes, but will always include at least one sensitive node pair listed in this table.

| Sensitive Node 1 | Sensitive Node 2 | Protective Node for Sensitive Node 1 (Direct LEAP) | Protective Node for Sensitive Node 1 (Indirect LEAP) | Protective Node for Sensitive Node 2 (Direct LEAP) | Protective Node for Sensitive Node 2 (Indirect LEAP) |
|------------------|------------------|------|------|------|------|
| n1               | n5               | n2   | n3   | n6   | n7   |
| n1               | n8               | n2   | n3   | n7   | n6   |
| n2               | n3               | n1   | n8   | n4   | n5   |
| n2               | n6               | n1   | n4   | n5   | n8   |
| n3               | n7               | n4   | n5   | n8   | n1   |
| n4               | n5               | n3   | n2   | n6   | n7   |
| n4               | n8               | n3   | n2   | n7   | n6   |
| n6               | n7               | n5   | n4   | n8   | n1   |
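The placement rule behind Table 4.2 can be checked mechanically. The following is a minimal sketch, assuming a simplified one-dimensional model in which all drain contact nodes sit on a single horizontal line and a sensitive pair counts as protected when at least one of its direct-LEAP protective nodes lies anywhere between the two sensitive nodes. The node orderings used in the example are hypothetical and are not the actual LEAP-DICE placement of Figure 4.16b.

```python
# Direct-LEAP protective nodes for each sensitive pair, transcribed from Table 4.2.
DIRECT_PROTECTORS = {
    ("n1", "n5"): {"n2", "n6"},
    ("n1", "n8"): {"n2", "n7"},
    ("n2", "n3"): {"n1", "n4"},
    ("n2", "n6"): {"n1", "n5"},
    ("n3", "n7"): {"n4", "n8"},
    ("n4", "n5"): {"n3", "n6"},
    ("n4", "n8"): {"n3", "n7"},
    ("n6", "n7"): {"n5", "n8"},
}


def unprotected_pairs(order):
    """Return the sensitive pairs with no direct protective node between them.

    order: left-to-right list of drain node names (1D simplification).
    """
    position = {node: i for i, node in enumerate(order)}
    missing = []
    for (a, b), protectors in DIRECT_PROTECTORS.items():
        lo, hi = sorted((position[a], position[b]))
        between = set(order[lo + 1:hi])
        if not between & protectors:
            missing.append((a, b))
    return missing


if __name__ == "__main__":
    # Hypothetical orderings, chosen only to exercise the check.
    candidate = ["n5", "n6", "n1", "n2", "n8", "n7", "n4", "n3"]
    naive = ["n1", "n8", "n2", "n3", "n4", "n5", "n6", "n7"]
    print("candidate:", unprotected_pairs(candidate))
    print("naive:    ", unprotected_pairs(naive))
```

A real layout has two dimensions, separate wells and spacing rules, so a check of this kind is only a first screening step before the single event simulations described later in this section.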

n1 and n8 protects the sensitive pair. The LEAP-DICE layout also has an additional benefit: the introduction of four wells provides some isolation between individual nodes, and thus reduces charge sharing. It is also interesting to note that even the standard DICE layout is partially protected through LEAP interactions. For instance, two sensitive nodes of the same type (e.g., n1-n5) are protected by inserting another protective node of the same type (n3) between the sensitive nodes. There also exists an alternative compact DICE layout created using LEAP interactions (see Figure 4.18), but it is not as soft-error-resilient as LEAP-DICE.

The new LEAP-DICE layout offers an additional benefit in terms of soft error resilience: since all transistors are horizontally aligned, the number of possible particle tracks hitting multiple circuit nodes in LEAP-DICE is confined to a narrow incident angle range around the horizontal direction, further reducing its susceptibility to SEMUs. The analysis of SEMUs on more than two drain contact nodes becomes quite complex, and we only focus our discussion on reducing SEMUs related to simultaneous hits on the drain contact nodes of two "OFF" transistors.

Even with the best possible transistor placement in mind using LEAP, accurate



Figure 4.16: Two DICE layout configurations. (a) Standard DICE layout. (b) LEAP-DICE layout.



Figure 4.17: LEAP-DICE layout with the SEMU example highlighted.

single event simulations must be used to provide effective quantitative assessment of LEAP for a specific circuit and layout. Many device-level effects (notably charge sharing in the well, particle energy and electric field distribution affecting charge movement) as well as 3D geometric considerations (particle strike angles, device contours), can only be accounted for using 3D device simulations capable of predicting the charge flow and distribution caused by a particle strike. To compare the soft error resilience of the new LEAP-DICE layout (referred to as "LEAP-DICE") with the standard DICE layout (referred to as "DICE"), we performed mixed-mode 3D-Technology CAD (TCAD) simulations in 90nm bulk technology using the tool ACCURO provided by Robust Chip Inc. A "snapshot" from one such simulation is



Figure 4.18: Alternate DICE layout using both direct and indirect LEAP interactions.

shown in Figure 4.19. The ACCURO single event simulation tool can simulate single event charge distribution and charge collection while fully accounting for layout, substrate and circuit details, and provides accuracy similar to a full 3D TCAD device simulation [Lilja, 2009; Robust Chip Inc., 2010]. It is fast enough to run a very large number of single event experiments to perform error cross-section analysis and Linear Energy Transfer (LET) threshold prediction.

To evaluate the LEAP-DICE storage cell layout and compare it to the DICE storage cell layout, we implemented both the "DICE" and "LEAP-DICE" layouts of the DICE circuit in 90nm CMOS technology. Minimum device widths were used for the NMOS devices with a P/N width ratio of 2.4. Note that both layouts use exactly the same circuit and device sizing.

For selected incoming angles (defined using "tilt" and "azimuth") of the single-event generating particle, ACCURO scans the entire layout area and finds the LET upset threshold at every scan point in the layout. Initial charge generation, governed by the LET of the particle, is injected into the three-dimensional circuit structure along the trajectory of the particle, and the charge transport and collection at the drain contact nodes are simulated. At the end of the simulation, the voltage on the DICE output node is monitored to determine whether the DICE circuit is upset. Figure 4.20 shows the DICE output voltage (circuit node n4 in Figure 3.14) for different LET values for a particle strike at a particular angle of incidence.
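The threshold search at each scan point can be pictured as follows. This is a sketch only: how ACCURO locates the threshold internally is not described here, and the bisection below is an assumed strategy that relies on upsets being monotonic in LET. The callback `simulate_upset`, the LET bounds and the toy upset rule are all hypothetical stand-ins for a full single event simulation.

```python
def let_upset_threshold(is_upset, let_min=0.1, let_max=100.0, rel_tol=0.01):
    """Smallest LET (within rel_tol) for which is_upset(let) is True.

    Returns None if no upset occurs even at let_max. Assumes that once a
    strike upsets the cell at some LET, it also does so at any higher LET.
    """
    if not is_upset(let_max):
        return None
    if is_upset(let_min):
        return let_min
    lo, hi = let_min, let_max  # invariant: no upset at lo, upset at hi
    while (hi - lo) > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if is_upset(mid):
            hi = mid
        else:
            lo = mid
    return hi


def scan_layout(points, simulate_upset, **search_options):
    """Map each scan point to its LET upset threshold for one strike direction."""
    return {
        p: let_upset_threshold(lambda let, p=p: simulate_upset(p, let), **search_options)
        for p in points
    }


if __name__ == "__main__":
    # Toy stand-in for the simulator: only points with x < 1.0 can be upset,
    # and they upset at an LET of 12.5 or above.
    def toy_simulator(point, let):
        x, _y = point
        return x < 1.0 and let >= 12.5

    print(scan_layout([(0.5, 0.0), (2.0, 0.0)], toy_simulator))
```

The scan points whose threshold lies at or below a given LET form the error cross-section at that LET for the chosen strike direction.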

As discussed earlier, the LEAP-DICE storage cell (Figure 4.16b) can only be



Figure 4.19: Simulation "snapshot" for the LEAP-DICE latch structure used in the ACCURO simulation, with color coded electron concentration during a single event particle strike with 180° azimuth, 85° tilt angles.

upset when the angle of incidence is in a narrow cone around tilt =  $90^{\circ}$  and azimuth =  $0^{\circ}$  or  $180^{\circ}$  due to the horizontal alignment of the transistors. For a conventional layout of the DICE storage cell (Figure 4.16a), the latch can be upset for multiple azimuth angles around  $90^{\circ}$  tilt. We determined one sensitive direction to be around an azimuth angle of  $305^{\circ}$ , corresponding to strike directions hitting the following pairs of drain contact nodes: n2-n3, n4-n5 and n6-n7. We also show simulation results for another sensitive direction with an azimuth angle of  $220^{\circ}$ , corresponding to strike directions hitting drain contact nodes n1 and n8. Figure 4.21 shows the coordinate system for particle strike directions in this study. Figure 4.22 and Figure



Figure 4.20: Simulated DICE output voltage for single events with different Linear Energy Transfer (LET) levels using the LEAP-DICE 3D structure in Figure 4.19, with a strike direction of 180° azimuth, 85° tilt angles.

4.23 show the cross-section regions for DICE and LEAP-DICE respectively at the sensitive strike directions with given LET level. The cross-section region is the area in the plane perpendicular to the particle trajectory where an incident particle strike can upset DICE or LEAP-DICE. For improved visibility, this cross-section region is first projected to the side of the simulation structure (x-z, and y-z planes), then rotated to the plane of the layout and centered.



Figure 4.21: Coordinate system for particle strike directions. This coordinate system differs from the conventional coordinate system for evaluating radiation strikes, where the z-direction is pointed downward instead.
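For readers who want to relate the tilt and azimuth angles to a track direction, the following sketch converts an angle pair into a unit vector along the particle track. It assumes that tilt is measured from the z-axis (so that 90° tilt lies in the layout plane) and that azimuth is measured in the x-y plane from the x-axis, which is consistent with the horizontal sensitive directions quoted in the text; the exact sign conventions of Figure 4.21 are not reproduced here.

```python
import math


def strike_direction(tilt_deg, azimuth_deg):
    """Unit vector along the particle track for given tilt and azimuth angles.

    Assumed convention: tilt from the z-axis, azimuth from the x-axis in the
    x-y (layout) plane. The travel sense along the track is not modeled.
    """
    tilt = math.radians(tilt_deg)
    azimuth = math.radians(azimuth_deg)
    return (
        math.sin(tilt) * math.cos(azimuth),
        math.sin(tilt) * math.sin(azimuth),
        math.cos(tilt),
    )


if __name__ == "__main__":
    for tilt, azimuth in ((90, 0), (85, 180), (85, 305), (85, 220)):
        x, y, z = strike_direction(tilt, azimuth)
        print(f"tilt {tilt:3d}, azimuth {azimuth:3d} -> ({x:+.3f}, {y:+.3f}, {z:+.3f})")
```

With this convention, a track at 90° tilt and 0° or 180° azimuth runs along the x-axis, the direction in which the LEAP-DICE transistors are aligned.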



Figure 4.22: Layout configuration with projected cross-section regions for DICE at LET =  $3/30~{\rm MeV\cdot cm^2\cdot mg^{-1}}$ . Blue/green region on x-z plane for particle strike direction with 85° tilt /  $305^{\circ}$  azimuth, orange/green on y-z plane for particle strike direction with 85° tilt /  $220^{\circ}$  azimuth.



Figure 4.23: LEAP-DICE storage cell layout with projected error cross-section region on y-z plane for  $85^{\circ}$  tilt and  $180^{\circ}$  azimuth angles.



Figure 4.24: Error cross-section comparison of DICE ( $90^{\circ}$  tilt /  $305^{\circ}$  azimuth and  $90^{\circ}$  tilt /  $220^{\circ}$  azimuth) and LEAP-DICE ( $85^{\circ}$  tilt /  $180^{\circ}$  azimuth) as a function of Linear Energy Transfer.

Figure 4.24 shows the error cross-sections (area in the circuit vulnerable to upsets) of DICE and LEAP-DICE as a function of LET for the selected directions in each layout. The lowest LET upset threshold for LEAP-DICE is almost an order of magnitude larger than the lowest LET upset threshold for DICE (at 305° azimuth). Note that, even at an angle quite far from the worst case (220° azimuth), the LET upset threshold of the DICE is still as low as the upset threshold of the LEAP-DICE. The LEAP-DICE implementation has its lowest LET upset threshold around 85° tilt, whereas the DICE has the minimum at 90° tilt.

The simulated LET upset threshold as a function of tilt angle, at the worst case azimuth angle, is shown in Figure 4.25. The LET upset threshold analysis shows that "LEAP-DICE" has a much higher LET upset threshold than "DICE" for all tilt directions at their worst case azimuth angles.

While the simulations presented here are relatively limited in scope, they show the effectiveness of the LEAP principle and quantify the improvement in LET upset



Figure 4.25: LET upset threshold as a function of tilt angle for LEAP-DICE at an azimuth angle of 180°, and for DICE at an azimuth angle of 305°.

threshold and cross-section for LEAP-DICE compared to DICE. From the previous error cross-section and LET threshold analysis, we conclude that "LEAP-DICE" is more soft-error-resilient than "DICE" based on our TCAD simulation results.

# 4.5 Conclusion

In this chapter, we demonstrated how circuit layout can influence the way radiation-induced charge collection impacts circuit operation. A conventional soft-error-resilient circuit layout can rely on removing excess collected charge through the use of well contact structures, or on reducing the probability of multiple circuit nodes being affected by a single particle strike by placing sensitive circuit nodes far apart from each other. However, both methods incur an area penalty.

We presented a new layout principle for soft error resilience, called LEAP, or Layout Design through Error-Aware Transistor Positioning [Lee et al., 2010]. LEAP looks at circuit interactions within an existing circuit topology, and places transistors in such a way that transistors within the design can help each other reduce the probability of an upset due to a particle strike. In particular, we illustrated how LEAP can enhance the SEU-immune but SEMU-sensitive DICE circuit by creating the SEU-immune and SEMU-resilient LEAP-DICE layout. Our simulations in 90nm bulk technology confirm that the LEAP principle applied to circuit layout can indeed protect certain sensitive SEMU strike directions in the circuit layout. With device feature size and distances between individual transistors shrinking with technology scaling, single particle strikes will have a higher probability of affecting multiple transistors (or SEMU probability) [Seifert et al., 2006; Seifert et al., 2010]. Layout design using LEAP addresses this concern by targeting SEMU charge collection through the use of multiple circuit node interactions to reduce the overall soft error rate.

# Chapter 5

# Test Chip Implementation

This chapter presents the implementation of a test chip containing various radiation-hardened flip-flop designs utilizing circuit and layout techniques for soft error resilience discussed in Chapters 3 and 4. The aim of this chapter is to describe in as much detail as possible the test chip design so that a similar test chip can be designed to evaluate the soft error resilience of flip-flops in any future technology.

The chapter begins with a brief overview of the test chip and the flip-flop designs being tested. The discussion then continues with detailed descriptions of each radiation-hardened flip-flop cell design. Due to different soft error sensitivity for each design, various circuit and layout details for each design are provided so that each design can be faithfully reproduced and tested for soft error resilience. Other circuit implementation details are also given concerning I/O design and clock generation to provide suitable operating conditions for the flip-flop designs under radiation test.

Note that, although the various flip-flops in the test chip are implemented with the best design decisions possible prior to fabrication, it is impossible to accurately assess the overall soft error rate of each design before actual single-event effect testing is conducted on the test chip, since radiation data on the silicon process was not previously available. To ensure soft error resilience, a conservative design approach is taken at the expense of silicon area utilization to minimize charge collection by the circuits under test.

## 5.1 Process Selection

Selection of fabrication process can have a large impact on the radiation tolerance as well as the manufacturing cost of integrated circuits. For CMOS integrated circuits, bulk technology is more susceptible to radiation-induced charge collection and multiple node charge sharing than Silicon-on-Insulator (SOI) technology, but is usually available at a fraction of the cost of the latter. The chosen technology node (i.e. the minimum drawn feature size of the process — typically the minimum polysilicon width) can also strongly impact the soft error resilience of the designs, as designs using smaller transistors have lower critical charge levels in their circuit nodes and are therefore more prone to charge collection. Moreover, technology scaling also reduces the distance between individual transistors, and consequently increases the probability of multiple-node charge collection.

In addition to commercially available processes, there also exist radiation-hardened processes, where transistors and field oxides are made to be more resistant to total-dose device degradation, and where the addition of high density resistors and capacitors can improve the soft error resilience of memory cells [Roche and Gasiot, 2005; Lysinger et al., 2008]. However, radiation-hardened processes are very expensive and are often offered several generations behind mainstream technologies, and fabrication access to these processes may be restricted by government export regulations.

Since the goal of this research is to demonstrate circuit and layout soft error resilience techniques irrespective of fabrication process, we chose to implement our test structures in a commercial low-cost bulk CMOS fabrication process generously provided by National Semiconductor Corporation. This process is a single-well, 5-metal 180nm CMOS process fabricated on an epitaxial substrate.

# 5.2 General Test Chip Description

To validate the soft error resilience of the circuit and layout techniques discussed in Chapters 3 and 4, we implemented several flip-flop designs in 180nm CMOS bulk technology. To ensure that a fair performance comparison can be made between the different designs, each flip-flop design is buffered by a minimum-size input inverter and a minimum-size output inverter. All designs employ a master latch and a slave latch, with independent clocks for the master and slave stages so that either latch can be tested separately. Figure 5.1 shows the master-slave configuration used for most of the designs.

Figure 5.1: Master-slave flip-flop configuration.

The test chip contains the following flip-flop designs (more details on each design can be found in Section 5.3.4):

- 1. BASIC: reference standard D flip-flop implemented with a master D-latch followed by a slave D-latch (both identical designs). All transistors are minimum-size transistors (P/N = 0.68 $\mu$m / 0.28 $\mu$m, L = 0.18 $\mu$m).
- 2. BASIC2: same design as BASIC FF but with double transistor sizes for all transistors except in the input, output and clock inverters. It thus consumes 1.44X power (instead of 2X) and 1X area of BASIC as cell height remains unchanged.
- 3. SCDMR: flip-flop similar to the SCDMR flip-flop shown in Figure 3.10, where each pair of master-slave latches constitutes a flip-flop identical to BASIC.
- 4. QCDMR: master-slave D flip-flop using the QCDMR latch design in Figure 3.12e for both its master and slave latches.
- 5. DIFF: master-slave D flip-flop using the DCVSL latch design in Figure 3.15c for both its master and slave latches.

- 6. DICE: master-slave D flip-flop using the DICE latch design in Figure 3.14 for both its master and slave latches with standard transistor placement in Figure 4.16a.
- 7. LEAP-DICE: flip-flop using DICE latch in Figure 3.14 with the new LEAP-DICE layout from Figure 4.16b.

| Flip-Flop Design | Transistor Count | Layout Area | Power | Average Clock-to-Output Delay |
|------------------|------------------|-------------|-------|-------------------------------|
| BASIC            | 24               | 1.00        | 1.00  | 1.00                          |
| BASIC2           | 24               | 1.00        | 1.44  | 0.97                          |
| SCDMR            | 60               | 2.33        | 2.16  | 1.37                          |
| QCDMR            | 84               | 3.00        | 3.76  | 1.80                          |
| DIFF             | 56               | 2.42        | 3.37  | 2.48                          |
| DICE             | 52               | 1.67        | 1.50  | 1.06                          |
| LEAP-DICE        | 52               | 2.33        | 1.54  | 1.07                          |

Table 5.1: Normalized performance comparison for various flip-flop designs. The numbers are obtained from post-layout simulation at 40 MHz and 1V supply.

Table 5.1 lists the performance parameters from our circuit simulations. Each flip-flop was drawn in Cadence Virtuoso using 180nm CMOS technology, and post-layout parasitic extraction was performed on each layout design using a Cadence Assura RCX extraction script provided by National Semiconductor. Cadence Spectre was used to simulate the flip-flops using the extracted netlists. The simulation setup is shown in Figure 5.2. For each simulation, the supply voltage was set at 1V, and two 40-MHz non-overlapping clocks drove the master and slave stages of the flip-flops. The delay, area and power performance numbers of all designs are normalized to the reference design, "BASIC". Each design drives a 4X minimum-size inverter load; this load is included in the power simulation and accounts for about 40% of the power consumed by the BASIC design.

Figure 5.2: Simulation test bench for post-layout timing and power measurements. All inverters are minimum sized except for the final inverter at the output (sized at 4X minimum size).

Once the test flip-flop designs are implemented in layout, the flip-flops are organized in arrays for testing. Each array consists of 144 rows of identical flip-flops, with each row containing between 16 and 32 flip-flops depending on the size of each flip-flop (larger designs have fewer cells per row so that each array size is identical). Inside each array, the flip-flops are connected as a single scan chain of between 2,304 and 4,608 flip-flops in a snake-like fashion (see Figure 5.3a). The flip-flop array is supported by its own I/O circuitry and clock drivers, and together the different circuit blocks form a self-contained test chip for a particular flip-flop design (see Figure 5.3b). The actual test silicon contains 8 independent test chips named according to the flip-flop design being tested ("BASIC", "BASIC2", "SCDMR", "QCDMR", "DIFF", "DICE" and "LEAP-DICE"; the eighth, "N/A", is not included in the discussion of this work). Figure 5.4 shows the die photograph of the 5mm × 5mm fabricated test chip using a 180nm bulk CMOS process from National Semiconductor. To maximize area utilization, the length of each scan chain (2,304 to 4,608 flip-flops) is inversely proportional to the cell area of the design. Each scan chain has its own I/O and core supplies so that each chain can be independently tested. The non-overlapping clocks for the master and slave stages of the flip-flops, MCLK and SCLK respectively, can be supplied from an external source (Figure 5.5) or generated internally (see Section 5.3.5 for discussion on the non-overlapping clock generation).
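The array dimensions quoted above fix the scan chain lengths. As a minimal sketch, the following Python snippet reproduces that arithmetic and generates one possible snake-like scan order. The function names are illustrative, and the choice to reverse direction on odd-numbered rows is an assumption, since the exact ordering is defined by Figure 5.3a.

```python
ROWS_PER_ARRAY = 144  # rows of identical flip-flops per array (from the text)


def chain_length(cells_per_row: int) -> int:
    """Total number of flip-flops in one scan chain for a given row width."""
    if not 16 <= cells_per_row <= 32:
        raise ValueError("the text gives 16 to 32 flip-flops per row")
    return ROWS_PER_ARRAY * cells_per_row


def snake_order(rows: int, cols: int) -> list:
    """Return (row, col) positions in scan order.

    Assumption: even rows run left to right and odd rows right to left,
    so consecutive flip-flops in the chain are always physical neighbors.
    """
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


shortest = chain_length(16)  # widest cells, fewest per row
longest = chain_length(32)   # narrowest cells, most per row
small_example = snake_order(2, 3)
```

With 16 and 32 cells per row, the chain lengths evaluate to the 2,304 and 4,608 flip-flops quoted in the text.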



Figure 5.3: Test chip organization. (a) Flip-flop array. (b) Test chip.

## 5.3 Test Chip Implementation Details

To design a general purpose Application-Specific Integrated Circuit (ASIC) digital chip, it is often useful to determine a layout design methodology before the actual design is drawn for silicon. Due to the complexity of modern-day digital designs (often amounting to millions of transistors or more per square millimeter), it is nearly impossible to know beforehand how the design will be organized in layout without an effective design abstraction strategy. To facilitate the design of complex logic blocks, we chose to use a cell-based layout methodology, where commonly used logic gates or small logic cells are organized into a library called a Standard Cell Library. The standard-cell-based design approach allows one designer to focus on the high-level implementation of a digital design while another designer focuses on the layout implementation of the individual cells. Since the standard cells can be reused for different designs, high-level designs can be translated into netlists of standard cells, which are then placed on a suitable placement grid of common size and connected (or routed) by metal wires.

Figure 5.4: Die photograph of the 180nm bulk test chip.

### 5.3.1 Standard Cell Layout Style

#### Layout Grid

Figure 5.5: Test chip clocking scheme, with a scan chain of flip-flops using two separate external clocks, MCLK and SCLK, clocking the master and slave stages of the flip-flop, respectively.

Before making the standard logic cells suitable for placement, we must first determine the size of the routing grid. The routing grid is defined as a square (or in some cases rectangular) placement grid for metal wires, where:

- 1. The routing grid is only valid for metal wires of fixed width from metal routing layers (metal 1, metal 2 etc.). Also, the different metal routing layers must share a common routing grid so that different metal layers can be connected by metal vias. However, one metal layer can have a smaller routing grid than another as long as they share common grid points.
- 2. Each metal wire is drawn using Manhattan distances (or unit grid distances) on its routing grid, with the center of the metal wire lying on the placement grid.
- 3. Each metal layer has a preferred orientation for metal wires. It is customary to have alternating orientations for consecutive metal layers. For example, the metal 1 layer wires can be routed horizontally, while metal 2 layer wires can be routed vertically, metal 3 again horizontally, and so on. Note that metal layers are named in ascending number according to their proximity to the silicon substrate (i.e. metal 1 layer is the lowest metal routing layer).

- 4. To connect metal wires between different layers, the two metal wires must intersect on a common grid point where an inter-metal via is placed to connect both metal wires. As a result, vias are always placed on common grid points.
- 5. A correctly placed metal wire never violates any layout distance or sizing rule.

The minimum routing grid size for any given metal layer can be easily determined using the following relationship:

$\text{routing grid size} \geq \text{min. via (or wire) width} + \text{min. metal spacing}$

Figure 5.6 shows a routing grid example, where metal 1 wires are drawn horizontally and metal 2 wires are drawn vertically. For the test chip implementation, the routing grid size is set to 0.72  $\mu$ m × 0.72  $\mu$ m.
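The relationship above can be checked with a few lines of Python. This is only a sketch: it assumes that "via (or wire) width" means the larger of the two minimum widths, and the design-rule values in `EXAMPLE_RULES` are placeholders chosen for illustration, not rules from the actual process. Only the 0.72 $\mu$m grid size comes from the text.

```python
def min_routing_grid_nm(min_wire_nm: int, min_via_nm: int, min_spacing_nm: int) -> int:
    """Lower bound on the routing pitch, in nanometers.

    Assumption: the widest shape that must sit on a grid point (wire or via)
    sets the width term of the relationship.
    """
    return max(min_wire_nm, min_via_nm) + min_spacing_nm


def grid_is_legal(grid_nm: int, min_wire_nm: int, min_via_nm: int, min_spacing_nm: int) -> bool:
    """True if wires centered on adjacent grid lines keep the minimum spacing."""
    return grid_nm >= min_routing_grid_nm(min_wire_nm, min_via_nm, min_spacing_nm)


# Placeholder design rules in nanometers (hypothetical, NOT from the real process).
EXAMPLE_RULES = dict(min_wire_nm=300, min_via_nm=340, min_spacing_nm=320)

TEST_CHIP_GRID_NM = 720  # 0.72 um x 0.72 um grid used for the test chip (from the text)

lower_bound = min_routing_grid_nm(**EXAMPLE_RULES)
chosen_grid_ok = grid_is_legal(TEST_CHIP_GRID_NM, **EXAMPLE_RULES)
```

Working in integer nanometers avoids floating-point rounding when comparing a pitch against the sum of two rule values.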

#### Standard Cell Layout

Once the routing grid is set, we can start designing cells for the standard cell library. From a high-level layout design perspective, standard cells are viewed as small black boxes where only the cell placement boundaries and the location of the input/output (I/O) pins are visible to the layout designer. In order to allow a smooth placement and routing of these standard cells, both the cell placement boundaries as well as the I/O pins of the cells must lie on the routing grid. Note that the cell placement boundary does not necessarily correspond to the actual cell boundary, as some cells can share parts of their borders with their neighbors. Figure 5.7 shows the minimum drive strength standard inverter cell used in the test chip implementation. This cell employs guard rings (as discussed in Section 4.2.1) to reduce overall radiation-induced charge collection.

Standard logic cells also usually share a common cell height so that they can be placed next to each other in rows of identical height. After the cells are placed, routing is done on metal layers above the cells to avoid shorts between routing wires and wires internal to the cells. To facilitate wiring in logic blocks with hundreds to thousands of cells, horizontal wires reside in one layer (M3) while vertical wires reside in a different layer (M2). Wire tracks in the same direction are analogous to highway lanes, and signals can travel horizontally or vertically by moving up and down metal layers through vias for maximum track utilization. Using pre-determined wire tracks, integrated circuit layout routing software can perform automatic wire routing. Figure 5.8 shows an example of complex wire routing done on a group of placed standard cells.

Figure 5.6: Routing grid example.



Figure 5.7: Standard cell inverter layout example. The grid size is 0.72 $\mu$m × 0.72 $\mu$m.



Figure 5.8: Complex routing example. Only metal 2 (vertical) and metal 3 wires (horizontal) are visible. Cell placement boundaries are shown in bright yellow. This example is taken from an actual 20-bit  $\times$  20-bit multiplier.

#### Enclosed-Geometry Transistor Layout

For radiation testing involving any large deposition of radiation dose (such as proton testing), it is possible to protect against the total-dose transistor degradation discussed in Section 2.3 by employing a special layout technique called Enclosed-Geometry Transistor layout [Mavis and Alexander, 1997; Nowlin et al., 2005]. By completely surrounding one of the diffusion terminals (drain or source) of the NMOS transistor with the polysilicon gate, the thick silicon oxides around the transistor channel are removed, which eliminates the high leakage current paths induced by total dose (see Figure 5.9). In our standard cell library, we used enclosed-geometry (also called "ringed") NMOS transistors in all cells to prevent early logic failure at low supply voltage due to the exponential increase in total-dose leakage current and the substantial transistor threshold shift. Although PMOS transistor thresholds are also affected by total dose effects, PMOS leakage currents do not increase with radiation dose and therefore do not cause circuit failure. Figure 5.10 shows an example of enclosed-geometry layout in standard cell layout design.

Figure 5.9: Total dose current leakage effect on standard two-edged vs. enclosed-geometry transistors from Nowlin et al. [2005]. The labels 128/1 ("wide") and 2/16 ("long") indicate relative transistor sizes (1=minimum length). (a) Two-edged transistors showed an exponential increase in leakage current when the gate voltage over-drive is less than zero (VG less than 0.5V) with increasing radiation dose. The transistor OFF current (leakage current) can be as high as 1% of the ON current. (b) Enclosed-ring transistors do not show significant change in leakage current. Curves labeled (W/L=128/1) belong to the "wide" transistor under test, while curves labeled (W/L=2/16) belong to the "long" device under test.

Comparing the relative sizes of the transistors in Figure 5.10a and Figure 5.10b, we observe that ringed transistors require a larger layout than standard "two-edged" transistors. In fact, in our fabrication process, the smallest ringed transistor width (1.12 $\mu$m) is four times the minimum two-edged transistor width (0.28 $\mu$m). The increase in minimum transistor size can therefore lead to a more than threefold increase in zero-dose total power consumption for digital circuit designs using this approach. However, analog designs are seldom affected by ringed geometries, since transistor sizes in analog circuit design tend to be relatively large.

Figure 5.10: Enclosed-geometry transistor layout. (a) Regular "two-edged" transistor layout. (b) Enclosed-source transistor layout. (c) Two-input NAND gate using regular NMOS transistor layout. (d) Two-input NAND gate using enclosed-source NMOS transistor layout.

### 5.3.2 Standard Cell Library

To enable design reuse of digital design blocks, it is useful to build a Standard Cell Library, or a collection of commonly used low-level logic functions. As previously discussed in Section 5.3.1, these cells have a common fixed height (which is a multiple of the routing grid size) and a variable width (also a multiple of the routing grid size), allowing cells to be placed in rows. For each given logic function, multiple cells can be designed that share the same functionality but differ in drive strength, with the drive strength measured in units of the smallest driving gate. For example, a simple inverter can be implemented as inverters with drive strength of 1X (the smallest inverter in the library), 2X (equivalent to two 1X inverters connected in parallel), 3X, and so on. This way, the designer can choose discrete drive strengths for each logic function without worrying about sizing individual transistors inside each logic gate. We implemented the following cells in our standard cell library:

- 1. Basic logic gates. These logic gates are single-stage logic gates (of up to four inputs each). Ex: INV (inverting buffer), BUF (non-inverting buffer), NAND2 (two-input NAND gate), AOI21 (three-input AND-OR-INVERT gate).
- 2. Composite logic gates. These logic gates are made up of individual single-stage logic gates to perform commonly used small logic functions. Ex: MUX2 (two-input multiplexer), FA (full adder).
- 3. Fill cells or dummy cells. Fill cells are dummy cells without any logic function to "fill" the empty space in silicon in order to maintain the layout density of some fabrication layers (such as diffusion, metal, polysilicon) and prevent abrupt changes in layout density. Layout density rules guarantee that the local and global density of some fabrication layers will fall within a certain permissible range to reduce intra-die variation of fabricated structures (such as transistors, capacitors etc.). Examples of fill cells include decoupling Metal-Oxide-Silicon (MOS) or polysilicon capacitors used to fill the polysilicon and diffusion layers and reduce noise on the supply and ground lines, or supply line cells which connect the supply lines of different rows of standard cells together.

### 5.3.3 I/O Cells

Figure 5.11: Input/output (I/O) cell structure. (a) Input cell structure. (b) Output cell structure.

Input/output (I/O) cells form an integral component of a functional chip. They serve as a buffer between external off-chip signals and internal core signals, and also provide some protection against damage from electrostatic discharge. An I/O cell typically comprises the following components (see Figure 5.11):

- A signal bond pad (or I/O pad), where an off-chip signal from the package housing the silicon die can be connected via a bond wire;
- An ESD protection circuit;
- A level shifter converting the voltage levels of I/O signals to that of core signals;
- Buffers to drive the signal over a large capacitive load.

The I/O cells in a chip can then be placed in a ring surrounding the core circuitry to facilitate wire bonding.

#### Electrostatic Device Structures (ESD)

Electrostatic discharge (ESD) is a major reliability problem in integrated circuits, and typically manifests itself as a large deposition of electrostatic charge (often due to human touch or "plug-in" of the device) from an external source to the silicon, resulting in an electrical overstress and possible destruction of the circuit structures in silicon. To protect against ESD, we employ standard diode- and CMOS transistor-based ESD protection in our I/O circuits from [Beebe, 1998, Chapter 1], shown in Figure 5.12. To prevent single-event latchup, double guard rings surround the ESD diodes. Ringed geometry is used for the diode-connected ESD MOSFETs to reduce total dose leakage current.

Figure 5.12: ESD protection circuit. (a) Schematic. (b) Layout.

#### I/O Voltage Conversion

To allow lower supply operation for our test circuits, it is necessary to convert large-swing input/output ("I/O") digital signals into small-swing internal ("core") digital signals for use in our internal test circuits, and vice versa. Allowing different "core" voltages adds an additional degree of freedom for testing the soft error performance of our test circuits, since the core supply determines how fast circuits respond to single-event charge generation, as well as how much charge is collected thereafter. We designed the test chip to tolerate external I/O signals of up to 2.5V, with a conversion factor of about 3X for the internal signals. For the 180nm CMOS fabrication process, standard CMOS transistors (used in the core) can only tolerate up to 1.8V supply, while high-threshold CMOS transistors (used in I/O circuits) can tolerate up to 2.5V supply. Using our I/O conversion circuitry, for 2.5V I/O operation, the internal core voltage can be set to as low as 0.9V. For 1.8V I/O operation, the internal core voltage can be set to as low as 0.6V. Table 5.2 summarizes the I/O range of operation.

| I/O Voltage | Minimum Core Voltage | Maximum Core Voltage |
|-------------|----------------------|----------------------|
| 1.8V        | 0.6V                 | 1.8V                 |
| 2.5V        | 0.9V                 | 1.8V                 |

Table 5.2: Test chip I/O voltage settings.
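The operating ranges in Table 5.2 lend themselves to a simple configuration check before a test run. The sketch below is illustrative only: the lookup table copies the values from Table 5.2, while the function names are hypothetical and not part of any test software described in this work.

```python
# I/O supply (V) -> (minimum core supply, maximum core supply), from Table 5.2.
CORE_RANGE_BY_IO = {
    1.8: (0.6, 1.8),
    2.5: (0.9, 1.8),
}


def core_voltage_allowed(v_io: float, v_core: float) -> bool:
    """True if v_core lies inside the supported core range for the given I/O supply."""
    if v_io not in CORE_RANGE_BY_IO:
        raise ValueError(f"unsupported I/O voltage: {v_io} V")
    v_min, v_max = CORE_RANGE_BY_IO[v_io]
    return v_min <= v_core <= v_max


def max_conversion_factor(v_io: float) -> float:
    """Ratio of the I/O swing to the smallest supported core swing."""
    v_min, _ = CORE_RANGE_BY_IO[v_io]
    return v_io / v_min


factors = {v_io: round(max_conversion_factor(v_io), 2) for v_io in CORE_RANGE_BY_IO}
```

The resulting ratios, 3.0 at 1.8V I/O and about 2.78 at 2.5V I/O, are consistent with the conversion factor of about 3X stated in the text.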

To convert the digital signals between the different voltage domains, we implemented new I/O conversion circuits modified from the level-up and level-down shifters from Wang et al. [2001]. The I/O circuits require three supply voltages: $V_{\rm DD}$ (core supply), $V_{\rm DDIO}$ (external I/O supply) and $V_{\rm MID}$ (an intermediate supply between $V_{\rm DD}$ and $V_{\rm DDIO}$ for more flexible voltage conversion). $V_{\rm MID}$ can be tied to the same voltage as $V_{\rm DD}$ if left unused. Figures 5.13 and 5.14 show the level-down and level-up shifters implemented in our test chip.

#### Pad Frame

Once the I/O cells are designed in layout, we can assemble them to form a pad ring, or a ring of I/O pads around the core circuitry, where bond pads are located on the edge of the silicon die for easier wire bonding. In our test chip, we chose the bond pad size to be 75 $\mu$m × 75 $\mu$m, with a minimum pad-to-pad spacing of 75 $\mu$m. This pad size and spacing are appropriate for standard manual wire bonding. To allow more signal bond pads per side, we use staggered-pitch bond pad placement in our test chip (see Figure 5.15).

Figure 5.13: Level-down shifter. (a) Transistor symbols. (b) Level-down shifters.

### 5.3.4 Flip-Flop Designs under Test

Before going into the details of the various test flip-flop designs implemented in silicon, we define the commonly used schematic symbols shown in Figure 5.16 to simplify the subsequent circuit schematics presented in this section. In all flip-flop designs, identical master and slave latches are used, and all designs except the SCDMR design follow the inverter-latch-latch-inverter configuration shown in Figure 5.1. Each design is implemented in the standard cell layout style described in Section 5.3.1, using standard "two-edged" transistors because of the significant layout penalty of ringed-geometry transistors. Each design also has the same I/O signals: D (data input), Q (flip-flop output), CLK (clock for the slave latch) and CLKB (clock for the master latch).



Figure 5.14: Two-stage level-up shifter.



Figure 5.15: Staggered-pitch pad placement.

#### Standard D Flip-Flop ("BASIC" and "BASIC2")

The Standard D Flip-Flop "BASIC" is the reference D flip-flop design to which the circuit and soft error performance of all other designs under test is compared. The "BASIC" D flip-flop (shown in Figure 5.17) is a master-slave flip-flop made of identical modified C<sup>2</sup>MOS D latches (shown in Figure 5.18) [Stojanovic and Oklobdzija, 1999]. In the "BASIC" design, all transistors are minimum-size transistors (P/N = 0.68 $\mu$m / 0.28 $\mu$m, L = 0.18 $\mu$m). We also implemented a "BASIC2" D flip-flop with the same circuit topology as "BASIC", but with double transistor widths for all transistors except those in the input, output and clock inverters, to preserve the same input and output loading. The "BASIC" design represents the "minimal unprotected D flip-flop design", whereas the "BASIC2" design establishes the lower bound for the speed and power performance of the DMR flip-flop designs discussed next.

Figure 5.16: Circuit symbols commonly used in Section 5.3.4. (a) Inverter. (b) Clocked inverter. (c) Keeper-less C-Element from Figure 3.8a. (d) Two-input multiplexer.

Figure 5.17: Standard "BASIC" D flip-flop.

Figure 5.18: Standard D latch [Stojanovic and Oklobdzija, 1999].

#### Single C-Element DMR Flip-Flop ("SCDMR")

The "SCDMR" flip-flop (shown in Figure 5.19) is a flip-flop design modified from the BISER design in [*Mitra et al.*, 2005]. The "SCDMR" design concept was previously discussed in Section 3.4.1. This dual modular redundancy-based design is virtually equivalent to two identical "BASIC" flip-flops voted by a two-input C-Element at the output. All transistor sizes in the "SCDMR" flip-flop are identical to the transistor sizes found in the "BASIC" flip-flop, with the exception of the weak inverter, which uses a much longer transistor length (five times the minimum transistor length).

Figure 5.19: "SCDMR" flip-flop, similar to the BISER flip-flop from *Mitra et al.* [2005].
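The voting behavior of the output C-Element can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions rather than a model of the actual circuit: the C-Element is treated as non-inverting, and the weak keeper is represented by a held value. The function names are hypothetical.

```python
def c_element(a: int, b: int, held: int) -> int:
    """Two-input C-Element: follow the inputs when they agree, otherwise hold.

    `held` stands in for the value retained by the weak keeper at the output
    (assumption: non-inverting polarity, chosen for readability).
    """
    return a if a == b else held


def replay(input_pairs, initial: int) -> list:
    """Apply a sequence of (copy_a, copy_b) values and record the voted output."""
    out = initial
    trace = []
    for a, b in input_pairs:
        out = c_element(a, b, out)
        trace.append(out)
    return trace


# Both redundant copies store 1, then copy_b is upset, then it recovers.
single_upset = replay([(1, 1), (1, 0), (1, 1)], initial=0)

# If both copies are upset by the same strike, the error passes through.
double_upset = replay([(1, 1), (0, 0)], initial=0)
```

In this model, an upset in a single copy leaves the output unchanged, whereas a simultaneous upset of both copies flips it, which is why multiple-node charge collection matters for DMR designs.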

#### Quadruple C-Element D Flip-Flop ("QCDMR")

The "QCDMR" flip-flop (shown in Figure 5.20) is a master-slave flip-flop made of identical "QCDMR" latches (shown in Figure 5.21) modified from the 4-TAG latch [Shuler et al., 2005]. The "QCDMR" design concept was previously discussed in Section 3.4.1. All transistors in the "QCDMR" design are also minimum-size transistors (P/N = 0.68 $\mu$m / 0.28 $\mu$m, L = 0.18 $\mu$m) like the aforementioned designs.

#### Differential D Flip-Flop ("DIFF")

The DCVSL logic style, discussed in Section 3.4.3, is a rather unusual circuit design style with the soft error advantage of limiting the propagation of single-event transients to up to two logic stages [Casey et al., 2005]. However, DCVSL logic has unbalanced rising and falling output transition times. The unevenness of the output transition times is related to the fact that the bulk of the transistor logic function resides in NMOS logic (responsible for falling transitions) which drives its output first before the PMOS logic (responsible for rising transitions) can drive its output through circuit feedback from the NMOS logic output.

Figure 5.20: "QCDMR" Flip-Flop.

We implemented a DCVSL flip-flop called "DIFF" using mostly differential inverters and clocked inverters shown in Figure 5.22 and Figure 5.23. Since SETs can propagate through one logic stage in DCVSL logic, a logic value (represented by a pair of two differential circuit nodes having opposite binary values) must be stored three times inside an internal storage loop. This way, in the event of a single particle strike affecting one circuit node, at most two stored logic values (the location of the original strike plus one propagated stage) are affected, and the unaffected logic value can restore the circuit back to its original state. Figure 5.24 shows our soft-error-resilient DCVSL latch design ("DIFF"). To prevent SETs in the "DIFF" latch from propagating to the latch output which may be connected to standard CMOS logic gates susceptible to SET propagation (unlike DCVSL), we inserted a three-input DCVSL AND gate (shown in Figure 5.25) at the latch output. We then configured the DCVSL "DIFF" flip-flop as a master-slave flip-flop using identical "DIFF" latches (see Figure 5.26).



Figure 5.21: "QCDMR" Latch, similar to the 4-TAG Latch from Shuler et al. [2005].

#### DICE D Flip-Flop ("DICE") and LEAP-DICE Flip-Flop ("LEAP-DICE")

The Dual Interlocked Storage Cell (DICE), discussed in Section 3.4.2, is an efficient DMR-based storage cell resilient to single event upsets affecting single circuit nodes [Calin et al., 1996]. The 8-transistor DICE configuration from Figure 3.14 is suitable for regular SRAM operation, but needs to be modified for high-speed latch or flip-flop operation. We implemented the clocked version of the DICE latch in Figure 5.27, and configured the DICE D flip-flop as a master-slave flip-flop using two identical clocked DICE latches in Figure 5.28.



Figure 5.22: DCVSL inverter.



Figure 5.23: Clocked DCVSL inverter.

The "LEAP-DICE" D flip-flop uses the same circuit topology as the "DICE" D flip-flop. However, the layout arrangement for "LEAP-DICE" is quite different from that of "DICE". The layout for the "DICE" flip-flop is shown in Figure 5.29, with a close-up view of the clocked "DICE" latch in Figure 5.30. Similarly, the layout for the "LEAP-DICE" D flip-flop is shown in Figure 5.31, with a close-up view of the clocked "LEAP-DICE" latch shown in Figure 5.32. To highlight the relative positions of the original eight sensitive diffusion nodes (named n1-n8) from the DICE and LEAP-DICE layout configurations in Figure 4.16, we labeled them in Figure 5.30 and Figure 5.32.



Figure 5.24: DCVSL latch ("DIFF"). The three-input DCVSL AND gate at the output is used to filter SETs in the latch internal loop.



Figure 5.25: Three-input DCVSL AND gate.



Figure 5.26: DCVSL "DIFF" flip-flop.



Figure 5.27: Clocked "DICE" latch.



Figure 5.28: "DICE" D flip-flop.



Figure 5.29: "DICE" D flip-flop layout.



Figure 5.30: Close-up of the clocked "DICE" latch layout.



Figure 5.31: "LEAP-DICE" flip-flop layout.



Figure 5.32: Close-up of the clocked "LEAP-DICE" latch layout.



Figure 5.33: Non-overlapping clock generation. The outputs CLK and CLKB are used in the slave and master stages of the test flip-flops respectively.

### 5.3.5 Clock Generation and Distribution

To allow the master and slave latch stages of the flip-flop designs to be tested individually and to prevent race-through conditions, the JEDEC JESD89A standard suggests feeding the flip-flops with non-overlapping clocks [JEDEC Standard, 2006]. Figure 5.33 shows the non-overlapping clock generator used in the test chip. The clock generator takes in an external 50% duty-cycle clock signal "CLK-IN", and generates two non-overlapping active-high clocks "CLK" and "CLKB", with a minimum of 100 ps of non-overlapping phase.

To drive the large combined clock load on all the cells in the flip-flop array, we implemented a clock distribution tree for each of the non-overlapping clocks using the standard "fanout of four" (FO4) rule (i.e. each inverter can drive up to a load equivalent of four inverters of the same size).
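The fanout-of-four rule translates directly into a buffer depth estimate. The following sketch counts how many inverter stages are needed when each stage is four times larger than the previous one. It assumes an idealized model in which the total clock load is expressed in units of the first inverter's input load, and it ignores wire capacitance. The per-flip-flop clock load of one unit used in the example is a placeholder, not a figure from the test chip.

```python
def fo4_stage_count(load_units: int, fanout: int = 4) -> int:
    """Number of inverter stages needed to drive `load_units`.

    Stage i has relative size fanout**i, so a chain of n stages can drive
    up to fanout**n units of load without exceeding the fanout limit.
    """
    if load_units < 1:
        raise ValueError("load must be at least one unit")
    stages, drive = 1, fanout
    while drive < load_units:
        drive *= fanout
        stages += 1
    return stages


def stage_sizes(load_units: int, fanout: int = 4) -> list:
    """Relative size of each stage, from the clock source to the final driver."""
    return [fanout ** i for i in range(fo4_stage_count(load_units, fanout))]


# Hypothetical example: 4,608 flip-flops, each presenting one unit of clock load.
stages_for_longest_chain = fo4_stage_count(4608)
```

An odd number of inverting stages inverts the clock, so in practice a stage may be added or resized to restore the intended polarity. Integer arithmetic is used instead of a logarithm to avoid rounding errors at exact powers of four.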

## 5.4 Conclusion

This chapter described the implementation of a test chip containing various soft-error-resilient flip-flop designs in a commercial 180nm CMOS process. The flip-flop designs utilize soft error resilience techniques discussed in Chapters 3 and 4. The methodology for the test chip implementation presented here is applicable to the soft error evaluation of any sequential cell design in any given technology node. It is important to note that design for soft error robustness is not limited to circuit and layout techniques inside the sequential cells. Rather, robust design must consider all layers of abstraction, from circuit layout to chip organization. Ultimately, the soft error resilience of the various presented sequential cell designs can only be verified by undergoing actual accelerated radiation testing, the focus of the next chapter.

# Chapter 6

# **Experimental Setup and Results**

This chapter presents a design and testing framework for soft-error-resilient sequential cells by quantifying the performance trade-offs of the circuit and layout resilience techniques (presented in Chapters 3 and 4) in the new design space of soft error resilience, power, delay, and area. The chapter first describes the experimental setup we used for the accelerated radiation testing of the various sequential cells under study, then presents the soft error results obtained from the testing. From the experimental results, we demonstrate that our SEU-immune and SEMU-resilient LEAP-DICE design achieved the best soft error performance among all techniques investigated in this study, and deduce that a mixture of SEU-immune circuit techniques and SEMU-resilient layout techniques will be needed for future silicon process generations to mitigate the growing concern over SEMUs due to device scaling. We also present a new way of evaluating each sequential cell in the new soft error design space by introducing an easy-to-use soft-error metric called Soft Error Resilience. Additionally, this study discovered new soft error dependence on circuit operating conditions such as supply voltage, clock frequency and total dose exposure. These new effects lead us to conclude that all possible circuit operating conditions, pertaining to the lifetime of the application in which the sequential cells are used, must be considered during the soft error performance evaluation of sequential cells.

## 6.1 Radiation Experimental Setup

To evaluate the soft error resilience of electronics, accelerated radiation testing must be performed on a circuit. An hour of accelerated radiation testing can lead to soft error results similar to those from electronics exposed for years in the normal environment of operation. Accelerated radiation testing, unlike typical electronic device testing, requires considerable care to avoid unnecessary radiation damage and interference to the test equipment, which is often not radiation-hardened and is difficult or costly to replace. Therefore, this section presents our experimental setup suitable for testing sequential cells under irradiation, and explores the various implementation choices we made for the test setup in this study.

In radiation testing, the device under test is placed directly in front of the particle beam. To avoid radiation damage to anything other than the test chip, all test equipment is placed as far as possible from the test beam and the test chip. Very often, the radiation beam resides in a specialized radiation chamber which is extremely dangerous to the human user while the beam is in operation. Therefore, the user must be able to maintain control of the test equipment from outside the radiation chamber during irradiation, and the test setup must ensure that test signals to and from the test device can be reliably transmitted through very long cables. The remote control nature of the test setup also requires the system to be monitored remotely and to maintain some autonomous test functions, which further complicates the system setup.

Based on these needs, we created a reliable single-event radiation testing platform comprising the following components:

- 1. A test chip socket board containing the test chip under irradiation.
- 2. A Complex Programmable Logic Device (CPLD) board responsible for buffering test chip signals over long distances.
- 3. A Field-Programmable Gate Array (FPGA) board capable of creating complex or user specified test patterns for the test chip as well as recording test results.

- 4. A real-time analog acquisition board monitoring test chip supply voltages (to make sure no single-event latchup has occurred).
- 5. A test computer capable of commanding the FPGA board to set up various radiation tests.

In this system, the test chip socket board and the CPLD board are linked through 10-meter SCSI cables. The CPLD and FPGA boards are linked through 50-cm ribbon cables, and the test computer commands the FPGA board through a 10-meter LAN cable. The use of long cables allows the user to operate and monitor the test equipment from outside the radiation chamber, and sufficiently isolates most of the test equipment (except the test chip socket board) from the radiation beam to prevent radiation damage and interference. Figure 6.1 shows our experimental setup for single event testing.

For radiation testing purposes, a test chip socket board containing only the test chip socket and passive circuit elements (decoupling capacitors and line matching resistors) was built to house the test chip under irradiation (see Figure 6.2). The rest of the setup is shown in Figure 6.3. This test board is intended to be placed directly in front of the radiation beam. To prevent unintended radiation effects (SEUs, SELs etc.) from interfering with the radiation test setup, we avoided placing any active circuit element other than the test chip on the socket board, so that any error or malfunction observed during irradiation can only be attributed to the test chip. The test chip signals are then transmitted and received through 30-foot shielded SCSI cables, which allows all other test equipment (which is not radiation hardened) to be placed as far away as possible within the radiation chamber, or even outside the chamber, in order to minimize equipment failure due to radiation. In the lab environment, we were able to transmit digital signals of up to 100 kHz through these highly shielded cables without any signal adjustment or equalization.

To communicate with the test chip and generate the necessary test stimuli, we used the Digilent XUPV2 FPGA development system board from *Digilent Inc.* [2011]. The Digilent board contains a Xilinx Virtex-2 Pro XC2VP30 FPGA with 30,816 logic cells and 2,448 Kb of block Static Random Access Memory (SRAM), plus various data communication ports such as a 10/100 Mb Ethernet port and expansion connectors where the test chip signals can be sent and received.

Figure 6.1: Test environment for integrated circuits radiation testing. The yellow area defines the radiation chamber area where no user can safely stay without suffering serious health consequences during irradiation.

Figure 6.2: Test chip socket board. The board must be held perpendicularly to the radiation beam for irradiation testing. Here two such boards are stacked back to back during neutron testing at Los Alamos National Laboratory (LANL). An overhead hairdryer is used to increase the operating temperature in an attempt to increase the number of soft errors.

Figure 6.3: Test bench setup located away from the radiation beam (picture taken at LANL). At Indiana University Cyclotron Facility (IUCF), the beam was located further away, and the CPLD and FPGA boards had to be located inside the radiation chamber.

Figure 6.4: Close-up of FPGA and CPLD boards. Each FPGA board could connect to two CPLD boards, each connected to a separate test chip socket board through 50-feet SCSI cables.

Since the test signals produced by the FPGA board do not have sufficient signal strength to drive the 30-ft SCSI cables, and have a higher voltage range (0V-3.3V) than the signal range required by the test chip (0V-1.8V), we used a custom-designed Complex Programmable Logic Device (CPLD) board to convert the FPGA test signals into suitable signals for the test chip. The CPLD acts as a small programmable buffer device for the test signals, and the CPLD board can generate various supply voltages ranging from 0 to 1.8V to power and bias the test chip. In addition, during the initial stages of test chip debugging, the CPLD board served as a simple test pattern generator before the entire testing platform was ready. Both the CPLD and FPGA boards are shown in Figure 6.4. Each FPGA board can connect to up to two CPLD boards simultaneously, allowing two different test chips to be tested at once.

## 6.2 Radiation Experimental Results

For this study, we performed two sets of tests following guidelines from the JEDEC JESD89A Standard [*JEDEC Standard*, 2006]:

- 1. Static testing: a test data pattern ("all 0 bits" or "all 1 bits") is first loaded into the flip-flops using non-overlapping clocks. Clocks are then disabled during irradiation while the test chip is still powered. After the irradiation reaches a certain fluence (number of particles passing through a unit area), the data pattern is read back by enabling the clocks again.
- 2. Dynamic testing: the test data pattern is continuously loaded into the flip-flops at a set frequency during irradiation while errors are counted. The "all 0 bits" and "all 1 bits" patterns are also used in this testing. The clock frequency can be dynamically set.
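Both procedures come down to comparing the pattern read back from the flip-flop chain with the pattern that was loaded. The following is a minimal sketch of that comparison, not the FPGA firmware used in this study; the function name, the 16-bit chain length, and the example readback are illustrative assumptions.

```python
def count_bit_errors(expected_bits, readback_bits):
    """Count positions where the readback differs from the loaded pattern."""
    if len(expected_bits) != len(readback_bits):
        raise ValueError("patterns must have the same length")
    return sum(e != r for e, r in zip(expected_bits, readback_bits))


# Hypothetical 16-bit flip-flop chain loaded with the "all 1 bits" pattern.
chain_length = 16
loaded = [1] * chain_length

# Hypothetical readback after one fluence step, with upsets at bits 3 and 9.
readback = list(loaded)
readback[3] = 0
readback[9] = 0

errors = count_bit_errors(loaded, readback)
print(errors)
```

In static testing this comparison would run once per fluence step; in dynamic testing it would run on every clock cycle while the error count accumulates.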

We performed these tests under accelerated neutron and proton irradiation. The results of these tests are reported in the following sections.

### 6.2.1 Neutron Testing

To evaluate the soft error performance of the different flip-flop designs, we first conducted an accelerated neutron test at Los Alamos National Laboratory (LANL) in Los Alamos, New Mexico in September 2009 [LANSCE, 2006]. The LANL neutron beam has an energy profile similar to the neutron flux found in New York City at sea level, but with an acceleration factor of $3 \times 10^8$ (see Figure 6.5). The chips were irradiated at a 0.8V supply at normal incidence with a 3-inch beam diameter, and no latchup was observed. Few errors were detected for the BASIC flip-flop (1-2 errors per 2-3 hours of beam time), and none for the other designs.
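To put the acceleration factor in perspective, the sketch below converts beam time into equivalent exposure at New York City sea level and into a Failure in Time (FIT) figure, where one FIT is one failure per billion device-hours. The function names are illustrative, and the example of 2 errors in 3 beam hours is simply one point picked from the range quoted above. The result applies to the whole irradiated flip-flop array, since the number of flip-flops is not used here, and with so few observed errors its statistical uncertainty is large.

```python
HOURS_PER_YEAR = 24 * 365
FIT_HOURS = 1e9  # one FIT = one failure per 1e9 device-hours


def equivalent_field_hours(beam_hours, acceleration_factor):
    """Hours of natural exposure represented by the given beam time."""
    return beam_hours * acceleration_factor


def fit_rate(errors, beam_hours, acceleration_factor):
    """Failures per 1e9 hours of natural exposure for the tested array."""
    field_hours = equivalent_field_hours(beam_hours, acceleration_factor)
    return errors / field_hours * FIT_HOURS


acceleration = 3e8  # LANL acceleration factor quoted in the text

# One hour in the beam, expressed as years of natural exposure.
field_years = equivalent_field_hours(1.0, acceleration) / HOURS_PER_YEAR
print(f"1 beam hour is about {field_years:,.0f} years at sea level")

# Illustrative point: 2 errors observed over 3 hours of beam time.
example_fit = fit_rate(errors=2, beam_hours=3.0, acceleration_factor=acceleration)
print(f"array-level rate is about {example_fit:.2f} FIT")
```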

Figure 6.5: Neutron beam profile at the ICE House in Los Alamos National Laboratory between 1 MeV and 1 GeV neutron beam energy [LANSCE, 2006]. TRIUMF is a similar facility at the National Laboratory for Particle and Nuclear Physics in Canada. Note that outside the displayed range of 1 MeV to 1 GeV, the beam profiles at ICE and TRIUMF are drastically different from the natural ground spectrum. However, the majority of neutron-related soft errors are produced by neutrons within the displayed range, as neutrons with lower energies are easily absorbed, and neutrons with higher energies can quickly travel through silicon without generating much charge.

Moreover, while performing static testing using checkered patterns (strings of 1s followed by strings of 0s), we discovered that either the entire bit pattern, or half the bit pattern stored in the flip-flop arrays, was shifted after several hours of exposure. For example, a repeating "0xF0" pattern was shifted to "0xE1". The shifting of the bit patterns, limited to only the entire pattern or half the pattern, was likely caused by SETs in the first stages of the on-chip clock distribution tree. In fact, after examining the implementation of the clock tree, we discovered that there was only a single delay chain leading to the clock buffer tree. Once in the clock buffer tree, the clock signal is buffered using a fanout of 3–4, and the signals are shorted together again after every two inverter stages. In the initial inverter stages, the capacitance of the shorted loads is not large enough to prevent SETs. In later stages, however, the buffers become larger and therefore more resistant to SETs. This is why we observed unintended bit pattern shifting only through the entire flip-flop array or through half of the flip-flop array.
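The "0xF0" to "0xE1" example is what a one-position shift of a repeating 8-bit pattern looks like, which is consistent with a single spurious clock pulse advancing the chain by one bit. The sketch below checks this arithmetic; treating the repeating byte as a circular 8-bit word is an assumption made for illustration, and `rotate_left` is a hypothetical helper.

```python
def rotate_left(value, shift, width=8):
    """Rotate a width-bit word left by shift positions."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


# One extra shift of the repeating 0xF0 pattern.
shifted = rotate_left(0xF0, 1)
print(hex(shifted))  # 0xe1
```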

The bit shifting due to soft errors rendered checkered patterns unusable in further soft error testing. Consequently, we decided to perform static and dynamic testing using only blanket ("all 1" or "all 0") bit patterns. Blanket patterns are an acceptable form of test data pattern according to the JEDEC JESD89A Standard [JEDEC Standard, 2006]. Using checkered patterns can result in different electric fields in the periphery of each cell because neighboring cells hold different values. But since each flip-flop cell is extensively surrounded by guard rings, most of the fields terminate at the guard rings, and the soft error rate of checkered patterns will not differ significantly from the soft error rate of blanket patterns. Additionally, checkered patterns during dynamic testing can produce larger than expected soft error rates due to combinational soft errors captured during the latching window of each flip-flop [Gadlage et al., 2005]. This additional soft error rate depends only on the frequency of operation, not on the soft error resilience of the design. In this study, we are interested in the sequential soft error rate of each design, and blanket patterns are sufficient for evaluating the sequential cells.

Figure 6.6 shows a possible way to harden the clock distribution tree by creating at least three independent delay chain branches and then shorting them together at the final stage. A single SET produced in any one of the three branches will not appear at the output load. However, the single input buffer driving the branches may still be vulnerable to SETs.



Figure 6.6: Clock tree protection against soft errors.

### 6.2.2 Proton Testing

To increase the number of observed errors during accelerated radiation testing, we performed a second test at Indiana University Cyclotron Facility (IUCF) using their 3-inch-diameter 200-MeV proton beam in October and December 2009. The IUCF proton beam flux is more than $10^6$ times higher than the neutron beam flux at LANL. Due to their high kinetic energy, 200-MeV protons lose little kinetic energy in collisions with electrons in silicon, and behave similarly to 200-MeV neutrons in terms of generating charge and causing soft errors in silicon. However, the proton beam is mono-energetic and does not match the natural neutron spectrum. Although it is possible to test electronics with lower energy protons, low energy protons behave more like alpha particles than neutrons because of the charge on the particle, and can cause greater total dose effects and distort the soft error results through direct ionization. We observed some total dose effects even with the 200-MeV proton beam. Consequently, soft error rates collected in this proton test cannot be translated to a ground-level Failure in Time (FIT) rate, a measure of the expected failure rate of semiconductors in which one FIT equals one failure per billion hours, because ground-level electronics are affected by a broad neutron spectrum and do not suffer total dose effects. Despite these shortcomings, we can still evaluate the relative soft error resilience of each sequential cell design using the proton beam.

At IUCF, the test chips were exposed at normal incidence up to a dose of 2 Mrad[Si]. We performed static and dynamic testing on the chips at 1V, 1.4V and 1.8V. The observed number of errors remained fairly constant at 1V, but increased during the course of the experiments at the other voltages due to total dose effects similar to transistor aging. We report the observed soft error counts for all designs with static and dynamic testing at 1V in Figure 6.7.

We made the following observations from our experiments:

**LEAP-DICE is the most resilient design.** The LEAP-DICE flip-flop has the best overall soft error performance among all designs, with on average 2,000X fewer errors than the reference BASIC flip-flop design, and 5X fewer errors than the DICE flip-flop design, which shares the same DICE circuit but has a different layout. LEAP-DICE requires 133% more area and 54% more power than BASIC, with negligible additional delay.

**LEAP-DICE encounters 5X fewer errors than DICE with the same circuit.** It is interesting to compare the performance of DICE and LEAP-DICE in all dimensions of the "soft error resilience – power – delay – area" design space. Since both designs share the same circuit topology and transistor sizes, their power and delay numbers are virtually identical, save for minor differences due to parasitics caused by different internal wiring. LEAP-DICE requires roughly 40% more area than DICE, but improves the soft error resilience by 5X. Using the "2X node separation = 10X fewer soft errors" rule of thumb from [Seifert et al., 2007], a 40% area increase in the conventional DICE layout, obtained by proportionally increasing node separation without any change in transistor placement, would increase node separation by only $\sqrt{1.4} \approx 1.18$X and therefore result in only 1.75X better soft error resilience. Therefore, node separation alone cannot fully account for the 5X soft error improvement in LEAP-DICE. This simple calculation shows that transistor placement can play an active role in making sequential circuits more robust without sacrificing too much layout area.
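The 1.75X figure can be reproduced with a short calculation. The sketch below assumes that a uniformly scaled layout increases node separation as the square root of the area ratio, and that the rule of thumb extends to a power law between doublings; both are assumptions chosen because they reproduce the number quoted above, and `resilience_gain_from_area` is a hypothetical helper.

```python
import math


def resilience_gain_from_area(area_ratio, gain_per_doubling=10.0):
    """Soft error improvement from uniformly scaling a layout's area.

    Node separation is assumed to scale as sqrt(area_ratio), and each
    doubling of separation is assumed to give gain_per_doubling fewer errors.
    """
    separation_ratio = math.sqrt(area_ratio)
    return gain_per_doubling ** math.log2(separation_ratio)


# 40% more area, as for LEAP-DICE relative to DICE.
gain = resilience_gain_from_area(1.4)
print(round(gain, 2))  # 1.75
```

Under the same assumptions, reaching a 5X improvement through separation alone would require scaling the area by roughly 2.6X.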



Figure 6.7: Measured soft error performance of flip-flops at 1V under 200-MeV proton irradiation.



Figure 6.8: SCDMR layout placement. (a) Original layout. (b) Proposed layout fix.

**Transistor doubling did not help.** The doubling of internal transistor sizes in BASIC2 compared to BASIC barely improved the soft error rate, showing that making transistors bigger alone is not sufficient to make circuits robust.

**SCDMR is unexpectedly softer due to design flaws.** The soft error resilience improvement from DMR circuit techniques ranges from 10X to 1000X, with an average of around 100X–200X. Our SCDMR flip-flop was unexpectedly much softer than DICE, although both had similar SEMU charge collection thresholds in simulation. The soft errors in SCDMR (Figure 6.8a) were dominated by SEMU strikes involving the slave latch and the C-Element, which are separated by 1 $\mu$m, while the DICE design is most sensitive to SEMUs involving circuit nodes separated by 10 $\mu$m. To reduce the number of SEMUs in SCDMR, we propose a new SCDMR layout arrangement (Figure 6.8b) which physically separates the C-Element from the slave latch with negligible impact on area, power and delay.

**SCDMR has clock dependency for soft errors.** The SCDMR flip-flop encountered almost 100X more errors in static testing compared to dynamic testing. To investigate this difference and understand which result is valid for normal operating conditions, we tested SCDMR by varying the clock frequency in dynamic testing as well as changing the fluence steps in static testing by treating each fluence step as one single clock cycle in dynamic testing. Figure 6.9 shows the combined result.



Figure 6.9: SCDMR frequency dependence. From left to right, the blue diamond points correspond to dynamic testing at 50 kHz, 0.5 Hz and 0.05 Hz, while the red square points correspond to static testing at 10, 25, 50, 100 and 200 krad[Si].

At low fluence steps, the soft error count remains constant, as it is related to the cell SEMU probability. At high fluence steps, however, the soft error rate increases at a rate identical to the square of the BASIC soft error rate, which is proportional to the probability that two independent BASIC flip-flops will both be hit during the same interval. Since the SCDMR flip-flop is made up of two (unprotected) BASIC flip-flops plus a C-Element acting as a SET filter at the output, we can conclude that the higher soft error rate in SCDMR at high fluence steps is dominated by separate hits on each flip-flop. Although it is anticipated that the SCDMR flip-flop is vulnerable to SEMU hits as well as to two independent single node hits resulting in an upset, this is the first demonstration that the soft error rate of the SCDMR flip-flop or similar designs can be highly influenced by the probability of two separate hits under high fluence steps. By extension, we can also conclude that designs similar to the SCDMR flip-flop may not be suitable for standby operation in a radiation environment, where the fluence during standby can reach levels similar to the fluence steps used during static testing. The commonly used Triple Modular Redundancy (TMR) flip-flop, composed of three identical unprotected flip-flops voted by a three-input majority gate, also falls under the same category of sequential cells vulnerable to two separate hits. Additionally, since most electronic applications operate in the supra-MHz frequency range, static testing may not be suitable for assessing the soft error resilience of these sequential cells, as our results show that static testing is equivalent to testing electronics at extremely low frequencies ( $\ll 1~{\rm Hz}$ ).
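The two regimes described above can be illustrated with a first-order model, sketched here under stated assumptions rather than fitted to the measured data. Both redundant copies are assumed to be refreshed at the start of each fluence step, per-step upset probabilities are assumed to be much smaller than one, and the cross sections `SIGMA_BASIC` and `SIGMA_SEMU` are hypothetical values chosen only to make the crossover visible.

```python
import math


def expected_scdmr_errors(total_fluence, fluence_step, sigma_semu, sigma_basic):
    """First-order expected upsets of one SCDMR cell over total_fluence."""
    steps = total_fluence / fluence_step
    # A single strike upsetting both copies: linear in the step fluence.
    semu_per_step = sigma_semu * fluence_step
    # Two independent strikes, one per copy, within the same step.
    p_single = sigma_basic * fluence_step
    double_hit_per_step = p_single ** 2
    return steps * (semu_per_step + double_hit_per_step)


# Hypothetical cross sections (cm^2 per cell) and fluence (particles/cm^2).
SIGMA_BASIC = 1e-13
SIGMA_SEMU = 1e-16
TOTAL_FLUENCE = 1e12

# Step size at which double hits contribute as much as SEMUs.
crossover_step = SIGMA_SEMU / SIGMA_BASIC ** 2

for step in (1e6, 1e8, 1e10, 1e12):
    errors = expected_scdmr_errors(TOTAL_FLUENCE, step, SIGMA_SEMU, SIGMA_BASIC)
    print(f"step = {step:.0e}  expected errors per cell = {errors:.3e}")
```

In this model the error count for a fixed total fluence is flat while the step is well below `crossover_step` and grows in proportion to the step size above it, which mirrors the transition from SEMU-dominated to double-hit-dominated behavior.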

**Total dose effects reduce soft errors in DIFF.** The soft error count for most flip-flop designs increased by around 20% when the supply voltage was scaled down from 1.8V to 1V. The soft error count also increased slightly, by up to 10%, with large total dose exposure. In contrast, the DIFF design showed substantial soft error improvement in both cases (Figure 6.10). DCVSL gates normally have a fast output fall time and a slow output rise time due to their unbalanced P/N ratio (weak PMOS, strong NMOS). With total dose transistor aging making PMOS transistors slower and NMOS transistors faster, the skewed rise/fall time ratio becomes more pronounced. When the drive strength of the NMOS devices becomes much stronger than that of the PMOS devices, the gate delay increases significantly due to slow PMOS feedback, making DCVSL gates substantially slower, but also preventing short-duration SET pulses from propagating and creating an upset.

To showcase the performance tradeoffs between the flip-flop designs, we introduce a new metric called Soft Error Resilience, defined as the inverse of Normalized Soft Error Count, where the dynamic flip-flop soft error counts are normalized to the reference BASIC design [Lee et al., 2011]. Figure 6.11 puts each flip-flop design in the new "soft error resilience — energy — area" space, where the (switching) energy is defined as the power-delay product. LEAP-DICE is the most soft-error-resilient design among all designs considered in this study, with moderate area and energy costs. QCDMR, while exhibiting slightly lower soft error resilience compared to LEAP-DICE, requires much more area, power and delay overhead.
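As a concrete illustration of the metric, the sketch below places LEAP-DICE in this space using the ratios reported earlier in this chapter (2,000X fewer errors, 54% more power, 133% more area). The raw error counts are hypothetical and chosen only to reproduce the 2,000X ratio, the delay ratio of 1.0 stands in for "negligible delay", and the function names are illustrative.

```python
def soft_error_resilience(error_count, basic_error_count):
    """Inverse of the soft error count normalized to the BASIC reference."""
    if error_count <= 0 or basic_error_count <= 0:
        raise ValueError("error counts must be positive")
    return basic_error_count / error_count


def switching_energy(power_ratio, delay_ratio):
    """Power-delay product, with both factors normalized to BASIC."""
    return power_ratio * delay_ratio


# Hypothetical dynamic-test counts reproducing the reported 2,000X ratio.
basic_errors = 2000
leap_dice_errors = 1

leap_dice = {
    "soft_error_resilience": soft_error_resilience(leap_dice_errors, basic_errors),
    "energy": switching_energy(power_ratio=1.54, delay_ratio=1.0),  # delay assumed equal
    "area": 2.33,  # 133% more area than BASIC
}
print(leap_dice)
```

By construction, the BASIC reference sits at (1, 1, 1) in the same normalized space.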



Figure 6.10: Soft error reduction in the DIFF design with increasing radiation dose.

## 6.3 Conclusion

This chapter presented our custom experimental setup for accelerated testing of sequential cell designs. After evaluating the designs considered in this study (including our LEAP-DICE design) under neutron and proton testing, we determined that LEAP-DICE, which combines the SEU-immune DICE circuit with the SEMU-resilient LEAP layout principle, obtained the best soft error performance among all designs. LEAP-DICE achieved 2,000X better soft error resilience than our reference D flip-flop, with moderate area and power costs, and negligible delay overhead.

After introducing a new soft error design metric called Soft Error Resilience, we were able to place each sequential cell we investigated in the "Soft Error Resilience — Power — Delay — Area" design space. The addition of soft error resilience as a new design dimension opens up new research possibilities in integrated circuit design.



Figure 6.11: Design framework putting LEAP-DICE and other soft-error resilient flip-flops in the "energy — area — soft-error-resilience" space. Energy (switching) refers to the power-delay product. All dimensions are normalized to BASIC.

The newly discovered soft error effects from this study highlight the importance of including operating conditions as an essential factor in determining the robustness of a soft-error-resilient sequential cell during its lifetime of operation. It is important to note that different designs may have different soft error behavior under different operating conditions, and not all testing conditions may be suitable for all designs (for example, static testing vs. dynamic testing for the SCDMR flip-flop). The soft error resilience of these designs must be carefully assessed under all operating conditions to ensure that they are properly characterized under irradiation.

# Chapter 7

# Conclusion

In the last four decades, scaling in CMOS technology has allowed exponential growth in transistor density, reduction in power consumption as well as performance doubling every 18 months according to Moore's Law. The drastic shrinking of transistor dimensions also increased the probability of soft errors in semiconductor chips. Traditional techniques at the circuit and system level have helped mitigate the effects of single-event upsets. However, individual transistors have become so close to each other that more than 10% of all soft errors now come from single-event multiple upsets, and these types of errors are more difficult to correct. The goal of this dissertation is to develop new design techniques targeting the growing concern over SEMUs.

The original vision for developing a layout technique addressing SEMUs rests on the expectation that future transistor dimensions will be scaled down so much that particle strikes near active silicon area will most likely affect a cluster of transistors within a micron radius of the site of impact. Past soft error analyses often treated radiation-induced charge collection as "bad" and as something to be avoided at all costs. However, not all charge collection is bad, as transistors can collect either charge causing a soft error ("bad charge collection"), or charge that makes logic signals stronger instead ("good charge collection"). Therefore, if charge sharing is unavoidable, we should look for ways to utilize the "good" aspect of charge collection to counteract the "bad" aspect of charge collection. The development of the LEAP layout principle for soft error resilience, as well as the subsequent realization of the soft-error-resilient LEAP-DICE sequential cell design (application of the LEAP principle on the DICE design), reflects that vision.

Radiation experiments on a 180nm silicon test chip showed that the LEAP-DICE design achieved the best soft error performance among all techniques we investigated. LEAP-DICE offers 2,000X better soft error resilience than the conventional D flip-flop, at moderate power and area costs, and negligible delay penalty.

In the course of evaluating and comparing different sequential cell designs, we demonstrated a design framework for soft error resilience, by quantifying the performance trade-offs of circuit and layout resilience techniques in the "soft error resilience — power — delay — area" design space. By coincidence, we also discovered new soft error effects related to operating conditions such as voltage scaling, clock frequency setting and total radiation dose. These effects are strongly related to circuit conditions varying over the lifetime of the circuit. Therefore, we concluded that the design of soft error resilience must take into consideration various possible operating conditions to ensure that the application remains robust over its lifetime.

## 7.1 Future Research

The LEAP-DICE design illustrates how a combination of circuit and layout techniques will be essential for the design of next generation sequential cells, as SEMU probability increases exponentially with device scaling. So far, this study only investigated the LEAP principle on the DICE cell circuit topology.

Is LEAP-DICE a unique soft-error-resilient design, or is the LEAP layout principle applicable to other sequential cells? Concurrent with the publication of our work in 2010, our peers in the soft-error-resilient design community have also reported remarkable soft error reduction using the charge cancellation effect promoted by the LEAP principle [Seifert et al., 2010a; Ahlbin et al., 2010; Uemura et al., 2010]. It will be interesting to extend LEAP to a larger number of sequential cell topologies (both soft-error-resilient and non-resilient), and even to combinational logic, to evaluate the effectiveness of the LEAP principle.

As deep submicron device scaling makes transistors closer to each other than ever, we suggest that future work in the area of soft-error-resilient sequential cell design include a re-evaluation of the circuit and layout techniques surveyed in this work using a more recent technology node (e.g. 22nm or 28nm CMOS). The technology re-assessment could provide some insights into the effect of technology scaling on the LEAP layout principle. The eventual goal would be to extend the LEAP principle to any design, and develop a systematic layout design methodology for soft error resilience (or a set of soft error layout design rules) that can be applied toward any integrated circuit layout.

# Bibliography

Ahlbin, J. R.; Gadlage, M. J.; Atkinson, N. M.; Bhuva, B. L.; Witulski, A. F.; Holman, W. T.; Massengill, L. W.; Eaton, P. H.; and Narasimham, B. (2010). "Effect of Multiple-Transistor Charge Collection on SET Pulse Widths," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 198–202, May 2010.

Andrews, J. L.; Schroeder, J. E.; Gingerich, B. L.; Kolasinski, W. A.; Koga, R.; and Diehl, S.E. (1982). "Single Event Error Immune CMOS RAM," *IEEE Transactions on Nuclear Science*, vol. 29, no. 6, pp. 2040–2043, December 1982.

Amusan, O.A.; Witulski, A. F.; Massengill, L. W.; Bhuva, B. L.; Fleming, P. R.; Alles, M. L.; Sternberg, A.L.; Black, J.D.; and Schrimpf, R.D. (2006). "Charge Collection and Charge Sharing in a 130 nm CMOS Technology," *IEEE Transactions on Nuclear Science*, vol. 53, no. 6, pp. 3253–3258, December 2006.

Amusan, O. A.; Massengill, L. W.; Baze, M. P.; Bhuva, B.L.; Witulski, A.F.; Black, J. D.; Balasubramanian, A.; Casey, M. C.; Black, D. A.; Ahlbin, J. R.; Reed, R.A.; and McCurdy, M. W. (2008). "Mitigation Techniques for Single-Event-Induced Charge Sharing in a 90-nm Bulk CMOS Process," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 468–472, April 2008.

Bagatin, M.; Gerardin, S.; Paccagnella, A.; Cellere, G.; Visconti, A.; and Bonanomi, M. (2010). "Increase in the Heavy-Ion Upset Cross Section of Floating Gate Cells Previously Exposed to TID," *IEEE Transactions on Nuclear Science*, vol. 57, no. 6, pp. 3407–3413, December 2010.

Balasubramanian, A.; Bhuva, B. L.; Black, J.D.; and Massengill, L.W. (2005). "RHBD Techniques for Mitigating Effects of Single-Event Hits Using Guard-Gates," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2531–2535, December 2005.

Baumann, R. C. and Hossain, T. Z. (1995). "Electronic Device and Process Achieving a Reduction in Alpha Particle Emissions from Boron-Based Compounds Essentially Free of Boron-10," U.S. Patent 5 395 783, March 7, 1995.

Baumann, R. C. (2005). "Radiation-Induced Soft Errors in Advanced Semiconductor Technologies," *IEEE Transactions on Device and Materials Reliability*, vol. 5, no. 3, pp. 305–316, September 2005.

Baze, M. P.; Hughlock, B.; Wert, J.; Tostenrude, J.; Massengill, L.; Amusan, O.; Lacoe, R.; Lilja, K.; and Johnson, M. (2008). "Angular Dependence of Single Event Sensitivity in Hardened Flip/Flop Designs," *IEEE Transactions on Nuclear Science*, vol. 55, no. 6, pp. 3295–3301, December 2008.

Beebe, S. (1998). "Characterization, Modeling, and Design of ESD Protection Circuits," *Ph.D. Dissertation*, Department of Electrical Engineering, Stanford University, Stanford CA, March 1998.

Benedetto, J. M. (1998). "Economy-Class Ion-Defying ICs in Orbit," *IEEE Spectrum*, vol. 35, no. 3, pp. 36–41, March 1998.

Binder, D.; Smith, E. C.; and Holman, A. B. (1975). "Satellite Anomalies from Galactic Cosmic Rays," *IEEE Transactions on Nuclear Science*, vol. 22, no. 6, pp. 2675–2680, December 1975.

Black, J. D.; Sternberg, A. L.; Alles, M. L.; Witulski, A. F.; Bhuva, B. L.; Massengill, L. W.; Benedetto, J. M.; Baze, M. P.; Wert, J. L.; and Hubert, M. G. (2005). "HBD Layout Isolation Techniques for Multiple Node Charge Collection Mitigation," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2536–2541, December 2005.

Blum, D. R. and Delgado-Frias, J. G. (2006). "Schemes for Eliminating Transient-Width Clock Overhead from SET-Tolerant Memory-Based Systems," *IEEE Transactions on Nuclear Science*, vol. 53, no. 3, pp. 1564–1573, June 2006.

Borucki, L.; Schindlbeck, G.; and Slayman, C. (2008). "Comparison of Accelerated DRAM Soft Error Rates Measured at Component and System Level," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 482–487, April 2008.

Calin, T.; Nicolaidis, M.; and Velazco, R. (1996). "Upset Hardened Memory Design for Submicron CMOS Technology," *IEEE Transactions on Nuclear Science*, vol. 43, no. 6, pp. 2874–2878, December 1996.

Casey, M.C.; Bhuva, B. L.; Black, J. D.; and Massengill, L.W. (2005). "HBD Using Cascode-Voltage Switch Logic Gates for SET Tolerant Digital Systems," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2510–2515, December 2005.

Chen, C.L. and Hsiao, M. Y. (1984). "Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review," *IBM Journal of Research and Development*, vol. 28, no. 2, pp. 124–134, March 1984.

Cisco Systems Inc. (2005). "Soft Errors on Cisco 12000 E4/E4+ Engine Based Line Cards," Cisco 12000 Series Routers Field Notice 23754, October 2005. url: http://www.cisco.com/en/US/ts/fn/200/fn23754.html

Clein, D. (1999). "CMOS IC Layout: Concepts, Methodologies, and Tools," *Boston MA: Newnes Press*, pp. 93–99, December 1999.

Diehl, S. E.; Ochoa, A. Jr; Dressendorfer, P. V.; Koga, R.; and Kolasinski, W.A. (1982). "Error Analysis and Prevention of Cosmic Ion-Induced Soft Errors in Static CMOS RAMs," *IEEE Transactions on Nuclear Science*, vol. 29, no. 6, pp. 2036–2039, December 1982.

Digilent Inc. (2011). The Virtex-II Pro Development System. url: http://www.digilentinc.com/Products/Detail.cfm?Prod=XUPV2P

Dirk, J. D.; Nelson, M. E.; Ziegler, J. F.; Thompson, A.; and Zabel, T. H. (2003). "Terrestrial thermal neutrons," *IEEE Transactions on Nuclear Science*, vol. 50, no. 6, pp. 2060–2064, December 2003.

Dixit, A. and Heald, R. (2009). "Soft Error Estimates for Fabless Companies," *Proceedings of IEEE International Conference on IC Design and Technology (ICICDT)*, pp. 125–127, May 2009.

Dixit, A. and Wood, A. (2011). "The Impact of New Technology on Soft Error Rates," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 5B.4.1–5B.4.7, Monterey, CA, April 2011.

Dodd, P. E. and Sexton, F. W. (1995). "Critical Charge Concepts for CMOS SRAMs," *IEEE Transactions on Nuclear Science*, vol. 42, no. 6, pp. 1764–1771, December 1995.

Dodd, P. E. and Massengill, L.W. (2003). "Basic Mechanisms and Modeling of Single-Event Upset in Digital Microelectronics," *IEEE Transactions on Nuclear Science*, vol. 50, no. 3, pp. 583–601, June 2003.

Doucin, B.; Poivey, C.; Carlotti, C.; Salminen, A.; Ojasalo, K.; Ahonen, R.; Poirot, P.; Baudry, L.; and Harboe Sorensen, R. (1997). "Study of Radiation Effects on Low Voltage Memories," *Proceedings of Fourth European Conference on Radiation and Its Effects on Components and Systems (RADECS)*, pp. 561–569, September 1997.

Fogle, A.D.; Darling, D.; Blish, R.C.; and Daszko, G. (2004). "Flash Memory under Cosmic and Alpha Irradiation," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 637–638, April 2004.

Gambles, J.; Hass, K.; and Whitaker, S. (2003). "Radiation-Hardness of Ultra Low Power CMOS VLSI," *11th NASA Symposium on VLSI Design*, May 2003.

Gadlage, M.; Eaton, P. H.; Benedetto, J. M.; and Turflinger, T. L. (2005). "Comparison of Heavy Ion and Proton Induced Combinatorial and Sequential Logic Error Rates in a Deep Submicron Process," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2120–2124, December 2005.

Geppert, L. (2004). "A Static RAM Says Goodbye to Data Errors," *IEEE Spectrum Magazine*, vol. 41, no. 2, pp. 16–17, February 2004.

Gill, B.; Seifert, N.; and Zia, V. (2009). "Comparison of Alpha-Particle and Neutron-Induced Combinational and Sequential Logic Error Rates at the 32nm Technology Node," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 199–205, April 2009.

Hareyama, M.; Hasebe, N.; Kodaira, S.; Masuyama, N.; Ota, S.; Sakurai, K.; Goka, T.; Koshiishi, H.; and Matsumoto, H. (2008). "Charge and Mass Composition of Heavy Ions in the Earth's Radiation Belt," *Proceedings of the 30th International Cosmic Ray Conference*, vol. 1, pp. 647–650, Merida, Mexico, 2007.

Hsiao, M. Y. (1970). "A Class of Optimal Minimum Odd-Weight-Column SEC-DED Codes," *IBM Journal of Research and Development*, vol. 14, no. 4, pp. 395–401, July 1970.

IUCF (Indiana University Cyclotron Facility) (2011). Radiation Effects Research Program (RERP). url: http://www.iucf.indiana.edu/rerp/

JEDEC Standard (1996). "Test Procedures for the Measurement of Single-Event Effects in Semiconductor Devices from Heavy Ion Irradiation," *JESD57*, JEDEC Solid State Technology Association, New York, December 1996.

JEDEC Standard (2006). "Measurement and Reporting of Alpha Particles and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices," *JESD89A*, JEDEC Solid State Technology Association, New York, October 2006.

Karnik, T. and Hazucha, P. (2004). "Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes," *IEEE Transactions on Dependable and Secure Computing*, vol. 1, no. 2, pp. 128–143, April–June 2004.

Koga, R. (1996). "Single-Event Effect Ground Test Issues," *IEEE Transactions on Nuclear Science*, vol. 43, no. 2, pp. 661–670, April 1996.

LANSCE (Los Alamos Neutron Science Center) (2006). *The ICE House*. url: http://lansce.lanl.gov/ns/instruments/ICEHouse/index.html.

Leavy, J. F. and Poll, R. A. (1969). "Radiation-Induced Integrated Circuit Latchup," *IEEE Transactions on Nuclear Science*, vol. 16, no. 6, pp. 96–103, December 1969.

Lee, H. K.; Lilja, K.; Bounasser, M.; Relangi, P.; Linscott, I.R.; Inan, U.S.; and Mitra, S. (2010). "LEAP: Layout Design through Error-Aware Transistor Positioning for Soft-Error Resilient Sequential Cell Design," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 203–212, Anaheim CA, May 2010.

Lee, H. K.; Linscott, I.; and Inan, U. (2011). "Design Framework for Soft-Error-Resilient Sequential Cells," *Nuclear and Space Radiation Effects Conference (NSREC)*, Las Vegas, July 2011.

Lilja, K. (2008). "Layout Method for Soft-Error Hard Electronics, and Radiation Hardened Logic Cell," U.S. Patent Pending 12/354,655, January 2008.

Lilja, K. (2009). "Single Event Cross-Section and Error-Rate Prediction for Digital Logic Using Accurate Simulation," *Single Event Effects Symposium (SEE)*, La Jolla, 2009.

Lyons, D. (2000). "Sun Screen," *Forbes Magazine*, November 13, 2000. url: http://www.forbes.com/global/2000/1113/0323026a.html.

Lysinger, M.; Roche, P.; Zamanian, M.; Jacquet, F.; Sahoo, N.; McClure, D.; and Russell, J. (2008). "A Radiation Hardened Nano-Power 8Mb SRAM in 130nm CMOS," *Proceedings of International Symposium on Quality Electronic Design (ISQED)*, pp. 23–29, San Jose CA, 2008.

Massengill, L. W.; Choi, B. K.; Fleetwood, D. M.; Schrimpf, R. D.; Galloway, K. F.; Shaneyfelt, M. R.; Meisenheimer, T. L.; Dodd, P. E.; Schwank, J. R.; Lee, Y. M.; Johnson, R. S.; and Lucovsky, G. (2001). "Heavy-Ion Induced Breakdown in Ultra-Thin Gate Oxides and High-k Dielectrics," *IEEE Transactions on Nuclear Science*, vol. 48, no. 6, pp. 1904–1912, December 2001.

Mavis, D. G. and Alexander, D. R. (1997). "Employing Radiation Hardness by Design Techniques with Commercial Integrated Circuit Processes," *Proceedings of AIAA/IEEE Digital Avionics Systems Conference*, vol. 1, no. 2.1, pp. 15–22, Irvine CA, October 1997.

Mavis, D. G. and Eaton, P. H. (2002). "Soft Error Rate Mitigation Techniques for Modern Microcircuits," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 216–225, Dallas TX, 2002.

May, T. C. and Woods, M. H. (1979). "Alpha-Particle Induced Soft Errors in Dynamic Memories," *IEEE Transactions on Electron Devices*, vol. 26, pp. 2–9, February 1979.

May, T. C.; Scott, G. L.; Meieran, E. S.; Winer, P.; and Rao, V. R. (1984). "Dynamic Fault Imaging of VLSI Random Logic Devices," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 95–108, Las Vegas NV, 1984.

Meaney, P. J.; Swaney, S. B.; Sanda, P. N.; and Spainhower, L. (2005). "IBM z990 Soft Error Detection and Recovery," *IEEE Transactions on Device and Materials Reliability*, vol. 5, no. 3, pp. 419–427, September 2005.

Messenger, G. C. (1982). "Collection of Charge on Junction Nodes from Ion Tracks," *IEEE Transactions on Nuclear Science*, vol. 29, no. 6, pp. 2024–2031, December 1982.

Mitra, S.; Seifert, N.; Zhang, M.; Shi, Q.; and Kim, K. S. (2005). "Robust System Design with Built-In Soft Error Resilience," *IEEE Computer*, vol. 38, no. 2, pp. 43–52, February 2005.

Muller, D. E. and Bartky, W. S. (1959). "A Theory of Asynchronous Circuits," *Proceedings of International Symposium on the Theory of Switching*, Cambridge, MA: Harvard University Press, 1959.

Narasimham, B.; Shuler, R. L.; Black, J. D.; Bhuva, B. L.; Schrimpf, R. D.; Witulski, A. F.; Holman, W. T.; and Massengill, L.W. (2007). "Quantifying the Effectiveness of Guard Bands in Reducing the Collected Charge Leading to Soft Errors," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 676–677, April 2007.

Narasimham, B.; Gambles, J. W.; Shuler, R. L.; Bhuva, B. L.; and Massengill, L. W. (2008). "Quantifying the Effect of Guard Rings and Guard Drains in Mitigating Charge Collection and Charge Spread," *IEEE Transactions on Nuclear Science*, vol. 55, no. 6, pp. 3456–3460, December 2008.

Normand, E.; Wert, J. L.; Quinn, H.; Fairbanks, T. D.; Michalak, S.; Grider, G.; Iwanchuk, P.; Morrison, J.; Wender, S.; and Johnson, S. (2010). "First Record of Single-Event Upset on Ground, Cray-1 Computer at Los Alamos in 1976," *IEEE Transactions on Nuclear Science*, vol. 57, no. 6, pp. 3114–3120, December 2010.

Nowlin, R. N.; McEndree, S. R.; Wilson, A. L.; and Alexander, D. R. (2006). "A New Total-Dose-Induced Parasitic Effect in Enclosed-Geometry Transistors," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2495–2502, December 2005.

Nguyen, D. N.; Guertin, S. M.; Swift G. M.; and Johnston, A. H. (1999). "Radiation Effects on Advanced Flash Memories," *IEEE Transactions on Nuclear Science*, vol. 46, no. 6, pp. 1744–1750, December 1999.

Nguyen, D. N. and Scheik, L. (2003). "Total Dose, Single Event Effect and Radiation Induced Single Cell Failures in Advanced Flash Memories," *IEEE Nuclear Science and Radiation Effects Conference*, Monterey, July 2003.

Olson, B. D.; Ball, D. R.; Warren, K. M.; Massengill, L. W.; Haddad, N. F.; Doyle, S. E.; and McMorrow, D. (2005). "Simultaneous Single Event Charge Sharing and Parasitic Bipolar Conduction in a Highly-Scaled SRAM Design," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2132–2136, December 2005.

Olson, B. D.; Amusan, O. A.; Dasgupta, S.; Massengill, L. W.; Witulski, A. F.; Bhuva, B. L.; Alles, M. L.; Warren, K. M.; and Ball, D. R. (2007). "Analysis of Parasitic PNP Bipolar Transistor Mitigation Using Well Contacts in 130 nm and 90 nm CMOS Technology," *IEEE Transactions on Nuclear Science*, vol. 54, no. 4, pp. 894–897, August 2007.

Robust Chip Inc. (2010). The ACCURO simulator. url: www.robustchip.com

Roche, P.; Jacquet, F.; Caillat, C.; and Schoelkopf, J. (2004). "An Alpha Immune and Ultra Low Neutron SER High Density SRAM," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 671–672, April 2004.

Roche, P. and Gasiot, G. (2005). "Impacts of Front-End and Middle-End Process Modifications on Terrestrial Soft Error Rate," *IEEE Transactions on Device and Materials Reliability*, vol. 5, no. 3, pp. 382–396, September 2005.

Rockett, L. R. (1988). "An SEU-Hardened CMOS Data Latch Design," *IEEE Transactions on Nuclear Science*, vol. 35, no. 6, pp. 1682–1687, December 1988.

Rockett, L. R. (1992). "Simulated SEU Hardened Scaled CMOS SRAM Cell Design Using Gated Resistors," *IEEE Transactions on Nuclear Science*, vol. 39, no. 5, pp. 1532–1541, October 1992.

Sanda, P. N.; Kellington, J. W.; Kudva, P.; Kalla, R.; McBeth, R. B.; Ackaret, J.; Lockwood, R.; Schumann, J.; and Jones, C.R. (2008). "Soft-Error Resilience of the IBM POWER6 Processor," *IBM Journal of Research and Development*, vol. 52, no. 3, pp. 275–284, May 2008.

Seifert, N. P.; Slankard, P.; Kirsch, M.; Narasimham, B.; Zia, V.; Brookreson, C.; Vo, A.; Mitra, S.; Gill, B.; and Maiz, J. (2006). "Radiation-Induced Soft Error Rates of Advanced CMOS Bulk Devices," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 215–225, March 2006.

Seifert, N.; Gill, B.; Zia, V.; Zhang, M.; and Ambrose, V. (2007). "On the Scalability of Redundancy based SER Mitigation Schemes," *Proceedings of IEEE International Conference on Integrated Circuit Design and Technology (ICICDT)*, pp. 1–9, May 2007.

Seifert, N. (2008). "Soft Error Rates of RadHard Sequentials Utilizing Local Redundancy," *Proceedings of IEEE On-Line Testing Symposium (IOLTS)*, pp. 49–50, July 2008.

Seifert, N. P.; Ambrose, V.; Gill, B.; Shi, Q.; Allmon, R.; Recchia, C.; Mukherjee, S.; Nassif, N.; Krause, J.; Pickholtz, J.; and Balasubramanian, A. (2010a). "On The Radiation-Induced Soft Error Performance of Hardened Sequential Elements in Advanced Bulk CMOS Technologies," *Proceedings of IEEE International Reliability Physics Symposium (IRPS)*, pp. 188–197, Anaheim CA, May 2010.

Sexton, F. W.; Fleetwood, D. M.; Shaneyfelt, M. R.; Dodd, P. E.; and Hash, G. L. (1997). "Single Event Gate Rupture in Thin Gate Oxides," *IEEE Transactions on Nuclear Science*, vol. 44, no. 6, pp. 2345–2352, December 1997.

Sexton, F. W. (2003). "Destructive Single-Event Effects in Semiconductor Devices and ICs," *IEEE Transactions on Nuclear Science*, vol. 50, no. 3, pp. 603–621, June 2003.

Shuler, R. L.; Kouba, C.; and O'Neill, P. M. (2005). "SEU Performance of TAG Based Flip-Flops," *IEEE Transactions on Nuclear Science*, vol. 52, no. 6, pp. 2550–2553, December 2005.

Shuler, R. L.; Balasubramanian, A.; Narasimham, B.; Bhuva, B. L.; O'Neill, P. M.; and Kouba, C. (2006). "The Effectiveness of TAG or Guard-Gates in SET Suppression Using Delay and Dual-Rail Configurations at $0.35\,\mu\text{m}$," *IEEE Transactions on Nuclear Science*, vol. 53, no. 6, pp. 3428–3431, December 2006.

Shuler, R. L.; Bhuva, B. L.; O'Neill, P. M.; Gambles, J. W.; and Rezgui, S. (2009). "Comparison of Dual-Rail and TMR Logic Cost Effectiveness and Suitability for FPGAs With Reconfigurable SEU Tolerance," *IEEE Transactions on Nuclear Science*, vol. 56, no. 1, pp. 214–219, February 2009.

Stojanovic, V. and Oklobdzija, V. G. (1999). "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, April 1999.

Uemura, T.; Tosaka, Y.; Matsuyama, H.; Shono, K.; Uchibori, C.J.; Takahisa, K.; Fukuda, M.; and Hatanaka, K. (2010). "SEILA: Soft Error Immune Latch for Mitigating Multi-Node-SEU and Local-Clock SET," *Proceedings of International Reliability Physics Symposium*, pp. 218–223, May 2010.

Wallmark, J. T. and Marcus, S. M. (1961). "Maximum Packing Density and Minimum Size of Semiconductor Devices," *Proceedings of the International Electron Device Meeting*, vol. 7, p. 34, Washington DC, October 1961.

Walt, M. (1994). "Introduction to Geomagnetically Trapped Radiation," *New York: Cambridge University Press*, pp. 74–83, 1994. ISBN 0-521-43143-3.

Wang, W.; Ker, M.; Chiang, M.; and Chen, C. (2001). "Level Shifters for High-Speed 1V to 3.3V Interfaces in a $0.13\,\mu\text{m}$ Cu-Interconnection / Low-K CMOS Technology," *Proceedings of International Symposium on VLSI Technology, Systems, and Applications*, pp. 307–310, April 2001.

Wang, C. C. (2009). "High-Fidelity Analog-to-Digital Conversion for Spaceborne Applications," *Ph.D. Dissertation*, Department of Electrical Engineering, Stanford University, Stanford CA, September 2009.

Weaver, H. T.; Axness, C. L.; McBrayer, J.D.; Browning, J.S.; Fu, J.S.; Ochoa, A. Jr.; and Koga, R. (1987). "An SEU Tolerant Memory Cell Derived from Fundamental Studies of SEU Mechanisms in SRAM," *IEEE Transactions on Nuclear Science*, vol. 34, no. 6, pp. 1281–1286, December 1987.

Whitaker, S.; Canaris, J.; and Liu, K. (1991). "SEU Hardened Memory Cells for a CCSDS Reed-Solomon Encoder," *IEEE Transactions on Nuclear Science*, vol. 38, no. 6, pp. 1471–1477, December 1991.

Zhang, M.; Mitra, S.; Bak, T. M.; Seifert, N.; Wang, N. J.; Shi, Q.; Kim, K. S.; Shanbhag, N. R.; and Patel, S.J. (2006). "Sequential Element Design with Built-In Soft Error Resilience," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, no. 12, December 2006.

Ziegler, J. F. and Lanford, W.A. (1980). "The Effect of Sea Level Cosmic Rays on Electronic Devices," *Digest of Technical Papers of the IEEE International Solid-State Circuits Conference*, vol. 23, pp. 70–71, February 1980.

Ziegler, J. F. (1998). "Terrestrial Cosmic Ray Intensities," *IBM Journal of Research and Development*, vol. 42, no. 1, pp. 117–140, January 1998.