#### Building Reliable NAND Flash Memory Storage Systems

Kevin M. Greenan, Ethan L. Miller and Darrell D.E. Long (UCSC); Thomas Schwarz (Santa Clara University); Kaladhar Voruganti, Garth Goodson and Jon Elerath (NetApp)







## **NAND Flash Memory Overview**

- The Good
  - Fast random reads
  - Low power utilization
  - No moving parts
- The Bad
  - Writing involves erasing/programming
  - Reliability is dependent on usage and time
    - Endurance
    - Retention
    - Raw bit-error rate (RBER)

Must overcome reliability concerns without hurting performance



## **Objectives**

- Improve reliability
  - Control all writes to flash
  - Put mechanisms in place to deal with increasing RBER
    - Dynamic mechanisms
    - Trade space and performance for increased fault tolerance
  - Error handling beyond bit errors
- Erasure codes are a good fit
- Maintain good performance using erasure codes
  - Stage writes in other NVRAM or BB-RAM
  - Write across as many chips as possible
  - Write sequentially to each device





## Flash Media Reliability

- Reliability is typically given by RBER, retention and endurance
- Each changes with:
  - Manufacturer
  - Bits per cell (e.g., SLC vs. MLC)
  - Use
  - Time
- Here, we consider the relationship between use and RBER
  - Still figuring out how RBER depends on use and time together
- Failure of other components may also lead to data loss
  - Chips, controllers, etc.





#### **RBER as a Function of Erase Cycles**



- Use has a dramatic effect on RBER!
- Data taken from Intel-Micron study
- Performed regression over data to extrapolate
- 4 devices: one at 10K cycles, two at 5K cycles, one unspecified
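The regression step above can be sketched as follows. This is a minimal illustration, not the method used for the slides: it assumes RBER grows roughly exponentially with erase cycles, so a straight line is fitted to log10(RBER). The sample points are made-up placeholders, not values from the Intel-Micron data, and the function names are hypothetical.

```python
import math

def fit_log_linear(cycles, rber):
    """Least-squares fit of log10(RBER) = a + b * cycles (assumed model)."""
    ys = [math.log10(r) for r in rber]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_rber(a, b, cycles):
    """Evaluate the fitted model at a given erase-cycle count."""
    return 10 ** (a + b * cycles)

# Placeholder measurements (hypothetical, for illustration only).
sample_cycles = [1_000, 2_000, 5_000, 10_000]
sample_rber = [1e-8, 2e-8, 1.5e-7, 4e-6]

a, b = fit_log_linear(sample_cycles, sample_rber)
extrapolated = predict_rber(a, b, 20_000)  # beyond the measured range
```

Extrapolating past the measured range is only as good as the assumed model, so the result is best read as a planning estimate.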

#### Architecture







#### **Threats in this Architecture**








## **Options for Handling Errors**

- Error Correcting Codes (ECC)
  - Correct e bit errors
  - Can detect 2e bit errors
  - Generally computed in controller (or interface)
  - Applied to sectors or pages
- Hashing
  - Easy to compute
  - Can detect any errors with very high probability
- Erasure Coding
  - Applied at coarser granularity than ECCs (e.g., multiple pages)
  - Can correct errors at known locations, as flagged by ECC or hash
  - Detect errors with very high probability
  - Easily re-code if implemented in SW
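How the three options combine can be sketched in a few lines: a per-page hash detects a corrupted page, which turns an unknown error into a known erasure, and a parity page then rebuilds it. The sketch assumes a 3+1 single-parity stripe and SHA-256 as the hash; both are illustrative choices, not the encoding or hash used by the authors.

```python
import hashlib

def xor_pages(pages):
    """Bytewise XOR of equal-length pages."""
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, byte in enumerate(page):
            out[i] ^= byte
    return bytes(out)

def digest(page):
    return hashlib.sha256(page).digest()

# Encode: 3 data pages + 1 parity page, with a hash kept per page.
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_pages(data)]
hashes = [digest(p) for p in stripe]

# Corrupt page 1, simulating an error the page-level ECC could not correct.
stripe[1] = b"BxBB"

# Detect: a hash mismatch identifies which page is bad.
bad = [i for i, p in enumerate(stripe) if digest(p) != hashes[i]]

# Correct: with single parity, one erased page is the XOR of the survivors.
if len(bad) == 1:
    lost = bad[0]
    stripe[lost] = xor_pages([p for i, p in enumerate(stripe) if i != lost])
```

With a single parity page only one erasure per stripe can be rebuilt; tolerating more requires a code with more parity.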





## **Challenges: Erasure Coding in Flash**

- Block management
  - Given encoding, determine addressable data blocks
- Writing erasure coded data
  - Balance writes across banks
  - Properly handle parity updates
- Rebuilding lost data
  - Localize recovery operations
- Graceful degradation
  - Provide ability to change encoding as RBER increases
- Failover
  - Determine where to put rebuilt data





### **Component Protection**








#### **Block Groups**



All writes go to the current block group:

$B_{0,i}$ $B_{1,i}$ $B_{2,i}$ $B_{3,i}$ $B_{4,i}$ $B_{5,i}$ $B_{6,i}$ $B_{7,i}$

$parity\_map \leftarrow \{7 = 0 \oplus 1 \oplus 2 \oplus 3 \oplus 4 \oplus 5 \oplus 6\}$

$data\_map \leftarrow \{0 \to 7, 1 \to 7, 2 \to 7, 3 \to 7, 4 \to 7, 5 \to 7, 6 \to 7\}$

An erasure code instance is associated with a block group
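The two maps for the 7+1 example can be sketched as plain dictionaries. The helper names are hypothetical, and the map values are lists (rather than the single index shown above) so the same shape could describe codes with more than one parity block; that generalization is an assumption, not something the slides specify.

```python
from functools import reduce

def make_single_parity_maps(num_banks):
    """Build maps for a (num_banks - 1) + 1 code; the last bank holds parity."""
    parity_bank = num_banks - 1
    data_banks = list(range(parity_bank))
    parity_map = {parity_bank: data_banks}              # parity = XOR of these
    data_map = {d: [parity_bank] for d in data_banks}   # data -> parities to update
    return parity_map, data_map

def compute_parity(parity_map, blocks):
    """blocks: dict of bank index -> bytes. Returns dict of parity bank -> bytes."""
    result = {}
    for p, sources in parity_map.items():
        result[p] = bytes(
            reduce(lambda x, y: x ^ y, column)
            for column in zip(*(blocks[s] for s in sources))
        )
    return result

parity_map, data_map = make_single_parity_maps(8)
blocks = {i: bytes([2 * i + 1] * 4) for i in range(7)}  # toy contents for B_0..B_6
blocks.update(compute_parity(parity_map, blocks))        # fills in B_7
```

Keeping the maps as data attached to each block group is what makes it possible for different groups to carry different encodings.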






















Block groups allow us to change the encoding

Two encodings: current and old


All block groups with old encoding are more likely to be cleaned



#### Failover

- Page and block errors
  - Write data to current block group
  - Try to use page or block again once block group is cleaned
  - If we get a write error, then mark page or block as bad
- Can deal with bank errors with spares
- Spare-less component errors
  - Try to reconstruct data
  - Mark banks under failed components as bad
  - Reform block groups without bad banks
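The spare-less case can be sketched as: find the banks beneath the failed component, mark them bad, and rebuild the block group layout over the survivors. The topology dictionary, the component names, and the choice of a single-parity layout with the last surviving bank as parity are all assumptions made for illustration.

```python
def banks_under(component, topology):
    """topology: dict of component name -> bank ids beneath it (hypothetical)."""
    return set(topology[component])

def reform_block_group(banks, bad_banks):
    """Return (data_banks, parity_bank) for a single-parity group over survivors."""
    survivors = [b for b in banks if b not in bad_banks]
    if len(survivors) < 2:
        raise ValueError("need at least one data bank and one parity bank")
    return survivors[:-1], survivors[-1]

# Toy layout: 4 DIMMs with 2 banks each.
topology = {"dimm0": [0, 1], "dimm1": [2, 3], "dimm2": [4, 5], "dimm3": [6, 7]}

bad = banks_under("dimm1", topology)             # DIMM 1 has failed
data_banks, parity_bank = reform_block_group(range(8), bad)
```

Here a 7+1 group shrinks to 5+1 after losing two banks, trading capacity for continued protection.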



### Performance

- Based on flash simulator from NetApp
  - 4 DMA channels/card (Libra card has 2)
  - 2 interfaces/card
  - 2 DIMMs/interface
  - 2 banks/DIMM (16 total banks)
  - 64 blocks/bank
  - 64 pages/block
  - 1.2 ms (erase), 0.2 ms (prog), 0.025 ms (read)
- 2 cards connected to a host
- All functionality resides in driver on the host
- Evaluate write performance/reliability
  - No erasure code
  - 15+1 (across 16 banks)
  - 3+1 host-level, 3+1 interface-level (across 16 banks)
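A few quantities follow directly from the configuration above and are worked out here. Two assumptions are made: the 16 banks are the total across both cards, and the nested 3+1/3+1 scheme is a product of the two codes over a 4 x 4 arrangement of banks. The block-fill time ignores transfer and queuing delays.

```python
CARDS, IFACES_PER_CARD, DIMMS_PER_IFACE, BANKS_PER_DIMM = 2, 2, 2, 2
BLOCKS_PER_BANK, PAGES_PER_BLOCK = 64, 64
T_ERASE_MS, T_PROG_MS, T_READ_MS = 1.2, 0.2, 0.025

total_banks = CARDS * IFACES_PER_CARD * DIMMS_PER_IFACE * BANKS_PER_DIMM
total_pages = total_banks * BLOCKS_PER_BANK * PAGES_PER_BLOCK

def data_fraction_flat(k, m):
    """Fraction of banks holding data in a flat k+m code."""
    return k / (k + m)

def data_fraction_product(k_outer, m_outer, k_inner, m_inner):
    """Fraction of banks holding data when two codes are nested (assumed layout)."""
    return (k_outer * k_inner) / ((k_outer + m_outer) * (k_inner + m_inner))

flat = data_fraction_flat(15, 1)               # 15 of 16 banks hold data
nested = data_fraction_product(3, 1, 3, 1)     # 9 of 16 banks hold data

# Erase one block, then program all of its pages sequentially.
block_fill_ms = T_ERASE_MS + PAGES_PER_BLOCK * T_PROG_MS
```

Under these assumptions the nested scheme gives up 7 of 16 banks to redundancy, against 1 of 16 for 15+1.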







## **Erasure Coding and Reliability**



Up-code when expected RBER gets too high
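One way to decide when to up-code is sketched below. It assumes independent bit errors, a page ECC that corrects `ecc_t` bit errors, and a k+m erasure code that tolerates m failed pages per stripe. The page size, ECC strength, candidate encodings, and loss target are hypothetical parameters, not values from the evaluation.

```python
import math

def binom_tail(n, p, e):
    """P(X > e) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if e < n else 0.0
    total = 0.0
    for k in range(e + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def stripe_loss_probability(rber, k, m, bits_per_page=4096, ecc_t=4):
    """Probability that more than m pages of a k+m stripe are uncorrectable."""
    page_fail = binom_tail(bits_per_page, rber, ecc_t)
    return binom_tail(k + m, page_fail, m)

def choose_encoding(rber, encodings, target):
    """Pick the first (least redundant) encoding that meets the loss target."""
    for k, m in encodings:
        if stripe_loss_probability(rber, k, m) <= target:
            return (k, m)
    return encodings[-1]   # nothing meets the target: use the strongest listed

# Hypothetical candidates, ordered from least to most redundancy.
candidates = [(15, 1), (7, 1), (3, 1)]
```

For example, `choose_encoding(expected_rber, candidates, target=1e-6)` could be evaluated whenever a new block group is opened, with `expected_rber` coming from an RBER model.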

### Performance



Maximum write size is the number of data pages in a stripe

Rebuild performance

\* Current encoder does not compute full-stripe parity

## **Other Challenges and Concerns**

- Cleaning
- Wear leveling with block groups
- Bad block management
- Reliability and performance after failover
- Smart write policies
  - Coalesce page updates into single parity computation
  - Exploit parallelism in the hierarchy
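Coalescing page updates into a single parity computation can be sketched as below. The class models logical page contents only; on real flash the new pages and parity would be written out of place to the current block group. The class and method names are hypothetical, and single XOR parity is assumed.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class StagedStripe:
    """Buffers page updates (e.g., in NVRAM) and folds them into parity once."""

    def __init__(self, data_pages, parity):
        self.data = list(data_pages)
        self.parity = parity
        self.pending = {}          # page index -> newest staged contents
        self.parity_writes = 0

    def stage(self, index, new_page):
        # A later update to the same page replaces the earlier staged one.
        self.pending[index] = new_page

    def flush(self):
        if not self.pending:
            return
        delta = bytes(len(self.parity))
        for index, new_page in self.pending.items():
            # Accumulate old XOR new for every changed page.
            delta = xor_bytes(delta, xor_bytes(self.data[index], new_page))
            self.data[index] = new_page
        self.parity = xor_bytes(self.parity, delta)   # one parity update
        self.parity_writes += 1
        self.pending.clear()

pages = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
stripe = StagedStripe(pages, b"\x07\x00")   # parity is the XOR of the 3 pages
stripe.stage(0, b"\xff\xff")
stripe.stage(0, b"\x10\x20")                # supersedes the previous update
stripe.stage(2, b"\x00\x00")
stripe.flush()
```

Three staged updates result in one parity write, which matters on flash because each parity update costs a page program.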





#### **Questions?**



