Rethinking RAID - Trilug

Rethinking RAID
Dwain Sims
dsims@bayleafnc.org

Secure Computing with Apache Struts
Dwain Sims
dsims@bayleafnc.org

Who is this guy?
MS Computer Science, West Virginia University
16 Years in Silicon Valley
  Lockheed
  Sun Microsystems
12 Years in Linux High Availability
5 Years in Flash Storage
  Fusion-io
  SanDisk
  Western Digital

Inspiration
Storage is going through a Revolution

Inspiration
Old Habits Die Hard

Quick History Lesson
5 MB, $3200/Month, 1956

Fujitsu Eagle
470 MB, $10K, 600W

RAID now enters, stage left...
This is where the whole idea about RAID got started.

Shugart (Seagate) ST-506
5 MB, $1500, 1980

HGST “King Cobra” C15K600
$670, 600GB, 7.5W

HGST Ultrastar He12
$670, 12TB, 9.8W

What is this RAID stuff anyway?

Quick RAID History
UC Berkeley
  Also the home of vi, csh, UNIX TCP/IP, BSD UNIX and Bill Joy!
  David Patterson, Garth Gibson, and Randy Katz
  Mid-80s
Redundant Array of Inexpensive Disks
  Now “Independent” Disks
IBM can also claim invention of RAID
  Norman Ken Ouchi - RAID 4
  Clark, et al. - Patent on RAID 5 (1986)

Early RAID Systems

RAID Terminology
RAID-0
  Striping; super important and widely used. No Redundancy!
RAID-1
  Mirroring; super important and widely used.
RAID-10
  A stripe of mirrors. Super important and widely used.
  N number of devices are lost capacity-wise.
RAID-2
  Never used
RAID-3 and RAID-4
  Rarely used

RAID Terminology
RAID-5
  Parity spread across N+1 devices; can survive 1 device failure.
  Can be implemented in both Hardware and Software
  Single device capacity is lost
RAID-6
  Parity spread across N+2 devices; can survive 2 device failures.
  Can be implemented in both Hardware and Software
  Two device capacity is lost
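The capacity cost of each level listed above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, `n` is the total number of devices in the array, and all devices are assumed to be the same size.

```python
def usable_devices(level: str, n: int) -> int:
    """Return how many devices' worth of capacity stays usable.

    n is the total number of equally sized devices in the array.
    """
    if level == "RAID-0":
        return n                  # striping only, nothing is lost (no redundancy)
    if level == "RAID-1":
        return 1                  # every device holds a full copy
    if level == "RAID-10":
        if n < 4 or n % 2:
            raise ValueError("RAID-10 needs an even number of devices, 4 or more")
        return n // 2             # half of the devices are mirror copies
    if level == "RAID-5":
        if n < 3:
            raise ValueError("RAID-5 needs at least 3 devices")
        return n - 1              # one device's worth of parity
    if level == "RAID-6":
        if n < 4:
            raise ValueError("RAID-6 needs at least 4 devices")
        return n - 2              # two devices' worth of parity
    raise ValueError(f"unsupported level: {level}")


def usable_tb(level: str, n: int, device_tb: float) -> float:
    """Usable capacity in TB for n devices of device_tb each."""
    return usable_devices(level, n) * device_tb


if __name__ == "__main__":
    for level in ("RAID-0", "RAID-10", "RAID-5", "RAID-6"):
        print(level, usable_tb(level, 6, 10.0), "TB usable from 6 x 10 TB")
```

For a 5-device RAID-5 array, `usable_devices("RAID-5", 5)` gives 4 devices of usable capacity.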

So what is the problem?

Device failure means RAID Rebuild!
Not really a big deal with sub-TB hard drives
  We will see that data shortly
Became more dangerous and painful at 1TB
  Solution - RAID 6! (well, sorta.)
However, with 10TB devices (and beyond)...
  Monster Problem!
  As we shall see...

Methodology
Common Servers
  Lenovo Broadwell based (Lenovo x3650 M5, 2U, 2 Socket)
  CentOS 7.3 (.514 kernel)
  Avago (LSI) RAID Adapter “Flatwoods” (mostly)
RAID-5 Array
  5 Devices in RAID 5, with a hot spare (in most cases)
  (and a couple of interesting Software RAID Scenarios)
Common Load
  Flexible I/O Tester “fio”
  60/40 Random Read/Write
  Queue Depth 32 per job (20 jobs)
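A load like the one described above could be expressed as a fio job file. The Python sketch below only renders such a file as text; it is not the job file used for these measurements. The block size, `direct=1`, the runtime, the job name `raid-load`, and the device path `/dev/sdX` are all assumptions added for illustration, and 60/40 is taken to mean 60% reads.

```python
def build_fio_job(device: str, read_pct: int = 60, iodepth: int = 32,
                  numjobs: int = 20, blocksize: str = "4k",
                  runtime_s: int = 600) -> str:
    """Render a fio job file for a mixed random read/write load.

    WARNING: running the resulting job against a raw device overwrites data.
    """
    lines = [
        "[global]",
        "ioengine=libaio",            # Linux native asynchronous I/O
        "direct=1",                   # assumption: bypass the page cache
        "rw=randrw",                  # random mixed reads and writes
        f"rwmixread={read_pct}",      # percentage of the mix that is reads
        f"blocksize={blocksize}",     # assumption: not stated on the slide
        f"iodepth={iodepth}",         # queue depth per job
        f"numjobs={numjobs}",         # number of identical jobs
        "time_based=1",
        f"runtime={runtime_s}",       # assumption: seconds to keep the load running
        "group_reporting=1",          # one combined report for all jobs
        "",
        "[raid-load]",                # hypothetical job name
        f"filename={device}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /dev/sdX is a placeholder; point it at a scratch device only.
    print(build_fio_job("/dev/sdX"))
```

The output can be saved to a file and passed to `fio` on the command line.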

Methodology
Measuring
  IOPS with No Load
  IOPS under Load
  RAID Rebuild time with No Load
  RAID Rebuild time under Load

And Now a Word from Our Sponsor

YOU!

Easy Way to Sponsor

Collected Data
RAID 5 Rebuild Times

Columns: Drive | RAID Array Size | Rebuild Time Idle (hours) | Rebuild Time under Load (hours)

Drives:
  500GB 7200 6G SAS
  HGST King Cobra F 15K 300G 12G SAS
  HGST Cobra F 10K 600GB 12G SAS
  HGST 10TB 12G SAS (Libra He10)
  CloudSpeed II 1.92TB SATA
  Optimus II Max 3.84TB 6G SAS
  Optimus II Ascend 800GB 6G SAS
  Bear Cove 10DWPD 800G 12G SAS R100 (14W)
  Bear Cove 10DWPD 800G 12G SAS R100 (9W)
  Fusion ioMemory SX350 3.2TB PCIe
  Fusion ioMemory SX350 3.2TB PCIe (Thread 32)
  HGST SN-150 1.6TB NVMe
  HGST SN-150 1.6TB NVMe (Threaded
  sion ioMemory SX350 3.2TB PCIe
  Fusion ioMemory SX350 3.2TB PCIe
  Fusion ioMemory SX350 3.2TB PCIe

Values (row alignment not recoverable):
  12.8TB, 16TB, 3.2TB
  13454584200 ld
  10.8K, 11.2K, 11.3K, 12K, 95.7K, 28.5K, 81.9K

Consequences!
RAID-5(6) rebuild times on current “Capacity” (10, 12 TB) drives are enormous!
  4200 Hours ≈ 5 ½ Months
  Staggering!!
Devices are stressed even more during rebuild
  Increased chance of additional device(s) failing
  Relatively slow devices now run even slower!
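To put the 4200-hour figure in perspective, the arithmetic can be worked through in a short Python sketch. It assumes the rebuild has to rewrite one full 10 TB replacement device and uses decimal terabytes; the function names are hypothetical.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.44  # average month length, an approximation


def hours_to_days(hours: float) -> float:
    return hours / HOURS_PER_DAY


def hours_to_months(hours: float) -> float:
    return hours_to_days(hours) / DAYS_PER_MONTH


def effective_rebuild_rate_mb_s(device_tb: float, hours: float) -> float:
    """Average write rate (MB/s) needed to fill device_tb in the given hours."""
    total_bytes = device_tb * 1e12        # decimal TB, as drive vendors count
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    rebuild_hours = 4200                  # rebuild time under load, from the slide
    print(f"{hours_to_days(rebuild_hours):.0f} days")
    print(f"{hours_to_months(rebuild_hours):.2f} months")
    print(f"{effective_rebuild_rate_mb_s(10, rebuild_hours):.2f} MB/s effective")
```

Under these assumptions the rebuild averages well under 1 MB/s, a tiny fraction of what the drive can stream when idle, because the production load competes with the rebuild for the same devices.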

Is there a Better Way?
Absolutely!

Application Redundancy
Let your application take care of Redundancy
MySQL
  Master-Slave Replication
Oracle
  Data Guard
Microsoft SQL Server
  AlwaysOn
Application Cluster
SAP Hana
Hadoop (in the base architecture)
OpenStack and Ceph
Not only protects against storage failure, but system failure as well

Erasure Coding
RAID-6 is a primitive Erasure Code
Tahoe-LAFS
Ceph - Block and Object
Hadoop
Swift - and other Object Storage Solutions
HGST ActiveScale - S3
API (i.e. Reed-Solomon, OpenRQ)
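The idea behind parity as an erasure code can be shown with the simplest case: a single XOR parity block, as in RAID-5. This Python sketch is deliberately not RAID-6 or Reed-Solomon, which use more elaborate arithmetic to survive two or more losses; the block contents and function name are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block: any ONE of the four can be lost.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second data block and rebuild it from the survivors.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)

assert recovered == data[1]
print("recovered block:", recovered)
```

The rebuild has to read every surviving block to recover the missing one, which is why rebuild time grows with device capacity.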

Software Defined Storage
Ceph
Swift
SUSE Enterprise Storage
VMware VSAN
Microsoft Storage Spaces Direct
DataCore
Nexenta
Nutanix
(and a score of others)

Remember the Revolution...
Flash Storage
  UBER (Uncorrectable Bit Error Rate)
    Typically an order of magnitude (or two!) better than spinners
  No Moving Parts
  Built-in Resiliency
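Why an order of magnitude in UBER matters during a rebuild can be sketched with a little probability. The rates used below (1e-15 for a spinning disk, 1e-16 and 1e-17 for flash) are illustrative assumptions, not measured values from this talk, and the model assumes bit errors are independent.

```python
import math


def p_unrecoverable_read(tb_read: float, uber: float) -> float:
    """Probability of at least one unrecoverable bit error while reading tb_read TB.

    Computes 1 - (1 - uber) ** bits in a numerically stable way.
    """
    bits = tb_read * 1e12 * 8
    return -math.expm1(bits * math.log1p(-uber))


if __name__ == "__main__":
    # Rebuilding a 5-device RAID-5 of 10 TB drives reads the 4 survivors: 40 TB.
    tb_read = 40
    for label, uber in (("assumed spinner, 1e-15", 1e-15),
                        ("assumed flash,   1e-16", 1e-16),
                        ("assumed flash,   1e-17", 1e-17)):
        print(f"{label}: {p_unrecoverable_read(tb_read, uber):.2%}")
```

Under these assumed rates, each order of magnitude improvement in UBER cuts the chance of hitting an unrecoverable read during the rebuild by roughly a factor of ten.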

Tools
Fio
  The Flexible I/O Tester
  Small learning curve yields great results
  Very scriptable
Tips
  Remember to “Pre-Condition” (especially Flash devices)
  Watch your Queue Depth
  Use the right “ioengine”
  Beware - power tools can injure!

Fio sample script

    [global]
    readwrite=write
    rwmixread=0
    blocksize=4M
    ioengine=libaio
    thread=0
    size=100%
    iodepth=16
    group_reporting=1
    description=fio PRECONDITION sequential 4M complete write

    [/dev/sda]
    filename=/dev/sda
    cpus_allowed=0-19

More Tools
MegaRAID Storage Manager
Linux md RAID tools
  cat /proc/mdstat
  mdadm --misc --detail /dev/mdYYY
  dmesg -H -w
Take Time to Tune your md Array
  Threads
    echo 16 | sudo tee /sys/block/md0/md/group_thread_cnt
  Speed Limits
    dev.raid.speed_limit_max=xxyyzz
    Defaults to dev.raid.speed_limit_max=200000
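Rebuild progress from `/proc/mdstat` can also be tracked from a script. The Python sketch below parses the progress line that md prints during a recovery; the sample text and its numbers are invented for illustration, and the regular expression only covers the common layout of that line.

```python
import re

ARRAY_RE = re.compile(r"^(md\S+)\s*:")
PROGRESS_RE = re.compile(
    r"(recovery|resync|reshape|check)\s*=\s*([\d.]+)%\s*"
    r"\((\d+)/(\d+)\)\s*finish=([\d.]+)min\s*speed=(\d+)K/sec"
)

# Illustrative sample only; real output comes from /proc/mdstat.
SAMPLE = "\n".join([
    "Personalities : [raid6] [raid5] [raid4]",
    "md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]",
    "      1953005568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [_UUUU]",
    "      [==>..................]  recovery = 12.6% (61544448/488251392)"
    " finish=127.5min speed=55729K/sec",
    "",
    "unused devices: <none>",
])


def parse_mdstat(text: str) -> dict:
    """Return {array_name: progress_info} for arrays that are rebuilding."""
    progress = {}
    current = None
    for line in text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)      # remember which array follows
            continue
        found = PROGRESS_RE.search(line)
        if found and current:
            progress[current] = {
                "action": found.group(1),
                "percent": float(found.group(2)),
                "finish_min": float(found.group(5)),
                "speed_kb_s": int(found.group(6)),
            }
    return progress


if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
```

On a live system the same function can be fed the contents of `/proc/mdstat`, for example `parse_mdstat(open("/proc/mdstat").read())`, to watch how a change to the thread count or speed limits affects the reported speed.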

Things to Remember
RAID 0 and 1 (and 10) are still very viable
  Maybe not so much with RAID 10...
RAID 5 and 6 are still OK for Flash Devices
  Understand your Limitations!
  The RAID Adapter will be your limiting factor
RAID 6 is likely OK for sub-TB Spinning Disk
  As long as you can get them!
RAID Hardware varies widely in performance!
Capacity Hard Drives Require a different Data Resiliency Technique
Using md Software RAID?
  Do not forget to tune!

Maybe some concern with RAID 10.

Where next?

(Sept 1995, page 248)
https://www.youtube.com/watch?v=V-WbdMPiM1A
  Fujitsu Eagle Spinup!
http://queue.acm.org/detail.cfm?id=1670144
  Triple-Parity RAID and Beyond (Adam Leventhal, Sun)
https://github.com/axboe/fio
  Flexible I/O Tester (fio) (Jens
id.wiki.kernel.org/index.php/RAID_setup
  Excellent md RAID tutorial

Thanks!!!
Dwain Sims
dsims@bayleafnc.org
Google Voice: 919-480-1774

Collected Data
