Synology Btrfs crash
A couple of months ago, I got myself a new Synology to replace my older DS213. The choice fell on a DS716+II, which also supports Btrfs. To migrate from the DS213 to the DS716, I took one of the mirror drives out of the DS213, placed it in the DS716 to reformat, and then gradually transferred the data to the new Syno. Once everything was transferred, I would take the remaining disk out of the DS213 and add it to the DS716.
After adding the second drive to the DS716 and wiping it, my Btrfs volume decided to crash…
Since the second disk was already wiped, I had no other backup of my data, so I started searching the internet. When logging in to the NAS over SSH, I did notice that my data was still present, but the volume wouldn’t come back online.
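For anyone hitting the same situation: the kernel log and the md state are the first things worth checking over SSH. This is generic Linux tooling rather than anything Synology-specific:
# Check recent kernel messages for md/Btrfs errors
dmesg | tail -n 50
# Check the current state of the RAID arrays
cat /proc/mdstat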
After several searches, I came across this post from Sébastien Dubois, where he encountered the same error.
When checking the health of the volume, I had the following output:
> mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Thu Apr 27 02:33:31 2017
Raid Level : raid1
Array Size : 2925444544 (2789.92 GiB 2995.66 GB)
Used Dev Size : 2925444544 (2789.92 GiB 2995.66 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Sun Apr 30 12:25:01 2017
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : ds716:2 (local to host ds716)
UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Events : 9685
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
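In other words, the array was still clean but running degraded: it expected two mirror members, yet only /dev/sda3 was present and the second slot was marked as removed.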
Inspection of the disk gave me the following output:
> mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Name : ds716:2 (local to host ds716)
Creation Time : Thu Apr 27 02:33:31 2017
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 5850889120 (2789.92 GiB 2995.66 GB)
Array Size : 5850889088 (2789.92 GiB 2995.66 GB)
Used Dev Size : 5850889088 (2789.92 GiB 2995.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : de0fb6cf:a3f04514:93f8ac77:ef9a8dec
Update Time : Sun Apr 30 12:25:01 2017
Checksum : cd541bd3 - correct
Events : 9685
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing)
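The important part here is that the superblock on /dev/sda3 was intact: the checksum was correct and the event count matched the array, so the data on the surviving disk itself looked fine. The problem sat at the md level, not in the contents of the disk.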
Sébastien had contacted Synology about his issue and got remote help. Using the commands he retrieved from the log file, I re-created the array to bring it back up again.
# Stop all NAS services except SSH
syno_poweroff_task -d
# Stop the RAID set
mdadm --stop /dev/md2
# Unmount the volume
umount /volume1
# Re-create the RAID set
mdadm -Cf /dev/md2 -e1.2 -n1 -l1 /dev/sda3 -u5d2c337f:99f93c69:ac00d5b7:f5c6819f
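For reference, the flags on that mdadm create line: -C creates a new array, -f forces it, -e1.2 selects the version 1.2 metadata format, -n1 sets the number of RAID devices to one, -l1 sets the RAID level to 1, and -u re-uses the original array UUID. Re-creating an array like this only rewrites the md superblock; as long as the metadata version and data offset match the original array, the filesystem data on the member disk is left where it was. It remains a last-resort operation, so only run it when you know exactly what the original array looked like.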
The actual output of the last command was the following:
> mdadm -Cf /dev/md2 -e1.2 -n1 -l1 /dev/sda3 -u5d2c337f:99f93c69:ac00d5b7:f5c6819f
mdadm: /dev/sda3 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Apr 27 02:33:31 2017
Continue creating array? y
mdadm: array /dev/md2 started.
> mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Sun Apr 30 12:44:12 2017
Raid Level : raid1
Array Size : 2925444544 (2789.92 GiB 2995.66 GB)
Used Dev Size : 2925444544 (2789.92 GiB 2995.66 GB)
Raid Devices : 1
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Sun Apr 30 12:44:12 2017
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : ds716:2 (local to host ds716)
UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Events : 0
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
After that, the Btrfs volume was back up again, as confirmed by mdstat.
> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sda3[0]
2925444544 blocks super 1.2 [1/1] [U]
md3 : active raid1 sdb3[0]
2925444544 blocks super 1.2 [1/1] [U]
md1 : active raid1 sdb2[1] sda2[0]
2097088 blocks [2/2] [UU]
md0 : active raid1 sdb1[1]
2490176 blocks [2/1] [_U]
unused devices: <none>
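Before trusting the volume again, it doesn’t hurt to verify the Btrfs filesystem sitting on the reassembled array. The following is a generic sketch rather than something Synology support prescribed; the device name follows the output above, the filesystem is still unmounted at this point after syno_poweroff_task, and the exact btrfs-progs commands available on DSM may differ:
# Show the Btrfs filesystem on the reassembled array
btrfs filesystem show /dev/md2
# Run a read-only check of the filesystem (can take a while)
btrfs check --readonly /dev/md2
# Reboot so DSM remounts the volume and restarts its services
reboot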
The issue I had occurred late on a Friday. I did submit a support ticket with Synology, but since it was already the weekend, I wasn’t expecting to hear back from them until Monday.
During my own troubleshooting, I kept them updated. On Monday I got a request to send them a diagnostics package.
It appeared one of my drives had a huge number of bad blocks, and they urged me to replace that drive as soon as possible.
The faulty drive was a WD Red, which was still under warranty. Luckily for me, it got replaced and everything is working as intended now.
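If you want to keep an eye on drive health yourself, the SMART attributes are the place to look. A minimal sketch, assuming smartctl from smartmontools is available over SSH; adjust the device name to the disk you want to inspect:
# Print SMART health and attributes for the first disk
smartctl -a /dev/sda
# Attributes worth watching for bad blocks:
#   Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable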