Synology Btrfs crash
A couple of months ago, I got myself a new Synology to replace my older DS213. The choice fell on a DS716+II, which also supports Btrfs. To migrate from the DS213 to the DS716, I took one of the mirror drives out of the DS213, placed it in the DS716 to reformat, and then gradually transferred the data to the new Syno. Once everything was transferred, I would take the remaining disk out of the DS213 and add it to the DS716.
After adding the second drive to the DS716 and wiping it, my Btrfs volume decided to crash…
Since the second disk was already wiped, I had no other backup of my data, so I started searching the internet. When logging in to the NAS over SSH, I did notice that my data was still present, but the volume wouldn’t come back online.
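For anyone hitting the same situation: the kernel log and the md state are the first things worth checking over SSH. This is generic Linux tooling rather than anything Synology-specific:
# Check recent kernel messages for md/Btrfs errors
dmesg | tail -n 50
# Check the current state of the RAID arrays
cat /proc/mdstat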
After several searches, I came across this post from Sébastien Dubois, where he encountered the same error.
When checking the health of the volume, I had the following output:
> mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Thu Apr 27 02:33:31 2017
Raid Level : raid1
Array Size : 2925444544 (2789.92 GiB 2995.66 GB)
Used Dev Size : 2925444544 (2789.92 GiB 2995.66 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Sun Apr 30 12:25:01 2017
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : ds716:2 (local to host ds716)
UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Events : 9685
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
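In other words, the array was still clean but running degraded: it expected two mirror members, yet only /dev/sda3 was present and the second slot was marked as removed.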
Inspection of the disk gave me the following output:
> mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Name : ds716:2 (local to host ds716)
Creation Time : Thu Apr 27 02:33:31 2017
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 5850889120 (2789.92 GiB 2995.66 GB)
Array Size : 5850889088 (2789.92 GiB 2995.66 GB)
Used Dev Size : 5850889088 (2789.92 GiB 2995.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : de0fb6cf:a3f04514:93f8ac77:ef9a8dec
Update Time : Sun Apr 30 12:25:01 2017
Checksum : cd541bd3 - correct
Events : 9685
Device Role : Active device 0
Array State : A. ('A' == active, '.' == missing)
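The important part here is that the superblock on /dev/sda3 was intact: the checksum was correct and the event count matched the array, so the data on the surviving disk itself looked fine. The problem sat at the md level, not in the contents of the disk.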
Sébastien had contacted Synology about his issue and got remote help. Using the commands he retrieved from the log file, I re-created the array to bring it back up again.
# Stop all NAS services except SSH
syno_poweroff_task -d
# Stop the RAID set
mdadm --stop /dev/md2
# Unmount the volume
umount /volume1
# Re-create the RAID set
mdadm -Cf /dev/md2 -e1.2 -n1 -l1 /dev/sda3 -u5d2c337f:99f93c69:ac00d5b7:f5c6819f
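For reference, the flags on that mdadm create line: -C creates a new array, -f forces it, -e1.2 selects the version 1.2 metadata format, -n1 sets the number of RAID devices to one, -l1 sets the RAID level to 1, and -u re-uses the original array UUID. Re-creating an array like this only rewrites the md superblock; as long as the metadata version and data offset match the original array, the filesystem data on the member disk is left where it was. It remains a last-resort operation, so only run it when you know exactly what the original array looked like.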
The actual output of the last command was the following:
> mdadm -Cf /dev/md2 -e1.2 -n1 -l1 /dev/sda3 -u5d2c337f:99f93c69:ac00d5b7:f5c6819f
mdadm: /dev/sda3 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Apr 27 02:33:31 2017
Continue creating array? y
mdadm: array /dev/md2 started.
> mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Sun Apr 30 12:44:12 2017
Raid Level : raid1
Array Size : 2925444544 (2789.92 GiB 2995.66 GB)
Used Dev Size : 2925444544 (2789.92 GiB 2995.66 GB)
Raid Devices : 1
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Sun Apr 30 12:44:12 2017
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : ds716:2 (local to host ds716)
UUID : 5d2c337f:99f93c69:ac00d5b7:f5c6819f
Events : 0
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
After that, the Btrfs volume was back up again, as confirmed by mdstat.
> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid1 sda3[0]
2925444544 blocks super 1.2 [1/1] [U]
md3 : active raid1 sdb3[0]
2925444544 blocks super 1.2 [1/1] [U]
md1 : active raid1 sdb2[1] sda2[0]
2097088 blocks [2/2] [UU]
md0 : active raid1 sdb1[1]
2490176 blocks [2/1] [_U]
unused devices: <none>
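Before trusting the volume again, it doesn’t hurt to verify the Btrfs filesystem sitting on the reassembled array. The following is a generic sketch rather than something Synology support prescribed; the device name follows the output above, the filesystem is still unmounted at this point after syno_poweroff_task, and the exact btrfs-progs commands available on DSM may differ:
# Show the Btrfs filesystem on the reassembled array
btrfs filesystem show /dev/md2
# Run a read-only check of the filesystem (can take a while)
btrfs check --readonly /dev/md2
# Reboot so DSM remounts the volume and restarts its services
reboot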
The issue I had occurred late on a Friday. I did submit a support ticket with Synology, but since it was already the weekend, I wasn’t expecting to hear back from them until Monday.
During my own troubleshooting, I kept them updated. On Monday I got a request to send them a diagnostics package.
It appeared one of my drives had a huge number of bad blocks, and they urged me to replace that drive as soon as possible.
The faulty drive was a WD Red, which was still under warranty. Luckily for me, it got replaced and everything is working as intended now.
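If you want to keep an eye on drive health yourself, the SMART attributes are the place to look. A minimal sketch, assuming smartctl from smartmontools is available over SSH; adjust the device name to the disk you want to inspect:
# Print SMART health and attributes for the first disk
smartctl -a /dev/sda
# Attributes worth watching for bad blocks:
#   Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable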