Faking no read or write errors on a disk in Linux

dsg_admin — Sat, 08 Nov 2025 23:18:06 +0000

I have a major Linux geek filesystem hack that I absolutely had to share, expressly with those who would get and appreciate it!

Here goes,

I have been working to resurrect a 6+2 raidset with 3 bad disks.

In a 6+2 raidset, when 2 disks are bad the raidset is degraded but can still function. However, when 3 disks are offline the raidset will not function and is taken offline, and therefore the data is inaccessible.
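The tolerance rule above comes down to one comparison: the raidset stays up as long as the number of offline disks does not exceed the number of parity disks. A minimal sketch of that arithmetic follows; the function name raidset_online is made up for illustration and is not part of any RAID tool.

```shell
# Hypothetical helper: a raidset with N parity disks survives at most
# N offline disks.
# usage: raidset_online PARITY_DISKS OFFLINE_DISKS
raidset_online() {
    parity=$1
    offline=$2
    if [ "$offline" -le "$parity" ]; then
        echo "online"      # possibly degraded, but still functional
    else
        echo "offline"     # more disks lost than parity can cover
    fi
}

raidset_online 2 2   # 6+2 raidset with 2 bad disks
raidset_online 2 3   # 6+2 raidset with 3 bad disks
```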

2 of the disks were marked as failed and 1 drive was rebuilding, so the filesystem would not bring the raidset online.

In this case, the 2 disks were offline because the disk read/write error counts exceeded the threshold allowable for us to keep the disk in the raidset.

I noted that we (Tintri TXOS) fail the disks when the read/write or other values exceed a specific threshold.

My intention was to find a way to temporarily bring online a couple of disks that had already been marked failed.

The disks do not have any hardware IO errors or MCEs (Machine Check Exceptions), so I felt there must be a way.

I did not have a way to just replace them because there are not enough spares to rebuild the raidset. Also the supported drives were EOL.

So in this case I had to find a way to unfail the disks so I could temporarily make them online so we can evacuate the data!

So here is my hack…

In Linux the disk errors are kept in

/sys/block/sdX/device/read_err (or write_err)

Because this is the Linux “sysfs” file system, you cannot write to those files to change or clear the values. The values are kept on the disk, where the kernel reads them upon disk insertion and exposes them here.

Sooo

My crazy workaround was to make a script that would mimic the data in each of those files.

For example, the real file looks like this:

# cat /sys/block/sdX/device/read_err
0 0 0 7 0 20 0 0 [ 0 0 0 0 0 ]

I would make a directory and 3 empty files for each disk:

/tmp/fake_sdX/read_err
/tmp/fake_sdX/write_err
/tmp/fake_sdX/other_err

I stop the file system so all of the disks are unmounted

Then echo zeroed values, in the same format as above, into them:

echo -e "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/read_err

echo -e "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/write_err

echo -e "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/other_err
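The directory creation and the three echo commands can also be wrapped in a small loop, which helps when more than one disk needs the same treatment. This is only a sketch, assuming the counter line format shown earlier; the function name write_zero_counters is hypothetical.

```shell
# Hypothetical helper: create the fake directory and fill the three
# counter files with a zeroed line in the same format as the real files.
# usage: write_zero_counters /tmp/fake_sdX
write_zero_counters() {
    fake_dir=$1
    zero_line="0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]"
    mkdir -p "$fake_dir" || return 1
    for name in read_err write_err other_err; do
        # printf avoids any echo portability surprises; one line per file
        printf '%s\n' "$zero_line" > "$fake_dir/$name" || return 1
    done
}
```

Calling write_zero_counters /tmp/fake_sdX once per failed disk (with the real disk name in place of sdX) would produce the same three files as the echo commands.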

Then I mount the temporary fake files over the actual files, so that reads of each actual file return the fake contents.

Meaning I mount:

/sys/block/sdX/device/read_err --> /tmp/fake_sdX/read_err

/sys/block/sdX/device/write_err --> /tmp/fake_sdX/write_err

/sys/block/sdX/device/other_err --> /tmp/fake_sdX/other_err
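One way to do a mount like this on Linux is a bind mount of each fake file over its sysfs counterpart. The sketch below assumes that approach (mount --bind, run as root); the function name bind_fake_counters and the RUN preview variable are hypothetical and only there for illustration.

```shell
# Hypothetical helper: bind-mount each fake counter file over the
# matching sysfs file for one disk. Needs root to actually mount.
# Set RUN=echo to print the commands instead of running them.
# usage: bind_fake_counters sdX /tmp/fake_sdX
bind_fake_counters() {
    disk=$1        # kernel disk name, e.g. sdX
    fake_dir=$2    # directory holding the zeroed files
    for name in read_err write_err other_err; do
        # mount --bind SOURCE TARGET: reads of TARGET return SOURCE's contents
        ${RUN:-} mount --bind "$fake_dir/$name" "/sys/block/$disk/device/$name" || return 1
    done
}
```

Running it as RUN=echo bind_fake_counters sdX /tmp/fake_sdX first prints the three mount commands without touching /sys, which is a cheap way to check the paths before doing it for real.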

Then I verify that the disk errors now read as cleared and show my fake zeroed values.
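The verification can be done by eye with cat, or with a small check like the one below. It is a sketch that assumes the counter format shown earlier, where any digit other than 0 means a non-zero count; the function name counters_are_zero is hypothetical.

```shell
# Hypothetical helper: succeed only if all three counter files in
# DEVICE_DIR are readable and contain nothing but zero values.
# usage: counters_are_zero /sys/block/sdX/device
counters_are_zero() {
    dev_dir=$1
    for name in read_err write_err other_err; do
        if [ ! -r "$dev_dir/$name" ]; then
            echo "cannot read $dev_dir/$name"
            return 1
        fi
        # Any digit from 1 to 9 anywhere in the file means a non-zero counter.
        if grep -q '[1-9]' "$dev_dir/$name"; then
            echo "$name still shows errors: $(cat "$dev_dir/$name")"
            return 1
        fi
    done
    echo "all counters are zero"
}
```

Pointing it at /sys/block/sdX/device after the mount step should report zeros if the fake files are the ones being read.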

Then I restart the file system

The filesystem comes up, and the 2 disks previously marked failed due to threshold errors are now temporarily active!
