Faking no read or write errors on a disk in Linux
https://datastorageguy.com/2025/11/08/faking-no-read-or-write-errors-on-a-disk-in-linux/
Sat, 08 Nov 2025, datastorageguy.com: Consulting Services and Tech-Tips from Ben Patridge

I have a major Linux geek filesystem hack that I absolutely had to share with those who will get it and appreciate it!

Here goes,

I have been working on an issue: resurrecting a 6+2 raidset with 3 bad disks.

In a 6+2 raidset (6 data disks plus 2 parity disks), when 2 disks are bad the raidset is degraded but can still function. However, when 3 disks are offline the raidset cannot function and is taken offline, so the data is inaccessible.

Two of the disks were marked as failed and a third was rebuilding, so the filesystem would not bring the raidset online.

In this case, the 2 disks were offline because their read/write error counts exceeded the threshold allowed for keeping a disk in the raidset.

I noted that we (Tintri TXOS) fail a disk when its read, write, or other error counts exceed a specific threshold.

My intention was to find a way to temporarily bring back online a couple of disks that had already been marked failed.

The disks did not show any hardware I/O errors or MCEs (Machine Check Exceptions), so I felt there must be a way.

I could not simply replace them, because there were not enough spares to rebuild the raidset, and the supported drives were EOL.

So in this case, I had to find a way to unfail the disks and temporarily bring them online so we could evacuate the data!

So here is my hack…

In Linux, the disk error counts are kept in:

/sys/block/sdX/device/read_err (and likewise write_err and other_err)

Because this is the Linux "sysfs" filesystem, you cannot write to those files to change or clear the values. The values are kept on the disk itself; the kernel reads them upon disk insertion and exposes them here.
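Before touching anything, a small read-only loop like the one below could dump those counters for every disk. This is a sketch, assuming the read_err, write_err and other_err attributes exist under each disk's device directory as they do on this TXOS system (a stock kernel may not provide them, so missing files are simply skipped); show_disk_errors is a hypothetical helper name, not a TXOS tool.

```shell
#!/bin/sh
# show_disk_errors [SYSFS_BLOCK]
# Hypothetical helper: prints the read_err, write_err and other_err counters
# for every sd* disk under SYSFS_BLOCK (default /sys/block).
# ASSUMPTION: these attribute files exist, as on the author's TXOS system;
# any that are missing or unreadable are skipped.
show_disk_errors() {
    sysblk=${1:-/sys/block}
    for dev in "$sysblk"/sd*; do
        # Skips non-disk entries and an unmatched glob (literal "sd*").
        [ -d "$dev/device" ] || continue
        for f in read_err write_err other_err; do
            if [ -r "$dev/device/$f" ]; then
                printf '%s %s: %s\n' "${dev##*/}" "$f" "$(cat "$dev/device/$f")"
            fi
        done
    done
}
```

Running `show_disk_errors` with no argument reads /sys/block; passing another directory lets you point it at a copied tree for testing.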

Sooo

My crazy workaround was to make a script that would mimic the data in each of those files.

For example, the file looks like this:

# cat /sys/block/sdX/device/read_err
0 0 0 7 0 20 0 0 [ 0 0 0 0 0 ]

I would make 3 directories and 3 empty files for the disk

/tmp/fake_sdX/read_err
/tmp/fake_sdX/write_err
/tmp/fake_sdX/other_err

Next, I stop the filesystem so that all of the disks are unmounted.

Then I echo zeroed values into the fake files:

echo "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/read_err

echo "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/write_err

echo "0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]" > /tmp/fake_sdX/other_err
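The create-and-zero steps above could be wrapped in a small function so each failed disk gets identical treatment. This is a sketch under my own assumptions: make_fake_counters and ZERO_LINE are hypothetical names, the /tmp/fake_sdX layout mirrors the paths used here, and printf replaces echo so the output is the same in any POSIX shell.

```shell
#!/bin/sh
# All-zero counter line, in the same format as the real read_err file.
ZERO_LINE='0 0 0 0 0 0 0 0 [ 0 0 0 0 0 ]'

# make_fake_counters DISK [FAKE_BASE]
# Hypothetical helper: creates FAKE_BASE/fake_DISK/{read_err,write_err,other_err}
# (FAKE_BASE defaults to /tmp), each containing ZERO_LINE.
make_fake_counters() {
    disk=$1
    base=${2:-/tmp}
    dir="$base/fake_$disk"
    mkdir -p "$dir" || return 1
    for f in read_err write_err other_err; do
        printf '%s\n' "$ZERO_LINE" > "$dir/$f" || return 1
    done
}
```

For example, `make_fake_counters sdb` would create /tmp/fake_sdb/read_err, write_err and other_err, each holding the zeroed line.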

Then I bind-mount each temporary fake file over the matching sysfs file, so that reading the real path returns the fake contents.

Meaning I mount:

/sys/block/sdX/device/read_err -> /tmp/fake_sdX/read_err

/sys/block/sdX/device/write_err -> /tmp/fake_sdX/write_err

/sys/block/sdX/device/other_err -> /tmp/fake_sdX/other_err
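In command form, the mapping above corresponds to one bind mount per file, with the fake file as the source and the sysfs attribute as the target. The sketch below assumes root privileges and the same hypothetical /tmp/fake_sdX layout; bind_fake_counters and the DRY_RUN switch are illustrative names, not part of TXOS.

```shell
#!/bin/sh
# bind_fake_counters DISK [FAKE_BASE] [SYSFS_BLOCK]
# Hypothetical helper: bind-mounts FAKE_BASE/fake_DISK/<file> over
# SYSFS_BLOCK/DISK/device/<file> for read_err, write_err and other_err.
# Real mounts require root. With DRY_RUN=1 it only prints the commands.
bind_fake_counters() {
    disk=$1
    base=${2:-/tmp}
    sysblk=${3:-/sys/block}
    for f in read_err write_err other_err; do
        src="$base/fake_$disk/$f"          # fake, zeroed file
        dst="$sysblk/$disk/device/$f"      # sysfs attribute to cover
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo mount --bind "$src" "$dst"
        else
            mount --bind "$src" "$dst" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 bind_fake_counters sdb` first is a cheap way to review the commands. To undo the hack, unmount each target (for example `umount /sys/block/sdb/device/read_err`), after which the kernel's real counter is visible again.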

Then I verify that the disk error counters are now cleared and show my fake zeroed values.
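The verification step can also be scripted. In this sketch (counters_all_zero and verify_disk are hypothetical names), a disk counts as cleared only if every field in each counter file, other than the [ and ] separators, is 0. That assumes the counter line format shown earlier.

```shell
#!/bin/sh
# counters_all_zero FILE
# Succeeds only if FILE is readable and every field except "[" and "]" is 0.
# The body runs in a subshell so "set -f" (no globbing) stays local.
counters_all_zero() (
    [ -r "$1" ] || exit 2
    set -f
    for n in $(cat "$1"); do
        case $n in
            '['|']'|0) ;;        # separators and zero counters are fine
            *) exit 1 ;;         # any non-zero or unexpected field
        esac
    done
)

# verify_disk DISK [SYSFS_BLOCK]
# Checks read_err, write_err and other_err for DISK; stops at the first
# file that is not all zeros.
verify_disk() {
    disk=$1
    sysblk=${2:-/sys/block}
    for f in read_err write_err other_err; do
        if counters_all_zero "$sysblk/$disk/device/$f"; then
            echo "$disk $f: zeroed"
        else
            echo "$disk $f: NOT zeroed" >&2
            return 1
        fi
    done
}
```

After the bind mounts, `verify_disk sdb` should report all three files as zeroed; if it does not, the mounts did not take effect.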

Then I restart the filesystem.

The filesystem comes up, and the 2 disks previously marked failed due to threshold errors are now temporarily active!

Using grep to search a file using multiple words as boolean AND OR condition
https://datastorageguy.com/2024/09/10/using-grep-to-search-a-file-using-multiple-words-as-boolean-and-or-condition/
Tue, 10 Sep 2024

Command Overview

The following extended grep (grep -E) example shows how to search a log file for:

  1. Any line containing the words
    1. init
      AND
    2. snmp OR tomcat, appearing after init on the same line (the match is case-sensitive)
  2. Any line containing either
    1. start_tomcat
      OR
    2. stop_tomcat
      OR
    3. start_snmp
      OR
    4. stop_snmp
  3. Excluding any line that contains the string INT

Example

# grep -E "init.*(snmp|tomcat)|(start|stop)_(tomcat|snmp)" debug.log | grep -v INT
2024-09-10T13:54:36.686053-07:00 nec-6090-01#b stop_tomcat[4019]: running, pid file: /var/run/jsvc.pid , timeout interval: 15 secs
2024-09-10T13:54:36.687429-07:00 nec-6090-01#b stop_tomcat[4019]: pid: 1651
2024-09-10T13:54:36.704588-07:00 nec-6090-01#b stop_tomcat[4019]: init-+-AuthenticationS---7*[{Authentication}]
2024-09-10T13:54:36.705498-07:00 nec-6090-01#b stop_tomcat[4019]: |-HAMon---15*[{HAMon}]
2024-09-10T13:54:36.706346-07:00 nec-6090-01#b stop_tomcat[4019]: |-PlatMon---13*[{PlatMon}]
2024-09-10T13:54:36.707142-07:00 nec-6090-01#b stop_tomcat[4019]: |-ProcMon-+-realstore---98*[{realstore}]
2024-09-10T13:54:36.707881-07:00 nec-6090-01#b stop_tomcat[4019]: |
2024-09-10T13:54:36.708690-07:00 nec-6090-01#b stop_tomcat[4019]: `-9*[{ProcMon}]
2024-09-10T13:54:36.709454-07:00 nec-6090-01#b stop_tomcat[4019]: |-agetty
2024-09-10T13:54:36.710315-07:00 nec-6090-01#b stop_tomcat[4019]: |-cimserver---2*[{cimserver}]
2024-09-10T13:54:36.711077-07:00 nec-6090-01#b stop_tomcat[4019]: |-corewatch
2024-09-10T13:54:36.711792-07:00 nec-6090-01#b stop_tomcat[4019]: |-crond
2024-09-10T13:54:36.712541-07:00 nec-6090-01#b stop_tomcat[4019]: |-dbus-daemon
2024-09-10T13:54:36.713388-07:00 nec-6090-01#b stop_tomcat[4019]: |-java---21*[{java}]
2024-09-10T13:54:36.714237-07:00 nec-6090-01#b stop_tomcat[4019]: |-jsvc-+-jsvc
2024-09-10T13:54:36.714977-07:00 nec-6090-01#b stop_tomcat[4019]: |
2024-09-10T13:54:36.715776-07:00 nec-6090-01#b stop_tomcat[4019]: `-jsvc---62*[{jsvc}]
2024-09-10T13:54:36.716540-07:00 nec-6090-01#b stop_tomcat[4019]: |-jsvc-+-jsvc
2024-09-10T13:54:36.717367-07:00 nec-6090-01#b stop_tomcat[4019]: |
2024-09-10T13:54:36.718192-07:00 nec-6090-01#b stop_tomcat[4019]: `-jsvc---109*[{jsvc}]
2024-09-10T13:54:36.720254-07:00 nec-6090-01#b stop_tomcat[4019]: |-lldpd---lldpd
2024-09-10T13:54:36.721010-07:00 nec-6090-01#b stop_tomcat[4019]: |-6*[mingetty]
2024-09-10T13:54:36.721717-07:00 nec-6090-01#b stop_tomcat[4019]: |-ntpd---ntpd
2024-09-10T13:54:36.722473-07:00 nec-6090-01#b stop_tomcat[4019]: |-rpcbind
2024-09-10T13:54:36.723252-07:00 nec-6090-01#b stop_tomcat[4019]: |-rsyslogd-+-log_scrub.py
2024-09-10T13:54:36.724039-07:00 nec-6090-01#b stop_tomcat[4019]: |
2024-09-10T13:54:36.724822-07:00 nec-6090-01#b stop_tomcat[4019]: `-9*[{rsyslogd}]
2024-09-10T13:54:36.725621-07:00 nec-6090-01#b stop_tomcat[4019]: |-runuser---postmaster---18*[postmaster]
2024-09-10T13:54:36.726416-07:00 nec-6090-01#b stop_tomcat[4019]: |-runuser---postmaster---16*[postmaster]
2024-09-10T13:54:36.727143-07:00 nec-6090-01#b stop_tomcat[4019]: |-sh---stop_tomcat---pstree
2024-09-10T13:54:36.727860-07:00 nec-6090-01#b stop_tomcat[4019]: |-sshd-+-sshd---rsync.bin
2024-09-10T13:54:36.728562-07:00 nec-6090-01#b stop_tomcat[4019]: |
2024-09-10T13:54:36.729270-07:00 nec-6090-01#b stop_tomcat[4019]: `-sshd---bash---service---tomcat---initctl
2024-09-10T13:54:36.730039-07:00 nec-6090-01#b stop_tomcat[4019]: |-thriftshell---11*[{thriftshell}]
2024-09-10T13:54:36.730807-07:00 nec-6090-01#b stop_tomcat[4019]: |-udevd---2*[udevd]
2024-09-10T13:54:36.731537-07:00 nec-6090-01#b stop_tomcat[4019]: |-watchdog
2024-09-10T13:54:36.732229-07:00 nec-6090-01#b stop_tomcat[4019]: `-winbindd---winbindd
2024-09-10T13:54:37.735365-07:00 nec-6090-01#b stop_tomcat[4019]: Tomcat is stopped
2024-09-10T13:54:37.738757-07:00 nec-6090-01#b init[4098]: tomcat post-start
2024-09-10T13:54:39.366904-07:00 nec-6090-01#b start_tomcat[4097]: LOG-TOMCAT-0003: TASKSET options
2024-09-10T13:54:39.369087-07:00 nec-6090-01#b start_tomcat[4097]: Set JAVA options -Xms128M -Xmx2800M -XX:MaxMetaspaceSize=140M -XX:ReservedCodeCacheSize=140M -XX:+UseCodeCacheFlushing -Xss512k -XX:+UseStringDeduplication -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=*:5005 -XX:+PrintClassHistogram -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/corefiles -XX:ErrorFile=/var/corefiles/hs_err_pid<>.log -Dfile.encoding=UTF8 -Dcom.tintri.platform.productid=BR2 -Dhttps.protocols=TLSv1.2 -Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false
2024-09-10T13:54:39.396222-07:00 nec-6090-01#b start_tomcat[4097]: Starting server using jsvc.
2024-09-10T13:55:11.215363-07:00 nec-6090-01#b stop_snmp[15117]: Stopping snmp agent
2024-09-10T13:55:12.217316-07:00 nec-6090-01#b stop_snmp[15117]: SNMP agent is stopped
2024-09-10T13:55:12.246235-07:00 nec-6090-01#b start_snmp[15416]: Set JAVA options -Xms64M -Xmx128M -XX:MaxMetaspaceSize=48M -XX:ReservedCodeCacheSize=48M -XX:+UseCodeCacheFlushing -Xss512k -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=*:5006 -XX:+UseStringDeduplication -XX:+PrintClassHistogram -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/corefiles -Dcom.tintri.platform.productid=BR2 -Dfile.encoding=UTF8
2024-09-10T13:55:12.247242-07:00 nec-6090-01#b start_snmp[15416]: Starting SNMP agent using jsvc.
2024-09-10T14:27:39.615636-07:00 nec-6090-01#b vmstore[2582]: HA: [2961] [tid 4996] HA-STATE: errorDetected: module [1], current state [104] indicates Error
2024-09-10T14:27:39.615698-07:00 nec-6090-01#b vmstore[2582]: HA: [2962] [tid 4996] HA-STATE: setStatusLocked: setting nodeRole to [SECONDARY], nodeStatus to [DISCONNECTED]
2024-09-10T14:27:39.690762-07:00 nec-6090-01#b vmstore[2582]: HA: [2967] [tid 5043] HA-STATE: errorDetected: module [5], current state [104] indicates Error
2024-09-10T14:27:40.191327-07:00 nec-6090-01#b vmstore[2582]: HA: [2969] [tid 5020] HA-STATE: setStatusLocked: setting nodeRole to [PRIMARY], nodeStatus to [SELECTED]
2024-09-10T14:27:40.312790-07:00 nec-6090-01#b vmstore[2582]: HA: [2970] [tid 2911] HA-STATE: disconnecting Secondary; becoming Primary
2024-09-10T14:27:40.313379-07:00 nec-6090-01#b vmstore[2582]: HA: [2980] [tid 2911] HA-STATE: notified HA modules to become Primary
2024-09-10T14:27:40.914011-07:00 nec-6090-01#b vmstore[2582]: HA: [3672] [tid 2911] HA-STATE: becomePrimary: invoked
2024-09-10T14:27:40.914059-07:00 nec-6090-01#b vmstore[2582]: HA: [3673] [tid 2911] HA-STATE: becomePrimary: eligible to be Primary due to NVRAM gen num [4] >= current disk gen num [4]
2024-09-10T14:27:40.914099-07:00 nec-6090-01#b vmstore[2582]: HA: [3674] [tid 2911] HA-STATE: updateGenNum: STARTUP: writing intended gen num [5]
2024-09-10T14:27:40.956406-07:00 nec-6090-01#b vmstore[2582]: HA: [3676] [tid 2911] HA-STATE: updateGenNum: writing NVRAM gen num as [5]
2024-09-10T14:27:40.999307-07:00 nec-6090-01#b vmstore[2582]: HA: [3681] [tid 2911] HA-STATE: setStatusLocked: setting nodeRole to [PRIMARY], nodeStatus to [RECOVERING]
2024-09-10T14:27:46.172490-07:00 nec-6090-01#b vmstore[2582]: HA: [7846] [tid 2911] HA-STATE: primaryBeginAccepting: invoked
2024-09-10T14:27:46.172972-07:00 nec-6090-01#b vmstore[2582]: HA: [7856] [tid 2911] HA-STATE: setStatusLocked: setting nodeRole to [PRIMARY], nodeStatus to [ACTIVE]
2024-09-10T14:27:46.173513-07:00 nec-6090-01#b vmstore[2582]: HA: [7868] [tid 746] HA-STATE: haManagerListenerThread: invoked
2024-09-10T14:28:46.213586-07:00 nec-6090-01#b vmstore[2582]: HA: [10685] [tid 746] HA-STATE: syncComplete: module [5] indicates SyncComplete
2024-09-10T14:28:47.113674-07:00 nec-6090-01#b vmstore[2582]: HA: [10815] [tid 4997] HA-STATE: syncComplete: module [1] indicates SyncComplete
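To see how the two stages combine without a real debug.log, the sketch below wraps the same command in a shell function (filter_log is a hypothetical name) and feeds it a few made-up sample lines.

```shell
#!/bin/sh
# filter_log: the same two-stage filter as above, reading log lines on stdin.
#   Stage 1 (grep -E) keeps lines where "init" is followed later by "snmp" or
#   "tomcat", or lines containing start_/stop_ followed by tomcat/snmp.
#   Stage 2 (grep -v) drops any line containing the string "INT".
# Both stages are case-sensitive, as in the original command.
filter_log() {
    grep -E "init.*(snmp|tomcat)|(start|stop)_(tomcat|snmp)" | grep -v INT
}

# Hypothetical sample lines (not from a real debug.log).
printf '%s\n' \
    'host init[4098]: tomcat post-start' \
    'host stop_snmp[15117]: Stopping snmp agent' \
    'host start_tomcat[4097]: INT debug noise' \
    'host init[1]: network up' \
    'host crond[77]: SNMP job done' |
filter_log
```

Only the init ... tomcat line and the stop_snmp line survive. The start_tomcat line matches stage 1 but is dropped by grep -v INT, and the last two lines match neither alternative (note that the uppercase SNMP is not matched by the lowercase pattern).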
Data Storage Funny #1
https://datastorageguy.com/2023/09/20/data-storage-funny-1/
Wed, 20 Sep 2023

Let this sink in!!
