Smartctl: Difference between revisions
Latest revision as of 15:08, 18 October 2024
Smartmontools
- https://www.smartmontools.org/
- https://en.wikipedia.org/wiki/Smartmontools
- https://help.ubuntu.com/community/Smartmontools
- https://wiki.archlinux.org/title/S.M.A.R.T.
- sudo apt install smartmontools
- By default, smartctl is installed in /usr/sbin.
- Put export PATH=$PATH:/usr/sbin in the .bashrc file
- sudo apt install -y gsmartcontrol
- SMART data is not partition-dependent but rather disk-dependent.
NVME
- Version
$ smartctl -V | head -1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
- Keywords to look for: Written, Percentage
$ sudo smartctl -a /dev/nvme0 | grep "Writ"
Data Units Written:                 274,127 [140 GB]
Host Write Commands:                7,499,312
$ sudo smartctl -a /dev/nvme0 | grep "Percentage"
Percentage Used:                    0%
- Full output
$ sudo smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       CT1000P3SSD8
Serial Number:                      2314E6C4100F
Firmware Version:                   P9CR30A
PCI Vendor/Subsystem ID:            0xc0a9
IEEE OUI Identifier:                0x00a075
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 77f00000c9
Local Time is:                      Sat Jul 1 11:22:30 2023 EDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x06):         Cmd_Eff_Lg Ext_Get_Lg
Maximum Data Transfer Size:         64 Pages
Warning Comp. Temp. Threshold:      85 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W 0.0000W       -    0  0  0  0        0       0
 1 +     3.00W 0.0000W       -    0  0  0  0        0       0
 2 +     1.50W 0.0000W       -    0  0  0  0        0       0
 3 -   0.0250W 0.0000W       -    3  3  3  3     5000    1900
 4 -   0.0030W       -       -    4  4  4  4    13000  100000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        26 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    201,206 [103 GB]
Data Units Written:                 274,128 [140 GB]
Host Read Commands:                 4,982,258
Host Write Commands:                7,499,381
Controller Busy Time:               23
Power Cycles:                       13
Power On Hours:                     408
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      42
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               26 Celsius
Temperature Sensor 2:               31 Celsius
Temperature Sensor 8:               26 Celsius

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         42     0  0x5007  0x4005  0x028            0     0     -
Wear out
Use the attribute Percentage Used.
$ sudo smartctl -a /dev/nvme0 | grep "Percentage"
Percentage Used:                    2%
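The two grep lookups above can also be done in one pass over saved output. The following is only a sketch: the function name parse_nvme_health and the returned key names are made up for illustration, and it assumes the "Key: value" layout shown in the NVMe output on this page.

```python
def parse_nvme_health(text):
    """Pull wear-related fields out of the text printed by `smartctl -a` for an NVMe device."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Percentage Used":
            # e.g. "2%"
            fields["percentage_used"] = int(value.rstrip("%"))
        elif key == "Data Units Written":
            # e.g. "274,127 [140 GB]": keep only the count before the bracket
            count = value.split("[")[0].replace(",", "").strip()
            fields["data_units_written"] = int(count)
    return fields


sample = """\
Percentage Used:                    2%
Data Units Written:                 274,127 [140 GB]
Host Write Commands:                7,499,312
"""
print(parse_nvme_health(sample))
```

Feeding it the text of a file saved with smartctl -a /dev/nvme0 > file.txt works the same way as the inline sample.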
Difference of /dev/nvme0, /dev/nvme0n1, /dev/nvme0n1p1
Why is there both character device and block device for nvme?, Interpreting wiki/documentation for an NVMe disk
- /dev/nvme0 represents the raw device and is the “control” device node that you use to configure the hardware. It’s the NVMe device controller.
- /dev/nvme0n1, on the other hand, represents the first namespace on that device. The n1 denotes the first namespace of the device. These are the devices you use for actual storage, which will behave essentially as disks. This is what I get when I issue the "lsblk" command. So /dev/nvme0n1 is like /dev/sda.
- /dev/nvme0n1p1 represents a partition on an NVMe storage namespace. So /dev/nvme0n1p1 is like /dev/sda1.
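The naming scheme above is regular enough to check with a pattern. This is a small illustrative sketch; classify_nvme_node is a hypothetical helper, not part of smartmontools or nvme-cli.

```python
import re

# /dev/nvme<controller>[n<namespace>[p<partition>]]
_NVME_NODE = re.compile(r"^/dev/nvme(\d+)(?:n(\d+)(?:p(\d+))?)?$")


def classify_nvme_node(path):
    """Return 'controller', 'namespace', 'partition', or 'not an NVMe node'."""
    match = _NVME_NODE.match(path)
    if match is None:
        return "not an NVMe node"
    _controller, namespace, partition = match.groups()
    if partition is not None:
        return "partition"   # like /dev/sda1
    if namespace is not None:
        return "namespace"   # like /dev/sda, the node lsblk lists
    return "controller"      # character device used for configuration


for node in ("/dev/nvme0", "/dev/nvme0n1", "/dev/nvme0n1p1", "/dev/sda"):
    print(node, "->", classify_nvme_node(node))
```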
nvme-cli command
SATA SSD
- How can I monitor the TBW on my Samsung SSD?
- Crucial rates its drive at 220 TB Total Bytes Written (TBW), while Samsung rates its drive at 600 TB TBW. Both come with a 5-year warranty.
- Sector size is 512 bytes.
- The ID# may be different on different devices.
- 1 TB is 10^12 bytes; 1 TiB is 1024^4 bytes (about 1.1 × 10^12).
- Keywords to look for: Written, Percent
$ sudo smartctl -a /dev/sda | grep "Writ"
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       7384050441
$ sudo smartctl -a /dev/sda | grep "Sector"
Sector Size:      512 bytes logical/physical
$ sudo smartctl -a /dev/sda | grep "Percent"   # 99% life remaining in this case
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
- Full output
$ sudo smartctl --all /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     Crucial_CT525MX300SSD1
Serial Number:    1644148274F7
LU WWN Device Id: 5 00a075 1148274f7
Firmware Version: M0CR031
User Capacity:    525,112,713,216 bytes [525 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul 1 11:17:53 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 1391) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   7) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x0035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       14046
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       118
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   099   099   000    Old_age   Always       -       17
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       78
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   077   058   000    Old_age   Always       -       23 (Min/Max 12/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       7384050441
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       231070651
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       94337836
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1940
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Vendor (0xff)       Completed without error       00%     14038         -
# 2  Vendor (0xff)       Completed without error       00%     13741         -
# 3  Vendor (0xff)       Completed without error       00%     13548         -
# 4  Vendor (0xff)       Completed without error       00%     13126         -
# 5  Vendor (0xff)       Completed without error       00%     12915         -
# 6  Vendor (0xff)       Completed without error       00%      5647         -
# 7  Vendor (0xff)       Completed without error       00%      5484         -
# 8  Vendor (0xff)       Completed without error       00%      5312         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Wear out
Use the attribute Media_Wearout_Indicator, Percentage Used, or SSD Life Left, depending on the vendor.
# Kingston SSD 240 GB
# smartctl -a /dev/sda | grep Left
231 SSD_Life_Left           0x0000   002   002   000    Old_age   Offline      -       98

# Crucial 1T
$ sudo smartctl -a /dev/sda | grep Percent
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1

# PNY CS900 1T
$ sudo smartctl -a /dev/sda | grep -A 1 -i "lifetime"   # '-A 1' includes one line of context after the match
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8360
# "00%" under Remaining means that none of the self-test was left to run,
# which is expected for a completed test. It is not a wear indicator.
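Because the wear attribute has a different name on each vendor, a small filter over the attribute table can save a few greps. This is a sketch under assumptions: WEAR_ATTRIBUTES and find_wear_rows are illustrative names, the name list is not exhaustive, and how VALUE and RAW_VALUE should be read still differs per vendor (compare the Kingston and Crucial rows above).

```python
# Attribute names mentioned on this page; other vendors may use different ones.
WEAR_ATTRIBUTES = {"Media_Wearout_Indicator", "Percent_Lifetime_Remain", "SSD_Life_Left"}


def find_wear_rows(text):
    """Return name, normalized VALUE and RAW_VALUE for wear-related ATA attribute rows."""
    rows = []
    for line in text.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit() and parts[1] in WEAR_ATTRIBUTES:
            rows.append({"name": parts[1], "value": int(parts[3]), "raw": parts[9].strip()})
    return rows


sample = """\
231 SSD_Life_Left           0x0000   002   002   000    Old_age   Offline      -       98
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       14046
"""
for row in find_wear_rows(sample):
    print(row)
```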
USB adapter
- man smartctl and search for "-d TYPE".
- The message Unknown USB bridge [Please specify device type with the -d option.] indicates that smartctl is unable to automatically detect the type of USB bridge used by your external drive.
- You can try using the -d sat option to specify that the device is a SATA drive behind a SCSI-to-ATA Translation (SAT) layer. This is exactly the case for the Vantec adapter: GSmartControl shows it as /dev/sdc (scsi) in the Drive information. Oddly, with the Ugreen adapter dmesg also reports scsi, but GSmartControl does not show scsi. That said, adding the -d sat parameter to the smartctl command does no harm.
$ sudo dmesg
...
[997217.895800] scsi 5:0:0:0: Direct-Access Crucial_ CT525MX300SSD1 1414 PQ: 0 ANSI: 6
[997217.899060] sd 5:0:0:0: Attached scsi generic sg1 type 0
...
sudo smartctl -a -d sat /dev/sdX
The -d option is used to specify the device type, which can be useful when smartctl is unable to correctly guess the device type. For example, on some systems, smartctl may correctly guess that a drive is a SATA drive, while on other systems it may not. In such cases, the -d sat option can be used to explicitly specify that the device is a SATA drive.
- GSmartControl displays SMART supported is Yes on Ugreen adapter but No on Vantec adapter.
- On Samsung PSSD T7, I need to use sudo smartctl -a -d scsi /dev/sdc
- If this doesn’t work, you can try other device types such as -d sat,12, -d usbcypress, -d usbjmicron, -d usbprolific, or -d usbsunplus. You can find more information about these options in the smartctl man page or by running smartctl --help.
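Trying the device types one after another can be scripted. The sketch below is a heuristic, not a documented procedure: first_working_type and CANDIDATE_TYPES are hypothetical names, the candidate order is arbitrary, and "works" is taken to mean that bits 0 and 1 of the smartctl exit status (command line did not parse, device open failed) are both clear. It normally has to run as root.

```python
import subprocess

# Device types mentioned on this page, tried in this (arbitrary) order.
CANDIDATE_TYPES = ["sat", "sat,12", "scsi", "usbcypress", "usbjmicron", "usbprolific", "usbsunplus"]


def run_smartctl(device, dev_type):
    """Run `smartctl -i -d TYPE DEVICE` and return (exit status, stdout)."""
    proc = subprocess.run(
        ["smartctl", "-i", "-d", dev_type, device],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def first_working_type(device, run=run_smartctl, candidates=CANDIDATE_TYPES):
    """Return the first -d type for which smartctl could open the device, else None."""
    for dev_type in candidates:
        status, _output = run(device, dev_type)
        # Exit status is a bit mask: bit 0 = bad command line, bit 1 = device open failed.
        if (status & 0b11) == 0:
            return dev_type
    return None


if __name__ == "__main__":
    print(first_working_type("/dev/sdX"))  # replace /dev/sdX with the real device
```

The run parameter exists only so the logic can be exercised without a real drive; a type that opens the device may still return less information than another one further down the list.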
eMMC
- eMMC storage is typically accessed via an SD/MMC interface, which is not directly supported by smartctl. Therefore, it is not possible to use smartctl to check the health of eMMC storage by specifying a device type.
- How to Check eMMC info from Linux - depends on support from the kernel driver
dmesg | grep mmc
Calculation
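$ sudo apt install calc
$ calc 274127*512/1024^2
133.85107421875

The same arithmetic in R:
> 274127 *512/1024^2          # sudo smartctl -a /dev/nvme0 | grep "Data Units Written"
[1] 133.8511                  # GB
> 7384050441 * 512/1024^3     # sudo smartctl -a /dev/sda | grep "Total_LBAs_Written"
[1] 3520.99                   # GiB

Note that smartctl counts one NVMe data unit as 1000 × 512 bytes and prints decimal GB, so it reports [140 GB] (274127 × 512000 / 10^9 ≈ 140.35) rather than 133.85. The SATA result is in GiB; in decimal units it is about 3.78 TB.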
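The unit bookkeeping is easier to follow when the conversion to bytes is done first. A minimal sketch, assuming a 512-byte sector for Total_LBAs_Written (as reported by "Sector Size" above) and the NVMe convention that one data unit is 1000 blocks of 512 bytes, which is consistent with smartctl printing [140 GB] for 274,127 units. The function names are illustrative, and some vendors count Total_LBAs_Written in other units.

```python
def ata_lbas_to_bytes(total_lbas_written, sector_size=512):
    """Bytes written, assuming the attribute counts logical sectors."""
    return total_lbas_written * sector_size


def nvme_data_units_to_bytes(data_units):
    """Bytes written; one NVMe data unit is 1000 blocks of 512 bytes."""
    return data_units * 512 * 1000


nvme_bytes = nvme_data_units_to_bytes(274127)
ata_bytes = ata_lbas_to_bytes(7384050441)

# Decimal units (GB, TB) are what drive vendors use for TBW ratings.
print(f"NVMe: {nvme_bytes / 1e9:.1f} GB")
print(f"SATA: {ata_bytes / 1e12:.2f} TB, or {ata_bytes / 1024**4:.2f} TiB")
```

Comparing the decimal TB figure against the rated TBW is the like-for-like comparison.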
Understanding smartctl -a output
SMART overall-health self-assessment test result
- You can run SMART tests on a mounted disk. However, it's generally recommended to run long tests on unmounted disks to prevent any potential issues, especially during read/write operations.
- You can run the command smartctl -H /dev/sdX to check the overall-health self-assessment test result of the drive.
- If the test result is PASSED, it means that the drive is considered healthy according to the SMART system. If the test result is FAILED, it means that the drive is considered to be in a pre-failure condition and may fail soon.
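For scripting, the health result can be read from JSON instead of the text output. This sketch assumes smartctl 7.0 or later, where the -j (--json) option is available, and assumes the result sits under the smart_status.passed key; health_passed is an illustrative name.

```python
import json


def health_passed(json_text):
    """Return True/False for the overall-health result, or None if it is absent.

    json_text is expected to be the output of `smartctl -H -j /dev/sdX`.
    """
    report = json.loads(json_text)
    status = report.get("smart_status")
    if status is None or "passed" not in status:
        return None  # device did not report a SMART status
    return bool(status["passed"])


# Trimmed, hand-written sample; real output contains many more keys.
sample = '{"device": {"name": "/dev/sda"}, "smart_status": {"passed": true}}'
print(health_passed(sample))
```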
Where is the log file
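- The test results are displayed in the terminal and stored in the drive's firmware, where they can be read as long as the drive is operational. The self-test log records the drive's power-on hours (the LifeTime(hours) column) rather than a calendar date, so by default the date and time of a test cannot be recovered.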
- Use "smartctl -a /dev/sda > smartctl_results.txt" to save the results to a file.
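Since the drive itself does not keep calendar dates, one option is to put the date into the file name when saving. A minimal sketch; save_report and the file-name pattern are made up for illustration, and the report text is whatever smartctl -a printed.

```python
from datetime import datetime
from pathlib import Path


def save_report(text, directory, device, now=None):
    """Write smartctl output to <directory>/smartctl_<dev>_<YYYYmmdd_HHMMSS>.txt."""
    now = now or datetime.now()
    name = f"smartctl_{Path(device).name}_{now:%Y%m%d_%H%M%S}.txt"
    path = Path(directory) / name
    path.write_text(text)
    return path


if __name__ == "__main__":
    # Placeholder text; in practice pass the captured output of `smartctl -a /dev/sda`.
    print(save_report("SMART overall-health self-assessment test result: PASSED\n", ".", "/dev/sda"))
```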
Passed status
- The "PASSED" status indicates that the drive's overall SMART health checks have been completed, and no attributes have crossed their critical thresholds at the time of the assessment.
Low risk Error
- A few reallocated sectors
- Drives with occasional read or write errors (like CRC errors)
High risk error
- Rapidly increasing reallocated sectors count: This indicates the drive is actively deteriorating. High or rapidly increasing values (e.g., > 5–10) can indicate potential failure.
- Current pending sectors that are not getting reallocated can mean data on those sectors is already corrupted. Non-zero values suggest potential data loss if these sectors are found to be unreadable.
- Uncorrectable sectors count or errors during read/write operations suggest potential data loss. A value greater than zero is alarming and indicates significant drive issues.
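The high-risk rules above can be written down as a check over two snapshots of raw values. This is a sketch, not an established scoring method: assess is a hypothetical helper, and it assumes the conventional ATA IDs 5, 197 and 198 for reallocated, pending and uncorrectable sectors, which may differ on some devices.

```python
# Conventional ATA attribute IDs (an assumption; names and IDs vary by vendor).
HIGH_RISK = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}


def assess(previous, current):
    """Compare two {attribute ID: raw value} snapshots and list high-risk findings."""
    findings = []
    for attr_id in (197, 198):
        if current.get(attr_id, 0) > 0:
            findings.append(f"{HIGH_RISK[attr_id]} is non-zero")
    before, now = previous.get(5, 0), current.get(5, 0)
    if now > before:
        findings.append(f"{HIGH_RISK[5]} rose from {before} to {now}")
    return findings


last_week = {5: 2, 197: 0, 198: 0}   # made-up example values
today = {5: 9, 197: 1, 198: 0}
for finding in assess(last_week, today):
    print(finding)
```

A stable, small reallocated count produces no finding, which matches the low-risk case described earlier.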
Non-Critical Attributes
Attributes like Power-On Hours, Load Cycle Count, and Temperature are informational: they describe usage and operating conditions rather than media errors.
Raw_Read_Error_Rate
https://unix.stackexchange.com/a/384833
- The THRESH column tells you the lowest value that the vendor still considers healthy.
- If the WORST column shows a value at or below THRESH in the same row, the drive is considered unhealthy. It also implies that VALUE has been at or below THRESH at some point. Only attributes of type Pre-fail matter when evaluating health.
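The WORST versus THRESH rule can be applied mechanically to the attribute table. A sketch under assumptions: failing_attributes is an illustrative name, a THRESH of 000 is treated as "no threshold" (so the Unused_Reserve_NAND_Blk row from the Crucial output is not flagged), and the Reallocated_Sector_Ct row in the sample is invented to show a failing case.

```python
def failing_attributes(text):
    """Return names of Pre-fail attributes whose WORST is at or below a non-zero THRESH."""
    failing = []
    for line in text.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, worst, thresh, kind = parts[1], int(parts[4]), int(parts[5]), parts[6]
        if kind == "Pre-fail" and thresh > 0 and worst <= thresh:
            failing.append(name)
    return failing


sample = """\
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   005   005   010    Pre-fail  Always   FAILING_NOW 2000
194 Temperature_Celsius     0x0022   077   058   000    Old_age   Always       -       23 (Min/Max 12/42)
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       1940
"""
print(failing_attributes(sample))
```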
Current Pending Sector
- This has been identified by UNRAID from my 3.5" WD blue HDD.
- However, if the Current Pending Sector Count keeps increasing, the drive may be close to failure. Pending sectors are candidates for reallocation, so a growing count can also be a strong indicator that the hard drive is dying. What to Do When Encountering Current Pending Sector Count?
- How should I understand "Current Pending Sector Count" in CrystalDiskInfo reports?
Offline uncorrectable
- This has been identified by UNRAID from my 3.5" WD blue HDD.
- What Does Uncorrectable Sector Count Mean & How to Fix It
- Smartctl utility giving uncorrectable and unreadable sectors error on HDD
UDMA CRC error count
- UNRAID flagged this on my Crucial CT525MX300 525G SSD, but the overall-health result is PASSED.
Current pending ECC count
- UNRAID flagged this on my Crucial CT1000MX500 1T SSD. I still added it to the array, and after a while the error was gone.
Output from a brand new disk
- PNY 1T SSD
SMART support/capability
- Point smartctl at the whole disk: sudo smartctl -a /dev/sdb. When I use a partition instead (sudo smartctl -a /dev/sdb1), it shows SMART support is: Unavailable - device lacks SMART capability.
GSmartControl
- https://gsmartcontrol.shaduri.dev/downloads
- GSmartControl is part of Gparted Live.
- GSmartControl 1.1.3: Options -> Update Drive Database failed. After downloading and installing 1.1.4, a new window opens but shows no progress.
- For some reason, GSmartControl shows my WD Black NVMe as Unknown model, but "sudo smartctl -a /dev/nvme0n1 | grep -i model" displays the model. So the command-line tool is better.
- GSmartControl can show the commands it used, though it does not print everything. Select a disk and click Options -> View Execution Log.
UGREEN adapter
$ sudo smartctl --info --health --capabilities /dev/sdb
=== START OF INFORMATION SECTION ===
...
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
VANTEC adapter
$ sudo smartctl --info --health --capabilities /dev/sdb
=== START OF INFORMATION SECTION ===
...
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
...
The rest of the output is very similar. This bridge may not pass on all SMART commands.
- UGREEN SATA/USB adapter and VANTEC.
smartd: SMART Disk Monitoring Daemon
- How to configure smartd and be notified of hard disk problems via email
- Create a configuration file /etc/smartd.conf
/dev/sdX -H -l error -l selftest -m <email_address>
where -H monitors the SMART health status, -l error and -l selftest watch the error log and the self-test log of the /dev/sdX device, and -m sends an email to the given address if any issue is detected.
- Run sudo systemctl enable smartd.service && sudo systemctl start smartd.service
- The threshold for the temperature of a disk is typically determined by the manufacturer and is often not directly changeable by the user.
- https://wiki.archlinux.org/title/S.M.A.R.T.#smartd
- Monitoring hard disk health with smartd under Linux or UNIX operating systems
Monitor temperature
sudo apt install hddtemp
hddtemp
hddtemp /dev/sda
hddtemp -d /dev/sd[abcd]
telnet remotebox 7634
# OR
nc 192.168.1.100 7634
- SAMSUNG 980 SSD 1TB PCIe 3.0x4, NVMe and search "temperature"
- smartd reports warnings: Device: /dev/nvme0, Critical Warning (0x02): Temperature. It seems this is a common problem.
- smartctl -a /dev/nvme0 can show the current temperature
- It seems 970 EVO plus is better.
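To see how close an NVMe drive runs to the point where smartd starts warning, the current temperature can be compared with the warning threshold that smartctl -a prints. A sketch only: nvme_temperature_margin is an illustrative name, and it relies on the "Temperature:" and "Warning Comp. Temp. Threshold:" lines appearing as in the NVMe output on this page.

```python
import re


def nvme_temperature_margin(text):
    """Degrees Celsius between the current temperature and the warning threshold."""
    current = re.search(r"^Temperature:\s+(\d+) Celsius", text, re.MULTILINE)
    warning = re.search(r"^Warning\s+Comp\. Temp\. Threshold:\s+(\d+) Celsius", text, re.MULTILINE)
    if current is None or warning is None:
        return None  # one of the two lines is missing from the output
    return int(warning.group(1)) - int(current.group(1))


sample = """\
Warning Comp. Temp. Threshold:      85 Celsius
Critical Comp. Temp. Threshold:     95 Celsius
Temperature:                        26 Celsius
Temperature Sensor 1:               26 Celsius
"""
print(nvme_temperature_margin(sample))
```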
Monitor dashboard: scrutiny
- https://github.com/AnalogJ/scrutiny WebUI for smartd S.M.A.R.T monitoring. The dashboard shows the health of all disks at a glance.
- The following is the docker compose file docker-compose.yml for my case. Note that although "lsblk" shows my NVMe drive as "nvme0n1", I need to use "nvme0" to make it work. To view the dashboard, go to http://localhost:8082.
version: '3.5'
services:
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-omnibus
    cap_add:
      - SYS_ADMIN
      - SYS_RAWIO
    ports:
      - "8082:8080" # webapp
      - "8086:8086" # influxDB admin
    volumes:
      - /run/udev:/run/udev:ro
      - ./config:/opt/scrutiny/config
      - ./influxdb:/opt/scrutiny/influxdb
    devices:
      - "/dev/sda"
      - "/dev/nvme0"
- As the README describes, smartd does not record S.M.A.R.T attribute history, so it can be hard to determine if an attribute is degrading slowly over time.
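Without a dashboard, a crude form of attribute history can be kept by appending raw values to a CSV file on every run. This is a minimal sketch and not how scrutiny stores its data; append_snapshot and the column names are made up for illustration.

```python
import csv
from pathlib import Path


def append_snapshot(csv_path, timestamp, device, attributes):
    """Append one row per attribute to csv_path, writing a header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "device", "attribute", "raw_value"])
        for name, raw in sorted(attributes.items()):
            writer.writerow([timestamp, device, name, raw])


if __name__ == "__main__":
    # Values taken from the Crucial output on this page; the timestamp is an example.
    append_snapshot("smart_history.csv", "2023-07-01T11:17:53", "/dev/sda",
                    {"UDMA_CRC_Error_Count": 1, "Total_LBAs_Written": 7384050441})
```

Running it from cron or a systemd timer and plotting one attribute over time is enough to spot a slowly rising count.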
Reviews
- Samsung 980 M.2 NVMe SSD Review: Going DRAMless with V6 V-NAND (Updated)
- Crucial P3 Plus PCIe NVMe M.2 SSD also has no DRAM.
- TEAMGROUP T-Force CARDEA A440 Pro Graphene Heatsink 1TB DRAM SLC Cache
- Crucial P5 Plus 1TB PCIe Gen4 looks good, with a 5-year warranty.
Some disks
Best SSDs
Best SSDs of 2024: Reviews and buying advice
Samsung Portable SSD T7
The Samsung Portable SSD T7 (PSSD T7) supports S.M.A.R.T. data reporting. However, Linux tools did not work with the Samsung PSSD T7 at first. This was fixed by a pull request to the `drivedb` of `smartmontools`, so with a current `drivedb` it works correctly. If your `drivedb` is not current, you can manually add the `-d sntasmedia` argument to `smartctl`, or update the `drivedb` independently of `smartmontools` with the update-smart-drivedb command; see Linux tools don't work with Samsung PSSD T7 & NVMe pass-through support for Samsung T7 SSD.