Here is an update. I've tried several things:
- I tried with only one of the drives attached, but the behaviour was the same: overheating and random drive shutdowns (these are probably two separate issues).
- Set the power state to a low-power one, and set the host-controlled thermal management and temperature threshold to values acceptable to me [1]. This didn't help with the kernel error, but it did help with the temperature: the drive is now throttled at the desired value (starting at 71 degrees and staying up to ~75 degrees, still without the heatsink).
- Since the previous step didn't fix the kernel error and drive shutdown, I tried the suggestion from dmesg and appended the parameters in [2] to the boot command line. This fixes the error and the drive shutdown (at least for today), but I am not happy with it, because it prevents the drive from entering low-power states, which causes more overheating and probably a shorter drive life.
- I am now logging the temperature at a 10 s interval to track any anomalies. If any still occur, I'll continue tweaking those settings.
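For anyone who wants to do the same kind of logging, here is a rough sketch, not necessarily the exact script I run. It assumes nvme-cli is installed, that the drives are /dev/nvme0 and /dev/nvme1, and that `nvme smart-log -o json` reports "temperature" in kelvins. LOGFILE, RUN_LOGGER and the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: log the composite temperature of each NVMe drive every 10 s.
# Assumption: the smart-log JSON has a "temperature" field in kelvins.
# LOGFILE and RUN_LOGGER are hypothetical names used only in this example.

LOGFILE="${LOGFILE:-$HOME/nvme-temp.log}"

# Read smart-log JSON on stdin and print the integer value of "temperature".
parse_temp_k() {
    sed -n 's/.*"temperature"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print one log line: timestamp, device, temperature in degrees Celsius
# (using 273 as the rounded kelvin offset).
format_line() {
    local dev="$1" kelvin="$2"
    echo "$(date '+%F %T') $dev $(( kelvin - 273 ))C"
}

# The loop only starts when RUN_LOGGER=1, so the helpers can be tried alone.
if [ "${RUN_LOGGER:-0}" = "1" ]; then
    while true; do
        for dev in /dev/nvme0 /dev/nvme1; do
            k="$(sudo nvme smart-log "$dev" -o json | parse_temp_k)"
            [ -n "$k" ] && format_line "$dev" "$k" >> "$LOGFILE"
        done
        sleep 10
    done
fi
```

Start it with RUN_LOGGER=1 in the environment. If the JSON field name differs on your nvme-cli version, parse_temp_k needs adjusting.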
My next steps:
- Change the serial cable (the x1005 comes with 2 in the package).
- Add a heatsink to the drives when it arrives.
- Try another (old) drive when I have time.
Finally, here are some observations:
- The temperature and shutdown problems seem to have different root causes (though they could still be related).
- The dmesg error and drive shutdown still happen at 50 degrees (without the command line parameters in [2] applied).
- After applying the kernel parameters, the temperature cannot be kept at 50 degrees (which seems normal, since they disallow certain low-power states).
- In my opinion, this is not related to the x1005 board, but to bad thermal management by the kernel [3], Raspberry Pi OS, or both.
Anyway, I will continue investigating, because this is not solved yet.
[1]
sudo nvme set-feature /dev/nvme0 -f 2 -v 1
sudo nvme set-feature /dev/nvme0 -f 4 -v 323
sudo nvme set-feature /dev/nvme0 -f 0x10 -v 0x01570158
sudo nvme set-feature /dev/nvme1 -f 2 -v 1
sudo nvme set-feature /dev/nvme1 -f 4 -v 323
sudo nvme set-feature /dev/nvme1 -f 0x10 -v 0x01570158
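For reference, the raw values above are in kelvins. Below is a small sketch of how they decode, assuming the usual NVMe layout: feature 2 selects the power state, feature 4 is the temperature threshold, and feature 0x10 packs TMT1 in the upper 16 bits and TMT2 in the lower 16 bits. The function names are only for illustration.

```shell
#!/usr/bin/env bash
# Decode the raw values passed to `nvme set-feature` in [1].
# Temperatures are in kelvins; 273 is used as the rounded offset to Celsius.

kelvin_to_celsius() {
    echo $(( $1 - 273 ))
}

# Feature 0x10 (host controlled thermal management):
# upper 16 bits = TMT1, lower 16 bits = TMT2, both in kelvins.
hctm_tmt1() { echo $(( ($1 >> 16) & 0xFFFF )); }
hctm_tmt2() { echo $(( $1 & 0xFFFF )); }

value=0x01570158
echo "Temperature threshold (-f 4 -v 323): $(kelvin_to_celsius 323) C"
echo "TMT1: $(kelvin_to_celsius "$(hctm_tmt1 "$value")") C"
echo "TMT2: $(kelvin_to_celsius "$(hctm_tmt2 "$value")") C"
```

So 323 is about 50 degrees, and 0x01570158 is 343 K / 344 K, i.e. about 70 and 71 degrees, which matches the throttling point I see.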
[2]
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
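To confirm after a reboot that these parameters were actually picked up, something like the sketch below can be used. It only inspects /proc/cmdline; has_param is a made-up helper name. I am assuming the parameters were added to the single line in the boot command line file, which on recent Raspberry Pi OS releases is usually /boot/firmware/cmdline.txt.

```shell
#!/usr/bin/env bash
# Sketch: check whether the parameters from [2] are on the running kernel's
# command line. has_param is a hypothetical helper, not an existing tool.

has_param() {
    # $1 = full command line, $2 = exact parameter to look for
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

cmdline="$(cat /proc/cmdline 2>/dev/null)"
for p in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off; do
    if has_param "$cmdline" "$p"; then
        echo "active:  $p"
    else
        echo "missing: $p"
    fi
done
```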
[3]
Linux pi 6.6.58-v8-16k
Statistics: Posted by jshan — Fri Nov 01, 2024 1:33 pm