I/O-Performance

Minor incident de/fra2 Distributed Storage
2024-03-04 16:32 CEST · 15 hours, 27 minutes

Updates

Post-mortem

On Sunday, March 3, 2024, at around 5:30 pm, we lost storage nodes in our cluster. This reduced the cluster's performance and redundancy, because less disk and CPU capacity was available.

However, the remaining storage nodes were able to handle all storage operations with good performance and without any impact on our customers.

This is an entirely normal event that can happen at any time and is handled by our software.

On Monday, March 4, 2024, at around 3 pm, we replaced the failed disks to restore full redundancy and performance.

Replacing failed hardware on the next business day is also an entirely normal operating procedure.

Once rebalancing and synchronization of the replaced disks started, we observed a slight increase in storage latency, which affected latency-sensitive customer applications.

In the future, we will increase the number of available storage nodes and slow down the recovery speed to reduce the risk of increased I/O latency.

March 6, 2024 · 12:39 CEST
Resolved

We located the issue and restored the service. Everything is working again.

March 5, 2024 · 07:59 CEST
Issue

We are currently experiencing impairments to I/O performance at the Fra2 site.

March 4, 2024 · 16:32 CEST
