Is our VRT reliable? How are we measuring VRT reliability?

naveen.gupta · September 17, 2023, 5:34pm

aakriti.kamboj · September 30, 2023, 6:36am

@Vikas_Dhillon could you please answer this question?

Dilip_Vishwakarma · February 7, 2024, 9:19am

We can say VRT is reliable based on the answers to the questions below:

Would VRT work correctly if one of the services stops working because of a hardware problem? - Yes, we use horizontal autoscaling.
Would VRT work correctly if the traffic is tripled or quadrupled? - Yes, through horizontal autoscaling.
Would VRT work correctly during a DDoS attack? - Yes, our VRT is security certified and tested against DDoS attacks.
Is the rate of successful end-user requests more than 99%? - Yes, as per the configuration.

So, we can say that our VRT is reliable.

DebugHorror · February 7, 2024, 3:03pm

@Dilip_Vishwakarma I would like to differ. A quick search on the Internet will show that these are the measurable metrics of a reliable system:

Uptime: The percentage of time that a system is operational and available for use.
Mean Time Between Failures (MTBF): The average time interval between system failures.
Mean Time to Repair (MTTR): The average time required to repair a failed system and restore it to full functionality.
Availability: The proportion of time that a system is operational and accessible to users within a specified time period.
Failure Rate: The frequency at which components or the system as a whole fail within a given timeframe.
Reliability Growth: The improvement in system reliability over time as issues are identified and addressed through testing, maintenance, and updates.
Redundancy Levels: The degree to which critical components or functions are duplicated within the system to mitigate the impact of failures.
Service Level Agreement (SLA) Compliance: The extent to which the system meets agreed-upon performance standards and uptime targets outlined in SLAs.

Any claim that a system is or is not reliable should be backed by these parameters, so I would request you to share them for Vahana. Without the exact metrics, these are just assumptions.
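
As a rough illustration of how a few of these numbers (availability, MTBF, MTTR, failure rate) could be derived from an incident log, here is a minimal Python sketch. The `Incident` record, the `reliability_metrics` function, and the sample outages are all hypothetical and not taken from Vahana or VRT; the sketch also assumes that incidents fall fully inside the reporting window and do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    """Hypothetical outage record: not an actual Vahana/VRT data structure."""
    start: datetime  # when the outage began
    end: datetime    # when service was fully restored


def reliability_metrics(incidents, window_start, window_end):
    """Compute basic reliability metrics over a reporting window.

    Assumes incidents lie inside the window and do not overlap.
    """
    total_s = (window_end - window_start).total_seconds()
    downtime_s = sum((i.end - i.start).total_seconds() for i in incidents)
    uptime_s = total_s - downtime_s
    n = len(incidents)
    return {
        # Share of the window during which the system was operational.
        "availability_pct": 100.0 * uptime_s / total_s,
        # Mean Time Between Failures: operating time per failure.
        "mtbf_hours": uptime_s / n / 3600 if n else None,
        # Mean Time to Repair: average outage duration.
        "mttr_hours": downtime_s / n / 3600 if n else None,
        # Failure rate expressed as failures per day.
        "failures_per_day": n / (total_s / 86400),
    }


# Made-up example: a 30-day window with a 1-hour and a 2-hour outage.
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)
incidents = [
    Incident(datetime(2024, 1, 5, 10), datetime(2024, 1, 5, 11)),
    Incident(datetime(2024, 1, 20, 14), datetime(2024, 1, 20, 16)),
]
metrics = reliability_metrics(incidents, window_start, window_end)
print(metrics)
```

With these made-up inputs, the availability is about 99.58%, the MTBF is 358.5 hours, and the MTTR is 1.5 hours. The same calculation could be run per component, so that one unreliable service is not hidden by an overall average.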

naveen.gupta · February 7, 2024, 5:25pm

When we talk about reliability, it’s important to ensure that all components are reliable, not just one area. For example, vLog is not reliable; it often doesn’t work on Sandbox/CUG, and we often have to perform manual work to make it function.