Resolved

This incident has been resolved.

Update

Greetings Rally Customers,

We have a conclusive update to share on our work to ensure webhooks are firing and being delivered properly. We have successfully deployed a change to correct the webhook issues we have been seeing, and all webhook rules are now firing as expected.

That said, as noted in my earlier communications about the underlying issues, the problems we were seeing created a very large queue of messages that still need to be processed. We expect it to take until sometime after 1 AM MT for the queue to catch up and return to expected processing speeds. By the time your teams log in tomorrow morning, you should see webhooks operating normally.

I can only reiterate that we appreciate your patience and understand your frustration with this, and we are committed to making sure that our customers are able to leverage Rally to meet their needs.

Sincerely,

Your Dedicated Rally Team

Update

Hello Rally Customers,

We wanted to provide you all with some updates as we continue to narrow down the root cause of the webhook issue.

Issue #1: Webhooks may fire multiple times, depending on the number of expressions in the webhook rule. For example, a webhook rule with 2 expressions can fire twice. Depending on your code implementation and error handling, you may simply be dropping the extraneous messages. Regardless of whether you are seeing this directly, the net result is that this issue has massively inflated the size of our webhooks queue, which is driving the inconsistent delivery times you are seeing.
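For customers who want to guard against duplicate deliveries on their own side, here is a minimal sketch of a receiver that ignores repeats. It assumes duplicate deliveries share a stable identifier; the `message_id` field, the `DedupingReceiver` class, and the `handle_delivery` method are hypothetical names for illustration and are not taken from Rally's documented payload or API.

```python
from typing import Any, Callable, Dict, Set


class DedupingReceiver:
    """Processes each delivery at most once, keyed on an assumed identifier."""

    def __init__(self, process: Callable[[Dict[str, Any]], None]) -> None:
        self._process = process
        # In-memory only: a real service would need persistent, bounded storage.
        self._seen: Set[str] = set()

    def handle_delivery(self, payload: Dict[str, Any]) -> bool:
        """Return True if the payload was processed, False if it was a repeat."""
        key = payload.get("message_id")  # hypothetical field name
        if key is None:
            # No identifier to compare against, so process rather than drop.
            self._process(payload)
            return True
        if key in self._seen:
            return False  # duplicate delivery, safely ignored
        self._seen.add(key)
        self._process(payload)
        return True
```

If duplicate firings do not share an identifier, the key would instead have to be derived from the payload contents, for example the affected object and its change version.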

Next Steps and Timing: We were just able to pinpoint this exact behavior as the root cause in the last hour. The engineering team is working on a fix right now, but we can't yet say how soon we will be able to deploy it. As soon as I know more about this fix I will share an update.

Issue #2: The other issue that surfaced as part of this is a network configuration setting that limits the number of outbound socket connections we can make to any one endpoint. While we have been flooded with this backlog of traffic, the limit has restricted our ability to move through the queue quickly across all customer and webhook endpoints. Although this issue is clearly exacerbated right now by the first problem, we believe it may have manifested in the past, but only very intermittently. That made it hard to track down and led to a perception that webhooks could be "flaky." Correcting this problem should improve the overall reliability of webhook delivery.
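To illustrate the kind of limit described above, here is a minimal sketch of a per-endpoint cap on concurrent outbound connections. It is not Rally's implementation: the `PerEndpointLimiter` class, the cap value, and the endpoint names are all hypothetical.

```python
import threading
from typing import Dict


class PerEndpointLimiter:
    """Caps how many connections may be open to any one endpoint at a time."""

    def __init__(self, max_per_endpoint: int) -> None:
        self._max = max_per_endpoint
        self._lock = threading.Lock()
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def _semaphore(self, endpoint: str) -> threading.BoundedSemaphore:
        # Create the semaphore for an endpoint lazily, under a lock.
        with self._lock:
            if endpoint not in self._slots:
                self._slots[endpoint] = threading.BoundedSemaphore(self._max)
            return self._slots[endpoint]

    def try_acquire(self, endpoint: str) -> bool:
        """Claim a connection slot without blocking; False if the cap is hit."""
        return self._semaphore(endpoint).acquire(blocking=False)

    def release(self, endpoint: str) -> None:
        """Return a slot once the connection to the endpoint is closed."""
        self._semaphore(endpoint).release()


limiter = PerEndpointLimiter(max_per_endpoint=2)  # hypothetical cap
first = limiter.try_acquire("hooks.example.com")
second = limiter.try_acquire("hooks.example.com")
third = limiter.try_acquire("hooks.example.com")  # cap reached, must wait
```

Once the cap is reached for an endpoint, further deliveries to it have to wait for a slot, which is why a large backlog drains slowly under a low limit.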

Next Steps and Timing: We understand what needs to happen to fix this. However, we'd like to have a better sense of the timing of the fix for the first issue before we deploy this one, so that we don't generate an unnecessarily high rate of unexpected webhooks to customer endpoints.

This has been an incredibly complex issue to debug, and we genuinely understand the frustration it has caused and how it has impacted your business.

Sincerely, Your Rally Team

Update

Hello Rally Customers,

We have continued to investigate the root cause of the webhooks issue today, but have little new to report. Our teams are still looking into the two issues mentioned previously. Currently, we are seeing a larger volume of outbound messages to our customers, which is delaying webhook delivery. We have also noticed an uptick in non-response replies when trying to deliver to a 3rd party consumer of our webhooks, which may be contributing to these delays, and we are working with them to determine the root cause of the dramatic increase in volume.

We are truly sorry for the delay in webhook delivery and the inconvenience this has caused to your business. We will continue to provide updates daily as we narrow down the issue.

Update

We are continuing to investigate this issue.

Update

To our valued Rally Customers,

We appreciate all of your feedback on this issue and understand how it is impacting your business. As you’re aware, over the past week we have seen an issue with webhook reliability, and we have made significant improvements to address it. Webhook rules are now triggering, and we are receiving successful status back for the majority of users. Currently, we are seeing a larger volume of outbound messages to our customers, which is delaying webhook delivery. We have also noticed an uptick in non-response replies when trying to deliver to a 3rd party consumer of our webhooks, which may be contributing to these delays, and we are working with them to determine the root cause of the dramatic increase in volume. Our Rally engineering team is hard at work to resolve this and improve the experience. On behalf of your Rally team, we thank you for taking the time to read this update and for continuing to engage with us. We’ll be providing updates regularly as we drive this issue to a conclusion.

We apologize for any inconvenience that this has caused.

Update

We are continuing to investigate this issue.

Update

We are continuing to investigate this issue.

Update

To our valued Rally Customers,

We appreciate all of your feedback on this issue and understand how it is impacting your business. As you’re aware, over the past week we have seen an issue with webhook reliability, and we have made significant improvements to address it. Webhook rules are now triggering, and we are receiving successful status back for the majority of users. Currently, we are seeing a larger volume of outbound messages to our customers, which is delaying webhook delivery. We have also noticed an uptick in non-response replies when trying to deliver to a 3rd party consumer of our webhooks, which may be contributing to these delays, and we are working with them to determine the root cause of the dramatic increase in volume. Our Rally engineering team is hard at work to resolve this and improve the experience. On behalf of your Rally team, we thank you for taking the time to read this update and for continuing to engage with us. We’ll be providing updates regularly as we drive this issue to a conclusion.

We apologize for any inconvenience that this has caused.

Update

We are investigating this issue.

Update

We are still investigating the issue with WebHooks.

Investigating

We are currently investigating this issue.

Identified

We have identified an issue with webhook matching and are continuing to work on a solution. We will provide additional information as it becomes available. Thank you for your patience as we work towards a resolution of this issue.

Investigating

We are currently investigating an issue with Webhooks not processing correctly. We apologize for any inconvenience that this has caused.
