At this moment there are several ways to build exciting new applications. In several projects we are using a hybrid/cloud architecture, specifically Windows Azure. In my upcoming posts I would like to share some of the guidelines we are developing internally; in this case, a way of handling errors in Azure queues and topic subscriptions.
A lot of Azure (integration) architectures (and even communication between web and worker roles) will likely use some elements of the Azure Service Bus or Azure Queues. Going through the different architectures is not part of this post, so I will make do with a slide from the Service Bus Deep Dive presentation:
Within our company, Caesar, several internal systems have been built and, where possible, purchased. One of them, CRM 4.0, was outdated and no longer suited all our requirements (among them online accessibility). We decided to migrate our CRM system to the cloud, using Dynamics CRM. As not all systems have been migrated yet, and we are still analyzing the requirements and alternatives, we needed a solution for updating the internal systems that use CRM information.
As Dynamics CRM provides a means to push updates to Windows Azure, we implemented the following solution:
· Dynamics CRM sends Contacts to the Azure Service Bus topic ‘Contacts’
o For each consuming system, we have a subscription (e.g. contacts-systemA)
· Dynamics CRM sends Accounts to the Azure Service Bus topic ‘Accounts’
· An internal Windows service picks up messages from the subscriptions and sends them to the LOB systems
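The fan-out above (one topic per entity type, one subscription per consuming system) can be sketched with a minimal in-memory model. This is purely illustrative Python, not the actual Service Bus API; the topic and subscription names match the setup described above, everything else is an assumption:

```python
from collections import defaultdict, deque

class Topic:
    """In-memory sketch of a Service Bus topic with subscriptions.

    Illustrative only: every message sent to the topic is delivered to
    each subscription, so each consuming system gets its own stream.
    """
    def __init__(self, name):
        self.name = name
        self.subscriptions = defaultdict(deque)

    def add_subscription(self, name):
        self.subscriptions[name]  # touching the key creates an empty subscription

    def send(self, message):
        for queue in self.subscriptions.values():
            queue.append(message)

    def receive(self, subscription):
        queue = self.subscriptions[subscription]
        return queue.popleft() if queue else None

# One topic per entity type, one subscription per internal system
contacts = Topic("Contacts")
contacts.add_subscription("contacts-systemA")
contacts.add_subscription("contacts-systemB")

contacts.send({"entity": "contact", "name": "J. Doe"})
```

Because every subscription receives its own stream, system A processing (or failing to process) a message has no effect on system B.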
The following diagram illustrates this architecture:
This worked fine; however, sometimes we had a problem processing messages. After diving into it, we identified that malformed messages (incomplete Accounts/Contacts) were being sent. Processing such a message caused an error, which led to an Abandon; the message would remain on the queue, and so, eventually, the problem would occur again. We implemented a maximum-number-of-errors strategy, so ultimately the processing service would stop. Implementing error handling, transient fault handling, and an email listener did not prevent anything; we did not know when an error would occur or what the error would be.
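The maximum-number-of-errors guard mentioned above can be sketched as follows. This is a simplified Python model, not the actual Windows service code; the class and handler names are assumptions:

```python
class ProcessingService:
    """Sketch of the maximum-number-of-errors strategy (illustrative).

    After max_errors consecutive failures the service stops itself
    instead of abandoning the same poison message forever.
    """
    def __init__(self, handler, max_errors=5):
        self.handler = handler
        self.max_errors = max_errors
        self.errors = 0
        self.running = True

    def process(self, message):
        try:
            self.handler(message)
            self.errors = 0           # a success resets the counter
        except Exception:
            self.errors += 1          # message is abandoned and stays on the queue
            if self.errors >= self.max_errors:
                self.running = False  # stop rather than loop on a poison message

def push_to_lob(message):
    raise ValueError("malformed contact")  # simulate a poison message

service = ProcessingService(push_to_lob, max_errors=3)
for _ in range(3):
    service.process({"contactid": "42"})
# service.running is now False: the service stopped itself
```

Stopping the service protects the LOB systems, but it also halts all healthy messages behind the poison one, which is exactly the problem this post works toward solving.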
We stretched the capabilities of the CRM plugin and the CRM configuration, which allow you to send all fields and perform validations; nevertheless, several things can go wrong:
· Technical error
o Transient faults – network hiccups, Azure updates that terminate connections; these can all be handled by implementing the EntLib Transient Fault Handling block
o Environment configuration – the Azure topic/subscriptions have not been created in the environment; this can be prevented by using a strategy such as the one proposed in my earlier post
o Management – the Azure storage account configuration is modified or removed; these risks can be minimized by implementing a solid Azure security policy (and not promoting everybody to co-administrator)
o Server – the processing service is not available; this should be monitored and causes business issues, but due to the asynchronous setup of this architecture it does not cause any issues in the system that are not solved by restarting the service
· Functional error
o Entity consistency
§ Contacts/Accounts are not valid because not all mandatory fields are set; this can be resolved by managing the CRM plugin
o Entity dependencies
§ If a Contact insert has not been processed in the internal system, a subsequent Contact update will fail
§ If an Account insert has not been processed, the relation with the account cannot be made, and the Contact insert will fail
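The two functional error classes above (entity consistency and entity dependencies) can be detected before processing a message. A minimal sketch, assuming hypothetical field names (`contactid`, `lastname`, `accountid`) rather than the actual CRM schema:

```python
# Illustrative mandatory fields; the real CRM schema differs
MANDATORY_CONTACT_FIELDS = {"contactid", "lastname", "accountid"}

def validate_contact(contact, known_accounts):
    """Classify the functional errors described above.

    Returns a list of problems; an empty list means the contact can be
    processed. Checks entity consistency (mandatory fields) and entity
    dependencies (the referenced account must already exist).
    """
    problems = []
    missing = MANDATORY_CONTACT_FIELDS - contact.keys()
    if missing:
        problems.append("missing mandatory fields: " + ", ".join(sorted(missing)))
    if "accountid" in contact and contact["accountid"] not in known_accounts:
        problems.append("account %s not yet inserted" % contact["accountid"])
    return problems

known_accounts = {"ACC-1"}
ok = validate_contact(
    {"contactid": "C-1", "lastname": "Doe", "accountid": "ACC-1"}, known_accounts)
bad = validate_contact(
    {"contactid": "C-2", "accountid": "ACC-9"}, known_accounts)
```

A message that fails such a check is exactly the kind that should not simply be abandoned, since retrying it will never succeed on its own.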
Given the problems above, some can be solved by implementing readily available frameworks and components; for others, however, a strategy is in order. Let’s look at the aforementioned problems in relation to the operations. Message processing was implemented earlier using the peek-lock model, where a message is only marked as processed by one of the following operations on the brokered message:
· Complete (everything went fine)
· Abandon (an error occurred while processing)
· Defer (meta-data can be added to the message, so that the message can be picked up at a later time)
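The three outcomes above can be modeled with a small in-memory queue. This is a sketch of the peek-lock semantics, not the Service Bus client API; receiving locks the message rather than removing it, and only one of the three operations settles it:

```python
from collections import deque

class PeekLockQueue:
    """In-memory sketch of the peek-lock model (illustrative only)."""
    def __init__(self):
        self.messages = deque()
        self.locked = {}    # lock token -> message
        self.deferred = {}  # message id -> message, retrievable later
        self._token = 0

    def send(self, message):
        self.messages.append(message)

    def receive(self):
        """Peek-lock: the message is hidden from other receivers, not removed."""
        if not self.messages:
            return None, None
        self._token += 1
        self.locked[self._token] = self.messages.popleft()
        return self._token, self.locked[self._token]

    def complete(self, token):
        """Everything went fine: the message is removed for good."""
        del self.locked[token]

    def abandon(self, token):
        """An error occurred: the lock is released and the message reappears."""
        self.messages.appendleft(self.locked.pop(token))

    def defer(self, token, message_id):
        """Set the message aside to be picked up later by its id."""
        self.deferred[message_id] = self.locked.pop(token)

queue = PeekLockQueue()
queue.send("contact-update")
token, message = queue.receive()  # message is locked, not removed
queue.abandon(token)              # processing failed: it reappears
```

Note what Abandon does here: the message goes straight back to the front of the queue, which is exactly why a functionally broken message keeps coming back.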
Will this solve a functional error? No!
So what we need is a strategy that allows messages to be stored in a location related to the queue/topic-subscription, where they will not be processed (they are ‘dead’) but are queued for further investigation. Hence:
“All messages, which cannot be processed, are placed in the DeadLetter queue”
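The rule quoted above can be sketched by extending the queue model with a delivery count: each failed delivery increments the count, and once it reaches a maximum the message is moved to the dead-letter queue instead of being retried forever. Again an illustrative Python model, not the broker itself:

```python
from collections import deque

class QueueWithDeadLetter:
    """Sketch of the dead-letter strategy (illustrative only).

    A message that repeatedly fails is parked in the dead-letter queue
    for investigation, so the main queue keeps flowing.
    """
    def __init__(self, max_delivery_count=3):
        self.max_delivery_count = max_delivery_count
        self.messages = deque()
        self.dead_letter = deque()

    def send(self, body):
        self.messages.append({"body": body, "delivery_count": 0})

    def process(self, handler):
        """Deliver one message; on failure retry it or dead-letter it."""
        message = self.messages.popleft()
        message["delivery_count"] += 1
        try:
            handler(message["body"])
        except Exception:
            if message["delivery_count"] >= self.max_delivery_count:
                self.dead_letter.append(message)   # parked for investigation
            else:
                self.messages.appendleft(message)  # abandoned: retried later

queue = QueueWithDeadLetter(max_delivery_count=2)
queue.send("malformed contact")

def handler(body):
    raise ValueError("mandatory field missing")  # simulate the functional error

queue.process(handler)  # first attempt: abandoned, back on the queue
queue.process(handler)  # second attempt: moved to the dead-letter queue
```

The key difference from the earlier situation: the poison message no longer blocks the queue or stops the service; it waits in a known location until someone looks at it.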
This will result in the following state:
This, however, poses several new challenges: what to do with the dead-letter messages, and how to resubmit them? In the next post I will explain my effort to implement a monitoring solution by using and evaluating several existing frameworks and technologies.
To be continued….