Earlier this week I ran into a weird issue with workflows in my SharePoint 2013 (SP2013) Beta 2 environment. I’m sharing the details of the issue and its resolution in this post in case someone else runs into the same thing. If you were in my webinar last week, you saw the issue.
The next step was to watch the communication between SP2013 & WAW, so I set up Fiddler to debug the exchange (I outlined the steps for setting up Fiddler to debug WAW & SP2013 in this post). I noticed that when SPD2013 was hung up on the “subscribing” step, SP2013 was actually submitting the workflow to WAW and waiting for a response. Eventually, however, the request would time out or be aborted.
The following image of a trace from Fiddler shows this: session #16 eventually failed (notice there is no HTTP response code). SP2013 is installed on server W15SP, I’m publishing to a team site at http://intranet.contoso.com and WAW is listening on port 12291. You can even see the markup of the workflow I created in the right-hand part of the image:
When that happened, SPD2013 acted as if the workflow was successfully published. When I tried to run the workflow, WAW would respond with “The scope […] has no workflows under it” (shown in the following image from a Fiddler trace when I tried to start an instance of the workflow):
This made sense as SP2013 had a record of the workflow, but WAW didn’t… SP2013 was telling WAW to start a workflow on an item that WAW didn’t have a record of. Strange… so I dug deeper into the Event Log and found a ton of the following errors:
Ah… so now it looks like a problem with ServiceBus! That makes sense as WAW relies on ServiceBus. After some troubleshooting with some of the engineers, it at first appeared (from the Event Log & ULS logs) that there was a certificate issue with ServiceBus, but in fact all certs looked good. We tried to connect to the ServiceBus using the ServiceBus Explorer tool but it couldn’t connect to the default workflow instance either.
Everything looked right, but for some reason stuff was all haywire. The fix was surprisingly easy: flush the DNS cache (C:\> ipconfig /flushdns)… something had the wrong pointers. Not sure how that happened, but once the cache was purged, I recycled all the Workflow & Service Bus services and then tried to publish the workflow again… and it worked! This was confirmed by successfully connecting to the workflow namespace with the ServiceBus Explorer.
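If you suspect the same kind of stale DNS entry, a quick check is to compare what a host name resolves to on the server with what you expect it to be. The following is only a rough sketch in Python, not something I used during the troubleshooting above: the function names are made up for illustration, and the expected addresses are ones you would supply for your own environment.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def find_dns_mismatches(
    expected: Dict[str, Iterable[str]],
    resolver: Callable[[str], Iterable[str]] = resolve_ipv4,
) -> List[str]:
    """Return one message per host that fails to resolve or resolves
    to none of the addresses listed for it in expected.

    expected maps host name -> addresses you believe are correct
    (an assumption you provide; nothing here discovers them for you).
    """
    problems = []
    for host, wanted in expected.items():
        try:
            got = set(resolver(host))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{host}: lookup failed ({exc})")
            continue
        if not got & set(wanted):
            problems.append(
                f"{host}: resolved to {sorted(got)}, "
                f"expected one of {sorted(wanted)}"
            )
    return problems
```

For example, calling find_dns_mismatches({"intranet.contoso.com": ["192.168.1.10"]}) on the SharePoint server (the IP address here is a placeholder, substitute your own) returns an empty list when the name resolves as expected. Anything else in the list is a hint that flushing the DNS cache, or checking the hosts file and DNS records, is worth a try.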
Hope my pain helps someone else along the way!