Well, this took me a long time to figure out, and it ended up being a pretty simple issue to fix!
For a while now I had been getting a lot of weird RPC errors between my Exchange 2013 and Exchange 2016 DAGs: database copies would not seed and the content index kept failing! I also could not add my Exchange 2016 servers into a DAG due to RPC-related errors. Naturally, I chalked this up to an MS bug (as you do! :) ) or to latency causing timeouts between my cloud OVH site and my own datacenter (connected via a site-to-site VPN), but it turned out to be much simpler than that! The event logs weren't very helpful either, as the errors are usually generic; for example, the error message I got was similar to this:
Error: MapiExceptionNetworkError: GetPhysicalDatabaseInformation rpc failed. (hr=0x80040115, ec=-2147221227)
Googling the error brought up all sorts of generic fixes which I knew from experience wouldn't resolve it. I had tried Update-MailboxDatabaseCopy with the -DeleteExistingFiles option, as most guides suggest for this error, and double- and triple-checked that all the services were running (including WinRM), that Windows Firewall wasn't blocking any ports, and that there was enough free disk space on both ends. I had no other issues, so I never suspected anything network related.
Then one day (i.e. today) I was poking around the Cisco ASAv's packet inspection rules (the global policy-map) and noticed one of the inspect lines read "inspect dcerpc".
For those that don't know: DCE/RPC (Distributed Computing Environment / Remote Procedure Calls) in Microsoft Exchange (and pretty much every other MS product, such as Active Directory) uses TCP port 135 to initiate RPC connections, and the server then responds back on a random dynamic port. "Inspect dcerpc" in the ASA's global policy-map inspects sessions established via port 135, tracks the packet direction, and then allows the return packets coming back to the source server (the one that originally established the port 135 traffic) through the dynamic port the destination Windows Server negotiated.
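If you're curious, you can watch these dynamically pinholed sessions on the ASA itself. The commands below are standard ASA CLI; the IP address is just a placeholder for one of your Exchange servers:

```
! List current connections involving an Exchange server (placeholder IP)
show conn address 10.0.0.25
! Show the inspection policies currently applied, with hit counts
show service-policy
```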
In environments with good security you would only allow traffic between the sources, destinations and ports that you trust, so this inspection is important and a key feature of the Cisco ASA appliances; otherwise you'd have to open up ports 1024-65535, which would punch a huge hole in your security!
So, what is the solution to the RPC errors? Either one of the following:
Disable dcerpc inspection (e.g. by entering global configuration mode on the ASA with 'conf t', then 'policy-map global_policy' (or whatever your global policy is called), then 'no inspect dcerpc') and permit all traffic between your Exchange DAG members (open all ports). That fixed the issue straight away and my DAGs miraculously started working again, at least in the short term. In the long run you'd want to lock the traffic back down for security purposes (allow only trusted traffic).
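As a rough sketch, option 1 looks something like this on the ASA. This assumes the dcerpc inspection sits under the default inspection class in a policy-map named global_policy; the class and policy names in your config may differ:

```
! Enter global configuration mode
conf t
! Edit the global policy-map (name may differ in your environment)
policy-map global_policy
 ! The class containing the dcerpc inspection (often inspection_default)
 class inspection_default
  no inspect dcerpc
end
! Save the config once you're happy with the result
write memory
```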
Adjust the inspection and specify/increase the 'pinhole' timeout so that the servers at either end don't decide the connection has terminated and drop the session. This is necessary if your servers are connected via slower links. In my case my two sites are in different countries, tunnelled over the internet via a site-to-site VPN, so I definitely needed to increase the pinhole timeout.
To do this, first delete any existing dcerpc inspection rules in your global policy-map as per option 1. (Exactly how depends on whether the dcerpc inspection rule sits directly in global_policy or inside a class, which varies by environment and config; essentially, put 'no' in front of anything referencing dcerpc, RPC, MSRPC, DCOM, etc. You might want to get permission before doing this due to the potential impact on other production services!!)
Now do the following:
class-map dcerpc
 match port tcp eq 135
policy-map type inspect dcerpc dcerpc_map
 parameters
  timeout pinhole 0:10:00
  endpoint-mapper lookup-operation
policy-map global_policy
 class dcerpc
  inspect dcerpc dcerpc_map
service-policy global_policy global
Again, this example works for me but may differ in your environment depending on your company policies, the name of your global policy, and whether the policy is applied to a specific interface rather than globally! You would need approval for your environment and might need to schedule downtime or raise a change request, otherwise you might get some angry customers!
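To verify the new inspection is actually in place, you can check the running config and the applied service policy. These are standard ASA show commands; the policy names here match the example above, so adjust them if yours differ:

```
! Confirm the class-map and policy-maps made it into the running config
show running-config class-map
show running-config policy-map
! Confirm the global service policy is applied and matching traffic
show service-policy global
```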
To test or troubleshoot, you can enable debugging on the ASA to see the dcerpc traffic that matches the above rule by running:
debug dcerpc packet
or only debug errors by running:
debug dcerpc error
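Debug output can get noisy (and burns CPU on the ASA), so remember to turn it off once you're done; these are standard ASA commands:

```
! Disable just the dcerpc debugging
no debug dcerpc
! Or disable all debugging at once
undebug all
```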