Zscaler and Digital Guardian Web Inspection Proxy: a lesson in pain


We have been fighting to get a deployment of Zscaler Client Connector (ZCC) and the Digital Guardian agent with its Web Inspection Proxy (DG WIP) working since March 2022. It is now January 27, 2023.

So here is the grind. Backstory and configs first. DG WIP is a local proxy installed along with the DG agent on our VMware virtual desktops via image cloning (non-persistent Windows machines that are destroyed on log off). It runs as a service and listens on localhost port 3128. It does SSL inspection and needs a trusted root cert to function correctly. It never reaches out to anything in the cloud; everything is done locally by the agent.
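For anyone trying to reproduce this, a quick sanity check before blaming either product is to confirm that something is actually listening on the DG WIP port. This is only a minimal sketch: the function name `is_listening` is made up for illustration, and a successful TCP connect only proves that some process has the port open, not that it is DG WIP.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 3128, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    The defaults match the localhost port 3128 described above. A True
    result only means *something* accepted the connection on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
```

Calling `is_listening()` on the virtual desktop right after logon, and again after signing into ZCC, would show whether the local proxy is still reachable at each stage.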

Zscaler Client Connector is configured in the forwarding profile for packet filter, tunnel, Z-Tunnel version 1, and enforce proxy. The app profile is set up to use a PAC file with various bypasses. We have messed with every setting you could think of during troubleshooting, including Z-Tunnel version 2, no PAC file, and so on.

When DG WIP is alone without Zscaler, everything is fine. Add Zscaler to the mix and things get very interesting. The two products fight, and one of three things happens. Sometimes Zscaler wins: the cert presented to the browser chains to the Zscaler root cert and Zscaler does what it needs to. Sometimes DG WIP wins: its cert is the one presented and it does what it needs to. Sometimes nothing works: the browser treats every connection as invalid and websites will not load.

The real killer was that NO Microsoft cloud logins would function at all if you signed into Zscaler before everything else loaded. For example, on bootup the Windows 10 virtual machine would load and the ZCC login was the first thing to appear. If you signed into Zscaler first, anything Microsoft that wanted to log in afterwards would not even show its login prompt. Teams normally comes up and asks for Microsoft account credentials, with a space to enter a username and password. When signed into Zscaler first, you just got an error that said can’t reach login.microsoft.com. For everything. Outlook, Windows activation, Teams, Office, etc. We exempted every…single…IP range… Microsoft listed for those apps. Everywhere you could think of. Didn’t matter.

Seems simple, right? Just exit the Zscaler app and the problem should go away. NOPE. The machine was permanently broken and would never log in to anything Microsoft again. The only way to fix it was to spawn a new machine or uninstall Zscaler completely. Mind you, this is all reproducible to this day in my environment.

Here is the real kicker. If you waited to sign into Zscaler until after every Microsoft login was done, everything would proceed normally! You could then open and close Microsoft products, sign out, log back into Microsoft accounts, completely normal. Absolutely bizarre. Another insane aspect of this: persistent VDI machines that can reboot and physical machines are not impacted. At all. Everything appears to work as it should with the two products.
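To see which product's cert is being presented on a given boot without clicking through the browser's certificate viewer, a short script can read the issuer of whatever certificate comes back. This is a sketch under several assumptions: the script's traffic has to be intercepted the same way the browser's is (which may not hold in every setup), the inspecting product's root has to be trusted by the store Python uses, and the strings in `MARKERS` are hypothetical placeholders, since the real issuer names depend on your tenant and DG configuration.

```python
import socket
import ssl

# Hypothetical substrings to look for in the issuer fields; replace them
# with whatever your Zscaler and DG WIP root certs actually contain.
MARKERS = {"zscaler": "zscaler", "dg": "digital guardian"}


def flatten_name(name) -> dict:
    """Flatten the nested tuples getpeercert() uses for issuer/subject."""
    return {key: value for rdn in name for key, value in rdn}


def classify_issuer(issuer: dict, markers: dict) -> str:
    """Return the first marker label whose substring appears in the issuer."""
    text = " ".join(issuer.values()).lower()
    for label, needle in markers.items():
        if needle.lower() in text:
            return label
    return "other"


def presented_issuer(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host and return the issuer of the certificate presented.

    Raises ssl.SSLCertVerificationError if the chain is not trusted,
    which would correspond to the "nothing works" state.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return flatten_name(tls.getpeercert()["issuer"])
```

Something like `classify_issuer(presented_issuer("example.com"), MARKERS)` run a few times across reboots would give a rough tally of which side is winning, with a verification error standing in for the third case.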

When Zscaler is alone without DG, everything is fine. The two together are like oil and water. We worked with DG support to figure out some combination of domain flags, process flag exemptions, or third-party proxy forwarding settings that would make things work. We just made it so much worse.

Finally, after months of reading documentation, troubleshooting with support, and learning how each product works down at the design level, I found that proxy chaining ZCC to the DG WIP local proxy might work. Sure enough, once I found the right combination of settings in Zscaler, it worked. DG WIP can intercept SSL traffic and stop what it is supposed to stop. Zscaler can intercept SSL traffic and block bad websites, browser isolate, etc. DG also has some settings that tell it to leave Zscaler processes and URLs alone.

The config I discovered to make this happen was to configure Zscaler forwarding control to forward all HTTP/HTTPS service traffic to the DG WIP proxy, using 127.0.0.1:3128 as the proxy/gateway. I then set the ZCC forwarding profile to packet filter, tunnel, Z-Tunnel version 2 with DTLS, and system proxy: never. The app profile is set up to not use a PAC file. I had to exclude all private IP space in the domain exclusions in the app profile to allow local traffic to flow. That was fun to figure out: without that exemption, machines would check in, pull the ZCC config, and then promptly shut themselves out of the local network.
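When building the private IP exclusions, it helps to be explicit about which ranges are meant. The sketch below assumes "private IP space" means the three RFC 1918 IPv4 blocks; the exact entries used in the app profile are not listed above, and your environment may need more (or different) ranges. The helper name `is_rfc1918` is made up for illustration.

```python
import ipaddress

# Assumption: "private IP space" = the three RFC 1918 IPv4 blocks.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918)
```

Running a handful of internal server addresses through `is_rfc1918` is a cheap way to spot anything that lives outside those blocks and would still be cut off after the ZCC config is pulled.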

So, what do I want? I need someone from Zscaler to tell me, one way or the other, whether this is a supported config. What are the drawbacks? How does the traffic actually flow in this situation? In what order? Am I missing something important in the configs the way I have them? What traffic is actually being sent through ZCC? It would also be super cool to have someone from Zscaler understand this completely and document it accordingly so others do not have to suffer like I did.

The very first thing I did was run all of this through Zscaler support with a ticket. They told me Zscaler works fine on its own without DG, so they couldn’t help. It is funny, because after months of work on my own I figured out how to make Zscaler play nice with DG. Sure, it might not be a supported config, but it sure would be nice to know. I really don’t want to work with level 1 phone support again, as I almost lost my sanity explaining this situation to them. They just don’t get it until I show them over Zoom, and then the phone goes silent as they stare in disbelief. BTW, I have been in contact with the sales engineer and the sales guy. I felt guilty bothering the sales engineer so much, so I gave up. Thanks for the help.