Question answered from the #ask-csb Slack channel:
A customer is seeing a lot of duplicate files in their ACSB environment. They have opened a ticket and support has come back with some details, but they still don’t understand why they are seeing so many duplicate files. As a result, the value of ACSB is being watered down. Out of ~300 files, only 6 have been non-duplicates, and even a couple of those have been duplicated.
The file, with the same name/URL/MD5, is sandboxed multiple times, sometimes within minutes of each other. Basically, every time the customer’s application starts up, it looks like it calls home and pulls down a DLL, which is sandboxed. My prospect has ~300 sandbox entries and most of them are this DLL. It has been sandboxed every time (they have an allow-and-scan rule, so there is no user impact), but the logs are overrun with junk. Support seems to think that this is working as designed. Any thoughts?
We’re logging that your user is downloading a file that is in our Known Benign list. If you look at the report it shows the first time we actually saw it and did the analysis.
Let’s consider the cloud effect. When something is bad, it’s pushed to all ZENs: CSB goodness for all. When something is good, it’s held on the CSB cluster, and the ZEN has to pull in the result (which it will then hold in memory). Logistically, it’s not feasible to cache every benign MD5; the RAM required would be impossible by today’s measures. So, when an individual ZEN first sees the file, it’s logged as “Sent for Analysis” because the ZEN does not have the MD5 in its benign/whitelisted table. It will quickly get a known-good result from CSB, meaning the next transaction will be Sandbox-Benign and allowed. Repeat for every ZEN, and so on.
With regard to the .dll file that is hitting the Sandbox over and over again, I would like to explain how the Sandbox flow works. When an unknown file arrives, it is scanned by our signature-based engines and our Advanced Threat Protection (ATP) engines and, if still unknown, arrives at the Sandbox. The Sandbox policy is then evaluated, and the file is scanned if such a policy exists. This is the “Sent for Analysis” entry. If the file is malicious, it is blocked and added to our known-bad list. This information is propagated within minutes to all Advanced Cloud Sandbox customers. This is what we refer to as the Cloud Effect. If the file is found to be benign, that information stays on the Sandbox cluster. Known-good MD5 hashes are not stored on our processing nodes, because the known-good MD5 list is massive and would contribute to memory exhaustion on those nodes. When the file is subsequently downloaded, it is once again run through the signature engines and ATP engines and will hit the Sandbox. The processing node asks the Sandbox whether it knows about this file. Since the Sandbox cluster stores the known-good list, it immediately returns “Sandbox - Benign” and the file is not scanned again. So while you are seeing this entry in your logs, the file is not actually scanned more than once.
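The flow described above can be sketched as a small simulation. This is only an illustrative model, not the actual implementation: the class names `SandboxCluster` and `ProcessingNode`, the `is_malicious` flag, and the "Blocked - Known Bad" label are assumptions made for the example; only the "Sent for Analysis" and "Sandbox - Benign" labels come from the explanation above.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxCluster:
    """Hypothetical stand-in for the Sandbox cluster, which keeps all verdicts."""
    verdicts: dict = field(default_factory=dict)  # md5 -> "benign" or "malicious"
    scans: int = 0  # how many real analyses were performed

    def analyze(self, md5, is_malicious):
        # A real analysis happens only for files the cluster has never seen.
        self.scans += 1
        verdict = "malicious" if is_malicious else "benign"
        self.verdicts[md5] = verdict
        return verdict


@dataclass
class ProcessingNode:
    """Hypothetical processing node: holds known-bad hashes, not known-good ones."""
    cluster: SandboxCluster
    known_bad: set  # shared set, standing in for propagation to all nodes

    def handle_download(self, md5, is_malicious=False):
        if md5 in self.known_bad:
            return "Blocked - Known Bad"  # assumed label, for illustration only
        if self.cluster.verdicts.get(md5) == "benign":
            # The cluster already knows the file: logged, but not scanned again.
            return "Sandbox - Benign"
        if self.cluster.analyze(md5, is_malicious) == "malicious":
            self.known_bad.add(md5)
        return "Sent for Analysis"


cluster = SandboxCluster()
shared_known_bad = set()
node_a = ProcessingNode(cluster, shared_known_bad)
node_b = ProcessingNode(cluster, shared_known_bad)

# The same DLL is downloaded three times through two different nodes.
for node in (node_a, node_b, node_a):
    print(node.handle_download("d41d8cd98f00b204e9800998ecf8427e"))
print("real scans:", cluster.scans)
```

In this model every download produces a log entry, but the scan counter only increases on the first one, which mirrors the point that repeated entries do not mean repeated scans.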
With regard to the log entries, there are two options that I can see:
1. Ignore the duplicate log entries. They will show up in the logs, but the files are not actually being scanned and are benign. Since they do technically hit the Sandbox, the log entries will still appear.
2. Create the appropriate Custom URL category and Sandbox rules to allow and not scan files from that Custom URL category. While this is possible, it is not recommended: if a malicious file arrives from the URL in question, it would never be scanned by the Sandbox.
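If the duplicate entries are left in place, one way to make an exported log easier to read is to collapse them per MD5 outside the product. The sketch below assumes a hypothetical export where each entry is a dict with `md5`, `url`, `status`, and an ISO 8601 `timestamp` string; these field names are illustrative and not taken from any actual log schema.

```python
def summarize_sandbox_log(entries):
    """Collapse entries that share an MD5 into one row with a hit count."""
    summary = {}
    for entry in entries:
        row = summary.setdefault(entry["md5"], {
            "md5": entry["md5"],
            "url": entry["url"],
            "first_seen": entry["timestamp"],
            "hits": 0,
            "statuses": set(),
        })
        row["hits"] += 1
        # ISO 8601 strings in the same timezone sort chronologically.
        row["first_seen"] = min(row["first_seen"], entry["timestamp"])
        row["statuses"].add(entry["status"])
    return list(summary.values())


# Made-up sample data, for illustration only.
sample = [
    {"md5": "aaa", "url": "http://example.com/app.dll",
     "status": "Sent for Analysis", "timestamp": "2024-01-01T09:00:00"},
    {"md5": "aaa", "url": "http://example.com/app.dll",
     "status": "Sandbox - Benign", "timestamp": "2024-01-01T09:05:00"},
    {"md5": "bbb", "url": "http://example.com/other.exe",
     "status": "Sent for Analysis", "timestamp": "2024-01-01T10:00:00"},
]

for row in summarize_sandbox_log(sample):
    print(row["md5"], row["hits"], sorted(row["statuses"]))
```

This leaves the Sandbox policy untouched, so files from that URL are still checked, while the report shows one row per unique file.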
We are working on some future enhancements to our Sandbox technology, which will utilize technology from our Advanced Machine Learning and AI group, but there is no committed timeline for when those enhancements will be rolled out.