I am testing out the various dictionaries and I’m finding that both “Self-Harm and Cyberbullying” and “Source Code” seem to be very noisy dictionaries… any suggestions on possible tuning of these?
The dictionaries alone can be noisy, depending on the threshold you set. The pre-defined dictionaries are just collections of words, and with any type of DLP strategy you will want to add some context rather than matching on self-harm or source-code terms alone.
For example, you can combine the self-harm & cyberbullying dictionary with another dictionary that contains key words such as the CEO’s name, the company name, the school’s name, teachers’ names, etc.
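To illustrate the idea of combining dictionaries, here is a minimal Python sketch. It is not how any particular DLP product implements matching. The word lists, the function names (count_hits, should_flag) and the threshold value are all hypothetical and only show the logic: flag content when the pre-defined dictionary reaches its threshold AND at least one organization-specific context term appears.

```python
import re

# Hypothetical word lists for illustration only; a real DLP product
# ships its own pre-defined dictionaries.
SELF_HARM_TERMS = {"hurt myself", "hate myself", "loser"}
# Hypothetical organization-specific context dictionary.
CONTEXT_TERMS = {"acme academy", "principal jones"}


def count_hits(text, terms):
    """Count case-insensitive, whole-phrase occurrences of any term in text."""
    total = 0
    for term in terms:
        # Lookarounds keep a term from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def should_flag(text, threshold=2):
    """Flag only if the main dictionary meets the threshold AND context is present."""
    main_hits = count_hits(text, SELF_HARM_TERMS)
    context_hits = count_hits(text, CONTEXT_TERMS)
    return main_hits >= threshold and context_hits >= 1
```

In this sketch, raising the threshold or tightening the context dictionary both reduce noise, at the cost of possibly missing some real matches.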
There will be some fine-tuning required and it will vary for each organization.