Automated way of Cleaning Known Malicious Sites

I am using the API to pull down the sites in our Known Malicious Sites list and checking them (100 at a time, which is painful) against the URL Category check. Has anyone in the community put together a script that will automatically run my list of URLs against the URL Category check and dump the ones Zscaler has already marked as known malicious, so that I can remove them from my list and free up some of the custom URL entries for future use?
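For context, what I am doing by hand is essentially the ZIA cloud service API's URL lookup, batched 100 URLs at a time. A scripted version would look roughly like the sketch below. It is untested and makes a few assumptions: that you already have an authenticated API session (the JSESSIONID cookie returned by /authenticatedSession), that the lookup endpoint is POST /urlLookup on your cloud's API host, and that the response entries carry urlClassifications / urlClassificationsWithSecurityAlert fields; please verify all of these, plus the batch and rate limits, against the API reference for your cloud.

#!/usr/bin/env python3
"""Bulk-classify custom URLs via the ZIA URL lookup API (illustrative sketch)."""

import sys
import requests

API_BASE = "https://zsapi.zscaler.net/api/v1"    # adjust to your cloud
SESSION_COOKIE = "JSESSIONID=<your-session-id>"  # obtained via /authenticatedSession


def lookup(batch):
    """Look up one batch of URLs (the API caps the number of URLs per call)."""
    r = requests.post(
        API_BASE + "/urlLookup",
        json=batch,
        headers={"Content-Type": "application/json", "Cookie": SESSION_COOKIE},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()


def main(path):
    with open(path) as fp:
        # one URL per line; the lookup may expect them without the scheme, so check
        urls = [line.strip() for line in fp if line.strip()]

    already_flagged = []
    for i in range(0, len(urls), 100):            # 100 URLs at a time
        for entry in lookup(urls[i:i + 100]):
            # anything Zscaler already tags with a security classification
            # is a candidate for removal from the custom list
            if entry.get("urlClassificationsWithSecurityAlert"):
                already_flagged.append(entry.get("url"))

    print("\n".join(already_flagged))


if __name__ == "__main__":
    main(sys.argv[1])

The dump it prints is the set of URLs Zscaler already flags, which is what I would then remove from the custom category.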

Hello

I have a script. Can you send me a sample list and I'll test it for you?
My email is yogi@zscaler.com.

Hello. I am interested in this script as well. Can you share?

Hello

Here is the methodology.
You need a client whose traffic goes through the Zscaler platform.
Right now I have coded it for transparent mode, not explicit proxy.
Basically, you run it on a client with ZApp installed in tunnel mode.

test-url-filter copy.txt (1.4 KB)
Rename the attachment to .py

and run: python test-url-filter.py -f list.txt

ychandiramani@MacBook-Pro URL Filtering % python test-url-filter.py -f list.txt
URL http://2016.eicar.org/86-0-Intended-use.html/ is blocked by policy
URL http://www.lci.fr/ is not blocked by policy
URL http://www.yogi.com/ is blocked by policy
where list.txt is a list of URLs to be tested, one full URL per line (scheme included)
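For example, list.txt might contain:

http://2016.eicar.org/86-0-Intended-use.html/
http://www.lci.fr/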


Thanks @yogic. I've pasted the code below too (it may be handy to have it indexed by search engines).

#!/usr/bin/env python3
#

"""Check whether each URL in a file is already blocked by policy."""

__author__ = 'yogi@zscaler.com (Yogi Chandiramani)'

import getopt
import sys

import requests
from requests.exceptions import ConnectionError
from termcolor import colored

# Parse arguments: -f <file> is the list of URLs, -d is reserved for detailed output
try:
    myopts, args = getopt.getopt(sys.argv[1:], "f:d")
except getopt.GetoptError as e:
    print(str(e))
    print("Usage: %s -f file" % sys.argv[0])
    sys.exit(2)

filename = None
detailed = 0
for o, a in myopts:
    if o == '-f':
        filename = a
    if o == '-d':
        detailed = 1

if not filename:
    print("Usage: %s -f file" % sys.argv[0])
    sys.exit(2)

# Browser-like User-Agent so the request looks like normal client traffic
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/63.0.3239.84 Safari/537.36'}

# Read the file containing all URLs, one per line, and request each one.
# Traffic goes through Zscaler (transparent mode / ZApp tunnel), so a 403
# response means the URL is blocked by policy.
with open(filename) as fp:
    for url in fp:
        url = url.strip()
        if not url:
            continue
        try:
            s = requests.get(url, headers=headers, allow_redirects=False, timeout=2)
        except ConnectionError:
            print(colored("Oops!  URL %s could not be contacted..." % url, 'yellow'))
        except requests.exceptions.Timeout:
            print(colored("Oops!  URL %s could not be contacted (timeout)..." % url, 'yellow'))
        else:
            if s.status_code == 403:
                print(colored("URL %s is blocked by policy" % url, 'red'))
            else:
                print("URL %s is not blocked by policy" % url)
