We are exploring how we might permit read-only access to github.com and github.io for our end users. The application control currently only has options to block or allow, so that doesn’t help (although we have asked our TAM to open an enhancement request for that control to include a read-only option in future). We are decrypting the traffic to these sites, so we are now testing to identify keywords/strings in the URLs that would let us block certain functions on the site, such as upload or post (to prevent data leakage). We have had success doing this with YouTube and some other streaming sites. Has anyone else already explored this with GitHub, and do you have any keywords/strings that you could share for blocking functionality on the site? Also, what alternatives to GitHub should we consider making read-only to prevent data leakage? Thank you!
Welcome to Community Tristan (@mahantr)!
This sounds like a use case for our browser-isolation technology that will come into the platform in the future. While there is no official release date yet and safe harbour terms apply, our engineers are working with the technology via Zscaler’s acquisition of Appsulate to enable browser-isolation within Zscaler.
Additional information can be found here: https://www.zscaler.com/products/browser-isolation
Did you explore the DLP feature? We have dictionaries to detect source code, so you can prevent people from uploading source code to GitHub or any other website.
I am trying to get more information about what the source code dictionary is actually detecting to understand if it would be useful for us. Can you provide insight?
This is a machine-learning-based dictionary. The algorithm is trained to identify programming languages based on the features extracted from source code samples in our training set. It is actively tested against samples of programming languages such as C, C++, Java, C#, Perl, Python, Ruby, PHP, shell, etc.
Good information - thank you
Nov 4, 2019
Zscaler: How to create policies to manage Github user access to sites with 3 different access levels.
The GitHub Cloud App policy can interfere with read-only access, because Cloud App rules can only allow or block and they take precedence over URL policy. Read-only access needs the base URLs to stay reachable while URLs containing certain keywords are blocked. This is why Read-only users are NOT in the Cloud App policy.
File type controls don’t provide complete coverage of file types (.csv and .txt are examples that are not covered).
File type blocks can also be misleading to a user: for larger files such as .zip, the block page message isn’t shown and some content is still streamed to the user.
If you need to implement file type controls, they are only needed for “All” access users, as everything else should be blocked with the Cloud App and URL policies described below.
Here is a way to manage GitHub user access without dependency on file types.
Block all access to Github for all users (Generic block policy for ALL):
— Implement a URL policy to block these sites for all users (a URL category block policy of this type may already exist, in which case the sites can simply be added to that existing URL category):
Read-only access to GitHub for a specific set of AD groups/Users (No uploads or downloads):
a. Create URL category (A) for Upload/Download requirements as follows:
- Keywords: -upload-, /upload, /upload/, /upload/master (These keywords are required because the word “upload” appears in different places within the URL strings associated with GitHub access)
- URLs: codeload.github.com, github.com/upload
b. Create a URL Block policy for Read-Only AD groups/users only for this URL category (A)
c. Create URL category (B) for keyword = github
d. Create a URL Allow policy for URL category (B) for any AD groups/users which need ANY access to GitHub (i.e. Read-only and “All” access to GitHub)
e. NOTE: URL Block policy needs to be ordered above the URL Allow policy
“All” access to Github:
— Create “Cloud App” policy for GitHub to Allow ONLY for the AD groups/Users which require ALL access to GitHub
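As a rough way to reason about the rule ordering above, here is a minimal Python sketch that models the URL policies as a first-match rule list. It is not Zscaler's API or its actual evaluation engine: the group names, the `decide` function, and the plain substring matching are assumptions for illustration. The generic block is modelled as matching the keyword "github", which is also an assumption, since the list of blocked sites is not given here. The Cloud App policy is left out of the model.

```python
# Hypothetical model of the URL policies described above; not Zscaler's engine.
# Category (A): upload/download keywords and URLs. Category (B): keyword "github".
CATEGORY_A_KEYWORDS = ["-upload-", "/upload", "/upload/", "/upload/master"]
CATEGORY_A_URLS = ["codeload.github.com", "github.com/upload"]
CATEGORY_B_KEYWORDS = ["github"]


def in_category(url, keywords, urls=()):
    """Substring match, an assumed stand-in for keyword/URL category matching."""
    url = url.lower()
    return any(k in url for k in keywords) or any(u in url for u in urls)


def is_upload_download(url):
    return in_category(url, CATEGORY_A_KEYWORDS, CATEGORY_A_URLS)


def is_github(url):
    return in_category(url, CATEGORY_B_KEYWORDS)


# Rules are checked top-down; the first rule matching both group and URL wins.
# Group names are made up for this example.
RULES = [
    # 2.b: block category (A) for Read-only users, ordered above the allow rule.
    ("block", {"read-only"}, is_upload_download),
    # 2.d: allow category (B) for anyone who needs ANY access to GitHub.
    ("allow", {"read-only", "all-access"}, is_github),
    # 1: generic block for everyone (assumed here to match the same keyword).
    ("block", {"read-only", "all-access", "everyone-else"}, is_github),
]


def decide(group, url):
    for action, groups, matches in RULES:
        if group in groups and matches(url):
            return action
    return "no-match"  # would fall through to whatever other policies exist


print(decide("read-only", "github.com/username/repositoryname"))
print(decide("read-only", "github.com/username/repositoryname/upload/master"))
print(decide("everyone-else", "github.com/username/repositoryname"))
```

Swapping the first two rules in this model would let Read-only users reach the upload URLs, which is why the block policy has to sit above the allow policy.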
CAVEATS: For Read-only GitHub users, the keywords may also block non-GitHub URLs that happen to contain those words. This only affects these users, since the policy is defined for them only, and URL policies ordered ahead of this one would handle most other scenarios. My customer was willing to manage those instances case by case, as the exposure appears to be minimal.
- Within the upload processing on GitHub, there are .aws.com URLs that are accessed. Within those URLs, the keyword “-upload-” appears to always be present. Most customers will not block .aws.com URLs, so we use this keyword in the block instead: with “-” at the front and back of “upload”, it is distinctive and unlikely to appear in other URL strings.
- For download processing on GitHub, the repository URLs always redirect to a codeload.github.com URL, which is easier to block.
- Uploads are the biggest challenge in testing, as the GitHub URL strings contain the GitHub username and the repository name, which are dynamic and not predictable. Using a “*” wildcard for the URL is also not possible, as the structure is github.com/username/repositoryname/upload/master and it appears that we can’t use two wildcards back-to-back. The keywords in 2.a. above are all the strings I found in testing.
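To illustrate why keywords are used instead of wildcards, and where the false-positive caveat comes from, here is a small Python sketch that checks the keywords from 2.a. against a few URL shapes. The substring matching is an assumption about how keyword matching behaves, and `files.example.com` and the second username/repository pair are made-up examples.

```python
# Keywords from 2.a.; matching by plain substring is an assumption.
BLOCK_KEYWORDS = ["-upload-", "/upload", "/upload/", "/upload/master"]


def matched_keywords(url):
    """Return the block keywords that appear anywhere in the URL."""
    url = url.lower()
    return [k for k in BLOCK_KEYWORDS if k in url]


samples = [
    # The username and repository segments vary, but the keyword still matches.
    "github.com/username/repositoryname/upload/master",
    "github.com/otheruser/otherrepo/upload/master",
    # Plain browsing of a repository is not matched.
    "github.com/username/repositoryname",
    # Hypothetical non-GitHub URL that gets caught as well (the caveat above).
    "files.example.com/upload/report",
]

for url in samples:
    print(url, "->", matched_keywords(url) or "no match")
```

Because the keywords match regardless of the username and repository segments, no wildcard is needed for those parts of the path. The cost is that any other URL containing the same strings is caught for Read-only users too.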