Updated: Nov 21, 2019
Note - SGOS 184.108.40.206 has been released, resolving the issues discussed in this blog.
Recently a Cyber Vigilance customer encountered a memory handling bug in their Advanced Secure Gateway (ASG) deployment. The bug causes the memory pressure of the Content Analysis engine on the ASG to be reported as high (80% and above). Symantec have told me that the bug is caused by a process incorrectly reporting the current memory utilisation in the Content Analysis engine, and that it appears to affect all versions of SGOS 6.7.4.x to 220.127.116.11. With memory utilisation already reported at 80% or more, only 20% of memory ‘remains’ for handling user traffic, which unfortunately for our customer was not enough. Once memory reached 100%, the device started to freeze and required a cold power cycle to bring it back to life.
A hint that things are about to go south on an ASG can sometimes be seen when the ProxySG module on the ASG starts entering the warning state due to the “cas.bluecoat-local-request” and/or “cas.bluecoat-local-response” health checks failing.
Taking a look at the Content Analysis “CAS” log will also show lines stating, “no slave process available”.
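If you have exported the CAS log to a text file, a quick way to see how often this message appears is to count the matching lines. The following is a minimal Python sketch only: it assumes the log has been saved as plain text, the file name in the usage comment is hypothetical, and the function name is my own rather than anything provided by Symantec.

```python
from typing import Iterable

# The message quoted above, as it appears in the Content Analysis log.
NEEDLE = "no slave process available"


def count_slave_errors(lines: Iterable[str], needle: str = NEEDLE) -> int:
    """Count the lines that contain the needle, ignoring case."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


# Example usage (hypothetical file name, assumes a plain-text export):
# with open("cas.log", encoding="utf-8", errors="replace") as fh:
#     print(count_slave_errors(fh))
```

A count that keeps growing between exports is a reasonable prompt to check the memory pressure figures described further down.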
For most customers, rising memory pressure will not be obvious, because the ASG offers no health monitoring or alerting to track the memory pressure of the Content Analysis engine. The CPU and memory statistics typically present in a standalone Content Analysis device's management console are removed from the ASG equivalent.
Finding the current memory pressure of the Content Analysis engine on an ASG requires looking inside a Sysinfo. The Sysinfo contains a section of PDM statistics, which capture key figures at fixed intervals. Searching for the string “ASG:host:memory:usage~daily15minute” will show the section of interest. Here, the daily 15-minute statistics show the current and historical memory pressure as close to real time as possible.
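For anyone who would rather script the search than scroll through a Sysinfo by hand, here is a minimal Python sketch. It assumes the Sysinfo has been saved as a plain-text file, and it makes no assumption about how the values inside the section are laid out: it simply returns each matching line with a few lines of context after it. The function name, the `context` parameter and the file name in the usage comment are all hypothetical.

```python
from typing import Iterable, List

# The PDM statistic name quoted above.
PDM_KEY = "ASG:host:memory:usage~daily15minute"


def find_section(
    lines: Iterable[str], key: str = PDM_KEY, context: int = 5
) -> List[List[str]]:
    """Return every line containing `key`, each followed by up to
    `context` subsequent lines, with trailing newlines stripped."""
    buffered = list(lines)
    hits = []
    for i, line in enumerate(buffered):
        if key in line:
            block = buffered[i : i + context + 1]
            hits.append([entry.rstrip("\n") for entry in block])
    return hits


# Example usage (hypothetical file name, assumes a plain-text Sysinfo):
# with open("sysinfo.txt", encoding="utf-8", errors="replace") as fh:
#     for block in find_section(fh, context=10):
#         print("\n".join(block))
```

Increase `context` if the statistics run over more lines than the default captures.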
I decided to take a look at the “ASG:host:memory:usage~daily15minute” statistics on a factory-reset ASG device running SGOS 18.104.22.168 and found that, even when the device has no traffic and no Content Analysis functionality enabled, the memory pressure sits at ~80% right from boot.
At the time of writing, Symantec have confirmed this behaviour as a bug, and that the device freezing is caused by a separate bug in which the Cylance engine consumes too much memory. Currently the fix for the memory pressure bug is to downgrade the ASG to 22.214.171.124, where it has been confirmed that this bug does not appear to exist. Symantec have said that the bug is being addressed in the upcoming patch release 126.96.36.199, which is scheduled for release in mid-October 2019.
The following shows the memory utilisation after downgrading.
If you are experiencing similar issues, it is advisable to raise a ticket with Symantec to get some advice on the best course of action.
Note - All information accurate at time of writing and derived from conversations with Symantec support. Images taken from real world production devices. I will keep this article updated with any fresh news/corrections required.