Table of Contents:
- Introduction
- Squid - Optimising Web Access
- Starting Squid
- Localised settings in OpenBSD package
- Example Configuration
- Extending The Sample Configuration
- Managing the Log Files
- Other Miscellaneous Issues
- Authentication - the MSNT module
- Transparent Proxy - Not having to manually update all clients
- FTP Proxy
- SOCKS 5 Proxy
- Cache Utilisation Analysis Tools
- Author and Copyright
Introduction
A caching proxy offers at least three benefits. The two immediately obvious ones are bandwidth optimisation (caching minimises unnecessary traffic) and control over which resources can be requested from outside (the proxy). The third, oft unexplored, benefit of a caching proxy server such as squid is the records, or logs, that it maintains. These allow the administrator to further 'fine-tune' the performance of the system and to identify communications from within the environment to the external world.
Squid - Optimising Web Access
[package: squid-2.3.tgz]
[ref: Squid, A User's Guide, by Oskar Pearson]
[ref: squid faq]
The Squid 2.3 package is available on the OpenBSD 2.8 CD, and later versions may be available on the NET. To install the package, use the pkg_add program as in the example below.
| # pkg_add /[path-to-package]/squid-2.3.tgz |
Once the package is installed you will be prompted for a number of further activities to refine your installation. The following are part of that installation refinement.
(1) Configure the cache swap directory by using squid -z. This process will take a bit of time.
| # /usr/local/bin/squid -z |
| [ ... program displays ... ] |
| YYYY/MM/DD HH:MM:SS| Creating Swap Directories |
Starting Squid
You can start squid manually by typing /usr/local/bin/squid, which starts the squid parent process and leaves it waiting for connections. To have OpenBSD start squid automatically at every system start-up, edit rc.conf.local to set the configuration flag and rc.local to act on that flag.
Edit the file /etc/rc.conf.local to include the following line:
squid=YES
Edit the file: /etc/rc.local
After the 'starting local daemons' line and before the following echo '.', insert the following instructions into the /etc/rc.local file:
| echo -n 'starting local daemons:' |
| if [ -f /etc/squid/squid.conf ]; then |
| # [ ... stuff left out ... ] |
Now each restart of the machine will automatically check whether we have enabled squid in the configuration file (rc.conf.local) and, if so, start the squid daemon. If we wish to disable the squid auto-start, we can simply change squid=YES to squid=NO.
Localised settings in OpenBSD package
| - configuration files | /etc/squid |
| - sample configuration files | /usr/local/share/examples/squid/conf |
| - error message files | /usr/local/share/squid/errors |
| - sample error message | /usr/local/share/examples/squid/errors |
| - icons | /usr/local/share/squid/icons |
| - sample icons | /usr/local/share/examples/squid/icons |
| - cache | /var/squid/cache |
| - logs | /var/squid/logs |
| - uid:gid squid runs as is | www:www |
Example Configuration
Scenario:
A private school I work with has just received a DSL connection to the local ISP. Before releasing the Internet connection, the administrators have requirements (policies) within the school that they wish to be implemented as part of the connection.
The computer department has come to the realisation that a Block by Default approach is not conducive to optimal educational use of the Internet, but there is still a need to police and monitor the school's policies.
The chosen solution is two-fold. (1) Physical supervision of Internet access computers is mandatory and must be combined with user education and training. (2) Software blocking will be both informative and as comprehensive as possible.
Software monitoring and restriction is where squid plays a significant role. Squid's Access Control Lists (ACLs) provide a very flexible environment for supporting organisational policies.
Details:
School Policies: The school has standards about certain types of material it does not want students to access through the Internet (specifically pornography). As a consequence of that requirement, the school also does not want students using 'chat' environments or public web hosted email services (e.g. hotmail).
Network Policies: The DSL connection is 64K but the ISP has a very poor connection to the backbone (remember we're calling from Tonga) so there is a significant concern about bandwidth utilisation. The less unnecessary stuff going up and down the 'pipe' the better for us.
As a consequence of the bandwidth problem, and the need to keep the students focussed on academically oriented pursuits, the network administrators want to ban a number of entertainment sites, primarily to minimise bandwidth use and secondarily to keep students off time wasters.
Advertisers are problematic bandwidth consumers, so these will also be blocked where possible.
Network Configuration:
The school operates 3 subnets with differing authorisation levels. Through some magic, we would like to provide special access privileges for system administrators:
| Segment | Purpose |
| 2 class-rooms | controlled, timed access with potential limits to 'net access during class times. |
| 1 pub access | Public Access for school community. This will include machines available to school administrators and general staff for accessing the network and 'NET (acl: subnet_pub). |
| 1 admin | administrator with freer access to the 'NET, probably need to be password authenticated. |
Authentication is the simplest solution for providing system administrators with greater access to the Internet. To simplify this example, I will discuss authentication in the more detailed revision of this example.
The 7 stages we will cover to get our squid configuration working are:-
- Specifying the port we want squid to listen on
- Specifying which network IPs we will support in squid
- Specifying Time intervals we will support
- Specifying Organisational Policies (Restricted Sites)
- Specifying Informative Messages relevant to Organisational Policies
- Configuring Access to the Cache
- Let's Go
Specifying the Port to Listen On
Edit the file: /etc/squid/squid.conf
Now the scenario is out of the way, let's get down to configuring our squid cache/proxy.
Control of external access to the local LAN should be managed by the firewall.
To be safer (or am I just pedantic?) I set the restriction below on where the squid server is listening.
| # http_port 3128 |
Normally squid starts up and listens to 3128 on all network devices. The above just ensures that it is listening on port 3128 only for the internal network. Our firewall can further block port 3128 requests from coming through from the outside (but our ACLs should be handling any further problems.)
Specifying which network IPs we will support in squid
Next I set up my Access Control Lists (ACLs) defining the range of machines I have on the Internal Network.
| # Networks allowed to use this Cache |
| acl subnet_lab1 src ip-address_lab1/netmask |
| acl subnet_pub src ip-address_pub/netmask |
| acl all src 0.0.0.0/0.0.0.0 |
| acl dst_all dst 0.0.0.0/0.0.0.0 |
I choose to list the subnets separately (all non-routeable IPs) as we have some policies for Internet access that can be managed using the subnet information. The acl "all" and "dst_all" refer to any communications with all available internet IP addresses. The "all" refers to "source" or 'client' ip address wanting to use the cache. The "dst_all" refers to "destination" or URL host being requested.
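Before committing address/netmask pairs to squid.conf it can help to check offline which ACL a given client address would fall into. The following is only a minimal Python sketch of that matching, not anything squid itself runs. The two subnets are made-up placeholders (not the school's real ranges), and "all" is written as 0.0.0.0/0, which covers the same range as 0.0.0.0/0.0.0.0.

```python
import ipaddress

# Hypothetical subnets for illustration; substitute your own address/netmask.
ACLS = {
    "subnet_lab1": ipaddress.ip_network("192.168.1.0/255.255.255.0"),
    "subnet_pub": ipaddress.ip_network("192.168.3.0/255.255.255.0"),
    "all": ipaddress.ip_network("0.0.0.0/0"),
}


def matching_acls(client_ip):
    """Return the names of the src ACLs that a client address falls into."""
    addr = ipaddress.ip_address(client_ip)
    return sorted(name for name, net in ACLS.items() if addr in net)


if __name__ == "__main__":
    for ip in ("192.168.1.20", "192.168.3.7", "203.0.113.9"):
        print(ip, matching_acls(ip))
```

Every address matches "all", which is why "all" is only useful as a catch-all at the end of the access rules.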
Specifying Time intervals we will support
Related to the subnet information are certain time periods during which we want to disable specific subnets, so I have to set up ACLs for those as well.
| # After Hours Settings |
Our sample Network Policy will provide different service levels dependent on the time of day (e.g. allow access after hours to different services blocked during business hours.)
Squid TIME acls cannot wrap from one day to the next, so to get from 4:30 in the afternoon until 8:00 the next morning, we have to actually specify one acl for 4:30 to midnight and another acl for midnight to 8 in the morning.
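The midnight split can be worked out mechanically. Below is a small Python sketch that takes a window as zero-padded HH:MM strings and prints the acl lines needed. The base name TIMEafterhours and the EVE/MORN suffixes are assumptions for illustration, and the sketch leaves out the optional day-of-week letters.

```python
def split_time_window(name, start, end):
    """Return squid 'acl ... time' lines for a window given as HH:MM strings.

    A window that wraps past midnight is split into an evening ACL and a
    morning ACL, because each time ACL must start and end on the same day.
    """
    if start < end:  # zero-padded HH:MM strings compare correctly as text
        return ["acl %s time %s-%s" % (name, start, end)]
    return [
        "acl %sEVE time %s-23:59" % (name, start),
        "acl %sMORN time 00:00-%s" % (name, end),
    ]


if __name__ == "__main__":
    for line in split_time_window("TIMEafterhours", "16:30", "08:00"):
        print(line)
```

Both resulting ACL names then have to be listed in the access rules, one line each, since ACLs on the same http_access line are ANDed and no request can be in both halves at once.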
Specifying Organisational Policies (Restricted Sites)
A number of organisational policies require that we restrict use of the Internet and for that we have collected a list of urls and domains from the Internet. We are storing these urls in text files related to the categorisation we have chosen (eg. entertainment, porn, etc.)
| # Regular Expression Review of URLs, and Destination Domains |
| # The first list are sites known to be wrongly blocked by the later list |
| # The following are the sites restricted by organisational policy |
| acl block_advertisers url_regex -i "/etc/squid/block_advertisers.txt" |
We create ACLs for each category, and we store the text files in the /etc/squid directory. Each text file lists, on separate lines, the words or phrases we wish to block access to (such as domain addresses).
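A block list can be tried out against a few URLs before squid is reloaded. The Python sketch below imitates a url_regex -i list: one pattern per line, matched case-insensitively anywhere in the URL. The patterns and URLs are invented for illustration, Python's re module stands in for squid's regex library, and skipping blank lines is a choice made by this sketch.

```python
import re

# Hypothetical contents of a block list file, one pattern per line.
SAMPLE_LINES = [
    "adserver",
    r"ads\.example\.com",
    "/banners/",
    "",  # blank lines are skipped by this sketch
]


def load_patterns(lines):
    """Compile each non-empty line as a case-insensitive regular expression."""
    return [re.compile(line.strip(), re.IGNORECASE) for line in lines if line.strip()]


def is_blocked(url, patterns):
    """True if any pattern matches anywhere in the full URL."""
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    patterns = load_patterns(SAMPLE_LINES)
    for url in ("http://ADSERVER.example.net/x.gif", "http://www.example.org/index.html"):
        print(url, is_blocked(url, patterns))
```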
Specifying Informative Messages relevant to Organisational Policies
Location: /usr/local/share/squid/errors
| # TAG: deny_info |
We have created customised error messages for the different areas in which our organisational policy restricts access. The error messages are text files using the naming convention used by the squid error messages. We store the files in /usr/local/share/squid/errors (standard configuration in the squid-2.3 OpenBSD port).
Note: to beautify our error messages (i.e. add graphics & style sheet) we have created an alias directory in our Apache website to store these extra files. Squid will throw the custom messages at the user's browser, but all other access has to come from the local website.
Configuring Access to the Cache
The final major thing is to set up our rules for accessing the cache.
| # TAG: http_access |
The standard format, as shown above, is http_access followed by either allow or deny and then a list of your aclnames (with an optional ! at the beginning to negate the aclname). Note that aclnames on one line are "ANDed" together.
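Since the order of http_access lines matters, it can be useful to reason about a rule list on paper before loading it. The Python sketch below models the evaluation in a simplified way: lines are checked top to bottom, all aclnames on a line must be true, and the first matching line decides. If nothing matches, squid applies the opposite of the last line, which the sketch copies. The rule list reuses acl names from this example, but it is an illustrative arrangement, not our actual configuration.

```python
def acl_true(name, matched_acls):
    """Evaluate one aclname, honouring a leading '!' as negation."""
    if name.startswith("!"):
        return name[1:] not in matched_acls
    return name in matched_acls


def check_access(rules, matched_acls):
    """Return 'allow' or 'deny' for a request.

    rules        -- list of (action, [aclnames]) in squid.conf order
    matched_acls -- set of ACL names that are true for this request
    """
    for action, names in rules:
        if all(acl_true(n, matched_acls) for n in names):
            return action  # first matching line wins
    # No line matched: fall back to the opposite of the last line.
    return "deny" if rules[-1][0] == "allow" else "allow"


# Illustrative ordering: block first, then allow the lab, then deny the rest.
RULES = [
    ("deny", ["block_advertisers"]),
    ("allow", ["subnet_lab1"]),
    ("deny", ["all"]),
]

if __name__ == "__main__":
    print(check_access(RULES, {"all", "subnet_lab1"}))
    print(check_access(RULES, {"all", "subnet_lab1", "block_advertisers"}))
    print(check_access(RULES, {"all"}))
```

Putting the deny lines for blocked sites first is what stops a later allow line for the subnet from letting those requests through.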
There are a number of standard security configurations already in squid.conf. I've left them standing and added the things specific to our scenario.
Restricting Access to External Sites - relevant to organisational policies
| # |
| # http ACCESS PRIVILEGES |
| # --> URLs to Unblock |
| # --> Domains & URLS to block |
Our first action is to block those sites which are restricted by our organisational policies.
Allowing Specified networks access to the cache
Specifying access to cache from LAN machines
| # --> Subnet Access to the NET |
In this example, we allow the local subnets to use the cache, so long as they are authenticated (again, if you are not using authentication then just remove the "authenticated" acl.)
Restricting Internal Access - relevant to organisational policies
Because we are not ready for prime-time, we denied Internet access to the public access machines. First, they are two buildings away and we cannot supervise them at the moment, and second, we haven't gone through our education program for staff use.
| # --> Subnet Access to the NET |
| # During initial phase, keep subnet_pub off the air |
Because of the same problems of supervising the public access terminals, we have included time based limiting. Once we are certain our system is better configured for public access, we can enable access from the public terminals within specified hours.
Ignoring the cache when requesting from Local Area Network
Next, we tell squid to not cache requests for the internal Local Area Network sites.
| # always go direct to LAN sites |
Our local website doesn't need to be cached. Some of my friends think they get better performance (even for internal clients) by caching the local web server. Parts of our sites are static pages (straight html, images, and pdfs) but our new section is based on PHP so we will just avoid any further complications with our cache by not caching it.
Let's Go.
The final part is to specifically state that we want to be able to access the rest of the world, and we want to specifically deny access to the cache from anyone we have not specifically allowed access.
| # And finally deny all other access to this proxy |
| http_access allow dst_all |
Extending the Sample Configuration
This section extends the previous example with more specifics, partly as an aid to anyone wishing for further examples, but primarily to document our network.
The portions of the example we will extend and add to are:
- Authenticating Users
- Specifying Organisational Policies (Restricted Sites)
- Specifying Informative Messages relevant to Organisational Policies
- Configuring Access to the Cache
- Let's Go
Authenticating Users
To maximise the potential for user conformance, while providing a more flexible user environment, we have chosen to use User Authentication. The most flexible for our configuration is the MSNT authentication module, which is configured as below. (More details on installing it are listed further below.)
All the clients are authenticated on an MS Windows NT Domain before they can use the network, so our choice was simplified.
After installing and testing the msntauth module, we configure the authentication by including the following directives in the /etc/squid/squid.conf file.
Edit the file /etc/squid/squid.conf:
| authenticate_program /usr/local/bin/msntauth |
| # authenticate_ip_ttl_is_strict on |
We specify the Authentication program and some important parameters.
In our environment we will let the authentication remain active for 15 minutes after the last authentication (900 seconds). To annoy people who wish to share their passwords (it should be more restrictive than this) we require authentication of a user to be tied to an IP address. If within 60 seconds the same user name makes requests through the cache from two IP addresses, both will be denied access and be required to re-authenticate.
If we were really pedantic about password use (which may be relevant in our context) we could force authentication to remain with the originating authenticator until expiry. Specifically this prevents the user using two terminals.
Under our organisational policy we set up authentication so that (a) only those designated for Internet access can reach the external web, and (b) our log files can show access patterns to the Internet by user. Note that this approach may be considered draconian by others. Whether you want to use authentication depends on the type of site you are running and your purpose.
For authentication to be useful, we next have to specify an acl.
| # Authentication |
| acl authenticated proxy_auth REQUIRED |
| acl users_sysadmin proxy_auth AdminID1 AdminID2 |
We want authentication of all users before they access the Internet (for this we will use 'authenticated') and we want to provide special privileges to System Administrators (for this we will use 'users_sysadmin').
The AdminID1, AdminID2 are users on the server that will provide the authentication (in our case on our Windows NT Domain.)
Specifying Organisational Policies (Restricted Sites)
| # Regular Expression Review of URLs, and Destination Domains |
We drastically change our blocking scheme by using three separate methods of analysing a URL before we decide whether it should be allowed or blocked. In our previous example we only used the full URL (url_regex). In this example we use url_regex, which analyses the full URL, dstdom_regex, which analyses only the host (domain) information of the URL, and urlpath_regex, which analyses only the path.
This distinction is very important when we want to use a catch word like "quake" to block access to game sites that host quake tournaments. When we were blocking "quake" anywhere in the URL, students were unable to do research on earthquakes, as our URL based block prevented access.
By using dstdom_regex we can restrict the match to the host (domain) part of the URL, although a plain quake pattern still blocks earthquake.com etc. By further refining the regular expression to \.quake\. or ^quake\. (the dots must be escaped, otherwise each one matches any character) we can block only hosts where quake is a complete label of the domain name (allowing earthquake, deadquake, aquake), or only host names where quake. is at the very beginning, while still allowing quaken etc.
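The effect of each refinement is easy to see by running the three patterns over a handful of host names. This Python sketch does that. The host names are invented, and Python's re module stands in for the regex library squid uses, which should behave the same for patterns this simple.

```python
import re

# Invented host names covering the cases discussed above.
HOSTS = [
    "www.quake.example",
    "quake.example.org",
    "www.earthquake.example",
    "quaken.example.org",
]

# Candidate dstdom_regex patterns, matched case-insensitively (as with -i).
PATTERNS = {
    "plain": re.compile(r"quake", re.IGNORECASE),
    "label": re.compile(r"\.quake\.", re.IGNORECASE),
    "start": re.compile(r"^quake\.", re.IGNORECASE),
}


def blocked_hosts(pattern_name):
    """Return the host names that the named pattern would block."""
    return [h for h in HOSTS if PATTERNS[pattern_name].search(h)]


if __name__ == "__main__":
    for name in PATTERNS:
        print(name, blocked_hosts(name))
```

Note that the "label" and "start" patterns catch different hosts, so a list meant to block quake as a whole host label wants both.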
| acl block_filesURLPATH urlpath_regex -i "/etc/squid/block_filesURLPATH.txt" |
A further improvement in selectivity with the URL is urlpath_regex, which only looks at the "path" portion of the URL. We will use the path-only portion to pick out the file transfers, audio and video that we do not want.
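To illustrate why matching on the path alone helps, the Python sketch below applies a file-extension pattern only to the path part of a URL, so a host name that happens to contain "mp3" is not caught. The pattern is a made-up example of the kind of line that might go in block_filesURLPATH.txt, and the sketch simplifies by ignoring any query string.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a few media and archive extensions at the end of the path.
FILE_PATTERN = re.compile(r"\.(mp3|avi|mpe?g|exe|zip)$", re.IGNORECASE)


def path_is_blocked(url):
    """Apply the pattern to the path part only, ignoring scheme and host."""
    return bool(FILE_PATTERN.search(urlsplit(url).path))


if __name__ == "__main__":
    for url in (
        "http://music.example.com/songs/track.MP3",
        "http://mp3.example.com/index.html",
    ):
        print(url, path_is_blocked(url))
```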
Of course Squid 2.5 (and possibly 2.4) supports acls for mime-types, but I'm trying to get this stuff working 1st.
The next acl we configure is to specify the maximum number of connections we want users to be doing. This is mostly relevant to the power users, who inexplicably consume significant bandwidth by running multiple browsers.
| acl MaxCONNECTIONS maxconn 5 |
Since this is the 1st time we're doing this, we will set a reasonable number initially and then change things along the way.
Note from the FAQ:
| Note, the maxconn ACL type is kind of tricky because it uses less-than comparison. The ACL is a match when the number of established connections is greater than the value you specify. |
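The boundary case is worth spelling out. In this small Python sketch of the comparison described in the FAQ note, the ACL matches only when the count is strictly greater than the limit, so a rule using !MaxCONNECTIONS still applies to a client with exactly 5 connections.

```python
MAX_CONNECTIONS = 5  # the value given in "acl MaxCONNECTIONS maxconn 5"


def maxconn_matches(established):
    """The maxconn ACL is true only when connections exceed the limit."""
    return established > MAX_CONNECTIONS


def passes_not_maxconn(established):
    """A line containing !MaxCONNECTIONS can only match while this is true."""
    return not maxconn_matches(established)


if __name__ == "__main__":
    for count in (4, 5, 6):
        print(count, passes_not_maxconn(count))
```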
Specifying Informative Messages relevant to Organisational Policies
| deny_info CUSTOM_ERRS_ADVERTISERSurl block_advertisersURL |
Our Custom Error Messages have also evolved to inform users which part of the URL they requested has caused the 'connection failure.'
We deem that this is more helpful to clients and will maximise our ability to analyse whether the ruleset is accurate/effective.
Configuring Access to the Cache
Restricting Access to External Sites - relevant to organisational policies
| # --> Domains & URLS to block |
Our access configuration remains largely the same, we're just using more acls.
| ## |
One change we implement is to allow administrators greater freedom to the Internet, restricting their access only to sites specifically limited by the network policy and organisational policy.
users_sysadmin is a proxy authentication acl, so this allow sequence will only be made available if the client user can authenticate as one of the users listed with users_sysadmin (in our example: AdminID1 and AdminID2).
| http_access deny block_webhostURL |
| http_access deny block_filesURLPATH |
We now restrict external access via the domain portion of the URL, giving us greater freedom to use words that would otherwise cause significant problems if matched against the complete URL. We can also provide a limited set of users with extra privileges, independent of the machines they are using.
| http_access allow block_filesURLPATH authenticated TIMEafterhoursMORN !MaxCONNECTIONS |
| http_access deny block_filesURLPATH |
With file restrictions we choose to deny access to download files during peak use periods. Here we specifically allow file downloads to authenticated users after hours and when the user has not exceeded allowed maximum number of connections.
Otherwise, we will block file downloads.
Allowing Specified networks access to the cache
| # --> Subnet Access to the NET |
The subnets not only have to be correct to allow access to the cache; the clients also have to be authenticated and must not exceed MaxCONNECTIONS (5 in our initial estimation).
To gain access to the cache, the client must
- be at a valid ip-address (subnet_lab1 or subnet_lab2) AND
- be an authenticated user (userid, password) AND
- not have more than MaxCONNECTIONS connections
Restricting Internal Access - relevant to organisational policies
| http_access deny subnet_pub TIMEafterhoursMORN |
There is minimal change in the time restriction. We have only included authentication and maxconn requirements to the commented access specifications.
Let's Go
| http_access allow dst_all authenticated !MaxCONNECTIONS |
In our final line we have required authentication on going out from the cache to the rest of the world, just in case we've made some fundamentally stupid mistake somewhere else in our configuration.
Managing the Log Files
Edit the /etc/daily.local file and add the following lines:
| if [ -x /usr/local/bin/squid -a -f /var/squid/logs/squid.pid ]; then |
Other Miscellaneous Issues
Squid's DNS Startup Test
We get very poor service from our ISP, and one serious problem when we were configuring our server was squid not being able to resolve DNS names at startup. If it fails to find the DNS entries for netscape.com, internic.net, nlanr.net and microsoft.com, the squid server will just hang around and then eventually quit.
| # TAG: dns_testnames |
| dns_testnames mydomain.com |
To solve the startup problem (because our ISP will regularly have problems with their DNS server) we set the dns test to look for our host details, which is configured in our internal DNS Server.
Debugging your Configuration
| # TAG: debug_options |
I was having a number of problems with squid while playing around with the configuration file (especially when trying to get authentication working) and because of the problems we were having with our ISP connection failures. Squid can log more information in the /var/squid/logs/cache.log file. By increasing the amount of information that is placed in there I had a much better understanding of when squid was failing.
Squid User and Group
Another problem I was having in updating and downgrading squid (I was originally attempting to use LDAP authentication in squid to synchronise accounts between Samba, Squid, & Windows 2000) is the fact that the source distribution will use nobody, but the OpenBSD port uses www:www.
| # TAG: cache_effective_user |
While shifting between port and source I was continually having problems with the source not being able to use the directories created by the OpenBSD port. It took a while (dumb admin I am) to figure out that uid:gid were different between the different compilations. Sometimes I would remember the ./configure directive, sometimes I'd forget.
Authentication - the MSNT module
[source: msntauth-v2.0 http://stellarx.tripod.com/]
The authentication module works pretty well, with little user involvement. Instructions are well documented in the accompanying README.html file.
The only customisation that was required was changing the default directory settings.
Edit File: confload.c (reference is out of date in the readme file)
| #define CONFIGFILE "/usr/local/squid/etc/msntauth.conf" |
Change the settings to what is the general directory structure for OpenBSD
| #define CONFIGFILE "/etc/squid/msntauth.conf" /* Path to configuration file */ |
Edit the Makefile to specify the directories where you wish the bin files to be located. (no autoconfig yet.)
Copy the sample msntauth.conf file from the source directory to the directory specified above (/etc/squid.) Edit the file to specify your Domain authentication configuration.
touch the file /etc/squid/denyusers
touch the file /etc/squid/allowusers
Test that the authentication module is functioning correctly by manually executing it at the command prompt. Refer to the readme.html for further instructions on testing.
Content Filtering
If you think that filtering through the use of squid by URL or IP is draconian, some people actually have the need to filter even by the content of pages delivered.
For HTTP traffic, a proxy filtering solution is DansGuardian at http://www.dansguardian.org/.
Transparent Proxy
Package: transproxy-0.4.tgz
If you want to use transparent proxying with squid-authentication, don't. Read the FAQ and source for further details.
This program works with Darren Reed's IPFILTER package to intercept things like http requests and divert them to a www proxy server (eg: squid), without requiring user intervention or configuration.
Install the package and make the following configuration changes.
Edit: /etc/services file to include the following lines:
| tproxy 8081/tcp # Transparent Proxy |
Edit: /etc/rc.conf.local file to include the following lines in Section 1:
| tproxy=YES |
Edit: /etc/rc.local.
After the 'starting local daemons' and before the following echo '.', Insert the following instructions to the /etc/rc.local file:
| echo -n 'starting local daemons:' |
| # [ ... stuff left out ... ] |
| /usr/local/sbin/tproxy -s 8081 -r www [proxy-server-ip-address] [port] |
| # [ ... stuff left out ... ] |
| echo '.' |
This tells the transparent proxy to start as a server (-s), accept requests on port 8081, run as the user www (-r), and pass data on to the host [proxy-server-ip-address] at port [port]. On my machine (since I have the cache on the same server and squid is listening on 3128) I can use:
| # /usr/local/sbin/tproxy -s 8081 -r www 127.0.0.1 3128 |
The following ipnat rules should redirect www connection attempts (from the internal network to the external network) through to the cache.
Edit: /etc/ipnat.rules to include the lines
| rdr EXT_LINK 0.0.0.0/0 port www -> 127.0.0.1 port tproxy |
| rdr EXT_LINK 0.0.0.0/0 port 8080 -> 127.0.0.1 port tproxy |
Unlike some other transparent proxy solutions, this does not require the proxy run on the machine itself. Running the caching server on a separate machine allows for greater scalability, and a feature of tproxyd is that it accepts connections on the redirected port, connects to the real proxy server and transports data between the two sockets.
FTP Proxy
Suse Proxy-Suite
SOCKS 5 Proxy
dante in ports
Cache Utilisation Analysis Tools
webalizer - (package)
squidclients - http://www.cineca.it/~nico/squidclients.thml
Human readable reports on cache utilisation, or network utilisation, are always good for something. A few of the tools that we have come across for generating automatic reports on cache use include: calamaris, webalizer, squidclients, and sqmgrlog (renamed as sarg).
What does the log file record?
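Each line of squid's native access.log has ten whitespace-separated fields: timestamp, elapsed milliseconds, client address, result code/HTTP status, bytes, request method, URL, ident (as far as I can tell, this is also where the authenticated user name shows up, or "-" if there is none), hierarchy code/peer, and content type. A minimal Python sketch for pulling a line apart follows. The sample line is invented for illustration, not taken from our logs.

```python
from collections import namedtuple

Entry = namedtuple(
    "Entry",
    "time elapsed_ms client result status size method url ident hierarchy content_type",
)


def parse_line(line):
    """Split one native-format access.log line into named fields."""
    f = line.split()
    result, status = f[3].split("/")  # e.g. TCP_MISS/200
    return Entry(
        float(f[0]), int(f[1]), f[2], result, int(status),
        int(f[4]), f[5], f[6], f[7], f[8], f[9],
    )


# Invented sample line in the native log format.
SAMPLE = (
    "987654321.123    250 192.168.1.20 TCP_MISS/200 4512 GET "
    "http://www.example.com/ student1 DIRECT/192.0.2.10 text/html"
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry.client, entry.ident, entry.result, entry.status, entry.url)
```

The report tools listed above do this kind of parsing for you; the sketch is only meant to show what raw material they work from.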
Calamaris
[ref: http://calamaris.cord.de/]
Calamaris can generate a quick and neatly formatted report from the access files.
Calamaris interesting options:-
-a all (equivalent to: -d 20 -P 60 -r 1 -s -t 20)
-d n show n top-level and n second level destinations
-P n show throughput data for every n minutes
-r n show n requesters
-s show verbose status reports
-t n show n content-type, n extensions, and requested protocols
Output Format
-m mailformat
-w web HTML format
Sample usage:
| #!/bin/sh |
| # Shell Script Used to generate log analysis reports from squid logs |
| # using calamaris |
| # |
| cd /var/squid/logs |
| gunzip access*.gz |
| cat access.log access.log.0 access.log.1 access.log.2 access.log.3 access.log.4 access.log.5 access.log.6 | calamaris -a -w > squidreport.html |
| gzip access.log.* |
| # cat squidreport.html | mail -s "calamaris weekly report" somebody |
Assumptions in the script are:
* calamaris has been manually installed into /usr/local/bin
* squid access log files are located at /var/squid/logs
* log files are rotated for 7 days (0 ~ 6)
sqmgrlog, sarg
[ref: http://web.onda.com.br/orso/index.html]
Sarg is a Squid Analysis Report Generator that allows you to view "where" your users are going on the Internet. Sarg generates reports in html, with many fields, like: users, IP Addresses, bytes, sites and times.
This is what we actually use, and it was so easy to follow the instructions I can't remember how it was done.
Author and Copyright
Copyright (c) 2000/1/2 Samiuela LV Taufa. All Rights Reserved.
I reserve the right to be totally incorrect even at the best advice of betters. In other words, I'm probably wrong in enough places for you to call me an idiot, but don't 'cause you'll hurt my sensibilities, just tell me where I went wrong and I'll try again.
You are permitted and encouraged to use this guide for fun or for profit as you see fit. If you republish this work in what-ever form, it would be nice (though not enforceable) to be credited.
