SeTRoM: Proxy/Cache Service for the Internet


Introduction

There are at least three great values in using a caching proxy. The two immediately obvious ones are bandwidth optimisation (minimising unnecessary traffic through caching) and control over which resources can be requested from outside (the proxy). The third, oft unexplored, value of a caching proxy server such as squid is the records, or logs, that it maintains. These allow the administrator to further 'fine-tune' the performance of the system and to trace communications from within the environment to the external world.

Squid - Optimising Web Access

[package: squid-2.3.tgz]
[ref: Squid, A User's Guide, by Oskar Pearson]
[ref: squid faq]

The Squid 2.3 package is available on the OpenBSD 2.8 CD, and later versions may be available on the 'NET. To install the package, use the pkg_add program as in the example below:

# pkg_add /[path-to-package]/squid-2.3.tgz

Once the package is installed you will be prompted for a number of further activities to refine your installation. The following are part of that installation refinement.

(1) Configure the cache swap directory by using squid -z. This process will take a bit of time.

# /usr/local/bin/squid -z

[ ... program displays ... ]

YYYY/MM/DD HH:MM:SS| Creating Swap Directories

Starting Squid

You can manually start squid by typing /usr/local/bin/squid, which starts the squid parent process and leaves it waiting for connections. To have OpenBSD start squid automatically at every system start-up, edit rc.conf.local to set the configuration and rc.local to act on that setting.

Edit the file /etc/rc.conf.local to include the following line:

squid=YES

Edit the file: /etc/rc.local

After the 'starting local daemons' line and before the following echo '.', insert the following instructions into the /etc/rc.local file:

echo -n 'starting local daemons:'
# [ ... stuff left out ... ]

if [ -f /etc/squid/squid.conf ]; then
    if [ X"${squid}" = X"YES" -a -x /usr/local/bin/squid ]; then
        echo -n ' squid'; /usr/local/bin/squid
    fi
fi

# [ ... stuff left out ... ]
echo '.'

Now each restart of the machine will automatically check whether we have enabled squid in the configuration file (rc.conf.local) and, if so, start the squid daemon. If we wish to disable the squid auto-start we can simply change squid=YES to squid=NO.

Localised settings in OpenBSD package

- configuration files	/etc/squid
- sample configuration files	/usr/local/share/examples/squid/conf
- error message files	/usr/local/share/squid/errors
- sample error message	/usr/local/share/examples/squid/errors
- icons	/usr/local/share/squid/icons
- sample icons	/usr/local/share/examples/squid/icons
- cache	/var/squid/cache
- logs	/var/squid/logs
- uid:gid squid runs as is	www:www

Example Configuration

Scenario:

A private school I work with has just received a DSL connection to the local ISP. Before releasing the Internet connection, the administrators have requirements (policies) within the school that they wish to be implemented as part of the Internet connection.

The computer department have come to a realisation that a Block by Default approach is not conducive to optimal educational use of the Internet, but there is a need for policing and monitoring its policies.

The chosen solution is two-fold. (1) Physical supervision of Internet Access computers is mandatory and must be combined with user education and training. (2) Software blocking will be both informative and as comprehensive as possible.

Software monitoring and restriction is where squid plays a significant role. Squid's Access Control Lists (ACLs) provide a very flexible environment for supporting organisational policies.

Details:

School Policies: The school has some standards of certain types of material it does not want students to access through the Internet (specifically pornography.) As a consequence of that requirement, the school also does not want students using 'chat' environments or public web hosted email services (eg. hotmail)

Network Policies: The DSL connection is 64K but the ISP has a very poor connection to the backbone (remember we're calling from Tonga) so there is a significant concern about bandwidth utilisation. The less unnecessary stuff going up and down the 'pipe' the better for us.

As a consequence of the bandwidth problem, and the need to keep the students focussed on academically oriented pursuits, the network administrators want to ban a number of entertainment sites, primarily to minimise bandwidth use and secondarily to keep students off time wasters.

Advertisers are problematic bandwidth consumers, so these will also be blocked where possible.

Network Configuration:

The school operates 3 subnets with differing authorisation levels. Through some magic, we would like to provide special access privileges for system administrators:

Segment	Purpose
2 class-rooms	controlled, timed access with potential limits to 'net access during class times. subnet_lab1, subnet_lab2
1 pub access	Public Access for school community. This will include machines available to school administrators and general staff for accessing the network and 'NET. subnet_pub
1 admin	administrator with freer access to the 'NET, probably need to be password authenticated.

Authentication is the simplest solution for providing system administrators with greater access to the Internet. To simplify this example, I will discuss authentication in the more detailed revision of this example.

We will cover 7 stages to get our squid configuration working, described in the sections that follow.

Specifying the Port to Listen On

Edit the file: /etc/squid/squid.conf

Now the scenario is out of the way, let's get down to configuring our squid cache/proxy.

The control of external access to the local lan should be managed by the Firewall.

To be safer (or am I just pedantic?) I set the restriction below on where the squid server listens.

# http_port 3128
http_port internal_nic1:3128
http_port internal_nic2:3128

Normally squid starts up and listens to 3128 on all network devices. The above just ensures that it is listening on port 3128 only for the internal network. Our firewall can further block port 3128 requests from coming through from the outside (but our ACLs should be handling any further problems.)

Specifying which network IPs we will support in squid

Next I set up my Access Control Lists (ACLs) defining the range of machines I have on the Internal Network.

# Networks allowed to use this Cache

acl subnet_lab1 src ip-address_lab1/netmask
acl subnet_lab2 src ip-address_lab2/netmask

acl subnet_pub src ip-address_pub/netmask

acl all src 0.0.0.0/0.0.0.0

acl dst_all dst 0.0.0.0/0.0.0.0

I choose to list the subnets separately (all non-routable IPs) as we have some policies for Internet access that can be managed using the subnet information. The acls "all" and "dst_all" refer to communications with any Internet IP address: "all" refers to the "source" or 'client' IP address wanting to use the cache, and "dst_all" refers to the "destination" or URL host being requested.

Specifying Time intervals we will support

Related to the subnet information are certain time periods during which we want to disable specific subnets, so I have to set up the ACLs for that.

# After Hours Settings
acl TIMEafterhoursMORN time MTWHF 00:00-08:00
acl TIMEafterhoursAFT time MTWHF 16:30-24:00
acl TIMEsatMORN time A 00:00-07:00
acl TIMEsatAFT time A 17:00-24:00
acl TIMEsundALLDAY time S 00:00-24:00

Our sample Network Policy will provide different service levels dependent on the time of day (e.g. allow access after hours to different services blocked during business hours.)

Squid TIME acls cannot wrap from one day to the next, so to get from 4:30 in the afternoon until 8:00 the next morning, we have to actually specify one acl for 4:30 to midnight and another acl for midnight to 8 in the morning.
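
The split described above can be sketched as a small shell helper. This is only an illustration: the function name split_time_acl and the _late/_early suffixes are made up for this example, and it reuses the same day codes for both halves, just as the configuration above does.

```shell
#!/bin/sh
# Hypothetical helper (not part of squid): print the squid "time" acl
# line(s) needed for an interval that may cross midnight.
# usage: split_time_acl NAME DAYS START END   (times as HH:MM)
split_time_acl() {
    name=$1 days=$2 start=$3 end=$4
    # Prefix a 1 so that HHMM compares as a plain decimal number
    # (avoids leading-zero surprises, e.g. 08:00 becomes 10800).
    s=1${start%:*}${start#*:}
    e=1${end%:*}${end#*:}
    if [ "$s" -lt "$e" ]; then
        # Interval stays within one day: a single acl is enough.
        echo "acl $name time $days $start-$end"
    else
        # Interval wraps past midnight: split it at 24:00 / 00:00.
        echo "acl ${name}_late time $days $start-24:00"
        echo "acl ${name}_early time $days 00:00-$end"
    fi
}

split_time_acl TIMEafterhours MTWHF 16:30 08:00
```

Both generated acls then have to be named in the access rules, since neither one covers the whole after-hours period on its own.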

Specifying Organisational Policies (Restricted Sites)

A number of organisational policies require that we restrict use of the Internet and for that we have collected a list of urls and domains from the Internet. We are storing these urls in text files related to the categorisation we have chosen (eg. entertainment, porn, etc.)

# Regular Expression Review of URLs, and Destination Domains

# The first list are sites known to be wrongly blocked by the later list
acl unblock_porn url_regex -i "/etc/squid/unblock_porn.txt"

# The following are the sites restricted by organisational policy

acl block_advertisers url_regex -i "/etc/squid/block_advertisers.txt"
acl block_entertainment url_regex -i "/etc/squid/block_entertainment.txt"
acl block_webmail url_regex -i "/etc/squid/block_webmail.txt"
acl block_porn url_regex -i "/etc/squid/block_porn.txt"

We create ACLs for each category, and we store the text files in the /etc/squid directory. The text files list, on separate lines, the words or phrases we wish to block access to (such as domain addresses.)
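
Before loading a list into squid it can help to dry-run a few URLs against it. The sketch below does that with grep; it assumes that grep -E -i behaves closely enough to squid's url_regex -i for a quick check, which may not hold in every corner case. The function name url_in_list and the sample patterns are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if URL matches any pattern (one per line)
# in LISTFILE. grep -E -i only approximates squid's "url_regex -i".
# usage: url_in_list URL LISTFILE
url_in_list() {
    printf '%s\n' "$1" | grep -E -i -q -f "$2"
}

# Demonstration with a throw-away list (the patterns are invented).
list=$(mktemp)
printf '%s\n' 'adserver' '^http://ads\.' > "$list"

for url in http://ads.example.com/banner.gif http://www.example.edu/; do
    if url_in_list "$url" "$list"; then
        echo "blocked: $url"
    else
        echo "allowed: $url"
    fi
done
rm -f "$list"
```

With grep, an empty line in the list matches every URL, so keep test lists free of blank lines.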

Specifying Informative Messages relevant to Organisational Policies

Location: /usr/local/share/squid/errors

# TAG: deny_info
# Usage: deny_info err_page_name acl
#
#Default:
# none
deny_info CUSTOM_ERRS_ADVERTISERS block_advertisers
deny_info CUSTOM_ERRS_ENTERTAINMENT block_entertainment
deny_info CUSTOM_ERRS_PORN block_porn
deny_info CUSTOM_ERRS_WEBMAIL block_webmail

We have created customised error messages for the different areas to which our organisational policy restricts access. The error messages are text files following the naming convention used by the squid error messages. We store the files in /usr/local/share/squid/errors (standard configuration in the squid-2.3 OpenBSD port.)

Note: to beautify our error messages (ie. add graphics & style sheet) we have created an alias directory in our Apache website to store these extra files. Squid will throw the custom messages at the user browser, but all other access has to come from the local website.

Configuring Access to the Cache

The final major thing is to set up our rules for accessing the cache.

# TAG: http_access
# Allowing or Denying access based on defined access lists
#
# Access to the HTTP port:
# http_access allow|deny [!]aclname ...

The standard format, as shown above, is http_access followed by either allow or deny and then a list of your aclnames (with an optional ! at the beginning to negate the aclname.) Note that the aclnames on one line are "ANDed" together.
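
Squid works through the http_access lines from top to bottom and acts on the first line whose acls all match, which is why the order of the rules matters. The following is only a toy model of that behaviour in shell, not squid's real code. The function name first_match, the "no-match" result, and the subnet_pub rule in the sample are made up for the illustration.

```shell
#!/bin/sh
# Simplified model of http_access evaluation (an illustration only).
# Rules are read from stdin, one per line:
#   allow|deny [!]aclname ...
# $1 is a space-separated list of the acl names the request matches.
# Prints the action of the first rule whose acls ALL match, or
# "no-match" if no rule does.
first_match() {
    matched=" $1 "
    while read -r action acls; do
        ok=yes
        for a in $acls; do
            case $a in
                \!*)  # negated acl: must NOT be in the matched list
                    case $matched in *" ${a#?} "*) ok=no ;; esac ;;
                *)    # plain acl: must be in the matched list
                    case $matched in *" $a "*) ;; *) ok=no ;; esac ;;
            esac
        done
        if [ "$ok" = yes ]; then
            echo "$action"
            return 0
        fi
    done
    echo "no-match"
}

rules='deny block_porn
allow subnet_lab1
allow subnet_pub !TIMEsundALLDAY
deny all'

# A lab machine asking for an ordinary page:
printf '%s\n' "$rules" | first_match "subnet_lab1 all"
# The same machine asking for a site on the block list:
printf '%s\n' "$rules" | first_match "subnet_lab1 block_porn all"
```

Because the deny line for block_porn comes first, the lab machine is refused the blocked site even though a later line would have allowed it.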

There are a number of standard security configurations already in squid.conf. I've left them standing and added the things specific to our scenario.

Restricting Access to External Sites - relevant to organisational policies

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

# http ACCESS PRIVILEGES

# --> URLs to Unblock
http_access allow unblock_porn

# --> Domains & URLS to block
http_access deny block_advertisers
http_access deny block_entertainment
http_access deny block_porn
http_access deny block_webmail

Our first action is to block those sites which are restricted by our organisational policies.

Allowing Specified networks access to the cache

Specifying access to cache from LAN machines

# --> Subnet Access to the NET

http_access allow localhost
http_access allow subnet_lab1
http_access allow subnet_lab2

In this example, we allow the local subnets to use the cache. (Once authentication is set up, as in the extended configuration later on, an "authenticated" acl can be added to these rules.)

Restricting Internal Access - relevant to organisational policies

Because we are not ready for prime-time, we denied Internet access to the public access machines. 1st they are two buildings away and we cannot supervise them at the moment, and 2nd we haven't gone through our education program for staff use.

# --> Subnet Access to the NET

http_access deny subnet_pub

# During initial phase, keep subnet_pub off the air
#
# After testing, the below script should be used
# --> Format, deny 1st and then allow later
http_access deny subnet_pub TIMEafterhoursMORN
http_access deny subnet_pub TIMEafterhoursAFT
http_access deny subnet_pub TIMEsatMORN
http_access deny subnet_pub TIMEsatAFT
http_access deny subnet_pub TIMEsundALLDAY
# http_access allow subnet_pub authenticated

Because of the same above problems of supervising the public access terminals, we have included time based limiting. Once we are certain our system is better configured for public access then we can enable access from the public terminals within specified hours.

Ignoring the cache when requesting from Local Area Network

Next, we tell squid to not cache requests for the internal Local Area Network sites.

# always go direct to LAN sites
# and always cache (never_direct) all other sites.
always_direct allow localhost
always_direct allow subnet_lab1
always_direct allow subnet_lab2
#never_direct allow all

Our local website doesn't need to be cached. Some of my friends think they get better performance (even for internal clients) by caching the local web server. Parts of our sites are static pages (straight html, images, and pdfs) but our new section is based on PHP so we will just avoid any further complications with our cache by not caching it.

Let's Go.

The final part is to specifically state that we want to be able to access the rest of the world, and we want to specifically deny access to the cache from anyone we have not specifically allowed access.

# And finally deny all other access to this proxy

http_access allow dst_all
http_access deny all

Extending the Sample Configuration

This section further extends the previous example, but with more specifics. Partially as an aid to anyone wishing further examples, but primarily to document our network.

The portions of the example we will extend and add to are covered in the sections below.

Authenticating Users

To maximise the potential for user conformance, while providing a more flexible user environment, we have chosen to use User Authentication. The most flexible for our configuration is the MSNT authentication module, which is configured as below. (More details on installing it are listed further below.)

All the clients are authenticated on an MS Windows NT Domain before they can use the network, so our choice was simplified.

After installing and testing the msntauth module, we configure the authentication by including the following directives in the /etc/squid/squid.conf file

Edit the file /etc/squid/squid.conf:

authenticate_program /usr/local/bin/msntauth
authenticate_children 15
authenticate_ttl 900 seconds
authenticate_ip_ttl 60 seconds

# authenticate_ip_ttl_is_strict on

We specify the Authentication program and some important parameters.

In our environment we will let the authentication remain active 15 minutes after the last authentication (900 seconds). To annoy people who wish to share their passwords (this should be more restrictive than it is) we require authentication of a user to be tied to an IP address. If within 60 seconds the same user's credentials are used from two IP addresses, both will be denied access and be required to re-authenticate.

If we were really pedantic about password use (which may be relevant in our context) we could force authentication to remain with the originating authenticator until expiry. Specifically this prevents the user using two terminals.

Under our organisational policy we set up authentication so that (a) only those designated for Internet Access can access the external web, and (b) our log files can show Internet access patterns by user. Note that this approach may be considered draconian by others; whether you want to use authentication depends on the type of site you are running and your purpose.

For authentication to be useful, we next have to specify an acl.

# Authentication

acl authenticated proxy_auth REQUIRED

acl users_sysadmin proxy_auth AdminID1 AdminID2

We want authentication of all users before they access the Internet (for this we will use 'authenticated') and we want to provide special privileges to System Administrators (for this we will use 'users_sysadmin').

The AdminID1, AdminID2 are users on the server that will provide the authentication (in our case on our Windows NT Domain.)

Specifying Organisational Policies (Restricted Sites)

# Regular Expression Review of URLs, and Destination Domains
acl unblock_pornURL url_regex -i "/etc/squid/unblock_pornURL.txt"
acl unblock_domainDOM dstdom_regex -i "/etc/squid/unblock_domainDOM.txt"
acl unblock_stuffURL url_regex -i "/etc/squid/unblock_stuffURL.txt"
acl block_pornURL url_regex -i "/etc/squid/block_pornURL.txt"
acl block_pornDOM dstdom_regex -i "/etc/squid/block_pornDOM.txt"
acl block_advertisersURL url_regex -i "/etc/squid/block_advertisersURL.txt"
acl block_advertisersDOM dstdom_regex -i "/etc/squid/block_advertisersDOM.txt"
acl block_entertainmentURL url_regex -i "/etc/squid/block_entertainmentURL.txt"
acl block_entertainmentDOM dstdom_regex -i "/etc/squid/block_entertainmentDOM.txt"
acl block_anonymizersDOM url_regex -i "/etc/squid/block_anonymizersDOM.txt"
acl block_webhostURL url_regex -i "/etc/squid/block_webhostURL.txt"
acl block_webhostDOM dstdom_regex -i "/etc/squid/block_webhostDOM.txt"
acl block_badlangURL url_regex -i "/etc/squid/block_badlangURL.txt"
acl block_piratesURL url_regex -i "/etc/squid/block_piratesURL.txt"
acl block_piratesDOM dstdom_regex -i "/etc/squid/block_piratesDOM.txt"

We drastically change our blocking scheme by using three separate methods of analysing a URL before we decide whether it should be allowed or blocked. In our previous example we only used the full URL (url_regex). In this example, we use url_regex, which analyses the full URL; dstdom_regex, which analyses only the host (domain) information of the URL; and urlpath_regex (introduced further below), which analyses only the path.

This distinction is very important when we want to use a catch word like "quake" to block access to game sites that host quake tournaments. When we were blocking "quake" in the URL, students were unable to do research on Earthquakes as our URL based block prevented access.

By using dstdom_regex we match "quake" only against the host name in the URL (which, on its own, still blocks earthquake.com etc.) By further refining our regular expression we can be more selective. Specifying \.quake\. blocks only sites with quake as a complete part of the host name (allowing earthquake, deadquake, aquake), and ^quake\. blocks only domain names where quake. is at the very beginning, while allowing quaken etc. Note that the dots have to be escaped: an unescaped . matches any character, so .quake. would still match earthquake.com.
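
The difference between matching the whole URL and matching only the host can be tried out from the shell. This is a rough sketch: grep -E -i stands in for squid's regex matching (an approximation), the helper names are made up, and the pattern (^|\.)quake\. is our own combination of the two refinements mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers to illustrate URL-wide versus host-only matching.

# Print the host part of a URL: drop the scheme, then everything
# from the first "/", ":" or "?" onwards.
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' -e 's|[/:?].*$||'
}

# Succeed if PATTERN matches anywhere in the URL (like url_regex -i).
match_url() { printf '%s\n' "$1" | grep -E -i -q -e "$2"; }

# Succeed if PATTERN matches the host only (like dstdom_regex -i).
match_host() { url_host "$1" | grep -E -i -q -e "$2"; }

# show LABEL COMMAND...: run the command and report the verdict.
show() {
    label=$1; shift
    if "$@"; then echo "blocked: $label"; else echo "allowed: $label"; fi
}

page=http://www.example.edu/geology/earthquake.html
site=http://earthquake.example.org/
game=http://quake.example.com/

show "whole URL ~ quake, $page" match_url "$page" 'quake'
show "host ~ quake, $page" match_host "$page" 'quake'
show "host ~ quake, $site" match_host "$site" 'quake'
show "host ~ (^|\.)quake\., $site" match_host "$site" '(^|\.)quake\.'
show "host ~ (^|\.)quake\., $game" match_host "$game" '(^|\.)quake\.'
```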

acl block_filesURLPATH urlpath_regex -i "/etc/squid/block_filesURLPATH.txt"

A further improvement in selectivity with the URL is urlpath_regex, which only looks at the "path" portion of the URL. We will use the path-only portion to identify file transfers, audio, and video that we do not want.

Of course Squid 2.5 (and possibly 2.4) supports acls for mime-types, but I'm trying to get this stuff working 1st.

The next acl we configure is to specify the maximum number of connections we want users to be doing. This is mostly relevant to the power users, who inexplicably consume significant bandwidth by running multiple browsers.

acl MaxCONNECTIONS maxconn 5

Since this is the 1st time we're doing this, we will set a reasonable number initially and then change things along the way.

Note from the FAQ:

Note, the maxconn ACL type is kind of tricky because it uses less-than comparison. The ACL is a match when the number of established connections is greater than the value you specify.
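
Since the acl is used in negated form (!MaxCONNECTIONS) in the rules further on, it is worth spelling out the comparison. The snippet below is a toy model of the rule quoted from the FAQ, not squid code; the function name maxconn_matches is made up.

```shell
#!/bin/sh
# Toy model of the maxconn comparison quoted above (not squid code).
# usage: maxconn_matches CURRENT_CONNECTIONS LIMIT
# Succeeds (the acl "matches") only when CURRENT is greater than LIMIT.
maxconn_matches() {
    [ "$1" -gt "$2" ]
}

for n in 4 5 6; do
    if maxconn_matches "$n" 5; then
        echo "$n connections: MaxCONNECTIONS matches, so !MaxCONNECTIONS fails"
    else
        echo "$n connections: MaxCONNECTIONS does not match, so !MaxCONNECTIONS passes"
    fi
done
```

In this model a client sitting at exactly the limit is still let through; only the connection beyond the limit trips the acl.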

Specifying Informative Messages relevant to Organisational Policies

deny_info CUSTOM_ERRS_ADVERTISERSurl block_advertisersURL
deny_info CUSTOM_ERRS_ADVERTISERSdom block_advertisersDOM
deny_info CUSTOM_ERRS_ANONYMIZERSdom block_anonymizersDOM
deny_info CUSTOM_ERRS_BADLANGurl block_badlangURL
deny_info CUSTOM_ERRS_ENTERTAINMENTurl block_entertainmentURL
deny_info CUSTOM_ERRS_ENTERTAINMENTdom block_entertainmentDOM
deny_info CUSTOM_ERRS_FILESurlpath block_filesURLPATH
deny_info CUSTOM_ERRS_PIRATESurl block_piratesURL
deny_info CUSTOM_ERRS_PIRATESdom block_piratesDOM
deny_info CUSTOM_ERRS_PORNurl block_pornURL
deny_info CUSTOM_ERRS_PORNdom block_pornDOM
deny_info CUSTOM_ERRS_WEBHOSTurl block_webhostURL
deny_info CUSTOM_ERRS_WEBHOSTdom block_webhostDOM
deny_info CUSTOM_ERRS_MaxCONNECTIONS MaxCONNECTIONS

Our Custom Error Messages have also evolved to inform users which part of the URL they requested has caused the 'connection failure.'

We deem that this is more helpful to clients and will maximise our ability to analyse whether the ruleset is accurate/effective.

Configuring Access to the Cache

Restricting Access to External Sites - relevant to organisational policies

# --> Domains & URLS to block
http_access deny block_pornURL
http_access deny block_pornDOM
http_access deny block_advertisersURL
http_access deny block_advertisersDOM
http_access deny block_entertainmentURL
http_access deny block_entertainmentDOM
http_access deny block_anonymizersDOM

Our access configuration remains largely the same; we're just using more acls.

##
## SPECIAL PRIVILEGE SECTION FOR ADMINISTRATORS
##
http_access allow users_sysadmin dst_all

One change we implement is to allow administrators greater freedom to the Internet, restricting their access only to sites specifically limited by the network policy and organisational policy.

users_sysadmin is a proxy authentication acl, so this allow rule only applies if the client user can authenticate as one of the users listed with users_sysadmin (in our example: AdminID1 and AdminID2.)

http_access deny block_webhostURL
http_access deny block_webhostDOM
http_access deny block_badlangURL
http_access deny block_piratesURL
http_access deny block_piratesDOM


We now restrict external access via the domain portion of the URL, giving us greater freedom to use words that would otherwise cause significant problems if matched against the complete URL. We can also give a limited set of users extra privileges, independent of the machines they are using.

http_access allow block_filesURLPATH authenticated TIMEafterhoursMORN !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEafterhoursAFT !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsatMORN !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsatAFT !MaxCONNECTIONS
http_access allow block_filesURLPATH authenticated TIMEsundALLDAY !MaxCONNECTIONS

http_access deny block_filesURLPATH

With file restrictions we choose to deny access to download files during peak use periods. Here we specifically allow file downloads to authenticated users after hours and when the user has not exceeded allowed maximum number of connections.

Otherwise, we will block file downloads.

Allowing Specified networks access to the cache

# --> Subnet Access to the NET
http_access allow localhost
http_access allow subnet_lab1 authenticated !MaxCONNECTIONS
http_access allow subnet_lab2 authenticated !MaxCONNECTIONS

The subnets not only have to be correct to allow access to the cache; the clients also have to be authenticated and must not have more than MaxCONNECTIONS connections (5 in our initial estimation.)

To gain access to the cache, the client must

be at a valid ip-address (subnet_lab1 or subnet_lab2) AND
be an authenticated user (userid, password) AND
not have more than MaxCONNECTIONS connections

Restricting Internal Access - relevant to organisational policies

http_access deny subnet_pub TIMEafterhoursMORN
http_access deny subnet_pub TIMEafterhoursAFT
http_access deny subnet_pub TIMEsatMORN
http_access deny subnet_pub TIMEsatAFT
http_access deny subnet_pub TIMEsundALLDAY
# http_access allow subnet_pub authenticated !MaxCONNECTIONS

There is minimal change in the time restriction. We have only included authentication and maxconn requirements to the commented access specifications.

Let's Go

http_access allow dst_all authenticated !MaxCONNECTIONS
http_access deny all

In our final line we have required authentication on going out from the cache to the rest of the world, just in case we've made some fundamentally stupid mistake somewhere else in our configuration.

Managing the Log Files

Edit the /etc/daily.local file and add the following lines:

if [ -x /usr/local/bin/squid -a -f /var/squid/logs/squid.pid ]; then
    /usr/local/bin/squid -k rotate
fi

Other Miscellaneous Issues ?

Squid's DNS Startup Test

We get very poor service from our ISP, and one serious problem when we were configuring our server was not being able to resolve the DNS names for squid. Failing to find the dns entries for netscape.com, internic.net, nlanr.net, microsoft.com the squid server will just hang-around and then eventually quit.

# TAG: dns_testnames
# The DNS tests exit as soon as the first site is successfully looked up
#
# This test can be disabled with the -D command line option.
#
#Default:
# dns_testnames netscape.com internic.net nlanr.net microsoft.com

dns_testnames mydomain.com

To solve the startup problem (because our ISP will regularly have problems with their DNS server) we set the dns test to look for our host details, which is configured in our internal DNS Server.

Debugging your Configuration

# TAG: debug_options
# Logging options are set as section,level where each source file
# is assigned a unique section. Lower levels result in less
# output, Full debugging (level 9) can result in a very large
# log file, so be careful. The magic word "ALL" sets debugging
# levels for all sections. We recommend normally running with
# "ALL,1".
#
#Default:
# debug_options ALL,1
debug_options ALL,1 32,2

I was having a number of problems with squid while playing around with the configuration file (especially when trying to get authentication working) and because of the problems we were having with our ISP connection failures. Squid can log more information in the /var/squid/logs/cache.log file. By increasing the amount of information that is placed in there I had a much better understanding of when squid was failing.

Squid User and Group

Another problem I was having in updating and downgrading squid (I was originally attempting to use LDAP authentication in squid to synchronise accounts between Samba, Squid, & Windows 2000) was that the source distribution runs as nobody while the OpenBSD port uses www:www.

# TAG: cache_effective_user
# TAG: cache_effective_group
#
# NOTE: OpenBSD ports packages use uid:gid www:www
# = To make sure uid:gid www:www works
# = You need to make sure the user/group exists
# = AND to chown -R www:www the /var/squid directories (if need be)
#
#Default:
# cache_effective_user nobody
cache_effective_user www
cache_effective_group www

While shifting between port and source I was continually having problems with the source not being able to use the directories created by the OpenBSD port. It took a while (dumb admin I am) to figure out that uid:gid were different between the different compilations. Sometimes I would remember the ./configure directive, sometimes I'd forget.
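
A quick way to spot this kind of mismatch is to list whatever under the squid directories is not owned by the expected user and group. The helper below is only a sketch using standard find(1) tests; the function name wrong_owner is made up.

```shell
#!/bin/sh
# Hypothetical helper: list everything under DIR that is NOT owned by
# USER:GROUP, using only standard find(1) tests.
# usage: wrong_owner DIR USER GROUP
wrong_owner() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example (assumes the OpenBSD package layout described earlier):
# wrong_owner /var/squid www www
```

No output means the ownership already matches; anything listed is a candidate for the chown -R mentioned in the comments above.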

Authentication - the MSNT module

[source: msntauth-v2.0 http://stellarx.tripod.com/]

The authentication module works pretty well, with little user involvement. Instructions are well documented in the accompanying README.html file.

The only customisation required was changing the default directory settings.

Edit File: confload.c (reference is out of date in the readme file)

#define CONFIGFILE "/usr/local/squid/etc/msntauth.conf"
#define DENYUSERSDEFAULT "/usr/local/squid/etc/denyusers"
#define ALLOWUSERSDEFAULT "/usr/local/squid/etc/allowusers"

Change the settings to what is the general directory structure for OpenBSD

#define CONFIGFILE "/etc/squid/msntauth.conf" /* Path to configuration file */
#define DENYUSERSDEFAULT "/etc/squid/denyusers"
#define ALLOWUSERSDEFAULT "/etc/squid/allowusers"

Edit the Makefile to specify the directories where you wish the bin files to be located. (no autoconfig yet.)

Copy the sample msntauth.conf file from the source directory to the directory specified above (/etc/squid.) Edit the file to specify your Domain authentication configuration.

Create the (empty) deny and allow files:

# touch /etc/squid/denyusers
# touch /etc/squid/allowusers

Test that the authentication module is functioning correctly by manually executing it at the command prompt. Refer to the readme.html for further instructions on testing.

Content Filtering

If you think that filtering through the use of squid by URL or IP is draconian, some people actually have the need to filter even by the content of pages delivered.

For HTTP traffic, a proxy filtering solution is DansGuardian at http://www.dansguardian.org/.

Transparent Proxy

Package: transproxy-0.4.tgz

If you want to use transparent proxying with squid-authentication, don't. Read the FAQ and source for further details.

This program is used with Darren Reed's IPFILTER package and used to intercept things like http requests and divert them to a www proxy server (eg: squid), without requiring user intervention or configuration.

Install the package and make the following configuration changes.

Edit: /etc/services file to include the following line:

tproxy 8081/tcp # Transparent Proxy

Edit: /etc/rc.conf.local file to include the following lines in Section 1:

tproxy=YES

Edit: /etc/rc.local.

After the 'starting local daemons' line and before the following echo '.', insert the following instructions into the /etc/rc.local file:

echo -n 'starting local daemons:'

# [ ... stuff left out ... ]

if [ X"${tproxy}" = X"YES" -a -x /usr/local/sbin/tproxy ]; then
    echo -n ' tproxy'
    /usr/local/sbin/tproxy -s 8081 -r www [proxy-server-ip-address] [port]
fi

# [ ... stuff left out ... ]

echo '.'

This tells the transparent proxy server to start as a server (-s) accepting requests on port 8081, to run as the UserID www (-r), and to pass data on to the host [proxy-server-ip-address] at port [port]. On my machine (since I have the cache on the same server and I'm using squid at 3128) I can use:

# /usr/local/sbin/tproxy -s 8081 -r www 127.0.0.1 3128

The following ipnat rules should redirect www connection attempts (from the internal network to the external network) through to the cache.

Edit: /etc/ipnat.rules to include the lines

rdr EXT_LINK 0.0.0.0/0 port www -> 127.0.0.1 port tproxy
rdr EXT_LINK 0.0.0.0/0 port 8080 -> 127.0.0.1 port tproxy

Unlike some other transparent proxy solutions, this does not require the proxy run on the machine itself. Running the caching server on a separate machine allows for greater scalability, and a feature of tproxyd is that it accepts connections on the redirected port, connects to the real proxy server and transports data between the two sockets.

FTP Proxy

Suse Proxy-Suite

SOCKS 5 Proxy

dante in ports

Cache Utilisation Analysis Tools

webalizer - (package)
squidclients - http://www.cineca.it/~nico/squidclients.thml

Human readable reports on cache utilisation, or network utilisation, are always good for something. A few of the tools that we have come across for generating automatic reports on cache use include: calamaris, webalizer, squidclients, and sqmgrlog (renamed as sarg).

What does the log file record.

Calamaris

[ref: http://calamaris.cord.de/]

Calamaris can generate a quick and neatly formatted report from the access files.

Calamaris interesting options:-
  -a all (equivalent to: -d 20 -P 60 -r 1 -s -t 20)
  -d n show n top-level and n second level destinations
  -P n show throughput data for every n minutes
  -r n show n requesters
  -s   show verbose status reports
  -t n show n content-type, n extensions, and requested protocols


Output Format
  -m mailformat
  -w web HTML format

Sample usage:

#!/bin/sh
# Shell script used to generate log analysis reports from squid logs
# using calamaris
#
cd /var/squid/logs || exit 1

# Unpack the rotated logs that an earlier run compressed
gunzip access*.gz
cat access.log access.log.0 access.log.1 access.log.2 access.log.3 access.log.4 access.log.5 access.log.6 | calamaris -a -w > squidreport.html

# Compress the rotated logs again
gzip access.log.*

# cat squidreport.html | mail -s "calamaris weekly report" somebody

Assumptions in the script are:
* calamaris has been manually installed into /usr/local/bin
* squid access log files are located at /var/squid/logs
* log files are rotated for 7 days (0 ~ 6)
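
If some of the rotated logs are missing, or you would rather not unpack and repack them on disk, the concatenation step can be made more forgiving. The following is a sketch of one way to do it; the function name cat_access_logs is made up, and it assumes single-digit rotation numbers as in the script above.

```shell
#!/bin/sh
# Hypothetical helper: write the current and rotated access logs found
# in DIR to stdout, current log first, reading compressed logs without
# unpacking them on disk. Missing files are skipped silently.
# usage: cat_access_logs DIR
cat_access_logs() {
    for f in "$1"/access.log "$1"/access.log.[0-9] "$1"/access.log.[0-9].gz; do
        [ -f "$f" ] || continue      # also skips unmatched glob patterns
        case $f in
            *.gz) gunzip -c "$f" ;;
            *)    cat "$f" ;;
        esac
    done
}

# Example (paths as assumed by the script above):
# cat_access_logs /var/squid/logs | calamaris -a -w > squidreport.html
```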

sqmgrlog, sarg

[ref: http://web.onda.com.br/orso/index.html]

Sarg is a Squid Analysis Report Generator that allows you to view "where" your users are going on the Internet. Sarg generates reports in html, with many fields, like: users, IP Addresses, bytes, sites and times.

This is what we actually use, and it was so easy to follow the instructions I can't remember how it was done.

Author and Copyright

I reserve the right to be totally incorrect even at the best advice of betters. In other words, I'm probably wrong in enough places for you to call me an idiot, but don't 'cause you'll hurt my sensibilities, just tell me where I went wrong and I'll try again.

You are permitted and encouraged to use this guide for fun or for profit as you see fit. If you republish this work in what-ever form, it would be nice (though not enforceable) to be credited.

SeTRoM

Friday, January 27, 2006


setrom [at] inbom.com
