How to cache openSUSE repositories with Squid

From Perswiki

Latest revision as of 08:08, 17 June 2019

Summary

How to make your local Squid web cache work with openSUSE repositories and the openSUSE network installation process. In effect, how to run a fully autonomous, local on-demand repository mirror. Even with a high-speed ADSL internet connection, installation time savings of around 60% are easily achieved.

In 2012, I wrote this for squid 2.7. In April 2016, I updated everything to work with squid 3.4.x.

Before you begin

Unless you already have a working Squid web cache, or want to set one up anyway, the setup I describe here is probably too complex. Running a local rsync mirror will likely be easier.

Background

In my company, we do quite a lot of testing of openSUSE, and over the last three to four years, we have increasingly switched to installing over the network. Before that, we would install from DVD images served over NFS by a local server. However, over the last couple of years, we've been working a lot more with Factory and the regular snapshots that lead up to a final/gold release. With those, it is much easier to just point the installation process at the right URL and have everything downloaded there and then.

When we're testing installation or new hardware, we often have to repeat the installation process many times on different machines. This is not because it doesn't work as such, but because we might be testing or debugging our own add-ons, or collecting diagnostics. Sometimes we install on virtual machines, sometimes on desktops, more often on server hardware in our downstairs datacentre. We have a local Squid web cache, but since switching to network installs more frequently, I have often been annoyed by how ineffective it is at caching the openSUSE repository. Once I've done one installation, the downloads for the next one should obviously happen a lot faster, in fact at wire speed. Well, they don't, and that's annoying when you know they could have been cached.

The immediate alternative would be to run a local mirror of the openSUSE repositories, but that requires a process for keeping the local mirror up to date, plus a bit of manual interaction (adding the right URL when installing). This is all entirely feasible, but I thought using Squid would be a more elegant and (hopefully) fully autonomous solution, so I decided to figure out why our Squid wasn't coping.

Well, Squid and the openSUSE network installation process just don't work together very well, not out of the box anyway. The repository at download.opensuse.org is served by a load-distribution system combining mirrorbrain and metalinks. I won't go into further detail here; suffice it to say that packages are downloaded using segmented downloading spread over multiple mirrors, which together make it impossible for Squid to do much caching.

The problem

Well, three problems really:

  • the openSUSE repositories are mirrored around the world, and clients are served by Mirrorbrain. Mirrorbrain does a good job of picking the most suitable IPv4 mirrors depending on your location, which presumably also spreads the load so individual mirrors aren't overloaded. However, Squid does not know that multiple mirror sites serve the same file, which makes caching ineffective at best.
  • the segmented download means a package is downloaded in pieces from multiple mirrors. This speeds up the download and makes the most of the available downstream bandwidth. The problem is that Squid can only cache whole files, not parts of files, which renders caching completely useless.
  • at the time of writing, the openSUSE mirror setup had poor IPv6 geolocation. Clients accessing it via IPv6 were redirected to arbitrary mirrors around the world, some of them very slow.

I have solved all three problems:

  • using a Squid URL rewriter, I map all the mirror locations onto a single one.
  • using a Squid logfile and a custom-written daemon, I do complete downloads of all the files that are being fetched with segmented downloading.

Summary

For anyone, an individual or a group of people, doing repeated ad-hoc installations of openSUSE (typically Factory), using this squid setup means

  • significantly faster installation due to downloads at wire speed
  • significant bandwidth savings due to a working cache
  • less load on openSUSE mirrors due to a working cache
  • zero local mirror management (assuming a working squid setup)
  • no need to worry about where to install from

Others doing, for example, repeated updates or software installations should enjoy similar benefits (once the packages have been cached).

60% faster at 6Mbit/s downstream

I run this setup primarily to save time on installations. In the office, we have a 6000/600Kbit ADSL connection. It's sufficient for most activities, but when installing openSUSE over the network, it's really a bit slow. For openSUSE 12.1, it takes about an hour to complete phase 1 of the install process - 6-7 minutes for the initial 6 system installation images, then 50 minutes for a vanilla KDE installation.

However, installing at wire speed (our LAN is 100Mbit) from the Squid cache is a lot faster, taking only 22 minutes (15 seconds for the initial 6 installation images, 21 minutes for phase 1 to complete). That is a reduction of more than 60%. With a slower internet connection, and perhaps slower mirrors too, the time saved would be even greater.
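As a quick check of the 60% figure, take the ADSL run as roughly 57 minutes (6 to 7 minutes for the images plus 50 minutes for the packages) and compare it with 22 minutes from the cache:

```latex
1 - \frac{22\,\text{min}}{(7 + 50)\,\text{min}} = 1 - \frac{22}{57} \approx 1 - 0.386 = 0.614 \approx 61\,\%
```

With the rounder estimate of one hour for the ADSL run, the reduction is 1 - 22/60, or about 63%.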

Overall/theoretical efficiency

Update 2019/06/16. Since I originally wrote this article, the internet speeds available to consumers have been improving every other month. My measly 6Mbit ADSL line is way outdated by now, but it is a good example. Your potential gains from implementing the scheme I have described here depend entirely on

  • your internet downlink speed
  • your local network speed
  • your usage patterns

In my 2012 example, I had a 6 Mbit/s downlink and 100 Mbit/s on the local Ethernet. Theoretically, that made for an improvement by a factor of about 16. Today I have 1 Gbit/s local Ethernet and a 1 Gbit/s downlink, so theoretically there is no improvement: a download from a local cache is no faster than a download from the internet.
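The theoretical factor here is simply the ratio of local speed to internet speed. Here is a small Python sketch of that ratio. It assumes downloads are the only part of an installation that changes. Unpacking and configuring packages take the same time either way, which is presumably why the measured saving at 6 Mbit/s was about 60% rather than a factor of 16. The function name is hypothetical.

```python
def theoretical_speedup(downlink_mbit: float, lan_mbit: float) -> float:
    """Ratio of cache download speed to internet download speed.

    Internet traffic also crosses the LAN, so the effective internet rate is
    capped by the slower of the two links; the cache is served at LAN speed.
    """
    if downlink_mbit <= 0 or lan_mbit <= 0:
        raise ValueError("link speeds must be positive")
    internet_rate = min(downlink_mbit, lan_mbit)
    return lan_mbit / internet_rate
```

For the two cases above, theoretical_speedup(6, 100) gives about 16.7 and theoretical_speedup(1000, 1000) gives 1.0.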

For my personal and business use, the use case has all but disappeared. Our downlink is plenty for infrequent single installs, and we maintain a public mirror too. We don't do many regular installs either, so the main saving is the reduced load on the mirrors, which is probably negligible anyway (unless thousands of users were to do local caching with this scheme).

However, in a nutshell: if a local cache is faster than your internet access, you should still see a worthwhile improvement using this scheme.

Download

For the impatient, I've tar'ed everything into a single download. It contains the daemon code, sample config files and the scripts for keeping up with the list of openSUSE mirrors. It's not as easy as just plonking another package into your openSUSE system with YaST or zypper, but the following step-by-step guide will hopefully help.

fetcher206-1.3.tar.gz

Current version is 1.3.

Change history

2016/04/19 version 1.3
Updates to work with Squid 3.4.x, added a fetcher206 systemd service file, updated jesred to 1.4. 

2016/04/15 version 1.2a
Tiny syntax error correction, no change in version#.  Thank you, hpj.

2015/05/21 version 1.2
Miscellaneous updates from the last three years. 

2012/06/04 version 1.1
Improved parsing of the mirror list

2012/05/18 version 1.0 
First public release

Step by step

Squid

The Squid web proxy is the key element in this setup, so a working Squid installation is a prerequisite. Setting up Squid is not as complicated as it may appear, but it is outside the scope of this article; you'll have to consult the Squid documentation. Whether you direct clients to the proxy with the http_proxy environment variables et al. or run a transparent proxy (like I do) is not really important.

jesred

jesred is the URL rewriter. It's fairly mature and fully functional (original webpage). I had to make a couple of changes to make it fully compatible with squid 2.7; that was version 1.3. In April 2016, I made some minor changes so that it also works with IPv6 clients; this is version 1.4:

  • jesred-1.4.tar.gz

For the moment, it is not packaged, so you'll have to build it from source:

tar xzvf <tarball>
cd jesred-1.4
make

Installation: when the build is done, copy the jesred binary into /usr/local/bin, or wherever you prefer to keep your own binaries.

The config file for jesred: /etc/squid/jesred.conf

allow = /etc/squid/redirector.acl
rules = /etc/squid/opensuse-redirect.rules
redirect_log = /var/log/squid/redirect.log
rewrite_log = /var/log/squid/rewrite.log

Using /etc/squid/redirector.acl, you can control which clients' requests the rewriter should process, but I find this easier to control with Squid's ACLs and the store_id_access directive (storeurl_access in Squid 2.7), so I enable it for all clients:

# rewrite all URLs from
0.0.0.0/0

/etc/squid/squid.conf

Configuration: add the following lines to /etc/squid/squid.conf:

store_id_program /usr/local/bin/jesred
store_id_children 5

acl metalink req_mime_type application/metalink4+xml
store_id_access deny metalink 
store_id_access allow localnet
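store_id_program passes each request URL to a helper, and the helper's answer becomes the key Squid caches the object under. Mapping every mirror's URL to one canonical URL therefore lets all mirrors share a single cache entry. In this setup, jesred does that job. To show what such a helper does, here is a minimal hypothetical stand-in in Python, not jesred itself. It assumes Squid 3.4's helper protocol without concurrency: each request line starts with the URL and is answered by "OK store-id=…" or "ERR". The two mirror patterns are examples only.

```python
import re
import sys
from typing import Optional

# Two example mirror prefixes; a real rule set would be generated from the mirror list.
MIRROR_RULES = [
    re.compile(r"^http://opensuse\.mirror\.ac\.za/opensuse/(.*)$", re.IGNORECASE),
    re.compile(r"^http://ftp\.up\.ac\.za/mirrors/opensuse/opensuse/(.*)$", re.IGNORECASE),
]
CANONICAL = "http://download.opensuse.org/"


def store_id_for(url: str) -> Optional[str]:
    """Return the canonical cache key for a mirror URL, or None if no rule matches."""
    for rule in MIRROR_RULES:
        match = rule.match(url)
        if match:
            return CANONICAL + match.group(1)
    return None


def respond(line: str) -> str:
    """Answer one helper request line: the URL first, followed by optional extras."""
    fields = line.split()
    if not fields:
        return "ERR"
    store_id = store_id_for(fields[0])
    return f"OK store-id={store_id}" if store_id else "ERR"


def main() -> None:
    """Helper loop: one request line in, one reply line out, flushed immediately."""
    for line in sys.stdin:
        sys.stdout.write(respond(line) + "\n")
        sys.stdout.flush()
```

To use it as store_id_program, the script would need to end with a call to main(). An ERR reply leaves the original URL as the cache key, which is what you want for non-mirror traffic.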

I also recommend raising maximum_object_size from its default of 4 MB to, for example, 128 MB, so that larger packages can be cached:

maximum_object_size 128 MB

fetcher206 logfile

Amend /etc/squid/squid.conf as follows:

logformat f206 %{%Y-%m-%dT%H:%M:%S}tl %Ss/%03Hs %rm %ru %mt
access_log /var/log/squid/fetch206.log f206

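This log will be read by fetcher206. Squid cannot assemble the individual segments of a segmented download into a complete file, but once it holds a complete copy, it can satisfy partial requests from its cache. fetcher206 therefore reads this log, looks for completed partial requests (HTTP status 206, hence the name), and uses wget to fetch complete copies of those files through Squid.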
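To illustrate what a consumer of this log has to do, here is a minimal Python sketch. fetcher206 itself is PHP, and this is not its code. The sketch parses one line in the f206 format above and decides whether the request was a partial (206) GET from a known mirror. The sample lines in the test are invented, and the set of mirror hosts is assumed to come from the mirror database described further down.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Set
from urllib.parse import urlsplit


class F206Entry(NamedTuple):
    timestamp: datetime
    squid_status: str  # e.g. TCP_MISS
    http_status: int   # e.g. 206
    method: str
    url: str
    mime_type: str     # e.g. application/x-rpm


def parse_f206(line: str) -> Optional[F206Entry]:
    """Parse one line written with the f206 logformat; return None if it is malformed."""
    fields = line.split()
    if len(fields) != 5:
        return None
    stamp, status, method, url, mime_type = fields
    squid_status, _, http_code = status.partition("/")
    try:
        timestamp = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        http_status = int(http_code)
    except ValueError:
        return None
    return F206Entry(timestamp, squid_status, http_status, method, url, mime_type)


def wants_full_copy(entry: F206Entry, mirror_hosts: Set[str]) -> bool:
    """True for a partial-content (206) GET served by one of the known mirror hosts."""
    host = (urlsplit(entry.url).hostname or "").lower()
    return entry.method == "GET" and entry.http_status == 206 and host in mirror_hosts
```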

To keep it from growing too big, add the following to a file in /etc/logrotate.d/:

/var/log/squid/fetch206.log {
   compress
   dateext
   maxage 365
   rotate 5
   size=+4M
   notifempty
   missingok
   create 640 squid root
   sharedscripts
   postrotate
    /etc/init.d/squid reload
   endscript
}

squid delay pool

This is an optional step. Depending on your available downstream bandwidth, you may want to restrict the bandwidth fetcher206 uses for retrieving the repository files. This prevents fetcher206 from

  • slowing down the current installation and
  • abusing the internet connection

delay_pools 1
delay_class 1 1
delay_access 1 allow localhost
delay_parameters 1 1000000/1000000

Add the above to /etc/squid/squid.conf. It defines one delay pool, used only for requests from localhost (which is where fetcher206 runs wget), with a maximum bandwidth of 1 MByte/s.

If you have other HTTP/proxy traffic originating from localhost, you could add another 127.0.0.x address and use it specifically for fetcher206.
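In delay_parameters 1 1000000/1000000, the two numbers are the restore rate in bytes per second and the maximum bucket size in bytes. So the pool refills at 1,000,000 bytes/s (about 8 Mbit/s) and never holds more than 1,000,000 bytes. A class 1 pool behaves like a single token bucket shared by all matching requests. The Python model below is a simplified sketch of that behaviour, not Squid's code. It assumes the bucket starts half full, modelled on Squid's delay_initial_bucket_level default of 50%, and that it refills in whole-second steps. The class and function names are hypothetical.

```python
class DelayBucket:
    """Simplified token-bucket model of a class 1 Squid delay pool."""

    def __init__(self, restore_bytes_per_sec: int, max_bytes: int,
                 initial_percent: int = 50) -> None:
        self.restore = restore_bytes_per_sec
        self.max = max_bytes
        # Assumed starting level, modelled on delay_initial_bucket_level.
        self.level = max_bytes * initial_percent // 100

    def tick(self, seconds: int = 1) -> None:
        """Refill for the elapsed time, never exceeding the maximum bucket size."""
        self.level = min(self.max, self.level + self.restore * seconds)

    def take(self, wanted: int) -> int:
        """Grant up to `wanted` bytes and remove them from the bucket."""
        granted = min(wanted, self.level)
        self.level -= granted
        return granted


def greedy_transfer(seconds: int, restore: int, max_bytes: int) -> int:
    """Bytes a downloader that always wants more could get over `seconds` seconds."""
    bucket = DelayBucket(restore, max_bytes)
    total = bucket.take(max_bytes)  # the initial burst from the half-full bucket
    for _ in range(seconds):
        bucket.tick(1)
        total += bucket.take(max_bytes)  # the bucket never holds more than max_bytes
    return total
```

With the settings above, greedy_transfer(10, 1_000_000, 1_000_000) is 10,500,000 bytes. After the initial burst, the pool settles at exactly the restore rate.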

mirror database

We need a current list of the available openSUSE mirrors. This can be retrieved from mirrors.opensuse.org. For the time being, I use XSL to parse the HTML page, but I hope to move to a suitably formatted list directly from MirrorBrain.

mkdir -p /var/lib/fetcher206
cp tarball/Makefile.mirrors /var/lib/fetcher206/Makefile
cp tarball/extract* /var/lib/fetcher206/
make -C /var/lib/fetcher206
cp tarball/opensuse_mirrors.cron /etc/cron.d/opensuse_mirrors

April 2016: I have not yet found a way of getting a mirror list straight from mirrorbrain.
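The rewrite rules end up in /etc/squid/opensuse-redirect.rules, the rules file named in jesred.conf, and are presumably regenerated from this mirror list. The tarball's Makefile and XSL scripts do the real work. Purely to illustrate the output, here is a hypothetical Python function that turns a list of mirror base URLs into jesred lines of the form `regexi <pattern> <replacement>`, each mapping a mirror onto download.opensuse.org. The escaping assumes jesred uses POSIX extended regular expressions, which fits the `(.*)` and `\1` syntax of its rules. That assumption has not been checked against the jesred source, and the example mirror URL in the test is illustrative.

```python
import re
from typing import Iterable, List, Optional

CANONICAL = "http://download.opensuse.org/"
# Characters with a special meaning in POSIX extended regular expressions.
ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def ere_escape(text: str) -> str:
    """Backslash-escape ERE metacharacters so a URL matches literally."""
    return "".join("\\" + ch if ch in ERE_SPECIAL else ch for ch in text)


def rules_for(mirror_bases: Iterable[str]) -> List[str]:
    """Build one 'regexi <pattern> <replacement>' line per mirror base URL."""
    rules = []
    for base in mirror_bases:
        base = base.rstrip("/") + "/"
        rules.append(f"regexi ^{ere_escape(base)}(.*)$ {CANONICAL}\\1")
    return rules


def apply_rule(rule_line: str, url: str) -> Optional[str]:
    """Try one generated rule with Python's re, which should agree with ERE for these simple patterns."""
    _, pattern, replacement = rule_line.split()
    match = re.match(pattern, url, re.IGNORECASE)
    return match.expand(replacement) if match else None
```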

reload squid

Once you've got this far, it's time to reload Squid with:

squid -k reconfigure

fetcher206

fetcher206 is, for the time being, a PHP script. Install it by simply copying it into /usr/local/bin. It has a few hard-coded options, such as the number of wget processes to run concurrently, the name of the logfile, etc.

fetcher206 comes with a systemd service unit; see the tarball.

fetcher206 needs a couple of extra PHP modules: php5-pcntl and php5-xsl.
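An earlier version of this page described the daemon's loop roughly as follows. Read the config, then repeatedly check the job queue and job list. Whenever the logfile has new data, look for TCP/206 entries from openSUSE mirrors and add them to the job list. Below is a hypothetical Python sketch of the job-queue part only. It de-duplicates URLs and keeps at most a fixed number of wget processes running, sending them through Squid via the http_proxy environment variable. The proxy address 127.0.0.1:3128 and the limit of 3 are assumptions, not fetcher206's actual settings.

```python
import os
import subprocess
from collections import deque
from typing import Deque, Dict, List, Set

PROXY = "http://127.0.0.1:3128/"  # hypothetical: Squid's default port on localhost
MAX_JOBS = 3                      # hypothetical concurrency limit


def wget_command(url: str) -> List[str]:
    """Fetch the complete file and throw it away; Squid keeps the cached copy."""
    return ["wget", "--quiet", "--output-document=/dev/null", url]


class FetchQueue:
    """De-duplicating job queue that runs at most `limit` wget processes at once."""

    def __init__(self, limit: int = MAX_JOBS) -> None:
        self.limit = limit
        self.pending: Deque[str] = deque()
        self.seen: Set[str] = set()  # grows forever; failed fetches are never retried
        self.running: Dict[str, subprocess.Popen] = {}

    def add(self, url: str) -> bool:
        """Queue a URL once; return False if it was already queued or fetched."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.pending.append(url)
        return True

    def poll(self) -> None:
        """Reap finished jobs, then start new ones up to the concurrency limit."""
        for url, proc in list(self.running.items()):
            if proc.poll() is not None:
                del self.running[url]
        env = dict(os.environ, http_proxy=PROXY)
        while self.pending and len(self.running) < self.limit:
            url = self.pending.popleft()
            self.running[url] = subprocess.Popen(wget_command(url), env=env)
```

A real daemon would call add() for each log entry that passes the 206 check and call poll() periodically. The proxy's delay pool then limits how much bandwidth these background downloads can take.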