Difference between revisions of "How to cache openSUSE repositories with Squid"

From Perswiki

Revision as of 10:01, 12 November 2011

Summary: How to set up a local squid web cache that works with openSUSE repositories and the openSUSE network installation process. In effect, a fully autonomous local mirror.

Background

I do quite a lot of testing of openSUSE, and more and more often, I install over the network. I used to keep the SuSE Linux and openSUSE DVDs available over NFS on a local server, but over the last couple of years, I've been working a lot more with Factory and the regular snapshots that lead up to a final/gold release. With those it is much easier to just point the installation process at the right URL.

Especially when testing installation or new hardware, I (and/or my colleagues) end up repeating the process many times on different machines: sometimes virtual machines, sometimes desktops, sometimes servers in our downstairs datacentre. A local copy of the repository would seem to be the right thing, but it does require a tiny bit of manual interaction (specifying the URL when installing). Instead I thought of using squid - managing a local cache of downloaded objects is exactly what squid is good at. With this, installing would just be a matter of:

a) download the NET iso
b) copy it onto a USB stick
c) go and boot the machine.

Alas, squid and the openSUSE network installation process don't work together very well. Not out-of-the-box anyway. The repository at download.opensuse.org is served by a load-distribution system combining MirrorBrain and metalinks. For the moment I won't go into any further detail; suffice it to say that this means packages are downloaded using segmented downloading spread over multiple mirrors, which makes it impossible for squid to do much caching.

The problem

Well, two problems really:

  • the openSUSE repository is mirrored around the world. MirrorBrain does a good job of picking the most suitable mirrors depending on your location, which also spreads the load so that individual mirrors aren't overloaded. However, squid does not know that multiple mirror sites serve the same file, so the same package fetched from two different mirrors is cached as two unrelated objects, and caching is mostly ineffective.
  • the segmented download means a package is downloaded in bits from multiple mirrors. This is good for speeding up the download and making good use of the available downstream bandwidth. The problem is that squid is only able to cache whole files, not bits of files, so again caching is rendered ineffective.
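
To make the two problems concrete, here is a small toy model in Python. It is only a sketch and not how squid is implemented: it assumes a cache that stores whole objects keyed by the full URL and never stores ranged (partial) requests. The mirror host names and the package path are made up for illustration. The last function keys the cache on the path alone, purely for contrast, to show what the cache could achieve if it knew the mirrors serve the same file.

```python
from urllib.parse import urlsplit


class UrlKeyedCache:
    """Toy cache: stores whole objects keyed by the exact string it is given."""

    def __init__(self):
        self.store = set()
        self.hits = 0
        self.misses = 0

    def fetch(self, key, byte_range=None):
        # Assumption of this model: a ranged (partial) request is passed
        # through to the origin and never stored, so it always counts as a miss.
        if byte_range is not None:
            self.misses += 1
            return
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store.add(key)


# Hypothetical mirror hosts and package path, for illustration only.
MIRRORS = ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]
PATH = "/repo/oss/suse/noarch/example-1.0.noarch.rpm"


def simulate_mirrors(installs):
    """Problem 1: each install is redirected to a different mirror."""
    cache = UrlKeyedCache()
    for i in range(installs):
        host = MIRRORS[i % len(MIRRORS)]
        cache.fetch(f"http://{host}{PATH}")
    return cache.hits, cache.misses


def simulate_segments(installs, segments=4, segment_size=1000):
    """Problem 2: each install fetches the file in ranged pieces."""
    cache = UrlKeyedCache()
    for _ in range(installs):
        for s in range(segments):
            host = MIRRORS[s % len(MIRRORS)]
            start = s * segment_size
            cache.fetch(f"http://{host}{PATH}",
                        byte_range=(start, start + segment_size - 1))
    return cache.hits, cache.misses


def simulate_path_keyed(installs):
    """Contrast: whole-file downloads, keyed on the path instead of the full URL."""
    cache = UrlKeyedCache()
    for i in range(installs):
        host = MIRRORS[i % len(MIRRORS)]
        cache.fetch(urlsplit(f"http://{host}{PATH}").path)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("different mirrors  (hits, misses):", simulate_mirrors(3))
    print("segmented download (hits, misses):", simulate_segments(3))
    print("keyed on path      (hits, misses):", simulate_path_keyed(3))
```

With three installs, the first two simulations never produce a cache hit, while the path-keyed variant misses only on the first install.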