3. debdelta-upgrade service

In June 2006 I set up a delta-upgrading framework, so that people can upgrade their Debian boxes using 'debdelta-upgrade' (which downloads package 'deltas'). This section is an introduction to the framework behind 'debdelta-upgrade', which is also used by 'cupt'. In the following, I will simplify (in places, quite a lot).

3.1. The framework

The framework is organized as follows: I run some servers where I use the program 'debdeltas' to create all the deltas, while end users use the client 'debdelta-upgrade' to download the deltas and apply them, producing the debs needed to upgrade their boxes. On my server, I mirror some repositories and then invoke 'debdeltas' to create the deltas between their versions. I use the scripts /usr/share/debdelta/debmirror-delta-security and /usr/share/debdelta/debmirror-marshal-deltas for this. This generates any delta that may be needed for upgrades in squeeze, squeeze-security, wheezy, sid and experimental, for the i386 and amd64 architectures (as of Mar 2011); the generated repository of deltas is roughly 10GB.

3.2. The goals

There are two ultimate goals in designing this framework:

  1. SMALL) reduce the size of downloads (useful for people who pay by the megabyte);

  2. FAST) speed up the upgrade.

The two goals are unfortunately only marginally compatible. For example, bsdiff can produce very small deltas, but it is quite slow (in particular with very large files); so currently (since 2009) I use 'xdelta3' as the backend diffing tool for 'debdeltas' on my server. Another example is debs that contain archives (.gz, tar.gz, etc.): I have methods and code to peek inside them, so that the deltas become smaller, but applying them gets slower.

3.3. The repository structure

The repository of deltas is just an HTTP archive, structured like the pool of packages: if foobar_1_all.deb is stored in pool/main/f/foobar/ in the repository of debs, then the delta to upgrade it to version 2 will be stored in pool/main/f/foobar/foobar_1_2_all.debdelta in the repository of deltas. Unlike the repository of debs, a repository of deltas has no indexes; see Section 3.7.2. The delta repository is at http://debdeltas.debian.net/debian-deltas.
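The mapping from a deb to its delta can be sketched as a small POSIX shell function. This is a minimal illustration, assuming deb filenames of the form NAME_VERSION_ARCH.deb (underscores never appear in Debian package names or versions, so they can be used as separators); the function name delta_path_for_deb is hypothetical and not part of debdelta.

```sh
#!/bin/sh
# Hypothetical helper: given the pool path of an old deb and a new version,
# print the path of the delta in the delta repository (same pool directory).
delta_path_for_deb() {
  local deb_path=$1 new_version=$2 dir file name rest old_version arch
  dir=${deb_path%/*}          # pool/main/f/foobar
  file=${deb_path##*/}        # foobar_1_all.deb
  file=${file%.deb}           # foobar_1_all
  name=${file%%_*}            # foobar
  rest=${file#*_}             # 1_all
  old_version=${rest%%_*}     # 1
  arch=${rest#*_}             # all
  printf '%s/%s_%s_%s_%s.debdelta\n' \
    "$dir" "$name" "$old_version" "$new_version" "$arch"
}

delta_path_for_deb pool/main/f/foobar/foobar_1_all.deb 2
# prints pool/main/f/foobar/foobar_1_2_all.debdelta
```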

3.4. The repository creation

Suppose that on 1st Mar the unstable archive contains foobar_1_all.deb (in pool/main/f/foobar/); then on 2nd Mar foobar_2_all.deb is uploaded; but this has a flaw (e.g. FTBFS), so on 3rd Mar foobar_3_all.deb is uploaded. On 2nd Mar, the delta server generates pool/main/f/foobar/foobar_1_2_all.debdelta. On 3rd Mar, the server generates both pool/main/f/foobar/foobar_1_3_all.debdelta and pool/main/f/foobar/foobar_2_3_all.debdelta. So, if the end user Ann upgrades her system on both 2nd and 3rd Mar, she uses foobar_1_2_all.debdelta (on 2nd Mar) and foobar_2_3_all.debdelta (on 3rd Mar). If the end user Boe did not upgrade his system on 2nd Mar, and he upgrades on 3rd Mar, then he uses foobar_1_3_all.debdelta.
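The naming in this example can be sketched as two hypothetical shell helpers: the server emits one direct delta from every older version it still tracks to the new one, while the client always asks for the single delta from its installed version to the candidate version. This is an illustration under those assumptions, not code taken from debdelta.

```sh
#!/bin/sh
# deltas_to_generate NAME ARCH NEW_VERSION OLD_VERSION...
# Server side (hypothetical): one direct delta per older version still known.
deltas_to_generate() {
  local name=$1 arch=$2 new=$3 old
  shift 3
  for old in "$@"; do
    printf '%s_%s_%s_%s.debdelta\n' "$name" "$old" "$new" "$arch"
  done
}

# delta_for_client NAME ARCH INSTALLED_VERSION CANDIDATE_VERSION
# Client side (hypothetical): always the direct delta installed -> candidate.
delta_for_client() {
  printf '%s_%s_%s_%s.debdelta\n' "$1" "$3" "$4" "$2"
}
```

On 3rd Mar, 'deltas_to_generate foobar all 3 1 2' prints foobar_1_3_all.debdelta and foobar_2_3_all.debdelta; Boe's client then needs 'delta_for_client foobar all 1 3', i.e. foobar_1_3_all.debdelta, while Ann's needs 'delta_for_client foobar all 2 3'.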

3.5. size limit

Note that currently the server rejects deltas that exceed 70% of the deb size: the size gain would be too small, and once you add the time to download the delta to the time to apply it, time would be wasted rather than saved (OK, downloading and applying run in parallel as much as possible, yet ...).

Also, the server does not generate deltas for packages that are smaller than 10KB.
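The two thresholds above can be written as shell predicates. This sketch assumes that sizes are in bytes and that "10KB" means 10240 bytes; the variable and function names are hypothetical, and the real checks in 'debdeltas' may be implemented differently.

```sh
#!/bin/sh
# Thresholds from this section (10KB taken as 10*1024 bytes: an assumption).
MAX_DELTA_PERCENT=70
MIN_DEB_BYTES=10240

# worth_generating DEB_BYTES: succeeds if the deb is large enough for a delta.
worth_generating() {
  [ "$1" -ge "$MIN_DEB_BYTES" ]
}

# delta_accepted DEB_BYTES DELTA_BYTES: succeeds if the delta is at most 70%
# of the deb; comparing delta*100 with deb*70 keeps to integer arithmetic.
delta_accepted() {
  [ $(( $2 * 100 )) -le $(( $1 * MAX_DELTA_PERCENT )) ]
}
```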

3.6. /etc/debdelta/sources.conf

Consider a package that is currently installed. It is characterized by its name, installed version and architecture (unfortunately there is no way to tell which archive it came from, but this does not currently seem to be a problem). Suppose now that a newer version is available in some archive, and that the user wishes to upgrade to that version. The archive's Release file contains this information: "Origin", "Label", "Site", "Archive" (note that Archive is called Suite in the Release file). Example for the security archive:


	Origin=Debian
	Label=Debian-Security
	Archive=stable
	Site=security.debian.org
      
Given the above information, the file /etc/debdelta/sources.conf determines the host that should contain the delta for upgrading the package; this location is called "delta_uri" in that file. The complete URL for the delta is built by appending to the delta_uri a directory path that mimics the "pool" structure used in Debian archives, followed by a filename of the form name_oldversion_newversion_architecture.debdelta. All this is implemented in the example script contrib/findurl.py. If the delta is not available at that URL, but name_oldversion_newversion_architecture.debdelta-too-big is, then the delta is too big to be useful. If neither is present, then either the delta has not been generated yet, or it never will be; unfortunately, this is difficult to know.
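The lookup rule in the last sentences can be sketched as follows, assuming curl is available and that the delta URL has already been built (see Section 3.7.2). The helpers url_exists and delta_status are hypothetical; keeping the HTTP probe in its own function makes the decision logic easy to test without a network.

```sh
#!/bin/sh
# url_exists URL: HEAD request; -f makes curl fail on HTTP errors such as 404.
url_exists() {
  curl -sfI -o /dev/null "$1"
}

# delta_status DELTA_URL: prints "available", "too-big" or "unknown".
delta_status() {
  if url_exists "$1"; then
    echo available
  elif url_exists "$1-too-big"; then
    echo too-big
  else
    echo unknown   # not generated yet, or never will be
  fi
}
```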

3.7. indexes

3.7.1. indexes of debs in APT

Let's start by examining the situation for debs and APT. Using indexes for debs is a no-brainer: the client (i.e. the end user) does not know the list of debs available on the server and, even knowing the current list, cannot foresee future changes. So indexes provide the needed information: package descriptions, versions, dependencies, etc.; this information is used by apt and the other frontends.

3.7.2. no indexes of deltas in debdelta

If you then think of deltas, you realize that none of the requirements above applies. Firstly, there are no descriptions and no dependencies for deltas. [1] Of course 'debdelta-upgrade' needs some information to determine whether a delta exists, and to download it; but this information is already available:


	      the name of the package P
	      the old version  O
	      the new version  N
	      the architecture A
	    
Once these are known, the URL of the file F can be determined algorithmically as URI/POOL/P_O_N_A.debdelta, where URI is determined from /etc/debdelta/sources.conf and POOL is the directory in the pool of the package P. This algorithm is also implemented (quite verbosely) in contrib/findurl.py in the sources of debdelta. This is why there is currently no "index of deltas", and nonetheless 'debdelta-upgrade' works fine (and so does 'cupt'). Adding an index of deltas would only increase downloads (in time and size) and disk usage, with negligible benefit, if any.
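The URL algorithm can be sketched in shell as well. This sketch assumes the standard Debian pool layout (the directory is named after the source package, under a prefix made of its first letter, or of its first four letters for names starting with "lib"), assumes the component is main unless given, and ignores version epochs; pool_dir and delta_url are hypothetical names, and contrib/findurl.py remains the reference.

```sh
#!/bin/sh
# pool_dir SOURCE [COMPONENT]: e.g. foobar -> pool/main/f/foobar,
# libfoo -> pool/main/libf/libfoo (Debian pool layout).
pool_dir() {
  local src=$1 comp=${2:-main} prefix
  case $src in
    lib?*) prefix=$(printf '%s' "$src" | cut -c1-4) ;;
    *)     prefix=$(printf '%s' "$src" | cut -c1) ;;
  esac
  printf 'pool/%s/%s/%s\n' "$comp" "$prefix" "$src"
}

# delta_url URI SOURCE P O N A [COMPONENT]: builds URI/POOL/P_O_N_A.debdelta.
delta_url() {
  local uri=$1 src=$2 p=$3 o=$4 n=$5 a=$6 comp=${7:-main}
  printf '%s/%s/%s_%s_%s_%s.debdelta\n' \
    "${uri%/}" "$(pool_dir "$src" "$comp")" "$p" "$o" "$n" "$a"
}
```

With the delta repository of Section 3.3, 'delta_url http://debdeltas.debian.net/debian-deltas foobar foobar 1 2 all' prints http://debdeltas.debian.net/debian-deltas/pool/main/f/foobar/foobar_1_2_all.debdelta.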

3.8. no incremental deltas

Let me add another point that may be unclear. There are no incremental deltas (and IMHO never will be).

3.8.1. What "incremental" would be, and why it is not

Please recall Section 3.4. What currently does not happen is the following: on 3rd Mar, Boe decides to upgrade and invokes 'debdelta-upgrade'; then 'debdelta-upgrade' finds foobar_1_2_all.debdelta and foobar_2_3_all.debdelta, uses the former to generate foobar_2_all.deb, and in turn uses this and the latter delta to generate foobar_3_all.deb. This is not implemented, and it will not be, for the following reasons.

  • The delta size is, on average, 40% of the size of the deb (and this is getting worse, for various reasons, see Section 5.2); so two deltas are 80% of the target deb, and this is too much.

  • It takes time to apply a delta; applying two deltas to produce one deb takes too much time.

  • The server does generate the direct delta foobar_1_3_all.debdelta :-) so why make things complex when they are easy? :-)

  • Note also that incremental deltas would need some index system to be implemented: indeed, on 3rd Mar Boe would have no way to know that the intermediate version of foobar between "1" and "3" is "2"; but since incremental deltas do not exist, there is no need for indexes.
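The size argument in the first bullet is simple arithmetic. As a sketch, assuming consecutive versions of foobar have roughly the same deb size, the download cost of a chain of deltas is the sum of their percentages, which can be compared with the 70% cap of Section 3.5; the helper chain_percent is hypothetical.

```sh
#!/bin/sh
# chain_percent PERCENT...: total download for a chain of deltas, as a
# percentage of the target deb (assumes all debs in the chain are similar in size).
chain_percent() {
  local total=0 p
  for p in "$@"; do
    total=$((total + p))
  done
  echo "$total"
}
```

'chain_percent 40 40' prints 80, above the 70% cap, whereas a single direct delta is, on average, 40%.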

3.9. Repository howto

There are (at least) two ways to manage a repository and run a server that creates the deltas.

3.9.1. debmirror --debmarshal

The first way is what I currently use. It is implemented in the script /usr/share/debdelta/debmirror-marshal-deltas (a simpler version, much more primitive but more readable, is /usr/share/debdelta/debmirror-delta-security). Currently I use the complex script to create deltas for amd64 and i386, and for lenny, squeeze, sid and experimental; and the simpler one for lenny-security. Let me start by outlining how the simple script generates deltas. It is a 3-step process. Let's say that $secdebmir is the directory containing the mirror of the repository security.debian.org.

  1. --- 1st step: save the current package lists
	# make a copy of the current stable-security lists of packages
	olddists="${TMPDIR:-/tmp}/oldsecdists-$(date +'%F_%H-%M-%S')"
	mkdir "$olddists"
	cp -a "$secdebmir/dists" "$olddists"

  2. --- 2nd step: call 'debmirror' to update the mirror; note that I apply a patch to debmirror so that old debs are not deleted, but moved to an /old_deb directory.

  3. --- 3rd step: call 'debdeltas' to generate deltas from the state of packages in $olddists to the current state in $secdebmir, and also with respect to what is in stable. Note that, for any package that was deleted from the archive, 'debdeltas' will go fishing for it inside /old_deb.
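The fallback in the 3rd step can be sketched as a lookup that tries the mirror pool first and then the directory where the patched debmirror moves old debs. This assumes that directory is old_deb/ at the top of the mirror, with a flat layout; the helper find_old_deb is hypothetical and is not how 'debdeltas' itself searches.

```sh
#!/bin/sh
# find_old_deb MIRROR POOLDIR NAME VERSION ARCH
# Prints the path of NAME_VERSION_ARCH.deb, looking first in the pool and
# then in MIRROR/old_deb (assumed flat); returns 1 if it is in neither.
find_old_deb() {
  local mirror=$1 pooldir=$2 file="${3}_${4}_${5}.deb" candidate
  for candidate in "$mirror/$pooldir/$file" "$mirror/old_deb/$file"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```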

The more complex script uses the new 'debmirror --debmarshal', so it keeps 40 old snapshots of the deb archives, and it generates deltas to the current package version (the "new" version) from the versions in snapshots -10, -20, -30 and -40.
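The choice of old snapshots can be sketched as follows, assuming the snapshots are passed ordered from oldest to newest, with the last argument being the current state; the helper pick_snapshots is hypothetical and the actual script may select snapshots differently.

```sh
#!/bin/sh
# pick_snapshots SNAPSHOT...: prints the snapshots 10, 20, 30 and 40 steps
# before the last (current) one, skipping offsets that do not exist yet.
pick_snapshots() {
  local count=$# off idx i s
  for off in 10 20 30 40; do
    idx=$((count - off))
    [ "$idx" -ge 1 ] || continue
    i=0
    for s in "$@"; do
      i=$((i + 1))
      if [ "$i" -eq "$idx" ]; then
        printf '%s\n' "$s"
        break
      fi
    done
  done
}
```

With 41 snapshots named 1 to 41, 'pick_snapshots $(seq 1 41)' prints 31, 21, 11 and 1, one per line; with fewer snapshots, the missing offsets are simply skipped.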

3.9.2. hooks and repository of old_debs

I wrote the skeleton of some commands.

debdelta_repo [--add name version arch filename disttoken]

This first one is to be called by the archive management tool (e.g. DAK) when a new package enters a part of the archive (let's say, package="foobar" version="2" arch="all" and filename="pool/main/f/foobar/foobar_2_all.deb" just entered disttoken="testing/main/amd64"). That command will add it to a delta queue, so that the appropriate deltas will be generated; this command returns almost immediately.

debdelta_repo [--delta]

This creates all the deltas.

debdelta_repo [--sos filename]

This will be called by DAK when (actually, before) it deletes a package from the archive; this command will save that old deb somewhere (indeed, it may be needed to generate deltas at some time in the future). (It will be up to some piece of debdelta_repo code to manage the repository of old debs, and delete excess copies.)
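The three entry points can be sketched with a plain text file as the delta queue. Everything here is hypothetical: the function names, the DELTA_QUEUE and OLD_DEBS_DIR variables and their default paths are illustrative, and the actual delta generation is left to a callback rather than a real 'debdeltas' invocation.

```sh
#!/bin/sh
# Illustrative locations (hypothetical defaults, overridable from the environment).
: "${DELTA_QUEUE:=/var/tmp/debdelta-queue}"
: "${OLD_DEBS_DIR:=/var/tmp/debdelta-old-debs}"

# --add NAME VERSION ARCH FILENAME DISTTOKEN: append one line, return at once.
repo_add() {
  printf '%s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5" >> "$DELTA_QUEUE"
}

# --delta CALLBACK: move the queue aside, then hand each entry to CALLBACK
# (which would run the real delta generation).
repo_delta() {
  local cb=$1 work="$DELTA_QUEUE.work" name version arch filename disttoken
  [ -s "$DELTA_QUEUE" ] || return 0
  mv "$DELTA_QUEUE" "$work"
  while read -r name version arch filename disttoken; do
    "$cb" "$name" "$version" "$arch" "$filename" "$disttoken"
  done < "$work"
  rm -f "$work"
}

# --sos FILENAME: keep a copy of a deb that is about to leave the archive.
repo_sos() {
  mkdir -p "$OLD_DEBS_DIR"
  cp -p "$1" "$OLD_DEBS_DIR/"
}
```

Moving the queue aside before processing it means that, in the common case, entries added by '--add' during a run go to a fresh queue file instead of the one being read.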

TODO: that skeleton does not handle 'security', where some old versions of the packages are in a different DISTTOKEN.

Notes

[1]

Deltas have an "info" section, but it is, so to speak, standalone.