Ebuild Version Encoding

From Funtoo Linux
Revision as of 14:52, 13 November 2011 by Wishstudio (Talk)

Jump to: navigation, search

Contents

Introduction

When implementing a database backend for portage, the versions of packages need to be directly comparable. An integer sized 64 or 128 bit is the way to go. This page describes a scheme to encode the original version strings into compacted integers.

Version Format

The format of version strings used in ebuilds can be categorized as:

version ::= <basic> <suffix>
basic   ::= <number> {'.' <number>} [char]
suffix  ::= {'_' <symbol> [number]} ['-r' <number>]
number  ::= an arbitary non-negative number
symbol  ::= one of "alpha", "beta", "rc", "pre", "p"
char    ::= 'a' to 'z'

NOTE: {'a'} means 'a' can appear none or arbitary times.

Comparison Rules

  • Basic version: Use the general comparison rules. For example: 1.0 < 1.1 < 1.1.1 < 1.2 < 1.2a < 1.3 < 2 < 2.0
  • Suffixes: _alpha < _beta < _pre < _rc < none < _p
  • Overall: If the basic versions are different, compare them. Or compare the suffixes. At last compare ebuild revisions.

Basics

We have to use a single integers to represent multiple integers plus some symbols while keep them comparable.

In this part we are focusing on the basic versions. Take 16 bit integers for explanation. Using 16 bit integer 2 and 12 have the form

0000000000000010
0000000000001100

and they are apparently comparable. What's more, even they appear in the middle of the integer:

0000001000000000
0001100000000000

Provided their LSB (least significant bit) are alligned, they are still comparable. So basically we can align the LSB of numbers to some fixed positions. If we use each 4 bit for a number we can get:

1.0      => 0001 0000 0000 0000
1.0.0    => 0001 0000 0000 0000
1.1      => 0001 0001 0000 0000
3.4.1.10 => 0011 0100 0001 1010

We put the numbers from left (most significant) to right (least significant) to give numbers the correct priority.

This method has a big limitation that the size of an integer and the number of the integers are all fixed. Neither we can't destinguish 1.0 and 1.0.0. To overcome these limitations, we can prepend a length marker before the number and use the following encoding:

Encoding Range
0xx 0~3
1xxxxx 4~31

Losing one bit for a number, now we can encode more complex versions:

1.0         => 001 000               => 0010 0000 0000 0000
1.0.0       => 001 000 000           => 0010 0000 0000 0000
1.1         => 001 001               => 0010 0100 0000 0000
1.3         => 001 011               => 0010 1100 0000 0000
1.4         => 001 100100            => 0011 0010 0000 0000
2.6.20      => 010 100110 110100     => 0101 0011 0110 1000
3.0         => 001 000               => 0010 0000 0000 0000
3.1         => 001 001               => 0010 0100 0000 0000
3.2.3.3     => 011 010 011 011       => 0110 1001 1011 0000
3.2.3.3.1   => 011 010 011 011 001   => 0110 1001 1011 0010
3.4.1.10    => 011 100100 001 101010 => 0111 0010 0001 1010 10 (too long!)

You see previous version 3.4.1.10 fail using this method. This is a trade-off for flexibility.

The length markers can also be variable-size. The point here is to use Huffman coding. We just have to make sure:

  • The length markers are comparable. In other words, markers for shorter length must be less than markers for longer length, when aligned to MSB.
  • No length marker can be a prefix of another, or we can't distinguish them.

Encoding Specification

Under construction

Considerations

Under construction

Scripts

Under construction

Personal tools
Namespaces

Variants
Actions
Categories
Toolbox
Stuff