http://invisible-island.net/autoconf/
Copyright © 2014–2015,2016 by Thomas E. Dickey
There is no standard version of the tar program. This may surprise those who, either because tar is available “everywhere” or because they have read comments to the contrary, suppose that it must be a standard. In POSIX (since 2001), the equivalent of tar is the pax program. As noted in the rationale:
The pax utility was new for the ISO POSIX-2:1993 standard. It represents a peaceful compromise between advocates of the historical tar and cpio utilities.
Arnold Robbins (in Unix in a Nutshell) gives more detail:
pax [options] [patterns]
Portable Archive Exchange program. When members of the first POSIX 1003.2 working group could not standardize on either tar or cpio, they invented this program. (See also cpio and tar.)
GNU/Linux and Mac OS X use almost identical versions of pax, developed by the OpenBSD team, based on the original freely available version by Keith Muller.
I used (and preferred) cpio starting in early 1986, when I wrote sccs_tools. My project had about twenty tape cartridges storing snapshots of the project's sources. The cpio program was used for writing and reading those tapes. I continued to use cpio for my own backups when I started development with Linux early in 1994. Here is a fragment from my backup script from May 1994:
cpio --verbose --reset-access-time --format=ustar -B -o -O $DST
The nice thing about cpio was that it accepts a list of pathnames from its standard input. Sadly, cpio was not prevalent on the systems where I was developing at that time, and I began to rely upon tar. Unlike cpio, tar requires its pathnames to be given on the command-line, limiting its use of standard input/output to the actual data being processed. Aside from doing my backups, I also exchange data with others in tar-files. The compelling reason for using tar is that (unlike cpio) if I provide a tar-file to others, they are likely to have a program to read it.
Unlike cpio, there were several implementations of tar. Others have tabulated differences (I will not summarize those here).
Usually tar is just a “given”, used for distributing and receiving a set of files.
The lynx web browser uses tar in its menu of file operations. When I first started working with lynx, the program pathname (and options) were compiled-in, using constants. I modified this in stages:
- Initially, in 1997 (see CHANGES2.8), tar was simply one of several programs for which I chose to compile-in full pathnames, to match the previous hand-crafted makefiles as well as to ensure that a specific program was run, rather than just any program named “tar”:
1997-04-02
* refine CF_PATH_PROG to allow for machines that haven't the given programs, by using only the program name and added configure option --disable-full-paths to enforce this behavior. - TD
1997-03-23
* Add autoconf tests for paths of programs, including sendmail vs mmdf - TD
- In 2002, I made the compiled-in pathnames easier to configure by adding settings in lynx.cfg which could override the compiled-in pathnames.
2002-12-01 (2.8.5dev.11)
* document xxx_PATH variables in lynx.cfg -TD
That included TAR_PATH.
- Early in 2004, I extended the check for tar to include similar programs (referring to CHANGES):
2004-01-28 (2.8.5pre.4)
* modify configure check for tar to test several common variants including star, modify makefile.in to use the configured 'tar' program (request by FLWM) -TD
The last step addressed the more common tar (or pax!) variants. Here is the configure check which I wrote:
dnl CF_TAR_OPTIONS version: 1 updated: 2004/01/26 20:58:41
dnl --------------
dnl This is just a list of the most common tar options, allowing for variants
dnl that can operate with the "-" standard input/output option.
AC_DEFUN([CF_TAR_OPTIONS],
[
case ifelse($1,,tar,$1) in
*pax)
    TAR_UP_OPTIONS="-w"
    TAR_DOWN_OPTIONS="-r"
    TAR_PIPE_OPTIONS=""
    TAR_FILE_OPTIONS="-f"
    ;;
*star)
    TAR_UP_OPTIONS="-c -f"
    TAR_DOWN_OPTIONS="-x -U -f"
    TAR_PIPE_OPTIONS="-"
    TAR_FILE_OPTIONS=""
    ;;
*tar)
    # FIXME: some versions of tar require, some don't allow the "-"
    TAR_UP_OPTIONS="-cf"
    TAR_DOWN_OPTIONS="-xf"
    TAR_PIPE_OPTIONS="-"
    TAR_FILE_OPTIONS=""
    ;;
esac
AC_SUBST(TAR_UP_OPTIONS)
AC_SUBST(TAR_DOWN_OPTIONS)
AC_SUBST(TAR_FILE_OPTIONS)
AC_SUBST(TAR_PIPE_OPTIONS)
])dnl
It supplements this chunk:
CF_PATH_PROG(TAR, tar, pax gtar gnutar bsdtar star)
CF_TAR_OPTIONS($TAR)
AC_DEFINE_UNQUOTED(TAR_UP_OPTIONS, "$TAR_UP_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_DOWN_OPTIONS, "$TAR_DOWN_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_FILE_OPTIONS, "$TAR_FILE_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_PIPE_OPTIONS, "$TAR_PIPE_OPTIONS")
With these parameters of “tar” it was possible to rework some hardcoded command-lines to un-tar files which were downloaded by lynx, e.g., (and simplifying):
gzip -dc filename.tar.gz | $TAR_PATH $TAR_DOWN_OPTIONS $TAR_PIPE_OPTIONS
That worked well enough, but there were a few trouble-spots.
The configure check assumes too much about the option syntax, by basing the available options on the tar program name. It would be possible to improve on this by testing the program against known useful options.
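Testing the program rather than its name could be done by round-tripping a small file through a pipe with each candidate option set. The following is only a sketch of that idea, not code from lynx's configure script: the function name probe_tar is hypothetical, the two candidate pairs are simply the tar-style and pax-style options from CF_TAR_OPTIONS, and mktemp is assumed to be available.

```sh
# probe_tar PROG
# Hypothetical helper (not part of lynx): archive one file to a pipe and
# extract it in another directory, once per candidate option pair.
# Prints "UP_OPTIONS|DOWN_OPTIONS" for the first pair that round-trips.
probe_tar() {
    prog=$1
    work=$(mktemp -d) || return 1
    mkdir "$work/src" || return 1
    echo "sample" > "$work/src/sample.txt"
    found=1
    # Candidates: tar-style keys with "-" for the pipe, then pax-style.
    for pair in "-cf -|-xf -" "-w|-r"; do
        up=${pair%%|*}
        down=${pair#*|}
        rm -rf "$work/dst"
        mkdir "$work/dst"
        # $up and $down are deliberately unquoted so they split into words.
        (cd "$work/src" && "$prog" $up sample.txt 2>/dev/null) |
            (cd "$work/dst" && "$prog" $down >/dev/null 2>&1)
        if cmp -s "$work/src/sample.txt" "$work/dst/sample.txt" 2>/dev/null; then
            echo "$pair"
            found=0
            break
        fi
    done
    rm -rf "$work"
    return $found
}
```

A configure macro could call such a function with the program found by CF_PATH_PROG and fall back to the name-based guess when nothing is printed.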
The check does not concern itself with the ownership of files which are extracted from the tar archive. Lynx disables setuid operation, but could be run by the root user.
A configure script cannot be counted on to run as root, and cannot test whether a tar program requires some special option to preserve file ownership.
SVR4 tar on AIX, HPUX, Solaris documents these options, with some variations.
I omit an unrelated paragraph from the “o” option for brevity:
- o
- When o is used for reading, it causes the extracted file to take on the user and group IDs of the user running the program rather than those on the tape. This is the default for the ordinary user and can be overridden, to the extent that system protections allow, by using the p function modifier.
- p
- Cause file to be restored to the original modes and ownerships written on the archive, if possible. This is the default for the superuser, and can be overridden by the o function modifier. If system protections prevent the ordinary user from executing chown(), the error is ignored, and the ownership is set to that of the restoring process (see chown(2)). The set-user-id, set-group-id, and sticky bit information are restored as allowed by the protections defined by chmod() if the chown() operation above succeeds.
The same options were documented in SunOS 4 tar (with fewer words, of course):
o Suppress information specifying owner and modes of directories which tar normally places in the archive. Such information makes former versions of tar generate an error message like: filename/: cannot create when they encounter it.
p Restore the named files to their original modes, ignoring the present umask(2V). SetUID and sticky information are also extracted if you are the super-user. This option is only useful with the x key letter.
Date: Sun, 13 Jul 1997 01:24:55 +1000
From: David Dawes <dawes@rf900.physics.usyd.edu.au>
To: devel@XFree86.Org
Subject: Extract utility (was: Re: missing 'p' flag for tar in RELNOTES)

On Fri, Jun 06, 1997 at 10:03:16PM +0200, Matthieu Herrb wrote:
>David Dawes wrote (in a message from Fri 6)
> >
> > Not all versions of tar require the 'p' flag for this. Gnu tar for
> > example doesn't require this. Neither does the 'tar' that comes with
> > Solaris 2.5 (in spite of what the man page implies). Which tar does
> > OpenBSD use?
>
>A modified pax.
>
> > Is using OpenBSD's cpio a better option
> > (if it knows how to extract tar archives)?
>
>it's based on pax too, but it does preserve the file modes on
>extraction, so it's indeed better.
>
> > I'm more and more coming to the conclusion that we should provide an
> > 'extract' binary for each OS that people can use to unpack the .tgz
> > files in a reliable way. I would currently see this as being say GNU
> > tar, with the --unlink flag that some BSD versions have added included
> > and enabled by default, and modified to use zlib to avoid the need for
> > a separate gzip binary.
>
>Yes. For example OpenBSD's pax based commands can't read some tarballs
>made by GNU tar.

I've done some work on this, and I have something which we can hopefully use for 3.3.1. It is gnu tar 1.12, with support added to make use of zlib so that it is self-contained. When run as "extract" it sets the -x, -z and --unlink-first flags, and accepts multiple .tgz files on the command line. The -t flag can be used to override -x and list the contents. When run under any other name, it behaves like tar.

The code for this is available as utils-1.0.0.tgz in the beta directory. Can those who build binary distributions please check that it compiles and works OK. If there are any problems, let me know. Building it should only require running 'make' from the utils directory.

>That's the reason for which I didn't contribute back my buid-bindist
>scripts for 3.3. This has forced me to use one of the pax based
>commands. Unfortunatly none of them have the equivalent of the GNU tar
>'--exclude-from' option, so I had to build explicit lists of files to
>include in each tarball.

This binary can be used (under the name gnu-tar) to build the bindists. In fact, it is probably best to use this one so that compatibility problems are avoided.

David
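Since configure cannot settle the ownership question, a program could decide at run time instead. The sketch below takes the SVR4 description of the o modifier quoted above at face value: when the caller is root, it adds o so that extracted files belong to the user running the program. The function name down_options is hypothetical, and whether a given tar accepts o in this position is an assumption that would need checking for each implementation.

```sh
# down_options UID
# Hypothetical helper: print extraction options in the style of
# TAR_DOWN_OPTIONS, adding the SVR4 "o" modifier for the superuser so
# that ownership recorded in the archive is not restored.
down_options() {
    if [ "$1" -eq 0 ]; then
        echo "-xof"
    else
        echo "-xf"
    fi
}

# Possible use (assumes a tar which accepts these options):
#   gzip -dc filename.tar.gz | tar "$(down_options "$(id -u)")" -
```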
Beyond the parameterization, lynx's extraction of files from an archive is simplistic: it assumes that there are no errors. In practice, extraction could fail for any of several reasons. The most interesting one is due to tar-file format differences, e.g., in the way excessively long pathnames are stored.
Although POSIX documented (with pax) a scheme for storing long filenames in 1989, it was not until the mid-1990s that things started to settle. Not everyone got on board at the same time.
For instance, Ant's documentation for the tar task says:
Early versions of tar did not support path lengths greater than 100 characters. Over time several incompatible extensions have been developed until a new POSIX standard was created that added so called PAX extension headers (as the pax utility first introduced them) that among another things addressed file names longer than 100 characters. All modern implementations of tar support PAX extension headers.
Ant's tar support predates the standard with PAX extension headers, it supports different dialects that can be enabled using the longfile attribute. If the longfile attribute is set to fail, any long paths will cause the tar task to fail. If the longfile attribute is set to truncate, any long paths will be truncated to the 100 character maximum length prior to adding to the archive. If the value of the longfile attribute is set to omit then files containing long paths will be omitted from the archive. Either option ensures that the archive can be untarred by any compliant version of tar.
For more detailed information on Ant, see the documentation on The TAR package.
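The 100-character limit mentioned in Ant's documentation is easy to check for before creating an archive. This is a minimal sketch, assuming one pathname per line on standard input; the function name long_paths is made up for this illustration, and awk's length counts characters in the current locale rather than bytes.

```sh
# long_paths [LIMIT]
# Read pathnames (one per line) from standard input and print those
# longer than LIMIT characters (default 100, the limit of early tar).
long_paths() {
    awk -v limit="${1:-100}" 'length($0) > limit'
}

# Possible use: list the pathnames which an old tar could not store.
#   find . -print | long_paths
```

An empty result means that every pathname fits within the limit, so none of Ant's longfile dialects would be needed.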
The interesting tar variants are, of course, those whose behavior I can inspect and compare at different points in time. That equates to saying that I can read the source code.
I have access to a few Unix systems for comparison (AIX 5-7, HPUX 11, Solaris 8-11). Because source is not generally available, there is not much to say.
Illumos (descendant of OpenSolaris) has tar source (and cpio source) in its GitHub repository.
Interestingly enough, it started as 4.3 BSD tar:
/*
 * Copyright (c) 1988, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2012 Milan Jurik. All rights reserved.
 * Copyright 2015 Joyent, Inc.
 */
/* Copyright (c) 1983, 1984, 1985, 1986, 1987, 1988, 1989 AT&T */
/* All Rights Reserved */
/* Copyright (c) 1987, 1988 Microsoft Corporation */
/* All Rights Reserved */
/*
 * Portions of this source code were derived from Berkeley 4.3 BSD
 * under license from the Regents of the University of California.
 */
For what it's worth, the cpio source also uses BSD code and has similar copyrights:
/*
 * Copyright (c) 1988, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2012 Milan Jurik. All rights reserved.
 * Copyright (c) 2012 Gary Mills
 */
/* Copyright (c) 1983, 1984, 1985, 1986, 1987, 1988, 1989 AT&T */
/* All Rights Reserved */
/*
 * Portions of this source code were derived from Berkeley 4.3 BSD
 * under license from the Regents of the University of California.
 */
The earliest sources I have at hand for tar are
- ansitar published on net.sources at the beginning of July 1983.
- 3BSD tar (January 3, 1980).
3BSD tar is more interesting than ansitar, because the latter works only for tapes, not files. Also, ansitar uses a different header format.
Some of the BSD source code was reportedly AT&T source code, but that is not apparent because AT&T neglected to mark its sources. In reading the BSD source for tar and its manual page, there is no copyright notice applied until 1986 (for the 4.3BSD source code) and 1990 (for the manual page). That notice is not AT&T's:
/*
 * Copyright (c) 1980 Regents of the University of California.
 * All rights reserved. The Berkeley software License Agreement
 * specifies the terms and conditions for redistribution.
 */
The successive releases from CSRG are clearly related (1980 through 1990).
A new implementation (part of pax) by Keith Muller was introduced after that (seen in 4.4BSD-Lite):
/*-
 * Copyright (c) 1992 Keith Muller.
 * Copyright (c) 1992, 1993
 * The Regents of the University of California. All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Keith Muller of the University of California, San Diego.
Outside the BSD sources, there was another tar implementation. You may find a copy in DECUS as "posixtar" (July 9, 1987):
/*
 * A public domain tar(1) program.
 *
 * Written by John Gilmore, ihnp4!hoptoad!gnu, starting 25 Aug 85.
 *
 * @(#)tar.c 1.21 10/29/86 Public Domain - gnu
 */
This happens to be the same version that John Gilmore posted to mod.sources volume 7 as v07i088: Public-domain TAR program (1986/12/10).
It is likely the version fetched by Stallman as the basis for GNU tar.
A quick check indicates that Gilmore wrote this shortly after leaving Sun:
- comp.unix.wizards newsgroup posting in February 1988:
Here's my two cents on the issue (disclaimer: I was emp #5 of Sun, though I've been gone more than two years). DEC, HP, Apollo, etc were happy with AT&T controlling Unix when it was clear AT&T was not a competitive threat. AT&T's inability to sell computers is legendary. In a tighter partnership with Sun, AT&T might actually be able to make money at computers, which would give the protesters a major competitor rather than a pussycat.
- According to The Internet: Biographies (Hilary W. Poole, Laura Lambert, Chris Woodford, Christos J. P. Moschovitis ABC-CLIO, 2005):
In 1985, Gilmore left Sun with $10,000 in his pocket, a Sun workstation, and significant stock holdings in the company.
It is mentioned in the BACKLOG file for GNU tar 1.12:
1. ....-..-.. John Gilmore: Re: I'm writing a public domain -tar-
2. 1985-09-14 Richard M. Stallman: I'm writing a public domain -tar-
3. 1985-12-03 John Gilmore: Re: tar
4. ....-..-.. David C. Anderson: Re: tar
5. 1986-10-31 John Gilmore: Re: wanted: a VMS program to write UNIX tar tapes
6. 1986-12-22 Richard M. Stallman: I got the tar
7. 1987-02-14 John Gilmore: Re: tar
8. 1987-12-15 Brian Reid: (none)
9. 1988-02-03 Jay Fenlason: (none)
Schilling refers to a version obtained from Sun Users Group as being the first that Gilmore published, and also leads the reader to believe that Gilmore did the work as an employee of Sun. For example:
The social background is: Star is maintained by me since 1982. Gtar started as PD-TAR/SUG-TAR from John Gilmore (a Sun employee) in late 1986 and it was taken by Stallman in 1989. In the early 1990s, the maintained changed frequently and in that time (1993) I first reported the problem. – schily Sep 5 at 9:47
The mod.sources volume 7 files are older, and there are significant differences:
Makefile | 107 +++====
PORTING | 45 !
README | 59 +!==
TODO | 76 ++-==
buffer.c | 712 +++++++++++++++++------===============================
create.c | 526 +-!======================================
extract.c | 407 +++++++-!=======================
list.c | 477 +!!================================
port.c | 431 ++++++++++++++++++++++!========
port.h | 19 =
sugtar/diffarch.c | 319 ++++++++++++++++++++++++
sugtar/open3.h | 45 +++
tar.1 | 185 +-=============
tar.c | 450 +-=================================
tar.h | 176 =============
15 files changed, 1184 insertions(+), 102 deletions(-), 240 modifications(!), 2508 unchanged lines(=)
Likewise, the tie-in to Sun is weaker than stated by Schilling.
Reflecting on it, there are other problems with Schilling's statement. But aside from those I have commented on, there is no independent source of information which can be used to compare against Schilling's account. For each detail where there is another source of information, it differs.
Gilmore made a second posting of pdtar to comp.sources.unix volume 12 as v12i068: Public domain TAR (1987/11/29). One of the differences between the two postings was the addition of wildmat.c, which is present in GNU tar 1.09, indicating that this latter posting was used in the development of GNU tar. First, compare against the volume 7 posting:
Makefile | 157 ++++++!===
PORTING | 57 +!
README | 54 !!
TODO | 69 +-==
buffer.c | 763 +++++++++++++++++++-----!==========================
create.c | 594 ++++++-!!!!!=============================
extract.c | 454 ++++++++++!!!!================
list.c | 507 +++!!!!===========================
names.c | 118 =======
port.c | 541 +++++++++++++++++++++++++++=========
port.h | 29
tar.1 | 215 +++!!!=======
tar.5 | 217 ==============
tar.c | 496 ++++-=============================
tar.h | 180 ===========
volume12/diffarch.c | 323 ++++++++++++++++++++++
volume12/msd_dir.c | 214 ++++++++++++++
volume12/msd_dir.h | 36 ++
volume12/open3.h | 50 +++
volume12/wildmat.c | 132 ++++++++
20 files changed, 2028 insertions(+), 115 deletions(-), 428 modifications(!), 2635 unchanged lines(=)
Now, compare against the SUG version:
Makefile | 157 +++!=====
PORTING | 57 !!
README | 59 !!=
TODO | 64 !==
buffer.c | 688 +++!!=========================================
create.c | 599 +++++-!!!!!=============================
diffarch.c | 324 =====================
extract.c | 451 +++!!!========================
list.c | 509 ++!!=============================
names.c | 118 =======
open3.h | 50 ==
pdtar-volume12/msd_dir.c | 214 ++++++++++++++
pdtar-volume12/msd_dir.h | 36 ++
pdtar-volume12/wildmat.c | 132 +++++++++
port.c | 541 +++++++=============================
port.h | 29
tar.1 | 215 ++!===========
tar.5 | 217 ==============
tar.c | 492 +++!=============================
tar.h | 180 ===========
20 files changed, 872 insertions(+), 41 deletions(-), 343 modifications(!), 3876 unchanged lines(=)
Considering the numbers, it seems that the SUG version is about midway between the Usenet postings for volume 7 and volume 12.
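The comparison can be made explicit by adding up the changed lines in each of the three summary lines. This is a rough sketch only: the function name changed_lines is made up here, and treating insertions, deletions and modifications as equally weighted is an assumption, not something the comparison tool itself defines.

```sh
# changed_lines SUMMARY
# Sum the insertions, deletions and modifications reported in one
# summary line of the form shown in the comparisons above.
changed_lines() {
    printf '%s\n' "$1" | awk -F', ' '{
        total = 0
        for (i = 2; i <= NF; i++)
            if ($i ~ /insertion|deletion|modification/) {
                split($i, part, " ")
                total += part[1]
            }
        print total
    }'
}

# volume 7 to SUG
changed_lines "15 files changed, 1184 insertions(+), 102 deletions(-), 240 modifications(!), 2508 unchanged lines(=)"    # 1526
# volume 7 to volume 12
changed_lines "20 files changed, 2028 insertions(+), 115 deletions(-), 428 modifications(!), 2635 unchanged lines(=)"    # 2571
# SUG to volume 12
changed_lines "20 files changed, 872 insertions(+), 41 deletions(-), 343 modifications(!), 3876 unchanged lines(=)"      # 1256
```

By this crude measure the SUG version differs from volume 7 by 1526 lines and from volume 12 by 1256 lines, while the two postings differ from each other by 2571 lines.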
Here, pax is mainly of interest because it implements USTAR (the Unix standard tar format), which modern implementations of tar also provide.
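A USTAR archive can be recognized by the magic string "ustar" which the format stores at byte offset 257 of each header block. The following sketch checks only the first header; the function name is_ustar is hypothetical, and because the older GNU tar format begins its magic field with the same five bytes, a match shows only that the archive belongs to the ustar family.

```sh
# is_ustar FILE
# Succeed if the first header block of FILE has "ustar" at offset 257.
# NUL bytes are removed so that headers without the magic compare as an
# empty string.
is_ustar() {
    magic=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | tr -d '\000')
    [ "$magic" = "ustar" ]
}

# Possible use:
#   if is_ustar backup.tar; then echo "ustar-style header"; fi
```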
The program itself was the result of a failure to agree on whether tar or cpio was the one to standardize, and as a result we have a program which does either. The newsgroup thread beginning with John S. Quarterman's posting tar vs. cpio to comp.std.unix on June 1, 1987 summarizes the different points of view.
According to Glen Fowler, the first “public implementation” of pax was written by Mark H. Colburn.
He posted it to comp.sources.unix as “Usenix/IEEE POSIX replacement for TAR and CPIO” (volume 17, issues 74, 75, 76, 77, 78, and 79, dated February 3, 1989).
The manual pages for pax from some Unix vendors attribute pax to Mark H. Colburn:
- HPUX:
AUTHOR
pax was developed by Mark H. Colburn, OSF, and HP.
STANDARDS CONFORMANCE
pax: XPG4, POSIX.2
- IRIX (SGI):
COPYRIGHT Copyright (c) 1989 Mark H. Colburn. All rights reserved. Redistribution and use in source and binary forms are permitted provided that the above copyright notice is duplicated in all such forms and that any documentation, advertising materials, and other materials related to such distribution and use acknowledge that the software was developed by Mark H. Colburn and sponsored by The USENIX Association. THE SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. AUTHOR Mark H. Colburn Minnetech Consulting, Inc. 117 Mackubin Street, Suite 1 St. Paul, MN 55102 mark@jhereg.MN.ORG Sponsored by The USENIX Association for public distribution.
- SCO:
Copyright Copyright © 1989 Mark H. Colburn. All rights reserved. Redistribution and use in source and binary forms are permitted provided that the above copyright notice is duplicated in all such forms and that any documentation, advertising materials, and other materials related to such distribution and use acknowledge that the software was developed by Mark H. Colburn and sponsored by The USENIX Association. THE SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE. Author Mark H. Colburn NAPS International 117 Mackubin Street, Suite 1 St. Paul, MN 55102 mark@jhereg.MN.ORG Sponsored by The USENIX Association for public distribution.
but not others:
While there was early discussion (in 1990) for Minix to use Colburn's pax, as of 2015 Minix manual pages list only tar (no pax). This is apparently BSD tar (based on bulk import from NetBSD).
Later implementations include
While there are a few exceptions, e.g., Linux From Scratch which uses Gunnar Ritter's version,
most BSD- and Linux-systems provide the implementation by Keith Muller:
- FreeBSD and NetBSD began with checkins from 4.4BSD-Lite in 1994.
  - FreeBSD archives have a copy of the CSRG version from Keith Muller beginning in December 1992.
  - FreeBSD source, as such, begins with the initial checkin from 4.4BSD-Lite on May 26, 1994.
  - NetBSD source (initial checkin from 4.4BSD-Lite on June 13, 1994).
- OpenBSD came later.
  OpenBSD source (initial checkin from NetBSD on June 11, 1996).
  Before importing pax from NetBSD, OpenBSD used GNU tar.
- OSX pax comes from OpenBSD, using source from early 1998 according to the CVS identifiers (for example, see pax.c).
  There are minor changes made by Apple, too small to see here:
Makefile | 54 !
ar_io.c | 1372 ================================
ar_subs.c | 1288 ==============================
buf_subs.c | 1094 =========================
cache.c | 500 ===========
cpio.c | 1284 ==============================
extern.h | 299 =======
file_subs.c | 1117 ==========================
ftree.c | 565 =============
gen_subs.c | 467 ===========
getoldopt.c | 73 =
options.c | 1515 ====================================
osx-tar-20151206/Makefile.postamble | 5
osx-tar-20151206/Makefile.preamble | 1
osx-tar-20151206/PB.project | 54 +
pat_rep.c | 1240 =============================
pax.c | 426 ==========
sel_subs.c | 662 ===============
tables.c | 1434 ==================================
tar.c | 1214 ============================
tty_subs.c | 251 =====
21 files changed, 104 insertions(+), 1 deletion(-), 63 modifications(!), 14747 unchanged lines(=)
You can see the size by ignoring unchanged lines:
Makefile | 80 ++++++++++++++++++++++--------------
ar_io.c | 4 -
ar_subs.c | 4 +
buf_subs.c | 2
cache.c | 25 ++++++++---
cpio.c | 2
extern.h | 2
file_subs.c | 11 ++++
ftree.c | 2
gen_subs.c | 6 +-
getoldopt.c | 2
options.c | 2
osx-tar-20151206/Makefile.postamble | 5 ++
osx-tar-20151206/Makefile.preamble | 1
osx-tar-20151206/PB.project | 54 ++++++++++++++++++++++++
pat_rep.c | 2
pax.c | 4 -
sel_subs.c | 2
tables.c | 2
tar.c | 17 +++----
tty_subs.c | 2
21 files changed, 167 insertions(+), 64 deletions(-)
- As of 2015, Debian (see package) uses Muller's version with updates by Thorsten Glaser.
  The package switched to this combination with Debian 6.0 (squeeze):
mircpio (20080906-1) experimental; urgency=low
  * Initial release
  * Adjust manpages to cope with GNU groffs inferiorities
 -- Thorsten Glaser <tg@freewrt.org> Sun, 07 Sep 2008 01:00:10 +0000
pax (1:1.5-1) unstable; urgency=low
  * Initial Release of the OpenBSD's pax program from Keith Muller
 -- David Frey <dFrey@debian.org> Wed, 10 Dec 1997 12:57:48 +0100
- OpenSuSE, Red Hat and related distributions (see rpmfind) use a version ported from OpenBSD by Thorsten Kukuk at SuSE.
Kukuk's work stopped with version 3.4, released August 1, 2005 (see ftp directory).
Kukuk's initial port (apparently from OpenBSD CVS early December 2001) went beyond the scope of a port:
- In a third of the source files, Kukuk removed the CVS identifiers and the ifdef's used to support pre-ANSI C compilers. There are a few porting changes scattered in (such as renaming getline to avoid conflict), but those are easily overlooked in the changes to whitespace. Those modified files are marked copyright by Kukuk.
- The source code for each of Kukuk's snapshots has BSD copyrights and licenses (including Kukuk's contributions), but he added a GPL COPYING file to the releases. Perhaps he was confused about the licensing exemption for autoconf- and automake-files.
- Kukuk's snapshot was made during a period where Todd Miller was changing all calls to strcpy and strncpy to use his strlcpy. However, pax relied upon the null-padding provided by strncpy and shortly after Kukuk's initial work, Miller reverted and amended the use of strlcpy in pax (for example revision 1.21 of tar.c), just after Kukuk's port. Kukuk's initial 3.0 port (packaged a few weeks later, in January) simply provided strlcpy in an add-on file. Because Kukuk's port did not incorporate Miller's corrections, it seems there was no communication between the two.
- Here is a summary of Kukuk's initial port:
Makefile | 21
ar_io.c | 1363 ====================================
ar_subs.c | 1365 ++!!!!!!!!!!!!!!!!!!!!!!!!!!!!======
cache.c | 477 ============
cpio.1 | 295 --------
ftree.c | 539 ==============
gen_subs.c | 465 !!!!!!!!===
options.c | 1761 ++-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!=========
pax-3.0-src/Makefile.am | 28
pax-3.0-src/Makefile.in | 401 ++++++++++
pax.c | 460 !!!!!!=====
pax.h | 244 ======
sel_subs.c | 655 =================
tar.1 | 295 --------
tar.c | 1222 -!!!!!!!!!!!!!!!!!!!!!!=========
15 files changed, 628 insertions(+), 804 deletions(-), 3717 modifications(!), 4442 unchanged lines(=)
If blanks are ignored, the summary line would change to
12 files changed, 723 insertions(+), 288 deletions(-), 1101 modifications(!), 6963 unchanged lines(=)
- While Red Hat has provided a port of Muller's pax from OpenBSD since 2000 (see changelog), Mark Sobel's book A Practical Guide to Red Hat Linux 8 (December 2002) says only
The syntax of the pax command is too complex to describe here (as you might expect from looking at all the options available to tar and cpio). If it exists in your system, consult the manual pages. The USENIX Association funded the development of a portable implementation of pax and placed it in the public domain, so this utility is now widely available. Refer to the pax man page for more information.
Checking further (see example), I found no support for Sobel's statement:
- there was no package named pax in Red Hat 6.2,
- Red Hat 7.0 provided Muller's version as pax-1.5-2:
  * Fri Jun 30 2000 Preston Brown <pbrown@redhat.com>
  - debian version, which is a port from OpenBSD's latest.
- Later, Red Hat incorporated fixes made by SuSE, probably referring to “pax-3.0”:
  * Tue Mar 05 2002 Matt Wilson <msw@redhat.com>
  - pull PAX source tarball from the SuSE package (which is based off this one yet claims copyright on the spec file)
- There is a port back to NetBSD in pkgsrc.se, described as “a port of OpenBSD pax for SuSE Linux by Thorsten Kukuk”.
The Austin Group has a credits page where they mention Gunnar Ritter's Heirloom Toolkit.
It also refers to Schilling's pax, although the latter appears to be an error:
Working with the Open Source community
The group includes developers from the Open Source community. As part of acknowledging their valuable input the copyright holders have made several grants relating to use of the documentation in those projects. Some of these are listed: the Linux Man Pages project, the FreeBSD project, the NetBSD operating system, the Cygwin Project, Gunnar Ritter's Heirloom Toolkit and other tools, Joerg Schilling's pax and find, Jens Schweikardt book, and the ISPRAS Linux testing project.
For documentation on features, see the GNU tar manual. The manual's notion of history is in terms of random notes about features.
The mail in early 1988 from Jay Fenlason is a hint to when he began work on GNU tar. His progress was reported in successive GNU bulletins:
The earliest versions of GNU tar do not appear to be online. The earliest which you may find are (modified) versions 1.09 for MSDOS:
The two are the same, except that the FreeDOS files contain some additional DOS-specific files written by Kai Uwe Rommel to support direct disk access for OS/2 and DOS. Those would not have been incorporated into the GNU sources.
The accessible source-archives are not much help in researching its early history:
- The GNU project's Git repository has earlier commits, but for tar it essentially starts in 1994, with a series of commits by François Pinard.
- François Pinard made a related repository on GitHub of “paxutils” which covers GNU tar version 1.09 through 1.12 (before branching off into his own work). Pinard did not preserve timestamps on the commits.
From the latter, the v1.09.tar.gz file is probably useful for comparisons. Comparing against Gilmore's second posting, you can see that GNU tar had grown somewhat (as well as discarding some pieces, such as the manual page in favor of the “texinfo” file):
Makefile | 247 ++!===
PORTING | 57 -
README | 54 -
TODO | 55 -
buffer.c | 1352 +++++++++++++++++++++!!!!!!!==============
create.c | 1276 ++++++++++++++++++++++!!================
diffarch.c | 721 ++++++++++++-!!=======
extract.c | 747 +++++++++!=============
getoldopt.c | 89 ==
list.c | 726 +++++++!==============
msd_dir.c | 218 ======
msd_dir.h | 41 =
names.c | 135 ===
open3.h | 69
paxutils-1.09/COPYING | 249 +++++++
paxutils-1.09/ChangeLog | 636 ++++++++++++++++++++
paxutils-1.09/getdate.y | 882 ++++++++++++++++++++++++++++
paxutils-1.09/getopt.c | 596 ++++++++++++++++++
paxutils-1.09/getopt.h | 102 +++
paxutils-1.09/getopt1.c | 160 +++++
paxutils-1.09/gnu.c | 605 +++++++++++++++++++
paxutils-1.09/mangle.c | 226 +++++++
paxutils-1.09/rmt.h | 77 ++
paxutils-1.09/rtape_lib.c | 620 +++++++++++++++++++
paxutils-1.09/rtape_server.c | 226 +++++++
paxutils-1.09/tar.texinfo | 1289 ++++++++++++++++++++++++++++++++++++++++
paxutils-1.09/update.c | 534 ++++++++++++++++
paxutils-1.09/version.c | 90 ++
port.c | 1319 ++++++++++++++++++++++++=================
port.h | 47
tar.1 | 215 ------
tar.5 | 217 ------
tar.c | 1225 +++++++++++++++++++++++!!!============
tar.h | 297 +++-======
wildmat.c | 151 ===
35 files changed, 10391 insertions(+), 671 deletions(-), 786 modifications(!), 3702 unchanged lines(=)
The change-logs for GNU tar are helpful, since only a half-dozen people have done a significant number of commits to its source archives. Using the script which I wrote for counting changelogs, here are the percentages for developers with at least one percent of the total:
Percent  Name
    2.8  David J MacKenzie
   20.8  François Pinard
    1.5  Jay Fenlason
    2.9  Michael I Bushnell
   28.5  Paul Eggert
    1.3  Pavel Raiskup
   37.3  Sergey Poznyakoff
    4.9  “other”
GNU tar releases were not at uniform intervals, but it is still useful to see how the contributions break down by time:
Version Date DJM FP JF MIB PE PR SP
1.28 2014-07-27 11.7 5.5 76.6
1.27 2013-10-05 30.5 16.2 43.8
1.26 2011-03-12 72.1 25.6
1.25 2010-11-07 46.9 53.1
1.24 2010-10-24 76.5 23.5
1.23 2010-03-10 92.3
1.22 2009-03-09 100.0
1.21 2008-12-27 96.8
1.20 2008-05-05 8.1 87.8
1.19 2007-10-10 94.4
1.18 2007-06-29 6.7 56.7
1.17 2007-06-08 31.0 68.1
1.16 2006-10-21 19.0 78.6
1.15 2004-12-20 10.0 88.7
1.14 2004-05-11 61.6 33.1
1.13 1997-07-08 9.8 61.0 4.7 9.0 11.5
1.12 1997-04-25 100.0
1.11 1992-09-09 66.9 32.4
1.10 1991-07-01 10.0 25.0 56.0
1.09 1990-10-16 18.6 78.0
1.08 1990-01-26 34.4 29.7
1.07 1989-01-26 100.0
Schily tar (sometimes referred to as "star") was first published at the end of April 1997. It had not been published anywhere before that date.
For instance, Schilling commented in comp.unix.solaris 12/9/1996:
In article <5892f4$2...@news.Informatik.Uni-Oldenburg.DE>,
Christian Kuehnke <Christia...@arbi.Informatik.Uni-Oldenburg.DE> wrote:
>
>j...@cs.tu-berlin.de (Joerg Schilling) writes:
>> For a tar implementation that has no known bugs, will read all
>> (currently except HP-UX) tar streams and is the fastest implementation
    ^^^^^^^^^^^^^^^^^^^^^^ if they contain device files
>> at all (faster than ufsdump) look at:
>>
>> ftp://ftp.fokus.gmd.de/pub/unix/star
>>
>> for the rest of the goods.
>
>Nice. But why don't you provide the source?

I always intended to provide star in source.
There are some reasons, why I din't do this up to now:

1) I dont want star to go the same way as gnu tar

   You remember...
   Gnu tar has been first written in August 1985 by John Gilmore,ihnp4!hoptoad!gnu.
   It has been brought to the public at the Sun User Group meeting in december 1987 in San Jose as 'sugtar'.
   This version was really nice.
   The actual version has been ported to death.

   For this reason, I want to have star in my hands until I know the line for portability to other systems is clear.

   Star has been first written in 1982 by me.
   The main growth in functionality did come in May 1985.
   Although star has been designed to be very portable, id did run only on UNOS, SYSVr0-2, SunOS and Solaris.
   The major porting effort has been taken in 1994.
   It now runs on SunOS, Solaris, HP-UX, IRIX, Linux, DG/UX, AIX

2) Makefile system

   In May 1996 I made a makefile sytstem that allows simultaneous compilation on all supported platforms.
   This still needs some fine tuning until it may do the way to the public.

I expect star to be available in souce in January 1997.

Joerg

P.S. Star has been ported to DG/UX with the help of Data General. It will soon be available on Data General systems as a fast backup.

PP.S. GMD in Birlinghoven currently switches from 2MB/s X.25 to 34 MB/s ATM. For this reason our ftp server may not be reacheable from outside germany until the mid of the next week.

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)                If you don't have iso-8859-1
       j...@fokus.gmd.de (work)                  chars my name is
URL:   http://www.fokus.gmd.de/usr/schilling     J"org Schilling
The actual announcement at the end of April 1997 was much longer (and was cross-posted to 26 newsgroups):
Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.netspace.net.au!news.mel.connect.com.au!munnari.OZ.AU!news.Hawaii.Edu!news.caldera.com!news.eli.net!uunet!in1.uu.net!160.45.4.4!fu-berlin.de!cs.tu-berlin.de!js
From: js@cs.tu-berlin.de (Joerg Schilling)
Newsgroups: comp.unix.admin,comp.unix.misc,alt.os.linux,alt.sys.sun,bln.comp.sun,bln.comp.unix,comp.os.linux.development.apps,comp.os.linux.misc,comp.sys.hp.apps,comp.sys.hp.misc,comp.sys.sgi.admin,comp.sys.sgi.apps,comp.sys.sgi.misc,comp.sys.sun.admin,comp.sys.sun.apps,comp.sys.sun.misc,comp.unix.aix,comp.unix.bsd.freebsd.misc,comp.unix.solaris,de.comp.os.linux.misc,de.comp.os.unix,linux.dev.admin,linux.dev.apps,maus.os.linux,maus.os.linux68k,maus.os.unix,uk.comp.os.linux
Subject: STAR (tape archiver) source code released
Date: 30 Apr 1997 10:57:06 GMT
Organization: Technical University of Berlin, Germany
Lines: 108
Distribution: inet
Message-ID: <5k78i2$fht$1@news.cs.tu-berlin.de>
NNTP-Posting-Host: 130.149.25.72
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Summary: Star is a fast and Posix compliant tape archiver
Xref: euryale.cc.adfa.oz.au comp.unix.admin:57542 comp.unix.misc:29041 alt.os.linux:20710 alt.sys.sun:11011 comp.os.linux.development.apps:32517 comp.os.linux.misc:172775 comp.sys.hp.apps:6817 comp.sys.hp.misc:11233 comp.sys.sgi.admin:46008 comp.sys.sgi.apps:14655 comp.sys.sgi.misc:30303 comp.sys.sun.admin:86045 comp.sys.sun.apps:15307 comp.sys.sun.misc:29517 comp.unix.aix:98941 comp.unix.bsd.freebsd.misc:40018 comp.unix.solaris:105077 de.comp.os.unix:409

Star, the fastest tar archiver for UNIX is now available in source.

Star has many improvements compared to other tar imlementations (including gnu tar). See below for a short description of the highlight of star.
Star is located on:

ftp://ftp.fokus.gmd.de/pub/unix/star

Revision history (short)

1982 First version on UNOS (extract only)
1985 Port to UNIX (fully funtional version)
1985 Added pre Posix method of handling special files/devices
1986 First experiments with fifo as external process.
1993 Remote tape access
1993 diff option
1994 Fifo with shared memory integrated into star
1994 Very long filenames and sparse files
1994 Gnutar and Ustar(Posix) handling added
1994 Xstar format (extended Posix) defined and introduced
1995 Ported to many platforms

Supported platforms:

SunOS Solaris Linux HP-UX DG/UX IRIX AIX FreeBSD

Joerg

-------------------------------------------------------------

Star is the fastest known implementation of a tar archiver. Star is able to make backups with more than 12MB/s if the disk and tape drive support such a speed. This is more than double the speed that ufsdump will get. Ampex got 13.5 MB/s with their new DLT tape drive. Ufsdump got a maximum speed of about 6MB/s with the same hardware.

Star development started 1982, development is still in progress. The current version of star is stable and I never did my backups with other tools than star.

Its main advantages over other tar implementations are:

fifo - keeps the tape streaming. This gives you faster backups than you can achieve with ufsdump, if the size of the filesystem is > 1 GByte.

pattern matcher - for a convenient user interface (see manual page for more details). To archive/extract a subset of files.

sophisticated diff - user tailorable interface for comparing tar archives against file trees This is one of the most interesting parts of the star implementation.

no namelen limitation - Pathnames up to 1024 Bytes may be archived. (The same limitation applies to linknames) This limit may be expanded in future without changing the method to record long names.
deals with all 3 times - stores/restores all 3 times of a file (even creation time) may reset access time after doing backup

does not clobber files - more recent copies on disk will not be clobbered from tape This may be the main advantage over other tar implementations. This allows automatically repairing of corruptions after a crash & fsck (Check for differences after doing this with the diff option).

automatic byte swap - star automatically detects swapped archives and transparently reads them the right way

automatic format detect - star automatically detects several common archive formats and adopts to them. Supported archive types are: Old tar, gnu tar, ansi tar, star.

fully ansi compatible - Star is fully ANSI/Posix 1003.1 compatible.

See README.otherbugs for a complete description of bugs found in other tar implementations.

This is the first source release of star that I put on the net. Have a look at the manual page, it is included in the distribution.

Author: Joerg Schilling Seestr. 110 D-13353 Berlin Germany

Email: joerg@schily.isdn.cs.tu-berlin.de, js@cs.tu-berlin.de schilling@fokus.gmd.de

Please mail bugs and suggestions to me.

--
EMail: joerg@schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de (uni)                  If you don't have iso-8859-1
       jes@fokus.gmd.de (work)                   chars my name is
URL:   http://www.fokus.gmd.de/usr/schilling     J"org Schilling
There are a few points which the reader may not have noticed:
- There has been (until perhaps this page) no published benchmark for the performance of Schily tar.
- The long list of milestones is interesting but not relevant to Schilling's frequent statement:
    the oldest free TAR implementation
  Publication date is the relevant detail. Schily tar was first published eight years after GNU tar.
- Regarding "Very long filenames", the source copyright for longnames.c says
    Copyright (c) 1993, 1995 J. Schilling
  which tells us that it may have been started in 1993, but was not complete until 1995. Further, the file's SCCS-ID shows
    /* @(#)longnames.c 1.13 96/06/26 Copyright 1993, 1995 J. Schilling */
  lengthening the development period another year. In any case, the long-names feature for GNU tar was released earlier than that. Without a published source, it is not possible to determine the extent to which Schilling borrowed, adapted or was otherwise influenced by the previously published work.
- The initial release has no change-log, from which one might get clues to investigate inconsistencies in the release announcements.
- Schilling added a change-log to version 1.1, released about a month later. The end of that file shows a problem:
Sun Mar 3 17:20:19 1991 Joerg Schilling <joerg@schily.isdn.cs.tu-berlin.de>
    * buffer.c 1.1
      date and time created 91/01/31 17:20:19 by joerg

Fri Jun 30 12:01:59 1989 Joerg Schilling <joerg@schily.isdn.cs.tu-berlin.de>
    * star.c 1.2
      star divided into (star extract list create)

... SCCS revision info lost

First full version made in 1986
It also notes further development changes to long-names:
Mon Jun 30 01:12:08 1997 Joerg Schilling <joerg@schily.isdn.cs.tu-berlin.de>
    * longnames.c 1.16
      Avoid strcatl() for speed
      f_name/f_lname bug and bug with non-initialized m_add

Mon Jun 9 21:25:18 1997 Joerg Schilling <joerg@schily.isdn.cs.tu-berlin.de>
    * longnames.c 1.15
      NAMSIZ -> props.pr_maxsname/props.pr_maxslname

Mon Jun 9 16:56:44 1997 Joerg Schilling <joerg@schily.isdn.cs.tu-berlin.de>
    * longnames.c 1.14
      Bug that caused very long directory names from command line to overwrite the stack (av[i+1)
That is, there is no usable change-history before 1989, and the date given for a complete version is at the outset inconsistent with the release announcement:
1985 Port to UNIX (fully funtional version)
Checking dates, Schilling's change-log started almost six months after the first public release of GNU tar 1.07 in January 1989.
If there had been a published version of Schily tar in 1989, we could gauge how much it had borrowed from BSD tar, and continuing, how GNU tar influenced Schily tar. But there is that eight-year delay.
Lists of obscure features (such as "Ustar") get little attention.
Numbers are what get readers' attention.
Here is one (cited in the usual source of misinformation), from Unix Backup and Recovery by W. Curtis Preston, O'Reilly, 1999:
A Really Fast tar Utility: star

The star utility is the fastest known implementation of tar. It has been tested at speeds exceeding 14 MB/s. (This is more than double the speed that dump gets.) star development started in 1982 and is still in progress. star's main advantages over other tar implementations are:
- FIFO
This is a “double-buffering” system that keeps the tape streaming. This gives you faster backups than you can achieve with dump, if the size of the filesystem is > 1GB.
- Sophisticated diff
It has a user-tailorable interface for comparing tar archives against file trees.
- Longer pathname length
You may archive pathnames up to 1024 bytes, as you can with dump.
- Does not clobber files
More recent copies on disk will not be clobbered from the backup volume. This may be the main advantage over other tar implementations. This allows automatic repair of a corrupted filesystem. (You can check for differences after doing this with the diff option.)
- Automatic byte swap
star automatically detects swapped archives and transparently reads them the right way.
star is available from ftp://ftp.fokus.gmd.de/pub/unix/star.
Both the Schily tar release announcement and Preston's summary are quoted here to make it simpler for the reader to observe how the summary in the book is based on the release announcement. Preston made some adjustments:
- ufsdump was altered to “dump”, and
- the 13.5 MB/s figure cited for comparison against Ampex was conflated into 14 MB/s attributed to the tar program.
The telling point is that Preston did not add a paragraph or two detailing how the performance was measured.
By the way, star (Schily tar) is not mentioned in the revised edition Backup & Recovery: Inexpensive Backup Solutions for Open Systems (2007). Instead, Preston says (page 106):
Use GNU tar if You Can

GNU tar is an extremely popular utility. Beside being able to read an archive written by any other version of tar, it adds a significant level of functionality. Here are some of its most popular advancements:
- The -d option performs a diff compare between the archive and a filesystem. It does this by reading the tape and comparing its contents against the files that it finds in the filesystem. Any differences are reported.
- The -a option resets access times (atime).
- The -f option runs a script when tar reaches the end of a volume. This can be used to automatically swap volumes with a media changer.
- The -Z and -z options automatically pass the archive through compress or gzip, respectively.
- The -f option supports remote device names.
- By default, GNU tar suppresses a leading slash on absolute pathnames while creating or reading a tar archive. (You can suppress this with the -p option.)
- Some people also prefer the GNU style of arguments that are offered by GNU tar. Instead of tar cvf, you can specify tar -create -verbose -file.
Finally (perhaps not the last word on the topic), there is bsdtar, built upon libarchive (originally in Tim Kientzle's webpage).
- As Tim Kientzle relates, he began work on libarchive in 2003 for use in installers.
- The first announcement was at the end of 2003:

    libarchive/bsdtar snapshot available
    Tim Kientzle tim at kientzle.com
    Mon Dec 22 21:17:35 PST 2003

    A fairly complete snapshot of libarchive and bsdtar, including source code, complete documentation, and some background about why I'm doing this and what I hope to accomplish is now available:

    http://people.freebsd.org/~kientzle/libarchive/

    It needs a lot of testing still, but is getting to the point that someone other than me should be able to make sense of it. ;-)

    Feedback appreciated.

    Tim Kientzle
    kientzle at freebsd.org
- Initially, commits were made to FreeBSD's CVS repository. As of 2015, you can see this in the SVN branches for FreeBSD releases 6, 7, 8 and 9. Starting with FreeBSD 10, development moved to GitHub.
- The initial check-in early in 2004 gives Kientzle's goal:

    Initial import of libarchive.

    What it is:
    A library for reading and writing various streaming archive formats, especially tar and cpio. Being a library, it should be easy to incorporate into pkg_* tools, sysinstall, and any other place that needs to read or write such archives.

    Features:
    * Full automatic detection of both compression and archive format.
    * Extensible internal architecture to make it easy to add new formats.
    * Support for "pax interchange format," a new POSIX-standard tar format that eliminates essentially all of the restrictions of historic formats.
    * BSD license

    Thanks to: jkh for pushing me to start this work, gordon for encouraging me to commit it, bde for answering endless style questions, and many others for feedback and encouragement.

    Status: Pretty good overall, though there are still a few rough edges and the library could always use more testing. Feedback eagerly solicited.
- The NEWS file lists the first milestone:
    May 18, 2004: bsdtar can read Solaris, HP-UX, Unixware, star, gtar, and pdtar archives.
- NetBSD's pkgsrc repository shows
- The Git repository history only goes back to April 2008.
- The wiki shows some details of releases.
- There are older tarballs in the archive section, going back to February 2006.
- With 74 contributors and 4544 commits as of December 7, 2015, it is comparable in activity to GNU tar (45 contributors, 5448 commits). That measure may be biased in favor of libarchive, since the summary lists 3 branches.
From October 1992 through May 2005, I worked initially to collect useful development tools, for use by myself and other developers. I also provided fixes and feedback (e.g., cproto, mawk, vile). After a few years I was involved in development of these tools, to follow up on the fixes I had made, and became more selective about which to become involved with.
I gauged program quality by compiling candidates with gcc compiler warnings turned on, as well as doing test-builds with Unix compilers. For example, I used this script:
#!/bin/sh
# these are my normal development-options
OPTS="-Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion"
gcc $OPTS "$@"
That made it simpler:
- good code had few warnings; I could send a patch knowing that it would be treated properly.
- bad code had many warnings; I simply deleted the program.
- in between meant that I would (as I did for ncurses) start by sending a set of patches to clean those up before addressing my real issue.
For instance, that was what I had in mind when I sent mail to Paul Eggert in 1993, suggesting improvements to rcs. The discussion was inconclusive. A few years later, I read his Usenet postings (such as Re: Reverse function for gmtime()?) with interest.
Still later (probably 1997 or 1998, though I am unable to locate it via Google), I was interested to note an exchange between Eggert and Schilling. Schilling was accusing Eggert of having deliberately implemented long-name support in GNU tar in a way designed to make it incompatible with POSIX. Schilling, of course, phrased his remarks in a more emphatic manner than I report here.
I examined the GNU tar source and read its change-log. According to that (reading it again):
- Eggert was cited for only a couple of bug reports for GNU tar before becoming its maintainer in October 1997.
- The work mentioned was apparently done by Jay Fenlason (seen in mention of name mangling late in 1990) and Michael I Bushnell (completed for version 1.11, September 1992).
I followed up by downloading a copy of Schilling's program. Of course, I screened it for compiler warnings. It was "in between", which calls for a collaborative effort. However, viewing the episode with Eggert, it was obvious that Schilling was no improvement in comparison to Eric Raymond. Attempting to collaborate with Schilling would be comparable to Sindbad's adopting the Old Man of the Sea for a traveling companion.
So I deleted it.
I had occasion to revisit long filenames with tar for ncurses. Juergen Pfeifer added several filenames for the Ada95 binding which were long. That was because they (like Java class names versus filenames) had to match package names, which were long.
Despite my qualms, this was not initially a problem with tar. Later, that changed, and since problem reports were not frequent, it took a while to notice and address the problem. Here are a few mail interchanges to illustrate.
I tried untar'ing a file on ClarkNet's Solaris machine:
From florian@suse.de Sat Apr 3 01:19:48 1999
Received: from smtp-gw.vma.verio.net (smtp-gw.vma.verio.net [207.97.20.30])
    by loas.clark.net (8.8.8/8.8.8) with ESMTP id BAA29138
    for <dickey@clark.net>; Sat, 3 Apr 1999 01:19:48 -0500 (EST)
Received: from Cantor.suse.de (Cantor.suse.de [194.112.123.193])
    by smtp-gw.vma.verio.net (8.9.3/8.9.3) with ESMTP id BAA15725
    for <dickey@clark.net>; Sat, 3 Apr 1999 01:20:03 -0500 (EST)
Received: from Galois.suse.de (Galois.suse.de [194.112.123.130])
    by Cantor.suse.de (Postfix) with ESMTP id F084632CE2
    for <dickey@clark.net>; Sat, 03 Apr 1999 08:19:13 +0200 (MEST)
Received: from knorke.saar.de (knorke.suse.de [10.0.0.254])
    by Galois.suse.de (Postfix) with ESMTP id CDE529410
    for <dickey@clark.net>; Sat, 3 Apr 1999 08:19:12 +0200 (MEST)
Received: (from florian@localhost)
    by knorke.saar.de (8.8.8/8.8.8) id IAA08351
    for dickey@clark.net; Sat, 3 Apr 1999 08:19:12 +0200
From: Florian La Roche <florian@suse.de>
Date: Sat, 3 Apr 1999 08:19:12 +0200
To: dickey@clark.net
Subject: Re: progress?
Message-ID: <19990403081912.A8309@knorke.saar.de>
References: <19990403004604.A7286@knorke.saar.de> <199904030226.VAA10981@shell.clark.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <199904030226.VAA10981@shell.clark.net>; from dickey@clark.net on Fri, Apr 02, 1999 at 09:26:25PM -0500
Sender: florian@knorke.saar.de
Status: RO
Content-Length: 618
Lines: 19

> close - but there's a problem. I can see the contents, but I get a directory
> checksum error trying to untar it. Here's what I get
>
> -rw------- 1 dickey ipusers 1378639 Apr 2 1999 ncurses-5.0-beta1.tar.gz
>
> sum:
> 60558 2693 ncurses-5.0-beta1.tar.gz
>
> sum -r:
> 31196 2693 ncurses-5.0-beta1.tar.gz

knorke:~/source $ sum -r ncurses-5.0-beta1.tar.gz
31196 1347

I cannot reproduce any problem with that file. I have also tried to unpack it on the GNU machine and didn't get any error.

Can you try it on a Linux machine?
(At least with GNU tar to unpack it?)

Florian La Roche
It worked for Potorti, but neither of us knew what the @LongLink was:

From dickey Fri Jul 30 09:56:17 1999
Subject: Re: File mode specification error on a tar.gz file
To: F.Potorti@cnuce.cnr.it (Francesco Potorti` <F.Potorti@cnuce.cnr.it>)
Date: Fri, 30 Jul 1999 09:56:17 -0400 (EDT)
In-Reply-To: <m11ABia-001i1aC@fly.cnuce.cnr.it> from "Francesco Potorti` <F.Potorti@cnuce.cnr.it>" at Jul 30, 99 02:24:56 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Status: RO
Content-Length: 1067
Lines: 35

>
> emacs 20.4
>
> Download http://www.clark.net/pub/dickey/ncurses/ncurses.tar.gz and put
> it in your current directory.
>
> emacs -q
> M-x auto-compression-mode RET
> C-x d RET
> go to the ncurses.tar.gz line
> RET
> --> unzipping ncurses.tar.gz...done
> Parsing tar file...done
> File mode specification error: (wrong-type-argument integerp nil)
>
> The likely reason is that gnu tar 1.12, when run in listing mode over
> that archive, outputs one line like this:
>
> Lr--r--r-- root/root 103 1999-06-15 03:03 ././@LongLink unknown file type `L'

hmm (I have had occasional problems reading those tar files with non-GNU tar, but not seen any thing that I can pinpoint). I'll repack with Solaris tar (which works, afaik). -- did that, will see if I can identify the bogus 'L' entry.

thanks.

> Even after having read the tar docs I don't understand if that is normal
> or not. Anyway, if possible, it would be nice for emacs to handle these
> errors.

--
Thomas E. Dickey
dickey@clark.net
http://www.clark.net/pub/dickey
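In hindsight, the `L` entry in that listing is GNU tar's long-name extension: when a member name does not fit the 100-byte name field, the archive gets an extra pseudo-member named `././@LongLink` with type flag `L`, whose data is the real pathname. The sketch below illustrates this with Python's `tarfile` module standing in for GNU tar (an assumption made only so the example is self-contained); the helper names and the sample pathname are hypothetical.

```python
import io
import tarfile


def make_archive(member_name, fmt):
    """Build an in-memory archive holding one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(member_name))
    return buf.getvalue()


def first_header_fields(archive_bytes):
    """Return (name, typeflag) from the first 512-byte header block."""
    block = archive_bytes[:512]
    name = block[0:100].rstrip(b"\0").decode("ascii")   # name field
    typeflag = block[156:157].decode("ascii")           # type flag byte
    return name, typeflag


if __name__ == "__main__":
    # Hypothetical pathname, 128 characters long.
    long_name = "/".join(["directory"] * 12) + "/file.txt"
    print(first_header_fields(make_archive(long_name, tarfile.GNU_FORMAT)))
    print(first_header_fields(make_archive("short.txt", tarfile.GNU_FORMAT)))
```

For the long name, the first header is the `././@LongLink` pseudo-member with type `L`; for the short name it is the member itself with the regular-file type `0`. A tar program that does not know the extension sees only an unknown file type.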
The reason became clear after I released ncurses 5.2 in 2000 and received a few reports from people using Mac OS X and FreeBSD. Starting at that point, I changed my release process to use Solaris tar to create the release tar-balls for ncurses.
Alternatively, I could have used Schily tar. But I chose not to:
- If it had some quirk which meant that its files were not readable by some other flavor of tar, then that meant the receiver would have to provide a compatible tar.
- Early on, there were few groups which provided precompiled packages for Schily tar.
- While it is available as an add-on, nowhere is it the default tar program.
- Notably, it is missing from Debian (see package tracker, and bugs, in particular #350624).
- Likewise, it did not become part of OpenSolaris (see discussion).
- It is not in OpenBSD ports. See this mailing list discussion in 2005 for an explanation.
- It does not appear to be in Arch Linux (see package search).
- Likewise missing from the HP-UX Porting & Archiving Center (see package search).
- In the cases of Debian and OpenSolaris, Schilling antagonized the people whose cooperation was needed (see Garrett D'Amore followup in slides, e.g., provoking this response).
For more context, see
Using Solaris tar was only a stopgap fix. Fortunately, it turned out, on investigation, that only the development versions (with 8-digit year/month/day added to the pathname) produced pathnames long enough to pass the 100-character threshold.
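The 100-character threshold is the size of the name field in the classic tar header; the POSIX ustar format adds a 155-byte prefix field, so a longer pathname still fits if it can be split at a slash into a prefix and a name within those limits. A minimal sketch of a check for which pathnames cross either limit follows. The function names are hypothetical, and the check is not part of the release scripts described here.

```python
NAME_FIELD_MAX = 100     # bytes in the tar header's name field
PREFIX_FIELD_MAX = 155   # bytes in the ustar prefix field


def exceeds_name_field(path):
    """True if the pathname does not fit the plain 100-byte name field."""
    return len(path.encode("utf-8")) > NAME_FIELD_MAX


def fits_ustar(path):
    """True if the pathname fits ustar, using the prefix field if needed."""
    raw = path.encode("utf-8")
    if len(raw) <= NAME_FIELD_MAX:
        return True
    # Try every slash as the split point between prefix and name.
    for i in range(1, len(raw) - 1):
        if raw[i:i + 1] == b"/":
            prefix_len = i
            name_len = len(raw) - i - 1
            if prefix_len <= PREFIX_FIELD_MAX and name_len <= NAME_FIELD_MAX:
                return True
    return False


def report(paths):
    """Return the pathnames that need more than the plain name field."""
    return [p for p in paths if exceeds_name_field(p)]
```

Running `report()` over a file list before packaging shows whether an archive would depend on ustar prefixes or on an extension such as GNU long names at all.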
The investigation was part of my check-list for ncurses6. Some of the results are interesting, hence this page.
I began collecting information for this investigation in 2014, creating an outline of this page.
Later, in June 2015, I built each of the GNU tar and Schily tar versions mentioned here using Debian 6 (gcc 4.4.5). I also wrote a test program to verify interoperability of the various tar formats with pathnames of different lengths.
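The test program itself is not shown on this page. As an illustration of the kind of check involved, here is a small sketch that writes a member with a pathname of a chosen length in each of three formats and reads it back. It uses Python's `tarfile` module in place of the actual tar programs, which is a substitution made for the sake of a self-contained example; `make_path` and `round_trip` are hypothetical names.

```python
import io
import tarfile

FORMATS = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}


def make_path(length, component=40):
    """Build a pathname of exactly `length` characters from short components."""
    parts = []
    remaining = length
    while remaining > component + 1:
        parts.append("d" * component)
        remaining -= component + 1   # the component plus its slash
    parts.append("f" * remaining)
    return "/".join(parts)


def round_trip(name, fmt):
    """Write one empty member called `name`, then read the archive back.

    Returns False if the format cannot represent the name at all.
    """
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(name))
    except ValueError:
        return False
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames() == [name]


if __name__ == "__main__":
    for length in (99, 100, 101, 255, 300):
        path = make_path(length)
        results = {label: round_trip(path, fmt) for label, fmt in FORMATS.items()}
        print(length, results)
```

A real interoperability test would write with one tar program and read with another; this sketch only shows where each format's pathname limits fall.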
In reviewing the initial results, I found that I should also include multiple versions of BSD tar, to comment on its influence vis-à-vis GNU and Schily tar.
In this study, I acquired for reference these versions of GNU tar:
- 1.09 from ftp.nluug.nl
- 1.11.1 from gfd-dennou.org
- 1.11.2 from ftp.auckland.ac.nz
- as well as the versions at ftp.gnu.org: 1.11.8, 1.12, 1.13, 1.14, 1.15, 1.15.1, 1.16, 1.16.1, 1.17, 1.18, 1.19, 1.20, 1.21, 1.22, 1.23, 1.24, 1.25, 1.26, 1.27, 1.27.1, 1.28
For research, there is also a Git repository, but it is not very useful:
- the initial commit is 1990-11-01, but actual "tar" is later.
- the real initial commits for tar began 1994-11-16.
- GNU tar 1.12 followed 1.13 in 1996-11-11 in several commits
The earliest published version (1.07) cannot be found. I found it only mentioned as being on ccb.ucfs.edu in December 1989, and on prep.ai.mit.edu since May 1989.
I also obtained these versions of Schily tar from gd.tuwien.ac.at:
1.0, 1.1, 1.2, 1.3, 1.3.1, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.5, 1.5.1
There is no publicly accessible source repository for Schily tar. There is a Mercurial repository for Schillix, which matches Illumos up to mid-2010 (when OpenSolaris ended, as reported by The Register), but Schily tar is not there. Comparing the two repositories, it is immediately apparent that Illumos is the ongoing reference implementation: its history continues well past that point, unlike that of Schillix.
For testing other programs, I used the packaged versions (mainly Debian 6–8 and Solaris 10):
- For BSD tar:
- I have (virtual) machines for Solaris 10 and 11:
- I prefer the former machine because it works more reliably.
- Aside from the addition of compression support for compatibility with GNU tar, the latter version of tar appears the same.
These tar implementations provide several features. For instance, the newer versions provide options for gzip (and other) compression. While I do use those features, they are not that important because they only simplify the use of compression, but do not enhance it, e.g., by making it faster.
Adding compression support to tar is simpler than it may seem, provided that all it does is run an external program. I did this for diffstat with little effort (initially in 2000, later adding a configure check in 2006, etc). Modifying a program to use compression libraries takes appreciably more effort.
Often, comments are made that the compression is “not really part of tar”, which may or may not be accurate:
- the XFree86 version compiled-in support for a library to do the decompression.
- BSD tar compression support is built-in with libarchive.
Here is a table comparing the command-line support for compression in these tar implementations:

Date     Format    Program      Version  Feature
1995-06  compress  GNU tar      1.11.8   -Z option
1995-06  gzip      GNU tar      1.11.8   -z option
2004-05  bzip2     GNU tar      1.14     -j option
2010-03  xz        GNU tar      1.23     -J option
2008-04  compress  Schily tar   1.5      -Z option
2002-05  gzip      Schily tar   1.4      -z option
2008-04  bzip2     Schily tar   1.5      -j, -bz options
2013-01  xz        Schily tar   1.5.2    -xz option
2010-03  compress  BSD tar      2.8.3    -Z option
2010-03  gzip      BSD tar      2.8.3    -z option
2010-03  bzip2     BSD tar      2.8.3    -j, -y options
2010-03  xz        BSD tar      2.8.3    -J option
2012-05  compress  Solaris tar  5.11     -Z option
2012-05  gzip      Solaris tar  5.11     -z option
2012-05  bzip2     Solaris tar  5.11     -j option
N/A      xz        Solaris tar  N/A      N/A
2010-11  compress  GNU tar      1.25     auto-sense
2004-12  gzip      GNU tar      1.15     auto-sense
2004-12  bzip2     GNU tar      1.15     auto-sense
2010-10  xz        GNU tar      1.24     auto-sense
2002-05  compress  Schily tar   1.4      auto-sense
2002-05  gzip      Schily tar   1.4      auto-sense
2002-05  bzip2     Schily tar   1.4      auto-sense
2013-01  xz        Schily tar   1.5.2    auto-sense
2010-03  compress  BSD tar      2.8.3    auto-sense
2010-03  gzip      BSD tar      2.8.3    auto-sense
2010-03  bzip2     BSD tar      2.8.3    auto-sense
2010-03  xz        BSD tar      2.8.3    auto-sense
2012-05  compress  Solaris tar  5.11     auto-sense
2012-05  gzip      Solaris tar  5.11     auto-sense
2012-05  bzip2     Solaris tar  5.11     auto-sense
N/A      xz        Solaris tar  N/A      auto-sense
There are a few caveats:
- While some features may be documented in earlier change-log notes, the dates and versions cited are the earliest that I could verify at this time (2016/01/07).
- Schily tar versions at a given date are not necessarily directly comparable to GNU tar:
  - Schily tar did not work with 64-bits until version 1.3; the data cited for older versions is from a 32-bit machine.
  - Schily tar 1.3 recognizes the -z option, but it did not work until version 1.4
- Schily tar options processing has a number of quirks, making it incompatible with all other versions of tar:
  - Schily tar does not accept the option “-tvf”. It is necessary to omit the “-”.
  - To create an archive with compress, it is necessary to separate the -Z option from cf, e.g., star cf foo -Z bar. On the other hand, this separation is not needed for the -z option.
  - While Schily tar began auto-sensing with version 1.4, it gives a warning in doing so.
- The mention of Solaris in the table was prompted by an anonymous addition to a vague statement made by “jlliagre”:
  - the addition used the term “notable” inappropriately. There is nothing notable about a developer making changes for compatibility with a 20 year old feature of another program.
  - the comments about busybox were spurious: busybox implements a functionally-reduced version of useful programs, and is of interest to a small niche of the development community.
To recap, tar compression is interesting to some extent, because it simplifies ad hoc commands involving tar. I do not use it in the archive script, which I use for preparing tarballs to distribute to others. Rather, in the script, tar pipes to gzip (or bzip2).