pkg(5): image packaging system
This information is Copyright (c) 2010, Oracle and/or its affiliates.
All rights reserved.
ON-DISK FORMAT PROPOSAL
1. Introduction
1.1. Date of This Document:
06/02/2010
1.2. Name of Document Author/Supplier:
Shawn Walker, Oracle,
on behalf of the pkg(5) project team
1.3. Acknowledgements:
This document is largely based on comments from the following
individuals, to whom the author is exceedingly indebted:
- Danek Duvall
- Mike Gerdts
- Stephen Hahn
- Krister Johansen
- Dan Price
- Brock Pytlik
- Bart Smaalders
- Peter Tribble
2. Project Summary
2.1. Project Description:
"...the repository can be archived up, put on a CD, memory
stick, 2D barcode, and protected by the Black Knight, fire
moats, komodo dragons, etc." - Danek Duvall
pkg(5) is primarily a network-oriented binary packaging system.
Although some of the tools it provides support filesystem-based
operations for publication, the primary expected use for package
operations (such as install, update, search, etc.) is between an
intelligent client and one or more servers that provide access
to a package repository and/or other interactive services.
This project seeks to define and establish an on-disk format
(and corresponding container format), for the pkg(5) system,
with the intent that it can enable the ubiquitous, transparent
use of package data from filesystem-based resources.
The changes proposed by this project are evolutionary, not
revolutionary, in nature. In particular, this project seeks
to refine and adopt the existing repository format used by the
pkg(5) depot server as the on-disk format. Supplementary to
that, it also seeks the addition of a container format to ease
provisioning of the on-disk format, and the unification of the
scheme used by the client and server to store package data.
2.2. Problem Area:
For some deployments, network-based package data access is not
possible or is undesirable. Concerns often cited in this area
include:
- lack of access control or ability to easily integrate with
existing access control systems,
- inability to rely on alternative (or existing) provisioning
arrangements (such as NFS-based file servers),
- environmental or procedural requirements that prohibit the
availability or use of a network-based service,
- characteristics of network protocols (such as HTTP, etc.) that
artificially limit functionality or performance (as opposed to
iSCSI or other alternatives),
- ease of administration of filesystem-based resources, and
- ease of transferring package data.
3. Project Technical Description:
3.1. Details:
This project defines an on-disk format (and corresponding con-
tainer format) that is intended for the supplemental or complete
provisioning of package data at all stages of the package life-
cycle. That is, when package data is published, stored by the
client or server, or otherwise used during package operations.
The on-disk format (defined in detail later in this document)
is intended to be distributable in its raw form (a pre-defined
structure of directories and files) or within a container format
(such as a zip file, etc.).
Out of necessity, the use of filesystem-based resources (such as
those provided by the on-disk format) will sometimes limit the
operations that can be performed to a subset of those normally
available when interacting with a network-based repository. For
example, search and publisher configuration may not be possible,
and purely interactive services such as the BUI (Browser UI)
offered by the depot server for a repository, RSS feeds, and
others will not be available.
Because of the wide-ranging impact of the changes required to
implement this functionality, it is intended that the project
be implemented in the following sequence:
- Client Support for filesystem-based Repository Access
- Depot Storage, Client Transport and Publication Tool Update
- Client Storage and Image Format Update
- Client and Depot Support for On-Disk Archive Format
3.2. Bug/RFE Number(s):
As an example of the kinds of defects and RFEs intended to be
resolved by this project, see the following selection of
defect.opensolaris.org bug IDs:
2152 standalone package support needed (on-disk format)
166 depot doesn't set directory mode when creating directories
2086 validate that a repository is really a repository in pkg.depotd
6335 publisher repo with invalid certificate information shouldn't
prevent querying other repos
6576 pkg install/update support for temporary publisher origins desired
6940 depot support for file:// URI desired
7213 ability to remove published packages
7273 manifests should be arranged in a hierarchy by publisher
7276 /var/pkg metadata needs reorg (looks busy)
8433 client and pull need to refer to "repository" instead of
"server"
8722 advanced repository metadata store needed
8725 versioning information for depot and repository metadata needed
9571 CachedManifest should be named FactoredManifest
9572 CachedManifest should allow consumers to specify cache location
9872 publication api should use new transport subsystem
9933 ability to control repository creation behaviour or removal of it
10244 caching dictionaries as a class variable prevents multi-image and
repo search
11362 Image update dying when trying to talk to a disabled and offline
publisher
11740 publishers with installed packages should not be removable
12814 publisher prefixes should be forcibly lower-cased or case
insensitive
14802 ability to have separate read / write download caches
15320 pkgsend will traceback if unable to parse server error response
15371 repository property defaults opensolaris.org-specific
3.3. In Scope:
Filesystem-based data resourcing for package operations.
3.4. Out of Scope:
Package signing and fine-grained access control for package
repositories.
4. On-Disk Format Technical Description:
4.1. Overview:
The on-disk format is intended to exist both in a raw format as
a pre-defined structure of directories and files, and in an
archive format which is primarily a simple container for
the raw format.
4.2. Raw Format:
4.2.1. Goals:
The goals for the raw on-disk format include:
- unification of client and server package data storage
for data common to both,
- transparent usage of package data regardless of operation
or use by client or server,
- ease in composition and decomposition of package data
stored within by publisher or package,
- re-use of existing publication tools for on-disk format,
- enablement of future publication tools to automatically
be able to manipulate or use on-disk format, and
- ease of provisioning.
4.2.2. Raw Format specification:
The pkg(5) repository format is a set of directories and
files that conform to a pre-defined structure.
For a version 3 repository (the current format), the
structure is as follows:
<REPO_ROOT>/
catalog/
<catalog v1 files>
index/
<index files>
file/
<first two letters of file hash>/
<file-named-by-hash>
pkg/
<stem>/
<manifest-file>
trans/
<in-flight transaction files>
cfg_cache (optional repository configuration file)
Version 4 of the repository format eliminates the potential
for unintended collisions between package metadata from
different publishers and simplifies composition and decomp-
osition of repository content. The top-level is an optional
shared storage space for data common to all publishers in
the repository, while the publisher subdirectory contains
data specific to a publisher. It is essentially a nested
repository format, and can be defined as follows:
<REPO_ROOT>/
file/ (optional)
publisher/ (optional)
<prefix>/ (optional)
catalog/ (optional)
<catalog v1 files>
file/ (optional)
<first two letters of file hash>/
<file-named-by-hash>
index/ (optional)
pkg/ (optional)
<stem>/
<manifest-file-for-pkg-version>
trans/ (optional)
<in-flight transaction files>
pub.p5i (optional)
pkg5.repository (required)
By default, repository operations will store data in the
publisher-specific location found under publisher/<prefix>
for new repositories.
In the case that the top-level file/ directory is used,
automatic decomposition of contents into its publisher-
specific components will not be possible unless
corresponding package manifests are also available.
To support easy composition, filtering, and creation of
package archives, directories above marked with the text
'(optional)' must not be required. The behaviour of
consumers accessing the contents of the repository should
be as follows based on the directory accessed:
- file/
This optional directory serves as a place to store file
data for more than one publisher. Package files are
stored in gzip format using a sha1sum of the file as the
filename, and then the first two letters of the filename
as the parent directory's name.
- publisher/<prefix>/catalog/
If absent, consumers should determine the list of
packages available based on the manifest files present
in the publisher/ subdirectory. If present, consumers
should expect v1 (or newer) catalog files, or none at
all, to be contained within.
- publisher/<prefix>/file/
Consumers should always check this subdirectory first
(if present) when retrieving package file data if the
publisher is known. Package files are stored in gzip
format using a sha1sum of the file as the filename, and
then the first two letters of the filename as the parent
directory's name.
- publisher/<prefix>/index/
If absent, search functionality should be disabled for
this publisher, or a fallback to 'slow manifest-based
search' performed. If present, consumers should expect
v1 (or newer) search files, or none at all, to be con-
tained within.
- publisher/<prefix>/pkg/
If absent, search must be disabled for this publisher
even if index is present. If present, manifests are
stored in pkg(5) manifest format using the uri-encoded
version of the package FMRI as the filename, and using
the uri-encoded package FMRI stem (name) as the parent
directory's name.
- publisher/<prefix>/trans/
If absent, this directory will be created during
publication operations. If present, in progress
transaction data is stored in a directory named
by the open time of the transaction as a UTC UNIX
timestamp plus an '_' and the URI-encoded package
FMRI. As an example:
1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116
%3A20090616T181511Z
- publisher/<prefix>/pub.p5i
This pkg(5) information (p5i) file should contain
suggested configuration information for clients such as
origins, mirrors, alias, etc. Consumers can use this to
provide clients with initial or suggested configuration
information for a given publisher. If not present, the
publisher's identity should be assumed based on the
directory structure, while the refresh interval should
be assumed to be 4 hours.
- pkg5.repository
This file serves as an identifier and a place to store
configuration information specific to the repository.
It *is not* an equivalent to the existing cfg_cache
file which will no longer be used. Its format and
structure are as follows:
[repository]
version = <integer>
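As a rough illustration of the layout rules above, the following
Python sketch builds repository-relative paths for file data,
manifests, and transaction directories. The function names are
hypothetical and are not part of any pkg(5) API. The sketch assumes
that the SHA-1 hash is taken over the uncompressed file content,
that URI-encoding escapes every reserved character, and that the
manifest filename is the encoded version string, as in the example
listing later in this section.

```python
import hashlib
import posixpath
from urllib.parse import quote


def file_hash(content: bytes) -> str:
    """Return the SHA-1 hex digest used to name a stored file.

    Assumption: the hash covers the uncompressed content.
    """
    return hashlib.sha1(content).hexdigest()


def file_candidates(hash_hex: str, prefix=None):
    """Return relative paths to check for file data, in lookup order.

    The publisher-specific file/ directory is listed first when the
    publisher is known, followed by the shared top-level file/.
    """
    leaf = posixpath.join(hash_hex[:2], hash_hex)
    paths = []
    if prefix is not None:
        paths.append(posixpath.join("publisher", prefix, "file", leaf))
    paths.append(posixpath.join("file", leaf))
    return paths


def manifest_path(prefix: str, stem: str, version: str) -> str:
    """Return the relative path of a manifest for one package version."""
    return posixpath.join(
        "publisher", prefix, "pkg", quote(stem, safe=""), quote(version, safe="")
    )


def transaction_dir(prefix: str, open_time: int, fmri: str) -> str:
    """Return the relative directory for an in-progress transaction.

    open_time is the UTC UNIX timestamp at which it was opened.
    """
    name = "%d_%s" % (open_time, quote(fmri, safe=""))
    return posixpath.join("publisher", prefix, "trans", name)
```

For example, manifest_path("example.com", "package/pkg",
"0.5.11,5.11-0.136:20100327T063139Z") yields a path whose last two
components are package%2Fpkg and
0.5.11%2C5.11-0.136%3A20100327T063139Z.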
Any information found in the cfg_cache used in the previous
repository format related to a publisher is now stored in
the pub.p5i file for the related publisher. (Examples of
information include origins, mirrors, maintainer info,
etc.) As a result, the cfg_cache file is no longer used.
Any depot-specific properties, such as the feed icon, logo,
etc. are now completely managed using SMF or a user-provided
configuration file. This change was made not only to sim-
plify configuration, but to separate depot configuration
from repository configuration.
An example version 4 repository might be structured as
follows:
<REPO_ROOT>/
publisher/
example.com/
catalog/
catalog.attrs
catalog.base.C
file/
ff/
fffff277f5a8fb63e57670afc178415c2c5e706d
index/
__at_depend
...
pkg/
package%2Fpkg/
0.5.11%2C5.11-0.136%3A20100327T063139Z
trans/
1245176111_pkg%3A%2FBRCMbnx%400.5.11%2C5.11-0.116
%3A20090616T181511Z
pub.p5i
example.net/
catalog/
catalog.attrs
catalog.base.C
file/
af/
affff277f5a8fb63e57670afc178415c2c5e706d
index/
__at_depend
...
pkg/
package%2Fpkg/
0.5.11%2C5.11-0.133%3A20090327T062137Z
trans/
1245176111_pkg%3A%2FFAAMbnx%400.5.11%2C5.11-0.139
%3A20100616T181511Z
pub.p5i
pkg5.repository:
[repository]
version = 4
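The pkg5.repository content shown above is INI-style, so a consumer
could plausibly read the format version with Python's standard
configparser module. This is a minimal sketch under that assumption;
the function name is hypothetical and error handling for a missing
section or option is left to the caller.

```python
import configparser


def read_repository_version(text: str) -> int:
    """Return the integer 'version' from the [repository] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("repository", "version")
```

A caller would typically read the file's text from
<REPO_ROOT>/pkg5.repository and refuse to operate on a version it
does not understand.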
4.3. Archive Format:
4.3.1. Requirements:
The requirements for the on-disk archive format include:
- support for archives greater than 8GB in size,
- support for files in archive greater than 4GB in size,
- support for efficient storage of hard links,
- support for pathnames significantly greater than 255
characters in length,
- core Python bindings exist or can be easily created using
an existing library,
- can be a container of compressed files, as opposed to a
compressed container of uncompressed files,
- open, royalty-free, well-documented format with wide
platform support and acceptance,
- multi-threaded decompression and compression possible,
- creation and basic manipulation of package archives
possible using widely-available tools,
- simple composition and filtering of its content should be
possible, and
- random access to the archive contents must be possible
without reading the entire archive file.
4.3.2. Candidates:
A number of potential archive formats have been considered
for use, including:
- 7z (7-Zip)
- cpio
- pax (portable archive exchange format)
- ZIP
The evaluations provided for each format here are not in-
tended to be exhaustive; rather they focus on the specific
requirements of this project. For more information about
these formats, and the documents used to evaluate them,
please refer to section 6 of this proposal.
4.3.3. 7z Evaluation:
The 7z format was rejected for the following reasons:
- Does not permit random access to archive contents or
requires the entire archive file to access the contents
and adding this would require a custom variation of 7z.
- Although the 7z format supports compression methods other
than LZMA, a primary motivator for using 7z would be the
ability to use LZMA natively as part of the container
format. However, the tradeoffs in terms of CPU and memory
footprint currently make LZMA unsuitable for pkg(5) when
compared to other compression algorithms such as those
used by gzip(1).
- Use of the 7z format would require integration of the LZMA
SDK (which also provides a basic 7z API in C) and the cre-
ation of python bindings or the integration of a third
party's (such as pylzma).
- No native support for extended attributes or UNIX owner/
group permissions.
4.3.4. cpio Evaluation:
The cpio format doesn't natively support random access to
archive contents, but the format itself doesn't prevent
this. An index could be added as the first file in the archive
with the information needed to provide fast, random access
to the archive contents.
The cpio format was rejected for the following reasons:
- The length of pathnames in cpio archives is limited to
256 characters for the portable format.
- Available tools vary significantly in maximum archive size
support.
- The portable cpio format stores a copy of the file data
with every hard link in an archive instead of simply
storing a pointer to the source file in the archive.
4.3.4. PAX Evaluation:
The PAX format meets all of the requirements except that of
random access to archive contents. However, the format
itself doesn't prevent this. A table of contents file could
be supplied as the first file in the archive with the info-
rmation needed to provide fast, random access to the con-
tainer contents.
4.3.5. ZIP Evaluation:
The ZIP format meets all of the requirements listed above
(assuming that ZIP64 extensions are used), with the ex-
ceptions listed below for which it was rejected:
- The use or implementation of some of the functionality
documented in the .ZIP file format requires a license from
PKWARE.
- While random archive content access is possible, the ZIP
file format stores the index for the archive at the end of
the archive (as opposed to the beginning). This increases
the number of round trips that would be required for
potential remote random content access. It also means
that extraction requires multiple seeks to the end of the
file before any content can be extracted from the archive,
which can be detrimental to performance for some media
types (optical, etc.).
4.3.6. Evaluation Conclusion:
Based on the requirements set forth in section 4.3.1, the
PAX format was selected as the on-disk archive format
for pkg(5) packages. However, to enable efficient access
to the archive contents, an index file needs to be present
as the first file in the archive.
Early evaluations of an unoptimised prototype were performed
using a repository containing all packages for build 136 and
unbundleds. The on-disk size of the repository was approx-
imately 4.98G. The resulting archive was 5.0G in size, with
an archive index file 9.7M in size (when the index was comp-
ressed using gzip).
First time access to the prototype archive for extraction of
a single file after creation yielded a total time of approx-
imately 5 seconds compared to approximately 36-42 seconds
for utilities such as pax(1), tar(1), or gtar(1).
Creation of the archive took 7 minutes, 35 seconds on a
custom-built Intel Core 2 DUO E8400, with 8GB Memory,
and a 1TB 10000 RPM SATA Drive w/ 64MB Cache.
4.3.7. Package Archive Specification:
pkg(5) archive files will have an extension of 'p5p' which
will stand for 'pkg(5) package'. The format of these
archives matches that defined by IEEE Std 1003.1, 2004 for
the pax Interchange Format, with the exception that the
first archive entry is tagged with an extended pax archive
header that specifies the archive version and the version
of the pkg(5) API that was used to write it. In addition,
the file for the first archive entry must be the index
file for the package archive. The layout can be
visualised as follows:
.--------------------------------------------------------.
| ustar header for pax header global archive data |
.--------------------------------------------------------.
| pax global extended header data for archive |
.--------------------------------------------------------.
| ustar header for pax header for archive index file |
.--------------------------------------------------------.
| pax extended header data for archive index file |
.--------------------------------------------------------.
| ustar header for package archive index file |
.--------------------------------------------------------.
| file data for package archive index file |
.--------------------------------------------------------.
| remaining archive data |
.________________________________________________________.
The archive and API versions are stored in the header of the
index file instead of the global header for two reasons:
first, any headers in the global header are treated as
though they apply to every entry in the archive, and
secondly, the pax specification states that global headers
should not be used with interchange media that could suffer
partial data loss during transport. Since the archive
version primarily serves as a way for clients to reliably
determine if a "standard" pax archive versus one with an
index is being read, this approach seems reasonable.
The reason for this limitation is to ensure that clients
performing selective archive extraction can be guaranteed
to find the location and size of the package archive index
file without knowing the size of the header for the index
file in advance (this layout ensures that clients can
find the archive index and/or identify the archive in
the first 2048 bytes).
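To make the layout concrete, the following sketch uses Python's
standard tarfile module to write a pax archive whose first entry is
a gzip-compressed index file carrying a per-entry extended header,
and then reads that entry back. It is illustrative only: the header
keyword "example.archive.version" is a placeholder, since this
proposal does not name the actual keywords, the function names are
hypothetical, and no global extended header is written.

```python
import gzip
import io
import tarfile

# Placeholder keyword; the real keyword names are not given here.
VERSION_KEY = "example.archive.version"


def build_archive(index_bytes: bytes) -> bytes:
    """Write a pax archive with the index file as its first entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        payload = gzip.compress(index_bytes)
        info = tarfile.TarInfo("p5p.index.0.v0.gz")
        info.size = len(payload)
        # Stored in a pax extended header that applies to this entry only.
        info.pax_headers = {VERSION_KEY: "0"}
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_first_entry(archive: bytes):
    """Return (name, pax headers, decompressed data) of the first entry."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        first = tf.next()
        data = gzip.decompress(tf.extractfile(first).read())
        return first.name, dict(first.pax_headers), data
```

A client could treat the archive as a standard pax archive whenever
the first entry's name or version header is not what it expects.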
In addition, pkg(5) archives in this format make remote,
selective archive access possible. For example, a client
could request the first 2048 bytes of a pkg(5) archive file
from a remote repository, identify the offsets of the index
and then retrieve it using an HTTP/1.1 byte-range request.
Once it has the archive index file, it can then perform
additional byte-range requests to selectively transfer the
data for a set of specific files from the archive. This
convention also optimises access to the archive for sources
that are heavily biased towards sequential reads.
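The byte-range arithmetic implied by this access pattern can be
sketched as follows. The helper name and its parameters are
hypothetical; it assumes the client already knows the absolute
position at which the index file's last block ends (index_end), and
that entry offsets are relative to that position as described in
section 4.3.8. HTTP byte ranges are inclusive at both ends.

```python
# Range header value for the initial probe of the first 2048 bytes.
PROBE_RANGE = "bytes=0-2047"


def entry_byte_range(index_end: int, offset: int, entry_size: int) -> str:
    """Return a Range header value covering one whole archive entry.

    index_end:  absolute offset just past the index file's last block
    offset:     entry offset from the index, relative to index_end
    entry_size: size of the entry including its archive headers
    """
    if entry_size <= 0:
        raise ValueError("entry_size must be positive")
    start = index_end + offset
    end = start + entry_size - 1  # inclusive end of the range
    return "bytes=%d-%d" % (start, end)
```

The returned string would be sent as the value of a Range request
header; a server that honours it replies with status 206 and only
the requested bytes.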
The index file must be named using the following template
and be compressed using the gzip format described by RFCs
1951 and 1952, and formatted according to section 4.3.8:
p5p.index.<index_file_number>.v<index_version>.gz
<index_file_number> is an integer in string form that
indicates which index file this is. The number only
exists so that each index file can remain unique in
the archive. An archive may contain multiple index
files to support fast archive additions.
<index_version> is an integer in string form that
indicates the version of the index file. The initial
version for this proposal will be '0'.
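As a small sketch of how a consumer might recognise index files by
name, the template above can be matched with a regular expression.
The function name is hypothetical, and the pattern assumes both
numbers are written as plain decimal digits.

```python
import re

# p5p.index.<index_file_number>.v<index_version>.gz
INDEX_NAME = re.compile(r"^p5p\.index\.(\d+)\.v(\d+)\.gz$")


def parse_index_name(name: str):
    """Return (index_file_number, index_version), or None if no match."""
    match = INDEX_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A client that receives None, or a version it does not support, would
fall back to treating the archive as a standard pax archive.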
However, if the first file in the archive is found to not
use the layout or format shown above, or any of the index
files in the archive are not in a format supported by the
client (version too old or too new), the archive must be
treated as a standard pax archive and some operations may
not be possible or experience degraded performance. The
same is also true if the index file is found to not match
the archive contents.
All entries in the archive (excluding any archive index
files) must conform to the repository layout specified in
section 4.2.2 of this proposal.
Since a pkg(5) repository can contain one or more packages,
pkg(5) archive files can also contain the data for one or
more packages. This allows easy redistribution of a single
package and all of its dependencies in a single file.
Finally, it should be noted that only ascii character path-
names are expected in the archive as the raw repository
format does not use or support unicode pathnames.
4.3.8. Package Archive Index Specification:
The pkg(5) archive index file enables fast, efficient access
to the contents of an archive. It contains an entry for all
files in the archive excluding the index file itself in the
following format (also referred to as index format version
0):
<name>NUL<offset>NUL<entry_size>NUL<size>NUL<typeflag>
NULNL
<name> is a string containing the pathname of the file
in the archive using only ascii characters. It can be
up to 65,535 bytes in length.
<offset> is an unsigned long long integer in string form
containing the relative offset in bytes of the first
header block for the file in the archive. The offset is
relative to the end of the last block of the index file
in the archive they are listed in.
<entry_size> is an unsigned long long integer in string
form containing the size of the file's entry in bytes
in the archive (including archive headers and trailers
for the entry).
<size> is an unsigned long long integer in string form
containing the size of the file in bytes in the archive.
<typeflag> is a single character representing the type
of the file in the archive. Possible values are:
0 Regular File
1 Hard Link
2 Symbolic Link
5 Directory or subdirectory
All values not listed above are reserved for future
use. Unrecognised values should be treated as a
regular file.
An example set of entries would appear as follows:
pkg5.repositoryNUL0NUL2560NUL546NUL0NUL
pkgNUL2560NUL1536NUL0NUL5NUL
pkg/service%2Ffault-managementNUL4096NUL1536NUL0NUL5NUL
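A minimal parsing sketch for index format version 0 follows. It
takes the field order from the format line above (name, offset,
entry_size, size, typeflag), operates on the already-decompressed
index data, and maps unrecognised type flags to a regular file as
specified. The class and function names are hypothetical, and the
sketch assumes pathnames never contain a NUL followed by a newline.

```python
from typing import Iterator, NamedTuple

NUL = b"\x00"
KNOWN_TYPEFLAGS = {"0", "1", "2", "5"}


class IndexEntry(NamedTuple):
    name: str
    offset: int      # relative to the end of the index file's last block
    entry_size: int  # size of the entry including archive headers
    size: int        # size of the file data
    typeflag: str


def parse_index(data: bytes) -> Iterator[IndexEntry]:
    """Yield one IndexEntry per record in decompressed index data."""
    for record in data.split(NUL + b"\n"):
        if not record:
            continue  # skip the empty piece after the final terminator
        name, offset, entry_size, size, typeflag = record.split(NUL)
        flag = typeflag.decode("ascii")
        if flag not in KNOWN_TYPEFLAGS:
            flag = "0"  # unrecognised values are treated as regular files
        yield IndexEntry(
            name.decode("ascii"), int(offset), int(entry_size), int(size), flag
        )
```

Because the function is a generator, callers can stop as soon as
they find the entries they need rather than materialising the whole
index.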
It should be noted that other possible formats were
evaluated for the index file, including those based
on: JSON, XDR, and python's pack. However, all other
formats were found to be deficient for one or more
of the following reasons:
- larger in size
- no streaming support (required entire index file be
loaded into memory)
- significantly greater parsing times using currently
available Python libraries
- required developing an envelope format that could
contain the encoded data
5. Proposed Changes:
5.1. Client Support for filesystem-based Repository Access:
The pkg.client.api provided by pkg(5) will be updated to allow
access to repositories via the filesystem. All functionality
normally offered by pkg.depotd will be supported.
pkg(1) and packagemanager(1) will be modified to support the
use of URIs using the 'file' scheme. No user visible changes
will be made to any existing subcommands or options except
that URIs using the 'file' scheme will be allowed.
When accessing repositories using the 'file' scheme, clients
by default will not copy package file data into the client's
cache (e.g. /var/pkg/download). Instead, the transport system
will treat configured repositories as an additional read-only
cache.
5.2. Depot Storage, Client Transport and Publication Tool Update:
The pkg.server.repository module will be updated to support
the new repository format outlined in section 4.2.2. Existing
repositories will not automatically be upgraded, while new
repositories will use the new format. A new administrative
command detailed below has been introduced to allow upgrading
existing repositories to the new format.
These changes will automatically allow the client to access
repositories in the new format when using filesystem-based
access. Older clients will remain unable to access repo-
sitories in the new format.
The client transport system will be updated to support all
publication operations and the publication tools and project
private APIs will be changed to use the client transport
system.
The '-d' option of pkgrecv(1) will be changed such that if
the name of a file with a '.p5p' extension is specified,
and that file does not already exist, a pkg(5) archive
file will be created containing the specified packages.
If the file already exists, pkgrecv(1) will exit with an error.
When pkgrecv(1) creates pkg(5) archive files, it will omit
catalog and index data.
Due to the transport changes above, pkgrecv(1) will also
be able to use pkg(5) archive files as a source of package
data. pkgsend(1) will not support the use of pkg(5)
archive files as a destination due to the publication
model it currently uses.
To support the expanded multiple publisher version 4 format
of repositories, the depot server will be updated to respond
to requests as follows:
- If clients include the publisher prefix as part of the request
path, then responses will be for that specific publisher's
data. For example:
http://localhost/dev/opensolaris.org/manifest/
0/opensolaris.org/backup%2Fareca/7.1%2C5.11-0.134
%3A20100302T005731Z
http://localhost/dev/file/0/opensolaris.org/
2ce6c746c85cd7ac44571d094b53c5fe1bfc32c8
- The default publisher specified in the depot configuration
will be used when responding to requests for operations that
do not include the publisher prefix. For example:
http://localhost/dev/manifest/0/
backup%2Fareca/7.1%2C5.11-0.134%3A20100302T005731Z
...provides a response identical to the first case where the
publisher prefix was provided as part of the request. Those
expecting to maintain a large population of older clients
should reassign publisher URLs down a level, to include the
publisher explicitly although this is not required for
correct operation.
A new utility named pkgrepo will be added to facilitate the
creation and management of pkg(5) repositories. It will have
the following global options:
-s repo_uri_or_path
A URI or path specifying the location of a pkg(5)
package repository.
-? / --help
It will have the following subcommands:
create <uri_or_path>
Creates a pkg(5) repository at the specified location.
Can only be used with filesystem-based repositories.
publisher [<pub_prefix> ...]
Lists the publishers of packages in the repository:
PUBLISHER PACKAGES VERSIONS UPDATED
<pub_1> <num_uniq_pkgs> <num_pkg_vers> <cat_last_modified>
<pub_2> <num_uniq_pkgs> <num_pkg_vers> <cat_last_modified>
...
rebuild
Discards any catalog, search or other cached information
found in the repository and then re-creates it based on
the current contents of the repository. Can only be used
with filesystem-based repositories.
refresh
By default, catalogs any new packages found in the repo-
sitory and updates search indices. This is intended for
use with deferred publication (--no-catalog or --no-index
options of pkgsend). Can only be used with filesystem-based
repositories.
Options:
--no-catalog - does not catalog new packages
--no-index - does not update search indices
remove fmri_pattern ...
Removes the specified package(s) from the repository.
If more than one match is found for any given pattern,
the exact FMRI must be provided.
upgrade
Can only be used with filesystem-based repositories.
Upgrades the repository to the most current format if
possible.
Has these options:
-n determine whether the upgrade could be performed and exit
-v show a summary of what will be done, the current format
of the repository and what it will be upgraded to
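The ambiguity rule of the remove subcommand above (a pattern that
matches more than one package must be replaced by the exact FMRI) can
be sketched as follows. This is a minimal illustration, not the
pkgrepo implementation: the function name resolve_removals is
hypothetical, and shell-style globbing via fnmatch is an assumption
standing in for real FMRI pattern matching.

```python
from fnmatch import fnmatchcase


def resolve_removals(patterns, fmris):
    """Return the single FMRI selected by each pattern.

    Assumption: patterns are shell-style globs. A pattern that selects
    more than one FMRI is rejected, so the caller has to supply the
    exact FMRI instead.
    """
    selected = []
    for pattern in patterns:
        matches = [f for f in fmris if fnmatchcase(f, pattern)]
        if not matches:
            raise LookupError("no package matches %r" % pattern)
        if len(matches) > 1:
            raise ValueError(
                "%r matches %d packages; provide the exact FMRI"
                % (pattern, len(matches)))
        selected.append(matches[0])
    return selected


if __name__ == "__main__":
    repo = [
        "pkg://example.com/foo@1.0-0.133",
        "pkg://example.com/foo@2.0-0.133",
        "pkg://example.com/bar@1.0-0.133",
    ]
    print(resolve_removals(["*bar*"], repo))
    try:
        resolve_removals(["*foo*"], repo)
    except ValueError as err:
        print(err)
```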
5.3. Client Storage and Image Format Update:
To simplify and unify the storage format used by the client
and by pkg(5) repositories, the format of the client image
will be changed to use the structure described below.
For a version 3 image (the current format), the structure is as
follows:
<IMG_ROOT>
    download/
        <first two letters of file hash>/
            <file-named-by-hash>
    file/
    gui_cache/
    history/
    index/
    lost+found/
    pkg/
        <stem>/
            <version>/
                manifest
                manifest.<cachefiles>
    publisher/
        <prefix>/
            catalog/
            certs/ (optional)
            last_refreshed (optional)
    state/
        installed/
            <image catalog files>
        known/
            <image catalog files>
    tmp/
    cfg_cache
    lock
For a version 4 image (the proposed format), the structure is
as follows:
<IMG_ROOT>
    cache/
        index/
            <api search index files>
        publisher/
            <publisher_prefix>/
                catalog/
                    <repository composition cache files>
                pkg/
                    <stem>/
                        <version>/
                            <manifest-cache-files>
        tmp/
            <api temporary files>
    gui_cache/
        <package manager data files>
    history/
        <client history files>
    license/
        <stem>/
            <license files>
    lost+found/
        <salvaged filesystem objects>
    publisher/
        <prefix>/
            certs/
                <publisher signing certificates>
            <otherwise as described in section 4.2.2>
    ssl/
        <client ssl certificates>
    state/
        installed/
            <image catalog files>
        known/
            <image catalog files>
    pkg5.image (client configuration file; was cfg_cache)
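As an illustration of the proposed version 4 layout, the following
Python sketch creates an empty skeleton of the top-level structure.
It assumes that index/, publisher/ and tmp/ are nested under cache/,
and that installed/ and known/ are nested under state/. The names
V4_DIRECTORIES and create_v4_skeleton are hypothetical, and
per-publisher and per-package directories are left out because they
depend on image content.

```python
import tempfile
from pathlib import Path

# Directories of a version 4 image, relative to <IMG_ROOT>.
# Assumption: the nesting below reflects the intended tree.
V4_DIRECTORIES = (
    "cache/index",
    "cache/publisher",
    "cache/tmp",
    "gui_cache",
    "history",
    "license",
    "lost+found",
    "publisher",
    "ssl",
    "state/installed",
    "state/known",
)

# Client configuration file (named cfg_cache in version 3 images).
CONFIG_FILE = "pkg5.image"


def create_v4_skeleton(img_root):
    """Create the empty directory skeleton and configuration file."""
    root = Path(img_root)
    for rel in V4_DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)
    (root / CONFIG_FILE).touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = create_v4_skeleton(tmp)
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```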
A new property named 'version' will be added to the image
and will be readonly (cannot be set using the set-property
subcommand of pkg(1)).
Existing images will not automatically be upgraded to the new
format. To enable the upgrading of existing images to newer
formats, the following subcommand will be added:
update-format
Updates the format of the client's image to the current
format if possible.
5.4. Client and Depot Support for On-Disk Archive Format:
The pkg.server.repository module will be updated to support
the serving of a repository in readonly mode using a pkg(5)
archive file.
The pkg.client.api transport system will be updated to support
the usage of a pkg(5) archive file as an origin for package
data.
To support the specification of temporary origins, the install
and update subcommands will be modified by adding a '-g' option
to specify additional temporary package origin URIs or
the path to a pkg(5) archive file or pkg(5) info file. The
'-g' option may be specified multiple times. As an example:
$ pkg install -g /path/to/foo.p5p \
-g http://mytemprepo:10000/ \
-g file:/path/to/bar.p5p \
foo bar localpkg
pkg(5) archive files used as a source of package data during an
install or update operation will have their content cached by
the client before the operation begins. Any publishers found
in the archive will be temporarily added to the image if they do
not already exist. Publishers that were temporarily added but
not used during the operation will be removed once the operation
completes or fails. Any package FMRIs or patterns provided
will be matched using only the sources provided using '-g'.
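The handling of temporarily added publishers described above amounts
to two set differences. The sketch below is only an illustration of
that bookkeeping; the function name plan_temporary_publishers and its
parameters are hypothetical and are not part of pkg.client.api.

```python
def plan_temporary_publishers(image_publishers, archive_publishers,
                              used_publishers):
    """Work out which publishers are added and later removed.

    Returns a tuple (added, removed) of sorted lists:
      added   - publishers found in the archive that the image lacks
      removed - temporarily added publishers that the operation never
                used, dropped once the operation completes or fails
    """
    added = set(archive_publishers) - set(image_publishers)
    removed = added - set(used_publishers)
    return sorted(added), sorted(removed)


if __name__ == "__main__":
    added, removed = plan_temporary_publishers(
        image_publishers=["example.com"],
        archive_publishers=["example.com", "example.org", "example.net"],
        used_publishers=["example.org"],
    )
    print("temporarily added:", added)
    print("removed afterwards:", removed)
```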
The pkg list and pkg info commands will also be updated by
adding the '-g' option described above, with the exception
that the '-g' option may only be specified once, and only
the source named will be used for the operation.
Using '-g' with the pkg list subcommand implies '-n' by default,
unless '-f' is specified; it also implies '-a'. To list all
versions, the '-f' option must be used. As an example:
$ pkg list -g /path/to/foo.p5p
NAME (PUBLISHER) VERSION STATE UFOXI
bar (example.com) 1.0-0.133 known -----
foo (example.com) 1.0-0.133 installed -----
$ pkg list -g file:/path/to/foo.p5p
NAME (PUBLISHER) VERSION STATE UFOXI
bar (example.com) 1.0-0.133 known -----
foo (example.com) 1.0-0.133 installed -----
$ pkg list -f -g http://example.com/multi_foo.p5p
NAME (PUBLISHER) VERSION STATE UFOXI
foo (example.com) 1.0-0.133 installed u----
foo (example.com) 2.0-0.133 known u----
foo (example.com) 3.0-0.133 known -----
$ pkg list -g file:/path/to/repo
NAME (PUBLISHER) VERSION STATE UFOXI
repopkg (example.com) 2.0-0.133 known -----
$ pkg list -g http://myrepo:10000
NAME (PUBLISHER) VERSION STATE UFOXI
localpkg (example.org) 3.0-0.133 known -----
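The implied options for pkg list can be summarised as a small rule
set. The sketch below assumes options are represented as a set of
flag strings; the function name effective_list_options is
hypothetical and this is not how the pkg(1) client parses arguments.

```python
def effective_list_options(options):
    """Return the option set after applying what '-g' implies.

    With '-g': '-a' is always implied, and '-n' is implied unless
    '-f' was given. Without '-g' the options are returned unchanged.
    """
    effective = set(options)
    if "-g" in effective:
        effective.add("-a")
        if "-f" not in effective:
            effective.add("-n")
    return effective


if __name__ == "__main__":
    print(sorted(effective_list_options({"-g"})))
    print(sorted(effective_list_options({"-g", "-f"})))
```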
Using '-g' with the pkg info subcommand implies '-r'. The '-l'
option cannot be used in combination with '-g'. As an example:
$ pkg info -g /path/to/bundle.p5p
Name: bar
Summary: A useful complement to foo.
State: Not Installed
...
Name: foo
Summary: Provides useful utilities.
State: Installed
...
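The corresponding rule for pkg info can be sketched in the same way:
'-g' adds '-r', and combining it with '-l' is rejected. As before,
the set-of-flags representation and the function name
effective_info_options are assumptions made for illustration.

```python
def effective_info_options(options):
    """Apply the '-g' rules for pkg info to a set of flag strings."""
    effective = set(options)
    if "-g" in effective:
        if "-l" in effective:
            raise ValueError("'-l' cannot be used in combination with '-g'")
        effective.add("-r")
    return effective


if __name__ == "__main__":
    print(sorted(effective_info_options({"-g"})))
    try:
        effective_info_options({"-g", "-l"})
    except ValueError as err:
        print(err)
```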
'-g' was chosen for the option usage described above to match
the '-g' already used by set-publisher and image-create for
origins, and due to the unfortunate existing usage of '-s'
by the 'pkg list' subcommand.
6. Reference Documents:
Project team members and community members have provided a number of
informal comments that served as the basis for the goals of this