\chapter{Basic Facilities of a Virtio Device}\label{sec:Basic Facilities of a Virtio Device}
A virtio device is discovered and identified by a bus-specific method
(see the bus specific sections: \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}~\nameref{sec:Virtio Transport Options / Virtio Over PCI Bus},
\ref{sec:Virtio Transport Options / Virtio Over MMIO}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO} and \ref{sec:Virtio Transport Options / Virtio Over Channel I/O}~\nameref{sec:Virtio Transport Options / Virtio Over Channel I/O}). Each
device consists of the following parts:
\begin{itemize}
\item Device status field
\item Feature bits
\item Notifications
\item Device Configuration space
\item One or more virtqueues
\end{itemize}
\section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Device / Device Status Field}
During device initialization by a driver,
the driver follows the sequence of steps specified in
\ref{sec:General Initialization And Device Operation / Device
Initialization}.
The \field{device status} field provides a simple low-level
indication of the completed steps of this sequence.
It's most useful to imagine it hooked up to traffic
lights on the console indicating the status of each device. The
following bits are defined (listed below in the order in which
they would be typically set):
\begin{description}
\item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the
device and recognized it as a valid virtio device.
\item[DRIVER (2)] Indicates that the guest OS knows how to drive the
device.
\begin{note}
There could be a significant (or infinite) delay before setting
this bit. For example, under Linux, drivers can be loadable modules.
\end{note}
\item[FAILED (128)] Indicates that something went wrong in the guest,
and it has given up on the device. This could be an internal
error, or the driver didn't like the device for some reason, or
even a fatal error during device operation.
\item[FEATURES_OK (8)] Indicates that the driver has acknowledged all the
features it understands, and feature negotiation is complete.
\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
drive the device.
\item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
an error from which it can't recover.
\end{description}
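As a non-normative illustration, the bit values listed above can be written down as C constants. The constant names and the two helper functions are hypothetical and are not defined by this specification; only the numeric values come from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits; numeric values as listed above.
 * The VIRTIO_STATUS_ prefix is a hypothetical naming choice. */
enum {
    VIRTIO_STATUS_ACKNOWLEDGE        = 1,
    VIRTIO_STATUS_DRIVER             = 2,
    VIRTIO_STATUS_DRIVER_OK          = 4,
    VIRTIO_STATUS_FEATURES_OK        = 8,
    VIRTIO_STATUS_DEVICE_NEEDS_RESET = 64,
    VIRTIO_STATUS_FAILED             = 128,
};

/* Hypothetical helper: a driver only adds bits to the current value,
 * it never clears one (clearing is done by resetting the device). */
static inline uint8_t status_set(uint8_t current, uint8_t bits)
{
    return (uint8_t)(current | bits);
}

/* Hypothetical helper: DRIVER_OK is set and neither error bit is set. */
static inline bool status_is_operational(uint8_t status)
{
    return (status & VIRTIO_STATUS_DRIVER_OK) &&
           !(status & (VIRTIO_STATUS_FAILED |
                       VIRTIO_STATUS_DEVICE_NEEDS_RESET));
}
```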
\drivernormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
The driver MUST update \field{device status},
setting bits to indicate the completed steps of the driver
initialization sequence specified in
\ref{sec:General Initialization And Device Operation / Device
Initialization}.
The driver MUST NOT clear a
\field{device status} bit. If the driver sets the FAILED bit,
the driver MUST later reset the device before attempting to re-initialize.
The driver SHOULD NOT rely on completion of operations of a
device if DEVICE_NEEDS_RESET is set.
\begin{note}
For example, the driver can't assume requests in flight will be
completed if DEVICE_NEEDS_RESET is set, nor can it assume that
they have not been completed. A good implementation will try to
recover by issuing a reset.
\end{note}
\devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
The device MUST initialize \field{device status} to 0 upon reset.
The device MUST NOT consume buffers or send any used buffer
notifications to the driver before DRIVER_OK.
\label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
that requires a reset. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
MUST send a device configuration change notification to the driver.
\section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
Each virtio device offers all the features it understands. During
device initialization, the driver reads this and tells the device the
subset that it accepts. The only way to renegotiate is to reset
the device.
This allows for forwards and backwards compatibility: if the device is
enhanced with a new feature bit, older drivers will not write that
feature bit back to the device. Similarly, if a driver is enhanced with a feature
that the device doesn't support, it sees that the new feature is not offered.
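The negotiation described above amounts to a set intersection. The following is a minimal, non-normative sketch; it assumes the feature set fits in a single 64-bit mask (the specification allows higher bit numbers), and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The subset the driver writes back: features that are both offered
 * by the device and understood by the driver. */
static inline uint64_t negotiate_features(uint64_t offered,
                                          uint64_t understood)
{
    return offered & understood;
}

/* True if every accepted bit was actually offered by the device. */
static inline bool features_valid(uint64_t offered, uint64_t accepted)
{
    return (accepted & ~offered) == 0;
}
```

An older driver that does not understand a newly added bit simply never writes it back, which is what gives the compatibility property described above.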
Feature bits are allocated as follows:
\begin{description}
\item[0 to 23] Feature bits for the specific device type
\item[24 to 37] Feature bits reserved for extensions to the queue and
feature negotiation mechanisms
\item[38 and above] Feature bits reserved for future extensions.
\end{description}
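The allocation above can be expressed as a small classification function. This is an illustrative sketch only; the enum and function names are hypothetical.

```c
/* Hypothetical classification of a feature bit number, following
 * the ranges listed above. */
enum feature_class {
    FEATURE_CLASS_DEVICE,     /* bits 0 to 23: device-type specific */
    FEATURE_CLASS_EXTENSION,  /* bits 24 to 37: queue and negotiation */
    FEATURE_CLASS_RESERVED,   /* bits 38 and above: future extensions */
};

static inline enum feature_class feature_bit_class(unsigned int bit)
{
    if (bit <= 23)
        return FEATURE_CLASS_DEVICE;
    if (bit <= 37)
        return FEATURE_CLASS_EXTENSION;
    return FEATURE_CLASS_RESERVED;
}
```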
\begin{note}
For example, feature bit 0 for a network device (i.e.
Device ID 1) indicates that the device supports checksumming of
packets.
\end{note}
In particular, new fields in the device configuration space are
indicated by offering a new feature bit.
\drivernormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio Device / Feature Bits}
The driver MUST NOT accept a feature which the device did not offer,
and MUST NOT accept a feature which requires another feature which was
not accepted.
The driver SHOULD go into backwards compatibility mode
if the device does not offer a feature it understands, otherwise MUST
set the FAILED \field{device status} bit and cease initialization.
\devicenormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio Device / Feature Bits}
The device MUST NOT offer a feature which requires another feature
which was not offered. The device SHOULD accept any valid subset
of features the driver accepts, otherwise it MUST fail to set the
FEATURES_OK \field{device status} bit when the driver writes it.
If a device has successfully negotiated a set of features
at least once (by accepting the FEATURES_OK \field{device
status} bit during device initialization), then it SHOULD
NOT fail re-negotiation of the same set of features after
a device or system reset. Failure to do so would interfere
with resuming from suspend and error recovery.
\subsection{Legacy Interface: A Note on Feature
Bits}\label{sec:Basic Facilities of a Virtio Device / Feature
Bits / Legacy Interface: A Note on Feature Bits}
Transitional Drivers MUST detect Legacy Devices by detecting that
the feature bit VIRTIO_F_VERSION_1 is not offered.
Transitional devices MUST detect Legacy drivers by detecting that
VIRTIO_F_VERSION_1 has not been acknowledged by the driver.
In this case the device is used through the legacy interface.
Legacy interface support is OPTIONAL.
Thus, both transitional and non-transitional devices and
drivers are compliant with this specification.
Requirements pertaining to transitional devices and drivers
are contained in sections named 'Legacy Interface' like this one.
When the device is used through the legacy interface, transitional
devices and transitional drivers MUST operate according to the
requirements documented within these legacy interface sections.
Specification text within these sections generally does not apply
to non-transitional devices.
\section{Notifications}\label{sec:Basic Facilities of a Virtio Device
/ Notifications}
The notion of sending a notification (driver to device or device
to driver) plays an important role in this specification. The
modus operandi of the notifications is transport specific.
There are three types of notifications:
\begin{itemize}
\item configuration change notification
\item available buffer notification
\item used buffer notification.
\end{itemize}
Configuration change notifications and used buffer notifications are sent
by the device; the recipient is the driver. A configuration change
notification indicates that the device configuration space has changed; a
used buffer notification indicates that a buffer may have been made used
on the virtqueue designated by the notification.
Available buffer notifications are sent by the driver; the recipient is
the device. This type of notification indicates that a buffer may have
been made available on the virtqueue designated by the notification.
The semantics, the transport-specific implementations, and other
important aspects of the different notifications are specified in detail
in the following chapters.
Most transports implement notifications sent by the device to the
driver using interrupts. Therefore, in previous versions of this
specification, these notifications were often called interrupts.
Some names defined in this specification still retain this interrupt
terminology. Occasionally, the term event is used to refer to
a notification or a receipt of a notification.
\section{Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
Device configuration space is generally used for rarely-changing or
initialization-time parameters. Where configuration fields are
optional, their existence is indicated by feature bits. Future
versions of this specification will likely extend the device
configuration space by adding extra fields at the tail.
\begin{note}
The device configuration space uses the little-endian format
for multi-byte fields.
\end{note}
Each transport also provides a generation count for the device configuration
space, which will change whenever there is a possibility that two
accesses to the device configuration space can see different versions of that
space.
\drivernormative{\subsection}{Device Configuration Space}{Basic Facilities of a Virtio Device / Device Configuration Space}
Drivers MUST NOT assume that reads from
fields greater than 32 bits wide are atomic, nor that reads from
multiple fields are atomic as a group. Drivers SHOULD read device configuration space fields like so:
\begin{lstlisting}
u32 before, after;
do {
        before = get_config_generation(device);
        // read config entry/entries.
        after = get_config_generation(device);
} while (after != before);
\end{lstlisting}
For optional configuration space fields, the driver MUST check that the
corresponding feature is offered before accessing that part of the configuration
space.
\begin{note}
See section \ref{sec:General Initialization And Device Operation / Device Initialization} for details on feature negotiation.
\end{note}
Drivers MUST
NOT limit structure size and device configuration space size. Instead,
drivers SHOULD only check that device configuration space is {\em large enough} to
contain the fields necessary for device operation.
\begin{note}
For example, if the specification states that device configuration
space 'includes a single 8-bit field' drivers should understand this to mean that
the device configuration space might also include an arbitrary amount of
tail padding, and accept any device configuration space size equal to or
greater than the specified 8-bit size.
\end{note}
\devicenormative{\subsection}{Device Configuration Space}{Basic Facilities of a Virtio Device / Device Configuration Space}
The device MUST allow reading of any device-specific configuration
field before FEATURES_OK is set by the driver. This includes fields which are
conditional on feature bits, as long as those feature bits are offered
by the device.
\subsection{Legacy Interface: A Note on Device Configuration Space endian-ness}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: A Note on Configuration Space endian-ness}
Note that for legacy interfaces, device configuration space is generally the
guest's native endian, rather than PCI's little-endian.
The correct endian-ness is documented for each device.
\subsection{Legacy Interface: Device Configuration Space}\label{sec:Basic Facilities of a Virtio Device / Device Configuration Space / Legacy Interface: Device Configuration Space}
Legacy devices did not have a configuration generation field, thus are
susceptible to race conditions if configuration is updated. This
affects the block \field{capacity} (see \ref{sec:Device Types /
Block Device / Device configuration layout}) and
network \field{mac} (see \ref{sec:Device Types / Network Device /
Device configuration layout}) fields;
when using the legacy interface, drivers SHOULD
read these fields multiple times until two reads generate a consistent
result.
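A minimal, non-normative sketch of the repeated-read approach follows. The callback type and the in-memory stand-in used to exercise it are hypothetical; a real driver would read the field through its transport.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: reads one configuration field. */
typedef uint64_t (*config_read_fn)(void *ctx);

/* Read the field repeatedly until two consecutive reads agree. */
static uint64_t read_until_stable(config_read_fn read_field, void *ctx)
{
    uint64_t prev = read_field(ctx);

    for (;;) {
        uint64_t cur = read_field(ctx);

        if (cur == prev)
            return cur;
        prev = cur;
    }
}

/* Stand-in for a field that changes while it is being read:
 * returns a fixed sequence, then keeps repeating the last value. */
struct fake_field {
    const uint64_t *values;
    size_t count;
    size_t pos;
};

static uint64_t fake_read(void *ctx)
{
    struct fake_field *f = ctx;
    uint64_t v = f->values[f->pos];

    if (f->pos + 1 < f->count)
        f->pos++;
    return v;
}
```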
\section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
The mechanism for bulk data transport on virtio devices is
pretentiously called a virtqueue. Each device can have zero or more
virtqueues\footnote{For example, the simplest network device has one virtqueue for
transmit and one for receive.}.
The driver makes requests available to the device by adding
an available buffer to the queue, i.e., adding a buffer
describing the request to a virtqueue, and optionally triggering
a driver event, i.e., sending an available buffer notification
to the device.
The device executes the requests and, when complete, adds
a used buffer to the queue, i.e., lets the driver
know by marking the buffer as used. The device can then trigger
a device event, i.e., send a used buffer notification to the driver.
The device reports the number of bytes it has written to memory for
each buffer it uses. This is referred to as ``used length''.
The device is not generally required to use buffers in
the same order in which they have been made available
by the driver.
Some devices always use descriptors in the same order in which
they have been made available. These devices can offer the
VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
might allow optimizations or simplify driver and/or device code.
Each virtqueue can consist of up to 3 parts:
\begin{itemize}
\item Descriptor Area - used for describing buffers
\item Driver Area - extra data supplied by driver to the device
\item Device Area - extra data supplied by device to driver
\end{itemize}
\begin{note}
Note that previous versions of this spec used different names for
these parts (following \ref{sec:Basic Facilities of a Virtio Device / Split Virtqueues}):
\begin{itemize}
\item Descriptor Table - for the Descriptor Area
\item Available Ring - for the Driver Area
\item Used Ring - for the Device Area
\end{itemize}
\end{note}
Two formats are supported: Split Virtqueues (see \ref{sec:Basic
Facilities of a Virtio Device / Split
Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device /
Split Virtqueues}) and Packed Virtqueues (see \ref{sec:Basic
Facilities of a Virtio Device / Packed
Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device /
Packed Virtqueues}).
Every driver and device supports either the Packed or the Split
Virtqueue format, or both.
\input{split-ring.tex}
\input{packed-ring.tex}
\subsection{Driver notifications} \label{sec:Virtqueues / Driver notifications}
The driver is sometimes required to send an available buffer
notification to the device.
When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
this notification involves sending the
virtqueue number to the device (method depending on the transport).
However, some devices benefit from the ability to find out the
amount of available data in the queue without accessing the virtqueue in memory:
for efficiency or as a debugging aid.
To help with these optimizations, when VIRTIO_F_NOTIFICATION_DATA
has been negotiated, driver notifications to the device include
the following information:
\begin{description}
\item [vqn] VQ number to be notified.
\item [next_off] Offset
within the ring where the next available ring entry
will be written.
When VIRTIO_F_RING_PACKED has not been negotiated this refers to the
15 least significant bits of the available index.
When VIRTIO_F_RING_PACKED has been negotiated this refers to the offset
(in units of descriptor entries)
within the descriptor ring where the next available
descriptor will be written.
\item [next_wrap] Wrap Counter.
With VIRTIO_F_RING_PACKED this is the wrap counter
referring to the next available descriptor.
Without VIRTIO_F_RING_PACKED this is the most significant bit
(bit 15) of the available index.
\end{description}
Note that the driver can send multiple notifications even without
making any more buffers available. When VIRTIO_F_NOTIFICATION_DATA
has been negotiated, these notifications would then have
identical \field{next_off} and \field{next_wrap} values.
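The following non-normative sketch shows how the three values could be combined into one 32-bit word. The packing used here (vqn in bits 0 to 15, next_off in bits 16 to 30, next_wrap in bit 31) is an assumption made for illustration; the actual encoding is transport specific. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed packing: vqn in bits 0-15, next_off in bits 16-30,
 * next_wrap in bit 31. */
static inline uint32_t notif_data_packed(uint16_t vqn, uint16_t next_off,
                                         unsigned int next_wrap)
{
    return (uint32_t)vqn |
           ((uint32_t)(next_off & 0x7fffu) << 16) |
           ((uint32_t)(next_wrap & 1u) << 31);
}

/* Split virtqueue: next_off is the 15 least significant bits of the
 * available index, next_wrap is its most significant bit (bit 15). */
static inline uint32_t notif_data_split(uint16_t vqn, uint16_t avail_idx)
{
    return notif_data_packed(vqn, avail_idx & 0x7fffu,
                             (avail_idx >> 15) & 1u);
}
```

With this packing, the upper 16 bits for a split virtqueue are simply the available index.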
\chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
We start with an overview of device initialization, then expand on the
details of the device and how each step is performed. This section
is best read along with the bus-specific section which describes
how to communicate with the specific device.
\section{Device Initialization}\label{sec:General Initialization And Device Operation / Device Initialization}
\drivernormative{\subsection}{Device Initialization}{General Initialization And Device Operation / Device Initialization}
The driver MUST follow this sequence to initialize a device:
\begin{enumerate}
\item Reset the device.
\item Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
\item Set the DRIVER status bit: the guest OS knows how to drive the device.
\item\label{itm:General Initialization And Device Operation /
Device Initialization / Read feature bits} Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the device. During this step the
driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it.
\item\label{itm:General Initialization And Device Operation / Device Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit. The driver MUST NOT accept
new feature bits after this step.
\item\label{itm:General Initialization And Device Operation / Device Initialization / Re-read FEATURES-OK} Re-read \field{device status} to ensure the FEATURES_OK bit is still
set: otherwise, the device does not support our subset of features
and the device is unusable.
\item\label{itm:General Initialization And Device Operation / Device Initialization / Device-specific Setup} Perform device-specific setup, including discovery of virtqueues for the
device, optional per-bus setup, reading and possibly writing the
device's virtio configuration space, and population of virtqueues.
\item\label{itm:General Initialization And Device Operation / Device Initialization / Set DRIVER-OK} Set the DRIVER_OK status bit. At this point the device is
``live''.
\end{enumerate}
If any of these steps go irrecoverably wrong, the driver SHOULD
set the FAILED status bit to indicate that it has given up on the
device (it can reset the device later to restart if desired). The
driver MUST NOT continue initialization in that case.
The driver MUST NOT send any buffer available notifications to
the device before setting DRIVER_OK.
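The sequence above can be sketched as follows. This is a non-normative illustration against a hypothetical in-memory stand-in for a device: the structure, its \field{required} member (used only to make the stand-in reject some feature subsets), the constant names and the function names are all assumptions, and device-specific setup is omitted.

```c
#include <stdint.h>

enum {
    ST_ACKNOWLEDGE = 1, ST_DRIVER = 2, ST_DRIVER_OK = 4,
    ST_FEATURES_OK = 8, ST_FAILED = 128,
};

/* Hypothetical stand-in for a device reached through some transport. */
struct fake_dev {
    uint8_t  status;
    uint64_t device_features;  /* features offered by the device */
    uint64_t driver_features;  /* subset written back by the driver */
    uint64_t required;         /* stand-in refuses subsets lacking these */
};

/* Status write as seen by the stand-in: writing 0 resets it, and it
 * declines FEATURES_OK if the accepted subset is not acceptable. */
static void dev_write_status(struct fake_dev *d, uint8_t s)
{
    if (s == 0) {
        d->status = 0;
        d->driver_features = 0;
        return;
    }
    if ((s & ST_FEATURES_OK) &&
        (d->driver_features & d->required) != d->required)
        s &= (uint8_t)~ST_FEATURES_OK;
    d->status = s;
}

/* Driver side: returns 0 on success, -1 if the driver gives up. */
static int driver_init(struct fake_dev *d, uint64_t understood)
{
    dev_write_status(d, 0);                                      /* reset */
    dev_write_status(d, (uint8_t)(d->status | ST_ACKNOWLEDGE));
    dev_write_status(d, (uint8_t)(d->status | ST_DRIVER));
    d->driver_features = d->device_features & understood;
    dev_write_status(d, (uint8_t)(d->status | ST_FEATURES_OK));
    if (!(d->status & ST_FEATURES_OK)) {                      /* re-read */
        dev_write_status(d, (uint8_t)(d->status | ST_FAILED));
        return -1;
    }
    /* Device-specific setup (virtqueues, configuration) goes here. */
    dev_write_status(d, (uint8_t)(d->status | ST_DRIVER_OK));
    return 0;
}
```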
\subsection{Legacy Interface: Device Initialization}\label{sec:General Initialization And Device Operation / Device Initialization / Legacy Interface: Device Initialization}
Legacy devices did not support the FEATURES_OK status bit, and thus did
not have a graceful way for the device to indicate unsupported feature
combinations. They also did not provide a clear mechanism to end
feature negotiation, which meant that devices finalized features on
first-use, and no features could be introduced which radically changed
the initial operation of the device.
Legacy driver implementations often used the device before setting the
DRIVER_OK bit, and sometimes even before writing the feature bits
to the device.
The result was that steps \ref{itm:General Initialization And
Device Operation / Device Initialization / Set FEATURES-OK} and
\ref{itm:General Initialization And Device Operation / Device
Initialization / Re-read FEATURES-OK} were omitted, and steps
\ref{itm:General Initialization And Device Operation /
Device Initialization / Read feature bits},
\ref{itm:General Initialization And Device Operation / Device Initialization / Device-specific Setup} and \ref{itm:General Initialization And Device Operation / Device Initialization / Set DRIVER-OK}
were conflated.
Therefore, when using the legacy interface:
\begin{itemize}
\item
The transitional driver MUST execute the initialization
sequence as described in \ref{sec:General Initialization And Device
Operation / Device Initialization}
but omitting the steps \ref{itm:General Initialization And Device
Operation / Device Initialization / Set FEATURES-OK} and
\ref{itm:General Initialization And Device Operation / Device
Initialization / Re-read FEATURES-OK}.
\item
The transitional device MUST support the driver
writing device configuration fields
before the step \ref{itm:General Initialization And Device Operation /
Device Initialization / Read feature bits}.
\item
The transitional device MUST support the driver
using the device before the step \ref{itm:General Initialization
And Device Operation / Device Initialization / Set DRIVER-OK}.
\end{itemize}
\section{Device Operation}\label{sec:General Initialization And Device Operation / Device Operation}
When operating the device, each field in the device configuration
space can be changed by either the driver or the device.
Whenever such a configuration change is triggered by the device,
the driver is notified. This makes it possible for drivers to
cache device configuration, avoiding expensive configuration
reads unless notified.
\subsection{Notification of Device Configuration Changes}\label{sec:General Initialization And Device Operation / Device Operation / Notification of Device Configuration Changes}
For devices where the device-specific configuration information can be
changed, a configuration change notification is sent when a
device-specific configuration change occurs.
In addition, this notification is triggered by the device setting
DEVICE_NEEDS_RESET (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}).
\section{Device Cleanup}\label{sec:General Initialization And Device Operation / Device Cleanup}
Once the driver has set the DRIVER_OK status bit, all the configured
virtqueues of the device are considered live. None of the virtqueues
of a device are live once the device has been reset.
\drivernormative{\subsection}{Device Cleanup}{General Initialization And Device Operation / Device Cleanup}
A driver MUST NOT alter virtqueue entries for exposed buffers -
i.e. buffers which have been
made available to the device (and not been used by the device)
of a live virtqueue.
Thus a driver MUST ensure a virtqueue isn't live (by device reset) before removing exposed buffers.
\chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
Virtio can use various different buses, thus the standard is split
into virtio general and bus-specific sections.
\section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
Virtio devices are commonly implemented as PCI devices.
A Virtio device can be implemented as any kind of PCI device:
a Conventional PCI device or a PCI Express
device. To assure designs meet the latest level
requirements, see
the PCI-SIG home page at \url{http://www.pcisig.com} for any
approved changes.
\devicenormative{\subsection}{Virtio Over PCI Bus}{Virtio Transport Options / Virtio Over PCI Bus}
A Virtio device using Virtio Over PCI Bus MUST expose to
the guest an interface that meets the specification requirements of
the appropriate PCI specification: \hyperref[intro:PCI]{[PCI]}
and \hyperref[intro:PCIe]{[PCIe]}
respectively.
\subsection{PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through
0x107F inclusive is a virtio device. The actual value within this range
indicates which virtio device is supported by the device.
The PCI Device ID is calculated by adding 0x1040 to the Virtio Device ID,
as indicated in section \ref{sec:Device Types}.
Additionally, devices MAY utilize a Transitional PCI Device ID range,
0x1000 to 0x103F depending on the device type.
\devicenormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
Devices MUST have the PCI Vendor ID 0x1AF4.
Devices MUST either have the PCI Device ID calculated by adding 0x1040
to the Virtio Device ID, as indicated in section \ref{sec:Device
Types} or have the Transitional PCI Device ID depending on the device type,
as follows:
\begin{tabular}{|l|c|}
\hline
Transitional PCI Device ID & Virtio Device \\
\hline \hline
0x1000 & network card \\
\hline
0x1001 & block device \\
\hline
0x1002 & memory ballooning (traditional) \\
\hline
0x1003 & console \\
\hline
0x1004 & SCSI host \\
\hline
0x1005 & entropy source \\
\hline
0x1009 & 9P transport \\
\hline
\end{tabular}
For example, the network card device with the Virtio Device ID 1
has the PCI Device ID 0x1041 or the Transitional PCI Device ID 0x1000.
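The ID arithmetic above can be illustrated with a short, non-normative sketch. The macro and function names are hypothetical; the numeric values are those given in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID 0x1AF4  /* hypothetical name */

/* Any device ID from 0x1000 through 0x107F with the virtio vendor ID. */
static inline bool is_virtio_pci(uint16_t vendor, uint16_t device)
{
    return vendor == VIRTIO_PCI_VENDOR_ID &&
           device >= 0x1000 && device <= 0x107F;
}

/* Transitional PCI Device ID range: 0x1000 to 0x103F. */
static inline bool is_transitional_id(uint16_t device)
{
    return device >= 0x1000 && device <= 0x103F;
}

/* PCI Device ID derived from the Virtio Device ID. */
static inline uint16_t pci_device_id(uint16_t virtio_device_id)
{
    return (uint16_t)(0x1040 + virtio_device_id);
}
```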
The PCI Subsystem Vendor ID and the PCI Subsystem Device ID MAY reflect
the PCI Vendor and Device ID of the environment (for informational purposes by the driver).
Non-transitional devices SHOULD have a PCI Device ID in the range
0x1040 to 0x107f.
Non-transitional devices SHOULD have a PCI Revision ID of 1 or higher.
Non-transitional devices SHOULD have a PCI Subsystem Device ID of 0x40 or higher.
This is to reduce the chance of a legacy driver attempting
to drive the device.
\drivernormative{\subsubsection}{PCI Device Discovery}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
Drivers MUST match devices with the PCI Vendor ID 0x1AF4 and
the PCI Device ID in the range 0x1040 to 0x107f,
calculated by adding 0x1040 to the Virtio Device ID,
as indicated in section \ref{sec:Device Types}.
Drivers for device types listed in section \ref{sec:Virtio
Transport Options / Virtio Over PCI Bus / PCI Device Discovery}
MUST match devices with the PCI Vendor ID 0x1AF4 and
the Transitional PCI Device ID indicated in section
\ref{sec:Virtio
Transport Options / Virtio Over PCI Bus / PCI Device Discovery}.
Drivers MUST match any PCI Revision ID value.
Drivers MAY match any PCI Subsystem Vendor ID and any
PCI Subsystem Device ID value.
\subsubsection{Legacy Interfaces: A Note on PCI Device Discovery}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Discovery / Legacy Interfaces: A Note on PCI Device Discovery}
Transitional devices MUST have a PCI Revision ID of 0.
Transitional devices MUST have the PCI Subsystem Device ID
matching the Virtio Device ID, as indicated in section \ref{sec:Device Types}.
Transitional devices MUST have the Transitional PCI Device ID in
the range 0x1000 to 0x103f.
This is to match legacy drivers.
\subsection{PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
The device is configured via I/O and/or memory regions (though see
\ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
for access via the PCI configuration space), as specified by Virtio
Structure PCI Capabilities.
Fields of different sizes are present in the device
configuration regions.
All 64-bit, 32-bit and 16-bit fields are little-endian.
64-bit fields are to be treated as two 32-bit fields,
with the low 32-bit part followed by the high 32-bit part.
\drivernormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
For device configuration access, the driver MUST use 8-bit wide
accesses for 8-bit wide fields, 16-bit wide and aligned accesses
for 16-bit wide fields and 32-bit wide and aligned accesses for
32-bit and 64-bit wide fields. For 64-bit fields, the driver MAY
access each of the high and low 32-bit parts of the field
independently.
\devicenormative{\subsubsection}{PCI Device Layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout}
For 64-bit device configuration fields, the device MUST allow the driver
independent access to the high and low 32-bit parts of the field.
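As a non-normative sketch, a 64-bit field can be assembled from two independent 32-bit little-endian reads. A plain byte buffer stands in for the mapped configuration region here, and the function names are hypothetical; a real driver would use its platform's 32-bit aligned device accessors instead.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from four bytes. */
static inline uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* A 64-bit field is the low 32-bit part followed by the high part. */
static inline uint64_t read_config64(const uint8_t *field)
{
    uint32_t lo = read_le32(field);
    uint32_t hi = read_le32(field + 4);

    return ((uint64_t)hi << 32) | lo;
}
```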
\subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
The virtio device configuration layout includes several structures:
\begin{itemize}
\item Common configuration
\item Notifications
\item ISR Status
\item Device-specific configuration (optional)
\item PCI configuration access
\end{itemize}
Each structure can be mapped by a Base Address register (BAR) belonging to
the function, or accessed via the special VIRTIO_PCI_CAP_PCI_CFG field in the PCI configuration space.
The location of each structure is specified using a vendor-specific PCI capability located
on the capability list in PCI configuration space of the device.
This virtio structure capability uses little-endian format; all fields are
read-only for the driver unless stated otherwise:
\begin{lstlisting}
struct virtio_pci_cap {
u8 cap_vndr; /* Generic PCI field: PCI_CAP_ID_VNDR */
u8 cap_next; /* Generic PCI field: next ptr. */
u8 cap_len; /* Generic PCI field: capability length */
u8 cfg_type; /* Identifies the structure. */
u8 bar; /* Where to find it. */
u8 padding[3]; /* Pad to full dword. */
le32 offset; /* Offset within bar. */
le32 length; /* Length of the structure, in bytes. */
};
\end{lstlisting}
This structure can be followed by extra data, depending on
\field{cfg_type}, as documented below.
The fields are interpreted as follows:
\begin{description}
\item[\field{cap_vndr}]
0x09; Identifies a vendor-specific capability.
\item[\field{cap_next}]
Link to next capability in the capability list in the PCI configuration space.
\item[\field{cap_len}]
Length of this capability structure, including the whole of
struct virtio_pci_cap, and extra data if any.
This length MAY include padding, or fields unused by the driver.
\item[\field{cfg_type}]
identifies the structure, according to the following table:
\begin{lstlisting}
/* Common configuration */
#define VIRTIO_PCI_CAP_COMMON_CFG 1
/* Notifications */
#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
/* ISR Status */
#define VIRTIO_PCI_CAP_ISR_CFG 3
/* Device specific configuration */
#define VIRTIO_PCI_CAP_DEVICE_CFG 4
/* PCI configuration access */
#define VIRTIO_PCI_CAP_PCI_CFG 5
\end{lstlisting}
Any other value is reserved for future use.
Each structure is detailed individually below.
The device MAY offer more than one structure of any type; this makes it
possible for the device to expose multiple interfaces to drivers. The order of
the capabilities in the capability list specifies the order of preference
suggested by the device.
\begin{note}
For example, on some hypervisors, notifications using I/O accesses are
faster than memory accesses. In this case, the device would expose two
capabilities with \field{cfg_type} set to VIRTIO_PCI_CAP_NOTIFY_CFG:
the first one addressing an I/O BAR, the second one addressing a memory BAR.
In this example, the driver would use the I/O BAR if I/O resources are available, and fall back on
the memory BAR when I/O resources are unavailable.
\end{note}
\item[\field{bar}]
values 0x0 to 0x5 specify a Base Address register (BAR) belonging to
the function located beginning at 10h in PCI Configuration Space
and used to map the structure into Memory or I/O Space.
The BAR is permitted to be either 32-bit or 64-bit, and it can map either Memory Space
or I/O Space.
Any other value is reserved for future use.
\item[\field{offset}]
indicates where the structure begins relative to the base address associated
with the BAR. The alignment requirements of \field{offset} are indicated
in each structure-specific section below.
\item[\field{length}]
indicates the length of the structure.
\field{length} MAY include padding, or fields unused by the driver, or
future extensions.
\begin{note}
For example, a future device might present a large structure size of several
MBytes.
As current devices never utilize structures larger than 4KBytes in size,
a driver MAY limit the mapped structure size to e.g.
4KBytes (thus ignoring parts of structure after the first
4KBytes) to allow forward compatibility with such devices without loss of
functionality and without wasting resources.
\end{note}
\end{description}
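The discovery procedure implied by these fields can be sketched in C as
follows. This is an illustration under stated assumptions rather than a
reference implementation: the function name virtio_find_cap is hypothetical, a
256-byte snapshot of the PCI configuration space stands in for real
configuration accesses, and the check of the Capabilities List bit in the PCI
Status register is omitted for brevity. The capabilities pointer at offset
0x34 and the masking of the low two bits of each pointer follow the generic
PCI capability list conventions.
\begin{lstlisting}
#include <stddef.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34 /* pointer to the first capability */
#define PCI_CAP_ID_VNDR     0x09

/* Byte offsets of struct virtio_pci_cap members within a capability. */
enum { CAP_VNDR = 0, CAP_NEXT = 1, CAP_LEN = 2, CFG_TYPE = 3, CAP_BAR = 4 };

/*
 * Return the configuration space offset of the first capability with
 * the requested cfg_type and a non-reserved bar value, or 0 if none.
 */
static uint8_t virtio_find_cap(const uint8_t cfg[256], uint8_t cfg_type)
{
        uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
        int guard = 48; /* bound the walk in case the list loops */

        while (pos >= 0x40 && pos <= 0xf0 && guard-- > 0) {
                if (cfg[pos + CAP_VNDR] == PCI_CAP_ID_VNDR &&
                    cfg[pos + CAP_LEN] >= 16 && /* whole virtio_pci_cap */
                    cfg[pos + CFG_TYPE] == cfg_type &&
                    cfg[pos + CAP_BAR] <= 5)
                        return pos;
                pos = cfg[pos + CAP_NEXT] & 0xfc;
        }
        return 0;
}
\end{lstlisting}
Returning the first match reflects the device's order of preference;
capabilities with a reserved \field{bar} value are skipped rather than treated
as errors.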
\drivernormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
The driver MUST ignore any vendor-specific capability structure which has
a reserved \field{cfg_type} value.
The driver SHOULD use the first instance of each virtio structure type it can
support.
The driver MUST accept a \field{cap_len} value which is larger than specified here.
The driver MUST ignore any vendor-specific capability structure which has
a reserved \field{bar} value.
The driver SHOULD map only the part of the configuration structure
that is large enough for device operation. The driver MUST handle
an unexpectedly large \field{length}, but MAY check that \field{length}
is large enough for device operation.
The driver MUST NOT write into any field of the capability structure,
with the exception of those with \field{cfg_type} VIRTIO_PCI_CAP_PCI_CFG as
detailed in \ref{drivernormative:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}.
\devicenormative{\subsubsection}{Virtio Structure PCI Capabilities}{Virtio Transport Options / Virtio Over PCI Bus / Virtio Structure PCI Capabilities}
The device MUST include any extra data (from the beginning of the \field{cap_vndr} field
through end of the extra data fields if any) in \field{cap_len}.
The device MAY append extra data
or padding to any structure beyond that.
If the device presents multiple structures of the same type, it SHOULD order
them from optimal (first) to least-optimal (last).
\subsubsection{Common configuration structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
The common configuration structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_COMMON_CFG capability; its layout is below.
\begin{lstlisting}
struct virtio_pci_common_cfg {
/* About the whole device. */
le32 device_feature_select; /* read-write */
le32 device_feature; /* read-only for driver */
le32 driver_feature_select; /* read-write */
le32 driver_feature; /* read-write */
le16 config_msix_vector; /* read-write */
le16 num_queues; /* read-only for driver */
u8 device_status; /* read-write */
u8 config_generation; /* read-only for driver */
/* About a specific virtqueue. */
le16 queue_select; /* read-write */
le16 queue_size; /* read-write */
le16 queue_msix_vector; /* read-write */
le16 queue_enable; /* read-write */
le16 queue_notify_off; /* read-only for driver */
le64 queue_desc; /* read-write */
le64 queue_driver; /* read-write */
le64 queue_device; /* read-write */
};
\end{lstlisting}
\begin{description}
\item[\field{device_feature_select}]
The driver uses this to select which feature bits \field{device_feature} shows.
Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
\item[\field{device_feature}]
The device uses this to report which feature bits it is
offering to the driver: the driver writes to
\field{device_feature_select} to select which feature bits are presented.
\item[\field{driver_feature_select}]
The driver uses this to select which feature bits \field{driver_feature} shows.
Value 0x0 selects Feature Bits 0 to 31, 0x1 selects Feature Bits 32 to 63, etc.
\item[\field{driver_feature}]
The driver writes this to accept feature bits offered by the device.
The bits written are those selected by \field{driver_feature_select}.
\item[\field{config_msix_vector}]
The driver sets the Configuration Vector for MSI-X.
\item[\field{num_queues}]
The device specifies the maximum number of virtqueues supported here.
\item[\field{device_status}]
The driver writes the device status here (see \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}). Writing 0 into this
field resets the device.
\item[\field{config_generation}]
Configuration atomicity value. The device changes this every time the
configuration noticeably changes.
\item[\field{queue_select}]
Queue Select. The driver selects which virtqueue the following
fields refer to.
\item[\field{queue_size}]
Queue Size. On reset, specifies the maximum queue size supported by
the device. This can be modified by the driver to reduce memory requirements.
A 0 means the queue is unavailable.
\item[\field{queue_msix_vector}]
The driver uses this to specify the queue vector for MSI-X.
\item[\field{queue_enable}]
The driver uses this to selectively prevent the device from executing requests from this virtqueue.
1 - enabled; 0 - disabled.
\item[\field{queue_notify_off}]
The driver reads this to calculate the offset from start of Notification structure at
which this virtqueue is located.
\begin{note} This is \emph{not} an offset in bytes.
See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability} below.
\end{note}
\item[\field{queue_desc}]
The driver writes the physical address of Descriptor Area here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
\item[\field{queue_driver}]
The driver writes the physical address of Driver Area here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
\item[\field{queue_device}]
The driver writes the physical address of Device Area here. See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
\end{description}
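The windowed feature access described for \field{device_feature_select} and
\field{device_feature} can be sketched as below. The structure mock_device and
both function names are hypothetical: they model the two registers in memory
purely for illustration, whereas a real driver would perform 32-bit accesses to
the mapped common configuration structure.
\begin{lstlisting}
#include <stdint.h>

/* Illustrative device model: offered feature bits and the selector. */
struct mock_device {
        uint64_t offered;
        uint32_t device_feature_select;
};

/* Device side: present feature bits select*32 .. select*32+31,
 * and 0 for any window beyond feature bit 63. */
static uint32_t mock_read_device_feature(const struct mock_device *d)
{
        if (d->device_feature_select > 1)
                return 0;
        return (uint32_t)(d->offered >> (d->device_feature_select * 32));
}

/* Driver side: select each 32-bit window in turn and combine them. */
static uint64_t driver_read_features(struct mock_device *d)
{
        uint64_t features;

        d->device_feature_select = 0;
        features = mock_read_device_feature(d);
        d->device_feature_select = 1;
        features |= (uint64_t)mock_read_device_feature(d) << 32;
        return features;
}
\end{lstlisting}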
\devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
\field{offset} MUST be 4-byte aligned.
The device MUST present at least one common configuration capability.
The device MUST present the feature bits it is offering in \field{device_feature}, starting at bit \field{device_feature_select} $*$ 32 for any \field{device_feature_select} written by the driver.
\begin{note}
This means that it will present 0 for any \field{device_feature_select} other than 0 or 1, since no feature defined here exceeds 63.
\end{note}
The device MUST present any valid feature bits the driver has written in \field{driver_feature}, starting at bit \field{driver_feature_select} $*$ 32 for any \field{driver_feature_select} written by the driver. Valid feature bits are those which are a subset of the corresponding \field{device_feature} bits. The device MAY present invalid bits written by the driver.
\begin{note}
This means that a device can ignore writes for feature bits it never
offers, and simply present 0 on reads. Or it can just mirror what the driver wrote
(but it will still have to check them when the driver sets FEATURES_OK).
\end{note}
\begin{note}
A driver shouldn't write invalid bits anyway, as per \ref{drivernormative:General Initialization And Device Operation / Device Initialization}, but this attempts to handle it.
\end{note}
The device MUST present a changed \field{config_generation} after the
driver has read a device-specific configuration value which has
changed since any part of the device-specific configuration was last
read.
\begin{note}
As \field{config_generation} is an 8-bit value, simply incrementing it
on every configuration change could violate this requirement due to wrap.
A better approach is to set an internal flag when the configuration changes,
and if that flag is set when the driver reads from the device-specific
configuration, increment \field{config_generation} and clear the flag.
\end{note}
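One possible shape for such a flag-based scheme is sketched below. It is
illustrative only: the structure cfg_gen_state and the two hook names are
assumptions, and how a device implementation detects configuration changes and
driver reads is outside the scope of this sketch.
\begin{lstlisting}
#include <stdbool.h>
#include <stdint.h>

struct cfg_gen_state {
        uint8_t config_generation; /* value presented to the driver */
        bool changed;              /* set on change, cleared on read */
};

/* Called whenever the device-specific configuration changes. */
static void cfg_changed(struct cfg_gen_state *s)
{
        s->changed = true;
}

/* Called when the driver reads any part of the device-specific
 * configuration. */
static void cfg_read_by_driver(struct cfg_gen_state *s)
{
        if (s->changed) {
                s->config_generation++; /* at most one step per read */
                s->changed = false;
        }
}
\end{lstlisting}
Because the counter advances at most once between two driver reads, 256
changes in a row cannot wrap it back to the value the driver last observed.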
The device MUST reset when 0 is written to \field{device_status}, and
present a 0 in \field{device_status} once that is done.
The device MUST present a 0 in \field{queue_enable} on reset.
The device MUST present a 0 in \field{queue_size} if the virtqueue
corresponding to the current \field{queue_select} is unavailable.
If VIRTIO_F_RING_PACKED has not been negotiated, the device MUST
present either a value of 0 or a power of 2 in
\field{queue_size}.
\drivernormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
The driver MUST NOT write to \field{device_feature}, \field{num_queues}, \field{config_generation} or \field{queue_notify_off}.
If VIRTIO_F_RING_PACKED has been negotiated,
the driver MUST NOT write the value 0 to \field{queue_size}.
If VIRTIO_F_RING_PACKED has not been negotiated,
the driver MUST NOT write a value which is not a power of 2 to \field{queue_size}.
The driver MUST configure the other virtqueue fields before enabling the virtqueue
with \field{queue_enable}.
After writing 0 to \field{device_status}, the driver MUST wait for a read of
\field{device_status} to return 0 before reinitializing the device.
The driver MUST NOT write a 0 to \field{queue_enable}.
\subsubsection{Notification structure layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
capability. This capability is immediately followed by an additional
field, like so:
\begin{lstlisting}
struct virtio_pci_notify_cap {
struct virtio_pci_cap cap;
le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
};
\end{lstlisting}
\field{notify_off_multiplier} is combined with the \field{queue_notify_off} to
derive the Queue Notify address within a BAR for a virtqueue:
\begin{lstlisting}
cap.offset + queue_notify_off * notify_off_multiplier
\end{lstlisting}
The \field{cap.offset} and \field{notify_off_multiplier} are taken from the
notification capability structure above, and the \field{queue_notify_off} is
taken from the common configuration structure.
\begin{note}
For example, if \field{notify_off_multiplier} is 0, the device uses
the same Queue Notify address for all queues.
\end{note}
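The calculation can be written as a small C helper. The function name is
hypothetical; the arithmetic is widened to 64 bits as a precaution so that the
product cannot overflow a 32-bit intermediate.
\begin{lstlisting}
#include <stdint.h>

/* Byte offset of a virtqueue's Queue Notify address within the BAR. */
static uint64_t queue_notify_bar_offset(uint32_t cap_offset,
                                        uint16_t queue_notify_off,
                                        uint32_t notify_off_multiplier)
{
        return (uint64_t)cap_offset +
               (uint64_t)queue_notify_off * notify_off_multiplier;
}
\end{lstlisting}
For instance, with \field{cap.offset} 0x3000, \field{queue_notify_off} 2 and
\field{notify_off_multiplier} 4, the Queue Notify address is at offset 0x3008
within the BAR.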
\devicenormative{\paragraph}{Notification capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
The device MUST present at least one notification capability.
For devices not offering VIRTIO_F_NOTIFICATION_DATA:
The \field{cap.offset} MUST be 2-byte aligned.
The device MUST either present \field{notify_off_multiplier} as a power of 2 that is also a multiple of 2,
or present \field{notify_off_multiplier} as 0.
The value \field{cap.length} presented by the device MUST be at least 2
and MUST be large enough to support queue notification offsets
for all supported queues in all possible configurations.
For all queues, the value \field{cap.length} presented by the device MUST satisfy:
\begin{lstlisting}
cap.length >= queue_notify_off * notify_off_multiplier + 2
\end{lstlisting}
For devices offering VIRTIO_F_NOTIFICATION_DATA:
The device MUST either present \field{notify_off_multiplier} as a
number that is a power of 2 that is also a multiple of 4,
or present \field{notify_off_multiplier} as 0.
The \field{cap.offset} MUST be 4-byte aligned.
The value \field{cap.length} presented by the device MUST be at least 4
and MUST be large enough to support queue notification offsets
for all supported queues in all possible configurations.
For all queues, the value \field{cap.length} presented by the device MUST satisfy:
\begin{lstlisting}
cap.length >= queue_notify_off * notify_off_multiplier + 4
\end{lstlisting}
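The requirements above can be combined into a single consistency check, for
example in a device model's self-test or as an optional driver sanity check.
The sketch below is an assumption-laden illustration: the function name and
parameters are hypothetical, and max_queue_notify_off stands for the largest
\field{queue_notify_off} value over all queues, which suffices because the
right-hand side of the inequality grows with \field{queue_notify_off}.
\begin{lstlisting}
#include <stdbool.h>
#include <stdint.h>

static bool notify_cap_is_valid(uint32_t cap_offset, uint32_t cap_length,
                                uint32_t notify_off_multiplier,
                                uint16_t max_queue_notify_off,
                                bool notification_data)
{
        /* 4 if VIRTIO_F_NOTIFICATION_DATA is offered, otherwise 2 */
        uint32_t width = notification_data ? 4 : 2;
        uint32_t m = notify_off_multiplier;

        if (cap_offset % width != 0)
                return false;
        /* multiplier: 0, or a power of 2 that is a multiple of width */
        if (m != 0 && ((m & (m - 1)) != 0 || m % width != 0))
                return false;
        return (uint64_t)cap_length >=
               (uint64_t)max_queue_notify_off * m + width;
}
\end{lstlisting}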
\subsubsection{ISR status capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
The VIRTIO_PCI_CAP_ISR_CFG capability
refers to at least a single byte, which contains the 8-bit ISR status field
to be used for INT\#x interrupt handling.
The \field{offset} for the \field{ISR status} has no alignment requirements.
The ISR bits allow the driver to distinguish between device-specific configuration