path: root/lib/librte_vhost
Age | Commit message | Author
2016-07-25 | vhost: fix off-by-one error on descriptor number check | Maxime Coquelin
nr_desc is not an index but the number of descriptors, so can be equal to the virtqueue size. Fixes: a436f53ebfeb ("vhost: avoid dead loop chain") Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
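The distinction between an index and a count can be sketched as follows. This is a minimal illustration, not the library's code: `struct desc`, `DESC_F_NEXT` and `chain_is_sane()` are hypothetical stand-ins for the real vring types.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_NEXT 1u /* hypothetical stand-in for the "chain continues" flag */

struct desc {          /* simplified stand-in for a vring descriptor */
	uint16_t flags;
	uint16_t next;
};

/*
 * Walk a descriptor chain that starts at 'head' in a ring of 'size' entries.
 * Returns true if the chain terminates, false if it is malformed or loops.
 */
static bool chain_is_sane(const struct desc *ring, uint16_t size, uint16_t head)
{
	uint32_t nr_desc = 1; /* a count: the head already is one descriptor */
	uint16_t idx = head;

	if (idx >= size)
		return false;

	while (ring[idx].flags & DESC_F_NEXT) {
		idx = ring[idx].next;
		/* idx is an index, so 'size' itself is out of range ... */
		if (idx >= size)
			return false;
		/* ... but nr_desc is a count, so 'size' is still legal. */
		if (++nr_desc > size)
			return false;
	}
	return true;
}
```

With `>=` instead of `>` in the count check, a legal chain that uses every descriptor in the ring would be rejected.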
2016-07-22 | vhost: fix unregistering in client mode | Ilya Maximets
Currently, calling 'rte_vhost_driver_unregister()' does not close the connection to QEMU. This makes it impossible to register the driver again and reconnect to the same virtual machine. This scenario is reproducible with OVS. Executing the following command re-creates the vhost port (executing 'rte_vhost_driver_register()' followed by 'rte_vhost_driver_unregister()'), after which the network is broken and QEMU may crash: ovs-vsctl set Interface vhost1 ofport_request=15 Fix this by closing all established connections on driver unregister and removing pending connections from the reconnection list. Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-22 | vhost: fix connect hang in client mode | Ilya Maximets
If something abnormal happens to QEMU, 'connect()' can block the calling thread (e.g. the main thread of OVS) forever or for a really long time. This can break the whole application or block the reconnection thread.

Example with OVS:

    ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
    (gdb) bt
    #0 connect () from /lib64/libpthread.so.0
    #1 vhost_user_create_client (vsocket=0xa816e0)
    #2 rte_vhost_driver_register
    #3 netdev_dpdk_vhost_user_construct
    #4 netdev_open (name=0xa664b0 "vhost1")
    [...]
    #11 main

Fix that by setting non-blocking mode on client sockets for the connection.

Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
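Putting a client socket into non-blocking mode before connecting might look like the sketch below. It uses only standard POSIX calls; the helper names `set_nonblocking()` and `create_client_socket()` are hypothetical and are not taken from the vhost library.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put 'fd' into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Create a unix stream socket whose connect() will not block the caller.
 * Returns the fd, or -1 on error.
 */
static int create_client_socket(void)
{
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (set_nonblocking(fd) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller then has to treat a failed connect() as "try again later" rather than as a fatal error, which is what a reconnection thread is for.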
2016-07-15 | vhost: fix crash when exceeding file descriptors | Patrik Andersson
Protect against DPDK crash when allocation of listen fd >= 1023. For events on fds > 1023, the current implementation will trigger an abort due to access outside of the allocated bit mask.

Corrections would include:
* Match fdset_add() signature in fd_man.c to fd_man.h
* Handling of return codes from fdset_add()
* Addition of check of fd number in fdset_add_fd()

The rationale behind the suggested code change is that fdset_event_dispatch() could attempt access outside of the FD_SET bitmask if there is an event on a file descriptor that in turn looks up a virtio file descriptor with a value > 1023. Such an attempt will lead to an abort() and a restart of any vswitch using DPDK.

A discussion topic exists in the ovs-discuss mailing list that can provide a little more background: http://openvswitch.org/pipermail/discuss/2016-February/020243.html

Fixes: 8f972312 ("vhost: support vhost-user")
Signed-off-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
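The kind of range check described in this commit can be sketched with the standard select() macros. The function name `fdset_add_checked()` is hypothetical; the real fix lives in fd_man.c and may differ in detail.

```c
#include <sys/select.h>

/*
 * Add 'fd' to 'set' only if it fits inside the fd_set bitmask.
 * Returns 0 on success, -1 if the fd cannot be represented.
 */
static int fdset_add_checked(fd_set *set, int fd)
{
	/* FD_SET() on an fd outside [0, FD_SETSIZE) touches memory
	 * beyond the bitmask, so reject it up front. */
	if (fd < 0 || fd >= FD_SETSIZE)
		return -1;
	FD_SET(fd, set);
	return 0;
}
```

Callers must then check the return code instead of assuming the fd was registered.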
2016-07-15 | vhost: check ring descriptor address | Ilya Maximets
In the current implementation, vhost will crash with a segmentation fault if a malicious or buggy virtio application breaks the addresses of descriptors. Before commit 0823c1cb0a73 ("vhost: workaround stale vring base") this crash was reproducible even with a normal DPDK application that tries to change the number of virtqueues dynamically inside the VM. Fix that by checking the addresses of descriptors before using them. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-15 | vhost: fix used descriptors number of mergeable enqueue | Ilya Maximets
The return value on error is changed from '-1' to '0', because the function returns an unsigned value that represents the number of used descriptors. Also fix the update of 'last_used_idx' by using the actual number of used descriptors. Fixes: 623bc47054d0 ("vhost: do sanity check for ring descriptor length") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
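Why '-1' is a poor error value for an unsigned descriptor count can be shown with a small sketch. Both helpers below are hypothetical illustrations and do not mirror the actual enqueue code.

```c
#include <stdint.h>

/* Hypothetical enqueue helper returning the number of used descriptors. */
static uint32_t copy_to_ring(int fail)
{
	if (fail)
		return 0; /* returning -1 here would wrap to UINT32_MAX */
	return 3;         /* pretend three descriptors were consumed */
}

/* Advance last_used_idx by the number of descriptors actually used. */
static uint16_t advance_used_idx(uint16_t last_used_idx, uint32_t nr_used)
{
	/* The ring index is 16 bits wide and wraps naturally. */
	return (uint16_t)(last_used_idx + nr_used);
}
```

With 0 as the error value, a failed copy leaves the used index where it was; with a wrapped -1, the index would be advanced by a bogus amount.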
2016-07-04 | vhost: fix potential null pointer dereference | Yuanhan Liu
Fix the potential NULL pointer dereference issue raised by Coverity.

    578 reconn = malloc(sizeof(*reconn));
    >>> CID 127481: Null pointer dereferences (NULL_RETURNS)
    >>> Dereferencing a null pointer "reconn".
    579 reconn->un = un;

Coverity issue: 127481
Fixes: e623e0c6d8a5 ("vhost: add reconnect ability")
Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-07-04 | vhost: fix not null terminated string | Yuanhan Liu
Fix an issue raised by Coverity.

    >>> CID 127475: Memory - illegal accesses (BUFFER_SIZE_WARNING)
    >>> Calling strncpy with a maximum size argument of 108 bytes on
    >>> destination array "un->sun_path" of size 108 bytes might leave
    >>> the destination string unterminated.
    441 strncpy(un->sun_path, path, sizeof(un->sun_path));
    442
    443 return fd;
    444 }

Coverity issue: 127475
Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
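One common way to address this class of warning is to copy at most `size - 1` bytes and terminate explicitly. The sketch below assumes nothing about the actual patch; `set_sun_path()` is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill in a unix socket address from 'path', always leaving sun_path
 * NUL-terminated. Returns 0 on success, -1 if 'path' had to be truncated.
 */
static int set_sun_path(struct sockaddr_un *un, const char *path)
{
	memset(un, 0, sizeof(*un));
	un->sun_family = AF_UNIX;
	/* Leave room for the terminator, then write it explicitly. */
	strncpy(un->sun_path, path, sizeof(un->sun_path) - 1);
	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
	return strlen(path) < sizeof(un->sun_path) ? 0 : -1;
}
```

Reporting truncation to the caller is a design choice of this sketch: silently binding to a shortened path is rarely what the application wants.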
2016-07-04 | vhost: fix memory leak | Yuanhan Liu
Fix potential memory leak raised by Coverity. >>> Variable "vsocket" going out of scope leaks the storage it >>> points to. Coverity issue: 127483 Fixes: e623e0c6d8a5 ("vhost: add reconnect ability") Reported-by: John McNamara <john.mcnamara@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-30 | vhost: fix missing flag reset on stop | Yuanhan Liu
Commit 550c9d27d143 ("vhost: set/reset device flags internally") moves the VIRTIO_DEV_RUNNING set/reset to the vhost lib, but I missed one reset on stop; this patch fixes it. Fixes: 550c9d27d143 ("vhost: set/reset device flags internally") Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com>
2016-06-29 | mk: fix internal dependencies | Thomas Monjalon
Some libraries were missing their dependency on eal, mbuf, mempool, ring and kvargs. It is revealed by the linker option "-z defs". Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
2016-06-22 | vhost: check hugepage fstat error | Huawei Xie
The value returned from fstat is not checked for errors before being used. This patch fixes the following Coverity issue.

    static uint64_t get_blk_size(int fd)
    {
        struct stat stat;

        fstat(fd, &stat);
        return (uint64_t)stat.st_blksize;

    >>> CID 107103 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
    >>> check_return: Calling fstat(fd, &stat) without checking return value.
    >>> This library function may fail and return an error code.

Fixes: 8f972312b8f4 ("vhost: support vhost-user")
Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
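A checked variant of the function quoted in the Coverity report could look like this. The choice of `(uint64_t)-1` as the error value is an assumption of this sketch, not something stated in the commit message.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Return the preferred I/O block size of 'fd', or (uint64_t)-1 if
 * fstat() fails (for example because 'fd' is not an open descriptor).
 */
static uint64_t get_blk_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return (uint64_t)-1; /* assumed error sentinel */
	return (uint64_t)st.st_blksize;
}
```

The caller then needs to compare the result against the sentinel before using it as a hugepage size.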
2016-06-22 | vhost: unmap log memory on cleanup | Ilya Maximets
Fixes memory leak on QEMU migration. Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: fix leak of file descriptors | Ilya Maximets
During migration of a vhost-user device, QEMU allocates a memfd to store information about dirty pages and sends the fd to the vhost-user process. The file descriptor for this memory should be closed to prevent a "Too many open files" error in the vhost-user process after some number of migrations.

Ex.:

    # ls /proc/<ovs-vswitchd pid>/fd/ -alh
    total 0
    root qemu .
    root qemu ..
    root qemu 0 -> /dev/pts/0
    root qemu 1 -> pipe:[1804353]
    root qemu 10 -> socket:[1782240]
    root qemu 100 -> /memfd:vhost-log (deleted)
    root qemu 1000 -> /memfd:vhost-log (deleted)
    root qemu 1001 -> /memfd:vhost-log (deleted)
    root qemu 1004 -> /memfd:vhost-log (deleted)
    [...]
    root qemu 996 -> /memfd:vhost-log (deleted)
    root qemu 997 -> /memfd:vhost-log (deleted)

    ovs-vswitchd.log:
    |WARN|punix:ovs-vswitchd.ctl: accept failed: Too many open files

Fixes: 54f9e32305d4 ("vhost: handle dirty pages logging request")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: fix null pointer dereference | Marcin Kerlin
The return value of get_device() is not checked before it is dereferenced. Fix this problem by adding a check. Coverity issue: 119262 Fixes: 77d20126b4c2 ("vhost-user: handle message to enable vring") Signed-off-by: Marcin Kerlin <marcinx.kerlin@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: remove concurrent enqueue | Huawei Xie
No other DPDK PMD supports concurrently receiving or sending packets on the same queue. The upper application should deal with this, normally through queue and core bindings. For historical reasons, vhost internally supports concurrent lockless enqueuing of packets to the same virtio queue through a costly cmpset operation. This patch removes this internal lockless implementation and should improve performance a bit. Luckily DPDK OVS doesn't rely on this behavior. Signed-off-by: Huawei Xie <huawei.xie@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: arrange struct fields for better cache sharing | Yuanhan Liu
The ifname[] field takes so much space that it separates some frequently used fields, say, features and broadcast_rarp, into different cache lines. This patch moves all the fields that are accessed frequently in Rx/Tx together (before the ifname[] field) to let them share one cache line. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: optimize dequeue for small packets | Yuanhan Liu
A virtio driver normally uses at least 2 desc buffers for Tx: the first for storing the header, and the others for storing the data. Therefore, we could fetch the first data desc buf before the main loop, and do the copy first before the check of "are we done yet?". This could save one check for small packets that just have one data desc buffer and need one mbuf to store it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: pre update used ring for Tx and Rx | Yuanhan Liu
Pre-update the used ring, and update it in batch, for Tx and Rx at the stage where all avail desc idx are fetched. This reduces some cache misses and hence increases performance a bit. Pre-updating is feasible because the guest driver will not start processing those entries as long as we don't update "used->idx". (I'm not 100% certain I haven't missed anything, though.) Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: workaround stale vring base | Yuanhan Liu
When a DPDK app crashes (or quits, or gets killed), a restarted DPDK app would get a stale vring base from QEMU. That would break the kernel virtio net completely, making it stop working until a driver reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested we could get a much saner (though maybe not the most accurate) vring base from used->idx. That would work because:
- there is a memory barrier between updating used ring entries and used->idx. So, even if we crashed while updating the used ring entries, it will not cause any issue, as the guest driver will not process those stale used entries, because used->idx is not updated yet.
- DPDK processes the vring in order, which means a crash may just lead to some packet retransmission for Tx and drops for Rx.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-06-22 | vhost: add reconnect ability | Yuanhan Liu
Allow reconnecting on failure by default when:
- the DPDK app starts first and QEMU (as the server) is not started yet. Without reconnecting, the DPDK app would simply fail on vhost-user registration.
- QEMU restarts, say due to an OS reboot. Without reconnecting, you can't re-establish the connection without restarting the DPDK app.

This patch makes both of the above cases work. It simply creates a new thread that keeps calling "connect()" until it succeeds. Reconnecting can be disabled by setting the RTE_VHOST_USER_NO_RECONNECT flag.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: add vhost-user client mode | Yuanhan Liu
Add a new parameter (flags) to rte_vhost_driver_register(). DPDK vhost-user acts as a client when the RTE_VHOST_USER_CLIENT flag is set. The flags also allow future extensions without breaking the API (again). The rest is straightforward: allocate a unix socket, then bind/listen for the server, or connect for the client. This extension is for vhost-user only; therefore, we simply quit and report an error when any flags are given for vhost-cuse. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: rename structs for enabling client mode | Yuanhan Liu
DPDK vhost-user has only acted as a server so far, so using a struct named "vhost_server" was okay. However, once we add client mode, the name doesn't make sense any more, so rename it to "vhost_user_socket". There was nothing obviously wrong with "connfd_ctx", but I think it's better to rename it to "vhost_user_connection", as it does represent a connection between the backend (DPDK) and the frontend (QEMU). Similarly, a few more renames are made, such as "vserver_new_vq_conn" to "vhost_user_new_connection". Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-06-22 | vhost: make buffer vector for scatter Rx local | Ilya Maximets
The array of buf_vector's is just temporary storage for information about available descriptors. It is used only locally in virtio_dev_merge_rx() and there is no reason for that array to be shared. Fix that by allocating a local buf_vec inside virtio_dev_merge_rx(). Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: make virtio header length per device | Yuanhan Liu
Virtio net header length is set per device, but not per queue. So, there is no reason to store it in vhost_virtqueue struct, instead, we should store it in virtio_net struct, to make one copy only. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: reserve few more space for future extension | Yuanhan Liu
"virtio_net_device_ops" is the only open struct left that an application can access; therefore, it's the only place where a future extension might introduce an ABI break. So, reserve some space in it. 5 should be plenty, considering that we have barely touched it for a long while. Another reason to choose 5 is cache alignment: 5 makes the struct 64 bytes on a 64-bit machine. With this, we can say with some confidence that we might be free from ABI violations forever. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: remove virtio-net.h | Yuanhan Liu
It barely has anything useful in it, just 2 function prototypes. Move them to vhost-net.h and delete the file. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: remove unnecessary fields | Yuanhan Liu
The "reserved" field in the virtio_net and vhost_virtqueue structs is not necessary any more. We now expose the virtio_net device through a number, "vid". This patch also removes the "priv" field: all fields are private now, so the application can't access it. The only way we could still provide access is to expose it through a function, but I doubt that's needed or worthwhile. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: hide internal code | Yuanhan Liu
We are now safe to move all those internal structs/macros/functions to vhost-net.h, to hide them from external access. This patch also breaks long lines and removes some redundant comments. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: export device id as the interface to applications | Yuanhan Liu
With all the previous preparatory work, we are just one step away from the final ABI refactoring. That is, to change the current APIs so that they take a vid instead of the old virtio_net dev. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: export queue free entries | Yuanhan Liu
The new API rte_vhost_avail_entries() is actually a rename of rte_vring_available_entries(), with the "vring" to "vhost" name change to keep the consistency of other vhost exported APIs. This change could let us avoid the dependency of "virtio_net" struct, to prepare for the ABI refactoring. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: export interface name | Yuanhan Liu
Introduce a new API rte_vhost_get_ifname() to export the ifname to application. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: export number of queues | Yuanhan Liu
Introduce a new API rte_vhost_get_queue_num() to export the number of queues. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: export numa node | Yuanhan Liu
Introduce a new API rte_vhost_get_numa_node() to get the numa node from which the virtio_net struct is allocated. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: move cuse only struct to cuse | Yuanhan Liu
vhost cuse is now the last reference of the vhost_device_ctx struct; move it there, and do a rename to "vhost_cuse_device_ctx", to make it clear that it's "cuse only". Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: get device by device id only | Yuanhan Liu
get_device() just needs vid, so pass vid as the parameter only. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: rename device id variable | Yuanhan Liu
For a long while I failed to figure out what "fh" means here. The only guess I had was "file handle". So, you get the point that it's not well named. I then figured out that "fh" is derived from the fuse lib, and my guess above is right. However, device_fh represents a virtio net device ID. Therefore, I rename it to vid (Virtio-net device ID, or Vhost device ID; choose the one you prefer) to make it easier to understand. This name (vid) will then be considered the only interface to applications. That's another reason to do the rename: it's our interface, so make it more understandable. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: declare device id as int | Yuanhan Liu
device_fh represents the device id for a specific virtio net device. Firstly, "int" is big enough: we don't need 64 bits. Secondly, this lets us avoid the ugly "%" PRIu64 ".." stuff. And since ctx.fh is derived from device_fh, declare it as int, too. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: set/reset device flags internally | Yuanhan Liu
It does not make sense to ask the application to set/unset the flag VIRTIO_DEV_RUNNING (which is used internally only) in the new_device()/destroy_device() callbacks. Instead, it should be set after new_device() succeeds and reset before destroy_device() is invoked, inside the vhost lib. This patch fixes it. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-06-22 | vhost: declare backend with int type | Yuanhan Liu
It's an fd, so define it as "int", which also saves the unnecessary (int) cast. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Tested-by: Rich Lane <rich.lane@bigswitch.com> Acked-by: Rich Lane <rich.lane@bigswitch.com>
2016-05-10 | vhost: fix name not null terminated | Daniel Mrzyglod
Fix issue reported by Coverity. Coverity ID 124556 If the buffer is treated as a null terminated string in later operations, a buffer overflow or over-read may occur. In vhost_set_ifname: The string buffer may not have a null terminator if the source string's length is equal to the buffer size Fixes: 54292e9520e0 ("vhost: support ifname for vhost-user") Signed-off-by: Daniel Mrzyglod <danielx.t.mrzyglod@intel.com> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-04-06 | vhost: fix error handling in destroy | Yuanhan Liu
Fix the following Coverity defect:

    291 void
    292 vhost_destroy_device(struct vhost_device_ctx ctx)
    293 {
    294 struct virtio_net *dev = get_device(ctx);
    295
    >>> CID 124565: Null pointer dereferences (NULL_RETURNS)
    >>> Dereferencing a null pointer "dev".

Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")
Reported-by: John McNamara <john.mcnamara@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-31 | vhost: use SMP barriers instead of compiler ones | Ilya Maximets
Since commit 4c02e453cc62 ("eal: introduce SMP memory barriers") virtio uses architecture dependent SMP barriers. vHost should use them too. Fixes: 4c02e453cc62 ("eal: introduce SMP memory barriers") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-25 | vhost: remove unnecessary return | Yuanhan Liu
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-17 | vhost: remove unnecessary memset when enqueueing | Yuanhan Liu
We used to have to reset the virtio net hdr in virtio_enqueue_offload(), because all mbufs shared a single virtio_hdr structure:

    struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

    foreach (mbuf) {
        virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
        copy net hdr and mbuf to desc buf
    }

However, after the vhost rxtx refactor, the code looks like:

    copy_mbuf_to_desc(mbuf)
    {
        struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};

        virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
        copy net hdr and mbuf to desc buf
    }

    foreach (mbuf) {
        copy_mbuf_to_desc(mbuf);
    }

Therefore, the memset in virtio_enqueue_offload() is not necessary any more; remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Huawei Xie <huawei.xie@intel.com>
2016-03-15 | vhost: fix default value of kickfd and callfd | Tetsuya Mukawa
Currently, default values of kickfd and callfd are -1. If the values are -1, current code guesses kickfd and callfd haven't been initialized yet. Then vhost library will guess the virtqueue isn't ready for processing. But callfd and kickfd will be set as -1 when "--enable-kvm" isn't specified in QEMU command line. It means we cannot treat -1 as uninitialized state. The patch defines -1 and -2 as VIRTIO_INVALID_EVENTFD and VIRTIO_UNINITIALIZED_EVENTFD, and uses VIRTIO_UNINITIALIZED_EVENTFD for the default values of kickfd and callfd. Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
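The two sentinel values can be told apart as in the sketch below. The macro names and values come from the commit message; `struct vq_fds`, `vq_fds_init()` and `vq_fds_ready()` are hypothetical and only illustrate the idea that -1 is a legitimate "no eventfd" state.

```c
#include <stdbool.h>

#define VIRTIO_INVALID_EVENTFD       (-1) /* QEMU explicitly sent "no fd" */
#define VIRTIO_UNINITIALIZED_EVENTFD (-2) /* nothing received yet */

struct vq_fds {          /* simplified stand-in for the virtqueue fds */
	int kickfd;
	int callfd;
};

/* Start with both fds marked as "not set up yet". */
static void vq_fds_init(struct vq_fds *vq)
{
	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
}

/* A queue counts as set up once both fds were received, even if they are -1. */
static bool vq_fds_ready(const struct vq_fds *vq)
{
	return vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
}
```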
2016-03-15 | vhost: avoid dead loop chain | Yuanhan Liu
If a malicious guest forges a dead loop chain, it could lead to a dead loop of copying the desc buf to mbufs, which results in all mbufs being exhausted. Add a variable, nr_desc, to avoid such a case. Suggested-by: Huawei Xie <huawei.xie@intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 | vhost: check for ring descriptors overflow | Yuanhan Liu
A malicious guest may easily forge an illegal vring desc buf. To make our vhost robust, we need to make sure desc->next will not go beyond the vq->desc[] array. Suggested-by: Rich Lane <rich.lane@bigswitch.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2016-03-15 | vhost: do sanity check for ring descriptor length | Yuanhan Liu
We need to make sure that desc->len is bigger than the size of the virtio net header; otherwise, unexpected behaviour might happen, because "desc_avail" would become a huge number with the following code:

    desc_avail = desc->len - vq->vhost_hlen;

For the dequeue code path, it will try to allocate enough mbufs to hold a desc buf of that size, which ends up consuming all mbufs and leaving no free mbuf available. Therefore, you might see an error message:

    Failed to allocate memory for mbuf.

Also, for both the dequeue and enqueue code paths, while data is copied from/to the desc buf, the big "desc_avail" would result in accessing memory that does not belong to the desc buf, which could lead to memory access errors.

A malicious guest could easily forge such a malformed vring desc buf. Restarting an interrupted DPDK application inside the guest would also trigger this issue every time, as all huge pages are reset to 0 during DPDK re-init, leading to desc->len being 0.

Therefore, this patch does a sanity check for desc->len, to make vhost robust.

Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
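The underflow behind the "huge number" can be reproduced and guarded against in a few lines. `desc_payload_len()` is a hypothetical helper written for this illustration; the exact comparison used by the real patch is not shown in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute how many payload bytes follow the virtio net header in a
 * descriptor buffer. Returns false if the buffer is too short to even
 * hold the header, in which case '*payload' is left untouched.
 */
static bool desc_payload_len(uint32_t desc_len, uint32_t hdr_len,
			     uint32_t *payload)
{
	/* Without this check, desc_len - hdr_len wraps around: with
	 * desc_len == 0 and hdr_len == 12 it yields 4294967284. */
	if (desc_len < hdr_len)
		return false;
	*payload = desc_len - hdr_len;
	return true;
}
```

A descriptor length of 0, as seen after hugepages are zeroed on re-init, is rejected by the check instead of turning into a multi-gigabyte copy request.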
2016-03-14 | vhost: remove wrong unlikely prediction in Rx | Yuanhan Liu
VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost. Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't make sense to me at all. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>