[VirtualGL-Users] VirtualGL, docker and different X server issues

Discussion:

Mathieu Pasquet

2016-07-22 10:55:34 UTC

Hello,

We are using VirtualGL in one of our projects, and it’s great, but we
have a specific scenario that creates issues (which are probably not
due to VGL itself, but I feel that this is still the most relevant place
to ask). We use VirtualGL inside Docker, sharing both the X11 socket
and /dev/dri and making sure that groups and UIDs match. The goal is to
have a virtual X display inside the container (with the dummy driver and
x11vnc). Sadly, we cannot use TurboVNC because, to my knowledge, it
doesn’t offer a way to run an arbitrary command, which we need in order
to use TOTP[1]. We run our software from another container, using:

DISPLAY=xorg-container:0 vglrun -d :1 ./software
(:1 being the "3D" X server on the host)

[1] https://tools.ietf.org/html/rfc6238
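
A minimal sketch of how such a container might be launched, assuming Docker's standard `--device`, `-v`, `--user` and `--group-add` options. The image name `example/software` and the `video` group are hypothetical and not taken from this thread, and `xorg-container` has to resolve on the container network. The script only prints the command (a dry run), so it can be inspected before anything is started.

```shell
#!/bin/sh
# Dry-run sketch: prints the docker command instead of executing it.
# IMAGE and the "video" group are hypothetical; adjust them to your setup.
IMAGE="${IMAGE:-example/software}"

build_run_cmd() {
    # Share /dev/dri and the host X11 socket directory, and run with the
    # caller's UID/GID so that permissions match between host and container.
    printf '%s ' docker run --rm \
        --user "$(id -u):$(id -g)" \
        --group-add video \
        --device /dev/dri \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -e DISPLAY=xorg-container:0 \
        "$IMAGE" vglrun -d :1 ./software
    printf '\n'
}

build_run_cmd
```

Here `-d :1` reaches the host's "3D" X server through the shared socket directory, while `DISPLAY` points at the 2D X server in the other container.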

We were previously using only llvmpipe, without sharing anything from the
host, but obviously performance is much worse without hardware
acceleration. We still use Mesa everywhere, with the same version on the
host and in the containers.

It often works, but sometimes fails to initialize and exits after
displaying the following error (putting the vglrun call in a retry loop
works around the issue):

X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 130 (MIT-SHM)
Minor opcode of failed request: 3 (X_ShmPutImage)
Value in failed request: 0x320
Serial number of failed request: 16
Current serial number in output stream: 17

I should point out that VirtualGL 2.4.1 has this (apparently random)
behavior, while VirtualGL 2.5 only has the next error (which, however,
occurs every time, preventing us from using it).

Another strange thing is that glxinfo runs and returns the right
information, and our software works when the error above does not
occur, but any attempt at running glxgears or glxspheres results in
a vglclient error:

Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD TAHITI (DRM 2.43.0, LLVM 3.8.0)
[VGL] ERROR: Could not connect to VGL client. Make sure that vglclient is
[VGL] running and that either the DISPLAY or VGL_CLIENT environment
[VGL] variable points to the machine on which vglclient is running.
[VGL] ERROR: in connect--
[VGL] 261: Connection refused

Running them from the same container but using the :1 display for both
display and rendering does not exhibit this problem.

When running several of the xorg-vglclient-software instances, we can
also sometimes observe buffers from one instance leaking into another,
or inverted output.

I would like to inquire if any of these errors ring a bell, or if the
architecture is fundamentally flawed due to DRI, DRM, permissions, or
Xorg shenanigans.

Best regards,

--
Mathieu Pasquet
R&D Engineer
alter way

DRC

2016-07-22 21:14:25 UTC

I'm not totally sure I understand your architecture or why exactly you're doing things the way you're doing them, but I think you can get rid of the vglclient error by adding -c 0 to the vglrun command line. VirtualGL tries to automatically detect whether it should use the X11 Transport or the VGL Transport, and it assumes that if the 2D X server isn't on the same machine, it should use the VGL Transport, which adds compression to the image stream (but requires that you connect using vglconnect in order to start the listener on the 2D X server).
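
The detection rule described above can be illustrated with a small sketch. This is a simplification written for this thread, not VirtualGL's actual source: the function name `guess_transport` is hypothetical, and it only checks whether DISPLAY carries a host part.

```shell
#!/bin/sh
# Illustration only: mimics the heuristic described above, not VirtualGL's
# real implementation. A DISPLAY with a host part is treated as a 2D X
# server on another machine.
guess_transport() {
    case "$1" in
        :*|unix:*) echo "X11 Transport (uncompressed)" ;;
        *)         echo "VGL Transport (needs vglclient)" ;;
    esac
}

guess_transport ":0"
guess_transport "xorg-container:0"

# Forcing the X11 Transport, as suggested (commented out: needs VirtualGL):
# DISPLAY=xorg-container:0 vglrun -c 0 -d :1 ./software
```

With `DISPLAY=xorg-container:0` the host part makes the 2D X server look remote, which is why the VGL Transport was selected and the connection to vglclient was refused.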

I'll re-read this when I am in a better position to process it, and perhaps I can offer some more intelligent suggestions. I would definitely like to add TOTP support to TurboVNC. Does that work using PAM? If so, then it can probably be made to work with existing TurboVNC releases.

_______________________________________________
VirtualGL-Users mailing list
https://lists.sourceforge.net/lists/listinfo/virtualgl-users

Mathieu Pasquet

2016-07-25 14:02:55 UTC

Thanks, vglrun -c 0 does fix the vglclient error message, and I assume
I have nothing to gain from compression on a virtual internal network.
Regarding TOTP, I think there is an experimental third-party PAM
module for it, but it’s easier to run an external lightweight command
like oathtool (e.g. with x11vnc, I only have to specify -passwdfile
"cmd:oathtool --totp $SECRET").

--
Mathieu Pasquet
R&D engineer
alter way
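
A minimal sketch of the x11vnc approach described in the message above. The default SECRET is a placeholder (the hex form of the RFC 6238 test key "12345678901234567890"), and the helper name `passwdfile_arg` is hypothetical. oathtool reads the key as hex by default, or as base32 with `-b`.

```shell
#!/bin/sh
# SECRET is a placeholder key for illustration; never use it in production.
SECRET="${SECRET:-3132333435363738393031323334353637383930}"

# Build the -passwdfile argument; with the cmd: prefix, x11vnc runs the
# command and uses its output as the password.
passwdfile_arg() {
    printf 'cmd:oathtool --totp %s\n' "$1"
}

passwdfile_arg "$SECRET"

# Print the current code if oathtool is installed.
if command -v oathtool >/dev/null 2>&1; then
    oathtool --totp "$SECRET"
fi

# Usage (commented out: needs a running X display):
# x11vnc -display :0 -passwdfile "$(passwdfile_arg "$SECRET")"
```

Note that a key passed on the command line is visible in the process list, so on a shared machine it may be preferable to have the command read the key from a protected file.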

DRC

2016-07-25 14:41:05 UTC

OK, it sounds like I need to tinker with TOTP. Seems like it should be straightforward to add support for that.

As for compression, if you're transmitting the pixels within the same physical machine, then no, there is nothing to gain from it. Leave it uncompressed.

DRC

2016-07-28 04:00:34 UTC

I was able to get TOTP working successfully in TurboVNC using pam_oath
on CentOS 6 & 7, Fedora, and Ubuntu, using Google Authenticator on my
phone. The strawman procedure is here:

http://www.turbovnc.org/Documentation/TOTP

Please try it out and let me know what breaks, or if anything is unclear
in the procedure, or if it doesn't meet your needs for some reason.

AFAICT, there isn't really a cleaner or more secure way of doing this
from within TurboVNC. x11vnc is fundamentally a single-user
application, whereas TurboVNC is multi-user, so in order to do what
x11vnc is doing, I would have to implement the following in Xvnc:

-- A mechanism for specifying a server-side authentication command to
use with Unix Login/Plain authentication. For security reasons, this
command would have to be specified in the TurboVNC security
configuration file.

-- Some method of passing the user's secret key to this command. One
possible method would be to have Xvnc read an environment variable with
the key, or to store it under ~/.vnc, and insert this key into the
aforementioned authentication command spec. Another possible method
would be to use a global file that contains all of the users' secret
keys, which is very similar to how the OATH PAM module does it.

-- A new server-side authentication method that uses the aforementioned
authentication command instead of PAM to authenticate user/password pairs.

All of this would require significant effort and would greatly
increase the complexity of TurboVNC, and AFAICT, it would provide no
advantages relative to the OATH PAM module.
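
For reference, the global secret-key file used by the OATH PAM module can be sketched as follows. This assumes the users-file layout documented by oath-toolkit's pam_oath (type, user name, optional PIN, hex key). The helper name, the user name, and the key are hypothetical, and the PAM line in the comment is an assumption rather than a quote from the TurboVNC procedure.

```shell
#!/bin/sh
# Illustration only: prints one entry in the pam_oath users-file format.
# The user name and hex key passed below are placeholders.
usersfile_line() {
    # HOTP/T30/6 = time-based codes, 30-second step, 6 digits.
    # The third field is the optional PIN; "-" leaves it unset.
    printf 'HOTP/T30/6\t%s\t-\t%s\n' "$1" "$2"
}

usersfile_line alice 3132333435363738393031323334353637383930

# A matching PAM entry might look like this (assumed, adjust to your system):
# auth requisite pam_oath.so usersfile=/etc/users.oath window=30 digits=6
```

The users file holds every user's key, so it should be readable by root only.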
