
Frequently Asked Questions

How do I access my container?

See Access to your container.

You may wish to install an SSH server in your container. If you do so, please ensure that password authentication is disabled in order to keep your container secure.
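For example, you can set PasswordAuthentication no in your image's /etc/ssh/sshd_config so that only key-based logins are accepted.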

How do I upload files to my container?

Using the Chameleon Python API (pre-installed in the Chameleon Jupyter environment), you can do:

from chi import container

# copy local_path on this machine to remote_path inside the container
container.upload(container_uuid, local_path, remote_path)

This method is limited to a small file size per upload, and notably requires the tar command to be available inside your container.

If your container runs an SSH server, you can copy files using tools like scp.
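For example, a command along the lines of scp -P <port> ./myfile.txt <user>@<container-address>:/tmp/ will copy a local file in; the exact user, port, and address depend on how you exposed the SSH service.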

Can I use an image on a private Docker registry?

We do not yet support pulling from private Docker registries.

Can I run my container in privileged mode or access devices?

For security reasons, we don't support privileged mode. However, you can pass devices into your container to access things like GPIO, CSI camera, I2C, serial, or USB interfaces. This is equivalent to docker run --device ...

For more information, please see the section on Device Profiles.
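As a rough sketch, a device could be requested when the container is created. Note that the device_profiles argument and the profile name below are illustrative placeholders, not the exact API; see the Device Profiles section for the real parameter names and the profiles available on each device.

from chi import container

my_container = container.create_container(
    "camera-test",
    image="python:3.10-slim",
    # hypothetical: the argument name and profile below are placeholders
    device_profiles=["csi-camera"],
    platform_version=2,
)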

Support for adding specific capabilities, as in CAP_ADD ..., is in progress.

How do I run a GPU workload on the Jetsons/Xaviers?

Most GPU workloads on NVIDIA devices require or take advantage of several libraries in the CUDA ecosystem. For convenience and simplicity, we prepackage the full CUDA, TensorRT, cuDNN, and VisionWorks libraries on our NVIDIA hosts. To mount these libraries into your container, please include the following environment variables when starting your container with the create_container() call.

NVIDIA_REQUIRE_JETPACK: "csv-mounts=all"
 - instructs the host to mount all of the libraries mentioned above
 - ref: https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/issues/11
NVIDIA_VISIBLE_DEVICES: all
 - exposes all GPU devices on the host machine to the user container
 - ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#gpu-enumeration
NVIDIA_DRIVER_CAPABILITIES: all
 - allows all NVIDIA GPU driver modules to be used by the user container
 - ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#driver-capabilities

Example usage to run PyTorch:

from chi import container, lease

my_container = container.create_container(
    "container_name",
    image="nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3",
    # keep the container alive so you can attach a shell later
    command=["/bin/bash", "-c", "--", "while true; do sleep 30; done;"],
    environment={
        "NVIDIA_REQUIRE_JETPACK": "csv-mounts=all",
        "NVIDIA_VISIBLE_DEVICES": "all",
        "NVIDIA_DRIVER_CAPABILITIES": "all",
    },
    reservation_id=lease.get_device_reservation("your-lease-id"),
    platform_version=2,
)

NVIDIA devices start with NVIDIA's container runtime by default, so there is no need to specify runtime="nvidia". However, please make sure to use a lease with NVIDIA devices.

Lastly, please make sure to use images and software that are compatible with the L4T (Linux for Tegra) version we currently run, namely L4T 32.7.3.
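Once the container is running, you can sanity-check GPU access from the Jupyter environment with container.execute. This sketch assumes PyTorch is available in the image (true for the l4t-pytorch image above) and that, as in the Chameleon tutorials, the call returns a dict with an "output" key:

from chi import container

# run a one-line CUDA availability check inside the container
result = container.execute(
    my_container.uuid,
    "python3 -c 'import torch; print(torch.cuda.is_available())'",
)
print(result["output"])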

How do I check GPU memory usage on the Jetsons?

This can be done with tegrastats, which is included in NVIDIA's L4T base image. Alternatively, you can follow these steps to get the binary and copy it into your own image.

First, download the nvidia-l4t-tools package from NVIDIA, which contains the tegrastats binary.

Extract the package with dpkg-deb -x <filename>.deb <output_dir>, and you will find the tegrastats binary in <output_dir>/usr/bin.
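If your container is already running, one way to sample tegrastats from the Jupyter environment is via container.execute; the 2-second timeout and 500 ms interval below are illustrative:

from chi import container

# sample GPU/memory stats for ~2 seconds, then stop
result = container.execute(my_container.uuid, "timeout 2 tegrastats --interval 500")
print(result["output"])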

My container stops with status Exited(1)

Check the “Logs” tab for more information on what actually went wrong.

If you see the error exec user process caused: exec format error, the issue is most likely an architecture mismatch. Make sure your container image is built for the proper CPU architecture, which is linux/arm64 on most of our devices.
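If you build your images on an x86 machine, Docker's buildx can cross-build for these devices, e.g. docker buildx build --platform linux/arm64 -t <your-image> . (the image name is a placeholder).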

What is the difference between V1 and V2 of the CHI@Edge platform?

The initial release of CHI@Edge is sometimes referred to as "V1", as opposed to "V2", which is the current release. The main difference is that self-service device enrollment is fully supported in V2, whereas in V1 we only worked with select partners. If you have used the V1 platform in the past, there are a few differences to be aware of:

  1. Console support is improved in V2. Before, you had to set your container's entrypoint/command to be an interactive shell and set the container to interactive mode in order to do much of anything with the console. In V2, the console tab in the GUI always gives you an interactive shell, no matter what the main process running in your container is.

  2. Containers can communicate on local networks in V2. Previously, all traffic had to pass through our central site. Now, containers can communicate with each other on the same local network, and traffic can egress from a local gateway.

  3. Restarting a container will clear its ephemeral filesystem. In V1, restarting behaved more like docker restart. V2 is built on Kubernetes, which has no equivalent notion: restarting is equivalent to deleting and re-creating the container (though it is pretty snappy).

What is still in progress for V2? We are still working toward V1 feature parity, and a few things remain in progress.

  1. Container snapshots and Glance images are not supported. There is no technical reason this is not possible (other than that Kubernetes considers snapshots an anti-pattern), but the V1 capability of storing and launching containers via images stored in Glance is not supported in V2. As a replacement capability, we are implementing support for pulling from private registries.
