ikatago Configuration

Architecture

Publish Date: 2024-10-27

Install with the script:


cd ~; /bin/bash -c "$(curl -fsSL https://ikatago-resources.oss-cn-beijing.aliyuncs.com/all/install.sh)"

If this fails with "sh: curl: command not found", curl is not installed. Install it first:

``` shell
apt-get update
apt install curl
```
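
The curl check can also be done up front instead of waiting for the error. A minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name and is not part of the ikatago installer:

```shell
# have_cmd NAME: succeed if NAME resolves to a command in the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd curl; then
  echo "curl is available; the install script can be fetched"
else
  echo "curl is missing; run: apt-get update && apt install curl"
fi
```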
Dockerfiles: https://github.com/kinfkong/ikatago-resources/tree/master/dockerfiles

These support CUDA 11 at most, so some small modifications are needed.

cd ~/work; ./run.sh zxdfcv Nz216318

python3
>>> import tensorrt
>>> print(tensorrt.__version__)
>>> assert tensorrt.Builder(tensorrt.Logger())

python3 -m pip install --upgrade pip
python3 -m pip install wheel
python3 -m pip install --upgrade tensorrt

python3 -m pip install --upgrade tensorrt_lean # optional
python3 -m pip install --upgrade tensorrt_dispatch # optional

wget http://mirrors.kernel.org/ubuntu/pool/universe/libz/libzip/libzip5_1.5.1-0ubuntu1_amd64.deb
sudo apt install ./libzip5_1.5.1-0ubuntu1_amd64.deb


sudo apt-get install tensorrt

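The alternatives system (that is, /etc/alternatives) is a Linux mechanism for managing multiple installed versions of the same software. If your system uses it, you can switch between CUDA versions (or versions of other software) without editing symbolic links by hand.

How `alternatives` works

The alternatives system uses a chain of symbolic links to tie a generic name to a specific software version:

Main links: /usr/local/cuda and /usr/local/cuda-12 point to /etc/alternatives/cuda and /etc/alternatives/cuda-12 respectively.
Alternative links: /etc/alternatives/cuda and /etc/alternatives/cuda-12 point to a concrete CUDA installation and can be retargeted at any time.

Retargeting /etc/alternatives/cuda and /etc/alternatives/cuda-12 therefore switches the default CUDA version, with no need to create and maintain symbolic links manually.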
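
The two-level link chain can be reproduced without root in a temporary directory. This is only a sketch of the mechanism using plain `ln`, under the assumption that GNU-style `readlink -f` is available; the real system manages the links through `update-alternatives`, and the directory names here are placeholders:

```shell
# Build a throwaway copy of the link chain in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/alternatives" "$demo/cuda-12.4" "$demo/cuda-12.6"

# Alternative link: points at one concrete version.
ln -s "$demo/cuda-12.4" "$demo/alternatives/cuda"
# Main link: the generic name, which always points at the alternative link.
ln -s "$demo/alternatives/cuda" "$demo/cuda"

readlink -f "$demo/cuda"    # resolves to the cuda-12.4 directory

# Switching versions only retargets the alternative link; the main link is untouched.
ln -sfn "$demo/cuda-12.6" "$demo/alternatives/cuda"
readlink -f "$demo/cuda"    # now resolves to the cuda-12.6 directory
```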

Setting the default CUDA version with `alternatives`

Use the update-alternatives command to choose the default CUDA version, for example:

sudo update-alternatives --config cuda

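The command lists every CUDA version registered with alternatives and prompts you to pick the default.

Setting CUDA 12.6 as the default version

Assuming CUDA 12.6 has already been registered with the alternatives system, run the following commands to make it the default: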

sudo update-alternatives --set cuda /usr/local/cuda-12.6
sudo update-alternatives --set cuda-12 /usr/local/cuda-12.6

This points both /etc/alternatives/cuda and /etc/alternatives/cuda-12 at CUDA 12.6, which makes CUDA 12.6 the default.

os="ubuntuxx04"
tag="10.x.x-cuda-x.x"
sudo dpkg -i nv-tensorrt-local-repo-${os}-${tag}_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-${os}-${tag}/*-keyring.gpg /usr/share/keyrings/
# ubuntu2204-10.2.0-cuda-12.5
# sudo cp /var/nv-tensorrt-local-repo-ubuntu2204-10.2.0-cuda-12.5/*-keyring.gpg /usr/share/keyrings/
sudo apt-mark hold libnvinfer* tensorrt* # hold these packages so apt does not change their version
sudo apt-get update

sudo apt-get install tensorrt # full install
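
The `os` and `tag` placeholders determine both the package file name and the keyring directory. A small sketch of how the names are assembled, filled in with the values from the commented example above; `trt_repo_name` is a hypothetical helper introduced only for illustration:

```shell
# trt_repo_name OS TAG: print the local repo name used by the dpkg and cp commands.
trt_repo_name() {
  printf 'nv-tensorrt-local-repo-%s-%s\n' "$1" "$2"
}

os="ubuntu2204"
tag="10.2.0-cuda-12.5"
repo=$(trt_repo_name "$os" "$tag")

echo "package file: ${repo}_1.0-1_amd64.deb"
echo "keyring dir:  /var/${repo}/"
```

Printing the names first makes it easy to confirm that the downloaded .deb file matches before running `dpkg -i`.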

sudo apt-get remove --purge 'libnvinfer*'
sudo apt-get autoremove

sudo update-alternatives --install /usr/local/cuda cuda /usr/local/cuda-12.5 130 # set the priority, which decides what automatic mode selects

Remove:

sudo apt-get remove --purge 'libnvinfer*'
sudo apt-get autoremove

sudo dpkg -i nv-tensorrt-local-repo-ubuntu2204-10.2.0-cuda-12.5_1.0-1_amd64.deb
sudo apt-get install -f

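Installing TensorRT from the .tar package does solve this problem. A .tar install does not go through the APT package manager, so automatic system updates cannot touch it. Extracting and configuring the .tar file by hand gives full control over the TensorRT version and avoids interference from APT dependencies or priority settings.

Steps to install TensorRT from a `.tar` file

Download the TensorRT 10.2.0 .tar package

Get the TensorRT 10.2.0 .tar.gz package built for CUDA 12.5 from the NVIDIA TensorRT download page and save it locally.

Extract the TensorRT files

Extract the .tar.gz file into a target directory. Extracting into /usr/local/ creates /usr/local/TensorRT-10.2.0.19: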
```
sudo tar -xzvf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz -C /usr/local/
```
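Set environment variables

Add the TensorRT library path to LD_LIBRARY_PATH so that the dynamic linker can find the correct libraries. The following command appends the export line to ~/.bashrc: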
```
echo 'export LD_LIBRARY_PATH=/usr/local/TensorRT-10.2.0.19/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
```
Then reload the environment:
```
source ~/.bashrc
```
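
Running the `echo ... >> ~/.bashrc` command more than once appends the same export line each time. A minimal sketch of a guard against that, demonstrated on a temporary file instead of the real ~/.bashrc; `append_once` is a hypothetical helper name:

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line already exists.
append_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a temporary file rather than the real ~/.bashrc.
rc=$(mktemp)
export_line='export LD_LIBRARY_PATH=/usr/local/TensorRT-10.2.0.19/lib:$LD_LIBRARY_PATH'
append_once "$export_line" "$rc"
append_once "$export_line" "$rc"   # second call finds the line and does nothing
cat "$rc"
```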
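Install the Python bindings (if needed)

If you need the TensorRT Python bindings, look in the python folder of the extracted directory, find the .whl file that matches your Python version, and install it with pip: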
```
pip install /usr/local/TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp<python_version>-none-linux_x86_64.whl

pip3 install tensorrt-10.2.0.19-cp310-none-linux_x86_64.whl
```
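
The `cp<python_version>` part of the wheel name is the interpreter's major and minor version with the dot removed, for example `cp310` for Python 3.10. A small sketch of that mapping; `py_wheel_tag` is a hypothetical helper, and it assumes a version string of the form MAJOR.MINOR or MAJOR.MINOR.PATCH:

```shell
# py_wheel_tag VERSION: turn a Python version such as "3.10.12" into a tag such as "cp310".
py_wheel_tag() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot, if any
  printf 'cp%s%s\n' "$major" "$minor"
}

py_wheel_tag 3.10.12
py_wheel_tag 3.8
```

To feed it the running interpreter's version, one option is `py_wheel_tag "$(python3 -c 'import platform; print(platform.python_version())')"`.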
Verify the installation

Use the following command to confirm that the TensorRT libraries are in place:
```
ls /usr/local/TensorRT-10.2.0.19/lib
```
Check the TensorRT version in Python:
```
import tensorrt as trt
print(trt.__version__)
```

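Advantages

Unaffected by APT updates: a .tar install is not tracked by the APT package manager, so updates and dependency changes cannot alter it.
More flexible control: the TensorRT version is managed by hand, which suits setups where several versions must coexist.

Installing from the .tar package this way pins a specific TensorRT version and prevents a system update from replacing it.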


sudo apt-mark unhold libnvinfer* tensorrt*

wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch-tensorrt 2.2.0a0 requires tensorrt<8.7,>=8.6, but you have tensorrt 10.2.0 which is incompatible.
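
The message means that the installed torch-tensorrt 2.2.0a0 only accepts tensorrt versions from 8.6 up to, but not including, 8.7, and 10.2.0 falls outside that range. The range check can be sketched as below. This is a simplification that compares only the major and minor numbers, not pip's actual version logic, and `ver_key` and `in_range` are hypothetical helper names:

```shell
# ver_key VERSION: map "MAJOR.MINOR[.PATCH]" to a comparable integer (MAJOR*1000 + MINOR).
ver_key() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  echo $(( major * 1000 + minor ))
}

# in_range VERSION LOW HIGH: succeed if LOW <= VERSION < HIGH.
in_range() {
  v=$(ver_key "$1")
  lo=$(ver_key "$2")
  hi=$(ver_key "$3")
  [ "$v" -ge "$lo" ] && [ "$v" -lt "$hi" ]
}

if in_range 10.2.0 8.6 8.7; then
  echo "tensorrt 10.2.0 satisfies >=8.6,<8.7"
else
  echo "tensorrt 10.2.0 conflicts with >=8.6,<8.7"
fi
```

Running `python3 -m pip check` reports the same kind of conflict for the packages actually installed.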



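Configuring CUDA through the `alternatives` system (that is, `/etc/alternatives`) is a flexible way to manage several CUDA versions on one machine and switch between them easily. The steps are as follows:

#### 1. Create an `alternatives` entry for `nvcc`

First, use the `update-alternatives` command to register `nvcc`, so that it can be switched between CUDA versions.

Assuming CUDA 12.4 is installed in `/usr/local/cuda-12.4`, set it up as follows: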

```bash
sudo update-alternatives --install /usr/bin/nvcc nvcc /usr/local/cuda-12.4/bin/nvcc 1204
```

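#### 2. Configure `alternatives` for the rest of the CUDA environment

CUDA has other important paths, such as lib64, and it is worth managing them with alternatives as well.

Add a link for the CUDA path

Register a cuda entry with alternatives so that the CUDA installation directories can be managed and switched together: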

sudo update-alternatives --install /usr/local/cuda cuda /usr/local/cuda-12.4 1204

After this, /usr/local/cuda points to the currently selected CUDA version.
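
When several CUDA versions are installed, each one needs its own `--install` call. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is changed. It assumes the installs live in directories named cuda-MAJOR.MINOR under one root and derives the priority as MAJOR*100 + MINOR, which reproduces the 1204 used for CUDA 12.4; `cuda_priority` and `print_install_cmds` are hypothetical helper names:

```shell
# cuda_priority VERSION: "12.4" -> 1204, so newer versions get a higher priority.
cuda_priority() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  echo $(( major * 100 + minor ))
}

# print_install_cmds ROOT: print one update-alternatives command per ROOT/cuda-X.Y directory.
print_install_cmds() {
  root=$1
  for dir in "$root"/cuda-*.*; do
    [ -d "$dir" ] || continue          # skip the unexpanded pattern and non-directories
    ver=${dir##*/cuda-}
    echo "sudo update-alternatives --install /usr/local/cuda cuda $dir $(cuda_priority "$ver")"
  done
}

print_install_cmds /usr/local
```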

#### 3. Switch CUDA versions

Use update-alternatives --config to switch CUDA versions:

sudo update-alternatives --config nvcc
sudo update-alternatives --config cuda

The system lists the available versions, and you pick the one to switch to.

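#### 4. Configure environment variables

The environment variables can reference /usr/local/cuda directly, because the alternatives system keeps it pointing at the currently selected CUDA version. Add the following lines to your .bashrc file: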

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

After saving, run the following command to apply the changes:

source ~/.bashrc
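
The `${VAR:+:${VAR}}` form in the export lines adds the colon and the old value only when the variable is already set and non-empty. That avoids a trailing colon, which would leave an empty entry that the shell and the dynamic linker treat as the current directory. A small sketch of the expansion; `prepend_path` is a hypothetical helper used only to show the behaviour:

```shell
# prepend_path DIR CURRENT: print DIR, followed by ":CURRENT" only when CURRENT is non-empty.
prepend_path() {
  dir=$1
  current=$2
  printf '%s\n' "${dir}${current:+:${current}}"
}

prepend_path /usr/local/cuda/lib64 ""            # prints /usr/local/cuda/lib64
prepend_path /usr/local/cuda/lib64 "/opt/lib"    # prints /usr/local/cuda/lib64:/opt/lib
```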

#### 5. Verify the configuration

Use the following commands to confirm that nvcc and the CUDA paths are configured correctly:

which nvcc
nvcc --version

With this setup you can switch between multiple CUDA versions easily, and /usr/local/cuda always points to the currently selected one.
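
Because nvcc is reached through one or more symbolic links, `which nvcc` alone does not show which CUDA installation is actually in use. A minimal sketch that follows the links to the real file, assuming GNU-style `readlink -f`; `resolve_cmd` is a hypothetical helper name:

```shell
# resolve_cmd NAME: print the fully resolved path of NAME, or fail if it is not on PATH.
resolve_cmd() {
  found=$(command -v "$1") || return 1
  readlink -f "$found"
}

if target=$(resolve_cmd nvcc); then
  echo "nvcc resolves to: $target"
else
  echo "nvcc is not on PATH"
fi
```

After switching versions with `update-alternatives --config`, the resolved path should change to the newly selected installation.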

Yixiang Zhang

http://zxdfcv.github.io/2024/10/27/ikatago-%E9%85%8D%E7%BD%AE/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Yixiang Zhang.
