2024-12-17

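Docker provides isolation, so running a rootfs under Docker is safer than using chroot. This post looks at how to run a rootfs with Docker. The main steps are:

Build an ext4 image (img) of the rootfs
Import it as an image with docker import
Start the image with docker run
Log in to the system, make changes, and save them with docker commit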

1: Build the rootfs

The rootfs is built the same way as in the earlier post. Roughly:


dd if=/dev/zero of=rootfs.img bs=1k count=1024
mkfs.ext4 rootfs.img
e2fsck -fy rootfs.img
resize2fs rootfs.img 10G
mkdir -p rootfs
mount rootfs.img rootfs
# -a copies recursively and preserves ownership, modes and symlinks
cp -a rootfs_file/* rootfs/
umount rootfs
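
The steps above can also be wrapped in a small function. The following is only a sketch under a few assumptions: it uses truncate to create a sparse file of the final size instead of dd plus resize2fs, the helper names run and make_rootfs_img are made up for this example, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
# Print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# make_rootfs_img IMAGE SIZE SRC_DIR MOUNT_POINT (all four are placeholders)
make_rootfs_img() {
    img=$1 size=$2 src=$3 mnt=$4
    run truncate -s "$size" "$img"   # sparse file of the final size
    run mkfs.ext4 -F "$img"          # -F: accept a regular file as target
    run mkdir -p "$mnt"
    run mount -o loop "$img" "$mnt"
    run cp -a "$src/." "$mnt/"       # keep owners, modes and symlinks
    run umount "$mnt"
}
```

For example, make_rootfs_img rootfs.img 10G rootfs_file rootfs prints the six commands for review; with DRY_RUN=0 and root privileges they are executed instead.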

2: Import the image


mount rootfs.img rootfs
cd rootfs
tar -cv . | docker import - test_image
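Or take the filesystem from an existing container (docker export works on containers, so example_image below must be a container name or ID):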
docker export -o rootfs.tar example_image
docker import rootfs.tar test_image

3: Start the image


docker run -it --name "`date +%y%m%d-``openssl rand -hex 2`" -p 2222:22 --hostname myOS --privileged --cap-add=sys_admin --env container=docker \
        --entrypoint=/usr/lib/systemd/systemd \
        --mount type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup \
        --mount type=bind,source=/sys/fs/fuse,target=/sys/fs/fuse \
        --mount type=tmpfs,destination=/tmp \
        --mount type=tmpfs,destination=/run \
        --mount type=tmpfs,destination=/run/lock \
        test_image:latest \
        --unit=multi-user.target

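The parameters are as follows:

--privileged grants the container all capabilities and access to host devices
--cap-add=sys_admin allows system administration operations (already implied by --privileged)
--entrypoint=/usr/lib/systemd/systemd runs systemd as the container's first process
--env container=docker lets systemd (and related programs) detect that it is running in a container, which is required for ConditionVirtualization= in systemd unit files to work
--mount type=bind bind-mounts a host directory into the container
--unit=multi-user.target is the target systemd boots into; replace it with your own target if needed
-p 2222:22 maps port 22 of the container to port 2222 on the host, so you can log in with ssh -p 2222 localhost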
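
The --name argument in the command above is built from the date plus two random bytes in hex. A minimal sketch of the same idea as a reusable function follows; gen_container_name is a name invented here, and it reads /dev/urandom through od instead of calling openssl, which is an assumption made to avoid the extra dependency.

```shell
# Print a name such as 241217-3fa9: yymmdd, a dash, then 2 random bytes in hex.
gen_container_name() {
    suffix=$(od -An -N2 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s-%s\n' "$(date +%y%m%d)" "$suffix"
}
```

It could then be used as docker run -it --name "$(gen_container_name)" followed by the remaining options.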

4: Modify and commit

After changing files inside the container, run poweroff so the container exits, then commit it:


docker commit -a "name@test.cn" -m "do something" $CONTAINER_ID test_image:v1

After the commit, docker images lists both images, and the new image can be used for further development.

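A small problem:

After installing Docker, the docker0 bridge could not be assigned an IP address.

Fix: create the bridge and assign the address by hand, then run dockerd manually to verify.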


ip link add name docker0 type bridge
ip addr add dev docker0 172.17.0.1/16

Other useful commands


docker image save test_image:v1 -o rootfs.gz
docker image load -i rootfs.gz

References:

https://github.com/docker/for-linux/issues/123#issuecomment-346546953
https://www.kernel.org/doc/html/latest/filesystems/fuse.html
https://docs.docker.com/engine/reference/commandline/import/
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md
https://serverfault.com/questions/607769/running-systemd-inside-a-docker-container-arch-linux
https://cloud-atlas.readthedocs.io/zh_CN/latest/docker/init/docker_systemd.html

2024-12-17

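An earlier post booted a Linux system with UML. Building on that, this post uses QEMU to boot a virtual system of a different architecture. The main steps are:

Build qemu-system-aarch64
Build the Linux kernel
Download an aarch64 rootfs image
Boot the kernel and system with qemu-system-aarch64

1: Build QEMU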


apt install gcc-10-aarch64-linux-gnu
wget https://download.qemu.org/qemu-7.0.0-rc0.tar.bz2
tar xjf qemu-7.0.0-rc0.tar.bz2
cd qemu-7.0.0-rc0
./configure --target-list=aarch64-softmmu
make -j8 && make install
ln -s /usr/bin/aarch64-linux-gnu-gcc-10  /usr/bin/aarch64-linux-gnu-gcc
export ARCH=arm64
export CROSS_COMPILE=aarch64-linux-gnu-

After installation, check the QEMU version:


qemu-system-aarch64 --version
QEMU emulator version 6.2.90
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

2: Build the Linux kernel

The kernel source again comes from the linux-source-5.13.0 package.


apt install linux-source-5.13.0

The source ends up in /usr/src/linux-source-5.13.0/linux-source-5.13.0/

Build it with:


ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make defconfig
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make -j8

The build produces the following kernel image:


file arch/arm64/boot/Image.gz
arch/arm64/boot/Image.gz: gzip compressed data, max compression, from Unix, original size modulo 2^32 33122816

3: rootfs image


wget https://cdimage.ubuntu.com/ubuntu-base/focal/daily/current/focal-base-arm64.tar.gz

Turn this tarball into an ext4 image named rootfs.img.

4: Boot the system

Boot the rootfs image directly with the kernel:


qemu-system-aarch64 -M virt -cpu cortex-a72 -kernel Image.gz -append "root=/dev/vda" -hda rootfs.img  -nographic

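Because of -nographic, the system only outputs to the serial console.

The downloaded rootfs does not boot with systemd by default. chroot into it to install the required packages, set the system to text mode, and make sure getty starts on ttyAMA0: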


chroot rootfs/
apt update
systemctl set-default multi-user.target
ln -s /lib/systemd/system/getty@.service /etc/systemd/system/getty.target.wants/getty@ttyAMA0.service

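The system now boots, but it has no network yet. As before, the tap approach is used: create a virtual NIC named tap3 on the host and use it to talk to the virtual machine.

Create tap3 on the host: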


ip tuntap add tap3 mode tap group tf
ip addr add 192.168.0.100/24 dev tap3
ip link set dev tap3 up
echo 1 > /proc/sys/net/ipv4/conf/tap3/proxy_arp
iptables -I FORWARD -i tap3 -j ACCEPT
iptables -I FORWARD -o tap3 -j ACCEPT

The QEMU command line is:


qemu-system-aarch64 -M virt -cpu cortex-a72 -smp 4 -m 4096 -kernel Image.gz -nographic -append "console=ttyAMA0 root=/dev/vda rw rootfstype=ext4 ignore_loglevel"  -drive if=none,file=rootfs.img,id=hd0 -device virtio-blk-device,drive=hd0 -netdev tap,id=n1,ifname=tap3,script=no -device e1000,netdev=n1,mac=02:27:d1:32:44:7f

-netdev tap,id=n1,ifname=tap3,script=no

This tells QEMU to use the host tap device named tap3 as the network backend, with id n1, and to skip the default ifup script.

-device e1000,netdev=n1,mac=02:27:d1:32:44:7f

This gives the virtual machine an e1000 NIC attached to backend n1. The MAC address here is set to that of the host's tap3:


ifconfig tap3  | grep ether
        ether 02:27:d1:32:44:7f  txqueuelen 1000  (以太网)
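
The MAC address above starts with the octet 02, whose lowest bit is clear and second-lowest bit is set. That combination marks a locally administered unicast address. When picking a MAC by hand, a check like the following sketch may help; is_local_unicast_mac is a hypothetical helper, not part of QEMU or iproute2.

```shell
# Return 0 if the MAC is a locally administered unicast address.
is_local_unicast_mac() {
    first=${1%%:*}                  # first octet, e.g. "02"
    case $first in
        [0-9a-fA-F][0-9a-fA-F]) ;;
        *) return 2 ;;              # not a two-digit hex octet
    esac
    val=$((0x$first))
    # bit 0 clear = unicast, bit 1 set = locally administered
    [ $((val & 1)) -eq 0 ] && [ $((val & 2)) -ne 0 ]
}
```

Calling is_local_unicast_mac 02:27:d1:32:44:7f succeeds, while a multicast address such as 01:00:5e:00:00:01 is rejected.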

Network configuration inside the virtual machine:


/etc/network/interfaces
auto enp0s1
allow-hotplug enp0s1
iface enp0s1 inet static
address 192.168.0.200
netmask 255.255.255.0
gateway 192.168.0.1
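
The host side of this setup writes the subnet as a /24 prefix, while /etc/network/interfaces uses a dotted netmask. As a quick cross-check that both describe the same subnet, here is a sketch of a converter; mask_to_prefix is a made-up helper, and it does not verify that the mask bits are contiguous.

```shell
# Convert a dotted netmask such as 255.255.255.0 into a prefix length.
mask_to_prefix() {
    prefix=0
    oldifs=$IFS
    IFS=.
    set -- $1            # split the mask into its four octets
    IFS=$oldifs
    for octet in "$@"; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0) ;;
            *) return 1 ;;   # not a valid netmask octet
        esac
    done
    echo "$prefix"
}
```

With the netmask from the configuration above, mask_to_prefix 255.255.255.0 matches the /24 used for tap3 on the host.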

After that you can log in over ssh.

Extra:

qemu-system-aarch64 can also emulate U-Boot (something I came across while browsing the web).

Build U-Boot


git clone https://github.com/ARM-software/u-boot.git
cd u-boot
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make qemu_arm64_defconfig
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make -j8

Start U-Boot


qemu-system-aarch64 -machine virt -cpu cortex-a57 -bios u-boot.bin -nographic

U-Boot now starts successfully.

References:

https://github.com/ARM-software/u-boot/blob/master/doc/README.qemu-arm
https://www.qemu.org/docs/master/system/linuxboot.html
https://pandysong.github.io/blog/post/run_u-boot_in_qemu/
https://stackoverflow.com/questions/58028789/how-to-build-and-boot-linux-aarch64-with-u-boot-with-buildroot-on-qemu
https://zhuanlan.zhihu.com/p/41258581
https://wiki.qemu.org/Documentation/Networking

2024-12-17

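While reading LDD3 I came across User Mode Linux (UML), so I tried it out.
The main steps are:

Build the kernel for the um architecture
Create an ext4 rootfs image
Boot the kernel
Set up networking

1: Build the kernel

On an Ubuntu machine, fetch the kernel source package linux-source-5.13.0. The Tsinghua mirror in China has it at

https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/l/linux/

The deb is

linux-source-5.13.0_5.13.0-37.42_all.deb

Installing it provides

/usr/src/linux-source-5.13.0/linux-source-5.13.0.tar.bz2

Extract

tar xvjf linux-source-5.13.0.tar.bz2

Build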


make ARCH=um defconfig
make ARCH=um menuconfig
make ARCH=um -j8

Build errors are usually caused by missing deb packages; install whatever the error points to.

The build produces a binary named linux:


# file linux
linux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=d2f9a2cb99247191eaa4e592eeedc6bd2a8c021d, with debug_info, not stripped

As the following shows, UML in this kernel tree can only be built for x86; arm64 is not an option:


# find arch/ | grep Makefile.um
arch/x86/Makefile.um

2: Create the rootfs image


dd if=/dev/urandom of=rootfs.img count=1024
mkfs.ext4 rootfs.img
resize2fs rootfs.img 500M

Download the Ubuntu 20.04 base tarball from

http://cdimage.ubuntu.com/ubuntu-base/focal/daily/current/

which provides focal-base-amd64.tar.gz


mkdir rootfs
mount rootfs.img rootfs
tar xvzf focal-base-amd64.tar.gz -C rootfs
chroot rootfs
adduser kylin
umount rootfs
e2fsck -fy rootfs.img

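This gives the most basic system environment.

3: Boot the kernel

Booting the kernel just means passing arguments to the linux binary.


./linux --help    # lists the supported options

My boot command is: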


./linux ubd0=rootfs.img rw mem=1024m eth0=tuntap,tap3,72:d4:bc:87:80:c9,192.168.0.100 init=/sbin/init

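ubd0 is the block device UML mounts as root by default; here it points to the ubuntu-base image built above
rw means mount read-write, as in bootargs
mem sets the available memory to 1G
eth0=... describes the network device
init sets the first program the kernel runs; /sbin/init here is systemd

After boot, the terminal shows logs like: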


[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

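You can now log in with screen, because UML attached tty1 (where login runs) to /dev/pts/4 in this run.

Just run


screen /dev/pts/4

and the login prompt appears; enter a valid username and password to log in.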

4: Set up networking

The booted Linux system cannot reach the outside network yet, so networking has to be set up; after that sshd can be used to log in.

On the host:


ip tuntap add tap3 mode tap group tangfeng
chown root:tangfeng /dev/net/tun
ip addr add 192.168.0.100/24 dev tap3
ip link set dev tap3 up
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/conf/tap3/proxy_arp
iptables -t nat -I POSTROUTING -o enp2s0 -j MASQUERADE
iptables -I FORWARD -i tap3 -j ACCEPT
iptables -I FORWARD -o tap3 -j ACCEPT

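tap3 is a virtual NIC created through the tun driver. Its IP address can be in any subnet you choose; it is used to communicate with the Linux system inside the virtual machine.

Once configured it looks like this: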


# ifconfig tap3
tap3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.100  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::70d4:bcff:fe87:80c9  prefixlen 64  scopeid 0x20<link>
        ether 72:d4:bc:87:80:c9  txqueuelen 1000  (以太网)
        RX packets 1945  bytes 177904 (177.9 KB)
        RX errors 0  dropped 147  overruns 0  frame 0
        TX packets 2991  bytes 3683367 (3.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
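
To undo the host setup after testing, the rules and the interface can be removed again. The sketch below only prints the commands so they can be reviewed first; tap_teardown_cmds is a hypothetical helper, the defaults tap3 and enp2s0 are taken from the commands above, and the ip_forward setting is left untouched.

```shell
# Print the commands that remove the tap interface and its iptables rules.
# Usage: tap_teardown_cmds [TAP_NAME] [UPLINK_NAME]
tap_teardown_cmds() {
    tap=${1:-tap3}
    uplink=${2:-enp2s0}
    cat <<EOF
iptables -D FORWARD -o $tap -j ACCEPT
iptables -D FORWARD -i $tap -j ACCEPT
iptables -t nat -D POSTROUTING -o $uplink -j MASQUERADE
ip link set dev $tap down
ip tuntap del dev $tap mode tap
EOF
}
```

After checking the output, it can be piped to a root shell, for example tap_teardown_cmds tap3 enp2s0 | sh.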

In the virtual machine:


ip link set dev eth0 up
ip addr add 192.168.0.200/24 dev eth0
ip route add default via 192.168.0.100
chmod 777 /tmp

Once configured it looks like this:


~# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.200  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 72:d4:bc:87:80:c9  txqueuelen 1000  (Ethernet)
        RX packets 8  bytes 636 (636.0 B)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 5

A DNS server also has to be set inside the virtual machine so that domain names resolve:


echo 'nameserver 8.8.8.8' > /etc/resolv.conf

All done. The address 192.168.0.200 is now reachable through tap3. Let's try it:


# ssh kylin@192.168.0.200
kylin@192.168.0.200's password:
Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.19 x86_64)

From here you can do whatever you like.

5: References:

https://wiki.archlinux.org/index.php?title=User-mode_Linux
http://uml.devloop.org.uk/howto.html
https://www.kernel.org/doc/html/v5.9/virt/uml/user_mode_linux.html
https://www.kernel.org/doc/html/v5.13/virt/uml/user_mode_linux_howto_v2.html?highlight=uml

2024-12-17

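SysRq is a Linux debugging facility. It is typically used when the system hangs but the kernel is not completely dead: sending a key combination to the kernel retrieves the kernel information needed for diagnosis.

Enabling SysRq requires the following kernel config: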


CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1

Check the current SysRq setting:


root@ywen233:~# sysctl -a | grep sysrq
kernel.sysrq = 176

Meaning of the sysrq value:


        0 - disable sysrq completely
        1 - enable all functions of sysrq
        >1 - bitmask of allowed sysrq functions (see below for detailed function
            description):
               2 - enable control of console logging level
               4 - enable control of keyboard (SAK, unraw)
               8 - enable debugging dumps of processes etc.
              16 - enable sync command
              32 - enable remount read-only
              64 - enable signalling of processes (term, kill, oom-kill)
             128 - allow reboot/poweroff
             256 - allow nicing of all RT tasks
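
As a worked example, the value 176 shown earlier is 128 + 32 + 16, so it allows reboot/poweroff, remount read-only and sync. The sketch below decodes any value using the table; decode_sysrq is a name made up for this example.

```shell
# Print the SysRq functions enabled by a kernel.sysrq value.
decode_sysrq() {
    mask=$1
    case $mask in
        0) echo "disabled"; return ;;
        1) echo "all functions enabled"; return ;;
    esac
    for entry in \
        "2:console logging level" \
        "4:keyboard (SAK, unraw)" \
        "8:debugging dumps" \
        "16:sync" \
        "32:remount read-only" \
        "64:signalling of processes" \
        "128:reboot/poweroff" \
        "256:nicing of RT tasks"
    do
        bit=${entry%%:*}      # numeric bit value before the colon
        name=${entry#*:}      # description after the colon
        if [ $((mask & bit)) -ne 0 ]; then
            echo "$bit $name"
        fi
    done
}
```

On a live system it can be fed directly from sysctl: decode_sysrq "$(sysctl -n kernel.sysrq)".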

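How to trigger SysRq

Press the Alt+SysRq key combination together with a command key
Write a command character to /proc/sysrq-trigger, for example echo t > /proc/sysrq-trigger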
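
The second method can be wrapped in a small helper. This is a sketch: sysrq_trigger is a hypothetical function, and the optional second argument exists only so the target file can be swapped out for testing. Writing to /proc/sysrq-trigger requires root.

```shell
# Write one SysRq command character to the trigger file.
# Usage: sysrq_trigger KEY [TARGET_FILE]
sysrq_trigger() {
    key=$1
    target=${2:-/proc/sysrq-trigger}
    case $key in
        [0-9a-z]) ;;     # exactly one digit or lowercase letter
        *)
            echo "sysrq_trigger: expected one character 0-9 or a-z" >&2
            return 2
            ;;
    esac
    printf '%s' "$key" > "$target"
}
```

For instance, sysrq_trigger s asks the kernel to sync, and sysrq_trigger w dumps blocked tasks to the kernel log.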

Available commands


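  0-9 set the console log level, controlling which kernel messages are printed to the console
  b reboot the system immediately, without syncing or unmounting disks
  c crash the system on purpose; a crash dump is taken if one is configured
  d show all locks that are held
  e send SIGTERM to all processes except init, so they can exit on their own
  f invoke the OOM killer (out of memory)
  g used by kgdb (the kernel debugger)
  h print help
  i send SIGKILL to all processes except init, killing them forcibly
  k Secure Access Key (SAK): kill all programs on the current virtual console
  l show a stack backtrace for all active CPUs
  m dump current memory information to the console
  n make real-time tasks nice-able
  o power off
  p dump the current CPU registers and flags
  q display all active high-resolution timers and clock sources
  r turn off keyboard raw mode and set it to XLATE, so keys reach the kernel instead of being captured by the X server
  s sync all mounted filesystems to disk
  t dump a list of current tasks and their information to the console
  u remount all filesystems read-only
  v forcefully restore the framebuffer console (Voyager SMP information on old kernels)
  w dump tasks that are in uninterruptible (blocked, D) state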

2024-12-17

While reading LDD3 I found that kernel log output can be controlled through the TIOCLINUX ioctl. The code looks like this:


#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* bytes[0] = 11 is the TIOCLINUX subcode TIOCL_SETKMSGREDIRECT,
       bytes[1] is the number of the target virtual console */
    char bytes[2] = { 11, 0 };

    if (argc == 2) {
        bytes[1] = atoi(argv[1]);
    } else {
        fprintf(stderr, "%s: need a single argument\n", argv[0]);
        exit(1);
    }

    /* TIOCLINUX must be issued on a console device; this typically needs root */
    int fd = open("/dev/console", O_RDWR);
    if (fd < 0) {
        perror("/dev/console");
        exit(1);
    }

    if (ioctl(fd, TIOCLINUX, bytes) < 0) {
        fprintf(stderr, "%s: ioctl( fd, TIOCLINUX ): %s\n",
                argv[0], strerror(errno));
        exit(1);
    }

    close(fd);
    exit(0);
}

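Compile it with gcc:

gcc setconsole.c -o setconsole

Check which tty the system console is currently bound to:


cat /sys/devices/virtual/tty/console/active
tty0

To redirect kernel messages to another tty, run it as follows.

Send kernel messages only to tty3:

./setconsole 3 # kernel messages appear on tty3 and on no other tty

Restore the default, where kernel messages go to the currently active virtual console:

./setconsole 0 # kernel messages follow whichever tty is in the foreground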
