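Preface
I had heard of eBPF for a long time, but limited by my own skills and understanding, I never managed to produce anything that could be called a Hello World.
Recently more eBPF material has become available, and this time I finally got a working Hello World running.
One concept needs stating up front: on Android, an eBPF program runs in kernel space, and its results have to be retrieved by a user-space program.
An eBPF program can also print logs from kernel space, but that is inefficient and makes custom output formats awkward.
I tried to emit logs from the eBPF program, but in the end it did not work, and I have not figured out why.
Because the system partition is normally read-only, Magisk is also needed to mount the custom eBPF program into the required directory.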
Environment
- Android 11
- Pixel 4 XL
- Root access
- Magisk 25.0
Walkthrough
First, check the kernel configuration:
coral:/ # zcat /proc/config.gz | grep PROBE
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_GENERIC_IRQ_PROBE=y
# CONFIG_KPROBES is not set
CONFIG_UPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
# CONFIG_BUILTINS_ASYNC_PROBE is not set
# CONFIG_TEST_ASYNC_DRIVER_PROBE is not set
CONFIG_GENERIC_CPU_AUTOPROBE=y
CONFIG_TIMER_PROBE=y
CONFIG_UPROBE_EVENTS=y
CONFIG_PROBE_EVENTS=y
On newer kernels CONFIG_KPROBES is usually enabled. This article only targets TRACEPOINT, so all that is required is CONFIG_TRACEPOINTS=y:
coral:/ # zcat /proc/config.gz | grep TRACEPOINT
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
# CONFIG_TRACEPOINT_BENCHMARK is not set
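The two greps above can be folded into one reusable check. The following is a minimal sketch rather than part of the original workflow: check_kconfig is a hypothetical helper name, and it simply reads kernel config text on standard input and reports whether each option named on the command line is set to y.

```shell
#!/bin/sh
# check_kconfig OPTION...
# Reads kernel config text on stdin (for example the output of
# `zcat /proc/config.gz`) and prints one status line per option.
# Returns non-zero if any requested option is not set to "y".
# The helper name is hypothetical; it is not an Android or kernel tool.
check_kconfig() {
    config=$(cat)
    missing=0
    for opt in "$@"; do
        if printf '%s\n' "$config" | grep -q "^${opt}=y\$"; then
            echo "ok      $opt"
        else
            echo "missing $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

From a host with adb, it could be fed with `adb shell zcat /proc/config.gz | check_kconfig CONFIG_TRACEPOINTS CONFIG_HAVE_SYSCALL_TRACEPOINTS` (reading the config may require a root shell on some devices).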
For an introduction to KPROBE/UPROBE/TRACEPOINT, see the linked reference.
This article traces an event under TRACEPOINT.
The eBPF program source, src/example.c, is as follows:
#include <bpf_helpers.h>

// Array map with 1024 slots: key = CPU index, value = PID switched in on that CPU.
DEFINE_BPF_MAP(cpu_pid_map, ARRAY, int, uint32_t, 1024);

// Mirrors the layout of the sched_switch tracepoint record.
struct switch_args {
    unsigned long long ignore;  // skips the 8 bytes of common_* fields
    char prev_comm[16];
    int prev_pid;
    int prev_prio;
    long long prev_state;
    char next_comm[16];
    int next_pid;
    int next_prio;
};

SEC("tracepoint/sched/sched_switch")
int tp_sched_switch(struct switch_args* args) {
    int key;
    uint32_t val;

    key = bpf_get_smp_processor_id();
    val = args->next_pid;

    // Debug output via the kernel trace buffer.
    char fmt[] = "syscall sched_switch";
    bpf_trace_printk(fmt, sizeof(fmt));

    // Record which PID is now running on this CPU.
    bpf_cpu_pid_map_update_elem(&key, &val, BPF_ANY);
    return 0;
}

char _license[] SEC("license") = "GPL";
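For the underlying principles and constraints, see the linked reference.
DEFINE_BPF_MAP is a macro; its first argument determines the names of the generated map accessor functions:
- Lookup: bpf_cpu_pid_map_lookup_elem
- Update: bpf_cpu_pid_map_update_elem
- Delete: bpf_cpu_pid_map_delete_elem
In the end the system creates the corresponding map and prog files under /sys/fs/bpf: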
coral:/ # ls -al /sys/fs/bpf | grep example
-rw------- 1 root root 0 2022-06-19 10:08 map_example_cpu_pid_map
-r--r----- 1 root root 0 2022-06-19 10:08 prog_example_tracepoint_sched_sched_switch
The eBPF object file I built is named example.o, so the map file name is composed as map_{example}_{cpu_pid_map}.
This naming rule matters, because the user-space program written later depends on it.
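As an illustration of that naming rule, here is a small sketch that derives the pinned path from the object file name and the map or section name. The helper bpf_pin_path is hypothetical, and the rule that slashes in the section name become underscores is inferred from the ls output above rather than taken from the loader source.

```shell
#!/bin/sh
# bpf_pin_path KIND OBJECT NAME
# KIND is "map" or "prog", OBJECT is the eBPF object file (e.g. example.o),
# NAME is the map name or the SEC() section name.
# Hypothetical helper; the underscore substitution is inferred from the
# observed file names, not from the bpfloader implementation.
bpf_pin_path() {
    kind=$1
    obj=$(basename "$2" .o)
    name=$(printf '%s' "$3" | tr '/' '_')
    printf '/sys/fs/bpf/%s_%s_%s\n' "$kind" "$obj" "$name"
}
```

For example, `bpf_pin_path prog example.o tracepoint/sched/sched_switch` yields the prog path listed above, which is handy when the object or section is renamed and the user-space constants need updating.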
The layout of the switch_args struct can be derived from /sys/kernel/debug/tracing/events/sched/sched_switch/format:
coral:/ # cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 88
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:char prev_comm[16]; offset:8; size:16; signed:0;
field:pid_t prev_pid; offset:24; size:4; signed:1;
field:int prev_prio; offset:28; size:4; signed:1;
field:long prev_state; offset:32; size:8; signed:1;
field:char next_comm[16]; offset:40; size:16; signed:0;
field:pid_t next_pid; offset:56; size:4; signed:1;
field:int next_prio; offset:60; size:4; signed:1;
print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, (REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1)) ? __print_flags(REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1), "|", { 0x0001, "S" }, { 0x0002, "D" }, { 0x0004, "T" }, { 0x0008, "t" }, { 0x0010, "X" }, { 0x0020, "Z" }, { 0x0040, "P" }, { 0x0080, "I" }) : "R", REC->prev_state & (((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio
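To compare the struct against the format file without reading it by eye, the field lines can be reduced to name, offset, and size. This is a minimal sketch; tp_fields is a hypothetical helper that reads the format text on standard input.

```shell
#!/bin/sh
# tp_fields
# Reads a tracepoint format file on stdin and prints "name offset size"
# for every field line. Hypothetical helper for checking a hand-written
# args struct against the kernel's record layout.
tp_fields() {
    awk '/^[ \t]*field:/ {
        split($0, parts, ";")
        decl = parts[1]; sub(/^[ \t]*field:/, "", decl)
        n = split(decl, words, " ")
        name = words[n]                      # last word is the field name
        off = parts[2]; sub(/.*offset:/, "", off)
        size = parts[3]; sub(/.*size:/, "", size)
        printf "%s %s %s\n", name, off, size
    }'
}
```

Run against the output above, the four common_* fields cover bytes 0 to 7, which is what the `unsigned long long ignore` member skips, and prev_state has size 8, which matches `long long` in the struct.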
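This code traces tracepoint/sched/sched_switch and writes the result into the map via bpf_cpu_pid_map_update_elem.
A user-space program can then obtain the eBPF program's results through /sys/fs/bpf/map_example_cpu_pid_map.
This is not an ordinary file read, though: the path has to be turned into a file descriptor with bpf_obj_get first.
With the code in place, the program can now be compiled with the NDK. The steps are as follows.
Fetching the sources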
mkdir ~/ebpfdemo
cd ~/ebpfdemo
git clone -b android11-gsi https://android.googlesource.com/platform/bionic
mkdir system && cd system
git clone -b android11-gsi https://android.googlesource.com/platform/system/core/
git clone -b android11-gsi https://android.googlesource.com/platform/system/bpf/
cd ~/ebpfdemo
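The Makefile follows; it assumes the NDK is unpacked at /home/kali/android-ndk-r23b.
Note that the line after a Makefile target must begin with a TAB (\t), not four spaces, otherwise make reports an error (I only just learned this; until now I had always edited existing Makefiles).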
ebpf-build:
	/home/kali/android-ndk-r23b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang \
	--target=bpf \
	-c \
	-nostdlibinc -no-canonical-prefixes -O2 \
	-isystem bionic/libc/include \
	-isystem bionic/libc/kernel/uapi \
	-isystem bionic/libc/kernel/uapi/asm-arm64 \
	-isystem bionic/libc/kernel/android/uapi \
	-I system/core/libcutils/include \
	-I system/bpf/progs/include \
	-MD -MF example.d -o example.o src/example.c
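Run make, and example.o is generated in the current directory.
Note that an eBPF program is not an executable; the kernel does the final processing, so what gets compiled here is a .o object file.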
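Before packaging the object, it can be worth confirming that the compiler really produced a BPF ELF file and not a host object. The sketch below inspects the ELF header directly: the e_machine field sits at byte offset 18, and EM_BPF is 247 (0xf7). The helper name is_bpf_object is hypothetical, and the check assumes a little-endian object, which is what clang emits for --target=bpf on a little-endian host.

```shell
#!/bin/sh
# is_bpf_object FILE
# Succeeds if FILE starts with the ELF magic and its e_machine field
# (2 bytes at offset 18, little-endian) equals EM_BPF (247 = 0xf7).
# Hypothetical helper; assumes a little-endian ELF object.
is_bpf_object() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] && [ "$machine" = "f700" ]
}
```

Usage would be along the lines of `is_bpf_object example.o && echo "BPF object"`.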
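Next, build a Magisk module that mounts example.o at /system/etc/bpf, so that the system loads it automatically at boot.
I based mine on the HttpCanary System CA Mounter module.
Concretely, change the REPLACE entry in its customize.sh to /system/etc/bpf,
then create a system/etc/bpf folder inside the module folder and put example.o into it.
Pack the whole module as a zip, push it to the phone, and flash it in Magisk.
After a reboot, Magisk places example.o under /system/etc/bpf.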
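The staging step can be sketched as a small helper. This is only an illustration under assumptions: stage_bpf_module is a hypothetical name, the module directory is assumed to be an already unpacked copy of the template module with customize.sh edited by hand, and the helper does nothing beyond creating system/etc/bpf and copying the object into it.

```shell
#!/bin/sh
# stage_bpf_module MODULE_DIR OBJECT
# Copies a compiled eBPF object into MODULE_DIR/system/etc/bpf so that the
# module overlays it onto /system/etc/bpf. Hypothetical helper; it does not
# edit customize.sh or create the zip.
stage_bpf_module() {
    module_dir=$1
    object=$2
    mkdir -p "$module_dir/system/etc/bpf" &&
    cp "$object" "$module_dir/system/etc/bpf/"
}
```

With a module unpacked into a folder named, say, mounter, the remaining steps would be roughly `stage_bpf_module mounter example.o`, then `(cd mounter && zip -r ../ebpf-module.zip .)` and `adb push ebpf-module.zip /sdcard/Download/`; the folder and zip names here are placeholders.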
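How to tell whether loading succeeded: method one is to run the following command after boot and check that these two files exist. If they do, loading worked.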
coral:/ # ls -al /sys/fs/bpf | grep example
-rw------- 1 root root 0 2022-06-19 10:08 map_example_cpu_pid_map
-r--r----- 1 root root 0 2022-06-19 10:08 prog_example_tracepoint_sched_sched_switch
Method two is to run the following command repeatedly during boot until the log lines about loading the eBPF program show up:
adb shell "logcat | grep -i bpf"
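It has to be run repeatedly because the logcat buffer is small, so the lines may already be gone if you only look after boot has finished.
If everything is fine, the log looks like the following (this capture appears to come from a different build of example.o: its section and map names, tracepoint_raw_syscalls_sys_exit and pid_syscall_map, do not match the sched_switch example above):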
06-18 11:57:35.313 860 860 D LibBpfLoader: Loading optional ELF object /system/etc/bpf/example.o with license GPL
06-18 11:57:35.313 860 860 E LibBpfLoader: No progs section could be found in elf object
06-18 11:57:35.313 860 860 D LibBpfLoader: Loaded code section 3 (tracepoint_raw_syscalls_sys_exit)
06-18 11:57:35.313 860 860 D LibBpfLoader: Adding section 3 to cs list
06-18 11:57:35.313 860 860 D LibBpfLoader: bpf_create_map name pid_syscall_map, ret: 6
06-18 11:57:35.313 860 860 D LibBpfLoader: map_fd found at 0 is 6 in /system/etc/bpf/example.o
06-18 11:57:35.318 860 860 D LibBpfLoader: bpf_prog_load lib call for /system/etc/bpf/example.o (tracepoint_raw_syscalls_sys_exit) returned fd: 7 (no error)
06-18 11:57:35.318 860 860 I bpfloader: Loaded object: /system/etc/bpf/example.o
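Rather than scanning the log by eye, the final "Loaded object" line can be matched directly. A minimal sketch, assuming the log line format shown above stays the same; bpf_load_status is a hypothetical helper that reads logcat text on standard input.

```shell
#!/bin/sh
# bpf_load_status OBJECT
# Reads logcat output on stdin and reports whether bpfloader logged
# "Loaded object: /system/etc/bpf/OBJECT". Hypothetical helper; the
# matched text is copied from the log capture in this article.
bpf_load_status() {
    if grep -qF "bpfloader: Loaded object: /system/etc/bpf/$1"; then
        echo "loaded"
    else
        echo "not loaded"
        return 1
    fi
}
```

For example, `adb logcat -d | bpf_load_status example.o` dumps the current buffer once and checks it; the small-buffer caveat above still applies, so it needs to run soon after boot.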
How can we confirm that the eBPF program is actually working? As shown earlier, I added the following to the code:
char fmt[] = "syscall sched_switch";
bpf_trace_printk(fmt, sizeof(fmt));
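According to the material I found, this output should in theory appear in /sys/kernel/debug/tracing/trace.
Unfortunately it did not, and I have not figured out why. I probably did something wrong somewhere, or perhaps the output is only there during boot. One possible factor is that bpfloader only loads and pins the program; it does not run until a user-space process attaches it to the tracepoint.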
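We can still write our own user-space program to read the data, though; if data comes back, that also shows the program is working.
The user-space source, src/trace.cpp, is below. It uses the two paths mentioned earlier.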
#include <inttypes.h>
#include <unistd.h>
#include <iostream>
#include <string>
#include <unordered_map>
#include <android-base/stringprintf.h>
#include "bpf/BpfMap.h"
#include "bpf/BpfUtils.h"
#include "libbpf_android.h"

constexpr const char tp_prog_path[] = "/sys/fs/bpf/prog_example_tracepoint_sched_sched_switch";
constexpr const char tp_map_path[] = "/sys/fs/bpf/map_example_cpu_pid_map";

using namespace android::bpf;
using android::base::StringPrintf;

// Open the pinned program and attach it to the sched_switch tracepoint.
bool setup() {
    int mProgFd = bpf_obj_get(tp_prog_path);
    if (mProgFd < 0) return false;  // a negative value means the pinned prog could not be opened
    // Returns a perf event fd (>= 0) on success and -1 on failure.
    int ret = bpf_attach_tracepoint(mProgFd, "sched", "sched_switch");
    return ret >= 0;
}

// Print every non-zero entry (CPU index, PID) and copy it into sysCallMap.
void showMapDetail(std::unordered_map<uint32_t, uint32_t> *sysCallMap) {
    BpfMap<uint32_t, uint32_t> m(tp_map_path);
    sleep(1);  // give the eBPF program time to fill the map
    const auto iterFunc = [sysCallMap](const uint32_t& key, const uint32_t& val, const BpfMap<uint32_t, uint32_t>&) {
        if (val) {
            std::string tmp = StringPrintf("%" PRIu32 "\t%" PRIu32, key, val);
            std::cout << tmp << std::endl;
            (*sysCallMap)[key] = val;
        }
        return android::base::Result<void>();
    };
    m.iterateWithValue(iterFunc);
}

int main()
{
    std::unordered_map<uint32_t, uint32_t> sysCallMap;
    if (!setup()) {
        std::cerr << "failed to attach " << tp_prog_path << std::endl;
        return 1;
    }
    sleep(1);
    showMapDetail(&sysCallMap);
    return 0;
}
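First, one likely pitfall: the line BpfMap<uint32_t, uint32_t> m(tp_map_path);
I started from a reference written for an older release, so my first version passed an fd to the constructor, and it would not compile no matter what I tried.
It turned out that the sources and headers I was using were all from Android 11, where this API had changed. Compare the two versions of BpfLoadTest.cpp: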
- http://aospxref.com/android-10.0.0_r47/xref/system/bpf/libbpf_android/BpfLoadTest.cpp#70
- http://aospxref.com/android-11.0.0_r21/xref/system/bpf/libbpf_android/BpfLoadTest.cpp#68
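Building this code was also a winding road. At first I wanted to reuse the earlier approach and point a Makefile at the NDK, but the set of headers involved turned out to be too complicated, and I eventually gave up on writing my own Makefile.
So I wrote the following Android.bp, assumed to live in a testbpf folder inside the AOSP tree: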
cc_defaults {
name: "my-defaults",
local_include_dirs: [
"include",
],
cflags: [
"-Wall",
"-Werror",
"-Wuninitialized",
"-Wno-error=unused-variable",
"-fno-common",
"-fPIC",
"-D__STDC_FORMAT_MACROS",
],
target: {
android_arm64: {
cflags: [
"-D__ANDROID__",
],
},
},
}
cc_binary {
name: "bpftracer",
// static_executable: true,
defaults: [
"my-defaults",
],
local_include_dirs: [
"include",
],
srcs: [
"src/trace.cpp",
],
shared_libs: [
"libbpf",
"libbase",
"libutils",
],
static_libs: [
"libbpf",
"libbpf_android",
],
}
This uses a full AOSP tree. I believe a GSI tree would work as well with the same steps (I did not bother to test it).
cd ~/aosp11
export LC_ALL=C && . build/envsetup.sh
lunch aosp_arm64-eng
mmm testbpf
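With static_executable: true, the link fails with a duplicate-symbol error. The build succeeds once the option is removed, so I left it at that (it is generally not needed anyway). The failure looks like this: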
FAILED: out/soong/.intermediates/testbpf/bpftracer/android_arm64_armv8-a/unstripped/bpftracer
prebuilts/clang/host/linux-x86/clang-r383902b1/bin/clang++ out/soong/.intermediates/bionic/libc/crtbegin_static/android_arm64_armv8-a/crtbegin_static.o @out/soong/.intermediates/testbpf/bpftracer/android_arm64_armv8-a/unstripped/bpftracer.rsp out/soong/.intermediates/system/bpf/libbpf_android/libbpf_android/android_arm64_armv8-a_static/libbpf_android.a out/soong/.intermediates/bionic/libm/libm/android_arm64_armv8-a_static/libm.a out/soong/.intermediates/bionic/libc/libc/android_arm64_armv8-a_static/libc.a out/soong/.intermediates/build/soong/libgcc_stripped/android_arm64_armv8-a_static/libgcc_stripped.a out/soong/.intermediates/external/bcc/libbpf/android_arm64_armv8-a_static/libbpf.a out/soong/.intermediates/external/libcxx/libc++_static/android_arm64_armv8-a_static/libc++_static.a out/soong/.intermediates/external/libcxxabi/libc++demangle/android_arm64_armv8-a_static/libc++demangle.a -Wl,--start-group out/soong/.intermediates/bionic/libc/libc/android_arm64_armv8-a_static/libc.a prebuilts/clang/host/linux-x86/clang-r383902b1/lib64/clang/11.0.2/lib/linux/libclang_rt.builtins-aarch64-android.a prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/lib64/libatomic.a -Wl,--end-group out/soong/.intermediates/bionic/libc/crtend_android/android_arm64_armv8-a/obj/bionic/libc/arch-common/bionic/crtend.o -o out/soong/.intermediates/testbpf/bpftracer/android_arm64_armv8-a/unstripped/bpftracer -target aarch64-linux-android10000 -Bprebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/bin -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--build-id=md5 -Wl,--warn-shared-textrel -Wl,--fatal-warnings -Wl,--no-undefined-version -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libgcc_stripped.a -Wl,--exclude-libs,libunwind_llvm.a -fuse-ld=lld -Wl,--pack-dyn-relocs=android+relr -Wl,--use-android-relr-tags -Wl,--no-undefined -Wl,--hash-style=gnu -Wl,-z,separate-code -Wl,--icf=safe -Wl,-z,max-page-size=4096 
-Wl,--exclude-libs=libclang_rt.builtins-aarch64-android.a -static -nostdlib -Bstatic -Wl,--gc-sections prebuilts/clang/host/linux-x86/clang-r383902b1/lib64/clang/11.0.2/lib/linux/libclang_rt.ubsan_minimal-aarch64-android.a -Wl,--exclude-libs,libclang_rt.ubsan_minimal-aarch64-android.a
ld.lld: error: duplicate symbol: std::nothrow
>>> defined at new.cpp:24 (bionic/libc/bionic/new.cpp:24)
>>> new.o:(std::nothrow) in archive out/soong/.intermediates/bionic/libc/libc/android_arm64_armv8-a_static/libc.a
>>> defined at new.cpp:38 (external/libcxx/src/new.cpp:38)
>>> new.o:(.rodata._ZSt7nothrow+0x0) in archive out/soong/.intermediates/external/libcxx/libc++_static/android_arm64_armv8-a_static/libc++_static.a
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
12:27:12 ninja failed with: exit status 1
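With that option removed, the build succeeds.
Then push the binary to the phone, make it executable, switch to root, and run it.
It works: the expected information comes back.
The output is fairly bare, but it does complete the path of a user-space program retrieving data from the eBPF program.
With that, the Hello World is done.
Building on this, printing detailed information for the sys_enter and sys_exit events is entirely feasible.
All things considered, though, this process is clearly too complicated. Mounting a Debian-like system and building the programs directly with the existing bcc framework is more convenient.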
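One more way to debug an eBPF program: manually set the bpf.progs_loaded property, restart the bpfloader service, and then check the logcat output to see whether the eBPF program loaded correctly.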
setprop bpf.progs_loaded 0
stop bpfloader
start bpfloader
logcat | grep -i bpfloader
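In testing, however, this crashed the system. After some analysis I determined that the cause is the loader checking whether the programs have already been loaded.
According to the source, the maps and progs under /sys/fs/bpf should be deleted before rerunning the loader.
Otherwise the failure occurs in createMaps in system/bpf/libbpf_android/Loader.cpp:
the existing files are still there and get reused, though I am not sure exactly what goes wrong as a result.
To reload eBPF programs on a running system, run the following commands.
In one shell: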
logcat | grep -i bpfloader
In a second shell:
rm /sys/fs/bpf/*
setprop bpf.progs_loaded 0
stop bpfloader
start bpfloader
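Only this way can the loading be debugged and checked properly. I have to say that some articles really do gloss over this in a single sentence.
Also note that this reloads all eBPF programs at once.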