Author: 程序喵大人 2020-11-11 08:25:45
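This article is reposted from the WeChat public account 「程序喵大人」, written by 程序喵大人. To repost it, please contact the 程序喵大人 public account.
Cards on the table: I am actually 程序喵's wife, who works hard all day and then comes home to edit the public account late into the night. I hope you will all share this generously and help me reach my KPI (dream) of one thousand reads. Thank you!
Ahem, I digress. What follows is 程序喵's rambling. Please do me a favor, scroll to the end, and tap "Wow" or "Like" to prove that I am more popular than 程序喵. Thank you!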
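Exploring virtual memory through the /proc filesystem
We will use the /proc filesystem to find the virtual memory address of a string inside a running process, then change the string by overwriting the contents at that address. This should give you a more concrete feel for what virtual memory is. First, a definition.
Virtual memory
Virtual memory is a memory management technique implemented jointly by the hardware and the operating system. It maps the memory addresses a program uses (virtual addresses) to physical addresses in the machine's memory. Virtual memory frees applications from the tedious job of managing a shared memory space and improves security through memory isolation. A process usually sees its virtual memory as a contiguous address space managed by the operating system's memory management subsystem; when a page fault occurs, the paging mechanism assigns actual physical memory to the virtual page that was touched. On a 64-bit machine the virtual address space is far larger than the installed physical memory, so a process can address more memory than physically exists.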
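To make the mapping idea a little more concrete, here is a minimal Python sketch of how a virtual address splits into a page number and an offset, and how a page table lookup turns it into a physical address. It is a toy model, not how the kernel does it: the 4 KiB page size, the single-level `page_table` dictionary, the frame number, and the example address are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def split_virtual_address(vaddr, page_size=PAGE_SIZE):
    """Return (virtual page number, offset inside the page)."""
    return vaddr // page_size, vaddr % page_size


def translate(vaddr, table, page_size=PAGE_SIZE):
    """Translate a virtual address using a toy single-level page table.

    A missing entry raises KeyError, which stands in for a page fault.
    """
    vpn, offset = split_virtual_address(vaddr, page_size)
    frame = table[vpn]
    return frame * page_size + offset


# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0x401: 0x1A2B}

vaddr = 0x401234  # hypothetical virtual address
print([hex(x) for x in split_virtual_address(vaddr)])
print(hex(translate(vaddr, page_table)))
```

Only the page number is translated; the offset inside the page is carried over unchanged.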
Before digging deeper into virtual memory, a few key points:
virtual_memory.png
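The figure above is not a detailed memory-management diagram; the high addresses also hold kernel space and more, but that is not the topic of this article. The figure shows that the command-line arguments and environment variables are stored at the high addresses, followed by the stack, the heap, and the executable. The stack grows downward and the heap grows upward. Heap space is dynamically allocated memory, obtained with malloc.
Let's start exploring virtual memory with a simple C program.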
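- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - uses strdup to create a copy of a string; strdup calls malloc
- * internally and returns the address of the new space, which the caller must release with free
- *
- * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
- */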
- int main(void)
- {
- char *s;
- s = strdup("test_memory");
- if (s == NULL)
- {
- fprintf(stderr, "Can't allocate mem with malloc\n");
- return (EXIT_FAILURE);
- }
- printf("%p\n", (void *)s);
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
- Output: 0x88f010
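My machine is 64-bit, so a process's virtual address space nominally runs from 0x0 up to 0xffffffffffffffff. Since 0x88f010 is far smaller than 0xffffffffffffffff, we can infer that the copied string (a heap address) sits near the low end of the address space. The /proc filesystem lets us verify this.
Running ls /proc lists many entries; here we mainly care about /proc/[pid]/mem and /proc/[pid]/maps.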
mem & maps
- man proc
- /proc/[pid]/mem
- This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).
- /proc/[pid]/maps
- A file containing the currently mapped memory regions and their access permissions.
- See mmap(2) for some further information about memory mappings.
- The format of the file is:
- address perms offset dev inode pathname
- 00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
- 00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
- 00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon
- 00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
- 00e24000-011f7000 rw-p 00000000 00:00 0 [heap]
- ...
- 35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a21000-35b1a22000 rw-p 00000000 00:00 0
- 35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so
- ...
- f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986]
- ...
- 7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack]
- 7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]
- The address field is the address space in the process that the mapping occupies.
- The perms field is a set of permissions:
- r = read
- w = write
- x = execute
- s = shared
- p = private (copy on write)
- The offset field is the offset into the file/whatever;
- dev is the device (major:minor); inode is the inode on that device. 0 indicates
- that no inode is associated with the memory region,
- as would be the case with BSS (uninitialized data).
- The pathname field will usually be the file that is backing the mapping.
- For ELF files, you can easily coordinate with the offset field
- by looking at the Offset field in the ELF program headers (readelf -l).
- There are additional helpful pseudo-paths:
- [stack]
- The initial process's (also known as the main thread's) stack.
- [stack:<thread-id>] (since Linux 3.4)
- A thread's stack (where the <thread-id> is a thread ID).
- It corresponds to the /proc/[pid]/task/[tid]/ path.
- [vdso] The virtual dynamically linked shared object.
- [heap] The process's heap.
- If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
- There is no easy way to coordinate
- this back to a process's source, short of running it through gdb(1), strace(1), or similar.
- Under Linux 2.0 there is no field giving pathname.
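The field layout described in the man page excerpt above is easy to pick apart in code. The following Python sketch parses one maps line into named fields; the `Mapping` tuple and the `parse_maps_line` helper are names made up for this illustration, and the sample lines are taken from the excerpt.

```python
from collections import namedtuple

# Hypothetical container for the fields of one /proc/[pid]/maps line.
Mapping = namedtuple("Mapping", "start end perms offset dev inode pathname")


def parse_maps_line(line):
    """Parse 'address perms offset dev inode pathname' into a Mapping."""
    fields = line.split(None, 5)  # the pathname is optional
    addr, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


heap = parse_maps_line("00e03000-00e24000 rw-p 00000000 00:00 0 [heap]\n")
anon = parse_maps_line("35b1a21000-35b1a22000 rw-p 00000000 00:00 0 \n")

print(heap.pathname, hex(heap.start), hex(heap.end), heap.perms)
print(repr(anon.pathname))  # empty pathname: an anonymous mapping
```

On a Linux machine the same function can be applied to every line of /proc/self/maps to list the mappings of the running interpreter.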
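The mem file lets you read and modify the pages of a process's memory. The maps file lists the currently mapped memory regions with their addresses, access permissions, offsets, and so on. maps shows that the heap is at low addresses and the stack at high addresses. It also shows that the heap has rw permissions, so it is writable. We can therefore use the heap's address range to locate the string from the previous example, and then change the string by overwriting the contents at that address through the mem file. The program: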
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- #include <unistd.h>
- /**
- * main - uses strdup to create a new string, loops forever-ever
- *
- * Return: EXIT_FAILURE if malloc failed. Otherwise never returns
- */
- int main(void)
- {
- char *s;
- unsigned long int i;
- s = strdup("test_memory");
- if (s == NULL)
- {
- fprintf(stderr, "Can't allocate mem with malloc\n");
- return (EXIT_FAILURE);
- }
- i = 0;
- while (s)
- {
- printf("[%lu] %s (%p)\n", i, s, (void *)s);
- sleep(1);
- i++;
- }
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop
- Output:
- [0] test_memory (0x21dc010)
- [1] test_memory (0x21dc010)
- [2] test_memory (0x21dc010)
- [3] test_memory (0x21dc010)
- [4] test_memory (0x21dc010)
- [5] test_memory (0x21dc010)
- [6] test_memory (0x21dc010)
- ...
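We can now write a script that uses the /proc filesystem to find the string and overwrite it; the program's output will change accordingly.
First, find the process ID: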
- ps aux | grep ./loop | grep -v grep
- zjucad 2542 0.0 0.0 4352 636 pts/3 S+ 12:28 0:00 ./loop
2542 is the PID of the loop program. Running cat /proc/2542/maps gives:
- 00400000-00401000 r-xp 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 00600000-00601000 r--p 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 00601000-00602000 rw-p 00001000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 021dc000-021fd000 rw-p 00000000 00:00 0 [heap]
- 7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
- 7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0
- 7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
- 7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0 [stack]
- 7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0 [vvar]
- 7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0 [vdso]
- ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
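As a quick sanity check on the maps output above, the short Python sketch below tests whether the address printed by loop falls inside the [heap] range and computes its offset from the start of the heap. The two values are copied from the outputs shown in this article; the variable names are only illustrative.

```python
heap_range = "021dc000-021fd000"  # the [heap] line from the maps output
string_addr = 0x21DC010           # the address printed by ./loop

start, end = (int(x, 16) for x in heap_range.split("-"))

in_heap = start <= string_addr < end  # the end address is exclusive
offset = string_addr - start          # position of the string inside the heap
heap_size = end - start

print(in_heap, hex(offset), heap_size // 1024, "KiB")
```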
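The heap spans 021dc000-021fd000 and is readable and writable. Since 0x021dc000 < 0x21dc010 < 0x021fd000, the string is indeed in the heap, at offset 0x10 from its start (why it is 0x10 is explained later). We can therefore seek to 0x21dc010 in the mem file and overwrite the contents there, and the string the program prints will change. The following Python script does this.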
- #!/usr/bin/env python3
- '''
- Locates and replaces the first occurrence of a string in the heap
- of a process
- Usage: ./read_write_heap.py PID search_string replace_by_string
- Where:
- - PID is the pid of the target process
- - search_string is the ASCII string you are looking to overwrite
- - replace_by_string is the ASCII string you want to replace
- search_string with
- '''
- import sys
- def print_usage_and_exit():
-     print('Usage: {} pid search write'.format(sys.argv[0]))
-     sys.exit(1)
- # check usage
- if len(sys.argv) != 4:
-     print_usage_and_exit()
- # get the pid from args
- pid = int(sys.argv[1])
- if pid <= 0:
-     print_usage_and_exit()
- search_string = str(sys.argv[2])
- if search_string == "":
-     print_usage_and_exit()
- write_string = str(sys.argv[3])
- if write_string == "":
-     print_usage_and_exit()
- # open the maps and mem files of the process
- maps_filename = "/proc/{}/maps".format(pid)
- print("[*] maps: {}".format(maps_filename))
- mem_filename = "/proc/{}/mem".format(pid)
- print("[*] mem: {}".format(mem_filename))
- # try opening the maps file
- try:
-     maps_file = open(maps_filename, 'r')
- except IOError as e:
-     print("[ERROR] Can not open file {}:".format(maps_filename))
-     print(" I/O error({}): {}".format(e.errno, e.strerror))
-     sys.exit(1)
- for line in maps_file:
-     sline = line.split(' ')
-     # check if we found the heap
-     if sline[-1][:-1] != "[heap]":
-         continue
-     print("[*] Found [heap]:")
-     # parse line
-     addr = sline[0]
-     perm = sline[1]
-     offset = sline[2]
-     device = sline[3]
-     inode = sline[4]
-     pathname = sline[-1][:-1]
-     print("\tpathname = {}".format(pathname))
-     print("\taddresses = {}".format(addr))
-     print("\tpermisions = {}".format(perm))
-     print("\toffset = {}".format(offset))
-     print("\tinode = {}".format(inode))
-     # check if there is read and write permission
-     if perm[0] != 'r' or perm[1] != 'w':
-         print("[*] {} does not have read/write permission".format(pathname))
-         maps_file.close()
-         sys.exit(0)
-     # get start and end of the heap in the virtual memory
-     addr = addr.split("-")
-     if len(addr) != 2:  # never trust anyone, not even your OS :)
-         print("[*] Wrong addr format")
-         maps_file.close()
-         sys.exit(1)
-     addr_start = int(addr[0], 16)
-     addr_end = int(addr[1], 16)
-     print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
-     # open and read mem
-     try:
-         mem_file = open(mem_filename, 'rb+')
-     except IOError as e:
-         print("[ERROR] Can not open file {}:".format(mem_filename))
-         print(" I/O error({}): {}".format(e.errno, e.strerror))
-         maps_file.close()
-         sys.exit(1)
-     # read heap
-     mem_file.seek(addr_start)
-     heap = mem_file.read(addr_end - addr_start)
-     # find string
-     try:
-         i = heap.index(bytes(search_string, "ASCII"))
-     except Exception:
-         print("Can't find '{}'".format(search_string))
-         maps_file.close()
-         mem_file.close()
-         sys.exit(0)
-     print("[*] Found '{}' at {:x}".format(search_string, i))
-     # write the new string
-     print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
-     mem_file.seek(addr_start + i)
-     mem_file.write(bytes(write_string, "ASCII"))
-     # close files
-     maps_file.close()
-     mem_file.close()
-     # there is only one heap in our example
-     break
Run the Python script:
- zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello
- [*] maps: /proc/2542/maps
- [*] mem: /proc/2542/mem
- [*] Found [heap]:
- pathname = [heap]
- addresses = 021dc000-021fd000
- permisions = rw-p
- offset = 00000000
- inode = 0
- Addr start [21dc000] | end [21fd000]
- [*] Found 'test_memory' at 10
- [*] Writing 'test_hello' at 21dc010
At the same moment, the string printed by loop changes:
- [633] test_memory (0x21dc010)
- [634] test_memory (0x21dc010)
- [635] test_memory (0x21dc010)
- [636] test_memory (0x21dc010)
- [637] test_memory (0x21dc010)
- [638] test_memory (0x21dc010)
- [639] test_memory (0x21dc010)
- [640] test_helloy (0x21dc010)
- [641] test_helloy (0x21dc010)
- [642] test_helloy (0x21dc010)
- [643] test_helloy (0x21dc010)
- [644] test_helloy (0x21dc010)
- [645] test_helloy (0x21dc010)
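The experiment works. The trailing y is the last byte of test_memory: test_hello is one character shorter and the script writes no terminating '\0', so the old final byte survives.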
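The test_helloy result can be reproduced without touching a real process. The Python sketch below simulates the heap as a bytearray and performs the same find-and-overwrite step as the script; the 0x10 bytes of padding simply stand in for whatever precedes the string in the real heap, and `read_c_string` is a helper invented here to mimic how printf reads up to the first '\0'.

```python
def read_c_string(buf, start):
    """Return the bytes from start up to (not including) the first NUL."""
    end = buf.index(b"\x00", start)
    return bytes(buf[start:end])


# Simulated heap: 0x10 bytes of padding, then the NUL-terminated string.
heap = bytearray(b"\x00" * 0x10 + b"test_memory\x00")

search, replacement = b"test_memory", b"test_hello"
i = heap.index(search)

# Overwrite in place without a terminator, as the script above does.
heap[i:i + len(replacement)] = replacement
without_nul = read_c_string(heap, i)

# Writing the terminator as well removes the stray byte.
heap[i:i + len(replacement) + 1] = replacement + b"\x00"
with_nul = read_c_string(heap, i)

print(hex(i), without_nul, with_nul)
```

Appending a '\0' to the replacement before writing it is one way to get a clean test_hello in the real experiment too, as long as the replacement is not longer than the original string.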
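Drawing the virtual memory layout through experiments
Here is the memory layout diagram again.
Almost everyone has some idea of how virtual memory is laid out, but how can we verify it? The sections below show how.
Stack and heap
First we verify where the stack is. Local variables in C are stored on the stack, and memory allocated by malloc comes from the heap, so printing the address of a local variable and the address returned by malloc shows where the stack and the heap sit in the virtual address space.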
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(void)
- {
- int a;
- void *p;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
- Output:
- Address of a: 0x7ffedde9c7fc
- Allocated space in the heap: 0x55ca5b360670
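The result shows that the heap lies below the stack. In diagram form:
The executable
The executable is also in virtual memory. Printing the address of the main function and comparing it with the heap and stack addresses shows where the executable sits relative to them.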
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(void)
- {
- int a;
- void *p;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc main.c -o test; ./test
- Output:
- Address of a: 0x7ffed846de2c
- Allocated space in the heap: 0x561b9ee8c670
- Address of function main: 0x561b9deb378a
Since main (0x561b9deb378a) < heap (0x561b9ee8c670) < stack (0x7ffed846de2c), we can draw the layout as follows:
virtual_memory_stack_heap_executable.png
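The ordering in the figure can also be derived mechanically. This small Python sketch sorts the three addresses printed by the run above from high to low, which yields the same top-to-bottom picture; the region labels are just descriptive strings chosen here.

```python
# Addresses copied from the program output above.
addresses = {
    "stack (local variable a)": 0x7FFED846DE2C,
    "heap (malloc)": 0x561B9EE8C670,
    "executable (main)": 0x561B9DEB378A,
}

# Highest address first, like the diagram.
layout = sorted(addresses.items(), key=lambda kv: kv[1], reverse=True)
order = [name for name, _ in layout]

for name, addr in layout:
    print(f"{addr:#018x}  {name}")
```

Because of address space layout randomization the absolute values differ from run to run, but the relative order on this machine stays the same.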
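Command-line arguments and environment variables
The program's entry point, main, can take parameters:
The following program shows where these elements sit in virtual memory: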
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(int ac, char **av, char **env)
- {
- int a;
- void *p;
- int i;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- printf("First bytes of the main function:\n\t");
- for (i = 0; i < 15; i++)
- {
- printf("%02x ", ((unsigned char *)main)[i]);
- }
- printf("\n");
- printf("Address of the array of arguments: %p\n", (void *)av);
- printf("Addresses of the arguments:\n\t");
- for (i = 0; i < ac; i++)
- {
- printf("[%s]:%p ", av[i], (void *)av[i]);
- }
- printf("\n");
- printf("Address of the array of environment variables: %p\n", (void *)env);
- printf("Address of the first environment variable: %p\n", (void *)(env[0]));
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc main.c -o test; ./test nihao hello
- Output:
- Address of a: 0x7ffcc154a748
- Allocated space in the heap: 0x559bd1bee670
- Address of function main: 0x559bd09807ca
- First bytes of the main function:
- 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
- Address of the array of arguments: 0x7ffcc154a848
- Addresses of the arguments:
- [./test]:0x7ffcc154b94f [nihao]:0x7ffcc154b956 [hello]:0x7ffcc154b95c
- Address of the array of environment variables: 0x7ffcc154a868
- Address of the first environment variable: 0x7ffcc154b962
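The result:
main (0x559bd09807ca) < heap (0x559bd1bee670) < stack (0x7ffcc154a748) < argv (0x7ffcc154a848) < env (0x7ffcc154a868) < arguments (0x7ffcc154b94f -> 0x7ffcc154b95c + 6, where 6 is the length of hello plus 1 for '\0') < first environment variable (0x7ffcc154b962)
All the command-line arguments are adjacent to one another, and the environment variables follow immediately after them.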
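The adjacency claim can be checked with a little arithmetic. The Python sketch below starts at the address of the first argument and steps forward by each string's length plus one byte for its '\0', then compares the result with the printed addresses. All values are copied from the output above; the variable names are illustrative.

```python
args = ["./test", "nihao", "hello"]
printed = [0x7FFCC154B94F, 0x7FFCC154B956, 0x7FFCC154B95C]
first_env = 0x7FFCC154B962

addr = printed[0]
expected = []
for arg in args:
    expected.append(addr)
    addr += len(arg) + 1  # string bytes plus the terminating '\0'

# If the strings are packed back to back, both comparisons hold.
print(expected == printed, addr == first_env)
```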
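Are the argv and env arrays adjacent?
In the example above, argv has 4 elements: the three command-line arguments plus a NULL pointer that marks the end of the array. Each pointer is 8 bytes, and 8 * 4 = 32, so argv (0x7ffcc154a848) + 32 (0x20) = env (0x7ffcc154a868). The argv and env pointer arrays are therefore adjacent.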
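The same calculation, written out as a short Python sketch. The addresses are the ones printed above, and the 8-byte pointer size is an assumption that holds for the 64-bit machine used in this article.

```python
POINTER_SIZE = 8  # assumption: bytes per pointer on a 64-bit machine

argv_addr = 0x7FFCC154A848
env_addr = 0x7FFCC154A868
argc = 3  # ./test, nihao, hello

argv_slots = argc + 1  # the arguments plus the terminating NULL pointer
end_of_argv = argv_addr + argv_slots * POINTER_SIZE

print(hex(end_of_argv), end_of_argv == env_addr)
```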
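Do the command-line arguments come right after the env array?
First we need the size of the environment variable array. The array is terminated by NULL, so we can walk the env array until we hit NULL to get its size. The code: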
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(int ac, char **av, char **env)
- {
- int a;
- void *p;
- int i;
- int size;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- printf("First bytes of the main function:\n\t");
- for (i&n
Title: 10 diagrams and 22 code listings, a long read that walks you through the virtual memory model and the internals of malloc