Undefined C
You can run the code snippets in an online C compiler such as https://www.onlinegdb.com/online_c_compiler or locally. All examples were checked with the gcc compiler. There are many small tricks around running C code in practice that generic tutorials do not cover, so here is a list of topics that may come up when writing real C code outside of tutorials. Each topic gets only a small example, although each could fill a whole chapter on its own.
Compile
hello_world.c
#include <stdio.h>
int main(void) {
    printf("Hello world\n");
    return 0;
}
gcc hello_world.c -o hello_world
gcc -m32 hello_world.c -o hello_world_32   # for a 32-bit target
Syntax
Variables
Standard list of available types
Check type size
Every type has a size, measured in bytes. Some types, such as int and long, are machine dependent. When machine-independent types are needed, use int32_t/uint32_t/int64_t/uint64_t from <stdint.h>.
An 8-bit, 16-bit, 32-bit or 64-bit architecture may give the machine-dependent types different sizes.
Use sizeof()
Running on x86 machine
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
int main(void) {
    /* sizeof yields a size_t, which is printed with %zu */
    printf("Sizeof int %zu\n", sizeof(int));
    printf("Sizeof int32_t %zu\n", sizeof(int32_t));
    printf("Sizeof int64_t %zu\n", sizeof(int64_t));
    printf("Sizeof long %zu\n", sizeof(long));
    printf("Sizeof long long %zu\n", sizeof(long long));
    return 0;
}
The safest and most portable way is to use the [u]int[8/16/32/64]_t types.
The macros that give the minimum and maximum value of each type are listed at
https://en.cppreference.com/w/c/types/limits
#include <limits.h>
#include <stdio.h>
int main(void) {
    printf("INT_MIN %d\n", INT_MIN);
    printf("INT_MAX %d\n", INT_MAX);
    printf("LONG_MIN %ld\n", LONG_MIN);
    return 0;
}
Example from AVR stdint.h
https://github.com/avrdudes/avr-libc/blob/main/include/stdint.h
Example from Libc
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/stdint.h
How to shoot the leg
When code is supposed to run on both 32-bit and 64-bit platforms, the size of a type such as long may differ between them, so the code has to account for that.
Functions
Function syntax; there is nothing surprising about functions.
<RETURN_TYPE> <NAME>(<TYPE> <NAME>,..) {
<EXPR>
}
Write a simple function
int fun1(void) {
    return -1;
}
A function can have multiple return statements. Here is an example where the function has three of them.
int fun2(int i) {
    if (i < 0) return -1;
    if (i > 0) return 1;
    return 0;
}
Get the address of a function
printf("fun1 address %p\n", (void *)fun1); // %p prints a pointer; the cast to void * works on POSIX platforms
If statement
if (<CONDITION>) ;
if (<CONDITION>) {}
A common way to check a function's return value for errors is to assign and test it in one expression
if ((c = getfun()) == 0) {
}
The simplest, and rather old-fashioned, place where this shows up is reading input from the command line
#include <stdio.h>
int main(void) {
    int c; /* int, not char, so that EOF can be told apart from real characters */
    while ((c = getchar()) != EOF) {
        printf("Typed character %c\n", c);
    }
    return 0;
}
For cycle
The for loop is one construct that invites some trickery, although it is as simple as
for (<INITIAL>; <TERMINATE CONDITION>; <AFTER CYCLE>) {
}
Go over the values from 1 to 10
int i;
for (i = 1; i <= 10; i++) {
    printf("%d\n", i);
}
Now let's go from 10 down to 1
int i;
for (i = 10; i > 0; i--) {
    printf("%d\n", i);
}
Now let's make it a one-liner
for (i = 0; i < 10; i++, printf("%d\n", i));
Yes, the comma operator lets you write as many expressions as needed. Note that this version also prints 1 to 10, because i is incremented before printf runs.
Structure
A structure combines several types into one new type, which is a convenient way to group related values and reuse them as a unit.
struct struct1 {
    uint8_t a;
    uint16_t b;
    uint32_t c;
    uint64_t d;
};
Intuitively, the total size of the structure would be the sum of its members, but the real size can be larger
size_t total_size = sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint32_t) + sizeof(uint64_t); /* 15 */
size_t real_size = sizeof(struct struct1); /* includes padding */
The compiler places members inside the structure so that they can be accessed quickly. Some CPU instructions require aligned addresses, or run slower on unaligned ones, so padding bytes are inserted between members.
To control the alignment of a type directly, use the attribute
__attribute__ ((aligned (8)))
Use the packed attribute to remove the padding, so that the layout does not depend on the architecture's alignment rules.
struct struct2 {
    uint8_t a;
    uint16_t b;
    uint32_t c;
    uint64_t d;
} __attribute__((packed));
Now let's check the size of the structure after it is packed
size_t new_size = sizeof(struct struct2); /* 15, no padding */
It is also possible to add an alignment to each member of the structure
struct struct3 {
    uint8_t a __attribute__((aligned (8)));
    uint16_t b __attribute__((aligned (8)));
    uint32_t c __attribute__((aligned (8)));
    uint64_t d __attribute__((aligned (8)));
} __attribute__((aligned (8)));
Now the size of the structure will be 32.
All results are from amd64; other architectures may differ.
How to shoot the leg
Forget that the size of a struct is not consistent across compilers and architectures.
Recursion
Recursion is a technique that can make code shorter and replace loops. Its weakness is that every call consumes stack memory, and the stack has a default size limit on each platform.
#include <stdio.h>
#include <stdlib.h>
int fun_r(int i) {
    printf("val %d\n", i);
    fun_r(i + 1); /* no end condition: recurses until the stack is exhausted */
    return 0;
}
int main()
{
    fun_r(0);
}
The program fails once it runs past the end of the stack. With a larger stack limit it gets further before crashing.
Check default stack size
ulimit -s
Set stack size
ulimit -s 16384
Macro
Macros are useful for many things, and there are many tricks for making them emit useful pieces of code.
Define values as if they were an enum.
#define VAL_0 0
#define VAL_1 1
#define VAL_LAST VAL_1
Multiline macro
#define INC_FUN(TYPE) TYPE inc_##TYPE(TYPE a) { \
    TYPE c = 1;                                 \
    return a + c;                               \
}
INC_FUN(int)
INC_FUN(char)
INC_FUN(double)
/* INC_FUN(notype) would expand too, but fail to compile because notype is not a type */
To check how a macro expands, run
gcc -E <SOURCE_FILE>
http://main.lv/writeup/c_macro_tricks.md
https://jadlevesque.github.io/PPMP-Iceberg/
Pointers
Pointers are one of C's most loved features. They give access to addresses without any sanity checks and carry no lifetime information, so anything is possible with them.
A pointer holds an address, which is interpreted according to the pointer's type
int c;
int *ptr = &c;
Go over an array of chars
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char s[] = "asd";
    char *c = s; /* the array decays to a pointer to its first element */
    while (*c != 0) {
        printf("Next char %c addr %p\n", *c, (void *)c);
        c++;
    }
    return 0;
}
Go over an array of ints
int i = 0;
int arr[] = {9, 7, 5, 3, 1};
int *ptr = arr;
while (i < 5) {
    printf("Number value %d addr %p\n", *ptr, (void *)ptr);
    ptr++;
    i++;
}
Pointer arithmetic such as +1 moves the address forward by the size of the pointed-to type. In the example below the structure is 12 bytes, so incrementing a pointer to it advances the address by sizeof the structure. The resulting addresses are not mapped memory, so dereferencing them would segfault. Arithmetic on a null pointer is formally undefined behaviour; it is used here only to make the offsets easy to read.
#include <stdio.h>
struct size12 {
    int a, b, c;
};
int main(void) {
    struct size12 *s = 0;
    s++;
    printf("%p\n", (void *)s);
    s++;
    printf("%p\n", (void *)s);
    return 0;
}
Double pointers are pointers to pointers
#include <stdio.h>
int main(int argc, char **argv) {
    char *arg = argv[0];
    printf("Program name %s\n", arg);
}
How to shoot the leg
Increment a pointer in a while loop without a proper end condition. It stops only when it segfaults.
Leave a pointer uninitialized and it will hold an indeterminate value.
Allocate memory
From the program's perspective, memory allocation adds an address range that the executable is allowed to access.
Every malloc should be paired with a free; otherwise the program leaks memory.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    char *c = malloc(16);
    memset(c, 0, 16);
    int *arr = malloc(16 * sizeof(int));
    memset(arr, 0, 16 * sizeof(int));
    free(c);
    free(arr);
}
Signed/Unsigned
Signed and unsigned variables differ only in how one bit is interpreted, but they behave differently at their minimum and maximum values. Unsigned arithmetic wraps around modulo 2^N, while signed overflow is undefined behaviour, even though on most machines it appears to wrap.
#include <stdio.h>
#include <limits.h>
int main()
{
    int i = INT_MAX;
    unsigned int u = UINT_MAX;
    printf("i=%d\n", i);
    printf("u=%u\n", u);
    i++; /* undefined behaviour: signed overflow */
    u++; /* well defined: wraps to 0 */
    printf("i=%d\n", i);
    printf("u=%u\n", u);
    i = 0;
    u = 0;
    i--; /* -1 */
    u--; /* wraps to UINT_MAX */
    printf("i=%d\n", i);
    printf("u=%u\n", u);
}
Endianness
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
int main() {
    /* unsigned 32-bit values: 0x8899AABB and 0xCCDDEEFF do not fit in a signed int */
    uint32_t arr[4] = {0x00112233, 0x44556677, 0x8899AABB, 0xCCDDEEFF};
    printf("%08x\n", arr[0]);
    printf("%08x\n", arr[1]);
    printf("%08x\n", arr[2]);
    printf("%08x\n", arr[3]);
    /* text file: values formatted as hex digits */
    FILE *f = fopen("int.hex", "w+");
    fprintf(f, "%08x", arr[0]);
    fprintf(f, "%08x", arr[1]);
    fprintf(f, "%08x", arr[2]);
    fprintf(f, "%08x", arr[3]);
    fclose(f);
    /* binary file: raw memory dump of the array */
    int fd = open("int.bin", O_CREAT|O_RDWR, S_IWUSR|S_IRUSR|S_IRGRP|S_IRWXO);
    write(fd, arr, sizeof(arr));
    close(fd);
    int i;
    /* binary file: the 16-bit halves of every value swapped before writing */
    fd = open("int.bin2", O_CREAT|O_RDWR, S_IWUSR|S_IRUSR|S_IRGRP|S_IRWXO);
    for (i = 0; i < 4; i++) {
        uint32_t val = (arr[i] >> 16) & 0x0000ffff;
        val += (arr[i] << 16) & 0xffff0000;
        write(fd, &val, sizeof(uint32_t));
    }
    close(fd);
}
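When formatted values are saved to a file, you get what you expect
$ cat int.hex
00112233445566778899aabbccddeeff
Saving a raw memory dump of the values gives a different result. On a little-endian machine such as x86 the least significant byte is stored first, and hexdump prints 16-bit words by default
$ hexdump int.bin
0000000 2233 0011 6677 4455 aabb 8899 eeff ccdd
0000010
The 16-bit halves of each value need to be swapped for the dump to look the same as the printed values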
$ hexdump int.bin2
0000000 0011 2233 4455 6677 8899 aabb ccdd eeff
0000010
Compiler flags
The compiler has a long list of command-line options that can be enabled for different purposes. Let's look at some of them https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html
Let's try applying some of these flags to the examples above.
The best options to start with are these; they give you more warnings.
-Wall -Wextra
Most of the examples here were written in a sloppy style, so extra checks like the following will find more issues. Probably every example provided here will trigger some of these warnings.
-Wformat-security -Wduplicated-cond -Wfloat-equal -Wshadow -Wconversion -Wjump-misses-init -Wlogical-not-parentheses -Wnull-dereference
To see all macros expanded, add the following flag. The output is the C source after preprocessing.
-E
To emit generated assembly instead of a binary, add
-S
More readable output can be obtained with
gcc FILE.c -Wa,-adhln=FILE.S -g -fverbose-asm -masm=intel
Basic optimisation flags that can speed up the program or make it smaller
-O -O0 -O1 -O2 -O3 -Os -Ofast -Og -Oz
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
https://panthema.net/2013/0124-GCC-Output-Assembler-Code/
https://blogs.oracle.com/linux/post/making-code-more-secure-with-gcc-part-1
Shared library
A shared library is a common way to reuse big chunks of code.
#include <stdio.h>
int fun1() {
    return 1;
}
int fun2() {
    printf("Function name fun2\n");
    return 0;
}
int fun3(int a, int b) {
    return a + b;
}
$ gcc -c -fPIC lib_share.c
$ gcc -shared -o libshare.so lib_share.o
$ ldd libshare.so
linux-vdso.so.1 (0x00007ffdb994d000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f0c39400000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007f0c39835000)
Now let's link it to our binary
#include <stdio.h>
// functions that are implemented in the shared lib
int fun1();
int fun2();
int fun3(int a, int b);
int main() {
    fun1();
    fun2();
    fun3(1, 2);
}
$ gcc use_share.c -o use_share -L. -lshare
$ ./use_share
./use_share: error while loading shared libraries: libshare.so: cannot open shared object file: No such file or directory
$ ldd ./use_share
linux-vdso.so.1 (0x00007ffedcad5000)
libshare.so => not found
libc.so.6 => /usr/lib/libc.so.6 (0x00007f7b99a00000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7b99c90000)
The library is not in the loader's search path, so add the current directory to it
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`
$ ./use_share
$ ldd use_share
linux-vdso.so.1 (0x00007fffc415c000)
libshare.so => /your/path/libshare.so (0x00007f48b03c6000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f48b0000000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f48b03d2000)
Another way is to embed a custom library search path in the executable. Let's set it to the current directory, so there is no need to modify LD_LIBRARY_PATH
$ gcc use_share.c -o use_share -L. -lshare -Wl,-rpath=./
$ ldd ./use_share
linux-vdso.so.1 (0x00007fff5c964000)
libshare.so => ./libshare.so (0x00007f791000f000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f790fc00000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f791001b000)
Now the executable loads libshare from the local directory. Of course, it is also possible to install the shared library into the system's /usr/lib
Static library
Static binary
A static binary does not use any shared libraries, so it can be built once and distributed to other systems without installing dependencies.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    return 0;
}
The first step is to compile the file and see that it is dynamically linked
$ gcc static_elf.c -o static_elf
$ file static_elf
static_elf: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=bc6ac706075874858e1c4a8accf77e704f4ea25a, for GNU/Linux 4.4.0, with debug_info, not stripped
$ ldd ./static_elf
linux-vdso.so.1 (0x00007ffccef49000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fcbb8800000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fcbb8b63000)
After adding the -static option, the tools report the binary as statically linked. The binary is larger because all the functions required to run it are now contained in the file.
$ gcc static_elf.c -static -o static_elf
$ file static_elf
static_elf: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=c54d2e4d2a3d11fe920bee9a44af045c6f67ab56, for GNU/Linux 4.4.0, with debug_info, not stripped
$ ldd static_elf
not a dynamic executable
A statically linked file should work on most Linux systems with the same CPU architecture.
Atomic
HERE
Multithreading
HERE
Basic usage
File manipulation with libc
Create a file and write data to it using libc functions
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    FILE *f = fopen("file.txt", "w+");
    if (f == NULL) return 1;
    const char *s = "Hello";
    fwrite(s, 1, strlen(s), f);
    fclose(f);
    return 0;
}
Open the file and read the data back
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    FILE *f = fopen("file.txt", "r");
    if (f == NULL) return 1;
    char buf[128];
    size_t r = fread(buf, 1, sizeof(buf) - 1, f); /* leave room for the terminator */
    buf[r] = 0;
    printf("->%s\n", buf);
    fclose(f);
    return 0;
}
File manipulation with syscalls
Now let's do the same without libc's file functions, using the syscall function to invoke system calls directly. It is then straightforward to rewrite the example in assembly.
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <string.h>
int main(void) {
    int fd = syscall(SYS_open, "sys.txt", O_CREAT|O_WRONLY, S_IRWXU|S_IRGRP|S_IXGRP);
    char s[] = "hello syscall\n";
    syscall(SYS_write, fd, s, strlen(s));
    syscall(SYS_close, fd);
    return 0;
}
Read the data from the file
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <string.h>
int main(void) {
    int fd = syscall(SYS_open, "sys.txt", O_RDONLY);
    char s[128];
    int r = syscall(SYS_read, fd, s, sizeof(s) - 1); /* leave room for the terminator */
    if (r < 0) r = 0;
    s[r] = 0;
    syscall(SYS_close, fd);
    syscall(SYS_write, 1, s, r); /* file descriptor 1 is stdout */
    return 0;
}
Advanced topics
Kernel module
The Linux kernel, the macOS kernel and the *BSD kernels are written in C, so kernel modules for some of them can be written in C.
The example may need adjusting to the specifics of your local distribution.
http://main.lv/writeup/kernel_hello_world.md
Linking
Linking is one of the most interesting parts of compiling C code. An object file contains functions and variables of different kinds, some defined and some only referenced, and the linker tries to resolve all of them. So it is possible to have some fun with linking and with the contents of object files.
The first example is a piece of C code that compiles to an object file but cannot be linked into an executable on its own.
gcc -c link_elf.c
/* declared here, defined in another object file */
void fun1();
void fun2();
int main() {
    fun1();
    fun2();
}
The symbol table shows that fun1 and fun2 are marked as undefined (UND) in the object file. If we try to link it now, the linker will not find them. So let's create one more object file
$ readelf -a link_elf.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS link_elf.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 31 FUNC GLOBAL DEFAULT 1 main
4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fun1
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND fun2
link_fun1.c
#include <stdio.h>
void fun1() {
    printf("Hello fun1\n");
}
void fun2() {
    printf("Hello fun2\n");
}
Now we have an object file in which the functions are defined, and we can see that it in turn has an undefined reference to puts, which the compiler substituted for printf.
readelf -a link_fun1.o
Symbol table '.symtab' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS link_fun1.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 22 FUNC GLOBAL DEFAULT 1 fun1
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
6: 0000000000000016 22 FUNC GLOBAL DEFAULT 1 fun2
We can link both files together
gcc -o link_elf link_elf.o link_fun1.o
Symbols in object files carry no information about argument or return types. That is why anything whose name matches can be linked. Let's rewrite the code like this
#include <stdio.h>
int fun1(int i) {
    printf("Hello fun1\n");
    return 0;
}
int fun2(int i) {
    printf("Hello fun2\n");
    return 0;
}
And this links without any issue. Treat it as two sets of names that are merged together; the linker knows only a few things about each symbol. The return type and the function arguments are not exposed when the object file is created.
Functions can be made local to a file with static, and GCC can also give them aliases with __attribute__((alias)). The alias attribute belongs on a declaration, and its target must be defined in the same translation unit, so it cannot turn fun2 into an alias of a fun1 that lives in another file.
link_fun2.c
#include <stdio.h>
static void fun2(void) {
    printf("hello 2\n");
}
Now the function is local.
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS link_fun2.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 22 FUNC LOCAL DEFAULT 1 fun2
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
Let's link all the objects into an executable. The fun2 from link_fun2.o is not used in this case, because a static symbol is invisible to other object files.
$ gcc link_fun1.o link_fun2.o link_elf.o -o link_elf
$ ./link_elf
Hello fun1
Hello fun2
Let's switch which of the two fun2 functions is local
link_fun1.o
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS link_fun1.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 000000000000001d 29 FUNC LOCAL DEFAULT 1 fun2
5: 0000000000000000 29 FUNC GLOBAL DEFAULT 1 fun1
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
link_fun2.o
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS link_fun2.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 22 FUNC GLOBAL DEFAULT 1 fun2
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
$ gcc link_fun1.o link_fun2.o link_elf.o -o link_elf
$ ./link_elf
Hello fun1
hello 2
So all of this plays a role in linking object files. There is also a more interesting utility called ld, which does the same things at a lower level than gcc.
Extern
Attributes
PASS
Creating shared library
PASS
Create static libraries
PASS
Join all objects together
PASS
Compile with musl
glibc is not the only option for the standard C library. There are a few others, and one of them is musl
$ musl-gcc hello_world.c -o hello_world_musl
$ file ./hello_world_musl
hello_world_musl: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, not stripped
Inspect elf files
There are a few utilities that help to check whether an ELF file is OK.
ldd shows which shared libraries the ELF file will try to load
$ ldd hello_world
linux-vdso.so.1 (0x00007fffcb2ae000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007ffb80c00000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ffb80fb9000)
readelf inspects the contents of ELF files, including the headers, and interprets the values in them. A few examples above already used it to check the contents of compiled object files.
$ readelf -s ./hello_world
Symbol table '.symtab' contains 37 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS abi-note.c
2: 000000000000039c 32 OBJECT LOCAL DEFAULT 4 __abi_tag
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS init.c
4: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
5: 0000000000001070 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones
6: 00000000000010a0 0 FUNC LOCAL DEFAULT 14 register_tm_clones
7: 00000000000010e0 0 FUNC LOCAL DEFAULT 14 __do_global_dtors_aux
8: 0000000000004030 1 OBJECT LOCAL DEFAULT 25 completed.0
9: 0000000000003df0 0 OBJECT LOCAL DEFAULT 20 __do_global_dtor[...]
10: 0000000000001130 0 FUNC LOCAL DEFAULT 14 frame_dummy
11: 0000000000003de8 0 OBJECT LOCAL DEFAULT 19 __frame_dummy_in[...]
12: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello_world.c
13: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
14: 00000000000020b0 0 OBJECT LOCAL DEFAULT 18 __FRAME_END__
15: 0000000000000000 0 FILE LOCAL DEFAULT ABS
16: 0000000000003df8 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
17: 0000000000002010 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR
18: 0000000000004000 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_
19: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_mai[...]
20: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
21: 0000000000004020 0 NOTYPE WEAK DEFAULT 24 data_start
22: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5
23: 0000000000004030 0 NOTYPE GLOBAL DEFAULT 24 _edata
24: 0000000000001154 0 FUNC GLOBAL HIDDEN 15 _fini
25: 0000000000004020 0 NOTYPE GLOBAL DEFAULT 24 __data_start
26: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
27: 0000000000004028 0 OBJECT GLOBAL HIDDEN 24 __dso_handle
28: 0000000000002000 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used
29: 0000000000004038 0 NOTYPE GLOBAL DEFAULT 25 _end
30: 0000000000001040 38 FUNC GLOBAL DEFAULT 14 _start
31: 0000000000004030 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
32: 0000000000001139 26 FUNC GLOBAL DEFAULT 14 main
33: 0000000000004030 0 OBJECT GLOBAL HIDDEN 24 __TMC_END__
34: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
35: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@G[...]
36: 0000000000001000 0 FUNC GLOBAL HIDDEN 12 _init
No standard library
Let's write hello world without libc.
noc.c
void _start() {
}
$ gcc -c noc.c
$ ld -dynamic-linker /lib/ld-linux.so.2 noc.o -o noc
$ file noc
noc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
The next step is to make it do something other than segfault: exit cleanly through a system call.
void _start() {
    /* 32-bit Linux syscall ABI: eax = 1 is exit, ebx = exit status */
    asm ( \
    "movl $1,%eax\n" \
    "xor %ebx,%ebx\n" \
    "int $128\n" \
    );
}
Now it is all about calling syscalls
Let's print the message
signed int write(int fd, const void *buf, unsigned int size)
{
    signed int ret;
    asm volatile
    (
        "syscall"
        : "=a" (ret)
        // EDI RSI RDX
        : "0"(1), "D"(fd), "S"(buf), "d"(size) /* rax = 1 is write on x86-64 */
        : "rcx", "r11", "memory"
    );
    return ret;
}
void _start() {
    write(1, "no libc", 8); /* 7 characters plus the terminating zero byte */
    asm ( \
    "movl $1,%eax\n" \
    "xor %ebx,%ebx\n" \
    "int $128\n" \
    );
}
http://main.lv/writeup/making_c_executables_smaller.md
Memory leaks
Memory leaks are a crucial topic in C. The typical case is allocated memory that is never freed after use. If the amount of such memory keeps growing, it can eventually fill all available memory and make the system unresponsive. Here is a simple example of how a memory leak is created and how to detect it.
#include <stdlib.h>
int main() {
    char *ptr = malloc(12); /* never freed */
    return 0;
}
The best way to detect it is to use valgrind.
$ valgrind ./malloc
==778== HEAP SUMMARY:
==778== in use at exit: 12 bytes in 1 blocks
==778== total heap usage: 2 allocs, 1 frees, 1,036 bytes allocated
We see 2 allocations and 1 free, and 12 bytes still in use at exit, so the leak we created is detected. Now a more complex example: a leaking function that is called five times. In a larger code base it would be nice to see where the leaks come from.
#include <stdlib.h>
int* mem_alloc(int sz) {
    int *ret = NULL;
    if (sz < 0) {
        return NULL;
    }
    ret = malloc(sz * sizeof(int));
    if (sz > 10) {
        return NULL; /* leak: ret was allocated but is dropped here */
    }
    return ret;
}
int main() {
    mem_alloc(0);
    free(mem_alloc(1));
    mem_alloc(100);
    free(mem_alloc(2));
    mem_alloc(10);
    return 0;
}
Three blocks leak. We could guess where they come from, but it would be better to have the exact location of each leak.
valgrind --leak-check=full --track-origins=yes --log-file=log.txt ./memleak2
==4974== HEAP SUMMARY:
==4974== in use at exit: 440 bytes in 3 blocks
==4974== total heap usage: 5 allocs, 2 frees, 452 bytes allocated
==4974==
==4974== 0 bytes in 1 blocks are definitely lost in loss record 1 of 3
==4974== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4974== by 0x109179: mem_alloc (in /home/fam/prog/c/undefined_c/memleak2)
==4974== by 0x10919E: main (in /home/fam/prog/c/undefined_c/memleak2)
==4974==
==4974== 40 bytes in 1 blocks are definitely lost in loss record 2 of 3
==4974== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4974== by 0x109179: mem_alloc (in /home/fam/prog/c/undefined_c/memleak2)
==4974== by 0x1091D6: main (in /home/fam/prog/c/undefined_c/memleak2)
==4974==
==4974== 400 bytes in 1 blocks are definitely lost in loss record 3 of 3
==4974== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4974== by 0x109179: mem_alloc (in /home/fam/prog/c/undefined_c/memleak2)
==4974== by 0x1091BA: main (in /home/fam/prog/c/undefined_c/memleak2)
==4974==
==4974== LEAK SUMMARY:
==4974== definitely lost: 440 bytes in 3 blocks
==4974== indirectly lost: 0 bytes in 0 blocks
==4974== possibly lost: 0 bytes in 0 blocks
==4974== still reachable: 0 bytes in 0 blocks
==4974== suppressed: 0 bytes in 0 blocks
Add the compilation option -g3
gcc -g3 memleak2.c -o memleak2
Now the report shows source lines and the call trace that led to the leaking code. That looks better.
valgrind --leak-check=full --track-origins=yes --log-file=log.txt ./memleak2
==5073== HEAP SUMMARY:
==5073== in use at exit: 440 bytes in 3 blocks
==5073== total heap usage: 5 allocs, 2 frees, 452 bytes allocated
==5073==
==5073== 0 bytes in 1 blocks are definitely lost in loss record 1 of 3
==5073== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5073== by 0x109179: mem_alloc (memleak2.c:10)
==5073== by 0x10919E: main (memleak2.c:22)
==5073==
==5073== 40 bytes in 1 blocks are definitely lost in loss record 2 of 3
==5073== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5073== by 0x109179: mem_alloc (memleak2.c:10)
==5073== by 0x1091D6: main (memleak2.c:30)
==5073==
==5073== 400 bytes in 1 blocks are definitely lost in loss record 3 of 3
==5073== at 0x4841888: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5073== by 0x109179: mem_alloc (memleak2.c:10)
==5073== by 0x1091BA: main (memleak2.c:26)
==5073==
==5073== LEAK SUMMARY:
==5073== definitely lost: 440 bytes in 3 blocks
==5073== indirectly lost: 0 bytes in 0 blocks
==5073== possibly lost: 0 bytes in 0 blocks
==5073== still reachable: 0 bytes in 0 blocks
==5073== suppressed: 0 bytes in 0 blocks
==5073==
Code coverage
Compile the file with extra flags, run it, and generate the gcov output. Only one branch is never taken, and the coverage report should show which part is not used.
#include <stdio.h>

int fun1(int a) {
    if (a < 0) {
        printf("Smaller then zero\n");
    }
    if (a==0) {
        printf("Equails to zero\n");
    }
    if (a>0) {
        printf("Bigger then zero\n");
    }
}

int main() {

    printf("Start\n");
    fun1(0);
    fun1(1);

    return 0;
}
$ gcc -fprofile-arcs -ftest-coverage coverage.c -o coverage
$ ./coverage
$ gcov ./coverage
File 'coverage.c'
Lines executed:92.31% of 13
Creating 'coverage.c.gcov'
Lines executed:92.31% of 13
Contents of the gcov file. Lines marked ##### were never executed, so we can see which line was missed.
-: 0:Source:coverage.c
-: 0:Graph:coverage.gcno
-: 0:Data:coverage.gcda
-: 0:Runs:1
-: 1:#include <stdio.h>
-: 2:
2: 3:int fun1(int a) {
2: 4: if (a < 0) {
#####: 5: printf("Smaller then zero\n");
-: 6: }
2: 7: if (a==0) {
1: 8: printf("Equails to zero\n");
-: 9: }
2: 10: if (a>0) {
1: 11: printf("Bigger then zero\n");
-: 12: }
2: 13:}
-: 14:
1: 15:int main() {
-: 16:
1: 17: printf("Start\n");
1: 18: fun1(0);
1: 19: fun1(1);
-: 20:
1: 21: return 0;
-: 22:}
Profiling
Some parts of the code can take a substantial amount of time, and those parts need to be identified.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
void slow_sin() {
    float r = 0.0f;
    for (int i = 0; i < 10000000; i++) {
        r += sinf(M_PI/8);
    }
}
void slower_sin() {
    double r = 0.0;
    for (int i = 0; i < 10000000; i++) {
        r += sin(M_PI/8);
    }
}
void fast_sin() {
    float pre_calc = sinf(M_PI/8); /* computed once, outside the loop */
    float r = 0.0f;
    for (int i = 0; i < 10000000; i++) {
        r += pre_calc;
    }
}
int main() {
    slow_sin();
    slower_sin();
    fast_sin();
}
Compile and run with profiling. The run writes gmon.out, which gprof then reads
gcc -pg perf_speed.c -o perf_speed -lm
./perf_speed
gprof perf_speed gmon.out
Sanitizer
C, being a great language, has features in the standard such as undefined behaviour. It also lets your code overwrite any data you want. A favourite mistake is the buffer overrun, and this kind of error can be caught with stack protection.
In the code below it is possible to write more than 8 characters into an array of size 8, because there is no bounds checking. The stack protector inserted by the compiler can detect this kind of thing at run time.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void fun(char *str, int size) {
    char local_var[8];
    memcpy(local_var, str, size); /* no check that size fits into local_var */
    printf("Whats inside a stack? %s\n", local_var); /* no terminator was copied either */
}
int main() {
    char some_str1[] = "Hello!";
    char some_str2[] = "Hello all!!!";
    fun(some_str1, strlen(some_str1));
    fun(some_str2, strlen(some_str2)); /* 12 bytes into an 8-byte buffer */
}
Whats inside a stack? Hello!
Whats inside a stack? Hello all!!!
*** stack smashing detected ***: terminated
fish: Job 1, './stack_overrun' terminated by signal SIGABRT (Abort)
If this does not happen, add -fstack-protector to the compile flags.
C has a whole list of undefined behaviours incorporated into the standard https://en.cppreference.com/w/c/language/behavior
In function f the variable a is not initialized when x is zero, so reading it is undefined behaviour, but in practice there will still be some value. Run the program a few times and f(0) may return a different value each time.
#include <stdio.h>
size_t f(int x)
{
    size_t a;
    if(x) // either x nonzero or UB
        a = 42;
    return a;
}
int main() {
    printf("%zu\n", f(0));
    printf("%zu\n", f(1));
    printf("%zu\n", f(42));
}
Division by zero. Function f does not check whether the divisor is 0, so the program is going to abort. Add the flag -fsanitize=integer-divide-by-zero and the error will be reported at run time
#include <stdio.h>
size_t f(int x)
{
    return 10/x;
}
int main() {
    printf("%zu\n", f(0));
    printf("%zu\n", f(1));
    printf("%zu\n", f(42));
}
undefined_b.c:5:14: runtime error: division by zero
fish: Job 1, './undefined_b' terminated by signal SIGFPE (Floating point exception)
Write plugins
Preload library
Embedding C
Most of the programming languages support embeding C. As C language have where simple functiong naming when its mangled to object format it makes it easy target when linking with other languages. Most of other languages have incompatible naming for functions when compiled to binary.
Embed in C++
lib.h
#include <stdlib.h>
#include <stdio.h>

int fun_secret_1();
lib.c
#include "lib.h"

int fun_secret_1() {
    printf("Hello from C\n");
    return -1;
}
The first thing to notice is that when the file is compiled as C++, the function name in the object file has a different format than when it is compiled as C.
$ g++ -c lib.c
$ readelf -s lib.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS lib.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 26 FUNC GLOBAL DEFAULT 1 _Z12fun_secret_1v
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
Let's tell the C++ compiler that these functions use C linkage by adding extern "C"
lib.h
#include <stdlib.h>
#include <stdio.h>

extern "C" {
int fun_secret_1();
}
lib.c
#include "lib.h"

extern "C" {
int fun_secret_1() {
    printf("Hello from C\n");
    return -1;
}
}
Now the compiled object file contains the plain C function name.
$ g++ lib.c -c
$ readelf -s lib.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS lib.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000000 26 FUNC GLOBAL DEFAULT 1 fun_secret_1
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND puts
cppembed.cpp
#include "lib.h"

int main() {
    fun_secret_1();
}
Going the opposite way, running C++ from C:
/writeup/wraping_c_plus_plus_exceptions_templates_and_classes_in_c.md
Embed in Go
lib.h
#include <stdlib.h>
#include <stdio.h>

int fun_secret_1();
lib.c
#include "lib.h"

int fun_secret_1() {
    printf("Hello from C\n");
    return -1;
}
main.go
package main

// #cgo CFLAGS: -g -Wall
// #include <stdlib.h>
// #include "lib.h"
import "C"
import (
    "fmt"
)

func main() {
    fmt.Println("Start program")
    C.fun_secret_1()
    fmt.Println("End program")
}
go build
https://karthikkaranth.me/blog/calling-c-code-from-go/
Embed in Swift
/writeup/linux_hello_world_in_swift.md
Embed in Rust
lib.c
#include <stdio.h>
#include <stdlib.h>

int fun_secret_1() {
    printf("Hello from C\n");
    return -1;
}
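use std::os::raw::c_int;

extern "C" {
    // Declared return type matches `int fun_secret_1()` in lib.c
    fn fun_secret_1() -> c_int;
}

//rustc main.rs -o hello
fn main() {
    println!("Start program");
    // Calling a foreign function is unsafe; the return value is ignored here
    unsafe { fun_secret_1(); }
    println!("End program");
}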
Compile with
gcc -c lib.c
gcc -shared lib.o -o liblib.so
rustc main.rs -l lib -L . -o hello -C link-arg="-Wl,-rpath=./"
https://dev.to/xphoniex/how-to-call-c-code-from-rust-56do
Lua in C
/writeup/embedding_lua_in_c.md
Python in C
Multiplatform
Different flags
Check architecture
AArch64
https://snapshots.linaro.org/gnu-toolchain/13.0-2022.08-1/aarch64-linux-gnu/
Download any version of the GCC toolchain from the page above and extract it.
Add the location of its bin directory to the PATH environment variable (run this from inside the bin directory):
export PATH=$PATH:`pwd`
main.c
#include <stdio.h>

int main() {
    printf("Hello world arm64\n");
}
$ aarch64-linux-gnu-gcc main.c -o main
$ ./main
qemu-aarch64: Could not open '/lib/ld-linux-aarch64.so.1': No such file or directory
$ file ./main
./main: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=12448d90030e2ad23dbe6b7bc82a4fa7b7de9659, for GNU/Linux 3.7.0, with debug_info, not stripped
Download the sysroot image from the Linaro page. Running
strace ./main
shows that the libraries are searched for in
/usr/gnemul/qemu-aarch64/lib/
The missing libc and ld-linux-aarch64 were found inside the sysroot archive and copied to the searched location, and now the AArch64 binary runs.
$ ./main
Hello world arm64
AVR8
AVR is an 8-bit CPU that is quite popular with hobbyists. As a bare-metal device it does not have full libc support and needs some setup before it is possible to do basic things with it.
avr_echo.c
#include <avr/io.h>

#define FOSC 16000000UL
#define BAUD 9600
/* Parenthesized so the macro expands safely inside larger expressions */
#define MYUBRR (FOSC/16/BAUD-1)

void USART_Init( unsigned int ubrr)
{
    /* Set baud rate */
    UBRRH = (unsigned char)(ubrr>>8);
    UBRRL = (unsigned char)ubrr;
    /* Enable receiver and transmitter */
    UCSRB = (1<<RXEN)|(1<<TXEN);
    /* Frame format: 8 data bits, 2 stop bits */
    UCSRC = (1<<URSEL)|(1<<USBS)|(3<<UCSZ0);
}

int main()
{
    char c;
    USART_Init( MYUBRR );
    while(1)
    {
        /* Wait for a received byte, then echo it back */
        while ( !(UCSRA & (1<<RXC))){};
        c = UDR;
        while (!(UCSRA & (1<<UDRE))){};
        UDR = c;
    }
    return 0;
}
avr-gcc avr_echo.c -mmcu=atmega16 -Wall -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -o avr_echo.out
The next step would be to program the chip, if you have an ISPv2 programmer and an ATmega16 chip.
avr-objdump -s --disassemble avr_echo.out > avr_echo.s
avr-objcopy -j .text -O ihex avr_echo.out avr_echo.hex
avrdude -pm16 -cavrispv2 -Pusb -U flash:w:avr_echo.hex