Assignment 6 (Project 2)

Paul Krzyzanowski

October 6, 2020

Deadline: Wednesday, October 21, 2020 11:59pm

Updated October 9 to add extra credit support for multiple filenames in hidefile.

Introduction

We looked at the ability to do function interposition, where we can create an alternate version of a library function and force a program to load and use that version instead. This allows us to alter the behavior of library functions and, thus, the programs that use them.

This assignment requires you to write two small functions that interpose two standard library functions. The assignment comprises two parts. Each function is relevant to one part of the assignment.

Groups

This is an individual assignment. You are expected to do this on your own and submit your own version of the code.

Environment

Your submission will be tested on Rutgers iLab Linux systems. You can most likely develop this on any other Linux system but you are responsible for making sure that it will work on the iLab systems. You cannot do this assignment on macOS; the BSD functions it uses for dynamic linking are somewhat different.

Download the file hw-6.zip. It contains makefiles, which compile the code and also run small tests, as well as a driver program for part 2 of this assignment. The instructions below will refer to these files.

When you unzip the file, you will see a directory called hw-6 with three subdirectories:

random

contains the example of replacing the random number generator that we covered in class. The file random.c is the main program that prints a set of 10 random numbers and the file myrand.c is our custom implementation of the rand library function. The Makefile has targets to compile both of these and run a small test. You can run

make

to compile the shared library or run

make test

to compile (if needed) and run a test. You should run this to ensure that your environment is set up correctly and you have no problems preloading libraries.

hidefile

contains files you need for part 1 of this assignment.

unexpire

contains files you need for part 2 of the assignment.

Part 1: Hiding files

In discussing malware, we covered the concept of a rootkit. This was malware that is designed to stay hidden on the victim’s computer. If victims don’t see the software then they don’t know it’s there.

There are various ways of hiding content. It could be placed in obscure parts of the file system that might not be searched regularly. A file might be set to be hidden, but this attribute can be changed easily. On Linux systems, files whose names begin with a dot are not listed by default unless you use a -a option to the ls command. Clearly, this isn’t a good way of hiding files. The ideal way of hiding files is to modify the kernel to never report of their existence but this requires getting privileges to modify the kernel.

A way of hiding files in user space is to modify library functions that read directory contents. This way, programs that show directory contents will not see the file but someone who knows about it can open it.

In this assignment, you will use function interposition to hide the presence of files from commands such as ls and find.

Goal

The standard library function for reading directories on Linux is readdir, which is built on top of the getdents system call. Programs such as ls, find, sh, zsh, and others use readdir to read contents of directories.

Your program will interpose readdir so that it will hide a secret file whose name is set by an environment variable. You do this by setting the LD_PRELOAD environment variable:

export LD_PRELOAD=$PWD/hidefile.so

This tells the system to load the functions in the specified shared library before loading any others and to give these functions precedence.

For example, you can run the command ls to list all files in a directory:

$ ls -l
total 408
-rw------- 1 pxk allusers    115 Oct  6 12:26 present.pptx
-rw------- 1 pxk allusers    141 Oct  6 12:35 secretfile-zzz
-rw------- 1 pxk allusers  94698 Oct  6 12:33 status-report-1.txt
-rw------- 1 pxk allusers 166518 Oct  6 12:33 status-report-2.txt
-rw------- 1 pxk allusers  77166 Oct  6 12:33 status-report-3.txt
-rw------- 1 pxk allusers  48858 Oct  6 12:33 status-report-4.txt
-rw------- 1 pxk allusers     14 Oct  6 12:34 testfile.c

But if you set the environment variable HIDDEN to a specific file name, that file will not be visible:

$ export HIDDEN
$ HIDDEN=secretfile-zzz
$ ls -l
total 404
-rw------- 1 pxk allusers    115 Oct  6 12:26 present.pptx
-rw------- 1 pxk allusers  94698 Oct  6 12:33 status-report-1.txt
-rw------- 1 pxk allusers 166518 Oct  6 12:33 status-report-2.txt
-rw------- 1 pxk allusers  77166 Oct  6 12:33 status-report-3.txt
-rw------- 1 pxk allusers  48858 Oct  6 12:33 status-report-4.txt
-rw------- 1 pxk allusers     14 Oct  6 12:34 testfile.c

Note that secretfile-zzz is no longer displayed. Other commands that use readdir, such as find, will not show the file either:

$ find .
.
./status-report-1.txt
./status-report-4.txt
./status-report-2.txt
./status-report-3.txt
./present.pptx
./testfile.c

If you set HIDDEN to another name then files with that name will be hidden:

$ HIDDEN=status-report-1.txt  
$ ls -l
total 308
-rw------- 1 pxk allusers    115 Oct  6 12:26 present.pptx
-rw------- 1 pxk allusers    141 Oct  6 12:35 secretfile-zzz
-rw------- 1 pxk allusers 166518 Oct  6 12:33 status-report-2.txt
-rw------- 1 pxk allusers  77166 Oct  6 12:33 status-report-3.txt
-rw------- 1 pxk allusers  48858 Oct  6 12:33 status-report-4.txt
-rw------- 1 pxk allusers     14 Oct  6 12:34 testfile.c

And, of course, if you delete HIDDEN then all files should be visible:

$ unset HIDDEN
$ ls -l
total 408
-rw------- 1 pxk allusers    115 Oct  6 12:26 present.pptx
-rw------- 1 pxk allusers    141 Oct  6 12:35 secretfile-zzz
-rw------- 1 pxk allusers  94698 Oct  6 12:33 status-report-1.txt
-rw------- 1 pxk allusers 166518 Oct  6 12:33 status-report-2.txt
-rw------- 1 pxk allusers  77166 Oct  6 12:33 status-report-3.txt
-rw------- 1 pxk allusers  48858 Oct  6 12:33 status-report-4.txt
-rw------- 1 pxk allusers     14 Oct  6 12:34 testfile.c

What you need to do

Your assignment is to create an alternate version of the readdir Linux library function that will:

Call the real version of readdir
Check if the returned file name matches the name in the environment variable HIDDEN
If it does, call readdir again to skip over this file entry.
Return the file data to the calling program.

You will use Linux’s LD_PRELOAD mechanism. This is an environment variable that forces the listed shared libraries to be loaded for programs you run. The dynamic linker will search these libraries first when it resolves symbols in the program. This allows you to take control and replace library functions that programs use with your own versions.

For this part, you need to write just one function in a file that you will name hidefile.c.

The function will be your implementation of readdir. You’ll give it the same name and it should take the same parameters as the real readdir. It returns the same data type as the original version.

Because you need to call the real version of readdir from your readdir, you will need to have your function dynamically link the original readdir function and pass the request to that function. Read the references below for instructions on how to use the ldsym function to load the real version of the function from your code.

You will then compile this file into a shared library named hidefile.so. You can then preload this shared library by setting the environment variable:

    export LD_PRELOAD=$PWD/readdir.so

Then you can run programs such as ls or find as in the above examples and test whether it works.

Compiling and Testing

The assignment file hw-6.zip contains a directory hidefile. Inside, you will find a placeholder for your hidefile.c code and a Makefile. If you run

	make

the make program will compile the file hidefile.c into a shared library hidefile.so. If you run

	make test

the make program will compile the files (if necessary), create two sample files named secret-1.txt and secret-2.txt and test the program by setting the environment variable HIDDEN to secret-1.txt, then secret-2.txt, and then deleting it – running the ls command each time with the shared library preloaded.

A note about setting LD_PRELOAD and HIDDEN

When you set environment variables in your shell, you need to export them so they will be passed to any processes created by the shell (i.e., any commands the shell runs).

For example, in bash, zsh, and sh, if you run

LD_PRELOAD=test.so
command

LD_PRELOAD will not preload libraries for the command.

If you run:

export LD_PRELOAD=./test.so
command

Then LD_PRELOAD will be visible to all sub-processes. However, it will be also be loaded by your shell and you might experience unexpected behavior if your shell or other programs (like your editor) are using libraries that you are modifying. You will need to exit the shell or unset the variable:

unset LD_PRELOAD

Alternatively, you can set the environment variable on the command line. The shell will pass it to the command but it will not be used in the context of your shell. This is convenient for testing:

LD_PRELOAD=./test.so command

The same applies to HIDDEN. You can export it and set it to whatever you’d like:

export HIDDEN
HIDDEN=myfile1
command
HIDDEN=myfile2
HIDDEN=

Or set it for the one command:

HIDDEN=myfile1 command

In which case the shell will not set it in its environment but only for the command it runs.

Hints

The program should be quite short. My implementation was just nine lines of C statements.

Some of you will no doubt finish this assignment in well under an hour … but don’t count on it. Others of you, however, may not have much experience with C programming or the Linux enviornment and have more of a struggle. Allow yourself sufficient time.

Develop in steps. Don’t hesitate to put printf statements for debugging … but remove them before submission!

Extra credit

For extra credit, you can add support to hide multiple file names.

Environment variables don’t support arrays. One way to implement this would be to have a sequence of HIDDENn variables, such as HIDDEN0, HIDDEN1, etc. The confusion would be to know the limits and decide whether to support names such as HIDDEN00, HIDDEN000, etc.

Instead, you will implement this in a similar manner to how the shell implements specifying sequences of pathnames in search paths and library paths – by separating names with a colon. For this assignment, you can assume that file names will not contain a colon. In a real implementation, you would support an escape character for a colon.

To hide a single file, set HIDDEN as before:

HIDDEN=myfile

To hide multiple files, set hidden to a colon-separated list:

HIDDEN=myfile1:myfile2:myfile3:myfile4

Part 2

You are given a Linux program called unexpire that runs on the Rutgers iLab Linux systems. It’s in the same hw-6.zip zip package as the first part of the assignment.

Imagine that this is a program that you received for evaluation and it has an expiration time coded into it. This program exits (expires) if the date is after October 1, 2020. It will also refuse to run on any date earlier than January 1, 2020. However, you want to continue using this program and you want to defeat its check for the time.

Goal

Your assignment is to replace the program’s call to a system library to get the time of day with one of your own – an alternate version that will return a suitable date for the program’s time check. This is a use of function interposition.

After the time-based check for validity is done, you want the program to use the actual time so that it will return the correct time of day.

What you need to do

This program uses the standard C library (glibc) time function to get the system time. The time function returns the number of seconds since the Linux Epoch (the start of January 1, 1970 UTC).

Your assignment is to create an alternate version of the time() C library function that will return a value within a time window that will pass the program’s validation check.

You will use Linux’s LD_PRELOAD mechanism to take control and replace the standard time library function with your own version.

You need to write just one function in a file newtime.c. There’s a stub for this under the unexpire directory.

The function will be your implementation of time. You’ll give it the same name and it should take the same parameters and return the same data type as the original version. The main program validates the time only the first time it calls time. After that, you want calls to time to return the correct system time. Unfortunately, your time function will continue to be called by the program so you will need to have your function dynamically link the original time function and pass successive calls to that.

You will then compile this file into a shared library called newtime.so. You can run

make

to compile the library. You can then preload this shared library by setting the environment variable:

export LD_PRELOAD=$PWD/newtime.so

and then run the program normally:

./unexpire

If your implementation is correct, you will see a message stating:

PASSED! You reset the time successfully!

If not, you will see a message stating what part of your implementation failed. You can also test the program by running

make test

If you set LD_PRELOAD, don’t forget to

unset LD_PRELOAD

You may find that programs such as the vi editor call time() and may behave differently.

Hints

As with the first part, this program should be quite short: perhaps 15 lines of code if you use strptime to set the time from a user-friendly time string (which I recommend; it makes it easier to understand the program and makes it more maintainable). With a hardcoded time value, your program might be down to six or lines of functional code.

Test an initial version of your program that does not link in the original time function just to make sure you have that first part working before you move on.

As with part 1, some of you will no doubt finish this assignment incredibly quickly while others of you might struggle a bit figuring out how to convert a date. Allow yourself sufficient time to avoid last-minute panic.

References

hw-6.zip.

There are many tutorials on function interposition on the web. You’ll want to follow the instructions for dynamically linking original functions as well as the proper flags to use to compile the shared libraries (hidefile.so for part 1 and newtime.so for part 2).

Some reference you may find useful are:

man page for the readdir function.
man page for the getenv function.
man page for the time function
man page for ldsym.
NetSPI Blog, Function Hooking Part I
CatonMat, A Simple LD_PRELOAD Tutorial
CatonMat, A Simple LD_PRELOAD Tutorial, Part 2

What to submit

When you have completed your assignment, you will need to submit a zip file containing hidefile/hidefile.c and unexpire/newtime.c. You can create this by running:

make zip

from the top-level directory (hw-6).

Make sure your name, netID, and RUID are in the comments of both files. Validate that the file is correct before submitting.