Introduction to C Programming

Introduction to C Programming by Rob Miles, Electronic Engineering


Files

When do we use Files?
Streams and Files
fopen and fclose
Mode String
File Functions
fread and fwrite
The End of the File and Errors

When do we use Files?

If you want your program to be properly useful you have to give it a way of storing data when it is not running. We already know that data can be stored in this way: it is how we have kept all the programs we have created so far, in files.

Files are looked after by the operating system of the computer. What we want to do is use C to tell the operating system to create files and let us access them. The good news is that although different operating systems use different ways to look after their files, the way in which you manipulate files in C is the same for any computer. We can write a program which creates a file on a PC and then use that program to create a file on a UNIX system, with no problems.

Streams and Files

C makes use of a thing called a stream. A stream is a link between your program and a file. Data can flow up or down your stream, so streams can be used both to read from and to write to files. The stream is the thing that links your program with the operating system of the computer you are using. The operating system actually does the work, and the C system you are using will convert your requests to use streams into instructions for whichever operating system you are using at the time.

C needs somewhere to keep track of a particular stream: it needs to be able to remember where you have got to in the file, what you are using the file for and so on. We do not need to know just what information C stores about each file, and this information may well be different for each operating system.

C hides all this from us by letting us talk in terms of a structure called a FILE. A FILE is a structure which holds information about a particular stream. We do not create or manipulate this structure; that is done by the input and output routines that come with our version of C. All we have to do is maintain a pointer to a FILE, so we can tell the C functions which one we want to use. If you are really interested, you can find out what a FILE is made of by looking in the file stdio.h.

All the functions to manipulate files are declared in the stdio.h file. They look very similar to the printf and scanf routines that we have used already. To remind you that they operate with files, all the file handling function names begin with "f".


fopen and fclose

The first step in using a file is to open it. Remember that a file can be used in different ways; for reading from or writing to, or perhaps both. C lets us protect files that we only want to read from by allowing us to open a file in read mode. This means that we are not allowed to change the contents of the file opened for reading. You tell the input/output system about the file you want to open by means of a mode string. This gives information about the type of file you are working on and the way in which it is to be used.

We open the file by using the function fopen. This has two parameters; the name of the file to be opened and the mode to use. It returns a pointer to a FILE structure which it creates, for example :

FILE * listing_output ;
FILE * program_input ;
listing_output = fopen ( "LISTING", "w" ) ; 
program_input = fopen ( "PROGRAM", "r" ) ;

This opens a couple of files, LISTING is opened for output and PROGRAM is opened for reading.

If there is a problem opening a file, for example because a file you want to read does not exist, fopen returns a special value called NULL (see the section on POINTERS for more about NULL). You must always make sure that the open has worked before you try to do anything with a file, e.g.
FILE * listing_output ;
listing_output = fopen ( "LISTING", "w" ) ;
if (listing_output == NULL ) {
	printf ( "I could not open your output file.\n" ) ;
}

You can use a string as the name of the file you want to open, so that you could ask the user for the name of the output file and then open a file with that name. Note that the file is opened in the current directory, and that you must observe the conventions of the computer you are using with regard to file names.

When you open a file for reading, you are assuming that the file exists. If it does not, fopen will give you an error. When you open a file for writing the file may or may not exist, and fopen will create one for you if it needs to. This can lead to problems: what we have here is a very good way of destroying data by mistake. When you open an existing file for writing, even if the file is enormous, fopen will clobber the contents and start writing at the top of the file. You do not get a warning; all you get is a sinking feeling as several weeks' work goes down the tubes.

Because I make a point of writing user-proof programs I therefore make sure that a user really wants to overwrite a file before I let him or her do it. This means that I find out if a file exists before I let the user write all over it. It is very easy to do this; you simply try to open the file for reading :

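/* this fragment needs stdio.h and string.h */
FILE * output_file = NULL ;
char output_file_name [50] ;
char reply [20] ;
for ( ; ; ) {
	printf ( "Give the name of your output file : " ) ;
	/* fgets cannot overrun the array the way gets can */
	if ( fgets ( output_file_name, sizeof ( output_file_name ), stdin ) == NULL ) {
		break ;	/* no more input, output_file stays NULL */
	}
	/* fgets keeps the newline, so chop it off */
	output_file_name [ strcspn ( output_file_name, "\n" ) ] = '\0' ;
	output_file = fopen ( output_file_name, "r" ) ;
	if ( output_file != NULL ) {
		/* the file exists, so ask before destroying it */
		fclose ( output_file ) ;
		output_file = NULL ;
		printf ( "Overwrite this file ? (Y or N) : " ) ;
		if ( fgets ( reply, sizeof ( reply ), stdin ) == NULL || reply [0] != 'Y' ) {
			continue ;
		}
	}
	output_file = fopen ( output_file_name, "w" ) ;
	if ( output_file != NULL ) {
		break ;	/* opened successfully */
	}
}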

This snippet of code will loop until the user gives us a file which they say we can overwrite, and the file is opened successfully. Note the use of the underrated continue half way down. This causes the loop to start again from the top, which makes the program ask for another filename. Note also the use of the function fclose. fclose causes the file to be closed. It has one parameter, the file pointer whose file needs to be closed.
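The "try to open it for reading" test can be wrapped up as a small function of its own. This is only a sketch: the name file_exists is invented for this example, and a file which exists but which you are not allowed to read will be reported as missing.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the C library.
   Returns 1 if the named file can be opened for reading, 0 otherwise. */
int file_exists ( const char * file_name )
{
	FILE * test_file = fopen ( file_name, "r" ) ;
	if ( test_file == NULL ) {
		return 0 ;	/* missing, or we are not allowed to read it */
	}
	fclose ( test_file ) ;	/* we only wanted to know, so tidy up */
	return 1 ;
}
```

A program could call file_exists ( output_file_name ) before opening the file with "w", and only ask the user the overwrite question when it returns 1.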

You must always close a file when you have finished with it. This is particularly important if you are writing to the file. The system does not switch on the disk drive to write just a single character to the disk; rather it waits until it has a load to write and then writes the lot in one go. This increases efficiency, but it does mean that at any time during your output some of the data is on the disk and the rest is in a buffer. Whatever is left in the buffer is written out when you call fclose. If you want to force the buffer to be emptied at any time, so that the file is up to date, but do not want to close the file, there is a function called fflush which will do this for you.
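As a rough illustration of fflush, here is a sketch of a function which writes one line to a log file and pushes it out straight away, so that the line is not left sitting in the buffer if the program falls over later. The name log_progress and the wording of the message are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write one progress line and flush it at once.
   Returns 0 on success, -1 if the write or the flush failed. */
int log_progress ( FILE * log_file, int step )
{
	if ( fprintf ( log_file, "finished step %d\n", step ) < 0 ) {
		return -1 ;
	}
	/* fflush gives back 0 if the buffer was written, EOF if not */
	if ( fflush ( log_file ) != 0 ) {
		return -1 ;
	}
	return 0 ;
}
```

Flushing after every line throws away most of the efficiency that the buffer gives you, so this is something to do for important messages rather than for bulk output. The file still needs an fclose when you have finished with it.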

Mode String

The fopen function is told how to open the file by means of the mode string. The file opened is marked with the mode which is being used, and then other file input-output functions look at the mode before doing anything. This is how C stops you from writing to a file which was opened for input. The mode string can contain the following characters.

w
open the file for writing. If the file does not exist it is created. If the file does exist it is emptied, and we start with a new one.

Opens using w fail if the operating system is unable to open a file for output. This would usually be because the disk we are using is write protected or full, or if you are on a system which can share files, and somebody else has connected their program to the file in question.

r
open the file for reading. You will be unable to write into the file, but can read from it.

Opens using r fail if the file does not exist, or if the file is protected in some way which denies us access.

a
open the file for append. If the file exists we are moved to the end of the file, i.e. if we send any data to a file opened for append it will be placed on the end of the file. If the file does not exist it is created for us.

Opens using a generally fail for the same reasons as opens using w.

b
open the file as a binary file. Essentially there are two kinds of data on a computer. Stuff which makes sense to us, and stuff which makes sense to the machine itself. Stuff which makes sense to us is in the form of text, i.e. nothing in the file other than letters and numbers etc. Stuff which makes sense to the computer includes program files and any data file which needs to be translated by a program before it can be understood by people, e.g. spreadsheet data files. This data is called binary.

If you open a file of type binary you are telling C that you want it to send the data exactly as you output it. i.e. it must not perform any translations which make this file easier for humans to make sense of.

t
open the file as a text file. Text files only contain printable characters, i.e. things that you or I would like to see. C will therefore make some changes to the file when it is output, usually in terms of what happens at the end of a line of text: some computer systems use two characters to mark the end of a line and others only use one. Text is what you get if you do not give b. The letter t itself is an extension offered by some compilers rather than part of standard C, so check that yours accepts it.

When a text file is opened the C input/output routines will perform the translation required. If you open a binary file as a text file you will notice very strange things happen. Remember that the C input/output system has no way of knowing which kind of file you really want to use, and so will do the wrong thing if you tell it to.

+
This means that you want to use the file for both reading and writing. You can put + after r, w or a. If you put it after an r (read) it means that the file is not destroyed if it exists, and that an error is produced if the file does not exist. If you put + after a w (write) it means that the file is destroyed if it exists, and a new file is created if required. If you put + after an a (append) the file is kept if it exists and created if it does not, and anything you write goes on the end.
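The difference between w and a is easiest to see by trying it. The following is a minimal sketch; put_line and count_lines are names invented for this example, not library functions.

```c
#include <stdio.h>

/* Write one line of text to the named file using the given mode.
   Returns 0 on success, -1 if the file could not be opened or closed. */
int put_line ( const char * file_name, const char * mode, const char * text )
{
	FILE * output_file = fopen ( file_name, mode ) ;
	if ( output_file == NULL ) {
		return -1 ;
	}
	fprintf ( output_file, "%s\n", text ) ;
	return fclose ( output_file ) == 0 ? 0 : -1 ;
}

/* Count the lines in the named file, or return -1 if it cannot be opened. */
int count_lines ( const char * file_name )
{
	FILE * input_file = fopen ( file_name, "r" ) ;
	int ch ;
	int lines = 0 ;
	if ( input_file == NULL ) {
		return -1 ;
	}
	while ( ( ch = fgetc ( input_file ) ) != EOF ) {
		if ( ch == '\n' ) {
			lines++ ;
		}
	}
	fclose ( input_file ) ;
	return lines ;
}
```

If you call put_line with "w" and then again with "a" on the same file, count_lines reports two lines. A further call with "w" empties the file first, so count_lines then reports just one.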

Some examples of file open modes and what they mean :

"rb+"
Open the file for reading and writing in binary mode. If the file does not exist do not create it. If the file does exist do not destroy it.
"wt"
Open the file for writing in text mode. If the file does exist destroy it. If the file does not exist create one.
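
The effect of text and binary modes can be seen by counting the bytes which actually reach the disk. This is a sketch under the assumption that the file is small enough to read a character at a time; count_bytes and write_two_lines are names invented for this example.

```c
#include <stdio.h>

/* Count the bytes really stored in a file by reading it in binary mode.
   Returns -1 if the file cannot be opened. */
long count_bytes ( const char * file_name )
{
	FILE * input_file = fopen ( file_name, "rb" ) ;
	long bytes = 0 ;
	if ( input_file == NULL ) {
		return -1 ;
	}
	while ( fgetc ( input_file ) != EOF ) {
		bytes++ ;
	}
	fclose ( input_file ) ;
	return bytes ;
}

/* Write two short lines (six characters in all) using the given mode.
   Returns 0 on success, -1 on failure. */
int write_two_lines ( const char * file_name, const char * mode )
{
	FILE * output_file = fopen ( file_name, mode ) ;
	if ( output_file == NULL ) {
		return -1 ;
	}
	fputs ( "ab\ncd\n", output_file ) ;
	return fclose ( output_file ) == 0 ? 0 : -1 ;
}
```

After write_two_lines with "wb" the file holds exactly six bytes on any system. After write_two_lines with "w" it holds six bytes on a system which marks the end of a line with one character, and eight on a system which uses two.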

File Functions

As I said above, we can use the functions fprintf and fscanf to communicate with our files :

fprintf ( listing_output, "This is a listing file\n" ) ;
fscanf ( program_input, "%d", &counter ) ;

The functions work in exactly the same way as scanf and printf, except that they use the file linked to the FILE pointer rather than the keyboard and screen. There are file versions of all the input and output functions we have covered so far.
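
To show the two functions working together, here is a small sketch which saves a number in a text file and reads it back again. The function names save_counter and load_counter are invented for this example.

```c
#include <stdio.h>

/* Store one integer as text. Returns 0 on success, -1 on failure. */
int save_counter ( const char * file_name, int counter )
{
	FILE * listing_output = fopen ( file_name, "w" ) ;
	if ( listing_output == NULL ) {
		return -1 ;
	}
	fprintf ( listing_output, "%d\n", counter ) ;
	return fclose ( listing_output ) == 0 ? 0 : -1 ;
}

/* Read the integer back. Returns 0 on success, -1 on failure. */
int load_counter ( const char * file_name, int * counter )
{
	FILE * program_input = fopen ( file_name, "r" ) ;
	int items ;
	if ( program_input == NULL ) {
		return -1 ;
	}
	/* like scanf, fscanf tells us how many items it managed to read */
	items = fscanf ( program_input, "%d", counter ) ;
	fclose ( program_input ) ;
	return items == 1 ? 0 : -1 ;
}
```

Because the number is stored as text you can look at the file with an editor and see the digits, which is handy when you are trying to find out why a program is misbehaving.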

In addition there are some very useful functions which let us save and load chunks of memory in files. These are very useful when you come to put structures and arrays into files. You might think that to save an array to disk you have to write each individual element out using some kind of loop. You can do this, but there is a much easier way of doing it.

fread and fwrite

You can regard an array, or an array of structured variables, as simply a block of memory of a particular size. The input/output routines provide you with routines that you can call to drop a chunk of memory onto a disk file with a single function call. However, there is one thing you must remember. If you want to store blocks of memory in this way the file must be opened as a binary file. What you are doing is putting a piece of the program memory out on the disk. You do not know how this memory is organised, and will never want to look at it directly, so you must open the file as a binary file. If you want to print or read text you use the fprintf or fscanf functions.

The function fwrite sends a block of memory out to the specified file. To do this it needs to know three things; where the memory starts, how much memory to send, and the file to send the memory to. The location of the memory is just a simple pointer, the destination is a pointer to a FILE which has been opened previously, the amount of memory to send is given in two parts, the size of each chunk, and the number of chunks. This might seem rather long winded, but is actually rather sensible. Consider :

typedef struct
{
	char name [30] ;
	char address [60] ; 
	int account ;
	int balance ;
	int overdraft ;
} customer ;

customer Hull_Branch [100] ;
FILE * Hull_File ;
	.
	.
	.
fwrite ( Hull_Branch, sizeof ( customer ), 100, Hull_File ) ;

The first parameter to fwrite is the pointer to the base of the array of our customers. The second parameter is the size of each block to save. We can use sizeof to find out the size of the structure. The third parameter is the number of customer records to store, in this case 100 and the final parameter is the file which we have opened. Note that if we change the number of items in the customer structure this code will still work, because the amount of data saved changes as well.

The opposite of fwrite is fread. This works in exactly the same way, but fetches data from the file and puts it into memory:

fread ( Hull_Branch, sizeof ( customer ), 100, Hull_File ) ;

If you want to check that things worked, these functions return the number of items that they transferred successfully :

if ( fread ( Hull_Branch, sizeof ( customer ), 100, Hull_File ) < 100 ) {
	printf ( "Not all data has been read!\n\n" ) ;
}
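
Putting the pieces together, the following sketch saves an array of customers and loads it back, opening the file in binary mode both times. The functions save_customers and load_customers are invented for this example, and the file is assumed to be read back by a program built with the same compiler on the same kind of machine, since the layout of the structure in memory can differ between systems.

```c
#include <stdio.h>

typedef struct
{
	char name [30] ;
	char address [60] ;
	int account ;
	int balance ;
	int overdraft ;
} customer ;

/* Returns the number of records written, or 0 if anything went wrong. */
size_t save_customers ( const char * file_name, const customer * list, size_t count )
{
	FILE * output_file = fopen ( file_name, "wb" ) ;
	size_t written ;
	if ( output_file == NULL ) {
		return 0 ;
	}
	written = fwrite ( list, sizeof ( customer ), count, output_file ) ;
	if ( fclose ( output_file ) != 0 ) {
		return 0 ;	/* the last of the buffer did not reach the disk */
	}
	return written ;
}

/* Reads up to max_count records. Returns the number actually read. */
size_t load_customers ( const char * file_name, customer * list, size_t max_count )
{
	FILE * input_file = fopen ( file_name, "rb" ) ;
	size_t got ;
	if ( input_file == NULL ) {
		return 0 ;
	}
	got = fread ( list, sizeof ( customer ), max_count, input_file ) ;
	fclose ( input_file ) ;
	return got ;
}
```

Note that load_customers can be given an array which is bigger than the file. The value it returns tells you how many of the elements have been filled in.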

The End of the File and Errors

Any file maintained by the operating system has a particular size. As you write a file it is made bigger until you close it, so the only problems that you have when sending output to a file are concerned with what happens when you run out of disk space. We have already seen above that the standard C input/output functions can tell you how many items they have successfully transferred; you should always use the value they give back to test that your dealings with files are going properly.

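When a read gives you less than you asked for it is useful to be able to find out whether this is because you have reached the end of the file. The function feof can be used to find out if you have hit the end :

if ( feof ( input_file ) ) printf ( "BANG!\n" ) ;

This function returns the value 0 if the end of the file has not been reached, and another value if it has. It takes a FILE pointer as a parameter. Note that feof only notices the end of the file once a read has actually tried to go past it, so you should attempt the read first and call feof afterwards, rather than using feof to decide whether to read.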

Note that binary and text files have different methods of determining the end of a file. If you are having problems because you keep reaching the end of the file before you think you should it may be because you have opened the file in the wrong mode.

feof has a twin brother called ferror which can be called to find out if an error has occurred during a file operation. It is worth calling this if you find that fewer items have been transferred by a read or a write than you expected. Like feof, ferror returns 0 if all is well and another value if something has gone wrong. On most systems the reason for the failure is left in the variable errno, and the values it can take are specific to the operating system you are using. They can be used to make your program more friendly, for example your program could tell the difference between "no disk in the drive" and "the disk has completely failed".
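
A typical reading loop keeps going until a read fails and only then asks feof and ferror why it stopped. The sketch below adds up the whole numbers in a text file. The name sum_numbers and the choice of return values are assumptions made for this example.

```c
#include <stdio.h>

/* Add up the integers in a text file.
   Returns  0 if the whole file was read,
           -1 if the file could not be opened or a read error occurred,
           -2 if something which is not a number got in the way. */
int sum_numbers ( const char * file_name, long * total )
{
	FILE * input_file = fopen ( file_name, "r" ) ;
	int value ;
	int result ;
	if ( input_file == NULL ) {
		return -1 ;
	}
	*total = 0 ;
	/* read first, and ask questions only when a read fails */
	while ( fscanf ( input_file, "%d", &value ) == 1 ) {
		*total += value ;
	}
	if ( ferror ( input_file ) ) {
		result = -1 ;	/* the system reported a fault */
	} else if ( feof ( input_file ) ) {
		result = 0 ;	/* we simply ran out of file */
	} else {
		result = -2 ;	/* stopped early on bad data */
	}
	fclose ( input_file ) ;
	return result ;
}
```

Given a file containing 1 2 3 the function returns 0 and sets the total to 6. Given a file containing 1 x it returns -2, because the read stopped at the x before the end of the file was reached.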


Rob Miles, R.S.Miles@e-eng.hull.ac.uk, Electronic Engineering
HTML by Bronwen Reid, July 1995