Introduction to C Programming

Introduction to C Programming by Rob Miles, Electronic Engineering


Strings

How long is a piece of string?
Putting Values into Strings
Using Strings
The String Library
strcpy
strcmp
strlen
Reading and Printing Strings
Bomb Proof Input

How long is a piece of string?

Strings are computer jargon for lumps of text. The computer itself can get by quite happily with numbers, but we fuddy duddy old humans seem to prefer chunks of text; for example, I prefer to be referred to as Rob Miles rather than 0883059276! If you want to write a program which refers to someone by their name, rather than some meaningless numbers, you need to have a mechanism for storing it. Some programming languages have string handling built into them. The best example is BASIC, which contains special instructions for string manipulation. C is not like BASIC. In C you have to set up strings yourself, the hard way. At first this is more tedious, but you will find that the C way of doing things is much more flexible.

So, what do we mean by string? A string is any sequence of characters. A character is the kind of thing you get from a single keypress on the keyboard, or the thing you see in one position on the screen. We already know about the C data type called char, so you can regard a string as a series of chars. We have just seen that in C the way to get yourself a series of memory locations is to declare an array of them. This means that in C a string is simply stored in an array of characters:

char name [20] ;

The above would declare an array of 20 characters, which can hold a name of up to 19 characters (the reason it is 19 and not 20 is explained below).

There is just one more thing that you need to know about strings. Suppose I put my name in the above string. I put the characters 'R', 'o', 'b' into the first three locations. Then I have a problem. The array has been set up with room for much longer names. This is to allow for unfortunates called Murgatroyd, who need the space, but I do not. I need to have a way of telling C that this is the end of my name, and that the remaining spaces are not used.

C lets me do this by use of a special convention: all strings are terminated by a character holding the value 0. To terminate my name, I simply put a 0 into the location after the last character. You could, I suppose, regard 0 as the Arnold Schwarzenegger of characters! The 0 at the end of the string is often called a null character, and in a C program it is written as '\0'. Maybe we are back to Arnold Schwarzenegger again! Do not confuse the null character with the character '0'. '0' is the character you get when you press the 0 key on the keyboard, and it is represented by the character code 48 (in the ASCII character set). The null character has the code 0 and is never typed in; it is an internal marker used by C.

This means that the data space after the terminator is unused by my program. This can be a problem, but is not worth losing much sleep over. Later on we will look at ways of getting exactly the amount of memory space that we want.

Remember that when you create a string to hold some text, you must allow space for the terminator as well, i.e. name [20] has room for 19 characters of the name, followed by the terminator. Do not make the mistake of saying, "Ah well, if the name is 20 characters I do not need the terminator, because C knows the array is only 20 characters long". This is not the case: C does not keep track of how long your arrays are. If you leave the terminator off, C will wander down memory looking for a null, and consequently think that your string is a lot longer than it actually is!
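A minimal sketch of the terminator convention, assuming an ASCII system. The hypothetical function store_rob fills in a 20 character array by hand, finishing with the null, and the hypothetical print_name prints it with %s, which stops at the null and never looks at the 16 unused elements after it.

```c
#include <stdio.h>

/* Hypothetical helper: builds "Rob" by hand in a 20 character array.
   Only the first four elements are used: three letters and the null. */
void store_rob ( char name [20] )
{
	name [0] = 'R' ;
	name [1] = 'o' ;
	name [2] = 'b' ;
	name [3] = '\0' ; /* the terminator: value 0, not the digit '0' (48 in ASCII) */
	/* name [4] to name [19] are left unused */
}

/* Hypothetical helper: prints the name on its own line. printf stops
   at the null, so whatever is in the unused elements is never printed. */
void print_name ( const char name [20] )
{
	printf ( "%s\n", name ) ;
}
```

Calling store_rob on an array and then print_name on the same array prints Rob.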


Putting Values into Strings

The fact that strings are arrays of characters with a null on the end is something which the C compiler already knows. We put string constants into our code by putting them between double quote characters :

"hello mum"

When the compiler sees the first " it says "Aha! Here is a string. I will store it in memory, and when I see the closing " I will put a null on the end. I will then create a pointer to this piece of text and use that in the program." Quite a mouthful huh? Note however that the pointer which is created is a constant. This means that you cannot change where it points. There is a good reason for this; if you did change the pointer you would then have a lump of memory which could not be accessed, because nothing would point to it. You should not try to change the characters of a string constant either; the C standard says that the result of doing so is undefined. If you want to put starting values into strings (or any kind of array) you have to do something like this:

char name [20] = "Fred Bloggs" ;
You can use the same = trick to put an initial value into any variable.
The declaration creates an array of characters, which you can regard as a string. True to the way that arrays work, when you use name in an expression it gives you a pointer to this area of memory. When the compiler processes the initialiser "Fred Bloggs" it copies the characters of the string, and the terminator, into the array, so that name ends up with :
F r e d   B l o g g s null

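in it. Any elements left over after the terminator, here the last 8, are set to 0.

Very old C compilers, written before the ANSI standard, only allowed you to initialise arrays which were declared as global variables, i.e.

char setting [10] = "off" ;

was OK but

void message (void)
{
	char setting [10] = "off" ;
}

- would cause a compilation error, because this time setting is a local variable. ANSI C removed this restriction, so any modern compiler accepts both versions. Simple local variables could always be initialised, i.e.

void message (void)
{
	int i = 99 ;
}

- has always been OK.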
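A small sketch that checks what an initialised string array really contains. It uses a global array, so it is valid on old and new compilers alike; demo_name and tail_is_zero are hypothetical names made up for this illustration.

```c
#include <string.h>

/* A global array initialised from a string constant: the compiler copies
   the 11 characters of "Fred Bloggs" and the null into the first 12
   elements, and sets the remaining 8 elements to 0. */
char demo_name [20] = "Fred Bloggs" ;

/* Hypothetical helper: returns 1 if every element from the terminator
   to the end of the array holds 0, otherwise returns 0. */
int tail_is_zero ( void )
{
	size_t i ;

	for ( i = strlen ( demo_name ) ; i < sizeof demo_name ; i++ ) {
		if ( demo_name [i] != '\0' ) {
			return 0 ;
		}
	}
	return 1 ;
}
```

Here strlen ( demo_name ) is 11 while sizeof demo_name is 20: the array keeps its full size whatever text is in it. A string constant always takes one more byte than its visible characters, so sizeof "hello mum" is 10.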


Using Strings

As far as C is concerned, a string is simply an array of characters with a null on the end. You can use all the normal array and pointer operations on this chunk of memory. For example, consider the problem of taking a full name as above, and printing out only the surname portion.

The first thing we must decide is how we determine where the surname starts. The convention would seem to be that the surname starts after the space in the name. What we therefore have to do is print the name, starting from the character after the space. This means that the first task is to find the space in the name. We do this by searching down the array, starting at the beginning and looking for a space character. When we see one we stop. The following code will do this :

/* our array name will hold the name */
/* we will see how to read the name later */
char name [20] ;

/* position will hold the position in the name */
int position ;
position = 0 ;	/* look from the start */
while ( name [position] != ' ' ) {
	/* If not a space... */ 
	position++ ;
	 /* ..move on to next */
}
/* When we get here position has the subscript value */ 
/* of the space in our string. */

The variable position contains the subscript of the space in our name. We want to start printing from the character after the space. We must therefore move on to the next location :

position++ ;
/* Move down one */

OK. Now for the interesting bit. We will be giving printf a pointer to the character which we want to start printing at. We know that name [position] is the first character that we want to print out. We also know that, when given the %s format, printf is supplied with the position in memory to print from, and prints until it sees a null. What we therefore want to do is start printing from the position of the first character, up to the end of the string. We can do this in C with :

printf ( "%s", &name [position] ) ;

We can put an & in front of any variable to get the address of it. By giving &name [position] we are giving the address of the 'B' of "Fred Bloggs". We know that after that character comes the surname, so the program will print out what we want. (We pass the text through a "%s" format, rather than handing it to printf as the format itself, so that a % character in the name cannot confuse printf.) If you do not believe this, try running the program!

This piece of code is not perfect. You might like to consider what would happen if the user gave a name with no space in it (the loop would run straight past the terminator and on through memory), or a name with several spaces between the first name and the surname. I would expect all of you to write programs which would worry about these things as a matter of course. This technique is called defensive programming and translates to "make sure the problem appears somewhere else". You should practice defensive techniques every time you write some code. Once you have worked out how to solve the problem you should go back and wonder how your solution can go wrong, and add extra handling for that!
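One possible defensive version of the surname search, as a sketch. find_surname is a hypothetical name, and the choices it makes (returning NULL when there is no surname, skipping any number of spaces) are assumptions about what the program ought to do, not rules from the text.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the surname within name,
   or NULL if there is no space followed by some more text. */
const char * find_surname ( const char * name )
{
	int position = 0 ;

	/* look for the first space, but never go past the terminator */
	while ( name [position] != ' ' && name [position] != '\0' ) {
		position++ ;
	}
	/* skip over however many spaces there are */
	while ( name [position] == ' ' ) {
		position++ ;
	}
	if ( name [position] == '\0' ) {
		return NULL ; /* no space, or nothing after the spaces */
	}
	return &name [position] ;
}
```

The caller must then check the result before using it, for example if ( surname != NULL ) printf ( "%s\n", surname ) ; so the problem is handled rather than passed on.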

Note that you could have done the same job using pointers rather than array elements and subscripts. See if you can understand this :

/* name array as before */
char name [20] ;

/* points to the start of the surname */
char * SurnameStart  = name ;

while ( *SurnameStart != ' ' ) {
	SurnameStart++ ; /* skip to the space */
}
SurnameStart++ ; /* move past the space */
printf ( "%s", SurnameStart ) ;
/* print starting at the surname */

The code does exactly the same job, the difference is only in how it is expressed. You might like to consider which of the two programs is better and why.


The String Library

Unlike some other languages, for example BASIC, C does not have any string handling "built in". Instead, as for input/output, it relies on a set of library routines which are supplied with the C system. These routines are part of the standard C library, so they are the same in every standard C system, and they are declared in the string.h header file, which you should #include in any program that uses them. You can use these routines to do string copying, comparison and concatenation for you.

Here are a few routines you may find useful :

strcpy

char * strcpy ( char * dest, const char * source ) ;

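String copy. Has two parameters, both of them pointers to char. Will copy characters from the source to the destination, up to and including the null terminator of the source. strcpy does not check the size of the destination, so you must make sure that it is big enough to hold the whole string: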

char name [20] = "Fred Bloggs" ;
char  safety [20] ;

strcpy ( safety, name ) ;

would result in safety holding the string "Fred Bloggs".

strcpy returns dest, the pointer to the destination string that you gave it. Note that if you use strcpy with an un-terminated string on the input you will get big problems!
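To show that there is no magic in strcpy, here is a sketch of how a function with the same job could be written. copy_string is a hypothetical name used only for illustration; in real programs use the library strcpy.

```c
/* Hypothetical strcpy-like function: copies source into dest,
   including the terminating null, and returns dest. */
char * copy_string ( char * dest, const char * source )
{
	char * start = dest ;

	/* copy a character, then stop if the one just copied was the null */
	while ( ( *dest = *source ) != '\0' ) {
		dest++ ;
		source++ ;
	}
	return start ; /* like strcpy, hand back the destination */
}
```

Like strcpy, it has no idea how big dest is; if source is longer than the destination array it will write straight past the end.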

strcmp

int strcmp ( const char * s1, const char * s2 ) ;

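String compare. Compares one string with another, and returns a positive value if the first string is greater than the second, 0 if the two strings are the same and a negative value if the first string is less than the second. (Many versions of C return exactly 1 and -1, but you should only ever test the sign of the result.) C uses the character codes of the strings to decide on greater and less than. On a system using the ASCII character set this means that digits come before upper case letters, which come before lower case letters; i.e. 1 < A, A < a, a < b. So "Zebra" is less than "apple"!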

printf ( "%d", strcmp ("Fred", "Jim") ) ;

would print out a negative number (often -1), because Fred is less than Jim alphabetically.
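Because only the sign of the strcmp result is guaranteed, comparisons should test for < 0, == 0 or > 0 rather than for -1 or 1. A sketch, using a hypothetical helper called earlier_name:

```c
#include <string.h>

/* Hypothetical helper: returns whichever of two strings comes first
   in character code order. Only the sign of strcmp is tested. */
const char * earlier_name ( const char * a, const char * b )
{
	if ( strcmp ( a, b ) <= 0 ) {
		return a ; /* a is less than or equal to b */
	}
	return b ;
}
```

On an ASCII system earlier_name ( "Fred", "Jim" ) gives "Fred", but earlier_name ( "apple", "Zebra" ) gives "Zebra", because upper case letters have smaller character codes than lower case ones.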

strlen

size_t strlen ( const char * string ) ;

String length. Used to find out the length of a string. You should be able to write a function to do this yourself, but like any good programmer you will always look for the easy way to do something.

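printf ( "%zu", strlen( "HelloMum" )) ;

would print out 8. (Note that the terminator is not counted as a character in the string. strlen returns a value of type size_t, an unsigned type, which printf expects to be given with the %zu format.)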
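The text suggests that you could write your own string length function. Here is one way to do it, as a sketch; string_length is a hypothetical name, and it returns size_t to match the library strlen.

```c
#include <stddef.h>

/* Hypothetical strlen-like function: counts the characters before
   the terminating null. */
size_t string_length ( const char * string )
{
	size_t count = 0 ;

	while ( string [count] != '\0' ) { /* stop at the terminator */
		count++ ;
	}
	return count ; /* the terminator itself is not counted */
}
```

string_length ( "HelloMum" ) gives 8 and string_length ( "" ) gives 0, the same answers as strlen.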


Reading and Printing Strings

You will often want to read strings from the keyboard, and print them out. C provides another format specifier, %s to mean a string. This means that you can read a string from the user with code which looks like this :

char YourName [50] ;
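scanf ( "%49s", YourName ) ;
printf ( "Hello %s\n", YourName ) ;

Note that because YourName is an array, its name on its own gives a pointer to the first char (that's how arrays work), so you do not need to put the & operator in front of it. The 49 in %49s tells scanf to read at most 49 characters, leaving room for the terminator in the 50 character array.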

However, doing formatted reads with strings is not usually a good idea. C has in its little head the idea of white space. White space is a gap which marks the end of one thing and the start of another. Things separated by white space are different values, for example

2   3

- is not the value 23, but the value 2 followed by the value 3, with white space in between. White space can be described as :

Any number of spaces or newlines or tab characters.

This means that the string :

Rob Miles

- is actually two strings, Rob followed by Miles. If you really want to read in a line of data which may contain white space you should use a function which reads a whole line. Older books use gets (get string), but gets has no way of knowing how big your buffer is, and it was removed from the language in the C11 standard. Instead use fgets, which is also in stdio and fetches characters until the end of a line, so you can read data containing spaces :

char buffer [120] ;
fgets ( buffer, sizeof buffer, stdin ) ;

Note that fgets is told the size of the buffer, and will never store more than sizeof buffer - 1 characters plus the terminator. Unlike gets, it keeps the newline character at the end of the line, if there was room for it. The trick is to reserve the length of a terminal line plus a bit; I usually make such lines 120 characters long. If I wanted to hold the name in a smaller sized string I would copy it there once I had read it in, and this time I would check the length!
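fgets, unlike the old gets, leaves the newline in the buffer, which is usually not wanted in a name. The sketch below wraps fgets in a hypothetical helper called read_line that removes the newline using strcspn (a standard string.h function that returns the length of the first part of a string containing none of the given characters). Reading from a FILE * rather than always from stdin is a design choice made here so the same function works for files too; for keyboard input call it as read_line ( buffer, sizeof buffer, stdin ).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from in into buffer (at most
   size - 1 characters) and removes the newline that fgets keeps.
   Returns 1 on success, 0 when there is no more input. */
int read_line ( char * buffer, int size, FILE * in )
{
	if ( fgets ( buffer, size, in ) == NULL ) {
		return 0 ; /* nothing left to read */
	}
	buffer [ strcspn ( buffer, "\n" ) ] = '\0' ; /* chop off the newline, if any */
	return 1 ;
}
```

If the user types "Rob Miles", the whole line arrives in the buffer, whereas reading that line with a %s format would still give only "Rob". If a line is longer than the buffer, the rest of it stays waiting in the input and is returned by the next call; a fuller version would check for that.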

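While we are on the subject of useful routines in stdio I will mention getchar. This reads a single character from the input and returns it. On most systems keyboard input is passed to the program a line at a time, so the user still has to press the return key before getchar sees anything. getchar returns an int rather than a char, because as well as any character it can return the special value EOF when there is no more input :

int ch ;
ch = getchar ();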

Note that you have to put the () after the call of getchar so that the compiler can tell that this is a function we are calling.
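Because getchar can return EOF as well as a character, its result belongs in an int. The hypothetical helper below reads the first character of a line and then throws away the rest of that line, which is handy after a "Press y or n" style prompt. It uses getc ( in ), which does the same job as getchar () but for any input stream, so read_first_char ( stdin ) behaves like getchar () followed by a clear-out of the line.

```c
#include <stdio.h>

/* Hypothetical helper: returns the first character of the next line
   (or EOF if there is no input) and discards the rest of that line,
   including the newline. */
int read_first_char ( FILE * in )
{
	int first = getc ( in ) ; /* int, not char, so that EOF fits */
	int ch = first ;

	while ( ch != '\n' && ch != EOF ) {
		ch = getc ( in ) ; /* throw away the rest of the line */
	}
	return first ;
}
```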


Bomb Proof Input

Users are stupid. Really stupid. They find ways of crashing your programs which you would never think of in a million years. The experience of seeing some idiot with half a brain cell blow away your wonderful program with a single keypress is a very depressing one. You must always bear in mind that if someone crashes your program you look stupid and they look clever. Any programmer that says to the user "You idiot, you have crashed my program" is not a Real Programmer, he is just a pretender. What you should say is "Oh Dear, what keys did you press", and then take steps to ensure that it never happens again.

One of the places where things can go wrong is when your program innocently asks for a number :

int Age ;
printf ( "How old are you : " ) ;
scanf ( "%d", &Age ) ;
printf ( "You are %d years old.\n", Age ) ;

This is a perfectly legal piece of code, but what happens if your user types :

twenty five

- and presses return. Try it!

Who made the mistake, was it the programmer or the user? Since we have not got the time to discuss this, we should simply make sure that our program is proof against it, and says something like "I did not recognise that value, please type in a value, e.g. 25, and press return"

We do this by always regarding input from the user as a string. We read the string in and try to convert it to a number. If the conversion fails, we ask for another string.

To do this we use a very useful function in stdio called sscanf. This does all the things that scanf does, but takes the input from a string, rather than the keyboard. Like scanf, sscanf returns the number of items it successfully read, so that we know if the values made sense. Consider the following :

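#include <stdio.h>

int main (void)
{
	char buffer [120] ;
	int value, result ;
	do {
		printf ( "Give me a value : " ) ;
		if ( fgets ( buffer, sizeof buffer, stdin ) == NULL ) {
			return 1 ; /* no more input at all, so give up */
		}
		result = sscanf ( buffer, "%d", &value ) ;
	} while (result != 1) ;
	printf ( "%d\n", value ) ;
	return 0 ;
}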

If you run this program and give it some values you can only get out by giving a valid integer. In your programs you should always use this technique when reading numbers from users.
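One loophole remains: sscanf with "%d" stops at the first character that cannot be part of a number, so "25abc" is accepted as 25. Below is a sketch of a stricter check using the standard %n conversion, which stores how many characters have been consumed so far and does not count towards the value sscanf returns. parse_whole_int is a hypothetical name, and this version does not catch numbers too big to fit in an int.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the number in *value only
   if line holds a single integer, optionally surrounded by white space
   (including the newline that fgets leaves behind). Returns 0 otherwise. */
int parse_whole_int ( const char * line, int * value )
{
	int used = 0 ; /* characters consumed by sscanf, filled in by %n */

	/* %d reads the number, the space skips any white space after it */
	if ( sscanf ( line, "%d %n", value, &used ) != 1 ) {
		return 0 ; /* no number at the start */
	}
	return line [used] == '\0' ; /* anything left over is junk */
}
```

In the loop above, the sscanf line could be replaced with result = parse_whole_int ( buffer, &value ) ; and the test result != 1 still works, but now "25abc" is rejected.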


Rob Miles, R.S.Miles@e-eng.hull.ac.uk, Electronic Engineering
HTML by Bronwen Reid, July 1995