Unix and C - Lesson 1

For the first lesson, I will talk a bit about Makefiles and then introduce the basics of C.

Makefiles
C projects often consist of numerous files that all need to be compiled together to create the final output. Memorizing all the necessary commands is tedious and error-prone, so many people instead use GNU Make and corresponding Makefiles.
The official GNU Make homepage and documentation can be found here:
GNU Make - Free Software Foundation

In a nutshell, the syntax for a Makefile is:

VARIABLE=value
CC=gcc
CFLAGS=-Wall

target: dependency1 dependency2
	command1
	command2

example: example.c example.h
	$(CC) $(CFLAGS) -o example example.c

dependency1:
	echo "hello there"

dependency2:
	echo "how are you?"

In the above example, four targets are defined: 'target', 'example', 'dependency1' and 'dependency2'.
A target name is followed by a colon (:), then an optional list of dependencies. Dependencies can refer either to files, or to other targets. For example, the 'target' target depends on the targets 'dependency1' and 'dependency2'.
The 'example' target depends on the files 'example.c' and 'example.h'. If those files do not exist and no rule says how to create them, make will produce an error.
To refer to a variable, the syntax is $(VARIABLE), as in the command for the 'example' target.
Target commands MUST be indented with a single tab character (not spaces). If they are not, make will stop with an error such as "missing separator". Only commands should be indented.

It is very common to define variables for your C compiler, C linker, and the flags (or options) to pass to the compiler and linker. They are often called CC, LINK, CFLAGS and LFLAGS (GNU Make's own built-in rules use CC, CFLAGS and LDFLAGS). This is by no means a rule; it is just very common.

By default, GNU Make looks for a file named 'GNUmakefile', 'makefile' or 'Makefile' (without any extension), in that order; 'Makefile' is the conventional choice. To specify a different makefile, you would use:
make -f othermakefile

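Make is usually invoked simply by typing the following, which builds the first target in the Makefile (to build a different target, name it, as in 'make example'):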
make

If you have any questions, please ask me or refer to the GNU Make documentation.

Note: you are not required to use Make or to know how to use it. Make is a common way of automating the compile process, but you can run all of the commands that make runs by hand.

In a nutshell, to manually compile a program containing only one file, you would type:
gcc -o output input.c
In this case, the executable file produced is called 'output' (on Windows it would be 'output.exe'), and the C source file is called 'input.c'.

Introduction to C

Now, on to the actual C language.

When a compiler builds C code, four different phases occur. It is not terribly important that you know these, but I will list them here for the sake of curiosity, and so you have some idea of what the computer is actually doing during compilation.
- 1. Preprocess. This phase processes all the preprocessor directives in your C code. A preprocessor directive is any line that begins with a '#'. Directives allow you to create macros and define constants, to include header files that declare library functions, and to produce conditionally compiled code. More on that later.
- 2. Compile. Compiling is often the term given to the whole process, but technically it refers to translating the preprocessed code into assembly language, a human-readable form of the machine's instructions. This step is hardware dependent and produces different instructions depending on the platform you're compiling for (e.g., Intel x86, Alpha, MIPS, 68000, PowerPC, etc.).
- 3. Assemble. This is when the assembly language is translated into machine code and stored in what is called an "object file". Object code is a mix of machine code and references, or "tags", to symbols whose addresses are not yet known. C code often refers to functions and variables that are defined in other files or in external libraries (such as glibc, the GNU C library, or Wsock32.dll, the Windows socket library); the object code keeps tags for those symbols until linking.
- 4. Link. This is when the object files and any libraries used are combined. The linker replaces the tags with the actual addresses of the functions and variables they refer to, and creates an executable program as the output.

Syntax of C

C has many constructs for controlling the flow of a program, each with their own syntax. I will cover the basics here.

Remarks / Comments

Every good C programmer makes thorough use of comments! Comments allow you to place human-readable commentary on how a program works alongside the code itself. Writing good comments makes code considerably easier to understand, so err on the side of liberal comments.

In the olden days, there was only one syntax for comments. A second one, borrowed from C++, was added to the language in the C99 standard.

/* comments */
Anything between the symbols '/*' and '*/' is considered a comment. Nesting such comments is not allowed. Newlines within such comments are ignored. This is the original C style comment.

// comments
Anything AFTER the '//' symbol on a line is ignored. These comments end at the newline. This is a C++ style comment, but it has also been valid C since the C99 standard.

Examples:
/*
   this is a comment.  The compiler will ignore it, but humans can take note of
   it.  The comment ends right here: */
/*
   This is another comment.
   /* nesting (such as this) has no effect.  This comment ends here: */

   We are no longer inside a comment, this will produce an error */

// this is a '//' comment.  It ends at the end of this line.
int x;	// you can start a comment on the same line as code
int y;	/* such as this. */

/* '//' markers inside a block comment are ignored.
    // this comment continues after this line
    and ends here: */

It is very common to place comments at the top of your C source file, explaining who wrote the file, when it was written, and what it does.

Preprocessor directives

C preprocessor directives are lines that begin with a '#'. The most common is '#include'. I will cover each of them briefly here:

#include
#include is used to include a header (.h) file. Header files declare functions and variables that are defined in other files. By convention, a header file contains only declarations and no executable code (that is, no function bodies or computations). The compiler does not enforce this, because the preprocessor simply pastes the text of the header into your file. Think of header files as files that give your program a list of extra functions or variables that it can use. I'll cover the difference between .h files and .c files more later.
#include has two syntaxes: one for including system headers, and one for including local headers. A local header would be one that you yourself wrote as part of your project. A system header would be one that is provided by the compiler or the operating system.
To include a system header, you enclose the name of the header in '<' and '>' characters. To include a local header, you enclose the name of the header in '"' characters.
Note that if the header file is in a subdirectory, you use '/' (NOT '\') to denote a subdirectory.
Example:
// include the system header 'stdio.h':
#include <stdio.h>
// include the system header 'sys/types.h':
#include <sys/types.h>
// include the local header (in the same directory as the .c file) 'hello.h'
#include "hello.h"

#define
#define is used to define constants and macros. Macros are a more advanced topic, but I will give a brief example. The syntax is:
To define a constant:
#define CONSTANT VALUE
This replaces all occurrences of 'CONSTANT' with VALUE during preprocessing, before the code is compiled.
To define a macro:
#define MACRO(x) func(x)
This means that any occurrence of 'MACRO' (with arguments) will be replaced by func (with the arguments).
Examples:
// printf is declared in stdio.h
#include <stdio.h>

// constant MAX
#define MAX	100

// macro to call 'func' on an argument ('func' must be defined elsewhere before FUNC is used)
#define FUNC(n)		func(n)
// another macro, this one using multiple lines (continued lines use the '\' character)
// note: no ';' at the end of the macro, the caller supplies it
#define MACRO(x, y, z)		for (x = y; x < z; x++) \
					printf("\n")
int main() {
	// use the MAX define:
	int i;
	for (i = 0; i < MAX; i++)
		printf("\n");
	
	// use the MACRO macro
	MACRO(i, 0, 10);
	// the above macro will expand to:
	// for (i = 0; i < 10; i++)
	// 	printf("\n");
	return 0;
}

#error
#error simply aborts compilation with a specified message. It is useful if the project has not been configured properly. Use it sparingly.

#if, #ifdef, #ifndef, #else, #elif, #endif
These preprocessor directives allow you to create conditionally compiled code. This is primarily useful for allowing a single codebase to be compiled on multiple platforms without modification.
#if takes an integer constant expression as an argument (zero is false, anything else is true), which may use the special operator 'defined()'.
#ifdef is shorthand for #if defined(). #ifndef is shorthand for #if !defined().
#elif, #else and #endif should be self-explanatory. Every #if or #ifdef or #ifndef MUST have a corresponding #endif.
You can even use #if directives in the middle of expressions, since C does not consider newlines to terminate expressions.
However, every preprocessor directive must be on a line of its own, with '#' as the first non-whitespace character on that line.

You'll often use #ifndef and #define in header files to prevent multiple inclusion of a header file (see the last example below).
Examples:
#define LEN 100
#if (LEN > 99)
#error Recompile with LEN <= 99.
#endif

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
printf("This code was compiled with a C99 compliant compiler\n");
#endif

#ifdef _WIN32
printf("This code was compiled on Windows\n");
#elif defined(HPUX)
printf("This code was compiled on HP UX\n");
#else
printf("This code was not compiled on Windows or HP UX\n");
#endif

// here's an example where #if is used in the middle of an expression.
// on Windows, LEN must be >= 100; on all other platforms, LEN must be <= 100.
if (LEN
#ifndef _WIN32
	>
#else
	<
#endif
	100)
	printf("LEN is out of range\n");

// this is often done in header files to prevent multiple inclusion
// (names that start with an underscore and a capital letter are reserved, so avoid them):
#ifndef SOMEFILE_H
#define SOMEFILE_H
// header contents here
#endif

Note: the macro '_WIN32' is predefined by the compiler whenever you compile for Windows.

Expression Syntax
In C, expressions and statements can span several lines, and are only considered complete when terminated by a semicolon (;). Multiple statements are enclosed in curly braces ( { and } ) to form code blocks.

C is CaSe SeNsItIve. This means that you can have two separate functions, 'Hello' and 'hello', and they will be treated as separate functions by the compiler.

Hello World / Writing your first program
Let's try writing a test program, so you can get a feel for how to write code.
Create a new file, called "hello.c", and put this in it:
// hello.c example program

#include <stdio.h>

int main(int argc, char **argv)
{
	printf("Hello, world!\n");
	return 0;
}

Save the file, then go to the command prompt, and change to the directory where the file is saved. Then, type:
gcc -o hello hello.c
This compiles your source code into an executable file called 'hello' (or 'hello.exe' on Windows). Now try running your program:
hello
Note: on Linux, the current directory is not usually included in the path. What this means is that on Linux, if you just type in "hello", it will say something like "hello: command not found". When you are attempting to run a program that is located in the current directory, you type:
./nameofprogram
So in this case, you would type:
./hello
If everything has compiled correctly, you should see the output:
Hello, world!

Entry Point
In C, every program starts in the 'main' function. That means that the start of your program should be located in the function 'main'. There are two standard definitions for main. These are:

int main()
or,
int main(int argc, char **argv)

The only allowed variations on these are purely cosmetic. For example, you can have: int main(void), where 'void' inside the parentheses states explicitly that main takes no parameters. This is the form that the C standard itself lists. You may also see: int main(int argc, char *argv[]). This is because, in a parameter list, an array declaration is treated as a pointer declaration (more on that later), so 'char **argv' and 'char *argv[]' are semantically the same. You may also see any amount of whitespace, such as: int main ( int argc , char ** argv ).
Very old books may refer to: void main() but this is not one of the forms defined by the C standard, and it should be avoided.

The latter declaration, int main(int argc, char **argv), allows your program to read the command-line parameters that were passed to the program. For example, consider the following program:
cmdline.c:
// command-line example

#include <stdio.h>

int main(int argc, char **argv)
{
	int i;
	for (i = 0; i < argc; i++)
		printf("argv[%d]=%s\n", i, argv[i]);
	return 0;
}
Compile the above code with:
gcc -o cmdline cmdline.c
Now run the program, and give it command line arguments, such as:
cmdline this is a test
(or on Linux):
./cmdline this is a test
The output would be:
argv[0]=./cmdline
argv[1]=this
argv[2]=is
argv[3]=a
argv[4]=test

Next week...

This is a lot to take in, of course. We'll review thoroughly as we go along, so don't worry if you don't absorb everything in one pass. If something doesn't make sense, try reading it again, or just ask a question. Don't be afraid to experiment! Next week we'll cover:
- Primitive data types
- Representing constants (numeric and strings)
- Control constructs, for, while, do, if, goto
- Basic user input
- Aggregate types, structs
- Strings