Unix and C - Lesson 1
For the first lesson I will talk a bit about Makefiles as well as some
introduction to the basics of C.
Makefiles
Often C projects consist of numerous files that all need to be compiled
together to create the final output. Typing all the necessary commands by hand
every time is tedious and error-prone, so many people instead use GNU Make and
corresponding Makefiles.
The official GNU Make homepage and documentation can be found here:
GNU Make - Free Software Foundation
In a nutshell, the syntax for a Makefile is:
VARIABLE=value
CC=gcc
CFLAGS=-Wall
target: dependency1 dependency2
	command1
	command2
example: example.c example.h
	$(CC) $(CFLAGS) -o example example.c
dependency1:
	echo "hello there"
dependency2:
	echo "how are you?"
In the above example, four targets are defined: 'target', 'example',
'dependency1' and 'dependency2'. A target name is followed by a colon (:), then
an optional list of dependencies. Dependencies can refer either to files or to
other targets. For example, the 'target' target depends on the targets
'dependency1' and 'dependency2', so make runs their commands first.
The 'example' target depends on the files 'example.c' and 'example.h'. If
those files do not exist (and there is no rule for creating them), make will
produce an error. If they do exist, make rebuilds 'example' only when one of
them is newer than the existing 'example' file.
To refer to a variable, the syntax is $(VARIABLE), such as in the 'example'
command list.
Target commands MUST be indented by a single tab character, not spaces. If
they are not, make will stop with an error (GNU Make reports a "missing
separator"). Only commands should be indented.
It is very common to define variables for your C compiler, C linker, and the
flags (or options) to pass to the compiler and linker. They are often called
CC, LINK, CFLAGS and LFLAGS. This is by no means a rule; it is just very
common. (GNU Make's own built-in rules use the names CC, CFLAGS and LDFLAGS.)
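By default, GNU Make looks for a file named 'GNUmakefile', 'makefile' or
'Makefile' (in that order, without any extension); 'Makefile' is the usual
choice. To specify a different makefile, you would use:
make -f othermakefile
Otherwise, make is invoked simply by typing:
make
With no arguments, make builds the first target in the file (in the example
above, 'target'). To build a specific target, name it on the command line:
make example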
If you have any questions, please ask me or refer to the GNU Make
documentation.
Note: you are not required to actually use or know how to use Make.
Make is often used to automate the compile process, but you can perform all
the commands that make performs manually.
In a nutshell, to manually compile a program containing only one file, you
would type:
gcc -o output input.c
In this case, the executable file produced is called 'output' (on Windows it
would be 'output.exe'), and the C source file is called 'input.c'.
Introduction to C
Now, on to the actual C language.
When a compiler compiles C code, four distinct phases actually occur. It is
not terribly important that you know these, but I will list them here for the
sake of curiosity, and so you have some idea of what the computer is actually
doing during compilation.
- 1. Preprocess. This phase processes all the preprocessor directives in your
C code. Preprocessor directives are lines that begin with a '#'. They allow
you to create macros and define constants, to include header files (for
example, to use library functions), and to produce conditionally compiled
code. More on that later.
- 2. Compile. "Compiling" is often used as the name for the whole process, but
technically it refers to translating the preprocessed C code into assembly
language, a textual form of the processor's instructions. This step is
hardware dependent, and will produce different instructions depending on what
platform you're compiling for (e.g., Intel x86, Alpha, MIPS, 68000, PowerPC,
etc.).
- 3. Assemble. The assembler translates the assembly language into machine
code (the 1's and 0's that the processor executes) and stores it in an
"object file" (usually ending in .o on Unix, or .obj on Windows). Object code
is a mix of machine code and references, or "tags", to other symbols and
addresses which are not resolved until linking. Often, C code refers to
functions and variables that are defined in other files; the object file keeps
tags to those symbols.
- 4. Link. The linker combines one or more object files with the libraries
they use, replaces the tags with the actual addresses of the functions and
variables they refer to, and creates an executable program as the output. For
shared libraries (such as GLIBC, the standard C library, or Wsock32.dll, the
Windows socket library), the linker records which library is needed, and the
final addresses are filled in when the program is loaded.
Syntax of C
C has many constructs for controlling the flow of a program, each with its
own syntax. I will cover the basics here.
Remarks / Comments
Every good C programmer makes thorough use of comments! Comments allow you to
place human-readable commentary on how a program works alongside the code
itself. Good comments make code considerably easier to understand, so err on
the side of commenting liberally.
Originally, C had only one syntax for comments. A second style, borrowed from
C++, was added to the language in the C99 standard.
/* comments */
Anything between the symbols '/*' and '*/' is considered a comment. Such
comments may span several lines, but they cannot be nested: the first '*/'
always ends the comment. This is the original C style comment.
// comments
Anything AFTER the '//' symbol on a line is ignored; the comment ends at the
end of that line. This is a C++ style comment, but it has also been valid C
since C99.
Examples:
/*
this is a comment. The compiler will ignore it, but humans can take note of
it. The comment ends right here: */
/*
This is another comment.
/* nesting (such as this) has no effect. This comment ends here: */
We are no longer inside a comment, this will produce an error */
// this is a '//' comment. It ends at the end of this line.
int x; // you can start a comment on the same line as code
int y; /* such as this. */
/* '//' comments within '/* */' comments are ignored.
// this comment continues after this line
and ends here: */
It is very common to place comments at the top of your C source file,
explaining who wrote the file, when it was written, and what it does.
Preprocessor directives
C preprocessor directives are lines that begin with a '#'. The most common
is '#include'. I will cover the main ones briefly here:
#include
#include is used to include a header (.h) file. Header files declare functions
and variables that are defined elsewhere, in other .c files or in libraries.
The preprocessor simply pastes the text of the header into your file, so by
convention a header file contains only declarations (along with constants,
macros and type definitions), not function bodies or other executable code.
Think of header files as files that give your program a list of extra
functions or variables that it can use. I'll cover the difference between .h
files and .c files more later.
#include has two syntaxes: one for including system headers, and one for
including local headers. A local header would be one that you yourself
wrote as part of your project. A system header would be one that is provided
by the compiler or the operating system.
To include a system header, you enclose the name of the header in '<' and
'>' characters; the compiler then searches its system include directories. To
include a local header, you enclose the name of the header in '"' characters;
the compiler then searches the directory of the including file first, and
falls back to the system directories.
Note that if the header file is in a subdirectory, you use '/' (NOT '\') to
denote a subdirectory.
Example:
// include the system header 'stdio.h':
#include <stdio.h>
// include the system header 'sys/types.h':
#include <sys/types.h>
// include the local header (in the same directory as the .c file) 'hello.h'
#include "hello.h"
#define
#define is used to define constants and macros. Macros are a more advanced
topic, but I will give a brief example. The syntax is:
To define a constant:
#define CONSTANT VALUE
This replaces every occurrence of 'CONSTANT' with VALUE during the
preprocessing phase, before the code is actually compiled.
To define a macro:
#define MACRO(x) func(x)
This means that any occurrence of 'MACRO' followed by arguments in
parentheses will be replaced by func, with the same arguments.
Examples:
#include <stdio.h>
// constant MAX
#define MAX 100
// macro to call 'func' on an argument ('func' is only a placeholder name;
// FUNC is not used below)
#define FUNC(n) func(n)
// another macro, this one using multiple lines (continued lines use the '\' character)
// the semicolon is left out here, because the caller supplies it
#define MACRO(x, y, z) for (x = y; x < z; x++) \
    printf("\n")
int main() {
    // use the MAX define:
    int i;
    for (i = 0; i < MAX; i++)
        printf("\n");
    // use the MACRO macro
    MACRO(i, 0, 10);
    // the above macro will expand to:
    // for (i = 0; i < 10; i++)
    //     printf("\n");
    return 0;
}
#error
#error stops compilation and prints the specified message. It is useful for
catching a project that has not been configured properly, usually inside an
#if block (described below). Use it sparingly.
#if, #ifdef, #ifndef, #else, #elif, #endif
These preprocessor directives allow you to create conditionally compiled code.
This is primarily useful for allowing a single codebase to be compiled on
multiple platforms without modification.
#if takes an integer constant expression, which counts as true if it is
nonzero. The expression may use the special operator 'defined(NAME)', which is
1 if NAME has been #defined and 0 otherwise.
#ifdef NAME is shorthand for #if defined(NAME). #ifndef NAME is shorthand for
#if !defined(NAME).
#elif, #else and #endif should be self explanatory. Every #if, #ifdef or
#ifndef MUST have a corresponding #endif.
You can even use #if directives in the middle of expressions, since C does not
consider newlines to terminate expressions.
However, each preprocessor directive must be on a line of its own, with the
'#' as the first non-whitespace character on that line.
Often, you'll use #ifndef and #define in header files to prevent a header file
from being included more than once.
For example:
#define LEN 100
// with LEN defined as 100, this #error fires and compilation stops:
#if (LEN > 99)
#error Recompile with LEN <= 99.
#endif
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
printf("This code was compiled with a C99 compliant compiler\n");
#endif
#ifdef _WIN32
printf("This code was compiled on Windows\n");
#elif defined(__hpux)
printf("This code was compiled on HP UX\n");
#else
printf("This code was not compiled on Windows or HP UX\n");
#endif
// here's an example where #if is used in the middle of an expression.
// on Windows, LEN must be at least 100; on all other platforms, LEN must be
// at most 100.
if (LEN
#ifndef _WIN32
    >
#else
    <
#endif
    100)
    printf("LEN is out of range\n");
// this is often done in header files to prevent multiple inclusion
// (names starting with an underscore and a capital letter are reserved for
// the compiler, so the guard name avoids them):
#ifndef SOMEFILE_H
#define SOMEFILE_H
// header contents here
#endif
Note: the macro '_WIN32' is predefined by compilers that target Windows, such
as Microsoft Visual C++ and MinGW (even for 64-bit Windows).
Expression Syntax
In C, statements can span several lines. Most statements end with a semicolon
(;), and a statement is not complete until that semicolon is reached. Multiple
statements can be enclosed in curly braces ( { and } ) to form a code block.
C is CaSe SeNsItIve. This means that you can have two separate functions,
'Hello' and 'hello', and the compiler will treat them as different functions.
Hello World / Writing your first program
Let's try writing a test program, so you can get a feel for how to write code.
Create a new file, called "hello.c", and put this in it:
// hello.c example program
#include <stdio.h>
int main(int argc, char **argv)
{
    printf("Hello, world!\n");
    return 0;
}
Save the file, then go to the command prompt, and change to the directory where
the file is saved. Then, type:
gcc -o hello hello.c
This compiles your source code into an executable file called 'hello' (or
'hello.exe' on Windows). Now try running your program:
hello
Note: on Linux (and other Unix systems), the current directory is usually not
included in the PATH, the list of directories searched for commands. This
means that on Linux, if you just type in "hello", the shell will say something
like "hello: command not found". To run a program that is located in the
current directory, you type:
./nameofprogram
So in this case, you would type:
./hello
If everything has compiled correctly, you should see the output:
Hello, world!
Entry Point
In C, every program starts in the 'main' function, so the start of your
program should be located in 'main'. There are two standard definitions for
main. These are:
int main()
or,
int main(int argc, char **argv)
The only portable variations on these are essentially cosmetic. For example,
you can write: int main(void), where 'void' inside the parentheses states
explicitly that main takes no parameters; this is the form the C standard
itself uses. You may also see: int main(int argc, char *argv[]). In a
parameter list, an array declaration is treated as a pointer (more on arrays
and pointers later), so 'char **argv' and 'char *argv[]' mean exactly the same
thing there. You may also see any amount of whitespace, such as:
int main ( int argc , char ** argv ).
Very old books may use: void main(), but that is not a standard form of main;
always declare main as returning int.
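The latter declaration, int main(int argc, char **argv), allows your program to
read the command-line parameters that were passed to it. argc (the "argument
count") holds the number of entries in argv, and argv (the "argument vector")
is an array of strings, one per argument; argv[0] is normally the name the
program was invoked with. For example, consider the following program: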
cmdline.c:
// command-line example
#include <stdio.h>
int main(int argc, char **argv)
{
    int i;
    // print every argument, including argv[0]
    for (i = 0; i < argc; i++)
        printf("argv[%d]=%s\n", i, argv[i]);
    return 0;
}
compile the above code with:
gcc -o cmdline cmdline.c
Now run the program, and give it command line arguments, such as:
cmdline this is a test
(or on Linux):
./cmdline this is a test
The output would be:
argv[0]=./cmdline
argv[1]=this
argv[2]=is
argv[3]=a
argv[4]=test
Next week...
This is a lot to take in, of course. We'll review thoroughly as we go along,
so don't worry if you don't absorb everything in one pass. If something
doesn't make sense, try reading it again, or just ask a question. Don't be
afraid to experiment! Next week we'll cover:
- Primitive data types
- Representing constants (numeric and strings)
- Control constructs, for, while, do, if, goto
- Basic user input
- Aggregate types, structs
- Strings