Brief Details of Computer Science Subjects: Input and Output

Input and Output

The C language provides no direct facilities for input and output (IO), and, instead, these operations are supplied as functions in the standard library.

1. Formatted IO

2. File IO

3. Command-Shell Redirection

4. Command-Line Arguments

1. Formatted IO

1.1 Formatted Output: printf()

The function printf() is a general purpose print function that converts and formats its arguments to a character string, and prints the result to standard output (typically the screen). The general interface for printf() is

int printf(const char *format, arg1, arg2, ...);

The first argument is a format string, composed of ordinary characters and conversion specifications, which defines the layout of the printed text.

This is followed by zero or more optional arguments, with the number of arguments, and their type, being determined by the contents of the format string.

The return value is the number of characters printed, unless an error occurs during output, in which case the return value is negative.

Conversion specifications are identified by a % character followed by a number of optional fields and terminated by a type conversion character. A simple example is

printf("%d green %s sitting on a wall.\n", 10, "bottles");

where the ordinary characters “green” and “sitting on a wall.\n” are printed verbatim, and the conversion specifiers %d and %s insert the additional arguments at the appropriate locations. The type conversion character must match its associated argument type; in the example, the %d indicates an integer argument and the %s indicates a string argument.

There are different conversion characters for ints (d, i, c), unsigned ints (u, o, x), doubles (f, e, g), strings (s), and pointers (p). Details of these may be found in any C reference text.

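To print a % character, the conversion specification %% is used.

Between the % and the type conversion character there may be a number of optional fields, which control the formatting of the converted argument. Consider, for example, the conversion specifier %-#012.4hd. Here -, # and 0 are flags, 12 is the minimum field width, .4 is the precision, h is the size modifier (short), and d is the type conversion character.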
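As a rough illustration of the optional fields, the sketch below formats the same values with a few flag, width, and precision combinations. It writes into a buffer with snprintf() (a C99 function not covered in this text) so that the result can be inspected; the helper name format_examples is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats i and d with several optional fields.
   Returns the value of snprintf(), i.e. the number of characters that
   the complete result requires (excluding the terminating \0). */
int format_examples(char *out, size_t n, int i, double d)
{
    return snprintf(out, n,
                    "[%5d] [%-5d] [%05d] [%.2f] [%8.3f]",
                    i, i, i, d, d);
}
```

For i = 42 and d = 3.14159 the buffer holds "[   42] [42   ] [00042] [3.14] [   3.142]": a width of 5 pads on the left, the - flag pads on the right, the 0 flag pads with zeros, and the precision sets the number of digits after the decimal point.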

1.2 Formatted Input: scanf()

The scanf() function is the input analog of printf(), providing many of the same conversion specifications in the opposite direction (although there are differences, so be wary).

It obtains data from standard input, which is typically the keyboard. The general interface for scanf() is

int scanf(const char *format, ...);

This is identical to printf() in form, with a format string and a variable argument list, but an important difference is that the arguments for scanf() must be pointer types. This allows the input data to be stored at the address designated by the pointer using pass-by-reference semantics.

For example,

double fval;

scanf("%lf", fval); /* Wrong */

scanf("%lf", &fval); /* Correct, store input in fval. */

scanf() reads characters from standard input and interprets them according to the format string specification.

It stops when it exhausts the format string, or when some input fails to match a conversion specification. Its return value is the number of values successfully assigned in its variable-length argument list, or EOF if the end of input is reached before any value is converted.

If a conflict occurs between the conversion specification and the actual input, the character causing the conflict is left unread and is processed by the next standard input operation.

The mechanics of the format string and its conversion specifications are even more complicated for scanf() than for printf(), and there are many details and caveats that will not be discussed here.

Most of the conversion characters for printf()—d, i, o, x, c, u, f, e, g, s, p, etc—have similar meanings for scanf(), but there are certain differences, some subtle.

Thus, one should not use the documentation for one as a guide for the other. Some of these differences are as follows.

• Where printf() has four optional fields, scanf() has only two. It has the width and size modifier fields but not the flags and precision fields.

• For printf() the width field specifies a minimum reserve of space (i.e., padding), while for scanf() it defines a maximum limit on the number of characters to be read.

• An asterisk character (*) may be used in place of the width field for both printf() and scanf(), but with different meanings. For printf() it allows the width field to be determined by an additional argument, but for scanf() it suppresses assignment of an input value to its argument.

• The conversion character [ is not valid for printf(), but for scanf() it permits a scanset of characters to be specified, which allows scanf() to control exactly the characters it reads in.

• The size modifier field is typically neglected for printf(), but is vital for scanf(). For example, to read a float, one uses the conversion specifier %f. To read a double, the size modifier l (for long) must appear, %lf.
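The width, asterisk, and scanset differences listed above might be sketched as follows. The example uses sscanf() (described in section 1.3) so that the input is a string rather than the keyboard; the helper names split_pair and second_int, and the 16-byte buffer size, are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits "key=value" text into two 16-byte buffers.
   Returns 1 on success, 0 if the input does not match. */
int split_pair(const char *text, char key[16], char value[16])
{
    /* %15[^=] : scanset, read up to 15 characters that are not '='
       =       : ordinary character, must match the input literally
       %15s    : read at most 15 non-white-space characters          */
    return sscanf(text, "%15[^=]=%15s", key, value) == 2;
}

/* Hypothetical helper: skips the first integer and stores the second. */
int second_int(const char *text, int *out)
{
    /* %*d reads an integer but suppresses the assignment, so only
       one value is counted in the return value. */
    return sscanf(text, "%*d %d", out) == 1;
}
```

The width of 15 leaves room for the terminating \0 in each 16-byte buffer, which is the scanf() counterpart of guarding against buffer overflow.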

The scanf() format string consists of conversion specifiers, ordinary characters, and white-space. For example, the following statement is used to read a date of the form dd/mm/yy.

int day, month, year;

scanf("%d/%d/%d", &day, &month, &year);
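Since the return value counts the successful assignments, a date in this form can be validated by comparing it against 3. The following is a minimal sketch using sscanf() (see section 1.3) on a string; parse_date is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: parses "dd/mm/yy" from text.
   Returns 1 if all three fields were read, 0 otherwise. */
int parse_date(const char *text, int *day, int *month, int *year)
{
    /* The '/' characters in the format must match the input exactly. */
    return sscanf(text, "%d/%d/%d", day, month, year) == 3;
}
```

With the input "25-12-99" the first %d succeeds, but the ordinary character / fails to match the -, so only one value is assigned and the function reports failure.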

In general scanf() ignores white-space characters in its format string, and skips over white-space in stdin as it looks for input values. Exceptions to this rule arise with the %c and %[ conversion specifiers, which do not skip white-space. For example, if the user types in “one two” for each of the statements below, they will obtain different results.

char s[10], c;

scanf("%s%c", s, &c); /* s = "one", c = ' ' */

scanf("%s %c", s, &c); /* s = "one", c = 't' */

In the first case, the %c reads in the next character after %s leaves off, which is a space. In the second, the white-space in the format string causes scanf() to consume any white-space after “one”, leaving the first non-space character (t) to be assigned to c.
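The same behaviour can be reproduced without keyboard input by scanning a string with sscanf() (section 1.3). This is only a sketch; the two helper names are hypothetical, and a width of 9 is added to %s to protect the 10-byte buffer.

```c
#include <stdio.h>

/* Hypothetical helper: returns the character that %c reads directly
   after a word, without skipping white-space. */
char next_char_raw(const char *input)
{
    char word[10], c = '\0';

    sscanf(input, "%9s%c", word, &c);
    return c;
}

/* Hypothetical helper: as above, but the space in the format string
   consumes any white-space before %c reads a character. */
char next_char_skipping(const char *input)
{
    char word[10], c = '\0';

    sscanf(input, "%9s %c", word, &c);
    return c;
}
```

For the input "one two", the first function returns a space and the second returns t.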

While the many details of scanf() formatting complicate a complete understanding, its basic use is quite simple. Rarely does an input statement get more complicated than

short a;

double b;

char c[20];

scanf("%hd %lf %s", &a, &b, c);

A few final warnings about scanf().

First, keep in mind that the arguments in its variable length argument list must be pointers; forgetting the & in front of non-pointer variables is a very common mistake.

Second, when there is a conflict between a conversion specification and the actual input, the offending character is left unread. Thus, an expression like while (scanf("%d", &val) != EOF) is dangerous, as it will loop forever if there is a conflict.

Third, while scanf() is a good choice when the exact format of the input is known, other input techniques may be better suited if the format may vary. For example, the combination of fgets() and sscanf(), described in the next section, is a useful alternative if the input format is not precisely known. The fgets() function reads a line of characters into a buffer, and sscanf() extracts the data, and can pick out different parts using multiple passes if necessary.
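One way to avoid the endless loop is to compare the return value against the number of expected assignments instead of against EOF. The sketch below reads from a FILE pointer with fscanf() (section 2.3) so that it works on any stream; passing stdin gives the scanf() behaviour. The name sum_ints is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: sums integers read from fp, stopping at the
   first item that is not an integer or at end-of-file. */
long sum_ints(FILE *fp)
{
    long total = 0;
    int val;

    /* fscanf() returns 1 only when a value was assigned, so the loop
       ends on a matching failure (0) as well as on EOF. */
    while (fscanf(fp, "%d", &val) == 1)
        total += val;
    return total;
}
```

For the input "1 2 3 x 4" the loop stops at the x and the result is 6; the x itself remains unread on the stream.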

1.3 String Formatting

The functions sprintf() and sscanf() perform essentially the same operations as printf() and scanf(), respectively, but, rather than interact with stdout or stdin, they operate on a character array argument. They present the following interfaces.

int sprintf(char *buf, const char *format, ...);

int sscanf(const char *buf, const char *format, ...);

The sprintf() function stores the resulting formatted string in buf and automatically appends a terminating \0 character to it. It returns the number of characters stored (excluding \0). This function is very useful for a wide range of string manipulation operations. For example, the following code segment creates a format string at runtime, which prevents scanf() from overflowing its character buffer.

char buf[100], format[10];

sprintf(format, "%%%ds", (int)(sizeof(buf)-1)); /* Create format string "%99s". The cast is needed because %d expects an int, not a size_t. */

scanf(format, buf); /* Get string from stdin. */

The input string is thus limited to not more than 99 characters plus 1 for the terminating \0.

sscanf() extracts values from the string buf according to the format string, and stores the results in the additional argument list. It behaves just like scanf() with buf replacing stdin as the source of input characters. An attempt to read beyond the end of string buf for sscanf() is equivalent to reaching the end-of-file for scanf().

The sscanf() function is often used in conjunction with a line input function, such as fgets(), as in the following example.

char buf[100];

double dval;

fgets(buf, sizeof(buf), stdin); /* Get a line of input, store in buf. */

sscanf(buf, "%lf", &dval); /* Extract a double from buf. */
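In practice both calls can fail, so a more defensive version of this idiom checks each return value. The following is a sketch under that assumption; read_double is a hypothetical name, and the stream is passed as a parameter so that stdin or an open file can be used.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from fp and extracts a double.
   Returns 1 on success, 0 on end-of-file, read error, or bad input. */
int read_double(FILE *fp, double *out)
{
    char buf[100];

    if (fgets(buf, sizeof(buf), fp) == NULL)
        return 0;                         /* end-of-file or read error */
    return sscanf(buf, "%lf", out) == 1;  /* 0 if the line is not a number */
}
```

Because fgets() consumes the whole line, a line of bad input is discarded rather than left on the stream, which avoids the endless loop that scanf() can cause.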

2. File IO

The C language is closely tied to the UNIX operating system; they were initially developed in parallel, and UNIX was implemented in C. Thus, much of the standard C library is modelled on UNIX facilities, and in particular the way it performs input and output by reading or writing to files.

2.1 Opening and Closing Files

A file is referred to by a FILE pointer, where FILE is a structure declaration defined with a typedef in header stdio.h. This file pointer “points to a structure that contains information about the file, such as the location of a buffer, the current character position in the buffer, whether the file is being read or written, and whether errors or end-of-file have occurred”. All these implementation details are hidden from users of the standard library via the FILE type-name and the associated library functions.

A file is opened by the function fopen(), which has the interface

FILE *fopen(const char *name, const char *mode);

The first argument, name, is a character string containing the name of the file. The second is a mode string, which determines how the file may be used.

There are three basic modes:

read "r", write "w" and append "a".

The first opens an existing file for reading, and fails if the file does not exist.

The other two open a file for writing, and create a new file if it does not already exist.

Opening an existing file in "w" mode first clears the file of its existing data (i.e., overwrites the existing file).

Opening in "a" mode preserves the existing data and adds new data to the end of the file.

Each of these modes may include an additional “update” specification signified by a + character (i.e., "r+", "w+", "a+"), which enables the file stream to be used for both input and output. This ability is most useful in conjunction with the random access file operations.

Some operating systems distinguish between text files and binary files. The standard C library caters for this variation by permitting a file to be explicitly marked as binary with the addition of a b character to the file-open mode (e.g., "rb" opens a binary file for reading).

If opening a file is successful, fopen() returns a valid FILE * pointer. If there is an error, it returns NULL (e.g., attempting to open a file for reading that does not exist, or attempting to open a file without appropriate permissions). As with other functions that return pointers to limited resources, such as the dynamic memory allocation functions, it is prudent to always check the return value for NULL.

To close a file, the file pointer is passed to fclose(), which has the interface

int fclose(FILE *fp);

This function breaks the connection with the file and frees the file pointer. It is good practice to free file pointers when a file is no longer needed, as most operating systems have a limit on the number of files that a program may have open simultaneously. However, fclose() is called automatically for each open file when a program terminates normally.
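The open, check, use, close pattern described in this section might look like the sketch below. The helper name save_text is hypothetical, and it writes with fputs(), which is covered in section 2.3.

```c
#include <stdio.h>

/* Hypothetical helper: opens name in the given mode ("w" or "a"),
   writes text, and closes the file.
   Returns 0 on success and -1 on failure. */
int save_text(const char *name, const char *mode, const char *text)
{
    FILE *fp = fopen(name, mode);

    if (fp == NULL)
        return -1;                   /* could not open the file */
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;                   /* write failed */
    }
    /* fclose() flushes buffered output, so it can also report errors. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling it twice on the same file with mode "w" leaves only the second text in the file, whereas mode "a" on the second call keeps both.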

2.2 Standard IO

When a program begins execution, there are three text streams predefined and open. These are

standard input (stdin)

standard output (stdout) and

standard error (stderr).

The first two signify “normal” input and output, and for most interactive environments are directed to the keyboard and screen, respectively. Their input and output streams are usually buffered, which means that characters are accumulated in a queue and sent in packets, minimising expensive system calls.

Buffering may be controlled by the standard function setbuf(). The stderr stream is reserved for sending error messages. Like stdout it is typically directed to the screen, but its output is unbuffered.
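A common convention is to send results to stdout and diagnostics to stderr, so that the two can be separated. The sketch below assumes this convention; print_reciprocal is a hypothetical helper that takes both streams as parameters (a program would normally pass stdout and stderr) and uses fprintf(), which is covered in section 2.3.

```c
#include <stdio.h>

/* Hypothetical helper: writes 1/x to out, or an error message to err
   if x is zero. Returns 0 on success and -1 on error. */
int print_reciprocal(FILE *out, FILE *err, double x)
{
    if (x == 0.0) {
        fprintf(err, "error: cannot take reciprocal of zero\n");
        return -1;
    }
    fprintf(out, "%g\n", 1.0 / x);
    return 0;
}
```

A typical call would be print_reciprocal(stdout, stderr, 4.0), which prints 0.25 on standard output.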

2.3 Sequential File Operations

Once a file is opened, operations on the file—reading or writing—usually negotiate the file in a sequential manner, from the beginning to the end. The standard library provides a number of different operations for sequential IO.

The simplest functions process a file one character at a time. To write a character there are the functions

int fputc(int c, FILE *fp);

int putc(int c, FILE *fp);

int putchar(int c);

where calling putchar(c) is equivalent to calling putc(c, stdout). The functions putc() and fputc() are identical, but putc() is typically implemented as a macro for efficiency. These functions return the character that was written, or EOF if there was an error (e.g., the hard disk was full).

To read a character, there are the functions

int fgetc(FILE *fp);

int getc(FILE *fp);

int getchar(void);

which are analogous to the character output functions. Calling getchar() is equivalent to calling getc(stdin), and getc() is usually a macro implementation of fgetc(). These functions return the next character in the character stream unless either the end-of-file is reached or an error occurs.

In these anomalous cases, they return EOF. It is possible to push a character c back onto an input stream using the function

int ungetc(int c, FILE *fp);

The pushed back character will be read by the next call to getc() (or getchar() or fscanf(), etc) on that stream.
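The character functions above can be combined as in the following sketch, which shows a look-ahead built on ungetc() and a character-by-character copy. Both helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character on fp without
   consuming it, or EOF if none is available. */
int peek_char(FILE *fp)
{
    int c = getc(fp);

    if (c != EOF)
        ungetc(c, fp);     /* push it back for the next read */
    return c;
}

/* Hypothetical helper: copies in to out one character at a time.
   Returns the number of characters copied, or -1 on a write error. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;                 /* int, not char, so that EOF is distinguishable */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        n++;
    }
    return n;
}
```

Note that the variable receiving the result of getc() is declared as int: EOF is a value outside the range of valid characters, and storing it in a char could make it indistinguishable from real data.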

Formatted IO can be performed on files using the functions

int fprintf(FILE *fp, const char *format, ...);

int fscanf(FILE *fp, const char *format, ...);

These functions are generalisations of printf() and scanf(), which are equivalent to the calls

fprintf(stdout, format, ...) and

fscanf(stdin, format, ...), respectively.

Characters can be read from a file a line at a time using the function

char *fgets(char *buf, int max, FILE *fp);

which reads at most max-1 characters from the file pointed to by fp and stores the resulting string in buf. It automatically appends a \0 character to the end of the string. The function returns when it encounters a \n character (i.e., a newline), or reaches the end-of-file, or has read the maximum number of characters. It returns a pointer to buf if successful, and NULL for end-of-file or if there was an error.

Character strings may be written to a file using the function

int fputs(const char *str, FILE *fp);

which returns a non-negative value if successful and EOF if there was an error. Note, the string need not contain a \n character, and fputs() will not append one, so strings may be written to the same line with successive calls.
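One typical use of fgets() is to process a file line by line. The sketch below counts lines, on the assumption that a final line without a trailing newline still counts as a line. The name count_lines is hypothetical, and the buffer is deliberately small to show that fgets() delivers a long line in several pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts the lines remaining on fp. */
long count_lines(FILE *fp)
{
    char buf[16];
    long lines = 0;
    int in_line = 0;    /* nonzero while inside an unterminated line */

    while (fgets(buf, sizeof(buf), fp) != NULL) {
        size_t len = strlen(buf);

        if (len > 0 && buf[len - 1] == '\n') {
            lines++;        /* this piece completes a line */
            in_line = 0;
        } else {
            in_line = 1;    /* line is longer than the buffer, or ends at EOF */
        }
    }
    if (in_line)
        lines++;            /* last line had no trailing newline */
    return lines;
}
```

Checking whether the last character stored is \n is the usual way to tell a complete line from one that was cut short by the max limit.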

For reading and writing binary files, a pair of functions are provided that enable objects to be passed to and from files directly without first converting them to a character string. These functions are

size_t fread(void *ptr, size_t size, size_t nobj, FILE *fp);

size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp);

and they permit objects of any type to be read or written, including arrays and structures. For example, if a structure called Astruct were defined, then an array of such structures could be written to file as follows.

struct Astruct mystruct[10];

fwrite(mystruct, sizeof(struct Astruct), 10, fp);
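Both functions return the number of objects actually transferred, which may be less than nobj on error or end-of-file. A small round-trip sketch follows; struct Point and the two helper names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical record type for the example. */
struct Point {
    int x;
    double y;
};

/* Writes n records to fp; returns the number actually written. */
size_t save_points(FILE *fp, const struct Point *pts, size_t n)
{
    return fwrite(pts, sizeof(struct Point), n, fp);
}

/* Reads up to n records from fp; returns the number actually read. */
size_t load_points(FILE *fp, struct Point *pts, size_t n)
{
    return fread(pts, sizeof(struct Point), n, fp);
}
```

The file should be opened in binary mode (e.g., "wb" and "rb"). Data written this way is generally only readable by a program built for the same platform, since the size and layout of a structure may differ between compilers and machines.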

2.4 Random Access File Operations

The previous file IO functions progress through a file sequentially. The standard library also provides a means to move back and forth within a file to any specified location. These file positioning functions are

long ftell(FILE *fp);

int fseek(FILE *fp, long offset, int from);

void rewind(FILE *fp);

The first, ftell(), returns the current position in the file stream. For binary files this value is the number of characters preceding the current position.

For text files the value is implementation defined. In both cases the value is in a form suitable for the second argument of fseek(), and the value 0L represents the beginning of the file.

The second function, fseek(), sets the file position to a location specified by its second argument. This parameter is an offset, which shifts the file position relative to a given reference location. The reference location is given by the third argument and may be one of three values as defined by the symbolic constants SEEK_SET, SEEK_CUR, and SEEK_END.

These specify the beginning of the file, the current file position, and the end of file, respectively. Having shifted the file position via fseek(), a subsequent read or write will proceed from this new position.

For binary files, fseek() may be used to move the file position to any chosen location. For text files, however, the set of valid operations is restricted to the following.

fseek(fp, 0L, SEEK_SET); /* Move to beginning of file. */

fseek(fp, 0L, SEEK_CUR); /* Move to current location (no effect). */

fseek(fp, 0L, SEEK_END); /* Move to end of file. */

fseek(fp, pos, SEEK_SET); /* Move to pos. */

In the last case, the value pos must be a position returned by a previous call to ftell(). Binary files, on the other hand, permit more arbitrary use, such as

fseek(fp, -4L, SEEK_CUR); /* Move back 4 bytes. */
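A common use of fseek() on a binary file is to jump directly to a fixed-size record. The sketch below assumes a file that contains nothing but int values written with fwrite(); read_record is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: reads the int at the given zero-based index
   from a binary file of ints. Returns 1 on success, 0 on failure. */
int read_record(FILE *fp, long index, int *out)
{
    /* The offset is in bytes, measured from the beginning of the file. */
    if (fseek(fp, index * (long)sizeof(int), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof(int), 1, fp) == 1;
}
```

Seeking past the end of the file is not itself an error here; the following fread() simply transfers no objects, so the function returns 0.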

The program below shows an example of ftell() and fseek() to determine the length of a file in bytes. The file itself may be plain text, but it is opened as binary so that ftell() returns the number of characters to the end-of-file.

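/* Compute the length of a file in bytes. From Snippets (ansiflen.c) */
#include <stdio.h>

long flength(char *fname)
{
    long length = -1L;              /* returned if the file cannot be opened */
    FILE *fptr;

    fptr = fopen(fname, "rb");      /* binary mode, so ftell() counts bytes */
    if (fptr != NULL) {
        fseek(fptr, 0L, SEEK_END);  /* move to the end of the file */
        length = ftell(fptr);       /* position at the end = length in bytes */
        fclose(fptr);
    }
    return length;
}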

The third function, rewind(), returns the position to the beginning of the file. Calling rewind(fp) is equivalent to the statement fseek(fp, 0L, SEEK_SET).

Two other file positioning functions are available in the standard library: fgetpos() and fsetpos(). These perform essentially the same tasks as ftell() and fseek(), respectively, but are able to handle files too large for their positions to be representable by a long integer.
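These functions store the position in an object of type fpos_t rather than a long. As a sketch of how they might be used, the hypothetical helper below reads a line and then restores the stream to where it was, so that the same line can be read again.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next line into buf without advancing
   the file position. Returns 1 if a line was read, 0 otherwise. */
int peek_line(FILE *fp, char *buf, int max)
{
    fpos_t pos;
    int ok;

    if (fgetpos(fp, &pos) != 0)       /* remember the current position */
        return 0;
    ok = (fgets(buf, max, fp) != NULL);
    if (fsetpos(fp, &pos) != 0)       /* go back to where we started */
        return 0;
    return ok;
}
```

Both fgetpos() and fsetpos() return zero on success, and the fpos_t value should be treated as opaque: it is only meaningful when passed back to fsetpos() on the same stream.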

3. Command-Shell Redirection

Often programs are executed from a command-interpreter environment (also called a shell). Most operating systems possess such an interpreter. For example, Win32 has a DOS-shell and UNIX-like systems have various similar shell environments such as the C-shell, the Bourne-shell, the Korn-shell, etc. Most shells facilitate redirection of stdin and stdout using the commands < and >, respectively.

Redirection is not part of the C language, but an operating system service that supports the C input-output model.

#include <stdio.h>

/* Write stdin to stdout */
int main(void)
{
    int c;  /* int, not char, so that EOF can be detected */

    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}

Consider the example program above. It simply reads characters from stdin and forwards them to stdout. Normally this means the characters typed at the keyboard are echoed on the screen after the user hits the “enter” key. Assume the program executable is named “repeat”.

repeat

type some text 123

However, a file may be substituted for the keyboard by redirection.

repeat <infile.txt

display contents of infile.txt

Alternatively, a file may be substituted for the screen, or for both keyboard and screen as in the following example, which copies the contents of infile.txt to outfile.txt.

repeat <infile.txt >outfile.txt

Further redirection commands are >> and |. The former redirects stdout but, unlike >, appends the redirected output rather than overwriting the existing file contents. The latter is called a “pipe”, and it directs the stdout of one program to the stdin of another. For example,

prog1 | prog2

On some systems (e.g., DOS), prog1 executes first and its stdout is accumulated in a temporary buffer and, once the program has terminated, prog2 executes with this set of output as its stdin. UNIX-like shells instead run both programs concurrently, with prog2 reading the output as prog1 produces it. The stderr stream is not redirected by these commands, and so will still print messages to the screen even if stdout is redirected.


Tuesday, July 5, 2016
