1 Introduction PLEASE NOTE - This document has been generated by scanning old printed documents. It can therefore be expected to contain errors. If you find any, please edit the document and email it to rhaminisys@aol.com. There is no need to say where the changes are, we will find your corrections by comparing the original with your version. Please visit our web site to find out about our current software. Thanks! 1.1 Licence agreement The Rogalgol Algol60 system is copyright by RHA (Minisystems) Ltd. Freeware conditions ------------------- The components distributed in Algol60.zip are free for personal use. For any other use, including but not limited to educational use and any form of mass distribution, please contact RHA (Minisystems) Ltd. You may give copies of Algol60.zip to your friends. However, we would prefer everybody to download it from our web site at http://www.angelfire.com/biz/rhaminisys to be sure of getting the latest version. Only the downloaded ZIP file may be distributed, not the individual files. The software is provided "as is" without any warranty whatsoever. No support for the free software is guaranteed, although you are welcome to email us and we will help if circumstances allow. Educational and commercial use ------------------------------ All requests to use or distribute the software are treated on an individual basis. Customization of the run time program can be undertaken, including porting to different processors and operating systems. Contact information RHA (Minisystems) Ltd., 83 Gidley Way, Horspath, Oxford OX33 1TQ, England e-mail RHAMinisys@aol.com Web pages http://www.angelfire.com/biz/rhaminisys 1.2 History The RHA (Minisystems) Ltd Algol60 system began life in the early 1970s and was sold until the mid 1980s. Known as Rogalgol from the start, it was designed to run with minimal resources, just 8K of memory and paper tape input and output. In its final state a complete interface to the MSDOS operating system had been added. Sub-directories were a new idea at the time, the programs only recognize file names, not paths. The first Z80 version was developed in association with computer manufacturer Research Machines Ltd of Oxford, who also produced the Z80 User Manual. RML dropped their support for the language when they moved to 80x86 based machines. RHA (Minisystems) Ltd continued to develop the Z80 implementation and was solely responsible for the 80x86 versions. The Algol subset is the ultra lean language used to write the full Algol-60 compiler. Written in a language devoid of floating point, nested procedure declarations, stack checking and other things, the full compiler at 17K bytes is only just larger than the full virtual machine interpreter. The output of the subset compiler is code for the subset virtual machine, expressed in assembly language. Miniaturisation has progressed to such an extent that the system may have another lease of life as a language for microprocessor controls. It would not be difficult to change the front end of the compiler to make it compile something more modern looking than Algol-60. The virtual machine is simple and easily ported. The input/output system is modular, uses a generic way of dropping down to machine code, and is isolated from the rest of the virtual machine. For this reason we are not releasing the virtual machine sources. 1.3 Document overview Document overview The remainder of this document consists of two main sections, the Z80 manual and a supplement for 80x86 operating systems. Everything in the Z80 section applies to the 80x86 as well. All Z80 specific material in the original manual has been removed, it applies to patching the runtime interpreter, which is not supported on the 80x86 and is being reserved for commercial use on the Z80. Enhancements made since the involvement of RML have been included in the main body of the text, because they apply to the Z80 CP/M version also. 2 Z80 User manual 2.1 Z80 manual introduction Rogalgol is an implementation of the Algol 60 language designed especially for small computers. Almost all the features of Algol 60 are implemented together with a significant number of extensions. The system is essentially portable; the compiler, which is itself written in Algol, has been "burned in" over a period of several years on PDP8 and PDP11 computers. Rogalgol for the Z80, running under the CP/M operating system, consists of a one-pass compiler and a runtime program. The compiler translates the Algol source program into a machine independent intermediate code which specifies the sequence in which a number of subroutines is to be obeyed and which contains arguments for these subroutines. Each intermediate code occupies only one byte of memory, resulting in a very compact object code. Roughly 10 bytes are required to store an average statement. The runtime program contains a loader for the compiler output and all the routines required to run the Algol program. The compiler determines the minimum memory requirements of the system. The compiler and its runtime system together occupy about 12K bytes. Program work space and the CP/M operating system bring the total memory requirement to about 21K. The runtime system requires about 8K bytes of memory, which together with program, data, and CP/M requirements allows sizable programs to run in as little as 16K bytes. [Reserved for Z80 commercial use only]. The availability of the same compiler for 80x86, Z80, PDP11 and PDP8 computers makes it possible to develop programs on larger computers and to run high level programs on microcomputers which do not in themselves have the I/O or memory capability to support CP/M or the compiler. The modular construction of the programs allows the user to optimise the runtime system to meet individual requirements. In particular users can add I/O handlers, assembly code routines and error handling without needing to become involved in the inner workings of the system. 2.2 Language elements 2.3 Identifiers and symbols An Algol program consists of a sequence of symbols which are printing characters on a standard 'Teletype' terminal or its equivalent. Editing and layout characters are generally ignored by the compiler except where indicated in the following sections. The symbols are frequently grouped into units which the compiler treats as a single entity. These groups are the numeric constants just described, the language key words such as BEGIN and TRUE (which are strings of letters enclosed in single quotes or in upper case depending upon the convention being used), and identifiers. Only the first two letters of a language key word are significant. An identifier is the name given by the programmer to a variable, label, switch, array or procedure. There is a small group of Algol symbols which are made up of two characters. These are the assignment operator (:=), greater than or equals (>=) and less than or equals (<=). Identifiers consist of any number of alphanumeric characters of which the first must be a letter. Letters are in upper or lower case depending upon the convention being used. Examples are x, fred, ml, m2, abc123, abc1234. Identifiers may have up to 99 characters, all significant. To prevent an excessive slowing of the compiler, the 99 character limit is not checked. However, the total number of identifiers and the byte array storing the names are checked. The tables are optimised for an average identifier length of 5 characters. If the average is longer, the name array will overflow first; if it is less the limit on the number of identifiers will be reached first. All identifiers referring to variables and switches must be declared before they are used so that the compiler knows to what type of object they refer. Labels and procedures may be used before they are declared because the type of the identifier can be deduced from the context in which it is used. 2.4 Key words Some convention is required in Algol to distinguish between the language key words, e.g. BEGIN, END, ELSE, etc., and identifiers. In textbooks this is usually done by printing the key words in bold type. In actual compilers some other convention is needed depending upon the i/o hardware available. This compiler accepts two possible conventions. The first must be used where the i/o is restricted to upper case characters only. The second may be used where both upper and lower case are available. Two small utilities are provided to convert between these conventions. CONVENTION 1. Upper case only. All basic language words must be enclosed within single quote marks, e.g. SUM:=0; 'FOR' I:=1 'STEP' 1 'UNTIL' MAX 'DO' SUM:=SUM+(X[I]-XM[I])^2; RMS:=SQRT(SUM/MAX); CONVENTION 2. Upper and lower case. All basic language words must be in upper case and all identifiers must use only lower case. The above example now becomes: sum:=0; FOR i:=1 STEP 1 UNTIL max DO sum:=sum+(x[i]-xm[i])^2; rms:=sqrt(sum/max); The compiler decides which of the two conventions is being used from the first word of the program, which must be BEGIN, COMMENT or DIRECTIVE. Once the convention is selected, the compiler expects the remainder of the program to continue in the same manner. Internally the compiler converts all identifiers to upper case (which is how they appear if the identifier tables are printed). The compiler only checks the first 2 letters of the language key words, then skips until a suitable terminator is found (a closing quote or a non upper case letter). This implies that in the second convention adjacent language key words must be separated, e.g. THEN GOTO label; and not THENGOTO label; Note: In the first convention, decimal exponentiation in the representation of real numbers is represented by 'E' e.g. X := 1.234E-5, whereas in the second case a small 'e' must be used e.g. x := 1.234e-5. Note that when data is being read by the program then a capital 'E' is always used. Note: When switches are applied to file names they are case sensitive in the same way. Use lower case letters if the program being compiled uses the upper/lower case convention. Users if they so wish may use any of the language key words for the names of identifiers. As an example the following declaration is perfectly acceptable: BEGIN INTEGER real, begin, end; For the sake of readability the examples given in this manual will for the most part use the upper/lower case convention. 2.5 Pre-declared identifiers Certain procedure names are recognized by the compiler without declaration. Such identifiers may be regarded as having been declared in a fictitious outer block enclosing the entire program and thus in scope everywhere except where masked out by a local redeclaration. These procedures include the input/output routines which are discussed later and the standard functions as defined in the Algol 60 Report. In addition to these there is the procedure 'ioc' which takes a single integer parameter, e.g.: ioc(n) ; This routine serves a variety of purposes depending upon the value of n. These include input/output selection, format control and so on. These uses will be discussed in the following sections to which they apply. Another use of this procedure is as a simple way of linking to code routines within the runtime system. An inspection of the file 'ALIB.ALG' on the distribution disk shows that the majority of the procedures consist simply of a formal definition of their parameters with a single call to 'ioc' as the body of the procedure. Commercial Z80 users may link in their own code routines by this same method. 2.6 String literals A string in Algol consists of a sequence of characters enclosed within double quotes, e.g. text(1, "Hello Dolly"); All characters within the quotes with an ASCII value less than 32 are ignored. This includes carriage return, tab, and control characters but space is permitted. In order that these and other characters may be included within strings, the following convention is used. The characters '*' (asterisk) and '^' (circumflex) have special significance and are always considered in conjunction with the character which immediately follows. Control characters The sequence ^X will insert CONTROL-X into the string. In general '^' has the effect of stripping off all but the 5 least significant bits of the following character. Layout characters *N Inserts CR-LF (carriage return-line feed) into string. *C Inserts CR (carriage return). *I Inserts LF (line feed). *S Inserts SP (space). A literal space is also allowed. *T Inserts TAB. *P Inserts FF (form feed--new page). All other characters following * are taken literally; in particular, ** Inserts * *^ Inserts ^ *" Inserts " For example text (1, "An example *"STRING*" *SMight look lik e this*NX*^2+Y*^2="); will print on the console as: An example "STRING" might look like this X^2+Y^2= The internal representation of a string is a series of characters stored in sequential bytes terminated by a zero value. 2.7 Character literals The literal value of any character may be found using an ampersand "&" character followed immediately by the required character. For example, to convert a character digit to a number between 0 and 9 we have i:=chin(1) - &0; or to output an X then chout(1,&X); The convention described above involving * and ^ also applies, so to check for end of file (CONTROL-Z): i:=chin(dev); IF i=&^Z THEN ... or to check for a carriage return character: IF i=&*C THEN ... The two exceptions are &*N and &*" which will lead to the wrong result. The first generates two characters and the second is written &". Character literals are of type integer. 2.8 The structure of an Algol program 2.9 Program structure One of the most important features of the Algol language is that it is structured. Just as round brackets define sub-expressions within expressions, so the brackets BEGIN and END enclose a set of statements which are treated syntactically as a single statement, These bracketed statements are used in controlling the order of execution of the program and largely replace the use of labels and GOTOs, since they are obeyed as a whole or not at all. Such a statement is known as a compound statement or, if it contains any declarations apart from labels, a block. Statements within a compound statement are separated by semicolons. Strictly speaking, an Algol program is one statement, because it must start with BEGIN and end with END. A complete Algol program can be represented thus: BEGIN s1; s2; a3; .......... sn END FINISH The statements s1, s2, etc. may be of any type, including compound statements and blocks. The closing FINISH must be present. It is used by the compiler to check that there are the same number of BEGINS and ENDs. For users not familiar with the use of structured programming it is worth explaining why the 'converted' consider this such an important feature. An examination of any Algol program reveals the modular nature of its structure, each block containing specific declarations of the variables, arrays and procedures relevant to its operation. Many users new to such languages are at first irritated by the need to declare explicitly every identifier in a program. While there may be some justification for such criticism when considering trivial program examples, the advantages become obvious as the programs become larger and the declarations become a small percentage of the program. The block in which an identifier is declared defines the 'scope' of that identifier. Outside that block the identifier has no meaning and occupies no memory resource. The system thus takes on the role of resource management. The allocation of memory to variables is done dynamically as the program is being executed. On entering a block the system makes available those resources defined in the declarations and upon exit from the block these resources are reclaimed and made available for other uses; thus the memory required is always minimised. The penalty in terms of runtime speed is negligible as most of the organisation is done by the compiler. The total declarations throughout the program may in fact be in excess of the memory of the computer, provided it is not all in scope at once. In languages such as Fortran, on the other hand, the nearest one can get to this feature is to work out some cumbersome EQUIVALENCE statements which probably take longer to define than the corresponding Algol declarations. Economy of memory can also be achieved using 'dynamic bounds' on array declarations. The declared bounds can be defined in terms of arithmetic expressions evaluated at runtime on entering a block. These bounds may be different each time the block is entered. The size of the arrays can be chosen to be the smallest that will do the required task. Because memory is allocated dynamically in this way, it is important that programs make no assumptions on entering a block about the initial values of variables declared therein. Between leaving a block and re-entering it, the memory used by such variables may have been re-used for other purposes and indeed, in the case of procedures, the variables need not even occupy the same memory addresses each time the block is entered. If it is important that a particular variable should preserve its value between leaving and entering a block then its declaration should be removed to an outer block such that it remains 'within scope.' The localisation of the scope of variables to that of the block in which they are declared also economises on the use of identifiers. The same names can be declared within several blocks without ambiguity. This also helps when program segments from various sources are combined without leading to major problems with conflicting identifiers. 2.10 Blocks and declarations It has been mentioned already that variables must be declared before being used and that the presence of such declarations makes a compound statement into a block. All declarations must be placed immediately after a BEGIN or another declaration. This does not apply to labels, which are set by placing an identifier terminated by a colon just in front of the statement to be labelled. The form of a declaration of a set of unsubscripted variables of the same type is: Type ident1, ident2, ......, identn; 'Type' may be REAL, INTEGER, or BOOLEAN. The declarations ident1 etc. represent the names of the identifiers. Variables may be used within the block in which they are declared from the point of their declaration up until the corresponding END. Outside this range they do not exist and are said to be 'out of scope.' Procedures and labels may be used before declaration, but the declaration must still be such that they are within scope at the point they are used. Variables and switches must be declared before they are used. This is so that the compiler can distinguish them from functions, which may be used before declaration. This restriction is not made in full Algol, in which variables may be referenced before declaration as long as they are within scope. To each identifier used in a given block there must be a corresponding declaration. It follows that all identifiers declared in a block must be unique. The same identifiers may however be re-used within other blocks. Where an identifier is used within an inner block and also an enclosing outer block, then the inner declaration takes precedence for the duration of the inner block. The outer declaration is effectively 'masked out' but its value is preserved and again becomes accessible on leaving the inner block. Consider the following: BEGIN REAL x,y,p; INTEGER i,p; s1; s2; BEGIN REAL x,z; s3; END END Statements s1 and s2 cannot refer to z because it is out of scope; s3 cannot refer to the first x because the inner declaration will take precedence. When s3 refers to x it will always be the one declared with z, but s3 may refer to y and to i. The identifier p is declared twice within the same block and will generate a compile time error. 2.11 Program layout and style The layout of an Algol program is almost entirely within the hands of the programmer. Such concepts as line number or column number have no meaning in Algol. Layout characters such as new lines, spaces, and tabs are generally ignored except for the following cases: 1. Spaces are significant within strings. 2. Language key words and double character symbols should contain no embedded layout characters, e.g. BEGIN a>=b a:=b and not: BE GIN a> =b a: =b In all other contexts the programmer is free to lay out the program as he wishes. The resulting text may thus range from the very elegant to the totally unintelligible. As intelligent layout requires no additional work and results in programs far easier to follow and modify, users are urged to develop a good layout style which should reflect the block structure inherent in an Algol program. Use tab stops to indent text with BEGIN and END pairs aligned as in the examples given in this manual. Labels should start on a new line to the left of the program statements. Statements are separated by semicolons or language key words and not by new line characters, so that if necessary expressions may extend over several lines of text. Similarly one line may contain several shorter statements. Identifiers should reflect the nature of the quantities they represent; spaces may be included within identifiers if it makes the resulting text more readable. They will be ignored by the compiler. Frequently used loop variables and array subscripts are often most conveniently represented by a single letter identifier, e.g. end of file:=char=&^Z ; total:=total+term; FOR i:=lower STEP interval UNTIL upper DO .... volume:=height*length*width; Comments should he included where they make the workings of the program more understandable and are described later. 2.12 Conditional compilation There are two new Algol basic symbols: CC and EC. CC stands for Conditional Code and EC for End Conditional. By default these symbols and all intervening code are ignored. If the compiler is given the directive D (diagnostics) then the code between the new symbols is compiled. Because the code is scanned as basic symbols and not as characters, it should not contain any unmatched single quote characters. 2.13 Algol program variables 2.14 Data types The data in the computer memory which may be manipulated by the program is either numeric or Boolean. Numeric values may be real or integer. The difference is that integers have no fractional part and occupy 2 bytes of memory, while real quantities are held in exponent and mantissa form and occupy 4 bytes. The Algol system converts numbers from one type to the other whenever necessary. Arithmetic expressions may contain a mixture of real and integer quantities. Numerical and Boolean data may appear within the program as literal values. An integer literal is one which has neither a decimal point nor a decimal exponent part. Examples are 3, -200, +1234. The range of integers is -32768 to 32767. Real numbers contain either a decimal point or a decimal exponent, indicated by E or e (depending upon which convention is in use), or both. Examples are 0.1, -2.345, 1.2E3, 25.7e-7. Real numbers use one byte for the exponent and three for the mantissa, giving a range of magnitudes from approximately E-38 to E+38 and between 6 and 7 decimal digits of precision. Boolean literals are the language key words TRUE and FALSE. An alternative runtime program ARUNL has 32 bit integer operations instead of floating point. 2.15 Arrays Subscripted variables in Algol are known as arrays. Real, integer, and Boolean variables may be subscripted, while byte variables must be subscripted. Like other variables, arrays are declared at the start of a block, REAL ARRAY and ARRAY are equivalent. The declaration of an array with one subscript has the form: ARRAY ident [ae:ae]; The ae's represent arithmetic expressions defining the bounds of the subscript. Either or both bounds may be negative or zero, as long as the second one is not lower than the first. Real ae's will be rounded towards the nearest integer. If there is more than one subscript, the bound pairs are separated by commas; there is no limit to the number of subscripts other than the amount of available memory. If more than one array is to have the same bounds, the bounds need only be specified after the last one. One declaration may contain any number of array names: INTEGER ARRAY ia[1:30,1:5] ; BOOLEAN ARRAY ba1, ba2 [1:n,1:3], ba3 [0:20], ba4, ba5 [1:2*n] ; When arrays are declared with variable bounds, care should be taken that the variables have defined values. This means that they should be declared in an outer block and have been assigned values. This feature is used to partition the available storage according to the data. When an array is used, an arithmetic expression is put in each subscript position. The following are possible statements, using the examples above. Conditional expressions are described later. ia[1,4]:=7; ia [n,m]:=ia[ia[n,2], ia[n,m]] ; Finally a complex example to show what is possible ia[n,3]:=IF ba2[3,1] AND ba3[n] THEN ia[ia[3*n,1], IF ba3[0] THEN 2 ELSE n] ELSE a; 2.16 Array memory layout and bound checking The elements of an array occupy a contiguous region of memory. In multidimensional arrays the last subscript varies most rapidly. For example, consider a two dimensional array declared as a[m:n, p:q] The address of element s[i,j] would be given by an equation of the form base + size * ((i-m)*(q-p+1)+(j-p)) where size is 1, 2 or 4 depending on the type of the array (byte, integer, or real). At runtime a check is made that the address computed lies within the limits allocated for the storage of the array. Note that for multidimensional arrays this does not necessarily mean that all subscripts lie within the declared bounds. For example, the element a[-1,15] would he acceptable for an array declared as a[0:9, 0:9]. Boolean arrays are stored 8 bits per byte. 2.17 Byte arrays The introduction of BYTE arrays in Rogalgol is an extension to the language as defined in the Algol 60 Report. BYTE arrays allow for the efficient use of memory when string manipulation or small integer values are required. Within expressions BYTE array elements are treated as type INTEGER, and may be used in any context where am integer is allowed. Within expressions the contents of a byte array element become the integer value, and the 8 most significant bits of the integer are set to zero. Thus: i:=b[n] ; will always yield a positive value for i in the range 0 to +255. When assigning to a byte array element, the expression on the right hand side is first converted to type integer if necessary, the eight least significant bits of which are then assigned to the byte array element, the 8 most significant bits being discarded without any checking. Thus b[1] := -1; i := b[1] ; will assign to i the value +255. The only time the compiler distinguishes between an INTEGER ARRAY and a BYTE ARRAY is at the declaration. In all other contexts the two may be used interchangeably. In particular, a formal procedure parameter specified as type INTEGER ARRAY or BYTE ARRAY will accept either as the actual parameter, e.g., BEGIN INTEGER ARRAY i[0:100]; BYTE ARRAY b[0:100]; PROCEDURE xx(a); INTEGER ARRAY a; BEGIN ... END; xx(i); xx(b); will be accepted. 2.18 Expressions 2.19 Simple expressions An expression is a section of program which delivers a result. The result may be of type REAL, INTEGER, BOOLEAN, LABEL or (an extension to Algol 60) the address of a procedure. The result of a numerical expression is real unless all the variables and literals within it are integer and it contains no real operators. The real operators are exponentiation (denoted by ^) and real division %. Arithmetic expressions are evaluated with due regard to operator priority, and from left to right where these are equal. Parentheses may be used to change the order of evaluation. The following is a list of tho arithmetic and logical operators together with their priorities, Operator with their priority and meaning (highest priority) ^ 3 exponentiation * 2 multiplication / 2 real division % 2 integer division MOD 2 integer modulus + 1 addition - 1 subtraction MASK 1 logical AND DIFFER 1 logical EXCLUSIVE OR ! 1 logical OR (lowest priority) The operators MOD, MASK, DIFFER, and ! are additional to those defined in the Algol 60 Report. The operators %, MOD, MASK, DIFFER, and ! take two integer operands and deliver an integer result. The result of integer division % is truncated towards zero. The result of integer modulus (MOD) is the remainder lost by integer division. Note that i MOD 0 will always return the value zero while i%0 or x/0 will give a division by zero runtime error. The logical operators MASK, DIFFER and | consider each of the two integer arguments as a pattern of 16 bits thus: 3 ! 5 = 7 3 MASK 5 = 1 3 DIFFER 5 = 6 Apart from the cases just discussed, expressions may contain any mixture of real and integer quantities, and conversion between types will occur automatically as context dictates. For example BEGIN REAL x,y; INTEGER i,j; ... i:=x*y; x:=x*i; i:=x^y; i:=x^i; x:=i%j; are all valid operations. In conversion from real to integer the result is rounded towards the nearest integer value. Boolean expressions are made up of Boolean variables, the literals TRUE and FALSE, arithmetic relations, Boolean procedures, and Boolean operators. The Boolean operators are in order of precedence from highest to lowest priorities. NOT NOT b: Is FALSE if b is true and TRUE if b is FALSE. AND b AND c: Is TRUE if both b and c are TRUE otherwise it is FALSE. OR b OR c: Is TRUE if either b or 0 is TRUE (regardless of the other) otherwise it is FALSE. IMPLIES b IMPLIES c: Is FALSE only if b is TRUE and a is FALSE, otherwise it is TRUE. EQUIVALENT b EQUIVALENT c: Is TRUE if b and c have the same truth value and FALSE otherwise. The relational operators are: = equals > greater than >= greater than or equal < less than <= less than or equal # not equal Relational operators are dyadic. Care should be taken with the use of = and # where either argument is of type real. In this case am exact equal occurs only if both arguments have exactly the same bit pattern, which in the context of real quantities involving rounding may not be very meaningful. A more sensible test to make may be to check whether the absolute difference is less than (or greater than) some small quantity, e.g. replace IF x=y THEN .... with IF abs(x-y) < 0.0001 THEN As with arithmetic expressions, the order of evaluation may be changed by the use of parentheses to group terms. Examples of Boolean expressions are: NOT x < 5 OR bv1 AND (y = 0 OR bv2) x # (y - 5) +(x-5)^2 <= 20 where x and y are numeric variables or procedures and bv1 and bv2 are Boolean variables or procedures. Note the plus sign before the parentheses in the last example. If this had been absent, the compiler would have assumed that the bracket enclosed a Boolean expression and would have indicated an error on finding the closing bracket instead of the expected relational operator. In the first example the brackets do enclose a Boolean expression. (This way of forcing the compiler to recognize a bracketed arithmetic expression within a Boolean expression is not necessary in full Algol 60). The Rogalgol compiler can deal with expressions such as x - 5 >= 20 correctly because x must be numeric. Similarly, in the second example, the expression after the # must be numeric, so the compiler does not need a plus sign. In this case the brackets are not essential either. 2.20 Conditional expressions A conditional expression is one that takes one of several values depending on the result of one or more Boolean expressions. The general form is: IF be THEN se ELSE e where be stands for Boolean expression (which may itself be conditional), se stands for simple expression, which must not be conditional, and e for any expression. Because e may also be conditional the form can be extended: IF be THEN se ELSE IF be THEN se ELSE e The expressions must all be numeric, all Boolean or all designational (these are described later). A conditional expression is made unconditional by enclosing it in round brackets. The following is legal Algol: IF IF bv1 THEN x#3 ELSE y=0 THEN (IF bv2 THEN 25 ELSE 30) ELSE x+y The first be is conditional. The ae (arithmetic expression) within brackets is conditional and would not be allowed in that context without its brackets. It is a general rule of Algol that IF must not follow immediately after THEN. This is because it can result in ambiguous code. Examples of conditional expressions are: a:=IF a>0 THEN a*a ELSE 0; large:=IF a>b THEN a ELSE b; max:=IF a>=b AND a>=c THEN a ELSE IF b>=a AND b>=c THEN b ELSE c; a:=IF IF x>0 THEN y>0 ELSE y<50 THEN 3*x ELSE 0; 2.21 Statements 2.22 Conditional statements These have the same form as conditional expressions, except that it is not necessary for there to be an ELSE part. There are consequently two forms of conditional statement: IF be THEN s1 IF be THEN s1 ELSE s2 The statement s1 must not be conditional but s2 may be. In the first case no statements are obeyed if the be delivers a FALSE result. In the second case s1 is obeyed if the result is TRUE; otherwise s2 is obeyed. As s2 may be conditional this form can be extended indefinitely: IF be1 THEN s1 ELSE IF be2 THEN s2 ELSE s3 Just as expressions are made unconditional by enclosing them in round brackets, so statements are unconditional if they are enclosed in the brackets BEGIN and END. Examples of conditional statements are: IF a>0 THEN sum:=sum+a; IF char=&^Z THEN close(dev); IF samp>max THEN max:=samp ELSE IF sampb AND c>d THEN BEGIN ... END ELSE BEGIN ... END; 2.23 Assignment statements The general form is: variable:=expression The variable on the left hand side is assigned the value of the expression on the right. Note that in Algol the assignment operator is the double character symbol ':=' and not the equal sign '=' which is a relational operator. If the variable is Boolean, the expression must also be Boolean. If the variable is numeric, the expression may deliver an integer or a real result; it will he converted to the type of the variable if necessary, real numbers being rounded towards the nearest integer. Examples: i := 3+j ; x := IF be THEN J%5 ELSE 36.75; bv := i%7 The full Algol-60 multiple assignment syntax is supported. All array subscripts are evaluated left to right first, then the value of the right hand side (after the last ':=') and finally this value is assigned to all the variables to the left of a ':='. All variables assigned to must be of the same type: INTEGER, BOOLEAN or REAL. Examples: a := array[i,j] := b := c := (100 * x); bool1 := bool2 := i = j; 2.24 FOR statements A FOR statement allows the repeated execution of a statement with different values of a variable known as the controlled variable. The general form is: FOR variable:=fle, fle, ... fle DO s1 The statement s1 may be conditional. The controlled variable must be real or integer and not subscripted. (In full Algol 60 subscripted variables are allowed.) fle stands for "for list element". Lists containing only one element are allowed. There are three types of for list element: (1) an arithmetic expression (2) a STEP element and (3) a WHILE element. A STEP element has the form 'ae STEP ae2 UNTIL ae3'. After each execution of the controlled statement the value ae2 is added to the variable. Before each execution of s1, including the first, the variable is tested against ae3. If it is greater than ae3 and ae2 is positive, or less than ae3 when ae2 is negative, then the element is said to be exhausted. It should be noted that ae2 and ae3 are evaluated each time they are used, so that the value may be changed by the execution of the controlled statement. A STEP element may result in the statement not being executed at all, far example if ae>ae3 and ae2>0. A WHILE element has the form 'ae WHILE be'. On each iteration the arithmetic expression is evaluated and assigned to the variable. The Boolean expression is then evaluated and if the result is TRUE the statement is executed, otherwise the element is exhausted. Examples of FOR statements are; FOR i:=min STEP 1 UNTIL max DO sum:=sum+s[i]; FOR i;=1 STEP i UNTIL 1024 DO .... FOR i:=1, 3, 99, j, -6, 11 DO .... FOR x:=1, x*2 WHILE x<1O25 DO .... FOR i:=100 STEP -1 UNTIL -100 DO FOR x:=0.1, 1, x*5 WHILE x<1000, 20 STEP -5 UNTIL 0 DO.. FOR loops can be nested as deeply as desired. For example: FOR 1:=1 STEP 1 UNTIL max DO FOR j:=1 STEP 1 UNTIL i DO a:=a+b[i+j]^2; Matrix multiplication might look like: FOR i:=1 STEP 1 UNTIL m DO FOR j:=1 STEP 1 UNTIL n DO BEGIN x:=0; FOR k:=i STEP I UNTIL p DO x:=x+a[i,k]*b[[k,4l; c[i,i]:=x; END; The body of the FOR loop may be a dummy statement, for example to skip to the start of a new line. FOR i:=chin(dev) WHILE i#*C DO ; The loop variable may also be a dummy; FOR i:=0 WHILE test DO body ; In this case body may be a procedure or block which sets a Boolean variable test, or test could itself be a Boolean procedure. It would be preferable to use a WHILE statement in this case. FOR i:=body1 WHILE test DO body2 ; In this example body1 could be an integer or real procedure, test a Boolean procedure, and body2 a procedure or black, giving several possibilities for loop construction. A FOR loop is not a simple statement and cannot be called following the THEN part of a conditional statement unless enclosed within a BEGIN and END. It may however follow the ELSE clause without needing an enclosing BEGIN and END. 2.25 The CASE statement This allows the execution of just one of a number of statements, depending on the value of an arithmetic expression. The statements are labelled by one or more positive integer labels. The result of the selecting arithmetic expression is rounded down to an integer. The statement is terminated by an END, or by ELSE ;. If none of the statements is selected and the terminator is END, program execution continues with the next statement; if terminated by ELSE then the is obeyed. Note that there is no ';' before the END or ELSE. Example: CASE i+j OF 0: x := 3.4 ; 3:4: BEGIN x := 3; y := 4; END; 10: y := 10 END end comment; CASE n OF 2: GOTO fred; 1234: GOTO jim ELSE GOTO bert; 2.26 The WHILE statement The syntax is: WHILE DO ; The may be any type. The Boolean expression is first evaluated and if true the statement is executed. This process is repeated until the Boolean expression is false, when program execution continues at the next statement. 2.27 The REPEAT statement The syntax is: REPEAT (; ) UNTIL ; The syntax is not usual for Algol; it has been made identical to the Pascal REPEAT statement. (Algol would require a BEGAN after the REPEAT and an END before the UNTIL if more than one statement were to be repeated.) The statement or statements are obeyed at least once. They are obeyed repetitively until the Boolean expression is true, when execution continues with the next statement. 2.28 Dummy statements A dummy statement is one in which there is nothing before the terminating END, ELSE or semi-colon. Examples are: BEGIN END IF be THEN ; ; ; BEGIN s1; END IF be THEN ELSE ; PROCEDURE dummy; ; 2.29 Comments Rogalgol allows three types of comment. Any symbols appearing after an END until the first occurrence of semicolon, FINISH, END, or ELSE are ignored. These are known as END comments, e.g. END this is ignored; END so is this ELSE END and this also FINISH The other form is COMMENT any sequence not containing semicolon; This form is allowed after a semi-colon or after a BEGIN. Within comments single quotes must be matched. An alternative to using the key word COMMENT is to enclose text within braces. This is an extension to the Algol 60 Report. Such comments may contain embedded matching braces, or embedded unmatched single quotes. This form of comment may be used anywhere that COMMENT may be used. e.g. {this {comment} is ignored} Full Algol 60 also allows an additional type of comment within procedure calls and declarations. 2.30 Labels, switches and GOTO statements Any statement may be labelled by preceding it with an identifier and a colon. The scope of the label is the block in which it occurs. Program control is transferred to a labelled statement by a GOTO statement. BEGIN REAL x; s1; s2; GOTO lab; s3; lab: s4 END The following is not allowed because the label is not within scope: GOTO lab; BEGIN REAL x; s1; s2; lab: s3 END Labels in an outer block may however be accessed from within an inner block, e.g. BEGIN REAL q; lab: s1; s2; BEGIN REAL y; s3; GOTO lab; END END It should be noted that a compound statement does not become a block because there is a labelled statement within it. The second example would have been allowed but for the declaration of x. For the same reason labels in different compound statements but within the same block must have different names. A switch is a list of labels declared at the start of a block. All the labels must be within scope at the declaration. SWITCH s:=lab1, lab2. lab3; The simplest use of a switch is in a GOTO statement GOTO s[ae] The ae is evaluated and is used as an index to the list of labels in the declaration. If for example the ae has the value 2, the effect of the statement is the same as GOTO Lab2. If the value of ac is either less than 1 or greater than the number of labels in the declaration, then the effect is that the statement is treated as a dummy statement. Example: BEGIN SWITCH sw:=case1, case2, case3; try: text(1,"casenunber="); GOTO sw[chin(1)-&0] ; text(1,"*Nillegal value"); GOTO try; case1: .... case2: .... case3: .... 2.31 Designational expressions Designational expressions are like arithmetic or Boolean expressions. In a designational expression the elements may be labels or switch elements. The full definition of a GOTO statement is: GOTO de where de stands for designational expression. An example: GOTO IF x=0 THEN lab1 ELSE IF b THEN s[i+3] ELSE lab2 Designational expressions may also result in the address of a procedure, as will be described in the section on procedure parameters. This is an extension of Algol 60, in which designational expressions are only allowed for labels. 2.32 Procedures 2.33 Standard functions sin(x) x is in radians cos(x) x is in radians arctan(x) the result is in radians in the range -pi/2 to +pi/2. ln(x) natural logarithm exp(x) e to the power x sqrt(x) square root of x abs(x) absolute value of x sign(x) delivers -1, 0, or +1 according to whether x is negative, zero or positive. entier(x) returns the largest integer less or equal to x. Thus if x = 3.3 the result is 3. If x = -3.3 the result is -4. Note: As the result is integer the value of x must lie within the valid integer range of -32768 to 32767. In each of the above procedures x is called by VALUE and thus the actual parameter may be an expression. 2.34 Operating system identification INTEGER PROCEDURE opsysidcode(n) is a new pre-declared identifier. If n = 0 the value returned is as follows 1 if running under CP/M-80 2 if running under CP/M-86 3 if running under MSDOS or PC-DOS If n is non-zero the version number word is returned. This word has a different format under CP/M or MSDOS. 2.35 Procedures without parameters A procedure is a statement which is declared at the start of a block, but it is not executed when the block is entered. It is given an identifier and the appearance of its name causes the statement to be executed. The simplest type of procedure has no parameters and does not deliver a result. BEGIN PROCEDURE dothis; stat; s1; s2; dothis; s3; dothis END The statement stat is executed when dothis appears in the program. It is known as the body of the procedure. Even if stat has the form of a compound statement or a single statement, it is treated as a block. This is to prevent GOTO statements from leading into a procedure which is not active. A procedure may deliver a result of type REAL, INTEGER, or BOOLEAN. Such a procedure is known as a 'type procedure' or function. Its name can then be used in expressions, which will cause the procedure to be executed and the result to be used in evaluating the expression. Within the procedure body the value which will be returned is set by assigning it to the name of the procedure. Such an assignment statement may occur anywhere within the procedure body and there can be any number of them. Execution of the procedure continues until either the end is reached or a GOTO leads out of it. If the name of the procedure occurs within the procedure body itself, except on the left of an assignment as just explained, then the procedure will call itself and is said to be recursive. The following example illustrates several points: BEGIN INTEGER i; INTEGER PROCEDURE j; IF i<0 THEN GOTO nogood ELSE IF i=1 THEN j:=0 ELSE BEGIN i:=i-1; j:=j+1 END procedure j; i:=10; i:=j; nogood: END FINISH The procedure refers to variable i which is declared in the main program in the same block as the procedure. The declaration of i must come first or the compiler would have assumed that it referred to an as yet undeclared procedure of type Boolean (lines 3 and 4) or integer (line 6). Line 5 would have failed because the identifier to the left of := must be already declared. There must always be some condition which causes a recursive procedure to deliver a result or exit without recursing, as on lines 3 and 4. If this had not been done, or if i had not been decremented on line 5, the procedure would have called itself until the available storage was used up. The label nogood is attached to a dummy statement. Note also that the body of the procedure does not have to be enclosed by BEGIN and END; in this case it is a conditional statement. 2.36 Procedures with parameters The action of a procedure can be made to depend upon data supplied to it through a list of parameters at the time it is called. The procedure declaration contains a list of formal parameters. These are the names which are used within the body of the procedure. The type of each formal parameter is given in a specification, which looks rather like a set of unsubscripted variable declarations. The list of formal parameters is enclosed in round brackets and is placed immediately after the name of the procedure. The identifiers are separated by commas. Only the names are given, not subscripts or procedure parameters. For example: REAL PROCEDURE p(x,y,a,r,lab); VALUE y; REAL x,y; REAL ARRAY a; REAL PROCEDURE r; LABEL lab; In full Algol 60 a more complicated type of parameter separator (the 'fat comma') is also allowed. 2.37 Numeric and Boolean parameters by value This is the simplest type of parameter. When the procedure is called the actual parameters are evaluated and the value is passed to the procedure. Within the procedure body a parameter called by value acts in every way like a variable declared within the procedure, except that it is assigned an initial value when the procedure is entered. The value may be changed within the procedure but the new value is not accessible once the procedure has finished. PROCEDURE p(i,x,b); VALUE x,b,i; REAL x; BOOLEAN b; INTEGER i; BEGIN IF b THEN s:=x+i ELSE a:=x-i; x:=2*i; a:=x+i; END The variable 'a' has been previously declared outside the procedure. Note that the VALUE specification must come before the part specifying the types of the parameters. The Rogalgol system converts between INTEGER and REAL if the types of the actual and formal parameters are not the same. No other type conversions are allowed. A possible call of this procedure is: p(5.32, 2.5*y, z>0) where x and z are numeric variables, the last parameter is Boolean by value. It is a restriction is that a Boolean expression or variable should not be enclosed in brackets when used as a procedure parameter. The brackets are redundant, so the restriction is not a real one. This is due to the compiler making only one pass through the source code. 2.38 Variables called by name Any formal parameter which is not specified to be VALUE is said to be called by NAME. Instead of a value being passed to the procedure, the address of the variable is transmitted. It follows that the actual parameter must be the name of a variable of the correct type. (In full Algol 60 an expression is allowed and the address of a routine to evaluate it is transmitted.) When a variable called by name is assigned to within the procedure body the variable specified in the call is changed. Thus, variables called by name do not act like locally declared variables. The formal name stands for the actual name used in the call. The actual parameter is brought within the scope of the procedure body. In Rogalgol, array parameters must be called by name. (The full Algol 60 call of array names by value involves making a local copy of the whole array.) The actual parameter is the name of the array, without subscripts. Within the procedure body the formal array name is used with subscripts. There must be the same number of subscripts as in the original declaration of the array whose name was used as a parameter. Array elements (an array name followed by subscripts) may be used within or as an actual parameter, but only when the formal parameter is by value. This compiler accepts the use of BYTE ARRAY and INTEGER ARRAY interchangeably in procedure calls. (See the section on byte arrays). Objects called by name may be passed on from procedure to procedure through the parameter list. Unsubscripted variables called by name may be used as the controlled variable in a FOR statement. BEGIN ARRAY ar[1:20]; REAL x; INTEGER i; REAL PROCEDURE rp: rp:=i*2+2; PROCEDURE p(a,k,z); VALUE a; REAL a; INTEGER k; ARRAY z; FOR k:=1,2 DO z[k]:=rp*a; ar[1]:=10; p(ar[l],i,ar) ; END FINISH When procedure p is called, the value of the parameter a is initialised to be 10.0 and k within the procedure becomes equivalent to i in the main program. When rp is called for the first time, i has the value 1 so the result of tire expression which is assigned to rp is 3.0. The value of 3.0*10.0 is assigned to z[1], which means that ar[i] becomes 30.0. Next time, when i=2, tire value of rp is 6.0 but a still has the value 10.0 so the effect of the statement is ar[2]:=60.0. 2.39 String and switch procedure parameters When the formal parameter is a string, the actual parameter may be either a string of characters enclosed in double quote marks or, if the call is within a procedure having a string parameter, the name of such a string. String parameters cam only be used as actual parameters in further procedure calls, so it follows that the information in the string (as opposed to the address of the string, which is the information transmitted to the called procedure) can be used only by machine code called from within the procedure. This may be done by using the pre-declared procedure text(device,string), or string handling procedures in ALIB.ALG e.g. PROCEDURE moan(message,num); VALUE num; INTEGER num; STRING message; BEGIN text(1,"*NError at line "); write(1,num); text(1,message); END; The use of switch parameters is straightforward. The actual parameter is the name of a switch. When this name is used within the procedure body (with a subscript) the effect is as if it had been used in the block in which the switch was declared. However, in Rogalgol (but not in full Algol 60) if the execution of a parametric switch leads to within a procedure which has been called recursively, then the return is to the most recent call of the recursive procedure. If this consideration is important, the Rogalgol programmer should use several label parameters instead of a switch. Rogalgol finds the correct incarnation of a recursive procedure if it is jumped into through a label parameter. A procedure has been called recursively if there is more than one call in force. If A calls B, B calls C, and C calls A, then A has been used recursively. In practice this restriction is unlikely to prove to be a limitation. 2.40 Labels and procedures as parameters The treatment of parameters of type LABEL, PROCEDURE, REAL PROCEDURE, INTEGER PROCEDURE and BOOLEAN PROCEDURE is similar. The actual parameter is a designational expression which in Rogalgol (but not in full Algol 60) must be preceded by a type specification. This requirement is included so that the one-pass Rogalgol compiler may allow as yet undeclared procedures and labels to be used in procedure actual parameters. As with variables called by name, arrays, and switches, it is the address of the label or procedure which is passed to the procedure being called. The actual parameter must be in scope at the point of call but need not be within the scope of the called procedure; its use as a parameter effectively brings it within scope. If a procedure body contains a label declared with the same identifier as a formal parameter of the procedure, then the formal parameter will take precedence until after the declaration of the label. (In full Algol 60 the block structure always determines the order of precedence.) The pre-declared functions and input/output names cannot be used as procedure parameters, where the formal parameter is a procedure. A dummy procedure which calls the pre-declared one must be used. This is because the pre-declared procedures are not treated in the same way as those declared by the user, in order to shorten compiled programs and to increase the speed of execution. (In full Algol 60 the pre-declared procedure names can be used in this way.) The pre-declared procedures can of course be used in expressions where the formal parameter is a value parameter; sin(cos(3)), for example, is allowed. The Rogalgol compiler does not differentiate between name and value calls of parameters which are switches, strings, labels, and procedures. Where the actual parameter is a designational expression (only allowed for labels and procedure types) the value is calculated on procedure entry only, and not each time the parameter is used within the procedure body. The calls of all these parameters are therefore by value, although the compiler does not force the user to specify this. To illustrate these points, suppose that two procedures have the following headings: PROCEDURE p(s,lab,rp,at); SWITCH s; STRING st; LABEL lab; REAL PROCEDURE rp; REAL PROCEDURE x(y,st); VALUE y; REAL y; STRING st; A possible call] is p(sw, LABEL IF be THEN labl ELSE lab2, REAL PROCEDURE x ,"abc") A designational expression has been used as an actual parameter of type LABEL. As with arrays and switches only the name of the real procedure is used as a parameter. The parameters of the parametric procedure are included when the procedure is actually called, and not otherwise. A possible call of x within the body if p is: rp (rp(3,"DEF"),st) 2.41 Summary of points on procedures On entering a procedure the memory required is allocated dynamically according to the declarations. It follows that procedures are intrinsically recursive in nature, the limit on the depth of recursion being set by the available memory. The body of a procedure is a 'statement'; this may range from a simple (even dummy) statement to a compound statement or block. Within such a block there may of course be further procedure declarations, so that the following is a valid structure. PROCEDURE tom; BEGIN PROCEDURE dick; BEGIN PROCEDURE harry; BEGIN s1; ... END; s2; ... END; s3; ... END; The scope of these procedures follows the normal rules of scoping, so that statements s1 and s2 may refer to tom, dick or harry; statement s3 may refer to tom and dick but not harry. Statements within a procedure may make reference to any variable that is 'in scope', not just those passed through the parameter list. In Fortran a COMMON statement would be necessary. It is also possible to jump out of a procedure by means of a GOTO statement to any label that is within scope. There is a problem regarding the scoping of procedures in the case where the user declares a procedure of the same name as one of the pre-declared ones. This results from the fact that these pre-declared functions are compiled differently from those declared by the user, to make them faster and to economise on memory. BEGIN INTEGER i; PROCEDURE abc; BEGIN ... z:=sin(y); ... END; REAL PROCEDURE sin(x); VALUE x; REAL x; BEGIN ... END ; END FINISH According to the Algol 60 Report the scope of the two procedures 'abc' and 'sin' is the block in which they are declared. The statement 'z:=sin(y)' in the first procedure is referring to the second procedure in the block. In this compiler however the statement will generate code corresponding to a built-in procedure identifier that it already knows about. No error message is given. The problem could be avoided in this case by simply reversing the order of the two procedures, or better still changing the name so that no ambiguity can exist. There is no problem with procedures having names different from the built-in ones. As a general rule all procedure names are best kept unique amongst themselves and also from the variables. 2.42 Differences from the Algol 60 report 2.43 Differences from the Algol 60 report The Rogalgol language is Algol 60 with a few restrictions. Some of these are a result of the one-pass nature of the compiler. For example, variables must be declared before use. In other possible ambiguous situations that a multi-pass compiler could resolve, this compiler may require a 'clue' as to the type of object being translated. These differences are described in the following sections. A number of extensions to the language have been introduced. These include the data type BYTE ARRAY, logical and MOD operators, and a significant number of functions. Appendix 3 gives a summary of the differences from the Algol 60 Report. This manual describes the syntax of the language. The distribution kit includes a number of example programs which are described later. Users new to structured programming may find it beneficial to refer to an introductory manual on the language and also to the Algol 60 Report. 2.44 Language restrictions No OWN variables. No integer labels except in the CASE statement. Variables must be declared before use. Call by NAME is restricted to the case where the actual parameter is a variable name, i.e. as per call by reference in Fortran or call by location in CORAL 66. Array parameters must be called by name. The controlled variable in a FOR statement must not be subscripted. The 'fat comma' is not implemented. Boolean procedure parameters may not be enclosed entirely in round brackets. An opening bracket enclosing an arithmetic expression at the start of a Boolean expression must be preceeded by a plus or minus sign. 2.45 Language extensions Data type: BYTE ARRAY. Operators: MOD, !, DIFFER, MASK. Comments may be enclosed within matching braces e.g. {like this}. Procedure names may be the result of designational expressions. The CASE, WHILE and REPEAT statements. Additional functions include string handling, direct disk i/o, block move, clear array, etc. 2.46 The input/output mechanism 2.47 Stream or device numbers In Algol the exact form of Input/Output is not strictly defined but left up to the implementer to make best use of whatever facilities are available. Input/Output takes place through a series of procedures built into the runtime system. Input/Output is device independent and is associated with a stream or device number, which is the first parameter of all the built in input/output procedures. These numbers are in turn associated with a device name or a file name. The usual way of obtaining the stream numbers is to parse a command line with the procedures INPUT and OUTPUT. For disc files the number is 64 or greater and is an index to a CP/M file control block. Non file-structured devices such as the console and printer have lower numbers and Input/Output can be performed by simply choosing the appropriate device number. In the case of disk files some dialogue is necessary to open or create a specified named file. In this case the corresponding stream numbers are allocated dynamically by the system. The procedures to perform this dialogue are described in a later section. In the case of disk files we also have the choice of serial or random access. The command line parser maps all other device names to a device number, the one you can also use directly in the i/o procedures. The parser makes an additional check that the named device is suitable for the operation (input or output) requested. For example, the command line CON:=LST: is in error as LST: cannot be an input device. This will give runtime error 27. 2.48 ALGOL fixed stream numbers 0 Dummy. Output is thrown away, input always returns ^Z. 1 Output goes direct to the console screen via CP/M function 2. Input comes from the console keyboard one character at a time, but is checked for (a) ^C which causes a warm boot and (b) carriage return which causes a linefeed to be echoed as well. CP/M function 1 is used which echoes all characters typed. 2 PUN: on output and RDR: on input, AUX under MSDOS. Uses CP/M functions 3 and 4. 3 On output this goes to LST:, PRN under MSDOS, via CP/M function 5. Used for input it reads the keyboard without echo via CP/M function 6. Zero is returned If no character is waiting. Under CP/M V1.x ^Z is always returned. 4 This Is nearly the same as device 1. The only difference is that on input there is no check for ^C or linefeed. 5 The CP/M reader and punch devices via CP/M functions 3 and 4. The same as device 2. 6 On output the CP/M list device via CP/M function 5. On input the current printer column number is returned. 7 Buffered console input with echo. The buffer may be filled either from the keyboard or by using output device 7. When input is obtained from the keyboard (for example READ (7)) CP/M function 10 is used. This means that the CP/M line editing is in operation and nothing can be returned to the program until RETURN is typed. Linefeed is sent to the program as well. You can force input to begin afresh by wiping out the buffer contents before reading from device 7. When used for output, device 7 writes characters to the input buffer. 8 Used for output it sets the printer column count. Input returns ^Z always. 9 Dummy (as device 0). 10 I/O directly to memory. The device numbers 1-7 use calls to CP/M or MSDOS system calls. No BIOS calls are made. 2.49 Printer position on streams 3 and 6 When output is sent to either of these printer streams a record of the carriage position is kept. Tabs are assumed to be at intervals of 8 columns. If the printer uses a different interval the position returned will be incorrect if any actual tab characters have been sent. A count is kept of the column number currently under the print head using the following logic. All characters sent which have the value 32 (space) or greater increment the count. All lower value codes are ignored except as follows: 8 (backspace) decrements the count. 9 (tab) steps on to the next multiple of 8. 12 and 13 (form feed and carriage return) zero the count. The current count is obtained by CHIN(6). Some printers accept escape sequences containing printing characters but which do not move the carriage. To correct the count after such a sequence is sent CHOUT(8,N) may be used. The column count is set to N. 2.50 Device names in command Lines When a command line is parsed by the procedure INPUT or OUTPUT, device names are converted to the device numbers previously listed according to the following table. Name Device Operations NL: 0 Input, output CON: 4 Input, output RDR: 5 Input PUN: 5 Output LST: 6 Output TI: 1 Input VT: 1 Output TTY: 2 Input, output LP: 3 Output TIB: 7 Input KBD: 3 Input (CP/M version 1 returns a negative number) A: to P: are discs and require a filename. Only A: to D: are allowed under CP/M version 1. File names may contain a drive letter, but no path. Switch options recognized by the parser are: B Block i/o (random access). M Modify access (random access write). 2.51 Creating a command line for I/O streams The Rogalgol system allows the user the ability to select input/output files or devices from within the program or from the console keyboard. For this purpose there exists a buffer into which a command line containing I/O selections is placed. The placement is achieved either through I/O stream number 7, or through a number of calls to ioc() which create a console prompt. The basic sequence of events consists of: 1. Place an I/O selection string into the buffer. 2. Call a command string interpreter to read the contents of the buffer and copy the string into an 'input list' and/or 'output list' as appropriate. 3. A call of predeclared procedures 'input' or 'output' reads the next entry in the 'input list' or 'output list' and returns to the program the appropriate stream number, having opened or created any necessary files. Input from stream 7 is buffered and only made available to the program when a carriage return character is entered. Incorrect characters can he removed using the rubout key. There are two pointers associated with stream 7, one with input and the other output. As characters are entered or read from the buffer the appropriate pointer is advanced by 1. These pointers may be reset using the following ioc calls. ioc(0) Reset the input pointer. The next call of chin(7) will return the value of the first character in the buffer. ioc(1) Reset output pointer and write a string terminator into the first buffer position. The next call of chout(7,char) will place the value of char into the first position of the buffer and advance the position of the string terminator. Note that following the use of ioc(2) through ioc(5) described below, before reading from stream 7 the programmer should issue both an ioc(0) and an ioc(1) to reset the pointer and wipe out the current buffer contents. ioc(2) This produces a prompt on the console of the form: OUT=IN? The user than enters a command string of the general form: outputlist=inputlist When the carriage return character is given to terminate the command line the command string interpreter is called. Every character up to the separating equal sign (or carriage return if no equal sign is present) is copied and stored as the current 'output list' and everything after the equal sign is copied and stored as the current 'input list'. A pointer is associated with each of these lists and if a new input or output list entry is found then the corresponding pointer is reset to the start of that list. The detailed form of these lists is described later. ioc(3) This is similar to ioc(2) but the text is taken directly from the contents of the buffer without any user prompt. A typical calling sequence to set up an input/output list might be: ioc(1); text(7,"outputlist=inputlist"); ioc(3); ioc(4) This produces a prompt on the console of the form: INPUT= The user then enters a command string of the general form: inputlist This string then becomes the current 'input list', the output list remaining unchanged. ioc(5) This is similar to ioc(4) but the text is taken directly from the contents of the buffer without any user prompt. A typical calling sequence might be: ioc(1); text(7,"inputlist"); ioc(5); Note: A call of ioc(3) or ioc(5) leaves the contents of the buffer unaffected. The same string may if desired be parsed twice to set up both input and output files of the same names. This is done within the compiler to select its input and output. The general form of the input and output lists consists of a sequence of one or more device or file specifications separated by commas e.g. CON:,A:OUT1,,LST:=DATA.DAT[B],RDR: In the above example 4 output channels and 2 input channels are specified. A call of the pre-declared procedures input or output (described later) will scan the appropriate list from the current position up to the next occurrence of a comma or end of list indicator. A stream number will be returned corresponding to the entry found. A CP/M file specification is of the general form: FILENAME.EXT The characters recognized within file names are letters, digits, "$" and '?'; the latter should be reserved for specifying ambiguous file names. Lower case letters are converted to upper case as per the normal CP/M convention. All characters less than apace (ASCII 0 to 32) are ignored within I/O lists. The FILENAME consists of from I to 8 characters. The file extension '.EXT' if present consists of from 1 to 3 characters. If no extension is given a default value will be assumed; this is initially set to three spaces. The method of changing the default file extension is described under library procedure 'swlist' in the section 'Input/Output directly to or from memory'. It is possible to force the use of the default file extension regardless of what is given by the call ioc(20) In order to return to the default situation where a specified file extension takes precedence call ioc(21) consists of one of A:, B:, C;, D: or may be omitted. When used with CP/M version 2 disk drive names extending from A: to P: are accepted. If omitted a default is assumed according to the following rules. At the start of each line the assumed drive is the 'logged on drive' when the program is first entered. Any subsequent drive specified within the list then becomes the default for following entries. Switch options may be added to any input/output device or file specification and consists of a series of up to 12 characters enclosed within square brackets. Lower case letters are converted to upper case. Switch options must not contain a comma or equal sign. Certain switches are recognized by the runtime system and acted upon; in the example given above the input file DATA.DAT[[B] the switch [B] causes the file to be opened for 'random access' reading. Other switches not used by the system may be used by the program. A facility exists for the program to read the switch list directly. The occurrence of two adjacent commas within an I/O list is equivalent to specifying the 'null' input/output device NL: (stream 0). 2.52 Parsing the I/O command line The ioc calls described in the previous topic will have set up input/output lists. These lists may now be used to assign files or devices through the predeclared procedures INPUT and OUTPUT. dev:=input; will read the next entry in the 'input list'. If the entry is found to be a device then dev will be assigned a value corresponding to that device name. If a disk file was specified then that file will be opened. A buffer region will be allocated to contain the file control block and sector buffer (if serial access). The stream number returned will be from 64 upwards, the actual value indicating which buffer is allocated to that file. A negative value for dev indicates an error, e.g. bad syntax, no entry found in input list or no file found of that name. dev:=output; Similar to input but for output files or devices. A number of options exist regarding what action is to be taken if an output file name specified is found already to exist. These options are selected by calls to ioc which set the appropriate flags within the runtime system. The first is the default case. ioc(13) No checks are made. A second file of the same name will be created. A problem may be encountered later on trying to access such files. ioc(14) The existing file of the name specified will be deleted before the new file is created. ioc(15) If a file name is found already to exist, the call of output will return a stream number of -100. No new file is created. 2.53 Input/output directly to or from memory As an aid to text processing and related manipulation, e.g. setting up file extensions or reading the switch list a facility exists to read or write using the standard input output routines directly to or from anywhere in memory. Such i/o is associated with stream number 10. A number of string handling routines relevant to the following are described in the section on "library procedures'. Before i/o can be performed via memory it is necessary for the user to set up pointers to where input/output is to occur. As each character is read/written the corresponding pointer is advanced by one. The following procedures to manipulate these pointers are in ALIB.ALG. seti(a) Set the INPUT pointer to the address a. seto(a) Set the OUTPUT pointer to the address a. In practice a call of location would probably have been used to find the address. In order to find the current values of the input/output pointers: i:=ipoint Returns in i the current address of the input pointer. i:=opoint Returns in i the current address of the output pointer. A typical sequence might be; BEGIN BYTE ARRAY buf[0:1000]; seto(location(buf[0]); seti(location(buf[0]); rwrite(l0,x,0,6); i:=opoint; x:-read(10); It is the user's responsibility to ensure that such I/O stays within the declared bounds of the array buffer used. 2.54 Switch lists on I/O selections The memory I/O feature described in the previous topic is used to gain access to the switch lists associated with I/O streams. i:=swlist Returns in i the address of the switch list. The user can check if any switch options have been specified following a call of "input" or "output" by reading the contents of this switch list. These switches (a maximum of 12 characters) can be read using input stream 10. A typical sequence might be: seti(swlist); i:=chin(10); The first switch is now in i. The list is terminated with a zero value. The switch list always contains information relevant to the most recent call of the procedures "input" or "output". The default file extension is stored in the 3 bytes following the switch list. This can conveniently be set up by writing 3 (and only 3) characters into the appropriate buffer by means of output to stream 10, e.g. seto(swlist+13); text (10, "XYZ"); This sequence will set the default file extension to XYZ. On entry the default extension is set to null, i.e. 3 spaces. This technique can also be used as a way of reading small quantities of data in a manner similar to the DATA statement of BASIC, e.g. seti(sloc("1.32 99.6 ... ")) ; FOR i:= 1 STEP I UNTIL 20 DO x[i] := read(10); The procedure sloc is described in the section on library procedures. Another example involving text can be found in the program VDU.ALG on the distribution disk. 2.55 Closing and deleting files When the use of a file is completed it should be closed by a call of the predeclared procedure: close(dev) This will close the file associated with stream dev by a previous input or output call. If dev does not correspond to a disk file, nothing happens. NOTE: If an OUTPUT FILE is not closed its contents will be LOST. Input files should also be closed, as this call also serves to release the buffer and file control block associated with that file and makes it available for further use. delete(dev) This will delete the file associated with dev by a previous input or output call and release the file control block and buffer for reuse. 2.56 Serial input/output procedures In all of the predeclared i/o procedures the first parameter is the stream number denoted by dev. The name val indicates a REAL variable and ival an INTEGER variable. As the formal parameters are called by VALUE the actual parameters may contain expressions; the system will convert between integer and real values if necessary. PROCEDURE skip(dev) ; Outputs a carriage return/linefeed to dev. INTEGER PROCEDURE chin(dev); Read the next character from dev. The result of the procedure is the value of the character. In the case of disk input the character CONTROL-Z is returned at the end of file. REAL PROCEDURE read(dev); or REAL PROCEDURE read(dev,lahel); Read a floating point number or integer number from dev. The number is in free format, and is terminated by any character which cannot be part of a number. Decimal exponentiation is indicated by 'E'. Spaces, tabs and blank lines preceding the number are ignored but other characters will give an error. A space will terminate the number except between the 'E' and the exponent field. Integers may be read without rounding errors provided they appear as valid integers in the input, i.e., without decimal point or exponent parts. To allow the possibility of reading a file of unknown length, the second form given above may be used. In the event of passing the end of file, control is passed to the label. The name is not preceded by the LABEL indication as the compiler knows that the second parameter must be a label or a designational expression. End of file is a legal terminator; the jump will not happen unless another read is done. If the optional label is not given a runtime error occurs if end of file is passed. Examples of valid number formats are: 0.123 +1.23E -3 -123 The read routine will also accept the following, although the output routines never generate such formats: E-3 .123 -123. It may be desirable to read a data source containing text comments. The read routine can be instructed to ignore any character preceding the number which cannot be part of the number by the call: ioc(18); A consequence of the use of this mode of reading data is that numbers of the form: E-6 or .45 are no longer valid. The leading 'E' or '.' is regarded as comment; the actual numbers read in this case would be -6 and 45. To return to the default mode where comments are not permitted call: ioc(19); PROCEDURE text(dev,"string"); Output a string to dev. See the section on strings regarding interpretation of format and control characters. The string may also be a string parameter of the procedure in which text is called, in which case the actual parameter is the string identifier, e.g. PROCEDURE message(s); STRING a; BEGIN text(1,s); ... PROCEDURE chout(dev,ival); Outputs a single byte to dev. If a character is to be output, its ASCII value must be used. This can be found by using the character literal facility. For example: chout(1,&X); will print X on the terminal. PROCEDURE write(dev,ivai); or PROCEDURE write(dev,ival,radix); Prints ival as an integer on dev. The default radix is decimal. Non-significant characters are not printed. If formatted print is required use rwrite. Output in octal or hexadecimal is possible by including the optional third parameter. radix=0 for decimal radix=1 for octal radix=2 for hexadecimal Any other value for radix will lead to a runtime error. END OF FILE REPORTING The CHIN function returns -1 at end of file, allowing embedded CTRL/Z to be processed. Text files normally have a CTRL/Z at the end, unless the last record is exactly filled. 2.57 Formatted number output PROCEDURE rwrite(dev,val,a,b); or PROCEDURE rwrite(dev,val) Floating point output to dev; val is the value to output; a and b define the format such that: a = total number of characters including sign and decimal point. b = number of digits after the decimal point. If b is zero then we have formatted integer output. If the value of a is inconsistent with that of b some large value will be substituted, If a = 0 then exponent format is used with b decimal digits. If both a and b are zero or if they are omitted altogether as in the above example then the program defaults to exponent format with 6 decimal digits. Various aspects of the output formatting can be controlled by calls to the predeclared procedure ioc. These calls have the effect of setting flags within the runtime system which remain in effect until some further call is made to change then. These calls to ioc can be considered in 3 groups. The first of each group is the default state in effect when the program starts. The various calls within each group are mutually exclusive. The first group is concerned with what action is to be taken if the value to be output is too large to he accommodated by the specified format. ioc (6) The routine first attempts to accommodate the number by moving along the decimal point while maintaining the total field width constant. If this fails the routine will use exponent format provided the field width can he maintained else a row of asterisks '****' is printed indicating an out of range number. ioc(7) No format changes whatsoever are allowed. If the number cannot be accommodated then a row of asterisks is printed. ioc(8) No error print allowed. When this ioc call is in effect the error print indicated by a row of asterisks is never used. Format changes are allowed; if necessary exponent print will be used regardless of the field width specified. The second group is concerted with the representation of space within the output format. ioc(9) Set the 'default space character' to space (ASCII 32). Leading zeros are printed as spaces. ioc(10) Set the 'default space character' to null (ASCII 0). Leading zeros will be suppressed. The number is left-justified. (The null character is trapped by the routine and not actually sent to the output stream). The third group is concerned with the representation of positive numbers. ioc(11) Use the current default space character (see group 2 above) where a positive sign is expected. Initially the default space character is apace. If ioc(11) is called after ioc(10) the result is to suppress the character slot reserved to indicate e positive result. ioc(12) Print '+' to indicate a positive number. NOTE: Calls to rwrite and write are terminated by printing the 'default space character' (see group 2 above). This is initially set to space which serves as a terminator to separate output such that it can be reread by the read routine. More flexible number formatting is available using ioc(49), which allows independent control of the following three character positions, which are all affected by ioc calls 9-12. They are: 1. Leading spaces (rwrite only). 2. The positive sign (rwrite only; '-' is always printed). 3. Trailing spaces (rwrite and write). IOC(49) must be called from within a procedure whose first parameter is an integer by value. The value is 256 * x + y, where y is the character or null which is to be printed and x is 0, 1 or 2 corresponding to leading spaces, a positive sign and the trailing space. If x falls outside this range the ioc(49) call has no effect; y can be any character at all. By default all characters will be spaces. The following sequence will suppress the '+' sign and trailing spaces, but will right justify the number. PROCEDURE format(n); VALUE n; INTEGER n; ioc(49); format(&*S); {Leading space} format (256); {Suppress the '+' sign} format(512); {Suppress the trailing character} 2.58 Random access files A file may be opened to be read by random access rather than serial access. Such files are opened as 'input' files with a switch [B] set to signify block I/O. If the file is to be updated, i.e. written to, then an additional switch is needed [BM] where the 'M' indicates 'modify'. These rules imply that only pre-existing files may be opened for random access. As example of an 'input' specification. DATA1.DAT[B],DATA2.DAT[BM] The first file is opened for random access reading and the second for reading/writing. i:=rblock(dev,a.h,n) will read n blocks from the disk file associated with stream dev, starting at block number b, writing the contents In memory starting at address a. The length of the transfer is 128*n bytes. The first block of the file is block number 0. The address in general will correspond to part of an array set up by means of procedure location (see section on library procedures) e.g. i:=rblock(dev,location(buf[0]),b,10) ; On exit i will have the following meaning. i=0 successful read. i=1 read past end of file. i=2 reading unwritten data. i=3 hard error. The user should ensure that the declared array is large enough to accept the transfer. Any part of a selected transfer extending beyond the end of file will be set to zero. i:=wblock(dev,a,b,n); Will write n blocks to disk; the parameters are the same as for rblock. On completion i can take the following values. i=a successful write. i=1 error in extending file. i=2 end of disk file. i=3 hard error. i=255 no more directory space available. 2.59 Input/output support routines The following additional procedures are recognized by the runtime system and are made known to the compiler by including the text of ALIB.ALG with the program source. rewind(dev); The serial input or output file associated with dev is (first closed in the case of output files and) rewound for reading from the beginning. dev := findinput ("string"); This call will open the file or device defined in "string" for input on stream dev. If the first character of "string" is found to be a question mark "?" then the effect is as follows. The remainder of the string is printed on the console as a prompt to the operator who enters the required input file or device name, e.g. dev:=findinput("?Source file="); will prompt the operator: Source file= who then enters the required name: dev:=findinput("DATA.DAT"); opens the file DATA.DAT on the logged on drive. The input specification may in fact consist of an 'input list' the first entry of which will be used and assigned to dev. Note that the use of this procedure will wipe out any previous input specifications waiting in the input list. dev:=findoutput("string") This is analogous to findinput but for output. The output specifications may if desired be generalised to be a complete input/ontput list as described under ioc(2) and ioc(3) above. i:=rename; This procedure renames a file. The old filename and drive information are taken as the next entry in the 'inputlist'. The new filenane is taken from the next entry in the 'outputlist', e.g. ioc(1); text(7,"FRED.ABC=B:JOE.XYZ"); ioc(3); i:=rename; will rename file JOE.XYZ on drive B: as FRED.ABC. Note that the CP/M rename utility will rename all files that satisfy the input specification. On exit: i=-1 implies a failure, e.g. file not found or illegal syntax. i=255 CP/M reply from rename regardless of success or failure. The default file extension will be used if none is specified, or if ioc(20) is in effect, will be used regardless. If a file of the same name as the new name given is found already to exist, then the result will be the same as described under procedure 'output' with regard to calls of ioc(13) to ioc(15), namely: ioc(13) No checks are made. ioc(14) Erase any pre-existing files of the same name. ioc(15) Return the value -100 in i. i:=newext(j, "XYZ"); The file associated with stream j by a previous call of input or output is closed and its file extension changed to the 3 character string given as the second parameter. This string becomes the default file extension, e.g. j:=findinput("FRED.ABC"); i:=newext(J ,"XYZ"); will rename the file FRED.ABC as FRED.XYZ. No checks are made as to the pre-existence of files of the same name. A negative result in i implies a failure; the expected reply is 255. i:=fcblock(dev); This returns in i the address of the file control block associated with file stream dev. This can be useful only to users who wish to manipulate CP/M facilities directly. i:=exflt(a,t); Extend the file control block list. The Rogalgol system is initially set up to allow 4 serial files end 2 random access files open at any time. Should users require more than this number of files then this procedure may be used to extend the list of file control blocks available. Each call extends the length of the list by one. On exit a negative value in I indicates an attempt has been made to extend beyond its maximum length of 16 entries. The parameters to exflt are; a = address of buffer to use t = file type If t=0 then serial file else random access The buffers used are user declared array, the address of which is found using procedure location, e.g. BEGIN BYTE ARRAY buf[0;160]; I:=exflt(location(buf[0]),0); The buffer sizes required are for serial files 164 bytes (36 for the file control block + 128 for the sector buffer) and for random access files 36 bytes. It is the user's responsibility to ensure that the array is large enough to accommodate the buffer and that such buffers do not overlap or become overwritten. 2.60 Direct BDOS and BIOS CP/M calls Direct call to CP/M BDOS is made by doing ioc(48) within a procedure whose first two parameters are the call number and the address to be placed In DE. These must both be VALUE parameters. The second parameter receives the returned HL value after the call ioc(48). Note that the new value is only accessible from within the procedure as the DE parameter is by value. You can either assign this value to the function or to an external variable. In either case you need to add a statement to the library procedure CP/M after the ioc(48). a:=cpm(c,de) This procedure performs a direct call to CP/M where c = contents of C register on entry (0 to 27). de = contents of BE register pair on entry. a = result in A register on exit. Refer to the CP/M Interface Guide for details. a:=bios(n,bc) ; This procedure performs a direct call through the BIOS jump vector where: n = entry in the jump table (0 to 14) bc= contents of BC register pair on entry. a = contents in A register on exit. Refer to the CP/M System Alteration Guide for details. The body of the procedure contains the call ioc(47). 2.61 Library procedures 2.62 Library procedures The following procedures are built into the runtime system and can be made known to the compiler by including the source of file "ALIB.ALG" with the program. Some of the following are machine dependent. See the text of 'ALIB.ALG' for the formal definitions of the procedure parameters. MEMORY MANIPULATION i:=location(x); This returns with i set to the address of variable x; x may be REAL, INTEGER, or an array element of type REAL, INTEGER, or BYTE. In the case of REAL or INTEGER arguments the address returned is that of the slot assigned to that variable (see description of the workings of the runtime system). Each slot occupies 4 bytes and in the case of INTEGERs only the upper half is used so that in this case 2 should be added to get the actual address containing the integer. Array elements as arguments always return the correct address. The procedure works by recalling the moat recent variable address computed; as the argument is called by value the compiler will in fact accept any expression as the actual parameter, although the result will correspond to the final variable specified. Users who wish to find the address of Boolean variables may construct a similar procedure with the same body as location but with a formal parameter of type Boolean by value, i:=fspace; This returns the number of bytes free (allowing for a safety margin for stack operations). Note that on large systems the result may exceed 32K and thus appear to have a negative value in two's complement representation. blmove(s,f,len); Block move of len bytes starting at address 5 to the block starting at address f. In general the use of procedure location (see above) would be used to set up the addresses, e.g. blmove(location(a[0]),location(b[0]),100); It is the user's responsibility to ensure that such block moves stay within the limits of the declared arrays. This procedure will work correctly if the two blocks overlap. i:=peek(a) Returns the byte value contained within the address given by a. poke(a,i) Sets the contents of address given by a to the value of (the 8 least significant bits of) i. clarr (a,len) Clear array area of length len bytes starting at address a. SHIFTS AND ROTATES In the following procedures v is the value (type INTEGER) and n is the number of places to shift or rotate. Note that only the 4 least significant bits of n are used so its value should be in the range 0 to 15. i:=shl(v,m) Shift LEFT. i:=lsr(v,n) Logical shift RIGHT. i:=asr(v,n) Arithmetic shift RIGHT. i:=rotl(v,n) Rotate LEFT. i:=rotr(v,n) Rotate RIGHT. Arithmetic shift right extends the sign bit whereas logical shift right always places zeros into vacated positions. INPUT/OUTPUT dpb(u,t,a,a) Set up the disk parameters, u=unit number (0 to 3), t=track, s=sector, a=DMA address. i:=rdisk Read the disk directly using information set up in a previous call to dpb. The result from the CP/M call will be in i. i:=wdisk Write to disk directly using information set up by a previous call to dpb. The result from the CP/M call will be in i. STRING MANIPULATION i:=sloc("string") Returns in i the address of the start of the string. The actual parameter may also be a string parameter of a procedure, e.g. PROCEDURE x(s); STRING s; BEGIN INTEGER i; i:=sloc(s); i:=sloc("XYZ"); Strings consist of a series of characters stored in sequential bytes terminated by a zero. atext(dev,s); This is similar to the pre-declared procedure "text' but the second parameter is the address of the string. e.g. text(dev,"XYZ"); is equivalent to atext(dev,sloc("XYZ")); i:=tlen(s) Returns the length of the string whose address is at s, e.g. i:=tlen(sloc("XYZ")); returns the value 3. i:=smatch(long,short) This procedure compares two strings looking for the first match within the long string corresponding to the contents of the short string. The parameters are the addresses of the strings. If a match is found the value of i is set to the address within the long string corresponding to the start of the match. If no match is found i will be set to zero. Additional matches may be found by giving as the starting address of long the value one greater than the result of the previous match. MISCELLANEOUS b:=parity(i) This Boolean procedure returns TRUE if the character value of i (8 least significant bits) has EVEN parity else FALSE. x:=random Returns a pseudorandom number in the range 0 to 1. 2.63 Library inserts A facility exists which allows the contents of 'library' source files to be included with the body of the program at compile time, e.g. LIBRARY "B:ALIB.ALG" or, using the upper case convention, 'LIBRARY' "B:ALIB" The effect at compile time is that on encountering the language key word LIBRARY the compiler looks for an input file specification enclosed within string quotes. This file is opened and its contents included with the program source at the points the call is found. In the above case the file ALIB.ALG on drive 3: is read, the default extension being '.ALG'. The default drive is the logged on drive. This capability allows the user to construct libraries of frequently used procedures, thus avoiding duplication of text and excessive editing. BEGIN INTEGER i,j,k; LIBRARY "ALIB" LIBRARY "lOLIB" LIBRARY "STATLIB" PROCEDURE abc; ... This example would include the contents of three library files in turn when compiling. These files may if desired themselves contain LIBRARY directives. The limit on the depth of such calls is set by the maximum number of input and output files that may be open at any one time. In the compiler as distributed this limit is set to five. 2.64 Example programs The following examples illustrate various aspects of the language. The first four are fairly straightforward; the final two examples assume a fairly advanced knowledge of mathematics. Further examples can be found on the distribution disk. The first example lists a table of integers up to 20, together with their square roots, on the console. BEGIN INTEGER i; FOR i:=0 STEP 1 UNTIL 20 DO BEGIN rwrite(1,i,5,0); rwrite(1,sqrt(i),0,6); skip(1) END END FINISH The second example lists a file on the console. On detecting the end of file it loops back for further files to list. BEGIN INTEGER i, d; { Get input file } loop: ioc(4); d:=input; { Check if valid file } IF d<64 THEN text(1,"*NTry again") ELSE BEGIN {list file on console } FOR i:= chin(d) WHILE i#^Z DO chout(1,i); close(d); {realease fcb} END; GOTO loop; END FINISH The next example is a procedure to illustrate string handling. The routine makes use of several procedures from ALIB.ALG. It scans a piece of text starting at address "old" and substitutes every occurrence of a given string "olds" by that given in "news". The source is itself in the form of a string, i.e. terminated with a zero value. The resultant string will start at address "new". The calling sequence la:=location(a[0]); lb:=location(b[0]); substitute(lb, la, "Jack","%1"); substitute(la, lb, "Jill","%1"); will replace every occurrence of "%1" by "Jack" and "%2" by "Jill". Both the initial string text and the resultant string start at location a[0]. The array b is used as working space. PROCEDURE substitute (new, old, news, olds) ; VALUE new, old, INTEGER new, old ; STRING news, olds ; BEGIN INTEGER i, j, ns, os, nl ol, oldfin ; ns := sloc(news) ; os := sloc(olds) ; nl := tlen(ns) ; ol := tlen(os) ; { lengths of strings } { address of closing zero of input string } { look for matches } FOR i := smatch(old, os,) WHILE i # 0 DO BEGIN j := i - old ; { length of text to copy } blmove(old, new, j) ; { move over portion of text } new := new + j ; { update pointers } old := old + j + ol ; { skip old string } blmove(ns, new, nl) ; { copy in new string } new := new + nl ; { update pointer } END ; blmove(old, new, oldfin - old) ; { copy remainder } END substitute ; The fourth example, quicksort, is a sorting algorithm originally developed by C.A.R Hoare. An array of values is sorted into ascending order. The method involves reordering terms such that it can be partitioned in the form: a[low],a[low+1], ... a[i-1] < a[i] <= a[i+1],a[i+2], ... a[high] The pivot value in this case is arbitrarily chosen as the value of the final element on entry. The procedure then calls itself recursively for each side of the above expression until each partition contains only one term. The following coding exploits a feature of this compiler that the value of the loop variable on exit from a loop will be that which led to the loop's termination. This may not be the case on other Algol compilers. PROCEDURE quicksort(a,low,high); VALUE low,high; INTEGER low,bigh; INTEGER ARRAY a; IF lowi AND y>=pivot DO j:=j-1; IF i= abs(obs) THEN count:=count+i ELSE IF n # 1 THEN BEGIN br(n-1,l,sun); br(n-1,-1,sun) END END br; text(1,"*Nnumher of pairs?"); j:=read(7); obd := D; FOR i:= i STEP 1 UNTIL j DO BEGIN a:= read(7); b:= read(7); obs := obs + a - h; d[i] := ahs(a - h); END ; text(1,"*Nsum of differences"); rwrite(1,obs,8,2); count:=0; br(j,-1,0); br(j,1,0); text (1, "*NProbahility of same or greater with random signs"); rwrite(1,count/2^j ,7,3); END FINISH The final example is a procedure for solving simultaneous equations. The left hand side matrix is set up in a two dimensional array a[row,colunm] and the right hand side in a vector b[row]. The array names (a and b are only examples) are passed to the procedure to correspond to the names lhs and rhs, together with an integer giving the number of equations and a label to exit to if there is no solution. The answers are left in the right hand side vector. The method uses a Gaussian elimination with partial pivoting. PROCEDURE solve(order,Iha,rhs,fail); VALUE order; INTEGER order; ARRAY Ihs,rhs; LABEL fail; BEGIN INTEGER row,col,rawl,orderl,i,j; REAL max; FOR order1:=order STEP -1 UNTIL 2 DO BEGIN max:=0; FOR J:=1 STEP 1 UNTIL order1 DO IF abs(lhs[j,order1]) > max THEN BEGIN max:=abs(lhs[j,orderi]); row:=j; END IF row#order1 THEN BEGIN max:=rhs[order1] ; rhs[order1]:=rhs[row]; rhs[row]:=max; FOR col:=I STEP 1 UNTIL order I DO BEGIN max:=lhs[order1,col] ; lhs[order1, col]:=lhs[row. col] ; lhs[row,col]:=max END END swop equations; IF lhs[order1,order1]=0 THEN nosol: BEGIN text(1,"*Nno solution"); GOTO fail END FOR j:=STEP 1 UNTIL order1-1 DO BEGIN rmax:=Ihs[[i,order1]/lhs[orderl,order1]; rhs[J]:=rhs[j]-rhs[order1]*max; FOR col:=i STEP I UNTIL order1 DO BEGIN lhs[j,col]:= Ihs[j,col]-lhs[order1,coI]*max END zero one element; END zero one column; END triangularise the left hand side; IF lhs[1,1]=0 THEN GOTO nosol; FOR row:=1 STEP 1 UNTIL order DO BEGIN rhs[row]:=rhs[row) / lho[row,row] ; FOR row1:=row+1 STEP I UNTIL order DO rho[row1]:=rhs[row1]-lhs[row1,row]*rhs[row]; END END solve simultaneous equations; 2.66 Compiling and running programs Running Rogalgol is a two stage process: 1. Compiling. The program source is read by the compiler to produce an output file in a form to be read by the runtime system. 2. Running. This stage loads the file output from the compiler and runs it. The simplest sequence of commands given a program source in a single file 'PROG.ALG' would consist of: ALGOL PROG ARUN PROG The default disk drive for input and output files is the logged on drive. The default file extensions are: Source files .ALG Compiler output .OBJ Monitor file .MON 2.67 Compiling In the simplest case given above the compiler reeds the program source from the file specified; if no file extension is given then the default will be used. The output file created is given the same name as the source file but with the extension '.OBJ'. Any pre-existing file of the same name as the output file will be deleted before the new output file is created. If the compiler detects any errors in the program source the output file is deleted but compilation continues until the end of the source, checking for further errors. Error messages are sent to the console. At the of compilation the size of the resulting program is printed and control is returned to CP/M. A more general form of calling the compiler is: ALGOL outlist=inlist For example: ALGOL OUT=IOL1B,B:MATHS,PROG Using this method, the input source is read from a series of files in turn; typically these files would consist of groups of commonly required procedures, ending with the file containing the program. It should be remembered that the overall source should correspond to the required Algol block structure, from the first BEGIN to the final corresponding END and FINISH. Files may be taken from several drives; if the drive is not specifically included then the current default will be used. This is discussed in the section on i/o selection. In the example IOLIB.ALG is taken from the logged on drive and MATHS.ALG and FROC.ALG from drive B:. The output OUT.OBJ goes to the logged on disk. An alternative (and perhaps better) way of combining source files is by the use of the LIBRARY facility previously discussed. It must he remembered however that the use of such library calls is restricted to the final file specified in the input list; the remaining input file specifiers will otherwise be overwritten. This is discussed in the details of library procedure 'findinput'. If a second output stream is specified, then a listing of the compiler identifier tables will be generated. Compiler error messages will also be seat to this stream along with an indication of the maximum table size the system can support. ALGOL OUT1,OUT2:=PROG will send the compiler output to OUT1.OBJ, and all compiler error messages and identifier tables go to OUT2.MON. ALGOL OUT1,CON:=PROG will send errors and identifier tables to the console. If no input/output is specified in the call, or if an error exists, e.g. bad syntax or a non-existent source file is given, then the compiler will prompt for i/o. For example, a call of the form: ALGOL will result in a prompt of the form: OUT=IN? The user may now specify a list of input and output files as for the above case. The output from the compiler is about the same length as the corresponding source text. If the i/o files were specified in the initial calling lime, i.e. 'ALGOL PROG', then upon completion the compiler will return to CP/M. If the i/o files were given as the result of a prompt from the compiler, then upon completion the compiler will be restarted, to allow further programs to he compiled. A reply of CONTROL-C in this case will return control to CP/M. 2.68 Compiler directives A 'DIRECTIVE' ; may be included anywhere that 'COMMENT' is allowed. They are now allowed before the first 'BEGIN' of the program (not standard ALGOL-6O). The string may contain any basic symbols except ';' and does not have to be delimited by quotes. If the upper/lower case convention is being used, then the directive letters must be in lower case, not as below. Only the following symbols are significant: + Any significant letters following will turn the facility ON. This is the default condition. - Any significant letters following will turn the facility OFF. This remains in force until a '+' is B Controls whether Bitstream output is produced. This should only be used before the first begin. The default is bitstream; character output requires a special loader version of ALINK, ARUN will not accept it.The character format output can be decoded using the list of operation codes. L Controls whether the input source with added line numbers is reproduced on the listing device. Default OFF. T Controls whether a full identifier table is listed. Default OFF. S Causes a shortened identifier table containing only labels and procedures to be sent to the list file. If 'T' is in force as well as 'S' a full table is produced. Default OFF. P Causes all procedures and function which are not EXTERNAL to be ENTRY points. Default OFF. See the Linker document. The tables and source reproduction are sent to the listing file, the second output file in the CP/M command line. If none was specified, the console is used. Giving a listing file in the command line (i.e. ALGOL OUT,LIST=INPUT) in effect causes an automatic 'DIRECTIVE' T-L; Directives may be given in the CP/M command line which runs the compiler. A list of options attached to the output file will be analysed in the same way as a 'DIRECTIVE' string. For example to force a listing on the terminal: >ALGOLC PROGRAM[L] Specific ENTRY and EXTERNAL declarations The symbols ENTRY and EXTERNAL may be placed after the name in a procedure declaration to allow linking of separate program modules as described in the linker document. Directive 'P' may be used to make all procedures which are not EXTERNAL into ENTRY points. 2.69 Character and bitstream compiler output files Under default conditions a bitstream is output instead of ASCII characters. You can force ASCII output by using directive -B. This can help in debugging, but the loader and standard linker accept only bitsream format. The special linker ALINKS does accept character format compiler output. Data is in chunks of 2, 8 or 16 bits. Each chunk is the right way round but chunks start at the least significant end of each byte. Floating point literals and entry point/external names are in ASCII starting on a byte boundary. Floating point numbers are terminated by 13 and names by the character '. Label numbers are short or long depending on whether the number will fit into 8 bits. 2 bit codes and their meanings 0 Byte value follows. 1 Short label number follows. The value of the label expanded to 16 bits to be loaded or used as definition as required. 2 Short label definition follows. After the 8 bit label number, a further code indicates the type of definition. 0 Colon definition, value = load address. 1 Equate value. Read from top to find it. 2 Label is equated to global name which follows. 3 Second code follows. 0 Word value follows. 1 Long label number follows, value of label is to be loaded or used as definition. 2 Long label definition follows, then third code as for code 2 above. 3 Third code follows 0 Floating point literal follows. 1 Global symbol definition. After the name in ASCII read from the top to get its value. 3 End of load module. 2.70 Pre-compiled libraries and the linker Commencing with version 5.0, the Rogalgol system allows a program to be compiled in sections. One section (or module) must contain the complete main program. The procedures may be split over as many modules as is desired, or may be included with the text of the main program as in previous versions. Each program module must be a complete program, but individual procedures may be defined as being EXTERNAL or ENTRY POINTS. An EXTERNAL procedure is one which is a reference to a procedure which is declared as an entry point in another program module. Usually, a library of procedures all defined as ENTRY points would be incorporated into a program with no main program statements, only the procedure declarations. A linker takes the output of any number of separate compiler runs and consolidates them into a single relocatable core image file. The form of an external procedure declaration is: PROCEDURE name EXTERNAL ; Note that no parameter list is given. The only variation from this form is that BOOLEAN, REAL or INTEGER may precede PROCEDURE to indicate an external function. The compiler gives the procedure a label number but no code is generated. EXTERNAL procedures do not receive a procedure number as they are unable to access variables within the procedures of the main program. Therefore, they do not count towards the limit of 255 procedures. This effectively allows a program to have access to more than 255 procedures. There are two ways to tell the compiler that a procedure or function is to be an ENTRY point. First, if DIRECTIVE P is in force all procedures are available outside the module, unless they are defined as EXTERNAL. Second, the symbol ENTRY may be placed after the procedure name and before the parameter list (which may be empty, of course). It does not matter if ENTRY is used when directive P is in force. An example might be: REAL PROCEDURE compadd ENTRY (x1,x2,y1,y2); etc. The runtime program loader does not reecognise entry points or external definitions and will give an error during loading. The linker must be used to create a single core image file for the loader. Ordering of the linked program modules The first module of a linked program is the one whose main program is executed. The main programs of any other modules are not accessible. There is no check on duplicate entry point names; the first declaration encountered will be the one used at runtime. This allows the use of modules which have some procedure names in common. Apart from this the ordering of modules is not important. Use of main program variables in linked modules Since each module is a separate program, the compiler will allocate storage for main program variables starting at the same address in each. It is therefore quite possible to use main program variables to communicate between library procedures and the master program. Great care must be exercised, since the compiler cannot check the number or type of these variables. The variables in each module are simply overlaid. Arrays will function as well as unsubscripted variables because they use only one slot on the variable stack, which contains a pointer to the array Itself. It is recommended that library procedures should not access main program variables, since they will not then be of universal application. If you do decide to use this form of global storage check that the stack slots assigned to the variables (obtained from an identifier table at compile time) correspond in the main program and the libraries it is to use. Procedure numbers of external procedures External procedures are not allocated a procedure number and do not count towards the maximum of 255 allowed within any module. However, all other procedures are allocated a number starting at 1 within each compiled module. Thus, there will in general be a number of procedures all with the same procedure number in the linked program. When a runtime error occurs (not trapped by the program as on page 38) the procedure number in the message (PROC) may not be unique. The module in which the procedure occurs may however be deduced from the LOC column, which is the error address relative to the start of the compiled program. The linker gives the next load address after each module is processed and this is also relative to the start of the compiled program. It is therefore straightforward to determine in which module the error was detected. The list of procedures for that module is then consulted to find the name corresponding to PROC. RUNNING THE LINKER The linker is named ALINK, with extra characters indicating variants; ALINKL indicates long integer version and ALINKS accepts ASCII symbolic compiler output instead of the default bitstream. It is essential to use the correct version of the linker, ALINKL if the long integer version ARUNL is to be used, otherwise ALINK. The linker is driven by command lines. The start of a link is heralded by a command line containing an '=' sign. The prompt for such a line is Output=input list : If just is typed the linker returns to CP/M. Otherwise it tries to open the first input file. It keeps asking for this command line until a valid input file is given (the output file may be omitted). You can list as many input files on the line as you wish, subject to a maximum line length of 96 characters. Any file which fails to open (including the first) is treated as the end of the line. The linker processes the input files, after each one checking whether there are any outstanding unresolved external procedure names If there are none the output file is written and the linker requests another initial command line. If when all files have been read there remain outstanding externals the names of more input files are requested. Again a list may be typed but in this case an '=' sign preceding the file names is NOT required. The reason for this difference is to synchronise the start of linking should there be any error and the lines are being read from a command file (see later section). If at this stage an empty line is given the linker writes the output file anyway, with a warning message on the terminal. As with the initial command line, any input file which does not open is treated as a line terminator. The default extension for input and output files is ASC. Identifier table output There are three types of identifier table which may be output on the terminal under the control of an option letter attached to the first input file. (a) The current table, including both unresolved references and entry points, the latter being marked with '*' and having their addresses relative to the start of the program. This is output after each input file has been loaded. (b) A list of as yet unresolved names, output after each input file. (c) The entry point table listed at the end of linking. The default (no option letter) is for list (b) only to be typed. Option 'N' means no tables at all, 'F' means all three of the tables and 'G' means (b) after each file plus the global list (c) at the end of linking. Example command lines: Output=input list : PROGRAM=MAIN[N],LIB1,LIB2 Input files : LIB3 The first line means link a program to be called PROGRAM using MAIN as the main program, to use library procedures in files LIB1 and LIB2. No tables are to be listed on the terminal. The second form of line is used when there are outstanding externals after all the input files of previous lines have been processed. Indirect command files The input lines may be obtained from a command file, default filename extension .LNK. If the first character of the first command line of a link (the one with '=' in it) is '@', then the filename following will be opened and used instead of the terminal as a source of input. Command lines are still echoed on the terminal so that you can watch the progress of the linker. Command files may be chained by using '@' on the last line of a file. The current file will be closed and a replacement opened. The files are not nested. The first command line may be typed as a parameter to the call of the linker, and this may nominate an indirect command file. For example: >ALINK @LINKIT The linker will attempt to open the file LINKIT.LNK on the current default drive. The command file should not contain any blank lines except at the end as these tell the linker return to CP/M. >ALINK FRED=JIM The linker takes compiler output file JIM.ASC and creates a relocatable core image file FRED.ASC, assuming that JIM contains no external references. This performs the same function as using option 's' on the runtime loader except that the output file name may be different from the input file name. 2.71 Runtime program Given a successfully compiled program, the output file so created may mow be rum by a calling the runtime system as follows: ARUN filename The assumed file extension is '.ASC'. The file specified will be loaded and them executed. If no input is specified or if an error is found, e.g. bad syntax or non-existent filename them the runtime system will prompt the user for am input file. For example a call of the form: ARUN will prompt for input: INPUT= to which the user responds with the required filename. Upon completion of the program the system prints '^' on the console and waits for an operator response. Typing CONTROL-P will rerun the program or CONTROL-C will return control to CP/M. If a runtime error is detected then suitable diagnostic information is sent to the console (see section on runtime errors). Unless the user is making use of the error handling facility (see procedure 'error' in library section) the system will mow wait for the operator to investigate the cause of the error. The program may be rerun from the beginning by typing CONTROL-P or control returned to CP/M by typing CONTROL-C. The return to CP/M upon completion or upon detecting a runtime error can be made automatic by a call ioc(22) within the program. A call of the form ioc(60) causes am immediate restart of the program from the beginning. Any files open at the time will not be closed although all file control blocks are released. 2.72 Switches on the loader filename A previous topic describes how to examine the switch list associated with the last file opened. Since the loader uses the same routines as the Algol interpreter, the Algol program can examine the switch list associated with the program begin run. Obviously, this must be done before either INPUT or OUTPUT is used. The loader itself only acts on the first character of the list, and ignores all except B and M (which will cause an error). The programmer can determine if the debugger is active by looking for D or W at the start of the list, and thus can arrange for the switch to be propagated over a chain. 2.73 Long integer (32 bit) Algol ARUNL is a version of the Rogalgol runtime system in which real variables are represented not in the normal mantissa/exponent form but rather as 32 bit 2's complement integers. This runtime system is useful for those applications where greater precision is desirable but without the need to extend the number range to the extent allowed by the floating point representation, e.g. business programs. The number range allowed is from (2^31)-1 to -(2^31), (about +-2.15*10^9). The compiler itself remains unchanged. Variables declared as integers will still be represented as 16 bit 2's complement numbers. This document outlines the differences from the Algol system described in the manual. STANDARD FUNCTIONS The following functions have been removed: sin, cos, sqrt, arctan, ln, exp The function entier exists but is equivalent to a real to integer assignment. For example, the statements i := entier(x); i := x ; have the same effect. LIBRARY PROCEDURES The standard library file 'ALIB.ALG' can be used with ARUNL with the following exception that random has been removed, Two additional library procedures can be found in file 'ARUNL.ALG' pow10(n) lrem(t,b) DIVISION Real division (/) always truncates the result towards zero in the same way as with integer division (%). A procedure has been added to the library (lrem) to give the remainder term lost by the division. z := lrem(t, b) ; gives the remainder lost by the division: u := t / b ; The result of lrem will always have the same sign as the quotient (or zero) in the same way as the MOD operator does for the integer case, e.g. t b t/b lrem(t,b) 35 8 4 3 -35 8 -4 -3 35 -8 -4 -3 -35 -8 4 3 INPUT/OUTPUT The decimal input/output routines (read and rwrite) are unchanged except for the addition of a scaling factor. A call of the ALIBL.ALG routine: pow10(n); where 'n' is a small integer causes all subsequent calls to read to be scaled by a factor of 10^n. The digit string representing the number (including fractional and exponent fields) is read and the result scaled by 1O^n. Any fractional part is then disregarded before returning the result. For example with n=2, on reading the number 123.4567 the result would be 12345. On output the converse scaling is performed. The value to be output is first converted internally into a digit string; the decimal point is then effectively shifted left by `n' digits before printing the result in the required format. The meaning of the fopitat parameters remain unchanged. This scaling on output applies only to rwrite; the integer print routine (write) is unchanged. RUNTIME ERROR MESSAGES The following changes to the runtime error numbers given in the manual have been made. 8 Real (long integer) division by zero or lrem(t,0). 9 Overflow in real multiply. 16 Overflow in real addition. 17 Overflow in real subtraction. 18 Illegal standard function called e.g. sin, cos etc. 19 Largest negative number -(2^31) with no corresponding positive representation. This error can occur from abs, *, /, rem, rwrite etc. 2.74 The chaining mechanism The call ioc(60) re-runs the program from the start, while ioc(22) will cause exit to CP/M when execution reaches the end of the program. If ioc(22) is called before ioc(60) chaining is initiated, in which the leader reads another compiler output file. The name of the new file is obtained from the keyboard buffer. Note that the filename must be given in full unless the default extension has been placed in the three bytes following the switch. All buffers are reset after loading; therefore all files should be closed before chaining. The correct sequence is: ioc(1); text(7,"NEXTPROG.ASC"); { Put the filename into the keyboard buffer } ioc(22); ioc(6O); If the nominated file cannot be opened the prompt INPUT= will be given, as when ARUN is called from the console. 2.75 Compiler error messages and diagnostic information 2.76 Compiler error messages FAIL X ON LINE Y IDENT Z SYMBOL S The name of the current library file, or 'MAIN SOURCE' is printed along with the error number. After the error line, the program is reproduced from either: (a) 100 characters before; (b) the start of the program; or (c) the end of the previous error message, whichever is the shorter. If the whole text is being reproduced (see below) then the error text is ended with a line of '=========='. When the whole text is being sent to the listing file (directive 'L') line numbers are prefixed. The first number is counting from the start of the program and the second is a count within each file. These line numbers correspond exactly to the line numbers in the error messages. Line numbers in error messages are counted from the start of the current file, except for undeclared labels (error 2). In this case the numbering Is from the start regardless of whether the lines are coming from libraries or not. A selective symbol table containing only labels and procedures may be obtained by directive 'S'. Procedures and labels are highlighted in the symbol table by being indented. The procedure numbers are further highlighted by enclosure in brackets. If a symbol table but no listing is specified then the start and end of library files are marked in the symbol table. X is the failure number (see below), Y the line on which it occurred, Z the last identifier read, and S the decimal value of the last symbol (see section entitled 'compiler representation of basic symbols'). 'LINE UP TO ERROR' is a copy of the input line up to and including the symbol at which the error was found. The compiler output is switched off and the file deleted. The compiler however continues to check the syntax of the remainder of the program. In all compilers a tradeoff is made between the amount of error information given and the size and speed of the compiler. In this implementation the emphasis has been to produce a compiler that can be used on a very modest sized computer. There is always a danger, particularly with a one pass compiler, that following the detection of a genuine error, the system may fail to synchronize fully and thus produce additional spurious errors. 1 Identifier declared twice in same block. 2 Undeclared identifier. 3 No '[' after array name, except as a procedure parameter, or ordinary procedure used as a function. 4 No ')' at end of subscript list. 5 More than 255 variables in the main program or a procedure. 6 No FINISH at end of program. (Too many ENDs). 7 No ELSE part of a conditional arithmetic expression. 8 No ELSE part of a conditional Boolean or conditional designational expression. 9 Relational operator not found where expected. Will occur if the first arithmetic expression of a Boolean relational expression is totally enclosed in round brackets. 10 Arithmetic primary does not start with '+', '-', '.', '(', digit or identifier. 11 '%', MOD, '!', MASK, or DIFFER does not have two integer operands. 12 ')' missing in arithmetic expression. 13 Controlled variable in FOR is undeclared or subscripted. 14 ')' missing in Boolean or designational expression. 15 More identifiers in scope than the tables can accommodate. The compiler automatically makes the tables as large as possible on a given system. 16 Statement starts incorrectly. If this occurs at the terminating FINISH is means there are not enough ENDs. 17 Undeclared or unsuitable identifier on left of ':=' 18 Array declaration faulty. 19 Type specification of actual parameter is not LABEL, PROCEDURE, REAL PROCEDURE, BOOLEAN PROCEDURE or INTEGER PROCEDURE. 20 Wrong number of subscripts. In the case of formal arrays, this error cannot be detected until runtime. 21 No ')' after actual parameter list. 22 FOR statement element not terminated by ',' or DO. 23 More than 255 non-external procedures OR. 23 Procedure body not delimited by ';'. 24 ':=' not found where expected. 23 No THEN after IF. 26 VALUE specification is not the first specification of procedure formal parameters. 27 FINISH in middle of program. Possibly an unmatched BEGIN, '"' or '''. 28 No ';' after parameter list. 29 Parameter specified twice, or is not in formal list, or specification not terminated by ';'. 30 Forward reference list full. 31 UNTIL not found where expected. 32 No '(' after name of standard procedure (except input or output). 33 THEN followed immediately by IF. 34 Procedure actual parameter starts with an undeclared identifier. 35 Function or variable used as procedure. 36 procedure input or output is followed by a '('. 38 Arithmetic expression contains Boolean variable in illegal context. 39 Parameter specified VALUE is not in formal list. 40 Parameter specification not complete. 41 Am array has been called by value. 42 Input/output procedure call error. 43 Left parts of multiple assignment have different types. 44 Integer literal not in range. 45 Switch identifier not followed by ':='. 46 Switch list does not end with ';'. 47 Switch has more than one subscript. 48 Word BYTE not followed by ARRAY. 49 Input files exhausted without end of program recognized. 50 A procedure used before its declaration was assumed to be of a type different from the actual type. Try reordering procedures to eliminate the forward reference. 51 Input file specified in a LIBRARY call not found. 52 Subset compiler as for 50 but forward reference clash. 53 Subset compiler as for 50 but at block end resolution. 54 More than 100 procedure parameters. 56 No DO after WHILE. 57 No UNTIL after REPEAT. 58 Case statement syntax error. 59 Boolean expression in context where arithmetic one needed. 60 Arithmetic expression where a Boolean one is needed. 61 Array declaration not terminated by ';'. 2.77 Compiler identifier table and identifier types The compiler may be instructed to print on the console or to the monitor file a list of all the identifiers declared, together with information about their type and the addresses they will occupy in the memory. Variables are placed on a stack and the variable number is the position on the stack relative to a pointer. The pointer is held in location PBASE in the runtime program. The address of the variable is found by multiplying the variable number by 4 and adding this to the contents of PBASE. Four numbers are printed after each identifier in the compiler identifier table. The first of these is the stack position except for labels and procedures. For labels and procedures the symbolic label number is printed. This is the digits part of a symbol such as L123 which is output by the compiler. The second number is the procedure number of the enclosing procedure in which the identifier is declared. The main program is 0, and the procedures are numbered serially as they are encountered, regardless of depth of declaration. As an exception, the actual number of a procedure is printed, instead of the number of the enclosing procedure. The third number is the line number of the source program. The fourth number is the current size of the compiled code. This information can be related to the position given when runtime errors are detected. The type information of the identifier is then listed as follows. The numbers represent the internal representation of the data types. 0 procedure formal parameter (type not yet known) 1 real 2 integer 3 Boolean 5 real array 6 integer array 6 byte array 7 Boolean array 8 switch 10 procedure 11 real procedure 12 integer procedure 13 Boolean procedure 14 label The compiled code of a procedure contains a list of the types of the parameters. The following types may appear, in addition to those above. 4 string 21 real by name 22 integer by name 23 Boolean by name 2.78 Compiler representation of basic symbols These are the decimal values which are printed in a compiler error message. Language key words are represented in the Algol source by the word enclosed in single quotes or in upper case and in the compiler by 40*1st letter+second letter. If a compiler error message contains a symbol which is not on the list, an illegal compound symbol has been detected. The usual cause of this is an unmatched single quote. 1-26 letters A-Z 27 [ 29 ] 30 ^ (exponentiation) 7000 := 33 ! 34 " (string delimiter) 35 # (not equal to) 36 $ 37 % (integer divide) 38 >= (greater or equal to) 40 ( 41 ) 42 * (multiply) 43 + 44 , 45 - 46 . 47 / (real divide) 48-57 digits 0-9 58 : 59 ; 60 < (less than) 61 = 62 > (greater than) 63 <= (less or equal to) 85 BEGIN 95 BOOLEAN 105 BYTE 118 AND 121 CASE 122 ARRAY 123 CC 135 COMMENT 169 DIFFER 175 DO 212 ELSE 214 END 217 EQUIVALENT 224 EXTRA 241 FALSE 249 FINISH 255 FOR 295 GOTO 366 IF 373 IMPLIES 374 INTEGER 481 LABEL 489 LIBRARY 512 MASK 535 MOD 575 NOT 606 OF 618 OR 658 PROCEDURE 725 REAL 726 REPEAT 780 STRING 780 STEP 783 SWITCH 808 THEN 818 TRUE 854 UNTIL 881 VALUE 928 WHILE 2.79 Run time errors and diagnostic information 2.80 Run time errors In the event of an error condition being detected during program execution, a message is sent to output device 1 (console or video screen generally) of the form: ERROR n ADD PBASE PROC LOC aaaa bbbb p1 d1 aaaa bbbb p2 d2 ... aaaa bbbb 0 d0 where: n error number (see following list) p1 procedure number where error was detected aaaa address of program counter bbbb value of PBASE at error (see section on runtime system) d1 location of program counter relative to the start of the program Both aaaa and bbbb are printed in hexadecimal. The procedure number can be found by counting procedures from the beginning of the program starting from 1. The main program is given the number 0. This information can be found in the compiler identifier table output. If p1 is non-zero the calling sequence is then printed on the following lines until the outermost level (p=0) is reached. This traceback information can be used to investigate the nature of failure in greater detail if required (see section describing working of the runtime system). The information given in dl etc can be related to the corresponding information given in the compiler identifier tables to help locate the position of errors. The program may be restarted from the beginning by typing CONTROL-P or control returned to CP/M by typing CONTROL-C. 2.81 Recovery from run time errors In normal operation a program is terminated by the detection of a runtime error. It is possible however to continue following an error, allowing the program to exit in a controlled manner, e.g. close output files, give more useful diagnostic information, values of variables and so on. This recovery is achieved by including a call of procedure 'error' (in ALIR.ALC) in the program before the failure occurs, e.g. error(LABEL crash); On detecting an error, the runtime system will place the error information given above and then transfer control to the label (or designational expression) 'crash' in the user's program. It is advisable that the label be located at an outermost program level as it may only be reached if it is within scope at be time of the error. The error number responsible for the failure can be found by means of a call to chin(13), e.g. crash: i := chin(13); IF i > 30 THEN GOTO cpmbug ELSE ... 2.82 Runtime error numbers 0 Undefined error. This implies that an error has been detected which has no corresponding entry in the error list. This hopefully will only occur where the user has made modifications to the runtime system and failed to update the error list. 1 Variable space used up (stack overflow). Probably the result of excessive recursion or array declarations too large for the available memory. The error traceback may fail under these circumstances (the first lime should always be correct). Overflow is checked following block or procedure entry and array declarations. 2 Procedure called with the wrong number of parameters. 3 Procedure called where the actual and the formal parameter types do not match. 4 Array used with the wrong number of subscripts. 5 Array subscript out of range (below base of array). 6 Array subscript out of range (above top of array). 7 Integer division by zero. 8 Real or integer division by zero. 9 Real overflow. ?? overflow in multiply 10 Real to integer conversion overflow, also long integer 11 Real overflow detected during normalisation after real arithmetic operation, or by integer overflow during add or subtract. 12 Error in READ. Character read which is not a legitimate part of a number (ASCII value is less than 48 ie <&0). 13 As for 12 but ASCII value is greater than 57 (i.e. >&9). 14 Error in READ. Number contains two or more decimal points. 15 Error in READ. A character '+', '-', '.' or 'E' found with no associated digits. 16 Square root of a negative number. 17 Exponential argument too large (>87). 18 Exponential argument marginally too large. 19 Logarithm of a negative number. 20 Logarithm of zero. 21 Table item out of range (below). Found in ioc(n), chin(n), chout(n,c) etc where n<0. 22 As for 21 but where a is greater than maximum value specified in list. 23 End of file detected during READ. LOADER ERRORS 24 Loader syntax error. Output from compiler has been corrupted?. 25 End of input is indicated (CONTROL-Z read) but no program has been loaded. Selecting input device 0 will produce this effect. 26 On completion of input there remain unresolved forward references. Input source is corrupt? 27 An input device name has been used as an output device in a command string, or vice versa. 28 Label tables overlap program. Program is too large for available memory. 29 Forward reference tables full. This error should be rare but can he avoided by reordering procedures so that they generate fewer forward references, i.e., try to arrange that procedures are declared before they are called. 30 Non-relocatable core image input file is not compatible with this runtime system. OPERATING SYSTEM ERRORS 31 Channel number is out of range. 32 No directory space found during output. 33 Attempt to read from channel not open for input. 34 Attempt to read from a non-serial channel. 35 Attempt to reed past end of file. 36 Attempt to write to a channel without write access. 37 Attempt to write to a non-serial file. 38 Error in extending file. 39 Attempt to output to random access file without write access. 40 Attempt at random access to a serial access channel. 41 Channel not open. 42 Attempt to rewind a random access file. 43 Random access with a negative block number. 44 No slot available for input or output. 45 Attempt to create an output file for random access. 46 Random access transfer attempted with a block count less than zero or greater than 255. 2.83 Runtime stack organisation The stack extends from the end of the runtime program to the end of available memory, as found by interrogating the system. The variable stack grows upwards from the end of the program and a working stack, used in evaluating expressions, passing procedure parameters, and CALL instructions, grows downwards from the end of memory. The variable stack consists of a number of frames, one for the main program and one for each procedure call. Within each stack frame is an array stack, which contains an array frame for each depth of array declaration. In the following section "word" refers to a 16 bit (2 byte) quantity. The following pointers are used. Their addresses can be found from the listing in the section entitled 'adding code sections'. PBASE points at the current variable stack frame MBASE points at the main program stack frame. When the main program is executing locations PBASE and MBASE contain the same value. WSBAS points at the base of the working stack in the current level. It is used to delete floaters from the working stack at Algol labels. ABAS points at the current array frame FSPT points at the next free location in the variable stack. FPARAG in 80x86 versions holds the next free paragraph in extra memory. The following registers are also of significance. SP points at the top item of the working stack, It must be saved and restored if used by any machine code added. It is also used for CALL instructions. IX should also be restored if used. It is the Algol interpretive code program pointer. IY must be restored if used. It points to a series of flags and working space. Each stack frame is divided into two parts, a variable part and an array part. The variable part is divided into slots which are each two words (four bytes) long. The actual address of a slot is found by multiplying the slot number by 4 and adding this to the base address which is held in PBASE or MBASE. In the main program frame the first declared variable is in slot 2 and the word pointed at by MBASE contains 0, the level number of the main program. In procedure level frames slot 3 is used for the result of a function and is unused in procedures which do not deliver a result. The procedure parameters occupy slots 4 and upwards, followed by variables declared within the procedure. The first word of each procedure in the compiled program contains the number of variable slots required by the procedure. The first word of the main program points at the last word of the compiled program which contains the number of variables slots required by the main program. In procedure frames the first three slots are used for linking information. Starting at the word pointed at by PBASE (slot 0) the words contain the following information. Word 1 The number of the procedure. Word 2 The return address Word 3 PHASE of calling level Word 4 WSBAS of calling level Word 5 ABAS of calling level Word 6 FPARAG of calling level in 8086 version A variable stack slot may contain any of the following types of item: 1. A real number which is held in the standard four byte format. 2. An integer number or Boolean value which is held in the highest addressed word of a slot. Booleans use only the least significant byte of this word. 3. A label or procedure address, always a procedure parameter. The address itself is in the highest addressed word and in the word below it is the value of PHASE at the tine the address was evaluated. 4. The address of an array or a switch either as a declared variable or a procedure parameter. The address is in the highest addressed word of the slot, the remaining word being unused. 5. The address of a string or an unsubscripted variable for procedure parameters of type string and variables called by name. The address in a switch variable points to the switch vector. The word pointed at contains the number of elements in the switch and subsequent words the addresses of the labels in the switch list. The array part of a stack frame contains a number of array levels, numbered by depth of declaration within a procedure or the main program. Level 0 always exists and is located immediately above the end of the variables. ABAS points at the base of the current level, which contains the depth of that level. The next word (except in level 0) contains a pointer to the level below. Above the level information are the dope vectors and array elements. An array variable points at the start of its dope vector. This contains 2*(N + 1) words, where N is the number of subscripts. The first* word of the dope vector contains the number of bytes occupied by each element (1, 2 or 4), the second the number of subscripts and the third the lower bound of the first subscript. There are two additional words for each additional subscript. The first contains a multiplier for the previously accumulated element number and the second the lower bound of the next subscript. The final word of the dope vector contains the address of the word beyond the end of the array elements. Array elements themselves are stored immediately after the dope vector. 2.84 Runtime operation codes These are the operation codes which are output by the compiler. The list gives their number in decimal. The compiler can be forced to output in character format by using the switch [-B] (quoted convention) or [-b] (upper/lower case convention). Expressions are evaluated using a working stack. The top element is referred to as S1, the next one down as S2 and so on. The stack pointer 'SP' is used for this stack (and also for CALL instructions). It grows down from the top of available memory. Some of the interpretive routines take data from the program. N1 refers to the next byte after the code, N2 to the next, and so on. In the following section 'word' refers to a 16 bit (2 byte) quantity. 0 No operation. 1 Declare array. N1=depth of declaration. N2=nunber of declarations in multiple. N3-variable number of first declaration. N4=number of bytes in each element. N5=number of subscripts S1=upper bound of last subscript. S2=lower bound of last subscript. S3, S4, etc., bounds of other subscripts. 2 Formatted print. S1=b, S2=a, S3=value, S4=device number. 3 Read to S1 from input device in S1. 4 Store local variable from S1. N1=variable number. 5 Print string. Followed by 7 bit ASCII character, terminated by zero. Device number is in S1. 6 Integer print S1=radix, S2=value, S3=device number. 7 Read next character to S1 from device number in S1. 8 Print S1 as character, 52-device number. 9 Jump. Location is in next word. 10 Leave procedure. 11 Enter procedure with no parameters. Address is in next word. 12 Get local variable to S1. N1=variable number. 13 Integer add. SI:-82+Sl 14 Get array element. N1=procedure number, N2=variable number, N3=number of subscripts. The subscripts are on the stack. The main program is procedure number 0. 15 Store array element. S1=value, other information as code 14 except subscripts are in S2 etc. 16 Set 16 bit constant in S1 from NI and N2. 17 Integer negate. SI:=-S1 18 Real ^ Integer. S1:=S2^S1 19 Integer multiply. S1:=S2*S1 20 Integer divide. S1:=S2/S1 21 Integer subtract. S1:=S2-S1 22 S1:=S1=0 23 S1:=S1>0 24 S1:=S1<0 25 Get any variable to S1. NI=procedure number, N2=variable number. 26 Store to any variable from S1. N1, N2 as for 25. 27 Standard function. Followed by another code. 2 sqrt 3 sin 4 cos 5 arctan 6 exp 7 ln 8 sign 9 entier 10 abs 28 Jump if S1=FALSE. Address in next word. 29 Set zero ln S1. 30 S1:=NOT S1 31 S1:=S1 AND S2 32 S1:=S1 OR S2 33 S1:=S1 EQUIV S2 34 For statement calculator. S1=address of controlled variable. S2=final value. S3=lncrement. S4=0 for no increment at the first test, else -1. N1=type of control variable (0=REAL else INTEGER). The following word contains the exit address for loop completion. 35 'ioc'. Parameter in S1 36 Enter procedure. N1=number of parameters. S1=type of last parameter. S2=value of last parameter, and so on, in reverse order. The address of the procedure is in N2 and N3. The first word of a procedure is the fixed space on the variable stack required by the procedure. The following bytes contain the procedure number and the number of parameters expected, followed by the type specification of the parameters, in reverse order. 37 Store outer block variable from S1. N1=variable number. 38 Fetch outer block variable to S1. N1=variable number. 39 Set in S1 the address of the variable whose procedure number is in N1, variable number in N2. 40 Skip, device number in S1. 41 Integer S1:=sign(S2-S1) 42 Set 8 bit constant in S1 from N1. 43 Fix S1. 44 Float Sl. 45 Set floating point constant from next 4 program bytes. 46 Floating negate. 47 Set label in S1. Address in next word. Second word of S1 becomes variable stack base pointer. 48 Evaluate switch address. S1=address of element O, S2=subscript. On exit S1 contains address. 49 Real ^ real. S1:=S2^S1 50 Floating multiply. S1:=S2*S1 51 Float S2. 52 Floating divide. S1:=S2/S1 53 Floating add. S1:=S2+Sl 54 Floating subtract. S1:=S2-S1 55 Store parameter called by name. S1=value, S2=address. 56 Floating S1:=sign(S2-S1). 57 Jump to address in S1. 58 Enter procedure without parameters whose address is in S1. 59 As 58 but number of parameters in N1. For 59 the rest of stack is set up as for Code 36. 60 Print string whose address is in S1, S2=device number. 61 Set stack depth. N1=procedure number. N2=array depth required. 62 Fetch parameter called by name. S1=address. 63 Stop, end of program. Prints '^' on console or returns to CP/M. 64 Store an address in the program in a local variable. Followed by the variable number in N1 and the address ln the next whole word. 65 Jump to the address in local variable number N1. 66 Set in S1 the address of local variable number N1. 67 As for 66 but main program variable. 68 Get local array element. 69 Get main program array element. 70 Store local array element. 71 Store main program array element. Codes 68-71 are followed by the variable number in N1, not by the level number and then the variable number as for codes 14 and 15. 72 Read a floating point number, check for end of file. S1=address of label to go to on end of file. S2=device number. 73 Logical OR. Sl:=S2 OR 51 74 Logical AND. S1:=S2 AND S1 75 Logical EXCLUSIVE OR. Sl:=S2 XOR Si. 76 Integer MOD. S1:=S2 MOD S1 77 Close file, stream number in S1. 78 Delete file, stream number in S1. 79 Open INPUT and assign to stream number S1. 80 Create OUTPUT and assign to stream number S1. Extended opcodes Compression of the compiler output is achieved by defining the opcodes below. They combine one of the opcodes in the table above with its argument. 0xxxxxxx (0-127) Existing opcodes. 10xxxxxx (128-191) Set xxxxxx+1 on the stack (range 1-64). 1100xxxx (192-207) Fetch local variable xxxx (range 0-15). 1101xxxx (208-223) Fetch global variable. 1110xxxx (224-239) Store local variable. 1111xxxx (240-255) Store global variable. A similar extension has been made to the compiler opcodes: 00xxxxxx (0-63) Existing opcodes. 01xxxxxx (64-127) Set integer contant range 1-64. 1000xxxx (128-143) Fetch local variable 0-15. 1001xxxx (144-159) Store local variable 0-15. 101xxxxx (160-191) Store global variable 0-31. 11xxxxxx (192-255) Fetch global variable 0-63. 2.85 Summary of ioc() procedure calls Utility calls ioc(n), the range of numbers n is in the left hand column. 0-5 input/output selection 6-12 rwrite format control 13-15 output file options 16-17 interrupt option on disk i/o 18-19 read options 20-21 file extension options 22 reboot CF/M on completion 49 More flexible formatted number output 60 rerun program from start Others are linked to procedures in ALIB.ALG 2.86 Summary of pre-declared procedures Utility pre-declared function ioc Standard functions abs, arctan, cos, entier, exp, ln, sign, sin, sqrt Input/Output chin, chout, read, rwrite, skip, text, write 2.87 Procedures in the library ALIB.ALG These must be compiled in with the program, or linked with the Algol object file. Most of these procedures are implemented by an ioc() call within the body of the procedure. A number preceding the name is the ioc procedure parameter. Not all these are implemented in MSDOS Rogalgol. Input/output procedures findinput findoutput 26 rblock 27 wblock 28 rewind 30 seti 31 seto 32 ipoint 33 opoint 34 exflt 38 fcblock 39 swlist 47 bios 48 cpmd 58 rename 59 newext Other library procedures 23 error 24 location 29 fspace 35 blmove 36 peek 37 poke 51 in 52 out dpb rdisk wdisk 40 parity 41 shl 42 lsr 43 asr 44 rotl 45 rotr 46 random 50 clarr 55 sloc 54 slen 56 smatch 53 atext 25 emt (not implemented, RML hardware only) 49 wait (not implemented, RML hardware only) 62 chpos (not implemented, RML hardware only) 61 point (not implemented, RML hardware only) 63 line (not implemented, RML hardware only) 2.88 Distributed programs and files ALGOL.EXE and ARUN.EXE The Algol compiler and runtime system. ARUNL.EXE The runtime program which uses 32 bit integers in place of real numbers. ALINK.EXE, ALINKL.EXE and ALINKS.EXE The floating point linker, the long integer linker, and a floating point linker which accepts character format (directive -b) compiler output files. ALIB.ALG The standard Algol library routines. UCASE.ALG and UCASE.ASC This program will convert Algol source files written using convention 1 (all upper case with key words enclosed in quotes) into convention 2 (upper/lower case). The program prompts for input and output file names. The default file extension is 'ALG'. UCASE.ALG and UCASE.ASC This in the complement to LCASE.ALG just described. Files are converted from convention 2 into convention 1. MMIND.ALC and MMIND.ASC Mastermind. Game 1 allows 6 colours and no blanks. Game 2 allows 6 colours and blanks. VDU.ALC and VDU.ASC This program is designed as an editing aid for creating Algol source files. The program prompts for an output file name, the default extension is '.ALC'. If the upper case convention with key words enclosed within quotes is required then give a [U] switch option with the file specification. Now start typing in your program. The program detects language key words. As soon as sufficient characters have been entered to uniquely define the key word the program will supply the rest. Corrections can be made to the current line being entered using the rubout key. Other special keys are. CONTROL-U Erase the current line. CONTROL-R Retype the current line after cleanup. CONTROL-X Switch off the auto keyword facility. A second call will switch it on again. This allows strings etc. to be entered without extra characters being added CONTROL-Z End of program. Close file and return to the start. CONTROL-C In response to the prompt for a file specification will return control to CP/M. Other example programs may also be included. 3 Rogalgol for the 80x86 3.1 Overview of 80x86 Rogalgol Overview of 80x86 Rogalgol As far as the Algol programmer is concerned, the CP/M-86 and MSDOS/PCDOS Algol can be considered as virtually identical to the CP/M-Z80 version. There is no difference in the language. The operation of the runtime program has been made as compatible as possible. Programs not making system calls using the library procedures CPMD and BIOS run without change and without re-compilation, as will most others. The only difference likely to affect program running is that the RANDOM library function always starts with the same value on 80x86 versions. Other changes can be grouped under the headings extra feature, file handling issues and the operating system interface. The compiler is called ALGOL, the floating time runtime and linker programs are called ARUN and ALINK. The long integer versions are called ARUNL and ALINKL. The .CMD (or .EXE) files try to claim a data segment of 64 KB. The total memory used is around 76 KB if sufficient RAM for a 64k data segment is available. Rogalgol 8088-8086 automatically makes use of more memory than 64K because the compiler, linker and runtime programs are divided into a code segment and a data segment. In ARUN, the data segment holds the compiled Algol code and the variables. This allows all the library functions which use addresses, (such as LOCATION, ATEXT etc.) to work exactly as on the Z80, because only 16 address bits are needed. Since all disk transfers and indeed BDOS and BIOS calls are data-segment relative, these also function in a way very similar to CP/M-80. The pre-declared function OPSYSIDCODE can be used to find out which operating system is being used. Alternate code can then be executed for system-dependent parts of the program. 3.2 Extra features available on the 80x86 versions 3.3 Using extra data memory beyond 64K Using extra data memory beyond 64K The normal data area is limited to 64K to allow all the library procedures to work. Memory above this limit can be used to hold arrays by giving them the EXTRA attribute. Compatibility with Z80 Algol is maintained as the EXTRA attribute is ignored under CP/M-80. The library procedures which manipulate addresses do not work on EXTRA arrays. A fast copy mechanism between the 64K block and extra memory is provided. If the symbol EXTRA is placed after an array declaration it has the following effect at runtime: The dope vector for the array stays in its usual place, but the last word of it, instead of containing the address of the byte beyond the end of the array, contains the paragraph (segment) address of the start of the array in extra memory. It follows that extra arrays always start on a 16 byte boundary and that there is no check for a subscript being over the top of the declared bounds. Algol statements to fetch and store the elements of extra arrays work exactly as for normal arrays. However, LOCATION will return the wrong address and the library procedures depending on it cannot work. A new call, ioc(66), has been introduced to allow copying between normal and extra memory, analogous to BLMOVE. ioc(66) must be called from within a procedure having four parameters as follows: 1. An integer by value. This is zero to copy from the 64K area to extra memory and non-zero to copy the other way. 2. An integer by value specifying the number of bytes to copy. 3. An integer by value which is the LOCATION of the start of the block in normal memory. 4. A Boolean or real (as appropriate) by value to give the address in extra memory. The actual value is not used. The runtime program keeps a record of the last calculated address in extra memory and it is this address which is important. An example: INTEGER ARRAY normal[1:10] ; INTEGER ARRAY EXTRA extra[1:10] ; PROCEDURE copymem(d,l,n,e); VALUE d,l,n,e; INTEGER d,l,n; REAL e; ioc(66); {Copy from normal to extra memory} copymem(0,20,location(normal[1]),extra[1]); ioc(67) returns the number of remaining free paragraphs (units of 16 bytes) and it must be called from within an INTEGER PROCEDURE. The result of the function is the free paragraph count. You should always make this call because the amount of free memory varies with the number of programs loaded and can be zero. 3.4 The Runtime Debugger The Runtime Debugger A major new facility available on the 16-bit Algol versions is the trace/debug module. Space considerations led to the decision not to add it to the Z80 version. It is activated by appending the [D] switch to the loader input file. The switch is not preserved across a chain. The filename supplied to the chained program mechanism must have [D] appended to cause the chained program to be run with debugging facilities. The switch [W] is used to activate both debugging and force FCB file system calls. The format of the compiler listing file has been changed to include a display of the program pointer at the end of each statement. Line numbers appear at the left margin, as before, but are terminated by '>' to delimit them more clearly from the program text (and identifier table if one is begin produced). The program pointer appears as a decimal number enclosed by '< >'. This is relative to the start of the program. It corresponds to the last number of the identifier table. When modules are linked you will have to add the starting location of each module, as given by the linker. The addresses in the root module will be correct. After loading a program with the [D] switch, the debug module announces that the stack area has been set up, and gives the limits of the loaded program in HEX relative to the DS register. Next the first 100 bytes of the program are listed. You can use this to check that the expected code has been loaded, by comparing the numbers with the compiler output, which is legible if the -B directive is used. The debugger works by stopping the program when a chosen address is reached, allowing the user to examine the memory before execution is resumed. The breakpoint 2 is set initially. This is the address of the first opcode, so in effect the debugger is entered before any codes are executed. When a breakpoint is reached, two lines are output to the terminal. The first shows the current program pointer, followed by the next few bytes of the Algol program. The first byte is the opcode which is about to be executed. The second line shows which procedure number is currently being executed, and then gives a list of the available command letters. For example: Algol PC, opcodes 2, 29 42 35 1 1 1 8 1 Procedure 0 G/T/R/D/V/M/B/S ? This is the initial breakpoint at location 2. The first opcode is 29, set zero on the stack. Next comes 42, set 6 bit constant, so 35 will be set on the stack. Next comes opcode 1, declare an array. The full meaning of the opcodes and their parameters can be found at the end of the user manual. Since the program is just starting the procedure number is 0, the main program level. The possible commands will now be described. The letters may be typed in upper or lower case. Command G. This means GO to the next breakpoint. The breakpoint is set by typing a decimal number immediately after the G. Any non-digit will terminate the number. If you make a mistake type DELETE (RUBOUT) before any terminator, and then re-type the whole number. Using the example above, G5 will cause the program to stop before declaring the array. If you type no number or 0, the effect is that no breakpoint is set, because the first executable opcode is at location 2. The program will run normally and never re-enter the debugger. Command T. This means Trace. One opcode will be executed and the debugger will be re-entered. Command R. Reset the variable stack pointer to the top level. See next command D for an explanation. Command D. Descend one level down the variable stack. When the program is interrupted, the variables available for examination are those belonging to the currently executing procedure as indicated on the second line of the display Command D causes the variables of the calling level to be made available. The command has no effect if the current level is 0. Successive commands D will eventually result in level 0 being available. Command R resets the top level as the available one. It is not possible to go back up one at a time because the stack does not hold the required information (it is never needed to run the program). Commands G and T cause an automatic command R. Command V. Examine a variable. Follow immediately with the decimal number of the variable to be examined. The variable numbers may be obtained from the identifier list output by the compiler. The variable will be printed both as an integer and as a floating point number. If the variable does not conform to the standard floating point format a row of asterisks is printed. If the variable is an array it will point to the dope vector and elements which can be examined using the M command to be described next. Command M. Examine memory as a series of words. Follow immediately by a decimal address. The contents of 16 words will be displayed, starting at the address given. Each is shown as an integer and a floating point number. Most useful for looking at the contents of arrays. Command B. Examine memory as a series of bytes. Follow immediately by a decimal address. Sixteen bytes will be printed first in decimal and then as an ASCII string. Command S. Display the Algol stack. Each stack item is shown as an integer and a floating point number. The top of the stack that is the last value set is printed first. Only the items deposited by the current procedure will be displayed. Using the example above, if the program is stopped at location 5 the values 35 and 0 will be on the stack. If an error occurs which has not been trapped by the programmer, then DEBUG is entered after the error message has been output. The Algol program counter is reset to point at the opcode which caused the error. You can examine the memory, but do not try to restart the program. The already partly executed opcode will have altered the contents of memory. You can determine from the location given in the error message how many bytes of program after the opcode had been used as parameters. 3.5 File handling under PCDOS and MSDOS 3.6 Overview of MSDOS file handling Overview of MSDOS file handling MSDOS and PCDOS have a file interface which has a high degree of compatibility with CP/M and CP/M-86. Although there are differences in detail, the Algol programmer has been isolated from them, so that with three exceptions the system runs exactly as under CP/M. The exceptions will not affect most programs. The compatibility between CP/M and PCDOS/MSDOS extends as far as the device names in Algol command lines, (e.g. KBD: and VT:), because these are interpreted as appropriate system calls. To access the AXO: and AXI: devices, use the names RDR: and PUN:. Rogalgol makes no BIOS calls, so results are identical under MSDOS and PCDOS. The known exceptions to CP/M-MSDOS compatibility are 1. CP/M supports sparsely populated random access files, whereas MSDOS does not. This will only affect your programs if you write to random access files with high record numbers, expecting the gaps below not to occupy physical disc space. Under CP/M no physical space is required; under MSDOS it is. 2. MSDOS will not create duplicate files of the same name on the same device. If you open a file for output, any previous file of that name is deleted. CP/M creates another file of the same name. IOC calls 13, 14, and 15 have no effect under PCDOS and MSDOS. 3. The CP/M system pads out the last record of a file with CONTROL-Z. MSDOS pads with zeros, possibly after a single CONTROL-Z. Sequential files written by Algol are padded with CONTROL-Z on both systems. Algol returns -1 at physical end of file on both systems. 3.7 File Control Block usage File Control Block usage Under MSDOS-PCDOS, the operating system version number is tested before any file operations. If the version is 1, then the CP/M type system calls using File Control Blocks are utilised. Under version 2 or later, the newer Pathname XENIX compatible calls are used by default. System calls in the range 0FH to 24H are converted to calls in the range 3AH to 42H and 56H. This applies only to calls made by the interpreter out of programmer control. Direct calls using the CPMD function are not intercepted, except that function 0 (exit program) is converted to 4CH. This change is transparent to the Algol programmer and has been made because Microsoft suggests that the old calls should not be used. Compatibility across all versions is maintained because the Algol still uses an FCB, with slightly altered contents. For MSDOS-PCDOS versions 2 up, the filename at the start of the FCB is used to create a pathname in a dedicated area of memory. The FCB of an open file contains the handle at offset 18H. For random access operations, the record number at offset 21H is used to calculate the required byte offset into the file. The price of compatibility is that you cannot used full pathnames with the INPUT and OUTPUT pre-declared file opening mechanism. You must use the CPMD function to use full pathnames. For testing purposes, it is possible to force the interpreter to use the FCB calls. The switch [V] is added to the loader input file to achieve this, but the loader itself will always use the new XENIX calls. Note that the switch setting is not preserved across a chain, which starts the interpreter right from the beginning, including all installation. The Algol program can propagate the switch, if required. The 'V' switch is useful for checking that file handling problems really were due to using the FCB system calls. 3.8 The operating system interface 3.9 Overview of BIOS and SYSTEM calls Overview of BIOS and SYSTEM calls BIOS calls are made using the library function BIOS or lOC(47) within an Integer Procedure. There is a high degree of compatibility between CP/M and CP/M-86. The BIOS mechanism is quite different under MSDOS and CP/M; they are therefore TOTALLY INCOMPATIBLE, BIOS calls copied from CP/M will not work at all under MSDOS. BIOS calls under CP/M-86 Functions less than 21 (decimal) are converted to the BDOS function 50 in a way transparent to the programmer. The results are exactly the same as using the BIOS library routine under CP/M-80. Under CP/M-86, BI0S functions higher than 20 are interpreted as a request to make an interrupt. The mechanism is exactly like the MSDOS interrupt request calls described in the next topic, except that there is no special action for interrupts 37 and 38 (25H and 26H). Under CP/M-80 BIOS functions higher than 20 produce unpredictable results, BDOS (system) calls under CP/M86 The BDOS functions which return an address in BX (27 1BH) and 31 (1FH)) are intercepted by the IOC(48) routine called by the library function CPMD. Special treatment is required because the addresses of ALLOC and DPB have a different segment base to the Algol variable area. To overcome this problem, the second parameter of the procedure in which IOC(48) is called is used to pass an address to which the DPB or ALLOC vector is copied. If the second parameter is zero the information is not copied and no meaningful results are returned. It is the programmer's responsibility to ensure that a valid address (an array or the common area) is used and that the array (or the 256 byte common area) is big enough. The DPB is fixed in size and should be examined first to calculate the array size required for ALLOC. The problem does not arise with the DMA transfer address as its segment base is the same as that of the Algol variables. Calls to BDOS (system calls in MSDOS terminology) using the CPMD library routine or IOC(48) are in many case compatible across all the operating systems. BDOS and BIOS calls under MSDOS are described the next topics. 3.10 BDOS (system) calls under MSDOS/PCDOS BDOS (system) calls under MSDOS/PCDOS There are three groups of system calls to be considered: (a) Those with function number 36 or less are CP/M compatible, although some (such as 27 and 31) are meaningless under MSDOS/PCDOS. Care is required because of slight differences from CP/M. (b) Those with higher numbers are specific to MSDOS and will not occur in programs written for CP/M. A new calling mechanism using a register array has been provided. Calls less than 56 (38H) return their result in AL and this register is returned as the result of CPMD, as for group (a). (c) System functions 38H (56D) and higher numbers use the carry flag to indicate success or failure. Instead of returning AL, the CPMD function returns the status register. The value of AL can still be found in the register array. Further details on how to use these three groups follow. (a) CP/M compatible system calls. A careful study should be made of the MSDOS Operating System Programmers Reference Manual, as the slight differences between the systems (mainly with regard to return codes) may affect the running of the program. Some differences do not matter, for example the SEARCH FIRST and SEARCH NEXT functions return 0, 1, 2 or 3 for success under CP/M but always 0 under MSDOS. None of the CP/M compatible calls returns information except in the register AL; this is returned to the Algol program as the value of the function CPMD, as under CP/M-8O. (b) MSDOS functions higher than 36 (24H) must use a new calling sequence. The first parameter of CPMD is the function code as for CP/M compatible calls. You must provide as the second parameter of the procedure calling IOC(48) the address of a block of 4 words (a 4 element integer of 8 element byte array is most convenient). The registers are set up from this block before the function is called, and their values on return are placed there. The function code to be in AH is however obtained from the first parameter and not the register array. The lowest address contains AX, followed by BX, CX and DX in that order. The least significant byte is at the lower address if you use a byte array. A few functions (get time, get date, get free space on disc) do not require any registers to be set up on entry. (c) MSDOS function calls higher than 55 (37H) The success/error status of these functions is returned in the carry register, not AL. To allow them to be used with the CPMD direct system function call, the return value is the flag register. A register array is used in the way just described under group (b) and the returned value of AL can be found there. The carry is the least significant bit of the flag register, so doing MASK 1 will result in zero for success or 1 for an error. This mechanism allows nearly all the system functions to be called from Algol. The exceptions are those which require values to be set in SI or DI or ES. This group includes 56H, rename. However, the Algol library function RENAME can be used. By using CPMD for file access instead of the built in mechanism, full pathnames may be utilised for all function except RENAME. 3.11 BIOS calls under MSDOS/PCDOS BIOS calls under MSDOS/PCDOS Under MSDOS/PCDOS the Algol BIOS call generates an interrupt. The two parameters of the BIOS library routine are (1) the interrupt number and (2) the address of an array containing the desired contents of the registers on entry. This latter is set up in the same way as described above under 'MSDOS functions higher than 36'. However, unlike the BDOS call, register AH is set up from the register array. On return, the register array is updated to reflect the new contents of the registers. The result of the function is the hardware flag register on return from the interrupt, as for system function calls higher than 37H. Under MSDOS/PCDOS, but not CP/M-86, the stack is balanced by POPing the flags after interrupts 37 and 38, absolute sector disc read and write. This is as described in the MSDOS Programmer's Reference Manual. These interrupts return the success code in the C flag, which is why the value of BIOS function is the flag register and not AX. 3.12 Compiling and linking the 80x86 Rogalgol executables The runtime interpreters and the linker are constructed by including source modules with the extension .INC. These files are identical for CP/M-86 and MSDOS, except for two short ones which have different names under the two operating systems. The subset compiler has been changed so that it can generate code for both 16 bit systems and CP/M-80. You are asked to type a number on the terminal indicating the target system. 1. CP/M-86 This system has no object module linker, so the necessary include files all appear in the same assembly. 1a. Subset and full compiler Assemble CRUN.A86. Run ICOM to generate ICOM86.A86 from ICOM.IAG. The origin is 1772. Assemble the resulting file to get ICOM86.M86. Then concatenate using PIP: PIP ICOM.M86=CRUN.M86[I],ICOM86.H86[H] Then generate the command file by GENCMD lCOM DATA[XFFF] The procedure for the full compiler is exactly the same, except that you use ALGOL instead of lCOM and you must leave off the [H] switch. PIP runs out of memory and reports a non-existent error. The [I] switch on CRUN.H86 is vital. CP/M-86 Runtime and linker programs Assembly is straightforward: ASM86 ARUN $Ax Hx Px Zx (x is your choice of device) In generating the runtime program you can choose how much memory is to be allowed for the compiled algol + data area. The following line is appropriate if you do not require EXTRA arrays: GENCMD ARUN DATA[XFFF] To allow as much memory as possible use XFFFF instead of XFFF. The hex number following X is the maximum number of paragraphs required. The procedure for ARUNL is Identical. The two linkers are generated in a similar way, but there is no point in giving them more than FFF paragraphs: 2. MS-DOS The assembler cannot take long source files, but a linker is provided. The system is therefore put together by linking object modules. The source files for the modules use the same include files as the CP/M version. 2a. The subset compiler This is CRUN+AIOPAK+ICOM. The first two are obtained by using MASM on the source files of the same name. ICOM.OBJ is obtained by assembling ICOM.ASM, which is in turn the output of ICO.EXE compiling ICOM.IAG. The origin to give for MS-DOS is 1772. Note that you cannot use the same version of ICOM.ASM as under CP/M86 because the source code conventions are different. 2b. The full compiler. The procedure should in theory be the same as for the subset compiler. However, MASM may not assemble ALGOL.ASM because it's too big. If so it must be pre-processd by the program LABELS. This removes all symbolic labels, reducing the subset compiler output to a list of pure decimal numbers. Call it like this: LABELS ALGOLX.ASM=ALGOL.ASM It will ask OUT-IN? like other Algol programs, If it can't open the files (or if you called it without giving any). Either ALGOL.ASM itself or the output of LABELS can be assembled by MASM. Then link it with CRUN+AIOPAK. CRUN+AIOPAK+ALGOL(X) 2c.The MS-DOS runtime programs and linkers. These are linked using object modules as follows: All object modules are obtained by assembling the source file of the same name. ARUN+AFPP+AIOC+AFINT+AIOPAK for the floating point interpreter. ARUN+ALIAP+AIOC+AFINT+AIOPAK for the long integer interpreter. ARUN+AIOC+AFINT+AIOPAK for the floating point linker. ARUN+ALIAP+AFINT+AIOPAK for the long integer linker. The MSDOS linker automatically generates a .EXE file which, when loaded, claims all available memory.