The Expressive Ada 2012 Challenge
I like the idea of the Expressive C++17 Challenge. I have an issue with how the problem is way, way underspecified1 I have provided my interpretation of the requirements in Appendix B: Program Requirements. You can probably understand why I find the specification annoying. What happens if the input file simply does not exist? What happens if the number of columns differ between lines? Not to mention that this ascii delimited data format is described all over the place as “csv”. It’s not., but let’s not rain on our own parade!
The twist of the challenge is that the program should use as many new C++17 features as reasonable, and it should not use any library other than the standard library. We got a C++ solution. We also have a Rust solution as well as a D solution. However, the latter two are not as faithful to the original problem as one may have hoped, because they do not focus on using new features. That said, the D solution is probably my favourite so far because the accompanying article was so educational.
With this article, we even have an Ada solution! I focused on two things while pulling this code together a while back:
- Using as many new features from Ada 2012 as reasonable2 This is in line with the original C++17 challenge, but a departure from the Rust and D solutions which just use language features that are good in general, and not specifically those that appeared in the most recent version of the language., and
- Handle infinite streaming files with good performance even on many fields,
Before looking at the code, let’s just quickly check out which Ada features I used that are worth mentioning. The numbers in parentheses indicate which version of the language3 In 1983, the first version of the language – Ada 83 – was designed. In 1995, it was upgraded with (among other things) better support for object-oriented programming and got the name Ada 95. This got further incremental improvements in 2005, and, unsurprisingly, this version of the language is called Ada 2005. Finally, a fairly large revision of the language was made in 2012, giving us the eponymous Ada 2012. that introduced them.
- Contract-based programming (2012)
- Pre- and postconditions
- Loop invariants
- Type invariants
- Expression functions
- If expressions
- Quantified expressions
- Non-null constant references (2012)
- Extended return statements
- Extended use clauses (2012)
- Messages with exceptions (2005)
- Controlled types (95)
- Generic units (83)
- Named blocks and lexical nesting (83)
- Dynamic bounds and parametrised types (83)
I would like to go into each of these in depth, but I don’t have nearly enough time to do that. What I’ll do is highlight some of the more interesting parts of the code and then you can check out the full program in appendix A, should you want to.
Contracts and Invariants (2012)
What is probably one of the greatest additions in Ada 2012 is the inclusion of
various assertion types in the language standard. Assertions are one of the
best ways to ensure your code is doing what you want it to do. Whenever you
find yourself thinking, “and here, on line 84, the keys_pressed
set should
include the return key” – stop! – and encode that thought as an assertion.
Above line 84, create a new line and write
assert keys_pressed.contains(xk_return);
By taking the things you think you know about the code, and spelling them out literally in the source file, you not only discover whether or not you were right – but you also help the next person to read the code, who will not have to retrace your steps to gain your insight.
Ada 2012 makes this stuff really convenient. Beside the basic Assert
pragma,
we can also specify, among others,
- ~Pre~(condition), checked each time before its subprogram4 Subprogram is a fancy way of saying “function or procedure” while still retaining the ability to do whole sentences in only one breath. is called;
- ~Post~(condition), checked each time before its subprogram returns;
- Loop invariants, checked on each iteration of its loop5 I was just
informed by sparre in #ada that the
Loop_Invariant
pragma is actually not part of Ada 2012 – it is borrowed from spark 2014 and my Ada compiler converts it to a regular assertion. No matter, we can simply replace it with a regular assertion ourselves!; Type_Invariant
, checked each time a value may have changed6 At this point the performance-seeking reader has probably fell off their chair. Obviously, these checks are disabled in hot paths of the release build, and it is trivial to do so..
These let us encode the “hidden truths” of our code in a way that lets us detect bugs very early – namely exactly at the point where something went wrong.
Please note that you don’t have to think hard to add assertions to your code. Think of them as insanity checks, because for almost any variable, I can come up with a value that will make you go, “Haha, that’d be insane!7 Examples: The number of keys pressed simultaneously on a standard 105-key keyboard will never exceed 500. The number of dots in an email address will never exceed the number of characters in the address. When you change a dollar bill for cents, you will never get more than 100 coins. These are things that should make you go “duh”, and these are also exactly the things that can delay a bug hunt by hours.” Assert that shit. You’re not necessarily trying to verify that your code is sane, you just want to make sure it’s not insane.
Type Invariant
In the Ada code I wrote for this, the type representing a row of fields is
called Row
. The Row
type stores the start and end string indices for each
field in that row as two parallel arrays called First
and Last
8 So to get
the contents of the third field, you have to slice the string on (Row.First (3) ..
Row.Last (3))
..
type Row (Source : not null access constant String; Columns : Positive) is tagged record First : Index_Array (1 .. Columns) := (others => 1); Last : Index_Array (1 .. Columns) := (others => 1); end record with Type_Invariant => (for all I in 1 .. Row.Columns => Row.First (I) <= Row.Last (I) and Row.First (I) in Row.Source'Range and Row.Last (I) in Row.Source'Range);
We see that Row
is a tagged record, which means it is allowed to tap into the
object-oriented features introduced in Ada 95. I don’t use any of those
features9 Besides using the Object.Method (Arguments)
syntax we also get by
virtue of being tagged but in my experience, defaulting to tagged types comes
with nearly no drawbacks, and gives a lot of flexibility later on when the
program grows.
The type declaration states that each value of this type references the input string it is based on, which we’ll look closer at in the next section about non-null constant references. The type also looks like it takes the expected number of columns as an argument, and we’ll look at that when we get to dynamic bounds and discriminated types.
I meant to write about the type invariant, but I realise I probably don’t even have to explain that. It uses a quantified expression to say that all fields should start and end inside the string which contains them, and all fields must start before they can end.
Loop Invariant
Similarly, the loop invariant in the code is probably self-explainatory. The
assertion at the top of the loop body is the invariant that must be true on
each iteration.10 In an earlier version of
this article, the code here said pragma Loop_Invariant (First < Line'Last)
because I mistakenly believed that pragma was part of the 2012 standard. I was
informed that it only exists in the spark 2014 standard (a subset of Ada aimed
at high reliability), and it worked for me because my Ada compiler supports it
by converting it to a regular assertion.
return This : Row (Line, Column_Count) do Scan: for I in 1 .. Column_Count loop pragma Assert (First < Line'Last); This.First (I) := First; Find_Last: declare Next : constant Natural := Strings.Index (Line.all, FS, First, Ada.Strings.Inside); begin This.Last (I) := (if Next > 0 then Next - 1 else Line'Last); end Find_Last; First := This.Last (I) + 2; end loop Scan; end return;
This loop invariant is of the insanity check kind. When we enter an iteration
of the loop, it means we are going to scan a field of the string. At entry,
First
stores the index of the first character of the field. If this is outside
the string, we have clearly gone too far – or messed up some arithmetic. It’s
not a sophisticated invariant, but it did actually reveal a bug at one point.
Oh, and we’re also looking at two other features that are new in Ada 2012:
extended return statements and if expressions. There are technical reasons
extended return statements have to exist11 There are some types of resources
we are not meant to create copies of – they are only supposed to have one
owner at any time. In languages like C, you just have to pinky promise that
you won’t do anything naughty. Ada has language support for expressing
non-copyability with what are called limited types. They way non-copyability
is enforced in practise is, among other things, by forbidding assignment of
such types. So if you want to return a value of this type, and do some
initialisation first, you get the option of turning the return
statement into
a full block of code, and inside that block of code, the normal restrictions
on the variable are lifted. but I have come to really like them for code
organisation reasons. It’s a neat way to not litter the namespace with a
variable you just want to initialise and then return.
Non-null Constant References (2012)
Ada does a lot to help out with memory management while staying very low-level. In languages like C and Ada, there are primarily two12 Okay, three, actually, but we’ll look at the third in the next section modes of dealing with acquiring memory for your values. You either let the compiler do it automatically for you, or you have to do it manually.13 I’m tempted to work in the words “stack” and “heap” into this description but I’ll resist. In my uneducated mind, the specifics around how the allocations happen is an implementation detail. Doing it manually is obviously dangerous. If you dispute that, please recall that thousands of really hardcore, skilled C wizards get it wrong over and over and over again. That should be a powerful argument.
However, I don’t blame these powerful wizards. The C compiler is really limited in its ability to do memory management automatically, so they have to do it manually. A lot. Of course they’ll get it wrong. The Ada compiler does a slightly better job of helping the programmer with automatic allocation and keeping track of memory. Some of this is stuff we’ll see when we get to parametrised types, but we’ll start gently here with just talking about referencing existing objects.
Parameter Modes
One big reason pointers are everywhere in C code is that pointers are the only way to let a subprogram modify one of the caller’s variables. Without pointers, all subprograms get their own copies of the data – which is not always what we want. Ada has what are called parameter modes, which means you can specify whether the function body should be able to read from a parameter, write to it – or both. The compiler handles the reference boilerplate for you.14 This makes things much more clear than it may at first seem: it makes it so that all parameters can be passed by reference, because either they are read only and then the callee can’t modify them anyway, or they are writable and then the caller expects the callee to modify them.
Access Types
The closest thing Ada has to pointers are called access types. These access types are really cleverly designed. Someone wrote that “in Ada it is syntactically invalid to leak pointers to memory that is about to become invalid.” That’s a C programmer’s wet dream. That’s the kind of power you can get when you pay great minds a lot of money to solve hard problems.
There are several rules surrounding access types to make this work and I’m not going to give a comprehensive guide, but these are the basics: To reference an object, you have to create an access type for that type of object, and then declare a variable of that access type. This shouldn’t be particularly surprising. Assming we have a boat:
type Boat is record Captain : Person; Propulsion : Sails_Or_Motor; end record; My_Boat : aliased Boat := (Captain => Me, Propulsion => Sails);
The aliased
indicator is required in order to get a reference from a variable
in the first place.15 The reason for this is elementary: in order to get
the address of an object, the object must have an address. In order to have an
address, the object must reside in primary memory. The compiler tries its very
hardest to avoid storing objects in primary memory, because that is slow
compared to storing them in cpu registers – or even optimising them away
entirely. If we want the compiler to not do this, we make it an aliased
declaration, which guarantees it will have a memory address. We can then let
others access our boat by creating
type Boat_Access is access Boat; Family_Key : Boat_Access := My_Boat'Access;
This creates an access called Family_Key
which, well, accesses my boat. Anyone
who gets hold of the Family_Key
value can do stuff to my boat, including
replace me as the captain.
But! Big but.16 And I cannot lie. While access types are free to reference
any global object, they can only reference a stack object if the stack object
is declared in the same scope as the access type itself. In addition, two
different types declared to be access Boat
are actually incompatible. If you
want to copy a reference, you have to use the same access type that was
originally used to create the reference. That way, references can only ever
point to live stack objects.17 This is a tricky part of Ada when you are
learning it. Despite what masochists say, memory management is hard. In C, you
get weird security flaws. In Ada, you have to really fight to get the compiler
to accept your code. If you are struggling with learning this, my
recommendation is to try different ways to intentionally create a pointer to
invalid memory. You’ll know exactly what the problem is, and then the Ada
compiler will complain – and bam! – you now have an understanding for exactly
what the compiler complains about when it gives you that error message.
We can also create access types that serve as read-only views of the object they reference, limit access types to never contain null (i.e. always reference some object), create access types that can only reference objects allocated a certain way, or limit the total amount of memory available for an access type, or …
In our case, we parametrise the Row
type on an access to the source string:
type Row (Source : not null access constant String; Columns : Positive) is tagged private;
The Source
parameter accepts any non-null read-only access-to-string type.
Then when we create objects of this type, the actual reference we pass in
comes from a string read from the input file:
Raw : aliased constant String := Get_Line (Input.File); Header : constant CSV.Row := CSV.Parse (Raw'Access);
How can this possibly be safe, if Row
accepts any access type? Isn’t there
a way for Row
to misplace the pointer, in a way that it will still linger when
the referenced Raw
goes out of scope? No, because in order for Row
to be able
to actually store the pointer somewhere, Row
would have to be able to declare
a variable of a specific access type – one that is declared in the same
location as the Raw
object. This does not exist, since the access to Raw
is
passed via a one-time-use-only anonymous access type.
Controlled Types (95)
I mentioned there’s a third way to deal with memory management. In fact, this approach transcends just memory management, and extends into general resource management. In other words, any time you need to do mechanical acquisition of a resource, or make sure you clean up when you are done with an object, this technique works.
In Ada, we call it controlled types. In other circles, it’s better known as raii. It is hard to overstate how important raii is. It does not eliminate resource leaks – far from it – but with raii you get a low-effort mechanism to reduce resource leaks by a lot.
The idea of raii is that whenever an object of a raii controlled type is created, an initialisation procedure is automatically called, and this is responsible for acquiring the resource. When a copy of the resource is created (by assignment), an adjustment procedure is automatically called, responsible for creating a copy and handing over control over it. When an object is finally destroyed, a finalisation procedure is automatically called, taking care of cleaning up. This happens regardless of why the object is destroyed: it could be because an unhandled exception was thrown, or simply because the variable containing the object went out of scope. In a sense, this makes a garbage collector out of the regular scope management the compiler has to do anyway.
In our code, we declare a raii controlled file handle, to automatically close the file even if the program exits abnormally.
type Controlled_File is new Ada.Finalization.Limited_Controlled with record File : File_Type; end record; overriding procedure Finalize (This : in out Controlled_File) is begin if Is_Open (This.File) then Close (This.File); end if; end Finalize;
To declare a raii controlled type in Ada, all that needs to be done is inherit
from one of the abstract types Controlled
or Limited_Controlled
18 The
difference between the two is that Limited_Controlled
does not have an Adjust
method., both in the standard library package Ada.Finalization
. The three
methods Initialize
, Adjust
, and Finalize
are called automatically at various
points when objects are created, assigned or destroyed, and these can be
overridden to do custom resource management.
Generic Units (83)
In principle, the generics in Ada work like C++ templates (i.e. they allow you to tell the compiler how to generate concrete code from a generic description, and the compiler will generate code for the types you need). They are different from C++ templates in that they are not a mess. They’re actually well architected.
In this program, I have made the Clamp
function a generic unit. That’s
slightly overkill, and I probably wouldn’t have done that if I wrote this
particular program only for myself. However, in a larger Ada program, this is
just the sort of thing you might see.
generic type Target is range <>; function Clamp (Element : in Target'Base) return Target with Post => (if Element in Target then Clamp'Result = Element);
In fact, this declaration is probably as close as I have come to a pure distillation of ada. It’s a very compact display of what Ada is like. We have at our disposal unusually advanced type mechanisms, combined with what are commonly referred to as zero cost abstractions, and we make sure our variables are what we expect them to be.
We say that the function Clamp
is generic on the Target
range, takes any value
in the base of the target range, i.e. the greater parent range, and then
returns a value in the Target
range. We have an insanity check to make sure
that the value returned is the same as the value passed in, if the value
passed in already fits into the target range.
The implementation of the function is not very interesting, but you can find it in the code listing in the appendix anyway.
We instantiate this generic function by first making a range for it to clamp values to. Then we ask the compiler to generate a concrete function from the generic description, given the range type.
subtype Byte_Range is Positive range 0 .. 255; function Byte_Clamp is new Clamp (Byte_Range);
Now Byte_Clamp
is a regular concrete function that will take any integer and
return the nearest valid value that fits in the range 0–255.
Named Blocks and Lexical Nesting (83)
Note that in the spirit of the challenge, I decided to implement everything in a single Ada file. This is not how code is normally written. Like in other languages, packages are normally defined in separate files with separate package specification, and so on. Ada has a relatively expressive module system, which – contrary to many other languages – supports both different philosophies when it comes to package hierarchies19 Do parent packages provide the fundamental base blocks, with child packages extending these with high level operations – or do private child packages implement the base blocks, with parent packages pulling these together into higher level operations? Ada lets you do either, even though at first they may appear mutually exclusive..
You can also see that Ada supports arbitrary levels of lexical nesting in subprograms. This is necessary when you want to simplify the body of a subprogram by extracting some of the code into a separate – but individually nonsensical – subprogram.
Ada also has named blocks and loops. You can create a new block of code as such:
Find_Target: declare Raw : aliased constant String := Get_Line (Input.File); Header : constant CSV.Row := CSV.Parse (Raw'Access); begin Column_Count := Header.Column_Count; ------------------------------------------------->8--- Put_Line (Output.File, Header.Source.all); end Find_Target;
This would be just as valid without the name, and the name is nothing but a documentation thing. It does prevent bugs by not letting you accidentally close the wrong block, though. Named loops are more important.
Replace: while not End_Of_File (Input.File) loop ------------------------------------------------->8--- end loop Replace;
Why does this matter? Because if we had another loop nested inside Replace
, we
would be able to break out of the outer loop from inside the inner by saying
exit Replace
.
Dynamic Bounds and Parametrised Types (83)
We saw earlier some ways in which Ada avoids pointers. There are more ways,
though! We’ll look at the Row
type again, to see how.
type Row (Source : not null access constant String; Columns : Positive) is tagged record First : Index_Array (1 .. Columns) := (others => 1); Last : Index_Array (1 .. Columns) := (others => 1); end record -- …
This type is parametrised (or, as Ada people say, discriminated) on the number
of fields. This means that – even for the same source – Row (Src, 5)
and Row
(Src, 6)
are not the same. The former indicates a row of five fields, and the
latter one of six. This stuff matters in low-level programming, where the
compiler needs to know how much memory to reserve on the stack for each
object. So far, this is not fundamentally different from C. What happens next,
though, is what makes the difference.
Raw : aliased constant String := Get_Line (Input.File); Header : constant CSV.Row := CSV.Parse (Raw'Access);
Ada lets us declare a variable holding a Row
of arbitrary number of columns,
as long as it is initialised directly. This appears like dynamic memory
management, and it is dynamic for sure, but it is still only reserved space on
the stack.
This is in fact why the first assignment to Raw
works, too. In Ada, a String
(1 .. 6)
(i.e. a an array of six characters) is a different and incompatible
type type compared to String (1 .. 7)
(an array of 7 characters). However, we
can declare an almost generic String
variable and fill it up right away. C has
this for arrays, but Ada has it for arbitrary types.
Another difference is that Ada allows us to do this also for subprogram
parameters. In Ada we can have a subprogram receive a String
of run-time
decided size. In C, such an array would be implicitly converted to a plain
pointer, with the information about its size lost.
Reflections
We have now seen some of the more important features I used. There are other things worth discussing, though. This is probably interesting also to people who couldn’t care less about learning Ada.
Code Style
Looking at the code listing, I realise how formal and enterprisey the code looks. I think this is a natural consequence of the task at hand, and not a reflection of my coding style. Even though Ada is a very enterprisey language, you don’t have to write enterprisey code in Ada. In fact, the first solution I sketched was much less formal in its construction.
But then I tried to cram in a bunch of useful Ada 2012 features, and suddenly there was no way to avoid the formality of it all. As examples, I found out that type invariants require encapsulation through private implementations, and generic functions cannot be declared and implemented at the same time.
Performance
Sebastian, in the article he wrote on his D solution, ran a benchmark comparing the performance between the solutions in C++, Rust, and D. I would love to compare all of those to this Ada program, but I’m currently stuck on a bus with no reliable internet connection, so I can’t install the compilers for D and Rust.
What I can do, however, is compare different versions of the Ada program to the final C++ contribution. To begin with, I constructed a large-ish (12 megabytes: 17 columns, and 97164 lines) comma-separated file where the fields were filled with random numbers.
The C++ program, compiled according to the command specified in the challenge, processes this file in 0.95 seconds.20 Since my laptop died, my fiancée and/or her sister graciously let me borrow an old laptop they don’t use anymore. Hence the not-so-fast program execution. In order not to focus too much on specific numbers, and look instead at general trends, the times in the table below will be given relative to the C++ measurement.
Program description | Speedup |
---|---|
C++ winning solution | 0 % |
Ada w/ heap allocation | 19 % |
Ada w/ stack allocation | 50 % |
Ada w/ varying array size | 150 % |
Ada w/ fixed array size | 190 % |
The program we have been discussing here is the last one. The results here are mostly what you would expect, so there’s not too much to say about that. The two allocating versions copied the contents of the fields to each field object. Obviously, just referencing the source string directly is faster.
I then tried compiling the last program under two different types of
optimisation. First with the bounds checks disabled (-gnatp
), which yielded
only a modest increase in performance, and then with optimisation on (e.g.
-O1
), which was surprisingly significant. It didn’t appear to matter much
which optimisation level, only that optimisation was requested at all.
Compilation type | Speedup |
---|---|
Normal | 0 % |
Checks disabled | 6 % |
Optimisation on | 38 % |
I suspect the reason for the modest improvement with disabled checks is that the compiler will already elide checks which it knows will not be triggered. So ideally, in a program which is 100% correctly written and almost all failure modes are taken care of, the disable checks flag should do nothing to performance.
Multi-tasking
Because one of the places where Ada shines is in multi-tasking, I also threw together a multi-tasked version of the program.
Program | Speedup |
---|---|
Ada, sequential | 0 % |
Ada, multi-tasking | -73 % |
You are reading that right. Multi-tasking decreased the performance a lot.
The reason for that is pretty clear though, just by looking at the output of time
:
Measure | Time |
---|---|
Real | 3.53 |
User | 4.01 |
System | 1.39 |
There are two ways to read this, but both with the same results.
- We could look at the system time, and say, “It’s spending a lot of time in the kernel. I wonder what it does there? Perhaps it is coordinating threads of execution.”
- We could also look at the ratio of user to real time. A fully parallelised task on my dual-core-with-hyperthreading-registered-trademark-intel-property machine would show a user time of nearly quadruple the real time. We barely even get 1.3 times the real time, so no real gain from parallelisation there.
The conclusion is, of course, that the problem is sequential enough in its nature that we do not gain performance with multi-tasking. Instead, we lose performance because we need to spend a lot of time coordinating between the tasks.
Calls For Contribution
I’d love to see solutions in C, Go, and Java: all under the premise of using the new features in the latest version of the language. I realised the original challenge may very well have been a sneaky attempt at tricking people into learning the new features in C++17, because I promise you learn a lot by doing this sort of thing. Even if you already think you know most of the new features. Shoehorning them into a problem gets you familiar with their ins and outs.
Either way, I like the idea of a task like this to keep yourself updated on the new stuff. And I would like to know what’s new in particular in Java 9 and 10.
Appendix A: Code Listing
-- Modeled on the Expressive C++17 Challenge, I present: -- -- The Expressive Ada 2012 Challenge! -- -- Ada features used (language version in parentheses): -- + [x] Contracts (2012) -- + [x] Type invariants (2012) -- + [x] If expressions (2012) -- + [x] Quantified expressions (2012) -- + [x] Expression functions (2012) -- + [x] Extended use clauses (2012) -- + [x] Extended return statements (2012) -- + [x] Non-null constant reference (2005) -- + [x] Message included in exception (2005) -- + [x] Controlled types (1995) -- + [x] Generic units (1983) -- + [x] Named blocks (1983) -- + [x] Dynamic bounds (1983) -- + [x] Discriminated types (1983) -- + [x] Lexical nesting (1983) with Ada.Exceptions; with Ada.Finalization; with Ada.Strings; with Ada.Strings.Maps; with Ada.Strings.Fixed; with Ada.Text_IO; with Ada.Command_Line; procedure Replace is ------------------------------------------------------------------- -- Exceptions ------------------------------------------------------------------- Invalid_Input : exception; -- Raised when input file exists but --| is empty. Invalid_Target : exception; -- Raised when header line of input --| file does not contain --| requested target field. ------------------------------------------------------------------- -- Error messages generated by the exceptions above ------------------------------------------------------------------- Input_Missing : constant String := "Input file missing"; Column_Missing : constant String := "Target field not found"; ------------------------------------------------------------------- -- User supplied parameters to customise operation ------------------------------------------------------------------- In_Filename : constant String := Ada.Command_Line.Argument (1); Out_Filename : constant String := Ada.Command_Line.Argument (4); Target_Field : constant String := Ada.Command_Line.Argument (2); Replacement : constant String := Ada.Command_Line.Argument (3); ------------------------------------------------------------------- -- Clamp -- -- Purpose: -- Returns the integer within the Target range that is nearest -- to the Element argument. -- Exceptions: -- None. ------------------------------------------------------------------- generic type Target is range <>; function Clamp (Element : in Target'Base) return Target with Post => (if Element in Target then Clamp'Result = Element); ------------------------------------------------------------------- -- Clamp ------------------------------------------------------------------- function Clamp (Element : in Target'Base) return Target is (Target'Base'Max (Target'First, Target'Base'Min (Target'Last, Element))); ------------------------------------------------------------------- -- CSV -- -- Purpose: -- Provides the Row type, which represents an ASCII-comma -- delimited record of fields. Has subprograms to parse a -- String into Row, and to get the contents of individual -- fields in a Row. ------------------------------------------------------------------- package CSV is ---------------------------------------------------------------- -- Row -- -- Purpose: -- Represents an ASCII-comma delimited record of -- Columns fields, parsed from Source. ---------------------------------------------------------------- type Row (Source : not null access constant String; Columns : Positive) is tagged private; ---------------------------------------------------------------- -- Parse (known number of columns) -- -- Purpose: -- Parses Column_Count fields from Line into Row. -- Exceptions: -- None. -- Performance Notes: -- O(n) time in length of Line. -- Dragons: -- When Column_Count exceeds the actual field count, the ret- -- urned Row will have a corresponding number of empty fields -- at the end. ---------------------------------------------------------------- function Parse (Line : not null access constant String; Column_Count : in Positive) return Row; ---------------------------------------------------------------- -- Parse (unknown number of columns) -- -- Purpose: -- Parses an unknown number of fields from Line into Row. -- Performance Notes: -- O(n²) time in length of Line. ---------------------------------------------------------------- function Parse (Line : not null access constant String) return Row; ---------------------------------------------------------------- -- Contents (single field) -- -- Purpose: -- Returns contents of individual fields of Row. -- Exceptions: -- Constraint_Error -- when Index is larger than the number of fields in Row. ---------------------------------------------------------------- function Contents (This : in Row; Index : in Positive) return String with Pre => Index in 1 .. This.Column_Count, Post => Ada.Strings.Fixed.Index (This.Contents (1, This.Column_Count), Contents'Result) > 0; ---------------------------------------------------------------- -- Contents (range of fields) -- -- Purpose: -- Reconstructs the part of the Source line that contains all -- fields between First and Last, including delimiters. -- Exceptions: -- Constraint_Error -- when either First or Last is larger than -- the number of fields. ---------------------------------------------------------------- function Contents (This : in Row; First, Last : in Positive) return String with Pre => (1 <= First and First < Last and Last <= This.Column_Count); ---------------------------------------------------------------- -- Column_Count -- -- Purpose: -- Return the number of fields in Row. Useful e.g. if Row was -- parsed by the Parse function that figures out field count -- on its own. ---------------------------------------------------------------- function Column_Count (This : in Row) return Positive; private -- package CSV ---------------------------------------------------------------- -- Index_Array -- -- Purpose: -- Store an arbitrary number of string indices. ---------------------------------------------------------------- type Index_Array is array (Positive range <>) of Positive; ---------------------------------------------------------------- -- Row -- -- Implementation Notes: -- Stores fields as parallel arrays of the string indices of -- Source at which each field starts and ends. ---------------------------------------------------------------- type Row (Source : not null access constant String; Columns : Positive) is tagged record First : Index_Array (1 .. Columns) := (others => 1); Last : Index_Array (1 .. Columns) := (others => 1); end record with Type_Invariant => (for all I in 1 .. Row.Columns => Row.First (I) <= Row.Last (I) and Row.First (I) in Row.Source'Range and Row.Last (I) in Row.Source'Range); end CSV; ------------------------------------------------------------------- -- CSV ------------------------------------------------------------------- package body CSV is package Strings renames Ada.Strings.Fixed; use all type Ada.Strings.Maps.Character_Set; ---------------------------------------------------------------- -- FS -- -- Purpose: -- The field separator expressed as Character_Set for -- compatibility with Ada.Strings.Fixed. ---------------------------------------------------------------- FS : constant Ada.Strings.Maps.Character_Set := To_Set (","); ---------------------------------------------------------------- -- Parse (known column count) -- -- Implementation Notes: -- Linear scan of line with Column_Count iterations, and the -- starting point of the next iteration found by -- Ada.Strings.Fixed.Index. ---------------------------------------------------------------- function Parse (Line : not null access constant String; Column_Count : in Positive) return Row is First : Positive := 1; begin return This : Row (Line, Column_Count) do Scan: for I in 1 .. Column_Count loop pragma Assert (First < Line'Last); This.First (I) := First; Find_Last: declare Next : constant Natural := Strings.Index (Line.all, FS, First, Ada.Strings.Inside); begin This.Last (I) := (if Next > 0 then Next - 1 else Line'Last); end Find_Last; First := This.Last (I) + 2; end loop Scan; end return; end Parse; ---------------------------------------------------------------- -- Parse (unknown column count) -- -- Implementation Notes: -- First counts the number of field separators in Line, to -- to determine Column_Count. Then calls Parse with the now -- known Column_Count. ---------------------------------------------------------------- function Parse (Line : not null access constant String) return Row is (Parse (Line, Strings.Count (Line.all, FS) + 1)); ---------------------------------------------------------------- -- Contents (single field) ---------------------------------------------------------------- function Contents (This : in Row; Index : in Positive) return String is (This.Source (This.First (Index) .. This.Last (Index))); ---------------------------------------------------------------- -- Contents (range of fields) ---------------------------------------------------------------- function Contents (This : in Row; First, Last : in Positive) return String is subtype Field_Range is Positive range 1 .. This.Columns; function Field_Clamp is new Clamp (Field_Range); First_Field : constant Positive := (Field_Clamp (First)); Last_Field : constant Positive := (Field_Clamp (Last)); subtype Source_Range is Positive range This.Source'Range; function Source_Clamp is new Clamp (Source_Range); begin return This.Source (Source_Clamp (This.First (First_Field) - 1) .. Source_Clamp (This.Last (Last_Field) + 1)); end Contents; ---------------------------------------------------------------- -- Column_Count ---------------------------------------------------------------- function Column_Count (This : in Row) return Positive is (This.Columns); end CSV; use Ada.Text_IO; ------------------------------------------------------------------- -- Controlled_File -- -- Purpose: -- Wrapper around File_Type which automatically closes the file -- whenever the object is finalized, iff the file was open. ------------------------------------------------------------------- type Controlled_File is new Ada.Finalization.Limited_Controlled with record File : File_Type; end record; ------------------------------------------------------------------- -- Finalize ------------------------------------------------------------------- overriding procedure Finalize (This : in out Controlled_File) is begin if Is_Open (This.File) then Close (This.File); end if; end Finalize; ------------------------------------------------------------------- -- Local variables of the replacing procedure. Keeps track of input -- and output files, as well as the column index and the number -- of columns all rows should have. ------------------------------------------------------------------- Input : Controlled_File; Output : Controlled_File; Target : Positive; Column_Count : Positive; begin -- procedure Replace -- Open the input file and verify it is not empty Open (Input.File, In_File, In_Filename); if End_Of_File (Input.File) then raise Invalid_Input with Input_Missing; end if; -- Scan the first line of input to find the column index of target. -- If found, create output file and write the first line to it. Find_Target: declare Raw : aliased constant String := Get_Line (Input.File); Header : constant CSV.Row := CSV.Parse (Raw'Access); begin Target := 1; Column_Count := Header.Column_Count; for I in 1 .. Column_Count loop if Header.Contents (I) = Target_Field then Target := I; end if; end loop; if Header.Contents (Target) /= Target_Field then raise Invalid_Target with Column_Missing; end if; Create (Output.File, Out_File, Out_Filename); Put_Line (Output.File, Header.Source.all); end Find_Target; -- For all other lines in the input, copy everything except the -- target column to the output, replacing the target column -- contents with the replacement string. Process: while not End_Of_File (Input.File) loop declare Raw : aliased constant String := Get_Line (Input.File); Current : constant CSV.Row := CSV.Parse (Raw'Access, Column_Count); Result : constant String := Current.Contents (1, Target - 1) & Replacement & Current.Contents (Target+1, Column_Count); begin Put_Line (Output.File, Result); end; end loop Process; exception -- procedure Replace when Error : Invalid_Input | Invalid_Target => Put_Line (Standard_Error, Ada.Exceptions.Exception_Message (Error)); end Replace;
Appendix B: Program Requirements
- The program must read a file, the name of which is given as the first command-line argument to the program.
- If the input file is empty, an error message stating so must be emitted on a standard output stream, and the program execution must halt, and a new file must not be created.
- Each line in the input file must be interpreted as a record of related fields, where each ascii comma character21 0x2c separates two fields.
- If the second command-line argument does not appear alone in a field on the first line of the file, an error message stating so must be emitted on a standard output stream, and the program execution must halt, and a new file must not be created.
- If the second command-line argument appears alone in a field on the first line of the file, the index of this field must be interpreted as the target of a replacement operation.
- The fourth command-line argument is to be interpreted as a file to which output must be written. If this file exists, it must be cleared before new output is written.
- The first line of the input file must be written verbatim to the output file.
- All lines except the first should be written to the output file with one modification: the field indicated as the target of the replacement operation must have contents given as the second command-line argument. All other fields must retain their contents from the input file.