«Abstract. One major goal of Mathematical Knowledge Management is building extensive repositories, in which the mathematical knowledge has been ...»
Revisions as an Essential Tool
to Maintain Mathematical Repositories
Adam Grabowski1 and Christoph Schwarzweller2
Institute of Mathematics, University of Bialystok
ul. Akademicka 2, 15-267 Bialystok, Poland
Department of Computer Science, University of Gda´skn
ul. Wita Stwosza 57, 80-952 Gda´sk, Poland
Abstract. One major goal of Mathematical Knowledge Management
is building extensive repositories, in which the mathematical knowledge has been veriﬁed. It appears, however, that maintaining such a repository is as hard as building it – especially for an open collection with a large number of contributors. In this paper we argue that even careful reviewing of contributions cannot cope with the task of keeping a mathematical repository eﬃcient and clearly arranged in the long term. We discuss reasons for revisions of mathematical repositories accomplished by the “core implementors” and illustrate our experiences with revisions of MML, the Mizar Mathematical Library.
1 Introduction Mathematical knowledge management aims at providing both tools and infrastructure supporting the organization, development, and teaching of mathematics using computers. Large repositories of mathematical knowledge are here of major concern since they provide users with a base of veriﬁed mathematical knowledge. We emphasize the fact that a repository should contain veriﬁed knowledge only: we believe that (machine-checkable) proofs necessarily belong to each theorem and therefore are an essential part of a repository. For repositories in the sense of Mathematical Knowledge Management community this implies even more: proofs should not only be understandable for the machine, but also – for human users of the repository.
From this follows that mathematical repositories are more than collections of theorems and proofs accomplished by a prover or proof checker. The overall goal is not proving a theorem – though this still is an important and challenging part – but presenting deﬁnitions and theorems so that the “natural” mathematical buildup remains visible. Theories and their interconnections must be available, so that further development of the repository can be based upon these. Being not trivial anyway, this becomes even harder to assure for an open repository with a large number of authors.
So, how to tackle this task? One possibility, of course, is reviewing submissions. Reviewing improves the quality of knowledge and proofs added to the repository, but we shall illustrate that in the long run reviewing cannot ensure that a mathematical repository meets the demands mentioned above. We therefore claim that revisions are an essential part of maintaining mathematical repositories: in order to keep it clean and attractive for users, from time to time a “core team” has to check and improve the organization, quality, and proofs of a mathematical repository.
In the following section we describe and discuss the goals and beneﬁts of revisions compared to a straightforward reviewing process. Then, after a brief introduction to the Mizar system , we consider the reviewing process for submissions to MML in Section 3. We describe reviewing criteria and show which insuﬃciencies can be handled by reviewing. In contrast, Section 4, is devoted to revisions of MML illustrating on the one hand what kind of improvements reviewing cannot perform, on the other hand the role of revisions in maintaining MML. This is the process done mainly by human hand as of now, in the next section we discuss some issues concerned with this activity and describe some traps the developer could meet when enhancing the library. Then we conclude showing some related work and drawing some remarks for future.
2 The Need for Revisions The goal of a revision is to improve the mathematical repository. In contrast to reviewing submissions, however, here the attention is turned to the repository as a whole, not to a single, new part of it. Consequently, motives for revisions
can be for example:
– keeping the repository as small as possible, – preserving a clear organization of the repository in order to attract authors, – establishing “elegant” mathematics, that is e.g. using short deﬁnitions (without unnecessary properties) or better proofs.
Note that all these points characterize a qualitative repository and can hardly be achieved by reviewing single submissions. Of course there are diﬀerent possibilities to achieve the points mentioned. Improving the prover e.g. can shorten proofs and hence – simplify the repository. (Re-)organizing a mathematical repository probably demands manipulating the whole ﬁle structure, not only the ﬁles themselves. Therefore we decided to classify revisions based on their occasion, that is on which kind of insuﬃciency we want to address. Based on our experiences with the Mizar Mathematical Library we distinguish four major
occasions for revisions:
1. improving authors’ contributions;
2. improving the underlying prover or proof checker;
3. reorganizing the repository;
4. changing representation of knowledge.
Improving an author’s contribution is the classical task of reviewing and is of course to be recommended for mathematical repositories too: nomenclature can be polished up to ﬁt to the yet existing one, deﬁnitions can be improved, that is e.g. generalized if appropriate. Proofs are also a matter of interest here, especially keeping them as short as possible, yet still understandable is of major concern. In a large, open repository however, authors sometimes may prove and submit theorems or lemmas not being aware that those are already part of the repository. Similarly, special versions of already included theorems can happen to be “resubmitted”. It is doubtful, that this kind of ﬂaws will be detected by ordinary reviewing.
Strenghtening the underlying prover or proof checker has also an impact on the repository. Proofs can be shortened or rewritten in a more clear fashion, both being fundamental properties of attractive mathematical repositories. Even more, theorems in such collection may now be superﬂuous, because the improved prover accepts and applies them automatically. A typical example here is the additional inclusion of decision procedures.
Reorganizing the repository deals with the fact that a repository is built up by a large number of contributors. For their development authors (should) use already existing theories as a basis. To establish their main results, however, they often have to prove additional theorems or lemmas just because the theory used does not provide them yet. So, these additional facts have to be put in the right place of the repository. Otherwise, it will be hard for other authors to detect them or at least searching the repository becomes less comfortable. In the same direction goes the building of monographs: a frequently used theory should be handled with extra care. Not only should all related theorems be collected in a distinguished place, but also still lacking theorems be complemented, in order to ease working for further authors. These tasks can only be accomplished when considering the repository as whole, that is by revisions.
The last point concerns the development of a repository in the long term.
What if after while it turns out that another deﬁnition or representation of mathematical objects would serve our purposes better than the one chosen? Should it be changed? Note that a lot of authors already could have used these objects in their proofs, that is changing the deﬁnition or representation would imply changing all these proofs – and of course one cannot force authors to redo all their proofs. On the other hand, including both deﬁnitions or representations leads to an unbalance: the theory of the new prefered version is much less developed than the one of the old version, so authors hardly will base their developments on the new one. Again, the solution is a revision: In the best case deﬁnitions and representations are changed by a “core team”, so that ordinary users can furthermore use all theorems without even noticing they changed.
In the following sections we will illustrate these considerations by examples taken from the Mizar Mathematical Library and in particular show how revisions maintain mathematical repositories.
3 Review of MML Submissions Reviews of submissions to the MML – as reviews of ordinary submissions for conferences or journals – have the overall goal to check whether a submission should be accepted (for inclusion into the MML) and simultaneously improve the quality of a submission. For mathematical repositories, however, the criteria for acceptance and improvements are somewhat diﬀerent.
Certainly the contents of a submission for a repository should likewise be original and interesting. Original here, of course, means that deﬁnitions and theorems presented are not part of the repository, yet. This is easy to check for the main theorem of a submission. For technical lemmas used to establish this main result, however, this task is much more diﬃcult. So, for example, a reviewer will probably neither know nor be willing to check whether a theorem like theorem for F,G being FinSequence, k being Nat st G = F|(Seg k) & len F = k + 1 holds F = G ^ *F/.(k+1)*;
is already included in a repository. Even if the textual search via grepping is no longer the only method to ﬁnd such repetitions since the MML Query by Grzegorz Bancerek  is available, even after the volunteer will learn how to use this system, still there is no single automated bunch of tools which removes all repeated theorems eﬀectively. Furthermore, the motivation to check these things in detail will be even decreased, because such a point will not decide between acceptance and rejection.
The question whether a submission is interesting should be handled more liberally. Of course, the usual issues, that is the quality of the main results, apply here also. There is, however, another kind of submissions to repositories: the one that deals with the further development of (basic) theories. This concerns collections of basically simple theorems providing necessary foundations so that more ambitous developments can be easier accomplished. Usually, these are theorems, that easily follow from the deﬁnitions, however are so often used, that repeating the proof over and over again is hardly acceptable. Examples here are the theories of complex numbers or polynomials, where among other things we can ﬁnd the following theorems.
theorem for a,b,c,d being complex number st a + b = c - d holds a + d = c - b;
theorem for n being Ordinal, L being add-associative right_complementable add-left-cancelable right_zeroed left-distributive (non empty doubleLoopStr), p being Series of n,L holds 0_(n,L) *’ p = 0_(n,L);
Though hardly interesting from a mathematical point of view, such theorems are important for the development of a repository and should therefore be considered as interesting, too.
Improvements of a submission are a more diﬃcult issue. Firstly, we can consider deﬁnitions and notations contained in the submission. Can they be arranged more sparsely, that is can the results be established based on fewer axioms? Is it possible or reasonable (in the actual repository) to generalize the deﬁnitions?
This also applies to theorems. Note that the theorem from above, though applicable to polynomials, in fact is stated for power series. Again to address these issues a high knowledge of the repository by the reviewers is necessary.
Secondly, when it comes to proofs, there are hardly any guidelines, because proving in particular is a matter of style. We can hardly force an author to change his (ﬁnished) proof into another one using completely diﬀerent proof techniques.
What we can do, is to suggest improvements for the presented proof. We can, for example, propose a more accurate use of the proof language to get more elegant or better readable proofs. Or we can give pointers to other theorems in the repository that allow to shorten the proof.
Based on these considerations a reviewing process for Mizar articles, that is for submissions to the Mizar Mathematical Library, has been introduced. Using basically the commonly used scheme accept/revise/reject (and apart from its descriptive grade) the rating of a submission can be3 A. accept requires editorial changes only, which can be done by the editors B. accept requires changes by the author to be approved by the editors C. revise substantial author’s revisions necessary, resubmission for another review D. decision delayed revision of MML necessary E. reject no hope of getting anything valuable The most important issue here, of course, is the question whether an article should be included in MML. Note, that there are two grades (A and B) for acceptance. The reason is that accepted articles should be included in the MML as soon as possible to avoid duplication of results during the reviewing phase.4 So, while submissions rated B or C need feedback from the authors, submissions rated A can be added to MML without further delays.
The most interesting point is D. Note that here already the problem of a revision of the whole repository is addressed. Reviewing can point out that – though the author has proven his main results – the way Mizar and MML support establishing the presented results is not optimal and should be improved.
As the most notable example here, we can cite the newly submitted deﬁnition of a kind of a norm for the elements of the real Euclidean plane, which are deﬁned in the Mizar library just as ﬁnite sequences of real numbers.
There has been and is still going on an email discussion about these options.
There even has been the proposition of making public submissions before reviewing to avoid this problem, but we are not aware of a deﬁnite decision concerning this point.
definition let n be Nat, f be Element of TOP-REAL n;
func |. f.| - Real means ex g being FinSequence of REAL st g = f & it = |. g.|;
where |. g.| is a usual Euclidean norm which was introduced in the MML before as definition let f be FinSequence of REAL;
func |. f.| - Real equals :: EUCLID:def 5 sqrt Sum sqr f;