Project Apollo from a Software Development Perspective
A year ago, on July 20, 2019, it was 50 years since the first Moon landing in history. Neil Armstrong and Edwin Aldrin stayed on the Moon for 21.5 hours and spent 2.5 of them outside on the surface. The Apollo program and the Moon landing are often cited among the greatest achievements in human history. I was curious to look at this event from the standpoint of software development and testing. I tried a bit of archaeology, looking for materials on how the software for the lunar vehicle was developed, to understand what the work looked like and whether it differed from modern software development.
I found interesting details about the development of the lunar module software in the book “Computers in Spaceflight: The NASA Experience”. Here are the quotes that interested me most:
One can program a computer on several levels. Machine code, the actual binary language of the computer itself, is one method of specifying instructions. However, it is tedious to write and prone to error. Assembly language, which uses mnemonics for instructions (e.g., ADD in place of a 3-bit operation code) and, depending on its sophistication, handles addressing, is at a higher level. Most programmers in the early 1960s were quite familiar with assembly languages, but such programs suffered from the need to put too much responsibility in the hands of the programmer. For Apollo, MIT developed a special higher order language that translated programs into a series of subroutine linkages, which were interpreted at execution time. This was slower than a comparable assembly language program, but the language required less storage to do the same job. The average instruction required two machine cycles, about 24 milliseconds, to execute.
The number of instructions and the names of the programs:
The interpreter got a starting location in memory, retrieved the data in that location, and interpreted the data as though it were an instruction. Instead of having only the 11 instructions available in assembler, up to 128 pseudoinstructions were defined. The larger number of instructions in the interpreter meant that equations did not have to be broken down excessively. This increased the speed and accuracy of the coding.
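The fetch-and-interpret loop described in the quote can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the pseudoinstruction names (`LOAD`, `ADD`, `MUL`, `STORE`, `SQRT`, `EXIT`), the single accumulator, and the named variables are hypothetical and are not the actual AGC interpreter instruction set.

```python
import math

# Hypothetical pseudoinstruction handlers (not real AGC opcodes).
# Each takes the accumulator, the variable store, and an operand name,
# and returns the new accumulator value.
def _store(acc, variables, name):
    variables[name] = acc
    return acc

HANDLERS = {
    "LOAD": lambda acc, variables, name: variables[name],
    "ADD": lambda acc, variables, name: acc + variables[name],
    "MUL": lambda acc, variables, name: acc * variables[name],
    "STORE": _store,
    "SQRT": lambda acc, variables, name: math.sqrt(acc),
}

def interpret(memory, variables, start=0):
    """Start at a location, fetch the word stored there, and treat it
    as a pseudoinstruction by dispatching to the matching subroutine."""
    acc = 0.0
    location = start
    while True:
        opcode, operand = memory[location]
        location += 1
        if opcode == "EXIT":
            return acc
        acc = HANDLERS[opcode](acc, variables, operand)

# sqrt(a*a + b*b) written as one short run of pseudoinstructions.
PROGRAM = [
    ("LOAD", "a"), ("MUL", "a"), ("STORE", "t"),
    ("LOAD", "b"), ("MUL", "b"), ("ADD", "t"),
    ("SQRT", None), ("EXIT", None),
]
```

Calling `interpret(PROGRAM, {"a": 3.0, "b": 4.0})` returns `5.0`. The point of the sketch is the trade-off the book describes: each pseudoinstruction costs an extra dispatch at run time, but an equation takes only a few compact words of storage.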
The MIT staff gave the resulting computer programs a variety of imaginative names. Many, such as SUNDISK, SUNBURST, and SUNDIAL, related to the sun because Apollo was the god of the sun in the classical period. But the two major lunar flight programs were called COLOSSUS and LUMINARY. The former was chosen because it began with “C” like the CM, and the latter because it began with “L” like the LEM. Correspondence between NASA and MIT often shortened these program names and appended numbers. For example, SOLRUM55 was the 55th revision of SOLARIUM for the AS501 and 502 missions. BURST116 was the 116th revision of SUNBURST. Although these programs had many similarities, COLOSSUS and LUMINARY were the only ones capable of navigating a flight to the moon. On August 9, 1968, planners decided to put the first released version of COLOSSUS on Apollo 8, which made the first circumlunar flight possible on that mission.
Don Eyles writes in TALES FROM THE LUNAR MODULE GUIDANCE COMPUTER:
Control passed to BURNBABY — the master ignition routine that we wrote after LM-1 to save memory by exploiting the similarities among the powered flight phases in the period leading up to ignition. Verb 06 Noun 62 appeared on the DSKY. The middle register contained a time in minutes and seconds that began to count down toward light-up. At 35 seconds the display went blank, and at 30 seconds reappeared. This was a signal that Average-G had started. At seven and a half seconds, the ullage burn began. At five seconds, the display flashed to request a “go” from the crew. Buzz Aldrin, the LM Pilot, standing on the right side of the cockpit, had the main responsibility for working the DSKY. Now he keyed PROCEED.
Even though common sense indicates that it is advantageous to complete something as complex and important as software long before a mission so that it can be used in simulators and tested in various other ways, software is rarely either on time or perfect. Fortunately for the Apollo program, the nature of core rope put a substantial amount of pressure on MIT’s programmers to do it right the first time. Unfortunately, the concept of “bug”-free software was alien to most programmers of that era. Programming was a fully iterative process of removing errors. Even so, many “bugs” would carry over into a delivered product due to unsophisticated testing techniques. Errors found before a particular system of rope was complete could be fixed at the factory, but most others had to be endured.
The number of people who took part in the project:
There are widely disparate estimates of how many people actually contributed to the shuttle software. Macina of IBM says 275, but I think he means coders. John Aaron of NASA, head of Spacecraft Software in 1983, estimates 900 contractors and 90 civil servants. Parten said 2,000 but that may include everyone in all contracting organizations working on hardware and software. The figure of 1,000 seems reasonable for software developers, as it is consistent with similar projects.
The Draper tutorial included the concept of highly modular software, software that could be “plugged into” the main circuits of the Shuttle. This concept, an application of the idea of interchangeable parts to software, is used in many software systems today, one example being the UNIX operating system developed at Bell Laboratories in the 1970s, under which single function software tools can be combined to perform a large variety of functions.
Fifty years have passed, and people are still the same:
The impression one gains from documents and interviews is that both Rockwell and IBM fell victim to the “not invented here” syndrome: If we didn’t do it, it wasn’t done right. For example, Rockwell delivered the ascent requirements, and IBM coded them to the letter, thereby exceeding the available memory by two and a third times and demonstrating that the requirements for ascent were excessive. Rockwell, in return, argued for 2 years about the nature of the operating system, calling for a strict time-sliced system, which allocates predefined periods of time for the execution of each task and then suspends tasks unfinished in that time period and moves on to the next one. The system thus cycles through all scheduled tasks in a fixed period of time, working on each in turn. Rockwell’s original proposal was for a 40-millisecond cycle with synchronization points at the end of each. IBM, at NASA’s urging, countered with a priority-interrupt-driven system similar to the one on Apollo. Rockwell, experienced with time-slice systems, fought this from 1973 to 1975, convinced it would never work.
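The difference between the two scheduling approaches in this dispute can be shown with a toy simulation. This is only a sketch under stated assumptions: the task names, priorities, arrival times, workloads, and the 20 ms slice are invented for illustration, time advances in 1 ms ticks, and neither function reflects the actual Shuttle or Apollo operating systems.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    priority: int    # larger number = more urgent (hypothetical scale)
    arrival_ms: int  # time at which the task becomes ready
    work_ms: int     # total processor time the task needs

def time_sliced(tasks, slice_ms):
    """Strict time slicing: each task owns a fixed slot in a repeating
    cycle; an unfinished task is suspended when its slot ends."""
    remaining = {t.name: t.work_ms for t in tasks}
    done = {}
    now = 0
    while len(done) < len(tasks):
        owner = tasks[(now // slice_ms) % len(tasks)]
        if owner.arrival_ms <= now and remaining[owner.name] > 0:
            remaining[owner.name] -= 1
            if remaining[owner.name] == 0:
                done[owner.name] = now + 1
        now += 1
    return done  # task name -> completion time in ms

def priority_driven(tasks):
    """Priority-driven: at every tick the most urgent ready task runs,
    preempting whatever was running before."""
    remaining = {t.name: t.work_ms for t in tasks}
    done = {}
    now = 0
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t.arrival_ms <= now and remaining[t.name] > 0]
        if ready:
            current = max(ready, key=lambda t: t.priority)
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                done[current.name] = now + 1
        now += 1
    return done

TASKS = [
    Task("display", priority=1, arrival_ms=0, work_ms=30),
    Task("guidance", priority=3, arrival_ms=5, work_ms=10),
]
```

With these made-up numbers, `time_sliced(TASKS, 20)` finishes `guidance` at 30 ms and `display` at 50 ms, while `priority_driven(TASKS)` finishes them at 15 ms and 40 ms. The time-sliced version is easier to predict because every slot is fixed in advance, but the urgent task has to wait for its slot and idle slot time is wasted.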
Designers developed the programs using a Honeywell 1800 computer and later an IBM 360, but never with the actual flight hardware.
Post-production hardware tests included vibration, shock, acceleration, temperature, vacuum, humidity, salt fog, and electronic noise.
The size of the verification group, its budget, and its mindset:
IBM established a separate line organization for the verification of the Shuttle software. IBM’s overall Shuttle manager has two managers reporting to him, one for design and development, and one for verification and field operations. The verification group has just less than half the members of the development group and uses 35% of the software budget. There are no managerial or personnel ties to the development group, so the test team can adopt an “adversary relationship” with the development team. The verifiers simply assume that the software is untested when received. In addition, the test team can also attempt to prove that the requirements documents are wrong in cases where the software becomes unworkable. This enables them to act as the “conscience” of the entire project.
Suggestions for changes to improve the system are unusually welcome. Anyone, astronaut, flight trainer, IBM programmer, or NASA manager, can submit a change request. NASA and IBM were processing such requests at the rate of 20 per week in 1981.
On June 13, Tindall reported that the AS-204 program undergoing integrated tests had bugs in every module. Some had not been unit tested prior to being integrated. This was a serious breach of software engineering practice. If individual modules are unit tested and proven bug-free, then bugs found in integrated tests are most likely located in the interfaces or calling modules. If unit testing has not been done then bugs could be anywhere in the program load, and it is very difficult to identify the location properly. This vastly increases the time and, thus, the cost of debugging. It causes a much greater slip in schedule than time spent on unit tests. Even worse, Tindall said that the test results would not be formally documented to NASA but that they would be on file if needed.
Problems during testing:
A software failure causing restarts occurred during the Apollo 11 lunar landing. The software was designed to give counter increment requests priority over instructions. This meant that if some item of hardware needed to increment the count in a memory register, its request to do so would cause the operating system to interrupt current jobs, process the request, and then pick up the suspended routines. It had been projected that if 85,000 increments arrived in a second, the effect would be to completely stop all other work in the system. Even a smaller number of requests would slow the software down to the point at which a restart might occur. During the descent of Apollo 11 to the moon, the rendezvous radar made so many increment requests that about 15% of the computer system’s resources were tied up in responding. The time spent handling the interrupts meant that the interrupted jobs did not have enough computer time to complete before they were scheduled to begin again. This situation caused restarts to occur, three of which happened in a 40-second period while program P64 of LUMINARY ran during descent. The restarts caused a series of warnings to be displayed both in the spacecraft and in Mission Control.
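The two figures in the quote (85,000 increments per second to stop all other work, and about 15% of resources tied up during the descent) allow a rough back-of-the-envelope check. The sketch below assumes that every increment costs the same fixed amount of processor time and that the load grows linearly with the request rate. That linear model and the function names are my assumptions, not something the book states.

```python
# Figure quoted above: this many increments per second would
# completely stop all other work in the system.
SATURATION_RATE_PER_S = 85_000

def cost_per_increment_us() -> float:
    """Processor time implied for one increment, in microseconds,
    assuming saturation means 100% of one second is consumed."""
    return 1_000_000 / SATURATION_RATE_PER_S

def load_fraction(rate_per_s: float) -> float:
    """Fraction of processor time consumed at a given request rate
    (linear model, an assumption)."""
    return rate_per_s / SATURATION_RATE_PER_S

def rate_for_load(fraction: float) -> float:
    """Request rate that would produce the given load fraction."""
    return fraction * SATURATION_RATE_PER_S
```

Under these assumptions one increment costs roughly 11.8 microseconds, and a 15% load corresponds to about 12,750 increment requests per second from the rendezvous radar. This is an estimate derived from the quoted numbers, not a measured value.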
What else I managed to find:
- Reliability and Quality Assurance describes the component, acceptance, and nondestructive tests of the lunar module.
- Certification Test Program
- Electronic Systems Test Program Accomplishments and Results
- Apollo Guidance Computer And Associated Ground Support Equipment
- Apollo Guidance Software: Development and Verification Plan
- Interesting photographs of lunar module tests: https://crgis.ndc.nasa.gov/historic/1297
- An online Apollo Guidance Computer simulator that can be run by following the instructions in the checklist.
- Org chart of the team that worked on the lunar module
- The people responsible for developing the Apollo Guidance Computer software modules