Team. An intuition coupled with a few solid ideas has been floating around in the mind of this weblog's author since his days working in computer telephony integration in the late 1990s. He has found a few moments this evening, and he would like to share them. Having a life-long dream of working as a college professor and mentoring students or as an industrial researcher, he would often imagine potential research projects for himself and a future staff of graduate students. The PhD never arrived. As such, he never led a team of graduate students in researching concerns, algorithms, and theory, his favorite topics in computing.
Circa the mid-1990s, leaning upon his experiences as a high-school NIH scientific apprentice at a local cancer biology research center and as an undergraduate computing student, he dreamt of ways in which he might compress information. He also dreamt of ways in which he might model life and its processes as a computation, seeing that he had watched some molecular modeling done on a Silicon Graphics workstation. This experience had simply left him in awe.
Seeing that he had realized that prying eyes had scanned his math and science notebooks while a high-school student, he knew the nature of the "normal" student. So, he constructed a diagram with a pair of interwoven ideas.
On the top was a number line with numerous hops on it connected by a line. This is something one might have seen in front of a second grade class as students learned basic addition and subtraction. He thought, "What if a person could uniquely describe a much larger number as a much smaller number?" Are not files simply very large numbers? Below this diagram, he wrote "DNA Storage", being that DNA was an acronym for "Distinct Number Algorithm". Simply put, for every "large" number, a much "smaller" and "distinct" number might be mapped to it. This would be a very effective means of compression. Also, as an unintended "side-effect", it might be an effective means of ciphering, seeing that if one does not know exactly how the "smaller" value is unfurled, he could produce any number larger than it which represents a file. This was the first idea.
The second was that deoxyribonucleic acid (DNA) is composed of the bases G, C, A, and T. These can be mapped to 0, 1, 2, and 3. This means that one might map a "natural" DNA sequence to a base 4 number, and the transformations which occur in cellular division might be seen as the application of a composition of mathematical functions. Being that irregular cellular division is the foundation for cancerous growths, one "might" develop an approach for designing effective treatments for cancers by creating medications that correct the incorrect translation of a DNA number between its natural starting and ending values. This was noted on the diagram with the phrase "Base 4 GCAT".
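The remark that a file is simply a very large number can be made concrete. What follows is a minimal sketch in Python, assuming a big-endian byte order (an arbitrary choice for illustration); the function names are hypothetical and do not come from the original diagram.

```python
def file_bytes_to_number(data: bytes) -> int:
    """Read a byte string as one big-endian unsigned integer."""
    return int.from_bytes(data, "big")


def number_to_file_bytes(number: int, length: int) -> bytes:
    """Turn the integer back into `length` bytes.

    The length is passed explicitly so leading zero bytes are preserved.
    """
    return number.to_bytes(length, "big")


data = b"Hi"                                  # a two-byte "file"
number = file_bytes_to_number(data)           # 0x4869 == 18537
restored = number_to_file_bytes(number, len(data))
```

Every file, however long, maps to exactly one such integer once its length is known, which is the sense in which the number line on the diagram can stand in for a file.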
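The "Base 4 GCAT" mapping can be sketched in a few lines of Python. The digit assignment (G=0, C=1, A=2, T=3) is the one given above; the function names are hypothetical, and the sequence length is carried separately because leading G bases are leading zeros.

```python
BASES = "GCAT"  # index in this string is the base 4 digit: G=0, C=1, A=2, T=3


def dna_to_number(sequence: str) -> int:
    """Interpret a DNA sequence as a base 4 number, most significant base first."""
    value = 0
    for base in sequence.upper():
        value = value * 4 + BASES.index(base)
    return value


def number_to_dna(value: int, length: int) -> str:
    """Write a number back out as a DNA sequence of the given length."""
    letters = []
    for _ in range(length):
        value, digit = divmod(value, 4)
        letters.append(BASES[digit])
    return "".join(reversed(letters))


gcat = dna_to_number("GCAT")   # 0*64 + 1*16 + 2*4 + 3 == 27
```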
Finally, seeing that the author did not have the depth of knowledge in the natural sciences that he might "flesh out" this second idea, his primary focus was the first idea.
As the modern research area in "DNA storage" gained steam, misty watercolored memories of a simple dream and intuition for compressing data with a greedy "whittling" algorithm crossed his mind. He was rather disheartened, thinking that "academic dishonesty" had produced multi-million dollar federal and corporate research projects in quest of something which an undergraduate computing student could resolve in an independent study. However, having been an adjunct at a representative sample of universities and colleges in America, he was not surprised by this type of dishonesty. But it bothered him, seeing that children are still malnourished in this world of plenty.
However, here is the crux of the "simple" idea, an idea which might ultimately result in someone running an OS + Virtual Machine emulator in a space smaller than the storage on a 3.5 inch floppy.
One thing that we often forget as we progress in mathematics, since we deal with numeric symbols such as 3, 5, 7, 9 and 12, is that we are ultimately dealing with implements such as blocks, small stones, or grains of sand. These implements might represent the size of a piece of wooden stock.
So, our file, which is a large number, can be represented by a large piece of stock that we can whittle away at until we have a size of wood that is ideal for storage. We must only keep track of the sizes of the whittles. Wow, all of those whittle sizes must represent a whole bunch of numbers. It seems that one would have more numbers to keep track of than the size of the original number, which was the first file.
Well, computing has long had "pseudo-random" number generators which are compact functions that represent a potentially "infinite" stream of numbers. The nice feature of these generators is that the sequence which they generate is based upon the first input value, the seed.
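The property being leaned upon here is that the same seed always reproduces the same sequence. A small sketch using Python's standard `random` module shows it; the function name and the seed value are arbitrary choices for illustration.

```python
import random


def whittle_stream(seed: int, count: int) -> list:
    """Return the first `count` 32-bit values produced from the given seed."""
    rng = random.Random(seed)          # a private generator, fully determined by the seed
    return [rng.getrandbits(32) for _ in range(count)]


first = whittle_stream(1995, 5)
second = whittle_stream(1995, 5)       # same seed, so the same five values again
```

Only the seed needs to be remembered in order to regenerate the whole stream later.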
Would not an "old" man whittling a piece of wood first remove large random pieces, then smaller random pieces, and then smaller pieces still? Finally, when he was near the desired size of stock, would he not remove a last stream of "micro"-whittles?
This would compress a large number producing an arbitrarily smaller number.
But how do we uncompress the small stock? Simply start gluing on the large random whittles, then the medium whittles, and so on, by adding the random values to the small "distinct" number that was the remaining remnant.
Compressing a kilobyte at a time, and recursively compressing this first generation of remnants through second, third, fourth, and later generations, one might conceivably place a zettabyte in a kilobyte or less.
Well, that is great, but when would one stop adding so that he knew he had produced the appropriate larger number? It would be wise if we augment our original file with a small marker as a trailer that would not likely be produced by the "decompression" process unless we had arrived at the "true" original value.
Well, that is all fine and good, but how might this produce an emulator powerful enough for hosting Win 95 and JAVA 1.0 on a 3.5 inch floppy? A Von Neumann architecture partitions the computational power from the memory. The memory space itself, main memory and secondary storage, is a file and, hence, a number. So, once compressed, any state change might be represented by augmenting the first remnant representing a "base" with another small distinct number representing an "offset". An audit trail of offsets will describe the "current" state of the emulator. Such an emulator might also contain an algorithm for resolving the base and offset history, producing a single new remnant number for the latest state of the emulator. This might be a wonderful project for a doctoral student.
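The whittling and gluing steps can be sketched as follows. This is one possible reading, not a definitive implementation: the shrinking schedule (each whittle drawn from a range one bit narrower than the last), the rule of stopping at the first whittle that does not fit, and all names are assumptions made for illustration. In this sketch the remnant is kept together with the seed, the starting width, and the count of whittles removed.

```python
import random
from itertools import islice


def whittles(seed, start_bits):
    """Yield pseudo-random whittles, drawn from a range one bit narrower each step."""
    rng = random.Random(seed)
    for width in range(start_bits, 0, -1):
        yield rng.getrandbits(width)


def compress(number, seed, start_bits):
    """Subtract whittles while they fit; return (remnant, number of whittles removed)."""
    remnant, count = number, 0
    for w in whittles(seed, start_bits):
        if w > remnant:        # assumption: stop at the first whittle that is too big
            break
        remnant -= w
        count += 1
    return remnant, count


def decompress(remnant, count, seed, start_bits):
    """Glue the same first `count` whittles back onto the remnant."""
    number = remnant
    for w in islice(whittles(seed, start_bits), count):
        number += w
    return number


number = int.from_bytes(b"whittle me", "big")
start_bits = number.bit_length() - 1     # the first whittle is then guaranteed to fit
remnant, count = compress(number, seed=7, start_bits=start_bits)
restored = decompress(remnant, count, seed=7, start_bits=start_bits)
```

The sketch only demonstrates that gluing reverses whittling when both sides share the seed; how small the remnant ends up depends on the seed and on the whittle schedule chosen.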
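The trailer idea can be sketched by appending a fixed marker to the number before whittling and then gluing whittles back on until the marker reappears. The 32-bit marker value, the shrinking whittle schedule, and all names below are hypothetical choices for illustration; a false match on an intermediate value is unlikely with a marker of this width but not impossible.

```python
import random

MARKER_BITS = 32
MARKER = 0xCAB005E5            # hypothetical trailer value, chosen only for this sketch


def whittles(seed, start_bits):
    """Yield pseudo-random whittles, drawn from a range one bit narrower each step."""
    rng = random.Random(seed)
    for width in range(start_bits, 0, -1):
        yield rng.getrandbits(width)


def tag(number):
    """Append the marker as the low-order bits of the number."""
    return (number << MARKER_BITS) | MARKER


def compress(number, seed, start_bits):
    """Tag the number, then subtract whittles until one no longer fits."""
    remnant = tag(number)
    for w in whittles(seed, start_bits):
        if w > remnant:
            break
        remnant -= w
    return remnant


def decompress(remnant, seed, start_bits):
    """Add whittles until the low bits equal the marker; None if it never appears."""
    candidate = remnant
    stream = whittles(seed, start_bits)
    while True:
        if candidate & ((1 << MARKER_BITS) - 1) == MARKER:
            return candidate >> MARKER_BITS      # strip the trailer
        w = next(stream, None)
        if w is None:                            # ran out of whittles without a match
            return None
        candidate += w


number = int.from_bytes(b"stop here", "big")
start_bits = tag(number).bit_length() - 1
remnant = compress(number, seed=11, start_bits=start_bits)
restored = decompress(remnant, seed=11, start_bits=start_bits)
```

With the trailer in place, the count of whittles no longer has to be stored; the seed and the starting width are still shared by both sides in this sketch.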
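The base-and-offset bookkeeping might look something like the sketch below. It is a simplification under stated assumptions: it works on plain state numbers rather than on compressed remnants, it treats each offset as a signed difference between consecutive states, and the class and method names are hypothetical.

```python
class StateTrail:
    """Hypothetical audit trail: one base number plus a list of signed offsets."""

    def __init__(self, base):
        self.base = base
        self.offsets = []

    def current(self):
        """The current state is the base plus every recorded offset."""
        return self.base + sum(self.offsets)

    def record(self, new_state):
        """Store only the difference between the new state and the current one."""
        self.offsets.append(new_state - self.current())

    def resolve(self):
        """Fold the offset history into a single new base and clear the trail."""
        self.base = self.current()
        self.offsets = []
        return self.base


# Memory images read as numbers: 0x00000010, then 0x00000110, then 0x0000010F.
trail = StateTrail(int.from_bytes(b"\x00\x00\x00\x10", "big"))
trail.record(int.from_bytes(b"\x00\x00\x01\x10", "big"))
trail.record(int.from_bytes(b"\x00\x00\x01\x0f", "big"))
offsets_before = list(trail.offsets)     # [256, -1]
latest = trail.resolve()                 # 271, with an empty offset list afterwards
```

Small state changes produce small offsets here, which is the property the audit trail relies on.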
Such a system was this author's dream and a project that he was planning on proposing if he ever had an opportunity at a research position at Sun Microsystems in the 1990s where he had friends working in the executive management.
Seeing the parallel with file kilobytes, one might possibly place a grid of processors in such an emulator. Such processing might be rather "slow" at first, but with ample, ardent, and assiduous arbeit one might accelerate the process.
If any student is interested in working on this project, knock yourself out.
The CABOOSE Team. Hunt, Peck. Think, and From a Former Teacher Please Do Not Cheat. It Could Be Costly.