What is it?
Technical One Page Overview of MetaNumber
A. Overview & Problem Domain: Numerical values are both incomplete and meaningless without their associated units, accuracy level, and defining metadata. Yet in essentially all published data, these terms are scattered among the titles and the row and column headings, and are often embedded in text, which frustrates direct computer reading [1,2,3]. This lack of an integrated standard means that all numerical data requires human preprocessing prior to computation, resulting in significant time delays, errors, and costs, especially with Big Data. It represents an almost intractable critical problem affecting every discipline.
B. Methodology & Solution: With a $100K USC ASPIRE grant, the author has led a team of 11 senior multi-disciplinary faculty and 10 students in the design, development, and programming (in a Python multiuser environment) of a new standard for all numerical information that tightly links each value with its units, accuracy, and full metadata in a “metanumber” object complying with twelve (12) critical requirements [4,5]. It supports advanced analytics for Big Data and provides a new framework to support artificial intelligence (see www.metanumber.com) [6]. Our software performs the following:
(1) Full automatic dimensional analysis among all units, by adjoining units as variable names mixed in any valid manner (e.g. 1.5*kg/m3) [7].
(2) When a decimal point is present in a value, its significant digits are captured and the value is converted to a “ufloat” class object [8], allowing the accuracy level to be tracked through all subsequent computation [9,10] (e.g. 4.63+/-0.02).
(3) Unlimited metadata information is linked to each archived metanumber via the unique associated table/row/column internet path (e.g. [e_gold_thermal conductivity]). That path provides a unique name for every number (both archived and computed) for retrieval.
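Items (1) and (2) above can be illustrated with a minimal Python sketch. It is not the metanumber implementation: the `Quantity` class, its field names, and the unit variables `kg`, `m`, and `m3` are hypothetical, and the uncertainty rule assumes independent errors with first-order propagation, whereas the Python uncertainties package [8] also tracks correlations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical value + standard uncertainty + base-unit exponents."""
    value: float
    sigma: float = 0.0
    dims: tuple = ()  # sorted (unit_name, exponent) pairs, e.g. (("kg", 1), ("m", -3))

    @staticmethod
    def _coerce(other):
        # Plain numbers are treated as exact, dimensionless quantities.
        return other if isinstance(other, Quantity) else Quantity(float(other))

    def _merge(self, other, sign):
        # Add (sign=+1) or subtract (sign=-1) unit exponents, dropping zeros.
        merged = dict(self.dims)
        for name, power in other.dims:
            merged[name] = merged.get(name, 0) + sign * power
        return tuple(sorted((n, p) for n, p in merged.items() if p != 0))

    def __mul__(self, other):
        other = self._coerce(other)
        # First-order propagation, assuming the two errors are independent.
        sigma = math.hypot(other.value * self.sigma, self.value * other.sigma)
        return Quantity(self.value * other.value, sigma, self._merge(other, +1))

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = self._coerce(other)
        value = self.value / other.value
        sigma = math.hypot(self.sigma / other.value, value * other.sigma / other.value)
        return Quantity(value, sigma, self._merge(other, -1))

    def __pow__(self, n):
        sigma = abs(n * self.value ** (n - 1)) * self.sigma
        dims = tuple((name, p * n) for name, p in self.dims if p * n != 0)
        return Quantity(self.value ** n, sigma, dims)

    def __add__(self, other):
        other = self._coerce(other)
        if self.dims != other.dims:
            # Dimensional analysis: adding unlike units is an error.
            raise TypeError(f"incompatible units: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value,
                        math.hypot(self.sigma, other.sigma), self.dims)


# Units are ordinary variable names that can be mixed into expressions.
kg = Quantity(1.0, 0.0, (("kg", 1),))
m = Quantity(1.0, 0.0, (("m", 1),))
m3 = m ** 3

density = 1.5 * kg / m3            # the expression used as an example in the text
mass = Quantity(4.63, 0.02) * kg   # 4.63+/-0.02, the uncertainty example in the text
volume = 2.0 * m3                  # treated as exact
rho = mass / volume                # uncertainty and units propagate together
```

In this sketch `mass + volume` raises a `TypeError`, which is the kind of unit error that automatic dimensional analysis is meant to catch before a computation proceeds.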
The archiving of all computations for each user then provides a complete historical record of the computational evolution of every value and supports research on path-dependent information loss.
C. Data to Metanumber Conversion Algorithm: Existing non-standard data must be converted to the standardized metanumber format. We have developed a prototype standardization program so that users can mine web sites with automatic conversion of digital numeric content into the metanumber standard. This algorithm is both exceedingly difficult and critical.
D. Algorithm for Metanumber Tables -> Networks -> Clusters: The author, as PI with $2.5M in funding from DARPA, previously developed a new mathematical foundation for networks [11] (utilizing his decomposition of the general linear Lie group into a Markov Lie Monoid, which he later proved to be isomorphic to all existing networks) [12]. Given any network, this analysis generates a unique Markov transformation whose eigenvalue-eigenvector analysis supports an agnostic model of cluster identification [13]. Most recently he proved that each standardized metanumber table (of entities vs properties) can be converted into two networks (one among entities and one among properties), supporting an automated cluster identification for each network derived from the table [14]. As cluster analysis is a foundational component of knowledge, our fully automated cluster identification shows great promise as an advanced AI tool for identifying first level structures (clusters) in all such numerical data [15,16,17,18].
E. Current Work: (1) Our current metanumber system needs refinements for security and speed, and especially a more advanced user interface so that users can more easily locate and manage standardized metanumber data. (2) Our prototype metanumber conversion program needs further development to run on more complex web sites and to better identify units and unique row and column indices embedded in published data.
(3) Our table-network-cluster algorithm needs an informative user interface to display the resulting cluster structures via an advanced dashboard.
F. Conclusions & Impact: This system could fundamentally revolutionize all data exchange and automated computation in business, industry, medicine, science, engineering, and all of the social sciences, aided by our (optional) extensions of the SI (metric) system [19,20] to include four new fundamental units (bit, person, dollar, and flop). It can provide a transformative leap supporting AI, as demonstrated by our one application with automated cluster analysis on entity-property data tables. A critical requirement of our system is that every numerical value (with its units, uncertainty, and exact defining tags, both in standardized tables and from past computations) must be instantly readable by BOTH humans and computers and retrievable with a unique (internet path) name. This structure can also support the automated creation of diverse new networks among units, users, metadata, and constants, and even new networks among the identified clusters, thus supporting totally new types of intelligent algorithms. This project is a total paradigm shift in every information domain and as such is too holistic & broad for most funding channels.
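The table -> networks -> clusters pipeline of section D can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published algorithm [14]: the z-score scaling, the Gaussian similarity kernel, the first-order matrix M = I + t*L used in place of the full exponential of the generator L, the two-way split by the sign of the second eigenvector, and all function names are assumptions introduced here. The table values are made-up toy numbers.

```python
import math


def zscore_columns(table):
    """Scale each property column to zero mean and unit (population) variance."""
    scaled = []
    for col in zip(*table):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        scaled.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled)]


def similarity_network(vectors):
    """Connection matrix C[i][j] = exp(-|v_i - v_j|^2) for i != j (assumed kernel)."""
    n = len(vectors)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
                c[i][j] = math.exp(-d2)
    return c


def markov_matrix(c):
    """M = I + t*L, where L has off-diagonals C and diagonals minus the column sums.

    Every column of L sums to zero, so every column of M sums to one; t is
    chosen so that all entries of M are non-negative.
    """
    n = len(c)
    col_sums = [sum(c[i][j] for i in range(n)) for j in range(n)]
    t = 1.0 / (2.0 * max(col_sums))
    m = [[t * c[i][j] for j in range(n)] for i in range(n)]
    for j in range(n):
        m[j][j] = 1.0 - t * col_sums[j]
    return m


def two_clusters(m, steps=200):
    """Split nodes by the sign of the second eigenvector of a symmetric M.

    Power iteration with the constant eigenvector (eigenvalue 1) projected
    out at every step.
    """
    n = len(m)
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(steps):
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]


# Toy entity-vs-property table: 4 entities (rows), 2 properties (columns).
table = [[1.0, 10.0], [1.1, 10.5], [5.0, 2.0], [5.2, 2.1]]
z = zscore_columns(table)
entity_net = similarity_network(z)                                   # network among entities
property_net = similarity_network([list(col) for col in zip(*z)])    # network among properties
labels = two_clusters(markov_matrix(entity_net))
```

For this toy table the first two entities and the last two entities receive different labels. A real table would need a treatment of units, missing values, and more than two clusters, none of which is attempted here.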
1. NIST Fundamental List of Physical Constants, http://physics.nist.gov/cuu/Constants/Table/allascii.txt
2. Bureau of Economic Analysis, NIPA Data Section 7, CSV format, http://www.bea.gov/national/nipaweb/DownSS2.asp
3. Statistical Abstract of the United States, Table 945, http://www.census.gov/compendia/statab/cats/energy_utilities/electricity.html
4. Johnson, Joseph E., 2009, “Dimensional Analysis, Primary Constants, Numerical Uncertainty and MetaData Software”, American Physical Society AAPT Meeting, USC Columbia SC. (Initial system design and description of the early prototype prior to ASPIRE funding.)
5. Johnson, Joseph E., 2014, A Numeric Data-Metadata Standard Joining Units, Numerical Uncertainty, and Full Metadata to Numerical Values, EOS KDIR Conference, Rome, Italy. (Overview of the ASPIRE grant results after eight months of the two year funding.)
6. www.metanumber.com routes one to the current operational system. The web site has preliminary documentation at multiple levels along with white papers and published works.
7. Johnson, Joseph E., 1985, US Registered Copyrights TXu 149-239, TXu 160-303, & TXu 180-520. These provide copyrights of the core aspects of the units conversion algorithm, which are essentially the same in most computer languages. This software represents the initial work on this topic.
8. Lebigot, Eric O., 2014, A Python Package for Calculations with Uncertainties, http://pythonhosted.org/uncertainties/. This contains all links for documentation and for software downloads for the Python Uncertainties package.
9. Johnson, Joseph E., Ponci, F., 2008, Bittor Approach to the Representation and Propagation of Uncertainty in Measurement, AMUEM 2008 International Workshop on Advanced Methods for Uncertainty Estimation in Measurement, Sardagna, Trento, Italy. This contains the presentation of a very formal mathematical methodology for managing numerical accuracy (uncertainty). We consider this method too complex for the initial launch of metanumber.
10. Johnson, Joseph E., 2006, Apparatus and Method for Handling Logical and Numerical Uncertainty Utilizing Novel Underlying Precepts, US Patent 6,996,552 B2. This is the US Patent on the core algorithm.
11. Johnson, Joseph E., 2005, Networks, Markov Lie Monoids, and Generalized Entropy, Computer Networks Security, Third International Workshop on Mathematical Methods, Models, and Architectures for Computer Network Security, St. Petersburg, Russia, Springer Proceedings, 129-135, ISBN 3-540-29113-X. This contains the primary publication of the proof that every network is isomorphic to the Lie monoids that generate all continuous Markov transformations.
12. Johnson, Joseph E., 1985, Markov-Type Lie Groups in GL(n,R), Journal of Mathematical Physics 26 (2) 252-257. This foundational work contains the formal decomposition of the General Linear Group in n dimensions over the real and complex numbers, thus tightly integrating all of discrete and continuous Markov theory with the theory of Lie groups and Lie algebras.
13. Johnson, Joseph E., 2012, Methods and Systems for Determining Entropy Metrics for Networks, US Patent 8271412. This patent contains the formal methodology for connecting networks to Markov monoids, along with the associated use of Rényi entropy spectral values to represent network topologies and to support network tracking and network comparisons by essentially solving the combinatorial network problems for a large class of networks.
14. Johnson, Joseph E., & Campbell, William, 2014, A Mathematical Foundation for Networks with Cluster Identification, KDIR Conference, Rome, Italy. The M.S. thesis of Mr. Campbell (USC Aug. 2014) with the same name explored and demonstrated these techniques. Our formulation generalizes past work based upon the Laplacian matrix; see http://www.math.ucsd.edu/~fan/research/cb/ch1.pdf and https://en.wikipedia.org/wiki/Laplacian_matrix
15. Newman, M. E. J., "The structure and function of complex networks" (PDF), Department of Physics, University of Michigan; and Newman, M. E. J., Networks: An Introduction, Oxford University Press, 2010.
16. Theory of Lie Groups, https://www.math.stonybrook.edu/~kirillov/mat552/liegroups.pdf. Formal introduction to the theory of Lie groups and algebras.
17. Bailey, Ken (1994). "Numerical Taxonomy and Cluster Analysis". Typologies and Taxonomies. p. 34. ISBN 9780803952591. https://en.wikipedia.org/wiki/cluster_analysis. Introduction to cluster analysis and classifications.
18. Estrada, Ernesto, The Structure of Complex Networks, Theory and Applications, ISBN 978-0-19-959175- (2011). An introduction to the theory of networks.
19. Newell, David B., A More Fundamental International System of Units, Physics Today 35 (July 2014), http://scitation.aip.org/content/aip/magazine/physicstoday/article/67/7/10.1063/PT.3.2448. A discussion of the new foundation of the SI system of basic units, which for the first time will be totally founded in 2016 upon fundamental constants without the need for reference standards such as the physical kilogram. The metanumber units algorithm is compliant with these standards when they are ratified.
20. The extension of the SI (metric) system of units to include four new “units” is at the option of the user. When used, however, it supports: (a) A unit for the fundamental Shannon “bit” (1/0 or T/F) of information. This is critically important in information theory and allows multiple other associated units of information storage (GB…) and transmission rates (e.g. baud = bit/s). (b) A fundamental unit for the processing of information, the “floating point operation” or “flop”, allowing metrics of the speed of computers and information processing. (c) The unit of the U.S. dollar, “dollar”, which not only gives a fundamental metric for the concept of “value” but also allows correct dimensional analysis for financial computations, potentially removing errors. Additionally, it supports tables of conversions for other currencies and for the deflation/inflation correction of the dollar and other currencies using published tables, with the associated units and metadata tags tracking value as far as possible. (d) The unit of a living human, “person”, which likewise substantially enriches the socioeconomic sciences, allowing for population densities, incomes, food consumption, and vast ecological metrics that can then be tracked more precisely.
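The effect of the four optional units on dimensional analysis can be shown with a small Python sketch. The names `BASE_UNITS`, `unit`, `multiply`, and `divide` are hypothetical and are not the metanumber API; the sketch only treats bit, person, dollar, and flop as additional base dimensions alongside the seven SI base units.

```python
# Seven SI base units plus the four optional extensions described in the text.
BASE_UNITS = ("m", "kg", "s", "A", "K", "mol", "cd",
              "bit", "person", "dollar", "flop")


def unit(**exponents):
    """Build a dimension map such as unit(bit=1, s=-1); rejects unknown base units."""
    unknown = set(exponents) - set(BASE_UNITS)
    if unknown:
        raise ValueError(f"unknown base units: {sorted(unknown)}")
    return {name: power for name, power in exponents.items() if power != 0}


def multiply(a, b):
    """Multiply two dimension maps by adding exponents; zero exponents are dropped."""
    merged = dict(a)
    for name, power in b.items():
        merged[name] = merged.get(name, 0) + power
    return {name: power for name, power in merged.items() if power != 0}


def divide(a, b):
    """Divide two dimension maps by subtracting exponents."""
    return multiply(a, {name: -power for name, power in b.items()})


baud = divide(unit(bit=1), unit(s=1))                  # bit/s, as in item (a)
flops = divide(unit(flop=1), unit(s=1))                # flop/s, processing speed
income = divide(unit(dollar=1), unit(person=1))        # dollar/person
pop_density = divide(unit(person=1), unit(m=2))        # person/m^2

transferred = multiply(baud, unit(s=1))                # (bit/s) * s leaves bit
total_income = multiply(income, unit(person=1))        # (dollar/person) * person leaves dollar
```

Because dollar and person are dimensions in their own right, a computation that accidentally mixes, say, dollar/person with dollar produces mismatched dimension maps instead of a silently wrong number.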
© July 2015 Joseph E. Johnson, PhD