What I must do is all that concerns me, not what the people think. This rule, equally arduous in actual and in intellectual life, may serve for the whole distinction between greatness and meanness. It is the harder, because you will always find those who think they know what is your duty better than you know it. It is easy in the world to live after the world’s opinion; it is easy in solitude to live after our own; but the great man is he who in the midst of the crowd keeps with perfect sweetness the independence of solitude. – Self-Reliance, 1841, Ralph W Emerson
Ahh, to know what one “must do”. Easier said than done, especially in a crowd – as the electric sage points out.
Years ago I tried to free myself from him and went from the mythologies of the suburbs to the games with time and infinity, but those games belong to Borges now and I shall have to imagine other things. Thus my life is a flight and I lose everything and everything belongs to oblivion, or to him. – Borges and I (Labyrinths), 1964, Jorge L Borges
Borges, inimitably, on why it is pretty hard to “know thy self”.
– Gitanjali, 1910, Rabindranath Tagore
Relying on his strong Upanishadic spirituality, Tagore offers us an interpretation of transcending one’s limited self. If only we who are “shut up in a corner” could learn, like Tagore, to “break open the door … with the ceremony of a king”.
The Apache Spark cluster computing engine is expected to become the standard paradigm for processing massive data sets. Hard to disagree.
On November 5, 2014, Databricks, the team responsible for developing and managing the Spark ecosystem, announced the results of an impressive benchmarking contest.
We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks including Spark committers, Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark won a tie with the Themis team from UCSD, and jointly set a new world record in sorting.
Sorting is a foundational task in computation. TimSort was the sorting algorithm Spark used in this case. Spark handily beat the previous record, held by the legacy “Big Data” engine Hadoop MapReduce: roughly 30X better in per-machine terms.
Spark sorted the same data 3X faster using 10X fewer machines.
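The 30X figure is simply the two ratios in the line above multiplied together. The sketch below checks that arithmetic. It assumes the published record figures (Hadoop MapReduce: 102.5TB in 72 minutes on 2,100 nodes; Spark: 100TB in 23 minutes on the 207 nodes quoted further down). These numbers do not appear in this post, so treat them as assumptions. The helper `per_node_throughput` is a hypothetical name used only for illustration.

```python
# Back-of-the-envelope check of "3X faster using 10X fewer machines" => ~30X per machine.
# ASSUMPTION: record figures below come from the published results, not from this post.
HADOOP_TB, HADOOP_MIN, HADOOP_NODES = 102.5, 72, 2100  # previous Hadoop MapReduce record
SPARK_TB, SPARK_MIN, SPARK_NODES = 100.0, 23, 207      # Spark's 2014 Daytona GraySort entry


def per_node_throughput(tb_sorted, minutes, nodes):
    """Terabytes sorted per node per minute (hypothetical helper)."""
    return tb_sorted / (minutes * nodes)


speedup = HADOOP_MIN / SPARK_MIN          # how many times faster Spark finished
fewer_nodes = HADOOP_NODES / SPARK_NODES  # how many times fewer machines it used
efficiency = per_node_throughput(SPARK_TB, SPARK_MIN, SPARK_NODES) / per_node_throughput(
    HADOOP_TB, HADOOP_MIN, HADOOP_NODES
)

print(f"{speedup:.1f}X faster, {fewer_nodes:.1f}X fewer nodes, {efficiency:.1f}X per-node throughput")
# prints: 3.1X faster, 10.1X fewer nodes, 31.0X per-node throughput
```

The small correction for Hadoop sorting slightly more data (102.5TB versus 100TB) is why the product lands near 31 rather than exactly 3 × 10.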
Their official entry can be found here.
You can think of a 100TB sort as roughly taking the 80-odd trillion dollars of global GDP in 2014, converting it into $100 notes that each carry a unique serial number, and sorting them in order. (100TB of 100-byte benchmark records is about a trillion records.) The Spark team did this in under 30 minutes using 207 nodes. Renting each node would have cost about $0.25/hour at Elastic MapReduce pricing, so the whole run would have cost roughly $25. That's uniquely arranging every $100 produced in an entire year on this planet in under 30 minutes. If that doesn't get you thinking about tracking consumption and production at a granular level, I'm not sure what will.
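The estimate above comes down to a few multiplications. The sketch below makes three assumptions: GraySort's fixed 100-byte record size, decimal terabytes, and the post's own figure of $0.25 per node-hour. It ignores billing granularity and the actual instance type's price, so treat the dollar figure as a rough reading of the post's assumption rather than a real invoice.

```python
# Rough arithmetic behind the 100TB sort analogy and cost estimate.
TB = 10**12          # decimal terabyte, in bytes
RECORD_BYTES = 100   # GraySort records: 10-byte key + 90-byte payload
records = 100 * TB // RECORD_BYTES  # number of records in a 100TB sort

NODES = 207                # node count quoted in the post
RATE_PER_NODE_HOUR = 0.25  # $/node-hour, the post's assumed rental price (ASSUMPTION)
HOURS = 0.5                # "under 30 minutes", so this is an upper bound on run time

cost = NODES * RATE_PER_NODE_HOUR * HOURS

print(f"{records:,} records")  # 1,000,000,000,000 records
print(f"about ${cost:.1f}")    # about $25.9
```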
For a nice visual representation of why sorting 100TB may not be as easy as it sounds, click on the lovely illustration by Mike Bostock below.
I eagerly await the availability of Roberto Calasso’s Ardor, translated by Richard Dixon.
Sraddha is the Vedic axiom: the firm belief, which cannot be demonstrated but is implied in every act, that the visible acts on the invisible and, above all, that the invisible acts on the visible – that the realm of the mind and the realm of the tangible are in continual communication.
From Wittgenstein’s head-spinning notes “On Certainty”.
31. The propositions which one comes back to again and again as if bewitched—these I should like to expunge from philosophical language.
105. All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. And this system is not a more or less arbitrary and doubtful point of departure for all our arguments: no, it belongs to the essence of what we call an argument. The system is not so much the point of departure, as the element in which arguments have their life.
How wonderful to find that meditating on boredom can liberate one from it.
Thank you Sylvia.
JSON is a language-independent, text-based data interchange format. It is expected to become the standard format for storing and exchanging data for the foreseeable future. All tools and databases, Big Data or otherwise, will support it.
JSON empowers developers in a world filled with Big Data hype about how unstructured data will drown out structured data.
Parsimonious and powerful:
- Agnostic about numbers (offers only the representation of numbers that humans use: a sequence of digits)
- Simple expression of name/value pairs (programming languages map these to a record, struct, dict, map, hash, or object)
- Supports ordered lists of values (which map to an array, vector, or list)
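Here is a small sketch of the three properties above, using Python's standard `json` module. The document and its field names are made up for illustration. Objects come back as `dict`s and arrays as `list`s. Because JSON stores a number only as a sequence of digits, the reader decides how to interpret it: by default `0.10` becomes the float `0.1`, while passing `parse_float=Decimal` keeps the digits exactly as written.

```python
import json
from decimal import Decimal

# Illustrative document: name/value pairs, a big integer, a decimal, and an ordered list.
doc = '{"name": "Spark", "records": 1000000000000, "price": 0.10, "tags": ["sort", "benchmark"]}'

default = json.loads(doc)                      # numbers parsed as Python int/float
exact = json.loads(doc, parse_float=Decimal)   # keep the original decimal digits

print(type(default).__name__)  # dict  (name/value pairs)
print(default["tags"])         # ['sort', 'benchmark']  (ordered list)
print(default["records"])      # 1000000000000
print(default["price"])        # 0.1   (binary float)
print(exact["price"])          # 0.10  (digits preserved)
```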
JSON will be built upon and extended, but the core format is not expected to change. For instance, one recent extension that may prove foundational to future web applications is JSON-LD (LD = Linked Data). The W3C, the consortium responsible for core Web standards such as HTML, published it as a Recommendation on January 16, 2014.
This specification defines JSON-LD, a JSON-based format to serialize Linked Data … It is primarily intended to be a way to use Linked Data in Web-based programming environments, to build interoperable Web services, and to store Linked Data in JSON-based storage engines.
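The central idea in JSON-LD is the `@context`, which maps short JSON keys (terms) to IRIs so that plain JSON can be read as Linked Data. The toy function below rewrites keys using an inline context and roughly mirrors the shape of JSON-LD's expanded form. It is a hypothetical sketch that handles only simple term definitions and `"@type": "@id"` coercion. It is not a conforming JSON-LD processor: a real one also handles remote contexts, prefixes, language tags, nesting and much more, and returns a top-level array. The `example.org` IRIs are placeholders.

```python
import json

# A small JSON-LD document with an inline @context (placeholder IRIs under example.org).
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "@id": "http://example.org/people/alice",
    "name": "Alice",
    "homepage": "http://example.org/alice",
}


def expand_terms(node):
    """Toy expansion: replace terms with IRIs from the inline @context.

    Hypothetical helper; not a conforming JSON-LD processor.
    """
    ctx = node.get("@context", {})
    out = {}
    for key, value in node.items():
        if key == "@context":
            continue                      # the context is consumed, not copied
        if key.startswith("@"):
            out[key] = value              # keywords such as @id pass through
            continue
        term = ctx.get(key)
        if isinstance(term, dict):        # expanded term definition
            iri = term["@id"]
            if term.get("@type") == "@id":
                out[iri] = [{"@id": value}]     # value is itself a node reference
            else:
                out[iri] = [{"@value": value}]
        elif isinstance(term, str):       # simple term -> IRI mapping
            out[term] = [{"@value": value}]
        # keys with no mapping are dropped, as JSON-LD expansion does
    return out


expanded = expand_terms(doc)
print(json.dumps(expanded, indent=2))
```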