Data: Probability, probably, probity?

Biodiver.City
10 min read · Oct 20, 2023

Digitalisation, the move from analog to digital, is quite an interesting concept if we think about it from an etymological perspective. Analog comes from “proportionate” and digital comes from “finger”. Today, we understand analog as a continuous value that gets chopped into bits and pieces, mainly bits as in 0 and 1, to give a digital value. The perennial question about the continuous or discrete nature of the universe keeps coming at us from all sorts of angles. For now, the discrete has the upper hand, as computers rival the human brain at more and more tasks while using only 0 and 1. Let’s keep that in proportion, I’m not pointing the finger at anybody here.

With digitalisation comes the ability to process enormous quantities of data from various sources, which enabled John Mashey to coin the term Big Data in the mid-1990s, Kevin Ashton the term Internet of Things in 1999 and, later on, Eric Schmidt the term cloud computing in 2006. Today, we commonly use these terms without always understanding how fundamentally they are transforming the way we think about things we thought we understood completely. What I am trying to stress here is that we are talking about a paradigm shift on a par with the introduction of relativity in physics a bit more than a century ago.

Big Data: An ever-growing flow of data, with smaller and smaller sampling intervals, coming from various sources. Combined with the exponentially increasing computing capacity of modern machines, the term captures the fact that a given population is no longer sampled through a small portion of itself but analysed as a whole. This in turn makes the statistical exercise extremely precise and drastically increases the reliability of the resulting predictions.
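To make that shift concrete, here is a minimal, purely illustrative Python sketch contrasting the classic approach (estimate from a small sample) with the whole-population computation described above; the synthetic readings and the numbers are assumptions for the example, not anything from a real pipeline.

```python
import random

# Illustrative sketch only: contrast a classic small-sample estimate with
# a whole-"population" computation. The synthetic readings below are an
# assumption for the example, not real data.
random.seed(42)
population = [random.gauss(21.0, 2.0) for _ in range(1_000_000)]

sample = random.sample(population, 1_000)            # classic statistics: a small sample
sample_mean = sum(sample) / len(sample)

population_mean = sum(population) / len(population)  # "Big Data": use everything

print(f"sample estimate:  {sample_mean:.3f}")
print(f"whole population: {population_mean:.3f}")
```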

Internet of Things: In 1974, the Internet was born. The term “Internet” was coined by Vinton Cerf, Yogen Dalal and Carl Sunshine at Stanford University to describe a global Transmission Control Protocol/Internet Protocol (TCP/IP) network, or the rules that allow information to be sent back and forth over the Internet. This information used to reside in computers; today, the computer, which is one thing, has been joined by a multitude of other “things” that can also send and receive information over the Internet. These things are anything you can think of: a sensor, an actuator, a light bulb, a fridge, a telephone, a car, anything really. Their rapid expansion keeps making Big Data even bigger.

Cloud computing: This is what Wikipedia says about it, “It is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in third-party data centers that may be located far from the user–ranging in distance from across a city to across the world. Cloud computing relies on sharing of resources to achieve coherence and economy of scale, similar to a utility (like the electricity grid) over an electricity network.”

This space, far from clouding the overall picture, has opened up endless possibilities for end users: it gives everybody access to nearly limitless computer processing power.

So far so good.

Let’s recap quickly: the amount of data available is growing at a vertiginous pace, the computing capacity at our disposal has grown beyond anything a single human brain can match, and all this data is, or at least could be, available on tap in the cloud. So yes, data is the new oil, but data by itself is meaningless; it requires attention and perspicacity to unravel its true potential, and who in the world could manage such a task but the quasi-infinite power of the cloud?

With the previous model, data gathering was a subject-centric affair: banks were looking at bank data, buildings at building data, health organisations at health data, and so on. Data was used to provide hindsight, a posteriori, on a given subject, and this was driven by the subject’s experts: bankers on banks, engineers on buildings, doctors on health, etc.

In the new paradigm, there is no limit to the correlation of data sources other than the imagination of the people working with the data. With the help of remarkable figures like Norbert Wiener and Vladimir Vapnik, the development of artificial intelligence, machine learning and smart algorithms brings possibilities we would not have dared to dream about. A new breed of engineers was born: data scientists, data analysts, data miners, people allowing the world to make sense of the chaos of this ever-growing amount of data, using more and more sophisticated software to identify patterns and likely behaviour. From now on, data is used to tell us what is about to happen, and the accuracy of the prediction can be unsettling at times (has anyone ever been shocked by Amazon’s ability to suggest your next read when you order online?). Furthermore, a set of data apparently not related to a subject at all can reveal early signs of a change; for example, water and electricity usage can be used to detect early signs of Alzheimer’s disease in patients.
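As a toy illustration of that last idea, here is a hedged Python sketch: assuming we had daily utility readings for a household, a simple rolling-baseline check could flag days when consumption drifts away from the usual routine. It stands in for the far more sophisticated models alluded to above and is in no way a clinical method; the readings and thresholds are made up for the example.

```python
from statistics import mean, stdev

def flag_behaviour_change(daily_usage, baseline_days=28, threshold=3.0):
    """Flag days whose usage deviates strongly from a rolling baseline.

    daily_usage: list of daily consumption values (e.g. litres of water).
    Returns indices of days that look anomalous. A toy stand-in for the
    pattern-detection idea in the text, not a production method.
    """
    flagged = []
    for i in range(baseline_days, len(daily_usage)):
        window = daily_usage[i - baseline_days:i]
        mu, sigma = mean(window), stdev(window)
        if sigma > 0 and abs(daily_usage[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical readings: a stable weekly routine followed by a sudden shift.
usage = [150 + (5 if d % 7 else 20) for d in range(60)] + [210, 220, 230, 240]
print(flag_behaviour_change(usage))  # the last few days stand out
```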

So what’s the catch?

In a word: Disruption

How do you convince an engineer, who has spent years learning to explain with certainty why a phenomenon happens, to accept that a humongous magma of somewhat messy data can tell which phenomenon will happen with a very high probability, without any notion of why it will happen? It’s a bit like trying to force someone to accept that the quadrature of the circle has been solved, without explanation. In that sense, Big Data resembles a kind of deus ex machina that gives answers without explanation; in other words, Big Data tells us what will happen with quasi-certainty without telling us why. You have to admit this is a leap of faith not for the faint-hearted. For a lot of people, the what without the why is not acceptable.

Well, when quantum physics was introduced, a similar rejection was experienced, but in time it had to be admitted that it was a reality.

Someone, presumably Heisenberg since he turned it into a principle, once said: “Uncertainty is absolute”, quite a challenging statement for us mere mortals, always looking for a bit of certainty. We do like certainty; it is reassuring, it is safe. Living outside of our comfort zone is not our forte, it makes us, unsurprisingly, uncomfortable. From that point of view, disruption does not help, as it breaks apart what we used to know and takes away the certainty we thought we had. Then again, a small fringe of us, the so-called early adopters, are always keen to discover new ground. This is how we keep making progress and, going back to the beginning of this discussion, it looks like the discrete has once more the upper hand on the continuous; we reach the ceiling and, through a ground-breaking process, discover a new floor.

Our capability to explore data has been used to identify patterns. A lot of hedge fund companies made fortunes in the early days of Big Data, pushing the boundaries to catastrophic results for the unaware (GFC anyone?). This is the trick: Big Data and its tools can tell us what is most likely to happen, but how much can we trust what will most probably happen? What if a black swan appears? (Nassim Nicholas Taleb has written a great book on the subject: The Black Swan: The Impact of the Highly Improbable.) Some people will never accept the highly probable because it is not certain; others will use it to speed up their many decision-making processes.

Let’s focus for a while on Building Management Control Systems (BMCS), where data is used to automatically maintain conditions such as temperature, pressure and humidity in buildings with the best possible accuracy. This has been a building-centric exercise for decades: inputs and outputs installed on the various assets of a building are connected to field-device controllers with local computing abilities, which maintain set points by means of Proportional-Integral-Derivative (PID) loops. This method makes it possible to maintain temperature, humidity and pressure set points accurately. It creates a lot of dissatisfaction as well; just ask anyone working in a building what their views are on air conditioning, and they will probably tell you that the acronym BMCS stands for something completely different, like Big Mess Coming Soon. The issue here is that a given temperature of, let’s say, 21 °C is not perceived in the same way by two different people; one might say “too cold” while the other might say “too warm”, so from there how do you reach “perfect”? This is the predicament any professional in the air conditioning industry is entangled with.
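For readers who have never met one, here is a minimal PID sketch in Python, purely illustrative: the gains, the time step and the crude room model are assumptions for the example, and real BMCS field controllers add tuning, anti-windup, actuator limits and vendor-specific logic that are omitted here.

```python
# Minimal PID temperature-control sketch (illustrative only): the gains
# and the crude room model are assumptions for this example, not values
# from any real BMCS field controller.

def pid_step(setpoint, measured, state, kp=0.6, ki=0.01, kd=0.05, dt=1.0):
    """One PID iteration; `state` carries (integral, previous_error)."""
    integral, prev_error = state
    error = setpoint - measured
    integral += error * dt
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, (integral, error)

# Toy simulation: drive a room from 25 °C toward a 21 °C set point.
setpoint, room_temp = 21.0, 25.0
state = (0.0, 0.0)
for _ in range(30):
    output, state = pid_step(setpoint, room_temp, state)
    room_temp += 0.5 * output  # crude room response to the control output

print(f"temperature after 30 control steps: {room_temp:.2f} °C")
```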

Big Data has brought some elements of a response to this conundrum by correlating data that is not only building-centric. It is now possible to maintain conditions in buildings while keeping the proportion of dissatisfied occupants as low as possible; the immediate perk of this approach is the energy savings it generates. Correlating interval data from the BMCS, the utility and the bureau of meteorology, combined with a thermal comfort model, allows real-time set point adjustments as external conditions change. The continuous slight drift helps maintain the best possible conditions in the building, consuming the least amount of energy and with the fewest dissatisfied people. Not perfect, but not bad.
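A hedged sketch of what such a drift might look like: the article does not name the comfort model, so this example borrows the well-known adaptive-comfort relationship (comfort temperature ≈ 0.31 × outdoor running mean + 17.8 °C, from the ASHRAE 55 adaptive model) and clamps the result within assumed drift limits. The function name, base set point and limits are illustrative, not taken from any real BMCS.

```python
def adaptive_setpoint(outdoor_running_mean_c, base_setpoint_c=22.5, max_drift_c=1.5):
    """Drift the indoor set point with outdoor conditions.

    Uses the adaptive-comfort relationship T_comfort ≈ 0.31 * T_out + 17.8
    (ASHRAE 55 adaptive model) as a stand-in for the thermal comfort model
    mentioned above, then clamps the drift within +/- max_drift_c around
    the base set point -- an assumed, illustrative policy.
    """
    comfort_temp = 0.31 * outdoor_running_mean_c + 17.8
    low, high = base_setpoint_c - max_drift_c, base_setpoint_c + max_drift_c
    return min(max(comfort_temp, low), high)

# Hypothetical outdoor running-mean temperatures (°C).
for t_out in [12, 16, 20, 24, 28, 32]:
    print(f"outdoor {t_out:>2} °C -> set point {adaptive_setpoint(t_out):.1f} °C")
```

The design intuition is simple: as it gets warmer outside, people tolerate, and even prefer, slightly warmer indoor conditions, so letting the set point drift upward saves cooling energy without increasing dissatisfaction, within the limits the engineers have set.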

In my view, this is a step in the right direction despite the relatively rudimentary approach. What I mean by rudimentary is that some of the variables are still set manually by the data engineers watching the results of the machine learning algorithm, such as the thresholds that limit how far the set points are allowed to drift. Nonetheless, these are the first steps of data-driven decisions for air conditioning in buildings.

In a not-too-distant future, all buildings will be able to make data-driven decisions, not just on the air conditioning side but on all aspects: the way they are built, the life expectancy of their assets, true preventative maintenance, lighting, water, waste, etc. Their only limitation will be how connected they are.

Do you remember the days when a phone was just a phone? We all know what happened: they became mobile and then they became smart, as in intelligent. The same is happening to buildings and, to a larger extent, to cities.

How does this happen?

In a word: App (A self-contained program or piece of software designed to fulfil a particular purpose)

In the same way that smartphone users stay connected to the world by downloading apps sitting in the cloud, buildings and cities will increasingly stay connected to the world by using Software as a Service (SaaS) sitting in the cloud. The challenge is enormous, given the incredible power these SaaS offerings will have to make data-driven decisions for buildings and cities.

The start-ups of today might not even remember the dot-com story, when investors went haywire over the information-superhighway opportunity; they saw an economic potential, threw immense amounts of money at it without thinking it through, and the whole thing collapsed at the beginning of the new millennium. To this day, nobody really knows how much wealth was lost during that era. At the time, the world understood money but not data.

Fast forward: we are now submerged in data, and virtual worlds are growing well beyond the gaming scene; buildings can be fully erected virtually, reducing construction costs by solving problems in the virtual model and therefore avoiding them in the real one. We have social media too, virtual spaces where we can meet our contacts, friends and dates wherever we are in the world, as long as we have an internet connection. Companies like Google have mapped the planet, which in turn allows us to virtually visit places we might never go to physically. Apps are helping us in practically every aspect of our lives. The potential seems limitless.

While the virtual keeps improving its ability to replicate the real, there is a growing gap between the people who have access to data and the people who don’t. In the real world, resources are depleting, the population is growing, and with the challenges of climate change, migration and access to food, energy and shelter, it looks like the real mission of the power within the data is not just to make the world a better place but to actually find a way to save it. Quite an elliptical shortcut.

Data is a piece of information, information is knowledge, knowledge is power, and with power comes responsibility. Here we go.

Let’s face it, we are in a situation where an individual can reshape the world by building a smart algorithm on Big Data from the comfort of their bedroom; in this same world, it is possible for kids to make ridiculous amounts of money by posting videos on YouTube that are watched by millions of viewers. How are we supposed to pass on traditional values of hard work, dedication and perseverance when children are witnessing overnight success stories? How do we convey the importance of altruism when fame is built on narcissism and the number of viewers?

One element of an answer might be the one thing that companies are looking for even more desperately than data: probity and integrity.

The smart-algorithm economy is real; it is reshaping the way we learn, the way we work and the way we relate to each other; it is evolving extremely fast and brings a lot of uncertainty. However, its potential, if harnessed properly, could help solve some of today’s most urgent issues, such as energy demand, food production and climate change, to cite just a few. By allowing a truly holistic approach to solving large problems, it will help people and companies collaborate at whole new levels, reducing the siloed approach that has so often proven unhelpful by creating more new problems than the one it was supposed to solve in the first place.

A tomorrow where building conditions are maintained by taking into account the DNA of their occupants and adapting to an awareness of their perceptions; in other words, a tomorrow where buildings are as much in tune with their occupants as the occupants are with the building they work or live in. It may sound far-fetched today, but it becomes possible once Big Data reaches that level.

The question that remains is: How much are we ready, individually, to give back instead of taking endlessly? My understanding at this point in time is: Probity in the probability of the data is probably the answer.


Biodiver.City

A network of competencies restoring biodiversity through the realisation of projects.