Data Science and IoT for addressing ambient air sanctity

Sathish Swaminathan, Raghunathan Rengaswamy

Ambient air quality is a dynamic parameter that shows a high degree of spatio-temporal variation, and it is known to impact human life and health. Obtaining meaningful estimates of human exposure to air pollutants is key to mitigating their ill effects. The current air quality monitoring regime consists of expensive, static monitoring stations sparsely distributed across a city, which results in poor spatial resolution. This work, carried out under project Kaatru, demonstrates the efficacy of a low-cost, sensor-based, vehicle-mounted mobile monitoring paradigm for hyperlocal air quality assessment. In this approach, compact IoT-based devices are fixed atop vehicles and act as agents that collect ambient environmental data. The data collected by these devices are then processed and assimilated to generate meaningful insights into a city's ambient air quality with high spatio-temporal granularity. The approach has two fundamental components: IoT devices for data collection and data science tools for analysis.

IoT devices as agents for data collection

The hardware setup is a standalone, location-aware telemetry device with bidirectional communication to a central server. A schematic of the sensors in the device is shown in Figure 1. The device has an array of sensors that together measure 25 parameters, including location and time. It includes on-board Global Positioning System (GPS) and General Packet Radio Service (GPRS) modules for location tagging and real-time communication with a cloud-based server, respectively. Each device sends geotagged ambient environmental data to the central server once every 20 seconds.

Figure 1: Schematic of the hardware setup of sensing device
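To make the data flow concrete, the sketch below shows what one geotagged record might look like when serialized for upload every 20 seconds. The field names (`device_id`, `pm25`, `temperature_c`, `rh_pct`), the JSON encoding and the example coordinates are assumptions for illustration only; the article does not specify the device's message format or its full list of 25 parameters.

```python
import json
from dataclasses import dataclass, field, asdict

REPORT_INTERVAL_S = 20  # reporting interval stated in the text


@dataclass
class TelemetryRecord:
    """One geotagged reading; all field names are hypothetical."""
    device_id: str
    timestamp: int  # Unix epoch seconds (assumed to come from the GPS clock)
    lat: float      # degrees, from the on-board GPS
    lon: float
    readings: dict = field(default_factory=dict)  # parameter name -> value

    def to_json(self) -> str:
        # Serialize for transmission to the cloud server
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "TelemetryRecord":
        # Rebuild the record on the server side
        return cls(**json.loads(payload))


def next_report_time(last_timestamp: int) -> int:
    """Timestamp at which the next record is due."""
    return last_timestamp + REPORT_INTERVAL_S


# Example record with made-up values
rec = TelemetryRecord("dev-01", 1550000000, 28.4595, 77.0266,
                      {"pm25": 87.0, "temperature_c": 24.5, "rh_pct": 61.0})
payload = rec.to_json()
```

Keeping the location and timestamp in every record, rather than in a separate stream, is what lets each reading be placed in space and time independently during later processing.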

The modular hardware design allows additional sensors to be integrated as needed. The compact form factor allows the devices to be fitted on vehicles such as cars, auto-rickshaws or even motorbikes. Each device draws less than 10 watts and can be powered by the vehicle's own battery. The units are deployed strategically to achieve maximum coverage with a minimal number of units. As these vehicle-mounted units traverse the city, they collect location-tagged environmental information and transmit it to a cloud server in real time. In addition to the vehicle-mounted devices, some devices are placed at fixed locations. These stationary devices are placed strategically and serve three purposes:

  1. Augment the sensor network,
  2. Provide temporal data at key locations,
  3. Serve as calibration beacons for mobile devices.
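The third purpose can be illustrated with co-location calibration, a common way of using a reference unit: whenever a mobile unit passes close to a stationary unit, near-simultaneous pairs of readings are collected and a linear correction is fitted to the mobile sensor. The sketch below assumes a simple gain/offset model fitted by ordinary least squares, a 50 m pairing radius and a 20 s time tolerance; none of these choices come from the article, which does not describe its calibration method.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def colocated_pairs(mobile, beacon, radius_m=50.0, max_dt_s=20):
    """Pair mobile readings with beacon readings taken nearby in space and time.

    mobile: list of (t, lat, lon, raw_value)
    beacon: (lat, lon, [(t, reference_value), ...])  -- hypothetical layout
    """
    b_lat, b_lon, b_series = beacon
    if not b_series:
        return []
    pairs = []
    for t, lat, lon, raw in mobile:
        if haversine_m(lat, lon, b_lat, b_lon) > radius_m:
            continue  # too far from the stationary unit
        # Beacon reading closest in time to this mobile reading
        bt, ref = min(b_series, key=lambda s: abs(s[0] - t))
        if abs(bt - t) <= max_dt_s:
            pairs.append((raw, ref))
    return pairs


def fit_gain_offset(pairs):
    """Ordinary least-squares fit of reference = gain * raw + offset."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two co-located pairs")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("raw readings have no spread")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gain = sxy / sxx
    return gain, my - gain * mx


beacon = (28.0, 77.0, [(0, 45.0), (20, 65.0), (40, 85.0), (60, 105.0)])
mobile = [(1, 28.0, 77.0, 20.0), (21, 28.0, 77.0, 30.0),
          (41, 28.0, 77.0, 40.0), (61, 28.0, 77.0, 50.0),
          (5, 29.0, 77.0, 999.0)]  # last reading is ~111 km away, so it is ignored
print(fit_gain_offset(colocated_pairs(mobile, beacon)))  # -> (2.0, 5.0)
```

Once fitted, later raw readings from that mobile unit would be corrected as `gain * raw + offset`. Real low-cost sensors often also need humidity or temperature terms, which this two-parameter model leaves out.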

Figure 2 shows the prototype mobile (a) and stationary (b) devices.

Figure 2: Images of the mobile (a) and stationary device (b)

Figure 3: Schematic for hyperlocal air quality assessment using mobile monitoring

Data science for processing and analysis of data

Because each device transmits 25 parameters every 20 seconds, data accumulate quickly. The infographic in Figure 4 depicts the volume of data collected in this work through two pilot studies over the course of 8 months.

Figure 4: Data aggregated through this work from February 2019 through November 2019
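A back-of-the-envelope calculation shows how quickly the data accumulate. At one record every 20 seconds, a device that reports around the clock produces 3 × 60 × 24 = 4,320 records, or 4,320 × 25 = 108,000 parameter values, per day. The helper below generalizes this arithmetic; the fleet size and operating hours in the second example are hypothetical, not figures from the pilot studies.

```python
def daily_volume(n_devices, params_per_record=25, interval_s=20, hours_per_day=24.0):
    """Return (records per day, parameter values per day) for a fleet of devices."""
    # Whole reporting intervals that fit into the daily operating window
    records_per_device = int(hours_per_day * 3600 // interval_s)
    records = n_devices * records_per_device
    return records, records * params_per_record


print(daily_volume(1))                     # -> (4320, 108000)
print(daily_volume(10, hours_per_day=10))  # -> (18000, 450000)
```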

This large collection of unorganized data needs to be meticulously analyzed in order to obtain meaningful insights. However, the raw sensor data tend to have various flaws, such as missing values, outliers and other erroneous values. Thus, the data must be cleaned and processed before any form of analysis can begin. Unlike conventional data sets, which vary predominantly with time alone, data collected from these mobile sensors vary spatially as well. This necessitates customized spatio-temporal data cleaning algorithms. As part of this project, a custom pre-processing package was developed for data curation. The package comprises the following principal modules:

  1. Exploratory Data Analysis Module: This is the first stage of the package; it extracts metadata from the incoming raw data set. The report generated by this module informs the pre-processing module about the nature of the data set, so that suitable cleaning algorithms can be chosen.

  2. Pre-processing Module: This is the core module of the package and comprises multiple algorithms for data cleaning. Algorithms for outlier detection and removal, extrapolation, imputation and data transformation, among others, are built into this module. The algorithms are custom designed to take into account the spatio-temporal nature of the data set.

  3. Resampling Module: This module takes the cleaned data and up-samples or down-samples it, spatially or temporally, according to the end user's requirements. Resampling is done such that the spatio-temporal integrity of the original data set is preserved.
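The article does not describe the package's outlier algorithms. As an illustration of why spatio-temporal context matters, the sketch below flags a reading as an outlier when it deviates strongly from the median of readings taken within a given distance and time window around it, using the median absolute deviation (MAD) as a robust spread estimate. The 200 m radius, 300 s window, threshold `k = 3.5` and minimum neighbour count are hypothetical defaults, and a brute-force O(n²) neighbour search is used for clarity rather than speed.

```python
import math
from statistics import median


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def flag_outliers(records, radius_m=200.0, window_s=300, k=3.5, min_neighbours=4):
    """Return one boolean per record, True where the value looks like an outlier.

    records: list of (t, lat, lon, value) tuples.
    """
    flags = []
    for i, (t, lat, lon, value) in enumerate(records):
        # Neighbours must be close in BOTH time and space
        neighbours = [
            v for j, (tj, la, lo, v) in enumerate(records)
            if j != i and abs(tj - t) <= window_s
            and haversine_m(lat, lon, la, lo) <= radius_m
        ]
        if len(neighbours) < min_neighbours:
            flags.append(False)  # not enough local context to judge
            continue
        med = median(neighbours)
        mad = median(abs(v - med) for v in neighbours)
        # 1.4826 * MAD approximates the standard deviation for Gaussian data;
        # a tiny scale when MAD is 0 makes any deviation from the median count
        scale = 1.4826 * mad if mad > 0 else 1e-9
        flags.append(abs(value - med) / scale > k)
    return flags
```

A reading that is unusual for the whole city but typical of its own neighbourhood (for example, next to a busy junction) is kept, while an isolated spike is flagged; a purely time-series filter cannot make that distinction for a moving sensor.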

The modular design of the pre-processing package allows different groups of algorithms to be combined to suit the end requirement. The package is designed to be general enough to process any kind of spatio-temporal data set. Figure 5 shows the fundamental blocks of the pre-processing package.

Figure 5: Fundamental blocks of the pre-processing package
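The modular idea can be sketched as a list of interchangeable stages, each taking and returning a list of records, preceded by an exploratory report. The stage names below (`summarize`, `drop_missing`, `resample_mean`, `run_pipeline`) are hypothetical stand-ins for the package's exploratory data analysis, pre-processing and resampling modules, not its actual API, and the resampling shown is only simple temporal down-sampling by averaging fixed-width time bins.

```python
from statistics import mean


def summarize(records, fields):
    """EDA-style report: fraction of missing values per field."""
    n = max(len(records), 1)  # avoid division by zero on an empty data set
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def drop_missing(fields):
    """Cleaning stage: discard records with a missing value in any given field."""
    def stage(records):
        return [r for r in records if all(r.get(f) is not None for f in fields)]
    return stage


def resample_mean(bin_s, fields):
    """Resampling stage: average each field within fixed-width time bins."""
    def stage(records):
        bins = {}
        for r in records:
            bins.setdefault(r["t"] // bin_s * bin_s, []).append(r)
        return [
            {"t": start, **{f: mean(r[f] for r in group) for f in fields}}
            for start, group in sorted(bins.items())
        ]
    return stage


def run_pipeline(records, stages):
    """Apply stages in order; swapping the list changes the processing recipe."""
    for stage in stages:
        records = stage(records)
    return records


raw = [
    {"t": 0, "pm25": 40.0}, {"t": 20, "pm25": None}, {"t": 40, "pm25": 44.0},
    {"t": 60, "pm25": 60.0}, {"t": 80, "pm25": 64.0},
]
report = summarize(raw, ["pm25"])  # -> {'pm25': 0.2}
clean = run_pipeline(raw, [drop_missing(["pm25"]), resample_mean(60, ["pm25"])])
# -> [{'t': 0, 'pm25': 42.0}, {'t': 60, 'pm25': 62.0}]
```

Because every stage shares the same signature, an imputation stage could replace `drop_missing`, or a spatial binning stage could replace `resample_mean`, without touching the rest of the pipeline.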

Case Study: Gurgaon City

Devices were deployed in Gurgaon, Haryana, mounted on top of electric rickshaws for a period of 3 months. Figure 6 shows the variation in PM2.5 concentration over one particular route in Gurgaon. The route is a quadrangular section covering roughly 6 km² of residential and commercial areas.

Figure 6: PM2.5 concentration over a 6 km² area in Gurgaon

The plots show substantial gradation in PM2.5 concentrations over a relatively small area. After pre-processing the raw data, two parallel roads were chosen from the area depicted in Figure 6 for further analysis. Figure 7(a) shows the two roads: Road 1 is an 8-lane National Highway and Road 2 is a 4-lane main road. Although the two roads are only about 2 km apart, the differences in air quality and other ambient environmental parameters between them are substantial. The time-averaged variation between the two roads is shown in the table in Figure 7(b).

Figure 7: (a) Two parallel roads in Gurgaon, separated by ~2 km, that were chosen for analysis; (b) time-averaged variation between Road 1 and Road 2 across various parameters.

The hyperlocal nature of the collected data allows information to be visualised at such high granularity. These results substantiate the need for hyperlocal measurement of ambient air quality parameters and validate the efficacy of the mobile monitoring paradigm in capturing such variation.
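Maps such as Figure 6 are typically built by aggregating geotagged point readings onto small spatial cells. The sketch below assigns each reading to a cell of roughly `cell_m` metres on a side using an equirectangular approximation (adequate over a few kilometres), averages the values per cell, and sorts cells from most to least polluted, which also gives a simple ranking of candidate hotspots. The 100 m cell size, the minimum of 3 samples per cell and the grid origin are assumptions, not values from the study.

```python
import math
from collections import defaultdict

M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude


def cell_index(lat, lon, lat0, lon0, cell_m):
    """Map a point to integer (row, col) grid indices relative to an origin."""
    dy = (lat - lat0) * M_PER_DEG_LAT
    # Longitude degrees shrink with latitude; scale by cos(lat0)
    dx = (lon - lon0) * M_PER_DEG_LAT * math.cos(math.radians(lat0))
    return int(dy // cell_m), int(dx // cell_m)


def grid_means(readings, lat0, lon0, cell_m=100.0, min_samples=3):
    """Average readings per cell; return [(cell, mean, count)] sorted worst first.

    readings: iterable of (lat, lon, value).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in readings:
        key = cell_index(lat, lon, lat0, lon0, cell_m)
        sums[key] += value
        counts[key] += 1
    cells = [
        (key, sums[key] / counts[key], counts[key])
        for key in sums
        if counts[key] >= min_samples  # skip sparsely sampled cells
    ]
    return sorted(cells, key=lambda c: c[1], reverse=True)
```

The minimum-sample filter matters for mobile data: a cell crossed only once may reflect a single passing truck rather than a persistent hotspot.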

Challenges and Applications

The mobile monitoring paradigm carries its own set of challenges. In a purely mobile monitoring paradigm, temporal resolution is sacrificed to achieve higher spatial resolution, because a moving unit revisits any given location only intermittently. To overcome this limitation, a combination of static and mobile monitoring should be considered. The next challenge is aggregating such large data sets and assimilating them to extract meaningful information; big data and data science approaches play a significant role here. The sheer volume, velocity and variety of the data obtained from the sensor network necessitate a big data framework. Once the data are aggregated, a variety of data science tools are needed to make sense of them and generate insights. Figure 3 shows the schematic for hyperlocal air quality assessment using the low-cost sensor based mobile monitoring paradigm. Although mobile monitoring networks may not match the data quality of regulatory networks, they are well capable of providing relative information, such as which locations and times are more polluted than others, and such information has several uses. Some applications of hyperlocal air quality information are as follows:

  1. Creating awareness: The information could be used to educate the public about air quality across the city and help create awareness.

  2. Augmenting the existing sensor network: The hyperlocal information could be used alongside regulatory fixed-site monitors to fill gaps in air quality information.

  3. Identifying spatio-temporal hotspots and characterizing them: Ambient air quality is a dynamic entity that varies drastically in both space and time. Some locations across a city have markedly different pollutant concentrations at different times. The objective here is to identify such pollution hotspots and map them. Simple algorithms would then allow these hotspots to be ranked from severe to benign, enabling the development of customized solutions for social and commercial impact.

  4. Aiding city planning and policy making: This information could be used to pinpoint sources of air pollution and to inform policy makers and city planning commissions of changes that would help keep air pollution and its impacts in check.

  5. Assessing personal exposure: Once hyperlocal air quality information is available, the next step is to estimate individual exposure, that is, how much pollution people are exposed to in their daily lives. This would also aid epidemiological studies that aim to understand the effect of air pollution on human health.

  6. Identifying safest routes: Between two locations in a city, some routes are typically cleaner than the alternatives. Just as Google can suggest the quickest route between two locations, hyperlocal air quality information would allow people to identify the safest route, the one with the least exposure.
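Applications 5 and 6 can be combined by posing route selection as a shortest-path problem in which each road segment is weighted by the exposure it causes rather than by its length. Taking exposure on a segment as concentration multiplied by time spent on it (µg/m³ × minutes), Dijkstra's algorithm finds the route with the least total exposure. The road graph, segment concentrations and travel times below are made-up inputs; in practice the concentrations would come from the hyperlocal data, and the article does not specify a routing method.

```python
import heapq


def least_exposure_route(graph, source, target):
    """Dijkstra's algorithm with exposure-weighted edges.

    graph: {node: [(neighbour, pm25_ugm3, travel_min), ...]}  (hypothetical layout)
    Returns (total_exposure, path); exposure is in ug/m3 * min.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        exposure, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, conc, minutes in graph.get(node, []):
            cand = exposure + conc * minutes  # exposure accumulated on this segment
            if cand < best.get(nbr, float("inf")):
                best[nbr] = cand
                prev[nbr] = node
                heapq.heappush(heap, (cand, nbr))
    if target not in best:
        return float("inf"), []  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


roads = {
    "A": [("B", 120.0, 5.0), ("C", 40.0, 8.0)],
    "B": [("D", 110.0, 5.0)],
    "C": [("D", 45.0, 8.0)],
}
print(least_exposure_route(roads, "A", "D"))  # -> (680.0, ['A', 'C', 'D'])
```

In this toy graph the route via B is quicker (10 versus 16 minutes) but accumulates 1,150 units of exposure against 680 via C, which is exactly the trade-off a "safest route" feature would surface.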

Low-cost sensing for air quality estimation is still at a nascent stage. However, with improvements in sensor technology combined with specialized algorithms and data science tools, the low-cost sensor based mobile monitoring paradigm has the potential to completely overhaul fixed-site monitoring and become the norm for air quality assessment across the globe.


Sathish Swaminathan (1), Raghunathan Rengaswamy (2, 3)
  1. PhD Scholar, Dept. of Chemical Engineering, Indian Institute of Technology, Madras
  2. Marti Mannariah Gurunath Institute Chair Professor, Dept. of Chemical Engineering, Indian Institute of Technology, Madras
  3. Robert Bosch Centre for Data Science and Artificial Intelligence


Keywords: Air quality, Internet of Things, Hyperlocal, Mobile monitoring