I am currently an Assistant Professor in Computer Science and Engineering, IIT Delhi. My research interests lie in building distributed, networked and privacy-aware systems, focusing on problems at the boundary of information technology and society. I have built systems for road traffic monitoring, human mobility measurement, public policy audit and privacy enhancement in ubiquitous systems, among others.
I like to work on problems where I can apply interdisciplinary CS techniques and where the solutions could have societal impact, especially in developing regions. I also enjoy the tension that arises when methods with conflicting requirements are combined in the same system. For example, a sensing task might give the best results with high-dimensional, deep-learning-based features extracted from the data. But if those features must be transmitted securely to a remote server for privacy reasons, cryptography imposes a conflicting requirement: the plaintext message should be small, to minimize network bandwidth.
Building prototypes for real-world problems also poses interesting trade-offs between performance and practical constraints. Processing high-definition (HD) video on powerful GPU servers might give the best performance in road traffic monitoring, but the poor broadband infrastructure in developing countries can rule out real-time streaming of HD video from the roads to the servers. In-situ processing might then mandate mobile and embedded platforms, whose processor and battery constraints conflict with heavy computation and low-latency requirements.
Sometimes trade-offs come from the conflicting interests of different stakeholders. Individuals hold differing personal views on privacy, while retailers might want to profile all customers alike for targeted advertising and financial profit. Balancing such conflicts, whether between CS techniques, between a CS method and a real-world constraint, or among the stakeholders in an application scenario, is what excites me.
[1] Road traffic measurement in developing regions
Developed countries use magnetic detectors and roadside cameras for traffic monitoring at important road junctions. In my PhD thesis, I built novel sensing technologies to measure the density of non-laned traffic, which is common in developing countries. Using noise levels and vehicle speeds computed from the Doppler shift of vehicle sounds, I achieved 70-80% accuracy in differentiating two congestion classes (high vs. low) [MOBISYS10]. At Microsoft Research India (MSR-I), I processed images from roadside cameras to compute traffic densities and speeds, and quantified the density-speed relations (called fundamental curves in transportation engineering) for non-laned traffic for the first time [DEV13].
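The speed computation can be illustrated with the textbook Doppler relation. The sketch below is a minimal version under stated assumptions, not the method of [MOBISYS10]: it assumes a stationary microphone close to the vehicle's path, a tonal sound (such as a horn) whose pitch is measured once while the vehicle approaches and once while it recedes, and a speed of sound of 343 m/s. The function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def speed_from_doppler(f_approach_hz: float, f_recede_hz: float,
                       c: float = SPEED_OF_SOUND_M_S) -> float:
    """Estimate a passing vehicle's speed in m/s from two pitch measurements.

    With the microphone close to the vehicle's path, the approaching pitch is
    f0*c/(c - v) and the receding pitch is f0*c/(c + v). Eliminating the
    unknown source frequency f0 gives v = c * (f_a - f_r) / (f_a + f_r).
    """
    if f_approach_hz <= 0 or f_recede_hz <= 0:
        raise ValueError("frequencies must be positive")
    return c * (f_approach_hz - f_recede_hz) / (f_approach_hz + f_recede_hz)


def mps_to_kmph(v: float) -> float:
    """Convert metres per second to kilometres per hour."""
    return v * 3.6
```

For a 400 Hz source moving at 10 m/s, the approaching and receding pitches are about 412.0 Hz and 388.7 Hz, and `speed_from_doppler` recovers 10 m/s. Eliminating the source frequency matters because horn pitches vary between vehicles.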
I also placed two Zigbee radios on opposite sides of the road, one sending and the other receiving packets, to classify traffic states from the variation in received signal strength (RSSI). This novel technique gave 90-95% accuracy in differentiating five congestion classes (empty road, fast, moderate, slow and standing traffic). Arrays of these radio pairs were used to estimate queue lengths at traffic signals [SENSYS12].
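One simple way to turn an array of classified radio pairs into a queue length is sketched below. This is an illustrative rule, not necessarily the estimator used in [SENSYS12]. It assumes pairs placed at known distances upstream of the stop line, and it treats the hypothetical "slow" and "standing" labels as "queued". The queue is taken to end at the last pair of the unbroken congested run that starts at the stop line.

```python
from typing import Sequence

# Assumption: which of the five classified states count as "part of the queue".
CONGESTED = {"slow", "standing"}


def estimate_queue_length(positions_m: Sequence[float],
                          states: Sequence[str]) -> float:
    """Return the estimated queue length (m) behind a stop line.

    positions_m[i] is the distance of radio pair i from the stop line (sorted
    ascending) and states[i] is that pair's classified traffic state, one of
    "empty", "fast", "moderate", "slow" or "standing". The resolution of the
    estimate is limited by the spacing between pairs.
    """
    if len(positions_m) != len(states):
        raise ValueError("one state per sensor pair is required")
    if list(positions_m) != sorted(positions_m):
        raise ValueError("positions must be sorted by distance from the stop line")
    queue_end = 0.0
    for pos, state in zip(positions_m, states):
        if state not in CONGESTED:
            break  # first free-flowing pair marks the end of the queue
        queue_end = pos
    return queue_end
```

For pairs at 50, 100, 150 and 200 m classified as standing, slow, fast and slow, the estimate is 100 m. The isolated slow pair at 200 m is ignored because it is not connected to the queue at the signal.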
In collaboration with traffic control authorities in Bengaluru and Mumbai, I deployed these sensors on many roads for months, to understand how accuracy varies across roads and seasons [SECON11][TOSN15].
In the non-laned traffic of developing countries, two- and three-wheelers often weave through gaps between larger vehicles to get ahead in traffic queues and wait less. Travel time estimates that ignore vehicle type are therefore less useful for planning routes and informing commuters. I collaborated with a Master's student from IIIT-Delhi to automate vehicle type classification using the inertial sensors in smartphones. Our Android app detected the vehicle type by processing accelerometer, compass and gyroscope signals, and showed customized route plans. The system achieved above 90% vehicle classification accuracy, evaluated over more than 1,500 km of driving data on two urban road stretches in Delhi [MOBIQUITOUS14].
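Classifiers of this kind typically work on features computed over short windows of sensor samples. The sketch below shows one hypothetical feature set, not the one used in [MOBIQUITOUS14]. It uses vector magnitudes so that the features do not depend on how the phone is oriented in the vehicle, and it handles the 0/360 degree wrap-around when summing compass heading changes. The classifier that would consume these features, trained on labeled drives, is not shown.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def magnitude(v: Vec3) -> float:
    """Euclidean norm of a 3-axis sample (independent of phone orientation)."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def heading_change_deg(h1: float, h2: float) -> float:
    """Smallest signed difference between two compass headings, in degrees."""
    return (h2 - h1 + 180.0) % 360.0 - 180.0


def window_features(accel: Sequence[Vec3], gyro: Sequence[Vec3],
                    heading_deg: Sequence[float]) -> List[float]:
    """Hypothetical per-window features for a vehicle type classifier."""
    acc_mag = [magnitude(a) for a in accel]
    gyro_mag = [magnitude(g) for g in gyro]
    turns = [abs(heading_change_deg(a, b))
             for a, b in zip(heading_deg, heading_deg[1:])]
    return [
        statistics.fmean(acc_mag),   # mean acceleration magnitude (includes gravity)
        statistics.pstdev(acc_mag),  # vibration level in the window
        statistics.fmean(gyro_mag),  # average rotation rate (e.g. leaning, turning)
        sum(turns),                  # total heading change in degrees
    ]
```

With the wrap-around handling, a heading change from 350 to 10 degrees counts as 20 degrees rather than 340.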
With recent advances in deep learning and the availability of embedded GPU platforms for efficient processing, we are currently training deep neural network models on the non-laned traffic videos collected during my PhD. The goal is to obtain vehicle counts, with vehicle type classification, in real time on embedded platforms. This is joint work with the Indian startup Altigreen, which needs traffic density estimates to predict the percentage reduction in emissions possible with its hybrid car engines.
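Once a detector and tracker produce per-frame boxes, per-type counts are commonly obtained with a virtual counting line. The sketch below is a generic version of that step. It does not describe the models in this project, and the input format (track id mapped to frame index, vehicle type and box centroid y-coordinate) is a hypothetical one chosen for illustration. Each track is counted once, and its type is decided by majority vote to smooth out per-frame classification flicker.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical tracker output: track_id -> [(frame_index, vehicle_type, centroid_y), ...]
Track = List[Tuple[int, str, float]]


def count_line_crossings(tracks: Dict[int, Track], line_y: float) -> Counter:
    """Count vehicles by type whose centroid crosses a horizontal virtual line.

    A track is counted at most once, on the first frame-to-frame step where
    its centroid moves across line_y in either direction.
    """
    counts: Counter = Counter()
    for track in tracks.values():
        points = sorted(track)  # order detections by frame index
        for (_, _, y0), (_, _, y1) in zip(points, points[1:]):
            if y0 < line_y <= y1 or y1 < line_y <= y0:
                label = Counter(t for _, t, _ in points).most_common(1)[0][0]
                counts[label] += 1
                break
    return counts
```

A track labeled "car" in two frames and "truck" in one is counted as a car, while a track that never reaches the line is not counted at all.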
[2] Human mobility measurement, both outdoor and indoor
Walkable access to facilities such as grocery stores, public transport, banks and doctors can significantly improve urban quality of life. We devised a method to quantify walkability using the Google Maps Places and Distance APIs [ICWSM16]. Dividing a city into a 200 m x 200 m grid, we counted the facilities within walking distance of each grid point. We used this scalable method to characterize walkable access within cities. We also compared walkability across 25 cities around the world, quantifying differences among European, American, Asian and developing-country cities. These analyses are currently under submission to PLOS ONE.
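The grid-and-count step can be sketched offline as follows. In place of walking distances from the Google Maps APIs, this sketch substitutes straight-line (haversine) distance, which understates true walking distance, and it assumes a hypothetical 1 km walking threshold. Facility coordinates are assumed to be already fetched. Grid spacing uses an equirectangular approximation, which is adequate at city scale.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres (a stand-in for walking distance)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def grid_points(south_west: LatLon, north_east: LatLon,
                step_m: float = 200.0) -> List[LatLon]:
    """Grid points roughly step_m apart covering a bounding box."""
    lat0, lon0 = south_west
    lat1, lon1 = north_east
    if lat1 < lat0 or lon1 < lon0:
        raise ValueError("north_east must lie north-east of south_west")
    dlat = math.degrees(step_m / EARTH_RADIUS_M)
    mid_lat = math.radians((lat0 + lat1) / 2)
    dlon = math.degrees(step_m / (EARTH_RADIUS_M * math.cos(mid_lat)))
    n_lat = int((lat1 - lat0) / dlat) + 1
    n_lon = int((lon1 - lon0) / dlon) + 1
    return [(lat0 + i * dlat, lon0 + j * dlon)
            for i in range(n_lat) for j in range(n_lon)]


def walkable_counts(points: Sequence[LatLon], facilities: Sequence[LatLon],
                    max_walk_m: float = 1000.0) -> List[int]:
    """Number of facilities within max_walk_m of each grid point."""
    return [sum(1 for f in facilities if haversine_m(p, f) <= max_walk_m)
            for p in points]
```

Running `walkable_counts` separately for each facility category (grocery, transit, banks, doctors) gives per-category access maps. Replacing `haversine_m` with network walking distances would bring the sketch closer to the method described above.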
Indoor localization is useful for retail applications such as targeted advertising. Wi-Fi-based indoor localization is popular, but requires a calibration dataset of Wi-Fi measurements at known locations. During a summer internship at MSR-I, we devised a method to crowd-source this calibration data automatically [MOBICOM12][US Patent 9,310,462]. The only input needed was a floor map of the site. By sampling the accelerometer and compass on participating smartphones while simultaneously taking Wi-Fi scans, the system used particle filtering to localize each scan and gradually build the calibration database for the venue.
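The core particle filtering loop can be sketched as follows. This is a generic dead-reckoning particle filter, not the [MOBICOM12] implementation. It assumes the floor map has been reduced to wall segments, that step length and heading have already been extracted from the accelerometer and compass (with heading converted to the map's x-y frame), and it uses hypothetical noise parameters. Particles whose move would pass through a wall get zero weight. The cloud's mean after each step is the location estimate for a Wi-Fi scan taken at that moment.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _ccw(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses(p: Point, q: Point, wall: Segment) -> bool:
    """True if the move p -> q properly intersects the wall segment."""
    a, b = wall
    return (_ccw(p, q, a) * _ccw(p, q, b) < 0
            and _ccw(a, b, p) * _ccw(a, b, q) < 0)


def pf_step(particles: List[Point], step_m: float, heading_rad: float,
            walls: List[Segment], rng: random.Random,
            step_sd: float = 0.15, heading_sd: float = 0.1) -> List[Point]:
    """One update: move each particle with noise, drop wall-crossers, resample."""
    moved, weights = [], []
    for p in particles:
        d = rng.gauss(step_m, step_sd)          # noisy step length
        h = rng.gauss(heading_rad, heading_sd)  # noisy heading in map frame
        q = (p[0] + d * math.cos(h), p[1] + d * math.sin(h))
        moved.append(q)
        weights.append(0.0 if any(crosses(p, q, w) for w in walls) else 1.0)
    if sum(weights) == 0:
        return list(particles)  # assumption: keep the old cloud if every move was invalid
    return rng.choices(moved, weights=weights, k=len(particles))


def estimate(particles: List[Point]) -> Point:
    """Mean particle position, used as the location of the current Wi-Fi scan."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)
```

Walking repeatedly toward a wall leaves only particles on the near side of it, which is how the floor map gradually constrains the estimate without any Wi-Fi infrastructure knowledge.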
During an interim position at SMU Livelabs, I worked on monitoring shopper group dynamics for retail applications. In [SENSYS14], we detected groups in data collected from 258 shopping episodes of 154 volunteers in two large shopping complexes in Korea and Singapore, and in the shopping areas of Changi International Airport. As members of a group stopped and walked, turned in the corridors, or rode the escalators up and down together, their accelerometer, compass and barometer signals showed the same temporal changes. We clustered these correlated time series into groups, obtaining 80% recall with 97% precision, even in venues with limited or no indoor localization infrastructure.
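The clustering idea can be sketched as correlation followed by connected components. This is a simplified illustration, not the [SENSYS14] pipeline. It assumes one time-aligned signal per shopper (for example, barometric altitude), whereas the real data has several modalities. It also uses a hypothetical correlation threshold of 0.8 and treats any chain of strongly correlated shoppers as one group.

```python
import math
import statistics
from itertools import combinations
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def detect_groups(series: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> List[Set[str]]:
    """Link shoppers whose signals correlate at or above threshold and
    return the connected components with two or more members."""
    parent = {u: u for u in series}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(series, 2):
        if pearson(series[u], series[v]) >= threshold:
            parent[find(u)] = find(v)  # union the two components
    groups: Dict[str, Set[str]] = {}
    for u in series:
        groups.setdefault(find(u), set()).add(u)
    return [g for g in groups.values() if len(g) > 1]
```

Union-find keeps the grouping step close to linear in the number of linked pairs. The pairwise correlation loop is quadratic in the number of shoppers, so a deployment would likely restrict comparisons to shoppers seen in the same area at the same time.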
[3] Privacy enhancing systems for smart devices
The projects described so far focused on accurate, scalable data collection and analysis for traffic and retail applications. But when (i) data collection becomes ubiquitous, with cameras on smartphones and advertising billboards, (ii) data analytics approaches human-level accuracy with AI and deep learning, and (iii) disparate information becomes more linkable and public through sharing on social media, people might gradually lose control over who is collecting what information about them, and why. One of my recent projects aims to give people more control over their data privacy.
In [MOBISYS16], we designed a privacy-preserving image capture system called I-Pic, which lets people have a say in whether they appear in images taken in their vicinity. People's smartphones broadcast their privacy preferences (blur my face, show my face, remove me, etc.) over short-range radio, namely Bluetooth Low Energy (BLE). The I-Pic Android app on the photographer's smartphone listens to these broadcasts, detects faces in the photograph, and applies the requested privacy policy to each face before the image can be saved or shared online. To match each preference to the right face in the photo, each request must carry a facial signature (we use deep-learning-based features). Broadcasting facial signatures over BLE would itself introduce a new privacy leak, so Secure Multi-Party Computation (MPC) is used to compare each of the m detected faces securely against each of the n bystanders transmitting over BLE.
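The secure comparison can be illustrated with a toy two-party computation of the squared distance between two feature vectors. This is a sketch of one standard MPC technique, additive secret sharing with Beaver multiplication triples, and not necessarily the protocol I-Pic uses. It assumes the features have been quantized to integers. It simulates both parties, and a trusted dealer of triples, in one process. Unlike a full protocol, it reveals the squared distance itself rather than only whether it falls below a matching threshold.

```python
import secrets
from typing import Sequence, Tuple

P = 2**61 - 1             # prime modulus; all share arithmetic is mod P
Shares = Tuple[int, int]  # (party 0's share, party 1's share)


def share(x: int) -> Shares:
    """Split x into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(P)
    return r, (x - r) % P


def reveal(s: Shares) -> int:
    """Recombine two shares into the secret value."""
    return (s[0] + s[1]) % P


def beaver_triple() -> Tuple[Shares, Shares, Shares]:
    """Shares of random a, b and c = a*b (simulated trusted dealer)."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)


def mul(x: Shares, y: Shares) -> Shares:
    """Multiply two shared values; only the masked e = x - a and f = y - b are opened."""
    a, b, c = beaver_triple()
    e = reveal(((x[0] - a[0]) % P, (x[1] - a[1]) % P))
    f = reveal(((y[0] - b[0]) % P, (y[1] - b[1]) % P))
    z0 = (c[0] + e * b[0] + f * a[0] + e * f) % P  # party 0 adds the public e*f term
    z1 = (c[1] + e * b[1] + f * a[1]) % P
    return z0, z1


def secure_sq_distance(face: Sequence[int], bystander: Sequence[int]) -> int:
    """Squared Euclidean distance of two quantized feature vectors, computed on shares."""
    acc = (0, 0)
    for fi, bi in zip(face, bystander):
        fs, bs = share(fi % P), share(bi % P)           # each owner shares its own input
        d = ((fs[0] - bs[0]) % P, (fs[1] - bs[1]) % P)  # subtraction is purely local
        sq = mul(d, d)                                  # squaring needs one triple
        acc = ((acc[0] + sq[0]) % P, (acc[1] + sq[1]) % P)
    return reveal(acc)
```

Subtraction and addition are free on additive shares, so the only interactive step is one multiplication per feature dimension. For m faces and n bystanders, the protocol would run m x n such comparisons, which is where the tension between large deep-learning features and cryptographic cost shows up.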
We are extending the bystander privacy system in I-Pic to protect sensitive content in restricted spaces (e.g. security check facilities in airports, or copyrighted pictures in a museum). Unlike I-Pic, which assumed cooperative photographer devices, this new system will have a trust model of untrusted and possibly malicious devices and device owners. Its main technical components will be deep learning for arbitrary object detection (the sensitive object is no longer a face, as in I-Pic) and enforcement that leverages trusted hardware on mobile platforms, such as ARM TrustZone. There are exciting future research directions in enhancing privacy for ubiquitous computing, e.g. (i) privacy-preserving image analytics that balance the privacy needs of individuals with the economic prospects of retailers, and (ii) privacy vulnerability analysis and mitigation for linked data, where mobile and physical sensor data is linked with an individual's online persona to reveal more about her without her knowledge or consent. I wish to explore these research directions in the near future.
[4] Empirical audit of public policies
I recently conducted a multi-dimensional audit of Free Basics, a controversial program run by Facebook in collaboration with telecom operators, which claims to bridge the digital divide in developing countries. The program offers zero-rated web services in 50+ countries, but has been banned in India following strong opposition from proponents of an open Internet. Using client-side vantage points in 15+ countries, we studied the available Free Basics services. By deploying our own services through Facebook, we characterized Free Basics users and network QoS from a web server's viewpoint. Our findings provided empirical data supporting some of the opponents' concerns, while highlighting some benefits posited by the program's proponents. More details can be found in [IMC16], [ICTD17] and [CCR17], and the dataset collected is at [REPO].
Public policies often lack informed, data-driven debate, especially in developing countries. For example, the odd-even rule, which allowed odd- and even-numbered vehicles on the road on alternate days, divided supporters and opponents of the political party that introduced this policy in Delhi. The proponents' arguments lacked empirical data on air pollution, or on public-private vehicle ratios on the road, to quantify improvements, if any. Nor did the opponents use data on increased queue lengths at subway stations or on cab price surges to quantify the negatives. Keeping measurement frameworks ready for such policy changes, and testing whether the extreme partisan energy can be harnessed to crowd-source empirical data, would be very interesting from an HCI and incentive-design research perspective. Making such crowd-sourcing apps tamper-proof, with and without hardware security support, will also be technically challenging. I wish to explore these research directions in the future.
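As a starting point for the "without hardware support" case, crowd-sourced reports can at least be made tamper-evident in transit with a per-device message authentication code. The sketch below uses HMAC-SHA256 over a canonical JSON encoding. The report fields and the key provisioning are hypothetical. It shows only the software baseline: a device owner who extracts the key from the app can still sign fabricated reports, which is exactly the gap that hardware-backed keys and attestation would be needed to close.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(report: Dict[str, Any]) -> bytes:
    """Deterministic byte encoding so signer and verifier hash identical bytes."""
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode()


def sign_report(report: Dict[str, Any], device_key: bytes) -> Dict[str, Any]:
    """Attach an HMAC-SHA256 tag computed with the device's secret key."""
    tag = hmac.new(device_key, _canonical(report), hashlib.sha256).hexdigest()
    return {"report": report, "tag": tag}


def verify_report(signed: Dict[str, Any], device_key: bytes) -> bool:
    """Server-side check; compare_digest avoids timing side channels."""
    expected = hmac.new(device_key, _canonical(signed["report"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])
```

Changing any field of a signed report, for example a reported queue length, makes verification fail. Canonical JSON with sorted keys ensures that the same report always produces the same tag.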