I was interested in the noise levels in a space, and wanted to visualise how they changed over time. In particular, I wanted to know more about the distribution of noise levels over time, something that no single measure could really provide, and even tools like a series of box plots felt clumsy. This is because the noises I am interested in are not the loudest noises; they typically raise the 60th-80th percentiles, though not always. I don’t have a strong intuition about an appropriate model for the noise levels, so choosing a model to represent them didn’t feel appropriate. Creating a graphic that keeps me as close to the raw data as possible seemed the best idea.
I drew inspiration from Edward Tufte’s wavefields (and here), spectrograms, histograms drawn over time, and my previous experimentation with violin plots. I also like the TensorFlow Summary Histograms, though I think my solution is a bit less heavy and more intuitive to read for my data, which is very unlikely to be multi-modal.
A summary histogram clearly showing a bifurcation in a distribution over some third variable.
I wanted to take the spectrum of noise levels for each minute and plot that distribution in a fairly continuous way over time. In the end I cheated to get the effect I wanted: I drew a line graph with 60 series (the quietest reading in each minute forms the first series, the second quietest the second, and so on), and the rasterization that happens when this is saved as an image makes it appear like a spectrum. The results seem effective, giving an intuitive sense of how the distribution of noise levels has varied over time, with a minimum of interpretation forced by the visualisation. I feel quite close to the data.
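To make the "60 series" trick concrete, here is a minimal sketch of the data reshaping on its own, using only the standard library. The function name `rank_series` and the tiny synthetic readings are made up for illustration; it assumes readings arrive as (minute, dB) pairs and, like my plot, ignores any minute that does not have a full set of readings.

```python
from collections import defaultdict


def rank_series(readings, per_minute=60):
    """Turn (minute, dB) readings into one series per rank.

    Returns (minutes, series), where series[k][i] is the k-th quietest
    reading in minutes[i]. Minutes without exactly `per_minute` readings
    are skipped so that every series has the same length.
    """
    by_minute = defaultdict(list)
    for minute, db in readings:
        by_minute[minute].append(db)

    minutes = sorted(m for m, vals in by_minute.items() if len(vals) == per_minute)
    series = [[] for _ in range(per_minute)]
    for minute in minutes:
        # Sorting within the minute puts the quietest reading at rank 0
        for rank, db in enumerate(sorted(by_minute[minute])):
            series[rank].append(db)
    return minutes, series


# Synthetic example: two "minutes" of three readings each
minutes, series = rank_series(
    [(0, 41.0), (0, 39.5), (0, 45.2), (1, 50.1), (1, 38.0), (1, 40.0)],
    per_minute=3,
)
print(minutes)  # [0, 1]
print(series)   # [[39.5, 38.0], [41.0, 40.0], [45.2, 50.1]]
```

Each `series[k]` can then be drawn as one thin line against `minutes`, coloured by its rank `k`; with 60 ranks the lines sit close enough together that the saved image reads as a continuous band.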
Plot of 24 hours of noise levels (y-axis is noise level in dBA). Changes in the distribution of noise levels are immediately obvious, from a reduction in the variance overnight to occasional increases in the volume (at all percentiles) during the day when there is activity near the sound meter.
I wanted to be able to see this distribution online at any time, so I just set the Raspberry Pi to generate and upload a graph each hour, using the previous 24 hours of data. Not fancy, but it does what I want! I will make more graphics when I get round to it: increasing interactivity on the web page, taking longer and shorter periods, plotting previous mean values of noise at that time over the last week, and so on.
Getting it done:
I connected a sound level meter to a Raspberry Pi, which queries the meter for a reading each second, as per the instructions from http://www.swblabs.com/article/pi-soundmeter and https://codereview.stackexchange.com/questions/113389/read-decibel-level-from-a-usb-meter-display-it-as-a-live-visualization-and-sen. The readings are written to a new CSV file on the Raspberry Pi each minute. This script is started automatically when the Pi boots, so it should run whenever the Pi has power.
#!/usr/bin/python
import sys
import usb.core
import requests
import time
import datetime
import subprocess

streams = "Sound Level Meter:i"
tokens = ""

# find the USB sound level meter by its vendor and product IDs
dev = usb.core.find(idVendor=0x16c0, idProduct=0x5dc)
assert dev is not None
print(dev)
print(hex(dev.idVendor) + ',' + hex(dev.idProduct))

# create the first file in which to save the sound level readings
sound_level_filepath = "/home/pi/Documents/sound_level_records/"
now_datetime_str = time.strftime("%Y_%m_%d_%H_%M", datetime.datetime.now().timetuple())
sound_level_file = open(sound_level_filepath + now_datetime_str, "w")

while True:
    # every minute create a new file in which to save the sound level readings
    now_datetime = datetime.datetime.now()
    if now_datetime.second == 0:
        sound_level_file.close()  # close() must be called, not just named
        now_datetime_str = time.strftime("%Y_%m_%d_%H_%M", now_datetime.timetuple())
        sound_level_file = open(sound_level_filepath + now_datetime_str, "w")
    time.sleep(1)
    # request a reading; the first two bytes of the reply encode the level
    ret = dev.ctrl_transfer(0xC0, 4, 0, 0, 200)
    dB = (ret[0] + ((ret[1] & 3) * 256)) * 0.1 + 30
    timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", now_datetime.timetuple())
    print(timestamp + "," + str(dB))
    sound_level_file.write(timestamp + "," + str(dB) + "\n")
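The decibel conversion in the script above is the least obvious line, so here it is pulled out as a small function that can be tried without the meter attached. This is only my reading of the arithmetic: the low byte plus the bottom two bits of the high byte give a 10-bit count of tenths of a decibel, offset by 30 dB. The name `decode_db` is made up for this sketch.

```python
def decode_db(low_byte, high_byte):
    """Convert the first two bytes of the meter's reply into a decibel value.

    Mirrors the expression used in the logging script: a 10-bit count
    (low byte plus the bottom two bits of the high byte) in steps of
    0.1 dB, starting from 30 dB.
    """
    raw = low_byte + ((high_byte & 3) * 256)
    return raw * 0.1 + 30


print(decode_db(0, 0))               # smallest possible value: 30.0
print(round(decode_db(255, 3), 1))   # largest possible value: 132.3
```

Only the bottom two bits of the second byte are used, so whatever else the meter packs into that byte does not affect the reading.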
Each hour, a cron job on the Raspberry Pi creates a graph of the previous 24 hours of data and uploads it to my website, behind a username and password, so I can see the results by visiting the page. The code is below.
#!/usr/bin/python
print("also hello there!")
import time
import numpy as np
import seaborn as sns
import pandas as pd
print("importing matplotlib")
import matplotlib
print("finished importing matplotlib")
print("importing pylab")
import pylab
print("finished importing pylab")
import matplotlib.pyplot
import matplotlib.colors
import matplotlib.cm
from os import listdir as listdir
from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse
import glob
import os
import ftplib

def myFormatter(x, pos):
    return pd.to_datetime(x)

current_time = datetime.now()
rows = []
x_index = []
list_of_filenames = []
sns.set(color_codes=True)

# collect the per-minute files for today and yesterday
records_path = '/home/pi/Documents/sound_level_records/'
list_of_filenames.append(glob.glob(records_path + time.strftime("%Y_%m_%d", current_time.timetuple()) + '*'))
list_of_filenames.append(glob.glob(records_path + time.strftime("%Y_%m_%d", (current_time - timedelta(days=1)).timetuple()) + '*'))
list_of_filenames = [item for sublist in list_of_filenames for item in sublist]
print(len(list_of_filenames))

# import data from each minute
for filename in list_of_filenames:
    x_ordered_title = datetime.strptime(os.path.basename(filename), '%Y_%m_%d_%H_%M')
    time_difference = current_time - x_ordered_title
    if time_difference.days * 86400 + time_difference.seconds < 86400:  # 24 hours
        x = pd.read_csv(filename, header=None, names=['timestamp', 'dB'])
        x_ordered_data = sorted(x['dB'].tolist())  # quietest reading first
        if len(x_ordered_data) == 60:  # only use complete minutes
            rows.append(x_ordered_data)
            x_index.append(x_ordered_title)

# one row per minute, one column per rank (0 = quietest, 59 = loudest)
combined_dataframe = pd.DataFrame(rows, index=x_index).sort_index()

fig = matplotlib.pyplot.figure(dpi=200, figsize=(10, 10))
jet = matplotlib.pyplot.get_cmap('jet')
cNorm = matplotlib.colors.Normalize(vmin=1, vmax=60)
scalarMap = matplotlib.cm.ScalarMappable(norm=cNorm, cmap=jet)
# draw one thin line per rank, coloured by rank
for count in range(2, 58):
    colorVal = scalarMap.to_rgba(count)
    matplotlib.pyplot.plot(combined_dataframe.index, combined_dataframe.xs(count, axis=1), linewidth=0.5, color=colorVal)
pylab.savefig('/home/pi/Documents/Sound_meter_graphs/test.png')

try:
    ftp = ftplib.FTP("davidjohnhewlett.co.uk", "user", "password")
    # ftplib uses passive mode by default; call ftp.set_pasv(False) if active mode is needed
    ftp.cwd("/public_html/sound_levels/")
    with open('/home/pi/Documents/Sound_meter_graphs/test.png', 'rb') as f_file:
        ftp.storbinary('STOR test.png', f_file)
    ftp.close()
except Exception as error:
    print("oh dear: " + str(error))
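The step that picks out the last 24 hours of files relies entirely on the timestamp in each file name, and can be tried in isolation. The sketch below is a standalone version of that step rather than the code I run: `files_in_window` is a made-up name, the example paths and dates are invented, and unlike the script it also skips names that do not parse and files stamped in the future.

```python
import os
from datetime import datetime, timedelta

FILENAME_FORMAT = "%Y_%m_%d_%H_%M"  # same pattern the logging script uses for file names


def files_in_window(filenames, now, window=timedelta(hours=24)):
    """Return (timestamp, filename) pairs no older than `window`, oldest first."""
    selected = []
    for filename in filenames:
        try:
            stamp = datetime.strptime(os.path.basename(filename), FILENAME_FORMAT)
        except ValueError:
            continue  # not a per-minute record, so ignore it
        # Keep files from the last `window`; future-dated files are dropped
        # (that lower bound is an addition in this sketch).
        if timedelta(0) <= now - stamp < window:
            selected.append((stamp, filename))
    return sorted(selected)


# Hypothetical file names and clock time, for illustration only
now = datetime(2017, 3, 2, 12, 0)
names = [
    "/tmp/records/2017_03_02_11_59",  # one minute old: kept
    "/tmp/records/2017_03_01_11_59",  # just over 24 hours old: dropped
    "/tmp/records/notes.txt",         # not a record: skipped
]
for stamp, name in files_in_window(names, now):
    print(stamp, name)
```

Passing `now` in as an argument, rather than calling `datetime.now()` inside the function, makes it easy to check the boundary cases by hand.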
Looking back at this project, it has taken a surprisingly familiar form: I have largely constructed it out of existing pieces of code, bolting them together. Even my previous practice with connecting to the Raspberry Pi using SSH has been helpful, transferring code to and from the Pi whilst it is running in headless mode.
One particular issue I had not come across before, and which was quite difficult to diagnose, was that the Raspberry Pi did not create graphs when running headless, even though it produced them whenever I tested it plugged in. Perhaps unsurprisingly, the difference between the two was that I had a screen connected when I was testing the Pi, but not when the Pi was in working mode. Having a screen connected was significant because the backend for matplotlib was not loaded when no screen was connected, so I needed to change the backend.
import matplotlib
matplotlib.use('Agg')
This did not work for me as it had for others. After a lot of frustration, changing the backend to ‘Agg’ in the configuration file for matplotlib (matplotlibrc) seemed to work, as discussed here.
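For anyone hitting the same problem, the configuration change amounts to a single `backend: Agg` line in the matplotlibrc file. The sketch below writes that line with plain file handling; it is an illustration, not what I ran (I edited the file by hand). The function name `set_backend_in_rc` is made up, and the path in the usage comment assumes the usual per-user location on Linux. Calling `matplotlib.matplotlib_fname()` reports which file matplotlib is actually reading.

```python
import os


def set_backend_in_rc(rc_path, backend="Agg"):
    """Write `backend: <backend>` into a matplotlibrc file.

    Any existing `backend` setting is replaced; all other lines,
    including comments and keys such as `backend_fallback`, are kept.
    """
    lines = []
    if os.path.exists(rc_path):
        with open(rc_path) as rc_file:
            lines = rc_file.read().splitlines()

    # Compare the whole key before the colon so only `backend` itself matches
    kept = [ln for ln in lines if ln.split(":")[0].strip() != "backend"]
    kept.append("backend: " + backend)

    directory = os.path.dirname(rc_path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(rc_path, "w") as rc_file:
        rc_file.write("\n".join(kept) + "\n")


# Hypothetical usage, assuming the per-user config location on Linux:
# set_backend_in_rc(os.path.expanduser("~/.config/matplotlib/matplotlibrc"))
```

Because the setting lives in the configuration file, it applies however the script is launched, including from cron with no screen attached.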