I was interested in the noise levels in a space, and wanted to visualise how they changed over time. However, I wanted to know more about the distribution of noise levels over time, something that no single summary measure could really provide, and even tools like a series of box plots felt clumsy. This is because the noises I am interested in are not the loudest noises: they typically increase the 60th–80th percentiles, though not always. I don’t have a strong intuition about an appropriate model for the noise levels, so choosing a model for their representation didn’t feel appropriate; creating a graphic that keeps me as close to the raw data as possible seemed the best idea.
I drew inspiration from Edward Tufte’s wavefields (and here), spectrograms, histograms drawn over time, and my previous experimentation with violin plots. I also like the TensorFlow summary histograms, though I think my solution is a bit lighter and more intuitive to read for my data, which is very unlikely to be multi-modal.
Tensorflow Summary Histogram
A summary histogram clearly showing a bifurcation in a distribution over some third variable.
I wanted to take the spectrum of noise levels for each minute and plot that distribution in a fairly continuous way over time. In the end I cheated to get the effect I wanted: I drew a line graph with 60 series (the quietest reading in each minute forms the first series, then the second quietest, and so on), and the rasterisation process when this is saved as an image makes it appear like a spectrum. The results seem effective, giving an intuitive sense of how the distribution of noise levels has varied over time, with a minimum of interpretation forced by the visualisation; I feel quite close to the data.
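The construction of those 60 series can be sketched with synthetic data (the shapes below are illustrative stand-ins, not my actual readings):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in data: 24 hours of minutes, 60 readings (seconds) per minute
minutes = 24 * 60
readings = rng.normal(50.0, 5.0, size=(minutes, 60))

# sort each minute's readings: column 0 holds the quietest second of every
# minute, column 59 the loudest, giving 60 percentile-like series over time
ordered = np.sort(readings, axis=1)
print(ordered.shape)  # (1440, 60)
```

Plotting each column as a thin line and saving the figure as a raster image produces the spectrum-like effect.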
24 hours of data viewed as a spectrum
Plot of 24 hours of noise levels (the y-axis is noise level in dBA). Changes in the distribution of noise levels are immediately obvious, from a reduction in variance overnight to occasional increases in volume (at all percentiles) during the day when there is activity near the sound meter.
I wanted to be able to see this distribution online at any time, so I set the raspberry pi to generate and upload a graph each hour, using the previous 24 hours of data. Not fancy, but it does what I want! I will make more graphics when I get round to it: increasing interactivity on the web page, taking longer and shorter periods, plotting the mean values of noise at each time over the last week, and so on.
Getting it done:
I used a sound level meter connected to a raspberry pi; the raspberry pi queries the meter for a reading each second, following the instructions at http://www.swblabs.com/article/pi-soundmeter and https://codereview.stackexchange.com/questions/113389/read-decibel-level-from-a-usb-meter-display-it-as-a-live-visualization-and-sen. The readings are saved each minute into a csv file on the raspberry pi. The script is started automatically when the pi boots, so it should run whenever the pi has power.
import time
import datetime
import usb.core  # pyusb

# find the USB sound level meter; these vendor/product IDs are the ones used
# for the WENSN-style meter in the codereview post linked above
dev = usb.core.find(idVendor=0x16c0, idProduct=0x5dc)
assert dev is not None

# create the first file in which to save the sound level readings
sound_level_filepath = "/home/pi/Documents/sound_level_records/"
now_datetime_str = time.strftime("%Y_%m_%d_%H_%M", datetime.datetime.now().timetuple())
sound_level_file = open(sound_level_filepath + now_datetime_str, "w")

while True:
    now_datetime = datetime.datetime.now()
    # every minute create a new file in which to save the sound level readings
    if now_datetime.second == 0:
        sound_level_file.close()
        now_datetime_str = time.strftime("%Y_%m_%d_%H_%M", now_datetime.timetuple())
        sound_level_file = open(sound_level_filepath + now_datetime_str, "w")
    # read the current level over USB; the returned bytes encode the dB(A) value
    ret = dev.ctrl_transfer(0xC0, 4, 0, 0, 200)
    dB = (ret[0] + ((ret[1] & 3) * 256)) * 0.1 + 30
    timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", now_datetime.timetuple())
    print(timestamp + "," + str(dB))
    sound_level_file.write(timestamp + "," + str(dB) + "\n")
    time.sleep(1)
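One simple way to start a script like this on boot (the exact mechanism I used isn’t shown here, and the script name below is hypothetical) is a crontab `@reboot` entry:

```shell
# as the pi user, run `crontab -e` and add:
@reboot python /home/pi/Documents/sound_level_logger.py >> /home/pi/Documents/sound_logger.log 2>&1
```

Redirecting stdout and stderr to a log file makes it much easier to see why the logger died if it stops recording.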
Each hour, a scheduled task on the raspberry pi (using cron) creates a graph of the previous 24 hours of data and uploads it to my website, behind a username and password, so I can see the results by visiting the page. The code is below.
import glob
import os
import time
import ftplib
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, needed when running headless
import matplotlib.pyplot as plt

current_time = datetime.now()

# collect the per-minute files from today and yesterday
list_of_filenames = []
list_of_filenames.extend(glob.glob('/home/pi/Documents/sound_level_records/' + time.strftime("%Y_%m_%d", current_time.timetuple()) + '*'))
list_of_filenames.extend(glob.glob('/home/pi/Documents/sound_level_records/' + time.strftime("%Y_%m_%d", (current_time - timedelta(days=1)).timetuple()) + '*'))

# import data from each minute
rows = []
x_index = []
for filename in list_of_filenames:
    file_time = datetime.strptime(os.path.basename(filename), '%Y_%m_%d_%H_%M')
    time_difference = current_time - file_time
    if time_difference.days * 86400 + time_difference.seconds < 86400:  # 24 hours
        x = pd.read_csv(filename, header=None, names=['timestamp', 'dB'])
        x_ordered = x.sort_values('dB')  # quietest reading first
        if len(x_ordered) == 60:  # only use complete minutes
            rows.append(np.reshape(x_ordered['dB'].tolist(), (1, 60)))
            x_index.append(file_time)

combined_dataframe = pd.DataFrame(np.concatenate(rows), index=x_index)
combined_dataframe.sort_index(inplace=True)

# plot the 60 ordered series, coloured from quietest to loudest
fig = plt.figure(dpi=200, figsize=(10, 10))
jet = plt.get_cmap('jet')
cNorm = matplotlib.colors.Normalize(vmin=1, vmax=60)
scalarMap = matplotlib.cm.ScalarMappable(norm=cNorm, cmap=jet)
for count in range(2, 58):  # skip the most extreme series to limit outliers
    colorVal = scalarMap.to_rgba(count)
    plt.plot(combined_dataframe.index, combined_dataframe.xs(count, axis=1), linewidth=0.5, color=colorVal)
plt.savefig('/home/pi/Documents/Sound_meter_graphs/test.png')

# upload the finished image to the website over FTP
ftp = ftplib.FTP("davidjohnhewlett.co.uk", "user", "password")
ftp.set_pasv(False)
with open('/home/pi/Documents/Sound_meter_graphs/test.png', 'rb') as f_file:
    ftp.storbinary('STOR test.png', f_file)
ftp.quit()
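The hourly schedule can be set with a crontab entry along these lines (the script path is an assumption for illustration):

```shell
# crontab -e on the pi: run the plot-and-upload script at the top of every hour
0 * * * * python /home/pi/Documents/make_sound_graph.py >> /home/pi/Documents/graph_cron.log 2>&1
```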
Looking back at this project, it has taken a surprisingly familiar form: I have largely constructed it out of existing pieces of code, bolting them together. Even my previous practice connecting to the raspberry pi over SSH has been helpful, for transferring code to and from the pi while it runs in headless mode.
One particular issue I had not come across before, and which was quite difficult to diagnose, was the raspberry pi initially not creating graphs when running headless, even though it produced them whenever I tested it plugged in. Perhaps unsurprisingly, the difference between the two situations was that I had a screen connected when testing but not when the pi was in working mode. This mattered because matplotlib’s default backend would not load without a screen connected; I needed to change the backend.
The first fix I tried did not work as it had for others. After a lot of frustration, changing the backend to ‘Agg’ in matplotlib’s configuration file seemed to work, as discussed here.
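A minimal check that the fix works headlessly: force the non-interactive Agg backend before pyplot is imported, then write a figure straight to disk with no display attached.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # must be set before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [30, 45, 38])
out_path = os.path.join(tempfile.gettempdir(), 'agg_test.png')
fig.savefig(out_path)  # succeeds even with no screen/X server
plt.close(fig)
print(os.path.exists(out_path))  # True
```

Setting `backend : Agg` in the matplotlibrc configuration file applies the same fix without touching the script, which is the route that worked for me.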