{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "finite-correspondence",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "# Dynamic Test\n",
    "We compared our low-cost UBC - PM sensors to an industry-standard GRIMM OPC sensor (Model: 1.109). For the comparison, saltwater solutions with known molar concentrations were fed into an atomizer. The atomizer separated/produced air-borne salt particles that traveled through an adjustable filter and into a 22L glass chamber containing the five UBC – PM Sensors and a small fan to mix the air (Figure 4CR). Simultaneously, the Grimm OPC sensor pulled air from the glass chamber to sample particle concentrations. The chamber also had an exhaust port to help maintain consistent concentration levels.\n",
    "<br>\n",
    "<br>\n",
    "<img src=\"_static/img/chamber.jpeg\" alt=\"chamber\" width=\"300\"/>\n",
    "<br>\n",
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "secure-bookmark",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "The primary objective was to compare the UBC - PM Sensor's static and dynamic response to the GRIMM instrument. To accomplish this, adjustments to the inflow filter were made to moderate the concentrations of particles entering the chamber. An appropriate amount of time was given so that the concentration levels stabilize. Sampling time ranged between 15-30 minutes. Concentration values were varied so that a calibration curve could be derived and then applied to the UBC – PM sensors to correlate to the GRIMM measured values. In developing the calibration cure it was assumed that the GRIMM OPC provided a true measure of PM concentrations.\n",
    "<br>\n",
    "<br>\n",
    "We conducted 16.5 hours of sampling over a two-day period. The GRIMM and UBC – PM sensors sample at one minute and 1.6 seconds, respectively. The UBC - PM sensors were time-averaged to the same one-minute sampling period of the GRIMM instrument for comparison. Of the five UBC – PM sensors, one malfunctioned and reported erroneous values and was removed from the study. \n",
    "<br>\n",
    "<br>\n",
    "Lets Dive in and assess the the first 12 hours of sampling data (obtained on day one)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sonic-constitution",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "import context\n",
    "import pickle\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from functools import reduce\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.dates import DateFormatter\n",
    "from matplotlib.offsetbox import AnchoredText\n",
    "import matplotlib.lines as mlines\n",
    "import matplotlib.transforms as mtransforms\n",
    "import matplotlib.pylab as pylab\n",
    "import sklearn\n",
    "from sklearn.neural_network import MLPRegressor\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from scipy import stats\n",
    "\n",
    "from datetime import datetime, timedelta\n",
    "from pathlib import Path\n",
    "from context import data_dir, save_dir\n",
    "\n",
    "# Back pratices but this keeps the notebook clean when displaying on html\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "equipped-current",
   "metadata": {},
   "source": [
    "\n",
    "## Import OPC data!\n",
    "UBC AQ data was one minute averaged to match the GRIMM sample rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "arranged-envelope",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "## define date of interest\n",
    "date_of_int = '20210430' # options 20210430 or 20210502\n",
    "\n",
    "\n",
    "def prepare_df(ubc_pm, date_of_int):\n",
    "  \"\"\"\n",
    "  Function cleans UBC OPCs data by removing duplicate headers and dropping error values. \n",
    "  Once cleaned takes 1 min avg of data to match GRIM sample rate.\n",
    "  \"\"\"\n",
    "  df = pd.read_csv(str(data_dir) + ubc_pm + date_of_int + '.TXT')\n",
    "  df = df.drop(df[df.rtctime == 'rtctime'].index)\n",
    "  df = df[~df['pm10_env'].str.contains('Rec')]\n",
    "  time = pd.to_datetime(df['rtctime'])\n",
    "  df.index = pd.DatetimeIndex(pd.to_datetime(df['rtctime']))\n",
    "  df = df.drop(['rtctime'], axis=1)\n",
    "  df = df.astype(float)\n",
    "  df = df[df['pm10_env'] < 4000]\n",
    "  df = df[df['pm25_env'] < 4000]\n",
    "  df = df[df['pm100_env'] < 4000]\n",
    "\n",
    "  df = df.resample('1Min').mean()\n",
    "\n",
    "  col = ubc_pm.strip('/').replace('-','_')\n",
    "  new_name = {}\n",
    "  for name in list(df):\n",
    "    new_name.update({name : col+'_'+name} )\n",
    "  df = df.rename(new_name, axis='columns')\n",
    "\n",
    "  return df\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dress-skating",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Open the ubc opcs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "considerable-connectivity",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "ubc_list = [prepare_df(f\"/UBC-PM-0{i}/\", date_of_int) for i in range(1,6)]\n",
    "df_ubc = reduce(lambda x, y: pd.merge(x, y, on='rtctime'), ubc_list)\n",
    "df_ubc.columns = df_ubc.columns.str.replace('_y', '')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "organized-truth",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Open the grim opc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "korean-destiny",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "def open_grimm(date_of_int, min_ubc):\n",
    "  try:\n",
    "    df_grim = pd.read_csv(str(data_dir) + f'/GRIM/{date_of_int}.csv')\n",
    "    df_grim['date & time'] = pd.to_datetime(df_grim['date & time'])\n",
    "  except:\n",
    "    pathlist = sorted(Path(str(data_dir) + '/2021_OPCintercomparison/').glob(f'{date_of_int}*'))\n",
    "    sheets = ['PM values', 'Count values', 'Mass values', 'Log values']\n",
    "    df_grim = [pd.read_excel(f'{pathlist[0]}/{pathlist[0].stem}_sample01.xlsx', sheet_name= sheet, skiprows=4, engine='openpyxl') for sheet in sheets]\n",
    "    df_grim = reduce(lambda x, y: pd.merge(x, y, on='date & time'), df_grim)\n",
    "    df_grim['date & time'] = pd.to_datetime(df_grim['date & time'], dayfirst=True)\n",
    "    df_grim.to_csv(str(data_dir) + f'/GRIM/{date_of_int}.csv', index=False)\n",
    "\n",
    "\n",
    "  df_grim['date & time'] = df_grim['date & time'].dt.round('1min')  \n",
    "  df_grim = df_grim.set_index('date & time')\n",
    "  df_grim = df_grim.join(min_ubc, how='outer')\n",
    "  df_grim = df_grim.dropna()\n",
    "  return df_grim\n",
    "\n",
    "min_ubc = np.argmin([len(ubc_list[i]) for i in range(len(ubc_list))])\n",
    "df_grim = open_grimm(date_of_int, ubc_list[min_ubc])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "stretch-scope",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Merge UBC dfs with Grimm df and print first five rows "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fewer-policy",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "df_final = pd.merge(left=df_grim, right=df_ubc, left_index=True, right_index=True, how='left')\n",
    "df_final.columns = df_final.columns.str.replace('_y', '')\n",
    "df_final.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "accomplished-publicity",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Define default font sizes for ploting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "equivalent-orbit",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "params = {\n",
    "         'xtick.labelsize':14,\n",
    "         'ytick.labelsize': 14,\n",
    "          'axes.labelsize':14,\n",
    "\n",
    "         }\n",
    "\n",
    "pylab.rcParams.update(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "weighted-prediction",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Plot PM 1.0 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "forced-viking",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "colors = plt.rcParams['axes.prop_cycle'].by_key()['color']\n",
    "\n",
    "\n",
    "fig = plt.figure(figsize=(14, 12))\n",
    "fig.autofmt_xdate()\n",
    "xfmt = DateFormatter(\"%m-%d %H:00\")\n",
    "fig.suptitle(r\"PM 1.0 ($\\frac{\\mu g}{m^3}$)\", fontsize=16)\n",
    "ax = fig.add_subplot(2, 1, 2)\n",
    "ax.plot(df_final.index,df_final['PM1 [ug/m3]'], lw = 4.0, label = 'GRIMM', color = colors[0])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_01_pm10_env'], label = 'UBC-PM-01', color = colors[1])\n",
    "# ax.plot(df_final.index,df_final['UBC_PM_02_pm10_env'], label = 'UBC-PM-02', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_03_pm10_env'], label = 'UBC-PM-03', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_04_pm10_env'], label = 'UBC-PM-04', color = colors[3])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_05_pm10_env'], label = 'UBC-PM-05', color = colors[4])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel('Time (MM-DD HH)')\n",
    "ax.legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.5, 2.44),\n",
    "    ncol=6,\n",
    "    fancybox=True,\n",
    "    shadow=True,\n",
    ")\n",
    "ax = fig.add_subplot(2, 2, 1)\n",
    "size = 6\n",
    "ax.scatter(df_final['PM1 [ug/m3]'], df_final['PM1 [ug/m3]'], s=size, color = colors[0])\n",
    "ax.scatter(df_final['PM1 [ug/m3]'], df_final['UBC_PM_01_pm10_env'], s=size, color = colors[1])\n",
    "# ax.scatter(df_grim['PM1 [ug/m3]'], df_final['UBC_PM_02_pm10_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM1 [ug/m3]'], df_final['UBC_PM_03_pm10_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM1 [ug/m3]'], df_final['UBC_PM_04_pm10_env'], s=size, color = colors[3])\n",
    "ax.scatter(df_final['PM1 [ug/m3]'], df_final['UBC_PM_05_pm10_env'], s=size, color = colors[4])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')\n",
    "\n",
    "ax = fig.add_subplot(2, 2, 2)\n",
    "bins = 20\n",
    "alpha = 0.6\n",
    "ax.hist(df_final['PM1 [ug/m3]'],bins,alpha = alpha, color = colors[0], zorder = 10)\n",
    "ax.hist(df_final['UBC_PM_01_pm10_env'],bins, alpha = alpha, color = colors[1])\n",
    "# ax.hist(df_final['UBC_PM_02_pm10_env'],bins, alpha = alpha,color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_03_pm10_env'],bins, alpha = alpha, color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_04_pm10_env'],bins, alpha = alpha,color = colors[3])\n",
    "ax.hist(df_final['UBC_PM_05_pm10_env'],bins, alpha = alpha, color = colors[4])\n",
    "ax.set_ylabel('Count')\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "developed-rider",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Figure 1; scatter plots, histograms, and a times series comparison of the measured PM 1.0 concentration of four UBC- PM Sensors [UBC-PM-01 (orange), UBC-PM-03 (green), UBC-PM-04 (red), UBC-PM-05 (purple)] and the GRIM sensor [GRIMM (blue)]. The time series shows one-minute averaged PM 1.0 concentrations incremented from 2021-04-30 09:27:00 until 2021-04-30 21:36:00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "severe-investing",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Plot PM 2.5 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "capital-closure",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "fig = plt.figure(figsize=(14, 12))\n",
    "fig.autofmt_xdate()\n",
    "xfmt = DateFormatter(\"%m-%d %H:00\")\n",
    "fig.suptitle(r\"PM 2.5 ($\\frac{\\mu g}{m^3}$)\", fontsize=16)\n",
    "ax = fig.add_subplot(2, 1, 2)\n",
    "ax.plot(df_final.index,df_final['PM2.5 [ug/m3]'], lw = 4.0, label = 'GRIMM', color = colors[0])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_01_pm25_env'], label = 'UBC-PM-01', color = colors[1])\n",
    "# ax.plot(df_final.index,df_final['UBC_PM_02_pm10_env'], label = 'UBC-PM-02', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_03_pm25_env'], label = 'UBC-PM-03', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_04_pm25_env'], label = 'UBC-PM-04', color = colors[3])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_05_pm25_env'], label = 'UBC-PM-05', color = colors[4])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel('Time (MM-DD HH)')\n",
    "ax.legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.5, 2.44),\n",
    "    ncol=6,\n",
    "    fancybox=True,\n",
    "    shadow=True,\n",
    ")\n",
    "ax = fig.add_subplot(2, 2, 1)\n",
    "size = 6\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['PM2.5 [ug/m3]'], s=size, color = colors[0])\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['UBC_PM_01_pm25_env'], s=size, color = colors[1])\n",
    "# ax.scatter(df_grim['PM2.5 [ug/m3]'], df_final['UBC_PM_02_pm25_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['UBC_PM_03_pm25_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['UBC_PM_04_pm25_env'], s=size, color = colors[3])\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['UBC_PM_05_pm25_env'], s=size, color = colors[4])\n",
    "ax.set_ylabel(r'UBC $\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel(r'GRIMM $\\frac{\\mu g}{m^3}$')\n",
    "\n",
    "ax = fig.add_subplot(2, 2, 2)\n",
    "bins = 20\n",
    "alpha = 0.6\n",
    "ax.hist(df_final['PM2.5 [ug/m3]'],bins,alpha = alpha, color = colors[0], zorder = 10)\n",
    "ax.hist(df_final['UBC_PM_01_pm25_env'],bins, alpha = alpha, color = colors[1])\n",
    "# ax.hist(df_final['UBC_PM_02_pm25_env'],bins, alpha = alpha,color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_03_pm25_env'],bins, alpha = alpha, color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_04_pm25_env'],bins, alpha = alpha,color = colors[4])\n",
    "ax.hist(df_final['UBC_PM_05_pm25_env'],bins, alpha = alpha, color = colors[5])\n",
    "ax.set_ylabel('Count')\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acceptable-skiing",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Figure 2; scatter plots, histograms, and a times series comparison of the measured PM 2.5 concentration of four UBC- PM Sensors [UBC-PM-01 (orange), UBC-PM-03 (green), UBC-PM-04 (red), UBC-PM-05 (purple)] and the GRIM sensor [GRIMM (blue)]. The time series shows one-minute averaged PM 2.5 concentrations incremented from 2021-04-30 09:27:00 until 2021-04-30 21:36:00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "respiratory-platinum",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Plot PM 10 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "thousand-plaintiff",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "\n",
    "fig = plt.figure(figsize=(14, 12))\n",
    "fig.autofmt_xdate()\n",
    "xfmt = DateFormatter(\"%m-%d %H:00\")\n",
    "fig.suptitle(r\"PM 10 ($\\frac{\\mu g}{m^3}$)\", fontsize=16)\n",
    "ax = fig.add_subplot(2, 1, 2)\n",
    "ax.plot(df_final.index,df_final['PM10 [ug/m3]'], lw = 4.0, label = 'GRIMM', color = colors[0])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_01_pm100_env'], label = 'UBC-PM-01', color = colors[1])\n",
    "# ax.plot(df_final.index,df_final['UBC_PM_02_pm100_env'], label = 'UBC-PM-02', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_03_pm100_env'], label = 'UBC-PM-03', color = colors[2])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_04_pm100_env'], label = 'UBC-PM-04', color = colors[3])\n",
    "ax.plot(df_final.index,df_final['UBC_PM_05_pm100_env'], label = 'UBC-PM-05', color = colors[4])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel('Time (MM-DD HH)')\n",
    "ax.legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.5, 2.44),\n",
    "    ncol=6,\n",
    "    fancybox=True,\n",
    "    shadow=True,\n",
    ")\n",
    "ax = fig.add_subplot(2, 2, 1)\n",
    "size = 6\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['PM10 [ug/m3]'], s=size, color = colors[0])\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['UBC_PM_01_pm100_env'], s=size, color = colors[1])\n",
    "# ax.scatter(df_grim['PM10 [ug/m3]'], df_final['UBC_PM_02_pm100_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['UBC_PM_03_pm100_env'], s=size, color = colors[2])\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['UBC_PM_04_pm100_env'], s=size, color = colors[3])\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['UBC_PM_05_pm100_env'], s=size, color = colors[4])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')\n",
    "\n",
    "ax = fig.add_subplot(2, 2, 2)\n",
    "bins = 20\n",
    "alpha = 0.6\n",
    "ax.hist(df_final['PM10 [ug/m3]'],bins,alpha = alpha, color = colors[0], zorder = 10)\n",
    "ax.hist(df_final['UBC_PM_01_pm100_env'],bins, alpha = alpha, color = colors[1])\n",
    "# ax.hist(df_final['UBC_PM_02_pm10_env'],bins, alpha = alpha,color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_03_pm100_env'],bins, alpha = alpha, color = colors[2])\n",
    "ax.hist(df_final['UBC_PM_04_pm100_env'],bins, alpha = alpha,color = colors[3])\n",
    "ax.hist(df_final['UBC_PM_05_pm100_env'],bins, alpha = alpha, color = colors[4])\n",
    "ax.set_ylabel('Count')\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "intended-shoulder",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Figure 3; scatter plots, histograms, and a times series comparison of the measured PM 10 concentration of four UBC- PM Sensors [UBC-PM-01 (orange), UBC-PM-03 (green), UBC-PM-04 (red), UBC-PM-05 (purple)] and the GRIM sensor [GRIMM (blue)]. The time series shows one-minute averaged PM 10 concentrations incremented from 2021-04-30 09:27:00 until 2021-04-30 21:36:00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "prescription-photography",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "## Preliminary Results\n",
    "Based upon the above graphics the UBC- PM sensors perform acceptably at PM 1.0 concentrations; however, they significantly over-predict the concentration levels of PM 2.5 and 10. \n",
    "<br>\n",
    "To understand why this occurred let's looked at a time series of the GRIMM PM 1.0, 2.5 and 10 and the particle size distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wanted-breakdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Plot time series PM 1.0, 2.5 and 10 from the GRIMM sensor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "automated-matrix",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(14, 12))\n",
    "fig.autofmt_xdate()\n",
    "xfmt = DateFormatter(\"%m-%d %H:00\")\n",
    "alpha = 0.7\n",
    "\n",
    "fig.suptitle(r\"GRIMM PM [1, 2.5, 10] ($\\frac{\\mu g}{m^3}$)\", fontsize=16)\n",
    "ax = fig.add_subplot(2, 1, 2)\n",
    "ax.plot(df_final.index,df_final['PM1 [ug/m3]'], lw = 4.0, label = 'PM1', alpha = alpha,color = colors[0])\n",
    "ax.plot(df_final.index,df_final['PM2.5 [ug/m3]'], lw = 4.0, label = 'PM2.5',alpha = alpha, color = colors[1])\n",
    "ax.plot(df_final.index,df_final['PM10 [ug/m3]'], lw = 4.0, label = 'PM10', alpha = alpha,color = colors[2])\n",
    "\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel('Time (MM-DD HH)')\n",
    "ax.legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.5, 2.44),\n",
    "    ncol=6,\n",
    "    fancybox=True,\n",
    "    shadow=True,\n",
    ")\n",
    "ax = fig.add_subplot(2, 2, 1)\n",
    "size = 6\n",
    "ax.scatter(df_final['PM1 [ug/m3]'],df_final['PM1 [ug/m3]'], s=size, alpha = alpha,color = colors[0])\n",
    "ax.scatter(df_final['PM2.5 [ug/m3]'], df_final['PM2.5 [ug/m3]'], s=size,alpha = alpha, color = colors[1])\n",
    "ax.scatter(df_final['PM10 [ug/m3]'], df_final['PM10 [ug/m3]'], s=size, alpha = alpha,color = colors[2])\n",
    "\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')\n",
    "\n",
    "ax = fig.add_subplot(2, 2, 2)\n",
    "bins = 20\n",
    "ax.hist(df_final['PM1 [ug/m3]'],bins,alpha = alpha, color = colors[0], zorder = 10)\n",
    "ax.hist(df_final['PM2.5 [ug/m3]'],bins,alpha = alpha, color = colors[1], zorder = 10)\n",
    "ax.hist(df_final['PM10 [ug/m3]'],bins,alpha = alpha, color = colors[2], zorder = 10)\n",
    "\n",
    "ax.set_ylabel('Count')\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "greenhouse-outreach",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Figure 4: A time series of the  GRIMM sensor PM [1, 2.5, 10] incremented from 2021-04-30 09:27:00 until 2021-04-30 21:36:00. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "analyzed-diana",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Plot counts of particle sizes averaged over time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "employed-rally",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "var_list = list(df_final)[6:37]\n",
    "var_labels = [var[:-2] for var in var_list]\n",
    "df_final_mean = df_final[var_list].mean()\n",
    "\n",
    "fig = plt.figure(figsize=(14, 6))\n",
    "fig.suptitle(\"Time averaged counts/liter of particles of size XX um\", fontsize=16)\n",
    "ax = fig.add_subplot(1, 1, 1)\n",
    "ax.bar(var_labels,df_final_mean)\n",
    "ax.set_ylabel('Counts/liter')\n",
    "ax.set_xlabel('Particles of size XX um')\n",
    "ax.set_xticklabels(var_labels, rotation=70, ha='right')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "metropolitan-heart",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Figure 5 Time-averaged counts/liter of particles of size XX um, as measured by the GRIMM sensor.\n",
    "<br>\n",
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "satisfied-farming",
   "metadata": {},
   "source": [
    "As depicted above all air-borne salt particles are 0.7 um or lower, with the majority at 0.3 um, implying that no deviation in PM concentration for PM 1.0, 2.5, and 10 values should exist. Comparing the GRIMM measured PM [1.0, 2.5, 10] supports this consideration.\n",
    "<br>\n",
    "<br>\n",
    "Based upon the above it is believed that our UBC-PM sensors aren’t sensitive enough to delineate clusters of small particles from larger particles. As a result, higher concentration values in PM 2.5 and 10 are indicated. More work needs to be done to support this statement."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "announced-surface",
   "metadata": {},
   "source": [
    "## Calibration\n",
    "We will construct a multilayer perceptron (MLP) model to try and correct our UBC-Pm sensor inaccuracy. We will test our UBC-PM-03 sensors using measured PM2.5 and particle counts at 03um, 05um, 10um 25um 50um, and 100um as input in the MLP. The MLP target will be the GRIMM PM2.5"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "daily-teacher",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Correct PM 2.5 with an linear regrestion.\n",
    "#### First, open another dataset and merge with the one we have been using."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "reflected-allen",
   "metadata": {},
   "outputs": [],
   "source": [
    "# open all ubc_pm datafiles into a list\n",
    "ubc_list = [prepare_df(f\"/UBC-PM-0{i}/\", '20210502') for i in range(1,6)]\n",
    "## combine them to one dataframe\n",
    "df_ubc = reduce(lambda x, y: pd.merge(x, y, on='rtctime'), ubc_list)\n",
    "## drop odd character header added when combining\n",
    "df_ubc.columns = df_ubc.columns.str.replace('_y', '')\n",
    "## find the ubc dataframe with the lowest time stapms\n",
    "min_ubc = np.argmin([len(ubc_list[i]) for i in range(len(ubc_list))])\n",
    "## open the GRIMM dataset and restrict time to smallest ubc pm timeseries\n",
    "df_grim = open_grimm('20210502', ubc_list[min_ubc])\n",
    "## combine all the ubc pm dataframes with the grimm dataframe\n",
    "df_final2 = pd.merge(left=df_grim, right=df_ubc, left_index=True, right_index=True, how='left')\n",
    "## drop odd character header added when combining\n",
    "df_final2.columns = df_final2.columns.str.replace('_y', '')\n",
    "## drop the last 30 mins for this datafile as the sensors were removed form the glass chamber but still sampling\n",
    "df_final2 = df_final2[:'2021-05-02T18:50']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "standing-somalia",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Correct PM 2.5 with an MLP\n",
    "First, subset dataframe and normalize"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "forty-massachusetts",
   "metadata": {},
   "outputs": [],
   "source": [
    "pm = '2.5' \n",
    "pm_u = pm.replace('.', '')\n",
    "# pm = pm[0]\n",
    "ubc_pm = 'UBC_PM_03_pm'\n",
    "keep_vars = [f'PM{pm} [ug/m3]', f'{ubc_pm}{pm_u}_env', f'{ubc_pm[:-3]}_particles_03um', f'{ubc_pm[:-3]}_particles_05um', f'{ubc_pm[:-3]}_particles_10um', f'{ubc_pm[:-3]}_particles_25um', f'{ubc_pm[:-3]}_particles_50um', f'{ubc_pm[:-3]}_particles_100um']\n",
    "df = df_final.drop(columns=[col for col in df_final if col not in keep_vars])\n",
    "# df[f'{ubc_pm}{pm_u}_env2'] = df[f'{ubc_pm}{pm_u}_env']\n",
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sapphire-provision",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Normalize data and print"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "collaborative-warren",
   "metadata": {},
   "outputs": [],
   "source": [
    "scaler = MinMaxScaler()\n",
    "scaled = scaler.fit_transform(df)\n",
    "\n",
    "# split up data\n",
    "y = scaled[:,0]\n",
    "x = scaled[:,1:]\n",
    "\n",
    "try:\n",
    "  # load mlp\n",
    "  with open(str(data_dir)+f'/mlps/{ubc_pm}{pm_u}.pkl', 'rb') as fid:\n",
    "      model = pickle.load(fid)\n",
    "  print(f'found mpl for {ubc_pm}{pm_u}')\n",
    "except:\n",
    "  print(f'can not find mpl for {ubc_pm}{pm_u}, building...')\n",
    "  nhn = 8 \n",
    "  hidden_layers = 2\n",
    "  model = MLPRegressor(\n",
    "      hidden_layer_sizes=(nhn, hidden_layers),\n",
    "      verbose=False,\n",
    "      max_iter=1500,\n",
    "      early_stopping=True,\n",
    "      validation_fraction=0.2 ,\n",
    "      batch_size=32,\n",
    "      solver=\"adam\" ,\n",
    "      activation=\"relu\",\n",
    "      learning_rate_init=0.0001,\n",
    "  )\n",
    "\n",
    "  model.fit(x, y)  # train the model\n",
    "  # save the mlp\n",
    "  with open(str(data_dir)+f'/mlps/{ubc_pm}{pm_u}.pkl', 'wb') as fid:\n",
    "      pickle.dump(model, fid)    \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "enhanced-doctor",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Open a dataset from another days comparisons of the GRIMM v UBC-PMs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "minimal-service",
   "metadata": {},
   "outputs": [],
   "source": [
    "# open all ubc_pm datafiles into a list\n",
    "ubc_list = [prepare_df(f\"/UBC-PM-0{i}/\", '20210502') for i in range(1,6)]\n",
    "## combine them to one dataframe\n",
    "df_ubc = reduce(lambda x, y: pd.merge(x, y, on='rtctime'), ubc_list)\n",
    "## drop odd character header added when combining\n",
    "df_ubc.columns = df_ubc.columns.str.replace('_y', '')\n",
    "## find the ubc dataframe with the lowest time stapms\n",
    "min_ubc = np.argmin([len(ubc_list[i]) for i in range(len(ubc_list))])\n",
    "## open the GRIMM dataset and restrict time to smallest ubc pm timeseries\n",
    "df_grim = open_grimm('20210502', ubc_list[min_ubc])\n",
    "## combine all the ubc pm dataframes with the grimm dataframe\n",
    "df_final2 = pd.merge(left=df_grim, right=df_ubc, left_index=True, right_index=True, how='left')\n",
    "## drop odd character header added when combining\n",
    "df_final2.columns = df_final2.columns.str.replace('_y', '')\n",
    "## drop the last 30 mins for this datafile as the sensors were removed form the glass chamber but still sampling\n",
    "df_final2 = df_final2[:'2021-05-02T18:50']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "iraqi-carolina",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "reduce dataframe to be only a few variable of interest..print last 5 rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "empirical-individual",
   "metadata": {},
   "outputs": [],
   "source": [
    "# keep_vars = [f'PM{pm} [ug/m3]', f'{ubc_pm}{pm_u}_env']\n",
    "keep_vars = [f'PM{pm} [ug/m3]', f'{ubc_pm}{pm_u}_env', f'{ubc_pm[:-3]}_particles_03um', f'{ubc_pm[:-3]}_particles_05um', f'{ubc_pm[:-3]}_particles_10um', f'{ubc_pm[:-3]}_particles_25um', f'{ubc_pm[:-3]}_particles_50um', f'{ubc_pm[:-3]}_particles_100um']\n",
    "df2 = df_final2.drop(columns=[col for col in df_final2 if col not in keep_vars])\n",
    "# df2[f'{ubc_pm}{pm_u}_env2'] = df2[f'{ubc_pm}{pm_u}_env']\n",
    "df2.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "injured-carnival",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Normalize this reduce dataframe and use our mlp model to correct to grimm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "tribal-equipment",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "scaler2 = MinMaxScaler()\n",
    "scaled2 = scaler2.fit_transform(df2)\n",
    "\n",
    "# split up data\n",
    "y2 = scaled2[:,0]\n",
    "x2 = scaled2[:,1:]\n",
    "y_predict = model.predict(\n",
    "    x2\n",
    ") "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "professional-indonesia",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Inverse transform or \"unnormalize\" data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "horizontal-reserve",
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predict  = np.column_stack((y_predict,x2))\n",
    "unscaled = scaler.inverse_transform(y_predict)\n",
    "df_final2[f'{ubc_pm}{pm_u}_cor'] = unscaled[:,0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "extensive-shore",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Plot corrected UBC PM 2.5 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lined-display",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "\n",
    "r2value_cor = round(\n",
    "    stats.pearsonr(df_final2[f'PM{pm} [ug/m3]'].values, df_final2[f'{ubc_pm}{pm_u}_cor'].values)[0], 2\n",
    ")\n",
    "r2value = round(\n",
    "    stats.pearsonr(df_final2[f'PM{pm} [ug/m3]'].values, df_final2[f'{ubc_pm}{pm_u}_env'].values)[0], 2\n",
    ")\n",
    "rmse_cor = str(\n",
    "    round(\n",
    "        mean_squared_error(\n",
    "            df_final2[f'PM{pm} [ug/m3]'].values, df_final2[f'{ubc_pm}{pm_u}_cor'].values, squared=False\n",
    "        ),\n",
    "        2,\n",
    "    )\n",
    ")\n",
    "rmse = str(\n",
    "    round(\n",
    "        mean_squared_error(\n",
    "            df_final2[f'PM{pm} [ug/m3]'].values, df_final2[f'{ubc_pm}{pm_u}_env'].values, squared=False\n",
    "        ),\n",
    "        2,\n",
    "    )\n",
    ")\n",
    "anchored_text = AnchoredText(\n",
    "    \"UBC-PM-03:        \" + r\"$r$ \" + f\"{r2value} rmse: {rmse}\" + \" \\nUBC-PM-03 Cor:  \" + r\"$r$ \" + f\"{r2value_cor} rmse: {rmse_cor}\"  ,\n",
    "    loc=\"upper left\",\n",
    "    prop={\"size\": 12, \"zorder\": 10},\n",
    ")\n",
    "\n",
    "\n",
    "fig = plt.figure(figsize=(14, 12))\n",
    "fig.autofmt_xdate()\n",
    "xfmt = DateFormatter(\"%m-%d %H:00\")\n",
    "fig.suptitle(r\"PM 2.5 ($\\frac{\\mu g}{m^3}$)\", fontsize=16)\n",
    "ax = fig.add_subplot(2, 1, 2)\n",
    "ax.plot(df_final2.index,df_final2[f'PM{pm} [ug/m3]'].values, lw = 4.0, label = 'GRIMM', color = colors[0])\n",
    "ax.plot(df_final2.index,df_final2[f'{ubc_pm}{pm_u}_cor'], label = 'UBC-PM-03 Corrected', color = colors[1])\n",
    "ax.plot(df_final2.index,df_final2[f'{ubc_pm}{pm_u}_env'].values, label = 'UBC-PM-03', color = colors[2])\n",
    "ax.set_ylabel(r'$\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel('Time (MM-DD HH)')\n",
    "ax.legend(\n",
    "    loc=\"upper center\",\n",
    "    bbox_to_anchor=(0.5, 2.44),\n",
    "    ncol=6,\n",
    "    fancybox=True,\n",
    "    shadow=True,\n",
    ")\n",
    "ax = fig.add_subplot(2, 2, 1)\n",
    "size = 6\n",
    "ax.scatter(df_final2[f'PM{pm} [ug/m3]'], df_final2[f'PM{pm} [ug/m3]'], s=size, color = colors[0])\n",
    "ax.scatter(df_final2[f'PM{pm} [ug/m3]'], df_final2[f'{ubc_pm}{pm_u}_cor'], s=size, color = colors[1])\n",
    "ax.scatter(df_final2[f'PM{pm} [ug/m3]'], df_final2[f'{ubc_pm}{pm_u}_env'], s=size, color = colors[2])\n",
    "ax.set_ylabel(r'UBC $\\frac{\\mu g}{m^3}$', rotation=0)\n",
    "ax.set_xlabel(r'GRIMM $\\frac{\\mu g}{m^3}$')\n",
    "ax.add_artist(anchored_text)\n",
    "\n",
    "ax = fig.add_subplot(2, 2, 2)\n",
    "bins = 20\n",
    "alpha = 0.6\n",
    "ax.hist(df_final2[f'PM{pm} [ug/m3]'],bins,alpha = alpha, color = colors[0], zorder = 10)\n",
    "ax.hist(df_final2[f'{ubc_pm}{pm_u}_cor'],bins, alpha = alpha, color = colors[1])\n",
    "ax.hist(df_final2[f'{ubc_pm}{pm_u}_env'],bins, alpha = alpha, color = colors[2])\n",
    "\n",
    "ax.set_ylabel('Count')\n",
    "ax.set_xlabel(r'$\\frac{\\mu g}{m^3}$')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dried-annex",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "decreased-treaty",
   "metadata": {},
   "source": [
    "While the MLP moves the PM2.5 concentration values in the correct direction, we recognize this is approach will not work when applied to our field measurements. Our current MLP model is trained from particular conditions with a narrow particle distribution size range. To make this more robust, we need to expose the sensors to a broader range of known particle compositions, sizes, and distributions in the lab.  "
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}