{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "animal-state",
   "metadata": {},
   "source": [
    "\n",
    "# Datasets\n",
    "\n",
    "The Datasets are divided up by their model domain"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "australian-spelling",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "## Model Domains\n",
    "\n",
    "The FWF model resolves the FWI System/ FBP System in the D02 (12 km) and D03 (4 km) at 55 hour forecast horizon.\n",
    "\n",
    "![title](_static/images/fwf-model-domains.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "incredible-particular",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "### Description\n",
    "\n",
    "For each domain there are two `.nc` (netcdf) files generated, four total datasets each day.\n",
    "\n",
    "**Domain: d02 (12 km)**\n",
    "-  `fwf-hourly-d02-YYYYMMDDHH.nc`\n",
    "    - File Size: ~ 780M\n",
    "    - File Dimensions: (time: 55, south_north: 417, west_east: 627)\n",
    "-  `fwf-daily-d02-YYYYMMDDHH.nc`\n",
    "    - File Size: ~ 16M\n",
    "    - File Dimensions: (time: 2, south_north: 417, west_east: 627)\n",
    "\n",
    "\n",
    "**Domain: d03 (4 km)**\n",
    "-  `fwf-hourly-d03-YYYYMMDDHH.nc`\n",
    "    - File Size:  ~ 1.7G\n",
    "    - File Dimensions: (time: 55, south_north: 840, west_east: 642)\n",
    "-  `fwf-daily-d03-YYYYMMDDHH.nc`\n",
    "    - File Size: ~ 30M\n",
    "    - File Dimensions: (time: 2, south_north: 840, west_east: 642)\n",
    "\n",
    "\n",
    "### Dataset Variables\n",
    "\n",
    "Regardless of Domain, each dataset hourly/daily contain the following variables."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "senior-cheese",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "| Hourly Dataset <br> `fwf-hourly-<domain>-YYYYMMDDHH.nc`  | Daily Dataset <br> `fwf-daily-<domain>--YYYYMMDDHH.nc`  | \n",
    "| --------------------------- | ------------------------- |\n",
    "|**Time**: Hourly UTC  |**Time**: Noon Local for that Day |\n",
    "|**XLAT**: Degrees Latitude  |**XLAT**: Degrees Latitude|\n",
    "|**XLON**: Degrees Longitude  |**XLON**: Degrees Longitude|\n",
    "|**F**: Fine Fuel Moisture Code  |**P**: Duff Moisture Code  |\n",
    "|**m_o**: Fine Fuel Moisture Content  |**D**: Drought Moisture Code  |\n",
    "|**R**: Initial Spread Index   |**U**: Build Up Index   |\n",
    "|**S**: Fire Weather Index|**T**: 2 meter Temperature C |\n",
    "|**DSR**: Daily Severity Rating  | **TD**: 2 meter Dew Point Temperature C |\n",
    "|**FMC**: Foliar Moisture Content %  | **H**: 2 meter Relative Humdity %  |\n",
    "|**SFC**: Surface Fuel Consumption kg m^{-2}  | **W**: 10 meter Wind Speed km/h |\n",
    "|**TFC**: Total Fuel Consumption kg m^{-2} | **WD**: 10 meter Wind Direction deg  |\n",
    "|**ROS**: Rate of Spread m min^{-1}  | **r_o**: Total Accumulated Precipitation mm  |\n",
    "|**CFB**: Crown Fraction Burned % | **r_o_tomorrow**: Carry Over Precipitation mm  |\n",
    "|**HFI**: Head Fire Intensity  kW m^{-1} |**SNOWC**: Flag Inidicating Snow <br> Cover (1 for Snow Cover) Snow Depth m   |\n",
    "|**T**: 2 meter Temperature C  |   |\n",
    "|**TD**: 2 meter Dew Point Temperature C  | |\n",
    "|**H**: 2 meter Relative Humdity % |  |\n",
    "|**W**: 10 meter Wind Speed km/h  |  |\n",
    "|**WD**: 10 meter Wind Direction deg  |   |\n",
    "|**U10**:  U Component of Wind at 10 meter m/s  | |\n",
    "|**V10**: V Component of Wind at 10 meter m/s  |   |\n",
    "|**r_o**: Total Accumulated Precipitation mm  |   |\n",
    "|**r_o_hourly**: Hourly Accumulated Precipitation mm  |  |\n",
    "|**SNW**: Total Accumulated Snow cm  |   |\n",
    "|**SNOWH**: Physical Snow Depth m  |   |\n",
    "|**SNOWC**: Flag Indicating Snow <br> Cover (1 for Snow Cover) Snow Depth m  |   |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "powerful-detail",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "## Working with\n",
    "\n",
    "Suggest using [xarray](http://xarray.pydata.org/en/stable/) to open and work with data.\n",
    "\n",
    "An example of how to open and view "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "prompt-barbados",
   "metadata": {},
   "outputs": [],
   "source": [
    "import context\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "from context import data_dir, fwf_dir\n",
    "\n",
    "forecast_date = '2021051006'  ## \"YYYYMMDDHH\"\n",
    "domain        = 'd02'         ## or 'd03'\n",
    "name \t        = 'hourly'      ## or 'daily'\n",
    "\n",
    "## file directory\n",
    "filein = str(fwf_dir) + f\"/fwf-{name}-{domain}-{forecast_date}.nc\"\n",
    "\n",
    "## open dataset\n",
    "ds = xr.open_dataset(filein)\n",
    "\n",
    "## chunk data to dask.arrays \n",
    "ds = ds.chunk(chunks=\"auto\")\n",
    "ds = ds.unify_chunks()\n",
    "# NOTE this is not needed. Arrays will be either numpy float32 or objects\n",
    "\n",
    "## Example: look at variable F (Fine Fuels Moisture Code)\n",
    "print(ds.F)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "narrative-desire",
   "metadata": {},
   "source": [
    "## How to search the FWF data set by locations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "brown-albert",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "import context\n",
    "import pickle\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import xarray as xr\n",
    "from sklearn.neighbors import KDTree\n",
    "\n",
    "\n",
    "from pathlib import Path\n",
    "\n",
    "from context import data_dir, fwf_dir\n",
    "from datetime import datetime, date, timedelta"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "willing-collins",
   "metadata": {},
   "source": [
    "### Define dataset information\n",
    "Define forecast date, domain and set paths to dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "verbal-difficulty",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "forecast_date = '2021051006'  ## \"YYYYMMDDHH\"\n",
    "domain        = 'd02'         ## or 'd03'\n",
    "name \t        = 'hourly'      ## or 'daily'\n",
    "\n",
    "filein = str(fwf_dir) + f\"/fwf-{name}-{domain}-{forecast_date}.nc\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "instant-split",
   "metadata": {},
   "source": [
    "Open forecast dataset and print "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "molecular-utility",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = xr.open_dataset(filein)\n",
    "## strip some attributes from the netcdf for the sake of printing\n",
    "ds.attrs = {'TITLE': 'FWF MODEL USING OUTPUT FROM WRF V4.2.1 MODEL',\n",
    " 'WEST-EAST_GRID_DIMENSION': '628',\n",
    " 'SOUTH-NORTH_GRID_DIMENSION': '418',\n",
    " 'DX': '12000.0',\n",
    " 'DY': '12000.0'}\n",
    "print(ds)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "major-covering",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Load a data set of weather sation locations and look at the first four rows as an example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "smaller-despite",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(str(data_dir) + \"/nrcan-wxstations.csv\", sep=\",\", usecols = ['wmo',\t'lat',\t'lon'])\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "celtic-wound",
   "metadata": {},
   "source": [
    "### Set up to build a kdtree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "southwest-surfing",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "essential-locking",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "First, set path to store kdtree and make directory if it doesn't exist "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "previous-drilling",
   "metadata": {},
   "outputs": [],
   "source": [
    "kdtree_dir = Path(str(data_dir) + \"/kdtree/\")\n",
    "kdtree_dir.mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fitting-bradford",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Now take gridded lats and long and convert to np arrays and check its shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "medieval-dominant",
   "metadata": {},
   "outputs": [],
   "source": [
    "XLAT, XLONG = ds.XLAT.values, ds.XLONG.values\n",
    "shape = XLAT.shape\n",
    "print(shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "political-courage",
   "metadata": {},
   "source": [
    "### Build a kdtree and save"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "proud-origin",
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "  ## try and open kdtree for domain\n",
    "  fwf_tree, fwf_locs = pickle.load(open(str(kdtree_dir) + f'fwf_{domain}_tree.p', \"rb\"))\n",
    "  print('Found FWF Tree')\n",
    "except:\n",
    "  ## build a kd-tree for fwf domain if not found\n",
    "  print(\"Could not find FWF KDTree building....\")\n",
    "  ## create dataframe with columns of all lat/long in the domian...rows are cord pairs \n",
    "  fwf_locs = pd.DataFrame({\"XLAT\": XLAT.ravel(), \"XLONG\": XLONG.ravel()})\n",
    "  ## build kdtree\n",
    "  fwf_tree = KDTree(fwf_locs)\n",
    "  ## save tree\n",
    "  pickle.dump([fwf_tree, fwf_locs], open(str(kdtree_dir) + f'fwf_{domain}_tree.p', \"wb\"))\n",
    "  print(\"FWF KDTree built\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "available-korea",
   "metadata": {},
   "source": [
    "\n",
    "### Search the data\n",
    "With a built kdtree we can query the tree to find the nearest neighbor model grid to our locations of interest."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "environmental-disclosure",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "First, define empty list to append index of weather station locations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "median-swift",
   "metadata": {},
   "outputs": [],
   "source": [
    "south_north,  west_east, wmo = [], [], []"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "original-death",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Now lets loop each weather station in dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "hindu-sperm",
   "metadata": {},
   "outputs": [],
   "source": [
    "for loc in df.itertuples(index=True, name='Pandas'):\n",
    "  ## arange wx station lat and long in a formate to query the kdtree\n",
    "  single_loc = np.array([loc.lat, loc.lon]).reshape(1, -1)\n",
    "\n",
    "  ## query the kdtree retuning the distacne of nearest neighbor and the index on the raveled grid\n",
    "  fwf_dist, fwf_ind = fwf_tree.query(single_loc, k=1)\n",
    "\n",
    "  ## set condition to pass on stations outside model domian \n",
    "  if fwf_dist > 0.1:\n",
    "    pass\n",
    "  else:\n",
    "    ## if condition passed reformate 1D index to 2D indexes\n",
    "    fwf_2D_ind = np.unravel_index(int(fwf_ind), shape)\n",
    "    ## append the indexes to lists\n",
    "    wmo.append(loc.wmo)\n",
    "    south_north.append(fwf_2D_ind[0])\n",
    "    west_east.append(fwf_2D_ind[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "american-berlin",
   "metadata": {},
   "source": [
    "### Index an entire dataset\n",
    "Now the magic of xarray. Convert lists of indexes to dataarrays with dimension wmo (weather staton). This allows you to index an entire dataset!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sharing-textbook",
   "metadata": {},
   "outputs": [],
   "source": [
    "south_north = xr.DataArray(np.array(south_north), dims= 'wmo', coords= dict(wmo = wmo))\n",
    "west_east = xr.DataArray(np.array(west_east), dims= 'wmo', coords= dict(wmo = wmo))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "applied-gallery",
   "metadata": {},
   "source": [
    "Index the entire dataset at the locations of interest leaving dimension time with new dimension wx stations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "confused-attraction",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_loc = ds.sel(south_north = south_north, west_east = west_east)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "harmful-optimum",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Print to see new time series dataset at every weather station location"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "environmental-retirement",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "print(ds_loc)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "great-stretch",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "\n",
    "## Data License\n",
    "[Creative Commons Attribution 4.0 International License.](https://creativecommons.org/licenses/by/4.0/)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}