{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Big Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dask\n", "* A [package](https://dask.org/) that splits data into chunks that fit in memory and adds parallelism to your code\n", "* Setup is simple: enable it once and forget about it\n", "* It integrates with popular packages like NumPy, pandas and xarray, so there is no additional syntax to learn.\n", "* It can be [deployed](https://docs.dask.org/en/latest/setup.html) on simple laptops, HPC systems and even cloud computers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "* Open OSCAR surface ocean currents (a large dataset, about 10 GB) and chunk it\n", "* Compute monthly mean currents using all cores of your CPU\n", "* Use 3D ocean data (EN4) to draw temperature and salinity vertical profiles\n", "\n", "The data is too large to be distributed through GitHub. You can download the data from the links below:\n", "* OSCAR: https://podaac.jpl.nasa.gov/dataset/OSCAR_L4_OC_third-deg\n", "* EN4: https://www.metoffice.gov.uk/hadobs/en4/" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import numpy as np\n", "import cmocean\n", "import cartopy as cr\n", "import cartopy.crs as ccrs\n", "import matplotlib.pyplot as plt\n", "import cartopy.feature as cfeature\n", "from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"Client\n", "
| \n",
"\n",
"Cluster\n", "
| \n",
"