Architecture

Jupyter Notebooks work with what is called a two-process model based on a kernel-client infrastructure. This model applies a similar concept to the Read-Evaluate-Print Loop (REPL) programming environment that takes a single user’s inputs, evaluates them, and returns the result to the user.

Based on the two-process model concept, we can explain the main components of Jupyter in the following way:

Jupyter Client

  • It allows a user to send code to the kernel in a form of a Qt Console or a browser via notebook documents.

  • From a REPL perspective, the client does the read and print operations.

  • Notebooks are hosted by a Jupyter web server which uses Tornado to serve HTTP requests.

Running Code

Execution

Jupyter Kernel

  • It receives the code sent by the client, executes it, and returns the results back to the client for display. A kernel process can have multiple clients communicating with it which is why this model is also referred as the decoupled two-process model.

  • From a REPL perspective, the kernel does the evaluate operation.

  • kernel and clients communicate via an interactive computing protocol based on an asynchronous messaging library named ZeroMQ (low-level transport layer) and WebSockets (TCP-based)

  • Makes Jupyter a language agnostic application (Julia, Python, R, etc.)

A kernel identifies itself to IPython by creating a directory, the name of which is used as an identifier for the kernel. These may be created in a number of locations:

Mine is in the following location (MAC) ~/Library/Jupyter/kernels/python37664bite09a6f3cbf7b46ec803618408bcaece5, and you find similar files:

kernel.json     logo-32x32.png  logo-64x64.png

Sample of a default Python kernel json file:

{
  "argv": [
    "/usr/local/opt/python/bin/python3.7",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "Python 3.7.6 64-bit",
  "language": "python",
  "env": {},
  "metadata": {
    "interpreter": {
      "architecture": 3,
      "path": "/usr/local/opt/python/bin/python3.7",
      "version": {
        "options": {
          "loose": false,
          "includePrerelease": false
        },
        "loose": false,
        "raw": "3.7.6-final",
        "major": 3,
        "minor": 7,
        "patch": 6,
        "prerelease": [
          "final"
        ],
        "build": [],
        "version": "3.7.6-final"
      },
      "sysPrefix": "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7",
      "fileHash": "1eaf1f22773a15c8adb7d37641dd5c88999c181add7c1d97a004dd33a2c824657b1b4cfb6ca19d4b7804b18eb454d5c34b3df1bdc53c2c93f4116419ce72d1a8",
      "type": "Unknown",
      "displayName": "Python 3.7.6 64-bit",
      "__store": true
    }
  }
}%

You can build your own. I built mine to run PySpark through my Jupyter Notebook as shown below:

{
  "display_name": "PySpark_Python3",
  "language": "python",
  "argv": [
   "/opt/conda/bin/python3",
   "-m",
   "ipykernel_launcher",
   "-f",
   "{connection_file}"
  ],
  "env": {
   "SPARK_HOME": "/opt/jupyter/spark/",
   "PYTHONPATH": "/opt/jupyter/spark/python/:/opt/jupyter/spark/python/lib/py4j-0.10.9-src.zip:/opt/jupyter/spark/graphframes.zip",
   "PYSPARK_PYTHON": "/opt/conda/bin/python3"
  }
}

Jupyter Notebook Document Format

  • Notebooks are automatically saved and stored on disk in the open source JavaScript Object Notation (JSON) format and with a .ipynb extension.