I recently got my hands on an invitation to Hex. Not only do they publish great blog posts, they also offer a product for a specific data science niche. A common issue with data science notebooks is that they are hard to translate into a compelling story for a less tech-savvy audience. That is the sweet spot where Hex operates: an enhanced notebook with presentation capabilities. I have a particular affection for the bootstrap, so I decided to create a Hex app that explains this procedure.
Stories & Apps
Hex differentiates itself from regular notebooks by offering presentation capabilities. You can either create a Story, which is automatically generated from the notebook, or drop cell outputs onto a Canvas to create a simple app. What you see below is the first part of a canvas app.
A cool feature of Hex apps (although it is configured in the Logic) is that you can schedule them. This way, a report can be refreshed on a recurring basis without any action from its consumers.
A variety of visualization libraries are supported. The usual suspects (Seaborn, Matplotlib, Plotly, …) work perfectly fine. However, visualizations created with the Vis.gl suite (e.g., Deck.gl) cannot be published (for now). In the example below, I visualize the bootstrapping method using Matplotlib.
Apps are driven by the other components that make up Hex, the most notable of which is the Logic.
Logic
Logic looks like a simple Jupyter notebook but is quite a lot more than that. It offers extra cell types that standard Jupyter notebooks lack:
- Write SQL queries to get data from one of your (SQL-based) data sources
- Properly formatted tables
- Input parameters
To me, the input parameters are the most interesting addition, and there are many use cases for them. By inserting them in your notebook, you could allow the user to:
- change specific parameters for visualization such as axis limits, smoothing, thresholds, …
- change hyperparameters of a model to see how they affect model metrics such as accuracy, negative/positive predictive value, recall, fallout, etc.
- set simulation parameters, e.g. transition rates in an epidemiological model.
- explore a data set by letting them compare columns (e.g. correlate them).
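To make the hyperparameter use case more concrete, here is a minimal plain-Python sketch of how a single threshold input (hypothetically named `threshold`, standing in for a Hex slider) changes the metrics listed above. The scores and labels are made-up toy values, not output from any real model, and the helper names are my own.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    # Return NaN instead of raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")


def metrics_at(scores, labels, threshold):
    """Compute common classification metrics at a given decision threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "ppv": safe_div(tp, tp + fp),      # positive predictive value (precision)
        "npv": safe_div(tn, tn + fn),      # negative predictive value
        "recall": safe_div(tp, tp + fn),   # true positive rate
        "fallout": safe_div(fp, fp + tn),  # false positive rate
    }


# Toy scores and true labels (1 = positive); purely illustrative.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.55]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

# In a Hex app, `threshold` would come from an input parameter such as a slider.
for threshold in (0.3, 0.5, 0.7):
    print(threshold, metrics_at(scores, labels, threshold))
```

Moving the threshold trades recall against fallout: a low threshold catches every positive but also flags more negatives, which is exactly the kind of effect an end user can explore interactively.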
Each input parameter represents a variable that can be incorporated inside the Python code. They come in the form of sliders, dropdowns, checkboxes, buttons, and even a simple spreadsheet interface. Below, I demonstrate how input parameters are inserted in your notebook.
In the following lines of code, I generate a population of 500 individuals with normally distributed body lengths, using a mean and standard deviation that the user sets via input parameters, and then draw a random sample of 100 individuals from it.
```python
import numpy as np

# $population_mean and $standard_deviation are Hex input parameters set by the user.
population = np.random.normal(loc=$population_mean, scale=$standard_deviation, size=500)
# Draw a sample of 100 individuals without replacement.
sample = np.random.choice(population, size=100, replace=False)
print('The mean of the sample is {}.'.format(sample.mean()))
print('The mean of the population is {}.'.format(population.mean()))
```
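Since the app is about the bootstrap, here is a minimal NumPy sketch of the procedure applied to a sample like this one: resample the sample with replacement many times and look at the spread of the resampled means. The function name `bootstrap_means`, the number of resamples, the percentile interval, and the population values are my own illustrative choices, not necessarily what the Hex app does; plain variables stand in for the Hex input parameters so the snippet runs on its own.

```python
import numpy as np


def bootstrap_means(sample, n_resamples=1000, rng=None):
    """Return the means of `n_resamples` bootstrap resamples of `sample`."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample)
    # Each row holds indices for one resample of the same size as the sample,
    # drawn with replacement (the defining step of the bootstrap).
    idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
    return sample[idx].mean(axis=1)


rng = np.random.default_rng(42)
# Hypothetical stand-ins for the Hex input parameters.
population_mean, standard_deviation = 175.0, 10.0
population = rng.normal(loc=population_mean, scale=standard_deviation, size=500)
sample = rng.choice(population, size=100, replace=False)

means = bootstrap_means(sample, n_resamples=2000, rng=rng)
# 95% percentile bootstrap confidence interval for the mean.
low, high = np.percentile(means, [2.5, 97.5])
print(f"Sample mean: {sample.mean():.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
```

The percentile interval is only one of several ways to turn bootstrap replicates into a confidence interval; it is used here because it is the simplest to explain.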
Data Connections
Hex can natively connect to a good set of SQL-based storage technologies. Instead of embedding SQL strings in your code, you write and run queries in dedicated SQL cells. These cells output the query results to a data frame that can be accessed from your code.
Even better, you can use input parameters in your queries, allowing end users to parametrize a query and control which data is returned.
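Outside Hex, the closest plain-Python analogue of a parametrized SQL cell is a query with bound parameters. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the `people` table, its columns, and the `min_length` parameter are hypothetical, and this is not how Hex implements its SQL cells.

```python
import sqlite3


def rows_above(conn, min_length):
    """Return (name, length) rows whose length is at least `min_length`."""
    # The "?" placeholder lets the driver bind the value safely
    # instead of pasting user input into the SQL string.
    cursor = conn.execute(
        "SELECT name, length FROM people WHERE length >= ? ORDER BY length",
        (min_length,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, length REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 168.0), ("Bob", 181.5), ("Cleo", 175.2)],
)

# In a Hex app, `min_length` would be an input parameter set by the end user.
print(rows_above(conn, 170))
```

With pandas installed, `pandas.read_sql_query(sql, conn, params=(min_length,))` returns the same rows as a DataFrame, which is closer to what a SQL cell hands back to your code.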
Currently, the following six popular database technologies are supported.
Environment
Finally, in the Environment, you can specify environment variables and secrets (e.g. to access an API), upload data sets, and browse through all the available packages. Files of up to 2 gigabytes can be uploaded, and packages can be installed from the Logic using pip.
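Keeping an API key in an environment variable or secret, rather than hard-coding it in the notebook, is the usual pattern. A minimal sketch, assuming the secret is exposed to Python as an environment variable with the hypothetical name `WEATHER_API_KEY` (how Hex actually exposes secrets to your code may differ):

```python
import os


def get_api_key(name="WEATHER_API_KEY"):
    """Read an API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

Raising an error early makes a missing secret obvious, instead of letting an unauthenticated API call fail somewhere further down the notebook.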
The case for Hex
Hex is one of many computational notebook solutions out there, and it explicitly states that it does not aim to replace BI tools for use cases that involve no complex logic. Nor is it focused on machine learning model development. What makes Hex truly unique is its interactive presentation capability.
Within a modern tech company, where Python and SQL are the lingua franca of analysts and data scientists, Hex is well worth considering. An audience with a high degree of data maturity will quite likely prefer a notebook with presentation capabilities over expensive enterprise analysis and visualization tools.
Hex offers some enterprise features such as collaboration, versioning, and the possibility to run it in a VPC. However, within large organizations, Hex will have to wage a battle against established procedures that rely on enterprise visualization technologies and slide decks as the preferred medium for business cases.
Lastly, note that Hex is not publicly available yet, which is why I have not mentioned any bugs.