Scientific Computing

Create a blank/orphan Git branch

Many basic Git use cases involve a main branch with feature branches periodically merged into the main branch. For certain purposes, a totally distinct branch without a common history can exist in the same Git repo. One of the most common uses of this is for documentation. For example, GitHub will build a website from the gh-pages branch.

Setup blank Git branch

Do NOT force push during this procedure: you may accidentally erase years of work!

This example assumes you want to create a gh-pages empty branch for documentation on GitHub, but will of course work for other purposes too.

From the repo directory create a blank Git branch:

git switch --orphan gh-pages

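Remove the unneeded files from this branch. git switch --orphan already removes all tracked files from the working tree; the git rm command below is only needed if the branch was created with the older git checkout --orphan, which leaves all existing files staged from the previous branch.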

git rm --cached -r .

git clean -id

This leaves the .git/ directory, which should not be disturbed.

Copy documentation files

What happens next depends on whether your documentation files were already added to another branch (tracked) or were not added to Git (untracked). Here we assume the wanted files for the blank gh-pages branch are tracked in docs/ on the main branch.

Copy over the files to gh-pages:

git checkout main -- docs/

git commit -am "moved docs"

Upload files

  1. push the documentation

    git push -u origin gh-pages
  2. Enable the documentation builds at github.invalid/username/myrepo/settings → GitHub Pages: under Source, select gh-pages.

  3. In a few minutes, the webpages at username.github.io/myrepo/ should be active.

Once everything is working, the old docs/ folder isn’t needed.

Python f-string benchmarks

Python 3.6 f-strings have been shown to be the fastest string formatting method in microbenchmarks by Python core dev Raymond Hettinger. The benchmark's relative factors, normalized to f-strings (lower is faster):

  • f-string: 1.0
  • concatenate string: 1.33
  • join sequence of strings: 1.73
  • %s formatting operator: 2.40
  • .format() method: 3.62
  • Template() method: 11.48

The reason for this speedy performance was described by Python core dev Serhiy Storchaka.
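A rough way to try such a comparison locally is the standard library timeit module. This is a minimal sketch, not the benchmark cited above: the test strings, the labels and the loop count are arbitrary assumptions, and the ratios will vary with Python version and machine.

```python
import timeit
from string import Template

# Hypothetical test values; any short strings would do.
namespace = {"name": "world", "greeting": "hello", "Template": Template}

# One statement per formatting method, each producing 'hello, world!'
statements = {
    "f-string": "f'{greeting}, {name}!'",
    "concatenate": "greeting + ', ' + name + '!'",
    "join": "''.join([greeting, ', ', name, '!'])",
    "% operator": "'%s, %s!' % (greeting, name)",
    ".format()": "'{}, {}!'.format(greeting, name)",
    "Template": "Template('$greeting, $name!').substitute(greeting=greeting, name=name)",
}


def benchmark(number: int = 100_000) -> dict:
    """Return the best-of-3 time in seconds for each formatting method."""
    return {
        label: min(timeit.repeat(stmt, globals=namespace, number=number, repeat=3))
        for label, stmt in statements.items()
    }


if __name__ == "__main__":
    times = benchmark()
    base = times["f-string"]
    for label, t in times.items():
        # time relative to f-strings, lower is faster
        print(f"{label:12s} {t / base:6.2f}")
```

Taking the minimum of several repeats reduces the influence of other processes on the measurement.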

Goldwave audio editor on Linux WINE

Goldwave 5.x is stable and the recommended version to use on Linux via WINE. An open source alternative to Goldwave is Audacity. If using Goldwave on Windows directly:

winget install --id=GoldWave.GoldWave

To use Goldwave on Linux via WINE:

Install Goldwave prereqs. 32-bit WINEPREFIX is mandatory due to wmp10 prereq.

WINEPREFIX=~/.wine32 WINEARCH=win32 winetricks wmp10

This creates a 32-bit WINE prefix in ~/.wine32 (the default .wine directory is 64-bit). The default install options are fine.

Install Goldwave in WINE.

WINEPREFIX=~/.wine32 wine gwave5*.exe

This also creates a Goldwave icon in the Ubuntu apps menu.

Configure soundcard in Goldwave by:

  1. F11 key → Control Properties → System tab
  2. click Use DirectSound AP radio button
  3. click OK to save configuration

To record “loopback” (what you hear) audio, go to F11 → Device → Record and pick Loopback Pulseaudio.

Test Goldwave 5 by F11 key → Device tab and click the Test playback button to hear a brief test tone. Goldwave 5.7 can then create, save and play sound files in Linux WINE.

These settings are useful for Goldwave on Windows as well as WINE/Linux.

  • Constantly monitor audio levels: F11 → Record → monitor input
  • label VU graph axes in dB: Right click VU meter → Properties → Show Axis

Goldwave 6 can also be installed on WINE, but it’s recommended to use Goldwave 5.x as above instead. There are some errors, but Goldwave 6 still works to record and play back on WINE ≥ 2 using a Windows 7 Wineprefix.

  1. Download Goldwave 6
  2. configure Wineprefix for Windows 7 using winecfg → Applications → Windows Version → Windows 7
  3. Install Goldwave 6 on WINE

    wine InstallGoldwave6*.exe

As of WINE 2.1, errors upon opening Goldwave 6 include

Floating point overflow

and there are also

Unhandled exception

on closing Goldwave 6 and it will keep asking to reset Goldwave settings. Nonetheless Goldwave 6 does record/play on WINE 2.1.

git pull after remote forced update

When collaborating in teams with Git, someone else may do a “force push” on a feature branch that conflicts with local revisions previously pulled. Here are a few simple scenarios to resolve this situation quickly. For simplicity, in this article we assume work in a Git branch feat1.

First, make a copy of the Git repo in case a mistake is made.

No new local work

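With no new local commits, update feat1 to match the remote Git repo. When the upstream branch was rebased since the last fetch, git pull --rebase uses the remote-tracking branch history to avoid re-applying commits that were not made locally.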

git switch feat1

git pull --rebase

Update and preserve local work

To preserve work in feat1:

git switch feat1

git fetch

git reset origin/feat1 --soft

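The soft reset moves the branch to origin/feat1 without touching the files, so the local work shows up as staged changes and can be committed as usual after the reset.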

Git bug for Windows nanorc

Git 2.24.1.windows.2 seems to have a bug in git.nanorc within Git Bash. Specifically, the wrong (Windows/DOS) \r\n line endings seem to have been introduced. A simple fix is to use dos2unix to correct the line endings.

winget install --id=waterlan.dos2unix -e

Then run:

dos2unix /usr/share/nano/git.nanorc

Symptom: Git operations using the “nano” editor, such as git commit without the -m option or git rebase -i, produce errors like those below. This happens even though the content of git.nanorc matches templates as old as 2016; only the line endings differ, with DOS \r\n instead of \n.

/usr/share/nano/git.nanorc on line 1: Regex strings must begin and end with a " character
" not understoodare/nano/git.nanorc on line 2: Command "
" not understoodare/nano/git.nanorc on line 10: Command "
" not understoodare/nano/git.nanorc on line 15: Command "
Error in /usr/share/nano/git.nanorc on line 19: Regex strings must begin and end with a " character
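If dos2unix is not available, the same line-ending conversion can be sketched in a few lines of Python. This is an illustrative alternative, not part of the original fix; the function names crlf_to_lf and fix_file are hypothetical.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: Path) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_bytes()
    fixed = crlf_to_lf(original)
    if fixed != original:
        path.write_bytes(fixed)
        return True
    return False
```

Working on bytes rather than text avoids any implicit newline translation when reading or writing the file. Usage would be along the lines of fix_file(Path("/usr/share/nano/git.nanorc")), assuming the Python interpreter sees the same path as Git Bash does.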

Matplotlib ValueError on LogNorm plots

Matplotlib log10-normalized plots are enabled with plotting options

pcolormesh(dat, norm=matplotlib.colors.LogNorm(vmin=max(dat.min(), LOGMIN)))

This option also works for appropriate 2-D plots from pandas.DataFrame.plot() and xarray.DataArray.plot().

Log(0) bounds error

A lower bound of zero in a log-norm pcolormesh() plot, whether set explicitly with vmin=0 or implicitly by data with a minimum of zero, will cause errors like

ValueError: Data has no positive values, and therefore can not be log-scaled.

or

ZeroDivisionError: float division by zero

Fix

Choose a minimum plot value LOGMIN appropriate for plotting the data.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

LOGMIN = 0.1  # arbitrary lower bound, as appropriate for log-scaled data display

dat = np.random.rayleigh(1., (50,50))

dat[0,0] = 0.  # forcing the ValueError to occur with LogNorm

fg = plt.figure(figsize=(12, 5), layout='constrained')
ax = fg.subplots(1,2)
# vmin goes inside LogNorm: recent Matplotlib rejects vmin passed alongside norm
ax[0].pcolormesh(dat, norm=LogNorm(vmin=max(dat.min(), LOGMIN)))
# vmin= : this averts ValueError by having non-zero cdata minimum.

ax[0].set_title('log')

ax[1].pcolormesh(dat)
ax[1].set_title('linear')

plt.show()

Matlab / GNU Octave

The equivalent code in Matlab / GNU Octave does not give an error.

dat = raylrnd(1., [50,50]);

dat(1,1) = 0;

pcolor(log10(dat))

Majority of new Python work is Python 3

(This post was originally from June 2017).

There is considerable additional effort required to support Python < 3.6 in general while using concurrent and other performant Python programming with the latest common modules like Numpy, Xarray, Matplotlib, etc.


Python 3 is used by a large and growing majority of new and active Python developers in science, engineering, medical research and education. Python 3 was released in December 2008. While initially there was pushback over backward incompatibilities, the performance, efficiencies and features of Python 3 have won out.

The most popular Python packages have supported Python 3 for some time now, including Amazon AWS and Google Cloud Platform.

The main holdouts are of the same nature as those that hang on to COBOL systems: those with static, unchanging requirements in large proprietary codebases that few people are familiar with. Some programmers thrive and make a decent living servicing those legacy COBOL and Python environments. The majority of STEM coders, analysts and educators have been writing Python 3 code. The Python 3 objections were mostly written before 2016 and almost all were before 2017. Some of their complaints were addressed in Python 3.6 (released December 23, 2016).

A main issue over Python 3 is over the separation between bytes and strings. Applications with IoT and embedded systems distinguish between bytes and strings, so I appreciate the separation of bytes and strings in Python 3. For the global environment I write for, I appreciate that strings are Unicode in Python 3.

Python 3 efficiencies in terms of programmer time come in more efficient syntax. The Python 3 core itself is as much as 10% faster in several aspects, with some standard modules like re processing regular expressions as much as 20x faster. The modernize tool and six and __future__ modules smooth over most of these changes to make backward compatible code. Some of the most enticing changes from Python ≥ 3.6 are not easily backportable. These features simplify syntax and make the code more self-documenting.

asyncio brings core Python features that used to require Tornado, twisted, etc. Asynchronous execution is required for applications that need to scale massively. IoT applications where remote sensors report in are a perfect use case for asyncio. asyncio is a convenient, relatively safe way to run concurrent tasks, baked right into Python.
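As a minimal sketch of the sensor use case, the example below simulates several remote sensors with asyncio.sleep() standing in for network I/O. The function names read_sensor and collect, the sensor names, the delays and the value range are all hypothetical. asyncio.run() requires Python ≥ 3.7.

```python
import asyncio
import random


async def read_sensor(name: str, delay: float) -> tuple:
    """Simulate a remote sensor that reports after a network delay."""
    await asyncio.sleep(delay)  # stands in for awaiting network I/O
    return name, round(random.uniform(15.0, 25.0), 1)


async def collect(n_sensors: int = 5) -> dict:
    """Wait for all sensors concurrently rather than one after another."""
    tasks = [read_sensor(f"sensor{i}", 0.01 * (i + 1)) for i in range(n_sensors)]
    return dict(await asyncio.gather(*tasks))


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Because the waits overlap, the total run time is close to the longest single delay rather than the sum of all delays.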

Least-recently-used (LRU) caching is enabled by a function decorator. For example, you have a computationally expensive function that is sometimes called with the same arguments, but you don’t want to bother tracking the output by saving it in a variable. Caching is as simple as:

import functools

# functools.cache (Python >= 3.9) is shorthand for functools.lru_cache(maxsize=None)
@functools.cache
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

print([fib(n) for n in range(16)])

print(fib.cache_info())

results in:

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)

Pip uses pyproject.toml to completely describe package metadata for installation.

Python type hinting is used by some IDEs to give code warnings while not actually enforcing strict typing (unless you want to).

import math

def two_sinxy(x:float, y:float) -> float:
    return 2*math.sin(x*y)

This function would not crash if you passed in an int, since type hints are not enforced at runtime, but PyCharm IDE and others can emit warnings when conflicting variable types are passed in/out.

Python argument unpacking unpacks iterables into functions requiring multiple arguments, expanded with *iterable. Multiple iterables can be unpacked into a single function call.
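A small illustration of argument unpacking, using a hypothetical volume() function:

```python
def volume(length: float, width: float, height: float) -> float:
    """Volume of a box; needs three separate arguments."""
    return length * width * height


dims = (2.0, 3.0, 4.0)
print(volume(*dims))  # same as volume(2.0, 3.0, 4.0) -> 24.0

# Multiple iterables can be unpacked into one call
front = [2.0]
rest = (3.0, 4.0)
print(volume(*front, *rest))  # 24.0
```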

ipaddress is a useful standard library feature for manipulating IP addresses, used in the findssh program to scan for active servers in IP address ranges.
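For example, enumerating the usable host addresses in a subnet takes only a few lines. The sketch below uses the 192.0.2.0/24 documentation range as a stand-in for a real network and is not taken from findssh itself.

```python
import ipaddress

# 192.0.2.0/24 is reserved for documentation; /30 keeps the example small.
net = ipaddress.ip_network("192.0.2.0/30")

# hosts() skips the network and broadcast addresses
hosts = [str(h) for h in net.hosts()]
print(hosts)  # ['192.0.2.1', '192.0.2.2']

addr = ipaddress.ip_address("192.0.2.1")
print(addr in net)        # True
print(net.num_addresses)  # 4
```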

Object-oriented pathlib is standard library and replaces most os.path functions.
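A brief comparison with the os.path functions it replaces, using a made-up file path. PurePosixPath is used here so the output is the same on any operating system; in everyday code pathlib.Path would be typical.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/data/results.csv")

print(p.name)    # results.csv      (os.path.basename)
print(p.parent)  # /home/user/data  (os.path.dirname)
print(p.suffix)  # .csv             (os.path.splitext)
print(p.with_suffix(".h5"))  # /home/user/data/results.h5

# the / operator replaces os.path.join
print(p.parent / "plots" / "fig.png")  # /home/user/data/plots/fig.png
```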

f-strings allow f'This is {weight} kg for {price} dollars' instead of 'This is {} kg for {} dollars'.format(weight,price)

Python 3.7 adds several more compelling reasons to upgrade.


Patreon transitioned from PHP → Python 3 in 2015. Key appeals for Patreon to switch to Python 3 included:

  • future-proofing
  • appeal to developer hiring, working with latest language
  • lower risk than porting to a statically typed language like Scala

Instagram runs fully on Python 3 as noted at the 2017 PyCon keynote at the 13 minute mark.


Starting in 2010, Arch Linux defaulted to Python 3. Ubuntu 17.10 defaulted to Python 3. Ubuntu 18.04 requires Python 3 for programs in the main Ubuntu repository, with default Python 3.6. The goal is to demote Python 2 from the main repository.

Executable Python scripts should continue to have the first line

#!/usr/bin/env python

so that users can configure their desired Python version. Many users install a third party Python distribution such as Anaconda Python, PyCharm, Intel Python, etc. that have high performance math libraries such as Cuda-based CuPy.


  • Very detailed notes from Python Software Foundation Fellow Nick Coghlan on why, when, what, how, where of Python 3 transition with fascinating historical notes.
  • ActiveState saw majority of downloads being Python 3 since January 2017.

Upgrade Windows with dual-boot Linux

For a dual-boot Windows / Linux PC, set BIOS / UEFI to boot to the Windows hard drive, especially if using Windows BitLocker. If Windows won’t boot, the Windows HDD boot sector may need repair. Windows error 0x800703ed may occur if a dual boot system tries to start Windows from Grub instead of directly via BIOS / UEFI selection. It is best to use a separate hard drive for Windows.

Back up the PC to an external hard drive or the cloud. Create a bootable Windows USB drive via the Windows Media Creation Tool, or use Rufus to write the ISO to USB. Boot from USB and select a partition to install Windows on. Don’t delete Recovery partitions or you may lose your Windows OEM license.

Wget HSTS database

HSTS can enhance security, so normally we’d like to have HSTS working. If the Wget HSTS database file permissions are incorrect, wget may emit messages like:

Will not apply HSTS. The HSTS database must be a regular and non-world-writable file.
could not open HSTS store at '~/.wget-hsts'. HSTS will be disabled.

Fix: make the .wget-hsts file have normal file permissions:

chmod 644 ~/.wget-hsts
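The same check and fix can be sketched in Python, which may be handy inside a setup script. The function names below are hypothetical; the logic only tests the "other users may write" permission bit that the wget message complains about, then applies the equivalent of chmod 644.

```python
import stat
from pathlib import Path


def is_world_writable(path: Path) -> bool:
    """True if 'other' users have write permission on the file."""
    return bool(path.stat().st_mode & stat.S_IWOTH)


def fix_permissions(path: Path) -> None:
    """Equivalent of chmod 644: owner read/write, group and others read-only."""
    path.chmod(0o644)
```

Typical use would be to call is_world_writable(Path.home() / ".wget-hsts") and, if it returns True, fix_permissions() on the same path.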