Google’s Search Console is useful for uncovering lagging performance in web pages, including for blogs.
I tend to write terse posts that address very specific issues.
Often these pages perform well, but I saw a few percent of pages marked in Search Console with the status “crawled - currently not indexed”.
The underlying theme on these pages was they had too little ordinary paragraph text.
If there are too many lists, headers, or preformatted text relative to plain paragraphs, this “not indexed” status is likely to be applied.
A few of these types of pages also suffered from “soft 404” status.
I found these were very short pages that contained text with “error” or “missing”.
I reworded those articles to avoid those terms.
I made sure the titles didn’t include those terms.
I also ensured there were not too many header tags relative to the text–perhaps one header at most per “page” of text.
The fix to these issues is generally to include more meaningful text–be sure an article is at least one or two full paragraphs.
Add context that would help a more novice user understand why you applied that solution or approach.
Avoid sensational or colloquial text as the search engines are smart enough to recognize this as low quality writing.
As always, maintaining good spelling and adequate grammar helps the search engine better appraise the quality of your content.
Also consider short (less than 50 character) but meaningful page titles.
For long-lived blogs such as this one, there is inevitably content that is no longer relevant to anyone except for historical purposes.
You may not want to simply delete posts you took time to research and share, but realize these old posts cost you current performance by wasting crawl budget.
The approach I take is to mark these old pages with “noindex” metadata.
That allows reminiscing about old technology such as Blackberry OS 10 without degrading the performance of currently relevant content.
I think of it as a soft deprecation of the content.
More convenient array broadcasting was added to Matlab years ago, removing the need for bsxfun.
Python Numpy has even more advanced array indexing and broadcasting features to conserve memory, speeding computations.
When translating between Matlab and Python, avoid simply replacing Matlab repmat with numpy.tile, which copies data in memory.
It may be possible to use numpy.newaxis or numpy.broadcast_arrays instead for O(1) speed and memory savings.
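A small sketch of the difference, using toy shapes:

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(4).reshape(4, 1)   # shape (4, 1)

# repmat / numpy.tile style: physically copies "a" four times -> (4, 3) array
tiled = np.tile(a, (4, 1))

# broadcasting style: a[np.newaxis, :] is an O(1) view of shape (1, 3)
view = a[np.newaxis, :]

# both approaches give the same elementwise result,
# but the broadcast version never copies "a"
assert (tiled + b == view + b).all()
```

For large arrays, the broadcast version avoids materializing the tiled copy entirely.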
Homebrew’s brew cleanup saves disk space by deleting old versions of packages.
From time to time, CI macOS images get out of date with Homebrew, and auto-cleanup is triggered unintentionally upon brew install during a CI run.
There is no benefit to this on CI.
Disable Homebrew auto-cleanup by setting environment variable HOMEBREW_NO_INSTALL_CLEANUP=1.
On GitHub Actions, set this near the top of the particular .github/workflows/*.yml file.
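A minimal sketch of the workflow-level env block (the surrounding workflow keys and filename are placeholders):

```yaml
# near the top of .github/workflows/ci.yml
env:
  HOMEBREW_NO_INSTALL_CLEANUP: 1
```

Setting it at the workflow level applies it to every job and step, so any brew install skips auto-cleanup.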
GCC (including G++ and Gfortran) on Windows is available via several means.
We generally prefer MSYS2 for using GCC and numerous libraries with the pacman package manager.
Permuting and transposing array dimensions are common operations in data analysis.
Ideally these operations would be done with index tricks instead of copying arrays.
In Matlab, these operations copy the array (slow, expensive).
In Python, these operations are an O(1) index manipulation, creating a new view into the array (fast).
In Matlab, transpose and permute are distinct functions: transpose() is strictly for 2-D matrices, while permute() reorders the dimensions of N-D arrays.
In Python, xarray and Numpy arrays are popular.
Both use the .transpose() method for N-D array dimension reordering–there is no separate “permute” method.
However, the syntax is distinct between xarray and Numpy.
Numpy transpose can also permute by specifying a tuple of the desired axis order.
Omitting the axes order argument simply reverses the order of the axes for N-dimensional arrays.
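A brief sketch of both forms, with a toy 3-D array:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# explicit axis order, analogous to Matlab permute(A, [3 1 2]) (one-based)
B = A.transpose(2, 0, 1)
assert B.shape == (4, 2, 3)

# no argument: the axis order is simply reversed
C = A.transpose()
assert C.shape == (4, 3, 2)

# both results are views into A, not copies
assert np.shares_memory(A, B) and np.shares_memory(A, C)
```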
On Windows, CMake defaults to the Visual Studio and NMake generators, which may not work for some projects.
The Ninja build system with CMake is generally faster to build, and especially to rebuild, regardless of operating system.
Ninja on Windows solves numerous issues vs. GNU Make.
Ninja works with Visual Studio as well.
Override the default CMake generator by setting environment variable CMAKE_GENERATOR=Ninja.
CMAKE_GENERATOR can be overridden (e.g. to use GNU Make from MSYS2) like:
cmake -G "MinGW Makefiles"
Older CMake on Windows may get the message below.
Fix by upgrading to CMake ≥ 3.17.
sh.exe was found in your PATH
For MinGW make to work correctly sh.exe must NOT be in your path.
Run cmake from a shell that does not have sh.exe in your PATH.
If you want to use a UNIX shell, then use MSYS Makefile
A common numerical operation is cumulative summing.
The cumsum() output array has the same shape as the input array, with each element being the sum of all elements up to and including it along that dimension.
Matlab cumsum and Python numpy.cumsum operate quite similarly.
When translating code between Matlab and Python, as always keep in mind Matlab’s one-based indexing vs. Python’s zero-based indexing.
That is, when using cumsum() over an axis, be sure to select the correct axis–Matlab cumsum(..., 1) is equivalent to numpy.cumsum(..., axis=0).
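A quick sketch of the axis correspondence on a toy 2-by-2 array:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# Matlab cumsum(A, 1): sum down the first dimension (columns)
col = np.cumsum(A, axis=0)
assert (col == np.array([[1, 2], [4, 6]])).all()

# Matlab cumsum(A, 2): sum along the second dimension (rows)
row = np.cumsum(A, axis=1)
assert (row == np.array([[1, 3], [3, 7]])).all()
```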
When working with large grids, the grid itself can consume considerable memory.
Large grids can require more advanced techniques like working with interpolants and using sparse grids.
Many grids that aren’t too large can use common functions like “meshgrid” to make certain plotting or linear algebra operations easier.
Matlab and Python (Numpy) can readily work with these data structures.
We show the differences and similarities to allow understanding and translating code between Matlab and Python.
Matlab meshgrid and ndgrid generate distinct data for the first two dimensions, while other dimensions remain the same.
This is due to meshgrid() being intended for plotting, and ndgrid() intended for matrix operations.
Numpy is used for most array computation in Python–many Python numerical libraries use Numpy internally.
Numpy has additional advanced grid generators numpy.mgrid and numpy.ogrid (index objects used with brackets rather than function calls)–here we will focus on numpy.meshgrid.
numpy.meshgrid also has copy=False and sparse=True options that allow conserving memory.
For simplicity, we will use the defaults, which is a dense copied mesh.
Equivalent to Matlab meshgrid():
x,y = numpy.meshgrid([1,2], [1,2], indexing='xy')
>>> x
array([[1, 2],
[1, 2]])
>>> y
array([[1, 1],
[2, 2]])
Equivalent to Matlab ndgrid():
x,y = numpy.meshgrid([1,2], [1,2], indexing='ij')
>>> x
array([[1, 1],
[2, 2]])
>>> y
array([[1, 2],
[1, 2]])
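The sparse=True option mentioned above returns broadcastable arrays instead of dense copies; a small sketch with toy sizes:

```python
import numpy as np

# sparse=True yields shape (1, nx) and (ny, 1) arrays
# instead of full ny-by-nx copies, conserving memory on large grids
x, y = np.meshgrid(np.arange(1000), np.arange(2000), sparse=True)
assert x.shape == (1, 1000) and y.shape == (2000, 1)

# broadcasting expands the sparse grids on the fly in expressions
z = x + y
assert z.shape == (2000, 1000)
```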
A recent PyHC presentation by Ian of US-RSE noted a few journals relevant to publishing research software.
We have used engrXiv for software preprints.
Refereed journals include:
When using Intel oneAPI compilers on Windows, you may get a message like:
INTERNAL ERROR: pgopti_Create_Full_Path: buffer too small
This is probably caused by a filepath that’s more than 139 characters.
You may not see the paths printed as being that long, because CMake with Ninja preprocesses each Fortran file, which lengthens the filename.
The fix is to make the project paths shorter.
Users may have to build code from a directory nearer the top of the filesystem hierarchy.