Scientific Computing

Windows coreutils from Microsoft

The ubiquitous GNU coreutils has long been missing from a native Windows install. Until now, we invoked coreutils utilities through WSL with wsl <coreutils command>. Microsoft has enhanced the Rust-based uutils coreutils to run natively on Windows and has made it available via WinGet:

winget install --id=Microsoft.Coreutils -e

Close and reopen Terminal windows to use coreutils. Which tools are available, and which ones conflict with built-in commands, differs between the ComSpec Command Prompt and PowerShell. Some commands are so POSIX-specific that they are not available or relevant in Microsoft coreutils. Other coreutils commands overlap or conflict so heavily with built-in Windows commands that they are omitted from Microsoft coreutils. Command parsing in Microsoft coreutils also differs in places from standard coreutils.

A general issue across systems that use “coreutils”, such as embedded or other minimal systems where not all coreutils are available, is that the developer must handle cases where a coreutils tool is unavailable or shadowed by something else. Build systems like CMake handle similar problems, such as what exactly “gcc” is when multiple compilers masquerade as “gcc”: CMake inspects the version string to identify the compiler vendor. A script that consumes coreutils on Windows needs to handle issues like the following, where MSVC “link” overrides coreutils “link”:

where.exe link
C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\*\bin\HostARM64\arm64\link.exe
C:\Program Files\coreutils\bin\link.exe
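
A script can guard against this kind of shadowing by listing every match on PATH instead of trusting the first one. The following is a minimal Python sketch, not something shipped with Microsoft coreutils. The helper names find_all and pick, and the idea of selecting a match by a directory-name fragment such as "coreutils", are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_all(name: str, path: Optional[str] = None) -> List[Path]:
    """Return every file called `name` on the search path, in PATH order.

    Similar in spirit to `where.exe name`; shutil.which() returns only
    the first match. PATHEXT is not consulted, so pass the full file
    name (for example "link.exe") on Windows.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for entry in path.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry) / name
        if candidate.is_file():
            matches.append(candidate)
    return matches


def pick(name: str, fragment: str, path: Optional[str] = None) -> Optional[Path]:
    """Return the first match whose directory contains `fragment` (case-insensitive)."""
    for candidate in find_all(name, path):
        if fragment.lower() in str(candidate.parent).lower():
            return candidate
    return None
```

For example, pick("link.exe", "coreutils") would return the coreutils copy even when the MSVC linker appears earlier on PATH, and None if no such copy is found.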

See the project's Cargo.toml for the list of commands included in Microsoft coreutils.

We have long augmented CMake projects with Bash and PowerShell scripts to handle tasks too awkward for CMake. Python usually has enough built-in capability in “os”, “pathlib”, and “shutil” to avoid needing coreutils in Python scripts.
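
As a rough illustration of that last point, the sketch below performs a few coreutils-style operations with only the standard library. The function name demo and the file names are placeholders, and the shell commands in the comments are approximate equivalents rather than exact matches.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def demo(root: Path) -> List[str]:
    """Run a few coreutils-style operations under `root` using only the stdlib."""
    src = root / "a" / "b"
    src.mkdir(parents=True, exist_ok=True)                         # mkdir -p a/b
    (src / "note.txt").touch()                                     # touch a/b/note.txt
    shutil.copy2(src / "note.txt", root / "copy.txt")              # cp -p a/b/note.txt copy.txt
    shutil.move(str(root / "copy.txt"), str(root / "moved.txt"))   # mv copy.txt moved.txt
    shutil.rmtree(root / "a")                                      # rm -r a
    return sorted(p.name for p in root.iterdir())                  # ls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(Path(tmp)))
```

Because these calls behave the same on Windows, Linux, and macOS, the script does not need to care which "cp" or "rm" is first on PATH.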

GNU / Windows ?

Could Windows with Microsoft coreutils be considered a GNU / Windows hybrid? Strictly speaking no, because Microsoft coreutils is based on uutils coreutils, an MIT-licensed reimplementation of GNU coreutils in Rust, and is not GNU software. GNU / Linux is a common term for Linux distributions that include GNU utilities, and Microsoft coreutils brings work-alikes of many of those utilities to Windows. While it is not a GNU environment, it provides a significant portion of the familiar coreutils command set on Windows, making it a sort of hybrid in terms of command-line utilities.

As background, the core components of a typical GNU/Linux system include:

  • Linux Kernel: Core of the system, handling hardware and process management
  • GNU Utilities: Essential tools for file management, text processing, and system administration
  • Display Server and Desktop Environment: X11 or Wayland for graphics, with desktop environments like GNOME, KDE Plasma, or Xfce
  • Package Manager: Software installation and updates (e.g., APT, DNF, Pacman).
  • Shell: Command-line interface for interacting with the system.

On Windows the core components include:

  • Windows Kernel: Core of the system, handling hardware and process management
  • Microsoft coreutils: Essential tools for file management, text processing
  • Display Server and Desktop Environment: Windows GUI for graphics and user interface
  • Package Manager: WinGet for software installation and updates
  • PowerShell: Command-line interface for interacting with the system

On macOS the core components include:

  • XNU Kernel: Core of the system, handling hardware and process management
  • BSD Utilities: Essential tools for file management, text processing, and system administration
  • Display Server and Desktop Environment: Quartz for graphics, with the macOS desktop environment
  • Package Manager: Homebrew for software installation and updates
  • Shell: Terminal with Zsh for command-line interface

Python vs. Julia vs. GNU Octave in research

Python has become a dominant language for scientific computing, data analysis, machine learning, and engineering workflows. Julia offers a modern high-performance syntax specifically designed for numerical and scientific computing. GNU Octave is an open-source MATLAB alternative with largely MATLAB-compatible syntax.

GNU Octave was originally written by John W. Eaton and continues to be developed by a community of contributors. Octave is a high-level interpreted language designed for numerical computations. New major versions are released roughly yearly.

Octave shines when you need:

  • Near drop-in compatibility with MATLAB .m files (as long as proprietary toolboxes aren’t required).
  • A quick way to test whether it’s worth porting a MATLAB function or script to Python.
  • Calling MATLAB/Octave functions directly from Python using Oct2Py.

Octave includes its own growing set of packages (toolboxes) that extend its capabilities in areas like signal processing, control systems, and optimization.

Julia

Julia is a modern, high-performance language designed specifically for scientific and numerical computing. It aims to combine the ease of use of Python/MATLAB with the speed of C/Fortran.

Julia excels when:

  • You need high performance without dropping to lower-level languages (JIT compilation often delivers near-C speeds for numerical loops and linear algebra).
  • Working on large-scale simulations, differential equations, optimization, or other compute-intensive scientific tasks.
  • You want a clean, math-friendly syntax with advanced features like multiple dispatch, metaprogramming, and excellent built-in support for parallelism and distributed computing.
  • Reproducibility and package management are priorities (via its built-in package manager).

Julia has strong libraries for data science, machine learning, visualization, and more, though its overall ecosystem is smaller than Python’s. It’s particularly appealing for researchers writing performance-critical code from scratch.

Python

Key Advantages of Python:

  • Vast ecosystem: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, PyTorch/TensorFlow, and thousands of other specialized libraries cover everything from microcontrollers to supercomputers.
  • Scalability: The same language and core libraries work from embedded devices → Raspberry Pi → laptops → HPC clusters.
  • Reproducibility: Open-source nature means anyone can run your code with pip install or conda environments—no license server or version-matching headaches.
  • Embedded / IoT support: Since 2014, MicroPython has brought a capable subset of Python (including exception handling, coroutines, etc.) to low-cost hardware like the Raspberry Pi Pico and many other MCUs/SoCs.

Python’s general-purpose nature also makes it easier to integrate with web apps, databases, GUIs, automation scripts, and version control workflows—areas where Octave is weaker.

Comparison Table

Use Case | Recommended Tool | Reason
Quick MATLAB script testing / porting | GNU Octave | Best compatibility
Teaching numerical methods | Any | Octave for pure MATLAB feel; Python for broader skills; Julia for high-performance numerical work
Large-scale data analysis & ML | Python | Mature ecosystem and tooling
High-performance numerical simulations | Julia or Python + Numba/Cython | Julia for clean high-speed code
Embedded / low-cost hardware | Python (MicroPython) | Much broader hardware support
Reproducible open research | Python or Julia | No licensing barriers
Existing large MATLAB codebase | Octave (or Python + oct2py) | Minimize immediate rewrite cost

With Python and Oct2Py, Octave can be a bridge for those transitioning away from MATLAB. While Python is often a default choice for new projects, Julia can be a compelling alternative for high-performance numerical work.

Other Mathematical Software

These systems generally have smaller user bases than Python or MATLAB/Octave, largely due to historical momentum and narrower focus.

  • SageMath — Open-source computer algebra system with excellent symbolic math capabilities.
  • Scilab — Another free MATLAB-like environment.
  • GDL — Open-source IDL work-alike, common in astronomy and geophysics.
  • Mathematica / Maple — Proprietary tools with strong symbolic mathematics focus.

Configure shells Bash, Zsh, PowerShell

The default interactive shell for operating systems is typically:

  • Linux: Bash
  • macOS: Zsh
  • Windows: PowerShell

Note that the non-interactive shell may default to a simpler POSIX shell like Dash, so ensure that each script's shebang line specifies the shell intended to run it.

Each shell has configuration files to change its default parameters. Shells typically keep a persistent command history file that stores the commands that have been executed, which allows users to recall and reuse previous commands. A very long history may retain mistyped commands or commands that are no longer relevant.
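
Separately from the shell settings, an existing history file that has already grown long can be pruned once. The sketch below is an illustration only: it assumes a plain history format with one command per line (Zsh extended history and multi-line entries are not handled), and the function name trim_history is hypothetical. Back up the history file before rewriting it with anything like this.

```python
from typing import List


def trim_history(lines: List[str], keep: int = 500) -> List[str]:
    """Drop blank lines and older duplicates, then keep the `keep` most recent commands.

    The most recent occurrence of a repeated command is the one retained,
    and the original (oldest to newest) order is preserved in the result.
    """
    seen = set()
    newest_first = []
    for line in reversed(lines):
        cmd = line.rstrip("\n")
        if not cmd.strip() or cmd in seen:
            continue
        seen.add(cmd)
        newest_first.append(cmd)
    return list(reversed(newest_first[:keep]))
```

A caller would read the file with splitlines(), pass the list through trim_history, and write the result back with one command per line.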

Bash

Get the location of the Bash command history file:

echo "${HISTFILE:-$HOME/.bash_history}"

Edit the ~/.bashrc file to include the following settings:

# Number of commands remembered in the current session (in memory)
export HISTSIZE=500

# Number of commands saved to the history file on disk
# Keep at least a little bigger than HISTSIZE to handle duplicates
export HISTFILESIZE=1000

# Ignore consecutive duplicate commands and commands that start with a space
export HISTCONTROL=ignoredups:ignorespace

Zsh

Get the location of the Zsh command history file:

echo "${HISTFILE:-${ZDOTDIR:-$HOME}/.zsh_history}"

Edit the ~/.zshrc file to include the following settings:

# Number of commands remembered in the current session (in memory)
export HISTSIZE=500

# Number of commands saved to the history file on disk
# Zsh uses SAVEHIST for this; HISTFILESIZE is a Bash variable that Zsh ignores
export SAVEHIST=500

setopt hist_ignore_dups
setopt hist_ignore_space

PowerShell

Get the location of the PowerShell command history file:

(Get-PSReadLineOption).HistorySavePath

Edit the “$profile” file to include the following settings: ignore duplicates and limit the number of commands in the history.

Set-PSReadLineOption -MaximumHistoryCount 500 `
                     -HistoryNoDuplicates

CMake shorten build paths

CMake can be configured to use shorter paths in the build tree, which is important for large or complex projects on Windows where the 260 character path limit is a problem for some tools. This is done via CMAKE_INTERMEDIATE_DIR_STRATEGY, which can be set as an environment variable or as a cache variable with -D on the cmake command line. The default is to use full paths for human readability, but when path length is a problem, this variable can be set to SHORT to use shorter paths.

The example below is contrived to use a long source file path. In practice the problem comes from nested dependencies and build directories, which can easily exceed the 260 character limit on Windows. The example still demonstrates the issue and the effect of the shorter build path setting.

cmake_minimum_required(VERSION 4.2)

project(soLong LANGUAGES CXX)

# make a long path to demonstrate the issue
set(long_path "${CMAKE_BINARY_DIR}/this/is/a/very/long/path/that/will/exceed/the/260/character/limit/on/windows/when/building/a/project/with/cmake/lets/see/if/it/works/with/the/shorten/build/path/option/just/to/make/sure/it/is/long/enough/to/exceed/the/limit/we/need/to/make/sure/it/is/long/enough/to/exceed/the/limit/")

string(LENGTH "${long_path}" L)

message(STATUS "Long path length: ${L} characters")
message(STATUS "CMAKE_INTERMEDIATE_DIR_STRATEGY: ${CMAKE_INTERMEDIATE_DIR_STRATEGY}")

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
message(STATUS "See file ${CMAKE_BINARY_DIR}/compile_commands.json for the compile commands with the long path")

file(MAKE_DIRECTORY "${long_path}")

file(GENERATE OUTPUT "${long_path}/main.cpp" CONTENT "int main() { return 0; }")

add_executable(soLong "${long_path}/main.cpp")

Compare the -o flag parameter between these two commands. The SHORT path will be much shorter than the FULL one.

cmake -Bbuild -DCMAKE_INTERMEDIATE_DIR_STRATEGY=SHORT && cat build/compile_commands.json
cmake -Bbuild -DCMAKE_INTERMEDIATE_DIR_STRATEGY=FULL && cat build/compile_commands.json
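
Rather than comparing by eye, the object path lengths can be read from compile_commands.json. This is a rough Python sketch under stated assumptions: it prefers the optional "output" field of each entry and otherwise falls back to the argument after "-o", which only applies to GCC/Clang-style command lines. The function names are illustrative, and shlex.split uses POSIX quoting rules, so the fallback may mis-handle Windows backslash paths.

```python
import json
import shlex
from pathlib import Path
from typing import Dict, List, Tuple


def output_paths(compile_commands: List[Dict]) -> List[str]:
    """Return the object-file path of each entry in a compilation database."""
    outputs = []
    for entry in compile_commands:
        if "output" in entry:
            outputs.append(entry["output"])
            continue
        # Fallback: take the argument that follows "-o" on the command line
        args = entry.get("arguments") or shlex.split(entry.get("command", ""))
        if "-o" in args[:-1]:
            outputs.append(args[args.index("-o") + 1])
    return outputs


def longest_output(build_dir: str) -> Tuple[int, str]:
    """Return (length, path) of the longest object path, or (0, "") if none found."""
    text = (Path(build_dir) / "compile_commands.json").read_text()
    return max(((len(p), p) for p in output_paths(json.loads(text))), default=(0, ""))
```

Calling longest_output("build") after each configure gives a single number to compare between the SHORT and FULL strategies.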

CMake print cache variables

CMake can print cache variables during the configuration phase using any of these methods.

The “cmake” command itself can print cache variables to the console. Variable values may be set by passing -D options to the “cmake” command, or by editing them in the CMake GUI or “ccmake” interface.

cmake -Bbuild -LAH
  • -L: Print only the variable names and values, without help messages.
  • -LA: Print all variables, including advanced ones that are not shown by default.
  • -LAH: Also print the help message for each variable.

The CMake GUI is available if installed and a graphical desktop is available. Press “Configure” to see the cache variables. Values may be edited if desired.

cmake-gui -S . -B build

The “ccmake” Curses-based interface is available on non-Windows platforms, which can also edit cache variables.

ccmake -B build

From the “ccmake” interface, press “c” to configure, “t” to toggle visibility of Advanced variables that are not shown by default.

MATLAB Terminal app

MathWorks published a Terminal emulator app for MATLAB, which is performant and well-integrated with MATLAB. It does not require any MATLAB toolboxes and can be used on all platforms that MATLAB supports. MATLAB Terminal supports multiple tabs, customizable themes, and various shell and AI Agent environments. For those using a separate IDE like VS Code with MATLAB, the integrated terminal in VS Code is still a good choice. For users who prefer to work directly in MATLAB, the MATLAB Terminal app is a great addition.

Windows symbolic links and reparse points

Symbolic links are useful in any operating system to shorten long, complicated path names like C:/user/foo/data to just C:/data. If encountering problems with user permission, set user permission to create symbolic links on Windows.

PowerShell symbolic link creation syntax:

New-Item -ItemType SymbolicLink -Path "Link" -Target "Target"

# for example:
New-Item -ItemType SymbolicLink -Path "my_program.exe" -Target "path/to/my_program.123.exe"

# also for directories:
New-Item -ItemType SymbolicLink -Path "my_fun_dir" -Target "path/to/my_dir"

For clarity, specify the full path to the target file or directory. Especially avoid target “.” or “..” as these can be confusing.

Windows coreutils

With Windows coreutils, symbolic links are created the same way as on Linux, macOS, and other Unix-like systems:

ln -s path/to/target path/to/link

Reparse points

Symbolic links on Windows are a type of reparse point. fsutil can tell the type of reparse point:

fsutil reparsepoint query "my_fun_dir"

Reparse Tag Value : 0xa000000c

The reparse tag value corresponds to a symbolic link IO_REPARSE_TAG_SYMLINK.

Python test_symlink.py shows symlinks using Python standard library pathlib.
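
The following is a separate minimal sketch (not the test_symlink.py mentioned above) of creating and inspecting a symbolic link with the Python standard library. The function name describe_link is illustrative. The st_reparse_tag attribute is only present on Windows, so the sketch reports None for it elsewhere, and the tag constant is copied from the fsutil output above.

```python
import os
import tempfile
from pathlib import Path

IO_REPARSE_TAG_SYMLINK = 0xA000000C  # tag value shown by fsutil for a symbolic link


def describe_link(link: Path) -> dict:
    """Report whether `link` is a symbolic link and, on Windows, its reparse tag."""
    st = os.lstat(link)  # lstat inspects the link itself, not its target
    tag = getattr(st, "st_reparse_tag", None)  # attribute exists only on Windows
    is_link = link.is_symlink()
    return {
        "is_symlink": is_link,
        "target": os.readlink(link) if is_link else None,
        "reparse_tag": tag,
        "tag_is_symlink": None if tag is None else tag == IO_REPARSE_TAG_SYMLINK,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target_dir"
        target.mkdir()
        link = Path(tmp) / "my_fun_dir"
        # On Windows this may fail without permission to create symbolic links
        link.symlink_to(target, target_is_directory=True)
        print(describe_link(link))
```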

App Execution Alias

fsutil reparsepoint query $Env:LOCALAPPDATA/Microsoft/WindowsApps/wt.exe

Reparse Tag Value : 0x8000001b

The reparse tag value 0x8000001b is a Windows App Execution Alias IO_REPARSE_TAG_APPEXECLINK. App Execution Aliases are not symbolic links, but are a way for Windows CreateProcess to find the correct executable to run from a user-friendly name like “wt.exe” or “bash.exe”.

Not every language works with App Execution Aliases at this time. Java io and nio don’t work with App Execution Aliases currently. Python does work with App Execution Aliases, for example:

python -c "import shutil; print(shutil.which('wt.exe'))"

Unix-like shell

On a Unix-like shell including WSL, softlinks are created like:

ln -s target link

Fortran stack to static warning

GCC / Gfortran 10 and newer warn about arrays too big for the current stack settings. Arrays that exceed the stack limit may cause unexpected behavior; in general, large arrays should be allocatable and created with allocate() instead. Example of improper use of stack memory:

subroutine big_array()

real :: big2(1000,1000)

end subroutine

Warning: Array ‘big2’ at (1) is larger than limit set by ‘-fmax-stack-var-size=’, moved from stack to static storage. This makes the procedure unsafe when called recursively, or concurrently from multiple threads. Consider using ‘-frecursive’, or increase the ‘-fmax-stack-var-size=’ limit, or change the code to use an ALLOCATABLE array. [-Wsurprising]

This warning is generally legitimate when arrays declared as above are too large for the stack. Simply making the procedure recursive may lead to segfaults.
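
To see why such an array trips the warning, it helps to estimate its size. The short Python calculation below assumes a default Fortran real occupies 4 bytes, which is typical but depends on the compiler and flags; the function name array_bytes is only for illustration.

```python
from math import prod
from typing import Sequence


def array_bytes(shape: Sequence[int], bytes_per_element: int = 4) -> int:
    """Estimate the storage needed for a dense array of the given shape."""
    return prod(shape) * bytes_per_element


if __name__ == "__main__":
    # real :: big2(1000,1000) with 4-byte reals
    print(array_bytes((1000, 1000)))  # 4000000 bytes, about 4 MB
```

An array of a few megabytes is comparable to common default stack sizes, which are themselves on the order of megabytes, so placing it on the stack leaves little or no headroom.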

Correct the example above like:

subroutine big_array()

real, allocatable :: big2(:,:)

allocate(big2(1000,1000))

end subroutine

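For multiple arrays of the same shape, allocate the first array and use it as the mold for the rest:

subroutine big_array()

integer, parameter :: M=1000, N=2000, P=500

real, allocatable, dimension(:,:,:) :: w, x, y, z

allocate(w(M,N,P))
! mold= must refer to an array that is already allocated
allocate(x, y, z, mold=w)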

end subroutine

Intel oneAPI caution

Intel oneAPI has the option -heap-arrays, but we recommend avoiding this option as it can cause memory leaks.

Shared Fortran / C++ libraries on Windows

Windows has particular linking requirements for shared libraries that can become challenging with MSVC-like compilers such as Intel oneAPI when linking Fortran and C libraries together. In short, the workaround is to use static libraries instead of shared libraries for such cases on Windows.

Example: MUMPS-superbuild project provides MUMPS libraries in the same way as the original MUMPS project’s Makefiles, that is with a library called “mumps_common” and then a library for each of 4 numerical precisions “smumps”, “dmumps”, “cmumps”, and “zmumps” that link against “mumps_common”. This is quite robust across compilers and linkers - the only issue is on Windows with oneAPI when building shared libraries.

Building shared libraries for MUMPS-superbuild is done by:

cmake --workflow shared

Most unresolved externals with oneAPI on Windows were symbols like:

MUMPS_BUF_COMMON_mp_...
MUMPS_OOC_COMMON_mp_...
MUMPS_LR_COMMON_mp_...
MUMPS_LR_STATS_mp_...

A first hypothesis was that auto-export was incomplete. CMake on Windows can auto-export symbols via CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS. In mixed C and Fortran projects, that is often good enough. Given the unresolved names, it was reasonable to suspect missing DATA exports from mumps_common.dll.

We ran a few checks to diagnose:

  1. Inspected linker diagnostics to collect unresolved symbol families.
  2. Inspected generated export definition .def files to see what CMake actually exported.
  3. Compared expected names versus actual exports in the produced DLL/import library path.
  4. Tested oneAPI-specific compile/link flag ideas.
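
Steps 2 and 3 can be partly automated. The sketch below is an assumption-laden illustration, not the tooling actually used for this investigation: it assumes a simple .def layout with an EXPORTS keyword followed by one symbol per line (optionally followed by keywords such as DATA), and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def parse_def_exports(def_text: str) -> Set[str]:
    """Collect symbol names listed under EXPORTS in a module-definition (.def) file."""
    exports = set()
    in_exports = False
    for raw in def_text.splitlines():
        tokens = raw.split(";", 1)[0].split()  # ';' starts a comment in .def files
        if not tokens:
            continue
        if tokens[0].upper() == "EXPORTS":
            in_exports = True
            tokens = tokens[1:]  # a symbol may follow EXPORTS on the same line
            if not tokens:
                continue
        if in_exports:
            # First token is the exported name; drop any "=internal_name" alias
            exports.add(tokens[0].split("=", 1)[0])
    return exports


def missing_exports(unresolved: Iterable[str], def_text: str) -> List[str]:
    """Return the unresolved symbols that the .def file does not export, sorted."""
    return sorted(set(unresolved) - parse_def_exports(def_text))
```

Feeding the unresolved names collected from the linker output into missing_exports gives the list of symbols that auto-export did not cover.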

Auto-export did miss symbols that appeared in unresolved lists. But even after supplementing exports, unresolved externals persisted. We tested multiple fixes, including:

  • Supplemental export definitions for missing DATA symbols.
  • oneAPI flag experiments intended to improve dynamic common/module handling.
  • Internal bridge-style link topology changes.

Why these were rejected:

  • Supplemental exports alone did not clear unresolved module-data references.
  • oneAPI flag experiments were either ineffective or unstable for this build.
  • Bridge-link approaches can alter expected import-library behavior and create downstream risk for users linking mumps_common.lib.

The root cause appears to be broader than CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS. Auto-export can be incomplete, but even with explicit extra exports, oneAPI Fortran on Windows still struggled to resolve cross-DLL module/common data references in this configuration.

Practical recommendations

To maintain mixed Fortran/C HPC packaging across compilers, validate shared-library topology on every target compiler, especially on Windows. Treat Fortran module/common data across Windows DLL boundaries as a first-class risk. To keep it simple, use static libraries.