
Boost Solution Performance by Knowing What Your Software is Running On


When developing software, understanding the system on which your software runs is paramount. By tapping into hardware and system details, developers can tailor applications to leverage maximum performance. Here's a guide on how to use Python to retrieve this information.

As long as you have a Google account, feel free to follow along in this reference notebook:

Open In Colab

1. Starting with the System Information: platform and lsb_release

The Python standard library has a module named platform which can provide a wealth of information about the system, OS, and its version.

import platform

uname = platform.uname()
print(f"System: {uname.system}")  # e.g. 'Windows', 'Linux', 'Darwin' for macOS
print(f"Node Name: {uname.node}")
print(f"Release: {uname.release}")  # e.g. '10' for Windows 10
print(f"Version: {uname.version}") # Detailed version information
print(f"Machine: {uname.machine}") # e.g. 'x86_64' for a 64-bit machine
print(f"Processor: {uname.processor}")  # e.g. 'Intel64 Family 6 Model 78 Stepping 3, GenuineIntel'

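To get a bit more information about the OS on Debian-based distributions such as Ubuntu, you can then use the lsb_release module. It is not part of the Python standard library; it is installed along with the distribution's lsb-release package, so the import may fail on other systems.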

import lsb_release
import pprint
pprint.pprint(lsb_release.get_os_release())
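
Since lsb_release is not importable everywhere, a fallback can be handy. Here is a minimal sketch, assuming a Linux system that provides an /etc/os-release file; parse_os_release and read_os_release are hypothetical helper names, not part of any library.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines in the os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blanks, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # values may be wrapped in single or double quotes
        info[key] = value.strip().strip("\"'")
    return info


def read_os_release(path="/etc/os-release"):
    """Return the parsed file, or an empty dict if it does not exist."""
    file = Path(path)
    return parse_os_release(file.read_text()) if file.is_file() else {}


print(read_os_release().get("PRETTY_NAME", "unknown"))
```

On Python 3.10 and later, platform.freedesktop_os_release() in the standard library reads the same file, so that may be the simpler choice when it is available.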

2. Extracting Detailed CPU and Hardware Information: cpuinfo

The py-cpuinfo and psutil modules offer detailed information about the CPU.

First, you need to ensure they are installed:

!pip install py-cpuinfo psutil

Now we can use them. For a quick, high-level look at the available CPUs and their performance, start with psutil.

import psutil

# number of cores
print("Physical cores:", psutil.cpu_count(logical=False))
print("Total cores:", psutil.cpu_count(logical=True))
# CPU frequencies (cpu_freq() can return None when the platform does not expose them)
cpufreq = psutil.cpu_freq()
if cpufreq is not None:
    print(f"Max Frequency: {cpufreq.max:.2f}MHz")
    print(f"Min Frequency: {cpufreq.min:.2f}MHz")
    print(f"Current Frequency: {cpufreq.current:.2f}MHz")
# CPU usage
print("CPU Usage Per Core:")
for i, percentage in enumerate(psutil.cpu_percent(percpu=True, interval=1)):
    print(f"Core {i}: {percentage}%")
print(f"Total CPU Usage: {psutil.cpu_percent()}%")

However, a more detailed look helps you understand which generation of hardware you are running on and which features are available, so you can tap into every bit of performance and workload optimization on offer.

import cpuinfo
import pprint
my_cpuinfo = cpuinfo.get_cpu_info()
pprint.pprint(my_cpuinfo)

In the environment I'm running on, I can see the hardware flags avx and avx2, which means I have Advanced Vector Extensions to accelerate the vector computations common in AI and ML workloads. However, I don't see any amx flags (such as amx_tile), which indicate the Advanced Matrix Extensions that, at the time of this writing, are only available on Intel's newest server platform, Sapphire Rapids.
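
Scanning a long flag list by eye is error-prone, so a small check can help. This is a minimal sketch: summarize_simd is a hypothetical helper name, and the sample flag list is made up for illustration rather than copied from a real machine.

```python
def summarize_simd(flags):
    """Report which vector/matrix instruction families appear in a flag list."""
    flags = set(flags)
    return {
        "avx": "avx" in flags,
        "avx2": "avx2" in flags,
        # AVX-512 and AMX show up as families of flags (e.g. avx512f, amx_tile),
        # so match on the prefix instead of one exact name
        "avx512": any(f.startswith("avx512") for f in flags),
        "amx": any(f.startswith("amx") for f in flags),
    }


# illustrative flag list only
sample_flags = ["fpu", "sse4_2", "avx", "avx2", "fma"]
print(summarize_simd(sample_flags))
# {'avx': True, 'avx2': True, 'avx512': False, 'amx': False}
```

In the notebook, the real list would come from the dictionary returned by cpuinfo.get_cpu_info(), for example my_cpuinfo.get("flags", []).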

So how can we determine which platform we're running on? Unfortunately, there's no simple solution, but one way is to look at the family and model values returned by cpuinfo. On a free CPU runtime, I see 'family': 6 and 'model': 79. A quick online search turns up a page like WikiChip, which shows that we are most likely on an Intel Broadwell server. The Broadwell microarchitecture dates back to 2014, so we're definitely not getting the benefit of the hardware and low-level software innovation that has happened since.

Once I connect to a runtime with an attached GPU, I see that I've been upgraded to 'family': 6 and 'model': 85, which is an Intel Skylake Server. Still not the latest Sapphire Rapids with AMX, but at least I've unlocked some additional AI/ML performance in this newer generation with all of the avx512 flags!
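
To avoid repeating that search every time, the pairs you have already looked up can be kept in a small table. This is only a sketch: KNOWN_PLATFORMS and identify_platform are hypothetical names, and the table holds just the two pairs mentioned in this post, so anything else is reported as unknown.

```python
# (family, model) -> platform name; only the two pairs discussed above
KNOWN_PLATFORMS = {
    (6, 79): "Intel Broadwell Server",
    (6, 85): "Intel Skylake Server",
}


def identify_platform(info):
    """Look up the platform for a cpuinfo-style dict with 'family' and 'model' keys."""
    key = (info.get("family"), info.get("model"))
    return KNOWN_PLATFORMS.get(key, f"unknown (family={key[0]}, model={key[1]})")


print(identify_platform({"family": 6, "model": 79}))  # Intel Broadwell Server
print(identify_platform({"family": 6, "model": 85}))  # Intel Skylake Server
```

In the notebook you would pass my_cpuinfo directly. Extend the table as you verify more family and model pairs against a reference you trust.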

Not complaining here, just showing why it's important to see what's under the hood of these managed services. Google Colab is an amazing freemium service that runs on whatever GCP capacity is left over, so obviously they'd want to save the newest hardware for paying customers!

3. Retrieving System Memory Information: psutil

The psutil library is excellent for fetching a wide variety of system details, including memory info. This area is a bit more straightforward to determine:

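import psutil

# helper assumed by the snippets in this post: format a byte count
# as a human-readable string, e.g. 1253656 -> '1.20MB'
def get_size(num_bytes, suffix="B"):
    for unit in ["", "K", "M", "G", "T", "P"]:
        if num_bytes < 1024:
            return f"{num_bytes:.2f}{unit}{suffix}"
        num_bytes /= 1024
    return f"{num_bytes:.2f}E{suffix}"

# get the memory details
svmem = psutil.virtual_memory()
print(f"Total: {get_size(svmem.total)}")
print(f"Available: {get_size(svmem.available)}")
print(f"Used: {get_size(svmem.used)}")
print(f"Percentage: {svmem.percent}%")

The only catch is swap memory. If you're unsure whether your system has any, don't worry about this raising an error: every field simply returns zero when swap doesn't exist.

# get the swap memory details (if exists)
swap = psutil.swap_memory()
print(f"Total: {get_size(swap.total)}")
print(f"Free: {get_size(swap.free)}")
print(f"Used: {get_size(swap.used)}")
print(f"Percentage: {swap.percent}%")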

4. Extracting GPU Information: GPUtil

If you are working on tasks involving GPU computation, such as deep learning, knowing your GPU details can be critical.

Install the GPU package:

!pip install GPUtil

Retrieve GPU details:

import GPUtil

# GPU information
GPUs = GPUtil.getGPUs()
gpulist = []
for gpu in GPUs:
    print(gpu.name)
    print("GPU id:", gpu.id)
    print(f"Memory total: {gpu.memoryTotal}MB")
    print(f"Memory free: {gpu.memoryFree}MB")
    print(f"Memory used: {gpu.memoryUsed}MB")
    print(f"Memory utilization: {gpu.memoryUtil * 100:.1f}%")
    print(f"Temperature: {gpu.temperature} C")
    # keep the key numbers for later use
    gpulist.append([gpu.id, gpu.memoryTotal, gpu.memoryUsed, gpu.memoryUtil * 100])
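
If you'd like the same numbers in a form that is easier to log or compare across runtimes, the GPU objects can be turned into plain dictionaries. A minimal sketch: gpu_records and query_gpus are hypothetical helper names, and the import guard is an assumption for environments where GPUtil is not installed or cannot be imported.

```python
def gpu_records(gpus):
    """Convert GPUtil-style GPU objects into plain dicts."""
    return [
        {
            "id": g.id,
            "name": g.name,
            "total_mb": g.memoryTotal,
            "used_mb": g.memoryUsed,
            "used_pct": round(g.memoryUtil * 100, 1),
        }
        for g in gpus
    ]


def query_gpus():
    """Return records for the visible GPUs, or an empty list without GPUtil."""
    try:
        import GPUtil
    except ImportError:
        return []
    return gpu_records(GPUtil.getGPUs())


print(query_gpus())
```

An empty list here means either that GPUtil is unavailable or that no GPU was detected, so on a CPU-only runtime the rest of your code can simply skip its GPU path.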

5. Disk Information using psutil

psutil can also provide disk usage and partition details:

# Disk Information
print("Partitions and Usage:")
# get all disk partitions
partitions = psutil.disk_partitions()
for partition in partitions:
    print(f"=== Device: {partition.device} ===")
    print(f"  Mountpoint: {partition.mountpoint}")
    print(f"  File system type: {partition.fstype}")
    try:
        partition_usage = psutil.disk_usage(partition.mountpoint)
    except PermissionError:
        # this can be raised when the disk isn't ready
        continue
    print(f"  Total Size: {get_size(partition_usage.total)}")
    print(f"  Used: {get_size(partition_usage.used)}")
    print(f"  Free: {get_size(partition_usage.free)}")
    print(f"  Percentage: {partition_usage.percent}%")
# get IO statistics since boot
disk_io = psutil.disk_io_counters()
print(f"Total read: {get_size(disk_io.read_bytes)}")
print(f"Total write: {get_size(disk_io.write_bytes)}")

6. Network Information using psutil

psutil can also provide network details. These are useful when you have more control over how you connect to your system, when you want to monitor the traffic coming into and going out of it, or when you're looking to connect multiple systems:

import socket

# get all network interfaces (virtual and physical)
if_addrs = psutil.net_if_addrs()
for interface_name, interface_addresses in if_addrs.items():
    for address in interface_addresses:
        print(f"=== Interface: {interface_name} ===")
        # compare against the enum members directly; their string form
        # can vary between Python versions
        if address.family == socket.AF_INET:
            print(f"  IP Address: {address.address}")
            print(f"  Netmask: {address.netmask}")
            print(f"  Broadcast IP: {address.broadcast}")
        elif address.family == psutil.AF_LINK:  # AF_PACKET on Linux
            print(f"  MAC Address: {address.address}")
            print(f"  Netmask: {address.netmask}")
            print(f"  Broadcast MAC: {address.broadcast}")
# get IO statistics since boot
net_io = psutil.net_io_counters()
print(f"Total Bytes Sent: {get_size(net_io.bytes_sent)}")
print(f"Total Bytes Received: {get_size(net_io.bytes_recv)}")

7. Preinstalled Python Packages: pkg_resources

Another great habit, which helps reduce the size of your solutions and troubleshoot dependency errors as you build, is to review the environment's preinstalled Python packages. When you're using a managed service, it's easy to waste time hunting down issues that proper use of virtual environments and lock files would have prevented. One way to get familiar with what's preinstalled when you jump into a managed environment is to run the following code snippet:

from pkg_resources import working_set
libs = [x.project_name.lower()+' '+x.version for x in working_set]
for lib in sorted(libs):
  print(lib)

Note that this lowercases the package names so they sort alphabetically for ease of browsing; you can drop .lower() if you need the names exactly as the packages declare them.
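
Recent setuptools releases mark pkg_resources as deprecated, so the same listing can also be produced with the standard library. A minimal sketch, assuming Python 3.8 or later; installed_packages is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted 'name version' strings for the installed distributions."""
    libs = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no name
            libs.add(f"{name.lower()} {dist.version}")
    return sorted(libs)


for lib in installed_packages():
    print(lib)
```

A set is used because the same distribution can be discovered more than once when several directories on the path contain it.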

Why is this Important?

When you have a clear picture of your hardware and system specifications, you can:

  1. Optimize code for specific hardware (e.g., AVX & AMX accelerations in CPUs, GPU computation).
  2. Determine if parallelism or multiprocessing can be utilized based on the number of CPU cores.
  3. Make decisions about memory-intensive operations based on available RAM.
  4. Avoid bottlenecks related to disk I/O operations by understanding disk capacities.

Understanding the hardware helps in making your software efficient, robust, and, most importantly, performant. Don't neglect it!
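
As a small illustration of point 2 above, the core count can drive how many worker processes you start. This is a sketch under simple assumptions: pick_worker_count is a hypothetical helper, and reserving one core for the rest of the system is a rule of thumb, not a recommendation from any library.

```python
import os


def pick_worker_count(total_cores=None, reserve=1):
    """Leave `reserve` cores free, but always use at least one worker."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, total_cores - reserve)


print("Workers:", pick_worker_count())
```

The result can be passed as the processes argument of multiprocessing.Pool or as max_workers of concurrent.futures.ProcessPoolExecutor.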


Thanks for taking the time to read this overview. I hope it helps you learn something new about the importance of knowing what your software and solutions are running on, even in managed environments, and about the community available to help you tackle any use case.

As always, feel free to reach out to just connect or let me know if I missed any great packages or insights that should be shared!


Amazing list of hardware feature flags and human-understandable descriptions, originally found here.

x86

(32-bit a.k.a. i386-i686 and 64-bit a.k.a. amd64. In other words, your workstation, laptop or server.)

FAQ: **Do I have…**

  • 64-bit (x86_64/AMD64/Intel64)? **lm**
  • Hardware virtualization (VMX/AMD-V)? **vmx** (Intel), **svm** (AMD)
  • Accelerated AES (AES-NI)? **aes**
  • TXT (TPM)? **smx**
  • a hypervisor (announced as such)? **hypervisor**

Most of the other features are only of interest to compiler or kernel authors.
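
The FAQ above can be answered programmatically on an x86 Linux machine by reading the flags line of /proc/cpuinfo. A minimal sketch: parse_cpu_flags and answer_faq are hypothetical helper names, and the code assumes the file uses a "flags" line, as x86 kernels do.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def answer_faq(flags):
    """Map each FAQ question to True or False for the given flag set."""
    return {
        "64-bit": "lm" in flags,
        "hardware virtualization": bool({"vmx", "svm"} & flags),
        "accelerated AES": "aes" in flags,
        "TXT": "smx" in flags,
        "hypervisor": "hypervisor" in flags,
    }


path = Path("/proc/cpuinfo")
text = path.read_text() if path.is_file() else ""
print(answer_faq(parse_cpu_flags(text)))
```

On systems without /proc/cpuinfo, every answer comes back False, which means "unknown" rather than "absent".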

All the flags

The full listing is in the kernel source, in the file arch/x86/include/asm/cpufeatures.h.

Intel-defined CPU features, CPUID level 0x00000001 (edx)

See also Wikipedia and table 2-27 in Intel Advanced Vector Extensions Programming Reference

AMD-defined CPU features, CPUID level 0x80000001

See also Wikipedia and table 2-23 in Intel Advanced Vector Extensions Programming Reference

Transmeta-defined CPU features, CPUID level 0x80860001

  • recovery: CPU in recovery mode
  • longrun: Longrun power control
  • lrti: LongRun table interface

Other features, Linux-defined mapping

  • cxmmx: Cyrix MMX extensions
  • k6_mtrr: AMD K6 nonstandard MTRRs
  • cyrix_arr: Cyrix ARRs (= MTRRs)
  • centaur_mcr: Centaur MCRs (= MTRRs)
  • constant_tsc: TSC ticks at a constant rate
  • up: SMP kernel running on UP
  • art: Always-Running Timer
  • arch_perfmon: Intel Architectural PerfMon
  • pebs: Precise-Event Based Sampling
  • bts: Branch Trace Store
  • rep_good: rep microcode works well
  • acc_power: AMD accumulated power mechanism
  • nopl: The NOPL (0F 1F) instructions
  • xtopology: cpu topology enum extensions
  • tsc_reliable: TSC is known to be reliable
  • nonstop_tsc: TSC does not stop in C states
  • cpuid: CPU has CPUID instruction itself
  • extd_apicid: has extended APICID (8 bits)
  • amd_dcm: multi-node processor
  • aperfmperf: APERFMPERF
  • eagerfpu: Non lazy FPU restore
  • nonstop_tsc_s3: TSC doesn't stop in S3 state
  • tsc_known_freq: TSC has known frequency
  • mce_recovery: CPU has recoverable machine checks

Intel-defined CPU features, CPUID level 0x00000001 (ecx)

See also Wikipedia and table 2-26 in Intel Advanced Vector Extensions Programming Reference

VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001

  • rng: Random Number Generator present (xstore)
  • rng_en: Random Number Generator enabled
  • ace: on-CPU crypto (xcrypt)
  • ace_en: on-CPU crypto enabled
  • ace2: Advanced Cryptography Engine v2
  • ace2_en: ACE v2 enabled
  • phe: PadLock Hash Engine
  • phe_en: PHE enabled
  • pmm: PadLock Montgomery Multiplier
  • pmm_en: PMM enabled

More extended AMD flags: CPUID level 0x80000001, ecx

  • lahf_lm: Load AH from Flags (LAHF) and Store AH into Flags (SAHF) in long mode
  • cmp_legacy: If yes HyperThreading not valid
  • svm: “Secure virtual machine”: AMD-V
  • extapic: Extended APIC space
  • cr8_legacy: CR8 in 32-bit mode
  • abm: Advanced Bit Manipulation
  • sse4a: SSE-4A
  • misalignsse: indicates if a general-protection exception (#GP) is generated when some legacy SSE instructions operate on unaligned data. Also depends on CR0 and Alignment Checking bit
  • 3dnowprefetch: 3DNow prefetch instructions
  • osvw: indicates OS Visible Workaround, which allows the OS to work around processor errata.
  • ibs: Instruction Based Sampling
  • xop: extended AVX instructions
  • skinit: SKINIT/STGI instructions
  • wdt: Watchdog timer
  • lwp: Light Weight Profiling
  • fma4: 4 operands MAC instructions
  • tce: translation cache extension
  • nodeid_msr: NodeId MSR
  • tbm: Trailing Bit Manipulation
  • topoext: Topology Extensions CPUID leafs
  • perfctr_core: Core Performance Counter Extensions
  • perfctr_nb: NB Performance Counter Extensions
  • bpext: data breakpoint extension
  • ptsc: performance time-stamp counter
  • perfctr_l2: L2 Performance Counter Extensions
  • mwaitx: MWAIT extension (MONITORX/MWAITX)

Auxiliary flags: Linux defined - For features scattered in various CPUID levels

  • ring3mwait: Ring 3 MONITOR/MWAIT
  • cpuid_fault: Intel CPUID faulting
  • cpb: AMD Core Performance Boost
  • epb: IA32_ENERGY_PERF_BIAS support
  • cat_l3: Cache Allocation Technology L3
  • cat_l2: Cache Allocation Technology L2
  • cdp_l3: Code and Data Prioritization L3
  • invpcid_single: effectively invpcid and CR4.PCIDE=1
  • hw_pstate: AMD HW-PState
  • proc_feedback: AMD ProcFeedbackInterface
  • sme: AMD Secure Memory Encryption
  • pti: Kernel Page Table Isolation (Kaiser)
  • retpoline: Retpoline mitigation for Spectre variant 2 (indirect branches)
  • retpoline_amd: AMD Retpoline mitigation
  • intel_ppin: Intel Processor Inventory Number
  • avx512_4vnniw: AVX-512 Neural Network Instructions
  • avx512_4fmaps: AVX-512 Multiply Accumulation Single precision
  • mba: Memory Bandwidth Allocation
  • rsb_ctxsw: Fill RSB on context switches

Virtualization flags: Linux defined

  • tpr_shadow: Intel TPR Shadow
  • vnmi: Intel Virtual NMI
  • flexpriority: Intel FlexPriority
  • ept: Intel Extended Page Table
  • vpid: Intel Virtual Processor ID
  • vmmcall: prefer VMMCALL to VMCALL

Intel-defined CPU features, CPUID level 0x00000007:0 (ebx)

Extended state features, CPUID level 0x0000000d:1 (eax)

  • xsaveopt: Optimized XSAVE
  • xsavec: XSAVEC
  • xgetbv1: XGETBV with ECX = 1
  • xsaves: XSAVES/XRSTORS

Intel-defined CPU QoS sub-leaf, CPUID level 0x0000000F:0 (edx)

  • cqm_llc: LLC QoS

Intel-defined CPU QoS sub-leaf, CPUID level 0x0000000F:1 (edx)

  • cqm_occup_llc: LLC occupancy monitoring
  • cqm_mbm_total: LLC total MBM monitoring
  • cqm_mbm_local: LLC local MBM monitoring

AMD-defined CPU features, CPUID level 0x80000008 (ebx)

  • clzero: CLZERO instruction
  • irperf: instructions retired performance counter
  • xsaveerptr: Always save/restore FP error pointers

Thermal and Power Management leaf, CPUID level 0x00000006 (eax)

  • dtherm (formerly dts): digital thermal sensor
  • ida: Intel Dynamic Acceleration
  • arat: Always Running APIC Timer
  • pln: Intel Power Limit Notification
  • pts: Intel Package Thermal Status
  • hwp: Intel Hardware P-states
  • hwp_notify: HWP notification
  • hwp_act_window: HWP Activity Window
  • hwp_epp: HWP Energy Performance Preference
  • hwp_pkg_req: HWP package-level request

AMD SVM Feature Identification, CPUID level 0x8000000a (edx)

  • npt: AMD Nested Page Table support
  • lbrv: AMD LBR Virtualization support
  • svm_lock: AMD SVM locking MSR
  • nrip_save: AMD SVM next_rip save
  • tsc_scale: AMD TSC scaling support
  • vmcb_clean: AMD VMCB clean bits support
  • flushbyasid: AMD flush-by-ASID support
  • decodeassists: AMD Decode Assists support
  • pausefilter: AMD filtered pause intercept
  • pfthreshold: AMD pause filter threshold
  • avic: Virtual Interrupt Controller
  • vmsave_vmload: Virtual VMSAVE VMLOAD
  • vgif: Virtual GIF

Intel-defined CPU features, CPUID level 0x00000007:0 (ecx)

  • avx512vbmi: AVX512 Vector Bit Manipulation instructions
  • umip: User Mode Instruction Protection
  • pku: Protection Keys for Userspace
  • ospke: OS Protection Keys Enable
  • avx512_vbmi2: Additional AVX512 Vector Bit Manipulation instructions
  • gfni: Galois Field New Instructions
  • vaes: Vector AES
  • vpclmulqdq: Carry-Less Multiplication Double Quadword
  • avx512_vnni: Vector Neural Network Instructions
  • avx512_bitalg: VPOPCNT[B,W] and VPSHUF-BITQMB instructions
  • avx512_vpopcntdq: POPCNT for vectors of DW/QW
  • la57: 5-level page tables
  • rdpid: RDPID instruction

AMD-defined CPU features, CPUID level 0x80000007 (ebx)

  • overflow_recov: MCA overflow recovery support
  • succor: uncorrectable error containment and recovery
  • smca: Scalable MCA

Detected CPU bugs (Linux-defined)

  • f00f: Intel F00F
  • fdiv: CPU FDIV
  • coma: Cyrix 6x86 coma
  • amd_tlb_mmatch: tlb_mmatch AMD Erratum 383
  • amd_apic_c1e: apic_c1e AMD Erratum 400
  • 11ap: Bad local APIC aka 11AP
  • fxsave_leak: FXSAVE leaks FOP/FIP/FOP
  • clflush_monitor: AAI65, CLFLUSH required before MONITOR
  • sysret_ss_attrs: SYSRET doesn't fix up SS attrs
  • espfix: "" IRET to 16-bit SS corrupts ESP/RSP high bits
  • null_seg: Nulling a selector preserves the base
  • swapgs_fence: SWAPGS without input dep on GS
  • monitor: IPI required to wake up remote CPU
  • amd_e400: CPU is among the affected by Erratum 400
  • cpu_meltdown: CPU is affected by meltdown attack and needs kernel page table isolation
  • spectre_v1: CPU is affected by Spectre variant 1 attack with conditional branches
  • spectre_v2: CPU is affected by Spectre variant 2 attack with indirect branches
  • spec_store_bypass: CPU is affected by the Speculative Store Bypass vulnerability (Spectre variant 4).

P.S. This listing was derived from arch/x86/include/asm/cpufeatures.h in the kernel source. The flags are listed in the same order as the source code. Please help by adding links to descriptions of features when they're missing, by writing a short description of features that have unexpressive names, and by updating the list for new kernel versions. The current list is from Linux 4.15 plus some later additions.