

qsum: Python checksumming toolkit

Intuitive and extendable checksumming for python objects


Goals

  • Provide a checksumming toolkit for Python with out-of-the-box support for common types
  • Architect a framework for implementing customized checksumming logic
  • Produce high-quality checksums with extraordinarily low collision rates
  • Build a toolkit for using and manipulating checksums
  • Test it all with 100% coverage and support Python 3.8, 3.9, 3.10, 3.11, 3.12, 3.13 and 3.14

Where to get it

Source code is available on GitHub: https://github.com/QCoding

Install with conda:

# from conda-forge: https://anaconda.org/conda-forge/qsum
conda install -c conda-forge qsum

Install with pip:

# from PyPI: https://pypi.org/project/qsum/
pip install qsum

How to use it

# Functional Interface
from qsum import checksum
checksum('abc')

# Class Interface
from qsum import Checksum
Checksum('abc').checksum_bytes

Design

  • QSUM CHECKSUM = TYPE PREFIX + DATA CHECKSUM
    • The first two bytes of every checksum represent the type and will be referred to as the 'type prefix'
    • The rest of the checksum is a digest of the byte representation of the object and will be referred to as the 'data checksum'
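To make the layout concrete, here is a minimal sketch of a checksum assembled as a two-byte type prefix followed by a data checksum. The prefix values, the `TYPE_PREFIXES` table, the `toy_checksum` and `split_checksum` helpers, the byte encodings and the choice of SHA-256 are all hypothetical illustrations, not qsum's actual internals.

```python
import hashlib

# Hypothetical two-byte type prefixes; qsum's real registry values may differ.
TYPE_PREFIXES = {
    str: b"\x00\x01",
    bytes: b"\x00\x02",
    int: b"\x00\x03",
}


def toy_checksum(obj) -> bytes:
    """Illustrative only: type prefix + digest of the object's byte representation."""
    prefix = TYPE_PREFIXES[type(obj)]
    # Assumed byte representations for this sketch.
    if isinstance(obj, str):
        data = obj.encode("utf-8")
    elif isinstance(obj, bytes):
        data = obj
    else:
        data = str(obj).encode("ascii")
    return prefix + hashlib.sha256(data).digest()


def split_checksum(checksum: bytes):
    """Separate the 2-byte type prefix from the data checksum."""
    return checksum[:2], checksum[2:]


c = toy_checksum("abc")
prefix, data_checksum = split_checksum(c)
print(len(c))   # 34: 2 prefix bytes + 32 SHA-256 bytes
print(c.hex())  # hexdigest view of the raw checksum bytes
```

Because the prefix is separate from the digest, `toy_checksum("abc")` and `toy_checksum(b"abc")` share the same data checksum but still differ as whole checksums.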

Relationship to __hash__

  • Respect the same contract as __hash__ with regard to: 'The only required property is that objects which compare equal have the same hash value'
  • Do not salt hash values (unless requested), and keep checksums stable across Python sessions, Python versions and releases of this package
  • PYTHONHASHSEED should have no effect on checksums
  • Provide significantly longer checksums than __hash__, which 'is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds'
  • Represent all checksums as bytes, but provide a toolkit to view them in more human-readable formats such as hexdigests
  • Base checksums on object contents and permit the calculation of checksums on mutable objects
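The difference between the salted built-in hash and a content-based digest can be observed directly. The sketch below uses only the standard library, with SHA-256 as a stand-in for qsum's digest: it runs `hash('abc')` in child interpreters with different `PYTHONHASHSEED` values and compares the result with a `hashlib` digest, which does not depend on the seed. The helper names are hypothetical.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_subprocess(text: str, seed: str) -> int:
    """Run hash(text) in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def stable_digest(text: str) -> bytes:
    """Content-based digest: independent of PYTHONHASHSEED and the session."""
    return hashlib.sha256(text.encode("utf-8")).digest()


h1 = builtin_hash_in_subprocess("abc", "1")
h2 = builtin_hash_in_subprocess("abc", "2")
print(h1 == h2)                   # str hashes depend on the seed
print(len(stable_digest("abc")))  # 32 bytes
print(sys.hash_info.width // 8)   # size of hash() values in bytes on this build
```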

Adding Salt

  • By default, the environment is not included in the checksum, but individual package versions can be included by adding the package name via the depends_on argument
  • To include the entire python environment in the checksum:
    from qsum import checksum, DependsOn
    checksum('abc', depends_on=DependsOn.PythonEnv)
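One way to picture this kind of salt is to mix "name==version" strings into the digest before finalizing it. The following is a hypothetical sketch built on `importlib.metadata` and `hashlib`; `package_salt`, `environment_salt` and `salted_digest` are illustrative names, not qsum functions, and qsum may derive its salt differently.

```python
import hashlib
from importlib import metadata


def package_salt(names) -> bytes:
    """Hypothetical: encode 'name==version' for each requested package."""
    parts = []
    for name in sorted(names):
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not-installed"
        parts.append(f"{name}=={version}")
    return ";".join(parts).encode("utf-8")


def environment_salt() -> bytes:
    """Hypothetical: salt covering every installed distribution."""
    names = {dist.metadata["Name"] for dist in metadata.distributions()}
    return package_salt(n for n in names if n)


def salted_digest(data: bytes, salt: bytes = b"") -> bytes:
    h = hashlib.sha256(data)
    if salt:
        # Only salted digests change; the unsalted digest stays stable.
        h.update(b"\x00" + salt)
    return h.digest()


plain = salted_digest(b"abc")
per_package = salted_digest(b"abc", package_salt(["pip"]))
whole_env = salted_digest(b"abc", environment_salt())
```

Upgrading a listed package changes its version string and therefore the salted digest, while the unsalted digest is unaffected.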
    

Type Support

  • The great majority of Python's built-in types, including collections, are checksummable
    • bool, int, float, complex, str, bytes, tuple, list, dict, set, deque, etc.
  • Common types have registered type prefixes which can be used to recover the type from the checksum

Custom Containers

  • Custom container classes that inherit from common Python containers (e.g. tuple, list, set, dict) are checksummable
  • The class name is not recoverable from the type prefix but will be added as salt to the data checksum to prevent collisions
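The rule for container subclasses can be sketched as follows: the type prefix comes from the nearest registered base container, while the subclass name is fed into the data checksum as salt. `BASE_PREFIXES`, `container_checksum` and the `repr`-based serialization are simplifying assumptions; only `tuple` and `list` are covered because a faithful sketch for `set` and `dict` would need an order-independent serialization.

```python
import hashlib

# Hypothetical prefixes for two base containers.
BASE_PREFIXES = {tuple: b"\x00\x07", list: b"\x00\x08"}


def container_checksum(obj) -> bytes:
    # Find the first registered base container in the class hierarchy.
    base = next((cls for cls in type(obj).__mro__ if cls in BASE_PREFIXES), None)
    if base is None:
        raise TypeError(f"unsupported type: {type(obj).__name__}")
    h = hashlib.sha256()
    if type(obj) is not base:
        # The subclass is invisible in the prefix, so its name salts the data checksum.
        h.update(f"{type(obj).__module__}.{type(obj).__qualname__}\x00".encode("utf-8"))
    # repr() of a plain base-class copy stands in for a real serializer.
    h.update(repr(base(obj)).encode("utf-8"))
    return BASE_PREFIXES[base] + h.digest()


class MyList(list):
    pass


plain = container_checksum([1, 2, 3])
custom = container_checksum(MyList([1, 2, 3]))
print(plain[:2] == custom[:2])  # True: same type prefix
print(plain == custom)          # False: the class name salts the data checksum
```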

Functions and Modules

  • Functions are checksummed based on a combination of their source code, attributes and module location
  • Modules are checksummed based solely on a hash of their source code
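A function checksum along these lines could combine the module location, the source text and the attribute dictionary, while a module checksum hashes its source alone. This sketch uses `inspect.getsource` and SHA-256; the helper names, field order and NUL separators are assumptions, not qsum's implementation.

```python
import hashlib
import inspect
import json


def _feed(h, text: str) -> None:
    # NUL separator keeps adjacent fields from running together.
    h.update(text.encode("utf-8") + b"\x00")


def function_checksum(func) -> bytes:
    h = hashlib.sha256()
    # Module location of the function.
    _feed(h, f"{func.__module__}.{func.__qualname__}")
    # Source code; inspect.getsource raises OSError when no source file exists.
    try:
        _feed(h, inspect.getsource(func))
    except OSError:
        # Sketch-only fallback: bytecode is NOT stable across Python versions.
        _feed(h, func.__code__.co_code.hex())
    # Attributes attached to the function object, in sorted key order.
    for key in sorted(func.__dict__):
        _feed(h, f"{key}={func.__dict__[key]!r}")
    return h.digest()


def module_checksum(module) -> bytes:
    # Modules: a digest of their source code only.
    return hashlib.sha256(inspect.getsource(module).encode("utf-8")).digest()


def greet(name):
    return f"hello {name}"


before = function_checksum(greet)
greet.version = 2  # attaching an attribute changes the function checksum
after = function_checksum(greet)
print(before != after)  # True
print(module_checksum(json).hex())
```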

Files

  • When passed an open file handle, qsum includes all the bytes of the file in the checksum calculation
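For open file handles, a streaming approach keeps memory use flat: read the file in fixed-size chunks and feed each one into the digest. The `file_checksum` helper, the 64 KiB chunk size, rewinding to the start of the file and SHA-256 are assumptions for this sketch, not details taken from qsum.

```python
import hashlib
import io
import tempfile


def file_checksum(handle, chunk_size: int = 64 * 1024) -> bytes:
    """Digest every byte of an open binary file handle."""
    if handle.seekable():
        handle.seek(0)  # include the whole file, not just the unread remainder
    h = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


with tempfile.TemporaryFile() as f:
    f.write(b"abc" * 100_000)
    digest = file_checksum(f)

print(digest == hashlib.sha256(b"abc" * 100_000).digest())  # True
print(file_checksum(io.BytesIO(b"abc")).hex())  # in-memory binary streams work too
```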

References

Wikipedia Checksum

Python Hashlib

Python __hash__

What Happens When You Mess With Hashing In Python
