Async method for randomly splitting users into groups

We suggest the following method for randomly assigning users to groups without having to synchronise user lists across systems.

Method

  • Start from a unique user id
  • Compute the SHA-1 hash of that user id
  • Convert the resulting hash to an integer and take it modulo 10 (you may restrict the calculation to the last couple of bytes of the hash)
  • Assign the user to a group based on the resulting value x, which lies between 0 and 9
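
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `bucket_for` is hypothetical, and it assumes the user id is a string that every system encodes as UTF-8.

```python
import hashlib

def bucket_for(user_id: str) -> int:
    """Return a bucket in 0..9 derived from the SHA-1 hash of user_id."""
    # SHA-1 yields 20 raw bytes; the encoding (UTF-8 here) is an assumption.
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    # Restrict the calculation to the last 2 bytes, which is the same
    # as reading the last 4 characters of the hexadecimal digest.
    value = int.from_bytes(digest[-2:], byteorder="big")
    return value % 10
```

Because the bucket depends only on the bytes that are hashed, every system has to feed in the id in exactly the same form (same string representation, same encoding) to arrive at the same value.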

Implementation

The following example splits users into 3 groups: 40% A, 40% B, and 20% C.

CASE  
     WHEN MOD(CAST(CONCAT("0x", SUBSTRING(TO_HEX(SHA1(user_id)),-4)) AS INT64), 10) IN (0, 1, 2, 3)  
     THEN "A"  
     WHEN MOD(CAST(CONCAT("0x", SUBSTRING(TO_HEX(SHA1(user_id)),-4)) AS INT64), 10) IN (4, 5, 6, 7)  
     THEN "B"
     WHEN MOD(CAST(CONCAT("0x", SUBSTRING(TO_HEX(SHA1(user_id)),-4)) AS INT64), 10) IN (8, 9)  
     THEN "C"
END AS user_group

The same logic in Python:

import hashlib

def assign_user_group(user_id):
    # Compute the SHA-1 of user_id as a hexadecimal string
    sha1_hex = hashlib.sha1(user_id.encode()).hexdigest()

    # Extract the last 4 characters of the hex representation
    last_four_hex = sha1_hex[-4:]

    # Convert the hexadecimal string to an integer and take it modulo 10
    mod_result = int(last_four_hex, 16) % 10

    # Assign user_group based on the mod_result
    if mod_result in [0, 1, 2, 3]:
        user_group = "A"
    elif mod_result in [4, 5, 6, 7]:
        user_group = "B"
    elif mod_result in [8, 9]:
        user_group = "C"
    else:
        # Just in case of unexpected result, not necessary unless mod_result might not be in 0-9
        user_group = "unknown"

    return user_group

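The ratios can be adjusted by changing the values listed in each condition (the IN (...) lists in the SQL, or the lists in the Python if/elif branches). Each value from 0 to 9 covers roughly 10% of users.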
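
As a sketch of how the ratios might be made configurable, the variant below keeps the bucket-to-group mapping in one place. The names `GROUP_BUCKETS` and `assign_group_with_ratios` and the 50/30/20 split are hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical configuration: each group owns some of the buckets 0..9.
# This example gives 50% to A, 30% to B and 20% to C.
GROUP_BUCKETS = {
    "A": range(0, 5),   # buckets 0-4
    "B": range(5, 8),   # buckets 5-7
    "C": range(8, 10),  # buckets 8-9
}

def assign_group_with_ratios(user_id, group_buckets=GROUP_BUCKETS):
    # Same calculation as above: last 4 hex characters of SHA-1, modulo 10.
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest()[-4:], 16) % 10
    for group, buckets in group_buckets.items():
        if bucket in buckets:
            return group
    # Only reachable if the configuration does not cover all buckets.
    return "unknown"
```

With a modulus of 10, ratios can only be set in steps of 10%. A finer split would need a larger modulus (for example 100), applied identically in every system.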

This works because the group is computed deterministically from the user id alone. Aampe and its customers can each implement this user property independently, and as long as both systems use the same definition (same hash, same modulus, same ratios), no user-list synchronisation is needed.
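
The following sketch illustrates that point under stated assumptions: two "systems" compute the property separately over the same synthetic user ids (the `user-<n>` ids and the variable names are made up for the example), then the assignments and the resulting group shares are compared.

```python
import hashlib
from collections import Counter

def assign_user_group(user_id):
    # Last 4 hex characters of SHA-1, modulo 10, split 40% / 40% / 20%.
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest()[-4:], 16) % 10
    if bucket <= 3:
        return "A"
    if bucket <= 7:
        return "B"
    return "C"

# Synthetic ids, for illustration only.
user_ids = [f"user-{i}" for i in range(10_000)]

# Each system computes the property on its own, in its own order.
system_1 = {uid: assign_user_group(uid) for uid in user_ids}
system_2 = {uid: assign_user_group(uid) for uid in reversed(user_ids)}

# Identical assignments, although no user list was exchanged.
assert system_1 == system_2

counts = Counter(system_1.values())
shares = {group: counts[group] / len(user_ids) for group in "ABC"}
print(shares)  # expected to land close to 0.4 / 0.4 / 0.2
```

The shares are approximate rather than exact, since the split depends on how the hashes of the actual user ids happen to fall.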

This property can then be used to set up audience filters across the different systems, diverting users to each system as needed.

📘 If you already have user populations split into groups, you can instead pass that grouping along as a user property in the data shared with Aampe.