High Performance PostgreSQL for Rails: Off-by-one length bug in `SCRUB_EMAIL` (page 52)

Hi @andatki,

This is a follow-up to my previous error report, High Performance PostgreSQL for Rails: Confusing `SUBSTR` arguments in `SCRUB_EMAIL` (page 52). The + 1 after GREATEST(…) was necessary when we were passing 0 as the start index to SUBSTR, but now it’s incorrect and results in a scrubbed email address that is one character too long when the original email username has a length greater than or equal to 5. The example on Page 53 actually demonstrates the bug:

SELECT SCRUB_EMAIL('bob-and-jane@gmail.com');
-- scrub_email
-- ------------------------
-- 2bffb502c463@gmail.com

“bob-and-jane@gmail.com” has length 22, whereas “2bffb502c463@gmail.com” has length 23:

owner@localhost:5432 rideshare_development# SELECT LENGTH('bob-and-jane@gmail.com'), LENGTH(SCRUB_EMAIL('bob-and-jane@gmail.com'));
 length | length
--------+--------
     22 |     23
(1 row)

Thank you,

Doug

1 Like

Hi @douglasshuang Shoot! Well unfortunately it’s too late to fix it in the book. I do have the functions or similar ones here in the Rideshare source. If you’d like to create a PR it might help someone else who goes looking. This thread may also help them find it. rideshare/db/functions at main · andyatkinson/rideshare · GitHub Fortunately since the intent was to have similar sized data but scrubbed, if it’s one character too long of random text, at least it will be uniformly 1 character too long, so the proportions of lengths should stay the same. Thanks for continuing to be a reader and bug finder. This forum had gotten pretty quiet lately! :slight_smile: