Structural overview of Patternscape
We developed this project to analyze passwords and detect specific patterns in them. The resulting pattern database can then be used in any appropriate security application. It started as a command-line app, but a GUI later made more sense. The backend can still be used as a standalone command-line app.
The base (backend) is written in C++, the middle layer is **N-API** (Node-API, the Node.js API for gluing Node and C++ together), and the GUI is built with Electron.
Layer 1: C++ backend that processes raw data, detects patterns, and generates pattern statistics.
Layer 1.5: N-API (Node-API), which exposes our C++ code as a Node.js API.
Layer 2: GUI and Node.js processes, built with Electron.
The backend processes data breaches and detects patterns by comparing the records against several lists of data. Currently, 7 types of comparison and calculation are used to detect patterns:
<aside> 💡 Note: every list consists of unique records.
</aside>
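Since every list holds unique records, a hash set is one natural way to store a list and compare passwords against it. The following is only a minimal sketch under that assumption, not the actual Patternscape code: `toLower`, `findListMatches`, and the `minLen` threshold are hypothetical names and values chosen for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: lower-case a string so comparisons ignore case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: return every substring of `password` (at least
// `minLen` characters long) that appears in the list `records`.
// `records` is assumed to contain lower-case entries.
std::vector<std::string> findListMatches(const std::string& password,
                                         const std::unordered_set<std::string>& records,
                                         std::size_t minLen = 3) {
    std::vector<std::string> matches;
    const std::string lowered = toLower(password);
    for (std::size_t start = 0; start < lowered.size(); ++start) {
        for (std::size_t len = minLen; start + len <= lowered.size(); ++len) {
            const std::string candidate = lowered.substr(start, len);
            if (records.count(candidate) > 0) {
                matches.push_back(candidate);
            }
        }
    }
    return matches;
}
```

With a names list containing `john`, calling `findListMatches("John1990", names)` would report `john` as a match. Checking every substring is quadratic in the password length, which is acceptable for short strings such as passwords.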
Common Names (~188,000 records)-
This list contains name strings originating from different regions and cultures. The largest share consists of Arabic, Christian, Chinese, and Indian names. The rest come from common name lists available through Google's dataset search engine.
Common Locations (~39,200 records)-
This list includes the names of all countries and their states. We also considered adding every city, but that data was too large for the current scope, so we included only the most popular cities of each country.
Common Words (~370,000 records)-
This list includes all English dictionary words, datasets of the words most commonly used in passwords, and common-word datasets covering both English and non-English words.
Websites-
Not many records are needed for this. Every email address ends with a partial or full website name, so the program uses that.
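As an illustration of taking the website from the end of an email address, here is a minimal sketch. The function names `emailDomain` and `siteName` are hypothetical, and the rule "site name is the domain up to the first dot" is an assumption, not something the project documents.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the part of an email address after the
// last '@', or an empty string if the address contains no '@'.
std::string emailDomain(const std::string& email) {
    const std::size_t at = email.rfind('@');
    if (at == std::string::npos) {
        return "";
    }
    return email.substr(at + 1);
}

// Hypothetical helper: return the domain up to its first '.', for example
// "example" for "alice@example.com". If the domain has no '.', the whole
// domain is returned.
std::string siteName(const std::string& email) {
    const std::string domain = emailDomain(email);
    return domain.substr(0, domain.find('.'));
}
```

The resulting site name could then be searched for inside the password in the same way as any other list entry.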
Email's front-
The front part of each email address (the part before the @) is compared with the password to check whether it is used in the password.
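This comparison could be sketched as follows. It assumes that the "front" is everything before the first `@` and that matching is case-insensitive; both are assumptions, and `emailFront`, `containsIgnoreCase`, and `passwordUsesEmailFront` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return everything before the first '@'.
// If there is no '@', the whole string is returned.
std::string emailFront(const std::string& email) {
    return email.substr(0, email.find('@'));
}

// Hypothetical helper: case-insensitive substring search.
bool containsIgnoreCase(const std::string& haystack, const std::string& needle) {
    auto sameChar = [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    };
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(), sameChar) != haystack.end();
}

// True when the non-empty front part of `email` appears inside `password`.
bool passwordUsesEmailFront(const std::string& email, const std::string& password) {
    const std::string front = emailFront(email);
    if (front.empty()) {
        return false;  // nothing meaningful to compare
    }
    return containsIgnoreCase(password, front);
}
```

For example, `passwordUsesEmailFront("alice@example.com", "Alice2020!")` is true, while the same email with the password `hunter2` is false.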
Date of birth-
Dates of birth (DOB) are the most common digits used in passwords. Our program recognizes these types of DOB-
Mobile Number-
Mobile numbers are the second most common digits used in passwords. The main problem in recognizing them was their length: mobile/phone numbers vary between 4 and 15 digits, so the program runs its calculation on digit groups of at least 4 and fewer than 15 digits.
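The length rule above can be sketched as a scan over runs of consecutive digits. This is only an illustration: `candidateNumberGroups` and its parameter names are hypothetical, and the bounds follow the text as written (at least 4 digits, fewer than 15).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect every run of consecutive digits in `password`
// whose length is >= minDigits and < maxDigitsExclusive.
std::vector<std::string> candidateNumberGroups(const std::string& password,
                                               std::size_t minDigits = 4,
                                               std::size_t maxDigitsExclusive = 15) {
    std::vector<std::string> groups;
    std::string run;

    // Keep the current run if its length is in range, then start a new one.
    auto flush = [&]() {
        if (run.size() >= minDigits && run.size() < maxDigitsExclusive) {
            groups.push_back(run);
        }
        run.clear();
    };

    for (unsigned char c : password) {
        if (std::isdigit(c)) {
            run.push_back(static_cast<char>(c));
        } else {
            flush();
        }
    }
    flush();  // handle a digit run that ends the password
    return groups;
}
```

Here `candidateNumberGroups("ab1234cd56")` keeps `1234` and drops `56`, because the second run is shorter than 4 digits. A group found this way is only a candidate: the same digits may also be a date of birth.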