Machine Learning — Gender Identifier with NLTK in less than 15 lines of code

  1. Understand the patterns in your data set so you can build the feature set on which the model will be trained.
  2. According to the NLTK documentation, female names most often end with ‘a’, ‘e’, or ‘i’, whereas male names tend to end with ‘k’, ‘o’, ‘r’, ‘s’, or ‘t’. You are free to analyze the data yourself and find your own pattern on which to build the feature set.

We now know how to build the feature set, but we have not yet collected the data to extract it from. The NLTK team has done a very good job of providing plenty of data to learn from and build with. One of NLTK's best features is its collection of corpora, and for gender identification we are going to use the names corpus, which has two categories: male names and female names.

Create a method in your Python program that extracts the last character of a name. This lets us build the feature set on the assumption that the last character is the most useful attribute for identifying the gender of a name.
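The last-letter idea described above can be sketched roughly as follows. This is a minimal illustration, not the author's original listing: the helper names (`gender_features`, `load_labeled_names`, `build_feature_sets`, `train_classifier`) and the shuffle seed are assumptions made for this example, and lowercasing the last letter is an added convenience. The NLTK parts assume that NLTK is installed and that the names corpus has been fetched with `nltk.download('names')`.

```python
import random


def gender_features(name):
    """Return a feature dict holding the last letter of the name.

    Lowercasing is an assumption added here so that 'MARK' and 'Mark'
    produce the same feature.
    """
    return {"last_letter": name[-1].lower()}


def load_labeled_names():
    """Read the NLTK names corpus as (name, label) pairs.

    Imported lazily: requires NLTK and nltk.download('names').
    """
    from nltk.corpus import names

    return ([(n, "male") for n in names.words("male.txt")]
            + [(n, "female") for n in names.words("female.txt")])


def build_feature_sets(labeled_names, seed=42):
    """Shuffle the labeled names and map each one to (features, label)."""
    shuffled = list(labeled_names)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable runs
    return [(gender_features(n), label) for n, label in shuffled]


def train_classifier(feature_sets):
    """Train a Naive Bayes classifier on (features, label) pairs."""
    import nltk

    return nltk.NaiveBayesClassifier.train(feature_sets)
```

With the corpus available, something like `classifier = train_classifier(build_feature_sets(load_labeled_names()))` followed by `classifier.classify(gender_features("Neo"))` would return either `'male'` or `'female'`.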
  1. Accuracy of the Model on the Test Dataset
  2. Overfitting and Underfitting Concepts
  3. Error Analysis to Enrich the Feature Set
  4. How to Improve Accuracy by Using Error Analysis to Find Patterns
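As a rough preview of how the topics above fit together, the sketch below measures accuracy on held-out names and collects the misclassified ones for error analysis. It is an illustration under stated assumptions rather than the author's method: the function names and split sizes are hypothetical, and `classifier` can be any object with a `classify(features)` method, such as a trained `nltk.NaiveBayesClassifier`. A large gap between accuracy on the training names and accuracy on the held-out names is a common sign of overfitting.

```python
def last_letter_features(name):
    """Feature extractor used in this sketch: the lowercased last letter."""
    return {"last_letter": name[-1].lower()}


def split_names(labeled_names, test_size=500, devtest_size=500):
    """Split already-shuffled (name, label) pairs into train, dev-test, test."""
    test = labeled_names[:test_size]
    devtest = labeled_names[test_size:test_size + devtest_size]
    train = labeled_names[test_size + devtest_size:]
    return train, devtest, test


def accuracy(classifier, labeled_names, feature_fn):
    """Fraction of names whose predicted label matches the true label."""
    correct = sum(
        1 for name, label in labeled_names
        if classifier.classify(feature_fn(name)) == label
    )
    return correct / len(labeled_names)


def collect_errors(classifier, labeled_names, feature_fn):
    """Return sorted (true_label, predicted_label, name) for each mistake."""
    errors = []
    for name, label in labeled_names:
        guess = classifier.classify(feature_fn(name))
        if guess != label:
            errors.append((label, guess, name))
    return sorted(errors)
```

Reading through the output of `collect_errors` on the dev-test names is one way to spot patterns, such as informative two-letter suffixes, that could be added to the feature set before the final accuracy is checked on the test names.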

Blockchain evangelist and enthusiast with 13 years of experience as a Software Test Automation Architect - https://www.linkedin.com/in/abhishek-jain-31a72133/
