{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# dsresumatch Tutorial\n",
    "\n",
    "Welcome to the ResuMatch package documentation! This package is designed to analyze resumes and job descriptions by extracting key sections (Skills, Education, Work Experience, and Contact) and scoring them based on relevant keywords. Here, we’ll illustrate the usage of these functions with a real-life example, featuring Daniel, a junior data scientist on the hunt for his next exciting role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.1.0\n"
     ]
    }
   ],
   "source": [
    "import dsresumatch\n",
    "\n",
    "print(dsresumatch.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Daniel’s Journey\n",
    "\n",
    "Daniel has spent countless hours refining his resume, but still struggles to get callbacks from potential employers. After sending off 250 applications to several data science positions, and almost losing hope, he discovers dsresumatch. Intrigued by its ability to pinpoint missing information and relevant keywords, Daniel uses it to transform his resume into a tailored, high-impact profile for each job description he targets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading the PDF\n",
    "\n",
    "First things first, Daniel reads the PDF version of his resume using the read_pdf function from our package. This function automatically extracts all readable text and returns it as a string, making it easy to analyze, filter, or score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package stopwords to\n",
      "[nltk_data]     C:\\Users\\timot\\AppData\\Roaming\\nltk_data...\n",
      "[nltk_data]   Package stopwords is already up-to-date!\n"
     ]
    }
   ],
   "source": [
    "from dsresumatch.pdf_cv_processing import read_pdf\n",
    "\n",
    "resume_text = read_pdf(\"bad_cv.pdf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Checking for Resume Headers\n",
    "From the previous function, we extracted the PDF words into a string so that it is ready to be passed into `missing_section()` function to see if there are any missing reasume headers. \n",
    "\n",
    "Let's jump into!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Daniel H. Personal Statement I’m a data scientist, kind of. I like data. I did some Python stuff once. Not sure what else to say. Education Bachelor’s degree in Math.  Internship I worked as a data science intern for some months. Did some machine learning.   '"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resume_text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can we see, the resume PDF has been converted to just string of words and stored in `resume_text`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now, Check Sections!\n",
    "\n",
    "Let's take a look at which section headers are missing from Daniel's resume by passing the text through `missing_section()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Contact', 'Work Experience', 'Skills']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dsresumatch.sections_check import missing_section\n",
    "\n",
    "# This is a simple use case:\n",
    "section_check = missing_section(resume_text)\n",
    "\n",
    "# Print the missing sections identified using the function\n",
    "section_check"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the `section_check` variable, Daniel identifies that his resume is missing \"Contact\", \"skills\" and \"Work Experience\" sections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Include more sections for checking.\n",
    "\n",
    "Here, we can use `add_benchmark_sections` argument to supply additional sections to include into the `missing_section()` function for checking. For example:\n",
    "\n",
    "Daniel saw an article that says \"Personal Statement\" and \"Volunteer\" are one of the key sections in a resume. Now he wants to include the two additional sections into the function to check if they are present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Skills', 'Contact', 'Volunteer', 'Work Experience']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Additional sections check use case:\n",
    "add_section_check = missing_section(resume_text, add_benchmark_sections=[\"Personal Statement\", \"Volunteer\"])\n",
    "\n",
    "# Print the missing sections with the additional sections included\n",
    "add_section_check"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After including the additional sections using the `add_benchmark_sections` argument in the `missing_section()` function, it is found that \"Personal Statement\" is present in Daniel's resume but \"Volunteer\", \"Contact\", \"Work Experience\" and \"Skills\" are missing. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check for Keywords\n",
    "\n",
    "Now Daniel wants to check which keywords are most important in a data science resume. To do this, he passes the `resume_text` through `evaluate_keywords()` function to see which keywords are missing from his resume. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['git',\n",
       " 'data analysis',\n",
       " 'sql',\n",
       " 'numpy',\n",
       " 'teamwork',\n",
       " 'project management',\n",
       " 'pytorch',\n",
       " 'leadership',\n",
       " 'problem solving',\n",
       " 'jupyter',\n",
       " 'communication',\n",
       " 'pandas',\n",
       " 'docker',\n",
       " 'scikit-learn',\n",
       " 'statistics',\n",
       " 'tensorflow',\n",
       " 'aws']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dsresumatch.evaluate_keywords import evaluate_keywords\n",
    "\n",
    "# This is a simple use case:\n",
    "check_keywords = evaluate_keywords(resume_text)\n",
    "\n",
    "# Print the missing keywords identified using the function\n",
    "check_keywords"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Daniel can now see all the important keywords missing from his resume.\n",
    "\n",
    "In the same article as section check, Daniel read that a resume should have a couple more keywords when targeting for data science. To check them, he passed the additonal keywords to the `evaluate_keywords()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['a/b testing',\n",
       " 'aws',\n",
       " 'communication',\n",
       " 'data analysis',\n",
       " 'docker',\n",
       " 'effeciency',\n",
       " 'git',\n",
       " 'hyperparameter',\n",
       " 'jupyter',\n",
       " 'leadership',\n",
       " 'numpy',\n",
       " 'pandas',\n",
       " 'performance metrics',\n",
       " 'problem solving',\n",
       " 'project management',\n",
       " 'pytorch',\n",
       " 'scikit-learn',\n",
       " 'sql',\n",
       " 'statistics',\n",
       " 'teamwork',\n",
       " 'tensorflow']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Additional sections check use case: \n",
    "add_keywords_check1 = evaluate_keywords(resume_text, keywords=[\"hyperparameter\", \"effeciency\", \"performance metrics\", \"A/B testing\"])\n",
    "\n",
    "# Sorting keywords to view them alphabetically\n",
    "add_keywords_check1.sort()\n",
    "\n",
    "# Print the missing keywords with the additional keywords included\n",
    "add_keywords_check1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Daniel can now see all the important keywords missing from his resume along with the additonal keywords he supplied.\n",
    "\n",
    "Additionally, Daniel wants to confirm that his resume has 'Bachelor's Degree', 'Math', and 'Computer Science' listed. To do this, he passes the `resume_text` through the `evaulate_keywords()` section and updates the `use_only_supplied_keywords` argument to `True` so that only the supplied keywords are evaluated in the `resume_text`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['computer science']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Additional sections check use case with use_only_supplied_keywords set to \"True\": \n",
    "add_keywords_check2 = evaluate_keywords(resume_text, keywords=[\"Bachelor’s degree\", \"Math\", \"Computer Science\"], use_only_supplied_keywords=True)\n",
    "\n",
    "# Print the missing keywords with the additional keywords included\n",
    "add_keywords_check2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now Daniel can confirm that 'Bachelor's Degree' and 'Math' are included in his resume but 'Computer Science' is not. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Resume Score\n",
    "\n",
    "Daniel wants to know on a numerical scale how good his resume is. Fortunately, dsresumatch has a `resume_score()` function that gives a score to the resume."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'This resume attained a score of 12.50.'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dsresumatch.resume_scoring import resume_score\n",
    "\n",
    "# Calculate the resume score. To get the score only, pass the argument 'feedback=False'\n",
    "score = resume_score(resume_text, feedback=False)\n",
    "\n",
    "# Print the resume score\n",
    "score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Resume Summary\n",
    "\n",
    "Daniel has gotten feedback from separate functions. Now, he wants a summary of all the feedback, so he can improve on his resume. This can be achieved by not setting feedback to False, when calling the `resume_score` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This resume attained a score of 12.50. \n",
      " - Missing Keywords: git, data analysis, sql, numpy, teamwork, project management, pytorch, leadership, problem solving, jupyter, communication, pandas, docker, scikit-learn, statistics, tensorflow, aws \n",
      " - Missing Sections: Contact, Work Experience, Skills\n"
     ]
    }
   ],
   "source": [
    "# Get the resume summary. By default, feedback = True.\n",
    "summary = resume_score(resume_text)\n",
    "\n",
    "# Print the resume score\n",
    "print(summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Improving the Resume\n",
    "\n",
    "Based on all the feedback, Daniel created a new resume. He now wants to see how it performs on dsresumatch. He does the following functions to load and score his new resume."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'read_pdf' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[1;32mIn[2], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m new_resume \u001b[38;5;241m=\u001b[39m \u001b[43mread_pdf\u001b[49m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mgood_cv.pdf\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m      2\u001b[0m summary \u001b[38;5;241m=\u001b[39m resume_score(new_resume)\n\u001b[0;32m      3\u001b[0m \u001b[38;5;28mprint\u001b[39m(summary)\n",
      "\u001b[1;31mNameError\u001b[0m: name 'read_pdf' is not defined"
     ]
    }
   ],
   "source": [
    "new_resume = read_pdf(\"good_cv.pdf\")\n",
    "summary = resume_score(new_resume)\n",
    "print(summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dsresumatch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}