dsresumatch.pdf_cv_processing
=============================

.. py:module:: dsresumatch.pdf_cv_processing


Functions
---------

.. autoapisummary::

   dsresumatch.pdf_cv_processing.read_pdf
   dsresumatch.pdf_cv_processing.clean_text
   dsresumatch.pdf_cv_processing.count_words_in_pdf


Module Contents
---------------

.. py:function:: read_pdf(file_path)

   Extract the text content of a PDF file and return it as a single consolidated string.

   :param file_path: Path to the PDF file.
   :type file_path: str

   :returns: Text content of the PDF file.
   :rtype: str

   .. rubric:: Examples

   >>> read_pdf("cv.pdf")
   'Work Experience\nSoftware Developer at XYZ Corp.\nEducation\nBachelor of Science in Computer Science\n'
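
   The PDF backend used by ``read_pdf`` is not stated in this reference. As a minimal
   sketch of the "single consolidated string" behaviour, assuming the third-party
   ``pypdf`` package, per-page text could be joined as follows. The names
   ``join_page_texts`` and ``read_pdf_sketch`` are hypothetical and not part of
   ``dsresumatch``.

   .. code-block:: python

      def join_page_texts(page_texts):
          """Merge per-page strings into one string, skipping empty pages."""
          return "\n".join(text for text in page_texts if text)


      def read_pdf_sketch(file_path):
          """Read every page of a PDF and return the combined text."""
          # Assumption: pypdf is installed. It is imported lazily so that the
          # helper above can be used without it.
          from pypdf import PdfReader

          reader = PdfReader(file_path)
          return join_page_texts(page.extract_text() for page in reader.pages)


      print(repr(join_page_texts(["Work Experience", "", "Education"])))
      # 'Work Experience\nEducation'

   Keeping the joining step separate from the file access makes the consolidation
   logic easy to check without a PDF on disk.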


.. py:function:: clean_text(raw_text)

   Convert ``raw_text`` to lowercase, remove punctuation, and filter out common English stop words
   so that only meaningful words remain in the string.

   :param raw_text: Text to clean.
   :type raw_text: str

   :returns: Cleaned text.
   :rtype: str

   .. rubric:: Examples

   >>> clean_text("Work Experience: Software Developer at XYZ Corp!")
   'work experience software developer xyz corp'
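
   The three cleaning steps described above can be sketched with the standard library
   alone. This is an illustration under stated assumptions, not the module's
   implementation: ``clean_text_sketch`` and the short ``STOP_WORDS`` set are
   hypothetical, and the real function may rely on a much larger stop-word list.

   .. code-block:: python

      import string

      # Hypothetical stop-word list for illustration only.
      STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "to"}


      def clean_text_sketch(raw_text):
          """Lowercase the text, strip punctuation, and drop stop words."""
          lowered = raw_text.lower()
          # Delete every ASCII punctuation character.
          no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
          # split() with no arguments also collapses repeated whitespace.
          kept = [word for word in no_punct.split() if word not in STOP_WORDS]
          return " ".join(kept)


      print(clean_text_sketch("Work Experience: Software Developer at XYZ Corp!"))
      # work experience software developer xyz corp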


.. py:function:: count_words_in_pdf(file_path)

   Count the frequency of words in a PDF file.

   The text is converted to lowercase, punctuation is removed, and common English stop words
   are excluded so that the counts reflect meaningful words only.

   :param file_path: Path to the PDF file.
   :type file_path: str

   :returns: Dictionary-like object with the frequency of each remaining word where keys are words and
             values are counts.
   :rtype: collections.Counter

   .. rubric:: Examples

   >>> count_words_in_pdf("cv.pdf")
   Counter({'work': 1, 'experience': 1, 'software': 1, 'developer': 1, 'xyz': 1, 'corp': 1,
   'education': 1, 'bachelor': 1, 'science': 1, 'computer': 1})
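
   To show how the returned ``collections.Counter`` can be used, here is a small sketch
   of the counting step. It starts from text that has already been cleaned, and
   ``count_words_sketch`` is a hypothetical name, not part of ``dsresumatch``.

   .. code-block:: python

      from collections import Counter


      def count_words_sketch(cleaned_text):
          """Count whitespace-separated words in already-cleaned text."""
          return Counter(cleaned_text.split())


      counts = count_words_sketch("work experience software developer xyz corp work")
      print(counts["work"])         # 2
      print(counts["missing"])      # 0, a Counter returns 0 for absent keys
      print(counts.most_common(1))  # [('work', 2)]

   Because a ``Counter`` returns ``0`` for absent keys, callers can look up any word
   without first checking whether it occurs in the document.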