dsresumatch.pdf_cv_processing
=============================

.. py:module:: dsresumatch.pdf_cv_processing


Functions
---------

.. autoapisummary::

   dsresumatch.pdf_cv_processing.read_pdf
   dsresumatch.pdf_cv_processing.clean_text
   dsresumatch.pdf_cv_processing.count_words_in_pdf


Module Contents
---------------

.. py:function:: read_pdf(file_path)

   Extract the text content of a PDF file and return it as a single consolidated string.

   :param file_path: Path to the PDF file.
   :type file_path: str

   :returns: PDF file contents as text.
   :rtype: str

   .. rubric:: Examples

   >>> read_pdf("cv.pdf")
   'Work Experience Software Developer at XYZ Corp. Education Bachelor of Science in Computer Science '


.. py:function:: clean_text(raw_text)

   Convert ``raw_text`` to lowercase, remove punctuation, and filter out common English stop words so that only meaningful words remain in the string.

   :param raw_text: Text to clean.
   :type raw_text: str

   :returns: Cleaned text.
   :rtype: str

   .. rubric:: Examples

   >>> clean_text("Work Experience: Software Developer at XYZ Corp!")
   'work experience software developer xyz corp'


.. py:function:: count_words_in_pdf(file_path)

   Count the frequency of words in a PDF file.

   This function converts all words to lowercase, removes punctuation, and excludes common English stop words so that the counts cover only meaningful words.

   :param file_path: Path to the PDF file.
   :type file_path: str

   :returns: Dictionary-like object mapping each remaining word to the number of times it occurs.
   :rtype: collections.Counter

   .. rubric:: Examples

   >>> count_words_in_pdf("cv.pdf")
   Counter({'science': 2, 'work': 1, 'experience': 1, 'software': 1, 'developer': 1, 'xyz': 1, 'corp': 1, 'education': 1, 'bachelor': 1, 'computer': 1})
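
The cleaning and counting steps described for ``clean_text`` and ``count_words_in_pdf`` can be illustrated with a minimal standard-library sketch. This is not the package's implementation: the names ``clean_text_sketch``, ``count_words_sketch``, and ``STOP_WORDS`` are hypothetical, and the tiny stop-word set is an assumption made for illustration (the real module presumably uses a fuller list). PDF reading is left out, so the sketch works on text that has already been extracted.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (assumption for illustration;
# the actual module presumably relies on a much fuller list).
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "to"}


def clean_text_sketch(raw_text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = raw_text.lower()
    # str.translate with a deletion table removes every punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    words = [word for word in no_punct.split() if word not in STOP_WORDS]
    return " ".join(words)


def count_words_sketch(text):
    """Count the words that remain after cleaning already-extracted text."""
    return Counter(clean_text_sketch(text).split())


sample = "Work Experience: Software Developer at XYZ Corp!"
print(clean_text_sketch(sample))
print(count_words_sketch(sample))
```

Building the count on top of the cleaning step keeps the two results consistent: any word removed by the cleaning sketch can never appear as a key in the resulting ``Counter``.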