pdf format
http://www.semanlink.net/tag/pdf_format
Documents tagged with pdf format
Jerry Liu on X: "a big step towards better RAG... is to just have a really nice PDF parser. It’s so important because a good parser unlocks way more interesting indexing/retrieval strategies…"
http://www.semanlink.net/doc/2024/02/jerry_liu_sur_x_i%E2%80%99ve_talked_
2024-02-23T18:12:06Z
Jerry Liu on X: "There's different ways you can parse embedded tables for RAG..."
http://www.semanlink.net/doc/2023/12/jerry_liu_sur_x_there_s_diff
2023-12-02T08:57:49Z
Jerry Liu on Twitter: "If you’re building “chat over your PDFs” with LLMs, you need to deal with the pesky issue of how to parse embedded tables/diagrams..."
http://www.semanlink.net/doc/2023/07/jerry_liu_sur_twitter_if_you
> Native text splitting + top-k on your tables == bad results!
> A nuanced, hierarchical data representation over your PDF can help
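The point can be illustrated with a toy example. If you flatten a table to text and cut it into fixed-size chunks, a row (here, the header) can end up split across two chunks. A structure-aware pass instead keeps the table as one node and indexes it through a short summary. This is a minimal sketch that assumes a hypothetical pre-parsed list of `("text" | "table", content)` blocks. The names `naive_chunks` and `hierarchical_nodes` are illustrative, not taken from LlamaIndex or any other library.

```python
# Toy sketch: naive fixed-size splitting vs. a structure-aware split that
# keeps a table intact and indexes it through a short summary node.
# The block format ("text" | "table", content) is a hypothetical assumption.


def naive_chunks(text: str, size: int) -> list[str]:
    """Split raw text into fixed-size character windows, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def hierarchical_nodes(blocks: list[tuple[str, object]]) -> list[dict]:
    """Emit one node per block; a table gets a summary (used for retrieval)
    and keeps its full rows attached as a payload."""
    nodes = []
    for kind, content in blocks:
        if kind == "table":
            header, *rows = content
            summary = f"Table with columns {', '.join(header)} and {len(rows)} rows"
            nodes.append({"kind": "table", "summary": summary, "rows": content})
        else:
            nodes.append({"kind": "text", "summary": content, "rows": None})
    return nodes


table = [["model", "params", "score"], ["A", "7B", "61.2"], ["B", "13B", "64.8"]]
blocks = [("text", "Results are reported below."), ("table", table)]

# Flatten the table to text and cut every 40 characters:
# the header row ends up split across two chunks.
flat = blocks[0][1] + "\n" + "\n".join(" | ".join(r) for r in table)
for chunk in naive_chunks(flat, 40):
    print(repr(chunk))

for node in hierarchical_nodes(blocks):
    print(node["kind"], "->", node["summary"])
```

In a real pipeline, one option is to embed only the summary for retrieval and hand the full rows to the LLM once the node is selected. That second step is left out here.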
2023-07-07T00:32:21Z
Jerry Liu on Twitter: "The `camelot` package is an awesome module for extracting tables from PDFs..."
http://www.semanlink.net/doc/2023/07/jerry_liu_sur_twitter_the_c
2023-07-03T07:43:02Z
Jas Singh on Twitter: "ChatGPT can now turn your PDFs into chatbots… in ONLY 3 Clicks..."
http://www.semanlink.net/doc/2023/06/jas_singh_sur_twitter_chatgp
2023-06-04T09:36:24Z
ChatPDF - Chat with any PDF!
http://www.semanlink.net/doc/2023/05/chatpdf_chat_with_any_pdf_
2023-05-18T15:53:08Z
DataChazGPT on Twitter: "The new `transformers.tools` library from @huggingface is insane! E.g. you can summarize and chat with a PDF in just 6 lines of code..."
http://www.semanlink.net/doc/2023/05/datachazgpt_%F0%9F%A4%AF_not_a_bot_sur_
using [textract](doc:2023/05/deanmalmgren_textract_extract_)
2023-05-14T10:24:08Z
Andrej Karpathy on Twitter: "Any piece of content can and will be instantiated into a Q&A assistant"
http://www.semanlink.net/doc/2023/04/andrej_karpathy_sur_twitter__2
2023-04-20T13:15:26Z
mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs
http://www.semanlink.net/doc/2023/04/mayooear_gpt4_pdf_chatbot_langc
> "How to chat with a 56-page PDF"
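A "chat with a large PDF" bot usually works in four steps. It extracts the text, splits it into chunks, retrieves the chunks most relevant to the question, and passes them to the LLM as context. The sketch below covers only the retrieval step. It assumes the PDF text has already been extracted into hypothetical per-page strings. Term-frequency cosine similarity stands in for the embedding model a real chatbot would use. This is not the code of the mayooear repository, and the LLM call is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for vector search)."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


# Hypothetical per-page text, as a PDF parser might return it.
pages = [
    "Section 1 introduces the dataset and its licence.",
    "Section 2 describes the training procedure and learning rate schedule.",
    "Section 3 reports evaluation results on three benchmarks.",
]
question = "What learning rate schedule was used for training?"
context = top_k(question, pages, k=1)

# The retrieved chunks become the context of the prompt sent to the LLM.
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
print(prompt)
```

Keeping the number of retrieved chunks small (`k`) is what lets a long document fit within the model's context window. It is also why the quality of the chunks, and therefore of the PDF parser, matters so much.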
2023-04-20T13:08:08Z
PyPDF2 · PyPI
http://www.semanlink.net/doc/2023/04/pypdf2_%C2%B7_pypi
> PyPDF2 is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
2023-04-19T12:40:13Z
Delip Rao on Twitter: "Let's talk about PDF Parsers. What are the best paid/free PDF parsers?"
http://www.semanlink.net/doc/2023/02/delip_rao_sur_twitter_let_s_
2023-02-23T08:14:42Z
[P] Sioyek 1.4 | Academic PDF Viewer : MachineLearning
http://www.semanlink.net/doc/2022/07/p_sioyek_1_4_%7C_academic_pdf_v
2022-07-14T11:33:33Z
How to extract Highlighted Parts from PDF files - Stack Overflow
http://www.semanlink.net/doc/2021/10/how_to_extract_highlighted_part
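Suppose a PDF library has already given you the word bounding boxes and the rectangles of the highlight annotations. Extracting the highlighted text then reduces to keeping the words whose boxes are mostly covered by a highlight. This is a minimal sketch of that step, and it assumes both inputs are already in the same page coordinate space. The word tuples mirror the first five fields returned by PyMuPDF's `page.get_text("words")`. The highlight rectangles and the 0.5 coverage threshold are assumptions. A real highlight annotation may store several quadrilaterals, one per highlighted line.

```python
# Hypothetical inputs: words as (x0, y0, x1, y1, text, ...) and highlight
# rectangles as (x0, y0, x1, y1), in the same page coordinates.
Box = tuple[float, float, float, float]


def overlap_ratio(word: Box, rect: Box) -> float:
    """Fraction of the word's box area covered by the highlight rectangle."""
    x0, y0 = max(word[0], rect[0]), max(word[1], rect[1])
    x1, y1 = min(word[2], rect[2]), min(word[3], rect[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (word[2] - word[0]) * (word[3] - word[1])
    return inter / area if area > 0 else 0.0


def highlighted_text(words, highlights, threshold: float = 0.5) -> str:
    """Join, in input order, the words covered at least `threshold` by any highlight."""
    picked = [
        w[4]
        for w in words
        if any(overlap_ratio(tuple(w[:4]), r) >= threshold for r in highlights)
    ]
    return " ".join(picked)


words = [
    (10, 10, 40, 20, "Tables"),
    (45, 10, 60, 20, "are"),
    (65, 10, 95, 20, "hard"),
    (100, 10, 120, 20, "to"),
    (125, 10, 160, 20, "parse"),
]
highlights = [(60, 8, 162, 22)]
print(highlighted_text(words, highlights))
```

The code uses a coverage threshold rather than "any overlap". This avoids picking up a neighbouring word that the highlight only touches at its edge, like "are" above.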
2021-10-21T14:23:17Z
pdf2table: A Method to Extract Table Information from PDF Files
http://www.semanlink.net/doc/2020/04/pdf2table_a_method_to_extract_
2020-04-02T15:35:47Z
Scraping Data - UHack Guide
http://www.semanlink.net/doc/2020/01/scraping_data_uhack_guide
2020-01-23T18:14:58Z