I created my first open source project, Pyxtract:
– https://github.com/skupriienko/Pyxtract
A Python module for extracting texts from thousands of URLs and/or PDF files:
– Download and parse articles from URLs
– Extract texts from local PDF files
– Analyze texts
– NLP preprocessing
– Visualization
– Word clouds
– Named Entity Recognition
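
To give a rough idea of what the NLP preprocessing and word cloud steps involve, here is a minimal sketch using only the Python standard library. It is not Pyxtract's actual API: the function names `preprocess` and `word_frequencies`, the tiny `STOP_WORDS` set, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real pipeline would use a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "from"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_frequencies(text, top_n=10):
    """Return the top_n most common tokens with their counts."""
    return Counter(preprocess(text)).most_common(top_n)


# Made-up sample text standing in for an extracted article.
sample = "The module extracts text from URLs and PDF files. Text is then analyzed."
print(word_frequencies(sample, top_n=3))
```

Word counts like these are the kind of input a word cloud is typically drawn from: the more frequent a word, the larger it is rendered.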