Created my first open source project: Pyxtract
– https://github.com/skupriienko/Pyxtract
A Python module for extracting text from thousands of URLs and/or PDF files:
– Download and parse articles from URLs
– Extract texts from local PDF files
– Analyze texts
– NLP preprocessing
– Visualization
– Word clouds
– Named Entity Recognition
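The "parse, preprocess, count" part of the feature list above can be sketched with the standard library alone. This is not Pyxtract's code. The `TextExtractor` class, the helper functions and the tiny `STOP_WORDS` set are hypothetical. The sketch skips downloading, PDF parsing and NER, which need network access or third-party libraries.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list; real pipelines use fuller lists.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "the", "to"}


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Parse an HTML string and return its visible text, space-separated."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def preprocess(text: str) -> list[str]:
    """Lowercase, split into letter runs, drop stop words and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def word_frequencies(tokens: list[str]) -> Counter:
    """Count tokens; a frequency table like this is the usual word-cloud input."""
    return Counter(tokens)


if __name__ == "__main__":
    page = """<html><head><style>p {color: red}</style></head>
    <body><h1>Pyxtract demo</h1><p>Texts from URLs and texts from PDFs.</p>
    <script>var x = 1;</script></body></html>"""
    text = html_to_text(page)
    print(text)
    print(word_frequencies(preprocess(text)).most_common(3))
```

A counter like this can be passed to `WordCloud.generate_from_frequencies` from the `wordcloud` package to draw a word cloud.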
![](https://kuprienko.info/wp-content/uploads/2020/10/NER.png)
![](https://kuprienko.info/wp-content/uploads/2020/10/boxplot-1024x676.png)
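A box plot summarizes a distribution by its quartiles, with whiskers reaching the furthest points within 1.5 × IQR of the box. The sketch below computes those numbers with the standard library. It is a hedged sketch, not Pyxtract's code: `box_plot_summary` and the word counts in the test are hypothetical and unrelated to the data in the figure above. The "inclusive" quantile method uses the same linear interpolation as NumPy's default percentile.

```python
import statistics


def box_plot_summary(values: list[float]) -> dict:
    """Quartiles, 1.5*IQR whisker ends and outliers, as a box plot draws them."""
    # Linear-interpolation quartiles, with min and max as the 0th/100th percentiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Hypothetical per-document word counts for a small corpus.
    print(box_plot_summary([120, 250, 310, 330, 400, 420, 2000]))
```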