
Showing posts from July, 2020

Structure in an unstructured world.

Reading unstructured data from Microsoft Word

Many businesses hold data in an unstructured format within Microsoft Word: product marketing data, such as descriptions and key features, which is used on retailer websites. This ‘information’ is generic and not linked to any specific, tangible structured data.

As an example, if we wanted to link the marketing description, which is held in a separate Word document for each of our products, to a specific product SKU for use in reporting or to load into a new system, this would have to be done manually, which is massively inefficient and time-consuming.

A solution

Using Alteryx, I addressed the challenge by utilising the Python tool and a little Python scripting to read a list of Word documents from a directory. This was built out into a custom macro for reusability.

Then, for each file, the script reads the content, identifies each individual line of text and builds a ‘list’. This list is then transformed into a single
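
The directory-reading and line-listing steps described above could look roughly like the sketch below. It is not the script from the macro, which is not shown here. It assumes the documents are `.docx` files and reads them with only the Python standard library (a `.docx` file is a zip archive whose main text lives in `word/document.xml`). The function names `read_docx_lines` and `read_directory` are hypothetical and chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_lines(path):
    """Return the non-empty paragraphs of one .docx file as a list of strings."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text can be split across several runs; join them.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text.strip():
            lines.append(text.strip())
    return lines


def read_directory(directory):
    """Map each .docx file name in `directory` to its list of lines."""
    return {
        path.name: read_docx_lines(path)
        for path in sorted(Path(directory).glob("*.docx"))
        # Skip the temporary lock files Word creates while a document is open.
        if not path.name.startswith("~$")
    }
```

Calling `read_directory` on a folder of product documents returns one list of lines per file name, which could then be joined or reshaped before being passed on to the rest of the workflow. Older `.doc` files are a different format and would not be read by this sketch.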