Information extraction from multimedia web documents: an open-source platform and testbed