I'm attempting to retrieve information from this XML page into a Google Sheets workbook using the IMPORTXML function.
The following formula, which had worked before, now returns a "Could not fetch url" error:
=IMPORTXML(<referenced URL>,"//*[local-name() = 'assigned-sic-desc']")
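The XPath in this formula selects elements by local name, so it matches assigned-sic-desc whatever XML namespace the feed declares. The sketch below is a rough offline illustration of that matching only. It does not fix the fetch error. Python's ElementTree does not support local-name(), so the sketch uses its {*} namespace wildcard (Python 3.8+) instead. The sample document and the value "EXAMPLE INDUSTRY" are hypothetical and not taken from the referenced page.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, hand-written Atom-style snippet; the real feed's structure may differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <company-info>
    <assigned-sic-desc>EXAMPLE INDUSTRY</assigned-sic-desc>
  </company-info>
</feed>"""


def sic_descriptions(xml_text: str) -> list[str]:
    """Return the text of every assigned-sic-desc element, in any namespace.

    './/{*}assigned-sic-desc' plays the role of
    //*[local-name() = 'assigned-sic-desc']: it matches on the local name
    and ignores the namespace (or lack of one).
    """
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iterfind(".//{*}assigned-sic-desc")]


if __name__ == "__main__":
    print(sic_descriptions(SAMPLE_FEED))
```

Matching on the local name avoids having to bind the Atom namespace to a prefix. Presumably the original formula uses local-name() for the same reason.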
Changing https:// to http://, as suggested in another answer, did not help.
An answer to a similar question about the IMPORTHTML function suggests that a site's robots.txt file may cause this kind of error.
Although this site's robots.txt file lists the /cgi-bin path, Google Sheets has no problem fetching the same URL with the IMPORTFEED function in the formula below (IMPORTFEED, however, cannot retrieve the information I need):
=IMPORTFEED(<referenced URL>,"feed author")
Note also that other URLs under the /cgi-bin path returned the correct information with the IMPORTXML or IMPORTHTML functions. In addition, the HTML version of the referenced URL (i.e. without &output=atom in the query string) has a hyperlink labeled "RSS Feed" that points to the XML webpage. This may suggest that search engines are allowed to index or crawl the page.
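One way to test the robots.txt hypothesis offline is Python's standard urllib.robotparser module, which evaluates Disallow rules for a given user agent. The sketch below parses a hypothetical robots.txt that disallows /cgi-bin. The host example.com, the URLs, and the rules are placeholders, not the referenced site's actual file. The sketch only shows what such a rule implies for a crawler that honors robots.txt. It cannot show whether IMPORTXML or IMPORTFEED actually consult the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste the site's real file here to test it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() can be used without calling read() over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs on a placeholder host.
    print(is_allowed(ROBOTS_TXT, "https://example.com/cgi-bin/browse?output=atom"))
    print(is_allowed(ROBOTS_TXT, "https://example.com/index.html"))
```

If the real file disallowed every /cgi-bin URL this way, a rule-following fetcher would also refuse the other /cgi-bin URLs that do work. That makes robots.txt alone an incomplete explanation.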
QUESTIONS
- Is this an inconsistency on the part of the website or Google Sheets, or am I missing something?
- Is there a way to retrieve the data into Google Sheets from the XML webpage or the HTML webpage?