Importing an existing site
**************************


Description
===========

"pelican-import" is a command-line tool for converting articles from
other software to reStructuredText or Markdown. The supported import
formats are:

* Blogger XML export

* Dotclear export

* Medium export

* Tumblr API

* WordPress XML export

* RSS/Atom feed

The conversion from HTML to reStructuredText or Markdown relies on
Pandoc. For Dotclear, if the source posts are written with Markdown
syntax, they will not be converted (as Pelican also supports
Markdown).

Note:

  Unlike Pelican, Wordpress supports multiple categories per article.
  These are imported as a comma-separated string. You have to resolve
  these manually, or use a plugin such as More Categories that enables
  multiple categories per article.

Note:

  Imported pages may contain links to images that still point to the
  original site. So you might want to download those images into your
  local content and manually re-link them from the relevant pages of
  your site.


Dependencies
============

"pelican-import" has some dependencies not required by the rest of
Pelican:

* *BeautifulSoup4* and *lxml*, for WordPress and Dotclear import. Can
  be installed like any other Python package ("pip install
  BeautifulSoup4 lxml").

* *Feedparser*, for feed import ("pip install feedparser").

* *Pandoc*, see the Pandoc site for installation instructions on your
  operating system.


Usage
=====

   pelican-import [-h] [--blogger] [--dotclear] [--tumblr] [--wpfile] [--feed]
                  [-o OUTPUT] [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--wp-custpost]
                  [--wp-attach] [--disable-slugs] [-b BLOGNAME]
                  input|api_key


Positional arguments
--------------------

   +---------------+------------------------------------------------------------------------------+
   | "input"       | The input file to read                                                       |
   +---------------+------------------------------------------------------------------------------+
   | "api_key"     | (Tumblr only) api_key can be obtained from https://www.tumblr.com/oauth/apps |
   +---------------+------------------------------------------------------------------------------+


Optional arguments
------------------

   -h, --help

   Show this help message and exit

   --blogger

   Blogger XML export (default: False)

   --dotclear

   Dotclear export (default: False)

   --medium

   Medium export (default: False)

   --tumblr

   Tumblr API (default: False)

   --wpfile

   WordPress XML export (default: False)

   --feed

   Feed to parse (default: False)

   -o OUTPUT, --output OUTPUT

   Output path (default: content)

   -m MARKUP, --markup MARKUP

   Output markup format: "rst", "markdown", or "asciidoc" (default:
   "rst")

   --dir-cat

   Put files in directories with categories name (default: False)

   --dir-page

   Put files recognised as pages in "pages/" sub- directory (blogger
   and wordpress import only) (default: False)

   --filter-author

   Import only post from the specified author

   --strip-raw

   Strip raw HTML code that can't be converted to markup such as flash
   embeds or iframes (default: False)

   --wp-custpost

   Put wordpress custom post types in directories. If used with --dir-
   cat option directories will be created as "/post_type/category/"
   (wordpress import only)

   --wp-attach

   Download files uploaded to wordpress as attachments. Files will be
   added to posts as a list in the post header and links to the files
   within the post will be updated. All files will be downloaded, even
   if they aren't associated with a post. Files will be downloaded
   with their original path inside the output directory, e.g. "output
   /wp-uploads/date/postname/file.jpg". (wordpress import only)
   (requires an internet connection)

   --disable-slugs

   Disable storing slugs from imported posts within output. With this
   disabled, your Pelican URLs may not be consistent with your
   original posts. (default: False)

   -b BLOGNAME, --blogname=BLOGNAME

   Blog name used in Tumblr API


Examples
========

For Blogger:

   $ pelican-import --blogger -o ~/output ~/posts.xml

For Dotclear:

   $ pelican-import --dotclear -o ~/output ~/backup.txt

For Medium:

   $ pelican-import --medium -o ~/output ~/medium-export/posts/

The Medium export is a zip file.  Unzip it, and point this tool to the
"posts" subdirectory.  For more information on how to export, see
https://help.medium.com/hc/en-us/articles/115004745787-Export-your-
account-data.

For Tumblr:

   $ pelican-import --tumblr -o ~/output --blogname=<blogname> <api_key>

For WordPress:

   $ pelican-import --wpfile -o ~/output ~/posts.xml

For Medium (an example of using an RSS feed):

   $ python -m pip install feedparser $ pelican-import --feed
   https://medium.com/feed/@username

Note:

  The RSS feed may only return the most recent posts — not all of
  them.


Tests
=====

To test the module, one can use sample files:

* for WordPress: https://www.wpbeginner.com/wp-themes/how-to-add-
  dummy-content-for-theme-development-in-wordpress/

* for Dotclear: http://media.dotaddict.org/tda/downloads/lorem-
  backup.txt
