Configuration - Paperless-ngx (2024)

Paperless provides a wide range of customizations. Depending on how you run paperless, these settings have to be defined in different places.

Certain configuration options may be set via the UI. This currently includes common OCR-related settings and some frontend settings. If set, these take precedence over the settings defined via environment variables. If not set, the environment setting or the applicable default is used instead.

If you run paperless on docker, paperless.conf is not used. Rather, configure paperless by copying necessary options to docker-compose.env.
If you are running paperless on anything else, paperless will search for the configuration file in these locations and use the first one it finds:
The environment variable PAPERLESS_CONFIGURATION_PATH
/path/to/paperless/paperless.conf
/etc/paperless.conf
/usr/local/etc/paperless.conf
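
The lookup order above can be sketched in Python. This is only an illustration of "use the first one found", not Paperless's own code; the function name find_config_file and its parameters are hypothetical, and the first fallback entry is the documentation's placeholder path, kept as written.

```python
import os
from pathlib import Path

# Fixed fallback locations, copied from the list above. The first entry is
# the documentation's own placeholder and is kept as-is.
FALLBACK_LOCATIONS = (
    "/path/to/paperless/paperless.conf",
    "/etc/paperless.conf",
    "/usr/local/etc/paperless.conf",
)


def find_config_file(env=None, fallbacks=FALLBACK_LOCATIONS):
    """Return the first existing configuration file as a Path, or None."""
    env = os.environ if env is None else env
    candidates = []
    explicit = env.get("PAPERLESS_CONFIGURATION_PATH")
    if explicit:
        # The environment variable is consulted before the fixed locations.
        candidates.append(explicit)
    candidates.extend(fallbacks)
    for candidate in candidates:
        if Path(candidate).is_file():
            return Path(candidate)
    return None
```

Calling find_config_file() with no arguments checks the real environment and the locations listed above.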

Required services

Redis Broker

PAPERLESS_REDIS=<url>

This is required for processing scheduled tasks such as email fetching, index optimization and for training the automatic document matcher.

If your Redis server needs login credentials: PAPERLESS_REDIS=redis://<username>:<password>@<host>:<port>
With the requirepass option: PAPERLESS_REDIS=redis://:<password>@<host>:<port>
To include the Redis database index: PAPERLESS_REDIS=redis://<username>:<password>@<host>:<port>/<DBIndex>

More information on securing your Redis instance.

Defaults to redis://localhost:6379.
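
As a rough illustration of the URL forms above, the following Python sketch assembles a PAPERLESS_REDIS value from its parts. The helper name build_redis_url is hypothetical, and percent-encoding the credentials is an assumption that your Redis client decodes them again; check this before relying on it.

```python
from urllib.parse import quote


def build_redis_url(host="localhost", port=6379, username="", password="", db_index=None):
    """Assemble a redis:// URL in one of the forms listed above."""
    credentials = ""
    if username or password:
        # Percent-encode so characters such as '@' or ':' inside the
        # credentials cannot be mistaken for URL delimiters (assumption:
        # the client decodes them again).
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    url = f"redis://{credentials}{host}:{port}"
    if db_index is not None:
        url += f"/{db_index}"
    return url
```

With no arguments the helper returns the documented default, and passing only password yields the requirepass form with an empty username.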

PAPERLESS_REDIS_PREFIX=<prefix>

Prefix to be used in Redis for keys and channels. Useful for sharing one Redis server among multiple Paperless instances.

Defaults to no prefix.

Database

PAPERLESS_DBENGINE=<engine_name>

Optional. Selects PostgreSQL or MariaDB as the database engine. Available options are postgresql and mariadb.

Default is postgresql.

Warning

Using MariaDB comes with some caveats. See MySQL Caveats.

PAPERLESS_DBHOST=<hostname>

By default, sqlite is used as the database backend. This can be changed here.

Set PAPERLESS_DBHOST and another database will be used instead of sqlite.

PAPERLESS_DBPORT=<port>

Adjust port if necessary.

Default is 5432.

PAPERLESS_DBNAME=<name>

Database name in PostgreSQL or MariaDB.

Defaults to "paperless".

PAPERLESS_DBUSER=<name>

Database user in PostgreSQL or MariaDB.

Defaults to "paperless".

PAPERLESS_DBPASS=<password>

Database password for PostgreSQL or MariaDB.

Defaults to "paperless".

PAPERLESS_DBSSLMODE=<mode>

SSL mode to use when connecting to PostgreSQL or MariaDB.

See the official documentation about sslmode for PostgreSQL.

See the official documentation about sslmode for MySQL and MariaDB.

Note: SSL mode values differ between PostgreSQL and MariaDB.

Default is prefer for PostgreSQL and PREFERRED for MariaDB.

PAPERLESS_DBSSLROOTCERT=<ca-path>

SSL root certificate path

See the official documentation about sslmode for PostgreSQL. Changes path of root.crt.

See the official documentation about sslmode for MySQL and MariaDB.

Defaults to unset, using the documented path in the home directory.

PAPERLESS_DBSSLCERT=<client-cert-path>

SSL client certificate path

See the official documentation about sslmode for PostgreSQL.

See the official documentation about sslmode for MySQL and MariaDB.

Changes path of postgresql.crt.

Defaults to unset, using the documented path in the home directory.

PAPERLESS_DBSSLKEY=<client-cert-key>

SSL client key path

See the official documentation about sslmode for PostgreSQL.

See the official documentation about sslmode for MySQL and MariaDB.

Changes path of postgresql.key.

Defaults to unset, using the documented path in the home directory.

PAPERLESS_DB_TIMEOUT=<int>

Amount of time for a database connection to wait for the database to unlock. Mostly applicable for sqlite based installations. Consider changing to postgresql if you are having concurrency problems with sqlite.

Defaults to unset, keeping the Django defaults.

Optional Services

Tika

Paperless can make use of Tika and Gotenberg for parsing and converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). Tika and Gotenberg are also needed to allow parsing of E-Mails (.eml).

If you wish to use this, you must provide a Tika server and a Gotenberg server, configure their endpoints, and enable the feature.

PAPERLESS_TIKA_ENABLED=<bool>

Enable (or disable) the Tika parser.

Defaults to false.

PAPERLESS_TIKA_ENDPOINT=<url>

Set the endpoint URL where Paperless can reach your Tika server.

Defaults to "http://localhost:9998".

PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>

Set the endpoint URL where Paperless can reach your Gotenberg server.

Defaults to "http://localhost:3000".

If you run paperless on docker, you can add those services to the Docker Compose file (see the provided docker-compose.sqlite-tika.yml file for reference).

Add all three configuration parameters to your configuration. If using Docker, this may be the environment key of the webserver or a docker-compose.env file. Bare metal installations may have a .conf file containing the configuration parameters. Be sure to use the correct format and watch out for indentation if editing the YAML file.

Paths and folders

PAPERLESS_CONSUMPTION_DIR=<path>

This is where your documents should go to be consumed. Make sure that it exists and that the user running the paperless service can read/write its contents before you start Paperless.

Don't change this when using docker, as it only changes the path within the container. Change the local consumption directory in the docker-compose.yml file instead.

Defaults to "../consume/", relative to the "src" directory.

PAPERLESS_DATA_DIR=<path>

This is where paperless stores all its data (search index, SQLite database, classification model, etc).

Defaults to "../data/", relative to the "src" directory.

PAPERLESS_TRASH_DIR=<path>

Instead of removing deleted documents, they are moved to this directory.

This must be writeable by the user running paperless. When running inside docker, ensure that this path is within a permanent volume (such as "../media/trash") so it won't get lost on upgrades.

Note that the directory must exist prior to using this setting.

Defaults to empty (i.e. really delete documents).

PAPERLESS_MEDIA_ROOT=<path>

This is where your documents and thumbnails are stored.

You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless store all its data within the same volume.

Defaults to "../media/", relative to the "src" directory.

PAPERLESS_STATICDIR=<path>

Override the default STATIC_ROOT here. This is where all static files created using the "collectstatic" management command are stored.

Unless you're doing something fancy, there is no need to override this. If this is changed, you may need to run collectstatic again.

Defaults to "../static/", relative to the "src" directory.

PAPERLESS_FILENAME_FORMAT=<format>

Changes the filenames paperless uses to store documents in the media directory. See File name handling for details.

Default is none, which disables this feature.

PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool>

Tells paperless to omit placeholders in PAPERLESS_FILENAME_FORMAT that would resolve to 'none' from the resulting filename. This also holds true for directory names. See File name handling for details.

Defaults to false which disables this feature.

PAPERLESS_LOGGING_DIR=<path>

This is where paperless will store log files.

Defaults to PAPERLESS_DATA_DIR/log/.

PAPERLESS_NLTK_DIR=<path>

This is where paperless will search for the data required for NLTK processing, if you are using it. If you are using the Docker image, this should not be changed, as the data is included in the image already.

Previously, the location defaulted to PAPERLESS_DATA_DIR/nltk. Unless you are using this in a bare metal install or other setup, this folder is no longer needed and can be removed manually.

Defaults to /usr/share/nltk_data

Logging

PAPERLESS_LOGROTATE_MAX_SIZE=<num>

Maximum file size for log files before they are rotated, in bytes.

Defaults to 1 MiB.

PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>

Number of rotated log files to keep.

Defaults to 20.

Hosting & Security

PAPERLESS_SECRET_KEY=<key>

Paperless uses this to make session tokens. If you expose paperless on the internet, you need to change this, since the default secret is well known.

Use any sequence of characters. The more, the better. You don't need to remember this. Just face-roll your keyboard.

Default is listed in the file src/paperless/settings.py.

PAPERLESS_URL=<url>

This setting can be used to set the three options below (ALLOWED_HOSTS, CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are set, the values will be combined with this one. Do not include a trailing slash. E.g. https://paperless.domain.com

Defaults to empty string, leaving the other settings unaffected.

Note

This value cannot contain a path (e.g. domain.com/path), even if you are installing paperless-ngx at a subpath.
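
To make the relationship concrete, here is a minimal Python sketch of how one URL could feed the three options. The function derive_host_settings is hypothetical, and the split (bare hostname for the allowed hosts, full origin for the other two) is an assumption made for illustration, not a statement of how Paperless combines the values internally.

```python
from urllib.parse import urlparse


def derive_host_settings(paperless_url):
    """Derive illustrative values for the three options from one URL."""
    parsed = urlparse(paperless_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("expected a URL such as https://paperless.domain.com")
    if parsed.path:
        # Covers both a trailing slash ("/") and a subpath ("/path").
        raise ValueError("the URL must not contain a trailing slash or a path")
    return {
        "ALLOWED_HOSTS": [parsed.hostname],
        "CORS_ALLOWED_HOSTS": [paperless_url],
        "CSRF_TRUSTED_ORIGINS": [paperless_url],
    }
```

Rejecting any path up front mirrors the two restrictions stated above: no trailing slash and no subpath.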

PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>

A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0 this is required to access the Django admin via the web. See the Django project documentation on this setting.

Can also be set using PAPERLESS_URL (see above).

Defaults to empty string, which does not add any origins to the trusted list.

PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>

If you're planning on putting Paperless on the open internet, then you really should set this value to the domain name you're using. Failing to do so leaves you open to HTTP host header attacks. You can read more about this in the Django project's documentation.

Just remember that this is a comma-separated list, so "example.com" is fine, as is "example.com,www.example.com", but NOT " example.com" or "example.com,".

Can also be set using PAPERLESS_URL (see above).

"localhost" is always allowed for docker healthcheck

Defaults to "*", which is all hosts.
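
The formatting rule for the comma-separated list can be checked with a few lines of Python. This is a sketch only; parse_host_list is a hypothetical helper, not part of Paperless.

```python
def parse_host_list(value):
    """Split a comma-separated host list, rejecting stray spaces and empty items."""
    hosts = value.split(",")
    for host in hosts:
        # An empty item means a leading, trailing or doubled comma; a
        # difference after strip() means surrounding whitespace.
        if host == "" or host != host.strip():
            raise ValueError(f"malformed entry {host!r} in {value!r}")
    return hosts
```

The two valid examples above pass, while " example.com" and "example.com," both raise ValueError.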

PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>

You need to add your servers to the list of allowed hosts that can do CORS calls. Set this to your public domain name.

Can also be set using PAPERLESS_URL (see above).

PAPERLESS_TRUSTED_PROXIES=<comma-separated-list>

This may be needed to prevent IP address spoofing if you are using e.g. fail2ban with log entries for failed authorization attempts. Value should be IP address(es).

Defaults to empty string.

PAPERLESS_FORCE_SCRIPT_NAME=<path>

To host paperless under a subpath URL like example.com/paperless, set this value to /paperless. No trailing slash!

Defaults to none, which hosts paperless at "/".

PAPERLESS_STATIC_URL=<path>

Override the STATIC_URL here. Unless you're hosting Paperless off a subpath like /paperless/, you probably don't need to change this. If you do change it, be sure to include the trailing slash.

Defaults to "/static/".

Note

When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath, e.g. example.com/paperlessngx, you will also need to set PAPERLESS_FORCE_SCRIPT_NAME (see above).

PAPERLESS_AUTO_LOGIN_USERNAME=<username>

Specify a username here so that paperless will automatically perform login with the selected user.

Danger

Do not use this when exposing paperless on the internet. There are no checks in place that would prevent you from doing this.

Defaults to none, which disables this feature.

PAPERLESS_ADMIN_USER=<username>

If this environment variable is specified, Paperless automatically creates a superuser with the provided username at start. This is useful in cases where you cannot run the createsuperuser command separately, such as Kubernetes or AWS ECS.

Requires PAPERLESS_ADMIN_PASSWORD be set.

Note

This will not change an existing [super]user's password, nor will it recreate a user that already exists. You can leave this set throughout the lifecycle of the containers.

PAPERLESS_ADMIN_MAIL=<email>

(Optional) Specify superuser email address. Only used when PAPERLESS_ADMIN_USER is set.

Defaults to root@localhost.

PAPERLESS_ADMIN_PASSWORD=<password>

Only used when PAPERLESS_ADMIN_USER is set. This will be the password of the automatically created superuser.

PAPERLESS_COOKIE_PREFIX=<str>

Specify a prefix that is added to the cookies used by paperless to identify the currently logged in user. This is useful when you're running two instances of paperless on the same host.

After changing this, you will have to log in again.

Defaults to "", which does not alter the cookie names.

PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>

Allows authentication via HTTP_REMOTE_USER, which is used by some SSO applications.

Warning

This will allow authentication by simply adding a Remote-User: <username> header to a request. Use with care! You especially must ensure that any such header is not passed from external requests to your reverse-proxy to paperless (that would effectively bypass all authentication).

If you're exposing paperless to the internet directly (i.e. without a reverse proxy), do not use this.

Also see the warning in the official documentation.

Defaults to "false" which disables this feature.

PAPERLESS_ENABLE_HTTP_REMOTE_USER_API=<bool>

Allows authentication via HTTP_REMOTE_USER directly against the API.

Warning

See the warning above about securing your installation when using remote user header authentication. This setting is separate from PAPERLESS_ENABLE_HTTP_REMOTE_USER to avoid introducing a security vulnerability to existing reverse proxy setups. As above, ensure that your reverse proxy does not simply pass the Remote-User header from the internet to paperless.

Defaults to "false" which disables this feature.

PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>

If PAPERLESS_ENABLE_HTTP_REMOTE_USER or PAPERLESS_ENABLE_HTTP_REMOTE_USER_API is enabled, this property allows you to customize the name of the HTTP header from which the authenticated username is extracted. Values are in terms of HttpRequest.META. Thus, the configured value must start with HTTP_ followed by the normalized actual header name.

Defaults to "HTTP_REMOTE_USER".
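
The normalization can be illustrated with a short Python sketch. Django documents that request headers appear in HttpRequest.META upper-cased, with hyphens replaced by underscores and an HTTP_ prefix added; the helper name meta_key_for_header is hypothetical.

```python
def meta_key_for_header(header_name):
    """Convert an HTTP header name into its HttpRequest.META key."""
    return "HTTP_" + header_name.upper().replace("-", "_")
```

For a proxy that sends a header called X-Forwarded-User (an example name, not taken from this page), the value to configure would be HTTP_X_FORWARDED_USER.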

PAPERLESS_LOGOUT_REDIRECT_URL=<str>

URL to redirect the user to after a logout. This can be used together with PAPERLESS_ENABLE_HTTP_REMOTE_USER and SSO to redirect the user back to the SSO application's logout page to complete the logout process.

Defaults to None, which disables this feature.

PAPERLESS_USE_X_FORWARD_HOST=<bool>

Configures the Django setting USE_X_FORWARDED_HOST, which may be needed for hosting behind a proxy.

Defaults to False

PAPERLESS_USE_X_FORWARD_PORT=<bool>

Configures the Django setting USE_X_FORWARDED_PORT, which may be needed for hosting behind a proxy.

Defaults to False

PAPERLESS_PROXY_SSL_HEADER=<json-list>

Configures the Django setting SECURE_PROXY_SSL_HEADER, which may be needed for hosting behind a proxy. The two values in the list will form the tuple of HTTP header/value expected by Django, e.g. '["HTTP_X_FORWARDED_PROTO", "https"]'.

Defaults to None

Warning

Setting this value has security implications. Read the Django documentation and be sure you understand its usage before setting it.
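
As an illustration of the JSON-list format for PAPERLESS_PROXY_SSL_HEADER, the Python sketch below turns such a value into the two-element tuple form that Django's SECURE_PROXY_SSL_HEADER uses. The function name parse_proxy_ssl_header is hypothetical and the validation is an assumption, not Paperless's own parsing code.

```python
import json


def parse_proxy_ssl_header(raw):
    """Parse a JSON list of two strings into a (header, value) tuple."""
    values = json.loads(raw)
    is_pair_of_strings = (
        isinstance(values, list)
        and len(values) == 2
        and all(isinstance(item, str) for item in values)
    )
    if not is_pair_of_strings:
        raise ValueError("expected a JSON list of exactly two strings")
    return tuple(values)
```

Note that the value must be valid JSON, so the inner strings need double quotes as in the example above.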

PAPERLESS_EMAIL_CERTIFICATE_LOCATION=<path>

Configures an additional SSL certificate file containing a certificate or certificate chain which should be trusted for validating SSL connections against mail providers. This is for use with self-signed certificates against local IMAP servers.

Defaults to None.

Warning

Setting this value has implications for the security of your email. Understand what it does and be sure you need it before setting it.

PAPERLESS_SOCIALACCOUNT_PROVIDERS=<json>

This variable is used to set up login and signup via social account providers which are compatible with django-allauth. See the corresponding django-allauth documentation for a list of provider configurations. You will also need to include the relevant Django 'application' inside the PAPERLESS_APPS setting to activate that specific authentication provider (e.g. allauth.socialaccount.providers.openid_connect for the OpenID Connect provider).

Defaults to None, which does not enable any third party authentication systems.

PAPERLESS_SOCIAL_AUTO_SIGNUP=<bool>

Attempt to sign up the user using the email, username etc. retrieved from the third party authentication system. See the corresponding django-allauth documentation.

Defaults to False

PAPERLESS_SOCIALACCOUNT_ALLOW_SIGNUPS=<bool>

Allow users to sign up for a new Paperless-ngx account using any configured third party authentication system.

Defaults to True

PAPERLESS_ACCOUNT_ALLOW_SIGNUPS=<bool>

Allow users to sign up for a new Paperless-ngx account.

Defaults to False

PAPERLESS_ACCOUNT_DEFAULT_HTTP_PROTOCOL=<string>

The protocol used when generating URLs, e.g. login callback URLs. See the corresponding django-allauth documentation.

Defaults to 'https'

PAPERLESS_ACCOUNT_EMAIL_VERIFICATION=<string>

Determines whether email addresses are verified during signup (as performed by Django allauth). See the relevant paperless settings and the allauth docs.

Defaults to 'optional'

Note

If you do not have a working email server set up you should set this to 'none'.

PAPERLESS_DISABLE_REGULAR_LOGIN=<bool>

Disables the regular frontend username / password login, e.g. once you have set up SSO. Note that this setting does not disable the Django admin login. To prevent logins directly to Django, consider blocking /admin/ in your web server or reverse proxy configuration.

Defaults to False

PAPERLESS_ACCOUNT_SESSION_REMEMBER=<bool>

See the corresponding django-allauth documentation.

OCR settings

Paperless uses OCRmyPDF for performing OCR on documents and images. Paperless uses sensible defaults for most settings, but all of them can be configured to your needs.

PAPERLESS_OCR_LANGUAGE=<lang>

Customize the language that paperless will attempt to use when parsing documents.

It should be a 3-letter code, see the list of languages Tesseract supports.

Set this to the language most of your documents are written in.

This can be a combination of multiple languages such as deu+eng, in which case Tesseract will use whatever language matches best. Keep in mind that Tesseract uses much more CPU time with multiple languages enabled.

Defaults to "eng".

Note

If your language contains a '-' such as chi-sim, you must use chi_sim.
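
The two formatting rules for PAPERLESS_OCR_LANGUAGE (join multiple languages with "+", write "_" instead of "-") can be captured in a small Python sketch. The helper name ocr_language_value is hypothetical.

```python
def ocr_language_value(*codes):
    """Build an OCR language value from one or more language codes."""
    if not codes:
        raise ValueError("at least one language code is required")
    # Hyphens are written as underscores; multiple languages are joined with '+'.
    return "+".join(code.replace("-", "_") for code in codes)
```

For example, ocr_language_value("deu", "eng") gives the combined form mentioned above.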

PAPERLESS_OCR_MODE=<mode>

Tell paperless when and how to perform OCR on your documents. Three modes are available:

skip: Paperless skips pages that already contain text and performs OCR only on pages where no text is present. This is the safest option.
redo: Paperless will OCR all pages of your documents and attempt to replace any existing text layers with new text. This is useful for documents from scanners that already performed OCR with insufficient results. It will also perform OCR on purely digital documents.
This option may fail on some documents that have features that cannot be removed, such as forms. In this case, the text from the document is used instead.
force: Paperless rasterizes your documents, converting any text into images, and puts the OCRed text on top. This works for all documents; however, the resulting document may be significantly larger and text won't appear as sharp when zoomed in.

The default is skip, which only performs OCR when necessary and always creates archived documents.

Read more about this in the OCRmyPDF documentation.

PAPERLESS_OCR_SKIP_ARCHIVE_FILE=<mode>

Specify when you would like paperless to skip creating an archived version of your documents. This is useful if you don't want to have two almost-identical versions of your documents in the media folder.

never: Never skip creating an archived version.
with_text: Skip creating an archived version for documents that already have embedded text.
always: Always skip creating an archived version.

The default is never.

PAPERLESS_OCR_CLEAN=<mode>

Tells paperless to use unpaper to clean any input document before sending it to tesseract. This uses more resources, but generally results in better OCR results. The following modes are available:

clean: Apply unpaper.
clean-final: Apply unpaper, and use the cleaned images to build the output file instead of the original images.
none: Do not apply unpaper.

Defaults to clean.

Note

clean-final is incompatible with OCR mode redo. When both clean-final and the OCR mode redo are configured, clean is used instead.

PAPERLESS_OCR_DESKEW=<bool>

Tells paperless to correct skewing (slight rotation of input images mainly due to improper scanning).

Defaults to true, which enables this feature.

Note

Deskewing is incompatible with OCR mode redo. Deskewing will get disabled automatically if redo is used as the OCR mode.
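
The two redo-related notes (for PAPERLESS_OCR_CLEAN and PAPERLESS_OCR_DESKEW) can be summarized in one Python sketch that computes the settings that effectively apply. The function effective_ocr_settings is hypothetical and only restates the documented fallbacks; it is not Paperless's implementation.

```python
def effective_ocr_settings(mode, clean="clean", deskew=True):
    """Return the OCR settings that take effect for a given OCR mode."""
    if mode == "redo":
        # clean-final is incompatible with redo, so clean is used instead.
        if clean == "clean-final":
            clean = "clean"
        # Deskewing is disabled automatically in redo mode.
        deskew = False
    return {"mode": mode, "clean": clean, "deskew": deskew}
```

In the skip and force modes the configured values are returned unchanged.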

PAPERLESS_OCR_ROTATE_PAGES=<bool>

Tells paperless to correct page rotation (90°, 180° and 270° rotation).

If you notice that paperless is not rotating incorrectly rotated pages (or vice versa), try adjusting the threshold up or down (see below).

Defaults to true, which enables this feature.

PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>

Adjust the threshold for automatic page rotation by PAPERLESS_OCR_ROTATE_PAGES. This is an arbitrary value reported by tesseract. "15" is a very conservative value, whereas "2" is a very aggressive option and will often result in correctly rotated pages being rotated as well.

Defaults to "12".

PAPERLESS_OCR_OUTPUT_TYPE=<type>

Specify the type of PDF documents that paperless should produce.

pdf: Modify the PDF document as little as possible.
pdfa: Convert PDF documents into PDF/A-2b documents, which is a subset of the entire PDF specification and meant for storing documents long term.
pdfa-1, pdfa-2, pdfa-3: Specify the exact version of PDF/A you wish to use.

If not specified, pdfa is used. Remember that paperless also keeps the original input file as well as the archived version.

PAPERLESS_OCR_PAGES=<num>

Tells paperless to use only the specified number of pages for OCR. Documents with fewer pages than the specified number are OCRed completely.

Specifying 1 here will only use the first page.

The value must be greater than or equal to 1 to be used.

When combined with PAPERLESS_OCR_MODE=redo or PAPERLESS_OCR_MODE=force, paperless will not modify any text it finds on excluded pages and will copy it verbatim.

Defaults to unset, which disables this feature and always uses all pages.

PAPERLESS_OCR_IMAGE_DPI=<num>

Paperless will OCR any images you put into the system and convert them into PDF documents. This is useful if your scanner produces images. In order to do so, paperless needs to know the DPI of the image. Most images from scanners will have this information embedded and paperless will detect and use that information. In case this fails, it uses this value as a fallback.

Set this to the DPI your scanner produces images at.

Defaults to unset, which will automatically calculate image DPI so that the produced PDF documents are A4 sized.

PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>

Paperless will raise a warning when OCRing images which are over this limit and will not OCR images which are more than twice this limit. Note this does not prevent the document from being consumed, but could result in missing text content.

If unset, will default to the value determined by Pillow.

Setting this value to 0 will entirely disable the limit. See the below warning.

Note

Increasing this limit could cause Paperless to consume additional resources when consuming a file. Be sure you have sufficient system resources.

Warning

The limit is intended to prevent malicious files from consuming system resources and causing crashes and other errors. Only change this value if you are certain your documents are not malicious and you need the text which was not OCRed.
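
The thresholds described for PAPERLESS_OCR_MAX_IMAGE_PIXELS can be expressed as a small decision function. This Python sketch is illustrative only; classify_image and its return labels are hypothetical, and it takes an explicit limit rather than modelling the unset case, where Pillow's own default applies.

```python
def classify_image(pixel_count, limit):
    """Return 'ocr', 'warn' or 'skip' for an image of the given size."""
    if limit == 0:
        # A value of 0 disables the limit entirely.
        return "ocr"
    if pixel_count > 2 * limit:
        # More than twice the limit: the image is not OCRed.
        return "skip"
    if pixel_count > limit:
        # Over the limit: OCR proceeds, but a warning is raised.
        return "warn"
    return "ocr"
```

Even in the "skip" case the document is still consumed; only its text content may be missing.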

PAPERLESS_OCR_COLOR_CONVERSION_STRATEGY=<RGB>

Controls the Ghostscript color conversion strategy when creating the archive file. This setting will only be utilized if the output is a version of PDF/A.

Valid options are CMYK, Gray, LeaveColorUnchanged, RGB or UseDeviceIndependentColor.

You can find more on the settings here in the Ghostscript documentation.

Warning

Utilizing some of the options may result in errors when creating archive files from PDFs.

PAPERLESS_OCR_USER_ARGS=<json>

OCRmyPDF offers many more options. Use this parameter to specify any additional arguments you wish to pass to OCRmyPDF. Since Paperless uses the API of OCRmyPDF, you have to specify these in a format that can be passed to the API. See the API reference of OCRmyPDF for valid parameters. All command line options are supported, but they use underscores instead of dashes.

Warning

Paperless has been tested to work with the OCR options provided above. There are many options that are incompatible with each other, so specifying invalid options may prevent paperless from consuming any documents. Use with caution!

Specify arguments as a JSON dictionary. Keep note of lower case booleans and double quoted parameter names and strings. Examples:

{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
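
One way to avoid quoting and boolean-case mistakes is to generate the JSON from a Python dictionary. The sketch below reproduces the example above; the helper name ocr_user_args_value is hypothetical.

```python
import json


def ocr_user_args_value(args):
    """Serialize a dict of OCRmyPDF arguments to a JSON dictionary string."""
    # json.dumps writes Python's True/False as lower-case true/false and
    # always uses double quotes for keys and strings.
    return json.dumps(args)


user_args = {"deskew": True, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
print("PAPERLESS_OCR_USER_ARGS=" + ocr_user_args_value(user_args))
```

Depending on where the value is set (shell, env file, YAML), additional quoting around the whole JSON string may be required.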

Software tweaks

PAPERLESS_TASK_WORKERS=<num>

Paperless does multiple things in the background: Maintain the search index, maintain the automatic matching algorithm, check emails, consume documents, etc. This variable specifies how many things it will do in parallel.

Defaults to 1

PAPERLESS_THREADS_PER_WORKER=<num>

Furthermore, paperless uses multiple threads when consuming documents to speed up OCR. This variable specifies how many pages paperless will process in parallel on a single document.

Warning

Ensure that the product

PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER

does not exceed your CPU core count or else paperless will be extremely slow. If you want paperless to process many documents in parallel, choose a high worker count. If you want paperless to process very large documents faster, use a higher thread per worker count.

The default is a balance between the two, according to your CPU core count, with a slight favor towards threads per worker:

CPU core count	Workers	Threads
1	1	1
2	2	1
4	2	2
6	2	3
8	2	4
12	3	4
16	4	4

If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust PAPERLESS_THREADS_PER_WORKER automatically.
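
The rule that PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER should not exceed the CPU core count is easy to check before deploying. The following Python sketch shows one way to do so; the function name parallelism_fits is hypothetical, and os.cpu_count() is used as a stand-in for however the cores available to Paperless are counted on your system.

```python
import os


def parallelism_fits(task_workers, threads_per_worker, cpu_cores=None):
    """Return True if workers * threads does not exceed the CPU core count."""
    if cpu_cores is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cpu_cores = os.cpu_count() or 1
    return task_workers * threads_per_worker <= cpu_cores
```

On an 8-core machine, 2 workers with 4 threads each fit, while 3 workers with 4 threads each do not.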

PAPERLESS_WORKER_TIMEOUT=<num>

Machines with few cores or weak ones might not be able to finish OCR on large documents within the default 1800 seconds. So extending this timeout may prove to be useful on weak hardware setups.

PAPERLESS_TIME_ZONE=<timezone>

Set the time zone here. See the Django project documentation for details on why and how to set it.

Defaults to UTC.

PAPERLESS_ENABLE_NLTK=<bool>

Enables or disables the advanced natural language processing used during automatic classification. If disabled, paperless will still perform some basic text pre-processing before matching.

PAPERLESS_EMAIL_TASK_CRON=<cron expression>

Configures the scheduled email fetching frequency. The value should be a valid crontab(5) expression describing when to run.

If set to the string "disable", no emails will be fetched automatically.

Defaults to */10 * * * * or every ten minutes.

PAPERLESS_TRAIN_TASK_CRON=<cron expression>

Configures the scheduled automatic classifier training frequency. The value should be a valid crontab(5) expression describing when to run.

If set to the string "disable", the classifier will not be trained automatically.

Defaults to 5 */1 * * * or every hour at 5 minutes past the hour.

PAPERLESS_INDEX_TASK_CRON=<cron expression>

Configures the scheduled search index update frequency. The value should be a valid crontab(5) expression describing when to run.

If set to the string "disable", the search index will not be automatically updated.

Defaults to 0 0 * * * or daily at midnight.

PAPERLESS_SANITY_TASK_CRON=<cron expression>

Configures the scheduled sanity checker frequency.

If set to the string "disable", the sanity checker will not run automatically.

Defaults to 30 0 * * sun or Sunday at 30 minutes past midnight.

PAPERLESS_ENABLE_COMPRESSION=<bool>

Enables compression of the responses from the webserver.

Defaults to 1, enabling compression.

Note

If you are using a proxy such as nginx, it is likely more efficient to enable compression in your proxy configuration rather than the webserver.

PAPERLESS_CONVERT_MEMORY_LIMIT=<num>

On smaller systems, or even in the case of Very Large Documents, the consumer may explode, complaining about how it's "unable to extend pixel cache". In such cases, try setting this to a reasonably low value, like 32. The default is to use whatever is necessary to do everything without writing to disk, and units are in megabytes.

For more information on how to use this value, you should search the web for "MAGICK_MEMORY_LIMIT".

Defaults to 0, which disables the limit.

PAPERLESS_CONVERT_TMPDIR=<path>

Similar to the memory limit, if you've got a small system and your OS mounts /tmp as tmpfs, you should set this to a path that's on a physical disk, like /home/your_user/tmp or something. ImageMagick will use this as scratch space when crunching through very large documents.

For more information on how to use this value, you should search the web for "MAGICK_TMPDIR".

Default is none, which disables the temporary directory.

PAPERLESS_APPS=<string>

A comma-separated list of Django apps to be included in Django's INSTALLED_APPS. This setting should be used with caution!

Defaults to None, which does not add any additional apps.

PAPERLESS_MAX_IMAGE_PIXELS=<number>

Configures the maximum size of an image PIL will allow to load without warning or error.

If unset, will default to the value determined by Pillow.

Defaults to None, which does not change the limit.

Warning

This limit is designed to prevent denial of service from malicious files. It should only be raised or disabled in certain circumstances and with great care.

Document Consumption

PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>

When the consumer detects a duplicate document, it will not touch the original document. This default behavior can be changed here.

Defaults to false.

PAPERLESS_CONSUMER_RECURSIVE=<bool>

Enable recursive watching of the consumption directory. Paperless will then pick up files from subdirectories within your consumption directory as well.

Defaults to false.

PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>

Set the names of subdirectories as tags for consumed files. E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to the consumed file. Paperless will create any tags that don't exist yet.

This is useful for sorting documents with certain tags such as car or todo prior to consumption. These folders won't be deleted.

PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.

Defaults to false.
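
The mapping from subdirectories to tags for PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS can be sketched with pathlib. This is an illustration of the described behavior, not the consumer's actual code; tags_from_subdirs is a hypothetical name and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def tags_from_subdirs(consumption_dir, file_path):
    """Return the subdirectory names between the consumption dir and the file."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(consumption_dir))
    # Every path component except the final file name becomes a tag.
    return list(relative.parts[:-1])
```

A file placed directly in the consumption directory yields an empty list, so no tags are added.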

PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>

By default, paperless ignores certain files and folders in the consumption directory, such as system files created by the Mac OS or hidden folders some tools use to store data.

This can be adjusted by configuring a custom JSON array with patterns to exclude.

For example, .DS_STORE/* will ignore any files found in a folder named .DS_STORE, including .DS_STORE/bar.pdf and foo/.DS_STORE/bar.pdf

A pattern like ._* will ignore anything starting with ._, including: ._foo.pdf and ._bar/foo.pdf

Defaults to [".DS_Store", ".DS_STORE", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini", "@eaDir/*", "Thumbs.db"].
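
To get a feel for how such patterns behave, the Python sketch below applies shell-style matching to each component of a path relative to the consumption directory. It is one possible interpretation that reproduces the examples above, not necessarily how Paperless evaluates the patterns; is_ignored is a hypothetical helper and POSIX-style paths are assumed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(relative_path, patterns):
    """Return True if any path component matches any ignore pattern."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Directory components get a trailing slash so that patterns such as
    # ".DS_STORE/*" can match the folder itself; the file name is kept as-is.
    components = [part + "/" for part in parts[:-1]] + [parts[-1]]
    return any(
        fnmatchcase(component, pattern)
        for component in components
        for pattern in patterns
    )
```

For instance, is_ignored("foo/.DS_STORE/bar.pdf", [".DS_STORE/*"]) is true because the .DS_STORE folder component matches the pattern.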

PAPERLESS_CONSUMER_BARCODE_SCANNER=<string>

Sets the barcode scanner used for barcode functionality.

Currently, "PYZBAR" (the default) or "ZXING" can be selected. If your barcodes or QR codes are not detected (especially with bad scan quality and/or small codes), try the other one.

zxing is not available on all platforms.

PAPERLESS_PRE_CONSUME_SCRIPT=<filename>

After some initial validation, Paperless can trigger an arbitrary script if you like before beginning consumption. This script will be provided with data to work with via the environment.

For more information, take a look at pre-consumption script.

The default is blank, which means nothing will be executed.

PAPERLESS_POST_CONSUME_SCRIPT=<filename>

After a document is consumed, Paperless can trigger an arbitrary script. This script will be provided data to work with via the environment.

For more information, take a look at Post-consumption script.

The default is blank, which means nothing will be executed.

PAPERLESS_FILENAME_DATE_ORDER=<format>

Paperless will check the document text for document date information. Use this setting to enable checking the document filename for date information. The date order can be set to any option as specified in https://dateparser.readthedocs.io/en/latest/settings.html#date-order. The filename will be checked first, and if nothing is found, the document text will be checked as normal.

A date in a filename must have some separators (., ,, -, /, etc) for it to be parsed.

Defaults to none, which disables this feature.

PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num>

Paperless searches an entire document for dates. The first date found will be used as the initial value for the created date. When this variable is greater than 0 (or left to its default value), paperless will also suggest other dates found in the document, up to a maximum of this setting. Note that duplicates will be removed, which can result in fewer dates displayed in the frontend than this setting value.

The task to find all dates can be time-consuming and increases with a higher (maximum) number of suggested dates and slower hardware.

Defaults to 3. Set to 0 to disable this feature.
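The Python sketch below shows why removing duplicates can leave fewer suggestions than the configured maximum. It assumes the limit is applied before duplicates are dropped, which is one reading of the description above; the helper name suggested_dates is hypothetical.

```python
from datetime import date


def suggested_dates(found: list[date], limit: int = 3) -> list[date]:
    """Keep at most `limit` dates in document order, then drop repeats."""
    if limit <= 0:
        return []  # a limit of 0 disables suggestions
    # dict.fromkeys preserves first-seen order while removing duplicates.
    return list(dict.fromkeys(found[:limit]))


found = [date(2024, 1, 5), date(2024, 1, 5), date(2023, 12, 1), date(2022, 7, 9)]
print(suggested_dates(found, limit=3))
```

With the sample input, three dates fall within the limit but only two distinct ones remain.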

PAPERLESS_THUMBNAIL_FONT_NAME=<filename>

Paperless creates thumbnails for plain text files by rendering the content of the file on an image and uses a predefined font for that. This font can be changed here.

Note that this won't have any effect on already generated thumbnails.

Defaults to /usr/share/fonts/liberation/LiberationSerif-Regular.ttf.

PAPERLESS_IGNORE_DATES=<string>

Paperless parses a document's creation date from filename and file content. You may specify a comma separated list of dates that should be ignored during this process. This is useful for special dates (like date of birth) that appear in documents regularly but are very unlikely to be the document's creation date.

The date is parsed using the order specified in PAPERLESS_DATE_ORDER.

Defaults to an empty string to not ignore any dates.

PAPERLESS_DATE_ORDER=<format>

Paperless will try to determine the document creation date from its contents. Specify the date format Paperless should expect to see within your documents.

This option defaults to DMY which translates to day first, month second, and year last order. Characters D, M, or Y can be shuffled to meet the required order.
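The date order matters most for ambiguous numeric dates. The Python sketch below uses only the standard library, not the parser Paperless relies on, and shows how the same text resolves to different dates under different orderings. The helper parse_numeric_date is hypothetical and handles only three numeric fields.

```python
import re
from datetime import date


def parse_numeric_date(text: str, order: str = "DMY") -> date:
    """Interpret three numeric fields according to a D/M/Y ordering string."""
    fields = re.findall(r"\d+", text)
    if len(fields) != 3 or sorted(order) != ["D", "M", "Y"]:
        raise ValueError("expected three numeric fields and an order such as 'DMY'")
    # Pair each letter of the order with the field found in that position.
    values = {letter: int(field) for letter, field in zip(order, fields)}
    return date(values["Y"], values["M"], values["D"])


print(parse_numeric_date("01.02.2024", "DMY"))  # read as 1 February 2024
print(parse_numeric_date("01.02.2024", "MDY"))  # read as 2 January 2024
```

If most of your documents come from a region that writes the month first, MDY would be the natural choice.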

Polling

PAPERLESS_CONSUMER_POLLING=<num>

If paperless won't find documents added to your consume folder, it might not be able to automatically detect filesystem changes. In that case, specify a polling interval in seconds here, which will then cause paperless to periodically check your consumption directory for changes. This will also disable listening for file system changes with inotify.

Defaults to 0, which disables polling and uses filesystem notifications.

PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num>

If consumer polling is enabled, sets the maximum number of times paperless will check for a file to remain unmodified. If a file's modification time and size are identical for two consecutive checks, it will be consumed.

Defaults to 5.

PAPERLESS_CONSUMER_POLLING_DELAY=<num>

If consumer polling is enabled, sets the delay in seconds between each check (above) paperless will do while waiting for a file to remain unmodified.

Defaults to 5.
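The interplay of the retry count and the delay can be sketched in Python as follows. This is an illustrative reading of the two settings rather than the consumer's real implementation; the function name wait_until_stable and the exact way retries are counted are assumptions.

```python
import os
import time


def wait_until_stable(path: str, retry_count: int = 5, delay: float = 5.0) -> bool:
    """Return True once size and mtime match on two consecutive checks."""
    previous = None
    for _ in range(retry_count):
        stat = os.stat(path)
        current = (stat.st_mtime, stat.st_size)
        if current == previous:
            return True  # unchanged since the last check: ready to consume
        previous = current
        time.sleep(delay)
    return False  # still changing after the allowed number of checks
```

Under this reading, a file that is no longer being written is accepted on the second check, so the defaults would add roughly one delay period before consumption starts.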

iNotify

PAPERLESS_CONSUMER_INOTIFY_DELAY=<num>

Sets the time in seconds the consumer will wait for additional events from inotify before the consumer will consider a file ready and begin consumption. Certain scanners or network setups may generate multiple events for a single file, leading to multiple consumers working on the same file. Configure this to prevent that.

Defaults to 0.5 seconds.

Barcodes

PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>

Enables the scanning and page separation based on detected barcodes. This allows for scanning and adding multiple documents per uploaded file, which are separated by one or multiple barcode pages.

For ease of use, it is suggested to use a standardized separation page, e.g. here.

If no barcodes are detected in the uploaded file, no page separation will happen.

The original document will be removed and the separated pages will be saved as PDF.

See additional information in the advanced usage documentation.

Defaults to false.

PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>

Whether TIFF image files should be scanned for barcodes. This will automatically convert any TIFF image(s) to PDFs for later processing. This only has an effect if PAPERLESS_CONSUMER_ENABLE_BARCODES has been enabled.

Defaults to false.

PAPERLESS_CONSUMER_BARCODE_STRING=<string>

Defines the string to be detected as a separator barcode. If paperless is used with the PATCH-T separator pages, users shouldn't change this.

Defaults to "PATCHT".

PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=<bool>

Enables the detection of barcodes in the scanned document and setting the ASN (archive serial number) if a properly formatted barcode is detected.

The barcode must consist of a (configurable) prefix and the ASN to be set, for instance ASN00123. The content after the prefix is cleaned of non-numeric characters.

This option is compatible with barcode page separation, since pages will be split up before reading the ASN.

If no ASN barcodes are detected in the uploaded file, no ASN will be set. If a barcode with an existing ASN is detected, the document will not be consumed and an error logged.

Defaults to false.

PAPERLESS_CONSUMER_ASN_BARCODE_PREFIX=<string>

Defines the prefix that is used to identify a barcode as an ASN barcode.

Defaults to "ASN"
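As a rough illustration of the prefix and clean-up rule for ASN barcodes, the Python sketch below strips the prefix and discards non-numeric characters from the remainder. The function name parse_asn is hypothetical and the code is not taken from Paperless itself.

```python
import re
from typing import Optional


def parse_asn(barcode_text: str, prefix: str = "ASN") -> Optional[int]:
    """Return the ASN encoded in a barcode, or None if it is not an ASN barcode."""
    if not barcode_text.startswith(prefix):
        return None
    # Drop everything after the prefix that is not a digit.
    digits = re.sub(r"\D", "", barcode_text[len(prefix):])
    return int(digits) if digits else None


print(parse_asn("ASN00123"))
print(parse_asn("TAG:invoice"))
```

In this sketch, leading zeros disappear when the digits are converted to an integer.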

PAPERLESS_CONSUMER_BARCODE_UPSCALE=<float>

Defines the upscale factor used in barcode detection. Improves the detection of small barcodes, e.g. with a value of 1.5, by upscaling the document before the detection process. Upscaling will only take place if the value is bigger than 1.0. Otherwise upscaling will not be performed to save resources. Try using in combination with PAPERLESS_CONSUMER_BARCODE_DPI set to a value higher than default.

Defaults to 0.0.

PAPERLESS_CONSUMER_BARCODE_DPI=<int>

During barcode detection every page from a PDF document needs to be converted to an image. A DPI value can be specified in the conversion process. If the detection of small barcodes fails, a bigger DPI value, e.g. 600, can fix the issue. Try using in combination with PAPERLESS_CONSUMER_BARCODE_UPSCALE bigger than 1.0.

Defaults to 300.

PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE=<bool>

Enables the detection of barcodes in the scanned document and assigns or creates tags if a properly formatted barcode is detected.

The barcode must match one of the (configurable) regular expressions. If the barcode text contains ',' (comma), it is split into multiple barcodes which are individually processed for tagging.

Matching is case insensitive.

Defaults to false.

PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING=<json dict>

Defines a dictionary of filter regex and substitute expressions.

Syntax: {"<regex>": "<substitute>" [, ...]}

A barcode is considered for tagging if the barcode text matches at least one of the provided patterns.

If a match is found, the rule is applied. This allows very versatile reformatting and mapping of barcode patterns to tag values.

If a tag is not found it will be created.

Defaults to:

{"TAG:(.*)": "\\g<1>"} which defines
- a regex TAG:(.*) which includes barcodes beginning with TAG: followed by any text that gets stored into match group #1 and
- a substitute \g<1> (escaped as \\g<1> in JSON) that replaces the original barcode text by the content in match group #1.
Consequently, the tag is the barcode text without its TAG: prefix.

More examples:

{"ASN12.*": "JOHN", "ASN13.*": "SMITH"} for example maps
- ASN12nnnn barcodes to the tag JOHN and
- ASN13nnnn barcodes to the tag SMITH.

{"T-J": "JOHN", "T-S": "SMITH", "T-D": "DOE"} directly maps
- T-J barcodes to the tag JOHN,
- T-S barcodes to the tag SMITH and
- T-D barcodes to the tag DOE.

Please refer to the Python regex documentation for more information.
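The Python sketch below shows how a mapping of this kind could turn barcode text into tag names using the standard re module. It is a simplified illustration and not Paperless's code: the helper tags_from_barcode is hypothetical, and both trimming whitespace and stopping at the first matching rule are assumptions.

```python
import re


def tags_from_barcode(barcode_text: str, mapping: dict[str, str]) -> list[str]:
    """Split on commas, then rewrite each part with the first matching rule."""
    tags = []
    for part in barcode_text.split(","):
        part = part.strip()
        for pattern, substitute in mapping.items():
            # Matching is case insensitive, as described for tag barcodes.
            if re.match(pattern, part, flags=re.IGNORECASE):
                tags.append(re.sub(pattern, substitute, part, flags=re.IGNORECASE))
                break
    return tags


mapping = {"TAG:(.*)": r"\g<1>", "T-J": "JOHN"}
print(tags_from_barcode("TAG:invoice,T-J", mapping))
```

Parts that match no rule are skipped in this sketch, so an unrelated barcode on the same page would not produce a tag.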

Audit Trail

PAPERLESS_AUDIT_LOG_ENABLED=<bool>

Enables the audit trail for documents, document types, correspondents, and tags.

Defaults to true.

Collate Double-Sided Documents

PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=<bool>

Enables automatic collation of two single-sided scans into a double-sided document.

This is useful if you have an automatic document feeder that only supports single-sided scans, but you need to scan a double-sided document. If your ADF supports double-sided scans natively, you do not need this feature.

PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.

For more information, read the corresponding section in the advanced documentation.

Defaults to false.

PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=<str>

The name of the subdirectory in which the collate feature expects documents to arrive.

This only has an effect if PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED has been enabled. Note that Paperless will not automatically create the directory.

Defaults to "double-sided".

PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=<bool>

Whether TIFF image files should be supported when collating documents. This will automatically convert any TIFF image(s) to PDFs for later processing. This only has an effect if PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED has been enabled.

Defaults to false.

Binaries

There are a few external software packages that Paperless expects to find on your system when it starts up. Unless you've done something creative with their installation, you probably won't need to edit any of these. However, if you've installed these programs somewhere where simply typing the name of the program doesn't automatically execute it (i.e. the program isn't in your $PATH), then you'll need to specify the literal path for that program.

PAPERLESS_CONVERT_BINARY=<path>

Defaults to "convert".

PAPERLESS_GS_BINARY=<path>

Defaults to "gs".

Docker-specific options

These options don't have any effect in paperless.conf. These options adjust the behavior of the docker container. Configure these in docker-compose.env.

PAPERLESS_WEBSERVER_WORKERS=<num>

The number of worker processes the webserver should spawn. More worker processes usually let the front end load data much more quickly. However, each worker process also loads the entire application into memory separately, so increasing this value will increase RAM usage.

Defaults to 1.

PAPERLESS_BIND_ADDR=<ip address>

The IP address the webserver will listen on inside the container. There are special setups where you may need to configure this value to restrict the IP address or interface the webserver listens on.

Defaults to [::], meaning all interfaces, including IPv6.

PAPERLESS_PORT=<port>

The port number the webserver will listen on inside the container. There are special setups where you may need this to avoid collisions with other services (like using podman with multiple containers in one pod).

Don't change this when using Docker. To change the port on which the webserver is reachable outside of the container, refer instead to the "ports" key in docker-compose.yml.

Defaults to 8000.

USERMAP_UID=<uid>

The ID of the paperless user in the container. Set this to your actual user ID on the host system, which you can get by executing

$ id -u

Paperless will change ownership on its folders to this user, so you need to get this right in order to be able to write to the consumption directory.

Defaults to 1000.

USERMAP_GID=<gid>

The ID of the paperless group in the container. Set this to your actual group ID on the host system, which you can get by executing

$ id -g

Paperless will change ownership on its folders to this group, so you need to get this right in order to be able to write to the consumption directory.

Defaults to 1000.

PAPERLESS_OCR_LANGUAGES=<list>

Additional OCR languages to install. By default, paperless comes with English, German, Italian, Spanish and French. If your language is not in this list, install additional languages with this configuration option. You will need to find the right LangCodes, but note that tesseract-ocr-* package names do not always correspond with the language codes, e.g. "chi_tra" should be specified as "chi-tra".

PAPERLESS_OCR_LANGUAGES=tur ces chi-tra

Make sure it's a space-separated list when using several values.

To actually use these languages, also set the default OCR language of paperless:

PAPERLESS_OCR_LANGUAGE=tur

Defaults to none, which does not install any additional languages.

Warning

This option must not be used in rootless containers.
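The relationship between Tesseract language codes and the tesseract-ocr-* package names can be sketched in Python as below. The helper tesseract_packages is hypothetical and the container's actual install logic may differ; the sketch only shows why a code written with an underscore corresponds to a package name written with a hyphen.

```python
def tesseract_packages(ocr_languages: str) -> list[str]:
    """Map a space-separated language list to tesseract-ocr-* package names."""
    # Package names use hyphens where Tesseract language codes use underscores.
    return [f"tesseract-ocr-{code.replace('_', '-')}" for code in ocr_languages.split()]


print(tesseract_packages("tur ces chi-tra"))
```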

PAPERLESS_ENABLE_FLOWER=<defined>

If this environment variable is defined, the Celery monitoring tool Flower will be started by the container.

You can read more about this in the advanced documentation.

PAPERLESS_SUPERVISORD_WORKING_DIR=<defined>

If this environment variable is defined, the supervisord.log and supervisord.pid files will be created under the path specified in PAPERLESS_SUPERVISORD_WORKING_DIR. Setting PAPERLESS_SUPERVISORD_WORKING_DIR=/tmp and PYTHONPYCACHEPREFIX=/tmp/pycache would allow paperless to work on a read-only filesystem.

Please note that the PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT paths still have to be writable, just like the PAPERLESS_SUPERVISORD_WORKING_DIR. This can be achieved by using bind or volume mounts. This only works if the container is run as the user paperless.

Frontend Settings

PAPERLESS_APP_TITLE=<str>

If set, overrides the default name "Paperless-ngx"

PAPERLESS_APP_LOGO=<path>

Path to an image file in the /media/logo directory, must include 'logo', e.g. /logo/Atari_logo.svg

PAPERLESS_ENABLE_UPDATE_CHECK=<bool>

Note

This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time migration is performed for users who have this setting set. This setting is always ignored if the corresponding frontend setting has been set.

Email sending

Setting an SMTP server for the backend will allow you to reset your password. All of these options come from their similarly-named Django settings.

PAPERLESS_EMAIL_HOST=<str>

Defaults to 'localhost'.

PAPERLESS_EMAIL_PORT=<int>

Defaults to port 25.

PAPERLESS_EMAIL_HOST_USER=<str>

Defaults to ''.

PAPERLESS_EMAIL_FROM=<str>

Defaults to PAPERLESS_EMAIL_HOST_USER if not set.

PAPERLESS_EMAIL_HOST_PASSWORD=<str>

Defaults to ''.

PAPERLESS_EMAIL_USE_TLS=<bool>

Defaults to false.

PAPERLESS_EMAIL_USE_SSL=<bool>

Defaults to false.
