Python: getting a file's extension
Quite often we have a full file name and need to find out its extension, or we need to append the right extension without knowing whether the user already typed it. Sometimes we have a relative path to a file and need the absolute one. This article covers the basic techniques for working with file names.
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d , and it will treat .bashrc as having no extension instead of having extension .bashrc :
the use of basename is a little confusing here since os.path.basename("/path/to/somefile.ext") would return "somefile.ext"
The standard Python function naming convention is really annoying - almost every time I re-look this up, I mistake it as being splittext . If they would just do anything to signify the break between parts of this name, it'd be much easier to recognize that it's splitExt or split_ext . Surely I can't be the only person who has made this mistake?
@Vingtoft You mentioned nothing about werkzeug's FileStorage in your comment and this question has nothing to do with that particular scenario. Something might be wrong with how the filename is passed to you. os.path.splitext('somefile.ext') => ('somefile', '.ext') . Feel free to provide an actual counterexample without referencing some third-party library.
New in version 3.4.
I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!
If you need all the suffixes (eg if you have a .tar.gz ), .suffixes will return a list of them!
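A short sketch of both properties. It uses PurePosixPath (an assumption made here so the example never touches the file system); Path exposes the same suffix and suffixes properties.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("somedir/file.tar.gz")  # illustrative path

last_suffix = archive.suffix      # only the final extension
all_suffixes = archive.suffixes   # every suffix, in order
combined = "".join(all_suffixes)  # joined back into one string

print(last_suffix, all_suffixes, combined)
```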
@user3780389 Wouldn't a "foo.bar.tar.gz" still be a valid ".tar.gz"? If so your snippet should be using .suffixes[-2:] to ensure only getting .tar.gz at most.
there are still cases when this does not work as expected like "filename with.a dot inside.tar" . This is the solution i am using currently: "".join([s for s in pathlib.Path('somedir/file.tar.gz').suffixes if not " " in s])
Oh, I was just wondering if there was a specific reason behind it (other than convention). I'm still learning Python and wanted to learn more!
it depends really, if you use from os import path then the name path is taken up in your local scope, also others looking at the code may not immediately know that path is the path from the os module. Where as if you use import os.path it keeps it within the os namespace and wherever you make the call people know it's path() from the os module immediately.
I know it's not semantically any different, but I personally find the construction _, extension = os.path.splitext(filename) to be much nicer-looking.
If you want the extension as part of a more complex expression the [1] may be more useful: if check_for_gzip and os.path.splitext(filename)[1] == '.gz':
To get only the text of the extension, without the dot.
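One way to do this, sketched with hypothetical file names: take the second element of the splitext result and slice off its first character. Slicing an empty string is safe, so a name without an extension simply yields an empty string.

```python
import os.path

filename = "report.final.pdf"  # hypothetical file name

# [1] selects the extension ('.pdf'); [1:] drops the leading dot.
extension = os.path.splitext(filename)[1][1:]

# No extension: splitext returns '' and ''[1:] is still ''.
no_extension = os.path.splitext("README")[1][1:]

print(repr(extension), repr(no_extension))
```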
For simple use cases one option may be splitting from dot:
No error when file doesn't have an extension:
But you must be careful:
Also will not work with hidden files in Unix systems:
For general use, prefer os.path.splitext
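The pitfalls listed above can be reproduced with a short sketch. The helper naive_extension is hypothetical and stands in for the dot-splitting approach; it is compared against os.path.splitext.

```python
import os.path


def naive_extension(filename):
    """Return whatever follows the last dot (hypothetical helper)."""
    return filename.split(".")[-1]


simple = naive_extension("photo.jpg")     # fine for the simple case
no_ext = naive_extension("README")        # the whole name comes back
dotted_dir = naive_extension("/a/b.c/d")  # a directory dot is misread
hidden = naive_extension(".bashrc")       # a hidden file is misread

# os.path.splitext handles the last three cases correctly.
safe = [os.path.splitext(p)[1] for p in ("README", "/a/b.c/d", ".bashrc")]

print(simple, no_ext, dotted_dir, hidden, safe)
```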
Not actually. Extension of a file named "x.tar.gz" is "gz" not "tar.gz". os.path.splitext gives ".gz" as extension too.
[-1] gets the last of the items produced by splitting on the dot. Example: "my.file.name.js".split('.') => ['my','file','name','js']
@BenjaminR ah ok, you are making an optimisation about result list. ['file', 'tar', 'gz'] with 'file.tar.gz'.split('.') vs ['file.tar', 'gz'] with 'file.tar.gz'.rsplit('.', 1) . yeah, could be.
worth adding a lower in there so you don't find yourself wondering why the JPG's aren't showing up in your list.
Any of the solutions above work, but on linux I have found that there is a newline at the end of the extension string which will prevent matches from succeeding. Add the strip() method to the end. For example:
To aid my understanding, please could you explain what additional behaviour the second index/slice guards against? (i.e. the [1:] in .splitext(filename)[1][1:] ) - thank you in advance
Figured it out for myself: splitext() (unlike splitting a string on '.') includes the '.' character in the extension. The additional [1:] gets rid of it.
With splitext there are problems with files with double extension (e.g. file.tar.gz , file.tar.bz2 , etc..)
but should be: .tar.gz
The possible solutions are here
@FlipMcF The filename should obviously be somefile.tar . For tar -xzvf somefile.tar.gz the filename should be somefile .
@peterhil I don't think you want your python script to be aware of the application used to create the filename. It's a bit out of scope of the question. Don't pick on the example, 'filename.csv.gz' is also quite valid.
You can find some great stuff in pathlib module (available in python 3.x).
Just join all pathlib suffixes .
Although this is an old topic, I wonder why no one has mentioned a very simple Python API called rpartition for this case:
to get extension of a given file absolute path, you can simply type:
will give you: 'csv'
For those not familiar with the API, rpartition returns a tuple: ("string before the right-most occurrence of the separator", "the separator itself", "the rest of the string") . If there's no separator found, the returned tuple will be: ("", "", "the original string") .
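A minimal sketch of rpartition on an illustrative path. Because a missing separator puts the whole string in the last slot, the hypothetical helper below checks the separator before trusting the tail. Like any plain string approach, it still misreads dots in directory names.

```python
path = "/home/user/data/report.csv"  # illustrative path

before, sep, after = path.rpartition(".")

# With no dot, the original string lands in the last element.
missing = "README".rpartition(".")


def extension_via_rpartition(name):
    """Return the text after the last dot, or '' if there is no dot."""
    _, separator, tail = name.rpartition(".")
    return tail if separator else ""


print(after, missing, repr(extension_via_rpartition("README")))
```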
Surprised this wasn't mentioned yet:
- Works as expected for anything I can think of
- No modules
- No regex
- Cross-platform
- Easily extendible (e.g. no leading dots for extension, only last part of extension)
This answer completely ignores the case where a filename contains several dots. Example: get_extension('cmocka-1.1.0.tar.xz') => '.1.0.tar.xz' - wrong.
@PADYMKO, IMHO one should not create filenames with full stops as part of the filename. The code above is not supposed to result in 'tar.xz'
You can use a split on a filename :
This does not require additional library
This results in the last char of filename being returned if the filename has no . at all. This is because rfind returns -1 if the string is not found.
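The snippet this comment refers to is not shown, so the sketch below assumes it sliced the name from the index returned by rfind. Both function names are hypothetical. The second version guards against the -1 result.

```python
def ext_rfind(filename):
    """Assumed form of the original: slice from the last dot."""
    return filename[filename.rfind("."):]


def ext_rfind_checked(filename):
    """Same idea, but return '' when rfind reports no dot (-1)."""
    index = filename.rfind(".")
    return filename[index:] if index != -1 else ""


with_dot = ext_rfind("archive.zip")
without_dot = ext_rfind("README")        # slices from -1: last character
guarded = ext_rfind_checked("README")

print(with_dot, without_dot, repr(guarded))
```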
This is a direct string manipulation technique. I see a lot of solutions mentioned, and most of them use split. Split, however, breaks the string at every occurrence of "." . What you would rather be looking for is partition.
Absolute path to a file
To get the absolute path to a file in Python, use the os library, which is imported with import os. The os.path module has an abspath function for this.
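A minimal sketch, assuming a hypothetical relative path. abspath does not check whether the file exists; it only combines the path with the current working directory and normalizes the result.

```python
import os

relative = "data/report.txt"  # hypothetical relative path

# Resolved against the current working directory.
absolute = os.path.abspath(relative)

print(absolute)
```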
The pathlib library can be used as well. It has been part of the standard library since Python 3.4; before that it had to be installed with pip install pathlib. It is designed for working with file system paths on different operating systems and suits this task well.
Python os module splitext()
splitext() function splits the file path into a tuple having two values – root and extension.
10. Getting the file's group and owner name
8. Creating and removing a directory
We can use mkdir() to create a directory and rmdir() to remove an empty one. If the directory contains files, we must delete them first and then remove the directory.
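A small sketch of that sequence using pathlib inside a temporary directory, so nothing is left behind. The directory and file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    target = Path(base) / "example_dir"
    target.mkdir()
    created = target.is_dir()

    # A non-empty directory cannot be removed with rmdir().
    note = target / "note.txt"
    note.write_text("temporary")
    try:
        target.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # Delete the contents first, then the directory itself.
    note.unlink()
    target.rmdir()
    removed = not target.exists()

print(created, rmdir_failed, removed)
```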
13. Joining two paths
File extension
In Python, a file's extension can be obtained in a similar way with the same splitext function. It returns a tuple whose first element is the name and whose second is the extension. Here we need the second element, which has index 1 because indexing starts at zero.
The pathlib library can be used in the same way, through the suffix property.
But our file has two extensions. They can be retrieved with the suffixes property, which returns a list whose elements are the extensions.
To get a file name or extension from a full path, or to get the absolute path to a file, use the os and pathlib libraries. It is better to use a ready-made solution from the standard library than to write your own.
I'm working on a certain program where I need to do different things depending on the extension of the file. Could I just use this?
6. Getting information about a file
The stat() method of a Path object performs the stat() system call and returns the result. The output is the same as that of the os module's stat() function.
Assuming m is a string, you can use endswith :
To be case-insensitive, and to eliminate a potentially large else-if chain:
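A sketch of one way to do this, not necessarily the answer's original code: lowercase the name once and pass a tuple of extensions to endswith. The function name and the categories are hypothetical.

```python
def classify(m):
    """Map a file name to a coarse category (hypothetical helper)."""
    name = m.lower()  # compare case-insensitively
    # endswith accepts a tuple, which avoids a chain of 'or' tests.
    if name.endswith((".png", ".jpg", ".jpeg")):
        return "image"
    if name.endswith((".mp3", ".flac")):
        return "audio"
    return "unknown"


print(classify("Photo.JPG"), classify("song.mp3"), classify("notes"))
```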
@Stevoisiak, I think you misplaced your comment as this solution works even in the case you point out
This doesn't account for folder names with periods. C:/folder.jpg is a valid path. You can confirm if it is a file or folder with os.path.isfile(m)
os.path provides many functions for manipulating paths/filenames. (docs)
os.path.splitext takes a path and splits the file extension from the end of it.
Gives:
This method ignores leading periods, so /.mp3 is not considered an mp3 file. This is, however, the way a leading period should be treated. E.g. .gitignore is not a file format
This doesn't account for folder names with periods. ( C:/folder.jpg/file.mp3 is a valid path). You can exclude those with os.path.isfile(m)
Use pathlib From Python3.4 onwards.
@Stevoisiak what do you mean? In what way does it not account for that? I just tried and .suffix correctly returns '.mp3'
Look at module fnmatch. That will do what you're trying to do.
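A minimal sketch with a made-up list of names. It uses fnmatch.fnmatchcase so the result does not depend on the platform: fnmatch.fnmatch and fnmatch.filter normalize case according to the operating system.

```python
import fnmatch

names = ["song.mp3", "cover.jpg", "notes.txt", "LIVE.MP3"]  # sample data

# Case-sensitive match on every platform.
exact = [n for n in names if fnmatch.fnmatchcase(n, "*.mp3")]

# Case-insensitive match: lowercase the name before comparing.
any_case = [n for n in names if fnmatch.fnmatchcase(n.lower(), "*.mp3")]

print(exact, any_case)
```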
one easy way could be:
os.path.splitext(file) will return a tuple with two values (the filename without extension + just the extension). The second index ([1]) will therefore give you just the extension. The cool thing is that this way you can also access the filename pretty easily, if needed!
An old thread, but may help future readers.
I would avoid using .lower() on filenames, if for no other reason than to make your code more platform independent. (Linux is case sensitive; .lower() on a filename will surely corrupt your logic eventually, or worse, an important file!)
Why not use re? (Although to be even more robust, you should check the magic file header of each file. How to check type of files without extensions in python? )
Is there a function to extract the extension from a filename?
Get File Extension using Pathlib Module
Pathlib module to get the file extension
Another solution with right split:
Even though this question is already answered, I'd add a regex solution.
you can use following code to split file name and extension.
A true one-liner, if you like regex. And it doesn't matter even if you have additional "." in the middle
- get all file name inside the list
- splitting file name and check the penultimate extension, is it in the pen_ext list or not?
- if yes then join it with the last extension and set it as the file's extension
- if not then just put the last extension as the file's extension
- and then check it out
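The steps above can be sketched as follows. This is a variant that leans on os.path.splitext instead of manual splitting, so names such as foo.tar, .bashrc and foo are handled. The helper name and the contents of pen_ext are assumptions.

```python
import os.path


def extension_with_penultimate(filename, pen_ext=(".tar",)):
    """Return the extension, joining a known penultimate one if present."""
    root, last = os.path.splitext(filename)
    _, penultimate = os.path.splitext(root)
    # Join only when the penultimate extension is in the allowed list.
    if last and penultimate in pen_ext:
        return penultimate + last
    return last


samples = ["backup.tar.gz", "cmocka-1.1.0.tar.xz", "data.csv.gz",
           "foo.tar", ".bashrc", "foo"]
results = [extension_with_penultimate(name) for name in samples]
print(results)
```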
This breaks for a bunch of special cases. See the accepted answer. It's reinventing the wheel, only in a buggy way.
Hello! While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
You're only making it worse, breaking it in new ways. foo.tar is a valid file name. What happens if I throw that at your code? What about .bashrc or foo ? There is a library function for this for a reason.
just create a list of extension file for the penultimate extension, if not in list then just put the last extension as the file's extension
The pathlib module in Python provides an object-oriented approach to working with files and directories. It has classes for both Unix and Windows environments. Best of all, we do not need to worry about the underlying operating system: pathlib picks the appropriate class for the platform.
2. Listing files of a specific type
We can use the Path glob() method to iterate over the files matching a given pattern. Let's use it to print all Python scripts inside a directory.
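A minimal sketch that builds a temporary directory with a few made-up files and then lists the Python scripts in it. glob() yields results in arbitrary order, so the names are sorted.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    folder = Path(base)
    for name in ("a.py", "b.py", "notes.txt"):
        (folder / name).touch()

    # '*.py' matches only entries directly inside the folder.
    scripts = sorted(p.name for p in folder.glob("*.py"))

print(scripts)
```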
Without the extension
Now let's see how to get a file name without its extension in Python. We use the splitext function. This time the example is a file with a double extension, to check how the standard functions behave in that situation.
The last extension, gz (added by the compressor), is dropped, while tar (the extension of the uncompressed archive) remains in the name.
If we need only the name, we can discard every character of the resulting string after the first dot, including the dot itself.
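A short sketch of both steps on a hypothetical path. Note that cutting at the first dot returns an empty string for hidden files such as .bashrc, so this suits ordinary names only.

```python
import os.path

full_name = os.path.basename("/home/user/archive.tar.gz")  # hypothetical

# splitext removes only the last extension.
without_last = os.path.splitext(full_name)[0]

# Keep everything before the first dot to drop all extensions.
name_only = full_name.split(".")[0]

print(without_last, name_only)
```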
Path Class
Path is the most important class in the pathlib module. It is the entry point for all the functionality the module provides. It instantiates the concrete path implementation for the current operating system, which keeps the code platform-independent.
Let's look at a few examples of using the pathlib module.
5. Opening and reading a file's contents
We can use the Path open() method to open a file. It returns a file object, just like the built-in open() function.
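A minimal sketch that writes a temporary file and reads it back through Path.open(). The file name and contents are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    sample = Path(base) / "sample.txt"
    sample.write_text("first line\nsecond line\n")

    # Path.open() takes the same mode arguments as the built-in open().
    with sample.open() as handle:
        lines = [line.rstrip("\n") for line in handle]

print(lines)
```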
14. Creating an empty file
Like the touch command in Unix, Path has a touch() method for creating an empty file. You need permission to create the file; without it, the file is not created and touch() raises PermissionError.
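A small sketch of touch() in a temporary directory. It also shows the exist_ok parameter, which defaults to True and controls whether touching an existing file is an error.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    marker = Path(base) / "marker.txt"  # hypothetical file name
    existed_before = marker.exists()

    marker.touch()                      # creates an empty file
    size_after = marker.stat().st_size

    marker.touch()                      # fine: exist_ok defaults to True
    try:
        marker.touch(exist_ok=False)
        raised = False
    except FileExistsError:
        raised = True

print(existed_before, size_after, raised)
```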
3. Resolving symbolic links to a canonical path
We can use the resolve() method to convert symbolic links to their canonical paths.
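A sketch that creates a symbolic link in a temporary directory and resolves it. It assumes a platform where the current user may create symlinks; on Windows this can require extra privileges.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    target = Path(base) / "target.txt"
    target.write_text("data")

    link = Path(base) / "link.txt"
    link.symlink_to(target)
    is_link = link.is_symlink()

    # resolve() follows the link and returns an absolute path.
    resolved = link.resolve()
    points_to_target = resolved == target.resolve()

print(is_link, points_to_target)
```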
7. Getting the name of a file or directory
We can use the name property to get the file name from a path object.
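A minimal sketch on an illustrative path, using PurePosixPath (an assumption, chosen so the result is the same on every platform). The related stem property is shown alongside name for comparison.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/archive.tar.gz")  # illustrative path

file_name = path.name           # final component, extensions included
stem = path.stem                # final component minus the last suffix
parent_name = path.parent.name  # name also works for directories

print(file_name, stem, parent_name)
```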
Conclusion
The pathlib module is very useful for object-oriented work with files and directories. Loosely coupled, platform-independent code makes it all the more attractive to use.
Source code: Lib/posixpath.py (for POSIX) and Lib/ntpath.py (for Windows NT).
This module implements some useful functions on pathnames. To read or write files see open() , and for accessing the filesystem see the os module. The path parameters can be passed as strings, or bytes, or any object implementing the os.PathLike protocol.
Unlike a unix shell, Python does not do any automatic path expansions. Functions such as expanduser() and expandvars() can be invoked explicitly when an application desires shell-like path expansion. (See also the glob module.)
The pathlib module offers high-level path objects.
All of these functions accept either only bytes or only string objects as their parameters. The result is an object of the same type, if a path or file name is returned.
Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface:
posixpath for UNIX-style paths
ntpath for Windows paths
Changed in version 3.8: exists() , lexists() , isdir() , isfile() , islink() , and ismount() now return False instead of raising an exception for paths that contain characters or bytes unrepresentable at the OS level.
Return a normalized absolutized version of the pathname path. On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)) .
Changed in version 3.6: Accepts a path-like object .
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split() . Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar' , the basename() function returns an empty string ( '' ).
Changed in version 3.6: Accepts a path-like object .
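A brief sketch of the trailing-slash behaviour, using posixpath directly so the result does not depend on the host platform. Stripping the slash first is one possible way to imitate the Unix basename program.

```python
import posixpath

with_slash = posixpath.basename("/foo/bar/")     # empty string
without_slash = posixpath.basename("/foo/bar")   # 'bar'

# basename is the second element of split().
pair = posixpath.split("/foo/bar/")

# Imitating the Unix program: drop trailing slashes before the call.
unix_like = posixpath.basename("/foo/bar/".rstrip("/"))

print(repr(with_slash), without_slash, pair, unix_like)
```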
Return the longest common sub-path of each pathname in the sequence paths. Raise ValueError if paths contain both absolute and relative pathnames, the paths are on the different drives or if paths is empty. Unlike commonprefix() , this returns a valid path.
New in version 3.5.
Changed in version 3.6: Accepts a sequence of path-like objects .
Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string ( '' ).
This function may return invalid paths because it works a character at a time. To obtain a valid path, see commonpath() .
Changed in version 3.6: Accepts a path-like object .
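The difference between the two functions shows up as soon as two paths share part of a component name. A minimal sketch with illustrative paths:

```python
import posixpath

paths = ["/usr/lib/python3", "/usr/local/lib"]  # illustrative paths

# Character-by-character: stops in the middle of a component.
prefix = posixpath.commonprefix(paths)

# Component-by-component: always a valid path.
common = posixpath.commonpath(paths)

print(prefix, common)
```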
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split() .
Changed in version 3.6: Accepts a path-like object .
Return True if path refers to an existing path or an open file descriptor. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.
Changed in version 3.3: path can now be an integer: True is returned if it is an open file descriptor, False otherwise.
Changed in version 3.6: Accepts a path-like object .
Return True if path refers to an existing path. Returns True for broken symbolic links. Equivalent to exists() on platforms lacking os.lstat() .
Changed in version 3.6: Accepts a path-like object .
On Unix and Windows, return the argument with an initial component of ~ or ~user replaced by that user’s home directory.
On Unix, an initial ~ is replaced by the environment variable HOME if it is set; otherwise the current user’s home directory is looked up in the password directory through the built-in module pwd . An initial ~user is looked up directly in the password directory.
On Windows, USERPROFILE will be used if set, otherwise a combination of HOMEPATH and HOMEDRIVE will be used. An initial ~user is handled by checking that the last directory component of the current user’s home directory matches USERNAME , and replacing it if so.
If the expansion fails or if the path does not begin with a tilde, the path is returned unchanged.
Changed in version 3.6: Accepts a path-like object .
Changed in version 3.8: No longer uses HOME on Windows.
Return the argument with environment variables expanded. Substrings of the form $name or ${name} are replaced by the value of environment variable name. Malformed variable names and references to non-existing variables are left unchanged.
On Windows, %name% expansions are supported in addition to $name and ${name} .
Changed in version 3.6: Accepts a path-like object .
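A small sketch of the expansion rules. The variable names DEMO_DATA_DIR and DEMO_UNDEFINED are made up for the example and are set or cleared in the code itself.

```python
import os
import posixpath

os.environ["DEMO_DATA_DIR"] = "/srv/data"  # hypothetical variable
os.environ.pop("DEMO_UNDEFINED", None)     # make sure this one is unset

plain = posixpath.expandvars("$DEMO_DATA_DIR/report.csv")
# Braces delimit the name when other text follows it directly.
braced = posixpath.expandvars("${DEMO_DATA_DIR}_backup/report.csv")
# References to unknown variables are left untouched.
unknown = posixpath.expandvars("$DEMO_UNDEFINED/report.csv")

print(plain, braced, unknown)
```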
Return the time of last access of path. The return value is a floating point number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.
os.path.getmtime(path)
Return the time of last modification of path. The return value is a floating point number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.
Changed in version 3.6: Accepts a path-like object .
Return the system’s ctime which, on some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time for path. The return value is a number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.
Changed in version 3.6: Accepts a path-like object .
Return the size, in bytes, of path. Raise OSError if the file does not exist or is inaccessible.
Changed in version 3.6: Accepts a path-like object .
Return True if path is an absolute pathname. On Unix, that means it begins with a slash, on Windows that it begins with a (back)slash after chopping off a potential drive letter.
Changed in version 3.6: Accepts a path-like object .
Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.
Changed in version 3.6: Accepts a path-like object .
Return True if path is an existing directory. This follows symbolic links, so both islink() and isdir() can be true for the same path.
Changed in version 3.6: Accepts a path-like object .
Return True if path refers to an existing directory entry that is a symbolic link. Always False if symbolic links are not supported by the Python runtime.
Changed in version 3.6: Accepts a path-like object .
Return True if pathname path is a mount point: a point in a file system where a different file system has been mounted. On POSIX, the function checks whether path’s parent, path /.. , is on a different device than path, or whether path /.. and path point to the same i-node on the same device — this should detect mount points for all Unix and POSIX variants. It is not able to reliably detect bind mounts on the same filesystem. On Windows, a drive letter root and a share UNC are always mount points, and for any other path GetVolumePathName is called to see if it is different from the input path.
New in version 3.4: Support for detecting non-root mount points on Windows.
Changed in version 3.6: Accepts a path-like object .
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo' ) is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: ( c:foo ), not c:\foo .
Changed in version 3.6: Accepts a path-like object for path and paths.
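The rules above can be checked with a short sketch. It calls posixpath and ntpath directly so that both sets of conventions can be shown on any platform; the path fragments are illustrative.

```python
import ntpath
import posixpath

# An empty last part leaves a trailing separator.
trailing = posixpath.join("a", "b", "")

# An absolute component discards everything before it.
restarted = posixpath.join("a", "/b", "c")

# Windows: a rooted component keeps the earlier drive letter.
kept_drive = ntpath.join("C:\\dir", "\\foo")

# Windows: 'c:' alone is drive-relative, so no backslash is inserted.
drive_relative = ntpath.join("c:", "foo")

print(trailing, restarted, kept_drive, drive_relative)
```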
Normalize the case of a pathname. On Windows, convert all characters in the pathname to lowercase, and also convert forward slashes to backward slashes. On other operating systems, return the path unchanged.
Changed in version 3.6: Accepts a path-like object .
Normalize a pathname by collapsing redundant separators and up-level references so that A//B , A/B/ , A/./B and A/foo/../B all become A/B . This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes. To normalize case, use normcase() .
On POSIX systems, in accordance with IEEE Std 1003.1 2013 Edition; 4.13 Pathname Resolution, if a pathname begins with exactly two slashes, the first component following the leading characters may be interpreted in an implementation-defined manner, although more than two leading characters shall be treated as a single character.
Changed in version 3.6: Accepts a path-like object .
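A quick sketch confirming that the four spellings mentioned above collapse to the same result under POSIX rules:

```python
import posixpath

variants = ["A//B", "A/B/", "A/./B", "A/foo/../B"]

# Purely textual: the file system is never consulted.
normalized = [posixpath.normpath(v) for v in variants]

print(normalized)
```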
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
If a path doesn’t exist or a symlink loop is encountered, and strict is True , OSError is raised. If strict is False , the path is resolved as far as possible and any remainder is appended without checking whether it exists.
This function emulates the operating system’s procedure for making a path canonical, which differs slightly between Windows and UNIX with respect to how links and subsequent path components interact.
Operating system APIs make paths canonical as needed, so it’s not normally necessary to call this function.
Changed in version 3.6: Accepts a path-like object .
Changed in version 3.8: Symbolic links and junctions are now resolved on Windows.
Changed in version 3.10: The strict parameter was added.
Return a relative filepath to path either from the current directory or from an optional start directory. This is a path computation: the filesystem is not accessed to confirm the existence or nature of path or start. On Windows, ValueError is raised when path and start are on different drives.
Changed in version 3.6: Accepts a path-like object .
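A minimal sketch with illustrative absolute paths. Since relpath is a pure path computation, none of these directories needs to exist.

```python
import posixpath

# Path of the first argument as seen from the start directory.
sibling = posixpath.relpath("/usr/lib/python3", "/usr/local")

# Identical path and start yield the current-directory marker.
same = posixpath.relpath("/usr/lib", "/usr/lib")

print(sibling, same)
```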
Return True if both pathname arguments refer to the same file or directory. This is determined by the device number and i-node number and raises an exception if an os.stat() call on either pathname fails.
Changed in version 3.2: Added Windows support.
Changed in version 3.4: Windows now uses the same implementation as all other platforms.
Changed in version 3.6: Accepts a path-like object .
Return True if the file descriptors fp1 and fp2 refer to the same file.
Changed in version 3.2: Added Windows support.
Changed in version 3.6: Accepts a path-like object .
Return True if the stat tuples stat1 and stat2 refer to the same file. These structures may have been returned by os.fstat() , os.lstat() , or os.stat() . This function implements the underlying comparison used by samefile() and sameopenfile() .
Changed in version 3.4: Added Windows support.
Changed in version 3.6: Accepts a path-like object .
Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash; if path ends in a slash, tail will be empty. If there is no slash in path, head will be empty. If path is empty, both head and tail are empty. Trailing slashes are stripped from head unless it is the root (one or more slashes only). In all cases, join(head, tail) returns a path to the same location as path (but the strings may differ). Also see the functions dirname() and basename() .
Changed in version 3.6: Accepts a path-like object .
Split the pathname path into a pair (drive, tail) where drive is either a mount point or the empty string. On systems which do not use drive specifications, drive will always be the empty string. In all cases, drive + tail will be the same as path.
On Windows, splits a pathname into drive/UNC sharepoint and relative path.
If the path contains a drive letter, drive will contain everything up to and including the colon:
If the path contains a UNC path, drive will contain the host name and share, up to but not including the fourth separator:
Changed in version 3.6: Accepts a path-like object .
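A short sketch of the drive-letter and UNC cases, using ntpath so the Windows rules apply on any platform, with posixpath for contrast. The paths are illustrative.

```python
import ntpath
import posixpath

# Drive letter: everything up to and including the colon.
letter = ntpath.splitdrive("c:/dir")

# UNC path: host and share, up to the fourth separator.
unc = ntpath.splitdrive("//host/computer/dir")

# POSIX has no drive specifications, so drive is always empty.
posix = posixpath.splitdrive("/usr/lib")

print(letter, unc, posix)
```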
Split the pathname path into a pair (root, ext) such that root + ext == path , and the extension, ext, is empty or begins with a period and contains at most one period.
If the path contains no extension, ext will be '' :
If the path contains an extension, then ext will be set to this extension, including the leading period. Note that previous periods will be ignored:
Leading periods of the last component of the path are considered to be part of the root:
Changed in version 3.6: Accepts a path-like object .
True if arbitrary Unicode strings can be used as file names (within limitations imposed by the file system).
11. Expanding ~ to a canonical path
9. Changing the file mode
1. Listing subdirectories and files inside a directory
We can use the Path iterdir() method to iterate over the entries in a directory. We can then use the is_dir() method to tell files and directories apart.
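A minimal sketch that fills a temporary directory with one made-up subdirectory and one file, then lists them. iterdir() yields entries in arbitrary order, so the result is sorted.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    folder = Path(base)
    (folder / "sub").mkdir()
    (folder / "file.txt").touch()

    # Pair each entry's name with whether it is a directory.
    entries = sorted((entry.name, entry.is_dir())
                     for entry in folder.iterdir())

print(entries)
```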
If you run the same program on Windows, you get WindowsPath instances.
12. CWD and home path
4. Checking whether a file or directory exists
The Path exists() method returns True if the path exists and False otherwise.
File name
To get the file name from a full path string, use the basename function of the os.path module.
The r prefix before the string makes it a raw string, so backslashes are not interpreted as escape sequences. Without it, \f in this example would be read as a form feed character.
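A sketch of the point about raw strings, using a hypothetical Windows path. It calls ntpath.basename explicitly because os.path.basename only treats the backslash as a separator when Python runs on Windows.

```python
import ntpath

raw_path = r"C:\folder\file.txt"   # raw string: backslashes kept as-is
cooked_path = "C:\folder\file.txt"  # '\f' becomes a form feed, twice

file_name = ntpath.basename(raw_path)
has_form_feed = "\x0c" in cooked_path
damaged_name = ntpath.basename(cooked_path)

print(file_name, has_form_feed, repr(damaged_name))
```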