Python: Difference between revisions

: Quick start: http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
: Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
: Pandas is well suited for many different kinds of data:
:: -- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
:: -- Ordered and unordered (not necessarily fixed-frequency) time series data.
:: -- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
:: -- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

* '''scikit-learn''': https://scikit-learn.org/stable/
: Simple and efficient tools for data mining and data analysis

* '''StatsModels''': http://www.statsmodels.org/stable/index.html
: A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
 
 
* '''TensorFlow''': https://www.tensorflow.org
: An open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.
 
 
* '''Keras''' (the Python deep learning library): https://keras.io
: Keras is a high-level neural networks API, written in Python and capable of running on top of [[TensorFlow]], [[CNTK]], or [[Theano]]. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
 
* '''mpi4py''': requires MPI libraries.
 
* '''Dask''':


= Installing your own Python =

Revision as of 18:17, 30 June 2020

General


Important Libraries

  • NumPy:
Manual: https://docs.scipy.org/doc/numpy/user/quickstart.html
Reference: https://docs.scipy.org/doc/numpy/reference/
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
-- a powerful N-dimensional array object
-- sophisticated (broadcasting) functions
-- tools for integrating C/C++ and Fortran code
-- useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
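
As a minimal sketch of the features listed above, the following assumes NumPy is installed and uses made-up values; the variable names (grid, offsets, A, b) are illustrative and do not come from this page. It shows an N-dimensional array, broadcasting, and one of the linear algebra routines.

```python
import numpy as np

# A 2 x 3 array: NumPy's N-dimensional array object
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Broadcasting: the length-3 vector is added to every row of the 2 x 3 array
offsets = np.array([10.0, 20.0, 30.0])
shifted = grid + offsets

# Linear algebra: solve the 2 x 2 system A x = b
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

print(shifted)
print(x)
```

Broadcasting avoids an explicit Python loop over the rows, which is usually where most of the speed-up over plain lists comes from.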


  • pandas:
Manual: http://pandas.pydata.org/pandas-docs/stable/
Quick start: http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
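
To give a feel for what "labeled" data means in practice, here is a small sketch that assumes pandas is installed. The table contents and the column names (site, temperature_c, valid) are invented for illustration only.

```python
import pandas as pd

# A small table with heterogeneously typed, labeled columns
samples = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "temperature_c": [20.5, 21.5, 18.0, 19.0],
        "valid": [True, True, False, True],
    }
)

# Select rows with a boolean mask, then aggregate by a column label
valid_samples = samples[samples["valid"]]
mean_by_site = valid_samples.groupby("site")["temperature_c"].mean()

print(mean_by_site)
```

Columns are addressed by name rather than by position, so the same code keeps working if the column order in the input changes.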


  • scikit-learn: https://scikit-learn.org/stable/
Simple and efficient tools for data mining and data analysis


  • StatsModels: http://www.statsmodels.org/stable/index.html
A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.


  • TensorFlow: https://www.tensorflow.org
An open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.


  • Keras (the Python deep learning library): https://keras.io
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
  • mpi4py: requires MPI libraries.
  • Dask:

Installing your own Python

Extending Python

Virtual environments

$ python -m venv my_environment
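
The same step can also be done from inside Python with the standard-library venv module, which is what the command above runs. The sketch below is only an illustration: the helper name create_environment and the use of a temporary directory are assumptions, and with_pip=False is chosen here to keep the example fast and offline, whereas the command line installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir: str, name: str = "my_environment") -> Path:
    """Create a virtual environment called `name` inside `parent_dir`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips bootstrapping pip (an assumption made for this sketch)
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a scratch directory that is removed afterwards
with tempfile.TemporaryDirectory() as scratch:
    env_path = create_environment(scratch)
    # Every environment created by venv contains a pyvenv.cfg file
    has_config = (env_path / "pyvenv.cfg").is_file()
    print(f"created {env_path.name}, pyvenv.cfg present: {has_config}")
```

For day-to-day work the shell command is simpler; the programmatic form is mostly useful in setup scripts.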

Running Python scripts on ARC