

Mixing on headphones. Microphone techniques







I do not have the opportunity to buy good studio monitors. Even if I find passive ones, a quality power amplifier for them raises the price back to the level of active monitors. Can I mix at home on quality headphones, which can be bought for several times less than monitors?


The answer is: no, you cannot. Let’s see why.

The financial argument for headphones over studio monitors is undoubtedly valid: even nearfield monitors of fairly average quality are difficult to buy for under €2000-3000 per pair, while about €250 buys fairly good headphones.

Headphones have other advantages too. The stereophonic impression is better than on monitors. The closed volume (if the headphones have one) isolates the listener better from the outside world, which lowers the bottom threshold of hearing. There are also no problems with neighbors because of high volume, so the upper threshold of hearing is higher as well.


These two things together mean a large dynamic range.


In most cases the sound is also more detailed. The low mass of the drivers means lower inertia in their oscillation and therefore better reproduction of transients, which are important for the perception of timbre and detail.

A good frequency response is easier to achieve, because the sound is not attenuated in air and is aimed more precisely at the eardrum. In most cases there is one driver per channel, which avoids the phase problems inherent in two-way, three-way and larger systems, as well as the frequency peaks and “holes” typical of those systems.

Monitoring systems have several times more distortion, caused again by the greater mass and inertia of a large driver and by largely uncontrolled processes across its membrane.


However, mixing on headphones is not merely inadvisable: a sound production cannot be mixed on headphones.


Most important of all is the changed balance you will hear compared with a monitoring system made of loudspeakers. On monitors, the weaker elements of the mix sound closer in level to the louder ones, although it was not so while you were mixing on headphones.

A mix in which the balance is not correct can hardly be called a mix.

Attempts to adapt your hearing during the mix are pointless, because in a mix of even average complexity there will always be errors in the balance.

The second reason lies in the perception of the stereo signal. Because headphones drive the eardrum directly, the signal from the right channel is heard only by the right ear, and vice versa. By contrast, when we listen to a monitoring system in an acoustic environment, the right ear also hears the left channel, albeit at a lower level and delayed in phase, and vice versa. A stereo image that was laid out with this fact in mind is altered when heard on headphones.
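The crosstalk that monitors add and headphones remove can be sketched in a few lines of Python. This is only an illustration: the function name `crossfeed`, the gain of 0.3 and the delay of 12 samples (about 0.27 ms at 44.1 kHz) are assumptions chosen for the example, not measured values, and a real head also filters the leaked sound by frequency.

```python
def crossfeed(left, right, gain=0.3, delay_samples=12):
    """Mix a delayed, quieter copy of each channel into the opposite ear.

    gain and delay_samples are illustrative assumptions, not measurements.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    out_left, out_right = [], []
    for n in range(len(left)):
        # Opposite-channel sample that reaches this ear late (silence before that).
        late_right = right[n - delay_samples] if n >= delay_samples else 0.0
        late_left = left[n - delay_samples] if n >= delay_samples else 0.0
        out_left.append(left[n] + gain * late_right)
        out_right.append(right[n] + gain * late_left)
    return out_left, out_right


# A single click in the left channel only.
left = [1.0] + [0.0] * 15
right = [0.0] * 16
ear_left, ear_right = crossfeed(left, right)
```

On headphones the right ear would hear nothing of this click. With the sketch above it hears a weaker copy 12 samples later, which is roughly what happens in front of two monitors.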

Actually, in pure theory, a perfect placement of the stereo signal is easier to achieve when listening on headphones.

If we place a pair of high-quality microphones 17-18 cm apart, which is the distance between the ears of a human head, and angle them at around 110-120 degrees to each other, we get a recording which, heard on headphones, matches amazingly well the impression we would have at the same place in the hall during the performance.

Such a microphone setup really exists, and it exists in several varieties.

The difference would come from only two things. One is the filtering effect of the head as an object, which suppresses the leakage of sound from the left into the right channel and vice versa. The other is the diffraction effects of the head itself.




The first effect is simulated by placing a barrier between two omnidirectional microphones. It reduces the leakage of sound between the channels at mid and high frequencies, partly in the way the head does.

This is the so-called Jecklin disk.


The second is simulated by a dummy head.


The impression of a dummy-head recording heard on headphones is amazing: a full simulation of being in the room where the performance takes place.

Until you decide to hear the audio material on monitors.

This happens because we now have crosstalk between the channels, which was not present before.

The stereo impression has two parts. The first is the sense of a sound space, of multidimensionality. The second is directly related to the localization of each sound source in our overall mix.




This means that a harp, for example, should on the one hand be represented at all times as a virtual object on the stereo base, with its own width and depth in the sound space.

On the other hand, these parameters and the position from which we hear it must not change from the beginning to the end of the mix, so that with eyes closed we can visualize it in that acoustic space.

Here we come to the classical explanation of how our auditory apparatus localizes a sound source: the brain compares the information coming from our two ears. This is so-called binaural hearing.

On the one hand, our hearing compares the intensity of the sound at the two ears and decides the direction of the sound source from the stronger sound. This is called intensity localization.

On the other hand, it compares at which ear the sound arrived earlier. This is phase localization.

It is important to bear in mind that both kinds of cue, intensity and phase, take part in localizing a sound, to a degree that depends on the frequency of the sound.



Localization is performed simultaneously on the basis of the difference in the arrival time of the sound (phase) and the difference in the strength with which the two ears perceive it (intensity).


Although the intensity method is very accurate, the phase method often prevails when hearing receives conflicting information from the two. For example, it is possible that at some moment the sound that arrives first is weaker than the one that arrives second, even though in the vast majority of cases it is the opposite. In this situation localization often follows the phase principle, i.e. hearing decides that the direction of the sound is that of the weaker signal, the one that arrives first.

Moreover, the later sound can be up to 6 dB louder without the perceived location changing. Once that limit is crossed, the sense of location shifts dramatically in favor of the stronger sound.

Why is that? Because in nature a reflected sound can for a moment be louder than the direct one, and this would pull localization in its direction, creating a false sense of the position of the sound source.

Here phase localization comes to the rescue: it determines the direction from the first wavefront to arrive, which is always the sound that travelled the shortest possible distance, i.e. the direct sound.
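The rule described above can be written down as a toy decision function. It is a deliberately crude sketch: the names `perceived_direction` and `db_to_amplitude_ratio` are made up for this example, the 6 dB limit is the figure quoted in the text, and real hearing also depends on the delay between the two arrivals, which is ignored here.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to a ratio of amplitudes."""
    return 10.0 ** (level_db / 20.0)


def perceived_direction(first_level_db, later_level_db, threshold_db=6.0):
    """Return "first" or "later": the arrival this toy listener localizes to.

    The first wavefront wins unless the later sound is louder by more
    than threshold_db (6 dB, as quoted in the text).
    """
    advantage_of_later_db = later_level_db - first_level_db
    if advantage_of_later_db > threshold_db:
        return "later"
    return "first"


print(perceived_direction(-20.0, -16.0))  # reflection 4 dB louder than direct sound
print(perceived_direction(-20.0, -13.0))  # reflection 7 dB louder than direct sound
```

A difference of 6 dB corresponds to an amplitude ratio of about 2, so in this model the reflection has to be roughly twice as strong as the direct sound before it takes over.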



How do we implement these techniques in sound recording in order to recreate auditory localization in stereo?

Pan-Pot microphone technique.


We record with a local (close) microphone and then, in a mixing console or a software mixer, we send the signal at a higher level to the side from which we want it to sound. In this way we imitate intensity localization. We distribute all the sound sources by panning their local microphones to the respective directions.
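A pan-pot can be sketched as a pair of gains applied to the same mono signal. The sketch below uses a constant-power (sine/cosine) pan law, which is one common choice; the text does not say which law a particular console uses, so treat the law and the name `pan_gains` as assumptions.

```python
import math


def pan_gains(pan):
    """Constant-power pan law (an assumed, commonly used choice).

    pan: -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Returns (left_gain, right_gain).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # 0 at hard left, pi/2 at hard right
    return math.cos(angle), math.sin(angle)


def pan_mono(samples, pan):
    """Place a mono local-microphone signal in the stereo field."""
    left_gain, right_gain = pan_gains(pan)
    return [s * left_gain for s in samples], [s * right_gain for s in samples]


left, right = pan_mono([1.0, 0.5, -0.5], pan=-0.5)  # halfway to the left
```

With this law the sum of the squared gains is always 1, so the source keeps about the same loudness wherever it is panned. Only the level difference between the speakers changes, which is exactly the intensity cue.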


X / Y microphone technique 

Two identical directional microphones are placed one above the other on the same vertical axis, so there is no phase difference between them. The sound of the left microphone is played mostly from the left speaker, and vice versa. The difference in where they point determines how strongly each one picks up a sound, depending on its direction. The only sound picked up equally strongly by both is the one coming from straight ahead.


This is so-called intensity stereophony.
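How large the level difference gets can be estimated with a short sketch. It assumes ideal cardioid capsules aimed 45 degrees to either side of center (a 90-degree pair); the text only says "directional", so both the cardioid pattern and the angle are assumptions, as are the function names.

```python
import math


def cardioid_gain(angle_deg):
    """Sensitivity of an ideal cardioid to sound arriving angle_deg off its axis."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def xy_level_difference_db(source_deg, half_angle_deg=45.0):
    """Left-minus-right level in dB for a source at source_deg.

    Positive source_deg means the source is to the left of center.
    Only sources in front of the pair (within 90 degrees) are handled.
    """
    if abs(source_deg) > 90.0:
        raise ValueError("this sketch only covers sources in front of the pair")
    left = cardioid_gain(source_deg - half_angle_deg)
    right = cardioid_gain(source_deg + half_angle_deg)
    return 20.0 * math.log10(left / right)


for angle in (0, 15, 30, 45):
    print(angle, round(xy_level_difference_db(angle), 1))
```

A source straight ahead gives 0 dB, and a source exactly on the axis of the left microphone gives about 6 dB in favor of the left channel. There is no time difference at all, because the capsules are in the same place.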


The amplitude of the two waves in the X/Y microphone technique




A / B microphone technique


Two omnidirectional microphones are placed some distance apart. Being omnidirectional, they pick up the sound at approximately equal loudness, but with different arrival times and therefore different phase. The sound of the left microphone is played from the left speaker, and vice versa.

This is phase stereophony. We hear the direction from which the sound arrived first as the direction of the sound source.
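The size of the arrival-time difference follows from simple geometry. The sketch below assumes a distant source (so the wavefront is practically flat), a speed of sound of 343 m/s and, for the example, a spacing of 0.5 m; none of these numbers comes from the text, and the function name is made up for the illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 °C


def ab_time_difference_ms(spacing_m, source_deg):
    """Arrival-time difference between two spaced microphones, in milliseconds.

    source_deg is measured from straight ahead. Far-field assumption:
    the extra path to the farther microphone is spacing * sin(angle).
    """
    path_difference_m = spacing_m * math.sin(math.radians(source_deg))
    return 1000.0 * path_difference_m / SPEED_OF_SOUND_M_S


for angle in (0, 30, 60, 90):
    print(angle, round(ab_time_difference_ms(0.5, angle), 2))
```

A source straight ahead reaches both microphones at the same moment. The further it moves to one side, the earlier it reaches the nearer microphone, up to about 1.5 ms for a 0.5 m pair.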



Phase stereophony: the sound from the left speaker arrives first


What happened to our system in which we placed the microphones 17 cm apart at an angle of 110 degrees, mimicking the human ears? Here it is: the ORTF microphone setup.


 ORTF microphone setup




This is the so-called ORTF microphone technique.


Here phase and intensity localization are combined in the same manner as in human hearing. The sound of the left microphone is played from the left speaker, and vice versa.
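Both cues of an ORTF pair can be estimated together. The sketch assumes ideal cardioid capsules, the 17 cm spacing and 110-degree angle mentioned in the text, a distant source and a speed of sound of 343 m/s; the function name `ortf_cues` is invented for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 °C


def ortf_cues(source_deg, spacing_m=0.17, included_angle_deg=110.0):
    """Return (level_difference_db, time_difference_ms) for one source.

    Positive source_deg means the source is to the left of center, and
    positive results mean the left microphone is louder / earlier.
    Ideal cardioids and a far-field source are assumed.
    """
    if abs(source_deg) > 90.0:
        raise ValueError("this sketch only covers sources in front of the pair")
    half = included_angle_deg / 2.0

    def cardioid(angle_deg):
        return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))

    level_db = 20.0 * math.log10(
        cardioid(source_deg - half) / cardioid(source_deg + half)
    )
    time_ms = (
        1000.0 * spacing_m * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND_M_S
    )
    return level_db, time_ms


level_db, time_ms = ortf_cues(30.0)
print(round(level_db, 1), "dB,", round(time_ms, 2), "ms")
```

For a source 30 degrees to the left, this model gives a few decibels of level difference and about a quarter of a millisecond of time difference at once, so hearing receives both cues pointing the same way.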


There are other microphone combinations for creating stereo: for example, two microphones with a figure-eight pattern placed at a 90-degree angle to each other (the Blumlein system), or a system that separates the middle and the side information (M / S), and many others…
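The M / S idea is easy to show in code, because converting between left/right and middle/side is just a sum and a difference. This is a minimal sketch with made-up function names; the `width` parameter is an addition for illustration and is not mentioned in the text.

```python
def ms_encode(left, right):
    """Convert left/right samples to middle (sum) and side (difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side, width=1.0):
    """Convert middle/side back to left/right.

    width scales the side signal: 1.0 restores the original stereo,
    0.0 collapses it to mono (an illustrative extra, not from the text).
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


left_in = [0.5, -0.25, 1.0]
right_in = [0.25, 0.25, -1.0]
mid, side = ms_encode(left_in, right_in)
left_out, right_out = ms_decode(mid, side)
```

Decoding with `width=1.0` returns the original left and right signals. Changing the balance between middle and side after the recording is what makes the system flexible.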

In practice we mostly use different systems or combinations of systems. The aim is for hearing to be able to determine the position of the sound both by intensity and by phase.


Why is it necessary to use these microphone techniques? Because they simulate the natural way in which we form an idea of the direction of a sound.

However, when we record several sound sources in a single mix, we cannot always limit ourselves to one stereo recording method. For example, the pan-pot system with local microphones creates a very clear, so-called “spot” rendering of the stereo, but the sound of each instrument is “flat”, like a spot without width or depth in space. On the other hand, when we combine close and distant microphones we can develop a sense of “depth” in the sound picture, and this depth can be varied through the level balance between the near and distant microphones.

Let’s go to the stereo field again.

As with the pan-pot system, with the other intensity system, X / Y, we can also observe good localization, i.e. the sense of the position of each sound source in our mix is clear. The problem with these systems is a partial “flatness” of the sound, a lack of any sense of size and spaciousness.


This comes from the insufficient phase difference between the sounds from the left and right speakers, which would not be the case when really listening in the room at a live performance. There the sound is heard as if from all sides, with differing intensity and phase, because of the different arrival times at the two ears and the numerous reflections from the walls, floor and ceiling, which come from different places in three dimensions and at different times.


What is the solution? A / B (phase) stereophony, which should complement the local microphones and the X / Y system (the two, X / Y and pan-pot, need not both be present).


Phase stereophony alone, however, is in most cases not a solution, because phase localization by itself is not entirely accurate: the sound “runs” and “jumps” between the channels, and the same instrument can be heard now on the left and now on the right, which is unacceptable for a good mix except in special cases.

Furthermore, you get a “dip” in the middle of the mix: the sound seems to come only from the edges and not from the middle of the stereo base. You can even get sounds that appear to come from outside the line between the two speakers.

Therefore it is good to combine the intensity microphone techniques with the phase ones.

Local microphones and the X / Y pair provide intensity localization, the exact position of the sound source in space, while A / B spaced pairs provide the beauty of reflections: a “broad” sound that seems to come from different places in the space at the same time.

For this reason, when we use distant microphones to capture the reverberant sound of the hall, they are usually an A / B pair.



Good stereo must have both an intensity-difference component and a phase one.


There is a third mechanism of localization: the effect of the pinna.


Because the shape of the ear is complex and unique to each individual, the sound not only enters the ear canal directly but at the same time undergoes a large number of reflections in the folds of the pinna, which arrive immediately after the main sound. They are not perceived as a delay effect, but they cause comb filtering of the main sound, with dips in some frequency ranges and peaks in others.
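Comb filtering from a single reflection can be computed directly. The sketch models the direct sound plus one delayed copy; the delay of 0.1 ms and the reflection gain of 0.5 are illustrative assumptions, not measurements of a real pinna, which produces many reflections at once.

```python
import math


def comb_magnitude(frequency_hz, delay_s, reflection_gain=0.5):
    """Gain at frequency_hz when a sound is mixed with one delayed copy.

    Models y(t) = x(t) + reflection_gain * x(t - delay_s).
    """
    phase = 2.0 * math.pi * frequency_hz * delay_s
    power = 1.0 + reflection_gain ** 2 + 2.0 * reflection_gain * math.cos(phase)
    return math.sqrt(max(0.0, power))  # guard against tiny negative rounding


def notch_frequencies(delay_s, count=3):
    """First `count` frequencies where the reflection cancels the direct sound."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


delay_s = 0.0001  # 0.1 ms, an assumed value for illustration
print(notch_frequencies(delay_s))
print(comb_magnitude(5000.0, delay_s), comb_magnitude(10000.0, delay_s))
```

With these numbers the dips fall near 5, 15 and 25 kHz and the peaks in between. A different delay moves the whole pattern, which is why the pattern can carry information about the direction of the sound.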

It is assumed that the brain automatically “remembers” these comb-filtering effects, which differ drastically with the direction from which the sound arrives, i.e. the angle at which it strikes the pinna. This allows exact localization not only through 360 degrees in the horizontal plane but also in the vertical plane. Currently these effects are used in 3D simulations and in various codecs that create a three-dimensional virtual auditory reality.

However, classical stereophony currently only uses intensity, phase and combined phase-intensity microphone techniques.


end of this part 
